Analysing Global Climate change using Python and GridDB

The understanding of climate change impacts and the associated climate extreme events at regional and local scales is of critical importance for planning and development of feasible adaptation strategies.

Climate change is without a doubt the most serious threat to humanity in the modern era. To have any hope of mitigating the harmful effects of climate change, the global mean temperature should be limited to 1.5 degrees Celsius above pre-industrial levels, according to the IPCC.

In this article, I am going to analyse how to create map charts and animations of temperature variability, by using Python and the power of GridDB.

A link to the full source code and jupyter file: https://github.com/griddbnet/Blogs/tree/analyzing-global-climate-change

The outline of the tutorial is as follows:

  1. Dataset overview
  2. Importing required libraries
  3. Loading the dataset
  4. Data Cleaning and Preprocessing
  5. Analysing with data visualization
  6. Conclusion

Prerequisites and Environment setup

This tutorial is carried out in Anaconda Navigator (Python version – 3.8.5) on Windows Operating System. The following packages need to be installed before you continue with the tutorial –

  1. Pandas

  2. NumPy

  3. plotly

  4. Matplotlib

  5. Seaborn

  6. griddb_python

You can install these packages in Conda’s virtual environment using conda install package-name. In case you are using Python directly via terminal/command prompt, pip install package-name will do the work.

GridDB installation

While loading the dataset, this tutorial will cover two methods – Using GridDB as well as Using Pandas. To access GridDB using Python, the following packages also need to be installed beforehand:

  1. GridDB C-client
  2. SWIG (Simplified Wrapper and Interface Generator)
  3. GridDB Python Client

Dataset Overview

The dataset contains information about visits to GStore (Google swag online store), each row is a unique visit and each user has a unique ‘fullVisitorId’.

Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):

Date: starts in 1750 for average land temperature and 1850 for max and min land temperatures and global ocean and land temperatures LandAverageTemperature: global average land temperature in celsius

https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data

Importing Required Libraries

# import griddb_python as griddb
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly
import plotly.graph_objs as go
import plotly.tools as tls
import plotly.express as px
import plotly.graph_objs as go
import seaborn as sns
import time
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

Loading the Dataset

Let’s proceed and load the dataset into our notebook.

Using GridDB

Toshiba GridDB™ is a highly scalable NoSQL database best suited for IoT and Big Data. The foundation of GridDB’s principles is based upon offering a versatile data store that is optimized for IoT, provides high scalability, tuned for high performance, and ensures high reliability.

To store large amounts of data, a CSV file can be cumbersome. GridDB serves as a perfect alternative as it in open-source and a highly scalable database. GridDB is a scalable, in-memory, No SQL database which makes it easier for you to store large amounts of data. If you are new to GridDB, a tutorial on reading and writing to GridDB can be useful.

Assuming that you have already set up your database, we will now write the SQL query in python to load our dataset.

factory = griddb.StoreFactory.get_instance()

Initialize the GridDB container (enter your database credentials)
try:
    gridstore = factory.get_store(host=host_name, port=your_port, 
            cluster_name=cluster_name, username=admin, 
            password=admin)

    info = griddb.ContainerInfo("GlobalTemperatures",
                    [["dt", griddb.Type.TIMESTAMP],["LandAverageTemperature", griddb.Type.DOUBLE],
                     ["LandAverageTemperatureUncertainty", griddb.Type.DOUBLE],
                     ["LandMaxTemperature", griddb.Type.DOUBLE],
                     ["LandMaxTemperatureUncertainty", griddb.Type.DOUBLE],
                     ["LandMinTemperature", griddb.Type.DOUBLE],
                     ["LandMinTemperatureUncertainty", griddb.Type.DOUBLE],
                     ["LandAndOceanAverageTemperature", griddb.Type.DOUBLE],
                     ["LandAndOceanAverageTemperatureUncertainty", griddb.Type.DOUBLE],
                     , True)
                     
    cont = gridstore.put_container(info) 
    data = pd.read_csv("GlobalTemperatures.csv")
    #Add data
    for i in range(len(data)):
        ret = cont.put(data.iloc[i, :])
        print("Data added successfully")
                     
                     
try:
    gridstore = factory.get_store(host=host_name, port=your_port, 
            cluster_name=cluster_name, username=admin, 
            password=admin)

    info = griddb.ContainerInfo("GlobalLandTemperaturesByCity",
                    [["dt", griddb.Type.TIMESTAMP],["AverageTemperature", griddb.Type.DOUBLE],
                     ["AverageTemperatureUncertainty", griddb.Type.DOUBLE],
                     ["City", griddb.Type.STRING],["Country", griddb.Type.STRING], 
                     ["Latitude", griddb.Type.STRING], ["Longitude", griddb.Type.STRING],True)
                     
    cont = gridstore.put_container(info) 
    data = pd.read_csv("GlobalLandTemperaturesByCity.csv")
    #Add data
    for i in range(len(data)):
        ret = cont.put(data.iloc[i, :])
        print("Data added successfully")
                     
                     
try:
    gridstore = factory.get_store(host=host_name, port=your_port, 
            cluster_name=cluster_name, username=admin, 
            password=admin)

    info = griddb.ContainerInfo("GlobalLandTemperaturesByState",
                    [["dt", griddb.Type.TIMESTAMP],["AverageTemperature", griddb.Type.DOUBLE],["AverageTemperatureUncertainty", griddb.Type.DOUBLE],
                     ["State", griddb.Type.STRING],["Country", griddb.Type.STRING], True)
    cont = gridstore.put_container(info) 
    data = pd.read_csv("GlobalLandTemperaturesByState.csv")
    #Add data
    for i in range(len(data)):
        ret = cont.put(data.iloc[i, :])
        print("Data added successfully")
Data added successfully

The read_sql_query function offered by the pandas library converts the data fetched into a panda data frame to make it easy for the user to work.

sql_statement1 = ('SELECT * FROM GlobalTemperatures.csv')
df1 = pd.read_sql_query(sql_statement1, cont)

sql_statement2 = ('SELECT * FROM GlobalLandTemperaturesByCity.csv')
df2 = pd.read_sql_query(sql_statement2, cont)

sql_statement3 = ('SELECT * FROM GlobalLandTemperaturesByState.csv')
df3 = pd.read_sql_query(sql_statement3, cont)

Note that the cont variable has the container information where our data is stored. Replace the credit_card_dataset with the name of your container. More info can be found in this tutorial reading and writing to GridDB.

When it comes to IoT and Big Data use cases, GridDB clearly stands out among other databases in the Relational and NoSQL space. Overall, GridDB offers multiple reliability features for mission-critical applications that require high availability and data retention.

Using pandas read_csv

We can also use Pandas’ read_csv function to load our data. Both of the above methods will lead to the same output as the data is loaded in the form of a pandas dataframe using either of the methods.

global_temp_country = pd.read_csv('GlobalLandTemperaturesByCity.csv')
global_temp_country.head()
dt AverageTemperature AverageTemperatureUncertainty City Country Latitude Longitude
0 1743-11-01 6.068 1.737 Århus Denmark 57.05N 10.33E
1 1743-12-01 NaN NaN Århus Denmark 57.05N 10.33E
2 1744-01-01 NaN NaN Århus Denmark 57.05N 10.33E
3 1744-02-01 NaN NaN Århus Denmark 57.05N 10.33E
4 1744-03-01 NaN NaN Århus Denmark 57.05N 10.33E
global_temp=pd.read_csv('GlobalTemperatures.csv')
global_temp.head()
dt LandAverageTemperature LandAverageTemperatureUncertainty LandMaxTemperature LandMaxTemperatureUncertainty LandMinTemperature LandMinTemperatureUncertainty LandAndOceanAverageTemperature LandAndOceanAverageTemperatureUncertainty
0 1750-01-01 3.034 3.574 NaN NaN NaN NaN NaN NaN
1 1750-02-01 3.083 3.702 NaN NaN NaN NaN NaN NaN
2 1750-03-01 5.626 3.076 NaN NaN NaN NaN NaN NaN
3 1750-04-01 8.490 2.451 NaN NaN NaN NaN NaN NaN
4 1750-05-01 11.573 2.072 NaN NaN NaN NaN NaN NaN
GlobalTempState = pd.read_csv('GlobalLandTemperaturesByState.csv') 
GlobalTempState.head()
dt AverageTemperature AverageTemperatureUncertainty State Country
0 1855-05-01 25.544 1.171 Acre Brazil
1 1855-06-01 24.228 1.103 Acre Brazil
2 1855-07-01 24.371 1.044 Acre Brazil
3 1855-08-01 25.427 1.073 Acre Brazil
4 1855-09-01 25.675 1.014 Acre Brazil

Data Cleaning and Preprocessing

global_temp_country.isna().sum()
dt                                    0
AverageTemperature               364130
AverageTemperatureUncertainty    364130
City                                  0
Country                               0
Latitude                              0
Longitude                             0
dtype: int64
global_temp_country.dropna(axis='index',how='any',subset=['AverageTemperature'],inplace=True)
def fetch_year(date):
    return date.split('-')[0]
global_temp['years']=global_temp['dt'].apply(fetch_year)
global_temp.head()
dt LandAverageTemperature LandAverageTemperatureUncertainty LandMaxTemperature LandMaxTemperatureUncertainty LandMinTemperature LandMinTemperatureUncertainty LandAndOceanAverageTemperature LandAndOceanAverageTemperatureUncertainty years
0 1750-01-01 3.034 3.574 NaN NaN NaN NaN NaN NaN 1750
1 1750-02-01 3.083 3.702 NaN NaN NaN NaN NaN NaN 1750
2 1750-03-01 5.626 3.076 NaN NaN NaN NaN NaN NaN 1750
3 1750-04-01 8.490 2.451 NaN NaN NaN NaN NaN NaN 1750
4 1750-05-01 11.573 2.072 NaN NaN NaN NaN NaN NaN 1750

Analyzing with Data Visualization

Let’s calculate the average temperature for each country

avg_temp=global_temp_country.groupby(['Country'])['AverageTemperature'].mean().to_frame().reset_index()
avg_temp
Country AverageTemperature
0 Afghanistan 13.816497
1 Albania 15.525828
2 Algeria 17.763206
3 Angola 21.759716
4 Argentina 16.999216
154 Venezuela 25.482422
155 Vietnam 24.846825
156 Yemen 25.768408
157 Zambia 20.937623
158 Zimbabwe 19.822971

159 rows × 2 columns

fig=px.choropleth(avg_temp,locations='Country',locationmode='country names',color='AverageTemperature')
fig.update_layout(title='Choropleth map of average temperature')
fig.show()

# The average temperature and Horizontal Bar sort by countries

sns.barplot(x=avg_temp.sort_values(by='AverageTemperature',ascending=False)['AverageTemperature'][0:20],y=avg_temp.sort_values(by='AverageTemperature',ascending=False)['Country'][0:20])
<AxesSubplot:xlabel='AverageTemperature', ylabel='Country'>

Is there global warming?

data=global_temp.groupby('years').agg({'LandAverageTemperature':'mean','LandAverageTemperatureUncertainty':'mean'}).reset_index()
data.head()
years LandAverageTemperature LandAverageTemperatureUncertainty
0 1750 8.719364 2.637818
1 1751 7.976143 2.781143
2 1752 5.779833 2.977000
3 1753 8.388083 3.176000
4 1754 8.469333 3.494250
data['Uncertainty top']=data['LandAverageTemperature']+data['LandAverageTemperatureUncertainty']
data['Uncertainty bottom']=data['LandAverageTemperature']-data['LandAverageTemperatureUncertainty']
fig=px.line(data,x='years',y=['LandAverageTemperature',
       'Uncertainty top', 'Uncertainty bottom'],title='Average Land Tmeperature in World')
fig.show()

The charts show that there is currently global warming. The average temperature of the Earth’s surface has reached its highest level in three centuries. Temperatures have risen at the fastest rate in the last 30 years. This concerns me; I hope that humanity will soon fully transition to ecological energy sources, which will reduce CO2. We will be in trouble if it does not happen. This chart also includes confidence intervals, indicating that temperature measurement has become more accurate in recent years.

Average temperature in each season

global_temp = global_temp[['dt', 'LandAverageTemperature']]

global_temp['dt'] = pd.to_datetime(global_temp['dt'])
global_temp['year'] = global_temp['dt'].map(lambda x: x.year)
global_temp['month'] = global_temp['dt'].map(lambda x: x.month)

def get_season(month):
    if month >= 3 and month <= 5:
        return 'spring'
    elif month >= 6 and month <= 8:
        return 'summer'
    elif month >= 9 and month <= 11:
        return 'autumn'
    else:
        return 'winter'
    
min_year = global_temp['year'].min()
max_year = global_temp['year'].max()
years = range(min_year, max_year + 1)

global_temp['season'] = global_temp['month'].apply(get_season)

spring_temps = []
summer_temps = []
autumn_temps = []
winter_temps = []

for year in years:
    curr_years_data = global_temp[global_temp['year'] == year]
    spring_temps.append(curr_years_data[curr_years_data['season'] == 'spring']['LandAverageTemperature'].mean())
    summer_temps.append(curr_years_data[curr_years_data['season'] == 'summer']['LandAverageTemperature'].mean())
    autumn_temps.append(curr_years_data[curr_years_data['season'] == 'autumn']['LandAverageTemperature'].mean())
    winter_temps.append(curr_years_data[curr_years_data['season'] == 'winter']['LandAverageTemperature'].mean())
sns.set(style="whitegrid")
sns.set_color_codes("pastel")
f, ax = plt.subplots(figsize=(10, 6))

plt.plot(years, summer_temps, label='Summers average temperature', color='orange')
plt.plot(years, autumn_temps, label='Autumns average temperature', color='r')
plt.plot(years, spring_temps, label='Springs average temperature', color='g')
plt.plot(years, winter_temps, label='Winters average temperature', color='b')

plt.xlim(min_year, max_year)

ax.set_ylabel('Average temperature')
ax.set_xlabel('Year')
ax.set_title('Average temperature in each season')
legend = plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), frameon=True, borderpad=1, borderaxespad=1)

# Statewise scenario of average temperature.
country_state_temp = GlobalTempState.groupby(by = ['Country','State']).mean().reset_index().sort_values('AverageTemperature',ascending=False).reset_index()
country_state_temp
country_state_temp["world"] = "world" 
fig = px.treemap(country_state_temp.head(200), path=['world', 'Country','State'], values='AverageTemperature',
                  color='State',color_continuous_scale='RdGr')
fig.show()

Conclusion

In this tutorial we analysed the global climate using Python and GridDB. We examined two ways to import our data, using (1) GridDB and (2) Pandas. For large datasets, GridDB provides an excellent alternative to import data in your notebook as it is open-source and highly scalable.

If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.