Performing Analysis of Meteorological Data
Meteorological Data Analysis
by Sayantan Bhattacharyya
Language used: Python
Libraries used: Numpy, Pandas, Matplotlib, Seaborn
Data source: Click here
Overview:
In this project, we state a hypothesis about the dataset and check whether the data supports it. Along the way we apply data cleaning, data visualization, and hypothesis testing.
The Null Hypothesis, H0, is "Do the apparent temperature and humidity, compared monthly across 10 years of the data, indicate an increase due to global warming?"
H0 means we need to find out whether the average apparent temperature for a given month, say April, has increased from 2006 to 2016, and whether the average humidity has increased over the same period. This monthly analysis has to be done for all 12 months over the 10-year period.
So basically, we have to resample our data from hourly to monthly and then compare the same month across the 10-year period. We then support our analysis with appropriate visualizations using the matplotlib and seaborn libraries.
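As a rough illustration of the resample-then-compare idea, the sketch below builds a small synthetic hourly series, resamples it to monthly means, and then pulls out every April. The values and the column name `temp` are made up for the example and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic hourly readings for 2006-2008 (made-up values, not the real dataset).
hours = pd.date_range("2006-01-01 00:00", "2008-12-31 23:00", freq="60min")
rng = np.random.default_rng(0)
hourly = pd.DataFrame({"temp": rng.normal(10.0, 5.0, len(hours))}, index=hours)

# Hourly -> monthly means. "MS" labels each bin by the first day of the month;
# month-end labels would hold the same averages.
monthly = hourly.resample("MS").mean()

# Keep only the April rows: one value per year, ready to compare across years.
april = monthly[monthly.index.month == 4]
print(april)
```

Each row of `april` is the mean of all hourly readings in that year's April, so the rows can be compared directly from one year to the next.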
Step 1:
Import the required libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
import seaborn as sbn
Step 2:
Reading the data
df = pd.read_csv("weatherHistory.csv")
df
Step 3:
Describing the data
Checking for null values
df.isnull().sum()
We see that the column 'Precip Type' has 517 null values. Now we check the non-null counts and data types of each column.
df.info()
Step 5:
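Resampling the data (pre-processing)
# new_df is the cleaned DataFrame; the step that creates it is not shown here
new_df['Formatted Date'] = pd.to_datetime(new_df['Formatted Date'], utc=True)
new_df = new_df.set_index('Formatted Date')
# Resample to month end ('M'; pandas 2.2+ prefers 'ME') and average the numeric columns
resampled_df = new_df.resample('M').mean(numeric_only=True)
resampled_df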
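The `utc=True` argument matters when timestamps carry different UTC offsets, for example because of daylight saving time. Assuming the 'Formatted Date' strings do carry such offsets, the sketch below uses two made-up timestamp strings (hypothetical values, not copied from the dataset) to show how `pd.to_datetime` puts them on a single UTC timeline.

```python
import pandas as pd

# Two hypothetical timestamps with different UTC offsets (winter vs. summer time).
raw = pd.Series(["2006-01-01 00:00:00+01:00", "2006-07-01 00:00:00+02:00"])

# utc=True converts every value to UTC, giving one consistent timezone-aware dtype.
parsed = pd.to_datetime(raw, utc=True)
print(parsed)
```

With a single timezone-aware dtype, the column can serve as a `DatetimeIndex`, which is what `resample` needs.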
Step 6:
Plotting a graph of Humidity 2006-2016
plt.figure(figsize=(15, 8))
hum_plot = sbn.lineplot(y=resampled_df['Humidity'], x=resampled_df.index, data=resampled_df)
hum_plot.set_xlabel("Year", fontsize=15)
hum_plot.set_ylabel("Humidity", fontsize=15)
plt.title("Humidity plot [2006-2016]")
Step 7:
Plotting a graph of Apparent Temperature(C) 2006-2016
plt.figure(figsize=(15, 8))
temp_plot = sbn.lineplot(y=resampled_df['Apparent Temperature (C)'], x=resampled_df.index, data=resampled_df)
temp_plot.set_xlabel("Year", fontsize=15)
temp_plot.set_ylabel("Apparent Temperature (C)", fontsize=15)
plt.title("Apparent Temperature plot [2006-2016]")
Step 8:
Plotting a graph of Temperature(C) Vs. Apparent Temperature(C) 2006-2016
plt.figure(figsize=(15, 8))
# Label each line where it is drawn so the legend cannot get out of order
tem_plot = sbn.lineplot(y=resampled_df['Temperature (C)'], x=resampled_df.index, data=resampled_df, color='blue', label="Temperature (C)")
tem_plot = sbn.lineplot(y=resampled_df['Apparent Temperature (C)'], x=resampled_df.index, data=resampled_df, color='green', label="Apparent Temperature (C)")
plt.legend()
tem_plot.set_xlabel("Year", fontsize=15)
tem_plot.set_ylabel("Temperature (C)", fontsize=15)
plt.title("Temperature Vs. Apparent Temperature [2006-2016]")
Step 9:
Plotting monthly graphs of Mean Apparent Temperature(C) 2006-2016
new_df['month'] = new_df.index.month
new_df['year'] = new_df.index.year

# Mean apparent temperature per month, one list entry per year
avg_data_tempreature_monthly = {}
for year in range(2006, 2017):
    for month in range(1, 13):
        result = list(new_df.loc[(new_df['month'] == month) & (new_df['year'] == year), :]['Apparent Temperature (C)'].values)
        if month not in avg_data_tempreature_monthly:
            avg_data_tempreature_monthly[month] = [np.mean(result)]
        else:
            avg_data_tempreature_monthly[month].append(np.mean(result))

TM = pd.DataFrame(avg_data_tempreature_monthly)
TM['year'] = range(2006, 2017)

title = {1: 'Jan', 2: 'Feb', 3: 'March', 4: 'April', 5: 'May', 6: 'June', 7: 'July', 8: 'Aug', 9: 'Sep', 10: 'Oct', 11: 'Nov', 12: 'Dec'}
for month in range(1, 13):
    sbn.barplot(x=TM['year'], y=TM[month])
    plt.title('Bar plot for Month :' + title[month])
    plt.ylabel("Mean Apparent Temperature(C)")
    plt.xlabel("Year")
    plt.show()
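The nested loops above compute one mean per (year, month) pair. The same kind of table can also be produced with `groupby` and `unstack`. The following is a minimal sketch on synthetic data: the frame `demo` and its values are invented for illustration, and only the column name is borrowed from the post.

```python
import numpy as np
import pandas as pd

# Synthetic daily readings for 2006-2008 (made-up values for illustration).
days = pd.date_range("2006-01-01", "2008-12-31", freq="D")
rng = np.random.default_rng(1)
demo = pd.DataFrame(
    {"Apparent Temperature (C)": rng.normal(10.0, 5.0, len(days))}, index=days
)

# Group by calendar year and month, average, then pivot months into columns.
# Rows are years and columns are months 1..12, similar in layout to TM.
monthly_means = (
    demo.groupby([demo.index.year, demo.index.month])["Apparent Temperature (C)"]
    .mean()
    .unstack()
)
monthly_means.index.name = "year"
monthly_means.columns.name = "month"
print(monthly_means.round(2))
```

Selecting a column, for example `monthly_means[4]`, then gives the April means for every year in one step.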
Step 10:
Plotting monthly graphs of Average Humidity 2006-2016
# Mean humidity per month, one list entry per year
avg_data_humidity_monthly = {}
for year in range(2006, 2017):
    for month in range(1, 13):
        result = list(new_df.loc[(new_df['month'] == month) & (new_df['year'] == year), :]['Humidity'].values)
        if month not in avg_data_humidity_monthly:
            avg_data_humidity_monthly[month] = [np.mean(result)]
        else:
            avg_data_humidity_monthly[month].append(np.mean(result))

HM = pd.DataFrame(avg_data_humidity_monthly)
HM['year'] = range(2006, 2017)

for month in range(1, 13):
    sbn.barplot(x=HM['year'], y=HM[month])
    plt.title('Bar plot for Month :' + title[month])
    plt.ylabel("Average Humidity")
    plt.xlabel("Year")
    plt.show()
Step 11:
Plotting monthly graphs of Apparent Temperature(C) Vs. Humidity 2006-2016
for month in range(1, 13):
    plt.plot(range(2006, 2017), avg_data_tempreature_monthly[month], label='Apparent Temperature(C)', color='red')
    plt.plot(range(2006, 2017), avg_data_humidity_monthly[month], label='Humidity')
    plt.legend()
    plt.title('Apparent Temperature Vs. Humidity for Month : ' + title[month])
    plt.show()
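Apparent temperature is measured in degrees Celsius, while humidity in datasets like this is typically a fraction between 0 and 1 (an assumption here, since the post does not state the unit), so a shared y-axis can make the humidity line look flat. One possible workaround, not used in the original analysis, is to standardize each series before comparing them. The numbers below are invented for illustration.

```python
import numpy as np

def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    arr = np.asarray(values, dtype=float)
    return (arr - arr.mean()) / arr.std()

# Hypothetical yearly means for one month (not taken from the dataset).
apparent_temp = [10.2, 11.5, 9.8, 12.1, 10.9]
humidity = [0.71, 0.68, 0.74, 0.66, 0.70]

temp_z = standardize(apparent_temp)
hum_z = standardize(humidity)

# Both series now have mean 0 and standard deviation 1, so their
# year-to-year swings can be drawn on the same axis.
print(np.round(temp_z, 2))
print(np.round(hum_z, 2))
```

Passing `temp_z` and `hum_z` to `plt.plot` instead of the raw lists would put both lines on a comparable scale.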
Conclusion:
From the visualizations, we can see that the monthly average humidity stays nearly the same from 2006 to 2016, whereas the monthly average apparent temperature does not. So we can conclude that global warming is affecting the apparent temperature but not the humidity.
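A chart shows the pattern but does not put a number on it. As one possible follow-up, which is not part of the original analysis, a least-squares line can be fitted to the yearly means of a given month with `np.polyfit`; the slope then estimates the change per year. The values below are invented for illustration, and a slope alone says nothing about statistical significance.

```python
import numpy as np

years = np.arange(2006, 2017)  # 2006..2016 inclusive, 11 values

# Hypothetical yearly means for one month: a flat series and one rising by 0.1 per year.
flat_series = np.full(len(years), 0.70)
rising_series = 10.0 + 0.1 * (years - 2006)

def yearly_slope(x, y):
    """Slope of the least-squares line through (x, y), in units of y per year."""
    # Centering x keeps the fit well conditioned and does not change the slope.
    slope, _intercept = np.polyfit(x - x.mean(), y, deg=1)
    return slope

print(yearly_slope(years, flat_series))
print(yearly_slope(years, rising_series))
```

Applied to each month's list of yearly means, a slope near zero would match the "nearly the same" reading, while a clearly positive slope would point to an increase.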
GitHub link--
I am thankful to the mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a coding internship experience. Thank you, www.suvenconsultants.com