Unraveling Climate Patterns and Forecasting the Future

This project applies time series analysis and forecasting to historical climate data to uncover patterns that help explain past climate and predict future conditions.

In this project, I built an interactive Smart Dashboard that combines climate data from the National Oceanic and Atmospheric Administration (NOAA) with forecasts from an ARIMA (AutoRegressive Integrated Moving Average) model, used in its seasonal SARIMA form, to forecast temperature. The dashboard delivers annual forecasts on a global scale, letting users explore climate patterns and trends across regions. The main objective was a user-friendly platform that presents complex climate data in a clear, simple, and engaging format. Its adaptable design lets users zoom into specific areas or view broad climate trends, making it a useful tool for exploring climate-related questions. Using these models, I produced forecasts for key weather indicators such as average temperature, validated against a baseline model. Through this work, I hope to raise awareness of shifting weather patterns.

Data Preprocessing and Analysis

To understand climate patterns and predict future weather conditions, my methodology combines data analysis with predictive modeling.

  1. Data source: NOAA Climate Data Online (CDO).
  2. Parameters collected: average, minimum, and maximum temperature.
  3. Coverage: over 224 countries and 600 climate stations.

With this extensive dataset, I applied statistical and machine learning methods to uncover patterns and trends. I began with preprocessing, mainly filling missing (null) values using seasonal and monthly averages. The core of the approach is time series analysis, in particular ARIMA (AutoRegressive Integrated Moving Average) models, which capture temporal dependencies and fluctuations in weather variables over time.
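The gap-filling step can be sketched with pandas. This is one plausible reading of "seasonal and monthly averages", not the author's exact script: each missing value is replaced by the same station's mean for that calendar month, falling back to the station's mean for the season (DJF, MAM, JJA, SON) when a month has no data at all. The column names STATION, DATE and TAVG are assumptions modeled on NOAA CDO CSV exports, and `fill_gaps` is a hypothetical helper.

```python
import pandas as pd

# Month -> season label; these are plain month groupings, valid in either hemisphere
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def fill_gaps(df: pd.DataFrame, value_col: str = "TAVG") -> pd.DataFrame:
    """Fill missing values with the station's calendar-month mean, then its seasonal mean.

    Assumes columns STATION (station id), DATE (datetime64) and a numeric value column.
    """
    out = df.copy()
    month = out["DATE"].dt.month.rename("MONTH")
    season = month.map(SEASON_OF_MONTH).rename("SEASON")

    # Mean of all non-missing values for the same station and calendar month
    monthly_mean = out.groupby([out["STATION"], month])[value_col].transform("mean")
    # Fallback for station-months with no observations at all
    seasonal_mean = out.groupby([out["STATION"], season])[value_col].transform("mean")

    out[value_col] = out[value_col].fillna(monthly_mean).fillna(seasonal_mean)
    return out
```

The same call can be repeated for TMIN and TMAX by passing `value_col`. Grouping by station keeps one station's climate from leaking into another's gaps.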

Stationarity Assessment: Augmented Dickey-Fuller (ADF) Test

  1. Null Hypothesis (H0): The time series dataset contains a unit root, suggesting non-stationarity.
  2. Alternative Hypothesis (H1): The time series dataset lacks a unit root, suggesting stationarity.
  3. If the p-value is below the significance level (0.05), we reject the null hypothesis; otherwise, we fail to reject it. For most of the datasets, the test rejected the null hypothesis, indicating that those series are stationary, as demonstrated below.

SARIMA model

SARIMA (Seasonal AutoRegressive Integrated Moving Average) is a time series forecasting model that incorporates both trend and seasonal components.
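For reference, the standard SARIMA(p, d, q)(P, D, Q)_s model is written below with the backshift operator B (B y_t = y_{t-1}); this is textbook notation rather than something stated in the original write-up. Here φ_p and θ_q are the non-seasonal AR and MA polynomials, Φ_P and Θ_Q their seasonal counterparts in B^s, c is the constant, and ε_t is white noise. The second line substitutes the orders listed under Parameters in this section, (3, 0, 0) and (0, 1, 1, 12) with a constant, in the form statsmodels' SARIMAX uses (an assumption about the tooling).

```latex
\[
\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,y_t = c + \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
\]
\[
(1-\phi_1 B-\phi_2 B^2-\phi_3 B^3)\,(1-B^{12})\,y_t = c + (1+\Theta_1 B^{12})\,\varepsilon_t
\]
```

With d = 0 no ordinary differencing is applied, consistent with the ADF results, while the single seasonal difference (1 - B^{12}) removes the annual cycle.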

  1. It builds upon ARIMA by considering seasonal variations, making it suitable for data with recurring patterns.
  2. The non-seasonal orders p and q are chosen from the autocorrelation and partial autocorrelation functions (ACF and PACF), with d set by the stationarity test. For the seasonal part, the order D is set by seasonal differencing, and P and Q are read from the ACF and PACF at seasonal lags (multiples of the seasonal period s = 12).
  3. ACF: measures the correlation between the time series and its lagged values.
  4. PACF: measures the correlation between a point and a given lag after removing the effect of the intervening lags.
  5. Interpretation: spikes extending beyond the confidence bands indicate significant correlation at that lag.
  6. Parameters: order (p, d, q) = (3, 0, 0) and seasonal order (P, D, Q, s) = (0, 1, 1, 12), with trend 'c' for a constant trend component.
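To make the ACF/PACF reading concrete, here is a small dependency-free sketch: the sample ACF (biased estimator), the PACF computed from it with the Durbin-Levinson recursion, and the approximate 95% white-noise band ±1.96/√n used to flag significant spikes. The function names are hypothetical; in practice statsmodels' `plot_acf` and `plot_pacf` draw the same quantities with confidence bands.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divides by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
        for k in range(max_lag + 1)
    ]


def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion on the ACF."""
    r = acf(x, max_lag)
    result = [1.0]
    phi_prev = []  # AR coefficients phi_{k-1,1..k-1} from the previous order
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den  # the lag-k partial autocorrelation
        # Update the AR coefficients for order k
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


def significant_lags(values, n, z=1.96):
    """Lags k >= 1 whose coefficient lies outside the approximate band +/- z/sqrt(n)."""
    bound = z / math.sqrt(n)
    return [k for k, v in enumerate(values) if k > 0 and abs(v) > bound]
```

For monthly temperatures, a large ACF value at lag 12 signals the annual cycle that seasonal differencing (D = 1) targets, while PACF spikes at the first few lags suggest the AR order p.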

Model Validation

  1. Actual vs. predicted values
  2. Errors (residuals) vs. predicted values
  3. Residual normality check
  4. Residual autocorrelation

Observations

  1. Validation results: the SARIMA(3,0,0)(0,1,1,12) model with a constant trend ('c') achieved an RMSE of 2.8908 °C.
  2. This is a 69.62% reduction in RMSE compared with the baseline model.
  3. Baseline RMSE on unseen data: 8.68 °C.
  4. SARIMA RMSE on unseen data: 3.86 °C, an overall improvement of 55.59%.
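The RMSE comparison can be sketched in plain Python. The write-up does not say which baseline was used, so the seasonal naive forecast here (repeat the value from the same month one year earlier) is a hypothetical stand-in; `rmse`, `seasonal_naive` and `percent_reduction` are illustrative helpers.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def seasonal_naive(history, horizon, season=12):
    """Hypothetical baseline: forecast each step with the value one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]


def percent_reduction(baseline_rmse, model_rmse):
    """Relative RMSE reduction of the model versus the baseline, in percent."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


print(round(percent_reduction(8.68, 3.86), 2))  # 55.53
```

With the rounded RMSEs reported above (8.68 °C and 3.86 °C) the reduction comes out at about 55.53%; the stated 55.59% presumably reflects the unrounded values.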

Discover the full Python data analysis and preprocessing scripts, plus the data, on my GitHub! Dive into the details at @World-Climate-Data-Analysis.

GitHub

Link to the climate data project's Python scripts, SARIMA pipeline notebook, and reference files.

Tableau

Link to the Tableau dashboard, with several data visualization pages and filters.