How to get historical weather data (min temp, max temp and precipitation) directly from NOAA (National Oceanic and Atmospheric Agency) using Python – Part 3 (extracting temperature/precip data from netcdf files)

In the last post I described how to create a GIS shape file for the region of interest for which you need to extract the temperature and precipitations information extracted for. In this post we will use the shape file from part 2 and the NETCDF files downloaded in part1 to extract the max temperature and precipitation for the region represented by the shape file which I created for Dekalb Illinois.

Let’s Start

Step 1:- Import Dependencies

import xarray as xr
import rasterio as rio
import rioxarray
import geopandas as gpd
import rasterstats as rstats

Step 2:- Load the tmax netcdf file using xarray library

reproject = xr.load_dataarray('./tmax.2022.nc')

Step 3:- Now assign a coordinate reference system using rasterio xarray extension

reproject.rio.write_crs('EPSG:4326',inplace=True)

Step 4:- Reproject the data in xarray object to the projection in which you created the shape file in part 2

reprojected = reproject.rio.reproject('EPSG:4326')

Step 5: – Write the reprojected data to a new netcdf file

reprojected.to_netcdf('temp_tmax2022.nc')

Step 6:- Extract the dataset from the data array and the dates in the dataset

ds = reprojected.to_dataset()
var = ds['tmax']
dates = ds['time'].values

Step 7: – Calculate the affine transform from the saved reprojected netcdf file

fn = 'temp_tmax2022.nc'
affine = rio.open(fn).transform

Step 8 :- Read the shape file created in part 2 into memory

shp = gpd.read_file('Dekalb.shp')
shp['geometry']

Step 9 :- Select the array for the latest date available in the dataset

arr = var.sel(time=dates.max()) 
arr_vals = arr.values

Step 10: – Use rasterstats zonal stats to run the shape file against the extracted data and find the mean temp max. That’s it you found the max temperature for Dekalb region for the latest date available in the netcdf data

math = rstats.zonal_stats(shp, arr_vals, affine=affine, all_touched=True, stats="mean", nodata=-9.96921e+36)
math

Full Code

import xarray as xr
import rasterio as rio
import rioxarray
import geopandas as gpd
import rasterstats as rstats

reproject = xr.load_dataarray('./tmax.2022.nc')

reproject.rio.write_crs('EPSG:4326',inplace=True)

reprojected = reproject.rio.reproject('EPSG:4326')

reprojected.to_netcdf('temp_tmax2022.nc')

ds = reprojected.to_dataset()

var = ds['tmax']

dates = ds['time'].values

fn = 'temp_tmax2022.nc'
affine = rio.open(fn).transform

#shp = gpd.read_file('./shps_simplified/27.shp')

shp = gpd.read_file('Dekalb.shp')

shp['geometry']

dates.max()

arr = var.sel(time=dates.max()) 

arr_vals = arr.values

mean_tmax = rstats.zonal_stats(shp, arr_vals, affine=affine, all_touched=True, stats="mean", nodata=-9.96921e+36)

mean_tmax

Conclusion

Similarly you can extract data from precipitation, tmin files for any shape file you have. You can even download pre created shape files for US and other regions. This completes my 3 part series for extracting tmin, tmax and precip data directly from netcdf files from NOAA. You can read previous two parts here Part1 and Part2

Now in next articles we will explore how you can do this for the whole of US and also speed it up using parallel processing.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.