In the last post I described how to create a GIS shape file for the region of interest for which you need to extract the temperature and precipitations information extracted for. In this post we will use the shape file from part 2 and the NETCDF files downloaded in part1 to extract the max temperature and precipitation for the region represented by the shape file which I created for Dekalb Illinois.
Let’s Start
Step 1:- Import Dependencies
import xarray as xr import rasterio as rio import rioxarray import geopandas as gpd import rasterstats as rstats
Step 2:- Load the tmax netcdf file using xarray library
reproject = xr.load_dataarray('./tmax.2022.nc')
Step 3:- Now assign a coordinate reference system using rasterio xarray extension
reproject.rio.write_crs('EPSG:4326',inplace=True)
Step 4:- Reproject the data in xarray object to the projection in which you created the shape file in part 2
reprojected = reproject.rio.reproject('EPSG:4326')
Step 5: – Write the reprojected data to a new netcdf file
reprojected.to_netcdf('temp_tmax2022.nc')
Step 6:- Extract the dataset from the data array and the dates in the dataset
ds = reprojected.to_dataset() var = ds['tmax'] dates = ds['time'].values
Step 7: – Calculate the affine transform from the saved reprojected netcdf file
fn = 'temp_tmax2022.nc' affine = rio.open(fn).transform
Step 8 :- Read the shape file created in part 2 into memory
shp = gpd.read_file('Dekalb.shp') shp['geometry']
Step 9 :- Select the array for the latest date available in the dataset
arr = var.sel(time=dates.max()) arr_vals = arr.values
Step 10: – Use rasterstats zonal stats to run the shape file against the extracted data and find the mean temp max. That’s it you found the max temperature for Dekalb region for the latest date available in the netcdf data
math = rstats.zonal_stats(shp, arr_vals, affine=affine, all_touched=True, stats="mean", nodata=-9.96921e+36) math
Full Code
import xarray as xr import rasterio as rio import rioxarray import geopandas as gpd import rasterstats as rstats reproject = xr.load_dataarray('./tmax.2022.nc') reproject.rio.write_crs('EPSG:4326',inplace=True) reprojected = reproject.rio.reproject('EPSG:4326') reprojected.to_netcdf('temp_tmax2022.nc') ds = reprojected.to_dataset() var = ds['tmax'] dates = ds['time'].values fn = 'temp_tmax2022.nc' affine = rio.open(fn).transform #shp = gpd.read_file('./shps_simplified/27.shp') shp = gpd.read_file('Dekalb.shp') shp['geometry'] dates.max() arr = var.sel(time=dates.max()) arr_vals = arr.values mean_tmax = rstats.zonal_stats(shp, arr_vals, affine=affine, all_touched=True, stats="mean", nodata=-9.96921e+36) mean_tmax
Conclusion
Similarly you can extract data from precipitation, tmin files for any shape file you have. You can even download pre created shape files for US and other regions. This completes my 3 part series for extracting tmin, tmax and precip data directly from netcdf files from NOAA. You can read previous two parts here Part1 and Part2
Now in next articles we will explore how you can do this for the whole of US and also speed it up using parallel processing.