3. Reading and writing datasets

Because nd is built around xarray, most I/O is handled by xarray. On top of that, nd adds a thin layer of abstraction through the functions nd.open_dataset() and nd.to_netcdf().

3.1. Reading a dataset

Ideally, your data already exists in the netCDF file format. However, because most geospatial data is distributed as GeoTIFF or in other file formats, nd and xarray rely on rasterio, a Python-friendly wrapper around GDAL, to read such raster data.

The nd.io module contains three additional functions to read different file formats, as well as the convenience function nd.open_dataset(), which dispatches to one of those three functions based on the file extension. All of these return xarray.Dataset or xarray.DataArray objects.

Most of the algorithms work on both Dataset and DataArray objects.
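The extension-based dispatch described above can be illustrated with a minimal sketch. This is not nd's implementation: the reader functions, the READERS table, and open_by_extension are hypothetical placeholders chosen for illustration, and the set of extensions is an assumption.

```python
from pathlib import Path


# Hypothetical placeholder readers (NOT part of nd). A real reader would
# return an xarray object; here each one only reports which branch was taken.
def read_netcdf(path):
    return ("netcdf", path)


def read_geotiff(path):
    return ("geotiff", path)


# Assumed mapping from file extension to reader, for illustration only.
READERS = {
    ".nc": read_netcdf,
    ".tif": read_geotiff,
    ".tiff": read_geotiff,
}


def open_by_extension(path):
    """Select a reader from the file extension (case-insensitive)."""
    ext = Path(path).suffix.lower()
    try:
        reader = READERS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {ext!r}") from None
    return reader(path)
```

Keeping the mapping in a dictionary means a new format can be supported by adding one entry, and unknown extensions fail early with a clear error instead of being passed to the wrong reader.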

3.2. Writing a dataset

Write your processed data to disk using nd.to_netcdf().

Note

Currently, nd assumes that you only ever want to convert data from other formats into netCDF, not the other way around. If you need to export your result as a GeoTIFF, you are on your own (for now). Sorry about that!

Here is a list of things that nd.open_dataset and nd.to_netcdf do in addition to xarray.open_dataset and xarray.Dataset.to_netcdf:

  • Handle complex-valued data. The netCDF format doesn’t support complex-valued data, so before writing to disk, each complex variable is split into its real and imaginary parts. After reading from disk, these parts are reassembled into complex-valued variables. If you use the functions provided by nd, you don’t have to worry about complex-valued data at all.

  • Provide x and y coordinate arrays even if the netCDF file names its coordinates lat and lon. This keeps datasets consistent with the general case of arbitrary projections.

>>> import nd
>>> import xarray as xr
>>> path = 'data/C2.nc'
>>> ds_nd = nd.open_dataset(path)
>>> ds_xr = xr.open_dataset(path)
>>> {v: ds_nd[v].dtype for v in ds_nd.data_vars}
{'C11': dtype('<f4'),
 'C22': dtype('<f4'),
 'C12': dtype('complex64')}
>>> {v: ds_xr[v].dtype for v in ds_xr.data_vars}
{'C11': dtype('float32'),
 'C12__im': dtype('float32'),
 'C12__re': dtype('float32'),
 'C22': dtype('float32')}
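
The split visible in the xarray output above (C12 stored as C12__re and C12__im) can be mimicked with plain NumPy arrays. The following is a rough sketch of the idea, not nd's actual code: the helpers split_complex and join_complex are hypothetical, they operate on a dictionary of arrays rather than an xarray.Dataset, and the only thing taken from the output above is the __re / __im suffix convention.

```python
import numpy as np


def split_complex(variables):
    """Replace each complex array with two real arrays: name__re and name__im."""
    out = {}
    for name, values in variables.items():
        if np.iscomplexobj(values):
            out[name + "__re"] = values.real
            out[name + "__im"] = values.imag
        else:
            out[name] = values
    return out


def join_complex(variables):
    """Recombine name__re / name__im pairs into a single complex array.

    Assumes every name__re entry has a matching name__im entry.
    """
    out = {}
    for name, values in variables.items():
        if name.endswith("__re"):
            base = name[: -len("__re")]
            out[base] = values + 1j * variables[base + "__im"]
        elif name.endswith("__im"):
            continue  # already consumed together with its __re partner
        else:
            out[name] = values
    return out
```

Calling join_complex(split_complex(data)) on a dictionary that holds a complex C12 array alongside real C11 and C22 arrays should return the same variable names and values, which is the round trip that nd.to_netcdf and nd.open_dataset perform around the file on disk.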