This is a set of matlab functions to make it easier for oceanographers to handle large sets of NetCDF files. The functions are built on the low-level Matlab NetCDF library package.
- Load variables in a customed region across multiple files quickly.
- Enhanced feature for downloadeding NetCDF files via OpenDAP.
-
load variables
- Correct scales, offsets, missing values automatically
- Load time in "datenum" format (days since 0000-01-00 00:00:00)
- Load a subset of a variable by specifying the longitude, latitude, time, etc., directly.
- Load variable from multiple files automatically
- Save/load dimensional information to/from a cache file, which can improve the performance significantly when it reads from a large number of files.
-
File operations
- Extract a subset of a NetCDF file to a new one.
- Download netcdf files via OpenDAP easily and reliably.
- support large datasets, variable can be downloaded block by block
- Retry automatically after interruptions.
- Merge files by time
- Merge files by time (save mean values)
-
write NetCDF files
- Write simple NetCDF files quickly. (The support for writting NetCDF files is not the coral propose of this toolbox. Please use the Matlab NetCDF library package for writting complex files.
- This toolbox does not support groups. And there is no plan to add this feature in near future.
Path | Notes |
---|---|
./ | Functions for current version |
./private | Other functions called by this toolbox |
./Documents_and_demo | some documents file and the netcdf files used in demo |
./Archive | Some old codes, They shoud not be used. |
FUN_nc_varget_enhanced_region_2_multifile
: Read a subset of a variable from multiple filesFUN_nc_varget_enhanced
: Read a variable from one fileFUN_nc_varget_enhanced_region_2
: Read a subset of a variable from one fileFUN_nc_get_time_in_matlab_format
: Read time variable into "matlab unit" (days since 0000-01-00 00:00
)FUN_nc_OpenDAP_with_limit
: Download via OpenDAP. It support block-by-block download and automagical retry after internet errors.
You need to add this to the searching path of your Matlab. The subfolders (private
, Documents_and_demo
, Archive
should not be added to the searching path.) It can be done by two ways:
Click "Home tab> Set Path". It will open a dialog for setting the path. Then, click "Add Folder...", add the root path of this package (the folder contains a lot of functions, including FUN_nc_varget.m
), then click "Save" near the bottom of the dialog.
-
Method 1 (recommended):
addpath('/path/to/Easy_netcdf/'); savepath
-
Method 2: Matlab will run
startup.m
during boot automatically if the file exists. Thus, you can addaddpath('/path/to/Easy_netcdf/');
to thestartup.m
file and make sure that thestartup.m
is put in existing searching path. This provide more flexibility.
Several functions in this package were written for this purpose, all of which, except FUN_nc_varget
, can be replaced by FUN_nc_varget_enhanced_region_2_multifile
.
[New] You can load all variables from a single netcdf file by this
data = FUN_nc_load_all_variables( fn );
, where fn
is the name of the netcdf file.
or
data = FUN_nc_load_all_variables( fn, 'time_var_name', var_time );
which will convert the time variable (var_time
) to matlab unit (days since 0000-01-00 00:00) according to its units
property.
data = FUN_nc_varget( filename, varname );
-
off_set
will not be applied -
scale
will not be applied -
missing values
will not be replaced by NaN. -
Loaded data will keep its original type as in the netcdf file.
- filename: path to a specific netcdf file
- varname : name of the variable to be read
- data: values read from the netcdf file.
data = FUN_nc_varget( 'Demo_SST_2001.nc', 'sst');
data = FUN_nc_varget_enhanced( filename, varname );
This is the recommended command for loading one variable from one file.
off_set
will be appliedscale
will be appliedmissing values
will be replaced by nan- Data will be converted to
double
.
- filename: path to a specific netcdf file
- varname : name of the variable to be read
- data: values read from the netcdf file.
data = FUN_nc_varget_enhanced( 'Demo_SST_2001.nc', 'sst');
data = FUN_nc_varget_enhanced_region( filename, varname, start, count, stride);
- Read a part of the domain.
off_set
will be appliedscale
will be appliedmissing values
will be replaced by nan- Loaded data will be converted to
double
-
filename: path to a specific netcdf file
-
varname : name of the variable to be read
-
start, count, stride: same as this document for
netcdf.getVar
-
data: values read from the netcdf file.
% parameters
fn = 'Demo_SST_2001.nc';
lonlimit = [-110 -20];
latlimit = [15 70];
% read lon/lat
lon = FUN_nc_varget_enhanced( fn, 'lon' );
lat = FUN_nc_varget_enhanced( fn, 'lat' );
% calculate [start, count] from range for lon/lat.
[x_start, x_count, xloc] = FUN_nc_varget_sub_genStartCount( lon, lonlimit );
[y_start, y_count, yloc] = FUN_nc_varget_sub_genStartCount( lat, latlimit );
nc_start = [ x_start, y_start, 0 ] % the third value is for time.
nc_count = [ x_count, y_count, 1 ]
nc_stride= [1, 1, 1];
% load data
data = FUN_nc_varget_enhanced_region( fn, 'sst', nc_start, nc_count, nc_stride);
[ out_dim, data ] = FUN_nc_varget_enhanced_region_2( filename, varname, dim_name, dim_limit, [time_var_name], [dim_varname] );
This can also be done by "FUN_nc_varget_enhanced_region_2_multifile.m".
- Load a part of the domain.
off_set
will be applied.scale
will be applied.missing values
will be replaced by nan.- Loaded data will be converted to
double
.
-
filename [char]: name of the NetCDF file (e.g., 'temp.nc')
-
varname [char]: name of the variable (e.g., 'sst' or 'ssh')
-
dim_name [cell]: name of dimensions related to the variable specified above, like {'lon'}, {'lon','lat'}, {'lon', 'lat', 'depth, 'time'}. Dimensions with customed limits must be listed here here. Other dimensions are optional.
-
dim_limit [cell]: limits of dimensions in a cell. (e.g., {[-85 -55 ], [-inf inf]}). Please provide limits in the same order as they are listed in
dim_name
. -
time_var_name [char, optional]: name of the variable for time.
- If this is not empty, the limit for time in
dim_limit
can be given in a "matlab units" (days since 0000-01-00 00:00) or created bydatenum
. - If this is not empty, the time in
out_dim
will be given in "matlab units" (days since 0000-01-00 00:00).
- If this is not empty, the limit for time in
-
dim_varname [cell, optional]: name of the variable defining the axis at each dimension.
-
by default, each axis is defined by a variable sharing the same name. For example, the axis
lon
should be accompanied by a variable namedlon
. In such a case, thedim_varname
should be left empty. -
If the axis is defined by a variable with a different name, the name of the variable should be specified manually here. For example, the meridional dimension in 'Demo_SST_2001.nc' is named "y". However, the latitude is defined by a variable
lat
. In such a situation:fn = 'Demo_SST_2001.nc' dim_name={'lon','y'}; dim_limit = { [-110 -20], [15 70]}; time_var_name='time'; dim_varname={'lon','lat'} varname='sst' [ out_dim, data ] = FUN_nc_varget_enhanced_region_2( fn, 'sst', dim_name, dim_limit, time_var_name, dim_varname );
- "dim_varname{1} = nan" indicates that the axis is not defined by any variables in file. Thus, it will be defined as 1, 2, 3, ... Nx, where Nx is the length of the dimension.
-
- out_dim : dimension info (e.g., longitude, latitude, if applicable)
- data : data extracted from the given netcdf file.
- When
time_var_name
is not empty, the corresponding variable inout_dim
is converted to the same format asdatenum
. However, this unit conversion will never be applied to the output variabledata
. If you want to read the time variable itself, please useFUN_nc_get_time_in_matlab_format
.
- When
fn = 'Demo_SST_2001.nc';
dim_name = {'lon','y', 'time'};
dim_limit = { [-110 -20], [15 70], [datenum(2001,5,1) datenum(2001,5,31)] };
time_var_name = 'time';
dim_varname = {'lon','lat','time'};
varname = 'sst';
[ out_dim, data ] = FUN_nc_varget_enhanced_region_2( fn, 'sst', dim_name, dim_limit, time_var_name, dim_varname );
% Plot
%FUN_MAP_pcolor_lonlat_quick( lon, lat, data(:,:,1)');
pcolor( out_dim.lon, out_dim.lat, data' );
cbar = colorbar;
shading interp
title(datestr(out_dim.time))
dim_varname{2} is set to nan to read the second record in time.
fn = 'Demo_SST_2001.nc';
dim_name = { 'y' 'time'};
dim_limit = { [0 50], [2,2] };
time_var_name = [];
dim_varname = {'lat', nan};
varname = 'sst';
[ out_dim, data ] = FUN_nc_varget_enhanced_region_2( fn, 'sst', dim_name, dim_limit, time_var_name, dim_varname );
% Plot
pcolor( out_dim.lon, out_dim.lat, data' );
cbar = colorbar;
shading interp
axis equal
%title(datestr(out_dim.time))
[ out_dim, data_out ] = FUN_nc_varget_enhanced_region_2_multifile( filelist, varname, dim_name, dim_limit, merge_dim_name, time_var_name, dim_varname );
- Load a variable across several files.
- Load a part of the domain.
off_set
will be applied.scale
will be applied.missing values
will be replaced by nan.- Loaded data will be converted to
double
.
filelist [struct array]: name and folder of the NetCDF file
filelist must include 2 attributes, name and folder. For
each element of filelist (e.g. the ith one), the full path
will be generated by fullfile( filelist(ith).folder, filelist(ith).name)
It can also be a cell array contain paths of files,
or a char matrix, each raw of which contains one path.
varname [char]: name of the variable
dim_limit_str [cell]: name of dimensions, like {'lon','lat'}. Dimensions with customed limits must be listed here here. Other dimensions are optional.
dim_limit_limit [cell]: limits of dimensions, like {[-85 -55], [30 45]}.
merge_dim_name [string]: name of the dimension in which the variables
from different files will be concatenated. If merge_dim_name is
empty, the variable will be concatenated after its last
dimension.
+ Example 1: if you want to read gridded daily
temperature given in [lon, lat, depth, time] from a set of
files, and each file contains temperature in one day,
the merge_dim_name should be 'time'.
+ Example 2: if you want to read gridded daily temperature given in
[lon, lat, depth], in which time is not given
explicitly in each file, you can leave merge_dim_name
empty.
time_var_name [char, optional]: name of the time axis
+ variable defined by this will be loaded into time in "matlab units" (days since 0000-01-00)
+ This is helpful for setting timelimit in a easy way, avoiding
calculating the timelimit from units in netcdf files.
For example, to read data between 02/15/2000 00:00 and
02/16/2000 00:00 from a netcdf file, which includes a time variable "ob_time"
in units of "days since 2000-00-00 00:00", you need to set
timelimit as [46 47] when time_var_name is empty. However, you
should set timelimit as [datenum(2000,2,15),
datenum(2000,2,16)] if the tiem_var_name is set to "ob_time".
dim_varname [cell, optional]: name of the variable defining the axis at each dimension.
+ by default, each axis is defined by a variable sharing the same name as the dimension.
+ "dim_varname{1} = nan" indicates that the axis is not defined
not defined by any variable in file. It will be defined
as 1, 2, 3, ... Nx, where Nx is the length of the dimension.
out_dim : dimension info (e.g., longitude, latitude, if applicable)
data : data extracted from the given netcdf file.
filelist = dir('Demo_*.nc');
varname = 'sst';
dim_name = { 'time' };
dim_limit = { [datenum(2001,12,1) datenum(2003,5,31)] };
merge_dim_name = 'time';
time_var_name = 'time';
dim_varname = [];
[ out_dim, data ] = FUN_nc_varget_enhanced_region_2_multifile( filelist, varname, dim_name, dim_limit, merge_dim_name, time_var_name, dim_varname );
Note: filelist can also be a cell like this
filelist = { 'Demo_SST_2001.nc'
'Demo_SST_2002.nc'
'Demo_SST_2003.nc'
'Demo_SST_2004.nc'
'Demo_SST_2005.nc'
'Demo_SST_2006.nc'
'Demo_SST_2007.nc'
'Demo_SST_2008.nc'
'Demo_SST_2009.nc'
'Demo_SST_2010.nc'};
or char array like this
filelist = ['Demo_SST_2001.nc'
'Demo_SST_2002.nc'
'Demo_SST_2003.nc'
'Demo_SST_2004.nc'
'Demo_SST_2005.nc'
'Demo_SST_2006.nc'
'Demo_SST_2007.nc'
'Demo_SST_2008.nc'
'Demo_SST_2009.nc'
'Demo_SST_2010.nc'];
filelist = dir('Demo_*.nc');
varname = 'sst';
dim_name = { 'lon', 'y', 'time' }; % In the demo files, the meridional dimension is named as "y".
dim_limit = { [-110 -20], [15 70], [datenum(2001,12,1) datenum(2003,11,30)] };
merge_dim_name = 'time'; % merge data in "time" dimension.
time_var_name = 'time'; % convert values in "time" to matlab units (days since 0000-01-00 00:00).
dim_varname = {'lon','lat','time'}; % This is to force the function to read values for the meridional dimension from the variable "lat".
[ out_dim, data ] = FUN_nc_varget_enhanced_region_2_multifile( filelist, varname, dim_name, dim_limit, merge_dim_name, time_var_name, dim_varname );
**Notes: ** ==
FUN_nc_gen_presaved_netcdf_info
is replaced byFUN_nc_gen_presaved_netcdf_info_v2
introduced in v1.50-beta.== The new version save the dimensional information in a new format ("v2"), dropping lots of unnecessary information. The output mat file is 10 times less than the previous one with a better performance and related functions have been updated to support the new format.
It might be slow to read a subset of data from hundreds of files by provide a list of all files for FUN_nc_varget_enhanced_region_2_multifile
. The function needs to open every single file for some dimensional information. To speed up this command, an alternative way is to read and save the dimensional information at the very beginning. Then, providing the pre-saved information to FUN_nc_varget_enhanced_region_2_multifile
.
The dimensional information can be generated by FUN_nc_gen_presaved_netcdf_info
and an example is shown below:
%% generate info -----------------------------------------------------------
filelist = dir('Demo_*.nc');
merge_dim_name = 'time'; % merge data in "time" dimension.
dim_name = { 'lon', 'y', 'time' }; % In the demo files, the meridional dimension is named as "y".
dim_varname = {'lon','lat','time'}; % This is to force the function to read values for the meridional dimension from the variable "lat".
time_var_name = 'time'; % convert values in "time" to matlab units (days since 0000-01-00 00:00). This is optional
output_file_path = 'Presaved_info_demo.mat';
% Please note that **absolute** file paths are saved in the generated file. If you moved the data, you need to run this again
pregen_info = FUN_nc_gen_presaved_netcdf_info_v2( filelist, merge_dim_name, dim_name, dim_varname, time_var_name, output_file_path );
%% read data --------------------------------------------------------------
varname = 'sst';
dim_name = { 'time' };
dim_limit = { [datenum(2001,12,1) datenum(2003,5,31)] };
merge_dim_name = 'time';
presaved_info = load(output_file_path);
presaved_info = presaved_info.pregen_info;
[ out_dim, data_out ] = FUN_nc_varget_enhanced_region_2_multifile( presaved_info, varname, dim_name, dim_limit);
FUN_nc_OpenDAP_with_limit( filename0, filename1, dim_limit_var, dim_limit_val, var_download, var_divided, divided_dim_str, Max_Count_per_group, ... )
- support large dataset.
- Retry automatically after interruptions.
- Download data piece by piece.
- Download a subset of the original file.
filename0 : source of the netcdf file (OpenDAP URL here)
filename1 : Name of output netcdf file
dim_limit_var : which axises you want to set the limit
dim_limit_val : the limit of each axises
var_download : the variable you'd like to download. [var_download = [] will download all variables.]
var_divided : the varialbes need to be downloaded piece by piece in a specific dimension. In many cases, OpenDAP will end up with no response if you try to donwloading too large data at once. A solution for this is to download data piece by piece
divided_dim_str: which dim you'd like to download piece by piece (e.g., 'time', or 'depth'). divided_dim_str = [] means all varialbes will be downloaded completely at once.
Max_Count_per_group: Max number of points in the divided dimension.
Parameter | Default value | note |
---|---|---|
dim_varname | dim_limit_name | Names of variables defining dimensions given in dim_limit_name |
time_var_name | [] | Name of the variable describing time |
is_auto_chunksize | false | Calculate chunk size by a function in this package (beta) |
compression_level | 1 | |
is_skip_blocks_with_errors | false | |
N_max_retry | 10 | |
var_exclude | [] |
N/A
Notice: To recongnize the axis correctly, there must be one variable named as by the axis! Assign a variable to a specific axis is not supported yet.
% HYCOM dataset at an OpenDAP server
filename0 = 'http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_19.1/2012';
% output filename
filename1 = 'HYCOM_test2.nc';
% calculate time limits
timelimit = [datenum(2012,1,1) datenum(2012,1,3)];
time = FUN_nc_varget(filename0,'time');
time_unit = FUN_nc_attget(filename0,'time','units');
[time0, unit_str, unit_to_day] = FUN_nc_get_time0_from_str( time_unit );
timelimit = (timelimit - time0)/unit_to_day ;
% set limits
lonlimit = [-76 -70 ];
latlimit = [32 39];
depthlimit = [0 100];
dim_limit_var = {'lon','lat','depth','time'};
dim_limit_val = {lonlimit, latlimit depthlimit timelimit};
% variable to be downloaded
var_download = {'water_temp','lon','lat','depth','time'}; % empty indicates downloading all variables
% Variables that should be downloaded block by block
var_divided = {'water_temp'};
% which dim you'd like to download block by block (e.g., 'time', or 'depth')
divided_dim_str = 'depth'
% max size of each "piece"
Max_Count_per_group = 5;
FUN_nc_OpenDAP_with_limit( filename0, filename1, dim_limit_var, dim_limit_val, var_download, var_divided, divided_dim_str, Max_Count_per_group )
% HYCOM dataset at an OpenDAP server
opendapURL = 'http://tds.hycom.org/thredds/dodsC/GLBu0.08/expt_19.1/2012';
% output filename
filename_out = 'HYCOM_test1.nc';
% calculate time limits
timelimit = [datenum(2012,1,1) datenum(2012,1,3)];
time_varname = 'time'; %Tell the code which variable contains time
% set limits
lonlimit = [-76 -70 ];
latlimit = [32 39];
depthlimit = [0 100];
dim_limit_var = {'lon', 'lat', 'depth', 'time' };
dim_limit_val = {lonlimit, latlimit, depthlimit, timelimit};
% variable to be downloaded
var_download = {'water_temp','lon','lat','depth','time'}; % empty indicates downloading all variables
% Variables that should be downloaded block by block
var_divided = var_download;
% which dim you'd like to download block by block (e.g., 'time', or 'depth')
divided_dim_str = 'depth';
% max size of each "piece"
N_divided_rec_per_group = 2;
FUN_nc_OpenDAP_with_limit( opendapURL, filename_out, dim_limit_var, dim_limit_val, var_download, var_divided, divided_dim_str, N_divided_rec_per_group, 'time_var_name', time_varname);
FUN_nc_merge( input_dir, filelist, output_fn, merge_dim_name, compatibility_mode )
input_dir: The folder in which all input netcdf given by "filelist" is located
filelist : the list of files which will be merged. This should be generated by matlab built-in command: `dir`.
This function will merge the netcdf files following the order given in this variable. Please make sure this variable has been resorted properly.
output_fn : Name of output netcdf file
merge_dim_name : name of the dimension in which all varialbes will be merged.
compatibility_mode:
compatibility_mode = 1: write netCDF in 'CLOBBER'; Compression would be disabled.
compatibility_mode = 0: write netCDF in 'NETCDF4'.
N/A
Note
- To recognize the axis correctly, there must be one variable named as by the axis!
- Variables without the dimension
merge_dim_name
will be copied from the first file given in the variable filelist - The time in the merged file may not be correct if the time units vary between files.
% input_dir: path for the folder containing the files
input_dir = '.';
% filelist
filelist = dir(fullfile(input_dir,'Merge_Demo*.nc'));
% output filename
output_fn = 'Merged_Output.nc';
% name of the demension to be merged.
merge_dim_name = 'time';
% compatibility_mode:
% compatibility_mode = 1: write netCDF in 'CLOBBER'; Compression would be disabled.
% compatibility_mode = 0: write netCDF in 'NETCDF4'.
compatibility_mode = 0;
strvcat( filelist(:).name )
FUN_nc_merge( input_dir, filelist, output_fn, merge_dim_name, compatibility_mode )
FUN_nc_merge_save_mean( input_dir, filelist, output_fn, merge_dim_name, compatibility_mode, list_var_excluded )
% input_dir: path for the folder containing the files
input_dir = '.';
% filelist
filelist = dir(fullfile(input_dir,'Merge_Demo*.nc'));
% Output filename
output_fn = 'Merged_Output_mean.nc';
% Name of the demension to be merged.
merge_dim_name = 'time';
% compatibility_mode:
% compatibility_mode = 1: write netCDF in 'CLOBBER';
% compatibility_mode = 0: write netCDF in 'NETCDF4';
compatibility_mode = 0;
% Variable should not be included in the output file.
list_var_excluded = [];
% Execute
FUN_nc_merge_save_mean( input_dir, filelist, output_fn, merge_dim_name, compatibility_mode, list_var_excluded );
FUN_nc_easywrite_enhanced( filename, dim_name, dim_length, varname, dimNum_of_var, data, global_str_att )
filename [char]: name of the output netcdf file (e.g., 'test.nc')
dim_name [cell]: names of dimensions (e.g., {'lon','lat'}
dim_length [array]: length of each dimension (e.g., [ 360, 180 ] )
varname [cell]: names of variables (e.g., {'lon','lat','sst','ssh'}
dimNum_of_var [cell]: dimensional ID for each variable { 1, 2, [1,2],[1,2]} ). Value "1" indciate the first dimension in `dim_name`. value 2 indicate the second dimension in `dim_name`.
data [cell]: values for each variable (e.g., {lon,lat,sst,ssh})
global_str_att: global attribute
Parameter | Default value | note |
---|---|---|
dim_varname | dim_limit_name | Names of variables defining dimensions given in dim_limit_name |
time_var_name | [] | Name of the variable describing time |
is_auto_chunksize | false | |
compressiion_level | 1 | |
is_skip_blocks_with_errors | false | |
N_max_retry | 10 | |
var_exclude | [] |
N/A
% ---- generate random data ----
lon = [-75:-55] ;
lat = [26 : 55] ;
depth= [0:5:100] ;
temp = rand( length(lon), length(lat), length(depth) );
%% ---- write nctCDF ----
filename = 'Test_random_values.nc';
dim_name = {'lon','lat','depth'};
dim_length = [length(lon), length(lat), length(depth)];
varname = {'temp', 'lon','lat','depth'};
dimNum_of_var = {[1,2,3], 1, 2, 3 };
data = { temp, lon, lat, depth };
global_att = 'This is a test.';
FUN_nc_easywrite_enhanced( filename, dim_name, dim_length, varname, dimNum_of_var, data, global_att )
% FUN_nc_easywrite_enhanced('temp.nc',...
% {'Node','Cell','time'},[1000 2000 500],...
% {'node_lon','node_lat','lon_cell','lat_cell','sst'},{1,1,2,2,[1 3]},...
% {lon_node,lat_node,lon_cell,lat_cell,sst},'This is an example');
*ncwriteschema
would be a better choice to write a more complex NetCDF file from structures.
FUN_nc_easywrite_add_var
: add a variable to an existing netcdf fileFUN_nc_easywrite_add_att
: add an attribute to an existing variable in an existing netcdf file.FUN_nc_easywrite
: write one variable into a new netcdf file.FUN_nc_easywrite_write_var
: replace values of an existing variable in an existing netcdf file.
Output of some examples above can be found here