The definition of chunking is well described here:

  • how to know the chunk size of a file (ncdump -hs):

ncdump -hs toce_nico.nc

netcdf toce_nico {
dimensions:
      deptht = 75 ;
      axis_nbounds = 2 ;
      y = 567 ;
      x = 687 ;
      time_counter = UNLIMITED ; // (32 currently)
variables:
      float toce(time_counter, deptht, y, x) ;
              toce:standard_name = "sea_water_potential_temperature" ;
              toce:_Storage = "chunked" ;
              toce:_ChunkSizes = 1, 19, 142, 172 ;
              toce:_Endianness = "little" ;
  • In the file above, this means that to access toce, netCDF will read tiles of (1, 19, 142, 172). So if you read with a horizontal 2d access pattern (a map), you will read more data than needed, because netCDF will first read 19 levels and then filter out the map you want. For a ‘map’ access, the best is to have a chunk size of 1 on the vertical. I don’t know for AMUXL, but for eORCA12 it is CRITICAL to have the right chunk size when accessing 3d variables. If you plan to extract only vertical profiles from your output file, the chunks should probably be something like (1, nz, cx, cy), with cx and cy to be defined by testing (see the example after this list).

  • how to specify a specific chunking: to do so, you can either copy your file with nccopy, specifying a new chunk size with the -c option, or extract and copy with ncks and the --cnk_dmn option:

nccopy -c x/200,y/200,deptht/1,time_counter/1 <input.nc> <output.nc>


ncks --cnk_dmn x,200 --cnk_dmn y,200 --cnk_dmn time_counter,1 --cnk_dmn deptht,1 -d deptht,0,0 -v toce <input.nc> <output.nc>
  • If you find that one of your processing steps is very slow, have a look at the chunk size and check whether it is adapted to your reading pattern.
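As an illustration of the two access patterns discussed above, here are two possible rechunking commands for this file (the output names and the horizontal tile sizes for the profile case are only illustrative values, to be tuned by testing):

# 'map' access: one vertical level per chunk, large horizontal tiles
nccopy -c time_counter/1,deptht/1,y/200,x/200 toce_nico.nc toce_nico_map.nc

# 'profile' access: all 75 levels in one chunk, small horizontal tiles
nccopy -c time_counter/1,deptht/75,y/32,x/32 toce_nico.nc toce_nico_profile.nc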


Since version 4, netCDF supports deflation (i.e. compression). Depending on your data, it can save a lot of storage (the more zeros you have, the more efficient it is).

Below are some tests made with the AMUXL12 configuration, using a file containing a month of daily 3d temperature.

  • as expected, the higher the compression level, the longer the file takes to create (-d option of nccopy, -L option of ncks).

time nccopy -d 0 -c x/200,y/200,deptht/1,time_counter/1 toce_nico.nc toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d0

real  0m23.817s
user  0m2.731s
sys   0m7.741s
time nccopy -d 1 -c x/200,y/200,deptht/1,time_counter/1 toce_nico.nc toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d1

real  1m37.195s
user  1m32.293s
sys   0m4.242s
time nccopy -d 2 -c x/200,y/200,deptht/1,time_counter/1 toce_nico.nc toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d2

real  1m38.894s
user  1m34.070s
sys   0m4.317s
time nccopy -d 9 -c x/200,y/200,deptht/1,time_counter/1 toce_nico.nc toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d9

real  2m27.320s
user  2m19.162s
sys   0m4.354s
  • as expected, the higher the compression level, the smaller the final file. However, for AMUXL the difference between d1 and d9 is tiny (2.096 GB for d1 vs 2.068 GB for d9), while d9 takes longer to create and to read. d1 is thus a good compromise: you save 1.7 GB (i.e. 44 %) of storage while minimizing the drawback of slower writing/reading (this probably depends on the computer, as you trade I/O time for CPU time spent on decompression).

3804842884 Jan 20 10:23 toce_nico.nc
2096447815 Jan 20 10:25 toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d1
2094722329 Jan 20 10:38 toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d2
2068371276 Jan 20 10:33 toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d9
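To check afterwards which deflation level a file was actually written with, ncdump -hs also reports it as a hidden attribute (here on the d1 file; the output should contain a line like toce:_DeflateLevel = 1):

ncdump -hs toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d1 | grep _DeflateLevel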


NCO is a toolbox to manipulate netCDF files. It is very useful to extract data, concatenate files, perform operations, average files, …


ncks is a tool to extract data out of a netCDF file. It allows you to extract/exclude variables or to cut a file along a dimension.

For example, the following command extracts only the first deptht level (-d option) of the toce variable (-v option); the output is written with a deflation level of 1 (-L option) and a chunk size of (x/200, y/200, deptht/1, time_counter/1).

ncks -L 1 --cnk_dmn x,200 --cnk_dmn y,200 --cnk_dmn time_counter,1 --cnk_dmn deptht,1 -d deptht,0,0 -v toce <input.nc> <output.nc>
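A few other NCO one-liners matching the uses listed above (all file names are placeholders):

# concatenate files along the record (time) dimension
ncrcat file_01.nc file_02.nc out.nc

# average all records of a file (time mean)
ncra in.nc mean.nc

# difference between two files, variable by variable
ncdiff a.nc b.nc diff.nc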


Creation of the html from the rst files

1: installation of sphinx:

conda is your best friend:

$ conda install sphinx
2: run a quick setup:

The fastest way to do it is to use sphinx-quickstart

$ sphinx-quickstart

your output should be like this:
Welcome to the Sphinx 3.2.1 quickstart utility.

Please enter values for the following settings (just press Enter to
accept a default value, if one is given in brackets).

Selected root path: .

You have two options for placing the build directory for Sphinx output.
Either, you use a directory "_build" within the root path, or you separate
"source" and "build" directories within the root path.
> Separate source and build directories (y/n) [n]: y

The project name will occur in several places in the built documentation.
> Project name: test
> Author name(s): toto
> Project release []: 0.0

If the documents are to be written in a language other than English,
you can select a language here by its language code. Sphinx will then
translate text that it generates into that language.

For a list of supported codes, see
> Project language [en]:

Creating file /Users/mathiotp/Test/source/
Creating file /Users/mathiotp/Test/source/index.rst.
Creating file /Users/mathiotp/Test/Makefile.
Creating file /Users/mathiotp/Test/make.bat.

Finished: An initial directory structure has been created.

You should now populate your master file /Users/mathiotp/Test/source/index.rst and create other documentation
source files. Use the Makefile to build the docs, like so:
   make builder
where "builder" is one of the supported builders, e.g. html, latex or linkcheck.
3: build the web site

the command make html will do the job for you.

$ make html

Running Sphinx v3.2.1
making output directory... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: [new config] 1 added, 0 changed, 0 removed
reading sources... [100%] index
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] index
generating indices... genindex done
writing additional pages... search done
copying static files... ... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded.
4: visualise your html

an index.html file is available in build/html and you can visualise it with your favorite browser
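For example, from the project root (open is the macOS command; on Linux, xdg-open does the same):

open build/html/index.html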

5: customize it

the graphical options are controlled by the conf.py file in source. For something similar to this site, refer to the conf.py in the source directory of the gh-pages branch of this project.

6: publish it on github pages.

This is probably the trickiest part.

  • step 1: you need to have a GitHub account, create your project and push all the files (Makefile, build, source, …). If you do not have a premium account, your repo needs to be public.

  • step 2: create a branch called gh-pages (a command sketch for steps 2 and 4 follows this list)

  • step 3: check in your repository settings that GitHub Pages is activated on the gh-pages branch

  • step 4: add an empty .nojekyll file in your directory to tell GitHub not to run Jekyll on it (i.e. not to interpret the files as Markdown).

  • step 5: if not already done, you need to add these lines to your conf.py (it works for me):

import sphinx_rtd_theme

html_theme = 'sphinx_rtd_theme'

and add sphinx_rtd_theme to the extensions list:

extensions = [...,'sphinx_rtd_theme']

These modules can be installed with these commands:

conda install -c anaconda sphinx_rtd_theme
conda install -c conda-forge sphinxcontrib-bibtex
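Putting steps 2 and 4 above together, a minimal command sketch could look like this (the commit message is just an example):

# step 2: create the gh-pages branch and switch to it
git checkout -b gh-pages

# step 4: tell GitHub Pages not to run Jekyll on the generated html
touch .nojekyll
git add .nojekyll
git commit -m "add .nojekyll for GitHub Pages"

# publish the branch
git push -u origin gh-pages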

Syntax of rst file

CMIP6 data


You can find all the available names, variables, … here: A listing of all the facets available for the query is described here:

To create the wget script:

  1. look for your data

To do this, you have to enter an http request like this one: In this case, I am looking for the variable tob in the ssp585 experiment_id on all the servers. More details on the method here:

  2. get your wget script: in the http request, replace ‘search?’ with ‘wget?’

  3. run the script with the -s option
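For example, assuming the downloaded script is called wget_cmip6.sh (the name is only a placeholder):

bash wget_cmip6.sh -s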

For more advanced methods, look at the documentation.


Git

Create a remote branch:

  • create a local branch (it will be created from your current local branch):

git checkout -b <branch-name>
  • push the branch to the remote repository:

git push -u origin <branch-name>

List available branches:

git branch
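To also list the remote-tracking branches, add the -a option:

git branch -a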

Check out a remote branch from a team member:

git fetch
git checkout <branch-name>

Change from https to ssh

  • First check that you are currently using https:

$ git remote -v
origin  https://github.com/<user>/<repo>.git (fetch)
origin  https://github.com/<user>/<repo>.git (push)
  • switch to ssh:

$ git remote set-url origin git@github.com:<user>/<repo>.git
  • check if it works:

$ git remote -v
origin  git@github.com:<user>/<repo>.git (fetch)
origin  git@github.com:<user>/<repo>.git (push)

Create a new git repository from an existing directory

  • first create your git repository locally:

$ git init
$ git add file1
$ git commit
  • Then connect it to GitHub:
    • Go to github.com.

    • Log in to your account.

    • Click the new repository button in the top-right. You’ll have an option there to initialize the repository with a README file, but I don’t use it.

    • Click the “Create repository” button.

  • Finally push to the remote repository

$ git remote add origin <repository-url>
$ git push -u origin master


Conda

  • Create an environment:

conda create -n <env name> python=X.Y.Z
  • add a new package in an environment:

conda install <package name>


  • install a package from a specific channel (e.g. conda-forge):

conda install -c conda-forge <package name>
  • export an environment:

conda env export > <env file name>.yml
  • create an environment from a yml file:

conda env create -f <env file name>.yml
  • add a conda env as a kernel in your Jupyter notebook (ipykernel needs to be installed in that environment):

python -m ipykernel install --user --name=firstEnv
  • Activate or deactivate an environment:

conda activate <env name>
conda deactivate
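Two related commands that often come in handy: listing the existing environments and removing one you no longer need.

conda env list
conda env remove -n <env name>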