****** NetCDF ****** Chunking ======== The definition of chunking is well described here: https://www.unidata.ucar.edu/software/netcdf/workshops/most-recent/nc4chunking/WhatIsChunking.html. - how to know the chunk size of a file (ncdump -hs): .. code-block:: console ncdump -hs toce_nico.nc netcdf toce_nico { dimensions: deptht = 75 ; axis_nbounds = 2 ; y = 567 ; x = 687 ; time_counter = UNLIMITED ; // (32 currently) variables: ... float toce(time_counter, deptht, y, x) ; toce:standard_name = "sea_water_potential_temperature" ; ... toce:_Storage = "chunked" ; toce:_ChunkSizes = 1, 19, 142, 172 ; toce:_Endianness = "little" ; - In the previous file, it means to access toce, netcdf will read tile of (1, 19, 142, 172). So, if you read with an horizontal 2d access (a map), you will read more data than needed because netcdf will first read 19 levels then filter out the map you want. For a 'map' access, the best is to have a chunksize of 1 on the vertical. I don't know for AMUXL, but for eORCA12 access of 3d variable this is CRITICAL to have the right chunking size. If you plan to extract only vertical profile from your output file, the chunck will probably be something like (1, nz, cx, cy) cx and cy to be defined by testing. - how to specify a specific chunking: to do so, you can either: - copy you file with nccopy and specifying a new chunksize with the -c option. - extract and copy with ncks and --cnk_dmn option .. code-block:: console nccopy -c x/200,y/200,deptht/1,time_counter/1 toce_nico.nc out.nc or .. code-block:: console ncks --cnk_dmn x,200 --cnk_dmn y,200 --cnk_dmn time_counter,1 --cnk_dmn deptht,1 -d deptht,0,0 -v toce toce_nico.nc out.nc - If you found on of your processing is very very slow, have a look to your chunksize and see if it is adapted to your reading pattern. Deflation ========= Netcdf since the version 4 support deflation (ie compression), depending on your netcdf it can save a lot of storage (the more 0 you have, the more efficient it is). Below some test made with AMUXL12 configuration with a file containing a month of daily 3d temperature. - as expected, the higher is the compression level, the longer it takes to create it (-d option of nccopy or -L option with ncks). .. code-block:: console time nccopy -d 0 -c x/200,y/200,deptht/1,time_counter/1 toce_nico.nc toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d0 real 0m23.817s user 0m2.731s sys 0m7.741s .. code-block:: console time nccopy -d 1 -c x/200,y/200,deptht/1,time_counter/1 toce_nico.nc toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d1 real 1m37.195s user 1m32.293s sys 0m4.242s .. code-block:: console time nccopy -d 2 -c x/200,y/200,deptht/1,time_counter/1 toce_nico.nc toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d2 real 1m38.894s user 1m34.070s sys 0m4.317s .. code-block:: console time nccopy -d 9 -c x/200,y/200,deptht/1,time_counter/1 toce_nico.nc toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d9 real 2m27.320s user 2m19.162s sys 0m4.354s - as expected, the higher the compression level is, the smaller the final file is. However for AMUXL, the differences are tiny between d1 and d9 and you will spend more time to create it and read it (2.096 Gb for d1 and 2.068 Gb for d9). d1 is thus a good compromise. You spare 1.7 Gb (ie 44 %) of storage while minimizing the draw back of a slower writing/reading (probably this depends of the computer as you trade reading by cpu for decompression) .. code-block:: console 3804842884 Jan 20 10:23 toce_nico.nc 2096447815 Jan 20 10:25 toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d1 2094722329 Jan 20 10:38 toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d2 2068371276 Jan 20 10:33 toce_nico.nc_4.4.4-intel-19.0.4-intelmpi-2019.4.243_d9 nco === NCO is netcdf tools box to manipulate netcdf. It is very useful to extract date, concatenate file, make some operation, average files ... ncks ---- ncks is a tools to extract data out of a netcdf file. I allow to extract/exclude variable or cut file along dimension. For exemple, the following command will extract only the first deptht level (-d option) from toce variable (-v option) in out.nc. out.nc will be written with a deflation level of 1 (-L option) and a chunksize of (x/200, y/200, deptht/1, time_counter/1). .. code-block:: console ncks -L 1 --cnk_dmn x,200 --cnk_dmn y,200 --cnk_dmn time_counter,1 --cnk_dmn deptht,1 -d deptht,0,0 -v toce toce_nico.nc out.nc **************** reStructuredText **************** Creation of the html from rst file ================================== 1: installation of sphinx: conda is your best friends: .. code-block:: console $ conda install sphinx 2: run a quick installation: The fastest way to do it is to use sphinx-quickstart .. code-block:: console $ sphinx-quickstart your output should be like this: .. code-block:: console Welcome to the Sphinx 3.2.1 quickstart utility. Please enter values for the following settings (just press Enter to accept a default value, if one is given in brackets). Selected root path: . You have two options for placing the build directory for Sphinx output. Either, you use a directory "_build" within the root path, or you separate "source" and "build" directories within the root path. > Separate source and build directories (y/n) [n]: y The project name will occur in several places in the built documentation. > Project name: test > Author name(s): toto > Project release []: 0.0 If the documents are to be written in a language other than English, you can select a language here by its language code. Sphinx will then translate text that it generates into that language. For a list of supported codes, see https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-language. > Project language [en]: Creating file /Users/mathiotp/Test/source/conf.py. Creating file /Users/mathiotp/Test/source/index.rst. Creating file /Users/mathiotp/Test/Makefile. Creating file /Users/mathiotp/Test/make.bat. Finished: An initial directory structure has been created. You should now populate your master file /Users/mathiotp/Test/source/index.rst and create other documentation source files. Use the Makefile to build the docs, like so: make builder where "builder" is one of the supported builders, e.g. html, latex or linkcheck. 3: build the web site the command make html will do the job for you. .. code-block:: console $ make html Running Sphinx v3.2.1 making output directory... done building [mo]: targets for 0 po files that are out of date building [html]: targets for 1 source files that are out of date updating environment: [new config] 1 added, 0 changed, 0 removed reading sources... [100%] index looking for now-outdated files... none found pickling environment... done checking consistency... done preparing documents... done writing output... [100%] index generating indices... genindexdone writing additional pages... searchdone copying static files... ... done copying extra files... done dumping search index in English (code: en)... done dumping object inventory... done build succeeded. 4: visualise your html a index.html is available in build/html and you can visualise it with your favorite browser 5: customize it the graphical option are controle by the conf.py file in source. For something similar to this, refer to the conf.py in the source directory of the ghpages branch project. 6: publish it on github pages. This is probably the most tricky part. - setp 1: you need to have an account on git, create your project and push all the files (make, build, source ...). If you are not premium, you need to have your repo public. - step 2: create a branch call it gh-pages - step 3: check in your repository setting that gh-pages is activated - step 4: add an empty file .nojekyll in your directory to tell github to not interpret it as md. - step 5: if not done, you need to add in your conf.py these lines (it works for me): .. code-block:: console import sphinx_rtd_theme html_theme = 'sphinx_rtd_theme' and add sphinx_rtd_theme to the extension: .. code-block:: console extensions = [...,'sphinx_rtd_theme'] These modules can be installed with these commands: .. code-block:: console conda install -c anaconda sphinx_rtd_theme conda install -c conda-forge sphinxcontrib-bibtex - step 6: commit and push the changes to gh-pages branch (you can check in your github interface if the web site is well published). The site will be in 'https://username.github.io/REPO/docs/build/html/index.html'. For a quick access, you can add the link to your README. Syntax of rst file ================== - a quick and detailed guide is available here: https://docutils.sourceforge.io/docs/user/rst/quickref.html - All this site is written in rst, so for any option you see in these pages, you can refer to the source code. ********** CMIP6 data ********** Wget ==== You can find all the available name, variable ... here: https://esgf-index1.ceda.ac.uk/search/cmip6-ceda/ A listing of all the facet available for the query is described here: https://esgf-node.llnl.gov/esg-search/search?project=CMIP6&facets=*&limit=0 To create the wget script: 1. look for your data To do this you have to enter this http: https://esgf-node.llnl.gov/esg-search/search?experiment_id=ssp585&variable=tob&distrib=true. In this case, I am looking for the variable tob in the ssp585 experiment id on all the server. More details on the method here: https://esgf.github.io/esgf-user-support/user_guide.html#facet-queries. 2. get your wget script Replace in the http, 'search?' by 'wget?' 3. run the script with the -s option For more advanced method, look at the documentation. *** Git *** **Create a remote branch:** - create a local branch (it will create a local from your local branch): .. code-block:: console git checkout -b - push the branch to the remote directory: .. code-block:: console git push -u origin **List available branches:** .. code-block:: console git branch **Checkout remote branch from a team member:** .. code-block:: console git fetch git checkout **Change from https to ssh - First check you are in https: .. code-block:: console $ git remote -v origin https://github.com/username/TOTO.git (fetch) origin https://github.com/username/TOTO.git (push) - switch to ssh: .. code-block:: console $ git remote set-url origin git@github.com:username/TOTO.git - check if it works: .. code-block:: console $ git remote -v origin git@github.com:username/TOTO.git (fetch) origin git@github.com:username/TOTO.git (push) **Create a new git directory from an existing directtory - first create your git directory locally : .. code-block:: console $ cd MY_PROJECT $ git init $ git add file1 $ git commit - Then connect it to github - Go to github. - Log in to your account. - Click the new repository button in the top-right. You’ll have an option there to initialize the repository with a README file, but I don’t. - Click the “Create repository” button. - Finally push to the remote dir .. code-block:: console $ git remote add origin git@github.com:username/new_repo $ git push -u origin master ***** CONDA ***** - Create an environment: .. code-block:: console conda create -n python=X.Y.Z - add a new package in an environement: .. code-block:: console conda install or .. code-block:: console conda install -c conda-forge - export an environment: .. code-block:: console conda env export > .yml - create an environment from a yml file: .. code-block:: console conda env create -f .yml - add a new conda env to your jupyter notebook .. code-block:: console python -m ipykernel install --user --name=firstEnv - Activate or deactivate an environment: .. code-block:: console conda activate conda deactivate