Merge pull request #26 from jkshenton/master

Add soprano CLI
CCP-NC · Oct 3, 2024 · 31ee78e
2 parents 2bd5f54 + c70b940
commit 31ee78e
Showing 138 changed files with 25,646 additions and 2,095 deletions.
diff --git a/.github/workflows/docs-build-deploy.yml b/.github/workflows/docs-build-deploy.yml
@@ -34,9 +34,8 @@ jobs:
       with:
         python-version: 3.11
 
-    - name: Install dependencies
-      run: |
-        pip install -r requirements.txt
+    - name: Install Hatch
+      run: pip install hatch
 
     # (optional) Cache your executed notebooks between runs
     # if you have config:
@@ -51,7 +50,7 @@ jobs:
     # Build the book
     - name: Build the book
       run: |
-        jupyter-book build docs
+        hatch run docs:build
 
     # Upload the book's HTML as an artifact
     - name: Upload artifact

diff --git a/.github/workflows/python-publish.yml b/.github/workflows/python-publish.yml
@@ -30,8 +30,8 @@ jobs:
       - name: Build release distributions
         run: |
           # Seems to work for Soprano:
-          python -m pip install build
-          python -m build
+          python -m pip install hatch
+          hatch build
 
       - name: Upload distributions
         uses: actions/upload-artifact@v4
@@ -52,8 +52,8 @@ jobs:
     # Dedicated environments with protections for publishing are strongly recommended.
     environment:
       name: pypi
-      # OPTIONAL: uncomment and update to include your PyPI project URL in the deployment status:
-      # url: https://pypi.org/p/YOURPROJECT
+      # include PyPI project URL in the deployment status:
+      url: https://pypi.org/project/Soprano
 
     steps:
       - name: Retrieve release distributions

diff --git a/.gitignore b/.gitignore
@@ -113,3 +113,6 @@ venv.bak/
 # Test stuff
 tests/test_save/*
 tests/*.pkl
+
+# Temporary files from tutorial notebooks
+tutorials/_temp_output/*
diff --git a/README.md b/README.md
@@ -47,7 +47,7 @@ The AtomsCollection class generalises ASE's Atoms class by treating groups of st
 Many functions in Soprano require to compute interatomic distances, such as when computing bonds, or estimating NMR dipolar couplings. Soprano always takes the utmost care in dealing with periodic boundaries, using algorithms that ensure that the closest periodic copies are always properly accounted for in a fast and efficient way. This approach can also be used in custom functions as the algorithm can be found in the function `soprano.utils.minimum_periodic`.
 
 ### Easy processing of NMR parameters and spectral simulations
-ASE can read NMR parameters in the `.magres` file format, but Soprano can turn them to more meaningful physical quantities such as isotropies, anisotropies and asymmetries. In addition, with a full database of NMR active nuclei, Soprano can compute quadrupolar and dipolar couplings for specific isotopes. Finally, Soprano can produce a fast approximation of a powder spectrum - both MAS and static - in the diluted atoms approximation, or if that is not enough for your needs, provide an interface to NMR simulation software [Simpson](http://inano.au.dk/about/research-centers/nmr/software/simpson/).
+ASE can read NMR parameters in the `.magres` file format, but Soprano can turn them to more meaningful physical quantities such as isotropies, anisotropies and asymmetries. In addition, with a full database of NMR active nuclei, Soprano can compute quadrupolar and dipolar couplings for specific isotopes. Finally, Soprano can produce a fast approximation of a powder spectrum - both MAS and static - in the diluted atoms approximation, or if that is not enough for your needs, provide an interface to NMR simulation software [Simpson](https://inano.au.dk/about/research-centers-and-projects/nmr/software/simpson).
 
 ### Machine learning and phylogenetic analysis
 The `soprano.analyse.phylogen` module contains functionality to classify collections of structures based on relevant parameters of choice and identify similarities and patterns using Scipy's hierarchy and k-means clustering algorithms. This can be of great help when analysing collections of potential crystal structure looking for polymorphs, finding defect sites, or analysing disordered systems.

diff --git a/docs/_config.yml b/docs/_config.yml
@@ -23,8 +23,8 @@ latex:
 
 # Information about where the book exists on the web
 repository:
-  url: https://github.com/CCP-NC/soprano  # Online location of your book
-  path_to_book: docs  # Optional path to your book, relative to the repository root
+  url: https://github.com/jkshenton/soprano  # Online location of your book
+  # path_to_book: docs  # Optional path to your book, relative to the repository root
   branch: master  # Which branch of the repository should be used when creating links (optional)
 
 # Add GitHub buttons to your book
@@ -38,6 +38,10 @@ html:
   # google_analytics_id       : ""  # A GA id that can be used to track book views.
   # announcement              : "" # A banner announcement at the top of the site.
 
+launch_buttons:
+  colab_url: "https://colab.research.google.com"
+  binderhub_url: "https://mybinder.org" 
+
 sphinx:
   extra_extensions:
   - 'sphinx.ext.autodoc'
@@ -46,6 +50,8 @@ sphinx:
   - 'sphinx.ext.autosummary'
   - 'sphinxcontrib.mermaid'
   - 'sphinx.ext.mathjax'
+  - 'sphinx_click'
+  - 'sphinxcontrib.bibtex'
   config:
     add_module_names: True
     html_theme: "sphinx_book_theme"
@@ -58,3 +64,7 @@ sphinx:
       inherited-members: True
       private-members: True
       show-inheritance: True
+    # Automatically include type hints in the descriptions
+    autodoc_typehints: 'description'  # Or 'both' if you want them in both the signature and description
+
+    bibtex_bibfiles: ['references.bib']
diff --git a/docs/_toc.yml b/docs/_toc.yml
@@ -2,6 +2,9 @@ format: jb-article
 root: intro
 sections:
 - file: installation
+- file: cli
+  sections:
+  - file: cli-cookbook
 - file: tutorials
   sections:
   - file: tutorials/01-basic_concepts.ipynb
@@ -10,6 +13,8 @@ sections:
   - file: tutorials/04-clustering.ipynb
   - file: tutorials/05-nmr.ipynb
   - file: tutorials/06-defect_calculations.ipynb
+  - file: tutorials/07-soprano-cli.ipynb
 - file: submitter
 - file: api
 - file: citing
+- file: references
diff --git a/docs/cli-cookbook.md b/docs/cli-cookbook.md
@@ -0,0 +1,219 @@
+# CLI Cookbook
+
+## NMR data extraction
+The `nmr` subcommand has a number of options to extract NMR data from a Magres file. You can see the full help by running `soprano nmr -h`. Here are some common examples:
+
+* Extract a full summary (will look for both EFG and MS data):
+
+    ```bash
+    soprano nmr seedname.magres
+    ```
+
+* Output summary to a CSV file:
+
+    ```bash
+    soprano nmr seedname.magres -o summary.csv
+    ```
+
+* Output summary to a JSON file:
+
+    ```bash
+    soprano nmr seedname.magres -o summary.json
+    ```
+
+* Extract a full summary for multiple files:
+
+    ```bash
+    soprano nmr *.magres
+    ```
+
+* Extract a full summary for multiple files, merging into one table:
+
+    ```bash
+    soprano nmr --merge *.magres
+    ```
+
+* Extract just the MS data:
+
+    ```bash
+    soprano nmr -p ms seedname.magres
+    ```
+
+* Extract just the MS data for Carbon:
+
+    ```bash
+    soprano nmr -p ms -s C seedname.magres
+    ```
+
+* Or just the first 4 Carbon atoms:
+
+    ```bash
+    soprano nmr -p ms -s C.1-4 seedname.magres
+    ```
+
+* Extract just the MS data for Carbon and Nitrogen:
+
+    ```bash
+    soprano nmr -p ms -s C,N seedname.magres
+    ```
+
+* Extract just MS data for the sites with label H1a:
+
+    ```bash
+    soprano nmr -p ms -s H1a seedname.magres
+    ```
+
+* Set chemical shift references and gradients (non-specified references are set to zero and non-specified gradients are set to -1):
+
+    ```bash
+    soprano nmr -p ms --references C:170,H:100 --gradients C:-1,H:-0.95 seedname.magres
+    ```
+
+* Set custom isotopes:
+
+    ```bash
+    soprano nmr -p efg --isotopes 13C,2H seedname.magres
+    ```
+
+* By default, Soprano will reduce the structure to the unique sites (based either on CIF labels or symmetry operations). If you want to disable this, you can use the `--no-reduce` option:
+
+    ```bash
+    soprano nmr --no-reduce seedname.magres
+    ```
+
+* You can construct queries that are applied to all loaded magres files using the pandas dataframe query syntax. For example, to extract the MS data for all H sites with a chemical shielding between 10 and 30 ppm *and* an asymmetry parameter greater than 0.5:
+
+    ```bash
+    soprano nmr -s H --query "10 < MS_shielding < 30 and MS_asymmetry > 0.5" *.magres
+    ```
+
+## 2D NMR plots
+
+The `plotnmr` subcommand can be used to generate 2D NMR plots from a magres file. Most of the options are the same as for the `nmr` subcommand in terms of filtering sites, setting references, isotopes etc. You can see the full help by running `soprano plotnmr --help`. 
+
+Here are some common examples:
+
+* Plot proton-proton correlation spectrum:
+
+    ```bash
+    soprano plotnmr -p 2D -x H -y H seedname.magres
+    ```
+
+* Plot C-H correlation spectrum with marker sizes proportional to the dipolar coupling strength. Plot the chemical shift rather than shielding by supplying reference values:
+
+    ```bash
+    soprano plotnmr -x C -y H --scale-marker-by dipolar --references C:180,H:30 seedname.magres
+    ```
+
+* As previous, but plot a heatmap and contour lines in addition to the markers:
+
+    ```bash
+    soprano plotnmr -x C -y H --scale-marker-by dipolar --references C:180,H:30 --heatmap --contour seedname.magres
+    ```
+
+* Plot the H-H double quantum correlation spectrum:
+
+    ```bash
+    soprano plotnmr -p 2D -x H -y H --yaxis-order 2Q seedname.magres
+    ```
+
+* As previous, but averaging over dynamic CH3 and NH3 sites:
+
+    ```bash
+    soprano plotnmr -p 2D -x H -y H --yaxis-order 2Q -g CH3,NH3 seedname.magres
+    ```
+
+* By default, Soprano first reduces the system to its inequivalent sites (e.g. merging sites that share a CIF label or are related by symmetry). To prevent this, use the `--no-reduce` option:
+
+    ```bash
+    soprano plotnmr -p 2D -x H -y H --yaxis-order 2Q -g CH3,NH3 --no-reduce seedname.magres
+    ```
+
+* Impose a distance cut-off (in Å) between pairs of sites:
+
+    ```bash
+    soprano plotnmr -p 2D -x C -y H --rcut 1.5 seedname.magres
+    ```
+
+* Combining several of these options:
+
+    ```bash
+    soprano plotnmr -p 2D -x C -y H \
+            -g CH3 \
+            --rcut 1.5 \
+            --scale-marker-by dipolar \
+            --no-markers \
+            --references C:180,H:30 \
+            --heatmap \
+            --colormap "viridis" \
+            --contour \
+            --contour-levels 15 \
+            --contour-color "black" \
+            --contour-linewidth 0.5 \
+            seedname.magres
+    ```
+
+
+
+## Dipolar Couplings
+
+* Extract dipolar couplings between all pairs of sites:
+
+    ```bash
+    soprano dipolar seedname.magres
+    ```
+
+* Extract dipolar couplings between all pairs of sites, outputting to a CSV file:
+
+    ```bash
+    soprano dipolar seedname.magres -o dipolar.csv
+    ```
+
+* Extract dipolar couplings between all pairs of sites, and print out those whose absolute value is greater than 10 kHz:
+
+    ```bash
+    soprano dipolar --query "abs(D) > 10.0" seedname.magres
+    ```
+
+
+## Split up molecules
+
+The `splitmols` command can be used to split up a structure into its components (e.g. molecules, framework) based on a connectivity matrix. You can see the full help by running `soprano splitmols --help`. This should work with structure files in any format that ASE can read, which covers almost all structure formats.
+
+By default, the command writes each component to a separate extended XYZ file. For example:
+
+* Split up a structure into molecules within the same unit cell etc. and output to separate .xyz files:
+
+    ```bash
+    soprano splitmols seedname.cif
+    ```
+
+* Split up a structure into molecules and use the ASE GUI to view the structures (no files are written):
+
+    ```bash
+    soprano splitmols seedname.cif --view --no-write
+    ```
+
+* Split up a structure into molecules and output to a directory in the CASTEP .cell format:
+
+    ```bash
+    soprano splitmols seedname.cif -o output_directory -f cell
+    ```
+
+* Center the molecules in a new cell with a 10 Å vacuum spacing:
+
+    ```bash
+    soprano splitmols seedname.cif -c --vacuum 10.0
+    ```
+
+* Split a zeolite framework with a molecule in a pore into separate files. Here the `--vdw-scale` option is used to increase the van der Waals radii of the atoms by 30% to ensure that the framework is intact and the molecule is separate. The `--no-cell-indices` option is used to prevent the framework atoms from crossing the cell boundaries. These settings work for the `tests/test_data/ZSM-5_withH2O.cif` example. In other cases you might need to tweak the vdW values manually using the `--vdw-custom` flag. Use the `-vvv` verbosity flag to see the vdW radii used.
+
+    ```bash
+    soprano splitmols seedname.cif --vdw-scale 1.3 --no-cell-indices
+    ```
+
+* Place the split molecules in a new, manually defined cell. The cell can be given as a single float (a cubic cell with that lattice parameter), as a string of three floats separated by spaces (e.g. `"10 10 20"` for a 10x10x20 Å cell), as a string of six floats giving lengths and angles (e.g. `"10 10 10 90 90 90"` for a 10x10x10 Å cell with 90° angles), or as a list of 9 floats (e.g. `"10 0 0 0 10 0 0 0 10"`) for a general cell:
+
+    ```bash
+    soprano splitmols seedname.cif --cell "10 10 20"
+    ```
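The four cell formats listed for `--cell` in the last cookbook entry can be illustrated with a small stand-alone parser. This is only a sketch: `parse_cell` is a hypothetical helper and not Soprano's implementation. For the six-value form it assumes a common convention (a along x, b in the xy plane, angles in degrees), which may not match what the CLI does internally.

```python
import math
from typing import List


def parse_cell(spec: str) -> List[List[float]]:
    """Convert a cell string into a 3x3 list of lattice vectors (one per row).

    Hypothetical helper for illustration; not part of Soprano.
    """
    values = [float(v) for v in spec.split()]

    if len(values) == 1:
        # Single float: cubic cell with that lattice parameter.
        a = values[0]
        return [[a, 0.0, 0.0], [0.0, a, 0.0], [0.0, 0.0, a]]

    if len(values) == 3:
        # Three floats: orthorhombic cell with lengths a, b, c.
        a, b, c = values
        return [[a, 0.0, 0.0], [0.0, b, 0.0], [0.0, 0.0, c]]

    if len(values) == 6:
        # Six floats: lengths a, b, c followed by angles alpha, beta, gamma.
        # Assumed convention: a along x, b in the xy plane.
        a, b, c = values[:3]
        cos_a, cos_b, cos_g = (math.cos(math.radians(x)) for x in values[3:])
        sin_g = math.sin(math.radians(values[5]))
        if abs(sin_g) < 1e-12:
            raise ValueError("gamma must not be 0 or 180 degrees")
        cy = (cos_a - cos_b * cos_g) / sin_g
        cz_sq = 1.0 - cos_b**2 - cy**2
        if cz_sq <= 0.0:
            raise ValueError("angles do not describe a valid cell")
        return [
            [a, 0.0, 0.0],
            [b * cos_g, b * sin_g, 0.0],
            [c * cos_b, c * cy, c * math.sqrt(cz_sq)],
        ]

    if len(values) == 9:
        # Nine floats: the three lattice vectors written out row by row.
        return [values[0:3], values[3:6], values[6:9]]

    raise ValueError(f"expected 1, 3, 6 or 9 numbers, got {len(values)}")


for spec in ("10", "10 10 20", "10 10 10 90 90 90", "10 0 0 0 10 0 0 0 10"):
    print(spec, "->", parse_cell(spec))
```

For the six-value form with 90° angles, the off-diagonal entries come out as tiny floating-point values rather than exact zeros, so compare against a tolerance rather than testing for equality.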
diff --git a/docs/cli.rst b/docs/cli.rst
@@ -0,0 +1,9 @@
+Command Line Interface
+=======================================================
+
+.. click:: soprano.scripts.cli:soprano
+  :prog: soprano
+  :show-nested:
+
+
+