Authors:
- Hadrien Mary
Authors:
- Hadrien Mary
Changed:
- Different docs style, with tabs at the top for Overview, Usage, Tutorials, API, Contribute and License.
Authors:
- DomInvivo
Added:
- Add two new functions, `dm.open_df` and `dm.save_df`, that automatically open and save a dataframe to a file. These convenience functions infer the file format from the file extension and can also handle compressed files.
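The extension-based format detection described above can be sketched in plain Python. This is a minimal sketch, not datamol's implementation: the `infer_format` helper and the extension and compression tables are hypothetical, and the real `dm.open_df`/`dm.save_df` may support a different set of formats.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Hypothetical tables: the formats supported by dm.open_df / dm.save_df may differ.
_FORMATS = {".csv": "csv", ".tsv": "tsv", ".parquet": "parquet",
            ".xlsx": "excel", ".json": "json", ".sdf": "sdf"}
_COMPRESSIONS = {".gz": "gzip", ".bz2": "bz2", ".xz": "xz", ".zip": "zip"}


def infer_format(path: str) -> Tuple[str, Optional[str]]:
    """Return (file_format, compression) inferred from the file extension."""
    suffixes = [s.lower() for s in PurePosixPath(path).suffixes]
    compression = None
    # A trailing compression suffix (e.g. ".gz") is peeled off first.
    if suffixes and suffixes[-1] in _COMPRESSIONS:
        compression = _COMPRESSIONS[suffixes.pop()]
    if not suffixes or suffixes[-1] not in _FORMATS:
        raise ValueError(f"Cannot infer the file format of {path!r}")
    return _FORMATS[suffixes[-1]], compression
```

For example, `infer_format("s3://bucket/mols.csv.gz")` yields `("csv", "gzip")`, so the caller knows both how to decompress and how to parse.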
Changed:
- All the datamol modules and objects are now lazy loaded, meaning that
loading happens on demand. Preliminary tests suggest the datamol
import time decreases 20-fold (from 1s to 50ms on a regular Ubuntu
laptop) without affecting subsequent calls to the modules and
objects. This is a major improvement for datamol usability. This new
behaviour is enabled by default but can be disabled by setting the
environment variable `DATAMOL_DISABLE_LAZY_LOADING` to `1`.
- Move the fs module to its dedicated section in the docs. Fix #160.
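As an illustration of on-demand loading (not datamol's actual mechanism), a module proxy can defer `importlib.import_module` until an attribute is first accessed. `LazyModule` and `load_module` are hypothetical names; only the `DATAMOL_DISABLE_LAZY_LOADING` variable comes from the entry above.

```python
import importlib
import os
import types


class LazyModule(types.ModuleType):
    """Proxy that imports the wrapped module on first attribute access."""

    def __init__(self, name: str):
        super().__init__(name)
        self._module = None  # filled in on first use

    def _load(self) -> types.ModuleType:
        if self._module is None:
            self._module = importlib.import_module(self.__name__)
        return self._module

    def __getattr__(self, attr: str):
        # Only called when normal lookup fails, i.e. for the wrapped module's names.
        return getattr(self._load(), attr)


def load_module(name: str) -> types.ModuleType:
    # Mirror the documented opt-out: DATAMOL_DISABLE_LAZY_LOADING=1 imports eagerly.
    if os.environ.get("DATAMOL_DISABLE_LAZY_LOADING") == "1":
        return importlib.import_module(name)
    return LazyModule(name)
```

Python 3.7+ also supports a module-level `__getattr__` (PEP 562), which is the other common way to get the same on-demand behaviour for a package's submodules.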
Removed:
- Remove the unused, broken and untested `datamol.fragment.assemble_fragment_iter()` function.
Authors:
- Hadrien Mary
- dessygil
Authors:
- Hadrien Mary
Authors:
- Hadrien Mary
Fixed:
- Fix wrong image output for the lasso viz function. Make it consistent with `dm.to_image()` and rdkit.
- Avoid a global `IPython` import so it's not a hard datamol dependency.
- Add the `importlib-resources` dep to the datamol pypi package.
Authors:
- Hadrien Mary
Added:
- added a feature that highlights substructures of 2D molecular images
Changed:
- Update CNAME to docs.datamol.io
- Replace all occurrences of doc.datamol.io with docs.datamol.io
- Switch from `pkg_resources` to `importlib.resources` for loading resources.
- Enable Python 3.11 on the CI.
- Relocate `datamol/data.py` to `datamol/data/__init__.py`.
Fixed:
- Color bug of the search input bar
Authors:
- Emmanuel Noutahi
- Hadrien Mary
- Honoré Hounwanou
- dessygil
Added:
- A multi-mol2 file reader that converts molecules into rdkit objects
Fixed:
- Updated the logging in `_sanifix4.py` to use the RDKit logger
Authors:
- Cas
- Hadrien Mary
- Pakman450
- Therence1
Changed:
- Moved `CODE_OF_CONDUCT.md`, `CODEOWNERS`, `CONTRIBUTING.md` and `SECURITY.md` to the `.github/` dir.
- Improve and automate the release process.
- Adapt the logo and colors to the new branding.
- Replace `datamol-org` with `datamol-io` everywhere in the codebase due to the GH org rename.
Authors:
- Hadrien Mary
- Saurav Maheshkar
Changed:
- Add `TypeAlias` types to `datamol.types.*`.
- Drop `setup.py` in favour of `pyproject.toml` only.
- Replace the unmaintained `appdirs` with the maintained `platformdirs`.
- Enable weekly tests on the `main` branch.
Fixed:
- Add the missing FCFP function to the fingerprint functions dict
Authors:
- Hadrien Mary
- michelml
Added:
- Add PDB reader/writer functions: `dm.to_pdbblock()`, `dm.read_pdbblock()`, `dm.read_pdbfile()`, `dm.to_pdbfile()`
Changed:
- Improve the output type of `to_df`.
Authors:
- Hadrien Mary
Added:
- Add multiple utilities to work with mapped SMILES with hydrogens.
- Add `dm.clear_atom_props()` to remove an atom's properties.
- Add `dm.clear_atom_map_number()` to remove the atom map number property.
- Add `dm.get_atom_positions()` to retrieve the atomic positions of a conformer of a molecule.
- Add `dm.set_atom_positions()` to add a new conformer to a molecule given a list of atomic positions.
Changed:
- Add new arguments to `dm.to_mol`: `allow_cxsmiles`, `parse_name`, `remove_hs` and `strict_cxsmiles`. Refer to the docstring for details.
- Set `copy` to `True` by default in `dm.atom_indices_to_mol()`.
- Allow specifying the property keys to clear in `dm.clear_mol_props()`. If not set, the original default behaviour is to clear everything.
Authors:
- Hadrien Mary
Fixed:
- Ensure rdkit 2021.03 works with the latest datamol. The support is not "official", but only a single function had to be adapted, so it's ok.
Authors:
- Hadrien Mary
Added:
- Support for `max_num_mols` in `dm.read_sdf()`. Useful when files are large and when debugging code.
- Support for returning the invalid molecules in `dm.read_sdf`. Useful when we need to know which ones failed.
- Support for more compression formats when reading SDF files using `fsspec.open(..., compression="infer")`.
- Add a `CODEOWNERS` file.
- Add `dm.descriptors.n_spiro_atoms` and `dm.descriptors.n_stereo_centers_unspecified`.
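The benefit of `max_num_mols` comes from stopping early instead of parsing the whole file. A minimal text-level sketch of that idea, assuming records are terminated by the standard `$$$$` SDF delimiter; `iter_sdf_records` and `read_sdf_records` are hypothetical helpers, whereas `dm.read_sdf()` itself works on RDKit molecules rather than raw text.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def iter_sdf_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield the raw text of each SDF record; records end with a '$$$$' line."""
    record: List[str] = []
    for line in lines:
        if line.strip() == "$$$$":
            yield "".join(record)
            record = []
        else:
            record.append(line)


def read_sdf_records(lines: Iterable[str], max_num_mols: Optional[int] = None) -> List[str]:
    # islice stops pulling lines once max_num_mols records are read;
    # islice(..., None) reads every record.
    return list(islice(iter_sdf_records(lines), max_num_mols))


# Usage: three tiny placeholder records, of which only the first two are read.
example = "".join(f"mol{i}\nM  END\n$$$$\n" for i in range(3))
first_two = read_sdf_records(io.StringIO(example), max_num_mols=2)
```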
Changed:
- Overload output types for `dm.read_sdf` and `dm.data.*`.
- Reduce test duration (especially in CI).
Authors:
- DomInvivo
- Hadrien Mary
Changed:
- Add a comment recommending not to use the SMI file format.
Fixed:
- Fix a bug when reading a remote file with `dm.read_smi()`.
Authors:
- Hadrien Mary
Added:
- Parallelization of `to_df` for faster conversion to a dataframe
Fixed:
- Error in docs
Authors:
- Emmanuel Noutahi
Fixed:
- Fix a typo in a tutorial.
Authors:
- Hadrien Mary
- Valence-JonnyHsu
Changed:
- Remove the `rdkit` dependency from `setup.py` to prevent pip from always overriding the conda rdkit package. See rdkit/rdkit#2690 (comment) for context.
Authors:
- Hadrien Mary
Added:
- Add `dm.Atom` and `dm.Bond` types.
- Add RDKit as a pypi dep.
- Add `datamol.hash_mol()` based on `rdkit.Chem.RegistrationHash`.
Changed:
- RDKit 2022.09: use `Draw.shouldKekulize` instead of `Draw._okToKekulizeMol`.
- RDKit 2022.09: don't use `dm.convert._ChangeMoleculeRendering` for RDKit >=2022.09.
Authors:
- Hadrien Mary
Added:
- Added the `product_index` argument to `select_reaction_output`. It allows returning either all products or only the product of interest, selected by its index.
- Updated unit tests.
Authors:
- Lu Zhu
Added:
- Added a new chemical reaction module for rdkit chemical reactions and attachment manipulations.
Fixed:
Authors:
- Hadrien Mary
- Lu Zhu
Changed:
- Bump upstream GH actions versions.
- `dm.fs.copy_dir` now uses the internal fsspec `copy` when the source and destination filesystems are the same, which makes the copy much faster.
Fixed:
- Use `os.PathLike` to recognize a broader range of path inputs in the `dm.fs` module. This ensures that objects such as `py._path.local.LocalPath` are recognized as paths.
Authors:
- Hadrien Mary
Fixed:
- Missing header in the fragment tutorial.
Authors:
- Hadrien Mary
- Valence-JonnyHsu
Added:
- Add `with_atom_indices` to `dm.to_smiles`. If enabled, atom indices are added to the SMILES.
Changed:
- Changed the default for `dm.fs.is_file()` from `True` to `False`.
- Refactor the API doc to break down all the submodules into individual docs. Thanks to @MichelML for the suggestion.
- Re-enable the PyPI activity in rever.
Fixed:
- Minor typo in the documentation of `dm.conformers.generate()`
Authors:
- Cas
- Hadrien Mary
- Valence-JonnyHsu
Added:
- New alignment tutorials.
Removed:
- `rdkit` dep from pypi (the dep is only on the conda-forge package)
Fixed:
- Grammar in tutorials.
Authors:
- Hadrien Mary
- Valence-JonnyHsu
Fixed:
- Fix minor typos in tutorials
Authors:
- Hadrien Mary
- michelml
Added:
- Add configurations for dev containers based on the micromamba Docker image. More information about dev containers at https://docs.github.com/en/codespaces/setting-up-your-project-for-codespaces/introduction-to-dev-containers.
- Support for two additional force fields: MMFF94s with and without the electrostatic component.
- Output energies along with the delta energy relative to the lowest-energy conformer.
Changed:
- Change the API of `dm.conformers.generate()` to support the choice of force field. In addition, `ewindow` and `eratio` flags are added to reject high-energy conformers, either on an absolute scale or as a ratio to the number of rotatable bonds.
- Revamped all the datamol tutorials and add new tutorials. Huge thanks to @Valence-jonnyhsu for leading the refactoring of the datamol tutorials.
- Improve documentation for `dm.standardize_mol()`
- Multiple various docstring and typing improvements.
- Embed the cdk2.sdf and solubility*.sdf files within the datamol package to prevent issues with the RDKit config dir.
- Enable strict mode on the documentation to prevent any issues and inconsistency with the types and docstrings of datamol.
- Refactor micromamba CI to use latest and simplify it.
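One reading of the `ewindow` and `eratio` criteria, applied to a list of conformer energies, is sketched below. The function name and the exact rule (the energy gap to the lowest conformer compared to `ewindow`, and that same gap divided by the number of rotatable bonds compared to `eratio`) are assumptions based on the wording above, not the datamol source.

```python
from typing import List, Optional


def filter_conformers_by_energy(
    energies: List[float],
    n_rotatable_bonds: int,
    ewindow: Optional[float] = None,
    eratio: Optional[float] = None,
) -> List[int]:
    """Return indices of kept conformers, sorted from lowest to highest energy."""
    emin = min(energies)
    kept = []
    for idx, energy in enumerate(energies):
        delta = energy - emin  # energy relative to the lowest-energy conformer
        if ewindow is not None and delta > ewindow:
            continue  # rejected on the absolute scale
        # Assumption: guard against molecules without rotatable bonds.
        if eratio is not None and delta / max(n_rotatable_bonds, 1) > eratio:
            continue  # rejected relative to molecular flexibility
        kept.append(idx)
    return sorted(kept, key=lambda i: energies[i])
```

Normalizing by rotatable bonds lets flexible molecules, which naturally span a wider energy range, keep more conformers than rigid ones under the same `eratio`.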
Removed:
- Remove the unused and unmaintained `dm.actions` and `dm.reactions` modules.
- Remove the `copy` args from `add_hs` and `remove_hs` (RDKit already returns copies).
Fixed:
- Error in ECFP fingerprints, which computed FCFP instead of ECFP.
Authors:
- Emmanuel Noutahi
- Hadrien Mary
- Matt
Added:
- New possibilities for ambiguous matching of molecules in the `reorder_mol_from_template` function
Changed:
- Replaced `allow_ambiguous_hs_only` with the option `"hs_only"` for the `ambiguous_match_mode` parameter.
- `ambiguous_match_mode` is now a string, no longer a bool.
Deprecated:
- `allow_ambiguous_hs_only` is deprecated without a warning, since the feature is brand new.
- Same for `ambiguous_match_mode` being a bool.
Authors:
- DomInvivo
- Hadrien Mary
Added:
- `datamol.graph.match_molecular_graphs`, with unit tests.
- `datamol.graph.reorder_mol_from_template`, with unit tests.
Changed:
- Typing in `datamol.graph.py`: changed `rdkit.Chem.rdchem.Mol` to `dm.Mol`.
Authors:
- DomInvivo
- Emmanuel Noutahi
Fixed:
- Bug in `dm.conformers.generate()` when multiple conformers had equal energies.
- Fix the documentation.
Authors:
- Cas
- Hadrien Mary
Added:
- Add `dm.read_molblock()` and `dm.to_molblock()` functions.
- Add the `dm.to_xlsx()` function.
Fixed:
- Fix the API doc.
Authors:
- Hadrien Mary
Changed:
- Add `joblib_batch_size` to `dm.parallelized_with_batches()` to control the joblib batch size (which is different from the `dm.parallelized_with_batches` batch size).
- Various small improvements to unit tests.
Authors:
- Hadrien Mary
Added:
- Add `dm.parallelized_with_batches()` to parallelize a workload with a function that takes a batch of inputs.
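The batching idea can be sketched with the standard library: split the inputs into batches, call the batch function once per batch, then flatten the results in order. This is a sketch only; datamol builds on joblib, whereas this version swaps in `concurrent.futures.ThreadPoolExecutor`, and `run_in_batches` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_in_batches(
    fn: Callable[[Sequence[T]], List[R]],
    inputs: Sequence[T],
    batch_size: int,
    n_jobs: int = 4,
) -> List[R]:
    # Split the inputs into consecutive batches (the last one may be shorter).
    batches = [inputs[i : i + batch_size] for i in range(0, len(inputs), batch_size)]
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        # executor.map preserves input order, so results line up with inputs.
        batch_results = list(executor.map(fn, batches))
    # Flatten the per-batch outputs back into one list.
    return [item for batch in batch_results for item in batch]
```

A batch function pays off when each call has a fixed setup cost (opening a resource, vectorizing a computation) that can be shared across many inputs.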
Authors:
- Hadrien Mary
Changed:
- Don't import `sascorer` by default, only during the call to `dm.descriptors.sas(mol)`.
Authors:
- Hadrien Mary
Changed:
- Use micromamba during CI.
- Add CI tests for RDKit=2022.03.
- Adapt a test to new rdkit version.
Fixed:
- typing for what is returned by dm.align.template_align
Authors:
- Hadrien Mary
- michelml
Changed:
- allow_r_groups option in dm.align.auto_align_many
Removed:
- should_align
Authors:
- Hadrien Mary
- michelml
Added:
- A new `dm.align` module with various functions to align a list of molecules. Use `dm.align.template_align` to align a molecule to a template and `dm.align.auto_align_many` to automatically partition and align a list of molecules.
- New descriptors: `formal_charge`
- New descriptors: `refractivity`
- New descriptors: `n_rigid_bonds`
- New descriptors: `n_stereo_centers`
- New descriptors: `n_charged_atoms`
- Add `dm.clear_props` to clear all the properties of a mol.
- Add a new dataset based on RDKit CDK2, in addition to freesolv, at `dm.cdk2()`.
- Add `dm.strip_mol_to_core` to remove all R groups from a molecule.
- Add `dm.UNSPECIFIED_BOND`.
- Add `dm.compute_ring_system` to extract the ring systems from a molecule.
Changed:
- Improve typing.
- Improve relative imports coverage.
- Adapt `dm.to_image` to use the `align` module.
Removed:
- Remove many `# type: ignore` comments as those can be error-prone (hopefully the tests are here!)
Authors:
- Hadrien Mary
Added:
- Add `dm.conformers.keep_conformers` to keep only one or multiple conformers of a molecule.
Changed:
- Change the conformer generation arguments to use `useRandomCoords=True` by default.
- Start using explicit `Optional` instead of implicit `Optional` for typing.
- Start using relative imports instead of absolute ones.
- When conformers are not minimized, sort them by energy (this can be turned off).
Removed:
- Remove the `fallback_to_random_coords` argument from `generate_conformers`.
Authors:
- Hadrien Mary
Added:
- Support for selfies<2.0.0 in tests
Changed:
- All InChI functions now return None with a warning instead of silently returning an empty string.
- Order of str evaluation in conversion functions: `isinstance(str)` is now evaluated before `is None`.
Fixed:
- Bug in `unique_id` causing the evaluation to fall back on 'd41d8cd98f00b204e9800998ecf8427e' for unsupported inputs. None is now returned instead.
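The value 'd41d8cd98f00b204e9800998ecf8427e' quoted above is the MD5 digest of an empty string, which is what hashing an empty identifier produces. A sketch of the fixed behaviour, assuming the ID is an MD5 hex digest of the non-standard InChIKey (an assumption; `unique_id_from_inchikey` is a hypothetical helper, not datamol's API):

```python
import hashlib
from typing import Optional


def unique_id_from_inchikey(inchikey: Optional[str]) -> Optional[str]:
    # An empty or missing InChIKey means the input was unsupported:
    # return None instead of hashing "" into a constant, misleading ID.
    if not inchikey:
        return None
    return hashlib.md5(inchikey.encode("utf-8")).hexdigest()
```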
Authors:
- Emmanuel Noutahi
Changed:
- Add the `remove_hs` flag to `dm.read_sdf()`.
Authors:
- Hadrien Mary
Added:
- Add `dm.descriptors.n_aromatic_atoms`
- Add `dm.descriptors.n_aromatic_atoms_proportion`
- Add `dm.predictors.esol`
- Add `dm.predictors.esol_from_data`
Changed:
- Make `descriptors` a folder (backward compatible).
- Rename `any_descriptor` to `any_rdkit_descriptor` to be more explicit.
Authors:
- Hadrien Mary
Added:
- Add `dm.conformers.align_conformers()` to align the conformers of a list of molecules.
Changed:
- New lower bound for the rdkit version: `>=2021.09`. See #81 for details.
Authors:
- Hadrien Mary
Fixed:
- Catch integer values that are too large in `set_mol_props` and switch to `SetDoubleProp` instead of `SetIntProp`
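RDKit's `SetIntProp` stores a C++ `int`, which is typically 32 bits, so larger Python integers do not fit. A sketch of choosing the property setter from the value, where `pick_prop_setter` is a hypothetical helper returning the name of the RDKit `Mol` method to call; note that a double only represents integers exactly up to 2**53.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def pick_prop_setter(value: object) -> str:
    """Return the name of the RDKit Mol setter suited to `value`."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "SetBoolProp"
    if isinstance(value, int):
        if INT32_MIN <= value <= INT32_MAX:
            return "SetIntProp"
        return "SetDoubleProp"  # too large for a 32-bit int
    if isinstance(value, float):
        return "SetDoubleProp"
    return "SetProp"  # fall back to a string property
```

With an RDKit molecule one would then call `getattr(mol, pick_prop_setter(value))(key, value)`, converting the value with `str()` in the `SetProp` case.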
Authors:
- Hadrien Mary
Changed:
- Expose the clean_it flag when enumerating stereoisomers.
Authors:
- Hadrien Mary
- Julien Horwood
Added:
- Parameters allowing to customize or ignore failures when running the conformer generation.
Changed:
- When the conformer embedding fails, it will now optionally fall back to using random coordinates.
Authors:
- Hadrien Mary
- Julien Horwood
Added:
- Add a new `total` arg to `dm.parallelized()` (only useful when `progress` is set to `True`)
Changed:
- Prevent `tqdm_kwargs` collision in `dm.parallelized()`.
Authors:
- Hadrien Mary
Added:
- Add `dm.to_inchi_non_standard()` and `dm.to_inchikey_non_standard()` to generate InChI values that are sensitive to tautomerism as well as undefined stereoisomerism.
- Add `dm.unique_id` to generate unique molecule identifiers based on `dm.to_inchikey_non_standard`
Changed:
- Add a `use_non_standard_inchikey` flag argument to `dm.same_mol`.
Authors:
- Hadrien Mary
Added:
- Add `dm.utils.fs.copy_dir()` to recursively copy directories across filesystems + tests.
- Add `dm.utils.fs.mkdir` + tests.
- Add a new `dm.descriptors` module with `compute_many_descriptors` and `batch_compute_many_descriptors` + tests.
- Add `dm.viz.match_substructure` to highlight one or more substructures in a list of molecules + tests. Note that the current function does not show different colors per match and submatch because of a limitation in `MolsToGridImage`. We plan to address this in a future version of datamol.
- Add a new `mcs` module backed by `rdkit.Chem.rdFMCS` with a `find_mcs` function + tests.
- Add a new function `dm.viz.utils.align_2d_coordinates` to align the 2D coordinates of molecules using either a given pattern or the MCS.
- Add `dm.canonical_tautomer` to canonicalize tautomers.
- Add `dm.remove_stereochemistry()`.
- Add a `bond_line_width` arg to `to_image`.
- Add `dm.atom_list_to_bond()`
- Add an `enable` flag to `dm.without_rdkit_log()`
- Add a tutorial about the filesystem module.
- Add a tutorial about the viz module (still incomplete).
- Add `dm.substructure_matching_bonds` to perform a standard substructure match that also returns the matching bonds instead of only the matching atoms.
- Add a new `dm.isomers` module + move relevant functions from `dm.mol` to `dm.isomers`
- Add `dm.add_hs` and `dm.remove_hs` to add and remove hydrogens from molecules.
Changed:
- Set the `fsspec` minimum version to `>=2021.9`.
- Pimp up `dm.utils.to_image` to make it more robust (don't fail on certain molecules due to incorrect aromaticity) and also propagate more drawing options to RDKit such as `legend_fontsize` and others.
- Add a new `align` argument to `dm.to_image()` to align the 2D coordinates of the molecules.
- In `dm.to_image`, `use_svg` is now set to `True` by default.
- Change the default `mol_size` from 200 to 300 in `to_image`.
- Link `datamol.utils.fs` to `datamol.fs`.
- Change the default `chunk_size` in `copy_file` from 2048 to 1024 * 1024 (1MB).
- Support parallel chunked distance computation in `dm.similarity.cdist`
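Chunked parallel distance computation splits the first list of fingerprints into chunks, computes each chunk's rows independently, and concatenates the rows in order. A dependency-free sketch, assuming fingerprints are given as sets of on-bit indices and using threads instead of whatever backend `dm.similarity.cdist` actually uses; `tanimoto` and `cdist_chunked` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    union = len(a | b)
    # Assumption: two empty fingerprints get a similarity of 0.0.
    return len(a & b) / union if union else 0.0


def cdist_chunked(
    fps1: Sequence[Set[int]],
    fps2: Sequence[Set[int]],
    chunk_size: int = 100,
    n_jobs: int = 4,
) -> List[List[float]]:
    """Tanimoto distance matrix of shape (len(fps1), len(fps2))."""

    def rows_for(chunk: Sequence[Set[int]]) -> List[List[float]]:
        return [[1.0 - tanimoto(a, b) for b in fps2] for a in chunk]

    chunks = [fps1[i : i + chunk_size] for i in range(0, len(fps1), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        parts = list(executor.map(rows_for, chunks))  # order is preserved
    return [row for part in parts for row in part]
```

Under CPython's GIL the threads mostly illustrate the chunking; real CPU speedups for pure-Python work would need a process pool.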
Authors:
- Hadrien Mary
Changed:
- The default git branch is now `main`.
- `appdirs` is now a hard dep.
- Change CI to use rdkit `[2021.03, 2021.09]` and add the info to the readme and doc.
Fixed:
- Test related to SELFIES to make it work with the latest 2.0 version.
- `dm.to_mol` accepts `mol` as input but the specified type was only `str`.
Authors:
- Hadrien Mary
Fixed:
- Force the input value(s) of `dm.molar.log_to_molar` to be floats, since integers raised to negative integer powers are not allowed.
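The conversion is molar = 10^(-value), rescaled to the requested unit; with NumPy integer arrays, raising 10 to a negative integer power raises an error, which is why the input is cast to float first. A scalar sketch with hypothetical function names and a hypothetical unit table (the real `dm.molar` functions may accept different units and array inputs):

```python
import math

# Hypothetical unit table: how many molar one unit represents.
UNIT_FACTORS = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def molar_to_log(value: float, unit: str = "uM") -> float:
    """Concentration in `unit` -> -log10 of the molar value (e.g. a pXC50)."""
    return -math.log10(float(value) * UNIT_FACTORS[unit])


def log_to_molar(value: float, unit: str = "uM") -> float:
    """-log10 molar value -> concentration expressed in `unit`."""
    value = float(value)  # cast first, as the entry above describes
    return 10 ** (-value) / UNIT_FACTORS[unit]
```

For example, a value of 6 corresponds to 1 uM, and converting 250 nM to the log scale and back returns 250 nM.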
Authors:
- Hadrien Mary
Removed:
- `py.typed` file, which seems unused besides confusing static analysis tools.
Authors:
- Hadrien Mary
Added:
- `to_smarts` for exporting molecule objects as SMARTS.
- `from_smarts` for reading a molecule from a SMARTS string.
Changed:
- Allow exporting SMILES in Kekulé representation.
- `to_smarts` is properly renamed to `smiles_as_smarts`.
Authors:
- Emmanuel Noutahi
Removed:
- Revert the batch_size fix to use the joblib default instead
Fixed:
- Issue #58: sequence bug in parallel.
Authors:
- Emmanuel Noutahi
Added:
- Add a new function to measure execution time: `dm.utils.perf.watch_duration`.
Changed:
- Add a `batch_size` option to `dm.utils.parallelized`. The default behaviour `batch_size=None` is unchanged and so 100% backward compatible.
Authors:
- Hadrien Mary
Changed:
- `get_protocol` is more general
Fixed:
- Bug in fs.glob due to protocol being a list
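In fsspec, a filesystem's `protocol` attribute can be a single string or a tuple of aliases, so code that formats it directly into a URL breaks on the tuple case. A sketch of the normalization, where `guess_protocol` and `add_protocol` are hypothetical helpers (datamol's own `get_protocol` may differ):

```python
from typing import List, Sequence, Union


def guess_protocol(path: str) -> str:
    """Return the protocol of a path, defaulting to "file" for local paths."""
    if "://" in path:
        return path.split("://", 1)[0]
    return "file"


def add_protocol(paths: Sequence[str], protocol: Union[str, Sequence[str]]) -> List[str]:
    """Prefix glob results with a protocol that may be a str or a tuple of aliases."""
    proto = protocol if isinstance(protocol, str) else protocol[0]
    if proto == "file":
        return list(paths)
    return [f"{proto}://{p}" for p in paths]
```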
Authors:
- Emmanuel Noutahi
Added:
- Add missing appdirs dependency
Fixed:
- Propagate tqdm_kwargs for parallel (was only done for sequential)
Authors:
- Hadrien Mary
Added:
- Add `tqdm_kwargs` to `dm.utils.JobRunner()`
- Add `tqdm_kwargs` to `dm.utils.parallelized()`
Changed:
- Propagate `job_kwargs` to `dm.utils.parallelized()`
Authors:
- Hadrien Mary
Added:
- Add a DOI so datamol can get properly cited.
- Better doc about compat and CI
- Add a datamol Mol type: `dm.Mol`, identical to `Chem.rdchem.Mol`
Changed:
- Bump test coverage from 70% to 80%.
Authors:
- DeepSource Bot
- Hadrien Mary
- deepsource-autofix[bot]
Added:
- More tests for the `dm.similarity` module + checks against equivalent RDKit methods.
- `dm.same_mol(mol1, mol2)` to check whether 2 molecules are the same based on their InChIKey.
Changed:
- Use `scipy` in `dm.similarity.pdist()`.
- Raise an error when a molecule is invalid in `dm.similarity.pdist/cdist`.
Deprecated:
- `dm.similarity.pdist()` now returns only the dist matrix without the `valid_idx` vector.
Fixed:
- A bug returning an inconsistent dist matrix with `dm.similarity.pdist()`.
Authors:
- Hadrien Mary
Changed:
- A better and manually curated API documentation.
Authors:
- Hadrien Mary
Added:
- Add support for more fingerprint types.
- Two utility functions for molar concentration conversion: `dm.molar_to_log()` and `dm.log_to_molar()`.
- Add the `dm.utils.fs` module to work with any type of path (remote or local).
Authors:
- Hadrien Mary
Added:
- Add a sanitize flag to `from_df`.
- Automatically detect the mol column in `from_df`.
- Add an `add_hs` arg to `sanitize_mol`.
Changed:
- Allow a single molecule as input to `dm.to_sdf` instead of a list of mols.
- Preserve mol properties and the first conformer in `dm.sanitize_mol`.
- Display a warning message when the input mol has multiple conformers in `dm.sanitize_mol`.
Fixed:
- Remove the call to `sanitize_mol` in `read_sdf`; use `sanitize=True` from RDKit instead.
- Remove the `mol` column from the mol properties in `from_df`. It also fixes `to_sdf`.
Authors:
- Hadrien Mary
Changed:
- Propagate `sanitize` and `strict_parsing` to `dm.read_sdf`.
Authors:
- Hadrien Mary
- Ishan Kumar
- michelml
Fixed:
- Fix Google Analytics again, hopefully for the last time.
Authors:
- Hadrien Mary
Changed:
- Add s3fs and gcsfs as hard dep
Authors:
- Hadrien Mary
Authors:
- Hadrien Mary
- michelml
Authors:
- Hadrien Mary
Changed:
- New logo.
Authors:
- Hadrien Mary
Fixed:
- Fixed typo in readme
Authors:
- Emmanuel Noutahi
- Hadrien Mary
Authors:
- Hadrien Mary
Added:
- `dm.copy_mol`
- `dm.set_mol_props`
- `dm.copy_mol_props`
- `dm.conformers.get_coords`
- `dm.conformers.center_of_mass`
- `dm.conformers.translate`
- `dm.enumerate_stereoisomers`
- `dm.enumerate_tautomers`
- `dm.atom_indices_to_mol`
Changed:
- rdkit fp to numpy array conversion is purely numpy-based now (x4 faster).
- Cleaning of various docstrings (removing explicit types).
- Clean various types.
- Allow `dm.to_image` instead of `dm.viz.to_image`.
- Add an atom indices drawing option to `dm.to_image`.
- Allow `to_smiles` to fail (the default is to not fail but return None as before).
- Add a CXSmiles bool flag to `to_smiles`.
- Rename `utils.paths` to `utils.fs`.
- Integrate `PandasTools` into `dm.to_df`.
- Build a mol column from smiles in `read_csv` and `read_excel`.
- Rename `dm.sanitize_best` to `dm.sanitize_first`.
Fixed:
- Scaffold tests for the new rdkit version
- Conformer cluster tests for the new rdkit version
Authors:
- Hadrien Mary
- Therence1
- michelml
- mike
Fixed:
- The tqdm progress bar now updates on job completion rather than on job submission
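Updating on completion means advancing the bar as each future finishes, not as it is submitted. A stdlib sketch using `concurrent.futures.as_completed`, with a plain callback standing in for a tqdm `update`; `run_with_progress` is a hypothetical name, not datamol's API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence


def run_with_progress(
    fn: Callable,
    inputs: Sequence,
    on_progress: Callable[[int], None],
    n_jobs: int = 4,
) -> List:
    results: List = [None] * len(inputs)
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        # Map each future back to the position of its input.
        futures = {executor.submit(fn, x): i for i, x in enumerate(inputs)}
        # as_completed yields futures as they finish, in completion order.
        for n_done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            on_progress(n_done)  # tick only after a job has completed
    return results
```

Results are written back by input index, so the output order matches the inputs even though progress ticks follow completion order.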
Authors:
- Emmanuel Noutahi
Changed:
- Make ipywidgets an optional dep.
Authors:
- Hadrien Mary
Changed:
- Propagate more options to dm.reorder_atoms.
Authors:
- Hadrien Mary
Added:
- `dm.pick_centroids` for picking a set of centroid molecules using various algorithms.
- `dm.assign_to_centroids` for clustering molecules based on precomputed centroids.
Changed:
- Make `add_hs` optional in `conformers.generate` and remove the hydrogens afterwards when `add_hs` is True. Explicit hydrogens will be lost.
Fixed:
- Docstring of `dm.pick_diverse`
Authors:
- Emmanuel Noutahi
- Hadrien Mary
Added:
- Added outfile to viz.to_image
Changed:
- Replace ete3 by networkx due to GPL licensing.
- Fix some typos in docs.
Fixed:
- Null pointer exception during conformers generation.
Authors:
- Emmanuel Noutahi
- Hadrien Mary
- Honoré Hounwanou
- michelml
Added:
- Add a test to monitor datamol import duration.
Changed:
- Add rms cutoff option during conformers generation.
- Refactor conformer cluster function.
Authors:
- Hadrien Mary
Added:
- Include stub files for rdkit generated using stubgen from mypy.
Authors:
- Hadrien Mary
Added:
- Add `to_smi` and `from_smi` to the IO module.
- Support file-like objects in the IO module.
- Add kekulization to to_mol
Changed:
- Switch tests of the IO module to regular functions.
Deprecated:
- In the IO module, use `urlpath` instead of `file_uri` to follow `fsspec` conventions.
Fixed:
- Fix a bug in read_excel where sheet_name wasn't being used.
Authors:
- Emmanuel Noutahi
- Hadrien Mary
Changed:
- Constrain rdkit to 2020.09 to get `rdBase.LogStatus()`
Authors:
- Hadrien Mary
Changed:
- Better rdkit log disable/enable.
Authors:
- Hadrien Mary
Added:
- Tests that execute the notebooks.
Fixed:
- Force rdkit >=2020.03.6 to avoid a thread-related bug in `rdMolStandardize`
Authors:
- Hadrien Mary
Added:
- Add a `cdist` function to compute the Tanimoto similarity between two lists of molecules.
Fixed:
- Fix a bug in `dm.from_df` when the dataframe has a size of zero.
Authors:
- Hadrien Mary
Added:
- Add all the common sanitize functions.
- Add the 2_Preprocessing_Molecules notebook.
- Add fragment module.
- Add scaffold module.
- Add cluster module.
- Add assemble module.
- Add actions module.
- Add reactions module.
- Add dm.viz.circle_grid function
- Add doc with mkdocs
Authors:
- Hadrien Mary
Authors:
- Hadrien Mary
Authors:
Added:
- first release!