Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Standardization module import #70

Open
wants to merge 57 commits into
base: master
Choose a base branch
from

Conversation

Allend95
Copy link

@Allend95 Allend95 commented Jan 24, 2021

Description

Add the Standardization module to Opencadd formely developed under the name standardizer and mrstan for Allen Dumler's Bachelor thesis.

Final Tasks

  • Read papers (28.-30. April)
  • Save citations (28.-30. April)
  • integrate citations into text (30. April)
  • complete Background chapter (30. April -1. May)
  • review of Background by Jaime?
    • fixing remarks
  • review curation steps (Trust, but verify paper) (1. May)
  • complete data and methods chapter (1.-2. May)
  • review of Data and Methods by Jaime?
    • fixing remarks
  • review and complete explanational text in notebook (2. May)
  • add minor missing functions to notebook (2. May)
  • final review of notebook by Jaime?
    • fixing remarks
  • evaluate resulting functionality in Results and Discussion (3. May)
  • complete Results and Discussion (4. -5. May)
  • review of Results and Discussion by Jaime?
    • fixing remarks
  • finish Conclusion (6. May)
  • finish Outlook (6. May)
  • review whole Thesis gramatically + spell check (7. May)
  • final review of the text by Jaime?
    • fixing final remarks

Todos CW 10

  • Goal: Finished guided notebook
    • further Evaluation of Substeps mentioned in paper
    • write helper functions to cover filtering steps
    • write explanations for every standardization step done
  • Commented list of content (describing what will be covered in each chapter
  • fill out sign up form for thesis (Implementation and evaluation of a computanional standardization pipeline for chemical compounds)

Todos CW 11

  • Research a good suiting data set for testing of the pipline
  • Setup overleaf doc
    • create chapters
    • base necessities

Todos CW 12

Code:

  • Check if test data sets trigger any filtering functions
  • add explanational texts
  • Write actual code for:
    • data import
    • SMILEs to Mol Conversion
    • detection of inorganics + Creation of subsets
    • detection of mixtures + Creation of subsets

Todos CW 13

Code:

  • Write actual code for:
    • filtering metals
    • Structural Conversion and Cleaning
      • Concept
      • --> Realize code from it
    • Removal of salts
      • Concept
      • --> Realize code from it
    • Normalization of Specific Chemotypes
      • Concept
      • --> Realize code from it
    • Removal of duplicates
      • Concept
      • --> Realize code from it

Todos CW 14

  • fixing some issues with subfunctions in the notebook
    Writing
  • Abstract (first draft)
  • Introduction (first draft)
  • Reading and listing Ressources for Introduction

Todos CW 15

Writing:

  • Introduction
    • turn notes into text
  • Data and Methods (draft)

Todos CW 16

Writing:

  • Data and Methods
    • turn notes into text
  • Results and Discussion (draft)
  • Review Abstract
  • Write Conclusion
  • Outlook (draft)

Todos CW 17

  • Review the explanational text of the notebook
  • Add missing explanations
  • Write explanational text on:
    • Normalization of Specific Chemotypes
    • Removal of duplicates

Non Technical To Dos:

  • Get my registration form ready for the Thesis
  • Organising my literature list
    • Read through important papers / literature
    • Take organized notes and evaluate in which chapters of the BT to use
    • organize citations
      • set up citavi
      • set up the right citation style
  • Continue Writing the chapters of the thesis in the overleaf doc (Goal finished Thesis)
    • Abstract
    • Introduction / Problems
    • State of the Art
    • Data and Methods
    • Results and Discussion
    • Conclusion
    • Outlook

Pull Request To Do's:

Todos

  • Evaluation of test data sets
  • Check what changed in rdKit that causes old pytest's to fail
  • Finish Notebook based on Fourches 2010 Paper

Notable points that this PR has either accomplished or will accomplish.

  • initial import of the old code
  • rename the imports inside the standardization modules
  • rename the modules following the OpenCadd naming convention
  • adjust tests
  • add wrapper functions for following functions
    • Chem.SetAromaticity(mol)
    • Chem.SetHybridization(mol)
    • AllChem.ComputeGasteigerCharges(mol)
    • AllChem.Compute2DCoords(mol) (necessary to create 2D-Coordinates for molecules entered with a SMILES string, where this information is missing)

Status

  • Ready to go

@Allend95 Allend95 changed the title Standardization module Initial Standardization module import Jan 24, 2021
@jaimergp
Copy link
Contributor

I noticed that init.py is not used to import modules anymore. How do I do that now or how are modules initialized now?

Can you point me to an example where this is happening? You still need __init__.py files afaik, unless you are talking about namespace (sub)packages.

@Allend95
Copy link
Author

I noticed that init.py is not used to import modules anymore. How do I do that now or how are modules initialized now?

Can you point me to an example where this is happening? You still need __init__.py files afaik, unless you are talking about namespace (sub)packages.

Yeah, I thinks that's what I mean. What I did in my project, was to import all subpackages like this:

# Add imports here
from .standardization import *
from .assign_stereochemistry import *
from .convert_format import *
from .detect_inorganic import *

But in all other packages no one uses those import as far as I can see it. My problem now is, that when I want to load my modules, like for example to perform pytest , then it can't find my modules and I don't know how to configure it in any other way...

@jaimergp
Copy link
Contributor

No, we don't want star * imports like that because they can pollute the namespace. For pytest, use the full import path, down to the module:

from opencadd.compounds.standardization import X

@jaimergp
Copy link
Contributor

Ah, I see you are using __all__. Then star * imports are fine, as long as they stay within opencadd.compounds.standardization. Add an __init__.py there if needed.

Comment on lines 7 to 19
from opencadd.compounds.standardization.standardization import *
from opencadd.compounds.standardization.assign_stereochemistry import *
from opencadd.compounds.standardization.convert_format import *
from opencadd.compounds.standardization.detect_inorganic import *
from opencadd.compounds.standardization.disconnect_metals import *
from opencadd.compounds.standardization.handle_charges import *
from opencadd.compounds.standardization.handle_fragments import *
from opencadd.compounds.standardization.handle_hydrogens import *
from opencadd.compounds.standardization.normalize_molecules import *
from opencadd.compounds.standardization.remove_salts import *
from opencadd.compounds.standardization.utils import *
from opencadd.compounds.standardization.sanitize_molecules import *
from opencadd.compounds.standardization.validate_molecules import *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be relative imports. If it's not working then there's something else going on, but for sure absolute paths should not be needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The absolute paths are only relevant in the pytest suite.

@jaimergp
Copy link
Contributor

Hi @Allend95, I have cleaned the notebook a bit:

  • Applied black-nb
  • Removed the need to check for your current working directory -- we handle that through the _dh variable
  • Replaced failedAt with a more appropriate lambda that ties the global scope to local scope through a keyword argument. It's more verbose but less prone to errors.

@dominiquesydow dominiquesydow added the module-compounds-standardization Concerns opencadd.compounds.standardization module label Jun 14, 2021
@AndreaVolkamer AndreaVolkamer added the not-continued This will not be worked on label May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module-compounds-standardization Concerns opencadd.compounds.standardization module not-continued This will not be worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants