Develop #29

Merged
merged 31 commits into from
Nov 16, 2023
803d486
test set up to address path issue
pnrobinson Sep 25, 2023
4d9c384
Ignore pkl files
Sep 25, 2023
6d3e855
Add lung ingest python script
Sep 25, 2023
673636c
Change code to always set taxonomy to Human
Sep 25, 2023
426814c
Add neoplasm disease when disease is empty, while processing mutations
Sep 25, 2023
538ea02
Add arg to increase batch size when download all the things
Sep 25, 2023
068105a
Add code to check for mutations in phenopackets (no dice)
Sep 25, 2023
b4ed929
Fix things so we are adding mutations to phenopackets
Sep 25, 2023
9f8a8a0
fixing diagnosis mapper
pnrobinson Sep 25, 2023
a2df42c
Run tests for PRs on develop
Sep 25, 2023
6b46168
Fix tests
Sep 25, 2023
4221539
fixing diagnosis mapper
pnrobinson Sep 25, 2023
80bda5e
adding tsv inclusion in pyproject.toml
pnrobinson Sep 25, 2023
e65ea47
Merge pull request #24 from monarch-initiative/op_diagnosis_mapper_test
justaddcoffee Sep 25, 2023
a7ba356
Merge branch 'develop' into fix_ingest
Sep 25, 2023
79c6cad
Fix test
Sep 25, 2023
a79f8d4
Delete unused imports
Sep 25, 2023
08aa549
Fix (maybe) import of TSV for tests
Sep 25, 2023
2800927
Merge pull request #23 from monarch-initiative/fix_ingest
justaddcoffee Sep 25, 2023
56e41f8
Write out phenopackets in run_lung.py
Sep 25, 2023
f90359c
Write out phenopackets in run_lung.py
Sep 25, 2023
7a3571a
Merge branch 'develop' of https://github.com/monarch-initiative/c2p i…
Sep 25, 2023
13d93b0
adding more tsv diagnosis codes for cervical cancer
pnrobinson Sep 25, 2023
2deb134
Fix duplicated disease entries
Sep 25, 2023
5116e10
Merge pull request #25 from monarch-initiative/more_diagnosis_codes
justaddcoffee Sep 25, 2023
5837825
Do not add the disease with the same id twice.
ielis Sep 25, 2023
05d7961
Merge pull request #26 from monarch-initiative/fix_duplicated_disease…
ielis Sep 25, 2023
25c84e1
Fix cache files so caches for different queries don't clobber each other
Sep 26, 2023
acae2de
Merge pull request #27 from monarch-initiative/better_cache_filename
justaddcoffee Sep 26, 2023
b2c6e2e
mkdocs
pnrobinson Nov 16, 2023
37252c0
documentation
pnrobinson Nov 16, 2023
60 changes: 21 additions & 39 deletions .github/workflows/documentation.yaml
@@ -1,44 +1,26 @@
name: Sphinx Documentation
name: mkdocs-generation
on:
push:
branches: [ main ]

branches:
- master
- main
permissions:
contents: write
jobs:
build-docs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@main
with:
fetch-depth: 0 # otherwise, you will fail to push refs to dest repo
ref: ${{ github.ref }}

- name: Set up Python 3
uses: actions/setup-python@v4
with:
python-version: 3.9

- name: Install Python dependencies
run: pip3 install sphinx sphinx-rtd-theme sphinx-copybutton

- name: Build documentation
run: |
## Init the target folder.
# We will put all site documentation there.
mkdir -p gh-pages
touch gh-pages/.nojekyll
git checkout main
python3 -m pip install .
# Generate the HTML pages and move the generated content into the target folder.
sphinx-apidoc --separate --module-first -d 4 -H "API reference" --follow-links -o docs/apidocs src/oncoexporter
cd docs
make html
mv _build/html/* ../gh-pages

- name: Deploy documentation
if: ${{ github.event_name == 'push' }}
uses: JamesIves/github-pages-deploy-action@v4.4.1
with:
branch: gh-pages
clean: true
folder: gh-pages
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
with:
python-version: 3.x
- uses: actions/cache@v2
with:
key: ${{ github.ref }}
path: .cache
- run: pip install mkdocs-material
- run: pip install mkdocs-material[imaging]
- run: pip install mkdocs-material-extensions
- run: pip install pillow cairosvg
- run: pip install mkdocstrings[python]
- run: mkdocs gh-deploy --force
2 changes: 1 addition & 1 deletion .github/workflows/qc.yml
@@ -4,7 +4,7 @@ on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
branches: [ main, develop ]
workflow_dispatch:

permissions:
10 changes: 9 additions & 1 deletion .gitignore
@@ -163,4 +163,12 @@ cython_debug/
.vscode/
.idea/
c2p_env/
notebooks/Untitled.ipynb
notebooks/Untitled.ipynb

\.*.pkl

*.pkl
*.tsv
.DS_Store
src/phenopackets
*.tar.gz
20 changes: 0 additions & 20 deletions docs/Makefile

This file was deleted.

88 changes: 88 additions & 0 deletions docs/cda/cda_disease.md
@@ -0,0 +1,88 @@
# CDA Disease

We extract information about the disease diagnosis from two CDA tables, `diagnosis` and `researchsubject`. We first summarize the tables and then outline our ETL strategy.



## diagnosis


| Column | Example | Explanation |
|:----------------|:---------------|:----------------|
| diagnosis_id | CGCI-HTMCP-CC.HTMCP-03-06-02424.HTMCP-03-06-02424_diagnosis| y |
| diagnosis_identifier | see below | y |
| primary_diagnosis | Squamous cell carcinoma, keratinizing, NOS | y |
| age_at_diagnosis | 13085.0 | y |
| morphology | 8071/3 | y |
| stage | None | y |
| grade | G3 | y |
| method_of_diagnosis | Biopsy | y |
| subject_id | CGCI.HTMCP-03-06-02424 | y |
| researchsubject_id | CGCI-HTMCP-CC.HTMCP-03-06-02424| y |


The fields of the table have the following meaning.

- diagnosis_id
Question: Does this identifier follow a meaningful syntax, or is it random?
- diagnosis_identifier
Question: This field seems to have a lot of structure. How is it used in CDA and is there documentation on how to interpret it?
This field has the following structure.
```
[{'system': 'GDC',
'field_name': 'case.diagnoses.diagnosis_id',
'value': '06af070e-aad4-5b2d-a693-b6ccfe93985a'},
{'system': 'GDC',
'field_name': 'case.diagnoses.submitter_id',
'value': 'HTMCP-03-06-02424_diagnosis'}]
```
- primary_diagnosis
This field represents the main cancer diagnosis of this individual.
- age_at_diagnosis
This field gives the age of the individual, in days, on the day the cancer diagnosis was made.
- morphology
Question: What do entries such as `8071/3` mean? Is there a data dictionary for morphology?
- stage
Cancer stage.
- grade
Cancer grade. Note that many tables contain strings such as G3. NCIT has more detailed terms, but we think it best to stick to the top level, and possibly consider postcomposition to represent specific stage systems.
- method_of_diagnosis
This corresponds to
- subject_id
Identifier for the individual being investigated.
- researchsubject_id
Identifier for the researchsubject (which can be a sample or an individual - Question: where is this documented?)
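
As a minimal sketch of how two of these fields might be normalized during ETL, the following converts `age_at_diagnosis` (days) into an ISO 8601 duration string and splits a `morphology` value at the slash. The function names `days_to_iso8601` and `split_morphology` are hypothetical and not part of oncoexporter. Reading `8071/3` as a histology code followed by a behavior code is an assumption (it matches the ICD-O-3 layout) and still needs to be confirmed against the CDA documentation.

```python
import math
from typing import Optional, Tuple


def days_to_iso8601(age_in_days: Optional[float]) -> Optional[str]:
    """Render an age given in days as an ISO 8601 duration, e.g. 'P13085D'.

    Returns None when the age is missing (None or NaN).
    """
    if age_in_days is None or math.isnan(age_in_days):
        return None
    return f"P{int(age_in_days)}D"


def split_morphology(code: str) -> Tuple[str, str]:
    """Split a morphology string such as '8071/3' at the slash.

    ASSUMPTION: the part before the slash is a histology code and the
    part after it is a behavior code, as in ICD-O-3.
    """
    histology, behavior = code.split("/")
    return histology, behavior


print(days_to_iso8601(13085.0))
print(split_morphology("8071/3"))
```

Keeping the age in days avoids assumptions about month and year lengths; whether downstream consumers prefer a years-and-months form is left open here.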


## researchsubject


| Column | Example | Explanation |
|:----------------|:---------------|:----------------|
| researchsubject_id | CPTAC-3.C3L-00563 | y |
| researchsubject_identifier | see below | y |
| member_of_research_project | CPTAC-3 | y |
| primary_diagnosis_condition | Adenomas and Adenocarcinomas | y |
| primary_diagnosis_site | Uterus, NOS | y |
| subject_id | CPTAC.C3L-00563 | y |


- researchsubject_id
xyz
- researchsubject_identifier
Question: How do we interpret this kind of structure:
```
[{'system': 'GDC',
'field_name': 'case.case_id',
'value': '2b1894fb-b168-42ca-942f-a5def0bb8309'},
{'system': 'GDC', 'field_name': 'case.submitter_id', 'value': 'C3L-00563'}]
```

- member_of_research_project
Question: Where do we get more information about the research projects? What information is available?
- primary_diagnosis_condition
Question: This seems to duplicate the field `primary_diagnosis` in the diagnosis table. What is the difference?
- primary_diagnosis_site
Todo: we can map this to Uberon.
- subject_id
This relates to the subject_id in other tables.
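
Until the questions above are answered, one way to work with the `*_identifier` fields is to flatten them into a lookup keyed by source system and field name. This is a sketch only: the helper name `identifiers_to_dict` and the `system:field_name` key format are hypothetical, and it assumes the field arrives either as a list of dicts or as the Python-literal string shown in the examples.

```python
import ast
from typing import Dict, List, Union


def identifiers_to_dict(raw: Union[str, List[dict]]) -> Dict[str, str]:
    """Flatten an identifier field into {'system:field_name': value}.

    ASSUMPTION: `raw` is a list of dicts with the keys 'system',
    'field_name' and 'value', or a string holding that literal.
    """
    entries = ast.literal_eval(raw) if isinstance(raw, str) else raw
    return {f"{e['system']}:{e['field_name']}": e["value"] for e in entries}


example = """[{'system': 'GDC',
 'field_name': 'case.case_id',
 'value': '2b1894fb-b168-42ca-942f-a5def0bb8309'},
 {'system': 'GDC', 'field_name': 'case.submitter_id', 'value': 'C3L-00563'}]"""

lookup = identifiers_to_dict(example)
print(lookup["GDC:case.submitter_id"])
```

`ast.literal_eval` is used rather than `json.loads` because the examples use single-quoted strings, which are not valid JSON.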
95 changes: 0 additions & 95 deletions docs/conf.py

This file was deleted.

17 changes: 17 additions & 0 deletions docs/index.md
@@ -0,0 +1,17 @@
# oncoexporter


Oncoexporter is a Python package that supports extract, transform, load (ETL)
operations for patient data in translational oncology research.
Input data from sources such as Cancer Data Aggregator (CDA) are
transformed into collections of
[GA4GH Phenopackets](https://github.com/phenopackets/phenopacket-schema){:target="\_blank"}.



### Feedback


The best place to leave feedback, ask questions, and report bugs is the
[Oncoexporter Issue Tracker](https://github.com/monarch-initiative/oncoexporter/issues).

37 changes: 0 additions & 37 deletions docs/index.rst

This file was deleted.
