Commit

Merge pull request #29 from monarch-initiative/develop
Develop
pnrobinson authored Nov 16, 2023
2 parents 2d83325 + 37252c0 commit 9cb2c97
Showing 26 changed files with 540 additions and 344 deletions.
60 changes: 21 additions & 39 deletions .github/workflows/documentation.yaml
@@ -1,44 +1,26 @@
-name: Sphinx Documentation
+name: mkdocs-generation
 on:
   push:
-    branches: [ main ]
-
+    branches:
+      - master
+      - main
+permissions:
+  contents: write
 jobs:
-  build-docs:
+  deploy:
     runs-on: ubuntu-latest
     steps:
-      - name: Checkout
-        uses: actions/checkout@main
-        with:
-          fetch-depth: 0 # otherwise, you will fail to push refs to dest repo
-          ref: ${{ github.ref }}
-
-      - name: Set up Python 3
-        uses: actions/setup-python@v4
-        with:
-          python-version: 3.9
-
-      - name: Install Python dependencies
-        run: pip3 install sphinx sphinx-rtd-theme sphinx-copybutton
-
-      - name: Build documentation
-        run: |
-          ## Init the target folder.
-          # We will put all site documentation there.
-          mkdir -p gh-pages
-          touch gh-pages/.nojekyll
-          git checkout main
-          python3 -m pip install .
-          # Generate the HTML pages and move the generated content into the target folder.
-          sphinx-apidoc --separate --module-first -d 4 -H "API reference" --follow-links -o docs/apidocs src/oncoexporter
-          cd docs
-          make html
-          mv _build/html/* ../gh-pages
-      - name: Deploy documentation
-        if: ${{ github.event_name == 'push' }}
-        uses: JamesIves/github-pages-deploy-action@v4.4.1
-        with:
-          branch: gh-pages
-          clean: true
-          folder: gh-pages
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
+        with:
+          python-version: 3.x
+      - uses: actions/cache@v2
+        with:
+          key: ${{ github.ref }}
+          path: .cache
+      - run: pip install mkdocs-material
+      - run: pip install mkdocs-material[imaging]
+      - run: pip install mkdocs-material-extensions
+      - run: pip install pillow cairosvg
+      - run: pip install mkdocstrings[python]
+      - run: mkdocs gh-deploy --force
2 changes: 1 addition & 1 deletion .github/workflows/qc.yml
@@ -4,7 +4,7 @@ on:
   push:
     branches: [ main ]
   pull_request:
-    branches: [ main ]
+    branches: [ main, develop ]
   workflow_dispatch:
 
 permissions:
10 changes: 9 additions & 1 deletion .gitignore
@@ -163,4 +163,12 @@ cython_debug/
 .vscode/
 .idea/
 c2p_env/
-notebooks/Untitled.ipynb
+notebooks/Untitled.ipynb
+
+\.*.pkl
+
+*.pkl
+*.tsv
+.DS_Store
+src/phenopackets
+*.tar.gz
20 changes: 0 additions & 20 deletions docs/Makefile

This file was deleted.

88 changes: 88 additions & 0 deletions docs/cda/cda_disease.md
@@ -0,0 +1,88 @@
# CDA Disease

We extract information about the disease diagnosis from two CDA tables, `diagnosis` and `researchsubject`. We first summarize the tables and then outline our ETL strategy.



## diagnosis


| Column | Example | Explanation |
|:----------------|:---------------|:----------------|
| diagnosis_id | CGCI-HTMCP-CC.HTMCP-03-06-02424.HTMCP-03-06-02424_diagnosis| y |
| diagnosis_identifier | see below | y |
| primary_diagnosis | Squamous cell carcinoma, keratinizing, NOS | y |
| age_at_diagnosis | 13085.0 | y |
| morphology | 8071/3 | y |
| stage | None | y |
| grade | G3 | y |
| method_of_diagnosis | Biopsy | y |
| subject_id | CGCI.HTMCP-03-06-02424 | y |
| researchsubject_id | CGCI-HTMCP-CC.HTMCP-03-06-02424| y |


The fields of the table have the following meaning.

- diagnosis_id
Question: Does this identifier follow a meaningful syntax, or is it random?
- diagnosis_identifier
Question: This field seems to have a lot of structure. How is it used in CDA and is there documentation on how to interpret it?
This field has the following structure.
```
[{'system': 'GDC',
'field_name': 'case.diagnoses.diagnosis_id',
'value': '06af070e-aad4-5b2d-a693-b6ccfe93985a'},
{'system': 'GDC',
'field_name': 'case.diagnoses.submitter_id',
'value': 'HTMCP-03-06-02424_diagnosis'}]
```
- primary_diagnosis
This field represents the main cancer diagnosis of this individual.
- age_at_diagnosis
This field represents the age of the individual, in days, on the day the cancer diagnosis was made.
- morphology
Question: What do entries such as `8071/3` mean? Is there a data dictionary for morphology?
- stage
Cancer stage.
- grade
Cancer grade. Note that many tables contain strings such as G3. NCIT has more detailed terms, but we think it best to stick to the top level and possibly consider post-composition to represent specific stage systems.
- method_of_diagnosis
This corresponds to
- subject_id
Identifier for the individual being investigated.
- researchsubject_id
Identifier for the researchsubject (which can be a sample or an individual). Question: Where is this documented?
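
As a rough illustration of how two of these fields might be normalized during ETL, the sketch below converts `age_at_diagnosis` from days to an ISO 8601 duration and splits a `morphology` string at the slash. The function and class names are hypothetical and not part of oncoexporter. Reading `8071/3` as an ICD-O-3 style histology/behavior pair is an assumption that the open question above still needs to confirm.

```python
import math
from typing import NamedTuple, Optional


class Morphology(NamedTuple):
    """Two halves of a code such as '8071/3' (assumed ICD-O-3 style layout)."""
    histology: str
    behavior: str


def days_to_iso8601(age_in_days: Optional[float]) -> Optional[str]:
    """Render an age given in days as an ISO 8601 duration string."""
    if age_in_days is None or math.isnan(age_in_days):
        return None  # missing value in the source table
    return f"P{int(round(age_in_days))}D"


def split_morphology(code: str) -> Morphology:
    """Split 'NNNN/N' into its two numeric parts, rejecting anything else."""
    histology, sep, behavior = code.partition("/")
    if not sep or not histology.isdigit() or not behavior.isdigit():
        raise ValueError(f"Unexpected morphology code: {code!r}")
    return Morphology(histology, behavior)


# Example values taken from the diagnosis table above.
print(days_to_iso8601(13085.0))    # P13085D
print(split_morphology("8071/3"))  # Morphology(histology='8071', behavior='3')
```

Keeping the age as a day count inside the duration avoids the rounding that a conversion to years and months would introduce.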


## researchsubject


| Column | Example | Explanation |
|:----------------|:---------------|:----------------|
| researchsubject_id | CPTAC-3.C3L-00563 | y |
| researchsubject_identifier | see below | y |
| member_of_research_project | CPTAC-3 | y |
| primary_diagnosis_condition | Adenomas and Adenocarcinomas | y |
| primary_diagnosis_site | Uterus, NOS | y |
| subject_id | CPTAC.C3L-00563 | y |


- researchsubject_id
xyz
- researchsubject_identifier
Question: How do we interpret this kind of structure:
```
[{'system': 'GDC',
'field_name': 'case.case_id',
'value': '2b1894fb-b168-42ca-942f-a5def0bb8309'},
{'system': 'GDC', 'field_name': 'case.submitter_id', 'value': 'C3L-00563'}]
```

- member_of_research_project
Question: Where do we get more information about the research projects? What information is available?
- primary_diagnosis_condition
Question: This seems to duplicate the field `primary_diagnosis` in the diagnosis table. What is the difference?
- primary_diagnosis_site
TODO: We can map this to Uberon.
- subject_id
This relates to the subject_id in other tables.
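
One possible reading of the `researchsubject_identifier` structure shown above is a list of records, each naming a source system, a field in that system, and the value stored there. The sketch below assumes the column may arrive either as a Python-style string (single-quoted, as displayed) or as an already parsed list. `parse_identifiers` is a hypothetical helper, not a CDA or oncoexporter function.

```python
import ast
from typing import Dict, List, Tuple, Union

Identifier = Dict[str, str]


def parse_identifiers(
    raw: Union[str, List[Identifier]]
) -> Dict[Tuple[str, str], str]:
    """Index identifier records by (system, field_name)."""
    # literal_eval only evaluates Python literals, so it is safe on untrusted text.
    entries = ast.literal_eval(raw) if isinstance(raw, str) else raw
    return {(e["system"], e["field_name"]): e["value"] for e in entries}


# Example value copied from the researchsubject table above.
raw = """[{'system': 'GDC',
  'field_name': 'case.case_id',
  'value': '2b1894fb-b168-42ca-942f-a5def0bb8309'},
 {'system': 'GDC', 'field_name': 'case.submitter_id', 'value': 'C3L-00563'}]"""

ids = parse_identifiers(raw)
print(ids[("GDC", "case.submitter_id")])  # C3L-00563
```

The same helper would apply to `diagnosis_identifier`, which uses the same record layout.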
95 changes: 0 additions & 95 deletions docs/conf.py

This file was deleted.

17 changes: 17 additions & 0 deletions docs/index.md
@@ -0,0 +1,17 @@
# oncoexporter


Oncoexporter is a Python package that supports extract-transform-load (ETL)
operations for patient data in translational oncology research.
Input data from sources such as the Cancer Data Aggregator (CDA) are
transformed into collections of
[GA4GH Phenopackets](https://github.com/phenopackets/phenopacket-schema){:target="\_blank"}.
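
To give a sense of the target format, the following sketch assembles a minimal phenopacket-like JSON document from a subject identifier and a diagnosis label. It uses plain dictionaries rather than the oncoexporter API, which is not shown here. The field names follow our reading of the Phenopacket Schema version 2 JSON layout, and `make_phenopacket`, the id scheme, and the `NCIT:0000000` term are placeholders chosen for illustration.

```python
import json
from datetime import datetime, timezone


def make_phenopacket(subject_id: str, diagnosis_label: str, term_id: str) -> dict:
    """Assemble a minimal phenopacket-like dict for one subject and one diagnosis."""
    return {
        "id": f"{subject_id}-phenopacket",  # hypothetical id scheme
        "subject": {"id": subject_id},
        "diseases": [{"term": {"id": term_id, "label": diagnosis_label}}],
        "metaData": {
            "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "createdBy": "example-etl-script",
            "resources": [],
            "phenopacketSchemaVersion": "2.0",
        },
    }


packet = make_phenopacket(
    subject_id="CGCI.HTMCP-03-06-02424",
    diagnosis_label="Squamous cell carcinoma, keratinizing, NOS",
    term_id="NCIT:0000000",  # placeholder, not a real NCIT identifier
)
print(json.dumps(packet, indent=2))
```

A real export would also need to fill in ontology resources in `metaData` and map the diagnosis label to an actual ontology term.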



### Feedback


The best place to leave feedback, ask questions, and report bugs is the
[Oncoexporter Issue Tracker](https://github.com/monarch-initiative/oncoexporter/issues).

37 changes: 0 additions & 37 deletions docs/index.rst

This file was deleted.
