Merge pull request #29 from monarch-initiative/develop
Develop
Showing 26 changed files with 540 additions and 344 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,44 +1,26 @@ | ||
name: Sphinx Documentation | ||
name: mkdocs-generation | ||
on: | ||
push: | ||
branches: [ main ] | ||
|
||
branches: | ||
- master | ||
- main | ||
permissions: | ||
contents: write | ||
jobs: | ||
build-docs: | ||
deploy: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: Checkout | ||
uses: actions/checkout@main | ||
with: | ||
fetch-depth: 0 # otherwise, you will fail to push refs to dest repo | ||
ref: ${{ github.ref }} | ||
|
||
- name: Set up Python 3 | ||
uses: actions/setup-python@v4 | ||
with: | ||
python-version: 3.9 | ||
|
||
- name: Install Python dependencies | ||
run: pip3 install sphinx sphinx-rtd-theme sphinx-copybutton | ||
|
||
- name: Build documentation | ||
run: | | ||
## Init the target folder. | ||
# We will put all site documentation there. | ||
mkdir -p gh-pages | ||
touch gh-pages/.nojekyll | ||
git checkout main | ||
python3 -m pip install . | ||
# Generate the HTML pages and move the generated content into the target folder. | ||
sphinx-apidoc --separate --module-first -d 4 -H "API reference" --follow-links -o docs/apidocs src/oncoexporter | ||
cd docs | ||
make html | ||
mv _build/html/* ../gh-pages | ||
- name: Deploy documentation | ||
if: ${{ github.event_name == 'push' }} | ||
uses: JamesIves/github-pages-deploy-action@v4.4.1 | ||
with: | ||
branch: gh-pages | ||
clean: true | ||
folder: gh-pages | ||
- uses: actions/checkout@v3 | ||
- uses: actions/setup-python@v4 | ||
with: | ||
python-version: 3.x | ||
- uses: actions/cache@v2 | ||
with: | ||
key: ${{ github.ref }} | ||
path: .cache | ||
- run: pip install mkdocs-material | ||
- run: pip install mkdocs-material[imaging] | ||
- run: pip install mkdocs-material-extensions | ||
- run: pip install pillow cairosvg | ||
- run: pip install mkdocstrings[python] | ||
- run: mkdocs gh-deploy --force |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
# CDA Disease | ||
|
||
We extract information about the disease diagnosis from two CDA tables, `diagnosis` and `researchsubject`. We first summarize the tables and then outline our ETL strategy. | ||
|
||
|
||
|
||
## diagnosis | ||
|
||
|
||
| Column | Example | Explanation | | ||
|:----------------|:---------------|:----------------| | ||
| diagnosis_id | CGCI-HTMCP-CC.HTMCP-03-06-02424.HTMCP-03-06-02424_diagnosis| y | | ||
| diagnosis_identifier | see below | y | | ||
| primary_diagnosis | Squamous cell carcinoma, keratinizing, NOS | y | | ||
| age_at_diagnosis | 13085.0 | y | | ||
| morphology | 8071/3 | y | | ||
| stage | None | y | | ||
| grade | G3 | y | | ||
| method_of_diagnosis | Biopsy | y | | ||
| subject_id | CGCI.HTMCP-03-06-02424 | y | | ||
| researchsubject_id | CGCI-HTMCP-CC.HTMCP-03-06-02424| y | | ||
|
||
|
||
The fields of the table have the following meaning. | ||
|
||
- diagnosis_id | ||
Question: Does this identifier have some syntax or meaning, or is it random? | ||
- diagnosis_identifier | ||
Question: This field seems to have a lot of structure. How is it used in CDA and is there documentation on how to interpret it? | ||
This field has the following structure. | ||
``` | ||
[{'system': 'GDC', | ||
'field_name': 'case.diagnoses.diagnosis_id', | ||
'value': '06af070e-aad4-5b2d-a693-b6ccfe93985a'}, | ||
{'system': 'GDC', | ||
'field_name': 'case.diagnoses.submitter_id', | ||
'value': 'HTMCP-03-06-02424_diagnosis'}] | ||
``` | ||
- primary_diagnosis | ||
This field represents the main cancer diagnosis of this individual. | ||
- age_at_diagnosis | ||
This field represents the age of the individual, in days, on the day the cancer diagnosis was made. | ||
- morphology | ||
Question: What do entries such as `8071/3` mean? Is there a data dictionary for morphology? | ||
- stage | ||
Cancer stage. | ||
- grade | ||
Cancer grade. Note that many tables contain strings such as G3. NCIT has more detailed terms, but we think it best to stick to the top level, and possibly consider postcomposition to represent specific stage systems. | ||
- method_of_diagnosis | ||
This corresponds to | ||
- subject_id | ||
Identifier for the individual being investigated. | ||
- researchsubject_id | ||
Identifier for the researchsubject (which can be a sample or an individual; Question: where is this documented?) | ||
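The structured fields above can be unpacked with the standard library. The following is a minimal sketch, not part of this commit: the helper names `parse_identifier` and `split_morphology` are hypothetical, and it assumes that `diagnosis_identifier` arrives either as a list of dicts or as its Python string representation, and that `morphology` values follow a `histology/behavior` pattern such as `8071/3`.

```python
import ast


def parse_identifier(raw):
    """Turn an identifier cell into a {(system, field_name): value} mapping.

    Assumption: the cell is either a list of dicts or the string form of one.
    """
    entries = ast.literal_eval(raw) if isinstance(raw, str) else raw
    return {(e["system"], e["field_name"]): e["value"] for e in entries}


def split_morphology(code):
    """Split a value such as '8071/3' into (histology, behavior) strings."""
    histology, _, behavior = code.partition("/")
    return histology, behavior


# Example value copied from the diagnosis_identifier description above.
raw = """[{'system': 'GDC',
  'field_name': 'case.diagnoses.diagnosis_id',
  'value': '06af070e-aad4-5b2d-a693-b6ccfe93985a'},
 {'system': 'GDC',
  'field_name': 'case.diagnoses.submitter_id',
  'value': 'HTMCP-03-06-02424_diagnosis'}]"""

ids = parse_identifier(raw)
print(ids[("GDC", "case.diagnoses.submitter_id")])
print(split_morphology("8071/3"))
```

Keying the mapping on `(system, field_name)` keeps entries from different source systems apart if a cell ever lists more than one.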
|
||
|
||
## researchsubject | ||
|
||
|
||
| Column | Example | Explanation | | ||
|:----------------|:---------------|:----------------| | ||
| researchsubject_id | CPTAC-3.C3L-00563 | y | | ||
| researchsubject_identifier | see below | y | | ||
| member_of_research_project | CPTAC-3 | y | | ||
| primary_diagnosis_condition | Adenomas and Adenocarcinomas | y | | ||
| primary_diagnosis_site | Uterus, NOS | y | | ||
| subject_id | CPTAC.C3L-00563 | y | | ||
|
||
|
||
- researchsubject_id | ||
xyz | ||
- researchsubject_identifier | ||
Question: How do we interpret this kind of structure: | ||
``` | ||
[{'system': 'GDC', | ||
'field_name': 'case.case_id', | ||
'value': '2b1894fb-b168-42ca-942f-a5def0bb8309'}, | ||
{'system': 'GDC', 'field_name': 'case.submitter_id', 'value': 'C3L-00563'}] | ||
``` | ||
|
||
- member_of_research_project | ||
Question: Where do we get more information about the research projects? What information is available? | ||
- primary_diagnosis_condition | ||
Question: This seems to duplicate the field `primary_diagnosis` in the diagnosis table. What is the difference? | ||
- primary_diagnosis_site | ||
Todo: we can map this to Uberon. | ||
- subject_id | ||
This relates to the subject_id in other tables. |
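Since both tables carry `researchsubject_id`, one possible way to combine them is a left join from `diagnosis` to `researchsubject`. The sketch below is an assumption about how the ETL could link the tables, not the strategy implemented in this commit. The function name `left_join_diagnosis` is hypothetical, and the second researchsubject row is an illustrative placeholder whose ids are copied from the diagnosis example.

```python
def left_join_diagnosis(diagnoses, researchsubjects):
    """Attach researchsubject columns to each diagnosis row (left join)."""
    by_id = {rs["researchsubject_id"]: rs for rs in researchsubjects}
    joined = []
    for dx in diagnoses:
        # Diagnoses without a matching researchsubject are kept as they are.
        rs = by_id.get(dx["researchsubject_id"], {})
        # Diagnosis values win on shared column names such as subject_id.
        joined.append({**rs, **dx})
    return joined


diagnoses = [{
    "diagnosis_id": "CGCI-HTMCP-CC.HTMCP-03-06-02424.HTMCP-03-06-02424_diagnosis",
    "primary_diagnosis": "Squamous cell carcinoma, keratinizing, NOS",
    "researchsubject_id": "CGCI-HTMCP-CC.HTMCP-03-06-02424",
    "subject_id": "CGCI.HTMCP-03-06-02424",
}]

researchsubjects = [
    # Example row from the researchsubject table above.
    {"researchsubject_id": "CPTAC-3.C3L-00563",
     "member_of_research_project": "CPTAC-3",
     "primary_diagnosis_site": "Uterus, NOS",
     "subject_id": "CPTAC.C3L-00563"},
    # Hypothetical placeholder row; unknown values are left as None.
    {"researchsubject_id": "CGCI-HTMCP-CC.HTMCP-03-06-02424",
     "member_of_research_project": None,
     "primary_diagnosis_site": None,
     "subject_id": "CGCI.HTMCP-03-06-02424"},
]

rows = left_join_diagnosis(diagnoses, researchsubjects)
print(rows[0]["primary_diagnosis"], rows[0]["primary_diagnosis_site"])
```

A left join keeps every diagnosis even when the researchsubject table has no matching row, which makes missing links visible instead of silently dropping records.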
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# oncoexporter | ||
|
||
|
||
Oncoexporter is a Python package that supports extract, transform, load (ETL) | ||
operations for patient data in translational oncology research. | ||
Input data from sources such as the Cancer Data Aggregator (CDA) are | ||
transformed into collections of | ||
[GA4GH Phenopackets](https://github.com/phenopackets/phenopacket-schema){:target="\_blank"}. | ||
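To make the transformation concrete, here is a rough sketch of how a few CDA diagnosis values could be arranged in the JSON layout of a phenopacket. It uses plain dictionaries and does not show oncoexporter's API. The function names, the `-phenopacket` id suffix, and the `NCIT:TODO` placeholder are assumptions made for illustration, and required elements such as `metaData` are omitted.

```python
import json


def days_to_iso8601(days):
    """Express an age given in days as an ISO 8601 duration such as 'P13085D'."""
    return f"P{int(days)}D"


def to_phenopacket_dict(subject_id, primary_diagnosis, age_at_diagnosis_days):
    """Build a minimal phenopacket-like dict (sketch; metaData is omitted)."""
    return {
        "id": f"{subject_id}-phenopacket",  # hypothetical id scheme
        "subject": {"id": subject_id},
        "diseases": [{
            # Placeholder id: mapping the label to an ontology term is not shown.
            "term": {"id": "NCIT:TODO", "label": primary_diagnosis},
            "onset": {"age": {"iso8601duration": days_to_iso8601(age_at_diagnosis_days)}},
        }],
    }


# Values taken from the example row of the CDA diagnosis table.
packet = to_phenopacket_dict(
    "CGCI.HTMCP-03-06-02424",
    "Squamous cell carcinoma, keratinizing, NOS",
    13085.0,
)
print(json.dumps(packet, indent=2))
```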
|
||
|
||
|
||
### Feedback | ||
|
||
|
||
The best place to leave feedback, ask questions, and report bugs is the | ||
[Oncoexporter Issue Tracker](https://github.com/monarch-initiative/oncoexporter/issues). | ||
|
This file was deleted.