{{ cookiecutter.project_name }}

{{ cookiecutter.description }}

File Structure

├── .devcontainer                      # Definition of the Docker container and environment for VS Code
│   ├── Dockerfile                     # Defines the Docker container
│   ├── devcontainer.json              # Defines the devcontainer settings for VS Code
│   └── noop.txt                       # Placeholder file to ensure the COPY instruction does not fail if no environment.yml exists
├── .gitattributes                     # Git attributes for handling line endings and merge strategies
├── .gitignore                         # Git ignore file to exclude files and directories from version control
├── Makefile                           # Makefile with commands like `make data` and `make clean`
├── README.md                          # Project readme
├── code                               # Source code and notebooks
│   ├── notebooks                      # Jupyter notebooks
│   │   └── exploratory                # Data explorations
│   │       └── 1.0-tg-example.ipynb   # Jupyter notebook with naming conventions. tg are initials
│   ├── project_package                # Project-specific Python package
│   │   ├── __init__.py                # Makes project_package a Python module
│   │   ├── data                       # Scripts to download, generate and parse data
│   │   │   ├── __init__.py
│   │   │   ├── config.py              # Project-wide path definitions
│   │   │   ├── example.py             # Example script
│   │   │   ├── import_data.py         # Functions to read raw data
│   │   │   └── make_dataset.py        # Scripts to download or generate data (used in the Makefile)
│   │   ├── tools                      # Scripts and functions for general use
│   │   │   ├── __init__.py
│   │   │   └── convert_latex.py       # Functions to convert elements for use in LaTeX
│   │   └── visualization              # Scripts and functions to create visualizations
│   │       ├── __init__.py
│   │       ├── make_plots.py          # Scripts to make all plots for the publication
│   │       └── visualize.py           # Functions to produce final plots
│   └── pyproject.toml                 # Configuration file for the project
├── data                               # Data directories
│   ├── 01_raw                         # The original, immutable data dump
│   │   └── demo.csv                   # Example raw data file
│   ├── 02_intermediate                # Intermediate processed data
│   ├── 03_primary                     # cleaned data, used for the dissemination
│   ├── 04_feature                     # For Machine learning, features based on the primary data
│   ├── 05_model_input                 # The final data used for machine learning
│   ├── 06_models                      # Stored, serialized pre-trained machine learning models
│   ├── 07_model_output                # Output from trained machine learning models
│   └── 08_reporting                   # Reporting data like log files
├── dissemination                      # Materials for dissemination
│   ├── figures                        # Figures for paper generated with Python
│   │   └── demo.png                   # Example figure file
│   ├── presentations                  # All related PowerPoint files, especially for deliverables
│   └── papers                         # LaTeX-based papers
│       └── paper.tex                  # Example LaTeX paper
├── environment.yml                    # Conda environment configuration file
└── literature                         # References and explanatory materials
    └── references.bib                 # Bibliography file for LaTeX documents

Important

  • Raw data is immutable: Do not change the data in data/01_raw.
  • Reusable functions: Develop reusable functions in Jupyter notebooks and then put them in the project_package with docstrings and type hints.
  • VS Code settings: Some settings are already defined in devcontainer.json.
  • Default shell: The default shell inside the container is zsh with the p10k theme.
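
As an illustration of the "reusable functions" point above, here is a minimal sketch of a helper that has been moved out of a notebook and given a docstring and type hints. The function `moving_average` is a hypothetical example and is not part of the template.

```python
from collections.abc import Sequence


def moving_average(values: Sequence[float], window: int) -> list[float]:
    """Compute the simple moving average of a sequence.

    Args:
        values: Numeric samples in their original order.
        window: Number of consecutive samples per average.

    Returns:
        One average per full window, so the result has
        len(values) - window + 1 entries.

    Raises:
        ValueError: If window is smaller than 1 or larger than len(values).
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[start : start + window]) / window
        for start in range(len(values) - window + 1)
    ]
```

Once such a function lives in project_package, notebooks can import it instead of redefining it in each notebook.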

Project-Specific Packages and Settings

You can customize the development environment in multiple ways:

  • Add Python packages: Modify the environment.yml file to include additional Python packages.
  • Add Dev Container features: Use the VS Code command Dev Containers: Configure Container Features to add features like R, Julia, and more.
  • Modify Dockerfile: Update the Dockerfile in .devcontainer to add additional software not available as Dev Container features.
  • Install LaTeX packages: Add LaTeX packages using the postCreateCommand in devcontainer.json.

Working with Jupyter Notebooks

Open and run Jupyter notebooks directly in VS Code, which provides an integrated notebook editor inside the container.

Working with LaTeX

An example LaTeX file is provided in dissemination/papers. The LaTeX extension is also pre-installed. To compile the LaTeX file:

  • Open the file.
  • Use the TeX symbol on the side panel.
  • Select Build LaTeX project and use the recipe: pdflatex -> biber -> pdflatex*2.

Export figures to dissemination/figures. The path is already defined in project_package.data.config:

from project_package.data import config

filename = config.FIGURES_FOLDER.joinpath("example.png")
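
For orientation, a minimal sketch of how a module like config.py might define such paths with pathlib. Only `FIGURES_FOLDER` appears in this README; `PROJECT_ROOT`, `DATA_FOLDER`, and `RAW_DATA_FOLDER` are assumed names, and the actual config.py in the template may be organized differently.

```python
from pathlib import Path

# Assumption: the project root is the current working directory.
# A real config.py might instead derive it from the location of the module file.
PROJECT_ROOT = Path.cwd()

# Hypothetical names for the data folders described in the file structure.
DATA_FOLDER = PROJECT_ROOT / "data"
RAW_DATA_FOLDER = DATA_FOLDER / "01_raw"

# FIGURES_FOLDER is the name used in this README.
FIGURES_FOLDER = PROJECT_ROOT / "dissemination" / "figures"

# Build the target path for a figure, as in the snippet above.
filename = FIGURES_FOLDER.joinpath("example.png")
```

Keeping all paths in one module means notebooks and scripts never hard-code folder locations.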

Use the functions in project_package/tools/ to convert outputs such as CSV, PDF, and PNG files for use in LaTeX.
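
To give an idea of what such a conversion can look like, here is a minimal sketch that turns CSV text into a LaTeX tabular environment using only the standard library. The functions `escape_latex` and `csv_to_latex_tabular` are hypothetical and are not the functions shipped in convert_latex.py; the escaping covers only a few common special characters.

```python
import csv
import io

# Small, incomplete set of characters that need escaping in LaTeX text.
_LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "_": r"\_", "#": r"\#", "$": r"\$"}


def escape_latex(text: str) -> str:
    """Escape a few LaTeX special characters in a table cell."""
    return "".join(_LATEX_SPECIALS.get(char, char) for char in text)


def csv_to_latex_tabular(csv_text: str) -> str:
    """Convert CSV text with a header row into a left-aligned tabular."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV input is empty")
    header, *body = rows
    lines = [r"\begin{tabular}{" + "l" * len(header) + "}", r"\hline"]
    lines.append(" & ".join(escape_latex(cell) for cell in header) + r" \\")
    lines.append(r"\hline")
    for row in body:
        lines.append(" & ".join(escape_latex(cell) for cell in row) + r" \\")
    lines.append(r"\hline")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)
```

The returned string can be written to a .tex file and pulled into the paper with `\input`.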

To redo all plots for the publication, run:

make plots

This command runs code/project_package/visualization/make_plots.py. Add all your final plot functions there so that a single command regenerates every plot for the publication.
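
A minimal sketch of how such a script could be organized, assuming each final plot is a function that takes an output folder and returns the path of the figure it writes. The names `plot_demo`, `ALL_PLOTS`, and `make_all_plots` are hypothetical and not taken from the template, and the placeholder function does not render anything.

```python
from pathlib import Path
from typing import Callable

# A plot function receives the output folder and returns the figure path.
PlotFunction = Callable[[Path], Path]


def plot_demo(output_dir: Path) -> Path:
    """Placeholder for a final plot function."""
    target = output_dir / "demo.png"
    # A real implementation would build the figure with a plotting
    # library and save it to `target` here.
    return target


# Register every final plot function here (hypothetical registry).
ALL_PLOTS: list[PlotFunction] = [plot_demo]


def make_all_plots(output_dir: Path) -> list[Path]:
    """Call every registered plot function and collect the figure paths."""
    return [plot(output_dir) for plot in ALL_PLOTS]


if __name__ == "__main__":
    for figure in make_all_plots(Path("dissemination/figures")):
        print(f"created {figure}")
```

With a registry like this, adding a new figure to the publication only requires appending its function to the list.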

Data Handling

  • Small datasets: Save small datasets like CSV files directly in data/01_raw and commit them to the repo.
  • Collect data from external sources: Write functions to collect data from servers or databases in code/project_package/data/make_dataset.py.
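
A minimal sketch of a data collection function, assuming the download itself is supplied as a callable that returns bytes. The name `collect_raw_file` and its parameters are hypothetical; the sketch skips files that already exist so that raw data is never overwritten.

```python
from pathlib import Path
from typing import Callable


def collect_raw_file(
    fetch: Callable[[], bytes], raw_dir: Path, filename: str
) -> Path:
    """Store fetched bytes in raw_dir unless the file already exists.

    Args:
        fetch: Callable that retrieves the data, for example from a server.
        raw_dir: Folder for raw data, such as data/01_raw.
        filename: Name of the file to create inside raw_dir.

    Returns:
        Path of the raw data file.
    """
    target = raw_dir / filename
    if target.exists():
        # Raw data is treated as immutable, so an existing file is kept as-is.
        return target
    raw_dir.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch())
    return target
```

Passing the download step as a callable keeps the function easy to test without network access.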

To run the data collection function, execute:

make data

Alternatively, mount a data folder from the host into the container by adding the following line to devcontainer.json:

"mounts": ["source=WHEREVER_YOUR_DATA_IS,target=/workspace/data/01_raw/,type=bind,consistency=cached"]

Replace WHEREVER_YOUR_DATA_IS with the path to the data on the host machine, such as /home/user/data, which will be mapped to data/01_raw in the container.

Running Tasks in VS Code

This project exposes several Makefile targets as VS Code tasks. You can run them directly from VS Code using the Tasks: Run Task command from the Command Palette (Ctrl+Shift+P).

Available Tasks

  • Make Data: Generates the dataset by running the data creation scripts.
  • Make Plots: Creates all plots for the publication.
  • Make Paper: Compiles the LaTeX paper.
  • Make Clean: Deletes all temporary compiled Python and LaTeX files.
  • Make delete_demo: Deletes all demo files.

To run a task:

1.	Open the Command Palette (Ctrl+Shift+P).
2.	Select Tasks: Run Task.
3.	Choose the desired task from the list.

These tasks are configured in the .vscode/tasks.json file.

More Info

Made with the template from https://github.com/tgoelles/cookiecutter_science, template version 2.1.0.

Contact: thomas.goelles@gmail.com