Commit

[DOC] Adding development docs to make sure it is clear how to do dev (#150)

* Adding development docs to make sure it is clear how to do dev

Signed-off-by: Adam Li <adam2392@gmail.com>

* Update reqs

Signed-off-by: Adam Li <adam2392@gmail.com>

* Fix rendering of cli commands

Signed-off-by: Adam Li <adam2392@gmail.com>

---------

Signed-off-by: Adam Li <adam2392@gmail.com>
adam2392 authored Oct 19, 2023
1 parent 9ad065e commit 860d197
Showing 2 changed files with 117 additions and 7 deletions.
122 changes: 116 additions & 6 deletions DEVELOPING.md
@@ -1,14 +1,103 @@
<!-- TOC -->

- [Requirements](#requirements)
- [Setting up your development environment](#setting-up-your-development-environment)
- [Building the project from source](#building-the-project-from-source)
- [Development Tasks](#development-tasks)
- [Basic Verification](#basic-verification)
- [Docsite](#docsite)
- [Details](#details)
- [Coding Style](#coding-style)
- [Lint](#lint)
- [Type checking](#type-checking)
- [Unit tests](#unit-tests)
- [Advanced Updating submodules](#advanced-updating-submodules)
- [Cython and C++](#cython-and-c)
- [Making a Release](#making-a-release)

<!-- /TOC -->

# Requirements
* Python 3.8+
* Poetry (`curl -sSL https://install.python-poetry.org | python - --version=1.2.2`)

For the other requirements, inspect the ``pyproject.toml`` file. If you have updated the dependencies, please run `poetry update` to update the
* Python 3.9+
* numpy>=1.25
* scipy>=1.11
* scikit-learn>=1.3.1

For the other requirements, inspect the ``pyproject.toml`` file.
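
To sanity-check an installed version against the minimums listed above, a small shell helper can compare version strings. This is a minimal sketch, not part of the repository: `version_ge` is a hypothetical name, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# Hypothetical helper (not part of scikit-tree): succeed when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    # The smaller of the two versions sorts first; $1 >= $2 exactly when that is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge "$(python -c 'import numpy; print(numpy.__version__)')" 1.25` succeeds when the installed numpy meets the minimum above. Pre-release suffixes are not handled specially in this sketch.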

# Setting up your development environment

We recommend using miniconda, as Python virtual environments may not properly set up the compilers necessary for our compiled code. For detailed information on setting up and managing conda environments, see https://conda.io/docs/test-drive.html.

<!-- Setup a conda env -->

    conda create -n sktree
    conda activate sktree

**Make sure you specify a Python version if your system defaults to anything less than Python 3.9.**
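
One way to pin the Python version is to pass a `python=<version>` package spec to `conda create`. The sketch below wraps that in a function and adds a guard that checks whether a non-base environment is active. Both function names are hypothetical, and the default of 3.9 is only an assumption taken from the minimum version stated above.

```shell
# Hypothetical helper: create the "sktree" environment with an explicit Python version.
# $1 is the Python version; it defaults to 3.9, the minimum stated in the requirements.
create_sktree_env() {
    conda create -n sktree "python=${1:-3.9}"
}

# Hypothetical guard: succeed only when a non-base conda environment is active.
# CONDA_DEFAULT_ENV is set by `conda activate` to the name of the active environment.
in_conda_env() {
    [ -n "${CONDA_DEFAULT_ENV:-}" ] && [ "${CONDA_DEFAULT_ENV}" != "base" ]
}
```

For instance, `create_sktree_env 3.11` would create the environment with Python 3.11, and `in_conda_env || echo "activate sktree first"` is one way to guard a script.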

**Always run commands after you have activated your conda environment.**

Next, install the necessary build dependencies. For more information, see https://scikit-learn.org/stable/developers/advanced_installation.html.

    conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

Assuming these steps have worked properly and you have read and followed any necessary scikit-learn advanced installation instructions, you can then install dependencies for scikit-tree.

If you are developing locally, you will need the build dependencies to compile the Cython / C++ code:

    pip install -r build_requirements.txt

Other requirements can be installed as follows:

    pip install -r requirements.txt
    pip install -r style_requirements.txt
    pip install -r test_requirements.txt
    pip install -r doc_requirements.txt
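
The requirement files above can also be installed in one pass. The following is a sketch only: `install_dev_requirements` is a hypothetical helper, and it assumes it is run from the repository root where the requirement files live.

```shell
# Hypothetical helper: install every requirements file named above, build deps first.
# Assumes the current directory is the repository root.
install_dev_requirements() {
    for req in build_requirements.txt requirements.txt style_requirements.txt \
               test_requirements.txt doc_requirements.txt; do
        if [ -f "$req" ]; then
            # Stop at the first failed install so errors are not buried.
            pip install -r "$req" || return 1
        else
            echo "skipping $req (not found in $(pwd))" >&2
        fi
    done
}
```

Missing files are reported on stderr and skipped, so the helper still works if you only need a subset of the files.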

# Building the project from source

We leverage meson to build scikit-tree from source. We utilize a CLI tool, called [spin](https://github.com/scientific-python/spin), which wraps certain meson commands to make building easier.

For example, the following command will build the project completely from scratch:

    spin build --clean

If you have part of the build already done, you can run:

    spin build

The following command will test the project:

    spin test

For other commands, see:

    spin --help

Note that at this stage, you will be unable to run Python commands directly. For example, ``pytest ./sktree`` will not work.

However, after installing and building the project from source using meson, you can leverage editable installs to make testing code changes much faster. For more information on meson-python's support for editable installs, see https://meson-python.readthedocs.io/en/latest/how-to-guides/editable-installs.html.

    pip install --no-build-isolation --editable .

**Note: editable installs for scikit-tree REQUIRE you to have built the project using meson already.** This will now link the meson build to your Python runtime. Now if you run

    pytest ./sktree

the unit tests should run.
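
The build, editable install, and test steps above can be chained so that a failure in one step stops the sequence. This is a minimal sketch under the assumption that it is run from the repository root inside the activated environment; `rebuild_and_test` is a hypothetical name, not a command provided by the project.

```shell
# Hypothetical helper: build with meson via spin, link the build with an
# editable install, then run the unit tests. Extra arguments (for example
# --clean) are forwarded to `spin build`.
rebuild_and_test() {
    spin build "$@" &&
        pip install --no-build-isolation --editable . &&
        pytest ./sktree
}
```

Calling `rebuild_and_test --clean` rebuilds from scratch; calling it without arguments reuses any existing partial build.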

# Development Tasks
There are a series of top-level tasks available through Poetry. These can each be run via
There are a series of top-level tasks available through Poetry. If you have updated the dependencies, please run `poetry update` to update the lock file. These can each be run via

`poetry run poe <taskname>`

To do so, first install poetry and poethepoet.

    pip install poetry poethepoet

Now, you are ready to run quick commands to format the codebase, lint the codebase and type-check the codebase.

### Basic Verification
* **format** - runs the suite of formatting tools, applying fixes to make the code compliant
* **format_check** - runs the suite of formatting tools checking for compliance
@@ -53,6 +142,23 @@ In order for any code to be added to the repository, we require unit tests to pa

    poetry run poe unit_test
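
Before opening a PR, it can be convenient to run several tasks in sequence. The sketch below uses only the task names that appear in this document (`format_check` and `unit_test`); `run_checks` is a hypothetical helper, and other tasks can be added to the list once you have confirmed their names in ``pyproject.toml``.

```shell
# Hypothetical helper: run a fixed list of poe tasks, stopping at the first failure.
run_checks() {
    for task in format_check unit_test; do
        poetry run poe "$task" || {
            echo "task '$task' failed" >&2
            return 1
        }
    done
}
```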

# (Advanced) Updating submodules

Scikit-tree relies on a submodule of a forked version of scikit-learn for certain Python and Cython code that extends the ``DecisionTree*`` models. Usually, a developer making changes should go to the ``submodulev3`` branch on ``https://github.com/neurodata/scikit-learn`` and
submit a PR to make changes to the submodule.

This should **ALWAYS** be supported by some use-case in scikit-tree. We want to keep code changes in our forked version of scikit-learn to a minimum, so that it stays easy to merge in upstream changes, bug fixes and features for tree-based code.

Once a PR is submitted and merged, the developer can update the submodule here in scikit-tree, so that we leverage the new commit. You **must** update the submodule commit ID and also commit this change, so that the build uses the new submodule commit ID.

    git submodule update --init --recursive --remote
    git add -A
    git commit -m "Update submodule" -s

Now, you can re-build the project using the latest submodule changes.

    spin build --clean
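
The submodule update above can be wrapped so that no commit is attempted when the submodule pointer did not change. This is a sketch, not a project-provided script: `update_submodule` is a hypothetical name, and it assumes a clean working tree, because `git add -A` stages every change in the repository.

```shell
# Hypothetical helper: pull the latest submodule commit and record it.
# Assumes a clean working tree, since `git add -A` stages all changes.
update_submodule() {
    git submodule update --init --recursive --remote || return 1
    git add -A || return 1
    # `git diff --cached --quiet` exits 0 when nothing is staged.
    if git diff --cached --quiet; then
        echo "submodule already up to date; nothing to commit"
        return 0
    fi
    git commit -m "Update submodule" -s
}
```

A typical use would be `update_submodule && spin build --clean`, which rebuilds only if the update and commit steps succeeded.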

# Cython and C++
The general design of scikit-tree follows that of the tree models inside scikit-learn, where tree-based models are inherently Cythonized or written in C++. The actual forest (e.g. RandomForest or ExtraForest) is then just a Python API wrapper that creates an ensemble of the trees.

@@ -68,13 +174,17 @@ https://github.com/neurodata/scikit-tree/actions/workflows/build_wheels.yml will

2. Upload wheels to test PyPi

twine upload --repository-url https://test.pypi.org/legacy/ dist/*
```
twine upload --repository-url https://test.pypi.org/legacy/ dist/*
```

Verify that installations work as expected on your machine.

3. Upload wheels

twine upload dist/*
```
twine upload dist/*
```

or if you have two-factor authentication enabled: https://pypi.org/help/#apitoken

2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,3 +1,3 @@
numpy>=1.25
scipy
scipy>=1.11
scikit-learn>=1.3.1
