Add poetry and linting #450

Merged
12 changes: 6 additions & 6 deletions .github/workflows/docs.yml
@@ -12,12 +12,12 @@ jobs:
- name: Dependencies
run: |
python -m pip install --upgrade pip wheel
pip install -r requirements.txt
pip install -r requirements-dev.txt
- name: Build Docs
uses: ammaraskar/sphinx-action@master
with:
docs-folder: "docs/"
python -m pip install poetry
poetry install
- name: Directly build docs
run: |
pip install -r docs/requirements.txt
sphinx-build docs/source ./docs/build/html/
- name: Deploy Docs
uses: peaceiris/actions-gh-pages@v3
with:
13 changes: 6 additions & 7 deletions .github/workflows/pypi-publish.yml
@@ -11,16 +11,15 @@ jobs:
steps:
- name: Clone
uses: actions/checkout@v2
- name: Set up Python 3.7
- name: Set up Python 3.12
uses: actions/setup-python@v2
with:
python-version: 3.7
python-version: 3.12
- name: Build package
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements-dev.txt
pip install wheel
python setup.py bdist_wheel sdist
python -m pip install --upgrade pip wheel
python -m pip install poetry
poetry install
poetry build
- name: Publish package
uses: pypa/gh-action-pypi-publish@release/v1
9 changes: 5 additions & 4 deletions .github/workflows/test-docs-build.yml
@@ -7,12 +7,13 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.10']
python-version: ['3.12']
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- uses: ammaraskar/sphinx-action@master
with:
docs-folder: "docs/"
- name: directly build sphinx (plugin only supports python 3.8)
run: |
pip install -r docs/requirements.txt
sphinx-build docs/source ./docs/build/html/
8 changes: 4 additions & 4 deletions .github/workflows/test-suite.yml
@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.7', '3.8', '3.9', '3.10']
python-version: ['3.10', '3.11', '3.12', '3.13']

steps:
- uses: actions/checkout@v2
@@ -26,8 +26,8 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip wheel
python -m pip install -r requirements.txt
python -m pip install -r requirements-dev.txt
python -m pip install poetry
poetry install
- name: Test with pytest
run: |
pytest
poetry run pytest tests
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,12 @@
unreleased
==========

* Refactor: Use poetry as packaging tool
* Refactor: Add more typing
* Change `feature_names_in_` and `feature_names_out_` to `np.ndarray` instead of lists.
* Breaking: Do not allow scalar values as target variable (of length 1) anymore
* Breaking: Force dataframe column names to be strings.

v2.6.4
======
* fixed: Future Warning in Pandas
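
Note on the `feature_names_in_` / `feature_names_out_` entry above: callers that treated these attributes as plain lists may need small adjustments once they are `np.ndarray`. The sketch below uses a made-up array as a stand-in for the attribute (no encoder is fitted here, and the column names are hypothetical) and shows two patterns that should stay safe under that assumption.

```python
import numpy as np

# Hypothetical stand-in for an encoder's `feature_names_out_` attribute;
# the names are invented for illustration only.
feature_names_out = np.array(["Id", "Heating_0", "Heating_1"])

# `array == list` compares element by element, so use np.array_equal when a
# single True/False answer is wanted.
same = np.array_equal(feature_names_out, ["Id", "Heating_0", "Heating_1"])

# Convert explicitly when list-only methods such as `.index` are needed.
as_list = feature_names_out.tolist()
position = as_list.index("Heating_1")

print(same, position)
```

Code that only iterates over the names or indexes into them by position should behave the same with either type.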
8 changes: 6 additions & 2 deletions CONTRIBUTING.md
@@ -16,14 +16,18 @@ How to Contribute
The preferred workflow to contribute to category_encoders is:

1. Fork this repository into your own GitHub account.
2. Clone the fork on your account onto your local disk:

2. Clone the fork and install the project via Poetry:
```
$ git clone git@github.com:YourLogin/category_encoders.git
$ cd category_encoders
$ poetry install
```
3. Create a branch for your new awesome feature, do not work in the master branch:
```
$ git checkout -b new-awesome-feature
```
4. Write some code, or docs, or tests.
5. When you are done, submit a pull request.
65 changes: 32 additions & 33 deletions category_encoders/__init__.py
@@ -1,4 +1,4 @@
"""
"""Category encoders library.

.. module:: category_encoders
:synopsis:
@@ -7,51 +7,50 @@
"""

from category_encoders.backward_difference import BackwardDifferenceEncoder
from category_encoders.basen import BaseNEncoder
from category_encoders.binary import BinaryEncoder
from category_encoders.gray import GrayEncoder
from category_encoders.cat_boost import CatBoostEncoder
from category_encoders.count import CountEncoder
from category_encoders.glmm import GLMMEncoder
from category_encoders.gray import GrayEncoder
from category_encoders.hashing import HashingEncoder
from category_encoders.helmert import HelmertEncoder
from category_encoders.james_stein import JamesSteinEncoder
from category_encoders.leave_one_out import LeaveOneOutEncoder
from category_encoders.m_estimate import MEstimateEncoder
from category_encoders.one_hot import OneHotEncoder
from category_encoders.ordinal import OrdinalEncoder
from category_encoders.sum_coding import SumEncoder
from category_encoders.polynomial import PolynomialEncoder
from category_encoders.basen import BaseNEncoder
from category_encoders.leave_one_out import LeaveOneOutEncoder
from category_encoders.quantile_encoder import QuantileEncoder, SummaryEncoder
from category_encoders.rankhot import RankHotEncoder
from category_encoders.sum_coding import SumEncoder
from category_encoders.target_encoder import TargetEncoder
from category_encoders.woe import WOEEncoder
from category_encoders.m_estimate import MEstimateEncoder
from category_encoders.james_stein import JamesSteinEncoder
from category_encoders.cat_boost import CatBoostEncoder
from category_encoders.rankhot import RankHotEncoder
from category_encoders.glmm import GLMMEncoder
from category_encoders.quantile_encoder import QuantileEncoder, SummaryEncoder


__version__ = '2.6.4'

__author__ = "willmcginnis", "cmougan", "paulwestenthanner"
__author__ = 'willmcginnis', 'cmougan', 'paulwestenthanner'

__all__ = [
"BackwardDifferenceEncoder",
"BinaryEncoder",
"GrayEncoder",
"CountEncoder",
"HashingEncoder",
"HelmertEncoder",
"OneHotEncoder",
"OrdinalEncoder",
"SumEncoder",
"PolynomialEncoder",
"BaseNEncoder",
"LeaveOneOutEncoder",
"TargetEncoder",
"WOEEncoder",
"MEstimateEncoder",
"JamesSteinEncoder",
"CatBoostEncoder",
"GLMMEncoder",
"QuantileEncoder",
"SummaryEncoder",
'BackwardDifferenceEncoder',
'BinaryEncoder',
'GrayEncoder',
'CountEncoder',
'HashingEncoder',
'HelmertEncoder',
'OneHotEncoder',
'OrdinalEncoder',
'SumEncoder',
'PolynomialEncoder',
'BaseNEncoder',
'LeaveOneOutEncoder',
'TargetEncoder',
'WOEEncoder',
'MEstimateEncoder',
'JamesSteinEncoder',
'CatBoostEncoder',
'GLMMEncoder',
'QuantileEncoder',
'SummaryEncoder',
'RankHotEncoder',
]
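
Note on the `__all__` change above: the new list gains `'RankHotEncoder'`, which the module imported but the old list did not include. A check along the following lines could catch that kind of drift. It is only a sketch: `missing_from_all` is a hypothetical helper, not part of this PR, and it only handles `from ... import ...` statements and a literal `__all__` list.

```python
import ast


def missing_from_all(source):
    """Return names imported via `from ... import ...` that `__all__` omits."""
    tree = ast.parse(source)
    imported = set()
    exported = set()
    for node in tree.body:
        if isinstance(node, ast.ImportFrom):
            # Record the name each import binds in the module namespace.
            imported.update(alias.asname or alias.name for alias in node.names)
        elif isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if "__all__" in targets:
                # Assumes `__all__` is a literal list of strings.
                exported.update(ast.literal_eval(node.value))
    return imported - exported


# Trimmed, hypothetical copy of a package __init__.py.
example = """
from category_encoders.rankhot import RankHotEncoder
from category_encoders.woe import WOEEncoder

__all__ = ['WOEEncoder']
"""

print(missing_from_all(example))  # {'RankHotEncoder'}
```

Because it parses the source text instead of importing the package, such a check could run in a test without installing the encoders' dependencies.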
41 changes: 24 additions & 17 deletions category_encoders/backward_difference.py
@@ -1,7 +1,7 @@
"""Backward difference contrast encoding"""
"""Backward difference contrast encoding."""

from patsy.contrasts import Diff, ContrastMatrix
import numpy as np
from patsy.contrasts import ContrastMatrix, Diff

from category_encoders.base_contrast_encoder import BaseContrastEncoder

@@ -13,31 +13,39 @@ class BackwardDifferenceEncoder(BaseContrastEncoder):

Parameters
----------

verbose: int
integer indicating verbosity of the output. 0 for none.
cols: list
a list of columns to encode, if None, all string columns will be encoded.
drop_invariant: bool
boolean for whether or not to drop columns with 0 variance.
return_df: bool
boolean for whether to return a pandas DataFrame from transform (otherwise it will be a numpy array).
boolean for whether to return a pandas DataFrame from transform
(otherwise it will be a numpy array).
handle_unknown: str
options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'. Warning: if indicator is used,
an extra column will be added in if the transform matrix has unknown categories. This can cause
unexpected changes in dimension in some cases.
options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'.
Warning: if indicator is used, an extra column will be added in if the transform matrix
has unknown categories. This can cause unexpected changes in dimension in some cases.
handle_missing: str
options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'. Warning: if indicator is used,
an extra column will be added in if the transform matrix has nan values. This can cause
unexpected changes in dimension in some cases.
options are 'error', 'return_nan', 'value', and 'indicator'. The default is 'value'.
Warning: if indicator is used, an extra column will be added in if the transform
matrix has nan values. This can cause unexpected changes in dimension in some cases.

Example
-------
>>> from category_encoders import *
>>> import pandas as pd
>>> from sklearn.datasets import fetch_openml
>>> bunch = fetch_openml(name="house_prices", as_frame=True)
>>> display_cols = ["Id", "MSSubClass", "MSZoning", "LotFrontage", "YearBuilt", "Heating", "CentralAir"]
>>> bunch = fetch_openml(name='house_prices', as_frame=True)
>>> display_cols = [
... 'Id',
... 'MSSubClass',
... 'MSZoning',
... 'LotFrontage',
... 'YearBuilt',
... 'Heating',
... 'CentralAir',
... ]
>>> y = bunch.target
>>> X = pd.DataFrame(bunch.data, columns=bunch.feature_names)[display_cols]
>>> enc = BackwardDifferenceEncoder(cols=['CentralAir', 'Heating']).fit(X, y)
@@ -46,12 +54,11 @@ class BackwardDifferenceEncoder(BaseContrastEncoder):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 intercept 1460 non-null int64
# Column Non-Null Count Dtype
--- ------ -------------- -----
1 Id 1460 non-null float64
2 MSSubClass 1460 non-null float64
3 MSZoning 1460 non-null object
3 MSZoning 1460 non-null object
4 LotFrontage 1201 non-null float64
5 YearBuilt 1460 non-null float64
6 Heating_0 1460 non-null float64
@@ -76,5 +83,5 @@ class BackwardDifferenceEncoder(BaseContrastEncoder):
"""

def get_contrast_matrix(self, values_to_encode: np.ndarray) -> ContrastMatrix:
"""Get the contrast matrix for the backward difference encoder."""
return Diff().code_without_intercept(values_to_encode)
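
Note on `get_contrast_matrix` above: for readers unfamiliar with backward difference coding, the matrix can be written out by hand. The sketch below is plain Python and does not call patsy; `backward_difference_matrix` is a hypothetical helper, and the claim that patsy's `Diff` yields the same values is my understanding of the standard coding, which should be verified against patsy before relying on it.

```python
def backward_difference_matrix(n_levels):
    """Build the n_levels x (n_levels - 1) backward difference contrast matrix."""
    if n_levels < 2:
        raise ValueError("need at least two levels to form a contrast")
    matrix = []
    for row in range(1, n_levels + 1):
        # Column j contrasts level j + 1 with level j. Rows up to j get
        # (j - k) / k and rows after j get j / k, so each column sums to zero.
        matrix.append([
            (col - n_levels) / n_levels if row <= col else col / n_levels
            for col in range(1, n_levels)
        ])
    return matrix


for line in backward_difference_matrix(4):
    print(line)
# [-0.75, -0.5, -0.25]
# [0.25, -0.5, -0.25]
# [0.25, 0.5, -0.25]
# [0.25, 0.5, 0.75]
```

Each category level maps to one row, so a column with `k` distinct levels is replaced by `k - 1` encoded columns.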
