Handle torch.device in evaluation function
chrislemke committed Nov 9, 2022
1 parent 178e8aa commit b4aabac
Showing 4 changed files with 46 additions and 38 deletions.
14 changes: 12 additions & 2 deletions autoembedder/evaluator.py
@@ -86,13 +86,23 @@ def __predict(
float: Loss value.
"""

device = torch.device(
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available() and parameters.get("use_mps", 0) == 1
    else "cpu"
)

with torch.no_grad():
model.eval()
cat, cont = model_input(batch, parameters)
cat = rearrange(cat, "c r -> r c")
cont = rearrange(cont, "c r -> r c")
cat = __adjust_dtype(cat, model)
cont = __adjust_dtype(cont, model)
cat = __adjust_dtype(cat, model).to(device)
cont = __adjust_dtype(cont, model).to(device)
out = model(cat, cont)
return loss_fn(out, model.last_target).item()
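For readers skimming the diff, the device selection introduced above can be read as a small standalone helper. The sketch below is illustrative only and not part of the package; the name `select_device` is made up here, and it assumes the `parameters` dict documented in `docs/README.md`.

```python
import torch


def select_device(parameters: dict) -> torch.device:
    """Prefer CUDA; fall back to MPS only when `use_mps` is set to 1; otherwise use CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available() and parameters.get("use_mps", 0) == 1:
        return torch.device("mps")
    return torch.device("cpu")


# Mirrors the `.to(device)` calls added in `__predict`: inputs are moved to the
# selected device before the forward pass.
device = select_device({"use_mps": 1})
x = torch.randn(4, 8).to(device)
```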

8 changes: 3 additions & 5 deletions autoembedder/training.py
@@ -83,7 +83,7 @@ def main() -> None:
parser.add_argument("--verbose", type=int, required=False, default=1)

parser.add_argument(
"--hidden_layer_representation",
"--hidden_layers",
type=str,
required=True,
help="""
@@ -105,11 +105,9 @@ def main() -> None:

args, _ = parser.parse_known_args()
args.cat_columns = args.cat_columns.replace("\\", "")
args.hidden_layer_representation = args.hidden_layer_representation.replace(
"\\", ""
)
args.hidden_layers = args.hidden_layers.replace("\\", "")
m_config = {
"hidden_layers": ast.literal_eval(args.hidden_layer_representation),
"hidden_layers": ast.literal_eval(args.hidden_layers),
"layer_bias": args.layer_bias,
}
__prepare_and_fit(vars(args), m_config)
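As a side note, the renamed `--hidden_layers` argument is still a plain string that `ast.literal_eval` turns into nested Python lists. A minimal, standalone sketch of that round trip (not the training script itself):

```python
import ast

# The CLI receives the layer structure as a string, e.g. from
# --hidden_layers "[[64, 32], [32, 16], [16, 8]]"
hidden_layers_arg = "[[64, 32], [32, 16], [16, 8]]"

# ast.literal_eval safely evaluates the literal into a list of lists of ints.
hidden_layers = ast.literal_eval(hidden_layers_arg)
assert hidden_layers == [[64, 32], [32, 16], [16, 8]]

m_config = {"hidden_layers": hidden_layers, "layer_bias": 1}
print(m_config)
```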
60 changes: 30 additions & 30 deletions docs/README.md
@@ -72,36 +72,36 @@ fit(parameters, model, train_dl, valid_dl)
Check out [this Jupyter notebook](https://github.com/chrislemke/autoembedder/blob/main/example.ipynb) for an applied example using the [Credit Card Fraud Detection](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud) from Kaggle.

## Parameters
This is a list of all parameters that can be passed to the Autoembedder for training:
This is a list of all parameters that can be passed to the Autoembedder for training. The `Required`, `Default value`, and `Comment` columns only apply when using the training script (`training.py`); a sketch of passing these parameters directly from Python follows the table:

| Argument | Type | Required (only for running using the `training.py`) | Default value | Comment |
| ---------------------------------- | ----- | --------------------------------------------------- | --------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| batch_size | int | False | 32 | |
| drop_last | int | False | 1 | True/False |
| pin_memory | int | False | 1 | True/False |
| num_workers | int | False | 0 | 0 means that the data will be loaded in the main process |
| use_mps | int | False | 0 | Set this to `1` if you want to use the [MPS Backend](https://pytorch.org/docs/master/notes/mps.html) for running on Mac using the M1 GPU. process |
| model_title | str | False | autoembedder_{`datetime`}.bin | |
| model_save_path | str | False | | |
| n_save_checkpoints | int | False | | |
| lr | float | False | 0.001 | |
| amsgrad | int | False | 0 | True/False |
| epochs | int | True | |
| dropout_rate | float | False | 0 | Dropout rate for the dropout layers in the encoder and decoder. |
| layer_bias | int | False | 1 | True/False | |
| weight_decay | float | False | 0 | |
| l1_lambda | float | False | 0 | |
| xavier_init | int | False | 0 | True/False
| activation | str | False | tanh | Activation function; either `tanh`, `relu`, `leaky_relu` or `elu` |
| tensorboard_log_path | str | False | | |
| trim_eval_errors | int | False | 0 | Removes the max and min loss when calculating the `mean loss diff` and `median loss diff`. This can be useful if some rows create very high losses. |
| verbose | int | False | 0 | Set this to `1` if you want to see the model summary and the validation and evaluation results. set this to `2` if you want to see the training progress bar. `0` means no output. |
| target | str | False | | The target column. If not set no evaluation will be performed. |
| train_input_path | str | True | | |
| test_input_path | str | True | |
| eval_input_path | False | True | | Path to the evaluation data. If no path is provided no evaluation will be performed. | |
| hidden_layer_representation | str | True | | Contains a string representation of a list of list of integers which represents the hidden layer structure. E.g.: `"[[64, 32], [32, 16], [16, 8]]"` activation |
| cat_columns | str | False | "[]" | Contains a string representation of a list of list of categorical columns (strings). The columns which use the same encoder should be together in a list. E.g.: `"[['a', 'b'], ['c']]"`. |
| Argument | Type | Required | Default value | Comment |
| -------------------- | ----- | -------- | ----------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| batch_size | int | False | 32 | |
| drop_last | int | False | 1 | True/False |
| pin_memory | int | False | 1 | True/False |
| num_workers | int | False | 0 | 0 means that the data will be loaded in the main process |
| use_mps | int | False | 0 | Set this to `1` if you want to use the [MPS Backend](https://pytorch.org/docs/master/notes/mps.html) for running on Mac using the M1 GPU. |
| model_title | str | False | autoembedder_{`datetime`}.bin | |
| model_save_path | str | False | | |
| n_save_checkpoints | int | False | | |
| lr | float | False | 0.001 | |
| amsgrad | int | False | 0 | True/False |
| epochs | int | True | | |
| dropout_rate | float | False | 0 | Dropout rate for the dropout layers in the encoder and decoder. |
| layer_bias | int | False | 1 | True/False |
| weight_decay | float | False | 0 | |
| l1_lambda | float | False | 0 | |
| xavier_init | int | False | 0 | True/False |
| activation | str | False | tanh | Activation function; either `tanh`, `relu`, `leaky_relu` or `elu` |
| tensorboard_log_path | str | False | | |
| trim_eval_errors | int | False | 0 | Removes the max and min loss when calculating the `mean loss diff` and `median loss diff`. This can be useful if some rows create very high losses. |
| verbose | int | False | 0 | Set this to `1` if you want to see the model summary and the validation and evaluation results. Set this to `2` if you want to see the training progress bar. `0` means no output. |
| target | str | False | | The target column. If not set no evaluation will be performed. |
| train_input_path | str | True | | |
| test_input_path | str | True | | |
| eval_input_path | str | False | | Path to the evaluation data. If no path is provided, no evaluation will be performed. |
| hidden_layers | str | True | | Contains a string representation of a list of lists of integers which represents the hidden layer structure. E.g.: `"[[64, 32], [32, 16], [16, 8]]"` |
| cat_columns | str | False | "[]" | Contains a string representation of a list of lists of categorical columns (strings). The columns which use the same encoder should be together in a list. E.g.: `"[['a', 'b'], ['c']]"`. |
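
When the Autoembedder is driven from Python rather than through `training.py`, the same parameters are passed as a plain dictionary to `fit(parameters, model, train_dl, valid_dl)`. The snippet below is a hedged sketch: the keys mirror the table above, but which keys a given Autoembedder version actually consumes may differ, so treat it as illustrative rather than a reference.

```python
# Illustrative parameter dictionary; keys mirror the table above.
parameters = {
    "batch_size": 32,
    "drop_last": 1,      # boolean flags are encoded as 0/1 integers
    "pin_memory": 1,
    "num_workers": 0,
    "epochs": 20,
    "lr": 0.001,
    "use_mps": 0,        # set to 1 to run on the MPS backend (Apple Silicon GPU)
    "verbose": 1,
}
```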


## Run the training script
@@ -110,7 +110,7 @@ Something like this should do it:
```bash
python3 training.py --epochs 20 \
--train_input_path "path/to/your/train_data" \
--test_input_path "path/to/your/test_data" \
--hidden_layer_representation "[[12, 6], [6, 3]]"
--hidden_layers "[[12, 6], [6, 3]]"
```


2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "Autoembedder"
version = "0.1.12"
version = "0.1.13"
description = "PyTorch autoencoder with additional embeddings layer for categorical data."
authors = ["Christopher Lemke <chris@syhbl.mozmail.com>"]
license = "MIT"
