My notes for the wiki #1

spell00 · 2024-12-17T20:05:49Z

I had added some information to the README, then remembered that you have a wiki. I saw that you have already written the sections I documented, but here is what I wrote in case anything is missing from the wiki. I may have time to add it myself soon if it isn't urgent, but I'm leaving it here in the meantime so that I don't add it to my pull request for the changes I made.

Machine learning

Training data filenames

The data files should be named like this:

Training arguments

experiment_name:
n_features: controls the number of features to be used for training. Only use in combination with use_mi or for debugging.

Neptune usage

We recommend using neptune.ai to track the results of all the models trained during hyperparameter optimization.
Without Neptune, only the results from the best model are saved.

Results

All the results are saved in the folder results, which is created automatically when training the first model.
results/{exp_name}_{n_features}features_mi{is_mi}/ 
where exp_name is the experiment name, n_features is the number of features used (the default value is -1, which uses all features). The parameter is_mi controls
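As a rough illustration of the folder naming pattern above, the path could be assembled as in the sketch below. The helper name results_dir and the way is_mi is rendered (here as a plain integer) are assumptions for illustration only; the project may format these values differently.

```python
from pathlib import Path


def results_dir(exp_name: str, n_features: int = -1, is_mi: int = 0,
                root: str = "results") -> Path:
    """Build results/{exp_name}_{n_features}features_mi{is_mi}.

    Hypothetical helper: the real project may render is_mi differently
    (for example as True/False instead of 0/1).
    """
    return Path(root) / f"{exp_name}_{n_features}features_mi{is_mi}"


# With the default n_features=-1 (all features):
print(results_dir("my_experiment").as_posix())
```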

The following results are created:

Best model

The best model weights, best hyperparameters and scores are saved in results/{exp_name}_{n_features}features_mi{is_mi}/confusion_matrix/

Confusion matrices

The confusion matrices of the best model are saved in results/{exp_name}_{n_features}features_mi{is_mi}/confusion_matrix/.
Three confusion matrices are saved, one each for the train, validation and test sets. Each confusion matrix is saved in two formats: csv and png.
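To make the csv format concrete, here is a minimal sketch of how a confusion matrix could be counted and written out. It uses only the standard library; the function names and the csv layout (true labels as rows, predicted labels as columns) are assumptions, not necessarily what the project produces. The png output is not covered here.

```python
import csv
import io


def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def write_confusion_csv(matrix, labels, handle):
    """Write the matrix with a header row and one labelled row per true class."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["true/pred", *labels])
    for label, row in zip(labels, matrix):
        writer.writerow([label, *row])


labels = ["a", "b"]
matrix = confusion_matrix(["a", "a", "b", "b"], ["a", "b", "b", "b"], labels)
buffer = io.StringIO()  # stands in for an open csv file
write_confusion_csv(matrix, labels, buffer)
print(buffer.getvalue())
```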

Data visualization plots

All ordination plots for visualization are in results/{exp_name}_{n_features}features_mi{is_mi}/ord/. They include:

Multidimensional Scaling (MDS)
Principal Components Analysis (PCA)
Fisher's Linear Discriminant Analysis (LDA)
Uniform Manifold Approximation and Projection (UMAP)

Histograms

Four different histograms are saved in results/{exp_name}_{n_features}features_mi{is_mi}/histograms/.

The first histogram allclasses.png represents the distribution of values in the outputs from your best model, using 30 bins. The x-axis indicates the output values, and the y-axis represents the frequency of those values.
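For readers unfamiliar with binning, the sketch below shows one way to compute equal-width counts over 30 bins in plain Python. It is an illustration of the idea only; the function name is hypothetical and the project most likely relies on a plotting library for this.

```python
def histogram(values, n_bins=30):
    """Return (counts, edges) for equal-width bins spanning min..max of values."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: widen the range so the bin width is not zero.
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in values:
        # The maximum value falls on the last edge, so clamp it into the last bin.
        idx = min(int((value - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return counts, edges


counts, edges = histogram([0.0, 0.1, 0.5, 0.9, 1.0])
print(len(counts), sum(counts))
```

The counts correspond to the y-axis (frequency) and the edges to the x-axis (output values).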

The histogram zeros_per_feature_allclasses.png illustrates the distribution of zeros across the features in the dataset. The x-axis represents the number of zeros per feature, while the y-axis indicates the count of features that fall within each range of zeros.

The histogram zeros_per_feature_allclasses.png illustrates the distribution of zeros across the samples in the dataset. The x-axis represents the number of zeros per sample, while the y-axis indicates the count of samples that fall within each range of zeros.
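The quantities behind the two zero-count histograms can be sketched as follows, assuming the data is a matrix with one row per sample and one column per feature. The function names are hypothetical and only illustrate what is being counted.

```python
def zeros_per_sample(matrix):
    """Number of zero values in each row (sample)."""
    return [sum(1 for value in row if value == 0) for row in matrix]


def zeros_per_feature(matrix):
    """Number of zero values in each column (feature)."""
    return [sum(1 for value in column if value == 0) for column in zip(*matrix)]


# Two samples, three features.
data = [
    [0, 1, 0],
    [0, 2, 3],
]
print(zeros_per_sample(data))
print(zeros_per_feature(data))
```

Each list would then be binned to produce the corresponding histogram.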

If using the option use_mi, the figure mutual_info_gain.png is saved.


Louis-MG · 2024-12-19T15:47:54Z

@spell00 I assume some sentence fragments are missing:

The parameter is_mi controls

The following results are created:

Louis-MG added the documentation label Dec 19, 2024

Louis-MG self-assigned this Dec 19, 2024
