diff --git a/docs/docs/settings.rst b/docs/docs/settings.rst
new file mode 100644
index 000000000..e21a4c30a
--- /dev/null
+++ b/docs/docs/settings.rst
@@ -0,0 +1,279 @@
+.. _settings:
+
+=====================================
+Library Settings and Constants
+=====================================
+
+This guide explains the rationale behind the :class:`Settings` and :class:`Constants` system, how to extend and configure them, and how to use them effectively in your application.
+
+All settings can be accessed with:
+
+.. code-block:: python
+
+    import unitxt
+
+    print(unitxt.settings.default_verbosity)  # Output: "info"
+
+All settings can be modified at runtime with:
+
+.. code-block:: python
+
+    unitxt.settings.default_verbosity = "debug"
+
+Or through environment variables, using the setting name in upper case with a ``UNITXT_`` prefix:
+
+.. code-block:: bash
+
+    export UNITXT_DEFAULT_VERBOSITY="debug"
+
+Rationale
+=========
+
+Managing application-wide configuration and constants can be challenging, especially in larger systems. The :class:`Settings` and :class:`Constants` classes provide a centralized, thread-safe, and type-safe way to manage these configurations.
+
+- **Settings**: Designed for mutable configurations that can be customized dynamically, with optional type enforcement and environment variable overrides.
+- **Constants**: Designed for immutable values that remain consistent throughout the application lifecycle.
+
+By centralizing these configurations, you can:
+
+- Ensure consistency across your application.
+- Simplify debugging and testing.
+- Enable dynamic configuration using environment variables or runtime contexts.
+
+Adding New Settings
+===================
+
+To add a new setting, follow these steps:
+
+1. Open the :class:`Settings` initialization block in the :mod:`settings_utils` module.
+2. Add a new setting key with a tuple of ``(type, default_value)`` to enforce its type and provide a default value.
+
+.. code-block:: python
+
+    settings.new_feature_enabled = (bool, False)  # Adding a new boolean setting.
+
+Guidelines:
+
+- Use a clear and descriptive name for the setting.
+- Always specify the type as one of ``int``, ``float``, or ``bool``.
+
+Adding New Constants
+====================
+
+To add a new constant:
+
+1. Open the :class:`Constants` initialization block in the :mod:`settings_utils` module.
+2. Assign a new constant key with its value.
+
+.. code-block:: python
+
+    constants.new_constant = "new_value"  # Adding a new constant.
+
+Guidelines:
+
+- Constants should represent fixed, immutable values.
+- Use clear and descriptive names that indicate their purpose.
+
+Using Settings Context
+======================
+
+The :class:`Settings` class provides a ``context`` method, a context manager that temporarily overrides settings within a block of code. On exiting the block, the settings revert to their previous values.
+
+Example:
+
+.. code-block:: python
+
+    from unitxt import settings
+
+    print(settings.default_verbosity)  # Output: "info"
+
+    with settings.context(default_verbosity="debug"):
+        print(settings.default_verbosity)  # Output: "debug"
+
+    print(settings.default_verbosity)  # Output: "info"
+
+This feature is useful for scenarios like testing or running specific tasks with modified configurations.
+
+List of Settings
+================
+
+Below is the list of available settings, their types, default values, corresponding environment variable names, and descriptions:
+
+.. list-table::
+   :header-rows: 1
+
+   * - Setting
+     - Type
+     - Default Value
+     - Environment Variable
+     - Description
+   * - allow_unverified_code
+     - bool
+     - False
+     - UNITXT_ALLOW_UNVERIFIED_CODE
+     - Enables or disables execution of unverified code.
+   * - use_only_local_catalogs
+     - bool
+     - False
+     - UNITXT_USE_ONLY_LOCAL_CATALOGS
+     - Restricts operations to use only local catalogs.
+   * - global_loader_limit
+     - int
+     - None
+     - UNITXT_GLOBAL_LOADER_LIMIT
+     - Sets a limit on the number of global data loaders.
+   * - num_resamples_for_instance_metrics
+     - int
+     - 1000
+     - UNITXT_NUM_RESAMPLES_FOR_INSTANCE_METRICS
+     - Number of resamples used for calculating instance-level metrics.
+   * - num_resamples_for_global_metrics
+     - int
+     - 100
+     - UNITXT_NUM_RESAMPLES_FOR_GLOBAL_METRICS
+     - Number of resamples used for calculating global metrics.
+   * - max_log_message_size
+     - int
+     - 100000
+     - UNITXT_MAX_LOG_MESSAGE_SIZE
+     - Maximum size allowed for log messages.
+   * - catalogs
+     - None
+     - None
+     - UNITXT_CATALOGS
+     - Specifies the catalogs configuration.
+   * - artifactories
+     - None
+     - None
+     - UNITXT_ARTIFACTORIES
+     - Defines the artifact storage configuration.
+   * - default_recipe
+     - str
+     - "dataset_recipe"
+     - UNITXT_DEFAULT_RECIPE
+     - Specifies the default recipe for datasets.
+   * - default_verbosity
+     - str
+     - "info"
+     - UNITXT_DEFAULT_VERBOSITY
+     - Sets the default verbosity level for logging.
+   * - use_eager_execution
+     - bool
+     - False
+     - UNITXT_USE_EAGER_EXECUTION
+     - Enables eager execution for tasks.
+   * - remote_metrics
+     - list
+     - []
+     - UNITXT_REMOTE_METRICS
+     - Defines a list of configurations for remote metrics.
+   * - test_card_disable
+     - bool
+     - False
+     - UNITXT_TEST_CARD_DISABLE
+     - Disables the use of test cards when enabled.
+   * - test_metric_disable
+     - bool
+     - False
+     - UNITXT_TEST_METRIC_DISABLE
+     - Disables the use of test metrics when enabled.
+   * - metrics_master_key_token
+     - None
+     - None
+     - UNITXT_METRICS_MASTER_KEY_TOKEN
+     - Specifies the master token for accessing metrics.
+   * - seed
+     - int
+     - 42
+     - UNITXT_SEED
+     - Default seed value for random operations.
+   * - skip_artifacts_prepare_and_verify
+     - bool
+     - False
+     - UNITXT_SKIP_ARTIFACTS_PREPARE_AND_VERIFY
+     - Skips preparation and verification of artifacts.
+   * - data_classification_policy
+     - None
+     - None
+     - UNITXT_DATA_CLASSIFICATION_POLICY
+     - Specifies the policy for data classification.
+   * - mock_inference_mode
+     - bool
+     - False
+     - UNITXT_MOCK_INFERENCE_MODE
+     - Enables mock inference mode for testing.
+   * - disable_hf_datasets_cache
+     - bool
+     - True
+     - UNITXT_DISABLE_HF_DATASETS_CACHE
+     - Disables caching for Hugging Face datasets.
+   * - loader_cache_size
+     - int
+     - 1
+     - UNITXT_LOADER_CACHE_SIZE
+     - Sets the cache size for data loaders.
+   * - task_data_as_text
+     - bool
+     - True
+     - UNITXT_TASK_DATA_AS_TEXT
+     - Enables representation of task data as plain text.
+   * - default_provider
+     - str
+     - "watsonx"
+     - UNITXT_DEFAULT_PROVIDER
+     - Specifies the default provider for tasks.
+   * - default_format
+     - None
+     - None
+     - UNITXT_DEFAULT_FORMAT
+     - Defines the default format for data processing.
+
+List of Constants
+=================
+
+Below is the list of available constants and their values:
+
+.. list-table::
+   :header-rows: 1
+
+   * - Constant
+     - Value
+   * - dataset_file
+     - Path to ``dataset.py``.
+   * - metric_file
+     - Path to ``metric.py``.
+   * - local_catalog_path
+     - Path to the local catalog directory.
+   * - package_dir
+     - Directory of the installed package.
+   * - default_catalog_path
+     - Default catalog directory path.
+   * - dataset_url
+     - URL for dataset resources.
+   * - metric_url
+     - URL for metric resources.
+   * - version
+     - Current version of the application.
+   * - catalog_hierarchy_sep
+     - Separator for catalog hierarchy levels.
+   * - env_local_catalogs_paths_sep
+     - Separator for local catalog paths in environment variables.
+   * - non_registered_files
+     - List of files excluded from registration.
+   * - codebase_url
+     - URL of the codebase repository.
+   * - website_url
+     - Official website URL.
+   * - inference_stream
+     - Name of the inference stream constant.
+   * - instance_stream
+     - Name of the instance stream constant.
+   * - image_tag
+     - Default image tag for operations.
+   * - demos_pool_field
+     - Field name for demos pool.
+
+Conclusion
+==========
+
+The ``Settings`` and ``Constants`` system provides a robust and flexible way to manage your application's configuration and constants. By following the guidelines above, you can extend and use these classes effectively in your application.
\ No newline at end of file
diff --git a/docs/docs/tutorials.rst b/docs/docs/tutorials.rst
index 83115ee06..28c19f677 100644
--- a/docs/docs/tutorials.rst
+++ b/docs/docs/tutorials.rst
@@ -31,4 +31,5 @@ Tutorials ✨
     tags_and_descriptions
     types_and_serializers
     contributors_guide
+    settings

diff --git a/src/unitxt/settings_utils.py b/src/unitxt/settings_utils.py
index 75a3bd641..89cd66f04 100644
--- a/src/unitxt/settings_utils.py
+++ b/src/unitxt/settings_utils.py
@@ -1,3 +1,85 @@
+"""Library Settings and Constants.
+
+This module provides a mechanism for managing application-wide configuration and immutable constants. It includes the `Settings` and `Constants` classes, which are implemented as singleton patterns to ensure a single shared instance across the application. Additionally, it defines utility functions to access these objects and configure application behavior.
+
+### Key Components:
+
+1. **Settings Class**:
+   - A singleton class for managing mutable configuration settings.
+   - Supports type enforcement for settings to ensure correct usage.
+   - Allows dynamic modification of settings using a context manager for temporary changes.
+   - Retrieves environment variable overrides for settings, enabling external customization.
+
+   #### Available Settings:
+   - `allow_unverified_code` (bool, default: False): Whether to allow unverified code execution.
+   - `use_only_local_catalogs` (bool, default: False): Restrict operations to local catalogs only.
+   - `global_loader_limit` (int, default: None): Limit for global data loaders.
+   - `num_resamples_for_instance_metrics` (int, default: 1000): Number of resamples for instance-level metrics.
+   - `num_resamples_for_global_metrics` (int, default: 100): Number of resamples for global metrics.
+   - `max_log_message_size` (int, default: 100000): Maximum size of log messages.
+   - `catalogs` (default: None): List of catalog configurations.
+   - `artifactories` (default: None): Artifact storage configurations.
+   - `default_recipe` (str, default: "dataset_recipe"): Default recipe for dataset operations.
+   - `default_verbosity` (str, default: "info"): Default verbosity level for logging.
+   - `use_eager_execution` (bool, default: False): Enable eager execution for tasks.
+   - `remote_metrics` (list, default: []): List of remote metrics configurations.
+   - `test_card_disable` (bool, default: False): Disable test cards if set to True.
+   - `test_metric_disable` (bool, default: False): Disable test metrics if set to True.
+   - `metrics_master_key_token` (default: None): Master token for metrics.
+   - `seed` (int, default: 42): Default seed for random operations.
+   - `skip_artifacts_prepare_and_verify` (bool, default: False): Skip artifact preparation and verification.
+   - `data_classification_policy` (default: None): Policy for data classification.
+   - `mock_inference_mode` (bool, default: False): Enable mock inference mode.
+   - `disable_hf_datasets_cache` (bool, default: True): Disable caching for Hugging Face datasets.
+   - `loader_cache_size` (int, default: 1): Cache size for data loaders.
+   - `task_data_as_text` (bool, default: True): Represent task data as text.
+   - `default_provider` (str, default: "watsonx"): Default service provider.
+   - `default_format` (default: None): Default format for data processing.
+
+   #### Usage:
+   - Access settings using the `get_settings()` function.
+   - Modify settings temporarily using the `context` method:
+     ```python
+     settings = get_settings()
+     with settings.context(default_verbosity="debug"):
+         print(settings.default_verbosity)  # "debug" inside this block
+     ```
+
+2. **Constants Class**:
+   - A singleton class for managing immutable constants used across the application.
+   - Constants cannot be modified once set.
+   - Provides centralized access to paths, URLs, and other fixed application parameters.
+
+   #### Available Constants:
+   - `dataset_file`: Path to the dataset file.
+   - `metric_file`: Path to the metric file.
+   - `local_catalog_path`: Path to the local catalog directory.
+   - `package_dir`: Directory of the installed package.
+   - `default_catalog_path`: Default catalog directory path.
+   - `dataset_url`: URL for dataset resources.
+   - `metric_url`: URL for metric resources.
+   - `version`: Current version of the application.
+   - `catalog_hierarchy_sep`: Separator for catalog hierarchy levels.
+   - `env_local_catalogs_paths_sep`: Separator for local catalog paths in environment variables.
+   - `non_registered_files`: List of files excluded from registration.
+   - `codebase_url`: URL of the codebase repository.
+   - `website_url`: Official website URL.
+   - `inference_stream`: Name of the inference stream constant.
+   - `instance_stream`: Name of the instance stream constant.
+   - `image_tag`: Default image tag for operations.
+   - `demos_pool_field`: Field name for demos pool.
+
+   #### Usage:
+   - Access constants using the `get_constants()` function:
+     ```python
+     constants = get_constants()
+     print(constants.dataset_file)
+     ```
+
+3. **Helper Functions**:
+   - `get_settings()`: Returns the singleton `Settings` instance.
+   - `get_constants()`: Returns the singleton `Constants` instance.
+"""
 import importlib.metadata
 import importlib.util
 import os