NUX Optimize is an open-source tool for automating the tuning of inference hyperparameters in Large Language Models (LLMs).
When using LLMs (such as GPT-4), developers are forced to experiment with a seemingly unlimited number of combinations of temperature values, system prompts, top_p, and more. These values can all produce wildly different outputs from the model, quickly yielding hundreds of hyperparameter combinations, and that is before even changing the actual user_prompt.
Hyperparameter optimization offers a simple way to objectively evaluate how the output of each hyperparameter combination compares to the desired output.
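The approach can be sketched as a simple search loop. The call_model and similarity functions below are hypothetical stand-ins for an LLM call and an embedding-similarity score, not part of NUX Optimize:

```python
from itertools import product

def grid_search(call_model, similarity, desired_output):
    """Try every (temperature, top_p) pair and score each output
    against the desired output. Returns the best (params, score)."""
    temperatures = [0.0, 0.5, 1.0]  # illustrative grid, not NUX's defaults
    top_ps = [0.5, 1.0]
    results = []
    for temperature, top_p in product(temperatures, top_ps):
        output = call_model(temperature=temperature, top_p=top_p)
        results.append(((temperature, top_p), similarity(output, desired_output)))
    # Best combination = highest similarity to the desired output
    return max(results, key=lambda r: r[1])
```

The key idea is that "best settings" stops being a matter of eyeballing outputs and becomes an argmax over a scored grid.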
pip install -r requirements.txt
user_prompts = [
    "generate a summary from this article: {{article}}",
    "take a deep breath and generate a summary from this article: {{article}}..."
]
desired_output = "This article does xyz..."

nux = NuxAI(model="chatgpt", api_key="***")
results = nux.optimize(
    max_combinations=100,
    user_prompts=user_prompts,
    desired_output=desired_output
)
- api_key: Your OpenAI API key.
- max_combinations: The interval between hyperparameter values. For example, an interval of 0.1 will generate temperature combinations of 0.1, 0.2, .... The lower the number, the more possible combinations, which means more API calls to GPT.
- user_prompts: The various instructions, or tasks, provided to the model. Each prompt generates an entirely different set of hyperparameter combinations, so adding prompts is multiplicatively expensive.
- desired_output: Think of this like a Goal Seek formula: you provide a goal, and the tool works backwards from it.
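Assuming the interval behavior described above, a value grid for a given step size could be generated like this (an illustrative sketch, not NUX's actual code):

```python
def value_grid(step, lo=0.0, hi=1.0):
    """Generate evenly spaced hyperparameter values, e.g. temperatures.

    Counts with integers rather than summing floats repeatedly,
    to avoid floating-point drift (0.1 + 0.1 + 0.1 != 0.3)."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 10) for i in range(n + 1)]
```

With a step of 0.1 this yields eleven temperature values from 0.0 to 1.0, which is why a smaller interval multiplies the number of API calls.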
The script will call the LLM's inference service and save each hyperparameter/user_prompt combination, along with its response, in a local results.json file.
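One plausible shape for those saved records, purely illustrative since the actual results.json schema is not documented here:

```python
import json

# Hypothetical record layout: one entry per hyperparameter/prompt
# combination, paired with the model's response.
results = [
    {
        "user_prompt": "generate a summary from this article: {{article}}",
        "temperature": 0.2,
        "top_p": 0.9,
        "response": "This article covers ...",
    },
]

# Persist the run locally so rankings can be rebuilt without re-calling the API
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)
```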
Each response will be converted into an embedding and stored in a local HNSW vector store, where its cosine similarity against the desired_output will be calculated.
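Cosine similarity itself is just the normalized dot product of the two embedding vectors. A minimal plain-Python version, for illustration only (the real calculation happens inside the vector store):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
    1.0 means the embeddings point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Because the measure depends only on direction, a response whose embedding points the same way as the desired_output embedding scores near 1.0 regardless of vector magnitude.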
An HTML page (index.html) will be generated containing a ranking of each user_prompt and hyperparameter combination. Rankings are determined by the similarity scores from the local embedding store's KNN response.
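The ranking step amounts to sorting the stored records by similarity score, highest first. A sketch with illustrative field names:

```python
# Hypothetical scored records; field names are illustrative.
records = [
    {"user_prompt": "prompt A", "temperature": 0.2, "score": 0.91},
    {"user_prompt": "prompt B", "temperature": 0.7, "score": 0.84},
    {"user_prompt": "prompt A", "temperature": 0.9, "score": 0.95},
]

# Highest-similarity combination comes first in the ranking
ranking = sorted(records, key=lambda r: r["score"], reverse=True)
```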