- Repository Virtual Environment Setup
- CantusDB Setup
- Middleware Setup
- Text to SQL Data Collection Process
- Evaluation Gold and Predicted Outputs
- Links
## Repository Virtual Environment Setup

- Ensure `python3` is installed by typing the following into a terminal:
  `python3`
  Note the Python version used for this project: Python 3.11.3 (v3.11.3:f3909b8bc8, Apr 4 2023, 20:12:10) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
- Create a virtual environment in the project directory:
  `python3 -m venv .venv`
- Activate the virtual environment:
  `source .venv/bin/activate`
- Verify the Python executable being used:
  `which python`
- Upgrade pip to the latest version:
  `python3 -m pip install --upgrade pip`
- Confirm the pip version:
  `python3 -m pip --version`
- Install project dependencies from the `requirements.txt` file:
  `pip3 install -r requirements.txt`
- Deactivate the virtual environment when done:
  `deactivate`
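As an optional sanity check (not part of the original steps), confirm that the activated environment's interpreter is the one in `.venv` and that the dependencies landed there:

```shell
# With the virtual environment activated, the interpreter should resolve under .venv
# and report Python 3.11.x; pip list shows the packages from requirements.txt.
which python
python --version
python -m pip list
```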
## CantusDB Setup

- Clone the CantusDB repository and ensure it is up to date by regularly pulling the latest changes.
- Follow the instructions on the Deploying CantusDB Locally for Development page to set up the website for local development.
- Obtain the `dev_env` file from the CantusDB Resources section. This file must be provided by a CantusDB developer.
- During the Populating the Database step, request the `cantus_dump.sql` file from a CantusDB developer and use it to populate the database.
- Verify that the setup is complete by confirming that the `Chants`, `Sources`, and `Feasts` sections are accessible via `localhost`.
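For the Populating the Database step, follow the wiki's instructions; if they call for loading the dump directly into the Postgres container, a command along these lines can work (the container, user, and database names are taken from the extraction commands later in this document and may differ in your setup):

```shell
# Hypothetical example: stream the SQL dump into the running Postgres container.
# The authoritative procedure is the one described on the CantusDB wiki.
docker exec -i cantusdb-postgres-1 psql -U cantusdb -d cantusdb < cantus_dump.sql
```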
## Middleware Setup

- Confirm that the CantusDB Setup is complete and the website is functional.
- Place the `middleware.py` file into the `*/CantusDB/django/cantusdb_project/cantusdb/` directory.
- Navigate to the `Chants`, `Sources`, or `Feasts` page on the CantusDB website, input search criteria, and click Apply.
- The `middleware.py` script will automatically generate and populate the `nlq_sql.json` and `sql_queries.log` files located in the `*/CantusDB/django/cantusdb_project/` directory.
- Each new search performed on the CantusDB website will update the data in `nlq_sql.json` and `sql_queries.log`.
- Note that not all SQL queries in these files are relevant. Focus on the query that matches the search results displayed on the website; a quick way to browse the captured queries is shown after this list.
- Refer to the SQL Output Extraction section for additional instructions.
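For example, both files can be inspected from `*/CantusDB/django/cantusdb_project/` with ordinary command-line tools (a convenience, not part of the original setup):

```shell
# Watch new queries arrive in the log as searches are performed on the website.
tail -f sql_queries.log

# Pretty-print the captured natural-language/SQL pairs for easier browsing
# (assumes nlq_sql.json is valid JSON; its exact structure is determined by middleware.py).
python3 -m json.tool nlq_sql.json | less
```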
## Text to SQL Data Collection Process

- On the localhost version of the website, navigate to the `Chants`, `Sources`, or `Feasts` pages and enter the desired search information. For example:
  - Segment: `CANTUS Database`
  - General search: `Montreal`
  - Country: `Canada`
  - Century: `16th century`
  - Complete Source/Fragment: `Complete source`

  Observe that only one result is returned for this search.
- Open the `*/CantusDB/django/cantusdb_project/nlq_sql.json` file and locate the query that matches the inputted search criteria. Look for:
  - A query starting with `SELECT DISTINCT`
  - Conditions such as:
    `UPPER(main_app_institution.country::text) LIKE UPPER('%Canada%') AND UPPER(main_app_century.name::text) LIKE UPPER('%16th century%') AND UPPER(main_app_institution.name::text) LIKE UPPER('%Montreal%')`
  - A query ending with:
    `ORDER BY main_app_institution.siglum ASC, main_app_source.shelfmark ASC LIMIT 1;`
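Before treating the located query as the gold query, it can help to run it once against the database and confirm it returns the same single result the website showed. The container, user, and database names below are the ones used in the extraction command later in this section, and `PASTE_THE_LOCATED_QUERY_HERE` is a placeholder:

```shell
# Optional sanity check: run the located query (with its LIMIT 1 still in place)
# and compare the output against the single result shown on the website.
docker exec cantusdb-postgres-1 psql -U cantusdb -d cantusdb -c "PASTE_THE_LOCATED_QUERY_HERE"
```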
- Before executing the SQL query on the Docker Postgres container, remove the trailing `LIMIT 1` from the query. This precaution should be taken for all queries to ensure complete results are retrieved.
- Ensure the database container for the website is running, then run the following Docker command in a terminal to extract the information into a file (a filled-in example is shown after this list):
  `docker exec cantusdb-postgres-1 psql -U cantusdb -d cantusdb -c "\pset format csv" -c "GOLD_SQL_QUERY" | sed '1d' > */Path-to-the-Repository/nl-to-sql/NL2SQL/gold_outputs/object/object_output_filex.csv`
- When running the Docker command, replace `GOLD_SQL_QUERY` with the SQL query obtained from the search, update the path information appropriately, and ensure that `object` is one of `chants`, `sources`, or `feasts`.
- Ensure that the `object_output_filex.csv` file follows the correct naming convention:
  - `object` should be `c`, `s`, or `f` for `chants`, `sources`, or `feasts`, respectively.
  - The `x` value represents the index number of the file within that directory.
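For instance, if the gold query is the sources query located above and this is the first file in the `sources` directory, the filled-in command might look like the following. The query is abbreviated with `...` here and the shell variable is only a convenience; substitute the full `SELECT DISTINCT` statement you located (with the trailing `LIMIT 1` removed) and your actual repository path:

```shell
# Hypothetical filled-in example: export the gold results of a sources query as CSV
# (the header row is stripped by sed) into gold_outputs/sources/s_output_file1.csv.
# Replace */Path-to-the-Repository with the actual location of the repository.
GOLD_SQL_QUERY="SELECT DISTINCT ... ORDER BY main_app_institution.siglum ASC, main_app_source.shelfmark ASC;"

docker exec cantusdb-postgres-1 psql -U cantusdb -d cantusdb \
  -c "\pset format csv" \
  -c "$GOLD_SQL_QUERY" \
  | sed '1d' > */Path-to-the-Repository/nl-to-sql/NL2SQL/gold_outputs/sources/s_output_file1.csv
```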
- If everything is formatted correctly, the `object_output_filex.csv` file will be generated in the specified directory.
- Copy and paste the gold SQL query and the gold output path into the appropriate `chants.json`, `sources.json`, or `feasts.json` file. Add them to the `sql_query` and `gold_output_path` fields (a hypothetical entry is sketched at the end of this section).
- Write the natural language query corresponding to the gold SQL query. Ensure that:
  - It starts with: `Given this database schema, generate a SQL query that shows me all the`
  - It ends with: `Format your response without any formatting or newlines.`
  - Values are enclosed in single quotes (e.g., `'Canada'`).
  - Attribute names are capitalized (e.g., `Country`).
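Putting these rules together with the example search above, the natural language query might read roughly as follows (a hypothetical illustration, wrapped here for readability but written as a single line in practice; match the attribute names actually used in the JSON files):

```text
Given this database schema, generate a SQL query that shows me all the sources where the
Segment is 'CANTUS Database', the Country is 'Canada', the Century is '16th century', the
Institution Name contains 'Montreal', and the Complete Source/Fragment is 'Complete source'.
Format your response without any formatting or newlines.
```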
- Prompt the chosen LLM with the `natural_language_inputs` and either `database-schema-with-options` or `database-schema-without-options`. Ensure that the input does not exceed the character limit of the selected LLM.
- If the LLM generates an SQL query with incorrect formatting, clean it up by:
  - Removing newlines.
  - Adding spaces where necessary.
  - Ensuring the query is formatted as a single line.
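The cleanup can be done by hand, or with a quick shell filter like the one below (a convenience sketch; `predicted_raw.txt` and `predicted_clean.txt` are hypothetical file names):

```shell
# Collapse newlines and repeated whitespace so the predicted query sits on one line.
tr '\n' ' ' < predicted_raw.txt | tr -s ' ' > predicted_clean.txt
```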
- Ensure the database container for the website is running, then execute the following Docker command in a terminal to extract the information into a file (a filled-in example is shown after this list):
  `docker exec cantusdb-postgres-1 psql -U cantusdb -d cantusdb -c "\pset format csv" -c "PREDICTED_SQL_QUERY" | sed '1d' > */Path-to-the-Repository/nl-to-sql/NL2SQL/predicted_outputs_without_options/object/llm/object_output_filex.csv`
- In the Docker command:
  - Replace `PREDICTED_SQL_QUERY` with the cleaned-up query generated by the LLM.
  - Ensure the correct directory is chosen based on the schema used: `predicted_outputs_without_options` or `predicted_outputs_with_options`.
  - Update the `llm` directory name to match the specific LLM being used.
  - Rename the file `object_output_filex.csv` to follow the appropriate naming convention, where `object` corresponds to `c` for `chants`, `s` for `sources`, or `f` for `feasts`, and `x` is the index number of the file.
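For example, a prediction from GPT for the same sources query, generated against the schema without options, might be exported like this (hypothetical values; the query is abbreviated and `gpt` stands in for the LLM directory name):

```shell
# Hypothetical filled-in example: export the results of a GPT-predicted sources query
# into predicted_outputs_without_options/sources/gpt/s_output_file1.csv.
# Replace */Path-to-the-Repository with the actual location of the repository.
PREDICTED_SQL_QUERY="SELECT DISTINCT ... ORDER BY main_app_institution.siglum ASC, main_app_source.shelfmark ASC;"

docker exec cantusdb-postgres-1 psql -U cantusdb -d cantusdb \
  -c "\pset format csv" \
  -c "$PREDICTED_SQL_QUERY" \
  | sed '1d' > */Path-to-the-Repository/nl-to-sql/NL2SQL/predicted_outputs_without_options/sources/gpt/s_output_file1.csv
```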
- If everything is formatted correctly, the `object_output_filex.csv` file will be generated in the specified directory.
- Copy and paste the predicted SQL query and the predicted output path into the appropriate `chants.json`, `sources.json`, or `feasts.json` file. Add them under the `predicted_sql_query_with_options` or `predicted_sql_query_without_options` fields within the corresponding `llm` field.
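The exact layout of `chants.json`, `sources.json`, and `feasts.json` is defined by the files already in the repository, so defer to those. Purely as an illustration of where the fields mentioned above live, an entry might look roughly like this (the field nesting, the `natural_language_input` and `predicted_output_path_without_options` names, and the `gpt` key are assumptions, not the authoritative schema):

```json
{
  "natural_language_input": "Given this database schema, generate a SQL query that shows me all the sources ...",
  "sql_query": "SELECT DISTINCT ... ORDER BY main_app_institution.siglum ASC, main_app_source.shelfmark ASC;",
  "gold_output_path": "gold_outputs/sources/s_output_file1.csv",
  "gpt": {
    "predicted_sql_query_without_options": "SELECT DISTINCT ...",
    "predicted_output_path_without_options": "predicted_outputs_without_options/sources/gpt/s_output_file1.csv"
  }
}
```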
## Evaluation Gold and Predicted Outputs

- Navigate to the project repository in the terminal and ensure that you are in the `nl-to-sql/NL2SQL` directory:
  `cd /Path-to-the-Repository/nl-to-sql/NL2SQL`
- Run the `eval.py` script using the Python interpreter from the previously set up virtual environment:
  `python3 eval.py`
- The script will output results resembling the following structure:
{
"without_options": {
"unordered": {
"gpt": 23,
"claude": 23,
"grok": 16
},
"ordered": {
"gpt": 16,
"claude": 15,
"grok": 11
},
"precision": {
"gpt": 0.7,
"claude": 0.75,
"grok": 0.56
},
"recall": {
"gpt": 0.69,
"claude": 0.74,
"grok": 0.47
},
"f1": {
"gpt": 0.68,
"claude": 0.7,
"grok": 0.48
}
},
"with_options": {
"unordered": {
"gpt": 32,
"claude": 28,
"grok": 23
},
"ordered": {
"gpt": 23,
"claude": 16,
"grok": 10
},
"precision": {
"gpt": 0.92,
"claude": 0.85,
"grok": 0.69
},
"recall": {
"gpt": 0.88,
"claude": 0.78,
"grok": 0.6
},
"f1": {
"gpt": 0.88,
"claude": 0.78,
"grok": 0.61
}
}
}
## Links

- CantusDB: https://github.com/DDMAL/CantusDB
- CantusDB Wiki: https://github.com/DDMAL/CantusDB/wiki/