- Vector Storing of Unsafe Prompts
- Threshold Optimization
- Similarity Search Based Threshold Filtering
  - confi_unsafe: Confidently unsafe in filtering phase
  - confi_safe: Confidently safe in filtering phase
  - unconfident: Can't determine
  - losses: Incorrect filtering
- Classification for Remaining Prompts Using Previous Classifiers
  - Moderation API
  - Perspective API
  - Llama-Guard 7B
  - GradSafe
  - Zero-shot prompting GPT-4
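The filtering step listed above can be illustrated with a minimal sketch. It is not the repository's implementation: the two thresholds (`low`, `high`), the use of cosine similarity against the nearest stored unsafe prompt, and the helper names are all assumptions made for illustration. Only the four category names come from the list above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_prompt(query_vec, unsafe_vecs, low=0.3, high=0.8):
    """Label a prompt by its highest similarity to any stored unsafe prompt.

    `low` and `high` are hypothetical thresholds; in practice they would come
    from the threshold optimization step.
    """
    best = max((cosine(query_vec, v) for v in unsafe_vecs), default=0.0)
    if best >= high:
        return "confi_unsafe"
    if best <= low:
        return "confi_safe"
    return "unconfident"  # passed on to a downstream classifier


def count_losses(labels, truths):
    """Count confident decisions that contradict the ground truth.

    `truths[i]` is True when prompt i is actually unsafe. Prompts labeled
    "unconfident" are never counted, since no decision was made for them.
    """
    losses = 0
    for label, is_unsafe in zip(labels, truths):
        if label == "confi_unsafe" and not is_unsafe:
            losses += 1
        elif label == "confi_safe" and is_unsafe:
            losses += 1
    return losses


# Toy 2-dimensional "embeddings" standing in for real embedding vectors.
unsafe_store = [[1.0, 0.0], [0.0, 1.0]]
print(filter_prompt([1.0, 0.0], unsafe_store))    # confi_unsafe
print(filter_prompt([1.0, 1.0], unsafe_store))    # unconfident
print(filter_prompt([-1.0, -1.0], unsafe_store))  # confi_safe
```

Under these assumptions, only prompts labeled unconfident need to be sent to one of the classifiers listed above.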
Install the required dependencies using the following command:
conda create -n sac
conda activate sac
pip install -r requirements.txt
Create a .env file and add the following lines:
OPENAI_API_KEY="YOUR_API_KEY"
PERSPECTIVE_API_KEY="YOUR_API_KEY"
HUGGINGFACE_TOKEN="YOUR_API_KEY"
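To check the .env contents before a run, a small standard-library sketch such as the following may help. It is a stand-in for illustration only: the project may load the file differently (for example with a dotenv-style library), and `parse_env_text` and `missing_keys` are hypothetical helper names. The three key names are the ones listed above.

```python
import os

# Key names taken from the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PERSPECTIVE_API_KEY", "HUGGINGFACE_TOKEN")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, tolerating spaces around '=' and quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env=None):
    """Return the required keys that are absent or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    # Reads .env from the current directory if it exists (an assumption
    # about where the file lives).
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            print("Missing keys:", missing_keys(parse_env_text(handle.read())))
```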
Example commands:
python main.py
python main.py --embed_model openai --model llama_guard --is_prepared False
python main.py --embed_model openai --model llama_guard --is_prepared True
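The commands above pass `--is_prepared` as the literal text `True` or `False`. How main.py parses its arguments is not shown here, so the following is only a sketch of one way such flags could be handled with `argparse`. The flag names come from the commands above; the default values and the `str2bool` helper are assumptions. The helper matters because `argparse` with `type=bool` would treat the string `"False"` as true.

```python
import argparse


def str2bool(value):
    """Convert command-line text such as 'True' or 'False' to a real bool."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser for the three flags; defaults here are hypothetical."""
    parser = argparse.ArgumentParser(description="Illustrative argument parser")
    parser.add_argument("--embed_model", default="openai")
    parser.add_argument("--model", default="llama_guard")
    parser.add_argument("--is_prepared", type=str2bool, default=False)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.embed_model, args.model, args.is_prepared)
```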