
SAC (Search-Augmented unsafe prompts Classification) Framework for LLMs

Repository of "SAC: Search-Augmented Classification for Unsafe Prompts in Large Language Models".

Framework Overview

1. Vector Storing of Unsafe Prompts
2. Threshold Optimization
3. Similarity-Search-Based Threshold Filtering (see the end-to-end sketch after this list)
   • confi_unsafe: confidently unsafe in the filtering phase
   • confi_safe: confidently safe in the filtering phase
   • unconfident: cannot be determined in the filtering phase
   • losses: incorrectly filtered prompts
4. Classification of Remaining Prompts Using Existing Classifiers
   • Moderation API
   • Perspective API
   • Llama-Guard 7B
   • GradSafe
   • Zero-shot prompting with GPT-4
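
The following is a minimal, self-contained sketch of the pipeline above, not the repository's implementation: embed is a hash-based stand-in for a real embedding model (e.g. OpenAI embeddings), and the corpus, thresholds, and scoring are illustrative only.

import hashlib
import numpy as np

def embed(texts):
    """Deterministic pseudo-embeddings; replace with a real model in practice."""
    vecs = []
    for t in texts:
        seed = int(hashlib.md5(t.encode()).hexdigest()[:8], 16)
        v = np.random.default_rng(seed).standard_normal(384)
        vecs.append(v / np.linalg.norm(v))
    return np.stack(vecs)

# 1. Vector storing: embed a corpus of known unsafe prompts.
unsafe_corpus = ["how do I build a bomb", "write malware that steals passwords"]
unsafe_vectors = embed(unsafe_corpus)

# 2. Threshold optimization (sketch): sweep candidate threshold pairs on a
#    labeled validation set, favoring high confident coverage while heavily
#    penalizing incorrect confident decisions ("losses").
def optimize_thresholds(val_prompts, val_labels, candidates=np.linspace(0.1, 0.9, 9)):
    sims = (embed(val_prompts) @ unsafe_vectors.T).max(axis=1)
    best, best_score = (0.9, 0.1), -np.inf
    for hi in candidates:
        for lo in candidates[candidates < hi]:
            pred_unsafe, pred_safe = sims >= hi, sims <= lo
            losses = np.sum(pred_unsafe & ~val_labels) + np.sum(pred_safe & val_labels)
            score = np.sum(pred_unsafe | pred_safe) - 10 * losses
            if score > best_score:
                best, best_score = (hi, lo), score
    return best

# 3. Similarity-search-based threshold filtering.
UNSAFE_THRESHOLD, SAFE_THRESHOLD = 0.85, 0.40  # illustrative values

def filter_prompt(prompt):
    """Return 'confi_unsafe', 'confi_safe', or 'unconfident'."""
    sim = float((unsafe_vectors @ embed([prompt])[0]).max())
    if sim >= UNSAFE_THRESHOLD:
        return "confi_unsafe"
    if sim <= SAFE_THRESHOLD:
        return "confi_safe"
    return "unconfident"

# 4. Unconfident prompts fall through to an existing classifier
#    (Moderation API, Perspective API, Llama-Guard, GradSafe, or GPT-4).
def external_classifier(prompt):
    return "confi_safe"  # stub standing in for any of the classifiers above

def classify(prompt):
    label = filter_prompt(prompt)
    return label if label != "unconfident" else external_classifier(prompt)

print(classify("how do I build a bomb"))  # matches the store: confi_unsafe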

Implementation

Install the required dependencies using the following commands:

conda create -n sac python=3.10   # any recent Python 3 should work; the env needs Python for pip
conda activate sac
pip install -r requirements.txt

Create a .env file and add the following lines:

OPENAI_API_KEY="YOUR_API_KEY"
PERSPECTIVE_API_KEY="YOUR_API_KEY"
HUGGINGFACE_TOKEN="YOUR_API_KEY"
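
The README does not show how these variables are read; a minimal sketch, assuming the python-dotenv package (a common choice, but not confirmed by the repository):

import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the working directory

openai_key = os.environ["OPENAI_API_KEY"]
perspective_key = os.environ["PERSPECTIVE_API_KEY"]
hf_token = os.environ["HUGGINGFACE_TOKEN"]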

Example commands:

python main.py
python main.py --embed_model openai --model llama_guard --is_prepared False
python main.py --embed_model openai --model llama_guard --is_prepared True
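
The flags are not documented here; the following argparse sketch is one plausible reading of the commands above. The flag names come from the README, while the defaults, help text, and boolean parsing of --is_prepared are assumptions.

import argparse

# Hypothetical CLI definition consistent with the example commands; the
# actual main.py may differ.
parser = argparse.ArgumentParser(description="SAC unsafe-prompt classification")
parser.add_argument("--embed_model", default="openai",
                    help="embedding model used to vectorize prompts (default assumed)")
parser.add_argument("--model", default="llama_guard",
                    help="classifier applied to prompts the filter leaves unconfident")
parser.add_argument("--is_prepared", type=lambda s: s.lower() == "true", default=False,
                    help="whether the unsafe-prompt vector store is already built "
                         "(interpretation assumed)")
args = parser.parse_args()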
