A tool for analyzing privacy policies using AI. This system processes privacy policy text segments and categorizes them according to standard privacy policy categories like data collection, sharing, retention, and user controls.
This repository includes three specialized AI agents:
- Policy Segmenter Agent: Breaks down privacy policies into meaningful segments
- Policy Annotator Agent: Analyzes and categorizes policy segments
- GDPR Compliance Agent: Evaluates policies against regulatory requirements
**Policy Segmentation**
- Breaks down privacy policies into distinct clauses
- Preserves original text and context
- Ensures segments are self-contained and meaningful
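For illustration, a segmenter run might turn a short policy into records like the ones below. This is a sketch only: the field names are assumptions, not the agent's actual output schema.

```python
# Hypothetical segmenter output: each segment keeps the original wording
# and stands on its own as a complete privacy statement.
policy_text = (
    "We collect your email address when you create an account. "
    "We may share aggregated usage data with our analytics partners."
)

segments = [
    {"segment_id": 1, "text": "We collect your email address when you create an account."},
    {"segment_id": 2, "text": "We may share aggregated usage data with our analytics partners."},
]
```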
**Policy Annotation**
- Categorizes segments into standard privacy categories:
  - First Party Collection/Use
  - Third Party Sharing/Collection
  - User Choice/Control
  - User Access, Edit, and Deletion
  - Data Retention
  - Data Security
  - Policy Change
  - Do Not Track
  - International and Specific Audiences
  - Other
- Provides detailed explanations for categorizations
- Compares results with human annotations
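These labels correspond to the `PRIVACY_LABELS` constant referenced in the development notes below. As a rough illustration (the actual definition in `src/agents/privacy_policy_analyzer.py` may use a richer structure with subcategories):

```python
# Illustrative only: the real PRIVACY_LABELS in
# src/agents/privacy_policy_analyzer.py may be structured differently.
PRIVACY_LABELS = [
    "First Party Collection/Use",
    "Third Party Sharing/Collection",
    "User Choice/Control",
    "User Access, Edit, and Deletion",
    "Data Retention",
    "Data Security",
    "Policy Change",
    "Do Not Track",
    "International and Specific Audiences",
    "Other",
]
```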
**Regulatory Compliance**
- Evaluates policies against GDPR requirements
- Checks for required disclosures and practices
- Identifies potential compliance gaps
- Generates compliance reports
- Clone the repository:
git clone <repository-url>
cd policy-analyzer
- Create a `.env` file with your API keys:
OPENAI_API_KEY=your_key_here
- Start the services with Docker Compose:
docker compose watch
- Access the Streamlit app by navigating to http://localhost:8501 in your web browser.
- The agent service API will be available at http://localhost:80. You can also use the OpenAPI docs at http://localhost:80/redoc (a quick way to verify the service is reachable is sketched below).
- Use `docker compose down` to stop the services.
This setup allows you to develop and test your changes in real-time without manually restarting the services.
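If you want to confirm the agent service is reachable before running an experiment, a quick check like the one below can help. It assumes the service exposes an OpenAPI schema at `/openapi.json` (typical for services that serve Redoc docs) and requires the `requests` package; adjust the URL if your setup differs.

```python
# Sanity check that the agent service is up.
# Assumption: the service exposes /openapi.json (common for FastAPI-style apps with /redoc).
import requests

resp = requests.get("http://localhost:80/openapi.json", timeout=5)
resp.raise_for_status()
print("Service is up; API exposes", len(resp.json().get("paths", {})), "paths")
```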
- Run the analysis script:
docker exec policy-analyzer-agent_service-1 python agents/run_experiment.py
- Check results in `data/analysis_results.json`:
cat data/analysis_results.json
The results will contain:
- Original policy segments
- Human annotations (if provided)
- AI model analysis with categories and explanations
- Match status between human and AI annotations
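To inspect the results programmatically, a short script along these lines can summarize agreement between the model and the human annotations. It assumes the file holds a list of segment records shaped like the example shown later in this README.

```python
# Summarize how often the model's category exactly matched a human annotation,
# assuming analysis_results.json contains a list of records like the example below.
import json

with open("data/analysis_results.json") as f:
    results = json.load(f)

exact = sum(1 for r in results if r.get("matching_details", {}).get("exact_match"))
print(f"Exact matches: {exact}/{len(results)}")
```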
.
├── data/ # Data files
│ ├── records_test.json # Example input data
│ └── analysis_results.json # Generated results
├── src/ # Source code
│ ├── agents/ # AI agent implementations
│ │ ├── privacy_policy_analyzer.py # Main analyzer logic
│ │ └── run_experiment.py # Analysis script
│ ├── service/ # API service
│ ├── schema/ # Data models
│ └── client/ # API client
├── docker/ # Docker configuration
│ ├── Dockerfile.app # Streamlit app container
│ └── Dockerfile.service # Agent service container
└── compose.yaml # Docker Compose configuration
The project uses Docker Compose Watch for development. Changes to source files will automatically trigger container rebuilds.
To add new privacy policy categories:
- Update `PRIVACY_LABELS` in `src/agents/privacy_policy_analyzer.py`
- Update the system prompt to include the new categories
- Rebuild the containers:
docker compose build
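As a sketch, adding a hypothetical "Children's Privacy" category might look like the following; the exact shape of `PRIVACY_LABELS` and how the system prompt is built in `src/agents/privacy_policy_analyzer.py` may differ.

```python
# Hypothetical edit to src/agents/privacy_policy_analyzer.py; check the real
# PRIVACY_LABELS definition before changing it.
PRIVACY_LABELS = [
    "First Party Collection/Use",
    "Third Party Sharing/Collection",
    # ... existing categories ...
    "Other",
    "Children's Privacy",  # new category
]

# The system prompt should mention the new category as well, e.g. by rendering
# the label list into the prompt template.
```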
Use the provided `records_test.json` to verify your setup and test new features.
Contributions are welcome! Please feel free to submit a Pull Request.
The `analysis_results.json` file contains detailed results for each policy segment. Here's an example:
{
"document": 3828,
"policyURL": "http://www.kraftrecipes.com/about/privacynotice.aspx",
"segment": "Effective Date: May 7, 2015 Kraft Site Privacy Notice...",
"human_annotations": [
{
"Other": {
"Other Type": "Introductory/Generic"
}
},
{
"Policy Change": {
"Change Type": "Unspecified",
"User Choice": "Unspecified",
"Notification Type": "General notice in privacy policy"
}
}
],
"model_analysis": {
"category": {
"Other": {
"Other Type": "Introductory/Generic"
}
},
"explanation": "This segment serves as an introductory statement..."
},
"matching_details": {
"top_level_match": true,
"exact_match": true,
"matching_subcategories": 1,
"total_subcategories": 1,
"subcategory_match_ratio": 1.0,
"matched_categories": [
{
"Other": {
"Other Type": "Introductory/Generic"
}
}
]
}
}
The `matching_details` field provides metrics on how well the model's categorization matched the human annotations:
- `top_level_match`: True if the main category (e.g., "Other") matches at least one human annotation
- `exact_match`: True if there is a perfect match between the model's category and one of the human annotations (including all subcategories)
- `matching_subcategories`: Number of matching subcategories
- `total_subcategories`: Total number of subcategories to match
- `subcategory_match_ratio`: Ratio of matching subcategories (1.0 = perfect match)
- `matched_categories`: List of human annotation categories that matched
In this example, the model correctly identified the segment as "Other: Introductory/Generic", matching one of the human annotations perfectly. Note that human annotators may identify multiple applicable categories (in this case, both "Other" and "Policy Change"), while the model currently focuses on the primary category.
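The top-level and ratio metrics can be reproduced from a record with a few lines of Python; the sketch below assumes the JSON structure shown in the example above.

```python
# Recompute top_level_match and subcategory_match_ratio for one record,
# assuming the structure of the example record above.
def score_record(record):
    model_cat = record["model_analysis"]["category"]      # e.g. {"Other": {...}}
    model_label = next(iter(model_cat))                    # top-level category name
    human_labels = [next(iter(ann)) for ann in record["human_annotations"]]

    top_level_match = model_label in human_labels

    # Compare subcategory values against the first human annotation with the same label.
    matched, total = 0, len(model_cat[model_label])
    for ann in record["human_annotations"]:
        if model_label in ann:
            matched = sum(1 for k, v in model_cat[model_label].items()
                          if ann[model_label].get(k) == v)
            break

    ratio = matched / total if total else 0.0
    return top_level_match, ratio
```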
**Policy Segmenter Agent**
- Breaks down privacy policies into logical segments
- Uses LangGraph for workflow management
- Ensures each segment contains one complete privacy statement
- Preserves original text exactly as written
**Policy Annotator Agent**
- Analyzes individual policy segments
- Categorizes according to standard privacy categories
- Provides detailed explanations for categorizations
- Compares results with human annotations for accuracy
**GDPR Compliance Agent**
- Evaluates policies against key GDPR requirements (a minimal checklist sketch follows this list):
  - Identity and contact details
  - Processing purposes and legal basis
  - Data recipient information
  - International transfer safeguards
  - Retention periods
  - Data subject rights
  - Consent withdrawal
  - Complaint procedures
  - Automated decision-making details
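To make the evaluation concrete, a compliance check can be thought of as a mapping from each requirement to a finding. The structure below is an illustrative assumption, not the agent's actual report format.

```python
# Illustrative shape of a GDPR compliance report; the agent's real output format may differ.
gdpr_requirements = [
    "Identity and contact details",
    "Processing purposes and legal basis",
    "Data recipient information",
    "International transfer safeguards",
    "Retention periods",
    "Data subject rights",
    "Consent withdrawal",
    "Complaint procedures",
    "Automated decision-making details",
]

compliance_report = {
    requirement: {"addressed": False, "evidence_segments": []}
    for requirement in gdpr_requirements
}
```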