Skip to content

JeffinWithYa/policy-analyzer

Repository files navigation

Privacy Policy Analyzer

A tool for analyzing privacy policies using AI. This system processes privacy policy text segments and categorizes them according to standard privacy policy categories like data collection, sharing, retention, and user controls.

This repository includes three specialized AI agents:

  1. Policy Segmenter Agent: Breaks down privacy policies into meaningful segments
  2. Policy Annotator Agent: Analyzes and categorizes policy segments
  3. GDPR Compliance Agent: Evaluates policies against regulatory requirements

Features

  • Policy Segmentation

    • Breaks down privacy policies into distinct clauses
    • Preserves original text and context
    • Ensures segments are self-contained and meaningful
  • Policy Annotation

    • Categorizes segments into standard privacy categories:
      • First Party Collection/Use
      • Third Party Sharing/Collection
      • User Choice/Control
      • User Access, Edit, and Deletion
      • Data Retention
      • Data Security
      • Policy Change
      • Do Not Track
      • International and Specific Audiences
      • Other
    • Provides detailed explanations for categorizations
    • Compares results with human annotations
  • Regulatory Compliance

    • Evaluates policies against GDPR requirements
    • Checks for required disclosures and practices
    • Identifies potential compliance gaps
    • Generates compliance reports

Setup

  1. Clone the repository:
git clone <repository-url>
cd policy-analyzer
  1. Create a .env file with your API keys:
OPENAI_API_KEY=your_key_here
  1. Start the services with Docker Compose:
docker compose watch

Running Analysis

  1. Access the Streamlit app by navigating to http://localhost:8501 in your web browser.

  2. The agent service API will be available at http://localhost:80. You can also use the OpenAPI docs at http://localhost:80/redoc.

  3. Use docker compose down to stop the services.

This setup allows you to develop and test your changes in real-time without manually restarting the services.

  1. Run the analysis script:
docker exec policy-analyzer-agent_service-1 python agents/run_experiment.py
  1. Check results in data/analysis_results.json:
cat data/analysis_results.json

The results will contain:

  • Original policy segments
  • Human annotations (if provided)
  • AI model analysis with categories and explanations
  • Match status between human and AI annotations

Project Structure

.
├── data/                  # Data files
│   ├── records_test.json  # Example input data
│   └── analysis_results.json  # Generated results
├── src/                   # Source code
│   ├── agents/           # AI agent implementations
│   │   ├── privacy_policy_analyzer.py  # Main analyzer logic
│   │   └── run_experiment.py   # Analysis script
│   ├── service/          # API service
│   ├── schema/           # Data models
│   └── client/           # API client
├── docker/               # Docker configuration
│   ├── Dockerfile.app    # Streamlit app container
│   └── Dockerfile.service # Agent service container
└── compose.yaml         # Docker Compose configuration

Development

The project uses Docker Compose Watch for development. Changes to source files will automatically trigger container rebuilds.

Adding New Categories

To add new privacy policy categories:

  1. Update PRIVACY_LABELS in src/agents/privacy_policy_analyzer.py
  2. Update the system prompt to include the new categories
  3. Rebuild the containers: docker compose build

Testing

Use the provided records_test.json to verify your setup and test new features.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Analysis Output Example

The analysis_results.json contains detailed results for each policy segment. Here's an example:

{
    "document": 3828,
    "policyURL": "http://www.kraftrecipes.com/about/privacynotice.aspx",
    "segment": "Effective Date: May 7, 2015 Kraft Site Privacy Notice...",
    "human_annotations": [
      {
        "Other": {
          "Other Type": "Introductory/Generic"
        }
      },
      {
        "Policy Change": {
          "Change Type": "Unspecified",
          "User Choice": "Unspecified",
          "Notification Type": "General notice in privacy policy"
        }
      }
    ],
    "model_analysis": {
      "category": {
        "Other": {
          "Other Type": "Introductory/Generic"
        }
      },
      "explanation": "This segment serves as an introductory statement..."
    },
    "matching_details": {
      "top_level_match": true,
      "exact_match": true,
      "matching_subcategories": 1,
      "total_subcategories": 1,
      "subcategory_match_ratio": 1.0,
      "matched_categories": [
        {
          "Other": {
            "Other Type": "Introductory/Generic"
          }
        }
      ]
    }
}

Understanding Match Results

The matching_details field provides metrics on how well the model's categorization matched human annotations:

  • top_level_match: True if the main category (e.g., "Other") matches at least one human annotation
  • exact_match: True if there's a perfect match between the model's category and one of the human annotations (including all subcategories)
  • matching_subcategories: Number of matching subcategories
  • total_subcategories: Total number of subcategories to match
  • subcategory_match_ratio: Ratio of matching subcategories (1.0 = perfect match)
  • matched_categories: List of human annotation categories that matched

In this example, the model correctly identified the segment as "Other: Introductory/Generic", matching one of the human annotations perfectly. Note that human annotators may identify multiple applicable categories (in this case, both "Other" and "Policy Change"), while the model currently focuses on the primary category.

Agent Architecture

Policy Segmenter Agent

  • Breaks down privacy policies into logical segments
  • Uses LangGraph for workflow management
  • Ensures each segment contains one complete privacy statement
  • Preserves original text exactly as written

Policy Annotator Agent

  • Analyzes individual policy segments
  • Categorizes according to standard privacy categories
  • Provides detailed explanations for categorizations
  • Compares results with human annotations for accuracy

GDPR Compliance Agent

  • Evaluates policies against key GDPR requirements:
    • Identity and contact details
    • Processing purposes and legal basis
    • Data recipient information
    • International transfer safeguards
    • Retention periods
    • Data subject rights
    • Consent withdrawal
    • Complaint procedures
    • Automated decision-making details

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published