Merge pull request #16 from VectorInstitute/add_support_for_empower_api
Add support for empower API
amrit110 authored Nov 12, 2024
2 parents 8bf47b4 + 0f45d8d commit 3924cd3
Showing 14 changed files with 1,438 additions and 591 deletions.
27 changes: 22 additions & 5 deletions README.md
@@ -82,18 +82,35 @@ docker compose --env-file .env.development --profile frontend -f docker-compose.

### 📥 Data setup

-#### Download service data (211 API)
+#### Test data

-**GTA data**
+If you want to test the system without real data, you can generate some dummy testing data:

```bash
-python3 scripts/download_data.py --api-key $YOUR_211_API_KEY --dataset on --is-gta --data-dir <path_to_data_dir>
+python3 scripts/generate_test_data.py
```

-**Ontario-wide data**
+#### Download service data

+If you are using the 211 API or Empower's API, make sure you check with them to see if the API keys are
+configured correctly for the geography of interest.

+**GTA data (211 API)**

```bash
+python3 scripts/download_211_data.py --api-key $YOUR_211_API_KEY --dataset on --is-gta --data-dir <path_to_data_dir>
```

+**Ontario-wide data (211 API)**

```bash
+python3 scripts/download_211_data.py --api-key $YOUR_211_API_KEY --dataset on --data-dir <path_to_data_dir>
```

+**Empower API data**

```bash
-python3 scripts/download_data.py --api-key $YOUR_211_API_KEY --dataset on --data-dir <path_to_data_dir>
+python3 scripts/download_empower_data.py --api-key $YOUR_EMPOWER_API_KEY --data-dir <path_to_data_dir>
```

#### Upload data and embeddings
27 changes: 22 additions & 5 deletions docs/source/index.md
@@ -91,18 +91,35 @@ docker compose --env-file .env.development --profile frontend -f docker-compose.

### 📥 Data setup

-#### Download service data (211 API)
+#### Test data

-**GTA data**
+If you want to test the system without real data, you can generate some dummy testing data:

```bash
-python3 scripts/download_data.py --api-key $YOUR_211_API_KEY --dataset on --is-gta --data-dir <path_to_data_dir>
+python3 scripts/generate_test_data.py
```

-**Ontario-wide data**
+#### Download service data

+If you are using the 211 API or Empower's API, make sure you check with them to see if the API keys are
+configured correctly for the geography of interest.

+**GTA data (211 API)**

```bash
+python3 scripts/download_211_data.py --api-key $YOUR_211_API_KEY --dataset on --is-gta --data-dir <path_to_data_dir>
```

+**Ontario-wide data (211 API)**

```bash
+python3 scripts/download_211_data.py --api-key $YOUR_211_API_KEY --dataset on --data-dir <path_to_data_dir>
```

+**Empower API data**

```bash
-python3 scripts/download_data.py --api-key $YOUR_211_API_KEY --dataset on --data-dir <path_to_data_dir>
+python3 scripts/download_empower_data.py --api-key $YOUR_EMPOWER_API_KEY --data-dir <path_to_data_dir>
```

#### Upload data and embeddings
15 changes: 8 additions & 7 deletions eval/collect_rag_outputs.py
@@ -2,19 +2,18 @@
 import asyncio
 import json
 import logging
-from typing import Dict, Any
+from typing import Dict, Any, List, Optional

 import aiohttp
 from tqdm.asyncio import tqdm_asyncio


 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)


 async def fetch_recommendation(
     session: aiohttp.ClientSession, query: Dict[str, Any], endpoint: str
-) -> Dict[str, Any]:
+) -> Optional[Dict[str, Any]]:
     """Fetch recommendation from the RAG system API."""
     try:
         async with session.post(
@@ -41,19 +40,19 @@ async def fetch_recommendation(


 async def process_samples(
-    samples_file: str, output_file: str, batch_size: int = 5
+    samples_file: str, output_file: str, endpoint: str, batch_size: int = 5
 ) -> None:
     """Process samples in batches and save results."""
     # Load samples
     with open(samples_file, "r") as f:
         samples = json.load(f)

-    results = []
+    results: List[Dict[str, Any]] = []
     async with aiohttp.ClientSession() as session:
         # Process in batches
         for i in range(0, len(samples), batch_size):
             batch = samples[i : i + batch_size]
-            tasks = [fetch_recommendation(session, query) for query in batch]
+            tasks = [fetch_recommendation(session, query, endpoint) for query in batch]
             batch_results = await tqdm_asyncio.gather(*tasks)
             results.extend([r for r in batch_results if r is not None])

@@ -89,7 +88,9 @@ def main() -> None:

     args = parser.parse_args()

-    asyncio.run(process_samples(args.input, args.output, args.batch_size))
+    asyncio.run(
+        process_samples(args.input, args.output, args.endpoint, args.batch_size)
+    )


 if __name__ == "__main__":
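
The batching change to `process_samples` in `eval/collect_rag_outputs.py` can be tried without a running RAG service. The following is a minimal sketch under stated assumptions, not the repository's code: `fake_fetch`, `process_in_batches`, the sample queries, and the endpoint URL are hypothetical stand-ins, and plain `asyncio.gather` replaces `tqdm_asyncio.gather` so the block needs only the standard library. It shows the two behaviours the diff relies on: the endpoint is passed through to every fetch, and `None` results from failed fetches are dropped.

```python
import asyncio
from typing import Any, Dict, List, Optional


async def fake_fetch(query: Dict[str, Any], endpoint: str) -> Optional[Dict[str, Any]]:
    """Hypothetical stand-in for fetch_recommendation; makes no network call."""
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    if query.get("fail"):
        return None  # mimic a request that failed
    return {"query": query["text"], "endpoint": endpoint}


async def process_in_batches(
    samples: List[Dict[str, Any]], endpoint: str, batch_size: int = 5
) -> List[Dict[str, Any]]:
    """Run queries one batch at a time and keep only successful results."""
    results: List[Dict[str, Any]] = []
    for i in range(0, len(samples), batch_size):
        batch = samples[i : i + batch_size]
        # The endpoint is threaded through to every fetch in the batch.
        tasks = [fake_fetch(query, endpoint) for query in batch]
        batch_results = await asyncio.gather(*tasks)
        results.extend(r for r in batch_results if r is not None)
    return results


# Hypothetical sample queries and endpoint URL, for illustration only.
samples = [
    {"text": "food bank"},
    {"text": "shelter", "fail": True},
    {"text": "legal aid"},
]
results = asyncio.run(
    process_in_batches(samples, "http://localhost:8000/recommend", batch_size=2)
)
```

Because `asyncio.gather` returns results in the order the tasks were passed, the surviving results keep the order of the input samples within and across batches.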