add upload to hf script
omar-sol committed Jul 28, 2024
1 parent cd37733 commit 129499e
Showing 3 changed files with 41 additions and 361 deletions.
353 changes: 0 additions & 353 deletions data/scraping_scripts/create_db.ipynb

This file was deleted.

34 changes: 34 additions & 0 deletions data/scraping_scripts/upload_dbs_to_hf.py
@@ -0,0 +1,34 @@
"""
Hugging Face Data Upload Script
Purpose:
This script uploads a local folder to a Hugging Face dataset repository. It's designed to
update or create a dataset on the Hugging Face Hub by uploading the contents of a specified
local folder.
Usage:
- Run the script: python data/scraping_scripts/upload_dbs_to_hf.py
The script will:
- Upload the contents of the 'data' folder to the Hugging Face dataset repository at
  https://huggingface.co/datasets/towardsai-buster/ai-tutor-vector-db
Configuration:
- The script is set to upload to the "towardsai-buster/ai-tutor-vector-db" dataset repository.
- It ignores files with extensions .jsonl, .py, .txt, and .ipynb.
- It deletes existing files in the repository as part of the upload (due to delete_patterns=["*"]).
"""

from huggingface_hub import HfApi

api = HfApi()

api.upload_folder(
    folder_path="data",
    repo_id="towardsai-buster/ai-tutor-vector-db",
    repo_type="dataset",
    multi_commits=True,
    multi_commits_verbose=True,
    delete_patterns=["*"],
    ignore_patterns=["*.jsonl", "*.py", "*.txt", "*.ipynb"],
)
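
To see which local files the ignore_patterns above would leave for upload, the filtering can be approximated offline. This is only a sketch: it assumes the patterns are matched fnmatch-style against each file's path relative to folder_path, and the file names in `candidates` are hypothetical, not taken from the repository. `would_upload` is an illustrative helper, not part of huggingface_hub.

```python
from fnmatch import fnmatch

# Same patterns as the ignore_patterns argument in the script.
IGNORE_PATTERNS = ["*.jsonl", "*.py", "*.txt", "*.ipynb"]


def would_upload(relative_path, ignore_patterns=IGNORE_PATTERNS):
    """Return True if no ignore pattern matches the path.

    Assumption: fnmatch-style matching, where "*" also matches "/" so a
    pattern like "*.py" applies to files in subfolders as well.
    """
    return not any(fnmatch(relative_path, pattern) for pattern in ignore_patterns)


# Hypothetical paths, relative to the "data" folder.
candidates = [
    "chroma-db-transformers/chroma.sqlite3",
    "transformers_data.jsonl",
    "scraping_scripts/upload_dbs_to_hf.py",
    "notes.txt",
    "scraping_scripts/create_db.ipynb",
    "keyword_retriever.pkl",
]

kept = [path for path in candidates if would_upload(path)]
for path in kept:
    print(path)
```

Running a check like this before the real upload is a cheap way to confirm that scripts, notebooks, and raw .jsonl files stay local while the database files are sent.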