GitHub Crawl includes Python scripts that fetch and store data on the most popular GitHub repositories created within specific timeframes. The data is fetched from the GitHub API and written to a Google Sheet, where the latest projects can be surveyed further.
/path/to/your/github-crawl/
├── README.md
├── credentials
│ └── <your-gdrive-credentials>.json
├── requirements.txt
├── top_repos_this_month.py
├── write_to_sheet_30.py
├── write_to_sheet_180.py
├── .env
└── .env.example
- README.md: This file. Provides an overview and usage instructions for the project.
- credentials/<your-gdrive-credentials>.json: Contains the Google API credentials required to interact with Google Sheets. Store this file securely and do not share it.
- requirements.txt: Lists the Python dependencies needed to run the scripts. These dependencies can be installed using pip.
- top_repos_this_month.py: A script that fetches the top repositories created in the current month.
- write_to_sheet_30.py: Fetches and writes the top repositories from the last 30 days to a Google Sheet.
- write_to_sheet_180.py: Fetches and writes the top repositories from the last 180 days to a Google Sheet.
The project requires the following Python packages:
- gspread
- oauth2client
- requests
- python-dotenv
You can install these dependencies using the following command:
pip install -r requirements.txt
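To confirm the installation worked, a small check script can look up each package's module. This is a hypothetical helper, not a file in the repository. It only maps the four packages listed above to their import names (note that python-dotenv is imported as `dotenv`).

```python
import importlib.util

# Hypothetical helper: PyPI package name -> module name used in `import`.
# python-dotenv is the one whose import name differs from its package name.
REQUIRED = {
    "gspread": "gspread",
    "oauth2client": "oauth2client",
    "requests": "requests",
    "python-dotenv": "dotenv",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose top-level module cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```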
The project relies on environment variables for configuration. These variables should be stored in a .env file in the project root directory:
- GDRIVE_CREDENTIALS_PATH: Path to the Google API credentials JSON file.
- SPREADSHEET_ID: The ID of the Google Sheet where the repository data will be written.
Example .env file:
GDRIVE_CREDENTIALS_PATH=/path/to/github-crawl/credentials/<your-gdrive-credentials-file.json>
SPREADSHEET_ID=your_spreadsheet_id_here
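The README does not show how the scripts read these settings. The following is a minimal sketch, assuming python-dotenv's `load_dotenv()` is used to populate `os.environ` from the .env file. The helper names `missing_vars` and `load_config` are hypothetical. The dotenv import is optional here so the sketch also runs without the package installed.

```python
import os

try:
    # python-dotenv; treated as optional so this sketch runs without it.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

REQUIRED_VARS = ("GDRIVE_CREDENTIALS_PATH", "SPREADSHEET_ID")


def missing_vars(environ):
    """Names of required settings that are absent or empty in `environ`."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def load_config(environ=None):
    """Return the required settings, raising if any are missing."""
    if environ is None:
        if load_dotenv is not None:
            load_dotenv()  # locates a .env file and loads its values into os.environ
        environ = os.environ
    missing = missing_vars(environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Passing an explicit mapping (for example in tests) skips the .env lookup entirely.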
The project includes scripts that fetch the top GitHub repositories from different time periods and write them to a Google Sheet.
- Last 30 Days:
  - Script: write_to_sheet_30.py
  - Description: Fetches the top repositories created in the last 30 days and writes the data to a Google Sheet tab named {YYYY-MM-DD}_top_gh_repos_30.
  - Usage: python write_to_sheet_30.py
- Last 180 Days:
  - Script: write_to_sheet_180.py
  - Description: Fetches the top repositories created in the last 180 days and writes the data to a Google Sheet tab named {YYYY-MM-DD}_top_gh_repos_180.
  - Usage: python write_to_sheet_180.py
- Current Month:
  - Script: top_repos_this_month.py
  - Description: Fetches the top repositories created in the current month. Customize the script as needed.
  - Usage: python top_repos_this_month.py
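The README does not show how each time window becomes a GitHub query. The following is a plausible sketch under stated assumptions: it uses GitHub's repository search endpoint with a `created:` qualifier sorted by stars (assumed to match "most popular"), and it assumes `{YYYY-MM-DD}` in the tab name is the date the script runs. All function names are hypothetical, and the optional token parameter is an assumption, since the README does not mention one.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/repositories"


def window_start(today, days):
    """Earliest creation date in a rolling window of `days` days (30 or 180 above)."""
    return today - timedelta(days=days)


def month_start(today):
    """First day of the month containing `today` (the current-month case)."""
    return today.replace(day=1)


def tab_name(today, days):
    """Tab title in the {YYYY-MM-DD}_top_gh_repos_{days} format, dated with the run day."""
    return f"{today.isoformat()}_top_gh_repos_{days}"


def search_url(since, per_page=100):
    """Search URL for repositories created on or after `since`, most-starred first."""
    params = {
        "q": f"created:>={since.isoformat()}",
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,  # the search API returns at most 100 items per page
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def fetch_top_repos(since, token=None):
    """Fetch the first page of results (needs network access and requests)."""
    import requests

    headers = {"Accept": "application/vnd.github+json"}
    if token:  # hypothetical optional personal access token
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.get(search_url(since), headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json()["items"]


if __name__ == "__main__":
    today = date.today()
    print(tab_name(today, 30))
    print(search_url(window_start(today, 30)))
```

Keeping the date arithmetic and URL building separate from the network call makes those parts testable offline.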
Before running any script, ensure that the environment is set up with the necessary dependencies and environment variables.
Execute the desired script from the terminal:
python write_to_sheet_30.py # For last 30 days
python write_to_sheet_180.py # For last 180 days
The scripts will fetch the data from GitHub, process it, and write the results to the specified Google Sheet.
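The following is one possible sketch of the write step, not the repository's actual code. It flattens GitHub search results (`full_name`, `html_url`, `stargazers_count`, `language`, `created_at`, and `description` are real fields of the search API's `items`) into rows and writes them to a new tab. The column choice and helper names are hypothetical. The sketch uses gspread's `service_account()` helper, which authenticates through google-auth rather than the oauth2client package listed in the requirements, so the project's scripts may authenticate differently. It assumes the credentials file is a service-account key and that the spreadsheet is shared with that account's email address.

```python
def repos_to_rows(items):
    """Turn GitHub search API items into a header row plus one row per repository."""
    header = ["Name", "URL", "Stars", "Language", "Created", "Description"]
    rows = [header]
    for repo in items:
        rows.append([
            repo["full_name"],
            repo["html_url"],
            repo["stargazers_count"],
            repo.get("language") or "",      # may be null in the API response
            repo["created_at"],
            repo.get("description") or "",   # may be null in the API response
        ])
    return rows


def write_rows(credentials_path, spreadsheet_id, title, rows):
    """Create a new tab named `title` and append `rows` (requires gspread)."""
    import gspread

    client = gspread.service_account(filename=credentials_path)
    spreadsheet = client.open_by_key(spreadsheet_id)
    worksheet = spreadsheet.add_worksheet(title=title, rows=len(rows) + 1, cols=len(rows[0]))
    worksheet.append_rows(rows)
    return worksheet
```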
Contributions are welcome! If you have suggestions for improving this project, please submit an issue or fork the repository and submit a pull request.
For questions or support, reach out to me on X: @_genesisdayrit