This project scrapes flight details from Google Flights, then cleans and visualizes the data for later use in analytics or predictive modeling.
- Web Scraping: Extracts detailed flight information, including:
  - Airline names
  - Flight duration
  - Price
  - Departure and arrival times
  - Departure and arrival dates
  - Airports
  - Stops
  - CO2 emissions
- Data Cleaning: Processes and filters scraped data to handle missing or unavailable values.
- CSV Outputs: Saves data in various stages for easy access and further use.
Data is scraped from Google Flights.
- `google_flights_data.csv`: Raw flight data scraped from the website.
- `price_unavailable_data.csv`: Records where price information is unavailable.
- `cleaned_google_flights_data.csv`: Cleaned and processed dataset ready for analysis.
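The relationship between the three files can be sketched as a split of the raw rows on whether a price is present. This is a minimal standard-library sketch, not the notebook's actual code (which uses Pandas). The function names are hypothetical, and the `"Price unavailable"` placeholder text is an assumption about how missing fares appear in the raw data.

```python
import csv

# Assumption: rows with no fare carry this placeholder text in the "Price" column.
PRICE_UNAVAILABLE = "Price unavailable"


def split_by_price(rows):
    """Return (priced, unpriced) lists of row dicts."""
    priced, unpriced = [], []
    for row in rows:
        price = (row.get("Price") or "").strip()
        # Treat an empty cell and the placeholder text the same way.
        if not price or price.lower() == PRICE_UNAVAILABLE.lower():
            unpriced.append(row)
        else:
            priced.append(row)
    return priced, unpriced


def write_csv(path, rows, fieldnames):
    """Write row dicts to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def split_raw_file(raw_path, cleaned_path, unavailable_path):
    """Read the raw CSV and write the cleaned and price-unavailable files."""
    with open(raw_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        rows = list(reader)
    priced, unpriced = split_by_price(rows)
    write_csv(cleaned_path, priced, fieldnames)
    write_csv(unavailable_path, unpriced, fieldnames)
    return len(priced), len(unpriced)
```

Calling `split_raw_file("google_flights_data.csv", "cleaned_google_flights_data.csv", "price_unavailable_data.csv")` would produce the two derived files and return the row count of each.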
- Python
- Selenium: For browser automation and interaction.
- BeautifulSoup: For parsing HTML and extracting data.
- Pandas: For data manipulation and cleaning.
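- Install Dependencies:
  `pip install selenium beautifulsoup4 pandas`
- Set Up ChromeDriver:
  - Download the ChromeDriver version that matches your installed Chrome from the official ChromeDriver downloads page.
  - Update the path to the driver in the notebook.
- Run the Notebook: the script is a Jupyter notebook, so open it in Jupyter (install it with `pip install notebook` if needed) and run all cells:
  `jupyter notebook scrape_flight_details_and_visualization.ipynb`
- Check Outputs:
  - The scraped data is saved as CSV files in the working directory.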
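For the ChromeDriver step, the driver path is typically passed to Selenium through a `Service` object. The sketch below assumes Selenium 4; the `build_driver` helper and its `headless` parameter are hypothetical and may not match how the notebook creates its driver.

```python
def build_driver(chromedriver_path, headless=True):
    """Create a Chrome WebDriver that uses the given ChromeDriver binary."""
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    if headless:
        # Run Chrome without a visible window (recent Chrome versions).
        options.add_argument("--headless=new")
    # Point Selenium at the downloaded ChromeDriver executable.
    service = Service(executable_path=chromedriver_path)
    return webdriver.Chrome(service=service, options=options)
```

A caller would then load a page with `driver.get(url)`, read the markup from `driver.page_source`, and release the browser with `driver.quit()`.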
- Scraping:
  - Selenium navigates to the specified Google Flights URL.
  - BeautifulSoup parses the HTML to extract flight details.
- Data Cleaning:
  - Processes columns like "Price" and "Arrival Time."
  - Saves filtered data to `cleaned_google_flights_data.csv`.
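As an illustration of the cleaning step, scraped text values such as `₹27,515` or `11 hr 30 min` can be converted to numbers before analysis. The helpers below are a hypothetical sketch using only the standard library. They assume whole-number fares and the `N hr M min` duration format seen in the sample data, and they are not taken from the notebook.

```python
import re


def parse_price(text):
    """Convert a price string such as '₹27,515' to an int.

    Returns None when the text contains no digits (for example, a
    placeholder for an unavailable price). Assumes whole-number fares.
    """
    digits = re.sub(r"[^\d]", "", text or "")
    return int(digits) if digits else None


def parse_duration_minutes(text):
    """Convert a duration such as '11 hr 30 min' to total minutes.

    Accepts hours only ('11 hr') or minutes only ('45 min'); returns
    None when neither part is present.
    """
    match = re.fullmatch(r"\s*(?:(\d+)\s*hr)?\s*(?:(\d+)\s*min)?\s*", text or "")
    if not match or not any(match.groups()):
        return None
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes
```

With Pandas, helpers like these could be applied to a column through `Series.map`, for example `df["Price"].map(parse_price)`.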
Airline | Flight Duration | Price | Departure Time | Departure Date | Departure Airport | Arrival Time | Arrival Date | Arrival Airport | Stops | CO2 Emissions | Next Day Dispatcher |
---|---|---|---|---|---|---|---|---|---|---|---|
IndiGo | 11 hr 30 min | ₹27,515 | 8:45 AM | Sun, Feb 2 | Jayprakash Narayan International Airport, Patna | 6:45 PM | Sun, Feb 2 | Zayed International Airport | 1 stop | 266 kg CO2e | 0 |
IndiGo | 11 hr 15 min | ₹28,349 | 12:40 PM | Sun, Feb 2 | Jayprakash Narayan International Airport, Patna | 10:25 PM | Sun, Feb 2 | Zayed International Airport | 1 stop | 257 kg CO2e | 0 |
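The "Stops" and "CO2 Emissions" columns in the sample above are stored as text. A possible way to turn them into numbers for later analysis is sketched below. Both helper names are hypothetical, and treating the label `Nonstop` as zero stops is an assumption, since the sample rows only show `1 stop`.

```python
import re


def parse_stops(text):
    """Convert '1 stop' or '2 stops' to an int; None if unrecognized.

    Assumption: direct flights are labeled 'Nonstop' and count as 0.
    """
    cleaned = (text or "").strip().lower()
    if cleaned == "nonstop":
        return 0
    match = re.fullmatch(r"(\d+)\s+stops?", cleaned)
    return int(match.group(1)) if match else None


def parse_co2_kg(text):
    """Convert '266 kg CO2e' to 266; None when no kilogram value is found."""
    match = re.search(r"(\d[\d,]*)\s*kg", text or "")
    return int(match.group(1).replace(",", "")) if match else None
```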
- Automate scheduled runs with crontab on Linux.
- Collect more data for extended analysis.
- Implement advanced visualization techniques.
- Apply ML/DL models for predictive analytics, such as estimating missing flight prices.
This project is licensed under the MIT License. See the LICENSE file for more details.
For questions or contributions, feel free to open an issue or submit a pull request.