This Python script utilizes Playwright and Beautiful Soup to scrape flight data from the ITA Matrix - https://matrix.itasoftware.com website. The scraped data is then stored in an Excel file.
- The script uses Playwright to automate interactions with the ITA Matrix website and collect flight data.
- User inputs for origin, destination, start date, and end date are validated.
- The script iterates over a date range, fills out the ITA Matrix search form, and collects data for each date.
- The collected data is organized into pandas dataframes and saved to an Excel file with separate sheets for each date.
- Clone the repository.
git clone https://github.com/AviDataGeek/ITAmatrix_price_scraper- Install required libraries via pip.
pip install -r requirements.txt- Run the script.
python main.py-
Follow the prompts to input the origin, destination, start date, and end date in the specified format.
-
Output should be saved into an Excel file named
output.xlsx. Make sure to check the generated Excel file for the collected flight information.
Here is the data scraping example
Used functions into this script is detailed below.
origin_input(): Takes user input for the origin airport in 3-char IATA format.destination_input(): Takes user input for the destination airport in 3-char IATA format.start_date_input(): Takes user input for the start date in MM/DD/YY format.end_date_input(): Takes user input for the end date in MM/DD/YY format.wait_for_table_load(page): Waits for the main table to load on the webpage.wait_for_carrier_table_load(page,carrier): Waits for carrier-specific table to load after clicking on a carrier.read_screen(page,starting_date,ending_date,dataframes): Reads the table contents for carriers and returns dataframe objects.flight_data_scraper(origin,destination,start_date,end_date): Main function to initiate the web scraping process.
Feel free to contribute! ✈