A Python web scraping application that collects product data from Amazon India and displays it in a beautiful vintage-styled analytics dashboard.
- Multi-Category Scraping: Scrapes 5 product categories (Laptops, Headphones, Keyboards, Gaming Mice, Monitors)
- Concurrent Scraping: Uses ThreadPoolExecutor for fast parallel scraping
- SQLite Database: Stores product data with name, price, rating, image URL, and timestamp
- Beautiful Dashboard: Flask web app with vintage notebook aesthetic
- Rich Analytics:
  - 4 stat cards (total products, avg/min/max prices)
  - Bar chart (average price by category)
  - Pie chart (product distribution)
  - Histogram (price distribution)
  - Data table with all products
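The stat cards and the per-category bar chart reduce to a few aggregate queries. The sketch below is illustrative only: the `products` table layout, the `category` column, and the `dashboard_stats` helper are assumptions, and it uses plain SQL aggregates rather than the Pandas code the dashboard may actually use.

```python
import sqlite3


def dashboard_stats(conn):
    """Return the four stat-card values plus the average price per category."""
    total, avg, low, high = conn.execute(
        "SELECT COUNT(*), AVG(price), MIN(price), MAX(price) FROM products"
    ).fetchone()
    # Each row is a (category, average price) pair, so dict() maps one to the other.
    by_category = dict(
        conn.execute(
            "SELECT category, AVG(price) FROM products GROUP BY category"
        ).fetchall()
    )
    return {
        "total": total,
        "avg": avg,
        "min": low,
        "max": high,
        "avg_by_category": by_category,
    }


# Demo on an in-memory database with made-up rows (not real scraped data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Laptop A", "Laptops", 50000.0),
        ("Laptop B", "Laptops", 70000.0),
        ("Mouse A", "Gaming Mice", 1500.0),
    ],
)
stats = dashboard_stats(conn)
print(stats)
```

Computing the aggregates in SQLite keeps the Flask route thin; the same numbers could equally come from a Pandas `DataFrame` loaded from the database.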
- Python 3.x
- BeautifulSoup4 - HTML parsing
- Requests - HTTP client
- Flask - Web framework
- Pandas - Data manipulation
- Matplotlib & Seaborn - Data visualization
- SQLite3 - Database
- Clone the repository:
git clone <your-repo-url>
cd web_scraper
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
Run everything with the convenience script:
./run.sh
This script will:
- Run the scraper to collect ~110 products from Amazon
- Start the Flask dashboard at http://127.0.0.1:5000/
Run scraper only:
python main.py
Run dashboard only:
python web/app.py
Then open http://127.0.0.1:5000/ in your browser.
web_scraper/
├── main.py                  # Main scraping orchestration
├── requirements.txt         # Python dependencies
├── run.sh                   # Convenience script to run everything
├── scrapers/
│   ├── base_scraper.py      # Abstract base scraper class
│   └── specific_scraper.py  # Amazon product scraper implementation
├── utils/
│   └── database_manager.py  # SQLite database context manager
└── web/
    ├── app.py               # Flask application
    ├── static/
    │   └── style.css        # Additional styles (if needed)
    └── templates/
        └── index.html       # Dashboard HTML template
For each product:
- Product name
- Price (₹)
- Rating
- Image URL
- Source URL
- Scraped timestamp
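One way these fields could map onto a SQLite table, written through a context manager, is sketched below. The column names, the `open_db` and `save_product` helpers, and the commit/rollback policy are all assumptions for illustration; the actual `utils/database_manager.py` may differ.

```python
import sqlite3
from contextlib import contextmanager
from datetime import datetime, timezone

# Hypothetical schema: one column per field listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS products (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    price REAL,
    rating REAL,
    image_url TEXT,
    source_url TEXT,
    scraped_at TEXT
)
"""


@contextmanager
def open_db(path="data.db"):
    """Open the database, ensure the table exists, commit on success."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(SCHEMA)
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def save_product(conn, name, price, rating, image_url, source_url):
    """Insert one product row, stamping it with the current UTC time."""
    conn.execute(
        "INSERT INTO products "
        "(name, price, rating, image_url, source_url, scraped_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (name, price, rating, image_url, source_url,
         datetime.now(timezone.utc).isoformat()),
    )


# Demo against an in-memory database with a made-up product.
with open_db(":memory:") as conn:
    save_product(conn, "Example Laptop", 49990.0, 4.3,
                 "https://example.com/img.jpg", "https://example.com/item")
    row = conn.execute("SELECT name, price, rating FROM products").fetchone()
print(row)
```

Storing the timestamp as an ISO 8601 string keeps it sortable as text, since SQLite has no dedicated datetime type.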
- Vintage Aesthetic: Beige/brown color palette with notebook-style design
- Responsive Design: Works on desktop, tablet, and mobile
- Real-time Stats: Live calculations from database
- Visual Analytics: Multiple chart types for data insights
- Clean Tables: Top 50 products displayed with full details
- The scraper sends realistic browser headers to reduce the chance of requests being blocked
- Concurrent scraping runs with 5 workers
- Data persists in the data.db SQLite database
- Charts use a vintage color palette matching the dashboard theme
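The 5-worker concurrent scraping noted above can be sketched with `ThreadPoolExecutor` as follows. This is a minimal illustration, not the project's actual code: `scrape_category` is a hypothetical stand-in that returns canned data instead of making HTTP requests, and splitting the work one task per category is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

CATEGORIES = ["Laptops", "Headphones", "Keyboards", "Gaming Mice", "Monitors"]


def scrape_category(category):
    """Hypothetical stand-in: a real version would fetch and parse a page."""
    return [{"category": category, "name": f"{category} sample", "price": 0.0}]


def scrape_all(categories, scrape=scrape_category, max_workers=5):
    """Run one scrape task per category in a thread pool and merge the results."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its category so failures can be reported.
        futures = {pool.submit(scrape, c): c for c in categories}
        for future in as_completed(futures):
            category = futures[future]
            try:
                results.extend(future.result())
            except Exception as exc:
                # One failing category should not abort the others.
                print(f"{category} failed: {exc}")
    return results


products = scrape_all(CATEGORIES)
print(len(products))
```

Threads suit this workload because scraping is I/O-bound: each worker spends most of its time waiting on the network, so the global interpreter lock is not the bottleneck.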
MIT License - Feel free to use and modify!