# Wikipedia Edits Analysis

**Author:** Pranjay Kumar
**Contact:** [email protected]
## Table of Contents

- Introduction
- Project Features
- Data Sources
- Technologies Used
- Installation and Setup
- Usage
- Visualizations and Key Insights
- Planned Enhancements
- Contributing Guidelines
- License
- Contact Information
## Introduction

Wikipedia is among the most actively edited and most widely read online encyclopedias. This project examines patterns in Wikipedia edits, with a focus on language and geographic distribution. Key objectives include:
- Analyzing the spatial distribution of Wikipedia contributions
- Studying temporal trends in edit activity
- Investigating language-wise differences in contributions
- Delivering visual insights through interactive and static representations
## Project Features

- Multilingual edit activity analysis
- Geospatial mapping of contributions
- Temporal trend evaluation (hourly, daily, monthly)
- Clean and structured visualizations using industry-standard libraries
- Optimized preprocessing pipeline for large-scale datasets
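The temporal trend evaluation listed above can be sketched in a few lines. The project's pipeline uses Pandas for this kind of aggregation; the minimal stdlib version below is illustrative, and the timestamp format matches what the MediaWiki API returns (function and variable names are assumptions, not the project's actual code):

```python
from collections import Counter
from datetime import datetime

def edits_per_hour(timestamps):
    """Count edits per hour of day from ISO-8601 timestamps
    like "2023-05-01T14:32:10Z" (the MediaWiki API format)."""
    hours = Counter()
    for ts in timestamps:
        # fromisoformat() in older Pythons rejects the trailing "Z"
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        hours[dt.hour] += 1
    return hours

sample = ["2023-05-01T14:32:10Z", "2023-05-01T14:59:01Z", "2023-05-02T09:15:42Z"]
print(edits_per_hour(sample))  # Counter({14: 2, 9: 1})
```

The same bucketing generalizes to daily and monthly trends by keying on `dt.date()` or `(dt.year, dt.month)` instead of `dt.hour`.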
## Data Sources

- Wikipedia Public API: For edit history and metadata
- Open Wikipedia Contribution Datasets: To enrich regional analysis
- Geo-IP Databases: For mapping user IPs to locations
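The Wikipedia Public API source above is the MediaWiki Action API; its `list=recentchanges` query returns recent edits with title, timestamp, and user. The sketch below builds such a request URL and shows the shape of the JSON it returns, using a hardcoded sample payload rather than a live network call (the response values are illustrative):

```python
from urllib.parse import urlencode

# English Wikipedia endpoint; swap the language subdomain
# (e.g. "de", "fr") to sample other editions.
API = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "list": "recentchanges",
    "rcprop": "title|timestamp|user",
    "rclimit": 50,
    "format": "json",
}
url = f"{API}?{urlencode(params)}"

# Shape of the JSON the endpoint returns (values are illustrative):
sample_response = {
    "query": {
        "recentchanges": [
            {"title": "Ada Lovelace",
             "timestamp": "2023-05-01T14:32:10Z",
             "user": "ExampleUser"},
        ]
    }
}

edits = sample_response["query"]["recentchanges"]
print([e["title"] for e in edits])  # ['Ada Lovelace']
```

In a live run, fetching `url` with `requests.get(url).json()` yields a payload of this shape, and a `rccontinue` token in the response drives pagination.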
## Technologies Used

- Programming Language: Python 3.x
- Data Processing: Pandas, NumPy
- Visualization: Matplotlib, Seaborn
- Geospatial Mapping: GeoPandas, Folium
- API Integration: MediaWiki (Wikipedia) API
## Installation and Setup

```bash
# Clone the repository
$ git clone https://github.com/pranjaykumar926/Wikipedia-Edits-Analysis.git
$ cd Wikipedia-Edits-Analysis

# Optional: Create a virtual environment
$ python -m venv venv
$ source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install required dependencies
$ pip install -r requirements.txt
```

## Usage

```bash
# Run the main analysis script
$ python analyze_edits.py
```

All charts, maps, and processed outputs will be saved in the `output/` directory.
## Visualizations and Key Insights

- Global Heatmaps displaying edit concentrations
- Time-Series Plots revealing activity patterns by hour/day/month
- Language Distribution Graphs highlighting editorial focus per language
- Top Contributors & Most Edited Articles statistics
## Planned Enhancements

- Natural Language Processing (NLP) for semantic content analysis
- Deployment of a live dashboard with real-time edit tracking
- Integration of machine learning to detect unusual edit patterns
- Expanded data coverage with historical archive parsing
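As a baseline for the planned unusual-edit-pattern detection, a simple statistical rule already catches gross anomalies: flag any time bucket whose edit count sits far above the mean. This z-score sketch is a stand-in for the eventual ML detector, not the project's implementation; the threshold and sample data are illustrative:

```python
from statistics import mean, stdev

def flag_unusual(counts, threshold=2.0):
    """Return indices of buckets whose edit count is more than
    `threshold` sample standard deviations above the mean."""
    if len(counts) < 2:
        return []
    mu, sigma = mean(counts), stdev(counts)
    return [i for i, c in enumerate(counts)
            if sigma > 0 and (c - mu) / sigma > threshold]

hourly = [12, 9, 11, 10, 13, 10, 250, 11]  # one suspicious spike
print(flag_unusual(hourly))  # [6]
```

A learned model could replace the z-score rule while keeping the same interface: counts in, suspicious bucket indices out.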
## Contributing Guidelines

We welcome contributions from the community. To contribute:

- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Commit your changes with a clear message
- Push to your fork: `git push origin feature/your-feature`
- Open a pull request describing your enhancements
## License

This project is not currently released under a specific license. Please contact the author for permission regarding use and distribution.
## Contact Information

- GitHub: [pranjaykumar926](https://github.com/pranjaykumar926)
- Email: [email protected]
*Harnessing open knowledge to understand digital collaboration at scale.*