Skip to content

A data-driven study examining how geographical distance influences language translation patterns on Wikipedia. This project constructs and compares weighted graphs—based on translation frequency, geographic proximity, and random connections—to analyze structural similarities using advanced graph distance metrics and Louvain community detection.

Notifications You must be signed in to change notification settings

pranjaykumar926/Wikipedia-Edits-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

57 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Wikipedia Edits Analysis

Author: Pranjay Kumar Contact: [email protected]

Python Pandas NumPy


Table of Contents

  1. Introduction
  2. Project Features
  3. Data Sources
  4. Technologies Used
  5. Installation and Setup
  6. Usage
  7. Visualizations and Key Insights
  8. Planned Enhancements
  9. Contributing Guidelines
  10. License
  11. Contact Information

Introduction

Wikipedia is among the most actively edited and accessed online encyclopedias. This project is designed to examine patterns in Wikipedia edits with a specific focus on language and geographical distribution. Key objectives include:

  • Analyzing the spatial distribution of Wikipedia contributions
  • Studying temporal trends in edit activity
  • Investigating language-wise differences in contributions
  • Delivering visual insights through interactive and static representations

Project Features

  • Multilingual edit activity analysis
  • Geospatial mapping of contributions
  • Temporal trend evaluation (hourly, daily, monthly)
  • Clean and structured visualizations using industry-standard libraries
  • Optimized preprocessing pipeline for large-scale datasets

Data Sources

  • Wikipedia Public API: For edit history and metadata
  • Open Wikipedia Contribution Datasets: To enrich regional analysis
  • Geo-IP Databases: For mapping user IPs to locations

Technologies Used

  • Programming Language: Python 3.x
  • Data Processing: Pandas, NumPy
  • Visualization: Matplotlib, Seaborn
  • Geospatial Mapping: GeoPandas, Folium
  • API Integration: MediaWiki (Wikipedia) API

Installation and Setup

# Clone the repository
$ git clone https://github.com/pranjaykumar926/Wikipedia-Edits-Analysis.git
$ cd Wikipedia-Edits-Analysis

# Optional: Create a virtual environment
$ python -m venv venv
$ source venv/bin/activate   # On Windows: venv\Scripts\activate

# Install required dependencies
$ pip install -r requirements.txt

Usage

# Run the main analysis script
$ python analyze_edits.py

All charts, maps, and processed outputs will be saved in the output/ directory.


Visualizations and Key Insights

  • Global Heatmaps displaying edit concentrations
  • Time-Series Plots revealing activity patterns by hour/day/month
  • Language Distribution Graphs highlighting editorial focus per language
  • Top Contributors & Most Edited Articles statistics

Planned Enhancements

  • Natural Language Processing (NLP) for semantic content analysis
  • Deployment of a live dashboard with real-time edit tracking
  • Integration of machine learning to detect unusual edit patterns
  • Expanded data coverage with historical archive parsing

Contributing Guidelines

We welcome contributions from the community. To contribute:

  1. Fork the repository
  2. Create a new feature branch (git checkout -b feature/your-feature)
  3. Commit your changes with a clear message
  4. Push to your fork (git push origin feature/your-feature)
  5. Open a pull request describing your enhancements

License

This project is currently not covered under a specific license. Please contact the author for permissions and use cases.


Contact Information


Harnessing open knowledge to understand digital collaboration at scale.

About

A data-driven study examining how geographical distance influences language translation patterns on Wikipedia. This project constructs and compares weighted graphs—based on translation frequency, geographic proximity, and random connections—to analyze structural similarities using advanced graph distance metrics and Louvain community detection.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages