Wikidata CSV file annotation tool
A Python script that automatically matches cell values from CSV files to corresponding Wikidata entities using the Wikidata API, generating a new CSV with matched entities.
- Processes each cell in CSV files to find matching Wikidata entities
- Outputs clean results with only successful matches
- Formats Wikidata QIDs as complete URLs (e.g., http://www.wikidata.org/entity/Q42)
- Includes rate limiting to comply with Wikidata API policies
- Command-line interface for easy integration into workflows
- Python 3.6+ required
- Install dependencies: pip install requests
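The lookup step could be sketched roughly as follows. This is an illustration, not the script's actual code: it assumes the script queries the `wbsearchentities` action of the Wikidata API and keeps the top hit. The names `build_search_params`, `http_get_json`, and `match_cell`, and the User-Agent string, are hypothetical.

```python
import time

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def build_search_params(text, language="en"):
    """Query parameters for the wbsearchentities action (top hit only)."""
    return {
        "action": "wbsearchentities",
        "search": text,
        "language": language,
        "format": "json",
        "limit": 1,
    }


def http_get_json(params):
    """Default fetcher. requests is imported lazily so the helpers can be tested offline."""
    import requests

    response = requests.get(
        WIKIDATA_API,
        params=params,
        timeout=10,
        headers={"User-Agent": "wikidata-matcher-example/0.1"},  # hypothetical value
    )
    response.raise_for_status()
    return response.json()


def match_cell(text, fetch=http_get_json, delay=0.5, sleep=time.sleep):
    """Return (entity_url, label) for the top hit, or None when nothing matches."""
    text = text.strip()
    if not text:
        return None  # empty cells are never sent to the API
    data = fetch(build_search_params(text))
    sleep(delay)  # pause after every request to stay within rate limits
    hits = data.get("search", [])
    if not hits:
        return None
    top = hits[0]
    return ENTITY_PREFIX + top["id"], top.get("label", "")
```

Passing `fetch` and `sleep` as parameters keeps the function testable without network access; in normal use both defaults apply.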
Basic command: python wikidata_matcher.py input.csv output.csv
Advanced options: python wikidata_matcher.py input.csv output.csv --file_id custom_id
- input.csv: Path to your input CSV file (required)
- output.csv: Path for the results CSV file (required)
- --file_id: Optional identifier for the file (defaults to input filename)
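A command-line interface with these arguments might be declared with `argparse` as in the sketch below. The function name `parse_args` and the attribute names `input_csv` and `output_csv` are hypothetical. The default for `--file_id` is assumed to be the input filename without its extension, which is what the `sample` value in the example output suggests.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two positional paths and the optional --file_id flag."""
    parser = argparse.ArgumentParser(
        description="Match CSV cell values to Wikidata entities."
    )
    parser.add_argument("input_csv", help="Path to the input CSV file")
    parser.add_argument("output_csv", help="Path for the results CSV file")
    parser.add_argument("--file_id", default=None,
                        help="Optional identifier for the file")
    args = parser.parse_args(argv)
    if args.file_id is None:
        # Assumption: fall back to the input filename without its extension.
        args.file_id = Path(args.input_csv).stem
    return args
```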
The results CSV contains these columns:
- file_id - Source file identifier
- row_id - Row number (1-based index)
- col_id - Column number (1-based index)
- wikidata_qid - Full Wikidata entity URL
- wikidata_label - Official label from Wikidata
Input (sample.csv):
City,Country
Paris,France
Tokyo,Japan
python wikidata_matcher.py sample.csv results.csv
Output (results.csv):
file_id,row_id,col_id,wikidata_qid,wikidata_label
sample,1,1,http://www.wikidata.org/entity/Q90,Paris
sample,1,2,http://www.wikidata.org/entity/Q142,France
sample,2,1,http://www.wikidata.org/entity/Q1490,Tokyo
sample,2,2,http://www.wikidata.org/entity/Q17,Japan
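The column layout and 1-based numbering in this example can be reproduced with the standard `csv` module. The sketch below is an assumption about how the output might be produced, not the script's actual code. The function `annotate` and its `match_cell` parameter are hypothetical, and it assumes the header row is skipped and not numbered, as the example suggests.

```python
import csv
import io

FIELDNAMES = ["file_id", "row_id", "col_id", "wikidata_qid", "wikidata_label"]


def annotate(in_fh, out_fh, file_id, match_cell, has_header=True):
    """Write one output row per matched cell; unmatched cells are skipped."""
    writer = csv.DictWriter(out_fh, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    rows = csv.reader(in_fh)
    if has_header:
        next(rows, None)  # assumption: the header row is not numbered
    for row_id, row in enumerate(rows, start=1):
        for col_id, cell in enumerate(row, start=1):
            match = match_cell(cell)
            if match is None:
                continue  # only successful matches reach the output
            url, label = match
            writer.writerow({
                "file_id": file_id,
                "row_id": row_id,
                "col_id": col_id,
                "wikidata_qid": url,
                "wikidata_label": label,
            })


# Offline demo: a stub dictionary stands in for the real API lookup.
STUB = {"Paris": ("http://www.wikidata.org/entity/Q90", "Paris")}
source = io.StringIO("City,Country\nParis,Unknownland\n")
result = io.StringIO()
annotate(source, result, "sample", STUB.get)
print(result.getvalue())
```

Any callable that returns an `(entity_url, label)` pair or `None` can be passed as `match_cell`, so the real lookup and a test stub are interchangeable.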
The script waits 0.5 seconds between requests. For large datasets:
- Run during off-peak hours (UTC 00:00-06:00)
- Consider local caching of results
- Contact Wikimedia for bulk access if needed
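Local caching, as suggested above, could be as simple as a JSON file keyed by cell value, so that repeated values cost no API call. The sketch below is one possible approach and is not part of the script. The class `CachedMatcher` and the default file name `wikidata_cache.json` are hypothetical, and `lookup` stands for any function that returns an `(entity_url, label)` pair or `None`.

```python
import json
import os


class CachedMatcher:
    """Wrap a lookup function with a JSON-file cache of earlier results."""

    def __init__(self, lookup, cache_path="wikidata_cache.json"):
        self.lookup = lookup
        self.cache_path = cache_path
        self.cache = {}
        if os.path.exists(cache_path):
            with open(cache_path, encoding="utf-8") as fh:
                self.cache = json.load(fh)

    def __call__(self, text):
        key = text.strip()
        if key not in self.cache:
            result = self.lookup(text)
            # JSON has no tuples: store a list, or None to remember a miss.
            self.cache[key] = list(result) if result else None
        hit = self.cache[key]
        return tuple(hit) if hit else None

    def save(self):
        """Persist the cache so later runs can reuse it."""
        with open(self.cache_path, "w", encoding="utf-8") as fh:
            json.dump(self.cache, fh, ensure_ascii=False, indent=2)
```

Misses are cached as well, so values with no Wikidata match are not queried again on later runs.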
Combine multiple Wikidata CSV outputs into a single consolidated file.
- 🔄 Merge any number of processed CSV files
- 📊 Preserve original column structure
- ⚡ Efficient memory handling for large files
- 🔍 Optional deduplication of identical matches
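A streaming merge with optional deduplication might look like the sketch below. It illustrates the general technique and is not the merge tool's actual code; the function `merge_csv` and its `dedupe` parameter are hypothetical. Rows are copied one at a time, so memory use stays small unless deduplication is enabled, in which case every distinct row is kept in a set.

```python
import csv
import glob
import os


def merge_csv(pattern, output_path, dedupe=False):
    """Copy rows from all files matching pattern into one CSV; return rows written."""
    out_abs = os.path.abspath(output_path)
    # Exclude the output file itself in case it matches the pattern.
    paths = [p for p in sorted(glob.glob(pattern)) if os.path.abspath(p) != out_abs]
    header = None
    seen = set()
    written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out_fh:
        writer = csv.writer(out_fh)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as in_fh:
                reader = csv.reader(in_fh)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)  # header is written once
                elif file_header != header:
                    raise ValueError(f"{path}: header differs from the first file")
                for row in reader:
                    if dedupe:
                        key = tuple(row)
                        if key in seen:
                            continue  # identical row already written
                        seen.add(key)
                    writer.writerow(row)
                    written += 1
    return written
```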
python merge_wikidata.py "input_files/*.csv" merged_output.csv
Please open an issue on GitHub for:
- Bug reports
- Feature requests
- Usage questions
MIT License