Estimate how frequently Python packages are imported across public GitHub repositories.
We determine package popularity by:
- Randomly sampling GitHub repositories with Python as the main language
- Analyzing Python import statements in these repositories
- Extrapolating findings based on the total Python repository count (~18M repositories)
A GitHub Actions workflow samples additional repositories every 6 hours, so the estimates become more accurate over time.
Note: Standard-library modules are no longer counted, but existing data for them has not yet been fully removed.
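
The extrapolation step described above can be sketched as a simple proportion: if a package appears in `k` of `n` sampled repositories, scale `k / n` by the estimated total. This is only an illustration, assuming counts are per repository; the function name `extrapolate` and the example numbers are hypothetical and not taken from `count_libs.py`.

```python
def extrapolate(repos_with_package: int, sampled_repos: int,
                total_repos: int = 18_000_000) -> int:
    """Scale a sample count up to an estimated total.

    total_repos defaults to the ~18M figure quoted above; the per-repository
    interpretation of the count is an assumption.
    """
    if sampled_repos <= 0:
        raise ValueError("sampled_repos must be positive")
    # Multiply before dividing to limit floating-point error.
    return round(repos_with_package * total_repos / sampled_repos)


if __name__ == "__main__":
    # Hypothetical sample: 50 of 1,000 sampled repositories import a package.
    print(extrapolate(50, 1_000))
```

Because the estimate is a plain proportion, its precision depends on the sample size, which is why periodic resampling tightens the numbers.
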
| Script | Purpose |
|---|---|
| find_repos.py | Queries GitHub API for random Python repositories |
| analyze_imports.py | Extracts import statements from repository files |
| count_libs.py | Aggregates and calculates package usage statistics |
| update_readme.py | Refreshes this README with latest data |
| total_python_repos.ipynb | Estimates total Python repository count on GitHub |
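
As an illustration of the import-extraction step, the sketch below pulls top-level package names out of Python source with the standard `ast` module and drops standard-library modules. It is not the code in `analyze_imports.py`; the function names are hypothetical, and the standard-library filter relies on `sys.stdlib_module_names`, which exists only on Python 3.10 and later.

```python
import ast
import sys


def top_level_imports(source: str) -> set:
    """Return the top-level names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c" contributes "a"
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import ("from . import x"); skip those
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


def third_party_only(names: set) -> set:
    """Drop standard-library modules (no filtering before Python 3.10)."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {name for name in names if name not in stdlib}


if __name__ == "__main__":
    sample = (
        "import os\n"
        "import numpy as np\n"
        "from sklearn.linear_model import LinearRegression\n"
        "from . import helpers\n"
    )
    print(sorted(third_party_only(top_level_imports(sample))))
```

Parsing with `ast` avoids false matches inside strings and comments, but files with syntax errors raise `SyntaxError` and would need to be skipped or handled separately. Names such as `utils` can still appear in the results, since a repository-local module imported absolutely is indistinguishable from an installed package at this level.
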
| File | Description | Format |
|---|---|---|
| repos.jsonl | Details of processed repositories | JSONL |
| imports.jsonl | Raw import statements extracted from repos | JSONL |
| library_counts.csv | Aggregated package usage statistics | CSV |
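
To show how the JSONL and CSV files could relate, here is a minimal aggregation sketch. The record layout (a `repo` string and an `imports` list per line) and the CSV column names are assumptions made for illustration; the actual schemas of `imports.jsonl` and `library_counts.csv` may differ.

```python
import csv
import io
import json
from collections import Counter


def count_libraries(jsonl_lines) -> Counter:
    """Count how many records (assumed: one per repository) import each library."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # set() so a library is counted once per record, however often it is imported
        counts.update(set(record["imports"]))
    return counts


def to_csv(counts: Counter) -> str:
    """Render counts as CSV text, highest count first, ties broken by name."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["library", "count"])
    for lib, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([lib, n])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical records, not real data from this project.
    lines = [
        '{"repo": "a/x", "imports": ["numpy", "pandas", "numpy"]}',
        '{"repo": "b/y", "imports": ["numpy"]}',
    ]
    print(to_csv(count_libraries(lines)))
```
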
Our GitHub Actions workflow orchestrates the entire process:
Find Random Repos → Analyze Imports → Count Package Usage → Update Statistics → Refresh README
| Rank | Library | Count |
|---|---|---|
| 1 | numpy | 45585 |
| 2 | matplotlib | 14881 |
| 3 | torch | 14671 |
| 4 | pandas | 13586 |
| 5 | cv2 | 9868 |
| 6 | django | 9421 |
| 7 | sklearn | 7882 |
| 8 | utils | 7210 |
| 9 | requests | 7192 |
| 10 | tensorflow | 7044 |
Last updated: 2025-11-25 06:43:20 UTC