An interactive web application for analyzing gender distribution among chief investigators in academic research.
Visit the live application: https://ht-timchen.github.io/academic-gender-search
This project analyzes 2,679 Chief Investigators, each with three or more Discovery Projects, using a three-tier methodology:
- Tier 1: Web search analysis (74.8% - 2,004 researchers)
- Tier 2: Name-based AI predictions (18.4% - 493 researchers)
- Tier 3: Manual review system (6.8% - 182 researchers)
- Male: 1,906 (71.1%)
- Female: 591 (22.1%)
- Unknown: 182 (6.8%)
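The percentages above follow directly from the counts. As a quick check, here is a minimal Python sketch that recomputes them; the variable and function names are illustrative and are not taken from the project's scripts.

```python
# Counts as reported above; names are illustrative only.
TOTAL = 2679

tier_counts = {"web_search": 2004, "name_based": 493, "manual_review": 182}
gender_counts = {"male": 1906, "female": 591, "unknown": 182}


def percentages(counts, total):
    """Return each count as a percentage of total, rounded to one decimal."""
    return {key: round(100 * value / total, 1) for key, value in counts.items()}


# Both breakdowns should account for every researcher exactly once.
assert sum(tier_counts.values()) == TOTAL
assert sum(gender_counts.values()) == TOTAL

tier_pct = percentages(tier_counts, TOTAL)
gender_pct = percentages(gender_counts, TOTAL)
```

Note that the 2,497 researchers with an assigned gender (1,906 + 591) equal the Tier 1 and Tier 2 totals combined (2,004 + 493), and the 182 unknowns are exactly the Tier 3 group.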
- Gender distribution analysis with transparent confidence levels
- Institutional affiliations and research areas
- Interactive search and filtering capabilities
- Responsive design for all devices
- Real-time search across names, affiliations, and research areas
- Filter by gender (Male, Female, Unknown)
- Filter by confidence level (High, Medium, Low)
- Live statistics showing gender distribution
- Total researcher count with breakdowns
- Visual indicators for data confidence
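The search, filter, and live-statistics behaviour can be sketched as follows. The live application is written in JavaScript, so this Python version only illustrates the logic. The field names (`name`, `affiliations`, `research_areas`, `gender`, `confidence`) follow the record layout of `ci_gender.json`, while `matches` and `filter_records` are hypothetical helper names.

```python
from collections import Counter


def matches(record, query="", gender=None, confidence=None):
    """Return True if a record passes the text query and both optional filters."""
    if gender and record.get("gender") != gender:
        return False
    if confidence and record.get("confidence") != confidence:
        return False
    if not query:
        return True
    # Search across name, affiliations, and research areas, case-insensitively.
    haystack = " ".join(
        [record.get("name", "")]
        + record.get("affiliations", [])
        + record.get("research_areas", [])
    ).lower()
    return query.lower() in haystack


def filter_records(records, **criteria):
    """Return the matching records and a gender tally of those matches."""
    hits = [r for r in records if matches(r, **criteria)]
    stats = Counter(r.get("gender", "unknown") for r in hits)
    return hits, stats
```

Recomputing the tally from the filtered list, rather than from the full dataset, is what keeps the statistics in step with the current search and filter selection.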
- Clean, minimal design
- Color-coded badges for easy identification
- Expandable summaries for detailed information
- Mobile-responsive layout
- Works on desktop, tablet, and mobile
- Optimized for all screen sizes
- Touch-friendly interface
- Frontend: Pure HTML, CSS, JavaScript
- Data: JSON format with comprehensive researcher profiles
- Deployment: GitHub Pages (static hosting)
- No dependencies: Runs entirely in the browser
- Tool: OpenAI GPT-4o-mini-search-preview with web search capabilities
- Process: Searches academic profiles, publications, institutional pages
- Output: High-confidence gender identification with research areas
- Cost: AUD $72.11 for 2,679 researchers
- Tool: OpenAI GPT-4o-mini (no web search)
- Process: Analyzes name patterns and linguistic origins
- Output: Speculative predictions clearly marked in metadata
- Cost: ~AUD $14.50 for 675 researchers (493 successful predictions)
- Process: Community-driven corrections via GitHub Issues
- Target: Remaining 182 researchers + any misclassifications
- Transparency: Full audit trail and correction history
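The way the three tiers fit together can be sketched as a simple fallback chain. This is an assumption about the merge logic, not the project's actual code: `merge_tiers` is a hypothetical helper, and the `name_analysis`, `original_gender`, and `name_based_gender` fields mirror the metadata stored in the final dataset.

```python
def merge_tiers(tier1_record, tier2_prediction=None):
    """Combine a Tier 1 record with an optional Tier 2 prediction.

    Returns (merged_record, tier), where tier is 1, 2, or 3.
    """
    merged = dict(tier1_record)

    # Tier 1: the web search already identified a gender.
    if merged.get("gender") in ("male", "female"):
        return merged, 1

    # Tier 2: fall back to the speculative name-based prediction, if any.
    predicted = (tier2_prediction or {}).get("name_based_gender", "unknown")
    if predicted in ("male", "female"):
        merged["gender"] = predicted
        # Keep the prediction metadata so the result stays clearly marked.
        merged["name_analysis"] = dict(tier2_prediction, original_gender="unknown")
        return merged, 2

    # Tier 3: still unknown, so the record is left for manual review.
    merged["gender"] = "unknown"
    return merged, 3
```

Under this reading, only the 675 researchers that Tier 1 could not resolve are sent to Tier 2, which matches the Tier 2 cost figure above.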
academic-gender-search/
├── index.html                               # Landing page with methodology
├── visualizer.html                          # Main interactive application
├── data/                                    # Data files directory
│   ├── ci_gender.json                       # Final merged dataset (Tier 1+2)
│   ├── ci_short_search_results.json         # Tier 1 web search results
│   ├── ci_name_based_gender_analysis.json   # Tier 2 name analysis results
│   ├── ci_gender_with_projects.json         # Dataset with project counts
│   ├── chief_investigators_data.json        # Source data with project counts
│   └── ci_short.json                        # Input dataset
├── output/                                  # Generated files directory
│   ├── gender_chart_web.png                 # Web-optimized gender chart
│   ├── gender_by_projects.png               # Detailed project analysis chart
│   ├── gender_analysis_detailed.png         # Comprehensive analysis chart
│   └── chart_section.html                   # HTML section for embedding
├── ci_gender_analyzer_v3.py                 # Tier 1 analyzer (web search)
├── ci_name_based_gender_analyzer.py         # Tier 2 analyzer (name-based)
├── add_project_counts.py                    # Script to merge project data
├── create_web_chart.py                      # Script to generate web charts
├── visualize_gender_by_projects.py          # Script to create detailed charts
├── serve_local.py                           # Local development server
├── .github/workflows/deploy.yml             # GitHub Pages deployment
└── README.md                                # This file
The project is organized into clear directories for better maintainability:
- data/: Contains all JSON data files including input datasets, analysis results, and merged outputs
- output/: Contains all generated files including charts, images, and HTML sections
- Root: Contains Python scripts, HTML files, and documentation
This separation makes it easy to:
- Distinguish between source data and generated outputs
- Clean up generated files without affecting source data
- Organize files by purpose and lifecycle
python3 serve_visualizer.py
python3 -m http.server 8000
# Open http://localhost:8000
open visualizer.html
# Note: May have CORS issues with local JSON loading
The final dataset (ci_gender.json) includes comprehensive metadata:
{
  "total_analyzed": 2679,
  "results": [
    {
      "name": "Researcher Name",
      "affiliations": ["University Name"],
      "gender": "male|female|unknown",
      "summary": "Research background and details...",
      "confidence": "high|medium|low",
      "research_areas": ["Area 1", "Area 2"],
      "web_sources_found": 5,
      "search_successful": true,
      "search_notes": "Tier 1 web search notes or Tier 2 name analysis disclaimer",
      "name_analysis": {
        "method": "name_pattern_analysis",
        "original_gender": "unknown",
        "name_based_gender": "male",
        "confidence": "high",
        "reasoning": "Common masculine name pattern",
        "disclaimer": "Speculative prediction based on name only"
      }
    }
  ]
}
- Tier 1 data: web_sources_found, search_successful, detailed summary
- Tier 2 data: name_analysis object with prediction metadata
- All tiers: Clear confidence indicators and transparency notes
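Given that metadata, a record's tier can be inferred when reading the dataset. The sketch below assumes that only Tier 2 records carry a `name_analysis` object and that any remaining `unknown` record belongs to Tier 3; the source does not spell out either rule. The helper names are hypothetical, and the default path follows the project layout.

```python
import json
from collections import Counter


def load_dataset(path="data/ci_gender.json"):
    """Parse the merged dataset from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def tier_of(record):
    """Infer the tier from the record's metadata (assumed rules, see above)."""
    if "name_analysis" in record:
        return "tier2_name_based"
    if record.get("gender") in ("male", "female"):
        return "tier1_web_search"
    return "tier3_manual_review"


def summarize(dataset):
    """Count records per tier in a parsed ci_gender.json-style dict."""
    return Counter(tier_of(r) for r in dataset.get("results", []))
```

Calling `summarize(load_dataset())` from the repository root would give the per-tier counts for the real file, provided the assumptions above hold.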
This project is automatically deployed to GitHub Pages when changes are pushed to the main branch.
- Enable GitHub Pages in repository settings
- Set source to "GitHub Actions"
- Push changes to main branch
- Workflow will automatically deploy the site
- Add CNAME file with your domain
- Configure DNS settings
- Enable HTTPS in repository settings
- Fork the repository
- Create a feature branch
- Make your changes
- Test locally using the development server
- Submit a pull request
This project is open source and available under the MIT License.
For questions or suggestions, please open an issue on GitHub.
Built with ❤️ for academic research transparency