WORK IN PROGRESS

Academic Gender Search Analysis

An interactive web application for analyzing gender distribution among chief investigators in academic research.

🚀 Live Demo

Visit the live application: https://ht-timchen.github.io/academic-gender-search

📊 Overview

This project analyzes the gender distribution of 2,679 Chief Investigators who each hold three or more Discovery Projects, using a three-tier methodology:

Tier 1: Web search analysis (74.8% - 2,004 researchers)
Tier 2: Name-based AI predictions (18.4% - 493 researchers)
Tier 3: Manual review system (6.8% - 182 researchers)

Final Results:

Male: 1,906 (71.1%)
Female: 591 (22.1%)
Unknown: 182 (6.8%)
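
The percentages above follow directly from the reported counts. As a quick arithmetic check, here is a minimal sketch; the variable and function names are illustrative and are not taken from the project's scripts:

```python
# Counts as reported in this README.
TOTAL = 2679
tiers = {"tier1_web_search": 2004, "tier2_name_based": 493, "tier3_manual_review": 182}
results = {"male": 1906, "female": 591, "unknown": 182}


def shares(counts, total):
    """Return each count as a percentage of total, rounded to one decimal place."""
    return {key: round(100 * value / total, 1) for key, value in counts.items()}


# Both breakdowns partition the same 2,679 researchers.
assert sum(tiers.values()) == TOTAL
assert sum(results.values()) == TOTAL
# Every researcher resolved by Tier 1 or Tier 2 is counted as male or female.
assert results["male"] + results["female"] == tiers["tier1_web_search"] + tiers["tier2_name_based"]

print(shares(tiers, TOTAL))
print(shares(results, TOTAL))
```

The 182 researchers labelled Unknown are the same 182 left for Tier 3 manual review.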

Highlights:

Gender distribution analysis with transparent confidence levels
Institutional affiliations and research areas
Interactive search and filtering capabilities
Responsive design for all devices

✨ Features

🔍 Search & Filter

Real-time search across names, affiliations, and research areas
Filter by gender (Male, Female, Unknown)
Filter by confidence level (High, Medium, Low)
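
The search and filter behaviour described above can be sketched as a single predicate over a researcher record. This is a Python illustration of the logic only, not the application's JavaScript; the function name `matches` and the sample records are hypothetical, while the field names follow the Data Format section of this README:

```python
def matches(record, query="", gender=None, confidence=None):
    """Return True if a researcher record passes the search text and both filters."""
    if gender and record.get("gender") != gender:
        return False
    if confidence and record.get("confidence") != confidence:
        return False
    # The search text is matched case-insensitively against the name,
    # affiliations, and research areas. An empty query matches everything.
    haystack = " ".join(
        [record.get("name", "")]
        + record.get("affiliations", [])
        + record.get("research_areas", [])
    ).lower()
    return query.lower() in haystack


# Made-up records for illustration only.
records = [
    {"name": "Example Person A", "affiliations": ["Example University"],
     "gender": "female", "confidence": "high", "research_areas": ["Physics"]},
    {"name": "Example Person B", "affiliations": ["Sample Institute"],
     "gender": "male", "confidence": "medium", "research_areas": ["Chemistry"]},
    {"name": "Example Person C", "affiliations": ["Example University"],
     "gender": "unknown", "confidence": "low", "research_areas": ["History"]},
]

visible = [r["name"] for r in records if matches(r, query="example university", gender="female")]
print(visible)
```

Applying the cheap equality filters before building the search string keeps the check fast when it is re-run on every keystroke.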

📈 Statistics Dashboard

Live statistics showing gender distribution
Total researcher count with breakdowns
Visual indicators for data confidence

🎨 Modern Interface

Clean, minimal design
Color-coded badges for easy identification
Expandable summaries for detailed information
Mobile-responsive layout

📱 Responsive Design

Works on desktop, tablet, and mobile
Optimized for all screen sizes
Touch-friendly interface

🛠️ Technical Stack

Frontend: Pure HTML, CSS, JavaScript
Data: JSON format with comprehensive researcher profiles
Deployment: GitHub Pages (static hosting)
No runtime dependencies: The web application runs entirely in the browser

🔬 Three-Tier Methodology

Tier 1: Web Search Analysis (74.8%)

Tool: OpenAI GPT-4o-mini-search-preview with web search capabilities
Process: Searches academic profiles, publications, institutional pages
Output: High-confidence gender identification with research areas
Cost: AUD $72.11 for 2,679 researchers

Tier 2: Name-Based AI Analysis (18.4%)

Tool: OpenAI GPT-4o-mini (no web search)
Process: Analyzes name patterns and linguistic origins
Output: Speculative predictions clearly marked in metadata
Cost: ~AUD $14.50 for 675 researchers (493 successful predictions)
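
One way the Tier 2 fallback could be combined with Tier 1 results is sketched below. This is an assumption about the merge step, not the project's actual code: the function name `merge_tiers` is hypothetical, and the field names are borrowed from the Data Format section of this README.

```python
def merge_tiers(tier1_record, tier2_prediction=None):
    """Return a merged copy of a Tier 1 record, using Tier 2 only as a fallback.

    Assumption: a name-based prediction is applied only when the web search
    left the gender as "unknown" and the prediction itself is not "unknown".
    """
    merged = dict(tier1_record)  # shallow copy so the input is not mutated
    if merged.get("gender") != "unknown" or not tier2_prediction:
        return merged
    predicted = tier2_prediction.get("name_based_gender", "unknown")
    if predicted == "unknown":
        return merged  # still unresolved, left for Tier 3 manual review
    merged["gender"] = predicted
    # Keep the speculative prediction clearly marked in the metadata.
    merged["name_analysis"] = {
        "method": "name_pattern_analysis",
        "original_gender": "unknown",
        **tier2_prediction,
    }
    return merged


tier1 = {"name": "Researcher Name", "gender": "unknown", "confidence": "low"}
tier2 = {"name_based_gender": "male", "confidence": "high",
         "disclaimer": "Speculative prediction based on name only"}
print(merge_tiers(tier1, tier2))
```

Under this sketch a Tier 1 result is never overwritten by a name-based guess, which matches the ordering of the tiers described above.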

Tier 3: Manual Review System (6.8%)

Process: Community-driven corrections via GitHub Issues
Target: Remaining 182 researchers + any misclassifications
Transparency: Full audit trail and correction history

📁 Project Structure

academic-gender-search/
├── index.html                         # Landing page with methodology
├── visualizer.html                    # Main interactive application
├── data/                              # Data files directory
│   ├── ci_gender.json                 # Final merged dataset (Tier 1+2)
│   ├── ci_short_search_results.json   # Tier 1 web search results
│   ├── ci_name_based_gender_analysis.json # Tier 2 name analysis results
│   ├── ci_gender_with_projects.json   # Dataset with project counts
│   ├── chief_investigators_data.json  # Source data with project counts
│   └── ci_short.json                  # Input dataset
├── output/                            # Generated files directory
│   ├── gender_chart_web.png           # Web-optimized gender chart
│   ├── gender_by_projects.png         # Detailed project analysis chart
│   ├── gender_analysis_detailed.png   # Comprehensive analysis chart
│   └── chart_section.html             # HTML section for embedding
├── ci_gender_analyzer_v3.py           # Tier 1 analyzer (web search)
├── ci_name_based_gender_analyzer.py   # Tier 2 analyzer (name-based)
├── add_project_counts.py              # Script to merge project data
├── create_web_chart.py                # Script to generate web charts
├── visualize_gender_by_projects.py    # Script to create detailed charts
├── serve_local.py                     # Local development server
├── serve_visualizer.py                # Server script used in Local Development, Option 1
├── .github/workflows/deploy.yml       # GitHub Pages deployment
└── README.md                          # This file

📂 Folder Organization

The project is organized into clear directories for better maintainability:

data/: Contains all JSON data files including input datasets, analysis results, and merged outputs
output/: Contains all generated files including charts, images, and HTML sections
Root: Contains Python scripts, HTML files, and documentation

This separation makes it easy to:

Distinguish between source data and generated outputs
Clean up generated files without affecting source data
Organize files by purpose and lifecycle

🚀 Local Development

Option 1: Python Server

python3 serve_visualizer.py

Option 2: Simple HTTP Server

python3 -m http.server 8000
# Open http://localhost:8000

Option 3: Direct File Opening

open visualizer.html
# Note: browsers may block loading the local JSON files from file:// URLs (CORS);
# use Option 1 or 2 if the data fails to load
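
For reference, a local static server needs only the Python standard library. The sketch below is not the content of serve_local.py or serve_visualizer.py; the function names `make_server` and `serve` are hypothetical, and the default port simply mirrors Option 2:

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory=".", port=8000):
    """Create (but do not start) a static file server bound to localhost."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve(directory=".", port=8000):
    """Serve the directory until interrupted with Ctrl+C."""
    server = make_server(directory, port)
    print(f"Serving {directory} on http://localhost:{server.server_address[1]}")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling `serve()` from the repository root and opening http://localhost:8000/visualizer.html loads the page over HTTP, so the JSON files under data/ are fetched without the file:// restriction.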

📈 Data Format

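The final dataset (data/ci_gender.json) has the structure shown below. Values separated by | list the allowed options, and the name_analysis object carries the Tier 2 prediction metadata: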

{
  "total_analyzed": 2679,
  "results": [
    {
      "name": "Researcher Name",
      "affiliations": ["University Name"],
      "gender": "male|female|unknown",
      "summary": "Research background and details...",
      "confidence": "high|medium|low",
      "research_areas": ["Area 1", "Area 2"],
      "web_sources_found": 5,
      "search_successful": true,
      "search_notes": "Tier 1 web search notes or Tier 2 name analysis disclaimer",
      "name_analysis": {
        "method": "name_pattern_analysis",
        "original_gender": "unknown",
        "name_based_gender": "male",
        "confidence": "high",
        "reasoning": "Common masculine name pattern",
        "disclaimer": "Speculative prediction based on name only"
      }
    }
  ]
}

Key Fields:

Tier 1 data: web_sources_found, search_successful, detailed summary
Tier 2 data: name_analysis object with prediction metadata
All tiers: Clear confidence indicators and transparency notes
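
A short sketch of reading a dataset in this format and tallying it by gender and by tier follows. The tier rule is an assumption made for illustration: a record is treated as Tier 2 when it carries a name_analysis object and as pending Tier 3 review when its gender is still unknown. The function names and the inline sample are hypothetical.

```python
import json
from collections import Counter


def tier_of(record):
    """Guess which tier produced a record (assumed rule, see the lead-in)."""
    if record.get("gender", "unknown") == "unknown":
        return "tier3_pending"
    if "name_analysis" in record:
        return "tier2"
    return "tier1"


def summarize(dataset):
    """Count records by gender and by inferred tier."""
    records = dataset["results"]
    return {
        "gender": Counter(r.get("gender", "unknown") for r in records),
        "tier": Counter(tier_of(r) for r in records),
    }


# Tiny inline sample in the documented format; a real run would instead use
# json.load() on an open handle to data/ci_gender.json.
sample = json.loads("""
{
  "total_analyzed": 3,
  "results": [
    {"name": "A", "gender": "female", "confidence": "high", "web_sources_found": 5},
    {"name": "B", "gender": "male", "confidence": "high",
     "name_analysis": {"original_gender": "unknown", "name_based_gender": "male"}},
    {"name": "C", "gender": "unknown", "confidence": "low"}
  ]
}
""")

summary = summarize(sample)
print(dict(summary["gender"]))
print(dict(summary["tier"]))
```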

🌐 Deployment

This project is automatically deployed to GitHub Pages when changes are pushed to the main branch.

Setup Steps:

Enable GitHub Pages in the repository settings
Set the source to "GitHub Actions"
Push changes to the main branch
The workflow (.github/workflows/deploy.yml) then deploys the site automatically

Custom Domain (Optional):

Add a CNAME file containing your domain
Configure the DNS settings for that domain
Enable HTTPS in the repository settings

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Test locally using the development server
Submit a pull request

📄 License

This project is open source and available under the MIT License.

📧 Contact

For questions or suggestions, please open an issue on GitHub.

Built with ❤️ for academic research transparency
