
Multimodal Fact-Checker: AI-Powered Authenticity and Context Verification


A comprehensive system for multimedia authenticity and context verification in online media, developed for the ACM Multimedia 2025 Grand Challenge on Multimedia Verification.

📖 Overview

This system addresses the growing challenge of misinformation and disinformation in online multimedia content. It provides a unified verification pipeline that evaluates both the authenticity and contextual accuracy of multimedia content across multilingual settings, producing expert-oriented verification reports alongside accessible summaries for the public.

Key Features

  • 🌍 Geolocation Verification - Advanced GPS coordinate prediction using the G3 framework
  • ⏰ Temporal Analysis - Capture/recording time detection through metadata and reverse image search
  • 🤖 AI-Generated Content Detection - Sophisticated detection using AIGVDet with spatial-temporal analysis
  • 📊 Evidence Aggregation - Comprehensive report generation with multi-source verification
  • 🔗 Out-of-Context Detection - Hybrid OOC detection using SearchOOC and HierOOC methods
  • 🚀 Scalable Architecture - Microservices-based design with FastAPI
  • 📱 Multi-format Support - Images, videos, and multilingual metadata processing

πŸ—οΈ System Architecture

The system follows a multi-stage pipeline architecture that integrates four core verification services:

Input (JSON + Images/Videos) 
    ↓
[Data Preprocessing]
    ↓
┌─────────────────────────────────────────┐
│       Core Verification Services        │
├─────────────┬─────────────┬─────────────┤
│ 🌍 Where?   │ ⏰ When?    │ 🤖 AI Det   │
│ (G3)        │ (Timestamp) │ (AIGVDet)   │
└─────────────┴─────────────┴─────────────┘
    ↓
[Evidence Aggregation & Report Generation]
    ↓
Output (Structured Verification Report)

Core Components

  1. Geolocation Service (modules/ACMMM25-Grand-Challenge-Geolocation/)

    • Implements the G3 framework for geographic location prediction
    • Combines visual features, textual descriptions, and GPS coordinates
    • Uses Google Gemini 2.5 Pro for contextual reasoning
  2. Timestamp Detection Service (modules/timestamp_detector/)

    • Detects capture/recording time through metadata analysis
    • Performs reverse image search and textual similarity matching
    • Provides confidence scoring for temporal estimates
  3. AI-Generated Visual Detection (modules/AIGVDet/)

    • Spatial Domain Detector for visual artifacts analysis
    • Optical Flow Detector for temporal inconsistencies
    • Trained on Generated Video Dataset (GVD) with 11 generative models
  4. Report Generation Service (modules/acmmm2025-report/)

    • Aggregates evidence from all verification services
    • Generates structured reports for experts and simplified summaries for the public
    • Implements content classification and intent analysis

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • Git LFS (Large File Storage) for model files and datasets
  • CUDA-compatible GPU (recommended for AI detection)
  • Docker and Docker Compose
  • API keys for Google services (Vision, Gemini)

Installation

  1. Install Git LFS (if not already installed)

    # macOS
    brew install git-lfs
    
    # Ubuntu/Debian
    sudo apt install git-lfs
    
    # Windows
    # Download from https://git-lfs.github.io/
    
    # Initialize Git LFS
    git lfs install
  2. Clone the repository

    git clone <repository-url>
    cd multimodal-fact-checker
    
    # Fetch LFS files
    git lfs fetch
    git lfs checkout
  3. Set up environment variables

    # Copy and configure environment files for each service
    cp modules/ACMMM25-Grand-Challenge-Geolocation/.env.example .env
    # Add your API keys and service configurations
  4. Install dependencies for each service

    # Geolocation service
    cd modules/ACMMM25-Grand-Challenge-Geolocation
    pip install -r requirements.txt
    
    # Timestamp detection service  
    cd ../timestamp_detector
    pip install -r requirements.txt
    
    # AI detection service
    cd ../AIGVDet  
    pip install -r requirements.txt
    
    # Report generation service
    cd ../acmmm2025-report
    pip install -r requirements.txt
  5. Verify Git LFS files are downloaded

    # Check that model checkpoints are present
    ls -la modules/AIGVDet/checkpoints/
    # Should show optical.pth and original.pth
    
    # If files are missing, manually pull LFS files
    git lfs pull
    
    # Check file sizes (LFS files should not be text pointers)
    file modules/AIGVDet/checkpoints/*.pth

Running the Services

Option 1: Docker Compose (Recommended)

# Start all services
docker-compose up -d

# Check service health
curl http://localhost:8000/health  # Geolocation
curl http://localhost:8001/health  # Timestamp  
curl http://localhost:8002/health  # AI Detection
curl http://localhost:8003/health  # Report Generation

Option 2: Individual Services

Start each service in separate terminals:

# Terminal 1: Geolocation Service (Port 8000)
cd modules/ACMMM25-Grand-Challenge-Geolocation
uvicorn app:app --host 0.0.0.0 --port 8000

# Terminal 2: Timestamp Detection Service (Port 8001)  
cd modules/timestamp_detector
uvicorn app.main:app --host 0.0.0.0 --port 8001

# Terminal 3: AI Detection Service (Port 8002)
cd modules/AIGVDet
python main.py --port 8002

# Terminal 4: Report Generation Service (Port 8003)
cd modules/acmmm2025-report  
uvicorn app.main:app --host 0.0.0.0 --port 8003

Using the Pipeline

Method 1: Python Integration

import asyncio

from multimodal_fact_checker_pipeline import MultimodalFactCheckerPipeline

async def main():
    # Initialize pipeline
    pipeline = MultimodalFactCheckerPipeline()

    # Define metadata
    metadata = {
        "title": "Breaking News: City Center Incident",
        "description": "Video shows emergency response downtown",
        "location": "Downtown Area, Major City",
        "category": "news",
        "violence_level": "low"
    }

    # Define media files
    media_files = ["video.mp4", "image.jpg"]

    # Run verification (the services must be running)
    result = await pipeline.verify_multimedia(media_files, metadata)

    # Generate summary report
    summary = pipeline.generate_summary_report(result)
    print(summary)

asyncio.run(main())

Method 2: Direct API Calls

Geolocation Service:

# The multipart field names below are illustrative; adjust them to the form fields the service expects
curl -X POST "http://localhost:8000/g3/predict" \
  -F "files=@video.mp4" \
  -F "metadata=@metadata.json"

Timestamp Detection:

# The multipart field names below are illustrative; adjust them to the form fields the service expects
curl -X POST "http://localhost:8001/analyze/" \
  -F "files=@video.mp4" \
  -F "metadata=@metadata.json"

AI Detection:

python modules/AIGVDet/main.py \
  --input_path video.mp4 \
  --output_json results.json

📊 Service Details

🌍 Geolocation Service (G3 Framework)

Purpose: Predicts geographic locations of images and videos using advanced multimodal learning.

Technology Stack:

  • G3 Framework with Geo-alignment and Geo-diversification
  • Google Gemini 2.5 Pro for contextual reasoning
  • MP16-Pro dataset for training
  • CLIP encoders for visual-textual feature extraction

API Endpoint: POST /g3/predict

Input:

  • Images/videos (JPEG, PNG, MP4)
  • Metadata JSON file

Output:

{
  "prediction": {
    "latitude": 40.7128,
    "longitude": -74.0060, 
    "location": "New York City, NY, USA",
    "evidence": [
      {
        "analysis": "Landmark identification suggests Times Square area",
        "references": ["base64_image_data", "https://source.url"]
      }
    ]
  },
  "transcript": "Audio transcript if available"
}

Configuration:

  • Set GOOGLE_API_KEY for Gemini Pro access
  • Configure MP16_DATABASE_PATH for image database
  • Adjust device setting for CPU/GPU usage
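
For programmatic access, the same endpoint can be called from Python. The sketch below is illustrative only: the multipart field names and local URL are assumptions based on the curl example above, and the response parsing follows the output schema shown earlier.

import requests

# Hypothetical client for the geolocation service.
# The multipart field names ("files", "metadata") are assumptions;
# check the service's API documentation for the exact names.
url = "http://localhost:8000/g3/predict"

with open("video.mp4", "rb") as media, open("metadata.json", "rb") as meta:
    response = requests.post(
        url,
        files={"files": media, "metadata": meta},
        timeout=600,
    )
response.raise_for_status()

result = response.json()
prediction = result["prediction"]  # keys follow the output example above
print(prediction["latitude"], prediction["longitude"], prediction["location"])
for item in prediction.get("evidence", []):
    print("-", item["analysis"])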

⏰ Timestamp Detection Service

Purpose: Determines when multimedia content was captured or recorded.

Technology Stack:

  • Google Search API integration via SerpAPI
  • Reverse image search capabilities
  • Text similarity matching with SequenceMatcher
  • OpenCV for keyframe extraction

API Endpoint: POST /analyze/

Input:

  • Media files (images/videos)
  • Metadata JSON with title, description, location

Output:

{
  "results": [
    {
      "timestamp": "2024-01-15T10:30:00Z",
      "source": "https://news.example.com/article/123",
      "confidence": 0.85,
      "keyframe_file": "extracted_frame_001.jpg"
    }
  ]
}

Configuration:

  • Set SERPAPI_API_KEY for Google Search access
  • Configure MAX_SEARCH_RESULTS for result limits
  • Adjust SIMILARITY_THRESHOLD for matching sensitivity
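
The text-matching step can be illustrated with Python's standard difflib.SequenceMatcher, which the service lists in its technology stack. The snippet below is a simplified sketch of comparing a metadata title against candidate search-result titles with a configurable threshold (mirroring SIMILARITY_THRESHOLD); it is not the service's actual implementation.

from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.75  # mirrors the .env setting above

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

query_title = "Breaking News: City Center Incident"
candidates = [
    ("Breaking news: city center incident footage", "2024-01-15T10:30:00Z"),
    ("Cooking show highlights", "2023-11-02T08:00:00Z"),
]

# Keep only candidates whose title is similar enough to the query
matches = [
    {"timestamp": ts, "confidence": round(similarity(query_title, title), 2)}
    for title, ts in candidates
    if similarity(query_title, title) >= SIMILARITY_THRESHOLD
]
print(matches)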

🤖 AI-Generated Visual Detection (AIGVDet)

Purpose: Detects artificially generated video content using spatial and temporal analysis.

Technology Stack:

  • Dual-pathway architecture (Spatial + Optical Flow)
  • ResNet50 backbone for feature extraction
  • RAFT algorithm for optical flow computation
  • Trained on Generated Video Dataset (GVD)

Usage:

python modules/AIGVDet/main.py \
  --input_path video.mp4 \
  --output_json results.json

Output:

{
  "video_001": {
    "video_name": "sample.mp4",
    "authentic_confidence_score": 0.7234,
    "synthetic_confidence_score": 0.2766
  }
}

Model Architecture:

  • Spatial Domain Detector: Analyzes RGB frames for visual artifacts
  • Optical Flow Detector: Examines temporal motion patterns
  • Fusion Layer: Combines spatial and temporal predictions (sketched below)
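
As a rough illustration of the late-fusion idea behind the dual-pathway design, the snippet below averages a spatial-pathway score and an optical-flow-pathway score into the authentic/synthetic confidences reported in the output JSON. The equal weighting and the function name are assumptions for illustration, not AIGVDet's actual fusion code.

# Toy late-fusion sketch: combine per-pathway "synthetic" probabilities.
# The 0.5/0.5 weighting is an assumption; the real fusion layer may differ.
def fuse_scores(spatial_synthetic: float, flow_synthetic: float,
                spatial_weight: float = 0.5) -> dict:
    synthetic = spatial_weight * spatial_synthetic + (1 - spatial_weight) * flow_synthetic
    return {
        "authentic_confidence_score": round(1.0 - synthetic, 4),
        "synthetic_confidence_score": round(synthetic, 4),
    }

# Example: spatial detector says 0.31 synthetic, optical-flow detector says 0.24
print(fuse_scores(0.31, 0.24))
# {'authentic_confidence_score': 0.725, 'synthetic_confidence_score': 0.275}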

Configuration:

  • Download model checkpoints: checkpoints/optical.pth, checkpoints/original.pth
  • Set CUDA_VISIBLE_DEVICES for GPU selection
  • Configure BATCH_SIZE for processing efficiency

📊 Report Generation Service

Purpose: Aggregates verification evidence and generates comprehensive reports.

Technology Stack:

  • FastAPI web framework
  • Large Language Models for content classification
  • Evidence synthesis and summarization
  • Multi-level reporting (expert + public)

API Endpoint: POST /v1/generate-report

Input:

{
  "metadata": {...},
  "media_files": ["file1.jpg", "file2.mp4"],
  "verification_results": {
    "geolocation": {...},
    "timestamp": {...}, 
    "ai_detection": {...}
  }
}

Output:

{
  "expert_report": {
    "overall_assessment": "authentic",
    "confidence_score": 0.82,
    "evidence_summary": {...},
    "technical_analysis": {...}
  },
  "public_summary": {
    "status": "Content appears authentic",
    "key_findings": ["Location verified", "Timeline consistent"],
    "confidence": "High"
  }
}
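
A minimal client sketch for this endpoint, assuming the service runs locally on port 8003 and accepts the JSON body shown above; the exact route prefix and any authentication requirements should be verified against the service code.

import requests

# Assumed local endpoint; adjust host/port for your deployment.
payload = {
    "metadata": {"title": "Breaking News: City Center Incident", "category": "news"},
    "media_files": ["file1.jpg", "file2.mp4"],
    "verification_results": {
        "geolocation": {"latitude": 40.7128, "longitude": -74.0060},
        "timestamp": {"timestamp": "2024-01-15T10:30:00Z", "confidence": 0.85},
        "ai_detection": {"authentic_confidence_score": 0.7234},
    },
}

response = requests.post("http://localhost:8003/v1/generate-report", json=payload, timeout=300)
response.raise_for_status()

report = response.json()
print(report["expert_report"]["overall_assessment"])
print(report["public_summary"]["status"])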

📦 Git LFS (Large File Storage)

This project uses Git LFS to manage large files efficiently, including AI model checkpoints, datasets, and media files.

📋 Files Tracked by Git LFS

The following file types are automatically tracked by Git LFS (configured in .gitattributes):

AI/ML Models:

  • *.pth, *.pt - PyTorch model files
  • *.h5, *.hdf5 - Keras/HDF5 model files
  • *.pkl, *.pickle - Pickled model files
  • *.bin, *.safetensors - Binary model weights
  • *.onnx - ONNX model files

Media Files:

  • *.mp4, *.avi, *.mov - Video files
  • *.jpg, *.jpeg, *.png - Image files
  • *.wav, *.mp3, *.flac - Audio files

Datasets:

  • *.csv, *.tsv, *.parquet - Large dataset files
  • Large *.json files in data/ and datasets/ directories

Archives:

  • *.zip, *.tar.gz, *.7z - Compressed archives

Documentation:

  • *.pdf - Research papers and large documents

🛠️ Git LFS Commands

Basic Operations:

# Check LFS status
git lfs status

# List all LFS tracked files
git lfs ls-files

# Show LFS file information
git lfs ls-files --size

# Pull all LFS files
git lfs pull

# Push LFS files
git lfs push origin main

Working with Large Files:

# Add a new large file (automatically tracked if extension matches .gitattributes)
git add large_model.pth
git commit -m "Add new model checkpoint"

# Track additional file types
git lfs track "*.newtype"
git add .gitattributes
git commit -m "Track new file type with LFS"

# Check which files will be uploaded to LFS
git lfs status

Troubleshooting:

# If LFS files appear as text pointers instead of actual files
git lfs fetch --all
git lfs checkout

# Remove old, unreferenced LFS objects from the local cache
git lfs prune

# Verify LFS installation
git lfs version
git lfs env

📊 Storage Information

Approximate LFS Storage Usage:

  • AIGVDet model checkpoints: ~500MB
  • G3 framework databases: ~200MB
  • Example datasets: ~100MB
  • Test media files: ~50MB
  • Total: ~850MB

Important Notes:

  • Git LFS has bandwidth limits on free accounts (1GB/month)
  • Consider using Git LFS for development and separate hosting for production models
  • Large files are only downloaded when explicitly requested (git lfs pull)

🔄 Contributing with LFS Files

When adding new large files:

  1. Verify file tracking:

    # Check if file type is tracked
    git check-attr filter large_file.pth
    
    # Should output: large_file.pth: filter: lfs
  2. Add and commit:

    git add large_file.pth
    git commit -m "Add new model checkpoint"
    
    # Verify file is staged for LFS
    git lfs status
  3. Push with LFS:

    git push origin your-branch
    # LFS files are automatically pushed with regular git push

Best Practices:

  • Keep LFS files organized in appropriate directories (checkpoints/, data/, models/)
  • Use descriptive commit messages for LFS file changes
  • Test that LFS files are properly downloaded after cloning
  • Consider using .lfsconfig for project-specific LFS settings

⚠️ Common Issues

Problem: "This repository is over its data quota"

# Solution: Clean up old LFS files
git lfs prune --recent
git lfs prune --verify-remote

Problem: LFS files show as text pointers

# Solution: Fetch and checkout LFS files
git lfs fetch --all
git lfs checkout --force

Problem: Cannot push large files

# Check Git LFS quota and usage
git lfs env
# Consider using alternative hosting for very large files

🔧 Configuration

Environment Variables

Create .env files in each service directory:

Geolocation Service (.env):

GOOGLE_API_KEY=your_gemini_api_key
GOOGLE_CLOUD_PROJECT=your_project_id  
MP16_DATABASE_PATH=./data/mp16_database
DEVICE=cuda  # or cpu

Timestamp Detection Service (.env):

SERPAPI_API_KEY=your_serpapi_key
MAX_SEARCH_RESULTS=10
SIMILARITY_THRESHOLD=0.75
ENABLE_REVERSE_IMAGE_SEARCH=true

Report Generation Service (.env):

LLM_API_KEY=your_llm_api_key
LLM_MODEL=gemini-2.5-flash-lite
MAX_REPORT_LENGTH=5000
ENABLE_PUBLIC_SUMMARIES=true

Service Ports

Service             Default Port   Health Check
Geolocation         8000           /health
Timestamp           8001           /health
AI Detection        8002           /health
Report Generation   8003           /health

Performance Tuning

For High-Volume Processing:

config = {
    'enable_parallel_processing': True,
    'max_retries': 5,
    'timeout': 600,  # 10 minutes
    'batch_size': 8
}

For Resource-Constrained Environments:

config = {
    'enable_parallel_processing': False,
    'max_retries': 2, 
    'timeout': 300,  # 5 minutes
    'batch_size': 1
}

📊 Research Background

This system implements the methodology described in our ACM Multimedia 2025 paper:

"Fact-Checking at Scale: Multimodal AI for Authenticity and Context Verification in Online Media"

Key Research Contributions

  1. Unified Verification Pipeline - Integration of visual forensics, textual analysis, and multimodal reasoning
  2. Hybrid OOC Detection - Novel approaches combining semantic similarity, temporal alignment, and geolocation cues
  3. Multi-Agent Evidence Aggregation - Sophisticated fusion of verification outputs
  4. Scalable Architecture - Microservices design for real-world deployment

Evaluation Results

Our system ranked second overall in the ACM Multimedia 2025 Grand Challenge:

Team     Total Score   Rank
Ours     644.65        2nd
Team 1   844.22        1st
Team 3   487.19        3rd
Team 4   295.86        4th

Out-of-Context Detection Results:

  • SearchOOC: 96.0% accuracy
  • HierOOC: 95.3% accuracy
  • Outperformed baseline methods by significant margins

Dataset

Main Task: Real-world multimedia cases from fact-checking archives

  • Multilingual content (images/videos + metadata)
  • Sensitive material handling
  • Expert verification ground truth

Sub-Task: COSMOS dataset for out-of-context detection

  • 161,752 training images
  • 41,006 validation images
  • 1,000 manually annotated test images

🔍 Advanced Usage

Custom Pipeline Configuration

from multimodal_fact_checker_pipeline import MultimodalFactCheckerPipeline

# Advanced configuration
config = {
    'geolocation_url': 'http://custom-g3-server:8000',
    'timestamp_url': 'http://custom-timestamp-server:8001',
    'ai_detection_url': 'http://custom-ai-server:8002', 
    'report_url': 'http://custom-report-server:8003',
    'timeout': 600,
    'max_retries': 5,
    'enable_parallel_processing': True,
    'custom_weights': {
        'geolocation': 0.3,
        'timestamp': 0.2, 
        'ai_detection': 0.3,
        'context_verification': 0.2
    }
}

pipeline = MultimodalFactCheckerPipeline(config)

Selective Service Execution

# Run only specific services
result = await pipeline.verify_multimedia(
    media_files=['video.mp4'],
    metadata=metadata,
    services=['geolocation', 'ai_detection']  # Skip timestamp and report
)

Batch Processing

import glob

# Process multiple files
media_batches = [
    glob.glob("batch1/*.mp4"),
    glob.glob("batch2/*.jpg"),  
    glob.glob("batch3/*.mov")
]

results = []
for batch in media_batches:
    result = await pipeline.verify_multimedia(batch, metadata)
    results.append(result)

Custom Report Templates

# Generate custom reports (the helper functions below are user-defined)
def custom_report_generator(result):
    return {
        'executive_summary': generate_executive_summary(result),
        'technical_details': extract_technical_details(result),
        'evidence_chain': build_evidence_chain(result),
        'recommendations': generate_recommendations(result)
    }

# Use custom generator
summary = custom_report_generator(result)

🛠️ Development

Project Structure

multimodal-fact-checker/
├── README.md
├── multimodal_fact_checker_pipeline.py        # Main integration pipeline
├── ACM_MM_2025.pdf                            # Research paper
└── modules/
    ├── ACMMM25-Grand-Challenge-Geolocation/   # G3 geolocation service
    │   ├── app.py                             # FastAPI application
    │   ├── src/
    │   │   ├── g3_batch_prediction.py         # Core G3 implementation
    │   │   ├── prompt/                        # Prompt engineering
    │   │   └── g3/                            # G3 framework modules
    │   └── requirements.txt
    ├── timestamp_detector/                    # Timestamp detection service
    │   ├── app/
    │   │   ├── main.py                        # FastAPI application
    │   │   ├── core.py                        # Core processing logic
    │   │   └── utils.py                       # Utility functions
    │   └── requirements.txt
    ├── AIGVDet/                               # AI-generated detection
    │   ├── main.py                            # Command-line interface
    │   ├── run.py                             # Core detection logic
    │   ├── checkpoints/                       # Model weights
    │   └── requirements.txt
    └── acmmm2025-report/                      # Report generation service
        ├── app/
        │   ├── main.py                        # FastAPI application
        │   ├── api/v1/                        # API routes
        │   └── services/                      # Business logic
        └── requirements.txt

Adding New Services

  1. Create service directory under modules/
  2. Implement FastAPI application with standard endpoints (a minimal skeleton is sketched after this list)
  3. Add health check endpoint at /health
  4. Update pipeline integration in multimodal_fact_checker_pipeline.py
  5. Add service configuration to environment variables
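
A new service can start from the same FastAPI pattern used by the existing modules. The skeleton below is only a sketch: the module name, port, and /analyze/ route are placeholders, and only the /health endpoint mirrors what the pipeline and docker-compose expect.

from fastapi import FastAPI, File, UploadFile

app = FastAPI(title="New Verification Service")

@app.get("/health")
async def health():
    # Standard health check polled by the pipeline and docker-compose
    return {"status": "ok"}

@app.post("/analyze/")  # placeholder route name
async def analyze(media: UploadFile = File(...)):
    # Replace with the new service's verification logic
    return {"filename": media.filename, "result": "not_implemented"}

# Run with: uvicorn app.main:app --host 0.0.0.0 --port 8004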

Testing

# Run unit tests for individual services
cd modules/ACMMM25-Grand-Challenge-Geolocation
python -m pytest tests/

# Integration testing
python -m pytest tests/integration/

# End-to-end pipeline testing  
python test_pipeline.py

Monitoring and Logging

The pipeline includes comprehensive logging:

import logging

# Configure logging level
logging.basicConfig(level=logging.DEBUG)

# Custom log formatting
formatter = logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

Health monitoring endpoints (see the polling sketch after this list):

  • /health - Basic service health
  • /metrics - Prometheus metrics
  • /status - Detailed status information
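
For a quick end-to-end check, the snippet below polls the /health endpoint of each service on its default port (a simple sketch, assuming the ports listed in the Service Ports table):

import requests

SERVICES = {
    "geolocation": 8000,
    "timestamp": 8001,
    "ai_detection": 8002,
    "report_generation": 8003,
}

for name, port in SERVICES.items():
    url = f"http://localhost:{port}/health"
    try:
        up = requests.get(url, timeout=5).status_code == 200
    except requests.RequestException:
        up = False
    print(f"{name:>18}: {'UP' if up else 'DOWN'} ({url})")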

🤝 Contributing

We welcome contributions to improve the multimodal fact-checking system!

How to Contribute

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add comprehensive docstrings
  • Include unit tests for new features
  • Update documentation for API changes
  • Ensure backward compatibility

Areas for Contribution

  • New verification methods (audio analysis, blockchain verification)
  • Performance optimization (caching, parallel processing)
  • Dataset integration (new fact-checking datasets)
  • UI/Frontend development (web interface, mobile apps)
  • Security enhancements (input validation, rate limiting)

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🎯 Citation

If you use this system in your research, please cite our paper:

@inproceedings{phan2025factchecking,
  title={Fact-Checking at Scale: Multimodal AI for Authenticity and Context Verification in Online Media},
  author={Phan, Van-Hoang and Le-Duc, Tung-Duong and Pham, Long-Khanh and Le, Anh-Thu and Dinh-Nguyen, Quynh-Huong and Vo, Dang-Quan and Nguyen-Son, Hoang-Quoc and Tran, Anh-Duy and Vu, Dang and Dao, Minh-Son},
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
  pages={xxxx--yyyy},
  year={2025},
  organization={ACM}
}

🔗 Related Work

📞 Contact

For questions, issues, or collaboration opportunities:

πŸ† Acknowledgments

Special thanks to:

  • National Institute of Information and Communications Technology (NICT) for research support
  • FPT Software AI Center for development resources
  • University of Science, Ho Chi Minh City for academic collaboration
  • ACM Multimedia 2025 organizing committee for the Grand Challenge framework
  • Open source community for foundational libraries and tools

Built with ❤️ for fighting misinformation and promoting information integrity in the digital age.
