A comprehensive system for multimedia authenticity and context verification in online media, developed for the ACM Multimedia 2025 Grand Challenge on Multimedia Verification.
This system addresses the growing challenge of misinformation and disinformation in online multimedia content. It provides a unified verification pipeline that evaluates both the authenticity and contextual accuracy of multimedia content across multilingual settings, producing expert-oriented verification reports alongside accessible summaries for the public.
- Geolocation Verification - Advanced GPS coordinate prediction using the G3 framework
- Temporal Analysis - Capture/recording time detection through metadata and reverse image search
- AI-Generated Content Detection - Sophisticated detection using AIGVDet with spatial-temporal analysis
- Evidence Aggregation - Comprehensive report generation with multi-source verification
- Out-of-Context Detection - Hybrid OOC detection using SearchOOC and HierOOC methods
- Scalable Architecture - Microservices-based design with FastAPI
- Multi-format Support - Images, videos, and multilingual metadata processing
The system follows a multi-stage pipeline architecture that integrates four core verification services:
```
Input (JSON + Images/Videos)
            ↓
    [Data Preprocessing]
            ↓
┌─────────────────────────────────────┐
│     Core Verification Services      │
├─────────────────────────────────────┤
│  Where?    │   When?    │  AI Det   │
│  (G3)      │ (Timestamp)│ (AIGVDet) │
└─────────────────────────────────────┘
            ↓
[Evidence Aggregation & Report Generation]
            ↓
Output (Structured Verification Report)
```
1. Geolocation Service (`modules/ACMMM25-Grand-Challenge-Geolocation/`)
   - Implements the G3 framework for geographic location prediction
   - Combines visual features, textual descriptions, and GPS coordinates
   - Uses Google Gemini Pro 2.5 for contextual reasoning

2. Timestamp Detection Service (`modules/timestamp_detector/`)
   - Detects capture/recording time through metadata analysis
   - Performs reverse image search and textual similarity matching
   - Provides confidence scoring for temporal estimates

3. AI-Generated Visual Detection (`modules/AIGVDet/`)
   - Spatial Domain Detector for visual artifact analysis
   - Optical Flow Detector for temporal inconsistencies
   - Trained on the Generated Video Dataset (GVD) covering 11 generative models

4. Report Generation Service (`modules/acmmm2025-report/`)
   - Aggregates evidence from all verification services
   - Generates structured reports for experts and simplified summaries for the public
   - Implements content classification and intent analysis
- Python 3.8+
- Git LFS (Large File Storage) for model files and datasets
- CUDA-compatible GPU (recommended for AI detection)
- Docker and Docker Compose
- API keys for Google services (Vision, Gemini)
1. Install Git LFS (if not already installed)

   ```bash
   # macOS
   brew install git-lfs

   # Ubuntu/Debian
   sudo apt install git-lfs

   # Windows
   # Download from https://git-lfs.github.io/

   # Initialize Git LFS
   git lfs install
   ```

2. Clone the repository

   ```bash
   git clone <repository-url>
   cd multimodal-fact-checker

   # Fetch LFS files
   git lfs fetch
   git lfs checkout
   ```

3. Set up environment variables

   ```bash
   # Copy and configure environment files for each service
   cp modules/ACMMM25-Grand-Challenge-Geolocation/.env.example .env
   # Add your API keys and service configurations
   ```

4. Install dependencies for each service

   ```bash
   # Geolocation service
   cd modules/ACMMM25-Grand-Challenge-Geolocation
   pip install -r requirements.txt

   # Timestamp detection service
   cd ../timestamp_detector
   pip install -r requirements.txt

   # AI detection service
   cd ../AIGVDet
   pip install -r requirements.txt

   # Report generation service
   cd ../acmmm2025-report
   pip install -r requirements.txt
   ```

5. Verify Git LFS files are downloaded

   ```bash
   # Check that model checkpoints are present
   ls -la modules/AIGVDet/checkpoints/
   # Should show optical.pth and original.pth

   # If files are missing, manually pull LFS files
   git lfs pull

   # Check file sizes (LFS files should not be text pointers)
   file modules/AIGVDet/checkpoints/*.pth
   ```
```bash
# Start all services
docker-compose up -d

# Check service health
curl http://localhost:8000/health  # Geolocation
curl http://localhost:8001/health  # Timestamp
curl http://localhost:8002/health  # AI Detection
curl http://localhost:8003/health  # Report Generation
```

Start each service in separate terminals:
```bash
# Terminal 1: Geolocation Service (Port 8000)
cd modules/ACMMM25-Grand-Challenge-Geolocation
uvicorn app:app --host 0.0.0.0 --port 8000

# Terminal 2: Timestamp Detection Service (Port 8001)
cd modules/timestamp_detector
uvicorn app.main:app --host 0.0.0.0 --port 8001

# Terminal 3: AI Detection Service (Port 8002)
cd modules/AIGVDet
python main.py --port 8002

# Terminal 4: Report Generation Service (Port 8003)
cd modules/acmmm2025-report
uvicorn app.main:app --host 0.0.0.0 --port 8003
```

```python
import asyncio
from multimodal_fact_checker_pipeline import MultimodalFactCheckerPipeline
# Initialize pipeline
pipeline = MultimodalFactCheckerPipeline()
# Define metadata
metadata = {
"title": "Breaking News: City Center Incident",
"description": "Video shows emergency response downtown",
"location": "Downtown Area, Major City",
"category": "news",
"violence_level": "low"
}
# Define media files
media_files = ["video.mp4", "image.jpg"]
# Run verification (verify_multimedia is async, so drive it with asyncio)
result = asyncio.run(pipeline.verify_multimedia(media_files, metadata))
# Generate summary report
summary = pipeline.generate_summary_report(result)
print(summary)
```

Geolocation Service:

```bash
curl -X POST "http://localhost:8000/g3/predict" \
-F "[email protected]" \
-F "[email protected]"Timestamp Detection:
curl -X POST "http://localhost:8001/analyze/" \
-F "[email protected]" \
-F "[email protected]"AI Detection:
python modules/AIGVDet/main.py \
--input_path video.mp4 \
--output_json results.json
```

Purpose: Predicts geographic locations of images and videos using advanced multimodal learning.
Technology Stack:
- G3 Framework with Geo-alignment and Geo-diversification
- Google Gemini Pro 2.5 for contextual reasoning
- MP16-Pro dataset for training
- CLIP encoders for visual-textual feature extraction
API Endpoint: POST /g3/predict
Input:
- Images/videos (JPEG, PNG, MP4)
- Metadata JSON file
Output:
```json
{
"prediction": {
"latitude": 40.7128,
"longitude": -74.0060,
"location": "New York City, NY, USA",
"evidence": [
{
"analysis": "Landmark identification suggests Times Square area",
"references": ["base64_image_data", "https://source.url"]
}
]
},
"transcript": "Audio transcript if available"
}
```

Configuration:
- Set `GOOGLE_API_KEY` for Gemini Pro access
- Configure `MP16_DATABASE_PATH` for the image database
- Adjust the `device` setting for CPU/GPU usage
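As a convenience, the endpoint can be called over HTTP from Python. The sketch below is illustrative only: the multipart field names (`files`, `metadata`) and filenames are assumptions, not taken from the service code.

```python
import requests

# Hypothetical call to the geolocation service; multipart field names are assumptions.
with open("image.jpg", "rb") as image_file, open("metadata.json", "rb") as metadata_file:
    response = requests.post(
        "http://localhost:8000/g3/predict",
        files={"files": image_file, "metadata": metadata_file},
        timeout=600,  # LLM-backed reasoning can take a while
    )
response.raise_for_status()

# Fields follow the output format documented above
prediction = response.json()["prediction"]
print(prediction["latitude"], prediction["longitude"], prediction["location"])
```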
Purpose: Determines when multimedia content was captured or recorded.
Technology Stack:
- Google Search API integration via SerpAPI
- Reverse image search capabilities
- Text similarity matching with SequenceMatcher
- OpenCV for keyframe extraction
API Endpoint: POST /analyze/
Input:
- Media files (images/videos)
- Metadata JSON with title, description, location
Output:
```json
{
"results": [
{
"timestamp": "2024-01-15T10:30:00Z",
"source": "https://news.example.com/article/123",
"confidence": 0.85,
"keyframe_file": "extracted_frame_001.jpg"
}
]
}
```

Configuration:
- Set `SERPAPI_API_KEY` for Google Search access
- Configure `MAX_SEARCH_RESULTS` for result limits
- Adjust `SIMILARITY_THRESHOLD` for matching sensitivity
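A minimal client sketch for this endpoint is shown below; as with the geolocation example, the multipart field names are assumptions rather than the service's confirmed API.

```python
import requests

# Hypothetical call to the timestamp detection service; field names are assumptions.
with open("video.mp4", "rb") as media_file, open("metadata.json", "rb") as metadata_file:
    response = requests.post(
        "http://localhost:8001/analyze/",
        files={"file": media_file, "metadata": metadata_file},
        timeout=300,
    )
response.raise_for_status()

# Pick the highest-confidence estimate from the documented result list
results = response.json()["results"]
best = max(results, key=lambda r: r["confidence"])
print(best["timestamp"], best["source"], best["confidence"])
```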
Purpose: Detects artificially generated video content using spatial and temporal analysis.
Technology Stack:
- Dual-pathway architecture (Spatial + Optical Flow)
- ResNet50 backbone for feature extraction
- RAFT algorithm for optical flow computation
- Trained on Generated Video Dataset (GVD)
Usage:
```bash
python modules/AIGVDet/main.py \
--input_path video.mp4 \
--output_json results.json
```

Output:

```json
{
"video_001": {
"video_name": "sample.mp4",
"authentic_confidence_score": 0.7234,
"synthetic_confidence_score": 0.2766
}
}
```

Model Architecture:
- Spatial Domain Detector: Analyzes RGB frames for visual artifacts
- Optical Flow Detector: Examines temporal motion patterns
- Fusion Layer: Combines spatial and temporal predictions
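To make the dual-pathway design concrete, here is a heavily simplified late-fusion sketch in PyTorch (torchvision ≥ 0.13). It illustrates the general idea only and is not the AIGVDet implementation: the class name, the plain ResNet50 backbones, and the probability-averaging fusion are all assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class DualPathwayDetector(nn.Module):
    """Illustrative spatial + optical-flow detector with late fusion (not AIGVDet's code)."""

    def __init__(self):
        super().__init__()
        self.spatial = resnet50(weights=None)   # RGB-frame pathway
        self.flow = resnet50(weights=None)      # optical-flow pathway (flow rendered as 3-channel maps)
        self.spatial.fc = nn.Linear(self.spatial.fc.in_features, 2)
        self.flow.fc = nn.Linear(self.flow.fc.in_features, 2)

    def forward(self, rgb_frames, flow_maps):
        # Late fusion: average the per-pathway class probabilities
        p_spatial = torch.softmax(self.spatial(rgb_frames), dim=1)
        p_flow = torch.softmax(self.flow(flow_maps), dim=1)
        return (p_spatial + p_flow) / 2  # [authentic, synthetic] scores

model = DualPathwayDetector().eval()
with torch.no_grad():
    scores = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
print(scores)
```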
Configuration:
- Download model checkpoints: `checkpoints/optical.pth`, `checkpoints/original.pth`
- Set `CUDA_VISIBLE_DEVICES` for GPU selection
- Configure `BATCH_SIZE` for processing efficiency
Purpose: Aggregates verification evidence and generates comprehensive reports.
Technology Stack:
- FastAPI web framework
- Large Language Models for content classification
- Evidence synthesis and summarization
- Multi-level reporting (expert + public)
API Endpoint: POST /v1/generate-report
Input:
```json
{
"metadata": {...},
"media_files": ["file1.jpg", "file2.mp4"],
"verification_results": {
"geolocation": {...},
"timestamp": {...},
"ai_detection": {...}
}
}
```

Output:

```json
{
"expert_report": {
"overall_assessment": "authentic",
"confidence_score": 0.82,
"evidence_summary": {...},
"technical_analysis": {...}
},
"public_summary": {
"status": "Content appears authentic",
"key_findings": ["Location verified", "Timeline consistent"],
"confidence": "High"
}
}
```
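As an illustrative sketch (not the service's confirmed client code), a report request could be issued as follows, reusing the documented input and output structure; the nested values are placeholders:

```python
import requests

# Hypothetical request to the report generation service; payload keys mirror the
# documented input format, and the nested values are placeholder examples.
payload = {
    "metadata": {"title": "Breaking News: City Center Incident", "category": "news"},
    "media_files": ["file1.jpg", "file2.mp4"],
    "verification_results": {
        "geolocation": {"latitude": 40.7128, "longitude": -74.0060},
        "timestamp": {"timestamp": "2024-01-15T10:30:00Z", "confidence": 0.85},
        "ai_detection": {"synthetic_confidence_score": 0.2766},
    },
}

response = requests.post(
    "http://localhost:8003/v1/generate-report",
    json=payload,
    timeout=300,
)
response.raise_for_status()

report = response.json()
print(report["public_summary"]["status"])
print(report["expert_report"]["confidence_score"])
```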
This project uses Git LFS to manage large files efficiently, including AI model checkpoints, datasets, and media files.

The following file types are automatically tracked by Git LFS (configured in `.gitattributes`):

AI/ML Models:
- `*.pth`, `*.pt` - PyTorch model files
- `*.h5`, `*.hdf5` - Keras/HDF5 model files
- `*.pkl`, `*.pickle` - Pickled model files
- `*.bin`, `*.safetensors` - Binary model weights
- `*.onnx` - ONNX model files

Media Files:
- `*.mp4`, `*.avi`, `*.mov` - Video files
- `*.jpg`, `*.jpeg`, `*.png` - Image files
- `*.wav`, `*.mp3`, `*.flac` - Audio files

Datasets:
- `*.csv`, `*.tsv`, `*.parquet` - Large dataset files
- Large `*.json` files in `data/` and `datasets/` directories

Archives:
- `*.zip`, `*.tar.gz`, `*.7z` - Compressed archives

Documentation:
- `*.pdf` - Research papers and large documents
Basic Operations:
```bash
# Check LFS status
git lfs status
# List all LFS tracked files
git lfs ls-files
# Show LFS file information
git lfs ls-files --size
# Pull all LFS files
git lfs pull
# Push LFS files
git lfs push origin main
```

Working with Large Files:

```bash
# Add a new large file (automatically tracked if extension matches .gitattributes)
git add large_model.pth
git commit -m "Add new model checkpoint"
# Track additional file types
git lfs track "*.newtype"
git add .gitattributes
git commit -m "Track new file type with LFS"
# Check which files will be uploaded to LFS
git lfs status
```

Troubleshooting:

```bash
# If LFS files appear as text pointers instead of actual files
git lfs fetch --all
git lfs checkout
# Reset LFS cache
git lfs prune
# Verify LFS installation
git lfs version
git lfs env
```

Approximate LFS Storage Usage:
- AIGVDet model checkpoints: ~500MB
- G3 framework databases: ~200MB
- Example datasets: ~100MB
- Test media files: ~50MB
- Total: ~850MB
Important Notes:
- Git LFS has bandwidth limits on free accounts (1GB/month)
- Consider using Git LFS for development and separate hosting for production models
- Large files are only downloaded when explicitly requested (`git lfs pull`)
When adding new large files:
1. Verify file tracking:

   ```bash
   # Check if file type is tracked
   git check-attr filter large_file.pth
   # Should output: large_file.pth: filter: lfs
   ```

2. Add and commit:

   ```bash
   git add large_file.pth
   git commit -m "Add new model checkpoint"

   # Verify file is staged for LFS
   git lfs status
   ```

3. Push with LFS:

   ```bash
   git push origin your-branch
   # LFS files are automatically pushed with regular git push
   ```
Best Practices:
- Keep LFS files organized in appropriate directories (`checkpoints/`, `data/`, `models/`)
- Use descriptive commit messages for LFS file changes
- Test that LFS files are properly downloaded after cloning
- Consider using `.lfsconfig` for project-specific LFS settings
Problem: "This repository is over its data quota"
```bash
# Solution: Clean up old LFS files
git lfs prune --recent
git lfs prune --verify-remote
```

Problem: LFS files show as text pointers

```bash
# Solution: Fetch and checkout LFS files
git lfs fetch --all
git lfs checkout --force
```

Problem: Cannot push large files

```bash
# Check Git LFS quota and usage
git lfs env
# Consider using alternative hosting for very large files
```

Create `.env` files in each service directory:
Geolocation Service (`.env`):
```
GOOGLE_API_KEY=your_gemini_api_key
GOOGLE_CLOUD_PROJECT=your_project_id
MP16_DATABASE_PATH=./data/mp16_database
DEVICE=cuda  # or cpu
```

Timestamp Detection Service (`.env`):
```
SERPAPI_API_KEY=your_serpapi_key
MAX_SEARCH_RESULTS=10
SIMILARITY_THRESHOLD=0.75
ENABLE_REVERSE_IMAGE_SEARCH=true
```

Report Generation Service (`.env`):
```
LLM_API_KEY=your_llm_api_key
LLM_MODEL=gemini-2.5-flash-lite
MAX_REPORT_LENGTH=5000
ENABLE_PUBLIC_SUMMARIES=true
```

| Service | Default Port | Health Check |
|---|---|---|
| Geolocation | 8000 | /health |
| Timestamp | 8001 | /health |
| AI Detection | 8002 | /health |
| Report Generation | 8003 | /health |
For High-Volume Processing:
```python
config = {
'enable_parallel_processing': True,
'max_retries': 5,
'timeout': 600, # 10 minutes
'batch_size': 8
}
```

For Resource-Constrained Environments:

```python
config = {
'enable_parallel_processing': False,
'max_retries': 2,
'timeout': 300, # 5 minutes
'batch_size': 1
}
```

This system implements the methodology described in our ACM Multimedia 2025 paper:
"Fact-Checking at Scale: Multimodal AI for Authenticity and Context Verification in Online Media"
- Unified Verification Pipeline - Integration of visual forensics, textual analysis, and multimodal reasoning
- Hybrid OOC Detection - Novel approaches combining semantic similarity, temporal alignment, and geolocation cues
- Multi-Agent Evidence Aggregation - Sophisticated fusion of verification outputs
- Scalable Architecture - Microservices design for real-world deployment
Our system achieved competitive performance in the ACM Multimedia 2025 Grand Challenge, ranking second overall:
| Team | Total Score | Rank |
|---|---|---|
| Ours | 644.65 | 2nd |
| Team 1 | 844.22 | 1st |
| Team 3 | 487.19 | 3rd |
| Team 4 | 295.86 | 4th |
Out-of-Context Detection Results:
- SearchOOC: 96.0% accuracy
- HierOOC: 95.3% accuracy
- Outperformed baseline methods by significant margins
Main Task: Real-world multimedia cases from fact-checking archives
- Multilingual content (images/videos + metadata)
- Sensitive material handling
- Expert verification ground truth
Sub-Task: COSMOS dataset for out-of-context detection
- 161,752 training images
- 41,006 validation images
- 1,000 manually annotated test images
```python
from multimodal_fact_checker_pipeline import MultimodalFactCheckerPipeline
# Advanced configuration
config = {
'geolocation_url': 'http://custom-g3-server:8000',
'timestamp_url': 'http://custom-timestamp-server:8001',
'ai_detection_url': 'http://custom-ai-server:8002',
'report_url': 'http://custom-report-server:8003',
'timeout': 600,
'max_retries': 5,
'enable_parallel_processing': True,
'custom_weights': {
'geolocation': 0.3,
'timestamp': 0.2,
'ai_detection': 0.3,
'context_verification': 0.2
}
}
pipeline = MultimodalFactCheckerPipeline(config)
```

```python
# Run only specific services
result = await pipeline.verify_multimedia(
media_files=['video.mp4'],
metadata=metadata,
services=['geolocation', 'ai_detection'] # Skip timestamp and report
)
```

```python
import glob
# Process multiple files
media_batches = [
glob.glob("batch1/*.mp4"),
glob.glob("batch2/*.jpg"),
glob.glob("batch3/*.mov")
]
results = []
for batch in media_batches:
    result = await pipeline.verify_multimedia(batch, metadata)
    results.append(result)
```

```python
# Generate custom reports
def custom_report_generator(result):
    return {
        'executive_summary': generate_executive_summary(result),
        'technical_details': extract_technical_details(result),
        'evidence_chain': build_evidence_chain(result),
        'recommendations': generate_recommendations(result)
    }

# Use custom generator
summary = custom_report_generator(result)
```

```
multimodal-fact-checker/
├── README.md
├── multimodal_fact_checker_pipeline.py     # Main integration pipeline
├── ACM_MM_2025.pdf                         # Research paper
└── modules/
    ├── ACMMM25-Grand-Challenge-Geolocation/   # G3 geolocation service
    │   ├── app.py                          # FastAPI application
    │   ├── src/
    │   │   ├── g3_batch_prediction.py      # Core G3 implementation
    │   │   ├── prompt/                     # Prompt engineering
    │   │   └── g3/                         # G3 framework modules
    │   └── requirements.txt
    ├── timestamp_detector/                 # Timestamp detection service
    │   ├── app/
    │   │   ├── main.py                     # FastAPI application
    │   │   ├── core.py                     # Core processing logic
    │   │   └── utils.py                    # Utility functions
    │   └── requirements.txt
    ├── AIGVDet/                            # AI-generated detection
    │   ├── main.py                         # Command-line interface
    │   ├── run.py                          # Core detection logic
    │   ├── checkpoints/                    # Model weights
    │   └── requirements.txt
    └── acmmm2025-report/                   # Report generation service
        ├── app/
        │   ├── main.py                     # FastAPI application
        │   ├── api/v1/                     # API routes
        │   └── services/                   # Business logic
        └── requirements.txt
```
- Create a service directory under `modules/`
- Implement a FastAPI application with standard endpoints (a minimal skeleton is sketched below)
- Add a health check endpoint at `/health`
- Update pipeline integration in `multimodal_fact_checker_pipeline.py`
- Add service configuration to environment variables
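For reference, here is a minimal sketch of what such a service could look like. It assumes only the conventions visible in this repository (FastAPI plus a `/health` endpoint); the module name, the extra endpoint, and the port are illustrative, not part of the existing codebase.

```python
# new_service.py - hypothetical skeleton for an additional verification service
from fastapi import FastAPI

app = FastAPI(title="New Verification Service")

@app.get("/health")
async def health():
    # Standard health check used by the pipeline and docker-compose
    return {"status": "ok"}

@app.post("/v1/verify")
async def verify(payload: dict):
    # Placeholder verification logic; a real service would analyze media here
    return {"service": "new-verifier", "received_keys": list(payload.keys())}
```

Run it on an unused port, e.g. `uvicorn new_service:app --host 0.0.0.0 --port 8004`, and register that URL in the pipeline configuration.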
```bash
# Run unit tests for individual services
cd modules/ACMMM25-Grand-Challenge-Geolocation
python -m pytest tests/
# Integration testing
python -m pytest tests/integration/
# End-to-end pipeline testing
python test_pipeline.py
```

The pipeline includes comprehensive logging:

```python
import logging
# Configure logging level
logging.basicConfig(level=logging.DEBUG)
# Custom log formatting
formatter = logging.Formatter(
'%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
```

Health monitoring endpoints:
- `/health` - Basic service health
- `/metrics` - Prometheus metrics
- `/status` - Detailed status information
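A small polling script can exercise these endpoints across the services; the sketch below assumes the default ports from the configuration table and uses only the `/health` route.

```python
import requests

# Poll the /health endpoint of each service (default ports from the table above)
SERVICES = {
    "geolocation": 8000,
    "timestamp": 8001,
    "ai_detection": 8002,
    "report_generation": 8003,
}

for name, port in SERVICES.items():
    try:
        response = requests.get(f"http://localhost:{port}/health", timeout=5)
        print(f"{name}: HTTP {response.status_code}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")
```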
We welcome contributions to improve the multimodal fact-checking system!
- Fork the repository
- Create feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open Pull Request
- Follow PEP 8 style guidelines
- Add comprehensive docstrings
- Include unit tests for new features
- Update documentation for API changes
- Ensure backward compatibility
- New verification methods (audio analysis, blockchain verification)
- Performance optimization (caching, parallel processing)
- Dataset integration (new fact-checking datasets)
- UI/Frontend development (web interface, mobile apps)
- Security enhancements (input validation, rate limiting)
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this system in your research, please cite our paper:
```bibtex
@inproceedings{phan2025factchecking,
title={Fact-Checking at Scale: Multimodal AI for Authenticity and Context Verification in Online Media},
author={Phan, Van-Hoang and Le-Duc, Tung-Duong and Pham, Long-Khanh and Le, Anh-Thu and Dinh-Nguyen, Quynh-Huong and Vo, Dang-Quan and Nguyen-Son, Hoang-Quoc and Tran, Anh-Duy and Vu, Dang and Dao, Minh-Son},
booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
pages={xxxx--yyyy},
year={2025},
organization={ACM}
}
```

- G3 Framework: Geolocalization using Large Multi-modality Models
- AIGVDet: AI-Generated Video Detection via Spatial-Temporal Anomaly Learning
- COSMOS Dataset: Out-of-Context Misinformation Detection
- ACM MM 2025 Grand Challenge: Multimedia Verification Challenge
For questions, issues, or collaboration opportunities:
- Project Lead: Dang Vu ([email protected])
- Research Contact: Minh-Son Dao ([email protected])
- Technical Support: GitHub Issues
Special thanks to:
- National Institute of Information and Communications Technology (NICT) for research support
- FPT Software AI Center for development resources
- University of Science, Ho Chi Minh City for academic collaboration
- ACM Multimedia 2025 organizing committee for the Grand Challenge framework
- Open source community for foundational libraries and tools
Built with ❤️ for fighting misinformation and promoting information integrity in the digital age.