A comprehensive system for multimedia authenticity and context verification in online media, developed for the ACM Multimedia 2025 Grand Challenge on Multimedia Verification.
This system addresses the growing challenge of misinformation and disinformation in online multimedia content. It provides a unified verification pipeline that evaluates both the authenticity and contextual accuracy of multimedia content across multilingual settings, producing expert-oriented verification reports alongside accessible summaries for the public.
- Geolocation Verification - Advanced GPS coordinate prediction using the G3 framework
- Temporal Analysis - Capture/recording time detection through metadata and reverse image search
- AI-Generated Content Detection - Sophisticated detection using AIGVDet with spatial-temporal analysis
- Evidence Aggregation - Comprehensive report generation with multi-source verification
- Out-of-Context Detection - Hybrid OOC detection using SearchOOC and HierOOC methods
- Scalable Architecture - Microservices-based design with FastAPI
- Multi-format Support - Images, videos, and multilingual metadata processing
The system follows a multi-stage pipeline architecture that integrates four core verification services:
```
Input (JSON + Images/Videos)
            ↓
    [Data Preprocessing]
            ↓
┌─────────────────────────────────────┐
│     Core Verification Services      │
├─────────────────────────────────────┤
│  Where?    │   When?    │  AI Det   │
│  (G3)      │ (Timestamp)│ (AIGVDet) │
└─────────────────────────────────────┘
            ↓
[Evidence Aggregation & Report Generation]
            ↓
Output (Structured Verification Report)
```
1. Geolocation Service (`modules/ACMMM25-Grand-Challenge-Geolocation/`)
   - Implements the G3 framework for geographic location prediction
   - Combines visual features, textual descriptions, and GPS coordinates
   - Uses Google Gemini Pro 2.5 for contextual reasoning

2. Timestamp Detection Service (`modules/timestamp_detector/`)
   - Detects capture/recording time through metadata analysis
   - Performs reverse image search and textual similarity matching
   - Provides confidence scoring for temporal estimates

3. AI-Generated Visual Detection (`modules/AIGVDet/`)
   - Spatial Domain Detector for visual artifact analysis
   - Optical Flow Detector for temporal inconsistencies
   - Trained on the Generated Video Dataset (GVD) covering 11 generative models

4. Report Generation Service (`modules/acmmm2025-report/`)
   - Aggregates evidence from all verification services
   - Generates structured reports for experts and simplified summaries for the public
   - Implements content classification and intent analysis
- Python 3.8+
- Git LFS (Large File Storage) for model files and datasets
- CUDA-compatible GPU (recommended for AI detection)
- Docker and Docker Compose
- API keys for Google services (Vision, Gemini)
1. Install Git LFS (if not already installed)

   ```bash
   # macOS
   brew install git-lfs

   # Ubuntu/Debian
   sudo apt install git-lfs

   # Windows
   # Download from https://git-lfs.github.io/

   # Initialize Git LFS
   git lfs install
   ```

2. Clone the repository

   ```bash
   git clone <repository-url>
   cd multimodal-fact-checker

   # Fetch LFS files
   git lfs fetch
   git lfs checkout
   ```

3. Set up environment variables

   ```bash
   # Copy and configure environment files for each service
   cp modules/ACMMM25-Grand-Challenge-Geolocation/.env.example .env
   # Add your API keys and service configurations
   ```

4. Install dependencies for each service

   ```bash
   # Geolocation service
   cd modules/ACMMM25-Grand-Challenge-Geolocation
   pip install -r requirements.txt

   # Timestamp detection service
   cd ../timestamp_detector
   pip install -r requirements.txt

   # AI detection service
   cd ../AIGVDet
   pip install -r requirements.txt

   # Report generation service
   cd ../acmmm2025-report
   pip install -r requirements.txt
   ```

5. Verify Git LFS files are downloaded

   ```bash
   # Check that model checkpoints are present
   ls -la modules/AIGVDet/checkpoints/
   # Should show optical.pth and original.pth

   # If files are missing, manually pull LFS files
   git lfs pull

   # Check file sizes (LFS files should not be text pointers)
   file modules/AIGVDet/checkpoints/*.pth
   ```
```bash
# Start all services
docker-compose up -d

# Check service health
curl http://localhost:8000/health  # Geolocation
curl http://localhost:8001/health  # Timestamp
curl http://localhost:8002/health  # AI Detection
curl http://localhost:8003/health  # Report Generation
```

Start each service in separate terminals:
```bash
# Terminal 1: Geolocation Service (Port 8000)
cd modules/ACMMM25-Grand-Challenge-Geolocation
uvicorn app:app --host 0.0.0.0 --port 8000

# Terminal 2: Timestamp Detection Service (Port 8001)
cd modules/timestamp_detector
uvicorn app.main:app --host 0.0.0.0 --port 8001

# Terminal 3: AI Detection Service (Port 8002)
cd modules/AIGVDet
python main.py --port 8002

# Terminal 4: Report Generation Service (Port 8003)
cd modules/acmmm2025-report
uvicorn app.main:app --host 0.0.0.0 --port 8003
```

```python
import asyncio
from multimodal_fact_checker_pipeline import MultimodalFactCheckerPipeline
# Initialize pipeline
pipeline = MultimodalFactCheckerPipeline()
# Define metadata
metadata = {
"title": "Breaking News: City Center Incident",
"description": "Video shows emergency response downtown",
"location": "Downtown Area, Major City",
"category": "news",
"violence_level": "low"
}
# Define media files
media_files = ["video.mp4", "image.jpg"]
# Run verification (verify_multimedia is async, so drive it with asyncio)
result = asyncio.run(pipeline.verify_multimedia(media_files, metadata))
# Generate summary report
summary = pipeline.generate_summary_report(result)
print(summary)
```

Geolocation Service:

```bash
curl -X POST "http://localhost:8000/g3/predict" \
-F "[email protected]" \
-F "[email protected]"Timestamp Detection:
curl -X POST "http://localhost:8001/analyze/" \
-F "[email protected]" \
-F "[email protected]"AI Detection:
python modules/AIGVDet/main.py \
--input_path video.mp4 \
--output_json results.json
```

Purpose: Predicts geographic locations of images and videos using advanced multimodal learning.
Technology Stack:
- G3 Framework with Geo-alignment and Geo-diversification
- Google Gemini Pro 2.5 for contextual reasoning
- MP16-Pro dataset for training
- CLIP encoders for visual-textual feature extraction
API Endpoint: POST /g3/predict
Input:
- Images/videos (JPEG, PNG, MP4)
- Metadata JSON file
Output:
```json
{
"prediction": {
"latitude": 40.7128,
"longitude": -74.0060,
"location": "New York City, NY, USA",
"evidence": [
{
"analysis": "Landmark identification suggests Times Square area",
"references": ["base64_image_data", "https://source.url"]
}
]
},
"transcript": "Audio transcript if available"
}
```

Configuration:
- Set `GOOGLE_API_KEY` for Gemini Pro access
- Configure `MP16_DATABASE_PATH` for the image database
- Adjust the `device` setting for CPU/GPU usage
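As a convenience, the endpoint can be called over HTTP from Python. The sketch below is illustrative only: the multipart field names (`files`, `metadata`) and filenames are assumptions, not taken from the service code.

```python
import requests

# Hypothetical call to the geolocation service; multipart field names are assumptions.
with open("image.jpg", "rb") as image_file, open("metadata.json", "rb") as metadata_file:
    response = requests.post(
        "http://localhost:8000/g3/predict",
        files={"files": image_file, "metadata": metadata_file},
        timeout=600,  # LLM-backed reasoning can take a while
    )
response.raise_for_status()

# Fields follow the output format documented above
prediction = response.json()["prediction"]
print(prediction["latitude"], prediction["longitude"], prediction["location"])
```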
Purpose: Determines when multimedia content was captured or recorded.
Technology Stack:
- Google Search API integration via SerpAPI
- Reverse image search capabilities
- Text similarity matching with SequenceMatcher
- OpenCV for keyframe extraction
API Endpoint: POST /analyze/
Input:
- Media files (images/videos)
- Metadata JSON with title, description, location
Output:
```json
{
"results": [
{
"timestamp": "2024-01-15T10:30:00Z",
"source": "https://news.example.com/article/123",
"confidence": 0.85,
"keyframe_file": "extracted_frame_001.jpg"
}
]
}
```

Configuration:
- Set `SERPAPI_API_KEY` for Google Search access
- Configure `MAX_SEARCH_RESULTS` for result limits
- Adjust `SIMILARITY_THRESHOLD` for matching sensitivity
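A minimal client sketch for this endpoint is shown below; as with the geolocation example, the multipart field names are assumptions rather than the service's confirmed API.

```python
import requests

# Hypothetical call to the timestamp detection service; field names are assumptions.
with open("video.mp4", "rb") as media_file, open("metadata.json", "rb") as metadata_file:
    response = requests.post(
        "http://localhost:8001/analyze/",
        files={"file": media_file, "metadata": metadata_file},
        timeout=300,
    )
response.raise_for_status()

# Pick the highest-confidence estimate from the documented result list
results = response.json()["results"]
best = max(results, key=lambda r: r["confidence"])
print(best["timestamp"], best["source"], best["confidence"])
```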
Purpose: Detects artificially generated video content using spatial and temporal analysis.
Technology Stack:
- Dual-pathway architecture (Spatial + Optical Flow)
- ResNet50 backbone for feature extraction
- RAFT algorithm for optical flow computation
- Trained on Generated Video Dataset (GVD)
Usage:
```bash
python modules/AIGVDet/main.py \
--input_path video.mp4 \
--output_json results.json
```

Output:

```json
{
"video_001": {
"video_name": "sample.mp4",
"authentic_confidence_score": 0.7234,
"synthetic_confidence_score": 0.2766
}
}
```

Model Architecture:
- Spatial Domain Detector: Analyzes RGB frames for visual artifacts
- Optical Flow Detector: Examines temporal motion patterns
- Fusion Layer: Combines spatial and temporal predictions
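To make the dual-pathway design concrete, here is a heavily simplified late-fusion sketch in PyTorch (torchvision ≥ 0.13). It illustrates the general idea only and is not the AIGVDet implementation: the class name, the plain ResNet50 backbones, and the probability-averaging fusion are all assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class DualPathwayDetector(nn.Module):
    """Illustrative spatial + optical-flow detector with late fusion (not AIGVDet's code)."""

    def __init__(self):
        super().__init__()
        self.spatial = resnet50(weights=None)   # RGB-frame pathway
        self.flow = resnet50(weights=None)      # optical-flow pathway (flow rendered as 3-channel maps)
        self.spatial.fc = nn.Linear(self.spatial.fc.in_features, 2)
        self.flow.fc = nn.Linear(self.flow.fc.in_features, 2)

    def forward(self, rgb_frames, flow_maps):
        # Late fusion: average the per-pathway class probabilities
        p_spatial = torch.softmax(self.spatial(rgb_frames), dim=1)
        p_flow = torch.softmax(self.flow(flow_maps), dim=1)
        return (p_spatial + p_flow) / 2  # [authentic, synthetic] scores

model = DualPathwayDetector().eval()
with torch.no_grad():
    scores = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
print(scores)
```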
Configuration:
- Download model checkpoints: `checkpoints/optical.pth`, `checkpoints/original.pth`
- Set `CUDA_VISIBLE_DEVICES` for GPU selection
- Configure `BATCH_SIZE` for processing efficiency
Purpose: Aggregates verification evidence and generates comprehensive reports.
Technology Stack:
- FastAPI web framework
- Large Language Models for content classification
- Evidence synthesis and summarization
- Multi-level reporting (expert + public)
API Endpoint: POST /v1/generate-report
Input:
```json
{
"metadata": {...},
"media_files": ["file1.jpg", "file2.mp4"],
"verification_results": {
"geolocation": {...},
"timestamp": {...},
"ai_detection": {...}
}
}
```

Output:

```json
{
"expert_report": {
"overall_assessment": "authentic",
"confidence_score": 0.82,
"evidence_summary": {...},
"technical_analysis": {...}
},
"public_summary": {
"status": "Content appears authentic",
"key_findings": ["Location verified", "Timeline consistent"],
"confidence": "High"
}
}
```
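As an illustrative sketch (not the service's confirmed client code), a report request could be issued as follows, reusing the documented input and output structure; the nested values are placeholders:

```python
import requests

# Hypothetical request to the report generation service; payload keys mirror the
# documented input format, and the nested values are placeholder examples.
payload = {
    "metadata": {"title": "Breaking News: City Center Incident", "category": "news"},
    "media_files": ["file1.jpg", "file2.mp4"],
    "verification_results": {
        "geolocation": {"latitude": 40.7128, "longitude": -74.0060},
        "timestamp": {"timestamp": "2024-01-15T10:30:00Z", "confidence": 0.85},
        "ai_detection": {"synthetic_confidence_score": 0.2766},
    },
}

response = requests.post(
    "http://localhost:8003/v1/generate-report",
    json=payload,
    timeout=300,
)
response.raise_for_status()

report = response.json()
print(report["public_summary"]["status"])
print(report["expert_report"]["confidence_score"])
```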
This project uses Git LFS to manage large files efficiently, including AI model checkpoints, datasets, and media files.

The following file types are automatically tracked by Git LFS (configured in `.gitattributes`):

AI/ML Models:
- `*.pth`, `*.pt` - PyTorch model files
- `*.h5`, `*.hdf5` - Keras/HDF5 model files
- `*.pkl`, `*.pickle` - Pickled model files
- `*.bin`, `*.safetensors` - Binary model weights
- `*.onnx` - ONNX model files

Media Files:
- `*.mp4`, `*.avi`, `*.mov` - Video files
- `*.jpg`, `*.jpeg`, `*.png` - Image files
- `*.wav`, `*.mp3`, `*.flac` - Audio files

Datasets:
- `*.csv`, `*.tsv`, `*.parquet` - Large dataset files
- Large `*.json` files in `data/` and `datasets/` directories

Archives:
- `*.zip`, `*.tar.gz`, `*.7z` - Compressed archives

Documentation:
- `*.pdf` - Research papers and large documents
Basic Operations:
```bash
# Check LFS status
git lfs status
# List all LFS tracked files
git lfs ls-files
# Show LFS file information
git lfs ls-files --size
# Pull all LFS files
git lfs pull
# Push LFS files
git lfs push origin main
```

Working with Large Files:

```bash
# Add a new large file (automatically tracked if extension matches .gitattributes)
git add large_model.pth
git commit -m "Add new model checkpoint"
# Track additional file types
git lfs track "*.newtype"
git add .gitattributes
git commit -m "Track new file type with LFS"
# Check which files will be uploaded to LFS
git lfs status
```

Troubleshooting:

```bash
# If LFS files appear as text pointers instead of actual files
git lfs fetch --all
git lfs checkout
# Reset LFS cache
git lfs prune
# Verify LFS installation
git lfs version
git lfs env
```

Approximate LFS Storage Usage:
- AIGVDet model checkpoints: ~500MB
- G3 framework databases: ~200MB
- Example datasets: ~100MB
- Test media files: ~50MB
- Total: ~850MB
Important Notes:
- Git LFS has bandwidth limits on free accounts (1GB/month)
- Consider using Git LFS for development and separate hosting for production models
- Large files are only downloaded when explicitly requested (`git lfs pull`)
When adding new large files:
1. Verify file tracking:

   ```bash
   # Check if file type is tracked
   git check-attr filter large_file.pth
   # Should output: large_file.pth: filter: lfs
   ```

2. Add and commit:

   ```bash
   git add large_file.pth
   git commit -m "Add new model checkpoint"

   # Verify file is staged for LFS
   git lfs status
   ```

3. Push with LFS:

   ```bash
   git push origin your-branch
   # LFS files are automatically pushed with regular git push
   ```
Best Practices:
- Keep LFS files organized in appropriate directories (`checkpoints/`, `data/`, `models/`)
- Use descriptive commit messages for LFS file changes
- Test that LFS files are properly downloaded after cloning
- Consider using `.lfsconfig` for project-specific LFS settings
Problem: "This repository is over its data quota"
```bash
# Solution: Clean up old LFS files
git lfs prune --recent
git lfs prune --verify-remote
```

Problem: LFS files show as text pointers

```bash
# Solution: Fetch and checkout LFS files
git lfs fetch --all
git lfs checkout --force
```

Problem: Cannot push large files

```bash
# Check Git LFS quota and usage
git lfs env
# Consider using alternative hosting for very large files
```

Create `.env` files in each service directory:
Geolocation Service (`.env`):
```
GOOGLE_API_KEY=your_gemini_api_key
GOOGLE_CLOUD_PROJECT=your_project_id
MP16_DATABASE_PATH=./data/mp16_database
DEVICE=cuda  # or cpu
```

Timestamp Detection Service (`.env`):
```
SERPAPI_API_KEY=your_serpapi_key
MAX_SEARCH_RESULTS=10
SIMILARITY_THRESHOLD=0.75
ENABLE_REVERSE_IMAGE_SEARCH=true
```

Report Generation Service (`.env`):
```
LLM_API_KEY=your_llm_api_key
LLM_MODEL=gemini-2.5-flash-lite
MAX_REPORT_LENGTH=5000
ENABLE_PUBLIC_SUMMARIES=true
```

| Service | Default Port | Health Check |
|---|---|---|
| Geolocation | 8000 | /health |
| Timestamp | 8001 | /health |
| AI Detection | 8002 | /health |
| Report Generation | 8003 | /health |
For High-Volume Processing:
```python
config = {
'enable_parallel_processing': True,
'max_retries': 5,
'timeout': 600, # 10 minutes
'batch_size': 8
}
```

For Resource-Constrained Environments:

```python
config = {
'enable_parallel_processing': False,
'max_retries': 2,
'timeout': 300, # 5 minutes
'batch_size': 1
}
```

This system implements the methodology described in our ACM Multimedia 2025 paper:
"Fact-Checking at Scale: Multimodal AI for Authenticity and Context Verification in Online Media"
- Unified Verification Pipeline - Integration of visual forensics, textual analysis, and multimodal reasoning
- Hybrid OOC Detection - Novel approaches combining semantic similarity, temporal alignment, and geolocation cues
- Multi-Agent Evidence Aggregation - Sophisticated fusion of verification outputs
- Scalable Architecture - Microservices design for real-world deployment
Our system achieved competitive performance in the ACM Multimedia 2025 Grand Challenge, ranking second overall:
| Team | Total Score | Rank |
|---|---|---|
| Ours | 644.65 | 2nd |
| Team 1 | 844.22 | 1st |
| Team 3 | 487.19 | 3rd |
| Team 4 | 295.86 | 4th |
Out-of-Context Detection Results:
- SearchOOC: 96.0% accuracy
- HierOOC: 95.3% accuracy
- Outperformed baseline methods by significant margins
Main Task: Real-world multimedia cases from fact-checking archives
- Multilingual content (images/videos + metadata)
- Sensitive material handling
- Expert verification ground truth
Sub-Task: COSMOS dataset for out-of-context detection
- 161,752 training images
- 41,006 validation images
- 1,000 manually annotated test images
```python
from multimodal_fact_checker_pipeline import MultimodalFactCheckerPipeline
# Advanced configuration
config = {
'geolocation_url': 'http://custom-g3-server:8000',
'timestamp_url': 'http://custom-timestamp-server:8001',
'ai_detection_url': 'http://custom-ai-server:8002',
'report_url': 'http://custom-report-server:8003',
'timeout': 600,
'max_retries': 5,
'enable_parallel_processing': True,
'custom_weights': {
'geolocation': 0.3,
'timestamp': 0.2,
'ai_detection': 0.3,
'context_verification': 0.2
}
}
pipeline = MultimodalFactCheckerPipeline(config)
```

```python
# Run only specific services
result = await pipeline.verify_multimedia(
media_files=['video.mp4'],
metadata=metadata,
services=['geolocation', 'ai_detection'] # Skip timestamp and report
)
```

```python
import glob
# Process multiple files
media_batches = [
glob.glob("batch1/*.mp4"),
glob.glob("batch2/*.jpg"),
glob.glob("batch3/*.mov")
]
results = []
for batch in media_batches:
    result = await pipeline.verify_multimedia(batch, metadata)
    results.append(result)
```

```python
# Generate custom reports
def custom_report_generator(result):
    return {
        'executive_summary': generate_executive_summary(result),
        'technical_details': extract_technical_details(result),
        'evidence_chain': build_evidence_chain(result),
        'recommendations': generate_recommendations(result)
    }

# Use custom generator
summary = custom_report_generator(result)
```

```
multimodal-fact-checker/
├── README.md
├── multimodal_fact_checker_pipeline.py     # Main integration pipeline
├── ACM_MM_2025.pdf                         # Research paper
└── modules/
    ├── ACMMM25-Grand-Challenge-Geolocation/   # G3 geolocation service
    │   ├── app.py                          # FastAPI application
    │   ├── src/
    │   │   ├── g3_batch_prediction.py      # Core G3 implementation
    │   │   ├── prompt/                     # Prompt engineering
    │   │   └── g3/                         # G3 framework modules
    │   └── requirements.txt
    ├── timestamp_detector/                 # Timestamp detection service
    │   ├── app/
    │   │   ├── main.py                     # FastAPI application
    │   │   ├── core.py                     # Core processing logic
    │   │   └── utils.py                    # Utility functions
    │   └── requirements.txt
    ├── AIGVDet/                            # AI-generated detection
    │   ├── main.py                         # Command-line interface
    │   ├── run.py                          # Core detection logic
    │   ├── checkpoints/                    # Model weights
    │   └── requirements.txt
    └── acmmm2025-report/                   # Report generation service
        ├── app/
        │   ├── main.py                     # FastAPI application
        │   ├── api/v1/                     # API routes
        │   └── services/                   # Business logic
        └── requirements.txt
```
- Create a service directory under `modules/`
- Implement a FastAPI application with standard endpoints (a minimal skeleton is sketched below)
- Add a health check endpoint at `/health`
- Update pipeline integration in `multimodal_fact_checker_pipeline.py`
- Add service configuration to environment variables
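For reference, here is a minimal sketch of what such a service could look like. It assumes only the conventions visible in this repository (FastAPI plus a `/health` endpoint); the module name, the extra endpoint, and the port are illustrative, not part of the existing codebase.

```python
# new_service.py - hypothetical skeleton for an additional verification service
from fastapi import FastAPI

app = FastAPI(title="New Verification Service")

@app.get("/health")
async def health():
    # Standard health check used by the pipeline and docker-compose
    return {"status": "ok"}

@app.post("/v1/verify")
async def verify(payload: dict):
    # Placeholder verification logic; a real service would analyze media here
    return {"service": "new-verifier", "received_keys": list(payload.keys())}
```

Run it on an unused port, e.g. `uvicorn new_service:app --host 0.0.0.0 --port 8004`, and register that URL in the pipeline configuration.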
```bash
# Run unit tests for individual services
cd modules/ACMMM25-Grand-Challenge-Geolocation
python -m pytest tests/
# Integration testing
python -m pytest tests/integration/
# End-to-end pipeline testing
python test_pipeline.py
```

The pipeline includes comprehensive logging:

```python
import logging
# Configure logging level
logging.basicConfig(level=logging.DEBUG)
# Custom log formatting
formatter = logging.Formatter(
'%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
```

Health monitoring endpoints:
- `/health` - Basic service health
- `/metrics` - Prometheus metrics
- `/status` - Detailed status information
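A small polling script can exercise these endpoints across the services; the sketch below assumes the default ports from the configuration table and uses only the `/health` route.

```python
import requests

# Poll the /health endpoint of each service (default ports from the table above)
SERVICES = {
    "geolocation": 8000,
    "timestamp": 8001,
    "ai_detection": 8002,
    "report_generation": 8003,
}

for name, port in SERVICES.items():
    try:
        response = requests.get(f"http://localhost:{port}/health", timeout=5)
        print(f"{name}: HTTP {response.status_code}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")
```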
We welcome contributions to improve the multimodal fact-checking system!
- Fork the repository
- Create feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open Pull Request
- Follow PEP 8 style guidelines
- Add comprehensive docstrings
- Include unit tests for new features
- Update documentation for API changes
- Ensure backward compatibility
- New verification methods (audio analysis, blockchain verification)
- Performance optimization (caching, parallel processing)
- Dataset integration (new fact-checking datasets)
- UI/Frontend development (web interface, mobile apps)
- Security enhancements (input validation, rate limiting)
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this system in your research, please cite our paper:
```bibtex
@inproceedings{phan2025factchecking,
title={Fact-Checking at Scale: Multimodal AI for Authenticity and Context Verification in Online Media},
author={Phan, Van-Hoang and Le-Duc, Tung-Duong and Pham, Long-Khanh and Le, Anh-Thu and Dinh-Nguyen, Quynh-Huong and Vo, Dang-Quan and Nguyen-Son, Hoang-Quoc and Tran, Anh-Duy and Vu, Dang and Dao, Minh-Son},
booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
pages={xxxx--yyyy},
year={2025},
organization={ACM}
}
```

- G3 Framework: Geolocalization using Large Multi-modality Models
- AIGVDet: AI-Generated Video Detection via Spatial-Temporal Anomaly Learning
- COSMOS Dataset: Out-of-Context Misinformation Detection
- ACM MM 2025 Grand Challenge: Multimedia Verification Challenge
For questions, issues, or collaboration opportunities:
- Project Lead: Dang Vu ([email protected])
- Research Contact: Minh-Son Dao ([email protected])
- Technical Support: GitHub Issues
Special thanks to:
- National Institute of Information and Communications Technology (NICT) for research support
- FPT Software AI Center for development resources
- University of Science, Ho Chi Minh City for academic collaboration
- ACM Multimedia 2025 organizing committee for the Grand Challenge framework
- Open source community for foundational libraries and tools
Built with ❤️ for fighting misinformation and promoting information integrity in the digital age.