
# Phishing URL Detector AI

A powerful phishing URL detection system that combines a trusted domain whitelist with machine learning for accurate and efficient phishing detection.


abriljordan/phishing-url-detector-ai


Features

Two-Layer Protection

  1. Whitelist Database

    • Multiple trusted domain sources (Umbrella, Tranco, Majestic, DomCop)
    • Fast database lookups
    • High confidence for legitimate domains
    • Reduces false positives
  2. AI Model

    • BERT-based deep learning detection
    • Works completely offline
    • Catches sophisticated phishing attempts
    • High accuracy for unknown domains

Key Benefits

  • Speed: Quick whitelist checks for known domains
  • Accuracy: AI model for unknown domains
  • Reliability: Trusted sources (Umbrella, Tranco, Majestic, DomCop) for whitelist
  • Efficiency: Optimized database for fast lookups
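
The two-layer flow described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the `whitelist` table, the `check_url` helper, the `source` key, the 0.5 threshold, and the fixed confidence of 1.0 for whitelisted domains are all assumptions, and `fake_model_score` is a placeholder standing in for the BERT model.

```python
import sqlite3
from typing import Callable
from urllib.parse import urlparse


def check_url(url: str, db: sqlite3.Connection,
              model_score: Callable[[str], float],
              threshold: float = 0.5) -> dict:
    """Layer 1: whitelist lookup. Layer 2: model score for unknown domains."""
    host = (urlparse(url).hostname or "").lower()
    row = db.execute(
        "SELECT 1 FROM whitelist WHERE domain = ? LIMIT 1", (host,)
    ).fetchone()
    if row is not None:
        # Known trusted domain: the model is never called.
        return {"is_phishing": False, "confidence": 1.0, "source": "whitelist"}
    score = model_score(url)  # assumed to be the probability of phishing
    is_phishing = score >= threshold
    confidence = score if is_phishing else 1.0 - score
    return {"is_phishing": is_phishing, "confidence": confidence, "source": "model"}


# Demo with an in-memory whitelist (hypothetical single-column table).
demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE whitelist (domain TEXT PRIMARY KEY)")
demo_db.execute("INSERT INTO whitelist VALUES ('example.com')")


def fake_model_score(url: str) -> float:
    # Placeholder heuristic, NOT the real model.
    return 0.9 if "login" in url else 0.1


trusted = check_url("https://example.com/login", demo_db, fake_model_score)
unknown = check_url("http://secure-login.example.net/", demo_db, fake_model_score)
print(trusted["source"], unknown["source"])
```

Checking the whitelist first means the slower model only runs for domains that are not already trusted.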

Offline Setup

  1. Clone the repository
git clone https://github.com/abriljordan/phishing-url-detector-ai.git
cd phishing-url-detector-ai
  2. Download the AI Model

    • Download the model from: Hugging Face Model
    • Create a models directory in the project root
    • Extract the model files into models/bert-finetuned-phishing
  3. Set up the environment

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
  4. Initialize the Whitelist Database
# Create the data directory if needed, then create the database with the schema
mkdir -p data
sqlite3 data/whitelist.db < schema.sql

# Import whitelist data (choose sources as needed)

# For Umbrella Top 1M:
wget -O data/top-1m.csv.zip https://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip
unzip -o data/top-1m.csv.zip -d data/
# --skip 1 drops the first row; remove it if the CSV has no header line
sqlite3 data/whitelist.db ".mode csv" ".import --skip 1 data/top-1m.csv umbrella"

# For DomCop Top 10M (optional, large file):
# wget -O data/DomCoptop10milliondomains.csv.zip https://example.com/path/to/DomCoptop10milliondomains.csv
# unzip -o data/DomCoptop10milliondomains.csv.zip -d data/
# sqlite3 data/whitelist.db ".mode csv" ".import --skip 1 data/DomCoptop10milliondomains.csv domcop"
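
If the sqlite3 command-line tool is not available, the import step could also be done from Python. The sketch below assumes a two-column `rank,domain` CSV and a hypothetical table layout (`list_rank`, `domain`); the real layout comes from schema.sql, which is not shown here, so adjust the column names to match it. `import_domains` is an illustrative helper, not part of the project.

```python
import csv
import io
import sqlite3


def import_domains(conn, csv_file, table="umbrella", skip_rows=0):
    """Load rank,domain rows from an open CSV file into `table`."""
    if not table.isidentifier():
        raise ValueError("table name must be a plain identifier")
    # Hypothetical layout; schema.sql may define different columns.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(list_rank INTEGER, domain TEXT PRIMARY KEY)"
    )
    reader = csv.reader(csv_file)
    for _ in range(skip_rows):
        next(reader, None)  # skip header lines, if any
    rows = ((int(r[0]), r[1].strip().lower()) for r in reader if len(r) >= 2)
    with conn:  # commit on success, roll back on error
        conn.executemany(
            f"INSERT OR IGNORE INTO {table} (list_rank, domain) VALUES (?, ?)",
            rows,
        )
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Small in-memory demo; a real run would pass open("data/top-1m.csv", newline="").
sample = io.StringIO("1,google.com\n2,Microsoft.com\n2,microsoft.com\n")
conn = sqlite3.connect(":memory:")
imported = import_domains(conn, sample)
print(imported)
```

Lowercasing domains on import and using `INSERT OR IGNORE` keeps duplicate entries from different casings out of the table.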

Usage

from phishing_detector import PhishingDetector

# Initialize detector (will use offline model)
detector = PhishingDetector(use_offline=True)

# Check a URL
result = detector.check_url("https://example.com")
print(f"Is phishing: {result['is_phishing']}")
print(f"Confidence: {result['confidence']:.2%}")
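
To check several URLs in one pass, a small wrapper around `check_url` may be enough. The sketch below assumes only what the usage example shows (a `check_url` method returning a dict with `is_phishing` and `confidence`); `check_many`, the `error` key, and `StubDetector` are hypothetical, and the stub exists only so the example runs without the model.

```python
from typing import Iterable, Protocol


class UrlChecker(Protocol):
    def check_url(self, url: str) -> dict: ...


def check_many(detector: UrlChecker, urls: Iterable[str]) -> list[dict]:
    """Check each URL, recording failures instead of aborting the batch."""
    results = []
    for url in urls:
        try:
            result = dict(detector.check_url(url))
        except Exception as exc:
            result = {"is_phishing": None, "confidence": 0.0, "error": str(exc)}
        result["url"] = url
        results.append(result)
    return results


class StubDetector:
    """Stand-in for PhishingDetector; its logic is a placeholder."""

    def check_url(self, url: str) -> dict:
        if not url.startswith(("http://", "https://")):
            raise ValueError("unsupported URL scheme")
        return {"is_phishing": "paypa1" in url, "confidence": 0.75}


report = check_many(
    StubDetector(),
    ["https://example.com", "http://paypa1.example", "ftp://x"],
)
for row in report:
    print(row["url"], row["is_phishing"], f"{row['confidence']:.2%}")
```

With the real detector, `StubDetector()` would be replaced by `PhishingDetector(use_offline=True)`.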

Data Sources

  • Cisco Umbrella Top 1M
  • Tranco
  • Majestic
  • DomCop Top 10M

Notes

  • Keep the model (1.34 GB) in the models directory and the whitelist databases in the data directory
  • Add both directories to your .gitignore to avoid committing large files
  • For production use, consider a more robust database such as PostgreSQL

Architecture

Components

  1. Whitelist Manager

    • SQLite database
    • Optimized for fast lookups
    • Multiple trusted domain sources
    • Automatic updates
  2. AI Model

    • BERT-based deep learning architecture
    • Feature extraction
    • Real-time prediction
    • Confidence scoring
  3. Web Interface

    • Modern, responsive design
    • Real-time URL checking
    • Detailed analysis view
    • Batch processing
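
Before a whitelist lookup, a URL has to be reduced to a domain. How the Whitelist Manager does this is not described here, so the following is only one possible approach: `candidate_domains` is a hypothetical helper that returns the hostname plus its parent domains, so that `mail.google.com` can match a whitelist entry for `google.com`.

```python
from urllib.parse import urlparse


def candidate_domains(url: str) -> list[str]:
    """Return the hostname and its parent domains, most specific first."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    host = (urlparse(url).hostname or "").lower().rstrip(".")
    labels = host.split(".") if host else []
    # Stop before the bare top-level label (e.g. "com").
    return [".".join(labels[i:]) for i in range(max(len(labels) - 1, 0))]


print(candidate_domains("https://Mail.Google.com:443/a"))
```

Matching on parent domains reduces false positives for subdomains of trusted sites, but it also trusts user-controlled subdomains on shared hosting. This sketch does not consult the Public Suffix List either, so a suffix such as `co.uk` would appear as a candidate; a stricter design would match the exact hostname only.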

Database Schema

  • umbrella: Trusted domains from Cisco Umbrella
  • Optimized indexes for fast lookups
  • Views for common queries
  • Automatic timestamp updates
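
The schema elements listed above (a source table, indexes, a view, automatic timestamps) might look like the sketch below. The actual schema.sql is not shown here, so every column, index, view, and trigger name is an assumption made for illustration.

```python
import sqlite3

# Hypothetical schema; the project's schema.sql may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS umbrella (
    domain     TEXT PRIMARY KEY,  -- primary key doubles as the lookup index
    list_rank  INTEGER,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_umbrella_rank ON umbrella (list_rank);

CREATE VIEW IF NOT EXISTS trusted_domains AS
    SELECT domain, 'umbrella' AS source FROM umbrella;

CREATE TRIGGER IF NOT EXISTS umbrella_touch
AFTER UPDATE OF list_rank ON umbrella
BEGIN
    UPDATE umbrella SET updated_at = CURRENT_TIMESTAMP
    WHERE domain = NEW.domain;
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO umbrella (domain, list_rank) VALUES ('example.com', 1)")
conn.execute("UPDATE umbrella SET list_rank = 2 WHERE domain = 'example.com'")
row = conn.execute("SELECT domain, source FROM trusted_domains").fetchone()
print(row)
```

The trigger fires only on changes to `list_rank` and writes only `updated_at`, so it does not re-trigger itself.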

Performance

  • Whitelist lookup: < 1ms
  • AI model prediction: ~100ms
  • Batch processing: ~50ms per URL
  • Database size: ~100MB
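
These figures depend on hardware and data size. A rough way to measure whitelist lookup latency on your own machine is sketched below; the `whitelist` table, the synthetic domains, and the `median_lookup_ms` helper are assumptions for illustration, and an in-memory database will generally be faster than one on disk.

```python
import sqlite3
import time


def median_lookup_ms(conn, domains, repeats=5):
    """Median per-lookup latency in milliseconds over several passes."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for d in domains:
            conn.execute(
                "SELECT 1 FROM whitelist WHERE domain = ?", (d,)
            ).fetchone()
        elapsed = time.perf_counter() - start
        timings.append(elapsed * 1000 / len(domains))
    timings.sort()
    return timings[len(timings) // 2]


# Synthetic data: 10,000 made-up domains in a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE whitelist (domain TEXT PRIMARY KEY)")
with conn:
    conn.executemany(
        "INSERT INTO whitelist VALUES (?)",
        ((f"site{i}.example",) for i in range(10_000)),
    )

probes = [f"site{i}.example" for i in range(0, 10_000, 100)]
per_lookup_ms = median_lookup_ms(conn, probes)
print(f"median whitelist lookup: {per_lookup_ms:.4f} ms")
```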

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

MIT License - See LICENSE for details.
