# gurl
An intelligent web crawler built in Go that extracts email addresses from websites with precision and speed.
Fast, intelligent, and scalable email discovery for modern web applications.
## Features

- **Intelligent Crawling**: Prioritizes contact and information pages
- **Multi-language Support**: Recognizes keywords in 6 languages (Spanish, English, French, German, Italian, Portuguese)
- **Meta Redirects**: Automatically follows HTML meta redirects
- **Redis Cache**: Smart caching with 12-month persistence and 5,400x faster responses on cache hits
- **Async Processing**: Background jobs with webhook notifications
- **Auto Deduplication**: Automatically removes duplicate emails
- **Dockerized**: Easy deployment with Docker Compose
- **REST API**: Both synchronous and asynchronous endpoints
- **Configurable Depth**: Explores up to 3 levels deep (configurable)
If you like this project, consider buying me a coffee!

## Prerequisites

- Docker
- Docker Compose
## Quick Start

Run the published image from Docker Hub:

```bash
# Pull and run the latest image
docker run -d --name gurl-crawler \
  -p 8080:8080 \
  -p 6379:6379 \
  luisra51/gurl:latest

# Or use with external Redis
docker run -d --name gurl-crawler \
  -p 8080:8080 \
  -e REDIS_HOST=your-redis-host \
  -e REDIS_PORT=6379 \
  luisra51/gurl:latest
```

Or build from source:

```bash
git clone https://github.com/luisra51/gurl.git
cd gurl
docker-compose up --build
```

The service will be available at http://localhost:8080.
## Usage

### Synchronous scan

```bash
# Basic scan
curl "http://localhost:8080/scan?url=example.com"

# With specific protocol
curl "http://localhost:8080/scan?url=https://company.com"
```

Response:

```json
{
  "emails": ["[email protected]", "[email protected]"],
  "from_cache": false,
  "crawl_time": "2.3s"
}
```
### Asynchronous scan

```bash
curl -X POST "http://localhost:8080/scan/async" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "slow-website.com",
    "webhook_url": "https://your-api.com/webhook",
    "callback_id": "optional-tracking-id"
  }'
```

Immediate response:

```json
{
  "job_id": "uuid-123-456-789",
  "status": "queued",
  "estimated_time": "30-60s",
  "webhook_url": "https://your-api.com/webhook",
  "check_status_url": "/scan/status/uuid-123-456-789"
}
```

Webhook callback (when complete):

```json
{
  "job_id": "uuid-123-456-789",
  "callback_id": "optional-tracking-id",
  "status": "completed",
  "url": "https://slow-website.com",
  "emails": ["[email protected]"],
  "crawl_time": "45.2s",
  "pages_visited": 15,
  "completed_at": "2025-08-07T10:30:00Z"
}
```
Cached response (cache hit):

```json
{
  "emails": ["[email protected]", "[email protected]"],
  "from_cache": true,
  "crawl_time": "396µs"
}
```

No emails found:

```json
{
  "emails": [],
  "from_cache": false,
  "crawl_time": "2.1s"
}
```

Invalid URL:

```json
{
  "error": "Invalid URL provided"
}
```

## Keyword Recognition

The crawler intelligently recognizes contact-related keywords in 6 languages:
- 🇪🇸 **Spanish**: contacto, información, equipo, nosotros, empresa
- 🇺🇸 **English**: contact, about, team, support, help, office
- 🇫🇷 **French**: nous-contacter, équipe, aide, assistance, bureau
- 🇩🇪 **German**: kontakt, über-uns, impressum, unser-team, hilfe
- 🇮🇹 **Italian**: contatti, chi-siamo, squadra, informazioni, supporto
- 🇵🇹 **Portuguese**: contato, sobre-nos, equipe, ajuda, suporte
*43+ keywords total across all languages for maximum coverage.*
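Prioritization can be pictured as scoring candidate links against these keywords and crawling the high scorers first. The sketch below is illustrative only, not gurl's actual implementation, and uses only a small subset of the keyword list:

```go
// Illustrative sketch of keyword-based link prioritization;
// not taken from gurl's source code.
package main

import (
	"fmt"
	"sort"
	"strings"
)

var contactKeywords = []string{
	"contacto", "contact", "nous-contacter", "kontakt", "contatti", "contato",
	"about", "impressum", "equipo", "équipe", "team",
}

// score counts keyword hits in a URL; higher-scoring links are visited first.
func score(link string) int {
	link = strings.ToLower(link)
	n := 0
	for _, kw := range contactKeywords {
		if strings.Contains(link, kw) {
			n++
		}
	}
	return n
}

func main() {
	links := []string{
		"https://example.com/blog/post-1",
		"https://example.com/contact",
		"https://example.com/about/team",
	}
	// Sort descending by score so contact-like pages come first.
	sort.SliceStable(links, func(i, j int) bool { return score(links[i]) > score(links[j]) })
	fmt.Println(links)
}
```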
## API Reference

### Synchronous endpoints

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/scan?url=<website>` | Scan website (immediate response) |
| `GET` | `/cache/stats` | View Redis cache statistics |
| `DELETE` | `/cache/invalidate` | Clear all cache |
| `DELETE` | `/cache/invalidate?url=<website>` | Clear specific URL cache |
### Asynchronous endpoints

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/scan/async` | Create async scan job |
| `GET` | `/scan/status/<job_id>` | Check job status |
| `DELETE` | `/scan/cancel/<job_id>` | Cancel queued job |
| `GET` | `/scan/jobs` | View active job statistics |
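Without a webhook, completion can also be detected by polling `/scan/status/<job_id>`. A hedged Go sketch: the status response schema is not documented here, so it decodes into a generic map and relies only on the `status` value shown in the async examples:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// pollStatus polls /scan/status/<job_id> until the job reports "completed".
// The "status" field name is taken from this README's examples; the full
// schema is assumed, so the body is decoded into a generic map.
func pollStatus(jobID string) (map[string]any, error) {
	for attempt := 0; attempt < 60; attempt++ {
		resp, err := http.Get("http://localhost:8080/scan/status/" + jobID)
		if err != nil {
			return nil, err
		}
		var body map[string]any
		err = json.NewDecoder(resp.Body).Decode(&body)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		if body["status"] == "completed" {
			return body, nil
		}
		time.Sleep(5 * time.Second) // be gentle with the API
	}
	return nil, fmt.Errorf("job %s did not complete in time", jobID)
}

func main() {
	result, err := pollStatus("uuid-123-456-789")
	if err != nil {
		panic(err)
	}
	fmt.Println(result["emails"])
}
```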
Example requests:

```bash
# View cache statistics
curl "http://localhost:8080/cache/stats"

# Check async job status
curl "http://localhost:8080/scan/status/uuid-123-456"

# Cancel queued job
curl -X DELETE "http://localhost:8080/scan/cancel/uuid-123-456"

# View active jobs and statistics
curl "http://localhost:8080/scan/jobs"

# Clear complete cache
curl -X DELETE "http://localhost:8080/cache/invalidate"
```

## Configuration

```bash
# Crawler Settings
CRAWLER_MAX_DEPTH=3 # Maximum crawling depth
CRAWLER_DEDUPLICATE_EMAILS=true # Remove duplicate emails
# Cache Settings
CACHE_ENABLED=true # Enable Redis cache
CACHE_EXPIRATION_MONTHS=12 # Cache TTL in months
# Async Processing Settings
ASYNC_ENABLED=true # Enable async processing
ASYNC_WORKERS=3 # Number of parallel workers
ASYNC_JOB_TIMEOUT_SECONDS=300 # Job timeout (5 minutes)
ASYNC_WEBHOOK_RETRIES=3 # Webhook retry attempts
# Redis Configuration
REDIS_HOST=localhost # Redis host
REDIS_PORT=6379 # Redis port
REDIS_PERSIST_DISK=false # Disk persistence (prod: true)
# Server Configuration
SERVER_PORT=8080 # Server port
SERVER_HOST=0.0.0.0 # Server host
```
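A loader for these variables can be as simple as reading the environment with fallbacks. The sketch below mirrors a few of the variable names above but is illustrative, not the contents of `internal/config/config.go`:

```go
// Minimal sketch of environment-based configuration with defaults;
// variable names match the README, the implementation is illustrative.
package main

import (
	"fmt"
	"os"
	"strconv"
)

type Config struct {
	MaxDepth     int
	CacheEnabled bool
	RedisHost    string
	ServerPort   string
}

// getEnv returns the variable's value, or fallback when unset.
func getEnv(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func Load() Config {
	depth, err := strconv.Atoi(getEnv("CRAWLER_MAX_DEPTH", "3"))
	if err != nil {
		depth = 3 // fall back to the documented default on bad input
	}
	return Config{
		MaxDepth:     depth,
		CacheEnabled: getEnv("CACHE_ENABLED", "true") == "true",
		RedisHost:    getEnv("REDIS_HOST", "localhost"),
		ServerPort:   getEnv("SERVER_PORT", "8080"),
	}
}

func main() {
	fmt.Printf("%+v\n", Load())
}
```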
## How It Works

- **Smart Crawling**: Prioritizes contact pages with multilingual keywords
- **Depth Control**: Configurable depth (default: 3 levels)
- **Cache System**: Redis-based caching with 12-month TTL
- **Auto Deduplication**: Automatic email normalization and deduplication (see the sketch below)
- **Performance**: 5,400x faster responses with cache hits
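Normalization plus deduplication can be as simple as lowercasing, trimming, and set membership. An illustrative sketch, not gurl's actual code:

```go
// Illustrative email normalization and deduplication;
// not taken from gurl's source code.
package main

import (
	"fmt"
	"strings"
)

// dedupe lowercases and trims each address, keeping first-occurrence order.
func dedupe(emails []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, e := range emails {
		norm := strings.ToLower(strings.TrimSpace(e))
		if norm == "" || seen[norm] {
			continue
		}
		seen[norm] = true
		out = append(out, norm)
	}
	return out
}

func main() {
	fmt.Println(dedupe([]string{"Info@Example.com", "info@example.com ", "sales@example.com"}))
}
```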
## Project Structure

```
/
├── .env                  # Environment variables (development)
├── .env.example          # Configuration example
├── go.mod                # Go dependencies
├── Dockerfile            # Container definition
├── docker-compose.yml    # Redis + App services
├── scan_urls.sh          # Batch processing script
├── cmd/
│   └── crawler/
│       └── main.go       # Application entry point
└── internal/
    ├── cache/
    │   └── cache.go      # Redis cache management
    ├── config/
    │   └── config.go     # Environment configuration
    ├── crawler/
    │   └── crawler.go    # Core crawling logic
    ├── handler/
    │   └── handler.go    # HTTP endpoints (sync + async)
    └── jobs/
        ├── types.go      # Job data types
        ├── queue.go      # Redis job queue
        └── worker.go     # Worker system + webhooks
```
## Architecture Highlights

- **Cache Layer**: Redis with configurable TTL and optional persistence
- **Job Queue**: Redis-based async system with parallel workers
- **Webhook System**: Result delivery with retries and exponential backoff (see the sketch below)
- **Multi-language**: 43+ keywords across 6 languages
- **Config Management**: Environment-based configuration
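Webhook delivery with retries and exponential backoff might look like the sketch below. It is illustrative, not gurl's actual worker code; the retry count matches the `ASYNC_WEBHOOK_RETRIES` default of 3, everything else is an assumption:

```go
// Illustrative webhook delivery with retries and exponential backoff.
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

// deliverWebhook POSTs the payload, retrying on error with doubling delays.
func deliverWebhook(url string, payload []byte, retries int) error {
	backoff := time.Second
	for attempt := 0; attempt <= retries; attempt++ {
		resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode < 300 {
				return nil // delivered successfully
			}
		}
		time.Sleep(backoff)
		backoff *= 2 // exponential backoff: 1s, 2s, 4s, ...
	}
	return fmt.Errorf("webhook delivery to %s failed after %d retries", url, retries)
}

func main() {
	err := deliverWebhook("https://your-api.com/webhook", []byte(`{"status":"completed"}`), 3)
	fmt.Println(err)
}
```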
## Development

With Docker:

```bash
# Copy environment variables
cp .env.example .env

# Start complete stack
docker-compose up --build
```

Without Docker:

```bash
# Install Redis locally
# Ubuntu/Debian: sudo apt install redis-server
# macOS: brew install redis

# Start Redis
redis-server

# Install Go dependencies
go mod tidy

# Run application
go run cmd/crawler/main.go
```

## Contributing

We welcome contributions! Here's how you can help:
- **Bug Reports**: Found a bug? Open an issue
- **Feature Requests**: Have an idea? Start a discussion
- **Documentation**: Improve docs, add examples, fix typos
- **Translations**: Add support for more languages
- **Testing**: Write tests, test edge cases
- **Code**: Implement new features or fix bugs
To contribute:

1. Fork the repository
2. Clone your fork:
   ```bash
   git clone https://github.com/your-username/gurl.git
   cd gurl
   ```
3. Create a feature branch:
   ```bash
   git checkout -b feature/amazing-feature
   ```
4. Make your changes
5. Test your changes:
   ```bash
   docker-compose up --build
   ```
6. Commit and push:
   ```bash
   git commit -m "Add amazing feature"
   git push origin feature/amazing-feature
   ```
7. Open a Pull Request
Code guidelines:

- Follow standard Go conventions (`go fmt`, `go vet`)
- Add tests for new features
- Update documentation for API changes
- Use meaningful commit messages
## Limitations

- **JavaScript**: Does not execute JavaScript; only analyzes static HTML
- **Single Page Applications**: Limited on SPAs that load content dynamically
- **Rate limiting**: Does not implement throttling between requests
- **Same domain**: Only crawls pages from the same base domain
## Use Cases

- **Lead Generation**: Find contact emails from company websites
- **Research Automation**: Collect contact information at scale
- **Competitive Analysis**: Study competitor contact pages
- **API Integration**: Integrate with CRMs via webhooks
- **Batch Processing**: Process thousands of URLs with `scan_urls.sh`
- **Microservices**: Email discovery service for distributed architectures
## Docker Deployment

Production:

```bash
# Single container (no Redis persistence)
docker run -d --name gurl-crawler \
  -p 8080:8080 \
  luisra51/gurl:latest

# With Docker Compose (includes Redis)
docker-compose -f docker-compose.hub.yml up -d

# Production with external Redis
docker run -d --name gurl-crawler \
  -p 8080:8080 \
  -e REDIS_HOST=your-redis-host \
  -e REDIS_PORT=6379 \
  -e REDIS_PERSIST_DISK=true \
  -e ASYNC_WORKERS=5 \
  -e CACHE_EXPIRATION_MONTHS=12 \
  luisra51/gurl:latest
```

Development:

```bash
# Quick development (no persistence)
docker-compose up --build

# Fast rebuilds
docker-compose up --build crawler-app

# Clean and start fresh
docker-compose down -v && docker-compose up --build
```

Manual build:

```bash
docker build -t email-crawler .
docker run -p 8080:8080 email-crawler
```

## Monitoring

```bash
# View cache statistics
curl "http://localhost:8080/cache/stats"

# View worker and job status
curl "http://localhost:8080/scan/jobs"

# Application logs
docker-compose logs -f crawler-app

# Redis logs
docker-compose logs -f redis

# Enter container for debugging
docker-compose exec crawler-app sh
```