LinlinScrappy

A robust LinkedIn Ad Library scraper that extracts ad details, downloads images and company logos while bypassing anti-bot protections.

Features

3-Level Scraping Pipeline
- Level 1: Extract ad URLs from search results
- Level 2: Scrape detailed ad information (headline, company, description, CTA, etc.)
- Level 3: Download ad images and company logos
Anti-Detection
- Browserless.io proxy with residential IPs
- Lynx CLI for HTML parsing (bypasses JavaScript detection)
- Realistic headers and request delays
Smart Filtering
- Automatically detects and skips video ads
- Downloads company logo only once per company
- Deduplicates ads by ID
Organized Output
- Company-based directories with timestamps
- Comprehensive JSON results with metadata
- Separate files for ad images and logos

Requirements

Node.js 16+
Lynx (text-based browser)
Browserless.io API key

Install Lynx

macOS:

brew install lynx

Ubuntu/Debian:

sudo apt-get install lynx

Installation

npm install

Configuration

Set your Browserless API key:

export BROWSERLESS_API_KEY="your_api_key_here"

Or create a .env file:

BROWSERLESS_API_KEY=your_api_key_here

Usage

Basic Usage

node main-scraper-lynx.js "<linkedin_ad_library_url>" [maxAds] [scrapingLevel]

Examples

Scrape 10 ads with full pipeline (images + logos):

node main-scraper-lynx.js "https://www.linkedin.com/ad-library/search?companyIds=89771&dateOption=last-30-days" 10 3

Extract 50 ad details without downloading images:

node main-scraper-lynx.js "https://www.linkedin.com/ad-library/search?companyIds=89771" 50 2

Just get ad URLs:

node main-scraper-lynx.js "https://www.linkedin.com/ad-library/search?companyIds=89771" 100 1

Scraping Levels

Level 1 - Find ad URLs only
Level 2 - Level 1 + Extract ad details
Level 3 - Level 1 + Level 2 + Download images (default)

Output Structure

priority_software_20251022/
├── 89771_logo.jpg                              # Company logo
├── 901310383_main_image_1761151474402.jpg     # Ad images
├── 878410933_main_image_1761151477960.jpg
└── scraping_results_lynx_1761151480917.json   # Metadata

Results JSON Structure

{
  "startTime": "2025-10-22T14:44:00.000Z",
  "endTime": "2025-10-22T14:44:33.000Z",
  "duration": 33000,
  "targetUrl": "https://www.linkedin.com/ad-library/search?companyIds=89771",
  "companyId": "89771",
  "scrapingLevel": 3,
  "processedAds": [
    {
      "adId": "901310383",
      "sourceUrl": "https://www.linkedin.com/ad-library/detail/901310383",
      "headline": "See Which ERP Performs Best for SMBs",
      "company": "Priority Software",
      "imageUrl": "https://media.licdn.com/...",
      "imageAlt": "See Which ERP Performs Best for SMBs",
      "targetUrl": "https://www.priority-software.com/...",
      "description": "Most IT leaders settle for vendor-led evaluations...",
      "logoUrl": "https://media.licdn.com/...",
      "adFormat": "Single Image Ad",
      "cta": null
    }
  ],
  "summary": {
    "totalAdsFound": 5,
    "totalAdsProcessed": 5,
    "successfulDetails": 4,
    "videoAdsSkipped": 1,
    "failedDetails": 0,
    "successfulDownloads": 4,
    "failedDownloads": 0
  }
}

Architecture

Main Components

main-scraper-lynx.js - Primary scraper with 3-level pipeline
image-downloader.js - Handles image/logo downloads
level1-only.js - Utility for quick URL extraction

Scraping Flow

1. Browserless Proxy
   └─> Navigate to LinkedIn Ad Library

2. Level 1: URL Collection
   └─> Scroll page to load ads
   └─> Extract ad detail URLs
   └─> Filter duplicates

3. Level 2: Detail Extraction (Lynx)
   └─> For each ad URL:
       ├─> Fetch HTML via Lynx
       ├─> Extract: headline, company, description, CTA, format
       └─> Skip if Video Ad

4. Level 3: Asset Download
   └─> Download ad images
   └─> Download company logo (once per company)

API Keys

Browserless.io

Get your API key at browserless.io

Free tier includes:

6 hours/month
Residential proxies
Chrome automation

Rate Limiting

Default delay between ads: 300ms
Image download timeout: 30s
Respects LinkedIn's rate limits

Limitations

Video ads are detected and skipped (no static images)
Requires active Browserless.io session
LinkedIn may block excessive scraping

Troubleshooting

Error: BROWSERLESS_API_KEY not set

export BROWSERLESS_API_KEY="your_key"

Error: lynx command not found

brew install lynx  # macOS

Error: Failed to rename directory

Directory already exists from previous run
Script will reuse existing directory

Code Review

See CODE_REVIEW.md for:

Architecture analysis
Performance optimizations
Security considerations
Recommended improvements

License

MIT

Contributing

Pull requests welcome! Please see CODE_REVIEW.md for optimization opportunities.

Disclaimer

This tool is for educational purposes. Ensure compliance with LinkedIn's Terms of Service and robots.txt. Use responsibly and respect rate limits.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
chargeflow_20251022		chargeflow_20251022
chargeflow_20251022_run1		chargeflow_20251022_run1
.gitignore		.gitignore
CODE_REVIEW.md		CODE_REVIEW.md
README.md		README.md
download-images-from-results.js		download-images-from-results.js
image-downloader.js		image-downloader.js
level1-only.js		level1-only.js
level2-batch.js		level2-batch.js
level2-lynx.js		level2-lynx.js
level2-scraper.js		level2-scraper.js
linkedin-page.html		linkedin-page.html
linkedin-page.png		linkedin-page.png
main-scraper-lynx.js		main-scraper-lynx.js
main-scraper.js		main-scraper.js
retry-failed-ads.js		retry-failed-ads.js
scraper.js		scraper.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

LinlinScrappy

Features

Requirements

Install Lynx

Installation

Configuration

Usage

Basic Usage

Examples

Scraping Levels

Output Structure

Results JSON Structure

Architecture

Main Components

Scraping Flow

API Keys

Browserless.io

Rate Limiting

Limitations

Troubleshooting

Code Review

License

Contributing

Disclaimer

About

Uh oh!

Releases

Packages

Languages

Maximomx/LinlinScrappy

Folders and files

Latest commit

History

Repository files navigation

LinlinScrappy

Features

Requirements

Install Lynx

Installation

Configuration

Usage

Basic Usage

Examples

Scraping Levels

Output Structure

Results JSON Structure

Architecture

Main Components

Scraping Flow

API Keys

Browserless.io

Rate Limiting

Limitations

Troubleshooting

Code Review

License

Contributing

Disclaimer

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages