A high-performance, async Medium scraper that simplifies converting articles to Markdown. Offers flexible request handling and a user-friendly web interface.

Medium Scraper

A free, high-scale, async Medium scraper with request abstraction and HTML-to-Markdown parser. Quickly discover and convert Medium articles to clean Markdown with our intuitive web interface.

[Screenshot: Medium Scraper web interface]

Decodo

Collect real-time data from any website with Decodo’s Web Scraping API and award-winning proxies.

– Free trials available

– 125M+ IPs in 195+ locations

– 100+ ready-made scraping templates

– Extensive documentation

– 24/7 tech support

🌐 Web Interface (Recommended)

The easiest way to use Medium Scraper is through our web interface:

Docker Deployment (Easiest Setup)

Pull our pre-built image from GitHub Container Registry:

docker run -p 8000:8000 ghcr.io/sarperavci/medium-scraper:latest

Then open your browser to http://localhost:8000

Alternative Installation

# Install web dependencies  
pip install medium-scraper[web]

# Run web server
cd web && python app.py

🚀 Usage

Setting up Decodo API

Decodo provides a powerful API for scraping Medium articles. Our web interface supports this API out of the box.

To use the Decodo API, sign up at Decodo and obtain an API key.

Once you have your API key, open "Advanced Settings" in the web interface, set the Sender to decodo-webscraping-api, and paste the key into the Decodo API Key field.

[Screenshot: Advanced Settings with the Decodo sender and API key field]

Setting up Custom Proxies

You can configure custom proxies in the web interface by opening "Advanced Settings", setting the Sender to requests, and pasting your proxy list into the Proxies field.

Setting up Proxyless

To run without proxies, open "Advanced Settings", set the Sender to requests, and leave the Proxies field empty; requests are then sent directly from your own IP.
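The README doesn't specify the exact format the Proxies field expects; a common convention for proxy lists is one proxy URL per line, optionally with credentials (the addresses below are illustrative placeholders, not real proxies):

```text
http://203.0.113.10:8080
http://user:pass@198.51.100.7:3128
```

Whether other schemes (e.g. SOCKS) are accepted depends on how the underlying requests backend is configured.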

Features

  • Intuitive GUI for scraping Medium articles
  • Real-time progress tracking via WebSocket
  • Download results as ZIP files
  • Job history and persistent storage
  • Multiple request modes:
    • Decodo API: Smart managed scraping (requires Decodo API key)
    • Custom Proxies: Bring your own proxy list
    • Proxyless: Direct requests with your IP

Core Library

These features are also available when using the library programmatically. See our Library Documentation for details.

📚 Library Usage

For programmatic usage of the core library, please refer to our Library Documentation which provides detailed examples.

🖥️ CLI Tool

The command-line interface offers powerful scraping capabilities. See our CLI Documentation for comprehensive usage instructions.

🛠️ Installation Options

Basic Installation

pip install medium-scraper

📚 Request Senders

The library supports multiple request backends:

  1. RequestsRequestSender: Standard requests library (works with custom proxies or proxyless)
  2. DecodoScraperRequestSender: Advanced scraping with Decodo API (requires API key)
  3. CachedRequestSender: Adds caching to any sender

Choose the appropriate sender based on your needs:

from medium_scraper import RequestsRequestSender, DecodoScraperRequestSender

# For simple use cases (proxyless or with custom proxies)
sender = RequestsRequestSender()

# For advanced scraping with Decodo (requires API key from https://decodo.com)
sender = DecodoScraperRequestSender(api_key="your-decodo-api-key")
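CachedRequestSender's exact interface isn't shown here, but "adds caching to any sender" suggests the decorator pattern: a wrapper holds an inner sender and returns stored responses for repeated URLs instead of re-fetching. A minimal, self-contained sketch of that idea follows; the class and method names are illustrative, not the library's actual API:

```python
import asyncio

class StubSender:
    """Stand-in for a real request sender; counts outbound calls."""
    def __init__(self):
        self.calls = 0

    async def get(self, url: str) -> str:
        self.calls += 1
        return f"<html>content of {url}</html>"

class CachingSender:
    """Wraps any sender and memoizes responses by URL."""
    def __init__(self, inner):
        self.inner = inner
        self._cache: dict[str, str] = {}

    async def get(self, url: str) -> str:
        if url not in self._cache:
            self._cache[url] = await self.inner.get(url)
        return self._cache[url]

async def demo():
    sender = CachingSender(StubSender())
    await sender.get("https://medium.com/tag/python")
    await sender.get("https://medium.com/tag/python")  # served from cache
    return sender.inner.calls

print(asyncio.run(demo()))  # prints 1
```

Only one outbound request is made for the two identical fetches, which is why a caching layer pays off when the same article pages are revisited across jobs.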

🔄 Async Usage

The library is designed to be fully async:

import asyncio
from medium_scraper import MediumExplorer

async def main():
    explorer = MediumExplorer()
    articles = await explorer.get_tag_articles("python", limit=5)
    
    for article in articles:
        print(f"Title: {article.title}")
        print(f"URL: {article.url}")

asyncio.run(main())
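Because every call is awaitable, several tags can also be scraped concurrently with asyncio.gather rather than one after another. A sketch of the pattern, using a stub coroutine in place of the real explorer.get_tag_articles so the example is self-contained:

```python
import asyncio

async def get_tag_articles(tag: str, limit: int = 5) -> list[str]:
    """Stub standing in for MediumExplorer.get_tag_articles."""
    await asyncio.sleep(0)  # simulate network I/O
    return [f"{tag}-article-{i}" for i in range(limit)]

async def main():
    tags = ["python", "rust", "golang"]
    # Start all three fetches at once instead of awaiting them sequentially.
    results = await asyncio.gather(*(get_tag_articles(t, limit=2) for t in tags))
    return dict(zip(tags, results))

articles = asyncio.run(main())
print(articles["python"])  # ['python-article-0', 'python-article-1']
```

With the real MediumExplorer, the same gather pattern overlaps the network latency of each request, which is where the "high-scale" throughput comes from.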

📖 Documentation

For more detailed information about each component, see the documentation in the docs folder.
