No more bookmarking docs you'll forget about. No more 47 browser tabs. No more "I read this somewhere but can't find it."
npm i -g @memvid/mawOr just npx @memvid/maw if you don't want to install anything.
maw https://react.dev # crawls entire site → maw.mv2
maw find maw.mv2 "useEffect" # instant search
maw ask maw.mv2 "when should I use useCallback vs useMemo?" # AI answersThat last one needs OPENAI_API_KEY in your env. The first two work out of the box.
Basically anything.
maw https://docs.python.org # documentation (2,847 pages, ~4 min)
maw https://paulgraham.com # blogs
maw . # your local codebase
maw https://github.com/user/repo # any git repo
maw "https://news.ycombinator.com/item?id=12345" # single pagesIt figures out the right approach automatically:
- Single page URL → fetches just that page
- Domain root → crawls everything it can find
- Local path → reads your files directly
- Cloudflare/bot protection → switches to stealth browser
Crawl
maw <url> # saves to maw.mv2
maw <url> -o docs.mv2 # custom output file
maw <url> docs.mv2 # same (appends if file exists)
maw <url> --depth 5 --max-pages 500 # go deeperSearch
maw find docs.mv2 "authentication" # keyword search
maw ask docs.mv2 "how do I do X?" # AI-powered answers
maw list docs.mv2 # see what's in therePreview before crawling
maw preview stripe.com # shows sitemap, page count estimateExport
maw export docs.mv2 -f markdown # dump everything to markdown
maw export docs.mv2 -f json # or jsonBy default you get keyword search (BM25). It's fast and works well for most things.
Want semantic search? Add --embed:
maw https://kubernetes.io/docs --embed openaiCosts about $0.01 per 1000 pages. Your queries will understand meaning, not just keywords.
fetch (fast) → playwright (real browser) → rebrowser (stealth)
90% of sites work with a simple fetch. The other 10% get a real browser. If that's blocked too, stealth mode usually gets through.
You don't have to think about this. It just tries each approach until something works.
| Flag | What it does | Default |
|---|---|---|
-o, --output <file> |
Output file | maw.mv2 |
-d, --depth <n> |
How deep to crawl | 2 |
-m, --max-pages <n> |
Stop after this many pages | 150 |
-c, --concurrency <n> |
Parallel requests | 10 |
-r, --rate-limit <n> |
Max requests/second | 10 |
--include <regex> |
Only crawl URLs matching this | - |
--exclude <regex> |
Skip URLs matching this | - |
--browser |
Force browser mode | - |
--stealth |
Force stealth mode | - |
--embed [model] |
Enable semantic embeddings | - |
--no-robots |
Ignore robots.txt | - |
--no-sitemap |
Don't use sitemap.xml | - |
# grab some docs
maw https://react.dev
maw https://docs.python.org
maw https://stripe.com/docs
# archive a blog
maw https://paulgraham.com/articles.html
# your own code
maw .
maw https://github.com/your/repo
# combine sources into one file
maw https://react.dev https://nextjs.org -o frontend.mv2
# big crawl with semantic search
maw https://kubernetes.io/docs --depth 4 --max-pages 1000 --embed openaiUp to 50MB works without an API key. That's roughly 500-2000 pages depending on how much text is on each page.
Need more? Get a key at memvid.com.
Is this legal?
Respects robots.txt by default. What you do with --no-robots is your business.
What's an .mv2 file?
A memvid file. Single-file database with search built in. Like SQLite but for documents and memory.
Programmatic usage?
import { maw, find, ask } from '@memvid/maw'
await maw(['https://example.com'], { output: 'site.mv2' })
const results = await find('site.mv2', 'search term')
const answer = await ask('site.mv2', 'explain this to me')Will I get rate limited?
Default is 10 req/sec with backoff. Most sites won't notice. If you're worried, use --rate-limit 2.
JS-rendered content?
Works. Falls back to a real browser automatically when needed.
MIT License · Built on memvid
