Commit 1ab39ae

feat(readers/web-firecrawl): migrate to Firecrawl v2 SDK (run-llama#19773)
1 parent b961a59 commit 1ab39ae

File tree

8 files changed: +1134 −42 lines

docs/docs/examples/data_connectors/WebPageDemo.ipynb

Lines changed: 10 additions & 0 deletions
```diff
@@ -290,6 +290,16 @@
     "Using Firecrawl to gather an entire website"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a41579cc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install firecrawl-py"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
```

llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/firecrawl_web/README.md

Lines changed: 25 additions & 14 deletions
````diff
@@ -6,25 +6,31 @@
 
 1. **Install Firecrawl Package**: Ensure the `firecrawl-py` package is installed to use the Firecrawl Web Loader. Install it via pip with the following command:
 
-   ```bash
-   pip install firecrawl-py
-   ```
+   ```bash
+   pip install 'firecrawl-py>=4.3.3'
+   ```
 
 2. **API Key**: Secure an API key from [Firecrawl.dev](https://www.firecrawl.dev/) to access the Firecrawl services.
 
 ### Using Firecrawl Web Loader
 
-- **Initialization**: Initialize the FireCrawlWebReader by providing the API key, the desired mode of operation (`crawl`, `scrape`, `search`, or `extract`), and any optional parameters for the Firecrawl API.
+- **Initialization**: Initialize the `FireCrawlWebReader` by providing the API key, the desired mode of operation (`crawl`, `scrape`, `map`, `search`, or `extract`), and any optional parameters for the Firecrawl API.
 
-  ```python
-  from llama_index.readers.web.firecrawl_web.base import FireCrawlWebReader
+  ```python
+  from llama_index.readers.web.firecrawl_web.base import FireCrawlWebReader
 
-  firecrawl_reader = FireCrawlWebReader(
-      api_key="your_api_key_here",
-      mode="crawl",  # or "scrape" or "search" or "extract"
-      params={"additional": "parameters"},
-  )
-  ```
+  firecrawl_reader = FireCrawlWebReader(
+      api_key="your_api_key_here",
+      mode="crawl",  # or "scrape" or "map" or "search" or "extract"
+      # Common params for the underlying Firecrawl client,
+      # e.g. formats for content types and crawl limits
+      params={
+          "formats": ["markdown", "html"],  # for scrape or crawl
+          "limit": 100,  # for crawl
+          # "scrape_options": {"formats": ["markdown", "html"]},  # alternative shape for crawl
+      },
+  )
+  ```
 
 - **Loading Data**: To load data, use the `load_data` method with the URL you wish to process.
 
@@ -43,8 +49,13 @@ Here is an example demonstrating how to initialize the FireCrawlWebReader, load
 # Initialize the FireCrawlWebReader with your API key and desired mode
 firecrawl_reader = FireCrawlWebReader(
     api_key="your_api_key_here",  # Replace with your actual API key
-    mode="crawl",  # Choose between "crawl", "scrape", "search" and "extract"
-    params={"additional": "parameters"},  # Optional additional parameters
+    mode="crawl",  # Choose between "crawl", "scrape", "map", "search" and "extract"
+    params={
+        # Provide formats for the content you want to retrieve
+        "formats": ["markdown", "html"],
+        # Limit the number of pages to crawl
+        "limit": 50,
+    },
 )
 
 # Load documents from Paul Graham's essay URL
````
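Putting the README changes together, end-to-end usage of the migrated reader might look like the sketch below. It assumes `llama-index-readers-web` and `firecrawl-py>=4.3.3` are installed and a real API key is supplied; the import guard keeps the snippet inert when they are not, and the `load_essay` helper is purely illustrative, not part of the library.

```python
# Sketch of end-to-end usage of FireCrawlWebReader after the v2 migration.
# Assumes llama-index-readers-web and firecrawl-py>=4.3.3 are installed;
# the import guard keeps the snippet harmless when they are absent.
# load_essay is a hypothetical helper, not library API.
try:
    from llama_index.readers.web.firecrawl_web.base import FireCrawlWebReader
except ImportError:  # packages not installed
    FireCrawlWebReader = None


def load_essay(api_key):
    """Crawl Paul Graham's essay and return LlamaIndex documents."""
    if FireCrawlWebReader is None:
        return []  # degrade gracefully without the optional dependency
    reader = FireCrawlWebReader(
        api_key=api_key,
        mode="crawl",
        # params mirror the README example above: content formats and a page cap
        params={"formats": ["markdown", "html"], "limit": 50},
    )
    return reader.load_data(url="http://www.paulgraham.com/worked.html")
```

The guard pattern is convenient in docs and tests, since the Firecrawl dependency is optional for the wider `llama-index-readers-web` package.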

0 commit comments