The crawler API lets backend services retrieve webpage titles when direct requests fail, and provides full content extraction for the AI team, ensuring reliable access to page metadata and text.
- Retrieve a single webpage title via `GET /title/{url}`
- Retrieve multiple webpage titles via `POST /titles`
- Extract webpage content via `GET /content/{url}`
- Asynchronous processing for efficient request handling (see the sketch below)
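For orientation, here is a minimal sketch of what a compatible `main.py` could look like. The endpoint paths and response shapes follow the documentation below, but everything else is an assumption: the names `UrlList`, `crawl_title`, and `_fetch` are illustrative, the fetching and title parsing use only the standard library as a stand-in for the real crawler, and the sketch omits `nest_asyncio`, which the actual service presumably needs for its crawling library.

```python
# main.py -- a minimal sketch of the API surface described in this README.
# The crawl logic here (stdlib fetch + regex title parse) is a stand-in
# assumption, not the project's actual crawler.
import asyncio
import re
import urllib.request
from typing import List

import uvicorn
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()


class UrlList(BaseModel):
    # Request body for POST /titles; the model name is an assumption.
    urls: List[str]


def _fetch(url: str) -> str:
    # Blocking fetch; called via run_in_executor so the event loop stays free.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


async def crawl_title(url: str) -> str:
    loop = asyncio.get_running_loop()
    html = await loop.run_in_executor(None, _fetch, url)
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
    return match.group(1).strip() if match else ""


@app.get("/")
async def root():
    return {"message": "Hello, this is the root of the API. "
                       "Try /title/{url} or /content/{url}"}


@app.get("/title/{url:path}")
async def title(url: str):
    try:
        return {"title": await crawl_title(url)}
    except Exception as exc:
        raise HTTPException(status_code=502, detail=str(exc))


@app.post("/titles")
async def titles(body: UrlList):
    # Crawl every URL concurrently -- the asynchronous processing noted above.
    results = await asyncio.gather(
        *(crawl_title(u) for u in body.urls), return_exceptions=True
    )
    return [
        {"url": u, "title": r if isinstance(r, str) else None}
        for u, r in zip(body.urls, results)
    ]


@app.get("/content/{url:path}")
async def content(url: str):
    # Placeholder: returns the raw page; the real service filters the content
    # and converts it to Markdown.
    loop = asyncio.get_running_loop()
    return {"url": url, "content": await loop.run_in_executor(None, _fetch, url)}


if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)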
Ensure you have Python 3.8+ installed, then install the required dependencies:
```bash
pip install fastapi nest_asyncio pydantic uvicorn
```

To run the application, use the following command:

```bash
python main.py
```
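Alternatively, the app can usually be served with uvicorn's CLI. This assumes the FastAPI instance in `main.py` is named `app`; the host and port shown match the address used in the examples below:

```bash
uvicorn main:app --host 127.0.0.1 --port 8000
```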
- URL: `/`
- Method: `GET`
- Description: Returns a welcome message.
- Response:
{ "message": "Hello, this is the root of the API. Try /title/{url} or /content/{url}" }
- URL: `/title/{url:path}`
- Method: `GET`
- Description: Crawls the title of the specified URL for the backend.
- Path Parameters:
  - `url` (string): The URL to crawl.
- Example Request:
```bash
curl -X GET "http://127.0.0.1:8000/title/https://example.com"
```

- Response:
{ "title": "The title of the page" }
- URL: `/titles`
- Method: `POST`
- Description: Crawls the titles of multiple URLs for the backend.
- Request Body:
  - `urls` (list of strings): A list of URLs to crawl.
- Example Request:
{ "urls": ["https://example.com", "https://example.org"] } - Example cURL Request:
```bash
curl -X POST "http://127.0.0.1:8000/titles" -H "Content-Type: application/json" -d '{"urls":["https://example.com","https://example.org"]}'
```
- Response:
[ { "url": "https://example.com", "title": "The title of the page" }, { "url": "https://example.org", "title": "Another page title" } ]
- URL: `/content/{url:path}`
- Method: `GET`
- Description: Crawls the (filtered) content of a webpage in Markdown format for the AI team.
- Path Parameters:
  - `url` (string): The URL to crawl.
- Example Request:
```bash
curl -X GET "http://127.0.0.1:8000/content/https://example.com"
```

- Response:
{ "url": "https://example.com", "content": "# content...." }