Smart Web Extraction API for AI Agents & LLMs
Recursive crawling, JS rendering, intelligent extraction, structured data.
Beats Firecrawl at 1/1000th the price. No signup required.
Trafilatura-powered article extraction. Strips nav, ads, footers. Returns clean text + metadata (title, author, date, language).
Headless Chromium in an isolated jail. Renders SPAs, React, Vue. Auto-fallback when static extraction yields too little content.
BFS crawl with parallel workers. Sitemap discovery, robots.txt compliant. Up to 200 pages per request.
Extract JSON-LD and microdata natively. No LLM needed. Schema.org, OpenGraph, product data — all included.
Send up to 10 URLs in one request. Parallel fetching with 4 workers. One response, all results.
No logs, no tracking. Free tier needs no API key. Requests proxied through isolated FreeBSD jails.
Convert a single URL or raw HTML to Markdown.
# Smart extraction with metadata
curl -s "https://synthetic-context.net/v1/markdown?url=https://example.com"
# With JSON-LD + microdata extraction
curl -s "https://synthetic-context.net/v1/markdown?url=https://example.com&extract=jsonld,microdata"
# JS rendering (paid tier)
curl -s -H "X-API-Key: sk_YOUR_KEY" \
"https://synthetic-context.net/v1/markdown?url=https://spa-app.com&js=1"
# Batch POST
curl -s -X POST https://synthetic-context.net/v1/markdown \
-H "Content-Type: application/json" \
-d '{"urls": ["https://a.com", "https://b.com"], "extract": "jsonld"}'
Recursively crawl an entire website. Discovers pages via sitemap + link extraction.
# Crawl up to 10 pages, depth 2
curl -s -X POST https://synthetic-context.net/v1/crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://docs.example.com", "max_pages": 10, "max_depth": 2}'
# Crawl with JS rendering + structured data (paid tier)
curl -s -X POST https://synthetic-context.net/v1/crawl \
-H "Content-Type: application/json" \
-H "X-API-Key: sk_YOUR_KEY" \
-d '{"url": "https://spa.com", "max_pages": 50, "js": true, "extract": "jsonld"}'
{
"markdown": "# Article Title\n\nClean extracted text...",
"meta": {"title": "...", "author": "...", "date": "...", "language": "en"},
"jsonld": [{"@type": "Article", "name": "..."}],
"microdata": [{"@type": "https://schema.org/Product", "name": "..."}],
"source": "https://example.com",
"length": 4521
}
Enter a URL and see the result:
| Free | Paid | Premium | |
|---|---|---|---|
| Price | 0 | 5 sats/req | Contact |
| Requests/day | 100 | 10,000 | Unlimited |
| JS Rendering | - | Yes | Yes |
| Crawl max pages | 5 | 50 | 200 |
| API Key | Not needed | X-API-Key header | X-API-Key header |
| Batch URLs | 10 | 10 | 10 |
| Structured data | Yes | Yes | Yes |
| Feature | Firehose | Firecrawl | Crawl4AI | Jina Reader |
|---|---|---|---|---|
| Smart extraction | Yes | Yes | Yes | Yes |
| JS rendering | Yes | Yes | Yes | Yes |
| Recursive crawl | Yes | Yes | Yes | No |
| Sitemap discovery | Yes | Yes | Yes | No |
| JSON-LD extraction | Native (0 LLM) | Via LLM ($) | Via LLM ($) | No |
| Microdata extraction | Native | No | No | No |
| robots.txt | Yes | Yes | Yes | No |
| No signup | Yes | No | Yes | No |
| Price | 5 sats (~$0.005) | From $38/mo | Self-hosted | From $0.01/req |