crawlee: Tidy Interface for Reproducible Web Crawling

A tidy, pipe-friendly toolkit for reproducible web crawling and structured data collection, inspired by the architecture of the 'Crawlee' library. Provides a unified crawler with a deduplicating, resumable request queue, content-type-aware handlers, structured storage backends, and rich console logging via 'cli'. Supports crawling HTML pages, sitemaps, RSS and Atom feeds, and PDF documents, with optional headless-browser rendering and helpers for retrieval-augmented generation.

Version: 0.1.0
Depends: R (≥ 4.1.0)
Imports: cli, httr2, R6, rlang, rvest, tibble, vctrs, xml2
Suggests: arrow, chromote, DBI, dplyr, duckdb, httptest2, jsonlite, knitr, later, nanoparquet, pdftools, polite, promises, rmarkdown, testthat (≥ 3.0.0)
Published: 2026-07-03
DOI: 10.32614/CRAN.package.crawlee (may not be active yet)
Author: Andre Leite [aut, cre], Marcos Wasilew [aut], Hugo Vasconcelos [aut], Carlos Amorin [aut], Diogo Bezerra [aut]
Maintainer: Andre Leite <leite at castlab.org>
BugReports: https://github.com/StrategicProjects/crawlee/issues
License: MIT + file LICENSE
URL: https://github.com/StrategicProjects/crawlee, https://strategicprojects.github.io/crawlee/
NeedsCompilation: no
Language: en-US
Materials: README, NEWS
CRAN checks: crawlee results

Documentation:

Reference manual: crawlee.html, crawlee.pdf
Vignettes: Getting started with crawlee (source, R code); Crawling a website (source, R code); A RAG pipeline (source, R code); Scaling and politeness (source, R code); Storage and resumable runs (source, R code)

Downloads:

Package source: crawlee_0.1.0.tar.gz
Windows binaries: r-devel: not available, r-release: not available, r-oldrel: not available
macOS binaries: r-release (arm64): crawlee_0.1.0.tgz, r-oldrel (arm64): crawlee_0.1.0.tgz, r-release (x86_64): not available, r-oldrel (x86_64): crawlee_0.1.0.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=crawlee to link to this page.
