aCrawler documentation


🔍 A powerful web-crawling framework based on aiohttp.

Features

  • Write your crawler in one Python script with asyncio (a minimal sketch follows this list)
  • Schedule tasks with priority, fingerprint, exetime, recrawl…
  • Middleware: add handlers before or after a task’s execution
  • Simple shortcuts to speed up scripting
  • Parse HTML conveniently with Parsel
  • Parse with rules and chained processors
  • Support JavaScript/browser automation with pyppeteer
  • Stop and Resume: crawl periodically and persistently
  • Distributed work support with Redis
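
To give a feel for the single-script style, here is a minimal sketch of a crawler that collects quotes from a demo site. The names used below (Crawler, Parser, ParselItem, start_urls, run()) reflect typical acrawler usage but are assumptions in this sketch; see the rest of this documentation for the exact interface.

# A minimal single-script crawler (sketch only; class and attribute names
# such as Crawler, Parser, ParselItem, start_urls and run() are assumed
# here and should be checked against the real API).
from acrawler import Crawler, Parser, ParselItem


class QuoteItem(ParselItem):
    # Declarative parsing rules: CSS selectors mapped to item fields.
    css = {"author": "small.author::text", "text": "span.text::text"}


class QuoteCrawler(Crawler):
    start_urls = ["http://quotes.toscrape.com/page/1/"]
    parsers = [
        Parser(
            in_pattern=r"quotes.toscrape.com/page/\d+",
            follow_patterns=[r"quotes.toscrape.com/page/\d+"],
            item_type=QuoteItem,
            css_divider=".quote",
        )
    ]


if __name__ == "__main__":
    QuoteCrawler().run()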

Installation

To install, simply use pipenv (or pip):

$ pipenv install acrawler

(Optional)
$ pipenv install uvloop      # only on Linux/macOS, for a faster asyncio event loop
$ pipenv install aioredis    # if you need Redis support
$ pipenv install motor       # if you need MongoDB support
$ pipenv install aiofiles    # if you need FileRequest
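
If you install uvloop, you can opt in to it at the top of your crawler script. The snippet below is standard uvloop usage and is independent of acrawler; whether acrawler enables uvloop automatically is not assumed here.

# Optional: switch asyncio to uvloop's event loop for better throughput.
# Standard uvloop usage; falls back silently if uvloop is not installed.
import asyncio

try:
    import uvloop
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
except ImportError:
    pass  # the default asyncio event loop is used instead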

Indices and tables