aCrawler documentation¶
🔍 A powerful web-crawling framework based on aiohttp.
Features¶
- Write your crawler in one Python script with asyncio
- Schedule tasks with priority, fingerprint, exetime, recrawl and more
- Middleware: add handlers before or after a task's execution
- Simple shortcuts to speed up scripting
- Parse HTML conveniently with Parsel
- Parse with rules and chained processors
- Support JavaScript/browser automation with pyppeteer
- Stop and resume: crawl periodically and persistently
- Distributed crawling support with Redis
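The priority-scheduling idea above can be sketched with the standard library alone. This is not acrawler's actual scheduler, just a minimal illustration using `asyncio.PriorityQueue`, assuming a smaller number means higher priority; the task names and priorities are made up:

```python
import asyncio


async def drain(queue):
    """Pop queued tasks in priority order (smallest number first)."""
    order = []
    while not queue.empty():
        priority, name = await queue.get()
        order.append(name)
    return order


async def main():
    # Enqueue tasks out of order; the queue yields them by priority.
    q = asyncio.PriorityQueue()
    for priority, name in [(10, "low"), (0, "high"), (5, "mid")]:
        await q.put((priority, name))
    return await drain(q)


print(asyncio.run(main()))  # ['high', 'mid', 'low']
```

A real crawler would attach a fingerprint and an execution time to each queued task as well; the priority tuple is the core of the mechanism.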
Installation¶
To install, simply use pipenv (or pip):
$ pipenv install acrawler
(Optional)
$ pipenv install uvloop    # only Linux/macOS, for a faster asyncio event loop
$ pipenv install aioredis  # if you need Redis support
$ pipenv install motor     # if you need MongoDB support
$ pipenv install aiofiles  # if you need FileRequest
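If you install uvloop, the standard way to opt into it in your own script is to set the event loop policy before running anything. This pattern is uvloop's documented usage, not something acrawler-specific, and it falls back gracefully when uvloop is absent:

```python
import asyncio

# Use uvloop's faster event loop when available; otherwise keep the default.
try:
    import uvloop
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
except ImportError:
    pass  # uvloop is Linux/macOS only; the default loop still works fine


async def ping():
    # Any coroutine now runs on uvloop if it was installed above.
    await asyncio.sleep(0)
    return "ok"


print(asyncio.run(ping()))  # ok
```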