Basic Concepts
About Tasks
Everything in aCrawler is a Task, which runs its execute() method and may then yield new Tasks.
There are several basic Tasks defined here.
- A Request task executes its default fetch() method to make an HTTP request. The task then automatically yields a corresponding Response task. You can pass a function to the callback argument and provide a family; both are passed on to the Response task.
- A Response task executes its callback() method, which calls every function in callbacks with the HTTP response and may yield new tasks. A Response may have several callback functions, which come from the decorator callback() or from the corresponding request's parameter.
- An Item task executes its custom_process() method, which you can override.
- ParselItem extends Item. It accepts a Selector and uses Parsel to parse content.
- Any new Task yielded from an existing Task's execution is caught and delivered to the scheduler.
- Any new dictionary yielded from an existing Task's execution is caught as a DefaultItem.
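The task flow described above can be illustrated with a small, self-contained toy model. This is only a sketch of the idea, not aCrawler's actual implementation: the classes below are hypothetical stand-ins that merely reuse the names Task, Request, Response and DefaultItem, the run() loop is an assumed, simplified scheduler, and fetch() returns a fake page instead of making a real HTTP request.

```python
from collections import deque


class Task:
    """Toy stand-in for a task: execute() may yield new tasks or dicts."""

    def execute(self):
        return iter(())  # by default, a task yields nothing


class DefaultItem(Task):
    """Wraps a plain dictionary yielded by another task."""

    def __init__(self, content):
        self.content = content


class Response(Task):
    def __init__(self, body, callbacks):
        self.body = body
        self.callbacks = callbacks

    def execute(self):
        # Call every callback with the response; each may yield tasks or dicts.
        for callback in self.callbacks:
            yield from callback(self)


class Request(Task):
    def __init__(self, url, callback):
        self.url = url
        self.callback = callback

    def fetch(self):
        # Placeholder for a real HTTP request (assumption for this sketch).
        return f"<html>{self.url}</html>"

    def execute(self):
        # A request yields its corresponding response, forwarding the callback.
        yield Response(self.fetch(), [self.callback])


def run(start_tasks):
    """Simplified scheduler loop: returns the class names of executed tasks."""
    queue = deque(start_tasks)
    executed = []
    while queue:
        task = queue.popleft()
        executed.append(type(task).__name__)
        for result in task.execute():
            if isinstance(result, Task):
                queue.append(result)  # new tasks go back to the scheduler
            elif isinstance(result, dict):
                queue.append(DefaultItem(result))  # dicts become DefaultItem
    return executed


def parse(response):
    # Hypothetical callback: yields a plain dictionary.
    yield {"length": len(response.body)}


order = run([Request("http://example.com", parse)])
print(order)
```

In this sketch a single Request leads to a Response, whose callback yields a dictionary that the loop wraps in a DefaultItem, mirroring the chain described in the list above.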
About Families
- Each Handler has only one family. If a handler's family is in a task's families, the handler matches the task, and some of its functions are called before and after the task.
- Each task has families (defaults to the names of all its base classes and of its own class). If you pass family to a task, it is appended to the task's families. As a special case, a family passed by the user to a Request is also passed on to the families of its corresponding Response.
- family is also used by the decorators callback() and register().
- You can use the decorator @register() to add a handler to the crawler. Registering a plain function is also allowed, but you must then provide family and position as parameters. If a handler's family is in a task's families, the handler matches the task.
- You can use the decorator @callback(family='') to add a callback to a response. If the family in @callback() is in a response's families, the callback is attached to that response.
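The matching rule can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not aCrawler's real code: the Task and Request classes, the HANDLERS list, the simplified register() decorator (which ignores the position parameter) and the matching() helper are all hypothetical, and families is modelled as a set.

```python
class Task:
    def __init__(self, family=None):
        # Default families: names of the class itself and all its base classes.
        self.families = {
            cls.__name__ for cls in type(self).__mro__ if cls is not object
        }
        if family is not None:
            self.families.add(family)  # a user-passed family is added on top


class Request(Task):
    pass


HANDLERS = []  # hypothetical registry of (family, function) pairs


def register(family):
    """Simplified decorator: remember a function under exactly one family."""
    def decorator(func):
        HANDLERS.append((family, func))
        return func
    return decorator


@register(family="Request")
def log_request(task):
    return "logged"


@register(family="news")
def tag_news(task):
    return "tagged"


@register(family="Response")
def log_response(task):
    return "logged"


def matching(task):
    # A handler matches when its single family is among the task's families.
    return [func.__name__ for family, func in HANDLERS if family in task.families]


plain = Request()
tagged = Request(family="news")
print(matching(plain))
print(matching(tagged))
```

Here the plain request is matched only by the handler registered for the Request family, while the request created with family="news" is matched by that handler and by the one registered for "news".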
- You can use decorator