Basic Concepts
About Tasks
Everything in aCrawler is a `Task`, which runs its `execute()` method and may then yield new `Task`s.
There are several basic Tasks defined here.
- `Request` task executes its default `fetch()` method to make an HTTP request. The task then automatically yields a corresponding `Response` task. You can pass a function to the `callback` argument and provide a `family`; both are passed on to the response task.
- `Response` task executes `callback()`. It calls all functions in `callbacks` with the HTTP response and may yield new tasks. A `Response` may have several callback functions (passed from the decorator `callback()` or from the corresponding request's parameter).
- `Item` task executes its `custom_process()` method, which you can override.
- `ParselItem` extends `Item`. It accepts a `Selector` and uses Parsel to parse content.
- Any new `Task` yielded from an existing `Task`'s execution is caught and delivered to the scheduler.
- Any new `dictionary` yielded from an existing `Task`'s execution is caught as a `DefaultItem`.
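The execute-and-yield flow described above can be modelled with a toy sketch in plain Python. This is not aCrawler's implementation and does not import the library: the classes below only borrow the names `Task`, `Request`, `Response`, and `DefaultItem` from the text, while the `run()` loop, the `parse()` callback, the constructor signatures, and the canned `fetch()` body are all assumptions made for illustration.

```python
from collections import deque


class Task:
    """Toy stand-in for a task: execute() may yield new tasks or dicts."""

    def execute(self):
        return iter(())  # by default, yield nothing


class DefaultItem(Task):
    """Wraps a plain dict yielded by another task."""

    def __init__(self, content):
        self.content = content


class Response(Task):
    def __init__(self, url, body, callbacks):
        self.url = url
        self.body = body
        self.callbacks = callbacks

    def execute(self):
        # Call every callback with the response; each may yield more work.
        for callback in self.callbacks:
            yield from callback(self)


class Request(Task):
    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback

    def fetch(self):
        # Assumption: return a canned body instead of making a real HTTP request.
        return f"<html>{self.url}</html>"

    def execute(self):
        # The request hands its callback on to the response it yields.
        callbacks = [self.callback] if self.callback else []
        yield Response(self.url, self.fetch(), callbacks)


def run(start_tasks):
    """Hypothetical scheduler loop: execute tasks and queue whatever they yield."""
    queue = deque(start_tasks)
    finished = []
    while queue:
        task = queue.popleft()
        for produced in task.execute():
            if isinstance(produced, dict):
                produced = DefaultItem(produced)  # dicts become items
            queue.append(produced)
        finished.append(task)
    return finished


def parse(response):
    # A callback: yields a dict, which the loop turns into a DefaultItem.
    yield {"url": response.url, "length": len(response.body)}


done = run([Request("http://example.com", callback=parse)])
print([type(t).__name__ for t in done])
```

In this model the request produces a response, the response runs its callback, and the dict yielded by the callback is wrapped as an item, which mirrors the chain described in the list above.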
About Families
- Each `Handler` has only one family. If a handler's family is in a task's families, the handler matches the task, and some of its functions are called before and after the task.
- Each task has `families` (defaults to the names of all its base classes and of the class itself). If you pass `family` to a task, it is appended to the task's families. As a special case, a `Request`'s user-passed `family` is also passed to its corresponding `Response`'s families.
- `family` is also used by the decorators `callback()` and `register()`.
- You can use the decorator `@register()` to add a `handler` to the crawler. You may also register a function, but then you should provide family and position as parameters. If a `handler`'s family is in a `task`'s families, then the `handler` matches the `task`.
- You can use the decorator `@callback(family='')` to add a callback to a response. If the `family` in `@callback()` is in a response's families, the callback is added to that response.
- You can use decorator