Gain: a Python web crawling framework based on asyncio, uvloop and aiohttp

Category: Python · Published: 9 years ago


Gain

Web crawling framework for everyone. Written with asyncio, uvloop and aiohttp. Everyone can easily write their own web crawler with the Gain framework, which provides a simple API.

Road map

Basic spider
[] Custom header

Requirements

Python 3.5+

Based on

asyncio
uvloop
aiohttp
pybloomfiltermmap
pyquery

Installation

pip install gain

Usage

Write spider.py:

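from gain import Css, Item, Parser, Spider


class Post(Item):
    # Each field is extracted from the page with a CSS selector.
    title = Css('.entry-title')
    content = Css('.entry-content')

    async def save(self):
        # Append the extracted title to a local text file.
        with open('scrapinghub.txt', 'a+') as f:
            f.write(self.results['title'] + '\n')


class MySpider(Spider):
    start_url = 'https://blog.scrapinghub.com/'
    # Raw strings keep the regex escapes (\d, \-) intact.
    # URLs matching the second pattern are parsed into Post items.
    parsers = [Parser(r'https://blog.scrapinghub.com/page/\d+/'),
               Parser(r'https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/', Post)]


MySpider.run()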

Run python spider.py
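
The two Parser patterns in spider.py are ordinary regular expressions, so they can be checked against sample URLs before starting a crawl. The sketch below uses only the standard re module and does not import Gain. It assumes a whole-URL match (re.fullmatch); how Gain itself applies the patterns is not stated here and may differ. The classify helper and the sample URLs are hypothetical and exist only for illustration.

```python
import re

# Patterns copied from the parsers in spider.py, written as raw strings.
PAGE_PATTERN = r'https://blog.scrapinghub.com/page/\d+/'
POST_PATTERN = r'https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/'


def classify(url):
    """Hypothetical helper: report which pattern, if any, matches the whole URL."""
    if re.fullmatch(POST_PATTERN, url):
        return 'post'
    if re.fullmatch(PAGE_PATTERN, url):
        return 'page'
    return None


if __name__ == '__main__':
    # Made-up sample URLs, not taken from the real site.
    samples = [
        'https://blog.scrapinghub.com/page/2/',
        'https://blog.scrapinghub.com/2017/01/01/some-post/',
        'https://blog.scrapinghub.com/about/',
    ]
    for url in samples:
        print(url, '->', classify(url))
```

A URL that classifies as None under this assumption would be matched by neither parser, which is a quick way to spot a pattern that is too strict.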


Example

The examples are in the /example/ directory.

Contribution

Just open a pull request or an issue.
