BeautifulSoup能和Scrapy一起使用吗？

是的你可以。如上所述：ref：above <faq-scrapy-bs-cmp>，`BeautifulSoup`_可用于解析Scrapy回调中的HTML响应。您只需将响应的主体提供给``BeautifulSoup``对象，并从中提取所需的任何数据。

下面是一个使用BeautifulSoupAPI的蜘蛛示例， lxml 作为HTML解析器：

from bs4 import BeautifulSoup
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = (
        'http://www.example.com/',
    )

    def parse(self, response):
        # use lxml to get decent HTML parsing speed
        soup = BeautifulSoup(response.text, 'lxml')
        yield {
            "url": response.url,
            "title": soup.h1.string
        }

注解

``BeautifulSoup``支持几种HTML / XML解析器。请参阅“BeautifulSoup的官方文档”，了解哪些可用。

温馨提示

下载编程狮App，免费阅读超1000+编程语言教程

取消

确定

MIP.setData({ 'pageTheme' : getCookie('pageTheme') || {'day':true, 'night':false}, 'pageFontSize' : getCookie('pageFontSize') || 20 }); MIP.watch('pageTheme', function(newValue){ setCookie('pageTheme', JSON.stringify(newValue)) }); MIP.watch('pageFontSize', function(newValue){ setCookie('pageFontSize', newValue) }); function setCookie(name, value){ var days = 1; var exp = new Date(); exp.setTime(exp.getTime() + days*24*60*60*1000); document.cookie = name + '=' + value + ';expires=' + exp.toUTCString(); } function getCookie(name){ var reg = new RegExp('(^| )' + name + '=([^;]*)(;|$)'); return document.cookie.match(reg) ? JSON.parse(document.cookie.match(reg)[2]) : null; }

w3cschool 编程狮，随时随地学编程

BeautifulSoup能和Scrapy一起使用吗？

scrapy 2.3 安装指南

scrapy 2.3 教程

scrapy 2.3 命令行工具

scrapy 2.3 蜘蛛

scrapy 2.3 选择器

scrapy 2.3 使用选择器

scrapy 2.3 使用xpaths

scrapy 2.3 使用exslt扩展

scrapy 2.3 内置选择器引

scrapy 2.3 选择器实例

scrapy 2.3 项目

scrapy 2.3 项目类型

scrapy 2.3 使用项目对象

scrapy 2.3 使用项目对象

scrapy 2.3 项目加载器

scrapy 2.3 shell

scrapy 2.3 shell使用外壳

scrapy 2.3 项目管道

scrapy 2.3 项目管道示例

scrapy 2.3 Feed导出

scrapy 2.3 请求和响应

无标题文章

scrapy 2.3 请求子类

scrapy 2.3 链接提取器

scrapy 2.3 设置

scrapy 2.3 登录

scrapy 2.3 日志记录配置

scrapy 2.3 统计数据集合

scrapy 2.3 发送电子邮件

scrapy 2.3 远程登录控制台

scrapy 2.3 常见问题

scrapy 2.3 调试spiders

scrapy 2.3 蜘蛛合约

scrapy 2.3 常用做法

scrapy 2.3 宽爬行

scrapy 2.3 使用浏览器的开发人员工具进行抓取

scrapy 2.3 选择动态加载的内容

scrapy 2.3 调试内存泄漏

scrapy 2.3 下载和处理文件和图像

scrapy 2.3 如何部署蜘蛛

scrapy 2.3 AutoThrottle扩展