[Python] scrapyによるスクレイピング – ソフトウェアエンジニアの技術ブログ：Software engineer tech blog

scrapyとはweb scraping用のフレームワーク

$ pip3 install scrapy
$ scrapy version
Scrapy 2.5.1
$ scrapy startproject test1

$ cd test1
$ scrapy genspider test2 https://paypaymall.yahoo.co.jp/store/*/item/y-hp600-3/
Created spider ‘test2’ using template ‘basic’ in module:
test1.spiders.test2

test1/items.py

import scrapy
class Test1Item(scrapy.Item):
    title = scrapy.Field()

test1/spiders/test2.py

import scrapy
from test1.items import Test1Item

class Test2Spider(scrapy.Spider):
    name = 'test2'
    allowed_domains = ['paypaymall.yahoo.co.jp/store/*/item/y-hp600-3/']
    start_urls = ['https://paypaymall.yahoo.co.jp/store/*/item/y-hp600-3//']

    def parse(self, response):
        return Test1Item(
        		title = response.css('title').extract_first(),
        	)

$ scrapy crawl test2

BS4で良いじゃんと思ってしまうが、どうなんだろうか。