Scrapy is a framework for web scraping.
$ pip3 install scrapy
$ scrapy version
Scrapy 2.5.1
$ scrapy startproject test1
$ cd test1
$ scrapy genspider test2 https://paypaymall.yahoo.co.jp/store/*/item/y-hp600-3/
Created spider 'test2' using template 'basic' in module:
test1.spiders.test2
test1/items.py
import scrapy


class Test1Item(scrapy.Item):
    title = scrapy.Field()
test1/spiders/test2.py
import scrapy

from test1.items import Test1Item


class Test2Spider(scrapy.Spider):
    name = 'test2'
    # allowed_domains takes bare domain names, not URLs or paths.
    allowed_domains = ['paypaymall.yahoo.co.jp']
    start_urls = ['https://paypaymall.yahoo.co.jp/store/*/item/y-hp600-3/']

    def parse(self, response):
        # ::text selects the text inside <title> rather than the whole element.
        yield Test1Item(
            title=response.css('title::text').get(),
        )
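A note on allowed_domains: Scrapy expects bare host names there, not full URLs or paths. The following is only a minimal sketch of one way to derive that value from an item URL with the standard library. The helper name allowed_domain_for is made up for this illustration and is not part of Scrapy.

```python
from urllib.parse import urlparse


def allowed_domain_for(url: str) -> str:
    """Return the bare host name of `url` (hypothetical helper, not a Scrapy API)."""
    host = urlparse(url).hostname
    if host is None:
        # Happens when the URL has no scheme, e.g. "example.com/path".
        raise ValueError(f"no host name found in {url!r}")
    return host


# The item URL used in this post (the store segment is kept as written above).
start_url = "https://paypaymall.yahoo.co.jp/store/*/item/y-hp600-3/"
print(allowed_domain_for(start_url))  # paypaymall.yahoo.co.jp
```

The printed value is what would go into allowed_domains, while the full URL stays in start_urls.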
$ scrapy crawl test2
I can't help thinking BS4 (Beautiful Soup 4) would be good enough for this, though I'm not sure.