
Scrapy Use Item And Save Data In A Json File

I want to use a Scrapy item, manipulate the data, and save it all in a JSON file (using the JSON file like a DB).

    # Spider Class
    class Spider(scrapy.Spider):
        name = 'productpage'
        sta

Solution 1:

You should define the item you want, and yield it after parsing.

Finally, run the command: scrapy crawl [spider] -o xx.json

PS: Scrapy supports exporting to JSON files by default.
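A minimal sketch of what this could look like, assuming a hypothetical ProductItem with name and price fields; the URL and CSS selectors are placeholders you would adapt to the actual page:

    import scrapy

    class ProductItem(scrapy.Item):
        name = scrapy.Field()
        price = scrapy.Field()

    class ProductSpider(scrapy.Spider):
        name = 'productpage'
        start_urls = ['https://example.com/products']  # placeholder URL

        def parse(self, response):
            # Yield one item per product block found on the page
            for product in response.css('div.product'):
                item = ProductItem()
                item['name'] = product.css('h2::text').get()
                item['price'] = product.css('.price::text').get()
                yield item

Running scrapy crawl productpage -o products.json would then write every yielded item into products.json using Scrapy's built-in JSON exporter.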

Solution 2:

@Jadian's answer will get you a file with JSON in it, but not quite DB-like access to it. To do this properly from a design standpoint, I would follow the instructions below. You don't have to use Mongo either; there are plenty of other NoSQL DBs available that store JSON.

What I would recommend in this situation is that you build out the items properly using scrapy.Item() classes, then dump them as JSON into MongoDB. You will need to assign a PK (primary key) to each item, but Mongo is basically made to be a non-relational JSON store. You would then create an item pipeline that checks for the item's PK: if it is found and no details have changed, raise DropItem(); otherwise update/store the new data in MongoDB. You could probably even pipe into the JSON exporter if you wanted to, but I think just dumping the Python object into Mongo as JSON is the way to go, and Mongo will then present you with JSON to work with on the front end.
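A minimal sketch of such a pipeline, assuming pymongo is installed and that a url field serves as the PK; the DB name, collection name, and field names here are placeholders, not anything from the original question:

    import pymongo
    from scrapy.exceptions import DropItem

    class MongoPipeline:
        def __init__(self, mongo_uri='mongodb://localhost:27017', db_name='scrapy_db'):
            self.mongo_uri = mongo_uri
            self.db_name = db_name

        def open_spider(self, spider):
            self.client = pymongo.MongoClient(self.mongo_uri)
            self.db = self.client[self.db_name]

        def close_spider(self, spider):
            self.client.close()

        def process_item(self, item, spider):
            data = dict(item)
            # Look up the item by its PK (here assumed to be 'url')
            existing = self.db['products'].find_one({'url': data['url']})
            if existing and all(existing.get(k) == v for k, v in data.items()):
                # PK found and no details changed: drop the duplicate
                raise DropItem('Duplicate item with unchanged details')
            # Otherwise upsert: update the existing document or insert a new one
            self.db['products'].update_one({'url': data['url']}, {'$set': data}, upsert=True)
            return item

You would enable it in settings.py with something like ITEM_PIPELINES = {'myproject.pipelines.MongoPipeline': 300} (the module path is an assumption based on a standard project layout).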

I hope this answer makes sense. From a design point of view I think it will be a much easier solution, since Mongo is basically a non-relational data store based on JSON, and you will be separating your item-pipeline logic into its own area instead of cluttering your spider with it.

I would provide a code sample, but most of mine use an ORM for SQL DBs. Mongo is actually easier to use than that...
