Question

Is it possible to scrape data from websites using Scrapy and save that data in a Microsoft SQL Server database?

If so, are there any examples of this being done? Is it mainly a Python issue? That is, if I find some Python code that saves to a SQL Server database, can Scrapy do the same?

Solution

Yes, but you'd have to write the code to do it yourself, since Scrapy does not provide a built-in item pipeline that writes to a database.

Have a read of the Item Pipeline page in the Scrapy documentation, which describes the process in more detail (it includes a JsonWriterPipeline as an example). Basically, find some code that writes to a SQL Server database (using something like pyodbc) and you should be able to adapt it into a custom item pipeline that writes items directly to a SQL Server database.
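To make that concrete, here is a minimal sketch of what such a pipeline could look like. The connection string, table name (`items`), and column/field names (`name`, `url`) are all placeholders you would replace with your own:

```python
class SqlServerPipeline:
    """Hypothetical Scrapy pipeline that inserts items into SQL Server."""

    def open_spider(self, spider):
        # pyodbc is imported here so this module still loads where the
        # driver is not installed; install it with `pip install pyodbc`.
        import pyodbc

        # Example connection string; adjust driver, server, and credentials.
        self.conn = pyodbc.connect(
            "DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=localhost;DATABASE=scrapydb;"
            "UID=user;PWD=password"
        )
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # Assumes a table 'items' with columns matching the item fields.
        self.cursor.execute(
            "INSERT INTO items (name, url) VALUES (?, ?)",
            item.get("name"),
            item.get("url"),
        )
        self.conn.commit()
        return item  # pass the item on to any later pipelines

    def close_spider(self, spider):
        self.conn.close()
```

The `open_spider`/`close_spider` hooks keep the connection open for the whole crawl instead of reconnecting per item.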

OTHER TIPS

Super late and complete self-promotion here, but I think this could help someone. I just wrote a small Scrapy extension, scrapy-sqlitem, that saves scraped items to a database.

It is super easy to use.

pip install scrapy_sqlitem

Define Scrapy items using SQLAlchemy tables:

from sqlalchemy import Table, Column, Integer, String, MetaData
from scrapy_sqlitem import SqlItem

metadata = MetaData()

class MyItem(SqlItem):
    sqlmodel = Table(
        'mytable', metadata,
        Column('id', Integer, primary_key=True),
        Column('name', String, nullable=False),
    )

Add the following pipeline:

from sqlalchemy import create_engine

class CommitSqlPipeline(object):

    def __init__(self):
        self.engine = create_engine("sqlite:///")

    def process_item(self, item, spider):
        item.commit_item(engine=self.engine)
        return item

Don't forget to add the pipeline to settings file and create the database tables if they do not exist.
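Those two remaining steps might look like the following sketch, assuming the classes above live in a project named `myproject` (a placeholder):

```python
from sqlalchemy import MetaData, Table, Column, Integer, String, create_engine

# In settings.py: activate the pipeline (the number sets its run order).
ITEM_PIPELINES = {
    "myproject.pipelines.CommitSqlPipeline": 300,
}

# Before the first crawl: create any tables that do not exist yet.
metadata = MetaData()
mytable = Table(
    "mytable", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String, nullable=False),
)
engine = create_engine("sqlite:///")  # swap in your real database URL
metadata.create_all(engine)  # emits CREATE TABLE only for missing tables
```

`metadata.create_all` is idempotent, so it is safe to run on every startup.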

http://doc.scrapy.org/en/1.0/topics/item-pipeline.html#activating-an-item-pipeline-component

http://docs.sqlalchemy.org/en/rel_1_1/core/tutorial.html#define-and-create-tables

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow