Scrapy on Windows – Setup

“Scrapy is an open source and collaborative framework for extracting the data you need from websites. In a fast, simple, yet extensible way.”

This article walks you through installing Scrapy on a Windows operating system.

1. Preliminaries

First, ensure the following dependencies exist on your machine:

Step 1: Python version 2.7, as Scrapy on Windows only works with Python 2.7 at the time of writing. To check the Python version, issue the command

python --version

  • You need to add C:\Python27 and C:\Python27\Scripts to your Path environment variable.
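
If you would rather check Step 1 from Python itself, here is a minimal sketch. It assumes the default install folder C:\Python27 used in this article; the helper name missing_path_entries is my own and not part of any library.

```python
import os
import sys

# Directories this article asks you to add to Path (default Python 2.7 install).
REQUIRED_DIRS = [r"C:\Python27", r"C:\Python27\Scripts"]


def missing_path_entries(path_value, required=REQUIRED_DIRS, sep=";"):
    """Return the required directories that are absent from a Path string."""
    # Windows paths are case-insensitive and entries may end with a backslash,
    # so normalise both sides before comparing.
    present = {entry.strip().rstrip("\\").lower() for entry in path_value.split(sep)}
    return [d for d in required if d.rstrip("\\").lower() not in present]


if __name__ == "__main__":
    print("Python version: %d.%d.%d" % sys.version_info[:3])
    for directory in missing_path_entries(os.environ.get("PATH", "")):
        print("Not on Path: " + directory)
```

Any directory it reports as missing still needs to be added to the Path variable before the pip and scrapy commands will be found.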

Step 2: Install the Microsoft Visual C++ Compiler for Python 2.7 from here.

Step 3: Ensure that pip is installed. pip comes preinstalled with Python 2.7.9 and above. See this nice SO post on the subject.

Step 4: Download and install pyOpenSSL from here into your C:\Python27 folder.
Right-click the .whl file and select “Save link as” to download the file to your Python folder.
To install pyOpenSSL, open cmd and change to that folder using cd C:\Python27
Then run the following command: python -m pip install pyOpenSSL-16.0.0-py2.py3-none-any.whl

Once it is fully installed, you should see the following message at the end:

 Successfully installed pyOpenSSL-16.0.0 

Step 5: Install lxml. This step is essential, because Scrapy depends on lxml. Download lxml from here, choosing the latest version that supports Python 2.7. Save the file to your Python 2.7 folder, which is C:\Python27. To install lxml, open cmd and change to the Python folder as in the previous step. Then use the following command:
python -m pip install lxml-3.6.0-cp27-cp27m-win32.whl
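
Picking the right .whl file is easier once you know how to read its name. A wheel file name encodes the package name, the version, and three compatibility tags (Python, ABI and platform). The sketch below splits a file name into those parts; parse_wheel_filename and WheelTags are illustrative names of my own, not a pip API.

```python
from collections import namedtuple

WheelTags = namedtuple("WheelTags", "name version python abi platform")


def parse_wheel_filename(filename):
    """Split a wheel file name into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # The last three fields are always the Python tag, ABI tag and platform tag;
    # an optional build tag may sit between the version and those fields.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel file name: " + filename)
    return WheelTags(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


tags = parse_wheel_filename("lxml-3.6.0-cp27-cp27m-win32.whl")
print("%s %s" % (tags.python, tags.platform))  # prints: cp27 win32
```

Here cp27 means CPython 2.7 and win32 means 32-bit Windows, so this wheel only fits a 32-bit Python 2.7 install. The pyOpenSSL wheel from Step 4 is tagged py2.py3-none-any, which means it is pure Python and installs on any platform.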

Step 6: Install the version of pywin32 that matches your computer’s OS from here, or open the command prompt and execute the command pip install pywin32. There is a good answer on SO about this; see the answer by the user ‘Kanguros’.
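
Before moving on, it can help to confirm that the packages from Steps 3 to 6 can actually be imported. This is only a rough check, assuming the usual import names (pyOpenSSL is imported as OpenSSL and pywin32 provides win32api); find_missing is a helper name I made up for this sketch.

```python
import importlib

# Import names for the packages from Steps 3 to 6.
PREREQUISITES = ["pip", "OpenSSL", "lxml", "win32api"]


def find_missing(module_names):
    """Return the module names that cannot be imported in this interpreter."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(PREREQUISITES)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All prerequisites import cleanly.")
```

Run it with the same Python 2.7 interpreter you will use for Scrapy; a module listed as missing points back to the step that needs repeating.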

2. Scrapy installation on a Windows OS

Open the command prompt and issue the command

pip install Scrapy 

Once installed, you can check the version by issuing the command

 scrapy version 

If all went well, you should see something like the output shown in Fig 1.

Fig 1. Scrapy installation on Windows OS
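
The same check can be made from inside Python, which is handy later when working from an IDE. A small sketch, where installed_scrapy_version is my own helper name:

```python
def installed_scrapy_version():
    """Return Scrapy's version string, or None if Scrapy cannot be imported."""
    try:
        import scrapy
    except ImportError:
        return None
    return scrapy.__version__


if __name__ == "__main__":
    version = installed_scrapy_version()
    if version is None:
        print("Scrapy is not installed for this interpreter.")
    else:
        print("Scrapy " + version)
```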

One drawback of Scrapy for me was that I had to issue the scraping commands from the command prompt. I will now show you how to configure Scrapy to run from an IDE. I use PyCharm as my IDE for programming.

3. Scrapy configuration with PyCharm IDE

Create a Run/Debug configuration in PyCharm as illustrated in this SO answer by user “Pullie”.

Once you have set up the PyCharm Run/Debug configuration, there is no longer any need to go to the command prompt and issue the crawl command; you can run it from the IDE. See Fig 2 below for a successful spider crawl.

Fig 2. Scrapy configuration and execution in PyCharm
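
Another way to get the same result, and not necessarily the one described in the SO answer, is a small runner script that PyCharm can run or debug like any other Python file. The sketch below uses scrapy.cmdline.execute, the function behind the scrapy command itself. The file name run_spider.py and the helper names are my own choices, and the spider name is whatever your project defines.

```python
import sys


def build_crawl_argv(spider_name):
    """Build the argument list equivalent to typing: scrapy crawl <spider_name>."""
    return ["scrapy", "crawl", spider_name]


def run_spider(spider_name):
    """Run the named spider exactly as the command line would."""
    # Imported lazily so this file can be loaded even without Scrapy installed.
    from scrapy.cmdline import execute

    execute(build_crawl_argv(spider_name))


if __name__ == "__main__" and len(sys.argv) > 1:
    # The spider name is taken from the first script parameter.
    run_spider(sys.argv[1])
```

Save it as run_spider.py in the project folder that contains scrapy.cfg, set that folder as the working directory of the run configuration, and put the spider name in the script parameters field.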

In the next post, I will focus on building a data pipeline with Scrapy. If you have any comments or suggestions, please let me know.

Cheers.
