Data extraction with Scrapy-II

In this post, I will discuss the subtler features of the Scrapy framework that are responsible for data extraction, including building a basic spider. But first, a few key points to remember. Preliminaries: From the documentation, “Scrapy spiders can return the extracted data as Python dicts”; therefore, to separate the key-value pair the…

Data extraction with Scrapy-I

Disclaimer: The objective of this post is purely educational. There are no monetary benefits associated with it. Introduction: In this digital age, we are surrounded by data, and the majority of it is in an unstructured format. The Oxford Dictionary defines unstructured as “Without formal organization or structure.” Websites are a rich source of this unstructured…

Scrapy on Windows – Setup

“Scrapy is an open source and collaborative framework for extracting the data you need from websites. In a fast, simple, yet extensible way.” This article will walk you through installing Scrapy on a Windows operating system. 1. Preliminaries: First, ensure the following dependencies exist on your machine. Step 1: Python version 2.7, as Scrapy only…

Learning from data science competitions – baby steps

Of late, a considerable number of winning machine learning enthusiasts have used XGBoost as their predictive analytics solution. This algorithm has taken precedence over traditional algorithms such as tree-based Random Forests and Neural Networks. The name XGBoost stands for eXtreme Gradient Boosting. The creators of this algorithm presented its implementation by winning the Kaggle Otto…

Data Transformations

A predictive model can fail for a number of reasons, such as inadequate data pre-processing, inadequate model validation, unjustified extrapolation, and over-fitting (Kuhn, 2013). Before we dive into data pre-processing, let me quickly define a few terms that I will use often. Predictor, independent variable, attribute, and descriptor are the different terms that are used as…

Data Splitting

A few common steps in building a data model are: pre-processing the predictor data (predictors being the independent variables); estimating the model parameters; selecting the predictors for the model; evaluating the model performance; and fine-tuning the class prediction rules. “One of the first decisions to make when modeling is to decide which samples will be used to…