Sold! How do home features add up to its price tag?

I begin with a new project. It is from the Kaggle playground wherein the objective is to build a regression model (as the response variable or the outcome or dependent variable is continuous in nature) from a given set of predictors or independent variables. My motivation to work on this project are the following; Help me…

Predict Blood Donation -warmup

Continuing from my previous post, in this post I will discuss on the inferential and predictive analysis. About the dataset and the problem to solve: a brief The dataset is derived from UCI Machine learning repository and the task is to predict if a donor has donated blood in March 2007 (1 stand for donating blood; 0…

Learning from data science competitions- baby steps

Off lately a considerable number of winner machine learning enthusiasts have used XGBoost as their predictive analytics solution. This algorithm has taken a preceedence over the traditional tree based algorithms like Random Forests and Neural Networks. The acronym Xgboost stands for eXtreme Gradient Boosting package. The creators of this algorithm presented its implementation by winning the Kaggle Otto…

Basic assumptions to be taken care of when building a predictive model

Before starting to build on a predictive model in R, the following assumptions should be taken care off; Assumption 1: The parameters of the linear regression model must be numeric and linear in nature.  If the parameters are non-numeric like categorical then use one-hot encoding (python) or dummy encoding (R) to convert them to numeric. Assumption…

Data Transformations

A number of reasons can be attributed to when a predictive model crumples such as: Inadequate data pre-processing Inadequate model validation Unjustified extrapolation Over-fitting (Kuhn, 2013) Before we dive into data preprocessing, let me quickly define a few terms that I will be commonly using. Predictor/Independent/Attributes/Descriptors – are the different terms that are used as…

Data Splitting

A few common steps in data model building are; Pre-processing the predictor data (predictor – independent variable’s) Estimating the model parameters Selecting the predictors for the model Evaluating the model performance Fine tuning the class prediction rules “One of the first decisions to make when modeling is to decide which samples will be used to…

To read multiple files from a directory and save to a data frame

There are various solution to this questions like these but I will attempt to answer the problems that I encountered with there working solution that either I found or created by my own. Question 1: My initial problem was how to read multiple .CSV files and store them into a single data frame. Solution: Use…

Gini index to compute inequality or impurity in the data

“Gini index measures the extent to which the distribution of income or consumption expenditure among individuals or households within an economy deviates from a perfectly equal distribution. Thus a Gini index of 0 represents perfect equality, while an index of 100 implies perfect inequality.

Assessing Clustering Tendency in R

In clustering one of major problem a researcher/analyst face are two question. First, does the given dataset has any clustering tendency?And second, how to determine an optimal number of clusters in a dataset validate the clustered results. In this post, I have attempted to answer this using R