Data Transformations

A predictive model can fail for a number of reasons, such as inadequate data pre-processing, inadequate model validation, unjustified extrapolation, and over-fitting (Kuhn, 2013). Before we dive into data preprocessing, let me quickly define a few terms that I will use frequently. Predictor, independent variable, attribute, and descriptor are the different terms that are used as…

Data Splitting

A few common steps in model building are: pre-processing the predictor data (a predictor is an independent variable); estimating the model parameters; selecting the predictors for the model; evaluating the model performance; and fine-tuning the class prediction rules. “One of the first decisions to make when modeling is to decide which samples will be used to…
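As a rough illustration of the sample-splitting decision mentioned above, here is a minimal Python sketch. It assumes a simple random hold-out split; the excerpt does not say which splitting scheme the post recommends. The function name `random_holdout_split`, the 25% test fraction, and the seed are illustrative choices, not taken from the post.

```python
import random


def random_holdout_split(samples, test_fraction=0.25, seed=0):
    """Randomly hold out a fraction of the samples for evaluation.

    Hypothetical helper for illustration only: a plain random split,
    with no stratification or resampling.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    # Shuffle the indices rather than the data so the input is left untouched.
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)

    # Keep at least one sample in the test set.
    n_test = max(1, round(len(samples) * test_fraction))
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])

    train = [samples[i] for i in train_idx]
    test = [samples[i] for i in test_idx]
    return train, test


# Toy data: 20 sample identifiers standing in for the rows of a data set.
samples = list(range(20))
train, test = random_holdout_split(samples, test_fraction=0.25, seed=42)
print(len(train), len(test))
```

The training portion would be used for pre-processing and parameter estimation, and the held-out portion only for evaluating performance. Fixing the seed makes the split reproducible.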

Scenarios when data preprocessing is imperative with examples in Dell Statistica 12 & RapidMiner Studio 6.5

Several data preprocessing steps are usually required before applying any machine learning algorithm to data. They are dictated by the nature of the available data and of the algorithms. Listed below are a few common instances where data preprocessing is required. Recall that in this context, attributes are variables (columns in the data spreadsheet) and each row in this column is a…

Assessing Clustering Tendency in R

In clustering, a researcher or analyst faces two major questions. First, does the given dataset have any clustering tendency? And second, how does one determine the optimal number of clusters in a dataset and validate the clustered results? In this post, I have attempted to answer these using R.