# Basic assumptions to be taken care of when building a predictive model

Before you start building a predictive model in R, the following assumptions should be taken care of. Assumption 1: The parameters of the linear regression model must be numeric and linear in nature. If the parameters are non-numeric, such as categorical variables, use one-hot encoding (Python) or dummy encoding (R) to convert them to numeric values. Assumption…
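To make the encoding step concrete, here is a minimal, library-free Python sketch. The function name `one_hot_encode` and the sample `colors` column are hypothetical. With `drop_first=True` it keeps k-1 indicator columns, which corresponds to dummy (treatment) coding, where the dropped category acts as the baseline. In practice you would typically use `pandas.get_dummies` in Python, or let R's `lm()` expand factor variables automatically.

```python
def one_hot_encode(values, drop_first=False):
    """Return (column_names, rows) with one 0/1 indicator column per category.

    drop_first=True keeps k-1 columns (dummy/treatment coding); the
    alphabetically first category becomes the reference level.
    """
    categories = sorted(set(values))      # fixed, reproducible column order
    if drop_first:
        categories = categories[1:]       # first level is the baseline
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]   # hypothetical categorical column
cols, encoded = one_hot_encode(colors)
print(cols)       # ['blue', 'green', 'red']
print(encoded)    # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Sorting the categories gives a stable column order; a real pipeline should also store that order so that new data is encoded into exactly the same columns.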

# Scenarios when data preprocessing is imperative with examples in Dell Statistica 12 & RapidMiner Studio 6.5

Several data preprocessing steps are usually required before applying any machine learning algorithm to data. They are dictated by the nature of the available data and of the algorithms. Below are a few common instances where data preprocessing is required. Recall that, in this context, attributes are variables (columns in the data spreadsheet) and each row in this column is a…

# Gini index to compute inequality or impurity in the data

“Gini index measures the extent to which the distribution of income or consumption expenditure among individuals or households within an economy deviates from a perfectly equal distribution. Thus a Gini index of 0 represents perfect equality, while an index of 100 implies perfect inequality.”
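As a rough illustration of both senses in the heading, here is a small Python sketch. `gini_index` uses the mean-absolute-difference form, G = Σᵢ Σⱼ |xᵢ − xⱼ| / (2n²·mean), scaled by 100 to match the 0 to 100 range quoted above. `gini_impurity` computes 1 − Σ pₖ², the impurity measure commonly used for decision-tree splits. Both function names and the sample values are illustrative assumptions, not code from the post.

```python
from collections import Counter


def gini_index(incomes):
    """Gini index on a 0-100 scale via the mean absolute difference.

    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean), multiplied by 100.
    With n people the maximum is 100 * (n - 1) / n, approaching 100 as n grows.
    """
    n = len(incomes)
    mean = sum(incomes) / n
    if mean == 0:
        return 0.0  # everyone has nothing: treat as perfectly equal
    total_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return 100 * total_diff / (2 * n * n * mean)


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions in labels."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini_index([10, 10, 10, 10]))   # 0.0  (perfect equality)
print(gini_index([0, 0, 0, 100]))     # 75.0 (one person holds everything, n = 4)
print(gini_index([1, 2, 3, 4]))       # 25.0
print(gini_impurity(["yes", "yes", "no", "no"]))  # 0.5 (maximally mixed, 2 classes)
```

The double loop is O(n²), which is fine for illustration; for large data a sorted-array formula is the usual choice.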

# Assessing Clustering Tendency in R

In clustering, a researcher or analyst typically faces two major questions. First, does the given dataset have any clustering tendency? Second, how do we determine the optimal number of clusters in a dataset and validate the clustering results? In this post, I have attempted to answer these questions using R.
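One common way to answer the first question is the Hopkins statistic. The post works in R; purely as an illustration, here is a minimal Python sketch, and whether the post uses this exact statistic is an assumption. It computes H = Σu / (Σu + Σw), where w are nearest-neighbour distances from m sampled data points to the rest of the data and u are nearest-neighbour distances from m uniform random points (drawn inside the data's bounding box) to the data. H near 0.5 suggests uniformly random data; H near 1 suggests clustering tendency. Some references put w in the numerator instead, in which case values near 0 indicate clustering. The function names, default `m` and `seed` are illustrative choices.

```python
import math
import random


def _nearest_distance(point, data, exclude_index=None):
    """Euclidean distance from point to its nearest neighbour in data."""
    best = math.inf
    for idx, other in enumerate(data):
        if idx == exclude_index:
            continue  # skip the point itself when it comes from data
        best = min(best, math.dist(point, other))
    return best


def hopkins_statistic(data, m=None, seed=0):
    """Hopkins statistic H = sum(u) / (sum(u) + sum(w)); H near 1 means clustered."""
    rng = random.Random(seed)
    n = len(data)
    m = m or max(1, n // 10)  # sample size, usually much smaller than n
    dims = len(data[0])
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]

    # w: sampled real points to their nearest other real point
    sample_idx = rng.sample(range(n), m)
    w = [_nearest_distance(data[i], data, exclude_index=i) for i in sample_idx]

    # u: uniform random points in the bounding box to their nearest real point
    uniform_pts = [[rng.uniform(lows[d], highs[d]) for d in range(dims)]
                   for _ in range(m)]
    u = [_nearest_distance(p, data) for p in uniform_pts]
    return sum(u) / (sum(u) + sum(w))


gen = random.Random(42)
clustered = ([(gen.gauss(0, 0.1), gen.gauss(0, 0.1)) for _ in range(100)]
             + [(gen.gauss(5, 0.1), gen.gauss(5, 0.1)) for _ in range(100)])
uniform = [(gen.uniform(0, 5), gen.uniform(0, 5)) for _ in range(200)]
print(hopkins_statistic(clustered, m=20))  # close to 1
print(hopkins_statistic(uniform, m=20))    # around 0.5
```

The brute-force neighbour search keeps the sketch dependency-free; for larger datasets a k-d tree would be the usual replacement.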

# Data Clustering- Theoretical concepts

Theoretical concepts are essentially the building blocks of a bigger picture. In this post, I describe the fundamental building blocks of data clustering that any data scientist needs in order to begin a data mining process. The list is not complete, but it should get you started.

# Using SSIS to transfer data from multiple SQL tables by executing join query that writes result to CSV

This post shows how to use SSIS to transfer data from multiple tables dynamically at run time by executing a SQL join query and writing the results to a CSV file.
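SSIS packages are built in a visual designer, so there is no direct code equivalent. As a hedged analogue only, the following Python sketch reproduces the same data flow (run a join across two tables, then write the result with a header row to CSV) using the standard-library `sqlite3` module as a stand-in for SQL Server. The `customers` and `orders` tables, the query, and the output file name are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical in-memory tables standing in for the SQL Server sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

JOIN_QUERY = """
    SELECT c.name, o.id AS order_id, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
"""


def export_query_to_csv(connection, query, path):
    """Run query, write its column names and rows to path, return the row count."""
    cursor = connection.execute(query)
    header = [col[0] for col in cursor.description]  # column names from the result
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cursor:  # stream rows instead of loading them all at once
            writer.writerow(row)
            count += 1
    return count


out_path = os.path.join(tempfile.gettempdir(), "orders_export.csv")
print(export_query_to_csv(conn, JOIN_QUERY, out_path))  # 3
```

Because the query is just a string passed in at run time, the same function can export any join, which loosely mirrors the "dynamic at run time" goal of the SSIS package.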

# How to move data from Excel to SQL Server 2008/2012- Act 1 Scene 3

A hands-on activity with illustrated steps showing how to dynamically move data from Excel files to text files using SQL Server Data Tools (SSDT) and BIDS 2008.

# How to move data from Excel to SQL Server 2008/2012- Act 1 Scene 2

Today, I will discuss how to configure variables and the Excel connection manager in BIDS 2008. The same steps apply to BIDS 2012 and later. I spent over a week trying to find a method by which I could read multiple files into a database. Initially, I began by trying to read…

# Data Processing with Weka (Part II)

Today, I will elaborate on data processing in Weka 3.6 (the steps are the same in version 3.7). This post is the second part of the series “Data pre-processing with Weka”. If you have not read my earlier post, please read that first. Continuing from there, and assuming that you have cleaned…