“Gini index measures the extent to which the distribution of income or consumption expenditure among individuals or households within an economy deviates from a perfectly equal distribution. Thus a Gini index of 0 represents perfect equality, while an index of 100 implies perfect inequality.”
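As a rough illustration of the definition quoted above, here is a minimal Python sketch that computes a Gini index on the 0 to 100 scale from a list of incomes. The function name `gini_index` and the sample figures are hypothetical, and the rank-based formula is one common sample estimator, not necessarily the method used in the original post.

```python
def gini_index(values):
    """Return a Gini index on a 0-100 scale for non-negative values.

    Hypothetical helper for illustration; uses the rank-based sample formula
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with x sorted ascending.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive total")
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    g = (2 * weighted) / (n * total) - (n + 1) / n
    return 100 * g


# Made-up incomes: everyone earns the same vs. one person earns everything.
equal = gini_index([10, 10, 10, 10])
unequal = gini_index([0, 0, 0, 100])
print(equal, unequal)
```

With this estimator a finite sample of n values tops out at 100 × (n − 1) / n, so the "one person has everything" case only approaches 100 as the population grows.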
In clustering, a researcher or analyst faces two major questions. First, does the given dataset have any clustering tendency? Second, how do you determine the optimal number of clusters in a dataset and validate the clustered results? In this post, I attempt to answer both using R.
An attempt to “cluster” disparate data mining packages under one ‘roof’
A hands-on approach to creating a dissimilarity matrix in R and using it for clustering
A working implementation of the hierarchical clustering method in R: A case study
Handling missing data is an important step in data pre-processing. Real-world datasets are replete with missing values in variables. How are you going to solve this “missing” mystery? Read on to know the answer.
Originally posted on The Data Science Lab:
A couple of weeks ago, here at The Data Science Lab we showed how Lloyd’s algorithm can be used to cluster points using k-means with a simple python implementation. We also produced interesting visualizations of the Voronoi tessellation induced by the clustering. At the end of the post we hinted at some of the shortcomings of this clustering procedure. The basic k-means is an extremely simple and efficient algorithm. However, it assumes prior knowledge of the data in order to choose the appropriate K. Other disadvantages are the sensitivity of the final clusters to the…