To penalise or not to penalise: The curious case of automatic feature selection

What is Lasso Regression? The LASSO (Least Absolute Shrinkage and Selection Operator)  is a shrinkage and selection method for linear regression. This method involves penalizing the absolute size of the regression coefficients. A good description for layman understanding is given on this SO post; to quote, ” By penalizing (or equivalently constraining the sum of the absolute…

A random forest approach to predicting breast cancer in working class women

What is a Random Forest? A random forest is an ensemble (group or combination) of tree’s that collectively vote for the most popular class (or feature) amongst them by cancelling out the noise. Ensemble learning– ensemble means group or combination. Ensemble learning in the context of machine learning is referred to methods that generate many classifiers…

Big or small-let’s save them all: How the data was collected

For this research study, I have chosen the “Gapminder Codebook”. It combines longitudinal survey data on respondents’ social, economic, psychological and physical well-being with contextual data on the family, neighbourhood, community, school, friendships, peer groups, and romantic relationships, providing unique opportunities to study how social environments and behaviours in adolescence are linked to health and achievement…