# Save water! Save life! Multivariate data analysis report

To find the variables of interest that were most strongly correlated with each other, I proceeded with regression analysis (see here for the previous article). But there were challenges. One of the fundamental problems was that the variables were categorical, and for regression analysis it is important that the variables…

# To penalise or not to penalise: The curious case of automatic feature selection

What is Lasso Regression? The LASSO (Least Absolute Shrinkage and Selection Operator) is a shrinkage and selection method for linear regression. It penalizes the absolute size of the regression coefficients. A good layperson's description is given in this SO post; to quote, “By penalizing (or equivalently constraining the sum of the absolute…
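To make the idea of penalizing absolute coefficient size concrete, here is a minimal sketch of lasso fitted by coordinate descent in plain Python. It is not the code used in the post. The function names, the toy data, and the objective scaling `(1/(2n)) * RSS + penalty * sum(|b|)` with no intercept are all assumptions chosen for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within [-penalty, penalty] become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1/(2n)) * sum((y - X b)^2) + penalty * sum(|b|), without an intercept."""
    n = len(X)
    p = len(X[0])
    coefs = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # feature j times the residual that leaves feature j out
            norm = 0.0  # sum of squares of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * coefs[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            # The absolute-value penalty is what can set a coefficient exactly to zero
            coefs[j] = soft_threshold(rho / n, penalty) / (norm / n)
    return coefs


# Hypothetical toy data: y depends only on the first feature (y = 2 * x1)
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.8], [4.0, -0.6], [5.0, 0.1]]
y = [2.0 * row[0] for row in X]

unpenalised = lasso_coordinate_descent(X, y, penalty=0.0)
sparse = lasso_coordinate_descent(X, y, penalty=0.5)
print("no penalty:  ", unpenalised)
print("penalty 0.5: ", sparse)
```

With a positive penalty, the coefficient of the irrelevant second feature is set exactly to zero while the first is shrunk slightly. This is the sense in which lasso performs both shrinkage and selection.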

# Big or small-let’s save them all: Logistic Regression analysis

For this study, I am using the Gapminder codebook and have chosen the following variables for data analysis. Explanatory (predictor, or independent) variables: income per person and alcohol consumption. Income per person is the 2010 Gross Domestic Product per capita in constant 2000 US\$, and alcohol consumption is the 2008 alcohol consumption per…

# Big or small-let’s save them all: Uncovering the factors responsible- Multiple Regression Analysis

Multiple regression analysis is a tool that allows you to expand on your research question and conduct a more rigorous test of the association between your explanatory and response variables by adding further quantitative and/or categorical explanatory variables to your linear regression model. I discuss this in detail in this post.
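As a rough illustration of adding a second explanatory variable to a linear model, the sketch below fits ordinary least squares with one quantitative and one 0/1 categorical predictor, using the normal equations in plain Python. It is not the post's code. The helper names and the toy data are hypothetical, and a real analysis would normally use a statistics library that also reports standard errors and p-values.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def fit_ols(rows, y):
    """Return least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    XtX = [[sum(x[a] * x[b] for x in X) for b in range(p)] for a in range(p)]
    Xty = [sum(x[a] * yi for x, yi in zip(X, y)) for a in range(p)]
    return solve_linear_system(XtX, Xty)


# Hypothetical toy data: (quantitative predictor, 0/1 categorical predictor)
rows = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0), (4.0, 1.0), (5.0, 0.0), (6.0, 1.0)]
y = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]  # generated without noise

coefficients = fit_ols(rows, y)
print("intercept, slope for x1, shift for category:", coefficients)
```

Each coefficient is read as the association between that predictor and the response while holding the other predictor fixed, which is what makes the test more rigorous than a single-variable regression.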