Tamsen Haught - May 16, 2016

A Beginner's Guide to Analysis with R Part II

This is a continuation of our previous blog post on a Guide to Analysis. The previous post covered defining your S.M.A.R.T. goal. In this post we will discuss preparing and checking your data for analysis.

Our S.M.A.R.T. Goal: Determine what program changes will increase next year’s membership retention for first-year members by 10% compared to the previous two years.

  • Dependent Variable
    • Renewal (This would be a yes/no answer; did they renew or did they not?)
  • Independent Variables
    • Chapter Participation
    • University
    • Committee Participation
    • Gender
    • Age
    • Organization Type
    • Organization Size
    • Location
    • Number of Events Attended
    • Types of Events Attended

Our dependent variable “Renewal” will need to be transformed into a binary variable (0 or 1). Our independent variables are a mix of qualitative and quantitative variables. For the purposes of our analysis, we are going to transform our qualitative variables (Organization Type, Gender, etc.) into quantitative variables. For example, gender would normally have the options Female, Male, and Other. For your data model you will instead need a separate field for each option, holding a 1 or a 0. The new variables will be Gender – Male, Gender – Female, and Gender – Other.
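As a rough sketch of the transformation described above, the base R code below recodes a yes/no renewal field as 0/1 and creates one 0/1 field per gender option. The data frame `members`, its column names, and its values are made up for illustration; your own file will look different.

```r
# Hypothetical example data; the column names and values are assumptions.
members <- data.frame(
  renewal = c("Yes", "No", "Yes", "No"),
  gender  = c("Female", "Male", "Other", "Female"),
  stringsAsFactors = FALSE
)

# Dependent variable: recode the yes/no answer as 1/0.
members$renewal_binary <- ifelse(members$renewal == "Yes", 1, 0)

# Qualitative variable: one 0/1 field for each option.
members$gender_female <- as.integer(members$gender == "Female")
members$gender_male   <- as.integer(members$gender == "Male")
members$gender_other  <- as.integer(members$gender == "Other")

print(members)
```

Each row should end up with exactly one of the three gender fields set to 1. The same pattern applies to the other qualitative variables, such as Organization Type.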

After initial data preparation, you will need to run descriptive statistics on all variables to better understand your data and identify places where you may have missing or poor data. Doing this in RStudio lets you catch data issues before you begin building your model.

RStudio

  1. Pull in the data file to analyze. If you are using Excel, it is easiest to save the file as CSV or text. Please note that R looks for files in its working directory by default (you can check it with getwd()). On my computer it is the My Documents folder. If the file is saved there, I do not need to include the location. The friendly file name is the name you give the data in R; it cannot contain spaces.
    • CSV code: friendly_file_name <- read.csv("location/file name.csv", header = TRUE)
    • TXT code: friendly_file_name <- read.delim("location/file name.txt")
  2. Make sure you have the appropriate packages installed for descriptive statistics. Packages are collections of R functions, data, and compiled code. For the purposes of our analysis, we will be using the Hmisc and psych packages. You can install these by clicking Install on the Packages tab.
  3. Now we are going to run descriptive statistics on the file. R is case-sensitive, so type the function and package names exactly as shown.
    • Load Hmisc: library(Hmisc)
    • Load psych: library(psych)
    • describe(friendly_file_name)
    • summary(friendly_file_name)
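Put together, the steps above might look like the sketch below. So that it runs on its own, it first writes a small made-up CSV to a temporary folder; the file name `members.csv`, the object name `member_data`, and the columns are assumptions, not part of the original data set. The Hmisc and psych calls only run if those packages are installed.

```r
# Write a tiny hypothetical CSV so the sketch is self-contained.
csv_path <- file.path(tempdir(), "members.csv")
write.csv(
  data.frame(age = c(34, 41, NA, 29), events_attended = c(2, 0, 5, 1)),
  csv_path,
  row.names = FALSE
)

# Step 1: pull in the data file.
member_data <- read.csv(csv_path, header = TRUE)

# Step 3: descriptive statistics with base R.
print(summary(member_data))

# Count missing values per column to spot gaps in the data.
print(colSums(is.na(member_data)))

# Both Hmisc and psych provide a describe() function, so the package
# prefix makes explicit which one is being called.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::describe(member_data))
}
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(member_data))
}
```

If you load both packages with library(), the one loaded last masks the other's describe(), so a plain describe() call will use that package's version. The `Hmisc::` and `psych::` prefixes avoid the ambiguity.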

 

Next time we will talk about what to look for in your descriptive statistic results and how to resolve any potential data issues.

Written by Tamsen Haught