Lab 7 - Due by the end of Module 5

Why?:  In Lab 7 we use RStudio to learn logistic regression, fitting logit models with the “glm” function.

 

Again, you will load packages in your own copy of RStudio.  You may be asked to pick a CRAN mirror, which is simply a safe download location.  Pick one geographically close to us (Michigan or Indiana, for example).
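If you would rather not pick a mirror from the pop-up each time, the mirror can also be set in code.  The lines below are only a minimal sketch and are not part of the lab script: the URL is the general-purpose "cloud" mirror (an assumption here; any mirror from the list works just as well), and install_if_missing is a hypothetical helper name made up for this example.

```r
# Set the CRAN mirror for this session so install.packages() does not prompt.
# Assumption: the general "cloud" mirror; substitute any mirror near you.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Hypothetical helper: install a package only if it is not already installed.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is available afterwards.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Example call. It downloads from the network, so it only runs interactively.
if (interactive()) {
  install_if_missing("bitops")
}
```

The setting lasts only for the current R session, so the pop-up will come back the next time you start RStudio unless you run the options line again.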

 

Complete Lab Exercise 7 in the EMC Lab Guide.  For credit, you only need to answer questions that appear below (at the bottom of this document).

Note: 

·         You will want to open the script file “logit.R” to assist you in the lab steps.  Click on that file in the “Get Stuff” (Blackboard) link of our course page, then copy/paste the text into the script area of RStudio.

·         You will NOT execute the “setwd("~/LAB07")” line in the script, as you will load files from our “Get Stuff” link.

·         Note in Step 4:  This file will only read in correctly if you have set your working directory appropriately.

·         Note in Step 4:  The “table” function gives you counts.  So “table(Mydata$MYDEPV)” tells you HOW MANY of the 750 records have the Dependent Variable (“MYDEPV”) equal to 0 and how many have it equal to 1.  “table(Mydata$Price,Mydata$MYDEPV)” gives you the same counts broken out by price, as a matrix.

·         Note also in Step 4:  Income is provided in thousands.

·         Note also in Step 4:  You should see that we have satisfied the “general rule”, since none of the correlations among the independent variables (Price, Income, Age) are even close to 0.85 (the highest is 0.096).

·         Note in Step 5:  As business people, we need the “log odds of Purchase” to make sense.  Go here for a nice log odds calculator.  We know from our output that every one-unit increase in Income raises the log odds of purchase by 0.12876.  In plain business English: for every $1,000 (one unit) more a customer has in Income, the probability of purchase rises by about 3.21 percentage points, measured from a 50/50 base case (the calculator converts log odds of 0.12876 to a probability of .5321, or 53.21%, versus 50%).  Equivalently, the odds of purchase are multiplied by about 1.14 (e^0.12876).  Here is another quick explanation of p, odds, and log odds.

·         Note in Step 7:  Once again here (like in Lab 6), you want the points on the Q-Q plot to fall roughly along a straight diagonal line, and we see that they do.

·         Note in Step 8:  This releveling simply changes the reference price point to 30 (from 10).

·         Note in Step 9:  You can actually install this “bitops” package here.  You will be asked for a CRAN (“Comprehensive R Archive Network”) mirror location, like in Lab 5.  Once again, just pick something geographically close to us (Michigan or Indiana, for example).
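The Step 5 arithmetic can also be checked directly in R instead of with a calculator.  This is a minimal sketch, assuming the Income coefficient of 0.12876 quoted in the note above and a 50/50 base case; the variable names are made up for illustration and do not come from “logit.R”.

```r
# Coefficient for Income quoted in the Step 5 note (log odds per $1,000).
b_income <- 0.12876

# Log odds -> odds ratio: each extra unit of Income multiplies the odds by this.
odds_ratio <- exp(b_income)

# Log odds -> probability, starting from a 50/50 base case (log odds of 0).
# plogis() is the logistic function, 1 / (1 + exp(-x)).
p_base <- plogis(0)             # exactly 0.5
p_new  <- plogis(0 + b_income)  # about 0.5321, matching the note

# Gain in percentage points over the 50/50 base case.
gain_pts <- 100 * (p_new - p_base)

print(round(c(odds_ratio = odds_ratio, p_new = p_new, gain_pts = gain_pts), 4))
```

Keep in mind that the percentage-point gain depends on the base case.  The same coefficient adds fewer points when the starting probability is already high (say 90%), which is why the 50/50 starting point needs to be stated.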

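To see what the “table” calls in Step 4 and the releveling in Step 8 do, it can help to try them on a tiny example first.  This is only a sketch on made-up toy data (six rows, not the lab's 750 records), and it assumes Price is handled as a factor, as the Price20 and Price30 coefficient names suggest.

```r
# Toy data, made up for illustration; the lab's real data has 750 records.
toy <- data.frame(
  MYDEPV = c(0, 1, 1, 0, 1, 0),
  Price  = c(10, 10, 20, 20, 30, 30)
)

# One-way count: how many rows have MYDEPV equal to 0 and to 1.
counts <- table(toy$MYDEPV)

# Two-way count: rows are price points, columns are MYDEPV values.
by_price <- table(toy$Price, toy$MYDEPV)

# Treat Price as a factor; the first level (10) is the default reference.
toy$Price <- factor(toy$Price)
ref_before <- levels(toy$Price)[1]

# Relevel so that 30 becomes the reference level instead.
toy$Price <- relevel(toy$Price, ref = "30")
ref_after <- levels(toy$Price)[1]

print(counts)
print(by_price)
print(c(before = ref_before, after = ref_after))
```

After releveling, a model fitted on this factor would report its price coefficients relative to 30 rather than 10; the data itself is unchanged.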
 

 

Post all answers/screen shots to your class Google Sites page under “LAB07”.

 

Step 5, (2):

Show a screen shot of the summary output of your logistic regression model.

 

 

Step 5, (second 2):

How much does one unit in Age increase the log odds of purchase?  What does this mean in plain business English?

 

 

Step 5, (second 2):

What is the general interpretation of the Price20 and Price30 coefficients?

 

 

Step 9, (8):

Show a screen shot of the ROC curve.  Give a business explanation of what this means.

 

 

Step 10, (2):

Show a screen shot of newdata1.  Give a business explanation of what this means.

 

Step 11, (2):

Show a screen shot of the plot of newdata2AgeP.  Give a business explanation of what this means.

 

 

Step 13, (2):

Show a screen shot of the 10 data samples in your random selection.  How many qualify for special offers?