Lab 5 - Due by the end of Module 4

Why?:  In Lab 5 we use RStudio to work with grocery store transaction data and learn how to use Association Rules.  This is the first lab where I thought “YEAH, this is worth it!” as I went through it the first time.  The technique is very powerful and very useful, and it cannot be done in Excel or other basic statistics packages.  We are starting to see the power of R.

When you install “arules”, you may be asked to pick a CRAN mirror.  A CRAN mirror is simply a safe download location.  Just pick one geographically close to us (Michigan or Indiana, for example).
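If you would rather skip the mirror prompt, the install can be sketched as below. The repository URL is my assumption (the CRAN cloud address, which redirects to a nearby server); any mirror you pick from the prompt works just as well.

```r
# Install arules only if it is not already on this machine.
# ASSUMPTION: the repos URL below is the CRAN cloud redirector, used here
# instead of choosing a Michigan or Indiana mirror from the prompt.
if (!requireNamespace("arules", quietly = TRUE)) {
  install.packages("arules", repos = "https://cloud.r-project.org")
}

# Load the package for this session.
library(arules)
```

Warnings about the package being built under a different R version are common at this step and can usually be ignored.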

 

Complete Lab Exercise 5 in the EMC Lab Guide.  For credit, you only need to answer questions that appear below (at the bottom of this document).

Note: 

·         As always, you will need to set your Working Directory (using the “Session” menu option in RStudio) to a local directory on your computer: whichever directory you saved the Lab 5 “Get Stuff” files to.

·         The script file used is “mba.R”; click on that file in Blackboard, then copy/paste the text into the script area of RStudio if necessary (opening the file should produce the script).

·         You will NOT execute the “setwd("~/LAB05")” line in the script, as you will load files from Blackboard.  Instead, you will need to set the directory to wherever you are storing the files you retrieve from Blackboard.

·         Note in Step 2:  We are not installing “arulesViz” (it is not in the script).

·         Note in Step 3:  This file will only read in correctly if you have set your working directory appropriately.

·         Note in Step 4:  You may get an error on the line “txn@transactionInfo” as shown below.  No issues, simply proceed.

·         Note in Step 7:  If you want to look at this large Groceries data set in Excel, you can download the csv file from Blackboard.

·         Note in Step 9:  The Console in RStudio often will not display enough lines (it has a line limit for performance reasons).  In that case, it is often useful to send the output to a file with the sink() function.  For the inspect(subrules) call, I used:

sink("F:/output.txt")

inspect(subrules)

                That sends the output to the file “output.txt” on my flash (F:) drive.

                Afterward, you will need to call sink() with no arguments to direct output back to the console.
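Two of the notes above (checking the working directory and using sink()) can be sketched together as follows. This is only an illustration: the temporary output file and the mtcars stand-in are my assumptions, chosen so the sketch runs without the lab data. In the lab you would use your own path and inspect(subrules).

```r
# 1. Confirm the working directory before reading any lab files.
print(getwd())
# The lab files should appear in this listing if the directory is set correctly.
print(list.files(pattern = "\\.(csv|R)$"))

# 2. Divert console output to a file, then restore it.
# ASSUMPTION: a temporary file stands in for a path such as "F:/output.txt".
out_file <- file.path(tempdir(), "output.txt")

sink(out_file)          # start sending console output to the file
print(head(mtcars, 3))  # stand-in for inspect(subrules)
sink()                  # no arguments: output goes back to the console

# Read the file back to confirm the output was captured.
captured <- readLines(out_file)
print(length(captured))
```

If the console seems to stop printing anything, a sink() is probably still active; calling sink() again with no arguments turns it off.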

 

Installing package “arules” (possible warnings and errors):

 

Post all answers and screenshots to your class Google Sites page under “LAB05”.

 

Step 6(4):

What book purchase rules were output given the specified parameters (Support > 0.5, Confidence > 0.9)?
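As a reminder of what the two thresholds mean, here is a minimal base-R sketch that computes support and confidence by hand. The transactions and the support() helper are invented for illustration; they are not the lab's book data, and this is not how arules computes rules internally.

```r
# Toy transactions, invented for illustration (NOT the lab data).
txns <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("milk"),
  c("bread", "butter", "milk")
)

# Hypothetical helper: fraction of transactions containing every item in `items`.
support <- function(items, txns) {
  mean(vapply(txns, function(txn) all(items %in% txn), logical(1)))
}

# Candidate rule: {butter} => {bread}
lhs <- "butter"
rhs <- "bread"

supp_rule <- support(c(lhs, rhs), txns)      # share of transactions with both sides
conf_rule <- supp_rule / support(lhs, txns)  # of those with lhs, share that also have rhs

print(supp_rule)
print(conf_rule)

# The rule is kept only if it clears both thresholds.
print(supp_rule > 0.5 && conf_rule > 0.9)
```

In this toy set, butter and bread appear together in 3 of 5 transactions, and every transaction with butter also has bread, so the rule clears both thresholds.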

 

Step 8:

How many Grocery purchase rules were generated with support=0.001, confidence=0.5?

 

Step 9(3):

How many Grocery purchase rules were generated now, in the “subrules”, with confidence > 0.8?

 

Step 9(5):

List the Top 3 Grocery purchase rules as sorted by Lift.
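For orientation on Steps 9(3) and 9(5), the filter-then-sort idea can be sketched in base R on a small invented rule table. None of these numbers come from the Groceries data, and the column names are my own. In arules the corresponding calls are along the lines of sort(rules, by = "lift") followed by inspect(head(..., 3)); the lab guide's script remains the authority on the exact steps.

```r
# Invented toy rule table (NOT results from the Groceries data).
rules_df <- data.frame(
  rule        = c("{a} => {b}", "{c} => {d}", "{e} => {f}", "{g} => {h}", "{i} => {j}"),
  confidence  = c(0.95, 0.60, 0.85, 0.90, 0.82),
  rhs_support = c(0.50, 0.10, 0.20, 0.45, 0.05),
  stringsAsFactors = FALSE
)

# Lift = rule confidence divided by the baseline support of the right-hand side.
rules_df$lift <- rules_df$confidence / rules_df$rhs_support

# Keep only rules with confidence above 0.8, mirroring the "subrules" filter.
subrules_df <- rules_df[rules_df$confidence > 0.8, ]

# Sort by lift, highest first, and keep the top 3.
top3 <- head(subrules_df[order(subrules_df$lift, decreasing = TRUE), ], 3)

print(nrow(subrules_df))
print(top3[, c("rule", "confidence", "lift")])
```

A lift above 1 means the right-hand side shows up more often alongside the left-hand side than its overall frequency would suggest, which is why sorting by lift surfaces the most interesting rules.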