Lab 8 - Due by the end of Module 6

Why?  In Lab 8 we use RStudio to build a Naïve Bayes classifier.  In Part 1 we try to predict whether someone will enroll in a Big Data Analytics course, given their Age, Income, Job Satisfaction, and Desire to learn new things.  We are SKIPPING Part 2: starting on page 90 (Part 2), skip the ODBC material.

Click here for Lab 8 Part 1 R script.

Complete Lab Exercise 8 in the EMC Lab Guide.  For credit, you only need to answer the questions that appear below (at the bottom of this document).

Note: 

·         You will want to open the script file “NBcoderev.R” to assist you in the lab steps.

·         Note in Part 1, Step 3:  This package should install successfully even in your VM.

·         Note in Part 1, Step 7:  This is the easy way to do most of what you just did manually.

·         Note in Part 1, Step 9:  Since our training data included no one with Age=31-40 and Enrolls=No, the estimated conditional probability for that combination is 0.  Laplace smoothing changes that to a low probability (.01 here) so that the combination is not treated as impossible (prob=0), which is unrealistic.  You will see that the prediction does not change here.

·         SKIP PART 2!
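
To illustrate the Step 9 note on Laplace smoothing, here is a minimal base-R sketch.  The data frame `train` and the helper `cond_prob` are made up for this illustration and are NOT the lab's data set or the lab script's code, so the smoothed value will differ from the .01 you see in the lab.  The sketch only shows the idea: adding a pseudo-count to every cell turns a zero conditional probability into a small positive one.

```r
# Toy training data (made up for illustration; NOT the lab's data set).
train <- data.frame(
  Age     = c("<=30", "<=30", "31-40", "31-40", ">40", ">40", ">40", "<=30"),
  Enrolls = c("No",   "Yes",  "Yes",   "Yes",   "No",  "Yes", "No",  "No"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: conditional probabilities P(Age | Enrolls).
# laplace = 0 gives plain relative frequencies; laplace = k adds k
# pseudo-counts to every (Enrolls, Age) cell before normalising.
cond_prob <- function(data, laplace = 0) {
  counts   <- table(data$Enrolls, data$Age)  # rows: class, columns: Age level
  smoothed <- counts + laplace
  smoothed / rowSums(smoothed)               # each row sums to 1
}

raw      <- cond_prob(train, laplace = 0)
smoothed <- cond_prob(train, laplace = 1)

# No toy row has Age=31-40 with Enrolls=No, so the raw estimate is 0.
print(round(raw, 3))
# With one pseudo-count per cell the same entry becomes (0 + 1) / (4 + 3).
print(round(smoothed, 3))
```

If the package installed in Step 3 is e1071 (an assumption; check the script file), its `naiveBayes()` function takes a `laplace` argument that plays the same role as the `laplace` parameter in this sketch.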

 

Post all answers/screenshots to your class Google Sites page under “LAB08”.

 

Part 1, Step 6, (3):  What is the prediction for “Enrolls” for someone with Age<=30, Income=Medium, Jobsatisfaction=yes and Desire=Fair?

 

 

Part 1, Step 8, (2):  What is your prediction?