Nursery School Applicants

What's Naive Bayes?

  • A classification tool
  • Fast and easy to interpret 
  • Good for large data sets
  • It's "naive" because it assumes that all predictors (attributes) are conditionally independent of one another given the class
  • Based on Bayes' rule for the posterior probability:

    P(c|x) = P(x|c) · P(c) / P(x)

Where:
  • P(c|x) is the posterior probability of the class (c, the target variable) given the predictors (x, the attributes).
  • P(x|c) is the likelihood: the probability of the predictors given the class.
  • P(c) is the prior probability of the class, that is, its probability before seeing the data.
  • P(x) is the prior probability of the predictors, also called the evidence.
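
To make the rule concrete, here is a minimal sketch in base R that applies it by hand to a tiny made-up data set. The data frame `toy`, its values, and the helper `posterior_by_hand` are hypothetical illustrations, not part of the nursery data or of any package. Under the naive assumption, P(x|c) is the product of the per-attribute likelihoods, and P(x) is the sum of P(x|c) · P(c) over all classes.

```r
# Hypothetical toy data: two categorical predictors and a class label.
toy <- data.frame(
  finance = c("convenient", "convenient", "inconv", "convenient",
              "inconv", "inconv", "inconv", "convenient"),
  social  = c("nonprob", "nonprob", "nonprob", "problematic",
              "problematic", "nonprob", "problematic", "problematic"),
  rank    = c("priority", "priority", "priority", "priority",
              "spec_prior", "spec_prior", "spec_prior", "spec_prior"),
  stringsAsFactors = TRUE
)

# Posterior P(c | x) for one new observation, straight from Bayes' rule.
posterior_by_hand <- function(data, target, new_obs) {
  classes <- levels(data[[target]])
  # Prior P(c): share of each class in the training data.
  prior <- sapply(classes, function(cl) mean(data[[target]] == cl))
  # Numerator P(x | c) * P(c), with P(x | c) as a product over attributes.
  numerator <- sapply(classes, function(cl) {
    rows <- data[data[[target]] == cl, , drop = FALSE]
    lik <- 1
    for (col in names(new_obs)) {
      lik <- lik * mean(rows[[col]] == new_obs[[col]])
    }
    lik * prior[[cl]]
  })
  # Denominator P(x): sum of the numerators over all classes.
  numerator / sum(numerator)
}

new_applicant <- list(finance = "inconv", social = "problematic")
post <- posterior_by_hand(toy, "rank", new_applicant)
print(round(post, 3))          # priority 0.1, spec_prior 0.9
print(names(which.max(post)))  # "spec_prior"
```

The sketch uses raw frequencies, so a value never seen together with a class would give that class a likelihood of zero; practical implementations usually offer a smoothing option for that case.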


What data are we going to use?

The nursery data set was developed to rank applications for (you guessed it) nursery schools. It was used for several years in the 1980s, when there was excessive enrollment at these schools in Ljubljana, Slovenia, and rejected applications frequently needed an objective explanation. The final decision depended on three subproblems: the occupation of the parents and the child's nursery, family structure and financial standing, and the social and health picture of the family.

Specifically, it contains 12,960 profiles, each described by the following attributes and values:
  1. Parents: usual, pretentious, great_pret
  2. Has_nurs: proper, less_proper, improper, critical, very_crit
  3. Form: complete, completed, incomplete, foster
  4. Children: 1, 2, 3, more
  5. Housing: convenient, less_conv, critical
  6. Finance: convenient, inconv
  7. Social: nonprob, slightly_prob, problematic
  8. Health: recommended, priority, not_recom
  9. Rank (the target): not_recom, priority, recommend, spec_prior, very_recom
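
As a quick sanity check on the size of the data set, the sketch below multiplies the number of levels of the eight predictors. It assumes Children has four levels (1, 2, 3, more), as in the UCI copy of the data; the vector name `n_levels` is only illustrative.

```r
# Number of levels per predictor, taken from the list above.
# Assumption: children has four levels (1, 2, 3, more).
n_levels <- c(parents = 3, has_nurs = 5, form = 4, children = 4,
              housing = 3, finance = 2, social = 3, health = 3)

# Total number of distinct attribute combinations.
n_profiles <- prod(n_levels)
print(n_profiles)  # 12960
```

The product equals the 12,960 profiles, which is consistent with the data set containing every possible combination of attribute values once.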
Here's the top part of the table:


Goal

Build a model so we can classify the ranks of two new hypothetical applicants.


Short Answer:

Our model, which has 90% accuracy, ranks the first applicant as "priority" and the second as "not recommended".
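
An accuracy figure like the 90% quoted here is usually the share of held-out profiles whose predicted rank matches the actual rank. The sketch below shows that calculation in base R. The `actual` and `predicted` vectors are made-up placeholders, not the model's real output, and they deliberately give a different number.

```r
# Hypothetical held-out labels and predictions (placeholders, not real results).
ranks <- c("not_recom", "priority", "spec_prior")
actual <- factor(
  c("priority", "priority", "not_recom", "spec_prior", "not_recom",
    "priority", "spec_prior", "not_recom", "priority", "spec_prior"),
  levels = ranks
)
predicted <- factor(
  c("priority", "spec_prior", "not_recom", "spec_prior", "not_recom",
    "priority", "priority", "not_recom", "priority", "spec_prior"),
  levels = ranks
)

# Confusion matrix: rows are predictions, columns are actual ranks.
confusion <- table(predicted = predicted, actual = actual)
print(confusion)

# Accuracy: correct predictions (the diagonal) over all predictions.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)  # 0.8
```

Accuracy describes the model over the whole test set; it is not the probability that any single applicant's predicted rank is correct.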


R Code:

