Nursery School Applicants

What's Naive Bayes?

A classification tool
Fast and easy to interpret
Good for large data sets
It's "naive" because it assumes that all predictors (attributes) are independent of one another within each class
Based on Bayes' rule for the posterior probability:

P(c|x) = P(x|c) · P(c) / P(x)

Where:

P(c|x) is the posterior probability of the class (c, the target variable) given the predictor (x, the attributes).
P(x|c) is the likelihood, i.e. the probability of the predictor given the class.
P(c) is the prior probability of the class, i.e. the probability of the class before seeing the data.
P(x) is the prior probability of the predictor.
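To make the formula concrete, here is a minimal Python sketch of Bayes' rule combined with the naive independence assumption. The classes "A" and "B", the attribute values "x1" and "x2", and all probabilities are invented for illustration and do not come from the nursery data. P(x|c) is taken as the product of the per-attribute likelihoods, and P(x) is obtained by summing P(x|c) · P(c) over all classes.

```python
# Illustrative numbers only: two hypothetical classes "A" and "B" and two
# hypothetical attribute values x1, x2. None of these come from the nursery data.
priors = {"A": 0.6, "B": 0.4}                 # P(c)
likelihoods = {                               # P(x_i | c) for each observed value
    "A": {"x1": 0.5, "x2": 0.6},
    "B": {"x1": 0.1, "x2": 0.4},
}

def posteriors(priors, likelihoods):
    """Return P(c|x) for every class c via Bayes' rule.

    Under the naive assumption, P(x|c) is the product of the per-attribute
    likelihoods P(x_i|c). P(x) is the sum of P(x|c) * P(c) over all classes.
    """
    joint = {}
    for c, prior in priors.items():
        likelihood = 1.0
        for p in likelihoods[c].values():
            likelihood *= p                   # P(x|c) = prod_i P(x_i|c)
        joint[c] = likelihood * prior         # P(x|c) * P(c)
    evidence = sum(joint.values())            # P(x)
    return {c: j / evidence for c, j in joint.items()}

post = posteriors(priors, likelihoods)
print(post)  # A ≈ 0.918, B ≈ 0.082
```

Here the joint terms are 0.6 · 0.5 · 0.6 = 0.18 for A and 0.4 · 0.1 · 0.4 = 0.016 for B, so P(x) = 0.196 and the classifier would pick A.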

What data are we going to use?

The nursery data set was developed to rank student applications for, you guessed it, nursery schools. It was used for several years in the 1980s, when enrollment at these schools in Ljubljana, Slovenia, was excessive and rejected applications frequently needed an objective explanation. The final decision depended on three subproblems: the occupation of the parents and the child's nursery, the family structure and financial standing, and the social and health picture of the family.

Specifically, it contains 12,960 profiles, each described by the following attributes:

Parents: usual, pretentious, great_pret
Has_nurs: proper, less_proper, improper, critical, very_crit
Form: complete, completed, incomplete, foster
Children: 1, 2, 3, more
Housing: convenient, less_conv, critical
Finance: convenient, inconv
Social: nonprob, slightly_prob, problematic
Health: recommended, priority, not_recom
Rank (target): not_recom, priority, recommend, spec_prior, very_recom
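The profile count can be checked against the attribute levels: 3 × 5 × 4 × 4 × 3 × 2 × 3 × 3 = 12,960, which suggests each profile is one distinct combination of the eight input attributes. The short Python sketch below enumerates those combinations; the dictionary name `ATTRIBUTES` is hypothetical, and the value spellings follow the data set's coded values as listed above.

```python
from itertools import product
from math import prod

# Input attributes and their possible values (the Rank target is excluded).
ATTRIBUTES = {
    "parents": ["usual", "pretentious", "great_pret"],
    "has_nurs": ["proper", "less_proper", "improper", "critical", "very_crit"],
    "form": ["complete", "completed", "incomplete", "foster"],
    "children": ["1", "2", "3", "more"],
    "housing": ["convenient", "less_conv", "critical"],
    "finance": ["convenient", "inconv"],
    "social": ["nonprob", "slightly_prob", "problematic"],
    "health": ["recommended", "priority", "not_recom"],
}

# Number of possible profiles = product of the number of levels per attribute.
total = prod(len(values) for values in ATTRIBUTES.values())

# Enumerate every combination explicitly (one tuple per possible applicant profile).
profiles = list(product(*ATTRIBUTES.values()))

print(total, len(profiles))  # 12960 12960
```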

Here's the top part of the table:

Goal

Build a model so we can classify the ranks of two new hypothetical applicants.
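The goal can be illustrated with a minimal from-scratch categorical Naive Bayes in Python. This is a sketch, not necessarily how the original analysis was done: the functions `fit` and `predict`, the Laplace smoothing parameter `alpha`, the six training rows (using only the parents, finance and health attributes), and both applicants are hypothetical, made up purely to show the mechanics of counting priors and likelihoods and picking the class with the highest posterior.

```python
import math
from collections import Counter, defaultdict

def fit(rows, labels, alpha=1.0):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    levels = defaultdict(set)             # attribute index -> values seen in training
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, c)][v] += 1
            levels[i].add(v)
    return class_counts, value_counts, levels, alpha

def predict(model, row):
    """Return the class maximizing log P(c) + sum_i log P(x_i | c)."""
    class_counts, value_counts, levels, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log prior
        for i, v in enumerate(row):
            k = len(levels[i])
            # Laplace-smoothed likelihood P(x_i = v | c)
            score += math.log((value_counts[(i, c)][v] + alpha) / (nc + alpha * k))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical toy training data: (parents, finance, health) -> rank.
train_rows = [
    ("usual", "convenient", "recommended"),
    ("usual", "convenient", "priority"),
    ("pretentious", "inconv", "priority"),
    ("great_pret", "inconv", "not_recom"),
    ("usual", "inconv", "not_recom"),
    ("great_pret", "convenient", "not_recom"),
]
train_labels = ["priority", "priority", "priority", "not_recom", "not_recom", "not_recom"]

model = fit(train_rows, train_labels)
applicant1 = ("usual", "convenient", "priority")
applicant2 = ("great_pret", "inconv", "not_recom")
print(predict(model, applicant1))  # priority
print(predict(model, applicant2))  # not_recom
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and with `alpha=1` a value never seen with a class still gets a small non-zero likelihood instead of zeroing out the whole product.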

Short Answer:

With an accuracy of 90%, our model predicts that the first applicant should be ranked "priority" and the second "not recommended" (not_recom).

Juan Antonio Pajarillo's Data Analytics Projects
