In grade 9, I was introduced to Bayes' Theorem through the classic example of medical tests. I was intrigued by how deceptive probability can be, but at the time I didn't fully appreciate the theorem's applications. While the medical-test example may be especially relevant with the ongoing pandemic, the theorem itself didn't amaze me that much. What is amazing, though, is how it can be used in the context of machine learning.
Note: If you are fully versed in Bayes' Theorem, you can skip this section and the subsequent example with cancer tests and dive right into the Naive Bayes section.
First, recall the definition of conditional probability: $$P(Y|X)=\frac{P(Y \cap X)}{P(X)}$$ $P(Y|X)$ means the probability of Y given X. Alternatively, you can think of it as the probability of output Y given the input X. Now, notice that we can simply swap the variables: $$P(X|Y)=\frac{P(X \cap Y)}{P(Y)}$$ But $P(Y \cap X)$ and $P(X \cap Y)$ are exactly the same probability - just the probability of both X and Y occurring. This means that we can take the second equation and rewrite it as $$P(X \cap Y)=P(X|Y)P(Y)$$ Now, we can substitute this into the first equation to get $$\bbox[yellow]{P(Y|X)=\frac{P(X|Y)P(Y)}{P(X)}}$$ For terminology purposes, $P(Y)$ is known as the Prior, $P(X|Y)$ is the Likelihood and $P(Y|X)$ is the Posterior.
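As a quick sanity check, the identity can be verified numerically for any pair of events. Here's a minimal sketch with made-up numbers (all three probabilities below are hypothetical):

```python
# Hypothetical probabilities for two events X and Y
p_x_and_y = 0.12   # P(X ∩ Y), the joint probability
p_x = 0.30         # P(X)
p_y = 0.40         # P(Y)

# Conditional probabilities straight from the definition
p_y_given_x = p_x_and_y / p_x
p_x_given_y = p_x_and_y / p_y

# Bayes' Theorem: P(Y|X) = P(X|Y) P(Y) / P(X)
bayes = p_x_given_y * p_y / p_x

print(p_y_given_x)  # → 0.4
print(bayes)        # → 0.4 (the two routes agree)
```

Both routes give the same answer, because the theorem is just the definition of conditional probability written twice and combined.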
Suppose there is a cancer that occurs in 1% of the population, or $P(C)=0.01$. Suppose also that there is a test for it which correctly returns positive 90% of the time when the cancer is present, and correctly returns negative 95% of the time when it is absent (so it gives a false positive 5% of the time).
In the jargon of medical diagnosis, the first probability (90%) is known as the sensitivity and the second (95%) is known as the specificity.
Now the question is: suppose you take the test, and the test returns positive. What is the probability of you having the cancer?
In other words, we are seeking $P(C|pos)$, the probability that we have cancer given the test returned positive. Let's translate the rest of the information into mathematical notation as well:
To calculate $P(pos)$, we apply the law of total probability, summing the joint probabilities over both cases: $$P(pos)=P(C, pos)+P(\neg C, pos)=(0.01)(0.9)+(0.99)(0.05)=0.0585$$ Therefore, $$P(C|pos)=\frac{P(pos|C)P(C)}{P(pos)}$$ $$=\frac{(0.9)(0.01)}{0.0585}=2/13\approx0.154$$
This means that even if the test returned positive, you only have a 15% chance of actually having cancer.
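The whole calculation above fits in a few lines of Python, using the numbers from the example:

```python
# Numbers from the example above
p_c = 0.01          # prior: P(C), prevalence of the cancer
p_pos_c = 0.90      # sensitivity: P(pos | C)
p_pos_not_c = 0.05  # false positive rate: P(pos | ¬C) = 1 - specificity

# Total probability of a positive test (law of total probability)
p_pos = p_c * p_pos_c + (1 - p_c) * p_pos_not_c

# Posterior via Bayes' Theorem
p_c_pos = p_pos_c * p_c / p_pos

print(round(p_c_pos, 3))  # → 0.154
```

Playing with the prior here is instructive: raising the prevalence to 10% pushes the posterior above 66%, which shows how much the answer depends on the base rate.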
Naive Bayes is a supervised classification algorithm that makes a naive assumption when applying Bayes' Theorem. But first, let's just apply Bayes' Theorem: $$P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$ where $x_1, x_2, \dots, x_n$ represent the features of the data.
Now, the naive assumption is that the features $x_i$ and $x_j$ are conditionally independent given $y$, for all $i \neq j$. In other words, it is assumed that, within each class, the features of the data are all independent of one another.
Mathematically, we can write $$\begin{aligned} P(x_1, x_2, \dots, x_n \mid y)&=P(x_1|y) \cdot P(x_2|y) \cdot P(x_3|y) \cdot \dots \cdot P(x_n|y) \\ &= \prod_{i=1}^{n}P(x_i \mid y) \end{aligned}$$ To gain an intuition for this assumption, I highly recommend you check out this article on Towards Data Science.
But since $P(x_1, x_2, \dots, x_n)$ is constant for a given input, we can introduce a proportionality: $$P(y \mid x_1, \dots, x_n) \propto P(y)\prod_{i=1}^{n}P(x_i \mid y)$$
Finally, to classify a sample, we pick the class $\hat{y}$ that maximizes this posterior - the maximum a posteriori (MAP) decision rule: $$\hat{y} = \arg\max_y \left[P(y) \prod_{i=1}^{n} P(x_i \mid y)\right]$$
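To make the argmax concrete, here is a minimal sketch with two classes and three binary features. All the class names and probability values are hypothetical, chosen only to illustrate the decision rule:

```python
# Hypothetical priors P(y) and per-feature likelihoods P(x_i = 1 | y)
priors = {"spam": 0.3, "ham": 0.7}
likelihoods = {
    "spam": [0.8, 0.6, 0.1],
    "ham":  [0.1, 0.4, 0.5],
}

def posterior_score(y, x):
    # Unnormalized posterior: P(y) * prod_i P(x_i | y).
    # For a feature that is 0, use the complement 1 - P(x_i = 1 | y).
    score = priors[y]
    for xi, p in zip(x, likelihoods[y]):
        score *= p if xi == 1 else (1 - p)
    return score

x = [1, 1, 0]  # observed features
y_hat = max(priors, key=lambda y: posterior_score(y, x))
print(y_hat)  # → spam
```

Note that the scores are never normalized by $P(x_1, \dots, x_n)$; since that denominator is the same for every class, it cannot change which class wins the argmax.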
There are three main types of naive Bayes depending on the distribution of $P(x_i \mid y)$: Gaussian, Multinomial and Bernoulli.
For example, this is the Gaussian distribution that I'm sure you have come across before - it's literally just the bell curve: $$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma^2_y}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma^2_y}\right)$$ Here, to find the parameters $\mu_y$ and $\sigma^2_y$ (the per-class mean and variance), we use a method called maximum likelihood estimation that I briefly mentioned in my other post, The International Baccalaureate Examinations: Item Response Theory (go check it out if you haven't already!)
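For a Gaussian, the maximum likelihood estimates turn out to be just the per-class sample mean and variance. A quick sketch (the function name and sample values are my own, for illustration):

```python
import numpy as np

def gaussian_likelihood(x, mu, var):
    # P(x_i | y) under a Gaussian with class mean mu and variance var
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Hypothetical 1-D feature values observed for one class
samples = np.array([4.9, 5.1, 5.0, 4.8, 5.2])

mu = samples.mean()   # MLE of the mean
var = samples.var()   # MLE of the variance (the 1/n, "biased" estimator)

print(gaussian_likelihood(5.0, mu, var))
```

A Gaussian naive Bayes classifier simply repeats this fit once per feature per class, then plugs the resulting likelihoods into the MAP rule.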
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the iris dataset and split it in half for training and testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Fit a Gaussian naive Bayes classifier and predict the held-out labels
classifier = GaussianNB()
classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)

print(prediction)
print("Number of mislabeled points out of a total %d points : %d"
      % (X_test.shape[0], (y_test != prediction).sum()))

accuracy = classifier.score(X_test, y_test)
print(accuracy)
```
```
[2 1 0 2 0 2 0 1 1 1 1 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0 1 1 1 2 0 2 0 0 1 2 2 1 2 1 2 1 1 2 1 1 2 1 2 1 0 2 1 1 1 1 2 0 0 2 1 0 0 1]
Number of mislabeled points out of a total 75 points : 4
0.9466666666666667
```
This is example code from the sklearn documentation, but it shows just how powerful the naive Bayes algorithm is despite its very simple nature - an accuracy of 94.7% is quite good!
Machine learning is an extremely fast-growing field that opens a whole new door of possibilities in countless domains. It is an amalgamation of various mathematical topics including linear algebra, calculus, statistics and probability. There's so much to cover that it's difficult to do an algorithm justice in a single post. I will probably be writing similar articles on other ML algorithms, so stay tuned!