## Friday, November 23, 2012

### The Birthday Problem

In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 367 (since there are 366 possible birthdays, including February 29). However, 99% probability is reached with just 57 people, and 50% probability with 23 people. These conclusions are based on the assumption that each day of the year (except February 29) is equally probable for a birthday.
The mathematics behind this problem led to a well-known cryptographic attack called the birthday attack, which uses this probabilistic model to reduce the complexity of cracking a hash function.

#### Calculating the Probability
The problem is to compute the approximate probability that in a room of n people, at least two have the same birthday. For simplicity, disregard variations in the distribution, such as leap years, twins, seasonal or weekday variations, and assume that the 365 possible birthdays are equally likely. Real-life birthday distributions are not uniform since not all dates are equally likely.
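
Before deriving the exact value, the problem as stated can be approximated by simulation. The following is a minimal sketch, assuming uniform birthdays over 365 days as described above. The function names, the trial count and the seed are illustrative choices, not part of the original problem statement.

```python
import random

DAYS = 365  # uniform birthdays, leap years ignored (the simplification stated above)


def has_shared_birthday(n, rng):
    """Draw n birthdays uniformly at random and report whether any two coincide."""
    seen = set()
    for _ in range(n):
        day = rng.randrange(DAYS)
        if day in seen:
            return True
        seen.add(day)
    return False


def estimate_shared_probability(n, trials=20_000, seed=0):
    """Estimate P(at least one shared birthday) among n people by simulation."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same estimate
    hits = sum(has_shared_birthday(n, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (10, 23, 57):
        print(n, estimate_shared_probability(n))
```

With this many trials the estimate for 23 people should land close to one half, although the exact digits depend on the seed and on the number of trials.
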
If P(A) is the probability of at least two people in the room having the same birthday, it may be simpler to calculate P(A'), the probability of there not being any two people having the same birthday. Then, because A and A' are the only two possibilities and are also mutually exclusive, P(A') = 1 − P(A).
In deference to widely published solutions concluding that 23 is the number of people necessary to have a P(A) that is greater than 50%, the following calculation of P(A) will use 23 people as an example.
When events are independent of each other, the probability of all of the events occurring is equal to a product of the probabilities of each of the events occurring. Therefore, if P(A') can be described as 23 independent events, P(A') could be calculated as P(1) × P(2) × P(3) × ... × P(23).
The 23 independent events correspond to the 23 people, and can be defined in order. Each event can be defined as the corresponding person not sharing his/her birthday with any of the previously analyzed people. For Event 1, there are no previously analyzed people. Therefore, the probability, P(1), that person number 1 does not share his/her birthday with previously analyzed people is 1, or 100%. Ignoring leap years for this analysis, the probability of 1 can also be written as 365/365, for reasons that will become clear below.
For Event 2, the only previously analyzed person is Person 1. Assuming that birthdays are equally likely to happen on each of the 365 days of the year, the probability, P(2), that Person 2 has a different birthday than Person 1 is 364/365. This is because, if Person 2 was born on any of the other 364 days of the year, Persons 1 and 2 will not share the same birthday.
Similarly, if Person 3 is born on any of the 363 days of the year other than the birthdays of Persons 1 and 2, Person 3 will not share a birthday with either of them. This makes the probability P(3) = 363/365.
This analysis continues until Person 23 is reached, whose probability of not sharing his/her birthday with people analyzed before, P(23), is 343/365.
P(A') is equal to the product of these individual probabilities:
(1) P(A') = 365/365 × 364/365 × 363/365 × 362/365 × ... × 343/365
The terms of equation (1) can be collected to arrive at:
(2) P(A') = (1/365)^23 × (365 × 364 × 363 × ... × 343)
Evaluating equation (2) gives P(A') = 0.492703
Therefore, P(A) = 1 − 0.492703 = 0.507297 (50.7297%)
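
The arithmetic in equations (1) and (2) can be checked with a short Python sketch. The names `DAYS`, `PEOPLE`, `p_all_different` and `p_shared` are illustrative and do not come from the derivation itself. `Fraction` is used here so that the product stays exact until the final conversion to a decimal.

```python
from fractions import Fraction

DAYS = 365    # equally likely birthdays, leap years ignored
PEOPLE = 23   # the group size used in the worked example

# Multiply 365/365 x 364/365 x ... x 343/365 exactly, as in equation (1).
p_all_different = Fraction(1)
for k in range(PEOPLE):
    p_all_different *= Fraction(DAYS - k, DAYS)

# The complement is the probability that at least two people share a birthday.
p_shared = 1 - p_all_different

print(f"P(A') = {float(p_all_different):.6f}")
print(f"P(A)  = {float(p_shared):.6f}")
```

Rounded to six decimal places, this should reproduce the values 0.492703 and 0.507297 quoted above.
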
This process can be generalized to a group of n people, where p(n) is the probability of at least two of the n people sharing a birthday. It is easier to first calculate the probability $\bar p(n)$ that all n birthdays are different. According to the pigeonhole principle, $\bar p(n)$ is zero when n > 365. When n ≤ 365:
\begin{align} \bar p(n) &= 1 \times \left(1-\frac{1}{365}\right) \times \left(1-\frac{2}{365}\right) \times \cdots \times \left(1-\frac{n-1}{365}\right) \\ &= { 365 \times 364 \times \cdots \times (365-n+1) \over 365^n } \\ &= { 365! \over 365^n (365-n)!} = \frac{n!\cdot{365 \choose n}}{365^n} = \frac{^{365}P_n}{365^n}\end{align}
where ' ! ' is the factorial operator, ${365 \choose n}$ is the binomial coefficient and ${^{k}P_r}$ denotes permutation.
The equation expresses the fact that for no persons to share a birthday, a second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as the first two (363/365), and in general the nth birthday cannot be the same as any of the n − 1 preceding birthdays.
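
The product, factorial, binomial and permutation forms given above are algebraically the same quantity. One way to convince oneself is to evaluate each with exact rational arithmetic and compare. This is a sketch with hypothetical function names; it assumes Python 3.8 or later for `math.comb` and `math.perm`, and n ≤ 365.

```python
import math
from fractions import Fraction

DAYS = 365


def product_form(n):
    """1 x (1 - 1/365) x (1 - 2/365) x ... x (1 - (n-1)/365)."""
    result = Fraction(1)
    for k in range(n):
        result *= 1 - Fraction(k, DAYS)
    return result


def factorial_form(n):
    """365! / (365^n (365 - n)!), valid for n <= 365."""
    return Fraction(math.factorial(DAYS), DAYS**n * math.factorial(DAYS - n))


def binomial_form(n):
    """n! * C(365, n) / 365^n."""
    return Fraction(math.factorial(n) * math.comb(DAYS, n), DAYS**n)


def permutation_form(n):
    """P(365, n) / 365^n, where P(k, r) counts ordered selections."""
    return Fraction(math.perm(DAYS, n), DAYS**n)


if __name__ == "__main__":
    n = 23
    forms = (product_form(n), factorial_form(n), binomial_form(n), permutation_form(n))
    print(all(f == forms[0] for f in forms))
```

Because `Fraction` reduces every result to lowest terms, equality here is exact rather than a floating-point comparison.
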

The event of at least two of the n persons having the same birthday is complementary to all n birthdays being different. Therefore, its probability is $p(n) = 1 - \bar p(n)$.
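
The complement relation translates directly into code. The sketch below assumes 365 equally likely birthdays. The helper `smallest_group` is a hypothetical addition that searches for the smallest group size reaching a given probability. `math.perm` requires Python 3.8 or later.

```python
import math

DAYS = 365


def p_all_different(n):
    """Probability that n people all have different birthdays."""
    if n > DAYS:
        return 0.0  # pigeonhole principle: more people than days
    # Exact integers in the numerator and denominator; one rounding at the division.
    return math.perm(DAYS, n) / DAYS**n


def p_shared(n):
    """Probability that at least two of n people share a birthday."""
    return 1.0 - p_all_different(n)


def smallest_group(threshold):
    """Smallest n for which p_shared(n) is at least the given threshold."""
    n = 1
    while p_shared(n) < threshold:
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_group(0.5))   # 23
    print(smallest_group(0.99))  # 57
```

The two printed group sizes correspond to the 50% and 99% thresholds mentioned in the introduction.
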

This probability surpasses 1/2 for n = 23 (with value about 50.7%). The following table shows the probability for some other values of n (this table ignores the existence of leap years, as described above):

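| n | p(n) |
| --- | --- |
| 10 | 11.7% |
| 20 | 41.1% |
| 23 | 50.7% |
| 30 | 70.6% |
| 50 | 97.0% |
| 57 | 99.0% |
| 100 | 99.99997% |
| 200 | 99.9999999999999999999999999998% |
| 300 | (100 − 6×10^−80)% |
| 350 | (100 − 3×10^−129)% |
| 365 | (100 − 1.45×10^−155)% |
| 366 | 100% |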
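
For large n the probability is so close to 100% that computing it directly in floating point would simply round to 1. A workaround, sketched below under the same 365-day assumption, is to compute the gap to 100% instead, which is the probability that all birthdays differ expressed as a percentage. The function name `percent_all_different` is illustrative.

```python
import math

DAYS = 365


def percent_all_different(n):
    """Return 100 * P(all n birthdays differ), the gap between p(n) and 100%."""
    # math.perm returns 0 when n > 365, so the gap is then exactly zero.
    # Integer arithmetic keeps numerator and denominator exact; only the
    # final division rounds to a float.
    return 100 * math.perm(DAYS, n) / DAYS**n


if __name__ == "__main__":
    for n in (10, 20, 23, 30, 50, 57, 100, 200, 300, 350, 365, 366):
        gap = percent_all_different(n)
        print(f"n = {n:3d}: p(n) = 100% - {gap:.3g}%")
```

Reporting the gap rather than p(n) itself is what allows entries such as n = 300 to be reproduced at all, since a double-precision float cannot distinguish values that close to 1.
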