Probstat/notes/balls and bins

จาก Theory Wiki
ไปยังการนำทาง ไปยังการค้นหา
This is part of probstat. These notes are meant to be used in complement to the video lectures. They only contain summary of the materials discussed in the video. Don't use them to avoid watching the clips please.
The materials on this part is from this course at Berkeley.

We consider a balls-and-bins experiment where we throw n balls independently into n bins uniformly at random.

Question 1: How many possible outcomes are there?

Since each ball has n choices and their choices are independent, there are outcomes.

Question 2: What is the probability that bin 1 is empty?

In this case, each ball only have n - 1 choices (because they have to avoid bin 1); therefore there are outcomes where bin 1 is empty. Since each outcome is equally likely, the probability that bin 1 is empty is .

Number of empty bins

Let random variable X be the number of empty bins.

Question 1: What is P{X 0}?

If there is no empty bin, every ball must land into different bin. In this case, the first ball has n choices, the second ball has n-1 choices, and in general the i-th ball has n - i +1 choices. Therefore, there are outcomes. The probability is .

It will be very hard to compute E[X] directly. (Just computing P{ X = 3 } seems to be exceedingly hard.)

The Linearity of Expectation

One of the reason why expectations are rather easy to work with is that they have a certain linear property stated as follows.

Linearity of Expectation.

For random variables X and Y, we have that

This is also true for more than one random variable. Let be random variables. Thus,

This property is always true even when they are dependent.

This also leads to various nice properties of expectation. A very useful fact is this: for constants a and c, .

Using the linearity of expectation

In order to deal with complicated random variables such as X above, we can break it down into "simpler" random variables.

For bin i, we define a random variable Xi to be 1 if bin i is empty and 0 otherwise. Note that with this definition, we have that

.

(You should think for a few minutes so that you understand this.)

This implies that , because the expectations of two equal things must be equal.

Note that we use each random variable Xi to indicate if each bin is empty. This type of random variables is generally referred to as indicator random variables.

Since we know that X is the sum of other random variables, the linearity of expectation tells us that

.

Therefore, if we can find then we are done. So let's consider indicator random variable Xi. From the definition, we have that

,

which clearly is

So, when does Xi is 1? This, by the definition of Xi, is equivalent to the event that bin i is empty that we have already calculated in Question 2. Thus,

.

This implies that

Again, the first step is true because and the second step is true because of the linearity of expectation.

Note that as n gets close to infinity, is close to . Thus, , or there are about 36.7% empty bins.

Remark: For an indicator random variable A, .

Fullest bins

In this section, we will try to give a guarantee on the number of balls in the fullest bin.

Alternative analysis