It occurred to me that people here might be able to help.

Suppose you randomly distribute N thinks into M buckets, what is the probability that any of the buckets contain more than K items?

The kind of numbers we are interested in is N ~10000000, M ~4000, K ~32000.

Or, to put it another way, for a given M and K, how big can N be while the probability above remains vanishingly small?

This relates to a design decision in the Moodle 2.0 files API. See MDL-23885 for more details.

### Probability question

Number of replies: 7### Re: Probability question

I think you mean "what is the probability of having one or more buckets with more than K items". If this is the case, the probability is given by B(N-K+M-2 , M-1) / B(N+M-1 , M-1) where B(a,b) stands for the binomial coefficient. With the numbers you provide I get 2.71925*10^-6. The formula above can be approximated by (1-(K-1)/(N+M-10)^(M-1). IMHO, a good rule of thumb would be to choose N > K*M/(-log E) - M, where K and M are given and E is the upper bound of the probability you want to control.

### Re: Probability question

You correctly interpreted what I meant. I think what I wrote means exactly the same as what you wrote.

I am failing to follow the derivation of your formula. Can you give me some clues. It is about 10 years since I got my last maths degree, so I am pretty rusty, but I feel I ought to be able to understand this with some help.

I am failing to follow the derivation of your formula. Can you give me some clues. It is about 10 years since I got my last maths degree, so I am pretty rusty, but I feel I ought to be able to understand this with some help.

### Re: Probability question

Okay, we've got N objects and M buckets. Any distribution of the objects in the buckets may be visualized as follows: ooomoomoomoommmoo... where there are N 'o' letters and M-1 'm' letters. The 'm' letters separate M regions, which contain as many 'o's as objects in the bucket. Two 'm' together with no 'o' in between mean the empty bucket. So we have N+M-1 places to fill with 'o' and 'm', which, by the very definition of binomial coefficient, can be done in B(N+M-1,M-1) ways. This number quantifies all the possible arrangements of the objects in the buckets. If we are interested in finding how many of these arrangements contain at least a bucket with K objects or more, we can think this way. Let's put together K+1 objects ooo...ooo ( K+1 'o' in total) and then add in all possible ways M-1 'm' and N-K-1 'o'. This would result in an arrangement where we surely find a bucket with K+1 objects. By the same formula, there are B(N-K-M-1,M-1) posibilities. Dividing de latter by the former we get the probability in question.

The approximation I gave follows in a standard way with some manipulation, but it's in no manner necessary, some trial and error could be enough with any good mathematical software ( I don't think a spreadsheet would do).

The approximation I gave follows in a standard way with some manipulation, but it's in no manner necessary, some trial and error could be enough with any good mathematical software ( I don't think a spreadsheet would do).

### Re: Probability question

Je ne compris pas!!!!

### Re: Probability question

Je ne COMPRENDS pas, Colin

### Re: Probability question

Je comprends le français mais je ne comprends pas les maths !

### Re: Probability question

Je ne comprend Francais also!!!!!!!!

Live too far away from anywhere to understand anything.... nearest neighbour is 3,000ks in any direction.

Live too far away from anywhere to understand anything.... nearest neighbour is 3,000ks in any direction.