Probability question

Probability question

by Tim Hunt -
Number of replies: 7
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
It occurred to me that people here might be able to help.

Suppose you randomly distribute N thinks into M buckets, what is the probability that any of the buckets contain more than K items?

The kind of numbers we are interested in is N ~10000000, M ~4000, K ~32000.

Or, to put it another way, for a given M and K, how big can N be while the probability above remains vanishingly small?

This relates to a design decision in the Moodle 2.0 files API. See MDL-23885 for more details.
Average of ratings: -
In reply to Tim Hunt

Re: Probability question

by Samantha Steel -
I think you mean "what is the probability of having one or more buckets with more than K items". If this is the case, the probability is given by B(N-K+M-2 , M-1) / B(N+M-1 , M-1) where B(a,b) stands for the binomial coefficient. With the numbers you provide I get 2.71925*10^-6. The formula above can be approximated by (1-(K-1)/(N+M-10)^(M-1). IMHO, a good rule of thumb would be to choose N > K*M/(-log E) - M, where K and M are given and E is the upper bound of the probability you want to control.
In reply to Samantha Steel

Re: Probability question

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
You correctly interpreted what I meant. I think what I wrote means exactly the same as what you wrote.

I am failing to follow the derivation of your formula. Can you give me some clues. It is about 10 years since I got my last maths degree, so I am pretty rusty, but I feel I ought to be able to understand this with some help.
In reply to Tim Hunt

Re: Probability question

by Samantha Steel -
Okay, we've got N objects and M buckets. Any distribution of the objects in the buckets may be visualized as follows: ooomoomoomoommmoo... where there are N 'o' letters and M-1 'm' letters. The 'm' letters separate M regions, which contain as many 'o's as objects in the bucket. Two 'm' together with no 'o' in between mean the empty bucket. So we have N+M-1 places to fill with 'o' and 'm', which, by the very definition of binomial coefficient, can be done in B(N+M-1,M-1) ways. This number quantifies all the possible arrangements of the objects in the buckets. If we are interested in finding how many of these arrangements contain at least a bucket with K objects or more, we can think this way. Let's put together K+1 objects ooo...ooo ( K+1 'o' in total) and then add in all possible ways M-1 'm' and N-K-1 'o'. This would result in an arrangement where we surely find a bucket with K+1 objects. By the same formula, there are B(N-K-M-1,M-1) posibilities. Dividing de latter by the former we get the probability in question.
The approximation I gave follows in a standard way with some manipulation, but it's in no manner necessary, some trial and error could be enough with any good mathematical software ( I don't think a spreadsheet would do).
In reply to Samantha Steel

Re: Probability question

by Colin Fraser -
Picture of Documentation writers Picture of Testers
Je ne compris pas!!!!


In reply to Colin Fraser

Re: Probability question

by Mary Cooch -
Picture of Documentation writers Picture of Moodle HQ Picture of Particularly helpful Moodlers Picture of Testers Picture of Translators

Je ne COMPRENDS pas, Colin wink

In reply to Mary Cooch

Re: Probability question

by Glenys Hanson -
Je comprends le français mais je ne comprends pas les maths ! sad
In reply to Glenys Hanson

Re: Probability question

by Colin Fraser -
Picture of Documentation writers Picture of Testers
Je ne comprend Francais also!!!!!!!! big grinbig grinbig grinbig grin

Live too far away from anywhere to understand anything.... nearest neighbour is 3,000ks in any direction. tongueout