In short, I'm not sure how Moodle performs its standard deviation calculation. I used to think it was just treating dashes as zeroes, but now I'm seeing really odd behavior, and I wonder if someone can help me track down *how* Moodle calculates stddev. Here's what I have:
I have a list of grades for an assignment:
93 91 91 90 88 87 81 81 78 76 75 75 73 73 69 68 67 67 65 58 45 22 2 -
Note there's a grade at the end in the form of a '-'.
Then I wrote a dead-simple Python script to calculate the standard deviation. You can cut-n-paste the script below. It should work on any computer that has Python 2 installed, regardless of platform (it uses print statements, so it won't run under Python 3 as is).
If you call that script and pass it all of the above grades, replacing the "-" with a zero, the output looks like this:
=========================================
strudel:~ jonesy$ ./stddev.py 93 91 91 90 88 87 81 81 78 76 75 75 73 73 69 68 67 67 65 58 45 22 2 0
[93, 91, 91, 90, 88, 87, 81, 81, 78, 76, 75, 75, 73, 73, 69, 68, 67, 67, 65, 58, 45, 22, 2, 0]
Number of grades: 24
Total sum of all grades: 1615
Mean average: 67
Subtracted mean from each observed value: [26, 24, 24, 23, 21, 20, 14, 14, 11, 9, 8, 8, 6, 6, 2, 1, 0, 0, -2, -9, -22, -45, -65, -67]
Squared each value in above list: [676, 576, 576, 529, 441, 400, 196, 196, 121, 81, 64, 64, 36, 36, 4, 1, 0, 0, 4, 81, 484, 2025, 4225, 4489]
Added all values in above list: 15305
Variance: 665
Standard Deviation: 25.7875939165
Most grades fall between 92.7875939165 and 41.2124060835
=========================================
Now, if you *exclude* the dash from the sample, your output will look like this:
=========================================
strudel:~ jonesy$ ./stddev.py 93 91 91 90 88 87 81 81 78 76 75 75 73 73 69 68 67 67 65 58 45 22 2
[93, 91, 91, 90, 88, 87, 81, 81, 78, 76, 75, 75, 73, 73, 69, 68, 67, 67, 65, 58, 45, 22, 2]
Number of grades: 23
Total sum of all grades: 1615
Mean average: 70
Subtracted mean from each observed value: [23, 21, 21, 20, 18, 17, 11, 11, 8, 6, 5, 5, 3, 3, -1, -2, -3, -3, -5, -12, -25, -48, -68]
Squared each value in above list: [529, 441, 441, 400, 324, 289, 121, 121, 64, 36, 25, 25, 9, 9, 1, 4, 9, 9, 25, 144, 625, 2304, 4624]
Added all values in above list: 10579
Variance: 480
Standard Deviation: 21.9089023002
Most grades fall between 91.9089023002 and 48.0910976998
=========================================
However, when I click "stats" in Moodle for the same set of grades, I get this:
Highest: | 93 |
Lowest: | - |
Average: | 67.29 |
Median: | 74 |
Mode: | 67, 73, 81, 75, 91 |
Standard Deviation: | 50.50 |
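As a quick check on how the dash is being counted, here is a minimal sketch (Python 3, standard library only) that computes the mean and median both ways. The names `graded` and `with_zero` are just for illustration, and treating the dash as 0 is an assumption being tested, not something Moodle documents here.

```python
from statistics import mean, median

# The 23 numeric grades from the list above.
graded = [93, 91, 91, 90, 88, 87, 81, 81, 78, 76, 75, 75,
          73, 73, 69, 68, 67, 67, 65, 58, 45, 22, 2]
# Assumption under test: the "-" is counted as a grade of 0.
with_zero = graded + [0]

for label, data in (("dash as 0", with_zero), ("dash dropped", graded)):
    print("%-13s n=%d mean=%.2f median=%s"
          % (label, len(data), mean(data), median(data)))
```

Counting the dash as 0 gives a mean of 67.29 and a median of 74, which matches Moodle's Average and Median. Dropping it gives a median of 75, which does not. That only says which reading of the dash fits those two numbers; it says nothing about the standard deviation.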
Can anyone explain why Moodle's standard deviation is 20 to 30 points higher than either of my own calculations?
#!/usr/bin/python
# Python 2 script: uses print statements, and "/" on ints is integer division.
import sys
import math

# Grades are read as integers from the command line.
grades = [int(x) for x in sys.argv[1:]]
print grades

def getstats(grades):
    ttl = sum(grades)
    numgrades = len(grades)
    # Integer division: the mean is truncated to a whole number.
    avg = ttl/numgrades
    avgsub = [val - avg for val in grades]
    square_avgsub = [pow(val,2) for val in avgsub]
    add_squares = sum(square_avgsub)
    # Sample variance (divides by n-1); also truncated by integer division.
    variance = add_squares / (numgrades-1)
    stddev = math.sqrt(variance)
    lowbound = avg - stddev
    hibound = avg + stddev
    print "Number of grades: %s" % numgrades
    print "Total sum of all grades: %s" % ttl
    print "Mean average: %s" % avg
    print "Subtracted mean from each observed value: %s" % avgsub
    print "Squared each value in above list: %s" % square_avgsub
    print "Added all values in above list: %s" % add_squares
    print "Variance: %s" % variance
    print "Standard Deviation: %s" % stddev
    print "Most grades fall between %s and %s" % (hibound, lowbound)

getstats(grades)
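
The script above rounds the mean and variance down to whole numbers, so as a cross-check here is a minimal sketch in Python 3 that uses float arithmetic via the standard `statistics` module. It computes both the sample (n-1) and population (n) standard deviation, with the dash as 0 and with the dash dropped. The helper name `spread` and the list names are made up for this example, and nothing here is taken from Moodle's code.

```python
from statistics import pstdev, stdev

# The 23 numeric grades from the list above.
graded = [93, 91, 91, 90, 88, 87, 81, 81, 78, 76, 75, 75,
          73, 73, 69, 68, 67, 67, 65, 58, 45, 22, 2]
# Assumption: the "-" is counted as a grade of 0.
with_zero = graded + [0]

def spread(data):
    """Return (sample stddev, population stddev) using float arithmetic."""
    return stdev(data), pstdev(data)

for label, data in (("dash as 0", with_zero), ("dash dropped", graded)):
    sample_sd, population_sd = spread(data)
    print("%-13s sample=%.2f population=%.2f"
          % (label, sample_sd, population_sd))
```

With the dash as 0 this gives roughly 25.79 (sample) and 25.25 (population); with the dash dropped, roughly 21.9 and 21.4. None of these is anywhere near 50.50. One thing that stands out is that 50.50 is almost exactly twice the population figure of 25.25, but that could be a coincidence and doesn't by itself show how Moodle arrives at its number.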