I now have a 1.9.15 and a 2.2.1 on the same server, and from reading the code and seeing how version 2 has been restructured, I was expecting it to run faster than 1.9... but in fact it feels like it is running slower. I don't have any real clock numbers, but just judging from doing X in 1.9 and then doing the same X in 2.2, the 2.2 seems to be taking more than twice the time (X in this case being: sending me a quiz page to answer, or grading a quiz page). I was impressed by how 2.2 had reduced the size of the includes in the code - I was/am assuming this was done so as to only "compile" the functions that would be needed, vs including a monster library and only using a few of its functions. So, my question is: is there something at the core of 2.2 that is taking more cycles, and is it offsetting any gains from the smaller, more specific includes?
many thanks - greg
Many thanks for the reply. Is it your impression that these db references are reads, ie, the cost/time of them can be somewhat mitigated by memcaching, or are they writes that just have to happen, ie, 2.2 is simply a more sophisticated CMS and there is no free lunch?
They are reads. Under debugging options, turn on 'performance info' and you'll see the counts. (This appears to everyone, so be careful if you're on a live system. Also, don't test as admin; the numbers for admins are bigger than for normal students, and the latter are rather more important.)
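If it helps, I believe you can also get the extra detail from config.php rather than the admin UI - these defines are documented in config-dist.php (double-check the exact names for your release):

    // Hedged sketch: add near the top of config.php, before setup.php is included.
    // (Check config-dist.php for the exact define names in your Moodle version.)
    define('MDL_PERF', true);       // collect detailed performance data per page
    define('MDL_PERFDB', true);     // include database query counts/timings
    define('MDL_PERFTOFOOT', true); // print the results in the page footer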
A memcache option isn't built into Moodle 2, but yes, an approach like that could mitigate the problem. A longer-term solution would be a focus on query-count performance (analysing all the queries made for each page and how they can be reduced/combined) in a future Moodle 2.x version.
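To show the sort of mitigation I mean, here is a rough sketch using the stock PHP Memcached extension around an ordinary Moodle read - nothing like this is built in, and the key name, TTL and lookup are made up for illustration:

    // Plain PHP Memcached extension (assumed installed); $DB is the normal
    // Moodle 2.x database object available inside Moodle code.
    $cache = new Memcached();
    $cache->addServer('localhost', 11211);

    $key = 'course_format_3';                 // hypothetical cache key
    $format = $cache->get($key);
    if ($format === false) {                  // cache miss: fall back to the DB
        $format = $DB->get_field('course', 'format', array('id' => 3));
        $cache->set($key, $format, 300);      // remember the read for 5 minutes
    }

That only helps for reads against fairly static data, of course; writes still have to hit the database.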
Basically there are some fundamental reasons why it's slower (blocks on every page; filter settings; fancier theme system; etc) but I'm sure there are also cases where, e.g., it does something using 3 separate queries that would logically make more sense as a single query, or where it makes the same (or essentially similar) query more than once in a single page request.
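Purely as an illustration of that pattern (hypothetical page code, not anything real in the source):

    // Two separate round trips to the database...
    $quiz   = $DB->get_record('quiz', array('id' => $quizid));
    $course = $DB->get_record('course', array('id' => $quiz->course));

    // ...where one combined query would usually do:
    $row = $DB->get_record_sql("
        SELECT q.*, c.fullname AS coursename
          FROM {quiz} q
          JOIN {course} c ON c.id = q.course
         WHERE q.id = ?", array($quizid));

Multiply that by the blocks and filters on every page and the count adds up fast.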
Many thanks for the reply. And I'll take a look at the counts. ... But at many levels I'm not offended by the greater resource demands - clearly 2.x is a superior piece of code and a more function rich CMS - there are no free lunches. But on the other hand, being an almost half-century computer type, one always likes to understand what's happening under the covers. If most of the load is DB related and most of that is reads against static or semi-static parts of the db, then, yes, throw memory at it, turn on the caching, and see if the slowdown doesn't go away. [I know from personal experience with a small 1.9 server and enough memory (16GB): between APC and memcache, plus giving MySQL enough memory to drown in, 99.999% of the db activity was served from memory, and that allowed for 40ms typical response times 99.999% of the time.]
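By "giving MySQL enough memory to drown in" I mean my.cnf settings along these lines (the numbers are purely illustrative for a 16GB box, not tuned recommendations):

    [mysqld]
    # Illustrative sizes only - the point is to keep the working set in RAM.
    innodb_buffer_pool_size = 8G     # InnoDB data + index cache
    key_buffer_size         = 512M   # MyISAM index cache, if any MyISAM tables remain
    query_cache_size        = 64M    # cache repeated identical SELECTs (MySQL 5.x)
    tmp_table_size          = 256M   # let implicit temporary tables stay in memory
    max_heap_table_size     = 256M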
W/re careful analysis of the code etc. - in some ways I would rather not go down that path. In the past I worked on systems that were so tightly tuned (e.g. manned spaceflight) that (a) they were truly impressive in how optimized they were, but (b) at the same time were so fragile that one could change just a few lines of code and increase the CPU, disk or drum loads by 200%, making the system unmaintainable. Personally, given the almost free cost of hardware and cycles nowadays, I would be more inclined to (intelligently) throw hardware at the problem and benefit from the added function and at the same time a more forgiving nature of the code and system. Apollo ran on 4MHz, 4MB, $4M mainframes (8 of them), where the lifetime cost per line of code was approx $20 (8M lines of code, 1970 dollars), and the system was insanely fragile. Today I can throw 64GB at the problem for less than $600 [and more if needed - I'm not offended by 256GB servers] and move hopefully most of the activity to RAM, and what I can't I can page to RAID-10 SSDs... and have a malleable, flexible, affordable, high-performance system, with pennies per line of code costs.
Following up on my own post: then it sounds like the biggest mistake in trying to speed up a slow system would be to move the db onto a separate server - even gigabit links are orders of magnitude slower than intra-memory transfers. The correct approach would be to go with many processors, lots of memory and multiple gigabit net connections to the outside world, but all on a single box (with failover capability to others), ie, process a transaction in very sub-millisecond time and be done with it.
Interesting view. Some clarifications though:
> clearly 2.x is a superior piece of code and a more function rich CMS - there are no free lunches.
This has been the subject of a continuing debate since 2.0 was released. This one, "Upgrading from Moodle 1.9 to 2.0" http://moodle.org/mod/forum/discuss.php?d=194572, is only a couple of days old.
> In the past I worked on systems that were so tightly tuned (e.g. manned spaceflight) that (a) they were truly impressive in how optimized they were, but (b) at the same time were so fragile that one could change just a few lines of code and increase the CPU, disk or drum loads by 200%, making the system unmaintainable.
How does the Linux kernel or the Moodle core fare against those? Or are the times of "A Plea for Lean Software" http://www.duke.edu/~afd3/cps108/leansoftware.html (PDF: http://cr.yp.to/bib/1995/wirth.pdf) finally over?
> a more forgiving nature of the code and system
Could you please explain?
> [and more if needed - I'm not offended by 256GB servers] and move hopefully most of the activity to RAM, and what I can't I can page to RAID-10 SSDs... and have a malleable, flexible, affordable, high-performance system, with pennies per line of code costs.
Considering what ultimately arrives at the user's end, do you believe that such an armada is warranted?
> and multiple gigabit net connections to the outside world, but on a single box ...
Agree on multiple Gbit peering from the core switch upwards. But how about the link from the network card to the switch? Are you talking about the Linux bonding driver?
It looks like I stirred up a hornet's nest here...
W/re rich function vs not - I think there's very much a place for both (and let the marketplace sort it out). Clearly Google has done well with the simple, lean-and-mean approach... but clearly there are more feature-rich products which are also doing well. I think the Moodle developers are in a pretty sweet position: those that don't want the richer functions can stay with 1.9, probably for years and years, and not be crippled in their teaching; conversely, for those that want more features/function, there will be 2.x. Clearly bloatware is a liability, and that's one of the dangers of rich features, ie, the size and resource demands must not scale faster than the function set. MS with some of its mid-generation Windows clearly added bloat much faster than usable features... and the users noticed (shall we say).
W/re optimization - I have never seen anything even approaching the level of packing and cycle optimization that was in the Apollo code, and having said that, I never hope to again. The *ix kernel is orders of magnitude removed from that level of optimization. I could probably do a multi-hour talk on what we had to do to make the thing work/fit. [Ever written 12M lines of assembly code with an instruction execution timing chart at hand? Let's see: an add fullword (memory to register) runs 1.5usec but uses up an extra 2 bytes of data memory, whereas an add halfword runs 1.9usec and doesn't use those 2 bytes... so a visit to the costing committee to see which we have less of in that overlay, cycles or memory, then choose.]
W/re forgiving code: if the code is perfect, and I literally mean that, then one can strip out all the recovery code, all the value checking, everything that isn't directly related to getting three men in a can home safely. But when one did get a core dump (yes, core), generally there was so little left that one couldn't debug it. Forgiving code is exactly that - the user can do something unexpected, and it doesn't crash and burn, but does something useful/meaningful about it. I think users today are much less tolerant of blue-screen system lockups. That comes at a cost.
W/re the armada - we're not talking about that much of a computer, really, more like $2500 +/- (though not big-name built, but one that one puts together oneself, ie, no 400% profit margins for Dell or HP etc). And yes, the same bits hopefully will arrive no matter what... but user time is also valuable, beyond instructor time, beyond IT support people's time. But beyond that: by dispatching a transaction as quickly as possible, there is a much higher probability of the hw cache working in one's favor - if the transaction is spread out over time, then the cache will probably contain bits and pieces of multiple transactions, meaning one is running from RAM, which for most every modern processor is a huge bottleneck. If one can run from cache, avoiding processor stalls, one can get by with 1/5th the raw processing capability. I think most people designing high-performance transaction processing systems learned this long ago: the longer the period of time over which a sequence of instructions is spread out, the longer (time^2) it takes. And given the instruction path lengths within Moodle (at least 1.9), even a 3.0GHz proc (maybe 4 cores) allowed to run from cache delivers very impressive performance.
W/re parallel NICs - two approaches: one, yes, bonded, aka shotgun; the other is to have more than one T3 coming in from a backbone. In either case, have one or two cores aggregating incoming traffic and feeding the other 6-14 cores.
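For the bonded case, something like this Debian-style /etc/network/interfaces sketch is what I have in mind (interface names and addresses are made up; 802.3ad/LACP mode needs switch support):

    auto bond0
    iface bond0 inet static
        address 192.168.1.10
        netmask 255.255.255.0
        bond-slaves eth0 eth1     # the two physical NICs being aggregated
        bond-mode 802.3ad         # LACP; mode 0 (balance-rr) is the simpler "shotgun"
        bond-miimon 100           # link-check interval in ms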
Whether DB on a separate server is a good idea or not depends on whether you are after single-user performance, or scalability.
If you want single-user performance, it is clearly a bad idea.
If you want good performance for many concurrent users, you probably want one separate DB server, and about 6 load-balanced web-servers sharing it.
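Roughly like this (an nginx front end is one way to do the balancing; hostnames here are placeholders, and the web servers all need to share moodledata on something like NFS):

    upstream moodle_web {
        server web1.example.edu;
        server web2.example.edu;
        # ...and so on, up to however many front ends you need
    }
    server {
        listen 80;
        server_name moodle.example.edu;
        location / {
            proxy_pass http://moodle_web;
            proxy_set_header Host $host;
        }
    }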
To expand on Tim's comment: in the world of the infinitely-fast server, which I think Greg was positing, there is never any need for more than one server in order to achieve performance, so a single server is definitely the best option.
In the real world we don't have infinitely fast servers. That means we need to look at a system where we delegate all the CPU stuff that Moodle wastes time with (e.g. overhead due to being written in PHP, inefficient coding) to front-end servers. Performance of front-end servers literally doesn't matter because you can have as many as you need. That's where the 'throw hardware at it rather than coding efficiently' approach comes in.
Unfortunately at the moment it is difficult to configure Moodle to work with multiple database servers (master/read-only slave clusters), which means that the critical bottleneck is database performance, which means it's important to reduce unnecessary database queries.
I think our database server currently has 64GB RAM. Maybe the next upgrade will have 256GB; that would be enough to hold the entire DB in RAM again I guess...
Talking of infinite computing power, does anybody have a record of huge (computing) power spills? Here is my global candidate: "Facebook's Oregon Data Center Uses As Much Power As Entire County" http://hardware.slashdot.org/story/12/01/31/0355228/facebooks-oregon-data-center-uses-as-much-power-as-entire-county. In the Moodle landscape I saw this one just today in the "Windows-based servers" forum: "Slow upload files" http://moodle.org/mod/forum/discuss.php?d=194935, IIS7 grinding 64GB RAM and 24 cores to a halt behind a 10 Gbit/s link.
Sam, sir -
Where can I buy one of these infinitely-fast processors? I think what I was actually trying to argue is the same point I would make in my systems class: a 3GHz processor is only 3GHz if it's running from cache; as soon as it runs from RAM it's a 240MHz processor; and as soon as it starts paging it's only a 2.4kHz processor. One of the experiments we conduct is to disable the hw L1 and L2 caches and do performance testing/benchmarking, then likewise force the machine into severe paging mode. Then we play games with compact reference sets vs scattered ones. After 12 weeks the systems engineers have a better handle on what the numbers really mean. I know of several bank transaction processing systems that run with 1TB of RAM and simply start blocking incoming work if there's a danger of not being able to map everything to memory, ie, the cost of overcommitting and having to flush is sooo severe that it's better to refuse transactions than to allow an overcommit to occur. Just crossing that overcommit line by 0.1% reduces the entire system throughput by 90% or worse. In that mode one needs 10-12x the processing capability to recover; or, one can buy one additional box, split the load 50-50, and have two boxes with almost 2x headroom. Not to be able to entirely map your db into RAM is a crime. And to upgrade only after one has overflowed what one has is equally a crime. Generally we teach that 2x headroom is the minimum safe margin. I've seen managers told they had 30 minutes to clean out their desk for not bringing in additional boxes before they were needed.
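To put some arithmetic behind those numbers, here's a back-of-envelope sketch (the rates are my illustrative figures from above, not measurements; the effective speed is just the time-weighted harmonic mean of the tiers):

    <?php
    // Illustrative tier speeds in "operations per second".
    $rates = array('cache' => 3.0e9, 'ram' => 240e6, 'paging' => 2.4e3);

    // Effective rate, given what fraction of operations land in each tier.
    function effective_rate(array $mix, array $rates) {
        $time_per_op = 0;
        foreach ($mix as $tier => $fraction) {
            $time_per_op += $fraction / $rates[$tier]; // seconds spent per op in this tier
        }
        return 1 / $time_per_op;
    }

    // 99% cache hits, 1% RAM: still roughly 2.7e9 ops/s.
    printf("%.2e ops/s\n", effective_rate(array('cache' => 0.99, 'ram' => 0.01), $rates));

    // Shift just 0.1% of operations into paging and the box collapses to ~2.4e6 ops/s.
    printf("%.2e ops/s\n", effective_rate(array('cache' => 0.989, 'ram' => 0.01, 'paging' => 0.001), $rates));

That 0.1% of paging costing three orders of magnitude is exactly the overcommit cliff I'm describing.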
There are probably a lot of memory utilisation problems with multicore processors in general; I'm not an expert on the hardware details.
Regarding the database fitting in RAM or not: I think the Postgres recommendation is that if you want your DB to actually sit in RAM, your machine's RAM needs to be at least twice the size of the DB. Our 1.9 database is about 250GB, and I assume we're using relatively cheap commodity servers; I don't think you can fit 512GB of RAM in one of those. (And yes - this does cause problems with certain queries - but mostly, it's okay if rarely-used data slips out of file cache.)
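If anyone wants to check their own numbers, the standard Postgres catalog functions will tell you ('moodle' below is just a placeholder database name):

    -- Total size of the database on disk.
    SELECT pg_size_pretty(pg_database_size('moodle'));

    -- The ten biggest tables (including their indexes) - in a Moodle site
    -- the log table is usually near the top.
    SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
      FROM pg_class
     WHERE relkind = 'r'
     ORDER BY pg_total_relation_size(oid) DESC
     LIMIT 10;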
Basically we are trying to run some course websites and it doesn't seem like we should have to get a huge box with 'Cray' on the front to achieve that.
Anyhow it does work okay without a supercomputer - but I think database query count optimisation (reducing the number of queries used for a page) is definitely the best way to improve performance (if there is nothing critically wrong with it otherwise). It'll help performance both for places like here, where it basically means we can just be a bit more relaxed about it and maybe not have to upgrade a server quite so soon, and for people who run the system on a $20/month hosting account that not only puts their database and their webserver on the same box, but also 100 other people's databases and webservers.
PS Regarding processor cache - as noted I'm not an expert, but I would think there might be better results on this if you put PHP requests on one box and database requests on a different one, so it isn't constantly switching between totally different code bases and datasets.
Clearly there is a point at which one must separate, and yes, after one does that there is much more linear scaling... and with the number of students at OU you're clearly above that point. But my experience has been that too many shops make that jump way too early (*), before really optimizing what they have... as I posted before: 64GB of memory is less than $600 and can go a long way toward making much better use of the resources one already has. I strongly suspect most 2-year and 4-year schools can get by with a lot less in terms of hardware if they'd balance the resources and, more importantly, understand how they're being used. To have all of Moodle in the APC cache, the entire db in memory, the current working set of data memory-mapped, and each transaction processed from end to end in one go, ie, no task swaps, no processor stalls, is clearly the optimal way to go. Anything else becomes exponentially expensive.
(*) I also see a lot of "mine's bigger than yours" syndrome, where there seems to be a point of pride in splitting off the db (and then hanging it off a 100Mbps link :-O ), just to be able to say they have a separate db server, with zero clue that doing so just cost them 500% in response time, besides a huge hit to total system reliability.
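W/re the APC cache part of that - it's just a couple of php.ini lines (sizes illustrative; older APC versions want a plain number of megabytes for shm_size):

    ; Illustrative values - the point is to size the cache so the whole code
    ; base fits without the cache ever filling and fragmenting.
    apc.enabled  = 1
    apc.shm_size = 256M
    apc.stat     = 1   ; set to 0 only if code files never change in place
                       ; (faster, but needs a cache clear after every code change)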
Yes. I think you are correct here.
However, another issue is robustness, as well as performance. If you have several web servers, and one of them dies, then your site does not go down.
Similarly, if you want to set up master-slave replication, so you have a hot standby of your database that you can switch to with minimal down-time if your master DB server crashes, that is also easier to set up if your DB server is just a DB server.
I am not saying you must do it that way, I am just saying that these are also important issues that must be considered.