For this particular issue, note that Google doesn't really like crawling URLs with large numbers of parameters after the '?'. In this case it appears to be cutting off the last parameter and just trying what's left anyway.
The googlebots are actually quite smart, and do seem to learn about your site in order to spider it better, e.g. they know which pages are updated regularly and spider them more often to get the 'fresh' content (the recent updates page is therefore an ideal candidate for spider blocking with robots.txt). I'm guessing the bot would stop visiting these pages if it received a 404 or some other machine-readable error, but it doesn't, and the bot isn't (yet) smart enough to read the actual page and realise that it is an error.
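For anyone wanting to try that, a robots.txt entry along these lines should do it. The path is just my guess at where the recent-activity page lives in a standard Moodle install, so check the URLs your own site actually produces before relying on it:

User-agent: *
Disallow: /course/recent.php

Since robots.txt Disallow rules match by prefix, this would also cover variants with different course ids after the '?'.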
I had a read through the Google guidelines for webmasters, and the only thing that jumped out at me as something Moodle might be doing to confuse these spiders is the following:
- Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behaviour, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.
I'd guess that if the googlebot doesn't accept cookies then the session ID will be written into the URL. Those with multiple visits from the googlebot might want to check their logs to see if the same page is being visited by bots with different session IDs.
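If you want to check, a quick Python sketch like this would do it against an Apache-style access log. The PHPSESSID parameter name and the log format are just assumptions on my part, so adjust them to whatever your server actually produces:

import re
import sys
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Assumption: the session ID shows up as a PHPSESSID query parameter.
# Change this to whatever name your Moodle install uses in its URLs.
SESSION_PARAM = "PHPSESSID"
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')

# Map each page (with the session ID stripped out) to the set of session IDs
# the googlebot has used when fetching it.
pages = defaultdict(set)

with open(sys.argv[1]) as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        m = REQUEST_RE.search(line)
        if not m:
            continue
        url = urlsplit(m.group(1))
        params = parse_qs(url.query)
        session_ids = params.pop(SESSION_PARAM, [])
        # Rebuild a key without the session ID so identical pages group together
        key = url.path + "?" + "&".join(
            f"{k}={v[0]}" for k, v in sorted(params.items()))
        for sid in session_ids:
            pages[key].add(sid)

for page, sids in sorted(pages.items()):
    if len(sids) > 1:
        print(f"{page}  --  {len(sids)} different session IDs")

If any page comes out with more than one session ID, that's a good sign the bot is seeing the same content under several different URLs, which is exactly what the Google guideline above warns about.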