I like decision trees and random forests as well but, at least from what
I've seen, they have difficulty generalising complex relations between
features. Since we have no control over the input features, a neural net
should be less prone to overfitting and should generalise better. I
agree that random forests would partly solve this, but I don't know to
what extent.
Random forests are quite "easy" in terms of modeling. You can (almost) throw in as many independent variables as you want without running into serious problems, unlike linear regression, where the variables must not be highly correlated. If you use cross-validation during training (split your data into several subsets, train on all but one of them, validate on the one you held out, then rotate which subset is held out), overfitting should not be a big issue, because it shows up as a gap between training and validation scores.
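To make the cross-validation procedure concrete, here is a minimal sketch in plain Python. It is not taken from any particular package: the names `k_fold_indices`, `cross_validate`, `fit_majority` and `predict_majority` are made up for illustration, and the "model" is just a majority-class stand-in so the example runs without extra libraries. A real random forest would be plugged in through the `fit` and `predict` arguments.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def cross_validate(X, y, fit, predict, k=5):
    """Return the accuracy on the held-out fold for each of the k rotations.

    Assumes k <= len(X), so that no fold is empty.
    """
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in val_idx)
        scores.append(hits / len(val_idx))
    return scores


# Stand-in learner (an assumption for this sketch): always predict the
# most frequent class seen in the training folds.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


if __name__ == "__main__":
    X = [[i] for i in range(20)]
    y = [0] * 14 + [1] * 6
    print(cross_validate(X, y, fit_majority, predict_majority, k=5))
```

If the validation scores are clearly lower than the scores on the training folds, the model is overfitting.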
The R xgboost package is just more robust to use: it handles missing values better, its models are smaller in file size, and training is faster.
I am very curious about the results of analyzing the submitted data sets and creating models for different situations of Moodle use.
I have to say that I work for a commercial company selling software for risk prediction in Moodle, i.e. with exactly the same goals as Project Inspire. My interest is of course to understand how other software works, and my manager is OK with me openly sharing our concepts and insights. I do not think there needs to be competition between open source and commercial software (actually I am a big fan of the open source idea), because the crucial point here is the modeling. Our clients pay us to do it for them with their data, to evaluate the model after the term has finished, etc. With open source software they have to pay someone of their own, who will spend quite some time doing (and first understanding) the modeling.
I cannot submit any data (because it is the clients' data). I can only share our experience: we were initially hoping that after the first few clients we would "converge" to a general model, but all the clients used Moodle in different ways, which made individual adjustments necessary. We even refused to create a model if the client did not use the gradebook "properly" (i.e. did not have a minimum number of graded items in Moodle), or if the online part of a hybrid course was just too small. But of course, the beauty of open source is that a lot more people can contribute, and if a much higher number of people submit their data, it may still be possible to create satisfactory general models.
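The "do we build a model at all?" check described above could be sketched roughly like this. Everything in it is hypothetical: the post gives no numbers, so `MIN_GRADED_ITEMS` and `MIN_ONLINE_SHARE` are placeholder thresholds, and `CourseUsage` and `modeling_blockers` are names invented for this illustration.

```python
from dataclasses import dataclass

# Placeholder thresholds (assumptions, not values from any real product).
MIN_GRADED_ITEMS = 5
MIN_ONLINE_SHARE = 0.3


@dataclass
class CourseUsage:
    graded_items: int    # number of graded items in the Moodle gradebook
    online_share: float  # fraction of the course that happens online, 0.0 to 1.0


def modeling_blockers(usage):
    """Return the reasons not to build a model; an empty list means go ahead."""
    reasons = []
    if usage.graded_items < MIN_GRADED_ITEMS:
        reasons.append("too few graded items in the gradebook")
    if usage.online_share < MIN_ONLINE_SHARE:
        reasons.append("online part of the hybrid course is too small")
    return reasons


if __name__ == "__main__":
    print(modeling_blockers(CourseUsage(graded_items=2, online_share=0.8)))
```

Returning the reasons instead of a plain yes/no makes it easier to tell a client what would have to change before a model is worth building.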