Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Elizabeth Dalton -
Number of replies: 29

Moodle HQ formally introduced Project Inspire today. Beginning with Moodle 3.3, it will provide integrated Learning Analytics tools that support improved educational outcomes. Now we need the support of the Moodle community!

We need data from as many participants in the Moodle community as possible; all are welcome to participate. Help us develop next-generation learning analytics that go beyond simple descriptive analytics to provide predictions of learner success, and ultimately diagnosis and prescriptions (advisements) to learners and teachers!

For more information, please see the Project Inspire community page!

Average of ratings: Useful (2)
In reply to Elizabeth Dalton

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Dan McGuire -

It appears that you're still stuck on this notion that you can avoid including the teacher or faculty in the learning process. You've posted the bastardized definition of 'blended' learning and mashed it into a definition of 'hybrid' learning without making any distinction. It seems that you're using the definition popularized by Michael Horn.

I'll point you to this blog post of mine that introduces Michael Horn to Blended Learning. The term 'blended' appeared only twice in the first edition of their book, and in both instances it was not used in a substantive discussion of blended learning as a model of teaching and learning. Of course, they added a discussion about it in later editions, after the post above. They then went on to drastically damage the very useful construct of Blended Learning that Randy Garrison and Norm Vaughan so thoroughly described in their seminal 2008 book. It's a shame that Horn didn't use more of that work when he went nuts and created his Blended framework, but, as I've said on many occasions, Horn is not a teacher; he's a business guy.

For a more nuanced discussion of blended and hybrid, take a look at the paper for which we received a Best Paper award at the 2014 HLC Conference. I'm not suggesting the conflation of the terms blended and hybrid is going to go away anytime soon, but either term requires more than you've given in your definition.

The general thread of Project Inspire seems to be to avoid having teachers or faculty assess student work against a set of standards, criteria, competencies, or outcomes and then aggregate and report the assessments in very descriptive or even prescriptive ways. It seems that you'd rather use something other than teacher or peer assessment to do the assessing or analyzing of student learning.

Analyzing actions or behaviors that are associated with student learning is not the same thing as analyzing actual student learning. The active involvement of teachers or faculty in the learning, whether it be face-to-face, online, blended, or hybrid (and those can and should be seen as different processes), is necessary. Let's talk about that.

In reply to Dan McGuire

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Elizabeth Dalton -

Thank you for your clarifications about "blended" and "hybrid" as terms. I have heard these terms used in many ways, especially in different countries, so I tried to keep the definitions relatively generic, but I will review your references and see if I can make these definitions clearer and more universal.

Project Inspire isn't aiming to replace assessment or teachers. We are looking at helping teachers and learners monitor engagement for sustained effort. Teachers will still need to assess learning. I am not sure why you think otherwise. Is there something in our language that is unclear?

In reply to Elizabeth Dalton

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Dan McGuire -

You say you're looking at "monitoring engagement for sustained effort." That actually sounds scary to me. What exactly does that mean?

Trying to keep terms like 'blended' and 'hybrid' relatively generic is one of the problems I see with something entitled 'Project Inspire.' (Isn't that what Jim Jones called the project that led to masses of people drinking poisonous Kool-Aid?) 'Relative' and 'generic' in the same sentence as assessment of learning or analyzing learning suggests crowding out the teacher or peers in the assessment of learning. Saying that you've heard those terms used in many ways, especially in different countries, makes your use of them even less reassuring. Those terms have been popularized in the U.S., at least, by those who are trying to replace teachers with software and machines. There are also people, like me, using those terms who are not trying to replace teachers with software or machines. Making a point of what exactly is meant by 'blended' or 'hybrid' has become a necessity in any informed discussion of either.

You say "What we are doing with Project Inspire is trying to find and make visible patterns of learner behavior, instructor behavior, and even course design that will lead to these desirable forms of success, and provide advice and encouragement to help learners be successful according to these definitions." You're not looking at actual evidence of student learning that has been assessed and commented on by a teacher or a learner's peer. You're doing a lot of stuff to avoid actually do that.

In the meantime, the competency framework features of Moodle need huge amounts of work to make reporting on student work aligned to competencies easily usable by teachers and faculty. They are not yet fully functional. When they are working and widely implemented, the 'need' for what you're trying to do with Project Inspire can be examined in a much more informed way.

Another email in my inbox this morning was an invitation from Intelliboard to participate in a focus group about what is needed to make reporting on competencies something that teachers can actually use. Of course, Intelliboard will require a license fee to use the results of their focus groups and development.

Also, in the meantime, I'm hearing about more and more K12 districts installing Canvas. I'm not hearing about new installations of Moodle. That makes me sad. Kool-Aid won't make me less sad or inspired.

In reply to Dan McGuire

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Elizabeth Dalton -

I have to say, it's a bit hard to discuss this with someone who seems determined to interpret everything in the most negative way possible. (Jim Jones? Really?) I will try to explain one more time, for the benefit of others reading this thread, if nothing else.

1: The most important goal is to help students succeed according to existing measures, which rely on human teacher-assessed judgments as recorded in Moodle.

2: We believe that students (and teachers) can benefit from metacognitive skill coaching to reach these goals.

3: We are attempting to empirically validate measures of metacognitive skills like persistence, along with experimental interpretations of cognitive, social, and teaching presence, to see which ones might generate the most helpful aids to students and teachers.

4: The Moodle community is international and while we will try to communicate as clearly as possible to teachers and learners in all countries, the project documents will not be specific to usage in any one country.

5: Regarding the definitions of "blended" and "hybrid", if they are misleading to a considerable proportion of Moodle users, I think it would be better to replace them with a more general term (perhaps "combination"). We do not, at this time, assume any particular definition of these terms in our code.

I hope this clarifies matters.

Average of ratings: Useful (2)
In reply to Elizabeth Dalton

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Dan McGuire -

The competency framework features of Moodle need huge amounts of work to make reporting on student work aligned to competencies easily usable by teachers and faculty. They are not yet fully functional. When they are working and widely implemented, the 'need' for what you're trying to do with Project Inspire can be examined in a much more informed way.

This is not being negative. I'm stating what I think would be a more productive work flow for advancing the development of Moodle.

In reply to Dan McGuire

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Elizabeth Dalton -

I completely agree that the competency frameworks in Moodle need more work, and that effort is continuing in parallel with this project. However, not all institutions want or need to use competencies, so Inspire will not require them. We have given considerable thought to how to generalize interpretations of learner actions in a way that will work across different activities and courses. I would be happy to discuss our methodology in more detail with you, though we might want to move the discussion to the Inspire site.

Regards,

Elizabeth

In reply to Elizabeth Dalton

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Dan McGuire -

I'm very interested in the methodology you used to generalize interpretations of learner actions in a way that will work across different activities and courses. I'm also interested in how you verify a correlation of that with actual student learning.

I'm not particular about where the discussion resides.

In reply to Dan McGuire

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Elizabeth Dalton -

I was just reviewing some of your earlier posts, and I thought of some points that might help clarify what we are doing.

First, we allow site administrators and teachers of individual courses (if allowed by the site admins) to specify the definition of "student success" in the course. The first models we are supporting allow a choice among course completion (as defined in Moodle), a final grade above passing, or completion of all competencies defined in the course. We will be able to easily support other measures of success going forward, but these seem like good places to start. "Course completion" could potentially be automated, but could also depend on instructors manually setting completion criteria. Similarly, I suppose grades could be based entirely on automated assessments (shudder), but hopefully they will be given as a result of thoughtful assessment by teachers (and ideally, in conversation with students). Finally, competencies, like grades, can be tied to automated systems, but ideally would not be.
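
To make the three options concrete, here is a rough sketch of the logic (purely illustrative, with invented field names; in Moodle itself the target is coded in PHP):

    def success_label(student, definition):
        """Binary "success" target under the three definitions above.
        All field names here are invented for illustration."""
        if definition == "course_completion":
            return int(student["completed_course"])
        if definition == "passing_grade":
            return int(student["final_grade"] >= student["grade_to_pass"])
        if definition == "all_competencies":
            return int(all(student["competencies_achieved"]))
        raise ValueError("unknown success definition: " + definition)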

What we are doing with Project Inspire is trying to find and make visible patterns of learner behavior, instructor behavior, and even course design that will lead to these desirable forms of success, and provide advice and encouragement to help learners be successful according to these definitions.

I hope this helps.

Elizabeth

In reply to Dan McGuire

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Elizabeth Dalton -

One final note on this: I've just finished reading your paper, and I think you may be interested to know that many of the indicators we are testing as predictors for student success (by the various definitions I described above) are based on the Community of Inquiry model, and are attempts to detect cognitive, social, and teaching presence. I would be happy to provide more detail on this if you are interested.

In reply to Elizabeth Dalton

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Dan McGuire -

I'm absolutely interested in the details regarding your attempts to detect cognitive, social, and teaching presence, especially as those indicators are deemed able to predict student success.

In reply to Elizabeth Dalton

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Tonia Malone -

Greetings,

Currently, we are looking for...

"Identify and support students at risk of not meeting institutional goals for success"

What data do you need? I can talk to our team to see what we can share. 

We are very interested in the "Inspire" project. 

Will this work in 3.1?

Thank you,

Tonia Malone

Cal Poly - SLO

In reply to Tonia Malone

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Elizabeth Dalton -

We need a full copy of your production database, but we provide FERPA-compliant anonymization tools so we don't place personally identifiable information (PII) at risk. We need data in the new logstore format, so it must come from a site running Moodle 2.7 or later with standard logs turned on.

At this time, the plan is to release Inspire as a separate plugin concurrently with Moodle 3.3. I don't know yet if this plugin will work with earlier versions of Moodle; that might require some custom coding. For Moodle 3.4 and beyond, Inspire is planned to become part of core.

In reply to Elizabeth Dalton

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Derek Chirnside -

Elizabeth, what is this (below)? Is this a new function of Moodle, or something special for Moodle.org?

-Derek

In reply to Derek Chirnside

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Tim Hunt -

Looks like a customised lang string for 'Enrol me in this course'.

Average of ratings: Useful (2)
In reply to Derek Chirnside

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Elizabeth Dalton -

Because the different offerings on moodle.org are considered "communities" rather than "courses," the language string reflects that. :) It's not specific to Project Inspire.

In reply to Elizabeth Dalton

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Elizabeth Dalton -

I have created a Working Group area within the Project Inspire community/course, to provide a space for discussing details like proposed indicators. I have posted some of our initial indicators that we hope to test against contributed data sets. These posts describe how we are attempting to operationalize specific Community of Inquiry features using Moodle data. CoI indicators are not the only ones we are testing, but we think they are a good place to start. Please feel free to contribute to this discussion within the Project Inspire community.

In reply to Elizabeth Dalton

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Anand Kumar -

Hello,

I'm looking at Project Inspire, and I'm confused by the 'cognitive depth' and 'social breadth' activity indicators. What is the meaning of these indicators?

In reply to Elizabeth Dalton

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Vera Friederichs -

Sorry for entering this discussion quite late, but I only became aware of it now.

I am a data scientist and I am curious about details of the underlying method. 'Machine learning' is quite a general term. Is there a place where I can find details about the statistical model, the learning method, the indicators used, etc.?

Thanks

In reply to Vera Friederichs

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by David Monllaó -

Hi Vera,

I am not sure how far you have dived into the available documentation, but this whole thing is a system to create different prediction models by reusing indicators; the indicators will be the ones you select for your model.

There is a new Moodle plugin type, 'Machine learning backend'; you can find info about it at https://docs.moodle.org/dev/Analytics_API. I still need to create docs pages for the machine learning backends included in Moodle core. As a summary, at the moment they only support supervised learning and only binary classification:

  • The PHP backend is quite simple; it uses logistic regression.
  • The Python one uses a feed-forward neural network with a single hidden layer.

They both use the Matthews correlation coefficient to quantify prediction model quality. We are currently planning the roadmap; it would be great to hear your thoughts about this project.
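
As a rough illustration of what the backends do (not the actual Moodle code; the library choice here is mine), a one-hidden-layer binary classifier scored with the Matthews correlation coefficient could look like this in Python:

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import matthews_corrcoef
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))          # stand-in for indicator values
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in for the binary target

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A feed-forward network with a single hidden layer, as in the Python backend.
    clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0)
    clf.fit(X_train, y_train)

    # MCC ranges from -1 to +1; 0 means no better than chance, which makes it a
    # reasonable single number for comparing binary prediction models.
    print(matthews_corrcoef(y_test, clf.predict(X_test)))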

Average of ratings: Useful (1)
In reply to David Monllaó

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Vera Friederichs -

Hi David,

That all sounds very good and very flexible, covering a lot of possibilities for what people may want to do.

About the models: I personally like random forests and/or xgboost and/or support vector machines better than neural networks, but of course neural networks are also capable of doing the job.

I see the biggest difficulty in the fact that modeling is still an art, in my opinion, which means that a person who knows what they are doing has to make all the choices (model, indicators, training samples, prediction samples, maybe training different models for different situations, etc.); otherwise the results will be nowhere near as good as they could be.

Is there anything included around data cleaning? In my experience, people do not always use Moodle as they should. For example, course start and end dates: if people re-use courses by just removing all enrolled students and enrolling new ones, they sometimes do not adjust the start and end dates. And of course, model training is only possible if the Moodle logs are not deleted too often. ;)

Average of ratings: Useful (1)
In reply to Vera Friederichs

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by David Monllaó -

Thanks for replying, Vera. You nailed it with the challenges we have :)

We selected neural nets over other techniques as the default because they scale nicely and do not require all previous training data to be stored in order to predict (we store the model state after training and restore it for both training and prediction). For PHP, neural nets were the natural choice as well, but the PHP implementation was too slow to be used in production, so we use the logistic regression one; it does not perform as well (5-10% less accuracy), but it is acceptable as a start. Machine learning in PHP is not very popular ;)
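
To illustrate the idea (a minimal sketch using scikit-learn's partial_fit as a stand-in for our backends; the file name and storage format are just for illustration):

    import pickle
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    def train_increment(X, y, state_path="model.pkl"):
        """Load the stored model state if it exists, train on the new batch
        only, and store the updated state; earlier batches are never needed."""
        try:
            with open(state_path, "rb") as f:
                clf = pickle.load(f)
        except FileNotFoundError:
            # Logistic regression trained by SGD (use loss="log" on older
            # scikit-learn versions).
            clf = SGDClassifier(loss="log_loss")
        clf.partial_fit(X, y, classes=np.array([0, 1]))
        with open(state_path, "wb") as f:
            pickle.dump(clf, f)
        return clf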

The whole system aims to provide a default and simple base that scales properly and works well (not optimally, obviously) for most supervised learning models; this base can be extended by advanced users to get optimal solutions. Some examples of issues that will push the system towards this objective are https://tracker.moodle.org/browse/MDL-61006 and https://tracker.moodle.org/browse/MDL-60520. The limitation we have is that we still require people to "code": the target (label) needs to be coded in PHP. I have some ideas about how we could offer a list of elements to predict (a really simple format for Moodle components to expose targets from their database structure), but we need to see how practical it would be to create models based on these targets. We need to investigate this further; we have other issues to fix before moving to this next stage.

Regarding data cleaning: it is a challenge, and Moodle not having a standard way to roll over courses makes it harder. The best solution we could find is for each model to define its own conditions for accepting analysable elements (e.g. courses). In the 'Students at risk of dropping out' model, courses are used for training straight after the end date is reached, and we skip student enrolments whose start and end dates do not make sense when compared with the course start and end dates. This affects both training and prediction, so courses with wrong start and end dates will not receive predictions. Sites with deleted logs do not help either, agreed; I'm afraid we can't do much about that, so those courses are also discarded. We provide a report for admins that lists all potentially analysable elements in the system that cannot be used for training or prediction; the report includes the reason why they cannot be used, so ideally a Moodle site admin who notices that their courses do not get predictions because the course start and end dates are not set would fix the situation at some point.
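
A rough sketch of the kind of validity check described above (the function and field names are just for illustration; the real conditions live in each model's code):

    def is_valid_enrolment(enrol_start, enrol_end, course_start, course_end):
        """Skip enrolments whose dates make no sense for the course."""
        if course_start is None or course_end is None:
            return False  # course dates not set: the course gets no predictions
        if course_start >= course_end:
            return False  # inverted course dates
        if enrol_end is not None and enrol_end < course_start:
            return False  # enrolment ended before the course began
        if enrol_start is not None and enrol_start > course_end:
            return False  # enrolment began after the course finished
        return True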

Average of ratings: Useful (1)
In reply to David Monllaó

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Vera Friederichs -

> We selected neural nets over other techniques as the default because they scale nicely and do not require all previous training data to be stored in order to predict (we store the model state after training and restore it for both training and prediction). For PHP, neural nets were the natural choice as well, but the PHP implementation was too slow to be used in production, so we use the logistic regression one; it does not perform as well (5-10% less accuracy), but it is acceptable as a start. Machine learning in PHP is not very popular ;)


Random forest or xgboost models also scale, and you could create a pre-trained model. About the programming language: for statistical implementations I prefer R, because you can find packages for everything, but requiring people to install R would probably complicate things.


> The whole system aims to provide a default and simple base that scales properly and works well (not optimally, obviously) for most supervised learning models; this base can be extended by advanced users to get optimal solutions. Some examples of issues that will push the system towards this objective are https://tracker.moodle.org/browse/MDL-61006 and https://tracker.moodle.org/browse/MDL-60520. The limitation we have is that we still require people to "code": the target (label) needs to be coded in PHP. I have some ideas about how we could offer a list of elements to predict (a really simple format for Moodle components to expose targets from their database structure), but we need to see how practical it would be to create models based on these targets. We need to investigate this further; we have other issues to fix before moving to this next stage.


Do I understand correctly that you aim to create basic models which can be used without further training, for users who do not have enough knowledge about modeling? If yes, I have some doubts. In my experience people use Moodle in very different ways, and a "general average" model would be far less than optimal. But then again, it depends on individual expectations of what is considered good enough...


> Regarding data cleaning: it is a challenge, and Moodle not having a standard way to roll over courses makes it harder. The best solution we could find is for each model to define its own conditions for accepting analysable elements (e.g. courses). In the 'Students at risk of dropping out' model, courses are used for training straight after the end date is reached, and we skip student enrolments whose start and end dates do not make sense when compared with the course start and end dates. This affects both training and prediction, so courses with wrong start and end dates will not receive predictions. Sites with deleted logs do not help either, agreed; I'm afraid we can't do much about that, so those courses are also discarded. We provide a report for admins that lists all potentially analysable elements in the system that cannot be used for training or prediction; the report includes the reason why they cannot be used, so ideally a Moodle site admin who notices that their courses do not get predictions because the course start and end dates are not set would fix the situation at some point.

I like that idea of telling people, "If you want to get a prediction, get your data in shape". :) All attempts to guess what clean data should or could look like have their limitations.

In reply to Vera Friederichs

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Elizabeth Dalton -

Hi Vera,

I'm also a big fan of R. :)

I tend to prefer regression modeling myself, because even though the predictions are less precise, they are more interpretable. There is a trade-off between highly specific analysis techniques and interpretability, and we want to be able to offer guidance to teachers and students at some point based on these predictions. (To paraphrase Mark McKay, we are only interested in predicting the future in order to change it.)
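
As a small illustration of what I mean by interpretability (a sketch with invented indicator names, not one of our actual models): each coefficient of a logistic regression can be reported as an odds ratio per indicator.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    indicators = ["forum_posts", "assignments_on_time", "logins_per_week"]
    X = np.random.default_rng(1).normal(size=(500, 3))
    y = (X @ np.array([0.8, 1.2, 0.3]) > 0).astype(int)  # toy "success" label

    clf = LogisticRegression().fit(X, y)
    for name, coef in zip(indicators, clf.coef_[0]):
        # exp(coef) is the multiplicative change in the odds of "success"
        # per one-unit increase in the indicator.
        print(name, "odds ratio:", round(float(np.exp(coef)), 2))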

I also agree on the difficulty of constructing "generic" models. We are actually starting with models that have some fairly well-defined expectations, such as a definition of "student success," and we make those assumptions clear in each model. As we develop more models (or models are contributed by the community), we hope these assumptions will continue to be made clear and will help guide Moodlers in choosing appropriate models for their circumstances. We are working on some ways to evaluate the contexts of Moodle systems and courses to help adjust models to different common usages of Moodle, or to help administrators select the most appropriate models.

There is no way to automatically clean all the data, as you have noted, but we can look for indicators that data has problems, and we can flag those situations to the site administrators. This will hopefully help make people more aware of how they can improve their data quality. We are also thinking about tools that can help make maintaining good data quality easier, e.g. course rollover tools that would reset course start and end dates appropriately.

In reply to Elizabeth Dalton

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Séverin Terrier -

Hi,

Talking about course rollover and course start and end dates, MDL-38501 (created more than 4 years ago, with some votes) could be a place to start.

Séverin

In reply to Vera Friederichs

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by David Monllaó -

I like decision trees and random forests as well, but from what I've seen they have difficulty generalising complex relations between features, and since we have no control over the input features, a neural net should be less prone to overfitting and better able to generalise; I agree that random forests would partly solve this, but I don't know to what extent. I am not familiar with xgboost; the library description seems interesting. I will read about it, thanks.

Regarding "Do I understand correctly that you aim to create basic models which can be used without further training, for users who do not have enough knowledge about modeling?": we aim to, yes, but we cannot do it yet for the exact reason you mention. Moodle 3.4 requires people to train their sites with their own data, and we will not be able to include pre-trained models until we (Moodle HQ):

  1. Get enough anonymised datasets from the community (I would encourage everybody to participate: https://moodle.org/course/view.php?id=17233&section=3)
  2. The datasets are diverse enough to cover most uses of Moodle, e.g. workplace, unis, K-12, 100% online courses, partly-online courses, MOOCs, courses that are compulsory to get a degree, courses containing required activities, courses that are just a support for traditional on-site courses... I am aware that this statement is very ambiguous; we can refine it once we are done with #1 :)
  3. We restore these datasets (Moodle database .sql files), analyse them, evaluate the performance of our current models (indicators + target), and add extra context indicators when we see that the performance is far from the baseline (the baseline performance could be the performance of a structured site using completion, strict dates...). This is the key to having a single model that can generalise to any site. We are already including extra features with per-course averages for all indicators with linear values; we may need to expand this to percentiles, each sample's std deviation from the mean, and so on (see the sketch after this list).
  4. Set up an ensemble of all sites' trained models and expand the machine learning backend APIs so they can accept initial weights that will be refined by each site's training data.
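
A minimal sketch of the per-course context features mentioned in step 3 (the column and function names are mine, not the actual implementation):

    import numpy as np
    import pandas as pd

    def add_course_context(df, indicator):
        """df has one row per student sample and a 'course' column; add the
        course mean of the indicator and the sample's z-score within it."""
        grouped = df.groupby("course")[indicator]
        df[indicator + "_course_mean"] = grouped.transform("mean")
        std = grouped.transform("std").replace(0, np.nan)
        df[indicator + "_zscore"] = (
            (df[indicator] - df[indicator + "_course_mean"]) / std
        ).fillna(0.0)
        return df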

This is not carved in stone in any roadmap; it is just what sounds reasonable and what I think can lead to models that generalise well on any site. I would not vote for releasing Moodle with pre-trained models until we know that they generalise well enough; what "well enough" means depends on the model.

It is great reading your comments; please feel free to continue sharing your thoughts. There are not many people with a data science background contributing to our discussions.

In reply to David Monllaó

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by Vera Friederichs -

> I like decision trees and random forests as well, but from what I've seen they have difficulty generalising complex relations between features, and since we have no control over the input features, a neural net should be less prone to overfitting and better able to generalise; I agree that random forests would partly solve this, but I don't know to what extent.


Random forests are quite "easy" in terms of modeling. You can (almost) throw in as many independent variables as you want without facing serious problems like in linear regression, where variables must not be highly correlated. If you do cross-validation during training (splitting your data into several subsets, holding one out for validation while training on the rest, and then rotating which subset is held out, etc.), overfitting should not be an issue.
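
For example, a cross-validated random forest is only a few lines in Python (just a sketch; scikit-learn rotates the folds for you):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X = np.random.default_rng(2).normal(size=(800, 20))  # many variables is fine
    y = (X[:, 0] - X[:, 3] > 0).astype(int)

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    # 5 folds: each subset takes a turn as the held-out validation set.
    scores = cross_val_score(clf, X, y, cv=5)
    print(scores.mean(), scores.std())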

The R xgboost package is just more robust to use because it handles missing values better, its models are smaller in file size, and training is faster.


I am very curious about the results of analyzing the submitted datasets and creating models for different situations of Moodle use.

I have to say that I am working for a commercial company selling software for risk prediction in Moodle, i.e. with exactly the same goals as Project Inspire. My interest is of course to understand how other software works, and my manager is OK with me openly sharing our concepts, insights, etc. I do not think that there needs to be competition between open source software and commercial software (actually, I am a big fan of the open source idea), because the crucial point here is the modeling. Our clients pay us to do it for them with their data, evaluate the model after the term has finished, and so on. With open source software they would have to pay their own person to spend quite some time doing (and first understanding) the modeling.

I cannot submit any data (because it is the clients' data). I can only share the experience that we were initially hoping that, after some initial clients, we would also kind of "converge" to a general model, but all the clients used Moodle in different ways, which made it necessary to make individual adjustments, up to refusing to create a model if the client did not use the gradebook "properly" (if they did not have a minimum number of graded items in Moodle) or if, in hybrid courses, the online part was just too small. But of course the beauty of open source is that a lot more people can contribute, and if a much higher number of people submit their data, it may still be possible to create satisfying general models.

In reply to Vera Friederichs

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by David Monllaó -

Some months ago I set up a new model to test that the API is able to work at different context levels; it was about detecting late assignment submissions in specific assignment activities. It is nothing serious, but it allowed me to see how models generalise to different sites. It is easier to play with this model than with prevention of students at risk because in this case the label is clearly defined and easily calculated, and the classes are well balanced, which also helps. Although I used 2 datasets whose courses are not very well structured nor very clean, although some indicators are partly coupled to the label, and although most assignments are submitted the day before the due date (or the same day, hehehe), the model is able to predict late assignment submissions 2 and 4 days before the due date with around 75-80% accuracy using test data from the same site (not used for training, obviously) and around 70-75% using another site as test data. Only 1 site's data was used to train the model; the difference between these 2 accuracies should decrease as we train using new sites' datasets, although accuracy on unseen sites will likely always be lower.

The student indicators this model uses are:

  • How close to the due date the student submitted other assignments (also considering whether they didn't submit at all)
  • The same for quizzes and their close dates
  • The same for choice activities and their close dates
  • Weighted number of write actions on the analysed activity
  • Weighted number of read actions on the analysed activity

The key to making this model generalise well to other sites seemed to be using extra indicators to add context to the student indicators:

  • Is activity completion enabled for that activity?
  • How much weight does the activity have in the gradebook?
  • Is grade to pass set for that activity?

As I said above, this is just an example, and adding more context indicators should help this model generalise even further. This is not part of the current HQ priorities.

This model is available at https://github.com/dmonllao/moodle-local_testanalytics/tree/late-assign-submissions (late-assign-submissions branch). Again, this is nothing serious, so don't expect much documentation, but you should be able to install the plugin and evaluate this late assignment submissions model using your site data without any problem. If you are interested in playing with this model, and given that you are a data scientist, I would recommend evaluating the model using a few of your client sites (using --timesplitting='\local_testanalytics\analytics\time_splitting\close_to_deadline') and, instead of using the predictions processor included in Moodle, downloading the resulting .csv files (you can use https://gist.github.com/dmonllao/d1db52b11c9ca00e76ab8ddcb95c6c93 for it). This way you can use your own algorithms to compare how well the model generalises.
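
Once you have the exported .csv files, the cross-site comparison could be as simple as the sketch below (I am assuming the last column holds the label, so check the actual export format; the file names are just placeholders):

    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import accuracy_score

    site_a = pd.read_csv("site_a.csv")  # data from the training site
    site_b = pd.read_csv("site_b.csv")  # data from a completely different site

    X_a, y_a = site_a.iloc[:, :-1], site_a.iloc[:, -1]
    X_b, y_b = site_b.iloc[:, :-1], site_b.iloc[:, -1]

    # Train on one site, test on another: the gap between same-site and
    # cross-site accuracy shows how well the model generalises.
    clf = GradientBoostingClassifier().fit(X_a, y_a)
    print("cross-site accuracy:", accuracy_score(y_b, clf.predict(X_b)))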

I got the results presented above using neural nets with Adam optimization, dropout regularization, tanh activation, and a learning rate decaying from a generous 0.5 to 0.005. You can reproduce them with https://github.com/dmonllao/tensorflow-performance-playground (all options are documented in python train.py --help; only a subset of them is in README.md). The results will always depend on the datasets you use, but given that the datasets I used were significantly different, I wouldn't expect much difference between using unseen test data from one of the training datasets vs. using test data from a completely different site.
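
A sketch of that configuration in Keras (one possible reading of it; the layer sizes and decay schedule details are my guesses, and the real options are documented in the playground repo):

    import tensorflow as tf

    n_features = 8  # number of indicator columns (placeholder)

    # Learning rate decaying from 0.5 towards 0.005: after decay_steps
    # steps the rate is 0.5 * 0.01 = 0.005.
    lr = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=0.5, decay_steps=1000, decay_rate=0.01)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(32, activation="tanh"),
        tf.keras.layers.Dropout(0.5),  # dropout regularization
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])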

In reply to David Monllaó

Re: Moodle announces Project Inspire! Integrated Learning Analytics Tools

by David Monllaó -

I forgot to mention that Cristobal Romero suggested using transfer learning as an alternative to #4; it makes a lot of sense, and this option should also be studied when we reach that point.