AI Text Question Type LLM based marking of free text responses

Posted by Gareth Barnard
Number of replies: 16

Dear Marcus,

I’ve come across your post by chance and, recognising your name, I’m fascinated by the impact of the plugin. There are a lot of replies here with links, and I’ll admit that I’ve only skimmed through them as there are multiple thoughts in my mind. Thus:

  1. Could the tool detect if a response was AI generated before expending time pre-marking?
  2. What do students feel about a machine telling them, a human, how to communicate in their human language?
  3. As education is expensive, what do students feel about their fee going towards AI instead of a member of staff to assess their work?  As I get the impression that staff will check only if there are issues?
  4. As AI learns the human language and then corrects humans, will this cause the language to stagnate?  Over time human language evolves and adapts; we change it to suit the needs of the environment in which we exist.  But if AI causes a static definitive state of the language (like a baseline version in software) where new human adaptions are rejected as false, then is this a negative aspect?
  5. Will / could AI introduce new concepts and language constructs of its own accord and we, the humans become what AI wants us to be rather than the other way around?
  6. Is there scope for local AI solutions, such as https://www.raspberrypi.com/products/ai-kit/ - for which I have no idea if it is capable enough of running such a language model.
  7. Should AI only be used for what we can’t do (within the same relative duration) instead of what we can?

Thoughts welcome.

Kind regards,

Gareth

Average of ratings: Useful (2)

In reply to Gareth Barnard

Posted by Marcus Green
Hi Gareth, thank you for your excellent questions.
I have split your question into a new thread as that other one was getting very long.

I have created some documentation for the question type that covers some of the wider issues relating to this type of technology, which you can see here:
https://github.com/marcusgreen/moodle-qtype_aitext/wiki

1) Could the tool detect if a response was AI generated before expending time pre-marking?

Possibly, but it is not something I am interested in. When I was a teacher I found a short conversation with students to be quite effective to confirm if a student had created the work they submitted. I write about this here

https://github.com/marcusgreen/moodle-qtype_aitext/wiki/Cheating

2) What do students feel about a machine telling them, a human, how to communicate in their human language?

The feedback I have got is that students assume that the LLM response is "correct". This came up in a webinar I was in last week which you can watch here. It is also discussed in an academic paper written about AI Text, which is referenced in the presentation.

3) As education is expensive, what do students feel about their fee going towards AI instead of a member of staff to assess their work?  As I get the impression that staff will check only if there are issues?

The cost of the inference (the work done by the LLM) is very low on a per student basis. My first year of experimentation with the question type incurred costs of well under USD 100, and in the feedback I have received I have yet to hear of cost being a concern. The cost of inference has dropped hugely and I predict it will continue to drop.

"I get the impression that staff will check only if there are issues?"

My advice is that staff should always check, but the issue with these systems is not that they get things obviously wrong, but that they repeatedly get things correct until people become complacent and inaccuracies creep in.

4) As AI learns the human language and then corrects humans, will this cause the language to stagnate?  Over time human language evolves and adapts; we change it to suit the needs of the environment in which we exist.  But if AI causes a static definitive state of the language (like a baseline version in software) where new human adaptions are rejected as false, then is this a negative aspect?

My short answer is no. Language is dynamic and will continue to be so, and LLM systems will track human use during those changes, though in the same way that traditional media affects how language is used (e.g. 6 7), LLMs will become part of that loop.

5) Will / could AI introduce new concepts and language constructs of its own accord and we, the humans become what AI wants us to be rather than the other way around?

I think that the people in charge of the big AI companies may attempt to bend language and beliefs to their own views (see Grokipedia), but I suspect that LLMs will become commodities and undermine their plans. A clue as to this is the way the Chinese models have been highly competitive on price and performance and helped lower the cost of the US alternatives.

6) Is there scope for local AI solutions, such as https://www.raspberrypi.com/products/ai-kit/ - for which I have no idea if it is capable enough of running such a language model.

Yes. On my trip to Japan to present I was running an LLM on my modest laptop (Lenovo X280) when I had no internet connection. The performance was slow but OK for testing. The hardware in that link doesn't help with running what is necessary for what I do, but I have run it on standard Raspberry Pis and the response time is measured in minutes rather than seconds (e.g. 3 or 4 minutes). That may seem slow, but I think it is still potentially very useful and I anticipate the arrival of hardware to accelerate that process. I am a big fan of the MoodleBox project and have been buying the latest Raspberry Pi each time one comes out to see what can be done with it. I will continue to do this, and I have been collaborating with people who get EdTech into low-resource places, e.g. those with intermittent power and little to no internet connectivity.

7) Should AI only be used for what we can’t do (within the same relative duration) instead of what we can?

AI/LLM-enabled EdTech is just another tool. When you have a big shiny new hammer it is tempting to see everything as a nail. There has been a lot of talk of using LLMs to generate learning material, but to quote Dr Tim Hunt of the Open University:

"As far as I can see, 'lack of content' is not a problem the world suffers from. If anything, the opposite."

By contrast, giving students feedback specific to them is a significant task that absorbs time teachers could spend on the things technology cannot do. It is rare for a student to say their teacher inspired them by the quality and amount of marking they did.

It is worthwhile being aware of some EU policy on the use of AI in Education 

https://artificialintelligenceact.eu/annex/3/

Annex III: High-Risk AI Systems Referred to in Article 6(2)

3. Education and vocational training:

(a) AI systems intended to be used to determine access or admission or to assign natural persons to educational and vocational training institutions at all levels;

(b) AI systems intended to be used to evaluate learning outcomes, including when those outcomes are used to steer the learning process of natural persons in educational and vocational training institutions at all levels;



Average of ratings: Useful (2)

In reply to Marcus Green

Posted by Gareth Barnard

Dear Marcus,

Thank you for your reply. Very interesting :-). My initial thoughts before I watch and read your webinar / links:

  • I wonder if the use of LLMs will now make courses cheaper for students?
  • Pi wise, I've heard of MoodleBox but not used it. I do run Moodle installs on Windows and Linux, sticking to a bespoke install. Having recently invested in a Pi 5, I intend to see how that goes, and as Raspbian is Debian-based, what I facilitate on https://www.moodlebites.com/mod/page/view.php?id=3212 helps install Moodle on it.

Lots of food for thought!

Kind regards,

Gareth
Average of ratings: Useful (1)

In reply to Gareth Barnard

Posted by Visvanath Ratnaweera
Hi Gareth

You wrote:
> I wonder if the use of LLMs will now make courses cheaper for students?

Not really my world, but I came across this in the Lounge: ‘We could have asked ChatGPT’: students fight back over course taught by AI. I hope it'll only be for a short stay. ;-)
 
> I've heard of MoodleBox but not used it.
 
A lot has happened, and is happening, on that topic. See the MoodleBox Support forum. From a system administration point of view, the quick separation of hardware and software on the Raspberry Pi makes experimenting with many installations very easy: you make a micro-SD card for each installation and enjoy plug-and-play. Running them "headless", with no remote GUI either, is a great exercise in the Linux command line.
 
 
Average of ratings: Useful (1)

In reply to Visvanath Ratnaweera

Posted by Gareth Barnard
Dear Visvanath,

In essence, then, LLMs enhance rather than replace elements of the course marking process?

I already run a Git server on a Pi with no GUI, pure command line, and experiment with Moodle installs on Ubuntu both with and without a GUI. But MoodleBox could be a quicker way of getting up and running.

Kind regards,

Gareth

In reply to Gareth Barnard

Posted by Marcus Green
I have run LLMs on a Pi, but it is very slow. I keep looking out for low-cost hardware for better inferencing to run alongside the Pi, but at the moment that would add about £500 to the total cost. The webinar was very enjoyable and it reminded me of all the excellent stuff Joseph has been doing since the dawn of Moodletime.

In reply to Marcus Green

Posted by Gareth Barnard
Dear Marcus and Joe,

I'm still attempting to find the time to finish watching the webinar!

One thought that has struck me though is that if you slowed down the LLM response, could you then apply the Turing Test ( https://en.wikipedia.org/wiki/Turing_test )? In that you could have random questions answered by humans instead and see if the students notice the difference? If they didn't, would that then validate the LLM in terms of accuracy and fitness for purpose?

Kind regards,

Gareth

In reply to Gareth Barnard

Posted by Joseph Thibault

Gareth, 

Interesting thought, I'm not sure I fully understand. 

In the research we discussed, students were informed that the responses/feedback were AI-generated (the transparency was/is intentional).

Are you testing the usefulness of the feedback, or trying something else with student responses?

In reply to Joseph Thibault

Posted by Gareth Barnard
Dear Joe,

I'm wondering if there could be a scientific experiment to determine if the LLM passes the Turing Test by sometimes having real humans, those that would have otherwise marked the answer, answer it. If students can't tell the difference, then it would pass the Turing test.

Kind regards,

Gareth
Average of ratings: Useful (1)

In reply to Gareth Barnard

Posted by Marcus Green
Did you mean answered by humans, or that the feedback should be given by humans (teachers)?

In reply to Marcus Green

Posted by Gareth Barnard
Dear Marcus,

That some feedback (random) should be given by teachers.

Kind regards,

Gareth

In reply to Gareth Barnard

Posted by Marcus Green
I suspect a student who was paying attention and looking for it would spot the difference, and if they were interested they would give responses that returned signs that the response was not from a human. It would be possible to frame prompts that might mitigate that but not entirely stop it. I was experimenting just before I wrote this post with question prompts that strongly hinted that the response was not from a human, but they were fairly trivial prompts. The disclaimer appended to each response is set at the site level on the assumption that the administrators would want students to know that it was generated by an LLM.

In reply to Marcus Green

Posted by Gareth Barnard
Ah, to clarify, I'm actually wondering if the tool could be used (in a controlled manner) to prove the Turing Test, thus without the disclaimer for that experiment. I agree though that under normal circumstances the disclaimer should be there.

In reply to Gareth Barnard

Posted by Marcus Green
For me the key element in the Turing test (https://en.wikipedia.org/wiki/Turing_test) is the susceptibility of the person taking it. Having said that I think most people would work out the responses were from a computer given enough questions, and by enough I mean measured in tens not hundreds.

Random fact: my neighbour growing up was one of Alan Turing's close friends (Rupert Morecomb).

In reply to Marcus Green

Posted by Gareth Barnard

But "Having said that I think most people would work out the responses were from a computer given enough questions, and by enough I mean measured in tens not hundreds." is a hypothesis, and therefore to fully understand if it is accurate then it needs to be tested to the point where there is sufficient replicable data upon which to assert that it is demonstrated to be true.
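
As a rough illustration of what testing that hypothesis could look like, here is a minimal sketch, assuming each student is shown a piece of feedback and asked to guess whether a human or the LLM wrote it. The function name `binomial_p_value` and the numbers in the example are hypothetical and are not taken from the plugin or from any real study. It computes an exact one-sided binomial p-value: how likely it would be to see at least that many correct guesses if students were only guessing.

```python
from math import comb


def binomial_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """Exact one-sided binomial p-value.

    Returns the probability of at least `correct` right guesses out of
    `trials` if every guess were right with probability `chance`
    (0.5 means pure guessing between "human" and "LLM").
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not real data.
    guesses_correct = 32
    guesses_total = 40
    p = binomial_p_value(guesses_correct, guesses_total)
    print(f"p-value under pure guessing: {p:.6f}")
```

A small p-value would suggest students can tell the two sources apart better than chance; a large one would not prove they cannot, only that this sample does not show it, which is why replication would matter.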

In reply to Gareth Barnard

Posted by Joseph Thibault

@Gareth, additional answers for you for 1. and 2.

1. Could the tool detect if a response was AI generated before expending time pre-marking?

This is possible with other tools, including Cursive, which extend Tiny or plug into a backend for similarity or AI detection. Cursive creates a revision history for each essay so the teacher can see how it was constructed, even showing a replay in case a student had the right response but deleted or changed it.

2. What do students feel about a machine telling them, a human, how to communicate in their human language?

Marcus and I touched on this in the webinar with help from research into his tool conducted by Sojo University. The short of it is twofold: a) students took the feedback as gospel, so when it replaces or comes before teacher feedback, it's treated as authoritative (which is problematic if the feedback is inaccurate, wrong, and/or inconsistent), and b) students generally were very positive about the feedback.

The full journal article is great: a preliminary but essential grounding in how AI feedback changes things: https://www.castledown.com/journals/tltl/article/view/tltl.v7n1.2208

-Joe

Average of ratings: Useful (3)

In reply to Joseph Thibault

Posted by Gareth Barnard
Dear Joe,

Thank you. I'm in the process of watching the webinar and will follow up on the article. Interesting, very interesting. Already have more thoughts.

Kind regards,

Gareth