data privacy plugin 3.3.5 hanging in cron

by Tim Gildersleeve -

Hi all

I tried to post this in the Moodle Tracker, but I am unable to post there without checking for existing issues (which I have done, and I cannot find this issue reported elsewhere in the tracker). I am also not sure this is a bug: it may simply be related to my setup, as others seem to be fine by all accounts.

Using the data privacy plugin on Moodle 3.3.5, after making a DSAR as a user and running cron, it hangs at:

Execute adhoc task: tool_dataprivacy\task\initiate_data_request_task
... started 09:38:58. Current memory use 46MB.
Generating the contexts containing personal data for the user...

The CPU use on the database server shoots up - but nothing is resolved.   I have left this for several hours.

I accept that this may well be an issue with my instance of Moodle, but something is causing this process to hang with no obvious cause that I can fix.

Am I alone in this or has anyone else reported this issue?

Moodle version: git pull as of this morning of 3.3.5.

Moodle server: Debian 8 running Nginx and php-fpm5.6

Database server Debian 8 running MariaDB 10.1.32 (accessed via ssh tunnel from Moodle server)


If I can't find anything, I am going to build a new virtual server and try again on 3.3.5, and then on 3.4, to see if either works OK for me. I DON'T want to go to 3.4 yet (I plan to go to 3.5 over the summer) as I don't want to install php7 on my production server mid-year, so I hope that 3.3.5 can work.




In reply to Tim Gildersleeve

Re: data privacy plugin 3.3.5 hanging in cron

by Tim Gildersleeve -

I should add that the request in question sticks at "pre-processing". This worked OK on a test system with no data, but when trying it with a copy of our REAL data the above happens.

One potentially relevant bit of information is that, while my test system has a copy of the real data, it does not have the actual "files" from the moodledata folder.

In reply to Tim Gildersleeve

Re: data privacy plugin 3.3.5 hanging in cron

by Andrew Lyons -

Hi Tim,

I suspect that one of the Privacy implementations is executing a particularly difficult query, and that is why it is stuck.

We currently don't have any debugging in there to show which plugin is executing (I have just created MDL-62371 for this).

If you can, I'd suggest opening the privacy/classes/manager.php file, finding the get_contexts_for_userid function, and adding the following just after the foreach line:

mtrace("Processing {$component}");

Then run the cron task and have a look at the output. That should tell you where it's getting stuck and help us to track it down.

The other thing you can do is to look at your DB server and see what query it is getting stuck on.

You mention that this is a test system. Does that mean that the DB server is not well tuned?
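On the point about finding which query the DB server is stuck on: on MariaDB, `SHOW FULL PROCESSLIST` lists each running statement with its elapsed `Time` in seconds and the statement text in `Info`. Below is a minimal sketch of picking out the long-running statements, assuming the rows have already been fetched as dictionaries by whichever client you use. The helper name, the 60-second threshold and the sample rows are hypothetical and for illustration only.

```python
def long_running(rows, min_seconds=60):
    """Return (Time, Id, Info) for statements running at least min_seconds.

    `rows` is assumed to be the result of SHOW FULL PROCESSLIST, one dict
    per row, keyed by the column names the server reports (Id, Time, Info...).
    Idle connections have no statement text (Info is NULL) and are skipped.
    """
    hits = [
        (int(r["Time"]), r["Id"], r["Info"])
        for r in rows
        if r.get("Info") and int(r["Time"]) >= min_seconds
    ]
    # Longest-running statement first.
    return sorted(hits, key=lambda h: h[0], reverse=True)


# Hypothetical sample rows, not taken from any real server.
sample = [
    {"Id": 11, "Command": "Sleep", "Time": 5, "Info": None},
    {"Id": 12, "Command": "Query", "Time": 4210, "Info": "SELECT ..."},
    {"Id": 13, "Command": "Query", "Time": 0, "Info": "SHOW FULL PROCESSLIST"},
]

for seconds, conn_id, statement in long_running(sample):
    print(seconds, conn_id, statement)
```

Whatever statement sits at the top of that list while the adhoc task is running is the one worth running through `EXPLAIN`.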

In reply to Andrew Lyons

Re: data privacy plugin 3.3.5 hanging in cron

by Tim Gildersleeve -

Hi Andrew

Thanks for your reply, I have just added that line and am running it now but have to go to a meeting in a moment so will be looking at the results when I come back.

The database resides on the same server as the live database, but obviously as a separate DB. The server (a VM running on VMware virtual infrastructure) has 4 cores (if I remember correctly) and 24GB RAM. It powers our live Moodle, which is well used, so I don't think there is a problem there.

I will let you know the results of the cron when it next runs. Thanks again.


Tim

In reply to Tim Gildersleeve

Re: data privacy plugin 3.3.5 hanging in cron

by Tim Gildersleeve -

Ok, I think I can see the issue.   

Thanks again for the debug code, Andrew. I think our issue is our tutors, who hoard old assignments (this will have to change with GDPR). For Turnitin assignments I delete ALL assignments (after moving the current year to an archive server) and they have to recreate them each year. For standard Moodle assignments, however, they have some assignments that span multiple years, so I have not in the past been able to remove them at the year-end rollover.

Needless to say we probably have a "LOT" of assignments - probably at least half are defunct (likely many more).   The debug code not surprisingly is now stuck at "Processing mod_assign".   I will leave it running to see if it actually DOES finish eventually!

I think I am going to have to address this issue with the assignments before we can use this tool. For testing purposes, I may just delete all assignments on the dev server to make sure there are no other issues, but I think the issue is the number of assignments.

I have just looked into this and there are only around 4000 assignments, but as they are reused each year and submissions are currently not cleared, there are nearly 94,000 submissions. I expect this is where the processing bottleneck is.


In reply to Tim Gildersleeve

Re: data privacy plugin 3.3.5 hanging in cron

by Andrew Lyons -

Hi Tim,

Thanks for tracking that down.

I've been running some exports today on one of our larger test datasets and I've identified a couple of places where there are some potential performance issues - this is the most noticeable one.

We're actively working on these and I hope to finish finding and fixing these in time for the release.

Andrew

In reply to Andrew Lyons

Re: data privacy plugin 3.3.5 hanging in cron

by Tim Gildersleeve -

Hi Andrew

Any increase in performance would be great. I am sure the nature of our data is the largest cause of this, but as it stands at the moment I would probably not enable this functionality on our live server. Instead I would deal with incoming DSARs on a separate server. That way I can schedule cron to run less frequently on that server.

As it stands, the cron run from yesterday is still going and is currently sitting on mod_lesson. This one surprises me, as I am not aware of our tutors using lessons much (it seems I may be wrong!). It is progressing, but it has taken several hours so far, and I am very interested to see how long it takes in the end. Once it has finished I will repeat the process on another test server with a local database, rather than a remote database over an ssh tunnel, to see whether the tunnel overhead is causing issues.



In reply to Tim Gildersleeve

Re: data privacy plugin 3.3.5 hanging in cron

by Andrew Lyons -

Thanks Tim,

We can definitely improve it - we're making some changes in MDL-62384 and we are going through all plugins and subsystems to check for others.

The nature of the issue is that, essentially, some of the database joins lead to the inclusion of too many rows and we are filtering too late. The fix is relatively simple, and we've nearly finished the fixes. On my sample dataset the time for mod_assign went from over 30 minutes (I cancelled before it completed) to a few seconds. I would expect other affected locations to be similar.
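To illustrate the general "filtering too late" pattern, here is a minimal sketch using an in-memory SQLite database. The two tables and their columns are hypothetical and heavily simplified. They are not Moodle's real schema, and the queries are not the ones changed in MDL-62384. The sketch only shows that restricting to the user inside the join returns the same answer while handling far fewer rows.

```python
import sqlite3

# Hypothetical, simplified tables; NOT Moodle's real schema or queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assignment (id INTEGER PRIMARY KEY, course INTEGER);
    CREATE TABLE submission (
        id INTEGER PRIMARY KEY, assignment INTEGER, userid INTEGER
    );
""")
conn.executemany(
    "INSERT INTO assignment (id, course) VALUES (?, ?)",
    [(a, a % 10) for a in range(1, 41)],
)
# 40 assignments x 25 users = 1000 submissions.
conn.executemany(
    "INSERT INTO submission (assignment, userid) VALUES (?, ?)",
    [(a, u) for a in range(1, 41) for u in range(1, 26)],
)

USERID = 7

# Filtering late: build every assignment/submission pair first, then
# throw away the rows that belong to other users.
late_rows = conn.execute(
    "SELECT a.id, s.userid FROM assignment a "
    "JOIN submission s ON s.assignment = a.id"
).fetchall()
late = sorted({a for a, u in late_rows if u == USERID})

# Filtering early: restrict to the user as part of the join itself.
early_rows = conn.execute(
    "SELECT a.id FROM assignment a "
    "JOIN submission s ON s.assignment = a.id AND s.userid = ?",
    (USERID,),
).fetchall()
early = sorted({a for (a,) in early_rows})

print(len(late_rows), len(early_rows), late == early)  # prints: 1000 40 True
```

With tens of thousands of submissions the gap between the two row counts grows accordingly, which is consistent with the kind of speed-up described above.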

Andrew

In reply to Andrew Lyons

Re: data privacy plugin 3.3.5 hanging in cron

by Tim Gildersleeve -

That sounds like REALLY good news, Andrew. I look forward to seeing the difference once that's available.

I also have some questions about the download students will get, and whether it's possible to make sure students can't download the data directly, as someone (hopefully not me!) will have to go through the data to redact anything that contains anyone else's data (thinking here about forums etc.). I will raise a second post about this if I can't find something.

The whole GDPR is great - and a challenge is always interesting - but it really is a "rabbit hole" once we start looking at it I'm thinking!



In reply to Tim Gildersleeve

Re: data privacy plugin 3.3.5 hanging in cron

by Andrew Lyons -

Hi Tim,

At this time I don't believe we've implemented the ability to prevent a user accessing their download. It is on the roadmap for the future.

That said, the GDPR download has been designed to only include the relevant data that a user has contributed, and any additional necessary contextual data.

Andrew

In reply to Andrew Lyons

Re: data privacy plugin 3.3.5 hanging in cron

by Tim Gildersleeve -

Thanks Andrew,

Does the download not include things like forum posts in which the student may have quoted another student? I have read a lot of differing views on how forum posts should be viewed in light of GDPR, but I think they should be treated in the same way as email. With a DSAR we (currently) always have to send the results to HR, who go through the data and redact as needed. It is college policy at the moment to do this, so I am told that anywhere a student can use free text that may be included in the download from Moodle will mean the download needs to be gone through and redacted (personally I don't envy anyone going through many JSON files to redact data, but that's not my decision).

In reply to Andrew Lyons

Re: data privacy plugin 3.3.5 hanging in cron

by John Packiaraj -

Hi,

I am having a problem too. 


When running cron, it comes to this particular point:

Execute adhoc task: tool_dataprivacy\task\initiate_data_request_task
... started 08:55:14. Current memory use 35.6MB.
Generating the contexts containing personal data for the user...
  Fetching data from 474 components (Wednesday, 21 November 2018, 8:55 AM)
Processing antivirus_clamav
    Processing antivirus_clamav (1/474) (Wednesday, 21 November 2018, 8:55 AM)
Processing availability_completion
.

.

.

.

Processing qbehaviour_adaptive_adapted_for_coderunner
    Processing qbehaviour_adaptive_adapted_for_coderunner (233/474) (Wednesday, 21 November 2018, 8:55 AM)

After this point, the job just stops.  There is no confirmation that the task completed successfully.
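One way to pin down where a run like this stopped is to pull the last `(n/total)` progress line out of the saved cron output. A minimal sketch, assuming the log format matches the excerpt above. The function name is hypothetical, and whether the progress line is printed before or after a component is handled is not something the log itself shows, so both the component it names and the next one in the list are worth checking.

```python
import re

# Matches lines such as:
#     Processing antivirus_clamav (1/474) (Wednesday, 21 November 2018, 8:55 AM)
PROGRESS = re.compile(r"^\s*Processing (\S+) \((\d+)/(\d+)\)")


def last_component(log_text):
    """Return (component, index, total) for the last progress line, or None."""
    last = None
    for line in log_text.splitlines():
        match = PROGRESS.match(line)
        if match:
            last = (match.group(1), int(match.group(2)), int(match.group(3)))
    return last


log = """\
  Fetching data from 474 components (Wednesday, 21 November 2018, 8:55 AM)
Processing antivirus_clamav
    Processing antivirus_clamav (1/474) (Wednesday, 21 November 2018, 8:55 AM)
Processing qbehaviour_adaptive_adapted_for_coderunner
    Processing qbehaviour_adaptive_adapted_for_coderunner (233/474) (Wednesday, 21 November 2018, 8:55 AM)
"""

print(last_component(log))
# prints: ('qbehaviour_adaptive_adapted_for_coderunner', 233, 474)
```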


Can you please shed some light on this?

Thanks and Regards,

JSP