RFC: GDPR/Privacy changes

RFC: GDPR/Privacy changes

by Andrew Lyons -
Number of replies: 3
Picture of Core developers Picture of Moodle HQ Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers Picture of Testers

Hi everyone,

As many of you will be aware Moodle introduced the Privacy API for the 3.5.0, 3.4.3, and 3.3.6 releases. This API was intended to help developers and administrators to comply with new EU legislation on data privacy, and puts Moodle at the forefront with regards GDPR functionality for Learning Management Systems worldwide.

As part of our commitment to this functionality, and the protection of personal user data, we have since been working on a number of followup issues to fix identified issues, address incorrect assumptions and understandings, and to add new functionality.

Some of these changes include:

  • The ability to make a policy optional (MDL-62309)
  • The ability to promote some policies ahead of others (MDL-63013)
  • Automatic creation of deletion requests when a user is deleted in Moodle (including deletion via CSV, and bulk interfaces) (MDL-62564)
  • An improved GDPR export format (HTML single-page app) (MDL-62491)
  • Improvements to the performance of the export processing stage (MDL-62602)
  • Control over which users can download SAR request data (thanks to Sam Marshall MDL-61652)
  • Allowing a Privacy Officer to make a request on behalf of another user (thanks to Jan Dageförde MDL-62544)
  • More fine-grained control of default purposes to include per module-type defaults (MDL-62554)

One of the other, bigger, changes we are currently working on adds the ability to define role-specific overrides to retention rules. This work is being progress in MDL-62560 and allows organisations to retain content for longer, or shorter, periods based upon the role of the user in a context.

Typically we expect this functionality to be utilised by Universities who wish to retain staff content for longer than student content, though the functionality is reasonably flexible and supports a wide range of use-cases.

Whilst the other changes we have made to the privacy API over the past few months have required no changes for plugin developers, this new per-role functionality does indeed require a change for some plugin developers - specifically those whose plugins store user data and which implement the privacy provider.

The current design of the user data provider interface is such that it allows the API to:

  • determine which contexts a user has data within;
  • delete data for a specific user in a specific set of contexts; and
  • delete data for all users in a specific contect.

In order to support per-role deletion strategies, we need to determine which users are present in the context, and then apply the relevant role filters. This means that we therefore need a way to determine which users have content in a specific context.

To this end, we have created a new core_userlist_provider interface, which contains two functions used to fetch the list of users, and to delete content for a subset of those users.

Similar to the existing core_user_data_provider interface, the new interface has a get_users_in_context function, the concept of a userlist with users being added to this userlist via an add_from_sql call. However, following lessons learned during the original work these both have a slightly different signature and behaviour.

Firstly, the userlist is provided as the sole argument to the get_users_in_context function. This userlist has helpers functions to retrieve the context being queried, and the component being queried. This differs from the existing get_contexts_for_user function where the plugin developer had to create the contextlist and the function arguments included a userid argument only. We felt that it would be clearer to provide the userlist object and doing so allows for a clearer implementation in the privacy manager code. This also make work more lightweight for plugin developers as they do not need to instantiate the list themselves, nor to return any value.

    /**
     * Get the list of contexts that contain user information for the specified user.
     *
     * @param   userlist    $userlist   The userlist containing the list of users who have data in this context/plugin combination.
     */
    public static function get_users_in_context(userlist $userlist);

Secondly the add_from_sql differs following performance lessons learnt and addressed in MDL-62602. We found that many of the existing providers make use of fairly large queries. We found that these did not perform well when run as subqueries (as was done in the original release).
We have modified this add_from_sql call slightly by asking you to provide the name of the field being retrieved as this allows for a more efficient query. We also encourage the writing of a greater number of smaller queries which tend to perform better than one large query with dependant LEFT JOIN statements within it.

    /**
     * Add a set of users from  SQL.
     *
     * The SQL should only return a list of user IDs.
     *
     * @param   string  $fieldname The name of the field which holds the user id
     * @param   string  $sql    The SQL which will fetch the list of user IDs
     * @param   array   $params The set of SQL parameters
     * @return  $this
     */
    public function add_from_sql(string $fieldname, string $sql, array $params) : userlist {

This body of work is still a work-in-progress, but as we have started to implement it for a number of the existing providers, we feel that it is largely correct and do not anticipate making many changes to this part of the code.

In order to support this functionality, plugin developers will need to extend the new core_userlist_provider interface, in addition to the plugin/provider that they already implement.

You can see the new Provider Interface, and an example of its usage for the Forum. Note: These are still a work-in-progress, and there may be minor changes to the functionality over the coming week or so.

Any constructive feedback on these changes would be greatfully appreciated.

Best wishes,

Andrew Nicols

Average of ratings: Useful (5)
In reply to Andrew Lyons

Re: RFC: GDPR/Privacy changes

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers

Thanks for this useful post Andrew.

Regarding the new interface - we have quite a few (mainly not published) plugins, even if you only count the ones with non-null providers, and we are keen to avoid having to change them all again. smile

We don't currently need the role-specific retention feature although I can imagine maybe it might be useful at some future point.

So my question is, will plugins that have been updated to support the original privacy API continue to work without implementing the new interface, although obviously without the role-specific retention options working? If so, will this continue indefinitely, or is it considered deprecated so that at some future Moodle version we will have to implement it?

A minor comment about the new API - basically seems very sensible. Trivial comment regarding this line:

@param   array   $params The set of SQL parameters
The phpdoc should define what type, i.e. explicitly say they should be named parameters, which I assume is the case...

--sam


Average of ratings: Useful (1)
In reply to sam marshall

Re: RFC: GDPR/Privacy changes

by Andrew Lyons -
Picture of Core developers Picture of Moodle HQ Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers Picture of Testers

Hi Sam,

So my question is, will plugins that have been updated to support the original privacy API continue to work without implementing the new interface, although obviously without the role-specific retention options working?

At this point, if you do not implement the interface then no content will be deleted until all roles have expired (I call this 'fully' expired).

It will currently fail silently (though I intend to have it add to the log with a warning). We considered a poor-man's fallback, but decided against this due to performance concerns (calling get_contexts_for_user for every user and adding a context restriction to the results).

If so, will this continue indefinitely, or is it considered deprecated so that at some future Moodle version we will have to implement it?

At this time we haven't quite decided. This is being implemented as an additional interface to implement rather than a replacement for the existing interface. Ideally we will require this be implemented at some point in the near future. The changes, in our experience, are pretty minor.

The phpdoc should define what type, i.e. explicitly say they should be named parameters, which I assume is the case...

Thanks, but it's entirely up to you what they should be. You can choose named, or placeholder because you are the one supplying both the SQL and the params. You pass this into the add_from_sql function which just wraps it in a sub-join on the user table:

            SELECT DISTINCT u.id
            FROM {user} u
            JOIN ({$sql}) target ON u.id = target.{$fieldname}";

And here's an example of selecting authors of all forum posts in a context:

        $sql = "SELECT p.userid
                  FROM {course_modules} cm
                  JOIN {modules} m ON m.id = cm.module AND m.name = :modulename
                  JOIN {forum} f ON f.id = cm.instance
                  JOIN {forum_discussions} d ON d.forum = f.id
                  JOIN {forum_posts} p ON d.id = p.discussion
                 WHERE cm.id = :instanceid";
        $userlist->add_from_sql('userid', $sql, $params);

Note: The first param is userid which is the {$fieldname} in the SQL.

Thanks for the feedback,

Andrew

In reply to Andrew Lyons

Re: RFC: GDPR/Privacy changes

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers

Aha! I assumed the query it would join with would need params but I was wrong. smile (Maybe the phpdoc should explicitly say that either type is allowed, but I guess it doesn't say that in every dml function...)

I get that the changes are probably minor - we have 40 custom plugins with non-null providers, which I realise is smaller than the number in core, but it might still take us a while...

The point about showing warnings if there's data not being deleted (and maybe indicating the lack of support in the plugin registry page?) sounds useful.

--sam