MUC identifiers, proposed change to implementation

by Russell Smith -

Hi,

During my implementation of MDL-53213 I realised that MUC identifiers can change the cache pointer of an existing cache, e.g.:

$properties = array('dbfamily' => 'pgsql');
$cache = cache::make('core', 'databasemeta', $properties);

Then:

$properties = array('dbfamily' => 'mysql');
$cache2 = cache::make('core', 'databasemeta', $properties);


This will result in $cache = $cache2: the second call changes what the existing $cache points at. That is not what I would have expected as a developer. It feels dangerous and counterintuitive.
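The effect can be reproduced with a small toy model. This is only a sketch: toy_definition, toy_cache and toy_factory are hypothetical classes, not Moodle's cache API, and the shared, mutable definition is an assumption made for illustration of how a second make call could alter an earlier handle.

```php
<?php
// Toy model only: these classes are hypothetical and are NOT Moodle's cache API.

class toy_definition {
    /** @var array Identifiers currently set on this (shared) definition. */
    public $identifiers = [];
}

class toy_cache {
    private $definition;

    public function __construct(toy_definition $definition) {
        $this->definition = $definition;
    }

    /** The key prefix is derived from whatever identifiers the definition holds right now. */
    public function prefix(): string {
        return http_build_query($this->definition->identifiers);
    }
}

class toy_factory {
    private static $definitions = [];

    /** One definition object per component/area, reused across calls (assumption). */
    public static function make(string $component, string $area, array $identifiers): toy_cache {
        $id = $component . '/' . $area;
        if (!isset(self::$definitions[$id])) {
            self::$definitions[$id] = new toy_definition();
        }
        // Mutating the shared definition also affects caches handed out earlier.
        self::$definitions[$id]->identifiers = $identifiers;
        return new toy_cache(self::$definitions[$id]);
    }
}

$cache  = toy_factory::make('core', 'databasemeta', ['dbfamily' => 'pgsql']);
$before = $cache->prefix();
$cache2 = toy_factory::make('core', 'databasemeta', ['dbfamily' => 'mysql']);
$after  = $cache->prefix();
```

In this model $before reflects the pgsql identifiers, while $after, read from the same $cache variable, reflects the mysql ones.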

Proposal:

  • Identifiers and sharing options would be added to the definition_hash.
  • Identifiers could not be changed once cache::make has been called. You would need to make a second cache::make call if you need a different set of cache options.
  • Remove multikeyprefix. The store would handle this using the singlekeyprefix together with the definition hash, if the store has a use for it. Some stores may need it for keys; others may not, as the definition hashes will already have separated the data.

This would then uniquely identify a cache by those things, and the code above would result in two separate instances. It also has the benefit of allowing developers to purge a cache based on that new definition. So in the databasemeta case, I could purge just the pgsql cache, and not all of them. Developers could then use identifiers to create smaller sub-caches for certain use cases. I've had a couple of cases where I wanted the identifier to be a userid or forumid and to manage it as a separate cache.
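A minimal sketch of the idea, under stated assumptions: toy_definition_hash and toy_cache_registry are hypothetical names, not existing Moodle functions, and md5 over component, area and identifiers is just one way the hash could be built. Sharing options are left out for brevity but would be folded into the hash in the same way.

```php
<?php
// Hypothetical helper, not an existing Moodle function.
function toy_definition_hash(string $component, string $area, array $identifiers = []): string {
    // Sort so the order in which identifiers are passed does not change the hash.
    ksort($identifiers);
    return md5($component . '/' . $area . '?' . http_build_query($identifiers));
}

// Hypothetical registry: one cache instance per definition hash.
class toy_cache_registry {
    private $instances = [];

    public function make(string $component, string $area, array $identifiers = []): ArrayObject {
        $hash = toy_definition_hash($component, $area, $identifiers);
        if (!isset($this->instances[$hash])) {
            $this->instances[$hash] = new ArrayObject();
        }
        return $this->instances[$hash];
    }

    /** Purge only the sub-cache selected by these identifiers. */
    public function purge(string $component, string $area, array $identifiers = []): void {
        $hash = toy_definition_hash($component, $area, $identifiers);
        if (isset($this->instances[$hash])) {
            $this->instances[$hash]->exchangeArray([]);
        }
    }
}

$registry = new toy_cache_registry();
$pgsql = $registry->make('core', 'databasemeta', ['dbfamily' => 'pgsql']);
$mysql = $registry->make('core', 'databasemeta', ['dbfamily' => 'mysql']);
$pgsql['columns'] = 'pg columns';
$mysql['columns'] = 'my columns';

// Only the pgsql sub-cache is emptied; the mysql one keeps its data.
$registry->purge('core', 'databasemeta', ['dbfamily' => 'pgsql']);
```

Here the two make calls return distinct instances, and purging by identifier leaves the other sub-cache untouched.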

Generating the singlekeyprefix would then not be required; the singlekeyprefix is the definition hash.

The handling of the definition hash and key hash should then be pushed down into the store. One reason for this is that stores can use their own native structures, for example:

  • Redis can use a hash for each definition.
  • A file store can do the same thing.

Memcached would need a different strategy: prefixing the key hash with the definition hash in a single key. But it has to do that now, and it will still have all the same purge problems. Other caches, however, would no longer suffer from purging all versions of identifiers and sharing options when a cache is purged.
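The difference between the two strategies can be sketched as follows. The toy_store interface and both classes are hypothetical and are not Moodle's cache_store API; the flat store models, as an assumption, a backend that cannot list its keys and so can only purge by dropping everything.

```php
<?php
// Hypothetical interface and stores for illustration; not Moodle's cache_store API.
interface toy_store {
    public function set(string $definitionhash, string $keyhash, $value): void;
    public function get(string $definitionhash, string $keyhash);
    public function purge(string $definitionhash): void;
}

/** Models a store with native grouping (a Redis hash, or a directory in a file store). */
class toy_grouped_store implements toy_store {
    private $groups = [];

    public function set(string $definitionhash, string $keyhash, $value): void {
        $this->groups[$definitionhash][$keyhash] = $value;
    }

    public function get(string $definitionhash, string $keyhash) {
        return $this->groups[$definitionhash][$keyhash] ?? false;
    }

    public function purge(string $definitionhash): void {
        // Only the group belonging to this definition hash is removed.
        unset($this->groups[$definitionhash]);
    }
}

/** Models a flat key/value store that cannot enumerate its keys (assumption). */
class toy_flat_store implements toy_store {
    private $data = [];

    public function set(string $definitionhash, string $keyhash, $value): void {
        $this->data[$definitionhash . '-' . $keyhash] = $value;
    }

    public function get(string $definitionhash, string $keyhash) {
        return $this->data[$definitionhash . '-' . $keyhash] ?? false;
    }

    public function purge(string $definitionhash): void {
        // Without key enumeration the only safe purge is to drop everything.
        $this->data = [];
    }
}

$results = [];
foreach ([new toy_grouped_store(), new toy_flat_store()] as $store) {
    $store->set('pgsqlhash', 'k1', 'pg');
    $store->set('mysqlhash', 'k1', 'my');
    $store->purge('pgsqlhash');
    $results[get_class($store)] = $store->get('mysqlhash', 'k1');
}
```

After purging the pgsql definition, the grouped store still returns the mysql entry, while the flat store has lost it.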

There is currently very little documentation about the usefulness of identifiers and their original intended use. I have spent quite some time working on the caching code and still don't understand the purpose they were made for. My idea above is much clearer to me.

A possible future direction is also to push the calculation of the key into the store rather than the loader. The store is the one that needs to handle restrictions on the keyspace, whether that is key size or the characters allowed. Each store could determine the best rules for itself. This, coupled with the definition hash change, would let a store decide how it indexes the content we send to it. File stores can have directories, Redis can have hashes, and MongoDB can do the same kind of thing. Memory caches can just use keys as-is, and memcached could put the keys together, as it can't ensure unique keys without that.
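As a rough sketch of what store-specific key calculation could look like: the three functions below are hypothetical, and the memcached limits used (keys of at most 250 bytes, no whitespace or control characters) are stated here as an assumption about that backend rather than taken from this thread.

```php
<?php
// Hypothetical functions; each models how one kind of store might map
// (definition hash, key) onto its own index.

/** File store: one directory per definition, hashed file name to avoid unsafe characters. */
function toy_file_key(string $definitionhash, string $key): string {
    return $definitionhash . '/' . sha1($key) . '.cache';
}

/** Memcached-like store: a single flat key, hashed only when it breaks the assumed limits. */
function toy_memcached_key(string $definitionhash, string $key): string {
    $candidate = $definitionhash . '-' . $key;
    // Assumed limits: at most 250 bytes, no whitespace or control characters.
    if (strlen($candidate) > 250 || preg_match('/[\s\x00-\x1f\x7f]/', $candidate)) {
        return $definitionhash . '-' . sha1($key);
    }
    return $candidate;
}

/** In-memory store: the key is used as-is inside a per-definition array. */
function toy_memory_key(string $definitionhash, string $key): array {
    return [$definitionhash, $key];
}

$simple = toy_memcached_key('abc', 'simple');
$spaced = toy_memcached_key('abc', 'has space');
$long   = toy_memcached_key('abc', str_repeat('x', 300));
```

The loader would pass the raw key and definition hash down, and each store would apply only the restrictions it actually has.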


I would like feedback on whether others think this is a move in the right direction and if there are any objections or ideas around this proposal.

In reply to Russell Smith

Re: MUC identifiers, proposed change to implementation

by Russell Smith -

There has been no feedback on this over a reasonable amount of time. I'm bumping this for a couple of reasons:

1. To see if there can be any buy-in to the idea or comments on it.
2. It has come to my attention through MDL-53875 that the session cache is broken and is a perfect case for the use of identifiers.

To try to explain 2 a little more:

The session cache for users is just a set of elements split by session_id(). That is the same as a set of elements split by an identifier. With the capability of the proposal above, session caches become much more like application caches in terms of code and management. In my view it would reduce the complexity of the code and provide a more consistent interface for users. As MDL-53875 shows, there isn't any testing around the session cache that ensures it cleans up correctly or can find the right sessions to destroy. Work on MDL-55604 also showed that there is no testing around identifiers at this time, and their reliability is only tested by some use of them in the databasemeta cache. A small amount of testing was added in MDL-55604; however, it is a complicated area.

In reply to Russell Smith

Re: MUC identifiers, proposed change to implementation

by Dan Poltawski -

I don't have the depth of experience with this to be able to offer much, other than saying that what you are proposing sounds reasonable and safer, so I encourage you to go forward with it.

On a general off-topic note about MUC - since you seem to be working the most in this area, my personal feeling (without actual technical analysis) is that if we should be doing anything, we should actually be reducing the feature set of MUC and only allowing developers to do very performant operations with it. Anecdotally, it seems there is so much functionality that many people don't know how to use, and too often we are coming across MUC uses which are utilising its 'flexibility' at the cost of performance (e.g. serialisation impact). On reflection, I think it would've been better to restrict the ways in which data can be stored in MUC, so that it was as fast as possible rather than as flexible as possible.
