Hi,
During my implementation of MDL-53213 I realised that MUC identifiers can change the cache pointer of an existing cache, e.g.:
$properties = array('dbfamily' => 'pgsql');
$cache = cache::make('core', 'databasemeta', $properties);
Then:
$properties = array('dbfamily' => 'mysql');
$cache2 = cache::make('core', 'databasemeta', $properties);
This will result in $cache = $cache2, which is not what I would have expected as a developer. It feels dangerous and counterintuitive.
Proposal:
- Identifiers and sharing options are added to the definition_hash.
- Identifiers cannot be changed once cache::make has been called. You will need to make a second cache::make call if you need a different set of cache options.
- Remove multikeyprefix. The store would handle this itself, using the singlekeyprefix along with the definition hash if the store is interested in it. Some stores may need it for keys; others may not, because the definition hashes will already have separated the caches.
A cache will then be uniquely identified by those things, and the code above would result in two separate instances. It also has the benefit of allowing developers to purge a cache based on that new definition. So in the databasemeta case, I would purge just the pgsql cache, and not all of them. Developers can then use this to treat identifiers like smaller sub-caches for certain use cases. I've had a couple of cases where I wanted the identifier to be a userid or forumid and to manage it as a separate cache.
Generating the singlekeyprefix would then not be required, because the singlekeyprefix is the definition hash.
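To make the idea concrete, here is a minimal sketch of what I mean, written as standalone PHP rather than against the real MUC classes. The names sketch_definition_hash and sketch_make are hypothetical and only stand in for the definition hash and cache::make; the hashing scheme (sha1 over JSON) is an assumption for illustration, not what MUC does today.

```php
<?php
// Hypothetical helper, not part of MUC: builds a hash from the component, area,
// identifiers and sharing options, so each combination is a distinct cache.
function sketch_definition_hash(string $component, string $area, array $identifiers = [], int $sharing = 0): string {
    // Sort identifiers by name so that their order does not change the hash.
    ksort($identifiers);
    return sha1(json_encode([$component, $area, $identifiers, $sharing]));
}

// Hypothetical stand-in for cache::make: returns one instance per definition hash.
function sketch_make(string $component, string $area, array $identifiers = []): ArrayObject {
    static $instances = [];
    $hash = sketch_definition_hash($component, $area, $identifiers);
    if (!isset($instances[$hash])) {
        $instances[$hash] = new ArrayObject();
    }
    return $instances[$hash];
}

$cache = sketch_make('core', 'databasemeta', ['dbfamily' => 'pgsql']);
$cache2 = sketch_make('core', 'databasemeta', ['dbfamily' => 'mysql']);
$cache['columns'] = 'pgsql columns';
$cache2['columns'] = 'mysql columns';

// Purging the pgsql cache leaves the mysql cache untouched.
$cache->exchangeArray([]);
```

With this, the two calls give two separate instances, and asking again with the same identifiers gives back the existing instance.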
The way the definition hash and key hash are handled should then be pushed down into the store. The reason for this is that, for example:
- Redis can use a hash for each definition.
- A filestore can do the same thing.
Memcached would need to use a different strategy of prefixing the definition hash before the key_hash in a single key. But it has to do that now. It will still have all the same purge problems. But other caches would no longer suffer from purging all versions of identifiers and sharing options when a cache is purged.
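A rough sketch of the two strategies, using plain in-memory arrays instead of real Redis or memcached connections. Both classes and all their method names are hypothetical and exist only to show the difference in how a store could use the definition hash; they are not the existing cache store interface.

```php
<?php
// Hypothetical store that groups entries per definition hash, in the way a
// Redis hash or a directory in a file store could.
class sketch_grouped_store {
    private array $data = [];

    public function set(string $definitionhash, string $keyhash, $value): void {
        $this->data[$definitionhash][$keyhash] = $value;
    }

    public function get(string $definitionhash, string $keyhash) {
        return $this->data[$definitionhash][$keyhash] ?? false;
    }

    // Purging only touches the entries belonging to one definition hash.
    public function purge(string $definitionhash): void {
        unset($this->data[$definitionhash]);
    }
}

// Hypothetical store with a flat keyspace, memcached style: the definition
// hash is prefixed to the key hash to form a single key.
class sketch_flat_store {
    private array $data = [];

    private function key(string $definitionhash, string $keyhash): string {
        return $definitionhash . '-' . $keyhash;
    }

    public function set(string $definitionhash, string $keyhash, $value): void {
        $this->data[$this->key($definitionhash, $keyhash)] = $value;
    }

    public function get(string $definitionhash, string $keyhash) {
        return $this->data[$this->key($definitionhash, $keyhash)] ?? false;
    }

    // This sketch assumes keys cannot be listed by prefix, so purge clears everything.
    public function purge(): void {
        $this->data = [];
    }
}

$grouped = new sketch_grouped_store();
$grouped->set('pgsqlhash', 'k1', 'a');
$grouped->set('mysqlhash', 'k1', 'b');
$grouped->purge('pgsqlhash');

$flat = new sketch_flat_store();
$flat->set('pgsqlhash', 'k1', 'a');
$flat->set('mysqlhash', 'k1', 'b');
$flat->purge();
```

After the purge, the grouped store still holds the mysql entry, while the flat store has lost both.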
There is currently very little documentation about the usefulness of identifiers and their original intended use. I have spent quite some time working on the caching code and still don't understand the purpose they were made for. My idea above is much clearer to me.
A possible future direction is to also push the calculation of the key into the store rather than the loader. The store is the one that needs to handle restrictions on the keyspace, whether that is key size or the characters allowed. Each store could determine the best rules for itself. This, coupled with the definition hash change, would let a store decide how it indexes the content we send to it. File stores can have directories, Redis can have hashes, and MongoDB can do the same kind of thing. Memory caches can just use keys as-is, and memcached could put the keys together, as it can't ensure unique keys without that.
I would like feedback on whether others think this is a move in the right direction and if there are any objections or ideas around this proposal.