ElggCollection plugin implementation

The entity list spec thread helped me a great deal, but I now have a working plugin implementation which supports creating/finding collections, modifying the items within, and using collections to modify elgg_get_entities() queries. There are also some simple tests.

  • Using query modifier objects allows the behavior of the collection to be applied to queries even if the underlying collection cannot be loaded. I also like that it makes the behavior explicit in code.
  • The model for altering queries with collections is passing in ElggXQueryModifier instance(s) via $options['xcollections'] to the family of elgg_get_entities() functions.
  • Actually, a function is provided to convert $options["xcollections"] to other array values that Elgg supports today.
  • Since this is based on ElggObjects, no core modifications are needed, although one table "xcollection_items" is created during activation.

Feedback much appreciated.

  • Looks like you are off to a fast start on this!

    Caveat: I only glanced at the code so far. Will look at it in more detail soon.

    1. Your additional table is an intersection table. It shouldn't need that id key unless you wanted to support an item being added to the list more than once (and I don't think we want to support that).
    2. We're upgrading comments to an ElggObject in Elgg 1.9 so the list can be entity-focused (I remember seeing something about storing annotations in the list)
    3. You are using bigint(20) for the priority. That's a lot of items in one list (sextillion?)
    4. Have you thought about implementing some of PHP's array interfaces?
  • @Cash:

    1. I agree. How should we reach a consensus on not allowing duplicate items per list? If so, I assume we also need a unique key on (guid, item).

    2. I've already got a request for sticky river items, and I'd hate to arbitrarily limit the collection's use. My plan is to remove "items_type" from the collection completely and put the onus on the API user to apply the collection correctly to queries. Imagine there'd need to be a slightly different API for river items, but at least it should be possible.

    3. Added priorities are stepped by 100 and can be negative, and when items are added between 2 items, they're spaced evenly apart. This makes it rare to have to touch more rows than the number of items inserted/deleted. Considering this, if we can get by with INT SIGNED I'm happy to do that. Small consideration: if a user treats a collection as a FILO stack (e.g. regularly pushing and shifting), the set of priorities would drift up. I guess I need something to look out for the case where they reach the column's max supported value.

    4. ElggData already implements ArrayAccess on the attributes, so I don't want to break anything that relies on that. Also the items modification API is already pretty easy, and I'd hate to make it seem just like an array since all operations require at least one DB query. A particularly nasty example would be if someone used usort().

  • 1. What Elgg needs is a lightweight Request For Comment on design issues like this. A unique key would prevent duplicates at the database layer so that's the most foolproof.

    2. I think we will have to upgrade river items to entities eventually. There are too many requests for adding additional fields or adding commenting to continue with the current solution. Need to do some performance testing to see what the effect of that would be.

    3. I need to look at the managing of priorities. As you say, we need code in there to manage max priorities anyway.

    4. Too bad the original developers implemented ArrayAccess already. Doesn't seem that useful compared to doing a foreach over a collection.

  • @Cash, agreed on 1-3. On point 4, consider what calling foreach would do under the hood: make N offsetGet() calls, each requiring a separate query having to select all the collection's item rows, order by priorities and LIMIT to cut out the individual item. I think the API should be easy to understand but not hide that these calls generate queries.

    Currently you can already do foreach($collection->sliceItems() as $item). Pretty clear I think.

    I went back and forth on whether to have the collection update the items immediately or only on save() and it seemed much simpler to just apply changes immediately than to have to load the whole collection in memory and figure out how to optimize rewriting it on save. And I think the vast majority of use cases will be single operations per request.

    I should also disclose the ElggXCollection constructor also behaves a little differently than other core entities in that, if you're using it to create a new collection, the collection entity is saved immediately. Otherwise the API would have to support handling calls to the items methods before having a GUID established (it would have to juggle an array internally until the initial save). Is this OK?

  • Update: I've just implemented applying collections by plugin hook. With 3 lines of code you can add support for applying collections (even multiple ones) to a elgg_list/get_entities call. Once you've created your $options array, think up a "name" for this query and add seomthing like this:

    if (elgg_plugin_exists('elgg_xcollection')) {
        elgg_xcollection_hook_into_entities_query($options, 'pages_group_widget', array('group' => $group_entity));
    }

    This calls the plugin hook ['apply', 'xcollection'] and sends in the query's name (pages_group_widget) in $params so that handlers can look for it (the third parameter is optional, but is merged into $params). The adding collections via hook handler is easy:

    function my_handler($hook, $type, $returnvalue, $params) {
        if ($params['query_name'] == 'pages_group_widget') {
            if (isset($params['group']) && $params['group'] instanceof ElggGroup) {
                $returnvalue[] = elgg_xcollection_get_sticky_modifier($params['group'], 'stickies');
            }
        }
        return $returnvalue;
    };

    There's a similar function for elgg_get_river.

  • FYI, the README now has some preliminary documentation with syntax highlighted code samples.

  • Next steps:

    • Decide: Can a collection can have duplicates?
    • Review the form of access: Collections are associated by a string key to a container entity. If you can see both the container and collection, you can access it for use in queries.
    • Review permissions model: Only if you canEdit the container can you edit the collection
    • Review query modification design: A modifier object wraps a collection to programmatically determine the behavior of the query w/r/t the collection; the modifier is responsible for altering the $options array to join in collection(s) and set ordering. This design allows proper handling of cases where the underlying collection cannot be accessed.
    • Review the plugin author API, which allows authors to let other plugins alter their queries along with applying their own collection.
    • Decide: Should API expose priority values (only used for ORDER BY) outside
    • Remove need for the `id` column in the alteration methods
    • Remove the `id` column from the items table
    • Change BIGINTs to: MEDIUMINT or INT?
    • If keeping the current insertion strategy (minimizing item row writes), when need we check for priority values approaching MAX/MIN INT (to trigger a "recentering" of the priorities around 0)?
    • Determine how this would be integrated into core
    • Determine what kind of views/actions should come with this feature
    • Consider the reordering UI: how to handle reordering if collection is so large it must be paginated?
  • BTW, the current version is powering a feature on one production site so far. It allows users to add/remove a particular subtype of object into their personal collections, and later it'll allow site admins to highlight particular river items to show as sticky in the stream (but only the first page).

    The production code isn't public but I could pull it out to a proof-of-concept plugin.

  • Starting back up on this. I've reimplemented it using relationships and using the relationship ID as "priority". (Why not?) Have not tested yet but I expect it to work just as well as the custom table. I can switch back any time. Big improvement came from separating out concerns: a manager for fetching/creating collections, shrinking the collection to more or less a value object, and having a separate wrapper for editing items.

    The branch also has a query modifier interface that allows some nice new features, like appling collections easily and naming queries to allow them to be altered by plugins (without hijacking the whole page handler).

Feedback and Planning

Feedback and Planning

Discussions about the past, present, and future of Elgg and this community site.