Entity List spec

It's my understanding that something like an entity list will eventually make it into core, but I have an immediate need for this and must push forward coming up with an API that I can implement now in a plugin. I think it'd be mutually beneficial to make my implementation as close to the final one as possible, so I welcome input here.

My main idea is that an ElggList should be a lightweight tool rather than a full-blown ElggObject. It would need two new tables:

  • elgg_lists: list_id (AUTOINC), key_sha1 (char 40, default NULL)
  • elgg_list_items: list_id, item_id (int, NOT NULL), weight (int, default NULL)
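
To make the two tables concrete, here is a rough sketch of the DDL wrapped in a PHP helper. The column names and defaults come from the bullets above; the integer types, the index choices, and the helper name sketch_list_schema_sql() are assumptions, not part of the proposal.

```php
<?php
// Hypothetical helper: returns MySQL DDL for the two proposed tables.
// Column names and defaults follow the post; the column types and indexes
// are assumptions made for illustration.
function sketch_list_schema_sql($prefix) {
    return array(
        "CREATE TABLE {$prefix}lists (
            list_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
            key_sha1 CHAR(40) DEFAULT NULL,
            PRIMARY KEY (list_id),
            UNIQUE KEY key_sha1 (key_sha1)
        )",
        "CREATE TABLE {$prefix}list_items (
            list_id INT UNSIGNED NOT NULL,
            item_id INT NOT NULL,
            weight INT DEFAULT NULL,
            KEY list_weight (list_id, weight)
        )",
    );
}
```

A UNIQUE index in MySQL still allows any number of rows with a NULL key_sha1, which fits the "unique if given" rule for keys.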

API usage:

$list = new ElggList(); // list with NULL key

...or

$list = new ElggList("user:123|stickyPages");

$list->id // (read only)

$list->setItems($arrayOfGuids); // most efficient way to establish a list

$list[] = $guid; // adds an item row

array_shift($list); // remove first row

You could always reference a list by ID (available via $list->id), but you could also store the SHA1 of a string in `key`, as in my original proposal. If a key is given, it should be unique across lists.

$list->key = "user:123|stickyPages"; // or set it in the constructor

There would also be static methods findById($id) and findByKey($key), each returning null if nothing is found.
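
The API described so far can be tried out with an in-memory stand-in. This is only a sketch: the class name SketchList, the static registry that replaces the database tables, and the getItems() and shift() methods are assumptions made for illustration.

```php
<?php
// In-memory stand-in for the proposed ElggList; no database is involved.
class SketchList implements ArrayAccess, Countable {
    private static $registry = array(); // id => SketchList, replaces the lists table
    private static $nextId = 1;         // emulates the AUTOINC list_id

    private $id;
    private $keySha1 = null;
    private $items = array();           // ordered item GUIDs

    public function __construct($key = null) {
        $this->id = self::$nextId++;
        self::$registry[$this->id] = $this;
        if ($key !== null) {
            $this->setKey($key);
        }
    }

    // $list->id is readable from outside; $list->key returns the stored hash.
    public function __get($name) {
        if ($name === 'id') {
            return $this->id;
        }
        return $name === 'key' ? $this->keySha1 : null;
    }

    public function __set($name, $value) {
        if ($name !== 'key') {
            throw new LogicException("Property '$name' is read only or unknown");
        }
        $this->setKey($value);
    }

    private function setKey($key) {
        $existing = self::findByKey($key);
        if ($existing !== null && $existing !== $this) {
            throw new InvalidArgumentException('Key already used by another list');
        }
        $this->keySha1 = sha1($key);
    }

    public static function findById($id) {
        return isset(self::$registry[$id]) ? self::$registry[$id] : null;
    }

    public static function findByKey($key) {
        $hash = sha1($key);
        foreach (self::$registry as $list) {
            if ($list->keySha1 === $hash) {
                return $list;
            }
        }
        return null;
    }

    public function setItems(array $guids) { $this->items = array_values($guids); }
    public function getItems() { return $this->items; }
    public function shift() { return array_shift($this->items); }
    public function count(): int { return count($this->items); }

    public function offsetExists($offset): bool { return isset($this->items[$offset]); }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset) {
        return isset($this->items[$offset]) ? $this->items[$offset] : null;
    }

    // $list[] = $guid arrives here with a null offset and appends.
    public function offsetSet($offset, $value): void {
        if ($offset === null) {
            $this->items[] = $value;
        } else {
            $this->items[$offset] = $value;
        }
    }

    public function offsetUnset($offset): void {
        unset($this->items[$offset]);
        $this->items = array_values($this->items); // keep positions contiguous
    }
}
```

Note that PHP's array_shift() accepts only real arrays, not ArrayAccess objects, so the sketch exposes a shift() method for removing the first item.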

If someone wanted a list to have a name/description, they could just make an object and hold a reference to the list's ID in metadata:

$photoAlbum->list_id = $list->id;

Then we need API methods to JOIN the elgg_list_items table inside get_entities in various ways. E.g.:

$listEntities = elgg_get_entities_in_list($list->id);
$markup = elgg_list_entities_in_list($list->id);
$entitiesSortedByList = elgg_get_entities(array(
    'sortByList' => $list->id
));
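
One possible way to get the JOIN is to build extra query options and merge them into the array passed to elgg_get_entities(). This sketch assumes that elgg_get_entities() accepts 'joins' and 'order_by' options and aliases the entities table as e, which I believe is the case in 1.8; the helper name, the li alias, and the table name are hypothetical.

```php
<?php
// Hypothetical helper: builds query option fragments for a list.
// With $only_listed = false, entities outside the list are kept and sorted last.
// With $only_listed = true, only entities in the list are returned.
function sketch_list_query_options($list_id, $dbprefix = 'elgg_', $only_listed = false) {
    $list_id = (int) $list_id; // cast so the value is safe to embed in SQL
    $join_type = $only_listed ? 'JOIN' : 'LEFT JOIN';
    return array(
        'joins' => array(
            "{$join_type} {$dbprefix}list_items li"
            . " ON li.item_id = e.guid AND li.list_id = {$list_id}",
        ),
        // MySQL puts NULL first when sorting ascending, so sort on
        // "IS NULL" first to push rows without a weight to the end.
        'order_by' => 'li.weight IS NULL, li.weight ASC',
    );
}
```

A 'sortByList' option could then be a thin wrapper that merges these fragments into the caller's options.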

So... thoughts?

  • Here's the db schema Steve proposed, if I'm understanding him correctly:

    context_guid | type | item_guid | priority

    Here's the schema I would add if we were adding another table:

    collection_guid | item_guid | priority | time_created

    To me, these both just look suspiciously like the proposed relationships schema:

    subject_guid | relationship | object_guid | priority | time_created

     

    Something else to consider, which I'm personally scared of, is data conversion on upgrades. I'd like to avoid that wherever possible. It seems that adding a new table would effectively force us, and any plugin authors implementing collections who want to take advantage of ordering, to write scripts migrating the data to the new tables. However, if we add the field to relationships, then plugin authors would instantly be able to take advantage of custom sorting with minimal data migration. Let me know if I'm missing something here.

    Plenty of collections today already use relationships as the underlying data model for tracking membership (as they should). If we can avoid having to change that, that removes a huge barrier to adoption in my mind. 

    @Steve, if performance is a concern, I think there are things we can do to optimize the relationships table and make all relationship-related queries faster, rather than creating a separate table to optimize for this one use case.

    I get that many relationships would not necessarily use "priority", and we certainly shouldn't force them to, but I can also think of non-membership relationships that might find this useful, such as active plugins on a site, featured items in a group, favorite items of a user, etc.

    Brett hasn't chimed in yet; interested to hear from him. Glad we're having this discussion.

  • I get that many relationships would not necessarily use 
    "priority", and we certainly shouldn't force them to, but
    I can also think of non-membership relationships that
    might find this useful, such as active plugins on a site,
    featured items in a group, favorite items of a user, etc.

    The items in your list there are all collections, which makes me think we have a fundamental misunderstanding in this conversation. I would list those as making the point that priority makes sense for collections/lists and does not make sense for arbitrary relationships.

    Two other counterpoints:

    1. An ordered list of items in a collection and a relationship between two entities model different data, so it is okay to have different tables even if the schemas are exactly the same.

    2. There would be data migration in either case. Take Tidypics as an example. Every album would need to have its ordering moved from metadata to relationships (or to a new collections table). The migration code should look the same, assuming the APIs look the same between the proposed implementations.

  • I've made a new proposal outlining an ElggCollection as I imagine it at this point, including permissions, location, usage, and applying collections to existing queries via plugin hooks. One case we didn't consider is ordering comments/discussion replies (the collection items should be able to reference annotation ids as well).

  • A lot to take in, but here's a latecomer's observations and opinions:

    1. I agree that this shouldn't be on the relationship table because of the overloaded data model. If everything else were even (performance, API, difficulty of integration) I think the suggestion of storing priority in the relationships table would be rejected immediately.
    2. I think it's clear that ElggCollection needs to be an ElggEntity because of the automatic benefits that come along with entities: access controls, notifications, etc. This means a new top level entity table.
    3. There hasn't been much discussion about the interface. I'd argue this should be the most important part of the discussion because as Cash said the API would look the same regardless of implementation.
    4. Someone mentioned this could replace ACLs. That would make integration more complicated.
    5. I go back and forth about Steve's current proposal as being too complex. It's simple enough to understand, but there seems to be a lot of "stuff" going on in those tables with the order_direction, filter_type, and items_first fields.
  • 4: I think the point was just that ACLs could be ElggCollections if unreasonable aliens demanded an Elgg with fewer tables. ACLs needn't be touched.

    3/5: I think where my last proposal went wrong was storing the behavior (how a collection would affect a result set) in the collection. A collection is data, not behavior, and keeping behavior explicit in code would be wise for a few reasons.

    My OOP instinct suggests a separate object should define behavior and wrap the collection before it's passed into the query. E.g.

    $ordering = new ElggCollectionOrderer($collection1, 'asc', 'nulls_last');
    $filter = new ElggCollectionFilter($collection2);
    $entities = elgg_get_entities( ... 'apply_collections' => array($filter, $ordering) ... );

    Here it's clear at use time that $collection1 just supplies order, whereas $collection2 has the more essential role of selecting the entities to return. If $collection1 were inaccessible, who cares, but a missing $collection2 should lead to an empty array being returned.
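
    The difference between the two roles can be shown with an in-memory sketch. Here a collection is just an ordered array of GUIDs, or null when it is missing or inaccessible. The class names, the apply() method, and sketch_apply_collections() are assumptions for illustration, not the proposed API.

```php
<?php
// A collection is modelled as an ordered array of GUIDs, or null when it is
// missing or inaccessible. All names here are hypothetical.
class SketchCollectionFilter {
    private $items;

    public function __construct($items) {
        $this->items = $items;
    }

    public function apply(array $guids) {
        if ($this->items === null) {
            return array(); // a missing filter collection selects nothing
        }
        return array_values(array_intersect($guids, $this->items));
    }
}

class SketchCollectionOrderer {
    private $items;
    private $direction;
    private $nulls;

    public function __construct($items, $direction = 'asc', $nulls = 'nulls_last') {
        $this->items = $items;
        $this->direction = $direction;
        $this->nulls = $nulls;
    }

    public function apply(array $guids) {
        if ($this->items === null) {
            return $guids; // a missing ordering collection changes nothing
        }
        $position = array_flip($this->items); // guid => index in the collection
        $listed = array();
        $unlisted = array();
        foreach ($guids as $guid) {
            if (isset($position[$guid])) {
                $listed[$position[$guid]] = $guid;
            } else {
                $unlisted[] = $guid; // the "null" rows: not in the collection
            }
        }
        ksort($listed);
        $listed = array_values($listed);
        if ($this->direction === 'desc') {
            $listed = array_reverse($listed);
        }
        return $this->nulls === 'nulls_first'
            ? array_merge($unlisted, $listed)
            : array_merge($listed, $unlisted);
    }
}

// Applies each wrapper in turn to the result set.
function sketch_apply_collections(array $guids, array $wrappers) {
    foreach ($wrappers as $wrapper) {
        $guids = $wrapper->apply($guids);
    }
    return $guids;
}
```

    A real implementation would presumably push both steps into the SQL query rather than post-process GUIDs in PHP.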

    A more Elgg-y API might look like:

    $container = $vars['entity']; // user or group
    // build a list of collections to apply
    $collections = array();
    $collections[] = elgg_get_filter_collection($container, 'fave_pages');
    $collections[] = elgg_get_ordering_collection($container, 'stickies', 'asc', 'nulls_last');
    // allow plugin authors to add/alter collections
    $collections = elgg_trigger_plugin_hook(
        'collections:apply',
        'entity',
        array('container' => $container, 'query_name' => 'fave_pages'),
        $collections
    );
    $entities = elgg_get_entities( ... 'apply_collections' => $collections ... );
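
    To show how a plugin might take part in that hook, here is a self-contained sketch. SketchHooks is a stand-in for Elgg's hook registry so the flow can run on its own; it passes ($hook, $type, $returnvalue, $params) to handlers and keeps the previous value when a handler returns null, which is how I understand 1.8 to behave. The handler name and the array shape used for a collection are assumptions.

```php
<?php
// Minimal stand-in for Elgg's plugin hook registry (hypothetical class).
class SketchHooks {
    private static $handlers = array();

    public static function register($hook, $type, $callback) {
        self::$handlers["$hook:$type"][] = $callback;
    }

    public static function trigger($hook, $type, $params, $returnvalue) {
        $key = "$hook:$type";
        $handlers = isset(self::$handlers[$key]) ? self::$handlers[$key] : array();
        foreach ($handlers as $callback) {
            $result = call_user_func($callback, $hook, $type, $returnvalue, $params);
            if ($result !== null) {
                $returnvalue = $result; // null means "leave the value alone"
            }
        }
        return $returnvalue;
    }
}

// A plugin appends its own ordering collection, but only for one named query.
function sketch_add_pinned_ordering($hook, $type, $collections, $params) {
    if ($params['query_name'] === 'fave_pages') {
        $collections[] = array('role' => 'ordering', 'name' => 'pinned');
    }
    return $collections;
}

SketchHooks::register('collections:apply', 'entity', 'sketch_add_pinned_ordering');

// The core code builds its list, then lets plugins add to or alter it.
$collections = array(array('role' => 'filter', 'name' => 'fave_pages'));
$collections = SketchHooks::trigger(
    'collections:apply',
    'entity',
    array('container' => null, 'query_name' => 'fave_pages'),
    $collections
);
```

    Passing a query_name in the params is what lets a handler target one specific listing instead of every entity query.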
  • I've started work in this repo. All "collection" references are "xcollection" so that this could (hopefully) work beside the native ElggCollection once it's core.

    The biggest hurdle is that there's no way to add entity types at runtime. My plugin has to fork some functions like get_entities and add to the ENUM `type` column in elgg_entities. While this works (saving/deleting updates the appropriate tables), metadata and any other facilities that call the native get_entity() and friends will certainly fail.

    Because of this pretty major limitation I'm considering rewriting this based on ElggObject. The API possibly wouldn't need to change, but it would be more work porting it to core later.

  • One thing we can easily change in 1.8.x is to have a central place to maintain the parallel set of type strings to the ENUM. For example, I see that elgg_get_entity_type_subtype_where_sql() has its own list of entity types. elgg_register_entity_type() also has its own list. Let's fix this so that this list is stored in the config object.

    That would leave entity_row_to_elggstar() as the only function you would need to fork to continue your development.
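
    A central list of type strings could be as small as the sketch below. It uses a static class as a stand-in for the config object, and the class and method names are hypothetical; the four built-in types are the ones core ships with.

```php
<?php
// Hypothetical central registry for entity type strings.
// A static class stands in here for the config object.
class SketchEntityTypes {
    private static $types = array('object', 'user', 'group', 'site');

    // Lets a plugin add a type string at runtime; duplicates are ignored.
    public static function register($type) {
        if (!in_array($type, self::$types, true)) {
            self::$types[] = $type;
        }
    }

    public static function all() {
        return self::$types;
    }

    public static function isValid($type) {
        return in_array($type, self::$types, true);
    }
}
```

    Functions that currently hard-code their own list would call SketchEntityTypes::all() or isValid() instead. This does not touch the ENUM column itself, which would still need to be altered separately.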

  • Just linking to follow up thread since searching for "ElggCollection" doesn't find it!

Feedback and Planning

Discussions about the past, present, and future of Elgg and this community site.