Entity List spec

It's my understanding that something like an entity list will eventually make it into core, but I have an immediate need for this and must push forward coming up with an API that I can implement now in a plugin. I think it'd be mutually beneficial to make my implementation as close to the final one as possible, so I welcome input here.

My main idea is that an ElggList should be a lightweight tool rather than a full-blown ElggObject. It would need two new tables:

  • elgg_lists: list_id (AUTOINC), key_sha1 (char 40, default NULL)
  • elgg_list_items: list_id, item_id (int, NOT NULL), weight (int, default NULL)

API usage:

$list = new ElggList(); // list with NULL key

...or

$list = new ElggList("user:123|stickyPages");

$list->id // (read only)

$list->setItems($arrayOfGuids); // most efficient way to establish a list

$list[] = $guid; // adds an item row

array_shift($list); // remove first row
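
A minimal in-memory sketch of how this usage could be backed by PHP's ArrayAccess interface. The class name ElggListSketch and the shift() method are assumptions for illustration only; a real plugin would read and write elgg_list_items rather than a PHP array. Note that PHP's array_shift() only accepts real arrays, so removing the first row from an object needs a method of its own.

```php
<?php
// Hypothetical in-memory sketch of the proposed ElggList (not Elgg API).
class ElggListSketch implements ArrayAccess, Countable, IteratorAggregate
{
    /** @var int[] item GUIDs in list order */
    private array $items = [];

    // Replace the whole list in one call (one bulk write in a real backend).
    public function setItems(array $guids): void
    {
        $this->items = array_values(array_map('intval', $guids));
    }

    // Remove and return the first item, or null if the list is empty.
    public function shift(): ?int
    {
        return array_shift($this->items);
    }

    public function offsetExists($offset): bool
    {
        return isset($this->items[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return $this->items[$offset] ?? null;
    }

    // "$list[] = $guid" arrives here with $offset === null.
    public function offsetSet($offset, $value): void
    {
        if ($offset === null) {
            $this->items[] = (int) $value;
        } else {
            $this->items[$offset] = (int) $value;
        }
    }

    public function offsetUnset($offset): void
    {
        unset($this->items[$offset]);
        $this->items = array_values($this->items); // keep positions contiguous
    }

    public function count(): int
    {
        return count($this->items);
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->items);
    }
}

$list = new ElggListSketch();
$list->setItems([101, 102]); // establish the list in one call
$list[] = 103;               // append via ArrayAccess
$first = $list->shift();     // remove the first row
```

Implementing Countable and IteratorAggregate is optional, but it lets count($list) and foreach work on the list object.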

You can always reference a list by ID (available via $list->id), but you could also store the SHA1 of a string in `key`, as in my original proposal. If a key is given, it should be unique across lists.

$list->key = "user:123|stickyPages"; // or set it in the constructor

There would also be static methods findById($id) and findByKey($key), each returning null if no list is found.
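
The key handling can be sketched without a database. ListRegistrySketch below is a hypothetical stand-in for the elgg_lists table: it hashes keys with sha1() (which yields the 40 hex characters that key_sha1 expects), rejects duplicate keys, and returns null for failed lookups. For brevity the lookups return list IDs rather than list objects, and they are instance methods rather than the static methods proposed above.

```php
<?php
// Hypothetical in-memory stand-in for the elgg_lists table (not Elgg API).
class ListRegistrySketch
{
    /** @var array<int, ?string> list_id => key_sha1 (null when the list has no key) */
    private array $rows = [];
    private int $nextId = 1; // mimics AUTOINC

    // Create a list row; a non-null key must be unique across lists.
    public function create(?string $key = null): int
    {
        $hash = $key === null ? null : sha1($key);
        if ($hash !== null && in_array($hash, $this->rows, true)) {
            throw new InvalidArgumentException("Key already in use: $key");
        }
        $id = $this->nextId++;
        $this->rows[$id] = $hash;
        return $id;
    }

    // Returns the list ID, or null if no such list exists.
    public function findById(int $id): ?int
    {
        return array_key_exists($id, $this->rows) ? $id : null;
    }

    // Returns the ID of the list stored under $key, or null if none matches.
    public function findByKey(string $key): ?int
    {
        $id = array_search(sha1($key), $this->rows, true);
        return $id === false ? null : $id;
    }
}

$registry = new ListRegistrySketch();
$stickyId = $registry->create("user:123|stickyPages");
$anonId = $registry->create(); // list with NULL key
$found = $registry->findByKey("user:123|stickyPages");   // same ID as $stickyId
$missing = $registry->findByKey("user:456|stickyPages"); // null
```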

If someone wanted a list to have a name/description, they could just make an object and hold a reference to the list's ID in metadata:

$photoAlbum->list_id = $list->id;

Then we need API methods to JOIN the elgg_list_items table inside get_entities in various ways. E.g.:

$listEntities = elgg_get_entities_in_list($list->id);
$markup = elgg_list_entities_in_list($list->id);
$entitiesSortedByList = elgg_get_entities(array(
    'sortByList' => $list->id
));
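
Under the hood, each of these calls would boil down to a JOIN against elgg_list_items ordered by weight. A rough sketch of the query such a helper might build follows. The function name build_list_join_sql is hypothetical, and it assumes the site's table prefix is "elgg_" and that the entities table is keyed by a guid column.

```php
<?php
// Builds SQL for fetching entity GUIDs in list order. The list tables follow
// the proposal above; the "elgg_" prefix and the entities table's guid column
// are assumptions about the installation.
function build_list_join_sql(int $listId, int $limit = 10, int $offset = 0, string $prefix = 'elgg_'): string
{
    // All interpolated values are typed ints, so no quoting is needed here.
    return "SELECT e.guid FROM {$prefix}entities e"
        . " JOIN {$prefix}list_items li ON li.item_id = e.guid"
        . " WHERE li.list_id = {$listId}"
        . " ORDER BY li.weight ASC, li.item_id ASC"
        . " LIMIT {$limit} OFFSET {$offset}";
}

$sql = build_list_join_sql(7, 5, 10);
```

Because weight defaults to NULL, and MySQL sorts NULL values first in ascending order, unweighted items would lead the list. The secondary sort on item_id just keeps the order of those rows stable.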

So... thoughts?

  • I really do not like adding a priority to relationships. What does it mean for relationships that are not involved in lists? Absolutely nothing. It is a hack to accommodate one particular use case. Not only that, but it overloads the name property of relationships. If I want to group my friends into different collections, I not only have a friend relationship to them but I also have a list:friend:<collection_name> relationship to them. That means to see the names of those collections, I have to parse that information out of the relationship name. If I want to see a list of all my friend collections, we have to expand the relationships API to search by wildcard names.

  • Cash, I think in that case you'd be storing separate collections of friends as separate entities, so there's no need for wildcard searches or different relationships for different collections.

  • Also, I don't think the fact that some relationships are not interested in priority makes this a hack, but then I'm also pushing to consider serialized arrays because we got bit by some problems at Google and serialized arrays would have avoided those problems.

  • @Evan I don't see any way to fetch by a serialized list in order. You have to order the entities after fetching them (ick). Also this makes pagination insanely hard. I actually see most use cases as wanting to specify a default ordering/join type for the collection, and there's nowhere to store that in a relationship.

    I see the point about collection altering touching many rows, but I think alter operations will be rare compared to how often they are used in joins. Plus, with relationships, you'd touch the same # of rows. At least with elgg_collection_items the rows are tiny.
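
    To make the serialized-array objection concrete, here is a rough sketch of the extra work it implies. Both function names are hypothetical, and $rows stands in for whatever the entity fetch returns, in arbitrary order.

```php
<?php
// Reorders fetched rows to match a stored GUID order.
function order_by_guid_list(array $rows, array $orderedGuids): array
{
    $position = array_flip($orderedGuids); // guid => index in the list
    // Drop rows that are not in the list at all.
    $rows = array_filter($rows, fn($row) => isset($position[$row['guid']]));
    // Sort the remaining rows by their position in the stored list.
    usort($rows, fn($a, $b) => $position[$a['guid']] <=> $position[$b['guid']]);
    return $rows;
}

// Pagination has to slice the GUID list before the fetch, not after it.
function page_of_guids(array $orderedGuids, int $limit, int $offset): array
{
    return array_slice($orderedGuids, $offset, $limit);
}

$stored = [2, 3, 1]; // the unserialized ordering
$fetched = [['guid' => 3], ['guid' => 1], ['guid' => 2], ['guid' => 9]];
$sorted = order_by_guid_list($fetched, $stored);
$secondPage = page_of_guids($stored, 2, 1);
```

    Slicing before the fetch also means that any entity filtered out at fetch time (for example by access rules) may leave a page shorter than requested, which is part of why pagination gets awkward.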

  • I think serialized arrays are a worse hack than overloading relationships :). Burying GUIDs in metadata is never a good thing to do. It requires lots of bookkeeping to prevent dangling pointers. It also forces you to do the sort in PHP rather than in MySQL.

    It also sounds like you and Kevin are talking about different things. I think Kevin is talking about adding priority to relationships but not using an entity to capture the list (at least that is how I'm reading his comments). Doing that is possible, but requires the overloading of the relationship name as I pointed out in my comment. You are talking about using an ElggObject as the collection object and then using relationships to capture membership in the collection and a sort order. My main complaint still stands - adding something to the relationships table to handle a single use case is a bad idea. We've had requests to add other fields to relationships for specific use cases and have always turned them down.

  • FYI: the reason I prefer "context_guid" to "container_guid" is that I see a collection as more of an abstract listing that's loosely associated with an entity rather than something the entity contains. A particular user might want their own special ordering of a group's pages: it doesn't make sense for the group to contain that list; the context_guid just makes it easier for the plugin to find the collection.

  • @Steve, that is a markedly different use case than I had in mind. Good to know your thoughts.

    So if I can try to distill what we have here...

    Requirements:

    1. Most common operation to optimize for is fetching the ordered list. Reordering is common, but takes a back seat.
    2. Should be able to order the entities in a collection.
    3. Possibly want each user in the system to be able to specify their own custom sort of the collection.
    4. The same entity should be able to be present in many collections.
    5. Want to be able to name/describe lists and maintain access controls.

    Use cases:

    1. Albums -- sorting photos
    2. Menu -- groups provide custom sort of menu items
    3. Friends -- sorting all friends or various groups of friends
    4. Favorites -- specify top 10 entities
    5. etc.

    Proposals:

    1. Serialized arrays (rejected)

    • Too hacky
    • not optimized for most common case

    2. Priority field in relationships table

    • minimal change
    • adds another field for too specific a use case? 
    • Allows sorting for any type of relationship, not just members in a collection (e.g. favorites)

    3. New tables specific to collections

    • faster? (rows are lighter, since they are dedicated to ordering)
    • complicates data model and API too significantly?
    • Can order containees

    I feel like requirement 3 adds significant complexity to this whole discussion. I'm not convinced that users providing a personalized sort of a group's menu items is something we need to support in core.

    I'm still convinced that adding priority to relationships makes a lot of sense. I'm definitely not convinced that adding extra tables and functions to the API is a great way to go.

    @Steve, I'm not clear why adding a new table would make things faster other than the fact that the table has fewer fields. If it's because relationships have varchars, then maybe there's a way we can optimize the relationships table rather than adding a new table.

  • Evan - thanks for putting together the summary.

    For requirement 3, I had been assuming that each user would have their own collection with its sort rather than one shared collection with many different sorts. My example would be the ordering of widgets on a groups page. Each user could use the default ordering of the group widgets or create a collection that has a specified ordering. The collection is a single entity and has a list of entity guids and sort position.
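
    A small sketch of that fallback, assuming collections are looked up by key strings in the style of "user:123|stickyPages". The key formats and the function name resolve_widget_order are made up for illustration.

```php
<?php
// Hypothetical lookup: $collections maps a key string to an ordered GUID list.
function resolve_widget_order(array $collections, int $groupGuid, int $userGuid): array
{
    $userKey = "user:{$userGuid}|groupWidgets:{$groupGuid}";
    $defaultKey = "group:{$groupGuid}|widgets";
    // Prefer the user's own collection, then the group default, then nothing.
    return $collections[$userKey] ?? $collections[$defaultKey] ?? [];
}

$collections = [
    'group:456|widgets' => [10, 11, 12],          // default ordering
    'user:123|groupWidgets:456' => [12, 10, 11],  // user 123's own ordering
];
$forUser123 = resolve_widget_order($collections, 456, 123); // user's ordering
$forUser999 = resolve_widget_order($collections, 456, 999); // falls back to default
```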

    For proposal 2, point a - it is a minimal change to the schema, but adding new tables takes only marginally more time. I'm not convinced the amount of code to write is any different for these two proposals. We would effectively be overloading the relationships table to support sorting. That's all new code to be written as the current relationships API does not support this.

    For proposal 2, point c - no one has ever asked for arbitrary sorting of relationships before so I'm not sure that merits consideration.

    For proposal 3, point b - I believe the API and data model should be mostly the same for these two approaches. With the relationship approach we're basically layering a collection sorting model on top of the relationship model. The table relationships and collection items don't end up looking that different except that the collection item one is optimized for its use case.

    The user facing API should be the same for both proposals (or we're probably doing something wrong). I think I could write an ElggCollection class 3 ways (custom tables, relationship table + new column, or serialized GUIDs) and keep the same API. (I actually think not using the relationship table leads to a cleaner implementation of the API - I think the queries would be cleaner.)
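
    One way to picture this is a collection class that only talks to a small storage contract, with the backend swapped underneath. Everything named below is hypothetical and in-memory. Only two of the three backends are shown; a relationship-table backend would implement the same two methods.

```php
<?php
// Hypothetical storage contract; the names are illustrative, not Elgg API.
interface CollectionStorageSketch
{
    public function load(): array;            // ordered GUIDs
    public function save(array $guids): void; // replace the whole ordering
}

// Mimics a dedicated items table: one row per item with an integer weight.
class ItemRowsStorage implements CollectionStorageSketch
{
    private array $rows = []; // each row: ['item_id' => int, 'weight' => int]

    public function load(): array
    {
        $rows = $this->rows;
        usort($rows, fn($a, $b) => $a['weight'] <=> $b['weight']);
        return array_column($rows, 'item_id');
    }

    public function save(array $guids): void
    {
        $this->rows = [];
        foreach (array_values($guids) as $weight => $guid) {
            $this->rows[] = ['item_id' => (int) $guid, 'weight' => $weight];
        }
    }
}

// Mimics the serialized-array approach: one string holds the whole ordering.
class SerializedStorage implements CollectionStorageSketch
{
    private string $blob = 'a:0:{}'; // serialize([])

    public function load(): array
    {
        return unserialize($this->blob);
    }

    public function save(array $guids): void
    {
        $this->blob = serialize(array_values(array_map('intval', $guids)));
    }
}

// The user-facing class is identical whichever storage it is given.
class ElggCollectionSketch
{
    private CollectionStorageSketch $storage;

    public function __construct(CollectionStorageSketch $storage)
    {
        $this->storage = $storage;
    }

    public function add(int $guid): void
    {
        $guids = $this->storage->load();
        $guids[] = $guid;
        $this->storage->save($guids);
    }

    // Moves $guid to the front, adding it if it was not in the collection.
    public function moveToFront(int $guid): void
    {
        $guids = array_values(array_diff($this->storage->load(), [$guid]));
        array_unshift($guids, $guid);
        $this->storage->save($guids);
    }

    public function items(): array
    {
        return $this->storage->load();
    }
}

$album = new ElggCollectionSketch(new ItemRowsStorage());
$album->add(1);
$album->add(2);
$album->moveToFront(2);
```

    The sketch rewrites the whole ordering on every change, which is the simplest thing that works for both backends. A real table-backed version would presumably update individual rows instead.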

  • One more point - if this is mostly (or even partially) about not adding more tables, the access collection tables can be replaced by this. They are a special case of an ElggCollection.

  • @all: this is a very exciting discussion. Collections have long been a thorn in my side in terms of Elgg, and containers have had to suffice.

    Personalized collecting and sorting of objects/items would be a superb addition...especially to core. This is what social websites are all about, no? A good use case could be here on the community site with a public collection of groupforumtopics that answer the most often asked questions. The biggest advantage I can see from this concept (from a site owner's perspective) is easier data mining to determine trends, interests etc.

    From a normal user's perspective...I would love to be able to create a collection of my favourite blogs, or blogs I need for a current project for research purposes that are quick to hand. Honestly, searching through 100,000 blog posts for the 10 I want every time just isn't user-friendly.

    I'm not a MySQL guy so I can't add any thoughts to that part of the conversation, but if this concept is pushed through...I think it should be fully capable of handling things, and not be an "add-on" to a current model. Collections bring a whole new level to Elgg in terms of personal organization.

    Apologies for jumping into a conversation where my points are not as "constructive" as they could be, but I did want to state my thoughts in general as a community member.

Feedback and Planning

Discussions about the past, present, and future of Elgg and this community site.