Just trying to understand what's going on

So I am currently picking apart the site-wide categories plugin (as a learning exercise) and have come across some code that I don't understand: I can't see how it does what it does.

In the 'save' action it is simply taking the tag_array and saving it to $site->categories ($site = $CONFIG->site).

However, when I look at $site (FirePHP, Xdebug) there is no evidence of a member variable 'categories' inside the $site object.  It seems that in Elgg, setting a member variable on an entity does not actually set a member variable but saves the value directly to the database, which makes it difficult to track what's going on in a debugger.  Upon further investigation, it appears that this data is actually saved as metadata attached to the site object.

Is this the case?  Is Elgg somehow overriding how member variables are set and retrieved and using that mechanism to create metadata?
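
To make the question concrete, here is a minimal sketch of how PHP's magic __get/__set methods could produce exactly this behaviour. SketchEntity and its in-memory metadata table are hypothetical stand-ins I made up for illustration; this is not Elgg's actual class, and a real framework would read and write the database instead of an array.

```php
<?php
// Hypothetical stand-in for an entity; NOT Elgg's actual implementation.
class SketchEntity
{
    // Real attributes: these are what a debugger dump of the object shows.
    private $attributes = array('guid' => 1, 'type' => 'site');

    // Stand-in for a metadata table, keyed by guid then by name.
    private static $metadataTable = array();

    // PHP calls this when code writes to a property that is not accessible.
    public function __set($name, $value)
    {
        if (array_key_exists($name, $this->attributes)) {
            $this->attributes[$name] = $value;
            return;
        }
        // Anything else is stored as metadata, outside the object itself.
        self::$metadataTable[$this->attributes['guid']][$name] = $value;
    }

    // PHP calls this when code reads a property that is not accessible.
    public function __get($name)
    {
        if (array_key_exists($name, $this->attributes)) {
            return $this->attributes[$name];
        }
        $guid = $this->attributes['guid'];
        return isset(self::$metadataTable[$guid][$name])
            ? self::$metadataTable[$guid][$name]
            : null;
    }
}

$site = new SketchEntity();
$site->categories = array('toys', 'books');            // routed through __set()
$hasMember = property_exists($site, 'categories');     // false: no member variable was created
$categories = $site->categories;                       // routed through __get()
```

With something like this in place, the assignment looks like an ordinary member write in the calling code, yet the object dump never gains a 'categories' member, which would match what the debugger shows.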

Also, if that is the case, why is the site-wide categories plugin designed this way (more of a philosophical discussion)?  If I am understanding this correctly, the list of site-wide categories is actually saved as metadata attached to the site.  Then when an entity is created/edited, any selected categories are saved to the individual objects as metadata as well.

Now assume I wanted more info associated with my categories (descriptions, logos, etc.).  This model seems not to work (unless you can add metadata to metadata).  I guess I could have created the categories as their own entities.  Then, in order to assign categories to other entities (blogs, files, etc.), I could have used the built-in grouping functions to make the relationship.  Any thoughts on this?  I am just trying to take some of the simpler plugins and re-engineer them (logically and sometimes physically) in order to better understand the Elgg architecture.  These exercises help me; maybe they will help others here as well.

  • Thanks Kevin, I did read that page, which was what gave me the clue as to what was going on and where to look to find where this data was going.  I was more curious as to how Elgg was able to do this.

    But really I was hoping this post would trigger discussions about architectural choices and when and why you would use one mechanism over the other when designing a plugin.  Why was metadata chosen over the entity/grouping model I mentioned?  Was it just a matter of simplicity or was performance an issue as well?  The model I mentioned seems like it would have more overhead, but how much more?  The documentation is great at explaining the what, but not as good at the how or the why.  There is some (annotations are great for ratings and comments, metadata is good for attaching DOB) but I guess I am looking for more meat.

    Not having designed this framework myself from the ground up (and thankfully so) there are a lot of questions about implementation and how certain mechanisms are actually working behind the scenes and which ones are more efficient for this task vs that task.  Most users probably don't care and just use the high level methods without question.  Unfortunately, for me, and possibly you all :) too, my brain won't let me do that.

    This is my first real venture into PHP and complex web frameworks, so tracing the code has been a fun learning process (just learning all the right tools to finally get the visibility I wanted was a process).  The more I get into it, the more complex (and well thought out) I realize this framework is.  But there are still a lot of unknowns, mostly on the implementation side of things (when is this called, why is that called, why is this called so often, how and where are all these variables getting passed around, what information is available to me in these variables).  I am probably more caught up in the inner workings than I should be (the main reason for choosing a framework is to not have to deal with all that), but my systems engineering background makes it difficult to leave it alone.  I am used to having these answers when designing a system.

    I guess in the end (after subjecting you all to my excessive pontifications) I am just looking for a good forum to discuss architectural choices.

    Thanks again Kevin.

  • One of the strongest points of Elgg is that you can approach a coding project in many different ways.  That said, one of the biggest stumbling blocks of Elgg is that you can do one thing a hundred different ways!

    For example, to make a connection between two entities you can use relationships, metadata, private settings, annotations, or container_guids.  Each of these has a slightly different feature set.

    • Relationships allow many to many connections between entities.  You can have any number of ElggEntities related to any number of other ElggEntities.  Relationships don't support metadata, annotations, private settings, or access levels.
    • Metadata also allows many to many connections between entities.  You can say $entity->parent_entity_guid = $parent_entity->getGUID().  This basically duplicates the functionality of relationships and isn't the best practice, but I've seen it done.  A major difference is that metadata supports access levels.  You cannot have metadata/annotations on metadata.
    • Private settings are very similar to metadata, but are unsearchable and by default only available within the code, while metadata is exported by default.  Private settings don't support access levels.  You cannot have metadata/annotations on private settings.
    • Annotations are also very similar to metadata, but include helper functions to perform some basic arithmetic on the annotations (average, max, min, sum).  You cannot have metadata/annotations on annotations.
    • Container guids have only been truly supported since 1.7 and allow a many to one relationship.  That is, an entity can only have a single container guid, but any number of entities can belong to the same container.  This approach allows you to put metadata on the contained entities OR the container entity.

    Categories is a good example.  They could be implemented any of the above ways, but I'd say there are two "right" ways: Use metadata (as it is currently implemented) or use relationships.

    As you found, one of the drawbacks of using metadata is that you cannot give any more information to a category than its name.  Were you to implement categories as entities with a relationship "category" to the site entity, you would have the full spectrum of options available to entities also available to your categories.  You could assign descriptions, annotations (a count of how many entities are in this category, perhaps?), metadata, access levels, etc.

    Why were categories implemented like this?  I don't know--I didn't write it.  Were I to guess, I'd say it's because categories only need a lightweight implementation for the majority of users.  This isn't saying metadata has less functional overhead than any other approach, but it does have slightly less programmatic overhead.

  • @Brett - Thank you so much for this summary.  This is very informative.  So it sounds like, from an overhead point of view, there is not much of a difference.  If this is true, then the decision to use one or the other would be based purely on the needed functionality (or on preference in cases where, say, I get the same functionality whether I use metadata or annotations but I think metadata is cleaner or I can do it with less code).

    So I have not yet read anything about private data or container GUIDs.  Is there somewhere I can go to learn more about those (if not in the documentation, then somewhere in the code where they are implemented)?

    You mentioned:

    Were you to implement categories as entities with a relationship "category" to the site entity, you would have the full spectrum of options available to entities also available to your categories.  

    What would be the function of having the relationship "category" with the site?  When wanting a list of site categories, couldn't I just search for all entities of type "custom"?  Or is there a value in that relationship that I am missing?  The relationship would seem like it would need to be with the other entities (a "blog" has a relationship "categorizedAs" with "CategoryA"--or something similar).
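
    To pin down what I mean by "categorizedAs", here is a minimal sketch using a plain in-memory table of (guid_one, relationship, guid_two) rows.  The sketch_* helpers and all the GUIDs are made up for illustration; this only models the idea and does not call Elgg's relationship functions.

```php
<?php
// Hypothetical in-memory stand-in for a relationship table; not Elgg's API.
// Each row is a (guid_one, relationship, guid_two) triple.
$relationships = array();

// Record "guid_one --relationship--> guid_two", ignoring exact duplicates.
function sketch_relate(array &$rows, $guidOne, $relationship, $guidTwo)
{
    $row = array($guidOne, $relationship, $guidTwo);
    if (!in_array($row, $rows, true)) {
        $rows[] = $row;
    }
}

// GUIDs that point at $guidTwo through $relationship.
function sketch_sources(array $rows, $relationship, $guidTwo)
{
    $found = array();
    foreach ($rows as $row) {
        if ($row[1] === $relationship && $row[2] === $guidTwo) {
            $found[] = $row[0];
        }
    }
    return $found;
}

// GUIDs that $guidOne points at through $relationship.
function sketch_targets(array $rows, $relationship, $guidOne)
{
    $found = array();
    foreach ($rows as $row) {
        if ($row[1] === $relationship && $row[0] === $guidOne) {
            $found[] = $row[2];
        }
    }
    return $found;
}

// Made-up GUIDs: 10 and 11 are categories, 20 is a blog, 21 is a file.
sketch_relate($relationships, 20, 'categorizedAs', 10);
sketch_relate($relationships, 21, 'categorizedAs', 10);
sketch_relate($relationships, 21, 'categorizedAs', 11);

$inCategory10   = sketch_sources($relationships, 'categorizedAs', 10); // blog 20 and file 21
$categoriesOf21 = sketch_targets($relationships, 'categorizedAs', 21); // categories 10 and 11
```

    The point of the sketch is the many to many shape: one category can hold many entities, and one entity can sit in many categories, with the category entities themselves free to carry descriptions or logos.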

    If I understand a container guid, that would be a clean/simple way to implement, say, a sub-category.  Instead of creating another relationship that ties two category entities with an "issubcategoryof" relationship, I could just link them by adding the main category's GUID to the sub category's entity as its ContainerGUID.  Is that right or am I missing the point of these?

    Thanks again Brett.  Maybe when this discussion concludes I can summarize it and add it to the documentation as kind of an entity usage summary.

  • So another thing I noticed is that there doesn't seem to be any mechanism in place for viewing all entities with a certain "category".  It looks like there should be (or used to be), because the "categories/view" view (and others) creates a search URL with the tagtype of "universal_categories", but clicking on that link does not produce any result (it looks like the search plugin's index.php used to look for a tagtype param, but that code is currently commented out).

    http://ELGGHOST/search?tagtype=universal_categories&tag=toys

    I was just looking for an example of how to list a group of entities based on a common value in their metadata but found this instead.

  • I need to clarify the previous post.  I realize you can use any of the get_entities_from_metadata() functions to get the list of entities in code.  I was focusing on the fact that the mechanism the categories plugin relies on in the search plugin seems to be missing.

  • First, the easy part:  Elgg's search changed in 1.7 and still has some bugs.  Previously "tags" meant "any metadata on an object" but that has changed to "any metadata on an object registered as a tag."  It looks like site categories needs to register its metadata names as tags to be included in the search.  This will need to be looked at for 1.7.1/1.8.

    Container guids aren't well-documented because they were only recently supported correctly.  Basically, each ElggEntity can have its container_guid point to another ElggEntity.  A photo gallery (like Tidypics) is a good example: ElggPhotoAlbum extends ElggEntity, then ElggPhoto extends ElggEntity with ElggPhoto->container_guid = ElggPhotoAlbum->getGUID().  You can then easily get all entities (photos or maybe even other albums) within an album by asking for entities with the container_guid of ElggPhotoAlbum->getGUID().  Another good example would be hierarchical file browsers with folders containing files and other folders.  Container guids would work well to have sub-categories (and even sub-sub-categories!).

    Private data is basically unsearchable metadata on entities and is accessed by ElggEntity::getPrivateData() and ElggEntity::setPrivateData().  Biggest benefit here is it's not exposed by default.

    For the site categories discussion, you do not need a relationship, and indeed can simply return all entities with a certain type/subtype.  However, a relationship is arguably more conceptually accurate for defining a connection among entities.  (You said you wanted philosophy, right? ;) )  Disassociated entities of a certain type/subtype don't necessarily mean anything semantically, but a relationship of "categories" to the site has more intrinsic meaning.  With that in mind, I consider either method to be perfectly acceptable, and honestly probably would implement it using type/subtype because it is slightly simpler in code.  (And simple is always good.)

    We're always looking to improve the documentation, so I'll never tell someone not to write a summary :)

  • So regarding tags vs metadata, it seems that everything is in a state of change right now.  The get_tags() function still seems to effectively get the site-wide categories (even though they are not flagged as tags).  So the direction seems to be heading toward making tags a special subclass of metadata that has its own set of methods specific to tags?

    I do like the container GUID concept.  It looks like pre-1.7 they were used here and there (as metadata possibly). So in 1.7 the attribute was added to the entity class (making it more visible as a viable option and available by default).  What is the current best practice for a top level container then?  It seems that some areas of the code set the container to the owner that created it (mirror the owner GUID).  It seems that it might make sense at times to make a top level entity have the site GUID as the container GUID.

    Using the categories concept to illustrate this, it seems that if we had multiple levels of categories (sub-categories), then the main categories would use the site GUID as their container GUID.  Any sub-categories would then use a main category's GUID as their container GUID (going as many levels deep as appropriate for your site).  This model would get rid of the disassociated entities issue while keeping the code simple.  Then you could use a relationship to assign the categories to other entities in the site (files, blogs, etc.).  Retrieval would be simple (get all categories whose container GUID = the site GUID and then iterate through what was returned to get the rest).
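
    The retrieval I have in mind can be sketched as below.  This is only an in-memory model: the $categories array, the GUIDs, and the sketch_* functions are hypothetical, and a real plugin would ask Elgg for entities by container GUID instead of scanning an array.

```php
<?php
// Hypothetical in-memory entities keyed by GUID. GUID 1 plays the site.
$categories = array(
    10 => array('title' => 'food',       'container_guid' => 1),
    11 => array('title' => 'toys',       'container_guid' => 1),
    12 => array('title' => 'vegetables', 'container_guid' => 10),
    13 => array('title' => 'peas',       'container_guid' => 12),
);

// All entities whose container_guid equals $containerGuid.
function sketch_children(array $entities, $containerGuid)
{
    $children = array();
    foreach ($entities as $guid => $entity) {
        if ($entity['container_guid'] === $containerGuid) {
            $children[$guid] = $entity;
        }
    }
    return $children;
}

// Nested array of titles: start at a container and recurse into each child.
function sketch_tree(array $entities, $containerGuid)
{
    $tree = array();
    foreach (sketch_children($entities, $containerGuid) as $guid => $entity) {
        $tree[$entity['title']] = sketch_tree($entities, $guid);
    }
    return $tree;
}

// Top-level categories are the ones contained by the site (GUID 1).
$tree = sketch_tree($categories, 1);
```

    Here $tree ends up with 'food' and 'toys' at the top level, 'vegetables' under 'food', and 'peas' under 'vegetables'.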

    Next, it seems to me that there would be an issue with making sure that these categories are included in the search, along with their associated entities, without some modifications to how search works (relationships are probably not included in the current search, or am I wrong?).  Maybe you would need to implement both architectures.  The entity/container GUID architecture would be used to manage (create, edit, delete) the complex category/sub-category entities (remember, categories may have pictures/descriptions attached to them).  But as part of this process, metadata tags would be created that flatten the category relationships into a single tag (tag = "food/vegetables/peas") to be used for display and searching.  This is probably getting a bit complex at this point (especially if the search issue I mentioned is not really an issue).
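
    If one did want the flattened form, a minimal sketch would walk up the container GUIDs until it reaches the site.  The data and the sketch_category_path() name are made up for illustration, and nothing here touches Elgg itself.

```php
<?php
// Hypothetical in-memory entities keyed by GUID. GUID 1 plays the site.
$categories = array(
    10 => array('title' => 'food',       'container_guid' => 1),
    12 => array('title' => 'vegetables', 'container_guid' => 10),
    13 => array('title' => 'peas',       'container_guid' => 12),
);

// Build "parent/child/grandchild" by following container_guid upward.
function sketch_category_path(array $categories, $guid, $siteGuid)
{
    $parts = array();
    $seen = array(); // guards against a cycle in the container chain
    while ($guid !== $siteGuid && isset($categories[$guid]) && !isset($seen[$guid])) {
        $seen[$guid] = true;
        array_unshift($parts, $categories[$guid]['title']);
        $guid = $categories[$guid]['container_guid'];
    }
    return implode('/', $parts);
}

$path = sketch_category_path($categories, 13, 1); // "food/vegetables/peas"
```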

    Regarding private data, I guess I really don't understand the need for it.  This is probably because I am not sure what you mean by it not being "exposed by default".  How are metadata and annotations exposed by default?  Don't those have to be accessed through code first before they are accessible to the end user (through the browser)?  Or are they exposed somehow automatically through the session and available to a smart user?  Or is it simply a matter of how the search plugin is designed (it will search on everything, except private data)?

    Thanks again.  This discussion has been very helpful to me (and I hope others) so far.

  • Tags and metadata:

    You are mostly correct--get_tags() will check an entity for a metadata name and find all values associated with it.  To prevent arbitrary metadata searches, 1.7 expects the metadata names to be registered.  The special tag-related functions have been there for a while, so it's nothing new...
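
    As a rough illustration of what registration changes, here is a toy model in which a lookup only considers metadata names that appear in a registry.  The registry array, the metadata, and sketch_tag_match() are all hypothetical; this is not how Elgg's search is implemented, only the shape of the rule.

```php
<?php
// Hypothetical registry of metadata names that count as tags.
$registeredTagNames = array('tags');

// Metadata on one made-up entity: each name can hold several values.
$metadata = array(
    'tags'                 => array('wooden', 'trains'),
    'universal_categories' => array('toys'),
    'internal_note'        => array('toys'),
);

// True if $value appears under any *registered* metadata name.
function sketch_tag_match(array $metadata, array $registered, $value)
{
    foreach ($registered as $name) {
        if (isset($metadata[$name]) && in_array($value, $metadata[$name], true)) {
            return true;
        }
    }
    return false;
}

// "toys" is stored on the entity, but not under a registered name yet.
$before = sketch_tag_match($metadata, $registeredTagNames, 'toys'); // false

// Once the categories name is registered, the same lookup matches.
$registeredTagNames[] = 'universal_categories';
$after = sketch_tag_match($metadata, $registeredTagNames, 'toys');  // true
```

    Note that 'internal_note' never takes part in the match, which is the point of not searching arbitrary metadata.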

    Container GUID:

    Saving an entity with a container GUID has always worked, but until 1.7 there was no way in the API to retrieve an entity based on container GUID; get_entities() rewrote container_guid to the owner_guid.  I'd say a good rule is "only set the container_guid if you need to."  If you want EntityA to be contained by EntityB, set it.  Otherwise you can ignore it.  Your categories example would work fine.

    Search:

    First, you can easily have multiple tags on a single entity.  Tags are just a metadata name with multiple values.  I'm not sure I would "flatten" the categories into a single string--there's no need.

    Relationships are not searched by default, but they could be.  The search system allows you to extend it.  Check out the docs.

    Search doesn't search all metadata, but only certain bits.  Search on all metadata is probably not a good idea...

    Private Data:

    The Elgg engine supports exporting entities, which will include metadata and annotations (with access levels respected).  Private data isn't exported by default, nor is it searched.
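
    A toy model of that difference, under heavy simplification: the record layout, the boolean 'public' flag (standing in for real access levels), and sketch_export() are all invented for this sketch and do not reflect Elgg's export code.

```php
<?php
// Hypothetical entity record; field names are illustrative, not Elgg's schema.
$entity = array(
    'guid' => 42,
    'metadata' => array(
        array('name' => 'dob',  'value' => '1980-01-01', 'public' => false),
        array('name' => 'tags', 'value' => 'toys',       'public' => true),
    ),
    'private' => array('api_key' => 'secret'),
);

// Build an export for a viewer: metadata is filtered by access,
// while private data is never copied into the result.
function sketch_export(array $entity, $viewerIsOwner)
{
    $out = array('guid' => $entity['guid'], 'metadata' => array());
    foreach ($entity['metadata'] as $item) {
        if ($item['public'] || $viewerIsOwner) {
            $out['metadata'][$item['name']] = $item['value'];
        }
    }
    return $out;
}

$forStranger = sketch_export($entity, false); // only the public 'tags' metadata
$forOwner    = sketch_export($entity, true);  // 'dob' and 'tags', still no private data
```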

  • Once again Brett, thanks for the discussion.  I think this discussion will be very helpful to newcomers (it was to me).  

    While I work on a clean summary, maybe it would be worthwhile to add a link in the docs directly to this discussion (maybe in the FAQ).  Just an idea.  I'll go ahead and add it and someone can remove it if they don't like it (it is a wiki after all).

    Thanks again.