Some Elgg performance/caching ideas

I just wanted to dump some ideas here lest I forget. I think the key to improving Elgg's performance is going to be maximizing re-use of views across users and across time. Some ideas I think could be pursued:

1. A caching mechanism that can be dynamically enabled (or disabled) for arbitrary views, with flexible criteria for reuse/invalidation. This would be a scalpel, allowing Elgg developers to optimize particular cases on a site. E.g. a way to turn this into code: "On URLs matching pattern A, cache the foo/bar view across all non-admin users and invalidate it every 30 seconds"

2. A mechanism for rendering views that can be reused across the most common user cases. E.g., say an entity summary view has 3 basic cases: [logged out users, logged in users, users who can edit it]. Instead of rendering for each as necessary, render all 3 views to a single string with a unique boundary (like in MIME) and cache it. Before display to a particular user, cut out the unneeded sections with PHP's blazingly fast binary-safe string functions (str_replace, strpos, substr, etc.).

3. Embed particularly useful metadata into cached representations, such as whether an entity is public or viewable by all logged in users. E.g., in an entity list, it may be more efficient to cache the first several pages of all applicable entities. Then when a user needs page 1, look up access visibility only for the more restricted items and just cut out invisible items from the cached string. Later pages could use the standard list_entities queries: most users won't get to these pages anyway.

4. Explore new pagination models: Ideally, no separate COUNT queries would be needed at all: always select one more row than you need to display the current page, which tells you if the next page exists. For situations where full page counts are necessary, cache the COUNT queries, and give larger count values longer TTLs. (Imagine if Google had to give accurate page counts for search results on every page. No one needs that.) The endless scroll type of item viewing also allows for nice optimizations, like not having to worry about displaying fixed-size chunks of entities, and leaning a bit on HTTP caching.

5. Somehow allow particular view sequences to be "compiled" into flatter code while stripping away unnecessary code paths. Even with the view cache, PHP has to load lots of files and make a great many deeply nested function calls, which I bet have a real cost.

6. Slowly transition the API towards passing entities directly into functions rather than GUIDs. I see many instances where $entity->guid is passed into a function, which immediately turns around and calls get_entity($guid). Step through this process and watch the call stack... What I've always seen is that function calls are expensive when they start to add up.

  • #6 should eventually be a moot point because we'll push that logic into ElggEntity itself rather than having a standalone function at all. But in general, I'm definitely in favor of passing around objects rather than GUIDs for the reasons given.
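
To make idea 1 a bit more concrete, here is a rough sketch of what a rule-based view cache might look like. Everything in it is hypothetical: `ViewCacheRules`, `addRule()` and `render()` are names made up for illustration and are not part of Elgg, the in-memory array stands in for whatever backend (memcached, files, etc.) a site would really use, and the `$now` parameter exists only so expiry can be exercised without waiting.

```php
<?php
// Hypothetical sketch of idea 1; none of these names exist in Elgg.
final class ViewCacheRules
{
    /** @var array<int, array{url: string, view: string, ttl: int, admins: bool}> */
    private array $rules = [];

    /** @var array<string, array{expires: int, html: string}> stand-in for a real cache backend */
    private array $store = [];

    /** Register: "on URLs matching $urlPattern, cache $view for $ttl seconds". */
    public function addRule(string $urlPattern, string $view, int $ttl, bool $includeAdmins = false): void
    {
        $this->rules[] = ['url' => $urlPattern, 'view' => $view, 'ttl' => $ttl, 'admins' => $includeAdmins];
    }

    /** Return cached output if a rule applies and is still fresh; otherwise call $renderer. */
    public function render(string $url, string $view, bool $isAdmin, callable $renderer, ?int $now = null): string
    {
        $now = $now ?? time();
        foreach ($this->rules as $rule) {
            if ($rule['view'] !== $view || preg_match($rule['url'], $url) !== 1) {
                continue; // rule does not apply to this view/URL
            }
            if ($isAdmin && !$rule['admins']) {
                break; // admins bypass the cache under this rule
            }
            // One entry per (rule, view): shared by every URL and user the rule covers.
            $key = $rule['url'] . '|' . $view;
            if (isset($this->store[$key]) && $this->store[$key]['expires'] > $now) {
                return $this->store[$key]['html'];
            }
            $html = $renderer();
            $this->store[$key] = ['expires' => $now + $rule['ttl'], 'html' => $html];
            return $html;
        }
        return $renderer(); // no applicable rule: render normally
    }
}

// "On URLs matching pattern A, cache foo/bar across non-admin users for 30 seconds."
$cache = new ViewCacheRules();
$cache->addRule('#^/activity#', 'foo/bar', 30);
```

A real version would also need the view's input variables in the cache key whenever they affect the output; this sketch assumes they don't.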
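
Idea 2 (one cached string holding all the common cases) could be sketched roughly like this. The function names `pack_variants()` and `extract_variant()` are invented for the example and are not Elgg functions. The sketch also simplifies things by storing each case as a complete variant, rather than sharing common markup between them.

```php
<?php
// Hypothetical helpers for idea 2; not part of Elgg.

/** Join the rendered variants into one cacheable string, MIME-style. */
function pack_variants(array $variants, string $boundary): string
{
    $out = '';
    foreach ($variants as $name => $html) {
        if (strpos($html, $boundary) !== false) {
            // The boundary must never occur inside the content itself.
            throw new InvalidArgumentException("Boundary collides with content of '$name'");
        }
        $out .= "--{$boundary}:{$name}\n" . $html;
    }
    return $out . "--{$boundary}--\n"; // closing marker
}

/** Cut one variant back out using only strpos/substr. Returns null if absent. */
function extract_variant(string $packed, string $name, string $boundary): ?string
{
    $marker = "--{$boundary}:{$name}\n";
    $start = strpos($packed, $marker);
    if ($start === false) {
        return null;
    }
    $start += strlen($marker);
    // The next boundary is either the following variant or the closing marker.
    $end = strpos($packed, "--{$boundary}", $start);
    return substr($packed, $start, $end - $start);
}

// Render once, cache the packed string together with its boundary...
$boundary = bin2hex(random_bytes(8));
$packed = pack_variants([
    'logged_out' => '<div>summary</div>',
    'logged_in'  => '<div>summary <a>like</a></div>',
    'can_edit'   => '<div>summary <a>like</a> <a>edit</a></div>',
], $boundary);

// ...then per request, pick the case that applies to the current user.
$html = extract_variant($packed, 'logged_in', $boundary);
```

The per-request cost is two `strpos` calls and one `substr`, regardless of how expensive the original views were to render.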
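
The "select one more row than you need" trick from idea 4 might look something like the following. `fetch_page()` is a made-up helper, not an Elgg function, and `$fetch` is assumed to be any callable that takes a limit and an offset and returns rows (for example by running a query with LIMIT and OFFSET).

```php
<?php
// Hypothetical helper for idea 4; not part of Elgg.

/**
 * Fetch one page without a separate COUNT query.
 *
 * @param callable $fetch   function (int $limit, int $offset): array
 * @param int      $page    1-based page number
 * @param int      $perPage rows to display per page
 */
function fetch_page(callable $fetch, int $page, int $perPage): array
{
    $offset = ($page - 1) * $perPage;

    // Ask for one extra row; if it comes back, a next page exists.
    $rows = $fetch($perPage + 1, $offset);

    return [
        'items'    => array_slice($rows, 0, $perPage), // drop the probe row
        'has_next' => count($rows) > $perPage,
        'has_prev' => $page > 1,
    ];
}

// Usage with an in-memory array standing in for the database:
$data = range(1, 25);
$fetch = function (int $limit, int $offset) use ($data): array {
    return array_slice($data, $offset, $limit);
};
$page = fetch_page($fetch, 1, 10);
```

This is enough to render "previous/next" links; a total page count would still need a (cacheable) COUNT query.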

Performance and Scalability


If you've got a need for speed, this group is for you.