Use of ElggBatch

I have read about the use of ElggBatch in several discussions about using elgg_get_entities with $limit set to 0 to avoid OOM errors. My question is: why does Elgg core (1.8.13) contain this kind of code:

$friends = get_user_friends($user_guid, "", 999999, 0);

Is it because the case of a user having 25000 friends is nearly impossible?

I want to use ElggBatch in a cron task to calculate a ranking of users (and I have read good solutions about this in the Elgg community :) ). This cron task would update an array of the top 5 users, which would be displayed on my dashboard and activity pages. What would be the best way to save (cache?) this array? I thought of storing it as metadata on the site entity; would that be a good solution?

Thanks.


  • ElggBatch is good for cases where you want to get the result, do something, then forget about it.  Getting all friends means you'll need to keep all of the friends in memory, so it doesn't matter if ElggBatch is used or not.

    In your case, why not record the ranking of each user as metadata on the user - that way you can just do an elgg_get_entities_from_metadata() to get your top 5.  Then, if one of the top 5 is deleted, you're not left with metadata elsewhere pointing at it.
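    A minimal sketch of that approach, assuming Elgg 1.8: the metadata name 'rank' is just an illustration, and the ordering options are the standard elgg_get_entities_from_metadata() parameters.

    ```php
    <?php
    // Fetch the 5 highest-ranked users by their 'rank' metadata
    // (hypothetical name - whatever your cron task writes).
    $top5 = elgg_get_entities_from_metadata(array(
        'type' => 'user',
        'metadata_names' => 'rank',
        'order_by_metadata' => array(
            'name' => 'rank',
            'direction' => 'DESC',
            'as' => 'integer', // sort numerically, not lexically
        ),
        'limit' => 5,
    ));
    ```

    Because the rank lives on each user, deleting a user automatically removes their rank along with them.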

  • ok thanks a lot Matt...

  • No problem - keep in mind that currently there's a bug in ElggBatch due to the new caching system, so if you end up with OOM errors that's why.  Hopefully that will get fixed soon as I have a number of plugins that rely on ElggBatch...

  • So basically, if I want to grab a huge number of entities without doing any processing on them, I can use $limit = 0 without OOM risk? Is that right?

  • Within reason; there is still a memory footprint that's linear in the number of entities, so you will run out of memory at some point.  How many entities depends on a number of factors (the amount of metadata, for example, now that all metadata is loaded with the entity).

    Once the ElggBatch bug is fixed it's definitely better to use ElggBatch wherever possible.  You'll still have linear time increases while processing lots of entities, but won't have OOM issues.
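    For reference, a typical ElggBatch loop for the ranking cron task would look something like the sketch below (compute_user_score() is a hypothetical helper standing in for your own ranking logic):

    ```php
    <?php
    // ElggBatch pulls entities in small chunks through the getter,
    // so only a handful of ElggUser objects are in memory at a time.
    $batch = new ElggBatch('elgg_get_entities', array(
        'type' => 'user',
        'limit' => 0, // 0 = no limit in Elgg 1.8
    ));

    foreach ($batch as $user) {
        $score = compute_user_score($user); // your own ranking logic
        $user->rank = $score;               // saved as metadata on the user
    }
    ```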

  • OK, I see... Thank you Matt for all your explanations, and congratulations on your work on Elgg (it is an amazing tool). I read that to reproduce the ElggBatch bug you grab something like 25000 users... well, by the time I have that many users I'm sure the bug will be solved ;-) 

  • If you don't need them, you can improve performance by not creating the user entities (pass 'callback' => '' to elgg_get_entities). You'll get stdClass DB rows instead of ElggUser objects.

    We're not sure of the source of the memory leak, but I suspect the DB query cache, which stores all result sets and currently has no size limit. Cash Costello is working on an LRU implementation for this (and other in-memory caches).
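    A short sketch of the raw-row approach: with an empty callback you get the columns of the entities table on plain stdClass objects, but no ElggEntity methods and no metadata loading.

    ```php
    <?php
    // 'callback' => '' skips entity construction entirely; each result
    // is a raw stdClass DB row rather than an ElggUser object.
    $rows = elgg_get_entities(array(
        'type' => 'user',
        'limit' => 0, // 0 = no limit in Elgg 1.8
        'callback' => '',
    ));

    foreach ($rows as $row) {
        // Column values such as $row->guid are available directly.
        $guid = $row->guid;
    }
    ```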