'Pages' very slow to load

On our 2.3 site, one of our users makes heavy use of Group Pages.  Currently the group has ~2,800 pages.  I noticed that the page loads on any of the Group Pages for this group (top level or sub-pages) are very slow, on the order of 30s.

I traced the performance issues to the recursion that builds the pages navigation menu (pages/lib/pages.php:pages_get_navigation_tree()).

Since all Page objects have container_guid set, I wondered whether this could be easily done without recursion just by grabbing all the pages within a container.  So I tried this:

$pages = elgg_get_entities([   // ElggBatch was ~5x slower
    'type' => 'object',
    'subtypes' => ['page_top','page'],
    'container_guid' => $container->getGUID(),
    'limit' => false,
]);

foreach ($pages as $page) {
    $tree[] = [
        'guid' => $page->guid,
        'title' => $page->getDisplayName(),
        'url' => $page->getURL(),
        'parent_guid' => $page->parent_guid, // not sure about a top-level page
    ];
}

It was ~20x faster than the recursive method, and seemed to work equally well on Group Pages and Site Pages.  Note that it was only ~4x faster if I used ElggBatch instead of elgg_get_entities directly.
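
To make the idea concrete without a running Elgg install, here is a minimal sketch of how flat rows like the $tree entries above can be indexed by parent in a single pass. The sample rows and the function name index_children_by_parent() are hypothetical, and I'm assuming a top-level page simply has an empty parent_guid, which I haven't confirmed.

```php
<?php
// Hypothetical rows in the same shape as the $tree entries above.
$tree = [
    ['guid' => 11, 'title' => 'Setup',    'parent_guid' => 10],
    ['guid' => 10, 'title' => 'Handbook', 'parent_guid' => null], // assumed: top-level pages have no parent
    ['guid' => 12, 'title' => 'FAQ',      'parent_guid' => 10],
    ['guid' => 13, 'title' => 'Install',  'parent_guid' => 11],
];

/**
 * Index child GUIDs by parent GUID in one pass over the flat list.
 * Key 0 collects the top-level pages.
 */
function index_children_by_parent(array $rows): array {
    $children = [];
    foreach ($rows as $row) {
        $parent = (int) ($row['parent_guid'] ?? 0);
        $children[$parent][] = $row['guid'];
    }
    return $children;
}

$children = index_children_by_parent($tree);
```

With the rows above, $children[10] holds the two direct children of page 10, and no further queries are needed to walk the hierarchy.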

I was just wondering if I'm missing anything obvious in my approach above.  Maybe it's not as elegant as the recursion, but the poor performance of the recursion on large page structures is a deal-breaker for us.

I also noticed that Elgg 3.x reorganized things a bit, and it's possible the performance is improved there.  I saw an ElggPage class with a single 'page' subtype instead of 'page_top' and 'page'.  But the recursion is still there in lib/pages.php, and I don't see any substantial difference in the data structures or algorithm that would make it faster.

  • When you use 'limit' => false in elgg_get_entities, you fetch all ~2,800 pages at once, which takes time. That is why pagination is used: load n elements at a time (the default being 10 or 20), then show the next n elements on the next page using an offset.

  • Rohit, this isn't an issue of pagination.  Whether we use ElggBatch or simply elgg_get_entities(), the purpose of pages_get_navigation_tree() is to build a navigation structure for all the pages in the container (Group or site/owner) when the page loads.

    Additionally, fetching all 2,800 pages at once with elgg_get_entities() was by far the fastest approach of the three I tried.  :)  It was 5x faster than using ElggBatch with the default chunk size, and 20x faster than the default approach (which combines ElggBatch with recursion).

  • You can try this:

    $pages = elgg_get_entities([
        'type' => 'object',
        'subtypes' => ['page_top','page'],
        'limit' => false,
        'container_guid' => $container->guid,
        'batch' => true,
        'batch_inc_offset' => false,
    ]);
    
    foreach ($pages as $page) {
    
    ....
  • You could have a look at the Pages Tools plugin (https://github.com/ColdTrick/pages_tools/releases). It removes the tree for the entire container and only shows the tree for the current page structure.

    Could help with performance.

    @RvR

    With 'batch_inc_offset' => false this will loop infinitely, so it won't help: the offset never advances, so a read-only loop keeps fetching the same first chunk.

    @Josh

    If you don't use an ElggBatch and set 'limit' => false, ALL entities get loaded into memory at once, which can cause out-of-memory errors. That's why the recursion is there (I think).

  • We did something like this to speed up page tree navigation

    /**
     * Produce the navigation tree
     * 
     * @param ElggEntity $container Container entity for the pages
     *
     * @return array
     */
    function pages_get_navigation_tree($container) {
    	if (!elgg_instanceof($container)) {
    		// return an empty tree so callers always receive an array
    		return [];
    	}
    
    	$container_guid = $container->getGUID();
    
    	// populate in-memory array of all top-page and child-page entities for the group/person
    	// using a fast database query
    	$pages = elgg_get_entities([
    		'type' => 'object',
    		'subtypes' => ['page', 'page_top'],
    		'container_guid' => $container_guid,
    		// sort and process by oldest first since most parent pages will be created before their children pages
    		'order_by' => 'e.guid asc',
    		'limit' => false
    	]);
    	$pages_count = count($pages);
    
    	$tree = array();
    	$depths = array();
    	$loops = 0;
    	$timer = new Timer();
    	$timer->start();
    
    	// hacky way to prevent infinite loops
    	$max_seen = 5;
    
    	// sort pages
    	/* @var \ElggObject $page */
    	while ($page = array_shift($pages)) {
    		$is_orphan = false;
    
    		$loops++;
    		if ($page->getVolatileData('seen') > $max_seen) {
    //			throw new \Exception("Exceeded $max_seen tries to build tree for page $page->guid. Please contact an administrator.");
    			error_log("Giving up and adding orphaned subpage to end of list");
    			$is_orphan = true;
    		} else {
    			$page->setVolatileData('seen', (int)$page->getVolatileData('seen') + 1);
    		}
    
    		if ($page->parent_guid && !$is_orphan) {
    			// if the page has a parent, but we haven't found it yet, append to end of list
    			// to process again
    			if (!isset($depths[$page->parent_guid])) {
    				array_push($pages, $page);
    				continue;
    			} else {
    				$depth = $depths[$page->parent_guid] + 1;
    				$parent_guid = $page->parent_guid;
    			}
    		} else {
    			$depth = 0;
    			$parent_guid = 0;
    		}
    
    		$depths[$page->guid] = $depth;
    
    		$tree[] = [
    			'guid' => $page->getGUID(),
    			'title' => $page->title,
    			'url' => $page->getURL(),
    			'depth' => $depth,
    			'parent_guid' => $parent_guid,
    		];
    	}
    
    	$timer->stop();
    
    	// @todo remove once confirmed performance is okay
    	if (!empty($_GET['debug'])) {
    		var_dump($tree);
    		var_dump("Loops: $loops");
    		var_dump("Pages: $pages_count");
    		var_dump("Time to process: " . $timer->result());
    		exit;
    	}
    
    	return $tree;
    }
    
  • @Jerome

    I was wondering about whether it might make sense to limit the Navigation structure to just the current page.  I'll take a look at the page_tools plugin -- thanks!  Though for the top level 'Group Pages' section of a group, I think it's going to create the hierarchy for all pages in the group regardless.

    Thanks for the warning about the possible out-of-memory error.  Our server is reasonably beefy, but this is something worth keeping an eye on.  We could probably mitigate the risk (and still get acceptable performance) by using ElggBatch with a fairly large chunk size.  Apparently our server can handle ~2,800 pages fetched at once, but maybe chunking 500 or 1000 at a time would be prudent.
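
    To show what I mean by chunking, here is a rough sketch of paging through a result set with a fixed chunk size. It is not ElggBatch itself: fetch_in_chunks() and the fake $getter are hypothetical stand-ins, and I'm only assuming the real getter honours 'limit' and 'offset' options.

```php
<?php
/**
 * Yield rows chunk by chunk from any getter that accepts 'limit' and 'offset'.
 * $getter is a hypothetical stand-in for a call such as elgg_get_entities().
 */
function fetch_in_chunks(callable $getter, array $options, int $chunk_size): Generator {
    $offset = 0;
    do {
        $chunk = $getter(array_merge($options, [
            'limit' => $chunk_size,
            'offset' => $offset,
        ]));
        foreach ($chunk as $row) {
            yield $row;
        }
        // Advance the offset; without this the same chunk would be fetched forever.
        $offset += $chunk_size;
    } while (count($chunk) === $chunk_size);
}

// Fake data source: 2,800 integers standing in for page entities.
$all = range(1, 2800);
$queries = 0;
$getter = function (array $options) use ($all, &$queries): array {
    $queries++;
    return array_slice($all, $options['offset'], $options['limit']);
};

$seen = 0;
foreach (fetch_in_chunks($getter, [], 500) as $row) {
    $seen++;
}
```

    With a chunk size of 500 the fake source is queried six times for 2,800 rows, so only one chunk of result rows is held per query instead of the whole set. The navigation array built from those rows still grows with the number of pages, of course.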

  • @Jon,

    Interesting approach!  I might be missing something, but I don't think all that sorting is necessary.  Though I admit I'm not sure what you're trying to accomplish with the 'seen' volatile data.

    The items in the $tree array are added to the Navigation menu with elgg_register_menu_item(), and the hierarchy is determined by the 'parent_name' => $page['parent_guid'] option.  The order of the $tree items doesn't matter.  It doesn't even matter if you register a 'child' menu item of a parent before you register the parent itself.  Maybe under the hood in elgg_register_menu_item() things will go faster if $tree is already sorted -- I haven't looked.
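
    As an illustration of why ordering need not matter, here is a small sketch that nests flat items by parent_guid only after every item has been indexed. This is not how elgg_register_menu_item() works internally (I haven't checked); nest_items() and the sample data are hypothetical.

```php
<?php
/**
 * Build a nested tree from flat items. Input order is irrelevant because
 * parent links are resolved only after every item has been indexed by GUID.
 * Items whose parent is missing are kept as top-level entries.
 */
function nest_items(array $items): array {
    $nodes = [];
    foreach ($items as $item) {
        $nodes[$item['guid']] = $item + ['children' => []];
    }

    $roots = [];
    foreach ($nodes as &$node) {
        $parent = $node['parent_guid'];
        if ($parent && isset($nodes[$parent])) {
            // Attach by reference so children added later stay visible.
            $nodes[$parent]['children'][] = &$node;
        } else {
            $roots[] = &$node;
        }
    }
    unset($node); // break the last foreach reference

    return $roots;
}

// Deliberately listed child-first: grandchild, child, then the top-level page.
$items = [
    ['guid' => 13, 'title' => 'Install',  'parent_guid' => 11],
    ['guid' => 11, 'title' => 'Setup',    'parent_guid' => 10],
    ['guid' => 10, 'title' => 'Handbook', 'parent_guid' => 0],
];

$roots = nest_items($items);
```

    Even though the grandchild comes first in the input, it ends up two levels below the single top-level page.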

  • Frustrated by how slow Pages were loading because of the tree navigation, I pulled apart the code for building them in 1.9 and re-implemented it with PHP arrays instead of n database calls.

    We're using volatile data to store how many times we've 'seen' this page before, and we use it for error checking rather than maintaining yet another array/map to keep track of that. (Volatile data doesn't get set in the database the way other metadata does.)

    It may be that versions above 1.9 bring in features that make this easier, but it's a 'solved problem' for our site at this point.
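
    For comparison only, here is a sketch of the "yet another array/map" alternative: computing depths from a plain guid => parent_guid map, with a per-walk path array as the loop guard instead of a retry counter. This is not the code we run; compute_depths() and the sample map are hypothetical.

```php
<?php
/**
 * Compute each page's depth from a guid => parent_guid map.
 * A page whose parent is missing is treated as top-level (depth 0).
 * A parent cycle is broken at an arbitrary point so the walk always ends.
 */
function compute_depths(array $parent_of): array {
    $depths = [];
    foreach (array_keys($parent_of) as $guid) {
        $path = [];       // pages visited on this walk, nearest first
        $cursor = $guid;
        // Climb until we hit an unknown page, a known depth, or a repeat.
        while (array_key_exists($cursor, $parent_of)
            && !isset($depths[$cursor])
            && !isset($path[$cursor])) {
            $path[$cursor] = true;
            $cursor = (int) $parent_of[$cursor];
        }
        // -1 means we stopped above a top-level page.
        $depth = isset($depths[$cursor]) ? $depths[$cursor] : -1;
        foreach (array_reverse(array_keys($path)) as $visited) {
            $depths[$visited] = ++$depth;
        }
    }
    return $depths;
}

$depths = compute_depths([
    13 => 11,  // grandchild
    11 => 10,  // child
    10 => 0,   // top-level page
    20 => 99,  // orphan: parent 99 does not exist
    30 => 31,  // 30 and 31 point at each other
    31 => 30,
]);
```

    Each page is assigned a depth exactly once, so nothing is re-queued, and the orphan and the looping pair still end up with a depth instead of hanging the walk.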

Performance and Scalability

If you've got a need for speed, this group is for you.