optimising elgg 1.8 with nginx: sharing experiences; 3+ years of exploration & refinement

when i first installed elgg, way back in v1.7.4 - i was unaware of the extent to which i would need to learn to optimise servers just to be able to run elgg in a useful way.

i started with shared hosting and after using several different hosts i realised that i just couldn't get the performance i needed, even just for my own testing, let alone for actual production use. so i moved to a small vps - with 512MB RAM and 1 cpu. at this point i realised that apache was the bottleneck and thankfully nginx had stabilised and was ready for use. so i switched. there was no elgg configuration for nginx at that point, however i found one on a blog and it worked.

the site was running smoother and faster than with apache, with less hardware use, though the performance was still not great and i didn't know why. i assumed that the elgg code would be optimised to use best practices for server/browser caching and that the structuring of page templates and plugin flow would use similar best practices for the layout of css and javascript. however, my assumption was incorrect (reminding me that to assume makes an ass out of u and me!). at that point though, i was more focussed on building the site theme and the design aspects, so performance was not high on my list of priorities. recently i have moved away from design and have focused on making changes to ensure the site runs quickly and reliably and sharing some of what i have learned is the purpose of this page.

when i started optimising about 6 months ago, pages were commonly taking more than 5 seconds to load, sometimes 10 and sometimes 20 or more! i had no explanation for this since there was no simple tool available by default with nginx & php (that i knew of) that allowed me to view the causes of slow performance on the server.

 

newrelic - php/server benchmarking

i found the php / server analysis tool from http://newrelic.com/ and used their free trial period to locate bottlenecks in the code. these were PHP issues in the elgg code of certain plugins (mainly videolist and some other plugins as i recall - including my own 'related items' plugin). newrelic allowed me to drill down into quite low-level detail to find the sources of the slow areas in the code. i found some major delays were occurring with thumbnail generation for videos and applied a fix. after fixing bugs and poor design decisions in 5 or 6 plugins, plus removing some other plugins that were not necessary - the pages were loading about 50% faster.. but still not fast enough. (n.b. i just found that zend have a service that might be a free equivalent to newrelic and i will test it soon, once they release an update that is php5.5 compatible).

 

hardware

around this point i upgraded to 1GB of RAM on the server and found that more of the slowdown was coming from me running nginx, PHP, anti-virus, stat tracking (piwik) & email server all on that one VPS with not enough RAM to cover it all. with the extra RAM capacity the site sped up some more, though was still not fast enough for enjoyable, multi-user production use.

 

caching

after the free trial of newrelic ran out i shifted my approach and looked more closely at the server configuration and caching. i began by testing varnish cache, which did speed the site up nicely and without any configuration being done by me at all. i realised though that nginx is supposed to have a similar ability to varnish so i sought instead to not use varnish (i later discovered that varnish doesn't support https connections so it would have been no use to me anyway without another layer being added to translate - which is inefficient).

i had not previously realised that the default settings for elgg (plus plugins) and also for nginx do not apply helpful caching headers to the various file types we are using on our websites and so most files were being reloaded on every page load, which is highly inefficient! after learning the syntax for nginx configuration files i applied sensible caching to js/css/images/html and all the other types of files that my elgg site is serving. this made another dramatic improvement to speed of use.
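
as an illustration only, and not the exact configuration used on this site, caching headers for static files can be set in nginx along these lines. the list of file extensions and the 30 day lifetime are assumptions to adapt, not recommendations taken from the text above.

```nginx
# sketch: long-lived browser caching for static assets (goes inside a server { } block).
# the extension list and the 30 day lifetime are assumptions - adjust them to your site.
location ~* \.(?:css|js|png|jpe?g|gif|ico|svg|woff)$ {
    expires    30d;                      # sends Expires plus Cache-Control: max-age=2592000
    add_header Cache-Control "public";   # allows shared caches to store the file too
    access_log off;                      # optional: keeps asset hits out of the access log
}
```

note that this location only helps for files that nginx serves straight from disk. urls with these extensions that are actually generated by elgg through php would need to fall back to elgg (for example via try_files) rather than be answered here, so test carefully before relying on it.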

after upgrading to PHP5.5 (which i recommend for performance, security and reliability), i installed and configured the PHP Opcode caching which provides a level of caching within the server that reduces the amount of PHP processing through allowing re-use of commonly run code/requests. this improved performance perhaps by 30-40% in some cases.

i have now also applied 2 levels of caching in nginx directly. microcaching is used in some areas, so pages are cached for several seconds - ensuring that refreshes and concurrent requests can be made from multiple visitors without incurring extra overhead, while maintaining near real-time data availability from elgg. file caching in nginx is also enabled (using the open_file_cache series of server directives).
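
a rough sketch of how these two levels can be expressed is below. it is not the configuration from this site: the cache path, zone name, sizes, the 5 second lifetime and the session cookie name "Elgg" are all assumptions, and which requests are safe to microcache depends on the site.

```nginx
# --- http { } context ---

# on-disk storage for the microcache; path, zone name and sizes are assumptions
fastcgi_cache_path /var/cache/nginx/microcache levels=1:2
                   keys_zone=microcache:10m max_size=100m inactive=10m;

# only GET and HEAD requests are candidates for caching
map $request_method $skip_method {
    default 1;
    GET     0;
    HEAD    0;
}

# assumption: the session cookie is named "Elgg";
# any visitor who sends one is never served from the cache
map $http_cookie $has_session {
    default   0;
    "~*Elgg=" 1;
}

# file caching: keep descriptors and metadata of frequently opened files
open_file_cache          max=2000 inactive=20s;
open_file_cache_valid    30s;
open_file_cache_min_uses 2;
open_file_cache_errors   on;

# --- inside the server { } block, added to the existing php location ---
location ~ \.php$ {
    # ...the existing fastcgi_pass / fastcgi_param lines stay as they are...

    fastcgi_cache           microcache;
    fastcgi_cache_key       "$scheme$request_method$host$request_uri";
    fastcgi_cache_valid     200 5s;                  # keep good responses for 5 seconds
    fastcgi_cache_lock      on;                      # one request fills the cache, the rest wait
    fastcgi_cache_use_stale updating error timeout;  # serve the old copy while refreshing
    fastcgi_cache_bypass    $skip_method $has_session;
    fastcgi_no_cache        $skip_method $has_session;
}
```

by default nginx also refuses to cache any response that carries a Set-Cookie header, so in this sketch only cookie-less traffic benefits. that is the cautious choice for a social network, where most pages differ per logged-in user.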

i also explored using cloudflare and incapsula for a while as CDNs and thus globally cached data was available in different continents for my site - even though i only use one server for elgg. i found this was helpful for speed, however i also found that since i value privacy and security for my site's purpose, i was not willing to add the potential security risk of essentially 'employing a doorman/security guard' for my site who would not let me see, moment to moment, the exact details of what it is doing. so i no longer use these CDN services, though they may be of use to you.

encryption / ssl

on the topic of privacy/security, i chose to encrypt the entire site recently and saw that this also slowed the site - mostly on connection and not so much after that. after exploring various options i created a free ssl certificate using openssl and stored it with startssl.com for free. thus i now have a free method of encrypting my site and the site has the 'green safe logo/ padlock' for all pages (i needed to make sure that all links in the site code and database point to https links and not http). after reading some tutorials on this and also after listening to many of the 'black hat' conferences via youtube i realised that ssl has been mostly broken for a long while and that a lot of the information online is either incorrect or is deliberate mis-information.. so i needed to experiment and explore more.

i found that i needed to reduce the number of certificates in the ssl chain to be as small as possible to optimise the ssl connection time. this page offered some extra tips for optimising nginx with ssl (http://www.igvita.com/2013/12/16/optimizing-nginx-tls-time-to-first-byte/). i also included many settings in the nginx setup to optimise the tls encryption process.
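
the kind of settings involved can be sketched as below. these are not the settings from this site: the file paths are hypothetical, the values are assumptions to tune, and protocol and cipher choices are deliberately left out since the recommended values change over time (the ssllabs test is a better guide for those).

```nginx
# sketch: goes inside the https server { } block; all values are assumptions to test
ssl_certificate     /etc/nginx/ssl/example.org.chained.crt;  # hypothetical path: site cert + only the intermediates needed
ssl_certificate_key /etc/nginx/ssl/example.org.key;          # hypothetical path

ssl_session_cache   shared:SSL:10m;  # returning visitors resume a session instead of a full handshake
ssl_session_timeout 10m;

ssl_buffer_size     4k;              # smaller tls records for a faster first byte (nginx 1.5.9 or newer)

ssl_stapling        on;              # the server sends the OCSP response itself
resolver            127.0.0.1;       # assumption: a local dns resolver, used to reach the OCSP responder
```

keeping the chained certificate file down to the site certificate plus the intermediates that are actually required is what keeps the chain short.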

after using the ssl testing tool at qualys and a few weeks of learning about and tweaking the ssl configuration, i now see my site has an A rating (https://www.ssllabs.com/ssltest/analyze.html?d=infiniteeureka.com) for encryption. forward secrecy is the process (as i comprehend currently) of ensuring that the relevant aspects of the security cyphers change regularly, such that if the codes are broken at any point then the previously transmitted encoded messages will not be automatically accessible - the same level of code breaking is required for each transmission (or possibly session). i am no expert in this since i do not code encryption cyphers directly - however, i see now that most browsers can connect to my site with forward secrecy and thus the security is, in that sense, comparable to the busiest encrypted sites on the web.

compression

the use of gzip compression went a long way to reduce the size of the pages on my site. through tweaking the settings in nginx (which are widely available online) i reduced the page size by at least 50%.
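
a minimal sketch of such gzip settings follows. the compression level, minimum length and list of mime types are assumptions, not the values used on this site.

```nginx
# sketch: goes in the http { } context; values are assumptions to tune
gzip            on;
gzip_comp_level 5;      # middle ground between cpu cost and size
gzip_min_length 256;    # very small responses are not worth compressing
gzip_proxied    any;    # also compress responses to requests that arrive via a proxy
gzip_vary       on;     # adds Vary: Accept-Encoding so caches store both variants

# text/html is always compressed once gzip is on; other types must be listed
gzip_types text/css text/plain text/javascript
           application/javascript application/x-javascript
           application/json application/xml image/svg+xml;
```

already-compressed formats such as jpg and png are left out on purpose, since gzipping them costs cpu time for almost no gain.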

 

performance benchmarking and load testing

i have used a variety of tools and apps along the way to help me in this task of optimisation. the one i currently am using for load testing is loadimpact (https://loadimpact.com/), which shows how the server handles multiple concurrent connections. the results there are not amazing currently; i will continue to refine the server configuration and accept that a hardware boost is probably required to improve the performance at load.

while optimising the nginx configuration i found that the process described here (http://seravo.fi/2013/optimizing-web-server-performance-with-nginx-and-php) for graphing server response helped a lot to gain clear measurements of the effects of small changes. i created a series of 20+ graph lines that show how each of my changes gradually sped up (or in some cases slowed down) the server response with multiple concurrent connections - the response line is now relatively flat, so that the response times are fairly similar regardless of how many visitors connect, whereas when i started the process i was seeing a nearly 45 degree vertical incline as the site slowed down with more visitors hitting it concurrently.

 

image optimisation


since most of the images in elgg are served dynamically via the elgg engine, any optimisation of them needs to be done within elgg or possibly via a server extension such as google's pagespeed module for nginx. i have recently seen that google has been somewhat unkind here in that they have offered pagespeed for free during its creation process and many coders are adding to it on github, yet they now have a notice saying that the pagespeed service will be a paid service 'at some future point'. so i will not use their tool for optimising here.

instead i have optimised static files as much as possible (see related links below for a useful optimiser for png files) and for a while i used the lazy_load_images elgg plugin to only load the images that are visible on the page - which can speed up page loads for longer pages. currently i have disabled the lazy loading since there is an intermittent bug with chrome/chromium and the activity page in elgg when lazy loading images is enabled, which results in images not always loading.

i think there is considerable room for optimising elgg's serving of images, particularly through applying optimised headers in tidypics and for icons in elgg directly. this ticket in github is one which i think should be resolved asap to increase performance /caching of images in elgg (https://github.com/Elgg/Elgg/issues/4279).

 

page layout - javascript and css

i found the google pagespeed insights tool to be helpful in identifying bottlenecks in the site's javascript and css design (http://developers.google.com/speed/pagespeed/insights/?url=https%3A%2F%2Fwww.infiniteeureka.com). once i combined the javascript files from my site into one large file (i am still refining this as some files are still loaded separately), the page weight decreased by around 40-50KB and the speed increased too. i still need to optimise the css file for my elgg theme as currently all css is loaded for all pages - leaving 1000+ tags being loaded unnecessarily for many pages.

i just recently found the html5 boilerplate files (http://html5boilerplate.com/) which offer some nice config tips and theming ideas for performance and reliability - i carefully read through the nginx files and other areas and grabbed parts that i didn't already use that are useful.

i also replaced the default jquery and jquery-ui files in elgg with the versions available from the google CDN, so that most (some/many) visitors to my site will not download jquery since it will already be available in their browser cache - in theory. if they do download these files they will be from google's own servers and thus the delivery will be highly optimised for free.

 

the results?

currently, the site is loading rapidly for visitors after they have made the initial ssl connection and downloaded the js/css that is reused throughout the site. the initial connection and page load takes between 3 and 5 seconds usually and subsequent loads are faster - between 1 and 3 seconds usually if the visitor is located within range of my european server. i have not yet seen the site under the weight of a lot of traffic - i am sure i will need to make more changes then.

 

in summation

coding and configuring elgg for performance needs to be a priority for anyone who intends to use elgg for any task more than just playing at home. the tools are available for anyone with enough time / focus / intention to find them and use them in a helpful way. the raw code of many plugins and also the core has space for improvement and the server configuration shared for nginx can be greatly enhanced.

what now?

  • i will improve the ssl connection speed somehow - i think that future releases of nginx will improve this - so i may wait for them to see what they do before diving too deeply into that.
  • i have not yet successfully activated memcached for use with elgg - my previous attempts were not successful. i see there are some current tickets in github related to this, so i will probably wait for elgg 1.9 before i do that.
  • the minds.com team are due to release a fork of elgg which uses a nosql database in place of mysql which is offering enhanced performance, so i may switch to that when they are ready.
  • i am aware that the caching headers for some plugins and images for elgg core need to be improved - i will look to inspire that via github or will do it myself when i have the time/space available.

at some point i will probably make the config files available on github to inspire learning and improvement within the elgg community (and the files themselves). i know that the lorea group has a version of their nginx files online there too. maybe you have some tips and thoughts that you can share here that may help too?

 

more useful links

http://gzipwtf.com/ - test your server's use of gzip

https://www.ssllabs.com/ssltest/index.html - ssl / encryption testing

https://tinypng.com/ - highly efficient image optimisation for static png files

http://www.webpagetest.org/ - waterfall breakdown of page load, timing and server responses

http://tools.pingdom.com/fpt/ - another performance analysis tool, including waterfall breakdown and other useful metrics

  • Here are the results on one Elgg 1.8 site after I configured memcache to it:

    image

    As you can see, the number of database queries was reduced very dramatically.

     

    The CPU usage was also reduced significantly (although this is thanks to other optimizations also):

    image

  • excellent, ok, thanks for sharing. when i have looked at memcached previously i concluded that i would need to add some code to plugins and possibly to core to have a useful effect.

    did you optimise your code much to see this benefit from memcached @juho?
    i think we spoke about this previously (or maybe it was gerard) and i did briefly test memcached again then.

  • when i have looked at memcached previously i concluded that i would need to add some code to plugins and possibly to core to have a useful effect.

    No, it is a built-in feature in the Elgg core.

    did you optimise your code much to see this benefit from memcached

    Modifications are not required to get benefit from memcache.

    I did however remove one plugin that was polling the server every now and then. But I don't think that had much effect on the results. I removed it a few days before I enabled memcache, but there were no visible changes in the MySQL and CPU usage back then.

    (One issue still remains in Elgg's memcache: we don't have site specific namespacing yet, so it isn't possible to use the same memcache server for more than one Elgg instance.)

  • since you have nginx, why not boost it with a CDN like cloudflare?

  • i also explored using cloudflare and incapsula for a while as CDNs and thus globally cached data was available in different continents for my site - even though i only use one server for elgg. i found this was helpful for speed, however i also found that since i value privacy and security for my site's purpose, i was not willing to add the potential security risk of essentially 'employing a doorman/security guard' for my site who would not let me see, moment to moment, the exact details of what it is doing. so i no longer use these CDN services, though they may be of use to you.

  • i'm using their free service which is working fine for me at the moment.

  • My experience with cloudflare is that it actually decreased performance. The average download speed reported in google webmaster with cloudflare was around 1,1 second and now it is 300 msec. And that is google bot crawling from the US to my European server, instead of to a US server nearby from cloudflare.

    What I would like to add to this very informative article: go through the plugins you have loaded and check the javascript parts. A lot of plugin developers do not use simplecache and, even worse, they don't pay attention to when the javascript is actually loaded and, for simplicity, load it in start.php. Scripts should be loaded at the very last moment, when they are really needed, and not on every page of your site. This drastically reduces page weight and improves server response time.

    Regarding memcache, I did intensive testing with it and it did not bring any improvements. It probably will, if you are able to distribute it around the world and have many memcache servers offloading your database queries. But local memcache did not bring me anything. Cloudflare is providing such a service, but given my previous experience I am not risking the bet until some very positive reports come in.

  • i have been optimising the javascript from the various plugins today, yes.

    i have used the unregister function in elgg to replace the calls to the js files with my own calls in my theme and where possible i have combined the files into one big one. mostly i am loading them in the footer - though i have not found a way to load jquery in the footer yet.

    the response times for my server are certainly being slowed by something like this. not sure what exactly yet.

    another reason not to use cloudflare is that their free service does not allow you to use https - though they say they will be changing that this year.

  • Only combine js files when loaded on each page. Otherwise optimise by checking the load_js statements.

    Creating a huge minified js only helps when these scripts are really needed on all pages and you can squeeze out redundant functions. Too risky for me, it makes debugging a nightmare.

  • @gerard checkout my elgg site http://demyx.com it has nginx with cloudflare and loads almost instantly on most pages except certain areas. i forgot which accelerator i used for nginx but i did set that up too.

    @ura cloudflare minimizes the js on my site on the fly so i don't have to worry about stripping white-spaces or breaklines. 

    image

    i'm not sure how accurate this is but i do like data :D

  • my thinking is that by combining all commonly used js files into one and then serving via gzip, most of the site's js comes down in one transfer and is then cached locally - so there is no noticeable performance loss as a result - except maybe for visitors who visit, load one page and then leave.

    i have a script here that uses some type of external google combining service to merge all the various javascript files into one file and i can easily do that in a few seconds for the production site. as long as i only need to debug locally, there is no problem. if i did need to debug the js on the server then i would need to switch my theme into development mode on the production server, since i would not be so easily able to debug the minified and combined js file, yes.

  • i have to check my nginx config to see which accelerator i used

  • @cim - yes, your site loads fast here. i read the cloudflare blogs and they have explained a lot of what they do.. and i can mimic a lot of it. i agree that using them is an easy way to gain a lot of performance if the site is configured in the right way and uses cloudflare in an optimal way. currently though i cannot use them due to https limits.

    i would always be concerned about routing my entire site through an external organisation too. in many ways that is like sticking a message on my forehead saying 'censor me'. even if the cloudflare team are 100% honest and acting from integrity, there are those who are not, who would (and probably do) actively target cloudflare.

  • another area for optimisation that no-one mentioned is the language files.

    the english language file for my site is currently 284KB unzipped and 70KB zipped.. that's a LOT of unused text being downloaded by every visitor. i think this really needs to be moved into php land and pushed as html as needed.

  • @cim, your site is indeed loading quite fast. I checked similar pages like groups/all and your site is a bit faster but has fewer groups. I am using apache; I did test Nginx but again saw not much performance benefit there, especially not if you move to apache 2.4

    @ura soul. Language files are cached by Elgg and gzipped (or deflated) by nginx/apache. Don't think you can improve on that one

  • Oh and the language files are not downloaded to all visitors, only the parts in a specific page (elgg_echo). If not cached yet, it will be loaded into cache once a visitor enters the site in a language that has not been loaded yet. 

  • @gerard did you check the blog page? lots of images are there

  • if you look inside the language file that is downloaded you will see a huge amount of text that no-one will ever use.. apparently (nearly) the entire language file is downloaded on my site at least. see here:

    https://www.infiniteeureka.com/ajax/view/js/languages?language=en&lc=1390440689

  • aha, well, i just found that i had not started the memcached service.. so that explains why i didn't see any difference yet. ;)
    i haven't noticed the type of cpu changes that juho showed.. but possibly the site is a bit faster overall.
    the latest version of nginx has some ssl optimisations that are helping here too. overall the site is faster than ever now. still some more to go.
    i have located some other services like newrelic for server monitoring and optimisations, though i haven't found one that is totally free and supports PHP stack tracing like newrelic does. i have found appdynamics though (http://www.appdynamics.com/solutions/php-monitoring-solution), which gives another 2 weeks worth of free optimising for PHP.

  • ok so, here are the accurate performance graphs.

    this shows the evolution of the server response times as i have tweaked the settings (the wobbly white line is the latest one - with memcache enabled):


    image

     

    and these are the loadimpact graphs. the 1st is showing how the server is responding to multiple user connections without memcache enabled:

    image

    the 2nd is with memcache enabled (much quicker and more stable):

    image

  • here's a load graph (that goes up to 800 users instead of 30) with the new server which has 4 cpu cores instead of 1 and also double the RAM (and a different datacenter / provider) - faster page loads, with more capacity... for less money.. lol:

    image

  • Ura you should try out lazy_hover. Sadly we couldn't get this in for 1.9 but I think this is going to make a real difference on sites that display a lot of user icons.

  • @steve_clay: i already am using lazy_hover, yes. lazy_load images helps too, though i have disabled it currently due to a glitch with videolist icons in the activity streams intermittently not loading.

  • Anything on MySQL tuning besides using memcached?

    I found this free online tool that gives some tuning recommendations:

    https://tools.percona.com/wizard

     

ura soul

co-creator of reality - admin of an online social network for healing, balancing & evolving.. plus maker of some free plugins for elgg