Hello, I have a fairly high-traffic Elgg installation that I'm looking to grow, but unfortunately I'm hitting an upper limit, with load averages around 40.0 (!) at times. I've installed the Varnish caching server in front of Apache to help reduce the load, but my hit rate is typically below 20%, so it doesn't do much good. I suspect that a cookie or token (I notice Elgg uses a token extensively) is keeping Varnish from caching most pages.
I was wondering if anyone has been here before and might have some tips for me. Most of my users are unauthenticated, so I particularly want to make sure that photos shown on, say, Tidypics are mainly served by Varnish. I just want to get my Varnish hit rate above 50% to buy me some time until I can get a dual quad-core server with 16 GB of RAM.
Any pointers or hints would be greatly appreciated. Thank you!
Have you inspected the external resources of a typical page to see which ones are not coming through Varnish, or which don't have public/max-age? You can check the cacheability of a resource with REDbot.
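Alongside REDbot, a quick way to see which resources actually come out of the cache is to have Varnish label every response. This is a minimal sketch in Varnish 3.x syntax; the header name `X-Cache` is an arbitrary choice, not something Elgg or Varnish requires. With it in place, the browser's network panel shows HIT or MISS per resource.

```vcl
sub vcl_deliver {
    # obj.hits counts how often this cached object has been delivered;
    # zero means the response was fetched from Apache for this request
    # (either a cache miss or a pass).
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```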
You might try the 1.8 branch from GitHub (soon to be released as 1.8.9), which has full metadata prefetching. That should cut a lot of queries.
You probably won't be able to cache any HTML output by default. Elgg could start allowing pages to be cached if the cache could be guaranteed to vary by Cookie. That is, the cached version would be servable by a proxy cache like Varnish if the client had no cookies. I'm not sure how easy that is to implement on the Varnish side. On the Elgg side, we would need to stop starting a session for logged-out users by default (this could break some form-saving features), and therefore use Ajax to grab a login token before submitting the login form, so that the HTML output can be static.
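On the Varnish side, "servable only if the client had no cookies" could look roughly like the sketch below (Varnish 3.x syntax; Varnish 4 and later use `req.method` instead of `req.request`). It largely mirrors Varnish's built-in behavior, so it only pays off once Elgg stops sending a session cookie to logged-out visitors, which, as noted above, it currently does.

```vcl
sub vcl_recv {
    # Only GET/HEAD requests are candidates for caching.
    if (req.request != "GET" && req.request != "HEAD") {
        return (pass);
    }
    # Any cookie (for example an Elgg session) means the page may be personalized.
    if (req.http.Cookie) {
        return (pass);
    }
    # Cookieless visitors may share one cached copy of the page.
    return (lookup);
}

sub vcl_fetch {
    # If Apache still starts a session (sends Set-Cookie), do not cache that response;
    # remember the decision for a short while, as Varnish's built-in logic does.
    if (beresp.http.Set-Cookie) {
        set beresp.ttl = 120s;
        return (hit_for_pass);
    }
}
```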
Thanks Steve. I just read http://community.elgg.org/discussion/view/1108936/metadata-prefetching-lands-in-18-branch-please-test-if-you-can to learn more about what you are referring to. I will definitely upgrade once 1.8.9 is released, as it looks like it can only help. After doing some investigation with REDbot, I suspect the problem lies primarily with the Tidypics portion of the site, the photos section. Placing a Tidypics thumbnail URL into REDbot, I get this:
---
Update: I've solved most of my problems by using something like this in my VCL file for Varnish:
sub vcl_recv {
    if (req.url ~ "^/photos/thumbnail/") {
        unset req.http.cookie;
    }
}
sub vcl_fetch {
    if (req.url ~ "^/photos/thumbnail/") {
        unset beresp.http.set-cookie;
    }
}
See https://www.varnish-cache.org/trac/wiki/VCLExampleCacheCookies
It gives me a hit rate above 0.50 (50%), and over 70% when I add some other code to strip the cookies from normal images by file extension (normally matching on extensions alone would work, but Tidypics seems to use dynamic URLs for images).
This disregards the cookie for the Tidypics thumbnails. It seems to work fine, but be aware that it might compromise privacy: if a photo is private but gets put into the cache, an unauthorized user could potentially view it just by knowing its URL. Normally Elgg wouldn't serve a private photo to an unauthorized user, but if you disregard the cookies, Varnish will cache that content. Use at your own risk; I'm not an expert at this.
I hope this helps someone...
Good work. Ideally, all non-public resources would be sent with Cache-Control: private (but I doubt we're doing this). I think you could safely cache anything with a future Expires header (ignoring the Set-Cookie). These should be profile icons and JS/CSS files.
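The policy suggested here could be sketched like this in Varnish 3.x syntax. Varnish 3's built-in logic does not look at Cache-Control: private, so the sketch checks it explicitly; it also assumes that Elgg sends a future Expires header only on resources that are genuinely public.

```vcl
sub vcl_fetch {
    # Never cache anything the backend marks as private.
    if (beresp.http.Cache-Control ~ "private") {
        set beresp.ttl = 120s;
        return (hit_for_pass);
    }
    # Varnish derives beresp.ttl from s-maxage/max-age or Expires (falling back to
    # default_ttl when there are none). Requiring an explicit Expires header avoids
    # caching pages that only received the default TTL.
    if (beresp.http.Expires && beresp.ttl > 0s) {
        # Ignore the session cookie on these long-lived resources.
        unset beresp.http.Set-Cookie;
        return (deliver);
    }
}
```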
Also, there were some minor HTTP bugs that I just fixed, but these shouldn't be a real problem in practice: http://trac.elgg.org/ticket/4895
Anything you can share about your Varnish experience is appreciated.
It's been working amazingly well. I haven't tried tweaking it further yet because the success has been phenomenal. Previously I was frequently seeing load averages between 30 and 40 on my dual-core server, with noticeable delays whenever the load went above 5.0. Now it's usually below 1.0 and almost always below 2.0.
I highly recommend Varnish, or something like it, for a high-load Elgg installation that serves a lot of media content. I set things up as above to catch Tidypics thumbnails (removing the cookies so the cache works), and I also did the same by file extension. I set the cache lifetime to three hours (which seems longer than people usually use, but it seems to be working for me).
Here is my entire .vcl config file. I am a novice and merely built on the config a tutorial gave me, so use it at your own risk. I only offer it in the hope that it gives other Elgg users some hints. It seems to be working well for me. I am also caching .css and .js files by removing cookies.
-----------
## Redirect requests to Apache, running on port 8000 on localhost
backend apache {
    .host = "127.0.0.1";
    .port = "8000";
}

### Recv
sub vcl_recv {
    ## Added security: the "w00tw00t" attacks are pretty annoying, so block them here,
    ## before they reach the web server. (vcl_fetch only runs after Apache has already
    ## answered, so a check placed there cannot block anything.)
    if (req.url ~ "^/w00tw00t") {
        error 403 "Not permitted";
    }
    ## X-Forwarded-For: for requests that fall through to Varnish's built-in vcl_recv,
    ## it already adds the client IP (client.ip). The tutorial's lines that rewrote this
    ## header in vcl_fetch ran after the request had been sent to Apache, so they had
    ## no effect and are omitted here.

    ## Tidypics thumbnails use dynamic URLs: drop the cookie so they can be cached.
    if (req.url ~ "^/photos/thumbnail/") {
        unset req.http.cookie;
    }
    # if (req.url ~ "^/photos/image/") {
    #     unset req.http.cookie;
    # }
    ## Static files: look them up in the cache even if the client sent cookies.
    if (req.url ~ "\.(png|gif|jpg|jpeg|flv|css|js)$") {
        return(lookup);
    }
}

## Fetch
sub vcl_fetch {
    if (req.url ~ "^/photos/thumbnail/") {
        unset beresp.http.set-cookie;
        set beresp.ttl = 180m;
    }
    # if (req.url ~ "^/photos/image/") {
    #     unset beresp.http.set-cookie;
    # }
    # strip the cookie before the image is inserted into cache.
    if (req.url ~ "\.(png|gif|jpg|jpeg|flv|css|js)$") {
        unset beresp.http.set-cookie;
        set beresp.ttl = 180m;
    }
    ## Deliver the content. Note: returning here skips Varnish's built-in vcl_fetch,
    ## which would otherwise refuse to cache responses carrying a Set-Cookie header.
    return(deliver);
}
---------------
Hopefully some more experienced users will also come forth with help and insight. :)
gtsfa, this is a very interesting post. Could you share how many requests per second your website could handle before and after using Varnish, and with what server configuration?
We implemented a few CDN solutions in Elgg, although we didn't use Varnish, even though it was discussed. We wrote a set of articles about optimizing Elgg; maybe they will be useful to you:
http://vazco.eu/news/view/568329
Yes, metrics would be good, including logged-in and logged-out cases, and testing real user behavior as well as a simple A/B comparison of a single page. I've heard siege is good for this.
That reminds me, I should test our caching improvements in the 1.8 branch.
@Steve: I've also heard about siege, but AFAIK it's better for testing many pages. I think we're more concerned here with detailed information on a few particular pages, so maybe Apache Benchmark (ab) would be better. It lets you test logged-in and logged-out scenarios if you pass the Elgg cookie to it.
Regarding improvements, I've opened a pull request adding an extension to the developers plugin (https://github.com/Elgg/Elgg/pull/419) that gives some basic info on the SQL query count. I see quite a big improvement there (from over 100 SQL queries in a basic install's core load to ~40).
Also, on bigger sites inefficient SQL queries become apparent, so there's also the option to remove DISTINCT (https://github.com/Elgg/Elgg/pull/418). I'm also working on several less crucial optimizations: caching instead of hitting the DB, and eliminating 'Using temporary'/'Using filesort' in EXPLAIN output. I'm thinking about adding some triggers or another mechanism (something more lightweight, maybe only called in debug mode?) to the core to allow DB profiling without core changes. I found myself writing lots of similar code while profiling Elgg; standardizing it shouldn't hurt.
Hi gtsfa,
Were you able to solve this problem? I am facing the same problem with my high-load site: my Varnish server can't get its hit rate above 15%. I am on Elgg 1.8.9.
Now I am doing a proof of concept with memcache + Varnish to reduce the load.
In my current architecture, I am trying to reduce DB hits by using memcache (as a layer above the DB server), and to reduce the server load I am using the Varnish caching server in front of Apache (as an output cache). As soon as I turn on both caches, I am not able to log in (nothing happens). I am not sure what is happening: the PHP log and the Apache logs show nothing, and I am not able to proceed. Any suggestions would be appreciated.
@saurabh, check out the .vcl file I shared above for some ideas. There may be differences depending on the Varnish version you are running. Unless you do something to strip out the cookies on static content (like images and movies), Varnish won't cache it. Tidypics also has to be handled with something like:
because it's dynamic, and otherwise Varnish doesn't seem to cache it (the presence of a cookie tells Varnish not to cache the content unless you strip it out for this purpose).
... But there are dangers in doing this. For instance, some content might get shared despite access restrictions. There might be a better way: perhaps we could find a way to tell Varnish simply not to cache anything if the user is logged in, but I've yet to find it.
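One possible shape for "don't cache anything if the user is logged in" is sketched below in Varnish 3.x syntax. It rests on a loudly hypothetical assumption: a custom plugin would set a marker cookie (called `elgg_logged_in` here, an invented name) at login and clear it at logout. Elgg does not set such a cookie by itself. Logged-in users then always bypass the cache, while anonymous requests for thumbnails reach Apache without cookies, so Elgg applies anonymous access rules to whatever gets cached.

```vcl
sub vcl_recv {
    # HYPOTHETICAL marker cookie set by a custom plugin on login.
    if (req.http.Cookie ~ "(^|;\s*)elgg_logged_in=") {
        # Logged-in users always get fresh pages from Apache.
        return (pass);
    }
    # Anonymous visitors: drop cookies on Tidypics thumbnails so they can be shared.
    if (req.url ~ "^/photos/thumbnail/") {
        unset req.http.Cookie;
    }
}

sub vcl_fetch {
    if (req.url ~ "^/photos/thumbnail/") {
        unset beresp.http.Set-Cookie;
    }
}
```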
For the log in problem I would test using only varnish and then using only memcached to try to help isolate what may be the issue.
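If Varnish turns out to be the culprit, a cautious starting point is to cache only static files and thumbnails and pass everything else, so that no HTML page carrying a per-session form token (see Steve's comment about the login token) is ever served from the cache. This is a sketch in Varnish 3.x syntax; it assumes Elgg actions live under `/action/` (for example `/action/login`).

```vcl
sub vcl_recv {
    # Form submissions (login included) always go to Apache uncached.
    if (req.request != "GET" && req.request != "HEAD") {
        return (pass);
    }
    # Elgg actions are assumed to live under /action/.
    if (req.url ~ "^/action/") {
        return (pass);
    }
    # Static files and Tidypics thumbnails may be cached, as in the config shared earlier.
    if (req.url ~ "^/photos/thumbnail/" || req.url ~ "\.(png|gif|jpg|jpeg|flv|css|js)$") {
        unset req.http.Cookie;
        return (lookup);
    }
    # Everything else (HTML pages with session tokens) bypasses the cache.
    return (pass);
}
```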