Yes, you read that correctly. In this post, I’ll tell you how we’ve implemented “PHP file caching” in our application logic at Graphiq, resulting in crazy-fast cache reads.
The core of the technique is leveraging the PHP engine’s in-memory file caching (opcache) to cache application data in addition to code. HHVM has supported this technique for a few years, but PHP only recently started supporting it with the launch of PHP 7. The method still “works” in PHP < 7, it just isn’t fast.
The reason this method is faster than Redis, Memcache, APC, and other PHP caching solutions is the fact that all those solutions must serialize and unserialize objects, generally using PHP’s serialize or json_encode functions. By storing PHP objects in file cache memory across requests, we can avoid serialization completely!
Here’s how we “set” a value in the cache:
function cache_set($key, $val) {
$val = var_export($val, true);
// HHVM fails at __set_state, so just use object cast for now
$val = str_replace('stdClass::__set_state', '(object)', $val);
// Write to temp file first to ensure atomicity
$tmp = "/tmp/$key." . uniqid('', true) . '.tmp';
file_put_contents($tmp, '<?php $val = ' . $val . ';', LOCK_EX);
rename($tmp, "/tmp/$key");
}
And here’s how we “get” a value from the cache:
function cache_get($key) {
@include "/tmp/$key";
return isset($val) ? $val : false;
}
Now let’s store a value in both our PHP file cache and in APC to compare:
$data = array_fill(0, 1000000, ‘hi’); // your application data here
cache_set('my_key', $data);
apc_store('my_key', $data);
And see how they perform:
// note: make sure you run this on a separate request from cache_set to ensure PHP's opcache will actually cache the file$t = microtime(true); $data = cache_get(‘my_key’); echo microtime(true) - $t; // 0.00013017654418945$t = microtime(true); $data = apc_fetch(‘my_key’); echo microtime(true) - $t; // 0.061056137084961
We were able to fetch a million-row array of strings in a tenth of a millisecond, ~500 times faster than APC. The 500X number is kind of arbitrary, because larger objects will have larger performance improvements, approaching infinity. To be clear, though, we did see speed improvements of two orders of magnitude in our real production object caching when we implemented this method.
Keep in mind that PHP file caching should primarily be used for arrays & objects, not strings, since there is no performance benefit for strings. In fact, APC is a tad bit faster when dealing with short strings, due to the slight overhead of calling PHP’s include() function.
We use a bit more code in our production application than what is illustrated above. In addition to adding expiration support and a cache_clear function, we also had to implement our own multi-server distributed clearing logic, which data stores like Redis and Memcache handle automatically. Cache keys must also be validated or encoded based on the filesystem to ensure valid filename characters.
Another production consideration is the PHP opcache configuration (or HHVM .ini configuration). Your opcache.memory_consumption setting needs to be larger than the size of all your code files plus all the data you plan to store in the cache. Your opcache.max_accelerated_files setting needs to be larger than your total number of code files plus the total number of keys you plan to cache. If those settings aren’t high enough, the cache will still work, but its performance may suffer.
At Graphiq, we’re building a platform that processes and visualizes knowledge from thousands of distinct datasets and billions of entities. Operating at that scale and beyond forces constant innovation to keep our platform from getting bogged down. By caching our array/object data in PHP files, we’ve mitigated what once was a tricky bottleneck.
info@elgg.org
Security issues should be reported to security@elgg.org!
©2014 the Elgg Foundation
Elgg is a registered trademark of Thematic Networks.
Cover image by Raül Utrera is used under Creative Commons license.
Icons by Flaticon and FontAwesome.