Amazon S3

Has anyone outsourced their data files to Amazon S3? I've been looking at it today and it looks like a reasonable solution for storing all the media files from an Elgg install, as you only pay for what you use.

If anyone has done this, can you please let me know what you think?

If this has not been done and people are interested in trying it, I am happy to help out in any way I can.

thanks

  • This is difficult in Elgg's current model because it requires raw filesystem access to the files, which AFAIK S3 doesn't immediately allow.

    I imagine it would require quite a few core modifications, but would definitely be possible, though I do think it would adversely affect performance... A quick example of how ElggFile touches the filesystem is below.
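
    For context, here is roughly what typical ElggFile usage looks like (going from memory of the Elgg 1.x API, so exact method names may differ); open()/write() end up as fopen()/fwrite() on a local path under the data directory, which is the raw filesystem access that plain S3 doesn't give you:

    <?php
    // Illustration only -- the usual ElggFile write pattern; behind the
    // scenes the disk filestore opens a real file below the data directory.
    $file = new ElggFile();
    $file->setFilename('testing/example.txt');
    $file->setMimeType('text/plain');

    $file->open('write');        // fopen() on <dataroot>/.../testing/example.txt
    $file->write('hello world'); // fwrite() to that local handle
    $file->close();

    $file->save();               // saves the entity; the data stays on local disk
    ?>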

  • OK Brett

    Thanks for the info, I'll stick with my standard setup for now then.

  • Have you considered something like Subcloud? It emulates a Unix filesystem on top of S3. I'm using it currently for an FTP server and it's working well. Solid performance. You could try pointing your Elgg data directory at a Subcloud volume.

  •

    @Brett

    I have briefly analyzed Elgg with a view to separating out the filesystem functionality. Please validate whether my understanding is correct.

    (1) Create a new subtype called 'rfile' (remote file), (2) create a new class ElggRemFile (along the same lines as ElggFile), and (3) modify the file plugin to use ElggRemFile instead of ElggFile. A rough sketch of (2) is included below.

    To make the best of the remote FS (like S3 or CDN), there will be changes to the download and upload paths too.

    However, I am not clear on how much time and effort this would take. Any thoughts?
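
    Here is a very rough sketch of what (2) might look like. I'm assuming a simple S3 wrapper class (with S3::setAuth(), S3::putObjectString() and S3::getObject()) and guessing at which ElggFile methods need overriding; the subtype, bucket name and credentials are placeholders:

    <?php
    // Sketch only -- ElggRemFile, a remote-storage variant of ElggFile.
    // Method names follow my reading of ElggFile and may need adjusting.
    require_once 'S3.php';

    class ElggRemFile extends ElggFile {

        /** Local buffer for writes until they are pushed to S3 on close(). */
        private $buffer;

        protected function initialise_attributes() {
            parent::initialise_attributes();
            $this->attributes['subtype'] = 'rfile'; // the new remote-file subtype
        }

        public function open($mode = "read") {
            // Nothing to open remotely; buffer writes in a temp stream.
            $this->buffer = fopen('php://temp', 'r+');
            return $this->buffer;
        }

        public function write($data) {
            return fwrite($this->buffer, $data);
        }

        public function close() {
            // Push the buffered contents to S3 when the file is closed.
            rewind($this->buffer);
            $contents = stream_get_contents($this->buffer);
            fclose($this->buffer);

            S3::setAuth('ACCESS_KEY', 'SECRET_KEY');
            return S3::putObjectString($contents, 'my-elgg-bucket', $this->getFilename());
        }

        public function grabFile() {
            // Pull the whole object back from S3 for downloads.
            $object = S3::getObject('my-elgg-bucket', $this->getFilename());
            return $object ? $object->body : false;
        }
    }
    ?>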

     

  • @mhourahine - I haven't tried that, but it's exactly what Elgg would require.  I'm still concerned about performance issues when pointing the data dir to a remote server, though.  Elgg uses the data directory to store cached views and view lists, and a network read will always be slower than a filesystem read.  It's worth checking out, though!

    @naren - Time and effort will depend upon how much experience you have with Elgg, what sort of APIs your CDN has available, and whether any pre-existing interfaces exist for PHP. Realistically, assuming good knowledge of PHP and Elgg, I'd say 2-4 days would be enough time for a working demo. Possibly less if the CDN's API/library is super simple.

  • @brett - Thanks for the response. My guesstimate was 4-6 weeks. It's very comforting to know it's much less. :) [I am 2 weeks old to Elgg/PHP.]

    The current consideration is Akamai for a CDN, or S3 for cloud storage. However, the approach is to use a distributed file system (like MogileFS or CouchDB) locally first and then move to S3. I feel that such a migration will be less disruptive, and it will give us enough time to test.

    I had missed the filesystem dependencies for views and view lists. Does it make sense to use memcached for views and view lists (cache.php and simplecache/)? Or is it overkill? Will performance degrade?

    Am I missing any other dependencies on the Elgg data directories?

  • @tomchip:
    Mail me, I could give you a solution for this.

    Thanks

  • @Naren - Memcache would likely be faster, but memcache support is still experimental and underdeveloped.  Actually, IIRC memcache still reads the initial file from the data directory.  I think simplecache, generic data storage, and views caching are everything that uses the data dir.  The relevant settings are sketched below if you want to experiment.

    @Izap - If you have a solution, let's keep it in public so everyone can benefit :)
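
    If you do want to try it, the memcache lines in engine/settings.php look roughly like this (from memory of the sample settings file; server host and port are placeholders):

    // Experimental memcache support -- uncomment and adjust in settings.php.
    $CONFIG->memcache = true;
    $CONFIG->memcache_servers = array(
        array('localhost', 11211),
    );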

  • If anybody is interested, my icondirect.php looks like this:

    <?php
    // Redirect avatar requests to icons stored on S3, falling back to a
    // default image when no local copy exists.

    $guid = isset($_GET['guid']) ? (int) $_GET['guid'] : 0;
    // basename() guards against path traversal via the username parameter.
    $username = isset($_GET['username']) ? basename($_GET['username']) : '';

    $size = isset($_GET['size']) ? strtolower($_GET['size']) : '';
    if (!in_array($size, array('large', 'medium', 'small', 'tiny', 'master', 'topbar'))) {
        $size = "medium";
    }

    $dataroot = '/path/to/data/';

    // Local copy written when the avatar was uploaded.
    $filename = $dataroot . "avatar/{$guid}/profile/" . $username . $size . ".jpg";

    if (!file_exists($filename)) {
        // No local icon: send the default icon hosted on S3.
        header("Location: http://breezybuzz.s3.amazonaws.com/default/default" . $size . ".gif");
        exit;
    } else {
        // Icon exists: redirect to the copy already pushed to S3.
        header("Location: http://breezybuzz.s3.amazonaws.com/" . $guid . "/" . $username . $size . ".jpg");
        exit;
    }
    ?>

    To dump the icons to S3 in the first place, I use this class:

    http://undesigned.org.za/2007/10/22/amazon-s3-php-class

    I made a custom data matrix for icons only, so the images are stored locally and pushed to S3 as well (a rough upload example is below).

    Server load is dramatically down now: 10-50ms per icon load.
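
    For anyone who hasn't used that class, the upload side looks roughly like this; the bucket name matches the redirect URLs above, but the credentials and paths are placeholders:

    <?php
    // Rough sketch of pushing one generated icon to S3 with the
    // undesigned.org.za S3 class; credentials and paths are placeholders.
    require_once 'S3.php';

    S3::setAuth('ACCESS_KEY', 'SECRET_KEY');

    $local  = '/path/to/data/avatar/123/profile/usernamesmall.jpg';
    $remote = '123/usernamesmall.jpg';

    if (S3::putObject(S3::inputFile($local), 'breezybuzz', $remote, S3::ACL_PUBLIC_READ)) {
        // The icon can now be served straight from S3, matching the
        // redirects in icondirect.php above.
        echo "Uploaded $remote\n";
    }
    ?>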