AmazonS3 Integration

I have been asked to integrate Amazon's S3 file store into Elgg. It's part of a larger project to build an AMI for Elgg 1.8. We're thinking that the cache will remain in the standard file store, but all file uploads, group icons and profile avatars have to move to S3. For the database we're going to migrate to Amazon RDS. The hope is that with these changes we will be able to spin up additional Elgg instances as demand increases. Any thoughts or experiences are appreciated.

To start, I created an AmazonS3 plugin to provide basic S3 functionality and modified Elgg's file plugin to use S3. I did not modify or replace the Elgg FilePluginClass. Instead I chose to mirror the local file structure on S3 (all uploaded files and any thumbnails created when image files are uploaded), then changed the URLs to point to the S3 files rather than the local ones. I use the same prefix and user file matrix to mirror the directory path for each user. It seems to work well.
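
The mirroring described above can be sketched as a helper that builds the S3 object key from the same path Elgg would use on disk. This is a minimal sketch assuming Elgg 1.8's date-based matrix (the owner's creation date as year/month/day, then the owner GUID); the function name `amazons3_object_key()` is hypothetical, so check `ElggDiskFilestore::makeFileMatrix()` in your own install before relying on the exact layout.

```php
<?php
/**
 * Hypothetical helper: build an S3 object key that mirrors the local
 * Elgg 1.8 data directory layout, e.g. "2012/3/14/42/file/photo.jpg".
 *
 * Assumption: the 1.8 matrix is date('Y/n/j', owner time_created) . "/" . owner GUID.
 */
function amazons3_object_key($owner_time_created, $owner_guid, $filename)
{
    // Same date-based directory Elgg uses for the owner on local disk.
    $matrix = date('Y/n/j', (int) $owner_time_created) . '/' . (int) $owner_guid;

    // Elgg filenames are relative to the owner directory ("file/...", "profile/...").
    // S3 keys should not start with a slash, so trim any leading one.
    return $matrix . '/' . ltrim($filename, '/');
}
```

Because the key is derived purely from data Elgg already stores, the same function can locate an object on S3 and its original on local disk.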

Today's task is to migrate group icons and user avatars to S3, which seems like a straightforward task. I notice Elgg uses icontime to tell the browser when it can use cached copies of icons, presumably for performance reasons. The icontime value is still being updated, but since I am returning URLs to S3 thumbnails it is never used (thumbnail.php is never called). Nor am I using download.php; the download button (title menu) uses the Amazon S3 URL instead.
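
Because browsers now fetch icons straight from S3, the cache-busting role icontime played in thumbnail.php is lost unless the URL itself changes when the icon does. A minimal sketch, assuming the icon URL hook builds the URL from a bucket name and object key; the helper name and the virtual-hosted URL form are illustrative assumptions:

```php
<?php
/**
 * Hypothetical helper: return an S3 icon URL that changes whenever the
 * entity's icontime changes, so browsers and proxies fetch the new image
 * instead of serving a stale cached copy.
 */
function amazons3_icon_url($bucket, $key, $icontime)
{
    // Encode each path segment but keep the "/" separators intact.
    $path = implode('/', array_map('rawurlencode', explode('/', $key)));
    $url = "https://{$bucket}.s3.amazonaws.com/{$path}";

    // Assumption: S3 ignores this unrecognised query parameter on a public
    // object (the usual cache-busting trick), so it only affects caching.
    if ($icontime) {
        $url .= '?icontime=' . (int) $icontime;
    }
    return $url;
}
```

The icon hook would pass `$entity->icontime`, which Elgg already updates on every upload or crop.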

Two concerns. One, am I missing something? This seems to have gone too easily. Two, I'm wondering if I should have built an amazons3filestore class instead of manually replacing local filestore calls with Amazon S3 calls. I chose not to because the abstract ElggFileStore.php is a bit too fine-grained (seek, read, open, etc.) to easily slip S3 in behind it. I think doing so would require writing each S3 file to a temp file any time it is accessed, which seems unnecessary given how easy the first approach is.
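
To make the trade-off concrete: honouring ElggFileStore's open/seek/read contract means each S3 object has to be pulled into a local buffer before the first read. A hypothetical sketch; the class name is invented, and the `$fetch` callable stands in for whatever S3 GET call you use. It buffers into `php://temp`, which PHP keeps in memory up to 2 MB and then spills to a temporary file.

```php
<?php
/**
 * Hypothetical sketch of the buffering an S3-backed ElggFileStore would need:
 * the whole object is downloaded once, then seek()/read() work locally.
 */
class AmazonS3BufferedObject
{
    private $handle;

    /** @param callable $fetch returns the object body as a string (e.g. an S3 GET) */
    public function __construct($fetch)
    {
        // php://temp stays in memory up to 2 MB, then spills to a temp file.
        $this->handle = fopen('php://temp', 'w+b');
        fwrite($this->handle, call_user_func($fetch));
        rewind($this->handle);
    }

    public function seek($position)
    {
        return fseek($this->handle, $position) === 0;
    }

    public function read($length)
    {
        return fread($this->handle, $length);
    }

    public function tell()
    {
        return ftell($this->handle);
    }

    public function eof()
    {
        // Same semantics as feof(): true only after a read hits the end.
        return feof($this->handle);
    }

    public function close()
    {
        return fclose($this->handle);
    }
}
```

Every access through this path costs a full download, which supports returning S3 URLs directly for plain downloads and thumbnails.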

Your thoughts and comments are appreciated. I'd rather find problems now, in the prototype stage, than have to change my approach later.

  • @Mike: Thanks for getting back to me, and thanks for the info. In early tests it seems to be working quite well. I've been playing with the idea of storing files locally if S3 is not available and building a queue to upload them when it comes back online, but so far it has been rock solid.
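
The local fallback idea above could be sketched as follows. Both function names, the one-JSON-record-per-line queue file, and the `$upload` callable (which would wrap the real S3 PUT and return true on success) are hypothetical; a cron job would call the flush function.

```php
<?php
/**
 * Hypothetical sketch: try S3 first; if the upload fails, keep the local file
 * and append its key to a queue file so a cron job can retry later.
 *
 * @param callable $upload function ($local_path, $key): bool, wrapping the S3 PUT
 */
function amazons3_store_or_queue($upload, $local_path, $key, $queue_file)
{
    if (call_user_func($upload, $local_path, $key)) {
        return true;
    }
    // One JSON record per line; LOCK_EX avoids interleaved writes from
    // concurrent requests.
    $record = json_encode(array('path' => $local_path, 'key' => $key));
    file_put_contents($queue_file, $record . "\n", FILE_APPEND | LOCK_EX);
    return false;
}

/**
 * Retry every queued upload; entries that still fail are written back.
 * Returns the number of uploads that succeeded.
 */
function amazons3_flush_queue($upload, $queue_file)
{
    if (!is_file($queue_file)) {
        return 0;
    }
    $lines = file($queue_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $pending = array();
    $done = 0;
    foreach ($lines as $line) {
        $job = json_decode($line, true);
        if (call_user_func($upload, $job['path'], $job['key'])) {
            $done++;
        } else {
            $pending[] = $line;
        }
    }
    // Not safe against a concurrent store_or_queue() during the flush;
    // a production queue would need locking or a database table.
    file_put_contents($queue_file, $pending ? implode("\n", $pending) . "\n" : '');
    return $done;
}
```

While an entry is queued, the URL code would also have to fall back to the local copy, which is the part that makes this approach more work than it first appears.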

  • @Jimmy Coder: Some notes I would like to add:

    1. We are using Amazon S3 for a large-scale Elgg project, and I really have not found any issue with the reliability of S3. We are 100% dependent on S3 services.

    2. The only catch is that you have to compromise on Elgg's default access control for files, since files served straight from S3 URLs bypass Elgg's access checks.

    3. Amazon RDS is good, but there are better options, such as:

       1. Amazon DynamoDB.
       2. Upgrading Elgg's tables from MyISAM to InnoDB, which really improves database writes to disk and brings lots of improvements when running SQL queries. This was not possible before MySQL 5.6, the first version in which InnoDB supports the FULLTEXT indexes Elgg's schema relies on. We have upgraded our Elgg installation and found a great improvement in data writes. I am waiting for Elgg to start using InnoDB as the default storage engine.

       For our full-text search implementation we are using Elasticsearch, which is damn good. Sphinx and Solr are also good candidates for full-text search, but newer technologies like Elasticsearch and MongoDB are doing better.
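
On point 2 above, one way to keep some of Elgg's access control while still serving from S3 is to store objects as private and hand out short-lived signed URLs only after Elgg's own access check (for example, `get_entity()` returning the file for the current user). This sketch implements S3's legacy Signature Version 2 query-string authentication by hand; the function name is hypothetical, and buckets in newer regions accept only Signature Version 4, in which case the SDK's own presigning should be used instead.

```php
<?php
/**
 * Hypothetical helper: build a time-limited GET URL for a private S3 object
 * using S3's legacy Signature Version 2 query-string authentication.
 */
function amazons3_signed_url($bucket, $key, $access_key, $secret_key, $expires)
{
    // Encode each path segment, keeping "/" separators.
    $path = implode('/', array_map('rawurlencode', explode('/', $key)));

    // SigV2 string to sign for a plain GET: verb, empty Content-MD5,
    // empty Content-Type, expiry timestamp, canonicalized resource.
    $string_to_sign = "GET\n\n\n{$expires}\n/{$bucket}/{$path}";
    $signature = base64_encode(hash_hmac('sha1', $string_to_sign, $secret_key, true));

    return "https://{$bucket}.s3.amazonaws.com/{$path}"
        . '?AWSAccessKeyId=' . rawurlencode($access_key)
        . '&Expires=' . (int) $expires
        . '&Signature=' . rawurlencode($signature);
}
```

Call it with something like `time() + 300` as `$expires` once the logged-in user has passed the access check; anyone holding the URL can fetch the object until it expires, so keep the window short.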
  • Jimmy, you can check out our article about our approach to scaling Elgg here. By next Wednesday we will upload a few more articles, including a comparison of NGINX and Apache, our approach to measuring scalability, and a few other topics.

    In case you're interested in a specific topic, let me know.

  • @Jimmy Is there any chance you could publish the S3 integration plugin? Or share any pointers on how to develop one, based on your experience so far?

    Thanks in advance!

    Best regards,

  • The integration was done under contract for a client. I will ask about releasing it to the community, but I doubt they will allow it, at least until the product is released. That said, I am more than willing to discuss the task of creating such a plugin.

    Amazon has a robust PHP library for accessing their S3 service, so that's where I started. I built a plugin that loads the Amazon library, provides a settings page for the Amazon S3 credentials, etc. The decision was made to do a one-for-one replacement for files, profile icons, group icons and Tidypics. Since Tidypics was only released recently, that integration is still in progress. We also use the Google Docs preview plugin, which also required tweaking to work with S3.

    Let's start by looking at what an amazonS3 plugin's init function needs to do:

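    <?php
    // amazons3/start.php
    // The plugin must be ordered below the groups and profile plugins so their
    // init handlers have already registered the hooks unregistered here.
    elgg_register_event_handler('init', 'system', 'amazons3_init');

    function amazons3_init()
    {
        $root = dirname(__FILE__);

        // Amazon's PHP SDK and our helpers; call elgg_load_library()
        // in the actions and hooks that actually need them.
        elgg_register_library('elgg:amazons3', "$root/lib/s3/sdk.class.php");
        elgg_register_library('elgg:amazons3_helpers', "$root/lib/amazonS3_helpers.php");

        // No trailing slash: the paths below add their own "/".
        $action_base = elgg_get_plugins_path() . 'amazons3/actions';

        // Group icons are written in groups/edit and removed in groups/delete.
        elgg_unregister_action("groups/edit");
        elgg_unregister_action("groups/delete");
        elgg_register_action("groups/edit", "$action_base/groups/amazons3_edit.php");
        elgg_register_action("groups/delete", "$action_base/groups/amazons3_delete.php");

        // Avatars are written in avatar/upload and resized in avatar/crop.
        elgg_unregister_action("avatar/upload");
        elgg_unregister_action("avatar/crop");
        elgg_register_action("avatar/upload", "$action_base/avatar/amazons3_upload.php");
        elgg_register_action("avatar/crop", "$action_base/avatar/amazons3_crop.php");

        // Point icon URLs at S3 instead of the local icon handlers.
        elgg_unregister_plugin_hook_handler('entity:icon:url', 'group', 'groups_icon_url_override');
        elgg_register_plugin_hook_handler('entity:icon:url', 'group', 'amazons3_groups_icon_url_override');

        elgg_unregister_plugin_hook_handler('entity:icon:url', 'user', 'profile_override_avatar_url');
        elgg_register_plugin_hook_handler('entity:icon:url', 'user', 'amazons3_profile_override_avatar_url');

        // Handler (not shown) for cleaning up when a user is deleted.
        elgg_register_event_handler('delete', 'user', 'user_delete_event_listener');
    }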

    The file plugin required extensive modification, so we decided to rewrite it as an amazonS3 file plugin. It seemed easier, though it was a toss-up.
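
At the core of such a rewrite, the upload action pushes the file Elgg has just received to S3 under the mirrored key. A minimal sketch, assuming the 1.x SDK loaded as `elgg:amazons3` above, whose `AmazonS3::create_object()` accepts `fileUpload`, `acl` and `contentType` options and returns a response with `isOK()`; verify these against the SDK version you bundle. The function name is hypothetical.

```php
<?php
/**
 * Hypothetical helper used by an S3 file/upload action.
 *
 * @param AmazonS3 $s3         client from the 1.x SDK (assumed API)
 * @param string   $bucket     target bucket
 * @param string   $key        mirrored object key, e.g. "2012/3/14/42/file/photo.jpg"
 * @param string   $local_path temp file, e.g. $_FILES['upload']['tmp_name']
 * @param string   $mime       MIME type to serve the object with
 * @param bool     $public     true for public-read (bypasses Elgg access control)
 */
function amazons3_put_file($s3, $bucket, $key, $local_path, $mime, $public = true)
{
    $response = $s3->create_object($bucket, $key, array(
        'fileUpload'  => $local_path,
        // Canned ACL strings; assumed to match AmazonS3::ACL_PUBLIC / ACL_PRIVATE.
        'acl'         => $public ? 'public-read' : 'private',
        'contentType' => $mime,
    ));
    return $response->isOK();
}
```

Passing the client in, rather than constructing it inside the helper, keeps the credentials in one place (the plugin settings) and makes the helper easy to exercise with a stub.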

    Does this make sense to you?

  • @Jimmy Great! I will work on this soon and will keep the questions coming :P

    By the way, would you suggest creating a custom AMI (I'm not really sure how to scale an AMI out horizontally), or can I deploy to Beanstalk and just use its Auto Scaling feature?

    Best regards,