Old datamodel on Elgg 6

In the sake of performance and scalability issues, is there any project/plugin who implements the old data model (i.e, entity tables, not only metadata table) on Elgg 6?

  • What table in the database are you talking about because elgg_entities table exists?

  • elgg_users_entity, elgg_groups_entity and so on...

  • You didn't mention your Elgg version, but here's what we did for one project.

    We upgraded Elgg 2.3 to Elgg 5.1

    The customer needed to save data in the geocode_cache table dropped since Elgg 3.0.

    First, we dumped the data from this table to the db.sql file.

    Then we created a class in a custom plugin that extends Elgg\Upgrade\AsynchronousUpgrade.

    Then, with each upgrade, we checked if this table existed in the DB, and if not, we created it and entered the data from the dump (in reality, this was only needed with the first upgrade from Elgg 2.3 to 3.0):

    _elgg_services()->db->runSqlScript(__DIR__ . '/db.sql');

    Note: you need to upgrade both the core and plugins sequentially with each major version of Elgg.

    Based on our experience, we recommend the following Elgg update schedule:

    2.3 > 3.0 > 3.1 > 3.2 > 3.3 > 4.3 > 5.1 > 6.0

    References: 1, 2

  • elgg_users_entity, elgg_groups_entity and so on...

    We removed those tables as they were causing performance issues.

    In the old system a join (or multiple) was needed for every elgg_get_entities() query. this isn't a problem on smaller databases, but when you get into databases with 100k-1m+ rows this is an issue.

    We we decided to remove the need for those added tables. Also they didn't hold any information that couldn't be stored in metadata (which it effectively was).

    That covers the performance part, I'm not sure what you meant by scalability, but I believe you can still configure the storage location of MySQL tables on a per table basis. So it should be possible to spread a database over more than 1 disk.

    If you meant something else, please let me know.

    Based on our experience, we recommend the following Elgg update schedule:

    2.3 > 3.0 > 3.1 > 3.2 > 3.3 > 4.3 > 5.1 > 6.0

    It should be possible to go from 2.3 to 3.3 in one go. But if you have different experiences this will also work.

  • @Jerome Bakker

    It should be possible to go from 2.3 to 3.3 in one go

    We've been doing this for a long time.

    But then we discovered that sometimes the DB migrations in this case doesn't occur correctly.

    For example, updating metadata columns and adding entities subtype index.

    Especially for large sites.

    For this reason, we chose a different Elgg upgrade plan.

  • @Jerome Bakker

    If you meant something else, please let me know.

    I meant avoid having the most commom (and accessed) entity attributes as metadata, because it also implies in SQL joins (of elgg_metadata with elgg_entities) and far more records in the result set (one for each metadata). Beyond that, an attribute as metadata can't be indexed properly nor be used as a constraint for a foreign key.

    The metadata approach is undoubtedly flexible, but is far more onerous in terms of database workload and storage area.

    That's why I'm trying to upgrade to Elgg 6, but trying to avoid the overhead of the new metadata approach, using the former data layer of entity specific tables + metadata.

  • I meant avoid having the most commom (and accessed) entity attributes as metadata, because it also implies in SQL joins (of elgg_metadata with elgg_entities) and far more records in the result set (one for each metadata).

    By default there is no join, but we preload all metadata.

    For example a blog listing of 10 blogs

    • 1 query to get the 10 blogs from the entities table
    • than 1 query to load all metadata for those 10 entities

    Beyond that, an attribute as metadata can't be indexed properly nor be used as a constraint for a foreign key.

    True, but currently we don't use foreign keys are they conflict with internal events in Elgg.

    The metadata approach is undoubtedly flexible, but is far more onerous in terms of database workload and storage area.

    If you have some numbers on this, we'd love to hear so maybe we can optimize. So far we've only seen performance gains from the change on a fairly large database (1.6M entities, 11M metadata) with 100-200 concurrent users.

  • If you have some numbers on this, we'd love to hear so maybe we can optimize.

    Of course there are some kind of queries that will benefit from the full metadata model (like the ones in your example), but there are several others which will not.


    For example: in the data structure below, queries would be more expensive if executed over the metadata table than over an specific group entity table.

    A group entity about music bands with the following attributes:
    group_name
    description
    gender_id
    location_id
    
    where, gender_id is the foreign key from the music gender table
    location_id is the foreign key from the location table.

    To sum up,  there are advantages and drawbacks of both data models, but we, as developers, would love if we had back the choice of choosing over both approaches on Elgg 6 and above.

  • For example: in the data structure below, queries would be more expensive if executed over the metadata table than over an specific group entity table.

    A group entity about music bands with the following attributes:
    group_name
    description
    gender_id
    location_id
    
    where, gender_id is the foreign key from the music gender table
    location_id is the foreign key from the location table.

    The groups_entity table only ever had the columns: guid, name and description. If you added gender_id and location_id (and the corresponding tables) that means you already modified the datamodel.

    To sum up,  there are advantages and drawbacks of both data models, but we, as developers, would love if we had back the choice of choosing over both approaches on Elgg 6 and above.

    In my 15 years of experience with Elgg I never had to modify the datamodel to suit a need I (or my customers) had.

Performance and Scalability

Performance and Scalability

If you've got a need for speed, this group is for you.