Are there potential speed problems with the Elgg database?

Hi

We're building a social commerce site which we expect to get a lot of traffic. Will this lead to speed problems with the EAV database? EAV has a bad name compared to relational databases... is that fair?

Additionally, my fellow programmers and I have lots of experience with relational databases and none with the EAV model. Is it true that the best way to approach the Elgg database (and EAVs in general) is:

1/ Never change the database

2/ Never write sql, use the core functions

3/ Never even open phpMyAdmin to look at what's in there!

It's very different from what we're used to. Am I right in saying that, to get an idea of the structure of the data (in a relational database I would just look at the table), you look at the object you are sending to the database and ignore anything below that?

Any advice/opinion would be great

Thanks

Mark

 

  • Elgg has a semantic-like (as in semantic web) data model which supports rapid prototyping and development of social applications. EAV is used for describing attributes on an entity (called metadata and annotations in Elgg's data model).

    Part of the reason you can develop applications on top of Elgg so quickly is that it handles data persistence for you and has a very flexible data model. Software design is always about trade-offs: Elgg trades raw speed for flexibility and development time. The reality is that the vast majority of web applications never run into scaling issues because they never attract enough traffic.

    Where EAV really hurts you is when you query for several attribute/value pairs at once, because each extra pair requires another join on the same table (there's a rough sketch of this at the end of this reply).

    Elgg's non-EAV tables are easy to view, and on occasion I find it faster to take a look at the raw data in the database. Elgg doesn't require you to know anything about the database structure, though.

    It's possible to override chunks of Elgg's data persistence or add additional tables for performance reasons. It's generally not recommended because it can cause problems with upgrades, create brittle code, and be less robust than what's in core. It especially causes problems when developers do that and then distribute their plugins through the community site: the quality varies tremendously, and you end up with all these Elgg sites out there with variants of the database.
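
    To make the metadata idea concrete, here is a minimal sketch against the Elgg 1.8 API. The 'product' subtype and the colour/price fields are made up for illustration; title goes into the objects_entity table, while the extra fields become metadata rows (the EAV part):

    ```php
    <?php
    // Minimal sketch (Elgg 1.8): extra attributes stored as metadata, not columns.
    $product = new ElggObject();
    $product->subtype = 'product';   // hypothetical subtype
    $product->title = 'Red bicycle'; // real attribute (objects_entity table)
    $product->colour = 'red';        // stored as a metadata name/value row
    $product->price = 199;           // ditto
    $product->save();

    // Reading it back later needs no knowledge of the underlying tables.
    $colour = $product->colour;

    // Querying on several pairs at once is where the joins pile up; roughly:
    //   ... JOIN metadata m1 ...  (colour = 'red')
    //   ... JOIN metadata m2 ...  (price < 200)
    // i.e. one extra join per attribute/value pair.
    ```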

  • I really buy into the flexibility argument... I have seen most of the social networking frameworks around, and the core issue is that you end up in a data mess with the traditional plugin approach, where everybody attaches their own interpretation of a data model to their plugin...

    On the other hand, there are these hard problems (not only performance issues) that need some other approach... but the hard problems usually account for less than 20% of the effort, so I think it is better to just handle them with a customized approach...

    I will need a way to search, e.g., for a woman with either red or black hair, brown or blue eyes, aged 40-50, between 166 and 176 cm tall, with either a slim or normal build, engaged or married, not formally dressed, living at most 30 kilometres from me and holding a university degree :-) ... [that's just an example - I am not looking for such a person :-)] ...

    Would that work the Elgg way, and if yes, should one do it the Elgg way? I am currently thinking of a de-normalized user table just for queries... (see the sketch after this post for roughly what that query would look like with the metadata API)

    I have seen some notes that in 1.9 the focus is on performance, and within that some kind of de-normalization work... what is meant by the latter? ... the hard search problems?
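
    Just to make the trade-off concrete, here is a rough sketch of what part of that search could look like "the Elgg way" with elgg_get_entities_from_metadata() (Elgg 1.8 API; the profile field names are hypothetical, and every name/value pair below turns into another join on the metadata tables):

    ```php
    <?php
    // Sketch only (Elgg 1.8): multi-attribute user search via metadata.
    // The field names (gender, hair, eyes, age, height_cm) are hypothetical.
    $users = elgg_get_entities_from_metadata(array(
        'type' => 'user',
        'metadata_name_value_pairs' => array(
            array('name' => 'gender',    'value' => 'female'),
            array('name' => 'hair',      'value' => array('red', 'black')),
            array('name' => 'eyes',      'value' => array('brown', 'blue')),
            array('name' => 'age',       'value' => 40,  'operand' => '>='),
            array('name' => 'age',       'value' => 50,  'operand' => '<='),
            array('name' => 'height_cm', 'value' => 166, 'operand' => '>='),
            array('name' => 'height_cm', 'value' => 176, 'operand' => '<='),
        ),
        'metadata_name_value_pairs_operator' => 'AND',
        'limit' => 20,
    ));
    ```

    Seven pairs means roughly seven extra joins on the metadata/metastrings tables, which is exactly why a de-normalized side table starts to look attractive for this kind of query.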

     

  • Cash, we have developed a solution which improves Elgg's search with additional tables. It's possible to do it in a way that doesn't interfere with the original database or cause problems during upgrades (I think I mentioned on Trac how it works). This removes the main limitation of Elgg's EAV architecture (a rough sketch of the general idea is at the end of this reply).

     

    I know from experience that Elgg can be fairly well optimized and scaled as well. This takes a relatively large amount of work, although by the time you reach the point where you need scaling, it is usually affordable. I don't know of any open-source script that wouldn't require tailored scaling at some point - it's usually good to tune the architecture to the site's functionality and its load.
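
    For what it's worth, the usual shape of that "additional tables" approach is a plugin-owned side table kept in sync from Elgg's events, so the core schema is never modified. This is only a sketch: the table, column and field names are made up, and it assumes the 'profileupdate' event fired by the profile plugin (Elgg 1.8 API):

    ```php
    <?php
    // Sketch only: keep a flat, plugin-owned search table in sync via events.
    // The myplugin_profile_search table would be created by the plugin itself.
    function myplugin_sync_profile($event, $type, $user) {
        if (!elgg_instanceof($user, 'user')) {
            return true;
        }
        $dbprefix = elgg_get_config('dbprefix');
        $guid   = (int) $user->guid;
        $age    = (int) $user->age;        // hypothetical profile field
        $height = (int) $user->height_cm;  // hypothetical profile field

        // One flat row per user; Elgg's own tables are left untouched.
        update_data("REPLACE INTO {$dbprefix}myplugin_profile_search
                     (guid, age, height_cm) VALUES ($guid, $age, $height)");
        return true;
    }
    elgg_register_event_handler('profileupdate', 'user', 'myplugin_sync_profile');
    ```

    A search like the one described above then becomes a single query against that flat, indexable table instead of a pile of metadata joins.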

  • I had to implement similarly complex searches based on multiple user/entity attributes. With a moderate-sized community (a couple of thousand members, a relatively small number of concurrent users), queries like that can run smoothly on a decent dedicated box. However, as Cash said, the multiple joins on the metadata/metastrings tables are quite a resource hog - they'll quickly hurt you once you have concurrent searches. I wrote a plugin quite a while ago that generates a full community for you, with users, blogs and uploaded files, and lets you run JMeter-based load tests against your fake site. That should help you address scalability issues in advance, before experiencing real traffic. It's available here.

  • Hi, thanks for all the advice. Andras, interesting plugin - I'm going to take a look at that.

    We are also customising the search to look for products based on location. I've seen before that it's not a good idea to write SQL for EAV databases, but I don't see how we can do this without writing our own custom SQL... what is the drawback to writing custom SQL? Is the main problem the time it takes to write the SQL, given all the joins?

  • We also created additional tables to handle custom extended searches (similar to what Vazco and DomBlue described above).

    The search predicates we needed for the project were much more complex (70+ probes, combined and permuted) than the metadata search API caters for.

    Custom SQL calls *are* catered for within Elgg's API via get_data() and related functions in lib/database.php (look at bulk_user_admin's start.php for one example of such a call; a rough sketch follows below).
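
    For anyone wondering what that looks like in practice, here is a hedged sketch of a read query through get_data() (Elgg 1.8; the dbprefix lookup and the entities table are standard, but the WHERE clause is purely illustrative). One real drawback of this route, beyond having to write the joins yourself, is that raw SQL bypasses Elgg's access-control clauses, so you have to enforce permissions yourself:

    ```php
    <?php
    // Sketch of a custom read query via Elgg's database API (Elgg 1.8).
    // get_data() returns an array of row objects.
    $dbprefix = elgg_get_config('dbprefix');
    $sql = "SELECT e.guid, e.time_created
            FROM {$dbprefix}entities e
            WHERE e.type = 'object'
              AND e.enabled = 'yes'
            ORDER BY e.time_created DESC
            LIMIT 10";
    $rows = get_data($sql);
    foreach ($rows as $row) {
        $entity = get_entity($row->guid); // re-load as a full ElggEntity
        // ... work with $entity here ...
    }
    ```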