SOLRSearch for Elgg Plugin

I have started to create a plugin to integrate SOLR into Elgg. The combination of SOLR / Solarium looks rather impressive. I have put the plugin on GitHub: https://github.com/domblue/SOLRSearch

It still lacks a lot, but if anybody is interested in contributing, feel free to contact me. This plugin / integration is definitely not something for beginners! SOLR is a separate Java server.

There is a readme in the repo explaining the setup and the prerequisites.

The basic idea is to get sophisticated search for Elgg for two main use cases:

1. Search Elgg content (all types)

2. Search Elgg profiles (based on profile manager data) 

This will be implemented with two SOLR cores (one index each) so that both paths can be optimized.

There is still a lot to do:

1. Enable semi-dynamic generation of SOLR configs from within Elgg.

2. Create the user interface and linkage of the results to Elgg entities.

3. Enable sophisticated profile search.

4. Enable advanced search features like stemming, synonyms, stopwords, highlighting, facets, etc.

5. Optimize data import including delta import.

6. Respect Elgg access control.

7. and more .... :-)

Attached is a screenshot. Of course the current output is optimized for development and not the way an end user would see it. 

 

 

[Screenshot: SOLRSearch]

  • ... because this repo is just a ping test to SOLR using Solarium ...

  • Hi Domblue,

    If you are looking for a search solution for Elgg, I would suggest going with Sphinx (https://github.com/ewinslow/elgg-sphinx), as Solr needs a lot of custom coding. You can achieve the same with Sphinx in much less time, and Sphinx also has better PHP support and better RDBMS integration capabilities.

  • @domblue Ad 1: Solr supports dynamic fields (have a look at the example config distributed with Solr), so you don't have to generate actual configs; attaching suffixes to particular field names should be enough and much simpler.
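
    A minimal sketch of the suffix idea, not taken from any existing plugin. It assumes the `*_s`, `*_i`, `*_f`, `*_b` and `*_dt` dynamic field patterns found in the example schema shipped with Solr; the function name `to_solr_fields` and the sample field names are hypothetical, and datetimes are assumed to be naive UTC values.

```python
from datetime import datetime

# Assumed suffixes, modelled on the dynamicField patterns in Solr's example
# schema. A real schema.xml may define different patterns.
SUFFIX_BY_TYPE = {
    bool: "_b",
    int: "_i",
    float: "_f",
    str: "_s",
    datetime: "_dt",
}


def to_solr_fields(metadata):
    """Rename metadata keys so each one matches a dynamic field pattern."""
    doc = {}
    for name, value in metadata.items():
        # Exact type lookup, so that True is not treated as an int.
        suffix = SUFFIX_BY_TYPE.get(type(value))
        if suffix is None:
            continue  # skip values with no matching pattern (e.g. None)
        if isinstance(value, datetime):
            # Solr date fields expect ISO 8601 in UTC with a trailing Z.
            value = value.strftime("%Y-%m-%dT%H:%M:%SZ")
        doc[name + suffix] = value
    return doc
```

    With this approach a new profile field only needs a name with the right suffix; the Solr schema itself stays untouched.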

    Ad 6: That's the main problem here. I think in the end we need to firewall Solr and emulate Elgg's access suffix in Solr (we'd need to map ACLs into Solr). I dug into it a bit but didn't have time to get a working solution. I'm not aware of anything more elegant.

    I'm looking forward to seeing that plugin in action; I haven't looked into it yet.

    @saurabh I wouldn't even try to integrate Solr at the RDBMS level. I'd use the create/update/delete events from Elgg to update the search index (I don't think changes will happen that often; they'll probably kill MySQL sooner than Solr, so there is no need for batch delta imports) and use the exportable fields as the base info to be kept as document properties (some extensibility will probably be required). I haven't thought much about annotations yet. I don't see much use in keeping relationships in Solr. For complex use cases we'd need some preprocessing before inserting into the index, AFAIK.
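
    A rough sketch of the event-driven approach, under stated assumptions: `InMemoryIndex` is only a stand-in for a Solr core, and `entity_to_doc` and `handle_event` are hypothetical names, not Elgg or Solarium APIs. In a real plugin the handler would be registered for Elgg's create/update/delete events and would send add/delete requests to Solr.

```python
class InMemoryIndex:
    """Stand-in for a Solr core, keyed by the document's unique id."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        # Adding a document with an existing id replaces the old one,
        # which is also how Solr treats its unique key.
        self.docs[doc["id"]] = doc

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


def entity_to_doc(entity):
    """Flatten the exportable fields of an entity into one document."""
    return {
        "id": entity["guid"],
        "type": entity["type"],
        "title": entity.get("title", ""),
        "description": entity.get("description", ""),
    }


def handle_event(index, event, entity):
    """Keep the index in sync with a single entity event."""
    if event in ("create", "update"):
        index.add(entity_to_doc(entity))
    elif event == "delete":
        index.delete(entity["guid"])
    else:
        raise ValueError("unsupported event: " + event)
```

    Because every change is pushed as it happens, no periodic delta import is needed; a full import is only required once to seed the index.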

    I won't be working on my plugin soon (yes, it's very far from being releasable; it's my sandbox for experiments for now), as I have stuff with higher priority to me.

  • @saurabh ... I work for a company of which it is said that you need a PhD to use its products :-) ... so I am used to complexity :-) ... but SOLR is really easy to set up and use, and you don't need custom coding ... the reasons I am using it are Java (which I know), some advanced features, Lucene, and the independence from accessing the DB - all data can be included in SOLR documents.

    @Pawel ... I know the dynamic field stuff and will certainly use it for profile searches. I have read the chapters of the SOLR 4.0 in action book that are already available. Great book.

    The hardest part so far has been creating the SQL statements for the SOLR Data Import Handler in order to create one document per Elgg object with all info inside.

    For the ACL, as a start, I am thinking of a multivalued field containing all friends for the friends access level and only one friend for private access :-) - I know that does not cover everything in Elgg's ACLs.
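
    A rough sketch of that idea, with several assumptions: the field name `allowed_readers_is` is hypothetical (it relies on a multivalued integer dynamic field pattern such as `*_is` existing in the schema), only the private and friends levels are handled, and the single entry for private access is taken to be the owner. `matches` merely imitates what Solr would do with the filter query.

```python
# Hypothetical multivalued field holding the guids allowed to read a document.
ACL_FIELD = "allowed_readers_is"


def allowed_readers(owner_guid, access_level, friend_guids):
    """Return the guids to store in the multivalued ACL field."""
    if access_level == "private":
        return [owner_guid]
    if access_level == "friends":
        return [owner_guid] + sorted(friend_guids)
    raise ValueError("access level not covered by this sketch: " + access_level)


def reader_filter(viewer_guid):
    """Build a filter query (fq) restricting results to one viewer."""
    return "{}:{}".format(ACL_FIELD, int(viewer_guid))


def matches(doc, viewer_guid):
    """Imitate the filter locally: is the viewer listed on the document?"""
    return viewer_guid in doc[ACL_FIELD]
```

    The filter string would be passed as an `fq` parameter alongside the user's query, so the access check never influences relevance scoring. Group, public and logged-in access levels would need extra handling.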

    Relationships are poison in a flat world. There is a join capability included, but it is more for relating documents to a parent. Maybe it could be used to relate group content to groups.

    The profile search side is a bit harder due to the dynamics and extensive use of metadata. I am tempted to export the profile fields directly into SOLR documents, much like the CSV export of the profile manager.

       

  • @Pawel, from what I have understood so far, Sphinx can be used as a stand-alone server (just like other DB servers). So what I mean to ask is: would it be possible to eliminate MySQL and use Sphinx directly?

  • @saurabh: Is that the Sphinx setup you have configured on your website / server?
    I'd like to give your website's Sphinx search engine DB a good test run.

  • @Dhrup, I am still working on Sphinx search, and it's on my local machine. I will share my experience ASAP. I will also share my plugin with the community.

     

  • @DhrupDeScoop We launched Sphinx search about a month ago with my PR from https://github.com/ewinslow/elgg-sphinx Some most configs include it ;)

    All works OK!

    Also, you can read my write-up about installing the Sphinx search engine on an Elgg website.

  • I was more curious about what saurabh said above, which has not been fully answered:
    "... that Sphinx can be used as a stand-alone server (just like other DB servers) ..." and not so much about Sphinx itself.

    My personal preference leans towards SOLR, but while a fuller implementation of that with Elgg awaits,
    I prefer my own STSX Search (Super Turbo Search) utility to handle fast, efficient searching within Elgg.