Sphinx Search for Elgg 1.8 v1.0a

Release Notes

ELGG 1.8 ONLY

ALPHA RELEASE. FOR TESTING ONLY. DON'T USE ON PRODUCTION

If you'd be so kind as to give feedback, here's some information that would be helpful:

  1. How hard was it to install and get running?
  2. Was there a noticeable performance boost?
  3. How large was the database you tested on? (# rows in entities table?)
  4. Find any bugs?
  • Thanks for this idea. I use Sphinx in other projects and tried to use your plugin.

    It doesn't work for me, and I found some issues. First, Sphinx doesn't understand the PHP in the config file, so I inserted the data manually into the Sphinx config file. After that, I got some errors from Sphinx: unknown key name 'sql_attr_string'. After I deleted this line from the config file, I got a similar second error: unknown key name 'workers'. I deleted that line too. Now Sphinx loads the config file, but the next error is already waiting :)
    failed to open /var/www/vhosts/**/httpdocs/elgg/mod/sphinx/indexes/groups.spl: No such file or directory, will not index.

    I added a folder in /mod/sphinx/indexer/ and then Sphinx started collecting the data!

    BUT the search on my Elgg site doesn't use the Sphinx data. I can't see any difference. I tested it with words I had added, which should have been displayed after I indexed again, but it looks like it's the normal search from Elgg.

    Also, I think the normal search in Elgg has some issues. In the sidebar you can choose e.g. wire, post, etc., but this doesn't work: I get an empty page every time. Only All works!

    I use the latest Elgg version, a nightly build.

  • Did you follow the directions I gave for installation? You should not have to go through all this trouble.

  • @Evan, yes I did! The first critical error was that I had to modify the SQL statements in your config file:

    unknown key name 'sql_attr_string'. After I deleted this line from the config file, I got a similar second error: unknown key name 'workers'.

    I put the plugin at the end (after the search plugin), but nothing changed. I don't know whether the plugin is working; I can't see anything...

  • The PHP file should generate a conf file when you activate the plugin.

    What version of Sphinx are you running?

  • Hi, Evan, setting up your module is somewhat of a manual horror, but it pays off even in this alpha state. If you plan to continue with this otherwise amazing plugin, let me know, and I'll share the whole setup story (note: when the plugin is enabled, sphinx.conf.php does not output anything, so I had to manually substitute every configuration variable referenced in the config text...).

    Environment: 

    • CentOS 6.3
    • PHP 5.3.18
    • Elgg 1.8.9
    • Sphinx 2.0.6

    Test of functioning: 

    1. Set up the indexes successfully
    2. Started service (searchd)
    3. Entered a truncated word in the search box
    4. Result list has shown the entire resulting word
    5. Disabled your plugin
    6. Entered the same truncated word in the search box (as in step 3.)
    7. The expected result has not reappeared

    From steps 4 and 7 I conclude that your plugin works nicely. Alas, it is a PITA to set up.

  • Hi, Daniel,

    Glad you found it useful. Definitely a pain.

    I feel we need a more generic way for plugins to define what fields are relevant for their entity types. Otherwise basically every setup is going to be custom, which feels pretty stupid for an engine like Elgg.

    I'm not really actively developing this, though your story might be useful to others and I'd gladly add you to the repo as a collaborator if you want to help develop things.

  • "more generic way for plugins to define what fields are relevant..." - that's it! But for whom? The developer or the non-coding site admin? I do believe your statement is onto something good. It triggered a *new* thought process.

  • Hi Evan,

    I tried this plugin and tried to make it work with 1.8.8, but it does not work. Any help?

  • I have been trying to get sphinx search to work on my elgg site but seem to be getting nowhere.

    1. I downloaded the sphinx plugin from https://github.com/ewinslow/elgg-sphinx and uploaded it to my elgg installation.
    2. Activated the Sphinx plugin.
    3. Generated the configuration file.
    4. Downloaded sphinx-2.1.2-1.rhel6.x86_64.rpm
    5. Installed Sphinx using yum.
    6. Started Sphinx

    I then typed a search in the search box but it did not return any results.

    I looked in the sphinx.conf file and noticed the following:

    sql_host                = localhost
    sql_user                = test
    sql_pass                =
    sql_db                  = test

    I changed these to match my database setting but it still does not work.

    Has anyone had success with this plugin and can offer any advice/help?

    I am on a VPS running CentOS 6.
  • Hi Evan,

    I'm experimenting with sphinx and I just want to add this one more thing for users having problems.
    In my specific case, running 2.1.4, searchd wouldn't start because the path to the binlog was not right.

    Searchd wanted to log to /var/data/binlog.lock (which isn't right in my case)

    So maybe, to make it work on all servers, you could either add:
        binlog_path = <?php echo $CONFIG->dataroot; ?>sphinx
    to the config file in the plugin, or edit the sphinx.conf file later and add your binlog path there.

    Anyway, it wasn't serving results before, and after fixing the binlog path it's working great. Thanks for the plugin, I'm going to see what I can do with it. It's very interesting!

  • Sphinx searching in Elgg is simply amazing!

    I was building an advanced search engine that filtered on metadata from Profile Manager, and I could search on 5 metadata fields in one query (like city=paris & profession=designer & ...). If I used more than 5, the CPU of my dev server hit 100% and my page hung. With 4 it was struggling.

    I have now converted the same search engine to use Sphinx, and it serves everything I want at lightning speed, with no CPU load and as many filters as I want. The options are endless :)

    I would add the profile field searching to GitHub, but I'm actually cheating: I'm using Vazco_metadata_cache to put all the fields in one table, and Sphinx fetches them from there. That saves me development time and leaves the joins out. If I have time I'll look into making it work without Vazco's plugin, but that's for later.

  • Search works as long as no one deletes anything from the Elgg instance and then tries to run a search whose result set contains the deleted item.

    Real-time or delta indexing needs to be investigated.

  • Hey Charles! Thanks for the bug report! Can you submit it to github?

  • @Charles, I also worked on this plugin, but this does not seem to be a bug. The index is updated by the indexer script, which you should run from cron (or Task Scheduler on Windows). The index is therefore only updated after the batch has run, so a deleted item stays in the index until the next batch has run.

  • @Gerard, how often do you run the indexer? My elggentities table has over 3.5 million rows and it takes over 20 minutes to recreate the index. The error I'm getting is a "Fatal Error" from Elgg after I request a new search that contains a value found in the Sphinx index but no longer in the elggentities table. If I run the search from the server's command line I don't get any Sphinx errors.

  • To be honest, I am no longer using the plugin myself; I use my own Google search plugin instead. Sphinx is potentially more powerful, but it needs more development effort. We should include comment and tag search, and a jQuery search for users/groups like the advanced search from Coldtrick.

    But to answer your question: I ran it every hour, and that could work for you too. Given that it runs for 20 minutes, though, you probably need to run the indexer on a different system to avoid overloading the CPU of your production system. Even then it would still put a huge load on the database.

  • Charles, you are right, Delta indexing is what we need and is perfectly possible with Sphinx.
    I think we want to use the 'last_updated' field for this.

    I'm looking into it myself and I'm still experimenting with my indexes, so I can't tell much about it.
    I'll share my experience when I have set it up properly.

  • @Charles D Not a bug.

    The easiest way to start indexing is to add this to cron:

    /usr/local/sphinx/bin/indexer --all

    However, this method is far from optimal, because groups and users are updated significantly less often than objects or comments. So you need to adjust how often each index is refreshed.

    To index different kinds of objects at different frequencies, add the following commands to cron:

    12 */3 * * *  root /usr/bin/indexer --rotate groups > /dev/null 2>&1
    12 */3 * * *  root /usr/bin/indexer --rotate objects > /dev/null 2>&1
    */50 * * * *  root /usr/bin/indexer --rotate users > /dev/null 2>&1

    See also my guide 'Installation Sphinx Search Engine on Elgg website'.
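
A note on reading the schedules above: a step such as `*/50` in the minute field selects every 50th value counted from 0 within the hour, so it is not the same as "every 50 minutes". The sketch below is a small illustrative expander for a single cron field (the function name `expand_cron_field` is made up for this example and handles only `*`, single values, ranges, lists, and `/step`); it shows which minutes or hours a field matches.

```python
def expand_cron_field(field, low, high):
    """Return the sorted values one cron field matches within [low, high].

    Illustrative helper only: supports "*", "N", "A-B", comma lists,
    and an optional "/step" suffix on "*" or a range.
    """
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return sorted(values)


minutes_users = expand_cron_field("*/50", 0, 59)   # minute field of the users line
hours_groups = expand_cron_field("*/3", 0, 23)     # hour field of the groups line
print(minutes_users)
print(hours_groups)
```

Under these rules the users line fires at minutes 0 and 50 of every hour, while the groups and objects lines fire at minute 12 of every third hour. If a strictly hourly run is wanted, a minute field of `0` with `*` for the hour is the usual way to write it.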

  • @Charles, I read more about delta indexing, and it should indeed be possible to use the delta index so that it updates only the entities whose date_updated timestamp is newer than the last time the main index was updated.

    This should bring the indexing time down to a fraction of the original, and the delta index is then merged with the main index.
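
The selection rule described above can be sketched in a few lines. This is only an illustration of the idea in Python, not the plugin's code: the column name `time_updated`, the row layout, and the function name `split_for_delta` are assumptions made for the example (the thread itself refers to the field as 'last_updated' and 'date_updated').

```python
def split_for_delta(rows, last_main_build):
    """Split entity rows into (covered by main index, needs delta indexing).

    rows: iterable of dicts with "guid" and "time_updated" (Unix timestamps).
    last_main_build: timestamp recorded when the main index was last rebuilt.
    """
    main, delta = [], []
    for row in rows:
        # Anything touched after the last full build belongs in the delta.
        target = delta if row["time_updated"] > last_main_build else main
        target.append(row["guid"])
    return main, delta


# Toy data standing in for the entities table.
rows = [
    {"guid": 101, "time_updated": 1000},
    {"guid": 102, "time_updated": 2500},
    {"guid": 103, "time_updated": 3000},
]
main_guids, delta_guids = split_for_delta(rows, last_main_build=2000)
print(main_guids, delta_guids)
```

In a real setup the same comparison would live in the delta source's SQL query, with the build timestamp stored somewhere the indexer can read it. One caveat: a timestamp filter only sees rows that still exist, so it will not by itself remove entities that were deleted from the table after the main index was built.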

  • @driesdk Thanks for the information. I will get one of my programmers to look at setting this up. Otherwise I may have to look at another solution.

    Charles

  • Search seems to crash (Elgg Fatal Error) when the result set contains items you don't have access to.  (Administrators don't get this crash since they have access to all).

    On our site we allow the creation of closed, private and hidden groups. 
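
Both fatal errors reported in this thread (hits for deleted entities, and hits the viewer cannot access) look like the same situation: the index returns an ID that cannot be loaded for the current user. That diagnosis is an assumption, not something confirmed against the plugin's code. A defensive pattern is to drop such hits before rendering. The sketch below shows the idea in Python; `load_search_results` and `load_entity` are hypothetical names, with `load_entity` standing in for whatever entity lookup the site uses.

```python
def load_search_results(guids, load_entity):
    """Resolve search hits to entities, skipping any that no longer load.

    guids: document IDs returned by the search index, in rank order.
    load_entity: hypothetical callable returning the entity, or a falsy
    value when the entity was deleted or is not visible to the viewer.
    """
    entities = []
    for guid in guids:
        entity = load_entity(guid)
        if entity:  # stale or inaccessible hits are dropped, not rendered
            entities.append(entity)
    return entities


# Toy data: GUID 7 was deleted after the index was last rebuilt.
live = {5: "blog post", 9: "group"}
results = load_search_results([5, 7, 9], live.get)
print(results)
```

The trade-off is that result counts and pagination reported by the index can be slightly off until the next reindex, which is usually preferable to a fatal error.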

  • Will there be an update to this module for Elgg 1.9/1.20?

Evan Winslow

Software Engineer at Google. Elgg enthusiast. I wrote the JavaScript and CSS frameworks for 1.8.

Stats

  • Category: Third Party integrations
  • License: GNU General Public License (GPL) version 2
  • Updated: 2014-11-17
  • Downloads: 914
  • Recommendations: 4
