Enhancing Search

I'm part of a team that's looking to help build out the search capabilities in Elgg 1.5, which are admittedly quite anemic. I've seen a few of the "full text" search plugins up here, but I don't think they go far enough. What I'd like to see available:

  • The main search callback implemented as an action or otherwise extendable hook
  • Plugins be given control over the list of entities to display for a query, with the ability to add and remove items in the list
  • The ability to have non-persisted entities in the results list (for things like external data sources) and have these integrated into the search display
  • Entities to know (for display purposes) *why* they were returned as part of the search query
  • Get rid of the generic search-on-metadata that leads to bizarre and unexpected behavior (search for "on" or "1", for example) and replace it entirely with plugin-based search functionality
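
To make the wish list above a bit more concrete, here is a rough sketch of a pluggable search pipeline. It is written in Python purely for illustration (Elgg itself is PHP), and every name in it (`SearchResult`, `register_search_handler`, `run_search`, the three example handlers) is hypothetical rather than anything in the Elgg API. Each handler gets the query and the current results list and returns a new list, so a plugin can add items, remove items, contribute non-persisted items, and record why each item matched.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SearchResult:
    """One row in the results list (hypothetical structure)."""
    title: str
    reason: str                         # why this item matched, for display
    entity_guid: Optional[int] = None   # None marks a non-persisted (external) item


# A handler takes the query and the current results and returns the new results.
SearchHandler = Callable[[str, List[SearchResult]], List[SearchResult]]

_handlers: List[SearchHandler] = []


def register_search_handler(handler: SearchHandler) -> None:
    """Let a plugin join the search pipeline."""
    _handlers.append(handler)


def run_search(query: str) -> List[SearchResult]:
    """Pass the results list through every registered handler in order."""
    results: List[SearchResult] = []
    for handler in _handlers:
        results = handler(query, results)
    return results


def blog_search(query: str, results: List[SearchResult]) -> List[SearchResult]:
    """Example plugin: adds persisted entities whose title contains the query."""
    posts = {101: "Enhancing search in Elgg", 102: "Holiday photos"}  # made-up data
    hits = [
        SearchResult(title, f'title contains "{query}"', guid)
        for guid, title in posts.items()
        if query.lower() in title.lower()
    ]
    return results + hits


def external_wiki_search(query: str, results: List[SearchResult]) -> List[SearchResult]:
    """Example plugin: adds a non-persisted item from a pretend external source."""
    if query.lower() == "search":
        results = results + [SearchResult("Wiki: Search tips", "matched external wiki page")]
    return results


def hide_photos(query: str, results: List[SearchResult]) -> List[SearchResult]:
    """Example plugin: removes items that another plugin added."""
    return [r for r in results if "photos" not in r.title.lower()]


register_search_handler(blog_search)
register_search_handler(external_wiki_search)
register_search_handler(hide_photos)
```

Calling `run_search("search")` here would return one stored entity plus one external item, each carrying its own `reason` string for the view to display. Passing the whole list through every handler is the simplest design, but it is also where the memory concerns start once many plugins are involved.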

There are a lot of issues to work through here, many of them around performance and memory once a dozen plugins start feeding into the search results.

I have a small team of coders that I think I can throw at this, but I definitely want the support of both the community and, hopefully, Curverider, since this would necessitate changes being patched back into the core.

Who's with me?

  • Our initial efforts will be toward full text searches. This is really where Elgg needs to be improved. Tag-based search works decently well.

    All search has to go through Elgg because of the access system.  It won't be very useful if search returns items you can't view OR if search doesn't return items you SHOULD be able to view because the indexer itself could view them.
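
As a rough illustration of the access point above, the sketch below applies the viewer's permissions at query time, so what the indexer could see when it built the index does not decide what a user gets back. It is Python for illustration only, and the access levels, `IndexRow`, `allowed_access_ids` and `search_index` are all assumptions, not Elgg's actual access system.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Illustrative access levels; assumed for this sketch only.
PRIVATE, LOGGED_IN, PUBLIC = 0, 1, 2


@dataclass
class IndexRow:
    """A denormalised row in a hypothetical search index."""
    guid: int
    owner_guid: int
    access_id: int
    text: str


def allowed_access_ids(viewer_guid: Optional[int]) -> Set[int]:
    """Access levels the viewer may read; None means a logged-out visitor."""
    if viewer_guid is None:
        return {PUBLIC}
    return {PUBLIC, LOGGED_IN}


def search_index(rows: List[IndexRow], query: str,
                 viewer_guid: Optional[int]) -> List[IndexRow]:
    """Return matching rows, filtered by what this viewer is allowed to see."""
    allowed = allowed_access_ids(viewer_guid)
    needle = query.lower()
    return [
        row for row in rows
        if needle in row.text.lower()
        # Owners can always see their own items, including private ones.
        and (row.access_id in allowed or row.owner_guid == viewer_guid)
    ]
```

The index stores everything together with its access level, and the filter runs per request, so the same index serves logged-out visitors, members and owners with different result sets.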

    I did neglect to say that when the index is created, metadata will be included in the denormalisation of the data. This gets complicated fast.

    So current challenges include:

    1. Pagination and sorting when plugins add to the live search but NOT to the index.
    2. The right balance between denormalisation and not duplicating the entire database in the search table.
    3. Exactly how to deal with indexed metadata.
    4. Speed, speed, speed.  (Speed.)
  • One good thing that reduces the size of the index table is the fact that not every entity need be indexed and only the metadata that the subsystem wants to mark as "searchable" is in there. So we immediately cut out all kinds of settings and widgets and extraneous bits of information, though we do run the risk of coming close to duplicating the metastrings table if we do it wrong.
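
A minimal sketch of that "only what is marked searchable" idea, in Python for illustration only. The `SEARCHABLE_METADATA` registry, the subtype names and `build_index_text` are hypothetical, not part of Elgg.

```python
from typing import Dict, Optional, Set

# Hypothetical registry: each subsystem lists the metadata names it wants indexed.
SEARCHABLE_METADATA: Dict[str, Set[str]] = {
    "blog": {"title", "description", "tags"},
    "file": {"title", "tags"},
}


def build_index_text(subtype: str, metadata: Dict[str, object]) -> Optional[str]:
    """Return the denormalised text for one entity, or None if it is not indexed."""
    searchable = SEARCHABLE_METADATA.get(subtype)
    if searchable is None:
        return None  # subtype never registered (e.g. widgets, settings): skip it
    parts = [
        str(value)
        for name, value in metadata.items()
        if name in searchable and value
    ]
    return " ".join(parts) if parts else None
```

With this, a blog post whose metadata includes a `comments_on` flag set to "On" would be indexed on its title and tags only, so a search for "on" would no longer match it through that flag.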