Patrick DSouza

Activity

  • Patrick DSouza has updated their profile
  • Patrick DSouza replied on the discussion topic Enhancing Search in the group Elgg Technical Support
    I'm part of a team that's looking to help build out the search capabilities in Elgg 1.5, which are admittedly quite anemic. I've seen a few of the "full text" search plugins up here, but I don't think they go far enough. What I'd like to see...
    • The index Brett is talking about would be filled in by each of the individual plugins that know about the specific parts of data that ought to be searchable. For example, the core profile plugin would index things like interests, about me, name, etc. This will also let the search system know *why* the entities in question matched (since the index will have to record its "source" or "type" as well as the entity and string information itself). The actual searching would be triggered by the callback system. The first callback would be "search the index", which would allow for a fast search across indexed entities in the database all in one go. If all you have in your system is indexed entities, it should work nicely and allow for a more on-target system than is there now.

      Any plugins that add entities, or information to entities, that they want to be searchable will need to tie into the indexing system, and then they get the searching part for free. Plugins will also be able to hook into this same callback to extend the search results *at runtime*, say with external "entities" that are not actually saved into the database. This is a key feature for us.

      The type of search that we were looking at here would be a simple "keyword-OR" method, but having a full-text index on the index table will let us pick up substrings. At the moment we aren't looking into structured data queries (locations and date ranges and the like), but hopefully this new structure will leave enough room for an "Advanced search" method later on.

      Here's a random thought: should we make a callback to let things expand and alter the search *query*? Could lead to a cool stemming engine plugin if someone would want to write it.

      Pagination and sorting, especially of mixed internal and external data, is a very interesting question, and not one we have completely figured out yet. Ideas on that would be greatly appreciated.

    • Our initial efforts will be toward full-text searches. This is really where Elgg needs to be improved. Tag-based search works decently well.

      All search has to go through Elgg because of the access system.  It won't be very useful if search returns items you can't view OR if search doesn't return items you SHOULD be able to view because the indexer itself could view them.

      I did neglect to say that when the index is created, metadata will be included in the denormalisation of the data. This gets complicated fast.

      So current challenges include:

      1. Pagination and sorting when plugins add to the live search but NOT to the index.
      2. The right balance between denormalisation and not duplicating the entire database in the search table.
      3. Exactly how to deal with indexed metadata.
      4. Speed, speed, speed.  (Speed.)
    • One good thing that reduces the size of the index table is the fact that not every entity need be indexed and only the metadata that the subsystem wants to mark as "searchable" is in there. So we immediately cut out all kinds of settings and widgets and extraneous bits of information, though we do run the risk of coming close to duplicating the metastrings table if we do it wrong.
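
The design discussed in these replies could be sketched roughly as below. This is only an illustration under stated assumptions, written in Python rather than Elgg's PHP, and it does not use Elgg's API: the names `Hit`, `SearchIndex`, `SearchSystem`, `strip_suffix_hook` and `external_hook` are all hypothetical. It shows an index that records the "source" of each string and stores only fields marked searchable, a per-viewer access check at query time, a keyword-OR substring match, a callback chain that lets a plugin add external results at runtime, and a query-expansion callback standing in for a stemming plugin.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Hit:
    entity_id: str  # internal entity id, or an id for an external "entity"
    source: str     # why it matched, e.g. "profile:interests"
    text: str       # the indexed string


class SearchIndex:
    """Hypothetical denormalised index table, held in memory for the sketch."""

    def __init__(self, searchable: Set[str]):
        self.searchable = searchable            # sources plugins marked searchable
        self.rows: List[Hit] = []
        self.access: Dict[str, Set[str]] = {}   # entity_id -> users allowed to view

    def index_entity(self, plugin: str, entity_id: str,
                     fields: Dict[str, str], viewers: Set[str]) -> None:
        self.access[entity_id] = set(viewers)
        for name, value in fields.items():
            source = f"{plugin}:{name}"
            if source in self.searchable:       # settings, widgets etc. are skipped
                self.rows.append(Hit(entity_id, source, value))

    def search(self, keywords: List[str], viewer: str) -> List[Hit]:
        hits = []
        for row in self.rows:
            # Access is checked for the person searching, not for the indexer.
            if viewer not in self.access.get(row.entity_id, set()):
                continue
            text = row.text.lower()
            if any(k in text for k in keywords):  # keyword-OR, substring match
                hits.append(row)
        return hits


class SearchSystem:
    """Runs query callbacks first, then every registered search callback."""

    def __init__(self) -> None:
        self.query_hooks: List[Callable[[List[str]], List[str]]] = []
        self.search_hooks: List[Callable[[List[str], str], List[Hit]]] = []

    def run(self, query: str, viewer: str) -> List[Hit]:
        keywords = query.lower().split()
        for hook in self.query_hooks:
            keywords = hook(keywords)
        results: List[Hit] = []
        for hook in self.search_hooks:
            results.extend(hook(keywords, viewer))
        return results


def strip_suffix_hook(keywords: List[str]) -> List[str]:
    """Very crude stand-in for a stemming plugin: adds suffix-stripped forms."""
    extra = []
    for k in keywords:
        for suffix in ("ing", "es", "s"):
            if k.endswith(suffix) and len(k) > len(suffix) + 2:
                extra.append(k[: -len(suffix)])
                break
    return keywords + extra


def external_hook(keywords: List[str], viewer: str) -> List[Hit]:
    """Runtime-only result that is never stored in the index (made-up data)."""
    if "search" in keywords:
        return [Hit("ext:42", "external:wiki", "Search tips")]
    return []


index = SearchIndex(searchable={"profile:interests", "profile:about"})
index.index_entity("profile", "user:1",
                   {"interests": "hiking and full-text search",
                    "widget_layout": "2col"},
                   viewers={"alice", "bob"})
index.index_entity("profile", "user:2",
                   {"about": "private notes on search"},
                   viewers={"bob"})

system = SearchSystem()
system.query_hooks.append(strip_suffix_hook)
system.search_hooks.append(index.search)    # first callback: "search the index"
system.search_hooks.append(external_hook)   # plugin extending results at runtime

for hit in system.run("searches", viewer="alice"):
    print(hit.entity_id, hit.source)
```

The sketch simply concatenates index hits and runtime hits in callback order, so it leaves the pagination and sorting question from the thread open. Access control for external results is left to the hook that produces them, which is an assumption of this sketch rather than something the thread settles.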
