Elgg search

My site (www.scholify.com) uses Scribd to display uploaded documents, but Elgg search does not index the contents of the documents uploaded by users. Instead, search works only with tags, comments, etc. How can I get Elgg to index document contents?


Let me know if you need more information about the problem...

  • True - Elgg only searches via metadata / tags. To index document contents, you would have to extract the pertinent words (or types of words) that you want to be searchable and index them as tags attached to the owning entity.

    One small (really big) problem will be selecting which words to index, whether you use Elgg's tags or any other indexing algorithm.


    If the document content is:

    "The quick brown fox jumps over the lazy dog"...

    then the word index yields:

    • Quick
    • Brown
    • Fox
    • Jumps
    • Lazy
    • Dog

    -- omitting prepositions and any other stop words.
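
A minimal sketch of that stop-word filtering, written in Python for brevity rather than as Elgg/PHP code. The `STOP_WORDS` set and the `extract_keywords` name are assumptions made up for illustration; a real stop list would be much longer and language-specific.

```python
import re

# Hypothetical, deliberately tiny stop list (an assumption for this example).
STOP_WORDS = {"a", "an", "and", "of", "over", "the", "to"}

def extract_keywords(text):
    """Return the distinct non-stop words of text, in order of first appearance."""
    seen = set()
    keywords = []
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        if word not in STOP_WORDS and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords

print(extract_keywords("The quick brown fox jumps over the lazy dog"))
# → ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```

Each returned word would then become one tag on the owning entity.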

    You could try the KWIC (Key Word In Context) indexing algorithm, which indexes all cyclic permutations of the keywords. It is published in numerous places on the internet, though I have not seen much of it coded in PHP.
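
For readers who have not met KWIC, here is a rough sketch of the cyclic-shift idea, again in Python rather than PHP. The function name `kwic_index` and its `stop_words` parameter are assumptions for illustration; it is a simplified version, not a reference implementation.

```python
def kwic_index(titles, stop_words=frozenset()):
    """Build a simple KWIC index.

    For every title, emit each cyclic shift that starts with a keyword
    (a word not in stop_words), paired with the original title, and
    return the entries sorted case-insensitively by the shifted line.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in stop_words:
                continue  # shifts that start with a stop word are not indexed
            shifted = words[i:] + words[:i]
            entries.append((" ".join(shifted), title))
    return sorted(entries, key=lambda entry: entry[0].lower())

for line, source in kwic_index(["The quick brown fox"], stop_words={"the"}):
    print(line)
# → brown fox The quick
# → fox The quick brown
# → quick brown fox The
```

Every keyword ends up at the front of one sorted line, so a lookup by keyword also shows the words that surrounded it.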




  • I think copying the file contents into tags will solve this, at least temporarily. Any idea how to do that automatically when a file is uploaded?

  • I think you might have missed some points I made ;)

    "Copying contents" when a file is uploaded is not the whole story. "Copying file contents..." might take some hundred LOC.

    It needs to happen whenever the content entities (those whose "words" are to be searchable) are created, updated, or deleted, which in turn means creating, updating, or deleting tags ;-X
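
A toy sketch of that create / update / delete bookkeeping, in Python with an in-memory index standing in for Elgg's tag storage. The `TagIndex` class and its method names are hypothetical; in a real plugin the same logic would sit in event handlers and write to the database.

```python
from collections import defaultdict

class TagIndex:
    """In-memory stand-in for tag storage: keyword -> set of entity ids."""

    def __init__(self):
        self.by_tag = defaultdict(set)   # tag -> ids of entities carrying it
        self.by_entity = {}              # entity id -> its current tag set

    def on_save(self, entity_id, keywords):
        """Handle create and update alike: apply only the tag difference."""
        new = set(keywords)
        old = self.by_entity.get(entity_id, set())
        for tag in old - new:            # tags no longer in the content
            self._remove(tag, entity_id)
        for tag in new - old:            # tags that just appeared
            self.by_tag[tag].add(entity_id)
        self.by_entity[entity_id] = new

    def on_delete(self, entity_id):
        """Drop every tag of a deleted entity."""
        for tag in self.by_entity.pop(entity_id, set()):
            self._remove(tag, entity_id)

    def _remove(self, tag, entity_id):
        self.by_tag[tag].discard(entity_id)
        if not self.by_tag[tag]:
            del self.by_tag[tag]         # no entity left for this tag

    def search(self, tag):
        return sorted(self.by_tag.get(tag, ()))

index = TagIndex()
index.on_save(1, ["quick", "fox"])   # created
index.on_save(1, ["fox", "dog"])     # updated: "quick" goes, "dog" arrives
print(index.search("quick"), index.search("dog"))
index.on_delete(1)                   # deleted: all of its tags go
print(index.search("fox"))
```

Writing only the difference between the old and new tag sets keeps an update from rewriting tags that did not change.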

    The "brown fox" analogy? Does one really want *all* of the content's words indexed? That is a lot of server CPU load ;-) How to be selective in fetching keywords? I believe this is *not* an easy programming task. One could take Brett's BigBrother plugin (now quite dated), upgrade it, and extend it to capture content words for indexing as tags.

    If a usual Page (plugin) add/update generates 20-30 metadata DB hits, indexing content with 1-200 words will cause about 2000++ DB hits ;-( no joy ;;;-X
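
One possible way to be selective, sketched in Python under stated assumptions: keep only the most frequent non-stop words and cap how many become tags, so the number of tag writes per save is bounded by the cap instead of by document length. The names `top_keywords`, `limit` and `min_length`, and the stop list, are all made up for illustration; word frequency is just one heuristic among many.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list (an assumption for this example).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "over", "the", "to"}

def top_keywords(text, limit=10, min_length=3):
    """Return at most `limit` keywords, most frequent first.

    Words on the stop list or shorter than `min_length` are ignored.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(
        w for w in words if w not in STOP_WORDS and len(w) >= min_length
    )
    return [word for word, _ in counts.most_common(limit)]

text = "search index search tags index search the of"
print(top_keywords(text, limit=2))
# → ['search', 'index']
```

With a cap of, say, 10 tags per entity, the extra tag writes per add/update stay at 10 at most, however long the document is.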