True, Elgg only searches via metadata / tags. To index document contents you would have to extract the pertinent words (or types of word) that you want to be searchable, and index them as tags attached to the owning entity.
One small (big) problem will be the selection of words to index, whether using Elgg's tags or any other indexing algorithm.
EG
If document content is =>
"The quick brown fox jumps over the lazy dog"...
Then the words index yields ==>
quick, brown, fox, jumps, lazy, dog
-- omitting prepositions and any other stop words.
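A rough sketch of that selection step, in Python rather than PHP just to keep it short. The stop-word list and the function name `extract_keywords` are made up for illustration; a real list would be far longer and nothing here is Elgg API.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption, not from Elgg).
STOP_WORDS = {"a", "an", "and", "any", "of", "over", "the", "to"}

def extract_keywords(text):
    """Return the distinct non-stop words of text, lowercased, in first-seen order."""
    keywords = []
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        if word not in STOP_WORDS and word not in keywords:
            keywords.append(word)
    return keywords

print(extract_keywords("The quick brown fox jumps over the lazy dog"))
```

Each returned word would then become one tag on the owning entity.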
You could try the KWIC (Key Word In Context) indexing algorithm, which indexes all cyclic permutations of the keywords. It is published on the internet in numerous places, though I have not seen much of it coded in PHP.
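For anyone who has not met KWIC, here is a minimal sketch of the cyclic-shift idea, again in Python for brevity. It assumes the common variant that skips shifts starting with a stop word; `kwic_index` and the stop-word list are hypothetical names, not taken from any published implementation.

```python
# Hypothetical stop-word list (assumption); shifts starting with these are skipped.
STOP_WORDS = {"a", "an", "and", "of", "over", "the", "to"}

def kwic_index(title):
    """Return every cyclic shift of title that starts with a non-stop word, sorted."""
    words = title.split()
    shifts = []
    for i, word in enumerate(words):
        if word.lower() in STOP_WORDS:
            continue
        # Rotate so that the keyword at position i leads the line.
        shifts.append(" ".join(words[i:] + words[:i]))
    return sorted(shifts, key=str.lower)

for line in kwic_index("The quick brown fox"):
    print(line)
```

Note that this produces one index line per keyword, so it does nothing to reduce the number of entries stored; it only changes what each entry looks like.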
I think copying the file contents into tags will solve this, at least temporarily. Any idea how to do that automatically when a file is uploaded?
i think you might have missed some points i made ;)
"copying contents" when a file is uploaded is not the whole story. "copying file contents.." might take some hundred lines of code.
it needs to happen whenever content entities (those whose "words" are to be searchable) are created, updated or deleted, which leads to creating / updating / deleting tags ;-X
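One way to picture the create / update / delete bookkeeping is as a set difference between the tags already stored and the keywords freshly extracted. A minimal sketch in Python, with `plan_tag_changes` as a made-up helper name; how the entity events are actually hooked and how tags are written in Elgg is deliberately left out.

```python
def plan_tag_changes(old_tags, new_keywords):
    """Return (tags_to_add, tags_to_remove) so the stored tags match new_keywords."""
    old, new = set(old_tags), set(new_keywords)
    return sorted(new - old), sorted(old - new)

# Entity created: nothing stored yet, so every keyword is added.
print(plan_tag_changes([], ["fox", "dog"]))
# Entity updated: only the differences are written.
print(plan_tag_changes(["fox", "dog"], ["fox", "cat"]))
# Entity deleted: no keywords remain, so every stored tag is removed.
print(plan_tag_changes(["fox"], []))
```

On an update this touches only the tags that changed, instead of deleting and rewriting the whole set.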
the "brown fox" analogy? does one really want *all* the content's words indexed? lots of server cpu load ;-) how to be selective in fetching keywords? (i believe that this is) *not* an easy programming task.. one could take Brett's BigBrother plugin (now quite dated), upgrade it, and extend it to capture content words for indexing as tags.
if a usual Page (plugin) add/update generates 20-30 metadata db hits, indexing content with 1-200 words will do about 2000++ db hits ;-( no joy ;;;-X
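One crude way to be selective, and so keep the number of tag writes bounded, is to keep only the N most frequent non-stop words per document. A sketch under those assumptions; `top_keywords`, the stop-word list, and the default limits are all illustrative, and frequency alone is a blunt measure of relevance.

```python
import re
from collections import Counter

# Hypothetical stop-word list (assumption, not from Elgg).
STOP_WORDS = {"a", "an", "and", "any", "of", "over", "the", "to"}

def top_keywords(text, limit=5, min_length=3):
    """Return up to `limit` of the most frequent non-stop words of at least min_length characters."""
    words = [
        w for w in re.findall(r"[a-z0-9]+", text.lower())
        if len(w) >= min_length and w not in STOP_WORDS
    ]
    # most_common orders by count; ties keep first-seen order.
    return [word for word, _ in Counter(words).most_common(limit)]

print(top_keywords("fox dog fox cat fox dog the the the", limit=2))
```

With a cap like this the tag count per entity is at most `limit`, whatever the length of the document.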