Strategies for tagging

This one is way beyond me when it comes to the programming aspects, but here are some thoughts from the standpoint of a multilingual author and user.

(BTW, I can't write anything more than basic code myself, so my hope is to study the options out there on "the market" and put them on display for the folks who really know what they are doing.)

Tags are used by an author to describe content so that it can be grouped with similar objects and found by interested users.

In the simplest case, an author would create their content in as many languages as they are able and then create a list of multilingual tags (e.g. hello, tere, hola, bonjour). More complex approaches could keep tags in linguistically separated fields to make it simpler to display the right tags to each audience.
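
To make the "separated fields" idea concrete, here is a minimal Python sketch, assuming each piece of content stores one tag list per language code. The class name ContentItem, the tags_by_lang field and the language codes are hypothetical illustrations, not part of Elgg or any existing plugin.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """A piece of content whose tags are kept apart by language (hypothetical model)."""
    title: str
    # Hypothetical layout: language code -> list of tags in that language.
    tags_by_lang: dict[str, list[str]] = field(default_factory=dict)

    def tags_for(self, langs: list[str]) -> list[str]:
        """Return only the tags written in the requested languages, in that order."""
        return [tag for lang in langs for tag in self.tags_by_lang.get(lang, [])]


item = ContentItem(
    "Greetings",
    {"en": ["hello"], "et": ["tere"], "es": ["hola"], "fr": ["bonjour"]},
)
print(item.tags_for(["et", "en"]))  # ['tere', 'hello']
```

A flat tag list would mix all four greetings together; with one field per language, a reader who only knows Estonian and English can be shown just their own tags.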

The obvious result of duplicating tags in multiple languages is an enormous tag cloud. On a site facilitating conversation between two languages, the tag cloud could potentially double in size, and the more languages we add to the mix, the bigger the cloud gets. As long as our searches are not visually based, this won't be much of a problem. But according to our definition of tags, searching is only one of their functions. The other is to visually categorize content, and this is where the problems come in.

  1. An enormous tag cloud representing 3 languages actually represents at most 1/3 of the cloud's size. I say "at most" because not every author will offer their content in multiple languages.
  2. As I look through this immense tag cloud, I have to visually sort out my language(s) from those I don't recognize.
  3. Furthermore, if the site is able to provide translations of some or all content but some objects are tagged only in one language, I might miss out on a whole swath of useful information from other language sources.
  4. Tag clouds organized by popularity often filter out tags at the lower end of the popularity scale. When tag "combinations" (the same tag meaning expressed in multiple languages) are not associated with one another, they may appear in different sizes, and some of them may be filtered out. For example, I want to find all objects associated with tags meaning "hello". Possible tags include "hello", "bonjour", "tere" and "hola". Of these, there are 3 "hello"s, 1 "tere" and 25 "hola"s. As a non-Spanish-speaking Estonian, I do not recognize the enormous number of objects available in Spanish, I see only a small number of English-language objects, and I cannot see that there is actually an Estonian object which has been filtered out of the cloud. Of the 29 objects in the system that would be available to me in translation, I only have access to 3.
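
The filtering problem in point 4 can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: the cutoff MIN_COUNT is a hypothetical value, each tag count is treated as a count of distinct objects, and "bonjour" is left out because the example gives no count for it.

```python
from collections import Counter

# Tag counts from the "hello" example; "bonjour" is omitted (no count given).
tag_counts = Counter({"hello": 3, "tere": 1, "hola": 25})
tag_lang = {"hello": "en", "tere": "et", "hola": "es"}

MIN_COUNT = 2  # hypothetical popularity cutoff for the cloud


def visible_tags(counts: Counter, min_count: int) -> dict[str, int]:
    """Tags that survive a simple per-tag popularity filter."""
    return {tag: n for tag, n in counts.items() if n >= min_count}


def reachable_objects(counts: Counter, min_count: int, reader_langs: set[str]) -> int:
    """Objects a reader can reach through tags they can both see and read.

    Assumes each tagged object carries exactly one of these tags.
    """
    shown = visible_tags(counts, min_count)
    return sum(n for tag, n in shown.items() if tag_lang[tag] in reader_langs)


reader = {"et", "en"}  # the non-Spanish-speaking Estonian from the example
print(visible_tags(tag_counts, MIN_COUNT))               # {'hello': 3, 'hola': 25}
print(reachable_objects(tag_counts, MIN_COUNT, reader))  # 3
print(sum(tag_counts.values()))                          # 29
```

Because the cutoff is applied to each language's tag separately, "tere" disappears and the reader reaches 3 of the 29 objects.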

Initial recommendations

  1. Keep tags linguistically separated either by field or some kind of markup (though markup might be an obstacle to the majority of users).
  2. Create a multilingual tag dictionary that links tag combinations and reports their combined count in single-language tag searches or clouds. Perhaps it would be best to leave these combinations in the hands of a few multilingual gurus in order to avoid chaos throughout the system.
  3. Give tag clouds an option to display the default language only, the possible languages only (per Elgg settings), all languages, or a separate tag cloud per language.
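
Recommendations 2 and 3 could be sketched roughly as follows, assuming a hand-curated dictionary that maps each concept to one tag per language. TAG_DICTIONARY, concept_counts, cloud and clouds_for are hypothetical names for illustration only; a real Elgg implementation would look different.

```python
from collections import Counter

# Hypothetical dictionary curated by a few multilingual editors:
# concept key -> {language code: tag used in that language}.
TAG_DICTIONARY = {
    "greeting": {"en": "hello", "et": "tere", "es": "hola", "fr": "bonjour"},
}

# Reverse index so any raw tag can be mapped back to its concept.
TAG_TO_CONCEPT = {
    tag: concept
    for concept, translations in TAG_DICTIONARY.items()
    for tag in translations.values()
}


def concept_counts(raw_counts: Counter) -> Counter:
    """Sum raw tag counts into one total per concept; unknown tags keep their own entry."""
    totals = Counter()
    for tag, n in raw_counts.items():
        totals[TAG_TO_CONCEPT.get(tag, tag)] += n
    return totals


def cloud(raw_counts: Counter, lang: str, min_count: int = 2) -> dict[str, int]:
    """Single-language cloud: each concept is labelled in `lang` and sized by its combined count."""
    shown = {}
    for key, n in concept_counts(raw_counts).items():
        if n < min_count:  # the popularity cutoff is applied to the combined total
            continue
        translations = TAG_DICTIONARY.get(key)
        if translations is None:
            label = key  # tag not in the dictionary: show it as written
        else:
            # Prefer the reader's language; otherwise fall back to any known translation.
            label = translations.get(lang, next(iter(translations.values())))
        shown[label] = n
    return shown


def clouds_for(raw_counts: Counter, langs: list[str], min_count: int = 2) -> dict[str, dict[str, int]]:
    """One cloud per language, e.g. for the "separate tag clouds" display option."""
    return {lang: cloud(raw_counts, lang, min_count) for lang in langs}


raw = Counter({"hello": 3, "tere": 1, "hola": 25})
print(cloud(raw, "et"))               # {'tere': 29}
print(clouds_for(raw, ["en", "es"]))  # {'en': {'hello': 29}, 'es': {'hola': 29}}
```

Because counts are summed per concept before the cutoff is applied, the Estonian reader sees "tere" sized for all 29 objects instead of losing it to the filter. One limitation of this sketch: tags missing from the dictionary appear under their original spelling in every language's cloud.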

Any other suggestions?