Emojis from phones break posts

Hi,

I was wondering whether anyone else has experienced this: when people post from their phone, the post is broken by emojis/smileys. It looks like the content is saved only up to the first emoji, and the rest is discarded.

Funnily enough, I often see email notifications containing the full post.

I have found that most emojis are 4-byte Unicode characters (in UTF-8). Can we strip them in HTMLawed, or maybe convert them to smileys?
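
For anyone who wants to check their own content, here is a minimal sketch of how such characters could be detected. It relies on code points above U+FFFF taking 4 bytes in UTF-8, which MySQL's 3-byte `utf8` charset cannot store. That this is what cuts off the posts is my assumption, and the function name `has_4byte_chars` is made up for illustration.

```php
<?php
// Hypothetical helper: returns true if $text contains any code point above
// U+FFFF, i.e. a character that needs 4 bytes in UTF-8 (most emoji do).
// Returns false for plain text and for input that is not valid UTF-8.
function has_4byte_chars($text) {
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

// "\xF0\x9F\x98\x80" is U+1F600 (grinning face) written as raw UTF-8 bytes.
var_dump(strlen("\xF0\x9F\x98\x80"));               // int(4)
var_dump(has_4byte_chars("Hi \xF0\x9F\x98\x80"));   // bool(true)
var_dump(has_4byte_chars("Hi :-)"));                // bool(false)
```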

As more and more people use their phones/tablets, the impact of this problem is getting worse.

Ideas are most welcome :-)

  • I had the same problem when importing posts from Facebook. I removed the emojis.

  • @Sepp - did you do that yourself or did you use some code to filter them out?

    It would be more fun to convert them to text smileys (like :-), :-( etc) and not just remove them.

    With the right filter I would trigger it with a plugin hook and filter/convert all input.
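
    Something along these lines is what I have in mind, as a rough sketch only. It assumes the 'validate', 'input' plugin hook (the one HTMLawed registers on); the function names and the tiny emoji map are made up for illustration, and in a real plugin the registration would normally go in the init handler.

    ```php
    <?php
    // Illustrative mapping only; keys are raw UTF-8 byte sequences.
    function emoji_to_text_smileys($text) {
        $map = array(
            "\xF0\x9F\x98\x8A" => ':-)',  // U+1F60A smiling face with smiling eyes
            "\xF0\x9F\x98\x9E" => ':-(',  // U+1F61E disappointed face
            "\xF0\x9F\x98\x89" => ';-)',  // U+1F609 winking face
        );
        $text = strtr($text, $map);
        // Drop any remaining 4-byte characters. preg_replace() returns null
        // when the input is not valid UTF-8; in that case keep the text as is.
        $clean = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text);
        return $clean === null ? $text : $clean;
    }

    // Hypothetical handler for the 'validate', 'input' plugin hook.
    // Input values can be strings or (nested) arrays of strings.
    function emoji_filter_input($hook, $type, $value, $params) {
        if (is_array($value)) {
            foreach ($value as $key => $item) {
                $value[$key] = emoji_filter_input($hook, $type, $item, $params);
            }
            return $value;
        }
        return is_string($value) ? emoji_to_text_smileys($value) : $value;
    }

    // Only register when running inside Elgg, so the sketch also runs standalone.
    if (function_exists('elgg_register_plugin_hook_handler')) {
        elgg_register_plugin_hook_handler('validate', 'input', 'emoji_filter_input');
    }
    ```

    Emojis without a text equivalent are simply removed here, so nothing 4-byte reaches the database.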

  • I just tried it and it really does not show up in the browser.

  • I used this function:

    // Strips common emoji and pictographic symbols; $text must be valid UTF-8.
    function remove_emoji($text){
      // No "|" inside the character classes: there it would match (and strip) a literal pipe.
      return preg_replace('/([0-9#][\x{20E3}])|[\x{00ae}\x{00a9}\x{203C}\x{2047}\x{2048}\x{2049}\x{3030}\x{303D}\x{2139}\x{2122}\x{3297}\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u', '', $text);
    }

  • I have searched for solutions and found a few options:

    • Strip them out altogether (as Sepp suggests)
    • Convert them to real smileys (there are several projects doing that on GitHub)
    • Convert the database and tables from utf8 to utf8mb4. That way no changes have to be made to Elgg, unless Elgg doesn't support utf8mb4.

  • For utf8mb4 we need someone to rework #7128 for 2.x and push it through to the finish line. And this only handles new installs, not migrating existing DBs.
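
    For an existing DB, the conversion itself would come down to a few ALTER statements. The sketch below only builds the SQL and does not run it. The function name is made up, and the database and table names in the usage are examples, so adjust them to your own prefix. Untested against Elgg: indexed VARCHAR(255) columns can hit the index key length limit on older InnoDB setups, and the DB connection would also need to use utf8mb4.

    ```php
    <?php
    // Hypothetical helper: builds (but does not execute) the MySQL statements
    // that switch a database and the given tables to utf8mb4.
    function utf8mb4_conversion_sql($database, array $tables) {
        // Quote an identifier with backticks, doubling any backtick inside it.
        $quote = function ($name) {
            return '`' . str_replace('`', '``', $name) . '`';
        };
        $sql = array(
            'ALTER DATABASE ' . $quote($database)
                . ' CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;',
        );
        foreach ($tables as $table) {
            $sql[] = 'ALTER TABLE ' . $quote($table)
                . ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;';
        }
        return $sql;
    }

    // Example names only; back up the database before running anything like this.
    foreach (utf8mb4_conversion_sql('elgg', array('elgg_entities', 'elgg_metastrings')) as $statement) {
        echo $statement, "\n";
    }
    ```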

  • I went with option 2: converting them to real smileys using the Emoji One project.

    I built a plugin that validates input (like HTMLawed) and replaces Unicode characters with emoji shortnames.

    It also rewrites output/longtext to convert emoji shortnames to images, but I might go for a JavaScript solution for this instead.

    I'll release this plugin when I feel it's ready for production.
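
    To give an idea of the two steps, here is a simplified sketch. It is not the plugin's actual code: the function names, the three-entry table and the image path are placeholders for illustration, and naming image files after the hex code point is an assumption.

    ```php
    <?php
    // Illustrative subset only: shortname => array(UTF-8 bytes, hex code point).
    function emoji_table() {
        return array(
            ':smile:'    => array("\xF0\x9F\x98\x84", '1f604'),
            ':thumbsup:' => array("\xF0\x9F\x91\x8D", '1f44d'),
            ':heart:'    => array("\xE2\x9D\xA4", '2764'),
        );
    }

    // Input side: replace known emoji characters with their shortnames.
    function emoji_to_shortnames($text) {
        $map = array();
        foreach (emoji_table() as $shortname => $info) {
            $map[$info[0]] = $shortname;
        }
        return strtr($text, $map);
    }

    // Output side: replace known shortnames with <img> tags.
    // Unknown ":something:" sequences are left untouched.
    function shortnames_to_images($text, $base_url) {
        $table = emoji_table();
        return preg_replace_callback('/:[a-z0-9_+\-]+:/', function ($m) use ($table, $base_url) {
            if (!isset($table[$m[0]])) {
                return $m[0];
            }
            $src = htmlspecialchars($base_url . $table[$m[0]][1] . '.png', ENT_QUOTES, 'UTF-8');
            $alt = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
            return '<img class="emoji" alt="' . $alt . '" src="' . $src . '">';
        }, $text);
    }

    $stored = emoji_to_shortnames("Great post \xF0\x9F\x91\x8D");
    echo $stored, "\n";                                   // Great post :thumbsup:
    echo shortnames_to_images($stored, '/img/emoji/'), "\n";
    ```

    Storing shortnames keeps the database content plain 3-byte-safe text, and the output step can later be swapped for a JavaScript one without touching stored posts.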