Hebrew chars turn to gibberish with Elgg

Hi all.

In my project I'm giving users the option to write messages and posts in Hebrew.
The whole site remains in English, so I don't want to change the site's language, just show those Hebrew letters on the river and anywhere else.
Hebrew is written right to left, so messages in Hebrew are supposed to be aligned to the right (this can be done by giving the Hebrew text these CSS rules: { direction: rtl; text-align: right; }).
My problem is how to decide which messages should be right-aligned and which should not (English messages are also supported, of course). My starting point is that if the message's first letter is Hebrew, I'll align it to the right.

Now, I know it's bad practice, but just for testing I tried to meddle with Elgg's core 'ElggAutoP.php' file (vendor\elgg\elgg\engine\classes\elggAutoP.php), around line 190, where it generates the <p> elements that wrap the text from the db.

Again, just for tests, I declared $firstLetter as $html[7] (this is the first letter of the input) and ran var_dump on it. For English letters I get the letter with no problem, but for Hebrew letters I get �.
I checked the ASCII code of this gibberish char and it's 215 (Hebrew chars are between 128 and 154 with Hebrew encoding - https://www.ascii-codes.com/cp862.html). I also tried creating an array with all the Hebrew chars and then checking whether $firstLetter appears in this array: it returns false.

Here is the code that I've inserted in core's elggAutoP.php, around line 190 (two methods: the ASCII code comparison, and the array of chars, which is commented out):

// strip AUTOPs that should be removed
$html = preg_replace('@<autop r="1">(.*?)</autop>@', '\\1', $html);

// commit to converting AUTOPs to Ps

// checking if first letter is Hebrew - TODO: improve this test
$firstLetterAscii = ord($html[7]);   // for Hebrew this var is always 215

// this works because the 215 condition is met, but it seems fragile and I wouldn't count on it
$dirClass = (
    ($firstLetterAscii >= 128 && $firstLetterAscii <= 154) ||
    $firstLetterAscii == 215) ? 'RTL' : 'LTR';

// $firstLetter = $html[7];
// var_dump($firstLetter);
// $hebLetters = array('א','ב','ג','ד','ה','ו','ז','ח','ט','י','כ','ל','מ','נ','ס','ע','פ','צ','ק','ר','ש','ת','ך','ם','ן','ף','ץ','�');
// $dirClass = (in_array($firstLetter, $hebLetters)) ? 'RTL' : 'LTR';

$html = str_replace('<autop>', "\n<p class='".$dirClass."'>", $html);


Anyway, I want to make it clear that Hebrew does appear on screen correctly; the problem is that I can't manipulate it in the PHP file.

I believe the problem starts from the fact that the encoding is not set, but I might be wrong.

I would love to get some help with how to set the encoding in Elgg, or anything else that you think might solve this problem. Thanks

 
  • I don't really have any experience with Hebrew (or, more generally, UTF-8 encoding). I think the problem here is that you can't work with ASCII codes at all, as Hebrew characters are not included in that standard. It's the UTF-8 encoding you need to work with. What makes it much more complicated is that characters can have different lengths in UTF-8 (up to 4 bytes), whereas PHP string indexing works on single bytes. The Hebrew characters seem to be 2 bytes long, so $html[7] would return only the first of the two bytes, and PHP is likely to fail when you try to compare a single character taken out of a string that can contain multi-byte characters.

    I don't know whether PHP 7 fully supports multi-byte characters / UTF-8 out of the box. If it does, it might be possible to work with strings regardless of encoding. If not (or maybe even then), you might have to use the mbstring PHP extension. Unfortunately, I can't give you any advice here, as I've never worked with the functions of this extension. Searching for how to use the mbstring functions might help.

    Another option could be to use the Extended Tinymce plugin (https://elgg.org/plugins/782028) and modify the default editor config it uses. TinyMCE has a "directionality" plugin (https://www.tinymce.com/docs/plugins/directionality/) that adds toolbar options to switch between ltr and rtl input. It would be necessary to add this plugin to the list of plugins loaded in the config of the Extended Tinymce plugin and to add the corresponding toolbar options. I have to say, though, that I've never tried this plugin, so I don't know exactly how it works. CKEditor (which already comes bundled with Elgg) might have similar functionality, but I have no experience with that editor.
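
To make the multi-byte point above concrete, here is a minimal PHP sketch, assuming both the stored text and this source file are UTF-8 encoded. direction_class() is a hypothetical helper name, not an Elgg function. It does not look at byte values; it relies on PCRE's "u" modifier and the \p{Hebrew} script property. The mb_* calls are only there to contrast byte counts with character counts, and they need the mbstring extension.

```php
<?php
// Sketch only: direction_class() is a hypothetical helper, not part of Elgg.
// Assumption: the input and this source file are UTF-8 encoded.

/** Return 'RTL' if the first letter of the fragment is Hebrew, else 'LTR'. */
function direction_class(string $html): string {
    // Drop markup such as <autop> so only the visible text is inspected.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // \p{L} matches any Unicode letter; the "u" modifier makes PCRE read
    // the subject as UTF-8 code points instead of single bytes.
    if (preg_match('/\p{L}/u', $text, $m) !== 1) {
        return 'LTR'; // no letter found (empty, digits only, invalid UTF-8)
    }
    return preg_match('/^\p{Hebrew}$/u', $m[0]) === 1 ? 'RTL' : 'LTR';
}

$word = 'שלום';
echo ord($word[0]), "\n";  // 215: only the first byte (0xD7) of the letter
echo strlen($word), "\n";  // 8: bytes, not letters
if (function_exists('mb_substr')) {
    // mbstring counts and slices by character rather than by byte.
    echo mb_strlen($word, 'UTF-8'), "\n";        // 4
    echo mb_substr($word, 0, 1, 'UTF-8'), "\n";  // ש
}
echo direction_class('<autop>שלום עולם</autop>'), "\n"; // RTL
echo direction_class('<autop>Hello</autop>'), "\n";     // LTR
```

With something like this, the ord() test in the question's snippet could be replaced by $dirClass = direction_class($html);. Note that a message starting with an English word followed by Hebrew would still come out as LTR, since only the first letter is checked.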

Beginning Developers

This space is for newcomers, who wish to build a new plugin or to customize an existing one to their liking