Upgrading to 1.7 if your database is already utf8

This is very important for those who modified Elgg to use utf8 encoding for the connection to the database!

If you added either of these two commands so that your strings would be stored as utf8 strings:

  1. mysql_set_charset('utf8',...)
  2. mysql_query("SET NAMES 'utf8'");

the upgrade script included with Elgg 1.7 will corrupt your database. Versions of Elgg before 1.7 used the default character encoding that PHP sets for a MySQL connection (latin1). The upgrade script converts all strings in the database from latin1 to utf8. If your strings are already utf8, that conversion would be applied to data that does not need it, so you must skip it by removing this file before running the upgrade: engine/schema/upgrades/2009100701.sql
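
To see why a latin1-to-utf8 conversion is harmful when the data is already utf8, here is a minimal sketch in Python. It is not the actual contents of 2009100701.sql, and the helper name latin1_to_utf8 is made up for illustration; Python's latin-1 codec stands in for MySQL's latin1. It only shows the general effect of treating UTF-8 bytes as if they were latin1 and encoding them again.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: read bytes as latin1 text, then re-encode as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


original = "café"

# What a latin1 setup would hold: one byte for the accented letter.
legacy_bytes = original.encode("latin-1")
# What a utf8 setup would hold: two bytes for the accented letter.
already_utf8 = original.encode("utf-8")

# Converting genuine latin1 data gives correct UTF-8.
converted_ok = latin1_to_utf8(legacy_bytes)
# Converting data that is already UTF-8 encodes it a second time.
converted_bad = latin1_to_utf8(already_utf8)

print(converted_ok.decode("utf-8"))   # café
print(converted_bad.decode("utf-8"))  # cafÃ©
```

Plain ASCII text comes through unchanged in both cases, which is why the damage tends to show up only in accented or non-Latin characters.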

  • This is good information, but wouldn't it be better for the Elgg upgrader/installer to test for this before running the conversion, and to skip the conversion when it should not be applied?

  • I don't believe there is a simple way to test for this.

  • The upgrader expects the core database to be in an unmodified state because there is no way to predict what modifications users might make to the core.  If a user is changing the database structure, he risks not being able to upgrade.  Users making such modifications need to plan ahead to test their modifications against newer versions.

  • Understood, but apparently this is an expected issue, since Cash is even addressing it.  So is it really that much of a modification of the DB, if Cash already knows that this is a problem and is providing information about it?  It *appears* that maybe there is more to this?

  • Thanks Cash,

    Your assistance is always very welcome.

    With Love,
    Uddhava dāsa

  • @Yakiv - There were 3 reports of this. Cash attempted to duplicate it and created a ticket about it.  With more testing, we established that this is NOT a problem on unmodified versions.  By chance I remembered a thread in which people were suggesting these modifications, so I had a pretty good idea of what was going on by that point and wrote instructions on how to avoid corruption.  Cash has made the suggestions and warnings from that ticket more public; the problem itself was not expected.

    Elgg upgrades must expect the database to be in the state in which Elgg created it.  Because of the difficulty (and sometimes impossibility) of predicting and testing for changes to the database and code during an upgrade, the user who changed the code must be expected to understand the risks and be literate enough to handle any problems from changing core.

    In this example, I'm not sure if it's even possible to test which character set strings are written in.  It's not a structural change to the database; it's a change in how PHP talks to it.

  • That said, I suppose before an upgrade we could take an MD5 of all files and compare it to known values.  If the check fails, we could notify the user and optionally allow them to continue.  I don't think this would necessarily be useful, however, as users who have modified their files generally KNOW they have modified them...

  • Thanks for the further explanation. Very informative!  :-)

  • Brett shows "John Bull style" in action. That is bad manners in troubleshooting, I have to mention... and I have no complaints about Cash.
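
The MD5 idea raised earlier in the thread could look roughly like the sketch below. This is only an illustration in Python, not something Elgg ships: the function names, the file names start.php and settings.php, and the idea of a manifest of known hashes are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_modified_files(root: Path, known_hashes: dict[str, str]) -> list[str]:
    """List files under root that are missing or differ from the known hashes."""
    modified = []
    for relative_path, expected in sorted(known_hashes.items()):
        candidate = root / relative_path
        if not candidate.is_file() or md5_of_file(candidate) != expected:
            modified.append(relative_path)
    return modified


# Demo with throwaway files standing in for a core install (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("start.php", "settings.php"):
        (root / name).write_bytes(b"<?php // pristine\n")
    # Record the hashes of the pristine files as the "known values".
    known = {name: md5_of_file(root / name) for name in ("start.php", "settings.php")}
    # Simulate a local edit to one core file.
    (root / "settings.php").write_bytes(b"<?php // locally edited\n")
    modified = find_modified_files(root, known)

print(modified)  # ['settings.php']
```

A check like this would flag an edited core file, such as one with an added mysql_set_charset call, but it says nothing about which character set the stored strings are actually in.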