Previous post: Paléo Festival Nyon

Next post: Easier TopicExchange Trackbacks for WordPress

Converting MySQL Database Contents to UTF-8

by Stephanie on 07.18.2004

in Real Live Code, Weblog Technology

[fr]

Voici comment j'ai converti le contenu de ma base de données WordPress en UTF-8. C'est assez simple en soi, mais ça m'a pris longtemps pour comprendre comment le faire!

[en]

A few weeks ago, I discovered (to my horror) that my site was not UTF-8, as I thought it was. Checking my pages in the validator produced this error:

The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the XML declaration (utf-8). I will use the value from the HTTP header (iso-8859-1).

In all likeliness, my server adds a default header (iso-8859-1) to the pages it serves. When I switched to WordPress, I was careful to save all my import files as UTF-8, and I honestly thought that everything I had imported into the database was UTF-8. Somewhere in the process, it got switched back to iso-8859-1 (latin-1).

The solution to make sure the pages served are indeed UTF-8, as specified in the meta tags of my HTML pages, is to add the following line to .htaccess:

AddDefaultCharset OFF

(If one wanted to force UTF-8, AddDefaultCharset UTF-8 would do it, but actually, it’s better to leave the possibility to serve pages with different encodings, isn’t it?)

Now, when I did that, of course, all the accented characters in my site went beserk — proof if it was needed that my database content was not UTF-8. Here is the story of what I went through (and it took many days to find the solution, believe me, although it takes only 2 minutes to do once everything is ready) to convert my database content from ISO-8859-1 to UTF-8. Thanks a lot to all those who helped me through this — and they are many!

First thing, dump the database. I always forget the command for dumps, so here it is:

mysqldump --opt -u root -p wordpress > wordpress.sql

As we’re going to be doing stuff, it might be wise to make a copy of the working wordpress database. I did that by creating a new database in PhpMyAdmin, and importing my freshly dumped database into it:

mysql -u root -p wordpress-backup < wordpress.sql

Then, conversion. I tried a PHP script, I tried BBEdit, and they seemed to mess up. (Though as I had other issues elsewhere, they may well have worked but I mistakenly thought the problem was coming from there.) Anyway, command-line conversion with iconv is much easier to do:

iconv -f iso-8859-15 -t utf8 wordpress.sql > wordpress-iconv.sql

Then, import into the database. I first imported it into another database, edited wp-config.php to point to the new database, and checked that everything was ok:

mysql -u root -p wordpress-utf8 < wordpress-iconv.sql

Once I was happy that it was working, I imported my converted dump into the WordPress production database:

mysql -u root -p wordpress < wordpress-iconv.sql

On the way there, I had some trouble with MySQL. The MySQL dump more or less put the content of all my weblog posts on one line. For some reason, it didn’t cause any problems when importing the dump before conversion, to create the backup database, but it didn’t play nice after conversion.

I got this error when trying to import:

ERROR 1153 at line 378: Got a packet bigger than 'max_allowed_packet'

Line 378 contained half my weblog posts… and was obviously bigger than the 1Mb limit for max_allowed_packet (the whole dump is around 2Mb).

I had to edit my.cnf (/etc/mysql/my.cnf on my system) and change the value for max_allowed_packet in the section titled [mysqld]. I set it to 8Mb. Then, I had to stop mysql and restart it: mysqladmin -u root -p shutdown to stop it, and mysqld_safe & to start it again (as root).

This is not necessarily the best way to do it, and it might not work like that on your system, but it’s what I did and the site is now back up again. Comments welcome, and hope this can be useful to others!

{ 13 trackbacks }

Climb to the Stars (Stephanie Booth)
02.19.2005 at 19:09
星☆空
02.21.2005 at 22:19
Blog’n » Blog Archive » Blog oppdatering
03.06.2005 at 8:35
Le monde selon waffi » Techno bla-bla
11.10.2005 at 7:49
Bis einer weint… » Blog Archive » Konvertieren auf Zeichencodierung UTF-8
02.17.2006 at 0:31
Slawek’s Hideout » Wordpress ISO-8859-2 > UTF-8
03.18.2006 at 15:49
Stefan Macke » Blog Archive » Umstieg von b2evolution auf Wordpress
04.30.2006 at 23:57
Weblogger.ch » Passage à UTF-8
09.07.2006 at 22:16
Actualización de WordPress y cambio de encoding - Trabber Blog
11.03.2006 at 11:47
geheimwerk.de » Blog Archiv » Encoding-Wirrwarr
12.09.2006 at 14:56
Climb to the Stars (Stephanie Booth)
02.03.2007 at 23:53
Climb to the Stars (Stephanie Booth) » Going Solo Venues, Open Stage, and Link Love
01.22.2008 at 13:50
MySQL + UTF-8 = Not So Obvious « Ward’s Words
03.28.2009 at 8:56

{ 74 comments… read them below or add one }

1 fatalfuj 08.02.2004 at 18:27

Your post covered exactly what I set out to do. Thank you!

2 fatalfuj 08.02.2004 at 19:27

Your post covered exactly what I set out to do. Thank you!

3 scott reynen 09.05.2004 at 15:43

FYI, you only got the max_allowed_packet error with the UTF-8 version because UTF-8 takes more space (more bytes) than ISO-8859-15.

4 scott reynen 09.05.2004 at 16:43

FYI, you only got the max_allowed_packet error with the UTF-8 version because UTF-8 takes more space (more bytes) than ISO-8859-15.

5 doug 09.16.2004 at 0:11

i don't have shell access but would love to know how to do this using phpmyadmin. any tips?

i've checked phpmyadmin and iconv docs — and the answer may be there — but alas i'm quite the novice and unsure how to word the query within phpmyadmin.

i am migrating a home-built coldfusion blog to wordpress and it's full of special characters.

6 doug 09.16.2004 at 1:11

i don’t have shell access but would love to know how to do this using phpmyadmin. any tips?

i’ve checked phpmyadmin and iconv docs — and the answer may be there — but alas i’m quite the novice and unsure how to word the query within phpmyadmin.

i am migrating a home-built coldfusion blog to wordpress and it’s full of special characters.

7 Moshu 11.10.2004 at 4:34

It is a very useful post, but as the previous commenter said: those people who have only phpmyadmin access to their database (i.e. the majority of the average WP users!) will just stare at it. Is there any chance to “convert” this instruction for them?
Thanks!

8 Moshu 11.10.2004 at 6:34

It is a very useful post, but as the previous commenter said: those people who have only phpmyadmin access to their database (i.e. the majority of the average WP users!) will just stare at it. Is there any chance to “convert” this instruction for them?
Thanks!

9 Climb to the Stars (Stephanie 02.19.2005 at 18:09

Problèmes d'encodage MySQL

Un joli mélange de latin-1 et d'utf-8 dans ma base de données. Un script python pour nettoyer tout ça….

10 星☆空 02.21.2005 at 21:19

Linux 下将 MySQL 编�转�至 UTF-8 的实例

Ping Back�自:blog.csdn.net

11 Uffe 03.01.2005 at 12:43

I like the part about AddDefaultCharset OFF - didn't think about that myself. For those without shell access and without access to iconv, I suggest to look at the method I used and has described in the wp support forum, in a thread namedchange from ISO-8859-15 to UTF-8

12 Uffe 03.01.2005 at 13:43

I like the part about AddDefaultCharset OFF - didn’t think about that myself. For those without shell access and without access to iconv, I suggest to look at the method I used and has described in the wp support forum, in a thread namedchange from ISO-8859-15 to UTF-8

13 keli 03.08.2005 at 20:23

a few notes to those without shell access (and even on a windows machine on top of that :) )

phpMyAdmin can dyump mysql data just fine (although it calls it “exporting”). Specify a file name and check the “save to file” option, and make sure you select all your tables.

As for backup, it's unnecessary to import it back. just make a copy of the dump if you really want to. This description does contain a lot of redundant saves -beyond your normal safety needs-.

When you're finished with conversion –see Uffe's post above– just use phpMyAdmin's “import file” feature to import it back.

14 keli 03.08.2005 at 21:23

a few notes to those without shell access (and even on a windows machine on top of that :) )

phpMyAdmin can dyump mysql data just fine (although it calls it “exporting”). Specify a file name and check the “save to file” option, and make sure you select all your tables.

As for backup, it’s unnecessary to import it back. just make a copy of the dump if you really want to. This description does contain a lot of redundant saves -beyond your normal safety needs-.

When you’re finished with conversion –see Uffe’s post above– just use phpMyAdmin’s “import file” feature to import it back.

15 Diego 04.05.2005 at 0:38

Very good! Very helpful!
Work finelly fixing the problem to a Portuguese (Brazilian) site.
Thank you!

16 Diego 04.05.2005 at 1:38

Very good! Very helpful!
Work finelly fixing the problem to a Portuguese (Brazilian) site.
Thank you!

17 streaky 04.30.2005 at 15:10

FYI for windows users - open the doc in wordpad and save with utf-8 encoding - should cure what ails you :)

18 streaky 04.30.2005 at 16:10

FYI for windows users - open the doc in wordpad and save with utf-8 encoding - should cure what ails you :)

19 Cloud 08.10.2005 at 15:44

Replace utf8 with utf-8 on some systems…

20 Cloud 08.10.2005 at 16:44

Replace utf8 with utf-8 on some systems…

21 marc 10.28.2005 at 18:16

Eu le même problème et ayant suivi vos pas ça a marché niquel: zéro problèmes! Je vous remercie énormement!

22 marc 10.28.2005 at 19:16

Eu le même problème et ayant suivi vos pas ça a marché niquel: zéro problèmes! Je vous remercie énormement!

23 Alonzo 01.06.2006 at 22:23

Thanks, this how-to was working great for me.

24 Alonzo 01.06.2006 at 23:23

Thanks, this how-to was working great for me.

25 Paolo 01.23.2006 at 18:48

I've find this article partially usefull for me but I've imported from a custom CMS and I've found better doing “iconv -f windows-1252 -t utf8 file.sql > file_utf8.sql” 'cause sometimes ppl do copy/paste from Word into the textarea.

26 Paolo 01.23.2006 at 19:48

I’ve find this article partially usefull for me but I’ve imported from a custom CMS and I’ve found better doing “iconv -f windows-1252 -t utf8 file.sql > file_utf8.sql” ’cause sometimes ppl do copy/paste from Word into the textarea.

27 Marcus 01.26.2006 at 5:45

Wow this is great! I was just pulling my hair wondering on how to convert the database :)

Thanx for sharing the knowledge

28 Marcus 01.26.2006 at 6:45

Wow this is great! I was just pulling my hair wondering on how to convert the database :)

Thanx for sharing the knowledge

29 Alexander Kjerulf 02.07.2006 at 21:32

Thanks for that solution - worked like a charm, when I tried to import a greymatter blog written in danish.

:o) A

30 Alexander Kjerulf 02.07.2006 at 22:32

Thanks for that solution - worked like a charm, when I tried to import a greymatter blog written in danish.

:o) A

31 sogas 03.16.2006 at 8:46

It worked perfect for me. Nice tip!

32 sogas 03.16.2006 at 9:46

It worked perfect for me. Nice tip!

33 xuan 06.02.2006 at 6:55

Hi,

I have same problem, can you help me convert ?
thank !

34 xuan 06.02.2006 at 6:55

Hi,

I have same problem, can you help me convert ?
thank !

35 Max 06.23.2006 at 12:17

I think there’s an easier solution than restarting mysql (although I haven’t tested this, because I’ve never had a database big enough). Simply tell mysqldump to limit the size of the packets it creates by adding the following commandline argument:

–set-variable=max_allowed_packet=1M

36 Max 06.23.2006 at 12:17

I think there's an easier solution than restarting mysql (although I haven't tested this, because I've never had a database big enough). Simply tell mysqldump to limit the size of the packets it creates by adding the following commandline argument:

–set-variable=max_allowed_packet=1M

37 Max 06.23.2006 at 12:19

That should have been two dashes, not an em dash. Let me try again:

--set-variable=max_allowed_packet=1M

38 Max 06.23.2006 at 12:19

That should have been two dashes, not an em dash. Let me try again:

--set-variable=max_allowed_packet=1M

39 gorkem 08.14.2006 at 12:28

For those using libiconv-1.9.2-1 ,new iconv file, “utf8″ does not work use “utf-8″ instead as in “iconv -f iso-8859-9 -t utf-8 wordp…..” perfectly converted iso türkish database to utf-8. Also be sure to convert your site language files (in joomla/mambo case /languages/turkish.php for example) to utf-8 by opening in textpad/notepad and saving as utf-8.

40 gorkem 08.14.2006 at 12:28

For those using libiconv-1.9.2-1 ,new iconv file, “utf8″ does not work use “utf-8″ instead as in “iconv -f iso-8859-9 -t utf-8 wordp…..” perfectly converted iso türkish database to utf-8. Also be sure to convert your site language files (in joomla/mambo case /languages/turkish.php for example) to utf-8 by opening in textpad/notepad and saving as utf-8.

41 gorkem 08.14.2006 at 12:30

add to above. using phpmyadmin is fine, but do not compress, zip version of phpmyadmin and winrar/winzip does not match, dumpt to just sql, also no need to restart mysql server (at least in localhost), just refresh your page.

42 gorkem 08.14.2006 at 12:30

add to above. using phpmyadmin is fine, but do not compress, zip version of phpmyadmin and winrar/winzip does not match, dumpt to just sql, also no need to restart mysql server (at least in localhost), just refresh your page.

43 Egil 08.15.2006 at 23:51

Just a quick note - it is wise to specify the character set when exporting/dumping from mysql with mysqldump, using the –default-character-set=latin1 command. If not, and your system is defaulting to utf8 after an upgrade, you get iso-8859-15 data inside a utf8 file, which took me quite some time to find out (sigh).

44 Egil 08.15.2006 at 23:51

Just a quick note - it is wise to specify the character set when exporting/dumping from mysql with mysqldump, using the –default-character-set=latin1 command. If not, and your system is defaulting to utf8 after an upgrade, you get iso-8859-15 data inside a utf8 file, which took me quite some time to find out (sigh).

45 eWebmaster 01.31.2007 at 18:22

OR you can

Executing SQL Statements from a Text File

The mysql client typically is used interactively, like this:

shell> mysql db_name

However, it is also possible to put your SQL statements in a file and then tell mysql to read its input from that file. To do so, create a text file text_file that contains the statements you wish to execute. Then invoke mysql as shown here:

shell> mysql db_name < text_file

If you place a USE db_name statement as the first statement in the file, it is unnecessary to specify the database name on the command line:

shell> mysql < text_file

If you are already running mysql, you can execute an SQL script file using the source or . command:

mysql> source file_name
mysql> . file_name

Sometimes you may want your script to display progress information to the user. For this you can insert statements like this:

SELECT '' AS ' ';

The statement shown outputs .

46 eWebmaster 01.31.2007 at 19:22

OR you can

Executing SQL Statements from a Text File

The mysql client typically is used interactively, like this:

shell> mysql db_name

However, it is also possible to put your SQL statements in a file and then tell mysql to read its input from that file. To do so, create a text file text_file that contains the statements you wish to execute. Then invoke mysql as shown here:

shell> mysql db_name < text_file

If you place a USE db_name statement as the first statement in the file, it is unnecessary to specify the database name on the command line:

shell> mysql < text_file

If you are already running mysql, you can execute an SQL script file using the source or . command:

mysql> source file_name
mysql> . file_name

Sometimes you may want your script to display progress information to the user. For this you can insert statements like this:

SELECT ” AS ‘ ‘;

The statement shown outputs .

47 Climb to the Stars (Stephanie 02.03.2007 at 22:53

Invalid argument supplied for foreach() in wp-capabilities.php: Case Cracked!…

Finally. At last. Endlich. Enfin.

Once more, while trying to transfer a WordPress installation from one server to another, I found myself facing the dreaded problem which locks me out of my WordPress install with a rather cryptic message:

Warning: In…

48 John 03.01.2007 at 22:02

This info just saved my life. I thank you. bows

49 John 03.01.2007 at 23:02

This info just saved my life. I thank you. bows

50 wal-d 04.14.2007 at 11:06

thanks you soo much for you documentation on how to convert the charset, it saved me a lot of time .. grüsse

51 wal-d 04.14.2007 at 11:06

thanks you soo much for you documentation on how to convert the charset, it saved me a lot of time .. grüsse

52 Marek 05.20.2007 at 20:49

Thank you for this post, especially the iconv bit! It worked perfectly!

53 Marek 05.20.2007 at 20:49

Thank you for this post, especially the iconv bit! It worked perfectly!

54 gomba 05.31.2007 at 14:23

I had problem converting a joomla ISO DB to UTF8 using the iconv -f iso-8859-15 -t utf8…
It didn’t convert german special caracters.

Then I used iconv -f CP1252 -t utf8 and it worked perfecly….

55 gomba 05.31.2007 at 14:23

I had problem converting a joomla ISO DB to UTF8 using the iconv -f iso-8859-15 -t utf8…
It didn't convert german special caracters.

Then I used iconv -f CP1252 -t utf8 and it worked perfecly….

56 strawks 07.13.2007 at 14:08

I had to convert a database from iso latin1 to UTF-8, but I’m not an expert in encodings.
What I found out is that MySQL obviously uses the character set you specify when creating a table to store the data and makes conversions depending of the client.
If the DB is stored using UTF-8 and the client specify latin1 (using the SET NAMES ‘latin1′ statement) after connecting, MySQL will convert all data to latin1 before sending the results to the client.

My need was to convert a database stored in latin1 to UTF-8, so I simply used the following commands (a quick hack) and MySQL would do the rest :

mysqldump –opt -u db_user -p db_name | sed “s/CHARSET=latin1/CHARSET=utf8/” > db.dump.sql

mysql -u db_user -p -D db_name < db.dump.sql

As suggested by Egil, specifying the charset to these commands would be wise.

The default charset of my server is latin1 so db.dump.sql is encoded in latin1, sed replaces all default charset for each table from latin1 to UTF-8 and the second mysql will parse the file using the latin1 charset, send the query to the server in latin1 and the server will convert and store all the data in UTF-8.

57 strawks 07.13.2007 at 14:08

I had to convert a database from iso latin1 to UTF-8, but I'm not an expert in encodings.
What I found out is that MySQL obviously uses the character set you specify when creating a table to store the data and makes conversions depending of the client.
If the DB is stored using UTF-8 and the client specify latin1 (using the SET NAMES 'latin1' statement) after connecting, MySQL will convert all data to latin1 before sending the results to the client.

My need was to convert a database stored in latin1 to UTF-8, so I simply used the following commands (a quick hack) and MySQL would do the rest :

mysqldump –opt -u db_user -p db_name | sed “s/CHARSET=latin1/CHARSET=utf8/” > db.dump.sql

mysql -u db_user -p -D db_name < db.dump.sql

As suggested by Egil, specifying the charset to these commands would be wise.

The default charset of my server is latin1 so db.dump.sql is encoded in latin1, sed replaces all default charset for each table from latin1 to UTF-8 and the second mysql will parse the file using the latin1 charset, send the query to the server in latin1 and the server will convert and store all the data in UTF-8.

58 Christophe 01.25.2008 at 7:15

Here is a technique I found that allows you to do the conversion to unicode through phpMyAdmin. I have tried this personally with a WP site/database and it worked for me on a site that has both English and Japanese text. This approach becomes really tedious if you have a lot of database tables though.

Making your Wordpress database portable:
http://jonkenpon.com/2007/02/20/making-your-wor...>

59 Christophe 01.25.2008 at 7:48

I found another link that talks about your method and ads a little more details. People might find this helpful too:

http://gentoo-wiki.com/TIP_Convert_latin1_to_UTF-8_in_MySQL

60 Christophe 01.25.2008 at 8:15

Here is a technique I found that allows you to do the conversion to unicode through phpMyAdmin. I have tried this personally with a WP site/database and it worked for me on a site that has both English and Japanese text. This approach becomes really tedious if you have a lot of database tables though.

Making your Wordpress database portable:
http://jonkenpon.com/2007/02/20/making-your-wordpress-database-portable-because-it-probably-isnt-right-now/

61 Christophe 01.25.2008 at 8:48

I found another link that talks about your method and ads a little more details. People might find this helpful too:

http://gentoo-wiki.com/TIP_Convert_latin1_to_UTF-8_in_MySQL

62 Vladimir Kolesnikov 03.09.2008 at 20:48

AFAIK, this could be done much simpler:

ALTER TABLE table_name CHARSET 'utf8';

This is what official MySQL documentation says: “If you want to change all CHAR/VARCHAR/TEXT columns to a new character set you can do: ALTER TABLE table_name CHARACTER SET character_set_name;”

63 Vladimir Kolesnikov 03.09.2008 at 21:48

AFAIK, this could be done much simpler:

ALTER TABLE table_name CHARSET ‘utf8′;

This is what official MySQL documentation says: “If you want to change all CHAR/VARCHAR/TEXT columns to a new character set you can do: ALTER TABLE table_name CHARACTER SET character_set_name;”

64 Stephanie 03.10.2008 at 11:41

I'm afraid that does not convert the actual content. It won't work in this case. Trust me, I'd have done it, otherwise ;-)

65 Stephanie 03.10.2008 at 12:41

I’m afraid that does not convert the actual content. It won’t work in this case. Trust me, I’d have done it, otherwise ;-)

66 Vladimir Kolesnikov 03.13.2008 at 14:50

Sorry, I have mistaken with the statement: it should be

ALTER TABLE table_name CONVERT TO CHARACTER SET 'utf8';

I have recently faced the same problem and that statement did really help!

You can read about it here: http://dev.mysql.com/doc/refman/5.0/en/alter-ta...>

Believe me, it works :-)

67 Vladimir Kolesnikov 03.13.2008 at 15:50

Sorry, I have mistaken with the statement: it should be

ALTER TABLE table_name CONVERT TO CHARACTER SET ‘utf8′;

I have recently faced the same problem and that statement did really help!

You can read about it here: http://dev.mysql.com/doc/refman/5.0/en/alter-table.html

Believe me, it works :-)

68 Frank 03.20.2008 at 2:55

Damn… I don't get anything.
Seems, it DOES work.

Where to write this? mysqldump –opt -u root -p wordpress > wordpress.sql

No idea where to start. Please help me guys to fix this old Mongolian DB!

69 Frank 03.20.2008 at 3:55

Damn… I don’t get anything.
Seems, it DOES work.

Where to write this? mysqldump –opt -u root -p wordpress > wordpress.sql

No idea where to start. Please help me guys to fix this old Mongolian DB!

70 Vladimir Kolesnikov 03.22.2008 at 18:56

Where to write this?

Frank, you will need to log into the server with your SSH client (provided you have SSH access enabled). And in the command line you type those commands.

Alternatively, you can probably do this with the help of PHP Shell.

Please feel free to contact me if you need help: sjinks (at) sjinks (dot) org (dot) ua

Regards,
Vladimir

71 Vladimir Kolesnikov 03.22.2008 at 19:56

Where to write this?

Frank, you will need to log into the server with your SSH client (provided you have SSH access enabled). And in the command line you type those commands.

Alternatively, you can probably do this with the help of PHP Shell.

Please feel free to contact me if you need help: sjinks (at) sjinks (dot) org (dot) ua

Regards,
Vladimir

72 DrTebi 08.26.2008 at 3:35

Hello there,

unfortunately, it is not enough to add the –max_allowed_packet option to your mysql import statement. While this will increase the allowed packed size the client can send, it will not change the allowed packet size the mysql server. See here:
http://lists.mysql.com/mysql/198070

So, if your dump is very big, you will have to edit my.cnf to increase the packet size there–at least temporarily. Don't forget to restart after editing my.cnf

DrTebi

73 Bimal 12.16.2008 at 5:05

I am not sure what happens if I do the below:
Modidfy the original database tables accpet utf-8 over the running data.
(Note that taking a backup is always a safe way to start).

alter table wordpress_table_1
change latin_column utf_column text character set utf8 collate utf8_bin NOT NULL;

If this works, you do not need to make another database restoration.

74 Oliver 01.24.2009 at 6:00

Great! I was having the same host header error on the validator. I had actually forgotten where the host header came from but your post cleared it up for me.

-Thanks

Leave a Comment

You can use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

« Back to text comment

Additional comments powered by BackType

Clicky Web Analytics