[fr] Voici comment j'ai converti le contenu de ma base de données WordPress en UTF-8. C'est assez simple en soi, mais ça m'a pris longtemps pour comprendre comment le faire!
A few weeks ago, I discovered (to my horror) that my site was not UTF-8, as I thought it was. Checking my pages in the validator produced this error:
The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the XML declaration (utf-8). I will use the value from the HTTP header (iso-8859-1).
In all likeliness, my server adds a default header (iso-8859-1) to the pages it serves. When I switched to WordPress, I was careful to save all my import files as UTF-8, and I honestly thought that everything I had imported into the database was UTF-8. Somewhere in the process, it got switched back to iso-8859-1 (latin-1).
The solution to make sure the pages served are indeed UTF-8, as specified in the meta tags of my HTML pages, is to add the following line to .htaccess
:
AddDefaultCharset OFF
(If one wanted to force UTF-8, AddDefaultCharset UTF-8
would do it, but actually, it’s better to leave the possibility to serve pages with different encodings, isn’t it?)
Now, when I did that, of course, all the accented characters in my site went beserk — proof if it was needed that my database content was not UTF-8. Here is the story of what I went through (and it took many days to find the solution, believe me, although it takes only 2 minutes to do once everything is ready) to convert my database content from ISO-8859-1 to UTF-8. Thanks a lot to all those who helped me through this — and they are many!
First thing, dump the database. I always forget the command for dumps, so here it is:
mysqldump --opt -u root -p wordpress > wordpress.sql
As we’re going to be doing stuff, it might be wise to make a copy of the working wordpress database. I did that by creating a new database in PhpMyAdmin, and importing my freshly dumped database into it:
mysql -u root -p wordpress-backup < wordpress.sql
Then, conversion. I tried a PHP script, I tried BBEdit, and they seemed to mess up. (Though as I had other issues elsewhere, they may well have worked but I mistakenly thought the problem was coming from there.) Anyway, command-line conversion with iconv is much easier to do:
iconv -f iso-8859-15 -t utf8 wordpress.sql > wordpress-iconv.sql
Then, import into the database. I first imported it into another database, edited wp-config.php
to point to the new database, and checked that everything was ok:
mysql -u root -p wordpress-utf8 < wordpress-iconv.sql
Once I was happy that it was working, I imported my converted dump into the WordPress production database:
mysql -u root -p wordpress < wordpress-iconv.sql
On the way there, I had some trouble with MySQL. The MySQL dump more or less put the content of all my weblog posts on one line. For some reason, it didn’t cause any problems when importing the dump before conversion, to create the backup database, but it didn’t play nice after conversion.
I got this error when trying to import:
ERROR 1153 at line 378: Got a packet bigger than 'max_allowed_packet'
Line 378 contained half my weblog posts… and was obviously bigger than the 1Mb limit for max_allowed_packet
(the whole dump is around 2Mb).
I had to edit my.cnf
(/etc/mysql/my.cnf
on my system) and change the value for max_allowed_packet
in the section titled [mysqld]
. I set it to 8Mb. Then, I had to stop mysql and restart it: mysqladmin -u root -p shutdown
to stop it, and mysqld_safe &
to start it again (as root).
This is not necessarily the best way to do it, and it might not work like that on your system, but it’s what I did and the site is now back up again. Comments welcome, and hope this can be useful to others!
Your post covered exactly what I set out to do. Thank you!
FYI, you only got the max_allowed_packet error with the UTF-8 version because UTF-8 takes more space (more bytes) than ISO-8859-15.
i don’t have shell access but would love to know how to do this using phpmyadmin. any tips?
i’ve checked phpmyadmin and iconv docs — and the answer may be there — but alas i’m quite the novice and unsure how to word the query within phpmyadmin.
i am migrating a home-built coldfusion blog to wordpress and it’s full of special characters.
It is a very useful post, but as the previous commenter said: those people who have only phpmyadmin access to their database (i.e. the majority of the average WP users!) will just stare at it. Is there any chance to “convert” this instruction for them?
Thanks!
Problèmes d'encodage MySQL
Un joli mélange de latin-1 et d'utf-8 dans ma base de données. Un script python pour nettoyer tout ça….
I like the part about AddDefaultCharset OFF – didn’t think about that myself. For those without shell access and without access to iconv, I suggest to look at the method I used and has described in the wp support forum, in a thread namedchange from ISO-8859-15 to UTF-8
a few notes to those without shell access (and even on a windows machine on top of that 🙂 )
phpMyAdmin can dyump mysql data just fine (although it calls it “exporting”). Specify a file name and check the “save to file” option, and make sure you select all your tables.
As for backup, it’s unnecessary to import it back. just make a copy of the dump if you really want to. This description does contain a lot of redundant saves -beyond your normal safety needs-.
When you’re finished with conversion –see Uffe’s post above– just use phpMyAdmin’s “import file” feature to import it back.
Very good! Very helpful!
Work finelly fixing the problem to a Portuguese (Brazilian) site.
Thank you!
FYI for windows users – open the doc in wordpad and save with utf-8 encoding – should cure what ails you 🙂
Replace utf8 with utf-8 on some systems…
Eu le même problème et ayant suivi vos pas ça a marché niquel: zéro problèmes! Je vous remercie énormement!
Thanks, this how-to was working great for me.
I’ve find this article partially usefull for me but I’ve imported from a custom CMS and I’ve found better doing “iconv -f windows-1252 -t utf8 file.sql > file_utf8.sql” ’cause sometimes ppl do copy/paste from Word into the textarea.
Wow this is great! I was just pulling my hair wondering on how to convert the database 🙂
Thanx for sharing the knowledge
Thanks for that solution – worked like a charm, when I tried to import a greymatter blog written in danish.
:o) A
It worked perfect for me. Nice tip!
Hi,
I have same problem, can you help me convert ?
thank !
I think there’s an easier solution than restarting mysql (although I haven’t tested this, because I’ve never had a database big enough). Simply tell mysqldump to limit the size of the packets it creates by adding the following commandline argument:
–set-variable=max_allowed_packet=1M
That should have been two dashes, not an em dash. Let me try again:
--set-variable=max_allowed_packet=1M
For those using libiconv-1.9.2-1 ,new iconv file, “utf8” does not work use “utf-8” instead as in “iconv -f iso-8859-9 -t utf-8 wordp…..” perfectly converted iso türkish database to utf-8. Also be sure to convert your site language files (in joomla/mambo case /languages/turkish.php for example) to utf-8 by opening in textpad/notepad and saving as utf-8.
add to above. using phpmyadmin is fine, but do not compress, zip version of phpmyadmin and winrar/winzip does not match, dumpt to just sql, also no need to restart mysql server (at least in localhost), just refresh your page.
Just a quick note – it is wise to specify the character set when exporting/dumping from mysql with mysqldump, using the –default-character-set=latin1 command. If not, and your system is defaulting to utf8 after an upgrade, you get iso-8859-15 data inside a utf8 file, which took me quite some time to find out (sigh).
OR you can
Executing SQL Statements from a Text File
The mysql client typically is used interactively, like this:
shell> mysql db_name
However, it is also possible to put your SQL statements in a file and then tell mysql to read its input from that file. To do so, create a text file text_file that contains the statements you wish to execute. Then invoke mysql as shown here:
shell> mysql db_name < text_file
If you place a USE db_name statement as the first statement in the file, it is unnecessary to specify the database name on the command line:
shell> mysql < text_file
If you are already running mysql, you can execute an SQL script file using the source or . command:
mysql> source file_name
mysql> . file_name
Sometimes you may want your script to display progress information to the user. For this you can insert statements like this:
SELECT ” AS ‘ ‘;
The statement shown outputs .
Invalid argument supplied for foreach() in wp-capabilities.php: Case Cracked!…
Finally. At last. Endlich. Enfin.
Once more, while trying to transfer a WordPress installation from one server to another, I found myself facing the dreaded problem which locks me out of my WordPress install with a rather cryptic message:
Warning: In…
This info just saved my life. I thank you. bows
thanks you soo much for you documentation on how to convert the charset, it saved me a lot of time .. grüsse
Thank you for this post, especially the iconv bit! It worked perfectly!
I had problem converting a joomla ISO DB to UTF8 using the iconv -f iso-8859-15 -t utf8…
It didn’t convert german special caracters.
Then I used iconv -f CP1252 -t utf8 and it worked perfecly….
I had to convert a database from iso latin1 to UTF-8, but I’m not an expert in encodings.
What I found out is that MySQL obviously uses the character set you specify when creating a table to store the data and makes conversions depending of the client.
If the DB is stored using UTF-8 and the client specify latin1 (using the SET NAMES ‘latin1’ statement) after connecting, MySQL will convert all data to latin1 before sending the results to the client.
My need was to convert a database stored in latin1 to UTF-8, so I simply used the following commands (a quick hack) and MySQL would do the rest :
mysqldump –opt -u db_user -p db_name | sed “s/CHARSET=latin1/CHARSET=utf8/” > db.dump.sql
mysql -u db_user -p -D db_name < db.dump.sql
As suggested by Egil, specifying the charset to these commands would be wise.
The default charset of my server is latin1 so db.dump.sql is encoded in latin1, sed replaces all default charset for each table from latin1 to UTF-8 and the second mysql will parse the file using the latin1 charset, send the query to the server in latin1 and the server will convert and store all the data in UTF-8.
Here is a technique I found that allows you to do the conversion to unicode through phpMyAdmin. I have tried this personally with a WP site/database and it worked for me on a site that has both English and Japanese text. This approach becomes really tedious if you have a lot of database tables though.
Making your WordPress database portable:
http://jonkenpon.com/2007/02/20/making-your-wordpress-database-portable-because-it-probably-isnt-right-now/
I found another link that talks about your method and ads a little more details. People might find this helpful too:
http://gentoo-wiki.com/TIP_Convert_latin1_to_UTF-8_in_MySQL
AFAIK, this could be done much simpler:
ALTER TABLE
table_name
CHARSET ‘utf8’;This is what official MySQL documentation says: “If you want to change all CHAR/VARCHAR/TEXT columns to a new character set you can do: ALTER TABLE table_name CHARACTER SET character_set_name;”
I’m afraid that does not convert the actual content. It won’t work in this case. Trust me, I’d have done it, otherwise 😉
Sorry, I have mistaken with the statement: it should be
ALTER TABLE
table_name
CONVERT TO CHARACTER SET ‘utf8’;I have recently faced the same problem and that statement did really help!
You can read about it here: http://dev.mysql.com/doc/refman/5.0/en/alter-table.html
Believe me, it works 🙂
Damn… I don’t get anything.
Seems, it DOES work.
Where to write this? mysqldump –opt -u root -p wordpress > wordpress.sql
No idea where to start. Please help me guys to fix this old Mongolian DB!
Frank, you will need to log into the server with your SSH client (provided you have SSH access enabled). And in the command line you type those commands.
Alternatively, you can probably do this with the help of PHP Shell.
Please feel free to contact me if you need help: sjinks (at) sjinks (dot) org (dot) ua
Regards,
Vladimir
Hello there,
unfortunately, it is not enough to add the –max_allowed_packet option to your mysql import statement. While this will increase the allowed packed size the client can send, it will not change the allowed packet size the mysql server. See here:
http://lists.mysql.com/mysql/198070
So, if your dump is very big, you will have to edit my.cnf to increase the packet size there–at least temporarily. Don't forget to restart after editing my.cnf
DrTebi
I am not sure what happens if I do the below:
Modidfy the original database tables accpet utf-8 over the running data.
(Note that taking a backup is always a safe way to start).
alter table
wordpress_table_1
change
latin_column
utf_column
text character set utf8 collate utf8_bin NOT NULL;If this works, you do not need to make another database restoration.
Great! I was having the same host header error on the validator. I had actually forgotten where the host header came from but your post cleared it up for me.
-Thanks
This a one of Important Site I am looking for. It helped me alot to bring the data of iso88591 to utf8. Thanks to the initiator and for all the pains underwent to prepare this doc
Thanks for these mysql tips, I’ve been looking for a solution with UTF8 for more tnah 2 days… I thought there was no solution :/
Hi,
@Bimal: be careful from Foreign Keys when you do this:
alter table wordpress_table_1
change latin_column utf_column text character set utf8 collate utf8_bin NOT NULL;
I tried something similar like:
alter table convert to character set utf8 collate utf8_general_ci;
and it doesn’t work at all if you have foreign keys in the tables in question. You need to drop foreign keys first.
The error is (lets say your table is called ):
ERROR 1025 (HY000): Error on rename of ‘./stockUTF8/#sql-3ea_17’ to ‘./stockUTF8/items’ (errno: 150)
And probably the method that Stephanie supplied won’t work with Foreign Keys. I am not sure, I need to test it with my tables that have foreign keys set.
I correct my previous post: the method that Stephanie used works fine even with Foreign Keys. However, iconv didn’t convert from latin1(iso-8859-1) to utf8 in my case. I haven’t investigated iconv yet. What I did is manually change every occurrence of “latin-1” to “utf8” in the .sql file.
this wordpress plugin (http://www.sociable.es/facebook-wordpress-plugin-3-0/) creating
Unicode bug
when i submit my blog link to facebook it display like odd charecters
“¤œ¤¾¤‡¤¤¤œ¤¾¤‡¤¤…”
the original text is “डिजाइनदार राखियों की…”
and when i uninstall this plugin, everything working fine
my blog: http://www.shreshthbharat.in (please try to post a link on facebook you’ll see odd character in preview )
there are some tables are in latin1_swedish_ci i change it to “utf8_unicode_ci” but still problem is there
i donot want to leave this powerful plugin
please tell me how do i fix this bug
Make sure you’re not confusing collation with the actual encoding the content is in.
With out dumping the database, and then restore it back:
$ mysql -u root -p
password:
mysql> ALTER DATABASE wordpress-iconv CHARACTER SET utf8 COLLATE utf8_unicode_ci;