02-22-2008, 06:53 PM
Super Moderator
Join Date: Sep 2007
Posts: 165
Thanks: 0
Quote:
Originally Posted by Devels
I always have trouble with UTF-8.
The scenario: multiple sources (like XML, CSV, our own CMS) and one database: MySQL. The information needs to be displayed on a website or output as XML to another program.
Do I need to store the text as it is? Are special settings needed in the MySQL database (collation?), and what PHP functions need to be used?
To display the text on a webpage I send a UTF-8 header using PHP, or use a meta tag saying it is UTF-8. But special characters are still displayed incorrectly, or the XML is broken because of an invalid character.
What is the best consistent way to avoid these problems, or does someone have really good reading about this topic?
Hi Devels,
UTF-8 problems are very common, and there are a few things you can do to get around the issues and irregularities.
1) Use the PHP DOM for XML. XML is treated as UTF-8 unless the document declares another encoding, and the DOM expects the strings you give it to be UTF-8, so as long as you feed it UTF-8 your XML should come out well-formed.
2) In your PHP settings, make sure input and output are set to use UTF-8 (for example default_charset in php.ini, plus the mbstring encoding settings if you use mbstring).
3) In MySQL, set your database and tables to use a UTF-8 collation such as utf8_general_ci. Make sure the connection itself uses UTF-8 as well (for example by running SET NAMES utf8 after connecting), otherwise text can still be mangled on the way in or out.
4) If you are scraping data from the web (i.e. from websites), you will need to do some detection and conversion. iconv and mbstring are the two things to look into for that. First check the HTTP header from the source server for the charset, then the HTML for a meta tag or some other indicator. Failing that, there is mb_detect_encoding, which can be used, but it is slow and not very reliable at detection!
5) Install mbstring. Functions like strlen count bytes, so replace them with their mbstring equivalents such as mb_strlen to be UTF-8 compliant (until PHP 6 arrives!)
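To make points 1, 4 and 5 a bit more concrete, here is a rough sketch rather than a finished solution. It assumes PHP 7.1 or later with the mbstring and dom extensions loaded. The helper names charset_from_content_type and to_utf8 are made up for this example, and the sample header and text are invented test data.

```php
<?php
// Hypothetical helper: pull the charset out of a Content-Type header value,
// e.g. "text/html; charset=iso-8859-1". Returns null if none is declared.
function charset_from_content_type(string $header): ?string
{
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $header, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Hypothetical helper: convert text to UTF-8.
// A declared charset is trusted first; mb_detect_encoding is only a fallback guess.
// Note: an unknown charset name makes mb_convert_encoding fail
// (a warning in older PHP, a ValueError in PHP 8), so validate it in real code.
function to_utf8(string $text, ?string $declared): string
{
    $from = $declared ?? mb_detect_encoding($text, ['UTF-8', 'ISO-8859-1'], true);
    if ($from === false || strtoupper($from) === 'UTF-8') {
        return $text; // already UTF-8, or nothing sensible to convert from
    }
    return mb_convert_encoding($text, 'UTF-8', $from);
}

// Invented sample data: "café" as ISO-8859-1 bytes, as a scraper might receive it.
$raw     = "caf\xE9";
$charset = charset_from_content_type('text/html; charset=iso-8859-1');
$utf8    = to_utf8($raw, $charset);

// Point 5: strlen counts bytes, mb_strlen counts characters.
$bytes = strlen($utf8);             // 5, because the e-acute takes two bytes in UTF-8
$chars = mb_strlen($utf8, 'UTF-8'); // 4

// Point 1: build the XML through the DOM so escaping and encoding are handled for you.
$doc  = new DOMDocument('1.0', 'UTF-8');
$node = $doc->createElement('name');
$node->appendChild($doc->createTextNode($utf8));
$doc->appendChild($node);
$xml = $doc->saveXML();
```

The idea is to convert everything to UTF-8 once, at the point where it enters your system, and to keep it as UTF-8 from there through the database and out to the page or the XML.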
Hope that helps you!