PHP detect encoding of each character in a string


Here is a little script I wrote that checks the encoding of each character in a string. Strictly speaking, it works byte by byte. If a byte is not valid UTF-8 on its own, the script tries to convert it to UTF-8 from each of the character encodings listed below. The results are printed in an HTML table, and if the converted character looks as it should in a column, the byte fits that encoding. Note that a byte may match more than one encoding. Also, because the string is read one byte at a time, this will not work with multibyte characters.
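To show why one byte can fit several encodings, here is a minimal sketch of the core check applied to a single byte. `decodeByte()` is a hypothetical helper, not part of the script itself. It assumes the iconv and mbstring extensions are loaded. The set of encoding names iconv() accepts depends on the platform, so unsupported names come back as null.

```php
<?php
// Hypothetical helper (not part of the original script): decode one byte under
// each candidate encoding. Maps encoding name => UTF-8 text, or null when
// iconv() does not know the name or cannot convert the byte.
function decodeByte(string $byte, array $encodings): array
{
    $results = [];
    foreach ($encodings as $enc) {
        // @ hides the warning iconv() raises on failure; it then returns false.
        $converted = @iconv($enc, 'UTF-8', $byte);
        $results[$enc] = ($converted === false) ? null : $converted;
    }
    return $results;
}

// 0xE9 on its own is not a valid UTF-8 sequence.
$byte = "\xE9";
echo mb_check_encoding($byte, 'UTF-8') ? 'UTF-8' : 'not UTF-8', PHP_EOL;

foreach (decodeByte($byte, ['ISO-8859-1', 'Windows-1252', 'Windows-1251', 'ISO-8859-7']) as $enc => $char) {
    echo $enc, ': ', $char ?? '-', PHP_EOL;
}
```

On a system whose iconv knows these names, 0xE9 should come out as é for ISO-8859-1 and Windows-1252, й for Windows-1251 and ι for ISO-8859-7. That is the kind of ambiguity the table makes visible, and only a human looking at the text can decide which reading makes sense.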

$encodings = array("UTF-8", "UTF-16", "ASCII",
		"Windows-1250", "Windows-1251", "Windows-1252", "Windows-1253", "Windows-1254", "Windows-1255", "Windows-1256", "Windows-1257", "Windows-1258",
		"ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4", "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9", "ISO-8859-10",
		"ISO-8859-11", "ISO-8859-12", "ISO-8859-13", "ISO-8859-14", "ISO-8859-15", "ISO-8859-16",
		"CP1256", "CP1250", "CP1252", 'CP437', 'CP737', 'CP850', 'CP852', 'CP855', 'CP857', 'CP858', 'CP860', 'CP861', 'CP862', 'CP863', 'CP865',
		'CP866', 'CP869', 'CP37', 'CP930', 'CP1047', 'MIK', 'ISCII', 'TSCII', 'VISCII', 'JIS X 0208', 'EUC-JP', 'GB 2312', 'GBK', 'Big5',
		'HKSCS', 'KS X 1001', 'EUC-KR', 'ISO-2022-KR', 'Mac OS Roman', 'KOI7', 'KOI8-U', 'KOI8-R', 'GB18030', 'GB2312 80'
);
 
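// Replace with the text to inspect. Plain ASCII like this sample is always valid UTF-8.
$string = 'This is a test string';
$len = strlen($string); // number of bytes, not characters

echo '<table>';
// Header row so each result column can be matched to its encoding.
echo '<tr><th>#</th><th>Char</th><th>Hex</th><th>Detected</th>';
foreach ($encodings as $j) {
	echo '<th>' . htmlspecialchars($j) . '</th>';
}
echo '</tr>';
for ($i = 0; $i < $len; $i++) {
	$byte = $string[$i];
	// Strict check: a lone byte >= 0x80 is never valid UTF-8, so this returns false.
	$encoding = mb_detect_encoding($byte, 'UTF-8', true);
	echo '<tr><td>' . $i . '</td><td>' . htmlspecialchars($byte, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</td>';
	echo '<td>' . sprintf('%02X', ord($byte)) . '</td><td>' . ($encoding === false ? 'not UTF-8' : $encoding) . '</td>';
	if ($encoding !== 'UTF-8') {
		foreach ($encodings as $j) {
			// iconv() warns and returns false for names it does not recognise
			// (ISO-8859-12, for instance, does not exist) or bytes it cannot convert.
			$converted = @iconv($j, 'UTF-8', $byte);
			echo '<td>' . ($converted === false ? '-' : htmlspecialchars($converted, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')) . '</td>';
		}
	}
	echo '</tr>';
}
echo '</table>';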
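Because the loop steps through the string one byte at a time, a multibyte UTF-8 character such as "é" (bytes C3 A9) is split into two bytes, and each is reported as not UTF-8. One possible workaround, sketched here as an assumption and not as part of the original script, is to split the string into complete UTF-8 sequences first and treat only the leftover bytes as candidates for the iconv() loop. The byte ranges follow the well-formed UTF-8 definition in RFC 3629. `splitUtf8OrBytes()` is a hypothetical name.

```php
<?php
// Hypothetical helper (not part of the original script): split a string into
// complete UTF-8 characters, keeping any byte that does not start a well-formed
// UTF-8 sequence as a one-byte chunk of its own.
function splitUtf8OrBytes(string $s): array
{
    // No "u" modifier, so PCRE works on raw bytes. Each alternative is one
    // well-formed UTF-8 sequence (byte ranges from RFC 3629). The final "."
    // (with the "s" modifier, so it also matches newlines) takes any other single byte.
    $pattern = '/[\x00-\x7F]
        |[\xC2-\xDF][\x80-\xBF]
        |\xE0[\xA0-\xBF][\x80-\xBF]
        |[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
        |\xED[\x80-\x9F][\x80-\xBF]
        |\xF0[\x90-\xBF][\x80-\xBF]{2}
        |[\xF1-\xF3][\x80-\xBF]{3}
        |\xF4[\x80-\x8F][\x80-\xBF]{2}
        |./xs';
    preg_match_all($pattern, $s, $matches);
    return $matches[0];
}

// "é" encoded as UTF-8 (C3 A9), followed by stray single-byte "é" bytes (E9).
$string = "Caf\xC3\xA9 \xE9t\xE9";
foreach (splitUtf8OrBytes($string) as $chunk) {
    $status = mb_check_encoding($chunk, 'UTF-8') ? 'UTF-8' : 'not UTF-8, try iconv()';
    echo bin2hex($chunk), ' ', $status, PHP_EOL;
}
```

Each chunk reported as not UTF-8 can then go through the same iconv() loop as before. This only groups bytes that form valid UTF-8. Multibyte characters in other encodings, such as Shift_JIS, GBK or Big5, are still split into single bytes.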
