欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  IT编程

PHP读取文件,解决中文乱码UTF-8的方法分析

程序员文章站 2023-11-18 20:53:40
本文实例讲述了php读取文件,解决中文乱码utf-8的方法。分享给大家供大家参考,具体如下: $opts = array( 'file' => array(...

本文实例讲述了php读取文件,解决中文乱码utf-8的方法。分享给大家供大家参考,具体如下:

$opts = array(
  'file' => array(
    'encoding' => "utf-8"
  )
);
$opts = array('http' => array('encoding' => 'utf-8'));
$ctxt = stream_context_create($opts);
$content = file_get_contents($filepath, file_text, $ctxt);

最简单的就是将gf2312→utf-8

$str = iconv("gb2312", "utf-8", $str);

不管用的

$content = mb_convert_encoding($content, "utf-8", "auto");

******************************************丑陋的分割线来告诉大家上面的不好的:下面的才是正确的方法···哈哈···**********************************************************

define('utf32_big_endian_bom', chr(0x00) . chr(0x00) . chr(0xfe) . chr(0xff));
define('utf32_little_endian_bom', chr(0xff) . chr(0xfe) . chr(0x00) . chr(0x00));
define('utf16_big_endian_bom', chr(0xfe) . chr(0xff));
define('utf16_little_endian_bom', chr(0xff) . chr(0xfe));
define('utf8_bom', chr(0xef) . chr(0xbb) . chr(0xbf));

$text = file_get_contents($newpath);
$first2 = substr($text, 0, 2);
$first3 = substr($text, 0, 3);
$first4 = substr($text, 0, 3);
$encodtype = "";
if ($first3 == utf8_bom)
  $encodtype = 'utf-8 bom';
else if ($first4 == utf32_big_endian_bom)
  $encodtype = 'utf-32be';
else if ($first4 == utf32_little_endian_bom)
  $encodtype = 'utf-32le';
else if ($first2 == utf16_big_endian_bom)
  $encodtype = 'utf-16be';
else if ($first2 == utf16_little_endian_bom)
  $encodtype = 'utf-16le';

$content = file_get_contents($newpath);

$content = iconv($encodtype, "utf-8", $content);

终极版·····

$text = file_get_contents($filepath);
//$encodtype = mb_detect_encoding($text);
define('utf32_big_endian_bom', chr(0x00) . chr(0x00) . chr(0xfe) . chr(0xff));
define('utf32_little_endian_bom', chr(0xff) . chr(0xfe) . chr(0x00) . chr(0x00));
define('utf16_big_endian_bom', chr(0xfe) . chr(0xff));
define('utf16_little_endian_bom', chr(0xff) . chr(0xfe));
define('utf8_bom', chr(0xef) . chr(0xbb) . chr(0xbf));
$first2 = substr($text, 0, 2);
$first3 = substr($text, 0, 3);
$first4 = substr($text, 0, 3);
$encodtype = "";
if ($first3 == utf8_bom)
  $encodtype = 'utf-8 bom';
else if ($first4 == utf32_big_endian_bom)
  $encodtype = 'utf-32be';
else if ($first4 == utf32_little_endian_bom)
  $encodtype = 'utf-32le';
else if ($first2 == utf16_big_endian_bom)
  $encodtype = 'utf-16be';
else if ($first2 == utf16_little_endian_bom)
  $encodtype = 'utf-16le';
//下面的判断主要还是判断ansi编码的·
if ($encodtype == '') {//即默认创建的txt文本-ansi编码的
  $content = iconv("gbk", "utf-8", $text);
} else if ($encodtype == 'utf-8 bom') {//本来就是utf-8不用转换
  $content = $text;
} else {//其他的格式都转化为utf-8就可以了
  $content = iconv($encodtype, "utf-8", $text);
}

以上的终极版·可以适应中文操作windows系统建立的ansi``````````````utf-8`````````unicode`````的txt文本····