crawler returns garbled text when scraping

    exports.index = function (req, res) {
        var Crawler = require("crawler").Crawler;

        var c = new Crawler({
            "maxConnections": 10,
            "debug": true,
            "forceUTF8": true,
            // This will be called for each crawled page
            "callback": function (error, result, $) {
                // $ is a jQuery instance scoped to the server-side DOM of the page
                var te = $("#top .gengxin table  tr:first").html();
                console.log(te);
                res.render('index', { title: te });
            }
        });

        c.queue("http://psv.tgbus.com");
    };

Server output:

    GET http://psv.tgbus.com …
    Got http://psv.tgbus.com (107044 bytes)…
    forceUTF8 true
    Detected charset windows-1252 (95% confidence)

茂驴陆茂驴陆<a class="" href="http://psv.tgbus.com/yxgl/201303/20130314105417.shtml" title="茂驴陆茂驴陆茂驴陆茂驴陆茂驴陆茂驴陆茂驴陆寐济柯矫柯矫柯矫柯矫陆茂驴陆寐柯矫柯矫柯矫柯?nbsp;茂驴陆茂驴陆茂驴陆茂驴陆茂驴陆茂驴陆茂驴陆寐济柯矫柯矫柯矫柯矫柯矫柯矫柯? target="_blank">茂驴陆茂驴陆茂驴陆茂驴陆茂驴陆茂驴陆茂驴陆寐济柯矫柯矫柯矫柯矫陆茂驴陆寐柯矫柯矫柯矫柯?nbsp;茂驴陆茂驴陆茂驴陆茂驴陆茂驴陆茂驴陆茂驴陆寐济柯矫柯矫柯矫柯矫柯矫柯矫柯?/font></a></td><td align="right" class="" width="40">03-14</td>

Is the windows-1252 encoding simply not supported? Scraping UTF-8 pages works fine.

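anuxs #1 • 2 years ago

Encoding is the first problem any crawler has to deal with. I suggest using fetch, which converts the encoding automatically.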

shiedman #2 • 2 years ago

It is not that windows-1252 is unsupported. The jschardet library that crawler calls misdetects gb2312 as windows-1252. Short of modifying crawler's code, there is no fix.
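To make the misdetection concrete, here is a minimal sketch of what happens when GB2312/GBK bytes are decoded as windows-1252. It uses Node's built-in `TextDecoder` and assumes a Node build with full ICU (the default in current official releases). The sample bytes are hard-coded for illustration and do not come from the page in question. The output in the original post looks even worse, presumably because of further conversion steps, so this only shows the basic mechanism.

```javascript
// GBK/GB2312 bytes for the two characters "中文" (hard-coded for illustration).
const gbkBytes = Buffer.from([0xd6, 0xd0, 0xce, 0xc4]);

// Decoding with the wrong charset, as happens after a misdetection:
// every byte is treated as one Western European character.
const wrong = new TextDecoder('windows-1252').decode(gbkBytes);

// Decoding with the charset the bytes were actually written in.
const right = new TextDecoder('gbk').decode(gbkBytes);

console.log(wrong); // ÖÐÎÄ
console.log(right); // 中文
```

Once the bytes have been turned into the wrong characters, later steps such as forcing UTF-8 cannot repair them; the fix has to happen at the point where the raw bytes are decoded.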

ronincn #3 • 2 years ago

Fetch the page yourself, then parse the DOM with jQuery.
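A rough sketch of the "fetch it yourself" approach, using only Node's standard library. The helper names `pickCharset` and `fetchDecoded` are made up for this example. It assumes the page declares its charset in the Content-Type header or in a `<meta>` tag, and that Node was built with full ICU so that `TextDecoder` knows labels such as gb2312. The decoded string could then be handed to a server-side jQuery-style parser of your choice.

```javascript
const http = require('http');

// Hypothetical helper: take the charset from the Content-Type header, then
// from a <meta> tag near the start of the body, falling back to utf-8.
function pickCharset(contentType, body) {
  const fromHeader = /charset=["']?([\w-]+)/i.exec(contentType || '');
  if (fromHeader) return fromHeader[1].toLowerCase();
  // latin1 maps each byte to one character, so the ASCII markup stays readable.
  const head = body.subarray(0, 1024).toString('latin1');
  const fromMeta = /<meta[^>]+charset=["']?([\w-]+)/i.exec(head);
  return fromMeta ? fromMeta[1].toLowerCase() : 'utf-8';
}

// Hypothetical helper: download the raw bytes and decode them once, at the end.
function fetchDecoded(url, callback) {
  http.get(url, (res) => {
    const chunks = [];
    res.on('data', (chunk) => chunks.push(chunk)); // keep Buffers, not strings
    res.on('end', () => {
      try {
        const body = Buffer.concat(chunks);
        const charset = pickCharset(res.headers['content-type'], body);
        // TextDecoder throws a RangeError for labels it does not know.
        callback(null, new TextDecoder(charset).decode(body));
      } catch (err) {
        callback(err);
      }
    });
  }).on('error', callback);
}
```

Usage would look something like `fetchDecoded('http://psv.tgbus.com', function (err, html) { ... })`. Trusting the declared charset instead of guessing it statistically avoids the misdetection described above, at the cost of failing on pages that declare the wrong charset or none at all.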

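jathya2 #4 • 1 year ago

@anuxs Thank you so much. I spent a whole day on this. request, bufferHelper, needle, iconv, spider: none of them helped with the encoding problem. Only at the very end did I come across fetch… it handled every encoding problem I threw at it and is extremely general… This is an old thread, but it really did solve my problem. By the way, I had no idea which keywords to search for on http://stackoverflow.com for encoding problems; my English is too poor.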

xadillax #5 • 1 year ago

You can also use nodegrassex for crawling.