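The crawling libraries in Python 3 differ considerably from those in Python 2. Python 3 merges urllib2 and urllib into a single package, urllib, which contains four modules: request, parse, error, and robotparser. The request module provides the urlopen method, whose simplified prototype is urlopen(url, data=None). It returns a response object. The url argument can be either a Request object or a string, so passing a string is equivalent to: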
import urllib.request
req = urllib.request.Request(url)  # wrap the URL string in a Request object
response = urllib.request.urlopen(req)
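To make the equivalence concrete, here is a minimal sketch that opens the same URL both ways and compares the bodies. The data: URL is only an assumption chosen so that the example runs without network access; a real crawler would pass an http or https URL instead.

```python
import urllib.request

# A data: URL keeps the example offline (an assumption for illustration);
# a real crawler would use an http(s) URL here.
url = "data:text/plain,hello"

# Form 1: pass the URL string directly to urlopen.
with urllib.request.urlopen(url) as resp:
    body_from_string = resp.read()

# Form 2: wrap the URL in a Request object first, then open it.
req = urllib.request.Request(url)
with urllib.request.urlopen(req) as resp:
    body_from_request = resp.read()

# Both forms return the same bytes.
print(body_from_string == body_from_request)
```

The Request form is mainly useful when you need to attach headers or a request body before opening the URL.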
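Returning to the data parameter of urlopen: first build a dictionary, then convert it to the required format with urllib.parse.urlencode() (urllib.urlencode() in Python 2). In Python 3 the encoded string must also be converted to bytes, for example with .encode("utf-8"), before it is passed as data.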
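The following sketch shows how the data argument might be prepared in Python 3. The URL and the form fields are hypothetical placeholders, and the request is only constructed, not sent, so nothing touches the network.

```python
import urllib.parse
import urllib.request

# Hypothetical form fields, used only for illustration.
form = {"name": "alice", "page": 1}

# urlencode() produces a str; urlopen() and Request() expect bytes for data.
payload = urllib.parse.urlencode(form).encode("utf-8")

# Hypothetical endpoint; the request is built but not sent.
req = urllib.request.Request("http://example.com/search", data=payload)

# Supplying data switches the request method from GET to POST.
print(payload)
print(req.get_method())
```

Passing the same payload as urlopen(url, data=payload) would send it as a POST body.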
Because my machine has Python 2.7 installed, I had to convert this code, so I looked up the corresponding names online:
| Python 2 name | Python 3 name |
| --- | --- |
| urllib.urlretrieve() | urllib.request.urlretrieve() |
| urllib.urlcleanup() | urllib.request.urlcleanup() |
| urllib.quote() | urllib.parse.quote() |
| urllib.quote_plus() | urllib.parse.quote_plus() |
| urllib.unquote() | urllib.parse.unquote() |
| urllib.unquote_plus() | urllib.parse.unquote_plus() |
| urllib.urlencode() | urllib.parse.urlencode() |
| urllib.pathname2url() | urllib.request.pathname2url() |
| urllib.url2pathname() | urllib.request.url2pathname() |
| urllib.getproxies() | urllib.request.getproxies() |
| urllib.URLopener | urllib.request.URLopener |
| urllib.FancyURLopener | urllib.request.FancyURLopener |
| urllib.ContentTooShortError | urllib.error.ContentTooShortError |
| urllib2.urlopen() | urllib.request.urlopen() |
| urllib2.install_opener() | urllib.request.install_opener() |
| urllib2.build_opener() | urllib.request.build_opener() |
| urllib2.URLError | urllib.error.URLError |
| urllib2.HTTPError | urllib.error.HTTPError |
| urllib2.Request | urllib.request.Request |
| urllib2.OpenerDirector | urllib.request.OpenerDirector |
| urllib2.BaseHandler | urllib.request.BaseHandler |
| urllib2.HTTPDefaultErrorHandler | urllib.request.HTTPDefaultErrorHandler |
| urllib2.HTTPRedirectHandler | urllib.request.HTTPRedirectHandler |
| urllib2.HTTPCookieProcessor | urllib.request.HTTPCookieProcessor |
| urllib2.ProxyHandler | urllib.request.ProxyHandler |
| urllib2.HTTPPasswordMgr | urllib.request.HTTPPasswordMgr |
| urllib2.HTTPPasswordMgrWithDefaultRealm | urllib.request.HTTPPasswordMgrWithDefaultRealm |
| urllib2.AbstractBasicAuthHandler | urllib.request.AbstractBasicAuthHandler |
| urllib2.HTTPBasicAuthHandler | urllib.request.HTTPBasicAuthHandler |
| urllib2.ProxyBasicAuthHandler | urllib.request.ProxyBasicAuthHandler |
| urllib2.AbstractDigestAuthHandler | urllib.request.AbstractDigestAuthHandler |
| urllib2.HTTPDigestAuthHandler | urllib.request.HTTPDigestAuthHandler |
| urllib2.ProxyDigestAuthHandler | urllib.request.ProxyDigestAuthHandler |
| urllib2.HTTPHandler | urllib.request.HTTPHandler |
| urllib2.HTTPSHandler | urllib.request.HTTPSHandler |
| urllib2.FileHandler | urllib.request.FileHandler |
| urllib2.FTPHandler | urllib.request.FTPHandler |
| urllib2.CacheFTPHandler | urllib.request.CacheFTPHandler |
| urllib2.UnknownHandler | urllib.request.UnknownHandler |
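One common way to use a mapping like this is an import shim that tries the Python 3 names first and falls back to the Python 2 names. This is only a sketch of that pattern, not the only approach, and the query string is a made-up example.

```python
try:
    # Python 3 locations
    from urllib.request import Request, urlopen
    from urllib.parse import urlencode, quote
    from urllib.error import URLError, HTTPError
except ImportError:
    # Python 2 locations
    from urllib2 import Request, urlopen, URLError, HTTPError
    from urllib import urlencode, quote

# From here on, the same names work under either interpreter.
query = urlencode({"q": "python urllib"})  # spaces become "+"
escaped = quote("a b")                     # spaces become "%20"
print(query)
print(escaped)
```

With the shim in place, the rest of the script can call urlopen, Request, and urlencode without checking the interpreter version again.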