
Python Crawler 1: Rental Listing Information

2019-11-08 03:13:37
Source: reposted
Contributed by: site user

Task description

Scrape information for 300 rental listings. The details available on each listing page are as follows:

[Image: listing details]


Python code

# -*- coding: UTF-8 -*-
# 20170217: works well
from bs4 import BeautifulSoup
import requests

# Build the URLs of the first 10 search-result pages on Xiaozhu
urls = ['http://bj.xiaozhu.com/search-duanzufang-p{}-0/'.format(i) for i in range(1, 11)]


# The tag's class attribute differs by gender; use that difference to tell the host's gender
def get_lorder_sex(class_name):
    if class_name == ['member_ico']:
        return '男'  # male
    elif class_name == ['member_ico1']:
        return '女'  # female


# Parse the details on a single listing page
def get_attar(url):
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    titles = soup.select('div.pho_info > h4 > em')
    locations = soup.select('div.pho_info > p > span')
    prices = soup.select('div.day_l > span')
    images = soup.select('div.pho_show_big > div > img')
    lorder_names = soup.select('div.w_240 > h6 > a')
    lorder_images = soup.select('div.member_pic > a > img')
    lorder_genders = soup.select('div.member_pic > div')
    for title, location, price, image, lorder_name, lorder_image, gender in zip(
            titles, locations, prices, images, lorder_names, lorder_images, lorder_genders):
        data = {
            'title': title.get_text(),
            'location': location.get_text(),
            'price': price.get_text(),
            'image': image.get('src'),
            'lorder_name': lorder_name.get_text(),
            'lorder_image': lorder_image.get('src'),
            'gender': get_lorder_sex(gender.get('class')),
        }
        print(data)


# Each of the 10 result pages links to many listing pages, which hold the rental details
for url in urls:
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    url_links = soup.select('a.resule_img_a')
    for url_link in url_links:
        get_attar(url_link.get('href'))
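The gender check in get_lorder_sex compares the whole class list against a one-element list, so a tag that carries an extra class, or has no class attribute at all, falls through and returns None. Below is a minimal sketch of a more tolerant lookup, assuming the same two class names as the script. The names GENDER_BY_CLASS and gender_from_classes are hypothetical and not part of the original code.

```python
# -*- coding: UTF-8 -*-
# Hypothetical helper, not part of the original script.
# Assumes the two class names used in the script:
# member_ico (male) and member_ico1 (female).
GENDER_BY_CLASS = {
    'member_ico': '男',   # male
    'member_ico1': '女',  # female
}


def gender_from_classes(classes, default='unknown'):
    """Return the gender for a tag's class list, or `default` if none matches.

    `classes` is what BeautifulSoup's tag.get("class") yields: a list of
    class names, or None when the tag has no class attribute.
    """
    for name in classes or []:
        if name in GENDER_BY_CLASS:
            return GENDER_BY_CLASS[name]
    return default


print(gender_from_classes(['member_ico1']))
print(gender_from_classes(None))
```

Returning an explicit default makes unmatched listings easy to spot in the output, which may be preferable to a silent None.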

Results

Only two of the scraped listings are shown here.

[Image: screenshot of the output]


Shortcomings

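In the PyCharm console, Chinese characters are displayed only as their character codes (escape sequences) rather than as readable Chinese text.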
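One possible workaround is sketched here, assuming Python 3 and a record shaped like the data dict in the script. The escape codes most likely appear because printing a dict under Python 2 shows the repr of each string instead of the text itself. Serializing the record with json.dumps and ensure_ascii=False keeps non-ASCII characters as they are. The helper name format_record and the sample record are hypothetical.

```python
# -*- coding: UTF-8 -*-
import json


def format_record(record):
    """Serialize a dict to a string, leaving non-ASCII characters unescaped."""
    # ensure_ascii=False stops json.dumps from emitting \uXXXX escapes.
    # sort_keys=True only makes the output order stable.
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


# Hypothetical sample record; real records come from the scraper.
sample = {'title': '朝阳区两居室', 'gender': '女'}
print(format_record(sample))
```

In the script this would mean replacing the final print of data with print(format_record(data)). Under Python 2 the result also depends on whether the values are byte strings or unicode, so this is a starting point and not a guaranteed fix.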

