该程序是爬取京东上的java图书信息 book模型:
PRivate String bookID; private String bookName; private String bookPrice;文件结构
1)httpclient maven配置:(不同版本创建HttpClient方法不同)
<dependency> <groupId>org.apache.httpcomponents</groupId> <artifactId>httpclient</artifactId> <version>4.1.2</version></dependency>2)main方法:(获取数据,存放数据)
public class bookMain { static final Log logger = LogFactory.getLog(bookMain.class); //log4j public static void main(String[] args) throws Exception { HttpClient httpclient = new DefaultHttpClient(); //创建HttpClient String url = "https://search.jd.com/Search?keyWord=java&enc=utf-8&wq=java&pvid=f961dczi.8r5joc"; //种子 List<Book> books = URLEntity.URLParse(httpclient, url); //通过URLEntity获取实体中的信息 for (Book book : books) { logger.info("bookId:" + book.getBookID() + "/t" + "bookName:" + book.getBookName() + "/t" + "bookPrice:" + book.getBookPrice() + "/t"); } MySQL_control.executeInsert(books); //数据库添加数据 }}3)获取response(httpUtil类)
public class httpUtil { public static HttpResponse getHtml(HttpClient httpclient, String url) throws IOException { HttpGet getMethod = new HttpGet(url); /get方法 HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,HttpStatus.SC_OK,"ok"); //response初始化 response = httpclient.execute(getMethod); //执行get方法 return response; }}4)返回实体中的信息(URLEntity类) 调用3)获取response
public class URLEntity { public static List<Book> URLParse(HttpClient httpclient,String url) throws IOException { List<Book> getbooks = new ArrayList<Book>(); HttpResponse response = httpUtil.getHtml(httpclient, url); int statusCode = response.getStatusLine().getStatusCode(); //获取状态码 if(statusCode == 200) //200为正常 { String entity = EntityUtils.toString(response.getEntity(),"utf-8"); getbooks = bookParse.getData(entity); EntityUtils.consume(response.getEntity()); //消耗实体类,实体类最后需要消耗 } else EntityUtils.consume(response.getEntity()); return getbooks; }}5)解析html(此处使用的是jsoup)bookParse类
public class bookParse { public static List<Book> getData(String html) { List<Book> datas = new ArrayList<Book>(); Document doc = Jsoup.parse(html); Elements elements = doc.select("ul[class=gl-warp clearfix]").select("li[class=gl-item]"); for (Element element : elements) { String bookid = element.select("div[class=gl-i-wrap j-sku-item]").attr("data-sku"); String bookprice = element.select("div[class=p-price]").select("strong").select("i").text(); String bookname = element.select("div[class=p-name]").select("em").text(); Book book = new Book(); book.setBookID(bookid); book.setBookName(bookname); book.setBookPrice(bookprice); datas.add(book); } return datas; }} 6)此处的数据库连接 mysql_source类
mysql_control类
public class mysql_control { static DataSource ds = mysql_source.getDataSource("jdbc:mysql://127.0.0.1:3306/book"); static QueryRunner qr = new QueryRunner(ds); public static void executeInsert(List<Book> bookdatas) throws SQLException { Object[][] params = new Object[bookdatas.size()][3]; for(int i=0; i<params.length; i++) { params[i][0] = bookdatas.get(i).getBookID(); params[i][1] = bookdatas.get(i).getBookName(); params[i][2] = bookdatas.get(i).getBookPrice(); } qr.batch("insert into books(bookID,bookNam,bookPrice)values(?,?,?)", params); System.out.println("成功插入" + bookdatas.size() + "条"); }}7)补充
log4j.properties文件内容
log4j.rootLogger=DEBUG, stdoutlog4j.appender.stdout=org.apache.log4j.ConsoleAppenderlog4j.appender.stdout.layout=org.apache.log4j.PatternLayoutlog4j.appender.stdout.layout.ConversionPattern=%-5p - %m%n只是输出到控制台
控制台运行结果
新闻热点
疑难解答