首页 > 学院 > 开发设计 > 正文

lucen入门学习之一(初识Lucene)

2019-11-06 09:14:15
字体:
来源:转载
供稿:网友

Lucen最主要用到的又两个部分,一个是创建索引,一个是查询索引

说明下下面的代码注意事项:

1、如果自己比较懒的话,就把文档路径和索引路径创建到项目下,方便查看

2、我下面的两个例子都是在D盘下创建Lucene,然后分别在D:/Lucene下创建Index和Data两个文件夹

3、在Data中我创建了两个文件:record1.txt和record2.txt,内容分别为txt1和txt2,当然你也可以创建更多和更复杂的内容的文本

4、在搜索的时候我搜的是txt2所以能查到record2.txt。。-_-

一、创建索引

1、创建索引常用到的类有以下几个(有些事基类,其子类就不过多列举了,具体使用的时候再查)

IndexWriter

Direcory

Analyzer

Document

Field

具体意思就不多说了,如果你不清楚,点击这个链接http://pan.baidu.com/s/1pLrZzWB下载《lucene in action_中文版.pdf》,在第25页可以看到解释,

下面是关于创建索引的一个例子:

package com.burns.lucene_in_action;import java.io.File;import java.io.FileFilter;import java.io.FileReader;import java.io.IOException;import org.apache.lucene.analysis.standard.StandardAnalyzer;import org.apache.lucene.document.Document;import org.apache.lucene.document.Field;import org.apache.lucene.document.Field.Index;import org.apache.lucene.document.FieldSelectorResult;import org.apache.lucene.index.CorruptIndexException;import org.apache.lucene.index.IndexFileNameFilter;import org.apache.lucene.index.IndexWriter;import org.apache.lucene.store.Directory;import org.apache.lucene.store.FSDirectory;import org.apache.lucene.util.Version;import com.yiibai.lucene.TextFileFilter;public class Indexer {		public static void main(String[] args) throws IOException {//		if (args.length != 2) {//			throw new IllegalArgumentException("Usage:java" + Index.class.getName() + "<index dir> <data dir>");//		}		// String indexDir = args[0];		// String dataDir = args[1];		String indexDir = "D://Lucene//Index";		String dataDir = "D://Lucene//Data";		long start = System.currentTimeMillis();		Indexer indexer = new Indexer(indexDir);		int numIndexed;		try {			numIndexed = indexer.index(dataDir, new TextFileFilter());		} finally {			indexer.close();		}		long end = System.currentTimeMillis();		System.out.PRintln("Indexing" + numIndexed + "files took" + (end - start) + "milliseconds");	}	public IndexWriter writer;	@SuppressWarnings("deprecation")	public Indexer(String indexDir) throws IOException {		Directory dir = FSDirectory.open(new File(indexDir));		writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), true,				IndexWriter.MaxFieldLength.UNLIMITED);	}	public void close() throws CorruptIndexException, IOException {// 关闭Index																	// Writer		writer.close();	}	public int index(String dataDir, FileFilter filter) throws IOException {		File[] files = new File(dataDir).listFiles();		for (File file : files) {			if (!file.isDirectory() && !file.isHidden() && file.exists() && file.canRead()					&& (filter == null || filter.accept(file))) {				indexFile(file);			}		}		return writer.numDocs();	}	private static class TextFileFilter implements FileFilter {		@Override		public boolean accept(File pathname) {			return pathname.getName().toLowerCase().endsWith(".txt");		}	}	protected Document getDocument(File f) throws IOException {		Document doc = new Document();		doc.add(new Field("contents", new FileReader(f)));		doc.add(new Field("filename", f.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));		doc.add(new Field("fullpath", f.getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED));		return doc;	}	private void indexFile(File f) throws IOException {		System.out.println("Indexing" + f.getCanonicalPath());		Document doc = getDocument(f);		writer.addDocument(doc);	}}執行結果如下:

IndexingD:/Lucene/Data/record1.txtIndexingD:/Lucene/Data/record2.txtIndexing2files took1954milliseconds二、搜索/查询索引

常用的类如下:

IndexSearcher

Term

Query

TermQuery

TopDocs

例子代码如下:

package com.burns.lucene_in_action;import java.io.File;import org.apache.lucene.analysis.standard.StandardAnalyzer;import org.apache.lucene.document.Document;import org.apache.lucene.queryParser.QueryParser;import org.apache.lucene.search.IndexSearcher;import org.apache.lucene.search.Query;import org.apache.lucene.search.ScoreDoc;import org.apache.lucene.search.TopDocs;import org.apache.lucene.store.Directory;import org.apache.lucene.store.FSDirectory;import org.apache.lucene.util.Version;public class Searcher {	public static void main(String[] args) throws Exception {		// if (args.length != 2) {		// throw new IllegalArgumentException("Usage: java" +		// Searcher.class.getName() + "<index dir> <query>");		// }		//		// String indexDir = args[0];		// String q = args[1];		String indexDir = "D://Lucene//Index";		String q = "txt2";		search(indexDir, q);	}	private static void search(String indexDir, String q) throws Exception {		Directory dir = FSDirectory.open(new File(indexDir));		IndexSearcher is = new IndexSearcher(dir);		QueryParser parser = new QueryParser(Version.LUCENE_30, "contents", new StandardAnalyzer(Version.LUCENE_30));		Query query = parser.parse(q);		long start = System.currentTimeMillis();		TopDocs hits = is.search(query, 10);		long end = System.currentTimeMillis();		System.err.print("Found" + hits.totalHits + "document(s) (in " + (end - start)				+ "milliseconds that matched query'" + q + "':");		for (ScoreDoc scoreDoc : hits.scoreDocs) {			Document doc = is.doc(scoreDoc.doc);			System.out.println(doc.get("fullpath"));		}		is.close();	}}执行结果如下:

Found1document(s) (in 9milliseconds that matched query'txt2':D:/Lucene/Data/record2.txt


发表评论 共有条评论
用户名: 密码:
验证码: 匿名发表