Journal of Learning Apache Lucene - the core searching classes

The basic search interface that Lucene provides is as straightforward as the one for
indexing. Only a few classes are needed to perform the basic search operation:

IndexSearcher
Term
Query
TermQuery
TopDocs

#1 IndexSearcher
IndexSearcher is to searching what IndexWriter is to indexing: the central link to the index that exposes several search methods. You can think of IndexSearcher as a class that opens an index in a read-only mode. It requires a Directory instance, holding the previously created index, and then offers a number of search methods, some of which are implemented in its abstract parent class Searcher; the simplest takes a Query object and an int topN count as parameters and returns a TopDocs object.

A typical use of this method looks like this:
// open the folder holds the index
Directory dir = FSDirectory.open(new File("/tmp/index"));
IndexSearcher searcher = new IndexSearcher(dir);
Query q = new TermQuery(new Term("contents", "lucene"));
// Finds the top n(10 here) hits for query.
TopDocs hits = searcher.search(q, 10);
searcher.close();

#2 Term
A Term is the basic unit for searching. Similar to the Field object, it consists of a pair of string elements: the name of the field and the word (text value) of that field. Note that Term objects are also involved in the indexing process. However, they’re created by Lucene’s internals, so you typically don’t need to think about them while indexing. During searching, you may construct Term objects and use them together with TermQuery

#3 Query
Lucene comes with a number of concrete, the most basic Lucene Query is TermQuery. Other Query types are BooleanQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, TermRangeQuery, NumericRangeQuery, FilteredQuery, and SpanQuery. Query is the common, abstract parent class. It contains several utility methods, the most interesting of which is setBoost(float), which enables you to tell
Lucene that certain subqueries should have a stronger contribution to the final relevance score than other subqueries.

#4 TermQuery
TermQuery is the most basic type of query supported by Lucene, and it’s one of the primitive query types. It’s used for matching documents that contain fields with specific values, as you’ve seen in the last few paragraphs. Finally, wrapping up our brief tour of the core classes used for searching, we touch on TopDocs, which represents the result set returned by searching.

#5 TopDocs
The TopDocs class is a simple container of pointers to the top N ranked search results—documents that match a given query. For each of the top N results, TopDocs records the int docID (which you can use to retrieve the document) as well as the float score.

I love programming

Search This Blog

Journal of Learning Apache Lucene - the core searching classes

Labels

Comments

Post a Comment

Popular posts from this blog

JasperReports - Configuration Reference

Stretch a row if data overflows in jasper reports

Live - solving the jasper report out of memory and high cpu usage problems