
Journal of Learning Apache Lucene - the core searching classes

The basic search interface that Lucene provides is as straightforward as the one for
indexing. Only a few classes are needed to perform the basic search operation:

  • IndexSearcher
  • Term
  • Query
  • TermQuery
  • TopDocs

#1 IndexSearcher
IndexSearcher is to searching what IndexWriter is to indexing: the central link to the index that exposes several search methods. You can think of IndexSearcher as a class that opens an index in read-only mode. It requires a Directory instance holding the previously created index, and then offers a number of search methods, some of which are implemented in its abstract parent class Searcher. The simplest takes a Query object and an int topN count as parameters and returns a TopDocs object.

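A typical use of this method looks like this:
// Imports for the Lucene 3.x-era API used in this post
import java.io.File;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.store.*;

// Open the folder that holds the index
Directory dir = FSDirectory.open(new File("/tmp/index"));
IndexSearcher searcher = new IndexSearcher(dir);
Query q = new TermQuery(new Term("contents", "lucene"));
// Find the top n (10 here) hits for the query
TopDocs hits = searcher.search(q, 10);
searcher.close();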

#2 Term
A Term is the basic unit for searching. Similar to the Field object, it consists of a pair of string elements: the name of the field and the word (text value) of that field. Note that Term objects are also involved in the indexing process. However, they’re created by Lucene’s internals, so you typically don’t need to think about them while indexing. During searching, you may construct Term objects and use them together with TermQuery.
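To make the "pair of strings" idea concrete, here is a minimal sketch in plain Java. TermSketch, its lookup method, and the postings map are all hypothetical names invented for illustration; this is a toy model, not Lucene's Term class or its index format. It assumes only that a term is identified by both its field name and its text, so the same word in a different field is a different term.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Toy model of a (field, text) term; a hypothetical class, not Lucene's Term. */
final class TermSketch {
    final String field;
    final String text;

    TermSketch(String field, String text) {
        this.field = Objects.requireNonNull(field);
        this.text = Objects.requireNonNull(text);
    }

    // Two terms are equal only if both the field name and the text match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TermSketch)) return false;
        TermSketch other = (TermSketch) o;
        return field.equals(other.field) && text.equals(other.text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, text);
    }

    @Override
    public String toString() {
        return field + ":" + text;
    }

    /** Looks up the doc IDs recorded for a term in a toy postings map. */
    static List<Integer> lookup(Map<TermSketch, List<Integer>> postings, TermSketch term) {
        return postings.getOrDefault(term, List.of());
    }

    public static void main(String[] args) {
        // Assumed toy data: which documents contain which term.
        Map<TermSketch, List<Integer>> postings = new HashMap<>();
        postings.put(new TermSketch("contents", "lucene"), List.of(0, 2));
        postings.put(new TermSketch("title", "lucene"), List.of(2));

        // Same word, different field: the two lookups are independent.
        System.out.println(lookup(postings, new TermSketch("contents", "lucene")));
        System.out.println(lookup(postings, new TermSketch("author", "lucene")));
    }
}
```

The point of the sketch is only that the field name is part of the term's identity, which is why a search for "lucene" in the contents field says nothing about the title field.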

#3 Query
Lucene comes with a number of concrete Query subclasses, the most basic of which is TermQuery. Other Query types are BooleanQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, TermRangeQuery, NumericRangeQuery, FilteredQuery, and SpanQuery. Query is their common, abstract parent class. It contains several utility methods, the most interesting of which is setBoost(float), which lets you tell Lucene that certain subqueries should contribute more strongly to the final relevance score than others.
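As a rough intuition for what a boost does, the sketch below uses a deliberately simplified model in which the combined score is the sum of each subquery's raw score multiplied by its boost. This is an assumption made for illustration only: Lucene's real scoring formula has more factors, and BoostSketch and combine are hypothetical names, not part of Lucene.

```java
/** Toy model of boosted subquery scores; not Lucene's scoring formula. */
final class BoostSketch {

    /** Sums each subquery's raw score multiplied by its boost (simplified model). */
    static float combine(float[] rawScores, float[] boosts) {
        if (rawScores.length != boosts.length) {
            throw new IllegalArgumentException("one boost is required per subquery");
        }
        float total = 0f;
        for (int i = 0; i < rawScores.length; i++) {
            total += rawScores[i] * boosts[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Two subqueries that match equally well (assumed raw scores).
        float[] raw = {0.5f, 0.5f};

        // With equal boosts both subqueries contribute the same amount.
        System.out.println(combine(raw, new float[] {1f, 1f}));

        // Doubling the second boost doubles that subquery's contribution.
        System.out.println(combine(raw, new float[] {1f, 2f}));
    }
}
```

In this toy model a boost of 1 leaves a subquery's contribution unchanged, and a larger boost scales it up relative to the others.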

#4 TermQuery
TermQuery is the most basic type of query supported by Lucene, and it’s one of the primitive query types. It’s used for matching documents that contain fields with specific values, as you’ve seen in the last few paragraphs. Finally, wrapping up our brief tour of the core classes used for searching, we touch on TopDocs, which represents the result set returned by searching.

#5 TopDocs
The TopDocs class is a simple container of pointers to the top N ranked search results—documents that match a given query. For each of the top N results, TopDocs records the int docID (which you can use to retrieve the document) as well as the float score.
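The idea of keeping only the top N (docID, score) pairs can be sketched in plain Java with a bounded min-heap. TopNSketch and Hit are hypothetical names, and the rules used here (a score of zero or less means "no match", ties go to the lower docID) are assumptions of this sketch rather than a description of Lucene's internals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy top-N collector over (docID, score) pairs; not Lucene's TopDocs. */
final class TopNSketch {

    /** One ranked result: an int docID and a float score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Orders hits worst-first: lowest score first and, for equal scores,
    // the higher docID first (assumed tie-break).
    private static final Comparator<Hit> WORST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    /** Returns at most n hits, best first. scores[i] is the score of docID i. */
    static List<Hit> topN(float[] scores, int n) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(Math.max(1, n), WORST_FIRST);
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] <= 0f) {
                continue; // assumed convention: non-positive score means no match
            }
            Hit hit = new Hit(doc, scores[doc]);
            if (heap.size() < n) {
                heap.add(hit);
            } else if (n > 0 && WORST_FIRST.compare(hit, heap.peek()) > 0) {
                heap.poll(); // evict the current worst hit
                heap.add(hit);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(WORST_FIRST.reversed());
        return result;
    }

    public static void main(String[] args) {
        // Assumed toy scores for docIDs 0..4.
        float[] scores = {0.2f, 0.9f, 0f, 0.5f, 0.9f};
        for (Hit hit : topN(scores, 3)) {
            System.out.println("doc=" + hit.doc + " score=" + hit.score);
        }
    }
}
```

The heap never holds more than N entries, so memory stays bounded by N no matter how many documents match.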



