Skip to main content

Journal of Learning Apache Lucene - the core searching classes

The basic search interface that Lucene provides is as straightforward as the one for
indexing. Only a few classes are needed to perform the basic search operation:

  • IndexSearcher
  • Term
  • Query
  • TermQuery
  • TopDocs

#1 IndexSearcher
IndexSearcher is to searching what IndexWriter is to indexing: the central link to the  index that exposes several search methods. You can think of IndexSearcher as a class that opens an index in a read-only mode. It requires a Directory instance, holding the previously created index, and then offers a number of search methods, some of which are implemented in its abstract parent class Searcher; the simplest takes a Query object and an int topN count as parameters and returns a TopDocs object.

A typical use of this method looks like this:
// open the folder holds the index
Directory dir = FSDirectory.open(new File("/tmp/index"));
IndexSearcher searcher = new IndexSearcher(dir);
Query q = new TermQuery(new Term("contents", "lucene"));
// Finds the top n(10 here) hits for query.
TopDocs hits = searcher.search(q, 10);
searcher.close();

#2 Term
A Term is the basic unit for searching. Similar to the Field object, it consists of a pair of string elements: the name of the field and the word (text value) of that field. Note that Term objects are also involved in the indexing process. However, they’re created by Lucene’s internals, so you typically don’t need to think about them while indexing. During searching, you may construct Term objects and use them together with TermQuery

#3 Query
Lucene comes with a number of concrete, the most basic Lucene Query is TermQuery. Other Query types are BooleanQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, TermRangeQuery, NumericRangeQuery, FilteredQuery, and SpanQuery. Query is the common, abstract parent class. It contains several utility methods, the most interesting of which is setBoost(float), which enables you to tell
Lucene that certain subqueries should have a stronger contribution to the final relevance score than other subqueries.

#4 TermQuery
TermQuery is the most basic type of query supported by Lucene, and it’s one of the primitive query types. It’s used for matching documents that contain fields with specific values, as you’ve seen in the last few paragraphs. Finally, wrapping up our brief tour of the core classes used for searching, we touch on TopDocs, which represents the result set returned by searching.

#5 TopDocs
The TopDocs class is a simple container of pointers to the top N ranked search results—documents that match a given query. For each of the top N results, TopDocs records the int docID (which you can use to retrieve the document) as well as the float score.




Comments

Popular posts from this blog

Live - solving the jasper report out of memory and high cpu usage problems

I still can not find the solution. So I summary all the things and tell my boss about it. If any one knows the solution, please let me know. Symptom: 1.        The JVM became Out of memory when creating big consumption report 2.        Those JRTemplateElement-instances is still there occupied even if I logged out the system Reason:         1. There is a large number of JRTemplateElement-instances cached in the memory 2.     The clearobjects() method in ReportThread class has not been triggered when logging out Action I tried:      About the Virtualizer: 1.     Replacing the JRSwapFileVirtualizer with JRFileVirtualizer 2.     Not use any FileVirtualizer for cache the report in the hard disk Result: The japserreport still creating the a large number of JRTemplateElement-instances in the memory        About the work around below,      I tried: item 3(in below work around list) – result: it helps to reduce  the size of the JRTemplateElement Object        

Stretch a row if data overflows in jasper reports

It is very common that some columns of the report need to stretch to show all the content in that column. But  if you just specify the property " stretch with overflow' to that column(we called text field in jasper report world) , it will just stretch that column and won't change other columns, so the row could be ridiculous. Haven't find the solution from internet yet. So I just review the properties in iReport one by one and find two useful properties(the bold  highlighted in example below) which resolve the problems.   example: <band height="20" splitType="Stretch" > <textField isStretchWithOverflow="true" pattern="" isBlankWhenNull="true"> <reportElement stretchType="RelativeToTallestObject" mode="Opaque" x="192" y="0" width="183" height="20"/> <box leftPadding="2"> <pen lineWidth="0.25"/>

JasperReports - Configuration Reference

Data Source / Query Executer net.sf.jasperreports.csv.column.names.{arbitrary_name} net.sf.jasperreports.csv.date.pattern net.sf.jasperreports.csv.encoding net.sf.jasperreports.csv.field.delimiter net.sf.jasperreports.csv.locale.code net.sf.jasperreports.csv.number.pattern net.sf.jasperreports.csv.record.delimiter net.sf.jasperreports.csv.source net.sf.jasperreports.csv.timezone.id net.sf.jasperreports.ejbql.query.hint.{hint} net.sf.jasperreports.ejbql.query.page.size net.sf.jasperreports.hql.clear.cache net.sf.jasperreports.hql.field.mapping.descriptions net.sf.jasperreports.hql.query.list.page.size net.sf.jasperreports.hql.query.run.type net.sf.jasperreports.jdbc.concurrency net.sf.jasperreports.jdbc.fetch.size net.sf.jasperreports.jdbc.holdability net.sf.jasperreports.jdbc.max.field.size net.sf.jasperreports.jdbc.result.set.type net.sf.jasperreports.query.chunk.token.separators net.sf.jasperreports.query.executer.factory.{language} net.sf.jasperreports.xpath.