Skip to main content

Journal of Learning Apache Lucene - Boosting documents and fields

Not all documents and fields are created equal—or at least you can make sure that’s the case by using boosting. Boosting may be done during indexing or during searching.

#1 Boosting Documents
Document boosting is a feature that makes such a requirement simple to implement. By default, all documents have no boost—or, rather, they all have the same boost factor of 1.0. By changing a document’s boost factor, you can instruct Lucene to consider it more or less important with respect to other documents in the index when computing relevance.

For example:
if (isImportant(lowerDomain)) {
doc.setBoost(1.5F);
} else if (isUnimportant(lowerDomain)) {
doc.setBoost(0.1F);
}

#2 Boosting fields

Just as you can boost documents, you can also boost individual fields. When you boosta document, Lucene internally uses the same boost factor to boost each of its fields. Imagine that another requirement for the email-indexing application is to consider the subject field more important than the field with a sender’s name. In other words, search matches made in the subject field should be more valuable than equivalent
matches in the senderName field in our earlier example. To achieve this behavior, we use the setBoost(float) method of the Field class:

Field subjectField = new Field("subject", subject, Field.Store.YES, Field.Index.ANALYZED);
subjectField.setBoost(1.2F);


Comments

  1. Hello Xu,
    We are a corporate training firm. Your courses are very impressive and a lot of our clients have expressed need for online training in this area. We would love to discuss if we could collaborate with you on this so as to market your courses and generate additional revenue. Pls get in touch with me at shilpa.khatana@skillofy.com if you would be interested.
    Thanks
    Shilpa Khatana

    ReplyDelete

Post a Comment

Popular posts from this blog

Stretch a row if data overflows in jasper reports

It is very common that some columns of the report need to stretch to show all the content in that column. But  if you just specify the property " stretch with overflow' to that column(we called text field in jasper report world) , it will just stretch that column and won't change other columns, so the row could be ridiculous. Haven't find the solution from internet yet. So I just review the properties in iReport one by one and find two useful properties(the bold highlighted in example below) which resolve the problems.   example:
<band height="20" splitType="Stretch"> <textField isStretchWithOverflow="true" pattern="" isBlankWhenNull="true"> <reportElement stretchType="RelativeToTallestObject" mode="Opaque" x="192" y="0" width="183" height="20"/> <box leftPadding="2"> <pen lineWidth="0.25"/> …

JasperReports - Configuration Reference

Live - solving the jasper report out of memory and high cpu usage problems

I still can not find the solution. So I summary all the things and tell my boss about it. If any one knows the solution, please let me know.


Symptom: 1.The JVM became Out of memory when creating big consumption report 2.Those JRTemplateElement-instances is still there occupied even if I logged out the system
Reason:         1. There is a large number of JRTemplateElement-instances cached in the memory 2.The clearobjects() method in ReportThread class has not been triggered when logging out
Action I tried:      About the Virtualizer: 1.Replacing the JRSwapFileVirtualizer with JRFileVirtualizer 2.Not use any FileVirtualizer for cache the report in the hard disk Result: The japserreport still creating the a large number of JRTemplateElement-instances in the memory     About the work around below,      I tried: item 3(in below work around list) – result: it helps to reduce  the size of the JRTemplateElement Object                Item 4,5 – result : it helps a lot to reduce the number of  JRTemplateE…