js-solr-highlighter

  • A JavaScript library for highlighting HTML text based on a query in the Lucene/Solr query syntax
  • Runs in the browser or in a Node.js environment
  • Built on lucene and text-annotator

The general highlighting process is:

  1. Derive which text to highlight from a query in the lucene syntax
  2. Highlight the derived text in the HTML
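
The two steps can be pictured with a minimal, self-contained sketch. It is not the library's implementation, which builds on lucene and text-annotator. The helper names deriveTerms, escapeRegExp and highlightTerms are hypothetical, the query handling covers only bare words joined by AND or OR, and the content is assumed to be plain text without HTML tags.

```javascript
// Step 1 (simplified): derive the words to highlight from a query.
// Assumption: the query contains only bare words joined by AND / OR.
function deriveTerms(query) {
  return query.split(/\s+/).filter(function (token) {
    return token !== '' && token !== 'AND' && token !== 'OR'
  })
}

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Step 2 (simplified): wrap each whole-word, case-insensitive match in a span.
// Assumption: content is plain text without HTML tags.
function highlightTerms(terms, content) {
  if (terms.length === 0) return content
  var pattern = new RegExp('\\b(' + terms.map(escapeRegExp).join('|') + ')\\b', 'gi')
  var index = 0
  return content.replace(pattern, function (match) {
    var id = 'highlight-' + index
    index += 1
    return '<span id="' + id + '" class="highlight">' + match + '</span>'
  })
}

var terms = deriveTerms('cancer AND blood')
var result = highlightTerms(terms, 'Breast Cancer: Blood Profiles')
console.log(result)
```

Keeping the derivation of terms separate from the markup step mirrors the process described above: the first function knows only about the query, the second only about the content.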

An example from Europe PMC

js-solr-highlighter has been used to highlight the article titles in the search results of Europe PMC, an open science platform that enables access to a worldwide collection of life science publications. An example is https://europepmc.org/search?query=blood%20AND%20TITLE%3Acancer

Basic usage

No options

var query = 'cancer AND blood'
var content = 'Platelet Volume Is Reduced In Metastasing Breast Cancer: Blood Profiles Reveal Significant Shifts.'
var highlightedContent = highlightByQuery(query, content)
// 'Platelet Volume Is Reduced In Metastasing Breast <span id="highlight-0" class="highlight">Cancer</span>: <span id="highlight-1" class="highlight">Blood</span> Profiles Reveal Significant Shifts.'

With the validFields option, which specifies the fields that are valid in the query syntax. If it is not specified, anything of the form x:x is treated as a valid field

var query = 'TITLE:blood AND CONTENT:cell'
var content = 'A molecular map of lymph node blood vascular endothelium at single cell resolution'
var options = { validFields: ['TITLE'] }
var highlightedContent = highlightByQuery(query, content, options)
// 'A molecular map of lymph node <span id="highlight-0" class="highlight">blood</span> vascular endothelium at single cell resolution'
// "cell" will not be highlighted

With the highlightedFields option, which specifies the valid fields whose values will be highlighted. If it is not specified, the values of all valid fields will be highlighted

var query = 'TITLE:blood OR CONTENT:cell'
var content = 'A molecular map of lymph node blood vascular endothelium at single cell resolution'
var options = { validFields: ['TITLE', 'CONTENT'], highlightedFields: ['CONTENT'] }
var highlightedContent = highlightByQuery(query, content, options)
// 'A molecular map of lymph node blood vascular endothelium at single <span id="highlight-0" class="highlight">cell</span> resolution'
// "blood" will not be highlighted
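
How validFields and highlightedFields narrow down what gets highlighted can be sketched as a small filter. This is only an illustration under stated assumptions, not the library's code: valuesToHighlight is a hypothetical helper, the query is assumed to be a flat list of words or FIELD:value pairs joined by AND or OR, and clauses with a field that is not valid are simply skipped.

```javascript
// Hypothetical helper: pick the values to highlight from a simple query,
// following the descriptions of validFields and highlightedFields.
// Assumption: no quotes, parentheses, or NOT in the query.
function valuesToHighlight(query, options) {
  var validFields = options && options.validFields
  var highlightedFields = options && options.highlightedFields
  var values = []
  query.split(/\s+/).forEach(function (token) {
    if (token === '' || token === 'AND' || token === 'OR') return
    var colon = token.indexOf(':')
    if (colon <= 0) {
      values.push(token) // plain text without a field
      return
    }
    var field = token.slice(0, colon)
    var value = token.slice(colon + 1)
    // With validFields undefined, anything shaped like x:x counts as a field.
    var isValid = !validFields || validFields.indexOf(field) !== -1
    if (!isValid) return // assumption of this sketch: skip clauses with unknown fields
    // With highlightedFields undefined, every valid field is highlighted.
    if (!highlightedFields || highlightedFields.indexOf(field) !== -1) {
      values.push(value)
    }
  })
  return values
}

console.log(valuesToHighlight('TITLE:blood AND CONTENT:cell', { validFields: ['TITLE'] }))
console.log(valuesToHighlight('TITLE:blood OR CONTENT:cell', {
  validFields: ['TITLE', 'CONTENT'],
  highlightedFields: ['CONTENT']
}))
```

The two calls reuse the queries and options from the examples above and yield the values that those examples highlight: blood in the first case, cell in the second.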

Options

| Field | Type | Description |
| --- | --- | --- |
| validFields | array | The fields that are parsed as fields. If undefined, anything of the form x:x is parsed as a field. |
| highlightedFields | array | The fields among validFields whose values will be highlighted. If undefined, the values of all valid fields will be highlighted. |
| highlightAll | boolean | Whether to highlight all occurrences of the text or only the first occurrence found. If undefined, it is true. |
| highlightIdPattern | string | The common prefix of the IDs associated with the highlights in the HTML. A highlight ID consists of highlightIdPattern followed by the index of the highlight, such as "highlight-0", "highlight-1" and so on. If undefined, it is "highlight-". |
| highlightClass | string | The class name of every highlight in the HTML. If undefined, it is "highlight". |
| caseSensitive | boolean | Whether highlighting is case sensitive. If undefined, it is false (case is ignored). |
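
The effect of the markup-related options can be illustrated with a small stand-alone function. This is a sketch under assumptions, not the library's API: wrapWord is a hypothetical helper, it handles a single word made of letters or digits, and it treats the content as plain text. Only the option names and their defaults are taken from the descriptions above.

```javascript
// Hypothetical helper showing how the options could shape the markup.
// Defaults follow the documented ones: highlightAll true,
// highlightIdPattern "highlight-", highlightClass "highlight", caseSensitive false.
function wrapWord(content, word, options) {
  var opts = options || {}
  var highlightAll = opts.highlightAll !== false
  var idPattern = opts.highlightIdPattern || 'highlight-'
  var className = opts.highlightClass || 'highlight'
  var flags = (highlightAll ? 'g' : '') + (opts.caseSensitive ? '' : 'i')
  // Assumption: word contains only letters or digits, so no regex escaping is needed.
  var pattern = new RegExp('\\b' + word + '\\b', flags)
  var index = 0
  return content.replace(pattern, function (match) {
    var id = idPattern + index
    index += 1
    return '<span id="' + id + '" class="' + className + '">' + match + '</span>'
  })
}

var text = 'Blood cells and blood vessels'
// Defaults: both occurrences are wrapped, case is ignored.
console.log(wrapWord(text, 'blood'))
// Only the first occurrence, with a custom class and ID prefix.
console.log(wrapWord(text, 'blood', { highlightAll: false, highlightClass: 'hit', highlightIdPattern: 'hit-' }))
// Case-sensitive matching skips the capitalized "Blood".
console.log(wrapWord(text, 'blood', { caseSensitive: true }))
```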

Highlighting rules

| Rule | Examples |
| --- | --- |
| If the query has only text and no fields, highlight each word in it. | If the query is 'methylation test', methylation and test will be highlighted if they appear in the content. |
| If the field is valid, highlight its value. | If the query is 'TITLE:blood' and TITLE is a valid field, highlight blood if it appears in the content. |
| Do not highlight part of a word in the content. | If the query is 'bloo' and the content has no such word but has the word blood, do not highlight bloo in blood. |
| Highlight both the text and the field values that the AND or OR operator takes. | If the query is 'blood AND TITLE:cancer' and TITLE is a valid field, highlight both blood and cancer in the content if they exist. |
| Do not highlight the text or field value that the NOT operator takes. | If the query is 'NOT blood AND cancer', highlight cancer but not blood. |
| Highlight the text or field values within parentheses. | If the query is '(blood) AND (TITLE:cancer)' and TITLE is a valid field, both blood and cancer will be highlighted if possible. |
| Do not highlight Solr stop words. | If the query is 'a theory-based study', highlight the other words but not a. |
| If the text or the value of a valid field is within quotes, highlight the EXACT text/value (including stop words). | If the query is '"breast cancer"', highlight only the exact phrase breast cancer; do not highlight breast or cancer where one appears without the other next to it. |
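
Several of these rules (operators, NOT, parentheses, stop words and quoted phrases) can be illustrated with a short sketch. It is a simplified stand-in and not how the library parses queries: deriveHighlights and STOP_WORDS are hypothetical names, the stop-word list is a small illustrative subset, every FIELD: prefix is assumed to be a valid field, and NOT is assumed to apply only to the single term that follows it.

```javascript
// Illustrative subset of stop words; the list Solr actually uses is longer.
var STOP_WORDS = ['a', 'an', 'the', 'of', 'in']

// Hypothetical helper: derive the text to highlight from a query.
// Assumptions: every FIELD: prefix is a valid field, and NOT applies
// only to the single term that follows it.
function deriveHighlights(query) {
  // Tokens are quoted phrases, single parentheses, or runs of other characters.
  var tokens = query.match(/"[^"]*"|[()]|[^\s()"]+/g) || []
  var highlights = []
  var negated = false
  tokens.forEach(function (token) {
    if (token === '(' || token === ')') return // parentheses only group terms
    if (token === 'AND' || token === 'OR') return // both operands are kept
    if (token === 'NOT') {
      negated = true
      return
    }
    if (/^\w+:$/.test(token)) return // field prefix of a quoted value, e.g. TITLE:"..."
    if (negated) {
      negated = false // drop the term that NOT applies to
      return
    }
    if (token.charAt(0) === '"') {
      var phrase = token.slice(1, -1)
      if (phrase !== '') highlights.push(phrase) // exact phrase, stop words included
      return
    }
    var value = token.replace(/^\w+:/, '') // strip a FIELD: prefix if present
    if (value !== '' && STOP_WORDS.indexOf(value.toLowerCase()) === -1) {
      highlights.push(value)
    }
  })
  return highlights
}

console.log(deriveHighlights('NOT blood AND cancer'))
console.log(deriveHighlights('(blood) AND (TITLE:cancer)'))
console.log(deriveHighlights('a theory-based study'))
console.log(deriveHighlights('"breast cancer"'))
```

A quoted phrase is kept as one entry so that a later matching step can look for the exact phrase rather than its individual words.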

Contact

Zhan Huang

Install

npm i js-solr-highlighter

Version

0.8.8

License

MIT
