@chcaa/text-search

1.7.1 • Public • Published

Text Search

Text Search makes it easier to use Elasticsearch for the most common text-search use cases.

Installation

  • npm install @chcaa/text-search
  • Install Elasticsearch >= 7.10 with the ICU Analysis plugin (analysis-icu) and the Ingest Attachment Processor plugin (ingest-attachment).
  • Create a search-settings.js file in the root folder of the project (where package.json is located). Copy /settings/search-settings-template.js, rename it, and fill in the required fields.
  • To view a demo, clone the git repository, start Elasticsearch, run /test/web/bin/www, and open a browser at localhost:3000.

config (search-settings.js)

  • dataDir: the path to the Elasticsearch data directory.
    • On Linux, create two sub-directories, "textdb-index-files" and "textdb-resources", and give both the app and Elasticsearch read and write access to them, including files and directories created later. One way to do this is to make Elasticsearch (the elasticsearch Linux user) the owner and assign a group that the app belongs to. Remember to set the setgid bit on the directories so that files and directories created later inherit the group. (On Windows the directories are created automatically.)
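
The Linux directory setup above can be verified from the app's side before starting. The following is a minimal sketch that uses only Node's built-in fs and path modules; the function name findUnusableSubDirs is hypothetical and not part of @chcaa/text-search. It only checks access for the user running the script, so access for the elasticsearch user still has to be checked separately.

```javascript
const fs = require('fs');
const path = require('path');

// Sub-directory names taken from the config notes above.
const REQUIRED_SUB_DIRS = ['textdb-index-files', 'textdb-resources'];

/**
 * Returns the names of the required sub-directories of dataDir that are
 * missing, are not directories, or are not readable and writable by the
 * current process. An empty array means everything looks usable.
 */
function findUnusableSubDirs(dataDir) {
    return REQUIRED_SUB_DIRS.filter((name) => {
        const dir = path.join(dataDir, name);
        try {
            if (!fs.statSync(dir).isDirectory()) {
                return true; // exists but is not a directory
            }
            fs.accessSync(dir, fs.constants.R_OK | fs.constants.W_OK);
            return false; // directory exists and is readable and writable
        } catch (err) {
            return true; // missing, or access denied
        }
    });
}
```

Call it with the same path that dataDir points to in search-settings.js and log or abort if the returned array is not empty.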

Usage

These are just a few simple examples to get started; see the code documentation for more examples and configuration options.

Search

const { Search } = require('@chcaa/text-search');

// Initialises the ES connection using the search-settings.js file (see above).
// This should only be done once per application.
Search.initEsConnection();

// The calls below use await, so run them inside an async function.
let simpleSchema = await Search.createIndexIfNotExists('simple-test', {
    language: Search.language.ENGLISH,
    fields: [
        { name: 'title', type: 'text', sortable: true, boost: 5, indexed: true },
        { name: 'desc', type: 'text', sortable: true, boost: 1, indexed: true },
        { name: 'date', type: 'date', sortable: true, boost: 1, indexed: true },
        { name: 'author.title', type: 'text', sortable: true, boost: 1, indexed: true },
        { name: 'author.name', type: 'text', sortable: true, boost: 1, indexed: true },
        { name: 'author.address.zip', type: 'text', sortable: true, boost: 1, indexed: true },
        { name: 'author.address.street', type: 'text', sortable: true, boost: 1, indexed: true }
    ]
});

//await simpleSchema.dropIndex(); // drops the index but keeps resource files
//await Search.deleteIndex('simple-test'); // drops index and deletes resource files

await simpleSchema.setSynonyms([
    'aristocats, cartoons',
    'face hugger, facehugger, alien',
    'easter egg, groundhog day'
]);

await simpleSchema.index({ id: 1, body:{ title: 'Aliens', desc: '🙂', author: { title: 'Writer', name: 'john', address: { zip: '2000', street: 'Astreet' } } } });
await simpleSchema.index({ id: 2, body:{ title: 'Aristocats <3 😀', desc: '🙂', author: { title: 'Hero', name: 'Eric', address:{ zip: '3000', street: 'Bstreet'} } } });
await simpleSchema.index({ id: 3, permissions: { public: false, users: 'peter', groups: ['group1'] }, body: { title: 'Iill', desc: '🙂', author: { title: 'Bus driver', name: 'Joe', address: { zip: '4000', street: 'Cstreet' } } } });

let searchRes = await simpleSchema.find('alien', {
    pagination: { page: 1, maxResults: 3 },
    includeSource: false,
    authorization: {
        user: 'peter',
        groups: ['group1']
    },
    sorting: { field: 'date', order: 'desc' },
});

console.log(searchRes);
  • Elasticsearch can take a long time to start, which can cause problems after a reboot. To wait for Elasticsearch, use one of the following:
    await Search.waitForElasticSearchToComeOnline(timeoutMillis)
    await Search.waitForElasticSearchToComeOnlineAndBeReadyForSearch(timeoutMillis)
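
A startup sequence built on the wait methods above might look like the sketch below. It takes the Search export as a parameter so that it stays self-contained. The function name startWhenSearchIsReady, the 60000 ms timeout, and app.listen are hypothetical. It is also an assumption that initEsConnection() should be called before waiting, and this README does not state what the wait methods do on timeout, so check the code documentation before relying on the error handling.

```javascript
/**
 * Initialises the ES connection, waits until Elasticsearch is ready for
 * search, and then runs the given start callback.
 *
 * search:        the Search export of @chcaa/text-search
 * timeoutMillis: passed through unchanged to the wait method
 * start:         hypothetical callback that starts the rest of the app
 */
async function startWhenSearchIsReady(search, timeoutMillis, start) {
    search.initEsConnection(); // assumption: connect first, once per application
    await search.waitForElasticSearchToComeOnlineAndBeReadyForSearch(timeoutMillis);
    return start();
}

// Usage (commented out so that loading this sketch has no side effects):
// const { Search } = require('@chcaa/text-search');
// startWhenSearchIsReady(Search, 60000, () => app.listen(3000))
//     .catch((err) => {
//         console.error('Elasticsearch was not ready:', err);
//         process.exit(1);
//     });
```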

FileToText

Extract text from various file types using Tika through Elasticsearch.

const { FileToText } = require('@chcaa/text-search');

// Initialises the ES connection using the search-settings.js file (see the installation and config notes).
// This should only be done once per application.
FileToText.initEsConnection();

let ftt = new FileToText();

await ftt.init();
let response = await ftt.extractText([
    //'D:/temp/reddit/no-new-normal-2020/submissions/NoNewNormal.ndjson',
    'D:/temp/reddit/new_thread_gts6iv.zip',
    'D:/desktop/Vejledningsplan_MortenKüseler_Udfyldt.pdf',
    'D:/desktop/ETA.html',
    'D:/desktop/user-tokens.json',
]);
for (let text of response) {
    console.log(text.file, text.data);
}
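
The extracted text can be fed straight into a search index. The sketch below combines extractText() with the index() call from the Search example. The function name indexFiles and the field names file and content are hypothetical; the fields would have to be declared in the schema passed to Search.createIndexIfNotExists(). It assumes, as in the loop above, that each result has a file and a data property.

```javascript
/**
 * Extracts text from the given files and indexes one document per file.
 *
 * ftt:    an initialised FileToText instance (after `await ftt.init()`)
 * schema: the object returned by Search.createIndexIfNotExists()
 * files:  array of file paths
 *
 * Returns the number of documents indexed.
 */
async function indexFiles(ftt, schema, files) {
    const extracted = await ftt.extractText(files);
    let id = 1; // ids restart at 1 on every call; use real ids in practice
    for (const text of extracted) {
        // `file` and `content` are hypothetical field names
        await schema.index({ id: id, body: { file: text.file, content: text.data } });
        id++;
    }
    return id - 1;
}
```

Because the ids restart at 1 on each call, a second call would reuse ids from the first; derive ids from your own data if that matters.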

Maintenance

From time to time the index needs to be updated, for example when a change to the mapping is required or when Search has been updated to a new version that requires changes to the index.

Search#reindex()

Reindexes the currently indexed data with a new schema definition. Use this when changes to "index-time" mappings (see the documentation for Search.createIndex) need to be made.

let initialSchema = {
    language: Search.language.ENGLISH,
    fields: [
        { name: 'title', type: 'text', sortable: true, boost: 5, indexed: true }
    ]
};
let search = await Search.createIndexIfNotExists('test-index', initialSchema); // we have done this in the past

// We want the language to be Danish and to enable content assist, so we add both to the schema.
// A language change and content assist both require reindexing.
let newSchema = {
    language: Search.language.DANISH, // this has changed
    fields: [
        { name: 'title', type: 'text', sortable: true, boost: 5, indexed: true, contentAssist: true } // contentAssist is added
    ]
};

await search.reindex(newSchema);

Changes to "query-time" mappings are applied automatically if changes are detected in Search.createIndexIfNotExists(), or can be applied manually by calling Search#updateQueryTimeFieldSettings().

Search.upgradeIndex()

When upgrading Search to a new version from npm, an index upgrade may be required to enable the latest features. Loading an index that needs upgrading throws an error saying that an upgrade is required, so an out-of-date index cannot be opened. Use Search.isUpgradeIndexRequired() to test whether an upgrade is required before loading the index.

To upgrade, run the code below and wait; this can take a long time if a re-index is required.

await Search.upgradeIndex('test-index');
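
The check and the upgrade can be combined so that the upgrade only runs when needed. This is a sketch under an assumption: the README does not give the signature of Search.isUpgradeIndexRequired(), so the code assumes it takes the index name and yields a boolean (directly or through a promise). The function name upgradeIndexIfRequired is hypothetical.

```javascript
/**
 * Upgrades the index only if Search reports that an upgrade is required.
 *
 * search:    the Search export of @chcaa/text-search
 * indexName: name of the index, e.g. 'test-index'
 *
 * Returns true if an upgrade was performed, false otherwise.
 */
async function upgradeIndexIfRequired(search, indexName) {
    // Assumption: takes the index name and yields a boolean.
    // `await` works whether the method is synchronous or returns a promise.
    const required = await search.isUpgradeIndexRequired(indexName);
    if (required) {
        await search.upgradeIndex(indexName); // can take a long time
    }
    return Boolean(required);
}
```

Run this before Search.createIndexIfNotExists() for the same index, so the index is up to date when it is loaded.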


License

ISC

Collaborators

  • donbjarkone
  • jedglow
  • csiztom
  • pbvahlst