document-ingestion

1.1.0 • Public • Published

Elasticsearch document ingestion

Build Status

The ingestion library assumes the document ingestion plugin from Elasticsearch has been installed.

This app is composed of two parts - the library and ingestion functions.

The core library located at /lib contains the api to interact with Elasticsearch.

Running

The cleanIngest and alias functions take a settings object. It is in this format:

settings = {
  index: 'myindex',
  alias: 'alias',
  host: 'http://myelasticsearchinstance:9200',
  analyzer: {},
  documentObject: {
    "elastictype": [
      {
        "lastSaveDate": "2015-06-10",
        "file": "Season four episode eight.pdf",
        "title": "Season four episode eight"
      },{
        "lastSaveDate": "2013-05-10",
        "file": "Season two episode two.pdf",
        "title": "Season two episode two"
      }
    ]
  }
}

Note the documentObject is an array of the documents to be ingested. They will be into ingested into the index with the type specified in the settings object.

cleanIngest(config)

A clean ingest deletes the index (which you specify in the config), ingests documents, and optionally aliases an index.

let analyzer = JSON.parse(fs.readFileSync('./analyzer.json', 'utf8'));
let pipeline = JSON.parse(fs.readFileSync('./pipeline.json', 'utf8'));
let documentObject = JSON.parse(fs.readFileSync('./documentObject.json', 'utf8'));

let ingest = require('../');

ingest.cleanIngest({
  index: 'myindex',
  host: 'http://localhost:9200',
  analyzer: analyzer,
  pipeline: pipeline,
  documentObject: documentObject
}).then(result => console.log).catch(error => console.log);

alias(config)

This creates an index, aliases it, and deletes previous index.

let analyzer = require('./analyzer.json');
let pipeline = require('./pipeline.json');
let documentObject = require('./documentObject.json');

let ingest = require('document-ingestion');

ingest.alias({
  index: `myindex-${Date.now()}`,
  host: 'http://localhost:9200',
  alias: 'myalias',
  analyzer: analyzer,
  pipeline: pipeline,
  documentObject: documentObject,
}).then(console.log).catch(console.error);

API

A short cut api is exposed too:

getClient(host)

Returns an elasticsearch client bound to the host provided e.g 'http://localhost:9200'. If the same domain is passed, a cached client is returned.

getPipeline(client, pipelineObject)

Pipeline is an object, if pipeline.body exists, it is added to elasticsearch. If no body attribute it will assume that a pipeline with name pipeLine.name already exists in elasticsearch.

Example we need for an attachment ingestion:

{
  "name": "attachment",
  "body": {
    "description" : "Attachment ingestion",
    "processors" : [{
      "attachment" : {
        "field" : "data"
      }
    }]
  }
}

createIndex(client, index, analyzer)

deleteIndex(client, index)

getAliasIndex(client, alias)

When we want to switch aliases we need to know which index currently aliases. This function takes an elasticsearch client, and an alias, returns a promise which returns the name of the current index or null if the alias does not exist.

Readme

Keywords

none

Package Sidebar

Install

npm i document-ingestion

Weekly Downloads

2

Version

1.1.0

License

MIT

Unpacked Size

20.1 kB

Total Files

11

Last publish

Collaborators

  • roppa_uk