watsonhelper

1.1.1 • Public • Published

Watson-Helper

Extract phone numbers, email addresses, dates, and any custom pattern from arbitrary text.

Features:

  • Extract phone numbers in multiple formats across different countries.
  • Extract email addresses across multiple domains.
  • Extract custom information such as invoiceno, ticketid, purchaseordernumber, UID, etc.
  • Extract dates in multiple formats from natural language conversations.
  • Resolve dates relative to a specific reference date.
  • Return dates in a user-defined output format.
  • Unit tested.
  • Examples enclosed.

Background - ver 1.0.0

A recent IBM Watson customer was exploring the Watson Conversation API. For this customer, extracting phone numbers and email addresses was a key use case, and the data had to be captured during the conversation. I wrote this library to assist them. It can now help others with similar use cases extract data that is typically expressed in conversations or personal blogs. The first version addresses three key use cases; a few more are in the pipeline.

Background - ver 1.1.0

Dates come up often in conversation. You can now convert relative references such as yesterday, today, and tomorrow into concrete dates with respect to a reference date defined by the business. You can also extract dates written in the natural language forms common in conversation. The output date format can be configured up front to match what the business expects, without any extra effort.

Examples

Basic example for email, phone, and custom data extraction

Extract one or more phone numbers, email addresses, or custom data (such as invoiceno, ticketid, purchaseordernumber, UID, etc.) from user-supplied text.

var helper = require('watsonhelper');

// Extract the phone numbers found in the text
var phonelist = helper.phoneextractor("I am moving to hyderabad and my mobile number is  +919538099898, You can also call me at 08042227967");

// Extract the email addresses found in the text
var email = helper.emailextractor("I am moving to US and my email id is  shunandi@gmail.com, You can also email me at shubhradeepnandi@gmail.com");

// Extract custom data: <Extracting What> and <REGEX STRING> are placeholders for your own label and pattern
var invoiceno = helper.extractor("<Extracting What>", "Your invoice is generated it is inv1105576", <REGEX STRING>);
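
To make the idea of a custom pattern concrete, here is a minimal standalone sketch of regex-based extraction. It does not use watsonhelper and is not its implementation. The function name extractAll, the returned object shape, and the "inv followed by digits" invoice pattern are all assumptions made for illustration.

```javascript
// Hypothetical helper (not part of watsonhelper): returns every match of
// a regex string in the text, together with the label describing it.
function extractAll(label, text, regexString) {
  // 'g' finds every occurrence, 'i' ignores case.
  const regex = new RegExp(regexString, 'gi');
  return { label: label, matches: text.match(regex) || [] };
}

// Assumed invoice format: "inv" followed by one or more digits.
const invoices = extractAll(
  'invoiceno',
  'Your invoice is generated it is inv1105576',
  'inv\\d+'
);
console.log(invoices.matches); // [ 'inv1105576' ]

// A deliberately simple email pattern; real-world addresses need more care.
const emails = extractAll(
  'email',
  'my email id is shunandi@gmail.com, You can also email me at shubhradeepnandi@gmail.com',
  '[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+'
);
console.log(emails.matches.length); // 2
```

The word "invoice" in the sample text is not picked up because the assumed pattern requires a digit immediately after "inv".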

Examples to extract date from natural language conversation

Dates can now be extracted from natural language text or from the various date formats used in conversation. The business can define a reference date and specify the expected output date format.

var watsonhelper = require('watsonhelper');
var refDate1 = new Date('Thu Oct 20 2016 12:20:29 GMT+0530 (India Standard Time)'); // reference date as defined by business
var refDate2 = new Date('20-oct-2016'); // reference date as defined by business (non-ISO strings are parsed in an implementation-dependent way)
var text1 = 'I have an appointment tomorrow';
var text2 = 'I have an appointment on 23/11/16';
var text3 = 'I have an appointment on 3rd November';
var opformat1 = "dd-mm-yyyy"; // output date format as defined by business
var opformat2 = "dd-mmm-yyyy"; // output date format as defined by business

var date1 = watsonhelper.dateextractor(text1, refDate1, opformat1); // text, reference date, and output format
var date2 = watsonhelper.dateextractor(text2, null, opformat2); // no reference date, output format only
var date3 = watsonhelper.dateextractor(text3); // text only
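
The following standalone sketch illustrates what resolving a relative date against a reference date can look like. It does not call watsonhelper and makes no claim about how dateextractor works internally. The names resolveRelativeDate and formatDdMmYyyy are hypothetical, only yesterday, today, and tomorrow are handled, and the output is fixed to dd-mm-yyyy.

```javascript
// Day offsets for the relative words this sketch understands.
const DAY_OFFSETS = { yesterday: -1, today: 0, tomorrow: 1 };

function pad2(n) {
  return String(n).padStart(2, '0');
}

// Formats a Date as dd-mm-yyyy using local time.
function formatDdMmYyyy(date) {
  return pad2(date.getDate()) + '-' + pad2(date.getMonth() + 1) + '-' + date.getFullYear();
}

// Hypothetical helper: finds the first relative word in the text and
// returns the matching calendar date, or null if none is present.
function resolveRelativeDate(text, refDate) {
  const match = /\b(yesterday|today|tomorrow)\b/i.exec(text);
  if (!match) return null;
  // Copy only the calendar part so the reference date is not mutated.
  const resolved = new Date(refDate.getFullYear(), refDate.getMonth(), refDate.getDate());
  // setDate rolls over month and year boundaries automatically.
  resolved.setDate(resolved.getDate() + DAY_OFFSETS[match[1].toLowerCase()]);
  return formatDdMmYyyy(resolved);
}

const ref = new Date(2016, 9, 20); // 20 October 2016, local time
console.log(resolveRelativeDate('I have an appointment tomorrow', ref)); // 21-10-2016
```

Building the reference date from numeric parts, as above, avoids the implementation-dependent parsing of free-form date strings.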

Limitations

This software cannot capture every imaginable combination. Substituting letters or symbols for digits is especially difficult to detect, e.g.:

  • O4!4.Ol2;341 (= 0414 012 341)

In my experience, very few users write their phone number this way. From a programming point of view it would be possible to cover edge cases like the one above, but I have chosen not to.
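
For readers who do need such edge cases, one possible pre-processing step is sketched below. It is not part of watsonhelper. The lookalike table and the function name normalizeObfuscatedNumber are assumptions, and the mapping covers only the substitutions that appear in the example above plus a couple of obvious variants.

```javascript
// Assumed lookalike table: characters commonly typed in place of digits.
const LOOKALIKES = new Map([
  ['O', '0'],
  ['o', '0'],
  ['l', '1'],
  ['I', '1'],
  ['!', '1'],
]);

// Hypothetical helper: maps lookalike characters to digits, then drops
// everything that is still not a digit (separators, punctuation).
function normalizeObfuscatedNumber(raw) {
  return Array.from(raw)
    .map(function (ch) { return LOOKALIKES.has(ch) ? LOOKALIKES.get(ch) : ch; })
    .filter(function (ch) { return ch >= '0' && ch <= '9'; })
    .join('');
}

console.log(normalizeObfuscatedNumber('O4!4.Ol2;341')); // 0414012341
```

A step like this is only safe on a span already suspected to be a phone number. Applied to ordinary text it would turn letters in normal words into digits and produce false positives.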

Issues, bug reports

https://github.com/ShubhradeepNandi/watsonhelper/issues

License

http://www.apache.org/licenses/LICENSE-2.0

Collaborate and Social

https://ibm.biz/BdrUsn

Install

npm i watsonhelper

Collaborators

  • shubhra