The purpose of this module is to provide a simple in-stream interface to process text files line by line, with a known set of fields. The initial release will allow parsing of fields with fixed widths, with future plans to add the ability to break up lines using regex patterns.
The package uses the file specification provided by the user to generate a transform stream which will parse and validate the fields. Fully validated fields will be sent along the stream, invalid lines will be emitted for the user, the line number will be attached to either of these objects for reference.
This stream will work in object mode; in those cases an additional
line event will be emitted as lines are parsed from each chunk.
fields: Is an array of field components necessary for the stream to correctly parse the file.
header(optional): Is an array of field components which make up the formatted header. If provided it will be assumed the first line of the file must be a header and will throw an
invalidevent if it could not be parsed successfully.
footer(optional): Is an array of field components which make up the formatted footer. If provided, the last line of each chunk will be checked to see if the format matches the footer. There is never an assumption of the footer being the last line in the stream so no
invalidevents will be thrown like it does for the header. However, if the last line is an invalid footer the
invalidevent would likely fire as an invalid
lineand this could be used to identify the contents of this line.
requiredHeader(optional): If true and a
headerspecification is provided, an invalid/missing header will result in an
requiredFooter(optional): If true and a
footerspecification, an invalid/missing footer will result in an
Example Options configuration
options =fields:value: <string> | <
Field Component Definition
Each field component is made up of the following attributes:
value: Default field will be named FIELD_[order || index] if value is undefined or not a string.
width: Width of this column.
regex(optional): Field will only be considered valid if a regex is defined and it matches against the parsed field.
order(optional): Defines the order of the fields as they appear in the text file; should be consistently defined for all fields or none of them, otherwise unpredictable results may occur.
scale(optional): Field will be scaled by a factor of 10, unless scale < 1 or left undefined; especially useful for flat files with no floating point specification to define the number of 0s in a field which are decimal places.
The file2json-stream package is based off a transform stream and will have the standard events related to those include
end as well as a few custom events documented here.
headeris emitted if the first line of the file includes a valid header line per the header specification.
footeris emitted if a valid footer line is identified per the footer specification.
invalidis emitted if a line is found to be an invalid format match. A type attribute is included to indicate if the invalid line was the header or line. Since it isn't known when the stream will end the footer will either be found or it won't.
erroris emitted for the usual errors found in stream. Additional errors will be sent if the header was invalid and/or not found when the headerRequired is set to true. If a footer is not found and footerRequired is set to true an error will be emitted when flushing the stream.
lineis emitted for each line as its parsed in the stream if the transform stream is configured with objectMode set to true.