
    @digitallinguistics/concordance

    0.4.0 • Public • Published

    Concordance


    The Digital Linguistics (DLx) Concordance library is a Node.js library for creating a concordance of words in a corpus (a collection of texts in a language) formatted according to the Data Format for Digital Linguistics (DaFoDiL), a JSON-based format. It is useful for anyone doing research with linguistic corpora. If your data are not yet in DaFoDiL format, several converters are available here.

    This library produces a tab-delimited file containing information about each token (instance) of the specified words. Optionally, the concordance can be generated in Keyword in Context (KWIC) format, which lists each token along with its immediately preceding and following context. Below is a partial KWIC concordance of the word "little" in The Three Little Pigs.

    text  utterance  word  pre                         token   post
    3LP   1          14    mother pig who had three    little  pigs and not enough food
    3LP   3          3     The first                   little  pig was very lazy.
    3LP   5          3     The second                  little  pig worked a little bit
    3LP   5          7     second little pig worked a  little  bit harder but he was
    3LP   7          3     The third                   little  pig worked hard all day
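    The pre, token, and post columns above can be illustrated with a minimal sketch. The functions kwicRows and toTSV are hypothetical and not part of this library's API. The sketch assumes each utterance has already been split into an array of word strings, numbers words from 1 (as the word column does), and matches the keyword case-insensitively; the library's own tokenization and matching may differ.

```javascript
// Hypothetical sketch (not the library's implementation): build KWIC rows
// for one utterance that has already been tokenized into word strings.
// Word numbers are 1-based, as in the example table.
// Case-insensitive matching is an assumption.
function kwicRows(textID, utteranceNumber, words, keyword, context = 10) {
  const rows = [];
  words.forEach((word, i) => {
    if (word.toLowerCase() !== keyword.toLowerCase()) return;
    rows.push({
      text: textID,
      utterance: utteranceNumber,
      word: i + 1,
      // up to `context` words before the token
      pre: words.slice(Math.max(0, i - context), i).join(' '),
      token: word,
      // up to `context` words after the token
      post: words.slice(i + 1, i + 1 + context).join(' '),
    });
  });
  return rows;
}

// Serialize rows as tab-delimited lines preceded by a header row.
function toTSV(rows) {
  const columns = ['text', 'utterance', 'word', 'pre', 'token', 'post'];
  const lines = rows.map(row => columns.map(col => row[col]).join('\t'));
  return [columns.join('\t'), ...lines].join('\n');
}

const utterance = 'The second little pig worked a little bit harder but he was'.split(' ');
console.log(toTSV(kwicRows('3LP', 5, utterance, 'little', 5)));
```

    With a context of 5 words, this reproduces the two rows for utterance 5 in the table above.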

    NOTE: This project is still in its initial development phase, but should be ready for an initial release by the end of September 2019.

    Basic Usage

    The following examples process all JSON files in the current directory and write a concordance in Keyword in Context format to concordance.tsv. At a minimum, you must specify which wordforms to concordance, either as a single wordform or as a list of wordforms.

    As a module:

    const concordance = require(`@digitallinguistics/concordance`);

    const wordforms = [`little`, `big`];

    // KWIC format is off by default, so enable it explicitly
    concordance({ wordforms, KWIC: true });

    On the command line:

    dlx-conc -k --wordforms=little,big

    Note: The Keyword in Context format is not enabled by default. On the command line, enable it by passing the -k or --kwic flag; when using the module, set the KWIC option to true.
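    Both examples rely on the library finding the corpus files in the directory given by the dir option (default "."). The sketch below illustrates that discovery step under stated assumptions: only files directly inside the directory with a .json extension are read, and each is parsed as JSON. The helpers findCorpusFiles and loadCorpus are hypothetical; the library's own behavior (for example, whether it recurses into subdirectories) may differ.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical sketch: list the JSON files directly inside `dir`.
// Assumption: no recursion into subdirectories; the library may differ.
function findCorpusFiles(dir = '.') {
  return fs.readdirSync(dir, { withFileTypes: true })
    .filter(entry => entry.isFile() && path.extname(entry.name).toLowerCase() === '.json')
    .map(entry => path.join(dir, entry.name))
    .sort();
}

// Hypothetical sketch: parse each corpus file (e.g. a DaFoDiL text) as JSON.
function loadCorpus(dir = '.') {
  return findCorpusFiles(dir).map(file => JSON.parse(fs.readFileSync(file, 'utf8')));
}

console.log(findCorpusFiles('.'));
```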

    Options

    The available options are listed below.

    Module      Command Line      Default            Description
    context     -c, --context     10                 the number of words to show on either side of the token (if the KWIC option is set to true)
    dir         -d, --dir         "."                the directory where the corpus is located
    KWIC        -k, --KWIC        false              whether to create the concordance in Keyword in Context format; if true, adds pre and post columns to the concordance
    outputPath  -o, --outputPath  "concordance.tsv"  the path where the concordance file should be generated
    wordforms   -w, --wordforms   []                 a string or list of strings of words to concordance (an array when used as a module, a comma-separated list on the command line)
    wordlist    -l, --wordlist    undefined          the path to a file containing a JSON array of words to concordance
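    The defaults in the table, and the different forms that the words to concordance can take, can be sketched as follows. DEFAULTS, normalizeWordforms, readWordlist, and resolveOptions are hypothetical names used only for illustration, not the library's API. In particular, combining wordforms with the contents of wordlist (rather than letting one override the other) is an assumption, not documented behavior.

```javascript
const fs = require('fs');

// Defaults copied from the options table above.
const DEFAULTS = {
  context: 10,
  dir: '.',
  KWIC: false,
  outputPath: 'concordance.tsv',
  wordforms: [],
  wordlist: undefined,
};

// Hypothetical helper: accept an array, a single string, or a
// comma-separated string (the command-line form); always return a new array.
function normalizeWordforms(wordforms) {
  if (Array.isArray(wordforms)) return [...wordforms];
  if (typeof wordforms === 'string') {
    return wordforms.split(',').map(w => w.trim()).filter(Boolean);
  }
  return [];
}

// Hypothetical helper: read a file containing a JSON array of words.
function readWordlist(filePath) {
  const words = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  if (!Array.isArray(words) || !words.every(w => typeof w === 'string')) {
    throw new TypeError(`${filePath} must contain a JSON array of strings`);
  }
  return words;
}

// Hypothetical helper: merge user options over the defaults.
// Assumption: words from wordlist are added to wordforms, duplicates removed.
function resolveOptions(options = {}) {
  const resolved = { ...DEFAULTS, ...options };
  const words = normalizeWordforms(resolved.wordforms);
  if (resolved.wordlist) words.push(...readWordlist(resolved.wordlist));
  resolved.wordforms = [...new Set(words)];
  return resolved;
}

console.log(resolveOptions({ wordforms: 'little,big', KWIC: true }));
```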

    Contributing

    Report an issue or suggest a feature here.

    Pull requests are very welcome. Please make sure you've opened an issue for your change first.

    This library has no formal test suite, but you can check its output by running npm test, which generates a test concordance at test/concordance.tsv.
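    To inspect a generated file such as test/concordance.tsv programmatically, a short parser like the following can help. parseConcordance and countTokens are hypothetical helpers, not part of the library. The sketch assumes the first line of the file is a header row (as in the KWIC example above) and that no field contains a tab or newline; countTokens additionally assumes a token column is present.

```javascript
const fs = require('fs');

// Hypothetical helper: parse a tab-delimited concordance into one object
// per row, keyed by the column names in the header row.
// Assumes no field contains a tab or newline.
function parseConcordance(tsv) {
  const [header, ...lines] = tsv.split(/\r?\n/).filter(line => line.length > 0);
  const columns = header.split('\t');
  return lines.map(line => {
    const fields = line.split('\t');
    return Object.fromEntries(columns.map((col, i) => [col, fields[i] ?? '']));
  });
}

// Hypothetical helper: count how many tokens of each word were found
// (assumes a `token` column).
function countTokens(rows) {
  const counts = {};
  for (const { token } of rows) counts[token] = (counts[token] || 0) + 1;
  return counts;
}

const file = 'test/concordance.tsv';
if (fs.existsSync(file)) {
  console.log(countTokens(parseConcordance(fs.readFileSync(file, 'utf8'))));
}
```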

    About

    This library is authored and maintained by Daniel W. Hieber. Please consider citing this library following the model below:

    Hieber, Daniel W. 2019. digitallinguistics/concordance. DOI:10.5281/zenodo.3464144

    Install

    npm i @digitallinguistics/concordance


    License

    MIT
