This package has been deprecated

Author message:

This project is not maintained anymore.

castor-compute

2.0.0 • Public • Published

"Computer" for castor

Compute fields for the whole corpus of documents.

You can create new fields for the set of all documents by editing the ad hoc JSON configuration file (either the one located beside the data directory you give to castor, or the one you pass as a parameter).

corpusFields

All settings for custom fields go under the corpusFields key of that configuration file.

If you want to create an nbdoc field for your corpus, add:

"corpusFields": {
  "nbdoc" : {
  }
}

You can use many options (keys such as path); see Options.

Options

All of the following options go inside the corpusFields.fieldId entry you created.

operator

Operator applies one of the following operators to the given fields:

  • catalog (list the fields and their occurrences in the corpus)
  • count (number of documents with a value for this field)
  • distinct (distinct values in that field)
  • graph (co-occurrences of each distinct value of the field/s)
  • total (sum of the numeric field for all documents)
  • ventilate (?)

Optional

To test the result of an operator, use the following route in your browser: http://localhost:3000/compute.json?o=operator&f=field (replace operator by one of the previous list, and field by the name of a field). The result is in data.
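
As a convenience, the test route above can be assembled in code. This is a minimal sketch: computeUrl is a hypothetical helper, not part of castor-compute, and the host and port are simply those of the example route.

```javascript
// Hypothetical helper: builds the test URL described above.
// The default base comes from the example route; adjust it to your setup.
function computeUrl(operator, field, base = 'http://localhost:3000') {
  const url = new URL('/compute.json', base);
  url.searchParams.set('o', operator); // o = operator name
  url.searchParams.set('f', field);    // f = field name
  return url.toString();
}

console.log(computeUrl('distinct', 'extension'));
// http://localhost:3000/compute.json?o=distinct&f=extension
```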

Ex:

"corpusFields": {
  "extcount" : {
    "fields"   : [ "extension" ],
    "operator" : "distinct"
  }
}
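
To make the operator descriptions more concrete, here is a plain-JavaScript reading of three of them (count, distinct, total) over a small in-memory array. It is an illustration only: sampleDocs and the helper functions are assumptions, and the shape of the results returned by castor-compute itself may differ.

```javascript
// Illustration only: plain-JavaScript readings of three operator descriptions.
// The result shapes produced by castor-compute itself may differ.
const sampleDocs = [
  { extension: 'pdf', filesize: 100 },
  { extension: 'txt', filesize: 50 },
  { extension: 'pdf' }, // no filesize value
];

// Documents that have a value for the given field.
const withValue = (field) => sampleDocs.filter((d) => d[field] !== undefined);

// count: number of documents with a value for this field.
const count = (field) => withValue(field).length;
// distinct: distinct values in that field.
const distinct = (field) => [...new Set(withValue(field).map((d) => d[field]))];
// total: sum of the numeric field for all documents.
const total = (field) => withValue(field).reduce((sum, d) => sum + d[field], 0);

console.log(count('filesize'));     // 2
console.log(distinct('extension')); // [ 'pdf', 'txt' ]
console.log(total('filesize'));     // 150
```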

fields

Fields is an array of dot-notation paths to fields in the documents, or to already computed corpusFields.

Often used with operator, or glue.

Optional

Ex:

"sumsize" : {
  "visible" : true,
  "default" : 0,
  "label" : "Total size",
  "fields" : [
    "filesize"
  ],
  "selector" : {},
  "operator" : "total",
  "transform" : "values().first()",
  "type" : "number"
},

glue

When the computed field is an array and glue is set, the values of the array are joined into a single string, with glue inserted between consecutive values. Optional
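
The effect of glue can be sketched in a few lines. applyGlue is a hypothetical name used for illustration; it is not the package's code.

```javascript
// Sketch of the glue behaviour described above (not the package's code).
function applyGlue(value, glue) {
  // Only arrays are joined, and only when glue is set.
  if (Array.isArray(value) && glue !== undefined) {
    return value.join(glue);
  }
  return value;
}

console.log(applyGlue(['pdf', 'txt', 'xml'], ', ')); // pdf, txt, xml
console.log(applyGlue('pdf', ', '));                 // pdf (not an array, left as-is)
```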

label

Label: the name of the field as displayed in the pages of the application. Any UTF-8 string is accepted. Optional

Ex:

"corpusFields" : {
  "doi" : {
    "label" : "Document Object Identifier"
  }
}

The label value can take several forms:

  • array of objects: [{ "lang" : "XX", "$t": "The label" }]
  • object: { "en" : "Hello", "fr": "Bonjour" }
  • string

default

Default value, used when the field has no value (for example when the path is not present in the document). Optional

Ex:

"corpusFields" : {
  "title" : {
    "default" : "No title given"
  }
}
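
The fallback to default when a path is absent can be pictured as follows. This is a minimal sketch: resolvePath is a hypothetical helper, and the sample document is invented for the example.

```javascript
// Sketch: resolve a dot-notation path and fall back to a default value.
// resolvePath is a hypothetical helper, not the package's API.
function resolvePath(doc, path, fallback) {
  const value = path.split('.').reduce(
    // Stop descending as soon as an intermediate node is missing.
    (node, key) => (node == null ? undefined : node[key]),
    doc
  );
  return value === undefined ? fallback : value;
}

const titled = { content: { json: { title: 'Castor' } } };
console.log(resolvePath(titled, 'content.json.title', 'No title given')); // Castor
console.log(resolvePath({}, 'content.json.title', 'No title given'));     // No title given
```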

transform

Apply any of the following operations, chained one after another, to the field's value. Optional

  • first
  • last
  • uniq
  • at index
  • max
  • min
  • pluck field
  • filter
  • reject
  • sample n
  • shuffle
  • size
  • where object
  • invert
  • keys
  • values
  • escape
  • unescape
  • capitalize
  • clean
  • count substring
  • titleize
  • humanize
  • trim
  • ltrim
  • rtrim
  • truncate length
  • pad length
  • lpad length
  • rpad length
  • center length
  • slugify

Ex:

"corpusFields" : {
  "slug" : {
    "path" : "content.json.title",
    "transform" : "slugify()"
  },
  "sumsize" : {
    "fields" : [
      "filesize"
    ],
    "operator" : "total",
    "transform" : "values().first()"
  },
  "maxsize" : {
    "fields" : [
      "filesize"
    ],
    "operator" : "distinct",
    "transform" : "pluck('_id').max()"
  }
}
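
For readers unfamiliar with chained transforms, here is a plain-JavaScript reading of the chain "pluck('_id').max()" used for maxsize. It assumes the operator result is an array of objects carrying an _id, which the source does not state explicitly; pluck and max below are local stand-ins, and distinctSizes is invented data.

```javascript
// Plain-JavaScript reading of the chain "pluck('_id').max()".
// Assumption: the operator result is an array of objects carrying an _id.
const distinctSizes = [{ _id: 120 }, { _id: 4096 }, { _id: 512 }];

// pluck: extract one property from every element.
const pluck = (rows, key) => rows.map((row) => row[key]);
// max: largest value of a numeric array.
const max = (values) => Math.max(...values);

// Each step receives the result of the previous one.
console.log(max(pluck(distinctSizes, '_id'))); // 4096
```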

selector

Replaces the default selector (which excludes documents whose status is either hidden or deleted). It uses the syntax of a MongoDB find criteria. Optional

Ex:

  "selector": {
    "state": {
      "$nin": [ "deleted", "hidden" ]
    }
  }

Or:

  "selector": {
    "content.json.University": "MIT"
  }
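
To show what such selectors select, here is a toy matcher that handles only the two forms used above (equality on a dot-notation path, and $nin). It is a sketch for illustration; the real filtering is done by MongoDB, whose query language is far richer, and the names lookup, matchesSelector and corpus are hypothetical.

```javascript
// Toy matcher covering only equality and $nin, for illustration.
// Real selectors are evaluated by MongoDB, not by code like this.
const lookup = (doc, path) =>
  path.split('.').reduce((node, key) => (node == null ? undefined : node[key]), doc);

function matchesSelector(doc, selector) {
  return Object.entries(selector).every(([path, cond]) => {
    const value = lookup(doc, path);
    if (cond !== null && typeof cond === 'object' && '$nin' in cond) {
      // $nin: keep the document unless its value is in the list.
      return !cond.$nin.includes(value);
    }
    return value === cond; // plain equality
  });
}

const corpus = [
  { state: 'inserted', content: { json: { University: 'MIT' } } },
  { state: 'deleted', content: { json: { University: 'MIT' } } },
  { state: 'hidden', content: { json: { University: 'Other' } } },
];

const notRemoved = { state: { $nin: ['deleted', 'hidden'] } };
const onlyMit = { 'content.json.University': 'MIT' };

console.log(corpus.filter((d) => matchesSelector(d, notRemoved)).length); // 1
console.log(corpus.filter((d) => matchesSelector(d, onlyMit)).length);    // 2
```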

type

Cast the custom field value to another type, so that compute or a filter can use it as something other than a string. Possible values: boolean, string, text, number, date. Optional

Ex:

"corpusFields" : {
  "Year" :  {
    "type" : "number"
  }
},

pattern

Mask (or pattern) used to validate the field's value. Optional

Values depend on type:

  • REGEX for text and string
  • date format for date

compute

Evaluate a funex expression over the corpusFields that have already been generated. Within the expression, corpusFields.Year is accessible simply as Year.

Ex (sumsize and nbfiles must be computed before avgsize, that is, declared within corpusFields ahead of avgsize):

"corpusFields" : {
    "avgsize" : {
        "visible" : true,
        "default" : 0,
        "label" : "Average size",
        "compute" : "sumsize / nbfiles",
        "type" : "number"
    }
},
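
The ordering requirement can be illustrated with a small stand-in evaluator. Note the assumptions: funex is not used here (each compute string is evaluated as a plain JavaScript expression instead), and the values given for sumsize and nbfiles are invented.

```javascript
// Stand-in for the ordering rule above. funex is NOT used here: each
// "compute" string is evaluated as a plain JavaScript expression instead.
const fieldDefs = {
  sumsize: { value: 3000 }, // hypothetical, already computed
  nbfiles: { value: 12 },   // hypothetical, already computed
  avgsize: { compute: 'sumsize / nbfiles' },
};

const computedFields = {};
// Fields are processed in declaration order.
for (const [name, def] of Object.entries(fieldDefs)) {
  if (def.compute === undefined) {
    computedFields[name] = def.value;
    continue;
  }
  // Only fields declared earlier are visible to the expression,
  // each one under its own name.
  const names = Object.keys(computedFields);
  const evaluate = new Function(...names, `return ${def.compute};`);
  computedFields[name] = evaluate(...names.map((n) => computedFields[n]));
}

console.log(computedFields.avgsize); // 250
```

Had avgsize been declared first in this sketch, the expression would fail because sumsize and nbfiles would not yet be defined.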

visible

Set visible to true to indicate that this custom field should appear whenever the theme needs to display the current field. Optional (default value: false)

Ex:

"corpusFields" : {
  "Authors" :  {
    "visible" : true
  }
},

mapping

Maps each value of a corpus field to a static value. The static values can be declared as a hash table or as an array. Optional

Ex with array:

"corpusFields" : {
  "Month" :  {
    "default" : "0",
    "type": "number",
    "mapping" : [
        "janvier",
        "février",
        "mars",
        "avril",
        "mai",
        "juin",
        "juillet",
        "août",
        "septembre",
        "octobre",
        "novembre",
        "décembre"
      ]
  }
},

Ex with hash table:

"corpusFields" : {
  "Month" :  {
    "path" : "content.json.month",
    "mapping" : {
        "JAN" : "janvier",
        "FEV" : "février",
        "MAR" : "mars",
        "AVR" : "avril",
        "MAI" : "mai",
        "JUN" : "juin",
        "JUL" : "juillet",
        "AOU" : "août",
        "SEP" : "septembre",
        "OCT" : "octobre",
        "NOV" : "novembre",
        "DEC" : "décembre"
      }
  }
},
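
Both mapping forms can be pictured as a simple lookup. This sketch rests on two assumptions the source does not confirm: array mappings are indexed from zero, and values without a mapping are left unchanged. applyMapping is a hypothetical name.

```javascript
// Sketch of a mapping lookup. Assumptions (not confirmed by the source):
// array mappings are zero-based, and unmapped values are left unchanged.
function applyMapping(value, mapping) {
  // Property access works for both arrays (numeric index) and hash tables (key).
  const mapped = mapping[value];
  return mapped === undefined ? value : mapped;
}

const monthsByIndex = ['janvier', 'février', 'mars'];
const monthsByCode = { JAN: 'janvier', FEV: 'février', MAR: 'mars' };

console.log(applyMapping(1, monthsByIndex));    // février
console.log(applyMapping('FEV', monthsByCode)); // février
console.log(applyMapping('XYZ', monthsByCode)); // XYZ (no mapping, unchanged)
```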

Install

npm i castor-compute

License

MIT

Collaborators

  • touv