"Computer" for castor
Compute fields for the whole corpus of documents.
You can create new fields for the set of all documents by editing the appropriate JSON configuration file (the one located beside the data directory you give to castor, or the one you pass as a parameter).
corpusFields
All settings for custom fields go under the corpusFields key of that configuration file.
If you want to create an nbdocs key for your corpus, add:
    "corpusFields": {
      "nbdocs": { }
    }
There are many options (keys like path) you can use; see Options.
Options
All of the following options go inside the corpusFields.fieldId object you created.
operator
Applies one of the following operators to the given fields:
- catalog (list the fields and their occurrences in the corpus)
- count (number of documents with a value for this field)
- distinct (distinct values in that field)
- graph (co-occurrences of each distinct value of the field/s)
- total (sum of the numeric field for all documents)
- ventilate (?)
Optional
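To make the descriptions above concrete, here is a minimal Python sketch of what count, distinct and total mean on a small in-memory corpus. It is an illustration only: the docs list and the three helper functions are hypothetical, castor itself is not written in Python, and the shape of the values castor actually returns may differ.

```python
# Hypothetical in-memory corpus (castor reads its documents from its own store).
docs = [
    {"extension": "pdf", "filesize": 1200},
    {"extension": "txt", "filesize": 300},
    {"extension": "pdf", "filesize": 500},
    {"filesize": 40},  # this document has no "extension" value
]

def count(docs, field):
    """Number of documents that have a value for the field."""
    return sum(1 for d in docs if d.get(field) is not None)

def distinct(docs, field):
    """Sorted list of the distinct values found in the field."""
    return sorted({d[field] for d in docs if d.get(field) is not None})

def total(docs, field):
    """Sum of a numeric field over all documents (missing values count as 0)."""
    return sum(d.get(field, 0) for d in docs)

print(count(docs, "extension"))     # 3
print(distinct(docs, "extension"))  # ['pdf', 'txt']
print(total(docs, "filesize"))      # 2040
```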
To test the result of an operator, open the following route in your browser: http://localhost:3000/compute.json?o=operator&f=field (replace operator with one of the operators listed above, and field with the name of a field).
The result is in data.
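The test route can also be built programmatically. The sketch below only assembles the URL from the o and f query parameters given above; the compute_url helper is a hypothetical name, and the host and port are the ones from the example route.

```python
from urllib.parse import urlencode

def compute_url(operator, field, base="http://localhost:3000"):
    """Build the compute.json test route for an operator and a field."""
    query = urlencode({"o": operator, "f": field})
    return f"{base}/compute.json?{query}"

print(compute_url("distinct", "extension"))
# http://localhost:3000/compute.json?o=distinct&f=extension
```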
Ex:
    "corpusFields": {
      "extcount": {
        "fields": [ "extension" ],
        "operator": "distinct"
      }
    }
fields
Fields is an array of dot-notation paths to fields in the documents, or to already computed corpusFields.
Often used with operator or glue.
Optional
Ex:
    "sumsize": {
      "visible": true,
      "default": 0,
      "label": "Size total",
      "fields": [ "filesize" ],
      "selector": {},
      "operator": "total",
      "transform": "",
      "type": "number"
    }
glue
When the already computed field is an array and glue is set, the values of the array are joined into a single string, with glue inserted between consecutive values. Optional
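The joining behaviour can be sketched in a few lines of Python. The apply_glue function is a hypothetical illustration of the rule described above, not castor's code; it leaves non-array values untouched, which is an assumption.

```python
def apply_glue(value, glue=None):
    """Join an array into one string with glue; leave other values unchanged."""
    if glue is not None and isinstance(value, list):
        return glue.join(str(v) for v in value)
    return value

print(apply_glue(["Smith", "Doe", "Li"], "; "))  # Smith; Doe; Li
print(apply_glue(["a", "b"]))                    # ['a', 'b'] (no glue set)
```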
label
Label (in UTF-8, without any constraint): the name of the field to display in pages of the application. Optional
Ex:
    "corpusFields": {
      "doi": {
        "label": "Document Object Identifier"
      }
    }
Values can take several forms:
- array of objects: [{ "lang" : "XX", "$t": "The label" }]
- object: { "en" : "Hello", "fr": "Bonjour" }
- string
default
Default value, used when the field has no value (for example when the path is not present in the document). Optional
Ex:
    "corpusFields": {
      "title": {
        "default": "No title given"
      }
    }
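As an illustration of how a default interacts with a dot-notation path, here is a small Python sketch. The get_path helper and the sample document are hypothetical; the sketch only shows the fallback rule described above, under the assumption that a path is resolved key by key through nested objects.

```python
def get_path(doc, path, default=None):
    """Resolve a dot-notation path in a nested dict, or return the default."""
    current = doc
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return default  # some part of the path is missing
    return current

doc = {"content": {"json": {"title": "Castor"}}}
print(get_path(doc, "content.json.title", "No title given"))  # Castor
print(get_path({}, "content.json.title", "No title given"))   # No title given
```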
transform
Applies any of the following operations, chained one after another, to the field's value. Optional
- first
- last
- uniq
- at index
- max
- min
- pluck field
- filter
- reject
- sample n
- shuffle
- size
- where object
- invert
- keys
- values
- escape
- unescape
- capitalize
- clean
- count substring
- titleize
- humanize
- trim
- ltrim
- rtrim
- truncate length
- pad length
- lpad length
- rpad length
- center length
- slugify
Ex:
    "corpusFields": {
      "slug": {
        "path": "content.json.title",
        "transform": "slugify()"
      },
      "sumsize": {
        "fields": [ "filesize" ],
        "operator": "total",
        "transform": "values().first()"
      },
      "maxsize": {
        "fields": [ "filesize" ],
        "operator": "distinct",
        "transform": "pluck('_id').max()"
      }
    }
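Chaining means that each operation receives the result of the previous one. The Python sketch below mimics a chain such as pluck('_id').max() on made-up data; the chain, pluck and uniq helpers are hypothetical stand-ins, and the rows list is only an assumed shape for an operator result, not castor's documented output.

```python
def pluck(items, key):
    """Extract one property from every object in a list."""
    return [item[key] for item in items]

def uniq(items):
    """Remove duplicates while keeping the original order."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen

def chain(value, *steps):
    """Apply each step to the result of the previous one."""
    for step in steps:
        value = step(value)
    return value

# Hypothetical operator result: one object per distinct value.
rows = [{"_id": 300}, {"_id": 1200}, {"_id": 300}]

print(chain(rows, lambda v: pluck(v, "_id"), max))        # 1200
print(chain(rows, lambda v: pluck(v, "_id"), uniq, len))  # 2
```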
selector
Replaces the default selector (which excludes documents whose status is either hidden or deleted).
It uses the syntax of a MongoDB find criteria.
Optional
Ex:
    "selector": { "state": { "$nin": [ "deleted", "hidden" ] } }
    "selector": { "content.json.University": "MIT" }
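To show what the two selectors above select, here is a deliberately tiny Python matcher. It is a sketch that supports only plain equality and $nin on dot-notation paths, which is a small subset of MongoDB's query language; the matches and get_path helpers and the sample documents are hypothetical.

```python
_MISSING = object()  # marks a path that does not exist in the document

def get_path(doc, path):
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current

def matches(doc, selector):
    """True if the document satisfies every criterion of the selector."""
    for path, condition in selector.items():
        value = get_path(doc, path)
        if isinstance(condition, dict) and "$nin" in condition:
            # $nin also matches documents where the field is absent
            if value is not _MISSING and value in condition["$nin"]:
                return False
        elif value is _MISSING or value != condition:
            return False
    return True

docs = [
    {"state": "inserted", "content": {"json": {"University": "MIT"}}},
    {"state": "deleted", "content": {"json": {"University": "MIT"}}},
    {"content": {"json": {"University": "ETH"}}},
]
default_selector = {"state": {"$nin": ["deleted", "hidden"]}}
mit_selector = {"content.json.University": "MIT"}

print([matches(d, default_selector) for d in docs])  # [True, False, True]
print([matches(d, mit_selector) for d in docs])      # [True, True, False]
```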
type
Casts the custom field value so that it can be used by compute, or by a filter, as a type other than string. Possible values: boolean, string, text, number, date.
Optional
Ex:
    "corpusFields": {
      "Year": {
        "type": "number"
      }
    }
pattern
Mask (or pattern) used to validate the field's value. Optional
Values depend on type:
- regular expression for text and string
- date format for date
compute
Computes a funex expression on the already generated corpusFields.
You can access corpusFields.Year simply by writing Year.
Ex (where sumsize and nbfiles must have been computed before avgsize, that is to say declared within corpusFields, before avgsize):
    "corpusFields": {
      "avgsize": {
        "visible": true,
        "default": 0,
        "label": "Taille moyenne",
        "compute": "sumsize / nbfiles",
        "type": "number"
      }
    }
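The ordering constraint exists because the expression refers to other fields by name. The Python sketch below is not funex; it is a hypothetical, minimal arithmetic evaluator that resolves names against previously computed fields, using made-up values for sumsize and nbfiles.

```python
import ast
import operator

# Only the four basic arithmetic operators are supported in this sketch.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(expression, fields):
    """Evaluate an arithmetic expression whose names refer to computed fields."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return fields[node.id]  # KeyError if the field is not computed yet
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

fields = {"sumsize": 2040, "nbfiles": 4}  # hypothetical, computed earlier
fields["avgsize"] = evaluate("sumsize / nbfiles", fields)
print(fields["avgsize"])  # 510.0
```

If avgsize were evaluated before sumsize or nbfiles existed, the name lookup would fail, which is why the declaration order matters.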
visible
Set visible to true to indicate that this custom field should appear whenever the theme needs to display the current field.
Optional (default value: false)
Ex:
    "corpusFields": {
      "Authors": {
        "visible": true
      }
    }
mapping
Data mappings between the value of a corpus field and a static value. Static values can be declared as a hash table or as an array. Optional
Ex with array:
    "corpusFields": {
      "Month": {
        "default": "0",
        "type": "number",
        "mapping": [ "janvier", "février", "mars", "avril", "mai", "juin", "juillet", "août", "septembre", "octobre", "novembre", "décembre" ]
      }
    }
Ex with hash table:
    "corpusFields": {
      "Month": {
        "path": "content.json.month",
        "mapping": {
          "JAN": "janvier",
          "FEV": "février",
          "MAR": "mars",
          "AVR": "avril",
          "MAI": "mai",
          "JUN": "juin",
          "JUL": "juillet",
          "AOU": "août",
          "SEP": "septembre",
          "OCT": "octobre",
          "NOV": "novembre",
          "DEC": "décembre"
        }
      }
    }
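The two mapping forms can be pictured as a lookup by key (hash table) or by position (array). The Python sketch below rests on two assumptions that the text above does not confirm: array positions are zero-based, and a value with no mapping is returned unchanged. The apply_mapping helper is hypothetical.

```python
def apply_mapping(value, mapping):
    """Translate a field value through a hash table or an array mapping."""
    if isinstance(mapping, dict):
        return mapping.get(value, value)  # assumption: unmapped values pass through
    if isinstance(mapping, list):
        try:
            index = int(value)            # assumption: zero-based position
        except (TypeError, ValueError):
            return value
        if 0 <= index < len(mapping):
            return mapping[index]
    return value

months = ["janvier", "février", "mars"]
codes = {"JAN": "janvier", "FEV": "février", "MAR": "mars"}

print(apply_mapping("0", months))   # janvier
print(apply_mapping("FEV", codes))  # février
```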