Conflex - Confluence Extractor
Conflex is an application that crawls any Confluence wiki and translates its content into structured data in a database that can be easily queried.
Application Architecture
The Conflex application follows the actor model. It uses comedy, an actor model framework implementation, because it is the most mature JavaScript actor framework.
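For readers unfamiliar with the actor model: each actor owns its state and reacts to messages taken one at a time from a mailbox, and actors cooperate only by sending each other messages. The following is a minimal, self-contained sketch of that pattern. It does not use the comedy API and is not Conflex's actual code; the names `Actor`, `StoreActor`, and `CrawlerActor` are hypothetical.

```typescript
// Illustration of the actor pattern only. Hypothetical names; this is
// neither the comedy API nor Conflex's implementation.
abstract class Actor<M> {
  private readonly mailbox: M[] = [];
  private processing = false;

  // Enqueue a message. Messages are handled one at a time, in arrival order.
  send(message: M): void {
    this.mailbox.push(message);
    if (this.processing) {
      return; // a handler is already running and will drain the mailbox
    }
    this.processing = true;
    try {
      while (this.mailbox.length > 0) {
        const next = this.mailbox.shift() as M;
        this.receive(next);
      }
    } finally {
      this.processing = false;
    }
  }

  protected abstract receive(message: M): void;
}

type StoreMessage = { kind: "save"; title: string };

class StoreActor extends Actor<StoreMessage> {
  // Exposed as readonly only so the sketch can inspect the result.
  readonly saved: string[] = [];

  protected receive(message: StoreMessage): void {
    this.saved.push(message.title);
  }
}

type CrawlMessage = { kind: "crawl"; titles: string[] };

class CrawlerActor extends Actor<CrawlMessage> {
  private readonly store: StoreActor;

  constructor(store: StoreActor) {
    super();
    this.store = store;
  }

  // The crawler never touches the store's state; it only sends messages.
  protected receive(message: CrawlMessage): void {
    for (const title of message.titles) {
      this.store.send({ kind: "save", title });
    }
  }
}

const store = new StoreActor();
const crawler = new CrawlerActor(store);
crawler.send({ kind: "crawl", titles: ["Overview", "Security"] });
console.log(store.saved);
```

A real actor framework adds asynchronous delivery, supervision, and the option to run actors in separate processes; this sketch only shows the message-passing discipline.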
--
Note: this document is still under development while the transition to TypeScript takes place.
--
Installation
npm install -g conflex
This gives you access to the cnx command, which runs the application.
Running The Application
To run the application, run cnx followed by any parameters in a terminal window.
In its current state the application will not stop running; this will be fixed in the next release candidate.
Usage Notes
The following are important usage notes for the Conflex application:
- Pages to be retrieved from the wiki should be child pages of the space Home, because pages that exist in a space but are not children of the space Home are undiscoverable.
- Links in the configuration file must be written correctly, as they are parsed from the _links section of the Confluence API response and will be undiscoverable otherwise.
- Heading/Title matching only supports full-text matches, i.e. no regex.
Command Line Arguments
The application supports the following command line arguments:
Argument | Description |
---|---|
--config | The location of the configuration file to run the application with. If none is provided, a configuration file in the folder where the script is run is used. |
--version | Return the current application version. |
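
As an illustration of how these two arguments might be interpreted, here is a small sketch. It is not Conflex's actual parser: the function name `parseCliArgs`, the `CliOptions` shape, and the assumption that `--config` takes its value as the next argument are all hypothetical. A `null` config path stands for "look in the folder where the script is run".

```typescript
// Hypothetical sketch of handling the documented arguments.
interface CliOptions {
  configPath: string | null; // null: fall back to the current folder
  showVersion: boolean;
}

function parseCliArgs(argv: string[]): CliOptions {
  const options: CliOptions = { configPath: null, showVersion: false };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--version") {
      options.showVersion = true;
    } else if (arg === "--config") {
      // Assumption: the path is passed as the next argument.
      const value = argv[i + 1];
      if (value === undefined) {
        throw new Error("--config requires a file path");
      }
      options.configPath = value;
      i++; // skip the value that was just consumed
    } else {
      throw new Error(`Unknown argument: ${arg}`);
    }
  }
  return options;
}

console.log(parseCliArgs(["--config", "./my-config.yml"]));
```

In a Node.js program the input would typically be `process.argv.slice(2)`.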
Generating Documentation
The application supports documentation through Sphinx. To generate the documentation, run make html. Currently the only way to do this is through the Makefile.
Configuration
The configuration file is written in YAML or JSON.
Example Configuration
conflex:
  schedule: none
confluence:
  host: https://wiki.confluence.com
  username:
  password:
datastores:
  mysql:
    host: 127.0.0.1
    port: 3306
    database: conflex
    username:
    password:
    prefix: wiki
logging:
  level: INFO
wiki:
  spaces:
    - SPACENAME:
        pages:
          labels:
            - some-label:
                pages:
                  labels:
                    - some-label-child
                    - some-label-second-child
            - some-other-label:
                pages:
                  labels:
                    - something-else
                  titles:
                    $ref: '#/schemas/SOMETHING/another-thing/headings'
schemas:
  SOMETHING:
    another-thing:
      headings:
        - Overview
        - Principles
        - Standards
        - Guidelines
        - Security
        - Monitoring
        - Resilience
        - Recovery
        - Future
Logging Configuration
The application supports the following levels of logging:
- SILENT
- ERROR
- WARN
- INFO
- DEBUG
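A common way to interpret such a list is as a threshold: the configured level enables itself and every less verbose level before it. The sketch below assumes that interpretation, which this document does not confirm, and the name `shouldLog` is hypothetical.

```typescript
// Levels in the order listed above, from least to most verbose.
const LOG_LEVELS = ["SILENT", "ERROR", "WARN", "INFO", "DEBUG"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

// Assumption: a message is emitted when its level is at or below the
// configured level in the list. SILENT is never a message level.
function shouldLog(
  configured: LogLevel,
  message: Exclude<LogLevel, "SILENT">
): boolean {
  return LOG_LEVELS.indexOf(message) <= LOG_LEVELS.indexOf(configured);
}

console.log(shouldLog("INFO", "WARN")); // true
console.log(shouldLog("INFO", "DEBUG")); // false
console.log(shouldLog("SILENT", "ERROR")); // false
```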
Wiki Configuration
The example configuration above contains a configuration definition for wiki. This tells the Conflex application where to look and what information to retrieve. The wiki configuration adheres to the following design guideline.
wiki:
  spaces:
    [LIST OF SPACE NAMES]:
      pages:
        [PAGE NAVIGATION TYPES]:
          [PAGE NAVIGATION VALUE]:
            pages:
              [PAGE NAVIGATION TYPES]:
                [PAGE NAVIGATION VALUE]:
Wiki Descriptors
Keyword | Description |
---|---|
pages | Wiki pages. When using the pages descriptor, only the specified pages will be crawled for information. To get information from a child page, a child pages definition is required (this also ensures that, in the database, the parent of the child page is set to the id of the parent page). The pages descriptor should only have navigation types as its immediate values. |
spaces | Wiki spaces. There should only ever be one spaces definition in the wiki configuration. |
Page Navigation Types
Keyword | Description |
---|---|
labels | Find a wiki page that is labelled with the given label. |
titles | Find a wiki page with the given title. |
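
To make the nesting concrete, the sketch below walks a pages definition recursively and lists each lookup together with its parent, mirroring the parent/child relationship described for the pages descriptor. It is an illustration under assumptions, not Conflex's crawler: the type names and `flattenPages` are hypothetical, and the input is assumed to be the parsed form of a pages block in which a navigation value is either a plain string or a single-key mapping with a nested pages definition.

```typescript
// Hypothetical types describing a parsed `pages` block.
interface PagesNode {
  labels?: NavigationValue[];
  titles?: NavigationValue[];
}
interface NavigationChildren {
  [value: string]: { pages?: PagesNode };
}
type NavigationValue = string | NavigationChildren;
type NavigationType = "labels" | "titles";

interface PageLookup {
  type: NavigationType;
  value: string;
  parent: string | null; // navigation value of the parent, null at top level
}

// Depth-first walk: emit a lookup for each value, then descend into its
// child `pages` definition, if any.
function flattenPages(
  pages: PagesNode,
  parent: string | null = null
): PageLookup[] {
  const lookups: PageLookup[] = [];
  const types: NavigationType[] = ["labels", "titles"];
  for (const type of types) {
    for (const entry of pages[type] ?? []) {
      if (typeof entry === "string") {
        lookups.push({ type, value: entry, parent });
        continue;
      }
      for (const [value, child] of Object.entries(entry)) {
        lookups.push({ type, value, parent });
        if (child.pages) {
          lookups.push(...flattenPages(child.pages, value));
        }
      }
    }
  }
  return lookups;
}

const examplePages: PagesNode = {
  labels: [
    {
      "some-label": {
        pages: { labels: ["some-label-child", "some-label-second-child"] },
      },
    },
    { "some-other-label": { pages: { labels: ["something-else"] } } },
  ],
};

const lookups = flattenPages(examplePages);
console.log(lookups);
```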
Page Information Retrieval
All information from a page will be scraped and placed in the database.
Using $ref
The configuration file supports the use of the $ref keyword for reusable configuration elements.
To reference a definition, use the $ref keyword:
$ref: 'reference to definition'
This works in a similar way to how Swagger implements the $ref keyword; see the Swagger documentation for more details. Currently Conflex can only reference definitions within the current configuration document.
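The sketch below shows one way local references of the form '#/path/to/definition' could be resolved against the parsed configuration. It is not Conflex's resolver: `lookupRef` and `resolveRefs` are hypothetical names, and the sketch ignores JSON Pointer escape sequences and does not guard against circular references.

```typescript
type Json =
  | string
  | number
  | boolean
  | null
  | Json[]
  | { [key: string]: Json };

// Follow a local reference such as '#/schemas/SOMETHING/another-thing/headings'.
function lookupRef(root: Json, ref: string): Json {
  if (!ref.startsWith("#/")) {
    throw new Error(`Only local references are supported: ${ref}`);
  }
  let node: Json = root;
  for (const key of ref.slice(2).split("/")) {
    if (node === null || typeof node !== "object" || Array.isArray(node)) {
      throw new Error(`Cannot resolve ${ref}`);
    }
    const next: Json | undefined = node[key];
    if (next === undefined) {
      throw new Error(`Cannot resolve ${ref}`);
    }
    node = next;
  }
  return node;
}

// Return a copy of `node` in which every { $ref: "..." } mapping is
// replaced by the definition it points to.
function resolveRefs(node: Json, root: Json): Json {
  if (Array.isArray(node)) {
    return node.map((item) => resolveRefs(item, root));
  }
  if (node !== null && typeof node === "object") {
    const ref: Json | undefined = node["$ref"];
    if (typeof ref === "string") {
      return resolveRefs(lookupRef(root, ref), root);
    }
    const resolved: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(node)) {
      resolved[key] = resolveRefs(value, root);
    }
    return resolved;
  }
  return node;
}

const config: Json = {
  wiki: { titles: { $ref: "#/schemas/SOMETHING/another-thing/headings" } },
  schemas: {
    SOMETHING: { "another-thing": { headings: ["Overview", "Principles"] } },
  },
};

const resolvedConfig = resolveRefs(config, config);
console.log(JSON.stringify(resolvedConfig, null, 2));
```

After resolution, `wiki.titles` holds the list of headings directly, so the rest of the application does not need to know that a reference was used.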