controlR is copyright (c) 2016 Structured Data LLC and released under the MIT license.
See source files for license details.
R is great for data processing, in particular because of the excellent libraries developed by the user community. Once you have developed your data model, you can run it through the R shell or GUI provided. Beyond that, it's pretty easy to embed R in a C/C++ application. You can build a full-featured desktop application or service this way.
Moreover because controlR runs R in an external process, you have benefits beyond what you get by embedding in a C++ application -- for example, you can monitor execution and kill off runaway processes; or you can start multiple instances and run code in parallel.
The child process will run R's event loop, meaning it supports things like R's html help server. This is probably only useful for desktop applications.
One further note on rationale -- for our purposes we did not want to modify the R source code in any way. controlR binds against the R shared libraries (DLLs on windows) at runtime. This way we can guarantee fidelity with the standard R interpreter, and building against updated versions of R is trivial.
What it is not
controlR is not designed for, nor is it suitable for, running a web service. It is designed to support a single client connection; and it adds no limitations on what running R code does to the host system.
R has full access to the system (subject to the host process' permissions) and the interface imposes no security restrictions on top of this. Therefore exposing the R interface to outside users is a significant security risk, even if the host process is running with minimal permissions.
R processes maintain internal state. If multiple clients connect to the same running instance, each client will have access to and can modify that state. For some purposes this is immaterial; but for our purposes this is not desirable. Therefore there is a tight binding between a single client and a single R process.
On the other hand, there is no reason you can't run multiple R processes at the same time, and talk to them from a single client or from multiple logical clients.
controlR connects to and talks to a single R instance. Within the interface there are
two separate "channels" for communication. In the API these are generally referred to
internal channel uses the embedded R interface. This is generally what you want
if you want to execute some R code and get a result back. For most purposes this is
sufficient to build an R application. The
exec channel talks to R through R's REPL
loop -- much as if you were using an R shell. Why do this at all? to support debugging
and R's concept of a
browser -- a window into executing code. Without modifying R code,
this is the only way to support debugging. We can also use it to build our R shell
(more on that later).
(JSON on the wire). The
exec channel executes code in a shell context, and (possibly)
prints results to the output console. Anything R wishes to print to the output console
R is single-threaded, and can run only one operation at a time. State is maintained by
the module, which enforces linear execution. State my be polled via the
and the module broadcasts state change events.
internal will fail if another call is in process. The module
queued_internal methods which will wait for a change in state
and then execute.
js/ directory for API and event documentation.
controlR depends on R and node. See build instructions. controlR futher depends on libuv and nlohmann::json, used under license and included in source distributions. See the individual projects for license details.
To build, set an environment variable
R_HOME pointing to the root of the R directory.
node-gyp to build.
> export R_HOME=/path/to/R-3.2.3 > npm install controlr
At runtime, you can either set an
R_HOME environment variable or pass a value directly
to the initialization method. Remember to escape backslashes in Windows paths.
const ControlR =var controlr = ;controlr;