v0.12.0: Important Announcement: fastcall!
As of v0.12.0, NOOOCL has switched its core native binding component from node-ffi to fastcall. This led to a significant performance increase (see the fastcall benchmarks). However, fastcall uses CMake.js as its build system instead of the bundled node-gyp. This means NOOOCL no longer depends on Python 2, but you need to have CMake installed.
About
Why OpenCL?
In Node.js, JavaScript code is synchronous and single threaded. That means if you have an algorithm that performs tens of thousands of computation operations per second, your application will hang while the computation runs. No HTTP requests are served, no events are processed, nothing happens. You can overcome this limitation by using one of the available threading modules (like Webworker Threads), but this has some serious limitations:
- There is no synchronization implemented in Node.js, so inter-thread communication is only possible by using messages. That means you can only exchange small, JSON-serialized data between worker threads, so it is impossible to implement parallel algorithms that work on common data residing in memory buffers, such as image processing.
- Code JIT-compiled by V8 doesn't support SIMD instructions at the level available in advanced C++ compilers. So while JavaScript code can be perfect for orchestrating large computation operations, it is not so good for writing them.
- Data parallelism is very hard to implement from scratch using only a simple threading module, and it requires synchronization constructs that are not available in Node.js.
OpenCL is perfect to fill the gap. Besides solving the above issues, it provides the following benefits:
- It supports GPUs along with CPU-based SSE/AVX instruction sets.
- It's supported by all the major GPU and CPU vendors.
- It's truly cross platform.
Why not WebCL?
WebCL is going to be the OpenCL for JavaScript (TM) at some point. It will be (should be) supported by all major browsers, giving web developers a powerful, cross-platform computation platform for algorithms like image processing on the client side. At least that's the plan.
Right now WebCL is a draft specification. There are plugins available for some browsers, but it is far from being a part of the modern web standards.
The problem is that the WebCL specification is a nerfed version of OpenCL 1.0 with support for some minor OpenCL 1.1 features. Right now the OpenCL standard is at version 2.0, which is a huge step forward from the 1.x line. When WebCL becomes a released technology, it will be a toy compared to the mainline.
There is already a WebCL module for Node.js, if you are interested. But this module doesn't seem to have been maintained for a while, and it has strange dependencies for no apparent reason (you just don't need GLFW, GLEW, AntTweakBar and FreeImage to run OpenCL programs, trust me).
Why NOOOCL?
It's a full-featured OpenCL wrapper library for Node.js. It supports the full 1.1 and 1.2 specifications. Although it's an OOP wrapper, the whole C API is available through ffi and can be called by using ref.
I know that there are some other OpenCL modules, but please check them out and then decide whether there is a need for yet another OpenCL module for Node.js.
OpenCL 2.0?
I'm planning to support OCL 2.0 in the near future; it just depends on demand. Open an issue or give a twinkling star to my repo, and I'll look into it as soon as possible.
io.js? 0.12?
Those versions are supported as well.
Install
NPM:
npm install nooocl
JavaScript:
    var nooocl = require('nooocl');
    var CLHost = nooocl.CLHost;
    var CLPlatform = nooocl.CLPlatform;
    var CLDevice = nooocl.CLDevice;
    var CLContext = nooocl.CLContext;
    var CLBuffer = nooocl.CLBuffer;
    var CLCommandQueue = nooocl.CLCommandQueue;
    var CLUserEvent = nooocl.CLUserEvent;
    var NDRange = nooocl.NDRange;
    var CLProgram = nooocl.CLProgram;
    var CLKernel = nooocl.CLKernel;
    var CLImage2D = nooocl.CLImage2D;
    var CLImage3D = nooocl.CLImage3D;
    var CLSampler = nooocl.CLSampler;
Tutorial
1. Basics
Host
The OpenCL goodness is available through a CLHost instance.
    host = new CLHost(1.1); // for OpenCL 1.1
    host = new CLHost(1.2); // for OpenCL 1.2
    host = CLHost.createV11(); // for OpenCL 1.1
    host = CLHost.createV12(); // for OpenCL 1.2
You will get an exception if there is no compatible OpenCL platform available.
CLHost and all CL* class instances share these common, important properties:
- cl.version: version of the OpenCL platform
- cl.defs.xxx: predefined OpenCL values, like: CL_MEM_COPY_HOST_PTR, CL_DEVICE_MAX_COMPUTE_UNITS. See the OpenCL specification or NOOOCL/lib/cl/clDefines.js.
- cl.imports.clxxx: this is where OpenCL C API is imported with ffi, you can call native API methods like clEnqueueCopyBuffer, clEnqueueNDRangeKernel, etc.
- cl.types.xxx: ref compatible OpenCL type definitions, see the complete list there: NOOOCL/lib/cl/types.js.
Example:
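    var hostVersion = host.cl.version;
    var someOpenCLValue = host.cl.defs.CL_MEM_COPY_HOST_PTR;
    // Native call; the arguments follow the OpenCL C API:
    var err = host.cl.imports.clEnqueueNDRangeKernel(
        queue.handle, kernel.handle, 1, null, globalRange, null, 0, null, null);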
Platforms
Then you can access the supported platforms:
    var count = host.platformsCount;
    // you will get an array filled with instances of nooocl.CLPlatform class
    var allPlatforms = host.getPlatforms();
For each platform you can access its information in JS properties:
    var platform = host.getPlatforms()[0]; // First platform
    var info = {
        name: platform.name,
        vendor: platform.vendor,
        clVersion: platform.clVersion,
        profile: platform.profile,
        extensions: platform.extensions
    };
CLPlatform and all CL* class instances except CLHost share the handle property, which holds the native OpenCL handle (cl_platform_id, cl_command_queue, cl_kernel, etc.). These handles are released automatically during garbage collection, or they can be released explicitly by calling the release method.
Devices
You can query available devices:
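    var all = platform.allDevices();
    var cpus = platform.cpuDevices();
    var gpus = platform.gpuDevices();
    var accels = platform.accelDevices();
    var gpusAndCpus = platform.getDevices(
        platform.cl.defs.CL_DEVICE_TYPE_GPU | platform.cl.defs.CL_DEVICE_TYPE_CPU);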
You will get an array of nooocl.CLDevice class instances. CLDevice can provide all OpenCL device information in simple JavaScript properties, for example:
    var cpuDevice = platform.cpuDevices()[0];
    // you get the value of CL_DEVICE_MAX_COMPUTE_UNITS:
    var maxComputeUnits = cpuDevice.maxComputeUnits;
    // you get the value of CL_DEVICE_MAX_WORK_ITEM_SIZES in an array like: [256, 64, 1]:
    var maxWorkItemSizes = cpuDevice.maxWorkItemSizes;
Please see the API docs or the NOOOCL/tests/hostTests.js unit test for the complete list of available device info properties.
OK, you have a host, a platform and a device; now you need a context. You can create it from a CLDevice instance, from an array of CLDevice instances, or from a CLPlatform instance and a device type, like:
    // Create context for a single device:
    var cpuDevice = platform.cpuDevices()[0];
    context = new CLContext(cpuDevice);

    // Create context for multiple devices:
    var gpusAndCpus = platform.getDevices(
        platform.cl.defs.CL_DEVICE_TYPE_GPU | platform.cl.defs.CL_DEVICE_TYPE_CPU);
    context = new CLContext(gpusAndCpus);

    // Create context for a platform's devices:
    context = new CLContext(platform, platform.cl.defs.CL_DEVICE_TYPE_GPU);
The Queue
The last thing you need in every OpenCL application is the command queue. You can create a queue for a device by calling the CLCommandQueue class's constructor:
    // The last two parameters are optional, their defaults are false:
    var queue = new CLCommandQueue(context, cpuDevice, isOutOfOrder, isProfilingEnabled);
CLCommandQueue implements every clEnqueue* method, with slightly modified names: clEnqueueMarker becomes enqueueMarker, clEnqueueNDRangeKernel becomes enqueueNDRangeKernel, and so on. Please see the API docs for further details.
The queue has two modes: waitable and non-waitable. A queue is initially non-waitable. In non-waitable mode its enqueue* methods return undefined; in waitable mode they return a CLEvent instance, which has a promise property holding a bluebird promise. You can switch modes by calling the waitable method, which accepts an optional boolean parameter (default: true). When it is true, the resulting queue is waitable; when it is false, the resulting queue is non-waitable.
Example:
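    var queue = new CLCommandQueue(context, device);
    // It's non waitable.

    // Fire and forget a kernel:
    queue.enqueueNDRangeKernel(kernel, new NDRange(3), null, new NDRange(1));

    // Read its result asynchronously:
    queue.waitable().enqueueReadBuffer(openCLBuffer, 0, size, destBuffer).promise
        .then(function () {
            // Data is copied from the device into the host's destBuffer
        });
Please note that there are no synchronous operations in NOOOCL, because those kill the event loop.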
2. Memory
NOOOCL uses the standard Node.js Buffer for memory pointers. Raw memory operations, like reinterpreting, are implemented by using ref.
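As a rough sketch of what such raw memory operations amount to, the snippet below stores three floats in a Node.js Buffer and reads them back. It uses only Node's built-in `Buffer` methods (`writeFloatLE`, `readFloatLE`) and a `Float32Array` view rather than ref, so it illustrates the memory layout, not ref's or NOOOCL's own API. The little-endian byte order is an assumption that holds on common x86 and ARM hosts.

```javascript
// Sketch only: uses Node's built-in Buffer API instead of ref.
var FLOAT_SIZE = 4; // a 32-bit float occupies 4 bytes

// Allocate room for three floats and fill them in (little-endian assumed).
var nodeBuffer = Buffer.alloc(FLOAT_SIZE * 3);
nodeBuffer.writeFloatLE(1.5, 0 * FLOAT_SIZE);
nodeBuffer.writeFloatLE(2.5, 1 * FLOAT_SIZE);
nodeBuffer.writeFloatLE(3.5, 2 * FLOAT_SIZE);

// Reinterpret the same bytes as a typed array without copying.
var floats = new Float32Array(nodeBuffer.buffer, nodeBuffer.byteOffset, 3);

console.log(nodeBuffer.readFloatLE(1 * FLOAT_SIZE));
console.log(floats[2]);
```

Both reads go through the same underlying bytes, which is the property that lets a native library and JavaScript code share one block of memory.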
Allocate
OpenCL runtime can allocate memory if requested.
    var openCLBuffer = new CLBuffer(context, host.cl.defs.CL_MEM_ALLOC_HOST_PTR, size_in_bytes_here);
You can copy data into this buffer, and copy data from it into Node.js memory.
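    var destBuffer = new Buffer(openCLBuffer.size);
    queue.waitable().enqueueReadBuffer(openCLBuffer, 0, openCLBuffer.size, destBuffer).promise
        .then(function () {
            // destBuffer now holds the content of the OpenCL buffer
        });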
Copy
OpenCL buffers can be initialized by copying values from an already initialized Node.js Buffer.
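    var float = ref.types.float;
    var nodeBuffer = new Buffer(float.size * 3);
    float.set(nodeBuffer, 0, 1.1);
    float.set(nodeBuffer, float.size, 2.2);
    float.set(nodeBuffer, float.size * 2, 3.3);
    var openCLBuffer = new CLBuffer(context, host.cl.defs.CL_MEM_COPY_HOST_PTR, nodeBuffer.length, nodeBuffer);
    var otherBuffer = new Buffer(nodeBuffer.length);
    queue.waitable().enqueueReadBuffer(openCLBuffer, 0, nodeBuffer.length, otherBuffer).promise
        .then(function () {
            // otherBuffer now holds the same three floats
        });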
Use
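OpenCL can use Node.js buffers directly. It is safe to access their content only after a mapping operation.
    var float = ref.types.float;
    var nodeBuffer = new Buffer(float.size * 3);
    float.set(nodeBuffer, 0, 1.1);
    float.set(nodeBuffer, float.size, 2.2);
    float.set(nodeBuffer, float.size * 2, 3.3);

    var openCLBuffer = new CLBuffer(context, host.cl.defs.CL_MEM_USE_HOST_PTR, nodeBuffer.length, nodeBuffer);

    // You can use the following shortcut syntax instead of the above constructor call:
    // var openCLBuffer = CLBuffer.wrap(context, nodeBuffer);

    var otherBuffer = new Buffer(nodeBuffer.length);
    var out = {};
    queue.waitable().enqueueMapBuffer(openCLBuffer, host.cl.defs.CL_MAP_READ, 0, nodeBuffer.length, out).promise
        .then(function () {
            // The mapped memory is accessible through out.ptr:
            var mapped = ref.reinterpret(out.ptr, nodeBuffer.length, 0);
            mapped.copy(otherBuffer);
            return queue.waitable().enqueueUnmapMemory(openCLBuffer, out.ptr).promise;
        });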
Images
2D and 3D images are also supported in NOOOCL. There is a unit test that shows how you can do OpenCL-accelerated image grayscale conversion in NOOOCL; please take a look at it here: NOOOCL/tests/imageTests.js.
First, you should open the image and access its raw RGBA data in a Node.js buffer. Any appropriate npm module can be used for this (I suggest lwip).
Then you can create an OpenCL image from it:
    var ImageFormat = host.cl.types.ImageFormat;
    var format = new ImageFormat({
        imageChannelOrder: host.cl.defs.CL_RGBA,
        imageChannelDataType: host.cl.defs.CL_UNSIGNED_INT8
    });

    // Wrap means CL_MEM_USE_HOST_PTR
    var src = CLImage2D.wrap(context, format, width, height, inputBuffer);
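For orientation, here is a minimal CPU-side sketch of what a grayscale conversion over raw RGBA data does, using a plain Node.js Buffer. It is not the kernel from the unit test: the function name `toGrayscale` and the Rec. 601 luma weights (0.299, 0.587, 0.114) are assumptions chosen for illustration.

```javascript
// Sketch only: a CPU reference for grayscale conversion of RGBA pixels.
// The weights are the common Rec. 601 luma coefficients (an assumption;
// the unit test's kernel may use different ones).
function toGrayscale(rgba) {
    var out = Buffer.alloc(rgba.length);
    // Each pixel occupies 4 consecutive bytes: R, G, B, A.
    for (var i = 0; i < rgba.length; i += 4) {
        var gray = Math.round(
            0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]);
        out[i] = out[i + 1] = out[i + 2] = gray;
        out[i + 3] = rgba[i + 3]; // keep alpha unchanged
    }
    return out;
}

// Two pixels: opaque white and half-transparent black.
var pixels = Buffer.from([255, 255, 255, 255, 0, 0, 0, 128]);
var grayPixels = toGrayscale(pixels);
console.log(grayPixels);
```

Every pixel is computed independently of the others, which is why this kind of loop maps well onto one OpenCL work item per pixel.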
Please refer to the API docs for further details.
3. Program
Build
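OpenCL programs can be compiled from string source code or loaded from precompiled binaries; both methods are supported in NOOOCL.
    // Creating OpenCL program from string source:
    var source = 'kernel void foo(global float* data) { }';
    var program = context.createProgram(source);

    // Everything is asynchronous in Node.js:
    program.build('-cl-fast-relaxed-math').then(function () {
        // At this point you don't know whether the build succeeded or failed:
        var buildStatus = program.getBuildStatus(device);
        var buildLog = program.getBuildLog(device);
    });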
After a program builds, you can access its binaries for each device:
    // This returns an array of CLDevice instances
    var devices = program.devices;

    // This returns an array of Buffer instances
    var binaries = program.getBinaries();

    // According to the OpenCL Specification:
    // "Each entry in this array is used by the implementation
    // as the location in memory where to copy the program binary for a specific device,
    // if there is a binary available. To find out which device
    // the program binary in the array refers to,
    // use the CL_PROGRAM_DEVICES query to get the list of devices.
    // There is a one-to-one correspondence between the array of n pointers
    // returned by CL_PROGRAM_BINARIES and array of devices
    // returned by CL_PROGRAM_DEVICES."

    // So you can zip the above:
    var deviceBinaries = _.zip(devices, binaries);
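Without a utility library, the same pairing can be sketched with `Array.prototype.map`. The device names and binaries below are stand-in values so the snippet runs without an OpenCL device.

```javascript
// Stand-ins for the devices array and the matching binaries array.
var devices = ['device-0', 'device-1'];
var binaries = [Buffer.from([1]), Buffer.from([2])];

// Pair each device with the binary at the same index.
var deviceBinaries = devices.map(function (device, i) {
    return [device, binaries[i]];
});

console.log(deviceBinaries.length);
```

This relies on the one-to-one correspondence between the two arrays that the OpenCL specification guarantees.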
Binaries can be stored in files, for example, so the next time the application runs, the slow build from source won't be necessary.
    // Creating program from binaries:

    // This creates a Buffer instance:
    var binary = fs.readFileSync(pathToBinary);

    var program = context.createProgram(binary, device);

    // You should call build,
    // but this time it will be much faster than compiling from source:
    program.build().then(function () {
        // The program is ready to use
    });
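A minimal sketch of such a file cache using Node's `fs` module. The helper names `saveBinary` and `loadBinary` and the file name are hypothetical, and the buffer below is a stand-in for a real program binary so the snippet runs without an OpenCL device.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: persist a program binary.
function saveBinary(filePath, binary) {
    fs.writeFileSync(filePath, binary);
}

// Hypothetical helper: returns null when there is no cached binary yet,
// in which case the program has to be built from source.
function loadBinary(filePath) {
    return fs.existsSync(filePath) ? fs.readFileSync(filePath) : null;
}

// Stand-in bytes; in a real application this would be one of the
// Buffer instances holding a device's program binary.
var fakeBinary = Buffer.from([0xde, 0xad, 0xbe, 0xef]);
var cachePath = path.join(os.tmpdir(), 'nooocl-example-kernel.bin');

saveBinary(cachePath, fakeBinary);
var restored = loadBinary(cachePath);
console.log(restored.length);
```

Synchronous `fs` calls keep the sketch short; an application that cares about the event loop would likely prefer the asynchronous variants.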
Kernel
You can create a kernel by name, or create all kernels in the program at once.
    // By name:
    doStuffKernel = program.createKernel('doStuff');

    // All. This time the return value is an array of CLKernel instances.
    var kernels = program.createAllKernels();
    doStuffKernel = _.find(kernels, function (k) { return k.name === 'doStuff'; });
You can set its arguments by index, or all at once:
    // Assume you have a kernel of the following signature:
    // kernel void doStuff(global float* data, uint someValue, local float* tmp) {...}
    // and a CLBuffer instance created like:
    // var openCLBuffer = CLBuffer.wrap(context, nodeBuffer);

    var kernel = program.createKernel('doStuff');

    // You can set kernel's arguments by index:

    // For buffer arguments you can pass the instance of a CLBuffer class:
    kernel.setArg(0, openCLBuffer);
    // or native cl_mem handle
    // kernel.setArg(0, openCLBuffer.handle);

    // For constant arguments you have to specify its type
    kernel.setArg(1, 55, 'uint');

    // For local buffers, you have to specify its size in bytes
    kernel.setArg(2, 12);

    // Or you can specify all of the arguments at once:
    kernel.setArgs(openCLBuffer, { 'uint': 55 }, 12);
Now you can enqueue the kernel. In NOOOCL there is an NDRange class, for defining OpenCL ranges.
    // 1 dimension range:
    var r1 = new NDRange(10);

    // 2 dimensions range:
    var r2 = new NDRange(10, 20);

    // 3 dimensions range
    var r3 = new NDRange(10, 20, 30);
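When choosing range sizes, keep in mind that OpenCL 1.x requires the global work size to be evenly divisible by the local work size in each dimension whenever a local size is given. The sketch below rounds an element count up to the next multiple; `roundUpToMultiple` is a hypothetical helper, not part of NOOOCL.

```javascript
// Hypothetical helper: smallest multiple of localSize that is >= count.
function roundUpToMultiple(count, localSize) {
    return Math.ceil(count / localSize) * localSize;
}

var elementCount = 1000; // number of items to process
var localSize = 64;      // work items per work group
var globalSize = roundUpToMultiple(elementCount, localSize);

console.log(globalSize);
```

With padding like this, the kernel has to check its global id against the real element count and skip the extra work items.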
So the enqueuing is really simple:
    queue.enqueueNDRangeKernel(kernel, r1);
You can create a simple JavaScript function for calling OpenCL kernels with ad-hoc arguments by using the bind method:
    var func = kernel.bind(
        queue,
        new NDRange(3),
        null,
        new NDRange(1)); // offset

    // Now you have a JS function to call (aka set arguments and enqueue)
    // our OpenCL kernel!
    // It's easy as goblin pie.
    func(openCLBuffer, { 'uint': 55 }, 12);
API Docs
Examples
Vector Addition
I converted this OpenCL tutorial's C++ code to JavaScript: OAK RIDGE - OpenCL Vector Addition.
You can find the example there.
Vector Addition ES6
A slightly modified version of the above Vector Addition example, demonstrating how promise-based asynchronous code can look like synchronous code in recent versions of JavaScript.
You can find the example there.
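To give a flavour of that style, the sketch below uses `async`/`await` (ES2017, so newer than the generator-based approach an ES6 example would use) with a stand-in promise. `fakeReadBuffer` is hypothetical: it merely simulates an enqueue operation whose event exposes a `promise` property, and it is not NOOOCL code.

```javascript
// Hypothetical stand-in for a waitable enqueue call: returns an object
// with a `promise` property that resolves once the "copy" has finished.
function fakeReadBuffer(source, dest) {
    return {
        promise: new Promise(function (resolve) {
            setImmediate(function () {
                source.copy(dest);
                resolve();
            });
        })
    };
}

async function readResult() {
    var deviceData = Buffer.from([1, 2, 3]);
    var hostData = Buffer.alloc(3);
    // Reads like synchronous code, but does not block the event loop.
    await fakeReadBuffer(deviceData, hostData).promise;
    return hostData;
}

var resultPromise = readResult();
resultPromise.then(function (data) {
    console.log(data);
});
```

The `await` line suspends only `readResult`; timers, I/O callbacks and other events keep being processed while the copy is pending.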