This guide shows how to integrate the Concurix profiling and monitoring solution with your Node.js application. You can set up the 'helloworld' example and start monitoring it in 10 minutes.
Step 1: install and start concurix-server, globally or locally:
All the trace data of your application and your source code information are kept on premise, i.e. either in memory or in a file you specify. They are not sent anywhere.
Global install:
$ npm install -g concurix-server
$ concurix-server
Local install:
$ npm install concurix-server
$ ./node_modules/concurix-server/bin/server.js
Step 2: install and start helloworld:
$ npm install cx-helloworld
$ node ./node_modules/cx-helloworld/concurix-root.js
Step 3: monitor it in your web browser: http://localhost:8103/dash/wfp:helloworld
Usually, you start the app as follows:
$ node helloworld.js
Tracer integration is simple. You do not have to edit helloworld.js. Instead, create a separate module such as concurix-root.js with two 'require' lines, and then start helloworld as follows:
$ node concurix-root.js
Once the helloworld app runs on your computer, it's time to start monitoring your own app, say, myapp.js. The simplest way is to replace 'helloworld.js' in the second 'require' line of concurix-root.js with 'myapp.js' and run 'node concurix-root.js'. We'll see 'direct tracer integration' later in the 'advanced topics' section.
On the first 'require' line of concurix-root.js, four parameters are set for the Concurix tracer. You can customize the parameters as follows:
accountKey [required] : Any alphanumeric text is accepted as long as it starts with 'wfp:'. Best practice is to use a project or module name. For example, 'wfp:prod-dashboard' might be a good name when you monitor your 'dashboard' module in production, and 'wfp:test-payment' when you monitor your 'payment' module in the test environment. A good naming convention for your environments is particularly useful when you store the trace data in an 'on-file' database as opposed to an 'in-memory' one - we'll cover more about on-file or in-memory later.
archiveInterval [optional] : (in msec) interval at which the tracer sends trace files to the Concurix server. The tracer uses this parameter as a guideline and randomly determines the actual interval, which is usually a few seconds larger than this value. The actual trace interval is shown as 'Profile time duration' on the summary statistics view explained in the 'advanced topics' section. Values between 5000 (five sec) and 60000 (one min) are recommended, depending on how frequently you want the tracer to send data to the Concurix server. Please note that it may be constrained by real memory size or disk space availability (more on this topic later). Defaults to 60000.
api_host [optional] : Host URL of the Concurix server. You may choose to run the Concurix server on a separate machine and run the apps you monitor on other machines, pointing them to the server URL. In that case, all tracing data can be stored on the server disk. We'll see more about on-file or in-memory later. Defaults to 'localhost'.
api_port [optional] : Port on which the Concurix server runs. Defaults to 8103.
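The four parameters can be collected and checked in one place before they are handed to the tracer. The following is only a minimal sketch: `buildTracerOptions` is a hypothetical helper, not part of the Concurix API. It fills in the defaults listed above and rejects an accountKey that lacks the 'wfp:' prefix.

```javascript
// Hypothetical helper (not part of Concurix): applies the documented defaults
// and rejects an accountKey without the required 'wfp:' prefix.
function buildTracerOptions(options) {
  const defaults = { archiveInterval: 60000, api_host: 'localhost', api_port: 8103 };
  const merged = Object.assign({}, defaults, options);
  if (typeof merged.accountKey !== 'string' || !merged.accountKey.startsWith('wfp:')) {
    throw new Error("accountKey is required and must start with 'wfp:'");
  }
  return merged;
}

// Example: the 'payment' module in the test environment, sending every 5 sec.
const options = buildTracerOptions({ accountKey: 'wfp:test-payment', archiveInterval: 5000 });
console.log(options);
```

The resulting object has the same shape as the parameter list above, so it could be passed to the tracer in place of a hand-written literal.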
To analyze the helloworld app or your own app, the tracer needs to send trace files to the Concurix server. You can install the server on the same computer as your app (localhost) or you can install it on your remote host to monitor one or more apps running on one or more computers. The tracer-server communication uses HTTP. The helloworld example by default runs on the same computer as the Concurix server.
In an initial set up, you might see a message:
Error attempting to send trace file: Error: connect ECONNREFUSED
It means that the Concurix server is not listening on the port specified as the 'api_port' parameter on the tracer side, so the tracer's connection is refused. ECONNREFUSED itself is harmless. It just means that no monitoring is happening yet. If you see ECONNREFUSED, the tracer and the server are probably set to two different port numbers, or the server has not been started yet.
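To tell the two causes apart, you can check whether anything is accepting connections on the port the tracer talks to. The sketch below uses only Node's core `net` module; `isServerListening` is a made-up name, and the host and port are assumed to be the default api_host and api_port.

```javascript
const net = require('net');

// Resolves to true if something accepts TCP connections on host:port,
// and to false on ECONNREFUSED, any other socket error, or a timeout.
function isServerListening(host, port, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (result) => {
      socket.destroy();
      resolve(result);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once('connect', () => finish(true));
    socket.once('error', () => finish(false));
  });
}

// Assumed values: the default api_host and api_port of the tracer.
isServerListening('localhost', 8103).then((up) => {
  console.log(up ? 'something is listening on port 8103' : 'nothing is listening on port 8103');
});
```

A `true` result only shows that some process owns the port, not that it is the Concurix server, so treat it as a first check rather than a diagnosis.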
To set up 'concurix-server', run 'npm install concurix-server', then start the server as follows:
$ npm install -g concurix-server
$ concurix-server [port]
'server.js' is a script to start the Concurix server. The port must match with the 'api_port' parameter set on the tracer side so that the server listens on the port to which the tracer talks.
Notes for OS X users: 'npm install concurix-server' installs some dependent modules that require compilation. The compilation is done automatically during 'npm install concurix-server'. For the compilation to work on OS X, either Apple Xcode (2GB) or the Apple Command Line Tools (300KB) need to be available. You might want to install the Apple CLT as follows:
$ xcode-select --install
For more information about the Concurix server, please 'npm install concurix-server'.
Once the port is properly set on both ends, the tracer can communicate with the server. Now you can analyze, in real time, the dynamic behavior of your app as well as of all the modules your app depends on.
For example, you can analyze the helloworld by navigating your browser to http://localhost:8103/dash/wfp:helloworld
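The helloworld URL is made up of the server host, the server port and the accountKey. Assuming the dashboard of your own app follows the same '/dash/<accountKey>' pattern (the text only shows it for helloworld), the URL can be derived from the tracer parameters. `dashboardUrl` is a name invented for this sketch.

```javascript
// Builds a dashboard URL from tracer parameters.
// Assumption: every accountKey is served under '/dash/<accountKey>',
// as the helloworld URL suggests. The helper name is invented for this example.
function dashboardUrl({ accountKey, api_host = 'localhost', api_port = 8103 }) {
  return `http://${api_host}:${api_port}/dash/${accountKey}`;
}

console.log(dashboardUrl({ accountKey: 'wfp:helloworld' }));
// http://localhost:8103/dash/wfp:helloworld
```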
The "require('concurix')" statement can be directly put in your module before any other required modules you'd like to trace. The simplest case is cx-helloworld/concurix-root.js which has one required module (helloworld.js) and no other statements.
var concurix = require('concurix')({
  accountKey: "wfp:helloworld", // required, must start with "wfp:"
  archiveInterval: 5000,        // in msec. Defaults to 60000
  api_host: 'localhost',        // Defaults to 'localhost'
  api_port: 8103                // Defaults to 8103
});
There are six configuration options for concurix-server. They are defined in the 'config.json' file. If concurix-server is globally installed, the file is under /usr/local/lib/node_modules/concurix-server/.
var config_options = {
  stale_minutes: 750,         // how long we retain data in the db, e.g., 1 day = 1440 = 60*24
  chart_minutes: 720,         // data points shown on the Timeline view, e.g., 1 day = 1440 = 60*24
  max_transaction_count: 30,  // the number of transactions shown on the Transaction view
  in_memory: false,           // data are kept in memory or written to a file
  dir_path: "./",             // directory path when data are written to a file
  db_name: "minkelite.db"     // file name when data are written to a file
};
You'll land on this view when navigating to http://localhost:8103/dash/wfp:helloworld . The top line chart shows system load average (1m) and process uptime. Note that when the process restarts, the trace data is stored separately under the new Process ID. The second line chart shows memory usage history.
When you click a data point on the timeline view, you navigate to the state of your monitored module during that specific trace interval.
Data from multiple hosts and processes are stored in the database separately. If your application restarts, the data will show up under a new process ID. Please note that the same host may appear under a few different names. For example, my MacBook Pro is listed as SetoMBP.local or SetoMBP.home, etc.
The transaction history view shows transactions up to the number set in the 'max_transaction_count' server configuration option. In the above screenshots, there are two transaction types: "request GET http://localhost:8123/" and "serve GET /". In the chart of each transaction type, there are 69 transaction instances.
Why 69 transaction instances? When the screenshots were taken, 'chart_minutes' was set to 10 min, as you can see. From the summary statistics view, we know the actual archive interval (Profile time duration) is 8.6 sec. 600 / 8.6 ≈ 69.77, so 69 complete intervals fit in the 10-minute window. Each instance point shows a mean and a standard deviation because there were multiple HTTP requests (transactions) in each 8.6 sec interval. Exactly how many? helloworld.js sends a request every second, so there are eight or nine requests during the 8.6 sec interval. Each transaction instance shows the average and standard deviation of the processing time of those eight or nine requests.
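The arithmetic can be replayed in a few lines. This is just a worked calculation with the numbers quoted above (10 min chart, 8.6 sec profile duration, one request per second); the function names are made up for the example.

```javascript
// Number of complete trace intervals that fit in the charted window.
function instancesOnChart(chartMinutes, profileDurationSec) {
  return Math.floor((chartMinutes * 60) / profileDurationSec);
}

// Possible request counts in one interval when a request fires every periodSec.
function requestsPerInterval(profileDurationSec, periodSec) {
  const exact = profileDurationSec / periodSec;
  return [Math.floor(exact), Math.ceil(exact)];
}

console.log(instancesOnChart(10, 8.6));   // 69
console.log(requestsPerInterval(8.6, 1)); // [ 8, 9 ]
```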
The 'max_transaction_count' option defines the number of transaction data points to display on the view. It does not change the amount of data the server stores in the database. The Concurix server stores all the transaction data whatever max_transaction_count is. If you set 'max_transaction_count' to a large value such as 1,000,000, the transaction history list can become very long, which could slow down the listing of the transactions. When the Concurix server filters the transaction types and instances based on max_transaction_count, it sorts them by transaction duration in reverse order, then trims the tail.
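The sort-then-trim step can be pictured as follows. This is an illustration of the behavior described above, not the server's actual code. It assumes 'reverse order' means longest duration first, and the sample names and durations are made up.

```javascript
// Sketch of the described filter: longest transactions first, tail trimmed.
function trimTransactions(transactions, maxTransactionCount) {
  return transactions
    .slice()                                  // leave the input untouched
    .sort((a, b) => b.duration - a.duration)  // reverse order of duration
    .slice(0, maxTransactionCount);           // trim the tail
}

// Made-up sample data; 'serve GET /status' is a hypothetical transaction type.
const sample = [
  { name: 'serve GET /', duration: 12 },
  { name: 'request GET http://localhost:8123/', duration: 30 },
  { name: 'serve GET /status', duration: 5 },
];
console.log(trimTransactions(sample, 2).map((t) => t.name));
```

With a limit of 2, the shortest transaction is the one that drops off the view, while the stored data stays complete.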
When you click a data point on the timeline view or the transaction history view, you navigate to the state of your monitored module during that specific trace interval.
The 'Transaction Sequences' view shows a graphical split between the actual running time and the async callback time of each code path that occurred in the tracing interval. Clicking the blue horizontal bar on a code path takes you to the flame graph view.
Lastly, there is the detailed flame graph of the specific code path. For a detailed walk-through of the flame graph analysis, see the best practices below.
Did you hit the error when you opened localhost:8103/dash/wfp:helloworld in your browser? The error means that 'express' cannot find 'jade'. concurix-server requires 'express', as shown in its package.json. Go to concurix-server/node_modules and see if both 'express' and 'jade' are there. If not, run 'npm install' under the concurix-server directory so that express and jade are installed under concurix-server/node_modules, and you are good to go. Here is a good post about the issue and solution.
As you know by now from browsing through the flame graphs, the Concurix profiling and monitoring solution records the behavior of all functions in every module, through hundreds of thousands of lines of code. You may face a case where the tracer fails to start tracing your module or someone else's.
One example is recursive calls. Passing a function to setTimeout within the function itself is of the same nature. If the tracer raises a flag, a quick workaround is to blacklist the offending files by adding them to the tracer parameters:
var concurix = require('concurix')({
  accountKey: "wfp:helloworld", // required, must start with "wfp:"
  archiveInterval: 5000,        // in msec. Defaults to 60000
  api_host: "localhost",        // Defaults to "localhost"
  api_port: 8103,               // Defaults to 8103
  blacklist: [
    "*/foo.js",                 // block any file named "foo.js"
    "*node_modules/foo/*"       // block anything under module "foo"
  ]
});
The default server config_options and the default tracer parameters are designed for both quick real-time monitoring and longer-term data collection, so that you can not only monitor the system behavior in real time but also run in-depth analysis offline. We'll discuss more about the offline analysis later.
Every numeric parameter/option has an impact on the memory/disk space to store the trace data. There may be an impact on performance such as the response time of the dashboard or overhead incurred to the system being profiled and monitored. The Concurix profiling and monitoring solution is designed to minimize the performance overhead so that it can be used for monitoring production systems as well as profiling the system in development. We have used the Concurix solution ourselves extensively and come up with a variety of configuration parameters and options.
Below, we go over two sets of recommended tracer parameters and server config options: one for development use and the other for production use. Please keep in mind, however, there is nothing to prevent you from using any combination of parameters and options in any environment to maximize your storage and memory efficiency.
tracer parameters
|parameter||development||production||description|
|archiveInterval||5000 msec||60000 msec||interval for the tracer to send trace files to the server|
|api_host||"localhost"||remote server URL||Host URL of the Concurix server|
|api_port||8103 (default)||any port number||Port on which the Concurix server runs|
archiveInterval defines the precision of the trace data. Setting it ten times larger results in 1/10 of the precision, but the storage requirement also drops to 1/10.
api_host and api_port define the location of the Concurix server. The tracer and the server communicate over an HTTP API. For example, when multiple tracing processes are monitored on 'localhost', the server process incurs additional overhead on that machine. When the server is running on a separate host, you can place the server anywhere on the web at a small performance cost of HTTP over the wire.
server config options
|option||development||production||description|
|stale_minutes||15 min||1450 min||how long we retain data in the db|
|chart_minutes||10 min||1440 min||data points shown on the Timeline view|
|max_transaction_count||500||20||the number of transactions shown on the Transaction view|
|in_memory||true||false||data are kept in memory or written to a file|
|dir_path||n/a||"./"||directory path when data are written to a file|
|db_name||n/a||"minkelite.db"||file name when data are written to a file|
Like archiveInterval, stale_minutes lets you control both the storage requirement and the performance overhead. For example, with the development configuration (archiveInterval: 5 sec and stale_minutes: 15 min), the database size is as small as several megabytes, which fits nicely in memory, and the dashboard response is snappy. I use this configuration in development on my laptop. For full-fledged monitoring and offline analysis of a production system, you may set archiveInterval: 60 sec and stale_minutes: 4320 min = 3 days, which requires more space to store the trace data. You might want to run the Concurix server process on a separate server host in the production environment, like I do.
NOTE: The dashboard can display at most 5,000 data points on the chart views. Each trace file corresponds to one data point. When the number of data points exceeds this limit, the chart areas will be blank. When you choose to store a large number of data points (a shorter archiveInterval and a larger stale_minutes), please make sure to use a shorter chart_minutes to keep the number of displayed data points below 5,000.
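A quick estimate shows whether a given combination stays under the limit. The sketch below assumes one traced process and one trace file per archiveInterval. Since the actual interval is usually a few seconds longer, the result is an upper bound. The function names are invented for this example.

```javascript
const DASHBOARD_LIMIT = 5000; // data points the charts can display

// Estimated data points on a chart: one trace file per archiveInterval.
// The real interval is usually a few seconds longer, so this is an upper bound.
function chartDataPoints(chartMinutes, archiveIntervalMs) {
  return Math.ceil((chartMinutes * 60 * 1000) / archiveIntervalMs);
}

// Largest chart_minutes that keeps the estimate at or below the limit.
function maxChartMinutes(archiveIntervalMs, limit = DASHBOARD_LIMIT) {
  return Math.floor((limit * archiveIntervalMs) / (60 * 1000));
}

console.log(chartDataPoints(1440, 60000)); // 1440
console.log(chartDataPoints(1440, 5000));  // 17280
console.log(maxChartMinutes(5000));        // 416
```

In other words, a one-day chart is fine at a 60 sec interval, but at a 5 sec interval chart_minutes would need to stay around 400 or lower.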
The Concurix profiling and monitoring solution keeps the data in the database. It regularly prunes older data points and keeps the data for the last 'stale_minutes' minutes. 'stale_minutes' defines the so-called TTL (time to live).
With the 'in_memory' option set to 'false', you can store all the data in a file. The location of the 'on-file' database file is set by the 'dir_path' and 'db_name' config options. The intended use case is the following: when an incident is detected, you can copy the database file and analyze it offline. It works exactly like a flight recorder. When you run the offline analysis, don't forget to set stale_minutes to zero to disable the pruning process.
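The effect of stale_minutes, including the special value zero, can be illustrated with a small filter. This is a conceptual sketch only, not the server's pruning code, and the `timestampMs` field is a made-up name.

```javascript
// Keeps only the points from the last staleMinutes minutes.
// staleMinutes === 0 disables pruning, as described for offline analysis.
function prune(points, staleMinutes, nowMs) {
  if (staleMinutes === 0) return points.slice();
  const cutoff = nowMs - staleMinutes * 60 * 1000;
  return points.filter((p) => p.timestampMs >= cutoff);
}

const now = Date.UTC(2014, 10, 26, 18, 0, 0);
const points = [
  { timestampMs: now - 20 * 60 * 1000 }, // 20 min old
  { timestampMs: now - 5 * 60 * 1000 },  // 5 min old
];
console.log(prune(points, 15, now).length); // 1
console.log(prune(points, 0, now).length);  // 2
```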
As discussed in the config_options tip section, the overhead varies with the options and parameters you choose. With the recommended DEVELOPMENT or PRODUCTION settings, the overhead is small ('noise level', so to speak), so you can look for the options and parameters that best suit your production systems.
The flame graph provides rich information about your system behavior. The Concurix profiling and monitoring solution records all the trace data of your modules as well as of the third party modules your module requires. The good news is that we've identified several best practices and developed sophisticated statistical models, and I'd like to share some of the best practices here.
First, pick one module whose structure you are familiar with and identify the color of that module's bar. For example, in our helloworld example, we see "cx-helloworld#app.get() fn argument" in 'light green', which means that our helloworld is shown in the same light green color throughout the flame graphs.
All the screenshots you see in this post were taken from the default set up of concurix-server and cx-helloworld. If you set up concurix-server and cx-helloworld with the default setting, you will see very similar flame graphs but the specific color might be different. As you see in helloworld.js, the callback we passed to app.get calls just 'res.end'. The flame graph shows exactly that, i.e., no functions but "http#OutgoingMessage.prototype.end" are on top of the light green bar. Beneath the light green bar, however, there are 28 green bars, which represent 'express' module. Express, as we know, does a lot of good things for us. In some layer of the flame graph, we see several very short function calls displayed in purple. The purple bar represents "parseurl" module.
Walking through the flame graph view looking for the light green bar, you would find "cx-helloworld#setInterval() fn argument" and "cx-helloworld#setInterval() fn argument>request() fn argument" which correspond to the two anonymous functions in setInterval call in helloworld.js.
You might ask: "There is another piece of code in helloworld.js which is app.listen. Where does the anonymous function appear in the flame graph?"
It doesn't appear because app.listen and the anonymous function were called right after helloworld.js was started and would never be called again. If you saw it in the flame graph, that would be the time helloworld.js was restarted for one reason or another.
You can try to manually restart the helloworld process while the server is running and see what's logged in the server. The new process is logged under a new Process ID. Browsing through the flame graphs for the new Process ID, you would find 'cx-helloworld#app.listen() fn argument' in the very first trace interval.
The flame graph is powerful. It's unthinkable that we could get this much information as quickly by reading call stack traces.
The orange triangle means that 'system loadavg' or 'process memory used' is outside 3 sigma. Note that 'system loadavg' is a system level indicator. The loadavg line chart shows a peak when your process or any other process puts a peak load on the system. On the other hand, 'process memory used' is an indicator of the process your app is running in.
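A 3 sigma check of this kind can be sketched as follows. How the dashboard actually computes the mean and sigma is not stated here, so the sketch simply assumes a population standard deviation over a window of earlier values. The sample numbers are made up.

```javascript
// Flags a value that lies more than three standard deviations from the mean
// of the given history (population standard deviation).
function isOutOfThreeSigma(history, value) {
  const mean = history.reduce((sum, x) => sum + x, 0) / history.length;
  const variance = history.reduce((sum, x) => sum + (x - mean) ** 2, 0) / history.length;
  return Math.abs(value - mean) > 3 * Math.sqrt(variance);
}

const loadavg = [1, 2, 3, 2, 1, 2, 3, 2]; // made-up 1-minute load averages
console.log(isOutOfThreeSigma(loadavg, 3)); // false
console.log(isOutOfThreeSigma(loadavg, 5)); // true
```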
They are code paths that the V3 tracer could not associate with any particular transactions (e.g. A web request). Like any other waterfalls, you can click on the ‘untagged waterfall’ row to drill down to flame graph and see the details of where the time was spent.
That's from morgan in 'combined' format:
:remote-addr - :remote-user [:date[clf]] ":method :url HTTP/:http-version" :status :res[content-length] ":referrer" ":user-agent"
127.0.0.1 - - [26/Nov/2014:18:04:44 +0000] "POST /results/1.1.0 HTTP/1.1" 202 - "-" "-"
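To read such a line field by field, it can be matched against the token order of the format string above. The regular expression below is a hand-written sketch for this purpose, not something provided by morgan, and the property names are chosen for the example.

```javascript
// Field order follows the 'combined' format string quoted above.
const COMBINED = /^(\S+) - (\S+) \[([^\]]+)\] "(\S+) (\S+) HTTP\/(\S+)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/;

function parseCombined(line) {
  const m = COMBINED.exec(line);
  if (!m) return null;
  return {
    remoteAddr: m[1],
    remoteUser: m[2],
    date: m[3],
    method: m[4],
    url: m[5],
    httpVersion: m[6],
    status: Number(m[7]),
    contentLength: m[8], // '-' when the response has no content-length
    referrer: m[9],
    userAgent: m[10],
  };
}

const line = '127.0.0.1 - - [26/Nov/2014:18:04:44 +0000] "POST /results/1.1.0 HTTP/1.1" 202 - "-" "-"';
console.log(parseCombined(line));
```

For the sample line this yields method 'POST', url '/results/1.1.0' and status 202, with '-' for every field that was empty.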
Yes. Please use concurix-server version 0.2.19 or later. On the client side, set the Heroku app name of your concurix-server as api_host and 80 as api_port in the tracer parameters.
What's written here is just the beginning. We've developed various techniques for machine-generated time-series data analytics. I discussed one example of the machine learning technology stack we've developed in this blog post, in case you're interested.
Please feel free to contact me at firstname.lastname@example.org or email@example.com if you have questions about cx-helloworld or Concurix Enterprise Edition such as Custom Data Logging and Multi-User version of the Concurix profiling and monitoring solution.