serverless-slic-watch-plugin

    1.0.4 • Public • Published

    SLIC Watch provides a CloudWatch Dashboard and Alarms for:

    1. AWS Lambda
    2. API Gateway
    3. DynamoDB
    4. Kinesis Data Streams
    5. SQS Queues
    6. Step Functions

    Currently, SLIC Watch is available as a Serverless Framework plugin.

    Getting Started

    1. 📦 Install the plugin:
    npm install serverless-slic-watch-plugin --save-dev
    
    2. 🖋️ Add the plugin to the plugins section of serverless.yml:
    plugins:
      - serverless-slic-watch-plugin
    
    3. 🪛 Optionally, add some configuration for the plugin to the custom -> slicWatch section of serverless.yml. Here, you can specify a reference to the SNS topic for alarms. This is optional, but it's usually something you want so you can receive alarm notifications via email, Slack, etc.
    custom:
      slicWatch:
        topicArn: { Ref: myTopic }
    
    See the Configuration section below for more detailed instructions on fine-tuning SLIC Watch to your needs.

    4. 🚢 Deploy your application in the usual way, for example:
    sls deploy
    
    5. 👀 Head to the CloudWatch section of the AWS Console to check out your new dashboards 📊 and alarms!
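
    The myTopic reference in the configuration step above assumes that an SNS topic with that logical ID exists in the same stack. A minimal sketch of one way to declare it in the resources section of serverless.yml is shown here; the email subscription and its address are placeholders for illustration, not something SLIC Watch requires.

    ```yaml
    resources:
      Resources:
        myTopic:
          Type: AWS::SNS::Topic
          Properties:
            Subscription:
              - Protocol: email
                Endpoint: alerts@example.com   # placeholder address
    ```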

    Features

    CloudWatch Alarms and Dashboard widgets are created for all supported resources in the CloudFormation stack generated by the Serverless Framework. This includes generated resources as well as resources specified explicitly in the resources section. Any feature can be configured or disabled completely; see the Configuration section for details.

    Lambda Functions

    Lambda Function alarms are created for:

    1. Errors
    2. Throttles, as a percentage of the number of invocations
    3. Duration, as a percentage of the function's configured timeout
    4. Invocations, disabled by default
    5. IteratorAge, for functions triggered by an Event Source Mapping
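
    As an illustration of how these alarms map to configuration, the sketch below enables the Invocations alarm and tightens the duration alarm. The threshold values are arbitrary examples rather than recommendations. Because DurationPc is relative to each function's timeout, a threshold of 80 on a function with a 30-second timeout corresponds to 24 seconds.

    ```yaml
    custom:
      slicWatch:
        alarms:
          Lambda:
            Invocations:
              enabled: true    # both enabled and Threshold must be set
              Threshold: 1000  # example value: total invocations in one alarm period
            DurationPc:
              Threshold: 80    # alarm at 80% of the configured timeout
    ```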

    Lambda dashboard widgets show:

    • Errors
    • Throttles
    • Duration (Average, P95 and Maximum)
    • Invocations
    • Concurrent Executions
    • Iterator Age

    API Gateway

    API Gateway alarms are created for:

    1. 5XX Errors
    2. 4XX Errors
    3. Latency
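
    The following sketch shows how these alarms might be tuned; the values are examples only. With the Average statistic, the API Gateway 4XXError metric is an error rate, so a threshold of 0.1 means 10% of requests. Latency thresholds are in milliseconds.

    ```yaml
    custom:
      slicWatch:
        alarms:
          ApiGateway:
            4XXError:
              Threshold: 0.1     # 10% of requests
            Latency:
              ExtendedStatistic: p95
              Threshold: 3000    # milliseconds
    ```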

    API Gateway dashboard widgets show:

    • 5XX Errors
    • 4XX Errors
    • Latency
    • Count

    DynamoDB

    DynamoDB alarms are created for:

    1. Read Throttle Events (Table and GSI)
    2. Write Throttle Events (Table and GSI)
    3. UserErrors
    4. SystemErrors
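
    Occasional throttling can be absorbed by retries, so a table with bursty traffic may need higher throttle thresholds to avoid noisy alarms. A sketch with example values:

    ```yaml
    custom:
      slicWatch:
        alarms:
          DynamoDB:
            ReadThrottleEvents:
              Threshold: 50   # example value; applies to tables and GSIs
            WriteThrottleEvents:
              Threshold: 50
    ```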

    Dashboard widgets are created for tables and GSIs:

    • ReadThrottleEvents (Table)
    • WriteThrottleEvents (Table)
    • ReadThrottleEvents (GSI)
    • WriteThrottleEvents (GSI)

    Kinesis Data Streams

    Kinesis data stream alarms are created for:

    1. Iterator Age
    2. Read Provisioned Throughput Exceeded
    3. Write Provisioned Throughput Exceeded
    4. PutRecord.Success
    5. PutRecords.Success
    6. GetRecords.Success
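
    As a sketch of how these alarms might be adjusted, the fragment below relaxes the iterator age alarm and switches off one of the success alarms. The values are examples, and disabling an individual alarm with enabled: false assumes that the flag is honored at that level, as the Configuration section describes.

    ```yaml
    custom:
      slicWatch:
        alarms:
          Kinesis:
            GetRecords.IteratorAgeMilliseconds:
              Threshold: 60000   # 60 seconds, expressed in milliseconds
            GetRecords.Success:
              enabled: false
    ```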

    Kinesis data stream dashboard widgets show:

    • Iterator Age
    • Read Provisioned Throughput Exceeded
    • Write Provisioned Throughput Exceeded
    • Put/Get Success

    SQS Queues

    SQS Queue alarms are created for:

    1. Age Of Oldest Message (disabled by default). If enabled, a threshold in seconds should be specified.
    2. In Flight Messages Percentage. This is a percentage of the AWS hard limits (20,000 messages for FIFO queues and 120,000 for standard queues).
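
    A sketch of enabling the age alarm and lowering the in-flight percentage, with example values. It assumes the SQS key sits directly under alarms alongside the other services. A threshold of 50 corresponds to 60,000 in-flight messages for a standard queue (50% of 120,000) and 10,000 for a FIFO queue (50% of 20,000).

    ```yaml
    custom:
      slicWatch:
        alarms:
          SQS:
            AgeOfOldestMessage:
              enabled: true
              Threshold: 300   # seconds
            InFlightMessagesPc:
              Threshold: 50    # percent of the hard limit
    ```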

    SQS queue dashboard widgets show:

    • Messages Sent, Received and Deleted
    • Messages Visible
    • Age of Oldest Message

    Step Functions

    Step Function alarms are created for:

    1. Execution Throttled
    2. Executions Failed
    3. Executions Timed Out

    The dashboard contains one widget per Step Function, showing ExecutionsFailed, ExecutionThrottled and ExecutionsTimedOut.

    Configuration

    Configuration is entirely optional - SLIC Watch provides defaults that work out of the box.

    Note: Alarm configuration is cascading. This means that configuration properties are automatically propagated from parent to child nodes, unless an override is present at a given node.

    You can customize the configuration:

    • at the top level, for all resources in each service, and/or
    • at the level of individual functions.
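
    To make the cascading behaviour concrete, here is a sketch with example values. It assumes that any alarm property may be set at any level, as described in the note above.

    ```yaml
    custom:
      slicWatch:
        alarms:
          Period: 300              # inherited by every alarm in every service
          Lambda:
            EvaluationPeriods: 2   # inherited by all Lambda alarms only
            Errors:
              Threshold: 5         # applies to the Lambda Errors alarm only
    ```

    In this sketch, the Lambda Errors alarm would end up with Period 300, EvaluationPeriods 2 and Threshold 5, while alarms for other services would pick up only the Period.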

    Plugin configuration

    Top-level plugin configuration can be specified in the custom -> slicWatch section of serverless.yml.

    • An SNS topic may optionally be provided as the destination for all alarms. If you omit it, alarms are still created but are not sent to any destination.
    • Alarms or dashboards can be disabled at any level in the configuration by adding enabled: false. You can even disable all plugin functionality by specifying enabled: false at the top-level plugin configuration.
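
    For example, a sketch that keeps the alarms but skips dashboard creation:

    ```yaml
    custom:
      slicWatch:
        dashboard:
          enabled: false   # alarms are still created
    ```

    Moving enabled: false up one level, directly under slicWatch, would switch off the plugin altogether.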

    Supported options along with their defaults are shown below.

    # ...
    
    custom:
      slicWatch:
        topicArn: SNS_TOPIC_ARN  # This is optional but recommended so you can receive alarms via email, Slack, etc.
        enabled: true
    
        alarms:
          enabled: true
          Period: 60
          EvaluationPeriods: 1
          TreatMissingData: notBreaching
          ComparisonOperator: GreaterThanThreshold
          Lambda: # Lambda Functions
            Errors:
              Threshold: 0
              Statistic: Sum
            ThrottlesPc: # Throttles are evaluated as a percentage of invocations
              Threshold: 0
            DurationPc: # Duration is evaluated as a percentage of the function timeout
              Threshold: 95
              Statistic: Maximum
            Invocations: # No invocation alarms are created by default. Override threshold to create alarms
              enabled: false # Note: this one requires both `enabled: true` and `Threshold: someValue` to be effectively enabled
              Threshold: null
              Statistic: Sum
            IteratorAge:
              Threshold: 10000
              Statistic: Maximum
          ApiGateway: # API Gateway REST APIs
            5XXError:
              Statistic: Average
              Threshold: 0
            4XXError:
              Statistic: Average
              Threshold: 0.05
            Latency:
              ExtendedStatistic: p99
              Threshold: 5000
          States: # Step Functions
            Statistic: Sum
            ExecutionsThrottled:
              Threshold: 0
            ExecutionsFailed:
              Threshold: 0
            ExecutionsTimedOut:
              Threshold: 0
          DynamoDB:
            # Consumed read/write capacity units are not alarmed. These should either
            # be part of an auto-scaling configuration for provisioned mode or should be automatically
            # avoided for on-demand mode. Instead, we rely on persistent throttling
            # to alert failures in these scenarios.
            # Throttles can occur in normal operation and are handled with retries. Threshold should
            # therefore be configured to provide meaningful alarms based on higher than average throttling.
            Statistic: Sum
            ReadThrottleEvents:
              Threshold: 10
            WriteThrottleEvents:
              Threshold: 10
            UserErrors:
              Threshold: 0
            SystemErrors:
              Threshold: 0
          Kinesis:
            GetRecords.IteratorAgeMilliseconds:
              Statistic: Maximum
              Threshold: 10000
            ReadProvisionedThroughputExceeded:
              Statistic: Maximum
              Threshold: 0
            WriteProvisionedThroughputExceeded:
              Statistic: Maximum
              Threshold: 0
            PutRecord.Success:
              ComparisonOperator: LessThanThreshold
              Statistic: Average
              Threshold: 1
            PutRecords.Success:
              ComparisonOperator: LessThanThreshold
              Statistic: Average
              Threshold: 1
            GetRecords.Success:
              ComparisonOperator: LessThanThreshold
              Statistic: Average
              Threshold: 1
          SQS:
            # approximate age of the oldest message in the queue above threshold: messages aren't processed fast enough
            AgeOfOldestMessage:
              Statistic: Maximum
              enabled: false # Note: this one requires both `enabled: true` and `Threshold: someValue` to be effectively enabled
              Threshold: null
            # approximate number of messages in flight above threshold (in percentage of hard limit: 120000 for regular queues and 20000 for FIFO queues)
            InFlightMessagesPc:
              Statistic: Maximum
              Threshold: 80 # 80% of 120000 for regular queues or 80% of 20000 for FIFO queues
    
        dashboard:
          enabled: true
          timeRange:
            # For possible 'start' and 'end' values, see
            # https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/CloudWatch-Dashboard-Body-Structure.html
            start: -PT3H
          metricPeriod: 300
          widgets:
            metricPeriod: 300
            width: 8
            height: 6
            Lambda:
              # Metrics per Lambda Function
              Errors:
                Statistic: ['Sum']
              Throttles:
                Statistic: ['Sum']
              Duration:
                Statistic: ['Average', 'p95', 'Maximum']
              Invocations:
                Statistic: ['Sum']
              ConcurrentExecutions:
                Statistic: ['Maximum']
              IteratorAge:
                Statistic: ['Maximum']
            ApiGateway:
              5XXError:
                Statistic: ['Sum']
              4XXError:
                Statistic: ['Sum']
              Latency:
                Statistic: ['Average', 'p95']
              Count:
                Statistic: ['Sum']
            States:
              # Step Functions
              ExecutionsFailed:
                Statistic: ['Sum']
              ExecutionsThrottled:
                Statistic: ['Sum']
              ExecutionsTimedOut:
                Statistic: ['Sum']
            DynamoDB:
              # Tables and GSIs
              ReadThrottleEvents:
                Statistic: ['Sum']
              WriteThrottleEvents:
                Statistic: ['Sum']
            Kinesis:
              # Kinesis Data Streams
              GetRecords.IteratorAgeMilliseconds:
                Statistic: ['Maximum']
              ReadProvisionedThroughputExceeded:
                Statistic: ['Sum']
              WriteProvisionedThroughputExceeded:
                Statistic: ['Sum']
              PutRecord.Success:
                Statistic: ['Average']
              PutRecords.Success:
                Statistic: ['Average']
              GetRecords.Success:
                Statistic: ['Average']
            SQS:
              # SQS Queues
              NumberOfMessagesSent:
                Statistic: ["Sum"]
              NumberOfMessagesReceived:
                Statistic: ["Sum"]
              NumberOfMessagesDeleted:
                Statistic: ["Sum"]
              ApproximateAgeOfOldestMessage:
                Statistic: ["Maximum"]
              ApproximateNumberOfMessagesVisible:
                Statistic: ["Maximum"]
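
    Only the values that differ from the defaults above need to be specified. As a sketch with example values, the following widens the dashboard time range, enlarges the widgets and changes the statistics shown for Lambda duration:

    ```yaml
    custom:
      slicWatch:
        dashboard:
          timeRange:
            start: -PT12H          # last 12 hours
          widgets:
            width: 12
            height: 6
            Lambda:
              Duration:
                Statistic: ['p99']
    ```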

    An example project is provided for reference: serverless-test-project

    Function-level configuration

    For each function, add the slicWatch property to configure specific overrides for alarms and dashboards relating to the AWS Lambda Function resource.

    functions:
      hello:
        handler: basic-handler.hello
        slicWatch:
          dashboard:
            enabled: false    # No Lambda widgets will be created for this function
          alarms:
            Lambda:
              Invocations:
                Threshold: 2  # The invocation threshold is specific to
                              # this function's expected invocation count

    To disable all alarms for any given function, use:

    functions:
      hello:
        handler: basic-handler.hello
        slicWatch:
          alarms:
            Lambda:
              enabled: false
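
    Function-level overrides can also adjust individual thresholds instead of disabling alarms. In the sketch below, the function name, handler and values are hypothetical. With a 30-second timeout, a DurationPc threshold of 50 corresponds to 15 seconds.

    ```yaml
    functions:
      processOrders:               # hypothetical function name
        handler: orders.handler    # hypothetical handler
        timeout: 30
        slicWatch:
          alarms:
            Lambda:
              DurationPc:
                Threshold: 50      # 50% of the 30-second timeout
              Errors:
                Threshold: 3
    ```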

    A note on CloudWatch cost

    This plugin creates additional CloudWatch resources that, apart from a limited free tier, have an associated cost. Depending on what you enable, SLIC Watch creates one dashboard and multiple alarms per stack. The total depends on the number of resources in each stack and the number of stacks you have.

    Check out the AWS CloudWatch Pricing page to understand the cost impact of creating CloudWatch resources.

    References

    Other Projects

    1. serverless-plugin-aws-alerts
    2. Real World Serverless Application - Serverless Operations
    3. CDK Watchful
    4. CDK Patterns - The CloudWatch Dashboard

    Reading

    1. AWS Well Architected Serverless Applications Lens
    2. How to Monitor Lambda with CloudWatch Metrics - Yan Cui

    LICENSE

    Apache - LICENSE

    Collaborators

    • eoinsha
    • lmammino