    kasha

    0.14.6 • Public • Published

    Kasha

    Pre-render your Single-Page Application.

    Features

    • Prerender the Single-Page Application.
    • Automatically collect sitemaps from <meta>s.
    • Generate robots.txt with sitemap directives.
    • Sync prerendering.
    • Async prerendering with callback URL.
    • URL rewriting.
    • Works as a proxy server.
    • Rich APIs.
    • Caching.

    Requirements

    SPA compatibility adjustments

    For the pre-rendered SPA to work correctly in the browser, you need to make a few adjustments:

    • When pre-rendering, intercept the anonymous AJAX requests and store their responses in a <script> tag, so the same requests are not sent again on the client side. Our AJAX libraries teleman and teleman-ssr-cache may help.
    • On the client side, mount the SPA and replace the pre-rendered content.
    • Set <meta> tags so that search engines can learn more about the page. You can use set-meta.
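
The first adjustment can be sketched without any particular library. This is a minimal illustration and not the teleman-ssr-cache API: the function names `serializeCache`, `readCache` and `cachedGet`, the element id `ssr-cache`, and the cache shape (a plain object keyed by request URL) are all assumptions made for the example.

```javascript
// Hypothetical helpers; names and cache shape are assumptions, not part of kasha or teleman.

// During pre-rendering: turn the collected AJAX responses into a <script> tag.
// "<" is escaped so a response containing "</script>" cannot end the tag early.
function serializeCache(cache) {
  const json = JSON.stringify(cache).replace(/</g, '\\u003c')
  return `<script type="application/json" id="ssr-cache">${json}</script>`
}

// On the client: parse the text content of that tag back into an object.
function readCache(scriptText) {
  return scriptText ? JSON.parse(scriptText) : {}
}

// On the client: answer from the cache when possible, otherwise use the network.
async function cachedGet(cache, url, fetchImpl = fetch) {
  if (url in cache) {
    const body = cache[url]
    delete cache[url] // use each pre-rendered response only once
    return body
  }
  const res = await fetchImpl(url)
  return res.json()
}
```

In a browser, the object passed to `cachedGet` could come from `readCache(document.getElementById('ssr-cache')?.textContent)`.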

    Installation

    npm i -g kasha

    Docker:

    docker pull kasha/kasha

    Configuration

    See config.sample.js

    Running

    Start the server:

    kasha server --config=/path/to/config.js

    Docker:

    docker run -v /path/to/config.js:/dest/to/config.js kasha/kasha server --config=/dest/to/config.js

    Start the worker:

    kasha worker --config=/path/to/config.js
     
    # async worker 
    # requests with 'callbackURL' parameter will be dispatched to async workers. 
    kasha worker --async --config=/path/to/config.js

    Docker:

    docker run -v /path/to/config.js:/dest/to/config.js kasha/kasha worker [--async] --config=/dest/to/config.js

    Site Config

    db.sites.insert({
      // The hostname of your site.
      host: 'www.example.com',
     
      // In proxy mode, if the request doesn't contain 'X-Forwarded-Proto' or 'Forwarded:...proto=...' header,
      // then use 'defaultProtocol'.
      defaultProtocol: 'https',
      
      // If your site uses REST-style URLs, like /article/123, and the page doesn't need the query string,
      // you can remove the query string to improve the cache hit rate:
      // keepQuery: false,
     
      // You can also keep only the required query parameters of specific URLs:
      keepQuery: [
        [
          '/search', // the first element is the pathname of the URL.
          'type', // the remaining elements are the names of the query parameters to keep.
          'keyword'
        ],
     
        // another URL and its query names
        ['/product', 'id']
      ],
     
      // You can use the '/render' API to crawl the hash-based Single-page application.
      // For example, you can crawl https://www.example.com/app/#/home via
      // /render?url=https%3A%2F%2Fwww.example.com%2Fapp%2F%23%2Fhome
      
      // But if this site is not hash-based, you can remove the hash:
      keepHash: false,
      
      // Rewrites the request URL.
      rewrites: [
        // [from, to]
        // If 'to' is an empty string, the request will be aborted.
        // For the pattern syntax, see https://github.com/jiangfengming/url-router#pattern
     
        // route all requests to the entry point HTML file
        ['https://www.example.com/(.*)', 'https://static.example.com/index.html'],
     
        // except robots.txt
        ['https://www.example.com/robots.txt', 'https://static.example.com/robots.txt'],
     
        // or block it if you do not have one
        // ['https://www.example.com/robots.txt', ''],
     
        // block Google Tag Manager (analytics) requests
        ['https://www.googletagmanager.com/(.*)', '']
      ],
     
      // Excludes the pages that don't need pre-rendering.
      excludes: [
        '/your-account/(.*)',
        '/your-orders/(.*)'
      ],
     
      // But still pre-render these pages, even though they match an excludes pattern
      includes: [
        '/your-account/signin'
      ],
      
      // Specifies the User-Agent
      userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Safari/537.36',
      
      // You can create profiles for different device types.
      // A profile can override keepQuery, keepHash, rewrites, excludes, includes, userAgent.
     
      profiles: {
        desktop: {
          userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Safari/537.36',
          rewrites: [
            [
              'https://www.example.com/(.*)',
              'https://static.example.com/desktop/index.html'
            ]
          ]
        },
     
        mobile: {
          userAgent: 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Mobile Safari/537.36',
          rewrites: [
            [
              'https://www.example.com/(.*)',
              'https://static.example.com/mobile/index.html'
            ]
          ]
        }
      },
     
      // If the profile param of the request isn't set, this profile is used
      defaultProfile: 'desktop'
    })
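
To make the effect of keepQuery and keepHash concrete, here is a rough sketch of URL normalization. It is not kasha's implementation. It assumes that keepQuery is true, false, or an array of [pathname, ...names] rules, that pathnames are compared exactly, and that a URL matching no rule loses its whole query string. The function name `normalizeUrl` is hypothetical.

```javascript
// Sketch only: kasha's real matching rules may differ (see the assumptions above).
function normalizeUrl(rawUrl, { keepQuery = true, keepHash = true } = {}) {
  const url = new URL(rawUrl)

  if (keepQuery === false) {
    url.search = ''
  } else if (Array.isArray(keepQuery)) {
    // Find the rule whose first element equals the pathname.
    const rule = keepQuery.find(([pathname]) => pathname === url.pathname)
    const keep = rule ? rule.slice(1) : []
    // Copy the keys first, because deleting while iterating skips entries.
    for (const name of [...url.searchParams.keys()]) {
      if (!keep.includes(name)) url.searchParams.delete(name)
    }
  }

  if (!keepHash) url.hash = ''
  return url.href
}

const rules = [['/search', 'type', 'keyword'], ['/product', 'id']]
console.log(normalizeUrl(
  'https://www.example.com/search?type=a&keyword=b&utm_source=x#top',
  { keepQuery: rules, keepHash: false }
))
// prints https://www.example.com/search?type=a&keyword=b
```

Fewer distinct URLs per page means more requests map to the same cache entry, which is the cache hit rate improvement mentioned in the config comments.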

    APIs

    Make sure apiHost is set correctly.

    For example, if apiHost is set to '127.0.0.1:3000', only requests to http(s)://127.0.0.1:3000/* can access the APIs. Requests for all other hosts are served in proxy mode.

    GET /render

    Renders the page.

    Query string params:

    url: The encoded URL of the webpage to render.

    profile: The profile to use.

    type: Set the response type. Defaults to json.

    • html: Returns html with header Content-Type: text/html.
    • json: Returns json with header Content-Type: application/json.
    • static: Returns html with header Content-Type: text/html, but with the <script> tags and on* event handlers stripped.

    callbackURL: Don't wait for the result. Once the job is done, the result is POSTed to the given URL in JSON format. If callbackURL is set, type is ignored.

    metaOnly: If type is json, returns only the metadata, without the html content.

    followRedirect: Follows the redirect if the page returns 301/302.

    refresh: Forces a cache refresh.

    noWait: Don't wait for the response. This is useful for pre-caching the page.

    fallback: If no cache is found or the cache has expired, the request is proxied directly to the origin. If fallback is set, type must be html, and callbackURL, metaOnly, followRedirect, refresh and noWait cannot be set.

    For boolean parameters, an absent param or a value of 0 means false. A value of 1 or an empty value (e.g., &refresh, &refresh=, &refresh=1) means true.
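
The boolean rule can be expressed in a few lines. This sketch uses the standard URLSearchParams class. The function name `readFlag` is hypothetical, and rejecting values other than 0, 1 or empty is an assumption, since the rule above does not cover them.

```javascript
// Reads a boolean flag from a query string according to the rule above.
function readFlag(query, name) {
  const params = new URLSearchParams(query)
  if (!params.has(name)) return false // absent means false

  const value = params.get(name)
  if (value === '0') return false
  if (value === '' || value === '1') return true // "&refresh", "&refresh=", "&refresh=1"

  // Assumption: the documentation does not say how other values are treated.
  throw new RangeError(`Unexpected value for "${name}": ${value}`)
}

console.log(readFlag('url=x&refresh', 'refresh'))   // prints true
console.log(readFlag('url=x&refresh=0', 'refresh')) // prints false
```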

    Example: http://localhost:3000/render?url=https%3A%2F%2Fdavidwalsh.name%2Ffacebook-meta-tags

    The returned JSON format example:

    {
      "url": "https://davidwalsh.name/facebook-meta-tags",
      "profile": "",
      "status": 200,
      "redirect": null,
      "meta": {
        "title": "Facebook Open Graph META Tags",
        "description": "Facebook's Open Graph protocol allows for web developers to turn their websites into Facebook \"graph\" objects, allowing a certain level of customization over how information is carried over from a non-Facebook website to Facebook when a page is \"recommended\" and \"liked\".",
        "image": "https://davidwalsh.name/demo/facebook-developers-logo.png",
        "canonicalUrl": "https://davidwalsh.name/facebook-meta-tags",
        "author": "David Walsh",
        "keywords": null
      },
      "openGraph": {
        "og": {
          "locale": {
            "current": "en_US"
          },
          "type": "article",
          "title": "Facebook Open Graph META Tags",
          "description": "Facebook's Open Graph protocol allows for web developers to turn their websites into Facebook \"graph\" objects, allowing a certain level of customization over how information is carried over from a non-Facebook website to Facebook when a page is \"recommended\" and \"liked\".",
          "url": "https://davidwalsh.name/facebook-meta-tags",
          "site_name": "David Walsh Blog",
          "updated_time": "2016-02-23T00:44:54+00:00",
          "image": [
            {
              "url": "https://davidwalsh.name/demo/facebook-developers-logo.png",
              "secure_url": "https://davidwalsh.name/demo/facebook-developers-logo.png"
            },
            {
              "url": "https://davidwalsh.name/demo/david-facebook-share.png",
              "secure_url": "https://davidwalsh.name/demo/david-facebook-share.png"
            }
          ]
        },
        "article": {
          "publisher": "https://www.facebook.com/davidwalshblog",
          "section": "APIs",
          "published_time": "2011-04-25T09:24:28+00:00",
          "modified_time": "2016-02-23T00:44:54+00:00"
        }
      },
      "content": "<!DOCTYPE html><html>...</html>",
      "date": "2018-03-13T09:53:00.921Z"
    }
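
A client for this endpoint might look like the sketch below. It assumes a runtime with a global `fetch` (browsers, Node.js 18+). `apiBase` stands for wherever your apiHost is reachable, and the function names `buildRenderUrl` and `fetchMeta` are hypothetical.

```javascript
// Builds a /render URL. Boolean options set to true are sent as "1",
// false or missing options are left out, other values are sent as strings.
function buildRenderUrl(apiBase, pageUrl, options = {}) {
  const params = new URLSearchParams({ url: pageUrl })
  for (const [name, value] of Object.entries(options)) {
    if (value === true) params.set(name, '1')
    else if (value !== false && value != null) params.set(name, String(value))
  }
  return `${apiBase}/render?${params}`
}

// Fetches only the "meta" object of a page (type defaults to json).
async function fetchMeta(apiBase, pageUrl) {
  const res = await fetch(buildRenderUrl(apiBase, pageUrl, { metaOnly: true }))
  if (!res.ok) throw new Error(`kasha responded with HTTP ${res.status}`)
  const result = await res.json()
  return result.meta
}

console.log(buildRenderUrl('http://localhost:3000', 'https://davidwalsh.name/facebook-meta-tags'))
// prints http://localhost:3000/render?url=https%3A%2F%2Fdavidwalsh.name%2Ffacebook-meta-tags
```

URLSearchParams takes care of percent-encoding the page URL, so callers can pass it unencoded.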

    GET /:url

    Alias of /render?url=ENCODED_URL&type=html.

    For example, http://localhost:3000/https://www.example.com/ is equivalent to http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2F&type=html

    The profile param can be set with the Kasha-Profile header, and fallback with the Kasha-Fallback header.

    Note: the hash of the URL is not sent to the server. If you need the hash to reach the server, use the /render API.

    Proxy mode

    If the Host header of the request is not apiHost, or if an X-Forwarded-Host or Forwarded:...host=... header is set, the requested URL is treated as the url query param of the /render API, and type is set to html.

    For example, the following request

    GET /
    Host: www.example.com
    Kasha-Profile: mobile
    Kasha-Fallback: 1
    

    is equivalent to http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2F&type=html&profile=mobile&fallback=1
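
The mapping from a proxied request to its /render equivalent can be sketched as follows. This only illustrates the equivalence and is not kasha's code. It assumes lower-cased header names (as Node.js delivers them), handles only the X-Forwarded-* headers and not Forwarded, and uses a hypothetical `apiBase` and function name.

```javascript
// Maps a proxied request to the equivalent /render URL.
function equivalentRenderUrl(apiBase, request, defaultProtocol = 'https') {
  const { path, headers } = request
  // Fall back to the site's defaultProtocol when no protocol header is present.
  const protocol = headers['x-forwarded-proto'] || defaultProtocol
  const host = headers['x-forwarded-host'] || headers.host

  const params = new URLSearchParams({
    url: `${protocol}://${host}${path}`,
    type: 'html' // proxy mode always returns html
  })
  if (headers['kasha-profile']) params.set('profile', headers['kasha-profile'])
  if (headers['kasha-fallback']) params.set('fallback', headers['kasha-fallback'])
  return `${apiBase}/render?${params}`
}

console.log(equivalentRenderUrl('http://localhost:3000', {
  path: '/',
  headers: { host: 'www.example.com', 'kasha-profile': 'mobile', 'kasha-fallback': '1' }
}))
// prints http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2F&type=html&profile=mobile&fallback=1
```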

    GET /cache?url=URL

    Alias of /render?url=ENCODED_URL&noWait

    GET /:site/robots.txt

    Gets the robots.txt file with the sitemaps collected by kasha. For example:

    http://localhost:3000/https://www.example.com/robots.txt
    

    It fetches the https://www.example.com/robots.txt file, then appends the sitemap directives at the end. Example result:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /private/
     
    Sitemap: https://www.example.com/sitemaps.index.1.xml
    Sitemap: https://www.example.com/sitemaps.index.google.1.xml
    Sitemap: https://www.example.com/sitemaps.index.google.news.1.xml
    Sitemap: https://www.example.com/sitemaps.index.google.image.1.xml
    Sitemap: https://www.example.com/sitemaps.index.google.video.1.xml
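
The append step amounts to the following. This is a minimal sketch, not kasha's code: the function name `appendSitemaps` is hypothetical, and separating the directives from the original rules with one blank line is an assumption based on the example above.

```javascript
// Appends one "Sitemap:" directive per URL to the end of a robots.txt body.
function appendSitemaps(robotsTxt, sitemapUrls) {
  const body = robotsTxt.replace(/\s+$/, '') // drop trailing whitespace
  const directives = sitemapUrls.map(url => `Sitemap: ${url}`).join('\n')
  return directives ? `${body}\n\n${directives}\n` : `${body}\n`
}

console.log(appendSitemaps('User-agent: *\nDisallow: /tmp/\n', [
  'https://www.example.com/sitemaps.index.1.xml',
  'https://www.example.com/sitemaps.index.google.1.xml'
]))
```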

    GET /:site/sitemaps.:page.xml

    Get sitemap of page N.

    For example:

    http://localhost:3000/https://www.example.com/sitemaps.1.xml
    

    GET /:site/sitemaps.google.:page.xml

    Get Google sitemap of page N.

    GET /:site/sitemaps.google.news.:page.xml

    Get Google news sitemap of page N.

    GET /:site/sitemaps.google.image.:page.xml

    Get Google image sitemap of page N.

    GET /:site/sitemaps.google.video.:page.xml

    Get Google video sitemap of page N.

    GET /:site/sitemaps.index.:page.xml

    Get sitemap index file of page N.

    GET /:site/sitemaps.index.google.:page.xml

    Get Google sitemap index file of page N.

    GET /:site/sitemaps.index.google.news.:page.xml

    Get Google news sitemap index file of page N.

    GET /:site/sitemaps.index.google.image.:page.xml

    Get Google image sitemap index file of page N.

    GET /:site/sitemaps.index.google.video.:page.xml

    Get Google video sitemap index file of page N.
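
All ten sitemap routes follow one naming scheme, which a small helper can generate. The helper name `sitemapUrl`, its option names and `apiBase` are hypothetical. `kind` is one of '', 'google', 'google.news', 'google.image' or 'google.video', taken from the routes listed above.

```javascript
// Builds a sitemap endpoint URL following the route patterns listed above.
function sitemapUrl(apiBase, site, { page = 1, kind = '', index = false } = {}) {
  const parts = ['sitemaps']
  if (index) parts.push('index') // sitemap index file instead of a sitemap
  if (kind) parts.push(kind)     // e.g. 'google.news'
  parts.push(String(page), 'xml')
  return `${apiBase}/${site}/${parts.join('.')}`
}

console.log(sitemapUrl('http://localhost:3000', 'https://www.example.com'))
// prints http://localhost:3000/https://www.example.com/sitemaps.1.xml

console.log(sitemapUrl('http://localhost:3000', 'https://www.example.com',
  { index: true, kind: 'google.news', page: 2 }))
// prints http://localhost:3000/https://www.example.com/sitemaps.index.google.news.2.xml
```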

    Collecting sitemap data

    kasha can collect sitemap data from custom Open Graph <meta> tags. For example:

    <head prefix="og: http://ogp.me/ns# sitemap: https://kasha-io.github.io/kasha/ns/sitemap#">
     
    <!--
    The canonical URL is used as the <loc> tag of the sitemap XML.
    <meta property="og:url" content="..."> can also be used.
    -->
    <link rel="canonical" href="https://www.example.com/test.html">
     
    <meta property="sitemap:changefreq" content="hourly">
    <meta property="sitemap:priority" content="1">
    <meta property="sitemap:news:publication:name" content="The Example Times">
    <meta property="sitemap:news:publication:language" content="en">
    <meta property="sitemap:news:publication_date" content="2018-05-25T09:19:54.000Z">
    <meta property="sitemap:news:title" content="Page Title">
    <meta property="sitemap:image:loc" content="http://examples.opengraphprotocol.us/media/images/train.jpg">
    <meta property="sitemap:image:caption" content="The caption of the image.">
    <meta property="sitemap:image:geo_location" content="Limerick, Ireland">
    </head>

    Sitemap data is collected only if the canonical URL has the same origin as the current page.
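
The same-origin condition can be checked with the standard URL class. This is a sketch of the rule as stated, not kasha's code. The function name `shouldCollectSitemap` is hypothetical, and resolving a relative canonical URL against the page URL is an assumption.

```javascript
// Returns true when the canonical URL shares scheme, host and port with the page.
function shouldCollectSitemap(pageUrl, canonicalUrl) {
  try {
    return new URL(canonicalUrl, pageUrl).origin === new URL(pageUrl).origin
  } catch {
    return false // one of the URLs could not be parsed
  }
}

console.log(shouldCollectSitemap('https://www.example.com/a?x=1', 'https://www.example.com/test.html'))
// prints true
console.log(shouldCollectSitemap('https://www.example.com/a', 'https://static.example.com/test.html'))
// prints false
```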

    For the available tags, see the sitemap protocol and the Google sitemap extensions.

    License

    MIT

    The logo is derived from Prosymbols's camera icon, licensed under Creative Commons BY 3.0.

    Collaborators

    • jiangfengming