@financial-times/cp-content-pipeline-schema
2.7.0 • Public • Published

Content Pipeline Schema

The GraphQL schema, data sources, and business logic consumed by the cp-content-pipeline-api and cp-content-pipeline-client. It is not intended for use outside this project.

Usage Guide

This package's exports are shown below.

import {
    typeDefs,
    articleDocumentQuery,
    QueryContext,
    resolvers,
    scalars,
    initDataSources,
    DataSources,
} from '@financial-times/cp-content-pipeline-schema'

They are consumed by a GraphQL server framework, for example:

import { ApolloServer } from 'apollo-server'
import {
  resolvers,
  typeDefs,
  dataSources,
} from '@financial-times/cp-content-pipeline-schema'

const server = new ApolloServer({
  typeDefs,
  resolvers,
  dataSources,
})

See the API package entry point for a more complex example.

Exports Explained

See GraphQL and Apollo Server Basics for more context on the below.

typeDefs

The GraphQL type definitions, which define the shape of the queries that can be executed against our data. They are written as .graphql files in the typedefs folder.

articleDocumentQuery

A GraphQL query for a complete FT article that is consumed by the cp-content-pipeline-api.

QueryContext

An interface that defines the context object sent to each resolver function.

initDataSources & DataSources

Data sources are used to fetch data from external sources, such as the content-api (CAPI). They are defined in the data sources directory. The factory function that creates them is initDataSources, and their type is exported as DataSources.

Type Checking

We use Zod schemas to run runtime checks on the data returned by CAPI, to ensure that it matches the types defined in our data source schema.

CAPI does not provide an agreed schema, so our schema may occasionally be out of date or incorrect.

It needs manual updates to keep it in sync with the responses we receive.

We monitor these errors in Grafana and log them to Splunk as recoverable errors that do not block the response, for example:

{
    "event": "RECOVERABLE_ERROR",
    "error": { 
        "code": "CAPI_SCHEMA_VALIDATION_FAILURE",
        "data": { 
            "contentId": "http://www.ft.com/thing/31884c4d-2da7-4b43-917e-3180e9eafa3d",
            "contentType": "Article",
            "schemaError": [ 
                "mainImage.members.[].format": "Invalid literal value, received 'promo', expected one of 'standardInline','mobile','desktop'..."
            ]
        },
        "isOperational": true,
        "message": "The data received from the CAPI data source does not match our data source schema. It is likely that our schema will require updating to handle all possible responses from CAPI."
    }
}
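
The non-blocking behaviour described above can be sketched roughly as follows. This is a dependency-free illustration, not the package's code: a hand-written check stands in for the Zod schema, and the names KNOWN_FORMATS, findSchemaErrors, and checkCapiResponse are hypothetical. The list of allowed formats is copied from the truncated log message above, so it is incomplete.

```typescript
// Hypothetical stand-in for a Zod schema. The real package uses Zod; this
// sketch uses a plain function so that it has no dependencies.
// Incomplete list, copied from the truncated example log above.
const KNOWN_FORMATS = ['standardInline', 'mobile', 'desktop']

interface CapiArticle {
  id: string
  type: string
  mainImage?: { members: { format: string }[] }
}

interface RecoverableErrorLog {
  event: 'RECOVERABLE_ERROR'
  error: {
    code: 'CAPI_SCHEMA_VALIDATION_FAILURE'
    data: { contentId: string; contentType: string; schemaError: string[] }
    isOperational: true
  }
}

// Collect every mismatch instead of throwing on the first one.
function findSchemaErrors(article: CapiArticle): string[] {
  const members = article.mainImage?.members ?? []
  return members
    .filter((member) => !KNOWN_FORMATS.includes(member.format))
    .map((member) => `mainImage.members.[].format: unexpected value '${member.format}'`)
}

// Validate, log any mismatch, and return the data regardless, so that an
// out-of-date schema never blocks the response.
function checkCapiResponse(
  article: CapiArticle,
  log: (entry: RecoverableErrorLog) => void
): CapiArticle {
  const schemaError = findSchemaErrors(article)
  if (schemaError.length > 0) {
    log({
      event: 'RECOVERABLE_ERROR',
      error: {
        code: 'CAPI_SCHEMA_VALIDATION_FAILURE',
        data: { contentId: article.id, contentType: article.type, schemaError },
        isOperational: true,
      },
    })
  }
  return article
}
```

The design choice worth noting is that the function returns the original data even when validation fails; the log entry is the only side effect.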

resolvers

The GraphQL resolvers, which define how the data for each type in the schema is fetched.

Type Checking

We use GraphQL Code Generator with the TypeScript Resolvers plugin to output TypeScript types for the expected function signatures for the resolvers. This prevents runtime GraphQL errors caused by unexpected data being returned.

Models

The resolvers accept and return data objects in the form of models instead of plain GraphQL objects. The model classes encapsulate business logic and interact with the data sources.

For example:

  • The Teaser.metaLink field is specified in the schema as the Concept GraphQL type
  • We use the Concept model for it internally
  • The Teaser.metaLink resolver accepts an instance of the Concept model as its parent argument. It returns an instance of the Concept model too
  • The Concept model is able to access the dataSources via its context

The mappers option in the config file is how we tell GraphQL Code Generator how to map the models to our resolvers. It is an object whose keys are the names of GraphQL types and whose values are the paths to the corresponding models.

For the Concept example above, this looks like:

{
  Concept: '../model/Concept#Concept as ConceptModel'
}

The as ConceptModel aliasing is to make sure the name doesn't collide with the Concept GraphQL type in the generated code.
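
To make the model pattern more concrete, here is a minimal sketch of a model that reaches the data sources through its context, and a field resolver for the Concept type that delegates to it. Everything here is an assumption for illustration: the shape of QueryContext, the concepts data source, and the prefLabel field are hypothetical and are not taken from this package. The data source is synchronous only to keep the sketch short; a real one would most likely return promises.

```typescript
// Hypothetical raw record returned by a data source.
interface ConceptData {
  id: string
  prefLabel: string
}

// Hypothetical, heavily reduced context object.
interface QueryContext {
  dataSources: {
    concepts: { get(id: string): ConceptData }
  }
}

// The model wraps an identifier plus the context, and owns the business logic.
class ConceptModel {
  private readonly id: string
  private readonly context: QueryContext

  constructor(id: string, context: QueryContext) {
    this.id = id
    this.context = context
  }

  // Reads from the data source via the context, then tidies up the label.
  prefLabel(): string {
    return this.context.dataSources.concepts.get(this.id).prefLabel.trim()
  }
}

// With a mapper in place, resolvers for the Concept type receive the model
// (not a plain GraphQL object) as their parent argument.
const resolvers = {
  Concept: {
    prefLabel: (parent: ConceptModel): string => parent.prefLabel(),
  },
}
```

Keeping the logic on the model means the resolver stays a one-line delegation, and the generated resolver types can check that the parent really is a ConceptModel.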

GraphQL Code Generator is also used to generate the client library.

Content resolution

The most complex resolver is the body() resolver, which resolves Financial Times article content. It resolves content in the format of a content-tree, a specification for representing content as an abstract syntax tree. The content-tree spec implements the unist spec.

As of January 2023, the content-tree spec defines everything that can appear in the body of an article. In the future, this may extend to other fields (e.g. Topper).

It is shared across Spark, Content & Metadata and Customer Products.

What are content-tree nodes?

Each node in the content-tree has a type property (e.g. paragraph, link, image-set), which determines what data is available on that node.
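
As a rough illustration, the node shapes can be modelled as a discriminated union on the type property. The sketch below only covers the body, paragraph, and text nodes that appear in this README's example; the interfaces and the toPlainText helper are hypothetical, and the real content-tree spec defines many more node types.

```typescript
// Minimal unist-style node shapes: leaf nodes carry a value,
// parent nodes carry children.
interface Text {
  type: 'text'
  value: string
}

interface Paragraph {
  type: 'paragraph'
  children: Text[]
}

interface Body {
  type: 'body'
  version: number
  children: Paragraph[]
}

type ContentNode = Body | Paragraph | Text

// Switching on `type` narrows the node, so each branch can only use the
// fields that exist on that kind of node.
function toPlainText(node: ContentNode): string {
  switch (node.type) {
    case 'text':
      return node.value
    case 'paragraph':
      return node.children.map(toPlainText).join('')
    case 'body':
      return node.children.map(toPlainText).join('\n')
  }
}
```

For example, calling toPlainText on a body with two paragraphs joins the text of each paragraph with a newline.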

How is the tree created?

As of January 2023, the bodyTree property is not yet being published to the Content API, so we convert the bodyXML field to a valid content tree within cp-content-pipeline.

This is done by the bodyXMLtoTree function. This function uses cheerio to parse the XML, and then traverses through the nodes, converting each one to a content-tree node.

We pass the bodyXMLtoTree function a set of tagMappings, which map XML nodes to content-tree nodes.
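
The idea of driving the conversion from a table of tag mappings can be sketched as below. This is not the package's implementation: cheerio is left out so the example has no dependencies, the input is an already-parsed element structure (XmlElement), and the convertToTree function, the mapping signature, and the choice to throw on unmapped tags are all assumptions made for illustration.

```typescript
// Stand-in for a parsed XML element (the real code parses bodyXML with cheerio).
interface XmlElement {
  tag: string
  text?: string
  children?: XmlElement[]
}

// Loosely typed content-tree node, enough for this sketch.
interface TreeNode {
  type: string
  version?: number
  value?: string
  children?: TreeNode[]
}

// A mapping turns one XML element, plus its already-converted children,
// into one content-tree node.
type TagMapping = (element: XmlElement, children: TreeNode[]) => TreeNode

// Hypothetical mapping table keyed by XML tag name.
const sketchTagMappings: Record<string, TagMapping> = {
  body: (_element, children) => ({ type: 'body', version: 1, children }),
  p: (_element, children) => ({ type: 'paragraph', children }),
  '#text': (element) => ({ type: 'text', value: element.text ?? '' }),
}

// Depth-first traversal: convert the children first, then the element itself.
function convertToTree(element: XmlElement, mappings: Record<string, TagMapping>): TreeNode {
  const children = (element.children ?? []).map((child) => convertToTree(child, mappings))
  const mapping = mappings[element.tag]
  if (!mapping) {
    // Assumption: unmapped tags are treated as an error in this sketch.
    throw new Error(`No mapping for <${element.tag}>`)
  }
  return mapping(element, children)
}
```

Supporting a new XML tag then only requires adding an entry to the mapping table, without touching the traversal.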

References

For some content-tree nodes, the CMS will not be able to provide all of the information required to render. For example, to render a tweet, we need to fetch the embed code from the Twitter API.

In the content-tree spec, this additional information will be marked as optional, as it may not be there when the tree is produced.

We provide this additional data to the tree by using an array of references - objects containing the extra information needed. These references can be queried using GraphQL, so we can make use of:

  • dataSources to fetch data
  • other resolvers (e.g. Picture) to share transformation logic

For nodes that require additional information, a file should be created in the content-tree/references folder, containing a GraphQL typeDef and resolver object.
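
A reference file for the tweet example might look something like the sketch below: a typeDef plus a resolver object that fills in the embed HTML through a data source. All names are hypothetical (the Tweet type and its fields, the TweetNode shape, the twitter data source, and getEmbedHtml), and the typeDef is held in a string only to keep the sketch self-contained.

```typescript
// Hypothetical GraphQL type for the extra data a tweet node needs.
const tweetTypeDefs = `
  type Tweet {
    type: String!
    html: String!
  }
`

// Hypothetical shape of the node as it appears in the tree: the CMS knows
// the tweet id, but not the embed HTML.
interface TweetNode {
  type: 'tweet'
  id: string
}

// Hypothetical slice of the query context.
interface ReferenceContext {
  dataSources: {
    twitter: { getEmbedHtml(id: string): Promise<string> }
  }
}

// The resolver object supplies the missing field by calling the data source.
const tweetResolvers = {
  Tweet: {
    type: (parent: TweetNode): string => parent.type,
    html: (parent: TweetNode, _args: unknown, context: ReferenceContext): Promise<string> =>
      context.dataSources.twitter.getEmbedHtml(parent.id),
  },
}
```

Because the reference is resolved through GraphQL, the html field is only fetched when a query asks for it.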

Example

A heavily simplified example of what the body() resolver returns (including the references and the tree):

{
  "structured": {
    "references": {
      ...
    },
    "tree": {
      "type": "body",
      "version": 1,
      "children": [
        {
          "type": "paragraph",
          "children": [
            {
              "type": "text",
              "value": "Some text"
            }
          ]
        },
        {
          "type": "paragraph",
          "children": [
            {
              "type": "text",
              "value": "Some text"
            }
          ]
        }
...
}

scalars

The collection of GraphQL types for resolvers which return scalars.

Architecture Diagram

flowchart TB
    subgraph typeDefs[typeDefs]
    end

    subgraph articleDocumentQuery[articleDocumentQuery]
    end

    subgraph articleJsonData[article content JSON data]
    end

    subgraph resolvers[resolvers]
        subgraph contentResolvers[content.ts]
            subgraph idResolver["id()"]
            end
            subgraph bodyResolver["body()"]
            end
            subgraph etcResolvers["..."]
            end
        end
    end

    subgraph bodyXMLToTree["bodyXMLToTree()"]
    end

    subgraph tagMappings["tagMappings()"]
    end

    subgraph models[models]
    end

    subgraph dataSources[dataSources]
    end

    typeDefs -- structures query --> articleDocumentQuery
    typeDefs -- structures resolvers -->  resolvers
    articleDocumentQuery -- once executed (via client or api), the query returns article data-->  articleJsonData
    dataSources --> models
    models --> resolvers
    resolvers  <-- query initiates resolvers --> articleDocumentQuery
    bodyXMLToTree -- traverses XML and returns content-tree structure --> bodyResolver
    tagMappings -- mappings provided to differentiate nodes --> bodyXMLToTree

Install

npm i @financial-times/cp-content-pipeline-schema

Weekly Downloads

223

Version

2.7.0

License

ISC

Unpacked Size

2.09 MB

Total Files

339

Collaborators

  • robertboulton
  • seraph2000
  • hamza.samih
  • notlee
  • emmalewis
  • aendra
  • the-ft
  • rowanmanning
  • chee
  • alexwilson