GitDB is a RESTful HTTP network service for reading and writing to git repositories. It is implemented in Coffeescript and uses the nodegit Javascript libgit2 bindings.

GitDB is a RESTful HTTP network service for reading and writing to git repositories. It is implemented in Coffeescript and uses the nodegit Javascript libgit2 bindings.

Check out a live demo of the HTTP interface here

GitDB was originally designed for a scalable, git-backed, wiki. Typically, git-backed wikis host the web server and git repository on the same machine. This approach is difficult to scale: if your site gets popular and you need to add more web servers, you need also partition the git repository. To remedy this, GitDB wraps a git repository in a network service: now, the web server tier can be deployed in a normal shared-nothing configuration, communicating with GitDB over an HTTP REST interface; and the git tier can scale independently.

One reason to use git as a datastore is that it models well the revision history of a corpus of documents, as in a wiki -- but git is, at its core, a durable store for blobs; trees; and directed, acyclic graphs; with a variety of atomic read and write operations. So GitDB may be useful in a variety of contexts.

GitDB is designed to be both low latency and convenient. Its API:

  • minimizes network round-trips
  • supports compression ubiquitously
  • support patch/delta get and set operations -- not yet implemented
  • supports high concurrency thanks to an asynchronous programming model

Finally, the API conveniently supports a variety of atomic-write and snapshot-read operations.

Although git was not designed for OLTP applications, it has very low latency (~1ms) for most common tasks, once the file-system cache is warm. Benchmarks will be forthcoming.

The API is inspired GitHub's HTTP API but it offers more flexibility for OLTP applications. The API is hypermedia in style, so some familiarity with custom media types and, of course, HTTP verbs is a must.

This document assumes some fmailiarity with Git's internals; check out Chapter 9 of Pro Git first if you are unfamiliar with Git's object model.

GitDB aims to have a "hypermedia" API. In theory, hypermedia APIs loosely couple clients and servers. The loose coupling is provided by hyperlinking and content negotiation.

GitDB provides a text/html interface, an application/json interface, and a few custom, "vendor" media-types, such as application/vnd.gitdb.raw. This last is used for retrieving the "raw" contents of a blob, i.e., the contents of a file without any metadata. It is be useful for very large files, where wrapping the a file in JSON would be computationally expensive to parse.

The json and vendor media types constitute the "API", while the html media type is browsable in your favorite browser. The URL structure and semantics of the HTTP verbs (PUT, GET, etc.) are identical across media types, so the system is partially "self-documenting".

application/json media may have one or more *_url properties linking to other resources. This is exactly analogous to <a> tags in html. These are provided so API clients don’t need to explicitly construct URLs. The * in *_url will be a link relation; for example, a repository is linked to its references by its ref_url attribute. For example, here is the JSON of a repository:

    "url": "/repos/Perseus",
    "refs_url": "/repos/Perseus/refs"

Whether you choose to follow links or construct them on your own is a practical decision. Following links is resilient to some API change: in principle, you can simply start at the root resources (/) and traverse to any other resource without ever hardcoding a URL in your source code. In practice this adds more network round-trips, at least at startup.

GitDB uses the HTTP verbs with a fairly standard interpretation. POST creates a new resource, PUT replaces a resource, PATCH updates a resource, and DELETE deletes a resource.

Every resource in git is immutable except for references (e.g., a branch). So PUT, PATCH, and DELETE, are only meaningful relative to a reference: that is, you may modify some files on a branch, but what happens under the covers is that new trees and blobs are created, a new commit is created, and the reference is updated to point to the new commit.

Metadata about repositories hosted by GitDB.

References are typically tags and branches; in theory, they can "refer" to any kind of git object, but usually they refer to commits. "Tags" are usually immutable, whereas branches often change.

You may want to view a file on a branch, or view the history (git log) of a branch:

"Head refs" are how branches keep track of the current commit. You can make a commit to a branch in two ways: either you can update many files at once using patch, or you can edit just one file to make a more precise commit, using put. In either case, these are atomic operations; all updates succeed or none succeed, and the If-Match HTTP header to perform a kind of compare-and-set.

Individual commit objects can be part of a branch's (or reference's) history, or they can float off in the aether. Note: creating a new commit is idempotent.

A branch potentially changes over time, but you can easily read the file system over many http requests at a specific point in time -- like a snapshot-read.

Blobs are how files are represented in git; they can be binary and very large. Since they are referenced by their sha fingerprint, they are immutable.

Trees represent the directory structure. Typically you will want to read a tree relative to a commit or a reference, but creating trees (and blobs) directly can sometimes be useful. For exampe, you may want to make several modifications over several HTTP requests, and then later commit them all atomically.