node package manager
Stop writing boring code. Discover, share, and reuse within your team. Create a free org »

gitwalk

gitwalk: Bulk processing of git repos

npm badge npm downloads

gitwalk is a tool to manipulate multiple git repositories at once. You select a group of repos using an expression and provide an operation to be completed. Gitwalk will download the repos, iterate through them and run the operation for each one. This may be searching through files and commits, running tests and linters, editing files and pushing the changes back upstream — whatever you can think of.

gitwalk in the terminal

gitwalk is made in CoffeeScript and runs on Node.js. nodegit and node-github do most of the heavy lifting in the background.

Features

  • Wildcard matching
  • Integrates directly with GitHub API
  • Works with private repos via GitHub auth
  • Authenticated pushes via ssh and http
  • Lets you define groups of repositories
  • Built-in search tool
  • Easily extensible with new expressions and processors
  • Usable from the CLI as well as JavaScript


Installation

Gitwalk is distributed as an npm package. Use the following command to install it on your system:

$ npm install -g gitwalk

Make sure to include -g to get the CLI command in your $PATH.

Usage

The interface is pretty straight-forward. Here's the synopsis:

gitwalk [options] <expr...> <proc> <proc-args>

The first positional arguments are expressions that determine which repositories will be processed. You can pass one or more of them at the same time. The next one is your preferred processor which specifies what will happen with the repositories, followed by any arguments that it takes — usually a command, but different processors support different ones.

Check out the expressions and processors sections below for the details.

Examples

Here are a few quick examples that should work out of the box. Paste them in your terminal and see what happens.

$ gitwalk 'github:pazdera/tco:*' grep TODO
 
$ gitwalk '**/*' command 'tree .'
 
$ gitwalk 'github:pazdera/@(catpix|word_wrap)' commits 'echo "#{sha}: #{summary}"'

Expressions

Expressions determine which repositories will be processed by the gitwalk command. You can provide one or more of them and their results will be combined. Check out the following examples:

# Matches all branches of the npm repo on GitHub.
$ gitwalk 'github:npm/npm:*' ...
 
# Matches all the git repositories in my home dir.
$ gitwalk '~/**/*' ...
 
# Use ^ to exclude previously matched repositories.
# Matches all my repos on GitHub _except_ of scriptster.
$ gitwalk 'github:pazdera/*' '^github:pazdera/scriptster' ...
 
# URLs work too.
$ gitwalk 'https://github.com/pazdera/tco.git:*' ...
 
# You can predefine custom groups of repositories.
# Check out the _Groups_ resolver below.
$ gitwalk 'group:all-js' 'group:all-ruby' ...

What comes after the last colon in each expression is interpreted as branch name. If omitted, gitwalk assumes master by default. Also note that you can use globs for certain parts. Check out the detailed description of each resolver below for more information.

Under the hood, each type of expression is handled by a resolver. What follows is a description of those that come with gitwalk by default. However, it's really easy to add new ones. See the contribution guidelines.

GitHub

With this resolver, you can match repositories directly on GitHub. Gitwalk will make queries to their API and clone the repositories automatically.

github:<username>/<repo>:<branch>
  • username: GitHub username or organisation.
  • repo: Which repositories to process (glob expressions allowed).
  • [optional] branch: Name of the branch to process (glob expressions allowed).

Private repos

To be able to work with private repositories, you need to give gitwalk either your credentials or auth token. Put this into your configuration file:

{
  "resolvers": {
    "github": {
      "username": "example",
      "password": "1337_h4x0r",
      "token": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    }
  }
}

Push access

Pushing to repositories on GitHub again requires authentication. You'll need to give gitwalk either your SSH keys or credentials via the config file. Here's how to do it:

{
  "git": {
    "auth": {
      "github-http-auth": {
        "pattern": "^https?\\:\\/\\/github\\.com",
        "username": "example",
        "password": "1337_h4x0r"
      },
      "github-ssh-auth": {
        "pattern": "^git@github\\.com",
        "public_key": "/Users/radek/.ssh/id_rsa.pub",
        "private_key": "/Users/radek/.ssh/id_rsa",
        "passphrase": ""
      }
    }
  }
}

Glob

You can match repositories on your file system using glob. Note that gitwalk will clone the repository even if it's stored locally on your computer. One can never be too careful.

<path>:<branch>
  • path: Location of the repositories (glob expressions allowed).
  • [optional] branch: Name of the branch to process (glob expressions allowed).

URL

URLs aren't a problem eiter. Gitwalk will make a local copy of the remote repository and process it. However, you can't do any wildcard matching with URLs.

<url>:<branch>
  • url: An URL pointing to a git repository (glob not allowed).
  • [optional] branch: Name of the branch to process (glob expressions allowed).

Groups

You can define custom groups of expressions that you can refer to by name later on. There are no predefined groups.

group:<name>
  • name: Name of the group.

Configuring groups

Groups of expressions can be defined in the configuration file like this:

{
  "resolvers": {
    "groups": {
      "all-ruby": [
        "github:pazdera/tco",
        "github:pazdera/scriptster",
        "github:pazdera/catpix"
      ],
      "c": [
          "github:pazdera/*itree*:*",
          "^github:pazdera/e2fs*"
      ],
      "c++": [
        "github:pazdera/libcity:*",
        "github:pazdera/OgreCity:*",
        "github:pazdera/pop3client:*"
      ]
    }
  }
}

Processors

Processors are tasks that you can run on each of the repositories that were matched by your expressions. Just as with expressions, it's really easy to add your own. Here are the ones that ship with gitwalk by default:

# Search for unfinished work in all JavaScript files
$ gitwalk ... grep '(TODO|FIXME)' '**/*.js'
 
# List all files in the repository
$ gitwalk ... command 'tree .'
 
# Another way to search the files
$ gitwalk ... command 'git grep "(TODO|FIXME)"'
 
# Replace the year in all Ruby files
$ gitwalk ... files '**/*.rb' 'sed -i s/2015/2016/g #{file}'
 
# Simple commit message profanity detector
$ gitwalk ... commits 'grep "(f.ck|sh.t|b.llocks)" <<<"#{message}"'

The #{hashCurlyBraces} templates will be expanded into values before the command is executed. Each command exports different set of variables; check out the detailed descriptions below to find out more.

grep

Line-by-line search through all the files in the working directory using regular expressions, similar to grep.

$ gitwalk ... grep <pattern> [<files>]
  • regexp: The pattern to look for.
  • (optional) files: Limit the search to certain files only (glob expressions allowed). Searches all files by default.

files

Run a custom command for each file in the repository. The path parameter lets you match only certain files.

$ gitwalk ... files <path> <command>
  • path: Path filter (glob expressions allowed).
  • command: A command to run for each file
    • Expored vars
      • #{file}

commits

Run a command for each

$ gitwalk ... commits <command>
  • command: A command to run for each commit.
    • Expored vars
      • #{sha}
      • #{summary}
      • #{message}
      • #{author}
      • #{committer}

command

Run a command per repository. This is useful for running custom scripts. The current working directory for the command is set to the working tree of gitwalk's local clone. You should be able to use the git command as usual.

$ gitwalk ... command <command>
  • command: A command to run for each repository.

Configuration

Providing a config file isn't mandatory. However, you'll need one if you want to be able to git-push or access private repositories. Gitwalk loads the following files:

  • ~/.gitwalk.(json|cson)
  • /etc/gitwalk.(json|cson)

Both json and cson formats are accepted. Here's how a basic gitwalk config might look like:

{
  "resolvers": {
    "github": {
      "token": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    },
  },
  "git": {
    "auth": {
      "github-ssh": {
        "pattern": "^git@github\\.com",
        "public_key": "/Users/radek/.ssh/github.pub",
        "private_key": "/Users/radek/.ssh/github",
        "passphrase": ""
      }
    }
  },
  "logger": {
    "level": "debug"
  }
}

JavaScript API

The JS API is similar to the CLI interface:

var gitwalk = require('gitwalk');
 
gitwalk('github:pazdera/\*', gitwalk.proc.grep(/TODO/, '*.js'), function (err) {
    if (err) {
        console.log 'gitwalk failed (' + err + ')';
    }
});

Contributing

If you found a bug, please open a new issue with as much relevant information as possible. And in case you'd like to jump in and get your hands dirty, here are a few ideas to get you started:

  • Add new resolvers (BitBucket and other services)
  • Add new processors (profanity search, integration with linters)
  • The tests are atrocious (please help me write them!)
  • Finish writing the API docs

If you start working on something, feel free to create an issue or drop me a line to make sure you're not working on the same thing as somebody else.

Credits

Radek Pazdera <me@radek.io> http://radek.io

Licence

Please see LICENCE.