nws — whitespace normalization
`nws` is a Unix CLI that normalizes whitespace in text, offering several modes grouped into two categories:
- Whitespace transliteration modes:
Line endings can be changed to be Windows- or Unix-specific, and select
Unicode whitespace and punctuation can be replaced with their closest ASCII
equivalents.
- Whitespace condensing modes:
Trims leading and trailing runs of any mix of tabs and spaces, and replaces every other such run with a single space. The individual modes in this category differ
only in how multi-line input is treated.
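The line-internal part of condensing can be approximated with POSIX `sed`. The sketch below is a hypothetical stand-in, not how `nws` is implemented; the function name `condense_line` is made up for illustration, and `[[:blank:]]` matches only spaces and tabs, not other Unicode whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of nws's line-internal condensing;
# condense_line is an illustrative name, not part of nws.
condense_line() {
  # 1. Replace each run of spaces/tabs with a single space.
  # 2. Remove the single space that may remain at the start or end.
  printf '%s\n' "$1" | sed -e 's/[[:blank:]]\{1,\}/ /g' -e 's/^ //' -e 's/ $//'
}

condense_line "$(printf ' I \t\t will be normalized.\t\t')"
```

Collapsing first and trimming second means the trim only ever has to remove a single space at each end.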
Input can be provided either via filename arguments or via stdin.
Option `-i` offers in-place updating.
See the examples below, get concise usage information further below, or read the manual.
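To illustrate what in-place updating with an optional backup suffix involves (as with `-i` and `--in-place=.bak` in the examples), here is a minimal Bash sketch. `in_place` is a hypothetical helper, not part of `nws`; it runs any filter command that reads stdin and writes stdout, and `nws`'s own `-i` logic may well differ (for instance, in how it checks its inputs before updating).

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of nws: in-place updating with an optional
# backup suffix, similar in spirit to nws's -i[<ext>] option.
# Usage: in_place <file> <backup-suffix-or-empty> <filter-command> [args...]
in_place() {
  local file=$1 suffix=$2 tmp status=0
  shift 2
  tmp=$(mktemp) || return 1
  # Run the filter on the file's contents; leave the file untouched on failure.
  if ! "$@" < "$file" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # Create a backup only if a suffix (e.g., '.bak') was given.
  if [ -n "$suffix" ]; then
    cp -p "$file" "$file$suffix" || { rm -f "$tmp"; return 1; }
  fi
  # Overwrite the original's contents, which preserves its permissions.
  cat "$tmp" > "$file" || status=1
  rm -f "$tmp"
  return "$status"
}
```

For instance, `in_place from-windows.txt .bak tr -d '\r'` would strip all CRs and keep the original as `from-windows.txt.bak`; note that this filter is cruder than `--mode lf`, since it also removes CRs that are not at line ends.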
Examples
Transliteration Examples
```shell
# Converts a CRLF line-endings file (Windows) to an LF-only file (Unix).
# No output is produced, because the file is updated in-place; a backup
# of the original file is created with suffix '.bak'.
$ nws --mode lf --in-place=.bak from-windows.txt

# Converts an LF-only file (Unix) to a CRLF line-endings file (Windows).
# No output is produced, because the file is updated in-place; since no
# backup suffix is specified, no backup file is created.
$ nws --crlf -i from-unix.txt

# Converts select Unicode whitespace and punctuation characters to their
# closest ASCII equivalents and sends the output to a different file.
# Note that any other non-ASCII characters are left untouched.
# Helpful for converting code samples that were formatted for display back to
# valid source code.
# IMPORTANT: This only works with properly encoded UTF-8 files.
$ nws --ascii unicode-punct.txt > ascii-punct.txt
```
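The line-ending conversions can be roughly approximated with POSIX `awk`, which is handy for seeing what `lf` and `crlf` do to the bytes. This is a hypothetical sketch, not `nws`'s implementation; `to_lf` and `to_crlf` are illustrative names.

```shell
#!/usr/bin/env bash
# Hypothetical POSIX awk stand-ins for nws's lf and crlf modes;
# to_lf/to_crlf are illustrative names, not part of nws.

to_lf() {
  # Drop a CR that directly precedes the end of each line.
  awk '{ sub(/\r$/, ""); print }'
}

to_crlf() {
  # Drop any existing trailing CR first, so CRLF input doesn't get a second CR.
  awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}

# cat -et makes the difference visible: CRs show as ^M, line ends as $.
printf 'one\r\ntwo\r\n' | cat -et
printf 'one\r\ntwo\r\n' | to_lf | cat -et
```

Stripping before appending makes `to_crlf` idempotent, the property the changelog entry for v0.3.2 describes for `--crlf`.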
Condensing Examples
- Output from the example commands is piped to `cat -et` to better illustrate the output; `cat -et` shows line endings as `$` (and control characters as `^<char>`; e.g., a tab would show as `^I`).
```shell
# -- Single-input-line normalization (mode option doesn't apply).
> nws | cat -et
I will be normalized.$

# Ditto, but with a mix of spaces and tabs (strings are passed via stdin;
# operands are interpreted as filenames).
> printf ' I \t\t will be normalized.\t\t\n' | nws | cat -et
I will be normalized.$

# -- Multi-input-line normalizations, using different modes.

# Create demo file.
> cat <<EOF > /tmp/nws-demo
$(printf '\t')
one
two
$(printf '\t')
three

EOF

# Multi-paragraph mode (the default), or with `--mp`, `-m mp`, or
# `--mode multi-para`.
# In addition to line-internal normalization,
# folds runs of blank/empty lines into 1 empty line each.
$ nws < /tmp/nws-demo | cat -et
$
one$
two$
$
three$
$

# Single-paragraph mode: `--sp` or `-m sp` or `--mode single-para`.
# In addition to line-internal normalization,
# removes all blank/empty lines.
$ nws --sp < /tmp/nws-demo | cat -et
one$
two$
three$

# Flattened multi-paragraph mode: `--fp` or `-m fp` or `--mode flat-para`.
# In addition to line-internal normalization,
# joins paragraph-internal lines with a space each.
$ nws --fp < /tmp/nws-demo | cat -et
$
one two$
$
three$
$

# Single-output-line mode: `--sl` or `-m sl` or `--mode single-line`.
# In addition to line-internal normalization,
# joins all non-empty/non-blank lines with a space each
# to form a single, long output line.
$ nws --sl < /tmp/nws-demo | cat -et
one two three$
```
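The multi-line behavior of three of these modes can be approximated with POSIX `awk`. This is a hypothetical sketch for illustration only, not `nws`'s implementation: the function names are made up, "blank" is assumed to mean empty or containing only spaces and tabs, and `fp` is omitted for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical awk approximations of three nws condensing modes; not nws's
# implementation. "Blank" here means empty or containing only spaces/tabs.

# Shared awk fragment: collapse space/tab runs, then trim the ends.
CONDENSE='{ gsub(/[ \t]+/, " "); sub(/^ /, ""); sub(/ $/, "") }'

mp_approx() {  # like mp: fold each run of blank lines into one empty line
  awk "$CONDENSE"'
    $0 == "" { if (!blank) print ""; blank = 1; next }
    { print; blank = 0 }'
}

sp_approx() {  # like sp: drop blank lines entirely
  awk "$CONDENSE"'
    $0 != ""'
}

sl_approx() {  # like sl: join all non-blank lines with single spaces
  awk "$CONDENSE"'
    $0 != "" { out = (out == "" ? $0 : out " " $0) }
    END { print out }'
}

# Same shape of input as /tmp/nws-demo above.
printf '\t\none\ntwo\n\t\nthree\n\n' | mp_approx | cat -et
```

Each function condenses a line before classifying it, so a tab-only line counts as blank, mirroring how the demo file's tab lines are treated in the examples above.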
Installation
Supported platforms
- When installing from the npm registry: Linux and OSX
- When installing manually: any Unix-like platform with Bash and POSIX-compatible utilities.
Installation from the npm registry
Note: Even if you don't use Node.js, its package manager, npm, works across platforms and is easy to install; try `curl -L http://git.io/n-install | bash`.
With Node.js or io.js installed, install the package as follows:
```shell
[sudo] npm install nws-cli -g
```
Note:
- Whether you need `sudo` depends on how you installed Node.js / io.js and whether you've changed permissions later; if you get an `EACCES` error, try again with `sudo`.
- The `-g` ensures global installation and is needed to put `nws` in your system's `$PATH`.
Manual installation
- Download the CLI as `nws`.
- Make it executable with `chmod +x nws`.
- Move it or symlink it to a folder in your `$PATH`, such as `/usr/local/bin` (OSX) or `/usr/bin` (Linux).
Usage
Find concise usage information below; for complete documentation, read the manual online or, once installed, run `man nws` (`nws --man` if installed manually).
```
$ nws --help

Normalizes whitespace in one of several modes.

    nws [-m <mode>] [[-i[<ext>]] file...]

Condensing <mode>s:
  All these modes normalize runs of tabs and spaces to a single space each
  and trim leading and trailing runs; they only differ with respect to how
  multi-line input is processed.

    mp  (default) multi-paragraph: folds multiple blank lines into one
    fp  flattened multi-paragraph: normalizes each paragraph to single line
    sp  single-paragraph: removes all blank lines.
    sl  single-line: normalizes to single output line

Transliteration <mode>s:
    lf     translates line endings to LF-only (\n)
    crlf   translates line endings to CRLF (\r\n)
    ascii  translates Unicode whitespace and punctuation to ASCII

Alternatively, specify mode values directly as options; e.g., --sp in lieu of -m sp

Standard options: --help, --man, --version, --home
```
License
Copyright (c) 2015-2017 Michael Klement mklement0@gmail.com (http://same2u.net), released under the MIT license.
Acknowledgements
This project gratefully depends on the following open-source components, according to the terms of their respective licenses.
npm dependencies below have optional suffixes denoting the type of dependency; the absence of a suffix denotes a required run-time dependency: (D) denotes a development-time-only dependency, (O) an optional dependency, and (P) a peer dependency.
npm dependencies
Changelog
Versioning complies with semantic versioning (semver).
- v0.3.4 (2017-09-06):
- [doc] Clarified that `--mode ascii` (`--ascii`) only works with properly encoded UTF-8 files.
- v0.3.3 (2017-09-05):
- [enhancement] Error message for `-i` mode improved to reflect the count of input files in case the pre-updating check fails; with potentially batched `xargs`-mediated invocations, this at least provides a hint that only a given batch failed.
- [doc] Fixed typo in man page.
- v0.3.2 (2016-12-11):
- [fix] Mode `--crlf` is now idempotent with input that is already CRLF-terminated (previously, an extra CR was mistakenly added).
- v0.3.1 (2016-12-10):
- [doc] Copy-editing in read-me file.
- v0.3.0 (2016-11-13):
- [BREAKING CHANGE] `nws` is now file-based: operands are interpreted as filenames, and option `-i` allows in-place updating. Use stdin to provide strings as input, such as via `echo ... | nws ...`.
- [enhancement] New transliteration modes added for changing line-ending styles and for translating non-ASCII Unicode whitespace/punctuation to their closest ASCII equivalents.
- v0.2.0 (2015-09-18):
- [usability improvement] New, mnemonic mode names supersede the old numeric normalization modes (option-arguments for `-m`); mode names come in both short and long forms; similarly, `--mode` is now supported as a verbose alternative to `-m`.
- [deprecation] The numeric modes (0..3) still work, but should no longer be used and are no longer documented.
- [doc] `nws` now has a man page (if manually installed, use `nws --man`); `nws -h` now just prints concise usage information.
- v0.1.4 (2015-09-15):
- [dev] Makefile improvements; various other behind-the-scenes tweaks.
- v0.1.3 (2015-06-13):
- [doc] Read-me improvements.
- v0.1.2 (2015-06-13):
- [doc] Read-me improvements.
- v0.1.1 (2015-06-13):
- [doc] Read-me improvements.
- v0.1.0 (2015-06-13):
- Initial release.