Contents
- print-nonascii: print lines that contain non-ASCII characters.
- Examples
- Installation
- Usage
- License
- Changelog
print-nonascii: print lines that contain non-ASCII characters.
print-nonascii is a Unix CLI that locates lines in text files or stdin input that contain non-ASCII characters, which is helpful when diagnosing character encoding problems.
Lines can be printed as-is and/or using abstract representations of non-ASCII characters in one of several formats, namely:
- -v, --caret ... the same representation cat -v uses, based on caret notation.
- --bash ... per-byte two-digit hex escape sequences such as \xc3
- --psh ... PowerShell Unicode escape sequences such as `u{20ac} for €
Note: --psh only works correctly with properly UTF-8-encoded input.
Line numbers can be prepended on request, and output for multiple input files is by default preceded with headers identifying each input file.
Caveat: For now, no automated tests are run before releases.
Examples
    # Create a test file with 1 line containing a non-ASCII character.
    $ cat <<'EOF' > /tmp/test.txt
    one
    twö
    three
    EOF

    # Print only lines that have non-ASCII characters, as-is.
    $ print-nonascii /tmp/test.txt
    twö

    # Print only lines that have non-ASCII characters, with line numbers:
    $ print-nonascii -n /tmp/test.txt
    2:twö

    # Print only lines that have non-ASCII characters, using PowerShell
    # Unicode escape-sequence notation (--psh), preceded by the
    # line as-is (--raw).
    # The Unicode code point of character "ö" is U+00F6:
    $ print-nonascii --psh --raw /tmp/test.txt
    twö
    tw`u{f6}

    # Ditto with per-byte Bash escape sequences:
    $ print-nonascii --bash --raw /tmp/test.txt
    twö
    tw\xc3\xb6

    # Simulate input from multiple files by specifying the same file
    # twice, so as to show the headers identifying each input file
    # (suppress with -b).
    # Note that each header line (invisibly) starts with control
    # character U+0001, so as to allow more predictable
    # identification of header lines in the output.
    $ print-nonascii -n /tmp/test.txt /tmp/test.txt
    ### /tmp/test.txt
    2:twö
    ### /tmp/test.txt
    2:twö
Installation
Prerequisites
- When installing from the npm registry: macOS and Linux
- When installing manually: any Unix platform with bash that also has perl installed.
Installation from the npm registry
With Node.js installed, install the package as follows:
[sudo] npm install print-nonascii -g
Note:
- Even if you don't use Node.js, its package manager, npm, works across platforms and is easy to install; try curl -L https://git.io/n-install | bash
- Whether you need sudo depends on how you installed Node.js / io.js and whether you've changed permissions later; if you get an EACCES error, try again with sudo.
- The -g ensures global installation and is needed to put print-nonascii in your system's $PATH.
Manual installation
- Download the CLI as print-nonascii.
- Make it executable with chmod +x print-nonascii.
- Move it or symlink it to a folder in your $PATH, such as /usr/local/bin (macOS) or /usr/bin (Linux).
Usage
Find concise usage information below; for complete documentation, read the manual online, or, once installed, run man print-nonascii (print-nonascii --man if installed manually).
    $ print-nonascii --help

    Prints lines that contain non-ASCII characters.

        print-nonascii [--<mode> [-r]] [-n] [-b] [file ...]
        print-nonascii -q [file ...]

    --<mode> prints abstract representations of non-ASCII chars.; one of:

      --caret, -v ... use caret notation, as cat -v would.
      --bash ... represent non-ASCII bytes as \xhh
      --psh ... (PowerShell) represent non-ASCII Unicode characters as
                Unicode escape sequences: <backtick>u{h...}

    -r, --raw ... with --<mode>, print each matching line as-is too, first.
    -n, --line-number ... prefix the output lines with their line number
        from the original file, using format "<line-number>:" - decimal
        line numbers, no padding, no space before or after the ":"
    -b, --bare ... suppress per-input-filename headers
    -q ... quiet mode: produce no output; signal presence of non-ASCII
        chars. with exit code 0; exit code 100 signals that there are none.

    Standard options: --help, --man, --version, --home
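The exit codes of quiet mode make the tool usable as a check in scripts. The following is a minimal sketch of a wrapper built only on the documented codes (0: non-ASCII characters present, 100: none). The function name `report_nonascii` is hypothetical, and treating any other exit code as an error is an assumption, since the help text does not describe other codes.

```shell
# Hypothetical helper: classify a file using the documented -q exit codes.
report_nonascii() {
  status=0
  # Capture the exit code without aborting scripts that run under `set -e`.
  print-nonascii -q "$1" || status=$?
  case $status in
    0)   echo "non-ASCII characters found" ;;
    100) echo "ASCII only" ;;
    # Assumption: any other code indicates a failure of some kind.
    *)   echo "unexpected exit code: $status" >&2; return "$status" ;;
  esac
}
```

With the tool installed, calling `report_nonascii /tmp/test.txt` on the test file from the Examples section should take the first branch, given that the file contains "ö".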
License
Copyright (c) 2017 Michael Klement mklement0@gmail.com (http://same2u.net), released under the MIT license.
Acknowledgements
This project gratefully depends on the following open-source components, according to the terms of their respective licenses.
npm dependencies below have an optional suffix denoting the type of dependency: the absence of a suffix denotes a required run-time dependency; (D) denotes a development-time-only dependency, (O) an optional dependency, and (P) a peer dependency.
npm dependencies
Changelog
Versioning complies with semantic versioning (semver).
- v0.0.3 (2017-09-11):
  - [enhancement] Header lines are now only printed for input files that produce at least 1 output line.
- v0.0.2 (2017-09-10):
  - [fix] Header line is no longer printed twice when --<mode> is combined with --raw.
  - Header line now uses a tab char. to separate prefix ### from the filename.
- v0.0.1 (2017-09-10):
  - Initial release.