print-nonascii

print-nonascii: print lines that contain non-ASCII characters.

print-nonascii is a Unix CLI that locates lines in text files or
stdin input that contain non-ASCII characters, which is helpful when
diagnosing character encoding problems.

Lines can be printed as-is and/or with abstract representations of their non-ASCII characters, in one of the following formats:

  • -v, --caret ... the same representation cat -v uses, based on caret notation.
  • --bash ... per-byte two-digit hex escape sequences such as \xc3
  • --psh ... PowerShell Unicode escape sequences such as `u{20ac} for €

Note: --psh only works correctly with properly UTF-8-encoded input.

Line numbers can be prepended on request, and by default the output for multiple input files is preceded by headers identifying each input file.

Caveat: For now, no automated tests are run before releases.

Examples

# Create a test file with 1 line containing a non-ASCII character. 
$ cat <<'EOF' > /tmp/test.txt
one
twö
three
EOF
 
# Print only lines that have non-ASCII characters, as-is. 
$ print-nonascii /tmp/test.txt
twö
 
# Print only lines that have non-ASCII characters, with line numbers: 
$ print-nonascii -n /tmp/test.txt
2:twö
 
# Print only lines that have non-ASCII characters, using PowerShell  
# Unicode escape-sequence notation (--psh), preceded by the  
# line as-is (--raw). 
# The Unicode code point of character "ö" is U+00F6: 
$ print-nonascii --psh --raw /tmp/test.txt
twö
tw`u{f6}
 
# Ditto, but with per-byte Bash escape sequences: 
$ print-nonascii --bash --raw /tmp/test.txt
twö
tw\xc3\xb6
 
# Simulate input from multiple files by specifying the same file 
# twice, so as to show the headers identifying each input file  
# (suppress with -b). 
# Note that each header line (invisibly) starts with control  
# character U+0001, so as to allow more predictable 
# identification of header lines in the output. 
$ print-nonascii -n /tmp/test.txt /tmp/test.txt 
### /tmp/test.txt
2:twö
### /tmp/test.txt
2:twö
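
Because header lines start with U+0001, downstream scripts can tell them apart from matching lines reliably. The sketch below is a hypothetical helper (to_grep_format is not part of print-nonascii) that turns multi-file -n output into grep-style file:line:text records. It assumes the header layout described above and in the changelog: U+0001, then ###, then a tab, then the filename.

```shell
# Hypothetical post-processing helper; not part of print-nonascii.
# Assumed header layout: U+0001, "###", a tab, then the filename.
to_grep_format() {
  awk '
    BEGIN { prefix = "\001###\t" }
    # Header line: remember the filename and print nothing.
    index($0, prefix) == 1 { file = substr($0, length(prefix) + 1); next }
    # Matching line: prefix it with the current filename.
    { print file ":" $0 }
  '
}

# Simulated `print-nonascii -n` output for one file:
printf '\001###\t/tmp/test.txt\n2:tw\303\266\n' | to_grep_format
# /tmp/test.txt:2:twö
```

With the tool installed, the same helper would be used as: print-nonascii -n file1 file2 | to_grep_format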

Installation

Prerequisites

  • When installing from the npm registry: macOS and Linux
  • When installing manually: any Unix platform with bash that also has perl installed.

Installation from the npm registry

With Node.js installed, install the package as follows:

[sudo] npm install print-nonascii -g

Note:

  • Whether you need sudo depends on how you installed Node.js / io.js and whether you've changed permissions later; if you get an EACCES error, try again with sudo.
  • The -g ensures global installation and is needed to put print-nonascii in your system's $PATH.
  • Even if you don't use Node.js, its package manager, npm, works across platforms and is easy to install; try curl -L https://git.io/n-install | bash

Manual installation

  • Download the CLI as print-nonascii.
  • Make it executable with chmod +x print-nonascii.
  • Move it or symlink it to a folder in your $PATH, such as /usr/local/bin (macOS) or /usr/bin (Linux).
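
The last two steps can be sketched as a small function. This is only an illustration: install_cli is a hypothetical helper, and it assumes the CLI has already been downloaded to a local file.

```shell
# Hypothetical helper: make a downloaded CLI executable and symlink it
# into a directory that is in $PATH.
#   $1 = path to the downloaded file, $2 = target directory
install_cli() {
  src=$1
  bin_dir=$2
  chmod +x "$src" || return 1
  # Resolve an absolute path so the symlink works from any directory.
  src_abs=$(cd "$(dirname "$src")" && pwd)/$(basename "$src")
  ln -sf "$src_abs" "$bin_dir/$(basename "$src")"
}
```

For example, install_cli ./print-nonascii /usr/local/bin; writing to a system directory may require sudo.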

Usage

Find concise usage information below; for complete documentation, read the manual online, or, once installed, run man print-nonascii (print-nonascii --man if installed manually).

$ print-nonascii --help
 
 
Prints lines that contain non-ASCII characters.
 
    print-nonascii [--<mode> [-r]] [-n] [-b] [file ...]
    print-nonascii -q                        [file ...]
 
    --<mode> prints abstract representations of non-ASCII chars.; one of:
      --caret, -v ... use caret notation, as cat -v would.
      --bash ... represent non-ASCII bytes as \xhh 
      --psh ... (PowerShell) represent non-ASCII Unicode characters as  
                Unicode escape sequences: <backtick>u{h...}
    
    -r, --raw ... with --<mode>, print each matching line as-is too, first.
 
    -n, --line-number ... prefix the output lines with their line number from  
     the original file, using format "<line-number>:" - decimal line numbers,  
     no padding, no space before or after the ":"
 
    -b, --bare ... suppress per-input-filename headers
 
    -q ... quiet mode: produce no output; signal presence of non-ASCII chars.  
           with exit code 0; exit code 100 signals that there are none.
 
Standard options: --help, --man, --version, --home
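
Quiet mode lends itself to scripting, for instance as a check before committing files. The sketch below maps the documented exit codes (0 and 100) to messages. describe_quiet_status is a hypothetical helper, and treating every other exit code as an error is an assumption, since the help text does not describe other codes.

```shell
# Hypothetical helper: interpret the exit code of `print-nonascii -q`.
describe_quiet_status() {
  case $1 in
    0)   echo "non-ASCII characters present" ;;
    100) echo "no non-ASCII characters" ;;
    *)   echo "unexpected exit code: $1" ;;  # assumption: anything else is an error
  esac
}

# Run the real command only if it is installed.
if command -v print-nonascii >/dev/null 2>&1; then
  status=0
  print-nonascii -q /tmp/test.txt || status=$?
  describe_quiet_status "$status"
fi
```

Capturing the code with || status=$? keeps the script running under set -e, where a non-zero exit code such as 100 would otherwise abort it.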

License

Copyright (c) 2017 Michael Klement mklement0@gmail.com (http://same2u.net), released under the MIT license.

Acknowledgements

This project gratefully depends on the following open-source components, according to the terms of their respective licenses.

npm dependencies below have an optional suffix denoting the type of dependency: the absence of a suffix denotes a required run-time dependency; (D) denotes a development-time-only dependency, (O) an optional dependency, and (P) a peer dependency.

npm dependencies

Changelog

Versioning complies with semantic versioning (semver).

  • v0.0.3 (2017-09-11):

    • [enhancement] Header lines are now only printed for input files that produce at least 1 output line.
  • v0.0.2 (2017-09-10):

    • [fix] Header line is no longer printed twice when --<mode> is combined with --raw.
    • Header line now uses a tab char. to separate prefix ### from the filename.
  • v0.0.1 (2017-09-10):

    • Initial release.
