nws(1) - normalize whitespace

SYNOPSIS

Normalizes whitespace in one of several modes.

nws [-m <mode>] [[-i[<ext>]] file...]

Condensing <mode>s:

All of these modes condense each run of tabs and spaces to a single space
and trim leading and trailing runs; they differ only in how multi-line
input is processed.

mp   (default) multi-paragraph: folds multiple blank lines into one
fp   flattened multi-paragraph: normalizes each paragraph to a single line
sp   single-paragraph: removes all blank lines
sl   single-line: normalizes to a single output line

Transliteration <mode>s:

lf     translates line endings to LF-only (\n)
crlf   translates line endings to CRLF (\r\n)
ascii  translates Unicode whitespace and punctuation to ASCII

Alternatively, specify mode values directly as options; e.g., --sp in lieu
of -m sp

Standard options: --help, --man, --version, --home

DESCRIPTION

nws (normalize whitespace) performs whitespace normalization,
offering several modes in two categories:

  • whitespace-condensing modes:
    Trims leading and trailing runs of any mix of tabs and spaces and replaces
    each line-internal run with a single space. The individual modes differ
    only in how multi-line input is treated.

  • whitespace-transliteration modes:
    Line endings can be changed to be Windows- or Unix-specific, and select
    Unicode whitespace and punctuation can be replaced with their closest ASCII
    equivalents.

Input is provided either from the specified files or via stdin.
Output is sent to stdout by default.
To update files in-place, use the -i option (in which case there will be no
stdout output).

OPTIONS

  • CONDENSING modes: -m <mode> or --mode <mode> or --<mode>
    where <mode> is one of:

    • mp, multi-para (default)
      Each run of blank (all-whitespace or empty) lines is replaced with a
      single empty line. Paragraph-internal newlines are preserved, and blank
      lines at the beginning, between paragraphs, and at the end are each
      normalized to a single empty line.

    • fp, flat-para
      Like mode mp, except that paragraph-internal newlines are replaced
      with a single space each, resulting in each paragraph becoming a
      single line, with 1 empty line between paragraphs.

    • sp, single-para
      Runs of blank (all-whitespace or empty) lines are discarded, resulting
      in a single paragraph of non-blank lines.

    • sl, single-line
      Normalization includes newlines too, so that any run of any mix of
      spaces, tabs, and newlines is replaced with a single space each,
      resulting in a single, long output line.
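
The four condensing modes described above can be approximated with standard awk, which may help when reasoning about what each mode does to blank lines. This is only a sketch under stated assumptions: the function names (condense_lines, mp_mode, fp_mode, sp_mode, sl_mode) are hypothetical, and nothing here is taken from the utility's actual implementation.

```shell
#!/bin/sh
# Sketch only: approximates the documented condensing modes with awk.
# All function names are hypothetical; this is not the utility's own code.

# Per line: condense runs of tabs/spaces to one space and trim both ends.
# Assigning $1 to itself makes awk rebuild the line from its fields.
condense_lines() {
  awk '{ $1 = $1; print }'
}

# mp: keep paragraph-internal newlines; fold each run of blank lines into one.
mp_mode() {
  condense_lines | awk '
    NF        { print; blank = 0; next }   # non-blank line: print as-is
    !blank++  { print "" }                 # first blank line of a run only
  '
}

# fp: like mp, but join the lines of each paragraph with single spaces.
fp_mode() {
  condense_lines | awk '
    NF  { para = (para == "" ? $0 : para " " $0); blank = 0; next }
        { if (para != "") { print para; para = "" }
          if (!blank++) print "" }
    END { if (para != "") print para }
  '
}

# sp: discard all blank lines.
sp_mode() {
  condense_lines | awk 'NF'
}

# sl: join all non-blank lines into one output line.
sl_mode() {
  condense_lines | awk '
    NF  { out = (out == "" ? $0 : out " " $0) }
    END { print out }
  '
}
```

For instance, `printf '  one \t\t two  three   \n' | mp_mode` prints `one two three`.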

  • TRANSLITERATION modes: -m <mode> or --mode <mode> or --<mode>
    where <mode> is one of:

    • lf
      Translates Windows-style CRLF (\r\n) line endings to Unix-style LF (\n)
      line endings.

    • crlf
      Translates Unix-style LF (\n) line endings to Windows-style CRLF (\r\n)
      line endings.

    • ascii, ascii-punctuation
      Translates non-ASCII Unicode whitespace and punctuation to the closest
      ASCII equivalents, while leaving other non-ASCII characters untouched.
      This is helpful for source-code samples that have been formatted for display
      with typographic quotes, em dashes, and the like, which usually makes the
      code indigestible to compilers/interpreters.
      IMPORTANT: This mode only works with PROPERLY ENCODED UTF-8 FILES.
      On BSD/macOS systems, an improperly encoded input file will
      result in a 'sed: RE error: illegal byte sequence' error.
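
To make the transliteration modes concrete, here is a rough equivalent built from awk and sed. It is a sketch, not the utility's implementation: the function names (to_lf, to_crlf, ascii_quotes) are hypothetical, and the quote mapping covers only four typographic quotes, chosen for illustration. The full set of characters that the ascii mode translates is not listed here.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that mimic the documented behavior.

# lf: strip a trailing CR from each line, turning CRLF into LF.
to_lf() {
  awk '{ sub(/\r$/, ""); print }'
}

# crlf: end every line with CRLF; an existing CR is dropped first so that
# input which already uses CRLF is not turned into CR CR LF.
to_crlf() {
  awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}

# ascii (illustrative subset): typographic quotes to straight ASCII quotes.
# The UTF-8 byte sequences are built with printf so this script stays ASCII.
ascii_quotes() {
  ldq=$(printf '\342\200\234')   # U+201C left double quotation mark
  rdq=$(printf '\342\200\235')   # U+201D right double quotation mark
  lsq=$(printf '\342\200\230')   # U+2018 left single quotation mark
  rsq=$(printf '\342\200\231')   # U+2019 right single quotation mark
  sed -e "s/$ldq/\"/g" -e "s/$rdq/\"/g" -e "s/$lsq/'/g" -e "s/$rsq/'/g"
}
```

Piping the output through `od -c` is a quick way to confirm which line endings a file actually contains.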

  • -i[<backup-suffix>], --in-place[=<backup-suffix>]
    Updates the specified files in place; that is, the results of the
    normalization are written back to the input file, and no stdout output is
    produced. If <backup-suffix> is specified (recommended), a backup copy of each
    input file is made first, simply by appending the suffix to the filename.
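
The backup-then-overwrite sequence described for -i can be pictured as follows. This is a minimal sketch assuming a generic stdin-to-stdout filter; filter_in_place and strip_cr are hypothetical names used for illustration only and say nothing about how the utility itself performs the update.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not the utility's own code.

# Example filter: remove every CR character from stdin.
strip_cr() {
  tr -d '\r'
}

# Usage: filter_in_place <backup-suffix-or-empty> <filter-command> <file>...
filter_in_place() {
  suffix=$1
  filter=$2
  shift 2
  for f in "$@"; do
    # With a suffix, back up the original first by appending the suffix.
    if [ -n "$suffix" ]; then
      cp -p -- "$f" "$f$suffix" || return 1
    fi
    # Filter into a temporary file, then write the result back to the input.
    tmp=$(mktemp) || return 1
    "$filter" < "$f" > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}
```

Calling `filter_in_place .bak strip_cr notes.txt` would leave the original content in notes.txt.bak, while passing an empty string as the first argument skips the backup.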

STANDARD OPTIONS

All standard options provide information only.

  • -h, --help
    Prints the contents of the synopsis chapter to stdout for quick reference.

  • --man
    Displays this manual page, which is a helpful alternative to man when
    the manual page isn't installed.

  • --version
    Prints version information.

  • --home
    Opens this utility's home page in the system's default web browser.

LICENSE

For license information, bug reports, and more, visit this utility's home page
by running nws --home

EXAMPLES

The examples use ANSI C-quoted input strings ($'...') for brevity, which
are supported in Bash, Ksh, and Zsh.
Empty output lines are represented by ~.

## CONDENSING EXAMPLES:

# Single-line input - no mode needed.
$ nws <<< $'  one \t\t two  three   '
one two three

# Default: multi-paragraph mode (`-m mp` or `--mode multi-para`)
$ nws <<<$'\n\n  one\n two \n\n\n  three\n\n'
~
one
two
~
three
~

# Single-paragraph mode; `-m sp` is the short equivalent of
# `--mode single-para`.
$ nws -m sp <<<$'\n\n  one\n two \n\n\n  three\n\n'
one
two
three

# Flattened-paragraph mode; note use of shortcut option `--fp` for `-m fp`.
$ nws --fp <<<$'\n\n  one\n two \n\n\n  three\n\n'
~
one two
~
three
~

# Single-line mode
$ nws --sl <<<$'  one two\n  three '
one two three

## TRANSLITERATION EXAMPLES:

# Converts a CRLF line-endings file (Windows) to a LF-only file (Unix).
# No output is produced, because the file is updated in-place; a backup
# of the original file is created with suffix '.bak'. 
$ nws --mode lf --in-place=.bak from-windows.txt

# Converts a LF-only file (Unix) to a CRLF line-endings file (Windows).
# No output is produced, because the file is updated in-place; since no
# backup suffix is specified, no backup file is created.
$ nws --crlf -i from-unix.txt

# Converts select Unicode whitespace and punctuation chars. to their 
# closest ASCII equivalents and sends the output to a different file. 
$ nws --ascii unicode-punct.txt > ascii-punct.txt