Normalizes whitespace in one of several modes.
nws [-m <mode>] [[-i[<ext>]] file...]
Condensing <mode>s:
All these modes normalize runs of tabs and spaces to a single space
each and trim leading and trailing runs; they only differ with respect to
how multi-line input is processed.
mp (default) multi-paragraph: folds multiple blank lines into one
fp flattened multi-paragraph: normalizes each paragraph to a single line
sp single-paragraph: removes all blank lines
sl single-line: normalizes to a single output line
Transliteration <mode>s:
lf translates line endings to LF-only (\n)
crlf translates line endings to CRLF (\r\n)
ascii translates Unicode whitespace and punctuation to ASCII
Alternatively, specify mode values directly as options; e.g., --sp in lieu of -m sp.
Standard options: --help, --man, --version, --home
nws (normalize whitespace) performs whitespace normalization,
offering several modes in two categories:
- whitespace-condensing modes:
  Trim leading and trailing runs of any mix of tabs and spaces, and replace
  each remaining run with a single space. The individual modes differ only
  in how multi-line input is treated.
- whitespace-transliteration modes:
  Line endings can be changed to be Windows- or Unix-specific, and select
  Unicode whitespace and punctuation can be replaced with their closest ASCII
  equivalents.
Input is provided either from the specified files or via stdin.
Output is sent to stdout by default.
To update files in-place, use the -i option (in which case there will be no
stdout output).
CONDENSING modes:
-m <mode> or --mode <mode> or --<mode>, where <mode> is one of:
- mp, multi-para (default)
  Each run of blank (all-whitespace or empty) lines is replaced with a
  single empty line. Paragraph-internal newlines are preserved, while blank
  lines at the beginning, between paragraphs, and at the end are each
  normalized to a single empty line.
- fp, flat-para
  Like mode mp, except that paragraph-internal newlines are replaced
  with a single space each, so that each paragraph becomes a
  single line, with one empty line between paragraphs.
- sp, single-para
  Runs of blank (all-whitespace or empty) lines are discarded, resulting
  in a single paragraph of non-blank lines.
- sl, single-line
  Normalization includes newlines too: every run of any mix of
  spaces, tabs, and newlines is replaced with a single space,
  resulting in a single, long output line.
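To make the differences between the four condensing modes concrete, here is a minimal sketch that approximates them with POSIX awk. It is not how nws itself is implemented; the function name condense_sketch is hypothetical, and the handling of whitespace other than tabs and spaces depends on the awk in use.

```shell
#!/bin/sh
# Hypothetical helper (not part of nws): approximates the condensing modes.
# Reads stdin, writes stdout. Usage: condense_sketch [mp|fp|sp|sl]
# Assigning "$1 = $1" makes awk rebuild the line from its fields joined by
# single spaces, which trims the line and condenses interior runs.
condense_sketch() {
  case ${1:-mp} in
    mp) # fold each run of blank lines into one empty line
      awk 'NF { $1 = $1; print; blank = 0; next }
           !blank { print ""; blank = 1 }' ;;
    fp) # like mp, but join the lines of each paragraph with a space
      awk 'NF { $1 = $1; para = (para == "" ? $0 : para " " $0); blank = 0; next }
           { if (para != "") { print para; para = "" }
             if (!blank) { print ""; blank = 1 } }
           END { if (para != "") print para }' ;;
    sp) # discard blank lines entirely
      awk 'NF { $1 = $1; print }' ;;
    sl) # join all non-blank lines into one output line
      awk 'NF { $1 = $1; line = (line == "" ? $0 : line " " $0) }
           END { if (line != "") print line }' ;;
    *) echo "unknown mode: $1" >&2; return 2 ;;
  esac
}

# Demo: empty output lines are shown as ~ for readability.
printf '\n\n  one\n two \n\n\n three\n\n' | condense_sketch fp | sed 's/^$/~/'
# ~
# one two
# ~
# three
# ~
```

Keeping each mode as its own small awk program mirrors the way the modes are described: the per-line trimming is identical everywhere, and only the treatment of blank lines and newlines changes.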
TRANSLITERATION modes:
-m <mode> or --mode <mode> or --<mode>, where <mode> is one of:
- lf
  Translates Windows-style CRLF (\r\n) line endings to Unix-style LF (\n)
  line endings.
- crlf
  Translates Unix-style LF (\n) line endings to Windows-style CRLF (\r\n)
  line endings.
- ascii, ascii-punctuation
  Translates non-ASCII Unicode whitespace and punctuation to the closest
  ASCII equivalents, while leaving other non-ASCII characters untouched.
  This is helpful for source-code samples that have been formatted for display
  with typographic quotes, em dashes, and the like, which usually makes the
  code indigestible to compilers/interpreters.
  IMPORTANT: This mode only works with PROPERLY ENCODED UTF-8 FILES.
  On BSD/macOS systems, an improperly encoded input file will
  result in a 'sed: RE error: illegal byte sequence' error.
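The transliteration modes can be approximated with standard tools, as in the sketch below. The function names are hypothetical, and the character table in to_ascii_sketch is only an illustrative subset chosen for this example; which characters nws actually translates, and to what, is not specified here. Stripping an existing CR before adding CRLF is likewise an assumption of the sketch.

```shell
#!/bin/sh
# Hypothetical helpers (not part of nws); each reads stdin, writes stdout.

# CRLF -> LF: strip one trailing CR from every line.
to_lf_sketch() { awk '{ sub(/\r$/, ""); print }'; }

# LF -> CRLF: remove any existing trailing CR first so that input which
# already uses CRLF is not doubled (an assumption of this sketch).
to_crlf_sketch() { awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'; }

# Typographic punctuation -> ASCII, for an illustrative subset only.
# The characters are built from their UTF-8 bytes (octal escapes) so that
# this script itself stays pure ASCII. Note: the variables are global.
to_ascii_sketch() {
  ldq=$(printf '\342\200\234'); rdq=$(printf '\342\200\235')     # U+201C, U+201D
  lsq=$(printf '\342\200\230'); rsq=$(printf '\342\200\231')     # U+2018, U+2019
  ndash=$(printf '\342\200\223'); mdash=$(printf '\342\200\224') # U+2013, U+2014
  nbsp=$(printf '\302\240')                                      # U+00A0
  # LC_ALL=C makes sed match raw bytes instead of decoded characters.
  LC_ALL=C sed -e "s/$ldq/\"/g" -e "s/$rdq/\"/g" \
               -e "s/$lsq/'/g" -e "s/$rsq/'/g" \
               -e "s/$ndash/-/g" -e "s/$mdash/-/g" \
               -e "s/$nbsp/ /g"
}

# Demo: curly quotes, an apostrophe, and an em dash become plain ASCII.
printf '\342\200\234quoted\342\200\235 \342\200\224 it\342\200\231s\n' | to_ascii_sketch
# "quoted" - it's
```

Running sed under LC_ALL=C is a common workaround for the 'illegal byte sequence' error mentioned above, because sed then treats the input as plain bytes. The trade-off is that the sketch no longer detects improperly encoded input.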
-i[<backup-suffix>], --in-place[=<backup-suffix>]
  Updates the specified files in place; that is, the results of the
  normalization are written back to the input file, and no stdout output is
  produced. If <backup-suffix> is specified (recommended), a backup copy of
  each input file is made first, simply by appending the suffix to the
  filename.
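For readers who want the same in-place behavior around an arbitrary stdin-to-stdout filter, the following is a minimal sketch of the pattern described above: write the result to a temporary file, optionally copy the original to a backup named by appending the suffix, then overwrite the original. The helper name in_place_sketch is hypothetical, mktemp is assumed to be available, and nws may implement -i differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of nws): apply a filter to a file in place.
# Usage: in_place_sketch <backup-suffix-or-empty> <file> <command> [args...]
in_place_sketch() {
  ip_suffix=$1
  ip_file=$2
  shift 2
  ip_tmp=$(mktemp) || return 1
  # Filter into a temporary file first: redirecting straight back into the
  # input file would truncate it before the filter could read it.
  if ! "$@" < "$ip_file" > "$ip_tmp"; then
    rm -f "$ip_tmp"
    return 1
  fi
  # Optional backup: simply append the suffix to the filename.
  if [ -n "$ip_suffix" ] && ! cp -p "$ip_file" "$ip_file$ip_suffix"; then
    rm -f "$ip_tmp"
    return 1
  fi
  # Overwrite the contents rather than renaming, which keeps the original
  # file's permissions.
  cat "$ip_tmp" > "$ip_file"
  ip_status=$?
  rm -f "$ip_tmp"
  return "$ip_status"
}
```

For example, `in_place_sketch .bak from-windows.txt tr -d '\r'` would strip carriage returns from from-windows.txt and leave the original in from-windows.txt.bak; passing an empty string as the suffix skips the backup.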
All standard options provide information only.
- -h, --help
  Prints the contents of the synopsis chapter to stdout for quick reference.
- --man
  Displays this manual page, which is a helpful alternative to using man,
  if the manual page isn't installed.
- --version
  Prints version information.
- --home
  Opens this utility's home page in the system's default web browser.
For license information, bug reports, and more, visit this utility's home page
by running nws --home
The examples use ANSI C-quoted input strings ($'...') for brevity, which
are supported in Bash, Ksh, and Zsh.
Empty output lines are represented by ~.
## CONDENSING EXAMPLES:
# Single-line input - no mode needed.
$ nws <<< $' one \t\t two three '
one two three
# Default: multi-paragraph mode (`-m mp` or `--mode multi-para`)
$ nws <<<$'\n\n one\n two \n\n\n three\n\n'
~
one
two
~
three
~
# Single-paragraph mode; `-m sp` is the short equivalent of
# `--mode single-para`.
$ nws -m sp <<<$'\n\n one\n two \n\n\n three\n\n'
one
two
three
# Flattened-paragraph mode; note use of shortcut option `--fp` for `-m fp`.
$ nws --fp <<<$'\n\n one\n two \n\n\n three\n\n'
~
one two
~
three
~
# Single-line mode
$ nws --sl <<<$' one two\n three '
one two three
## TRANSLITERATION EXAMPLES:
# Converts a CRLF line-endings file (Windows) to a LF-only file (Unix).
# No output is produced, because the file is updated in-place; a backup
# of the original file is created with suffix '.bak'.
$ nws --mode lf --in-place=.bak from-windows.txt
# Converts a LF-only file (Unix) to a CRLF line-endings file (Windows).
# No output is produced, because the file is updated in-place; since no
# backup suffix is specified, no backup file is created.
$ nws --crlf -i from-unix.txt
# Converts select Unicode whitespace and punctuation chars. to their
# closest ASCII equivalents and sends the output to a different file.
$ nws --ascii unicode-punct.txt > ascii-punct.txt
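Because the in-place line-ending conversions above produce no output, it can be useful to check a file's line endings before and after. The helper below is a small sketch for that purpose; count_crlf_lines is a hypothetical name and not part of nws.

```shell
#!/bin/sh
# Hypothetical helper (not part of nws): print the number of lines in the
# given file that end in CR, i.e. that have CRLF line endings.
count_crlf_lines() { awk '/\r$/ { n++ } END { print n + 0 }' "$1"; }
```

A result of 0 after converting with the lf mode, or a result equal to the file's line count after converting with the crlf mode, suggests the conversion took effect.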