FilenameSanitizer

FilenameSanitizer sanitizes a filename so that it can safely be used to create a file on any modern file system.

Disallowed characters are replaced with a safe token (an underscore), in line with the naming conventions of modern file systems.

Standard Sanitization replaces:

  • the `NUL` (0) character
  • control codes 1 through 31 (inclusive)
  • `<` (less than)
  • `>` (greater than)
  • `:` (colon)
  • `"` (double quote)
  • `/` (forward slash)
  • `\` (backslash)
  • `|` (vertical bar or pipe)
  • `?` (question mark)
  • `*` (asterisk)
  • leading and trailing spaces
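
A minimal Java sketch of the character replacement described above, assuming that each disallowed character, and each leading or trailing space, is replaced one-for-one by the underscore token. The class and method names are hypothetical and do not reflect the library's actual API; the reserved-name step described next is left out.

```java
/** Hypothetical sketch of Standard Sanitization (not the library's actual code). */
public final class StandardSanitizerSketch {
    // Safe replacement token, as described above
    private static final char SAFE_TOKEN = '_';
    // Printable characters that are disallowed in file names
    private static final String FORBIDDEN = "<>:\"/\\|?*";

    public static String sanitize(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // NUL (0), control codes 1..31, and the forbidden printable characters
            boolean unsafe = c <= 31 || FORBIDDEN.indexOf(c) >= 0;
            sb.append(unsafe ? SAFE_TOKEN : c);
        }
        // Replace each leading space with the safe token
        int start = 0;
        while (start < sb.length() && sb.charAt(start) == ' ') {
            sb.setCharAt(start++, SAFE_TOKEN);
        }
        // Replace each trailing space with the safe token
        int end = sb.length() - 1;
        while (end >= start && sb.charAt(end) == ' ') {
            sb.setCharAt(end--, SAFE_TOKEN);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize(" report: Q1/Q2?.txt ")); // prints "_report_ Q1_Q2_.txt_"
    }
}
```

Interior spaces are left alone; only the runs at either end are replaced.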

In addition, it prepends a safe token to filenames that match one of the Windows reserved names (CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9), with or without an extension.
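
One way to implement the reserved-name check is sketched below in Java. It assumes a case-insensitive comparison against the part of the name before the first dot, since Windows treats names such as NUL.txt and NUL.tar.gz as equivalent to NUL. The class and method names are hypothetical, and `Set.of` requires Java 9 or later.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the Windows reserved-name check (not the library's actual code). */
public final class ReservedNameSketch {
    private static final Set<String> RESERVED = Set.of(
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9");

    public static String protectReserved(String name) {
        // Compare only the part before the first dot, so "NUL" and "nul.tar.gz" both match
        int dot = name.indexOf('.');
        String base = (dot < 0 ? name : name.substring(0, dot)).toUpperCase(Locale.ROOT);
        // Prepend the safe token instead of altering the original name
        return RESERVED.contains(base) ? "_" + name : name;
    }

    public static void main(String[] args) {
        System.out.println(protectReserved("con.txt"));     // prints "_con.txt"
        System.out.println(protectReserved("console.txt")); // prints "console.txt"
    }
}
```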

Safe Sanitization also handles invalid inputs, specifically:

  • null file names, by generating a safe, unique filename
  • empty or whitespace-only file names, by generating a safe, unique filename
  • file names that are too long (more than 256 characters): on Windows, MAX_PATH is 260 characters and includes the drive letter, colon, backslash and terminating NUL, as in C:\file-256-chars-long.
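
The invalid-input checks above might look like the following Java sketch. The README does not say how unique names are generated or how long names are shortened, so the `file-` prefix with a random UUID and the plain truncation to 256 UTF-16 characters are assumptions; the class and method names are hypothetical. `String.isBlank` and `String.repeat` require Java 11 or later.

```java
import java.util.UUID;

/** Hypothetical sketch of Safe Sanitization's input checks (not the library's actual code). */
public final class SafeInputSketch {
    // 260 (MAX_PATH) minus "C:\" (3 chars) minus the terminating NUL (1 char)
    static final int MAX_NAME_LENGTH = 256;

    public static String handleInvalid(String name) {
        // null, empty, or whitespace-only: generate a unique name
        // (the "file-" prefix and UUID scheme are assumptions)
        if (name == null || name.isBlank()) {
            return "file-" + UUID.randomUUID();
        }
        // Too long: truncate to the limit (truncation is an assumed strategy)
        if (name.length() > MAX_NAME_LENGTH) {
            int cut = MAX_NAME_LENGTH;
            // Do not split a surrogate pair in half
            if (Character.isHighSurrogate(name.charAt(cut - 1))) {
                cut--;
            }
            return name.substring(0, cut);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(handleInvalid(null));                     // e.g. file-<random UUID>
        System.out.println(handleInvalid("x".repeat(300)).length()); // prints 256
    }
}
```

A real implementation would likely try to preserve the extension when truncating; this sketch keeps it simple.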

Pretty Sanitization behaves like Safe Sanitization, but also removes characters that, although allowed, can be dangerous or annoying. Specifically:

  • leading dots (they make files hidden on *NIX systems, and risky to bulk-delete with "rm .*")
  • leading hyphens (they make commands such as "rm -filename" tricky, since a leading - introduces a command option)
  • trailing dots (the Windows shell and user interface do not support names ending in a period)
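
The extra Pretty Sanitization steps can be sketched in Java as follows, assuming the characters are removed outright and that leading dots and hyphens are stripped together in a single pass (the ordering is an assumption). The class and method names are hypothetical.

```java
/** Hypothetical sketch of the extra Pretty Sanitization steps (not the library's actual code). */
public final class PrettySanitizerSketch {
    public static String prettify(String name) {
        // Drop any run of leading dots and hyphens, in any order (e.g. ".-x" becomes "x")
        int start = 0;
        while (start < name.length()
                && (name.charAt(start) == '.' || name.charAt(start) == '-')) {
            start++;
        }
        // Drop trailing dots
        int end = name.length();
        while (end > start && name.charAt(end - 1) == '.') {
            end--;
        }
        return name.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(prettify(".bashrc"));      // prints "bashrc"
        System.out.println(prettify("-rf"));          // prints "rf"
        System.out.println(prettify("notes.txt...")); // prints "notes.txt"
    }
}
```

A name made only of dots, such as "...", comes out empty here, so presumably the Safe checks for empty names need to run after these steps.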

Logging is performed with SLF4J, which defaults to a no-operation (NOP) logger if no binding is provided.

Related: Naming Files, Paths, and Namespaces

Related: Fixing Unix/Linux/POSIX Filenames: Control Characters (such as Newline), Leading Dashes, and Other Problems