Scripts to help validate the quality of files in a collection before the collection is preserved.
find-bad-files.py recursively scans a directory for filenames that violate a set of naming standards meant to prevent problems when ingesting collections into LOCKSS over HTTP. It prints the relative path of each offending file to standard output; no files are moved or modified by this tool.
If -v or --verbose is provided, the reason each path was flagged is printed alongside it.
python find-bad-files.py [-h|--help] [-v|--verbose] <directory>
Filenames are checked against the following rules:
- Characters must be URL-safe. For our purposes, we strictly limit the characters in filenames to letters, numbers, dots (.), hyphens (-), and underscores (_).
- Filenames must start with a letter or number. This excludes various hidden files, config files, .DS_Store, tilde-prefixed backups, and other likely-undesired files.
- Filenames must not equal "Thumbs.db".
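The rules above can be expressed as a few small checks. The sketch below is not the code of find-bad-files.py; it is a hypothetical illustration that assumes "letters" means ASCII letters, that directory names are checked with the same rules as filenames, and that verbose output is a tab-separated path and reason list (the real script's output format may differ). The helper names `name_violations` and `scan` are illustrative.

```python
import argparse
import os
import re
import sys
from typing import Iterator, List, Optional, Tuple

# Anything outside ASCII letters, digits, '.', '-' and '_' (ASCII-only is assumed).
DISALLOWED_CHAR = re.compile(r"[^A-Za-z0-9._-]")
# The first character must be a letter or a digit.
VALID_FIRST_CHAR = re.compile(r"[A-Za-z0-9]")


def name_violations(name: str) -> List[str]:
    """Return the reasons one path component breaks the naming rules (empty if none)."""
    reasons = []
    if DISALLOWED_CHAR.search(name):
        reasons.append("contains characters other than letters, numbers, '.', '-' or '_'")
    if not VALID_FIRST_CHAR.match(name):
        reasons.append("does not start with a letter or number")
    if name == "Thumbs.db":
        reasons.append('is named "Thumbs.db"')
    return reasons


def scan(root: str) -> Iterator[Tuple[str, List[str]]]:
    """Recursively yield (relative path, reasons) for every badly named entry under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        # Assumption: directory names are checked with the same rules as filenames.
        for name in sorted(dirnames + filenames):
            reasons = name_violations(name)
            if reasons:
                yield os.path.relpath(os.path.join(dirpath, name), root), reasons


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="List paths that break the naming rules.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print why each path was flagged")
    parser.add_argument("directory")
    args = parser.parse_args(argv)
    for rel_path, reasons in scan(args.directory):
        # Nothing is moved or modified; paths are only reported.
        print(f"{rel_path}\t{'; '.join(reasons)}" if args.verbose else rel_path)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A name such as "~backup.doc" breaks two rules at once (a disallowed character and a bad first character), which is why each check appends its own reason instead of stopping at the first failure.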
This tool allows you to compare a HashCUS.txt manifest generated by LOCKSS with a BagIt MD5 manifest from the same title to check for any discrepancies.
Once your HashCUS.txt and manifest-md5.txt files are ready, run the comparison with either the graphical HTML/JS tool or the Python command-line script.
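The core of the comparison can be sketched as follows. This is a hypothetical illustration, not the logic of lockss-manifest-validate.py. The BagIt parser follows the manifest layout from the BagIt specification (each line is a checksum followed by whitespace and a file path). The HashCUS.txt format is not described in this README, so the sketch assumes its records have already been converted into a mapping from the same relative paths to hex MD5 digests; that conversion (including mapping LOCKSS URLs to bag paths) is left out. The function names are illustrative.

```python
from typing import Dict, Iterable, List


def parse_bagit_manifest(lines: Iterable[str]) -> Dict[str, str]:
    """Map file path -> lowercase MD5 from manifest-md5.txt lines ("checksum path")."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        checksum, path = line.split(None, 1)  # split on the first run of whitespace
        entries[path] = checksum.lower()
    return entries


def compare(bagit: Dict[str, str], lockss: Dict[str, str]) -> Dict[str, List[str]]:
    """Report discrepancies between BagIt and (pre-converted) LOCKSS checksums."""
    report = {"mismatched": [], "missing_from_lockss": [], "extra_in_lockss": []}
    for path, md5 in sorted(bagit.items()):
        if path not in lockss:
            report["missing_from_lockss"].append(path)
        elif lockss[path].lower() != md5:  # hex digests compared case-insensitively
            report["mismatched"].append(path)
    report["extra_in_lockss"] = sorted(set(lockss) - set(bagit))
    return report


def summary(bagit: Dict[str, str], report: Dict[str, List[str]]) -> str:
    """One-line summary: records compared and errors found."""
    errors = sum(len(paths) for paths in report.values())
    return f"{len(bagit)} records compared, {errors} errors found"
```

Keeping missing, extra, and mismatched records in separate lists makes it easy to log which records had errors, not just how many.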
- Open the lockss-manifest-validate.html file in Firefox, Chrome, IE, or another reliable web browser.
- Use the form on the page to select your HashCUS.txt and manifest-md5.txt files from your hard drive.
- Click Compare. An alert window will report how many records were compared and how many errors were found.
- Click OK. In the 'Output' box, you will see a detailed log with information about which records had errors, as well as some additional statistics.
When invoked, the lockss-manifest-validate.py script outputs a log (similar to the one written by the HTML/JS version of the tool) containing a detailed report of the comparison results.
python lockss-manifest-validate.py [-h|--help] <HashCUS> <manifest-md5>
See LICENSE.txt