Scripts to help with validating the quality of files in a collection before being preserved.

MetaArchive/metaarchive-qa-tools

MetaArchive QA Tools

find-bad-files

This tool recursively scans a directory for filenames that violate a set of naming standards meant to prevent problems when ingesting collections into LOCKSS over HTTP. The relative paths of matching files are printed to standard output; no files are moved or modified.

If -v or --verbose is provided, the reason each path was matched is printed alongside it.
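The scan described above could be sketched in Python roughly as follows. This is not the code in find-bad-files.py: `scan`, `report`, and the `check` callback (which returns a list of reasons, empty when a name is acceptable) are hypothetical names, and both the tab-separated verbose format and the choice to check directory names as well as filenames are assumptions.

```python
import os
from typing import Callable, Iterator, List, Tuple

# Hypothetical callback type: takes a bare name, returns reasons it is bad.
Check = Callable[[str], List[str]]


def scan(root: str, check: Check) -> Iterator[Tuple[str, List[str]]]:
    """Walk `root` recursively and yield (relative_path, reasons) for every
    file or directory name that `check` flags. Nothing on disk is modified."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk descend in a stable order
        for name in sorted(dirnames + filenames):
            reasons = check(name)
            if reasons:
                full_path = os.path.join(dirpath, name)
                yield os.path.relpath(full_path, root), reasons


def report(root: str, check: Check, verbose: bool = False) -> None:
    """Print one flagged path per line; with verbose=True, append the reasons."""
    for rel_path, reasons in scan(root, check):
        if verbose:
            print(f"{rel_path}\t{'; '.join(reasons)}")
        else:
            print(rel_path)
```

Passing the rules in as a callback keeps the directory walk separate from the naming policy, so the rules can change without touching the traversal.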

Usage

python find-bad-files.py [-h|--help] [-v|--verbose] <directory>

Rules enforced

  • Characters must be URL-safe. For our purposes, we strictly limit the characters present in filenames to letters, numbers, dots (.), hyphens (-), and underscores (_).
  • Filenames must start with a letter or number. This excludes hidden files, config files, .DS_Store, tilde-prefixed backups, and other likely-unwanted files.
  • Filenames must not equal "Thumbs.db".
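The three rules above can be expressed as a single Python function. This is a minimal sketch, not the script's implementation; it assumes "letters" means ASCII letters and that the "Thumbs.db" comparison is case-sensitive. `violations` is a hypothetical name, and it returns one human-readable reason per broken rule, the kind of detail a verbose mode would report.

```python
import re
from typing import List

# Assumption: only ASCII letters, digits, '.', '-' and '_' are URL-safe here.
SAFE_CHARS = re.compile(r"[A-Za-z0-9._-]+")
# The first character must be an ASCII letter or digit.
STARTS_OK = re.compile(r"[A-Za-z0-9]")


def violations(name: str) -> List[str]:
    """Return the reasons `name` breaks the naming rules (empty if it is fine)."""
    reasons = []
    if not SAFE_CHARS.fullmatch(name):
        reasons.append("contains characters other than letters, numbers, '.', '-' or '_'")
    if not STARTS_OK.match(name):
        reasons.append("does not start with a letter or number")
    if name == "Thumbs.db":
        reasons.append('is named "Thumbs.db"')
    return reasons
```

For example, `violations(".DS_Store")` reports only the starting-character rule, while `violations("~$draft.docx")` reports two reasons, because `~` and `$` are both unsafe and the name starts with `~`.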

lockss-manifest-validate

This tool allows you to compare a HashCUS.txt manifest generated by LOCKSS with a BagIt MD5 manifest from the same title to check for any discrepancies.

Once you have your HashCUS.txt and manifest-md5.txt files ready, use either the HTML/JS graphical tool or the Python command-line interface script provided to run the comparison.
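The BagIt side of the comparison is straightforward to read. Per the BagIt specification (RFC 8493), each line of manifest-md5.txt holds a checksum, one or more whitespace characters, and a file path relative to the bag root, with CR, LF and % in paths percent-encoded. The hypothetical `read_bagit_manifest` below is a sketch under those assumptions. It does not parse HashCUS.txt, whose layout is not described here.

```python
import re
from typing import Dict, Iterable

# RFC 8493 requires only CR, LF and % to be percent-encoded in manifest paths.
_ESCAPES = {"%0A": "\n", "%0D": "\r", "%25": "%"}
_ESCAPE_RE = re.compile(r"%(?:0A|0D|25)", re.IGNORECASE)


def read_bagit_manifest(lines: Iterable[str]) -> Dict[str, str]:
    """Map each payload path (e.g. "data/img001.tif") to its lowercase MD5 digest."""
    entries: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        # Checksum first, then whitespace, then the path (which may contain spaces).
        checksum, path = line.split(None, 1)
        path = _ESCAPE_RE.sub(lambda m: _ESCAPES[m.group(0).upper()], path)
        entries[path] = checksum.lower()  # normalize hex case for comparison
    return entries
```

Because it accepts any iterable of lines, an open file works directly: `with open("manifest-md5.txt", encoding="utf-8") as f: bag = read_bagit_manifest(f)`.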

HTML/Javascript GUI Version

Usage

  1. Open the lockss-manifest-validate.html file in Firefox, Chrome, IE, or another reliable web browser.
  2. Use the form on the page to select your HashCUS.txt and manifest-md5.txt files from their location on your hard drive.
  3. Click Compare. An alert window will report how many records were compared and how many errors were found.
  4. Click OK. In the 'Output' box, you will see a detailed log with information about which records had errors, as well as some additional statistics.

Python CLI Version

When invoked, the lockss-manifest-validate.py script will output a log (similar to the one written by the HTML/JS version of the tool) containing a detailed report of the comparison results.
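The core of such a report could look like the sketch below. It assumes both manifests have already been reduced to dictionaries mapping the same relative paths to lowercase hex MD5 digests; converting HashCUS.txt entries into that form is not shown and would depend on how LOCKSS URLs map onto bag paths. `compare_manifests` and its error labels are hypothetical, and here "records compared" counts every path found in either manifest.

```python
from typing import Dict, List, Tuple


def compare_manifests(lockss: Dict[str, str], bag: Dict[str, str]) -> Tuple[int, List[str]]:
    """Compare two path -> MD5 mappings; return (records_compared, error_lines)."""
    errors: List[str] = []
    all_paths = sorted(set(lockss) | set(bag))
    for path in all_paths:
        if path not in lockss:
            errors.append(f"MISSING IN LOCKSS: {path}")
        elif path not in bag:
            errors.append(f"MISSING IN BAG: {path}")
        elif lockss[path] != bag[path]:
            errors.append(
                f"CHECKSUM MISMATCH: {path} (LOCKSS {lockss[path]}, bag {bag[path]})"
            )
    return len(all_paths), errors
```

A summary line for the log could then be printed with `print(f"{compared} records compared, {len(errors)} errors found")`, followed by one line per entry in `errors`.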

Usage

python lockss-manifest-validate.py [-h|--help] <HashCUS> <manifest-md5>
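The usage line above corresponds to a parser with two required positional arguments. A minimal argparse sketch follows; it is hypothetical and not taken from the script. argparse supplies -h/--help automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring: lockss-manifest-validate.py <HashCUS> <manifest-md5>."""
    parser = argparse.ArgumentParser(
        prog="lockss-manifest-validate.py",
        description="Compare a LOCKSS HashCUS.txt manifest with a BagIt manifest-md5.txt.",
    )
    parser.add_argument("hashcus", metavar="HashCUS",
                        help="path to the HashCUS.txt file generated by LOCKSS")
    parser.add_argument("manifest_md5", metavar="manifest-md5",
                        help="path to the BagIt manifest-md5.txt file")
    return parser
```

Calling `build_parser().parse_args()` with no arguments exits with a usage error, since both files are required.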

License

See LICENSE.txt