[Bug]: Log file cannot currently be in directory being archived #335

Open
forsyth2 opened this issue Apr 12, 2024 · 0 comments
Labels
semver: bug Bug fix (will increment patch version)

forsyth2 commented Apr 12, 2024

What happened?

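Log files can get corrupted if they are in the directory being archived by zstash, for example when the output of zstash create is piped through tee into a log file inside that directory, so the log is still being written while it is archived. See #332 (comment)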

What machine were you running on?

Chrysalis

Environment

zstash 1.4.2

Minimal Complete Verifiable Example (MCVE)

# Create zstash archive
mkdir zstash_20240408
echo 'file0 stuff' > zstash_20240408/file0.txt
cd zstash_20240408/
zstash create --hpss=none . 2>&1 | tee 20240408.log

# Now, try to extract it
rm -f 20240408.log file0.txt
zstash extract --hpss=none "*"
For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
INFO: zstash/000000.tar exists. Checking expected size matches actual size.
INFO: Opening tar archive zstash/000000.tar
INFO: Extracting 20240408.log
ERROR: md5 mismatch for: 20240408.log
ERROR: md5 of extracted file: a8600c75b3d84cdaefd020cf13fb6556
ERROR: md5 of original file:  00a33f0fdfbe470ae5b32123cc3e372c
INFO: Extracting file0.txt
Traceback (most recent call last):
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.9.3_login/lib/python3.10/site-packages/zstash/extract.py", line 535, in extractFiles
    tarinfo: tarfile.TarInfo = tar.tarinfo.fromtarfile(tar)
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.9.3_login/lib/python3.10/tarfile.py", line 1293, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.9.3_login/lib/python3.10/tarfile.py", line 1237, in frombuf
    raise InvalidHeaderError("bad checksum")
tarfile.InvalidHeaderError: bad checksum
ERROR: Retrieving file0.txt
ERROR: Encountered an error for files:
ERROR: 20240408.log in 000000.tar
ERROR: file0.txt in 000000.tar
ERROR: The following tar archives had errors:
ERROR: 000000.tar

# Furthermore, try extracting the tar file directly
cd zstash
tar xvf 000000.tar 
20240408.log
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

# The only way to salvage data from the tar file is to use cpio (ironically)
cpio -ivd -H ustar < 000000.tar
././@PaxHeader
20240408.log
cpio: invalid header: checksum error
cpio: warning: skipped 29 bytes of junk
cpio: ././@PaxHeader not created: newer or same age version exists
././@PaxHeader
file0.txt
10 blocks

# At least the data file is recoverable even if the log file is not
cat file0.txt
file0 stuff
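
Until this is fixed, the simplest way to avoid the problem is to keep the log outside the directory being archived (for example, tee to ../20240408.log instead of 20240408.log). A minimal sketch of a pre-flight check is below. Note that is_inside is a hypothetical helper written for this illustration, not part of zstash; it only compares resolved directory paths and assumes the log's parent directory already exists.

```shell
# Hypothetical helper (not part of zstash): succeeds (exit 0) if the file
# path $1 lies inside directory $2, fails with 1 if it does not, and
# fails with 2 if either directory cannot be resolved.
# The function body runs in a subshell so the cd calls and variables
# do not leak into the caller.
is_inside() (
  # Resolve the log's parent directory and the archive directory to
  # physical paths, so symlinks and relative paths compare correctly.
  file_dir=$(cd "$(dirname "$1")" && pwd -P) || exit 2
  dir=$(cd "$2" && pwd -P) || exit 2
  # Compare with trailing slashes so /a/bc is not mistaken for a child of /a/b.
  case "$file_dir/" in
    "${dir%/}"/*) exit 0 ;;
    *) exit 1 ;;
  esac
)
```

With the layout from the example above, is_inside 20240408.log . succeeds, which means the log should be moved elsewhere before running zstash create, while is_inside ../20240408.log . fails, which means that location is safe to tee into.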

Relevant log output

No response

Anything else we need to know?

No response
