Merge dev from auanasgheps/dev
Push dev to main
Oliver Cervera authored Jan 18, 2021
2 parents 9ee9e58 + dc529dd commit 0db56fc
Showing 3 changed files with 302 additions and 128 deletions.
179 changes: 165 additions & 14 deletions README.md
@@ -4,37 +4,188 @@ The definitive all-in-one [SnapRAID](https://github.com/amadvance/snapraid) scri
There are many SnapRAID scripts out there, but none could fit my needs. So I took the best of them to start a new one.

It is meant to be run periodically (e.g. daily) and do the heavy lifting, then send an email you will actually read.
It is highly customizable.
It has been tested with Debian 10 and OpenMediaVault 5.

Supports single and dual parity configurations.

It is customizable and has been tested with Debian 10 and OpenMediaVault 5.

Contributions are welcome: there's always room for improvement!

This readme has some rough edges which will be smoothened over time.
_This readme has some rough edges which will be smoothened over time._

# Highlights

## How it works
- After some preliminary checks, the script runs `snapraid diff` to check for changes since the last execution and so determine whether the parity info is out of date.
- One of the following will happen:
  - If parity info is out of sync **and** the number of deleted or changed files exceeds the threshold you have configured, it **stops**. You may want to take a look at the output log.
  - If parity info is out of sync **and** the number of deleted or changed files exceeds the threshold, you can still **force a sync** after a number of warnings. This is useful if you often get false alarms but are confident the changes are expected.
  - If parity info is out of sync **but** the number of deleted or changed files did not exceed the threshold, it **executes a sync** to update the parity info.
- When the parity info is in sync, either because nothing has changed or after a successful sync, it runs the `snapraid scrub` command to validate the integrity of the data, both the files and the parity info. _Note that each run of the scrub command will validate only a configurable portion of parity info to avoid having a long running job and affecting the performance of the server._
- When the script is done, it sends an email with the results, whether it succeeded or failed.
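
The decision flow above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `decide_action` and its arguments are hypothetical, and it assumes a count that reaches the threshold counts as a breach (matching the "below the threshold" wording in the sample report further down). The variable names and defaults mirror the config file.

```bash
#!/bin/bash
# Defaults as found in script-config.sh.
DEL_THRESHOLD=500
UP_THRESHOLD=500
SYNC_WARN_THRESHOLD=-1   # -1 = never force a sync, 0 = always force

# Hypothetical helper: prints the action to take.
# Arguments: total changes, deleted files, updated files, warnings so far.
decide_action() {
  local changes="$1" deleted="$2" updated="$3" warn_count="$4"

  if [ "$changes" -eq 0 ]; then
    # Parity is already in sync: go straight to the scrub.
    echo "scrub"
  elif [ "$deleted" -lt "$DEL_THRESHOLD" ] && [ "$updated" -lt "$UP_THRESHOLD" ]; then
    # Changes are within limits: sync (the scrub follows a successful sync).
    echo "sync"
  elif [ "$SYNC_WARN_THRESHOLD" -ge 0 ] && [ "$warn_count" -ge "$SYNC_WARN_THRESHOLD" ]; then
    # Threshold breached, but enough warnings have accumulated.
    echo "force-sync"
  else
    # Threshold breached: stop and leave the decision to the user.
    echo "stop"
  fi
}

decide_action 2 0 0 0   # prints "sync"
```

Keeping the decision in one function makes the three outcomes easy to follow; the real script interleaves these checks with logging and email output.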

Pre-hashing is enabled by default to avoid silent read errors. It mitigates the lack of ECC memory.
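
As a rough sketch of how the config values could translate into SnapRAID invocations: `-h` (`--pre-hash`), `-p` (`--plan`) and `-o` (`--older-than`) are SnapRAID options, but the helper functions below and the mapping of `SCRUB_AGE` to `-o` are assumptions for illustration, not the script's confirmed behaviour. The functions only print the commands; nothing is executed.

```bash
#!/bin/bash
# Values as found in script-config.sh.
PREHASH=1
SCRUB_PERCENT=5
SCRUB_AGE=10
SNAPRAID_BIN="/usr/bin/snapraid"

# Hypothetical helper: print the sync command, adding -h (pre-hash)
# only when PREHASH is exactly 1 (any other value disables it).
build_sync_cmd() {
  if [ "$PREHASH" = "1" ]; then
    echo "$SNAPRAID_BIN sync -h"
  else
    echo "$SNAPRAID_BIN sync"
  fi
}

# Hypothetical helper: print the scrub command for a partial scrub of
# SCRUB_PERCENT percent of the array, limited to blocks not scrubbed
# in the last SCRUB_AGE days (assumed mapping).
build_scrub_cmd() {
  echo "$SNAPRAID_BIN scrub -p $SCRUB_PERCENT -o $SCRUB_AGE"
}

build_sync_cmd
build_scrub_cmd
```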

## A nice email report
To improve clarity, the script sends emails that omit the list of changed files.

You can re-enable full output in the email with the `VERBOSITY` option. Either way, the full report is always available in `/tmp/snapRAID.out`; it is replaced on each run and, if kept there, deleted when the system is shut down.

The SMART drive report from SnapRAID is also included by default.

Here's a sneak peek of the email report.

```markdown
## [COMPLETED] DIFF + SYNC + SCRUB Jobs (SnapRAID on omv-test.local)

SnapRAID Script Job started [Sat Jan 9 02:07:46 CET 2021]
Running SnapRAID version 11.5
SnapRAID Script version 2.7.0

----------

## Preprocessing

Configuration file found! Proceeding.
Testing that all parity files are present.
All parity files found. Continuing...

----------

## Processing

### SnapRAID TOUCH [Sat Jan 9 02:07:46 CET 2021]

Checking for zero sub-second files.
No zero sub-second timestamp files found.
TOUCH finished [Sat Jan 9 02:07:46 CET 2021]

### SnapRAID DIFF [Sat Jan 9 02:07:46 CET 2021]

DIFF finished [Sat Jan 9 02:07:46 CET 2021]

**SUMMARY of changes - Added [2] - Deleted [0] - Moved [0] - Copied [0] - Updated [0]**

There are deleted files. The number of deleted files, (0), is below the threshold of (2). SYNC Authorized.
There are updated files. The number of updated files, (0), is below the threshold of (2). SYNC Authorized.

### SnapRAID SYNC [Sat Jan 9 02:07:46 CET 2021]

Self test...
Loading state from /srv/dev-disk-by-label-DISK1/snapraid.content...
Scanning disk DATA1...
Scanning disk DATA2...
Using 0 MiB of memory for the file-system.
Initializing...
Hashing...
SYNC_JOB--Everything OK
Resizing...
Saving state to /srv/dev-disk-by-label-DISK1/snapraid.content...
Saving state to /srv/dev-disk-by-label-DISK2/snapraid.content...
Saving state to /srv/dev-disk-by-label-DISK3/snapraid.content...
Saving state to /srv/dev-disk-by-label-DISK4/snapraid.content...
Verifying /srv/dev-disk-by-label-DISK1/snapraid.content...
Verifying /srv/dev-disk-by-label-DISK2/snapraid.content...
Verifying /srv/dev-disk-by-label-DISK3/snapraid.content...
Verifying /srv/dev-disk-by-label-DISK4/snapraid.content...
Verified /srv/dev-disk-by-label-DISK4/snapraid.content in 0 seconds
Verified /srv/dev-disk-by-label-DISK3/snapraid.content in 0 seconds
Verified /srv/dev-disk-by-label-DISK2/snapraid.content in 0 seconds
Verified /srv/dev-disk-by-label-DISK1/snapraid.content in 0 seconds
Syncing...
Using 32 MiB of memory for 32 cached blocks.

DATA1 59% | ***********************************
DATA2 55% | ********************************
parity 0% |
2-parity 0% |
raid 6% |
hash 5% |
sched 7% |
misc 17% |
|______________
wait time (total, less is better)

SYNC_JOB--Everything OK
Saving state to /srv/dev-disk-by-label-DISK1/snapraid.content...
Saving state to /srv/dev-disk-by-label-DISK2/snapraid.content...
Saving state to /srv/dev-disk-by-label-DISK3/snapraid.content...
Saving state to /srv/dev-disk-by-label-DISK4/snapraid.content...
Verifying /srv/dev-disk-by-label-DISK1/snapraid.content...
Verifying /srv/dev-disk-by-label-DISK2/snapraid.content...
Verifying /srv/dev-disk-by-label-DISK3/snapraid.content...
Verifying /srv/dev-disk-by-label-DISK4/snapraid.content...
Verified /srv/dev-disk-by-label-DISK4/snapraid.content in 0 seconds
Verified /srv/dev-disk-by-label-DISK3/snapraid.content in 0 seconds
Verified /srv/dev-disk-by-label-DISK2/snapraid.content in 0 seconds
Verified /srv/dev-disk-by-label-DISK1/snapraid.content in 0 seconds
SYNC finished [Sat Jan 9 02:07:49 CET 2021]

### SnapRAID SCRUB [Sat Jan 9 02:07:49 CET 2021]

Self test...
Loading state from /srv/dev-disk-by-label-DISK1/snapraid.content...
Using 0 MiB of memory for the file-system.
Initializing...
Scrubbing...
Using 48 MiB of memory for 32 cached blocks.
SCRUB_JOB--Nothing to do
SCRUB finished [Sat Jan 9 02:07:49 CET 2021]

----------

## Postprocessing

SnapRAID SMART report:

Temp Power Error FP Size
C OnDays Count TB Serial Device Disk

----------

- - - SSD 0.0 00000000000000000001 /dev/sdb DATA1
- - - - 0.0 01000000000000000001 /dev/sdc DATA2
- - - SSD 0.0 02000000000000000001 /dev/sdd parity
- - - SSD 0.0 03000000000000000001 /dev/sde 2-parity
0 - - - 0.0 - /dev/sda -

The FP column is the estimated probability (in percentage) that the disk
is going to fail in the next year.

Probability that at least one disk is going to fail in the next year is 0%.
All jobs ended. [Sat Jan 9 02:07:49 CET 2021]
Email address is set. Sending email report to example@example.com [Sat Jan 9 02:07:49 CET 2021]
```

## Customization
Many options can be changed to your taste; their behaviour is documented in the script config file.

If you don't know what to do, I recommend using the default values and seeing how they perform.

You can also change more advanced options such as the mail binary (`mailx` by default), the SnapRAID binary location and the log file location.

# Features
[WIP]

# Requirements
- Markdown to have nice emails
- Hd-idle to spin down disks - [Link TBD]
- ~~Hd-idle to spin down disks - [Link TBD] - currently not required since spin down does not work properly.~~

# Installation
[WIP]
1. Install markdown `apt install python-markdown`
2. Place the script wherever you prefer e.g. `/usr/sbin/snapraid`
3. Give executable rights - `chmod +x snapraid-aio-script.sh`
4. Open the script and add your email address at line 43
5. Tweak the script if needed
1. Install markdown `apt install python-markdown`. You can skip this step since the script will check and install it for you.
2. Download config file and script, then place wherever you prefer e.g. `/usr/sbin/snapraid`
3. Give executable rights to the main script - `chmod +x snapraid-aio-script.sh`
4. Edit the config file and add your email address at line 9
5. Tweak the config file if needed
6. Schedule the script execution time
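
For step 6, one common way to schedule the script is a cron entry. The snippet below is only a sketch: the install path is a hypothetical example based on the location suggested in step 2, and the 02:00 daily schedule is an arbitrary choice. It just prints the line you would add to root's crontab.

```bash
#!/bin/bash
# Hypothetical install path; adjust to where you placed the script.
SCRIPT_PATH="/usr/sbin/snapraid/snapraid-aio-script.sh"

# Cron fields: minute hour day-of-month month day-of-week.
# "0 2 * * *" means every day at 02:00.
CRON_LINE="0 2 * * * $SCRIPT_PATH"

echo "$CRON_LINE"
# To install it, paste the printed line into the editor opened by: crontab -e
```

On OpenMediaVault you may prefer to create the equivalent job through the web interface's scheduled jobs instead of editing the crontab by hand.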

# Known Issues
Hard disk spin down does not work: they are immediately woken up. The script probably does not handle this correctly while running.
- Hard disk spin down does not work: they are immediately woken up. The script probably does not handle this correctly while running.
- The report is not perfect, and we can't solve this because SnapRAID does not natively support Markdown.

# Credits
All rights belong to the respective creators.
Thanks to:
- [Zack Reed](https://zackreed.me/snapraid-split-parity-sync-script/) for most of the original script
- [mtompkins](https://gist.github.com/mtompkins/91cf0b8be36064c237da3f39ff5cc49d) for most of the original script
- [sburke](https://zackreed.me/snapraid-split-parity-sync-script/#comment-300) for the Debian 10 fix
- metagliatore (I don't think he's on Github) for removing the DIFF output from the email
- metagliatore (a friend, not on Github) for removing the DIFF output from the email
- [ozboss](https://forum.openmediavault.org/wsc/index.php?user/27331-ozboss/)
82 changes: 82 additions & 0 deletions script-config.sh
@@ -0,0 +1,82 @@
#!/bin/bash
######################
# USER VARIABLES #
######################

####################### USER CONFIGURATION START #######################

# address where the output of the jobs will be emailed to.
EMAIL_ADDRESS="youremailgoeshere"

# Set the thresholds of deleted (DEL_THRESHOLD) and updated (UP_THRESHOLD)
# files that stop the sync job from running.
# NOTE that depending on how actively your filesystem is used, a low
# number here may result in your parity info being out of sync often and/or
# you having to do lots of manual syncing.
DEL_THRESHOLD=500
UP_THRESHOLD=500

# Set the number of warnings before we force a sync job.
# This option comes in handy when you cannot be bothered to manually
# start a sync job when DEL_THRESHOLD is breached due to a false alarm.
# Set to 0 to ALWAYS force a sync (i.e. ignore the delete threshold above)
# Set to -1 to NEVER force a sync (i.e. you must sync manually if the delete threshold is breached)
SYNC_WARN_THRESHOLD=-1

# Set percentage of array to scrub if it is in sync.
# i.e. 0 to disable and 100 to scrub the full array in one go
# WARNING - depending on size of your array, setting to 100 will take a very long time!
SCRUB_PERCENT=5
SCRUB_AGE=10

# Pre-hash data. To avoid the risk of a latent hardware issue, you can enable the "pre-hash" mode and have all the
# data read twice to ensure its integrity. This option also verifies the files moved inside the array to ensure
# that the move operation succeeded and, if it did not, blocks the sync so that you can run a fix operation.
# 1 to enable, any other values to disable
PREHASH=1

# Set the option to log SMART info. 1 to enable, any other value to disable
SMART_LOG=1

# Set verbosity of the email output. When enabled, TOUCH and DIFF outputs are kept in the email, producing a potentially huge email. Keep this disabled for optimal reading.
# You can always check TOUCH and DIFF outputs using the TMP file.
# 1 to enable, any other values to disable
VERBOSITY=0

# Set if disk spindown should be performed. Depending on your system, this may not work. 1 to enable, any other values to disable
SPINDOWN=0

# Run snapraid status command to show array general information.
# Be aware the HTML output is pretty broken.
SNAP_STATUS=0

# location of the snapraid binary
SNAPRAID_BIN="/usr/bin/snapraid"
# location of the mail program binary
MAIL_BIN="/usr/bin/mailx"

####################### USER CONFIGURATION END #######################

####################### SYSTEM CONFIGURATION #######################
# Make changes only if you know what you're doing
######################

# Init variables
CHK_FAIL=0
DO_SYNC=0
EMAIL_SUBJECT_PREFIX="(SnapRAID on `hostname`)"
GRACEFUL=0
SYNC_WARN_FILE="$CURRENT_DIR/snapRAID.warnCount"
SYNC_WARN_COUNT=""
TMP_OUTPUT="/tmp/snapRAID.out"
SNAPRAID_LOG="/var/log/snapraid.log"
SECONDS=0 #Capture time

# Expand PATH for smartctl
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# Determine the name of the first content file...
CONTENT_FILE=`grep -v '^$\|^\s*\#' /etc/snapraid.conf | grep snapraid.content | head -n 1 | cut -d " " -f2`

# Build an array of all parity files...
PARITY_FILES[0]=`grep -v '^$\|^\s*\#' /etc/snapraid.conf | grep snapraid.parity | head -n 1 | cut -d " " -f2`
IFS=$'\n' PARITY_FILES=(`cat /etc/snapraid.conf | grep "^[^#;]" | grep "^\([2-6z]-\)*parity" | cut -d " " -f 2 | tr ',' '\n'`)
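
To see what the two parsing steps at the end of `script-config.sh` extract, here is a small self-contained sketch run against a made-up configuration. The sample paths are hypothetical, and the patterns are a portable variation (extended regular expressions, matching on the `content` and `parity` keywords) rather than the exact expressions used in the commit.

```bash
#!/bin/bash
# Hypothetical sample configuration, written to a temp file for the demo.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# sample snapraid.conf (made-up paths)
parity /mnt/parity1/snapraid.parity
2-parity /mnt/parity2/a.2-parity,/mnt/parity3/b.2-parity
content /var/snapraid/snapraid.content
content /mnt/disk1/snapraid.content
data d1 /mnt/disk1/
EOF

# First content file: skip blank and comment lines, keep the first
# "content" entry and take its second space-separated field.
content_file=$(grep -Ev '^[[:space:]]*(#|$)' "$conf" | grep '^content ' | head -n 1 | cut -d ' ' -f 2)

# All parity files, one per line. Matches parity, 2-parity ... 6-parity
# and z-parity; split-parity entries are comma separated, so split them.
parity_files=$(grep -E '^([2-6z]-)?parity ' "$conf" | cut -d ' ' -f 2 | tr ',' '\n')

echo "$content_file"
echo "$parity_files"
rm -f "$conf"
```

With this sample, the first command yields the first content path and the second yields three parity paths, one per line, because the split `2-parity` entry is broken at the comma.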