snapraid_sync

A script for automating SnapRAID sync and scrub tasks, with user-configurable threshold values that prevent accidental syncs when too many files have been changed or deleted. Use cron to trigger it on a schedule, and have it notify you by email when a sync succeeds or something goes wrong.

All output will be printed both to stdout and to a log file. This means that it is possible to use this script interactively as well, which makes the task of "forcing" a sync (when manual intervention is necessary) much easier.

This script is also used in my Ansible "SnapRAID" role, which is why it has been designed to be able to handle multiple SnapRAID arrays on the same computer.

Acknowledgments and Motive

This script is not an entirely original piece of work. I have drawn a lot of inspiration from a couple of similar scripts that exist out there.

However, none of these really fulfilled my desire to have a script that could be configured through environment variables, in order to make it easy to use the same script for multiple arrays and/or deploy via Ansible. So I took some time to analyze the best parts and design choices from all of these sources, and then build my own solution from that knowledge.

Installation

SnapRAID

The most critical part, in order for this script to work, is of course that you have installed SnapRAID and have created a valid configuration file for your array. ZackReed has a good guide on how to install SnapRAID as well, in addition to his version of this script that is linked above.

However, as mentioned in the introduction, for those who like Ansible it might be interesting to check out my Ansible "SnapRAID" role as well, if you don't feel like you want to do all the installation steps manually.

Mutt

If you want to be notified by email when syncs are successful or something goes wrong, you will need to install mutt. This is a very lightweight email client that is able to authenticate with external SMTP servers, which is necessary if you want to send emails out over the internet, and it can be installed with the following command:

sudo apt install mutt

Mutt then needs to be configured so that it is able to send emails to you. You will need an account on some other email service (e.g. Gmail or Hotmail) which mutt can log in to and send emails from. In the examples/ folder there is an example muttrc file that has been configured to use a Gmail account. You will only need to change the <user> name/mail and <supersectret> password to something that you control.

This muttrc config file needs to be placed in the $HOME folder of the user that will invoke the snapraid_sync.sh script. Since we usually want SnapRAID to run as "root" (in order to be able to read all files), we should place the file in one of these two places:

  • /root/.muttrc
  • /root/.mutt/muttrc

Notice the leading dot on either the file or the folder.

When this is done, you can type sudo mutt in the terminal to test reading and sending emails.
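For a quick non-interactive check, a test mail can also be sent straight from the command line. The helper below is only a sketch: the function name `send_test_mail` is hypothetical and not part of this repository, and it assumes that mutt reads the message body from stdin and takes the subject through `-s`. `MAIL_BIN` is the same variable the script documents, with its documented default as fallback.

```shell
#!/bin/sh
# Hypothetical helper (not part of snapraid_sync): send a one-line test mail.
# MAIL_BIN falls back to the default path documented for the script.
send_test_mail() {
    recipient="$1"
    mail_bin="${MAIL_BIN:-/usr/bin/mutt}"

    # The message body is piped in on stdin; -s sets the subject line.
    printf 'Test message sent from %s\n' "$(uname -n)" \
        | "${mail_bin}" -s "SnapRAID test mail" "${recipient}"
}
```

Since mutt picks up the muttrc of the invoking user, this should be run from a root shell if the configuration was placed under /root/, for example `send_test_mail "you@example.com"` (placeholder address).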

snapraid_sync

To install this script you should move into a suitable directory of your choice and clone this repository from GitHub:

git clone git@github.com:JonasAlfredsson/snapraid_sync.git

Inside the src/ directory there will be four files which need to be kept together for this script to work as intended. The snapraid_sync.sh file is the main executable for this project, and it will source the utils_* files during runtime, so do not separate them.

It is important that the snapraid_sync.sh file is executable, which should already be the case, but can also be achieved by the following command:

sudo chmod +x snapraid_sync.sh

After this, it should be possible to use this program by always providing the full path to the snapraid_sync.sh file, but to be able to call the executable from anywhere on your system you can also add its folder to your $PATH. This can be done by including the following line at the bottom of your ~/.bashrc or ~/.zshrc file:

PATH="${PATH}:/path/to/snapraid_sync/src"

By sourcing the edited file again, or just opening a new terminal, it should now be possible to use snapraid_sync.sh without having to provide the full path.

Please also read the section about log rotation if you want to keep your server more organized.

Environment Variables

These variables are read from the environment when this script is started, which makes it easy to quickly point to another SnapRAID configuration file in case you have multiple arrays on your system.

Here are all the available variables and their default values if nothing is provided from the environment. If you are only using this script for a single array/setup on a single computer, it is perfectly fine to go into this script and manually change the defaults directly in the code. This way you will not need to prepend any additional settings every time you run it.

Important

  • EMAIL_ADDRESS: The address which the notification emails should be sent to (default: "" [i.e. disabled])
  • DELETE_THRESHOLD: Threshold value for deleted files, if exceeded no sync will be made (default: "0")
  • UPDATE_THRESHOLD: Threshold value for updated files, if exceeded no sync will be made (default: "-1" ["-1" disables this check])
  • CONFIG_FILE: The location of the SnapRAID array's configuration file (default: "/etc/snapraid.conf")

Optional

  • SCRUB_PERCENT: The percentage of the array which should be scrubbed when "scrub" is called (default: "8")
  • SCRUB_AGE: Only scrub files which are older than this number of days (default: "10")
  • EMAIL_SUBJECT_PREFIX: A prefix which will be added to the subject line of all notification mails
    (default: "SnapRAID on $(hostname) - ")
  • MAIL_ATTACH_LOG: Attach the entire log file to the notification mail (default: "false")

Additional - Do not change these unless you know what you are doing.

  • FORCE_SYNC: Run a "sync" even though threshold values have been exceeded (default: "false")
  • NONINTERACTIVE: Unless this is "true" the script will ask the user for confirmation before forcing a sync (default: "false")
  • RUN_SCRUB: Run a "scrub" after the "sync" (default: "false")
  • LOG_FILE: The full path to the main log file (default: "" [This will create a temporary file in /tmp/])
  • SNAPRAID_BIN: The location of the SnapRAID executable binary (default: "/usr/local/bin/snapraid")
  • MAIL_BIN: The location of the mail program's executable binary (default: "/usr/bin/mutt")
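The "default unless provided from the environment" behavior described above is commonly written in shell with the `${VAR:-default}` expansion. The snippet below is a minimal sketch of that pattern using a few of the documented defaults; it is not copied from snapraid_sync.sh, and the function name `load_defaults` is hypothetical.

```shell
#!/bin/sh
# Illustrative only: NOT the actual code from snapraid_sync.sh.
# Each variable keeps its value from the environment if one is set,
# and otherwise falls back to the documented default.
load_defaults() {
    CONFIG_FILE="${CONFIG_FILE:-/etc/snapraid.conf}"
    DELETE_THRESHOLD="${DELETE_THRESHOLD:-0}"
    UPDATE_THRESHOLD="${UPDATE_THRESHOLD:--1}"   # ":-" followed by the value "-1"
    SCRUB_PERCENT="${SCRUB_PERCENT:-8}"
    SCRUB_AGE="${SCRUB_AGE:-10}"
    FORCE_SYNC="${FORCE_SYNC:-false}"
}

load_defaults
echo "Using ${CONFIG_FILE} (delete threshold: ${DELETE_THRESHOLD})"
```

With this pattern, a second array can be handled by prepending the settings on the command line, e.g. `sudo CONFIG_FILE="/etc/snapraid_media.conf" DELETE_THRESHOLD="50" ./snapraid_sync.sh`, where the config path and threshold value are made-up examples.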

Usage

There are two methods of usage which I envisioned when I wrote this: a daily non-interactive automatic sync/scrub via cron, and an interactive intervention when threshold values have been exceeded (i.e. forcing a sync). I will begin by explaining the interactive intervention, since that one is necessary if you have not yet made any syncs, and from that it should be easier to understand how to properly set up cron with this.

Interactive Intervention

If you only have a single SnapRAID array, and the config file is in the default location (see the defaults above), you should be able to run a normal "sync" by just executing the following command:

sudo ./snapraid_sync.sh

Notice the use of sudo in order to give SnapRAID root privileges (so it can read all files present on the filesystem).

Force a "sync"

However, if this is the first time running a "sync", or you have deleted some files, it will complain that the threshold values have been exceeded, and the script will exit with an error. If running in "non-interactive mode" the script will also send an email to notify you about this problem. To override this error you will need to set the environment variable FORCE_SYNC to "true", which can be achieved with either of these two options:

sudo ./snapraid_sync.sh force
sudo FORCE_SYNC="true" ./snapraid_sync.sh

The script will then not exit when threshold values are exceeded, but rather stop and ask the user to confirm (with a "Y") that a "sync" should be performed regardless of the "diff" status.
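The decision described above can be sketched roughly as follows. This is an assumption-laden illustration, not the script's real implementation: the function names are hypothetical, the deleted/updated counts are passed in as plain numbers rather than parsed from the "diff" output, and a negative threshold is treated as "disabled" for both values (the documentation only states this for UPDATE_THRESHOLD).

```shell
#!/bin/sh
# Hypothetical sketch of the threshold logic; not taken from snapraid_sync.sh.

# Succeeds (returns 0) when count is above threshold.
# Assumption: a negative threshold disables the check.
threshold_exceeded() {
    count="$1"
    threshold="$2"
    [ "${threshold}" -ge 0 ] && [ "${count}" -gt "${threshold}" ]
}

# Prints "sync", "blocked" or "forced" for the given deleted/updated counts.
decide_sync() {
    deleted="$1"
    updated="$2"
    if threshold_exceeded "${deleted}" "${DELETE_THRESHOLD:-0}" \
        || threshold_exceeded "${updated}" "${UPDATE_THRESHOLD:--1}"; then
        if [ "${FORCE_SYNC:-false}" = "true" ]; then
            echo "forced"    # continue, after confirmation if interactive
        else
            echo "blocked"   # exit with an error instead of syncing
        fi
    else
        echo "sync"
    fi
}
```

With the documented defaults, `decide_sync 3 0` reports that the sync is blocked, since three deleted files exceed a DELETE_THRESHOLD of "0", while any number of updated files passes because UPDATE_THRESHOLD is "-1".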

If this safety prompt is annoying, or you are trying to automate everything, it can be turned off by setting the environment variable NONINTERACTIVE to "true". In combination with FORCE_SYNC this will make SnapRAID "sync" regardless of the threshold values, and these settings can be combined in whichever of the following ways you are most comfortable with:

sudo ./snapraid_sync.sh force noninteractive
sudo FORCE_SYNC="true" NONINTERACTIVE="true" ./snapraid_sync.sh

or as a mix of the two:

sudo NONINTERACTIVE="true" ./snapraid_sync.sh force

The trailing commands take precedence over the prepended environment variables.
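One way to get this precedence is to read the environment first and let the trailing commands overwrite the result afterwards. The sketch below illustrates that ordering; the function name `resolve_settings` is hypothetical, and the real script may parse its arguments differently.

```shell
#!/bin/sh
# Hypothetical sketch; the actual argument handling in snapraid_sync.sh may differ.
resolve_settings() {
    # Step 1: take values from the environment, or fall back to the defaults.
    FORCE_SYNC="${FORCE_SYNC:-false}"
    NONINTERACTIVE="${NONINTERACTIVE:-false}"
    RUN_SCRUB="${RUN_SCRUB:-false}"

    # Step 2: trailing commands are applied last, so they win.
    for arg in "$@"; do
        case "${arg}" in
            force)          FORCE_SYNC="true" ;;
            noninteractive) NONINTERACTIVE="true" ;;
            scrub)          RUN_SCRUB="true" ;;
            *)
                echo "Unknown command: ${arg}" >&2
                return 1
                ;;
        esac
    done
}
```

Under this ordering, an invocation like `sudo FORCE_SYNC="false" ./snapraid_sync.sh force` would still end up forcing the sync, because the trailing "force" command is processed after the environment has been read.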

Non-Interactive Execution

Above was a guide on how to do a "sync" manually, but usually we want to have as much as possible automated. By creating an entry in cron we can have this script triggered automatically on a schedule we choose, and have it keep the array in an up-to-date, synced state without our help.

When this script is run by cron you need to have the NONINTERACTIVE variable set to "true", otherwise it might get stuck waiting for user input that will never arrive. It is also recommended to set the user running this cron job to "root", so that SnapRAID will be able to read all the files on the filesystem without any issues.

An example cron configuration file can be found in the examples/ folder, and in that one it is easy to see how the user is set to "root" and the NONINTERACTIVE variable is set to "true". Additionally a half-finished entry of an email address is present, which should be changed to something that you own, since this is the primary method for notifying you when something goes wrong while running in non-interactive mode.

It is also possible to have the entire LOG_FILE attached to the notification email that is sent. Just make sure that the variable MAIL_ATTACH_LOG is set to "true" for the log to show up as a file attachment. However, a minor warning regarding this is that the "diff" output will be present in this file, and if you do not trust your email provider you might not want it to know the names of the files which you have on your computer. Therefore the default of this setting is "false".

In the example cron file there are two entries with two different schedules. The first one will trigger every day, except Monday, at 09:05 and 22:05 to run a "sync". The second one will only run on Mondays at 13:00, and then it will also run a "scrub" in addition to the "sync" (see the trailing "scrub" command). In both of these cases the output is routed to /dev/null, since we collect all of it in the LOG_FILE instead.

The crond file needs to be renamed and placed under /etc/cron.d/ to work. A suggestion might be something like this:

/etc/cron.d/snapraid_sync

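Files inside this folder may not have any extensions, e.g. *.sh, or contain any unusual characters (on Debian-based systems the name should consist only of letters, digits, underscores and hyphens).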

Something to remember is that cron does not read your user's .bashrc file (or similar), which means that all the environment variables you want propagated to the script need to be defined in the cron job. For a complete list of all available variables, look in the environment section.
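To make the schedule described above concrete, the helper below writes a cron file with two matching entries to a path of your choice, so it can be reviewed before being copied into /etc/cron.d/. It is a sketch and not the file shipped in examples/: the function name `write_cron_example`, the email address and the /path/to/ prefix are all placeholders, and the variables are set inline on each command line since cron will not pick them up from anywhere else.

```shell
#!/bin/sh
# Hypothetical helper: write an example cron file to the given path.
# The address and the /path/to/ prefix are placeholders to be replaced.
write_cron_example() {
    target="$1"
    cat > "${target}" <<'EOF'
# min hour dom mon dow user command
# "sync" at 09:05 and 22:05 every day except Monday (dow 0 = Sunday).
5 9,22 * * 0,2-6 root NONINTERACTIVE="true" EMAIL_ADDRESS="you@example.com" /path/to/snapraid_sync/src/snapraid_sync.sh >/dev/null 2>&1
# "sync" followed by "scrub" on Mondays at 13:00.
0 13 * * 1 root NONINTERACTIVE="true" EMAIL_ADDRESS="you@example.com" /path/to/snapraid_sync/src/snapraid_sync.sh scrub >/dev/null 2>&1
EOF
}
```

After running e.g. `write_cron_example ./snapraid_sync.cron` and adjusting the placeholders, the file can be copied to /etc/cron.d/snapraid_sync. Note that files in /etc/cron.d/ have a user field (here "root") between the schedule and the command, unlike a personal crontab.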

Log Rotation

This is an extra step you should take some time to complete if you want your server more organized.

During execution this script will produce output to four different files:

  • tmp_file
  • mail_body
  • tmp_mail
  • LOG_FILE

Those in lowercase letters will be created as temporary files in /tmp/, and deleted when the script exits (for whatever reason), while the main LOG_FILE will remain untouched after completion. This is done so that you will be able to go back and look through the log to find details about any errors which might have occurred.
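A common shell idiom for this kind of cleanup is `mktemp` combined with a `trap` on EXIT. The sketch below shows the idea in isolation; it is not lifted from snapraid_sync.sh, and the function name `demo_temp_files` and the file name templates are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of temporary files that are removed on exit.
# The function body runs in a subshell "( ... )", so the EXIT trap fires
# when the function finishes rather than when the calling shell exits.
demo_temp_files() (
    tmp_file="$(mktemp "${TMPDIR:-/tmp}/snapraid_sync_tmp.XXXXXX")"
    mail_body="$(mktemp "${TMPDIR:-/tmp}/snapraid_sync_mail.XXXXXX")"

    # Remove both files whenever this (sub)shell exits.
    trap 'rm -f "${tmp_file}" "${mail_body}"' EXIT

    echo "intermediate output" > "${tmp_file}"
    echo "Sync finished" > "${mail_body}"

    # Print the paths so a caller can verify they are gone afterwards.
    printf '%s\n%s\n' "${tmp_file}" "${mail_body}"
)
```

Whether an EXIT trap also runs when the shell is killed by a signal depends on the shell in use, so a script that needs cleanup "for whatever reason" may have to trap signals such as INT and TERM as well.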

However, by default this LOG_FILE is also created as a temporary file in /tmp/, which means that sooner or later the system will remove it from that folder. If you would like to keep it for longer you will need to define a different path and manage housekeeping yourself.

A suggestion is to configure the LOG_FILE variable to point to a path like this:

/var/log/snapraid_sync/array_name.log

and then configure logrotate to make sure the logs are renamed and compressed every day, and then have it delete the oldest ones so you do not fill the folder with tons of files.

An example of a logrotate configuration file can be found inside the examples/ folder, and this file then needs to be renamed and placed inside the /etc/logrotate.d/ folder. A suggestion could be something like this:

/etc/logrotate.d/snapraid_sync
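As an illustration of what such a configuration might contain, the helper below writes a small logrotate file for the log path suggested above. It is a sketch and not the file from examples/: the function name `write_logrotate_example` is hypothetical, and keeping 14 rotated logs is an arbitrary choice for the example.

```shell
#!/bin/sh
# Hypothetical helper: write an example logrotate config to the given path.
# The retention count (14) is an arbitrary value chosen for illustration.
write_logrotate_example() {
    target="$1"
    cat > "${target}" <<'EOF'
/var/log/snapraid_sync/*.log {
    # Rotate once per day and keep the 14 most recent rotated logs.
    daily
    rotate 14
    # Compress rotated logs to save space.
    compress
    # Do not complain if a log is missing, and skip empty logs.
    missingok
    notifempty
}
EOF
}
```

After writing the file with e.g. `write_logrotate_example ./snapraid_sync.logrotate`, it can be checked with logrotate's debug mode (`logrotate -d ./snapraid_sync.logrotate`), which should only print what would be done, before it is copied into /etc/logrotate.d/.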