This repository has been archived by the owner on Jun 17, 2023. It is now read-only.

Duration and exit code of last parity check in $var #181

Open
bergware opened this issue Jun 28, 2015 · 3 comments

@bergware
Collaborator

At the moment I scan the syslog file to find the duration and exit code of the last parity check, which are subsequently displayed on the Array Operation page. This is an inefficient approach, especially when the syslog file is big.

A better solution would be to store the duration and exit code in $var, alongside the variable 'sbSynced'. These values could then be read instantly, like all other values.

Bonus question: what parity check exit codes exist?

@limetech
Owner

To review, here's how it works now:

  • 'sbSynced' records the time that the last 'parity check' exited (whether it completed or not). This variable is cleared at the start of a 'data rebuild' or the start of a 'parity sync'.
  • 'sbSyncErrs' records the number of parity disk blocks that had incorrect parity during last parity check run (each parity disk block is 4K in size).

So right, this is not so precise because it does not record whether the last parity check completed or not. Here are the actions that will cause a parity check/sync or data rebuild to abort:

  • an unrecoverable write error occurs on any disk while a parity check is in progress. This also results in that disk being disabled (that is, if a disk write error happens to occur while a parity check is taking place, we kill the parity check as part of the process of disabling the disk that failed).
  • user Stops the array: in this case the parity operation is cancelled first, then the array is stopped
  • user Cancels the parity operation

Note that a parity operation is not cancelled as a result of "too many errors". Instead, the first 100 sync errors are logged in the system log. The 101st sync error generates a log message that simply says " stopped logging", but the operation keeps marching along, incrementing 'sbSyncErrs'.
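The capped logging described above can be sketched roughly as follows. This is only an illustration in Python, not the driver's actual code: the class name `SyncErrorTracker`, the `block` argument, and the wording of the per-error message are assumptions made for the example. Only the limit of 100 logged errors, the "stopped logging" notice on the 101st, and the ever-increasing 'sbSyncErrs' counter come from the description.

```python
MAX_LOGGED_SYNC_ERRORS = 100  # from the thread: only the first 100 errors are logged


class SyncErrorTracker:
    """Counts every sync error but logs only the first MAX_LOGGED_SYNC_ERRORS."""

    def __init__(self):
        self.sb_sync_errs = 0  # stands in for 'sbSyncErrs'
        self.log_lines = []    # stands in for the system log

    def record(self, block):
        """Register one sync error found at the given (hypothetical) block number."""
        self.sb_sync_errs += 1
        if self.sb_sync_errs <= MAX_LOGGED_SYNC_ERRORS:
            # Message text is invented for illustration.
            self.log_lines.append(f"incorrect parity, block={block}")
        elif self.sb_sync_errs == MAX_LOGGED_SYNC_ERRORS + 1:
            # The 101st error produces a single notice, then logging goes quiet.
            self.log_lines.append("stopped logging")
        # Any later error only increments the counter; the operation is never aborted.


tracker = SyncErrorTracker()
for block in range(250):
    tracker.record(block)
```

After 250 simulated errors the counter holds 250 while the log holds 101 lines, which is why the log alone cannot be used to count sync errors.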

As you know we are working on integrating P+Q array protection. Here's what I'll be adding to stored config data:

  • sbSyncType = will indicate what operation took place last: generate P only, generate Q only, generate P+Q, rebuild data, check P, check Q, check P+Q
  • sbSyncResult = 0 success, 1 aborted: write failure, 2 aborted: cancelled.
  • sbSyncStart = record start time of operation

I'll probably be able to get the above into the 4.1 release (though P+Q will probably not be in the 4.1 release).
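To illustrate how a page could consume these values once they exist, here is a minimal Python sketch. It rests on several assumptions that this thread does not confirm: that 'sbSyncStart' and 'sbSynced' are both Unix timestamps in seconds, that the duration of the last operation can be taken as their difference, and that 'sbSyncType' is available as a readable label. The names `SyncStatus`, `RESULT_LABELS`, and `describe` are made up for the example.

```python
from dataclasses import dataclass

# Result codes as proposed in the list above.
RESULT_LABELS = {
    0: "success",
    1: "aborted: write failure",
    2: "aborted: cancelled",
}


@dataclass
class SyncStatus:
    sb_sync_type: str    # e.g. "check P"; the real encoding is not specified here
    sb_sync_start: int   # assumed: Unix time when the operation started
    sb_synced: int       # assumed: Unix time when the operation exited
    sb_sync_result: int  # 0, 1 or 2 as listed above
    sb_sync_errs: int    # number of 4K parity blocks with incorrect parity

    def duration_seconds(self):
        """Elapsed seconds, or None if the timestamps do not form a valid interval."""
        if self.sb_synced < self.sb_sync_start:
            return None
        return self.sb_synced - self.sb_sync_start


def describe(status):
    """Build a one-line summary of the last parity operation."""
    result = RESULT_LABELS.get(status.sb_sync_result, "unknown")
    seconds = status.duration_seconds()
    if seconds is None:
        duration = "unknown"
    else:
        hours, rest = divmod(seconds, 3600)
        minutes, secs = divmod(rest, 60)
        duration = f"{hours}:{minutes:02d}:{secs:02d}"
    return (f"{status.sb_sync_type}: {result}, duration {duration}, "
            f"{status.sb_sync_errs} sync errors")


last = SyncStatus("check P", 1435500000, 1435529730, 0, 0)
summary = describe(last)
```

With values like these available directly, the page would no longer need to scan the syslog to show the outcome and duration of the last operation.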

@limetech limetech reopened this Jun 29, 2015
@bergware
Collaborator Author

sbUpdated already gives the start time of the operation. I use it to show the "Elapsed time" when a parity operation is running.

Can something like sbDuration be added to store the duration of the last parity check (or parity operation)?

@limetech
Owner

sbUpdated gets updated every time there's an update, for example on every Start of the array.
