forked from gluster/gstatus
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
205 lines (156 loc) · 7.74 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
GSTATUS
=======
Overview
========
A gluster trusted pool (aka cluster), consists of several key components;
nodes, volumes and bricks. In glusterfs 3.4/3.5, there isn't a single command
that can provide an overview of the cluster's health. This means that admins
currently assess the cluster health by looking at several commands to piece
together a picture of the cluster's state.
This isn't ideal - so 'gstatus' is an attempt to provide an easy to use,
highlevel view of a cluster's health through a single command. The tool gathers
information by issuing gluster commands, to build an object model covering
Nodes, Volumes and Bricks. With this data in place, checks are performed across
this meta data and errors reported back to the user. In later releases, this
data could be analysed in different ways to add further checks, incorporating
deployment best practices, freespace triggers etc.
Pre-Requisites
==============
+ glusterfs 3.4 or above
The tool issues commands to gluster using the --xml parameter. Scraping the
output is another possibility, but by using the xml option, the program
should be more stable and more easily extendable
+ volume types
the initial release of gstatus supports distributed and
distributed-replicated volumes ONLY
Dependencies
============
- python 2.6 or above
- gluster CLI
- gluster version 3.4 or above
Installation
============
Installing the tool can be done in two ways, through python-seuptools or simply
by running the main script (gstatus.py) directly.
* Using python-setuptools
1. To install using this method extract the archive file
- tar xvzf gstatus.tar.gz
2. cd to the gstatus directory
3. run the setup.py program
- python setup.py install
4. invoke the script
gstatus <options>
This will install a hook for gstatus in /usr/bin so running the script doesn't
need the .py extension.
* running from the archive
1. Extract the archive file
- tar xvzf gstatus.tar.gz
2. cd to the gstatus directory
3. invoke the script using
./gstatus.py <options>
Upgrade Notes
=============
1. If you're upgrading via rpm, first remove the existing version and then install
the new one
Volume States
=============
A gluster volume is made from individual filesystems (gluster bricks) across
multiple nodes. From an end/user of application perspective, this complexity
is abstracted but the status of individual bricks does have a bearing on the
data availability of the volume. For example, even without replication the
loss of a single brick in the volume will not cause the volume itself to be
unavilable, but instead manifest as inaccessible files in the filesystem.
To help understand the impact a brick or node outage can have on data access,
gstatus introduces some additional volume status descriptions.
UP
Volume is started and available, all bricks are up
UP(Degraded)
This state is specific to replicated volumes, where at least one brick is down
within a replica set. Data is still 100% available due to the alternate
replica(s), but the resilience of the volume to further failures within the
same replica set flags this volume as 'degraded'.
UP(Partial)
Effectively this means that allthough some bricks in the volume are online,
there are others that are down to a point where areas of the filesystem will be
missing. For a distrubuted volume, this state is seen if ANY brick is down,
whereas for a replicated volume a complete replica set needs to be down before
the volume state transitions to PARTIAL.
DOWN
Bricks are down, or the volume has not yet been started
Running the tool
================
Running gstatus with a -h flag shows the following options;
[root@gfs-ec1 gstatus]# ./gstatus.py -h
Usage: gstatus.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-s, --state show highlevel health of the cluster
-v, --volume volume info (default is ALL, or supply a volume name)
-b, --backlog Look deeper at self heal state
-a, --all show all cluster information (-s with -v)
-u UNITS, --units=UNITS
display capacity units in DECimal or BINary format (GB
vs GiB)
-l, --layout show brick layout when used with -v, or -a
-o OUTPUT_MODE, --output-mode=OUTPUT_MODE
produce output in different formats - json, keyvalue
or console(default)
-D, --debug turn on debug mode
-w, --without-progress
turn off progress updates to user during data
gathering
-t TIMEOUT, --timeout=TIMEOUT
gluster command timeout value (secs)
* Understanding the output
- Simply invoking gstatus without any options, will give a 3 line summary of
the cluster. This will tell you overall health, capacity and gluster version.
If there are issues detected in the cluster, the UNHEALTHY state will be
suffixed by a number in brackets - the number denotes the number of errors
detected within the cluster.
- Using -a will show high level cluster elements, as well as volume information
You'll notice that there are a number of what appear to be fractions listed
against items like Nodes, Bricks etc.
These fractions actually show you the current 'up' state against the
potential up state. So a 'Node' field that shows 2/4, actually means the
cluster has 4 nodes, but only 2 are in an up state.
e.g.
[root@rhs1-1 gstatus-tests]# gstatus
Status: HEALTHY Capacity: 32.00 GiB(raw bricks)
Glusterfs: 3.4.0.59rhs 10.00 GiB(raw used)
OverCommit: Yes 48.00 GiB(usable from volumes)
[root@rhs1-1 gstatus-tests]# gstatus -a
Status: HEALTHY Capacity: 32.00 GiB(raw bricks)
Glusterfs: 3.4.0.59rhs 10.00 GiB(raw used)
OverCommit: Yes 48.00 GiB(usable from volumes)
Nodes : 5/ 5 Volumes: 2 Up
Self Heal: 5/ 4 0 Up(Degraded)
Bricks : 8/ 8 0 Up(Partial)
Clients : 5 0 Down
<snip>
- Capacity information is derived from the brick information taken from a
'vol status detail' command. The accuracy of this number therefore depends
on the nodes/bricks being all up - elements missing from the configuration
will simply be missing from the calculation.
- A 'flag' called 'OverCommit' is provided that indicates whether a brick is
used by multiple volumes. Although technically valid, this exposes the
user to capacity conflicts across different volumes when quota's are not used.
- running the tool with the -l flag (with -a, or -v) will show a line diagram
depicting the brick layout within the volume. This feature was is inspired
by work of Fred van Zwieten and Niels De Vos in the 'lsgvt' project.
TIP: To keep the tool responsive, gstatus will not perform a vol heal <vol> info
command - since this can take time. However, if you suspect self heal is active
you can use the -b setting to check for a backlog of file heal operations.
The examples directory included in the archive contains a number of scenarios
that show the behaviour of the script under different error conditions.
Tested on
=========
- 4 node vm cluster, running Red Hat Storage 2.1 (RHEL 6.4, glusterfs 3.4)
- 4 node vm cluster, based on RHEL 6.5 with upstream glusterfs 3.5 beta 3
- 6 node vm cluster, based on CentOS 7.1, glusterfs 3.7 beta2
Known Issues
============
1) Normally every node will be capacity nodes. However, when nodes are added to
the cluster for quorum purposes, the Self Heal field shows more nodes with Self
Heal active than needed. This is due to the arbitration nodes also running the
self head daemon.