Snmp observ lib #1382

Draft
wants to merge 49 commits into
base: master
49 commits
72d227f
Add snmp-observ-lib (init)
v-zhuravlev Jan 3, 2025
76238cb
Add system
v-zhuravlev Jan 3, 2025
3cd9c69
Add packets
v-zhuravlev Jan 3, 2025
7b774b6
Add irate for interface
v-zhuravlev Jan 3, 2025
1b331fc
Update dashboards
v-zhuravlev Jan 3, 2025
4b3f050
Add main
v-zhuravlev Jan 3, 2025
1f933a5
Add clamp
v-zhuravlev Jan 9, 2025
d45b6f8
Errors and drops should be > 0
v-zhuravlev Jan 9, 2025
74c961a
Add nameShort for legends
v-zhuravlev Jan 9, 2025
cd9b10f
Move legend to the right for traffic
v-zhuravlev Jan 9, 2025
5a03f86
Make snnmp host single
v-zhuravlev Jan 9, 2025
802bfaf
Add aggKeepLabels
v-zhuravlev Jan 9, 2025
e4ebdb5
Add interfaceTable
v-zhuravlev Jan 9, 2025
6a90f46
Update packet panel types
v-zhuravlev Jan 10, 2025
50a476e
Add systemname
v-zhuravlev Jan 10, 2025
ae94129
Add new interface signals
v-zhuravlev Jan 10, 2025
9eac02c
Add row
v-zhuravlev Jan 10, 2025
757cfd9
Update interface signals and panels
v-zhuravlev Jan 14, 2025
0cc5095
Add cpu signals for multiple vendors
v-zhuravlev Jan 14, 2025
e4b03da
Add memory signals for multiple vendors
v-zhuravlev Jan 14, 2025
7b19111
Remove txt file
v-zhuravlev Jan 14, 2025
d598d6b
Add system signals
v-zhuravlev Jan 14, 2025
320d5da
Add README
v-zhuravlev Jan 14, 2025
9411a4e
Add rows/dashboards
v-zhuravlev Jan 14, 2025
eb0d5ea
Add config
v-zhuravlev Jan 14, 2025
c87cd0d
Add fleet dashboard
v-zhuravlev Jan 15, 2025
fd2517d
Add withTopK to panels
v-zhuravlev Jan 15, 2025
6d695f4
Simplify clampQuery
v-zhuravlev Jan 15, 2025
39e30f8
Add links between dashboards: from table, as datalinks, global
v-zhuravlev Jan 15, 2025
a1ff861
add mininterval option
v-zhuravlev Jan 15, 2025
f17027c
Update README
v-zhuravlev Jan 15, 2025
e17f6a9
Fixed ifLastChange
v-zhuravlev Jan 15, 2025
42c738c
Fmt
v-zhuravlev Jan 16, 2025
7aa50f4
Update cpu/memory to avg
v-zhuravlev Jan 20, 2025
133d14d
Add SNMP alerts
v-zhuravlev Jan 20, 2025
941da69
Add alerts thresholds
v-zhuravlev Jan 20, 2025
d71ede7
Add SNMPInterfaceIsFlapping alert
v-zhuravlev Jan 20, 2025
3c04aa6
Add snmp-exporter-alerts
v-zhuravlev Jan 20, 2025
85ce9f0
add metricsSource
v-zhuravlev Jan 20, 2025
6761571
Update counters for ubuquiti airos
v-zhuravlev Jan 22, 2025
ef76502
Update README
v-zhuravlev Jan 22, 2025
8443ac2
update 0 PDU alert
v-zhuravlev Jan 22, 2025
257f4a3
Rename arista to arista_Sw
v-zhuravlev Jan 22, 2025
9463270
Fix add cpu/memory signals for generic (and arista)
v-zhuravlev Jan 22, 2025
b2f4c6c
Fix mikrotik memory
v-zhuravlev Jan 22, 2025
a15a4e5
Fix selector
v-zhuravlev Jan 22, 2025
86683dc
Update default group/instance labels
v-zhuravlev Jan 22, 2025
c4b18fd
Fix lint
v-zhuravlev Jan 23, 2025
a0c2de4
Update juniper cpu/mem
v-zhuravlev Jan 23, 2025
3 changes: 1 addition & 2 deletions README.md
@@ -48,12 +48,11 @@ More examples:
Examples:
- [kafka-observ-lib](kafka-observ-lib/)
- [jvm-observ-lib](jvm-observ-lib/)
- [snmp-observ-lib](snmp-observ-lib/)
- [process-observ-lib](process-observ-lib/)
- [golang-observ-lib](golang-observ-lib/)
- [csp-mixin](csp-mixin/)



## LICENSE

[Apache-2.0](LICENSE)
5 changes: 5 additions & 0 deletions snmp-observ-lib/.lint
@@ -0,0 +1,5 @@
exclusions:
  template-instance-rule:
    reason: "These dashboards are designed to be single instance"
    entries:
      - dashboard: SNMP overview
43 changes: 43 additions & 0 deletions snmp-observ-lib/README.md
@@ -0,0 +1,43 @@
# SNMP observability library

This library can be used to generate dashboards, rows, and panels for SNMP devices.

The library supports multiple metrics sources (`metricsSource`).

### Supported sources

|metricsSource|Description|MIBs|Known devices|Links|snmp_exporter modules|
|-|-|-|-|-|-|
|generic |Generic SNMP device|IF-MIB,SNMPv2-MIB |default choice||system,if_mib,hrDevice,hrStorage|
|cisco | Cisco IOS devices |IF-MIB,SNMPv2-MIB, Cisco private MIBs|-||system,if_mib|
|arista_sw | Arista devices |IF-MIB,SNMPv2-MIB,HOST-RESOURCES-MIB|-||system,if_mib,hrDevice,hrStorage,arista_sw|
|brocade_fcs | Brocade |IF-MIB,SNMPv2-MIB,SW-MIB|Brocade 6520 v7.4.1c, Brocade 300 v7.0.0c, Brocade BL 5480 v6.3.1c||system,if_mib|
|brocade_foundry | Brocade Foundry | FOUNDRY-SN-AGENT-MIB | Brocade MLXe (System Mode: MLX), IronWare Version V5.4.0eT163, Foundry FLS648 Foundry Networks, Inc. FLS648, IronWare Version 04.1.00bT7e1, Foundry FWSX424 Foundry Networks, Inc. FWSX424, IronWare Version 02.0.00aT1e0||system,if_mib|
|dell_force | Dell Force S-Series | IF-MIB,SNMPv2-MIB,F10-S-SERIES-CHASSIS-MIB | Dell Force S-Series ||system,if_mib|
|dlink_des | D-Link DES series | IF-MIB,SNMPv2-MIB,AGENT-GENERAL-MIB | DGS-3420-26SC Gigabit Ethernet ||system,if_mib,dlink|
|eltex | Eltex | IF-MIB, SNMPv2-MIB | - ||system,if_mib|
|eltex_mes | Eltex MES | IF-MIB, SNMPv2-MIB,ELTEX-MES-ISS-CPU-UTIL-MIB,ARICENT-ISS-MIB | MES 2448P ||system,if_mib,eltex_mes|
|extreme | Extreme EXOS | IF-MIB, SNMPv2-MIB,EXTREME-SYSTEM-MIB, EXTREME-SOFTWARE-MONITOR-MIB | - ||system,if_mib|
|f5_bigip | F5 BigIP | IF-MIB,SNMPv2-MIB,F5-BIGIP-SYSTEM-MIB | - |https://my.f5.com/manage/s/article/K13322|system,if_mib|
|fortigate | Fortinet Fortigate | IF-MIB,SNMPv2-MIB,FORTINET-FORTIGATE-MIB,ENTITY-MIB | v7.2.5 ||system,if_mib,hrDevice,hrStorage|
|hpe | HP Enterprise Switches | IF-MIB,SNMPv2-MIB,STATISTICS-MIB,NETSWITCH-MIB | HP ProCurve J4900B, HP J9728A 2920-48G | https://support.hpe.com/hpesc/public/docDisplay?sp4ts.oid=51079&docId=emr_na-c02597344|system,if_mib|
|huawei | Huawei VRP | IF-MIB,SNMPv2-MIB,HUAWEI-ENTITY-EXTENT-MIB | - | |system,if_mib|
|juniper | Juniper MX, Juniper SRX | IF-MIB,SNMPv2-MIB,JUNIPER-MIB | Juniper MX204 Edge Router, JUNOS 24.2R1-S1.10, Juniper SRX, Juniper EX4200-24| https://www.juniper.net/documentation/us/en/software/nce/nce-srx-cluster-management-best/topics/concept/chassis-cluster-performance-monitoring.html |system,if_mib|
|mikrotik | Mikrotik OS | HOST-RESOURCES-MIB,SNMPv2-MIB,MIKROTIK-MIB,IF-MIB | Router OS 7.3; 912UAG-5HPnD,941-2nD,1100ahx2,CCR1016-12G,CCR1036-12G-4S,rb2011ua,mikrotik450g,mikrotikrb1100ah ||system,if_mib,mikrotik,hrStorage,hrDevice|
|netgear | Netgear FastPath switches | HOST-RESOURCES-MIB,SNMPv2-MIB,FASTPATH-SWITCHING-MIB,FASTPATH-BOXSERVICES-PRIVATE-MIB,IF-MIB | Netgear M5300-28G | https://kb.netgear.com/24352/MIBs-for-Smart-switches |system,if_mib|
|qtech | QTech | QTECH-MIB,EtherLike-MIB,HOST-RESOURCES-MIB,SNMPv2-MIB,ENTITY-MIB,IF-MIB | | |system,if_mib|
|tplink | TP-LINK | TPLINK-SYSINFO-MIB,HOST-RESOURCES-MIB,SNMPv2-MIB,TPLINK-SYSMONITOR-MIB,IF-MIB | T2600G-28TS | https://www.tp-link.com/en/support/download/t2600g-28ts/#MIBs_Files https://www.tp-link.com/ru/support/faq/1330/ |system,if_mib|
|ubiquiti_airos | Ubiquiti AirOS | FROGFOOT-RESOURCES-MIB,HOST-RESOURCES-MIB,SNMPv2-MIB,IEEE802dot11-MIB,IF-MIB | NanoStation M5, UAP-LR | |system,if_mib,ubiquiti_airos|

## Import

```sh
jb init
jb install https://github.com/grafana/jsonnet-libs/snmp-observ-lib
```

## Example


## Links
https://cric.grenoble.cnrs.fr/Administrateurs/Outils/MIBS/?oid=1.3.6.1.2.1.2.2.1.3
220 changes: 220 additions & 0 deletions snmp-observ-lib/alerts.libsonnet
@@ -0,0 +1,220 @@
local xtd = import 'github.com/jsonnet-libs/xtd/main.libsonnet';

{
  new(this): {
    local instanceLabel = xtd.array.slice(this.config.instanceLabels, -1)[0],
    local groupLabel = xtd.array.slice(this.config.groupLabels, -1)[0],
    groups+: [
      {
        name: this.config.uid + '-snmp-alerts',
        rules:
          [
            {
              alert: 'SNMPNodeHasRebooted',
              expr: |||
                (%s) < 600 and (%s) > 600
              ||| % [
                this.signals.system.uptime.asRuleExpression(),
                this.signals.system.uptime.withOffset('10m').asRuleExpression(),
              ],
              labels: {
                severity: 'info',
              },
              annotations: {
                summary: 'SNMP node has rebooted.',
                description: 'SNMP node {{ $labels.%(instanceLabel)s }} has rebooted {{ $value | humanize }} seconds ago.'
                             % this.config { instanceLabel: instanceLabel },
              },
            },
            {
              alert: 'SNMPNodeCPUHighUsage',
              expr: |||
                avg by (%s) (%s) > %s
              ||| % [
                std.join(',', this.config.groupLabels + this.config.instanceLabels),
                this.signals.cpu.cpuUsage.asRuleExpression(),
                this.config.alertsCPUThresholdWarning,
              ],
              'for': '15m',
              keep_firing_for: '5m',
              labels: {
                severity: 'warning',
              },
              annotations: {
                summary: 'High CPU usage on SNMP node.',
                description: |||
                  CPU usage on SNMP node {{ $labels.%(instanceLabel)s }} is above %(alertsCPUThresholdWarning)s%%. The current value is {{ $value | printf "%%.2f" }}%%.
                ||| % this.config { instanceLabel: instanceLabel },
              },
            },
            {
              alert: 'SNMPNodeMemoryUtilization',
              expr: |||
                avg by (%s) (%s) > %s
              ||| % [
                std.join(',', this.config.groupLabels + this.config.instanceLabels),
                this.signals.memory.memoryUsage.asRuleExpression(),
                this.config.alertMemoryUsageThresholdCritical,
              ],
              labels: {
                severity: 'info',
              },
              annotations: {
                summary: 'High memory usage on SNMP node.',
                description: |||
                  Memory usage on SNMP node {{ $labels.%(instanceLabel)s }} is above %(alertMemoryUsageThresholdCritical)s%%. The current value is {{ $value | printf "%%.2f" }}%%.
                ||| % this.config { instanceLabel: instanceLabel },
              },
            },
            {
              alert: 'SNMPInterfaceDown',
              expr: |||
                (%s) == 2
                # only alert if interface is administratively up:
                and (%s) == 1
              |||
              % [
                this.signals.interface.ifOperStatus.withFilteringSelectorMixin(this.config.alertInterfaceDownSelector).asRuleExpression(),
                this.signals.interface.ifAdminStatus.asRuleExpression(),
              ],
              labels: {
                severity: 'warning',
              },
              annotations: {
                summary: 'Network interface is down on SNMP device.',
                description: |||
                  Network interface {{$labels.ifName}} ({{$labels.ifAlias}}) on {{$labels.%s}} is down.
                  Only interfaces with ifAdminStatus = `up` and matching `%s` are being checked.
                ||| % [instanceLabel, this.config.alertInterfaceDownSelector],
              },
              'for': '5m',
              keep_firing_for: '3m',
            },
            {
              alert: 'SNMPInterfaceDrops',
              expr: |||
                (%s) > 0
                or
                (%s) > 0
              |||
              % [
                this.signals.interface.networkInDroppedPerSec.asRuleExpression(),
                this.signals.interface.networkOutDroppedPerSec.asRuleExpression(),
              ],
              labels: {
                severity: 'warning',
              },
              annotations: {
                summary: 'Too many packets discarded on the network interface.',
                description: |||
                  Too many packets discarded on {{ $labels.%s }}, interface {{ $labels.ifName }} ({{$labels.ifAlias}}) for an extended period of time (30m).
                ||| % [instanceLabel],
              },
              'for': '30m',
              keep_firing_for: '3m',
            },
            {
              alert: 'SNMPInterfaceErrors',
              expr: |||
                (%s) > 0
                or
                (%s) > 0
              |||
              % [
                this.signals.interface.networkInErrorsPerSec.asRuleExpression(),
                this.signals.interface.networkOutErrorsPerSec.asRuleExpression(),
              ],
              labels: {
                severity: 'warning',
              },
              annotations: {
                summary: 'Too many packets with errors on the network interface.',
                description: |||
                  Too many packets with errors on {{ $labels.%s }}, interface {{ $labels.ifName }} ({{$labels.ifAlias}}) for an extended period of time (15m).
                ||| % [instanceLabel],
              },
              'for': '15m',
              keep_firing_for: '3m',
            },
            {
              alert: 'SNMPInterfaceIsFlapping',
              expr: |||
                changes(%s[5m]) > 5
              |||
              % [
                this.signals.interface.ifOperStatus.asRuleExpression(),
              ],
              labels: {
                severity: 'warning',
              },
              annotations: {
                summary: 'Network interface is flapping.',
                description: |||
                  Network interface {{ $labels.ifName }} ({{$labels.ifAlias}}) is flapping on {{ $labels.%s }}. It has changed its status more than 5 times in the last 5 minutes.
                ||| % [instanceLabel],
              },
              'for': '0',
              keep_firing_for: '3m',
            },
          ],
      },
      {
        name: this.config.uid + '-snmp-exporter-alerts',
        rules:
          [
            {
              alert: 'SNMPExporterEmptyResponse',
              expr: 'snmp_scrape_pdus_returned{%s} <= 1' % this.config.filteringSelector,
              labels: {
                severity: 'warning',
              },
              annotations: {
                summary: 'SNMP exporter returns an empty response.',
                description: |||
                  SNMP exporter returns an empty response for node {{ $labels.%s }} and module {{ $labels.module }}. Please check that the target supports the {{ $labels.module }} module, as well as authentication and other SNMP settings.
                ||| % instanceLabel,
              },
              'for': '10m',
              keep_firing_for: '3m',
            },
            {
              alert: 'SNMPExporterSlowScrape',
              expr: 'min_over_time(snmp_scrape_duration_seconds{%s}[5m]) > 50' % this.config.filteringSelector,
              labels: {
                severity: 'info',
              },
              annotations: {
                summary: 'SNMP exporter scrape is slow.',
                description: |||
                  SNMP exporter scrape of {{ $labels.%s }} is taking more than 50 seconds. Please check which SNMP modules are polled and that snmp_exporter is located on the same network as the SNMP target.
                ||| % instanceLabel,
              },
              'for': '10m',
              keep_firing_for: '3m',
            },
          ]
          + (
            if this.config.filteringSelector != '' then
              [
                {
                  alert: 'SNMPExporterNoResponse',
                  expr: 'up{%s} == 0' % this.config.filteringSelector,
                  labels: {
                    severity: 'warning',
                  },
                  annotations: {
                    summary: 'SNMP node is down.',
                    description: |||
                      SNMP node {{ $labels.%s }} is not responding to SNMP walk.
                      Please check network connectivity and SNMP authentication settings.
                    ||| % instanceLabel,
                  },
                  'for': '10m',
                },
              ]
            else []
          ),
      },
    ],
  },
}
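The alert annotations in `alerts.libsonnet` rely on two Jsonnet idioms: picking the last entry of `instanceLabels`, and filling `%(name)s` placeholders from the config object. The snippet below is a minimal, self-contained sketch of that pattern. The `config` values are hypothetical stand-ins for `this.config`, and plain array indexing replaces `xtd.array.slice` so that no external import is needed.

```jsonnet
// Hypothetical stand-in for this.config; only the fields used below.
local config = {
  instanceLabels: ['cluster', 'instance'],
  alertsCPUThresholdWarning: 90,
};

// Last element of the list; same intent as
// xtd.array.slice(config.instanceLabels, -1)[0] in alerts.libsonnet.
local instanceLabel = config.instanceLabels[std.length(config.instanceLabels) - 1];

{
  // `%(key)s` looks the key up in the object on the right-hand side of `%`;
  // `%%` produces a literal percent sign.
  description: 'CPU usage on {{ $labels.%(instanceLabel)s }} is above %(alertsCPUThresholdWarning)s%%.'
               % (config { instanceLabel: instanceLabel }),
}
```

Because only the last label is used, a config with `instanceLabels: ['cluster', 'instance']` refers to `$labels.instance` in the rendered annotation, while the `avg by (...)` clauses in the expressions still group by every group and instance label.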
36 changes: 36 additions & 0 deletions snmp-observ-lib/config.libsonnet
@@ -0,0 +1,36 @@
{
  local this = self,
  filteringSelector: '',  // set to apply static filters to all queries and alerts, e.g. job="integrations/snmp"
  groupLabels: ['job'],
  instanceLabels: ['instance'],
  uid: 'snmp',
  dashboardNamePrefix: '',
  dashboardTags: ['snmp'],
  dashboardPeriod: 'now-6h',
  dashboardRefresh: '5m',
  dashboardTimezone: '',

  // increase min interval to match the longest SNMP polling interval in use:
  minInterval: '2m',
  metricsSource: ['generic', 'cisco', 'eltex', 'eltex_mes', 'mikrotik'],

  // only fire 'interface is down' alerts for interfaces matching the following selector:
  alertInterfaceDownSelector: 'ifAlias=~".*(?i:(uplink|internet|WAN)).*"',
  // cpuSelector for metricsSources with HOST-RESOURCES-MIB:
  cpuSelector: 'hrDeviceType="1.3.6.1.2.1.25.3.1.3"',
  // memorySelector for metricsSources with HOST-RESOURCES-MIB:
  // ignore buffers for now:
  memorySelector: 'hrStorageDescr!~".*(?i:(cache|buffer)).*", hrStorageType="1.3.6.1.2.1.25.2.1.2"',
  mikrotikMemorySelector: 'hrStorageDescr="main memory",hrStorageIndex="65536"',

  alertMemoryUsageThresholdCritical: 90,
  alertsCPUThresholdWarning: 90,
  signals+:
    {
      cpu: (import './signals/cpu.libsonnet')(this),
      fleetInterface: (import './signals/interface.libsonnet')(this, level='fleet'),
      memory: (import './signals/memory.libsonnet')(this),
      interface: (import './signals/interface.libsonnet')(this, level='interface'),
      system: (import './signals/system.libsonnet')(this),
    },
}
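Since `config.libsonnet` is a plain object of defaults, overrides can be layered on top with ordinary Jsonnet object composition. The following is a sketch only: `defaults` copies a few fields from the file above, the override values are hypothetical, and how the merged object is handed to the library (for example through `main.libsonnet`) is not shown in this diff.

```jsonnet
// A few defaults copied from config.libsonnet for illustration.
local defaults = {
  filteringSelector: '',
  minInterval: '2m',
  metricsSource: ['generic', 'cisco', 'eltex', 'eltex_mes', 'mikrotik'],
  alertsCPUThresholdWarning: 90,
};

// Hypothetical overrides: fields listed here replace the defaults,
// everything else (minInterval in this case) is inherited unchanged.
defaults {
  filteringSelector: 'job="integrations/snmp"',
  metricsSource: ['generic', 'mikrotik'],
  alertsCPUThresholdWarning: 80,
}
```

Setting a non-empty `filteringSelector` also matters for alerting: in `alerts.libsonnet` the `SNMPExporterNoResponse` rule is only generated when the selector is not an empty string.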
76 changes: 76 additions & 0 deletions snmp-observ-lib/dashboards.libsonnet
@@ -0,0 +1,76 @@
local g = import './g.libsonnet';
{
  local root = self,
  new(this):
    local prefix = this.config.dashboardNamePrefix;
    local links = this.grafana.links;
    local tags = this.config.dashboardTags;
    local uid = g.util.string.slugify(this.config.uid);
    local vars = this.grafana.variables;
    local annotations = this.grafana.annotations;
    local refresh = this.config.dashboardRefresh;
    local period = this.config.dashboardPeriod;
    local timezone = this.config.dashboardTimezone;
    local panels = this.grafana.panels;
    local stat = g.panel.stat;
    {
      'snmp-fleet.json':
        g.dashboard.new(this.config.dashboardNamePrefix + 'SNMP fleet overview')
        + g.dashboard.withPanels(
          g.util.panel.resolveCollapsedFlagOnRows(
            g.util.grid.wrapPanels(
              this.grafana.rows.fleet.panels,
            )
          ), setPanelIDs=false
        )
        + root.applyCommon(
          std.setUnion(
            this.signals.fleetInterface.getVariablesMultiChoice(),
            this.signals.system.getVariablesSingleChoice(),
            keyF=function(x) x.name
          ),
          uid + '-snmp-fleet',
          tags,
          links { backToFleet+:: {} },
          annotations,
          timezone,
          refresh,
          period
        ),
      'snmp-overview.json':
        g.dashboard.new(this.config.dashboardNamePrefix + 'SNMP overview')
        + g.dashboard.withPanels(
          g.util.panel.resolveCollapsedFlagOnRows(
            g.util.grid.wrapPanels(
              [
                this.grafana.rows.system,
                this.grafana.rows.interface,
              ]
            )
          ), setPanelIDs=false
        )
        + root.applyCommon(
          std.setUnion(
            this.signals.system.getVariablesSingleChoice(),
            this.signals.interface.getVariablesMultiChoice(),
            keyF=function(x) x.name
          ),
          uid + '-snmp-overview',
          tags,
          links,
          annotations,
          timezone,
          refresh,
          period
        ),
    },
  applyCommon(vars, uid, tags, links, annotations, timezone, refresh, period):
    g.dashboard.withTags(tags)
    + g.dashboard.withUid(uid)
    + g.dashboard.withLinks(std.objectValues(links))
    + g.dashboard.withTimezone(timezone)
    + g.dashboard.withRefresh(refresh)
    + g.dashboard.time.withFrom(period)
    + g.dashboard.withVariables(vars)
    + g.dashboard.withAnnotations(std.objectValues(annotations)),
}
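Both dashboards merge two variable lists with `std.setUnion(..., keyF=function(x) x.name)`, so the argument order decides which definition survives when two variables share a name. The sketch below illustrates this with hypothetical variable objects; the real ones come from the signals' `getVariablesSingleChoice()` and `getVariablesMultiChoice()` helpers, whose output is not shown in this diff.

```jsonnet
// Hypothetical variable objects, already sorted by `name`.
local systemVars = [
  { name: 'instance', multi: false },
  { name: 'job', multi: false },
];
local interfaceVars = [
  { name: 'ifName', multi: true },
  { name: 'instance', multi: true },
  { name: 'job', multi: true },
];

{
  // std.setUnion treats its inputs as sets: arrays ordered by the key with
  // no duplicates. When both sides contain the same key, the standard
  // library keeps the element from the first argument.
  overview: std.setUnion(systemVars, interfaceVars, keyF=function(x) x.name),
  fleet: std.setUnion(interfaceVars, systemVars, keyF=function(x) x.name),
}
```

Under that reading, `snmp-overview.json` ends up with single-choice `job` and `instance` variables because the system variables come first, while `snmp-fleet.json` keeps the multi-choice versions. If the helper functions do not return arrays sorted by `name`, the union may not deduplicate as intended, which would be worth checking during review.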