Integration aws direct connect (#574)
* init integration_aws-direct-connect for testing

* fix: namespace

* fix: connection state

* fix: lower

* fix: detectors

* fix: variables

* fix: reorg, remove vars

* fix detector after testing cutting the VIF

* fix: heartbeat detector

---------

Co-authored-by: Jean-Baptiste Simillon <jb.simillon@fr.clara.net>
jlsclaranet and haedri authored Dec 13, 2024
1 parent a9ebcc0 commit eca7084
Showing 15 changed files with 458 additions and 0 deletions.
10 changes: 10 additions & 0 deletions docs/severity.md
@@ -15,6 +15,7 @@
- [integration_aws-apigateway](#integration_aws-apigateway)
- [integration_aws-backup](#integration_aws-backup)
- [integration_aws-beanstalk](#integration_aws-beanstalk)
- [integration_aws-direct-connect](#integration_aws-direct-connect)
- [integration_aws-ecs-cluster](#integration_aws-ecs-cluster)
- [integration_aws-ecs-service](#integration_aws-ecs-service)
- [integration_aws-efs](#integration_aws-efs)
@@ -235,6 +236,15 @@
|AWS Beanstalk instance root filesystem usage|X|X|-|-|-|


## integration_aws-direct-connect

|Detector|Critical|Major|Minor|Warning|Info|
|---|---|---|---|---|---|
|AWS Direct Connect heartbeat|X|-|-|-|-|
|AWS Direct Connect connection state|X|-|-|-|-|
|AWS Direct Connect virtual interface traffic|X|-|-|-|-|


## integration_aws-ecs-cluster

|Detector|Critical|Major|Minor|Warning|Info|
108 changes: 108 additions & 0 deletions modules/integration_aws-direct-connect/README.md
@@ -0,0 +1,108 @@
# AWS-DIRECT-CONNECT SignalFx detectors

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
:link: **Contents**

- [How to use this module?](#how-to-use-this-module)
- [What are the available detectors in this module?](#what-are-the-available-detectors-in-this-module)
- [How to collect required metrics?](#how-to-collect-required-metrics)
- [Metrics](#metrics)
- [Related documentation](#related-documentation)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## How to use this module?

This directory defines a [Terraform](https://www.terraform.io/)
[module](https://www.terraform.io/language/modules/syntax) you can use in your
existing [stack](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#stack) by adding a
`module` configuration and setting its `source` parameter to the URL of this folder:

```hcl
module "signalfx-detectors-integration-aws-direct-connect" {
  source = "github.com/claranet/terraform-signalfx-detectors.git//modules/integration_aws-direct-connect?ref={revision}"

  environment   = var.environment
  notifications = local.notifications
}
```

Note the following parameters:

* `source`: Use this parameter to specify the URL of the module. The double slash (`//`) is intentional and required.
  Terraform uses it to specify subfolders within a Git repo (see [module
  sources](https://www.terraform.io/language/modules/sources)). The `ref` parameter specifies a Git tag in
  this repository. It is recommended to use the latest "pinned" version in place of `{revision}`. Avoid using a branch
  like `master` except for testing purposes. Note that all modules in this repository are available on the Terraform
  [registry](https://registry.terraform.io/modules/claranet/detectors/signalfx), and we recommend using it as the source
  instead of `git`, which is more flexible but less future-proof.

* `environment`: Use this parameter to specify the
  [environment](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#environment) used by this
  instance of the module.
  Its value is added to the `prefixes` list at the start of the [detector
  name](https://github.com/claranet/terraform-signalfx-detectors/wiki/Templating#example).
  In general, it is also used in the internal `filtering` sub-module to [apply
  filters](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance#filtering) based on our default
  [tagging convention](https://github.com/claranet/terraform-signalfx-detectors/wiki/Tagging-convention).

* `notifications`: Use this parameter to define where alerts should be sent depending on their severity. It consists
  of a Terraform [object](https://www.terraform.io/language/expressions/type-constraints#object) where each key represents an available
  [detector rule severity](https://docs.splunk.com/observability/alerts-detectors-notifications/create-detectors-for-alerts.html#severity)
  and its value is a list of recipients. Every recipient must respect the [detector notification
  format](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector#notification-format).
  Check the [notification binding](https://github.com/claranet/terraform-signalfx-detectors/wiki/Notifications-binding)
  documentation to understand the recommended role of each severity.
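
To make the parameters above more concrete, here is a minimal sketch of a stack that uses the registry as its source. It assumes that the registry address `claranet/detectors/signalfx` accepts Terraform's usual `//` subfolder syntax for this sub-module and that `{revision}` is replaced with a released version. The recipients (email addresses and a placeholder PagerDuty credential ID) are hypothetical values written in the provider's notification format.

```hcl
locals {
  # Hypothetical recipients: one list per severity, each entry written in the
  # SignalFx provider notification format ("Type,argument").
  notifications = {
    critical = ["PagerDuty,{pagerduty_credential_id}"]
    major    = ["Email,oncall@example.com"]
    minor    = ["Email,ops@example.com"]
    warning  = ["Email,ops@example.com"]
    info     = []
  }
}

module "signalfx-detectors-integration-aws-direct-connect" {
  # Assumed registry address of this sub-module; pin it to a released version.
  source  = "claranet/detectors/signalfx//modules/integration_aws-direct-connect"
  version = "{revision}"

  environment   = var.environment
  notifications = local.notifications
}
```

All three detectors in this module only define a critical rule, so in practice only the `critical` list is used here; the other keys are kept so that the same object can be shared with other modules.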

These 3 parameters along with all variables defined in [common-variables.tf](common-variables.tf) are common to all
[modules](../) in this repository. Other variables, specific to this module, are available in
[variables-gen.tf](variables-gen.tf).
In general, the default configuration "works" but all of these Terraform
[variables](https://www.terraform.io/language/values/variables) make it possible to
customize the detectors' behavior to better fit your needs.

Most of them represent usual tips and rules detailed in the
[guidance](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance) documentation and listed in the
common [variables](https://github.com/claranet/terraform-signalfx-detectors/wiki/Variables) dedicated documentation.
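
As an illustration of such customization, the sketch below overrides a few module-specific variables. The variable names are those referenced by the generated detectors of this module; the values are arbitrary examples, and their types (duration strings, a boolean, a per-severity map of recipients) are assumed from the repository's usual conventions rather than confirmed here.

```hcl
module "signalfx-detectors-integration-aws-direct-connect" {
  source = "github.com/claranet/terraform-signalfx-detectors.git//modules/integration_aws-direct-connect?ref={revision}"

  environment   = var.environment
  notifications = local.notifications

  # Only alert once the connection state condition has held for 5 minutes (example value).
  connection_state_lasting_duration_critical = "5m"

  # Time without data before the heartbeat detector fires (example value).
  heartbeat_timeframe = "30m"

  # Disable the traffic detector, e.g. for a virtual interface expected to be idle.
  virtual_interface_traffic_disabled = true

  # Send this detector's critical alerts to a hypothetical dedicated recipient
  # instead of the module-wide `notifications.critical` list.
  connection_state_notifications = {
    critical = ["Email,network-team@example.com"]
  }
}
```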

Feel free to explore the [wiki](https://github.com/claranet/terraform-signalfx-detectors/wiki) for more information about
general usage of this repository.

## What are the available detectors in this module?

This module creates the following SignalFx detectors, each of which may contain one or more alerting rules:

|Detector|Critical|Major|Minor|Warning|Info|
|---|---|---|---|---|---|
|AWS Direct Connect heartbeat|X|-|-|-|-|
|AWS Direct Connect connection state|X|-|-|-|-|
|AWS Direct Connect virtual interface traffic|X|-|-|-|-|

## How to collect required metrics?

This module deploys detectors using metrics reported by the
[AWS integration](https://docs.splunk.com/Observability/gdi/get-data-in/connect/aws/aws.html) configurable
with [this Terraform module](https://github.com/claranet/terraform-signalfx-integrations/tree/master/cloud/aws).


Check the [Related documentation](#related-documentation) section for more detailed and specific information about this module's dependencies.



### Metrics


Here is the list of required metrics for detectors in this module.

* `ConnectionState`
* `VirtualInterfaceBpsEgress`




## Related documentation

* [Terraform SignalFx provider](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs)
* [Terraform SignalFx detector](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector)
* [Splunk Observability integrations](https://docs.splunk.com/Observability/gdi/get-data-in/integrations.html)
1 change: 1 addition & 0 deletions modules/integration_aws-direct-connect/common-filters.tf
1 change: 1 addition & 0 deletions modules/integration_aws-direct-connect/common-locals.tf
1 change: 1 addition & 0 deletions modules/integration_aws-direct-connect/common-modules.tf
1 change: 1 addition & 0 deletions modules/integration_aws-direct-connect/common-variables.tf
1 change: 1 addition & 0 deletions modules/integration_aws-direct-connect/common-versions.tf
8 changes: 8 additions & 0 deletions modules/integration_aws-direct-connect/conf/00-heartbeat.yaml
@@ -0,0 +1,8 @@
module: AWS Direct Connect
name: heartbeat

signals:
  signal:
    metric: "ConnectionState"
rules:
  critical:
@@ -0,0 +1,19 @@
module: AWS Direct Connect
name: "Connection state"

transformation: true
aggregation: true

filtering: "filter('namespace', 'AWS/DX')"
value_unit: "state"

signals:
  signal:
    metric: ConnectionState
    filter: "filter('stat', 'lower')"

rules:
  critical:
    threshold: 0
    comparator: "=="
    description: "Connection is down"
@@ -0,0 +1,20 @@
module: AWS Direct Connect
name: "Virtual Interface traffic"

transformation: true
aggregation: true

filtering: "filter('namespace', 'AWS/DX')"
value_unit: "bytes"

signals:
  egress_bps:
    metric: VirtualInterfaceBpsEgress
    filter: "filter('stat', 'sum')"

rules:
  critical:
    threshold: 0
    comparator: "=="
    signal: egress_bps
    description: "No traffic detected on the virtual interface"
3 changes: 3 additions & 0 deletions modules/integration_aws-direct-connect/conf/readme.yaml
@@ -0,0 +1,3 @@
documentations:

source_doc:
94 changes: 94 additions & 0 deletions modules/integration_aws-direct-connect/detectors-gen.tf
@@ -0,0 +1,94 @@
resource "signalfx_detector" "heartbeat" {
  name = format("%s %s", local.detector_name_prefix, "AWS Direct Connect heartbeat")

  authorized_writer_teams = var.authorized_writer_teams
  teams                   = try(coalescelist(var.teams, var.authorized_writer_teams), null)
  tags                    = compact(concat(local.common_tags, local.tags, var.extra_tags))

  program_text = <<-EOF
    from signalfx.detectors.not_reporting import not_reporting
    signal = data('ConnectionState', filter=${module.filtering.signalflow})${var.heartbeat_aggregation_function}${var.heartbeat_transformation_function}.publish('signal')
    not_reporting.detector(stream=signal, resource_identifier=None, duration='${var.heartbeat_timeframe}', auto_resolve_after='${local.heartbeat_auto_resolve_after}').publish('CRIT')
EOF

  rule {
    description           = "has not reported in ${var.heartbeat_timeframe}"
    severity              = "Critical"
    detect_label          = "CRIT"
    disabled              = coalesce(var.heartbeat_disabled, var.detectors_disabled)
    notifications         = try(coalescelist(lookup(var.heartbeat_notifications, "critical", []), var.notifications.critical), null)
    runbook_url           = try(coalesce(var.heartbeat_runbook_url, var.runbook_url), "")
    tip                   = var.heartbeat_tip
    parameterized_subject = var.message_subject == "" ? local.rule_subject_novalue : var.message_subject
    parameterized_body    = var.message_body == "" ? local.rule_body : var.message_body
  }

  max_delay = var.heartbeat_max_delay
}

resource "signalfx_detector" "connection_state" {
  name = format("%s %s", local.detector_name_prefix, "AWS Direct Connect connection state")

  authorized_writer_teams = var.authorized_writer_teams
  teams                   = try(coalescelist(var.teams, var.authorized_writer_teams), null)
  tags                    = compact(concat(local.common_tags, local.tags, var.extra_tags))

  viz_options {
    label        = "signal"
    value_suffix = "state"
  }

  program_text = <<-EOF
    base_filtering = filter('namespace', 'AWS/DX')
    signal = data('ConnectionState', filter=base_filtering and filter('stat', 'lower') and ${module.filtering.signalflow})${var.connection_state_aggregation_function}${var.connection_state_transformation_function}.publish('signal')
    detect(when(signal == ${var.connection_state_threshold_critical}%{if var.connection_state_lasting_duration_critical != null}, lasting='${var.connection_state_lasting_duration_critical}', at_least=${var.connection_state_at_least_percentage_critical}%{endif})).publish('CRIT')
EOF

  rule {
    description           = "Connection is down == ${var.connection_state_threshold_critical}state"
    severity              = "Critical"
    detect_label          = "CRIT"
    disabled              = coalesce(var.connection_state_disabled, var.detectors_disabled)
    notifications         = try(coalescelist(lookup(var.connection_state_notifications, "critical", []), var.notifications.critical), null)
    runbook_url           = try(coalesce(var.connection_state_runbook_url, var.runbook_url), "")
    tip                   = var.connection_state_tip
    parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject
    parameterized_body    = var.message_body == "" ? local.rule_body : var.message_body
  }

  max_delay = var.connection_state_max_delay
}

resource "signalfx_detector" "virtual_interface_traffic" {
  name = format("%s %s", local.detector_name_prefix, "AWS Direct Connect virtual interface traffic")

  authorized_writer_teams = var.authorized_writer_teams
  teams                   = try(coalescelist(var.teams, var.authorized_writer_teams), null)
  tags                    = compact(concat(local.common_tags, local.tags, var.extra_tags))

  viz_options {
    label        = "egress_bps"
    value_suffix = "bytes"
  }

  program_text = <<-EOF
    base_filtering = filter('namespace', 'AWS/DX')
    egress_bps = data('VirtualInterfaceBpsEgress', filter=base_filtering and filter('stat', 'sum') and ${module.filtering.signalflow})${var.virtual_interface_traffic_aggregation_function}${var.virtual_interface_traffic_transformation_function}.publish('egress_bps')
    detect(when(egress_bps == ${var.virtual_interface_traffic_threshold_critical}%{if var.virtual_interface_traffic_lasting_duration_critical != null}, lasting='${var.virtual_interface_traffic_lasting_duration_critical}', at_least=${var.virtual_interface_traffic_at_least_percentage_critical}%{endif})).publish('CRIT')
EOF

  rule {
    description           = "No traffic detected on the virtual interface == ${var.virtual_interface_traffic_threshold_critical}bytes"
    severity              = "Critical"
    detect_label          = "CRIT"
    disabled              = coalesce(var.virtual_interface_traffic_disabled, var.detectors_disabled)
    notifications         = try(coalescelist(lookup(var.virtual_interface_traffic_notifications, "critical", []), var.notifications.critical), null)
    runbook_url           = try(coalesce(var.virtual_interface_traffic_runbook_url, var.runbook_url), "")
    tip                   = var.virtual_interface_traffic_tip
    parameterized_subject = var.message_subject == "" ? local.rule_subject : var.message_subject
    parameterized_body    = var.message_body == "" ? local.rule_body : var.message_body
  }

  max_delay = var.virtual_interface_traffic_max_delay
}

15 changes: 15 additions & 0 deletions modules/integration_aws-direct-connect/outputs.tf
@@ -0,0 +1,15 @@
output "connection_state" {
  description = "Detector resource for connection_state"
  value       = signalfx_detector.connection_state
}

output "heartbeat" {
  description = "Detector resource for heartbeat"
  value       = signalfx_detector.heartbeat
}

output "virtual_interface_traffic" {
  description = "Detector resource for virtual_interface_traffic"
  value       = signalfx_detector.virtual_interface_traffic
}
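
Because each of these outputs exposes the whole detector resource, a calling stack can read individual attributes from it. A minimal sketch, assuming the module is instantiated under the name used in the README example and relying on the `id` and `url` attributes exported by `signalfx_detector`; the output names below are hypothetical:

```hcl
# Hypothetical outputs declared in the calling stack, not in this module.
output "dx_connection_state_detector_id" {
  description = "ID of the Direct Connect connection state detector"
  value       = module.signalfx-detectors-integration-aws-direct-connect.connection_state.id
}

output "dx_connection_state_detector_url" {
  description = "URL of the Direct Connect connection state detector"
  value       = module.signalfx-detectors-integration-aws-direct-connect.connection_state.url
}
```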

4 changes: 4 additions & 0 deletions modules/integration_aws-direct-connect/tags.tf
@@ -0,0 +1,4 @@
locals {
  tags = ["integration", "aws-direct-connect"]
}
