
Storm topology LCM testing

Nikita Marchenko edited this page Jan 23, 2018 · 11 revisions

Main goal: Stay operational between reboot events.

The only way to update a Storm topology is to kill it and submit a new one. From the official documentation: Storm won't kill the topology immediately. Instead, it deactivates all the spouts so that they don't emit any more tuples, and then Storm waits Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS (topology.message.timeout.secs) seconds before destroying all the workers. This gives the topology enough time to complete any tuples it was processing when it got killed.
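
A minimal sketch of the kill-and-wait sequence described above, assuming the `storm` CLI is on the PATH. `storm kill` takes an optional `-w` wait time in seconds, and `storm list` prints the topologies the cluster still knows about. The helper name `wait_until_gone` and the one-second polling interval are illustrative assumptions, not part of the Kilda tooling.

```shell
# Sketch only: poll `storm list` until the named topology is no longer listed.
# Usage: wait_until_gone NAME MAX_SECONDS
# Returns 0 once NAME has disappeared, 1 if it is still listed after MAX_SECONDS.
wait_until_gone() {
  name=$1
  max_secs=$2
  waited=0
  # -w matches NAME as a whole word, so "stats" does not match "islstats".
  while storm list 2>/dev/null | grep -qw -- "$name"; do
    if [ "$waited" -ge "$max_secs" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example (not executed here):
#   storm kill cache -w 30 && wait_until_gone cache 60
```

Polling the listing instead of sleeping for a fixed time avoids redeploying while the old workers are still being torn down, at the cost of depending on the text format of `storm list`.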

List of topologies: cache, flow, islstats, opentsdb, stats, wfm


Possible issues:

  • State data loss in bolts (subclasses of BaseStatefulBolt) when a topology is restarted.
  • Offset loss for topology spouts, which causes already processed tuples to be executed again.
 From org.openkilda.wfm.topology.AbstractTopology:
    protected org.apache.storm.kafka.KafkaSpout createKafkaSpout(String topic, String spoutId) {
        // ZooKeeper root for this spout, derived from the topology name and the topic.
        String zkRoot = String.format("/%s/%s", getTopologyName(), topic);
        ZkHosts hosts = new ZkHosts(config.getZookeeperHosts());
        SpoutConfig cfg = new SpoutConfig(hosts, topic, zkRoot, spoutId);
        // Start offset setting: the earliest offset available in Kafka.
        cfg.startOffsetTime = OffsetRequest.EarliestTime();
        cfg.scheme = new SchemeAsMultiScheme(new StringScheme());
        // 4 MiB buffer and fetch sizes.
        cfg.bufferSizeBytes = 1024 * 1024 * 4;
        cfg.fetchSizeBytes = 1024 * 1024 * 4;
        return new org.apache.storm.kafka.KafkaSpout(cfg);
    }
  • Bolts execute tuples before their internal state has been restored.
  • State restoration is implemented for the Cache bolt only; other bolts have no such functionality.
  • The size of the dump from TE can potentially be more than Kafka can handle [needs checking].
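
To make the offset concern more concrete: createKafkaSpout builds the ZooKeeper root from the topology name and the topic, so the root only stays the same across a redeploy if the topology is submitted under the same name. The sketch below mirrors that String.format call in shell. The function name `zk_root` and the topic `some.topic` are placeholders. It is also an assumption here that a spout which finds nothing stored under its root falls back to startOffsetTime (EarliestTime in the code above) and replays the topic.

```shell
# Sketch only: mirror String.format("/%s/%s", getTopologyName(), topic).
# Usage: zk_root TOPOLOGY_NAME TOPIC
zk_root() {
  printf '/%s/%s\n' "$1" "$2"
}

# "some.topic" is a placeholder, not a real Kilda topic name.
# Same --name on redeploy gives the same root:
#   zk_root cache some.topic
# A different --name gives a different root, where no offsets are stored yet:
#   zk_root cache-v2 some.topic
```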

Pre-conditions for all topologies:

  • Healthy controller
  • Topology added and checked
$ cat simple-topology.json
{
 "controllers": [
  {
   "host": "kilda",
   "name": "floodlight",
   "port": 6653
  }
 ],
 "links": [
  {
   "node1": "00000001",
   "node2": "00000002"
  }
 ],
 "switches": [
  {
   "dpid": "deadbeef00000001",
   "name": "00000001"
  },
  {
   "dpid": "deadbeef00000002",
   "name": "00000002"
  }
 ]
}
$ http POST :38080/topology < simple-topology.json
  • Flow added and checked
$ NB_IP=$(docker inspect --format '{{ .NetworkSettings.Networks.openkilda_default.IPAddress }}' openkilda_northbound_1)
$ cat simple-flow.json
{
  "flowid": "flow-01",
  "source": {
    "switch-id": "de:ad:be:ef:00:00:00:01",
    "port-id": 1,
    "vlan-id": 100
  },
  "destination": {
    "switch-id": "de:ad:be:ef:00:00:00:02",
    "port-id": 1,
    "vlan-id": 100
  },
  "maximum-bandwidth": 10000,
  "description": "flow-01"
}
$ http --auth kilda:kilda PUT $NB_IP:8080/api/v1/flows < simple-flow.json
$ http --auth kilda:kilda $NB_IP:8080/api/v1/flows/status/flow-01
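
The topology file identifies switches by a bare dpid (deadbeef00000001), while the flow request uses the colon-separated form (de:ad:be:ef:00:00:00:01). A small helper along the following lines converts one into the other so the two files stay consistent. The function name `dpid_to_switch_id` is an assumption for illustration.

```shell
# Sketch only: turn a hex dpid into the colon-separated switch-id form
# by inserting ":" after every two characters and dropping the trailing ":".
# Usage: dpid_to_switch_id deadbeef00000001
dpid_to_switch_id() {
  printf '%s\n' "$1" | sed -e 's/../&:/g' -e 's/:$//'
}
```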

Cache topology test scenario:

  • Dump cache state
$ ./probe dump-state "cache/cache"
+----------+-----------+---------+
| Topology | Component | Task ID |
+----------+-----------+---------+
|  cache   |   cache   |    3    |
+----------+-----------+---------+
+----------
|  Flows
+---------------------+------------------------------------------------------------------+------------------------------------------------------------------+
|       Property      |                             Forward                              |                             Reverse                              |
+---------------------+------------------------------------------------------------------+------------------------------------------------------------------+
|        flowid       |                             flow-01                              |                             flow-01                              |
|      bandwidth      |                              10000                               |                              10000                               |
|        cookie       |                       4611686018427387905                        |                       2305843009213693953                        |
|     description     |                             flow-01                              |                             flow-01                              |
|     last_updated    |                     2018-01-11T07:05:28.276Z                     |                     2018-01-11T07:05:28.276Z                     |
|      src_switch     |                     de:ad:be:ef:00:00:00:01                      |                     de:ad:be:ef:00:00:00:02                      |
|      dst_switch     |                     de:ad:be:ef:00:00:00:02                      |                     de:ad:be:ef:00:00:00:01                      |
|       src_port      |                                1                                 |                                1                                 |
|       dst_port      |                                1                                 |                                1                                 |
|       src_vlan      |                               100                                |                               100                                |
|       dst_vlan      |                               100                                |                               100                                |
|       meter_id      |                                1                                 |                                1                                 |
|     transit_vlan    |                                2                                 |                                3                                 |
| flowpath:latency_ns |                                1                                 |                                1                                 |
|        state        |                                UP                                |                                UP                                |
|                     |          switch_id          port_no   segment_latency   seq_id   |          switch_id          port_no   segment_latency   seq_id   |
|         path        | ---------------------------------------------------------------- | ---------------------------------------------------------------- |
|                     |   de:ad:be:ef:00:00:00:01      1             1            0      |   de:ad:be:ef:00:00:00:02      1             1            0      |
|                     |   de:ad:be:ef:00:00:00:02      1            None          1      |   de:ad:be:ef:00:00:00:01      1            None          1      |
+---------------------+------------------------------------------------------------------+------------------------------------------------------------------+
  • Kill cache topology
$ storm kill cache
  • Wait for the 30 sec timeout
$ sleep 30
  • Health check - FAILED
$ http --auth kilda:kilda $NB_IP:8080/api/v1/health-check
HTTP/1.1 504
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Content-Type: application/json;charset=UTF-8
Date: Thu, 11 Jan 2018 16:36:40 GMT
Expires: 0
Pragma: no-cache
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block

{
    "components": {
        "cache-storm-topology": "non-operational",
        "event-wfm-storm-topology": "operational",
        "flow-storm-topology": "operational",
        "statistics-storm-topology": "operational"
    },
    "description": "Northbound API service provides API for Kilda controller",
    "name": "Northbound",
    "version": "1.0-SNAPSHOT"
}
  • Create new flow from NB - FAILED (the request returns HTTP 200, but the flow status is ALLOCATED instead of UP)
$ http --auth kilda:kilda PUT $NB_IP:8080/api/v1/flows < simple-flow-2.json
HTTP/1.1 200
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Content-Type: application/json;charset=UTF-8
Date: Thu, 11 Jan 2018 16:38:43 GMT
Expires: 0
Pragma: no-cache
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block

{
    "description": "flow-02",
    "destination": {
        "port-id": 1,
        "switch-id": "de:ad:be:ef:00:00:00:02",
        "vlan-id": 101
    },
    "flowid": "flow-02",
    "last-updated": "2018-01-11T16:38:43.783Z",
    "maximum-bandwidth": 10000,
    "source": {
        "port-id": 1,
        "switch-id": "de:ad:be:ef:00:00:00:01",
        "vlan-id": 101
    }
}
$ http --auth kilda:kilda $NB_IP:8080/api/v1/flows/status/flow-02
HTTP/1.1 200
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Content-Type: application/json;charset=UTF-8
Date: Thu, 11 Jan 2018 16:39:59 GMT
Expires: 0
Pragma: no-cache
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block

{
    "flowid": "flow-02",
    "status": "ALLOCATED"
}

  • Get all flows - OK
  • Load topology
$ cd services/wfm
$ storm jar target/WorkflowManager-1.0-SNAPSHOT-jar-with-dependencies.jar \
org.openkilda.wfm.topology.cache.CacheTopology \
--name=cache src/main/resources/topology.properties
  • Check flow: expected state ALLOCATED, current state UP
$ http --auth kilda:kilda $NB_IP:8080/api/v1/flows/status/flow-02

HTTP/1.1 200
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Content-Type: application/json;charset=UTF-8
Date: Fri, 12 Jan 2018 17:14:26 GMT
Expires: 0
Pragma: no-cache
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block

{
    "flowid": "flow-02",
    "status": "UP"
}
  • Dump cache state and verify: both flows are UP.
$ ./probe dump-state "cache/cache"
+----------+-----------+---------+
| Topology | Component | Task ID |
+----------+-----------+---------+
|  cache   |   cache   |    3    |
+----------+-----------+---------+
+----------
|  Flows
+---------------------+------------------------------------------------------------------+------------------------------------------------------------------+
|       Property      |                             Forward                              |                             Reverse                              |
+---------------------+------------------------------------------------------------------+------------------------------------------------------------------+
|        flowid       |                             flow-01                              |                             flow-01                              |
|      bandwidth      |                              10000                               |                              10000                               |
|        cookie       |                       4611686018427387906                        |                       2305843009213693954                        |
|     description     |                             flow-01                              |                             flow-01                              |
|     last_updated    |                     2018-01-12T17:10:10.889Z                     |                     2018-01-12T17:10:10.889Z                     |
|      src_switch     |                     de:ad:be:ef:00:00:00:01                      |                     de:ad:be:ef:00:00:00:02                      |
|      dst_switch     |                     de:ad:be:ef:00:00:00:02                      |                     de:ad:be:ef:00:00:00:01                      |
|       src_port      |                                1                                 |                                1                                 |
|       dst_port      |                                1                                 |                                1                                 |
|       src_vlan      |                               100                                |                               100                                |
|       dst_vlan      |                               100                                |                               100                                |
|       meter_id      |                                2                                 |                                2                                 |
|     transit_vlan    |                                4                                 |                                5                                 |
| flowpath:latency_ns |                                1                                 |                                1                                 |
|        state        |                                UP                                |                                UP                                |
|                     |          switch_id          port_no   segment_latency   seq_id   |          switch_id          port_no   segment_latency   seq_id   |
|         path        | ---------------------------------------------------------------- | ---------------------------------------------------------------- |
|                     |   de:ad:be:ef:00:00:00:01      1             1            0      |   de:ad:be:ef:00:00:00:02      1             1            0      |
|                     |   de:ad:be:ef:00:00:00:02      1            None          1      |   de:ad:be:ef:00:00:00:01      1            None          1      |
+---------------------+------------------------------------------------------------------+------------------------------------------------------------------+
+---------------------+------------------------------------------------------------------+------------------------------------------------------------------+
|       Property      |                             Forward                              |                             Reverse                              |
+---------------------+------------------------------------------------------------------+------------------------------------------------------------------+
|        flowid       |                             flow-02                              |                             flow-02                              |
|      bandwidth      |                              10000                               |                              10000                               |
|        cookie       |                       4611686018427387907                        |                       2305843009213693955                        |
|     description     |                             flow-02                              |                             flow-02                              |
|     last_updated    |                     2018-01-12T17:10:10.844Z                     |                     2018-01-12T17:10:10.844Z                     |
|      src_switch     |                     de:ad:be:ef:00:00:00:01                      |                     de:ad:be:ef:00:00:00:02                      |
|      dst_switch     |                     de:ad:be:ef:00:00:00:02                      |                     de:ad:be:ef:00:00:00:01                      |
|       src_port      |                                1                                 |                                1                                 |
|       dst_port      |                                1                                 |                                1                                 |
|       src_vlan      |                               101                                |                               101                                |
|       dst_vlan      |                               101                                |                               101                                |
|       meter_id      |                                3                                 |                                3                                 |
|     transit_vlan    |                                6                                 |                                7                                 |
| flowpath:latency_ns |                                1                                 |                                1                                 |
|        state        |                                UP                                |                                UP                                |
|                     |          switch_id          port_no   segment_latency   seq_id   |          switch_id          port_no   segment_latency   seq_id   |
|         path        | ---------------------------------------------------------------- | ---------------------------------------------------------------- |
|                     |   de:ad:be:ef:00:00:00:01      1             1            0      |   de:ad:be:ef:00:00:00:02      1             1            0      |
|                     |   de:ad:be:ef:00:00:00:02      1            None          1      |   de:ad:be:ef:00:00:00:01      1            None          1      |
+---------------------+------------------------------------------------------------------+------------------------------------------------------------------+
  • Verify create/delete flow - OK
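
The health-check step in the scenario can be checked mechanically rather than by eye. The following is a rough sketch that reads the response body on stdin and prints every component whose status is not "operational". It uses only tr and sed, assumes the body has the shape shown in the scenario (a flat "components" object with string values), and the function name `non_operational` is hypothetical.

```shell
# Sketch only: list health-check components that are not "operational".
# Reads a JSON body on stdin; works on pretty-printed or single-line input,
# but only for a flat "components" object like the one shown in the scenario.
non_operational() {
  tr -d '\n' \
    | sed -E 's/.*"components"[[:space:]]*:[[:space:]]*\{([^}]*)\}.*/\1/' \
    | tr ',' '\n' \
    | sed -nE '/:[[:space:]]*"operational"/!s/^[[:space:]]*"([^"]+)".*/\1/p'
}

# Example (not executed here):
#   http --auth kilda:kilda --body $NB_IP:8080/api/v1/health-check | non_operational
```

With the response captured while the cache topology was down, this would be expected to print cache-storm-topology; empty output means every listed component reports operational.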