balboa-backend-accumulo

This is a backend for balboa that uses Apache Accumulo as a storage and query engine. It is quite basic in its feature set and should be considered a starting point or building block in a more refined setup, most likely involving multiple input consumer frontends feeding into multiple backend instances, all connecting to one Accumulo cluster.

Requirements

Building

A self-contained jar can be built, in the source directory, like this:

$ mvn package

This should leave a balboa-backend-accumulo-<VERSION>-jar-with-dependencies.jar in the target/ subdirectory. Dependencies will be fetched automatically from Maven Central.

Configuration

The jar takes a -c command line parameter specifying the path to a properties file, which must contain at least the Accumulo client properties needed to connect to the cluster. For example, a simple development setup using Uno could be accessed with something along the lines of:

instance.name=uno
instance.zookeepers=uno
auth.type=password
auth.principal=satta
auth.token=satta

balboa.port=4242

The balboa.port property defines the local port on which the backend listens for msgpack TCP connections from frontends.
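As a rough illustration of what such a file provides, the following sketch reads it with the JDK's `java.util.Properties` and extracts the listening port. The class `BackendConfig` and its methods are hypothetical and not taken from the backend's source; treating a missing `balboa.port` as an error is likewise an assumption of this sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the backend: parses a properties file
// like the one above and validates the balboa.port entry.
public class BackendConfig {

    // Load key=value pairs from any Reader (a FileReader in real use).
    static Properties load(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Assumption: the port is mandatory and must be a valid TCP port.
    static int port(Properties props) {
        String raw = props.getProperty("balboa.port");
        if (raw == null) {
            throw new IllegalArgumentException("balboa.port is not set");
        }
        int port = Integer.parseInt(raw.trim());
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("balboa.port out of range: " + port);
        }
        return port;
    }

    public static void main(String[] args) {
        String sample = "instance.name=uno\n"
                + "instance.zookeepers=uno\n"
                + "auth.type=password\n"
                + "balboa.port=4242\n";
        Properties props = load(new StringReader(sample));
        System.out.println("instance: " + props.getProperty("instance.name"));
        System.out.println("port: " + port(props));
    }
}
```

The remaining entries are plain Accumulo client properties, so the same `Properties` object could in principle be handed to the Accumulo client unchanged.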

Accumulo setup

The observation data are stored in three tables, optimized for rrname, rdata and reverse rrname look-ups (the latter used for suffix queries). We store observations redundantly to reduce the number of indirections.

Please make sure these tables exist and are readable and writable by the user specified in the connection details.

Table balboa_by_rrname

| Row ID | Column Family | Column Qualifier | Visibility | Value |
|---|---|---|---|---|
| rrname-sensorid-rdata-rrtype | count | count | public | LONG VARLEN |
| rrname-sensorid-rdata-rrtype | seen | first | public | LONG VARLEN |
| rrname-sensorid-rdata-rrtype | seen | last | public | LONG VARLEN |

We use various combiners to aggregate identical observations:

setiter -class org.apache.accumulo.core.iterators.user.MaxCombiner -p 11 -t balboa_by_rrname -all      # on seen:last
setiter -class org.apache.accumulo.core.iterators.user.MinCombiner -p 13 -t balboa_by_rrname -all      # on seen:first
setiter -class org.apache.accumulo.core.iterators.user.SummingCombiner -p 12 -t balboa_by_rrname -all  # on count:count
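The net effect of these three combiners on repeated observations of the same row can be illustrated in plain Java, independent of Accumulo. This is only a sketch of the aggregation semantics (sum for count:count, minimum for seen:first, maximum for seen:last); the `ObservationMerge` class and its `Observation` type are hypothetical and do not appear in the backend.

```java
// Hypothetical illustration of what the combiners compute per row.
// In Accumulo the aggregation happens inside the iterators, not in client code.
public class ObservationMerge {

    static final class Observation {
        final long count;      // count:count, aggregated by SummingCombiner
        final long firstSeen;  // seen:first, aggregated by MinCombiner
        final long lastSeen;   // seen:last, aggregated by MaxCombiner

        Observation(long count, long firstSeen, long lastSeen) {
            this.count = count;
            this.firstSeen = firstSeen;
            this.lastSeen = lastSeen;
        }
    }

    // Combine two observations of the same row the way the combiners would.
    static Observation merge(Observation a, Observation b) {
        return new Observation(
                a.count + b.count,
                Math.min(a.firstSeen, b.firstSeen),
                Math.max(a.lastSeen, b.lastSeen));
    }

    public static void main(String[] args) {
        Observation earlier = new Observation(2, 1598303000L, 1598303500L);
        Observation later = new Observation(1, 1598303837L, 1598303897L);
        Observation merged = merge(earlier, later);
        System.out.println(merged.count + " " + merged.firstSeen + " " + merged.lastSeen);
    }
}
```

Because all three operations are commutative and associative, the result does not depend on the order in which observations arrive.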

The same combiners need to be set on the following two tables as well:

Table balboa_by_rdata

| Row ID | Column Family | Column Qualifier | Visibility | Value |
|---|---|---|---|---|
| rdata-sensorid-rrname-rrtype | count | count | public | LONG VARLEN |
| rdata-sensorid-rrname-rrtype | seen | first | public | LONG VARLEN |
| rdata-sensorid-rrname-rrtype | seen | last | public | LONG VARLEN |

Table balboa_by_rrname_rev

| Row ID | Column Family | Column Qualifier | Visibility | Value |
|---|---|---|---|---|
| rev(rrname)-sensorid-rdata-rrtype | count | count | public | LONG VARLEN |
| rev(rrname)-sensorid-rdata-rrtype | seen | first | public | LONG VARLEN |
| rev(rrname)-sensorid-rdata-rrtype | seen | last | public | LONG VARLEN |
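The three row ID layouts, and the reason a reversed rrname helps with suffix queries, can be sketched as follows. Several details are assumptions made for illustration only: the field separator (`SEP`), the reading of rev() as a character-wise string reversal, and all class and method names. The backend's actual encoding may differ.

```java
// Hypothetical sketch of the row ID layouts described in the tables above.
public class RowKeys {

    // Assumption: some separator that cannot occur in a hostname.
    static final String SEP = "\u001f";

    // Assumption: rev() reverses the name character by character.
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    static String byRrname(String rrname, String sensorId, String rdata, String rrtype) {
        return String.join(SEP, rrname, sensorId, rdata, rrtype);
    }

    static String byRdata(String rrname, String sensorId, String rdata, String rrtype) {
        return String.join(SEP, rdata, sensorId, rrname, rrtype);
    }

    static String byRrnameRev(String rrname, String sensorId, String rdata, String rrtype) {
        return String.join(SEP, reverse(rrname), sensorId, rdata, rrtype);
    }

    // A suffix pattern such as "%.com.de" turns into a row prefix on the
    // reversed table, which a sorted store can answer with a range scan.
    static String suffixToScanPrefix(String pattern) {
        if (!pattern.startsWith("%")) {
            throw new IllegalArgumentException("expected a leading % wildcard");
        }
        return reverse(pattern.substring(1));
    }

    public static void main(String[] args) {
        String row = byRrnameRev("www.jabra.com.de", "foo", "152.199.21.175", "A");
        String prefix = suffixToScanPrefix("%.com.de");
        System.out.println(row.startsWith(prefix));
    }
}
```

Since Accumulo keeps rows sorted, turning the suffix into a prefix is what lets the look-up avoid a full table scan.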

Example run

This example run uses balboa's balboa-backend-console to directly talk to the backend rather than having to go through the GraphQL frontend.

rrname full query:

$ balboa-backend-console query -h 127.0.0.1 -p 4242 -r dns.google | head -n 1 | jq
{
  "rrname": "dns.google",
  "rrtype": "A",
  "sensor_id": "foo",
  "rdata": "8.8.4.4",
  "count": 1,
  "first_seen": 1598303837,
  "last_seen": 1598303897
}

rrname suffix query:

$ balboa-backend-console query -h 127.0.0.1 -p 4242 -r %.com.de | head -n 1 | jq
{
  "rrname": "www.jabra.com.de",
  "rrtype": "A",
  "sensor_id": "foo",
  "rdata": "152.199.21.175",
  "count": 1,
  "first_seen": 1603348710,
  "last_seen": 1603348770
}

rdata query:

$ balboa-backend-console query -h 127.0.0.1 -p 4242 -d 9.9.9.10 | jq
{
  "rrname": "dns10.quad9.net",
  "rrtype": "A",
  "sensor_id": "foo",
  "rdata": "9.9.9.10",
  "count": 1,
  "first_seen": 1603892361,
  "last_seen": 1603892421
}

PoC limitations

  • Hard-coded table names and public visibility
  • Wildcard support limited to rrname queries
  • For rrname queries, additional rdata and sensorid constraints will be matched anywhere in the row
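To make the last limitation concrete, the sketch below assumes that "matched anywhere in the row" means a plain substring test against the row ID. Under that assumption, a constraint can hit an unrelated row whose field merely contains the constraint as a substring. The class, the separator and the stricter per-field comparison are hypothetical and shown for contrast only; they are not the backend's behaviour.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of substring matching versus per-field matching.
public class RowConstraintDemo {

    // Assumption: the same illustrative separator as in other sketches.
    static final String SEP = "\u001f";

    // Assumed reading of the limitation: constraint found anywhere in the row.
    static boolean matchesAnywhere(String rowId, String constraint) {
        return rowId.contains(constraint);
    }

    // Stricter alternative for comparison: exact match on one field.
    static boolean matchesField(String rowId, int fieldIndex, String constraint) {
        String[] fields = rowId.split(Pattern.quote(SEP), -1);
        return fieldIndex < fields.length && fields[fieldIndex].equals(constraint);
    }

    public static void main(String[] args) {
        // Row for rdata 18.8.4.4, queried with the rdata constraint 8.8.4.4.
        String row = String.join(SEP, "dns.google", "foo", "18.8.4.4", "A");
        System.out.println(matchesAnywhere(row, "8.8.4.4"));
        System.out.println(matchesField(row, 2, "8.8.4.4"));
    }
}
```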