co-author: Marina Angelovska
Using a socket-based network traffic simulator written in Java, we developed a simple Spark Streaming application that processes and monitors TCP-based tuples, each consisting of a port and an IP address. The Spark Streaming function countByValueAndWindow is used to count the occurrences of each tuple within a given time window, and only the tuples whose count is above a given threshold are kept.
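To make the windowed counting concrete, here is a minimal plain-Python sketch of the idea, with no Spark involved. The function name `frequent_in_window`, the `"ip:port"` string format and the sample events are all hypothetical, and treating the threshold as "at least" (>=) is an assumption based on the name of the min packets argument.

```python
from collections import Counter


def frequent_in_window(events, now, window, min_packets):
    """Count tuples with a timestamp in (now - window, now] and keep
    those seen at least min_packets times (>= is an assumption)."""
    recent = [key for ts, key in events if now - window < ts <= now]
    counts = Counter(recent)
    return {key: n for key, n in counts.items() if n >= min_packets}


# Hypothetical traffic: (timestamp in seconds, "ip:port")
events = [
    (1, "10.0.0.1:80"), (2, "10.0.0.1:80"), (3, "10.0.0.2:22"),
    (12, "10.0.0.1:80"), (25, "10.0.0.1:80"), (29, "10.0.0.1:80"),
    (31, "10.0.0.2:22"), (40, "10.0.0.2:22"),
]

print(frequent_in_window(events, now=30, window=30, min_packets=5))
# {'10.0.0.1:80': 5}
print(frequent_in_window(events, now=40, window=30, min_packets=5))
# {}
```

In the second call the window has moved forward by 10 seconds, so the three earliest packets drop out and no tuple reaches the threshold any more.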
First run the Java helper to simulate the network traffic, then run the PySpark script to monitor and count the tuples. The script is launched with the following command:
spark-submit frequent_ips.py <host> <port> <min_packets> <window>
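The four positional arguments could be consumed roughly as follows. This is only a sketch of how such a script might be wired, not the original code. The names `parse_args` and `run`, the one-second batch interval, the one-second slide, the `checkpoint` directory name and the assumption that the simulator sends one tuple per line are all illustrative.

```python
def parse_args(argv):
    """Convert [host, port, min_packets, window] strings to typed values."""
    if len(argv) != 4:
        raise ValueError("expected: <host> <port> <min_packets> <window>")
    host, port, min_packets, window = argv
    return host, int(port), int(min_packets), int(window)


def run(host, port, min_packets, window, batch_seconds=1):
    # Imported here so parse_args can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="FrequentIps")
    ssc = StreamingContext(sc, batch_seconds)
    # countByValueAndWindow updates its counts incrementally,
    # which requires a checkpoint directory.
    ssc.checkpoint("checkpoint")

    # Assumption: each line received on the socket is one tuple.
    lines = ssc.socketTextStream(host, port)
    # Window length and slide interval are given in seconds.
    counts = lines.countByValueAndWindow(window, batch_seconds)
    # Keep tuples seen at least min_packets times (>= is an assumption).
    counts.filter(lambda kv: kv[1] >= min_packets).pprint()

    ssc.start()
    ssc.awaitTermination()
```

Calling `run(*parse_args(sys.argv[1:]))` at the bottom of the script (after `import sys`) would connect it to the command line. The window length must be a multiple of the batch interval, which holds for any whole number of seconds when the batch interval is one second.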
For example, with the command
spark-submit frequent_ips.py localhost 9999 5 30
the script listens on port 9999 of localhost, sets min_packets to 5, and counts over a window of 30 seconds. The following pictures show two subsequent time windows: