Count unique IPv4 addresses in the provided file, according to the task
The app is provided as an executable jar with the main class IpCounter.
mvn package
java -jar target/ip-counter.jar target/test-classes/ips15.in
The main goal of this implementation is to make the most of the available resources:
- Avoid `String` creation by reading the file byte by byte and composing the integer value that corresponds to each IP.
- Provide a special data structure to store a set of `int` values efficiently. It requires $2^{32}$ bits to store all possible IPs, which is 512 MiB of heap memory.
- Support concurrent access to the data structure using `AtomicIntegerArray`.
- Concurrent processing of the file: the file is split into independent parts that are processed in parallel where possible. The approach is implemented using the reactive `Flow` API.
- Use `MappedByteBuffer` for faster reading of file parts.
- Minimize arithmetic operations during scans (caching the result of appending a read digit, `res = res*10 + digit`), and replace arithmetic operations with bitwise ones where possible.
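The first three points can be illustrated with a minimal sketch. It is not the project's actual code: the names `IpSetSketch`, `AtomicBitSet`, and `countUnique` are hypothetical, and the parser assumes well-formed, newline-separated dotted-quad input. Each address is folded into an `int` directly from the bytes, and a bit in an `AtomicIntegerArray` records whether it has been seen.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class IpSetSketch {

    /** Lock-free bit set: one bit per possible value, packed into 32-bit words. */
    static final class AtomicBitSet {
        private final AtomicIntegerArray words;

        /** capacityBits = 1L << 32 covers all of IPv4 (2^27 words, 512 MiB). */
        AtomicBitSet(long capacityBits) {
            words = new AtomicIntegerArray((int) (capacityBits >>> 5));
        }

        /** Sets the bit for value; returns true only for the caller that set it first. */
        boolean add(int value) {
            int index = value >>> 5;       // unsigned value / 32
            int mask = 1 << (value & 31);  // bit for value % 32
            int previous = words.getAndUpdate(index, w -> w | mask);
            return (previous & mask) == 0;
        }
    }

    /** Counts distinct addresses without creating any String per line. */
    static long countUnique(byte[] data, AtomicBitSet seen) {
        long unique = 0;
        int ip = 0;
        int octet = 0;
        boolean hasDigits = false;
        for (byte b : data) {
            if (b >= '0' && b <= '9') {
                octet = octet * 10 + (b - '0');
                hasDigits = true;
            } else if (b == '.') {
                ip = (ip << 8) | octet;    // shift in the finished octet
                octet = 0;
            } else if (b == '\n') {
                if (hasDigits && seen.add((ip << 8) | octet)) {
                    unique++;
                }
                ip = 0;
                octet = 0;
                hasDigits = false;
            }
            // Any other byte (for example '\r') is ignored.
        }
        // Handle a final line that has no trailing newline.
        if (hasDigits && seen.add((ip << 8) | octet)) {
            unique++;
        }
        return unique;
    }

    public static void main(String[] args) {
        // Small capacity keeps the demo light; addresses must then be below 0.1.0.0.
        AtomicBitSet seen = new AtomicBitSet(1L << 16);
        byte[] data = "0.0.0.1\n0.0.1.0\n0.0.0.1\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countUnique(data, seen)); // prints 2
    }
}
```

For the full IPv4 range the set would be created with `new AtomicBitSet(1L << 32)`, which needs a heap comfortably larger than 512 MiB. Because `add` reports whether the bit was newly set, several threads can share one set and sum their own counters at the end.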
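Splitting the input into independent parts could look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's implementation: `ChunkSplitSketch`, `splitAtNewlines`, and `countLines` are hypothetical names, a parallel stream stands in for the reactive `Flow` pipeline, and line counting stands in for the real per-part work. The key idea is that every cut is moved forward to the next line break so that no address straddles two parts.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkSplitSketch {

    /**
     * Splits data into slices of roughly targetSize bytes (targetSize must be > 0),
     * each ending right after a '\n' or at the end of the buffer.
     */
    static List<ByteBuffer> splitAtNewlines(ByteBuffer data, int targetSize) {
        List<ByteBuffer> parts = new ArrayList<>();
        int size = data.limit();
        int start = 0;
        while (start < size) {
            int end = (int) Math.min((long) start + targetSize, size);
            // Move the tentative cut forward until the byte before it is a newline.
            while (end < size && data.get(end - 1) != '\n') {
                end++;
            }
            ByteBuffer part = data.duplicate(); // shares content, independent position/limit
            part.position(start);
            part.limit(end);
            parts.add(part.slice());
            start = end;
        }
        return parts;
    }

    /** Stand-in for per-part processing: counts newline bytes with absolute reads. */
    static long countLines(ByteBuffer part) {
        long lines = 0;
        for (int i = 0; i < part.limit(); i++) {
            if (part.get(i) == '\n') {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] bytes = "1.1.1.1\n2.2.2.2\n3.3.3.3\n".getBytes(StandardCharsets.US_ASCII);
        List<ByteBuffer> parts = splitAtNewlines(ByteBuffer.wrap(bytes), 10);
        // Parts do not overlap, so they can be processed in parallel.
        long lines = parts.parallelStream().mapToLong(ChunkSplitSketch::countLines).sum();
        System.out.println(parts.size() + " parts, " + lines + " lines"); // prints "2 parts, 3 lines"
    }
}
```

The same method accepts a `MappedByteBuffer`, since it is a `ByteBuffer`; such a buffer comes from `FileChannel.map(FileChannel.MapMode.READ_ONLY, position, size)`. A single mapping is limited to `Integer.MAX_VALUE` bytes, so a file of this size would need several mappings, each cut on a line boundary in the same way.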
Measurements were performed on a low-to-mid-range laptop with the following characteristics:
| Component | Value |
|---|---|
| Processor | AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx 2.30 GHz |
| RAM | 16.0 GB (14.9 GB usable) |
| SSD | NVMe____WDC_PC_SN530_SDB7 |
| OS | Windows 11 (ver. 22H2) |
| JDK | graalvm-ce-java17@22.3.1 |
Processing the provided 120 GB file takes about 2 minutes on average.
PS C:\Users\unrea> Measure-Command { java -jar C:\Users\unrea\IdeaProjects\ip-counter\target\ip-counter.jar D:\ip_addresses\ip_addresses | Out-Host }
1000000000
Days : 0
Hours : 0
Minutes : 2
Seconds : 11
Milliseconds : 99
Ticks : 1310990407
TotalDays : 0.00151735000810185
TotalHours : 0.0364164001944444
TotalMinutes : 2.18498401166667
TotalSeconds : 131.0990407
TotalMilliseconds : 131099.0407