Unsupervised Compressive Learning Sketch (UCL-sketch)
_{Official PyTorch Implementation}

Fig. 1: Learning-based Sketches for Frequency Estimation in Data Streams without Ground Truth (from Chat-GPT).

Learning-based Sketches for Frequency Estimation in Data Streams without Ground Truth
Authors: Xinyu Yuan, Yan Qiao*, Meng Li et al.
Paper: http://arxiv.org/abs/2412.03611

About

This work addresses the challenge of frequency estimation in unending data streams, which is vital across domains such as networking, social media, and finance. Traditional sketch-based approaches, using compact counters and hash functions, are computationally efficient but often sacrifice accuracy. Recent efforts to enhance sketches with deep learning have shown promise in improving performance but face limitations: reliance on labeled data, frequent retraining requirements, and high time and space costs in streaming scenarios.

To overcomethese limitations, the study introduces UCL-sketch (Unsupervised Compressive Learning Sketch), a novel framework combining the strengths of equation-based and learned sketches. Unlike prior approaches, UCL-sketch is ground-truth-free, relying on self-supervised strategy called equivalent learning using only sketch counters for online training. This enables real-time adaptation to distribution shifts in streaming data. The framework also incorporates logical buckets, allowing scalable and efficient handling of large-scale streams by splitting and learning multiple mappings with shared parameters.

Fig. 2: Overview of Our Stream Data Sketching Framework.

This repository contains:

🪐 A simple PyTorch implementation of UCL-sketch.
⚡️ Pre-processed 13-byte long 5-tuple network packet data slices.
💥 A self-contained jupyter notebook for running and evaluating all sketching algorithms: CM-sketch, C-sketch, Ideally Learned CM-sketch, Ideally Learned C-sketch, Univmon, Elastic Sketch, NitroSketch, SeqSketch and our UCL-sketch for sure.
🛸 Other useful functions and documents, such as metrics like Weighted Mean Relative Difference (WMRD).

Setup

First, download and set up the repo:

git clone https://github.com/Y-debug-sys/UCL-sketch.git
cd UCL-sketch

We provide an environment.yml file that can be used to create a Conda environment.

conda env create -f environment.yml
conda activate UCL-sketch

Running

We provide a running script for UCL-sketch in main.py. This script can be used to train UCL-sketch on the provided IP traces, but it can be easily modified to support other streaming datasets: For both Kosarak and Retail, they can be downloaded from http://fmi.uantwerpen.be/data. After downloading, extract the .dat format file into data directory. Then the usage is given by:

python main.py --config_path ./configs/{your_config_name}.yaml --data_path ./data/{your_dataset_name}.dat --ckpt ./checkpoints --data network

Moreover, you can also conduct experiments on synthetic zipfian datasets by running the following code:

python main.py --config_path ./configs/{your_config_name}.yaml --skewness {your_skew_value} --ckpt ./checkpoints --data synthetic

Regarding comparisons with baselines and evaluation (AAE, ARE, WMRD etc.), see our Jupyter demo for details.

Acknowledgments

The implementation of baselines in this codebase mainly borrows from a C++ repo called BitSense. We thank the authors for their helpful open-source contributions.

Citation

If you use this codebase, or otherwise find our work valuable, please cite UCL-sketch:

@article{yuan2024learning,
  title={Learning-based Sketches for Frequency Estimation in Data Streams without Ground Truth},
  author={Yuan, Xinyu and Qiao, Yan and Li, Meng and Wei, Zhenchun and Feng, Cuiying},
  journal={arXiv preprint arXiv:2412.03611},
  year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
Sketching		Sketching
UCL_sketch		UCL_sketch
Utils		Utils
configs		configs
data		data
figs		figs
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
experiment_demo.ipynb		experiment_demo.ipynb
load_data.py		load_data.py
main.py		main.py
net_params.py		net_params.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Unsupervised Compressive Learning Sketch (UCL-sketch)
_{Official PyTorch Implementation}

About

Setup

Running

Acknowledgments

Citation

About

Releases

Packages

Languages

License

Y-debug-sys/UCL-sketch

Folders and files

Latest commit

History

Repository files navigation

Unsupervised Compressive Learning Sketch (UCL-sketch)Official PyTorch Implementation

About

Setup

Running

Acknowledgments

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Unsupervised Compressive Learning Sketch (UCL-sketch)
_{Official PyTorch Implementation}

Packages