High Average-Utility Itemset Sampling under Length Constraints

Cite: "Diop, L. (2022). High Average-Utility Itemset Sampling Under Length Constraints. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol 13281. Springer, Cham. https://doi.org/10.1007/978-3-031-05936-0_11"

High utility itemset extraction algorithms discover knowledge in databases whose items are weighted. Their usefulness has been widely demonstrated in many real-world applications. Traditional algorithms return the set of all patterns whose utility is above a minimum utility threshold, which is difficult to set, while top-k algorithms tend to lack diversity in the patterns they produce. We propose an algorithm named HAISAMPLER that samples itemsets so that each itemset is drawn with a probability proportional to its average utility in the database, under length constraints that avoid long and rare itemsets made of low-weighted items. The originality of our method stems from the fact that it combines length constraints with qualitative and quantitative utilities. Experiments show that HAISAMPLER extracts thousands of high average-utility patterns in a few seconds from different databases.
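To make the sampling target concrete, here is a minimal brute-force sketch in Python. It is not the HAISAMPLER algorithm and does not use the code in HAISampler.py: it enumerates every itemset within the length bounds, which is only feasible on toy data. It assumes the usual definition of average utility (the sum of quantity times item weight over the items of an itemset, divided by the itemset length, summed over the transactions that contain it). The toy database, the weights, and the function names `average_utilities` and `sample_itemsets` are all hypothetical.

```python
import random
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its quantity.
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"b": 2, "c": 1, "d": 1},
]
# Hypothetical item weights (the qualitative utility of each item).
weights = {"a": 5, "b": 2, "c": 1, "d": 4}


def average_utilities(transactions, weights, min_len, max_len):
    """Average utility of every itemset whose length is in [min_len, max_len].

    Assumes min_len >= 1. Enumerates all sub-itemsets of each transaction,
    so it is exponential in the transaction length.
    """
    au = {}
    for t in transactions:
        items = sorted(t)
        for k in range(min_len, min(max_len, len(items)) + 1):
            for itemset in combinations(items, k):
                utility = sum(t[i] * weights[i] for i in itemset)
                au[itemset] = au.get(itemset, 0.0) + utility / k
    return au


def sample_itemsets(au, n, seed=None):
    """Draw n itemsets with replacement, each with probability
    proportional to its average utility."""
    rng = random.Random(seed)
    itemsets = list(au)
    return rng.choices(itemsets, weights=[au[x] for x in itemsets], k=n)


au = average_utilities(transactions, weights, min_len=1, max_len=2)
draws = sample_itemsets(au, n=5, seed=0)
for itemset in draws:
    print(itemset, au[itemset])
```

With these toy values, the itemset `("a",)` has average utility 15 (10 from the first transaction plus 5 from the second) and `("a", "b")` has average utility 6 (12 divided by its length 2), so `("a",)` is 2.5 times as likely to be drawn. Setting `max_len=2` excludes the 3-itemset `("b", "c", "d")` from the sample space entirely.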

DatasetsHUI
XP_Output
XP_Stat
HAISampler.py
LICENSE
README.md
proof.pdf
