This is a long-term thing so I don't forget about it.
This is related to #4: the standard regex::Regex is fast, convenient, and feature-rich, and I think it makes a good default, but there's no denying that with the number of regexes you'd put in a regex-filtered set, it can get rather memory-intensive. So specific users may want to trade performance and / or convenience for lower memory use. Possibilities here are:
regex-lite: this is what #4 (Switch from regex crate to regex-lite) attempted. The memory savings are tremendous (for about the same features minus rich Unicode support), but performance is terrible unless the prefilter has extremely high discriminatory power. For more resource-constrained uses, or for users who are already on regex-lite (and don't mind the lower performance), it could be a nice option.
regex::bytes: the memory savings are much smaller than with regex-lite but can still be quite respectable. This trades away a lot of convenience, as you get bytes out.
lazy compilation of any of those, using std::sync::LazyLock (or once_cell::sync::Lazy for lower MSRV): this suits highly biased sets which have a very small number of "hot" regexes and a much larger set of regexes which are essentially never used. The engine would keep a much more compact String or even &str around until the regex is actually needed for post-filtering and matching. The trade-off is less consistent behaviour: memory will grow over time, and any match can take arbitrarily long if it triggers the compilation of several regexes. This is especially attractive where the regex set is static and embedded in the binary (so the source strings are "free").
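A minimal sketch of the lazy-compilation idea, using only the standard library. Everything named here (`Compiled`, `LazyPattern`, `PATTERNS`) is hypothetical and not part of regex-filtered; `Compiled` is a toy stand-in for whichever regex type would actually be stored. It uses std::sync::OnceLock rather than LazyLock so that the source string can sit next to the cell without a per-pattern closure:

```rust
use std::sync::OnceLock;

/// Toy stand-in for a compiled regex (hypothetical). A real version would
/// wrap regex::Regex, regex_lite::Regex, or regex::bytes::Regex instead.
struct Compiled {
    needle: String,
}

impl Compiled {
    /// Pretend "compilation": lowercase the pattern once, up front.
    fn new(source: &str) -> Compiled {
        Compiled { needle: source.to_lowercase() }
    }

    /// Case-insensitive substring test, standing in for a regex match.
    fn is_match(&self, haystack: &str) -> bool {
        haystack.to_lowercase().contains(&self.needle)
    }
}

/// Holds only the pattern source until the first match attempt.
struct LazyPattern {
    source: &'static str,
    compiled: OnceLock<Compiled>,
}

impl LazyPattern {
    const fn new(source: &'static str) -> Self {
        LazyPattern { source, compiled: OnceLock::new() }
    }

    /// True once the compiled form has been built.
    fn is_compiled(&self) -> bool {
        self.compiled.get().is_some()
    }

    /// Compiles on first use (the unpredictable-latency part), then matches.
    fn is_match(&self, haystack: &str) -> bool {
        self.compiled
            .get_or_init(|| Compiled::new(self.source))
            .is_match(haystack)
    }
}

/// A static set embedded in the binary: the source strings cost nothing
/// extra, and cold patterns never pay for compilation.
static PATTERNS: [LazyPattern; 3] = [
    LazyPattern::new("firefox"),
    LazyPattern::new("chrome"),
    LazyPattern::new("netscape"),
];
```

Only the patterns that the prefilter actually selects would ever be compiled, which is where the memory saving comes from; the cost is that the first match against a cold pattern also pays for its compilation.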
This would likely require a trait per crate:
regex-filtered needs to parameterize on: the "regex" object being stored (which may be lifetime-parameterized), how to construct it, and the interface to match it
ua-parser further needs the extracted data kind and a way to mix that with the replacement values
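As a rough illustration of what the regex-filtered side of such a trait could look like, here is a sketch under heavy assumptions: `Matcher`, `Literal`, and `PatternSet` are invented names, `Literal` is a toy substring matcher standing in for a real regex type, and the lifetime-parameterized case and the ua-parser data extraction are left out entirely:

```rust
use std::fmt::Debug;

/// Hypothetical abstraction over the "regex" object being stored.
trait Matcher: Sized {
    type Error: Debug;

    /// How to construct it from the pattern source.
    fn compile(source: &str) -> Result<Self, Self::Error>;

    /// The interface to match it.
    fn is_match(&self, haystack: &str) -> bool;
}

/// Toy implementation: plain substring search instead of a regex engine.
struct Literal(String);

impl Matcher for Literal {
    type Error = String;

    fn compile(source: &str) -> Result<Self, Self::Error> {
        if source.is_empty() {
            Err("empty pattern".to_string())
        } else {
            Ok(Literal(source.to_string()))
        }
    }

    fn is_match(&self, haystack: &str) -> bool {
        haystack.contains(&self.0)
    }
}

/// A set that is generic over whichever matcher the user picks.
struct PatternSet<M: Matcher> {
    matchers: Vec<M>,
}

impl<M: Matcher> PatternSet<M> {
    /// Compiles every source, stopping at the first error.
    fn new<'a>(sources: impl IntoIterator<Item = &'a str>) -> Result<Self, M::Error> {
        let matchers = sources
            .into_iter()
            .map(M::compile)
            .collect::<Result<Vec<_>, _>>()?;
        Ok(PatternSet { matchers })
    }

    /// Returns the indices of all patterns matching the haystack.
    fn matching(&self, haystack: &str) -> Vec<usize> {
        self.matchers
            .iter()
            .enumerate()
            .filter(|(_, m)| m.is_match(haystack))
            .map(|(i, _)| i)
            .collect()
    }
}
```

With something of this shape, the default could stay on regex::Regex while other backends (regex-lite, regex::bytes, or a lazy wrapper) plug in through their own implementations of the trait.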