Is there support for capturing groups? #8
Tried
My apologies for the panic you got with

sqlite> select * from regex_find_all('\w+', 'aaa-bb-c-dddd-eeeee') where rowid = 2;
┌───────┬─────┬───────┐
│ start │ end │ match │
├───────┼─────┼───────┤
│ 7     │ 8   │ c     │
└───────┴─────┴───────┘

But for capturing groups specifically, there isn't a great way. I want to add a new

Great. Any ETA on this by chance?

I'll give it a shot in the next week or so! I say "it's not a great way" mostly because it's awkward and not really readable. For single values like

with strings as (
select value
from json_each('[
"aaa-bb-c-dddd-eeeee",
"jjj-k-lll-mmmm-nnnn",
"only-two"
]')
)
select *
from strings
join regex_find_all(regex('\w+'), value) as parts
where parts.rowid = 2;
/*
┌─────────────────────┬───────┬─────┬───────┐
│ value               │ start │ end │ match │
├─────────────────────┼───────┼─────┼───────┤
│ aaa-bb-c-dddd-eeeee │ 7     │ 8   │ c     │
│ jjj-k-lll-mmmm-nnnn │ 6     │ 9   │ lll   │
└─────────────────────┴───────┴─────┴───────┘
*/

First issue: There's a

The second issue: For the

All this to say, when capture group support is added, you'd be able to do something like:

with strings as (
select value
from json_each('[
"aaa-bb-c-dddd-eeeee",
"jjj-k-lll-mmmm-nnnn",
"only-two"
]')
)
select
value,
regex_capture(
'\w+-\w+-(?P<third_word>[^-]+).*',
value,
'third_word'
) as third_word
from strings;

Which, in my opinion, is much cleaner and easier to reason about.

Re performance: Both should be equal, but one important note when using a regex table function like
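
As a rough cross-check of the pattern above outside SQLite, here is a minimal sketch using Python's standard `re` module, which also accepts the `(?P<name>...)` syntax. It only illustrates what the named group picks out of each sample string; it is not the sqlite-regex implementation, and the helper name `third_word` is made up for this example.

```python
import re

# Same pattern as in the query above, compiled with Python's re module.
THIRD_WORD = re.compile(r"\w+-\w+-(?P<third_word>[^-]+).*")


def third_word(value):
    """Return the text of the named group, or None when there is no match."""
    m = THIRD_WORD.search(value)
    return m.group("third_word") if m else None


strings = ["aaa-bb-c-dddd-eeeee", "jjj-k-lll-mmmm-nnnn", "only-two"]
results = {s: third_word(s) for s in strings}
print(results)
```

In this sketch a string with fewer than three dash-separated words, such as `only-two`, simply maps to `None`.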

Hey @greatvovan, I just pushed capture group support with the new

Using your ID example, here's how it would work:

create table items as
select value as code
from json_each('["ID123Y2023ABC", "ID456Y2022ABC", "ID789Y1984ABC"]')
;
select
items.code,
regex_capture(
regex('ID(?P<id>\d+)Y(?P<year>\d+)ABC'),
items.code,
'id'
) as id,
regex_capture(
regex('ID(?P<id>\d+)Y(?P<year>\d+)ABC'),
items.code,
'year'
) as year
from items;

There's also the

select
rowid,
regex_capture(captures, 'id') as id,
regex_capture(captures, 'year') as year
from regex_captures(
regex('ID(?P<id>\d+)Y(?P<year>\d+)ABC'),
'ID123Y2023ABC, ID456Y2022ABC, and ID789Y1984ABC'
);
/*
┌───────┬─────┬──────┐
│ rowid │ id  │ year │
├───────┼─────┼──────┤
│ 0     │ 123 │ 2023 │
│ 1     │ 456 │ 2022 │
│ 2     │ 789 │ 1984 │
└───────┴─────┴──────┘
*/

Due to limitations in the SQLite virtual table interface, we can't return custom column names with a table function, so we can't do something like
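
To make the row-per-match shape of that result concrete, the same idea can be sketched with Python's `re.finditer`: one tuple per match, numbered from 0 like the `rowid` column, with the named groups read off each match. This is an illustration only, assuming Python's regex semantics agree with the extension for this simple pattern.

```python
import re

# Pattern and input text taken from the example above.
pattern = re.compile(r"ID(?P<id>\d+)Y(?P<year>\d+)ABC")
text = "ID123Y2023ABC, ID456Y2022ABC, and ID789Y1984ABC"

# One (rowid, id, year) tuple per match, in order of appearance.
rows = [
    (rowid, m.group("id"), m.group("year"))
    for rowid, m in enumerate(pattern.finditer(text))
]

for row in rows:
    print(row)
```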
How can I extract the second group from
aa-bb-c-dddd-eeeee
defined as (\w+)-(\w+)-(\w+)-(\w+)-(\w+)
(nothing, if no match)?
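
For anyone who wants to try the requested behaviour without the extension, here is a minimal sketch that registers a stand-in SQL function through Python's built-in `sqlite3` module. The function name `py_capture` and its three-argument shape are hypothetical and chosen for this example; it is backed by Python's `re`, not by sqlite-regex. It returns the requested positional group, or NULL when the pattern does not match.

```python
import re
import sqlite3


def py_capture(pattern, text, group):
    """Hypothetical stand-in: capture group `group`, or None (SQL NULL) on no match."""
    m = re.search(pattern, text)
    return m.group(group) if m else None


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a 3-argument scalar function.
conn.create_function("py_capture", 3, py_capture)

PATTERN = r"(\w+)-(\w+)-(\w+)-(\w+)-(\w+)"
sql = "select py_capture(?, ?, 2)"

second = conn.execute(sql, (PATTERN, "aa-bb-c-dddd-eeeee")).fetchone()[0]
missing = conn.execute(sql, (PATTERN, "only-two")).fetchone()[0]
print(second, missing)
```

Passing the pattern and text as bound parameters keeps the backslashes in the regex out of the SQL string itself.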