Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FTS5 (full-text search) support #7

Open
Anutrix opened this issue Feb 6, 2024 · 2 comments
Open

FTS5 (full-text search) support #7

Anutrix opened this issue Feb 6, 2024 · 2 comments

Comments

@Anutrix
Copy link

Anutrix commented Feb 6, 2024

FTS5 is an SQLite virtual table module that provides full-text search functionality to database applications.

Is there a way to use it with sqlite-xsv? I couldn't find a way to use using csv with using fts5.

If not can some support be added?

I currently do:

.load ./xsv0
select xsv_version();
create virtual table temp.students using csv(filename="students.csv");
CREATE VIRTUAL TABLE temp.studentsfts5 USING fts5(column1, column2, column3, column4);
INSERT INTO temp.studentsfts5 SELECT * FROM temp.students;

which for my dataset(2million records, 22 columns) takes a 40 seconds while xsv only took 1 second.

Just trying to see if I can leverage this amazingly fast library.

@asg017
Copy link
Owner

asg017 commented Feb 6, 2024

That approach (seperate csv and fts5 virtual tables) is correct, there's no way to "combine" the two in a single table. Reading a 2 million row CSV in ~1 second with sqlite-xsv sounds about right. I think the bulk of that extra 40 seconds is coming from creating the FTS5 table.

One approach to confirm this: insert the CSV data into another table first.

.timer on
.load ./xsv0
create virtual table temp.csv_students using csv(filename="students.csv");

create table students as select * from temp.csv_students;

create virtual table fts_students using fts5(col1, col2, ...);

insert into fts_students select * from students;

The .timer on here is key - it'll show how long each step will take. My guess is that the create table students as ... should take ~1-3 seconds, since it's directly reading the CSV into a regular table with sqlite-xsv. My guess is that the last line, insert into fts_students, will take a long time (>30 seconds), because it's building the FTS5 index from scratch.

Let me know how it goes!

@Anutrix
Copy link
Author

Anutrix commented Feb 6, 2024

That approach (seperate csv and fts5 virtual tables) is correct, there's no way to "combine" the two in a single table. Reading a 2 million row CSV in ~1 second with sqlite-xsv sounds about right. I think the bulk of that extra 40 seconds is coming from creating the FTS5 table.

You are correct. I had confirmed it already that was the insert command that was taking time.
Sorry about not making it clear earlier.

I was trying to find if there was a fast alternative way to do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants