Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add StarRocks connector #24720

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Max-Cheng
Copy link

Description

Support Trino access Starrocks by connector

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Add Starrocks Connector

@cla-bot cla-bot bot added the cla-signed label Jan 16, 2025
@github-actions github-actions bot added the docs label Jan 16, 2025
@hackeryang hackeryang added the jdbc Relates to Trino JDBC driver label Jan 16, 2025
@ebyhr ebyhr removed the jdbc Relates to Trino JDBC driver label Jan 16, 2025
@ebyhr
Copy link
Member

ebyhr commented Jan 16, 2025

@hackeryang jdbc lable is for JDBC driver, not JDBC-based connectors.

@ebyhr
Copy link
Member

ebyhr commented Jan 16, 2025

The old PR #17330 looks better than this PR at a glance. I would recommend asking @chenjian2664 if the PR can be reopened and how to collaborate.

@Max-Cheng
Copy link
Author

Max-Cheng commented Jan 16, 2025

The old PR #17330 looks better than this PR at a glance. I would recommend asking @chenjian2664 if the PR can be reopened and how to collaborate.

@ebyhr In fact, these are two different implementations.

This PR using a JDBC client can first fetch the query plan from the responsible FE, then distribute the fetched query plan as a parameter to all relevant BEs, and finally fetch the data returned by the BEs.
image

PR #17330 using JDBC connector can only read data from a single FE at a time. Data reading is very slow.
image

You can take a look at https://docs.starrocks.io/docs/unloading/Flink_connector/

@Max-Cheng
Copy link
Author

Max-Cheng commented Jan 16, 2025

This is benchmark on Starrocks 3.2.9 Using JDBC connector and Starrocks connector.
Comparison test on TPC-H 100G scale dataset with 22 queries.

Test data

Table Rows
customer 15 million
lineitem 600 million
nation 25
orders 150 million
part 20 million
partsupp 80 million
region 5
supplier 1 million

Result

Query Trino-435 [SR connector->SR] Trino-435 【mysql connector->SR
Q1 24470 804226
Q2 13547 70248
Q3 17045 161474
Q4 18691 240217
Q5 22624 336967
Q6 661 3979
Q7 11909 153452
Q8 27098 440370
Q9 37751 505058
Q10 10074 87535
Q11 4340 83074
Q12 7140 115119
Q13 13516 129997
Q14 3613 14239
Q15 2433 13642
Q16 9974 45004
Q17 15635 473364
Q18 43138 210435
Q19 5076 21069
Q20 14661 70100
Q21 49115 355466
Q22 6306 11543
Total 359S 4347S

@Max-Cheng Max-Cheng requested a review from ebyhr January 16, 2025 07:10
@ebyhr ebyhr marked this pull request as ready for review January 16, 2025 07:13
@ebyhr ebyhr changed the title draft:starrocks connector Add StarRocks connector Jan 16, 2025
@ebyhr
Copy link
Member

ebyhr commented Jan 16, 2025

Please fix CI failures and add following tests:

  • Test extending BaseConnectorTest
  • Type mapping test. See TestPostgreSqlTypeMapping.
  • A product test

Also, I would recommend reading developer guide.

@Max-Cheng Max-Cheng added the enhancement New feature or request label Jan 16, 2025
@chenjian2664
Copy link
Contributor

@Max-Cheng I love your implementation of the load data from BEs, great job! And I think the main difference is just between the SplitSource part(with jdbc implementation), actually it can envolve on the jdbc implementation.
For the performance, IIRC, my user case is pretty simple that user using point query with limit and the read performance is enough for the table is not so big doing data exploration.

@ebyhr Does the community still want the jdbc implementation of the Starrocks? if yes I would like to reopen that.

@Max-Cheng
Copy link
Author

Max-Cheng commented Jan 20, 2025

@Max-Cheng I love your implementation of the load data from BEs, great job! And I think the main difference is just between the SplitSource part(with jdbc implementation), actually it can envolve on the jdbc implementation. For the performance, IIRC, my user case is pretty simple that user using point query with limit and the read performance is enough for the table is not so big doing data exploration.

@ebyhr Does the community still want the jdbc implementation of the Starrocks? if yes I would like to reopen that.

@chenjian2664 Thanks for your reply. Actually, the biggest headache is how to convert the arrow format to Trino type. If you have any ideas for this connector, I'd be happy to address and improve it.

@ebyhr It may take a while for me to get the code to follow the developer guidelines, the reason for this is that some of my community customers have contacted me privately via email with this scenario and they are anxious to get them using all the released code as soon as possible. I'll finish the overall format as soon as I can and put this MR on the WIP for now.

@Max-Cheng Max-Cheng changed the title Add StarRocks connector [WIP] Add StarRocks connector Jan 20, 2025
@ebyhr
Copy link
Member

ebyhr commented Jan 20, 2025

@Max-Cheng Please note that adding a new connector usually takes a few months or over 1 year. You may need to tell the customer about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed docs enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

4 participants