Add Google Dataflow docs #3148
base: main
Conversation
docs/en/integrations/data-ingestion/google-dataflow/templates/bigquery-to-clickhouse.md
Minor changes: spelling entries in https://github.com/ClickHouse/clickhouse-docs/blob/main/scripts/aspell-dict-file.txt
@@ -3452,4 +3452,24 @@
znode
znodes
zookeeperSessionUptime
zstd
DataFlow
Dataflow
DataflowTemplates
Can we not exclude these globally? Add a file-level-specific exclusion instead.
GoogleSQL
InputTableSpec
KMSEncryptionKey
clickHousePassword
see above
clickHouseUsername
insertDeduplicate
insertDistributedSync
insertQuorum
These are parameters, and I don't want to exclude them globally; put the settings in backticks instead.
maxRetries
outputDeadletterTable
queryLocation
queryTempDataset
Same for these; they aren't valid global exclusions.
We use `images` for folders; the scripts will rely on it.
Renamed the folder to images
docs/en/integrations/data-ingestion/google-dataflow/dataflow.md
[Google Dataflow](https://cloud.google.com/dataflow) is a fully managed stream and batch data processing service. It supports pipelines written in Java or Python and is built on the Apache Beam SDK.

There are two main ways to use Google Dataflow with ClickHouse, both of which leverage [`ClickHouseIO`](../../apache-beam):
@gingerwizard do we have any guidelines on using relative vs absolute links?
Co-authored-by: Mikhail Shustov <restrry@gmail.com>
## List of ClickHouse Templates
* [BigQuery To ClickHouse](./templates/bigquery-to-clickhouse)
* GCS To ClickHouse (coming soon!)
Are we going to work on these in the foreseeable future? If not, I'd recommend creating issues in https://github.com/ClickHouse/DataflowTemplates, linking them here with a CTA to upvote the issue and provide more details about the use case
@laeg, do you have something else in mind to track the signals from the field?
Issues are disabled for this fork; I've contacted the relevant folks to enable them.
I created these two feature request issues:
- [Feature Request]: Create Template for GCS to ClickHouse DataflowTemplates#3
- [Feature Request]: Create Template for Pub/Sub to ClickHouse DataflowTemplates#4
and also linked to them from the docs (List of ClickHouse Templates).
Co-authored-by: Mikhail Shustov <restrry@gmail.com>
| Parameter | Description | Required | Notes |
|-----------|-------------|----------|-------|
| `clickHouseUsername` | The ClickHouse username to authenticate with. | ✅ | |
| `clickHousePassword` | The ClickHouse password to authenticate with. | ✅ | |
| `clickHouseTable` | The target ClickHouse table name to insert the data to. | ✅ | |
| `maxInsertBlockSize` | The maximum block size for insertion, if we control the creation of blocks for insertion. | | A `ClickHouseIO` option. |
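As a hedged illustration of how template options like these might be assembled when launching the template: only the parameter names come from the table above; the helper function, the option values, and the `gcloud` invocation shown in the comments are assumptions for illustration, not taken from the template's documentation.

```python
# Sketch: joining template options into the comma-separated "--parameters"
# string that a Dataflow template launch accepts. Parameter names are from
# the table above; values and the launch command are illustrative only.

def build_parameters(options: dict) -> str:
    """Join template options into a key=value,key=value string."""
    return ",".join(f"{key}={value}" for key, value in options.items())

options = {
    "clickHouseUsername": "default",   # required
    "clickHousePassword": "secret",    # required
    "clickHouseTable": "events",       # required
    "maxInsertBlockSize": 1000000,     # optional ClickHouseIO option
}

params = build_parameters(options)
print(params)

# A hypothetical launch would then look roughly like:
#   gcloud dataflow flex-template run my-job \
#       --template-file-gcs-location gs://<bucket>/<template>.json \
#       --parameters "<the string printed above>"
```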
Sadly, we don't document the options in ClickHouseIO docs. Maybe we should move these to ClickHouseIO page and link them here
In pure Beam code (without this template), the parameters are set differently:
- `ClickHouseIO`: we set the parameters within the code with setter functions like `ClickHouseIO.Write.withMaxInsertBlockSize(long)`.
- In this template, we pass the parameters as template options.

I'll create a section there and add a link from the template docs to the `ClickHouseIO` docs.
The changes were also added to the Apache Beam documentation.
| BigQuery Type | ClickHouse Type | Notes |
|---------------|-----------------|-------|
| [**Array Type**](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_type) | [**Array Type**](https://clickhouse.com/docs/en/sql-reference/data-types/array) | The inner type must be one of the supported primitive data types listed in this table. |
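As a hedged sketch of the array-mapping rule in this table row, including the restriction that the inner type must be a supported primitive: the function and the primitive mapping shown here are illustrative assumptions, not the template's actual code.

```python
# Illustrative sketch of the BigQuery -> ClickHouse mapping rule for arrays
# described in the table row above. The primitive mapping below is a
# hypothetical subset chosen for illustration, not the template's real table.

PRIMITIVES = {
    "STRING": "String",
    "INT64": "Int64",
    "FLOAT64": "Float64",
    "BOOL": "Bool",
}

def to_clickhouse_type(bq_type: str) -> str:
    """Map a BigQuery type name to a ClickHouse type name.

    ARRAY<inner> maps to Array(inner) only when the inner type is a
    supported primitive; anything else (e.g. a nested array) is rejected.
    """
    if bq_type.startswith("ARRAY<") and bq_type.endswith(">"):
        inner = bq_type[len("ARRAY<"):-1]
        if inner not in PRIMITIVES:
            raise ValueError(f"unsupported array element type: {inner}")
        return f"Array({PRIMITIVES[inner]})"
    if bq_type in PRIMITIVES:
        return PRIMITIVES[bq_type]
    raise ValueError(f"unsupported type: {bq_type}")

print(to_clickhouse_type("ARRAY<STRING>"))  # prints Array(String)
```

Under this sketch, `ARRAY<ARRAY<STRING>>` raises, which matches the behavior discussed in this thread: nested arrays are not currently supported.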
So nested arrays aren't supported, are they?
Currently, they are not. I noted this in the apache/beam#33692 issue we opened about it.
…flow-docs
# Conflicts:
#	docs/en/integrations/index.mdx
Summary
These pages organize the knowledge around ClickHouse and Dataflow, including Dataflow template coverage.
Checklist