Add the InfluxDB Timestream connector #201

Draft: wants to merge 11 commits into base: mainline


@trevorbonas (Contributor) commented Nov 1, 2024

Issue #, if available:

N/A.

Description of changes:

The InfluxDB Timestream connector has been added.

The InfluxDB Timestream connector is a Rust application that receives, translates, and ingests line protocol data into Timestream for LiveAnalytics. The connector is intended to be deployed as part of a CloudFormation stack as a Lambda function, but can also be used as a library or run locally as a Lambda function.

The connector includes the following components:

  • A README with documentation for how the connector translates line protocol data to Timestream records, a guide for using the connector, and documentation for different configuration options.
  • A SAM template that deploys the connector as a Lambda function behind a REST API Gateway, using either synchronous or asynchronous invocation.

A few things to note:

The connector splits each request batch into chunks of 100 records and ingests the chunks in parallel. If a record in one chunk is invalid, an error occurs and ingestion stops, but records in other chunks of the same batch (those processed before or in parallel with the failing chunk) will already have been ingested successfully. When asynchronous invocation is used, the entire request is sent to the Lambda function's dead-letter queue. Rejected records and AWS SDK for Rust errors are logged, which helps users narrow down which line protocol points in their batch caused the problem.
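
A minimal, std-only sketch of the partial-ingestion behavior described above, under stated assumptions: `Record`, `ingest_chunk`, and `ingest_batch` are hypothetical names; `ingest_chunk` only simulates a WriteRecords call (100 records is the per-request limit of Timestream's WriteRecords API) by failing the whole chunk when any record in it is invalid; and OS threads stand in for the asynchronous tasks and AWS SDK for Rust calls the connector actually uses. Every chunk is started up front, so the sketch does not model ingestion stopping early.

```rust
use std::thread;

/// Per-request record limit of Timestream's WriteRecords API.
const CHUNK_SIZE: usize = 100;

/// Hypothetical record; `valid` simulates whether the write would be accepted.
#[derive(Clone, Debug)]
struct Record {
    id: usize,
    valid: bool,
}

/// Hypothetical stand-in for one WriteRecords call: in this simulation the
/// whole chunk fails if any record in it is invalid.
fn ingest_chunk(chunk: &[Record]) -> Result<usize, String> {
    match chunk.iter().find(|r| !r.valid) {
        Some(bad) => Err(format!("record {} rejected", bad.id)),
        None => Ok(chunk.len()),
    }
}

/// Ingests chunks of at most 100 records in parallel and returns
/// (records ingested, errors). A failing chunk does not undo chunks that succeeded.
fn ingest_batch(records: &[Record]) -> (usize, Vec<String>) {
    let results: Vec<Result<usize, String>> = thread::scope(|s| {
        let handles: Vec<_> = records
            .chunks(CHUNK_SIZE)
            .map(|chunk| s.spawn(move || ingest_chunk(chunk)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    let mut ingested = 0;
    let mut errors = Vec::new();
    for result in results {
        match result {
            Ok(n) => ingested += n,
            Err(e) => errors.push(e),
        }
    }
    (ingested, errors)
}

fn main() {
    // 250 records split into chunks of 100, 100, and 50; record 150 is invalid.
    let records: Vec<Record> = (0..250).map(|id| Record { id, valid: id != 150 }).collect();
    let (ingested, errors) = ingest_batch(&records);
    println!("ingested {ingested} records, errors: {errors:?}");
}
```

With one invalid record at index 150, only the second chunk (records 100 to 199) fails, and the other 150 records are still written, which is why a failed request can leave part of its batch in Timestream.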

The connector uses the InfluxDB v3 parser, since InfluxDB v3 is written in Rust and its parser is available as a crate. This parser has a few inconsistencies in behavior compared to the InfluxDB v2 parser.
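
To make the connector's input concrete, here is a deliberately simplified sketch of how one line protocol point breaks down into a measurement, a tag set, a field set, and an optional timestamp. It is not the InfluxDB v3 parser the connector uses and does not reproduce that crate's API: it ignores escape sequences and quoted string fields containing spaces or commas, and it keeps field values as raw strings without interpreting type suffixes such as `i`. `Point`, `parse_point`, and `split_pair` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// One parsed point. Field values stay raw strings (e.g. "3i", "0.64").
#[derive(Debug, PartialEq)]
struct Point {
    measurement: String,
    tags: BTreeMap<String, String>,
    fields: BTreeMap<String, String>,
    timestamp: Option<i64>,
}

/// Splits "key=value" into an owned pair.
fn split_pair(kv: &str) -> Result<(String, String), String> {
    kv.split_once('=')
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .ok_or_else(|| format!("expected key=value, got {kv:?}"))
}

/// Parses `measurement[,tag=value...] field=value[,field=value...] [timestamp]`.
/// Simplification: no escaping, no spaces inside string fields.
fn parse_point(line: &str) -> Result<Point, String> {
    let mut sections = line.split(' ');
    let series = sections
        .next()
        .filter(|s| !s.is_empty())
        .ok_or("missing measurement")?;
    let field_set = sections.next().ok_or("missing field set")?;
    let timestamp = match sections.next() {
        Some(ts) => Some(ts.parse::<i64>().map_err(|e| format!("bad timestamp {ts:?}: {e}"))?),
        None => None,
    };
    if sections.next().is_some() {
        return Err("unexpected trailing data".to_string());
    }

    // The series key is the measurement followed by comma-separated tags.
    let mut series_parts = series.split(',');
    let measurement = series_parts.next().unwrap_or_default().to_string();
    let tags: BTreeMap<String, String> = series_parts.map(split_pair).collect::<Result<_, _>>()?;
    // An empty field entry (e.g. from a stray trailing comma) fails in split_pair.
    let fields: BTreeMap<String, String> =
        field_set.split(',').map(split_pair).collect::<Result<_, _>>()?;

    Ok(Point { measurement, tags, fields, timestamp })
}

fn main() {
    let line = "cpu,host=a,region=us-west usage=0.64,count=3i 1465839830100400200";
    println!("{:#?}", parse_point(line));
}
```

A stray comma between the field set and the timestamp produces an empty field entry, so this sketch rejects such lines.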

93 tests have been created for the connector: 40 integration tests and 53 unit tests. The following is a list of all tests with a short summary for each:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

forestmvey and others added 11 commits September 25, 2024 10:50
Signed-off-by: forestmvey <forestv@bitquilltech.com>
*Issue #, if available:*

N/A.

*Description of changes:*

- A pre-commit hook has been added that uses `aws-secrets` in order to prevent secrets from being committed.

*Issue #, if available:*

N/A

*Description of changes:*

- SAM template `template.yml` added.
    - Asynchronous and synchronous invocation supported.
- Documentation for how the connector can be deployed using the SAM template added.
- `lambda_runtime` crate added.
- `LambdaEvent<serde_json::Value>` used instead of `lambda_http::Request`, in order to support more types of requests than just AWS service integrations.
- Tests added for handling different kinds of `queryParameters` keys in requests.
- DLQ implemented for asynchronous invocation.
- Integration tests changed to check the newly returned `serde_json::Value` struct.

- [x] Unit tests passed.
- [x] Integration tests passed.

- Users can now define partition keys for their newly-created tables. Partition key configuration is controlled using three environment variables, `custom_partition_key_type`, `custom_partition_key_dimension`, and `enforce_custom_partition_key`.
- Environment variables added for customer-defined partition key (CDPK) support.
- SAM template parameters added for CDPK support.
- Documentation added for CDPK support.
- Integration tests added for CDPK support.
- Integration test asserts moved to end of tests, in order to ensure resources are cleaned up.

- [x] Unit tests passed.
- [x] Integration tests passed.
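
A minimal sketch of how the three CDPK environment variables might be combined into one partition key setting. Only the variable names come from this PR; the accepted values (`dimension` or `measure`, case-insensitive), treating `enforce_custom_partition_key` as a `true`/`false` flag that defaults to `false`, and attaching enforcement only to dimension keys are assumptions. Invalid key types are ignored, matching the later commit "Ignore custom_partition_key_type if it is invalid option". `PartitionKey` and both functions are hypothetical names.

```rust
use std::env;

/// Hypothetical partition key setting derived from the environment.
#[derive(Debug, PartialEq)]
enum PartitionKey {
    /// Partition on a dimension; `enforce` requires the dimension in every record.
    Dimension { name: String, enforce: bool },
    /// Partition on the measure name.
    Measure,
}

/// Pure helper so the logic can be tested without touching the process environment.
/// Returns None when no valid custom partition key is configured.
fn partition_key_from(
    key_type: Option<&str>,
    dimension: Option<&str>,
    enforce: Option<&str>,
) -> Option<PartitionKey> {
    match key_type?.to_ascii_lowercase().as_str() {
        "measure" => Some(PartitionKey::Measure),
        "dimension" => {
            // A dimension key is unusable without a dimension name.
            let name = dimension.filter(|d| !d.is_empty())?.to_string();
            // Assumption: anything other than "true" disables enforcement.
            let enforce = enforce.map_or(false, |v| v.eq_ignore_ascii_case("true"));
            Some(PartitionKey::Dimension { name, enforce })
        }
        // Invalid key type: ignored, no custom partition key is set.
        _ => None,
    }
}

/// Reads the three variables named in this PR.
fn partition_key_from_env() -> Option<PartitionKey> {
    let get = |key: &str| env::var(key).ok();
    partition_key_from(
        get("custom_partition_key_type").as_deref(),
        get("custom_partition_key_dimension").as_deref(),
        get("enforce_custom_partition_key").as_deref(),
    )
}

fn main() {
    println!("{:?}", partition_key_from_env());
}
```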
*Issue #, if available:*

N/A.

*Description of changes:*

- Deployment permissions have been updated to match the permissions found to be required when testing deployment of the connector with its SAM template.
- `samconfig.toml` added with default stack deployment options.
- "Troubleshooting" section added to the README with two known errors users may encounter.
    - Issue with cross-platform Rust compilation.
    - `ConflictException` errors occurring when multiple instances of the connector run concurrently.

- The environment variable `local_invocation` has been added and is set to `true` by default when the Lambda function is run with `cargo lambda watch`. When this environment variable is `true`, responses are returned in the 2.0 format that `cargo lambda` expects; otherwise, responses are returned in the 1.0 format that the synchronous API Gateway expects.
- `cargo fmt` run.

- [x] Tested locally.
- [x] Tested with synchronous invocation.
- [x] Tested with asynchronous invocation.
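
A minimal sketch of the switch described in this commit. Only the variable name `local_invocation` comes from the PR; `ResponseFormat` and `response_format` are hypothetical, treating only a case-insensitive `true` as enabling the 2.0 format is an assumption, and the shape of each response body is deliberately not modeled.

```rust
use std::env;

/// Which response format the handler should emit (hypothetical names).
#[derive(Debug, PartialEq)]
enum ResponseFormat {
    /// 1.0 format, expected by the synchronous API Gateway.
    V1,
    /// 2.0 format, expected by `cargo lambda watch`.
    V2,
}

/// Assumption: only "true" (case-insensitive, surrounding whitespace ignored)
/// selects 2.0; any other value, or an unset variable, falls back to 1.0.
fn response_format(local_invocation: Option<&str>) -> ResponseFormat {
    match local_invocation {
        Some(v) if v.trim().eq_ignore_ascii_case("true") => ResponseFormat::V2,
        _ => ResponseFormat::V1,
    }
}

fn main() {
    let value = env::var("local_invocation").ok();
    println!("{:?}", response_format(value.as_deref()));
}
```

Defaulting to 1.0 keeps the deployed stack's behavior unchanged when the variable is absent.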
*Issue #, if available:*

N/A.

*Description of changes:*
- Ingestion to multiple tables done in parallel.
- Batches of 100 records for the same table ingested in parallel.
- Chunking of records into batches of 100 done in parallel.
- Limit on the maximum number of threads added.
- Print statement emitted for every 100 records removed.
- Logging option added to SAM template.
- All `println!` calls changed to `info!`.
- Default logging level for the connector changed to `INFO`.
- Trace statements that measure the execution time of each function added.
- Instructions added to README for how to configure logging levels.
- The database and tables are no longer checked for existence when the environment variables that enable their creation are not set.
- The check for whether tables exist has been moved into the asynchronous code block that ingests records into a table. This check now runs in parallel, and the hash map is iterated once instead of twice.
- Instructions added to README for how to reduce stack costs.

- [x] Integration tests passed.
- [x] Unit tests passed.
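
A std-only sketch of the scheduling these changes describe: records are grouped by table in a hash map, each table's records are split into chunks of at most 100, and the chunks are written with at most `max_threads` workers. This is an illustration under assumptions, not the connector's code: the connector uses asynchronous tasks with the AWS SDK for Rust, whereas this sketch swaps in scoped OS threads pulling jobs from a shared queue, and `Record`, `build_jobs`, `run_jobs`, and the `write` callback are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Per-request record limit of Timestream's WriteRecords API.
const CHUNK_SIZE: usize = 100;

/// Hypothetical record: only the target table matters here.
#[derive(Debug)]
struct Record {
    table: String,
    #[allow(dead_code)]
    id: usize,
}

/// Groups records by table, then splits each table's records into chunks of at most 100.
fn build_jobs(records: Vec<Record>) -> Vec<(String, Vec<Record>)> {
    let mut by_table: HashMap<String, Vec<Record>> = HashMap::new();
    for record in records {
        by_table.entry(record.table.clone()).or_default().push(record);
    }
    let mut jobs = Vec::new();
    for (table, mut remaining) in by_table {
        while !remaining.is_empty() {
            // split_off keeps the first `at` records in `remaining` and returns the rest.
            let rest = remaining.split_off(remaining.len().min(CHUNK_SIZE));
            jobs.push((table.clone(), remaining));
            remaining = rest;
        }
    }
    jobs
}

/// Runs jobs on at most `max_threads` workers; `write` stands in for one WriteRecords call
/// and returns the number of records written. Returns the total written.
fn run_jobs<F>(jobs: Vec<(String, Vec<Record>)>, max_threads: usize, write: F) -> usize
where
    F: Fn(&str, &[Record]) -> usize + Sync,
{
    let queue = Mutex::new(jobs);
    let written = Mutex::new(0usize);
    thread::scope(|s| {
        for _ in 0..max_threads.max(1) {
            s.spawn(|| loop {
                // The lock is released at the end of this statement, before writing.
                let job = queue.lock().unwrap().pop();
                let Some((table, chunk)) = job else { break };
                let n = write(&table, &chunk);
                *written.lock().unwrap() += n;
            });
        }
    });
    written.into_inner().unwrap()
}

fn main() {
    let records: Vec<Record> = (0..250)
        .map(|id| Record { table: if id % 5 == 0 { "mem".into() } else { "cpu".into() }, id })
        .collect();
    let jobs = build_jobs(records);
    let total = run_jobs(jobs, 4, |table, chunk| {
        println!("writing {} records to {table}", chunk.len());
        chunk.len()
    });
    println!("wrote {total} records");
}
```

Capping the worker count bounds concurrent WriteRecords calls regardless of how many tables or chunks a single request produces.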

* Initial implementation for adding single-table mapping.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Adding environment variables and updating documentation.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Adding tests and revising single-table mapping.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Adding single table example to README.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Fixing measure-name using line protocol metric name for single table mapping.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Fix typo in README.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Fixing documentation typos and test setting wrong environment variable.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Fixing table for single table multi measure records example in README.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Fix invalid line protocol examples with additional comma separating the timestamp.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

---------

Signed-off-by: forestmvey <forestv@bitquilltech.com>
* Set default table mapping to multi-table for the InfluxDB Timestream Connector.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Fixing table formatting in README for InfluxDB Timestream Connector.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Updating README to reference multi-table for the default table mapping.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

---------

Signed-off-by: forestmvey <forestv@bitquilltech.com>
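
A minimal sketch of the two table mappings these commits describe, under stated assumptions: multi-table is the default (per the commit above), and in single-table mode the line protocol measurement name becomes the measure name (per "Fixing measure-name using line protocol metric name for single table mapping"). Using the measurement name as the table name in multi-table mode is an assumption, the table name `metrics` is hypothetical, and `TableMapping` and `route` are illustrative names rather than the connector's API.

```rust
/// How line protocol points are routed to Timestream tables.
#[derive(Debug, Clone, PartialEq, Default)]
enum TableMapping {
    /// Default: one table per measurement (table naming assumed).
    #[default]
    MultiTable,
    /// Every point goes to one table; the measurement becomes the measure name.
    SingleTable { table: String },
}

/// Returns (target table, measure name override). `None` means the measure
/// name is chosen by rules this sketch does not model.
fn route<'a>(mapping: &'a TableMapping, measurement: &'a str) -> (&'a str, Option<&'a str>) {
    match mapping {
        TableMapping::MultiTable => (measurement, None),
        TableMapping::SingleTable { table } => (table.as_str(), Some(measurement)),
    }
}

fn main() {
    let single = TableMapping::SingleTable { table: "metrics".to_string() };
    for mapping in [TableMapping::default(), single] {
        println!("{:?} -> {:?}", mapping, route(&mapping, "cpu"));
    }
}
```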
* Add gzip support to Go client

* Add gzip support to template

* Ignore custom_partition_key_type if it is invalid option

* Add comment about lack of local gzip support
…DB Timestream Connector (#33)

* Defining least-privilege Lambda invocation permissions for the connector. Adding output for IAM policy when deploying using the SAM template.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Making stage name dynamic for output least privilege IAM policy.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Adding section in README for ingestion permissions of Lambda function.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

* Revising wording for IAM permissions in README.

Signed-off-by: forestmvey <forestv@bitquilltech.com>

---------

Signed-off-by: forestmvey <forestv@bitquilltech.com>