Sanketika-Obsrv/issue-tracker#240 Master Data Processor enhancements to add router for hudi integration #83

GayathriSrividya · 2024-06-19T10:04:54Z

No description provided.

[v 0.5] - Obsrv Core

Fix the NPE and test cases for Dataset Registry

* feat: add connector config and connector stats update functions
* Issue #33 feat: add documentation for Dataset, Datasources, Data In and Query APIs
* feat: added descriptions for default configurations
* feat: added descriptions for default configurations
* feat: modified kafka connector input topic
* feat: obsrv setup instructions
* feat: revisiting open source features
* feat: masterdata processor job config
* Build deploy v2 (#19)
* #0 - Refactor Dockerfile and Github actions workflow

---------

Co-authored-by: Santhosh Vasabhaktula <santhosh@sanketika.in>
Co-authored-by: ManojCKrishna <Manoj.Chintaluri@syngenta.com>

* Update DatasetModels.scala
* Release 1.3.0 into Main branch (#34)
* testing new images
* testing new images
* testing new images
* testing new images
* testing new images
* build new image with bug fixes
* update dockerfile
* update dockerfile
* #0 fix: upgrade packages
* #0 feat: add flink dockerfiles
* #0 fix: add individual extraction
* Issue #0 fix: upgrade ubuntu packages for vulnerabilities
* #0 fix: update github actions release condition

---------

Co-authored-by: ManojKrishnaChintaluri <manojc@sanketika.in>
Co-authored-by: Praveen <66662436+pveleneni@users.noreply.github.com>
Co-authored-by: Sowmya N Dixit <sowmyadixit7@gmail.com>

* Update DatasetModels.scala
* fix: update flink base image
* fix: update flink base image

---------

Co-authored-by: shiva-rakshith <rakshiths@sanketika.in>
Co-authored-by: Aniket Sakinala <aniket@sanketika.in>
Co-authored-by: GayathriSrividya <gayathrirajavarapu@sanketika.in>
Co-authored-by: Manjunath Davanam <manjunath@sanketika.in>
Co-authored-by: Manoj Krishna <92361832+ManojKrishnaChintauri@users.noreply.github.com>
Co-authored-by: Santhosh Vasabhaktula <santhosh@sanketika.in>
Co-authored-by: ManojCKrishna <Manoj.Chintaluri@syngenta.com>
Co-authored-by: ManojKrishnaChintaluri <manojc@sanketika.in>
Co-authored-by: Praveen <66662436+pveleneni@users.noreply.github.com>

* feat: add connector config and connector stats update functions
* Issue #33 feat: add documentation for Dataset, Datasources, Data In and Query APIs
* feat: added descriptions for default configurations
* feat: modified kafka connector input topic
* feat: obsrv setup instructions
* feat: revisiting open source features
* feat: masterdata processor job config
* Build deploy v2 (#19)
* #0 - Refactor Dockerfile and Github actions workflow
* testing new images
* build new image with bug fixes
* update dockerfile
* #0 fix: upgrade packages
* #0 feat: add flink dockerfiles
* feat: update all failed, invalid and duplicate topic names
* feat: update kafka topic names in test cases
* #0 fix: add individual extraction
* feat: update failed event
* Update ErrorConstants.scala
* Issue #0 fix: upgrade ubuntu packages for vulnerabilities
* feat: add exception handling for json deserialization
* Update BaseProcessFunction.scala
* feat: update batch failed event generation
* Update ExtractionFunction.scala
* feat: update invalid json exception handling
* Issue #46 feat: update batch failed event
* Issue #46 fix: remove cloning object
* #0 fix: update github actions release condition
* Update DatasetModels.scala
* Issue #46 feat: add error reasons
* Release 1.3.0 into Main branch (#34)
* Issue #2 feat: Remove kafka connector code
* Issue #46 feat: add exception stack trace
* feat: add function to get all datasets
* Release 1.3.1 Changes (#42)
* Dataset enhancements (#38)
* #0000 [SV] - Fallback to local redis instance if embedded redis is not starting
* #0000 - refactor the denormalization logic
  1. Do not fail the denormalization if the denorm key is missing
  2. Add clear message whether the denorm is successful or failed or partially successful
  3. Handle denorm for both text and number fields
* #0000 - refactor:
  1. Created an enum for dataset status and ignore events if the dataset is not in Live status
  2. Created an outputtag for denorm failed stats
  3. Parse event validation failed messages into a case class
* #0000 - refactor:
  1. Updated the DruidRouter job to publish data to router topics dynamically
  2. Updated framework to create a dynamicKafkaSink object
* #0000 - mega refactoring:
  1. Made calls to getAllDatasets and getAllDatasetSources to always query postgres
  2. Created BaseDatasetProcessFunction for all flink functions to extend that would dynamically resolve dataset config, initialize metrics and handle common failures
  3. Refactored serde - merged map and string serialization into one function and parameterized the function
  4. Moved failed events sinking into a common base class
  5. Master dataset processor can now do denormalization with another master dataset as well
* #0000 - mega refactoring:
  1. Added validation to check if the event has a timestamp key and it is not blank nor invalid
  2. Added timezone handling to store the data in druid in the TZ specified by the dataset
* #0000 - minor refactoring: Updated DatasetRegistry.getDatasetSourceConfig to getAllDatasetSourceConfig
* #0000 - mega refactoring: Refactored logs, error messages and metrics
* #0000 - mega refactoring: Fix unit tests
* #0000 - refactoring:
  1. Introduced transformation mode to enable lenient transformations
  2. Proper exception handling for transformer job
* #0000 - refactoring: Fix test cases and code
* #0000 - refactoring: upgrade embedded redis to work with macos sonoma m2
* #0000 - refactoring: Denormalizer test cases and bug fixes. Code coverage is 100% now
* #0000 - refactoring: Router test cases and bug fixes. Code coverage is 100% now
* #0000 - refactoring: Validator test cases and bug fixes. Code coverage is 100% now
* #0000 - refactoring: Framework test cases and bug fixes
* #0000 - refactoring: kafka connector test cases and bug fixes. Code coverage is 100% now
* #0000 - refactoring: improve code coverage and fix bugs
* #0000 - refactoring: improve code coverage and fix bugs --- Now the code coverage is 100%
* #0000 - refactoring: organize imports
* #0000 - refactoring: 1. transformer test cases and bug fixes - code coverage is 100%
* #0000 - refactoring: test cases and bug fixes
* #000:feat: Removed the provided scope of the kafka-client in the framework (#40)
* #0000 - feat: Add dataset-type to system events (#41)
* #0000 - feat: Add dataset-type to system events
* #0000 - feat: Modify tests for dataset-type in system events
* #0000 - feat: Remove unused getDatasetType function
* #0000 - feat: Remove unused pom test dependencies
* Main conflicts fixes (#44)
* #000:feat: Resolve conflicts
* Release 1.3.1 into Main (#43)
* update workflow file to skip tests (#45)
* Release 1.3.1 into Main (#49)
* #0000 - fix: Fix null dataset_type in DruidRouterFunction (#48)
* Develop to Release-1.0.0-GA (#52) (#53)
* #67 feat: query system configurations from meta store
* #67 fix: Refactor system configuration retrieval and update dynamic router function
* #67 fix: update system config according to review
* #67 fix: update test cases for system config
* #67 fix: update default values in test cases
* #67 fix: add get all system settings method and update test cases
* #67 fix: add test case for covering exception case
* #67 fix: fix data types in test cases
* #67 fix: Refactor event indexing in DynamicRouterFunction
* Issue #67 refactor: SystemConfig read from DB implementation
* #226 fix: update test cases according to the refactor
* #0 fix: Flink base image updates

---------

Co-authored-by: Santhosh Vasabhaktula <santhosh@sanketika.in>
Co-authored-by: ManojCKrishna <Manoj.Chintaluri@syngenta.com>
Co-authored-by: ManojKrishnaChintaluri <manojc@sanketika.in>
Co-authored-by: Praveen <66662436+pveleneni@users.noreply.github.com>
Co-authored-by: Sowmya N Dixit <sowmyadixit7@gmail.com>
Co-authored-by: shiva-rakshith <rakshiths@sanketika.in>
Co-authored-by: Aniket Sakinala <aniket@sanketika.in>
Co-authored-by: Manjunath Davanam <manjunath@sanketika.in>
Co-authored-by: Anand Parthasarathy <anandp504@gmail.com>
Co-authored-by: Santhosh <santhosh@sanketika.in>
Co-authored-by: Ravi Mula <ravismula@users.noreply.github.com>
Co-authored-by: GayathriSrividya <gayathrirajavarapu@sanketika.in>
Co-authored-by: Manoj Krishna <92361832+ManojKrishnaChintauri@users.noreply.github.com>
Co-authored-by: Manoj Krishna <92361832+ManojKrishnaChintaluri@users.noreply.github.com>

manjudr and others added 28 commits June 1, 2023 19:49

Merge pull request #2 from Sanketika-Obsrv/main

12ce080

[v 0.5] - Obsrv Core

Merge pull request #8 from Sanketika-Obsrv/main

1e5b677

Fix the NPE and test cases for Dataset Registry

feat: Hudi Flink Implementation.

fc785b5

feat: local working with metastore and localstack.

7aea469

#0000 - feat: Hudi Sink implementation

f785a2f

#0000 - feat: Hudi Sink implementation

6ee65aa

#0000 - feat: Initialize dataset RowType during job startup

1c24b07

refactor: Integrate hudi connector with dataset registry.

cbeb0ea

refactor: Integrate hudi connector with dataset registry.

e8d6378

Sanketika-Obsrv/issue-tracker#141 refactor: Enable timestamp based pa…

0f51304

…rtition

Sanketika-Obsrv/issue-tracker#141 refactor: Fix Hudi connector job to…

2458cf6

… handle empty datasets list for lakehouse.

Sanketika-Obsrv/issue-tracker#141 fix: Set Timestamp based partition …

fc9c1c0

…configurations only if partition key is of timestamp type.

Sanketika-Obsrv/issue-tracker#170 fix: Resolve timestamp based partit…

4af1b7e

…ion without using TimestampBasedAvroKeyGenerator.

Sanketika-Obsrv/issue-tracker#177 fix: Lakehouse connector flink job …

1b9c0e5

…fixes.

Sanketika-Obsrv/issue-tracker#177 merge: Merge branch 'develop' of gi…

01a8fa3

…thub.com:Sanketika-Obsrv/obsrv-core into hudi-integration

Sanketika-Obsrv/issue-tracker#177 fix: Dockerfile changes for hudi-co…

517dd4c

…nnector

Sanketika-Obsrv/issue-tracker#177 fix: Lakehouse connector flink job …

90bfd7e

…fixes.

Sanketika-Obsrv/issue-tracker#177 fix: remove unused code

43f2d2f

Sanketika-Obsrv/issue-tracker#177 merge: Merge branch 'develop' of gi…

8215fc7

…thub.com:Sanketika-Obsrv/obsrv-core into hudi-integration

Sanketika-Obsrv/issue-tracker#177 fix: remove unused code

4fb116b

Sanketika-Obsrv/issue-tracker#177 fix: remove unused code

8df0ba2

Sanketika-Obsrv/issue-tracker#177 fix: remove commented code

d888635

Sanketika-Obsrv/issue-tracker#177 fix: Hudi connector enhancements to…

8219234

… enable master-dataset

Sanketika-Obsrv/issue-tracker#177 fix: Lakehouse connector flink job …

32ae3ee

…fixes to filter for only Live Datasources.

Sanketika-Obsrv/issue-tracker#240 feat: lakehouse job changes to supp…

113a554

…ort retire workflow

Sanketika-Obsrv/issue-tracker#240 feat: master data enhancements for …

462955f

…lakehouse

GayathriSrividya closed this Jun 19, 2024

GayathriSrividya deleted the masterdata-changes-for-lakehouse branch June 19, 2024 10:09
