Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-11197 Initial check in for Yaml record reader #7665

Closed
wants to merge 6 commits into from

Conversation

dan-s1
Copy link
Contributor

@dan-s1 dan-s1 commented Sep 1, 2023

Summary

NIFI-11197

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 17

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@dan-s1 dan-s1 force-pushed the NIFI-11197 branch 3 times, most recently from 4674f75 to 351952c Compare September 4, 2023 20:23
@dan-s1
Copy link
Contributor Author

dan-s1 commented Sep 14, 2023

@exceptionfactory Is there anything else you would like me to do for this PR?

@exceptionfactory
Copy link
Contributor

@dan-s1 Nothing yet, I have been focusing on several other things, but plan to review this soon.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this new feature @dan-s1. The changes for reusing the JSON-based components look straightforward. I plan on doing some additional testing soon, but the general approach looks good.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience on this PR @dan-s1. In the course of review, I noticed several disabled test methods. Unless you are planning to adjust them for this PR, they should be removed. As noted to the comments for those tests, those particular styles may not be applicable to JSON, so it is better to leave them out for the initial version.

@dan-s1
Copy link
Contributor Author

dan-s1 commented Oct 9, 2023

@exceptionfactory I see three out of the four builds succeeded but I do not see a reason for the one that failed. Also I am not sure why are there conflicts. Please advise.

@exceptionfactory
Copy link
Contributor

@dan-s1 The failure appears to be an intermittent issue based on heap space for the build. We may need to address that separately, but it is not a concern for this PR.

However, I just merged a separate PR adding StreamConstraints handling to the JSON Readers, resulting in merge conflicts. Can you rebase on the latest main branch?

@dan-s1
Copy link
Contributor Author

dan-s1 commented Oct 9, 2023

@exceptionfactory There are conflicts (I guess obviously) I will try to resolve and then push.

@dan-s1
Copy link
Contributor Author

dan-s1 commented Oct 13, 2023

@exceptionfactory I am trying to rework my code to handle the changes made in #7823 for NIFI-12153. What I am discovering is that YAML supports comments and there is no way to turn off parsing of comments. In addition I am trying to mimic TestJsonTreeRowRecordReader#testReadJSONStringTooLong and I am trying to use StreamReadConstraints.builder().maxStringLength(1).build() yet no exception is thrown even though I have strings longer than one character long. It would seem that aspect only works for Json parsing and not Yaml parsing. I did find the following article How to parse large YAML file in Java or Kotlin which would seem to handle larger Yaml files but I am not sure if that is the same setting as StreamReadConstraints.builder().maxStringLength(1).build(). Please advise.

@exceptionfactory
Copy link
Contributor

@exceptionfactory I am trying to rework my code to handle the changes made in #7823 for NIFI-12153. What I am discovering is that YAML supports comments and there is no way to turn off parsing of comments. In addition I am trying to mimic TestJsonTreeRowRecordReader#testReadJSONStringTooLong and I am trying to use StreamReadConstraints.builder().maxStringLength(1).build() yet no exception is thrown even though I have strings longer than one character long. It would seem that aspect only works for Json parsing and not Yaml parsing. I did find the following article How to parse large YAML file in Java or Kotlin which would seem to handle larger Yaml files but I am not sure if that is the same setting as StreamReadConstraints.builder().maxStringLength(1).build(). Please advise.

Thanks for the details @dan-s1. If the YAML implementation does not support the same constraints, then I recommend not including those property descriptors on the YAML Record Reader implementation.

@MikeThomsen
Copy link
Contributor

@dan-s1 Just out of (genuine) curiosity, what prompted this feature? I have never seen anyone use YAML as anything other than a configuration file format. What horror story/interesting use case prompted it? :-D

@dan-s1
Copy link
Contributor Author

dan-s1 commented Oct 16, 2023

@exceptionfactory I am trying to rework my code to handle the changes made in #7823 for NIFI-12153. What I am discovering is that YAML supports comments and there is no way to turn off parsing of comments. In addition I am trying to mimic TestJsonTreeRowRecordReader#testReadJSONStringTooLong and I am trying to use StreamReadConstraints.builder().maxStringLength(1).build() yet no exception is thrown even though I have strings longer than one character long. It would seem that aspect only works for Json parsing and not Yaml parsing. I did find the following article How to parse large YAML file in Java or Kotlin which would seem to handle larger Yaml files but I am not sure if that is the same setting as StreamReadConstraints.builder().maxStringLength(1).build(). Please advise.

Thanks for the details @dan-s1. If the YAML implementation does not support the same constraints, then I recommend not including those property descriptors on the YAML Record Reader implementation.

@exceptionfactory Then would it be okay to have in YamlTreeReader code like:

@Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        final List<PropertyDescriptor> properties = new ArrayList<>(super.getSupportedPropertyDescriptors());
        //NOTE: Remove those properties which are not applicable for Yaml.
        properties.remove(AbstractJsonRowRecordReader.MAX_STRING_LENGTH);
        properties.remove(AbstractJsonRowRecordReader.ALLOW_COMMENTS);
        
        return properties;
    }

@dan-s1
Copy link
Contributor Author

dan-s1 commented Oct 16, 2023

@dan-s1 Just out of (genuine) curiosity, what prompted this feature? I have never seen anyone use YAML as anything other than a configuration file format. What horror story/interesting use case prompted it? :-D

@MikeThomsen Please refer to the ticket for more details but the submitter's use case was to have ability to convert from Yaml to Json.

@exceptionfactory
Copy link
Contributor

@exceptionfactory Then would it be okay to have in YamlTreeReader code like:

@Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        final List<PropertyDescriptor> properties = new ArrayList<>(super.getSupportedPropertyDescriptors());
        //NOTE: Remove those properties which are not applicable for Yaml.
        properties.remove(AbstractJsonRowRecordReader.MAX_STRING_LENGTH);
        properties.remove(AbstractJsonRowRecordReader.ALLOW_COMMENTS);
        
        return properties;
    }

Thanks for the example @dan-s1, yes, if those properties do not appear to be honored with the YAML implementation, then that is one approach. However, given the number of times that this method will be called, it would be better to define a local static list of supported properties and return that reference in getSupportedPropertyDescriptors(). That does require manual maintenance of the list for the YAML Reader, but that is probably better in this case as it should force a review of whether any new property would apply to both the JSON and YAML implementations.

@dan-s1
Copy link
Contributor Author

dan-s1 commented Oct 16, 2023

@exceptionfactory The problem with static is I need many of the properties which are defined in instance methods in SchemaRegistryService. I basically want to mimic getSupportedPropertyDescriptors from JsonTreeReader which is not using any static methods minus the 2 properties which are not applicable for Yaml.

@exceptionfactory
Copy link
Contributor

@exceptionfactory The problem with static is I need many of the properties which are defined in instance methods in SchemaRegistryService. I basically want to mimic getSupportedPropertyDescriptors from JsonTreeReader which is not using any static methods minus the 2 properties which are not applicable for Yaml.

Thanks for noting that detail @dan-s1, in that case, the approach you outlined seems like the best option.

@dan-s1
Copy link
Contributor Author

dan-s1 commented Oct 16, 2023

@exceptionfactory I resolved the conflicts but mistakenly rebased and squashed the commits. I believe all the changes you had requested are all in there. Is that okay?

@exceptionfactory
Copy link
Contributor

Thanks for the updates @dan-s1, I will take a closer look soon.

@dan-s1
Copy link
Contributor Author

dan-s1 commented Oct 16, 2023

@exceptionfactory The failed build seems to not have to do with any of my changes.

@dan-s1
Copy link
Contributor Author

dan-s1 commented Oct 20, 2023

@exceptionfactory Any chance this can make 1.24?

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dan-s1, I just pushed a minor update to simplify the JsonParserFactory interface usage and streamline some test methods. I consolidated the two JsonParserFactory methods into one, which clarifies the usage in different components.

After going through a few more runtime tests, this should be ready to go.

Copy link
Contributor Author

@dan-s1 dan-s1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@exceptionfactory Great changes! Just had some minor comments.

- YAML implementation returns default settings
@exceptionfactory
Copy link
Contributor

exceptionfactory commented Oct 23, 2023

@dan-s1 In the course of runtime testing, I found an issue with the Max String Length property value being null in storePropertyValues in JsonTreeReader. This is understandable since the YamlTreeReader removes that property. I rebased the branch and added one more commit with a buildStreamReadContraints() method that YamlTreeReader overrides. This approach will be clearer than null-checking the property value. I plan to run a few more tests soon.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the contribution @dan-s1, after making one more adjustment to handle the Allow Comments property, the latest version passes all tests and works as expected during runtime verification. +1 merging

@dan-s1
Copy link
Contributor Author

dan-s1 commented Oct 23, 2023

@exceptionfactory Thank you! I am not sure if there is a delay or not but in Jira I still see the status as Patch Available.

@exceptionfactory
Copy link
Contributor

You're welcome @dan-s1. I am verifying the changes for the support branch, I updated the Jira to resolved for 2.0.0, and should be able to include 1.24.0 shortly.

exceptionfactory pushed a commit that referenced this pull request Oct 23, 2023
- Adjusted JsonTreeReader implementation for sharing common Jackson components

This closes #7665

Signed-off-by: David Handermann <exceptionfactory@apache.org>
(cherry picked from commit 4b95129)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants