-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add inner and left joins with a single join condition #19
base: main
Are you sure you want to change the base?
Conversation
if (s.contains(prefix)) { | ||
return s.replace(prefix, ""); | ||
} | ||
if (s != null && prefix != null) if (s.contains(prefix)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move again as separate row
} | ||
else | ||
throw new IllegalArgumentException("FILE: " + fileName + " FOR RDFREADER DOES NOT EXIST"); | ||
if (fileName != null) if ((new File(fileName)).exists()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the consecutive if
on the same line, but I assume this is automatically done by some plugin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work on the test cases!
} | ||
} | ||
|
||
public List<Map<String, String>> renameDataframeColumn(List<Map<String, String>> dataFrame, String oldColumn, String newColumn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would try to use the Java Stream API. Check the last message here: https://chat.openai.com/share/9e8482b2-3c1b-4ddc-b6f4-f131ce39f2f0
return leftJoin(leftTable, rightTable, key, key); | ||
} | ||
|
||
public List<Map<String, String>> leftJoin(List<Map<String, String>> leftTable, List<Map<String, String>> rightTable, String leftKey, String rightKey) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to check the code in details but I would use the Java Stream API, check https://chat.openai.com/share/9e8482b2-3c1b-4ddc-b6f4-f131ce39f2f0
return innerJoin(leftTable, rightTable, key, key); | ||
} | ||
|
||
public List<Map<String, String>> innerJoin(List<Map<String, String>> leftTable, List<Map<String, String>> rightTable, String leftKey, String rightKey) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to refactor and reuse portion of the code between the two joins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement a default behaviour if no condition is provided? i.e., cross-join btw the dataframes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a separate crossJoin
method. If the user knows that he wants to perform a crossJoin he would have to do so explicitly. Additionally this would make the template easier to reason about. Someone reading the template knows that i am doing a crossJoin and does not have to know that it is the behaviour of an innerJoin
when no condition is specified. From what i gathered it is non-standard to do a cross join when no condition is given to a leftJoin
. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on having a separate method to do it explicitly but I would also add a method .join(table1,table2,key,key)
that behaves like JOIN in SQL, i.e., that (if I got it correctly):
- performs by default an inner join if the conditions are provided
- performs by default a cross-join if the conditions are not provided (null/empty keys)
Implemented functionality in a VTL template to facilitate left and inner joins between two dataframes. Users can now utilize
$functions.leftJoin($df1, $df2, "key of df1", "key of df2")
or$functions.innerJoin(...)
within the template.For convenience, both functions support a "shortcut" when the joining column names are identical in both dataframes. This shortcut can be invoked as
$function.leftJoin($df1, $df2, "key")
.In cases where the resulting dataframe from the join operation contains multiple columns with the same name, the system now generates a runtime exception. This exception notifies the user regarding which columns need renaming in the input dataframes and instructs them to use the
renameDataframeColumn
function.The renameDataframeColumn function is accessible from the VTL through
$functions.renameDataframeColumn($df, "oldColumn", "newColumn")
.One thing that is not addressed in this PR is the possibility of specifying multiple join conditions to be evaluated as a chain of ANDs. It should however be relatively straightforward to implement at a later date should we need it.