[c++] Integrate SOMAColumn: Arrow adapter methods, part 1 #3405
Can you explain what the Skip and Take are for and document it? It looks like Take is the index of the column to retrieve and Skip is relevant only for geometry columns (where it's always 2)?
Also, is there a way to use std::variant or a templated type instead of std::any, or would that make things too complicated?
/**
 * Return a copy of the data in a specified column of an arrow table.
 * Complex column types are supported. The for each sub column are an
 * std::array<T, 2> casted as an std::any object.
 */
Suggested change:
  /**
   * Return a copy of the data in a specified column of an arrow table.
-  * Complex column types are supported. The for each sub column are an
-  * std::array<T, 2> casted as an std::any object.
+  * Complex column types are supported. The type for each sub column is
+  * an std::array<T, 2> casted as an std::any object.
   */
Skip and Take are used in 2 places with 2 specific sets of values (either Skip=3 and Take=2, or Skip=0 and Take=2) and are independent of the geometry column. They are used to extract specific subranges of ArrowArray data, and they come in handy during the ArrowSchema -> TileDBSchema conversion, where the Arrow array provided has 5 values per dimension and we only need the last 2 to set the current domain.
As to using std::variant, adding more SOMAColumn types would require changing multiple variants. The use of std::any here is to enable runtime polymorphism, and it indirectly introduces a runtime type check (via any_cast, make_any) between the templated function and the actual dimension type. std::variant can provide all of the above; it is just a different style, and I am open to discussing it further.
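To make the std::any versus std::variant trade-off in this reply concrete, here is a minimal sketch. All names in it (Range, pack_range, holds_range, unpack_range, RangeVariant) are hypothetical and invented for illustration; they are not taken from the PR or from libtiledbsoma.

```cpp
#include <any>
#include <array>
#include <cassert>
#include <cstdint>
#include <variant>

// Hypothetical stand-in for a per-dimension [lo, hi] pair.
template <typename T>
using Range = std::array<T, 2>;

// std::any style: the producer erases the element type.
template <typename T>
std::any pack_range(T lo, T hi) {
    return std::make_any<Range<T>>(Range<T>{lo, hi});
}

// Runtime type check: the pointer form of any_cast returns nullptr on a
// type mismatch instead of throwing.
template <typename T>
bool holds_range(const std::any& slot) {
    return std::any_cast<Range<T>>(&slot) != nullptr;
}

// The consumer names the type it expects; the value form of any_cast
// throws std::bad_any_cast if the stored type differs.
template <typename T>
Range<T> unpack_range(const std::any& slot) {
    assert(slot.has_value());
    return std::any_cast<Range<T>>(slot);
}

// std::variant style: a closed list of alternatives. Supporting a new
// element type means editing this list (and every similar list).
using RangeVariant = std::variant<Range<std::int64_t>, Range<double>>;
```

With std::any, a new element type needs no change to the container type, at the cost of checking the type at each any_cast. With std::variant, the set of types is visible in the signature and can be handled exhaustively with std::visit.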
Codecov Report
All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##             main    #3405      +/-   ##
==========================================
+ Coverage   86.22%   86.27%   +0.04%
==========================================
  Files          55       55
  Lines        6410     6410
==========================================
+ Hits         5527     5530       +3
+ Misses        883      880       -3
…nt domain checks, replace vector with span when selecting points
template <typename T, size_t Take, size_t Skip = 0>
static std::vector<std::array<T, Take>> get_table_column_by_name(
This doesn't appear to be used anywhere. Can you pull it out into a separate PR and/or add it to the branch where it is used?
I didn't review the get_table_column_by_name function, but everything else looks good to me. However, someone with more familiarity with the arrow adapter code should look over this as well.
Thanks for working on this, @XanthosXanthopoulos!
@@ -976,8 +980,8 @@ void ArrowAdapter::_set_current_domain_slot(
    LOG_DEBUG(std::format(
        "[ArrowAdapter] {} current_domain float {} to {}",
        name,
        std::to_string(lo),
I do seem to recall these were about crash avoidance on some platform (I don't recall which). I'd rather leave these as-is please.
columns.begin(), columns.end(), [&](auto col) {
    return strcmp(
               col->name().c_str(),
               index_column_schema->children[i]->name) == 0;
});
This could be pulled out into a utility function
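
A minimal sketch of what such a utility function might look like. The Column struct is a hypothetical stand-in for the real column type, and find_column_by_name is an invented name; neither is the PR's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical minimal column type; the real SOMAColumn interface is richer.
struct Column {
    std::string column_name;
    const std::string& name() const { return column_name; }
};

// Returns the first column whose name equals `name`, or nullptr if no
// column matches. Taking a string_view lets callers pass either a
// std::string or a C string such as an ArrowSchema child name.
inline std::shared_ptr<Column> find_column_by_name(
    const std::vector<std::shared_ptr<Column>>& columns,
    std::string_view name) {
    auto column_it = std::find_if(
        columns.begin(), columns.end(), [&](const auto& col) {
            assert(col != nullptr);
            return col->name() == name;
        });
    if (column_it == columns.end()) {
        return nullptr;
    }
    return *column_it;
}
```

Comparing through std::string_view also avoids the strcmp/c_str() round trip at each call site.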
if (column == columns.end()) {
    throw TileDBSOMAError(std::format(
        "[ArrowAdapter][tiledb_schema_from_arrow_schema] Index column "
        "{} missing",
Suggested change:
- "{} missing",
+ "'{}' missing",
Strings in error messages should always be quoted/bracketed. Even if you think it's impossible for the string to ever be empty. Heaven forbid someday there is some bug somewhere somehow ... and an empty string gets in here ... that needs to be clear to everyone that sees the error message.
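
The effect of quoting is easiest to see with an empty name. The sketch below uses plain string concatenation rather than std::format so that it compiles without C++20 library support; quoted and missing_index_column_message are hypothetical helper names, not part of the PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wraps a value in single quotes so that an empty
// string still leaves a visible marker in the message.
inline std::string quoted(const std::string& value) {
    return "'" + value + "'";
}

// Builds the error text from the snippet under review, with the name quoted.
inline std::string missing_index_column_message(const std::string& name) {
    return "[ArrowAdapter][tiledb_schema_from_arrow_schema] Index column " +
           quoted(name) + " missing";
}

// Example: an empty name is unmistakable in the quoted form.
inline void quoted_message_example() {
    assert(
        missing_index_column_message("") ==
        "[ArrowAdapter][tiledb_schema_from_arrow_schema] Index column '' "
        "missing");
}
```

Without the quotes, the same empty name would produce "Index column  missing", which is easy to misread as a formatting glitch.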
        index_column_schema->children[i]->name));
}

if ((*column)->tiledb_dimensions().has_value()) {
This is a bit awkward.
- column needs to be column_it or some such -- it is not a column, it is an iterator
- Then, const auto column = *column_it after you check that column_it != columns.end()
- Then the rest of these (*column)->foo become column->foo as they should be
if (strcmp(child->name, col_name) != 0) {
continue;
if (column->name() == SOMA_GEOMETRY_COLUMN_NAME) {
std::vector<std::any> dom;
As on previous PRs: we cannot simply say dom or domain, ever.
There are four things it can mean:
- core domain (which is soma maxdomain)
- soma domain (which is core current domain)
The names are confusing (and too late to change), and confusion is too easy, and developer confusion is high-risk.
Perhaps rename dom to cdslot.
const void* buff,
NDRectangle& ndrect,
std::string name);
template <typename T, size_t Take, size_t Skip = 0>
Please state as a compact summary, right here as a code comment, what Take and Skip are for, what they do, and an example usage.
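
As an illustration of the kind of comment being requested, here is a hedged sketch based only on the explanation given earlier in this thread (5 values per dimension, of which the last 2 are needed). take_after_skip is a hypothetical helper that works on a single std::vector; it is not the PR's function, which has a different signature, and the sample values are placeholders.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

/**
 * Copy `Take` consecutive values out of `values`, starting after the
 * first `Skip` values.
 *
 *   Take: number of values to copy into the result.
 *   Skip: number of leading values to ignore (defaults to 0).
 *
 * Example, for a 5-value input {a, b, c, d, e}:
 *   take_after_skip<T, 2, 3>(values) yields {d, e} (the last two);
 *   take_after_skip<T, 2>(values)    yields {a, b} (the first two).
 *
 * Hypothetical helper for illustration only.
 */
template <typename T, std::size_t Take, std::size_t Skip = 0>
std::array<T, Take> take_after_skip(const std::vector<T>& values) {
    if (values.size() < Skip + Take) {
        throw std::out_of_range("take_after_skip: not enough values");
    }
    std::array<T, Take> out{};
    std::copy_n(values.begin() + Skip, Take, out.begin());
    return out;
}

// Example usage with placeholder values (five per dimension).
inline void take_after_skip_example() {
    const std::vector<std::int64_t> values{0, 1000, 10, 5, 50};
    const auto last_two = take_after_skip<std::int64_t, 2, 3>(values);
    assert(last_two[0] == 5 && last_two[1] == 50);
}
```

Placing Take before Skip in the template parameter list lets Skip default to 0 for the common "first N values" case.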
This PR reworks the Arrow schema to TileDB schema transformation to use the SOMAColumn create methods.
It also adds a set of new data converters from Arrow arrays to std::array, for simplification.
This migration also enforces the current domain restriction for string dimensions in libtiledbsoma itself; previously the restriction was present only in the R and Python APIs.