Should we add a notion of "named rows"? #1746

Open
blerner opened this issue May 20, 2024 · 0 comments

Right now, our table syntax has named columns (and the names are required to always be distinct), while rows are entirely anonymous. But for some data examples, e.g. connectivity in a graph, it might make sense for the rows to be labeled as well. This would allow us to treat the contents of the table more symmetrically and more like a matrix, and might make such tables amenable to further manipulation as matrices.

I'm wondering if we should riff off the syntax for spy: / spy "name": and check: / check "name": and do something like

table: c1, c2, c3:
  row: r1c1, r1c2, r1c3
  row: r2c1, r2c2, r2c3
end

table: c1, c2, c3:
  row "r1": r1c1, r1c2, r1c3
  row "r2": r2c1, r2c2, r2c3
end

I'm pretty sure this is grammatically unambiguous, but it is a breaking change: we would have to make row be a keyword, and not just row: (currently, row can be used as a normal identifier). Is this worth it?

We might also want to symmetrize the names a bit more carefully. Currently, the column names in table syntax are required to be NAME identifier tokens, but in add-column they're allowed to have spaces/not be identifiers. So we might want to loosen the grammar to

table-expr: TABLE table-headers COLON table-rows END
table-headers: [(table-header COMMA)* table-header]
table-header: NAME [COLONCOLON ann]
!!!!!! new !!!!!
table-header: STRING [COLONCOLON ann]

table-rows: [table-row* table-row]
table-row: ROWCOLON table-items
!!!!!!! new !!!!!!!!!
table-row: ROW (STRING | NAME) COLON table-items

Then Row values can have an optional name in them (accessible with .get-name() -> Option<String>), and we'd add a new constructor [raw-named-row(name): ...] to match [raw-row: ...] and a new method table.named-row(name, c1, c2, c3) to match table.row(c1, c2, c3). In the implementation, a table value would store an array of row names just as it does an array of column names.
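To make the shape of that proposal concrete, here is a rough Python model (not Pyret, and not the actual implementation). It assumes that a table keeps a row-names array parallel to its cells, that anonymous rows are recorded as None, and that add-row returns a new table. The class and method names are hypothetical stand-ins for the Pyret names above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Row:
    """Hypothetical model of a Row value carrying an optional name."""
    values: Tuple
    name: Optional[str] = None

    def get_name(self) -> Optional[str]:
        # Stands in for the proposed .get-name() -> Option<String>;
        # None plays the role of Option's "none" case.
        return self.name


@dataclass(frozen=True)
class Table:
    """Hypothetical model: row names stored alongside column names."""
    column_names: Tuple[str, ...]
    row_names: Tuple[Optional[str], ...] = ()
    cells: Tuple[Tuple, ...] = ()

    def _check_width(self, values) -> None:
        if len(values) != len(self.column_names):
            raise ValueError("row width does not match the column count")

    def row(self, *values) -> Row:
        # Models table.row(c1, c2, c3): an anonymous row.
        self._check_width(values)
        return Row(tuple(values))

    def named_row(self, name: str, *values) -> Row:
        # Models the proposed table.named-row(name, c1, c2, c3).
        self._check_width(values)
        return Row(tuple(values), name)

    def add_row(self, r: Row) -> "Table":
        # Returns a new table; the original is left untouched.
        self._check_width(r.values)
        return Table(self.column_names,
                     self.row_names + (r.name,),
                     self.cells + (r.values,))

    def row_n(self, index: int) -> Row:
        # Models table.row-n(index), rebuilding the Row with its stored name.
        return Row(self.cells[index], self.row_names[index])
```

In this model a row read back with row_n compares equal to the named row that was added, because the name travels with the row in both directions.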

Our load-table syntax could add a new clause row-names NAME, so that loading a spreadsheet can generate named rows as well as named columns.

If we had a hypothetical matrix library, then we could lift matrix operations to table operations, and generate tables with the appropriate row and column names (and e.g. only tolerate T1 x T2 if T1.column-names match T2.row-names, and if all the values are numeric). If row names are all present and all unique, we could also add a table-transpose operation that exchanges rows for columns, and still have a meaningfully-shaped table.
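As a sketch of what those two operations might check, here is a small Python model. It represents a table as a (row names, column names, cells) triple and assumes that "match" means the same names in the same order. Both that reading and the function names are assumptions for illustration only.

```python
from numbers import Number


def transpose(table):
    """Exchange rows and columns; requires every row named, names unique."""
    row_names, column_names, cells = table
    if any(name is None for name in row_names):
        raise ValueError("transpose needs every row to be named")
    if len(set(row_names)) != len(row_names):
        raise ValueError("transpose needs unique row names")
    new_cells = [[row[j] for row in cells] for j in range(len(column_names))]
    # Old column names become row names, and vice versa.
    return list(column_names), list(row_names), new_cells


def multiply(t1, t2):
    """Lifted matrix product T1 x T2, keeping T1's row names and T2's column names."""
    rows1, cols1, cells1 = t1
    rows2, cols2, cells2 = t2
    # Assumption: names must agree position by position.
    if list(cols1) != list(rows2):
        raise ValueError("T1.column-names must match T2.row-names")
    for cells in (cells1, cells2):
        for row in cells:
            for v in row:
                if isinstance(v, bool) or not isinstance(v, Number):
                    raise ValueError("all values must be numeric")
    product = [[sum(cells1[i][k] * cells2[k][j] for k in range(len(cols1)))
                for j in range(len(cols2))]
               for i in range(len(rows1))]
    return list(rows1), list(cols2), product
```

For the graph-connectivity example, multiplying an adjacency table by itself would give a table of two-step path counts that is still labeled by node name on both axes.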

The symmetry isn't perfect, and leads to a bunch of design questions. table.column(name) returns a List<Col> of values but not a name, as does table.column-n(index). We have table.row-n(index) returning a Row. If Rows store their own names, why don't columns? Should we add a table.row(name) method to match? Should it return an anonymous Row like an anonymous column? (If so, then table.add-row(some-named-row).row(some-named-row.name) would not equal some-named-row...which seems odd.) Should table.add-row(Row) enforce row name-uniqueness? Do we want to tolerate a mixture of anonymous and named rows?
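A tiny Python model can make two of these questions concrete: whether lookup by name keeps the name, and whether add-row rejects duplicates. Rows here are (name, values) pairs with None for anonymous rows. The helpers add_row and row_by_name and their flags are hypothetical, and exist only to show both answers side by side.

```python
def add_row(rows, new_row, enforce_unique=True):
    """Append a (name, values) row, optionally rejecting duplicate names."""
    name, _values = new_row
    if enforce_unique and name is not None:
        if any(existing == name for existing, _ in rows):
            raise ValueError(f"duplicate row name: {name!r}")
    # Anonymous rows (name None) are always accepted in this sketch,
    # so named and anonymous rows can be mixed.
    return rows + [new_row]


def row_by_name(rows, name, keep_name):
    """Look up the first row with this name.

    keep_name=False models returning an anonymous Row, the way
    table.column(name) returns values without a name.
    """
    for existing, values in rows:
        if existing == name:
            return (existing if keep_name else None, values)
    raise KeyError(name)
```

With keep_name=False, looking up a row that was just added gives back something that no longer equals the original, which is the oddity described above. With keep_name=True the round trip is an identity.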

@shriram, @jpolitz, thoughts?
