Right now, our table syntax has a notion of named columns (and the names are required to always be distinct), and rows are entirely anonymous. But for some data examples, e.g. connectivity in a graph, it might make sense for the rows to be labeled as well. This would allow us to treat the contents of the table more symmetrically and more like a matrix, and might be amenable to further manipulation as matrices.
I'm wondering if we should riff off the syntax for spy: / spy "name":, check: / check "name": and do something like row: / row "name":.
I'm pretty sure this is grammatically unambiguous, but it is a breaking change: we would have to make row be a keyword, and not just row: (currently, row can be used as a normal identifier). Is this worth it?
We might also want to symmetrize the names a bit more carefully. Currently, the column names in table syntax are required to be NAME identifier tokens, but in add-column they're allowed to have spaces/not be identifiers. So we might want to loosen the grammar to
table-expr: TABLE table-headers COLON table-rows END
table-headers: [(table-header COMMA)* table-header]
table-header: NAME [COLONCOLON ann]
!!!!!! new !!!!!
table-header: STRING [COLONCOLON ann]
table-rows: [table-row* table-row]
table-row: ROWCOLON table-items
!!!!!!! new !!!!!!!!!
table-row: ROW (STRING | NAME) table-items
Then Row values can have an optional name in them (accessible with .get-name() -> Option<String>), and we'd add a new constructor [raw-named-row(name): ...] to match [raw-row: ...] and a new method table.named-row(name, c1, c2, c3) to match table.row(c1, c2, c3). In the implementation, a table value would store an array of row names just as it does an array of column names.
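To make the shape of this concrete, here is a rough model of the idea in Python rather than Pyret (none of this is the actual implementation). The names Row, Table, named_row, and add_row are stand-ins for the proposed Row value, table.named-row, and table.add-row, and Optional[str] stands in for Option<String>. Whether add-row should check name uniqueness is deliberately left out of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Row:
    """Hypothetical Row value: cell values plus an optional name."""
    values: Tuple[Any, ...]
    name: Optional[str] = None  # None models an anonymous row

    def get_name(self) -> Optional[str]:
        return self.name


class Table:
    """Hypothetical table: row names are stored in an array parallel to
    the rows, just as column names are stored for the columns."""

    def __init__(self, column_names, row_names=(), rows=()):
        column_names = list(column_names)
        if len(set(column_names)) != len(column_names):
            raise ValueError("column names must be distinct")
        self.column_names = column_names
        self.row_names = list(row_names)  # entries are str or None
        self.rows = list(rows)

    def row(self, *cells) -> Row:
        # Models table.row(c1, c2, c3): an anonymous row.
        return self._make_row(None, cells)

    def named_row(self, name: str, *cells) -> Row:
        # Models the proposed table.named-row(name, c1, c2, c3).
        return self._make_row(name, cells)

    def _make_row(self, name, cells) -> Row:
        if len(cells) != len(self.column_names):
            raise ValueError("row width must match the number of columns")
        return Row(tuple(cells), name)

    def add_row(self, row: Row) -> "Table":
        # Returns a new table; the original is left unchanged.
        return Table(self.column_names,
                     self.row_names + [row.name],
                     self.rows + [row.values])
```

In this model an anonymous row just contributes a None entry to the row-name array, which is one way a mixture of named and anonymous rows could be represented.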
Our load-table syntax could add a new clause row-names NAME, so that loading a spreadsheet can generate named rows as well as named columns.
If we had a hypothetical matrix library, then we could lift matrix operations to table operations, and generate tables with the appropriate row and column names (and e.g. only tolerate T1 x T2 if T1.column-names match T2.row-names, and if all the values are numeric). If row names are all present and all unique, we could also add a table-transpose operation that exchanges rows for columns, and still have a meaningfully-shaped table.
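As a sanity check on what the lifted operations would need to verify, here is a small Python model (again a sketch, not Pyret and not an existing library). NamedTable, multiply, and transpose are hypothetical names, and "match" is assumed here to mean the same names in the same order, which is only one possible reading.

```python
class NamedTable:
    """Hypothetical table with both row names and column names."""

    def __init__(self, row_names, column_names, cells):
        self.row_names = list(row_names)      # entries may be None (anonymous)
        self.column_names = list(column_names)
        self.cells = [list(r) for r in cells]
        if len(self.cells) != len(self.row_names):
            raise ValueError("one row name slot per row is required")
        if any(len(r) != len(self.column_names) for r in self.cells):
            raise ValueError("every row must have one cell per column")


def _is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def multiply(t1, t2):
    """T1 x T2, tolerated only if T1's column names match T2's row names
    (assumed: same names, same order) and every value is numeric."""
    if t1.column_names != t2.row_names:
        raise ValueError("T1 column names must match T2 row names")
    if not all(_is_number(v) for t in (t1, t2) for row in t.cells for v in row):
        raise ValueError("all values must be numeric")
    inner = len(t1.column_names)
    cells = [[sum(t1.cells[i][k] * t2.cells[k][j] for k in range(inner))
              for j in range(len(t2.column_names))]
             for i in range(len(t1.row_names))]
    # The result takes its row names from T1 and its column names from T2.
    return NamedTable(t1.row_names, t2.column_names, cells)


def transpose(t):
    """Exchange rows for columns; requires all row names present and unique,
    because they become the column names of the result."""
    if any(n is None for n in t.row_names):
        raise ValueError("all rows must be named")
    if len(set(t.row_names)) != len(t.row_names):
        raise ValueError("row names must be unique")
    cells = [[row[j] for row in t.cells] for j in range(len(t.column_names))]
    return NamedTable(t.column_names, t.row_names, cells)
```

For the graph-connectivity example, multiplying an adjacency table by itself under this model would give a table of two-step path counts that keeps the node names on both axes.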
The symmetry isn't perfect, and leads to a bunch of design questions. table.column(name) returns a List<Col> of values but not a name, as does table.column-n(index). We have table.row-n(index) returning a Row. If Rows store their own names, why don't columns? Should we add a table.row(name) method to match? Should it return an anonymous Row like an anonymous column? (If so, then table.add-row(some-named-row).row(some-named-row.name) would not equal some-named-row...which seems odd.) Should table.add-row(Row) enforce row name-uniqueness? Do we want to tolerate a mixture of anonymous and named rows?
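The equality oddity is easy to see in a toy model. The Python sketch below is not Pyret, and every name in it (Row, lookup_anonymous, lookup_named, add_row_unique) is hypothetical. It contrasts a name lookup that drops the name with one that keeps it, and shows one possible uniqueness policy. It illustrates the questions; it does not answer them.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class Row:
    values: Tuple[Any, ...]
    name: Optional[str] = None  # None models an anonymous row


def lookup_anonymous(rows: List[Row], name: str) -> Row:
    """Option A: row(name) returns the values only, like column(name) does."""
    for r in rows:
        if r.name == name:
            return Row(r.values)  # the name is dropped here
    raise KeyError(name)


def lookup_named(rows: List[Row], name: str) -> Row:
    """Option B: row(name) returns the Row with its name intact."""
    for r in rows:
        if r.name == name:
            return r
    raise KeyError(name)


def add_row_unique(rows: List[Row], row: Row) -> List[Row]:
    """One possible policy: named rows must be unique, anonymous rows are
    always accepted, so mixtures of the two are tolerated."""
    if row.name is not None and any(r.name == row.name for r in rows):
        raise ValueError("duplicate row name: " + row.name)
    return rows + [row]


some_named_row = Row((1, 2, 3), "a")
rows = add_row_unique([], some_named_row)

# Under option A the round trip loses the name, so equality fails.
round_trip_a = lookup_anonymous(rows, "a")
# Under option B the round trip gives back an equal Row.
round_trip_b = lookup_named(rows, "a")
```

Here round_trip_a compares unequal to some_named_row while round_trip_b compares equal, which is the asymmetry described above.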
@shriram, @jpolitz, thoughts?