I feel like the biggest slowdown when working with Polars is that Pyright cannot infer column names in `pl.col('')`. Almost everything else is streamlined and natural, so typing is now the lowest-hanging fruit in terms of usability. I would like to start a conversation about a simple, unobtrusive way to implement column-name hinting. I spent a few minutes experimenting and came up with one way (not necessarily a good one) to achieve it. Of course, this should be viewed with a great deal of skepticism, since the more logical way to do typing is with Pydantic models, but I decided to start simple and unobtrusive.
```python
import polars as pl
from typing import Generic, TypeVar, LiteralString, Literal, Iterable
from decimal import Decimal
from datetime import date, time, timedelta, datetime
from polars.functions.col import ColumnFactory, ColumnFactoryMeta

T = TypeVar("T", bound=LiteralString, covariant=True)


class GExpr(pl.Expr, Generic[T]): ...


IntoExpr = (
    int | float | Decimal | date | time | datetime | timedelta
    | T | bool | bytes | list | GExpr[T] | pl.Series | None
)


class DF(pl.DataFrame, Generic[T]):
    def select(
        self,
        *exprs: IntoExpr[T] | Iterable[IntoExpr[T]],
        **named_exprs: IntoExpr[T],
    ) -> pl.DataFrame:
        return super().select(*exprs, **named_exprs)


class GColumnFactoryMeta(ColumnFactoryMeta):
    def __getitem__(self, item: T) -> GExpr[T]:
        return getattr(self, item)


class col(ColumnFactory, metaclass=GColumnFactoryMeta):
    ...


DF[Literal['a', 'bbb']](
    {'a': [1, 2, 3], 'bbb': [323, 2, 42]}
).select(col['bbb'])  # this allows the popup in vscode due to pyright inferring possible keys, as well as type checking the literal
```
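At runtime, `col['bbb']` works because `__getitem__` is defined on the *metaclass*, so subscripting the class object itself is allowed, and it simply delegates to the attribute-access path (`getattr`) that the metaclass already handles. The sketch below reproduces that delegation with the standard library only; `Expr`, `ColumnFactoryMeta` and `SubscriptableMeta` here are hypothetical toy stand-ins, not the real Polars classes.

```python
from typing import Generic, TypeVar

# The proposal bounds T to LiteralString (Python 3.11+); plain `str` keeps this sketch portable.
T = TypeVar("T", bound=str)


class Expr(Generic[T]):
    """Hypothetical stand-in for GExpr[T]: it only records the column name."""

    def __init__(self, name: T) -> None:
        self.name = name


class ColumnFactoryMeta(type):
    """Toy metaclass: unknown attribute access on the class builds a column expression."""

    def __getattr__(cls, name: str) -> Expr[str]:
        return Expr(name)


class SubscriptableMeta(ColumnFactoryMeta):
    # Defined on the metaclass, so `col["x"]` is resolved on the class object itself.
    def __getitem__(cls, item: T) -> Expr[T]:
        # Same delegation as the proposal: reuse the attribute-access path.
        return getattr(cls, item)


class col(metaclass=SubscriptableMeta):
    pass


expr = col["bbb"]  # SubscriptableMeta.__getitem__ -> ColumnFactoryMeta.__getattr__
print(type(expr).__name__, expr.name)  # Expr bbb
```

Only the `__getitem__` annotation carries the column-name type; the runtime object is whatever the attribute path returns, which is why the proposal can declare `GExpr[T]` while reusing the existing factory.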
Obviously, the DataFrame constructor (`__init__`) would have to change to accommodate the generic, possibly by using the `schema` parameter.
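A minimal, stdlib-only sketch of the schema-parameter idea, assuming a hypothetical `TypedFrame` class in place of `pl.DataFrame`: the `Literal` of column names is written once and reused, via `typing.get_args`, as the runtime schema, so the constructor can reject data whose columns do not match the type parameter. This is an illustration of the direction, not a proposed Polars API.

```python
from typing import Generic, Literal, TypeVar, get_args

# The proposal bounds T to LiteralString (Python 3.11+); plain `str` keeps this sketch portable.
T = TypeVar("T", bound=str)


class TypedFrame(Generic[T]):
    """Hypothetical toy frame, generic over its column names (not a Polars class)."""

    def __init__(self, data: dict[T, list[object]], schema: tuple[T, ...]) -> None:
        # Check at runtime that the data has exactly the columns the schema promises.
        if set(data) != set(schema):
            raise ValueError(f"data columns {sorted(data)} do not match schema {list(schema)}")
        self._data = data
        self.schema = schema

    def column(self, name: T) -> list[object]:
        # A type checker only accepts names from the Literal the frame was built with.
        return self._data[name]


Cols = Literal["a", "bbb"]
# get_args(Cols) == ("a", "bbb"), so the Literal doubles as the runtime schema.
frame: TypedFrame[Cols] = TypedFrame(
    {"a": [1, 2, 3], "bbb": [323, 2, 42]}, schema=get_args(Cols)
)
print(frame.column("bbb"))  # [323, 2, 42]
```

Writing the column names once as a `Literal` alias avoids keeping a separate schema and type parameter in sync by hand, although it still requires the names to be spelled out explicitly.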
Some of my own criticisms of this approach: it requires an explicit schema in the constructor, and it does not address Pydantic well :)
I understand this is not a priority, but if this approach looks roughly right, I can implement the typing changes and open a PR for a closer and more comprehensive review in situ.
iliya-malecki changed the title from "DataFrame generic over column names" to "DataFrame generic over column names for pl.col convenience" on Jan 16, 2025.