CSV reader issue on Windows with polars rs-0.44.2 #1288
Comments
I don't have access to a Windows machine, so I'll need a VM. I'll try that later today if possible.
Oh, sorry. I mistakenly assumed you were using Windows for development. One option is to do a release and wait for user feedback rather than spending time debugging on Windows. What do you think about doing a release in this state?
Actually, I had a Windows machine until recently, but I no longer have access to it; you couldn't have known.
I'd rather try to fix this before the release. I'll take a look tonight. Also I'd like to check that it doesn't break anything in
OK, thanks. In fact, I usually use a Windows machine, but I only use Docker for programming. I had a bad experience with Rust and cmake on Windows in the past, so I am not confident about building polars on Windows.
I can reproduce this error with r-polars on Windows. If I use py-polars, using the URL directly works fine. However, if I use R's (It's very annoying to do this in a VM, so I'm looking for an easy fix on the R side; I don't plan on making larger changes in Rust.)
@eitsupi I'm going to bed. If you're fine with the PR to close this issue, then you can make a new release, I prepared
Thanks! Another thing I noticed is that Line 227 in 2899703
https://github.com/apache/arrow/blob/ad752482f819df638d9438feff6cad3c49946fc7/r/R/io.R#L247
I think I understand the problem: if mode is not specified (
In this case, the original CSV file ends with \r\n, whereas the broken one ends with \r\r\n, as the two dumps show:
$ od -c orig.csv | tail
5621460 B u i l d i n g W a s h i n
5621500 g t o n D C 2 0 5 1 5 - 0 9
5621520 0 1 , 2 0 2 - 2 2 5 - 4 1 3 6 ,
5621540 , , , , , , , , G 0 0 0 5 7 8 ,
5621560 , N 0 0 0 3 9 5 0 3 , , H 6 F L
5621600 0 1 1 1 9 , 1 0 4 5 2 8 , 4 1 2
5621620 6 9 0 , 1 1 7 1 0 1 , M a t t
5621640 G a e t z , , 2 1 7 1 9 , M a t
5621660 t G a e t z \r \n
5621671
$ od -c broken.csv | tail
5651240 f f i c e B u i l d i n g W
5651260 a s h i n g t o n D C 2 0 5
5651300 1 5 - 0 9 0 1 , 2 0 2 - 2 2 5 -
5651320 4 1 3 6 , , , , , , , , , G 0 0
5651340 0 5 7 8 , , N 0 0 0 3 9 5 0 3 ,
5651360 , H 6 F L 0 1 1 1 9 , 1 0 4 5 2
5651400 8 , 4 1 2 6 9 0 , 1 1 7 1 0 1 ,
5651420 M a t t G a e t z , , 2 1 7 1
5651440 9 , M a t t G a e t z \r \r \n
5651457
So I think
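For readers who want to see how a `\r\r\n` ending like the one in broken.csv can arise, here is a minimal Python sketch. It assumes the cause is a text-mode write that expands every `\n` to `\r\n` on content that already uses `\r\n`; the thread does not confirm the exact call involved. The helper names and the sample rows are hypothetical. Passing `newline="\r\n"` to `open` is only used to mimic Windows text-mode output on any platform. The last line reuses the two final offsets printed by `od` above, which are octal.

```python
import os
import tempfile


def write_in_text_mode(payload: str, path: str) -> None:
    """Write text while expanding every "\\n" to "\\r\\n".

    newline="\\r\\n" mimics Windows text-mode output on any platform.
    """
    with open(path, "w", newline="\r\n", encoding="utf-8") as fh:
        fh.write(payload)


def count_doubled_cr(path: str) -> int:
    """Count "\\r\\r\\n" sequences by reading the file as raw bytes."""
    with open(path, "rb") as fh:
        return fh.read().count(b"\r\r\n")


def repair_doubled_cr(path: str) -> None:
    """Collapse "\\r\\r\\n" back to "\\r\\n", rewriting the file in binary mode."""
    with open(path, "rb") as fh:
        data = fh.read()
    with open(path, "wb") as fh:
        fh.write(data.replace(b"\r\r\n", b"\r\n"))


# Hypothetical two-line CSV that already uses CRLF line endings.
SAMPLE = "id,name\r\n1,example\r\n"

tmp_dir = tempfile.mkdtemp()
broken_path = os.path.join(tmp_dir, "broken.csv")

write_in_text_mode(SAMPLE, broken_path)
doubled_before = count_doubled_cr(broken_path)

repair_doubled_cr(broken_path)
doubled_after = count_doubled_cr(broken_path)
with open(broken_path, "rb") as fh:
    repaired_bytes = fh.read()

# od prints offsets in octal; the difference is the size gap in bytes.
extra_bytes = int("5651457", 8) - int("5621671", 8)
print(doubled_before, doubled_after, extra_bytes)
```

By that arithmetic, broken.csv is 12150 bytes larger than orig.csv. That would be consistent with one extra `\r` per line, although the thread does not state the line count. Writing the file in binary mode avoids the translation entirely, which is why the repair step opens the file with `"wb"`.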
It seems that an error is occurring in the following location only on Windows.
r-polars/vignettes/userguide.Rmd
Lines 229 to 240 in a37b3c0
The cell in the CSV in question now looks like this, and it contains a comma.
@etiennebacher Could you please check if you can reproduce the problem in Windows (R and Python)?
Maybe related to this change: pola-rs/polars#19088
Originally posted by @eitsupi in #1271 (comment)