-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
read.xlsx() does not return the correct number of rows #307
Comments
Okay, now - that you provide an example - I see what you mean and I agree that this behavior is unpleasant. Though I don't see an easy way to fix this cleanly. The issue is within |
I won't pretend to know anything about the internals/technical details so I can't suggest the right fix. But with the risk of being ignorant, I would suggest adding some sort of padding with NULL values that can happen in read.xlsx if you think it doesn't belong in getCellInfo. Or perhaps getCellInfo can gain a boolean parameter of whether or not to always return the expected dimensions or if to simplify. |
Another way to see the dangerous bugs this can cause: Suppose you have this Excel data
When I try to grab different sets of rows, I always get the same output, which makes it impossible for me to know what the actual data in the sheet looks like! Rows 2:3, 2:4, 3:4, 2:5 - all of them return just a single value "a" and I don't know what row it's in. |
I have added a pull request #309 but merely to draft the issue. With this pull request the output looks as follows. Though this draft is not ment for merging. There are two reasons for this:
> library(openxlsx)
> read.xlsx("~/gh_issue_307.xlsx")
Row Value
1 1 <NA>
2 2 a
3 3 <NA>
> read.xlsx("~/gh_issue_307.xlsx", rows = 2:3, cols = 2:3,
+ colNames = FALSE, skipEmptyRows = FALSE, skipEmptyCols = TRUE)
X2
1 a
2 <NA>
> read.xlsx("~/gh_issue_307.xlsx", rows = 2:3, cols = 2:3,
+ colNames = FALSE, skipEmptyRows = TRUE, skipEmptyCols = FALSE)
X2 X3
1 a NA
> read.xlsx("~/gh_issue_307.xlsx", rows = 2:3, cols = 2:3,
+ colNames = FALSE, skipEmptyRows = FALSE, skipEmptyCols = FALSE)
X2 X3
1 a NA
2 <NA> NA
> read.xlsx("~/gh_issue_304.xlsx")
Row Value
1 1 a
2 2 b
3 3 <NA>
4 4 <NA>
5 5 <NA>
6 6 c
> read.xlsx("~/gh_issue_304.xlsx", rows = 2:4, cols = 2,
+ colNames = FALSE, skipEmptyRows = F)
X2
1 a
2 b
3 <NA>
> read.xlsx("~/gh_issue_304.xlsx", rows = 6:8, cols = 1:3,
+ colNames = FALSE, skipEmptyRows = F, skipEmptyCols = F)
X1 X2 X3
1 5 <NA> NA
2 6 c NA
3 NA <NA> NA |
Regarding the first point: I'm also extremely conservative myself wrt to backwards compatibility/breaking changes, so I would think it's best to add a parameter and not change the default behaviour. |
* change a few checks for cc as uninitializedField to NULL * update C++ loadvals() to use Environment (R6) rather than Reference * add more missing `self$` * some cleanup in other files from debugging
This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
This is still relevant, bot |
This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
@ycphs are there plans to resolve this bug? |
This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
(Duplicate of #304 but it was closed prematurely and I don't have the option of commenting+reopening).
To continue from the last comment there: @JanMarvin I think you misunderstood the issue. You'll see the problem is you use
cols = 2
in your example.You can use this test file
test.xlsx
Calling
openxlsx::read.xlsx("test.xlsx", rows = 2:4, cols = 2, colNames = FALSE)
returnsBut it SHOULD return
Another way to make the problem obvious is trying to read rows 4:6, which will return the following:
But asking for 4:7 returns a 1-row dataframe. It should return 4 rows.
The text was updated successfully, but these errors were encountered: