Hello xarray community, I'm currently working on a project involving mid-to-large datasets and have encountered a warning related to chunking that I'm hoping to get some clarification on. The warning message I'm receiving is as follows:

I do not understand the reason for this warning, especially since when I initially open the dataset with no chunk sizes specified and check the `.chunksizes` property, the result indicates that there are no chunks. However, when I attempt to specify chunks along the "x" dimension starting at index 1000, I receive this warning.

Could someone please provide some insight into why this warning occurs and how I can address it? Specifically, I'm interested in understanding the implications of the warning message in terms of performance degradation. Thank you in advance for your help and insights!

The dataset I am currently using is the raster file of the Harmonized World Soil Database v2.0. This is the code that shows the warning:
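A minimal sketch of the kind of call being described, assuming the raster is opened through the rasterio backend (which requires rioxarray); the file name and chunk size are illustrative assumptions, not the original snippet:

```python
import xarray as xr

# Illustrative path/engine for the HWSD2 raster; assumes rioxarray provides
# the "rasterio" backend. Not the original snippet.
hwsd2_rast = xr.open_dataset("HWSD2.bil", engine="rasterio")

# Without a chunks argument there are no dask chunks, so this is empty.
print(hwsd2_rast.chunksizes)

# Requesting chunks along "x" at open time is what triggers the warning.
hwsd2_rast = xr.open_dataset("HWSD2.bil", engine="rasterio", chunks={"x": 1000})
```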
With chunks specified for "x", the warning appears. If I remove the chunks for "x" only, there is no warning. However, these are the chunk sizes reported for the dataset:
Thank you!
Replies: 1 comment 3 replies
Can you check `hwsd2_rast["<variable1>"].encoding`? This will tell you the size of the chunks on disk, which are the smallest unit of what we can load into memory. If your dataset is not chunked along
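A sketch of that check, assuming the dataset was opened as in the question; `"<variable1>"` is a placeholder for one of the data variables, and which chunk-related keys appear depends on the backend:

```python
import xarray as xr

# Assumed setup: the HWSD2 raster opened as in the question (illustrative path).
hwsd2_rast = xr.open_dataset("HWSD2.bil", engine="rasterio")

# "<variable1>" is a placeholder for an actual data variable name.
enc = hwsd2_rast["<variable1>"].encoding
print(enc)

# Chunk-related entries, if the backend recorded any (key names vary by backend).
for key in ("chunksizes", "preferred_chunks"):
    if key in enc:
        print(key, "->", enc[key])
```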
Not quite: the chunk size tells you the size of the chunks, not the number of chunks along that dimension. In this case, your dataset appears to store the data row by row. Thus, instead of somewhat rectangular chunks, it appears you have a single chunk along `x` and many 1-sized chunks along `y`.

With the code you posted, you'll load 5000 of these on-disk chunks into memory on a worker, then split them into 9 smaller chunks along `x` (which is what the warning recommended). For that to work properly, however, you need to specify `chunks={}` in the `open_dataset` call, which will give you the chunk size on disk.
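A sketch of that suggestion, with the path and the target chunk sizes as illustrative assumptions:

```python
import xarray as xr

# chunks={} asks xarray to create dask chunks matching the on-disk layout
# (here: a single chunk along "x", 1-sized chunks along "y").
hwsd2_rast = xr.open_dataset("HWSD2.bil", engine="rasterio", chunks={})

# Rechunk after loading to the layout you actually want to compute with;
# the sizes below are illustrative, not taken from the original post.
hwsd2_rast = hwsd2_rast.chunk({"x": 1000, "y": 5000})

print(hwsd2_rast.chunksizes)
```

Starting from the on-disk chunking and rechunking afterwards lets dask merge or split stored chunks during the computation, rather than splitting them at read time, which is what the warning cautions against.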