-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stackstac.stack doesn't read overviews #196
Comments
More specifically, my issue is to create a stackstac that reads overviews, so I can for example create a median value composite over a country-size roi reading overviews. |
Slightly simpler example:
With rasterio reading the overview explicitly:
Via stackstac
Next step would likely be to inspect the VRT that stackstac is building to figure out why GDAL isn't using the overviews when reading via it. |
I'm not sure how stackstac is using VRT but if it is using
|
Thanks Vincent! That's good to know. A couple modifications to my example, which should get us closer (but won't necessarily work for the full example): ds2 = stackstac.stack([item], resolution=((640.8371040723982, 640.7645259938838)), assets=["vv"], dtype="float32", bounds=ds1.rio.bounds(), snap_bounds=False).squeeze()
ds2 Still slow. This (maybe?) gets the VRT that stackstac is building. It's perhaps not identical, since I'm not getting it directly out of the task graph. Instead, I'm rebuilding it and might have messed something up: import numpy as np
import rasterio.enums
dsk = ds2.__dask_graph__()
asset_table = dsk[list(dsk)[0]]
spec = ds2.attrs["spec"]
resampling = rasterio.enums.Resampling.nearest
dtype = "float32"
fill_value = np.nan
rescale = True
gdal_env = None
errors_as_nodata = ()
reader = stackstac.rio_reader.AutoParallelRioReader
x = stackstac.to_dask.asset_table_to_reader_and_window(
asset_table,
spec,
resampling,
dtype,
fill_value,
rescale,
gdal_env,
errors_as_nodata,
reader,
)
reader, window = x[0, 0] |
It does look like the WarpedVRT construction behind the scenes results in many GET requests to the full-resolution data rather than using an overview, a simple test is to check out the rasterio logs trying to just extract a single pixel value: import logging
logger = logging.getLogger('rasterio')
logger.level = logging.DEBUG
logger.addHandler(logging.StreamHandler())
item = planetary_computer.sign(pystac.read_file("https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-rtc/items/S1A_IW_GRDH_1SDV_20210129T174036_20210129T174101_036356_044426_rtc"))
href = item.assets["vv"].href
# Fixing the metadata (see linked issue above)
item.properties['proj:shape'] = [20953, 28325]
item.properties['proj:bbox'] = [325900.0, 5265850.0, 609150.0, 5475380.0]
da = stackstac.stack([item],
resolution=((640.8371040723982, 640.7645259938838)),
assets=["vv"],
dtype="float32",
xy_coords='center',
snap_bounds=False
).squeeze()
# 327 GET Requests! (should be 2-3)
with rasterio.Env(CPL_CURL_VERBOSE=True):
da.isel(x=0,y=0).load() I wonder if setting resolution only should just avoid WarpedVRT altogether (with a check that all items have same EPSG)? Or perhaps |
Interesting, thanks for reporting this @brunosan. Basically sounds like GDAL / rasterio only uses overviews when an stackstac is currently constructing a VRT up front which defines both the output CRS and the output resolution. So we're defining the resolution at dataset-open time, instead of at read time. The downscaling is defined in by the VRT, not by the I'd assumed these would be have equivalently, but it sounds like we actually need to construct the VRT at full resolution (and only if we need to change CRSs, otherwise no need for a VRT at all), then call |
Thanks everyone for the digging. import odc.stac,odc
import numpy as np
import planetary_computer
import pystac
item = planetary_computer.sign(pystac.read_file("https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-1-rtc/items/S1A_IW_GRDH_1SDV_20210129T174036_20210129T174101_036356_044426_rtc"))
ds = odc.stac.load(
[item],
chunks={},
bands="vv",
crs="EPSG:32643",
resolution=odc.geo.Resolution(640, -640)).where(lambda x: x > 0, other=np.nan)["vv"]
%time _ = ds.load()
|
Yeah, odc-stac is explicitly picking an overview level: Additionally, odc-stac is reading directly through a warp when reprojection is required, rather than a VRT like stackstac: https://github.com/opendatacube/odc-stac/blob/da03bde0ffafb158e8f56429c9b5667ee38901e3/odc/stac/_reader.py#L144-L152 @Kirill888 / @vincentsarago, do you know if there's an advantage to using warp directly for reprojection, instead of a VRT? |
I looked at it properly way before gdal 3, back then I could not convince VRT to use larger block size than 256x256, and at that size it was painfully slow. As far as overviews go, I could not reliably trigger use of overviews when combining read with projection change, only reliable way to force use of overview was to read at the native resolution of the overview itself. EDIT: I see this is related to S1 RTC which are not GCP |
On my side, I know we spent quite some time making sure we were getting data from the overview (see Sean's comment rasterio/rasterio#1373 (comment)) (other interesting link cogeotiff/rio-tiler#132) Using WarpedVRT works to fetch the overview but you need to take extra care about how you construct the WarpedVRT (see Sean's comment and https://github.com/cogeotiff/rio-tiler/blob/25fe5b30e743388027f3ae1f0cc96177cd6996fe/rio_tiler/utils.py#L223-L298). Something I just changed in rio-tiler, is that when no reprojection is needed we don't use WarpedVRT anymore, but I'm not sure it changes a lot the performance 🤷 As for
I don't. It would be great to see some benchmark (maybe using https://github.com/developmentseed/tilebench) |
I just added some tests in rio-tiler to make sure we're fetching overview (even when using WarpedVRT or a WarpedVRT): https://github.com/cogeotiff/rio-tiler/blob/main/tests/test_overview.py I've made 2 files where each IFD has it's own value, so it's pretty easy to know which IFD you are reading: |
Any updates on this? I think from a user's perspective being able to leverage overviews would be of great importance. |
The documentation implies that setting
resolution
uses overview, but as far as I can see, that it doesn't.It seems that
stackstac.stack
doesn't use or can read COG overviews. This prevents several use cases such a large composites. The parameterresolution
seems to be an output spec, not an input or read spec.I can test this by the timings here (as a one notebook here) trying to get the same resolution from a large COG:
and then use
stackstac.stack
:Yields
Versus
yields
cc @TomAugspurger
The text was updated successfully, but these errors were encountered: