diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8edf1cde5..41f97981a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,8 +6,8 @@ We will continue to maintain Mosaic for the foreseeable future, including bug fi
 
 This release includes a number of enhancements and fixes, detailed below.
 
-### Raster checkpointing is enabled by default
-Fuse-based checkpointing for raster operations is now enabled by default and managed through:
+### Raster checkpointing functions
+Fuse-based checkpointing for raster operations is disabled by default but can be enabled and managed through:
 - spark configs `spark.databricks.labs.mosaic.raster.use.checkpoint` and `spark.databricks.labs.mosaic.raster.checkpoint`.
 - python: `mos.enable_gdal(spark, with_checkpoint_path=path)`.
 - scala: `MosaicGDAL.enableGDALWithCheckpoint(spark, path)`.
@@ -23,6 +23,7 @@ We plan further enhancements to this feature (including automatic cleanup of che
 - `RST_Clip` now exposes the GDAL Warp option `CUTLINE_ALL_TOUCHED` which determines whether or not any given pixel is included whether the clipping geometry crosses the centre point of the pixel (false) or any part of the pixel (true). The default is true but this is now configurable.
 - Within clipping operations such as `RST_Clip` we now correctly set the CRS in the generated Shapefile Feature Layer used for clipping. This means that the CRS of the input geometry will be respected when clipping rasters.
 - Added two new functions for getting and upcasting the datatype of a raster band: `RST_Type` and `RST_UpdateType`. Use these for ensuring that the datatype of a raster is appropriate for the operations being performed, e.g. upcasting the types of integer-typed input rasters before performing raster algebra like NDVI calculations where the result needs to be a float.
+ - Added `RST_AsFormat`, a function that translates rasters between formats e.g. from NetCDF to GeoTIFF.
 - The logic underpinning `RST_MemSize` (and related operations) has been updated to fall back to estimating based on the raster dimensions and data types of each band if the raster is held in-memory.
 - `RST_To_Overlapping_Tiles` is renamed `RST_ToOverlappingTiles`. The original expression remains but is marked as deprecated.
 - `RST_WorldToRasterCoordY` now returns the correct `y` value (was returning `x`)
diff --git a/docs/source/api/raster-format-readers.rst b/docs/source/api/raster-format-readers.rst
index 7e77f39d6..0974b3ee3 100644
--- a/docs/source/api/raster-format-readers.rst
+++ b/docs/source/api/raster-format-readers.rst
@@ -98,17 +98,27 @@ The output of the reader is a DataFrame with the following columns:
 mos.read().format("raster_to_grid")
 ***********************************
 Reads a GDAL raster file and converts it to a grid.
+
 It uses a pattern similar to standard spark.read.format(*).option(*).load(*) pattern.
 The only difference is that it uses :code:`mos.read()` instead of :code:`spark.read()`.
+
 The raster pixels are converted to grid cells using specified combiner operation (default is mean).
 If the raster pixels are larger than the grid cells, the cell values can be calculated using interpolation.
 The interpolation method used is Inverse Distance Weighting (IDW) where the distance function is a k_ring
 distance of the grid.
+
+Rasters can be transformed into different formats as part of this process in order to overcome problems with bands
+being translated into subdatasets by some GDAL operations. Our recommendation is to specify :code:`GTiff` if you run into problems here.
+
+Raster checkpointing should be enabled to avoid memory issues when working with large rasters. See :doc:`Checkpointing ` for more information.
+
 The reader supports the following options:
 
     * fileExtension - file extension of the raster file (StringType) - default is *.*
     * vsizip - if the rasters are zipped files, set this to true (BooleanType)
     * resolution - resolution of the output grid (IntegerType)
+    * sizeInMB - size of subdivided rasters in MB. Must be supplied, must be a positive integer (IntegerType)
+    * convertToFormat - convert the raster to a different format (StringType)
     * combiner - combiner operation to use when converting raster to grid (StringType) - default is mean
     * retile - if the rasters are too large they can be re-tiled to smaller tiles (BooleanType)
     * tileSize - size of the re-tiled tiles, tiles are always squares of tileSize x tileSize (IntegerType)
@@ -131,14 +141,19 @@ The reader supports the following options:
 .. tabs::
     .. code-tab:: py
 
-        df = mos.read().format("raster_to_grid")\
-            .option("fileExtension", "*.tif")\
-            .option("resolution", "8")\
-            .option("combiner", "mean")\
-            .option("retile", "true")\
-            .option("tileSize", "1000")\
-            .option("kRingInterpolate", "2")\
+        df = (
+            mos.read()
+            .format("raster_to_grid")
+            .option("sizeInMB", "16")
+            .option("convertToFormat", "GTiff")
+            .option("resolution", "0")
+            .option("readSubdataset", "true")
+            .option("subdatasetName", "t2m")
+            .option("retile", "true")
+            .option("tileSize", "600")
+            .option("combiner", "avg")
             .load("dbfs:/path/to/raster.tif")
+        )
         df.show()
         +--------+--------+------------------+
         |band_id |cell_id |cell_value        |
@@ -151,14 +166,17 @@ The reader supports the following options:
 
     .. code-tab:: scala
 
-        val df = MosaicContext.read.format("raster_to_grid")
-            .option("fileExtension", "*.tif")
-            .option("resolution", "8")
-            .option("combiner", "mean")
-            .option("retile", "true")
-            .option("tileSize", "1000")
-            .option("kRingInterpolate", "2")
-            .load("dbfs:/path/to/raster.tif")
+        val df = MosaicContext.read
+            .format("raster_to_grid")
+            .option("sizeInMB", "16")
+            .option("convertToFormat", "GTiff")
+            .option("resolution", "0")
+            .option("readSubdataset", "true")
+            .option("subdatasetName", "t2m")
+            .option("retile", "true")
+            .option("tileSize", "600")
+            .option("combiner", "avg")
+            .load("dbfs:/path/to/raster.tif")
         df.show()
         +--------+--------+------------------+
         |band_id |cell_id |cell_value        |
diff --git a/docs/source/conf.py b/docs/source/conf.py
index e01d5e4d0..4796c392c 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -22,7 +22,7 @@ author = 'Milos Colic, Stuart Lynn, Michael Johns, Robert Whiffin'
 
 
 # The full version, including alpha/beta/rc tags
-release = "v0.4.2"
+release = "v0.4.3"
 
 
 # -- General configuration ---------------------------------------------------
diff --git a/src/main/scala/com/databricks/labs/mosaic/datasource/multiread/RasterAsGridReader.scala b/src/main/scala/com/databricks/labs/mosaic/datasource/multiread/RasterAsGridReader.scala
index 4d48a4a5d..0b632fa41 100644
--- a/src/main/scala/com/databricks/labs/mosaic/datasource/multiread/RasterAsGridReader.scala
+++ b/src/main/scala/com/databricks/labs/mosaic/datasource/multiread/RasterAsGridReader.scala
@@ -64,7 +64,6 @@ class RasterAsGridReader(sparkSession: SparkSession) extends MosaicDataFrameRead
         } else {
             lit(config("convertToFormat"))
         }
-        val rasterToGridCombiner = getRasterToGridFunc(config("combiner"))
         val loadedDf = retiledDf