Final tweaks before release #598

Merged: 18 commits, Nov 14, 2024
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -6,8 +6,8 @@ We will continue to maintain Mosaic for the foreseeable future, including bug fi

This release includes a number of enhancements and fixes, detailed below.

-### Raster checkpointing is enabled by default
-Fuse-based checkpointing for raster operations is now enabled by default and managed through:
+### Raster checkpointing functions
+Fuse-based checkpointing for raster operations is disabled by default but can be enabled and managed through:
- spark configs `spark.databricks.labs.mosaic.raster.use.checkpoint` and `spark.databricks.labs.mosaic.raster.checkpoint`.
- python: `mos.enable_gdal(spark, with_checkpoint_path=path)`.
- scala: `MosaicGDAL.enableGDALWithCheckpoint(spark, path)`.
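
The two Spark configs listed above can be assembled in one place before being applied to a session. The following is a minimal sketch, not Mosaic code: the helper `checkpoint_confs`, the string values `"true"`/`"false"`, and the example path are all assumptions made for illustration; only the two config keys come from the changelog.

```python
# Config keys as named in the changelog entry above.
USE_CHECKPOINT_KEY = "spark.databricks.labs.mosaic.raster.use.checkpoint"
CHECKPOINT_PATH_KEY = "spark.databricks.labs.mosaic.raster.checkpoint"


def checkpoint_confs(path: str, enabled: bool = True) -> dict:
    """Hypothetical helper: return the Spark config entries for raster checkpointing.

    The "true"/"false" string encoding is an assumption, not confirmed by the source.
    """
    if enabled and not path:
        raise ValueError("a checkpoint path is required when checkpointing is enabled")
    return {
        USE_CHECKPOINT_KEY: str(enabled).lower(),
        CHECKPOINT_PATH_KEY: path,
    }


# Example path only; substitute a fuse-mounted location that exists in your workspace.
confs = checkpoint_confs("/dbfs/tmp/mosaic/checkpoint")
for key, value in confs.items():
    # With a live SparkSession this would be: spark.conf.set(key, value)
    print(f"{key}={value}")
```

In practice the `mos.enable_gdal(spark, with_checkpoint_path=path)` call listed above is likely the simpler route; the sketch only makes the underlying keys explicit.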
@@ -23,6 +23,7 @@ We plan further enhancements to this feature (including automatic cleanup of che
- `RST_Clip` now exposes the GDAL Warp option `CUTLINE_ALL_TOUCHED`, which determines whether a pixel is included only when the clipping geometry covers its centre point (false) or whenever the geometry touches any part of the pixel (true). The default is true, but this is now configurable.
- Within clipping operations such as `RST_Clip` we now correctly set the CRS in the generated Shapefile Feature Layer used for clipping. This means that the CRS of the input geometry will be respected when clipping rasters.
- Added two new functions for getting and upcasting the datatype of a raster band: `RST_Type` and `RST_UpdateType`. Use these for ensuring that the datatype of a raster is appropriate for the operations being performed, e.g. upcasting the types of integer-typed input rasters before performing raster algebra like NDVI calculations where the result needs to be a float.
+- Added `RST_AsFormat`, a function that translates rasters between formats e.g. from NetCDF to GeoTIFF.
- The logic underpinning `RST_MemSize` (and related operations) has been updated to fall back to estimating based on the raster dimensions and data types of each band if the raster is held in-memory.
- `RST_To_Overlapping_Tiles` is renamed `RST_ToOverlappingTiles`. The original expression remains but is marked as deprecated.
- `RST_WorldToRasterCoordY` now returns the correct `y` value (was returning `x`).
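
To illustrate the `CUTLINE_ALL_TOUCHED` distinction described in the `RST_Clip` entry above, here is a small pure-Python sketch. It is a simplification under stated assumptions: the clip geometry is an axis-aligned rectangle in pixel coordinates, each pixel is a unit square, and the function names are hypothetical. It does not reproduce GDAL's actual rasterisation rules.

```python
def pixel_selected(px, py, clip, all_touched):
    """Decide whether the unit pixel at column px, row py is kept.

    clip is (xmin, ymin, xmax, ymax), an axis-aligned rectangle (an assumption;
    real clipping geometries can be arbitrary polygons).
    """
    xmin, ymin, xmax, ymax = clip
    if all_touched:
        # Keep the pixel if the rectangle overlaps any part of it.
        return px < xmax and px + 1 > xmin and py < ymax and py + 1 > ymin
    # Otherwise keep the pixel only if its centre point lies inside the rectangle.
    cx, cy = px + 0.5, py + 0.5
    return xmin <= cx <= xmax and ymin <= cy <= ymax


def selected_pixels(width, height, clip, all_touched):
    """List the (column, row) pairs kept in a width x height raster."""
    return [
        (x, y)
        for y in range(height)
        for x in range(width)
        if pixel_selected(x, y, clip, all_touched)
    ]


clip = (0.6, 0.6, 2.4, 2.4)
centre_only = selected_pixels(4, 4, clip, all_touched=False)
any_touch = selected_pixels(4, 4, clip, all_touched=True)
print(len(centre_only), len(any_touch))  # 1 9
```

With this rectangle, only the pixel whose centre falls inside the clip region survives in the `false` case, while the `true` case keeps the full 3 x 3 block of pixels the rectangle overlaps.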
48 changes: 33 additions & 15 deletions docs/source/api/raster-format-readers.rst
@@ -98,17 +98,27 @@ The output of the reader is a DataFrame with the following columns:
mos.read().format("raster_to_grid")
***********************************
Reads a GDAL raster file and converts it to a grid.

It follows the standard spark.read.format(*).option(*).load(*) pattern.
The only difference is that it uses :code:`mos.read()` instead of :code:`spark.read()`.

The raster pixels are converted to grid cells using the specified combiner operation (default is mean).
If the raster pixels are larger than the grid cells, the cell values can be calculated using interpolation.
The interpolation method used is Inverse Distance Weighting (IDW) where the distance function is a k_ring
distance of the grid.
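
The interpolation step described above can be sketched in plain Python. This is an illustration of generic Inverse Distance Weighting with an integer ring distance standing in for the k_ring distance; the function name, the `power` exponent, and the treatment of distance zero are assumptions, not Mosaic's implementation.

```python
def idw_interpolate(known, power=1.0):
    """Estimate a cell value from neighbouring cells with known values.

    known: list of (k_distance, value) pairs, where k_distance is the ring
    distance (a non-negative integer) from the target cell to a known cell.
    power: IDW exponent; the value Mosaic uses is not stated in the source.
    """
    if not known:
        raise ValueError("at least one known cell is required")
    for distance, value in known:
        if distance == 0:
            # The target cell itself has a value; no interpolation needed.
            return value
    weights = [1.0 / (distance ** power) for distance, _ in known]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, known))
    return weighted_sum / sum(weights)


# One neighbour in ring 1 with value 10.0 and one in ring 2 with value 40.0.
estimate = idw_interpolate([(1, 10.0), (2, 40.0)])
print(estimate)  # 20.0
```

Nearer cells dominate the estimate: the ring-1 neighbour carries twice the weight of the ring-2 neighbour, pulling the result towards 10.0.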

+Rasters can be transformed into different formats as part of this process in order to overcome problems with bands
+being translated into subdatasets by some GDAL operations. Our recommendation is to specify :code:`GTiff` if you run into problems here.
+
+Raster checkpointing should be enabled to avoid memory issues when working with large rasters. See :doc:`Checkpointing </usage/raster-checkpointing>` for more information.

The reader supports the following options:

* fileExtension - file extension of the raster file (StringType) - default is *.*
* vsizip - if the rasters are zipped files, set this to true (BooleanType)
* resolution - resolution of the output grid (IntegerType)
+* sizeInMB - size of subdivided rasters in MB. Must be supplied, must be a positive integer (IntegerType)
+* convertToFormat - convert the raster to a different format (StringType)
* combiner - combiner operation to use when converting raster to grid (StringType) - default is mean
* retile - if the rasters are too large they can be re-tiled to smaller tiles (BooleanType)
* tileSize - size of the re-tiled tiles, tiles are always squares of tileSize x tileSize (IntegerType)
@@ -131,14 +141,19 @@ The reader supports the following options:
.. tabs::
.. code-tab:: py

-df = mos.read().format("raster_to_grid")\
-    .option("fileExtension", "*.tif")\
-    .option("resolution", "8")\
-    .option("combiner", "mean")\
-    .option("retile", "true")\
-    .option("tileSize", "1000")\
-    .option("kRingInterpolate", "2")\
+df = (
+    mos.read()
+    .format("raster_to_grid")
+    .option("sizeInMB", "16")
+    .option("convertToFormat", "GTiff")
+    .option("resolution", "0")
+    .option("readSubdataset", "true")
+    .option("subdatasetName", "t2m")
+    .option("retile", "true")
+    .option("tileSize", "600")
+    .option("combiner", "avg")
     .load("dbfs:/path/to/raster.tif")
+)
df.show()
+--------+--------+------------------+
|band_id |cell_id |cell_value |
@@ -151,14 +166,17 @@ The reader supports the following options:

.. code-tab:: scala

-val df = MosaicContext.read.format("raster_to_grid")
-    .option("fileExtension", "*.tif")
-    .option("resolution", "8")
-    .option("combiner", "mean")
-    .option("retile", "true")
-    .option("tileSize", "1000")
-    .option("kRingInterpolate", "2")
-    .load("dbfs:/path/to/raster.tif")
+val df = MosaicContext.read
+    .format("raster_to_grid")
+    .option("sizeInMB", "16")
+    .option("convertToFormat", "GTiff")
+    .option("resolution", "0")
+    .option("readSubdataset", "true")
+    .option("subdatasetName", "t2m")
+    .option("retile", "true")
+    .option("tileSize", "600")
+    .option("combiner", "avg")
+    .load("dbfs:/path/to/raster.tif")
df.show()
+--------+--------+------------------+
|band_id |cell_id |cell_value |
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -22,7 +22,7 @@
author = 'Milos Colic, Stuart Lynn, Michael Johns, Robert Whiffin'

# The full version, including alpha/beta/rc tags
-release = "v0.4.2"
+release = "v0.4.3"


# -- General configuration ---------------------------------------------------
@@ -64,7 +64,6 @@ class RasterAsGridReader(sparkSession: SparkSession) extends MosaicDataFrameRead
} else {
lit(config("convertToFormat"))
}

val rasterToGridCombiner = getRasterToGridFunc(config("combiner"))

val loadedDf = retiledDf