---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
error = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
<!-- badges: start -->
<!-- badges: end -->
# mscomposer - An R package for creating and analysing multi-spectral image composites
## Authors
Larissa Torney<sup>1</sup>, Michael Blaschek<sup>2</sup>, Darius Görgen<sup>1</sup>
<sup>1</sup>Federal Institute for Geosciences and Natural Resources, Stilleweg 2, 30655 Hannover, Germany
<sup>2</sup>State Authority for Geology, Resources and Mining, Albertstraße 5, 79104 Freiburg, Germany
## About
The `{mscomposer}` package was created by BGR with a single use-case in mind: creating
a composite image of multi-spectral satellite imagery over a given area of interest.
This multi-spectral composite then serves as input to a machine learning
model which is trained on reference points collected on the ground. The package consists
of only three functions, which streamline this process. While its scope
is thus rather limited, we have put substantial effort into allowing you to adapt the
functions to your specific analysis. More specifically, this means that you can use
the package with almost any satellite data set and supply custom spectral indices and
thresholds for composite creation. For the model training process you can choose
between several cross-validation routines. However, it is advised to further analyse
the model validity for your study area, for instance via the [CAST](https://hannameyer.github.io/CAST/) package.
This package was created at the BGR - Federal Institute for Geosciences and Natural Resources - as part of the BopaBW project ([„Near-surface soil parameters Baden-Wuerttemberg“](https://www.bgr.bund.de/EN/Themen/GG_Fernerkundung/Projekte/laufend/Multispektral/Boden-Stoffgehalte/BOPA_Baden_Wuerttemberg_en.html?nn=1551022)).
## How to cite
Please cite as follows:
Torney, L., Blaschek, M., & Görgen, D. (2023). mscomposer - An R package for creating and analysing multi-spectral image composites (Version 0.0.1). https://doi.org/10.25928/5yx7-x825
## Installation
You can install `{mscomposer}` directly from GitHub with:
```{r, install, eval = FALSE}
remotes::install_github("BGR-Geowissenschaften-Rohstoffe/mscomposer")
```
## Usage
We will showcase the general workflow of `{mscomposer}` based on a small Sentinel-2
image collection and a simulated reference dataset. Note that the provided
routines rely heavily on functionalities provided by `{gdalcubes}`, `{sf}`,
`{terra}`, and `{caret}` under the hood. For more control over the processing,
experienced users of these packages might consider using them directly to create
their own custom workflow.
## Big-Data processing
If your study area covers a large spatio-temporal extent, you might
consider parallel processing for the composite creation. In that case, use
`gdalcubes_options()` to enable parallel processing before calling
`ms_compose()`, as sketched below. This way you can use the cores available on your machine to speed
up the composite generation. Also, if the original image collection
is quite large and you wish to re-run your analysis with different settings, make
sure to specify the `col` argument of `ms_compose()` so that the image collection
is created only once.
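A minimal sketch, assuming a recent `{gdalcubes}` version (older releases use a `threads` argument instead of `parallel`; the number of workers is an arbitrary example):
```{r, parallel, eval = FALSE}
# enable parallel processing in gdalcubes before creating the composite;
# adjust the number of worker processes to your machine
gdalcubes_options(parallel = 4)
```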
## Sentinel-2 Use Case
Within the `{mscomposer}` package we included a small subset of two Sentinel-2
tiles for the year 2018 over an agricultural area in Baden-Württemberg. In this
use-case, we are interested in retrieving a median composite of the visual channels
containing only those pixels for which we assume that bare soil is visible. We are going
to use a simple heuristic: pixels with an NDVI value below 0.3 are taken
to represent bare soil.
The data set was originally downloaded from the Copernicus Hub which provides
zip-files. In order for the data to fit into this package, we included only a
very small area between the overlapping tiles T32TNT and T32TMT and flattened
the directory structure so that we only keep those band files we need in the
scope of this tutorial. That also requires that we write our own collection
format. See this [link](https://github.com/gdalcubes/collection_formats) for
pre-defined formats that come with `{gdalcubes}`.
### True-Color Composite
In the first step, we will create a true-color composite over the entire year in
order to get a first impression of the area of interest. For that, we first have
to define a cube view, i.e. a definition of the spatio-temporal extent of our
output data set. We provide a routine called `ms_view()` that allows you to
specify an sf object to automatically derive the bounding box of your study area.
You can specify any valid CRS, but we also allow you to derive a
[LAEA](https://proj.org/operations/projections/laea.html) or
[UTM](https://proj.org/operations/projections/utm.html)
projection automatically from your AOI.
First, we are going to load the libraries required for the processing.
```{r setup}
library(sf)
library(caret)
library(terra)
library(gdalcubes)
library(mscomposer)
```
In the present case, we set the
temporal resolution of our cube view to `"P3M"`, meaning that the initial calculation
will be based on a temporal resolution of three months. Since we also specify a
final reducer, the actual output raster will represent a single layer per band
based on the median of the three-month composites. The spatial resolution of the
visible Sentinel-2 channels is set to 10 meters. We also specify methods
understood by GDAL to aggregate pixels falling into the same time-step as well as
a method to resample grid cells to the new output reference grid.
We then provide the input files, format, and view to the `compose()` function.
We also specify an image mask which corresponds to bad-quality pixel classes in the Sentinel-2
[Scene Classification Layer](https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi/level-2a/algorithm).
Since we are only interested in the color composite for now, we only include the
three visible channels in the composite generation. Finally, we declare a method
for the final reduction of the temporal dimension and declare the output filename
and directory.
```{r s2-comp-rgb}
outdir <- file.path(tempdir(), "mscomposer")
dir.create(outdir)
collection_path <- file.path(outdir, "s2_collection.sqlite")
# list the Sentinel-2 band files shipped with the package
s2_files <- list.files("inst/extdata/Sentinel2/", pattern = ".jp2$", full.names = TRUE, recursive = TRUE)
format <- system.file("Sentinel2_L2A_flat.json", package = "mscomposer")
head(basename(s2_files), 5)
aoi <- st_read(system.file("aoi_S2.gpkg", package = "mscomposer"))
# derive the cube view (spatio-temporal extent and resolution) from the AOI
cv <- ms_view(
aoi = aoi,
srs = "utm",
dt = "P3M",
dx = 10,
dy = 10,
start = "2018-01-01",
end = "2018-12-31",
agg = "min",
rsmp = "bilinear")
# create the true-color composite from the three visible bands
s2_rgb <- ms_compose(
s2_files,
format = format,
view = cv,
col = collection_path,
mask = image_mask("SCL", values=c(1,3,6,8,9,10,11)),
bands = c("B02", "B03", "B04"),
reducers = "median",
prefix = "rgb",
outdir = outdir
)
(s2_rgb <- rast(s2_rgb[1]))
plotRGB(s2_rgb, 3,2,1, stretch = "lin")
```
We can see that the dominant land use type in this area is agriculture. We also
observe some forest areas as well as a village with built-up areas. In the next
code example, we show how to calculate the NDVI and filter valid pixels based on
an NDVI threshold. This way, we wish to include only those pixels in the composite
calculation where we assume that the satellite saw bare soil.
### Bare Soil Composite
We supply the formula of the NDVI and also make sure
that we include all bands necessary for its calculation in the processing.
Additionally, we supply a filtering statement. You can supply your own spectral
indices formula and filters. The only restriction is that the bands you are using
in your index calculation are also present in the data set.
For the NDVI calculation, we need the Near-Infrared channel, which corresponds to
band 8 in the case of Sentinel-2. We include this band in the processing of the
composite, thus we can refer to it in the definition of our spectral index. We
also include a filter based on our index, meaning that only those pixels will be
used for the computation where the expression evaluates to `TRUE`.
```{r s2-comp-soil}
s2_filt <- ms_compose(
s2_files,
format = format,
view = cv,
col = collection_path,
mask = image_mask("SCL", values=c(1,3,6,8,9,10,11)),
bands = c("B02", "B03", "B04", "B08"),
indices = list(NDVI = "(B08-B04)/(B08+B04)"),
filters = list(NDVI = "NDVI < 0.3"),
reducers = "median",
prefix = "soil",
outdir = outdir
)
(s2_filt <- rast(s2_filt[1]))
plotRGB(s2_filt, 3,2,1, stretch = "lin")
```
In the result, we see that we mainly retained valid pixels for agricultural
fields. There are also pixels retained for a larger street, which is
characterized by low NDVI values during the entire year. We also see some remaining
artifacts, potentially originating from clouds or snow cover, which we should
take care of in a real-world analysis. While not perfect,
the above composite can well serve as input to further analysis, e.g. by
supplying a vector data set of field boundaries to only include pixels which
actually lie on fields (see the sketch below). Also, several spectral indices and filters might be
combined to suit the needs of the analysis you are conducting.
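A minimal sketch of such a post-processing step, assuming a hypothetical field-boundary file `field_boundaries.gpkg` (not shipped with the package) in the same CRS as the composite:
```{r, mask-fields, eval = FALSE}
# hypothetical field-boundary polygons; replace with your own dataset
fields <- terra::vect("field_boundaries.gpkg")
# keep only those composite pixels that fall within field polygons
s2_fields <- terra::mask(s2_filt, fields)
plotRGB(s2_fields, 3, 2, 1, stretch = "lin")
```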
### Model Training and Prediction
Continuing with our example, we now load a simulated reference data set that
we are using to train a Random Forest model.
```{r model-setup}
(reference <- system.file("corg-simulated-sample.gpkg",
package = "mscomposer") |>
read_sf())
plot(s2_filt[[1]])
plot(reference, add = TRUE, pch = 3, col = "red")
```
To train a model and use it for spatial prediction, we provide the `ms_predict()`
function. It expects an sf object containing only POINT geometries as input as
well as the specification of a column containing numeric values as the target
variable. Additionally, a multi-layer SpatRaster object is required to extract
the predictor variables at the reference locations.
Then a model is trained based on the outcome variable and the extracted predictors.
You can choose any machine learning model supported by `{caret}` (see [here](https://topepo.github.io/caret/available-models.html)),
but note that you might need to install additional dependencies to use them.
Also, you can choose between simple cross-validation patterns provided via
`trainControl()`'s `method` argument. Here we are going to train a Random Forest
model based on Leave-One-Observation-Out Cross-Validation.
```{r train}
results <- ms_predict(
reference,
colname = "corg",
predictors = s2_filt,
model_method = "rf",
cv_method = "LOOCV"
)
```
As you can see from the above warnings and messages, the function takes
care of reprojecting the sf object in case its CRS differs from that of the
predictor raster. In the above example, there are also some reference locations
for which no valid pixel values are available. Those locations are excluded
before model training. The output of the function is a list with several objects.
We can look at the structure of the list like this:
```{r results-str}
str(results, 1)
```
The first object is the predicted raster dataset while the trained model is the
second object. We also find some accuracy metrics in the third object while the
last object contains those reference locations for which valid pixels could be
extracted.
Let's have a look at the prediction raster. We see that we obtain predicted
values for all those raster cells that did not contain NA values.
```{r vis-prediction}
plot(results$prediction)
```
Next, we can inspect the model and the associated accuracies.
```{r model-accuracy}
print(results$model)
print(results$metrics)
```
We see that the cross-validation results indicated an R-squared value of about
0.41 while for the complete training data set we obtain an R-squared value of
about 0.90. Note that the package currently does not provide means to compare
the model against a validation dataset. However, since the trained model is part of the output,
you can validate it against a left-out dataset yourself, as sketched below.
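A minimal sketch of such an external validation. The hold-out split, its 20 % fraction, and the object names are hypothetical additions, and we assume that `ms_predict()` uses the predictor raster's layer names as predictor names, so that `predict()` on the extracted values works:
```{r, external-validation, eval = FALSE}
# hypothetical hold-out split: keep ~20 % of the reference points aside and
# train the model only on the remaining points
set.seed(42)
idx <- sample(nrow(reference), size = round(0.2 * nrow(reference)))
holdout <- reference[idx, ]
results_ho <- ms_predict(
  reference[-idx, ],
  colname = "corg",
  predictors = s2_filt,
  model_method = "rf",
  cv_method = "LOOCV"
)
# reproject the hold-out points to the raster's CRS and extract the
# predictor values at their locations
holdout_pts <- terra::project(terra::vect(holdout), terra::crs(s2_filt))
holdout_vals <- terra::extract(s2_filt, holdout_pts, ID = FALSE)
ok <- complete.cases(holdout_vals)
# predict with the trained caret model and compare predictions against
# the observed target values at the hold-out locations
preds <- predict(results_ho$model, newdata = holdout_vals[ok, ])
caret::postResample(pred = preds, obs = holdout$corg[ok])
```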
The final output is an sf object containing the reference locations with valid pixels in the predictor raster.
It includes the extracted values per layer as well as the reference variable and
the predictions (called `.obs` and `.preds`, respectively).
```{r extracted-values}
results$data
plot(results$data$.obs, results$data$.preds)
```