Segmentation of images
===

The segmentation of an image consists of assigning a label to each of its pixels. For the 3D reconstruction of a plant, we need at least a segmentation of the images into 2 classes: *plant* and *background*. For a reconstruction with semantic labeling of the point cloud, we need a semantic segmentation of the images giving one label for each organ type (e.g. {*leaf*, *stem*, *pedicel*, *flower*, *fruit*}). The figure below shows the binary and multi-class segmentations for a virtual plant.

<figure>
<img src="/assets/images/segmentation/segmentation_ex.png" alt="Binary and multi-class segmentation examples" width="600" />
<figcaption>Binary and multi-class segmentation of a virtual plant.</figcaption>
</figure>

## Binary segmentation

The binary segmentation of an image into *plant* and *background* is performed with the command:
```bash
romi_run_task Masks scan_id --config myconfig.toml
```
with the upstream task being *ImagesFilesetExists* when processing the raw RGB images, or *Undistorted* when processing images corrected using the intrinsic parameters of the camera. The task takes this set of images as input and produces one binary mask for each image.

There are 2 methods available to compute indices for binary segmentation: the Excess Green index and a linear SVM. For each method, we provide an example configuration file in the *Index computation* section.

### Index computation

#### Linear

For each pixel $(i,j)$, an index $S_{ij}$ is computed as a linear combination of the red, green and blue channel values:

$$
S_{ij} = w_0 R_{ij} + w_1 G_{ij} + w_2 B_{ij}
$$

where $w$ is the *parameters* vector specified in the configuration file. A simple vector such as $w=(0,1,0)$, which keeps only the green channel, may be used.
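As an illustration, this index computation is a per-pixel dot product. Below is a minimal NumPy sketch, assuming `image` is an RGB array of shape (H, W, 3); the function and argument names are ours, not the toolchain's:

```python
import numpy as np

def linear_index(image: np.ndarray, w=(0.0, 1.0, 0.0)) -> np.ndarray:
    """Compute S_ij = w0*R_ij + w1*G_ij + w2*B_ij for every pixel."""
    # `image` has shape (H, W, 3); `@` contracts the channel axis,
    # producing an (H, W) index map
    return image @ np.asarray(w, dtype=float)
```

With $w=(0,1,0)$ this simply returns the green channel.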

Alternatively, you can train an SVM to learn those weights and the threshold to be provided in the configuration file. For this, we assume you have a sample image and a ground truth binary mask. A ground truth may be produced using a manual annotation tool like [LabelMe](https://github.com/wkentaro/labelme).
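For instance, such a training set can be sampled from the annotated pair as follows. This is a minimal sketch, assuming `image` and `mask` are NumPy arrays of shape (H, W, 3) and (H, W); all variable names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1000                           # number of training pixels
H, W = mask.shape                  # `mask` is the (H, W) ground truth
rows = rng.integers(0, H, size=N)  # random pixel coordinates
cols = rng.integers(0, W, size=N)

X_train = image[rows, cols]        # (N, 3) RGB values
Y_train = mask[rows, cols]         # (N,) labels: 1 = plant, 0 = background
```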

Using, for example, a list of N randomly selected pixels as $X_{train}$ (an array of shape [N, 3]) and their corresponding labels as $Y_{train}$ (an array of size N), a linear SVM is trained using:
```python
from sklearn import svm

clf = svm.LinearSVC()      # learns a linear decision function w·x + b
clf.fit(X_train, Y_train)
```

The learned weights (`clf.coef_`) and bias (`clf.intercept_`) provide the *parameters* and *threshold* values to report in the configuration file.

#### Configuration File

```toml
[Masks]
upstream_task = "ImagesFilesetExists" # other option "Undistorted"
type = "linear"
parameters = "[0,1,0]"
threshold = 0.5
```


#### Excess green

The *excess_green* method computes the Excess Green index of each pixel, which is large where the green channel dominates the red and blue channels.

#### Configuration File

```toml
[Masks]
upstream_task = "ImagesFilesetExists" # other option "Undistorted"
type = "excess_green"
binarize = true
threshold = 0.2
```

## Multi-class segmentation

The *Segmentation2D* task performs the semantic segmentation of images using a deep neural network (DNN). The command to run this task is:
```bash
romi_run_task Segmentation2D scan_id --config myconfig.toml
```

This will produce a series of binary masks, one for each class on which the network was trained.

<figure>
<figcaption>Generic encoder/decoder architecture for semantic segmentation (U-net).</figcaption>
</figure>

The architecture of the network is inspired by the U-net [^1], with a ResNet encoder [^2]. It consists of an encoding and a decoding pathway with skip connections between the two. The encoding pathway applies a sequence of convolutions, and the image signal is upsampled along the decoding pathway.

The network is trained to segment images of size $(S_x, S_y)$, which is not necessarily the size of the acquired images. The parameters *Sx* and *Sy* should be provided in the configuration file. Each image is cropped to $(S_x, S_y)$ before being fed to the DNN, and the output is resized to the original image size at the end of the task.
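As an illustration, here is a sketch of this crop-and-resize bookkeeping, assuming a centered crop and using *skimage* for the final resize; the helper names are ours, not the actual task code:

```python
import numpy as np
from skimage.transform import resize

def center_crop(image: np.ndarray, Sx: int, Sy: int) -> np.ndarray:
    """Crop the central (Sy, Sx) window out of an (H, W, ...) image."""
    H, W = image.shape[:2]
    top, left = (H - Sy) // 2, (W - Sx) // 2
    return image[top:top + Sy, left:left + Sx]

def restore_size(mask: np.ndarray, H: int, W: int) -> np.ndarray:
    """Resize a predicted mask back to the original (H, W) size."""
    # order=0 (nearest neighbor) avoids blurring binary masks
    return resize(mask, (H, W), order=0, preserve_range=True)
```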

[^1]: Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention (pp. 234-241). Springer, Cham.

[^2]: Zhang, Z., Liu, Q., & Wang, Y. (2018). Road extraction by deep residual u-net. IEEE Geoscience and Remote Sensing Letters, 15(5), 749-753.

#### Configuration File

```toml
[Segmentation2D]
upstream_task = "ImageFilesetExists" #Alternatively Undistorted
model_fileset = "ModelFileset"
model_id = "Resnet_896_896_epoch50" # no default value
query = "{\"channel\":\"rgb\"}" # default is an empty dict '{}'
model_id = "Resnetdataset_gl_png_896_896_epoch50" # no default value
Sx = 896
Sy = 896
labels = "[]" # default is an empty list, which uses all labels the model was trained on
inverted_labels = "[\"background\"]"
threshold = 0.01
```


### DNN model

The network weights are obtained by training on an annotated dataset (see *How to train a DNN for semantic segmentation*). These weights should be stored in the database (at `<database>/models/models`), and the name of the weights file should be provided as the *model_id* parameter in the configuration. You can use our model trained on virtual arabidopsis, available [here](https://media.romi-project.eu/data/Resnetdataset_gl_png_896_896_epoch50.pt).
## Thresholding

A binary mask $m$ is produced from the index or from the output of the DNN, $I$, by applying the *threshold* parameter $\tau$:

\begin{equation}
m_{ij} =
\begin{cases}
1 & \text{if } I_{ij} \geq \tau \\
0 & \text{otherwise}
\end{cases}
\end{equation}

This threshold may be chosen empirically, or it may be learnt from annotated data (see the linear SVM section above).
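In NumPy terms, this binarization is a one-liner (a sketch; `I` is the 2D index map or DNN output):

```python
import numpy as np

def binarize(I: np.ndarray, tau: float) -> np.ndarray:
    """Return the binary mask m with m[i, j] = 1 where I[i, j] >= tau."""
    return (I >= tau).astype(np.uint8)
```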

## Dilation

If the integer *dilation* parameter is non-zero, a morphological dilation is applied to the mask using the function [*binary_dilation*](https://scikit-image.org/docs/dev/api/skimage.morphology.html#skimage.morphology.binary_dilation) from the *skimage.morphology* module.

The *dilation* parameter sets the number of times *binary_dilation* is iteratively applied. For a faithful reconstruction this parameter should be set to $0$, but in practice you may want a coarser point cloud: when the segmentation is not perfect, dilation fills the holes in the masks, and it also helps when the reconstructed mesh is broken because the point cloud is too thin.
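A sketch of this iteration, assuming `mask` is a boolean NumPy array (the helper is illustrative; the task reads *dilation* from the configuration):

```python
from skimage.morphology import binary_dilation

def dilate_mask(mask, dilation: int):
    """Grow the binary mask by applying binary_dilation `dilation` times."""
    for _ in range(dilation):
        mask = binary_dilation(mask)
    return mask
```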

## Working with data from the virtual scanner

When working with data generated with the virtual scanner, the *images* folder contains multiple channels corresponding to the various classes for which images were generated (*stem*, *flower*, *fruit*, *leaf*, *pedicel*). You have to select the *rgb* channel using the *query* parameter.

#### Configuration File
```toml
[Masks]
type = "excess_green"
threshold = 0.2
query = "{\"channel\":\"rgb\"}"
```
