-
-## Contacts (Maintainers)
-
-* Liang-Chieh Chen, github: [aquariusjay](https://github.com/aquariusjay)
-* YuKun Zhu, github: [yknzhu](https://github.com/YknZhu)
-* George Papandreou, github: [gpapan](https://github.com/gpapan)
-* Hui Hui, github: [huihui-personal](https://github.com/huihui-personal)
-* Maxwell D. Collins, github: [mcollinswisc](https://github.com/mcollinswisc)
-* Ting Liu, github: [tingliu](https://github.com/tingliu)
-
-## Table of Contents
-
-Demo:
-
-* Colab notebook for off-the-shelf inference.
-
-Running:
-
-* Installation.
-* Running DeepLab on PASCAL VOC 2012 semantic segmentation dataset.
-* Running DeepLab on Cityscapes semantic segmentation dataset.
-* Running DeepLab on ADE20K semantic segmentation dataset.
-
-Models:
-
-* Checkpoints and frozen inference graphs.
-
-Misc:
-
-* Please check the FAQ if you have questions before reporting an issue.
-
-## Getting Help
-
-To get help with issues you may encounter while using the DeepLab TensorFlow
-implementation, create a new question on
-[StackOverflow](https://stackoverflow.com/) with the tag "tensorflow".
-
-Please report bugs (i.e., broken code, not usage questions) to the
-tensorflow/models GitHub [issue
-tracker](https://github.com/tensorflow/models/issues), prefixing the issue name
-with "deeplab".
-
-## License
-
-All the code in the deeplab folder is covered by the [LICENSE](https://github.com/tensorflow/models/blob/master/LICENSE)
-under tensorflow/models. Please refer to the LICENSE for details.
-
-## Change Logs
-
-### March 26, 2020
-* Supported EdgeTPU-DeepLab and EdgeTPU-DeepLab-slim on Cityscapes.
-**Contributor**: Yun Long.
-
-### November 20, 2019
-* Supported MobileNetV3 large and small model variants on Cityscapes.
-**Contributor**: Yukun Zhu.
-
-
-### March 27, 2019
-
-* Supported using different loss weights on different classes during training.
-**Contributor**: Yuwei Yang.
-
-
-### March 26, 2019
-
-* Supported ResNet-v1-18. **Contributor**: Michalis Raptis.
-
-
-### March 6, 2019
-
-* Released the evaluation code (under the `evaluation` folder) for image
-parsing, a.k.a. panoptic segmentation. In particular, the released code supports
-evaluating the parsing results in terms of both the parsing covering and
-panoptic quality metrics. **Contributors**: Maxwell Collins and Ting Liu.
-
-
-### February 6, 2019
-
-* Updated decoder module to exploit multiple low-level features with different
-output_strides.
-
-### December 3, 2018
-
-* Released the MobileNet-v2 checkpoint on ADE20K.
-
-
-### November 19, 2018
-
-* Supported NAS architecture for feature extraction. **Contributor**: Chenxi Liu.
-
-* Supported hard pixel mining during training.
-
-
-### October 1, 2018
-
-* Released MobileNet-v2 (depth-multiplier = 0.5) COCO-pretrained checkpoints on
-PASCAL VOC 2012, and an Xception-65 COCO-pretrained checkpoint (i.e., not
-pretrained on PASCAL).
-
-
-### September 5, 2018
-
-* Released Cityscapes pretrained checkpoints using the best dense prediction cell found by architecture search.
-
-
-### May 26, 2018
-
-* Updated ADE20K pretrained checkpoint.
-
-
-### May 18, 2018
-* Added builders for ResNet-v1 and Xception model variants.
-* Added ADE20K support, including colormap and pretrained Xception_65 checkpoint.
-* Fixed a bug on using non-default depth_multiplier for MobileNet-v2.
-
-
-### March 22, 2018
-
-* Released checkpoints using MobileNet-V2 as network backbone and pretrained on
-PASCAL VOC 2012 and Cityscapes.
-
-
-### March 5, 2018
-
-* First release of DeepLab in TensorFlow including deeper Xception network
-backbone. Included checkpoints that have been pretrained on PASCAL VOC 2012
-and Cityscapes.
-
-## References
-
-1. **Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs**
- Liang-Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille (+ equal
- contribution).
- [[link]](https://arxiv.org/abs/1412.7062). In ICLR, 2015.
-
-2. **DeepLab: Semantic Image Segmentation with Deep Convolutional Nets,**
- **Atrous Convolution, and Fully Connected CRFs**
- Liang-Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille (+ equal
- contribution).
- [[link]](http://arxiv.org/abs/1606.00915). TPAMI 2017.
-
-3. **Rethinking Atrous Convolution for Semantic Image Segmentation**
- Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam.
- [[link]](http://arxiv.org/abs/1706.05587). arXiv: 1706.05587, 2017.
-
-4. **Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation**
- Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam.
- [[link]](https://arxiv.org/abs/1802.02611). In ECCV, 2018.
-
-5. **ParseNet: Looking Wider to See Better**
- Wei Liu, Andrew Rabinovich, Alexander C Berg
- [[link]](https://arxiv.org/abs/1506.04579). arXiv:1506.04579, 2015.
-
-6. **Pyramid Scene Parsing Network**
- Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
- [[link]](https://arxiv.org/abs/1612.01105). In CVPR, 2017.
-
-7. **Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift**
- Sergey Ioffe, Christian Szegedy
- [[link]](https://arxiv.org/abs/1502.03167). In ICML, 2015.
-
-8. **MobileNetV2: Inverted Residuals and Linear Bottlenecks**
- Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
- [[link]](https://arxiv.org/abs/1801.04381). In CVPR, 2018.
-
-9. **Xception: Deep Learning with Depthwise Separable Convolutions**
- François Chollet
- [[link]](https://arxiv.org/abs/1610.02357). In CVPR, 2017.
-
-10. **Deformable Convolutional Networks -- COCO Detection and Segmentation Challenge 2017 Entry**
- Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei, Jifeng Dai
- [[link]](http://presentations.cocodataset.org/COCO17-Detect-MSRA.pdf). ICCV COCO Challenge
- Workshop, 2017.
-
-11. **TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems**
- M. Abadi, A. Agarwal, et al.
- [[link]](https://arxiv.org/abs/1603.04467). arXiv:1603.04467, 2016.
-
-12. **The Pascal Visual Object Classes Challenge: A Retrospective**
- Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John
- Winn, and Andrew Zisserman.
- [[link]](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/). IJCV, 2014.
-
-13. **The Cityscapes Dataset for Semantic Urban Scene Understanding**
- Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele.
- [[link]](https://www.cityscapes-dataset.com/). In CVPR, 2016.
-
-14. **Deep Residual Learning for Image Recognition**
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
- [[link]](https://arxiv.org/abs/1512.03385). In CVPR, 2016.
-
-15. **Progressive Neural Architecture Search**
- Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy.
- [[link]](https://arxiv.org/abs/1712.00559). In ECCV, 2018.
-
-16. **Searching for MobileNetV3**
- Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam.
- [[link]](https://arxiv.org/abs/1905.02244). In ICCV, 2019.
diff --git a/research/deeplab/__init__.py b/research/deeplab/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/deeplab/common.py b/research/deeplab/common.py
deleted file mode 100644
index 928f7176c37..00000000000
--- a/research/deeplab/common.py
+++ /dev/null
@@ -1,295 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Provides flags that are common to scripts.
-
-Common flags from train/eval/vis/export_model.py are collected in this script.
-"""
-import collections
-import copy
-import json
-import tensorflow as tf
-
-flags = tf.app.flags
-
-# Flags for input preprocessing.
-
-flags.DEFINE_integer('min_resize_value', None,
- 'Desired size of the smaller image side.')
-
-flags.DEFINE_integer('max_resize_value', None,
- 'Maximum allowed size of the larger image side.')
-
-flags.DEFINE_integer('resize_factor', None,
- 'Resized dimensions are multiple of factor plus one.')
-
-flags.DEFINE_boolean('keep_aspect_ratio', True,
- 'Keep aspect ratio after resizing or not.')
-
-# Model dependent flags.
-
-flags.DEFINE_integer('logits_kernel_size', 1,
- 'The kernel size for the convolutional kernel that '
- 'generates logits.')
-
-# When using 'mobilenet_v2', we set atrous_rates = decoder_output_stride = None.
-# When using 'xception_65' or 'resnet_v1' model variants, we set
-# atrous_rates = [6, 12, 18] (output stride 16) and decoder_output_stride = 4.
-# See core/feature_extractor.py for supported model variants.
-flags.DEFINE_string('model_variant', 'mobilenet_v2', 'DeepLab model variant.')
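-# A minimal flag sketch for the settings described above (assuming the
-# atrous_rates and output_stride flags defined in the training/eval scripts):
-#   mobilenet_v2: --model_variant=mobilenet_v2
-#   xception_65:  --model_variant=xception_65 --atrous_rates=6 --atrous_rates=12 \
-#                 --atrous_rates=18 --output_stride=16 --decoder_output_stride=4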
-
-flags.DEFINE_multi_float('image_pyramid', None,
- 'Input scales for multi-scale feature extraction.')
-
-flags.DEFINE_boolean('add_image_level_feature', True,
- 'Add image level feature.')
-
-flags.DEFINE_list(
- 'image_pooling_crop_size', None,
- 'Image pooling crop size [height, width] used in the ASPP module. When '
- 'value is None, the model performs image pooling with "crop_size". This '
- 'flag is useful when one wants to use different image pooling sizes.')
-
-flags.DEFINE_list(
- 'image_pooling_stride', '1,1',
- 'Image pooling stride [height, width] used in the ASPP image pooling. ')
-
-flags.DEFINE_boolean('aspp_with_batch_norm', True,
- 'Use batch norm parameters for ASPP or not.')
-
-flags.DEFINE_boolean('aspp_with_separable_conv', True,
- 'Use separable convolution for ASPP or not.')
-
-# Defaults to None. Set multi_grid = [1, 2, 4] when using provided
-# 'resnet_v1_{50,101}_beta' checkpoints.
-flags.DEFINE_multi_integer('multi_grid', None,
- 'Employ a hierarchy of atrous rates for ResNet.')
-
-flags.DEFINE_float('depth_multiplier', 1.0,
- 'Multiplier for the depth (number of channels) for all '
- 'convolution ops used in MobileNet.')
-
-flags.DEFINE_integer('divisible_by', None,
- 'An integer that ensures the layer # channels are '
- 'divisible by this value. Used in MobileNet.')
-
-# For `xception_65`, use decoder_output_stride = 4. For `mobilenet_v2`, use
-# decoder_output_stride = None.
-flags.DEFINE_list('decoder_output_stride', None,
- 'Comma-separated list of strings with the number specifying '
- 'output stride of low-level features at each network level. '
- 'The current semantic segmentation implementation assumes at '
- 'most one output stride (i.e., either None or a list with '
- 'only one element).')
-
-flags.DEFINE_boolean('decoder_use_separable_conv', True,
- 'Employ separable convolution for decoder or not.')
-
-flags.DEFINE_enum('merge_method', 'max', ['max', 'avg'],
- 'Scheme to merge multi scale features.')
-
-flags.DEFINE_boolean(
- 'prediction_with_upsampled_logits', True,
- 'When performing prediction, there are two options: (1) bilinear '
- 'upsampling the logits followed by softmax, or (2) softmax followed by '
- 'bilinear upsampling.')
-
-flags.DEFINE_string(
- 'dense_prediction_cell_json',
- '',
- 'A JSON file that specifies the dense prediction cell.')
-
-flags.DEFINE_integer(
- 'nas_stem_output_num_conv_filters', 20,
- 'Number of filters of the stem output tensor in NAS models.')
-
-flags.DEFINE_bool('nas_use_classification_head', False,
- 'Use image classification head for NAS model variants.')
-
-flags.DEFINE_bool('nas_remove_os32_stride', False,
- 'Remove the stride in the output stride 32 branch.')
-
-flags.DEFINE_bool('use_bounded_activation', False,
- 'Whether or not to use bounded activations. Bounded '
- 'activations better lend themselves to quantized inference.')
-
-flags.DEFINE_boolean('aspp_with_concat_projection', True,
- 'ASPP with concat projection.')
-
-flags.DEFINE_boolean('aspp_with_squeeze_and_excitation', False,
- 'ASPP with squeeze and excitation.')
-
-flags.DEFINE_integer('aspp_convs_filters', 256, 'ASPP convolution filters.')
-
-flags.DEFINE_boolean('decoder_use_sum_merge', False,
- 'Decoder uses simply sum merge.')
-
-flags.DEFINE_integer('decoder_filters', 256, 'Decoder filters.')
-
-flags.DEFINE_boolean('decoder_output_is_logits', False,
- 'Use decoder output as logits or not.')
-
-flags.DEFINE_boolean('image_se_uses_qsigmoid', False, 'Use q-sigmoid.')
-
-flags.DEFINE_multi_float(
- 'label_weights', None,
- 'A list of label weights, where each element represents the weight for the '
- 'label of its index. For example, label_weights = [0.1, 0.5] means the '
- 'weight for label 0 is 0.1 and the weight for label 1 is 0.5. If set to '
- 'None, all the labels have the same weight 1.0.')
-
-flags.DEFINE_float('batch_norm_decay', 0.9997, 'Batchnorm decay.')
-
-FLAGS = flags.FLAGS
-
-# Constants
-
-# Perform semantic segmentation predictions.
-OUTPUT_TYPE = 'semantic'
-
-# Semantic segmentation item names.
-LABELS_CLASS = 'labels_class'
-IMAGE = 'image'
-HEIGHT = 'height'
-WIDTH = 'width'
-IMAGE_NAME = 'image_name'
-LABEL = 'label'
-ORIGINAL_IMAGE = 'original_image'
-
-# Test set name.
-TEST_SET = 'test'
-
-
-class ModelOptions(
- collections.namedtuple('ModelOptions', [
- 'outputs_to_num_classes',
- 'crop_size',
- 'atrous_rates',
- 'output_stride',
- 'preprocessed_images_dtype',
- 'merge_method',
- 'add_image_level_feature',
- 'image_pooling_crop_size',
- 'image_pooling_stride',
- 'aspp_with_batch_norm',
- 'aspp_with_separable_conv',
- 'multi_grid',
- 'decoder_output_stride',
- 'decoder_use_separable_conv',
- 'logits_kernel_size',
- 'model_variant',
- 'depth_multiplier',
- 'divisible_by',
- 'prediction_with_upsampled_logits',
- 'dense_prediction_cell_config',
- 'nas_architecture_options',
- 'use_bounded_activation',
- 'aspp_with_concat_projection',
- 'aspp_with_squeeze_and_excitation',
- 'aspp_convs_filters',
- 'decoder_use_sum_merge',
- 'decoder_filters',
- 'decoder_output_is_logits',
- 'image_se_uses_qsigmoid',
- 'label_weights',
- 'sync_batch_norm_method',
- 'batch_norm_decay',
- ])):
- """Immutable class to hold model options."""
-
- __slots__ = ()
-
- def __new__(cls,
- outputs_to_num_classes,
- crop_size=None,
- atrous_rates=None,
- output_stride=8,
- preprocessed_images_dtype=tf.float32):
- """Constructor to set default values.
-
- Args:
- outputs_to_num_classes: A dictionary from output type to the number of
- classes. For example, for the task of semantic segmentation with 21
- semantic classes, we would have outputs_to_num_classes['semantic'] = 21.
- crop_size: A tuple [crop_height, crop_width].
- atrous_rates: A list of atrous convolution rates for ASPP.
- output_stride: The ratio of input to output spatial resolution.
- preprocessed_images_dtype: The type after the preprocessing function.
-
- Returns:
- A new ModelOptions instance.
- """
- dense_prediction_cell_config = None
- if FLAGS.dense_prediction_cell_json:
- with tf.gfile.Open(FLAGS.dense_prediction_cell_json, 'r') as f:
- dense_prediction_cell_config = json.load(f)
- decoder_output_stride = None
- if FLAGS.decoder_output_stride:
- decoder_output_stride = [
- int(x) for x in FLAGS.decoder_output_stride]
- if sorted(decoder_output_stride, reverse=True) != decoder_output_stride:
- raise ValueError('Decoder output stride needs to be sorted in '
- 'descending order.')
- image_pooling_crop_size = None
- if FLAGS.image_pooling_crop_size:
- image_pooling_crop_size = [int(x) for x in FLAGS.image_pooling_crop_size]
- image_pooling_stride = [1, 1]
- if FLAGS.image_pooling_stride:
- image_pooling_stride = [int(x) for x in FLAGS.image_pooling_stride]
- label_weights = FLAGS.label_weights
- if label_weights is None:
- label_weights = 1.0
- nas_architecture_options = {
- 'nas_stem_output_num_conv_filters': (
- FLAGS.nas_stem_output_num_conv_filters),
- 'nas_use_classification_head': FLAGS.nas_use_classification_head,
- 'nas_remove_os32_stride': FLAGS.nas_remove_os32_stride,
- }
- return super(ModelOptions, cls).__new__(
- cls, outputs_to_num_classes, crop_size, atrous_rates, output_stride,
- preprocessed_images_dtype,
- FLAGS.merge_method,
- FLAGS.add_image_level_feature,
- image_pooling_crop_size,
- image_pooling_stride,
- FLAGS.aspp_with_batch_norm,
- FLAGS.aspp_with_separable_conv,
- FLAGS.multi_grid,
- decoder_output_stride,
- FLAGS.decoder_use_separable_conv,
- FLAGS.logits_kernel_size,
- FLAGS.model_variant,
- FLAGS.depth_multiplier,
- FLAGS.divisible_by,
- FLAGS.prediction_with_upsampled_logits,
- dense_prediction_cell_config,
- nas_architecture_options,
- FLAGS.use_bounded_activation,
- FLAGS.aspp_with_concat_projection,
- FLAGS.aspp_with_squeeze_and_excitation,
- FLAGS.aspp_convs_filters,
- FLAGS.decoder_use_sum_merge,
- FLAGS.decoder_filters,
- FLAGS.decoder_output_is_logits,
- FLAGS.image_se_uses_qsigmoid,
- label_weights,
- 'None',
- FLAGS.batch_norm_decay)
-
- def __deepcopy__(self, memo):
- return ModelOptions(copy.deepcopy(self.outputs_to_num_classes),
- self.crop_size,
- self.atrous_rates,
- self.output_stride,
- self.preprocessed_images_dtype)
diff --git a/research/deeplab/common_test.py b/research/deeplab/common_test.py
deleted file mode 100644
index 45b64e50e3b..00000000000
--- a/research/deeplab/common_test.py
+++ /dev/null
@@ -1,52 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for common.py."""
-import copy
-
-import tensorflow as tf
-
-from deeplab import common
-
-
-class CommonTest(tf.test.TestCase):
-
- def testOutputsToNumClasses(self):
- num_classes = 21
- model_options = common.ModelOptions(
- outputs_to_num_classes={common.OUTPUT_TYPE: num_classes})
- self.assertEqual(model_options.outputs_to_num_classes[common.OUTPUT_TYPE],
- num_classes)
-
- def testDeepcopy(self):
- num_classes = 21
- model_options = common.ModelOptions(
- outputs_to_num_classes={common.OUTPUT_TYPE: num_classes})
- model_options_new = copy.deepcopy(model_options)
- self.assertEqual((model_options_new.
- outputs_to_num_classes[common.OUTPUT_TYPE]),
- num_classes)
-
- num_classes_new = 22
- model_options_new.outputs_to_num_classes[common.OUTPUT_TYPE] = (
- num_classes_new)
- self.assertEqual(model_options.outputs_to_num_classes[common.OUTPUT_TYPE],
- num_classes)
- self.assertEqual((model_options_new.
- outputs_to_num_classes[common.OUTPUT_TYPE]),
- num_classes_new)
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/deeplab/convert_to_tflite.py b/research/deeplab/convert_to_tflite.py
deleted file mode 100644
index d23ce9e2337..00000000000
--- a/research/deeplab/convert_to_tflite.py
+++ /dev/null
@@ -1,112 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tools to convert a quantized deeplab model to tflite."""
-
-from absl import app
-from absl import flags
-import numpy as np
-from PIL import Image
-import tensorflow as tf
-
-
-flags.DEFINE_string('quantized_graph_def_path', None,
- 'Path to quantized graphdef.')
-flags.DEFINE_string('output_tflite_path', None, 'Output TFlite model path.')
-flags.DEFINE_string(
- 'input_tensor_name', None,
- 'Input tensor to TFlite model. This usually should be the input tensor to '
- 'the model backbone.'
-)
-flags.DEFINE_string(
- 'output_tensor_name', 'ArgMax:0',
- 'Output tensor name of TFlite model. By default we output the raw semantic '
- 'label predictions.'
-)
-flags.DEFINE_string(
- 'test_image_path', None,
- 'Path to an image to test the consistency between input graphdef / '
- 'converted tflite model.'
-)
-
-FLAGS = flags.FLAGS
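-# Example invocation (a sketch; paths are placeholders and the input tensor
-# name depends on the exported backbone graph):
-#   python deeplab/convert_to_tflite.py \
-#     --quantized_graph_def_path=/path/to/quantized_graphdef.pb \
-#     --input_tensor_name=<backbone_input_tensor>:0 \
-#     --output_tflite_path=/path/to/model.tflite \
-#     --test_image_path=/path/to/image.jpg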
-
-
-def convert_to_tflite(quantized_graphdef,
- backbone_input_tensor,
- output_tensor):
- """Helper method to convert quantized deeplab model to TFlite."""
- with tf.Graph().as_default() as graph:
- tf.graph_util.import_graph_def(quantized_graphdef, name='')
- sess = tf.compat.v1.Session()
-
- tflite_input = graph.get_tensor_by_name(backbone_input_tensor)
- tflite_output = graph.get_tensor_by_name(output_tensor)
- converter = tf.compat.v1.lite.TFLiteConverter.from_session(
- sess, [tflite_input], [tflite_output])
- converter.inference_type = tf.compat.v1.lite.constants.QUANTIZED_UINT8
- input_arrays = converter.get_input_arrays()
- converter.quantized_input_stats = {input_arrays[0]: (127.5, 127.5)}
- return converter.convert()
-
-
-def check_tflite_consistency(graph_def, tflite_model, image_path):
- """Runs tflite and frozen graph on same input, check their outputs match."""
- # Load tflite model and check input size.
- interpreter = tf.lite.Interpreter(model_content=tflite_model)
- interpreter.allocate_tensors()
- input_details = interpreter.get_input_details()
- output_details = interpreter.get_output_details()
- height, width = input_details[0]['shape'][1:3]
-
- # Prepare input image data.
- with tf.io.gfile.GFile(image_path, 'rb') as f:
- image = Image.open(f)
- image = np.asarray(image.convert('RGB').resize((width, height)))
- image = np.expand_dims(image, 0)
-
- # Output from tflite model.
- interpreter.set_tensor(input_details[0]['index'], image)
- interpreter.invoke()
- output_tflite = interpreter.get_tensor(output_details[0]['index'])
-
- with tf.Graph().as_default():
- tf.graph_util.import_graph_def(graph_def, name='')
- with tf.compat.v1.Session() as sess:
- # Note here the graph will include preprocessing part of the graph
- # (e.g. resize, pad, normalize). Given the input image size is at the
- # crop size (backbone input size), resize / pad should be an identity op.
- output_graph = sess.run(
- FLAGS.output_tensor_name, feed_dict={'ImageTensor:0': image})
-
- print('%.2f%% pixels have matched semantic labels.' % (
- 100 * np.mean(output_graph == output_tflite)))
-
-
-def main(unused_argv):
- with tf.io.gfile.GFile(FLAGS.quantized_graph_def_path, 'rb') as f:
- graph_def = tf.compat.v1.GraphDef.FromString(f.read())
- tflite_model = convert_to_tflite(
- graph_def, FLAGS.input_tensor_name, FLAGS.output_tensor_name)
-
- if FLAGS.output_tflite_path:
- with tf.io.gfile.GFile(FLAGS.output_tflite_path, 'wb') as f:
- f.write(tflite_model)
-
- if FLAGS.test_image_path:
- check_tflite_consistency(graph_def, tflite_model, FLAGS.test_image_path)
-
-
-if __name__ == '__main__':
- app.run(main)
diff --git a/research/deeplab/core/__init__.py b/research/deeplab/core/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/deeplab/core/conv2d_ws.py b/research/deeplab/core/conv2d_ws.py
deleted file mode 100644
index 9aaaf33dd3c..00000000000
--- a/research/deeplab/core/conv2d_ws.py
+++ /dev/null
@@ -1,369 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Augment slim.conv2d with optional Weight Standardization (WS).
-
-WS is a normalization method to accelerate micro-batch training. When used with
-Group Normalization and trained with 1 image/GPU, WS is able to match or
-outperform the performance of BN trained with large batch sizes.
-[1] Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille
- Weight Standardization. arXiv:1903.10520
-[2] Lei Huang, Xianglong Liu, Yang Liu, Bo Lang, Dacheng Tao
- Centered Weight Normalization in Accelerating Training of Deep Neural
- Networks. ICCV 2017
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-from tensorflow.contrib import framework as contrib_framework
-from tensorflow.contrib import layers as contrib_layers
-
-from tensorflow.contrib.layers.python.layers import layers
-from tensorflow.contrib.layers.python.layers import utils
-
-
-class Conv2D(tf.keras.layers.Conv2D, tf.layers.Layer):
- """2D convolution layer (e.g. spatial convolution over images).
-
- This layer creates a convolution kernel that is convolved
- (actually cross-correlated) with the layer input to produce a tensor of
- outputs. If `use_bias` is True (and a `bias_initializer` is provided),
- a bias vector is created and added to the outputs. Finally, if
- `activation` is not `None`, it is applied to the outputs as well.
- """
-
- def __init__(self,
- filters,
- kernel_size,
- strides=(1, 1),
- padding='valid',
- data_format='channels_last',
- dilation_rate=(1, 1),
- activation=None,
- use_bias=True,
- kernel_initializer=None,
- bias_initializer=tf.zeros_initializer(),
- kernel_regularizer=None,
- bias_regularizer=None,
- use_weight_standardization=False,
- activity_regularizer=None,
- kernel_constraint=None,
- bias_constraint=None,
- trainable=True,
- name=None,
- **kwargs):
- """Constructs the 2D convolution layer.
-
- Args:
- filters: Integer, the dimensionality of the output space (i.e. the number
- of filters in the convolution).
- kernel_size: An integer or tuple/list of 2 integers, specifying the height
- and width of the 2D convolution window. Can be a single integer to
- specify the same value for all spatial dimensions.
- strides: An integer or tuple/list of 2 integers, specifying the strides of
- the convolution along the height and width. Can be a single integer to
- specify the same value for all spatial dimensions. Specifying any stride
- value != 1 is incompatible with specifying any `dilation_rate` value !=
- 1.
- padding: One of `"valid"` or `"same"` (case-insensitive).
- data_format: A string, one of `channels_last` (default) or
- `channels_first`. The ordering of the dimensions in the inputs.
- `channels_last` corresponds to inputs with shape `(batch, height, width,
- channels)` while `channels_first` corresponds to inputs with shape
- `(batch, channels, height, width)`.
- dilation_rate: An integer or tuple/list of 2 integers, specifying the
- dilation rate to use for dilated convolution. Can be a single integer to
- specify the same value for all spatial dimensions. Currently, specifying
- any `dilation_rate` value != 1 is incompatible with specifying any
- stride value != 1.
- activation: Activation function. Set it to None to maintain a linear
- activation.
- use_bias: Boolean, whether the layer uses a bias.
- kernel_initializer: An initializer for the convolution kernel.
- bias_initializer: An initializer for the bias vector. If None, the default
- initializer will be used.
- kernel_regularizer: Optional regularizer for the convolution kernel.
- bias_regularizer: Optional regularizer for the bias vector.
- use_weight_standardization: Boolean, whether the layer uses weight
- standardization.
- activity_regularizer: Optional regularizer function for the output.
- kernel_constraint: Optional projection function to be applied to the
- kernel after being updated by an `Optimizer` (e.g. used to implement
- norm constraints or value constraints for layer weights). The function
- must take as input the unprojected variable and must return the
- projected variable (which must have the same shape). Constraints are not
- safe to use when doing asynchronous distributed training.
- bias_constraint: Optional projection function to be applied to the bias
- after being updated by an `Optimizer`.
- trainable: Boolean, if `True` also add variables to the graph collection
- `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
- name: A string, the name of the layer.
- **kwargs: Arbitrary keyword arguments passed to tf.keras.layers.Conv2D
- """
-
- super(Conv2D, self).__init__(
- filters=filters,
- kernel_size=kernel_size,
- strides=strides,
- padding=padding,
- data_format=data_format,
- dilation_rate=dilation_rate,
- activation=activation,
- use_bias=use_bias,
- kernel_initializer=kernel_initializer,
- bias_initializer=bias_initializer,
- kernel_regularizer=kernel_regularizer,
- bias_regularizer=bias_regularizer,
- activity_regularizer=activity_regularizer,
- kernel_constraint=kernel_constraint,
- bias_constraint=bias_constraint,
- trainable=trainable,
- name=name,
- **kwargs)
- self.use_weight_standardization = use_weight_standardization
-
- def call(self, inputs):
- if self.use_weight_standardization:
- mean, var = tf.nn.moments(self.kernel, [0, 1, 2], keep_dims=True)
- kernel = (self.kernel - mean) / tf.sqrt(var + 1e-5)
- outputs = self._convolution_op(inputs, kernel)
- else:
- outputs = self._convolution_op(inputs, self.kernel)
-
- if self.use_bias:
- if self.data_format == 'channels_first':
- if self.rank == 1:
- # tf.nn.bias_add does not accept a 1D input tensor.
- bias = tf.reshape(self.bias, (1, self.filters, 1))
- outputs += bias
- else:
- outputs = tf.nn.bias_add(outputs, self.bias, data_format='NCHW')
- else:
- outputs = tf.nn.bias_add(outputs, self.bias, data_format='NHWC')
-
- if self.activation is not None:
- return self.activation(outputs)
- return outputs
-
-
-@contrib_framework.add_arg_scope
-def conv2d(inputs,
- num_outputs,
- kernel_size,
- stride=1,
- padding='SAME',
- data_format=None,
- rate=1,
- activation_fn=tf.nn.relu,
- normalizer_fn=None,
- normalizer_params=None,
- weights_initializer=contrib_layers.xavier_initializer(),
- weights_regularizer=None,
- biases_initializer=tf.zeros_initializer(),
- biases_regularizer=None,
- use_weight_standardization=False,
- reuse=None,
- variables_collections=None,
- outputs_collections=None,
- trainable=True,
- scope=None):
- """Adds a 2D convolution followed by an optional batch_norm layer.
-
- `convolution` creates a variable called `weights`, representing the
- convolutional kernel, that is convolved (actually cross-correlated) with the
- `inputs` to produce a `Tensor` of activations. If a `normalizer_fn` is
- provided (such as `batch_norm`), it is then applied. Otherwise, if
- `normalizer_fn` is None and a `biases_initializer` is provided then a `biases`
- variable would be created and added to the activations. Finally, if
- `activation_fn` is not `None`, it is applied to the activations as well.
-
- Performs atrous convolution with input stride/dilation rate equal to `rate`
- if a value > 1 for any dimension of `rate` is specified. In this case
- `stride` values != 1 are not supported.
-
- Args:
- inputs: A Tensor of rank N+2 of shape `[batch_size] + input_spatial_shape +
- [in_channels]` if data_format does not start with "NC" (default), or
- `[batch_size, in_channels] + input_spatial_shape` if data_format starts
- with "NC".
- num_outputs: Integer, the number of output filters.
- kernel_size: A sequence of N positive integers specifying the spatial
- dimensions of the filters. Can be a single integer to specify the same
- value for all spatial dimensions.
- stride: A sequence of N positive integers specifying the stride at which to
- compute output. Can be a single integer to specify the same value for all
- spatial dimensions. Specifying any `stride` value != 1 is incompatible
- with specifying any `rate` value != 1.
- padding: One of `"VALID"` or `"SAME"`.
- data_format: A string or None. Specifies whether the channel dimension of
- the `input` and output is the last dimension (default, or if `data_format`
- does not start with "NC"), or the second dimension (if `data_format`
- starts with "NC"). For N=1, the valid values are "NWC" (default) and
- "NCW". For N=2, the valid values are "NHWC" (default) and "NCHW". For
- N=3, the valid values are "NDHWC" (default) and "NCDHW".
- rate: A sequence of N positive integers specifying the dilation rate to use
- for atrous convolution. Can be a single integer to specify the same value
- for all spatial dimensions. Specifying any `rate` value != 1 is
- incompatible with specifying any `stride` value != 1.
- activation_fn: Activation function. The default value is a ReLU function.
- Explicitly set it to None to skip it and maintain a linear activation.
- normalizer_fn: Normalization function to use instead of `biases`. If
- `normalizer_fn` is provided then `biases_initializer` and
- `biases_regularizer` are ignored and `biases` are not created nor added.
- Default is None for no normalizer function.
- normalizer_params: Normalization function parameters.
- weights_initializer: An initializer for the weights.
- weights_regularizer: Optional regularizer for the weights.
- biases_initializer: An initializer for the biases. If None skip biases.
- biases_regularizer: Optional regularizer for the biases.
- use_weight_standardization: Boolean, whether the layer uses weight
- standardization.
- reuse: Whether or not the layer and its variables should be reused. To be
- able to reuse the layer scope must be given.
- variables_collections: Optional list of collections for all the variables or
- a dictionary containing a different list of collection per variable.
- outputs_collections: Collection to add the outputs.
- trainable: If `True` also add variables to the graph collection
- `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
- scope: Optional scope for `variable_scope`.
-
- Returns:
- A tensor representing the output of the operation.
-
- Raises:
- ValueError: If `data_format` is invalid.
- ValueError: If both `rate` and `stride` are not uniformly 1.
- """
- if data_format not in [None, 'NWC', 'NCW', 'NHWC', 'NCHW', 'NDHWC', 'NCDHW']:
- raise ValueError('Invalid data_format: %r' % (data_format,))
-
- # pylint: disable=protected-access
- layer_variable_getter = layers._build_variable_getter({
- 'bias': 'biases',
- 'kernel': 'weights'
- })
- # pylint: enable=protected-access
- with tf.variable_scope(
- scope, 'Conv', [inputs], reuse=reuse,
- custom_getter=layer_variable_getter) as sc:
- inputs = tf.convert_to_tensor(inputs)
- input_rank = inputs.get_shape().ndims
-
- if input_rank != 4:
- raise ValueError('Convolution expects input with rank %d, got %d' %
- (4, input_rank))
-
- data_format = ('channels_first' if data_format and
- data_format.startswith('NC') else 'channels_last')
- layer = Conv2D(
- filters=num_outputs,
- kernel_size=kernel_size,
- strides=stride,
- padding=padding,
- data_format=data_format,
- dilation_rate=rate,
- activation=None,
- use_bias=not normalizer_fn and biases_initializer,
- kernel_initializer=weights_initializer,
- bias_initializer=biases_initializer,
- kernel_regularizer=weights_regularizer,
- bias_regularizer=biases_regularizer,
- use_weight_standardization=use_weight_standardization,
- activity_regularizer=None,
- trainable=trainable,
- name=sc.name,
- dtype=inputs.dtype.base_dtype,
- _scope=sc,
- _reuse=reuse)
- outputs = layer.apply(inputs)
-
- # Add variables to collections.
- # pylint: disable=protected-access
- layers._add_variable_to_collections(layer.kernel, variables_collections,
- 'weights')
- if layer.use_bias:
- layers._add_variable_to_collections(layer.bias, variables_collections,
- 'biases')
- # pylint: enable=protected-access
- if normalizer_fn is not None:
- normalizer_params = normalizer_params or {}
- outputs = normalizer_fn(outputs, **normalizer_params)
-
- if activation_fn is not None:
- outputs = activation_fn(outputs)
- return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
-
-
-def conv2d_same(inputs, num_outputs, kernel_size, stride, rate=1, scope=None):
- """Strided 2-D convolution with 'SAME' padding.
-
- When stride > 1, then we do explicit zero-padding, followed by conv2d with
- 'VALID' padding.
-
- Note that
-
- net = conv2d_same(inputs, num_outputs, 3, stride=stride)
-
- is equivalent to
-
- net = conv2d(inputs, num_outputs, 3, stride=1, padding='SAME')
- net = subsample(net, factor=stride)
-
- whereas
-
- net = conv2d(inputs, num_outputs, 3, stride=stride, padding='SAME')
-
- is different when the input's height or width is even, which is why we add the
- current function. For more details, see ResnetUtilsTest.testConv2DSameEven().
-
- Args:
- inputs: A 4-D tensor of size [batch, height_in, width_in, channels].
- num_outputs: An integer, the number of output filters.
- kernel_size: An int with the kernel_size of the filters.
- stride: An integer, the output stride.
- rate: An integer, rate for atrous convolution.
- scope: Scope.
-
- Returns:
- output: A 4-D tensor of size [batch, height_out, width_out, channels] with
- the convolution output.
- """
- if stride == 1:
- return conv2d(
- inputs,
- num_outputs,
- kernel_size,
- stride=1,
- rate=rate,
- padding='SAME',
- scope=scope)
- else:
- kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
- pad_total = kernel_size_effective - 1
- pad_beg = pad_total // 2
- pad_end = pad_total - pad_beg
- inputs = tf.pad(inputs,
- [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
- return conv2d(
- inputs,
- num_outputs,
- kernel_size,
- stride=stride,
- rate=rate,
- padding='VALID',
- scope=scope)
diff --git a/research/deeplab/core/conv2d_ws_test.py b/research/deeplab/core/conv2d_ws_test.py
deleted file mode 100644
index b6bea85ee03..00000000000
--- a/research/deeplab/core/conv2d_ws_test.py
+++ /dev/null
@@ -1,420 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for conv2d_ws."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import tensorflow as tf
-from tensorflow.contrib import framework as contrib_framework
-from tensorflow.contrib import layers as contrib_layers
-from deeplab.core import conv2d_ws
-
-
-class ConvolutionTest(tf.test.TestCase):
-
- def testInvalidShape(self):
- with self.cached_session():
- images_3d = tf.random_uniform((5, 6, 7, 9, 3), seed=1)
- with self.assertRaisesRegexp(
- ValueError, 'Convolution expects input with rank 4, got 5'):
- conv2d_ws.conv2d(images_3d, 32, 3)
-
- def testInvalidDataFormat(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 3), seed=1)
- with self.assertRaisesRegexp(ValueError, 'data_format'):
- conv2d_ws.conv2d(images, 32, 3, data_format='CHWN')
-
- def testCreateConv(self):
- height, width = 7, 9
- with self.cached_session():
- images = np.random.uniform(size=(5, height, width, 4)).astype(np.float32)
- output = conv2d_ws.conv2d(images, 32, [3, 3])
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(output.get_shape().as_list(), [5, height, width, 32])
- weights = contrib_framework.get_variables_by_name('weights')[0]
- self.assertListEqual(weights.get_shape().as_list(), [3, 3, 4, 32])
- biases = contrib_framework.get_variables_by_name('biases')[0]
- self.assertListEqual(biases.get_shape().as_list(), [32])
-
- def testCreateConvWithWS(self):
- height, width = 7, 9
- with self.cached_session():
- images = np.random.uniform(size=(5, height, width, 4)).astype(np.float32)
- output = conv2d_ws.conv2d(
- images, 32, [3, 3], use_weight_standardization=True)
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(output.get_shape().as_list(), [5, height, width, 32])
- weights = contrib_framework.get_variables_by_name('weights')[0]
- self.assertListEqual(weights.get_shape().as_list(), [3, 3, 4, 32])
- biases = contrib_framework.get_variables_by_name('biases')[0]
- self.assertListEqual(biases.get_shape().as_list(), [32])
-
- def testCreateConvNCHW(self):
- height, width = 7, 9
- with self.cached_session():
- images = np.random.uniform(size=(5, 4, height, width)).astype(np.float32)
- output = conv2d_ws.conv2d(images, 32, [3, 3], data_format='NCHW')
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(output.get_shape().as_list(), [5, 32, height, width])
- weights = contrib_framework.get_variables_by_name('weights')[0]
- self.assertListEqual(weights.get_shape().as_list(), [3, 3, 4, 32])
- biases = contrib_framework.get_variables_by_name('biases')[0]
- self.assertListEqual(biases.get_shape().as_list(), [32])
-
- def testCreateSquareConv(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 3), seed=1)
- output = conv2d_ws.conv2d(images, 32, 3)
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(output.get_shape().as_list(), [5, height, width, 32])
-
- def testCreateConvWithTensorShape(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 3), seed=1)
- output = conv2d_ws.conv2d(images, 32, images.get_shape()[1:3])
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(output.get_shape().as_list(), [5, height, width, 32])
-
- def testCreateFullyConv(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 32), seed=1)
- output = conv2d_ws.conv2d(
- images, 64, images.get_shape()[1:3], padding='VALID')
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(output.get_shape().as_list(), [5, 1, 1, 64])
- biases = contrib_framework.get_variables_by_name('biases')[0]
- self.assertListEqual(biases.get_shape().as_list(), [64])
-
- def testFullyConvWithCustomGetter(self):
- height, width = 7, 9
- with self.cached_session():
- called = [0]
-
- def custom_getter(getter, *args, **kwargs):
- called[0] += 1
- return getter(*args, **kwargs)
-
- with tf.variable_scope('test', custom_getter=custom_getter):
- images = tf.random_uniform((5, height, width, 32), seed=1)
- conv2d_ws.conv2d(images, 64, images.get_shape()[1:3])
- self.assertEqual(called[0], 2) # Custom getter called twice.
-
- def testCreateVerticalConv(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 4), seed=1)
- output = conv2d_ws.conv2d(images, 32, [3, 1])
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(output.get_shape().as_list(), [5, height, width, 32])
- weights = contrib_framework.get_variables_by_name('weights')[0]
- self.assertListEqual(weights.get_shape().as_list(), [3, 1, 4, 32])
- biases = contrib_framework.get_variables_by_name('biases')[0]
- self.assertListEqual(biases.get_shape().as_list(), [32])
-
- def testCreateHorizontalConv(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 4), seed=1)
- output = conv2d_ws.conv2d(images, 32, [1, 3])
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(output.get_shape().as_list(), [5, height, width, 32])
- weights = contrib_framework.get_variables_by_name('weights')[0]
- self.assertListEqual(weights.get_shape().as_list(), [1, 3, 4, 32])
-
- def testCreateConvWithStride(self):
- height, width = 6, 8
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 3), seed=1)
- output = conv2d_ws.conv2d(images, 32, [3, 3], stride=2)
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(output.get_shape().as_list(),
- [5, height / 2, width / 2, 32])
-
- def testCreateConvCreatesWeightsAndBiasesVars(self):
- height, width = 7, 9
- images = tf.random_uniform((5, height, width, 3), seed=1)
- with self.cached_session():
- self.assertFalse(contrib_framework.get_variables('conv1/weights'))
- self.assertFalse(contrib_framework.get_variables('conv1/biases'))
- conv2d_ws.conv2d(images, 32, [3, 3], scope='conv1')
- self.assertTrue(contrib_framework.get_variables('conv1/weights'))
- self.assertTrue(contrib_framework.get_variables('conv1/biases'))
-
- def testCreateConvWithScope(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 3), seed=1)
- output = conv2d_ws.conv2d(images, 32, [3, 3], scope='conv1')
- self.assertEqual(output.op.name, 'conv1/Relu')
-
- def testCreateConvWithCollection(self):
- height, width = 7, 9
- images = tf.random_uniform((5, height, width, 3), seed=1)
- with tf.name_scope('fe'):
- conv = conv2d_ws.conv2d(
- images, 32, [3, 3], outputs_collections='outputs', scope='Conv')
- output_collected = tf.get_collection('outputs')[0]
- self.assertEqual(output_collected.aliases, ['Conv'])
- self.assertEqual(output_collected, conv)
-
- def testCreateConvWithoutActivation(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 3), seed=1)
- output = conv2d_ws.conv2d(images, 32, [3, 3], activation_fn=None)
- self.assertEqual(output.op.name, 'Conv/BiasAdd')
-
- def testCreateConvValid(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 3), seed=1)
- output = conv2d_ws.conv2d(images, 32, [3, 3], padding='VALID')
- self.assertListEqual(output.get_shape().as_list(), [5, 5, 7, 32])
-
- def testCreateConvWithWD(self):
- height, width = 7, 9
- weight_decay = 0.01
- with self.cached_session() as sess:
- images = tf.random_uniform((5, height, width, 3), seed=1)
- regularizer = contrib_layers.l2_regularizer(weight_decay)
- conv2d_ws.conv2d(images, 32, [3, 3], weights_regularizer=regularizer)
- l2_loss = tf.nn.l2_loss(
- contrib_framework.get_variables_by_name('weights')[0])
- wd = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)[0]
- self.assertEqual(wd.op.name, 'Conv/kernel/Regularizer/l2_regularizer')
- sess.run(tf.global_variables_initializer())
- self.assertAlmostEqual(sess.run(wd), weight_decay * l2_loss.eval())
-
- def testCreateConvNoRegularizers(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 3), seed=1)
- conv2d_ws.conv2d(images, 32, [3, 3])
- self.assertEqual(
- tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES), [])
-
- def testReuseVars(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 3), seed=1)
- conv2d_ws.conv2d(images, 32, [3, 3], scope='conv1')
- self.assertEqual(len(contrib_framework.get_variables()), 2)
- conv2d_ws.conv2d(images, 32, [3, 3], scope='conv1', reuse=True)
- self.assertEqual(len(contrib_framework.get_variables()), 2)
-
- def testNonReuseVars(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 3), seed=1)
- conv2d_ws.conv2d(images, 32, [3, 3])
- self.assertEqual(len(contrib_framework.get_variables()), 2)
- conv2d_ws.conv2d(images, 32, [3, 3])
- self.assertEqual(len(contrib_framework.get_variables()), 4)
-
- def testReuseConvWithWD(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 3), seed=1)
- weight_decay = contrib_layers.l2_regularizer(0.01)
- with contrib_framework.arg_scope([conv2d_ws.conv2d],
- weights_regularizer=weight_decay):
- conv2d_ws.conv2d(images, 32, [3, 3], scope='conv1')
- self.assertEqual(len(contrib_framework.get_variables()), 2)
- self.assertEqual(
- len(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)), 1)
- conv2d_ws.conv2d(images, 32, [3, 3], scope='conv1', reuse=True)
- self.assertEqual(len(contrib_framework.get_variables()), 2)
- self.assertEqual(
- len(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)), 1)
-
- def testConvWithBatchNorm(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 32), seed=1)
- with contrib_framework.arg_scope([conv2d_ws.conv2d],
- normalizer_fn=contrib_layers.batch_norm,
- normalizer_params={'decay': 0.9}):
- net = conv2d_ws.conv2d(images, 32, [3, 3])
- net = conv2d_ws.conv2d(net, 32, [3, 3])
- self.assertEqual(len(contrib_framework.get_variables()), 8)
- self.assertEqual(
- len(contrib_framework.get_variables('Conv/BatchNorm')), 3)
- self.assertEqual(
- len(contrib_framework.get_variables('Conv_1/BatchNorm')), 3)
-
- def testReuseConvWithBatchNorm(self):
- height, width = 7, 9
- with self.cached_session():
- images = tf.random_uniform((5, height, width, 32), seed=1)
- with contrib_framework.arg_scope([conv2d_ws.conv2d],
- normalizer_fn=contrib_layers.batch_norm,
- normalizer_params={'decay': 0.9}):
- net = conv2d_ws.conv2d(images, 32, [3, 3], scope='Conv')
- net = conv2d_ws.conv2d(net, 32, [3, 3], scope='Conv', reuse=True)
- self.assertEqual(len(contrib_framework.get_variables()), 4)
- self.assertEqual(
- len(contrib_framework.get_variables('Conv/BatchNorm')), 3)
- self.assertEqual(
- len(contrib_framework.get_variables('Conv_1/BatchNorm')), 0)
-
- def testCreateConvCreatesWeightsAndBiasesVarsWithRateTwo(self):
- height, width = 7, 9
- images = tf.random_uniform((5, height, width, 3), seed=1)
- with self.cached_session():
- self.assertFalse(contrib_framework.get_variables('conv1/weights'))
- self.assertFalse(contrib_framework.get_variables('conv1/biases'))
- conv2d_ws.conv2d(images, 32, [3, 3], rate=2, scope='conv1')
- self.assertTrue(contrib_framework.get_variables('conv1/weights'))
- self.assertTrue(contrib_framework.get_variables('conv1/biases'))
-
- def testOutputSizeWithRateTwoSamePadding(self):
- num_filters = 32
- input_size = [5, 10, 12, 3]
- expected_size = [5, 10, 12, num_filters]
-
- images = tf.random_uniform(input_size, seed=1)
- output = conv2d_ws.conv2d(
- images, num_filters, [3, 3], rate=2, padding='SAME')
- self.assertListEqual(list(output.get_shape().as_list()), expected_size)
- with self.cached_session() as sess:
- sess.run(tf.global_variables_initializer())
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(list(output.eval().shape), expected_size)
-
- def testOutputSizeWithRateTwoValidPadding(self):
- num_filters = 32
- input_size = [5, 10, 12, 3]
- expected_size = [5, 6, 8, num_filters]
-
- images = tf.random_uniform(input_size, seed=1)
- output = conv2d_ws.conv2d(
- images, num_filters, [3, 3], rate=2, padding='VALID')
- self.assertListEqual(list(output.get_shape().as_list()), expected_size)
- with self.cached_session() as sess:
- sess.run(tf.global_variables_initializer())
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(list(output.eval().shape), expected_size)
-
- def testOutputSizeWithRateTwoThreeValidPadding(self):
- num_filters = 32
- input_size = [5, 10, 12, 3]
- expected_size = [5, 6, 6, num_filters]
-
- images = tf.random_uniform(input_size, seed=1)
- output = conv2d_ws.conv2d(
- images, num_filters, [3, 3], rate=[2, 3], padding='VALID')
- self.assertListEqual(list(output.get_shape().as_list()), expected_size)
- with self.cached_session() as sess:
- sess.run(tf.global_variables_initializer())
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(list(output.eval().shape), expected_size)
-
- def testDynamicOutputSizeWithRateOneValidPadding(self):
- num_filters = 32
- input_size = [5, 9, 11, 3]
- expected_size = [None, None, None, num_filters]
- expected_size_dynamic = [5, 7, 9, num_filters]
-
- with self.cached_session():
- images = tf.placeholder(np.float32, [None, None, None, input_size[3]])
- output = conv2d_ws.conv2d(
- images, num_filters, [3, 3], rate=1, padding='VALID')
- tf.global_variables_initializer().run()
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(output.get_shape().as_list(), expected_size)
- eval_output = output.eval({images: np.zeros(input_size, np.float32)})
- self.assertListEqual(list(eval_output.shape), expected_size_dynamic)
-
- def testDynamicOutputSizeWithRateOneValidPaddingNCHW(self):
- if tf.test.is_gpu_available(cuda_only=True):
- num_filters = 32
- input_size = [5, 3, 9, 11]
- expected_size = [None, num_filters, None, None]
- expected_size_dynamic = [5, num_filters, 7, 9]
-
- with self.session(use_gpu=True):
- images = tf.placeholder(np.float32, [None, input_size[1], None, None])
- output = conv2d_ws.conv2d(
- images,
- num_filters, [3, 3],
- rate=1,
- padding='VALID',
- data_format='NCHW')
- tf.global_variables_initializer().run()
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(output.get_shape().as_list(), expected_size)
- eval_output = output.eval({images: np.zeros(input_size, np.float32)})
- self.assertListEqual(list(eval_output.shape), expected_size_dynamic)
-
- def testDynamicOutputSizeWithRateTwoValidPadding(self):
- num_filters = 32
- input_size = [5, 9, 11, 3]
- expected_size = [None, None, None, num_filters]
- expected_size_dynamic = [5, 5, 7, num_filters]
-
- with self.cached_session():
- images = tf.placeholder(np.float32, [None, None, None, input_size[3]])
- output = conv2d_ws.conv2d(
- images, num_filters, [3, 3], rate=2, padding='VALID')
- tf.global_variables_initializer().run()
- self.assertEqual(output.op.name, 'Conv/Relu')
- self.assertListEqual(output.get_shape().as_list(), expected_size)
- eval_output = output.eval({images: np.zeros(input_size, np.float32)})
- self.assertListEqual(list(eval_output.shape), expected_size_dynamic)
-
- def testWithScope(self):
- num_filters = 32
- input_size = [5, 9, 11, 3]
- expected_size = [5, 5, 7, num_filters]
-
- images = tf.random_uniform(input_size, seed=1)
- output = conv2d_ws.conv2d(
- images, num_filters, [3, 3], rate=2, padding='VALID', scope='conv7')
- with self.cached_session() as sess:
- sess.run(tf.global_variables_initializer())
- self.assertEqual(output.op.name, 'conv7/Relu')
- self.assertListEqual(list(output.eval().shape), expected_size)
-
- def testWithScopeWithoutActivation(self):
- num_filters = 32
- input_size = [5, 9, 11, 3]
- expected_size = [5, 5, 7, num_filters]
-
- images = tf.random_uniform(input_size, seed=1)
- output = conv2d_ws.conv2d(
- images,
- num_filters, [3, 3],
- rate=2,
- padding='VALID',
- activation_fn=None,
- scope='conv7')
- with self.cached_session() as sess:
- sess.run(tf.global_variables_initializer())
- self.assertEqual(output.op.name, 'conv7/BiasAdd')
- self.assertListEqual(list(output.eval().shape), expected_size)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/deeplab/core/dense_prediction_cell.py b/research/deeplab/core/dense_prediction_cell.py
deleted file mode 100644
index 8e32f8e227f..00000000000
--- a/research/deeplab/core/dense_prediction_cell.py
+++ /dev/null
@@ -1,290 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Dense Prediction Cell class that can be evolved in semantic segmentation.
-
-DensePredictionCell is used as a `layer` in semantic segmentation whose
-architecture is determined by the `config`, a dictionary specifying
-the architecture.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-from tensorflow.contrib import slim as contrib_slim
-
-from deeplab.core import utils
-
-slim = contrib_slim
-
-# Local constants.
-_META_ARCHITECTURE_SCOPE = 'meta_architecture'
-_CONCAT_PROJECTION_SCOPE = 'concat_projection'
-_OP = 'op'
-_CONV = 'conv'
-_PYRAMID_POOLING = 'pyramid_pooling'
-_KERNEL = 'kernel'
-_RATE = 'rate'
-_GRID_SIZE = 'grid_size'
-_TARGET_SIZE = 'target_size'
-_INPUT = 'input'
-
-
-def dense_prediction_cell_hparams():
- """DensePredictionCell HParams.
-
- Returns:
- A dictionary of hyper-parameters used for dense prediction cell with keys:
- - reduction_size: Integer, the number of output filters for each operation
- inside the cell.
- - dropout_on_concat_features: Boolean, apply dropout on the concatenated
- features or not.
- - dropout_on_projection_features: Boolean, apply dropout on the projection
- features or not.
- - dropout_keep_prob: Float, when `dropout_on_concat_features` or
- `dropout_on_projection_features` is True, the `keep_prob` value used
- in the dropout operation.
- - concat_channels: Integer, the concatenated features will be
- channel-reduced to `concat_channels` channels.
- - conv_rate_multiplier: Integer, used to multiply the convolution rates.
- This is useful when, for example, the output_stride is changed from 16
- to 8, in which case the convolution rates need to be doubled correspondingly.
- """
- return {
- 'reduction_size': 256,
- 'dropout_on_concat_features': True,
- 'dropout_on_projection_features': False,
- 'dropout_keep_prob': 0.9,
- 'concat_channels': 256,
- 'conv_rate_multiplier': 1,
- }
-
-
-class DensePredictionCell(object):
- """DensePredictionCell class used as a 'layer' in semantic segmentation."""
-
- def __init__(self, config, hparams=None):
- """Initializes the dense prediction cell.
-
- Args:
- config: A dictionary storing the architecture of a dense prediction cell.
- hparams: A dictionary of hyper-parameters, provided by users. This
- dictionary will be used to update the default dictionary returned by
- dense_prediction_cell_hparams().
-
- Raises:
- ValueError: If `conv_rate_multiplier` has value < 1.
- """
- self.hparams = dense_prediction_cell_hparams()
- if hparams is not None:
- self.hparams.update(hparams)
- self.config = config
-
- # Check values in hparams are valid or not.
- if self.hparams['conv_rate_multiplier'] < 1:
- raise ValueError('conv_rate_multiplier cannot have value < 1.')
-
- def _get_pyramid_pooling_arguments(
- self, crop_size, output_stride, image_grid, image_pooling_crop_size=None):
- """Gets arguments for pyramid pooling.
-
- Args:
- crop_size: A list of two integers, [crop_height, crop_width] specifying
- whole patch crop size.
- output_stride: Integer, output stride value for extracted features.
- image_grid: A list of two integers, [image_grid_height, image_grid_width],
- specifying the grid size of how the pyramid pooling will be performed.
- image_pooling_crop_size: A list of two integers, [crop_height, crop_width]
- specifying the crop size for image pooling operations. Note that we
- decouple whole patch crop_size and image_pooling_crop_size as one could
- perform the image_pooling with different crop sizes.
-
- Returns:
- A tuple of (resize_value, pooled_kernel), each a [height, width] list.
- """
- resize_height = utils.scale_dimension(crop_size[0], 1. / output_stride)
- resize_width = utils.scale_dimension(crop_size[1], 1. / output_stride)
- # If image_pooling_crop_size is not specified, use crop_size.
- if image_pooling_crop_size is None:
- image_pooling_crop_size = crop_size
- pooled_height = utils.scale_dimension(
- image_pooling_crop_size[0], 1. / (output_stride * image_grid[0]))
- pooled_width = utils.scale_dimension(
- image_pooling_crop_size[1], 1. / (output_stride * image_grid[1]))
- return ([resize_height, resize_width], [pooled_height, pooled_width])
-
- def _parse_operation(self, config, crop_size, output_stride,
- image_pooling_crop_size=None):
- """Parses one operation.
-
- When 'operation' is 'pyramid_pooling', we compute the required
- hyper-parameters and save them in config.
-
- Args:
- config: A dictionary storing required hyper-parameters for one
- operation.
- crop_size: A list of two integers, [crop_height, crop_width] specifying
- whole patch crop size.
- output_stride: Integer, output stride value for extracted features.
- image_pooling_crop_size: A list of two integers, [crop_height, crop_width]
- specifying the crop size for image pooling operations. Note that we
- decouple whole patch crop_size and image_pooling_crop_size as one could
- perform the image_pooling with different crop sizes.
-
- Returns:
- A dictionary storing the related information for the operation.
- """
- if config[_OP] == _PYRAMID_POOLING:
- (config[_TARGET_SIZE],
- config[_KERNEL]) = self._get_pyramid_pooling_arguments(
- crop_size=crop_size,
- output_stride=output_stride,
- image_grid=config[_GRID_SIZE],
- image_pooling_crop_size=image_pooling_crop_size)
-
- return config
-
- def build_cell(self,
- features,
- output_stride=16,
- crop_size=None,
- image_pooling_crop_size=None,
- weight_decay=0.00004,
- reuse=None,
- is_training=False,
- fine_tune_batch_norm=False,
- scope=None):
- """Builds the dense prediction cell based on the config.
-
- Args:
- features: Input feature map of size [batch, height, width, channels].
- output_stride: Int, output stride at which the features were extracted.
- crop_size: A list [crop_height, crop_width], determining the input
- features resolution.
- image_pooling_crop_size: A list of two integers, [crop_height, crop_width]
- specifying the crop size for image pooling operations. Note that we
- decouple whole patch crop_size and image_pooling_crop_size as one could
- perform the image_pooling with different crop sizes.
- weight_decay: Float, the weight decay for model variables.
- reuse: Reuse the model variables or not.
- is_training: Boolean, is training or not.
- fine_tune_batch_norm: Boolean, fine-tuning batch norm parameters or not.
- scope: Optional string, specifying the variable scope.
-
- Returns:
- Features after passing through the constructed dense prediction cell with
- shape = [batch, height, width, channels] where channels are determined
- by `reduction_size` returned by dense_prediction_cell_hparams().
-
- Raises:
- ValueError: If a convolution with kernel size other than 1x1 or 3x3 is
- used, or the operation is not recognized.
- """
- batch_norm_params = {
- 'is_training': is_training and fine_tune_batch_norm,
- 'decay': 0.9997,
- 'epsilon': 1e-5,
- 'scale': True,
- }
- hparams = self.hparams
- with slim.arg_scope(
- [slim.conv2d, slim.separable_conv2d],
- weights_regularizer=slim.l2_regularizer(weight_decay),
- activation_fn=tf.nn.relu,
- normalizer_fn=slim.batch_norm,
- padding='SAME',
- stride=1,
- reuse=reuse):
- with slim.arg_scope([slim.batch_norm], **batch_norm_params):
- with tf.variable_scope(scope, _META_ARCHITECTURE_SCOPE, [features]):
- depth = hparams['reduction_size']
- branch_logits = []
- for i, current_config in enumerate(self.config):
- scope = 'branch%d' % i
- current_config = self._parse_operation(
- config=current_config,
- crop_size=crop_size,
- output_stride=output_stride,
- image_pooling_crop_size=image_pooling_crop_size)
- tf.logging.info(current_config)
- if current_config[_INPUT] < 0:
- operation_input = features
- else:
- operation_input = branch_logits[current_config[_INPUT]]
- if current_config[_OP] == _CONV:
- if current_config[_KERNEL] == [1, 1] or current_config[
- _KERNEL] == 1:
- branch_logits.append(
- slim.conv2d(operation_input, depth, 1, scope=scope))
- else:
- conv_rate = [r * hparams['conv_rate_multiplier']
- for r in current_config[_RATE]]
- branch_logits.append(
- utils.split_separable_conv2d(
- operation_input,
- filters=depth,
- kernel_size=current_config[_KERNEL],
- rate=conv_rate,
- weight_decay=weight_decay,
- scope=scope))
- elif current_config[_OP] == _PYRAMID_POOLING:
- pooled_features = slim.avg_pool2d(
- operation_input,
- kernel_size=current_config[_KERNEL],
- stride=[1, 1],
- padding='VALID')
- pooled_features = slim.conv2d(
- pooled_features,
- depth,
- 1,
- scope=scope)
- pooled_features = tf.image.resize_bilinear(
- pooled_features,
- current_config[_TARGET_SIZE],
- align_corners=True)
- # Set shape for resize_height/resize_width if they are not Tensor.
- resize_height = current_config[_TARGET_SIZE][0]
- resize_width = current_config[_TARGET_SIZE][1]
- if isinstance(resize_height, tf.Tensor):
- resize_height = None
- if isinstance(resize_width, tf.Tensor):
- resize_width = None
- pooled_features.set_shape(
- [None, resize_height, resize_width, depth])
- branch_logits.append(pooled_features)
- else:
- raise ValueError('Unrecognized operation.')
- # Merge branch logits.
- concat_logits = tf.concat(branch_logits, 3)
- if self.hparams['dropout_on_concat_features']:
- concat_logits = slim.dropout(
- concat_logits,
- keep_prob=self.hparams['dropout_keep_prob'],
- is_training=is_training,
- scope=_CONCAT_PROJECTION_SCOPE + '_dropout')
- concat_logits = slim.conv2d(concat_logits,
- self.hparams['concat_channels'],
- 1,
- scope=_CONCAT_PROJECTION_SCOPE)
- if self.hparams['dropout_on_projection_features']:
- concat_logits = slim.dropout(
- concat_logits,
- keep_prob=self.hparams['dropout_keep_prob'],
- is_training=is_training,
- scope=_CONCAT_PROJECTION_SCOPE + '_dropout')
- return concat_logits
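
For orientation, a minimal usage sketch of the class above, assuming a TF1.x environment with research/deeplab on PYTHONPATH. The branch configuration values are illustrative; only the dictionary keys ('op', 'kernel', 'rate', 'grid_size', 'input') come from the module constants:

```python
import tensorflow as tf

from deeplab.core import dense_prediction_cell

config = [
    {'op': 'conv', 'kernel': 1, 'input': -1},                  # 1x1 conv on backbone features
    {'op': 'conv', 'kernel': 3, 'rate': [6, 6], 'input': -1},  # atrous separable 3x3 branch
    {'op': 'pyramid_pooling', 'grid_size': [1, 1], 'input': -1},
]
cell = dense_prediction_cell.DensePredictionCell(config)

# Backbone features for a 513x513 crop at output_stride=16.
features = tf.random_normal([2, 33, 33, 256])
concat_logits = cell.build_cell(features, output_stride=16, crop_size=[513, 513])
print(concat_logits.get_shape())  # (2, 33, 33, 256), i.e. `concat_channels` from the hparams
```
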
diff --git a/research/deeplab/core/dense_prediction_cell_branch5_top1_cityscapes.json b/research/deeplab/core/dense_prediction_cell_branch5_top1_cityscapes.json
deleted file mode 100644
index 12b093d07d1..00000000000
--- a/research/deeplab/core/dense_prediction_cell_branch5_top1_cityscapes.json
+++ /dev/null
@@ -1 +0,0 @@
-[{"kernel": 3, "rate": [1, 6], "op": "conv", "input": -1}, {"kernel": 3, "rate": [18, 15], "op": "conv", "input": 0}, {"kernel": 3, "rate": [6, 3], "op": "conv", "input": 1}, {"kernel": 3, "rate": [1, 1], "op": "conv", "input": 0}, {"kernel": 3, "rate": [6, 21], "op": "conv", "input": 0}]
\ No newline at end of file
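
Each entry in the JSON above describes one branch of the searched cell, using the same keys as dense_prediction_cell.py: 'op' and 'kernel' define the convolution, 'rate' is the [height, width] atrous rate, and 'input' selects the branch input (-1 is the backbone feature map; a non-negative index refers to an earlier branch). A self-contained sketch that decodes it (the config string is copied verbatim from the file above):

```python
import json

branch5_top1 = (
    '[{"kernel": 3, "rate": [1, 6], "op": "conv", "input": -1},'
    ' {"kernel": 3, "rate": [18, 15], "op": "conv", "input": 0},'
    ' {"kernel": 3, "rate": [6, 3], "op": "conv", "input": 1},'
    ' {"kernel": 3, "rate": [1, 1], "op": "conv", "input": 0},'
    ' {"kernel": 3, "rate": [6, 21], "op": "conv", "input": 0}]')

for i, branch in enumerate(json.loads(branch5_top1)):
  source = 'backbone features' if branch['input'] < 0 else 'branch %d' % branch['input']
  print('branch %d: %s %dx%d, rate %s, input: %s'
        % (i, branch['op'], branch['kernel'], branch['kernel'], branch['rate'], source))
```
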
diff --git a/research/deeplab/core/dense_prediction_cell_test.py b/research/deeplab/core/dense_prediction_cell_test.py
deleted file mode 100644
index 1396a73626d..00000000000
--- a/research/deeplab/core/dense_prediction_cell_test.py
+++ /dev/null
@@ -1,136 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for dense_prediction_cell."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-
-from deeplab.core import dense_prediction_cell
-
-
-class DensePredictionCellTest(tf.test.TestCase):
-
- def setUp(self):
- self.segmentation_layer = dense_prediction_cell.DensePredictionCell(
- config=[
- {
- dense_prediction_cell._INPUT: -1,
- dense_prediction_cell._OP: dense_prediction_cell._CONV,
- dense_prediction_cell._KERNEL: 1,
- },
- {
- dense_prediction_cell._INPUT: 0,
- dense_prediction_cell._OP: dense_prediction_cell._CONV,
- dense_prediction_cell._KERNEL: 3,
- dense_prediction_cell._RATE: [1, 3],
- },
- {
- dense_prediction_cell._INPUT: 1,
- dense_prediction_cell._OP: (
- dense_prediction_cell._PYRAMID_POOLING),
- dense_prediction_cell._GRID_SIZE: [1, 2],
- },
- ],
- hparams={'conv_rate_multiplier': 2})
-
- def testPyramidPoolingArguments(self):
- features_size, pooled_kernel = (
- self.segmentation_layer._get_pyramid_pooling_arguments(
- crop_size=[513, 513],
- output_stride=16,
- image_grid=[4, 4]))
- self.assertListEqual(features_size, [33, 33])
- self.assertListEqual(pooled_kernel, [9, 9])
-
- def testPyramidPoolingArgumentsWithImageGrid1x1(self):
- features_size, pooled_kernel = (
- self.segmentation_layer._get_pyramid_pooling_arguments(
- crop_size=[257, 257],
- output_stride=16,
- image_grid=[1, 1]))
- self.assertListEqual(features_size, [17, 17])
- self.assertListEqual(pooled_kernel, [17, 17])
-
- def testParseOperationStringWithConv1x1(self):
- operation = self.segmentation_layer._parse_operation(
- config={
- dense_prediction_cell._OP: dense_prediction_cell._CONV,
- dense_prediction_cell._KERNEL: [1, 1],
- },
- crop_size=[513, 513], output_stride=16)
- self.assertEqual(operation[dense_prediction_cell._OP],
- dense_prediction_cell._CONV)
- self.assertListEqual(operation[dense_prediction_cell._KERNEL], [1, 1])
-
- def testParseOperationStringWithConv3x3(self):
- operation = self.segmentation_layer._parse_operation(
- config={
- dense_prediction_cell._OP: dense_prediction_cell._CONV,
- dense_prediction_cell._KERNEL: [3, 3],
- dense_prediction_cell._RATE: [9, 6],
- },
- crop_size=[513, 513], output_stride=16)
- self.assertEqual(operation[dense_prediction_cell._OP],
- dense_prediction_cell._CONV)
- self.assertListEqual(operation[dense_prediction_cell._KERNEL], [3, 3])
- self.assertEqual(operation[dense_prediction_cell._RATE], [9, 6])
-
- def testParseOperationStringWithPyramidPooling2x2(self):
- operation = self.segmentation_layer._parse_operation(
- config={
- dense_prediction_cell._OP: dense_prediction_cell._PYRAMID_POOLING,
- dense_prediction_cell._GRID_SIZE: [2, 2],
- },
- crop_size=[513, 513],
- output_stride=16)
- self.assertEqual(operation[dense_prediction_cell._OP],
- dense_prediction_cell._PYRAMID_POOLING)
- # The feature maps of size [33, 33] should be covered by 2x2 kernels with
- # size [17, 17].
- self.assertListEqual(
- operation[dense_prediction_cell._TARGET_SIZE], [33, 33])
- self.assertListEqual(operation[dense_prediction_cell._KERNEL], [17, 17])
-
- def testBuildCell(self):
- with self.test_session(graph=tf.Graph()) as sess:
- features = tf.random_normal([2, 33, 33, 5])
- concat_logits = self.segmentation_layer.build_cell(
- features,
- output_stride=8,
- crop_size=[257, 257])
- sess.run(tf.global_variables_initializer())
- concat_logits = sess.run(concat_logits)
- self.assertTrue(concat_logits.any())
-
- def testBuildCellWithImagePoolingCropSize(self):
- with self.test_session(graph=tf.Graph()) as sess:
- features = tf.random_normal([2, 33, 33, 5])
- concat_logits = self.segmentation_layer.build_cell(
- features,
- output_stride=8,
- crop_size=[257, 257],
- image_pooling_crop_size=[129, 129])
- sess.run(tf.global_variables_initializer())
- concat_logits = sess.run(concat_logits)
- self.assertTrue(concat_logits.any())
-
-
-if __name__ == '__main__':
- tf.test.main()
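
The expected values in the pyramid-pooling tests come from utils.scale_dimension, which is not shown in this diff excerpt; a rounding rule of int((dim - 1) * scale + 1) reproduces them. The helper below is a sketch under that assumption, not the library implementation:

```python
def scale_dimension(dim, scale):
  """Scales a spatial dimension; matches the expectations in the tests above."""
  return int((float(dim) - 1.0) * scale + 1.0)

# testPyramidPoolingArguments: crop 513, output_stride 16, image_grid [4, 4].
assert scale_dimension(513, 1.0 / 16) == 33        # resized feature size
assert scale_dimension(513, 1.0 / (16 * 4)) == 9   # pooling kernel per grid cell

# testPyramidPoolingArgumentsWithImageGrid1x1: crop 257, output_stride 16.
assert scale_dimension(257, 1.0 / 16) == 17        # both feature size and kernel
```
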
diff --git a/research/deeplab/core/feature_extractor.py b/research/deeplab/core/feature_extractor.py
deleted file mode 100644
index 553bd9b6a73..00000000000
--- a/research/deeplab/core/feature_extractor.py
+++ /dev/null
@@ -1,711 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Extracts features for different models."""
-import copy
-import functools
-
-import tensorflow.compat.v1 as tf
-from tensorflow.contrib import slim as contrib_slim
-
-from deeplab.core import nas_network
-from deeplab.core import resnet_v1_beta
-from deeplab.core import xception
-from nets.mobilenet import conv_blocks
-from nets.mobilenet import mobilenet
-from nets.mobilenet import mobilenet_v2
-from nets.mobilenet import mobilenet_v3
-
-slim = contrib_slim
-
-# Default end point for MobileNetv2 (one-based indexing).
-_MOBILENET_V2_FINAL_ENDPOINT = 'layer_18'
-# Default end point for MobileNetv3.
-_MOBILENET_V3_LARGE_FINAL_ENDPOINT = 'layer_17'
-_MOBILENET_V3_SMALL_FINAL_ENDPOINT = 'layer_13'
-# Default end point for EdgeTPU Mobilenet.
-_MOBILENET_EDGETPU = 'layer_24'
-
-
-def _mobilenet_v2(net,
- depth_multiplier,
- output_stride,
- conv_defs=None,
- divisible_by=None,
- reuse=None,
- scope=None,
- final_endpoint=None):
- """Auxiliary function to add support for 'reuse' to mobilenet_v2.
-
- Args:
- net: Input tensor of shape [batch_size, height, width, channels].
- depth_multiplier: Float multiplier for the depth (number of channels)
- for all convolution ops. The value must be greater than zero. Typical
- usage will be to set this value in (0, 1) to reduce the number of
- parameters or computation cost of the model.
- output_stride: An integer that specifies the requested ratio of input to
- output spatial resolution. If not None, then we invoke atrous convolution
- if necessary to prevent the network from reducing the spatial resolution
- of the activation maps. Allowed values are 8 (accurate fully convolutional
- mode), 16 (fast fully convolutional mode), 32 (classification mode).
- conv_defs: MobileNet conv defs.
- divisible_by: None (use default setting) or an integer that ensures the number
- of channels in all layers is divisible by this number. Used in MobileNet.
- reuse: Reuse model variables.
- scope: Optional variable scope.
- final_endpoint: The endpoint to construct the network up to.
-
- Returns:
- Features extracted by MobileNetv2.
- """
- if divisible_by is None:
- divisible_by = 8 if depth_multiplier == 1.0 else 1
- if conv_defs is None:
- conv_defs = mobilenet_v2.V2_DEF
- with tf.variable_scope(
- scope, 'MobilenetV2', [net], reuse=reuse) as scope:
- return mobilenet_v2.mobilenet_base(
- net,
- conv_defs=conv_defs,
- depth_multiplier=depth_multiplier,
- min_depth=8 if depth_multiplier == 1.0 else 1,
- divisible_by=divisible_by,
- final_endpoint=final_endpoint or _MOBILENET_V2_FINAL_ENDPOINT,
- output_stride=output_stride,
- scope=scope)
-
-
-def _mobilenet_v3(net,
- depth_multiplier,
- output_stride,
- conv_defs=None,
- divisible_by=None,
- reuse=None,
- scope=None,
- final_endpoint=None):
- """Auxiliary function to build mobilenet v3.
-
- Args:
- net: Input tensor of shape [batch_size, height, width, channels].
- depth_multiplier: Float multiplier for the depth (number of channels)
- for all convolution ops. The value must be greater than zero. Typical
- usage will be to set this value in (0, 1) to reduce the number of
- parameters or computation cost of the model.
- output_stride: An integer that specifies the requested ratio of input to
- output spatial resolution. If not None, then we invoke atrous convolution
- if necessary to prevent the network from reducing the spatial resolution
- of the activation maps. Allowed values are 8 (accurate fully convolutional
- mode), 16 (fast fully convolutional mode), 32 (classification mode).
- conv_defs: A list of ConvDef namedtuples specifying the net architecture.
- divisible_by: None (use default setting) or an integer that ensures the number
- of channels in all layers is divisible by this number. Used in MobileNet.
- reuse: Reuse model variables.
- scope: Optional variable scope.
- final_endpoint: The endpoint to construct the network up to.
-
- Returns:
- net: The output tensor.
- end_points: A set of activations for external use.
-
- Raises:
- ValueError: If conv_defs or final_endpoint is not specified.
- """
- del divisible_by
- with tf.variable_scope(
- scope, 'MobilenetV3', [net], reuse=reuse) as scope:
- if conv_defs is None:
- raise ValueError('conv_defs must be specified for mobilenet v3.')
- if final_endpoint is None:
- raise ValueError('Final endpoint must be specified for mobilenet v3.')
- net, end_points = mobilenet_v3.mobilenet_base(
- net,
- depth_multiplier=depth_multiplier,
- conv_defs=conv_defs,
- output_stride=output_stride,
- final_endpoint=final_endpoint,
- scope=scope)
-
- return net, end_points
-
-
-def mobilenet_v3_large_seg(net,
- depth_multiplier,
- output_stride,
- divisible_by=None,
- reuse=None,
- scope=None,
- final_endpoint=None):
- """Final mobilenet v3 large model for segmentation task."""
- del divisible_by
- del final_endpoint
- conv_defs = copy.deepcopy(mobilenet_v3.V3_LARGE)
-
- # Reduce the filters by a factor of 2 in the last block.
- for layer, expansion in [(13, 336), (14, 480), (15, 480), (16, None)]:
- conv_defs['spec'][layer].params['num_outputs'] /= 2
- # Update expansion size
- if expansion is not None:
- factor = expansion / conv_defs['spec'][layer - 1].params['num_outputs']
- conv_defs['spec'][layer].params[
- 'expansion_size'] = mobilenet_v3.expand_input(factor)
-
- return _mobilenet_v3(
- net,
- depth_multiplier=depth_multiplier,
- output_stride=output_stride,
- divisible_by=8,
- conv_defs=conv_defs,
- reuse=reuse,
- scope=scope,
- final_endpoint=_MOBILENET_V3_LARGE_FINAL_ENDPOINT)
-
-
-def mobilenet_edgetpu(net,
- depth_multiplier,
- output_stride,
- divisible_by=None,
- reuse=None,
- scope=None,
- final_endpoint=None):
- """EdgeTPU version of mobilenet model for segmentation task."""
- del divisible_by
- del final_endpoint
- conv_defs = copy.deepcopy(mobilenet_v3.V3_EDGETPU)
-
- return _mobilenet_v3(
- net,
- depth_multiplier=depth_multiplier,
- output_stride=output_stride,
- divisible_by=8,
- conv_defs=conv_defs,
- reuse=reuse,
- scope=scope, # the scope is 'MobilenetEdgeTPU'
- final_endpoint=_MOBILENET_EDGETPU)
-
-
-def mobilenet_v3_small_seg(net,
- depth_multiplier,
- output_stride,
- divisible_by=None,
- reuse=None,
- scope=None,
- final_endpoint=None):
- """Final mobilenet v3 small model for segmentation task."""
- del divisible_by
- del final_endpoint
- conv_defs = copy.deepcopy(mobilenet_v3.V3_SMALL)
-
- # Reduce the filters by a factor of 2 in the last block.
- for layer, expansion in [(9, 144), (10, 288), (11, 288), (12, None)]:
- conv_defs['spec'][layer].params['num_outputs'] /= 2
- # Update expansion size
- if expansion is not None:
- factor = expansion / conv_defs['spec'][layer - 1].params['num_outputs']
- conv_defs['spec'][layer].params[
- 'expansion_size'] = mobilenet_v3.expand_input(factor)
-
- return _mobilenet_v3(
- net,
- depth_multiplier=depth_multiplier,
- output_stride=output_stride,
- divisible_by=8,
- conv_defs=conv_defs,
- reuse=reuse,
- scope=scope,
- final_endpoint=_MOBILENET_V3_SMALL_FINAL_ENDPOINT)
-
-
-# A map from network name to network function.
-networks_map = {
- 'mobilenet_v2': _mobilenet_v2,
- 'mobilenet_edgetpu': mobilenet_edgetpu,
- 'mobilenet_v3_large_seg': mobilenet_v3_large_seg,
- 'mobilenet_v3_small_seg': mobilenet_v3_small_seg,
- 'resnet_v1_18': resnet_v1_beta.resnet_v1_18,
- 'resnet_v1_18_beta': resnet_v1_beta.resnet_v1_18_beta,
- 'resnet_v1_50': resnet_v1_beta.resnet_v1_50,
- 'resnet_v1_50_beta': resnet_v1_beta.resnet_v1_50_beta,
- 'resnet_v1_101': resnet_v1_beta.resnet_v1_101,
- 'resnet_v1_101_beta': resnet_v1_beta.resnet_v1_101_beta,
- 'xception_41': xception.xception_41,
- 'xception_65': xception.xception_65,
- 'xception_71': xception.xception_71,
- 'nas_pnasnet': nas_network.pnasnet,
- 'nas_hnasnet': nas_network.hnasnet,
-}
-
-
-def mobilenet_v2_arg_scope(is_training=True,
- weight_decay=0.00004,
- stddev=0.09,
- activation=tf.nn.relu6,
- bn_decay=0.997,
- bn_epsilon=None,
- bn_renorm=None):
- """Defines the default MobilenetV2 arg scope.
-
- Args:
- is_training: Whether or not we're training the model. If this is set to None,
- the is_training parameter in batch_norm is not set. Please note that this also
- sets the is_training parameter in dropout to None.
- weight_decay: The weight decay to use for regularizing the model.
- stddev: Standard deviation for initialization, if negative uses xavier.
- activation: Activation function to use. Defaults to tf.nn.relu6.
- bn_decay: Decay for the batch norm moving averages.
- bn_epsilon: Batch normalization epsilon.
- bn_renorm: Whether to use batch norm renormalization.
-
- Returns:
- An `arg_scope` to use for the mobilenet v2 model.
- """
- batch_norm_params = {
- 'center': True,
- 'scale': True,
- 'decay': bn_decay,
- }
- if bn_epsilon is not None:
- batch_norm_params['epsilon'] = bn_epsilon
- if is_training is not None:
- batch_norm_params['is_training'] = is_training
- if bn_renorm is not None:
- batch_norm_params['renorm'] = bn_renorm
- dropout_params = {}
- if is_training is not None:
- dropout_params['is_training'] = is_training
-
- instance_norm_params = {
- 'center': True,
- 'scale': True,
- 'epsilon': 0.001,
- }
-
- if stddev < 0:
- weight_initializer = slim.initializers.xavier_initializer()
- else:
- weight_initializer = tf.truncated_normal_initializer(stddev=stddev)
-
- # Set weight_decay for weights in Conv and FC layers.
- with slim.arg_scope(
- [slim.conv2d, slim.fully_connected, slim.separable_conv2d],
- weights_initializer=weight_initializer,
- activation_fn=activation,
- normalizer_fn=slim.batch_norm), \
- slim.arg_scope(
- [conv_blocks.expanded_conv], normalizer_fn=slim.batch_norm), \
- slim.arg_scope([mobilenet.apply_activation], activation_fn=activation),\
- slim.arg_scope([slim.batch_norm], **batch_norm_params), \
- slim.arg_scope([mobilenet.mobilenet_base, mobilenet.mobilenet],
- is_training=is_training),\
- slim.arg_scope([slim.dropout], **dropout_params), \
- slim.arg_scope([slim.instance_norm], **instance_norm_params), \
- slim.arg_scope([slim.conv2d], \
- weights_regularizer=slim.l2_regularizer(weight_decay)), \
- slim.arg_scope([slim.separable_conv2d], weights_regularizer=None), \
- slim.arg_scope([slim.conv2d, slim.separable_conv2d], padding='SAME') as s:
- return s
-
-
-# A map from network name to network arg scope.
-arg_scopes_map = {
- 'mobilenet_v2': mobilenet_v2.training_scope,
- 'mobilenet_edgetpu': mobilenet_v2_arg_scope,
- 'mobilenet_v3_large_seg': mobilenet_v2_arg_scope,
- 'mobilenet_v3_small_seg': mobilenet_v2_arg_scope,
- 'resnet_v1_18': resnet_v1_beta.resnet_arg_scope,
- 'resnet_v1_18_beta': resnet_v1_beta.resnet_arg_scope,
- 'resnet_v1_50': resnet_v1_beta.resnet_arg_scope,
- 'resnet_v1_50_beta': resnet_v1_beta.resnet_arg_scope,
- 'resnet_v1_101': resnet_v1_beta.resnet_arg_scope,
- 'resnet_v1_101_beta': resnet_v1_beta.resnet_arg_scope,
- 'xception_41': xception.xception_arg_scope,
- 'xception_65': xception.xception_arg_scope,
- 'xception_71': xception.xception_arg_scope,
- 'nas_pnasnet': nas_network.nas_arg_scope,
- 'nas_hnasnet': nas_network.nas_arg_scope,
-}
-
-# Names for end point features.
-DECODER_END_POINTS = 'decoder_end_points'
-
-# A dictionary from network name to a map of end point features.
-networks_to_feature_maps = {
- 'mobilenet_v2': {
- DECODER_END_POINTS: {
- 4: ['layer_4/depthwise_output'],
- 8: ['layer_7/depthwise_output'],
- 16: ['layer_14/depthwise_output'],
- },
- },
- 'mobilenet_v3_large_seg': {
- DECODER_END_POINTS: {
- 4: ['layer_4/depthwise_output'],
- 8: ['layer_7/depthwise_output'],
- 16: ['layer_13/depthwise_output'],
- },
- },
- 'mobilenet_v3_small_seg': {
- DECODER_END_POINTS: {
- 4: ['layer_2/depthwise_output'],
- 8: ['layer_4/depthwise_output'],
- 16: ['layer_9/depthwise_output'],
- },
- },
- 'resnet_v1_18': {
- DECODER_END_POINTS: {
- 4: ['block1/unit_1/lite_bottleneck_v1/conv2'],
- 8: ['block2/unit_1/lite_bottleneck_v1/conv2'],
- 16: ['block3/unit_1/lite_bottleneck_v1/conv2'],
- },
- },
- 'resnet_v1_18_beta': {
- DECODER_END_POINTS: {
- 4: ['block1/unit_1/lite_bottleneck_v1/conv2'],
- 8: ['block2/unit_1/lite_bottleneck_v1/conv2'],
- 16: ['block3/unit_1/lite_bottleneck_v1/conv2'],
- },
- },
- 'resnet_v1_50': {
- DECODER_END_POINTS: {
- 4: ['block1/unit_2/bottleneck_v1/conv3'],
- 8: ['block2/unit_3/bottleneck_v1/conv3'],
- 16: ['block3/unit_5/bottleneck_v1/conv3'],
- },
- },
- 'resnet_v1_50_beta': {
- DECODER_END_POINTS: {
- 4: ['block1/unit_2/bottleneck_v1/conv3'],
- 8: ['block2/unit_3/bottleneck_v1/conv3'],
- 16: ['block3/unit_5/bottleneck_v1/conv3'],
- },
- },
- 'resnet_v1_101': {
- DECODER_END_POINTS: {
- 4: ['block1/unit_2/bottleneck_v1/conv3'],
- 8: ['block2/unit_3/bottleneck_v1/conv3'],
- 16: ['block3/unit_22/bottleneck_v1/conv3'],
- },
- },
- 'resnet_v1_101_beta': {
- DECODER_END_POINTS: {
- 4: ['block1/unit_2/bottleneck_v1/conv3'],
- 8: ['block2/unit_3/bottleneck_v1/conv3'],
- 16: ['block3/unit_22/bottleneck_v1/conv3'],
- },
- },
- 'xception_41': {
- DECODER_END_POINTS: {
- 4: ['entry_flow/block2/unit_1/xception_module/'
- 'separable_conv2_pointwise'],
- 8: ['entry_flow/block3/unit_1/xception_module/'
- 'separable_conv2_pointwise'],
- 16: ['exit_flow/block1/unit_1/xception_module/'
- 'separable_conv2_pointwise'],
- },
- },
- 'xception_65': {
- DECODER_END_POINTS: {
- 4: ['entry_flow/block2/unit_1/xception_module/'
- 'separable_conv2_pointwise'],
- 8: ['entry_flow/block3/unit_1/xception_module/'
- 'separable_conv2_pointwise'],
- 16: ['exit_flow/block1/unit_1/xception_module/'
- 'separable_conv2_pointwise'],
- },
- },
- 'xception_71': {
- DECODER_END_POINTS: {
- 4: ['entry_flow/block3/unit_1/xception_module/'
- 'separable_conv2_pointwise'],
- 8: ['entry_flow/block5/unit_1/xception_module/'
- 'separable_conv2_pointwise'],
- 16: ['exit_flow/block1/unit_1/xception_module/'
- 'separable_conv2_pointwise'],
- },
- },
- 'nas_pnasnet': {
- DECODER_END_POINTS: {
- 4: ['Stem'],
- 8: ['Cell_3'],
- 16: ['Cell_7'],
- },
- },
- 'nas_hnasnet': {
- DECODER_END_POINTS: {
- 4: ['Cell_2'],
- 8: ['Cell_5'],
- 16: ['Cell_7'],
- },
- },
-}
-
-# A map from feature extractor name to the network name scope used in the
-# ImageNet pretrained versions of these models.
-name_scope = {
- 'mobilenet_v2': 'MobilenetV2',
- 'mobilenet_edgetpu': 'MobilenetEdgeTPU',
- 'mobilenet_v3_large_seg': 'MobilenetV3',
- 'mobilenet_v3_small_seg': 'MobilenetV3',
- 'resnet_v1_18': 'resnet_v1_18',
- 'resnet_v1_18_beta': 'resnet_v1_18',
- 'resnet_v1_50': 'resnet_v1_50',
- 'resnet_v1_50_beta': 'resnet_v1_50',
- 'resnet_v1_101': 'resnet_v1_101',
- 'resnet_v1_101_beta': 'resnet_v1_101',
- 'xception_41': 'xception_41',
- 'xception_65': 'xception_65',
- 'xception_71': 'xception_71',
- 'nas_pnasnet': 'pnasnet',
- 'nas_hnasnet': 'hnasnet',
-}
-
-# Mean pixel value.
-_MEAN_RGB = [123.15, 115.90, 103.06]
-
-
-def _preprocess_subtract_imagenet_mean(inputs, dtype=tf.float32):
- """Subtract Imagenet mean RGB value."""
- mean_rgb = tf.reshape(_MEAN_RGB, [1, 1, 1, 3])
- num_channels = tf.shape(inputs)[-1]
- # We set mean pixel as 0 for the non-RGB channels.
- mean_rgb_extended = tf.concat(
- [mean_rgb, tf.zeros([1, 1, 1, num_channels - 3])], axis=3)
- return tf.cast(inputs - mean_rgb_extended, dtype=dtype)
-
-
-def _preprocess_zero_mean_unit_range(inputs, dtype=tf.float32):
- """Map image values from [0, 255] to [-1, 1]."""
- preprocessed_inputs = (2.0 / 255.0) * tf.to_float(inputs) - 1.0
- return tf.cast(preprocessed_inputs, dtype=dtype)
-
-
-_PREPROCESS_FN = {
- 'mobilenet_v2': _preprocess_zero_mean_unit_range,
- 'mobilenet_edgetpu': _preprocess_zero_mean_unit_range,
- 'mobilenet_v3_large_seg': _preprocess_zero_mean_unit_range,
- 'mobilenet_v3_small_seg': _preprocess_zero_mean_unit_range,
- 'resnet_v1_18': _preprocess_subtract_imagenet_mean,
- 'resnet_v1_18_beta': _preprocess_zero_mean_unit_range,
- 'resnet_v1_50': _preprocess_subtract_imagenet_mean,
- 'resnet_v1_50_beta': _preprocess_zero_mean_unit_range,
- 'resnet_v1_101': _preprocess_subtract_imagenet_mean,
- 'resnet_v1_101_beta': _preprocess_zero_mean_unit_range,
- 'xception_41': _preprocess_zero_mean_unit_range,
- 'xception_65': _preprocess_zero_mean_unit_range,
- 'xception_71': _preprocess_zero_mean_unit_range,
- 'nas_pnasnet': _preprocess_zero_mean_unit_range,
- 'nas_hnasnet': _preprocess_zero_mean_unit_range,
-}
-
-
-def mean_pixel(model_variant=None):
- """Gets mean pixel value.
-
- This function returns a different mean pixel value, depending on the input
- model_variant, which adopts different preprocessing functions. We currently
- handle the following preprocessing functions:
- (1) _preprocess_subtract_imagenet_mean. We simply return mean pixel value.
- (2) _preprocess_zero_mean_unit_range. We return [127.5, 127.5, 127.5].
- The returned values are chosen so that the padded regions after
- pre-processing contain the value 0.
-
- Args:
- model_variant: Model variant (string) for feature extraction. For
- backwards compatibility, model_variant=None returns _MEAN_RGB.
-
- Returns:
- Mean pixel value.
- """
- if model_variant in ['resnet_v1_50',
- 'resnet_v1_101'] or model_variant is None:
- return _MEAN_RGB
- else:
- return [127.5, 127.5, 127.5]
-
-
-def extract_features(images,
- output_stride=8,
- multi_grid=None,
- depth_multiplier=1.0,
- divisible_by=None,
- final_endpoint=None,
- model_variant=None,
- weight_decay=0.0001,
- reuse=None,
- is_training=False,
- fine_tune_batch_norm=False,
- regularize_depthwise=False,
- preprocess_images=True,
- preprocessed_images_dtype=tf.float32,
- num_classes=None,
- global_pool=False,
- nas_architecture_options=None,
- nas_training_hyper_parameters=None,
- use_bounded_activation=False):
- """Extracts features by the particular model_variant.
-
- Args:
- images: A tensor of size [batch, height, width, channels].
- output_stride: The ratio of input to output spatial resolution.
- multi_grid: Employ a hierarchy of different atrous rates within network.
- depth_multiplier: Float multiplier for the depth (number of channels)
- for all convolution ops used in MobileNet.
- divisible_by: None (use default setting) or an integer that ensures the number
- of channels in all layers is divisible by this number. Used in MobileNet.
- final_endpoint: The MobileNet endpoint to construct the network up to.
- model_variant: Model variant for feature extraction.
- weight_decay: The weight decay for model variables.
- reuse: Reuse the model variables or not.
- is_training: Is training or not.
- fine_tune_batch_norm: Fine-tune the batch norm parameters or not.
- regularize_depthwise: Whether or not apply L2-norm regularization on the
- depthwise convolution weights.
- preprocess_images: Performs preprocessing on images or not. Defaults to
- True. Set to False if preprocessing will be done by other functions. We
- support two types of preprocessing: (1) mean pixel subtraction and (2)
- normalization of pixel values to [-1, 1].
- preprocessed_images_dtype: The type after the preprocessing function.
- num_classes: Number of classes for image classification task. Defaults
- to None for dense prediction tasks.
- global_pool: Global pooling for image classification task. Defaults to
- False, since dense prediction tasks do not use this.
- nas_architecture_options: A dictionary storing NAS architecture options.
- It is either None or its keys are:
- - `nas_stem_output_num_conv_filters`: Number of filters of the NAS stem
- output tensor.
- - `nas_use_classification_head`: Boolean, use image classification head.
- nas_training_hyper_parameters: A dictionary storing hyper-parameters for
- training nas models. It is either None or its keys are:
- - `drop_path_keep_prob`: Probability to keep each path in the cell when
- training.
- - `total_training_steps`: Total training steps to help drop path
- probability calculation.
- use_bounded_activation: Whether or not to use bounded activations. Bounded
- activations better lend themselves to quantized inference. Currently,
- bounded activation is only used in xception model.
-
- Returns:
- features: A tensor of size [batch, feature_height, feature_width,
- feature_channels], where feature_height/feature_width are determined
- by the images height/width and output_stride.
- end_points: A dictionary from components of the network to the corresponding
- activation.
-
- Raises:
- ValueError: Unrecognized model variant.
- """
- if 'resnet' in model_variant:
- arg_scope = arg_scopes_map[model_variant](
- weight_decay=weight_decay,
- batch_norm_decay=0.95,
- batch_norm_epsilon=1e-5,
- batch_norm_scale=True)
- features, end_points = get_network(
- model_variant, preprocess_images, preprocessed_images_dtype, arg_scope)(
- inputs=images,
- num_classes=num_classes,
- is_training=(is_training and fine_tune_batch_norm),
- global_pool=global_pool,
- output_stride=output_stride,
- multi_grid=multi_grid,
- reuse=reuse,
- scope=name_scope[model_variant])
- elif 'xception' in model_variant:
- arg_scope = arg_scopes_map[model_variant](
- weight_decay=weight_decay,
- batch_norm_decay=0.9997,
- batch_norm_epsilon=1e-3,
- batch_norm_scale=True,
- regularize_depthwise=regularize_depthwise,
- use_bounded_activation=use_bounded_activation)
- features, end_points = get_network(
- model_variant, preprocess_images, preprocessed_images_dtype, arg_scope)(
- inputs=images,
- num_classes=num_classes,
- is_training=(is_training and fine_tune_batch_norm),
- global_pool=global_pool,
- output_stride=output_stride,
- regularize_depthwise=regularize_depthwise,
- multi_grid=multi_grid,
- reuse=reuse,
- scope=name_scope[model_variant])
- elif 'mobilenet' in model_variant or model_variant.startswith('mnas'):
- arg_scope = arg_scopes_map[model_variant](
- is_training=(is_training and fine_tune_batch_norm),
- weight_decay=weight_decay)
- features, end_points = get_network(
- model_variant, preprocess_images, preprocessed_images_dtype, arg_scope)(
- inputs=images,
- depth_multiplier=depth_multiplier,
- divisible_by=divisible_by,
- output_stride=output_stride,
- reuse=reuse,
- scope=name_scope[model_variant],
- final_endpoint=final_endpoint)
- elif model_variant.startswith('nas'):
- arg_scope = arg_scopes_map[model_variant](
- weight_decay=weight_decay,
- batch_norm_decay=0.9997,
- batch_norm_epsilon=1e-3)
- features, end_points = get_network(
- model_variant, preprocess_images, preprocessed_images_dtype, arg_scope)(
- inputs=images,
- num_classes=num_classes,
- is_training=(is_training and fine_tune_batch_norm),
- global_pool=global_pool,
- output_stride=output_stride,
- nas_architecture_options=nas_architecture_options,
- nas_training_hyper_parameters=nas_training_hyper_parameters,
- reuse=reuse,
- scope=name_scope[model_variant])
- else:
- raise ValueError('Unknown model variant %s.' % model_variant)
-
- return features, end_points
-
-
-def get_network(network_name, preprocess_images,
- preprocessed_images_dtype=tf.float32, arg_scope=None):
- """Gets the network.
-
- Args:
- network_name: Network name.
- preprocess_images: Preprocesses the images or not.
- preprocessed_images_dtype: The type after the preprocessing function.
- arg_scope: Optional, arg_scope to build the network. If not provided the
- default arg_scope of the network would be used.
-
- Returns:
- A network function that is used to extract features.
-
- Raises:
- ValueError: network is not supported.
- """
- if network_name not in networks_map:
- raise ValueError('Unsupported network %s.' % network_name)
- arg_scope = arg_scope or arg_scopes_map[network_name]()
- def _identity_function(inputs, dtype=preprocessed_images_dtype):
- return tf.cast(inputs, dtype=dtype)
- if preprocess_images:
- preprocess_function = _PREPROCESS_FN[network_name]
- else:
- preprocess_function = _identity_function
- func = networks_map[network_name]
- @functools.wraps(func)
- def network_fn(inputs, *args, **kwargs):
- with slim.arg_scope(arg_scope):
- return func(preprocess_function(inputs, preprocessed_images_dtype),
- *args, **kwargs)
- return network_fn
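
A minimal usage sketch of extract_features above, assuming a TF1.x environment with research/deeplab and slim's nets/ directory on PYTHONPATH (both are required by the imports at the top of the file); the crop size and model variant are illustrative:

```python
import tensorflow.compat.v1 as tf

from deeplab.core import feature_extractor

images = tf.placeholder(tf.float32, [None, 513, 513, 3])
features, end_points = feature_extractor.extract_features(
    images,
    output_stride=16,
    model_variant='xception_65',  # any key of feature_extractor.networks_map
    is_training=False)

# With output_stride=16, a 513x513 input yields 33x33 feature maps.
print(features.get_shape())
```
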
diff --git a/research/deeplab/core/nas_cell.py b/research/deeplab/core/nas_cell.py
deleted file mode 100644
index d179082dc72..00000000000
--- a/research/deeplab/core/nas_cell.py
+++ /dev/null
@@ -1,221 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Cell structure used by NAS."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import functools
-from six.moves import range
-from six.moves import zip
-import tensorflow as tf
-from tensorflow.contrib import framework as contrib_framework
-from tensorflow.contrib import slim as contrib_slim
-from deeplab.core import xception as xception_utils
-from deeplab.core.utils import resize_bilinear
-from deeplab.core.utils import scale_dimension
-from tensorflow.contrib.slim.nets import resnet_utils
-
-arg_scope = contrib_framework.arg_scope
-slim = contrib_slim
-
-separable_conv2d_same = functools.partial(xception_utils.separable_conv2d_same,
- regularize_depthwise=True)
-
-
-class NASBaseCell(object):
- """NASNet Cell class that is used as a 'layer' in image architectures."""
-
- def __init__(self, num_conv_filters, operations, used_hiddenstates,
- hiddenstate_indices, drop_path_keep_prob, total_num_cells,
- total_training_steps, batch_norm_fn=slim.batch_norm):
- """Init function.
-
- For more details about NAS cell, see
- https://arxiv.org/abs/1707.07012 and https://arxiv.org/abs/1712.00559.
-
- Args:
- num_conv_filters: The number of filters for each convolution operation.
- operations: List of operations that are performed in the NASNet Cell in
- order.
- used_hiddenstates: Binary array that signals if the hiddenstate was used
- within the cell. This is used to determine what outputs of the cell
- should be concatenated together.
- hiddenstate_indices: Determines what hiddenstates should be combined
- together with the specified operations to create the NASNet cell.
- drop_path_keep_prob: Float, drop path keep probability.
- total_num_cells: Integer, total number of cells.
- total_training_steps: Integer, total training steps.
- batch_norm_fn: Function, batch norm function. Defaults to
- slim.batch_norm.
- """
- if len(hiddenstate_indices) != len(operations):
- raise ValueError(
- 'Number of hiddenstate_indices and operations should be the same.')
- if len(operations) % 2:
- raise ValueError('Number of operations should be even.')
- self._num_conv_filters = num_conv_filters
- self._operations = operations
- self._used_hiddenstates = used_hiddenstates
- self._hiddenstate_indices = hiddenstate_indices
- self._drop_path_keep_prob = drop_path_keep_prob
- self._total_num_cells = total_num_cells
- self._total_training_steps = total_training_steps
- self._batch_norm_fn = batch_norm_fn
-
- def __call__(self, net, scope, filter_scaling, stride, prev_layer, cell_num):
- """Runs the conv cell."""
- self._cell_num = cell_num
- self._filter_scaling = filter_scaling
- self._filter_size = int(self._num_conv_filters * filter_scaling)
-
- with tf.variable_scope(scope):
- net = self._cell_base(net, prev_layer)
- for i in range(len(self._operations) // 2):
- with tf.variable_scope('comb_iter_{}'.format(i)):
- h1 = net[self._hiddenstate_indices[i * 2]]
- h2 = net[self._hiddenstate_indices[i * 2 + 1]]
- with tf.variable_scope('left'):
- h1 = self._apply_conv_operation(
- h1, self._operations[i * 2], stride,
- self._hiddenstate_indices[i * 2] < 2)
- with tf.variable_scope('right'):
- h2 = self._apply_conv_operation(
- h2, self._operations[i * 2 + 1], stride,
- self._hiddenstate_indices[i * 2 + 1] < 2)
- with tf.variable_scope('combine'):
- h = h1 + h2
- net.append(h)
-
- with tf.variable_scope('cell_output'):
- net = self._combine_unused_states(net)
-
- return net
-
- def _cell_base(self, net, prev_layer):
- """Runs the beginning of the conv cell before the chosen ops are run."""
- filter_size = self._filter_size
-
- if prev_layer is None:
- prev_layer = net
- else:
- if net.shape[2] != prev_layer.shape[2]:
- prev_layer = resize_bilinear(
- prev_layer, tf.shape(net)[1:3], prev_layer.dtype)
- if filter_size != prev_layer.shape[3]:
- prev_layer = tf.nn.relu(prev_layer)
- prev_layer = slim.conv2d(prev_layer, filter_size, 1, scope='prev_1x1')
- prev_layer = self._batch_norm_fn(prev_layer, scope='prev_bn')
-
- net = tf.nn.relu(net)
- net = slim.conv2d(net, filter_size, 1, scope='1x1')
- net = self._batch_norm_fn(net, scope='beginning_bn')
- net = tf.split(axis=3, num_or_size_splits=1, value=net)
- net.append(prev_layer)
- return net
-
- def _apply_conv_operation(self, net, operation, stride,
- is_from_original_input):
- """Applies the predicted conv operation to net."""
- if stride > 1 and not is_from_original_input:
- stride = 1
- input_filters = net.shape[3]
- filter_size = self._filter_size
- if 'separable' in operation:
- num_layers = int(operation.split('_')[-1])
- kernel_size = int(operation.split('x')[0][-1])
- for layer_num in range(num_layers):
- net = tf.nn.relu(net)
- net = separable_conv2d_same(
- net,
- filter_size,
- kernel_size,
- depth_multiplier=1,
- scope='separable_{0}x{0}_{1}'.format(kernel_size, layer_num + 1),
- stride=stride)
- net = self._batch_norm_fn(
- net, scope='bn_sep_{0}x{0}_{1}'.format(kernel_size, layer_num + 1))
- stride = 1
- elif 'atrous' in operation:
- kernel_size = int(operation.split('x')[0][-1])
- net = tf.nn.relu(net)
- if stride == 2:
- scaled_height = scale_dimension(tf.shape(net)[1], 0.5)
- scaled_width = scale_dimension(tf.shape(net)[2], 0.5)
- net = resize_bilinear(net, [scaled_height, scaled_width], net.dtype)
- net = resnet_utils.conv2d_same(
- net, filter_size, kernel_size, rate=1, stride=1,
- scope='atrous_{0}x{0}'.format(kernel_size))
- else:
- net = resnet_utils.conv2d_same(
- net, filter_size, kernel_size, rate=2, stride=1,
- scope='atrous_{0}x{0}'.format(kernel_size))
- net = self._batch_norm_fn(net, scope='bn_atr_{0}x{0}'.format(kernel_size))
- elif operation in ['none']:
- if stride > 1 or (input_filters != filter_size):
- net = tf.nn.relu(net)
- net = slim.conv2d(net, filter_size, 1, stride=stride, scope='1x1')
- net = self._batch_norm_fn(net, scope='bn_1')
- elif 'pool' in operation:
- pooling_type = operation.split('_')[0]
- pooling_shape = int(operation.split('_')[-1].split('x')[0])
- if pooling_type == 'avg':
- net = slim.avg_pool2d(net, pooling_shape, stride=stride, padding='SAME')
- elif pooling_type == 'max':
- net = slim.max_pool2d(net, pooling_shape, stride=stride, padding='SAME')
- else:
- raise ValueError('Unimplemented pooling type: ', pooling_type)
- if input_filters != filter_size:
- net = slim.conv2d(net, filter_size, 1, stride=1, scope='1x1')
- net = self._batch_norm_fn(net, scope='bn_1')
- else:
- raise ValueError('Unimplemented operation', operation)
-
- if operation != 'none':
- net = self._apply_drop_path(net)
- return net
-
- def _combine_unused_states(self, net):
- """Concatenates the unused hidden states of the cell."""
- used_hiddenstates = self._used_hiddenstates
- states_to_combine = ([
- h for h, is_used in zip(net, used_hiddenstates) if not is_used])
- net = tf.concat(values=states_to_combine, axis=3)
- return net
-
- @contrib_framework.add_arg_scope
- def _apply_drop_path(self, net):
- """Apply drop_path regularization."""
- drop_path_keep_prob = self._drop_path_keep_prob
- if drop_path_keep_prob < 1.0:
- # Scale keep prob by layer number.
- assert self._cell_num != -1
- layer_ratio = (self._cell_num + 1) / float(self._total_num_cells)
- drop_path_keep_prob = 1 - layer_ratio * (1 - drop_path_keep_prob)
- # Decrease keep prob over time.
- current_step = tf.cast(tf.train.get_or_create_global_step(), tf.float32)
- current_ratio = tf.minimum(1.0, current_step / self._total_training_steps)
- drop_path_keep_prob = (1 - current_ratio * (1 - drop_path_keep_prob))
- # Drop path.
- noise_shape = [tf.shape(net)[0], 1, 1, 1]
- random_tensor = drop_path_keep_prob
- random_tensor += tf.random_uniform(noise_shape, dtype=tf.float32)
- binary_tensor = tf.cast(tf.floor(random_tensor), net.dtype)
- keep_prob_inv = tf.cast(1.0 / drop_path_keep_prob, net.dtype)
- net = net * keep_prob_inv * binary_tensor
- return net
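
The keep probability used by _apply_drop_path above is scaled twice: once by the cell's depth in the network and once by training progress. A framework-free sketch of the resulting schedule (the helper name is illustrative):

```python
def effective_drop_path_keep_prob(base_keep_prob, cell_num, total_num_cells,
                                  current_step, total_training_steps):
  # Deeper cells get a lower keep probability (dropped more often).
  layer_ratio = (cell_num + 1) / float(total_num_cells)
  keep_prob = 1 - layer_ratio * (1 - base_keep_prob)
  # Ramp the drop rate up over training: no dropping at step 0.
  current_ratio = min(1.0, current_step / float(total_training_steps))
  return 1 - current_ratio * (1 - keep_prob)

# Base keep prob 0.9, last of 12 cells, halfway through 500k training steps:
print(effective_drop_path_keep_prob(0.9, 11, 12, 250000, 500000))  # 0.95
```
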
diff --git a/research/deeplab/core/nas_genotypes.py b/research/deeplab/core/nas_genotypes.py
deleted file mode 100644
index a2e6dd55b45..00000000000
--- a/research/deeplab/core/nas_genotypes.py
+++ /dev/null
@@ -1,45 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Genotypes used by NAS."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-from tensorflow.contrib import slim as contrib_slim
-from deeplab.core import nas_cell
-
-slim = contrib_slim
-
-
-class PNASCell(nas_cell.NASBaseCell):
- """Configuration and construction of the PNASNet-5 Cell."""
-
- def __init__(self, num_conv_filters, drop_path_keep_prob, total_num_cells,
- total_training_steps, batch_norm_fn=slim.batch_norm):
- # Name of operations: op_kernel-size_num-layers.
- operations = [
- 'separable_5x5_2', 'max_pool_3x3', 'separable_7x7_2', 'max_pool_3x3',
- 'separable_5x5_2', 'separable_3x3_2', 'separable_3x3_2', 'max_pool_3x3',
- 'separable_3x3_2', 'none'
- ]
- used_hiddenstates = [1, 1, 0, 0, 0, 0, 0]
- hiddenstate_indices = [1, 1, 0, 0, 0, 0, 4, 0, 1, 0]
-
- super(PNASCell, self).__init__(
- num_conv_filters, operations, used_hiddenstates, hiddenstate_indices,
- drop_path_keep_prob, total_num_cells, total_training_steps,
- batch_norm_fn)
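
PNASCell's used_hiddenstates list feeds NASBaseCell._combine_unused_states above: the first two states (the cell's own 1x1-projected input and the previous layer) are marked as used, so only the five combination outputs are concatenated into the cell output. A small sketch of that selection (state names are illustrative):

```python
used_hiddenstates = [1, 1, 0, 0, 0, 0, 0]  # from PNASCell.__init__
states = ['net_1x1', 'prev_layer', 'comb_0', 'comb_1', 'comb_2', 'comb_3', 'comb_4']

concatenated = [s for s, used in zip(states, used_hiddenstates) if not used]
print(concatenated)  # ['comb_0', 'comb_1', 'comb_2', 'comb_3', 'comb_4']
```
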
diff --git a/research/deeplab/core/nas_network.py b/research/deeplab/core/nas_network.py
deleted file mode 100644
index 1da2e04dbaa..00000000000
--- a/research/deeplab/core/nas_network.py
+++ /dev/null
@@ -1,368 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Network structure used by NAS.
-
-Here we provide a few NAS backbones for semantic segmentation.
-Currently, we have
-
-1. pnasnet
-"Progressive Neural Architecture Search", Chenxi Liu, Barret Zoph,
-Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei,
-Alan Yuille, Jonathan Huang, Kevin Murphy. In ECCV, 2018.
-
-2. hnasnet (also called Auto-DeepLab)
-"Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic
-Image Segmentation", Chenxi Liu, Liang-Chieh Chen, Florian Schroff,
-Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei. In CVPR, 2019.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-from six.moves import range
-import tensorflow as tf
-from tensorflow.contrib import framework as contrib_framework
-from tensorflow.contrib import layers as contrib_layers
-from tensorflow.contrib import slim as contrib_slim
-from tensorflow.contrib import training as contrib_training
-
-from deeplab.core import nas_genotypes
-from deeplab.core import utils
-from deeplab.core.nas_cell import NASBaseCell
-from tensorflow.contrib.slim.nets import resnet_utils
-
-arg_scope = contrib_framework.arg_scope
-slim = contrib_slim
-resize_bilinear = utils.resize_bilinear
-scale_dimension = utils.scale_dimension
-
-
-def config(num_conv_filters=20,
- total_training_steps=500000,
- drop_path_keep_prob=1.0):
- return contrib_training.HParams(
- # Multiplier when spatial size is reduced by 2.
- filter_scaling_rate=2.0,
- # Number of filters of the stem output tensor.
- num_conv_filters=num_conv_filters,
- # Probability to keep each path in the cell when training.
- drop_path_keep_prob=drop_path_keep_prob,
- # Total training steps to help drop path probability calculation.
- total_training_steps=total_training_steps,
- )
-
-
-def nas_arg_scope(weight_decay=4e-5,
- batch_norm_decay=0.9997,
- batch_norm_epsilon=0.001,
- sync_batch_norm_method='None'):
- """Default arg scope for the NAS models."""
- batch_norm_params = {
- # Decay for the moving averages.
- 'decay': batch_norm_decay,
- # epsilon to prevent 0s in variance.
- 'epsilon': batch_norm_epsilon,
- 'scale': True,
- }
- batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
- weights_regularizer = contrib_layers.l2_regularizer(weight_decay)
- weights_initializer = contrib_layers.variance_scaling_initializer(
- factor=1 / 3.0, mode='FAN_IN', uniform=True)
- with arg_scope([slim.fully_connected, slim.conv2d, slim.separable_conv2d],
- weights_regularizer=weights_regularizer,
- weights_initializer=weights_initializer):
- with arg_scope([slim.fully_connected],
- activation_fn=None, scope='FC'):
- with arg_scope([slim.conv2d, slim.separable_conv2d],
- activation_fn=None, biases_initializer=None):
- with arg_scope([batch_norm], **batch_norm_params) as sc:
- return sc
-
-
-def _nas_stem(inputs,
- batch_norm_fn=slim.batch_norm):
- """Stem used for NAS models."""
- net = resnet_utils.conv2d_same(inputs, 64, 3, stride=2, scope='conv0')
- net = batch_norm_fn(net, scope='conv0_bn')
- net = tf.nn.relu(net)
- net = resnet_utils.conv2d_same(net, 64, 3, stride=1, scope='conv1')
- net = batch_norm_fn(net, scope='conv1_bn')
- cell_outputs = [net]
- net = tf.nn.relu(net)
- net = resnet_utils.conv2d_same(net, 128, 3, stride=2, scope='conv2')
- net = batch_norm_fn(net, scope='conv2_bn')
- cell_outputs.append(net)
- return net, cell_outputs
-
-
-def _build_nas_base(images,
- cell,
- backbone,
- num_classes,
- hparams,
- global_pool=False,
- output_stride=16,
- nas_use_classification_head=False,
- reuse=None,
- scope=None,
- final_endpoint=None,
- batch_norm_fn=slim.batch_norm,
- nas_remove_os32_stride=False):
- """Constructs a NAS model.
-
- Args:
- images: A tensor of size [batch, height, width, channels].
- cell: Cell structure used in the network.
- backbone: Backbone structure used in the network. A list of integers in
- which value 0 means "output_stride=4", value 1 means "output_stride=8",
- value 2 means "output_stride=16", and value 3 means "output_stride=32".
- num_classes: Number of classes to predict.
- hparams: Hyperparameters needed to construct the network.
- global_pool: If True, we perform global average pooling before computing the
- logits. Set to True for image classification, False for dense prediction.
- output_stride: Integer, the stride of output feature maps.
- nas_use_classification_head: Boolean, use image classification head.
- reuse: Whether or not the network and its variables should be reused. To be
- able to reuse 'scope' must be given.
- scope: Optional variable_scope.
- final_endpoint: The endpoint to construct the network up to.
- batch_norm_fn: Batch norm function.
- nas_remove_os32_stride: Boolean, remove stride in output_stride 32 branch.
-
- Returns:
- net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
- end_points: A dictionary from components of the network to the corresponding
- activation.
-
- Raises:
- ValueError: If output_stride is not a multiple of backbone output stride.
- """
- with tf.variable_scope(scope, 'nas', [images], reuse=reuse):
- end_points = {}
- def add_and_check_endpoint(endpoint_name, net):
- end_points[endpoint_name] = net
- return final_endpoint and (endpoint_name == final_endpoint)
-
- net, cell_outputs = _nas_stem(images,
- batch_norm_fn=batch_norm_fn)
- if add_and_check_endpoint('Stem', net):
- return net, end_points
-
- # Run the cells
- filter_scaling = 1.0
- for cell_num in range(len(backbone)):
- stride = 1
- if cell_num == 0:
- if backbone[0] == 1:
- stride = 2
- filter_scaling *= hparams.filter_scaling_rate
- else:
- if backbone[cell_num] == backbone[cell_num - 1] + 1:
- stride = 2
- if backbone[cell_num] == 3 and nas_remove_os32_stride:
- stride = 1
- filter_scaling *= hparams.filter_scaling_rate
- elif backbone[cell_num] == backbone[cell_num - 1] - 1:
- if backbone[cell_num - 1] == 3 and nas_remove_os32_stride:
- # No need to rescale features.
- pass
- else:
- # Scale features by a factor of 2.
- scaled_height = scale_dimension(net.shape[1].value, 2)
- scaled_width = scale_dimension(net.shape[2].value, 2)
- net = resize_bilinear(net, [scaled_height, scaled_width], net.dtype)
- filter_scaling /= hparams.filter_scaling_rate
- net = cell(
- net,
- scope='cell_{}'.format(cell_num),
- filter_scaling=filter_scaling,
- stride=stride,
- prev_layer=cell_outputs[-2],
- cell_num=cell_num)
- if add_and_check_endpoint('Cell_{}'.format(cell_num), net):
- return net, end_points
- cell_outputs.append(net)
- net = tf.nn.relu(net)
-
- if nas_use_classification_head:
- # Add image classification head.
- # We will expand the filters for different output_strides.
- output_stride_to_expanded_filters = {8: 256, 16: 512, 32: 1024}
- current_output_scale = 2 + backbone[-1]
- current_output_stride = 2 ** current_output_scale
- if output_stride % current_output_stride != 0:
- raise ValueError(
- 'output_stride must be a multiple of backbone output stride.')
- output_stride //= current_output_stride
- rate = 1
- if current_output_stride != 32:
- num_downsampling = 5 - current_output_scale
- for i in range(num_downsampling):
- # Gradually downsample feature maps to output stride = 32.
- target_output_stride = 2 ** (current_output_scale + 1 + i)
- target_filters = output_stride_to_expanded_filters[
- target_output_stride]
- scope = 'downsample_os{}'.format(target_output_stride)
- if output_stride != 1:
- stride = 2
- output_stride //= 2
- else:
- stride = 1
- rate *= 2
- net = resnet_utils.conv2d_same(
- net, target_filters, 3, stride=stride, rate=rate,
- scope=scope + '_conv')
- net = batch_norm_fn(net, scope=scope + '_bn')
- add_and_check_endpoint(scope, net)
- net = tf.nn.relu(net)
- # Apply 1x1 convolution to expand dimension to 2048.
- scope = 'classification_head'
- net = slim.conv2d(net, 2048, 1, scope=scope + '_conv')
- net = batch_norm_fn(net, scope=scope + '_bn')
- add_and_check_endpoint(scope, net)
- net = tf.nn.relu(net)
- if global_pool:
- # Global average pooling.
- net = tf.reduce_mean(net, [1, 2], name='global_pool', keepdims=True)
- if num_classes is not None:
- net = slim.conv2d(net, num_classes, 1, activation_fn=None,
- normalizer_fn=None, scope='logits')
- end_points['predictions'] = slim.softmax(net, scope='predictions')
- return net, end_points
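As a reading aid for the loop above: each entry in `backbone` is a resolution level, and the nominal output stride of a cell is `2 ** (2 + level)`; the stride-2 and bilinear-upsample branches fire whenever consecutive levels go up or down by one. A small illustrative sketch (not part of the original file):

```python
# Map a backbone spec to per-cell output strides, e.g. for the
# output_stride=32 PNASNet backbone used by pnasnet() below.
backbone = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
for cell_num, level in enumerate(backbone):
  print('Cell_{}: nominal output_stride = {}'.format(cell_num, 2 ** (2 + level)))
# Cell_0..3 run at stride 8, Cell_4..7 at stride 16, Cell_8..11 at stride 32.
```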
-
-
-def pnasnet(images,
- num_classes,
- is_training=True,
- global_pool=False,
- output_stride=16,
- nas_architecture_options=None,
- nas_training_hyper_parameters=None,
- reuse=None,
- scope='pnasnet',
- final_endpoint=None,
- sync_batch_norm_method='None'):
- """Builds PNASNet model."""
- if nas_architecture_options is None:
- raise ValueError(
- 'Using NAS model variants. nas_architecture_options cannot be None.')
- hparams = config(num_conv_filters=nas_architecture_options[
- 'nas_stem_output_num_conv_filters'])
- if nas_training_hyper_parameters:
- hparams.set_hparam('drop_path_keep_prob',
- nas_training_hyper_parameters['drop_path_keep_prob'])
- hparams.set_hparam('total_training_steps',
- nas_training_hyper_parameters['total_training_steps'])
- if not is_training:
- tf.logging.info('During inference, setting drop_path_keep_prob = 1.0.')
- hparams.set_hparam('drop_path_keep_prob', 1.0)
- tf.logging.info(hparams)
- if output_stride == 8:
- backbone = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
- elif output_stride == 16:
- backbone = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
- elif output_stride == 32:
- backbone = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
- else:
- raise ValueError('Unsupported output_stride ', output_stride)
- batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
- cell = nas_genotypes.PNASCell(hparams.num_conv_filters,
- hparams.drop_path_keep_prob,
- len(backbone),
- hparams.total_training_steps,
- batch_norm_fn=batch_norm)
- with arg_scope([slim.dropout, batch_norm], is_training=is_training):
- return _build_nas_base(
- images,
- cell=cell,
- backbone=backbone,
- num_classes=num_classes,
- hparams=hparams,
- global_pool=global_pool,
- output_stride=output_stride,
- nas_use_classification_head=nas_architecture_options[
- 'nas_use_classification_head'],
- reuse=reuse,
- scope=scope,
- final_endpoint=final_endpoint,
- batch_norm_fn=batch_norm,
- nas_remove_os32_stride=nas_architecture_options[
- 'nas_remove_os32_stride'])
-
-
-# pylint: disable=unused-argument
-def hnasnet(images,
- num_classes,
- is_training=True,
- global_pool=False,
- output_stride=8,
- nas_architecture_options=None,
- nas_training_hyper_parameters=None,
- reuse=None,
- scope='hnasnet',
- final_endpoint=None,
- sync_batch_norm_method='None'):
- """Builds hierarchical model."""
- if nas_architecture_options is None:
- raise ValueError(
- 'Using NAS model variants. nas_architecture_options cannot be None.')
- hparams = config(num_conv_filters=nas_architecture_options[
- 'nas_stem_output_num_conv_filters'])
- if nas_training_hyper_parameters:
- hparams.set_hparam('drop_path_keep_prob',
- nas_training_hyper_parameters['drop_path_keep_prob'])
- hparams.set_hparam('total_training_steps',
- nas_training_hyper_parameters['total_training_steps'])
- if not is_training:
- tf.logging.info('During inference, setting drop_path_keep_prob = 1.0.')
- hparams.set_hparam('drop_path_keep_prob', 1.0)
- tf.logging.info(hparams)
- operations = [
- 'atrous_5x5', 'separable_3x3_2', 'separable_3x3_2', 'atrous_3x3',
- 'separable_3x3_2', 'separable_3x3_2', 'separable_5x5_2',
- 'separable_5x5_2', 'separable_5x5_2', 'atrous_5x5'
- ]
- used_hiddenstates = [1, 1, 0, 0, 0, 0, 0]
- hiddenstate_indices = [1, 0, 1, 0, 3, 1, 4, 2, 3, 5]
- backbone = [0, 0, 0, 1, 2, 1, 2, 2, 3, 3, 2, 1]
- batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
- cell = NASBaseCell(hparams.num_conv_filters,
- operations,
- used_hiddenstates,
- hiddenstate_indices,
- hparams.drop_path_keep_prob,
- len(backbone),
- hparams.total_training_steps,
- batch_norm_fn=batch_norm)
- with arg_scope([slim.dropout, batch_norm], is_training=is_training):
- return _build_nas_base(
- images,
- cell=cell,
- backbone=backbone,
- num_classes=num_classes,
- hparams=hparams,
- global_pool=global_pool,
- output_stride=output_stride,
- nas_use_classification_head=nas_architecture_options[
- 'nas_use_classification_head'],
- reuse=reuse,
- scope=scope,
- final_endpoint=final_endpoint,
- batch_norm_fn=batch_norm,
- nas_remove_os32_stride=nas_architecture_options[
- 'nas_remove_os32_stride'])
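A hedged end-to-end call sketch for the hierarchical variant (TF 1.x; the option values are illustrative, not tuned settings). Note that both builders force `drop_path_keep_prob` back to 1.0 whenever `is_training` is False:

```python
import tensorflow as tf
from deeplab.core import nas_network

images = tf.placeholder(tf.float32, [1, 513, 513, 3])
net, end_points = nas_network.hnasnet(
    images,
    num_classes=None,
    is_training=True,
    output_stride=8,
    nas_architecture_options={
        'nas_stem_output_num_conv_filters': 20,
        'nas_use_classification_head': False,
        'nas_remove_os32_stride': False,
    },
    nas_training_hyper_parameters={
        'drop_path_keep_prob': 0.9,
        'total_training_steps': 500000,
    })
```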
diff --git a/research/deeplab/core/nas_network_test.py b/research/deeplab/core/nas_network_test.py
deleted file mode 100644
index 18621b250ad..00000000000
--- a/research/deeplab/core/nas_network_test.py
+++ /dev/null
@@ -1,111 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for resnet_v1_beta module."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import tensorflow as tf
-from tensorflow.contrib import framework as contrib_framework
-from tensorflow.contrib import slim as contrib_slim
-from tensorflow.contrib import training as contrib_training
-
-from deeplab.core import nas_genotypes
-from deeplab.core import nas_network
-
-arg_scope = contrib_framework.arg_scope
-slim = contrib_slim
-
-
-def create_test_input(batch, height, width, channels):
- """Creates test input tensor."""
- if None in [batch, height, width, channels]:
- return tf.placeholder(tf.float32, (batch, height, width, channels))
- else:
- return tf.to_float(
- np.tile(
- np.reshape(
- np.reshape(np.arange(height), [height, 1]) +
- np.reshape(np.arange(width), [1, width]),
- [1, height, width, 1]),
- [batch, 1, 1, channels]))
-
-
-class NASNetworkTest(tf.test.TestCase):
- """Tests with complete small NAS networks."""
-
- def _pnasnet(self,
- images,
- backbone,
- num_classes,
- is_training=True,
- output_stride=16,
- final_endpoint=None):
- """Build PNASNet model backbone."""
- hparams = contrib_training.HParams(
- filter_scaling_rate=2.0,
- num_conv_filters=10,
- drop_path_keep_prob=1.0,
- total_training_steps=200000,
- )
- if not is_training:
- hparams.set_hparam('drop_path_keep_prob', 1.0)
-
- cell = nas_genotypes.PNASCell(hparams.num_conv_filters,
- hparams.drop_path_keep_prob,
- len(backbone),
- hparams.total_training_steps)
- with arg_scope([slim.dropout, slim.batch_norm], is_training=is_training):
- return nas_network._build_nas_base(
- images,
- cell=cell,
- backbone=backbone,
- num_classes=num_classes,
- hparams=hparams,
- reuse=tf.AUTO_REUSE,
- scope='pnasnet_small',
- final_endpoint=final_endpoint)
-
- def testFullyConvolutionalEndpointShapes(self):
- num_classes = 10
- backbone = [0, 0, 0, 1, 2, 1, 2, 2, 3, 3, 2, 1]
- inputs = create_test_input(None, 321, 321, 3)
- with slim.arg_scope(nas_network.nas_arg_scope()):
- _, end_points = self._pnasnet(inputs, backbone, num_classes)
- endpoint_to_shape = {
- 'Stem': [None, 81, 81, 128],
- 'Cell_0': [None, 81, 81, 50],
- 'Cell_1': [None, 81, 81, 50],
- 'Cell_2': [None, 81, 81, 50],
- 'Cell_3': [None, 41, 41, 100],
- 'Cell_4': [None, 21, 21, 200],
- 'Cell_5': [None, 41, 41, 100],
- 'Cell_6': [None, 21, 21, 200],
- 'Cell_7': [None, 21, 21, 200],
- 'Cell_8': [None, 11, 11, 400],
- 'Cell_9': [None, 11, 11, 400],
- 'Cell_10': [None, 21, 21, 200],
- 'Cell_11': [None, 41, 41, 100]
- }
- for endpoint, shape in endpoint_to_shape.items():
- self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/deeplab/core/preprocess_utils.py b/research/deeplab/core/preprocess_utils.py
deleted file mode 100644
index 440717e414d..00000000000
--- a/research/deeplab/core/preprocess_utils.py
+++ /dev/null
@@ -1,533 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Utility functions related to preprocessing inputs."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-from six.moves import range
-from six.moves import zip
-import tensorflow as tf
-
-
-def flip_dim(tensor_list, prob=0.5, dim=1):
- """Randomly flips a dimension of the given tensor.
-
- The decision to randomly flip the `Tensors` is made together. In other words,
- all or none of the images passed in are flipped.
-
- Note that tf.random_flip_left_right and tf.random_flip_up_down aren't used so
- that we can control the probability as well as ensure the same decision
- is applied across the images.
-
- Args:
- tensor_list: A list of `Tensors` with the same number of dimensions.
- prob: The probability of a left-right flip.
- dim: The dimension to flip, 0, 1, ..
-
- Returns:
- outputs: A list of the possibly flipped `Tensors` as well as an indicator
- `Tensor` at the end whose value is `True` if the inputs were flipped and
- `False` otherwise.
-
- Raises:
- ValueError: If dim is negative or greater than the dimension of a `Tensor`.
- """
- random_value = tf.random_uniform([])
-
- def flip():
- flipped = []
- for tensor in tensor_list:
- if dim < 0 or dim >= len(tensor.get_shape().as_list()):
- raise ValueError('dim must represent a valid dimension.')
- flipped.append(tf.reverse_v2(tensor, [dim]))
- return flipped
-
- is_flipped = tf.less_equal(random_value, prob)
- outputs = tf.cond(is_flipped, flip, lambda: tensor_list)
- if not isinstance(outputs, (list, tuple)):
- outputs = [outputs]
- outputs.append(is_flipped)
-
- return outputs
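A usage sketch (TF 1.x graph mode): flipping an image and its label with one shared random decision, which is exactly what the segmentation input pipeline needs:

```python
import tensorflow as tf
from deeplab.core import preprocess_utils

image = tf.placeholder(tf.float32, [None, None, 3])
label = tf.placeholder(tf.int32, [None, None, 1])
# dim=1 flips left-right; both tensors are flipped (or not) together, and
# is_flipped reports which branch was taken.
flipped_image, flipped_label, is_flipped = preprocess_utils.flip_dim(
    [image, label], prob=0.5, dim=1)
```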
-
-
-def _image_dimensions(image, rank):
- """Returns the dimensions of an image tensor.
-
- Args:
- image: A rank-D Tensor. For 3-D of shape: `[height, width, channels]`.
- rank: The expected rank of the image
-
- Returns:
- A list corresponding to the dimensions of the input image. Dimensions
- that are statically known are python integers; otherwise they are integer
- scalar tensors.
- """
- if image.get_shape().is_fully_defined():
- return image.get_shape().as_list()
- else:
- static_shape = image.get_shape().with_rank(rank).as_list()
- dynamic_shape = tf.unstack(tf.shape(image), rank)
- return [
- s if s is not None else d for s, d in zip(static_shape, dynamic_shape)
- ]
-
-
-def get_label_resize_method(label):
- """Returns the resize method of labels depending on label dtype.
-
- Args:
- label: Groundtruth label tensor.
-
- Returns:
- tf.image.ResizeMethod.BILINEAR, if label dtype is floating.
- tf.image.ResizeMethod.NEAREST_NEIGHBOR, if label dtype is integer.
-
- Raises:
- ValueError: If label is neither floating nor integer.
- """
- if label.dtype.is_floating:
- return tf.image.ResizeMethod.BILINEAR
- elif label.dtype.is_integer:
- return tf.image.ResizeMethod.NEAREST_NEIGHBOR
- else:
- raise ValueError('Label type must be either floating or integer.')
-
-
-def pad_to_bounding_box(image, offset_height, offset_width, target_height,
- target_width, pad_value):
- """Pads the given image with the given pad_value.
-
- Works like tf.image.pad_to_bounding_box, except it can pad the image
- with any given arbitrary pad value and also handle images whose sizes are not
- known during graph construction.
-
- Args:
- image: 3-D tensor with shape [height, width, channels]
- offset_height: Number of rows of padding to add on top.
- offset_width: Number of columns of padding to add on the left.
- target_height: Height of output image.
- target_width: Width of output image.
- pad_value: Value to pad the image tensor with.
-
- Returns:
- 3-D tensor of shape [target_height, target_width, channels].
-
- Raises:
- ValueError: If the shape of image is incompatible with the offset_* or
- target_* arguments.
- """
- with tf.name_scope(None, 'pad_to_bounding_box', [image]):
- image = tf.convert_to_tensor(image, name='image')
- original_dtype = image.dtype
- if original_dtype != tf.float32 and original_dtype != tf.float64:
- # If image dtype is not float, we convert it to int32 to avoid overflow.
- image = tf.cast(image, tf.int32)
- image_rank_assert = tf.Assert(
- tf.logical_or(
- tf.equal(tf.rank(image), 3),
- tf.equal(tf.rank(image), 4)),
- ['Wrong image tensor rank.'])
- with tf.control_dependencies([image_rank_assert]):
- image -= pad_value
- image_shape = image.get_shape()
- is_batch = True
- if image_shape.ndims == 3:
- is_batch = False
- image = tf.expand_dims(image, 0)
- elif image_shape.ndims is None:
- is_batch = False
- image = tf.expand_dims(image, 0)
- image.set_shape([None] * 4)
- elif image.get_shape().ndims != 4:
- raise ValueError('Input image must have either 3 or 4 dimensions.')
- _, height, width, _ = _image_dimensions(image, rank=4)
- target_width_assert = tf.Assert(
- tf.greater_equal(
- target_width, width),
- ['target_width must be >= width'])
- target_height_assert = tf.Assert(
- tf.greater_equal(target_height, height),
- ['target_height must be >= height'])
- with tf.control_dependencies([target_width_assert]):
- after_padding_width = target_width - offset_width - width
- with tf.control_dependencies([target_height_assert]):
- after_padding_height = target_height - offset_height - height
- offset_assert = tf.Assert(
- tf.logical_and(
- tf.greater_equal(after_padding_width, 0),
- tf.greater_equal(after_padding_height, 0)),
- ['target size not possible with the given target offsets'])
- batch_params = tf.stack([0, 0])
- height_params = tf.stack([offset_height, after_padding_height])
- width_params = tf.stack([offset_width, after_padding_width])
- channel_params = tf.stack([0, 0])
- with tf.control_dependencies([offset_assert]):
- paddings = tf.stack([batch_params, height_params, width_params,
- channel_params])
- padded = tf.pad(image, paddings)
- if not is_batch:
- padded = tf.squeeze(padded, axis=[0])
- outputs = padded + pad_value
- if outputs.dtype != original_dtype:
- outputs = tf.cast(outputs, original_dtype)
- return outputs
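A small sketch of the typical call (TF 1.x): padding an image up to the crop size with a non-zero value. The `pad_value` here is illustrative; DeepLab's own pipeline passes the dataset mean pixel so padded regions stay neutral after mean subtraction.

```python
import tensorflow as tf
from deeplab.core import preprocess_utils

image = tf.placeholder(tf.float32, [None, None, 3])
# Pad to a 513x513 canvas, keeping the image in the top-left corner.
# 127.5 is an illustrative pad value, not the value used in the pipeline.
padded = preprocess_utils.pad_to_bounding_box(
    image, offset_height=0, offset_width=0,
    target_height=513, target_width=513, pad_value=127.5)
```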
-
-
-def _crop(image, offset_height, offset_width, crop_height, crop_width):
- """Crops the given image using the provided offsets and sizes.
-
- Note that the method doesn't assume we know the input image size but it does
- assume we know the input image rank.
-
- Args:
- image: an image of shape [height, width, channels].
- offset_height: a scalar tensor indicating the height offset.
- offset_width: a scalar tensor indicating the width offset.
- crop_height: the height of the cropped image.
- crop_width: the width of the cropped image.
-
- Returns:
- The cropped (and resized) image.
-
- Raises:
- ValueError: if `image` doesn't have rank of 3.
- InvalidArgumentError: if the rank is not 3 or if the image dimensions are
- less than the crop size.
- """
- original_shape = tf.shape(image)
-
- if len(image.get_shape().as_list()) != 3:
- raise ValueError('input must have rank of 3')
- original_channels = image.get_shape().as_list()[2]
-
- rank_assertion = tf.Assert(
- tf.equal(tf.rank(image), 3),
- ['Rank of image must be equal to 3.'])
- with tf.control_dependencies([rank_assertion]):
- cropped_shape = tf.stack([crop_height, crop_width, original_shape[2]])
-
- size_assertion = tf.Assert(
- tf.logical_and(
- tf.greater_equal(original_shape[0], crop_height),
- tf.greater_equal(original_shape[1], crop_width)),
- ['Crop size greater than the image size.'])
-
- offsets = tf.cast(tf.stack([offset_height, offset_width, 0]), tf.int32)
-
- # Use tf.slice instead of crop_to_bounding box as it accepts tensors to
- # define the crop size.
- with tf.control_dependencies([size_assertion]):
- image = tf.slice(image, offsets, cropped_shape)
- image = tf.reshape(image, cropped_shape)
- image.set_shape([crop_height, crop_width, original_channels])
- return image
-
-
-def random_crop(image_list, crop_height, crop_width):
- """Crops the given list of images.
-
- The function applies the same crop to each image in the list. This can be
- effectively applied when there are multiple image inputs of the same
- dimension such as:
-
- image, depths, normals = random_crop([image, depths, normals], 120, 150)
-
- Args:
- image_list: a list of image tensors of the same dimension but possibly
- varying channel.
- crop_height: the new height.
- crop_width: the new width.
-
- Returns:
- the image_list with cropped images.
-
- Raises:
- ValueError: if there are multiple image inputs provided with different size
- or the images are smaller than the crop dimensions.
- """
- if not image_list:
- raise ValueError('Empty image_list.')
-
- # Compute the rank assertions.
- rank_assertions = []
- for i in range(len(image_list)):
- image_rank = tf.rank(image_list[i])
- rank_assert = tf.Assert(
- tf.equal(image_rank, 3),
- ['Wrong rank for tensor %s [expected] [actual]',
- image_list[i].name, 3, image_rank])
- rank_assertions.append(rank_assert)
-
- with tf.control_dependencies([rank_assertions[0]]):
- image_shape = tf.shape(image_list[0])
- image_height = image_shape[0]
- image_width = image_shape[1]
- crop_size_assert = tf.Assert(
- tf.logical_and(
- tf.greater_equal(image_height, crop_height),
- tf.greater_equal(image_width, crop_width)),
- ['Crop size greater than the image size.'])
-
- asserts = [rank_assertions[0], crop_size_assert]
-
- for i in range(1, len(image_list)):
- image = image_list[i]
- asserts.append(rank_assertions[i])
- with tf.control_dependencies([rank_assertions[i]]):
- shape = tf.shape(image)
- height = shape[0]
- width = shape[1]
-
- height_assert = tf.Assert(
- tf.equal(height, image_height),
- ['Wrong height for tensor %s [expected][actual]',
- image.name, height, image_height])
- width_assert = tf.Assert(
- tf.equal(width, image_width),
- ['Wrong width for tensor %s [expected][actual]',
- image.name, width, image_width])
- asserts.extend([height_assert, width_assert])
-
- # Create a random bounding box.
- #
- # Use tf.random_uniform and not numpy.random.rand as doing the former would
- # generate random numbers at graph eval time, unlike the latter which
- # generates random numbers at graph definition time.
- with tf.control_dependencies(asserts):
- max_offset_height = tf.reshape(image_height - crop_height + 1, [])
- max_offset_width = tf.reshape(image_width - crop_width + 1, [])
- offset_height = tf.random_uniform(
- [], maxval=max_offset_height, dtype=tf.int32)
- offset_width = tf.random_uniform(
- [], maxval=max_offset_width, dtype=tf.int32)
-
- return [_crop(image, offset_height, offset_width,
- crop_height, crop_width) for image in image_list]
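Usage sketch (TF 1.x): cropping an image and its label with the same random window so pixels and ground truth stay aligned:

```python
import tensorflow as tf
from deeplab.core import preprocess_utils

image = tf.placeholder(tf.float32, [None, None, 3])
label = tf.placeholder(tf.int32, [None, None, 1])
cropped_image, cropped_label = preprocess_utils.random_crop(
    [image, label], crop_height=321, crop_width=321)
```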
-
-
-def get_random_scale(min_scale_factor, max_scale_factor, step_size):
- """Gets a random scale value.
-
- Args:
- min_scale_factor: Minimum scale value.
- max_scale_factor: Maximum scale value.
- step_size: The step size from minimum to maximum value.
-
- Returns:
- A random scale value selected between minimum and maximum value.
-
- Raises:
- ValueError: min_scale_factor has unexpected value.
- """
- if min_scale_factor < 0 or min_scale_factor > max_scale_factor:
- raise ValueError('Unexpected value of min_scale_factor.')
-
- if min_scale_factor == max_scale_factor:
- return tf.cast(min_scale_factor, tf.float32)
-
- # When step_size = 0, we sample the value uniformly from [min, max).
- if step_size == 0:
- return tf.random_uniform([1],
- minval=min_scale_factor,
- maxval=max_scale_factor)
-
- # When step_size != 0, we randomly select one discrete value from [min, max].
- num_steps = int((max_scale_factor - min_scale_factor) / step_size + 1)
- scale_factors = tf.lin_space(min_scale_factor, max_scale_factor, num_steps)
- shuffled_scale_factors = tf.random_shuffle(scale_factors)
- return shuffled_scale_factors[0]
-
-
-def randomly_scale_image_and_label(image, label=None, scale=1.0):
- """Randomly scales image and label.
-
- Args:
- image: Image with shape [height, width, 3].
- label: Label with shape [height, width, 1].
- scale: The value to scale image and label.
-
- Returns:
- Scaled image and label.
- """
- # No random scaling if scale == 1.
- if scale == 1.0:
- return image, label
- image_shape = tf.shape(image)
- new_dim = tf.cast(
- tf.cast([image_shape[0], image_shape[1]], tf.float32) * scale,
- tf.int32)
-
- # Need squeeze and expand_dims because image interpolation takes
- # 4D tensors as input.
- image = tf.squeeze(tf.image.resize_bilinear(
- tf.expand_dims(image, 0),
- new_dim,
- align_corners=True), [0])
- if label is not None:
- label = tf.image.resize(
- label,
- new_dim,
- method=get_label_resize_method(label),
- align_corners=True)
-
- return image, label
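A short sketch tying the two functions above together (TF 1.x). With `step_size=0.25` the scale is drawn from the discrete grid [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0] (7 values); with `step_size=0` it is sampled uniformly from [0.5, 2.0):

```python
import tensorflow as tf
from deeplab.core import preprocess_utils

image = tf.placeholder(tf.float32, [None, None, 3])
label = tf.placeholder(tf.int32, [None, None, 1])
scale = preprocess_utils.get_random_scale(
    min_scale_factor=0.5, max_scale_factor=2.0, step_size=0.25)
scaled_image, scaled_label = preprocess_utils.randomly_scale_image_and_label(
    image, label, scale=scale)
```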
-
-
-def resolve_shape(tensor, rank=None, scope=None):
- """Fully resolves the shape of a Tensor.
-
- Use as much as possible the shape components already known during graph
- creation and resolve the remaining ones during runtime.
-
- Args:
- tensor: Input tensor whose shape we query.
- rank: The rank of the tensor, provided that we know it.
- scope: Optional name scope.
-
- Returns:
- shape: The full shape of the tensor.
- """
- with tf.name_scope(scope, 'resolve_shape', [tensor]):
- if rank is not None:
- shape = tensor.get_shape().with_rank(rank).as_list()
- else:
- shape = tensor.get_shape().as_list()
-
- if None in shape:
- shape_dynamic = tf.shape(tensor)
- for i in range(len(shape)):
- if shape[i] is None:
- shape[i] = shape_dynamic[i]
-
- return shape
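A quick sketch of what `resolve_shape` returns for a partially-defined tensor (TF 1.x):

```python
import tensorflow as tf
from deeplab.core import preprocess_utils

image = tf.placeholder(tf.float32, [None, None, 3])
height, width, channels = preprocess_utils.resolve_shape(image, rank=3)
# height and width are scalar int32 tensors resolved at run time;
# channels is the statically known python int 3.
```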
-
-
-def resize_to_range(image,
- label=None,
- min_size=None,
- max_size=None,
- factor=None,
- keep_aspect_ratio=True,
- align_corners=True,
- label_layout_is_chw=False,
- scope=None,
- method=tf.image.ResizeMethod.BILINEAR):
- """Resizes image or label so their sides are within the provided range.
-
- The output size can be described by two cases:
- 1. If the image can be rescaled so its minimum size is equal to min_size
- without the other side exceeding max_size, then do so.
- 2. Otherwise, resize so the largest side is equal to max_size.
-
- An integer in `range(factor)` is added to the computed sides so that the
- final dimensions are multiples of `factor` plus one.
-
- Args:
- image: A 3D tensor of shape [height, width, channels].
- label: (optional) A 3D tensor of shape [height, width, channels] (default)
- or [channels, height, width] when label_layout_is_chw = True.
- min_size: (scalar) desired size of the smaller image side.
- max_size: (scalar) maximum allowed size of the larger image side. Note
- that the output dimension is no larger than max_size and may be slightly
- smaller than max_size when factor is not None.
- factor: Make output size multiple of factor plus one.
- keep_aspect_ratio: Boolean, keep aspect ratio or not. If True, the input
- will be resized while keeping the original aspect ratio. If False, the
- input will be resized to [max_resize_value, max_resize_value] without
- keeping the original aspect ratio.
- align_corners: If True, exactly align all 4 corners of input and output.
- label_layout_is_chw: If true, the label has shape [channel, height, width].
- We support this case because for some instance segmentation datasets, the
- instance segmentation is saved as [num_instances, height, width].
- scope: Optional name scope.
- method: Image resize method. Defaults to tf.image.ResizeMethod.BILINEAR.
-
- Returns:
- A 3-D tensor of shape [new_height, new_width, channels], where the image
- has been resized (with the specified method) so that
- min(new_height, new_width) == ceil(min_size) or
- max(new_height, new_width) == ceil(max_size).
-
- Raises:
- ValueError: If the image is not a 3D tensor.
- """
- with tf.name_scope(scope, 'resize_to_range', [image]):
- new_tensor_list = []
- min_size = tf.cast(min_size, tf.float32)
- if max_size is not None:
- max_size = tf.cast(max_size, tf.float32)
- # Modify the max_size to be a multiple of factor plus 1 and make sure the
- # max dimension after resizing is no larger than max_size.
- if factor is not None:
- max_size = (max_size - (max_size - 1) % factor)
-
- [orig_height, orig_width, _] = resolve_shape(image, rank=3)
- orig_height = tf.cast(orig_height, tf.float32)
- orig_width = tf.cast(orig_width, tf.float32)
- orig_min_size = tf.minimum(orig_height, orig_width)
-
- # Calculate the larger of the possible sizes
- large_scale_factor = min_size / orig_min_size
- large_height = tf.cast(tf.floor(orig_height * large_scale_factor), tf.int32)
- large_width = tf.cast(tf.floor(orig_width * large_scale_factor), tf.int32)
- large_size = tf.stack([large_height, large_width])
-
- new_size = large_size
- if max_size is not None:
- # Calculate the smaller of the possible sizes, use that if the larger
- # is too big.
- orig_max_size = tf.maximum(orig_height, orig_width)
- small_scale_factor = max_size / orig_max_size
- small_height = tf.cast(
- tf.floor(orig_height * small_scale_factor), tf.int32)
- small_width = tf.cast(tf.floor(orig_width * small_scale_factor), tf.int32)
- small_size = tf.stack([small_height, small_width])
- new_size = tf.cond(
- tf.cast(tf.reduce_max(large_size), tf.float32) > max_size,
- lambda: small_size,
- lambda: large_size)
- # Ensure that both output sides are multiples of factor plus one.
- if factor is not None:
- new_size += (factor - (new_size - 1) % factor) % factor
- if not keep_aspect_ratio:
- # If not keeping the aspect ratio, we resize both sides to max_size, allowing
- # us to do pre-processing without extra padding.
- new_size = [tf.reduce_max(new_size), tf.reduce_max(new_size)]
- new_tensor_list.append(tf.image.resize(
- image, new_size, method=method, align_corners=align_corners))
- if label is not None:
- if label_layout_is_chw:
- # Input label has shape [channel, height, width].
- resized_label = tf.expand_dims(label, 3)
- resized_label = tf.image.resize(
- resized_label,
- new_size,
- method=get_label_resize_method(label),
- align_corners=align_corners)
- resized_label = tf.squeeze(resized_label, 3)
- else:
- # Input label has shape [height, width, channel].
- resized_label = tf.image.resize(
- label,
- new_size,
- method=get_label_resize_method(label),
- align_corners=align_corners)
- new_tensor_list.append(resized_label)
- else:
- new_tensor_list.append(None)
- return new_tensor_list
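A worked example mirroring the unit tests in `preprocess_utils_test.py` below: a 15x30 input with `min_size=50` and `max_size=100` is scaled by 50/15 so the short side reaches 50, giving 50x100 (the long side just hits `max_size`); with `factor=8` and `max_size=98` the same input instead comes out as 49x97, i.e. each side a multiple of 8 plus one.

```python
import tensorflow as tf
from deeplab.core import preprocess_utils

image = tf.random.normal([15, 30, 3])
resized_image, _ = preprocess_utils.resize_to_range(
    image=image, label=None, min_size=50, max_size=100, align_corners=True)
# resized_image has shape [50, 100, 3]; passing factor=8 and max_size=98
# instead yields [49, 97, 3].
```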
diff --git a/research/deeplab/core/preprocess_utils_test.py b/research/deeplab/core/preprocess_utils_test.py
deleted file mode 100644
index 606fe46dd62..00000000000
--- a/research/deeplab/core/preprocess_utils_test.py
+++ /dev/null
@@ -1,515 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for preprocess_utils."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-from six.moves import range
-import tensorflow as tf
-
-from deeplab.core import preprocess_utils
-
-
-class PreprocessUtilsTest(tf.test.TestCase):
-
- def testNoFlipWhenProbIsZero(self):
- numpy_image = np.dstack([[[5., 6.],
- [9., 0.]],
- [[4., 3.],
- [3., 5.]]])
- image = tf.convert_to_tensor(numpy_image)
-
- with self.test_session():
- actual, is_flipped = preprocess_utils.flip_dim([image], prob=0, dim=0)
- self.assertAllEqual(numpy_image, actual.eval())
- self.assertAllEqual(False, is_flipped.eval())
- actual, is_flipped = preprocess_utils.flip_dim([image], prob=0, dim=1)
- self.assertAllEqual(numpy_image, actual.eval())
- self.assertAllEqual(False, is_flipped.eval())
- actual, is_flipped = preprocess_utils.flip_dim([image], prob=0, dim=2)
- self.assertAllEqual(numpy_image, actual.eval())
- self.assertAllEqual(False, is_flipped.eval())
-
- def testFlipWhenProbIsOne(self):
- numpy_image = np.dstack([[[5., 6.],
- [9., 0.]],
- [[4., 3.],
- [3., 5.]]])
- dim0_flipped = np.dstack([[[9., 0.],
- [5., 6.]],
- [[3., 5.],
- [4., 3.]]])
- dim1_flipped = np.dstack([[[6., 5.],
- [0., 9.]],
- [[3., 4.],
- [5., 3.]]])
- dim2_flipped = np.dstack([[[4., 3.],
- [3., 5.]],
- [[5., 6.],
- [9., 0.]]])
- image = tf.convert_to_tensor(numpy_image)
-
- with self.test_session():
- actual, is_flipped = preprocess_utils.flip_dim([image], prob=1, dim=0)
- self.assertAllEqual(dim0_flipped, actual.eval())
- self.assertAllEqual(True, is_flipped.eval())
- actual, is_flipped = preprocess_utils.flip_dim([image], prob=1, dim=1)
- self.assertAllEqual(dim1_flipped, actual.eval())
- self.assertAllEqual(True, is_flipped.eval())
- actual, is_flipped = preprocess_utils.flip_dim([image], prob=1, dim=2)
- self.assertAllEqual(dim2_flipped, actual.eval())
- self.assertAllEqual(True, is_flipped.eval())
-
- def testFlipMultipleImagesConsistentlyWhenProbIsOne(self):
- numpy_image = np.dstack([[[5., 6.],
- [9., 0.]],
- [[4., 3.],
- [3., 5.]]])
- numpy_label = np.dstack([[[0., 1.],
- [2., 3.]]])
- image_dim1_flipped = np.dstack([[[6., 5.],
- [0., 9.]],
- [[3., 4.],
- [5., 3.]]])
- label_dim1_flipped = np.dstack([[[1., 0.],
- [3., 2.]]])
- image = tf.convert_to_tensor(numpy_image)
- label = tf.convert_to_tensor(numpy_label)
-
- with self.test_session() as sess:
- image, label, is_flipped = preprocess_utils.flip_dim(
- [image, label], prob=1, dim=1)
- actual_image, actual_label = sess.run([image, label])
- self.assertAllEqual(image_dim1_flipped, actual_image)
- self.assertAllEqual(label_dim1_flipped, actual_label)
- self.assertEqual(True, is_flipped.eval())
-
- def testReturnRandomFlipsOnMultipleEvals(self):
- numpy_image = np.dstack([[[5., 6.],
- [9., 0.]],
- [[4., 3.],
- [3., 5.]]])
- dim1_flipped = np.dstack([[[6., 5.],
- [0., 9.]],
- [[3., 4.],
- [5., 3.]]])
- image = tf.convert_to_tensor(numpy_image)
- tf.compat.v1.set_random_seed(53)
-
- with self.test_session() as sess:
- actual, is_flipped = preprocess_utils.flip_dim(
- [image], prob=0.5, dim=1)
- actual_image, actual_is_flipped = sess.run([actual, is_flipped])
- self.assertAllEqual(numpy_image, actual_image)
- self.assertEqual(False, actual_is_flipped)
- actual_image, actual_is_flipped = sess.run([actual, is_flipped])
- self.assertAllEqual(dim1_flipped, actual_image)
- self.assertEqual(True, actual_is_flipped)
-
- def testReturnCorrectCropOfSingleImage(self):
- np.random.seed(0)
-
- height, width = 10, 20
- image = np.random.randint(0, 256, size=(height, width, 3))
-
- crop_height, crop_width = 2, 4
-
- image_placeholder = tf.placeholder(tf.int32, shape=(None, None, 3))
- [cropped] = preprocess_utils.random_crop([image_placeholder],
- crop_height,
- crop_width)
-
- with self.test_session():
- cropped_image = cropped.eval(feed_dict={image_placeholder: image})
-
- # Ensure we can find the cropped image in the original:
- is_found = False
- for x in range(0, width - crop_width + 1):
- for y in range(0, height - crop_height + 1):
- if np.isclose(image[y:y+crop_height, x:x+crop_width, :],
- cropped_image).all():
- is_found = True
- break
-
- self.assertTrue(is_found)
-
- def testRandomCropMaintainsNumberOfChannels(self):
- np.random.seed(0)
-
- crop_height, crop_width = 10, 20
- image = np.random.randint(0, 256, size=(100, 200, 3))
-
- tf.compat.v1.set_random_seed(37)
- image_placeholder = tf.placeholder(tf.int32, shape=(None, None, 3))
- [cropped] = preprocess_utils.random_crop(
- [image_placeholder], crop_height, crop_width)
-
- with self.test_session():
- cropped_image = cropped.eval(feed_dict={image_placeholder: image})
- self.assertTupleEqual(cropped_image.shape, (crop_height, crop_width, 3))
-
- def testReturnDifferentCropAreasOnTwoEvals(self):
- tf.compat.v1.set_random_seed(0)
-
- crop_height, crop_width = 2, 3
- image = np.random.randint(0, 256, size=(100, 200, 3))
- image_placeholder = tf.placeholder(tf.int32, shape=(None, None, 3))
- [cropped] = preprocess_utils.random_crop(
- [image_placeholder], crop_height, crop_width)
-
- with self.test_session():
- crop0 = cropped.eval(feed_dict={image_placeholder: image})
- crop1 = cropped.eval(feed_dict={image_placeholder: image})
- self.assertFalse(np.isclose(crop0, crop1).all())
-
- def testReturnConsistenCropsOfImagesInTheList(self):
- tf.compat.v1.set_random_seed(0)
-
- height, width = 10, 20
- crop_height, crop_width = 2, 3
- labels = np.linspace(0, height * width-1, height * width)
- labels = labels.reshape((height, width, 1))
- image = np.tile(labels, (1, 1, 3))
-
- image_placeholder = tf.placeholder(tf.int32, shape=(None, None, 3))
- label_placeholder = tf.placeholder(tf.int32, shape=(None, None, 1))
- [cropped_image, cropped_label] = preprocess_utils.random_crop(
- [image_placeholder, label_placeholder], crop_height, crop_width)
-
- with self.test_session() as sess:
- cropped_image, cropped_labels = sess.run([cropped_image, cropped_label],
- feed_dict={
- image_placeholder: image,
- label_placeholder: labels})
- for i in range(3):
- self.assertAllEqual(cropped_image[:, :, i], cropped_labels.squeeze())
-
- def testDieOnRandomCropWhenImagesWithDifferentWidth(self):
- crop_height, crop_width = 2, 3
- image1 = tf.placeholder(tf.float32, name='image1', shape=(None, None, 3))
- image2 = tf.placeholder(tf.float32, name='image2', shape=(None, None, 1))
- cropped = preprocess_utils.random_crop(
- [image1, image2], crop_height, crop_width)
-
- with self.test_session() as sess:
- with self.assertRaises(tf.errors.InvalidArgumentError):
- sess.run(cropped, feed_dict={image1: np.random.rand(4, 5, 3),
- image2: np.random.rand(4, 6, 1)})
-
- def testDieOnRandomCropWhenImagesWithDifferentHeight(self):
- crop_height, crop_width = 2, 3
- image1 = tf.placeholder(tf.float32, name='image1', shape=(None, None, 3))
- image2 = tf.placeholder(tf.float32, name='image2', shape=(None, None, 1))
- cropped = preprocess_utils.random_crop(
- [image1, image2], crop_height, crop_width)
-
- with self.test_session() as sess:
- with self.assertRaisesWithPredicateMatch(
- tf.errors.InvalidArgumentError,
- 'Wrong height for tensor'):
- sess.run(cropped, feed_dict={image1: np.random.rand(4, 5, 3),
- image2: np.random.rand(3, 5, 1)})
-
- def testDieOnRandomCropWhenCropSizeIsGreaterThanImage(self):
- crop_height, crop_width = 5, 9
- image1 = tf.placeholder(tf.float32, name='image1', shape=(None, None, 3))
- image2 = tf.placeholder(tf.float32, name='image2', shape=(None, None, 1))
- cropped = preprocess_utils.random_crop(
- [image1, image2], crop_height, crop_width)
-
- with self.test_session() as sess:
- with self.assertRaisesWithPredicateMatch(
- tf.errors.InvalidArgumentError,
- 'Crop size greater than the image size.'):
- sess.run(cropped, feed_dict={image1: np.random.rand(4, 5, 3),
- image2: np.random.rand(4, 5, 1)})
-
- def testReturnPaddedImageWithNonZeroPadValue(self):
- for dtype in [np.int32, np.int64, np.float32, np.float64]:
- image = np.dstack([[[5, 6],
- [9, 0]],
- [[4, 3],
- [3, 5]]]).astype(dtype)
- expected_image = np.dstack([[[255, 255, 255, 255, 255],
- [255, 255, 255, 255, 255],
- [255, 5, 6, 255, 255],
- [255, 9, 0, 255, 255],
- [255, 255, 255, 255, 255]],
- [[255, 255, 255, 255, 255],
- [255, 255, 255, 255, 255],
- [255, 4, 3, 255, 255],
- [255, 3, 5, 255, 255],
- [255, 255, 255, 255, 255]]]).astype(dtype)
-
- with self.session() as sess:
- padded_image = preprocess_utils.pad_to_bounding_box(
- image, 2, 1, 5, 5, 255)
- padded_image = sess.run(padded_image)
- self.assertAllClose(padded_image, expected_image)
- # Add batch size = 1 to image.
- padded_image = preprocess_utils.pad_to_bounding_box(
- np.expand_dims(image, 0), 2, 1, 5, 5, 255)
- padded_image = sess.run(padded_image)
- self.assertAllClose(padded_image, np.expand_dims(expected_image, 0))
-
- def testReturnOriginalImageWhenTargetSizeIsEqualToImageSize(self):
- image = np.dstack([[[5, 6],
- [9, 0]],
- [[4, 3],
- [3, 5]]])
- with self.session() as sess:
- padded_image = preprocess_utils.pad_to_bounding_box(
- image, 0, 0, 2, 2, 255)
- padded_image = sess.run(padded_image)
- self.assertAllClose(padded_image, image)
-
- def testDieOnTargetSizeGreaterThanImageSize(self):
- image = np.dstack([[[5, 6],
- [9, 0]],
- [[4, 3],
- [3, 5]]])
- with self.test_session():
- image_placeholder = tf.placeholder(tf.float32)
- padded_image = preprocess_utils.pad_to_bounding_box(
- image_placeholder, 0, 0, 2, 1, 255)
- with self.assertRaisesWithPredicateMatch(
- tf.errors.InvalidArgumentError,
- 'target_width must be >= width'):
- padded_image.eval(feed_dict={image_placeholder: image})
- padded_image = preprocess_utils.pad_to_bounding_box(
- image_placeholder, 0, 0, 1, 2, 255)
- with self.assertRaisesWithPredicateMatch(
- tf.errors.InvalidArgumentError,
- 'target_height must be >= height'):
- padded_image.eval(feed_dict={image_placeholder: image})
-
- def testDieIfTargetSizeNotPossibleWithGivenOffset(self):
- image = np.dstack([[[5, 6],
- [9, 0]],
- [[4, 3],
- [3, 5]]])
- with self.test_session():
- image_placeholder = tf.placeholder(tf.float32)
- padded_image = preprocess_utils.pad_to_bounding_box(
- image_placeholder, 3, 0, 4, 4, 255)
- with self.assertRaisesWithPredicateMatch(
- tf.errors.InvalidArgumentError,
- 'target size not possible with the given target offsets'):
- padded_image.eval(feed_dict={image_placeholder: image})
-
- def testDieIfImageTensorRankIsTwo(self):
- image = np.vstack([[5, 6],
- [9, 0]])
- with self.test_session():
- image_placeholder = tf.placeholder(tf.float32)
- padded_image = preprocess_utils.pad_to_bounding_box(
- image_placeholder, 0, 0, 2, 2, 255)
- with self.assertRaisesWithPredicateMatch(
- tf.errors.InvalidArgumentError,
- 'Wrong image tensor rank'):
- padded_image.eval(feed_dict={image_placeholder: image})
-
- def testResizeTensorsToRange(self):
- test_shapes = [[60, 40],
- [15, 30],
- [15, 50]]
- min_size = 50
- max_size = 100
- factor = None
- expected_shape_list = [(75, 50, 3),
- (50, 100, 3),
- (30, 100, 3)]
- for i, test_shape in enumerate(test_shapes):
- image = tf.random.normal([test_shape[0], test_shape[1], 3])
- new_tensor_list = preprocess_utils.resize_to_range(
- image=image,
- label=None,
- min_size=min_size,
- max_size=max_size,
- factor=factor,
- align_corners=True)
- with self.test_session() as session:
- resized_image = session.run(new_tensor_list[0])
- self.assertEqual(resized_image.shape, expected_shape_list[i])
-
- def testResizeTensorsToRangeWithFactor(self):
- test_shapes = [[60, 40],
- [15, 30],
- [15, 50]]
- min_size = 50
- max_size = 98
- factor = 8
- expected_image_shape_list = [(81, 57, 3),
- (49, 97, 3),
- (33, 97, 3)]
- expected_label_shape_list = [(81, 57, 1),
- (49, 97, 1),
- (33, 97, 1)]
- for i, test_shape in enumerate(test_shapes):
- image = tf.random.normal([test_shape[0], test_shape[1], 3])
- label = tf.random.normal([test_shape[0], test_shape[1], 1])
- new_tensor_list = preprocess_utils.resize_to_range(
- image=image,
- label=label,
- min_size=min_size,
- max_size=max_size,
- factor=factor,
- align_corners=True)
- with self.test_session() as session:
- new_tensor_list = session.run(new_tensor_list)
- self.assertEqual(new_tensor_list[0].shape, expected_image_shape_list[i])
- self.assertEqual(new_tensor_list[1].shape, expected_label_shape_list[i])
-
- def testResizeTensorsToRangeWithFactorAndLabelShapeCHW(self):
- test_shapes = [[60, 40],
- [15, 30],
- [15, 50]]
- min_size = 50
- max_size = 98
- factor = 8
- expected_image_shape_list = [(81, 57, 3),
- (49, 97, 3),
- (33, 97, 3)]
- expected_label_shape_list = [(5, 81, 57),
- (5, 49, 97),
- (5, 33, 97)]
- for i, test_shape in enumerate(test_shapes):
- image = tf.random.normal([test_shape[0], test_shape[1], 3])
- label = tf.random.normal([5, test_shape[0], test_shape[1]])
- new_tensor_list = preprocess_utils.resize_to_range(
- image=image,
- label=label,
- min_size=min_size,
- max_size=max_size,
- factor=factor,
- align_corners=True,
- label_layout_is_chw=True)
- with self.test_session() as session:
- new_tensor_list = session.run(new_tensor_list)
- self.assertEqual(new_tensor_list[0].shape, expected_image_shape_list[i])
- self.assertEqual(new_tensor_list[1].shape, expected_label_shape_list[i])
-
- def testResizeTensorsToRangeWithSimilarMinMaxSizes(self):
- test_shapes = [[60, 40],
- [15, 30],
- [15, 50]]
- # Values set so that one of the side = 97.
- min_size = 96
- max_size = 98
- factor = 8
- expected_image_shape_list = [(97, 65, 3),
- (49, 97, 3),
- (33, 97, 3)]
- expected_label_shape_list = [(97, 65, 1),
- (49, 97, 1),
- (33, 97, 1)]
- for i, test_shape in enumerate(test_shapes):
- image = tf.random.normal([test_shape[0], test_shape[1], 3])
- label = tf.random.normal([test_shape[0], test_shape[1], 1])
- new_tensor_list = preprocess_utils.resize_to_range(
- image=image,
- label=label,
- min_size=min_size,
- max_size=max_size,
- factor=factor,
- align_corners=True)
- with self.test_session() as session:
- new_tensor_list = session.run(new_tensor_list)
- self.assertEqual(new_tensor_list[0].shape, expected_image_shape_list[i])
- self.assertEqual(new_tensor_list[1].shape, expected_label_shape_list[i])
-
- def testResizeTensorsToRangeWithEqualMaxSize(self):
- test_shapes = [[97, 38],
- [96, 97]]
- # Make max_size equal to the larger value of test_shapes.
- min_size = 97
- max_size = 97
- factor = 8
- expected_image_shape_list = [(97, 41, 3),
- (97, 97, 3)]
- expected_label_shape_list = [(97, 41, 1),
- (97, 97, 1)]
- for i, test_shape in enumerate(test_shapes):
- image = tf.random.normal([test_shape[0], test_shape[1], 3])
- label = tf.random.normal([test_shape[0], test_shape[1], 1])
- new_tensor_list = preprocess_utils.resize_to_range(
- image=image,
- label=label,
- min_size=min_size,
- max_size=max_size,
- factor=factor,
- align_corners=True)
- with self.test_session() as session:
- new_tensor_list = session.run(new_tensor_list)
- self.assertEqual(new_tensor_list[0].shape, expected_image_shape_list[i])
- self.assertEqual(new_tensor_list[1].shape, expected_label_shape_list[i])
-
- def testResizeTensorsToRangeWithPotentialErrorInTFCeil(self):
- test_shape = [3936, 5248]
- # Make max_size equal to the larger value of test_shapes.
- min_size = 1441
- max_size = 1441
- factor = 16
- expected_image_shape = (1089, 1441, 3)
- expected_label_shape = (1089, 1441, 1)
- image = tf.random.normal([test_shape[0], test_shape[1], 3])
- label = tf.random.normal([test_shape[0], test_shape[1], 1])
- new_tensor_list = preprocess_utils.resize_to_range(
- image=image,
- label=label,
- min_size=min_size,
- max_size=max_size,
- factor=factor,
- align_corners=True)
- with self.test_session() as session:
- new_tensor_list = session.run(new_tensor_list)
- self.assertEqual(new_tensor_list[0].shape, expected_image_shape)
- self.assertEqual(new_tensor_list[1].shape, expected_label_shape)
-
- def testResizeTensorsToRangeWithEqualMaxSizeWithoutAspectRatio(self):
- test_shapes = [[97, 38],
- [96, 97]]
- # Make max_size equal to the larger value of test_shapes.
- min_size = 97
- max_size = 97
- factor = 8
- keep_aspect_ratio = False
- expected_image_shape_list = [(97, 97, 3),
- (97, 97, 3)]
- expected_label_shape_list = [(97, 97, 1),
- (97, 97, 1)]
- for i, test_shape in enumerate(test_shapes):
- image = tf.random.normal([test_shape[0], test_shape[1], 3])
- label = tf.random.normal([test_shape[0], test_shape[1], 1])
- new_tensor_list = preprocess_utils.resize_to_range(
- image=image,
- label=label,
- min_size=min_size,
- max_size=max_size,
- factor=factor,
- keep_aspect_ratio=keep_aspect_ratio,
- align_corners=True)
- with self.test_session() as session:
- new_tensor_list = session.run(new_tensor_list)
- self.assertEqual(new_tensor_list[0].shape, expected_image_shape_list[i])
- self.assertEqual(new_tensor_list[1].shape, expected_label_shape_list[i])
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/deeplab/core/resnet_v1_beta.py b/research/deeplab/core/resnet_v1_beta.py
deleted file mode 100644
index 0d5f1f19a23..00000000000
--- a/research/deeplab/core/resnet_v1_beta.py
+++ /dev/null
@@ -1,827 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Resnet v1 model variants.
-
-Code branched out from slim/nets/resnet_v1.py; please refer to it for
-more details.
-
-The original version ResNets-v1 were proposed by:
-[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- Deep Residual Learning for Image Recognition. arXiv:1512.03385
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import functools
-from six.moves import range
-import tensorflow as tf
-from tensorflow.contrib import slim as contrib_slim
-from deeplab.core import conv2d_ws
-from deeplab.core import utils
-from tensorflow.contrib.slim.nets import resnet_utils
-
-slim = contrib_slim
-
-_DEFAULT_MULTI_GRID = [1, 1, 1]
-_DEFAULT_MULTI_GRID_RESNET_18 = [1, 1]
-
-
-@slim.add_arg_scope
-def bottleneck(inputs,
- depth,
- depth_bottleneck,
- stride,
- unit_rate=1,
- rate=1,
- outputs_collections=None,
- scope=None):
- """Bottleneck residual unit variant with BN after convolutions.
-
- This is the original residual unit proposed in [1]. See Fig. 1(a) of [2] for
- its definition. Note that we use here the bottleneck variant which has an
- extra bottleneck layer.
-
- When putting together two consecutive ResNet blocks that use this unit, one
- should use stride = 2 in the last unit of the first block.
-
- Args:
- inputs: A tensor of size [batch, height, width, channels].
- depth: The depth of the ResNet unit output.
- depth_bottleneck: The depth of the bottleneck layers.
- stride: The ResNet unit's stride. Determines the amount of downsampling of
- the unit's output compared to its input.
- unit_rate: An integer, unit rate for atrous convolution.
- rate: An integer, rate for atrous convolution.
- outputs_collections: Collection to add the ResNet unit output.
- scope: Optional variable_scope.
-
- Returns:
- The ResNet unit's output.
- """
- with tf.variable_scope(scope, 'bottleneck_v1', [inputs]) as sc:
- depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
- if depth == depth_in:
- shortcut = resnet_utils.subsample(inputs, stride, 'shortcut')
- else:
- shortcut = conv2d_ws.conv2d(
- inputs,
- depth,
- [1, 1],
- stride=stride,
- activation_fn=None,
- scope='shortcut')
-
- residual = conv2d_ws.conv2d(inputs, depth_bottleneck, [1, 1], stride=1,
- scope='conv1')
- residual = conv2d_ws.conv2d_same(residual, depth_bottleneck, 3, stride,
- rate=rate*unit_rate, scope='conv2')
- residual = conv2d_ws.conv2d(residual, depth, [1, 1], stride=1,
- activation_fn=None, scope='conv3')
- output = tf.nn.relu(shortcut + residual)
-
- return slim.utils.collect_named_outputs(outputs_collections, sc.name,
- output)
-
-
-@slim.add_arg_scope
-def lite_bottleneck(inputs,
- depth,
- stride,
- unit_rate=1,
- rate=1,
- outputs_collections=None,
- scope=None):
- """Bottleneck residual unit variant with BN after convolutions.
-
- This is the original residual unit proposed in [1]. See Fig. 1(a) of [2] for
- its definition. Note that we use here the bottleneck variant which has an
- extra bottleneck layer.
-
- When putting together two consecutive ResNet blocks that use this unit, one
- should use stride = 2 in the last unit of the first block.
-
- Args:
- inputs: A tensor of size [batch, height, width, channels].
- depth: The depth of the ResNet unit output.
- stride: The ResNet unit's stride. Determines the amount of downsampling of
- the unit's output compared to its input.
- unit_rate: An integer, unit rate for atrous convolution.
- rate: An integer, rate for atrous convolution.
- outputs_collections: Collection to add the ResNet unit output.
- scope: Optional variable_scope.
-
- Returns:
- The ResNet unit's output.
- """
- with tf.variable_scope(scope, 'lite_bottleneck_v1', [inputs]) as sc:
- depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
- if depth == depth_in:
- shortcut = resnet_utils.subsample(inputs, stride, 'shortcut')
- else:
- shortcut = conv2d_ws.conv2d(
- inputs,
- depth, [1, 1],
- stride=stride,
- activation_fn=None,
- scope='shortcut')
-
- residual = conv2d_ws.conv2d_same(
- inputs, depth, 3, 1, rate=rate * unit_rate, scope='conv1')
- with slim.arg_scope([conv2d_ws.conv2d], activation_fn=None):
- residual = conv2d_ws.conv2d_same(
- residual, depth, 3, stride, rate=rate * unit_rate, scope='conv2')
- output = tf.nn.relu(shortcut + residual)
-
- return slim.utils.collect_named_outputs(outputs_collections, sc.name,
- output)
-
-
-def root_block_fn_for_beta_variant(net, depth_multiplier=1.0):
- """Gets root_block_fn for beta variant.
-
- ResNet-v1 beta variant modifies the first original 7x7 convolution to three
- 3x3 convolutions.
-
- Args:
- net: A tensor of size [batch, height, width, channels], input to the model.
- depth_multiplier: Controls the number of convolution output channels for
- each input channel. The total number of depthwise convolution output
- channels will be equal to `num_filters_out * depth_multiplier`.
-
- Returns:
- A tensor after three 3x3 convolutions.
- """
- net = conv2d_ws.conv2d_same(
- net, int(64 * depth_multiplier), 3, stride=2, scope='conv1_1')
- net = conv2d_ws.conv2d_same(
- net, int(64 * depth_multiplier), 3, stride=1, scope='conv1_2')
- net = conv2d_ws.conv2d_same(
- net, int(128 * depth_multiplier), 3, stride=1, scope='conv1_3')
-
- return net
-
-
-def resnet_v1_beta(inputs,
- blocks,
- num_classes=None,
- is_training=None,
- global_pool=True,
- output_stride=None,
- root_block_fn=None,
- reuse=None,
- scope=None,
- sync_batch_norm_method='None'):
- """Generator for v1 ResNet models (beta variant).
-
- This function generates a family of modified ResNet v1 models. In particular,
- the first original 7x7 convolution is replaced with three 3x3 convolutions.
- See the resnet_v1_*() methods for specific model instantiations, obtained by
- selecting different block instantiations that produce ResNets of various
- depths.
-
- The code is modified from slim/nets/resnet_v1.py; please refer to it for
- more details.
-
- Args:
- inputs: A tensor of size [batch, height_in, width_in, channels].
- blocks: A list of length equal to the number of ResNet blocks. Each element
- is a resnet_utils.Block object describing the units in the block.
- num_classes: Number of predicted classes for classification tasks. If None
- we return the features before the logit layer.
- is_training: Enable/disable is_training for batch normalization.
- global_pool: If True, we perform global average pooling before computing the
- logits. Set to True for image classification, False for dense prediction.
- output_stride: If None, then the output will be computed at the nominal
- network stride. If output_stride is not None, it specifies the requested
- ratio of input to output spatial resolution.
- root_block_fn: The function consisting of convolution operations applied to
- the root input. If root_block_fn is None, use the original setting of
- ResNet-v1, which is simply one convolution with 7x7 kernel and stride=2.
- reuse: whether or not the network and its variables should be reused. To be
- able to reuse 'scope' must be given.
- scope: Optional variable_scope.
- sync_batch_norm_method: String, sync batchnorm method.
-
- Returns:
- net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
- If global_pool is False, then height_out and width_out are reduced by a
- factor of output_stride compared to the respective height_in and width_in,
- else both height_out and width_out equal one. If num_classes is None, then
- net is the output of the last ResNet block, potentially after global
- average pooling. If num_classes is not None, net contains the pre-softmax
- activations.
- end_points: A dictionary from components of the network to the corresponding
- activation.
-
- Raises:
- ValueError: If the target output_stride is not valid.
- """
- if root_block_fn is None:
- root_block_fn = functools.partial(conv2d_ws.conv2d_same,
- num_outputs=64,
- kernel_size=7,
- stride=2,
- scope='conv1')
- batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
- with tf.variable_scope(scope, 'resnet_v1', [inputs], reuse=reuse) as sc:
- end_points_collection = sc.original_name_scope + '_end_points'
- with slim.arg_scope([
- conv2d_ws.conv2d, bottleneck, lite_bottleneck,
- resnet_utils.stack_blocks_dense
- ],
- outputs_collections=end_points_collection):
- if is_training is not None:
- arg_scope = slim.arg_scope([batch_norm], is_training=is_training)
- else:
- arg_scope = slim.arg_scope([])
- with arg_scope:
- net = inputs
- if output_stride is not None:
- if output_stride % 4 != 0:
- raise ValueError('The output_stride needs to be a multiple of 4.')
- output_stride //= 4
- net = root_block_fn(net)
- net = slim.max_pool2d(net, 3, stride=2, padding='SAME', scope='pool1')
- net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)
-
- if global_pool:
- # Global average pooling.
- net = tf.reduce_mean(net, [1, 2], name='pool5', keepdims=True)
- if num_classes is not None:
- net = conv2d_ws.conv2d(net, num_classes, [1, 1], activation_fn=None,
- normalizer_fn=None, scope='logits',
- use_weight_standardization=False)
- # Convert end_points_collection into a dictionary of end_points.
- end_points = slim.utils.convert_collection_to_dict(
- end_points_collection)
- if num_classes is not None:
- end_points['predictions'] = slim.softmax(net, scope='predictions')
- return net, end_points
-
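-# Illustrative usage sketch (not a complete model; `images` is assumed to be a
-# [batch, height, width, 3] tensor and `resnet_v1_beta_block` is defined
-# below):
-#
-#   blocks = [
-#       resnet_v1_beta_block('block1', base_depth=64, num_units=3, stride=2),
-#       resnet_v1_beta_block('block2', base_depth=128, num_units=4, stride=2),
-#   ]
-#   with slim.arg_scope(resnet_arg_scope()):
-#     net, end_points = resnet_v1_beta(
-#         images, blocks, global_pool=False, output_stride=16)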
-
-def resnet_v1_beta_block(scope, base_depth, num_units, stride):
- """Helper function for creating a resnet_v1 beta variant bottleneck block.
-
- Args:
- scope: The scope of the block.
- base_depth: The depth of the bottleneck layer for each unit.
- num_units: The number of units in the block.
- stride: The stride of the block, implemented as a stride in the last unit.
- All other units have stride=1.
-
- Returns:
- A resnet_v1 bottleneck block.
- """
- return resnet_utils.Block(scope, bottleneck, [{
- 'depth': base_depth * 4,
- 'depth_bottleneck': base_depth,
- 'stride': 1,
- 'unit_rate': 1
- }] * (num_units - 1) + [{
- 'depth': base_depth * 4,
- 'depth_bottleneck': base_depth,
- 'stride': stride,
- 'unit_rate': 1
- }])
-
-
-def resnet_v1_small_beta_block(scope, base_depth, num_units, stride):
- """Helper function for creating a resnet_18 beta variant bottleneck block.
-
- Args:
- scope: The scope of the block.
- base_depth: The depth of the bottleneck layer for each unit.
- num_units: The number of units in the block.
- stride: The stride of the block, implemented as a stride in the last unit.
- All other units have stride=1.
-
- Returns:
- A resnet_18 bottleneck block.
- """
- block_args = []
- for _ in range(num_units - 1):
- block_args.append({'depth': base_depth, 'stride': 1, 'unit_rate': 1})
- block_args.append({'depth': base_depth, 'stride': stride, 'unit_rate': 1})
- return resnet_utils.Block(scope, lite_bottleneck, block_args)
-
-
-def resnet_v1_18(inputs,
- num_classes=None,
- is_training=None,
- global_pool=False,
- output_stride=None,
- multi_grid=None,
- reuse=None,
- scope='resnet_v1_18',
- sync_batch_norm_method='None'):
- """Resnet v1 18.
-
- Args:
- inputs: A tensor of size [batch, height_in, width_in, channels].
- num_classes: Number of predicted classes for classification tasks. If None
- we return the features before the logit layer.
- is_training: Enable/disable is_training for batch normalization.
- global_pool: If True, we perform global average pooling before computing the
- logits. Set to True for image classification, False for dense prediction.
- output_stride: If None, then the output will be computed at the nominal
- network stride. If output_stride is not None, it specifies the requested
- ratio of input to output spatial resolution.
- multi_grid: Employ a hierarchy of different atrous rates within network.
- reuse: whether or not the network and its variables should be reused. To be
- able to reuse 'scope' must be given.
- scope: Optional variable_scope.
- sync_batch_norm_method: String, sync batchnorm method.
-
- Returns:
- net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
- If global_pool is False, then height_out and width_out are reduced by a
- factor of output_stride compared to the respective height_in and width_in,
- else both height_out and width_out equal one. If num_classes is None, then
- net is the output of the last ResNet block, potentially after global
- average pooling. If num_classes is not None, net contains the pre-softmax
- activations.
- end_points: A dictionary from components of the network to the corresponding
- activation.
-
- Raises:
- ValueError: if multi_grid is not None and does not have length = 2.
- """
- if multi_grid is None:
- multi_grid = _DEFAULT_MULTI_GRID_RESNET_18
- else:
- if len(multi_grid) != 2:
- raise ValueError('Expect multi_grid to have length 2.')
-
- block4_args = []
- for rate in multi_grid:
- block4_args.append({'depth': 512, 'stride': 1, 'unit_rate': rate})
-
- blocks = [
- resnet_v1_small_beta_block(
- 'block1', base_depth=64, num_units=2, stride=2),
- resnet_v1_small_beta_block(
- 'block2', base_depth=128, num_units=2, stride=2),
- resnet_v1_small_beta_block(
- 'block3', base_depth=256, num_units=2, stride=2),
- resnet_utils.Block('block4', lite_bottleneck, block4_args),
- ]
- return resnet_v1_beta(
- inputs,
- blocks=blocks,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- output_stride=output_stride,
- reuse=reuse,
- scope=scope,
- sync_batch_norm_method=sync_batch_norm_method)
-
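-# Illustrative sketch: ResNet-18 has two units in its last block, so
-# `multi_grid` must have length 2. Dense feature extraction might look like
-# (assuming `images` is a [batch, height, width, 3] tensor):
-#
-#   net, end_points = resnet_v1_18(
-#       images, is_training=False, global_pool=False,
-#       output_stride=16, multi_grid=[1, 2])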
-
-def resnet_v1_18_beta(inputs,
- num_classes=None,
- is_training=None,
- global_pool=False,
- output_stride=None,
- multi_grid=None,
- root_depth_multiplier=0.25,
- reuse=None,
- scope='resnet_v1_18',
- sync_batch_norm_method='None'):
- """Resnet v1 18 beta variant.
-
- This variant modifies the first convolution layer of ResNet-v1-18. In
- particular, it changes the original one 7x7 convolution to three 3x3
- convolutions.
-
- Args:
- inputs: A tensor of size [batch, height_in, width_in, channels].
- num_classes: Number of predicted classes for classification tasks. If None
- we return the features before the logit layer.
- is_training: Enable/disable is_training for batch normalization.
- global_pool: If True, we perform global average pooling before computing the
- logits. Set to True for image classification, False for dense prediction.
- output_stride: If None, then the output will be computed at the nominal
- network stride. If output_stride is not None, it specifies the requested
- ratio of input to output spatial resolution.
- multi_grid: Employ a hierarchy of different atrous rates within network.
- root_depth_multiplier: Float, depth multiplier used for the first three
- convolution layers that replace the 7x7 convolution.
- reuse: whether or not the network and its variables should be reused. To be
- able to reuse 'scope' must be given.
- scope: Optional variable_scope.
- sync_batch_norm_method: String, sync batchnorm method.
-
- Returns:
- net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
- If global_pool is False, then height_out and width_out are reduced by a
- factor of output_stride compared to the respective height_in and width_in,
- else both height_out and width_out equal one. If num_classes is None, then
- net is the output of the last ResNet block, potentially after global
- average pooling. If num_classes is not None, net contains the pre-softmax
- activations.
- end_points: A dictionary from components of the network to the corresponding
- activation.
-
- Raises:
- ValueError: if multi_grid is not None and does not have length = 2.
- """
- if multi_grid is None:
- multi_grid = _DEFAULT_MULTI_GRID_RESNET_18
- else:
- if len(multi_grid) != 2:
- raise ValueError('Expect multi_grid to have length 2.')
-
- block4_args = []
- for rate in multi_grid:
- block4_args.append({'depth': 512, 'stride': 1, 'unit_rate': rate})
-
- blocks = [
- resnet_v1_small_beta_block(
- 'block1', base_depth=64, num_units=2, stride=2),
- resnet_v1_small_beta_block(
- 'block2', base_depth=128, num_units=2, stride=2),
- resnet_v1_small_beta_block(
- 'block3', base_depth=256, num_units=2, stride=2),
- resnet_utils.Block('block4', lite_bottleneck, block4_args),
- ]
- return resnet_v1_beta(
- inputs,
- blocks=blocks,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- output_stride=output_stride,
- root_block_fn=functools.partial(root_block_fn_for_beta_variant,
- depth_multiplier=root_depth_multiplier),
- reuse=reuse,
- scope=scope,
- sync_batch_norm_method=sync_batch_norm_method)
-
-
-def resnet_v1_50(inputs,
- num_classes=None,
- is_training=None,
- global_pool=False,
- output_stride=None,
- multi_grid=None,
- reuse=None,
- scope='resnet_v1_50',
- sync_batch_norm_method='None'):
- """Resnet v1 50.
-
- Args:
- inputs: A tensor of size [batch, height_in, width_in, channels].
- num_classes: Number of predicted classes for classification tasks. If None
- we return the features before the logit layer.
- is_training: Enable/disable is_training for batch normalization.
- global_pool: If True, we perform global average pooling before computing the
- logits. Set to True for image classification, False for dense prediction.
- output_stride: If None, then the output will be computed at the nominal
- network stride. If output_stride is not None, it specifies the requested
- ratio of input to output spatial resolution.
- multi_grid: Employ a hierarchy of different atrous rates within network.
- reuse: whether or not the network and its variables should be reused. To be
- able to reuse 'scope' must be given.
- scope: Optional variable_scope.
- sync_batch_norm_method: String, sync batchnorm method.
-
- Returns:
- net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
- If global_pool is False, then height_out and width_out are reduced by a
- factor of output_stride compared to the respective height_in and width_in,
- else both height_out and width_out equal one. If num_classes is None, then
- net is the output of the last ResNet block, potentially after global
- average pooling. If num_classes is not None, net contains the pre-softmax
- activations.
- end_points: A dictionary from components of the network to the corresponding
- activation.
-
- Raises:
- ValueError: if multi_grid is not None and does not have length = 3.
- """
- if multi_grid is None:
- multi_grid = _DEFAULT_MULTI_GRID
- else:
- if len(multi_grid) != 3:
- raise ValueError('Expect multi_grid to have length 3.')
-
- blocks = [
- resnet_v1_beta_block(
- 'block1', base_depth=64, num_units=3, stride=2),
- resnet_v1_beta_block(
- 'block2', base_depth=128, num_units=4, stride=2),
- resnet_v1_beta_block(
- 'block3', base_depth=256, num_units=6, stride=2),
- resnet_utils.Block('block4', bottleneck, [
- {'depth': 2048, 'depth_bottleneck': 512, 'stride': 1,
- 'unit_rate': rate} for rate in multi_grid]),
- ]
- return resnet_v1_beta(
- inputs,
- blocks=blocks,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- output_stride=output_stride,
- reuse=reuse,
- scope=scope,
- sync_batch_norm_method=sync_batch_norm_method)
-
-
-def resnet_v1_50_beta(inputs,
- num_classes=None,
- is_training=None,
- global_pool=False,
- output_stride=None,
- multi_grid=None,
- reuse=None,
- scope='resnet_v1_50',
- sync_batch_norm_method='None'):
- """Resnet v1 50 beta variant.
-
- This variant modifies the first convolution layer of ResNet-v1-50. In
- particular, it changes the original one 7x7 convolution to three 3x3
- convolutions.
-
- Args:
- inputs: A tensor of size [batch, height_in, width_in, channels].
- num_classes: Number of predicted classes for classification tasks. If None
- we return the features before the logit layer.
- is_training: Enable/disable is_training for batch normalization.
- global_pool: If True, we perform global average pooling before computing the
- logits. Set to True for image classification, False for dense prediction.
- output_stride: If None, then the output will be computed at the nominal
- network stride. If output_stride is not None, it specifies the requested
- ratio of input to output spatial resolution.
- multi_grid: Employ a hierarchy of different atrous rates within network.
- reuse: whether or not the network and its variables should be reused. To be
- able to reuse 'scope' must be given.
- scope: Optional variable_scope.
- sync_batch_norm_method: String, sync batchnorm method.
-
- Returns:
- net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
- If global_pool is False, then height_out and width_out are reduced by a
- factor of output_stride compared to the respective height_in and width_in,
- else both height_out and width_out equal one. If num_classes is None, then
- net is the output of the last ResNet block, potentially after global
- average pooling. If num_classes is not None, net contains the pre-softmax
- activations.
- end_points: A dictionary from components of the network to the corresponding
- activation.
-
- Raises:
- ValueError: if multi_grid is not None and does not have length = 3.
- """
- if multi_grid is None:
- multi_grid = _DEFAULT_MULTI_GRID
- else:
- if len(multi_grid) != 3:
- raise ValueError('Expect multi_grid to have length 3.')
-
- blocks = [
- resnet_v1_beta_block(
- 'block1', base_depth=64, num_units=3, stride=2),
- resnet_v1_beta_block(
- 'block2', base_depth=128, num_units=4, stride=2),
- resnet_v1_beta_block(
- 'block3', base_depth=256, num_units=6, stride=2),
- resnet_utils.Block('block4', bottleneck, [
- {'depth': 2048, 'depth_bottleneck': 512, 'stride': 1,
- 'unit_rate': rate} for rate in multi_grid]),
- ]
- return resnet_v1_beta(
- inputs,
- blocks=blocks,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- output_stride=output_stride,
- root_block_fn=functools.partial(root_block_fn_for_beta_variant),
- reuse=reuse,
- scope=scope,
- sync_batch_norm_method=sync_batch_norm_method)
-
-
-def resnet_v1_101(inputs,
- num_classes=None,
- is_training=None,
- global_pool=False,
- output_stride=None,
- multi_grid=None,
- reuse=None,
- scope='resnet_v1_101',
- sync_batch_norm_method='None'):
- """Resnet v1 101.
-
- Args:
- inputs: A tensor of size [batch, height_in, width_in, channels].
- num_classes: Number of predicted classes for classification tasks. If None
- we return the features before the logit layer.
- is_training: Enable/disable is_training for batch normalization.
- global_pool: If True, we perform global average pooling before computing the
- logits. Set to True for image classification, False for dense prediction.
- output_stride: If None, then the output will be computed at the nominal
- network stride. If output_stride is not None, it specifies the requested
- ratio of input to output spatial resolution.
- multi_grid: Employ a hierarchy of different atrous rates within network.
- reuse: whether or not the network and its variables should be reused. To be
- able to reuse 'scope' must be given.
- scope: Optional variable_scope.
- sync_batch_norm_method: String, sync batchnorm method.
-
- Returns:
- net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
- If global_pool is False, then height_out and width_out are reduced by a
- factor of output_stride compared to the respective height_in and width_in,
- else both height_out and width_out equal one. If num_classes is None, then
- net is the output of the last ResNet block, potentially after global
- average pooling. If num_classes is not None, net contains the pre-softmax
- activations.
- end_points: A dictionary from components of the network to the corresponding
- activation.
-
- Raises:
- ValueError: if multi_grid is not None and does not have length = 3.
- """
- if multi_grid is None:
- multi_grid = _DEFAULT_MULTI_GRID
- else:
- if len(multi_grid) != 3:
- raise ValueError('Expect multi_grid to have length 3.')
-
- blocks = [
- resnet_v1_beta_block(
- 'block1', base_depth=64, num_units=3, stride=2),
- resnet_v1_beta_block(
- 'block2', base_depth=128, num_units=4, stride=2),
- resnet_v1_beta_block(
- 'block3', base_depth=256, num_units=23, stride=2),
- resnet_utils.Block('block4', bottleneck, [
- {'depth': 2048, 'depth_bottleneck': 512, 'stride': 1,
- 'unit_rate': rate} for rate in multi_grid]),
- ]
- return resnet_v1_beta(
- inputs,
- blocks=blocks,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- output_stride=output_stride,
- reuse=reuse,
- scope=scope,
- sync_batch_norm_method=sync_batch_norm_method)
-
-
-def resnet_v1_101_beta(inputs,
- num_classes=None,
- is_training=None,
- global_pool=False,
- output_stride=None,
- multi_grid=None,
- reuse=None,
- scope='resnet_v1_101',
- sync_batch_norm_method='None'):
- """Resnet v1 101 beta variant.
-
- This variant modifies the first convolution layer of ResNet-v1-101. In
- particular, it changes the original one 7x7 convolution to three 3x3
- convolutions.
-
- Args:
- inputs: A tensor of size [batch, height_in, width_in, channels].
- num_classes: Number of predicted classes for classification tasks. If None
- we return the features before the logit layer.
- is_training: Enable/disable is_training for batch normalization.
- global_pool: If True, we perform global average pooling before computing the
- logits. Set to True for image classification, False for dense prediction.
- output_stride: If None, then the output will be computed at the nominal
- network stride. If output_stride is not None, it specifies the requested
- ratio of input to output spatial resolution.
- multi_grid: Employ a hierarchy of different atrous rates within network.
- reuse: whether or not the network and its variables should be reused. To be
- able to reuse 'scope' must be given.
- scope: Optional variable_scope.
- sync_batch_norm_method: String, sync batchnorm method.
-
- Returns:
- net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
- If global_pool is False, then height_out and width_out are reduced by a
- factor of output_stride compared to the respective height_in and width_in,
- else both height_out and width_out equal one. If num_classes is None, then
- net is the output of the last ResNet block, potentially after global
- average pooling. If num_classes is not None, net contains the pre-softmax
- activations.
- end_points: A dictionary from components of the network to the corresponding
- activation.
-
- Raises:
- ValueError: if multi_grid is not None and does not have length = 3.
- """
- if multi_grid is None:
- multi_grid = _DEFAULT_MULTI_GRID
- else:
- if len(multi_grid) != 3:
- raise ValueError('Expect multi_grid to have length 3.')
-
- blocks = [
- resnet_v1_beta_block(
- 'block1', base_depth=64, num_units=3, stride=2),
- resnet_v1_beta_block(
- 'block2', base_depth=128, num_units=4, stride=2),
- resnet_v1_beta_block(
- 'block3', base_depth=256, num_units=23, stride=2),
- resnet_utils.Block('block4', bottleneck, [
- {'depth': 2048, 'depth_bottleneck': 512, 'stride': 1,
- 'unit_rate': rate} for rate in multi_grid]),
- ]
- return resnet_v1_beta(
- inputs,
- blocks=blocks,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- output_stride=output_stride,
- root_block_fn=functools.partial(root_block_fn_for_beta_variant),
- reuse=reuse,
- scope=scope,
- sync_batch_norm_method=sync_batch_norm_method)
-
-
-def resnet_arg_scope(weight_decay=0.0001,
- batch_norm_decay=0.997,
- batch_norm_epsilon=1e-5,
- batch_norm_scale=True,
- activation_fn=tf.nn.relu,
- use_batch_norm=True,
- sync_batch_norm_method='None',
- normalization_method='unspecified',
- use_weight_standardization=False):
- """Defines the default ResNet arg scope.
-
- Args:
- weight_decay: The weight decay to use for regularizing the model.
- batch_norm_decay: The moving average decay when estimating layer activation
- statistics in batch normalization.
- batch_norm_epsilon: Small constant to prevent division by zero when
- normalizing activations by their variance in batch normalization.
- batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the
- activations in the batch normalization layer.
- activation_fn: The activation function which is used in ResNet.
- use_batch_norm: Deprecated in favor of normalization_method.
- sync_batch_norm_method: String, sync batchnorm method.
- normalization_method: String, one of `batch`, `none`, or `group`, to use
- batch normalization, no normalization, or group normalization.
- use_weight_standardization: Boolean, whether to use weight standardization.
-
- Returns:
- An `arg_scope` to use for the resnet models.
- """
- batch_norm_params = {
- 'decay': batch_norm_decay,
- 'epsilon': batch_norm_epsilon,
- 'scale': batch_norm_scale,
- }
- batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
- if normalization_method == 'batch':
- normalizer_fn = batch_norm
- elif normalization_method == 'none':
- normalizer_fn = None
- elif normalization_method == 'group':
- normalizer_fn = slim.group_norm
- elif normalization_method == 'unspecified':
- normalizer_fn = batch_norm if use_batch_norm else None
- else:
- raise ValueError('Unrecognized normalization_method %s' %
- normalization_method)
-
- with slim.arg_scope([conv2d_ws.conv2d],
- weights_regularizer=slim.l2_regularizer(weight_decay),
- weights_initializer=slim.variance_scaling_initializer(),
- activation_fn=activation_fn,
- normalizer_fn=normalizer_fn,
- use_weight_standardization=use_weight_standardization):
- with slim.arg_scope([batch_norm], **batch_norm_params):
- # The following implies padding='SAME' for pool1, which makes feature
- # alignment easier for dense prediction tasks. This is also used in
- # https://github.com/facebook/fb.resnet.torch. However the accompanying
- # code of 'Deep Residual Learning for Image Recognition' uses
- # padding='VALID' for pool1. You can switch to that choice by setting
- # slim.arg_scope([slim.max_pool2d], padding='VALID').
- with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
- return arg_sc
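-
-
-# Typical usage (an illustrative sketch, similar to the unit tests below;
-# `images` is assumed to be a [batch, height, width, 3] tensor):
-#
-#   with slim.arg_scope(resnet_arg_scope(weight_decay=1e-4)):
-#     net, end_points = resnet_v1_50(
-#         images, global_pool=False, output_stride=16)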
diff --git a/research/deeplab/core/resnet_v1_beta_test.py b/research/deeplab/core/resnet_v1_beta_test.py
deleted file mode 100644
index 8b61edcce21..00000000000
--- a/research/deeplab/core/resnet_v1_beta_test.py
+++ /dev/null
@@ -1,564 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for resnet_v1_beta module."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import functools
-
-import numpy as np
-import six
-import tensorflow as tf
-from tensorflow.contrib import slim as contrib_slim
-
-from deeplab.core import resnet_v1_beta
-from tensorflow.contrib.slim.nets import resnet_utils
-
-slim = contrib_slim
-
-
-def create_test_input(batch, height, width, channels):
- """Create test input tensor."""
- if None in [batch, height, width, channels]:
- return tf.placeholder(tf.float32, (batch, height, width, channels))
- else:
- return tf.to_float(
- np.tile(
- np.reshape(
- np.reshape(np.arange(height), [height, 1]) +
- np.reshape(np.arange(width), [1, width]),
- [1, height, width, 1]),
- [batch, 1, 1, channels]))
-
-
-class ResnetCompleteNetworkTest(tf.test.TestCase):
- """Tests with complete small ResNet v1 networks."""
-
- def _resnet_small_lite_bottleneck(self,
- inputs,
- num_classes=None,
- is_training=True,
- global_pool=True,
- output_stride=None,
- multi_grid=None,
- reuse=None,
- scope='resnet_v1_small'):
- """A shallow and thin ResNet v1 with lite_bottleneck."""
- if multi_grid is None:
- multi_grid = [1, 1]
- else:
- if len(multi_grid) != 2:
- raise ValueError('Expect multi_grid to have length 2.')
- block = resnet_v1_beta.resnet_v1_small_beta_block
- blocks = [
- block('block1', base_depth=1, num_units=1, stride=2),
- block('block2', base_depth=2, num_units=1, stride=2),
- block('block3', base_depth=4, num_units=1, stride=2),
- resnet_utils.Block('block4', resnet_v1_beta.lite_bottleneck, [
- {'depth': 8,
- 'stride': 1,
- 'unit_rate': rate} for rate in multi_grid])]
- return resnet_v1_beta.resnet_v1_beta(
- inputs,
- blocks,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- output_stride=output_stride,
- root_block_fn=functools.partial(
- resnet_v1_beta.root_block_fn_for_beta_variant,
- depth_multiplier=0.25),
- reuse=reuse,
- scope=scope)
-
- def _resnet_small(self,
- inputs,
- num_classes=None,
- is_training=True,
- global_pool=True,
- output_stride=None,
- multi_grid=None,
- reuse=None,
- scope='resnet_v1_small'):
- """A shallow and thin ResNet v1 for faster tests."""
- if multi_grid is None:
- multi_grid = [1, 1, 1]
- else:
- if len(multi_grid) != 3:
- raise ValueError('Expect multi_grid to have length 3.')
-
- block = resnet_v1_beta.resnet_v1_beta_block
- blocks = [
- block('block1', base_depth=1, num_units=1, stride=2),
- block('block2', base_depth=2, num_units=1, stride=2),
- block('block3', base_depth=4, num_units=1, stride=2),
- resnet_utils.Block('block4', resnet_v1_beta.bottleneck, [
- {'depth': 32, 'depth_bottleneck': 8, 'stride': 1,
- 'unit_rate': rate} for rate in multi_grid])]
-
- return resnet_v1_beta.resnet_v1_beta(
- inputs,
- blocks,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- output_stride=output_stride,
- root_block_fn=functools.partial(
- resnet_v1_beta.root_block_fn_for_beta_variant),
- reuse=reuse,
- scope=scope)
-
- def testClassificationEndPointsWithLiteBottleneck(self):
- global_pool = True
- num_classes = 10
- inputs = create_test_input(2, 224, 224, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- logits, end_points = self._resnet_small_lite_bottleneck(
- inputs,
- num_classes,
- global_pool=global_pool,
- scope='resnet')
-
- self.assertTrue(logits.op.name.startswith('resnet/logits'))
- self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
- self.assertIn('predictions', end_points)
- self.assertListEqual(end_points['predictions'].get_shape().as_list(),
- [2, 1, 1, num_classes])
-
- def testClassificationEndPointsWithMultigridAndLiteBottleneck(self):
- global_pool = True
- num_classes = 10
- inputs = create_test_input(2, 224, 224, 3)
- multi_grid = [1, 2]
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- logits, end_points = self._resnet_small_lite_bottleneck(
- inputs,
- num_classes,
- global_pool=global_pool,
- multi_grid=multi_grid,
- scope='resnet')
-
- self.assertTrue(logits.op.name.startswith('resnet/logits'))
- self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
- self.assertIn('predictions', end_points)
- self.assertListEqual(end_points['predictions'].get_shape().as_list(),
- [2, 1, 1, num_classes])
-
- def testClassificationShapesWithLiteBottleneck(self):
- global_pool = True
- num_classes = 10
- inputs = create_test_input(2, 224, 224, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- _, end_points = self._resnet_small_lite_bottleneck(
- inputs,
- num_classes,
- global_pool=global_pool,
- scope='resnet')
- endpoint_to_shape = {
- 'resnet/conv1_1': [2, 112, 112, 16],
- 'resnet/conv1_2': [2, 112, 112, 16],
- 'resnet/conv1_3': [2, 112, 112, 32],
- 'resnet/block1': [2, 28, 28, 1],
- 'resnet/block2': [2, 14, 14, 2],
- 'resnet/block3': [2, 7, 7, 4],
- 'resnet/block4': [2, 7, 7, 8]}
- for endpoint, shape in six.iteritems(endpoint_to_shape):
- self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
-
- def testFullyConvolutionalEndpointShapesWithLiteBottleneck(self):
- global_pool = False
- num_classes = 10
- inputs = create_test_input(2, 321, 321, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- _, end_points = self._resnet_small_lite_bottleneck(
- inputs,
- num_classes,
- global_pool=global_pool,
- scope='resnet')
- endpoint_to_shape = {
- 'resnet/conv1_1': [2, 161, 161, 16],
- 'resnet/conv1_2': [2, 161, 161, 16],
- 'resnet/conv1_3': [2, 161, 161, 32],
- 'resnet/block1': [2, 41, 41, 1],
- 'resnet/block2': [2, 21, 21, 2],
- 'resnet/block3': [2, 11, 11, 4],
- 'resnet/block4': [2, 11, 11, 8]}
- for endpoint, shape in six.iteritems(endpoint_to_shape):
- self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
-
- def testAtrousFullyConvolutionalEndpointShapesWithLiteBottleneck(self):
- global_pool = False
- num_classes = 10
- output_stride = 8
- inputs = create_test_input(2, 321, 321, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- _, end_points = self._resnet_small_lite_bottleneck(
- inputs,
- num_classes,
- global_pool=global_pool,
- output_stride=output_stride,
- scope='resnet')
- endpoint_to_shape = {
- 'resnet/conv1_1': [2, 161, 161, 16],
- 'resnet/conv1_2': [2, 161, 161, 16],
- 'resnet/conv1_3': [2, 161, 161, 32],
- 'resnet/block1': [2, 41, 41, 1],
- 'resnet/block2': [2, 41, 41, 2],
- 'resnet/block3': [2, 41, 41, 4],
- 'resnet/block4': [2, 41, 41, 8]}
- for endpoint, shape in six.iteritems(endpoint_to_shape):
- self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
-
- def testAtrousFullyConvolutionalValuesWithLiteBottleneck(self):
- """Verify dense feature extraction with atrous convolution."""
- nominal_stride = 32
- for output_stride in [4, 8, 16, 32, None]:
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- with tf.Graph().as_default():
- with self.test_session() as sess:
- tf.set_random_seed(0)
- inputs = create_test_input(2, 81, 81, 3)
- # Dense feature extraction followed by subsampling.
- output, _ = self._resnet_small_lite_bottleneck(
- inputs,
- None,
- is_training=False,
- global_pool=False,
- output_stride=output_stride)
- if output_stride is None:
- factor = 1
- else:
- factor = nominal_stride // output_stride
- output = resnet_utils.subsample(output, factor)
- # Make the two networks use the same weights.
- tf.get_variable_scope().reuse_variables()
- # Feature extraction at the nominal network rate.
- expected, _ = self._resnet_small_lite_bottleneck(
- inputs,
- None,
- is_training=False,
- global_pool=False)
- sess.run(tf.global_variables_initializer())
- self.assertAllClose(output.eval(), expected.eval(),
- atol=1e-4, rtol=1e-4)
-
- def testUnknownBatchSizeWithLiteBottleneck(self):
- batch = 2
- height, width = 65, 65
- global_pool = True
- num_classes = 10
- inputs = create_test_input(None, height, width, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- logits, _ = self._resnet_small_lite_bottleneck(
- inputs,
- num_classes,
- global_pool=global_pool,
- scope='resnet')
- self.assertTrue(logits.op.name.startswith('resnet/logits'))
- self.assertListEqual(logits.get_shape().as_list(),
- [None, 1, 1, num_classes])
- images = create_test_input(batch, height, width, 3)
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- output = sess.run(logits, {inputs: images.eval()})
- self.assertEqual(output.shape, (batch, 1, 1, num_classes))
-
- def testFullyConvolutionalUnknownHeightWidthWithLiteBottleneck(self):
- batch = 2
- height, width = 65, 65
- global_pool = False
- inputs = create_test_input(batch, None, None, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- output, _ = self._resnet_small_lite_bottleneck(
- inputs,
- None,
- global_pool=global_pool)
- self.assertListEqual(output.get_shape().as_list(),
- [batch, None, None, 8])
- images = create_test_input(batch, height, width, 3)
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- output = sess.run(output, {inputs: images.eval()})
- self.assertEqual(output.shape, (batch, 3, 3, 8))
-
- def testAtrousFullyConvolutionalUnknownHeightWidthWithLiteBottleneck(self):
- batch = 2
- height, width = 65, 65
- global_pool = False
- output_stride = 8
- inputs = create_test_input(batch, None, None, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- output, _ = self._resnet_small_lite_bottleneck(
- inputs,
- None,
- global_pool=global_pool,
- output_stride=output_stride)
- self.assertListEqual(output.get_shape().as_list(),
- [batch, None, None, 8])
- images = create_test_input(batch, height, width, 3)
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- output = sess.run(output, {inputs: images.eval()})
- self.assertEqual(output.shape, (batch, 9, 9, 8))
-
- def testClassificationEndPoints(self):
- global_pool = True
- num_classes = 10
- inputs = create_test_input(2, 224, 224, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- logits, end_points = self._resnet_small(inputs,
- num_classes,
- global_pool=global_pool,
- scope='resnet')
-
- self.assertTrue(logits.op.name.startswith('resnet/logits'))
- self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
- self.assertIn('predictions', end_points)
- self.assertListEqual(end_points['predictions'].get_shape().as_list(),
- [2, 1, 1, num_classes])
-
- def testClassificationEndPointsWithWS(self):
- global_pool = True
- num_classes = 10
- inputs = create_test_input(2, 224, 224, 3)
- with slim.arg_scope(
- resnet_v1_beta.resnet_arg_scope(use_weight_standardization=True)):
- logits, end_points = self._resnet_small(
- inputs, num_classes, global_pool=global_pool, scope='resnet')
-
- self.assertTrue(logits.op.name.startswith('resnet/logits'))
- self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
- self.assertIn('predictions', end_points)
- self.assertListEqual(end_points['predictions'].get_shape().as_list(),
- [2, 1, 1, num_classes])
-
- def testClassificationEndPointsWithGN(self):
- global_pool = True
- num_classes = 10
- inputs = create_test_input(2, 224, 224, 3)
- with slim.arg_scope(
- resnet_v1_beta.resnet_arg_scope(normalization_method='group')):
- with slim.arg_scope([slim.group_norm], groups=1):
- logits, end_points = self._resnet_small(
- inputs, num_classes, global_pool=global_pool, scope='resnet')
-
- self.assertTrue(logits.op.name.startswith('resnet/logits'))
- self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
- self.assertIn('predictions', end_points)
- self.assertListEqual(end_points['predictions'].get_shape().as_list(),
- [2, 1, 1, num_classes])
-
- def testInvalidGroupsWithGN(self):
- global_pool = True
- num_classes = 10
- inputs = create_test_input(2, 224, 224, 3)
- with self.assertRaisesRegexp(ValueError, 'Invalid groups'):
- with slim.arg_scope(
- resnet_v1_beta.resnet_arg_scope(normalization_method='group')):
- with slim.arg_scope([slim.group_norm], groups=32):
- _, _ = self._resnet_small(
- inputs, num_classes, global_pool=global_pool, scope='resnet')
-
- def testClassificationEndPointsWithGNWS(self):
- global_pool = True
- num_classes = 10
- inputs = create_test_input(2, 224, 224, 3)
- with slim.arg_scope(
- resnet_v1_beta.resnet_arg_scope(
- normalization_method='group', use_weight_standardization=True)):
- with slim.arg_scope([slim.group_norm], groups=1):
- logits, end_points = self._resnet_small(
- inputs, num_classes, global_pool=global_pool, scope='resnet')
-
- self.assertTrue(logits.op.name.startswith('resnet/logits'))
- self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
- self.assertIn('predictions', end_points)
- self.assertListEqual(end_points['predictions'].get_shape().as_list(),
- [2, 1, 1, num_classes])
-
- def testClassificationEndPointsWithMultigrid(self):
- global_pool = True
- num_classes = 10
- inputs = create_test_input(2, 224, 224, 3)
- multi_grid = [1, 2, 4]
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- logits, end_points = self._resnet_small(inputs,
- num_classes,
- global_pool=global_pool,
- multi_grid=multi_grid,
- scope='resnet')
-
- self.assertTrue(logits.op.name.startswith('resnet/logits'))
- self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
- self.assertIn('predictions', end_points)
- self.assertListEqual(end_points['predictions'].get_shape().as_list(),
- [2, 1, 1, num_classes])
-
- def testClassificationShapes(self):
- global_pool = True
- num_classes = 10
- inputs = create_test_input(2, 224, 224, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- _, end_points = self._resnet_small(inputs,
- num_classes,
- global_pool=global_pool,
- scope='resnet')
- endpoint_to_shape = {
- 'resnet/conv1_1': [2, 112, 112, 64],
- 'resnet/conv1_2': [2, 112, 112, 64],
- 'resnet/conv1_3': [2, 112, 112, 128],
- 'resnet/block1': [2, 28, 28, 4],
- 'resnet/block2': [2, 14, 14, 8],
- 'resnet/block3': [2, 7, 7, 16],
- 'resnet/block4': [2, 7, 7, 32]}
- for endpoint, shape in six.iteritems(endpoint_to_shape):
- self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
-
- def testFullyConvolutionalEndpointShapes(self):
- global_pool = False
- num_classes = 10
- inputs = create_test_input(2, 321, 321, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- _, end_points = self._resnet_small(inputs,
- num_classes,
- global_pool=global_pool,
- scope='resnet')
- endpoint_to_shape = {
- 'resnet/conv1_1': [2, 161, 161, 64],
- 'resnet/conv1_2': [2, 161, 161, 64],
- 'resnet/conv1_3': [2, 161, 161, 128],
- 'resnet/block1': [2, 41, 41, 4],
- 'resnet/block2': [2, 21, 21, 8],
- 'resnet/block3': [2, 11, 11, 16],
- 'resnet/block4': [2, 11, 11, 32]}
- for endpoint, shape in six.iteritems(endpoint_to_shape):
- self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
-
- def testAtrousFullyConvolutionalEndpointShapes(self):
- global_pool = False
- num_classes = 10
- output_stride = 8
- inputs = create_test_input(2, 321, 321, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- _, end_points = self._resnet_small(inputs,
- num_classes,
- global_pool=global_pool,
- output_stride=output_stride,
- scope='resnet')
- endpoint_to_shape = {
- 'resnet/conv1_1': [2, 161, 161, 64],
- 'resnet/conv1_2': [2, 161, 161, 64],
- 'resnet/conv1_3': [2, 161, 161, 128],
- 'resnet/block1': [2, 41, 41, 4],
- 'resnet/block2': [2, 41, 41, 8],
- 'resnet/block3': [2, 41, 41, 16],
- 'resnet/block4': [2, 41, 41, 32]}
- for endpoint, shape in six.iteritems(endpoint_to_shape):
- self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
-
- def testAtrousFullyConvolutionalValues(self):
- """Verify dense feature extraction with atrous convolution."""
- nominal_stride = 32
- for output_stride in [4, 8, 16, 32, None]:
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- with tf.Graph().as_default():
- with self.test_session() as sess:
- tf.set_random_seed(0)
- inputs = create_test_input(2, 81, 81, 3)
- # Dense feature extraction followed by subsampling.
- output, _ = self._resnet_small(inputs,
- None,
- is_training=False,
- global_pool=False,
- output_stride=output_stride)
- if output_stride is None:
- factor = 1
- else:
- factor = nominal_stride // output_stride
- output = resnet_utils.subsample(output, factor)
- # Make the two networks use the same weights.
- tf.get_variable_scope().reuse_variables()
- # Feature extraction at the nominal network rate.
- expected, _ = self._resnet_small(inputs,
- None,
- is_training=False,
- global_pool=False)
- sess.run(tf.global_variables_initializer())
- self.assertAllClose(output.eval(), expected.eval(),
- atol=1e-4, rtol=1e-4)
-
- def testUnknownBatchSize(self):
- batch = 2
- height, width = 65, 65
- global_pool = True
- num_classes = 10
- inputs = create_test_input(None, height, width, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- logits, _ = self._resnet_small(inputs,
- num_classes,
- global_pool=global_pool,
- scope='resnet')
- self.assertTrue(logits.op.name.startswith('resnet/logits'))
- self.assertListEqual(logits.get_shape().as_list(),
- [None, 1, 1, num_classes])
- images = create_test_input(batch, height, width, 3)
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- output = sess.run(logits, {inputs: images.eval()})
- self.assertEqual(output.shape, (batch, 1, 1, num_classes))
-
- def testFullyConvolutionalUnknownHeightWidth(self):
- batch = 2
- height, width = 65, 65
- global_pool = False
- inputs = create_test_input(batch, None, None, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- output, _ = self._resnet_small(inputs,
- None,
- global_pool=global_pool)
- self.assertListEqual(output.get_shape().as_list(),
- [batch, None, None, 32])
- images = create_test_input(batch, height, width, 3)
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- output = sess.run(output, {inputs: images.eval()})
- self.assertEqual(output.shape, (batch, 3, 3, 32))
-
- def testAtrousFullyConvolutionalUnknownHeightWidth(self):
- batch = 2
- height, width = 65, 65
- global_pool = False
- output_stride = 8
- inputs = create_test_input(batch, None, None, 3)
- with slim.arg_scope(resnet_utils.resnet_arg_scope()):
- output, _ = self._resnet_small(inputs,
- None,
- global_pool=global_pool,
- output_stride=output_stride)
- self.assertListEqual(output.get_shape().as_list(),
- [batch, None, None, 32])
- images = create_test_input(batch, height, width, 3)
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- output = sess.run(output, {inputs: images.eval()})
- self.assertEqual(output.shape, (batch, 9, 9, 32))
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/deeplab/core/utils.py b/research/deeplab/core/utils.py
deleted file mode 100644
index 4bf3d09ad46..00000000000
--- a/research/deeplab/core/utils.py
+++ /dev/null
@@ -1,214 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""This script contains utility functions."""
-import tensorflow as tf
-from tensorflow.contrib import framework as contrib_framework
-from tensorflow.contrib import slim as contrib_slim
-
-slim = contrib_slim
-
-
-# Quantized version of the sigmoid function: relu6(x + 3) / 6, with 1/6
-# approximated as 0.16667.
-q_sigmoid = lambda x: tf.nn.relu6(x + 3) * 0.16667
-
-
-def resize_bilinear(images, size, output_dtype=tf.float32):
- """Returns resized images as output_type.
-
- Args:
- images: A tensor of size [batch, height_in, width_in, channels].
- size: A 1-D int32 Tensor of 2 elements: new_height, new_width. The new size
- for the images.
- output_dtype: The destination type.
- Returns:
- A tensor of size [batch, height_out, width_out, channels] with dtype
- output_dtype.
- """
- images = tf.image.resize_bilinear(images, size, align_corners=True)
- return tf.cast(images, dtype=output_dtype)
-
-
-def scale_dimension(dim, scale):
- """Scales the input dimension.
-
- Args:
- dim: Input dimension (a scalar or a scalar Tensor).
- scale: The amount of scaling applied to the input.
-
- Returns:
- Scaled dimension.
- """
- if isinstance(dim, tf.Tensor):
- return tf.cast((tf.to_float(dim) - 1.0) * scale + 1.0, dtype=tf.int32)
- else:
- return int((float(dim) - 1.0) * scale + 1.0)
-
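-# For example, scale_dimension(321, 0.5) == 161 and scale_dimension(321, 0.25)
-# == 81, which keeps dimensions of the form k * stride + 1 (e.g. 321, 161, 81)
-# aligned across scales.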
-
-def split_separable_conv2d(inputs,
- filters,
- kernel_size=3,
- rate=1,
- weight_decay=0.00004,
- depthwise_weights_initializer_stddev=0.33,
- pointwise_weights_initializer_stddev=0.06,
- scope=None):
- """Splits a separable conv2d into depthwise and pointwise conv2d.
-
- This operation differs from `tf.layers.separable_conv2d` as this operation
- applies activation function between depthwise and pointwise conv2d.
-
- Args:
- inputs: Input tensor with shape [batch, height, width, channels].
- filters: Number of filters in the 1x1 pointwise convolution.
- kernel_size: A list of length 2: [kernel_height, kernel_width] of the
- filters. Can be an int if both values are the same.
- rate: Atrous convolution rate for the depthwise convolution.
- weight_decay: The weight decay to use for regularizing the model.
- depthwise_weights_initializer_stddev: The standard deviation of the
- truncated normal weight initializer for depthwise convolution.
- pointwise_weights_initializer_stddev: The standard deviation of the
- truncated normal weight initializer for pointwise convolution.
- scope: Optional scope for the operation.
-
- Returns:
- Computed features after split separable conv2d.
- """
- outputs = slim.separable_conv2d(
- inputs,
- None,
- kernel_size=kernel_size,
- depth_multiplier=1,
- rate=rate,
- weights_initializer=tf.truncated_normal_initializer(
- stddev=depthwise_weights_initializer_stddev),
- weights_regularizer=None,
- scope=scope + '_depthwise')
- return slim.conv2d(
- outputs,
- filters,
- 1,
- weights_initializer=tf.truncated_normal_initializer(
- stddev=pointwise_weights_initializer_stddev),
- weights_regularizer=slim.l2_regularizer(weight_decay),
- scope=scope + '_pointwise')
-
-
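-# Illustrative usage sketch (the 256-channel depth and scope name are
-# arbitrary choices for the example, not requirements):
-#
-#   features = split_separable_conv2d(
-#       features, filters=256, kernel_size=3, rate=2, scope='separable_conv')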
-def get_label_weight_mask(labels, ignore_label, num_classes, label_weights=1.0):
- """Gets the label weight mask.
-
- Args:
- labels: A Tensor of labels with the shape of [-1].
- ignore_label: Integer, label to ignore.
- num_classes: Integer, the number of semantic classes.
- label_weights: A float or a list of weights. If it is a float, it means all
- the labels have the same weight. If it is a list of weights, then each
- element in the list represents the weight for the label of its index, for
- example, label_weights = [0.1, 0.5] means the weight for label 0 is 0.1
- and the weight for label 1 is 0.5.
-
- Returns:
- A Tensor of label weights with the same shape as labels; each element is the
- weight for the label with the same index in labels, and the element is 0.0
- if the label equals ignore_label.
-
- Raises:
- ValueError: If label_weights is neither a float nor a list, or if
- label_weights is a list and its length is not equal to num_classes.
- """
- if not isinstance(label_weights, (float, list)):
- raise ValueError(
- 'The type of label_weights is invalid, it must be a float or a list.')
-
- if isinstance(label_weights, list) and len(label_weights) != num_classes:
- raise ValueError(
- 'Length of label_weights must be equal to num_classes if it is a list, '
- 'label_weights: %s, num_classes: %d.' % (label_weights, num_classes))
-
- not_ignore_mask = tf.not_equal(labels, ignore_label)
- not_ignore_mask = tf.cast(not_ignore_mask, tf.float32)
- if isinstance(label_weights, float):
- return not_ignore_mask * label_weights
-
- label_weights = tf.constant(label_weights, tf.float32)
- weight_mask = tf.einsum('...y,y->...',
- tf.one_hot(labels, num_classes, dtype=tf.float32),
- label_weights)
- return tf.multiply(not_ignore_mask, weight_mask)
-
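-# For example, with labels = [0, 4, 1, 3, 2], ignore_label = 4, num_classes = 5
-# and label_weights = [0.0, 0.1, 0.2, 0.3, 0.4], the returned mask is
-# [0.0, 0.0, 0.1, 0.3, 0.2] (the entry for the ignored label is zeroed out).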
-
-def get_batch_norm_fn(sync_batch_norm_method):
- """Gets batch norm function.
-
- Currently we only support the following methods:
- - `None` (no sync batch norm). We use slim.batch_norm in this case.
-
- Args:
- sync_batch_norm_method: String, method used to sync batch norm.
-
- Returns:
- Batchnorm function.
-
- Raises:
- ValueError: If sync_batch_norm_method is not supported.
- """
- if sync_batch_norm_method == 'None':
- return slim.batch_norm
- else:
- raise ValueError('Unsupported sync_batch_norm_method.')
-
-
-def get_batch_norm_params(decay=0.9997,
- epsilon=1e-5,
- center=True,
- scale=True,
- is_training=True,
- sync_batch_norm_method='None',
- initialize_gamma_as_zeros=False):
- """Gets batch norm parameters.
-
- Args:
- decay: Float, decay for the moving average.
- epsilon: Float, value added to variance to avoid dividing by zero.
- center: Boolean. If True, add offset of `beta` to normalized tensor. If
- False, `beta` is ignored.
- scale: Boolean. If True, multiply by `gamma`. If False, `gamma` is not used.
- is_training: Boolean, whether or not the layer is in training mode.
- sync_batch_norm_method: String, method used to sync batch norm.
- initialize_gamma_as_zeros: Boolean, initializing `gamma` as zeros or not.
-
- Returns:
- A dictionary for batchnorm parameters.
-
- Raises:
- ValueError: If sync_batch_norm_method is not supported.
- """
- batch_norm_params = {
- 'is_training': is_training,
- 'decay': decay,
- 'epsilon': epsilon,
- 'scale': scale,
- 'center': center,
- }
- if initialize_gamma_as_zeros:
- if sync_batch_norm_method == 'None':
- # Slim-type gamma_initializer.
- batch_norm_params['param_initializers'] = {
- 'gamma': tf.zeros_initializer(),
- }
- else:
- raise ValueError('Unsupported sync_batch_norm_method.')
- return batch_norm_params
diff --git a/research/deeplab/core/utils_test.py b/research/deeplab/core/utils_test.py
deleted file mode 100644
index cfdb63ef2d3..00000000000
--- a/research/deeplab/core/utils_test.py
+++ /dev/null
@@ -1,90 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for utils.py."""
-
-import numpy as np
-import tensorflow as tf
-
-from deeplab.core import utils
-
-
-class UtilsTest(tf.test.TestCase):
-
- def testScaleDimensionOutput(self):
- self.assertEqual(161, utils.scale_dimension(321, 0.5))
- self.assertEqual(193, utils.scale_dimension(321, 0.6))
- self.assertEqual(241, utils.scale_dimension(321, 0.75))
-
- def testGetLabelWeightMask_withFloatLabelWeights(self):
- labels = tf.constant([0, 4, 1, 3, 2])
- ignore_label = 4
- num_classes = 5
- label_weights = 0.5
- expected_label_weight_mask = np.array([0.5, 0.0, 0.5, 0.5, 0.5],
- dtype=np.float32)
-
- with self.test_session() as sess:
- label_weight_mask = utils.get_label_weight_mask(
- labels, ignore_label, num_classes, label_weights=label_weights)
- label_weight_mask = sess.run(label_weight_mask)
- self.assertAllEqual(label_weight_mask, expected_label_weight_mask)
-
- def testGetLabelWeightMask_withListLabelWeights(self):
- labels = tf.constant([0, 4, 1, 3, 2])
- ignore_label = 4
- num_classes = 5
- label_weights = [0.0, 0.1, 0.2, 0.3, 0.4]
- expected_label_weight_mask = np.array([0.0, 0.0, 0.1, 0.3, 0.2],
- dtype=np.float32)
-
- with self.test_session() as sess:
- label_weight_mask = utils.get_label_weight_mask(
- labels, ignore_label, num_classes, label_weights=label_weights)
- label_weight_mask = sess.run(label_weight_mask)
- self.assertAllEqual(label_weight_mask, expected_label_weight_mask)
-
- def testGetLabelWeightMask_withInvalidLabelWeightsType(self):
- labels = tf.constant([0, 4, 1, 3, 2])
- ignore_label = 4
- num_classes = 5
-
- self.assertRaisesWithRegexpMatch(
- ValueError,
- '^The type of label_weights is invalid, it must be a float or a list',
- utils.get_label_weight_mask,
- labels=labels,
- ignore_label=ignore_label,
- num_classes=num_classes,
- label_weights=None)
-
- def testGetLabelWeightMask_withInvalidLabelWeightsLength(self):
- labels = tf.constant([0, 4, 1, 3, 2])
- ignore_label = 4
- num_classes = 5
- label_weights = [0.0, 0.1, 0.2]
-
- self.assertRaisesWithRegexpMatch(
- ValueError,
- '^Length of label_weights must be equal to num_classes if it is a list',
- utils.get_label_weight_mask,
- labels=labels,
- ignore_label=ignore_label,
- num_classes=num_classes,
- label_weights=label_weights)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/deeplab/core/xception.py b/research/deeplab/core/xception.py
deleted file mode 100644
index f9925714716..00000000000
--- a/research/deeplab/core/xception.py
+++ /dev/null
@@ -1,945 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-r"""Xception model.
-
-"Xception: Deep Learning with Depthwise Separable Convolutions"
-Fran{\c{c}}ois Chollet
-https://arxiv.org/abs/1610.02357
-
-We implement the modified version by Jifeng Dai et al. for their COCO 2017
-detection challenge submission, where the model is made deeper and has aligned
-features for dense prediction tasks. See their slides for details:
-
-"Deformable Convolutional Networks -- COCO Detection and Segmentation Challenge
-2017 Entry"
-Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei and Jifeng Dai
-ICCV 2017 COCO Challenge workshop
-http://presentations.cocodataset.org/COCO17-Detect-MSRA.pdf
-
-We made a few more changes on top of MSRA's modifications:
-1. Fully convolutional: All the max-pooling layers are replaced with separable
- conv2d with stride = 2. This allows us to use atrous convolution to extract
- feature maps at any resolution.
-
-2. We support adding ReLU and BatchNorm after depthwise convolution, motivated
- by the design of MobileNetv1.
-
-"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision
-Applications"
-Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang,
-Tobias Weyand, Marco Andreetto, Hartwig Adam
-https://arxiv.org/abs/1704.04861
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import collections
-from six.moves import range
-import tensorflow as tf
-from tensorflow.contrib import slim as contrib_slim
-
-from deeplab.core import utils
-from tensorflow.contrib.slim.nets import resnet_utils
-from nets.mobilenet import conv_blocks as mobilenet_v3_ops
-
-slim = contrib_slim
-
-
-_DEFAULT_MULTI_GRID = [1, 1, 1]
-# The cap for tf.clip_by_value.
-_CLIP_CAP = 6
-
-
-class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
- """A named tuple describing an Xception block.
-
- Its parts are:
- scope: The scope of the block.
- unit_fn: The Xception unit function which takes as input a tensor and
- returns another tensor with the output of the Xception unit.
- args: A list of length equal to the number of units in the block. The list
- contains one dictionary for each unit in the block to serve as argument to
- unit_fn.
- """
-
-
-def fixed_padding(inputs, kernel_size, rate=1):
- """Pads the input along the spatial dimensions independently of input size.
-
- Args:
- inputs: A tensor of size [batch, height_in, width_in, channels].
- kernel_size: The kernel to be used in the conv2d or max_pool2d operation.
- Should be a positive integer.
- rate: An integer, rate for atrous convolution.
-
- Returns:
- output: A tensor of size [batch, height_out, width_out, channels] with the
- input, either intact (if kernel_size == 1) or padded (if kernel_size > 1).
- """
- kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
- pad_total = kernel_size_effective - 1
- pad_beg = pad_total // 2
- pad_end = pad_total - pad_beg
- padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
- [pad_beg, pad_end], [0, 0]])
- return padded_inputs
-
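-# For example, with kernel_size=3 and rate=2 the effective kernel size is
-# 3 + (3 - 1) * (2 - 1) = 5, so pad_total = 4 and the input is padded by two
-# pixels on each spatial side.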
-
-@slim.add_arg_scope
-def separable_conv2d_same(inputs,
- num_outputs,
- kernel_size,
- depth_multiplier,
- stride,
- rate=1,
- use_explicit_padding=True,
- regularize_depthwise=False,
- scope=None,
- **kwargs):
- """Strided 2-D separable convolution with 'SAME' padding.
-
- If stride > 1 and use_explicit_padding is True, then we do explicit zero-
- padding, followed by conv2d with 'VALID' padding.
-
- Note that
-
- net = separable_conv2d_same(inputs, num_outputs, 3,
- depth_multiplier=1, stride=stride)
-
- is equivalent to
-
- net = slim.separable_conv2d(inputs, num_outputs, 3,
- depth_multiplier=1, stride=1, padding='SAME')
- net = resnet_utils.subsample(net, factor=stride)
-
- whereas
-
- net = slim.separable_conv2d(inputs, num_outputs, 3, stride=stride,
- depth_multiplier=1, padding='SAME')
-
- is different when the input's height or width is even, which is why we add the
- current function.
-
- Consequently, if the input feature map has even height or width, setting
- `use_explicit_padding=False` will result in feature misalignment by one pixel
- along the corresponding dimension.
-
- Args:
- inputs: A 4-D tensor of size [batch, height_in, width_in, channels].
- num_outputs: An integer, the number of output filters.
- kernel_size: An int with the kernel_size of the filters.
- depth_multiplier: The number of depthwise convolution output channels for
- each input channel. The total number of depthwise convolution output
- channels will be equal to `num_filters_in * depth_multiplier`.
- stride: An integer, the output stride.
- rate: An integer, rate for atrous convolution.
- use_explicit_padding: If True, use explicit padding to make the model fully
- compatible with the open source version, otherwise use the native
- Tensorflow 'SAME' padding.
- regularize_depthwise: Whether or not to apply L2-norm regularization on the
- depthwise convolution weights.
- scope: Scope.
- **kwargs: additional keyword arguments to pass to slim.conv2d
-
- Returns:
- output: A 4-D tensor of size [batch, height_out, width_out, channels] with
- the convolution output.
- """
- def _separable_conv2d(padding):
- """Wrapper for separable conv2d."""
- return slim.separable_conv2d(inputs,
- num_outputs,
- kernel_size,
- depth_multiplier=depth_multiplier,
- stride=stride,
- rate=rate,
- padding=padding,
- scope=scope,
- **kwargs)
- def _split_separable_conv2d(padding):
- """Splits separable conv2d into depthwise and pointwise conv2d."""
- outputs = slim.separable_conv2d(inputs,
- None,
- kernel_size,
- depth_multiplier=depth_multiplier,
- stride=stride,
- rate=rate,
- padding=padding,
- scope=scope + '_depthwise',
- **kwargs)
- return slim.conv2d(outputs,
- num_outputs,
- 1,
- scope=scope + '_pointwise',
- **kwargs)
- if stride == 1 or not use_explicit_padding:
- if regularize_depthwise:
- outputs = _separable_conv2d(padding='SAME')
- else:
- outputs = _split_separable_conv2d(padding='SAME')
- else:
- inputs = fixed_padding(inputs, kernel_size, rate)
- if regularize_depthwise:
- outputs = _separable_conv2d(padding='VALID')
- else:
- outputs = _split_separable_conv2d(padding='VALID')
- return outputs
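-
-# A minimal usage sketch of the equivalence noted in the docstring (assumed shapes
-# and scope name): on an even-sized input, explicit padding plus a VALID strided
-# convolution matches the stride-1 'SAME' convolution followed by subsampling.
-#   net = separable_conv2d_same(tf.ones([1, 64, 64, 3]), 32, 3, depth_multiplier=1,
-#                               stride=2, scope='example')  # shape [1, 32, 32, 32]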
-
-
-@slim.add_arg_scope
-def xception_module(inputs,
- depth_list,
- skip_connection_type,
- stride,
- kernel_size=3,
- unit_rate_list=None,
- rate=1,
- activation_fn_in_separable_conv=False,
- regularize_depthwise=False,
- outputs_collections=None,
- scope=None,
- use_bounded_activation=False,
- use_explicit_padding=True,
- use_squeeze_excite=False,
- se_pool_size=None):
- """An Xception module.
-
- The output of one Xception module is equal to the sum of `residual` and
- `shortcut`, where `residual` is the feature computed by three separable
- convolutions. The `shortcut` is the feature computed by a 1x1 convolution with
- or without striding. In some cases, the `shortcut` path could be a simple
- identity function or none (i.e., no shortcut).
-
- Note that we replace the max pooling operations in the Xception module with
- another separable convolution with striding, since the atrous rate is not
- properly supported in the current TensorFlow max pooling implementation.
-
- Args:
- inputs: A tensor of size [batch, height, width, channels].
- depth_list: A list of three integers specifying the depth values of one
- Xception module.
- skip_connection_type: Skip connection type for the residual path. Only
- supports 'conv', 'sum', or 'none'.
- stride: The block unit's stride. Determines the amount of downsampling of
- the units output compared to its input.
- kernel_size: Integer, convolution kernel size.
- unit_rate_list: A list of three integers, determining the unit rate for
- each separable convolution in the xception module.
- rate: An integer, rate for atrous convolution.
- activation_fn_in_separable_conv: Whether to include the activation function in
- the separable convolution.
- regularize_depthwise: Whether or not to apply L2-norm regularization on the
- depthwise convolution weights.
- outputs_collections: Collection to add the Xception unit output.
- scope: Optional variable_scope.
- use_bounded_activation: Whether or not to use bounded activations. Bounded
- activations better lend themselves to quantized inference.
- use_explicit_padding: If True, use explicit padding to make the model fully
- compatible with the open source version, otherwise use the native
- Tensorflow 'SAME' padding.
- use_squeeze_excite: Boolean, use squeeze-and-excitation or not.
- se_pool_size: None or integer specifying the pooling size used in SE module.
-
- Returns:
- The Xception module's output.
-
- Raises:
- ValueError: If depth_list and unit_rate_list do not contain three elements,
- or if stride != 1 for the third separable convolution operation in the
- residual path, or unsupported skip connection type.
- """
- if len(depth_list) != 3:
- raise ValueError('Expect three elements in depth_list.')
- if unit_rate_list is None:
- unit_rate_list = _DEFAULT_MULTI_GRID
- if len(unit_rate_list) != 3:
- raise ValueError('Expect three elements in unit_rate_list.')
-
- with tf.variable_scope(scope, 'xception_module', [inputs]) as sc:
- residual = inputs
-
- def _separable_conv(features, depth, kernel_size, depth_multiplier,
- regularize_depthwise, rate, stride, scope):
- """Separable conv block."""
- if activation_fn_in_separable_conv:
- activation_fn = tf.nn.relu6 if use_bounded_activation else tf.nn.relu
- else:
- if use_bounded_activation:
- # When use_bounded_activation is True, we clip the feature values and
- # apply relu6 for activation.
- activation_fn = lambda x: tf.clip_by_value(x, -_CLIP_CAP, _CLIP_CAP)
- features = tf.nn.relu6(features)
- else:
- # Original network design.
- activation_fn = None
- features = tf.nn.relu(features)
- return separable_conv2d_same(features,
- depth,
- kernel_size,
- depth_multiplier=depth_multiplier,
- stride=stride,
- rate=rate,
- activation_fn=activation_fn,
- use_explicit_padding=use_explicit_padding,
- regularize_depthwise=regularize_depthwise,
- scope=scope)
- for i in range(3):
- residual = _separable_conv(residual,
- depth_list[i],
- kernel_size=kernel_size,
- depth_multiplier=1,
- regularize_depthwise=regularize_depthwise,
- rate=rate*unit_rate_list[i],
- stride=stride if i == 2 else 1,
- scope='separable_conv' + str(i+1))
- if use_squeeze_excite:
- residual = mobilenet_v3_ops.squeeze_excite(
- input_tensor=residual,
- squeeze_factor=16,
- inner_activation_fn=tf.nn.relu,
- gating_fn=lambda x: tf.nn.relu6(x+3)*0.16667,
- pool=se_pool_size)
-
- if skip_connection_type == 'conv':
- shortcut = slim.conv2d(inputs,
- depth_list[-1],
- [1, 1],
- stride=stride,
- activation_fn=None,
- scope='shortcut')
- if use_bounded_activation:
- residual = tf.clip_by_value(residual, -_CLIP_CAP, _CLIP_CAP)
- shortcut = tf.clip_by_value(shortcut, -_CLIP_CAP, _CLIP_CAP)
- outputs = residual + shortcut
- if use_bounded_activation:
- outputs = tf.nn.relu6(outputs)
- elif skip_connection_type == 'sum':
- if use_bounded_activation:
- residual = tf.clip_by_value(residual, -_CLIP_CAP, _CLIP_CAP)
- inputs = tf.clip_by_value(inputs, -_CLIP_CAP, _CLIP_CAP)
- outputs = residual + inputs
- if use_bounded_activation:
- outputs = tf.nn.relu6(outputs)
- elif skip_connection_type == 'none':
- outputs = residual
- else:
- raise ValueError('Unsupported skip connection type.')
-
- return slim.utils.collect_named_outputs(outputs_collections,
- sc.name,
- outputs)
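-
-# Illustrative call (assumed sizes and scope name): a 'conv'-shortcut module with
-# stride 2 halves the spatial resolution and projects the shortcut to the last
-# entry of depth_list.
-#   out = xception_module(tf.ones([1, 64, 64, 32]), depth_list=[64, 64, 128],
-#                         skip_connection_type='conv', stride=2,
-#                         unit_rate_list=[1, 1, 1], scope='example_module')
-#   out.shape.as_list()  # [1, 32, 32, 128]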
-
-
-@slim.add_arg_scope
-def stack_blocks_dense(net,
- blocks,
- output_stride=None,
- outputs_collections=None):
- """Stacks Xception blocks and controls output feature density.
-
- First, this function creates scopes for the Xception in the form of
- 'block_name/unit_1', 'block_name/unit_2', etc.
-
- Second, this function allows the user to explicitly control the output
- stride, which is the ratio of the input to output spatial resolution. This
- is useful for dense prediction tasks such as semantic segmentation or
- object detection.
-
- Control of the output feature density is implemented by atrous convolution.
-
- Args:
- net: A tensor of size [batch, height, width, channels].
- blocks: A list of length equal to the number of Xception blocks. Each
- element is an Xception Block object describing the units in the block.
- output_stride: If None, then the output will be computed at the nominal
- network stride. If output_stride is not None, it specifies the requested
- ratio of input to output spatial resolution, which needs to be equal to
- the product of unit strides from the start up to some level of Xception.
- For example, if the Xception employs units with strides 1, 2, 1, 3, 4, 1,
- then valid values for the output_stride are 1, 2, 6, 24 or None (which
- is equivalent to output_stride=24).
- outputs_collections: Collection to add the Xception block outputs.
-
- Returns:
- net: Output tensor with stride equal to the specified output_stride.
-
- Raises:
- ValueError: If the target output_stride is not valid.
- """
- # The current_stride variable keeps track of the effective stride of the
- # activations. This allows us to invoke atrous convolution whenever applying
- # the next residual unit would result in the activations having stride larger
- # than the target output_stride.
- current_stride = 1
-
- # The atrous convolution rate parameter.
- rate = 1
-
- for block in blocks:
- with tf.variable_scope(block.scope, 'block', [net]) as sc:
- for i, unit in enumerate(block.args):
- if output_stride is not None and current_stride > output_stride:
- raise ValueError('The target output_stride cannot be reached.')
- with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
- # If we have reached the target output_stride, then we need to employ
- # atrous convolution with stride=1 and multiply the atrous rate by the
- # current unit's stride for use in subsequent layers.
- if output_stride is not None and current_stride == output_stride:
- net = block.unit_fn(net, rate=rate, **dict(unit, stride=1))
- rate *= unit.get('stride', 1)
- else:
- net = block.unit_fn(net, rate=1, **unit)
- current_stride *= unit.get('stride', 1)
-
- # Collect activations at the block's end before performing subsampling.
- net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)
-
- if output_stride is not None and current_stride != output_stride:
- raise ValueError('The target output_stride cannot be reached.')
-
- return net
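-
-# Sketch of the stride/rate bookkeeping above: with three units of stride 2 and a
-# requested output_stride of 4, the first two units keep their stride
-# (current_stride 1 -> 2 -> 4), while the third is rewritten to stride 1 and the
-# atrous rate doubles to 2, preserving the field of view at the denser resolution.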
-
-
-def xception(inputs,
- blocks,
- num_classes=None,
- is_training=True,
- global_pool=True,
- keep_prob=0.5,
- output_stride=None,
- reuse=None,
- scope=None,
- sync_batch_norm_method='None'):
- """Generator for Xception models.
-
- This function generates a family of Xception models. See the xception_*()
- methods for specific model instantiations, obtained by selecting different
- block instantiations that produce Xception of various depths.
-
- Args:
- inputs: A tensor of size [batch, height_in, width_in, channels]. Must be
- floating point. If a pretrained checkpoint is used, pixel values should be
- the same as during training (see go/slim-classification-models for
- specifics).
- blocks: A list of length equal to the number of Xception blocks. Each
- element is an Xception Block object describing the units in the block.
- num_classes: Number of predicted classes for classification tasks.
- If 0 or None, we return the features before the logit layer.
- is_training: whether batch_norm layers are in training mode.
- global_pool: If True, we perform global average pooling before computing the
- logits. Set to True for image classification, False for dense prediction.
- keep_prob: Keep probability used in the pre-logits dropout layer.
- output_stride: If None, then the output will be computed at the nominal
- network stride. If output_stride is not None, it specifies the requested
- ratio of input to output spatial resolution.
- reuse: whether or not the network and its variables should be reused. To be
- able to reuse, 'scope' must be given.
- scope: Optional variable_scope.
- sync_batch_norm_method: String, sync batchnorm method. Currently only
- supports `None`.
-
- Returns:
- net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
- If global_pool is False, then height_out and width_out are reduced by a
- factor of output_stride compared to the respective height_in and width_in,
- else both height_out and width_out equal one. If num_classes is 0 or None,
- then net is the output of the last Xception block, potentially after
- global average pooling. If num_classes is a non-zero integer, net contains
- the pre-softmax activations.
- end_points: A dictionary from components of the network to the corresponding
- activation.
-
- Raises:
- ValueError: If the target output_stride is not valid.
- """
- with tf.variable_scope(
- scope, 'xception', [inputs], reuse=reuse) as sc:
- end_points_collection = sc.original_name_scope + 'end_points'
- batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
- with slim.arg_scope([slim.conv2d,
- slim.separable_conv2d,
- xception_module,
- stack_blocks_dense],
- outputs_collections=end_points_collection):
- with slim.arg_scope([batch_norm], is_training=is_training):
- net = inputs
- if output_stride is not None:
- if output_stride % 2 != 0:
- raise ValueError('The output_stride needs to be a multiple of 2.')
- output_stride //= 2
- # Root block function operated on inputs.
- net = resnet_utils.conv2d_same(net, 32, 3, stride=2,
- scope='entry_flow/conv1_1')
- net = resnet_utils.conv2d_same(net, 64, 3, stride=1,
- scope='entry_flow/conv1_2')
-
- # Extract features for entry_flow, middle_flow, and exit_flow.
- net = stack_blocks_dense(net, blocks, output_stride)
-
- # Convert end_points_collection into a dictionary of end_points.
- end_points = slim.utils.convert_collection_to_dict(
- end_points_collection, clear_collection=True)
-
- if global_pool:
- # Global average pooling.
- net = tf.reduce_mean(net, [1, 2], name='global_pool', keepdims=True)
- end_points['global_pool'] = net
- if num_classes:
- net = slim.dropout(net, keep_prob=keep_prob, is_training=is_training,
- scope='prelogits_dropout')
- net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
- normalizer_fn=None, scope='logits')
- end_points[sc.name + '/logits'] = net
- end_points['predictions'] = slim.softmax(net, scope='predictions')
- return net, end_points
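-
-# Shape sketch (illustrative): for a [2, 224, 224, 3] input with output_stride=16,
-# global_pool=False and num_classes=None, the returned `net` has spatial size
-# 224 / 16 = 14, e.g. [2, 14, 14, 2048] for the Xception-65 block definition below.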
-
-
-def xception_block(scope,
- depth_list,
- skip_connection_type,
- activation_fn_in_separable_conv,
- regularize_depthwise,
- num_units,
- stride,
- kernel_size=3,
- unit_rate_list=None,
- use_squeeze_excite=False,
- se_pool_size=None):
- """Helper function for creating a Xception block.
-
- Args:
- scope: The scope of the block.
- depth_list: The depth of the bottleneck layer for each unit.
- skip_connection_type: Skip connection type for the residual path. Only
- supports 'conv', 'sum', or 'none'.
- activation_fn_in_separable_conv: Whether to include the activation function in
- the separable convolution.
- regularize_depthwise: Whether or not to apply L2-norm regularization on the
- depthwise convolution weights.
- num_units: The number of units in the block.
- stride: The stride of the block, implemented as a stride in the last unit.
- All other units have stride=1.
- kernel_size: Integer, convolution kernel size.
- unit_rate_list: A list of three integers, determining the unit rate in the
- corresponding xception block.
- use_squeeze_excite: Boolean, use squeeze-and-excitation or not.
- se_pool_size: None or integer specifying the pooling size used in SE module.
-
- Returns:
- An Xception block.
- """
- if unit_rate_list is None:
- unit_rate_list = _DEFAULT_MULTI_GRID
- return Block(scope, xception_module, [{
- 'depth_list': depth_list,
- 'skip_connection_type': skip_connection_type,
- 'activation_fn_in_separable_conv': activation_fn_in_separable_conv,
- 'regularize_depthwise': regularize_depthwise,
- 'stride': stride,
- 'kernel_size': kernel_size,
- 'unit_rate_list': unit_rate_list,
- 'use_squeeze_excite': use_squeeze_excite,
- 'se_pool_size': se_pool_size,
- }] * num_units)
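-
-# For instance, xception_block('entry_flow/block1', depth_list=[128, 128, 128],
-# skip_connection_type='conv', activation_fn_in_separable_conv=False,
-# regularize_depthwise=False, num_units=1, stride=2) returns a Block whose `args`
-# list holds a single argument dictionary that is later passed to xception_module.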
-
-
-def xception_41(inputs,
- num_classes=None,
- is_training=True,
- global_pool=True,
- keep_prob=0.5,
- output_stride=None,
- regularize_depthwise=False,
- multi_grid=None,
- reuse=None,
- scope='xception_41',
- sync_batch_norm_method='None'):
- """Xception-41 model."""
- blocks = [
- xception_block('entry_flow/block1',
- depth_list=[128, 128, 128],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2),
- xception_block('entry_flow/block2',
- depth_list=[256, 256, 256],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2),
- xception_block('entry_flow/block3',
- depth_list=[728, 728, 728],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2),
- xception_block('middle_flow/block1',
- depth_list=[728, 728, 728],
- skip_connection_type='sum',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=8,
- stride=1),
- xception_block('exit_flow/block1',
- depth_list=[728, 1024, 1024],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2),
- xception_block('exit_flow/block2',
- depth_list=[1536, 1536, 2048],
- skip_connection_type='none',
- activation_fn_in_separable_conv=True,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=1,
- unit_rate_list=multi_grid),
- ]
- return xception(inputs,
- blocks=blocks,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- keep_prob=keep_prob,
- output_stride=output_stride,
- reuse=reuse,
- scope=scope,
- sync_batch_norm_method=sync_batch_norm_method)
-
-
-def xception_65_factory(inputs,
- num_classes=None,
- is_training=True,
- global_pool=True,
- keep_prob=0.5,
- output_stride=None,
- regularize_depthwise=False,
- kernel_size=3,
- multi_grid=None,
- reuse=None,
- use_squeeze_excite=False,
- se_pool_size=None,
- scope='xception_65',
- sync_batch_norm_method='None'):
- """Xception-65 model factory."""
- blocks = [
- xception_block('entry_flow/block1',
- depth_list=[128, 128, 128],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2,
- kernel_size=kernel_size,
- use_squeeze_excite=False,
- se_pool_size=se_pool_size),
- xception_block('entry_flow/block2',
- depth_list=[256, 256, 256],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2,
- kernel_size=kernel_size,
- use_squeeze_excite=False,
- se_pool_size=se_pool_size),
- xception_block('entry_flow/block3',
- depth_list=[728, 728, 728],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2,
- kernel_size=kernel_size,
- use_squeeze_excite=use_squeeze_excite,
- se_pool_size=se_pool_size),
- xception_block('middle_flow/block1',
- depth_list=[728, 728, 728],
- skip_connection_type='sum',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=16,
- stride=1,
- kernel_size=kernel_size,
- use_squeeze_excite=use_squeeze_excite,
- se_pool_size=se_pool_size),
- xception_block('exit_flow/block1',
- depth_list=[728, 1024, 1024],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2,
- kernel_size=kernel_size,
- use_squeeze_excite=use_squeeze_excite,
- se_pool_size=se_pool_size),
- xception_block('exit_flow/block2',
- depth_list=[1536, 1536, 2048],
- skip_connection_type='none',
- activation_fn_in_separable_conv=True,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=1,
- kernel_size=kernel_size,
- unit_rate_list=multi_grid,
- use_squeeze_excite=False,
- se_pool_size=se_pool_size),
- ]
- return xception(inputs,
- blocks=blocks,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- keep_prob=keep_prob,
- output_stride=output_stride,
- reuse=reuse,
- scope=scope,
- sync_batch_norm_method=sync_batch_norm_method)
-
-
-def xception_65(inputs,
- num_classes=None,
- is_training=True,
- global_pool=True,
- keep_prob=0.5,
- output_stride=None,
- regularize_depthwise=False,
- multi_grid=None,
- reuse=None,
- scope='xception_65',
- sync_batch_norm_method='None'):
- """Xception-65 model."""
- return xception_65_factory(
- inputs=inputs,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- keep_prob=keep_prob,
- output_stride=output_stride,
- regularize_depthwise=regularize_depthwise,
- multi_grid=multi_grid,
- reuse=reuse,
- scope=scope,
- use_squeeze_excite=False,
- se_pool_size=None,
- sync_batch_norm_method=sync_batch_norm_method)
-
-
-def xception_71_factory(inputs,
- num_classes=None,
- is_training=True,
- global_pool=True,
- keep_prob=0.5,
- output_stride=None,
- regularize_depthwise=False,
- kernel_size=3,
- multi_grid=None,
- reuse=None,
- scope='xception_71',
- use_squeeze_excite=False,
- se_pool_size=None,
- sync_batch_norm_method='None'):
- """Xception-71 model factory."""
- blocks = [
- xception_block('entry_flow/block1',
- depth_list=[128, 128, 128],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2,
- kernel_size=kernel_size,
- use_squeeze_excite=False,
- se_pool_size=se_pool_size),
- xception_block('entry_flow/block2',
- depth_list=[256, 256, 256],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=1,
- kernel_size=kernel_size,
- use_squeeze_excite=False,
- se_pool_size=se_pool_size),
- xception_block('entry_flow/block3',
- depth_list=[256, 256, 256],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2,
- kernel_size=kernel_size,
- use_squeeze_excite=False,
- se_pool_size=se_pool_size),
- xception_block('entry_flow/block4',
- depth_list=[728, 728, 728],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=1,
- kernel_size=kernel_size,
- use_squeeze_excite=use_squeeze_excite,
- se_pool_size=se_pool_size),
- xception_block('entry_flow/block5',
- depth_list=[728, 728, 728],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2,
- kernel_size=kernel_size,
- use_squeeze_excite=use_squeeze_excite,
- se_pool_size=se_pool_size),
- xception_block('middle_flow/block1',
- depth_list=[728, 728, 728],
- skip_connection_type='sum',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=16,
- stride=1,
- kernel_size=kernel_size,
- use_squeeze_excite=use_squeeze_excite,
- se_pool_size=se_pool_size),
- xception_block('exit_flow/block1',
- depth_list=[728, 1024, 1024],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2,
- kernel_size=kernel_size,
- use_squeeze_excite=use_squeeze_excite,
- se_pool_size=se_pool_size),
- xception_block('exit_flow/block2',
- depth_list=[1536, 1536, 2048],
- skip_connection_type='none',
- activation_fn_in_separable_conv=True,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=1,
- kernel_size=kernel_size,
- unit_rate_list=multi_grid,
- use_squeeze_excite=False,
- se_pool_size=se_pool_size),
- ]
- return xception(inputs,
- blocks=blocks,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- keep_prob=keep_prob,
- output_stride=output_stride,
- reuse=reuse,
- scope=scope,
- sync_batch_norm_method=sync_batch_norm_method)
-
-
-def xception_71(inputs,
- num_classes=None,
- is_training=True,
- global_pool=True,
- keep_prob=0.5,
- output_stride=None,
- regularize_depthwise=False,
- multi_grid=None,
- reuse=None,
- scope='xception_71',
- sync_batch_norm_method='None'):
- """Xception-71 model."""
- return xception_71_factory(
- inputs=inputs,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- keep_prob=keep_prob,
- output_stride=output_stride,
- regularize_depthwise=regularize_depthwise,
- multi_grid=multi_grid,
- reuse=reuse,
- scope=scope,
- use_squeeze_excite=False,
- se_pool_size=None,
- sync_batch_norm_method=sync_batch_norm_method)
-
-
-def xception_arg_scope(weight_decay=0.00004,
- batch_norm_decay=0.9997,
- batch_norm_epsilon=0.001,
- batch_norm_scale=True,
- weights_initializer_stddev=0.09,
- regularize_depthwise=False,
- use_batch_norm=True,
- use_bounded_activation=False,
- sync_batch_norm_method='None'):
- """Defines the default Xception arg scope.
-
- Args:
- weight_decay: The weight decay to use for regularizing the model.
- batch_norm_decay: The moving average decay when estimating layer activation
- statistics in batch normalization.
- batch_norm_epsilon: Small constant to prevent division by zero when
- normalizing activations by their variance in batch normalization.
- batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the
- activations in the batch normalization layer.
- weights_initializer_stddev: The standard deviation of the truncated normal
- weight initializer.
- regularize_depthwise: Whether or not to apply L2-norm regularization on the
- depthwise convolution weights.
- use_batch_norm: Whether or not to use batch normalization.
- use_bounded_activation: Whether or not to use bounded activations. Bounded
- activations better lend themselves to quantized inference.
- sync_batch_norm_method: String, sync batchnorm method. Currently only
- supports `None`. Also, it is only effective for Xception.
-
- Returns:
- An `arg_scope` to use for the Xception models.
- """
- batch_norm_params = {
- 'decay': batch_norm_decay,
- 'epsilon': batch_norm_epsilon,
- 'scale': batch_norm_scale,
- }
- if regularize_depthwise:
- depthwise_regularizer = slim.l2_regularizer(weight_decay)
- else:
- depthwise_regularizer = None
- activation_fn = tf.nn.relu6 if use_bounded_activation else tf.nn.relu
- batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
- with slim.arg_scope(
- [slim.conv2d, slim.separable_conv2d],
- weights_initializer=tf.truncated_normal_initializer(
- stddev=weights_initializer_stddev),
- activation_fn=activation_fn,
- normalizer_fn=batch_norm if use_batch_norm else None):
- with slim.arg_scope([batch_norm], **batch_norm_params):
- with slim.arg_scope(
- [slim.conv2d],
- weights_regularizer=slim.l2_regularizer(weight_decay)):
- with slim.arg_scope(
- [slim.separable_conv2d],
- weights_regularizer=depthwise_regularizer):
- with slim.arg_scope(
- [xception_module],
- use_bounded_activation=use_bounded_activation,
- use_explicit_padding=not use_bounded_activation) as arg_sc:
- return arg_sc
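-
-
-# A minimal end-to-end usage sketch (illustrative; the 513x513 crop size and
-# dense-prediction settings are assumptions):
-#   images = tf.placeholder(tf.float32, [None, 513, 513, 3])
-#   with slim.arg_scope(xception_arg_scope(weight_decay=4e-5)):
-#     features, end_points = xception_65(images,
-#                                        num_classes=None,
-#                                        is_training=False,
-#                                        global_pool=False,
-#                                        output_stride=16)
-#   # features: [None, 33, 33, 2048]; end_points maps scope names such as
-#   # 'xception_65/entry_flow/block2' to intermediate activations.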
diff --git a/research/deeplab/core/xception_test.py b/research/deeplab/core/xception_test.py
deleted file mode 100644
index fc338daa6e5..00000000000
--- a/research/deeplab/core/xception_test.py
+++ /dev/null
@@ -1,488 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for xception.py."""
-import numpy as np
-import six
-import tensorflow as tf
-from tensorflow.contrib import slim as contrib_slim
-
-from deeplab.core import xception
-from tensorflow.contrib.slim.nets import resnet_utils
-
-slim = contrib_slim
-
-
-def create_test_input(batch, height, width, channels):
- """Create test input tensor."""
- if None in [batch, height, width, channels]:
- return tf.placeholder(tf.float32, (batch, height, width, channels))
- else:
- return tf.cast(
- np.tile(
- np.reshape(
- np.reshape(np.arange(height), [height, 1]) +
- np.reshape(np.arange(width), [1, width]),
- [1, height, width, 1]),
- [batch, 1, 1, channels]),
- tf.float32)
-
-
-class UtilityFunctionTest(tf.test.TestCase):
-
- def testSeparableConv2DSameWithInputEvenSize(self):
- n, n2 = 4, 2
-
- # Input image.
- x = create_test_input(1, n, n, 1)
-
- # Convolution kernel.
- dw = create_test_input(1, 3, 3, 1)
- dw = tf.reshape(dw, [3, 3, 1, 1])
-
- tf.get_variable('Conv/depthwise_weights', initializer=dw)
- tf.get_variable('Conv/pointwise_weights',
- initializer=tf.ones([1, 1, 1, 1]))
- tf.get_variable('Conv/biases', initializer=tf.zeros([1]))
- tf.get_variable_scope().reuse_variables()
-
- y1 = slim.separable_conv2d(x, 1, [3, 3], depth_multiplier=1,
- stride=1, scope='Conv')
- y1_expected = tf.cast([[14, 28, 43, 26],
- [28, 48, 66, 37],
- [43, 66, 84, 46],
- [26, 37, 46, 22]], tf.float32)
- y1_expected = tf.reshape(y1_expected, [1, n, n, 1])
-
- y2 = resnet_utils.subsample(y1, 2)
- y2_expected = tf.cast([[14, 43],
- [43, 84]], tf.float32)
- y2_expected = tf.reshape(y2_expected, [1, n2, n2, 1])
-
- y3 = xception.separable_conv2d_same(x, 1, 3, depth_multiplier=1,
- regularize_depthwise=True,
- stride=2, scope='Conv')
- y3_expected = y2_expected
-
- y4 = slim.separable_conv2d(x, 1, [3, 3], depth_multiplier=1,
- stride=2, scope='Conv')
- y4_expected = tf.cast([[48, 37],
- [37, 22]], tf.float32)
- y4_expected = tf.reshape(y4_expected, [1, n2, n2, 1])
-
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- self.assertAllClose(y1.eval(), y1_expected.eval())
- self.assertAllClose(y2.eval(), y2_expected.eval())
- self.assertAllClose(y3.eval(), y3_expected.eval())
- self.assertAllClose(y4.eval(), y4_expected.eval())
-
- def testSeparableConv2DSameWithInputOddSize(self):
- n, n2 = 5, 3
-
- # Input image.
- x = create_test_input(1, n, n, 1)
-
- # Convolution kernel.
- dw = create_test_input(1, 3, 3, 1)
- dw = tf.reshape(dw, [3, 3, 1, 1])
-
- tf.get_variable('Conv/depthwise_weights', initializer=dw)
- tf.get_variable('Conv/pointwise_weights',
- initializer=tf.ones([1, 1, 1, 1]))
- tf.get_variable('Conv/biases', initializer=tf.zeros([1]))
- tf.get_variable_scope().reuse_variables()
-
- y1 = slim.separable_conv2d(x, 1, [3, 3], depth_multiplier=1,
- stride=1, scope='Conv')
- y1_expected = tf.cast([[14, 28, 43, 58, 34],
- [28, 48, 66, 84, 46],
- [43, 66, 84, 102, 55],
- [58, 84, 102, 120, 64],
- [34, 46, 55, 64, 30]], tf.float32)
- y1_expected = tf.reshape(y1_expected, [1, n, n, 1])
-
- y2 = resnet_utils.subsample(y1, 2)
- y2_expected = tf.cast([[14, 43, 34],
- [43, 84, 55],
- [34, 55, 30]], tf.float32)
- y2_expected = tf.reshape(y2_expected, [1, n2, n2, 1])
-
- y3 = xception.separable_conv2d_same(x, 1, 3, depth_multiplier=1,
- regularize_depthwise=True,
- stride=2, scope='Conv')
- y3_expected = y2_expected
-
- y4 = slim.separable_conv2d(x, 1, [3, 3], depth_multiplier=1,
- stride=2, scope='Conv')
- y4_expected = y2_expected
-
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- self.assertAllClose(y1.eval(), y1_expected.eval())
- self.assertAllClose(y2.eval(), y2_expected.eval())
- self.assertAllClose(y3.eval(), y3_expected.eval())
- self.assertAllClose(y4.eval(), y4_expected.eval())
-
-
-class XceptionNetworkTest(tf.test.TestCase):
- """Tests with small Xception network."""
-
- def _xception_small(self,
- inputs,
- num_classes=None,
- is_training=True,
- global_pool=True,
- output_stride=None,
- regularize_depthwise=True,
- reuse=None,
- scope='xception_small'):
- """A shallow and thin Xception for faster tests."""
- block = xception.xception_block
- blocks = [
- block('entry_flow/block1',
- depth_list=[1, 1, 1],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2),
- block('entry_flow/block2',
- depth_list=[2, 2, 2],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2),
- block('entry_flow/block3',
- depth_list=[4, 4, 4],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=1),
- block('entry_flow/block4',
- depth_list=[4, 4, 4],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2),
- block('middle_flow/block1',
- depth_list=[4, 4, 4],
- skip_connection_type='sum',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=2,
- stride=1),
- block('exit_flow/block1',
- depth_list=[8, 8, 8],
- skip_connection_type='conv',
- activation_fn_in_separable_conv=False,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=2),
- block('exit_flow/block2',
- depth_list=[16, 16, 16],
- skip_connection_type='none',
- activation_fn_in_separable_conv=True,
- regularize_depthwise=regularize_depthwise,
- num_units=1,
- stride=1),
- ]
- return xception.xception(inputs,
- blocks=blocks,
- num_classes=num_classes,
- is_training=is_training,
- global_pool=global_pool,
- output_stride=output_stride,
- reuse=reuse,
- scope=scope)
-
- def testClassificationEndPoints(self):
- global_pool = True
- num_classes = 3
- inputs = create_test_input(2, 32, 32, 3)
- with slim.arg_scope(xception.xception_arg_scope()):
- logits, end_points = self._xception_small(
- inputs,
- num_classes=num_classes,
- global_pool=global_pool,
- scope='xception')
- self.assertTrue(
- logits.op.name.startswith('xception/logits'))
- self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
- self.assertTrue('predictions' in end_points)
- self.assertListEqual(end_points['predictions'].get_shape().as_list(),
- [2, 1, 1, num_classes])
- self.assertTrue('global_pool' in end_points)
- self.assertListEqual(end_points['global_pool'].get_shape().as_list(),
- [2, 1, 1, 16])
-
- def testEndpointNames(self):
- global_pool = True
- num_classes = 3
- inputs = create_test_input(2, 32, 32, 3)
- with slim.arg_scope(xception.xception_arg_scope()):
- _, end_points = self._xception_small(
- inputs,
- num_classes=num_classes,
- global_pool=global_pool,
- scope='xception')
- expected = [
- 'xception/entry_flow/conv1_1',
- 'xception/entry_flow/conv1_2',
- 'xception/entry_flow/block1/unit_1/xception_module/separable_conv1',
- 'xception/entry_flow/block1/unit_1/xception_module/separable_conv2',
- 'xception/entry_flow/block1/unit_1/xception_module/separable_conv3',
- 'xception/entry_flow/block1/unit_1/xception_module/shortcut',
- 'xception/entry_flow/block1/unit_1/xception_module',
- 'xception/entry_flow/block1',
- 'xception/entry_flow/block2/unit_1/xception_module/separable_conv1',
- 'xception/entry_flow/block2/unit_1/xception_module/separable_conv2',
- 'xception/entry_flow/block2/unit_1/xception_module/separable_conv3',
- 'xception/entry_flow/block2/unit_1/xception_module/shortcut',
- 'xception/entry_flow/block2/unit_1/xception_module',
- 'xception/entry_flow/block2',
- 'xception/entry_flow/block3/unit_1/xception_module/separable_conv1',
- 'xception/entry_flow/block3/unit_1/xception_module/separable_conv2',
- 'xception/entry_flow/block3/unit_1/xception_module/separable_conv3',
- 'xception/entry_flow/block3/unit_1/xception_module/shortcut',
- 'xception/entry_flow/block3/unit_1/xception_module',
- 'xception/entry_flow/block3',
- 'xception/entry_flow/block4/unit_1/xception_module/separable_conv1',
- 'xception/entry_flow/block4/unit_1/xception_module/separable_conv2',
- 'xception/entry_flow/block4/unit_1/xception_module/separable_conv3',
- 'xception/entry_flow/block4/unit_1/xception_module/shortcut',
- 'xception/entry_flow/block4/unit_1/xception_module',
- 'xception/entry_flow/block4',
- 'xception/middle_flow/block1/unit_1/xception_module/separable_conv1',
- 'xception/middle_flow/block1/unit_1/xception_module/separable_conv2',
- 'xception/middle_flow/block1/unit_1/xception_module/separable_conv3',
- 'xception/middle_flow/block1/unit_1/xception_module',
- 'xception/middle_flow/block1/unit_2/xception_module/separable_conv1',
- 'xception/middle_flow/block1/unit_2/xception_module/separable_conv2',
- 'xception/middle_flow/block1/unit_2/xception_module/separable_conv3',
- 'xception/middle_flow/block1/unit_2/xception_module',
- 'xception/middle_flow/block1',
- 'xception/exit_flow/block1/unit_1/xception_module/separable_conv1',
- 'xception/exit_flow/block1/unit_1/xception_module/separable_conv2',
- 'xception/exit_flow/block1/unit_1/xception_module/separable_conv3',
- 'xception/exit_flow/block1/unit_1/xception_module/shortcut',
- 'xception/exit_flow/block1/unit_1/xception_module',
- 'xception/exit_flow/block1',
- 'xception/exit_flow/block2/unit_1/xception_module/separable_conv1',
- 'xception/exit_flow/block2/unit_1/xception_module/separable_conv2',
- 'xception/exit_flow/block2/unit_1/xception_module/separable_conv3',
- 'xception/exit_flow/block2/unit_1/xception_module',
- 'xception/exit_flow/block2',
- 'global_pool',
- 'xception/logits',
- 'predictions',
- ]
- self.assertItemsEqual(list(end_points.keys()), expected)
-
- def testClassificationShapes(self):
- global_pool = True
- num_classes = 3
- inputs = create_test_input(2, 64, 64, 3)
- with slim.arg_scope(xception.xception_arg_scope()):
- _, end_points = self._xception_small(
- inputs,
- num_classes,
- global_pool=global_pool,
- scope='xception')
- endpoint_to_shape = {
- 'xception/entry_flow/conv1_1': [2, 32, 32, 32],
- 'xception/entry_flow/block1': [2, 16, 16, 1],
- 'xception/entry_flow/block2': [2, 8, 8, 2],
- 'xception/entry_flow/block4': [2, 4, 4, 4],
- 'xception/middle_flow/block1': [2, 4, 4, 4],
- 'xception/exit_flow/block1': [2, 2, 2, 8],
- 'xception/exit_flow/block2': [2, 2, 2, 16]}
- for endpoint, shape in six.iteritems(endpoint_to_shape):
- self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
-
- def testFullyConvolutionalEndpointShapes(self):
- global_pool = False
- num_classes = 3
- inputs = create_test_input(2, 65, 65, 3)
- with slim.arg_scope(xception.xception_arg_scope()):
- _, end_points = self._xception_small(
- inputs,
- num_classes,
- global_pool=global_pool,
- scope='xception')
- endpoint_to_shape = {
- 'xception/entry_flow/conv1_1': [2, 33, 33, 32],
- 'xception/entry_flow/block1': [2, 17, 17, 1],
- 'xception/entry_flow/block2': [2, 9, 9, 2],
- 'xception/entry_flow/block4': [2, 5, 5, 4],
- 'xception/middle_flow/block1': [2, 5, 5, 4],
- 'xception/exit_flow/block1': [2, 3, 3, 8],
- 'xception/exit_flow/block2': [2, 3, 3, 16]}
- for endpoint, shape in six.iteritems(endpoint_to_shape):
- self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
-
- def testAtrousFullyConvolutionalEndpointShapes(self):
- global_pool = False
- num_classes = 3
- output_stride = 8
- inputs = create_test_input(2, 65, 65, 3)
- with slim.arg_scope(xception.xception_arg_scope()):
- _, end_points = self._xception_small(
- inputs,
- num_classes,
- global_pool=global_pool,
- output_stride=output_stride,
- scope='xception')
- endpoint_to_shape = {
- 'xception/entry_flow/block1': [2, 17, 17, 1],
- 'xception/entry_flow/block2': [2, 9, 9, 2],
- 'xception/entry_flow/block4': [2, 9, 9, 4],
- 'xception/middle_flow/block1': [2, 9, 9, 4],
- 'xception/exit_flow/block1': [2, 9, 9, 8],
- 'xception/exit_flow/block2': [2, 9, 9, 16]}
- for endpoint, shape in six.iteritems(endpoint_to_shape):
- self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
-
- def testAtrousFullyConvolutionalValues(self):
- """Verify dense feature extraction with atrous convolution."""
- nominal_stride = 32
- for output_stride in [4, 8, 16, 32, None]:
- with slim.arg_scope(xception.xception_arg_scope()):
- with tf.Graph().as_default():
- with self.test_session() as sess:
- tf.set_random_seed(0)
- inputs = create_test_input(2, 96, 97, 3)
- # Dense feature extraction followed by subsampling.
- output, _ = self._xception_small(
- inputs,
- None,
- is_training=False,
- global_pool=False,
- output_stride=output_stride)
- if output_stride is None:
- factor = 1
- else:
- factor = nominal_stride // output_stride
- output = resnet_utils.subsample(output, factor)
- # Make the two networks use the same weights.
- tf.get_variable_scope().reuse_variables()
- # Feature extraction at the nominal network rate.
- expected, _ = self._xception_small(
- inputs,
- None,
- is_training=False,
- global_pool=False)
- sess.run(tf.global_variables_initializer())
- self.assertAllClose(output.eval(), expected.eval(),
- atol=1e-5, rtol=1e-5)
-
- def testUnknownBatchSize(self):
- batch = 2
- height, width = 65, 65
- global_pool = True
- num_classes = 10
- inputs = create_test_input(None, height, width, 3)
- with slim.arg_scope(xception.xception_arg_scope()):
- logits, _ = self._xception_small(
- inputs,
- num_classes,
- global_pool=global_pool,
- scope='xception')
- self.assertTrue(logits.op.name.startswith('xception/logits'))
- self.assertListEqual(logits.get_shape().as_list(),
- [None, 1, 1, num_classes])
- images = create_test_input(batch, height, width, 3)
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- output = sess.run(logits, {inputs: images.eval()})
- self.assertEquals(output.shape, (batch, 1, 1, num_classes))
-
- def testFullyConvolutionalUnknownHeightWidth(self):
- batch = 2
- height, width = 65, 65
- global_pool = False
- inputs = create_test_input(batch, None, None, 3)
- with slim.arg_scope(xception.xception_arg_scope()):
- output, _ = self._xception_small(
- inputs,
- None,
- global_pool=global_pool)
- self.assertListEqual(output.get_shape().as_list(),
- [batch, None, None, 16])
- images = create_test_input(batch, height, width, 3)
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- output = sess.run(output, {inputs: images.eval()})
- self.assertEquals(output.shape, (batch, 3, 3, 16))
-
- def testAtrousFullyConvolutionalUnknownHeightWidth(self):
- batch = 2
- height, width = 65, 65
- global_pool = False
- output_stride = 8
- inputs = create_test_input(batch, None, None, 3)
- with slim.arg_scope(xception.xception_arg_scope()):
- output, _ = self._xception_small(
- inputs,
- None,
- global_pool=global_pool,
- output_stride=output_stride)
- self.assertListEqual(output.get_shape().as_list(),
- [batch, None, None, 16])
- images = create_test_input(batch, height, width, 3)
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- output = sess.run(output, {inputs: images.eval()})
- self.assertEquals(output.shape, (batch, 9, 9, 16))
-
- def testEndpointsReuse(self):
- inputs = create_test_input(2, 32, 32, 3)
- with slim.arg_scope(xception.xception_arg_scope()):
- _, end_points0 = xception.xception_65(
- inputs,
- num_classes=10,
- reuse=False)
- with slim.arg_scope(xception.xception_arg_scope()):
- _, end_points1 = xception.xception_65(
- inputs,
- num_classes=10,
- reuse=True)
- self.assertItemsEqual(list(end_points0.keys()), list(end_points1.keys()))
-
- def testUseBoundedActivation(self):
- global_pool = False
- num_classes = 3
- output_stride = 16
- for use_bounded_activation in (True, False):
- tf.reset_default_graph()
- inputs = create_test_input(2, 65, 65, 3)
- with slim.arg_scope(xception.xception_arg_scope(
- use_bounded_activation=use_bounded_activation)):
- _, _ = self._xception_small(
- inputs,
- num_classes,
- global_pool=global_pool,
- output_stride=output_stride,
- scope='xception')
- for node in tf.get_default_graph().as_graph_def().node:
- if node.op.startswith('Relu'):
- self.assertEqual(node.op == 'Relu6', use_bounded_activation)
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/deeplab/datasets/__init__.py b/research/deeplab/datasets/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/deeplab/datasets/build_ade20k_data.py b/research/deeplab/datasets/build_ade20k_data.py
deleted file mode 100644
index fc04ed0db04..00000000000
--- a/research/deeplab/datasets/build_ade20k_data.py
+++ /dev/null
@@ -1,123 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Converts ADE20K data to TFRecord file format with Example protos."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import math
-import os
-import random
-import sys
-import build_data
-from six.moves import range
-import tensorflow as tf
-
-FLAGS = tf.app.flags.FLAGS
-
-tf.app.flags.DEFINE_string(
- 'train_image_folder',
- './ADE20K/ADEChallengeData2016/images/training',
- 'Folder containing training images')
-tf.app.flags.DEFINE_string(
- 'train_image_label_folder',
- './ADE20K/ADEChallengeData2016/annotations/training',
- 'Folder containing annotations for training images')
-
-tf.app.flags.DEFINE_string(
- 'val_image_folder',
- './ADE20K/ADEChallengeData2016/images/validation',
- 'Folder containing validation images')
-
-tf.app.flags.DEFINE_string(
- 'val_image_label_folder',
- './ADE20K/ADEChallengeData2016/annotations/validation',
- 'Folder containing annotations for validation')
-
-tf.app.flags.DEFINE_string(
- 'output_dir', './ADE20K/tfrecord',
- 'Path to save converted TFRecord of TensorFlow examples')
-
-_NUM_SHARDS = 4
-
-
-def _convert_dataset(dataset_split, dataset_dir, dataset_label_dir):
- """Converts the ADE20k dataset into into tfrecord format.
-
- Args:
- dataset_split: Dataset split (e.g., train, val).
- dataset_dir: Directory in which the dataset is located.
- dataset_label_dir: Directory in which the annotations are located.
-
- Raises:
- RuntimeError: If loaded image and label have different shape.
- """
-
- img_names = tf.gfile.Glob(os.path.join(dataset_dir, '*.jpg'))
- random.shuffle(img_names)
- seg_names = []
- for f in img_names:
- # get the filename without the extension
- basename = os.path.basename(f).split('.')[0]
- # Find the corresponding annotation file (same basename, .png extension).
- seg = os.path.join(dataset_label_dir, basename+'.png')
- seg_names.append(seg)
-
- num_images = len(img_names)
- num_per_shard = int(math.ceil(num_images / _NUM_SHARDS))
-
- image_reader = build_data.ImageReader('jpeg', channels=3)
- label_reader = build_data.ImageReader('png', channels=1)
-
- for shard_id in range(_NUM_SHARDS):
- output_filename = os.path.join(
- FLAGS.output_dir,
- '%s-%05d-of-%05d.tfrecord' % (dataset_split, shard_id, _NUM_SHARDS))
- with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
- start_idx = shard_id * num_per_shard
- end_idx = min((shard_id + 1) * num_per_shard, num_images)
- for i in range(start_idx, end_idx):
- sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
- i + 1, num_images, shard_id))
- sys.stdout.flush()
- # Read the image.
- image_filename = img_names[i]
- image_data = tf.gfile.FastGFile(image_filename, 'rb').read()
- height, width = image_reader.read_image_dims(image_data)
- # Read the semantic segmentation annotation.
- seg_filename = seg_names[i]
- seg_data = tf.gfile.FastGFile(seg_filename, 'rb').read()
- seg_height, seg_width = label_reader.read_image_dims(seg_data)
- if height != seg_height or width != seg_width:
- raise RuntimeError('Shape mismatched between image and label.')
- # Convert to tf example.
- example = build_data.image_seg_to_tfexample(
- image_data, img_names[i], height, width, seg_data)
- tfrecord_writer.write(example.SerializeToString())
- sys.stdout.write('\n')
- sys.stdout.flush()
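-
-# Sharding sketch (illustrative numbers): with 1,000 images and _NUM_SHARDS = 4,
-# num_per_shard = ceil(1000 / 4) = 250, so shard 0 covers indices [0, 250) and the
-# end index of the last shard is capped by min((shard_id + 1) * num_per_shard,
-# num_images).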
-
-
-def main(unused_argv):
- tf.gfile.MakeDirs(FLAGS.output_dir)
- _convert_dataset(
- 'train', FLAGS.train_image_folder, FLAGS.train_image_label_folder)
- _convert_dataset('val', FLAGS.val_image_folder, FLAGS.val_image_label_folder)
-
-
-if __name__ == '__main__':
- tf.app.run()
diff --git a/research/deeplab/datasets/build_cityscapes_data.py b/research/deeplab/datasets/build_cityscapes_data.py
deleted file mode 100644
index 53c11e30310..00000000000
--- a/research/deeplab/datasets/build_cityscapes_data.py
+++ /dev/null
@@ -1,198 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Converts Cityscapes data to TFRecord file format with Example protos.
-
-The Cityscapes dataset is expected to have the following directory structure:
-
- + cityscapes
- - build_cityscapes_data.py (current working directory).
- - build_data.py
- + cityscapesscripts
- + annotation
- + evaluation
- + helpers
- + preparation
- + viewer
- + gtFine
- + train
- + val
- + test
- + leftImg8bit
- + train
- + val
- + test
- + tfrecord
-
-This script converts data into sharded TFRecord files and saves them in the tfrecord folder.
-
-Note that before running this script, users should (1) register at the
-Cityscapes dataset website, https://www.cityscapes-dataset.com, to
-download the dataset, and (2) run the script provided by Cityscapes
-`preparation/createTrainIdLabelImgs.py` to generate the training groundtruth.
-
-Also note that the TensorFlow model will be trained with `TrainId` instead
-of the `EvalId` used on the evaluation server. Thus, users need to convert
-the predicted labels to `EvalId` for evaluation on the server. See
-vis.py for more details.
-
-The Example proto contains the following fields:
-
- image/encoded: encoded image content.
- image/filename: image filename.
- image/format: image file format.
- image/height: image height.
- image/width: image width.
- image/channels: image channels.
- image/segmentation/class/encoded: encoded semantic segmentation content.
- image/segmentation/class/format: semantic segmentation file format.
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import glob
-import math
-import os.path
-import re
-import sys
-import build_data
-from six.moves import range
-import tensorflow as tf
-
-FLAGS = tf.app.flags.FLAGS
-
-tf.app.flags.DEFINE_string('cityscapes_root',
- './cityscapes',
- 'Cityscapes dataset root folder.')
-
-tf.app.flags.DEFINE_string(
- 'output_dir',
- './tfrecord',
- 'Path to save converted TFRecord of TensorFlow examples.')
-
-
-_NUM_SHARDS = 10
-
-# A map from data type to folder name that saves the data.
-_FOLDERS_MAP = {
- 'image': 'leftImg8bit',
- 'label': 'gtFine',
-}
-
-# A map from data type to filename postfix.
-_POSTFIX_MAP = {
- 'image': '_leftImg8bit',
- 'label': '_gtFine_labelTrainIds',
-}
-
-# A map from data type to data format.
-_DATA_FORMAT_MAP = {
- 'image': 'png',
- 'label': 'png',
-}
-
-# Image file pattern.
-_IMAGE_FILENAME_RE = re.compile('(.+)' + _POSTFIX_MAP['image'])
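-# For example, for a path ending in 'aachen_000000_000019_leftImg8bit.png', this
-# pattern captures everything before '_leftImg8bit'; os.path.basename() of that
-# capture ('aachen_000000_000019') later serves as the record's filename field.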
-
-
-def _get_files(data, dataset_split):
- """Gets files for the specified data type and dataset split.
-
- Args:
- data: String, desired data ('image' or 'label').
- dataset_split: String, dataset split ('train_fine', 'val_fine', 'test_fine')
-
- Returns:
- A list of sorted file names or None when getting label for
- test set.
- """
- if dataset_split == 'train_fine':
- split_dir = 'train'
- elif dataset_split == 'val_fine':
- split_dir = 'val'
- elif dataset_split == 'test_fine':
- split_dir = 'test'
- else:
- raise RuntimeError("Split {} is not supported".format(dataset_split))
- pattern = '*%s.%s' % (_POSTFIX_MAP[data], _DATA_FORMAT_MAP[data])
- search_files = os.path.join(
- FLAGS.cityscapes_root, _FOLDERS_MAP[data], split_dir, '*', pattern)
- filenames = glob.glob(search_files)
- return sorted(filenames)
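-
-# Illustrative expansion (assuming the default --cityscapes_root): for
-# _get_files('label', 'train_fine') the glob pattern becomes
-# './cityscapes/gtFine/train/*/*_gtFine_labelTrainIds.png', i.e. one wildcard for
-# the per-city sub-folders and one for the per-image label files.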
-
-
-def _convert_dataset(dataset_split):
- """Converts the specified dataset split to TFRecord format.
-
- Args:
- dataset_split: The dataset split (e.g., train_fine, val_fine).
-
- Raises:
- RuntimeError: If loaded image and label have different shape, or if the
- image file with specified postfix could not be found.
- """
- image_files = _get_files('image', dataset_split)
- label_files = _get_files('label', dataset_split)
-
- num_images = len(image_files)
- num_labels = len(label_files)
- num_per_shard = int(math.ceil(num_images / _NUM_SHARDS))
-
- if num_images != num_labels:
- raise RuntimeError("The number of images and labels doesn't match: {} {}".format(num_images, num_labels))
-
- image_reader = build_data.ImageReader('png', channels=3)
- label_reader = build_data.ImageReader('png', channels=1)
-
- for shard_id in range(_NUM_SHARDS):
- shard_filename = '%s-%05d-of-%05d.tfrecord' % (
- dataset_split, shard_id, _NUM_SHARDS)
- output_filename = os.path.join(FLAGS.output_dir, shard_filename)
- with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
- start_idx = shard_id * num_per_shard
- end_idx = min((shard_id + 1) * num_per_shard, num_images)
- for i in range(start_idx, end_idx):
- sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
- i + 1, num_images, shard_id))
- sys.stdout.flush()
- # Read the image.
- image_data = tf.gfile.FastGFile(image_files[i], 'rb').read()
- height, width = image_reader.read_image_dims(image_data)
- # Read the semantic segmentation annotation.
- seg_data = tf.gfile.FastGFile(label_files[i], 'rb').read()
- seg_height, seg_width = label_reader.read_image_dims(seg_data)
- if height != seg_height or width != seg_width:
- raise RuntimeError('Shape mismatched between image and label.')
- # Convert to tf example.
- re_match = _IMAGE_FILENAME_RE.search(image_files[i])
- if re_match is None:
- raise RuntimeError('Invalid image filename: ' + image_files[i])
- filename = os.path.basename(re_match.group(1))
- example = build_data.image_seg_to_tfexample(
- image_data, filename, height, width, seg_data)
- tfrecord_writer.write(example.SerializeToString())
- sys.stdout.write('\n')
- sys.stdout.flush()
-
-
-def main(unused_argv):
- # Only support converting 'train_fine', 'val_fine' and 'test_fine' sets for now.
- for dataset_split in ['train_fine', 'val_fine', 'test_fine']:
- _convert_dataset(dataset_split)
-
-
-if __name__ == '__main__':
- tf.app.run()
diff --git a/research/deeplab/datasets/build_data.py b/research/deeplab/datasets/build_data.py
deleted file mode 100644
index 45628674dbf..00000000000
--- a/research/deeplab/datasets/build_data.py
+++ /dev/null
@@ -1,161 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Contains common utility functions and classes for building dataset.
-
-This script contains utility functions and classes to convert datasets to
-TFRecord file format with Example protos.
-
-The Example proto contains the following fields:
-
- image/encoded: encoded image content.
- image/filename: image filename.
- image/format: image file format.
- image/height: image height.
- image/width: image width.
- image/channels: image channels.
- image/segmentation/class/encoded: encoded semantic segmentation content.
- image/segmentation/class/format: semantic segmentation file format.
-"""
-import collections
-import six
-import tensorflow as tf
-
-FLAGS = tf.app.flags.FLAGS
-
-tf.app.flags.DEFINE_enum('image_format', 'png', ['jpg', 'jpeg', 'png'],
- 'Image format.')
-
-tf.app.flags.DEFINE_enum('label_format', 'png', ['png'],
- 'Segmentation label format.')
-
-# A map from image format to expected data format.
-_IMAGE_FORMAT_MAP = {
- 'jpg': 'jpeg',
- 'jpeg': 'jpeg',
- 'png': 'png',
-}
-
-
-class ImageReader(object):
- """Helper class that provides TensorFlow image coding utilities."""
-
- def __init__(self, image_format='jpeg', channels=3):
- """Class constructor.
-
- Args:
- image_format: Image format. Only 'jpeg', 'jpg', or 'png' are supported.
- channels: Image channels.
- """
- with tf.Graph().as_default():
- self._decode_data = tf.placeholder(dtype=tf.string)
- self._image_format = image_format
- self._session = tf.Session()
- if self._image_format in ('jpeg', 'jpg'):
- self._decode = tf.image.decode_jpeg(self._decode_data,
- channels=channels)
- elif self._image_format == 'png':
- self._decode = tf.image.decode_png(self._decode_data,
- channels=channels)
-
- def read_image_dims(self, image_data):
- """Reads the image dimensions.
-
- Args:
- image_data: string of image data.
-
- Returns:
- image_height and image_width.
- """
- image = self.decode_image(image_data)
- return image.shape[:2]
-
- def decode_image(self, image_data):
- """Decodes the image data string.
-
- Args:
- image_data: string of image data.
-
- Returns:
- Decoded image data.
-
- Raises:
- ValueError: Value of image channels not supported.
- """
- image = self._session.run(self._decode,
- feed_dict={self._decode_data: image_data})
- if len(image.shape) != 3 or image.shape[2] not in (1, 3):
-      raise ValueError('The number of image channels is not supported.')
-
- return image
-
-
-def _int64_list_feature(values):
- """Returns a TF-Feature of int64_list.
-
- Args:
- values: A scalar or list of values.
-
- Returns:
- A TF-Feature.
- """
- if not isinstance(values, collections.Iterable):
- values = [values]
-
- return tf.train.Feature(int64_list=tf.train.Int64List(value=values))
-
-
-def _bytes_list_feature(values):
- """Returns a TF-Feature of bytes.
-
- Args:
- values: A string.
-
- Returns:
- A TF-Feature.
- """
- def norm2bytes(value):
- return value.encode() if isinstance(value, str) and six.PY3 else value
-
- return tf.train.Feature(
- bytes_list=tf.train.BytesList(value=[norm2bytes(values)]))
-
-
-def image_seg_to_tfexample(image_data, filename, height, width, seg_data):
- """Converts one image/segmentation pair to tf example.
-
- Args:
- image_data: string of image data.
- filename: image filename.
- height: image height.
- width: image width.
- seg_data: string of semantic segmentation data.
-
- Returns:
- tf example of one image/segmentation pair.
- """
- return tf.train.Example(features=tf.train.Features(feature={
- 'image/encoded': _bytes_list_feature(image_data),
- 'image/filename': _bytes_list_feature(filename),
- 'image/format': _bytes_list_feature(
- _IMAGE_FORMAT_MAP[FLAGS.image_format]),
- 'image/height': _int64_list_feature(height),
- 'image/width': _int64_list_feature(width),
- 'image/channels': _int64_list_feature(3),
- 'image/segmentation/class/encoded': (
- _bytes_list_feature(seg_data)),
- 'image/segmentation/class/format': _bytes_list_feature(
- FLAGS.label_format),
- }))
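
To sanity-check the output of these helpers, one can read a shard back and inspect the fields listed in the docstring above. This is a sketch under the assumption of a TF 1.x environment and a hypothetical shard filename; it is not code from the repository:

```python
import tensorflow as tf


def parse_record(serialized_example):
  """Recovers the encoded image/label strings and the stored dimensions."""
  example = tf.train.Example.FromString(serialized_example)
  feature = example.features.feature
  image_bytes = feature['image/encoded'].bytes_list.value[0]
  label_bytes = feature['image/segmentation/class/encoded'].bytes_list.value[0]
  height = feature['image/height'].int64_list.value[0]
  width = feature['image/width'].int64_list.value[0]
  return image_bytes, label_bytes, height, width


# Hypothetical shard name; iterate records with the TF 1.x API used above.
for record in tf.python_io.tf_record_iterator('val-00000-of-00004.tfrecord'):
  image_bytes, label_bytes, height, width = parse_record(record)
```
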
diff --git a/research/deeplab/datasets/build_voc2012_data.py b/research/deeplab/datasets/build_voc2012_data.py
deleted file mode 100644
index f0bdecb6a0f..00000000000
--- a/research/deeplab/datasets/build_voc2012_data.py
+++ /dev/null
@@ -1,146 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Converts PASCAL VOC 2012 data to TFRecord file format with Example protos.
-
-PASCAL VOC 2012 dataset is expected to have the following directory structure:
-
- + pascal_voc_seg
- - build_data.py
- - build_voc2012_data.py (current working directory).
- + VOCdevkit
- + VOC2012
- + JPEGImages
- + SegmentationClass
- + ImageSets
- + Segmentation
- + tfrecord
-
-Image folder:
- ./VOCdevkit/VOC2012/JPEGImages
-
-Semantic segmentation annotations:
- ./VOCdevkit/VOC2012/SegmentationClass
-
-List folder:
- ./VOCdevkit/VOC2012/ImageSets/Segmentation
-
-This script converts data into sharded data files saved in the tfrecord folder.
-
-The Example proto contains the following fields:
-
- image/encoded: encoded image content.
- image/filename: image filename.
- image/format: image file format.
- image/height: image height.
- image/width: image width.
- image/channels: image channels.
- image/segmentation/class/encoded: encoded semantic segmentation content.
- image/segmentation/class/format: semantic segmentation file format.
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import math
-import os.path
-import sys
-import build_data
-from six.moves import range
-import tensorflow as tf
-
-FLAGS = tf.app.flags.FLAGS
-
-tf.app.flags.DEFINE_string('image_folder',
- './VOCdevkit/VOC2012/JPEGImages',
- 'Folder containing images.')
-
-tf.app.flags.DEFINE_string(
- 'semantic_segmentation_folder',
- './VOCdevkit/VOC2012/SegmentationClassRaw',
- 'Folder containing semantic segmentation annotations.')
-
-tf.app.flags.DEFINE_string(
- 'list_folder',
- './VOCdevkit/VOC2012/ImageSets/Segmentation',
- 'Folder containing lists for training and validation')
-
-tf.app.flags.DEFINE_string(
- 'output_dir',
- './tfrecord',
- 'Path to save converted SSTable of TensorFlow examples.')
-
-
-_NUM_SHARDS = 4
-
-
-def _convert_dataset(dataset_split):
- """Converts the specified dataset split to TFRecord format.
-
- Args:
- dataset_split: The dataset split (e.g., train, test).
-
- Raises:
- RuntimeError: If loaded image and label have different shape.
- """
- dataset = os.path.basename(dataset_split)[:-4]
- sys.stdout.write('Processing ' + dataset)
- filenames = [x.strip('\n') for x in open(dataset_split, 'r')]
- num_images = len(filenames)
- num_per_shard = int(math.ceil(num_images / _NUM_SHARDS))
-
- image_reader = build_data.ImageReader('jpeg', channels=3)
- label_reader = build_data.ImageReader('png', channels=1)
-
- for shard_id in range(_NUM_SHARDS):
- output_filename = os.path.join(
- FLAGS.output_dir,
- '%s-%05d-of-%05d.tfrecord' % (dataset, shard_id, _NUM_SHARDS))
- with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
- start_idx = shard_id * num_per_shard
- end_idx = min((shard_id + 1) * num_per_shard, num_images)
- for i in range(start_idx, end_idx):
- sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
- i + 1, len(filenames), shard_id))
- sys.stdout.flush()
- # Read the image.
- image_filename = os.path.join(
- FLAGS.image_folder, filenames[i] + '.' + FLAGS.image_format)
- image_data = tf.gfile.GFile(image_filename, 'rb').read()
- height, width = image_reader.read_image_dims(image_data)
- # Read the semantic segmentation annotation.
- seg_filename = os.path.join(
- FLAGS.semantic_segmentation_folder,
- filenames[i] + '.' + FLAGS.label_format)
- seg_data = tf.gfile.GFile(seg_filename, 'rb').read()
- seg_height, seg_width = label_reader.read_image_dims(seg_data)
- if height != seg_height or width != seg_width:
- raise RuntimeError('Shape mismatched between image and label.')
- # Convert to tf example.
- example = build_data.image_seg_to_tfexample(
- image_data, filenames[i], height, width, seg_data)
- tfrecord_writer.write(example.SerializeToString())
- sys.stdout.write('\n')
- sys.stdout.flush()
-
-
-def main(unused_argv):
- dataset_splits = tf.gfile.Glob(os.path.join(FLAGS.list_folder, '*.txt'))
- for dataset_split in dataset_splits:
- _convert_dataset(dataset_split)
-
-
-if __name__ == '__main__':
- tf.app.run()
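
For orientation, each line of an `ImageSets/Segmentation` list file names one image/label pair that `_convert_dataset` above reads. A small illustrative sketch of that mapping (not from the repository; it assumes the default folders and `--image_format=jpg`):

```python
import os

# Paths below mirror the default flags of build_voc2012_data.py.
image_folder = './VOCdevkit/VOC2012/JPEGImages'
seg_folder = './VOCdevkit/VOC2012/SegmentationClassRaw'
list_file = './VOCdevkit/VOC2012/ImageSets/Segmentation/val.txt'

with open(list_file) as f:
  names = [line.strip() for line in f]

# Each list entry becomes one (JPEG image, raw PNG label) pair.
pairs = [(os.path.join(image_folder, name + '.jpg'),
          os.path.join(seg_folder, name + '.png'))
         for name in names]
print(len(pairs), pairs[0])
```
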
diff --git a/research/deeplab/datasets/convert_cityscapes.sh b/research/deeplab/datasets/convert_cityscapes.sh
deleted file mode 100644
index ddc39fb11dd..00000000000
--- a/research/deeplab/datasets/convert_cityscapes.sh
+++ /dev/null
@@ -1,60 +0,0 @@
-#!/bin/bash
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-#
-# Script to preprocess the Cityscapes dataset. Note that (1) users should
-# register on the Cityscapes dataset website at
-# https://www.cityscapes-dataset.com/downloads/ to download the dataset,
-# and (2) users should download the utility scripts provided by
-# Cityscapes at https://github.com/mcordts/cityscapesScripts.
-#
-# Usage:
-# bash ./convert_cityscapes.sh
-#
-# The folder structure is assumed to be:
-# + datasets
-# - build_cityscapes_data.py
-# - convert_cityscapes.sh
-# + cityscapes
-# + cityscapesscripts (downloaded scripts)
-# + gtFine
-# + leftImg8bit
-#
-
-# Exit immediately if a command exits with a non-zero status.
-set -e
-
-CURRENT_DIR=$(pwd)
-WORK_DIR="."
-
-# Root path for Cityscapes dataset.
-CITYSCAPES_ROOT="${WORK_DIR}/cityscapes"
-
-export PYTHONPATH="${CITYSCAPES_ROOT}:${PYTHONPATH}"
-
-# Create training labels.
-python "${CITYSCAPES_ROOT}/cityscapesscripts/preparation/createTrainIdLabelImgs.py"
-
-# Build TFRecords of the dataset.
-# First, create output directory for storing TFRecords.
-OUTPUT_DIR="${CITYSCAPES_ROOT}/tfrecord"
-mkdir -p "${OUTPUT_DIR}"
-
-BUILD_SCRIPT="${CURRENT_DIR}/build_cityscapes_data.py"
-
-echo "Converting Cityscapes dataset..."
-python "${BUILD_SCRIPT}" \
- --cityscapes_root="${CITYSCAPES_ROOT}" \
- --output_dir="${OUTPUT_DIR}" \
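
Before running the script, it can help to confirm the layout it assumes. A small optional check (an illustration only, not part of the repository):

```python
import os

cityscapes_root = './cityscapes'
for subdir in ('cityscapesscripts', 'gtFine', 'leftImg8bit'):
  path = os.path.join(cityscapes_root, subdir)
  # Each of these folders must exist before convert_cityscapes.sh is run.
  print('%-20s %s' % (subdir, 'found' if os.path.isdir(path) else 'MISSING'))
```
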
diff --git a/research/deeplab/datasets/data_generator.py b/research/deeplab/datasets/data_generator.py
deleted file mode 100644
index d84e66f9c48..00000000000
--- a/research/deeplab/datasets/data_generator.py
+++ /dev/null
@@ -1,350 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Wrapper for providing semantic segmentaion data.
-
-The SegmentationDataset class provides both images and annotations (semantic
-segmentation and/or instance segmentation) for TensorFlow. Currently, we
-support the following datasets:
-
-1. PASCAL VOC 2012 (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/).
-
-PASCAL VOC 2012 semantic segmentation dataset annotates 20 foreground objects
-(e.g., bike, person, and so on) and leaves all the other semantic classes as
-one background class. The dataset contains 1464, 1449, and 1456 annotated
-images for the training, validation, and test sets, respectively.
-
-2. Cityscapes dataset (https://www.cityscapes-dataset.com)
-
-The Cityscapes dataset contains 19 semantic labels (such as road, person, car,
-and so on) for urban street scenes.
-
-3. ADE20K dataset (http://groups.csail.mit.edu/vision/datasets/ADE20K)
-
-The ADE20K dataset contains 150 semantic labels for both urban street scenes
-and indoor scenes.
-
-References:
- M. Everingham, S. M. A. Eslami, L. V. Gool, C. K. I. Williams, J. Winn,
- and A. Zisserman, The pascal visual object classes challenge a retrospective.
- IJCV, 2014.
-
- M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson,
- U. Franke, S. Roth, and B. Schiele, "The cityscapes dataset for semantic urban
- scene understanding," In Proc. of CVPR, 2016.
-
- B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, A. Torralba, "Scene Parsing
- through ADE20K dataset", In Proc. of CVPR, 2017.
-"""
-
-import collections
-import os
-import tensorflow as tf
-from deeplab import common
-from deeplab import input_preprocess
-
-# Named tuple to describe the dataset properties.
-DatasetDescriptor = collections.namedtuple(
- 'DatasetDescriptor',
- [
- 'splits_to_sizes', # Splits of the dataset into training, val and test.
- 'num_classes', # Number of semantic classes, including the
- # background class (if exists). For example, there
- # are 20 foreground classes + 1 background class in
- # the PASCAL VOC 2012 dataset. Thus, we set
- # num_classes=21.
- 'ignore_label', # Ignore label value.
- ])
-
-_CITYSCAPES_INFORMATION = DatasetDescriptor(
- splits_to_sizes={'train_fine': 2975,
- 'train_coarse': 22973,
- 'trainval_fine': 3475,
- 'trainval_coarse': 23473,
- 'val_fine': 500,
- 'test_fine': 1525},
- num_classes=19,
- ignore_label=255,
-)
-
-_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
- splits_to_sizes={
- 'train': 1464,
- 'train_aug': 10582,
- 'trainval': 2913,
- 'val': 1449,
- },
- num_classes=21,
- ignore_label=255,
-)
-
-_ADE20K_INFORMATION = DatasetDescriptor(
- splits_to_sizes={
- 'train': 20210, # num of samples in images/training
- 'val': 2000, # num of samples in images/validation
- },
- num_classes=151,
- ignore_label=0,
-)
-
-_DATASETS_INFORMATION = {
- 'cityscapes': _CITYSCAPES_INFORMATION,
- 'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
- 'ade20k': _ADE20K_INFORMATION,
-}
-
-# Default file pattern of TFRecord of TensorFlow Example.
-_FILE_PATTERN = '%s-*'
-
-
-def get_cityscapes_dataset_name():
- return 'cityscapes'
-
-
-class Dataset(object):
- """Represents input dataset for deeplab model."""
-
- def __init__(self,
- dataset_name,
- split_name,
- dataset_dir,
- batch_size,
- crop_size,
- min_resize_value=None,
- max_resize_value=None,
- resize_factor=None,
- min_scale_factor=1.,
- max_scale_factor=1.,
- scale_factor_step_size=0,
- model_variant=None,
- num_readers=1,
- is_training=False,
- should_shuffle=False,
- should_repeat=False):
- """Initializes the dataset.
-
- Args:
- dataset_name: Dataset name.
- split_name: A train/val Split name.
- dataset_dir: The directory of the dataset sources.
- batch_size: Batch size.
- crop_size: The size used to crop the image and label.
- min_resize_value: Desired size of the smaller image side.
- max_resize_value: Maximum allowed size of the larger image side.
-      resize_factor: Resized dimensions are a multiple of factor plus one.
- min_scale_factor: Minimum scale factor value.
- max_scale_factor: Maximum scale factor value.
- scale_factor_step_size: The step size from min scale factor to max scale
- factor. The input is randomly scaled based on the value of
- (min_scale_factor, max_scale_factor, scale_factor_step_size).
- model_variant: Model variant (string) for choosing how to mean-subtract
- the images. See feature_extractor.network_map for supported model
- variants.
- num_readers: Number of readers for data provider.
- is_training: Boolean, if dataset is for training or not.
- should_shuffle: Boolean, if should shuffle the input data.
- should_repeat: Boolean, if should repeat the input data.
-
- Raises:
- ValueError: Dataset name and split name are not supported.
- """
- if dataset_name not in _DATASETS_INFORMATION:
- raise ValueError('The specified dataset is not supported yet.')
- self.dataset_name = dataset_name
-
- splits_to_sizes = _DATASETS_INFORMATION[dataset_name].splits_to_sizes
-
- if split_name not in splits_to_sizes:
- raise ValueError('data split name %s not recognized' % split_name)
-
- if model_variant is None:
- tf.logging.warning('Please specify a model_variant. See '
- 'feature_extractor.network_map for supported model '
- 'variants.')
-
- self.split_name = split_name
- self.dataset_dir = dataset_dir
- self.batch_size = batch_size
- self.crop_size = crop_size
- self.min_resize_value = min_resize_value
- self.max_resize_value = max_resize_value
- self.resize_factor = resize_factor
- self.min_scale_factor = min_scale_factor
- self.max_scale_factor = max_scale_factor
- self.scale_factor_step_size = scale_factor_step_size
- self.model_variant = model_variant
- self.num_readers = num_readers
- self.is_training = is_training
- self.should_shuffle = should_shuffle
- self.should_repeat = should_repeat
-
- self.num_of_classes = _DATASETS_INFORMATION[self.dataset_name].num_classes
- self.ignore_label = _DATASETS_INFORMATION[self.dataset_name].ignore_label
-
- def _parse_function(self, example_proto):
- """Function to parse the example proto.
-
- Args:
- example_proto: Proto in the format of tf.Example.
-
- Returns:
- A dictionary with parsed image, label, height, width and image name.
-
- Raises:
- ValueError: Label is of wrong shape.
- """
-
- # Currently only supports jpeg and png.
- # Need to use this logic because the shape is not known for
- # tf.image.decode_image and we rely on this info to
- # extend label if necessary.
- def _decode_image(content, channels):
- return tf.cond(
- tf.image.is_jpeg(content),
- lambda: tf.image.decode_jpeg(content, channels),
- lambda: tf.image.decode_png(content, channels))
-
- features = {
- 'image/encoded':
- tf.FixedLenFeature((), tf.string, default_value=''),
- 'image/filename':
- tf.FixedLenFeature((), tf.string, default_value=''),
- 'image/format':
- tf.FixedLenFeature((), tf.string, default_value='jpeg'),
- 'image/height':
- tf.FixedLenFeature((), tf.int64, default_value=0),
- 'image/width':
- tf.FixedLenFeature((), tf.int64, default_value=0),
- 'image/segmentation/class/encoded':
- tf.FixedLenFeature((), tf.string, default_value=''),
- 'image/segmentation/class/format':
- tf.FixedLenFeature((), tf.string, default_value='png'),
- }
-
- parsed_features = tf.parse_single_example(example_proto, features)
-
- image = _decode_image(parsed_features['image/encoded'], channels=3)
-
- label = None
- if self.split_name != common.TEST_SET:
- label = _decode_image(
- parsed_features['image/segmentation/class/encoded'], channels=1)
-
- image_name = parsed_features['image/filename']
- if image_name is None:
- image_name = tf.constant('')
-
- sample = {
- common.IMAGE: image,
- common.IMAGE_NAME: image_name,
- common.HEIGHT: parsed_features['image/height'],
- common.WIDTH: parsed_features['image/width'],
- }
-
- if label is not None:
- if label.get_shape().ndims == 2:
- label = tf.expand_dims(label, 2)
- elif label.get_shape().ndims == 3 and label.shape.dims[2] == 1:
- pass
- else:
- raise ValueError('Input label shape must be [height, width], or '
- '[height, width, 1].')
-
- label.set_shape([None, None, 1])
-
- sample[common.LABELS_CLASS] = label
-
- return sample
-
- def _preprocess_image(self, sample):
- """Preprocesses the image and label.
-
- Args:
- sample: A sample containing image and label.
-
- Returns:
- sample: Sample with preprocessed image and label.
-
- Raises:
- ValueError: Ground truth label not provided during training.
- """
- image = sample[common.IMAGE]
- label = sample[common.LABELS_CLASS]
-
- original_image, image, label = input_preprocess.preprocess_image_and_label(
- image=image,
- label=label,
- crop_height=self.crop_size[0],
- crop_width=self.crop_size[1],
- min_resize_value=self.min_resize_value,
- max_resize_value=self.max_resize_value,
- resize_factor=self.resize_factor,
- min_scale_factor=self.min_scale_factor,
- max_scale_factor=self.max_scale_factor,
- scale_factor_step_size=self.scale_factor_step_size,
- ignore_label=self.ignore_label,
- is_training=self.is_training,
- model_variant=self.model_variant)
-
- sample[common.IMAGE] = image
-
- if not self.is_training:
- # Original image is only used during visualization.
- sample[common.ORIGINAL_IMAGE] = original_image
-
- if label is not None:
- sample[common.LABEL] = label
-
-    # Remove the common.LABELS_CLASS key from the sample since it is only used
-    # to derive the label and is not used in training and evaluation.
- sample.pop(common.LABELS_CLASS, None)
-
- return sample
-
- def get_one_shot_iterator(self):
- """Gets an iterator that iterates across the dataset once.
-
- Returns:
- An iterator of type tf.data.Iterator.
- """
-
- files = self._get_all_files()
-
- dataset = (
- tf.data.TFRecordDataset(files, num_parallel_reads=self.num_readers)
- .map(self._parse_function, num_parallel_calls=self.num_readers)
- .map(self._preprocess_image, num_parallel_calls=self.num_readers))
-
- if self.should_shuffle:
- dataset = dataset.shuffle(buffer_size=100)
-
- if self.should_repeat:
- dataset = dataset.repeat() # Repeat forever for training.
- else:
- dataset = dataset.repeat(1)
-
- dataset = dataset.batch(self.batch_size).prefetch(self.batch_size)
- return dataset.make_one_shot_iterator()
-
- def _get_all_files(self):
- """Gets all the files to read data from.
-
- Returns:
- A list of input files.
- """
- file_pattern = _FILE_PATTERN
- file_pattern = os.path.join(self.dataset_dir,
- file_pattern % self.split_name)
- return tf.gfile.Glob(file_pattern)
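
A minimal usage sketch for the Dataset class above, mirroring how data_generator_test.py (the next file in this diff) exercises it. It assumes TF 1.x and TFRecords already built under ./pascal_voc_seg/tfrecord; it is not taken from the repository:

```python
import tensorflow as tf

from deeplab import common
from deeplab.datasets import data_generator

dataset = data_generator.Dataset(
    dataset_name='pascal_voc_seg',
    split_name='val',
    dataset_dir='./pascal_voc_seg/tfrecord',  # assumed TFRecord location
    batch_size=1,
    crop_size=[513, 513],
    is_training=False,
    model_variant='mobilenet_v2')

samples = dataset.get_one_shot_iterator().get_next()
with tf.Session() as sess:
  batch = sess.run(samples)
  print(batch[common.IMAGE_NAME], batch[common.HEIGHT], batch[common.WIDTH])
```
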
diff --git a/research/deeplab/datasets/data_generator_test.py b/research/deeplab/datasets/data_generator_test.py
deleted file mode 100644
index f4425d01da0..00000000000
--- a/research/deeplab/datasets/data_generator_test.py
+++ /dev/null
@@ -1,115 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for deeplab.datasets.data_generator."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import collections
-
-from six.moves import range
-import tensorflow as tf
-
-from deeplab import common
-from deeplab.datasets import data_generator
-
-ImageAttributes = collections.namedtuple(
- 'ImageAttributes', ['image', 'label', 'height', 'width', 'image_name'])
-
-
-class DatasetTest(tf.test.TestCase):
-
-  # Note: the training dataset cannot be tested since it involves a shuffle
-  # operation. With shuffling disabled, the training dataset behaves the same
-  # as the validation dataset, so it is not tested separately.
- def testPascalVocSegTestData(self):
- dataset = data_generator.Dataset(
- dataset_name='pascal_voc_seg',
- split_name='val',
- dataset_dir=
- 'deeplab/testing/pascal_voc_seg',
- batch_size=1,
- crop_size=[3, 3], # Use small size for testing.
- min_resize_value=3,
- max_resize_value=3,
- resize_factor=None,
- min_scale_factor=0.01,
- max_scale_factor=2.0,
- scale_factor_step_size=0.25,
- is_training=False,
- model_variant='mobilenet_v2')
-
- self.assertAllEqual(dataset.num_of_classes, 21)
- self.assertAllEqual(dataset.ignore_label, 255)
-
- num_of_images = 3
- with self.test_session() as sess:
- iterator = dataset.get_one_shot_iterator()
-
- for i in range(num_of_images):
- batch = iterator.get_next()
- batch, = sess.run([batch])
- image_attributes = _get_attributes_of_image(i)
- self.assertEqual(batch[common.HEIGHT][0], image_attributes.height)
- self.assertEqual(batch[common.WIDTH][0], image_attributes.width)
- self.assertEqual(batch[common.IMAGE_NAME][0],
- image_attributes.image_name.encode())
-
- # All data have been read.
- with self.assertRaisesRegexp(tf.errors.OutOfRangeError, ''):
- sess.run([iterator.get_next()])
-
-
-def _get_attributes_of_image(index):
- """Gets the attributes of the image.
-
- Args:
- index: Index of image in all images.
-
- Returns:
- Attributes of the image in the format of ImageAttributes.
-
- Raises:
- ValueError: If index is of wrong value.
- """
- if index == 0:
- return ImageAttributes(
- image=None,
- label=None,
- height=366,
- width=500,
- image_name='2007_000033')
- elif index == 1:
- return ImageAttributes(
- image=None,
- label=None,
- height=335,
- width=500,
- image_name='2007_000042')
- elif index == 2:
- return ImageAttributes(
- image=None,
- label=None,
- height=333,
- width=500,
- image_name='2007_000061')
- else:
- raise ValueError('Index can only be 0, 1 or 2.')
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/deeplab/datasets/download_and_convert_ade20k.sh b/research/deeplab/datasets/download_and_convert_ade20k.sh
deleted file mode 100644
index 3614ae42c16..00000000000
--- a/research/deeplab/datasets/download_and_convert_ade20k.sh
+++ /dev/null
@@ -1,80 +0,0 @@
-#!/bin/bash
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-#
-# Script to download and preprocess the ADE20K dataset.
-#
-# Usage:
-# bash ./download_and_convert_ade20k.sh
-#
-# The folder structure is assumed to be:
-# + datasets
-# - build_data.py
-# - build_ade20k_data.py
-# - download_and_convert_ade20k.sh
-# + ADE20K
-# + tfrecord
-# + ADEChallengeData2016
-# + annotations
-# + training
-# + validation
-# + images
-# + training
-# + validation
-
-# Exit immediately if a command exits with a non-zero status.
-set -e
-
-CURRENT_DIR=$(pwd)
-WORK_DIR="./ADE20K"
-mkdir -p "${WORK_DIR}"
-cd "${WORK_DIR}"
-
-# Helper function to download and unpack ADE20K dataset.
-download_and_uncompress() {
- local BASE_URL=${1}
- local FILENAME=${2}
-
- if [ ! -f "${FILENAME}" ]; then
- echo "Downloading ${FILENAME} to ${WORK_DIR}"
- wget -nd -c "${BASE_URL}/${FILENAME}"
- fi
- echo "Uncompressing ${FILENAME}"
- unzip "${FILENAME}"
-}
-
-# Download the images.
-BASE_URL="http://data.csail.mit.edu/places/ADEchallenge"
-FILENAME="ADEChallengeData2016.zip"
-
-download_and_uncompress "${BASE_URL}" "${FILENAME}"
-
-cd "${CURRENT_DIR}"
-
-# Root path for ADE20K dataset.
-ADE20K_ROOT="${WORK_DIR}/ADEChallengeData2016"
-
-# Build TFRecords of the dataset.
-# First, create output directory for storing TFRecords.
-OUTPUT_DIR="${WORK_DIR}/tfrecord"
-mkdir -p "${OUTPUT_DIR}"
-
-echo "Converting ADE20K dataset..."
-python ./build_ade20k_data.py \
- --train_image_folder="${ADE20K_ROOT}/images/training/" \
- --train_image_label_folder="${ADE20K_ROOT}/annotations/training/" \
- --val_image_folder="${ADE20K_ROOT}/images/validation/" \
- --val_image_label_folder="${ADE20K_ROOT}/annotations/validation/" \
- --output_dir="${OUTPUT_DIR}"
diff --git a/research/deeplab/datasets/download_and_convert_voc2012.sh b/research/deeplab/datasets/download_and_convert_voc2012.sh
deleted file mode 100644
index 3126f729dec..00000000000
--- a/research/deeplab/datasets/download_and_convert_voc2012.sh
+++ /dev/null
@@ -1,92 +0,0 @@
-#!/bin/bash
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-#
-# Script to download and preprocess the PASCAL VOC 2012 dataset.
-#
-# Usage:
-# bash ./download_and_convert_voc2012.sh
-#
-# The folder structure is assumed to be:
-# + datasets
-# - build_data.py
-# - build_voc2012_data.py
-# - download_and_convert_voc2012.sh
-# - remove_gt_colormap.py
-# + pascal_voc_seg
-# + VOCdevkit
-# + VOC2012
-# + JPEGImages
-# + SegmentationClass
-#
-
-# Exit immediately if a command exits with a non-zero status.
-set -e
-
-CURRENT_DIR=$(pwd)
-WORK_DIR="./pascal_voc_seg"
-SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
-mkdir -p "${WORK_DIR}"
-cd "${WORK_DIR}"
-
-# Helper function to download and unpack VOC 2012 dataset.
-download_and_uncompress() {
- local BASE_URL=${1}
- local FILENAME=${2}
-
- if [ ! -f "${FILENAME}" ]; then
- echo "Downloading ${FILENAME} to ${WORK_DIR}"
- wget -nd -c "${BASE_URL}/${FILENAME}"
- fi
- echo "Uncompressing ${FILENAME}"
- sudo apt install unzip
- unzip "${FILENAME}"
-}
-
-# Download the images.
-BASE_URL="https://data.deepai.org/"
-FILENAME="PascalVOC2012.zip"
-
-download_and_uncompress "${BASE_URL}" "${FILENAME}"
-
-cd "${CURRENT_DIR}"
-
-# Root path for PASCAL VOC 2012 dataset.
-PASCAL_ROOT="${WORK_DIR}/VOC2012"
-
-# Remove the colormap in the ground truth annotations.
-SEG_FOLDER="${PASCAL_ROOT}/SegmentationClass"
-SEMANTIC_SEG_FOLDER="${PASCAL_ROOT}/SegmentationClassRaw"
-
-echo "Removing the color map in ground truth annotations..."
-python3 "${SCRIPT_DIR}/remove_gt_colormap.py" \
- --original_gt_folder="${SEG_FOLDER}" \
- --output_dir="${SEMANTIC_SEG_FOLDER}"
-
-# Build TFRecords of the dataset.
-# First, create output directory for storing TFRecords.
-OUTPUT_DIR="${WORK_DIR}/tfrecord"
-mkdir -p "${OUTPUT_DIR}"
-
-IMAGE_FOLDER="${PASCAL_ROOT}/JPEGImages"
-LIST_FOLDER="${PASCAL_ROOT}/ImageSets/Segmentation"
-
-echo "Converting PASCAL VOC 2012 dataset..."
-python3 "${SCRIPT_DIR}/build_voc2012_data.py" \
- --image_folder="${IMAGE_FOLDER}" \
- --semantic_segmentation_folder="${SEMANTIC_SEG_FOLDER}" \
- --list_folder="${LIST_FOLDER}" \
- --image_format="jpg" \
- --output_dir="${OUTPUT_DIR}"
diff --git a/research/deeplab/datasets/remove_gt_colormap.py b/research/deeplab/datasets/remove_gt_colormap.py
deleted file mode 100644
index 900570038ed..00000000000
--- a/research/deeplab/datasets/remove_gt_colormap.py
+++ /dev/null
@@ -1,83 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Removes the color map from segmentation annotations.
-
-Removes the color map from the ground truth segmentation annotations and saves
-the results to output_dir.
-"""
-import glob
-import os.path
-import numpy as np
-
-from PIL import Image
-
-import tensorflow as tf
-
-FLAGS = tf.compat.v1.flags.FLAGS
-
-tf.compat.v1.flags.DEFINE_string('original_gt_folder',
- './VOCdevkit/VOC2012/SegmentationClass',
- 'Original ground truth annotations.')
-
-tf.compat.v1.flags.DEFINE_string('segmentation_format', 'png', 'Segmentation format.')
-
-tf.compat.v1.flags.DEFINE_string('output_dir',
- './VOCdevkit/VOC2012/SegmentationClassRaw',
- 'folder to save modified ground truth annotations.')
-
-
-def _remove_colormap(filename):
- """Removes the color map from the annotation.
-
- Args:
- filename: Ground truth annotation filename.
-
- Returns:
- Annotation without color map.
- """
- return np.array(Image.open(filename))
-
-
-def _save_annotation(annotation, filename):
- """Saves the annotation as png file.
-
- Args:
- annotation: Segmentation annotation.
- filename: Output filename.
- """
- pil_image = Image.fromarray(annotation.astype(dtype=np.uint8))
- with tf.io.gfile.GFile(filename, mode='w') as f:
- pil_image.save(f, 'PNG')
-
-
-def main(unused_argv):
-  # Create the output directory if it does not exist.
- if not tf.io.gfile.isdir(FLAGS.output_dir):
- tf.io.gfile.makedirs(FLAGS.output_dir)
-
- annotations = glob.glob(os.path.join(FLAGS.original_gt_folder,
- '*.' + FLAGS.segmentation_format))
- for annotation in annotations:
- raw_annotation = _remove_colormap(annotation)
- filename = os.path.basename(annotation)[:-4]
- _save_annotation(raw_annotation,
- os.path.join(
- FLAGS.output_dir,
- filename + '.' + FLAGS.segmentation_format))
-
-
-if __name__ == '__main__':
- tf.compat.v1.app.run()
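
The reason `_remove_colormap` above is just `np.array(Image.open(...))` is that the PASCAL VOC ground-truth PNGs are palette ('P') mode images, so PIL already exposes the per-pixel class indices rather than RGB triples. A small sketch illustrating this (the filename is hypothetical):

```python
import numpy as np
from PIL import Image

annotation = Image.open('2007_000033.png')  # hypothetical VOC ground-truth file
print(annotation.mode)          # 'P' for palette-mode PNGs
indices = np.array(annotation)  # 2-D array of class indices (255 marks 'ignore')
Image.fromarray(indices.astype(np.uint8)).save('2007_000033_raw.png')
```
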
diff --git a/research/deeplab/deeplab_demo.ipynb b/research/deeplab/deeplab_demo.ipynb
deleted file mode 100644
index 81ccfde1b64..00000000000
--- a/research/deeplab/deeplab_demo.ipynb
+++ /dev/null
@@ -1,369 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "KFPcBuVFw61h"
- },
- "source": [
- "# Overview\n",
- "\n",
-    "This colab demonstrates the steps to use the DeepLab model to perform semantic segmentation on a sample input image. Expected outputs are semantic labels overlaid on the sample image.\n",
- "\n",
- "### About DeepLab\n",
- "The models used in this colab perform semantic segmentation. Semantic segmentation models focus on assigning semantic labels, such as sky, person, or car, to multiple objects and stuff in a single image."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "t3ozFsEEP-u_"
- },
- "source": [
- "# Instructions\n",
- "\u003ch3\u003e\u003ca href=\"https://cloud.google.com/tpu/\"\u003e\u003cimg valign=\"middle\" src=\"https://raw.githubusercontent.com/GoogleCloudPlatform/tensorflow-without-a-phd/master/tensorflow-rl-pong/images/tpu-hexagon.png\" width=\"50\"\u003e\u003c/a\u003e \u0026nbsp;\u0026nbsp;Use a free TPU device\u003c/h3\u003e\n",
- "\n",
- " 1. On the main menu, click Runtime and select **Change runtime type**. Set \"TPU\" as the hardware accelerator.\n",
- " 1. Click Runtime again and select **Runtime \u003e Run All**. You can also run the cells manually with Shift-ENTER."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "7cRiapZ1P3wy"
- },
- "source": [
- "## Import Libraries"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 0,
- "metadata": {
- "cellView": "code",
- "colab": {},
- "colab_type": "code",
- "id": "kAbdmRmvq0Je"
- },
- "outputs": [],
- "source": [
- "import os\n",
- "from io import BytesIO\n",
- "import tarfile\n",
- "import tempfile\n",
- "from six.moves import urllib\n",
- "\n",
- "from matplotlib import gridspec\n",
- "from matplotlib import pyplot as plt\n",
- "import numpy as np\n",
- "from PIL import Image\n",
- "\n",
- "%tensorflow_version 1.x\n",
- "import tensorflow as tf"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "p47cYGGOQE1W"
- },
- "source": [
- "## Import helper methods\n",
- "These methods help us perform the following tasks:\n",
- "* Load the latest version of the pretrained DeepLab model\n",
- "* Load the colormap from the PASCAL VOC dataset\n",
-    "* Add colors to various labels, such as \"pink\" for people, \"green\" for bicycle, and more\n",
- "* Visualize an image, and add an overlay of colors on various regions"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 0,
- "metadata": {
- "cellView": "code",
- "colab": {},
- "colab_type": "code",
- "id": "vN0kU6NJ1Ye5"
- },
- "outputs": [],
- "source": [
- "class DeepLabModel(object):\n",
- " \"\"\"Class to load deeplab model and run inference.\"\"\"\n",
- "\n",
- " INPUT_TENSOR_NAME = 'ImageTensor:0'\n",
- " OUTPUT_TENSOR_NAME = 'SemanticPredictions:0'\n",
- " INPUT_SIZE = 513\n",
- " FROZEN_GRAPH_NAME = 'frozen_inference_graph'\n",
- "\n",
- " def __init__(self, tarball_path):\n",
- " \"\"\"Creates and loads pretrained deeplab model.\"\"\"\n",
- " self.graph = tf.Graph()\n",
- "\n",
- " graph_def = None\n",
- " # Extract frozen graph from tar archive.\n",
- " tar_file = tarfile.open(tarball_path)\n",
- " for tar_info in tar_file.getmembers():\n",
- " if self.FROZEN_GRAPH_NAME in os.path.basename(tar_info.name):\n",
- " file_handle = tar_file.extractfile(tar_info)\n",
- " graph_def = tf.GraphDef.FromString(file_handle.read())\n",
- " break\n",
- "\n",
- " tar_file.close()\n",
- "\n",
- " if graph_def is None:\n",
- " raise RuntimeError('Cannot find inference graph in tar archive.')\n",
- "\n",
- " with self.graph.as_default():\n",
- " tf.import_graph_def(graph_def, name='')\n",
- "\n",
- " self.sess = tf.Session(graph=self.graph)\n",
- "\n",
- " def run(self, image):\n",
- " \"\"\"Runs inference on a single image.\n",
- "\n",
- " Args:\n",
- " image: A PIL.Image object, raw input image.\n",
- "\n",
- " Returns:\n",
- " resized_image: RGB image resized from original input image.\n",
- " seg_map: Segmentation map of `resized_image`.\n",
- " \"\"\"\n",
- " width, height = image.size\n",
- " resize_ratio = 1.0 * self.INPUT_SIZE / max(width, height)\n",
- " target_size = (int(resize_ratio * width), int(resize_ratio * height))\n",
- " resized_image = image.convert('RGB').resize(target_size, Image.ANTIALIAS)\n",
- " batch_seg_map = self.sess.run(\n",
- " self.OUTPUT_TENSOR_NAME,\n",
- " feed_dict={self.INPUT_TENSOR_NAME: [np.asarray(resized_image)]})\n",
- " seg_map = batch_seg_map[0]\n",
- " return resized_image, seg_map\n",
- "\n",
- "\n",
- "def create_pascal_label_colormap():\n",
- " \"\"\"Creates a label colormap used in PASCAL VOC segmentation benchmark.\n",
- "\n",
- " Returns:\n",
- " A Colormap for visualizing segmentation results.\n",
- " \"\"\"\n",
- " colormap = np.zeros((256, 3), dtype=int)\n",
- " ind = np.arange(256, dtype=int)\n",
- "\n",
- " for shift in reversed(range(8)):\n",
- " for channel in range(3):\n",
- " colormap[:, channel] |= ((ind \u003e\u003e channel) \u0026 1) \u003c\u003c shift\n",
- " ind \u003e\u003e= 3\n",
- "\n",
- " return colormap\n",
- "\n",
- "\n",
- "def label_to_color_image(label):\n",
- " \"\"\"Adds color defined by the dataset colormap to the label.\n",
- "\n",
- " Args:\n",
- " label: A 2D array with integer type, storing the segmentation label.\n",
- "\n",
- " Returns:\n",
- " result: A 2D array with floating type. The element of the array\n",
- " is the color indexed by the corresponding element in the input label\n",
- " to the PASCAL color map.\n",
- "\n",
- " Raises:\n",
- " ValueError: If label is not of rank 2 or its value is larger than color\n",
- " map maximum entry.\n",
- " \"\"\"\n",
- " if label.ndim != 2:\n",
- " raise ValueError('Expect 2-D input label')\n",
- "\n",
- " colormap = create_pascal_label_colormap()\n",
- "\n",
- " if np.max(label) \u003e= len(colormap):\n",
- " raise ValueError('label value too large.')\n",
- "\n",
- " return colormap[label]\n",
- "\n",
- "\n",
- "def vis_segmentation(image, seg_map):\n",
- " \"\"\"Visualizes input image, segmentation map and overlay view.\"\"\"\n",
- " plt.figure(figsize=(15, 5))\n",
- " grid_spec = gridspec.GridSpec(1, 4, width_ratios=[6, 6, 6, 1])\n",
- "\n",
- " plt.subplot(grid_spec[0])\n",
- " plt.imshow(image)\n",
- " plt.axis('off')\n",
- " plt.title('input image')\n",
- "\n",
- " plt.subplot(grid_spec[1])\n",
- " seg_image = label_to_color_image(seg_map).astype(np.uint8)\n",
- " plt.imshow(seg_image)\n",
- " plt.axis('off')\n",
- " plt.title('segmentation map')\n",
- "\n",
- " plt.subplot(grid_spec[2])\n",
- " plt.imshow(image)\n",
- " plt.imshow(seg_image, alpha=0.7)\n",
- " plt.axis('off')\n",
- " plt.title('segmentation overlay')\n",
- "\n",
- " unique_labels = np.unique(seg_map)\n",
- " ax = plt.subplot(grid_spec[3])\n",
- " plt.imshow(\n",
- " FULL_COLOR_MAP[unique_labels].astype(np.uint8), interpolation='nearest')\n",
- " ax.yaxis.tick_right()\n",
- " plt.yticks(range(len(unique_labels)), LABEL_NAMES[unique_labels])\n",
- " plt.xticks([], [])\n",
- " ax.tick_params(width=0.0)\n",
- " plt.grid('off')\n",
- " plt.show()\n",
- "\n",
- "\n",
- "LABEL_NAMES = np.asarray([\n",
- " 'background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',\n",
- " 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',\n",
- " 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tv'\n",
- "])\n",
- "\n",
- "FULL_LABEL_MAP = np.arange(len(LABEL_NAMES)).reshape(len(LABEL_NAMES), 1)\n",
- "FULL_COLOR_MAP = label_to_color_image(FULL_LABEL_MAP)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "nGcZzNkASG9A"
- },
- "source": [
- "## Select a pretrained model\n",
- "We have trained the DeepLab model using various backbone networks. Select one from the MODEL_NAME list."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 0,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "c4oXKmnjw6i_"
- },
- "outputs": [],
- "source": [
- "MODEL_NAME = 'mobilenetv2_coco_voctrainaug' # @param ['mobilenetv2_coco_voctrainaug', 'mobilenetv2_coco_voctrainval', 'xception_coco_voctrainaug', 'xception_coco_voctrainval']\n",
- "\n",
- "_DOWNLOAD_URL_PREFIX = 'http://download.tensorflow.org/models/'\n",
- "_MODEL_URLS = {\n",
- " 'mobilenetv2_coco_voctrainaug':\n",
- " 'deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz',\n",
- " 'mobilenetv2_coco_voctrainval':\n",
- " 'deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz',\n",
- " 'xception_coco_voctrainaug':\n",
- " 'deeplabv3_pascal_train_aug_2018_01_04.tar.gz',\n",
- " 'xception_coco_voctrainval':\n",
- " 'deeplabv3_pascal_trainval_2018_01_04.tar.gz',\n",
- "}\n",
- "_TARBALL_NAME = 'deeplab_model.tar.gz'\n",
- "\n",
- "model_dir = tempfile.mkdtemp()\n",
- "tf.gfile.MakeDirs(model_dir)\n",
- "\n",
- "download_path = os.path.join(model_dir, _TARBALL_NAME)\n",
- "print('downloading model, this might take a while...')\n",
- "urllib.request.urlretrieve(_DOWNLOAD_URL_PREFIX + _MODEL_URLS[MODEL_NAME],\n",
- " download_path)\n",
- "print('download completed! loading DeepLab model...')\n",
- "\n",
- "MODEL = DeepLabModel(download_path)\n",
- "print('model loaded successfully!')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "SZst78N-4OKO"
- },
- "source": [
- "## Run on sample images\n",
- "\n",
-    "Select one of the sample images (leave `IMAGE_URL` empty) or provide any internet image\n",
-    "URL for inference.\n",
- "\n",
- "Note that this colab uses single scale inference for fast computation,\n",
- "so the results may slightly differ from the visualizations in the\n",
- "[README](https://github.com/tensorflow/models/blob/master/research/deeplab/README.md) file,\n",
- "which uses multi-scale and left-right flipped inputs."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 0,
- "metadata": {
- "cellView": "form",
- "colab": {},
- "colab_type": "code",
- "id": "edGukUHXyymr"
- },
- "outputs": [],
- "source": [
- "\n",
- "SAMPLE_IMAGE = 'image1' # @param ['image1', 'image2', 'image3']\n",
- "IMAGE_URL = '' #@param {type:\"string\"}\n",
- "\n",
- "_SAMPLE_URL = ('https://github.com/tensorflow/models/blob/master/research/'\n",
- " 'deeplab/g3doc/img/%s.jpg?raw=true')\n",
- "\n",
- "\n",
- "def run_visualization(url):\n",
-    "  \"\"\"Runs DeepLab inference on an image and visualizes the result.\"\"\"\n",
- " try:\n",
- " f = urllib.request.urlopen(url)\n",
- " jpeg_str = f.read()\n",
- " original_im = Image.open(BytesIO(jpeg_str))\n",
- " except IOError:\n",
- " print('Cannot retrieve image. Please check url: ' + url)\n",
- " return\n",
- "\n",
- " print('running deeplab on image %s...' % url)\n",
- " resized_im, seg_map = MODEL.run(original_im)\n",
- "\n",
- " vis_segmentation(resized_im, seg_map)\n",
- "\n",
- "\n",
- "image_url = IMAGE_URL or _SAMPLE_URL % SAMPLE_IMAGE\n",
- "run_visualization(image_url)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "aUbVoHScTJYe"
- },
- "source": [
- "## What's next\n",
- "\n",
- "* Learn about [Cloud TPUs](https://cloud.google.com/tpu/docs) that Google designed and optimized specifically to speed up and scale up ML workloads for training and inference and to enable ML engineers and researchers to iterate more quickly.\n",
- "* Explore the range of [Cloud TPU tutorials and Colabs](https://cloud.google.com/tpu/docs/tutorials) to find other examples that can be used when implementing your ML project.\n",
- "* For more information on running the DeepLab model on Cloud TPUs, see the [DeepLab tutorial](https://cloud.google.com/tpu/docs/tutorials/deeplab).\n"
- ]
- }
- ],
- "metadata": {
- "colab": {
- "collapsed_sections": [],
- "name": "DeepLab Demo.ipynb",
- "provenance": [],
- "toc_visible": true,
- "version": "0.3.2"
- },
- "kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
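
Outside of Colab, the notebook's `DeepLabModel` class can be driven directly on a local file once a pretrained tarball has been downloaded. A short sketch (the paths are assumptions, not from the notebook, and the notebook's imports and helper definitions above are assumed to be in scope):

```python
from PIL import Image

model = DeepLabModel('/tmp/deeplab_model.tar.gz')  # previously downloaded tarball
image = Image.open('/tmp/example.jpg')             # any local RGB image
resized_image, seg_map = model.run(image)
vis_segmentation(resized_image, seg_map)
```
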
diff --git a/research/deeplab/deprecated/__init__.py b/research/deeplab/deprecated/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/deeplab/deprecated/segmentation_dataset.py b/research/deeplab/deprecated/segmentation_dataset.py
deleted file mode 100644
index 8a6a8c766e4..00000000000
--- a/research/deeplab/deprecated/segmentation_dataset.py
+++ /dev/null
@@ -1,200 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Provides data from semantic segmentation datasets.
-
-The SegmentationDataset class provides both images and annotations (semantic
-segmentation and/or instance segmentation) for TensorFlow. Currently, we
-support the following datasets:
-
-1. PASCAL VOC 2012 (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/).
-
-PASCAL VOC 2012 semantic segmentation dataset annotates 20 foreground objects
-(e.g., bike, person, and so on) and leaves all the other semantic classes as
-one background class. The dataset contains 1464, 1449, and 1456 annotated
-images for the training, validation, and test sets, respectively.
-
-2. Cityscapes dataset (https://www.cityscapes-dataset.com)
-
-The Cityscapes dataset contains 19 semantic labels (such as road, person, car,
-and so on) for urban street scenes.
-
-3. ADE20K dataset (http://groups.csail.mit.edu/vision/datasets/ADE20K)
-
-The ADE20K dataset contains 150 semantic labels for both urban street scenes
-and indoor scenes.
-
-References:
- M. Everingham, S. M. A. Eslami, L. V. Gool, C. K. I. Williams, J. Winn,
- and A. Zisserman, The pascal visual object classes challenge a retrospective.
- IJCV, 2014.
-
- M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson,
- U. Franke, S. Roth, and B. Schiele, "The cityscapes dataset for semantic urban
- scene understanding," In Proc. of CVPR, 2016.
-
- B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, A. Torralba, "Scene Parsing
- through ADE20K dataset", In Proc. of CVPR, 2017.
-"""
-import collections
-import os.path
-import tensorflow as tf
-from tensorflow.contrib import slim as contrib_slim
-
-slim = contrib_slim
-
-dataset = slim.dataset
-
-tfexample_decoder = slim.tfexample_decoder
-
-
-_ITEMS_TO_DESCRIPTIONS = {
- 'image': 'A color image of varying height and width.',
-    'labels_class': ('A semantic segmentation label whose size matches image. '
- 'Its values range from 0 (background) to num_classes.'),
-}
-
-# Named tuple to describe the dataset properties.
-DatasetDescriptor = collections.namedtuple(
- 'DatasetDescriptor',
- ['splits_to_sizes', # Splits of the dataset into training, val, and test.
- 'num_classes', # Number of semantic classes, including the background
- # class (if exists). For example, there are 20
- # foreground classes + 1 background class in the PASCAL
- # VOC 2012 dataset. Thus, we set num_classes=21.
- 'ignore_label', # Ignore label value.
- ]
-)
-
-_CITYSCAPES_INFORMATION = DatasetDescriptor(
- splits_to_sizes={
- 'train_fine': 2975,
- 'val_fine': 500,
- },
- num_classes=19,
- ignore_label=255,
-)
-
-_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
- splits_to_sizes={
- 'train': 1464,
- 'train_aug': 10582,
- 'trainval': 2913,
- 'val': 1449,
- },
- num_classes=21,
- ignore_label=255,
-)
-
-# These numbers (i.e., 'train'/'test') seem to have to be hard coded.
-# You are required to figure them out for your training/testing example.
-_ADE20K_INFORMATION = DatasetDescriptor(
- splits_to_sizes={
- 'train': 20210, # num of samples in images/training
- 'val': 2000, # num of samples in images/validation
- },
- num_classes=151,
- ignore_label=0,
-)
-
-
-_DATASETS_INFORMATION = {
- 'cityscapes': _CITYSCAPES_INFORMATION,
- 'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
- 'ade20k': _ADE20K_INFORMATION,
-}
-
-# Default file pattern of TFRecord of TensorFlow Example.
-_FILE_PATTERN = '%s-*'
-
-
-def get_cityscapes_dataset_name():
- return 'cityscapes'
-
-
-def get_dataset(dataset_name, split_name, dataset_dir):
- """Gets an instance of slim Dataset.
-
- Args:
- dataset_name: Dataset name.
- split_name: A train/val Split name.
- dataset_dir: The directory of the dataset sources.
-
- Returns:
- An instance of slim Dataset.
-
- Raises:
- ValueError: if the dataset_name or split_name is not recognized.
- """
- if dataset_name not in _DATASETS_INFORMATION:
- raise ValueError('The specified dataset is not supported yet.')
-
- splits_to_sizes = _DATASETS_INFORMATION[dataset_name].splits_to_sizes
-
- if split_name not in splits_to_sizes:
- raise ValueError('data split name %s not recognized' % split_name)
-
- # Prepare the variables for different datasets.
- num_classes = _DATASETS_INFORMATION[dataset_name].num_classes
- ignore_label = _DATASETS_INFORMATION[dataset_name].ignore_label
-
- file_pattern = _FILE_PATTERN
- file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
-
- # Specify how the TF-Examples are decoded.
- keys_to_features = {
- 'image/encoded': tf.FixedLenFeature(
- (), tf.string, default_value=''),
- 'image/filename': tf.FixedLenFeature(
- (), tf.string, default_value=''),
- 'image/format': tf.FixedLenFeature(
- (), tf.string, default_value='jpeg'),
- 'image/height': tf.FixedLenFeature(
- (), tf.int64, default_value=0),
- 'image/width': tf.FixedLenFeature(
- (), tf.int64, default_value=0),
- 'image/segmentation/class/encoded': tf.FixedLenFeature(
- (), tf.string, default_value=''),
- 'image/segmentation/class/format': tf.FixedLenFeature(
- (), tf.string, default_value='png'),
- }
- items_to_handlers = {
- 'image': tfexample_decoder.Image(
- image_key='image/encoded',
- format_key='image/format',
- channels=3),
- 'image_name': tfexample_decoder.Tensor('image/filename'),
- 'height': tfexample_decoder.Tensor('image/height'),
- 'width': tfexample_decoder.Tensor('image/width'),
- 'labels_class': tfexample_decoder.Image(
- image_key='image/segmentation/class/encoded',
- format_key='image/segmentation/class/format',
- channels=1),
- }
-
- decoder = tfexample_decoder.TFExampleDecoder(
- keys_to_features, items_to_handlers)
-
- return dataset.Dataset(
- data_sources=file_pattern,
- reader=tf.TFRecordReader,
- decoder=decoder,
- num_samples=splits_to_sizes[split_name],
- items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
- ignore_label=ignore_label,
- num_classes=num_classes,
- name=dataset_name,
- multi_label=True)
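
For completeness, the deprecated slim-based reader was typically consumed through a DatasetDataProvider. A minimal usage sketch (assumes TF 1.x with tf.contrib.slim available and TFRecords under ./pascal_voc_seg/tfrecord; not from the repository):

```python
from tensorflow.contrib import slim

from deeplab.deprecated import segmentation_dataset

dataset = segmentation_dataset.get_dataset(
    'pascal_voc_seg', 'val', './pascal_voc_seg/tfrecord')

# The provider pulls decoded tensors out of the slim Dataset defined above.
provider = slim.dataset_data_provider.DatasetDataProvider(
    dataset, num_readers=1, shuffle=False)
image, label, image_name = provider.get(['image', 'labels_class', 'image_name'])
```
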
diff --git a/research/deeplab/eval.py b/research/deeplab/eval.py
deleted file mode 100644
index 4f5fb8ba9c7..00000000000
--- a/research/deeplab/eval.py
+++ /dev/null
@@ -1,227 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Evaluation script for the DeepLab model.
-
-See model.py for more details and usage.
-"""
-
-import numpy as np
-import six
-import tensorflow as tf
-from tensorflow.contrib import metrics as contrib_metrics
-from tensorflow.contrib import quantize as contrib_quantize
-from tensorflow.contrib import tfprof as contrib_tfprof
-from tensorflow.contrib import training as contrib_training
-from deeplab import common
-from deeplab import model
-from deeplab.datasets import data_generator
-
-flags = tf.app.flags
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string('master', '', 'BNS name of the tensorflow server')
-
-# Settings for log directories.
-
-flags.DEFINE_string('eval_logdir', None, 'Where to write the event logs.')
-
-flags.DEFINE_string('checkpoint_dir', None, 'Directory of model checkpoints.')
-
-# Settings for evaluating the model.
-
-flags.DEFINE_integer('eval_batch_size', 1,
- 'The number of images in each batch during evaluation.')
-
-flags.DEFINE_list('eval_crop_size', '513,513',
- 'Image crop size [height, width] for evaluation.')
-
-flags.DEFINE_integer('eval_interval_secs', 60 * 5,
- 'How often (in seconds) to run evaluation.')
-
-# For `xception_65`, use atrous_rates = [12, 24, 36] if output_stride = 8, or
-# rates = [6, 12, 18] if output_stride = 16. For `mobilenet_v2`, use None. Note
-# one could use different atrous_rates/output_stride during training/evaluation.
-flags.DEFINE_multi_integer('atrous_rates', None,
- 'Atrous rates for atrous spatial pyramid pooling.')
-
-flags.DEFINE_integer('output_stride', 16,
- 'The ratio of input to output spatial resolution.')
-
-# Change to [0.5, 0.75, 1.0, 1.25, 1.5, 1.75] for multi-scale test.
-flags.DEFINE_multi_float('eval_scales', [1.0],
- 'The scales to resize images for evaluation.')
-
-# Change to True for adding flipped images during test.
-flags.DEFINE_bool('add_flipped_images', False,
- 'Add flipped images for evaluation or not.')
-
-flags.DEFINE_integer(
- 'quantize_delay_step', -1,
- 'Steps to start quantized training. If < 0, will not quantize model.')
-
-# Dataset settings.
-
-flags.DEFINE_string('dataset', 'pascal_voc_seg',
- 'Name of the segmentation dataset.')
-
-flags.DEFINE_string('eval_split', 'val',
- 'Which split of the dataset is used for evaluation.')
-
-flags.DEFINE_string('dataset_dir', None, 'Where the dataset resides.')
-
-flags.DEFINE_integer('max_number_of_evaluations', 0,
- 'Maximum number of eval iterations. Will loop '
- 'indefinitely upon nonpositive values.')
-
-
-def main(unused_argv):
- tf.logging.set_verbosity(tf.logging.INFO)
-
- dataset = data_generator.Dataset(
- dataset_name=FLAGS.dataset,
- split_name=FLAGS.eval_split,
- dataset_dir=FLAGS.dataset_dir,
- batch_size=FLAGS.eval_batch_size,
- crop_size=[int(sz) for sz in FLAGS.eval_crop_size],
- min_resize_value=FLAGS.min_resize_value,
- max_resize_value=FLAGS.max_resize_value,
- resize_factor=FLAGS.resize_factor,
- model_variant=FLAGS.model_variant,
- num_readers=2,
- is_training=False,
- should_shuffle=False,
- should_repeat=False)
-
- tf.gfile.MakeDirs(FLAGS.eval_logdir)
- tf.logging.info('Evaluating on %s set', FLAGS.eval_split)
-
- with tf.Graph().as_default():
- samples = dataset.get_one_shot_iterator().get_next()
-
- model_options = common.ModelOptions(
- outputs_to_num_classes={common.OUTPUT_TYPE: dataset.num_of_classes},
- crop_size=[int(sz) for sz in FLAGS.eval_crop_size],
- atrous_rates=FLAGS.atrous_rates,
- output_stride=FLAGS.output_stride)
-
- # Set shape in order for tf.contrib.tfprof.model_analyzer to work properly.
- samples[common.IMAGE].set_shape(
- [FLAGS.eval_batch_size,
- int(FLAGS.eval_crop_size[0]),
- int(FLAGS.eval_crop_size[1]),
- 3])
- if tuple(FLAGS.eval_scales) == (1.0,):
- tf.logging.info('Performing single-scale test.')
- predictions = model.predict_labels(samples[common.IMAGE], model_options,
- image_pyramid=FLAGS.image_pyramid)
- else:
- tf.logging.info('Performing multi-scale test.')
- if FLAGS.quantize_delay_step >= 0:
- raise ValueError(
- 'Quantize mode is not supported with multi-scale test.')
-
- predictions = model.predict_labels_multi_scale(
- samples[common.IMAGE],
- model_options=model_options,
- eval_scales=FLAGS.eval_scales,
- add_flipped_images=FLAGS.add_flipped_images)
- predictions = predictions[common.OUTPUT_TYPE]
- predictions = tf.reshape(predictions, shape=[-1])
- labels = tf.reshape(samples[common.LABEL], shape=[-1])
- weights = tf.to_float(tf.not_equal(labels, dataset.ignore_label))
-
- # Set ignore_label regions to label 0, because metrics.mean_iou requires
- # range of labels = [0, dataset.num_classes). Note the ignore_label regions
- # are not evaluated since the corresponding regions contain weights = 0.
- labels = tf.where(
- tf.equal(labels, dataset.ignore_label), tf.zeros_like(labels), labels)
-
- predictions_tag = 'miou'
- for eval_scale in FLAGS.eval_scales:
- predictions_tag += '_' + str(eval_scale)
- if FLAGS.add_flipped_images:
- predictions_tag += '_flipped'
-
- # Define the evaluation metric.
- metric_map = {}
- num_classes = dataset.num_of_classes
- metric_map['eval/%s_overall' % predictions_tag] = tf.metrics.mean_iou(
- labels=labels, predictions=predictions, num_classes=num_classes,
- weights=weights)
- # IoU for each class.
- one_hot_predictions = tf.one_hot(predictions, num_classes)
- one_hot_predictions = tf.reshape(one_hot_predictions, [-1, num_classes])
- one_hot_labels = tf.one_hot(labels, num_classes)
- one_hot_labels = tf.reshape(one_hot_labels, [-1, num_classes])
- for c in range(num_classes):
- predictions_tag_c = '%s_class_%d' % (predictions_tag, c)
- tp, tp_op = tf.metrics.true_positives(
- labels=one_hot_labels[:, c], predictions=one_hot_predictions[:, c],
- weights=weights)
- fp, fp_op = tf.metrics.false_positives(
- labels=one_hot_labels[:, c], predictions=one_hot_predictions[:, c],
- weights=weights)
- fn, fn_op = tf.metrics.false_negatives(
- labels=one_hot_labels[:, c], predictions=one_hot_predictions[:, c],
- weights=weights)
- tp_fp_fn_op = tf.group(tp_op, fp_op, fn_op)
- iou = tf.where(tf.greater(tp + fn, 0.0),
- tp / (tp + fn + fp),
- tf.constant(np.NaN))
- metric_map['eval/%s' % predictions_tag_c] = (iou, tp_fp_fn_op)
-
- (metrics_to_values,
- metrics_to_updates) = contrib_metrics.aggregate_metric_map(metric_map)
-
- summary_ops = []
- for metric_name, metric_value in six.iteritems(metrics_to_values):
- op = tf.summary.scalar(metric_name, metric_value)
- op = tf.Print(op, [metric_value], metric_name)
- summary_ops.append(op)
-
- summary_op = tf.summary.merge(summary_ops)
- summary_hook = contrib_training.SummaryAtEndHook(
- log_dir=FLAGS.eval_logdir, summary_op=summary_op)
- hooks = [summary_hook]
-
- num_eval_iters = None
- if FLAGS.max_number_of_evaluations > 0:
- num_eval_iters = FLAGS.max_number_of_evaluations
-
- if FLAGS.quantize_delay_step >= 0:
- contrib_quantize.create_eval_graph()
-
- contrib_tfprof.model_analyzer.print_model_analysis(
- tf.get_default_graph(),
- tfprof_options=contrib_tfprof.model_analyzer
- .TRAINABLE_VARS_PARAMS_STAT_OPTIONS)
- contrib_tfprof.model_analyzer.print_model_analysis(
- tf.get_default_graph(),
- tfprof_options=contrib_tfprof.model_analyzer.FLOAT_OPS_OPTIONS)
- contrib_training.evaluate_repeatedly(
- checkpoint_dir=FLAGS.checkpoint_dir,
- master=FLAGS.master,
- eval_ops=list(metrics_to_updates.values()),
- max_number_of_evaluations=num_eval_iters,
- hooks=hooks,
- eval_interval_secs=FLAGS.eval_interval_secs)
-
-
-if __name__ == '__main__':
- flags.mark_flag_as_required('checkpoint_dir')
- flags.mark_flag_as_required('eval_logdir')
- flags.mark_flag_as_required('dataset_dir')
- tf.app.run()
diff --git a/research/deeplab/evaluation/README.md b/research/deeplab/evaluation/README.md
deleted file mode 100644
index 69255384e9a..00000000000
--- a/research/deeplab/evaluation/README.md
+++ /dev/null
@@ -1,311 +0,0 @@
-# Evaluation Metrics for Whole Image Parsing
-
-Whole Image Parsing [1], also known as Panoptic Segmentation [2], generalizes
-the tasks of semantic segmentation for "stuff" classes and instance
-segmentation for "thing" classes, assigning both semantic and instance labels
-to every pixel in an image.
-
-Previous works evaluate the parsing result with separate metrics (e.g., one for
-the semantic segmentation result and one for the object detection result).
-Recently, Kirillov et al. proposed the unified instance-based Panoptic Quality
-(PQ) metric [2], which has since been adopted by several benchmarks [3, 4].
-
-However, we notice that the instance-based PQ metric often places
-disproportionate emphasis on small instance parsing, as well as on "thing" over
-"stuff" classes. To remedy these effects, we propose an alternative
-region-based Parsing Covering (PC) metric [5], which adapts the Covering
-metric [6], previously used for class-agnostic segmentation quality
-evaluation, to the task of image parsing.
-
-Here, we provide an implementation of both PQ and PC for evaluating the parsing
-results. We briefly explain both metrics below for reference.
-
-## Panoptic Quality (PQ)
-
-Given a groundtruth segmentation S and a predicted segmentation S', PQ is
-defined as follows:
-
-PQ = ( Σ_{(R, R') ∈ TP} IoU(R, R') ) / ( |TP| + 0.5 |FP| + 0.5 |FN| )
-
-where R and R' are groundtruth regions and predicted regions respectively,
-and |TP|, |FP|, and |FN| are the number of true positives, false positives,
-and false negatives. The matching is determined by a threshold of 0.5
-Intersection-Over-Union (IOU).
-
-PQ treats all regions of the same "stuff" class as one instance, and the
-size of instances is not considered. For example, instances with 10 × 10
-pixels contribute equally to the metric as instances with 1000 × 1000 pixels.
-Therefore, PQ is sensitive to false positives with small regions and some
-heuristics could improve the performance, such as removing those small
-regions (as also pointed out in the open-sourced evaluation code from [2]).
-Thus, we argue that PQ is suitable in applications where one cares equally for
-the parsing quality of instances irrespective of their sizes.
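-
-As a toy illustration (not part of the released code), the sketch below
-computes PQ for a single category from assumed match statistics; the IoU
-values and the TP/FP/FN counts are made up, and the matching threshold of 0.5
-is the one described above.
-
-```python
-# Hypothetical per-category PQ computation; all numbers are made up.
-matched_ious = [0.9, 0.75, 0.6]  # IoUs of matched (groundtruth, prediction) pairs, i.e. |TP| = 3.
-num_fp = 2                       # Predicted segments with no match.
-num_fn = 1                       # Groundtruth segments with no match.
-
-tp = len(matched_ious)
-sq = sum(matched_ious) / tp                   # Segmentation quality: mean IoU over matches.
-rq = tp / (tp + 0.5 * num_fp + 0.5 * num_fn)  # Recognition quality: an F1-style detection score.
-pq = sq * rq                                  # PQ = SQ * RQ.
-print('PQ = %.3f' % pq)
-```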
-
-## Parsing Covering (PC)
-
-We notice that there are applications where one pays more attention to large
-objects, e.g., autonomous driving (where nearby objects are more important
-than far away ones). Motivated by this, we propose to also evaluate the
-quality of image parsing results by extending the existing Covering metric [6],
-which accounts for instance sizes. Specifically, our proposed metric, Parsing
-Covering (PC), is defined as follows:
-
-Cov_i = ( 1 / N_i ) * Σ_{R ∈ S_i} |R| * max_{R' ∈ S_i'} IoU(R, R')
-
-PC = ( 1 / C ) * Σ_i Cov_i
-
-where S_i and S_i' are the groundtruth segmentation and
-predicted segmentation for the i-th semantic class respectively, and
-N_i is the total number of pixels of groundtruth regions from
-S_i. The Covering for class i, Cov_i, is computed in
-the same way as the original Covering metric except that only groundtruth
-regions from S_i and predicted regions from S_i' are
-considered. PC is then obtained by computing the average of Cov_i
-over the C semantic classes.
-
-A notable difference between PQ and the proposed PC is that PC involves no
-matching and hence no matching threshold. In an attempt to treat "thing" and
-"stuff" equally, the segmentation of "stuff" classes still receives a partial
-PC score if the segmentation is only partially correct. For example, if one
-out of three equally-sized trees is perfectly segmented, the model will get
-the same partial PC score regardless of whether "tree" is considered "stuff"
-or "thing".
-
-## Tutorial
-
-To evaluate the parsing results with PQ and PC, we provide two options:
-
-1. Python off-line evaluation with results saved in the [COCO format](http://cocodataset.org/#format-results).
-2. TensorFlow on-line evaluation.
-
-Below, we explain each option in detail.
-
-#### 1. Python off-line evaluation with results saved in COCO format
-
-[COCO result format](http://cocodataset.org/#format-results) has been
-adopted by several benchmarks [3, 4]. Therefore, we provide a convenient
-function, `eval_coco_format`, to evaluate the results saved in COCO format
-in terms of PC and re-implemented PQ.
-
-Before using the provided function, users need to download the official COCO
-panoptic segmentation task API. Please see [installation](../g3doc/installation.md#add-libraries-to-pythonpath)
-for reference.
-
-Once the official COCO panoptic segmentation task API is downloaded, users
-should be able to run `eval_coco_format.py` to evaluate the parsing results
-in terms of both PC and the re-implemented PQ.
-
-To be concrete, let's take a look at the function `eval_coco_format` in
-`eval_coco_format.py`:
-
-```python
-eval_coco_format(gt_json_file,
- pred_json_file,
- gt_folder=None,
- pred_folder=None,
- metric='pq',
- num_categories=201,
- ignored_label=0,
- max_instances_per_category=256,
- intersection_offset=None,
- normalize_by_image_size=True,
- num_workers=0,
- print_digits=3):
-
-```
-where
-
-1. `gt_json_file`: Path to a JSON file giving ground-truth annotations in COCO
-format.
-2. `pred_json_file`: Path to a JSON file for the predictions to evaluate.
-3. `gt_folder`: Folder containing panoptic-format ID images to match
-ground-truth annotations to image regions.
-4. `pred_folder`: Path to a folder containing ID images for predictions.
-5. `metric`: Name of a metric to compute. Set to `pc` or `pq` for evaluation in PC
-or PQ, respectively.
-6. `num_categories`: The number of segmentation categories (or "classes") in the
-dataset.
-7. `ignored_label`: A category id that is ignored in evaluation, e.g. the "void"
-label in COCO panoptic segmentation dataset.
-8. `max_instances_per_category`: The maximum number of instances for each
-category to ensure unique instance labels.
-9. `intersection_offset`: The maximum number of unique labels.
-10. `normalize_by_image_size`: Whether to normalize groundtruth instance region
-areas by image size when using PC.
-11. `num_workers`: If set to a positive number, will spawn child processes to
-compute parts of the metric in parallel by splitting the images between the
-workers. If set to -1, will use the value of multiprocessing.cpu_count().
-12. `print_digits`: Number of significant digits to print in summary of computed
-metrics.
-
-The input arguments have default values set for the COCO panoptic segmentation
-dataset. Thus, users only need to provide the `gt_json_file` and the
-`pred_json_file` (following the COCO format) to run the evaluation on COCO with
-PQ. If users want to evaluate the results on other datasets, they may need
-to change the default values.
-
-As an example, the interested users could take a look at the provided unit
-test, `test_compare_pq_with_reference_eval`, in `eval_coco_format_test.py`.
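-
-For a quick start, a call using the COCO defaults could look like the sketch
-below; the JSON paths are placeholders rather than files shipped with this
-code, and the ID-image folders are assumed to sit next to the JSON files (the
-default used when `gt_folder`/`pred_folder` are omitted).
-
-```python
-from deeplab.evaluation import eval_coco_format
-
-# Placeholder paths; point these at your own COCO-format panoptic results.
-results = eval_coco_format.eval_coco_format(
-    gt_json_file='/path/to/panoptic_gt.json',
-    pred_json_file='/path/to/panoptic_pred.json',
-    metric='pq')
-
-# The result is a dict keyed by 'All', 'Things' and 'Stuff'.
-print(results['All']['pq'])
-```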
-
-#### 2. TensorFlow on-line evaluation
-
-Users may also want to run the TensorFlow on-line evaluation, similar to the
-[tf.contrib.metrics.streaming_mean_iou](https://www.tensorflow.org/api_docs/python/tf/contrib/metrics/streaming_mean_iou).
-
-Below, we provide a code snippet that shows how to use the provided
-`streaming_panoptic_quality` and `streaming_parsing_covering`.
-
-```python
-metric_map = {}
-metric_map['panoptic_quality'] = streaming_metrics.streaming_panoptic_quality(
- category_label,
- instance_label,
- category_prediction,
- instance_prediction,
- num_classes=201,
- max_instances_per_category=256,
- ignored_label=0,
- offset=256*256)
-metric_map['parsing_covering'] = streaming_metrics.streaming_parsing_covering(
- category_label,
- instance_label,
- category_prediction,
- instance_prediction,
- num_classes=201,
- max_instances_per_category=256,
- ignored_label=0,
- offset=256*256,
- normalize_by_image_size=True)
-metrics_to_values, metrics_to_updates = slim.metrics.aggregate_metric_map(
- metric_map)
-```
-where `metric_map` is a dictionary storing the streamed results of PQ and PC.
-
-The `category_label` and the `instance_label` are the semantic segmentation and
-instance segmentation groundtruth, respectively. That is, in the panoptic
-segmentation format:
-panoptic_label = category_label * max_instances_per_category + instance_label.
-Similarly, the `category_prediction` and the `instance_prediction` are the
-predicted semantic segmentation and instance segmentation, respectively.
-
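-As a small self-contained illustration of this encoding (not part of the
-released code), the two label maps can be combined into, and recovered from, a
-single panoptic label map as follows:
-
-```python
-import numpy as np
-
-max_instances_per_category = 256
-
-# Per-pixel category and instance ids for a tiny 2x2 image (made-up values).
-category_label = np.array([[5, 5], [7, 0]], dtype=np.int32)
-instance_label = np.array([[1, 2], [0, 0]], dtype=np.int32)
-
-# Combine into the panoptic format described above.
-panoptic_label = category_label * max_instances_per_category + instance_label
-
-# Both maps can be recovered from the combined label.
-assert np.array_equal(panoptic_label // max_instances_per_category, category_label)
-assert np.array_equal(panoptic_label % max_instances_per_category, instance_label)
-```
-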
-Below, we provide a code snippet showing how to summarize the results in the
-context of tf.summary.
-
-```python
-summary_ops = []
-for metric_name, metric_value in metrics_to_values.iteritems():
- if metric_name == 'panoptic_quality':
- [pq, sq, rq, total_tp, total_fn, total_fp] = tf.unstack(
- metric_value, 6, axis=0)
- panoptic_metrics = {
- # Panoptic quality.
- 'pq': pq,
- # Segmentation quality.
- 'sq': sq,
- # Recognition quality.
- 'rq': rq,
- # Total true positives.
- 'total_tp': total_tp,
- # Total false negatives.
- 'total_fn': total_fn,
- # Total false positives.
- 'total_fp': total_fp,
- }
- # Find the valid classes that will be used for evaluation. We will
- # ignore the `ignore_label` class and other classes which have (tp + fn
- # + fp) equal to 0.
- valid_classes = tf.logical_and(
- tf.not_equal(tf.range(0, num_classes), void_label),
- tf.not_equal(total_tp + total_fn + total_fp, 0))
- for target_metric, target_value in panoptic_metrics.iteritems():
- output_metric_name = '{}_{}'.format(metric_name, target_metric)
- op = tf.summary.scalar(
- output_metric_name,
- tf.reduce_mean(tf.boolean_mask(target_value, valid_classes)))
- op = tf.Print(op, [target_value], output_metric_name + '_classwise: ',
- summarize=num_classes)
- op = tf.Print(
- op,
- [tf.reduce_mean(tf.boolean_mask(target_value, valid_classes))],
- output_metric_name + '_mean: ',
- summarize=1)
- summary_ops.append(op)
- elif metric_name == 'parsing_covering':
- [per_class_covering,
- total_per_class_weighted_ious,
- total_per_class_gt_areas] = tf.unstack(metric_value, 3, axis=0)
- # Find the valid classes that will be used for evaluation. We will
- # ignore the `void_label` class and other classes which have
- # total_per_class_weighted_ious + total_per_class_gt_areas equal to 0.
- valid_classes = tf.logical_and(
- tf.not_equal(tf.range(0, num_classes), void_label),
- tf.not_equal(
- total_per_class_weighted_ious + total_per_class_gt_areas, 0))
- op = tf.summary.scalar(
- metric_name,
- tf.reduce_mean(tf.boolean_mask(per_class_covering, valid_classes)))
- op = tf.Print(op, [per_class_covering], metric_name + '_classwise: ',
- summarize=num_classes)
- op = tf.Print(
- op,
- [tf.reduce_mean(
- tf.boolean_mask(per_class_covering, valid_classes))],
- metric_name + '_mean: ',
- summarize=1)
- summary_ops.append(op)
- else:
- raise ValueError('The metric_name "%s" is not supported.' % metric_name)
-```
-
-Afterwards, users could use the following code to run the evaluation in
-TensorFlow.
-
-Users can also take a look at eval.py for reference, which provides a simple
-example of running the streaming evaluation of mIOU for semantic segmentation.
-
-```python
-metric_values = slim.evaluation.evaluation_loop(
- master=FLAGS.master,
- checkpoint_dir=FLAGS.checkpoint_dir,
- logdir=FLAGS.eval_logdir,
- num_evals=num_batches,
- eval_op=metrics_to_updates.values(),
- final_op=metrics_to_values.values(),
- summary_op=tf.summary.merge(summary_ops),
- max_number_of_evaluations=FLAGS.max_number_of_evaluations,
- eval_interval_secs=FLAGS.eval_interval_secs)
-```
-
-
-### References
-
-1. **Image Parsing: Unifying Segmentation, Detection, and Recognition**
- Zhuowen Tu, Xiangrong Chen, Alan L. Yuille, and Song-Chun Zhu
- IJCV, 2005.
-
-2. **Panoptic Segmentation**
- Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother and Piotr
- Dollár
- arXiv:1801.00868, 2018.
-
-3. **Microsoft COCO: Common Objects in Context**
- Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross
- Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick,
- Piotr Dollar
- In the Proc. of ECCV, 2014.
-
-4. **The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes**
- Gerhard Neuhold, Tobias Ollmann, Samuel Rota Bulò, and Peter Kontschieder
- In the Proc. of ICCV, 2017.
-
-5. **DeeperLab: Single-Shot Image Parser**
- Tien-Ju Yang, Maxwell D. Collins, Yukun Zhu, Jyh-Jing Hwang, Ting Liu,
- Xiao Zhang, Vivienne Sze, George Papandreou, Liang-Chieh Chen
- arXiv:1902.05093, 2019.
-
-6. **Contour Detection and Hierarchical Image Segmentation**
- Pablo Arbelaez, Michael Maire, Charless Fowlkes, and Jitendra Malik
- PAMI, 2011.
diff --git a/research/deeplab/evaluation/__init__.py b/research/deeplab/evaluation/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/deeplab/evaluation/base_metric.py b/research/deeplab/evaluation/base_metric.py
deleted file mode 100644
index ee7606ef44c..00000000000
--- a/research/deeplab/evaluation/base_metric.py
+++ /dev/null
@@ -1,191 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Defines the top-level interface for evaluating segmentations."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import abc
-import numpy as np
-import six
-
-
-_EPSILON = 1e-10
-
-
-def realdiv_maybe_zero(x, y):
- """Element-wise x / y where y may contain zeros, for those returns 0 too."""
- return np.where(
- np.less(np.abs(y), _EPSILON), np.zeros_like(x), np.divide(x, y))
-
-
-@six.add_metaclass(abc.ABCMeta)
-class SegmentationMetric(object):
- """Abstract base class for computers of segmentation metrics.
-
- Subclasses will implement both:
- 1. Comparing the predicted segmentation for an image with the groundtruth.
- 2. Computing the final metric over a set of images.
- These are often done as separate steps, due to the need to accumulate
- intermediate values other than the metric itself across images, computing the
- actual metric value only on these accumulations after all the images have been
- compared.
-
- A simple usage would be:
-
- metric = MetricImplementation(...)
- for image, groundtruth in evaluation_set:
-   prediction = run_segmentation(image)
-   metric.compare_and_accumulate(groundtruth, prediction)
- print(metric.result())
-
- """
-
- def __init__(self, num_categories, ignored_label, max_instances_per_category,
- offset):
- """Base initialization for SegmentationMetric.
-
- Args:
- num_categories: The number of segmentation categories (or "classes") in the
- dataset.
- ignored_label: A category id that is ignored in evaluation, e.g. the void
- label as defined in COCO panoptic segmentation dataset.
- max_instances_per_category: The maximum number of instances for each
- category. Used in ensuring unique instance labels.
- offset: The maximum number of unique labels. This is used, by multiplying
- the ground-truth labels, to generate unique ids for individual regions
- of overlap between groundtruth and predicted segments.
- """
- self.num_categories = num_categories
- self.ignored_label = ignored_label
- self.max_instances_per_category = max_instances_per_category
- self.offset = offset
- self.reset()
-
- def _naively_combine_labels(self, category_array, instance_array):
- """Naively creates a combined label array from categories and instances."""
- return (category_array.astype(np.uint32) * self.max_instances_per_category +
- instance_array.astype(np.uint32))
-
- @abc.abstractmethod
- def compare_and_accumulate(
- self, groundtruth_category_array, groundtruth_instance_array,
- predicted_category_array, predicted_instance_array):
- """Compares predicted segmentation with groundtruth, accumulates its metric.
-
- It is not assumed that instance ids are unique across different categories.
- See for example combine_semantic_and_instance_predictions.py in official
- PanopticAPI evaluation code for issues to consider when fusing category
- and instance labels.
-
- Instance ids of the ignored category have the meaning that id 0 is "void"
- and the remaining ones are crowd instances.
-
- Args:
- groundtruth_category_array: A 2D numpy uint16 array of groundtruth
- per-pixel category labels.
- groundtruth_instance_array: A 2D numpy uint16 array of groundtruth
- instance labels.
- predicted_category_array: A 2D numpy uint16 array of predicted per-pixel
- category labels.
- predicted_instance_array: A 2D numpy uint16 array of predicted instance
- labels.
-
- Returns:
- The value of the metric over all comparisons done so far, including this
- one, as a float scalar.
- """
- raise NotImplementedError('Must be implemented in subclasses.')
-
- @abc.abstractmethod
- def result(self):
- """Computes the metric over all comparisons done so far."""
- raise NotImplementedError('Must be implemented in subclasses.')
-
- @abc.abstractmethod
- def detailed_results(self, is_thing=None):
- """Computes and returns the detailed final metric results.
-
- Args:
- is_thing: A boolean array of length `num_categories`. The entry
- `is_thing[category_id]` is True iff that category is a "thing" category
- instead of "stuff."
-
- Returns:
- A dictionary with a breakdown of metrics and/or metric factors by things,
- stuff, and all categories.
- """
- raise NotImplementedError('Not implemented in subclasses.')
-
- @abc.abstractmethod
- def result_per_category(self):
- """For supported metrics, return individual per-category metric values.
-
- Returns:
- A numpy array of shape `[self.num_categories]`, where index `i` is the
- metrics value over only that category.
- """
- raise NotImplementedError('Not implemented in subclass.')
-
- def print_detailed_results(self, is_thing=None, print_digits=3):
- """Prints out a detailed breakdown of metric results.
-
- Args:
- is_thing: A boolean array of length num_categories.
- `is_thing[category_id]` will say whether that category is a "thing"
- rather than "stuff."
- print_digits: Number of significant digits to print in computed metrics.
- """
- raise NotImplementedError('Not implemented in subclass.')
-
- @abc.abstractmethod
- def merge(self, other_instance):
- """Combines the accumulated results of another instance into self.
-
- The following two cases should put `metric_a` into an equivalent state.
-
- Case 1 (with merge):
-
- metric_a = MetricsSubclass(...)
- metric_a.compare_and_accumulate()
- metric_a.compare_and_accumulate()
-
- metric_b = MetricsSubclass(...)
- metric_b.compare_and_accumulate()
- metric_b.compare_and_accumulate()
-
- metric_a.merge(metric_b)
-
- Case 2 (without merge):
-
- metric_a = MetricsSubclass(...)
- metric_a.compare_and_accumulate()
- metric_a.compare_and_accumulate()
- metric_a.compare_and_accumulate()
- metric_a.compare_and_accumulate()
-
- Args:
- other_instance: Another compatible instance of the same metric subclass.
- """
- raise NotImplementedError('Not implemented in subclass.')
-
- @abc.abstractmethod
- def reset(self):
- """Resets the accumulation to the metric class's state at initialization.
-
- Note that this function will be called in SegmentationMetric.__init__.
- """
- raise NotImplementedError('Must be implemented in subclasses.')
diff --git a/research/deeplab/evaluation/eval_coco_format.py b/research/deeplab/evaluation/eval_coco_format.py
deleted file mode 100644
index 1a26446f16b..00000000000
--- a/research/deeplab/evaluation/eval_coco_format.py
+++ /dev/null
@@ -1,338 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Computes evaluation metrics on groundtruth and predictions in COCO format.
-
-The Common Objects in Context (COCO) dataset defines a format for specifying
-combined semantic and instance segmentations as "panoptic" segmentations. This
-is done with the combination of JSON and image files as specified at:
-http://cocodataset.org/#format-results
-where the JSON file specifies the overall structure of the result,
-including the categories for each annotation, and the images specify the image
-region for each annotation in that image by its ID.
-
-This script computes additional metrics such as Parsing Covering on datasets and
-predictions in this format. An implementation of Panoptic Quality is also
-provided for convenience.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import collections
-import json
-import multiprocessing
-import os
-
-from absl import app
-from absl import flags
-from absl import logging
-import numpy as np
-from PIL import Image
-import utils as panopticapi_utils
-import six
-
-from deeplab.evaluation import panoptic_quality
-from deeplab.evaluation import parsing_covering
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string(
- 'gt_json_file', None,
- 'Path to a JSON file giving ground-truth annotations in COCO format.')
-flags.DEFINE_string('pred_json_file', None,
- 'Path to a JSON file for the predictions to evaluate.')
-flags.DEFINE_string(
- 'gt_folder', None,
- 'Folder containing panoptic-format ID images to match ground-truth '
- 'annotations to image regions.')
-flags.DEFINE_string('pred_folder', None,
- 'Folder containing ID images for predictions.')
-flags.DEFINE_enum(
- 'metric', 'pq', ['pq', 'pc'], 'Shorthand name of a metric to compute. '
- 'Supported values are:\n'
- 'Panoptic Quality (pq)\n'
- 'Parsing Covering (pc)')
-flags.DEFINE_integer(
- 'num_categories', 201,
- 'The number of segmentation categories (or "classes") in the dataset.')
-flags.DEFINE_integer(
- 'ignored_label', 0,
- 'A category id that is ignored in evaluation, e.g. the void label as '
- 'defined in COCO panoptic segmentation dataset.')
-flags.DEFINE_integer(
- 'max_instances_per_category', 256,
- 'The maximum number of instances for each category. Used in ensuring '
- 'unique instance labels.')
-flags.DEFINE_integer('intersection_offset', None,
- 'The maximum number of unique labels.')
-flags.DEFINE_bool(
- 'normalize_by_image_size', True,
- 'Whether to normalize groundtruth instance region areas by image size. If '
- 'True, groundtruth instance areas and weighted IoUs will be divided by the '
- 'size of the corresponding image before accumulated across the dataset. '
- 'Only used for Parsing Covering (pc) evaluation.')
-flags.DEFINE_integer(
- 'num_workers', 0, 'If set to a positive number, will spawn child processes '
- 'to compute parts of the metric in parallel by splitting '
- 'the images between the workers. If set to -1, will use '
- 'the value of multiprocessing.cpu_count().')
-flags.DEFINE_integer('print_digits', 3,
- 'Number of significant digits to print in metrics.')
-
-
-def _build_metric(metric,
- num_categories,
- ignored_label,
- max_instances_per_category,
- intersection_offset=None,
- normalize_by_image_size=True):
- """Creates a metric aggregator objet of the given name."""
- if metric == 'pq':
- logging.warning('One should check Panoptic Quality results against the '
- 'official COCO API code. Small numerical differences '
- '(< 0.1%) can be magnified by rounding.')
- return panoptic_quality.PanopticQuality(num_categories, ignored_label,
- max_instances_per_category,
- intersection_offset)
- elif metric == 'pc':
- return parsing_covering.ParsingCovering(
- num_categories, ignored_label, max_instances_per_category,
- intersection_offset, normalize_by_image_size)
- else:
- raise ValueError('No implementation for metric "%s"' % metric)
-
-
-def _matched_annotations(gt_json, pred_json):
- """Yields a set of (groundtruth, prediction) image annotation pairs.."""
- image_id_to_pred_ann = {
- annotation['image_id']: annotation
- for annotation in pred_json['annotations']
- }
- for gt_ann in gt_json['annotations']:
- image_id = gt_ann['image_id']
- pred_ann = image_id_to_pred_ann[image_id]
- yield gt_ann, pred_ann
-
-
-def _open_panoptic_id_image(image_path):
- """Loads a COCO-format panoptic ID image from file."""
- return panopticapi_utils.rgb2id(
- np.array(Image.open(image_path), dtype=np.uint32))
-
-
-def _split_panoptic(ann_json, id_array, ignored_label, allow_crowds):
- """Given the COCO JSON and ID map, splits into categories and instances."""
- category = np.zeros(id_array.shape, np.uint16)
- instance = np.zeros(id_array.shape, np.uint16)
- next_instance_id = collections.defaultdict(int)
- # Skip instance label 0 for ignored label. That is reserved for void.
- next_instance_id[ignored_label] = 1
- for segment_info in ann_json['segments_info']:
- if allow_crowds and segment_info['iscrowd']:
- category_id = ignored_label
- else:
- category_id = segment_info['category_id']
- mask = np.equal(id_array, segment_info['id'])
- category[mask] = category_id
- instance[mask] = next_instance_id[category_id]
- next_instance_id[category_id] += 1
- return category, instance
-
-
-def _category_and_instance_from_annotation(ann_json, folder, ignored_label,
- allow_crowds):
- """Given the COCO JSON annotations, finds maps of categories and instances."""
- panoptic_id_image = _open_panoptic_id_image(
- os.path.join(folder, ann_json['file_name']))
- return _split_panoptic(ann_json, panoptic_id_image, ignored_label,
- allow_crowds)
-
-
-def _compute_metric(metric_aggregator, gt_folder, pred_folder,
- annotation_pairs):
- """Iterates over matched annotation pairs and computes a metric over them."""
- for gt_ann, pred_ann in annotation_pairs:
- # We only expect "iscrowd" to appear in the ground-truth, and not in model
- # output. In predicted JSON it is simply ignored, as done in official code.
- gt_category, gt_instance = _category_and_instance_from_annotation(
- gt_ann, gt_folder, metric_aggregator.ignored_label, True)
- pred_category, pred_instance = _category_and_instance_from_annotation(
- pred_ann, pred_folder, metric_aggregator.ignored_label, False)
-
- metric_aggregator.compare_and_accumulate(gt_category, gt_instance,
- pred_category, pred_instance)
- return metric_aggregator
-
-
-def _iterate_work_queue(work_queue):
- """Creates an iterable that retrieves items from a queue until one is None."""
- task = work_queue.get(block=True)
- while task is not None:
- yield task
- task = work_queue.get(block=True)
-
-
-def _run_metrics_worker(metric_aggregator, gt_folder, pred_folder, work_queue,
- result_queue):
- result = _compute_metric(metric_aggregator, gt_folder, pred_folder,
- _iterate_work_queue(work_queue))
- result_queue.put(result, block=True)
-
-
-def _is_thing_array(categories_json, ignored_label):
- """is_thing[category_id] is a bool on if category is "thing" or "stuff"."""
- is_thing_dict = {}
- for category_json in categories_json:
- is_thing_dict[category_json['id']] = bool(category_json['isthing'])
-
- # Check our assumption that the category ids are consecutive.
- # Usually metrics should be able to handle this case, but adding a warning
- # here.
- max_category_id = max(six.iterkeys(is_thing_dict))
- if len(is_thing_dict) != max_category_id + 1:
- seen_ids = six.viewkeys(is_thing_dict)
- all_ids = set(six.moves.range(max_category_id + 1))
- unseen_ids = all_ids.difference(seen_ids)
- if unseen_ids != {ignored_label}:
- logging.warning(
- 'Nonconsecutive category ids or no category JSON specified for ids: '
- '%s', unseen_ids)
-
- is_thing_array = np.zeros(max_category_id + 1)
- for category_id, is_thing in six.iteritems(is_thing_dict):
- is_thing_array[category_id] = is_thing
-
- return is_thing_array
-
-
-def eval_coco_format(gt_json_file,
- pred_json_file,
- gt_folder=None,
- pred_folder=None,
- metric='pq',
- num_categories=201,
- ignored_label=0,
- max_instances_per_category=256,
- intersection_offset=None,
- normalize_by_image_size=True,
- num_workers=0,
- print_digits=3):
- """Top-level code to compute metrics on a COCO-format result.
-
- Note that the default values are set for the COCO panoptic segmentation
- dataset, and thus users may want to change them for their own dataset
- evaluation.
-
- Args:
- gt_json_file: Path to a JSON file giving ground-truth annotations in COCO
- format.
- pred_json_file: Path to a JSON file for the predictions to evaluate.
- gt_folder: Folder containing panoptic-format ID images to match ground-truth
- annotations to image regions.
- pred_folder: Folder containing ID images for predictions.
- metric: Name of a metric to compute.
- num_categories: The number of segmentation categories (or "classes") in the
- dataset.
- ignored_label: A category id that is ignored in evaluation, e.g. the "void"
- label as defined in the COCO panoptic segmentation dataset.
- max_instances_per_category: The maximum number of instances for each
- category. Used in ensuring unique instance labels.
- intersection_offset: The maximum number of unique labels.
- normalize_by_image_size: Whether to normalize groundtruth instance region
- areas by image size. If True, groundtruth instance areas and weighted IoUs
- will be divided by the size of the corresponding image before accumulated
- across the dataset. Only used for Parsing Covering (pc) evaluation.
- num_workers: If set to a positive number, will spawn child processes to
- compute parts of the metric in parallel by splitting the images between
- the workers. If set to -1, will use the value of
- multiprocessing.cpu_count().
- print_digits: Number of significant digits to print in summary of computed
- metrics.
-
- Returns:
- The computed result of the metric as a float scalar.
- """
- with open(gt_json_file, 'r') as gt_json_fo:
- gt_json = json.load(gt_json_fo)
- with open(pred_json_file, 'r') as pred_json_fo:
- pred_json = json.load(pred_json_fo)
- if gt_folder is None:
- gt_folder = gt_json_file.replace('.json', '')
- if pred_folder is None:
- pred_folder = pred_json_file.replace('.json', '')
- if intersection_offset is None:
- intersection_offset = (num_categories + 1) * max_instances_per_category
-
- metric_aggregator = _build_metric(
- metric, num_categories, ignored_label, max_instances_per_category,
- intersection_offset, normalize_by_image_size)
-
- if num_workers == -1:
- logging.info('Attempting to get the CPU count to set # workers.')
- num_workers = multiprocessing.cpu_count()
-
- if num_workers > 0:
- logging.info('Computing metric in parallel with %d workers.', num_workers)
- work_queue = multiprocessing.Queue()
- result_queue = multiprocessing.Queue()
- workers = []
- worker_args = (metric_aggregator, gt_folder, pred_folder, work_queue,
- result_queue)
- for _ in six.moves.range(num_workers):
- workers.append(
- multiprocessing.Process(target=_run_metrics_worker, args=worker_args))
- for worker in workers:
- worker.start()
- for ann_pair in _matched_annotations(gt_json, pred_json):
- work_queue.put(ann_pair, block=True)
-
- # Will cause each worker to return a result and terminate upon receiving a
- # None task.
- for _ in six.moves.range(num_workers):
- work_queue.put(None, block=True)
-
- # Retrieve results.
- for _ in six.moves.range(num_workers):
- metric_aggregator.merge(result_queue.get(block=True))
-
- for worker in workers:
- worker.join()
- else:
- logging.info('Computing metric in a single process.')
- annotation_pairs = _matched_annotations(gt_json, pred_json)
- _compute_metric(metric_aggregator, gt_folder, pred_folder, annotation_pairs)
-
- is_thing = _is_thing_array(gt_json['categories'], ignored_label)
- metric_aggregator.print_detailed_results(
- is_thing=is_thing, print_digits=print_digits)
- return metric_aggregator.detailed_results(is_thing=is_thing)
-
-
-def main(argv):
- if len(argv) > 1:
- raise app.UsageError('Too many command-line arguments.')
-
- eval_coco_format(FLAGS.gt_json_file, FLAGS.pred_json_file, FLAGS.gt_folder,
- FLAGS.pred_folder, FLAGS.metric, FLAGS.num_categories,
- FLAGS.ignored_label, FLAGS.max_instances_per_category,
- FLAGS.intersection_offset, FLAGS.normalize_by_image_size,
- FLAGS.num_workers, FLAGS.print_digits)
-
-
-if __name__ == '__main__':
- flags.mark_flags_as_required(
- ['gt_json_file', 'gt_folder', 'pred_json_file', 'pred_folder'])
- app.run(main)
diff --git a/research/deeplab/evaluation/eval_coco_format_test.py b/research/deeplab/evaluation/eval_coco_format_test.py
deleted file mode 100644
index d9093ff127e..00000000000
--- a/research/deeplab/evaluation/eval_coco_format_test.py
+++ /dev/null
@@ -1,140 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for eval_coco_format script."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-from absl.testing import absltest
-import evaluation as panopticapi_eval
-
-from deeplab.evaluation import eval_coco_format
-
-_TEST_DIR = 'deeplab/evaluation/testdata'
-
-FLAGS = flags.FLAGS
-
-
-class EvalCocoFormatTest(absltest.TestCase):
-
- def test_compare_pq_with_reference_eval(self):
- sample_data_dir = os.path.join(_TEST_DIR)
- gt_json_file = os.path.join(sample_data_dir, 'coco_gt.json')
- gt_folder = os.path.join(sample_data_dir, 'coco_gt')
- pred_json_file = os.path.join(sample_data_dir, 'coco_pred.json')
- pred_folder = os.path.join(sample_data_dir, 'coco_pred')
-
- panopticapi_results = panopticapi_eval.pq_compute(
- gt_json_file, pred_json_file, gt_folder, pred_folder)
- deeplab_results = eval_coco_format.eval_coco_format(
- gt_json_file,
- pred_json_file,
- gt_folder,
- pred_folder,
- metric='pq',
- num_categories=7,
- ignored_label=0,
- max_instances_per_category=256,
- intersection_offset=(256 * 256))
- self.assertCountEqual(
- list(deeplab_results.keys()), ['All', 'Things', 'Stuff'])
- for cat_group in ['All', 'Things', 'Stuff']:
- self.assertCountEqual(deeplab_results[cat_group], ['pq', 'sq', 'rq', 'n'])
- for metric in ['pq', 'sq', 'rq', 'n']:
- self.assertAlmostEqual(deeplab_results[cat_group][metric],
- panopticapi_results[cat_group][metric])
-
- def test_compare_pc_with_golden_value(self):
- sample_data_dir = os.path.join(_TEST_DIR)
- gt_json_file = os.path.join(sample_data_dir, 'coco_gt.json')
- gt_folder = os.path.join(sample_data_dir, 'coco_gt')
- pred_json_file = os.path.join(sample_data_dir, 'coco_pred.json')
- pred_folder = os.path.join(sample_data_dir, 'coco_pred')
-
- deeplab_results = eval_coco_format.eval_coco_format(
- gt_json_file,
- pred_json_file,
- gt_folder,
- pred_folder,
- metric='pc',
- num_categories=7,
- ignored_label=0,
- max_instances_per_category=256,
- intersection_offset=(256 * 256),
- normalize_by_image_size=False)
- self.assertCountEqual(
- list(deeplab_results.keys()), ['All', 'Things', 'Stuff'])
- for cat_group in ['All', 'Things', 'Stuff']:
- self.assertCountEqual(deeplab_results[cat_group], ['pc', 'n'])
- self.assertAlmostEqual(deeplab_results['All']['pc'], 0.68210561)
- self.assertEqual(deeplab_results['All']['n'], 6)
- self.assertAlmostEqual(deeplab_results['Things']['pc'], 0.5890529)
- self.assertEqual(deeplab_results['Things']['n'], 4)
- self.assertAlmostEqual(deeplab_results['Stuff']['pc'], 0.86821097)
- self.assertEqual(deeplab_results['Stuff']['n'], 2)
-
- def test_compare_pc_with_golden_value_normalize_by_size(self):
- sample_data_dir = os.path.join(_TEST_DIR)
- gt_json_file = os.path.join(sample_data_dir, 'coco_gt.json')
- gt_folder = os.path.join(sample_data_dir, 'coco_gt')
- pred_json_file = os.path.join(sample_data_dir, 'coco_pred.json')
- pred_folder = os.path.join(sample_data_dir, 'coco_pred')
-
- deeplab_results = eval_coco_format.eval_coco_format(
- gt_json_file,
- pred_json_file,
- gt_folder,
- pred_folder,
- metric='pc',
- num_categories=7,
- ignored_label=0,
- max_instances_per_category=256,
- intersection_offset=(256 * 256),
- normalize_by_image_size=True)
- self.assertCountEqual(
- list(deeplab_results.keys()), ['All', 'Things', 'Stuff'])
- self.assertAlmostEqual(deeplab_results['All']['pc'], 0.68214908840)
-
- def test_pc_with_multiple_workers(self):
- sample_data_dir = os.path.join(_TEST_DIR)
- gt_json_file = os.path.join(sample_data_dir, 'coco_gt.json')
- gt_folder = os.path.join(sample_data_dir, 'coco_gt')
- pred_json_file = os.path.join(sample_data_dir, 'coco_pred.json')
- pred_folder = os.path.join(sample_data_dir, 'coco_pred')
-
- deeplab_results = eval_coco_format.eval_coco_format(
- gt_json_file,
- pred_json_file,
- gt_folder,
- pred_folder,
- metric='pc',
- num_categories=7,
- ignored_label=0,
- max_instances_per_category=256,
- intersection_offset=(256 * 256),
- num_workers=3,
- normalize_by_image_size=False)
- self.assertCountEqual(
- list(deeplab_results.keys()), ['All', 'Things', 'Stuff'])
- self.assertAlmostEqual(deeplab_results['All']['pc'], 0.68210561668)
-
-
-if __name__ == '__main__':
- absltest.main()
diff --git a/research/deeplab/evaluation/g3doc/img/equation_pc.png b/research/deeplab/evaluation/g3doc/img/equation_pc.png
deleted file mode 100644
index 90f15e7a461..00000000000
Binary files a/research/deeplab/evaluation/g3doc/img/equation_pc.png and /dev/null differ
diff --git a/research/deeplab/evaluation/g3doc/img/equation_pq.png b/research/deeplab/evaluation/g3doc/img/equation_pq.png
deleted file mode 100644
index 13a4393c181..00000000000
Binary files a/research/deeplab/evaluation/g3doc/img/equation_pq.png and /dev/null differ
diff --git a/research/deeplab/evaluation/panoptic_quality.py b/research/deeplab/evaluation/panoptic_quality.py
deleted file mode 100644
index f7d0f3f98f0..00000000000
--- a/research/deeplab/evaluation/panoptic_quality.py
+++ /dev/null
@@ -1,259 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Implementation of the Panoptic Quality metric.
-
-Panoptic Quality is an instance-based metric for evaluating the task of
-image parsing, aka panoptic segmentation.
-
-Please see the paper for details:
-"Panoptic Segmentation", Alexander Kirillov, Kaiming He, Ross Girshick,
-Carsten Rother and Piotr Dollar. arXiv:1801.00868, 2018.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import collections
-import numpy as np
-import prettytable
-import six
-
-from deeplab.evaluation import base_metric
-
-
-def _ids_to_counts(id_array):
- """Given a numpy array, a mapping from each unique entry to its count."""
- ids, counts = np.unique(id_array, return_counts=True)
- return dict(six.moves.zip(ids, counts))
-
-
-class PanopticQuality(base_metric.SegmentationMetric):
- """Metric class for Panoptic Quality.
-
- "Panoptic Segmentation" by Alexander Kirillov, Kaiming He, Ross Girshick,
- Carsten Rother, Piotr Dollar.
- https://arxiv.org/abs/1801.00868
- """
-
- def compare_and_accumulate(
- self, groundtruth_category_array, groundtruth_instance_array,
- predicted_category_array, predicted_instance_array):
- """See base class."""
- # First, combine the category and instance labels so that every unique
- # value for (category, instance) is assigned a unique integer label.
- pred_segment_id = self._naively_combine_labels(predicted_category_array,
- predicted_instance_array)
- gt_segment_id = self._naively_combine_labels(groundtruth_category_array,
- groundtruth_instance_array)
-
- # Pre-calculate areas for all groundtruth and predicted segments.
- gt_segment_areas = _ids_to_counts(gt_segment_id)
- pred_segment_areas = _ids_to_counts(pred_segment_id)
-
- # We assume there is only one void segment and it has instance id = 0.
- void_segment_id = self.ignored_label * self.max_instances_per_category
-
- # There may be other ignored groundtruth segments with instance id > 0, find
- # those ids using the unique segment ids extracted with the area computation
- # above.
- ignored_segment_ids = {
- gt_segment_id for gt_segment_id in six.iterkeys(gt_segment_areas)
- if (gt_segment_id //
- self.max_instances_per_category) == self.ignored_label
- }
-
- # Next, combine the groundtruth and predicted labels. Dividing up the pixels
- # based on which groundtruth segment and which predicted segment they belong
- # to, this will assign a different 32-bit integer label to each choice
- # of (groundtruth segment, predicted segment), encoded as
- # gt_segment_id * offset + pred_segment_id.
- intersection_id_array = (
- gt_segment_id.astype(np.uint32) * self.offset +
- pred_segment_id.astype(np.uint32))
-
- # For every combination of (groundtruth segment, predicted segment) with a
- # non-empty intersection, this counts the number of pixels in that
- # intersection.
- intersection_areas = _ids_to_counts(intersection_id_array)
-
- # Helper function that computes the area of the overlap between a predicted
- # segment and the ground-truth void/ignored segment.
- def prediction_void_overlap(pred_segment_id):
- void_intersection_id = void_segment_id * self.offset + pred_segment_id
- return intersection_areas.get(void_intersection_id, 0)
-
- # Compute overall ignored overlap.
- def prediction_ignored_overlap(pred_segment_id):
- total_ignored_overlap = 0
- for ignored_segment_id in ignored_segment_ids:
- intersection_id = ignored_segment_id * self.offset + pred_segment_id
- total_ignored_overlap += intersection_areas.get(intersection_id, 0)
- return total_ignored_overlap
-
- # Sets that are populated with the groundtruth/predicted segments which have
- # been matched with overlapping predicted/groundtruth segments,
- # respectively.
- gt_matched = set()
- pred_matched = set()
-
- # Calculate IoU per pair of intersecting segments of the same category.
- for intersection_id, intersection_area in six.iteritems(intersection_areas):
- gt_segment_id = intersection_id // self.offset
- pred_segment_id = intersection_id % self.offset
-
- gt_category = gt_segment_id // self.max_instances_per_category
- pred_category = pred_segment_id // self.max_instances_per_category
- if gt_category != pred_category:
- continue
-
- # Union between the groundtruth and predicted segments being compared does
- # not include the portion of the predicted segment that consists of
- # groundtruth "void" pixels.
- union = (
- gt_segment_areas[gt_segment_id] +
- pred_segment_areas[pred_segment_id] - intersection_area -
- prediction_void_overlap(pred_segment_id))
- iou = intersection_area / union
- if iou > 0.5:
- self.tp_per_class[gt_category] += 1
- self.iou_per_class[gt_category] += iou
- gt_matched.add(gt_segment_id)
- pred_matched.add(pred_segment_id)
-
- # Count false negatives for each category.
- for gt_segment_id in six.iterkeys(gt_segment_areas):
- if gt_segment_id in gt_matched:
- continue
- category = gt_segment_id // self.max_instances_per_category
- # Failing to detect a void segment is not a false negative.
- if category == self.ignored_label:
- continue
- self.fn_per_class[category] += 1
-
- # Count false positives for each category.
- for pred_segment_id in six.iterkeys(pred_segment_areas):
- if pred_segment_id in pred_matched:
- continue
- # A false positive is not penalized if it is mostly ignored in the
- # groundtruth.
- if (prediction_ignored_overlap(pred_segment_id) /
- pred_segment_areas[pred_segment_id]) > 0.5:
- continue
- category = pred_segment_id // self.max_instances_per_category
- self.fp_per_class[category] += 1
-
- return self.result()
-
- def _valid_categories(self):
- """Categories with a "valid" value for the metric, have > 0 instances.
-
- We will ignore the `ignored_label` class and other classes which have
- `tp + fn + fp = 0`.
-
- Returns:
- Boolean array of shape `[num_categories]`.
- """
- valid_categories = np.not_equal(
- self.tp_per_class + self.fn_per_class + self.fp_per_class, 0)
- if self.ignored_label >= 0 and self.ignored_label < self.num_categories:
- valid_categories[self.ignored_label] = False
- return valid_categories
-
- def detailed_results(self, is_thing=None):
- """See base class."""
- valid_categories = self._valid_categories()
-
- # If known, break down which categories are valid _and_ things/stuff.
- category_sets = collections.OrderedDict()
- category_sets['All'] = valid_categories
- if is_thing is not None:
- category_sets['Things'] = np.logical_and(valid_categories, is_thing)
- category_sets['Stuff'] = np.logical_and(valid_categories,
- np.logical_not(is_thing))
-
- # Compute individual per-class metrics that constitute factors of PQ.
- sq = base_metric.realdiv_maybe_zero(self.iou_per_class, self.tp_per_class)
- rq = base_metric.realdiv_maybe_zero(
- self.tp_per_class,
- self.tp_per_class + 0.5 * self.fn_per_class + 0.5 * self.fp_per_class)
- pq = np.multiply(sq, rq)
-
- # Assemble detailed results dictionary.
- results = {}
- for category_set_name, in_category_set in six.iteritems(category_sets):
- if np.any(in_category_set):
- results[category_set_name] = {
- 'pq': np.mean(pq[in_category_set]),
- 'sq': np.mean(sq[in_category_set]),
- 'rq': np.mean(rq[in_category_set]),
- # The number of categories in this subset.
- 'n': np.sum(in_category_set.astype(np.int32)),
- }
- else:
- results[category_set_name] = {'pq': 0, 'sq': 0, 'rq': 0, 'n': 0}
-
- return results
-
- def result_per_category(self):
- """See base class."""
- sq = base_metric.realdiv_maybe_zero(self.iou_per_class, self.tp_per_class)
- rq = base_metric.realdiv_maybe_zero(
- self.tp_per_class,
- self.tp_per_class + 0.5 * self.fn_per_class + 0.5 * self.fp_per_class)
- return np.multiply(sq, rq)
-
- def print_detailed_results(self, is_thing=None, print_digits=3):
- """See base class."""
- results = self.detailed_results(is_thing=is_thing)
-
- tab = prettytable.PrettyTable()
-
- tab.add_column('', [], align='l')
- for fieldname in ['PQ', 'SQ', 'RQ', 'N']:
- tab.add_column(fieldname, [], align='r')
-
- for category_set, subset_results in six.iteritems(results):
- data_cols = [
- round(subset_results[col_key], print_digits) * 100
- for col_key in ['pq', 'sq', 'rq']
- ]
- data_cols += [subset_results['n']]
- tab.add_row([category_set] + data_cols)
-
- print(tab)
-
- def result(self):
- """See base class."""
- pq_per_class = self.result_per_category()
- valid_categories = self._valid_categories()
- if not np.any(valid_categories):
- return 0.
- return np.mean(pq_per_class[valid_categories])
-
- def merge(self, other_instance):
- """See base class."""
- self.iou_per_class += other_instance.iou_per_class
- self.tp_per_class += other_instance.tp_per_class
- self.fn_per_class += other_instance.fn_per_class
- self.fp_per_class += other_instance.fp_per_class
-
- def reset(self):
- """See base class."""
- self.iou_per_class = np.zeros(self.num_categories, dtype=np.float64)
- self.tp_per_class = np.zeros(self.num_categories, dtype=np.float64)
- self.fn_per_class = np.zeros(self.num_categories, dtype=np.float64)
- self.fp_per_class = np.zeros(self.num_categories, dtype=np.float64)
diff --git a/research/deeplab/evaluation/panoptic_quality_test.py b/research/deeplab/evaluation/panoptic_quality_test.py
deleted file mode 100644
index 00c88c293b8..00000000000
--- a/research/deeplab/evaluation/panoptic_quality_test.py
+++ /dev/null
@@ -1,336 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for Panoptic Quality metric."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-
-from absl.testing import absltest
-import numpy as np
-import six
-
-from deeplab.evaluation import panoptic_quality
-from deeplab.evaluation import test_utils
-
-# See the definition of the color names at:
-# https://en.wikipedia.org/wiki/Web_colors.
-_CLASS_COLOR_MAP = {
- (0, 0, 0): 0,
- (0, 0, 255): 1, # Person (blue).
- (255, 0, 0): 2, # Bear (red).
- (0, 255, 0): 3, # Tree (lime).
- (255, 0, 255): 4, # Bird (fuchsia).
- (0, 255, 255): 5, # Sky (aqua).
- (255, 255, 0): 6, # Cat (yellow).
-}
-
-
-class PanopticQualityTest(absltest.TestCase):
-
- def test_perfect_match(self):
- categories = np.zeros([6, 6], np.uint16)
- instances = np.array([
- [1, 1, 1, 1, 1, 1],
- [1, 2, 2, 2, 2, 1],
- [1, 2, 2, 2, 2, 1],
- [1, 2, 2, 2, 2, 1],
- [1, 2, 2, 1, 1, 1],
- [1, 2, 1, 1, 1, 1],
- ],
- dtype=np.uint16)
-
- pq = panoptic_quality.PanopticQuality(
- num_categories=1,
- ignored_label=2,
- max_instances_per_category=16,
- offset=16)
- pq.compare_and_accumulate(categories, instances, categories, instances)
- np.testing.assert_array_equal(pq.iou_per_class, [2.0])
- np.testing.assert_array_equal(pq.tp_per_class, [2])
- np.testing.assert_array_equal(pq.fn_per_class, [0])
- np.testing.assert_array_equal(pq.fp_per_class, [0])
- np.testing.assert_array_equal(pq.result_per_category(), [1.0])
- self.assertEqual(pq.result(), 1.0)
-
- def test_totally_wrong(self):
- det_categories = np.array([
- [0, 0, 0, 0, 0, 0],
- [0, 1, 0, 0, 1, 0],
- [0, 1, 1, 1, 1, 0],
- [0, 1, 1, 1, 1, 0],
- [0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 0, 0],
- ],
- dtype=np.uint16)
- gt_categories = 1 - det_categories
- instances = np.zeros([6, 6], np.uint16)
-
- pq = panoptic_quality.PanopticQuality(
- num_categories=2,
- ignored_label=2,
- max_instances_per_category=1,
- offset=16)
- pq.compare_and_accumulate(gt_categories, instances, det_categories,
- instances)
- np.testing.assert_array_equal(pq.iou_per_class, [0.0, 0.0])
- np.testing.assert_array_equal(pq.tp_per_class, [0, 0])
- np.testing.assert_array_equal(pq.fn_per_class, [1, 1])
- np.testing.assert_array_equal(pq.fp_per_class, [1, 1])
- np.testing.assert_array_equal(pq.result_per_category(), [0.0, 0.0])
- self.assertEqual(pq.result(), 0.0)
-
- def test_matches_by_iou(self):
- good_det_labels = np.array(
- [
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- [1, 2, 2, 2, 2, 1],
- [1, 2, 2, 2, 1, 1],
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- ],
- dtype=np.uint16)
- gt_labels = np.array(
- [
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- [1, 1, 2, 2, 2, 1],
- [1, 2, 2, 2, 2, 1],
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- ],
- dtype=np.uint16)
-
- pq = panoptic_quality.PanopticQuality(
- num_categories=1,
- ignored_label=2,
- max_instances_per_category=16,
- offset=16)
- pq.compare_and_accumulate(
- np.zeros_like(gt_labels), gt_labels, np.zeros_like(good_det_labels),
- good_det_labels)
-
- # iou(1, 1) = 28/30
- # iou(2, 2) = 6/8
- np.testing.assert_array_almost_equal(pq.iou_per_class, [28 / 30 + 6 / 8])
- np.testing.assert_array_equal(pq.tp_per_class, [2])
- np.testing.assert_array_equal(pq.fn_per_class, [0])
- np.testing.assert_array_equal(pq.fp_per_class, [0])
- self.assertAlmostEqual(pq.result(), (28 / 30 + 6 / 8) / 2)
-
- bad_det_labels = np.array(
- [
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 2, 2, 1],
- [1, 1, 1, 2, 2, 1],
- [1, 1, 1, 2, 2, 1],
- [1, 1, 1, 1, 1, 1],
- ],
- dtype=np.uint16)
-
- pq.reset()
- pq.compare_and_accumulate(
- np.zeros_like(gt_labels), gt_labels, np.zeros_like(bad_det_labels),
- bad_det_labels)
-
- # iou(1, 1) = 27/32
- np.testing.assert_array_almost_equal(pq.iou_per_class, [27 / 32])
- np.testing.assert_array_equal(pq.tp_per_class, [1])
- np.testing.assert_array_equal(pq.fn_per_class, [1])
- np.testing.assert_array_equal(pq.fp_per_class, [1])
- self.assertAlmostEqual(pq.result(), (27 / 32) * (1 / 2))
-
- def test_wrong_instances(self):
- categories = np.array([
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- [1, 2, 2, 1, 2, 2],
- [1, 2, 2, 1, 2, 2],
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- ],
- dtype=np.uint16)
- predicted_instances = np.array([
- [0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 1, 1],
- [0, 0, 0, 0, 1, 1],
- [0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 0, 0],
- ],
- dtype=np.uint16)
- groundtruth_instances = np.zeros([6, 6], dtype=np.uint16)
-
- pq = panoptic_quality.PanopticQuality(
- num_categories=3,
- ignored_label=0,
- max_instances_per_category=10,
- offset=100)
- pq.compare_and_accumulate(categories, groundtruth_instances, categories,
- predicted_instances)
-
- np.testing.assert_array_equal(pq.iou_per_class, [0.0, 1.0, 0.0])
- np.testing.assert_array_equal(pq.tp_per_class, [0, 1, 0])
- np.testing.assert_array_equal(pq.fn_per_class, [0, 0, 1])
- np.testing.assert_array_equal(pq.fp_per_class, [0, 0, 2])
- np.testing.assert_array_equal(pq.result_per_category(), [0, 1, 0])
- self.assertAlmostEqual(pq.result(), 0.5)
-
- def test_instance_order_is_arbitrary(self):
- categories = np.array([
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- [1, 2, 2, 1, 2, 2],
- [1, 2, 2, 1, 2, 2],
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- ],
- dtype=np.uint16)
- predicted_instances = np.array([
- [0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 1, 1],
- [0, 0, 0, 0, 1, 1],
- [0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 0, 0],
- ],
- dtype=np.uint16)
- groundtruth_instances = np.array([
- [0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 0, 0],
- [0, 1, 1, 0, 0, 0],
- [0, 1, 1, 0, 0, 0],
- [0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 0, 0],
- ],
- dtype=np.uint16)
-
- pq = panoptic_quality.PanopticQuality(
- num_categories=3,
- ignored_label=0,
- max_instances_per_category=10,
- offset=100)
- pq.compare_and_accumulate(categories, groundtruth_instances, categories,
- predicted_instances)
-
- np.testing.assert_array_equal(pq.iou_per_class, [0.0, 1.0, 2.0])
- np.testing.assert_array_equal(pq.tp_per_class, [0, 1, 2])
- np.testing.assert_array_equal(pq.fn_per_class, [0, 0, 0])
- np.testing.assert_array_equal(pq.fp_per_class, [0, 0, 0])
- np.testing.assert_array_equal(pq.result_per_category(), [0, 1, 1])
- self.assertAlmostEqual(pq.result(), 1.0)
-
- def test_matches_expected(self):
- pred_classes = test_utils.read_segmentation_with_rgb_color_map(
- 'team_pred_class.png', _CLASS_COLOR_MAP)
- pred_instances = test_utils.read_test_image(
- 'team_pred_instance.png', mode='L')
-
- instance_class_map = {
- 0: 0,
- 47: 1,
- 97: 1,
- 133: 1,
- 150: 1,
- 174: 1,
- 198: 2,
- 215: 1,
- 244: 1,
- 255: 1,
- }
- gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
- 'team_gt_instance.png', instance_class_map)
-
- pq = panoptic_quality.PanopticQuality(
- num_categories=3,
- ignored_label=0,
- max_instances_per_category=256,
- offset=256 * 256)
- pq.compare_and_accumulate(gt_classes, gt_instances, pred_classes,
- pred_instances)
- np.testing.assert_array_almost_equal(
- pq.iou_per_class, [2.06104, 5.26827, 0.54069], decimal=4)
- np.testing.assert_array_equal(pq.tp_per_class, [1, 7, 1])
- np.testing.assert_array_equal(pq.fn_per_class, [0, 1, 0])
- np.testing.assert_array_equal(pq.fp_per_class, [0, 0, 0])
- np.testing.assert_array_almost_equal(pq.result_per_category(),
- [2.061038, 0.702436, 0.54069])
- self.assertAlmostEqual(pq.result(), 0.62156287)
-
- def test_merge_accumulates_all_across_instances(self):
- categories = np.zeros([6, 6], np.uint16)
- good_det_labels = np.array([
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- [1, 2, 2, 2, 2, 1],
- [1, 2, 2, 2, 1, 1],
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- ],
- dtype=np.uint16)
- gt_labels = np.array([
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- [1, 1, 2, 2, 2, 1],
- [1, 2, 2, 2, 2, 1],
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- ],
- dtype=np.uint16)
-
- good_pq = panoptic_quality.PanopticQuality(
- num_categories=1,
- ignored_label=2,
- max_instances_per_category=16,
- offset=16)
- for _ in six.moves.range(2):
- good_pq.compare_and_accumulate(categories, gt_labels, categories,
- good_det_labels)
-
- bad_det_labels = np.array([
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 1, 1, 1],
- [1, 1, 1, 2, 2, 1],
- [1, 1, 1, 2, 2, 1],
- [1, 1, 1, 2, 2, 1],
- [1, 1, 1, 1, 1, 1],
- ],
- dtype=np.uint16)
-
- bad_pq = panoptic_quality.PanopticQuality(
- num_categories=1,
- ignored_label=2,
- max_instances_per_category=16,
- offset=16)
- for _ in six.moves.range(2):
- bad_pq.compare_and_accumulate(categories, gt_labels, categories,
- bad_det_labels)
-
- good_pq.merge(bad_pq)
-
- np.testing.assert_array_almost_equal(
- good_pq.iou_per_class, [2 * (28 / 30 + 6 / 8) + 2 * (27 / 32)])
- np.testing.assert_array_equal(good_pq.tp_per_class, [2 * 2 + 2])
- np.testing.assert_array_equal(good_pq.fn_per_class, [2])
- np.testing.assert_array_equal(good_pq.fp_per_class, [2])
- self.assertAlmostEqual(good_pq.result(), 0.63177083)
-
-
-if __name__ == '__main__':
- absltest.main()
diff --git a/research/deeplab/evaluation/parsing_covering.py b/research/deeplab/evaluation/parsing_covering.py
deleted file mode 100644
index a40e55fc6be..00000000000
--- a/research/deeplab/evaluation/parsing_covering.py
+++ /dev/null
@@ -1,246 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Implementation of the Parsing Covering metric.
-
-Parsing Covering is a region-based metric for evaluating the task of
-image parsing, aka panoptic segmentation.
-
-Please see the paper for details:
-"DeeperLab: Single-Shot Image Parser", Tien-Ju Yang, Maxwell D. Collins,
-Yukun Zhu, Jyh-Jing Hwang, Ting Liu, Xiao Zhang, Vivienne Sze,
-George Papandreou, Liang-Chieh Chen. arXiv: 1902.05093, 2019.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import collections
-
-import numpy as np
-import prettytable
-import six
-
-from deeplab.evaluation import base_metric
-
-
-class ParsingCovering(base_metric.SegmentationMetric):
- r"""Metric class for Parsing Covering.
-
- Computes segmentation covering metric introduced in (Arbelaez et al., 2010)
- with extension to handle multi-class semantic labels (a.k.a. parsing
- covering). Specifically, segmentation covering (SC) is defined in Eq. (8) in
- (Arbelaez et al., 2010) as:
-
- SC(c) = \sum_{R\in S}(|R| * \max_{R'\in S'}O(R,R')) / \sum_{R\in S}|R|,
-
- where S are the groundtruth instance regions and S' are the predicted
- instance regions. The parsing covering is simply:
-
- PC = \sum_{c=1}^{C}SC(c) / C,
-
- where C is the number of classes.
- """
-
- def __init__(self,
- num_categories,
- ignored_label,
- max_instances_per_category,
- offset,
- normalize_by_image_size=True):
- """Initialization for ParsingCovering.
-
- Args:
- num_categories: The number of segmentation categories (or "classes") in
- the dataset.
- ignored_label: A category id that is ignored in evaluation, e.g. the void
- label as defined in COCO panoptic segmentation dataset.
- max_instances_per_category: The maximum number of instances for each
- category. Used in ensuring unique instance labels.
- offset: The maximum number of unique labels. This is used, by multiplying
- the ground-truth labels, to generate unique ids for individual regions
- of overlap between groundtruth and predicted segments.
- normalize_by_image_size: Whether to normalize groundtruth instance region
- areas by image size. If True, groundtruth instance areas and weighted
- IoUs will be divided by the size of the corresponding image before
- accumulated across the dataset.
- """
- super(ParsingCovering, self).__init__(num_categories, ignored_label,
- max_instances_per_category, offset)
- self.normalize_by_image_size = normalize_by_image_size
-
- def compare_and_accumulate(
- self, groundtruth_category_array, groundtruth_instance_array,
- predicted_category_array, predicted_instance_array):
- """See base class."""
- # Allocate intermediate data structures.
- max_ious = np.zeros([self.num_categories, self.max_instances_per_category],
- dtype=np.float64)
- gt_areas = np.zeros([self.num_categories, self.max_instances_per_category],
- dtype=np.float64)
- pred_areas = np.zeros(
- [self.num_categories, self.max_instances_per_category],
- dtype=np.float64)
- # This is a dictionary in the format:
- # {(category, gt_instance): [(pred_instance, intersection_area)]}.
- intersections = collections.defaultdict(list)
-
- # First, combine the category and instance labels so that every unique
- # value for (category, instance) is assigned a unique integer label.
- pred_segment_id = self._naively_combine_labels(predicted_category_array,
- predicted_instance_array)
- gt_segment_id = self._naively_combine_labels(groundtruth_category_array,
- groundtruth_instance_array)
-
- # Next, combine the groundtruth and predicted labels. Dividing up the pixels
- # based on which groundtruth segment and which predicted segment they belong
- # to, this will assign a different 32-bit integer label to each choice
- # of (groundtruth segment, predicted segment), encoded as
- # gt_segment_id * offset + pred_segment_id.
- intersection_id_array = (
- gt_segment_id.astype(np.uint32) * self.offset +
- pred_segment_id.astype(np.uint32))
-
- # For every combination of (groundtruth segment, predicted segment) with a
- # non-empty intersection, this counts the number of pixels in that
- # intersection.
- intersection_ids, intersection_areas = np.unique(
- intersection_id_array, return_counts=True)
-
- # Find areas of all groundtruth and predicted instances, as well as of their
- # intersections.
- for intersection_id, intersection_area in six.moves.zip(
- intersection_ids, intersection_areas):
- gt_segment_id = intersection_id // self.offset
- gt_category = gt_segment_id // self.max_instances_per_category
- if gt_category == self.ignored_label:
- continue
- gt_instance = gt_segment_id % self.max_instances_per_category
- gt_areas[gt_category, gt_instance] += intersection_area
-
- pred_segment_id = intersection_id % self.offset
- pred_category = pred_segment_id // self.max_instances_per_category
- pred_instance = pred_segment_id % self.max_instances_per_category
- pred_areas[pred_category, pred_instance] += intersection_area
- if pred_category != gt_category:
- continue
-
- intersections[gt_category, gt_instance].append((pred_instance,
- intersection_area))
-
- # Find maximum IoU for every groundtruth instance.
- for gt_label, instance_intersections in six.iteritems(intersections):
- category, gt_instance = gt_label
- gt_area = gt_areas[category, gt_instance]
- ious = []
- for pred_instance, intersection_area in instance_intersections:
- pred_area = pred_areas[category, pred_instance]
- union = gt_area + pred_area - intersection_area
- ious.append(intersection_area / union)
- max_ious[category, gt_instance] = max(ious)
-
- # Normalize groundtruth instance areas by image size if necessary.
- if self.normalize_by_image_size:
- gt_areas /= groundtruth_category_array.size
-
- # Compute per-class weighted IoUs and areas summed over all groundtruth
- # instances.
- self.weighted_iou_per_class += np.sum(max_ious * gt_areas, axis=-1)
- self.gt_area_per_class += np.sum(gt_areas, axis=-1)
-
- return self.result()
-
- def result_per_category(self):
- """See base class."""
- return base_metric.realdiv_maybe_zero(self.weighted_iou_per_class,
- self.gt_area_per_class)
-
- def _valid_categories(self):
- """Categories with a "valid" value for the metric, i.e. nonzero area.
-
- We will ignore the `ignored_label` class and other classes which have a
- groundtruth area of 0.
-
- Returns:
- Boolean array of shape `[num_categories]`.
- """
- valid_categories = np.not_equal(self.gt_area_per_class, 0)
- if self.ignored_label >= 0 and self.ignored_label < self.num_categories:
- valid_categories[self.ignored_label] = False
- return valid_categories
-
- def detailed_results(self, is_thing=None):
- """See base class."""
- valid_categories = self._valid_categories()
-
- # If known, break down which categories are valid _and_ things/stuff.
- category_sets = collections.OrderedDict()
- category_sets['All'] = valid_categories
- if is_thing is not None:
- category_sets['Things'] = np.logical_and(valid_categories, is_thing)
- category_sets['Stuff'] = np.logical_and(valid_categories,
- np.logical_not(is_thing))
-
- covering_per_class = self.result_per_category()
- results = {}
- for category_set_name, in_category_set in six.iteritems(category_sets):
- if np.any(in_category_set):
- results[category_set_name] = {
- 'pc': np.mean(covering_per_class[in_category_set]),
- # The number of valid categories in this subset.
- 'n': np.sum(in_category_set.astype(np.int32)),
- }
- else:
- results[category_set_name] = {'pc': 0, 'n': 0}
-
- return results
-
- def print_detailed_results(self, is_thing=None, print_digits=3):
- """See base class."""
- results = self.detailed_results(is_thing=is_thing)
-
- tab = prettytable.PrettyTable()
-
- tab.add_column('', [], align='l')
- for fieldname in ['PC', 'N']:
- tab.add_column(fieldname, [], align='r')
-
- for category_set, subset_results in six.iteritems(results):
- data_cols = [
- round(subset_results['pc'], print_digits) * 100, subset_results['n']
- ]
- tab.add_row([category_set] + data_cols)
-
- print(tab)
-
- def result(self):
- """See base class."""
- covering_per_class = self.result_per_category()
- valid_categories = self._valid_categories()
- if not np.any(valid_categories):
- return 0.
- return np.mean(covering_per_class[valid_categories])
-
- def merge(self, other_instance):
- """See base class."""
- self.weighted_iou_per_class += other_instance.weighted_iou_per_class
- self.gt_area_per_class += other_instance.gt_area_per_class
-
- def reset(self):
- """See base class."""
- self.weighted_iou_per_class = np.zeros(
- self.num_categories, dtype=np.float64)
- self.gt_area_per_class = np.zeros(self.num_categories, dtype=np.float64)
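As a quick numeric illustration of SC(c) from the class docstring above (a sketch with made-up values, not code from the deleted module): each groundtruth region of a class contributes its area weighted by the best IoU that any same-class prediction achieves against it, and the class covering normalizes that sum by the total groundtruth area.

```python
import numpy as np

# One class with two groundtruth regions (areas 30 and 8) and the best IoU
# achieved against each by a same-class predicted region (illustrative values).
gt_areas = np.array([30.0, 8.0])
max_ious = np.array([28.0 / 30.0, 6.0 / 8.0])

weighted_iou = np.sum(max_ious * gt_areas)   # numerator of SC(c)
covering = weighted_iou / np.sum(gt_areas)   # SC(c) = (28 + 6) / 38
print(covering)
```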
diff --git a/research/deeplab/evaluation/parsing_covering_test.py b/research/deeplab/evaluation/parsing_covering_test.py
deleted file mode 100644
index 124d1b37255..00000000000
--- a/research/deeplab/evaluation/parsing_covering_test.py
+++ /dev/null
@@ -1,173 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for Parsing Covering metric."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-
-
-from absl.testing import absltest
-import numpy as np
-
-from deeplab.evaluation import parsing_covering
-from deeplab.evaluation import test_utils
-
-# See the definition of the color names at:
-# https://en.wikipedia.org/wiki/Web_colors.
-_CLASS_COLOR_MAP = {
- (0, 0, 0): 0,
- (0, 0, 255): 1, # Person (blue).
- (255, 0, 0): 2, # Bear (red).
- (0, 255, 0): 3, # Tree (lime).
- (255, 0, 255): 4, # Bird (fuchsia).
- (0, 255, 255): 5, # Sky (aqua).
- (255, 255, 0): 6, # Cat (yellow).
-}
-
-
-class ParsingCoveringTest(absltest.TestCase):
-
- def test_perfect_match(self):
- categories = np.zeros([6, 6], np.uint16)
- instances = np.array([
- [2, 2, 2, 2, 2, 2],
- [2, 4, 4, 4, 4, 2],
- [2, 4, 4, 4, 4, 2],
- [2, 4, 4, 4, 4, 2],
- [2, 4, 4, 2, 2, 2],
- [2, 4, 2, 2, 2, 2],
- ],
- dtype=np.uint16)
-
- pc = parsing_covering.ParsingCovering(
- num_categories=3,
- ignored_label=2,
- max_instances_per_category=2,
- offset=16,
- normalize_by_image_size=False)
- pc.compare_and_accumulate(categories, instances, categories, instances)
- np.testing.assert_array_equal(pc.weighted_iou_per_class, [0.0, 21.0, 0.0])
- np.testing.assert_array_equal(pc.gt_area_per_class, [0.0, 21.0, 0.0])
- np.testing.assert_array_equal(pc.result_per_category(), [0.0, 1.0, 0.0])
- self.assertEqual(pc.result(), 1.0)
-
- def test_totally_wrong(self):
- categories = np.zeros([6, 6], np.uint16)
- gt_instances = np.array([
- [0, 0, 0, 0, 0, 0],
- [0, 1, 0, 0, 1, 0],
- [0, 1, 1, 1, 1, 0],
- [0, 1, 1, 1, 1, 0],
- [0, 0, 0, 0, 0, 0],
- [0, 0, 0, 0, 0, 0],
- ],
- dtype=np.uint16)
- pred_instances = 1 - gt_instances
-
- pc = parsing_covering.ParsingCovering(
- num_categories=2,
- ignored_label=0,
- max_instances_per_category=1,
- offset=16,
- normalize_by_image_size=False)
- pc.compare_and_accumulate(categories, gt_instances, categories,
- pred_instances)
- np.testing.assert_array_equal(pc.weighted_iou_per_class, [0.0, 0.0])
- np.testing.assert_array_equal(pc.gt_area_per_class, [0.0, 10.0])
- np.testing.assert_array_equal(pc.result_per_category(), [0.0, 0.0])
- self.assertEqual(pc.result(), 0.0)
-
- def test_matches_expected(self):
- pred_classes = test_utils.read_segmentation_with_rgb_color_map(
- 'team_pred_class.png', _CLASS_COLOR_MAP)
- pred_instances = test_utils.read_test_image(
- 'team_pred_instance.png', mode='L')
-
- instance_class_map = {
- 0: 0,
- 47: 1,
- 97: 1,
- 133: 1,
- 150: 1,
- 174: 1,
- 198: 2,
- 215: 1,
- 244: 1,
- 255: 1,
- }
- gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
- 'team_gt_instance.png', instance_class_map)
-
- pc = parsing_covering.ParsingCovering(
- num_categories=3,
- ignored_label=0,
- max_instances_per_category=256,
- offset=256 * 256,
- normalize_by_image_size=False)
- pc.compare_and_accumulate(gt_classes, gt_instances, pred_classes,
- pred_instances)
- np.testing.assert_array_almost_equal(
- pc.weighted_iou_per_class, [0.0, 39864.14634, 3136], decimal=4)
- np.testing.assert_array_equal(pc.gt_area_per_class, [0.0, 56870, 5800])
- np.testing.assert_array_almost_equal(
- pc.result_per_category(), [0.0, 0.70097, 0.54069], decimal=4)
- self.assertAlmostEqual(pc.result(), 0.6208296732)
-
- def test_matches_expected_normalize_by_size(self):
- pred_classes = test_utils.read_segmentation_with_rgb_color_map(
- 'team_pred_class.png', _CLASS_COLOR_MAP)
- pred_instances = test_utils.read_test_image(
- 'team_pred_instance.png', mode='L')
-
- instance_class_map = {
- 0: 0,
- 47: 1,
- 97: 1,
- 133: 1,
- 150: 1,
- 174: 1,
- 198: 2,
- 215: 1,
- 244: 1,
- 255: 1,
- }
- gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
- 'team_gt_instance.png', instance_class_map)
-
- pc = parsing_covering.ParsingCovering(
- num_categories=3,
- ignored_label=0,
- max_instances_per_category=256,
- offset=256 * 256,
- normalize_by_image_size=True)
- pc.compare_and_accumulate(gt_classes, gt_instances, pred_classes,
- pred_instances)
- np.testing.assert_array_almost_equal(
- pc.weighted_iou_per_class, [0.0, 0.5002088756, 0.03935002196],
- decimal=4)
- np.testing.assert_array_almost_equal(
- pc.gt_area_per_class, [0.0, 0.7135955832, 0.07277746408], decimal=4)
- # Note that the per-category and overall PCs are identical to those without
- # normalization in the previous test, because we only have a single image.
- np.testing.assert_array_almost_equal(
- pc.result_per_category(), [0.0, 0.70097, 0.54069], decimal=4)
- self.assertAlmostEqual(pc.result(), 0.6208296732)
-
-
-if __name__ == '__main__':
- absltest.main()
diff --git a/research/deeplab/evaluation/streaming_metrics.py b/research/deeplab/evaluation/streaming_metrics.py
deleted file mode 100644
index 8313792676a..00000000000
--- a/research/deeplab/evaluation/streaming_metrics.py
+++ /dev/null
@@ -1,240 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Code to compute segmentation in a "streaming" pattern in Tensorflow.
-
-These aggregate the metric over examples of the evaluation set. Each example is
-assumed to be fed in in a stream, and the metric implementation accumulates
-across them.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-
-from deeplab.evaluation import panoptic_quality
-from deeplab.evaluation import parsing_covering
-
-_EPSILON = 1e-10
-
-
-def _realdiv_maybe_zero(x, y):
- """Support tf.realdiv(x, y) where y may contain zeros."""
- return tf.where(tf.less(y, _EPSILON), tf.zeros_like(x), tf.realdiv(x, y))
-
-
-def _running_total(value, shape, name=None):
- """Maintains a running total of tensor `value` between calls."""
- with tf.variable_scope(name, 'running_total', [value]):
- total_var = tf.get_variable(
- 'total',
- shape,
- value.dtype,
- initializer=tf.zeros_initializer(),
- trainable=False,
- collections=[
- tf.GraphKeys.LOCAL_VARIABLES, tf.GraphKeys.METRIC_VARIABLES
- ])
- updated_total = tf.assign_add(total_var, value, use_locking=True)
-
- return total_var, updated_total
-
-
-def _panoptic_quality_helper(
- groundtruth_category_array, groundtruth_instance_array,
- predicted_category_array, predicted_instance_array, num_classes,
- max_instances_per_category, ignored_label, offset):
- """Helper function to compute panoptic quality."""
- pq = panoptic_quality.PanopticQuality(num_classes, ignored_label,
- max_instances_per_category, offset)
- pq.compare_and_accumulate(groundtruth_category_array,
- groundtruth_instance_array,
- predicted_category_array, predicted_instance_array)
- return pq.iou_per_class, pq.tp_per_class, pq.fn_per_class, pq.fp_per_class
-
-
-def streaming_panoptic_quality(groundtruth_categories,
- groundtruth_instances,
- predicted_categories,
- predicted_instances,
- num_classes,
- max_instances_per_category,
- ignored_label,
- offset,
- name=None):
- """Aggregates the panoptic metric across calls with different input tensors.
-
- See tf.metrics.* functions for comparable functionality and usage.
-
- Args:
- groundtruth_categories: A 2D uint16 tensor of groundtruth category labels.
- groundtruth_instances: A 2D uint16 tensor of groundtruth instance labels.
- predicted_categories: A 2D uint16 tensor of predicted category labels.
- predicted_instances: A 2D uint16 tensor of predicted instance labels.
- num_classes: Number of classes in the dataset as an integer.
- max_instances_per_category: The maximum number of instances for each class
- as an integer or integer tensor.
- ignored_label: The class id to be ignored in evaluation as an integer or
- integer tensor.
- offset: The maximum number of unique labels as an integer or integer tensor.
- name: An optional variable_scope name.
-
- Returns:
- qualities: A tensor of shape `[6, num_classes]`, where (1) panoptic quality,
- (2) segmentation quality, (3) recognition quality, (4) total_tp,
- (5) total_fn and (6) total_fp are saved in the respective rows.
- update_ops: List of operations that update the running overall panoptic
- quality.
-
- Raises:
- RuntimeError: If eager execution is enabled.
- """
- if tf.executing_eagerly():
- raise RuntimeError('Cannot aggregate when eager execution is enabled.')
-
- input_args = [
- tf.convert_to_tensor(groundtruth_categories, tf.uint16),
- tf.convert_to_tensor(groundtruth_instances, tf.uint16),
- tf.convert_to_tensor(predicted_categories, tf.uint16),
- tf.convert_to_tensor(predicted_instances, tf.uint16),
- tf.convert_to_tensor(num_classes, tf.int32),
- tf.convert_to_tensor(max_instances_per_category, tf.int32),
- tf.convert_to_tensor(ignored_label, tf.int32),
- tf.convert_to_tensor(offset, tf.int32),
- ]
- return_types = [
- tf.float64,
- tf.float64,
- tf.float64,
- tf.float64,
- ]
- with tf.variable_scope(name, 'streaming_panoptic_quality', input_args):
- panoptic_results = tf.py_func(
- _panoptic_quality_helper, input_args, return_types, stateful=False)
- iou, tp, fn, fp = tuple(panoptic_results)
-
- total_iou, updated_iou = _running_total(
- iou, [num_classes], name='iou_total')
- total_tp, updated_tp = _running_total(tp, [num_classes], name='tp_total')
- total_fn, updated_fn = _running_total(fn, [num_classes], name='fn_total')
- total_fp, updated_fp = _running_total(fp, [num_classes], name='fp_total')
- update_ops = [updated_iou, updated_tp, updated_fn, updated_fp]
-
- sq = _realdiv_maybe_zero(total_iou, total_tp)
- rq = _realdiv_maybe_zero(total_tp,
- total_tp + 0.5 * total_fn + 0.5 * total_fp)
- pq = tf.multiply(sq, rq)
- qualities = tf.stack([pq, sq, rq, total_tp, total_fn, total_fp], axis=0)
- return qualities, update_ops
-
-
-def _parsing_covering_helper(
- groundtruth_category_array, groundtruth_instance_array,
- predicted_category_array, predicted_instance_array, num_classes,
- max_instances_per_category, ignored_label, offset, normalize_by_image_size):
- """Helper function to compute parsing covering."""
- pc = parsing_covering.ParsingCovering(num_classes, ignored_label,
- max_instances_per_category, offset,
- normalize_by_image_size)
- pc.compare_and_accumulate(groundtruth_category_array,
- groundtruth_instance_array,
- predicted_category_array, predicted_instance_array)
- return pc.weighted_iou_per_class, pc.gt_area_per_class
-
-
-def streaming_parsing_covering(groundtruth_categories,
- groundtruth_instances,
- predicted_categories,
- predicted_instances,
- num_classes,
- max_instances_per_category,
- ignored_label,
- offset,
- normalize_by_image_size=True,
- name=None):
- """Aggregates the covering across calls with different input tensors.
-
- See tf.metrics.* functions for comparable functionality and usage.
-
- Args:
- groundtruth_categories: A 2D uint16 tensor of groundtruth category labels.
- groundtruth_instances: A 2D uint16 tensor of groundtruth instance labels.
- predicted_categories: A 2D uint16 tensor of predicted category labels.
- predicted_instances: A 2D uint16 tensor of predicted instance labels.
- num_classes: Number of classes in the dataset as an integer.
- max_instances_per_category: The maximum number of instances for each class
- as an integer or integer tensor.
- ignored_label: The class id to be ignored in evaluation as an integer or
- integer tensor.
- offset: The maximum number of unique labels as an integer or integer tensor.
- normalize_by_image_size: Whether to normalize groundtruth region areas by
- image size. If True, groundtruth instance areas and weighted IoUs will be
- divided by the size of the corresponding image before accumulated across
- the dataset.
- name: An optional variable_scope name.
-
- Returns:
- coverings: A tensor of shape `[3, num_classes]`, where (1) per class
- coverings, (2) per class sum of weighted IoUs, and (3) per class sum of
- groundtruth region areas are saved in the respective rows.
- update_ops: List of operations that update the running overall parsing
- covering.
-
- Raises:
- RuntimeError: If eager execution is enabled.
- """
- if tf.executing_eagerly():
- raise RuntimeError('Cannot aggregate when eager execution is enabled.')
-
- input_args = [
- tf.convert_to_tensor(groundtruth_categories, tf.uint16),
- tf.convert_to_tensor(groundtruth_instances, tf.uint16),
- tf.convert_to_tensor(predicted_categories, tf.uint16),
- tf.convert_to_tensor(predicted_instances, tf.uint16),
- tf.convert_to_tensor(num_classes, tf.int32),
- tf.convert_to_tensor(max_instances_per_category, tf.int32),
- tf.convert_to_tensor(ignored_label, tf.int32),
- tf.convert_to_tensor(offset, tf.int32),
- tf.convert_to_tensor(normalize_by_image_size, tf.bool),
- ]
- return_types = [
- tf.float64,
- tf.float64,
- ]
- with tf.variable_scope(name, 'streaming_parsing_covering', input_args):
- covering_results = tf.py_func(
- _parsing_covering_helper, input_args, return_types, stateful=False)
- weighted_iou_per_class, gt_area_per_class = tuple(covering_results)
-
- total_weighted_iou_per_class, updated_weighted_iou_per_class = (
- _running_total(
- weighted_iou_per_class, [num_classes],
- name='weighted_iou_per_class_total'))
- total_gt_area_per_class, updated_gt_area_per_class = _running_total(
- gt_area_per_class, [num_classes], name='gt_area_per_class_total')
-
- covering_per_class = _realdiv_maybe_zero(total_weighted_iou_per_class,
- total_gt_area_per_class)
- coverings = tf.stack([
- covering_per_class,
- total_weighted_iou_per_class,
- total_gt_area_per_class,
- ],
- axis=0)
- update_ops = [updated_weighted_iou_per_class, updated_gt_area_per_class]
-
- return coverings, update_ops
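A hedged sketch of how the streaming ops above are intended to be driven under TF1 graph mode (placeholders plus a session loop, as in the tests that follow). The `examples` iterable is a stand-in for whatever yields the four per-image label maps; it is not part of the deleted module.

```python
import tensorflow as tf

from deeplab.evaluation import streaming_metrics

gt_class = tf.placeholder(tf.uint16)
gt_instance = tf.placeholder(tf.uint16)
pred_class = tf.placeholder(tf.uint16)
pred_instance = tf.placeholder(tf.uint16)

qualities, update_ops = streaming_metrics.streaming_panoptic_quality(
    gt_class, gt_instance, pred_class, pred_instance,
    num_classes=3, max_instances_per_category=256,
    ignored_label=0, offset=256 * 256)
pq, sq, rq, total_tp, total_fn, total_fp = tf.unstack(qualities, 6, axis=0)

with tf.Session() as sess:
  sess.run(tf.local_variables_initializer())
  for example in examples:  # any iterable of per-image numpy label maps
    sess.run(update_ops, feed_dict={
        gt_class: example['gt_class'],
        gt_instance: example['gt_instance'],
        pred_class: example['pred_class'],
        pred_instance: example['pred_instance'],
    })
  # The metric variables behind `qualities` now hold dataset-level totals.
  print(sess.run([pq, sq, rq]))
```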
diff --git a/research/deeplab/evaluation/streaming_metrics_test.py b/research/deeplab/evaluation/streaming_metrics_test.py
deleted file mode 100644
index 656007e6238..00000000000
--- a/research/deeplab/evaluation/streaming_metrics_test.py
+++ /dev/null
@@ -1,549 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for segmentation "streaming" metrics."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import collections
-
-
-
-import numpy as np
-import six
-import tensorflow as tf
-
-from deeplab.evaluation import streaming_metrics
-from deeplab.evaluation import test_utils
-
-# See the definition of the color names at:
-# https://en.wikipedia.org/wiki/Web_colors.
-_CLASS_COLOR_MAP = {
- (0, 0, 0): 0,
- (0, 0, 255): 1, # Person (blue).
- (255, 0, 0): 2, # Bear (red).
- (0, 255, 0): 3, # Tree (lime).
- (255, 0, 255): 4, # Bird (fuchsia).
- (0, 255, 255): 5, # Sky (aqua).
- (255, 255, 0): 6, # Cat (yellow).
-}
-
-
-class StreamingPanopticQualityTest(tf.test.TestCase):
-
- def test_streaming_metric_on_single_image(self):
- offset = 256 * 256
-
- instance_class_map = {
- 0: 0,
- 47: 1,
- 97: 1,
- 133: 1,
- 150: 1,
- 174: 1,
- 198: 2,
- 215: 1,
- 244: 1,
- 255: 1,
- }
- gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
- 'team_gt_instance.png', instance_class_map)
-
- pred_classes = test_utils.read_segmentation_with_rgb_color_map(
- 'team_pred_class.png', _CLASS_COLOR_MAP)
- pred_instances = test_utils.read_test_image(
- 'team_pred_instance.png', mode='L')
-
- gt_class_tensor = tf.placeholder(tf.uint16)
- gt_instance_tensor = tf.placeholder(tf.uint16)
- pred_class_tensor = tf.placeholder(tf.uint16)
- pred_instance_tensor = tf.placeholder(tf.uint16)
- qualities, update_pq = streaming_metrics.streaming_panoptic_quality(
- gt_class_tensor,
- gt_instance_tensor,
- pred_class_tensor,
- pred_instance_tensor,
- num_classes=3,
- max_instances_per_category=256,
- ignored_label=0,
- offset=offset)
- pq, sq, rq, total_tp, total_fn, total_fp = tf.unstack(qualities, 6, axis=0)
- feed_dict = {
- gt_class_tensor: gt_classes,
- gt_instance_tensor: gt_instances,
- pred_class_tensor: pred_classes,
- pred_instance_tensor: pred_instances
- }
-
- with self.session() as sess:
- sess.run(tf.local_variables_initializer())
- sess.run(update_pq, feed_dict=feed_dict)
- (result_pq, result_sq, result_rq, result_total_tp, result_total_fn,
- result_total_fp) = sess.run([pq, sq, rq, total_tp, total_fn, total_fp],
- feed_dict=feed_dict)
- np.testing.assert_array_almost_equal(
- result_pq, [2.06104, 0.7024, 0.54069], decimal=4)
- np.testing.assert_array_almost_equal(
- result_sq, [2.06104, 0.7526, 0.54069], decimal=4)
- np.testing.assert_array_almost_equal(result_rq, [1., 0.9333, 1.], decimal=4)
- np.testing.assert_array_almost_equal(
- result_total_tp, [1., 7., 1.], decimal=4)
- np.testing.assert_array_almost_equal(
- result_total_fn, [0., 1., 0.], decimal=4)
- np.testing.assert_array_almost_equal(
- result_total_fp, [0., 0., 0.], decimal=4)
-
- def test_streaming_metric_on_multiple_images(self):
- num_classes = 7
- offset = 256 * 256
-
- bird_gt_instance_class_map = {
- 92: 5,
- 176: 3,
- 255: 4,
- }
- cat_gt_instance_class_map = {
- 0: 0,
- 255: 6,
- }
- team_gt_instance_class_map = {
- 0: 0,
- 47: 1,
- 97: 1,
- 133: 1,
- 150: 1,
- 174: 1,
- 198: 2,
- 215: 1,
- 244: 1,
- 255: 1,
- }
- test_image = collections.namedtuple(
- 'TestImage',
- ['gt_class_map', 'gt_path', 'pred_inst_path', 'pred_class_path'])
- test_images = [
- test_image(bird_gt_instance_class_map, 'bird_gt.png',
- 'bird_pred_instance.png', 'bird_pred_class.png'),
- test_image(cat_gt_instance_class_map, 'cat_gt.png',
- 'cat_pred_instance.png', 'cat_pred_class.png'),
- test_image(team_gt_instance_class_map, 'team_gt_instance.png',
- 'team_pred_instance.png', 'team_pred_class.png'),
- ]
-
- gt_classes = []
- gt_instances = []
- pred_classes = []
- pred_instances = []
- for test_image in test_images:
- (image_gt_instances,
- image_gt_classes) = test_utils.panoptic_segmentation_with_class_map(
- test_image.gt_path, test_image.gt_class_map)
- gt_classes.append(image_gt_classes)
- gt_instances.append(image_gt_instances)
-
- pred_classes.append(
- test_utils.read_segmentation_with_rgb_color_map(
- test_image.pred_class_path, _CLASS_COLOR_MAP))
- pred_instances.append(
- test_utils.read_test_image(test_image.pred_inst_path, mode='L'))
-
- gt_class_tensor = tf.placeholder(tf.uint16)
- gt_instance_tensor = tf.placeholder(tf.uint16)
- pred_class_tensor = tf.placeholder(tf.uint16)
- pred_instance_tensor = tf.placeholder(tf.uint16)
- qualities, update_pq = streaming_metrics.streaming_panoptic_quality(
- gt_class_tensor,
- gt_instance_tensor,
- pred_class_tensor,
- pred_instance_tensor,
- num_classes=num_classes,
- max_instances_per_category=256,
- ignored_label=0,
- offset=offset)
- pq, sq, rq, total_tp, total_fn, total_fp = tf.unstack(qualities, 6, axis=0)
- with self.session() as sess:
- sess.run(tf.local_variables_initializer())
- for pred_class, pred_instance, gt_class, gt_instance in six.moves.zip(
- pred_classes, pred_instances, gt_classes, gt_instances):
- sess.run(
- update_pq,
- feed_dict={
- gt_class_tensor: gt_class,
- gt_instance_tensor: gt_instance,
- pred_class_tensor: pred_class,
- pred_instance_tensor: pred_instance
- })
- (result_pq, result_sq, result_rq, result_total_tp, result_total_fn,
- result_total_fp) = sess.run(
- [pq, sq, rq, total_tp, total_fn, total_fp],
- feed_dict={
- gt_class_tensor: 0,
- gt_instance_tensor: 0,
- pred_class_tensor: 0,
- pred_instance_tensor: 0
- })
- np.testing.assert_array_almost_equal(
- result_pq,
- [4.3107, 0.7024, 0.54069, 0.745353, 0.85768, 0.99107, 0.77410],
- decimal=4)
- np.testing.assert_array_almost_equal(
- result_sq, [5.3883, 0.7526, 0.5407, 0.7454, 0.8577, 0.9911, 0.7741],
- decimal=4)
- np.testing.assert_array_almost_equal(
- result_rq, [0.8, 0.9333, 1., 1., 1., 1., 1.], decimal=4)
- np.testing.assert_array_almost_equal(
- result_total_tp, [2., 7., 1., 1., 1., 1., 1.], decimal=4)
- np.testing.assert_array_almost_equal(
- result_total_fn, [0., 1., 0., 0., 0., 0., 0.], decimal=4)
- np.testing.assert_array_almost_equal(
- result_total_fp, [1., 0., 0., 0., 0., 0., 0.], decimal=4)
-
-
-class StreamingParsingCoveringTest(tf.test.TestCase):
-
- def test_streaming_metric_on_single_image(self):
- offset = 256 * 256
-
- instance_class_map = {
- 0: 0,
- 47: 1,
- 97: 1,
- 133: 1,
- 150: 1,
- 174: 1,
- 198: 2,
- 215: 1,
- 244: 1,
- 255: 1,
- }
- gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
- 'team_gt_instance.png', instance_class_map)
-
- pred_classes = test_utils.read_segmentation_with_rgb_color_map(
- 'team_pred_class.png', _CLASS_COLOR_MAP)
- pred_instances = test_utils.read_test_image(
- 'team_pred_instance.png', mode='L')
-
- gt_class_tensor = tf.placeholder(tf.uint16)
- gt_instance_tensor = tf.placeholder(tf.uint16)
- pred_class_tensor = tf.placeholder(tf.uint16)
- pred_instance_tensor = tf.placeholder(tf.uint16)
- coverings, update_ops = streaming_metrics.streaming_parsing_covering(
- gt_class_tensor,
- gt_instance_tensor,
- pred_class_tensor,
- pred_instance_tensor,
- num_classes=3,
- max_instances_per_category=256,
- ignored_label=0,
- offset=offset,
- normalize_by_image_size=False)
- (per_class_coverings, per_class_weighted_ious, per_class_gt_areas) = (
- tf.unstack(coverings, num=3, axis=0))
- feed_dict = {
- gt_class_tensor: gt_classes,
- gt_instance_tensor: gt_instances,
- pred_class_tensor: pred_classes,
- pred_instance_tensor: pred_instances
- }
-
- with self.session() as sess:
- sess.run(tf.local_variables_initializer())
- sess.run(update_ops, feed_dict=feed_dict)
- (result_per_class_coverings, result_per_class_weighted_ious,
- result_per_class_gt_areas) = (
- sess.run([
- per_class_coverings,
- per_class_weighted_ious,
- per_class_gt_areas,
- ],
- feed_dict=feed_dict))
-
- np.testing.assert_array_almost_equal(
- result_per_class_coverings, [0.0, 0.7009696912, 0.5406896552],
- decimal=4)
- np.testing.assert_array_almost_equal(
- result_per_class_weighted_ious, [0.0, 39864.14634, 3136], decimal=4)
- np.testing.assert_array_equal(result_per_class_gt_areas, [0, 56870, 5800])
-
- def test_streaming_metric_on_multiple_images(self):
- """Tests streaming parsing covering metric."""
- num_classes = 7
- offset = 256 * 256
-
- bird_gt_instance_class_map = {
- 92: 5,
- 176: 3,
- 255: 4,
- }
- cat_gt_instance_class_map = {
- 0: 0,
- 255: 6,
- }
- team_gt_instance_class_map = {
- 0: 0,
- 47: 1,
- 97: 1,
- 133: 1,
- 150: 1,
- 174: 1,
- 198: 2,
- 215: 1,
- 244: 1,
- 255: 1,
- }
- test_image = collections.namedtuple(
- 'TestImage',
- ['gt_class_map', 'gt_path', 'pred_inst_path', 'pred_class_path'])
- test_images = [
- test_image(bird_gt_instance_class_map, 'bird_gt.png',
- 'bird_pred_instance.png', 'bird_pred_class.png'),
- test_image(cat_gt_instance_class_map, 'cat_gt.png',
- 'cat_pred_instance.png', 'cat_pred_class.png'),
- test_image(team_gt_instance_class_map, 'team_gt_instance.png',
- 'team_pred_instance.png', 'team_pred_class.png'),
- ]
-
- gt_classes = []
- gt_instances = []
- pred_classes = []
- pred_instances = []
- for test_image in test_images:
- (image_gt_instances,
- image_gt_classes) = test_utils.panoptic_segmentation_with_class_map(
- test_image.gt_path, test_image.gt_class_map)
- gt_classes.append(image_gt_classes)
- gt_instances.append(image_gt_instances)
-
- pred_instances.append(
- test_utils.read_test_image(test_image.pred_inst_path, mode='L'))
- pred_classes.append(
- test_utils.read_segmentation_with_rgb_color_map(
- test_image.pred_class_path, _CLASS_COLOR_MAP))
-
- gt_class_tensor = tf.placeholder(tf.uint16)
- gt_instance_tensor = tf.placeholder(tf.uint16)
- pred_class_tensor = tf.placeholder(tf.uint16)
- pred_instance_tensor = tf.placeholder(tf.uint16)
- coverings, update_ops = streaming_metrics.streaming_parsing_covering(
- gt_class_tensor,
- gt_instance_tensor,
- pred_class_tensor,
- pred_instance_tensor,
- num_classes=num_classes,
- max_instances_per_category=256,
- ignored_label=0,
- offset=offset,
- normalize_by_image_size=False)
- (per_class_coverings, per_class_weighted_ious, per_class_gt_areas) = (
- tf.unstack(coverings, num=3, axis=0))
-
- with self.session() as sess:
- sess.run(tf.local_variables_initializer())
- for pred_class, pred_instance, gt_class, gt_instance in six.moves.zip(
- pred_classes, pred_instances, gt_classes, gt_instances):
- sess.run(
- update_ops,
- feed_dict={
- gt_class_tensor: gt_class,
- gt_instance_tensor: gt_instance,
- pred_class_tensor: pred_class,
- pred_instance_tensor: pred_instance
- })
- (result_per_class_coverings, result_per_class_weighted_ious,
- result_per_class_gt_areas) = (
- sess.run(
- [
- per_class_coverings,
- per_class_weighted_ious,
- per_class_gt_areas,
- ],
- feed_dict={
- gt_class_tensor: 0,
- gt_instance_tensor: 0,
- pred_class_tensor: 0,
- pred_instance_tensor: 0
- }))
-
- np.testing.assert_array_almost_equal(
- result_per_class_coverings, [
- 0.0,
- 0.7009696912,
- 0.5406896552,
- 0.7453531599,
- 0.8576779026,
- 0.9910687881,
- 0.7741046032,
- ],
- decimal=4)
- np.testing.assert_array_almost_equal(
- result_per_class_weighted_ious, [
- 0.0,
- 39864.14634,
- 3136,
- 1177.657993,
- 2498.41573,
- 33366.31289,
- 26671,
- ],
- decimal=4)
- np.testing.assert_array_equal(result_per_class_gt_areas, [
- 0.0,
- 56870,
- 5800,
- 1580,
- 2913,
- 33667,
- 34454,
- ])
-
- def test_streaming_metric_on_multiple_images_normalize_by_size(self):
- """Tests streaming parsing covering metric with image size normalization."""
- num_classes = 7
- offset = 256 * 256
-
- bird_gt_instance_class_map = {
- 92: 5,
- 176: 3,
- 255: 4,
- }
- cat_gt_instance_class_map = {
- 0: 0,
- 255: 6,
- }
- team_gt_instance_class_map = {
- 0: 0,
- 47: 1,
- 97: 1,
- 133: 1,
- 150: 1,
- 174: 1,
- 198: 2,
- 215: 1,
- 244: 1,
- 255: 1,
- }
- test_image = collections.namedtuple(
- 'TestImage',
- ['gt_class_map', 'gt_path', 'pred_inst_path', 'pred_class_path'])
- test_images = [
- test_image(bird_gt_instance_class_map, 'bird_gt.png',
- 'bird_pred_instance.png', 'bird_pred_class.png'),
- test_image(cat_gt_instance_class_map, 'cat_gt.png',
- 'cat_pred_instance.png', 'cat_pred_class.png'),
- test_image(team_gt_instance_class_map, 'team_gt_instance.png',
- 'team_pred_instance.png', 'team_pred_class.png'),
- ]
-
- gt_classes = []
- gt_instances = []
- pred_classes = []
- pred_instances = []
- for test_image in test_images:
- (image_gt_instances,
- image_gt_classes) = test_utils.panoptic_segmentation_with_class_map(
- test_image.gt_path, test_image.gt_class_map)
- gt_classes.append(image_gt_classes)
- gt_instances.append(image_gt_instances)
-
- pred_instances.append(
- test_utils.read_test_image(test_image.pred_inst_path, mode='L'))
- pred_classes.append(
- test_utils.read_segmentation_with_rgb_color_map(
- test_image.pred_class_path, _CLASS_COLOR_MAP))
-
- gt_class_tensor = tf.placeholder(tf.uint16)
- gt_instance_tensor = tf.placeholder(tf.uint16)
- pred_class_tensor = tf.placeholder(tf.uint16)
- pred_instance_tensor = tf.placeholder(tf.uint16)
- coverings, update_ops = streaming_metrics.streaming_parsing_covering(
- gt_class_tensor,
- gt_instance_tensor,
- pred_class_tensor,
- pred_instance_tensor,
- num_classes=num_classes,
- max_instances_per_category=256,
- ignored_label=0,
- offset=offset,
- normalize_by_image_size=True)
- (per_class_coverings, per_class_weighted_ious, per_class_gt_areas) = (
- tf.unstack(coverings, num=3, axis=0))
-
- with self.session() as sess:
- sess.run(tf.local_variables_initializer())
- for pred_class, pred_instance, gt_class, gt_instance in six.moves.zip(
- pred_classes, pred_instances, gt_classes, gt_instances):
- sess.run(
- update_ops,
- feed_dict={
- gt_class_tensor: gt_class,
- gt_instance_tensor: gt_instance,
- pred_class_tensor: pred_class,
- pred_instance_tensor: pred_instance
- })
- (result_per_class_coverings, result_per_class_weighted_ious,
- result_per_class_gt_areas) = (
- sess.run(
- [
- per_class_coverings,
- per_class_weighted_ious,
- per_class_gt_areas,
- ],
- feed_dict={
- gt_class_tensor: 0,
- gt_instance_tensor: 0,
- pred_class_tensor: 0,
- pred_instance_tensor: 0
- }))
-
- np.testing.assert_array_almost_equal(
- result_per_class_coverings, [
- 0.0,
- 0.7009696912,
- 0.5406896552,
- 0.7453531599,
- 0.8576779026,
- 0.9910687881,
- 0.7741046032,
- ],
- decimal=4)
- np.testing.assert_array_almost_equal(
- result_per_class_weighted_ious, [
- 0.0,
- 0.5002088756,
- 0.03935002196,
- 0.03086105851,
- 0.06547211033,
- 0.8743792686,
- 0.2549565051,
- ],
- decimal=4)
- np.testing.assert_array_almost_equal(
- result_per_class_gt_areas, [
- 0.0,
- 0.7135955832,
- 0.07277746408,
- 0.04140461216,
- 0.07633647799,
- 0.8822589099,
- 0.3293566581,
- ],
- decimal=4)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/deeplab/evaluation/test_utils.py b/research/deeplab/evaluation/test_utils.py
deleted file mode 100644
index 9ad4f551271..00000000000
--- a/research/deeplab/evaluation/test_utils.py
+++ /dev/null
@@ -1,119 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Utility functions to set up unit tests on Panoptic Segmentation code."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-
-
-from absl import flags
-import numpy as np
-import scipy.misc
-import six
-from six.moves import map
-
-FLAGS = flags.FLAGS
-
-_TEST_DIR = 'deeplab/evaluation/testdata'
-
-
-def read_test_image(testdata_path, *args, **kwargs):
- """Loads a test image.
-
- Args:
- testdata_path: Image path, as a string, relative to the
- deeplab/evaluation/testdata directory.
- *args: Additional positional arguments passed to `imread`.
- **kwargs: Additional keyword arguments passed to `imread`.
-
- Returns:
- The image, as a numpy array.
- """
- image_path = os.path.join(_TEST_DIR, testdata_path)
- return scipy.misc.imread(image_path, *args, **kwargs)
-
-
-def read_segmentation_with_rgb_color_map(image_testdata_path,
- rgb_to_semantic_label,
- output_dtype=None):
- """Reads a test segmentation as an image and a map from colors to labels.
-
- Args:
- image_testdata_path: Image path, as a string, relative to the
- deeplab/evaluation/testdata directory.
- rgb_to_semantic_label: Mapping from RGB colors to integer labels as a
- dictionary.
- output_dtype: Type of the output labels. If None, defaults to the type of
- the provided color map.
-
- Returns:
- A 2D numpy array of labels.
-
- Raises:
- ValueError: On an incomplete `rgb_to_semantic_label`.
- """
- rgb_image = read_test_image(image_testdata_path, mode='RGB')
- if len(rgb_image.shape) != 3 or rgb_image.shape[2] != 3:
- raise AssertionError(
- 'Expected RGB image, actual shape is %s' % rgb_image.shape)
-
- num_pixels = rgb_image.shape[0] * rgb_image.shape[1]
- unique_colors = np.unique(np.reshape(rgb_image, [num_pixels, 3]), axis=0)
- if not set(map(tuple, unique_colors)).issubset(
- six.viewkeys(rgb_to_semantic_label)):
- raise ValueError('RGB image has colors not in color map.')
-
- output_dtype = output_dtype or type(
- next(six.itervalues(rgb_to_semantic_label)))
- output_labels = np.empty(rgb_image.shape[:2], dtype=output_dtype)
- for rgb_color, int_label in six.iteritems(rgb_to_semantic_label):
- color_array = np.array(rgb_color, ndmin=3)
- output_labels[np.all(rgb_image == color_array, axis=2)] = int_label
- return output_labels
-
-
-def panoptic_segmentation_with_class_map(instance_testdata_path,
- instance_label_to_semantic_label):
- """Reads in a panoptic segmentation with an instance map and a map to classes.
-
- Args:
- instance_testdata_path: Path to a grayscale instance map, given as a string
- and relative to deeplab/evaluation/testdata.
- instance_label_to_semantic_label: A map from instance labels to class
- labels.
-
- Returns:
- A tuple `(instance_labels, class_labels)` of numpy arrays.
-
- Raises:
- ValueError: On a mismatch between the instance ids present in the image
- and the keys of `instance_label_to_semantic_label`.
- """
- instance_labels = read_test_image(instance_testdata_path, mode='L')
- if set(np.unique(instance_labels)) != set(
- six.iterkeys(instance_label_to_semantic_label)):
- raise ValueError('Provided class map does not match present instance ids.')
-
- class_labels = np.empty_like(instance_labels)
- for instance_id, class_id in six.iteritems(instance_label_to_semantic_label):
- class_labels[instance_labels == instance_id] = class_id
-
- return instance_labels, class_labels
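Taken together, the helpers above produce matching (class, instance) label maps for the metrics; the sketch below mirrors the deleted tests, with file names and the instance-to-class map taken from the testdata described further below.

```python
from deeplab.evaluation import test_utils

# RGB color -> semantic class id for the predicted class map.
class_color_map = {(0, 0, 0): 0, (0, 0, 255): 1, (255, 0, 0): 2}
pred_classes = test_utils.read_segmentation_with_rgb_color_map(
    'team_pred_class.png', class_color_map)
pred_instances = test_utils.read_test_image('team_pred_instance.png', mode='L')

# Grayscale instance id -> semantic class id for the groundtruth.
instance_class_map = {
    0: 0, 47: 1, 97: 1, 133: 1, 150: 1, 174: 1, 198: 2, 215: 1, 244: 1, 255: 1,
}
gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
    'team_gt_instance.png', instance_class_map)
```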
diff --git a/research/deeplab/evaluation/test_utils_test.py b/research/deeplab/evaluation/test_utils_test.py
deleted file mode 100644
index 9e9bed37e4b..00000000000
--- a/research/deeplab/evaluation/test_utils_test.py
+++ /dev/null
@@ -1,74 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for test_utils."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-
-
-from absl.testing import absltest
-import numpy as np
-
-from deeplab.evaluation import test_utils
-
-
-class TestUtilsTest(absltest.TestCase):
-
- def test_read_test_image(self):
- image_array = test_utils.read_test_image('team_pred_class.png')
- self.assertSequenceEqual(image_array.shape, (231, 345, 4))
-
- def test_reads_segmentation_with_color_map(self):
- rgb_to_semantic_label = {(0, 0, 0): 0, (0, 0, 255): 1, (255, 0, 0): 23}
- labels = test_utils.read_segmentation_with_rgb_color_map(
- 'team_pred_class.png', rgb_to_semantic_label)
-
- input_image = test_utils.read_test_image('team_pred_class.png')
- np.testing.assert_array_equal(
- labels == 0,
- np.logical_and(input_image[:, :, 0] == 0, input_image[:, :, 2] == 0))
- np.testing.assert_array_equal(labels == 1, input_image[:, :, 2] == 255)
- np.testing.assert_array_equal(labels == 23, input_image[:, :, 0] == 255)
-
- def test_reads_gt_segmentation(self):
- instance_label_to_semantic_label = {
- 0: 0,
- 47: 1,
- 97: 1,
- 133: 1,
- 150: 1,
- 174: 1,
- 198: 23,
- 215: 1,
- 244: 1,
- 255: 1,
- }
- instances, classes = test_utils.panoptic_segmentation_with_class_map(
- 'team_gt_instance.png', instance_label_to_semantic_label)
-
- expected_label_shape = (231, 345)
- self.assertSequenceEqual(instances.shape, expected_label_shape)
- self.assertSequenceEqual(classes.shape, expected_label_shape)
- np.testing.assert_array_equal(instances == 0, classes == 0)
- np.testing.assert_array_equal(instances == 198, classes == 23)
- np.testing.assert_array_equal(
- np.logical_and(instances != 0, instances != 198), classes == 1)
-
-
-if __name__ == '__main__':
- absltest.main()
diff --git a/research/deeplab/evaluation/testdata/README.md b/research/deeplab/evaluation/testdata/README.md
deleted file mode 100644
index 711b4767de8..00000000000
--- a/research/deeplab/evaluation/testdata/README.md
+++ /dev/null
@@ -1,14 +0,0 @@
-# Segmentation Evaluation Test Data
-
-## Source Images
-
-* [team_input.png](team_input.png) \
- Source:
- https://ai.googleblog.com/2018/03/semantic-image-segmentation-with.html
-* [cat_input.jpg](cat_input.jpg) \
- Source: https://www.flickr.com/photos/magdalena_b/4995858743
-* [bird_input.jpg](bird_input.jpg) \
- Source: https://www.flickr.com/photos/chivinskia/40619099560
-* [congress_input.jpg](congress_input.jpg) \
- Source:
- https://cao.house.gov/sites/cao.house.gov/files/documents/SAR-Jan-Jun-2016.pdf
diff --git a/research/deeplab/evaluation/testdata/bird_gt.png b/research/deeplab/evaluation/testdata/bird_gt.png
deleted file mode 100644
index 05d854915d1..00000000000
Binary files a/research/deeplab/evaluation/testdata/bird_gt.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/bird_pred_class.png b/research/deeplab/evaluation/testdata/bird_pred_class.png
deleted file mode 100644
index 07351bf0611..00000000000
Binary files a/research/deeplab/evaluation/testdata/bird_pred_class.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/bird_pred_instance.png b/research/deeplab/evaluation/testdata/bird_pred_instance.png
deleted file mode 100644
index faa1371f525..00000000000
Binary files a/research/deeplab/evaluation/testdata/bird_pred_instance.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/cat_gt.png b/research/deeplab/evaluation/testdata/cat_gt.png
deleted file mode 100644
index 41f60111f3d..00000000000
Binary files a/research/deeplab/evaluation/testdata/cat_gt.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/cat_pred_class.png b/research/deeplab/evaluation/testdata/cat_pred_class.png
deleted file mode 100644
index 3728c68ced2..00000000000
Binary files a/research/deeplab/evaluation/testdata/cat_pred_class.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/cat_pred_instance.png b/research/deeplab/evaluation/testdata/cat_pred_instance.png
deleted file mode 100644
index ebd9ba4855f..00000000000
Binary files a/research/deeplab/evaluation/testdata/cat_pred_instance.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/coco_gt.json b/research/deeplab/evaluation/testdata/coco_gt.json
deleted file mode 100644
index 5f79bf18433..00000000000
--- a/research/deeplab/evaluation/testdata/coco_gt.json
+++ /dev/null
@@ -1,214 +0,0 @@
-{
- "info": {
- "description": "Test COCO-format dataset",
- "url": "https://github.com/tensorflow/models/tree/master/research/deeplab",
- "version": "1.0",
- "year": 2019
- },
- "images": [
- {
- "id": 1,
- "file_name": "bird.jpg",
- "height": 159,
- "width": 240,
- "flickr_url": "https://www.flickr.com/photos/chivinskia/40619099560"
- },
- {
- "id": 2,
- "file_name": "cat.jpg",
- "height": 330,
- "width": 317,
- "flickr_url": "https://www.flickr.com/photos/magdalena_b/4995858743"
- },
- {
- "id": 3,
- "file_name": "team.jpg",
- "height": 231,
- "width": 345
- },
- {
- "id": 4,
- "file_name": "congress.jpg",
- "height": 267,
- "width": 525
- }
- ],
- "annotations": [
- {
- "image_id": 1,
- "file_name": "bird.png",
- "segments_info": [
- {
- "id": 255,
- "area": 2913,
- "category_id": 4,
- "iscrowd": 0
- },
- {
- "id": 2586368,
- "area": 1580,
- "category_id": 3,
- "iscrowd": 0
- },
- {
- "id": 16770360,
- "area": 33667,
- "category_id": 5,
- "iscrowd": 0
- }
- ]
- },
- {
- "image_id": 2,
- "file_name": "cat.png",
- "segments_info": [
- {
- "id": 16711691,
- "area": 34454,
- "category_id": 6,
- "iscrowd": 0
- }
- ]
- },
- {
- "image_id": 3,
- "file_name": "team.png",
- "segments_info": [
- {
- "id": 129,
- "area": 5443,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 255,
- "area": 3574,
- "category_id": 2,
- "iscrowd": 0
- },
- {
- "id": 47615,
- "area": 11483,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 65532,
- "area": 7080,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 8585107,
- "area": 11363,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 9011200,
- "area": 7158,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 12858027,
- "area": 6419,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 16053492,
- "area": 4350,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 16711680,
- "area": 5800,
- "category_id": 1,
- "iscrowd": 0
- }
- ]
- },
- {
- "image_id": 4,
- "file_name": "congress.png",
- "segments_info": [
- {
- "id": 255,
- "area": 243,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 65315,
- "area": 553,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 65516,
- "area": 652,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 9895680,
- "area": 82774,
- "category_id": 1,
- "iscrowd": 1
- },
- {
- "id": 16711739,
- "area": 137,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 16711868,
- "area": 179,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 16762624,
- "area": 2742,
- "category_id": 1,
- "iscrowd": 0
- }
- ]
- }
- ],
- "categories": [
- {
- "id": 1,
- "name": "person",
- "isthing": 1
- },
- {
- "id": 2,
- "name": "umbrella",
- "isthing": 1
- },
- {
- "id": 3,
- "name": "tree-merged",
- "isthing": 0
- },
- {
- "id": 4,
- "name": "bird",
- "isthing": 1
- },
- {
- "id": 5,
- "name": "sky",
- "isthing": 0
- },
- {
- "id": 6,
- "name": "cat",
- "isthing": 1
- }
- ]
-}
diff --git a/research/deeplab/evaluation/testdata/coco_gt/bird.png b/research/deeplab/evaluation/testdata/coco_gt/bird.png
deleted file mode 100644
index 9ef4ad95041..00000000000
Binary files a/research/deeplab/evaluation/testdata/coco_gt/bird.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/coco_gt/cat.png b/research/deeplab/evaluation/testdata/coco_gt/cat.png
deleted file mode 100644
index cb02530f2f9..00000000000
Binary files a/research/deeplab/evaluation/testdata/coco_gt/cat.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/coco_gt/congress.png b/research/deeplab/evaluation/testdata/coco_gt/congress.png
deleted file mode 100644
index a56b98d3361..00000000000
Binary files a/research/deeplab/evaluation/testdata/coco_gt/congress.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/coco_gt/team.png b/research/deeplab/evaluation/testdata/coco_gt/team.png
deleted file mode 100644
index bde358d151a..00000000000
Binary files a/research/deeplab/evaluation/testdata/coco_gt/team.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/coco_pred.json b/research/deeplab/evaluation/testdata/coco_pred.json
deleted file mode 100644
index 4aead17a65d..00000000000
--- a/research/deeplab/evaluation/testdata/coco_pred.json
+++ /dev/null
@@ -1,208 +0,0 @@
-{
- "info": {
- "description": "Test COCO-format dataset",
- "url": "https://github.com/tensorflow/models/tree/master/research/deeplab",
- "version": "1.0",
- "year": 2019
- },
- "images": [
- {
- "id": 1,
- "file_name": "bird.jpg",
- "height": 159,
- "width": 240,
- "flickr_url": "https://www.flickr.com/photos/chivinskia/40619099560"
- },
- {
- "id": 2,
- "file_name": "cat.jpg",
- "height": 330,
- "width": 317,
- "flickr_url": "https://www.flickr.com/photos/magdalena_b/4995858743"
- },
- {
- "id": 3,
- "file_name": "team.jpg",
- "height": 231,
- "width": 345
- },
- {
- "id": 4,
- "file_name": "congress.jpg",
- "height": 267,
- "width": 525
- }
- ],
- "annotations": [
- {
- "image_id": 1,
- "file_name": "bird.png",
- "segments_info": [
- {
- "id": 55551,
- "area": 3039,
- "category_id": 4,
- "iscrowd": 0
- },
- {
- "id": 16216831,
- "area": 33659,
- "category_id": 5,
- "iscrowd": 0
- },
- {
- "id": 16760832,
- "area": 1237,
- "category_id": 3,
- "iscrowd": 0
- }
- ]
- },
- {
- "image_id": 2,
- "file_name": "cat.png",
- "segments_info": [
- {
- "id": 36493,
- "area": 26910,
- "category_id": 6,
- "iscrowd": 0
- }
- ]
- },
- {
- "image_id": 3,
- "file_name": "team.png",
- "segments_info": [
- {
- "id": 0,
- "area": 22164,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 129,
- "area": 3418,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 255,
- "area": 12827,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 740608,
- "area": 8606,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 2555695,
- "area": 7636,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 2883541,
- "area": 6844,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 14408667,
- "area": 4766,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 16711820,
- "area": 4767,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 16768768,
- "area": 8667,
- "category_id": 1,
- "iscrowd": 0
- }
- ]
- },
- {
- "image_id": 4,
- "file_name": "congress.png",
- "segments_info": [
- {
- "id": 255,
- "area": 2599,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 37375,
- "area": 386,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 62207,
- "area": 384,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 5177088,
- "area": 260,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 16711691,
- "area": 1011,
- "category_id": 1,
- "iscrowd": 0
- },
- {
- "id": 16774912,
- "area": 803,
- "category_id": 1,
- "iscrowd": 0
- }
- ]
- }
- ],
- "categories": [
- {
- "id": 1,
- "name": "person",
- "isthing": 1
- },
- {
- "id": 2,
- "name": "umbrella",
- "isthing": 1
- },
- {
- "id": 3,
- "name": "tree-merged",
- "isthing": 0
- },
- {
- "id": 4,
- "name": "bird",
- "isthing": 1
- },
- {
- "id": 5,
- "name": "sky",
- "isthing": 0
- },
- {
- "id": 6,
- "name": "cat",
- "isthing": 1
- }
- ]
-}
diff --git a/research/deeplab/evaluation/testdata/coco_pred/bird.png b/research/deeplab/evaluation/testdata/coco_pred/bird.png
deleted file mode 100644
index c9b4cbcbf44..00000000000
Binary files a/research/deeplab/evaluation/testdata/coco_pred/bird.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/coco_pred/cat.png b/research/deeplab/evaluation/testdata/coco_pred/cat.png
deleted file mode 100644
index 324583271c4..00000000000
Binary files a/research/deeplab/evaluation/testdata/coco_pred/cat.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/coco_pred/congress.png b/research/deeplab/evaluation/testdata/coco_pred/congress.png
deleted file mode 100644
index fc7bb06050e..00000000000
Binary files a/research/deeplab/evaluation/testdata/coco_pred/congress.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/coco_pred/team.png b/research/deeplab/evaluation/testdata/coco_pred/team.png
deleted file mode 100644
index 7300bf41f03..00000000000
Binary files a/research/deeplab/evaluation/testdata/coco_pred/team.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/team_gt_instance.png b/research/deeplab/evaluation/testdata/team_gt_instance.png
deleted file mode 100644
index 97abb55273c..00000000000
Binary files a/research/deeplab/evaluation/testdata/team_gt_instance.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/team_pred_class.png b/research/deeplab/evaluation/testdata/team_pred_class.png
deleted file mode 100644
index 2ed78de2cbd..00000000000
Binary files a/research/deeplab/evaluation/testdata/team_pred_class.png and /dev/null differ
diff --git a/research/deeplab/evaluation/testdata/team_pred_instance.png b/research/deeplab/evaluation/testdata/team_pred_instance.png
deleted file mode 100644
index 264606a4d88..00000000000
Binary files a/research/deeplab/evaluation/testdata/team_pred_instance.png and /dev/null differ
diff --git a/research/deeplab/export_model.py b/research/deeplab/export_model.py
deleted file mode 100644
index b7307b5a212..00000000000
--- a/research/deeplab/export_model.py
+++ /dev/null
@@ -1,201 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Exports trained model to TensorFlow frozen graph."""
-
-import os
-import tensorflow as tf
-
-from tensorflow.contrib import quantize as contrib_quantize
-from tensorflow.python.tools import freeze_graph
-from deeplab import common
-from deeplab import input_preprocess
-from deeplab import model
-
-slim = tf.contrib.slim
-flags = tf.app.flags
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string('checkpoint_path', None, 'Checkpoint path')
-
-flags.DEFINE_string('export_path', None,
- 'Path to output Tensorflow frozen graph.')
-
-flags.DEFINE_integer('num_classes', 21, 'Number of classes.')
-
-flags.DEFINE_multi_integer('crop_size', [513, 513],
- 'Crop size [height, width].')
-
-# For `xception_65`, use atrous_rates = [12, 24, 36] if output_stride = 8, or
-# rates = [6, 12, 18] if output_stride = 16. For `mobilenet_v2`, use None. Note
-# one could use different atrous_rates/output_stride during training/evaluation.
-flags.DEFINE_multi_integer('atrous_rates', None,
- 'Atrous rates for atrous spatial pyramid pooling.')
-
-flags.DEFINE_integer('output_stride', 8,
- 'The ratio of input to output spatial resolution.')
-
-# Change to [0.5, 0.75, 1.0, 1.25, 1.5, 1.75] for multi-scale inference.
-flags.DEFINE_multi_float('inference_scales', [1.0],
- 'The scales to resize images for inference.')
-
-flags.DEFINE_bool('add_flipped_images', False,
- 'Add flipped images during inference or not.')
-
-flags.DEFINE_integer(
- 'quantize_delay_step', -1,
- 'Steps to start quantized training. If < 0, will not quantize model.')
-
-flags.DEFINE_bool('save_inference_graph', False,
- 'Save inference graph in text proto.')
-
-# Input name of the exported model.
-_INPUT_NAME = 'ImageTensor'
-
-# Output name of the exported predictions.
-_OUTPUT_NAME = 'SemanticPredictions'
-_RAW_OUTPUT_NAME = 'RawSemanticPredictions'
-
-# Output name of the exported probabilities.
-_OUTPUT_PROB_NAME = 'SemanticProbabilities'
-_RAW_OUTPUT_PROB_NAME = 'RawSemanticProbabilities'
-
-
-def _create_input_tensors():
- """Creates and prepares input tensors for DeepLab model.
-
- This method creates a 4-D uint8 image tensor 'ImageTensor' with shape
- [1, None, None, 3]. The actual input tensor name to use during inference is
- 'ImageTensor:0'.
-
- Returns:
- image: Preprocessed 4-D float32 tensor with shape [1, crop_height,
- crop_width, 3].
- original_image_size: Original image shape tensor [height, width].
- resized_image_size: Resized image shape tensor [height, width].
- """
- # input_preprocess takes 4-D image tensor as input.
- input_image = tf.placeholder(tf.uint8, [1, None, None, 3], name=_INPUT_NAME)
- original_image_size = tf.shape(input_image)[1:3]
-
- # Squeeze the dimension in axis=0 since `preprocess_image_and_label` assumes
- # image to be 3-D.
- image = tf.squeeze(input_image, axis=0)
- resized_image, image, _ = input_preprocess.preprocess_image_and_label(
- image,
- label=None,
- crop_height=FLAGS.crop_size[0],
- crop_width=FLAGS.crop_size[1],
- min_resize_value=FLAGS.min_resize_value,
- max_resize_value=FLAGS.max_resize_value,
- resize_factor=FLAGS.resize_factor,
- is_training=False,
- model_variant=FLAGS.model_variant)
- resized_image_size = tf.shape(resized_image)[:2]
-
- # Expand the dimension in axis=0, since the following operations assume the
- # image to be 4-D.
- image = tf.expand_dims(image, 0)
-
- return image, original_image_size, resized_image_size
-
-
-def main(unused_argv):
- tf.logging.set_verbosity(tf.logging.INFO)
- tf.logging.info('Prepare to export model to: %s', FLAGS.export_path)
-
- with tf.Graph().as_default():
- image, image_size, resized_image_size = _create_input_tensors()
-
- model_options = common.ModelOptions(
- outputs_to_num_classes={common.OUTPUT_TYPE: FLAGS.num_classes},
- crop_size=FLAGS.crop_size,
- atrous_rates=FLAGS.atrous_rates,
- output_stride=FLAGS.output_stride)
-
- if tuple(FLAGS.inference_scales) == (1.0,):
- tf.logging.info('Exported model performs single-scale inference.')
- predictions = model.predict_labels(
- image,
- model_options=model_options,
- image_pyramid=FLAGS.image_pyramid)
- else:
- tf.logging.info('Exported model performs multi-scale inference.')
- if FLAGS.quantize_delay_step >= 0:
- raise ValueError(
- 'Quantize mode is not supported with multi-scale test.')
- predictions = model.predict_labels_multi_scale(
- image,
- model_options=model_options,
- eval_scales=FLAGS.inference_scales,
- add_flipped_images=FLAGS.add_flipped_images)
- raw_predictions = tf.identity(
- tf.cast(predictions[common.OUTPUT_TYPE], tf.float32),
- _RAW_OUTPUT_NAME)
- raw_probabilities = tf.identity(
- predictions[common.OUTPUT_TYPE + model.PROB_SUFFIX],
- _RAW_OUTPUT_PROB_NAME)
-
- # Crop the valid regions from the predictions.
- semantic_predictions = raw_predictions[
- :, :resized_image_size[0], :resized_image_size[1]]
- semantic_probabilities = raw_probabilities[
- :, :resized_image_size[0], :resized_image_size[1]]
-
- # Resize back the prediction to the original image size.
- def _resize_label(label, label_size):
- # Expand dimension of label to [1, height, width, 1] for resize operation.
- label = tf.expand_dims(label, 3)
- resized_label = tf.image.resize_images(
- label,
- label_size,
- method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
- align_corners=True)
- return tf.cast(tf.squeeze(resized_label, 3), tf.int32)
- semantic_predictions = _resize_label(semantic_predictions, image_size)
- semantic_predictions = tf.identity(semantic_predictions, name=_OUTPUT_NAME)
-
- semantic_probabilities = tf.image.resize_bilinear(
- semantic_probabilities, image_size, align_corners=True,
- name=_OUTPUT_PROB_NAME)
-
- if FLAGS.quantize_delay_step >= 0:
- contrib_quantize.create_eval_graph()
-
- saver = tf.train.Saver(tf.all_variables())
-
- dirname = os.path.dirname(FLAGS.export_path)
- tf.gfile.MakeDirs(dirname)
- graph_def = tf.get_default_graph().as_graph_def(add_shapes=True)
- freeze_graph.freeze_graph_with_def_protos(
- graph_def,
- saver.as_saver_def(),
- FLAGS.checkpoint_path,
- _OUTPUT_NAME + ',' + _OUTPUT_PROB_NAME,
- restore_op_name=None,
- filename_tensor_name=None,
- output_graph=FLAGS.export_path,
- clear_devices=True,
- initializer_nodes=None)
-
- if FLAGS.save_inference_graph:
- tf.train.write_graph(graph_def, dirname, 'inference_graph.pbtxt')
-
-
-if __name__ == '__main__':
- flags.mark_flag_as_required('checkpoint_path')
- flags.mark_flag_as_required('export_path')
- tf.app.run()
diff --git a/research/deeplab/g3doc/ade20k.md b/research/deeplab/g3doc/ade20k.md
deleted file mode 100644
index 9505ab2cd99..00000000000
--- a/research/deeplab/g3doc/ade20k.md
+++ /dev/null
@@ -1,107 +0,0 @@
-# Running DeepLab on ADE20K Semantic Segmentation Dataset
-
-This page walks through the steps required to run DeepLab on the ADE20K dataset
-on a local machine.
-
-## Download dataset and convert to TFRecord
-
-We have prepared the script (under the folder `datasets`) to download and
-convert the ADE20K semantic segmentation dataset to TFRecord.
-
-```bash
-# From the tensorflow/models/research/deeplab/datasets directory.
-bash download_and_convert_ade20k.sh
-```
-
-The converted dataset will be saved at ./deeplab/datasets/ADE20K/tfrecord
-
-## Recommended Directory Structure for Training and Evaluation
-
-```
-+ datasets
- - build_data.py
- - build_ade20k_data.py
- - download_and_convert_ade20k.sh
- + ADE20K
- + tfrecord
- + exp
- + train_on_train_set
- + train
- + eval
- + vis
- + ADEChallengeData2016
- + annotations
- + training
- + validation
- + images
- + training
- + validation
-```
-
-where the folder `train_on_train_set` stores the train/eval/vis events and
-results (when training DeepLab on the ADE20K train set).
-
-## Running the train/eval/vis jobs
-
-A local training job using `xception_65` can be run with the following command:
-
-```bash
-# From tensorflow/models/research/
-python deeplab/train.py \
- --logtostderr \
- --training_number_of_steps=150000 \
- --train_split="train" \
- --model_variant="xception_65" \
- --atrous_rates=6 \
- --atrous_rates=12 \
- --atrous_rates=18 \
- --output_stride=16 \
- --decoder_output_stride=4 \
- --train_crop_size="513,513" \
- --train_batch_size=4 \
- --min_resize_value=513 \
- --max_resize_value=513 \
- --resize_factor=16 \
- --dataset="ade20k" \
- --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
- --train_logdir=${PATH_TO_TRAIN_DIR}\
- --dataset_dir=${PATH_TO_DATASET}
-```
-
-where ${PATH\_TO\_INITIAL\_CHECKPOINT} is the path to the initial checkpoint,
-${PATH\_TO\_TRAIN\_DIR} is the directory to which training checkpoints and
-events will be written (it is recommended to set it to the
-`train_on_train_set/train` directory above), and ${PATH\_TO\_DATASET} is the
-directory in which the ADE20K dataset resides (the `tfrecord` directory above).
-
-**Note that for train.py:**
-
-1. In order to fine-tune the BN layers, one needs to use a large batch size (>
- 12) and set fine_tune_batch_norm = True. Here, we simply use a small batch
- size during training for demonstration purposes. If you have limited GPU
- memory at hand, please fine-tune from our provided checkpoints, whose batch
- norm parameters have already been trained, and use a smaller learning rate
- with fine_tune_batch_norm = False.
-
-2. Users should fine-tune `min_resize_value` and `max_resize_value` to get
- better results. Note that `resize_factor` has to be equal to `output_stride`.
-
-3. The users should change atrous_rates from [6, 12, 18] to [12, 24, 36] if
- setting output_stride=8.
-
-4. Users could skip the flag `decoder_output_stride` if they do not want to
- use the decoder structure.
-
-## Running Tensorboard
-
-Progress for training and evaluation jobs can be inspected using Tensorboard. If
-using the recommended directory structure, Tensorboard can be run using the
-following command:
-
-```bash
-tensorboard --logdir=${PATH_TO_LOG_DIRECTORY}
-```
-
-where `${PATH_TO_LOG_DIRECTORY}` points to the directory that contains the train
-directory (e.g., the folder `train_on_train_set` in the above example). Please
-note it may take Tensorboard a couple of minutes to populate with data.
diff --git a/research/deeplab/g3doc/cityscapes.md b/research/deeplab/g3doc/cityscapes.md
deleted file mode 100644
index 5a660aaca34..00000000000
--- a/research/deeplab/g3doc/cityscapes.md
+++ /dev/null
@@ -1,159 +0,0 @@
-# Running DeepLab on Cityscapes Semantic Segmentation Dataset
-
-This page walks through the steps required to run DeepLab on Cityscapes on a
-local machine.
-
-## Download dataset and convert to TFRecord
-
-We have prepared the script (under the folder `datasets`) to convert the
-Cityscapes dataset to TFRecord. Users are required to download the dataset
-beforehand by registering on the [website](https://www.cityscapes-dataset.com/).
-
-```bash
-# From the tensorflow/models/research/deeplab/datasets directory.
-sh convert_cityscapes.sh
-```
-
-The converted dataset will be saved at ./deeplab/datasets/cityscapes/tfrecord.
-
-## Recommended Directory Structure for Training and Evaluation
-
-```
-+ datasets
- + cityscapes
- + leftImg8bit
- + gtFine
- + tfrecord
- + exp
- + train_on_train_set
- + train
- + eval
- + vis
-```
-
-where the folder `train_on_train_set` stores the train/eval/vis events and
-results (when training DeepLab on the Cityscapes train set).
-
-## Running the train/eval/vis jobs
-
-A local training job using `xception_65` can be run with the following command:
-
-```bash
-# From tensorflow/models/research/
-python deeplab/train.py \
- --logtostderr \
- --training_number_of_steps=90000 \
- --train_split="train_fine" \
- --model_variant="xception_65" \
- --atrous_rates=6 \
- --atrous_rates=12 \
- --atrous_rates=18 \
- --output_stride=16 \
- --decoder_output_stride=4 \
- --train_crop_size="769,769" \
- --train_batch_size=1 \
- --dataset="cityscapes" \
- --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
- --train_logdir=${PATH_TO_TRAIN_DIR} \
- --dataset_dir=${PATH_TO_DATASET}
-```
-
-where ${PATH_TO_INITIAL_CHECKPOINT} is the path to the initial checkpoint
-(usually an ImageNet pretrained checkpoint), ${PATH_TO_TRAIN_DIR} is the
-directory to which training checkpoints and events will be written, and
-${PATH_TO_DATASET} is the directory in which the Cityscapes dataset resides.
-
-**Note that for {train,eval,vis}.py**:
-
-1. In order to reproduce our results, one needs to use a large batch size (> 8)
- and set fine_tune_batch_norm = True. Here, we simply use a small batch size
- during training for demonstration purposes. If you have limited GPU memory at
- hand, please fine-tune from our provided checkpoints whose batch norm
- parameters have already been trained, and use a smaller learning rate with
- fine_tune_batch_norm = False.
-
-2. The users should change atrous_rates from [6, 12, 18] to [12, 24, 36] if
- setting output_stride=8.
-
-3. Users could skip the flag `decoder_output_stride` if they do not want to
- use the decoder structure.
-
-4. Change and add the following flags in order to use the provided dense
- prediction cell. Note that `decoder_output_stride` needs to be set if you want
- to use the provided checkpoints, which include the decoder module.
-
-```bash
---model_variant="xception_71"
---dense_prediction_cell_json="deeplab/core/dense_prediction_cell_branch5_top1_cityscapes.json"
---decoder_output_stride=4
-```
-
-A local evaluation job using `xception_65` can be run with the following
-command:
-
-```bash
-# From tensorflow/models/research/
-python deeplab/eval.py \
- --logtostderr \
- --eval_split="val_fine" \
- --model_variant="xception_65" \
- --atrous_rates=6 \
- --atrous_rates=12 \
- --atrous_rates=18 \
- --output_stride=16 \
- --decoder_output_stride=4 \
- --eval_crop_size="1025,2049" \
- --dataset="cityscapes" \
- --checkpoint_dir=${PATH_TO_CHECKPOINT} \
- --eval_logdir=${PATH_TO_EVAL_DIR} \
- --dataset_dir=${PATH_TO_DATASET}
-```
-
-where ${PATH_TO_CHECKPOINT} is the path to the trained checkpoint (i.e., the
-path to train_logdir), ${PATH_TO_EVAL_DIR} is the directory to which evaluation
-events will be written, and ${PATH_TO_DATASET} is the directory in which the
-Cityscapes dataset resides.
-
-A local visualization job using `xception_65` can be run with the following
-command:
-
-```bash
-# From tensorflow/models/research/
-python deeplab/vis.py \
- --logtostderr \
- --vis_split="val_fine" \
- --model_variant="xception_65" \
- --atrous_rates=6 \
- --atrous_rates=12 \
- --atrous_rates=18 \
- --output_stride=16 \
- --decoder_output_stride=4 \
- --vis_crop_size="1025,2049" \
- --dataset="cityscapes" \
- --colormap_type="cityscapes" \
- --checkpoint_dir=${PATH_TO_CHECKPOINT} \
- --vis_logdir=${PATH_TO_VIS_DIR} \
- --dataset_dir=${PATH_TO_DATASET}
-```
-
-where ${PATH_TO_CHECKPOINT} is the path to the trained checkpoint (i.e., the
-path to train_logdir), ${PATH_TO_VIS_DIR} is the directory to which the
-visualization results will be written, and ${PATH_TO_DATASET} is the directory
-in which the Cityscapes dataset resides. Note that if you would like to save
-the segmentation results for the evaluation server, set
-also_save_raw_predictions = True.
-
-## Running Tensorboard
-
-Progress for training and evaluation jobs can be inspected using Tensorboard. If
-using the recommended directory structure, Tensorboard can be run using the
-following command:
-
-```bash
-tensorboard --logdir=${PATH_TO_LOG_DIRECTORY}
-```
-
-where `${PATH_TO_LOG_DIRECTORY}` points to the directory that contains the
-train, eval, and vis directories (e.g., the folder `train_on_train_set` in the
-above example). Please note it may take Tensorboard a couple of minutes to
-populate with data.
diff --git a/research/deeplab/g3doc/export_model.md b/research/deeplab/g3doc/export_model.md
deleted file mode 100644
index c41649e609a..00000000000
--- a/research/deeplab/g3doc/export_model.md
+++ /dev/null
@@ -1,23 +0,0 @@
-# Export trained deeplab model to frozen inference graph
-
-After model training finishes, you can export the trained model to a frozen
-TensorFlow inference graph proto. Your trained model checkpoint usually includes
-the following files:
-
-* model.ckpt-${CHECKPOINT_NUMBER}.data-00000-of-00001
-* model.ckpt-${CHECKPOINT_NUMBER}.index
-* model.ckpt-${CHECKPOINT_NUMBER}.meta
-
-After you have identified a candidate checkpoint to export, you can run the
-following command line to export it to a frozen graph:
-
-```bash
-# From tensorflow/models/research/
-# Assume all checkpoint files share the same path prefix `${CHECKPOINT_PATH}`.
-python deeplab/export_model.py \
- --checkpoint_path=${CHECKPOINT_PATH} \
- --export_path=${OUTPUT_DIR}/frozen_inference_graph.pb
-```
-
-Please also add the other model-specific flags that you used for training, such
-as `model_variant`, `add_image_level_feature`, etc.
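-
-As a rough sketch of how the exported graph can be consumed for inference (the
-tensor names `ImageTensor:0` and `SemanticPredictions:0` come from
-`export_model.py`; the file paths below are placeholders), assuming TensorFlow
-1.x and Pillow are installed:
-
-```python
-import numpy as np
-from PIL import Image
-import tensorflow as tf
-
-# Load the frozen graph produced by export_model.py.
-graph_def = tf.GraphDef()
-with tf.gfile.GFile('/path/to/frozen_inference_graph.pb', 'rb') as f:
-  graph_def.ParseFromString(f.read())
-
-with tf.Graph().as_default() as graph:
-  tf.import_graph_def(graph_def, name='')
-
-with tf.Session(graph=graph) as sess:
-  # The exported model expects a batched uint8 image of shape [1, H, W, 3].
-  image = np.asarray(Image.open('/path/to/image.jpg').convert('RGB'))
-  seg_map = sess.run(
-      'SemanticPredictions:0',
-      feed_dict={'ImageTensor:0': image[np.newaxis, ...]})
-  print(seg_map.shape)  # (1, H, W) semantic label map
-```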
diff --git a/research/deeplab/g3doc/faq.md b/research/deeplab/g3doc/faq.md
deleted file mode 100644
index 26ff4b3281c..00000000000
--- a/research/deeplab/g3doc/faq.md
+++ /dev/null
@@ -1,87 +0,0 @@
-# FAQ
-___
-Q1: What if I want to use other network backbones, such as ResNet [1], instead of only the provided ones (e.g., Xception)?
-
-A: The users could modify the provided core/feature_extractor.py to support more network backbones.
-___
-Q2: What if I want to train the model on other datasets?
-
-A: The users could modify the provided dataset/build_{cityscapes,voc2012}_data.py and dataset/segmentation_dataset.py to build their own dataset.
-___
-Q3: Where can I download the PASCAL VOC augmented training set?
-
-A: The PASCAL VOC augmented training set is provided by Bharath Hariharan et al. [2] Please refer to their [website](http://home.bharathh.info/pubs/codes/SBD/download.html) for details and consider citing their paper if using the dataset.
-___
-Q4: Why does the implementation not include DenseCRF [3]?
-
-A: We have not tried this. The interested users could take a look at Philipp Krähenbühl's [website](http://graphics.stanford.edu/projects/densecrf/) and [paper](https://arxiv.org/abs/1210.5644) for details.
-___
-Q5: What if I want to train the model and fine-tune the batch normalization parameters?
-
-A: If you only have limited resources at hand, we suggest simply fine-tuning
-from our provided checkpoint, whose batch-norm parameters have already been
-trained (i.e., train with a smaller learning rate, set `fine_tune_batch_norm =
-false`, and employ longer training iterations since the learning rate is small).
-If you really would like to train the model by yourself, we suggest:
-
-1. Set `output_stride = 16` or maybe even `32` (remember to change the flag
-`atrous_rates` accordingly, e.g., `atrous_rates = [3, 6, 9]` for
-`output_stride = 32`; see the sketch after this list).
-
-2. Use as many GPUs as possible (change the flag `num_clones` in train.py) and
-set `train_batch_size` as large as possible.
-
-3. Adjust the `train_crop_size` in train.py. Maybe set it to be smaller, e.g.,
-513x513 (or even 321x321), so that you could use a larger batch size.
-
-4. Use a smaller network backbone, such as MobileNet-v2.
-
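-For illustration, the atrous rates quoted above scale inversely with
-`output_stride` (this matches the `[6, 12, 18]` / `[12, 24, 36]` pairs used
-elsewhere in these docs); the helper below is just a sketch, not part of the
-codebase:
-
-```python
-def atrous_rates_for(output_stride, base_rates=(6, 12, 18), base_output_stride=16):
-  """Scales the ASPP atrous rates inversely with the output stride."""
-  return [rate * base_output_stride // output_stride for rate in base_rates]
-
-print(atrous_rates_for(16))  # [6, 12, 18]
-print(atrous_rates_for(8))   # [12, 24, 36]
-print(atrous_rates_for(32))  # [3, 6, 9]
-```
-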
-___
-Q6: How can I train the model asynchronously?
-
-A: In train.py, users can set `num_replicas` (the number of machines for training) and `num_ps_tasks` (we usually set `num_ps_tasks` = `num_replicas` / 2). See slim.deployment.model_deploy for more details.
-___
-Q7: I could not reproduce the performance even with the provided checkpoints.
-
-A: Please try running
-
-```bash
-# Run the simple test with Xception_65 as network backbone.
-sh local_test.sh
-```
-
-or
-
-```bash
-# Run the simple test with MobileNet-v2 as network backbone.
-sh local_test_mobilenetv2.sh
-```
-
-First, make sure you can reproduce the results with our provided settings.
-After that, start making your changes one at a time to help debug.
-___
-Q8: What value of `eval_crop_size` should I use?
-
-A: Our model uses whole-image inference, meaning that we need to set `eval_crop_size` equal to `output_stride` * k + 1, where k is an integer chosen so that the resulting `eval_crop_size` is slightly larger than the largest
-image dimension in the dataset. For example, we have `eval_crop_size` = 513x513 for the PASCAL dataset, whose largest image dimension is 512. Similarly, we set `eval_crop_size` = 1025x2049 for Cityscapes images, whose
-image dimensions are all equal to 1024x2048.
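-
-As a quick sanity check of this rule (plain Python; the helper below is only
-for illustration, using the dataset dimensions mentioned above):
-
-```python
-def eval_crop_size(largest_image_dim, output_stride=8):
-  """The smallest output_stride * k + 1 that exceeds the image dimension."""
-  k = (largest_image_dim + output_stride - 1) // output_stride  # ceil division
-  return output_stride * k + 1
-
-print(eval_crop_size(512))   # 513, as used for PASCAL VOC
-print(eval_crop_size(1024))  # 1025, the height used for Cityscapes
-print(eval_crop_size(2048))  # 2049, the width used for Cityscapes
-```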
-___
-Q9: Why is multi-GPU training slow?
-
-A: Please try to use more threads to pre-process the inputs. For example, change [num_readers = 4](https://github.com/tensorflow/models/blob/master/research/deeplab/train.py#L457).
-___
-
-
-## References
-
-1. **Deep Residual Learning for Image Recognition**
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- [[link]](https://arxiv.org/abs/1512.03385), In CVPR, 2016.
-
-2. **Semantic Contours from Inverse Detectors**
- Bharath Hariharan, Pablo Arbelaez, Lubomir Bourdev, Subhransu Maji, Jitendra Malik
- [[link]](http://home.bharathh.info/pubs/codes/SBD/download.html), In ICCV, 2011.
-
-3. **Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials**
- Philipp Krähenbühl, Vladlen Koltun
- [[link]](http://graphics.stanford.edu/projects/densecrf/), In NIPS, 2011.
diff --git a/research/deeplab/g3doc/img/image1.jpg b/research/deeplab/g3doc/img/image1.jpg
deleted file mode 100644
index 939b6f9cef3..00000000000
Binary files a/research/deeplab/g3doc/img/image1.jpg and /dev/null differ
diff --git a/research/deeplab/g3doc/img/image2.jpg b/research/deeplab/g3doc/img/image2.jpg
deleted file mode 100644
index 5ec1b8ac278..00000000000
Binary files a/research/deeplab/g3doc/img/image2.jpg and /dev/null differ
diff --git a/research/deeplab/g3doc/img/image3.jpg b/research/deeplab/g3doc/img/image3.jpg
deleted file mode 100644
index d788e3dc68d..00000000000
Binary files a/research/deeplab/g3doc/img/image3.jpg and /dev/null differ
diff --git a/research/deeplab/g3doc/img/image_info.txt b/research/deeplab/g3doc/img/image_info.txt
deleted file mode 100644
index 583d113e7eb..00000000000
--- a/research/deeplab/g3doc/img/image_info.txt
+++ /dev/null
@@ -1,13 +0,0 @@
-Image provenance:
-
-image1.jpg: Philippe Put,
- https://www.flickr.com/photos/34547181@N00/14499172124
-
-image2.jpg: Peretz Partensky
- https://www.flickr.com/photos/ifl/3926001309
-
-image3.jpg: Peter Harrison
- https://www.flickr.com/photos/devcentre/392585679
-
-
-vis[1-3].png: Showing original image together with DeepLab segmentation map.
diff --git a/research/deeplab/g3doc/img/vis1.png b/research/deeplab/g3doc/img/vis1.png
deleted file mode 100644
index 41b8ecd8959..00000000000
Binary files a/research/deeplab/g3doc/img/vis1.png and /dev/null differ
diff --git a/research/deeplab/g3doc/img/vis2.png b/research/deeplab/g3doc/img/vis2.png
deleted file mode 100644
index 7fa7a4cacc4..00000000000
Binary files a/research/deeplab/g3doc/img/vis2.png and /dev/null differ
diff --git a/research/deeplab/g3doc/img/vis3.png b/research/deeplab/g3doc/img/vis3.png
deleted file mode 100644
index 813b6340a61..00000000000
Binary files a/research/deeplab/g3doc/img/vis3.png and /dev/null differ
diff --git a/research/deeplab/g3doc/installation.md b/research/deeplab/g3doc/installation.md
deleted file mode 100644
index 591a1f8da50..00000000000
--- a/research/deeplab/g3doc/installation.md
+++ /dev/null
@@ -1,73 +0,0 @@
-# Installation
-
-## Dependencies
-
-DeepLab depends on the following libraries:
-
-* Numpy
-* Pillow 1.0
-* tf Slim (which is included in the "tensorflow/models/research/" checkout)
-* Jupyter notebook
-* Matplotlib
-* Tensorflow
-
-For detailed steps to install Tensorflow, follow the [Tensorflow installation
-instructions](https://www.tensorflow.org/install/). A typical user can install
-Tensorflow using one of the following commands:
-
-```bash
-# For CPU
-pip install tensorflow
-# For GPU
-pip install tensorflow-gpu
-```
-
-The remaining libraries can be installed on Ubuntu 14.04 using apt-get and pip:
-
-```bash
-sudo apt-get install python-pil python-numpy
-pip install --user jupyter
-pip install --user matplotlib
-pip install --user PrettyTable
-```
-
-## Add Libraries to PYTHONPATH
-
-When running locally, the tensorflow/models/research/ directory should be
-appended to PYTHONPATH. This can be done by running the following from
-tensorflow/models/research/:
-
-```bash
-# From tensorflow/models/research/
-export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
-
-# [Optional] for panoptic evaluation, you might need panopticapi:
-# https://github.com/cocodataset/panopticapi
-# Please clone it to a local directory ${PANOPTICAPI_DIR}
-touch ${PANOPTICAPI_DIR}/panopticapi/__init__.py
-export PYTHONPATH=$PYTHONPATH:${PANOPTICAPI_DIR}/panopticapi
-```
-
-Note: This command needs to be run from every new terminal you start. If you
-wish to avoid running this manually, you can add it as a new line to the end of
-your ~/.bashrc file.
-
-# Testing the Installation
-
-You can test whether you have successfully installed the TensorFlow DeepLab
-codebase by running the following commands:
-
-Quick test by running model_test.py:
-
-```bash
-# From tensorflow/models/research/
-python deeplab/model_test.py
-```
-
-Quickly run the whole code on the PASCAL VOC 2012 dataset:
-
-```bash
-# From tensorflow/models/research/deeplab
-bash local_test.sh
-```
-
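-A lighter-weight sanity check, if you just want to confirm the modules are
-importable from your PYTHONPATH (a sketch only; `common.OUTPUT_TYPE` and
-`model.PROB_SUFFIX` are constants used by `export_model.py`):
-
-```python
-# Quick check that the deeplab package and its dependencies are importable.
-from deeplab import common
-from deeplab import model
-
-print('deeplab imports OK:', common.OUTPUT_TYPE, model.PROB_SUFFIX)
-```
-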
diff --git a/research/deeplab/g3doc/model_zoo.md b/research/deeplab/g3doc/model_zoo.md
deleted file mode 100644
index 76972dc796e..00000000000
--- a/research/deeplab/g3doc/model_zoo.md
+++ /dev/null
@@ -1,254 +0,0 @@
-# TensorFlow DeepLab Model Zoo
-
-We provide DeepLab models pretrained on several datasets, including (1) PASCAL
-VOC 2012, (2) Cityscapes, and (3) ADE20K, for reproducing our results, as well
-as some checkpoints that are only pretrained on ImageNet for training your own
-models.
-
-## DeepLab models trained on PASCAL VOC 2012
-
-Un-tar'ed directory includes:
-
-* a frozen inference graph (`frozen_inference_graph.pb`). All frozen inference
- graphs by default use output stride of 8, a single eval scale of 1.0 and
- no left-right flips, unless otherwise specified. MobileNet-v2 based models
- do not include the decoder module.
-
-* a checkpoint (`model.ckpt.data-00000-of-00001`, `model.ckpt.index`)
-
-### Model details
-
-We provide several checkpoints that have been pretrained on the VOC 2012
-train_aug set or the train_aug + trainval set. In the former case, one could
-train their model with a smaller batch size and frozen batch normalization when
-limited GPU memory is available, since we have already fine-tuned the batch
-normalization for you. In the latter case, one could directly evaluate the
-checkpoints on the VOC 2012 test set or use them for demo purposes. Note
-*MobileNet-v2* based models do not employ ASPP and decoder modules for fast
-computation.
-
-Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder
---------------------------- | :--------------: | :-----------------: | :---: | :-----:
-mobilenetv2_dm05_coco_voc_trainaug | MobileNet-v2 Depth-Multiplier = 0.5 | ImageNet MS-COCO VOC 2012 train_aug set| N/A | N/A
-mobilenetv2_dm05_coco_voc_trainval | MobileNet-v2 Depth-Multiplier = 0.5 | ImageNet MS-COCO VOC 2012 train_aug + trainval sets | N/A | N/A
-mobilenetv2_coco_voc_trainaug | MobileNet-v2 | ImageNet MS-COCO VOC 2012 train_aug set| N/A | N/A
-mobilenetv2_coco_voc_trainval | MobileNet-v2 | ImageNet MS-COCO VOC 2012 train_aug + trainval sets | N/A | N/A
-xception65_coco_voc_trainaug | Xception_65 | ImageNet MS-COCO VOC 2012 train_aug set| [6,12,18] for OS=16 [12,24,36] for OS=8 | OS = 4
-xception65_coco_voc_trainval | Xception_65 | ImageNet MS-COCO VOC 2012 train_aug + trainval sets | [6,12,18] for OS=16 [12,24,36] for OS=8 | OS = 4
-
-In the table, **OS** denotes output stride.
-
-Checkpoint name | Eval OS | Eval scales | Left-right Flip | Multiply-Adds | Runtime (sec) | PASCAL mIOU | File Size
------------------------------------------------------------------------------------------------------------------------- | :-------: | :------------------------: | :-------------: | :------------------: | :------------: | :----------------------------: | :-------:
-[mobilenetv2_dm05_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_trainaug_2018_10_01.tar.gz) | 16 | [1.0] | No | 0.88B | - | 70.19% (val) | 7.6MB
-[mobilenetv2_dm05_coco_voc_trainval](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_trainval_2018_10_01.tar.gz) | 8 | [1.0] | No | 2.84B | - | 71.83% (test) | 7.6MB
-[mobilenetv2_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz) | 16 8 | [1.0] [0.5:0.25:1.75] | No Yes | 2.75B 152.59B | 0.1 26.9 | 75.32% (val) 77.33 (val) | 23MB
-[mobilenetv2_coco_voc_trainval](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz) | 8 | [0.5:0.25:1.75] | Yes | 152.59B | 26.9 | 80.25% (**test**) | 23MB
-[xception65_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_pascal_train_aug_2018_01_04.tar.gz) | 16 8 | [1.0] [0.5:0.25:1.75] | No Yes | 54.17B 3055.35B | 0.7 223.2 | 82.20% (val) 83.58% (val) | 439MB
-[xception65_coco_voc_trainval](http://download.tensorflow.org/models/deeplabv3_pascal_trainval_2018_01_04.tar.gz) | 8 | [0.5:0.25:1.75] | Yes | 3055.35B | 223.2 | 87.80% (**test**) | 439MB
-
-In the table, we report both computation complexity (in terms of Multiply-Adds
-and CPU Runtime) and segmentation performance (in terms of mIOU) on the PASCAL
-VOC val or test set. The reported runtime is calculated by tfprof on a
-workstation with CPU E5-1650 v3 @ 3.50GHz and 32GB memory. Note that applying
-multi-scale inputs and left-right flips increases the segmentation performance
-but also significantly increases the computation and thus may not be suitable
-for real-time applications.
-
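-The `[0.5:0.25:1.75]` entries in the Eval scales column appear to use
-start:step:end notation; expanded, they correspond to the multi-scale inference
-values below (the same list suggested in the comments of `export_model.py`):
-
-```python
-import numpy as np
-
-# [0.5:0.25:1.75] expands to the scales used for multi-scale inference.
-eval_scales = np.arange(0.5, 1.75 + 0.25, 0.25).tolist()
-print(eval_scales)  # [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
-```
-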
-## DeepLab models trained on Cityscapes
-
-### Model details
-
-We provide several checkpoints that have been pretrained on the Cityscapes
-train_fine set. Note the *MobileNet-v2* based model has been pretrained on the
-MS-COCO dataset and does not employ ASPP and decoder modules for fast computation.
-
-Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder
-------------------------------------- | :--------------: | :-------------------------------------: | :----------------------------------------------: | :-----:
-mobilenetv2_coco_cityscapes_trainfine | MobileNet-v2 | ImageNet MS-COCO Cityscapes train_fine set | N/A | N/A
-mobilenetv3_large_cityscapes_trainfine | MobileNet-v3 Large | Cityscapes train_fine set (No ImageNet) | N/A | OS = 8
-mobilenetv3_small_cityscapes_trainfine | MobileNet-v3 Small | Cityscapes train_fine set (No ImageNet) | N/A | OS = 8
-xception65_cityscapes_trainfine | Xception_65 | ImageNet Cityscapes train_fine set | [6, 12, 18] for OS=16 [12, 24, 36] for OS=8 | OS = 4
-xception71_dpc_cityscapes_trainfine | Xception_71 | ImageNet MS-COCO Cityscapes train_fine set | Dense Prediction Cell | OS = 4
-xception71_dpc_cityscapes_trainval | Xception_71 | ImageNet MS-COCO Cityscapes trainval_fine and coarse set | Dense Prediction Cell | OS = 4
-
-In the table, **OS** denotes output stride.
-
-Note that for MobileNet-v3 models, we use the following additional command-line flags:
-
-```
---model_variant={ mobilenet_v3_large_seg | mobilenet_v3_small_seg }
---image_pooling_crop_size=769,769
---image_pooling_stride=4,5
---add_image_level_feature=1
---aspp_convs_filters=128
---aspp_with_concat_projection=0
---aspp_with_squeeze_and_excitation=1
---decoder_use_sum_merge=1
---decoder_filters=19
---decoder_output_is_logits=1
---image_se_uses_qsigmoid=1
---decoder_output_stride=8
---output_stride=32
-```
-
-Checkpoint name | Eval OS | Eval scales | Left-right Flip | Multiply-Adds | Runtime (sec) | Cityscapes mIOU | File Size
--------------------------------------------------------------------------------------------------------------------------------- | :-------: | :-------------------------: | :-------------: | :-------------------: | :------------: | :----------------------------: | :-------:
-[mobilenetv2_coco_cityscapes_trainfine](http://download.tensorflow.org/models/deeplabv3_mnv2_cityscapes_train_2018_02_05.tar.gz) | 16 8 | [1.0] [0.75:0.25:1.25] | No Yes | 21.27B 433.24B | 0.8 51.12 | 70.71% (val) 73.57% (val) | 23MB
-[mobilenetv3_large_cityscapes_trainfine](http://download.tensorflow.org/models/deeplab_mnv3_large_cityscapes_trainfine_2019_11_15.tar.gz) | 32 | [1.0] | No | 15.95B | 0.6 | 72.41% (val) | 17MB
-[mobilenetv3_small_cityscapes_trainfine](http://download.tensorflow.org/models/deeplab_mnv3_small_cityscapes_trainfine_2019_11_15.tar.gz) | 32 | [1.0] | No | 4.63B | 0.4 | 68.99% (val) | 5MB
-[xception65_cityscapes_trainfine](http://download.tensorflow.org/models/deeplabv3_cityscapes_train_2018_02_06.tar.gz) | 16 8 | [1.0] [0.75:0.25:1.25] | No Yes | 418.64B 8677.92B | 5.0 422.8 | 78.79% (val) 80.42% (val) | 439MB
-[xception71_dpc_cityscapes_trainfine](http://download.tensorflow.org/models/deeplab_cityscapes_xception71_trainfine_2018_09_08.tar.gz) | 16 | [1.0] | No | 502.07B | - | 80.31% (val) | 445MB
-[xception71_dpc_cityscapes_trainval](http://download.tensorflow.org/models/deeplab_cityscapes_xception71_trainvalfine_2018_09_08.tar.gz) | 8 | [0.75:0.25:2] | Yes | - | - | 82.66% (**test**) | 446MB
-
-### EdgeTPU-DeepLab models on Cityscapes
-
-EdgeTPU is Google's machine learning accelerator architecture for edge devices
-(present in Coral devices and the Pixel 4's Neural Core). Leveraging neural
-architecture search (NAS, also known as AutoML) algorithms,
-[EdgeTPU-Mobilenet](https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet)
-has been released; it yields higher hardware utilization, lower latency, and
-better accuracy than MobileNet-v2/v3. We use EdgeTPU-Mobilenet as the backbone
-and provide checkpoints that have been pretrained on the Cityscapes train_fine
-set. We name these EdgeTPU-DeepLab models.
-
-Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder
--------------------- | :----------------: | :----------------: | :--: | :-----:
-EdgeTPU-DeepLab | EdgeMobilenet-1.0 | ImageNet | N/A | N/A
-EdgeTPU-DeepLab-slim | EdgeMobilenet-0.75 | ImageNet | N/A | N/A
-
-For EdgeTPU-DeepLab-slim, the backbone feature extractor has depth multiplier =
-0.75 and aspp_convs_filters = 128. We employ neither ASPP nor decoder modules,
-to further reduce the latency. We employ the same train/eval flags used for the
-MobileNet-v2 DeepLab model. The flags changed for the EdgeTPU-DeepLab model are
-listed here.
-
-```
---decoder_output_stride=''
---aspp_convs_filters=256
---model_variant=mobilenet_edgetpu
-```
-
-For EdgeTPU-DeepLab-slim, also include the following flags.
-
-```
---depth_multiplier=0.75
---aspp_convs_filters=128
-```
-
-Checkpoint name | Eval OS | Eval scales | Cityscapes mIOU | Multiply-Adds | Simulator latency on Pixel 4 EdgeTPU
----------------------------------------------------------------------------------------------------- | :--------: | :---------: | :--------------------------: | :------------: | :----------------------------------:
-[EdgeTPU-DeepLab](http://download.tensorflow.org/models/edgetpu-deeplab_2020_03_09.tar.gz) | 32 16 | [1.0] | 70.6% (val) 74.1% (val) | 5.6B 7.1B | 13.8 ms 17.5 ms
-[EdgeTPU-DeepLab-slim](http://download.tensorflow.org/models/edgetpu-deeplab-slim_2020_03_09.tar.gz) | 32 16 | [1.0] | 70.0% (val) 73.2% (val) | 3.5B 4.3B | 9.9 ms 13.2 ms
-
-## DeepLab models trained on ADE20K
-
-### Model details
-
-We provide some checkpoints that have been pretrained on the ADE20K training
-set. Note that the model has only been pretrained on ImageNet, following the
-dataset rule.
-
-Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder | Input size
-------------------------------------- | :--------------: | :-------------------------------------: | :----------------------------------------------: | :-----: | :-----:
-mobilenetv2_ade20k_train | MobileNet-v2 | ImageNet ADE20K training set | N/A | OS = 4 | 257x257
-xception65_ade20k_train | Xception_65 | ImageNet ADE20K training set | [6, 12, 18] for OS=16 [12, 24, 36] for OS=8 | OS = 4 | 513x513
-
-The input dimensions of ADE20K images vary widely. We resize inputs so that the longest side is 257 for MobileNet-v2 (faster inference) and 513 for Xception_65 (better performance). Note that we also include the decoder module in the MobileNet-v2 checkpoint.
-
-Checkpoint name | Eval OS | Eval scales | Left-right Flip | mIOU | Pixel-wise Accuracy | File Size
-------------------------------------- | :-------: | :-------------------------: | :-------------: | :-------------------: | :-------------------: | :-------:
-[mobilenetv2_ade20k_train](http://download.tensorflow.org/models/deeplabv3_mnv2_ade20k_train_2018_12_03.tar.gz) | 16 | [1.0] | No | 32.04% (val) | 75.41% (val) | 24.8MB
-[xception65_ade20k_train](http://download.tensorflow.org/models/deeplabv3_xception_ade20k_train_2018_05_29.tar.gz) | 8 | [0.5:0.25:1.75] | Yes | 45.65% (val) | 82.52% (val) | 439MB
-
-
-## Checkpoints pretrained on ImageNet
-
-Un-tar'ed directory includes:
-
-* model checkpoint (`model.ckpt.data-00000-of-00001`, `model.ckpt.index`).
-
-### Model details
-
-We also provide some checkpoints that are pretrained on ImageNet and/or COCO (as
-indicated by the suffix in the model name) so that one could use them for
-training their own models.
-
-* mobilenet_v2: We refer the interested users to the TensorFlow open source
- [MobileNet-V2](https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet)
- for details.
-
-* xception_{41,65,71}: We adapt the original Xception model to the task of
- semantic segmentation with the following changes: (1) more layers, (2) all
- max pooling operations are replaced by strided (atrous) separable
- convolutions, and (3) extra batch-norm and ReLU after each 3x3 depthwise
- convolution are added. We provide three Xception model variants with
- different network depths.
-
-* resnet_v1_{50,101}_beta: We modify the original ResNet-101 [10], similarly to
- PSPNet [11], by replacing the first 7x7 convolution with three 3x3
- convolutions (see the sketch after this list). See resnet_v1_beta.py for more details.
-
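-As a rough sketch of that modified root block (the filter widths below follow
-the PSPNet-style stem and are assumptions for illustration; see
-resnet_v1_beta.py for the actual implementation):
-
-```python
-import tensorflow as tf
-
-slim = tf.contrib.slim
-
-
-def root_block_beta(inputs):
-  """Three 3x3 convolutions replacing the original 7x7 ResNet stem."""
-  net = slim.conv2d(inputs, 64, 3, stride=2, scope='conv1_1')
-  net = slim.conv2d(net, 64, 3, stride=1, scope='conv1_2')
-  net = slim.conv2d(net, 128, 3, stride=1, scope='conv1_3')
-  return net
-```
-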
-Model name | File Size
--------------------------------------------------------------------------------------- | :-------:
-[xception_41_imagenet](http://download.tensorflow.org/models/xception_41_2018_05_09.tar.gz ) | 288MB
-[xception_65_imagenet](http://download.tensorflow.org/models/deeplabv3_xception_2018_01_04.tar.gz) | 447MB
-[xception_65_imagenet_coco](http://download.tensorflow.org/models/xception_65_coco_pretrained_2018_10_02.tar.gz) | 292MB
-[xception_71_imagenet](http://download.tensorflow.org/models/xception_71_2018_05_09.tar.gz ) | 474MB
-[resnet_v1_50_beta_imagenet](http://download.tensorflow.org/models/resnet_v1_50_2018_05_04.tar.gz) | 274MB
-[resnet_v1_101_beta_imagenet](http://download.tensorflow.org/models/resnet_v1_101_2018_05_04.tar.gz) | 477MB
-
-## References
-
-1. **Mobilenets: Efficient convolutional neural networks for mobile vision applications**
- Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
- [[link]](https://arxiv.org/abs/1704.04861). arXiv:1704.04861, 2017.
-
-2. **Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation**
- Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
- [[link]](https://arxiv.org/abs/1801.04381). arXiv:1801.04381, 2018.
-
-3. **Xception: Deep Learning with Depthwise Separable Convolutions**
- François Chollet
- [[link]](https://arxiv.org/abs/1610.02357). In the Proc. of CVPR, 2017.
-
-4. **Deformable Convolutional Networks -- COCO Detection and Segmentation Challenge 2017 Entry**
- Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei, Jifeng Dai
- [[link]](http://presentations.cocodataset.org/COCO17-Detect-MSRA.pdf). ICCV COCO Challenge
- Workshop, 2017.
-
-5. **The Pascal Visual Object Classes Challenge: A Retrospective**
- Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John M. Winn, Andrew Zisserman
- [[link]](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/). IJCV, 2014.
-
-6. **Semantic Contours from Inverse Detectors**
- Bharath Hariharan, Pablo Arbelaez, Lubomir Bourdev, Subhransu Maji, Jitendra Malik
- [[link]](http://home.bharathh.info/pubs/codes/SBD/download.html). In the Proc. of ICCV, 2011.
-
-7. **The Cityscapes Dataset for Semantic Urban Scene Understanding**
- Cordts, Marius, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele.
- [[link]](https://www.cityscapes-dataset.com/). In the Proc. of CVPR, 2016.
-
-8. **Microsoft COCO: Common Objects in Context**
- Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar
- [[link]](http://cocodataset.org/). In the Proc. of ECCV, 2014.
-
-9. **ImageNet Large Scale Visual Recognition Challenge**
- Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei
- [[link]](http://www.image-net.org/). IJCV, 2015.
-
-10. **Deep Residual Learning for Image Recognition**
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- [[link]](https://arxiv.org/abs/1512.03385). CVPR, 2016.
-
-11. **Pyramid Scene Parsing Network**
- Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
- [[link]](https://arxiv.org/abs/1612.01105). In CVPR, 2017.
-
-12. **Scene Parsing through ADE20K Dataset**
- Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba
- [[link]](http://groups.csail.mit.edu/vision/datasets/ADE20K/). In CVPR,
- 2017.
-
-13. **Searching for MobileNetV3**
- Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam
- [[link]](https://arxiv.org/abs/1905.02244). In ICCV, 2019.
diff --git a/research/deeplab/g3doc/pascal.md b/research/deeplab/g3doc/pascal.md
deleted file mode 100644
index f4bc84eabb8..00000000000
--- a/research/deeplab/g3doc/pascal.md
+++ /dev/null
@@ -1,161 +0,0 @@
-# Running DeepLab on PASCAL VOC 2012 Semantic Segmentation Dataset
-
-This page walks through the steps required to run DeepLab on PASCAL VOC 2012 on
-a local machine.
-
-## Download dataset and convert to TFRecord
-
-We have prepared the script (under the folder `datasets`) to download and
-convert the PASCAL VOC 2012 semantic segmentation dataset to TFRecord.
-
-```bash
-# From the tensorflow/models/research/deeplab/datasets directory.
-sh download_and_convert_voc2012.sh
-```
-
-The converted dataset will be saved at
-./deeplab/datasets/pascal_voc_seg/tfrecord
-
-## Recommended Directory Structure for Training and Evaluation
-
-```
-+ datasets
- + pascal_voc_seg
- + VOCdevkit
- + VOC2012
- + JPEGImages
- + SegmentationClass
- + tfrecord
- + exp
- + train_on_train_set
- + train
- + eval
- + vis
-```
-
-where the folder `train_on_train_set` stores the train/eval/vis events and
-results (when training DeepLab on the PASCAL VOC 2012 train set).
-
-## Running the train/eval/vis jobs
-
-A local training job using `xception_65` can be run with the following command:
-
-```bash
-# From tensorflow/models/research/
-python deeplab/train.py \
- --logtostderr \
- --training_number_of_steps=30000 \
- --train_split="train" \
- --model_variant="xception_65" \
- --atrous_rates=6 \
- --atrous_rates=12 \
- --atrous_rates=18 \
- --output_stride=16 \
- --decoder_output_stride=4 \
- --train_crop_size="513,513" \
- --train_batch_size=1 \
- --dataset="pascal_voc_seg" \
- --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
- --train_logdir=${PATH_TO_TRAIN_DIR} \
- --dataset_dir=${PATH_TO_DATASET}
-```
-
-where ${PATH_TO_INITIAL_CHECKPOINT} is the path to the initial checkpoint
-(usually an ImageNet pretrained checkpoint), ${PATH_TO_TRAIN_DIR} is the
-directory to which training checkpoints and events will be written, and
-${PATH_TO_DATASET} is the directory in which the PASCAL VOC 2012 dataset
-resides.
-
-**Note that for {train,eval,vis}.py:**
-
-1. In order to reproduce our results, one needs to use a large batch size (> 12)
- and set fine_tune_batch_norm = True. Here, we simply use a small batch size
- during training for demonstration purposes. If you have limited GPU memory at
- hand, please fine-tune from our provided checkpoints whose batch norm
- parameters have already been trained, and use a smaller learning rate with
- fine_tune_batch_norm = False.
-
-2. The users should change atrous_rates from [6, 12, 18] to [12, 24, 36] if
- setting output_stride=8.
-
-3. Users could skip the flag `decoder_output_stride` if they do not want to
- use the decoder structure.
-
-A local evaluation job using `xception_65` can be run with the following
-command:
-
-```bash
-# From tensorflow/models/research/
-python deeplab/eval.py \
- --logtostderr \
- --eval_split="val" \
- --model_variant="xception_65" \
- --atrous_rates=6 \
- --atrous_rates=12 \
- --atrous_rates=18 \
- --output_stride=16 \
- --decoder_output_stride=4 \
- --eval_crop_size="513,513" \
- --dataset="pascal_voc_seg" \
- --checkpoint_dir=${PATH_TO_CHECKPOINT} \
- --eval_logdir=${PATH_TO_EVAL_DIR} \
- --dataset_dir=${PATH_TO_DATASET}
-```
-
-where ${PATH_TO_CHECKPOINT} is the path to the trained checkpoint (i.e., the
-path to train_logdir), ${PATH_TO_EVAL_DIR} is the directory to which evaluation
-events will be written, and ${PATH_TO_DATASET} is the directory in which the
-PASCAL VOC 2012 dataset resides.
-
-A local visualization job using `xception_65` can be run with the following
-command:
-
-```bash
-# From tensorflow/models/research/
-python deeplab/vis.py \
- --logtostderr \
- --vis_split="val" \
- --model_variant="xception_65" \
- --atrous_rates=6 \
- --atrous_rates=12 \
- --atrous_rates=18 \
- --output_stride=16 \
- --decoder_output_stride=4 \
- --vis_crop_size="513,513" \
- --dataset="pascal_voc_seg" \
- --checkpoint_dir=${PATH_TO_CHECKPOINT} \
- --vis_logdir=${PATH_TO_VIS_DIR} \
- --dataset_dir=${PATH_TO_DATASET}
-```
-
-where `${PATH_TO_CHECKPOINT}` is the path to the trained checkpoint (i.e., the
-path to train_logdir), `${PATH_TO_VIS_DIR}` is the directory to which the
-visualization results will be written, and `${PATH_TO_DATASET}` is the directory
-in which the PASCAL VOC 2012 dataset resides. Note that if you would like to
-save the segmentation results for the evaluation server, also set
-also_save_raw_predictions = True.
-
-## Running TensorBoard
-
-Progress for the training and evaluation jobs can be inspected using
-TensorBoard. If you use the recommended directory structure, TensorBoard can be
-run with the following command:
-
-```bash
-tensorboard --logdir=${PATH_TO_LOG_DIRECTORY}
-```
-
-where `${PATH_TO_LOG_DIRECTORY}` points to the directory that contains the
-train, eval, and vis directories (e.g., the folder `train_on_train_set` in the
-above example). Please note it may take TensorBoard a couple of minutes to
-populate with data.
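-
-For instance, assuming the recommended directory structure above and that the
-command is run from the tensorflow/models/research directory, this might look
-like the following sketch:
-
-```bash
-# From tensorflow/models/research/
-tensorboard --logdir=deeplab/datasets/pascal_voc_seg/exp/train_on_train_set
-```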
-
-## Example
-
-We provide a script that runs {train,eval,vis,export_model}.py on the PASCAL
-VOC 2012 dataset as an example. See local_test.sh for details.
-
-```bash
-# From tensorflow/models/research/deeplab
-sh local_test.sh
-```
diff --git a/research/deeplab/g3doc/quantize.md b/research/deeplab/g3doc/quantize.md
deleted file mode 100644
index 65dbdd70b4d..00000000000
--- a/research/deeplab/g3doc/quantize.md
+++ /dev/null
@@ -1,103 +0,0 @@
-# Quantize DeepLab model for faster on-device inference
-
-This page describes the steps required to quantize a DeepLab model and convert
-it to TFLite for on-device inference. The main steps include:
-
-1. Quantization-aware training
-1. Exporting the model
-1. Converting to TFLite FlatBuffer
-
-We provide details for each step below.
-
-## Quantization-aware training
-
-DeepLab supports two approaches to quantize your model.
-
-1. **[Recommended]** Train a non-quantized model until convergence, then
-   fine-tune the trained float model with quantization using a small learning
-   rate (on PASCAL we use 3e-5). This fine-tuning step usually takes 2k to 5k
-   steps to converge.
-
-1. Train a DeepLab float model with delayed quantization. We usually delay
-   quantization until the last few thousand steps of training (a sketch of this
-   variant follows the fine-tuning command below).
-
-In the current implementation, quantization is only supported with 1)
-`num_clones=1` for training and 2) single-scale inference for evaluation,
-visualization, and model export. To get the best performance from the quantized
-model, we strongly recommend training the float model with a larger
-`num_clones` and then fine-tuning the model with a single clone.
-
-The command line below quantizes a DeepLab model trained on the PASCAL VOC
-dataset using fine-tuning:
-
-```bash
-# From tensorflow/models/research/
-python deeplab/train.py \
- --logtostderr \
- --training_number_of_steps=3000 \
- --train_split="train" \
- --model_variant="mobilenet_v2" \
- --output_stride=16 \
- --train_crop_size="513,513" \
- --train_batch_size=8 \
- --base_learning_rate=3e-5 \
- --dataset="pascal_voc_seg" \
- --quantize_delay_step=0 \
- --tf_initial_checkpoint=${PATH_TO_TRAINED_FLOAT_MODEL} \
- --train_logdir=${PATH_TO_TRAIN_DIR} \
- --dataset_dir=${PATH_TO_DATASET}
-```
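-
-For the second (delayed quantization) approach, the training command might look
-like the following sketch. The step and batch-size values here are only
-illustrative; the point is that `quantize_delay_step` is set a few thousand
-steps before `training_number_of_steps`, so quantization is enabled only near
-the end of training:
-
-```bash
-# From tensorflow/models/research/
-python deeplab/train.py \
-  --logtostderr \
-  --training_number_of_steps=30000 \
-  --train_split="train" \
-  --model_variant="mobilenet_v2" \
-  --output_stride=16 \
-  --train_crop_size="513,513" \
-  --train_batch_size=8 \
-  --dataset="pascal_voc_seg" \
-  --quantize_delay_step=27000 \
-  --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
-  --train_logdir=${PATH_TO_TRAIN_DIR} \
-  --dataset_dir=${PATH_TO_DATASET}
-```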
-
-## Converting to TFLite FlatBuffer
-
-First, use the following command line to export your trained model.
-
-```bash
-# From tensorflow/models/research/
-python deeplab/export_model.py \
- --checkpoint_path=${CHECKPOINT_PATH} \
- --quantize_delay_step=0 \
-  --export_path=${OUTPUT_DIR}/frozen_inference_graph.pb
-```
-
-The command line below shows how to convert the exported GraphDef to a TFLite
-model.
-
-```bash
-# From tensorflow/models/research/
-python deeplab/convert_to_tflite.py \
- --quantized_graph_def_path=${OUTPUT_DIR}/frozen_inference_graph.pb \
- --input_tensor_name=MobilenetV2/MobilenetV2/input:0 \
- --output_tflite_path=${OUTPUT_DIR}/frozen_inference_graph.tflite \
- --test_image_path=${PATH_TO_TEST_IMAGE}
-```
-
-**[Important]** Note that the converted model expects 513x513 RGB input and does
-not include preprocessing (resizing and padding the input image) or
-post-processing (cropping the padded region and resizing to the original input
-size). These steps can be implemented outside of the TFLite model.
-
-## Quantized model on PASCAL VOC
-
-We provide float and quantized checkpoints that have been pretrained on the VOC
-2012 train_aug set, using a MobileNet-v2 backbone with different depth
-multipliers. The quantized models usually show a 1% drop in mIoU.
-
-For the quantized (8-bit) models, the un-tarred directory includes:
-
-* a frozen inference graph (frozen_inference_graph.pb)
-
-* a checkpoint (model.ckpt.data*, model.ckpt.index)
-
-* a converted TFlite FlatBuffer file (frozen_inference_graph.tflite)
-
-Checkpoint name | Eval OS | Eval scales | Left-right Flip | Multiply-Adds | Quantize | PASCAL mIOU | Folder Size | TFLite File Size
--------------------------------------------------------------------------------------------------------------------------------------------- | :-----: | :---------: | :-------------: | :-----------: | :------: | :----------: | :-------: | :-------:
-[mobilenetv2_dm05_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_trainaug_2018_10_01.tar.gz) | 16 | [1.0] | No | 0.88B | No | 70.19% (val) | 7.6MB | N/A
-[mobilenetv2_dm05_coco_voc_trainaug_8bit](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_train_aug_8bit_2019_04_26.tar.gz) | 16 | [1.0] | No | 0.88B | Yes | 69.65% (val) | 8.2MB | 751.1KB
-[mobilenetv2_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz) | 16 | [1.0] | No | 2.75B | No | 75.32% (val) | 23MB | N/A
-[mobilenetv2_coco_voc_trainaug_8bit](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_8bit_2019_04_26.tar.gz) | 16 | [1.0] | No | 2.75B | Yes | 74.26% (val) | 24MB | 2.2MB
-
-Note that you might need the nightly build of TensorFlow (see
-[here](https://www.tensorflow.org/install) for install instructions) to convert
-the above quantized models to TFLite.
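-
-As an example, one of the quantized checkpoints listed above can be downloaded
-and unpacked as follows (the URL is taken from the table; `${OUTPUT_DIR}` is a
-placeholder for wherever you keep downloaded models):
-
-```bash
-# Download and unpack the 8-bit MobileNet-v2 PASCAL VOC checkpoint.
-mkdir -p ${OUTPUT_DIR}
-cd ${OUTPUT_DIR}
-wget -nd -c "http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_8bit_2019_04_26.tar.gz"
-tar -xf "deeplabv3_mnv2_pascal_train_aug_8bit_2019_04_26.tar.gz"
-```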
diff --git a/research/deeplab/input_preprocess.py b/research/deeplab/input_preprocess.py
deleted file mode 100644
index 9ca8bce4eb9..00000000000
--- a/research/deeplab/input_preprocess.py
+++ /dev/null
@@ -1,139 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Prepares the data used for DeepLab training/evaluation."""
-import tensorflow as tf
-from deeplab.core import feature_extractor
-from deeplab.core import preprocess_utils
-
-
-# The probability of flipping the images and labels
-# left-right during training
-_PROB_OF_FLIP = 0.5
-
-
-def preprocess_image_and_label(image,
- label,
- crop_height,
- crop_width,
- min_resize_value=None,
- max_resize_value=None,
- resize_factor=None,
- min_scale_factor=1.,
- max_scale_factor=1.,
- scale_factor_step_size=0,
- ignore_label=255,
- is_training=True,
- model_variant=None):
- """Preprocesses the image and label.
-
- Args:
- image: Input image.
- label: Ground truth annotation label.
- crop_height: The height value used to crop the image and label.
- crop_width: The width value used to crop the image and label.
- min_resize_value: Desired size of the smaller image side.
- max_resize_value: Maximum allowed size of the larger image side.
-    resize_factor: Resized dimensions are multiples of factor plus one.
- min_scale_factor: Minimum scale factor value.
- max_scale_factor: Maximum scale factor value.
- scale_factor_step_size: The step size from min scale factor to max scale
- factor. The input is randomly scaled based on the value of
- (min_scale_factor, max_scale_factor, scale_factor_step_size).
- ignore_label: The label value which will be ignored for training and
- evaluation.
- is_training: If the preprocessing is used for training or not.
- model_variant: Model variant (string) for choosing how to mean-subtract the
- images. See feature_extractor.network_map for supported model variants.
-
- Returns:
- original_image: Original image (could be resized).
- processed_image: Preprocessed image.
- label: Preprocessed ground truth segmentation label.
-
- Raises:
- ValueError: Ground truth label not provided during training.
- """
- if is_training and label is None:
- raise ValueError('During training, label must be provided.')
- if model_variant is None:
- tf.logging.warning('Default mean-subtraction is performed. Please specify '
- 'a model_variant. See feature_extractor.network_map for '
- 'supported model variants.')
-
- # Keep reference to original image.
- original_image = image
-
- processed_image = tf.cast(image, tf.float32)
-
- if label is not None:
- label = tf.cast(label, tf.int32)
-
- # Resize image and label to the desired range.
- if min_resize_value or max_resize_value:
- [processed_image, label] = (
- preprocess_utils.resize_to_range(
- image=processed_image,
- label=label,
- min_size=min_resize_value,
- max_size=max_resize_value,
- factor=resize_factor,
- align_corners=True))
- # The `original_image` becomes the resized image.
- original_image = tf.identity(processed_image)
-
- # Data augmentation by randomly scaling the inputs.
- if is_training:
- scale = preprocess_utils.get_random_scale(
- min_scale_factor, max_scale_factor, scale_factor_step_size)
- processed_image, label = preprocess_utils.randomly_scale_image_and_label(
- processed_image, label, scale)
- processed_image.set_shape([None, None, 3])
-
- # Pad image and label to have dimensions >= [crop_height, crop_width]
- image_shape = tf.shape(processed_image)
- image_height = image_shape[0]
- image_width = image_shape[1]
-
- target_height = image_height + tf.maximum(crop_height - image_height, 0)
- target_width = image_width + tf.maximum(crop_width - image_width, 0)
-
- # Pad image with mean pixel value.
- mean_pixel = tf.reshape(
- feature_extractor.mean_pixel(model_variant), [1, 1, 3])
- processed_image = preprocess_utils.pad_to_bounding_box(
- processed_image, 0, 0, target_height, target_width, mean_pixel)
-
- if label is not None:
- label = preprocess_utils.pad_to_bounding_box(
- label, 0, 0, target_height, target_width, ignore_label)
-
- # Randomly crop the image and label.
- if is_training and label is not None:
- processed_image, label = preprocess_utils.random_crop(
- [processed_image, label], crop_height, crop_width)
-
- processed_image.set_shape([crop_height, crop_width, 3])
-
- if label is not None:
- label.set_shape([crop_height, crop_width, 1])
-
- if is_training:
- # Randomly left-right flip the image and label.
- processed_image, label, _ = preprocess_utils.flip_dim(
- [processed_image, label], _PROB_OF_FLIP, dim=1)
-
- return original_image, processed_image, label
diff --git a/research/deeplab/local_test.sh b/research/deeplab/local_test.sh
deleted file mode 100644
index c9ad75f6928..00000000000
--- a/research/deeplab/local_test.sh
+++ /dev/null
@@ -1,147 +0,0 @@
-#!/bin/bash
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-#
-# This script is used to run a local test on PASCAL VOC 2012. Users may also
-# adapt this script for their own use case.
-#
-# Usage:
-# # From the tensorflow/models/research/deeplab directory.
-# bash ./local_test.sh
-#
-#
-
-# Exit immediately if a command exits with a non-zero status.
-set -e
-
-# Move one-level up to tensorflow/models/research directory.
-cd ..
-
-# Update PYTHONPATH.
-export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
-
-# Set up the working environment.
-CURRENT_DIR=$(pwd)
-WORK_DIR="${CURRENT_DIR}/deeplab"
-
-# Run model_test first to make sure the PYTHONPATH is correctly set.
-python "${WORK_DIR}"/model_test.py
-
-# Go to datasets folder and download PASCAL VOC 2012 segmentation dataset.
-DATASET_DIR="datasets"
-cd "${WORK_DIR}/${DATASET_DIR}"
-bash download_and_convert_voc2012.sh
-
-# Go back to original directory.
-cd "${CURRENT_DIR}"
-
-# Set up the working directories.
-PASCAL_FOLDER="pascal_voc_seg"
-EXP_FOLDER="exp/train_on_trainval_set"
-INIT_FOLDER="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/init_models"
-TRAIN_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/train"
-EVAL_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/eval"
-VIS_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/vis"
-EXPORT_DIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/export"
-mkdir -p "${INIT_FOLDER}"
-mkdir -p "${TRAIN_LOGDIR}"
-mkdir -p "${EVAL_LOGDIR}"
-mkdir -p "${VIS_LOGDIR}"
-mkdir -p "${EXPORT_DIR}"
-
-# Copy locally the trained checkpoint as the initial checkpoint.
-TF_INIT_ROOT="http://download.tensorflow.org/models"
-TF_INIT_CKPT="deeplabv3_pascal_train_aug_2018_01_04.tar.gz"
-cd "${INIT_FOLDER}"
-wget -nd -c "${TF_INIT_ROOT}/${TF_INIT_CKPT}"
-tar -xf "${TF_INIT_CKPT}"
-cd "${CURRENT_DIR}"
-
-PASCAL_DATASET="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/tfrecord"
-
-# Train 10 iterations.
-NUM_ITERATIONS=10
-python "${WORK_DIR}"/train.py \
- --logtostderr \
- --train_split="trainval" \
- --model_variant="xception_65" \
- --atrous_rates=6 \
- --atrous_rates=12 \
- --atrous_rates=18 \
- --output_stride=16 \
- --decoder_output_stride=4 \
- --train_crop_size="513,513" \
- --train_batch_size=4 \
- --training_number_of_steps="${NUM_ITERATIONS}" \
- --fine_tune_batch_norm=true \
- --tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_pascal_train_aug/model.ckpt" \
- --train_logdir="${TRAIN_LOGDIR}" \
- --dataset_dir="${PASCAL_DATASET}"
-
-# Run evaluation. This performs eval over the full val split (1449 images) and
-# will take a while.
-# Using the provided checkpoint, one should expect mIOU=82.20%.
-python "${WORK_DIR}"/eval.py \
- --logtostderr \
- --eval_split="val" \
- --model_variant="xception_65" \
- --atrous_rates=6 \
- --atrous_rates=12 \
- --atrous_rates=18 \
- --output_stride=16 \
- --decoder_output_stride=4 \
- --eval_crop_size="513,513" \
- --checkpoint_dir="${TRAIN_LOGDIR}" \
- --eval_logdir="${EVAL_LOGDIR}" \
- --dataset_dir="${PASCAL_DATASET}" \
- --max_number_of_evaluations=1
-
-# Visualize the results.
-python "${WORK_DIR}"/vis.py \
- --logtostderr \
- --vis_split="val" \
- --model_variant="xception_65" \
- --atrous_rates=6 \
- --atrous_rates=12 \
- --atrous_rates=18 \
- --output_stride=16 \
- --decoder_output_stride=4 \
- --vis_crop_size="513,513" \
- --checkpoint_dir="${TRAIN_LOGDIR}" \
- --vis_logdir="${VIS_LOGDIR}" \
- --dataset_dir="${PASCAL_DATASET}" \
- --max_number_of_iterations=1
-
-# Export the trained checkpoint.
-CKPT_PATH="${TRAIN_LOGDIR}/model.ckpt-${NUM_ITERATIONS}"
-EXPORT_PATH="${EXPORT_DIR}/frozen_inference_graph.pb"
-
-python "${WORK_DIR}"/export_model.py \
- --logtostderr \
- --checkpoint_path="${CKPT_PATH}" \
- --export_path="${EXPORT_PATH}" \
- --model_variant="xception_65" \
- --atrous_rates=6 \
- --atrous_rates=12 \
- --atrous_rates=18 \
- --output_stride=16 \
- --decoder_output_stride=4 \
- --num_classes=21 \
- --crop_size=513 \
- --crop_size=513 \
- --inference_scales=1.0
-
-# Run inference with the exported checkpoint.
-# Please refer to the provided deeplab_demo.ipynb for an example.
diff --git a/research/deeplab/local_test_mobilenetv2.sh b/research/deeplab/local_test_mobilenetv2.sh
deleted file mode 100644
index c38646fdf6c..00000000000
--- a/research/deeplab/local_test_mobilenetv2.sh
+++ /dev/null
@@ -1,129 +0,0 @@
-#!/bin/bash
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-#
-# This script is used to run a local test on PASCAL VOC 2012 using MobileNet-v2.
-# Users may also adapt this script for their own use case.
-#
-# Usage:
-# # From the tensorflow/models/research/deeplab directory.
-# sh ./local_test_mobilenetv2.sh
-#
-#
-
-# Exit immediately if a command exits with a non-zero status.
-set -e
-
-# Move one-level up to tensorflow/models/research directory.
-cd ..
-
-# Update PYTHONPATH.
-export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
-
-# Set up the working environment.
-CURRENT_DIR=$(pwd)
-WORK_DIR="${CURRENT_DIR}/deeplab"
-
-# Run model_test first to make sure the PYTHONPATH is correctly set.
-python "${WORK_DIR}"/model_test.py -v
-
-# Go to datasets folder and download PASCAL VOC 2012 segmentation dataset.
-DATASET_DIR="datasets"
-cd "${WORK_DIR}/${DATASET_DIR}"
-sh download_and_convert_voc2012.sh
-
-# Go back to original directory.
-cd "${CURRENT_DIR}"
-
-# Set up the working directories.
-PASCAL_FOLDER="pascal_voc_seg"
-EXP_FOLDER="exp/train_on_trainval_set_mobilenetv2"
-INIT_FOLDER="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/init_models"
-TRAIN_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/train"
-EVAL_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/eval"
-VIS_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/vis"
-EXPORT_DIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/export"
-mkdir -p "${INIT_FOLDER}"
-mkdir -p "${TRAIN_LOGDIR}"
-mkdir -p "${EVAL_LOGDIR}"
-mkdir -p "${VIS_LOGDIR}"
-mkdir -p "${EXPORT_DIR}"
-
-# Copy locally the trained checkpoint as the initial checkpoint.
-TF_INIT_ROOT="http://download.tensorflow.org/models"
-CKPT_NAME="deeplabv3_mnv2_pascal_train_aug"
-TF_INIT_CKPT="${CKPT_NAME}_2018_01_29.tar.gz"
-cd "${INIT_FOLDER}"
-wget -nd -c "${TF_INIT_ROOT}/${TF_INIT_CKPT}"
-tar -xf "${TF_INIT_CKPT}"
-cd "${CURRENT_DIR}"
-
-PASCAL_DATASET="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/tfrecord"
-
-# Train 10 iterations.
-NUM_ITERATIONS=10
-python "${WORK_DIR}"/train.py \
- --logtostderr \
- --train_split="trainval" \
- --model_variant="mobilenet_v2" \
- --output_stride=16 \
- --train_crop_size="513,513" \
- --train_batch_size=4 \
- --training_number_of_steps="${NUM_ITERATIONS}" \
- --fine_tune_batch_norm=true \
- --tf_initial_checkpoint="${INIT_FOLDER}/${CKPT_NAME}/model.ckpt-30000" \
- --train_logdir="${TRAIN_LOGDIR}" \
- --dataset_dir="${PASCAL_DATASET}"
-
-# Run evaluation. This performs eval over the full val split (1449 images) and
-# will take a while.
-# Using the provided checkpoint, one should expect mIOU=75.34%.
-python "${WORK_DIR}"/eval.py \
- --logtostderr \
- --eval_split="val" \
- --model_variant="mobilenet_v2" \
- --eval_crop_size="513,513" \
- --checkpoint_dir="${TRAIN_LOGDIR}" \
- --eval_logdir="${EVAL_LOGDIR}" \
- --dataset_dir="${PASCAL_DATASET}" \
- --max_number_of_evaluations=1
-
-# Visualize the results.
-python "${WORK_DIR}"/vis.py \
- --logtostderr \
- --vis_split="val" \
- --model_variant="mobilenet_v2" \
- --vis_crop_size="513,513" \
- --checkpoint_dir="${TRAIN_LOGDIR}" \
- --vis_logdir="${VIS_LOGDIR}" \
- --dataset_dir="${PASCAL_DATASET}" \
- --max_number_of_iterations=1
-
-# Export the trained checkpoint.
-CKPT_PATH="${TRAIN_LOGDIR}/model.ckpt-${NUM_ITERATIONS}"
-EXPORT_PATH="${EXPORT_DIR}/frozen_inference_graph.pb"
-
-python "${WORK_DIR}"/export_model.py \
- --logtostderr \
- --checkpoint_path="${CKPT_PATH}" \
- --export_path="${EXPORT_PATH}" \
- --model_variant="mobilenet_v2" \
- --num_classes=21 \
- --crop_size=513 \
- --crop_size=513 \
- --inference_scales=1.0
-
-# Run inference with the exported checkpoint.
-# Please refer to the provided deeplab_demo.ipynb for an example.
diff --git a/research/deeplab/model.py b/research/deeplab/model.py
deleted file mode 100644
index 311aaa1acb1..00000000000
--- a/research/deeplab/model.py
+++ /dev/null
@@ -1,911 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-r"""Provides DeepLab model definition and helper functions.
-
-DeepLab is a deep learning system for semantic image segmentation with
-the following features:
-
-(1) Atrous convolution to explicitly control the resolution at which
-feature responses are computed within Deep Convolutional Neural Networks.
-
-(2) Atrous spatial pyramid pooling (ASPP) to robustly segment objects at
-multiple scales with filters at multiple sampling rates and effective
-fields-of-views.
-
-(3) ASPP module augmented with image-level feature and batch normalization.
-
-(4) A simple yet effective decoder module to recover the object boundaries.
-
-See the following papers for more details:
-
-"Encoder-Decoder with Atrous Separable Convolution for Semantic Image
-Segmentation"
-Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam.
-(https://arxiv.org/abs/1802.02611)
-
-"Rethinking Atrous Convolution for Semantic Image Segmentation,"
-Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam
-(https://arxiv.org/abs/1706.05587)
-
-"DeepLab: Semantic Image Segmentation with Deep Convolutional Nets,
-Atrous Convolution, and Fully Connected CRFs",
-Liang-Chieh Chen*, George Papandreou*, Iasonas Kokkinos, Kevin Murphy,
-Alan L Yuille (* equal contribution)
-(https://arxiv.org/abs/1606.00915)
-
-"Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected
-CRFs"
-Liang-Chieh Chen*, George Papandreou*, Iasonas Kokkinos, Kevin Murphy,
-Alan L. Yuille (* equal contribution)
-(https://arxiv.org/abs/1412.7062)
-"""
-import tensorflow as tf
-from tensorflow.contrib import slim as contrib_slim
-from deeplab.core import dense_prediction_cell
-from deeplab.core import feature_extractor
-from deeplab.core import utils
-
-slim = contrib_slim
-
-LOGITS_SCOPE_NAME = 'logits'
-MERGED_LOGITS_SCOPE = 'merged_logits'
-IMAGE_POOLING_SCOPE = 'image_pooling'
-ASPP_SCOPE = 'aspp'
-CONCAT_PROJECTION_SCOPE = 'concat_projection'
-DECODER_SCOPE = 'decoder'
-META_ARCHITECTURE_SCOPE = 'meta_architecture'
-
-PROB_SUFFIX = '_prob'
-
-_resize_bilinear = utils.resize_bilinear
-scale_dimension = utils.scale_dimension
-split_separable_conv2d = utils.split_separable_conv2d
-
-
-def get_extra_layer_scopes(last_layers_contain_logits_only=False):
- """Gets the scopes for extra layers.
-
- Args:
- last_layers_contain_logits_only: Boolean, True if only consider logits as
- the last layer (i.e., exclude ASPP module, decoder module and so on)
-
- Returns:
- A list of scopes for extra layers.
- """
- if last_layers_contain_logits_only:
- return [LOGITS_SCOPE_NAME]
- else:
- return [
- LOGITS_SCOPE_NAME,
- IMAGE_POOLING_SCOPE,
- ASPP_SCOPE,
- CONCAT_PROJECTION_SCOPE,
- DECODER_SCOPE,
- META_ARCHITECTURE_SCOPE,
- ]
-
-
-def predict_labels_multi_scale(images,
- model_options,
- eval_scales=(1.0,),
- add_flipped_images=False):
- """Predicts segmentation labels.
-
- Args:
- images: A tensor of size [batch, height, width, channels].
- model_options: A ModelOptions instance to configure models.
- eval_scales: The scales to resize images for evaluation.
- add_flipped_images: Add flipped images for evaluation or not.
-
- Returns:
- A dictionary with keys specifying the output_type (e.g., semantic
- prediction) and values storing Tensors representing predictions (argmax
- over channels). Each prediction has size [batch, height, width].
- """
- outputs_to_predictions = {
- output: []
- for output in model_options.outputs_to_num_classes
- }
-
- for i, image_scale in enumerate(eval_scales):
- with tf.variable_scope(tf.get_variable_scope(), reuse=True if i else None):
- outputs_to_scales_to_logits = multi_scale_logits(
- images,
- model_options=model_options,
- image_pyramid=[image_scale],
- is_training=False,
- fine_tune_batch_norm=False)
-
- if add_flipped_images:
- with tf.variable_scope(tf.get_variable_scope(), reuse=True):
- outputs_to_scales_to_logits_reversed = multi_scale_logits(
- tf.reverse_v2(images, [2]),
- model_options=model_options,
- image_pyramid=[image_scale],
- is_training=False,
- fine_tune_batch_norm=False)
-
- for output in sorted(outputs_to_scales_to_logits):
- scales_to_logits = outputs_to_scales_to_logits[output]
- logits = _resize_bilinear(
- scales_to_logits[MERGED_LOGITS_SCOPE],
- tf.shape(images)[1:3],
- scales_to_logits[MERGED_LOGITS_SCOPE].dtype)
- outputs_to_predictions[output].append(
- tf.expand_dims(tf.nn.softmax(logits), 4))
-
- if add_flipped_images:
- scales_to_logits_reversed = (
- outputs_to_scales_to_logits_reversed[output])
- logits_reversed = _resize_bilinear(
- tf.reverse_v2(scales_to_logits_reversed[MERGED_LOGITS_SCOPE], [2]),
- tf.shape(images)[1:3],
- scales_to_logits_reversed[MERGED_LOGITS_SCOPE].dtype)
- outputs_to_predictions[output].append(
- tf.expand_dims(tf.nn.softmax(logits_reversed), 4))
-
- for output in sorted(outputs_to_predictions):
- predictions = outputs_to_predictions[output]
- # Compute average prediction across different scales and flipped images.
- predictions = tf.reduce_mean(tf.concat(predictions, 4), axis=4)
- outputs_to_predictions[output] = tf.argmax(predictions, 3)
- outputs_to_predictions[output + PROB_SUFFIX] = tf.nn.softmax(predictions)
-
- return outputs_to_predictions
-
-
-def predict_labels(images, model_options, image_pyramid=None):
- """Predicts segmentation labels.
-
- Args:
- images: A tensor of size [batch, height, width, channels].
- model_options: A ModelOptions instance to configure models.
- image_pyramid: Input image scales for multi-scale feature extraction.
-
- Returns:
- A dictionary with keys specifying the output_type (e.g., semantic
- prediction) and values storing Tensors representing predictions (argmax
- over channels). Each prediction has size [batch, height, width].
- """
- outputs_to_scales_to_logits = multi_scale_logits(
- images,
- model_options=model_options,
- image_pyramid=image_pyramid,
- is_training=False,
- fine_tune_batch_norm=False)
-
- predictions = {}
- for output in sorted(outputs_to_scales_to_logits):
- scales_to_logits = outputs_to_scales_to_logits[output]
- logits = scales_to_logits[MERGED_LOGITS_SCOPE]
- # There are two ways to obtain the final prediction results: (1) bilinear
- # upsampling the logits followed by argmax, or (2) argmax followed by
- # nearest neighbor upsampling. The second option may introduce the "blocking
- # effect" but is computationally efficient.
- if model_options.prediction_with_upsampled_logits:
- logits = _resize_bilinear(logits,
- tf.shape(images)[1:3],
- scales_to_logits[MERGED_LOGITS_SCOPE].dtype)
- predictions[output] = tf.argmax(logits, 3)
- predictions[output + PROB_SUFFIX] = tf.nn.softmax(logits)
- else:
- argmax_results = tf.argmax(logits, 3)
- argmax_results = tf.image.resize_nearest_neighbor(
- tf.expand_dims(argmax_results, 3),
- tf.shape(images)[1:3],
- align_corners=True,
- name='resize_prediction')
- predictions[output] = tf.squeeze(argmax_results, 3)
- predictions[output + PROB_SUFFIX] = tf.image.resize_bilinear(
- tf.nn.softmax(logits),
- tf.shape(images)[1:3],
- align_corners=True,
- name='resize_prob')
- return predictions
-
-
-def multi_scale_logits(images,
- model_options,
- image_pyramid,
- weight_decay=0.0001,
- is_training=False,
- fine_tune_batch_norm=False,
- nas_training_hyper_parameters=None):
- """Gets the logits for multi-scale inputs.
-
- The returned logits are all downsampled (due to max-pooling layers)
- for both training and evaluation.
-
- Args:
- images: A tensor of size [batch, height, width, channels].
- model_options: A ModelOptions instance to configure models.
- image_pyramid: Input image scales for multi-scale feature extraction.
- weight_decay: The weight decay for model variables.
- is_training: Is training or not.
- fine_tune_batch_norm: Fine-tune the batch norm parameters or not.
- nas_training_hyper_parameters: A dictionary storing hyper-parameters for
- training nas models. Its keys are:
- - `drop_path_keep_prob`: Probability to keep each path in the cell when
- training.
- - `total_training_steps`: Total training steps to help drop path
- probability calculation.
-
- Returns:
- outputs_to_scales_to_logits: A map of maps from output_type (e.g.,
- semantic prediction) to a dictionary of multi-scale logits names to
- logits. For each output_type, the dictionary has keys which
- correspond to the scales and values which correspond to the logits.
- For example, if `scales` equals [1.0, 1.5], then the keys would
- include 'merged_logits', 'logits_1.00' and 'logits_1.50'.
-
- Raises:
- ValueError: If model_options doesn't specify crop_size and its
- add_image_level_feature = True, since add_image_level_feature requires
- crop_size information.
- """
- # Setup default values.
- if not image_pyramid:
- image_pyramid = [1.0]
- crop_height = (
- model_options.crop_size[0]
- if model_options.crop_size else tf.shape(images)[1])
- crop_width = (
- model_options.crop_size[1]
- if model_options.crop_size else tf.shape(images)[2])
- if model_options.image_pooling_crop_size:
- image_pooling_crop_height = model_options.image_pooling_crop_size[0]
- image_pooling_crop_width = model_options.image_pooling_crop_size[1]
-
- # Compute the height, width for the output logits.
- if model_options.decoder_output_stride:
- logits_output_stride = min(model_options.decoder_output_stride)
- else:
- logits_output_stride = model_options.output_stride
-
- logits_height = scale_dimension(
- crop_height,
- max(1.0, max(image_pyramid)) / logits_output_stride)
- logits_width = scale_dimension(
- crop_width,
- max(1.0, max(image_pyramid)) / logits_output_stride)
-
- # Compute the logits for each scale in the image pyramid.
- outputs_to_scales_to_logits = {
- k: {}
- for k in model_options.outputs_to_num_classes
- }
-
- num_channels = images.get_shape().as_list()[-1]
-
- for image_scale in image_pyramid:
- if image_scale != 1.0:
- scaled_height = scale_dimension(crop_height, image_scale)
- scaled_width = scale_dimension(crop_width, image_scale)
- scaled_crop_size = [scaled_height, scaled_width]
- scaled_images = _resize_bilinear(images, scaled_crop_size, images.dtype)
- if model_options.crop_size:
- scaled_images.set_shape(
- [None, scaled_height, scaled_width, num_channels])
- # Adjust image_pooling_crop_size accordingly.
- scaled_image_pooling_crop_size = None
- if model_options.image_pooling_crop_size:
- scaled_image_pooling_crop_size = [
- scale_dimension(image_pooling_crop_height, image_scale),
- scale_dimension(image_pooling_crop_width, image_scale)]
- else:
- scaled_crop_size = model_options.crop_size
- scaled_images = images
- scaled_image_pooling_crop_size = model_options.image_pooling_crop_size
-
- updated_options = model_options._replace(
- crop_size=scaled_crop_size,
- image_pooling_crop_size=scaled_image_pooling_crop_size)
- outputs_to_logits = _get_logits(
- scaled_images,
- updated_options,
- weight_decay=weight_decay,
- reuse=tf.AUTO_REUSE,
- is_training=is_training,
- fine_tune_batch_norm=fine_tune_batch_norm,
- nas_training_hyper_parameters=nas_training_hyper_parameters)
-
- # Resize the logits to have the same dimension before merging.
- for output in sorted(outputs_to_logits):
- outputs_to_logits[output] = _resize_bilinear(
- outputs_to_logits[output], [logits_height, logits_width],
- outputs_to_logits[output].dtype)
-
- # Return when only one input scale.
- if len(image_pyramid) == 1:
- for output in sorted(model_options.outputs_to_num_classes):
- outputs_to_scales_to_logits[output][
- MERGED_LOGITS_SCOPE] = outputs_to_logits[output]
- return outputs_to_scales_to_logits
-
- # Save logits to the output map.
- for output in sorted(model_options.outputs_to_num_classes):
- outputs_to_scales_to_logits[output][
- 'logits_%.2f' % image_scale] = outputs_to_logits[output]
-
- # Merge the logits from all the multi-scale inputs.
- for output in sorted(model_options.outputs_to_num_classes):
- # Concatenate the multi-scale logits for each output type.
- all_logits = [
- tf.expand_dims(logits, axis=4)
- for logits in outputs_to_scales_to_logits[output].values()
- ]
- all_logits = tf.concat(all_logits, 4)
- merge_fn = (
- tf.reduce_max
- if model_options.merge_method == 'max' else tf.reduce_mean)
- outputs_to_scales_to_logits[output][MERGED_LOGITS_SCOPE] = merge_fn(
- all_logits, axis=4)
-
- return outputs_to_scales_to_logits
-
-
-def extract_features(images,
- model_options,
- weight_decay=0.0001,
- reuse=None,
- is_training=False,
- fine_tune_batch_norm=False,
- nas_training_hyper_parameters=None):
- """Extracts features by the particular model_variant.
-
- Args:
- images: A tensor of size [batch, height, width, channels].
- model_options: A ModelOptions instance to configure models.
- weight_decay: The weight decay for model variables.
- reuse: Reuse the model variables or not.
- is_training: Is training or not.
- fine_tune_batch_norm: Fine-tune the batch norm parameters or not.
- nas_training_hyper_parameters: A dictionary storing hyper-parameters for
- training nas models. Its keys are:
- - `drop_path_keep_prob`: Probability to keep each path in the cell when
- training.
- - `total_training_steps`: Total training steps to help drop path
- probability calculation.
-
- Returns:
- concat_logits: A tensor of size [batch, feature_height, feature_width,
- feature_channels], where feature_height/feature_width are determined by
- the images height/width and output_stride.
- end_points: A dictionary from components of the network to the corresponding
- activation.
- """
- features, end_points = feature_extractor.extract_features(
- images,
- output_stride=model_options.output_stride,
- multi_grid=model_options.multi_grid,
- model_variant=model_options.model_variant,
- depth_multiplier=model_options.depth_multiplier,
- divisible_by=model_options.divisible_by,
- weight_decay=weight_decay,
- reuse=reuse,
- is_training=is_training,
- preprocessed_images_dtype=model_options.preprocessed_images_dtype,
- fine_tune_batch_norm=fine_tune_batch_norm,
- nas_architecture_options=model_options.nas_architecture_options,
- nas_training_hyper_parameters=nas_training_hyper_parameters,
- use_bounded_activation=model_options.use_bounded_activation)
-
- if not model_options.aspp_with_batch_norm:
- return features, end_points
- else:
- if model_options.dense_prediction_cell_config is not None:
- tf.logging.info('Using dense prediction cell config.')
- dense_prediction_layer = dense_prediction_cell.DensePredictionCell(
- config=model_options.dense_prediction_cell_config,
- hparams={
- 'conv_rate_multiplier': 16 // model_options.output_stride,
- })
- concat_logits = dense_prediction_layer.build_cell(
- features,
- output_stride=model_options.output_stride,
- crop_size=model_options.crop_size,
- image_pooling_crop_size=model_options.image_pooling_crop_size,
- weight_decay=weight_decay,
- reuse=reuse,
- is_training=is_training,
- fine_tune_batch_norm=fine_tune_batch_norm)
- return concat_logits, end_points
- else:
-      # The following code employs the DeepLabv3 ASPP module. Note that we
-      # could express the ASPP module as one particular dense prediction
-      # cell architecture. We do not do so but leave the following code
-      # for backward compatibility.
- batch_norm_params = utils.get_batch_norm_params(
- decay=0.9997,
- epsilon=1e-5,
- scale=True,
- is_training=(is_training and fine_tune_batch_norm),
- sync_batch_norm_method=model_options.sync_batch_norm_method)
- batch_norm = utils.get_batch_norm_fn(
- model_options.sync_batch_norm_method)
- activation_fn = (
- tf.nn.relu6 if model_options.use_bounded_activation else tf.nn.relu)
- with slim.arg_scope(
- [slim.conv2d, slim.separable_conv2d],
- weights_regularizer=slim.l2_regularizer(weight_decay),
- activation_fn=activation_fn,
- normalizer_fn=batch_norm,
- padding='SAME',
- stride=1,
- reuse=reuse):
- with slim.arg_scope([batch_norm], **batch_norm_params):
- depth = model_options.aspp_convs_filters
- branch_logits = []
-
- if model_options.add_image_level_feature:
- if model_options.crop_size is not None:
- image_pooling_crop_size = model_options.image_pooling_crop_size
- # If image_pooling_crop_size is not specified, use crop_size.
- if image_pooling_crop_size is None:
- image_pooling_crop_size = model_options.crop_size
- pool_height = scale_dimension(
- image_pooling_crop_size[0],
- 1. / model_options.output_stride)
- pool_width = scale_dimension(
- image_pooling_crop_size[1],
- 1. / model_options.output_stride)
- image_feature = slim.avg_pool2d(
- features, [pool_height, pool_width],
- model_options.image_pooling_stride, padding='VALID')
- resize_height = scale_dimension(
- model_options.crop_size[0],
- 1. / model_options.output_stride)
- resize_width = scale_dimension(
- model_options.crop_size[1],
- 1. / model_options.output_stride)
- else:
- # If crop_size is None, we simply do global pooling.
- pool_height = tf.shape(features)[1]
- pool_width = tf.shape(features)[2]
- image_feature = tf.reduce_mean(
- features, axis=[1, 2], keepdims=True)
- resize_height = pool_height
- resize_width = pool_width
- image_feature_activation_fn = tf.nn.relu
- image_feature_normalizer_fn = batch_norm
- if model_options.aspp_with_squeeze_and_excitation:
- image_feature_activation_fn = tf.nn.sigmoid
- if model_options.image_se_uses_qsigmoid:
- image_feature_activation_fn = utils.q_sigmoid
- image_feature_normalizer_fn = None
- image_feature = slim.conv2d(
- image_feature, depth, 1,
- activation_fn=image_feature_activation_fn,
- normalizer_fn=image_feature_normalizer_fn,
- scope=IMAGE_POOLING_SCOPE)
- image_feature = _resize_bilinear(
- image_feature,
- [resize_height, resize_width],
- image_feature.dtype)
- # Set shape for resize_height/resize_width if they are not Tensor.
- if isinstance(resize_height, tf.Tensor):
- resize_height = None
- if isinstance(resize_width, tf.Tensor):
- resize_width = None
- image_feature.set_shape([None, resize_height, resize_width, depth])
- if not model_options.aspp_with_squeeze_and_excitation:
- branch_logits.append(image_feature)
-
- # Employ a 1x1 convolution.
- branch_logits.append(slim.conv2d(features, depth, 1,
- scope=ASPP_SCOPE + str(0)))
-
- if model_options.atrous_rates:
- # Employ 3x3 convolutions with different atrous rates.
- for i, rate in enumerate(model_options.atrous_rates, 1):
- scope = ASPP_SCOPE + str(i)
- if model_options.aspp_with_separable_conv:
- aspp_features = split_separable_conv2d(
- features,
- filters=depth,
- rate=rate,
- weight_decay=weight_decay,
- scope=scope)
- else:
- aspp_features = slim.conv2d(
- features, depth, 3, rate=rate, scope=scope)
- branch_logits.append(aspp_features)
-
- # Merge branch logits.
- concat_logits = tf.concat(branch_logits, 3)
- if model_options.aspp_with_concat_projection:
- concat_logits = slim.conv2d(
- concat_logits, depth, 1, scope=CONCAT_PROJECTION_SCOPE)
- concat_logits = slim.dropout(
- concat_logits,
- keep_prob=0.9,
- is_training=is_training,
- scope=CONCAT_PROJECTION_SCOPE + '_dropout')
- if (model_options.add_image_level_feature and
- model_options.aspp_with_squeeze_and_excitation):
- concat_logits *= image_feature
-
- return concat_logits, end_points
-
-
-def _get_logits(images,
- model_options,
- weight_decay=0.0001,
- reuse=None,
- is_training=False,
- fine_tune_batch_norm=False,
- nas_training_hyper_parameters=None):
- """Gets the logits by atrous/image spatial pyramid pooling.
-
- Args:
- images: A tensor of size [batch, height, width, channels].
- model_options: A ModelOptions instance to configure models.
- weight_decay: The weight decay for model variables.
- reuse: Reuse the model variables or not.
- is_training: Is training or not.
- fine_tune_batch_norm: Fine-tune the batch norm parameters or not.
- nas_training_hyper_parameters: A dictionary storing hyper-parameters for
- training nas models. Its keys are:
- - `drop_path_keep_prob`: Probability to keep each path in the cell when
- training.
- - `total_training_steps`: Total training steps to help drop path
- probability calculation.
-
- Returns:
- outputs_to_logits: A map from output_type to logits.
- """
- features, end_points = extract_features(
- images,
- model_options,
- weight_decay=weight_decay,
- reuse=reuse,
- is_training=is_training,
- fine_tune_batch_norm=fine_tune_batch_norm,
- nas_training_hyper_parameters=nas_training_hyper_parameters)
-
- if model_options.decoder_output_stride:
- crop_size = model_options.crop_size
- if crop_size is None:
- crop_size = [tf.shape(images)[1], tf.shape(images)[2]]
- features = refine_by_decoder(
- features,
- end_points,
- crop_size=crop_size,
- decoder_output_stride=model_options.decoder_output_stride,
- decoder_use_separable_conv=model_options.decoder_use_separable_conv,
- decoder_use_sum_merge=model_options.decoder_use_sum_merge,
- decoder_filters=model_options.decoder_filters,
- decoder_output_is_logits=model_options.decoder_output_is_logits,
- model_variant=model_options.model_variant,
- weight_decay=weight_decay,
- reuse=reuse,
- is_training=is_training,
- fine_tune_batch_norm=fine_tune_batch_norm,
- use_bounded_activation=model_options.use_bounded_activation)
-
- outputs_to_logits = {}
- for output in sorted(model_options.outputs_to_num_classes):
- if model_options.decoder_output_is_logits:
- outputs_to_logits[output] = tf.identity(features,
- name=output)
- else:
- outputs_to_logits[output] = get_branch_logits(
- features,
- model_options.outputs_to_num_classes[output],
- model_options.atrous_rates,
- aspp_with_batch_norm=model_options.aspp_with_batch_norm,
- kernel_size=model_options.logits_kernel_size,
- weight_decay=weight_decay,
- reuse=reuse,
- scope_suffix=output)
-
- return outputs_to_logits
-
-
-def refine_by_decoder(features,
- end_points,
- crop_size=None,
- decoder_output_stride=None,
- decoder_use_separable_conv=False,
- decoder_use_sum_merge=False,
- decoder_filters=256,
- decoder_output_is_logits=False,
- model_variant=None,
- weight_decay=0.0001,
- reuse=None,
- is_training=False,
- fine_tune_batch_norm=False,
- use_bounded_activation=False,
- sync_batch_norm_method='None'):
- """Adds the decoder to obtain sharper segmentation results.
-
- Args:
- features: A tensor of size [batch, features_height, features_width,
- features_channels].
- end_points: A dictionary from components of the network to the corresponding
- activation.
- crop_size: A tuple [crop_height, crop_width] specifying whole patch crop
- size.
- decoder_output_stride: A list of integers specifying the output stride of
- low-level features used in the decoder module.
- decoder_use_separable_conv: Employ separable convolution for decoder or not.
- decoder_use_sum_merge: Boolean, decoder uses simple sum merge or not.
- decoder_filters: Integer, decoder filter size.
- decoder_output_is_logits: Boolean, using decoder output as logits or not.
- model_variant: Model variant for feature extraction.
- weight_decay: The weight decay for model variables.
- reuse: Reuse the model variables or not.
- is_training: Is training or not.
- fine_tune_batch_norm: Fine-tune the batch norm parameters or not.
- use_bounded_activation: Whether or not to use bounded activations. Bounded
- activations better lend themselves to quantized inference.
- sync_batch_norm_method: String, method used to sync batch norm. Currently
- only support `None` (no sync batch norm) and `tpu` (use tpu code to
- sync batch norm).
-
- Returns:
- Decoder output with size [batch, decoder_height, decoder_width,
- decoder_channels].
-
- Raises:
- ValueError: If crop_size is None.
- """
- if crop_size is None:
- raise ValueError('crop_size must be provided when using decoder.')
- batch_norm_params = utils.get_batch_norm_params(
- decay=0.9997,
- epsilon=1e-5,
- scale=True,
- is_training=(is_training and fine_tune_batch_norm),
- sync_batch_norm_method=sync_batch_norm_method)
- batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
- decoder_depth = decoder_filters
- projected_filters = 48
- if decoder_use_sum_merge:
- # When using sum merge, the projected filters must be equal to decoder
- # filters.
- projected_filters = decoder_filters
- if decoder_output_is_logits:
- # Overwrite the setting when decoder output is logits.
- activation_fn = None
- normalizer_fn = None
- conv2d_kernel = 1
- # Use original conv instead of separable conv.
- decoder_use_separable_conv = False
- else:
- # Default setting when decoder output is not logits.
- activation_fn = tf.nn.relu6 if use_bounded_activation else tf.nn.relu
- normalizer_fn = batch_norm
- conv2d_kernel = 3
- with slim.arg_scope(
- [slim.conv2d, slim.separable_conv2d],
- weights_regularizer=slim.l2_regularizer(weight_decay),
- activation_fn=activation_fn,
- normalizer_fn=normalizer_fn,
- padding='SAME',
- stride=1,
- reuse=reuse):
- with slim.arg_scope([batch_norm], **batch_norm_params):
- with tf.variable_scope(DECODER_SCOPE, DECODER_SCOPE, [features]):
- decoder_features = features
- decoder_stage = 0
- scope_suffix = ''
- for output_stride in decoder_output_stride:
- feature_list = feature_extractor.networks_to_feature_maps[
- model_variant][
- feature_extractor.DECODER_END_POINTS][output_stride]
-          # If only one decoder stage, we do not change the scope name in
-          # order for backward compatibility.
- if decoder_stage:
- scope_suffix = '_{}'.format(decoder_stage)
- for i, name in enumerate(feature_list):
- decoder_features_list = [decoder_features]
- # MobileNet and NAS variants use different naming convention.
- if ('mobilenet' in model_variant or
- model_variant.startswith('mnas') or
- model_variant.startswith('nas')):
- feature_name = name
- else:
- feature_name = '{}/{}'.format(
- feature_extractor.name_scope[model_variant], name)
- decoder_features_list.append(
- slim.conv2d(
- end_points[feature_name],
- projected_filters,
- 1,
- scope='feature_projection' + str(i) + scope_suffix))
- # Determine the output size.
- decoder_height = scale_dimension(crop_size[0], 1.0 / output_stride)
- decoder_width = scale_dimension(crop_size[1], 1.0 / output_stride)
- # Resize to decoder_height/decoder_width.
- for j, feature in enumerate(decoder_features_list):
- decoder_features_list[j] = _resize_bilinear(
- feature, [decoder_height, decoder_width], feature.dtype)
- h = (None if isinstance(decoder_height, tf.Tensor)
- else decoder_height)
- w = (None if isinstance(decoder_width, tf.Tensor)
- else decoder_width)
- decoder_features_list[j].set_shape([None, h, w, None])
- if decoder_use_sum_merge:
- decoder_features = _decoder_with_sum_merge(
- decoder_features_list,
- decoder_depth,
- conv2d_kernel=conv2d_kernel,
- decoder_use_separable_conv=decoder_use_separable_conv,
- weight_decay=weight_decay,
- scope_suffix=scope_suffix)
- else:
- if not decoder_use_separable_conv:
- scope_suffix = str(i) + scope_suffix
- decoder_features = _decoder_with_concat_merge(
- decoder_features_list,
- decoder_depth,
- decoder_use_separable_conv=decoder_use_separable_conv,
- weight_decay=weight_decay,
- scope_suffix=scope_suffix)
- decoder_stage += 1
- return decoder_features
-
-
-def _decoder_with_sum_merge(decoder_features_list,
- decoder_depth,
- conv2d_kernel=3,
- decoder_use_separable_conv=True,
- weight_decay=0.0001,
- scope_suffix=''):
- """Decoder with sum to merge features.
-
- Args:
- decoder_features_list: A list of decoder features.
- decoder_depth: Integer, the filters used in the convolution.
- conv2d_kernel: Integer, the convolution kernel size.
- decoder_use_separable_conv: Boolean, use separable conv or not.
- weight_decay: Weight decay for the model variables.
- scope_suffix: String, used in the scope suffix.
-
- Returns:
- decoder features merged with sum.
-
- Raises:
- RuntimeError: If decoder_features_list have length not equal to 2.
- """
- if len(decoder_features_list) != 2:
- raise RuntimeError('Expect decoder_features has length 2.')
- # Only apply one convolution when decoder use sum merge.
- if decoder_use_separable_conv:
- decoder_features = split_separable_conv2d(
- decoder_features_list[0],
- filters=decoder_depth,
- rate=1,
- weight_decay=weight_decay,
- scope='decoder_split_sep_conv0'+scope_suffix) + decoder_features_list[1]
- else:
- decoder_features = slim.conv2d(
- decoder_features_list[0],
- decoder_depth,
- conv2d_kernel,
- scope='decoder_conv0'+scope_suffix) + decoder_features_list[1]
- return decoder_features
-
-
-def _decoder_with_concat_merge(decoder_features_list,
- decoder_depth,
- decoder_use_separable_conv=True,
- weight_decay=0.0001,
- scope_suffix=''):
- """Decoder with concatenation to merge features.
-
- This decoder method applies two convolutions to smooth the features obtained
- by concatenating the input decoder_features_list.
-
- This decoder module is proposed in the DeepLabv3+ paper.
-
- Args:
- decoder_features_list: A list of decoder features.
- decoder_depth: Integer, the filters used in the convolution.
- decoder_use_separable_conv: Boolean, use separable conv or not.
- weight_decay: Weight decay for the model variables.
- scope_suffix: String, used in the scope suffix.
-
- Returns:
- decoder features merged with concatenation.
- """
- if decoder_use_separable_conv:
- decoder_features = split_separable_conv2d(
- tf.concat(decoder_features_list, 3),
- filters=decoder_depth,
- rate=1,
- weight_decay=weight_decay,
- scope='decoder_conv0'+scope_suffix)
- decoder_features = split_separable_conv2d(
- decoder_features,
- filters=decoder_depth,
- rate=1,
- weight_decay=weight_decay,
- scope='decoder_conv1'+scope_suffix)
- else:
- num_convs = 2
- decoder_features = slim.repeat(
- tf.concat(decoder_features_list, 3),
- num_convs,
- slim.conv2d,
- decoder_depth,
- 3,
- scope='decoder_conv'+scope_suffix)
- return decoder_features
-
-
-def get_branch_logits(features,
- num_classes,
- atrous_rates=None,
- aspp_with_batch_norm=False,
- kernel_size=1,
- weight_decay=0.0001,
- reuse=None,
- scope_suffix=''):
- """Gets the logits from each model's branch.
-
- The underlying model is branched out in the last layer when atrous
- spatial pyramid pooling is employed, and all branches are sum-merged
- to form the final logits.
-
- Args:
- features: A float tensor of shape [batch, height, width, channels].
- num_classes: Number of classes to predict.
- atrous_rates: A list of atrous convolution rates for last layer.
- aspp_with_batch_norm: Use batch normalization layers for ASPP.
- kernel_size: Kernel size for convolution.
- weight_decay: Weight decay for the model variables.
- reuse: Reuse model variables or not.
- scope_suffix: Scope suffix for the model variables.
-
- Returns:
- Merged logits with shape [batch, height, width, num_classes].
-
- Raises:
- ValueError: Upon invalid input kernel_size value.
- """
- # When using batch normalization with ASPP, ASPP has been applied before
- # in extract_features, and thus we simply apply 1x1 convolution here.
- if aspp_with_batch_norm or atrous_rates is None:
- if kernel_size != 1:
-      raise ValueError('Kernel size must be 1 when atrous_rates is None or '
-                       'using aspp_with_batch_norm. Got %d.' % kernel_size)
- atrous_rates = [1]
-
- with slim.arg_scope(
- [slim.conv2d],
- weights_regularizer=slim.l2_regularizer(weight_decay),
- weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
- reuse=reuse):
- with tf.variable_scope(LOGITS_SCOPE_NAME, LOGITS_SCOPE_NAME, [features]):
- branch_logits = []
- for i, rate in enumerate(atrous_rates):
- scope = scope_suffix
- if i:
- scope += '_%d' % i
-
- branch_logits.append(
- slim.conv2d(
- features,
- num_classes,
- kernel_size=kernel_size,
- rate=rate,
- activation_fn=None,
- normalizer_fn=None,
- scope=scope))
-
- return tf.add_n(branch_logits)
diff --git a/research/deeplab/model_test.py b/research/deeplab/model_test.py
deleted file mode 100644
index d8413d7395d..00000000000
--- a/research/deeplab/model_test.py
+++ /dev/null
@@ -1,148 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for DeepLab model and some helper functions."""
-
-import tensorflow as tf
-
-from deeplab import common
-from deeplab import model
-
-
-class DeeplabModelTest(tf.test.TestCase):
-
- def testWrongDeepLabVariant(self):
- model_options = common.ModelOptions([])._replace(
- model_variant='no_such_variant')
- with self.assertRaises(ValueError):
- model._get_logits(images=[], model_options=model_options)
-
- def testBuildDeepLabv2(self):
- batch_size = 2
- crop_size = [41, 41]
-
- # Test with two image_pyramids.
- image_pyramids = [[1], [0.5, 1]]
-
- # Test two model variants.
- model_variants = ['xception_65', 'mobilenet_v2']
-
- # Test with two output_types.
- outputs_to_num_classes = {'semantic': 3,
- 'direction': 2}
-
- expected_endpoints = [['merged_logits'],
- ['merged_logits',
- 'logits_0.50',
- 'logits_1.00']]
- expected_num_logits = [1, 3]
-
- for model_variant in model_variants:
- model_options = common.ModelOptions(outputs_to_num_classes)._replace(
- add_image_level_feature=False,
- aspp_with_batch_norm=False,
- aspp_with_separable_conv=False,
- model_variant=model_variant)
-
- for i, image_pyramid in enumerate(image_pyramids):
- g = tf.Graph()
- with g.as_default():
- with self.test_session(graph=g):
- inputs = tf.random_uniform(
- (batch_size, crop_size[0], crop_size[1], 3))
- outputs_to_scales_to_logits = model.multi_scale_logits(
- inputs, model_options, image_pyramid=image_pyramid)
-
- # Check computed results for each output type.
- for output in outputs_to_num_classes:
- scales_to_logits = outputs_to_scales_to_logits[output]
- self.assertListEqual(sorted(scales_to_logits.keys()),
- sorted(expected_endpoints[i]))
-
- # Expected number of logits = len(image_pyramid) + 1, since the
- # last logits is merged from all the scales.
- self.assertEqual(len(scales_to_logits), expected_num_logits[i])
-
- def testForwardpassDeepLabv3plus(self):
- crop_size = [33, 33]
- outputs_to_num_classes = {'semantic': 3}
-
- model_options = common.ModelOptions(
- outputs_to_num_classes,
- crop_size,
- output_stride=16
- )._replace(
- add_image_level_feature=True,
- aspp_with_batch_norm=True,
- logits_kernel_size=1,
- decoder_output_stride=[4],
- model_variant='mobilenet_v2') # Employ MobileNetv2 for fast test.
-
- g = tf.Graph()
- with g.as_default():
- with self.test_session(graph=g) as sess:
- inputs = tf.random_uniform(
- (1, crop_size[0], crop_size[1], 3))
- outputs_to_scales_to_logits = model.multi_scale_logits(
- inputs,
- model_options,
- image_pyramid=[1.0])
-
- sess.run(tf.global_variables_initializer())
- outputs_to_scales_to_logits = sess.run(outputs_to_scales_to_logits)
-
- # Check computed results for each output type.
- for output in outputs_to_num_classes:
- scales_to_logits = outputs_to_scales_to_logits[output]
- # Expect only one output.
- self.assertEqual(len(scales_to_logits), 1)
- for logits in scales_to_logits.values():
- self.assertTrue(logits.any())
-
- def testBuildDeepLabWithDensePredictionCell(self):
- batch_size = 1
- crop_size = [33, 33]
- outputs_to_num_classes = {'semantic': 2}
- expected_endpoints = ['merged_logits']
- dense_prediction_cell_config = [
- {'kernel': 3, 'rate': [1, 6], 'op': 'conv', 'input': -1},
- {'kernel': 3, 'rate': [18, 15], 'op': 'conv', 'input': 0},
- ]
- model_options = common.ModelOptions(
- outputs_to_num_classes,
- crop_size,
- output_stride=16)._replace(
- aspp_with_batch_norm=True,
- model_variant='mobilenet_v2',
- dense_prediction_cell_config=dense_prediction_cell_config)
- g = tf.Graph()
- with g.as_default():
- with self.test_session(graph=g):
- inputs = tf.random_uniform(
- (batch_size, crop_size[0], crop_size[1], 3))
- outputs_to_scales_to_model_results = model.multi_scale_logits(
- inputs,
- model_options,
- image_pyramid=[1.0])
- for output in outputs_to_num_classes:
- scales_to_model_results = outputs_to_scales_to_model_results[output]
- self.assertListEqual(
- list(scales_to_model_results), expected_endpoints)
- self.assertEqual(len(scales_to_model_results), 1)
-
-
-if __name__ == '__main__':
- tf.test.main()
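
The deleted `model_test.py` above asserts a simple bookkeeping rule for `model.multi_scale_logits`: with several pyramid scales the returned dictionary holds one entry per scale plus a merged entry, while a single scale yields only the merged logits. A minimal sketch of that rule in plain Python (illustrative only; the `logits_%.2f` key format is taken from `expected_endpoints` in the test above):

```python
# Illustrative sketch of the key layout asserted by testBuildDeepLabv2 above.
def expected_logits_keys(image_pyramid):
    """Scale keys expected from multi_scale_logits (key names assumed from the test)."""
    if len(image_pyramid) == 1:
        return ['merged_logits']  # A single scale yields only the merged logits.
    return ['logits_%.2f' % scale for scale in image_pyramid] + ['merged_logits']

print(expected_logits_keys([1]))         # ['merged_logits']
print(expected_logits_keys([0.5, 1.0]))  # ['logits_0.50', 'logits_1.00', 'merged_logits']
```
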
diff --git a/research/deeplab/testing/info.md b/research/deeplab/testing/info.md
deleted file mode 100644
index b84d2adb1c5..00000000000
--- a/research/deeplab/testing/info.md
+++ /dev/null
@@ -1,6 +0,0 @@
-This directory contains testing data.
-
-# pascal_voc_seg
-This folder contains data specific to the pascal_voc_seg dataset. val-00000-of-00001.tfrecord contains
-three randomly generated images with the format defined in
-tensorflow/models/research/deeplab/datasets/build_voc2012_data.py.
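
The note above says the tfrecord holds three randomly generated images. A hedged sketch of how one could sanity-check that count (assumes TensorFlow 2.x eager execution; the path is the repository-relative path from the diff header below):

```python
import tensorflow as tf

# Repository-relative path from the diff header below; adjust to your checkout.
path = 'research/deeplab/testing/pascal_voc_seg/val-00000-of-00001.tfrecord'

# TFRecordDataset yields one serialized tf.train.Example per record.
num_records = sum(1 for _ in tf.data.TFRecordDataset(path))
print('records:', num_records)  # Expected: 3, per the note above.
```
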
diff --git a/research/deeplab/testing/pascal_voc_seg/val-00000-of-00001.tfrecord b/research/deeplab/testing/pascal_voc_seg/val-00000-of-00001.tfrecord
deleted file mode 100644
index e81455b2e1a..00000000000
Binary files a/research/deeplab/testing/pascal_voc_seg/val-00000-of-00001.tfrecord and /dev/null differ
diff --git a/research/deeplab/train.py b/research/deeplab/train.py
deleted file mode 100644
index fbe060dccd4..00000000000
--- a/research/deeplab/train.py
+++ /dev/null
@@ -1,464 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Training script for the DeepLab model.
-
-See model.py for more details and usage.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import six
-import tensorflow as tf
-from tensorflow.contrib import quantize as contrib_quantize
-from tensorflow.contrib import tfprof as contrib_tfprof
-from deeplab import common
-from deeplab import model
-from deeplab.datasets import data_generator
-from deeplab.utils import train_utils
-from deployment import model_deploy
-
-slim = tf.contrib.slim
-flags = tf.app.flags
-FLAGS = flags.FLAGS
-
-# Settings for multi-GPUs/multi-replicas training.
-
-flags.DEFINE_integer('num_clones', 1, 'Number of clones to deploy.')
-
-flags.DEFINE_boolean('clone_on_cpu', False, 'Use CPUs to deploy clones.')
-
-flags.DEFINE_integer('num_replicas', 1, 'Number of worker replicas.')
-
-flags.DEFINE_integer('startup_delay_steps', 15,
- 'Number of training steps between replica startups.')
-
-flags.DEFINE_integer(
- 'num_ps_tasks', 0,
- 'The number of parameter servers. If the value is 0, then '
- 'the parameters are handled locally by the worker.')
-
-flags.DEFINE_string('master', '', 'BNS name of the tensorflow server')
-
-flags.DEFINE_integer('task', 0, 'The task ID.')
-
-# Settings for logging.
-
-flags.DEFINE_string('train_logdir', None,
- 'Where the checkpoint and logs are stored.')
-
-flags.DEFINE_integer('log_steps', 10,
- 'Display logging information at every log_steps.')
-
-flags.DEFINE_integer('save_interval_secs', 1200,
- 'How often, in seconds, we save the model to disk.')
-
-flags.DEFINE_integer('save_summaries_secs', 600,
- 'How often, in seconds, we compute the summaries.')
-
-flags.DEFINE_boolean(
- 'save_summaries_images', False,
- 'Save sample inputs, labels, and semantic predictions as '
- 'images to summary.')
-
-# Settings for profiling.
-
-flags.DEFINE_string('profile_logdir', None,
- 'Where the profile files are stored.')
-
-# Settings for training strategy.
-
-flags.DEFINE_enum('optimizer', 'momentum', ['momentum', 'adam'],
- 'Which optimizer to use.')
-
-
-# Momentum optimizer flags
-
-flags.DEFINE_enum('learning_policy', 'poly', ['poly', 'step'],
- 'Learning rate policy for training.')
-
-# Use 0.007 when training on PASCAL augmented training set, train_aug. When
-# fine-tuning on PASCAL trainval set, use learning rate=0.0001.
-flags.DEFINE_float('base_learning_rate', .0001,
- 'The base learning rate for model training.')
-
-flags.DEFINE_float('decay_steps', 0.0,
- 'Decay steps for polynomial learning rate schedule.')
-
-flags.DEFINE_float('end_learning_rate', 0.0,
- 'End learning rate for polynomial learning rate schedule.')
-
-flags.DEFINE_float('learning_rate_decay_factor', 0.1,
- 'The rate to decay the base learning rate.')
-
-flags.DEFINE_integer('learning_rate_decay_step', 2000,
- 'Decay the base learning rate at a fixed step.')
-
-flags.DEFINE_float('learning_power', 0.9,
- 'The power value used in the poly learning policy.')
-
-flags.DEFINE_integer('training_number_of_steps', 30000,
- 'The number of steps used for training')
-
-flags.DEFINE_float('momentum', 0.9, 'The momentum value to use')
-
-# Adam optimizer flags
-flags.DEFINE_float('adam_learning_rate', 0.001,
- 'Learning rate for the adam optimizer.')
-flags.DEFINE_float('adam_epsilon', 1e-08, 'Adam optimizer epsilon.')
-
-# When fine_tune_batch_norm=True, use a batch size of at least 12
-# (a batch size larger than 16 is better). Otherwise, one could use a smaller batch
-# size and set fine_tune_batch_norm=False.
-flags.DEFINE_integer('train_batch_size', 8,
- 'The number of images in each batch during training.')
-
-# For weight_decay, use 0.00004 for MobileNet-V2 or Xception model variants.
-# Use 0.0001 for ResNet model variants.
-flags.DEFINE_float('weight_decay', 0.00004,
- 'The value of the weight decay for training.')
-
-flags.DEFINE_list('train_crop_size', '513,513',
- 'Image crop size [height, width] during training.')
-
-flags.DEFINE_float(
- 'last_layer_gradient_multiplier', 1.0,
- 'The gradient multiplier for last layers, which is used to '
- 'boost the gradient of last layers if the value > 1.')
-
-flags.DEFINE_boolean('upsample_logits', True,
- 'Upsample logits during training.')
-
-# Hyper-parameters for NAS training strategy.
-
-flags.DEFINE_float(
- 'drop_path_keep_prob', 1.0,
- 'Probability to keep each path in the NAS cell when training.')
-
-# Settings for fine-tuning the network.
-
-flags.DEFINE_string('tf_initial_checkpoint', None,
- 'The initial checkpoint in tensorflow format.')
-
-# Set to False if one does not want to re-use the trained classifier weights.
-flags.DEFINE_boolean('initialize_last_layer', True,
- 'Initialize the last layer.')
-
-flags.DEFINE_boolean('last_layers_contain_logits_only', False,
- 'Only consider logits as last layers or not.')
-
-flags.DEFINE_integer('slow_start_step', 0,
- 'Training model with a small learning rate for the first few steps.')
-
-flags.DEFINE_float('slow_start_learning_rate', 1e-4,
- 'Learning rate employed during slow start.')
-
-# Set to True if one wants to fine-tune the batch norm parameters in DeepLabv3.
-# Set to False and use small batch size to save GPU memory.
-flags.DEFINE_boolean('fine_tune_batch_norm', True,
- 'Fine tune the batch norm parameters or not.')
-
-flags.DEFINE_float('min_scale_factor', 0.5,
- 'Minimum scale factor for data augmentation.')
-
-flags.DEFINE_float('max_scale_factor', 2.,
- 'Maximum scale factor for data augmentation.')
-
-flags.DEFINE_float('scale_factor_step_size', 0.25,
- 'Scale factor step size for data augmentation.')
-
-# For `xception_65`, use atrous_rates = [12, 24, 36] if output_stride = 8, or
-# rates = [6, 12, 18] if output_stride = 16. For `mobilenet_v2`, use None. Note
-# one could use different atrous_rates/output_stride during training/evaluation.
-flags.DEFINE_multi_integer('atrous_rates', None,
- 'Atrous rates for atrous spatial pyramid pooling.')
-
-flags.DEFINE_integer('output_stride', 16,
- 'The ratio of input to output spatial resolution.')
-
-# Hard example mining related flags.
-flags.DEFINE_integer(
- 'hard_example_mining_step', 0,
- 'The training step in which exact hard example mining kicks off. Note we '
- 'gradually reduce the mining percent to the specified '
- 'top_k_percent_pixels. For example, if hard_example_mining_step=100K and '
- 'top_k_percent_pixels=0.25, then mining percent will gradually reduce from '
- '100% to 25% until 100K steps after which we only mine top 25% pixels.')
-
-flags.DEFINE_float(
- 'top_k_percent_pixels', 1.0,
- 'The top k percent pixels (in terms of the loss values) used to compute '
- 'loss during training. This is useful for hard pixel mining.')
-
-# Quantization setting.
-flags.DEFINE_integer(
- 'quantize_delay_step', -1,
- 'Steps to start quantized training. If < 0, will not quantize model.')
-
-# Dataset settings.
-flags.DEFINE_string('dataset', 'pascal_voc_seg',
- 'Name of the segmentation dataset.')
-
-flags.DEFINE_string('train_split', 'train',
- 'Which split of the dataset to be used for training')
-
-flags.DEFINE_string('dataset_dir', None, 'Where the dataset resides.')
-
-
-def _build_deeplab(iterator, outputs_to_num_classes, ignore_label):
- """Builds a clone of DeepLab.
-
- Args:
- iterator: An iterator of type tf.data.Iterator for images and labels.
- outputs_to_num_classes: A map from output type to the number of classes. For
- example, for the task of semantic segmentation with 21 semantic classes,
- we would have outputs_to_num_classes['semantic'] = 21.
- ignore_label: Ignore label.
- """
- samples = iterator.get_next()
-
- # Add name to input and label nodes so we can add to summary.
- samples[common.IMAGE] = tf.identity(samples[common.IMAGE], name=common.IMAGE)
- samples[common.LABEL] = tf.identity(samples[common.LABEL], name=common.LABEL)
-
- model_options = common.ModelOptions(
- outputs_to_num_classes=outputs_to_num_classes,
- crop_size=[int(sz) for sz in FLAGS.train_crop_size],
- atrous_rates=FLAGS.atrous_rates,
- output_stride=FLAGS.output_stride)
-
- outputs_to_scales_to_logits = model.multi_scale_logits(
- samples[common.IMAGE],
- model_options=model_options,
- image_pyramid=FLAGS.image_pyramid,
- weight_decay=FLAGS.weight_decay,
- is_training=True,
- fine_tune_batch_norm=FLAGS.fine_tune_batch_norm,
- nas_training_hyper_parameters={
- 'drop_path_keep_prob': FLAGS.drop_path_keep_prob,
- 'total_training_steps': FLAGS.training_number_of_steps,
- })
-
- # Add name to graph node so we can add to summary.
- output_type_dict = outputs_to_scales_to_logits[common.OUTPUT_TYPE]
- output_type_dict[model.MERGED_LOGITS_SCOPE] = tf.identity(
- output_type_dict[model.MERGED_LOGITS_SCOPE], name=common.OUTPUT_TYPE)
-
- for output, num_classes in six.iteritems(outputs_to_num_classes):
- train_utils.add_softmax_cross_entropy_loss_for_each_scale(
- outputs_to_scales_to_logits[output],
- samples[common.LABEL],
- num_classes,
- ignore_label,
- loss_weight=model_options.label_weights,
- upsample_logits=FLAGS.upsample_logits,
- hard_example_mining_step=FLAGS.hard_example_mining_step,
- top_k_percent_pixels=FLAGS.top_k_percent_pixels,
- scope=output)
-
-
-def main(unused_argv):
- tf.logging.set_verbosity(tf.logging.INFO)
- # Set up deployment (i.e., multi-GPUs and/or multi-replicas).
- config = model_deploy.DeploymentConfig(
- num_clones=FLAGS.num_clones,
- clone_on_cpu=FLAGS.clone_on_cpu,
- replica_id=FLAGS.task,
- num_replicas=FLAGS.num_replicas,
- num_ps_tasks=FLAGS.num_ps_tasks)
-
- # Split the batch across GPUs.
- assert FLAGS.train_batch_size % config.num_clones == 0, (
- 'Training batch size not divisible by number of clones (GPUs).')
-
- clone_batch_size = FLAGS.train_batch_size // config.num_clones
-
- tf.gfile.MakeDirs(FLAGS.train_logdir)
- tf.logging.info('Training on %s set', FLAGS.train_split)
-
- with tf.Graph().as_default() as graph:
- with tf.device(config.inputs_device()):
- dataset = data_generator.Dataset(
- dataset_name=FLAGS.dataset,
- split_name=FLAGS.train_split,
- dataset_dir=FLAGS.dataset_dir,
- batch_size=clone_batch_size,
- crop_size=[int(sz) for sz in FLAGS.train_crop_size],
- min_resize_value=FLAGS.min_resize_value,
- max_resize_value=FLAGS.max_resize_value,
- resize_factor=FLAGS.resize_factor,
- min_scale_factor=FLAGS.min_scale_factor,
- max_scale_factor=FLAGS.max_scale_factor,
- scale_factor_step_size=FLAGS.scale_factor_step_size,
- model_variant=FLAGS.model_variant,
- num_readers=4,
- is_training=True,
- should_shuffle=True,
- should_repeat=True)
-
- # Create the global step on the device storing the variables.
- with tf.device(config.variables_device()):
- global_step = tf.train.get_or_create_global_step()
-
- # Define the model and create clones.
- model_fn = _build_deeplab
- model_args = (dataset.get_one_shot_iterator(), {
- common.OUTPUT_TYPE: dataset.num_of_classes
- }, dataset.ignore_label)
- clones = model_deploy.create_clones(config, model_fn, args=model_args)
-
- # Gather update_ops from the first clone. These contain, for example,
- # the updates for the batch_norm variables created by model_fn.
- first_clone_scope = config.clone_scope(0)
- update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, first_clone_scope)
-
- # Gather initial summaries.
- summaries = set(tf.get_collection(tf.GraphKeys.SUMMARIES))
-
- # Add summaries for model variables.
- for model_var in tf.model_variables():
- summaries.add(tf.summary.histogram(model_var.op.name, model_var))
-
- # Add summaries for images, labels, semantic predictions
- if FLAGS.save_summaries_images:
- summary_image = graph.get_tensor_by_name(
- ('%s/%s:0' % (first_clone_scope, common.IMAGE)).strip('/'))
- summaries.add(
- tf.summary.image('samples/%s' % common.IMAGE, summary_image))
-
- first_clone_label = graph.get_tensor_by_name(
- ('%s/%s:0' % (first_clone_scope, common.LABEL)).strip('/'))
- # Scale up summary image pixel values for better visualization.
- pixel_scaling = max(1, 255 // dataset.num_of_classes)
- summary_label = tf.cast(first_clone_label * pixel_scaling, tf.uint8)
- summaries.add(
- tf.summary.image('samples/%s' % common.LABEL, summary_label))
-
- first_clone_output = graph.get_tensor_by_name(
- ('%s/%s:0' % (first_clone_scope, common.OUTPUT_TYPE)).strip('/'))
- predictions = tf.expand_dims(tf.argmax(first_clone_output, 3), -1)
-
- summary_predictions = tf.cast(predictions * pixel_scaling, tf.uint8)
- summaries.add(
- tf.summary.image(
- 'samples/%s' % common.OUTPUT_TYPE, summary_predictions))
-
- # Add summaries for losses.
- for loss in tf.get_collection(tf.GraphKeys.LOSSES, first_clone_scope):
- summaries.add(tf.summary.scalar('losses/%s' % loss.op.name, loss))
-
- # Build the optimizer based on the device specification.
- with tf.device(config.optimizer_device()):
- learning_rate = train_utils.get_model_learning_rate(
- FLAGS.learning_policy,
- FLAGS.base_learning_rate,
- FLAGS.learning_rate_decay_step,
- FLAGS.learning_rate_decay_factor,
- FLAGS.training_number_of_steps,
- FLAGS.learning_power,
- FLAGS.slow_start_step,
- FLAGS.slow_start_learning_rate,
- decay_steps=FLAGS.decay_steps,
- end_learning_rate=FLAGS.end_learning_rate)
-
- summaries.add(tf.summary.scalar('learning_rate', learning_rate))
-
- if FLAGS.optimizer == 'momentum':
- optimizer = tf.train.MomentumOptimizer(learning_rate, FLAGS.momentum)
- elif FLAGS.optimizer == 'adam':
- optimizer = tf.train.AdamOptimizer(
- learning_rate=FLAGS.adam_learning_rate, epsilon=FLAGS.adam_epsilon)
- else:
- raise ValueError('Unknown optimizer')
-
- if FLAGS.quantize_delay_step >= 0:
- if FLAGS.num_clones > 1:
- raise ValueError('Quantization doesn\'t support multi-clone yet.')
- contrib_quantize.create_training_graph(
- quant_delay=FLAGS.quantize_delay_step)
-
- startup_delay_steps = FLAGS.task * FLAGS.startup_delay_steps
-
- with tf.device(config.variables_device()):
- total_loss, grads_and_vars = model_deploy.optimize_clones(
- clones, optimizer)
- total_loss = tf.check_numerics(total_loss, 'Loss is inf or nan.')
- summaries.add(tf.summary.scalar('total_loss', total_loss))
-
- # Modify the gradients for biases and last layer variables.
- last_layers = model.get_extra_layer_scopes(
- FLAGS.last_layers_contain_logits_only)
- grad_mult = train_utils.get_model_gradient_multipliers(
- last_layers, FLAGS.last_layer_gradient_multiplier)
- if grad_mult:
- grads_and_vars = slim.learning.multiply_gradients(
- grads_and_vars, grad_mult)
-
- # Create gradient update op.
- grad_updates = optimizer.apply_gradients(
- grads_and_vars, global_step=global_step)
- update_ops.append(grad_updates)
- update_op = tf.group(*update_ops)
- with tf.control_dependencies([update_op]):
- train_tensor = tf.identity(total_loss, name='train_op')
-
- # Add the summaries from the first clone. These contain the summaries
- # created by model_fn and either optimize_clones() or _gather_clone_loss().
- summaries |= set(
- tf.get_collection(tf.GraphKeys.SUMMARIES, first_clone_scope))
-
- # Merge all summaries together.
- summary_op = tf.summary.merge(list(summaries))
-
- # Soft placement allows placing on CPU ops without GPU implementation.
- session_config = tf.ConfigProto(
- allow_soft_placement=True, log_device_placement=False)
-
- # Start the training.
- profile_dir = FLAGS.profile_logdir
- if profile_dir is not None:
- tf.gfile.MakeDirs(profile_dir)
-
- with contrib_tfprof.ProfileContext(
- enabled=profile_dir is not None, profile_dir=profile_dir):
- init_fn = None
- if FLAGS.tf_initial_checkpoint:
- init_fn = train_utils.get_model_init_fn(
- FLAGS.train_logdir,
- FLAGS.tf_initial_checkpoint,
- FLAGS.initialize_last_layer,
- last_layers,
- ignore_missing_vars=True)
-
- slim.learning.train(
- train_tensor,
- logdir=FLAGS.train_logdir,
- log_every_n_steps=FLAGS.log_steps,
- master=FLAGS.master,
- number_of_steps=FLAGS.training_number_of_steps,
- is_chief=(FLAGS.task == 0),
- session_config=session_config,
- startup_delay_steps=startup_delay_steps,
- init_fn=init_fn,
- summary_op=summary_op,
- save_summaries_secs=FLAGS.save_summaries_secs,
- save_interval_secs=FLAGS.save_interval_secs)
-
-
-if __name__ == '__main__':
- flags.mark_flag_as_required('train_logdir')
- flags.mark_flag_as_required('dataset_dir')
- tf.app.run()
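
The `hard_example_mining_step` and `top_k_percent_pixels` flags defined in the deleted `train.py` above describe a schedule that starts by keeping the loss of every pixel and linearly anneals to mining only the top-k hardest pixels. A self-contained sketch of that schedule with illustrative numbers (the real computation lives in `train_utils.add_softmax_cross_entropy_loss_for_each_scale`, shown later in this diff):

```python
def mining_fraction(step, hard_example_mining_step=100000, top_k_percent_pixels=0.25):
    """Fraction of pixels whose loss is kept at a given training step (illustrative)."""
    if hard_example_mining_step == 0:
        return top_k_percent_pixels  # Mine only the top-k pixels from the start.
    ratio = min(1.0, step / float(hard_example_mining_step))
    # Linearly anneal from 100% of pixels down to top_k_percent_pixels.
    return ratio * top_k_percent_pixels + (1.0 - ratio)

print(mining_fraction(0))       # 1.0   -> every pixel contributes to the loss.
print(mining_fraction(50000))   # 0.625 -> halfway through the anneal.
print(mining_fraction(200000))  # 0.25  -> only the hardest 25% of pixels.
```
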
diff --git a/research/deeplab/utils/__init__.py b/research/deeplab/utils/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/deeplab/utils/get_dataset_colormap.py b/research/deeplab/utils/get_dataset_colormap.py
deleted file mode 100644
index c0502e3b3cd..00000000000
--- a/research/deeplab/utils/get_dataset_colormap.py
+++ /dev/null
@@ -1,416 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Visualizes the segmentation results via specified color map.
-
-Visualizes the semantic segmentation results by the color map
-defined by the different datasets. Supported colormaps are:
-
-* ADE20K (http://groups.csail.mit.edu/vision/datasets/ADE20K/).
-
-* Cityscapes dataset (https://www.cityscapes-dataset.com).
-
-* Mapillary Vistas (https://research.mapillary.com).
-
-* PASCAL VOC 2012 (http://host.robots.ox.ac.uk/pascal/VOC/).
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import numpy as np
-from six.moves import range
-
-# Dataset names.
-_ADE20K = 'ade20k'
-_CITYSCAPES = 'cityscapes'
-_MAPILLARY_VISTAS = 'mapillary_vistas'
-_PASCAL = 'pascal'
-
-# Max number of entries in the colormap for each dataset.
-_DATASET_MAX_ENTRIES = {
- _ADE20K: 151,
- _CITYSCAPES: 256,
- _MAPILLARY_VISTAS: 66,
- _PASCAL: 512,
-}
-
-
-def create_ade20k_label_colormap():
- """Creates a label colormap used in ADE20K segmentation benchmark.
-
- Returns:
- A colormap for visualizing segmentation results.
- """
- return np.asarray([
- [0, 0, 0],
- [120, 120, 120],
- [180, 120, 120],
- [6, 230, 230],
- [80, 50, 50],
- [4, 200, 3],
- [120, 120, 80],
- [140, 140, 140],
- [204, 5, 255],
- [230, 230, 230],
- [4, 250, 7],
- [224, 5, 255],
- [235, 255, 7],
- [150, 5, 61],
- [120, 120, 70],
- [8, 255, 51],
- [255, 6, 82],
- [143, 255, 140],
- [204, 255, 4],
- [255, 51, 7],
- [204, 70, 3],
- [0, 102, 200],
- [61, 230, 250],
- [255, 6, 51],
- [11, 102, 255],
- [255, 7, 71],
- [255, 9, 224],
- [9, 7, 230],
- [220, 220, 220],
- [255, 9, 92],
- [112, 9, 255],
- [8, 255, 214],
- [7, 255, 224],
- [255, 184, 6],
- [10, 255, 71],
- [255, 41, 10],
- [7, 255, 255],
- [224, 255, 8],
- [102, 8, 255],
- [255, 61, 6],
- [255, 194, 7],
- [255, 122, 8],
- [0, 255, 20],
- [255, 8, 41],
- [255, 5, 153],
- [6, 51, 255],
- [235, 12, 255],
- [160, 150, 20],
- [0, 163, 255],
- [140, 140, 140],
- [250, 10, 15],
- [20, 255, 0],
- [31, 255, 0],
- [255, 31, 0],
- [255, 224, 0],
- [153, 255, 0],
- [0, 0, 255],
- [255, 71, 0],
- [0, 235, 255],
- [0, 173, 255],
- [31, 0, 255],
- [11, 200, 200],
- [255, 82, 0],
- [0, 255, 245],
- [0, 61, 255],
- [0, 255, 112],
- [0, 255, 133],
- [255, 0, 0],
- [255, 163, 0],
- [255, 102, 0],
- [194, 255, 0],
- [0, 143, 255],
- [51, 255, 0],
- [0, 82, 255],
- [0, 255, 41],
- [0, 255, 173],
- [10, 0, 255],
- [173, 255, 0],
- [0, 255, 153],
- [255, 92, 0],
- [255, 0, 255],
- [255, 0, 245],
- [255, 0, 102],
- [255, 173, 0],
- [255, 0, 20],
- [255, 184, 184],
- [0, 31, 255],
- [0, 255, 61],
- [0, 71, 255],
- [255, 0, 204],
- [0, 255, 194],
- [0, 255, 82],
- [0, 10, 255],
- [0, 112, 255],
- [51, 0, 255],
- [0, 194, 255],
- [0, 122, 255],
- [0, 255, 163],
- [255, 153, 0],
- [0, 255, 10],
- [255, 112, 0],
- [143, 255, 0],
- [82, 0, 255],
- [163, 255, 0],
- [255, 235, 0],
- [8, 184, 170],
- [133, 0, 255],
- [0, 255, 92],
- [184, 0, 255],
- [255, 0, 31],
- [0, 184, 255],
- [0, 214, 255],
- [255, 0, 112],
- [92, 255, 0],
- [0, 224, 255],
- [112, 224, 255],
- [70, 184, 160],
- [163, 0, 255],
- [153, 0, 255],
- [71, 255, 0],
- [255, 0, 163],
- [255, 204, 0],
- [255, 0, 143],
- [0, 255, 235],
- [133, 255, 0],
- [255, 0, 235],
- [245, 0, 255],
- [255, 0, 122],
- [255, 245, 0],
- [10, 190, 212],
- [214, 255, 0],
- [0, 204, 255],
- [20, 0, 255],
- [255, 255, 0],
- [0, 153, 255],
- [0, 41, 255],
- [0, 255, 204],
- [41, 0, 255],
- [41, 255, 0],
- [173, 0, 255],
- [0, 245, 255],
- [71, 0, 255],
- [122, 0, 255],
- [0, 255, 184],
- [0, 92, 255],
- [184, 255, 0],
- [0, 133, 255],
- [255, 214, 0],
- [25, 194, 194],
- [102, 255, 0],
- [92, 0, 255],
- ])
-
-
-def create_cityscapes_label_colormap():
- """Creates a label colormap used in CITYSCAPES segmentation benchmark.
-
- Returns:
- A colormap for visualizing segmentation results.
- """
- colormap = np.zeros((256, 3), dtype=np.uint8)
- colormap[0] = [128, 64, 128]
- colormap[1] = [244, 35, 232]
- colormap[2] = [70, 70, 70]
- colormap[3] = [102, 102, 156]
- colormap[4] = [190, 153, 153]
- colormap[5] = [153, 153, 153]
- colormap[6] = [250, 170, 30]
- colormap[7] = [220, 220, 0]
- colormap[8] = [107, 142, 35]
- colormap[9] = [152, 251, 152]
- colormap[10] = [70, 130, 180]
- colormap[11] = [220, 20, 60]
- colormap[12] = [255, 0, 0]
- colormap[13] = [0, 0, 142]
- colormap[14] = [0, 0, 70]
- colormap[15] = [0, 60, 100]
- colormap[16] = [0, 80, 100]
- colormap[17] = [0, 0, 230]
- colormap[18] = [119, 11, 32]
- return colormap
-
-
-def create_mapillary_vistas_label_colormap():
- """Creates a label colormap used in Mapillary Vistas segmentation benchmark.
-
- Returns:
- A colormap for visualizing segmentation results.
- """
- return np.asarray([
- [165, 42, 42],
- [0, 192, 0],
- [196, 196, 196],
- [190, 153, 153],
- [180, 165, 180],
- [102, 102, 156],
- [102, 102, 156],
- [128, 64, 255],
- [140, 140, 200],
- [170, 170, 170],
- [250, 170, 160],
- [96, 96, 96],
- [230, 150, 140],
- [128, 64, 128],
- [110, 110, 110],
- [244, 35, 232],
- [150, 100, 100],
- [70, 70, 70],
- [150, 120, 90],
- [220, 20, 60],
- [255, 0, 0],
- [255, 0, 0],
- [255, 0, 0],
- [200, 128, 128],
- [255, 255, 255],
- [64, 170, 64],
- [128, 64, 64],
- [70, 130, 180],
- [255, 255, 255],
- [152, 251, 152],
- [107, 142, 35],
- [0, 170, 30],
- [255, 255, 128],
- [250, 0, 30],
- [0, 0, 0],
- [220, 220, 220],
- [170, 170, 170],
- [222, 40, 40],
- [100, 170, 30],
- [40, 40, 40],
- [33, 33, 33],
- [170, 170, 170],
- [0, 0, 142],
- [170, 170, 170],
- [210, 170, 100],
- [153, 153, 153],
- [128, 128, 128],
- [0, 0, 142],
- [250, 170, 30],
- [192, 192, 192],
- [220, 220, 0],
- [180, 165, 180],
- [119, 11, 32],
- [0, 0, 142],
- [0, 60, 100],
- [0, 0, 142],
- [0, 0, 90],
- [0, 0, 230],
- [0, 80, 100],
- [128, 64, 64],
- [0, 0, 110],
- [0, 0, 70],
- [0, 0, 192],
- [32, 32, 32],
- [0, 0, 0],
- [0, 0, 0],
- ])
-
-
-def create_pascal_label_colormap():
- """Creates a label colormap used in PASCAL VOC segmentation benchmark.
-
- Returns:
- A colormap for visualizing segmentation results.
- """
- colormap = np.zeros((_DATASET_MAX_ENTRIES[_PASCAL], 3), dtype=int)
- ind = np.arange(_DATASET_MAX_ENTRIES[_PASCAL], dtype=int)
-
- for shift in reversed(list(range(8))):
- for channel in range(3):
- colormap[:, channel] |= bit_get(ind, channel) << shift
- ind >>= 3
-
- return colormap
-
-
-def get_ade20k_name():
- return _ADE20K
-
-
-def get_cityscapes_name():
- return _CITYSCAPES
-
-
-def get_mapillary_vistas_name():
- return _MAPILLARY_VISTAS
-
-
-def get_pascal_name():
- return _PASCAL
-
-
-def bit_get(val, idx):
- """Gets the bit value.
-
- Args:
- val: Input value, int or numpy int array.
- idx: Which bit of the input val.
-
- Returns:
- The "idx"-th bit of input val.
- """
- return (val >> idx) & 1
-
-
-def create_label_colormap(dataset=_PASCAL):
- """Creates a label colormap for the specified dataset.
-
- Args:
- dataset: The colormap used in the dataset.
-
- Returns:
- A numpy array of the dataset colormap.
-
- Raises:
- ValueError: If the dataset is not supported.
- """
- if dataset == _ADE20K:
- return create_ade20k_label_colormap()
- elif dataset == _CITYSCAPES:
- return create_cityscapes_label_colormap()
- elif dataset == _MAPILLARY_VISTAS:
- return create_mapillary_vistas_label_colormap()
- elif dataset == _PASCAL:
- return create_pascal_label_colormap()
- else:
- raise ValueError('Unsupported dataset.')
-
-
-def label_to_color_image(label, dataset=_PASCAL):
- """Adds color defined by the dataset colormap to the label.
-
- Args:
- label: A 2D array with integer type, storing the segmentation label.
- dataset: The colormap used in the dataset.
-
- Returns:
- result: An array of shape [height, width, 3] with integer type, where each
- pixel is the color from the dataset colormap indexed by the corresponding
- element of the input label.
-
- Raises:
- ValueError: If label is not of rank 2 or its value is larger than color
- map maximum entry.
- """
- if label.ndim != 2:
- raise ValueError('Expect 2-D input label. Got {}'.format(label.shape))
-
- if np.max(label) >= _DATASET_MAX_ENTRIES[dataset]:
- raise ValueError(
- 'label value too large: {} >= {}.'.format(
- np.max(label), _DATASET_MAX_ENTRIES[dataset]))
-
- colormap = create_label_colormap(dataset)
- return colormap[label]
-
-
-def get_dataset_colormap_max_entries(dataset):
- return _DATASET_MAX_ENTRIES[dataset]
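
For reference, the PASCAL colormap above is built by spreading the bits of each label index across the three color channels, three bits at a time. The snippet below re-creates that logic with plain NumPy so it can be run without the deleted module, and shows that `label_to_color_image` reduces to indexing the colormap with the label array:

```python
import numpy as np

def bit_get(val, idx):
    """Returns the idx-th bit of val (int or NumPy int array)."""
    return (val >> idx) & 1

def create_pascal_label_colormap(num_entries=512):
    """Re-creates the PASCAL VOC colormap using the bit-interleaving trick above."""
    colormap = np.zeros((num_entries, 3), dtype=int)
    ind = np.arange(num_entries, dtype=int)
    for shift in reversed(range(8)):
        for channel in range(3):
            colormap[:, channel] |= bit_get(ind, channel) << shift
        ind >>= 3
    return colormap

colormap = create_pascal_label_colormap()
print(colormap[5])            # [128   0 128], as asserted in the deleted unit test below.
label = np.array([[0, 16, 16], [52, 7, 52]])
print(colormap[label].shape)  # (2, 2, 3): label_to_color_image is essentially this indexing.
```
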
diff --git a/research/deeplab/utils/get_dataset_colormap_test.py b/research/deeplab/utils/get_dataset_colormap_test.py
deleted file mode 100644
index 89adb2c7391..00000000000
--- a/research/deeplab/utils/get_dataset_colormap_test.py
+++ /dev/null
@@ -1,97 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for get_dataset_colormap.py."""
-
-import numpy as np
-import tensorflow as tf
-
-from deeplab.utils import get_dataset_colormap
-
-
-class VisualizationUtilTest(tf.test.TestCase):
-
- def testBitGet(self):
- """Test that if the returned bit value is correct."""
- self.assertEqual(1, get_dataset_colormap.bit_get(9, 0))
- self.assertEqual(0, get_dataset_colormap.bit_get(9, 1))
- self.assertEqual(0, get_dataset_colormap.bit_get(9, 2))
- self.assertEqual(1, get_dataset_colormap.bit_get(9, 3))
-
- def testPASCALLabelColorMapValue(self):
- """Test the getd color map value."""
- colormap = get_dataset_colormap.create_pascal_label_colormap()
-
- # Only test a few sampled entries in the color map.
- self.assertTrue(np.array_equal([128., 0., 128.], colormap[5, :]))
- self.assertTrue(np.array_equal([128., 192., 128.], colormap[23, :]))
- self.assertTrue(np.array_equal([128., 0., 192.], colormap[37, :]))
- self.assertTrue(np.array_equal([224., 192., 192.], colormap[127, :]))
- self.assertTrue(np.array_equal([192., 160., 192.], colormap[175, :]))
-
- def testLabelToPASCALColorImage(self):
- """Test the value of the converted label value."""
- label = np.array([[0, 16, 16], [52, 7, 52]])
- expected_result = np.array([
- [[0, 0, 0], [0, 64, 0], [0, 64, 0]],
- [[0, 64, 192], [128, 128, 128], [0, 64, 192]]
- ])
- colored_label = get_dataset_colormap.label_to_color_image(
- label, get_dataset_colormap.get_pascal_name())
- self.assertTrue(np.array_equal(expected_result, colored_label))
-
- def testUnExpectedLabelValueForLabelToPASCALColorImage(self):
- """Raise ValueError when input value exceeds range."""
- label = np.array([[120], [600]])
- with self.assertRaises(ValueError):
- get_dataset_colormap.label_to_color_image(
- label, get_dataset_colormap.get_pascal_name())
-
- def testUnExpectedLabelDimensionForLabelToPASCALColorImage(self):
- """Raise ValueError if input dimension is not correct."""
- label = np.array([120])
- with self.assertRaises(ValueError):
- get_dataset_colormap.label_to_color_image(
- label, get_dataset_colormap.get_pascal_name())
-
- def testGetColormapForUnsupportedDataset(self):
- with self.assertRaises(ValueError):
- get_dataset_colormap.create_label_colormap('unsupported_dataset')
-
- def testUnExpectedLabelDimensionForLabelToADE20KColorImage(self):
- label = np.array([250])
- with self.assertRaises(ValueError):
- get_dataset_colormap.label_to_color_image(
- label, get_dataset_colormap.get_ade20k_name())
-
- def testFirstColorInADE20KColorMap(self):
- label = np.array([[1, 3], [10, 20]])
- expected_result = np.array([
- [[120, 120, 120], [6, 230, 230]],
- [[4, 250, 7], [204, 70, 3]]
- ])
- colored_label = get_dataset_colormap.label_to_color_image(
- label, get_dataset_colormap.get_ade20k_name())
- self.assertTrue(np.array_equal(colored_label, expected_result))
-
- def testMapillaryVistasColorMapValue(self):
- colormap = get_dataset_colormap.create_mapillary_vistas_label_colormap()
- self.assertTrue(np.array_equal([190, 153, 153], colormap[3, :]))
- self.assertTrue(np.array_equal([102, 102, 156], colormap[6, :]))
-
-
-if __name__ == '__main__':
- tf.test.main()
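
A quick way to see why `testBitGet` above expects `1, 0, 0, 1`: the value 9 is `0b1001`, so bits 0 and 3 are set while bits 1 and 2 are not. Reproduced without TensorFlow:

```python
val = 9  # 0b1001
print([(val >> idx) & 1 for idx in range(4)])  # [1, 0, 0, 1], matching testBitGet above.
```
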
diff --git a/research/deeplab/utils/save_annotation.py b/research/deeplab/utils/save_annotation.py
deleted file mode 100644
index 2444df79532..00000000000
--- a/research/deeplab/utils/save_annotation.py
+++ /dev/null
@@ -1,66 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Saves an annotation as one png image.
-
-This script saves an annotation as one png image, and has the option to add
-colormap to the png image for better visualization.
-"""
-
-import numpy as np
-import PIL.Image as img
-import tensorflow as tf
-
-from deeplab.utils import get_dataset_colormap
-
-
-def save_annotation(label,
- save_dir,
- filename,
- add_colormap=True,
- normalize_to_unit_values=False,
- scale_values=False,
- colormap_type=get_dataset_colormap.get_pascal_name()):
- """Saves the given label to image on disk.
-
- Args:
- label: The numpy array to be saved. The data will be converted
- to uint8 and saved as png image.
- save_dir: String, the directory to which the results will be saved.
- filename: String, the image filename.
- add_colormap: Boolean, add color map to the label or not.
- normalize_to_unit_values: Boolean, normalize the input values to [0, 1].
- scale_values: Boolean, scale the input values to [0, 255] for visualization.
- colormap_type: String, colormap type for visualization.
- """
- # Add colormap for visualizing the prediction.
- if add_colormap:
- colored_label = get_dataset_colormap.label_to_color_image(
- label, colormap_type)
- else:
- colored_label = label
- if normalize_to_unit_values:
- min_value = np.amin(colored_label)
- max_value = np.amax(colored_label)
- range_value = max_value - min_value
- if range_value != 0:
- colored_label = (colored_label - min_value) / range_value
-
- if scale_values:
- colored_label = 255. * colored_label
-
- pil_image = img.fromarray(colored_label.astype(dtype=np.uint8))
- with tf.gfile.Open('%s/%s.png' % (save_dir, filename), mode='w') as f:
- pil_image.save(f, 'PNG')
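
The deleted `save_annotation` above optionally rescales a raw (non-colormapped) array before converting it to uint8. A small NumPy sketch of just that normalize-then-scale step (illustrative; the colormap lookup and PNG writing are omitted):

```python
import numpy as np

def normalize_and_scale(label, normalize_to_unit_values=True, scale_values=True):
    """Mirrors the rescaling branch of save_annotation when add_colormap=False."""
    label = np.asarray(label, dtype=np.float64)
    if normalize_to_unit_values:
        min_value, max_value = label.min(), label.max()
        range_value = max_value - min_value
        if range_value != 0:
            label = (label - min_value) / range_value  # Now in [0, 1].
    if scale_values:
        label = 255.0 * label  # Stretch to [0, 255] for visualization.
    return label.astype(np.uint8)

print(normalize_and_scale(np.array([[2.0, 4.0], [6.0, 10.0]])))
# [[  0  63]
#  [127 255]]
```
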
diff --git a/research/deeplab/utils/train_utils.py b/research/deeplab/utils/train_utils.py
deleted file mode 100644
index 14bbd6ee7e5..00000000000
--- a/research/deeplab/utils/train_utils.py
+++ /dev/null
@@ -1,372 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Utility functions for training."""
-
-import six
-import tensorflow as tf
-from tensorflow.contrib import framework as contrib_framework
-
-from deeplab.core import preprocess_utils
-from deeplab.core import utils
-
-
-def _div_maybe_zero(total_loss, num_present):
- """Normalizes the total loss with the number of present pixels."""
- return tf.to_float(num_present > 0) * tf.math.divide(
- total_loss,
- tf.maximum(1e-5, num_present))
-
-
-def add_softmax_cross_entropy_loss_for_each_scale(scales_to_logits,
- labels,
- num_classes,
- ignore_label,
- loss_weight=1.0,
- upsample_logits=True,
- hard_example_mining_step=0,
- top_k_percent_pixels=1.0,
- gt_is_matting_map=False,
- scope=None):
- """Adds softmax cross entropy loss for logits of each scale.
-
- Args:
- scales_to_logits: A map from logits names for different scales to logits.
- The logits have shape [batch, logits_height, logits_width, num_classes].
- labels: Groundtruth labels with shape [batch, image_height, image_width, 1].
- num_classes: Integer, number of target classes.
- ignore_label: Integer, label to ignore.
- loss_weight: A float or a list of loss weights. If it is a float, it means
- all the labels have the same weight. If it is a list of weights, then each
- element in the list represents the weight for the label of its index, for
- example, loss_weight = [0.1, 0.5] means the weight for label 0 is 0.1 and
- the weight for label 1 is 0.5.
- upsample_logits: Boolean, upsample logits or not.
- hard_example_mining_step: An integer, the training step in which the hard
- example mining kicks off. Note that we gradually reduce the mining
- percent to the top_k_percent_pixels. For example, if
- hard_example_mining_step = 100K and top_k_percent_pixels = 0.25, then
- mining percent will gradually reduce from 100% to 25% until 100K steps
- after which we only mine top 25% pixels.
- top_k_percent_pixels: A float, the value lies in [0.0, 1.0]. When its value
- < 1.0, only compute the loss for the top k percent pixels (e.g., the top
- 20% pixels). This is useful for hard pixel mining.
- gt_is_matting_map: If true, the groundtruth is a matting map of confidence
- score. If false, the groundtruth is an integer valued class mask.
- scope: String, the scope for the loss.
-
- Raises:
- ValueError: Label or logits is None, or groundtruth is matting map while
- label is not floating value.
- """
- if labels is None:
- raise ValueError('No label for softmax cross entropy loss.')
-
- # If input groundtruth is a matting map of confidence, check if the input
- # labels are floating point values.
- if gt_is_matting_map and not labels.dtype.is_floating:
- raise ValueError('Labels must be floats if groundtruth is a matting map.')
-
- for scale, logits in six.iteritems(scales_to_logits):
- loss_scope = None
- if scope:
- loss_scope = '%s_%s' % (scope, scale)
-
- if upsample_logits:
- # Label is not downsampled, and instead we upsample logits.
- logits = tf.image.resize_bilinear(
- logits,
- preprocess_utils.resolve_shape(labels, 4)[1:3],
- align_corners=True)
- scaled_labels = labels
- else:
- # Label is downsampled to the same size as logits.
- # When gt_is_matting_map = true, label downsampling with nearest neighbor
- # method may introduce artifacts. However, to avoid ignore_label from
- # being interpolated with other labels, we still perform nearest neighbor
- # interpolation.
- # TODO(huizhongc): Change to bilinear interpolation by processing padded
- # and non-padded label separately.
- if gt_is_matting_map:
- tf.logging.warning(
- 'Label downsampling with nearest neighbor may introduce artifacts.')
-
- scaled_labels = tf.image.resize_nearest_neighbor(
- labels,
- preprocess_utils.resolve_shape(logits, 4)[1:3],
- align_corners=True)
-
- scaled_labels = tf.reshape(scaled_labels, shape=[-1])
- weights = utils.get_label_weight_mask(
- scaled_labels, ignore_label, num_classes, label_weights=loss_weight)
- # Dimension of keep_mask is equal to the total number of pixels.
- keep_mask = tf.cast(
- tf.not_equal(scaled_labels, ignore_label), dtype=tf.float32)
-
- train_labels = None
- logits = tf.reshape(logits, shape=[-1, num_classes])
-
- if gt_is_matting_map:
- # When the groundtruth is integer label mask, we can assign class
- # dependent label weights to the loss. When the groundtruth is image
- # matting confidence, we do not apply class-dependent label weight (i.e.,
- # label_weight = 1.0).
- if loss_weight != 1.0:
- raise ValueError(
- 'loss_weight must equal 1 if the groundtruth is a matting map.')
-
- # Assign label value 0 to ignore pixels. The exact label value of ignore
- # pixel does not matter, because those ignore_value pixel losses will be
- # multiplied to 0 weight.
- train_labels = scaled_labels * keep_mask
-
- train_labels = tf.expand_dims(train_labels, 1)
- train_labels = tf.concat([1 - train_labels, train_labels], axis=1)
- else:
- train_labels = tf.one_hot(
- scaled_labels, num_classes, on_value=1.0, off_value=0.0)
-
- default_loss_scope = ('softmax_all_pixel_loss'
- if top_k_percent_pixels == 1.0 else
- 'softmax_hard_example_mining')
- with tf.name_scope(loss_scope, default_loss_scope,
- [logits, train_labels, weights]):
- # Compute the loss for all pixels.
- pixel_losses = tf.nn.softmax_cross_entropy_with_logits_v2(
- labels=tf.stop_gradient(
- train_labels, name='train_labels_stop_gradient'),
- logits=logits,
- name='pixel_losses')
- weighted_pixel_losses = tf.multiply(pixel_losses, weights)
-
- if top_k_percent_pixels == 1.0:
- total_loss = tf.reduce_sum(weighted_pixel_losses)
- num_present = tf.reduce_sum(keep_mask)
- loss = _div_maybe_zero(total_loss, num_present)
- tf.losses.add_loss(loss)
- else:
- num_pixels = tf.to_float(tf.shape(logits)[0])
- # Compute the top_k_percent pixels based on current training step.
- if hard_example_mining_step == 0:
- # Directly focus on the top_k pixels.
- top_k_pixels = tf.to_int32(top_k_percent_pixels * num_pixels)
- else:
- # Gradually reduce the mining percent to top_k_percent_pixels.
- global_step = tf.to_float(tf.train.get_or_create_global_step())
- ratio = tf.minimum(1.0, global_step / hard_example_mining_step)
- top_k_pixels = tf.to_int32(
- (ratio * top_k_percent_pixels + (1.0 - ratio)) * num_pixels)
- top_k_losses, _ = tf.nn.top_k(weighted_pixel_losses,
- k=top_k_pixels,
- sorted=True,
- name='top_k_percent_pixels')
- total_loss = tf.reduce_sum(top_k_losses)
- num_present = tf.reduce_sum(
- tf.to_float(tf.not_equal(top_k_losses, 0.0)))
- loss = _div_maybe_zero(total_loss, num_present)
- tf.losses.add_loss(loss)
-
-
-def get_model_init_fn(train_logdir,
- tf_initial_checkpoint,
- initialize_last_layer,
- last_layers,
- ignore_missing_vars=False):
- """Gets the function initializing model variables from a checkpoint.
-
- Args:
- train_logdir: Log directory for training.
- tf_initial_checkpoint: TensorFlow checkpoint for initialization.
- initialize_last_layer: Initialize last layer or not.
- last_layers: Last layers of the model.
- ignore_missing_vars: Ignore missing variables in the checkpoint.
-
- Returns:
- Initialization function.
- """
- if tf_initial_checkpoint is None:
- tf.logging.info('Not initializing the model from a checkpoint.')
- return None
-
- if tf.train.latest_checkpoint(train_logdir):
- tf.logging.info('Ignoring initialization; other checkpoint exists')
- return None
-
- tf.logging.info('Initializing model from path: %s', tf_initial_checkpoint)
-
- # Variables that will not be restored.
- exclude_list = ['global_step']
- if not initialize_last_layer:
- exclude_list.extend(last_layers)
-
- variables_to_restore = contrib_framework.get_variables_to_restore(
- exclude=exclude_list)
-
- if variables_to_restore:
- init_op, init_feed_dict = contrib_framework.assign_from_checkpoint(
- tf_initial_checkpoint,
- variables_to_restore,
- ignore_missing_vars=ignore_missing_vars)
- global_step = tf.train.get_or_create_global_step()
-
- def restore_fn(sess):
- sess.run(init_op, init_feed_dict)
- sess.run([global_step])
-
- return restore_fn
-
- return None
-
-
-def get_model_gradient_multipliers(last_layers, last_layer_gradient_multiplier):
- """Gets the gradient multipliers.
-
- The gradient multipliers will adjust the learning rates for model
- variables. For the task of semantic segmentation, the models are
- usually fine-tuned from the models trained on the task of image
- classification. To fine-tune the models, we usually set larger (e.g.,
- 10 times larger) learning rate for the parameters of last layer.
-
- Args:
- last_layers: Scopes of last layers.
- last_layer_gradient_multiplier: The gradient multiplier for last layers.
-
- Returns:
- The gradient multiplier map with variables as key, and multipliers as value.
- """
- gradient_multipliers = {}
-
- for var in tf.model_variables():
- # Double the learning rate for biases.
- if 'biases' in var.op.name:
- gradient_multipliers[var.op.name] = 2.
-
- # Use larger learning rate for last layer variables.
- for layer in last_layers:
- if layer in var.op.name and 'biases' in var.op.name:
- gradient_multipliers[var.op.name] = 2 * last_layer_gradient_multiplier
- break
- elif layer in var.op.name:
- gradient_multipliers[var.op.name] = last_layer_gradient_multiplier
- break
-
- return gradient_multipliers
-
-
-def get_model_learning_rate(learning_policy,
- base_learning_rate,
- learning_rate_decay_step,
- learning_rate_decay_factor,
- training_number_of_steps,
- learning_power,
- slow_start_step,
- slow_start_learning_rate,
- slow_start_burnin_type='none',
- decay_steps=0.0,
- end_learning_rate=0.0,
- boundaries=None,
- boundary_learning_rates=None):
- """Gets model's learning rate.
-
- Computes the model's learning rate for different learning policy.
- Right now, only "step" and "poly" are supported.
- (1) The learning policy for "step" is computed as follows:
- current_learning_rate = base_learning_rate *
- learning_rate_decay_factor ^ (global_step / learning_rate_decay_step)
- See tf.train.exponential_decay for details.
- (2) The learning policy for "poly" is computed as follows:
- current_learning_rate = base_learning_rate *
- (1 - global_step / training_number_of_steps) ^ learning_power
-
- Args:
- learning_policy: Learning rate policy for training.
- base_learning_rate: The base learning rate for model training.
- learning_rate_decay_step: Decay the base learning rate at a fixed step.
- learning_rate_decay_factor: The rate to decay the base learning rate.
- training_number_of_steps: Number of steps for training.
- learning_power: Power used for 'poly' learning policy.
- slow_start_step: Training model with small learning rate for the first
- few steps.
- slow_start_learning_rate: The learning rate employed during slow start.
- slow_start_burnin_type: The burnin type for the slow start stage. Can be
- `none` which means no burnin or `linear` which means the learning rate
- increases linearly from slow_start_learning_rate and reaches
- base_learning_rate after slow_start_steps.
- decay_steps: Float, `decay_steps` for polynomial learning rate.
- end_learning_rate: Float, `end_learning_rate` for polynomial learning rate.
- boundaries: A list of `Tensor`s or `int`s or `float`s with strictly
- increasing entries.
- boundary_learning_rates: A list of `Tensor`s or `float`s or `int`s that
- specifies the values for the intervals defined by `boundaries`. It should
- have one more element than `boundaries`, and all elements should have the
- same type.
-
- Returns:
- Learning rate for the specified learning policy.
-
- Raises:
- ValueError: If learning policy or slow start burnin type is not recognized.
- ValueError: If `boundaries` and `boundary_learning_rates` are not set for
- multi_steps learning rate decay.
- """
- global_step = tf.train.get_or_create_global_step()
- adjusted_global_step = tf.maximum(global_step - slow_start_step, 0)
- if decay_steps == 0.0:
- tf.logging.info('Setting decay_steps to total training steps.')
- decay_steps = training_number_of_steps - slow_start_step
- if learning_policy == 'step':
- learning_rate = tf.train.exponential_decay(
- base_learning_rate,
- adjusted_global_step,
- learning_rate_decay_step,
- learning_rate_decay_factor,
- staircase=True)
- elif learning_policy == 'poly':
- learning_rate = tf.train.polynomial_decay(
- base_learning_rate,
- adjusted_global_step,
- decay_steps=decay_steps,
- end_learning_rate=end_learning_rate,
- power=learning_power)
- elif learning_policy == 'cosine':
- learning_rate = tf.train.cosine_decay(
- base_learning_rate,
- adjusted_global_step,
- training_number_of_steps - slow_start_step)
- elif learning_policy == 'multi_steps':
- if boundaries is None or boundary_learning_rates is None:
- raise ValueError('Must set `boundaries` and `boundary_learning_rates` '
- 'for multi_steps learning rate decay.')
- learning_rate = tf.train.piecewise_constant_decay(
- adjusted_global_step,
- boundaries,
- boundary_learning_rates)
- else:
- raise ValueError('Unknown learning policy.')
-
- adjusted_slow_start_learning_rate = slow_start_learning_rate
- if slow_start_burnin_type == 'linear':
- # Do linear burnin. Increase linearly from slow_start_learning_rate and
- # reach base_learning_rate after (global_step >= slow_start_steps).
- adjusted_slow_start_learning_rate = (
- slow_start_learning_rate +
- (base_learning_rate - slow_start_learning_rate) *
- tf.to_float(global_step) / slow_start_step)
- elif slow_start_burnin_type != 'none':
- raise ValueError('Unknown burnin type.')
-
- # Employ small learning rate at the first few steps for warm start.
- return tf.where(global_step < slow_start_step,
- adjusted_slow_start_learning_rate, learning_rate)
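
The 'poly' policy documented in `get_model_learning_rate` above decays the base learning rate as `base_learning_rate * (1 - step / decay_steps) ** learning_power`, after an optional slow-start warm-up. A minimal sketch of that schedule (illustrative defaults; the real implementation uses `tf.train.polynomial_decay`, which additionally supports a nonzero `end_learning_rate` floor):

```python
def poly_learning_rate(step,
                       base_learning_rate=0.007,
                       training_number_of_steps=30000,
                       learning_power=0.9,
                       slow_start_step=0,
                       slow_start_learning_rate=1e-4):
    """Poly schedule as described in the get_model_learning_rate docstring above."""
    if step < slow_start_step:
        return slow_start_learning_rate  # Warm start with a small learning rate.
    decay_steps = training_number_of_steps - slow_start_step
    progress = min(1.0, (step - slow_start_step) / float(decay_steps))
    return base_learning_rate * (1.0 - progress) ** learning_power

for step in (0, 15000, 29999):
    print(step, poly_learning_rate(step))
# 0 -> 0.007, 15000 -> ~0.00375, 29999 -> ~0.0 (approaches zero at the end of training)
```
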
diff --git a/research/deeplab/vis.py b/research/deeplab/vis.py
deleted file mode 100644
index 20808d37bf2..00000000000
--- a/research/deeplab/vis.py
+++ /dev/null
@@ -1,327 +0,0 @@
-# Lint as: python2, python3
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Segmentation results visualization on a given set of images.
-
-See model.py for more details and usage.
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import os.path
-import time
-import numpy as np
-from six.moves import range
-import tensorflow as tf
-from tensorflow.contrib import quantize as contrib_quantize
-from tensorflow.contrib import training as contrib_training
-from deeplab import common
-from deeplab import model
-from deeplab.datasets import data_generator
-from deeplab.utils import save_annotation
-
-flags = tf.app.flags
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string('master', '', 'BNS name of the tensorflow server')
-
-# Settings for log directories.
-
-flags.DEFINE_string('vis_logdir', None, 'Where to write the event logs.')
-
-flags.DEFINE_string('checkpoint_dir', None, 'Directory of model checkpoints.')
-
-# Settings for visualizing the model.
-
-flags.DEFINE_integer('vis_batch_size', 1,
- 'The number of images in each batch during evaluation.')
-
-flags.DEFINE_list('vis_crop_size', '513,513',
- 'Crop size [height, width] for visualization.')
-
-flags.DEFINE_integer('eval_interval_secs', 60 * 5,
- 'How often (in seconds) to run evaluation.')
-
-# For `xception_65`, use atrous_rates = [12, 24, 36] if output_stride = 8, or
-# rates = [6, 12, 18] if output_stride = 16. For `mobilenet_v2`, use None. Note
-# one could use different atrous_rates/output_stride during training/evaluation.
-flags.DEFINE_multi_integer('atrous_rates', None,
- 'Atrous rates for atrous spatial pyramid pooling.')
-
-flags.DEFINE_integer('output_stride', 16,
- 'The ratio of input to output spatial resolution.')
-
-# Change to [0.5, 0.75, 1.0, 1.25, 1.5, 1.75] for multi-scale test.
-flags.DEFINE_multi_float('eval_scales', [1.0],
- 'The scales to resize images for evaluation.')
-
-# Change to True for adding flipped images during test.
-flags.DEFINE_bool('add_flipped_images', False,
- 'Add flipped images for evaluation or not.')
-
-flags.DEFINE_integer(
- 'quantize_delay_step', -1,
- 'Steps to start quantized training. If < 0, will not quantize model.')
-
-# Dataset settings.
-
-flags.DEFINE_string('dataset', 'pascal_voc_seg',
- 'Name of the segmentation dataset.')
-
-flags.DEFINE_string('vis_split', 'val',
- 'Which split of the dataset to be used for visualizing results.')
-
-flags.DEFINE_string('dataset_dir', None, 'Where the dataset resides.')
-
-flags.DEFINE_enum('colormap_type', 'pascal', ['pascal', 'cityscapes', 'ade20k'],
- 'Visualization colormap type.')
-
-flags.DEFINE_boolean('also_save_raw_predictions', False,
- 'Also save raw predictions.')
-
-flags.DEFINE_integer('max_number_of_iterations', 0,
- 'Maximum number of visualization iterations. Will loop '
- 'indefinitely upon nonpositive values.')
-
-# The folder where semantic segmentation predictions are saved.
-_SEMANTIC_PREDICTION_SAVE_FOLDER = 'segmentation_results'
-
-# The folder where raw semantic segmentation predictions are saved.
-_RAW_SEMANTIC_PREDICTION_SAVE_FOLDER = 'raw_segmentation_results'
-
-# The format to save image.
-_IMAGE_FORMAT = '%06d_image'
-
-# The format to save prediction
-_PREDICTION_FORMAT = '%06d_prediction'
-
-# To evaluate Cityscapes results on the evaluation server, the labels used
-# during training should be mapped to the labels for evaluation.
-_CITYSCAPES_TRAIN_ID_TO_EVAL_ID = [7, 8, 11, 12, 13, 17, 19, 20, 21, 22,
- 23, 24, 25, 26, 27, 28, 31, 32, 33]
-
-
-def _convert_train_id_to_eval_id(prediction, train_id_to_eval_id):
- """Converts the predicted label for evaluation.
-
- There are cases where the training labels are not equal to the evaluation
- labels. This function is used to perform the conversion so that we could
- evaluate the results on the evaluation server.
-
- Args:
- prediction: Semantic segmentation prediction.
- train_id_to_eval_id: A list mapping from train id to evaluation id.
-
- Returns:
- Semantic segmentation prediction whose labels have been changed.
- """
- converted_prediction = prediction.copy()
- for train_id, eval_id in enumerate(train_id_to_eval_id):
- converted_prediction[prediction == train_id] = eval_id
-
- return converted_prediction
-
-
-def _process_batch(sess, original_images, semantic_predictions, image_names,
- image_heights, image_widths, image_id_offset, save_dir,
- raw_save_dir, train_id_to_eval_id=None):
- """Evaluates one single batch qualitatively.
-
- Args:
- sess: TensorFlow session.
- original_images: One batch of original images.
- semantic_predictions: One batch of semantic segmentation predictions.
- image_names: Image names.
- image_heights: Image heights.
- image_widths: Image widths.
- image_id_offset: Image id offset for indexing images.
- save_dir: The directory where the predictions will be saved.
- raw_save_dir: The directory where the raw predictions will be saved.
- train_id_to_eval_id: A list mapping from train id to eval id.
- """
- (original_images,
- semantic_predictions,
- image_names,
- image_heights,
- image_widths) = sess.run([original_images, semantic_predictions,
- image_names, image_heights, image_widths])
-
- num_image = semantic_predictions.shape[0]
- for i in range(num_image):
- image_height = np.squeeze(image_heights[i])
- image_width = np.squeeze(image_widths[i])
- original_image = np.squeeze(original_images[i])
- semantic_prediction = np.squeeze(semantic_predictions[i])
- crop_semantic_prediction = semantic_prediction[:image_height, :image_width]
-
- # Save image.
- save_annotation.save_annotation(
- original_image, save_dir, _IMAGE_FORMAT % (image_id_offset + i),
- add_colormap=False)
-
- # Save prediction.
- save_annotation.save_annotation(
- crop_semantic_prediction, save_dir,
- _PREDICTION_FORMAT % (image_id_offset + i), add_colormap=True,
- colormap_type=FLAGS.colormap_type)
-
- if FLAGS.also_save_raw_predictions:
- image_filename = os.path.basename(image_names[i])
-
- if train_id_to_eval_id is not None:
- crop_semantic_prediction = _convert_train_id_to_eval_id(
- crop_semantic_prediction,
- train_id_to_eval_id)
- save_annotation.save_annotation(
- crop_semantic_prediction, raw_save_dir, image_filename,
- add_colormap=False)
-
-
-def main(unused_argv):
- tf.logging.set_verbosity(tf.logging.INFO)
-
- # Get dataset-dependent information.
- dataset = data_generator.Dataset(
- dataset_name=FLAGS.dataset,
- split_name=FLAGS.vis_split,
- dataset_dir=FLAGS.dataset_dir,
- batch_size=FLAGS.vis_batch_size,
- crop_size=[int(sz) for sz in FLAGS.vis_crop_size],
- min_resize_value=FLAGS.min_resize_value,
- max_resize_value=FLAGS.max_resize_value,
- resize_factor=FLAGS.resize_factor,
- model_variant=FLAGS.model_variant,
- is_training=False,
- should_shuffle=False,
- should_repeat=False)
-
- train_id_to_eval_id = None
- if dataset.dataset_name == data_generator.get_cityscapes_dataset_name():
- tf.logging.info('Cityscapes requires converting train_id to eval_id.')
- train_id_to_eval_id = _CITYSCAPES_TRAIN_ID_TO_EVAL_ID
-
- # Prepare for visualization.
- tf.gfile.MakeDirs(FLAGS.vis_logdir)
- save_dir = os.path.join(FLAGS.vis_logdir, _SEMANTIC_PREDICTION_SAVE_FOLDER)
- tf.gfile.MakeDirs(save_dir)
- raw_save_dir = os.path.join(
- FLAGS.vis_logdir, _RAW_SEMANTIC_PREDICTION_SAVE_FOLDER)
- tf.gfile.MakeDirs(raw_save_dir)
-
- tf.logging.info('Visualizing on %s set', FLAGS.vis_split)
-
- with tf.Graph().as_default():
- samples = dataset.get_one_shot_iterator().get_next()
-
- model_options = common.ModelOptions(
- outputs_to_num_classes={common.OUTPUT_TYPE: dataset.num_of_classes},
- crop_size=[int(sz) for sz in FLAGS.vis_crop_size],
- atrous_rates=FLAGS.atrous_rates,
- output_stride=FLAGS.output_stride)
-
- if tuple(FLAGS.eval_scales) == (1.0,):
- tf.logging.info('Performing single-scale test.')
- predictions = model.predict_labels(
- samples[common.IMAGE],
- model_options=model_options,
- image_pyramid=FLAGS.image_pyramid)
- else:
- tf.logging.info('Performing multi-scale test.')
- if FLAGS.quantize_delay_step >= 0:
- raise ValueError(
- 'Quantize mode is not supported with multi-scale test.')
- predictions = model.predict_labels_multi_scale(
- samples[common.IMAGE],
- model_options=model_options,
- eval_scales=FLAGS.eval_scales,
- add_flipped_images=FLAGS.add_flipped_images)
- predictions = predictions[common.OUTPUT_TYPE]
-
- if FLAGS.min_resize_value and FLAGS.max_resize_value:
- # Only supports batch_size = 1, since we assume the dimensions of the
- # original image after tf.squeeze are [height, width, 3].
- assert FLAGS.vis_batch_size == 1
-
- # Reverse the resizing and padding operations performed in preprocessing.
- # First, we slice the valid regions (i.e., remove padded region) and then
- # we resize the predictions back.
- original_image = tf.squeeze(samples[common.ORIGINAL_IMAGE])
- original_image_shape = tf.shape(original_image)
- predictions = tf.slice(
- predictions,
- [0, 0, 0],
- [1, original_image_shape[0], original_image_shape[1]])
- resized_shape = tf.to_int32([tf.squeeze(samples[common.HEIGHT]),
- tf.squeeze(samples[common.WIDTH])])
- predictions = tf.squeeze(
- tf.image.resize_images(tf.expand_dims(predictions, 3),
- resized_shape,
- method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
- align_corners=True), 3)
-
- tf.train.get_or_create_global_step()
- if FLAGS.quantize_delay_step >= 0:
- contrib_quantize.create_eval_graph()
-
- num_iteration = 0
- max_num_iteration = FLAGS.max_number_of_iterations
-
- checkpoints_iterator = contrib_training.checkpoints_iterator(
- FLAGS.checkpoint_dir, min_interval_secs=FLAGS.eval_interval_secs)
- for checkpoint_path in checkpoints_iterator:
- num_iteration += 1
- tf.logging.info(
- 'Starting visualization at ' + time.strftime('%Y-%m-%d-%H:%M:%S',
- time.gmtime()))
- tf.logging.info('Visualizing with model %s', checkpoint_path)
-
- scaffold = tf.train.Scaffold(init_op=tf.global_variables_initializer())
- session_creator = tf.train.ChiefSessionCreator(
- scaffold=scaffold,
- master=FLAGS.master,
- checkpoint_filename_with_path=checkpoint_path)
- with tf.train.MonitoredSession(
- session_creator=session_creator, hooks=None) as sess:
- batch = 0
- image_id_offset = 0
-
- while not sess.should_stop():
- tf.logging.info('Visualizing batch %d', batch + 1)
- _process_batch(sess=sess,
- original_images=samples[common.ORIGINAL_IMAGE],
- semantic_predictions=predictions,
- image_names=samples[common.IMAGE_NAME],
- image_heights=samples[common.HEIGHT],
- image_widths=samples[common.WIDTH],
- image_id_offset=image_id_offset,
- save_dir=save_dir,
- raw_save_dir=raw_save_dir,
- train_id_to_eval_id=train_id_to_eval_id)
- image_id_offset += FLAGS.vis_batch_size
- batch += 1
-
- tf.logging.info(
- 'Finished visualization at ' + time.strftime('%Y-%m-%d-%H:%M:%S',
- time.gmtime()))
- if max_num_iteration > 0 and num_iteration >= max_num_iteration:
- break
-
-if __name__ == '__main__':
- flags.mark_flag_as_required('checkpoint_dir')
- flags.mark_flag_as_required('vis_logdir')
- flags.mark_flag_as_required('dataset_dir')
- tf.app.run()
diff --git a/research/delf/.gitignore b/research/delf/.gitignore
deleted file mode 100644
index b61ddd10001..00000000000
--- a/research/delf/.gitignore
+++ /dev/null
@@ -1,4 +0,0 @@
-*pyc
-*~
-*pb2.py
-*pb2.pyc
diff --git a/research/delf/DETECTION.md b/research/delf/DETECTION.md
deleted file mode 100644
index 7fa7570f74d..00000000000
--- a/research/delf/DETECTION.md
+++ /dev/null
@@ -1,69 +0,0 @@
-## Quick start: landmark detection
-
-[![Paper](http://img.shields.io/badge/paper-arXiv.1812.01584-B3181B.svg)](https://arxiv.org/abs/1812.01584)
-
-### Install DELF library
-
-To be able to use this code, please follow
-[these instructions](INSTALL_INSTRUCTIONS.md) to properly install the DELF
-library.
-
-### Download Oxford buildings dataset
-
-To illustrate detector usage, please download the Oxford buildings dataset by
-following the instructions
-[here](EXTRACTION_MATCHING.md#download-oxford-buildings-dataset). Then, create
-the file `list_images_detector.txt` as follows:
-
-```bash
-# From tensorflow/models/research/delf/delf/python/examples/
-echo data/oxford5k_images/all_souls_000002.jpg >> list_images_detector.txt
-echo data/oxford5k_images/all_souls_000035.jpg >> list_images_detector.txt
-```
-
-### Download detector model
-
-Also, you will need to download the pre-trained detector model:
-
-```bash
-# From tensorflow/models/research/delf/delf/python/examples/
-mkdir parameters && cd parameters
-wget http://storage.googleapis.com/delf/d2r_frcnn_20190411.tar.gz
-tar -xvzf d2r_frcnn_20190411.tar.gz
-```
-
-**Note**: this is the Faster-RCNN based model. We also release a MobileNet-SSD
-model, see the [README](README.md#pre-trained-models) for download link. The
-instructions should work seamlessly for both models.
-
-### Detecting landmarks
-
-Now that you have everything in place, running this command should detect boxes
-for the images `all_souls_000002.jpg` and `all_souls_000035.jpg`, with a
-threshold of 0.8, and produce visualizations.
-
-```bash
-# From tensorflow/models/research/delf/delf/python/examples/
-python3 extract_boxes.py \
- --detector_path parameters/d2r_frcnn_20190411 \
- --detector_thresh 0.8 \
- --list_images_path list_images_detector.txt \
- --output_dir data/oxford5k_boxes \
- --output_viz_dir data/oxford5k_boxes_viz
-```
-
-Two images are generated in the `data/oxford5k_boxes_viz` directory; they should
-look similar to these:
-
-![DetectionExample1](delf/python/examples/detection_example_1.jpg)
-![DetectionExample2](delf/python/examples/detection_example_2.jpg)
-
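-To inspect the raw detections programmatically, the `box_io` module shipped with
-the DELF package can parse the files written to `data/oxford5k_boxes`. A minimal
-sketch follows; the `.boxes` file extension is an assumption about the filenames
-`extract_boxes.py` writes, so adjust it to the actual output names:
-
-```python
-# Minimal sketch: read back one saved detection file with delf.box_io.
-# The '.boxes' extension below is an assumption; use the actual filenames
-# produced in data/oxford5k_boxes.
-from delf import box_io
-
-boxes, scores, class_indices = box_io.ReadFromFile(
-    'data/oxford5k_boxes/all_souls_000002.boxes')
-for box, score in zip(boxes, scores):
-  # Each box is [ymin, xmin, ymax, xmax], with its detection score.
-  print('score=%.2f, box=%s' % (score, box))
-```
-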
-### Troubleshooting
-
-#### `matplotlib`
-
-`matplotlib` may complain with a message such as `no display name and no
-$DISPLAY environment variable`. To fix this, one option is to add the line
-`backend : Agg` to the file `~/.config/matplotlib/matplotlibrc`. For more on
-this problem, see the discussion
-[here](https://stackoverflow.com/questions/37604289/tkinter-tclerror-no-display-name-and-no-display-environment-variable).
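-
-Alternatively, the backend can be selected programmatically before `pyplot` is
-imported, for example:
-
-```python
-# Select the non-interactive Agg backend before importing pyplot, as an
-# alternative to editing the matplotlibrc file.
-import matplotlib
-matplotlib.use('Agg')
-import matplotlib.pyplot as plt
-```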
diff --git a/research/delf/EXTRACTION_MATCHING.md b/research/delf/EXTRACTION_MATCHING.md
deleted file mode 100644
index 53159638587..00000000000
--- a/research/delf/EXTRACTION_MATCHING.md
+++ /dev/null
@@ -1,87 +0,0 @@
-## Quick start: DELF extraction and matching
-
-[![Paper](http://img.shields.io/badge/paper-arXiv.1612.06321-B3181B.svg)](https://arxiv.org/abs/1612.06321)
-
-### Install DELF library
-
-To be able to use this code, please follow
-[these instructions](INSTALL_INSTRUCTIONS.md) to properly install the DELF
-library.
-
-### Download Oxford buildings dataset
-
-To illustrate DELF usage, please download the Oxford buildings dataset. To
-follow these instructions closely, please download the dataset to the
-`tensorflow/models/research/delf/delf/python/examples` directory, as in the
-following commands:
-
-```bash
-# From tensorflow/models/research/delf/delf/python/examples/
-mkdir data && cd data
-wget http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/oxbuild_images.tgz
-mkdir oxford5k_images oxford5k_features
-tar -xvzf oxbuild_images.tgz -C oxford5k_images/
-cd ../
-echo data/oxford5k_images/hertford_000056.jpg >> list_images.txt
-echo data/oxford5k_images/oxford_000317.jpg >> list_images.txt
-```
-
-### Download pre-trained DELF model
-
-Also, you will need to download the trained DELF model:
-
-```bash
-# From tensorflow/models/research/delf/delf/python/examples/
-mkdir parameters && cd parameters
-wget http://storage.googleapis.com/delf/delf_gld_20190411.tar.gz
-tar -xvzf delf_gld_20190411.tar.gz
-```
-
-### DELF feature extraction
-
-Now that you have everything in place, running this command should extract DELF
-features for the images `hertford_000056.jpg` and `oxford_000317.jpg`:
-
-```bash
-# From tensorflow/models/research/delf/delf/python/examples/
-python3 extract_features.py \
- --config_path delf_config_example.pbtxt \
- --list_images_path list_images.txt \
- --output_dir data/oxford5k_features
-```
-
-### Image matching using DELF features
-
-After feature extraction, run this command to perform feature matching between
-the images `hertford_000056.jpg` and `oxford_000317.jpg`:
-
-```bash
-python3 match_images.py \
- --image_1_path data/oxford5k_images/hertford_000056.jpg \
- --image_2_path data/oxford5k_images/oxford_000317.jpg \
- --features_1_path data/oxford5k_features/hertford_000056.delf \
- --features_2_path data/oxford5k_features/oxford_000317.delf \
- --output_image matched_images.png
-```
-
-The image `matched_images.png` is generated and should look similar to this one:
-
-![MatchedImagesExample](delf/python/examples/matched_images_example.jpg)
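-
-If you prefer to work with the extracted features directly (for instance, to run
-your own matching), they can be loaded with the `feature_io` module. A minimal
-sketch, assuming `feature_io.ReadFromFile` returns keypoint locations, scales,
-descriptors, attention scores and orientations in that order (check
-`feature_io.py` if in doubt):
-
-```python
-# Minimal sketch: load the DELF features extracted above for one image.
-from delf import feature_io
-
-locations, scales, descriptors, attention, orientations = feature_io.ReadFromFile(
-    'data/oxford5k_features/hertford_000056.delf')
-print('Loaded %d local features with %d-dimensional descriptors.' %
-      (descriptors.shape[0], descriptors.shape[1]))
-```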
-
-### Troubleshooting
-
-#### `matplotlib`
-
-`matplotlib` may complain with a message such as `no display name and no
-$DISPLAY environment variable`. To fix this, one option is to add the line
-`backend : Agg` to the file `~/.config/matplotlib/matplotlibrc`. For more on
-this problem, see the discussion
-[here](https://stackoverflow.com/questions/37604289/tkinter-tclerror-no-display-name-and-no-display-environment-variable).
-
-#### `skimage`
-
-By default, scikit-image 0.13.x or 0.14.1 is installed if you followed the
-instructions above. According to
-[this discussion](https://github.com/scikit-image/scikit-image/issues/3649#issuecomment-455273659),
-if you run into scikit-image related issues, upgrading to a version above 0.14.1
-with `pip3 install -U scikit-image` should fix them.
diff --git a/research/delf/INSTALL_INSTRUCTIONS.md b/research/delf/INSTALL_INSTRUCTIONS.md
deleted file mode 100644
index f5616f47ff0..00000000000
--- a/research/delf/INSTALL_INSTRUCTIONS.md
+++ /dev/null
@@ -1,157 +0,0 @@
-## DELF installation
-
-### Installation script
-
-We now have a script to do the entire installation in one shot. Navigate to the
-directory `models/research/delf/delf/python/training`, then run:
-
-```bash
-# From models/research/delf/delf/python/training
-bash install_delf.sh
-```
-
-If this works, you are done! If not, see below for detailed instructions for
-installing this codebase and its dependencies.
-
-*Please note that this installation script only works on 64-bit Linux
-architectures due to the `protoc` binary that is automatically downloaded. If
-you wish to install the DELF library on other architectures please update the
-[`install_delf.sh`](delf/python/training/install_delf.sh) script by referencing
-the desired `protoc`
-[binary release](https://github.com/protocolbuffers/protobuf/releases).*
-
-In more detail: the `install_delf.sh` script installs both the DELF library and
-its dependencies in the following sequence:
-
-* Install TensorFlow 2.2, both the CPU and GPU packages.
-* Install the [TF-Slim](https://github.com/google-research/tf-slim) library
- from source.
-* Download [protoc](https://github.com/protocolbuffers/protobuf) and compile
- the DELF Protocol Buffers.
-* Install the matplotlib, numpy, scikit-image, scipy and python3-tk Python
- libraries.
-* Install the
- [TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection)
- from the cloned TensorFlow Model Garden repository.
-* Install the DELF package.
-
-### TensorFlow
-
-[![TensorFlow 2.2](https://img.shields.io/badge/tensorflow-2.2-brightgreen)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0)
-[![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/)
-
-For detailed steps to install TensorFlow, follow the
-[TensorFlow installation instructions](https://www.tensorflow.org/install/). A
-typical user can install TensorFlow using one of the following commands:
-
-```bash
-# For CPU:
-pip3 install 'tensorflow>=2.2.0'
-# For GPU:
-pip3 install 'tensorflow-gpu>=2.2.0'
-```
-
-### TF-Slim
-
-Note: currently, we need to install the latest version from source, to avoid
-using earlier versions that relied on `tf.contrib`, which is now deprecated.
-
-```bash
-git clone git@github.com:google-research/tf-slim.git
-cd tf-slim
-pip3 install .
-```
-
-Note that these commands assume you are cloning using SSH. If you are using
-HTTPS instead, use `git clone https://github.com/google-research/tf-slim.git`
-instead. See
-[this link](https://help.github.com/en/github/using-git/which-remote-url-should-i-use)
-for more information.
-
-### Protobuf
-
-The DELF library uses [protobuf](https://github.com/google/protobuf) (the Python
-version) to configure feature extraction and its format. You will need the
-`protoc` compiler, version >= 3.3. The easiest way to get it is to download
-directly. For Linux, this can be done as (see
-[here](https://github.com/google/protobuf/releases) for other platforms):
-
-```bash
-wget https://github.com/google/protobuf/releases/download/v3.3.0/protoc-3.3.0-linux-x86_64.zip
-unzip protoc-3.3.0-linux-x86_64.zip
-PATH_TO_PROTOC=`pwd`
-```
-
-### Python dependencies
-
-Install python library dependencies:
-
-```bash
-pip3 install matplotlib numpy scikit-image scipy
-sudo apt-get install python3-tk
-```
-
-### `tensorflow/models`
-
-Now, clone `tensorflow/models` and install the required libraries. Note that the
-`object_detection` library requires you to add `tensorflow/models/research/` to
-your `PYTHONPATH`, as instructed
-[here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md):
-
-```bash
-git clone git@github.com:tensorflow/models.git
-
-# Setup the object_detection module by editing PYTHONPATH.
-cd ..
-# From tensorflow/models/research/
-export PYTHONPATH=$PYTHONPATH:`pwd`
-```
-
-Note that these commands assume you are cloning using SSH. If you are using
-HTTPS instead, use `git clone https://github.com/tensorflow/models.git` instead.
-See
-[this link](https://help.github.com/en/github/using-git/which-remote-url-should-i-use)
-for more information.
-
-Then, compile DELF's protobufs. Use `PATH_TO_PROTOC` as the directory where you
-downloaded the `protoc` compiler.
-
-```bash
-# From tensorflow/models/research/delf/
-${PATH_TO_PROTOC?}/bin/protoc delf/protos/*.proto --python_out=.
-```
-
-Finally, install the DELF package. This may also install some other dependencies
-under the hood.
-
-```bash
-# From tensorflow/models/research/delf/
-pip3 install -e . # Install "delf" package.
-```
-
-At this point, running
-
-```bash
-python3 -c 'import delf'
-```
-
-should just return without complaints. This indicates that the DELF package is
-loaded successfully.
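-
-To additionally confirm that the compiled protos can be imported (a common
-failure mode when the `protoc` compilation step is skipped), a quick check along
-these lines should work:
-
-```python
-# Sanity check: the DELF package and its compiled protos should load. If the
-# protoc compilation step was skipped, the proto imports below will fail.
-from delf import delf_config_pb2
-from delf import feature_pb2
-
-config = delf_config_pb2.DelfConfig()
-print('DelfConfig loaded; default max_image_size = %d' % config.max_image_size)
-```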
-
-### Troubleshooting
-
-#### `pip3 install`
-
-Issues might be observed when using `pip3 install` with the `-e` option (editable
-mode). You can try simply removing the `-e` from the commands above. Also,
-depending on your machine setup, you might need to prepend `sudo`, i.e. run
-`sudo pip3 install` instead.
-
-#### Cloning github repositories
-
-The default commands above assume you are cloning using SSH. If you are using
-HTTPS instead, use for example `git clone
-https://github.com/tensorflow/models.git` instead of `git clone
-git@github.com:tensorflow/models.git`. See
-[this link](https://help.github.com/en/github/using-git/which-remote-url-should-i-use)
-for more information.
diff --git a/research/delf/README.md b/research/delf/README.md
deleted file mode 100644
index 274723db5be..00000000000
--- a/research/delf/README.md
+++ /dev/null
@@ -1,291 +0,0 @@
-# Deep Local and Global Image Features
-
-[![TensorFlow 2.2](https://img.shields.io/badge/tensorflow-2.2-brightgreen)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0)
-[![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/)
-
-This project presents code for deep local and global image feature methods,
-which are particularly useful for the computer vision tasks of instance-level
-recognition and retrieval. These were introduced in the
-[DELF](https://arxiv.org/abs/1612.06321),
-[Detect-to-Retrieve](https://arxiv.org/abs/1812.01584),
-[DELG](https://arxiv.org/abs/2001.05027) and
-[Google Landmarks Dataset v2](https://arxiv.org/abs/2004.01804) papers.
-
-We provide TensorFlow code for building and training models, and Python code for
-image retrieval and local feature matching. Pre-trained models for the landmark
-recognition domain are also provided.
-
-If you make use of this codebase, please consider citing the following papers:
-
-DELF:
-[![Paper](http://img.shields.io/badge/paper-arXiv.1612.06321-B3181B.svg)](https://arxiv.org/abs/1612.06321)
-
-```
-"Large-Scale Image Retrieval with Attentive Deep Local Features",
-H. Noh, A. Araujo, J. Sim, T. Weyand and B. Han,
-Proc. ICCV'17
-```
-
-Detect-to-Retrieve:
-[![Paper](http://img.shields.io/badge/paper-arXiv.1812.01584-B3181B.svg)](https://arxiv.org/abs/1812.01584)
-
-```
-"Detect-to-Retrieve: Efficient Regional Aggregation for Image Search",
-M. Teichmann*, A. Araujo*, M. Zhu and J. Sim,
-Proc. CVPR'19
-```
-
-DELG:
-[![Paper](http://img.shields.io/badge/paper-arXiv.2001.05027-B3181B.svg)](https://arxiv.org/abs/2001.05027)
-
-```
-"Unifying Deep Local and Global Features for Image Search",
-B. Cao*, A. Araujo* and J. Sim,
-Proc. ECCV'20
-```
-
-GLDv2:
-[![Paper](http://img.shields.io/badge/paper-arXiv.2004.01804-B3181B.svg)](https://arxiv.org/abs/2004.01804)
-
-```
-"Google Landmarks Dataset v2 - A Large-Scale Benchmark for Instance-Level Recognition and Retrieval",
-T. Weyand*, A. Araujo*, B. Cao and J. Sim,
-Proc. CVPR'20
-```
-
-## News
-
-- [Jul'20] Check out our ECCV'20 paper:
- ["Unifying Deep Local and Global Features for Image Search"](https://arxiv.org/abs/2001.05027)
-- [Apr'20] Check out our CVPR'20 paper: ["Google Landmarks Dataset v2 - A
- Large-Scale Benchmark for Instance-Level Recognition and
- Retrieval"](https://arxiv.org/abs/2004.01804)
-- [Jun'19] DELF achieved 2nd place in
- [CVPR Visual Localization challenge (Local Features track)](https://sites.google.com/corp/view/ltvl2019).
- See our slides
- [here](https://docs.google.com/presentation/d/e/2PACX-1vTswzoXelqFqI_pCEIVl2uazeyGr7aKNklWHQCX-CbQ7MB17gaycqIaDTguuUCRm6_lXHwCdrkP7n1x/pub?start=false&loop=false&delayms=3000).
-- [Apr'19] Check out our CVPR'19 paper:
- ["Detect-to-Retrieve: Efficient Regional Aggregation for Image Search"](https://arxiv.org/abs/1812.01584)
-- [Jun'18] DELF achieved state-of-the-art results in a CVPR'18 image retrieval
- paper: [Radenovic et al., "Revisiting Oxford and Paris: Large-Scale Image
- Retrieval Benchmarking"](https://arxiv.org/abs/1803.11285).
-- [Apr'18] DELF was featured in
- [ModelDepot](https://modeldepot.io/mikeshi/delf/overview)
-- [Mar'18] DELF is now available in
- [TF-Hub](https://www.tensorflow.org/hub/modules/google/delf/1)
-
-## Datasets
-
-We have two Google-Landmarks dataset versions:
-
-- Initial version (v1) can be found
- [here](https://www.kaggle.com/google/google-landmarks-dataset). It includes
- the Google Landmark Boxes, which were described in the Detect-to-Retrieve
- paper.
-- Second version (v2) has been released as part of two Kaggle challenges:
- [Landmark Recognition](https://www.kaggle.com/c/landmark-recognition-2019)
- and [Landmark Retrieval](https://www.kaggle.com/c/landmark-retrieval-2019).
- It can be downloaded from CVDF
- [here](https://github.com/cvdfoundation/google-landmark). See also
- [the CVPR'20 paper](https://arxiv.org/abs/2004.01804) on this new dataset
- version.
-
-If you make use of these datasets in your research, please consider citing the
-papers mentioned above.
-
-## Installation
-
-To be able to use this code, please follow
-[these instructions](INSTALL_INSTRUCTIONS.md) to properly install the DELF
-library.
-
-## Quick start
-
-### Pre-trained models
-
-We release several pre-trained models. See instructions in the following
-sections for examples of how to use the models.
-
-**DELF pre-trained on the Google-Landmarks dataset v1**
-([link](http://storage.googleapis.com/delf/delf_gld_20190411.tar.gz)). Presented
-in the [Detect-to-Retrieve paper](https://arxiv.org/abs/1812.01584). Boosts
-performance by ~4% mAP compared to the ICCV'17 DELF model.
-
-**DELG pre-trained on the Google-Landmarks dataset v1**
-([R101-DELG](http://storage.googleapis.com/delf/r101delg_gld_20200814.tar.gz),
-[R50-DELG](http://storage.googleapis.com/delf/r50delg_gld_20200814.tar.gz)).
-Presented in the [DELG paper](https://arxiv.org/abs/2001.05027).
-
-**DELG pre-trained on the Google-Landmarks dataset v2 (clean)**
-([R101-DELG](https://storage.googleapis.com/delf/r101delg_gldv2clean_20200914.tar.gz),
-[R50-DELG](https://storage.googleapis.com/delf/r50delg_gldv2clean_20200914.tar.gz)).
-Presented in the [DELG paper](https://arxiv.org/abs/2001.05027).
-
-**RN101-ArcFace pre-trained on the Google-Landmarks dataset v2 (train-clean)**
-([link](https://storage.googleapis.com/delf/rn101_af_gldv2clean_20200814.tar.gz)).
-Presented in the [GLDv2 paper](https://arxiv.org/abs/2004.01804).
-
-**DELF pre-trained on Landmarks-Clean/Landmarks-Full dataset**
-([link](http://storage.googleapis.com/delf/delf_v1_20171026.tar.gz)). Presented
-in the [DELF paper](https://arxiv.org/abs/1612.06321); the model was trained on the
-dataset released by the [DIR paper](https://arxiv.org/abs/1604.01325).
-
-**Faster-RCNN detector pre-trained on Google Landmark Boxes**
-([link](http://storage.googleapis.com/delf/d2r_frcnn_20190411.tar.gz)).
-Presented in the [Detect-to-Retrieve paper](https://arxiv.org/abs/1812.01584).
-
-**MobileNet-SSD detector pre-trained on Google Landmark Boxes**
-([link](http://storage.googleapis.com/delf/d2r_mnetssd_20190411.tar.gz)).
-Presented in the [Detect-to-Retrieve paper](https://arxiv.org/abs/1812.01584).
-
-Besides these, we also release pre-trained codebooks for local feature
-aggregation. See the
-[Detect-to-Retrieve instructions](delf/python/detect_to_retrieve/DETECT_TO_RETRIEVE_INSTRUCTIONS.md)
-for details.
-
-### DELF extraction and matching
-
-Please follow [these instructions](EXTRACTION_MATCHING.md). At the end, you
-should obtain a nice figure showing local feature matches, such as the following:
-
-![MatchedImagesExample](delf/python/examples/matched_images_example.jpg)
-
-### DELF training
-
-Please follow [these instructions](delf/python/training/README.md).
-
-### DELG
-
-Please follow [these instructions](delf/python/delg/DELG_INSTRUCTIONS.md). At
-the end, you should obtain image retrieval results on the Revisited Oxford/Paris
-datasets.
-
-### GLDv2 baseline
-
-Please follow
-[these instructions](delf/python/datasets/google_landmarks_dataset/README.md). At the
-end, you should obtain image retrieval results on the Revisited Oxford/Paris
-datasets.
-
-### Landmark detection
-
-Please follow [these instructions](DETECTION.md). At the end, you should obtain
-a nice figure showing a detection, such as the following:
-
-![DetectionExample1](delf/python/examples/detection_example_1.jpg)
-
-### Detect-to-Retrieve
-
-Please follow
-[these instructions](delf/python/detect_to_retrieve/DETECT_TO_RETRIEVE_INSTRUCTIONS.md).
-At the end, you should obtain image retrieval results on the Revisited
-Oxford/Paris datasets.
-
-## Code overview
-
-DELF/D2R/DELG/GLD code is located under the `delf` directory. There are two
-directories therein, `protos` and `python`.
-
-### `delf/protos`
-
-This directory contains protobufs for local feature aggregation
-(`aggregation_config.proto`), serializing detected boxes (`box.proto`),
-serializing float tensors (`datum.proto`), configuring DELF/DELG extraction
-(`delf_config.proto`), and serializing local features (`feature.proto`).
-
-### `delf/python`
-
-This directory contains files for several different purposes, such as:
-reading/writing tensors/features (`box_io.py`, `datum_io.py`, `feature_io.py`),
-local feature aggregation extraction and similarity computation
-(`feature_aggregation_extractor.py`, `feature_aggregation_similarity.py`) and
-helper functions for image/feature loading/processing (`utils.py`,
-`feature_extractor.py`).
-
-The subdirectory `delf/python/examples` contains sample scripts to run DELF/DELG
-feature extraction/matching (`extractor.py`, `extract_features.py`,
-`match_images.py`) and object detection (`detector.py`, `extract_boxes.py`).
-`delf_config_example.pbtxt` shows an example instantiation of the DelfConfig
-proto, used for DELF feature extraction.
-
-The subdirectory `delf/python/delg` contains sample scripts/configs related to
-the DELG paper: `extract_features.py` for local+global feature extraction (with
-an example `delg_gld_config.pbtxt`) and `perform_retrieval.py` for performing
-retrieval/scoring.
-
-The subdirectory `delf/python/detect_to_retrieve` contains sample
-scripts/configs related to the Detect-to-Retrieve paper, for feature/box
-extraction/aggregation/clustering (`aggregation_extraction.py`,
-`boxes_and_features_extraction.py`, `cluster_delf_features.py`,
-`extract_aggregation.py`, `extract_index_boxes_and_features.py`,
-`extract_query_features.py`), image retrieval/reranking (`perform_retrieval.py`,
-`image_reranking.py`), along with configs used for feature
-extraction/aggregation (`delf_gld_config.pbtxt`,
-`index_aggregation_config.pbtxt`, `query_aggregation_config.pbtxt`) and
-Revisited Oxford/Paris dataset parsing/evaluation (`dataset.py`).
-
-The subdirectory `delf/python/google_landmarks_dataset` contains sample
-scripts/modules for computing GLD metrics (`metrics.py`,
-`compute_recognition_metrics.py`, `compute_retrieval_metrics.py`), GLD file IO
-(`dataset_file_io.py`) / reproducing results from the GLDv2 paper
-(`rn101_af_gldv2clean_config.pbtxt` and the instructions therein).
-
-The subdirectory `delf/python/training` contains sample scripts/modules for
-performing model training (`train.py`) based on a ResNet50 DELF model
-(`model/resnet50.py`, `model/delf_model.py`), also presenting relevant model
-exporting scripts and associated utils (`model/export_model.py`,
-`model/export_global_model.py`, `model/export_model_utils.py`) and dataset
-downloading/preprocessing (`download_dataset.sh`, `build_image_dataset.py`,
-`datasets/googlelandmarks.py`).
-
-Besides these, other files in the different subdirectories contain tests for the
-various modules.
-
-## Maintainers
-
-André Araujo (@andrefaraujo)
-
-## Release history
-
-### Jul, 2020
-
-- Full TF2 support. Only one minor `compat.v1` usage left. Updated
- instructions to require TF2.2.
-- Refactored / much improved training code, with very detailed, step-by-step
- instructions.
-
-**Thanks to contributors**: Dan Anghel, Barbara Fusinska and André
-Araujo.
-
-### May, 2020
-
-- Codebase is now Python3-first
-- DELG model/code released
-- GLDv2 baseline model released
-
-**Thanks to contributors**: Barbara Fusinska and André Araujo.
-
-### April, 2020 (version 2.0)
-
-- Initial DELF training code released.
-- Codebase is now fully compatible with TF 2.1.
-
-**Thanks to contributors**: Arun Mukundan, Yuewei Na and André Araujo.
-
-### April, 2019
-
-Detect-to-Retrieve code released.
-
-Includes pre-trained models to detect landmark boxes, and DELF model pre-trained
-on Google Landmarks v1 dataset.
-
-**Thanks to contributors**: André Araujo, Marvin Teichmann, Menglong Zhu,
-Jack Sim.
-
-### October, 2017
-
-Initial release containing DELF-v1 code, including feature extraction and
-matching examples. Pre-trained DELF model from ICCV'17 paper is released.
-
-**Thanks to contributors**: André Araujo, Hyeonwoo Noh, Youlong Cheng,
-Jack Sim.
diff --git a/research/delf/delf/__init__.py b/research/delf/delf/__init__.py
deleted file mode 100644
index a3c5d37bc44..00000000000
--- a/research/delf/delf/__init__.py
+++ /dev/null
@@ -1,42 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Module to extract deep local features."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-# pylint: disable=unused-import
-from delf.protos import aggregation_config_pb2
-from delf.protos import box_pb2
-from delf.protos import datum_pb2
-from delf.protos import delf_config_pb2
-from delf.protos import feature_pb2
-from delf.python import box_io
-from delf.python import datum_io
-from delf.python import feature_aggregation_extractor
-from delf.python import feature_aggregation_similarity
-from delf.python import feature_extractor
-from delf.python import feature_io
-from delf.python import utils
-from delf.python import whiten
-from delf.python.examples import detector
-from delf.python.examples import extractor
-from delf.python import detect_to_retrieve
-from delf.python import training
-from delf.python.training import model
-from delf.python import datasets
-from delf.python.datasets import google_landmarks_dataset
-from delf.python.datasets import revisited_op
-# pylint: enable=unused-import
diff --git a/research/delf/delf/protos/__init__.py b/research/delf/delf/protos/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/delf/delf/protos/aggregation_config.proto b/research/delf/delf/protos/aggregation_config.proto
deleted file mode 100644
index b1d5953d43f..00000000000
--- a/research/delf/delf/protos/aggregation_config.proto
+++ /dev/null
@@ -1,63 +0,0 @@
-// Protocol buffer for feature aggregation configuration.
-//
-// Used for both extraction and comparison of aggregated representations. Note
-// that some options are only relevant for the former or the latter.
-//
-// For more details, please refer to the paper:
-// "Detect-to-Retrieve: Efficient Regional Aggregation for Image Search",
-// Proc. CVPR'19 (https://arxiv.org/abs/1812.01584).
-
-syntax = "proto2";
-
-package delf.protos;
-
-message AggregationConfig {
- // Number of codewords (ie, visual words) in the codebook.
- optional int32 codebook_size = 1 [default = 65536];
-
- // Dimensionality of local features (eg, 128 for DELF used in
- // Detect-to-Retrieve paper).
- optional int32 feature_dimensionality = 2 [default = 128];
-
- // Type of aggregation to use.
- // For example, to use R-ASMK*, `aggregation_type` should be set to ASMK_STAR
- // and `use_regional_aggregation` should be set to true.
- enum AggregationType {
- INVALID = 0;
- VLAD = 1;
- ASMK = 2;
- ASMK_STAR = 3;
- }
- optional AggregationType aggregation_type = 3 [default = ASMK_STAR];
-
- // L2 normalization option.
- // - For vanilla aggregated kernels (eg, VLAD/ASMK/ASMK*), this should be
- // set to true.
- // - For regional aggregated kernels (ie, if `use_regional_aggregation` is
- // true, leading to R-VLAD/R-ASMK/R-ASMK*), this should be set to false.
- // Note that it is used differently depending on the `aggregation_type`:
- // - For VLAD, this option is only used for extraction.
- // - For ASMK/ASMK*, this option is only used for comparisons.
- optional bool use_l2_normalization = 4 [default = true];
-
- // Additional options used only for extraction.
- // - Path to codebook checkpoint for aggregation.
- optional string codebook_path = 5;
- // - Number of visual words to assign each feature.
- optional int32 num_assignments = 6 [default = 1];
- // - Whether to use regional aggregation.
- optional bool use_regional_aggregation = 7 [default = false];
- // - Batch size to use for local features when computing aggregated
- // representations. Particularly useful if `codebook_size` and
- // `feature_dimensionality` are large, to avoid OOM. A value of zero or
- // lower indicates that no batching is used.
- optional int32 feature_batch_size = 10 [default = 100];
-
- // Additional options used only for comparison.
- // Only relevant if `aggregation_type` is ASMK or ASMK_STAR.
- // - Power-law exponent for similarity of visual word descriptors.
- optional float alpha = 8 [default = 3.0];
- // - Threshold above which similarity of visual word descriptors are
- // considered; below this, similarity is set to zero.
- optional float tau = 9 [default = 0.0];
-}
diff --git a/research/delf/delf/protos/box.proto b/research/delf/delf/protos/box.proto
deleted file mode 100644
index 28da7fb7141..00000000000
--- a/research/delf/delf/protos/box.proto
+++ /dev/null
@@ -1,24 +0,0 @@
-// Protocol buffer for serializing detected bounding boxes.
-
-syntax = "proto2";
-
-package delf.protos;
-
-message Box {
- // Coordinates: [ymin, xmin, ymax, xmax] corresponds to
- // [top, left, bottom, right].
- optional float ymin = 1;
- optional float xmin = 2;
- optional float ymax = 3;
- optional float xmax = 4;
-
- // Detection score. Usually, the higher the more confident.
- optional float score = 5;
-
- // Indicates which class the box corresponds to.
- optional int32 class_index = 6;
-}
-
-message Boxes {
- repeated Box box = 1;
-}
diff --git a/research/delf/delf/protos/datum.proto b/research/delf/delf/protos/datum.proto
deleted file mode 100644
index 6806e56b25e..00000000000
--- a/research/delf/delf/protos/datum.proto
+++ /dev/null
@@ -1,66 +0,0 @@
-// Protocol buffer for serializing arbitrary float tensors.
-// Note: Currently, floating point and uint32 values are supported.
-
-syntax = "proto2";
-
-package delf.protos;
-
-// A DatumProto is a data structure used to serialize tensor with arbitrary
-// shape. DatumProto contains an array of floating point values and its shape
-// is represented as a sequence of integer values. Values are contained in
-// row major order.
-//
-// Example:
-// 3 x 2 array
-//
-// [1.1, 2.2]
-// [3.3, 4.4]
-// [5.5, 6.6]
-//
-// can be represented with the following DatumProto:
-//
-// DatumProto {
-// shape {
-// dim: 3
-// dim: 2
-// }
-// float_list {
-// value: 1.1
-// value: 2.2
-// value: 3.3
-// value: 4.4
-// value: 5.5
-// value: 6.6
-// }
-// }
-
-// DatumShape is array of dimension of the tensor.
-message DatumShape {
- repeated int64 dim = 1 [packed = true];
-}
-
-// FloatList is a container of tensor values, which are saved as a list of
-// floating point values.
-message FloatList {
- repeated float value = 1 [packed = true];
-}
-
-// Uint32List is a container of tensor values, which are saved as a list of
-// uint32 values.
-message Uint32List {
- repeated uint32 value = 1 [packed = true];
-}
-
-message DatumProto {
- optional DatumShape shape = 1;
- oneof kind_oneof {
- FloatList float_list = 2;
- Uint32List uint32_list = 3;
- }
-}
-
-// Groups two DatumProto's.
-message DatumPairProto {
- optional DatumProto first = 1;
- optional DatumProto second = 2;
-}
diff --git a/research/delf/delf/protos/delf_config.proto b/research/delf/delf/protos/delf_config.proto
deleted file mode 100644
index c7cd5b1ce27..00000000000
--- a/research/delf/delf/protos/delf_config.proto
+++ /dev/null
@@ -1,130 +0,0 @@
-// Protocol buffer for configuring DELF feature extraction.
-
-syntax = "proto2";
-
-package delf.protos;
-
-message DelfPcaParameters {
- // Path to PCA mean file.
- optional string mean_path = 1; // Required.
-
- // Path to PCA matrix file.
- optional string projection_matrix_path = 2; // Required.
-
- // Dimensionality of feature after PCA.
- optional int32 pca_dim = 3; // Required.
-
- // If whitening is to be used, this must be set to true.
- optional bool use_whitening = 4 [default = false];
-
- // Path to PCA variances file, used for whitening. This is used only if
- // use_whitening is set to true.
- optional string pca_variances_path = 5;
-}
-
-message DelfLocalFeatureConfig {
- // If PCA is to be used, this must be set to true.
- optional bool use_pca = 1 [default = true];
-
- // Target layer name for DELF model. This is used to obtain receptive field
- // parameters used for localizing features with respect to the input image.
- optional string layer_name = 2 [default = ""];
-
- // Intersection over union threshold for the non-max suppression (NMS)
- // operation. If two features overlap by at most this amount, both are kept.
- // Otherwise, the one with largest attention score is kept. This should be a
- // number between 0.0 (no region is selected) and 1.0 (all regions are
- // selected and NMS is not performed).
- optional float iou_threshold = 3 [default = 1.0];
-
- // Maximum number of features that will be selected. The features with largest
- // scores (eg, largest attention score if score_type is "Att") are the
- // selected ones.
- optional int32 max_feature_num = 4 [default = 1000];
-
- // Threshold to be used for feature selection: no feature with score lower
- // than this number will be selected.
- optional float score_threshold = 5 [default = 100.0];
-
- // PCA parameters for DELF local feature. This is used only if use_pca is
- // true.
- optional DelfPcaParameters pca_parameters = 6;
-
- // If true, the returned keypoint locations are grounded to coordinates of the
- // resized image used for extraction. If false (default), the returned
- // keypoint locations are grounded to coordinates of the original image that
- // is fed into feature extraction.
- optional bool use_resized_coordinates = 7 [default = false];
-}
-
-message DelfGlobalFeatureConfig {
- // If PCA is to be used, this must be set to true.
- optional bool use_pca = 1 [default = true];
-
- // PCA parameters for DELF global feature. This is used only if use_pca is
- // true.
- optional DelfPcaParameters pca_parameters = 2;
-
- // Denotes indices of DelfConfig's scales that will be used for global
- // descriptor extraction. For example, if DelfConfig's image_scales are
- // [0.25, 0.5, 1.0] and image_scales_ind is [0, 2], global descriptor
- // extraction will use solely scales [0.25, 1.0]. Note that local feature
- // extraction will still use [0.25, 0.5, 1.0] in this case. If empty
- // (default), all scales are used.
- repeated int32 image_scales_ind = 3;
-}
-
-message DelfConfig {
- // Whether to extract local features when using the model.
- // At least one of {use_local_features, use_global_features} must be true.
- optional bool use_local_features = 7 [default = true];
- // Configuration used for local features. Note: this is used only if
- // use_local_features is true.
- optional DelfLocalFeatureConfig delf_local_config = 3;
-
- // Whether to extract global features when using the model.
- // At least one of {use_local_features, use_global_features} must be true.
- optional bool use_global_features = 8 [default = false];
- // Configuration used for global features. Note: this is used only if
- // use_global_features is true.
- optional DelfGlobalFeatureConfig delf_global_config = 9;
-
- // Path to DELF model.
- optional string model_path = 1; // Required.
-
- // Whether model has been exported using TF version 2+.
- optional bool is_tf2_exported = 10 [default = false];
-
- // Image scales to be used.
- repeated float image_scales = 2;
-
- // Image resizing options.
- // - The maximum/minimum image size (in terms of height or width) to be used
- // when extracting DELF features. If set to -1 (default), no upper/lower
- // bound for image size. If use_square_images option is false (default):
- // * If the height *OR* width is larger than max_image_size, it will be
- // resized to max_image_size, and the other dimension will be resized by
- // preserving the aspect ratio.
- // * If both height *AND* width are smaller than min_image_size, the larger
- // side is set to min_image_size.
- // - If use_square_images option is true, it needs to be resized to square
- // resolution. To be more specific:
- // * If the height *OR* width is larger than max_image_size, it is resized
- // to square resolution of max_image_size.
- // * If both height *AND* width are smaller than min_image_size, it is
- // resized to square resolution of min_image_size.
- // * Else, if the input image's resolution is not square, it is resized to
- // square resolution of the larger side.
- // Image resizing is useful when we want to ensure that the input to the image
- // pyramid has a reasonable number of pixels, which could have large impact in
- // terms of image matching performance.
- // When using local features, note that the feature locations and scales will
- // be consistent with the original image input size.
- // Note that when both max_image_size and min_image_size are specified
- // (which is a valid and legit use case), as long as max_image_size >=
- // min_image_size, there's no conflicting scenario (i.e. never triggers both
- // enlarging / shrinking). Bilinear interpolation is used.
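- // Illustrative example (rounding behavior is an assumption, not part of the
- // contract): with max_image_size = 1024, min_image_size = -1 and
- // use_square_images = false, a 2000x800 input is resized so that its larger
- // side becomes 1024, i.e. to roughly 1024x410, preserving the aspect ratio.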
- optional int32 max_image_size = 4 [default = -1];
- optional int32 min_image_size = 5 [default = -1];
- optional bool use_square_images = 6 [default = false];
-}
diff --git a/research/delf/delf/protos/feature.proto b/research/delf/delf/protos/feature.proto
deleted file mode 100644
index 64c342fe2c3..00000000000
--- a/research/delf/delf/protos/feature.proto
+++ /dev/null
@@ -1,22 +0,0 @@
-// Protocol buffer for serializing the DELF feature information.
-
-syntax = "proto2";
-
-package delf.protos;
-
-import "delf/protos/datum.proto";
-
-// DelfFeature stores a single local feature: its descriptor (a DatumProto)
-// together with its location, scale, orientation and strength.
-message DelfFeature {
- optional DatumProto descriptor = 1;
- optional float x = 2;
- optional float y = 3;
- optional float scale = 4;
- optional float orientation = 5;
- optional float strength = 6;
-}
-
-message DelfFeatures {
- repeated DelfFeature feature = 1;
-}
diff --git a/research/delf/delf/python/__init__.py b/research/delf/delf/python/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/delf/delf/python/box_io.py b/research/delf/delf/python/box_io.py
deleted file mode 100644
index 8b0f0d2c973..00000000000
--- a/research/delf/delf/python/box_io.py
+++ /dev/null
@@ -1,151 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Python interface for Boxes proto.
-
-Supports reading and writing of Boxes from/to numpy arrays and files.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import tensorflow as tf
-
-from delf import box_pb2
-
-
-def ArraysToBoxes(boxes, scores, class_indices):
- """Converts `boxes` to Boxes proto.
-
- Args:
- boxes: [N, 4] float array denoting bounding box coordinates, in format [top,
- left, bottom, right].
- scores: [N] float array with detection scores.
- class_indices: [N] int array with class indices.
-
- Returns:
- boxes_proto: Boxes object.
- """
- num_boxes = len(scores)
- assert num_boxes == boxes.shape[0]
- assert num_boxes == len(class_indices)
-
- boxes_proto = box_pb2.Boxes()
- for i in range(num_boxes):
- boxes_proto.box.add(
- ymin=boxes[i, 0],
- xmin=boxes[i, 1],
- ymax=boxes[i, 2],
- xmax=boxes[i, 3],
- score=scores[i],
- class_index=class_indices[i])
-
- return boxes_proto
-
-
-def BoxesToArrays(boxes_proto):
- """Converts data saved in Boxes proto to numpy arrays.
-
- If there are no boxes, the function returns three empty arrays.
-
- Args:
- boxes_proto: Boxes proto object.
-
- Returns:
- boxes: [N, 4] float array denoting bounding box coordinates, in format [top,
- left, bottom, right].
- scores: [N] float array with detection scores.
- class_indices: [N] int array with class indices.
- """
- num_boxes = len(boxes_proto.box)
- if num_boxes == 0:
- return np.array([]), np.array([]), np.array([])
-
- boxes = np.zeros([num_boxes, 4])
- scores = np.zeros([num_boxes])
- class_indices = np.zeros([num_boxes])
-
- for i in range(num_boxes):
- box_proto = boxes_proto.box[i]
- boxes[i] = [box_proto.ymin, box_proto.xmin, box_proto.ymax, box_proto.xmax]
- scores[i] = box_proto.score
- class_indices[i] = box_proto.class_index
-
- return boxes, scores, class_indices
-
-
-def SerializeToString(boxes, scores, class_indices):
- """Converts numpy arrays to serialized Boxes.
-
- Args:
- boxes: [N, 4] float array denoting bounding box coordinates, in format [top,
- left, bottom, right].
- scores: [N] float array with detection scores.
- class_indices: [N] int array with class indices.
-
- Returns:
- Serialized Boxes string.
- """
- boxes_proto = ArraysToBoxes(boxes, scores, class_indices)
- return boxes_proto.SerializeToString()
-
-
-def ParseFromString(string):
- """Converts serialized Boxes proto string to numpy arrays.
-
- Args:
- string: Serialized Boxes string.
-
- Returns:
- boxes: [N, 4] float array denoting bounding box coordinates, in format [top,
- left, bottom, right].
- scores: [N] float array with detection scores.
- class_indices: [N] int array with class indices.
- """
- boxes_proto = box_pb2.Boxes()
- boxes_proto.ParseFromString(string)
- return BoxesToArrays(boxes_proto)
-
-
-def ReadFromFile(file_path):
- """Helper function to load data from a Boxes proto format in a file.
-
- Args:
- file_path: Path to file containing data.
-
- Returns:
- boxes: [N, 4] float array denoting bounding box coordinates, in format [top,
- left, bottom, right].
- scores: [N] float array with detection scores.
- class_indices: [N] int array with class indices.
- """
- with tf.io.gfile.GFile(file_path, 'rb') as f:
- return ParseFromString(f.read())
-
-
-def WriteToFile(file_path, boxes, scores, class_indices):
- """Helper function to write data to a file in Boxes proto format.
-
- Args:
- file_path: Path to file that will be written.
- boxes: [N, 4] float array denoting bounding box coordinates, in format [top,
- left, bottom, right].
- scores: [N] float array with detection scores.
- class_indices: [N] int array with class indices.
- """
- serialized_data = SerializeToString(boxes, scores, class_indices)
- with tf.io.gfile.GFile(file_path, 'w') as f:
- f.write(serialized_data)
diff --git a/research/delf/delf/python/box_io_test.py b/research/delf/delf/python/box_io_test.py
deleted file mode 100644
index c659185daee..00000000000
--- a/research/delf/delf/python/box_io_test.py
+++ /dev/null
@@ -1,82 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for box_io, the python interface of Boxes proto."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-import numpy as np
-import tensorflow as tf
-
-from delf import box_io
-
-FLAGS = flags.FLAGS
-
-
-class BoxesIoTest(tf.test.TestCase):
-
- def _create_data(self):
- """Creates data to be used in tests.
-
- Returns:
- boxes: [N, 4] float array denoting bounding box coordinates, in format
- [top, left, bottom, right].
- scores: [N] float array with detection scores.
- class_indices: [N] int array with class indices.
- """
- boxes = np.arange(24, dtype=np.float32).reshape(6, 4)
- scores = np.arange(6, dtype=np.float32)
- class_indices = np.arange(6, dtype=np.int32)
-
- return boxes, scores, class_indices
-
- def testConversionAndBack(self):
- boxes, scores, class_indices = self._create_data()
-
- serialized = box_io.SerializeToString(boxes, scores, class_indices)
- parsed_data = box_io.ParseFromString(serialized)
-
- self.assertAllEqual(boxes, parsed_data[0])
- self.assertAllEqual(scores, parsed_data[1])
- self.assertAllEqual(class_indices, parsed_data[2])
-
- def testWriteAndReadToFile(self):
- boxes, scores, class_indices = self._create_data()
-
- filename = os.path.join(FLAGS.test_tmpdir, 'test.boxes')
- box_io.WriteToFile(filename, boxes, scores, class_indices)
- data_read = box_io.ReadFromFile(filename)
-
- self.assertAllEqual(boxes, data_read[0])
- self.assertAllEqual(scores, data_read[1])
- self.assertAllEqual(class_indices, data_read[2])
-
- def testWriteAndReadToFileEmptyFile(self):
- filename = os.path.join(FLAGS.test_tmpdir, 'test.box')
- box_io.WriteToFile(filename, np.array([]), np.array([]), np.array([]))
- data_read = box_io.ReadFromFile(filename)
-
- self.assertAllEqual(np.array([]), data_read[0])
- self.assertAllEqual(np.array([]), data_read[1])
- self.assertAllEqual(np.array([]), data_read[2])
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/datasets/__init__.py b/research/delf/delf/python/datasets/__init__.py
deleted file mode 100644
index 8b137891791..00000000000
--- a/research/delf/delf/python/datasets/__init__.py
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/research/delf/delf/python/datasets/generic_dataset.py b/research/delf/delf/python/datasets/generic_dataset.py
deleted file mode 100644
index a2e6d8f1e3c..00000000000
--- a/research/delf/delf/python/datasets/generic_dataset.py
+++ /dev/null
@@ -1,81 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Functions for generic image dataset creation."""
-
-import os
-
-from delf.python.datasets import utils
-
-
-class ImagesFromList():
- """A generic data loader that loads images from a list.
-
- Supports images of different sizes.
- """
-
- def __init__(self, root, image_paths, imsize=None, bounding_boxes=None,
- loader=utils.default_loader):
- """ImagesFromList object initialization.
-
- Args:
- root: String, root directory path.
- image_paths: List, relative image paths as strings.
- imsize: Integer, defines the maximum size of longer image side.
- bounding_boxes: List of (x1,y1,x2,y2) tuples to crop the query images.
- loader: Callable, a function to load an image given its path.
-
- Raises:
- ValueError: Raised if `image_paths` list is empty.
- """
- # List of the full image filenames.
- images_filenames = [os.path.join(root, image_path) for image_path in
- image_paths]
-
- if not images_filenames:
- raise ValueError("Dataset contains 0 images.")
-
- self.root = root
- self.images = image_paths
- self.imsize = imsize
- self.images_filenames = images_filenames
- self.bounding_boxes = bounding_boxes
- self.loader = loader
-
- def __getitem__(self, index):
- """Called to load an image at the given `index`.
-
- Args:
- index: Integer, image index.
-
- Returns:
- image: Tensor, loaded image.
- """
- path = self.images_filenames[index]
-
- if self.bounding_boxes is not None:
- img = self.loader(path, self.imsize, self.bounding_boxes[index])
- else:
- img = self.loader(path, self.imsize)
-
- return img
-
- def __len__(self):
- """Implements the built-in function len().
-
- Returns:
- len: Number of images in the dataset.
- """
- return len(self.images_filenames)
diff --git a/research/delf/delf/python/datasets/generic_dataset_test.py b/research/delf/delf/python/datasets/generic_dataset_test.py
deleted file mode 100644
index 93c5de9598f..00000000000
--- a/research/delf/delf/python/datasets/generic_dataset_test.py
+++ /dev/null
@@ -1,60 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for generic dataset."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-
-import numpy as np
-from PIL import Image
-import tensorflow as tf
-
-from delf.python.datasets import generic_dataset
-
-FLAGS = flags.FLAGS
-
-
-class GenericDatasetTest(tf.test.TestCase):
- """Test functions for generic dataset."""
-
- def testGenericDataset(self):
- """Tests loading dummy images from list."""
- # Number of images to be created.
- n = 2
- image_names = []
-
- # Create and save `n` dummy images.
- for i in range(n):
- dummy_image = np.random.rand(1024, 750, 3) * 255
- img_out = Image.fromarray(dummy_image.astype('uint8')).convert('RGB')
- filename = os.path.join(FLAGS.test_tmpdir,
- 'test_image_{}.jpg'.format(i))
- img_out.save(filename)
- image_names.append('test_image_{}.jpg'.format(i))
-
- data = generic_dataset.ImagesFromList(root=FLAGS.test_tmpdir,
- image_paths=image_names,
- imsize=1024)
- self.assertLen(data, n)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/datasets/google_landmarks_dataset/README.md b/research/delf/delf/python/datasets/google_landmarks_dataset/README.md
deleted file mode 100644
index 4f34b59aaa4..00000000000
--- a/research/delf/delf/python/datasets/google_landmarks_dataset/README.md
+++ /dev/null
@@ -1,123 +0,0 @@
-## GLDv2 code/models
-
-[![Paper](http://img.shields.io/badge/paper-arXiv.2004.01804-B3181B.svg)](https://arxiv.org/abs/2004.01804)
-
-These instructions can be used to reproduce results from the
-[GLDv2 paper](https://arxiv.org/abs/2004.01804). We present here results on the
-Revisited Oxford/Paris datasets since they are smaller and quicker to
-reproduce -- but note that a very similar procedure can be used to obtain
-results on the GLDv2 retrieval or recognition datasets.
-
-Note that this directory also contains code to compute GLDv2 metrics: see
-`compute_retrieval_metrics.py`, `compute_recognition_metrics.py` and associated
-file reading / metric computation modules.
-
-For more details on the dataset, please refer to its
-[website](https://github.com/cvdfoundation/google-landmark).
-
-### Install DELF library
-
-To be able to use this code, please follow
-[these instructions](../../../../INSTALL_INSTRUCTIONS.md) to properly install the
-DELF library.
-
-### Download Revisited Oxford/Paris datasets
-
-```bash
-mkdir -p ~/revisitop/data && cd ~/revisitop/data
-
-# Oxford dataset.
-wget http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/oxbuild_images.tgz
-mkdir oxford5k_images
-tar -xvzf oxbuild_images.tgz -C oxford5k_images/
-
-# Paris dataset. Download and move all images to the same directory.
-wget http://www.robots.ox.ac.uk/~vgg/data/parisbuildings/paris_1.tgz
-wget http://www.robots.ox.ac.uk/~vgg/data/parisbuildings/paris_2.tgz
-mkdir paris6k_images_tmp
-tar -xvzf paris_1.tgz -C paris6k_images_tmp/
-tar -xvzf paris_2.tgz -C paris6k_images_tmp/
-mkdir paris6k_images
-mv paris6k_images_tmp/paris/*/*.jpg paris6k_images/
-
-# Revisited annotations.
-wget http://cmp.felk.cvut.cz/revisitop/data/datasets/roxford5k/gnd_roxford5k.mat
-wget http://cmp.felk.cvut.cz/revisitop/data/datasets/rparis6k/gnd_rparis6k.mat
-```
-
-### Download model
-
-```bash
-# From models/research/delf/delf/python/datasets/google_landmarks_dataset
-mkdir parameters && cd parameters
-
-# RN101-ArcFace model trained on GLDv2-clean.
-wget https://storage.googleapis.com/delf/rn101_af_gldv2clean_20200814.tar.gz
-tar -xvzf rn101_af_gldv2clean_20200814.tar.gz
-```
-
-### Feature extraction
-
-We present here commands for extraction on `roxford5k`. To extract on `rparis6k`
-instead, please edit the arguments accordingly (especially the
-`dataset_file_path` argument).
-
-#### Query feature extraction
-
-In the Revisited Oxford/Paris experimental protocol, query images must be
-cropped before feature extraction (this is done in the `extract_features`
-script, when setting `image_set=query`). Note that this is specific to these
-datasets, and not required for the GLDv2 retrieval/recognition datasets.
-
-Run query feature extraction as follows:
-
-```bash
-# From models/research/delf/delf/python/datasets/google_landmarks_dataset
-python3 ../../delg/extract_features.py \
- --delf_config_path rn101_af_gldv2clean_config.pbtxt \
- --dataset_file_path ~/revisitop/data/gnd_roxford5k.mat \
- --images_dir ~/revisitop/data/oxford5k_images \
- --image_set query \
- --output_features_dir ~/revisitop/data/oxford5k_features/query
-```
-
-#### Index feature extraction
-
-Run index feature extraction as follows:
-
-```bash
-# From models/research/delf/delf/python/datasets/google_landmarks_dataset
-python3 ../../delg/extract_features.py \
- --delf_config_path rn101_af_gldv2clean_config.pbtxt \
- --dataset_file_path ~/revisitop/data/gnd_roxford5k.mat \
- --images_dir ~/revisitop/data/oxford5k_images \
- --image_set index \
- --output_features_dir ~/revisitop/data/oxford5k_features/index
-```
-
-### Perform retrieval
-
-To run retrieval on `roxford5k`, the following command can be used:
-
-```bash
-# From models/research/delf/delf/python/datasets/google_landmarks_dataset
-python3 ../../delg/perform_retrieval.py \
- --dataset_file_path ~/revisitop/data/gnd_roxford5k.mat \
- --query_features_dir ~/revisitop/data/oxford5k_features/query \
- --index_features_dir ~/revisitop/data/oxford5k_features/index \
- --output_dir ~/revisitop/results/oxford5k
-```
-
-A file named `metrics.txt` will be written to the path given in
-`output_dir`. The contents should look approximately like:
-
-```
-hard
- mAP=55.54
- mP@k[ 1 5 10] [88.57 80.86 70.14]
- mR@k[ 1 5 10] [19.46 33.65 42.44]
-medium
- mAP=76.23
- mP@k[ 1 5 10] [95.71 92.86 90.43]
- mR@k[ 1 5 10] [10.17 25.96 35.29]
-```
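-
-As noted above, a very similar procedure works for `rparis6k`. Below is a
-minimal sketch of the corresponding retrieval command, assuming query/index
-features were extracted analogously (using `gnd_rparis6k.mat` and the
-`paris6k_images` directory from the download step) into the feature
-directories shown here, which are illustrative:
-
-```bash
-# From models/research/delf/delf/python/datasets/google_landmarks_dataset
-python3 ../../delg/perform_retrieval.py \
- --dataset_file_path ~/revisitop/data/gnd_rparis6k.mat \
- --query_features_dir ~/revisitop/data/paris6k_features/query \
- --index_features_dir ~/revisitop/data/paris6k_features/index \
- --output_dir ~/revisitop/results/paris6k
-```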
diff --git a/research/delf/delf/python/datasets/google_landmarks_dataset/__init__.py b/research/delf/delf/python/datasets/google_landmarks_dataset/__init__.py
deleted file mode 100644
index 4e24e0fb7c5..00000000000
--- a/research/delf/delf/python/datasets/google_landmarks_dataset/__init__.py
+++ /dev/null
@@ -1,22 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Module exposing Google Landmarks dataset for training."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-# pylint: disable=unused-import
-from delf.python.datasets.google_landmarks_dataset import googlelandmarks
-# pylint: enable=unused-import
diff --git a/research/delf/delf/python/datasets/google_landmarks_dataset/compute_recognition_metrics.py b/research/delf/delf/python/datasets/google_landmarks_dataset/compute_recognition_metrics.py
deleted file mode 100644
index 4c241ed5380..00000000000
--- a/research/delf/delf/python/datasets/google_landmarks_dataset/compute_recognition_metrics.py
+++ /dev/null
@@ -1,99 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Computes metrics for Google Landmarks Recognition dataset predictions.
-
-Metrics are written to stdout.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import sys
-
-from absl import app
-from delf.python.datasets.google_landmarks_dataset import dataset_file_io
-from delf.python.datasets.google_landmarks_dataset import metrics
-
-cmd_args = None
-
-
-def main(argv):
- if len(argv) > 1:
- raise RuntimeError('Too many command-line arguments.')
-
- # Read solution.
- print('Reading solution...')
- public_solution, private_solution, ignored_ids = dataset_file_io.ReadSolution(
- cmd_args.solution_path, dataset_file_io.RECOGNITION_TASK_ID)
- print('done!')
-
- # Read predictions.
- print('Reading predictions...')
- public_predictions, private_predictions = dataset_file_io.ReadPredictions(
- cmd_args.predictions_path, set(public_solution.keys()),
- set(private_solution.keys()), set(ignored_ids),
- dataset_file_io.RECOGNITION_TASK_ID)
- print('done!')
-
- # Global Average Precision.
- print('**********************************************')
- print('(Public) Global Average Precision: %f' %
- metrics.GlobalAveragePrecision(public_predictions, public_solution))
- print('(Private) Global Average Precision: %f' %
- metrics.GlobalAveragePrecision(private_predictions, private_solution))
-
- # Global Average Precision ignoring non-landmark queries.
- print('**********************************************')
- print(
- '(Public) Global Average Precision ignoring non-landmark queries: %f' %
- metrics.GlobalAveragePrecision(
- public_predictions, public_solution, ignore_non_gt_test_images=True))
- print(
- '(Private) Global Average Precision ignoring non-landmark queries: %f' %
- metrics.GlobalAveragePrecision(
- private_predictions, private_solution,
- ignore_non_gt_test_images=True))
-
- # Top-1 accuracy.
- print('**********************************************')
- print('(Public) Top-1 accuracy: %.2f' %
- (100.0 * metrics.Top1Accuracy(public_predictions, public_solution)))
- print('(Private) Top-1 accuracy: %.2f' %
- (100.0 * metrics.Top1Accuracy(private_predictions, private_solution)))
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser()
- parser.register('type', 'bool', lambda v: v.lower() == 'true')
- parser.add_argument(
- '--predictions_path',
- type=str,
- default='/tmp/predictions.csv',
- help="""
- Path to CSV predictions file, formatted with columns 'id,landmarks' (the
- file should include a header).
- """)
- parser.add_argument(
- '--solution_path',
- type=str,
- default='/tmp/solution.csv',
- help="""
- Path to CSV solution file, formatted with columns 'id,landmarks,Usage'
- (the file should include a header).
- """)
- cmd_args, unparsed = parser.parse_known_args()
- app.run(main=main, argv=[sys.argv[0]] + unparsed)
diff --git a/research/delf/delf/python/datasets/google_landmarks_dataset/compute_retrieval_metrics.py b/research/delf/delf/python/datasets/google_landmarks_dataset/compute_retrieval_metrics.py
deleted file mode 100644
index 231c320168c..00000000000
--- a/research/delf/delf/python/datasets/google_landmarks_dataset/compute_retrieval_metrics.py
+++ /dev/null
@@ -1,106 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Computes metrics for Google Landmarks Retrieval dataset predictions.
-
-Metrics are written to stdout.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import sys
-
-from absl import app
-from delf.python.datasets.google_landmarks_dataset import dataset_file_io
-from delf.python.datasets.google_landmarks_dataset import metrics
-
-cmd_args = None
-
-
-def main(argv):
- if len(argv) > 1:
- raise RuntimeError('Too many command-line arguments.')
-
- # Read solution.
- print('Reading solution...')
- public_solution, private_solution, ignored_ids = dataset_file_io.ReadSolution(
- cmd_args.solution_path, dataset_file_io.RETRIEVAL_TASK_ID)
- print('done!')
-
- # Read predictions.
- print('Reading predictions...')
- public_predictions, private_predictions = dataset_file_io.ReadPredictions(
- cmd_args.predictions_path, set(public_solution.keys()),
- set(private_solution.keys()), set(ignored_ids),
- dataset_file_io.RETRIEVAL_TASK_ID)
- print('done!')
-
- # Mean average precision.
- print('**********************************************')
- print('(Public) Mean Average Precision: %f' %
- metrics.MeanAveragePrecision(public_predictions, public_solution))
- print('(Private) Mean Average Precision: %f' %
- metrics.MeanAveragePrecision(private_predictions, private_solution))
-
- # Mean precision@k.
- print('**********************************************')
- public_precisions = 100.0 * metrics.MeanPrecisions(public_predictions,
- public_solution)
- private_precisions = 100.0 * metrics.MeanPrecisions(private_predictions,
- private_solution)
- print('(Public) Mean precisions: P@1: %.2f, P@5: %.2f, P@10: %.2f, '
- 'P@50: %.2f, P@100: %.2f' %
- (public_precisions[0], public_precisions[4], public_precisions[9],
- public_precisions[49], public_precisions[99]))
- print('(Private) Mean precisions: P@1: %.2f, P@5: %.2f, P@10: %.2f, '
- 'P@50: %.2f, P@100: %.2f' %
- (private_precisions[0], private_precisions[4], private_precisions[9],
- private_precisions[49], private_precisions[99]))
-
- # Mean/median position of first correct.
- print('**********************************************')
- public_mean_position, public_median_position = metrics.MeanMedianPosition(
- public_predictions, public_solution)
- private_mean_position, private_median_position = metrics.MeanMedianPosition(
- private_predictions, private_solution)
- print('(Public) Mean position: %.2f, median position: %.2f' %
- (public_mean_position, public_median_position))
- print('(Private) Mean position: %.2f, median position: %.2f' %
- (private_mean_position, private_median_position))
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser()
- parser.register('type', 'bool', lambda v: v.lower() == 'true')
- parser.add_argument(
- '--predictions_path',
- type=str,
- default='/tmp/predictions.csv',
- help="""
- Path to CSV predictions file, formatted with columns 'id,images' (the
- file should include a header).
- """)
- parser.add_argument(
- '--solution_path',
- type=str,
- default='/tmp/solution.csv',
- help="""
- Path to CSV solution file, formatted with columns 'id,images,Usage'
- (the file should include a header).
- """)
- cmd_args, unparsed = parser.parse_known_args()
- app.run(main=main, argv=[sys.argv[0]] + unparsed)
diff --git a/research/delf/delf/python/datasets/google_landmarks_dataset/dataset_file_io.py b/research/delf/delf/python/datasets/google_landmarks_dataset/dataset_file_io.py
deleted file mode 100644
index 93f2785d78f..00000000000
--- a/research/delf/delf/python/datasets/google_landmarks_dataset/dataset_file_io.py
+++ /dev/null
@@ -1,159 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""IO module for files from Landmark recognition/retrieval challenges."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import csv
-
-import tensorflow as tf
-
-RECOGNITION_TASK_ID = 'recognition'
-RETRIEVAL_TASK_ID = 'retrieval'
-
-
-def ReadSolution(file_path, task):
- """Reads solution from file, for a given task.
-
- Args:
- file_path: Path to CSV file with solution. File contains a header.
- task: Type of challenge task. Supported values: 'recognition', 'retrieval'.
-
- Returns:
- public_solution: Dict mapping test image ID to list of ground-truth IDs, for
- the Public subset of test images. If `task` == 'recognition', the IDs are
- integers corresponding to landmark IDs. If `task` == 'retrieval', the IDs
- are strings corresponding to index image IDs.
- private_solution: Same as `public_solution`, but for the private subset of
- test images.
- ignored_ids: List of test images that are ignored in scoring.
-
- Raises:
- ValueError: If Usage field is not Public, Private or Ignored; or if `task`
- is not supported.
- """
- public_solution = {}
- private_solution = {}
- ignored_ids = []
- with tf.io.gfile.GFile(file_path, 'r') as csv_file:
- reader = csv.reader(csv_file)
- next(reader, None) # Skip header.
- for row in reader:
- test_id = row[0]
- if row[2] == 'Ignored':
- ignored_ids.append(test_id)
- else:
- ground_truth_ids = []
- if task == RECOGNITION_TASK_ID:
- if row[1]:
- for landmark_id in row[1].split(' '):
- ground_truth_ids.append(int(landmark_id))
- elif task == RETRIEVAL_TASK_ID:
- for image_id in row[1].split(' '):
- ground_truth_ids.append(image_id)
- else:
- raise ValueError('Unrecognized task: %s' % task)
-
- if row[2] == 'Public':
- public_solution[test_id] = ground_truth_ids
- elif row[2] == 'Private':
- private_solution[test_id] = ground_truth_ids
- else:
- raise ValueError('Test image %s has unrecognized Usage tag %s' %
- (row[0], row[2]))
-
- return public_solution, private_solution, ignored_ids
-
-
-def ReadPredictions(file_path, public_ids, private_ids, ignored_ids, task):
- """Reads predictions from file, for a given task.
-
- Args:
- file_path: Path to CSV file with predictions. File contains a header.
- public_ids: Set (or list) of test image IDs in Public subset of test images.
- private_ids: Same as `public_ids`, but for the private subset of test
- images.
- ignored_ids: Set (or list) of test image IDs that are ignored in scoring and
- are associated to no ground-truth.
- task: Type of challenge task. Supported values: 'recognition', 'retrieval'.
-
- Returns:
- public_predictions: Dict mapping test image ID to prediction, for the Public
- subset of test images. If `task` == 'recognition', the prediction is a
- dict with keys 'class' (integer) and 'score' (float). If `task` ==
- 'retrieval', the prediction is a list of strings corresponding to index
- image IDs.
- private_predictions: Same as `public_predictions`, but for the private
- subset of test images.
-
- Raises:
- ValueError:
- - If test image ID is unrecognized/repeated;
- - If `task` is not supported;
- - If prediction is malformed.
- """
- public_predictions = {}
- private_predictions = {}
- with tf.io.gfile.GFile(file_path, 'r') as csv_file:
- reader = csv.reader(csv_file)
- next(reader, None) # Skip header.
- for row in reader:
- # Skip row if empty.
- if not row:
- continue
-
- test_id = row[0]
-
- # Makes sure this query has not yet been seen.
- if test_id in public_predictions:
- raise ValueError('Test image %s is repeated.' % test_id)
- if test_id in private_predictions:
- raise ValueError('Test image %s is repeated.' % test_id)
-
- # If ignored, skip it.
- if test_id in ignored_ids:
- continue
-
- # Only parse result if there is a prediction.
- if row[1]:
- prediction_split = row[1].split(' ')
- # Remove empty spaces at end (if any).
- if not prediction_split[-1]:
- prediction_split = prediction_split[:-1]
-
- if task == RECOGNITION_TASK_ID:
- if len(prediction_split) != 2:
- raise ValueError('Prediction is malformed: there should only be 2 '
- 'elements in second column, but found %d for test '
- 'image %s' % (len(prediction_split), test_id))
-
- landmark_id = int(prediction_split[0])
- score = float(prediction_split[1])
- prediction_entry = {'class': landmark_id, 'score': score}
- elif task == RETRIEVAL_TASK_ID:
- prediction_entry = prediction_split
- else:
- raise ValueError('Unrecognized task: %s' % task)
-
- if test_id in public_ids:
- public_predictions[test_id] = prediction_entry
- elif test_id in private_ids:
- private_predictions[test_id] = prediction_entry
- else:
- raise ValueError('test_id %s is unrecognized' % test_id)
-
- return public_predictions, private_predictions
diff --git a/research/delf/delf/python/datasets/google_landmarks_dataset/dataset_file_io_test.py b/research/delf/delf/python/datasets/google_landmarks_dataset/dataset_file_io_test.py
deleted file mode 100644
index 8bd2ac5e0f3..00000000000
--- a/research/delf/delf/python/datasets/google_landmarks_dataset/dataset_file_io_test.py
+++ /dev/null
@@ -1,166 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for dataset file IO module."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-import tensorflow as tf
-
-from delf.python.datasets.google_landmarks_dataset import dataset_file_io
-
-FLAGS = flags.FLAGS
-
-
-class DatasetFileIoTest(tf.test.TestCase):
-
- def testReadRecognitionSolutionWorks(self):
- # Define inputs.
- file_path = os.path.join(FLAGS.test_tmpdir, 'recognition_solution.csv')
- with tf.io.gfile.GFile(file_path, 'w') as f:
- f.write('id,landmarks,Usage\n')
- f.write('0123456789abcdef,0 12,Public\n')
- f.write('0223456789abcdef,,Public\n')
- f.write('0323456789abcdef,100,Ignored\n')
- f.write('0423456789abcdef,1,Private\n')
- f.write('0523456789abcdef,,Ignored\n')
-
- # Run tested function.
- (public_solution, private_solution,
- ignored_ids) = dataset_file_io.ReadSolution(
- file_path, dataset_file_io.RECOGNITION_TASK_ID)
-
- # Define expected results.
- expected_public_solution = {
- '0123456789abcdef': [0, 12],
- '0223456789abcdef': []
- }
- expected_private_solution = {
- '0423456789abcdef': [1],
- }
- expected_ignored_ids = ['0323456789abcdef', '0523456789abcdef']
-
- # Compare actual and expected results.
- self.assertEqual(public_solution, expected_public_solution)
- self.assertEqual(private_solution, expected_private_solution)
- self.assertEqual(ignored_ids, expected_ignored_ids)
-
- def testReadRetrievalSolutionWorks(self):
- # Define inputs.
- file_path = os.path.join(FLAGS.test_tmpdir, 'retrieval_solution.csv')
- with tf.io.gfile.GFile(file_path, 'w') as f:
- f.write('id,images,Usage\n')
- f.write('0123456789abcdef,None,Ignored\n')
- f.write('0223456789abcdef,fedcba9876543210 fedcba9876543200,Public\n')
- f.write('0323456789abcdef,fedcba9876543200,Private\n')
- f.write('0423456789abcdef,fedcba9876543220,Private\n')
- f.write('0523456789abcdef,None,Ignored\n')
-
- # Run tested function.
- (public_solution, private_solution,
- ignored_ids) = dataset_file_io.ReadSolution(
- file_path, dataset_file_io.RETRIEVAL_TASK_ID)
-
- # Define expected results.
- expected_public_solution = {
- '0223456789abcdef': ['fedcba9876543210', 'fedcba9876543200'],
- }
- expected_private_solution = {
- '0323456789abcdef': ['fedcba9876543200'],
- '0423456789abcdef': ['fedcba9876543220'],
- }
- expected_ignored_ids = ['0123456789abcdef', '0523456789abcdef']
-
- # Compare actual and expected results.
- self.assertEqual(public_solution, expected_public_solution)
- self.assertEqual(private_solution, expected_private_solution)
- self.assertEqual(ignored_ids, expected_ignored_ids)
-
- def testReadRecognitionPredictionsWorks(self):
- # Define inputs.
- file_path = os.path.join(FLAGS.test_tmpdir, 'recognition_predictions.csv')
- with tf.io.gfile.GFile(file_path, 'w') as f:
- f.write('id,landmarks\n')
- f.write('0123456789abcdef,12 0.1 \n')
- f.write('0423456789abcdef,0 19.0\n')
- f.write('0223456789abcdef,\n')
- f.write('\n')
- f.write('0523456789abcdef,14 0.01\n')
- public_ids = ['0123456789abcdef', '0223456789abcdef']
- private_ids = ['0423456789abcdef']
- ignored_ids = ['0323456789abcdef', '0523456789abcdef']
-
- # Run tested function.
- public_predictions, private_predictions = dataset_file_io.ReadPredictions(
- file_path, public_ids, private_ids, ignored_ids,
- dataset_file_io.RECOGNITION_TASK_ID)
-
- # Define expected results.
- expected_public_predictions = {
- '0123456789abcdef': {
- 'class': 12,
- 'score': 0.1
- }
- }
- expected_private_predictions = {
- '0423456789abcdef': {
- 'class': 0,
- 'score': 19.0
- }
- }
-
- # Compare actual and expected results.
- self.assertEqual(public_predictions, expected_public_predictions)
- self.assertEqual(private_predictions, expected_private_predictions)
-
- def testReadRetrievalPredictionsWorks(self):
- # Define inputs.
- file_path = os.path.join(FLAGS.test_tmpdir, 'retrieval_predictions.csv')
- with tf.io.gfile.GFile(file_path, 'w') as f:
- f.write('id,images\n')
- f.write('0123456789abcdef,fedcba9876543250 \n')
- f.write('0423456789abcdef,fedcba9876543260\n')
- f.write('0223456789abcdef,fedcba9876543210 fedcba9876543200 '
- 'fedcba9876543220\n')
- f.write('\n')
- f.write('0523456789abcdef,\n')
- public_ids = ['0223456789abcdef']
- private_ids = ['0323456789abcdef', '0423456789abcdef']
- ignored_ids = ['0123456789abcdef', '0523456789abcdef']
-
- # Run tested function.
- public_predictions, private_predictions = dataset_file_io.ReadPredictions(
- file_path, public_ids, private_ids, ignored_ids,
- dataset_file_io.RETRIEVAL_TASK_ID)
-
- # Define expected results.
- expected_public_predictions = {
- '0223456789abcdef': [
- 'fedcba9876543210', 'fedcba9876543200', 'fedcba9876543220'
- ]
- }
- expected_private_predictions = {'0423456789abcdef': ['fedcba9876543260']}
-
- # Compare actual and expected results.
- self.assertEqual(public_predictions, expected_public_predictions)
- self.assertEqual(private_predictions, expected_private_predictions)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/datasets/google_landmarks_dataset/googlelandmarks.py b/research/delf/delf/python/datasets/google_landmarks_dataset/googlelandmarks.py
deleted file mode 100644
index b6122f5c79c..00000000000
--- a/research/delf/delf/python/datasets/google_landmarks_dataset/googlelandmarks.py
+++ /dev/null
@@ -1,186 +0,0 @@
-# Lint as: python3
-# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Google Landmarks Dataset(GLD).
-
-Placeholder for Google Landmarks dataset.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import functools
-
-import tensorflow as tf
-
-
-class _GoogleLandmarksInfo(object):
- """Metadata about the Google Landmarks dataset."""
- num_classes = {'gld_v1': 14951, 'gld_v2': 203094, 'gld_v2_clean': 81313}
-
-
-class _DataAugmentationParams(object):
- """Default parameters for augmentation."""
- # The following are used for training.
- min_object_covered = 0.1
- aspect_ratio_range_min = 3. / 4
- aspect_ratio_range_max = 4. / 3
- area_range_min = 0.08
- area_range_max = 1.0
- max_attempts = 100
- update_labels = False
- # 'central_fraction' is used for central crop in inference.
- central_fraction = 0.875
-
- random_reflection = False
-
-
-def NormalizeImages(images, pixel_value_scale=0.5, pixel_value_offset=0.5):
- """Normalize pixel values in image.
-
- Output is computed as
- normalized_images = (images - pixel_value_offset) / pixel_value_scale.
-
- Args:
- images: `Tensor`, images to normalize.
- pixel_value_scale: float, scale.
- pixel_value_offset: float, offset.
-
- Returns:
- normalized_images: `Tensor`, normalized images.
- """
- images = tf.cast(images, tf.float32)
- normalized_images = tf.math.divide(
- tf.subtract(images, pixel_value_offset), pixel_value_scale)
- return normalized_images
-
-
-def _ImageNetCrop(image, image_size):
- """Imagenet-style crop with random bbox and aspect ratio.
-
- Args:
- image: a `Tensor`, image to crop.
- image_size: an `int`. The image size for the decoded image, on each side.
-
- Returns:
- cropped_image: `Tensor`, cropped image.
- """
-
- params = _DataAugmentationParams()
- bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4])
- (bbox_begin, bbox_size, _) = tf.image.sample_distorted_bounding_box(
- tf.shape(image),
- bounding_boxes=bbox,
- min_object_covered=params.min_object_covered,
- aspect_ratio_range=(params.aspect_ratio_range_min,
- params.aspect_ratio_range_max),
- area_range=(params.area_range_min, params.area_range_max),
- max_attempts=params.max_attempts,
- use_image_if_no_bounding_boxes=True)
- cropped_image = tf.slice(image, bbox_begin, bbox_size)
- cropped_image.set_shape([None, None, 3])
-
- cropped_image = tf.image.resize(
- cropped_image, [image_size, image_size], method='area')
- if params.random_reflection:
- cropped_image = tf.image.random_flip_left_right(cropped_image)
-
- return cropped_image
-
-
-def _ParseFunction(example, name_to_features, image_size, augmentation):
- """Parse a single TFExample to get the image and label and process the image.
-
- Args:
- example: a `TFExample`.
- name_to_features: a `dict`. The mapping from feature names to its type.
- image_size: an `int`. The image size for the decoded image, on each side.
- augmentation: a `boolean`. True if the image will be augmented.
-
- Returns:
- image: a `Tensor`. The processed image.
- label: a `Tensor`. The ground-truth label.
- """
- parsed_example = tf.io.parse_single_example(example, name_to_features)
- # Parse to get image.
- image = parsed_example['image/encoded']
- image = tf.io.decode_jpeg(image)
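- # With scale=128 and offset=128 this maps uint8 pixels from [0, 255] to
- # roughly [-1, 1].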
- image = NormalizeImages(
- image, pixel_value_scale=128.0, pixel_value_offset=128.0)
- if augmentation:
- image = _ImageNetCrop(image, image_size)
- else:
- image = tf.image.resize(image, [image_size, image_size])
- image.set_shape([image_size, image_size, 3])
- # Parse to get label.
- label = parsed_example['image/class/label']
-
- return image, label
-
-
-def CreateDataset(file_pattern,
- image_size=321,
- batch_size=32,
- augmentation=False,
- seed=0):
- """Creates a dataset.
-
- Args:
- file_pattern: str, file pattern of the dataset files.
- image_size: int, image size.
- batch_size: int, batch size.
- augmentation: bool, whether to apply augmentation.
- seed: int, seed for shuffling the dataset.
-
- Returns:
- tf.data.TFRecordDataset.
- """
-
- filenames = tf.io.gfile.glob(file_pattern)
-
- dataset = tf.data.TFRecordDataset(filenames)
- dataset = dataset.repeat().shuffle(buffer_size=100, seed=seed)
-
- # Create a description of the features.
- feature_description = {
- 'image/height': tf.io.FixedLenFeature([], tf.int64, default_value=0),
- 'image/width': tf.io.FixedLenFeature([], tf.int64, default_value=0),
- 'image/channels': tf.io.FixedLenFeature([], tf.int64, default_value=0),
- 'image/format': tf.io.FixedLenFeature([], tf.string, default_value=''),
- 'image/id': tf.io.FixedLenFeature([], tf.string, default_value=''),
- 'image/filename': tf.io.FixedLenFeature([], tf.string, default_value=''),
- 'image/encoded': tf.io.FixedLenFeature([], tf.string, default_value=''),
- 'image/class/label': tf.io.FixedLenFeature([], tf.int64, default_value=0),
- }
-
- customized_parse_func = functools.partial(
- _ParseFunction,
- name_to_features=feature_description,
- image_size=image_size,
- augmentation=augmentation)
- dataset = dataset.map(customized_parse_func)
- dataset = dataset.batch(batch_size)
-
- return dataset
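-
-# Example usage of CreateDataset (a minimal sketch; the TFRecord glob below is
-# an assumption, not a path shipped with this module):
-#   train_dataset = CreateDataset(
-#       file_pattern='/tmp/gld_v2_clean/train-*.tfrecord',
-#       image_size=321,
-#       batch_size=32,
-#       augmentation=True)
-#   images, labels = next(iter(train_dataset))
-#   # images: [32, 321, 321, 3] float32; labels: [32] int64.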
-
-
-def GoogleLandmarksInfo():
- """Returns metadata information on the Google Landmarks dataset.
-
- Returns:
- object _GoogleLandmarksInfo containing metadata about the GLD dataset.
- """
- return _GoogleLandmarksInfo()
diff --git a/research/delf/delf/python/datasets/google_landmarks_dataset/metrics.py b/research/delf/delf/python/datasets/google_landmarks_dataset/metrics.py
deleted file mode 100644
index 1516be9d856..00000000000
--- a/research/delf/delf/python/datasets/google_landmarks_dataset/metrics.py
+++ /dev/null
@@ -1,254 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Python module to compute metrics for Google Landmarks dataset."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-
-
-def _CountPositives(solution):
- """Counts number of test images with non-empty ground-truth in `solution`.
-
- Args:
- solution: Dict mapping test image ID to list of ground-truth IDs.
-
- Returns:
- count: Number of test images with non-empty ground-truth.
- """
- count = 0
- for v in solution.values():
- if v:
- count += 1
-
- return count
-
-
-def GlobalAveragePrecision(predictions,
- recognition_solution,
- ignore_non_gt_test_images=False):
- """Computes global average precision for recognition prediction.
-
- Args:
- predictions: Dict mapping test image ID to a dict with keys 'class'
- (integer) and 'score' (float).
- recognition_solution: Dict mapping test image ID to list of ground-truth
- landmark IDs.
- ignore_non_gt_test_images: If True, ignore test images which do not have
- associated ground-truth landmark IDs. For the Google Landmark Recognition
- challenge, this should be set to False.
-
- Returns:
- gap: Global average precision score (float).
- """
- # Compute number of expected results.
- num_positives = _CountPositives(recognition_solution)
-
- gap = 0.0
- total_predictions = 0
- correct_predictions = 0
-
- # Sort predictions according to Kaggle's convention:
- # - first by score (descending);
- # - then by key (ascending);
- # - then by class (ascending).
- sorted_predictions_by_key_class = sorted(
- predictions.items(), key=lambda item: (item[0], item[1]['class']))
- sorted_predictions = sorted(
- sorted_predictions_by_key_class,
- key=lambda item: item[1]['score'],
- reverse=True)
-
- # Loop over sorted predictions (descending order) and compute GAPs.
- for key, prediction in sorted_predictions:
- if ignore_non_gt_test_images and not recognition_solution[key]:
- continue
-
- total_predictions += 1
- if prediction['class'] in recognition_solution[key]:
- correct_predictions += 1
- gap += correct_predictions / total_predictions
-
- gap /= num_positives
-
- return gap
-
-
-def Top1Accuracy(predictions, recognition_solution):
- """Computes top-1 accuracy for recognition prediction.
-
- Note that test images without ground-truth are ignored.
-
- Args:
- predictions: Dict mapping test image ID to a dict with keys 'class'
- (integer) and 'score' (float).
- recognition_solution: Dict mapping test image ID to list of ground-truth
- landmark IDs.
-
- Returns:
- accuracy: Top-1 accuracy (float).
- """
- # Loop over test images in solution. If an image has at least one class
- # label, we check whether the prediction is correct.
- num_correct_predictions = 0
- num_test_images_with_ground_truth = 0
- for key, ground_truth in recognition_solution.items():
- if ground_truth:
- num_test_images_with_ground_truth += 1
- if key in predictions:
- if predictions[key]['class'] in ground_truth:
- num_correct_predictions += 1
-
- return num_correct_predictions / num_test_images_with_ground_truth
-
-
-def MeanAveragePrecision(predictions, retrieval_solution, max_predictions=100):
- """Computes mean average precision for retrieval prediction.
-
- Args:
- predictions: Dict mapping test image ID to a list of strings corresponding
- to index image IDs.
- retrieval_solution: Dict mapping test image ID to list of ground-truth image
- IDs.
- max_predictions: Maximum number of predictions per query to take into
- account. For the Google Landmark Retrieval challenge, this should be set
- to 100.
-
- Returns:
- mean_ap: Mean average precision score (float).
-
- Raises:
- ValueError: If a test image in `predictions` is not included in
- `retrieval_solution`.
- """
- # Compute number of test images.
- num_test_images = len(retrieval_solution.keys())
-
- # Loop over predictions for each query and compute mAP.
- mean_ap = 0.0
- for key, prediction in predictions.items():
- if key not in retrieval_solution:
- raise ValueError('Test image %s is not part of retrieval_solution' % key)
-
- # Loop over predicted images, keeping track of those which were already
- # used (duplicates are skipped).
- ap = 0.0
- already_predicted = set()
- num_expected_retrieved = min(len(retrieval_solution[key]), max_predictions)
- num_correct = 0
- for i in range(min(len(prediction), max_predictions)):
- if prediction[i] not in already_predicted:
- if prediction[i] in retrieval_solution[key]:
- num_correct += 1
- ap += num_correct / (i + 1)
- already_predicted.add(prediction[i])
-
- ap /= num_expected_retrieved
- mean_ap += ap
-
- mean_ap /= num_test_images
-
- return mean_ap
-
-
-def MeanPrecisions(predictions, retrieval_solution, max_predictions=100):
- """Computes mean precisions for retrieval prediction.
-
- Args:
- predictions: Dict mapping test image ID to a list of strings corresponding
- to index image IDs.
- retrieval_solution: Dict mapping test image ID to list of ground-truth image
- IDs.
- max_predictions: Maximum number of predictions per query to take into
- account.
-
- Returns:
- mean_precisions: NumPy array with mean precisions at ranks 1 through
- `max_predictions`.
-
- Raises:
- ValueError: If a test image in `predictions` is not included in
- `retrieval_solution`.
- """
- # Compute number of test images.
- num_test_images = len(retrieval_solution.keys())
-
- # Loop over predictions for each query and compute precisions@k.
- precisions = np.zeros((num_test_images, max_predictions))
- count_test_images = 0
- for key, prediction in predictions.items():
- if key not in retrieval_solution:
- raise ValueError('Test image %s is not part of retrieval_solution' % key)
-
- # Loop over predicted images, keeping track of those which were already
- # used (duplicates are skipped).
- already_predicted = set()
- num_correct = 0
- for i in range(max_predictions):
- if i < len(prediction):
- if prediction[i] not in already_predicted:
- if prediction[i] in retrieval_solution[key]:
- num_correct += 1
- already_predicted.add(prediction[i])
- precisions[count_test_images, i] = num_correct / (i + 1)
- count_test_images += 1
-
- mean_precisions = np.mean(precisions, axis=0)
-
- return mean_precisions
-
-
-def MeanMedianPosition(predictions, retrieval_solution, max_predictions=100):
- """Computes mean and median positions of first correct image.
-
- Args:
- predictions: Dict mapping test image ID to a list of strings corresponding
- to index image IDs.
- retrieval_solution: Dict mapping test image ID to list of ground-truth image
- IDs.
- max_predictions: Maximum number of predictions per query to take into
- account.
-
- Returns:
- mean_position: Float.
- median_position: Float.
-
- Raises:
- ValueError: If a test image in `predictions` is not included in
- `retrieval_solution`.
- """
- # Compute number of test images.
- num_test_images = len(retrieval_solution.keys())
-
- # Loop over predictions for each query to find first correct ranked image.
- positions = (max_predictions + 1) * np.ones((num_test_images))
- count_test_images = 0
- for key, prediction in predictions.items():
- if key not in retrieval_solution:
- raise ValueError('Test image %s is not part of retrieval_solution' % key)
-
- for i in range(min(len(prediction), max_predictions)):
- if prediction[i] in retrieval_solution[key]:
- positions[count_test_images] = i + 1
- break
-
- count_test_images += 1
-
- mean_position = np.mean(positions)
- median_position = np.median(positions)
-
- return mean_position, median_position
diff --git a/research/delf/delf/python/datasets/google_landmarks_dataset/metrics_test.py b/research/delf/delf/python/datasets/google_landmarks_dataset/metrics_test.py
deleted file mode 100644
index ee8a443de16..00000000000
--- a/research/delf/delf/python/datasets/google_landmarks_dataset/metrics_test.py
+++ /dev/null
@@ -1,219 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for Google Landmarks dataset metric computation."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-
-from delf.python.datasets.google_landmarks_dataset import metrics
-
-
-def _CreateRecognitionSolution():
- """Creates recognition solution to be used in tests.
-
- Returns:
- solution: Dict mapping test image ID to list of ground-truth landmark IDs.
- """
- return {
- '0123456789abcdef': [0, 12],
- '0223456789abcdef': [100, 200, 300],
- '0323456789abcdef': [1],
- '0423456789abcdef': [],
- '0523456789abcdef': [],
- }
-
-
-def _CreateRecognitionPredictions():
- """Creates recognition predictions to be used in tests.
-
- Returns:
- predictions: Dict mapping test image ID to a dict with keys 'class'
- (integer) and 'score' (float).
- """
- return {
- '0223456789abcdef': {
- 'class': 0,
- 'score': 0.01
- },
- '0323456789abcdef': {
- 'class': 1,
- 'score': 10.0
- },
- '0423456789abcdef': {
- 'class': 150,
- 'score': 15.0
- },
- }
-
-
-def _CreateRetrievalSolution():
- """Creates retrieval solution to be used in tests.
-
- Returns:
- solution: Dict mapping test image ID to list of ground-truth image IDs.
- """
- return {
- '0123456789abcdef': ['fedcba9876543210', 'fedcba9876543220'],
- '0223456789abcdef': ['fedcba9876543210'],
- '0323456789abcdef': [
- 'fedcba9876543230', 'fedcba9876543240', 'fedcba9876543250'
- ],
- '0423456789abcdef': ['fedcba9876543230'],
- }
-
-
-def _CreateRetrievalPredictions():
- """Creates retrieval predictions to be used in tests.
-
- Returns:
- predictions: Dict mapping test image ID to a list with predicted index image
- ids.
- """
- return {
- '0223456789abcdef': ['fedcba9876543200', 'fedcba9876543210'],
- '0323456789abcdef': ['fedcba9876543240'],
- '0423456789abcdef': ['fedcba9876543230', 'fedcba9876543240'],
- }
-
-
-class MetricsTest(tf.test.TestCase):
-
- def testGlobalAveragePrecisionWorks(self):
- # Define input.
- predictions = _CreateRecognitionPredictions()
- solution = _CreateRecognitionSolution()
-
- # Run tested function.
- gap = metrics.GlobalAveragePrecision(predictions, solution)
-
- # Define expected results.
- expected_gap = 0.166667
-
- # Compare actual and expected results.
- self.assertAllClose(gap, expected_gap)
-
- def testGlobalAveragePrecisionIgnoreNonGroundTruthWorks(self):
- # Define input.
- predictions = _CreateRecognitionPredictions()
- solution = _CreateRecognitionSolution()
-
- # Run tested function.
- gap = metrics.GlobalAveragePrecision(
- predictions, solution, ignore_non_gt_test_images=True)
-
- # Define expected results.
- expected_gap = 0.333333
-
- # Compare actual and expected results.
- self.assertAllClose(gap, expected_gap)
-
- def testTop1AccuracyWorks(self):
- # Define input.
- predictions = _CreateRecognitionPredictions()
- solution = _CreateRecognitionSolution()
-
- # Run tested function.
- accuracy = metrics.Top1Accuracy(predictions, solution)
-
- # Define expected results.
- expected_accuracy = 0.333333
-
- # Compare actual and expected results.
- self.assertAllClose(accuracy, expected_accuracy)
-
- def testMeanAveragePrecisionWorks(self):
- # Define input.
- predictions = _CreateRetrievalPredictions()
- solution = _CreateRetrievalSolution()
-
- # Run tested function.
- mean_ap = metrics.MeanAveragePrecision(predictions, solution)
-
- # Define expected results.
- expected_mean_ap = 0.458333
-
- # Compare actual and expected results.
- self.assertAllClose(mean_ap, expected_mean_ap)
-
- def testMeanAveragePrecisionMaxPredictionsWorks(self):
- # Define input.
- predictions = _CreateRetrievalPredictions()
- solution = _CreateRetrievalSolution()
-
- # Run tested function.
- mean_ap = metrics.MeanAveragePrecision(
- predictions, solution, max_predictions=1)
-
- # Define expected results.
- expected_mean_ap = 0.5
-
- # Compare actual and expected results.
- self.assertAllClose(mean_ap, expected_mean_ap)
-
- def testMeanPrecisionsWorks(self):
- # Define input.
- predictions = _CreateRetrievalPredictions()
- solution = _CreateRetrievalSolution()
-
- # Run tested function.
- mean_precisions = metrics.MeanPrecisions(
- predictions, solution, max_predictions=2)
-
- # Define expected results.
- expected_mean_precisions = [0.5, 0.375]
-
- # Compare actual and expected results.
- self.assertAllClose(mean_precisions, expected_mean_precisions)
-
- def testMeanMedianPositionWorks(self):
- # Define input.
- predictions = _CreateRetrievalPredictions()
- solution = _CreateRetrievalSolution()
-
- # Run tested function.
- mean_position, median_position = metrics.MeanMedianPosition(
- predictions, solution)
-
- # Define expected results.
- expected_mean_position = 26.25
- expected_median_position = 1.5
-
- # Compare actual and expected results.
- self.assertAllClose(mean_position, expected_mean_position)
- self.assertAllClose(median_position, expected_median_position)
-
- def testMeanMedianPositionMaxPredictionsWorks(self):
- # Define input.
- predictions = _CreateRetrievalPredictions()
- solution = _CreateRetrievalSolution()
-
- # Run tested function.
- mean_position, median_position = metrics.MeanMedianPosition(
- predictions, solution, max_predictions=1)
-
- # Define expected results.
- expected_mean_position = 1.5
- expected_median_position = 1.5
-
- # Compare actual and expected results.
- self.assertAllClose(mean_position, expected_mean_position)
- self.assertAllClose(median_position, expected_median_position)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/datasets/google_landmarks_dataset/rn101_af_gldv2clean_config.pbtxt b/research/delf/delf/python/datasets/google_landmarks_dataset/rn101_af_gldv2clean_config.pbtxt
deleted file mode 100644
index 6a065d51280..00000000000
--- a/research/delf/delf/python/datasets/google_landmarks_dataset/rn101_af_gldv2clean_config.pbtxt
+++ /dev/null
@@ -1,10 +0,0 @@
-use_local_features: false
-use_global_features: true
-model_path: "parameters/rn101_af_gldv2clean_20200814"
-image_scales: 0.70710677
-image_scales: 1.0
-image_scales: 1.4142135
-delf_global_config {
- use_pca: false
-}
-max_image_size: 1024
diff --git a/research/delf/delf/python/datasets/revisited_op/__init__.py b/research/delf/delf/python/datasets/revisited_op/__init__.py
deleted file mode 100644
index 1a8b35fb1b4..00000000000
--- a/research/delf/delf/python/datasets/revisited_op/__init__.py
+++ /dev/null
@@ -1,22 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Module for revisited Oxford and Paris datasets."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-# pylint: disable=unused-import
-from delf.python.datasets.revisited_op import dataset
-# pylint: enable=unused-import
diff --git a/research/delf/delf/python/datasets/revisited_op/dataset.py b/research/delf/delf/python/datasets/revisited_op/dataset.py
deleted file mode 100644
index ae3020cd345..00000000000
--- a/research/delf/delf/python/datasets/revisited_op/dataset.py
+++ /dev/null
@@ -1,535 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Python library to parse ground-truth/evaluate on Revisited datasets."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-import pickle
-
-import numpy as np
-from scipy.io import matlab
-import tensorflow as tf
-
-_GROUND_TRUTH_KEYS = ['easy', 'hard', 'junk']
-
-DATASET_NAMES = ['roxford5k', 'rparis6k']
-
-
-def ReadDatasetFile(dataset_file_path):
- """Reads dataset file in Revisited Oxford/Paris ".mat" format.
-
- Args:
- dataset_file_path: Path to dataset file, in .mat format.
-
- Returns:
- query_list: List of query image names.
- index_list: List of index image names.
- ground_truth: List containing ground-truth information for dataset. Each
- entry is a dict corresponding to the ground-truth information for a query.
- The dict may have keys 'easy', 'hard', or 'junk', mapping to a NumPy
- array of integers; additionally, it has a key 'bbx' mapping to a NumPy
- array of floats with bounding box coordinates.
- """
- with tf.io.gfile.GFile(dataset_file_path, 'rb') as f:
- cfg = matlab.loadmat(f)
-
- # Parse outputs according to the specificities of the dataset file.
- query_list = [str(im_array[0]) for im_array in np.squeeze(cfg['qimlist'])]
- index_list = [str(im_array[0]) for im_array in np.squeeze(cfg['imlist'])]
- ground_truth_raw = np.squeeze(cfg['gnd'])
- ground_truth = []
- for query_ground_truth_raw in ground_truth_raw:
- query_ground_truth = {}
- for ground_truth_key in _GROUND_TRUTH_KEYS:
- if ground_truth_key in query_ground_truth_raw.dtype.names:
- adjusted_labels = query_ground_truth_raw[ground_truth_key] - 1
- query_ground_truth[ground_truth_key] = adjusted_labels.flatten()
-
- query_ground_truth['bbx'] = np.squeeze(query_ground_truth_raw['bbx'])
- ground_truth.append(query_ground_truth)
-
- return query_list, index_list, ground_truth
-
-
-def _ParseGroundTruth(ok_list, junk_list):
- """Constructs dictionary of ok/junk indices for a data subset and query.
-
- Args:
- ok_list: List of NumPy arrays containing true positive indices for query.
- junk_list: List of NumPy arrays containing ignored indices for query.
-
- Returns:
- ok_junk_dict: Dict mapping 'ok' and 'junk' strings to NumPy array of
- indices.
- """
- ok_junk_dict = {}
- ok_junk_dict['ok'] = np.concatenate(ok_list)
- ok_junk_dict['junk'] = np.concatenate(junk_list)
- return ok_junk_dict
-
-
-def ParseEasyMediumHardGroundTruth(ground_truth):
- """Parses easy/medium/hard ground-truth from Revisited datasets.
-
- Args:
- ground_truth: Usually the output from ReadDatasetFile(). List containing
- ground-truth information for dataset. Each entry is a dict corresponding
- to the ground-truth information for a query. The dict must have keys
- 'easy', 'hard', and 'junk', mapping to a NumPy array of integers.
-
- Returns:
- easy_ground_truth: List containing ground-truth information for easy subset
- of dataset. Each entry is a dict corresponding to the ground-truth
- information for a query. The dict has keys 'ok' and 'junk', mapping to a
- NumPy array of integers.
- medium_ground_truth: Same as `easy_ground_truth`, but for the medium subset.
- hard_ground_truth: Same as `easy_ground_truth`, but for the hard subset.
- """
- num_queries = len(ground_truth)
-
- easy_ground_truth = []
- medium_ground_truth = []
- hard_ground_truth = []
- for i in range(num_queries):
- easy_ground_truth.append(
- _ParseGroundTruth([ground_truth[i]['easy']],
- [ground_truth[i]['junk'], ground_truth[i]['hard']]))
- medium_ground_truth.append(
- _ParseGroundTruth([ground_truth[i]['easy'], ground_truth[i]['hard']],
- [ground_truth[i]['junk']]))
- hard_ground_truth.append(
- _ParseGroundTruth([ground_truth[i]['hard']],
- [ground_truth[i]['junk'], ground_truth[i]['easy']]))
-
- return easy_ground_truth, medium_ground_truth, hard_ground_truth
-
-
-def AdjustPositiveRanks(positive_ranks, junk_ranks):
- """Adjusts positive ranks based on junk ranks.
-
- Args:
- positive_ranks: Sorted 1D NumPy integer array.
- junk_ranks: Sorted 1D NumPy integer array.
-
- Returns:
- adjusted_positive_ranks: Sorted 1D NumPy array.
- """
- if not junk_ranks.size:
- return positive_ranks
-
- adjusted_positive_ranks = positive_ranks
- j = 0
- for i, positive_index in enumerate(positive_ranks):
- while (j < len(junk_ranks) and positive_index > junk_ranks[j]):
- j += 1
-
- adjusted_positive_ranks[i] -= j
-
- return adjusted_positive_ranks
-
-
-def ComputeAveragePrecision(positive_ranks):
- """Computes average precision according to dataset convention.
-
- It assumes that `positive_ranks` contains the ranks for all expected positive
- index images to be retrieved. If `positive_ranks` is empty, returns
- `average_precision` = 0.
-
- Note that average precision computation here does NOT use the finite sum
- method (see
- https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision)
- which is common in information retrieval literature. Instead, the method
- implemented here integrates over the precision-recall curve by averaging two
- adjacent precision points, then multiplying by the recall step. This is the
- convention for the Revisited Oxford/Paris datasets.
-
- Args:
- positive_ranks: Sorted 1D NumPy integer array, zero-indexed.
-
- Returns:
- average_precision: Float.
- """
- average_precision = 0.0
-
- num_expected_positives = len(positive_ranks)
- if not num_expected_positives:
- return average_precision
-
- recall_step = 1.0 / num_expected_positives
- for i, rank in enumerate(positive_ranks):
- if not rank:
- left_precision = 1.0
- else:
- left_precision = i / rank
-
- right_precision = (i + 1) / (rank + 1)
- average_precision += (left_precision + right_precision) * recall_step / 2
-
- return average_precision
-
-
-def ComputePRAtRanks(positive_ranks, desired_pr_ranks):
- """Computes precision/recall at desired ranks.
-
- It assumes that `positive_ranks` contains the ranks for all expected positive
- index images to be retrieved. If `positive_ranks` is empty, return all-zeros
- `precisions`/`recalls`.
-
- If a desired rank is larger than the last positive rank, its precision is
- computed based on the last positive rank. For example, if `desired_pr_ranks`
- is [10] and `positive_ranks` = [0, 7] --> `precisions` = [0.25], `recalls` =
- [1.0].
-
- Args:
- positive_ranks: 1D NumPy integer array, zero-indexed.
- desired_pr_ranks: List of integers containing the desired precision/recall
- ranks to be reported. Eg, if precision@1/recall@1 and
- precision@10/recall@10 are desired, this should be set to [1, 10].
-
- Returns:
- precisions: Precision @ `desired_pr_ranks` (NumPy array of
- floats, with shape [len(desired_pr_ranks)]).
- recalls: Recall @ `desired_pr_ranks` (NumPy array of floats, with
- shape [len(desired_pr_ranks)]).
- """
- num_desired_pr_ranks = len(desired_pr_ranks)
- precisions = np.zeros([num_desired_pr_ranks])
- recalls = np.zeros([num_desired_pr_ranks])
-
- num_expected_positives = len(positive_ranks)
- if not num_expected_positives:
- return precisions, recalls
-
- positive_ranks_one_indexed = positive_ranks + 1
- for i, desired_pr_rank in enumerate(desired_pr_ranks):
- recalls[i] = np.sum(
- positive_ranks_one_indexed <= desired_pr_rank) / num_expected_positives
-
- # If `desired_pr_rank` is larger than last positive's rank, only compute
- # precision with respect to last positive's position.
- precision_rank = min(max(positive_ranks_one_indexed), desired_pr_rank)
- precisions[i] = np.sum(
- positive_ranks_one_indexed <= precision_rank) / precision_rank
-
- return precisions, recalls
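
The example from the docstring, written out as a small sketch:

```python
import numpy as np

from delf.python.datasets.revisited_op import dataset

# Positives at zero-indexed ranks 0 and 7, P/R requested at rank 10. Since 10
# exceeds the last positive's rank, precision is computed at rank 8 instead.
precisions, recalls = dataset.ComputePRAtRanks(np.array([0, 7]), [10])
print(precisions, recalls)  # [0.25] [1.0]
```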
-
-
-def ComputeMetrics(sorted_index_ids, ground_truth, desired_pr_ranks):
- """Computes metrics for retrieval results on the Revisited datasets.
-
- If there are no valid ground-truth index images for a given query, the metric
- results for the given query (`average_precisions`, `precisions` and `recalls`)
- are set to NaN, and they are not taken into account when computing the
- aggregated metrics (`mean_average_precision`, `mean_precisions` and
- `mean_recalls`) over all queries.
-
- Args:
- sorted_index_ids: Integer NumPy array of shape [#queries, #index_images].
- For each query, contains an array denoting the most relevant index images,
- sorted from most to least relevant.
- ground_truth: List containing ground-truth information for dataset. Each
- entry is a dict corresponding to the ground-truth information for a query.
- The dict has keys 'ok' and 'junk', mapping to a NumPy array of integers.
- desired_pr_ranks: List of integers containing the desired precision/recall
- ranks to be reported. E.g., if precision@1/recall@1 and
- precision@10/recall@10 are desired, this should be set to [1, 10]. The
- largest item should be <= #index_images.
-
- Returns:
- mean_average_precision: Mean average precision (float).
- mean_precisions: Mean precision @ `desired_pr_ranks` (NumPy array of
- floats, with shape [len(desired_pr_ranks)]).
- mean_recalls: Mean recall @ `desired_pr_ranks` (NumPy array of floats, with
- shape [len(desired_pr_ranks)]).
- average_precisions: Average precision for each query (NumPy array of floats,
- with shape [#queries]).
- precisions: Precision @ `desired_pr_ranks`, for each query (NumPy array of
- floats, with shape [#queries, len(desired_pr_ranks)]).
- recalls: Recall @ `desired_pr_ranks`, for each query (NumPy array of
- floats, with shape [#queries, len(desired_pr_ranks)]).
-
- Raises:
- ValueError: If largest desired PR rank in `desired_pr_ranks` >
- #index_images.
- """
- num_queries, num_index_images = sorted_index_ids.shape
- num_desired_pr_ranks = len(desired_pr_ranks)
-
- sorted_desired_pr_ranks = sorted(desired_pr_ranks)
-
- if sorted_desired_pr_ranks[-1] > num_index_images:
- raise ValueError(
- 'Requested PR ranks up to %d, however there are only %d images' %
- (sorted_desired_pr_ranks[-1], num_index_images))
-
- # Instantiate all outputs, then loop over each query and gather metrics.
- mean_average_precision = 0.0
- mean_precisions = np.zeros([num_desired_pr_ranks])
- mean_recalls = np.zeros([num_desired_pr_ranks])
- average_precisions = np.zeros([num_queries])
- precisions = np.zeros([num_queries, num_desired_pr_ranks])
- recalls = np.zeros([num_queries, num_desired_pr_ranks])
- num_empty_gt_queries = 0
- for i in range(num_queries):
- ok_index_images = ground_truth[i]['ok']
- junk_index_images = ground_truth[i]['junk']
-
- if not ok_index_images.size:
- average_precisions[i] = float('nan')
- precisions[i, :] = float('nan')
- recalls[i, :] = float('nan')
- num_empty_gt_queries += 1
- continue
-
- positive_ranks = np.arange(num_index_images)[np.in1d(
- sorted_index_ids[i], ok_index_images)]
- junk_ranks = np.arange(num_index_images)[np.in1d(sorted_index_ids[i],
- junk_index_images)]
-
- adjusted_positive_ranks = AdjustPositiveRanks(positive_ranks, junk_ranks)
-
- average_precisions[i] = ComputeAveragePrecision(adjusted_positive_ranks)
- precisions[i, :], recalls[i, :] = ComputePRAtRanks(adjusted_positive_ranks,
- desired_pr_ranks)
-
- mean_average_precision += average_precisions[i]
- mean_precisions += precisions[i, :]
- mean_recalls += recalls[i, :]
-
- # Normalize aggregated metrics by number of queries.
- num_valid_queries = num_queries - num_empty_gt_queries
- mean_average_precision /= num_valid_queries
- mean_precisions /= num_valid_queries
- mean_recalls /= num_valid_queries
-
- return (mean_average_precision, mean_precisions, mean_recalls,
- average_precisions, precisions, recalls)
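
A minimal end-to-end sketch of the metric computation; the toy ranking is a subset of `testComputeMetricsWorks` further below.

```python
import numpy as np

from delf.python.datasets.revisited_op import dataset

# Two queries over a 5-image index. The second query has no 'ok' images, so
# its per-query metrics are NaN and it is excluded from the aggregated means.
sorted_index_ids = np.array([[4, 2, 0, 1, 3],
                             [0, 1, 2, 3, 4]])
ground_truth = [
    {'ok': np.array([0, 1]), 'junk': np.array([2])},
    {'ok': np.array([], dtype='int64'), 'junk': np.array([], dtype='int64')},
]
(mean_ap, mean_precisions, mean_recalls,
 average_precisions, precisions, recalls) = dataset.ComputeMetrics(
     sorted_index_ids, ground_truth, desired_pr_ranks=[1, 5])
print(average_precisions)  # ~0.4167 for the first query, nan for the second.
```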
-
-
-def SaveMetricsFile(mean_average_precision, mean_precisions, mean_recalls,
- pr_ranks, output_path):
- """Saves aggregated retrieval metrics to text file.
-
- Args:
- mean_average_precision: Dict mapping each dataset protocol to a float.
- mean_precisions: Dict mapping each dataset protocol to a NumPy array of
- floats with shape [len(pr_ranks)].
- mean_recalls: Dict mapping each dataset protocol to a NumPy array of floats
- with shape [len(pr_ranks)].
- pr_ranks: List of integers.
- output_path: Full file path.
- """
- with tf.io.gfile.GFile(output_path, 'w') as f:
- for k in sorted(mean_average_precision.keys()):
- f.write('{}\n mAP={}\n mP@k{} {}\n mR@k{} {}\n'.format(
- k, np.around(mean_average_precision[k] * 100, decimals=2),
- np.array(pr_ranks), np.around(mean_precisions[k] * 100, decimals=2),
- np.array(pr_ranks), np.around(mean_recalls[k] * 100, decimals=2)))
-
-
-def _ParseSpaceSeparatedStringsInBrackets(line, prefixes, ind):
- """Parses line containing space-separated strings in brackets.
-
- Args:
- line: String, containing line in metrics file with mP@k or mR@k figures.
- prefixes: Tuple/list of strings, containing valid prefixes.
- ind: Integer indicating which field within brackets is parsed.
-
- Yields:
- entry: String format entry.
-
- Raises:
- ValueError: If input line does not contain a valid prefix.
- """
- for prefix in prefixes:
- if line.startswith(prefix):
- line = line[len(prefix):]
- break
- else:
- raise ValueError('Line %s is malformed, cannot find valid prefixes' % line)
-
- for entry in line.split('[')[ind].split(']')[0].split():
- yield entry
-
-
-def _ParsePrRanks(line):
- """Parses PR ranks from mP@k line in metrics file.
-
- Args:
- line: String, containing line in metrics file with mP@k figures.
-
- Returns:
- pr_ranks: List of integers, containing used ranks.
-
- Raises:
- ValueError: If input line is malformed.
- """
- return [
- int(pr_rank) for pr_rank in _ParseSpaceSeparatedStringsInBrackets(
- line, [' mP@k['], 0) if pr_rank
- ]
-
-
-def _ParsePrScores(line, num_pr_ranks):
- """Parses PR scores from line in metrics file.
-
- Args:
- line: String, containing line in metrics file with mP@k or mR@k figures.
- num_pr_ranks: Integer, number of scores that should be in output list.
-
- Returns:
- pr_scores: List of floats, containing scores.
-
- Raises:
- ValueError: If input line is malformed.
- """
- pr_scores = [
- float(pr_score) for pr_score in _ParseSpaceSeparatedStringsInBrackets(
- line, (' mP@k[', ' mR@k['), 1) if pr_score
- ]
-
- if len(pr_scores) != num_pr_ranks:
- raise ValueError('Line %s is malformed, expected %d scores but found %d' %
- (line, num_pr_ranks, len(pr_scores)))
-
- return pr_scores
-
-
-def ReadMetricsFile(metrics_path):
- """Reads aggregated retrieval metrics from text file.
-
- Args:
- metrics_path: Full file path, containing aggregated retrieval metrics.
-
- Returns:
- mean_average_precision: Dict mapping each dataset protocol to a float.
- pr_ranks: List of integer ranks used in aggregated recall/precision metrics.
- mean_precisions: Dict mapping each dataset protocol to a NumPy array of
- floats with shape [len(`pr_ranks`)].
- mean_recalls: Dict mapping each dataset protocol to a NumPy array of floats
- with shape [len(`pr_ranks`)].
-
- Raises:
- ValueError: If input file is malformed.
- """
- with tf.io.gfile.GFile(metrics_path, 'r') as f:
- file_contents_stripped = [l.rstrip() for l in f]
-
- if len(file_contents_stripped) % 4:
- raise ValueError(
- 'Malformed input %s: number of lines must be a multiple of 4, '
- 'but it is %d' % (metrics_path, len(file_contents_stripped)))
-
- mean_average_precision = {}
- pr_ranks = []
- mean_precisions = {}
- mean_recalls = {}
- protocols = set()
- for i in range(0, len(file_contents_stripped), 4):
- protocol = file_contents_stripped[i]
- if protocol in protocols:
- raise ValueError(
- 'Malformed input %s: protocol %s is found a second time' %
- (metrics_path, protocol))
- protocols.add(protocol)
-
- # Parse mAP.
- mean_average_precision[protocol] = float(
- file_contents_stripped[i + 1].split('=')[1]) / 100.0
-
- # Parse (or check consistency of) pr_ranks.
- parsed_pr_ranks = _ParsePrRanks(file_contents_stripped[i + 2])
- if not pr_ranks:
- pr_ranks = parsed_pr_ranks
- else:
- if parsed_pr_ranks != pr_ranks:
- raise ValueError('Malformed input %s: inconsistent PR ranks' %
- metrics_path)
-
- # Parse mean precisions.
- mean_precisions[protocol] = np.array(
- _ParsePrScores(file_contents_stripped[i + 2], len(pr_ranks)),
- dtype=float) / 100.0
-
- # Parse mean recalls.
- mean_recalls[protocol] = np.array(
- _ParsePrScores(file_contents_stripped[i + 3], len(pr_ranks)),
- dtype=float) / 100.0
-
- return mean_average_precision, pr_ranks, mean_precisions, mean_recalls
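
A round-trip sketch for the two functions above (the output path is a placeholder): `SaveMetricsFile` writes one four-line block per protocol, which is exactly the layout `ReadMetricsFile` parses.

```python
import numpy as np

from delf.python.datasets.revisited_op import dataset

output_path = '/tmp/metrics.txt'  # Placeholder path.
dataset.SaveMetricsFile(
    mean_average_precision={'medium': 0.9, 'hard': 0.7},
    mean_precisions={'medium': np.array([1.0, 1.0]),
                     'hard': np.array([1.0, 0.8])},
    mean_recalls={'medium': np.array([0.5, 1.0]),
                  'hard': np.array([0.5, 0.8])},
    pr_ranks=[1, 5],
    output_path=output_path)

mean_ap, pr_ranks, mean_precisions, mean_recalls = dataset.ReadMetricsFile(
    output_path)
```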
-
-
-def CreateConfigForTestDataset(dataset, dir_main):
- """Creates the configuration dictionary for the test dataset.
-
- Args:
- dataset: String, dataset name: either 'roxford5k' or 'rparis6k'.
- dir_main: String, path to the folder containing ground truth files.
-
- Returns:
- cfg: Dataset configuration in a form of dictionary. The configuration
- includes:
- `gnd_fname` - path to the ground truth file for the dataset,
- `ext` and `qext` - image extensions for the images in the test dataset
- and the query images,
- `dir_data` - path to the folder containing ground truth files,
- `dir_images` - path to the folder containing images,
- `n` and `nq` - number of images and query images in the dataset
- respectively,
- `im_fname` and `qim_fname` - functions providing paths for the dataset
- and query images respectively,
- `dataset` - test dataset name.
-
- Raises:
- ValueError: If an unknown dataset name is provided as an argument.
- """
- dataset = dataset.lower()
-
- def _ConfigImname(cfg, i):
- return os.path.join(cfg['dir_images'], cfg['imlist'][i] + cfg['ext'])
-
- def _ConfigQimname(cfg, i):
- return os.path.join(cfg['dir_images'], cfg['qimlist'][i] + cfg['qext'])
-
- if dataset not in DATASET_NAMES:
- raise ValueError('Unknown dataset: {}!'.format(dataset))
-
- # Loading imlist, qimlist, and gnd in configuration as a dictionary.
- gnd_fname = os.path.join(dir_main, 'gnd_{}.pkl'.format(dataset))
- with tf.io.gfile.GFile(gnd_fname, 'rb') as f:
- cfg = pickle.load(f)
- cfg['gnd_fname'] = gnd_fname
- if dataset == 'rparis6k':
- dir_images = 'paris6k_images'
- elif dataset == 'roxford5k':
- dir_images = 'oxford5k_images'
-
- cfg['ext'] = '.jpg'
- cfg['qext'] = '.jpg'
- cfg['dir_data'] = os.path.join(dir_main)
- cfg['dir_images'] = os.path.join(cfg['dir_data'], dir_images)
-
- cfg['n'] = len(cfg['imlist'])
- cfg['nq'] = len(cfg['qimlist'])
-
- cfg['im_fname'] = _ConfigImname
- cfg['qim_fname'] = _ConfigQimname
-
- cfg['dataset'] = dataset
-
- return cfg
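
A usage sketch for the configuration helper; the directory layout is an assumption (`/data/roxford5k` is a placeholder that must hold `gnd_roxford5k.pkl` plus an `oxford5k_images/` folder).

```python
from delf.python.datasets.revisited_op import dataset

cfg = dataset.CreateConfigForTestDataset('roxford5k', '/data/roxford5k')
print(cfg['n'], cfg['nq'])                    # Number of index and query images.
first_index_image = cfg['im_fname'](cfg, 0)   # Full path of the first index image.
first_query_image = cfg['qim_fname'](cfg, 0)  # Full path of the first query image.
```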
diff --git a/research/delf/delf/python/datasets/revisited_op/dataset_test.py b/research/delf/delf/python/datasets/revisited_op/dataset_test.py
deleted file mode 100644
index 04caa64f098..00000000000
--- a/research/delf/delf/python/datasets/revisited_op/dataset_test.py
+++ /dev/null
@@ -1,288 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for the python library parsing Revisited Oxford/Paris datasets."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-import numpy as np
-import tensorflow as tf
-
-from delf.python.datasets.revisited_op import dataset
-
-FLAGS = flags.FLAGS
-
-
-class DatasetTest(tf.test.TestCase):
-
- def testParseEasyMediumHardGroundTruth(self):
- # Define input.
- ground_truth = [{
- 'easy': np.array([10, 56, 100]),
- 'hard': np.array([0]),
- 'junk': np.array([6, 90])
- }, {
- 'easy': np.array([], dtype='int64'),
- 'hard': [5],
- 'junk': [99, 100]
- }, {
- 'easy': [33],
- 'hard': [66, 99],
- 'junk': np.array([], dtype='int64')
- }]
-
- # Run tested function.
- (easy_ground_truth, medium_ground_truth,
- hard_ground_truth) = dataset.ParseEasyMediumHardGroundTruth(ground_truth)
-
- # Define expected outputs.
- expected_easy_ground_truth = [{
- 'ok': np.array([10, 56, 100]),
- 'junk': np.array([6, 90, 0])
- }, {
- 'ok': np.array([], dtype='int64'),
- 'junk': np.array([99, 100, 5])
- }, {
- 'ok': np.array([33]),
- 'junk': np.array([66, 99])
- }]
- expected_medium_ground_truth = [{
- 'ok': np.array([10, 56, 100, 0]),
- 'junk': np.array([6, 90])
- }, {
- 'ok': np.array([5]),
- 'junk': np.array([99, 100])
- }, {
- 'ok': np.array([33, 66, 99]),
- 'junk': np.array([], dtype='int64')
- }]
- expected_hard_ground_truth = [{
- 'ok': np.array([0]),
- 'junk': np.array([6, 90, 10, 56, 100])
- }, {
- 'ok': np.array([5]),
- 'junk': np.array([99, 100])
- }, {
- 'ok': np.array([66, 99]),
- 'junk': np.array([33])
- }]
-
- # Compare actual versus expected.
- def _AssertListOfDictsOfArraysAreEqual(ground_truth, expected_ground_truth):
- """Helper function to compare ground-truth data.
-
- Args:
- ground_truth: List of dicts of arrays.
- expected_ground_truth: List of dicts of arrays.
- """
- self.assertEqual(len(ground_truth), len(expected_ground_truth))
-
- for i, ground_truth_entry in enumerate(ground_truth):
- self.assertEqual(sorted(ground_truth_entry.keys()), ['junk', 'ok'])
- self.assertAllEqual(ground_truth_entry['junk'],
- expected_ground_truth[i]['junk'])
- self.assertAllEqual(ground_truth_entry['ok'],
- expected_ground_truth[i]['ok'])
-
- _AssertListOfDictsOfArraysAreEqual(easy_ground_truth,
- expected_easy_ground_truth)
- _AssertListOfDictsOfArraysAreEqual(medium_ground_truth,
- expected_medium_ground_truth)
- _AssertListOfDictsOfArraysAreEqual(hard_ground_truth,
- expected_hard_ground_truth)
-
- def testAdjustPositiveRanksWorks(self):
- # Define inputs.
- positive_ranks = np.array([0, 2, 6, 10, 20])
- junk_ranks = np.array([1, 8, 9, 30])
-
- # Run tested function.
- adjusted_positive_ranks = dataset.AdjustPositiveRanks(
- positive_ranks, junk_ranks)
-
- # Define expected output.
- expected_adjusted_positive_ranks = [0, 1, 5, 7, 17]
-
- # Compare actual versus expected.
- self.assertAllEqual(adjusted_positive_ranks,
- expected_adjusted_positive_ranks)
-
- def testComputeAveragePrecisionWorks(self):
- # Define input.
- positive_ranks = [0, 2, 5]
-
- # Run tested function.
- average_precision = dataset.ComputeAveragePrecision(positive_ranks)
-
- # Define expected output.
- expected_average_precision = 0.677778
-
- # Compare actual versus expected.
- self.assertAllClose(average_precision, expected_average_precision)
-
- def testComputePRAtRanksWorks(self):
- # Define inputs.
- positive_ranks = np.array([0, 2, 5])
- desired_pr_ranks = np.array([1, 5, 10])
-
- # Run tested function.
- precisions, recalls = dataset.ComputePRAtRanks(positive_ranks,
- desired_pr_ranks)
-
- # Define expected outputs.
- expected_precisions = [1.0, 0.4, 0.5]
- expected_recalls = [0.333333, 0.666667, 1.0]
-
- # Compare actual versus expected.
- self.assertAllClose(precisions, expected_precisions)
- self.assertAllClose(recalls, expected_recalls)
-
- def testComputeMetricsWorks(self):
- # Define inputs: 3 queries. For the last one, there are no expected images
- # to be retrieved
- sorted_index_ids = np.array([[4, 2, 0, 1, 3], [0, 2, 4, 1, 3],
- [0, 1, 2, 3, 4]])
- ground_truth = [{
- 'ok': np.array([0, 1]),
- 'junk': np.array([2])
- }, {
- 'ok': np.array([0, 4]),
- 'junk': np.array([], dtype='int64')
- }, {
- 'ok': np.array([], dtype='int64'),
- 'junk': np.array([], dtype='int64')
- }]
- desired_pr_ranks = [1, 2, 5]
-
- # Run tested function.
- (mean_average_precision, mean_precisions, mean_recalls, average_precisions,
- precisions, recalls) = dataset.ComputeMetrics(sorted_index_ids,
- ground_truth,
- desired_pr_ranks)
-
- # Define expected outputs.
- expected_mean_average_precision = 0.604167
- expected_mean_precisions = [0.5, 0.5, 0.666667]
- expected_mean_recalls = [0.25, 0.5, 1.0]
- expected_average_precisions = [0.416667, 0.791667, float('nan')]
- expected_precisions = [[0.0, 0.5, 0.666667], [1.0, 0.5, 0.666667],
- [float('nan'),
- float('nan'),
- float('nan')]]
- expected_recalls = [[0.0, 0.5, 1.0], [0.5, 0.5, 1.0],
- [float('nan'), float('nan'),
- float('nan')]]
-
- # Compare actual versus expected.
- self.assertAllClose(mean_average_precision, expected_mean_average_precision)
- self.assertAllClose(mean_precisions, expected_mean_precisions)
- self.assertAllClose(mean_recalls, expected_mean_recalls)
- self.assertAllClose(average_precisions, expected_average_precisions)
- self.assertAllClose(precisions, expected_precisions)
- self.assertAllClose(recalls, expected_recalls)
-
- def testSaveMetricsFileWorks(self):
- # Define inputs.
- mean_average_precision = {'hard': 0.7, 'medium': 0.9}
- mean_precisions = {
- 'hard': np.array([1.0, 0.8]),
- 'medium': np.array([1.0, 1.0])
- }
- mean_recalls = {
- 'hard': np.array([0.5, 0.8]),
- 'medium': np.array([0.5, 1.0])
- }
- pr_ranks = [1, 5]
- output_path = os.path.join(FLAGS.test_tmpdir, 'metrics.txt')
-
- # Run tested function.
- dataset.SaveMetricsFile(mean_average_precision, mean_precisions,
- mean_recalls, pr_ranks, output_path)
-
- # Define expected results.
- expected_metrics = ('hard\n'
- ' mAP=70.0\n'
- ' mP@k[1 5] [100. 80.]\n'
- ' mR@k[1 5] [50. 80.]\n'
- 'medium\n'
- ' mAP=90.0\n'
- ' mP@k[1 5] [100. 100.]\n'
- ' mR@k[1 5] [ 50. 100.]\n')
-
- # Parse actual results, and compare to expected.
- with tf.io.gfile.GFile(output_path) as f:
- metrics = f.read()
-
- self.assertEqual(metrics, expected_metrics)
-
- def testSaveAndReadMetricsWorks(self):
- # Define inputs.
- mean_average_precision = {'hard': 0.7, 'medium': 0.9}
- mean_precisions = {
- 'hard': np.array([1.0, 0.8]),
- 'medium': np.array([1.0, 1.0])
- }
- mean_recalls = {
- 'hard': np.array([0.5, 0.8]),
- 'medium': np.array([0.5, 1.0])
- }
- pr_ranks = [1, 5]
- output_path = os.path.join(FLAGS.test_tmpdir, 'metrics.txt')
-
- # Run tested functions.
- dataset.SaveMetricsFile(mean_average_precision, mean_precisions,
- mean_recalls, pr_ranks, output_path)
- (read_mean_average_precision, read_pr_ranks, read_mean_precisions,
- read_mean_recalls) = dataset.ReadMetricsFile(output_path)
-
- # Compares actual and expected metrics.
- self.assertEqual(read_mean_average_precision, mean_average_precision)
- self.assertEqual(read_pr_ranks, pr_ranks)
- self.assertEqual(read_mean_precisions.keys(), mean_precisions.keys())
- self.assertAllEqual(read_mean_precisions['hard'], mean_precisions['hard'])
- self.assertAllEqual(read_mean_precisions['medium'],
- mean_precisions['medium'])
- self.assertEqual(read_mean_recalls.keys(), mean_recalls.keys())
- self.assertAllEqual(read_mean_recalls['hard'], mean_recalls['hard'])
- self.assertAllEqual(read_mean_recalls['medium'], mean_recalls['medium'])
-
- def testReadMetricsWithRepeatedProtocolFails(self):
- # Define inputs.
- input_path = os.path.join(FLAGS.test_tmpdir, 'metrics.txt')
- with tf.io.gfile.GFile(input_path, 'w') as f:
- f.write('hard\n'
- ' mAP=70.0\n'
- ' mP@k[1 5] [ 100. 80.]\n'
- ' mR@k[1 5] [ 50. 80.]\n'
- 'medium\n'
- ' mAP=90.0\n'
- ' mP@k[1 5] [ 100. 100.]\n'
- ' mR@k[1 5] [ 50. 100.]\n'
- 'medium\n'
- ' mAP=90.0\n'
- ' mP@k[1 5] [ 100. 100.]\n'
- ' mR@k[1 5] [ 50. 100.]\n')
-
- # Run tested functions.
- with self.assertRaisesRegex(ValueError, 'Malformed input'):
- dataset.ReadMetricsFile(input_path)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/datasets/sfm120k/__init__.py b/research/delf/delf/python/datasets/sfm120k/__init__.py
deleted file mode 100644
index 8f8fc48f4a6..00000000000
--- a/research/delf/delf/python/datasets/sfm120k/__init__.py
+++ /dev/null
@@ -1,23 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Module exposing Sfm120k dataset for training."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-# pylint: disable=unused-import
-from delf.python.datasets.sfm120k import sfm120k
-# pylint: enable=unused-import
diff --git a/research/delf/delf/python/datasets/sfm120k/dataset_download.py b/research/delf/delf/python/datasets/sfm120k/dataset_download.py
deleted file mode 100644
index ba6b17feaf2..00000000000
--- a/research/delf/delf/python/datasets/sfm120k/dataset_download.py
+++ /dev/null
@@ -1,103 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Structure-from-Motion dataset (Sfm120k) download function."""
-
-import os
-
-import tensorflow as tf
-
-
-def download_train(data_dir):
- """Checks, and, if required, downloads the necessary files for the training.
-
- Checks if the data necessary for running the example training script exist.
- If not, it downloads it in the following folder structure:
- DATA_ROOT/train/retrieval-SfM-120k/ : folder with rsfm120k images and db
- files.
- DATA_ROOT/train/retrieval-SfM-30k/ : folder with rsfm30k images and db
- files.
- """
-
- # Create the data folder if it does not exist.
- if not tf.io.gfile.exists(data_dir):
- tf.io.gfile.mkdir(data_dir)
-
- # Create the datasets folder if it does not exist.
- datasets_dir = os.path.join(data_dir, 'train')
- if not tf.io.gfile.exists(datasets_dir):
- tf.io.gfile.mkdir(datasets_dir)
-
- # Download folder train/retrieval-SfM-120k/.
- src_dir = 'http://cmp.felk.cvut.cz/cnnimageretrieval/data/train/ims'
- dst_dir = os.path.join(datasets_dir, 'retrieval-SfM-120k', 'ims')
- download_file = 'ims.tar.gz'
- if not tf.io.gfile.exists(dst_dir):
- src_file = os.path.join(src_dir, download_file)
- dst_file = os.path.join(dst_dir, download_file)
- print('>> Image directory does not exist. Creating: {}'.format(dst_dir))
- tf.io.gfile.makedirs(dst_dir)
- print('>> Downloading ims.tar.gz...')
- os.system('wget {} -O {}'.format(src_file, dst_file))
- print('>> Extracting {}...'.format(dst_file))
- os.system('tar -zxf {} -C {}'.format(dst_file, dst_dir))
- print('>> Extracted, deleting {}...'.format(dst_file))
- os.system('rm {}'.format(dst_file))
-
- # Create symlink for train/retrieval-SfM-30k/.
- dst_dir_old = os.path.join(datasets_dir, 'retrieval-SfM-120k', 'ims')
- dst_dir = os.path.join(datasets_dir, 'retrieval-SfM-30k', 'ims')
- if not (tf.io.gfile.exists(dst_dir) or os.path.islink(dst_dir)):
- tf.io.gfile.makedirs(os.path.join(datasets_dir, 'retrieval-SfM-30k'))
- os.system('ln -s {} {}'.format(dst_dir_old, dst_dir))
- print(
- '>> Created symbolic link from retrieval-SfM-120k/ims to '
- 'retrieval-SfM-30k/ims')
-
- # Download db files.
- src_dir = 'http://cmp.felk.cvut.cz/cnnimageretrieval/data/train/dbs'
- datasets = ['retrieval-SfM-120k', 'retrieval-SfM-30k']
- for dataset in datasets:
- dst_dir = os.path.join(datasets_dir, dataset)
- if dataset == 'retrieval-SfM-120k':
- download_files = ['{}.pkl'.format(dataset),
- '{}-whiten.pkl'.format(dataset)]
- download_eccv2020 = '{}-val-eccv2020.pkl'.format(dataset)
- elif dataset == 'retrieval-SfM-30k':
- download_files = ['{}-whiten.pkl'.format(dataset)]
- download_eccv2020 = None
-
- if not tf.io.gfile.exists(dst_dir):
- print('>> Dataset directory does not exist. Creating: {}'.format(
- dst_dir))
- tf.io.gfile.mkdir(dst_dir)
-
- for i in range(len(download_files)):
- src_file = os.path.join(src_dir, download_files[i])
- dst_file = os.path.join(dst_dir, download_files[i])
- if not os.path.isfile(dst_file):
- print('>> DB file {} does not exist. Downloading...'.format(
- download_files[i]))
- os.system('wget {} -O {}'.format(src_file, dst_file))
-
- if download_eccv2020:
- eccv2020_dst_file = os.path.join(dst_dir, download_eccv2020)
- if not os.path.isfile(eccv2020_dst_file):
- eccv2020_src_dir = \
- "http://ptak.felk.cvut.cz/personal/toliageo/share/how/dataset/"
- eccv2020_dst_file = os.path.join(dst_dir, download_eccv2020)
- eccv2020_src_file = os.path.join(eccv2020_src_dir,
- download_eccv2020)
- os.system('wget {} -O {}'.format(eccv2020_src_file,
- eccv2020_dst_file))
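
A minimal invocation sketch ('/data/sfm' is a placeholder data root). The first call downloads and extracts the image archive and db files via `wget`/`tar`; data that already exists is skipped on subsequent calls.

```python
from delf.python.datasets.sfm120k import dataset_download

dataset_download.download_train('/data/sfm')  # Placeholder data root.
```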
diff --git a/research/delf/delf/python/datasets/sfm120k/sfm120k.py b/research/delf/delf/python/datasets/sfm120k/sfm120k.py
deleted file mode 100644
index 3be14b10e1e..00000000000
--- a/research/delf/delf/python/datasets/sfm120k/sfm120k.py
+++ /dev/null
@@ -1,143 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Structure-from-Motion dataset (Sfm120k) module.
-
-[1] From Single Image Query to Detailed 3D Reconstruction.
-Johannes L. Schonberger, Filip Radenovic, Ondrej Chum, Jan-Michael Frahm.
-The related paper can be found at: https://ieeexplore.ieee.org/document/7299148.
-"""
-
-import os
-import pickle
-import tensorflow as tf
-
-from delf.python.datasets import tuples_dataset
-from delf.python.datasets import utils
-
-
-def id2filename(image_id, prefix):
- """Creates a training image path out of its id name.
-
- Used for the image mapping in the Sfm120k dataset.
-
- Args:
- image_id: String, image id.
- prefix: String, root directory where images are saved.
-
- Returns:
- filename: String, full image filename.
- """
- if prefix:
- return os.path.join(prefix, image_id[-2:], image_id[-4:-2], image_id[-6:-4],
- image_id)
- else:
- return os.path.join(image_id[-2:], image_id[-4:-2], image_id[-6:-4],
- image_id)
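
An illustrative mapping (mirroring `sfm120k_test.py` further below): the last six characters of the id, read in reversed two-character groups, form the nested directory.

```python
from delf.python.datasets.sfm120k import sfm120k

path = sfm120k.id2filename('29fdc243aeb939388cfdf2d081dc080e',
                           'train/retrieval-SfM-120k/ims/')
# -> 'train/retrieval-SfM-120k/ims/0e/08/dc/29fdc243aeb939388cfdf2d081dc080e'
```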
-
-
-class _Sfm120k(tuples_dataset.TuplesDataset):
- """Structure-from-Motion (Sfm120k) dataset instance.
-
- The dataset contains the image names lists for training and validation,
- the cluster ID (3D model ID) for each image and indices forming
- query-positive pairs of images. The images are loaded per epoch and resized
- on the fly to the desired dimensionality.
- """
-
- def __init__(self, mode, data_root, imsize=None, num_negatives=5,
- num_queries=2000, pool_size=20000, loader=utils.default_loader,
- eccv2020=False):
- """Structure-from-Motion (Sfm120k) dataset initialization.
-
- Args:
- mode: Either 'train' or 'val'.
- data_root: Path to the root directory of the dataset.
- imsize: Integer, defines the maximum size of longer image side.
- num_negatives: Integer, number of negative images per one query.
- num_queries: Integer, number of query images.
- pool_size: Integer, size of the negative image pool, from where the
- hard-negative images are chosen.
- loader: Callable, a function to load an image given its path.
- eccv2020: Bool, whether to use a new validation dataset used with ECCV
- 2020 paper (https://arxiv.org/abs/2007.13172).
-
- Raises:
- ValueError: Raised if `mode` is not one of 'train' or 'val'.
- """
- if mode not in ['train', 'val']:
- raise ValueError(
- "`mode` argument should be either 'train' or 'val', passed as a "
- "String.")
-
- # Setting up the paths for the dataset.
- if eccv2020:
- name = "retrieval-SfM-120k-val-eccv2020"
- else:
- name = "retrieval-SfM-120k"
- db_root = os.path.join(data_root, 'train/retrieval-SfM-120k')
- ims_root = os.path.join(db_root, 'ims/')
-
- # Loading the dataset db file.
- db_filename = os.path.join(db_root, '{}.pkl'.format(name))
-
- with tf.io.gfile.GFile(db_filename, 'rb') as f:
- db = pickle.load(f)[mode]
-
- # Setting full paths for the dataset images.
- self.images = [id2filename(img_name, None) for
- img_name in db['cids']]
-
- # Initializing tuples dataset.
- super().__init__(name, mode, db_root, imsize, num_negatives, num_queries,
- pool_size, loader, ims_root)
-
- def Sfm120kInfo(self):
- """Metadata for the Sfm120k dataset.
-
- The dataset contains the image names lists for training and
- validation, the cluster ID (3D model ID) for each image and indices
- forming query-positive pairs of images. The images are loaded per epoch
- and resized on the fly to the desired dimensionality.
-
- Returns:
- info: dictionary with the dataset parameters.
- """
- info = {'train': {'clusters': 91642, 'pidxs': 181697, 'qidxs': 181697},
- 'val': {'clusters': 6403, 'pidxs': 1691, 'qidxs': 1691}}
- return info
-
-
-def CreateDataset(mode, data_root, imsize=None, num_negatives=5,
- num_queries=2000, pool_size=20000,
- loader=utils.default_loader, eccv2020=False):
- """Creates Structure-from-Motion (Sfm120k) dataset.
-
- Args:
- mode: String, either 'train' or 'val'.
- data_root: Path to the root directory of the dataset.
- imsize: Integer, defines the maximum size of longer image side.
- num_negatives: Integer, number of negative images per one query.
- num_queries: Integer, number of query images.
- pool_size: Integer, size of the negative image pool, from where the
- hard-negative images are chosen.
- loader: Callable, a function to load an image given its path.
- eccv2020: Bool, whether to use a new validation dataset used with ECCV
- 2020 paper (https://arxiv.org/abs/2007.13172).
-
- Returns:
- sfm120k: Sfm120k dataset instance.
- """
- return _Sfm120k(mode, data_root, imsize, num_negatives, num_queries,
- pool_size, loader, eccv2020)
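
A construction sketch, assuming the placeholder data root '/data/sfm' contains `train/retrieval-SfM-120k/retrieval-SfM-120k.pkl` and the `ims/` folder laid out by `dataset_download.download_train`.

```python
from delf.python.datasets.sfm120k import sfm120k

train_dataset = sfm120k.CreateDataset('train', '/data/sfm', imsize=1024)
val_dataset = sfm120k.CreateDataset('val', '/data/sfm', imsize=1024)
# Tuples only exist after hard-negative mining with a current model, e.g.
# train_dataset.create_epoch_tuples(net); see the epoch-loop sketch after
# tuples_dataset.py below.
```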
diff --git a/research/delf/delf/python/datasets/sfm120k/sfm120k_test.py b/research/delf/delf/python/datasets/sfm120k/sfm120k_test.py
deleted file mode 100644
index 5b253200eb2..00000000000
--- a/research/delf/delf/python/datasets/sfm120k/sfm120k_test.py
+++ /dev/null
@@ -1,37 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for Sfm120k dataset module."""
-
-import tensorflow as tf
-
-from delf.python.datasets.sfm120k import sfm120k
-
-
-class Sfm120kTest(tf.test.TestCase):
- """Tests for Sfm120k dataset module."""
-
- def testId2Filename(self):
- """Tests conversion of image id to full path mapping."""
- image_id = "29fdc243aeb939388cfdf2d081dc080e"
- prefix = "train/retrieval-SfM-120k/ims/"
- path = sfm120k.id2filename(image_id, prefix)
- expected_path = "train/retrieval-SfM-120k/ims/0e/08/dc" \
- "/29fdc243aeb939388cfdf2d081dc080e"
- self.assertEqual(path, expected_path)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/datasets/tuples_dataset.py b/research/delf/delf/python/datasets/tuples_dataset.py
deleted file mode 100644
index 8449c060fb1..00000000000
--- a/research/delf/delf/python/datasets/tuples_dataset.py
+++ /dev/null
@@ -1,328 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tuple dataset module.
-
-Based on the Radenovic et al. ECCV16: CNN image retrieval learns from BoW.
-For more information refer to https://arxiv.org/abs/1604.02426.
-"""
-
-import os
-import pickle
-
-import numpy as np
-import tensorflow as tf
-
-from delf.python.datasets import utils as image_loading_utils
-from delf.python.training import global_features_utils
-from delf.python.training.model import global_model
-
-
-class TuplesDataset():
- """Data loader that loads training and validation tuples.
-
- After initialization, the function create_epoch_tuples() should be called to
- create the dataset tuples. After that, the dataset can be iterated through
- using the next() function.
- Tuples are based on Radenovic et al. ECCV16 work: CNN image retrieval
- learns from BoW. For more information refer to
- https://arxiv.org/abs/1604.02426.
- """
-
- def __init__(self, name, mode, data_root, imsize=None, num_negatives=5,
- num_queries=2000, pool_size=20000,
- loader=image_loading_utils.default_loader, ims_root=None):
- """TuplesDataset object initialization.
-
- Args:
- name: String, dataset name. I.e. 'retrieval-sfm-120k'.
- mode: 'train' or 'val' for training and validation parts of dataset.
- data_root: Path to the root directory of the dataset.
- imsize: Integer, defines the maximum size of the longer image side.
- num_negatives: Integer, number of negative images for a query image in a
- training tuple.
- num_queries: Integer, number of query images to be processed in one epoch.
- pool_size: Integer, size of the negative image pool, from where the
- hard-negative images are re-mined.
- loader: Callable, a function to load an image given its path.
- ims_root: String, image root directory.
-
- Raises:
- ValueError: If mode is not either 'train' or 'val'.
- """
-
- if mode not in ['train', 'val']:
- raise ValueError(
- "`mode` argument should be either 'train' or 'val', passed as a "
- "String.")
-
- # Loading db.
- db_filename = os.path.join(data_root, '{}.pkl'.format(name))
- with tf.io.gfile.GFile(db_filename, 'rb') as f:
- db = pickle.load(f)[mode]
-
- # Initializing tuples dataset.
- self._ims_root = data_root if ims_root is None else ims_root
- self._name = name
- self._mode = mode
- self._imsize = imsize
- self._clusters = db['cluster']
- self._query_pool = db['qidxs']
- self._positive_pool = db['pidxs']
-
- if not hasattr(self, 'images'):
- self.images = db['ids']
-
- # Size of training subset for an epoch.
- self._num_negatives = num_negatives
- self._num_queries = min(num_queries, len(self._query_pool))
- self._pool_size = min(pool_size, len(self.images))
- self._qidxs = None
- self._pidxs = None
- self._nidxs = None
-
- self._loader = loader
- self._print_freq = 10
- # Indexer for the iterator.
- self._n = 0
-
- def __iter__(self):
- """Function for making TupleDataset an iterator.
-
- Returns:
- iter: The iterator object itself (TupleDataset).
- """
- return self
-
- def __next__(self):
- """Function for making TupleDataset an iterator.
-
- Returns:
- next: The next item in the sequence (next dataset image tuple).
- """
- if self._n < len(self._qidxs):
- result = self.__getitem__(self._n)
- self._n += 1
- return result
- else:
- raise StopIteration
-
- def _img_names_to_full_path(self, image_list):
- """Converts list of image names to the list of full paths to the images.
-
- Args:
- image_list: Image names, either a list or a single image path.
-
- Returns:
- image_full_paths: List of full paths to the images.
- """
- if not isinstance(image_list, list):
- return os.path.join(self._ims_root, image_list)
- return [os.path.join(self._ims_root, img_name) for img_name in image_list]
-
- def __getitem__(self, index):
- """Called to load an image tuple at the given `index`.
-
- Args:
- index: Integer, index.
-
- Returns:
- output: Tuple [q,p,n1,...,nN, target], loaded 'train'/'val' tuple at
- index of qidxs. `q` is the query image tensor, `p` is the
- corresponding positive image tensor, `n1`,...,`nN` are the negatives
- associated with the query. `target` is a tensor (with the shape [2+N])
- of integer labels corresponding to the tuple list: query (-1),
- positive (1), negative (0).
-
- Raises:
- ValueError: Raised if the query indexes list `qidxs` is empty.
- """
- if self.__len__() == 0:
- raise ValueError(
- "List `qidxs` is empty. Run `dataset.create_epoch_tuples(net)` "
- "method to create subset for `train`/`val`.")
-
- output = []
- # Query image.
- output.append(self._loader(
- self._img_names_to_full_path(self.images[self._qidxs[index]]),
- self._imsize))
- # Positive image.
- output.append(self._loader(
- self._img_names_to_full_path(self.images[self._pidxs[index]]),
- self._imsize))
- # Negative images.
- for nidx in self._nidxs[index]:
- output.append(self._loader(
- self._img_names_to_full_path(self.images[nidx]),
- self._imsize))
- # Labels for the query (-1), positive (1), negative (0) images in the tuple.
- target = tf.convert_to_tensor([-1, 1] + [0] * self._num_negatives)
- output.append(target)
-
- return tuple(output)
-
- def __len__(self):
- """Called to implement the built-in function len().
-
- Returns:
- len: Integer, number of query images.
- """
- if self._qidxs is None:
- return 0
- return len(self._qidxs)
-
- def __repr__(self):
- """Metadata for the TupleDataset.
-
- Returns:
- meta: String, containing TupleDataset meta.
- """
- fmt_str = self.__class__.__name__ + '\n'
- fmt_str += '\tName and mode: {} {}\n'.format(self._name, self._mode)
- fmt_str += '\tNumber of images: {}\n'.format(len(self.images))
- fmt_str += '\tNumber of training tuples: {}\n'.format(len(self._query_pool))
- fmt_str += '\tNumber of negatives per tuple: {}\n'.format(
- self._num_negatives)
- fmt_str += '\tNumber of tuples processed in an epoch: {}\n'.format(
- self._num_queries)
- fmt_str += '\tPool size for negative remining: {}\n'.format(self._pool_size)
- return fmt_str
-
- def create_epoch_tuples(self, net):
- """Creates epoch tuples with the hard-negative re-mining.
-
- Negative examples are selected from clusters different than the cluster
- of the query image, as the clusters are ideally non-overlapping. For
- every query image we choose hard-negatives, that is, non-matching images
- with the most similar descriptor. Hard-negatives depend on the current
- CNN parameters. K-nearest neighbors from all non-matching images are
- selected. Query images are selected randomly. Positive examples are
- fixed for the related query image during the whole training process.
-
- Args:
- net: Model, network to be used for negative re-mining.
-
- Raises:
- ValueError: If the pool_size is smaller than the number of negative
- images per tuple.
-
- Returns:
- avg_l2: Float, average negative L2-distance.
- """
- self._n = 0
-
- if self._num_negatives > self._pool_size:
- raise ValueError("Unable to create epoch tuples. Negative pool_size "
- "should be larger than the number of negative images "
- "per tuple.")
-
- global_features_utils.debug_and_log(
- '>> Creating tuples for an epoch of {}-{}...'.format(self._name,
- self._mode),
- True)
- global_features_utils.debug_and_log(">> Used network: ", True)
- global_features_utils.debug_and_log(net.meta_repr(), True)
-
- ## Selecting queries.
- # Draw `num_queries` random queries for the tuples.
- idx_list = np.arange(len(self._query_pool))
- np.random.shuffle(idx_list)
- idxs2query_pool = idx_list[:self._num_queries]
- self._qidxs = [self._query_pool[i] for i in idxs2query_pool]
-
- ## Selecting positive pairs.
- # Positive examples are fixed for each query during the whole training
- # process.
- self._pidxs = [self._positive_pool[i] for i in idxs2query_pool]
-
- ## Selecting negative pairs.
- # If `num_negatives` = 0 create dummy nidxs.
- # Useful when only positives used for training.
- if self._num_negatives == 0:
- self._nidxs = [[] for _ in range(len(self._qidxs))]
- return 0
-
- # Draw pool_size random images for pool of negatives images.
- neg_idx_list = np.arange(len(self.images))
- np.random.shuffle(neg_idx_list)
- neg_images_idxs = neg_idx_list[:self._pool_size]
-
- global_features_utils.debug_and_log(
- '>> Extracting descriptors for query images...', debug=True)
-
- img_list = self._img_names_to_full_path([self.images[i] for i in
- self._qidxs])
- qvecs = global_model.extract_global_descriptors_from_list(
- net,
- images=img_list,
- image_size=self._imsize,
- print_freq=self._print_freq)
-
- global_features_utils.debug_and_log(
- '>> Extracting descriptors for negative pool...', debug=True)
-
- poolvecs = global_model.extract_global_descriptors_from_list(
- net,
- images=self._img_names_to_full_path([self.images[i] for i in
- neg_images_idxs]),
- image_size=self._imsize,
- print_freq=self._print_freq)
-
- global_features_utils.debug_and_log('>> Searching for hard negatives...',
- debug=True)
-
- # Compute dot product scores and ranks.
- scores = tf.linalg.matmul(poolvecs, qvecs, transpose_a=True)
- ranks = tf.argsort(scores, axis=0, direction='DESCENDING')
-
- sum_ndist = 0.
- n_ndist = 0.
-
- # Selection of negative examples.
- self._nidxs = []
-
- for q, qidx in enumerate(self._qidxs):
- # We are not using the query cluster, those images are potentially
- # positive.
- qcluster = self._clusters[qidx]
- clusters = [qcluster]
- nidxs = []
- rank = 0
-
- while len(nidxs) < self._num_negatives:
- if rank >= tf.shape(ranks)[0]:
- raise ValueError("Unable to create epoch tuples. Number of required "
- "negative images is larger than the number of "
- "clusters in the dataset.")
- potential = neg_images_idxs[ranks[rank, q]]
- # Take at most one image from the same cluster.
- if not self._clusters[potential] in clusters:
- nidxs.append(potential)
- clusters.append(self._clusters[potential])
- dist = tf.norm(qvecs[:, q] - poolvecs[:, ranks[rank, q]],
- axis=0).numpy()
- sum_ndist += dist
- n_ndist += 1
- rank += 1
-
- self._nidxs.append(nidxs)
-
- global_features_utils.debug_and_log(
- '>> Average negative l2-distance: {:.2f}'.format(
- sum_ndist / n_ndist))
-
- # Return average negative L2-distance.
- return sum_ndist / n_ndist
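
A sketch of the intended epoch loop; the data root is a placeholder, the dataset is constructed through the Sfm120k wrapper above, and the model hyperparameters mirror `tuples_dataset_test.py` below.

```python
from delf.python.datasets.sfm120k import sfm120k
from delf.python.training.model import global_model

# Dataset and model construction ('/data/sfm' is a placeholder data root).
train_dataset = sfm120k.CreateDataset('train', '/data/sfm', imsize=1024)
net = global_model.GlobalFeatureNet(
    architecture='ResNet101', pooling='gem', whitening=False, pretrained=True)

# Re-mine hard negatives with the current model, then iterate the tuples.
avg_negative_distance = train_dataset.create_epoch_tuples(net)
for sample in train_dataset:
  *images, target = sample  # Query, positive, N negatives, then labels.
  # ... compute the tuple loss on `images` against `target` here.
```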
diff --git a/research/delf/delf/python/datasets/tuples_dataset_test.py b/research/delf/delf/python/datasets/tuples_dataset_test.py
deleted file mode 100644
index 4a34bb813b4..00000000000
--- a/research/delf/delf/python/datasets/tuples_dataset_test.py
+++ /dev/null
@@ -1,88 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"Tests for the tuples dataset module."
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-import numpy as np
-from PIL import Image
-import tensorflow as tf
-import pickle
-
-from delf.python.datasets import tuples_dataset
-from delf.python.training.model import global_model
-
-FLAGS = flags.FLAGS
-
-
-class TuplesDatasetTest(tf.test.TestCase):
- """Tests for tuples dataset module."""
-
- def testCreateEpochTuples(self):
- """Tests epoch tuple creation."""
- # Create a tuples dataset instance.
- name = 'test_dataset'
- num_queries = 1
- pool_size = 5
- num_negatives = 2
- # Create a ground truth .pkl file.
- gnd = {
- 'train': {'ids': [str(i) + '.png' for i in range(2 * num_queries + pool_size)],
- 'cluster': [0, 0, 1, 2, 3, 4, 5],
- 'qidxs': [0], 'pidxs': [1]}}
- gnd_name = name + '.pkl'
- with tf.io.gfile.GFile(os.path.join(FLAGS.test_tmpdir, gnd_name),
- 'wb') as gnd_file:
- pickle.dump(gnd, gnd_file)
-
- # Create random images for the dataset.
- for i in range(2 * num_queries + pool_size):
- dummy_image = np.random.rand(1024, 750, 3) * 255
- img_out = Image.fromarray(dummy_image.astype('uint8')).convert('RGB')
- filename = os.path.join(FLAGS.test_tmpdir, '{}.png'.format(i))
- img_out.save(filename)
-
- dataset = tuples_dataset.TuplesDataset(
- name=name,
- data_root=FLAGS.test_tmpdir,
- mode='train',
- imsize=1024,
- num_negatives=num_negatives,
- num_queries=num_queries,
- pool_size=pool_size
- )
-
- # Assert that initially no negative images are set.
- self.assertIsNone(dataset._nidxs)
-
- # Initialize a network for negative re-mining.
- model_params = {'architecture': 'ResNet101', 'pooling': 'gem',
- 'whitening': False, 'pretrained': True}
- model = global_model.GlobalFeatureNet(**model_params)
-
- avg_neg_distance = dataset.create_epoch_tuples(model)
-
- # Check that an appropriate number of negative images has been chosen per
- # query.
- self.assertAllEqual(tf.shape(dataset._nidxs), [num_queries, num_negatives])
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/datasets/utils.py b/research/delf/delf/python/datasets/utils.py
deleted file mode 100644
index 596fca99a5d..00000000000
--- a/research/delf/delf/python/datasets/utils.py
+++ /dev/null
@@ -1,74 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Supporting functions for data loading."""
-
-import numpy as np
-from PIL import Image
-
-import tensorflow as tf
-from delf import utils as image_loading_utils
-
-
-def pil_imagenet_loader(path, imsize, bounding_box=None, preprocess=True):
- """Pillow loader for the images.
-
- Args:
- path: Path to image to be loaded.
- imsize: Integer, defines the maximum size of longer image side.
- bounding_box: (x1,y1,x2,y2) tuple to crop the query image.
- preprocess: Bool, whether to preprocess the images in respect to the
- ImageNet dataset.
-
- Returns:
- image: `Tensor`, image in ImageNet suitable format.
- """
- img = image_loading_utils.RgbLoader(path)
-
- if bounding_box is not None:
- imfullsize = max(img.size)
- img = img.crop(bounding_box)
- imsize = imsize * max(img.size) / imfullsize
-
- # Unlike `resize`, `thumbnail` resizes to the largest size that preserves
- # the aspect ratio, making sure that the output image does not exceed the
- # original image size and the size specified in the arguments of thumbnail.
- img.thumbnail((imsize, imsize), Image.ANTIALIAS)
- img = np.array(img)
-
- if preprocess:
- # Preprocessing for ImageNet data. Converts the images from RGB to BGR,
- # then zero-centers each color channel with respect to the ImageNet
- # dataset, without scaling.
- img = tf.keras.applications.imagenet_utils.preprocess_input(
- img, mode='caffe')
-
- return img
-
-
-def default_loader(path, imsize, bounding_box=None, preprocess=True):
- """Default loader for the images is using Pillow.
-
- Args:
- path: Path to image to be loaded.
- imsize: Integer, defines the maximum size of longer image side.
- bounding_box: (x1,y1,x2,y2) tuple to crop the query image.
- preprocess: Bool, whether to preprocess the images in respect to the
- ImageNet dataset.
-
- Returns:
- image: `Tensor`, image in ImageNet suitable format.
- """
- img = pil_imagenet_loader(path, imsize, bounding_box, preprocess)
- return img
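
A small usage sketch for the loaders above ('query.jpg' is a placeholder path).

```python
from delf.python.datasets import utils as image_loading_utils

# Resize so the longer side is at most 1024 px; with preprocess=True the image
# is converted to BGR and zero-centered using ImageNet channel means.
image = image_loading_utils.default_loader('query.jpg', imsize=1024)

# Optionally crop to an (x1, y1, x2, y2) bounding box before resizing.
crop = image_loading_utils.default_loader(
    'query.jpg', imsize=1024, bounding_box=(10, 10, 200, 200), preprocess=False)
```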
diff --git a/research/delf/delf/python/datasets/utils_test.py b/research/delf/delf/python/datasets/utils_test.py
deleted file mode 100644
index e38671bc1b7..00000000000
--- a/research/delf/delf/python/datasets/utils_test.py
+++ /dev/null
@@ -1,76 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for dataset utilities."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-import numpy as np
-from PIL import Image
-import tensorflow as tf
-
-from delf.python.datasets import utils as image_loading_utils
-
-FLAGS = flags.FLAGS
-
-
-class UtilsTest(tf.test.TestCase):
-
- def testDefaultLoader(self):
- # Create a dummy image.
- dummy_image = np.random.rand(1024, 750, 3) * 255
- img_out = Image.fromarray(dummy_image.astype('uint8')).convert('RGB')
- filename = os.path.join(FLAGS.test_tmpdir, 'test_image.png')
- # Save the dummy image.
- img_out.save(filename)
-
- max_img_size = 1024
- # Load the saved dummy image.
- img = image_loading_utils.default_loader(
- filename, imsize=max_img_size, preprocess=False)
-
- # Make sure the values are the same before and after loading.
- self.assertAllEqual(np.array(img_out), img)
-
- self.assertAllLessEqual(tf.shape(img), max_img_size)
-
- def testDefaultLoaderWithBoundingBox(self):
- # Create a dummy image.
- dummy_image = np.random.rand(1024, 750, 3) * 255
- img_out = Image.fromarray(dummy_image.astype('uint8')).convert('RGB')
- filename = os.path.join(FLAGS.test_tmpdir, 'test_image.png')
- # Save the dummy image.
- img_out.save(filename)
-
- max_img_size = 1024
- # Load the saved dummy image.
- expected_size = 400
- img = image_loading_utils.default_loader(
- filename,
- imsize=max_img_size,
- bounding_box=[120, 120, 120 + expected_size, 120 + expected_size],
- preprocess=False)
-
- # Check that the final shape is as expected.
- self.assertAllEqual(tf.shape(img), [expected_size, expected_size, 3])
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/datum_io.py b/research/delf/delf/python/datum_io.py
deleted file mode 100644
index f0d4cbfd11a..00000000000
--- a/research/delf/delf/python/datum_io.py
+++ /dev/null
@@ -1,221 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Python interface for DatumProto.
-
-DatumProto is protocol buffer used to serialize tensor with arbitrary shape.
-Please refer to datum.proto for details.
-
-Support read and write of DatumProto from/to NumPy array and file.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import tensorflow as tf
-
-from delf import datum_pb2
-
-
-def ArrayToDatum(arr):
- """Converts NumPy array to DatumProto.
-
- Supports arrays of types:
- - float16 (it is converted into a float32 in DatumProto)
- - float32
- - float64 (it is converted into a float32 in DatumProto)
- - uint8 (it is converted into a uint32 in DatumProto)
- - uint16 (it is converted into a uint32 in DatumProto)
- - uint32
- - uint64 (it is converted into a uint32 in DatumProto)
-
- Args:
- arr: NumPy array of arbitrary shape.
-
- Returns:
- datum: DatumProto object.
-
- Raises:
- ValueError: If array type is unsupported.
- """
- datum = datum_pb2.DatumProto()
- if arr.dtype in ('float16', 'float32', 'float64'):
- datum.float_list.value.extend(arr.astype('float32').flat)
- elif arr.dtype in ('uint8', 'uint16', 'uint32', 'uint64'):
- datum.uint32_list.value.extend(arr.astype('uint32').flat)
- else:
- raise ValueError('Unsupported array type: %s' % arr.dtype)
-
- datum.shape.dim.extend(arr.shape)
- return datum
-
-
-def ArraysToDatumPair(arr_1, arr_2):
- """Converts numpy arrays to DatumPairProto.
-
- Supports same formats as `ArrayToDatum`, see documentation therein.
-
- Args:
- arr_1: NumPy array of arbitrary shape.
- arr_2: NumPy array of arbitrary shape.
-
- Returns:
- datum_pair: DatumPairProto object.
- """
- datum_pair = datum_pb2.DatumPairProto()
- datum_pair.first.CopyFrom(ArrayToDatum(arr_1))
- datum_pair.second.CopyFrom(ArrayToDatum(arr_2))
-
- return datum_pair
-
-
-def DatumToArray(datum):
- """Converts data saved in DatumProto to NumPy array.
-
- Args:
- datum: DatumProto object.
-
- Returns:
- NumPy array of arbitrary shape.
- """
- if datum.HasField('float_list'):
- return np.array(datum.float_list.value).astype('float32').reshape(
- datum.shape.dim)
- elif datum.HasField('uint32_list'):
- return np.array(datum.uint32_list.value).astype('uint32').reshape(
- datum.shape.dim)
- else:
- raise ValueError('Input DatumProto does not have float_list or uint32_list')
-
-
-def DatumPairToArrays(datum_pair):
- """Converts data saved in DatumPairProto to NumPy arrays.
-
- Args:
- datum_pair: DatumPairProto object.
-
- Returns:
- Two NumPy arrays of arbitrary shape.
- """
- first_datum = DatumToArray(datum_pair.first)
- second_datum = DatumToArray(datum_pair.second)
- return first_datum, second_datum
-
-
-def SerializeToString(arr):
- """Converts NumPy array to serialized DatumProto.
-
- Args:
- arr: NumPy array of arbitrary shape.
-
- Returns:
- Serialized DatumProto string.
- """
- datum = ArrayToDatum(arr)
- return datum.SerializeToString()
-
-
-def SerializePairToString(arr_1, arr_2):
- """Converts pair of NumPy arrays to serialized DatumPairProto.
-
- Args:
- arr_1: NumPy array of arbitrary shape.
- arr_2: NumPy array of arbitrary shape.
-
- Returns:
- Serialized DatumPairProto string.
- """
- datum_pair = ArraysToDatumPair(arr_1, arr_2)
- return datum_pair.SerializeToString()
-
-
-def ParseFromString(string):
- """Converts serialized DatumProto string to NumPy array.
-
- Args:
- string: Serialized DatumProto string.
-
- Returns:
- NumPy array.
- """
- datum = datum_pb2.DatumProto()
- datum.ParseFromString(string)
- return DatumToArray(datum)
-
-
-def ParsePairFromString(string):
- """Converts serialized DatumPairProto string to NumPy arrays.
-
- Args:
- string: Serialized DatumPairProto string.
-
- Returns:
- Two NumPy arrays.
- """
- datum_pair = datum_pb2.DatumPairProto()
- datum_pair.ParseFromString(string)
- return DatumPairToArrays(datum_pair)
-
-
-def ReadFromFile(file_path):
- """Helper function to load data from a DatumProto format in a file.
-
- Args:
- file_path: Path to file containing data.
-
- Returns:
- data: NumPy array.
- """
- with tf.io.gfile.GFile(file_path, 'rb') as f:
- return ParseFromString(f.read())
-
-
-def ReadPairFromFile(file_path):
- """Helper function to load data from a DatumPairProto format in a file.
-
- Args:
- file_path: Path to file containing data.
-
- Returns:
- Two NumPy arrays.
- """
- with tf.io.gfile.GFile(file_path, 'rb') as f:
- return ParsePairFromString(f.read())
-
-
-def WriteToFile(data, file_path):
- """Helper function to write data to a file in DatumProto format.
-
- Args:
- data: NumPy array.
- file_path: Path to file that will be written.
- """
- serialized_data = SerializeToString(data)
- with tf.io.gfile.GFile(file_path, 'w') as f:
- f.write(serialized_data)
-
-
-def WritePairToFile(arr_1, arr_2, file_path):
- """Helper function to write pair of arrays to a file in DatumPairProto format.
-
- Args:
- arr_1: NumPy array of arbitrary shape.
- arr_2: NumPy array of arbitrary shape.
- file_path: Path to file that will be written.
- """
- serialized_data = SerializePairToString(arr_1, arr_2)
- with tf.io.gfile.GFile(file_path, 'w') as f:
- f.write(serialized_data)
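For context on the I/O helpers removed above, here is a minimal usage sketch of the `datum_io` module (assuming the `delf` package and its generated `datum_pb2` proto are installed; the file path below is purely illustrative):

```python
import numpy as np
from delf import datum_io

# Round-trip a float array through DatumProto serialization.
original = np.random.rand(4, 5).astype('float32')
recovered = datum_io.ParseFromString(datum_io.SerializeToString(original))
assert np.array_equal(original, recovered)

# A pair of arrays (e.g., aggregated descriptors plus visual word ids) can be
# stored together in a single file via DatumPairProto.
datum_io.WritePairToFile(original, np.arange(6, dtype='uint32'),
                         '/tmp/example.datum_pair')
first, second = datum_io.ReadPairFromFile('/tmp/example.datum_pair')
```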
diff --git a/research/delf/delf/python/datum_io_test.py b/research/delf/delf/python/datum_io_test.py
deleted file mode 100644
index f3587a10017..00000000000
--- a/research/delf/delf/python/datum_io_test.py
+++ /dev/null
@@ -1,97 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for datum_io, the python interface of DatumProto."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-import numpy as np
-import tensorflow as tf
-
-from delf import datum_io
-
-FLAGS = flags.FLAGS
-
-
-class DatumIoTest(tf.test.TestCase):
-
- def Conversion2dTestWithType(self, dtype):
- original_data = np.arange(9).reshape(3, 3).astype(dtype)
- serialized = datum_io.SerializeToString(original_data)
- retrieved_data = datum_io.ParseFromString(serialized)
- self.assertTrue(np.array_equal(original_data, retrieved_data))
-
- def Conversion3dTestWithType(self, dtype):
- original_data = np.arange(24).reshape(2, 3, 4).astype(dtype)
- serialized = datum_io.SerializeToString(original_data)
- retrieved_data = datum_io.ParseFromString(serialized)
- self.assertTrue(np.array_equal(original_data, retrieved_data))
-
- # This test covers the following functions: ArrayToDatum, SerializeToString,
- # ParseFromString, DatumToArray.
- def testConversion2dWithType(self):
- self.Conversion2dTestWithType(np.uint16)
- self.Conversion2dTestWithType(np.uint32)
- self.Conversion2dTestWithType(np.uint64)
- self.Conversion2dTestWithType(np.float16)
- self.Conversion2dTestWithType(np.float32)
- self.Conversion2dTestWithType(np.float64)
-
- # This test covers the following functions: ArrayToDatum, SerializeToString,
- # ParseFromString, DatumToArray.
- def testConversion3dWithType(self):
- self.Conversion3dTestWithType(np.uint16)
- self.Conversion3dTestWithType(np.uint32)
- self.Conversion3dTestWithType(np.uint64)
- self.Conversion3dTestWithType(np.float16)
- self.Conversion3dTestWithType(np.float32)
- self.Conversion3dTestWithType(np.float64)
-
- def testConversionWithUnsupportedType(self):
- with self.assertRaisesRegex(ValueError, 'Unsupported array type'):
- self.Conversion3dTestWithType(int)
-
- # This test covers the following functions: ArrayToDatum, SerializeToString,
- # WriteToFile, ReadFromFile, ParseFromString, DatumToArray.
- def testWriteAndReadToFile(self):
- data = np.array([[[-1.0, 125.0, -2.5], [14.5, 3.5, 0.0]],
- [[20.0, 0.0, 30.0], [25.5, 36.0, 42.0]]])
- filename = os.path.join(FLAGS.test_tmpdir, 'test.datum')
- datum_io.WriteToFile(data, filename)
- data_read = datum_io.ReadFromFile(filename)
- self.assertAllEqual(data_read, data)
-
- # This test covers the following functions: ArraysToDatumPair,
- # SerializePairToString, WritePairToFile, ReadPairFromFile,
- # ParsePairFromString, DatumPairToArrays.
- def testWriteAndReadPairToFile(self):
- data_1 = np.array([[[-1.0, 125.0, -2.5], [14.5, 3.5, 0.0]],
- [[20.0, 0.0, 30.0], [25.5, 36.0, 42.0]]])
- data_2 = np.array(
- [[[255, 0, 5], [10, 300, 0]], [[20, 1, 100], [255, 360, 420]]],
- dtype='uint32')
- filename = os.path.join(FLAGS.test_tmpdir, 'test.datum_pair')
- datum_io.WritePairToFile(data_1, data_2, filename)
- data_read_1, data_read_2 = datum_io.ReadPairFromFile(filename)
- self.assertAllEqual(data_read_1, data_1)
- self.assertAllEqual(data_read_2, data_2)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/delg/DELG_INSTRUCTIONS.md b/research/delf/delf/python/delg/DELG_INSTRUCTIONS.md
deleted file mode 100644
index dc72422e87c..00000000000
--- a/research/delf/delf/python/delg/DELG_INSTRUCTIONS.md
+++ /dev/null
@@ -1,175 +0,0 @@
-## DELG instructions
-
-[![Paper](http://img.shields.io/badge/paper-arXiv.2001.05027-B3181B.svg)](https://arxiv.org/abs/2001.05027)
-
-These instructions can be used to reproduce the results from the
-[DELG paper](https://arxiv.org/abs/2001.05027) for the Revisited Oxford/Paris
-datasets.
-
-### Install DELF library
-
-To be able to use this code, please follow
-[these instructions](../../../INSTALL_INSTRUCTIONS.md) to properly install the
-DELF library.
-
-### Download datasets
-
-```bash
-mkdir -p ~/delg/data && cd ~/delg/data
-
-# Oxford dataset.
-wget http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/oxbuild_images.tgz
-mkdir oxford5k_images
-tar -xvzf oxbuild_images.tgz -C oxford5k_images/
-
-# Paris dataset. Download and move all images to same directory.
-wget http://www.robots.ox.ac.uk/~vgg/data/parisbuildings/paris_1.tgz
-wget http://www.robots.ox.ac.uk/~vgg/data/parisbuildings/paris_2.tgz
-mkdir paris6k_images_tmp
-tar -xvzf paris_1.tgz -C paris6k_images_tmp/
-tar -xvzf paris_2.tgz -C paris6k_images_tmp/
-mkdir paris6k_images
-mv paris6k_images_tmp/paris/*/*.jpg paris6k_images/
-
-# Revisited annotations.
-wget http://cmp.felk.cvut.cz/revisitop/data/datasets/roxford5k/gnd_roxford5k.mat
-wget http://cmp.felk.cvut.cz/revisitop/data/datasets/rparis6k/gnd_rparis6k.mat
-wget http://cmp.felk.cvut.cz/cnnimageretrieval/data/test/roxford5k/gnd_roxford5k.pkl
-wget http://cmp.felk.cvut.cz/cnnimageretrieval/data/test/rparis6k/gnd_rparis6k.pkl
-```
-
-### Download model
-
-This is necessary to reproduce the main paper results. This example uses the
-R50-DELG model pretrained on GLD; for other variants (e.g., R101, or models
-trained on GLDv2-clean), see the available pre-trained models
-[here](../../../README.md#pre-trained-models).
-
-```bash
-# From models/research/delf/delf/python/delg
-mkdir parameters && cd parameters
-
-# R50-DELG-GLD model.
-wget http://storage.googleapis.com/delf/r50delg_gld_20200814.tar.gz
-tar -xvzf r50delg_gld_20200814.tar.gz
-```
-
-### Feature extraction
-
-We present here commands for R50-DELG (pretrained on GLD) extraction on
-`roxford5k`.
-
-- To use the R101-DELG model pretrained on GLD, first download it as mentioned
- above; then, replace the below argument `delf_config_path` by
- `r101delg_gld_config.pbtxt`
-- To use the R50-DELG model pretrained on GLDv2-clean, first download it as
- mentioned above; then, replace the below argument `delf_config_path` by
- `r50delg_gldv2clean_config.pbtxt`
-- To use the R101-DELG model pretrained on GLDv2-clean, first download it as
- mentioned above; then, replace the below argument `delf_config_path` by
- `r101delg_gldv2clean_config.pbtxt`
-- To extract on `rparis6k` instead, please edit the arguments accordingly
- (especially the `dataset_file_path` argument).
-
-#### Query feature extraction
-
-For query feature extraction, the cropped query image should be used to extract
-features, according to the Revisited Oxford/Paris experimental protocol. Note
-that this is done in the `extract_features` script, when setting
-`image_set=query`.
-
-Query feature extraction can be run as follows:
-
-```bash
-# From models/research/delf/delf/python/delg
-python3 extract_features.py \
- --delf_config_path r50delg_gld_config.pbtxt \
- --dataset_file_path ~/delg/data/gnd_roxford5k.mat \
- --images_dir ~/delg/data/oxford5k_images \
- --image_set query \
- --output_features_dir ~/delg/data/oxford5k_features/query
-```
-
-#### Index feature extraction
-
-Run index feature extraction as follows:
-
-```bash
-# From models/research/delf/delf/python/delg
-python3 extract_features.py \
- --delf_config_path r50delg_gld_config.pbtxt \
- --dataset_file_path ~/delg/data/gnd_roxford5k.mat \
- --images_dir ~/delg/data/oxford5k_images \
- --image_set index \
- --output_features_dir ~/delg/data/oxford5k_features/index
-```
-
-### Perform retrieval
-
-To run retrieval on `roxford5k`, the following command can be used:
-
-```bash
-# From models/research/delf/delf/python/delg
-python3 perform_retrieval.py \
- --dataset_file_path ~/delg/data/gnd_roxford5k.mat \
- --query_features_dir ~/delg/data/oxford5k_features/query \
- --index_features_dir ~/delg/data/oxford5k_features/index \
- --output_dir ~/delg/results/oxford5k
-```
-
-A file named `metrics.txt` will be written to the path given in
-`output_dir`, with retrieval metrics for an experiment where geometric
-verification is not used. The contents should look approximately like:
-
-```
-hard
- mAP=45.11
- mP@k[ 1 5 10] [85.71 72.29 60.14]
- mR@k[ 1 5 10] [19.15 29.72 36.32]
-medium
- mAP=69.71
- mP@k[ 1 5 10] [95.71 92. 86.86]
- mR@k[ 1 5 10] [10.17 25.94 33.83]
-```
-
-which are the results presented in Table 3 of the paper.
-
-If you want to run retrieval with geometric verification, set
-`use_geometric_verification` to `True`. It's much slower since (1) in this code
-example the re-ranking loads DELF local features from disk, and (2) re-ranking
-needs to be performed separately for each dataset protocol, since the junk
-images from each protocol must be removed during re-ranking. Here is an
-example command:
-
-```bash
-# From models/research/delf/delf/python/delg
-python3 perform_retrieval.py \
- --dataset_file_path ~/delg/data/gnd_roxford5k.mat \
- --query_features_dir ~/delg/data/oxford5k_features/query \
- --index_features_dir ~/delg/data/oxford5k_features/index \
- --use_geometric_verification \
- --output_dir ~/delg/results/oxford5k_with_gv
-```
-
-The `metrics.txt` should now show:
-
-```
-hard
- mAP=45.11
- mP@k[ 1 5 10] [85.71 72.29 60.14]
- mR@k[ 1 5 10] [19.15 29.72 36.32]
-hard_after_gv
- mAP=53.72
- mP@k[ 1 5 10] [91.43 83.81 74.38]
- mR@k[ 1 5 10] [19.45 34.45 44.64]
-medium
- mAP=69.71
- mP@k[ 1 5 10] [95.71 92. 86.86]
- mR@k[ 1 5 10] [10.17 25.94 33.83]
-medium_after_gv
- mAP=75.42
- mP@k[ 1 5 10] [97.14 95.24 93.81]
- mR@k[ 1 5 10] [10.21 27.21 37.72]
-```
-
-which, again, are the results presented in Table 3 of the paper.
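As a pointer to what the retrieval script does without geometric verification: the ranking reduces to a brute-force dot product between the extracted global descriptors. Below is a minimal sketch of that core step only (the full logic lives in `perform_retrieval.py`, shown later in this diff); image names and paths are illustrative, and in practice they come from the dataset ground-truth file:

```python
import os

import numpy as np

from delf import datum_io

features_dir = os.path.expanduser('~/delg/data/oxford5k_features')
# Illustrative names; in practice they are read from the dataset .mat file.
query_name = 'all_souls_000013'
index_names = ['all_souls_000026', 'balliol_000051']

query = datum_io.ReadFromFile(
    os.path.join(features_dir, 'query', query_name + '.delg_global'))
index = np.stack([
    datum_io.ReadFromFile(
        os.path.join(features_dir, 'index', name + '.delg_global'))
    for name in index_names
])

# Brute-force similarity and ranking over global descriptors.
similarities = index.dot(query)
ranks = np.argsort(-similarities)  # Best-matching index images first.
```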
diff --git a/research/delf/delf/python/delg/extract_features.py b/research/delf/delf/python/delg/extract_features.py
deleted file mode 100644
index 4ef10dc9415..00000000000
--- a/research/delf/delf/python/delg/extract_features.py
+++ /dev/null
@@ -1,163 +0,0 @@
-# Lint as: python3
-# Copyright 2020 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Extracts DELG features for images from Revisited Oxford/Paris datasets.
-
-Note that query images are cropped before feature extraction, as required by the
-evaluation protocols of these datasets.
-
-The types of extracted features (local and/or global) depend on the input
-DelfConfig.
-
-The program checks if features already exist, and skips computation for those.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-import time
-
-from absl import app
-from absl import flags
-import numpy as np
-import tensorflow as tf
-
-from google.protobuf import text_format
-from delf import delf_config_pb2
-from delf import datum_io
-from delf import feature_io
-from delf import utils
-from delf.python.datasets.revisited_op import dataset
-from delf import extractor
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string(
- 'delf_config_path', '/tmp/delf_config_example.pbtxt',
- 'Path to DelfConfig proto text file with configuration to be used for DELG '
- 'extraction. Local features are extracted if use_local_features is True; '
- 'global features are extracted if use_global_features is True.')
-flags.DEFINE_string(
- 'dataset_file_path', '/tmp/gnd_roxford5k.mat',
- 'Dataset file for Revisited Oxford or Paris dataset, in .mat format.')
-flags.DEFINE_string(
- 'images_dir', '/tmp/images',
- 'Directory where dataset images are located, all in .jpg format.')
-flags.DEFINE_enum('image_set', 'query', ['query', 'index'],
- 'Whether to extract features from query or index images.')
-flags.DEFINE_string(
- 'output_features_dir', '/tmp/features',
- "Directory where DELG features will be written to. Each image's features "
- 'will be written to files with the same name but different extension: the '
- 'global feature is written to a file with extension .delg_global and the '
- 'local features are written to a file with extension .delg_local.')
-
-# Extensions.
-_DELG_GLOBAL_EXTENSION = '.delg_global'
-_DELG_LOCAL_EXTENSION = '.delg_local'
-_IMAGE_EXTENSION = '.jpg'
-
-# Interval (in number of images) for logging extraction progress.
-_STATUS_CHECK_ITERATIONS = 50
-
-
-def main(argv):
- if len(argv) > 1:
- raise RuntimeError('Too many command-line arguments.')
-
- # Read list of images from dataset file.
- print('Reading list of images from dataset file...')
- query_list, index_list, ground_truth = dataset.ReadDatasetFile(
- FLAGS.dataset_file_path)
- if FLAGS.image_set == 'query':
- image_list = query_list
- else:
- image_list = index_list
- num_images = len(image_list)
- print('done! Found %d images' % num_images)
-
- # Parse DelfConfig proto.
- config = delf_config_pb2.DelfConfig()
- with tf.io.gfile.GFile(FLAGS.delf_config_path, 'r') as f:
- text_format.Parse(f.read(), config)
-
- # Create output directory if necessary.
- if not tf.io.gfile.exists(FLAGS.output_features_dir):
- tf.io.gfile.makedirs(FLAGS.output_features_dir)
-
- extractor_fn = extractor.MakeExtractor(config)
-
- start = time.time()
- for i in range(num_images):
- if i == 0:
- print('Starting to extract features...')
- elif i % _STATUS_CHECK_ITERATIONS == 0:
- elapsed = (time.time() - start)
- print('Processing image %d out of %d, last %d '
- 'images took %f seconds' %
- (i, num_images, _STATUS_CHECK_ITERATIONS, elapsed))
- start = time.time()
-
- image_name = image_list[i]
- input_image_filename = os.path.join(FLAGS.images_dir,
- image_name + _IMAGE_EXTENSION)
-
- # Compose output file name and decide if image should be skipped.
- should_skip_global = True
- should_skip_local = True
- if config.use_global_features:
- output_global_feature_filename = os.path.join(
- FLAGS.output_features_dir, image_name + _DELG_GLOBAL_EXTENSION)
- if not tf.io.gfile.exists(output_global_feature_filename):
- should_skip_global = False
- if config.use_local_features:
- output_local_feature_filename = os.path.join(
- FLAGS.output_features_dir, image_name + _DELG_LOCAL_EXTENSION)
- if not tf.io.gfile.exists(output_local_feature_filename):
- should_skip_local = False
- if should_skip_global and should_skip_local:
- print('Skipping %s' % image_name)
- continue
-
- pil_im = utils.RgbLoader(input_image_filename)
- resize_factor = 1.0
- if FLAGS.image_set == 'query':
- # Crop query image according to bounding box.
- original_image_size = max(pil_im.size)
- bbox = [int(round(b)) for b in ground_truth[i]['bbx']]
- pil_im = pil_im.crop(bbox)
- cropped_image_size = max(pil_im.size)
- resize_factor = cropped_image_size / original_image_size
-
- im = np.array(pil_im)
-
- # Extract and save features.
- extracted_features = extractor_fn(im, resize_factor)
- if config.use_global_features:
- global_descriptor = extracted_features['global_descriptor']
- datum_io.WriteToFile(global_descriptor, output_global_feature_filename)
- if config.use_local_features:
- locations = extracted_features['local_features']['locations']
- descriptors = extracted_features['local_features']['descriptors']
- feature_scales = extracted_features['local_features']['scales']
- attention = extracted_features['local_features']['attention']
- feature_io.WriteToFile(output_local_feature_filename, locations,
- feature_scales, descriptors, attention)
-
-
-if __name__ == '__main__':
- app.run(main)
diff --git a/research/delf/delf/python/delg/measure_latency.py b/research/delf/delf/python/delg/measure_latency.py
deleted file mode 100644
index 966964d1072..00000000000
--- a/research/delf/delf/python/delg/measure_latency.py
+++ /dev/null
@@ -1,119 +0,0 @@
-# Copyright 2020 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Times DELF/G extraction."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import time
-
-from absl import app
-from absl import flags
-import numpy as np
-from six.moves import range
-import tensorflow as tf
-
-from google.protobuf import text_format
-from delf import delf_config_pb2
-from delf import utils
-from delf import extractor
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string(
- 'delf_config_path', '/tmp/delf_config_example.pbtxt',
- 'Path to DelfConfig proto text file with configuration to be used for DELG '
- 'extraction. Local features are extracted if use_local_features is True; '
- 'global features are extracted if use_global_features is True.')
-flags.DEFINE_string('list_images_path', '/tmp/list_images.txt',
- 'Path to list of images whose features will be extracted.')
-flags.DEFINE_integer('repeat_per_image', 10,
- 'Number of times to repeat extraction per image.')
-flags.DEFINE_boolean(
- 'binary_local_features', False,
- 'Whether to binarize local features after extraction, and take this extra '
- 'latency into account. This should only be used if use_local_features is '
- 'set in the input DelfConfig from `delf_config_path`.')
-
-# Interval (in number of images) for logging extraction progress.
-_STATUS_CHECK_ITERATIONS = 100
-
-
-def _ReadImageList(list_path):
- """Helper function to read image paths.
-
- Args:
- list_path: Path to list of images, one image path per line.
-
- Returns:
- image_paths: List of image paths.
- """
- with tf.io.gfile.GFile(list_path, 'r') as f:
- image_paths = f.readlines()
- image_paths = [entry.rstrip() for entry in image_paths]
- return image_paths
-
-
-def main(argv):
- if len(argv) > 1:
- raise RuntimeError('Too many command-line arguments.')
-
- # Read list of images.
- print('Reading list of images...')
- image_paths = _ReadImageList(FLAGS.list_images_path)
- num_images = len(image_paths)
- print(f'done! Found {num_images} images')
-
- # Load images in memory.
- print('Loading images, %d times per image...' % FLAGS.repeat_per_image)
- im_array = []
- for filename in image_paths:
- im = np.array(utils.RgbLoader(filename))
- for _ in range(FLAGS.repeat_per_image):
- im_array.append(im)
- np.random.shuffle(im_array)
- print('done!')
-
- # Parse DelfConfig proto.
- config = delf_config_pb2.DelfConfig()
- with tf.io.gfile.GFile(FLAGS.delf_config_path, 'r') as f:
- text_format.Parse(f.read(), config)
-
- extractor_fn = extractor.MakeExtractor(config)
-
- start = time.time()
- for i, im in enumerate(im_array):
- if i == 0:
- print('Starting to extract DELF features from images...')
- elif i % _STATUS_CHECK_ITERATIONS == 0:
- elapsed = (time.time() - start)
- print(f'Processing image {i} out of {len(im_array)}, last '
- f'{_STATUS_CHECK_ITERATIONS} images took {elapsed} seconds, '
- f'i.e. {elapsed/_STATUS_CHECK_ITERATIONS} secs/image.')
- start = time.time()
-
- # Extract and save features.
- extracted_features = extractor_fn(im)
-
- # Binarize local features, if desired (and if there are local features).
- if (config.use_local_features and FLAGS.binary_local_features and
- extracted_features['local_features']['attention'].size):
- packed_descriptors = np.packbits(
- extracted_features['local_features']['descriptors'] > 0, axis=1)
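- # Note: `packed_descriptors` is not used further; binarization is performed
- # here only so its latency is included in the measurement (see the
- # `binary_local_features` flag).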
-
-
-if __name__ == '__main__':
- app.run(main)
diff --git a/research/delf/delf/python/delg/perform_retrieval.py b/research/delf/delf/python/delg/perform_retrieval.py
deleted file mode 100644
index dc380077c56..00000000000
--- a/research/delf/delf/python/delg/perform_retrieval.py
+++ /dev/null
@@ -1,224 +0,0 @@
-# Lint as: python3
-# Copyright 2020 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Performs DELG-based image retrieval on Revisited Oxford/Paris datasets."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-import time
-
-from absl import app
-from absl import flags
-import numpy as np
-import tensorflow as tf
-
-from delf import datum_io
-from delf.python.datasets.revisited_op import dataset
-from delf.python.detect_to_retrieve import image_reranking
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string(
- 'dataset_file_path', '/tmp/gnd_roxford5k.mat',
- 'Dataset file for Revisited Oxford or Paris dataset, in .mat format.')
-flags.DEFINE_string('query_features_dir', '/tmp/features/query',
- 'Directory where query DELG features are located.')
-flags.DEFINE_string('index_features_dir', '/tmp/features/index',
- 'Directory where index DELG features are located.')
-flags.DEFINE_boolean(
- 'use_geometric_verification', False,
- 'If True, performs re-ranking using local feature-based geometric '
- 'verification.')
-flags.DEFINE_float(
- 'local_descriptor_matching_threshold', 1.0,
- 'Optional, only used if `use_geometric_verification` is True. '
- 'Threshold below which a pair of local descriptors is considered '
- 'a potential match, and will be fed into RANSAC.')
-flags.DEFINE_float(
- 'ransac_residual_threshold', 20.0,
- 'Optional, only used if `use_geometric_verification` is True. '
- 'Residual error threshold for considering matches as inliers, used in '
- 'RANSAC algorithm.')
-flags.DEFINE_boolean(
- 'use_ratio_test', False,
- 'Optional, only used if `use_geometric_verification` is True. '
- 'Whether to use ratio test for local feature matching.')
-flags.DEFINE_string(
- 'output_dir', '/tmp/retrieval',
- 'Directory where retrieval output will be written to. A file containing '
- "metrics for this run is saved therein, with file name 'metrics.txt'.")
-
-# Extensions.
-_DELG_GLOBAL_EXTENSION = '.delg_global'
-_DELG_LOCAL_EXTENSION = '.delg_local'
-
-# Precision-recall ranks to use in metric computation.
-_PR_RANKS = (1, 5, 10)
-
-# Interval (in number of images) for logging descriptor loading progress.
-_STATUS_CHECK_LOAD_ITERATIONS = 50
-
-# Output file names.
-_METRICS_FILENAME = 'metrics.txt'
-
-
-def _ReadDelgGlobalDescriptors(input_dir, image_list):
- """Reads DELG global features.
-
- Args:
- input_dir: Directory where features are located.
- image_list: List of image names for which to load features.
-
- Returns:
- global_descriptors: NumPy array of shape (len(image_list), D), where D
- corresponds to the global descriptor dimensionality.
- """
- num_images = len(image_list)
- global_descriptors = []
- print('Starting to collect global descriptors for %d images...' % num_images)
- start = time.time()
- for i in range(num_images):
- if i > 0 and i % _STATUS_CHECK_LOAD_ITERATIONS == 0:
- elapsed = (time.time() - start)
- print('Reading global descriptors for image %d out of %d, last %d '
- 'images took %f seconds' %
- (i, num_images, _STATUS_CHECK_LOAD_ITERATIONS, elapsed))
- start = time.time()
-
- descriptor_filename = image_list[i] + _DELG_GLOBAL_EXTENSION
- descriptor_fullpath = os.path.join(input_dir, descriptor_filename)
- global_descriptors.append(datum_io.ReadFromFile(descriptor_fullpath))
-
- return np.array(global_descriptors)
-
-
-def main(argv):
- if len(argv) > 1:
- raise RuntimeError('Too many command-line arguments.')
-
- # Parse dataset to obtain query/index images, and ground-truth.
- print('Parsing dataset...')
- query_list, index_list, ground_truth = dataset.ReadDatasetFile(
- FLAGS.dataset_file_path)
- num_query_images = len(query_list)
- num_index_images = len(index_list)
- (_, medium_ground_truth,
- hard_ground_truth) = dataset.ParseEasyMediumHardGroundTruth(ground_truth)
- print('done! Found %d queries and %d index images' %
- (num_query_images, num_index_images))
-
- # Read global features.
- query_global_features = _ReadDelgGlobalDescriptors(FLAGS.query_features_dir,
- query_list)
- index_global_features = _ReadDelgGlobalDescriptors(FLAGS.index_features_dir,
- index_list)
-
- # Compute similarity between query and index images, potentially re-ranking
- # with geometric verification.
- ranks_before_gv = np.zeros([num_query_images, num_index_images],
- dtype='int32')
- if FLAGS.use_geometric_verification:
- medium_ranks_after_gv = np.zeros([num_query_images, num_index_images],
- dtype='int32')
- hard_ranks_after_gv = np.zeros([num_query_images, num_index_images],
- dtype='int32')
- for i in range(num_query_images):
- print('Performing retrieval with query %d (%s)...' % (i, query_list[i]))
- start = time.time()
-
- # Compute similarity between global descriptors.
- similarities = np.dot(index_global_features, query_global_features[i])
- ranks_before_gv[i] = np.argsort(-similarities)
-
- # Re-rank using geometric verification.
- if FLAGS.use_geometric_verification:
- medium_ranks_after_gv[i] = image_reranking.RerankByGeometricVerification(
- input_ranks=ranks_before_gv[i],
- initial_scores=similarities,
- query_name=query_list[i],
- index_names=index_list,
- query_features_dir=FLAGS.query_features_dir,
- index_features_dir=FLAGS.index_features_dir,
- junk_ids=set(medium_ground_truth[i]['junk']),
- local_feature_extension=_DELG_LOCAL_EXTENSION,
- ransac_seed=0,
- descriptor_matching_threshold=FLAGS
- .local_descriptor_matching_threshold,
- ransac_residual_threshold=FLAGS.ransac_residual_threshold,
- use_ratio_test=FLAGS.use_ratio_test)
- hard_ranks_after_gv[i] = image_reranking.RerankByGeometricVerification(
- input_ranks=ranks_before_gv[i],
- initial_scores=similarities,
- query_name=query_list[i],
- index_names=index_list,
- query_features_dir=FLAGS.query_features_dir,
- index_features_dir=FLAGS.index_features_dir,
- junk_ids=set(hard_ground_truth[i]['junk']),
- local_feature_extension=_DELG_LOCAL_EXTENSION,
- ransac_seed=0,
- descriptor_matching_threshold=FLAGS
- .local_descriptor_matching_threshold,
- ransac_residual_threshold=FLAGS.ransac_residual_threshold,
- use_ratio_test=FLAGS.use_ratio_test)
-
- elapsed = (time.time() - start)
- print('done! Retrieval for query %d took %f seconds' % (i, elapsed))
-
- # Create output directory if necessary.
- if not tf.io.gfile.exists(FLAGS.output_dir):
- tf.io.gfile.makedirs(FLAGS.output_dir)
-
- # Compute metrics.
- medium_metrics = dataset.ComputeMetrics(ranks_before_gv, medium_ground_truth,
- _PR_RANKS)
- hard_metrics = dataset.ComputeMetrics(ranks_before_gv, hard_ground_truth,
- _PR_RANKS)
- if FLAGS.use_geometric_verification:
- medium_metrics_after_gv = dataset.ComputeMetrics(medium_ranks_after_gv,
- medium_ground_truth,
- _PR_RANKS)
- hard_metrics_after_gv = dataset.ComputeMetrics(hard_ranks_after_gv,
- hard_ground_truth, _PR_RANKS)
-
- # Write metrics to file.
- mean_average_precision_dict = {
- 'medium': medium_metrics[0],
- 'hard': hard_metrics[0]
- }
- mean_precisions_dict = {'medium': medium_metrics[1], 'hard': hard_metrics[1]}
- mean_recalls_dict = {'medium': medium_metrics[2], 'hard': hard_metrics[2]}
- if FLAGS.use_geometric_verification:
- mean_average_precision_dict.update({
- 'medium_after_gv': medium_metrics_after_gv[0],
- 'hard_after_gv': hard_metrics_after_gv[0]
- })
- mean_precisions_dict.update({
- 'medium_after_gv': medium_metrics_after_gv[1],
- 'hard_after_gv': hard_metrics_after_gv[1]
- })
- mean_recalls_dict.update({
- 'medium_after_gv': medium_metrics_after_gv[2],
- 'hard_after_gv': hard_metrics_after_gv[2]
- })
- dataset.SaveMetricsFile(mean_average_precision_dict, mean_precisions_dict,
- mean_recalls_dict, _PR_RANKS,
- os.path.join(FLAGS.output_dir, _METRICS_FILENAME))
-
-
-if __name__ == '__main__':
- app.run(main)
diff --git a/research/delf/delf/python/delg/r101delg_gld_config.pbtxt b/research/delf/delf/python/delg/r101delg_gld_config.pbtxt
deleted file mode 100644
index ea8a70b53df..00000000000
--- a/research/delf/delf/python/delg/r101delg_gld_config.pbtxt
+++ /dev/null
@@ -1,22 +0,0 @@
-use_local_features: true
-use_global_features: true
-model_path: "parameters/r101delg_gld_20200814"
-image_scales: 0.25
-image_scales: 0.35355338
-image_scales: 0.5
-image_scales: 0.70710677
-image_scales: 1.0
-image_scales: 1.4142135
-image_scales: 2.0
-delf_local_config {
- use_pca: false
- max_feature_num: 1000
- score_threshold: 166.1
-}
-delf_global_config {
- use_pca: false
- image_scales_ind: 3
- image_scales_ind: 4
- image_scales_ind: 5
-}
-max_image_size: 1024
diff --git a/research/delf/delf/python/delg/r101delg_gldv2clean_config.pbtxt b/research/delf/delf/python/delg/r101delg_gldv2clean_config.pbtxt
deleted file mode 100644
index d34a039a4ea..00000000000
--- a/research/delf/delf/python/delg/r101delg_gldv2clean_config.pbtxt
+++ /dev/null
@@ -1,22 +0,0 @@
-use_local_features: true
-use_global_features: true
-model_path: "parameters/r101delg_gldv2clean_20200914"
-image_scales: 0.25
-image_scales: 0.35355338
-image_scales: 0.5
-image_scales: 0.70710677
-image_scales: 1.0
-image_scales: 1.4142135
-image_scales: 2.0
-delf_local_config {
- use_pca: false
- max_feature_num: 1000
- score_threshold: 357.48
-}
-delf_global_config {
- use_pca: false
- image_scales_ind: 3
- image_scales_ind: 4
- image_scales_ind: 5
-}
-max_image_size: 1024
diff --git a/research/delf/delf/python/delg/r50delg_gld_config.pbtxt b/research/delf/delf/python/delg/r50delg_gld_config.pbtxt
deleted file mode 100644
index 4457810b575..00000000000
--- a/research/delf/delf/python/delg/r50delg_gld_config.pbtxt
+++ /dev/null
@@ -1,22 +0,0 @@
-use_local_features: true
-use_global_features: true
-model_path: "parameters/r50delg_gld_20200814"
-image_scales: 0.25
-image_scales: 0.35355338
-image_scales: 0.5
-image_scales: 0.70710677
-image_scales: 1.0
-image_scales: 1.4142135
-image_scales: 2.0
-delf_local_config {
- use_pca: false
- max_feature_num: 1000
- score_threshold: 175.0
-}
-delf_global_config {
- use_pca: false
- image_scales_ind: 3
- image_scales_ind: 4
- image_scales_ind: 5
-}
-max_image_size: 1024
diff --git a/research/delf/delf/python/delg/r50delg_gldv2clean_config.pbtxt b/research/delf/delf/python/delg/r50delg_gldv2clean_config.pbtxt
deleted file mode 100644
index 358d7cbe56c..00000000000
--- a/research/delf/delf/python/delg/r50delg_gldv2clean_config.pbtxt
+++ /dev/null
@@ -1,22 +0,0 @@
-use_local_features: true
-use_global_features: true
-model_path: "parameters/r50delg_gldv2clean_20200914"
-image_scales: 0.25
-image_scales: 0.35355338
-image_scales: 0.5
-image_scales: 0.70710677
-image_scales: 1.0
-image_scales: 1.4142135
-image_scales: 2.0
-delf_local_config {
- use_pca: false
- max_feature_num: 1000
- score_threshold: 454.6
-}
-delf_global_config {
- use_pca: false
- image_scales_ind: 3
- image_scales_ind: 4
- image_scales_ind: 5
-}
-max_image_size: 1024
diff --git a/research/delf/delf/python/detect_to_retrieve/DETECT_TO_RETRIEVE_INSTRUCTIONS.md b/research/delf/delf/python/detect_to_retrieve/DETECT_TO_RETRIEVE_INSTRUCTIONS.md
deleted file mode 100644
index 2d18a328997..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/DETECT_TO_RETRIEVE_INSTRUCTIONS.md
+++ /dev/null
@@ -1,231 +0,0 @@
-## Detect-to-Retrieve instructions
-
-[![Paper](http://img.shields.io/badge/paper-arXiv.1812.01584-B3181B.svg)](https://arxiv.org/abs/1812.01584)
-
-These instructions can be used to reproduce the results from the
-[Detect-to-Retrieve paper](https://arxiv.org/abs/1812.01584) for the Revisited
-Oxford/Paris datasets.
-
-### Install DELF library
-
-To be able to use this code, please follow
-[these instructions](../../../INSTALL_INSTRUCTIONS.md) to properly install the
-DELF library.
-
-### Download datasets
-
-```bash
-mkdir -p ~/detect_to_retrieve/data && cd ~/detect_to_retrieve/data
-
-# Oxford dataset.
-wget http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/oxbuild_images.tgz
-mkdir oxford5k_images
-tar -xvzf oxbuild_images.tgz -C oxford5k_images/
-
-# Paris dataset. Download and move all images to same directory.
-wget http://www.robots.ox.ac.uk/~vgg/data/parisbuildings/paris_1.tgz
-wget http://www.robots.ox.ac.uk/~vgg/data/parisbuildings/paris_2.tgz
-mkdir paris6k_images_tmp
-tar -xvzf paris_1.tgz -C paris6k_images_tmp/
-tar -xvzf paris_2.tgz -C paris6k_images_tmp/
-mkdir paris6k_images
-mv paris6k_images_tmp/paris/*/*.jpg paris6k_images/
-
-# Revisited annotations.
-wget http://cmp.felk.cvut.cz/revisitop/data/datasets/roxford5k/gnd_roxford5k.mat
-wget http://cmp.felk.cvut.cz/revisitop/data/datasets/rparis6k/gnd_rparis6k.mat
-```
-
-### Download models
-
-These are necessary to reproduce the main paper results:
-
-```bash
-# From models/research/delf/delf/python/detect_to_retrieve
-mkdir parameters && cd parameters
-
-# DELF-GLD model.
-wget http://storage.googleapis.com/delf/delf_gld_20190411.tar.gz
-tar -xvzf delf_gld_20190411.tar.gz
-
-# Faster-RCNN detector model.
-wget http://storage.googleapis.com/delf/d2r_frcnn_20190411.tar.gz
-tar -xvzf d2r_frcnn_20190411.tar.gz
-
-# Codebooks.
-# Note: you should use codebook trained on rparis6k for roxford5k retrieval
-# experiments, and vice-versa.
-wget http://storage.googleapis.com/delf/rparis6k_codebook_65536.tar.gz
-mkdir rparis6k_codebook_65536
-tar -xvzf rparis6k_codebook_65536.tar.gz -C rparis6k_codebook_65536/
-wget http://storage.googleapis.com/delf/roxford5k_codebook_65536.tar.gz
-mkdir roxford5k_codebook_65536
-tar -xvzf roxford5k_codebook_65536.tar.gz -C roxford5k_codebook_65536/
-```
-
-We also make available other models/parameters that can be used to reproduce
-more results from the paper:
-
-- [MobileNet-SSD trained detector](http://storage.googleapis.com/delf/d2r_mnetssd_20190411.tar.gz).
-- Codebooks with 1024 centroids:
- [rparis6k](http://storage.googleapis.com/delf/rparis6k_codebook_1024.tar.gz),
- [roxford5k](http://storage.googleapis.com/delf/roxford5k_codebook_1024.tar.gz).
-
-### Feature extraction
-
-We present here commands for extraction on `roxford5k`. To extract on `rparis6k`
-instead, please edit the arguments accordingly (especially the
-`dataset_file_path` argument).
-
-#### Query feature extraction
-
-For query feature extraction, the cropped query image should be used to extract
-features, according to the Revisited Oxford/Paris experimental protocol. Note
-that this is done in the `extract_query_features` script.
-
-Query feature extraction can be run as follows:
-
-```bash
-# From models/research/delf/delf/python/detect_to_retrieve
-python3 extract_query_features.py \
- --delf_config_path delf_gld_config.pbtxt \
- --dataset_file_path ~/detect_to_retrieve/data/gnd_roxford5k.mat \
- --images_dir ~/detect_to_retrieve/data/oxford5k_images \
- --output_features_dir ~/detect_to_retrieve/data/oxford5k_features/query
-```
-
-#### Index feature extraction and box detection
-
-Index feature extraction / box detection can be run as follows:
-
-```bash
-# From models/research/delf/delf/python/detect_to_retrieve
-python3 extract_index_boxes_and_features.py \
- --delf_config_path delf_gld_config.pbtxt \
- --detector_model_dir parameters/d2r_frcnn_20190411 \
- --detector_thresh 0.1 \
- --dataset_file_path ~/detect_to_retrieve/data/gnd_roxford5k.mat \
- --images_dir ~/detect_to_retrieve/data/oxford5k_images \
- --output_boxes_dir ~/detect_to_retrieve/data/oxford5k_boxes/index \
- --output_features_dir ~/detect_to_retrieve/data/oxford5k_features/index_0.1 \
- --output_index_mapping ~/detect_to_retrieve/data/oxford5k_features/index_mapping_0.1.csv
-```
-
-### R-ASMK* aggregation extraction
-
-We present here commands for aggregation extraction on `roxford5k`. To extract
-on `rparis6k` instead, please edit the arguments accordingly. In particular,
-note that feature aggregation on `roxford5k` should use a codebook trained on
-`rparis6k`, and vice-versa (this can be edited in the
-`query_aggregation_config.pbtxt` and `index_aggregation_config.pbtxt` files).
-
-#### Query
-
-Run query feature aggregation as follows:
-
-```bash
-# From models/research/delf/delf/python/detect_to_retrieve
-python3 extract_aggregation.py \
- --use_query_images True \
- --aggregation_config_path query_aggregation_config.pbtxt \
- --dataset_file_path ~/detect_to_retrieve/data/gnd_roxford5k.mat \
- --features_dir ~/detect_to_retrieve/data/oxford5k_features/query \
- --output_aggregation_dir ~/detect_to_retrieve/data/oxford5k_aggregation/query
-```
-
-#### Index
-
-Run index feature aggregation as follows:
-
-```bash
-# From models/research/delf/delf/python/detect_to_retrieve
-python3 extract_aggregation.py \
- --aggregation_config_path index_aggregation_config.pbtxt \
- --dataset_file_path ~/detect_to_retrieve/data/gnd_roxford5k.mat \
- --features_dir ~/detect_to_retrieve/data/oxford5k_features/index_0.1 \
- --index_mapping_path ~/detect_to_retrieve/data/oxford5k_features/index_mapping_0.1.csv \
- --output_aggregation_dir ~/detect_to_retrieve/data/oxford5k_aggregation/index_0.1
-```
-
-### Perform retrieval
-
-Currently, we support retrieval via brute-force comparison of aggregated
-features.
-
-To run retrieval on `roxford5k`, the following command can be used:
-
-```bash
-# From models/research/delf/delf/python/detect_to_retrieve
-python3 perform_retrieval.py \
- --index_aggregation_config_path index_aggregation_config.pbtxt \
- --query_aggregation_config_path query_aggregation_config.pbtxt \
- --dataset_file_path ~/detect_to_retrieve/data/gnd_roxford5k.mat \
- --index_aggregation_dir ~/detect_to_retrieve/data/oxford5k_aggregation/index_0.1 \
- --query_aggregation_dir ~/detect_to_retrieve/data/oxford5k_aggregation/query \
- --output_dir ~/detect_to_retrieve/results/oxford5k
-```
-
-A file named `metrics.txt` will be written to the path given in
-`output_dir`, with retrieval metrics for an experiment where geometric
-verification is not used. The contents should look approximately like:
-
-```
-hard
-mAP=47.61
-mP@k[ 1 5 10] [84.29 73.71 64.43]
-mR@k[ 1 5 10] [18.84 29.44 36.82]
-medium
-mAP=73.3
-mP@k[ 1 5 10] [97.14 94.57 90.14]
-mR@k[ 1 5 10] [10.14 26.2 34.75]
-```
-
-which are the results presented in Table 2 of the paper (with small numerical
-precision differences).
-
-If you want to run retrieval with geometric verification, set
-`use_geometric_verification` to `True` and provide the
-`index_features_dir`/`query_features_dir` arguments. It's much slower since (1)
-in this code example the re-ranking loads DELF local features from disk, and
-(2) re-ranking needs to be performed separately for each dataset protocol,
-since the junk images from each protocol must be removed during re-ranking.
-Here is an
-example command:
-
-```bash
-# From models/research/delf/delf/python/detect_to_retrieve
-python3 perform_retrieval.py \
- --index_aggregation_config_path index_aggregation_config.pbtxt \
- --query_aggregation_config_path query_aggregation_config.pbtxt \
- --dataset_file_path ~/detect_to_retrieve/data/gnd_roxford5k.mat \
- --index_aggregation_dir ~/detect_to_retrieve/data/oxford5k_aggregation/index_0.1 \
- --query_aggregation_dir ~/detect_to_retrieve/data/oxford5k_aggregation/query \
- --use_geometric_verification True \
- --index_features_dir ~/detect_to_retrieve/data/oxford5k_features/index_0.1 \
- --query_features_dir ~/detect_to_retrieve/data/oxford5k_features/query \
- --output_dir ~/detect_to_retrieve/results/oxford5k_with_gv
-```
-
-### Clustering
-
-In the code example above, we used a pre-trained DELF codebook. We also provide
-code for re-training the codebook if desired.
-
-Note that for the time being this can only run on CPU, since the main ops in
-K-means are not registered for GPU usage in Tensorflow.
-
-```bash
-# From models/research/delf/delf/python/detect_to_retrieve
-python3 cluster_delf_features.py \
- --dataset_file_path ~/detect_to_retrieve/data/gnd_rparis6k.mat \
- --features_dir ~/detect_to_retrieve/data/paris6k_features/index_0.1 \
- --num_clusters 1024 \
- --num_iterations 50 \
- --output_cluster_dir ~/detect_to_retrieve/data/paris6k_clusters_1024
-```
-
-### Next steps
-
-To make retrieval more scalable and handle larger datasets more smoothly, we are
-considering providing code for inverted index building and retrieval. Please
-reach out if you would like to help with that -- feel free to submit a pull
-request.
diff --git a/research/delf/delf/python/detect_to_retrieve/__init__.py b/research/delf/delf/python/detect_to_retrieve/__init__.py
deleted file mode 100644
index 82a78321eb8..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/__init__.py
+++ /dev/null
@@ -1,23 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Module for Detect-to-Retrieve technique."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-# pylint: disable=unused-import
-from delf.python.detect_to_retrieve import aggregation_extraction
-from delf.python.detect_to_retrieve import boxes_and_features_extraction
-# pylint: enable=unused-import
diff --git a/research/delf/delf/python/detect_to_retrieve/aggregation_extraction.py b/research/delf/delf/python/detect_to_retrieve/aggregation_extraction.py
deleted file mode 100644
index 4ddab944b8a..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/aggregation_extraction.py
+++ /dev/null
@@ -1,193 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Library to extract/save feature aggregation."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import csv
-import os
-import time
-
-import numpy as np
-import tensorflow as tf
-
-from google.protobuf import text_format
-from delf import aggregation_config_pb2
-from delf import datum_io
-from delf import feature_aggregation_extractor
-from delf import feature_io
-
-# Aliases for aggregation types.
-_VLAD = aggregation_config_pb2.AggregationConfig.VLAD
-_ASMK = aggregation_config_pb2.AggregationConfig.ASMK
-_ASMK_STAR = aggregation_config_pb2.AggregationConfig.ASMK_STAR
-
-# Extensions.
-_DELF_EXTENSION = '.delf'
-_VLAD_EXTENSION_SUFFIX = 'vlad'
-_ASMK_EXTENSION_SUFFIX = 'asmk'
-_ASMK_STAR_EXTENSION_SUFFIX = 'asmk_star'
-
-# Interval (in number of images) for logging extraction progress.
-_STATUS_CHECK_ITERATIONS = 50
-
-
-def _ReadMappingBasenameToBoxNames(input_path, index_image_names):
- """Reads mapping from image name to DELF file names for each box.
-
- Args:
- input_path: Path to CSV file containing mapping.
- index_image_names: List containing index image names, in order, for the
- dataset under consideration.
-
- Returns:
- images_to_box_feature_files: Dict. key=string (image name); value=list of
- strings (file names containing DELF features for boxes).
- """
- images_to_box_feature_files = {}
- with tf.io.gfile.GFile(input_path, 'r') as f:
- reader = csv.DictReader(f)
- for row in reader:
- index_image_name = index_image_names[int(row['index_image_id'])]
- if index_image_name not in images_to_box_feature_files:
- images_to_box_feature_files[index_image_name] = []
-
- images_to_box_feature_files[index_image_name].append(row['name'])
-
- return images_to_box_feature_files
-
-
-def ExtractAggregatedRepresentationsToFiles(image_names, features_dir,
- aggregation_config_path,
- mapping_path,
- output_aggregation_dir):
- """Extracts aggregated feature representations, saving them to files.
-
- It checks if the aggregated representation for an image already exists,
- and skips computation for those.
-
- Args:
- image_names: List of image names. These are used to compose input file names
- for the feature files, and the output file names for aggregated
- representations.
- features_dir: Directory where DELF features are located.
- aggregation_config_path: Path to AggregationConfig proto text file with
- configuration to be used for extraction.
- mapping_path: Optional CSV file which maps each .delf file name to the index
- image ID and detected box ID. If regional aggregation is performed, this
- should be set. Otherwise, this is ignored.
- output_aggregation_dir: Directory where aggregation output will be written
- to.
-
- Raises:
- ValueError: If AggregationConfig is malformed, or `mapping_path` is
- missing.
- """
- num_images = len(image_names)
-
- # Parse AggregationConfig proto, and select output extension.
- config = aggregation_config_pb2.AggregationConfig()
- with tf.io.gfile.GFile(aggregation_config_path, 'r') as f:
- text_format.Merge(f.read(), config)
- output_extension = '.'
- if config.use_regional_aggregation:
- output_extension += 'r'
- if config.aggregation_type == _VLAD:
- output_extension += _VLAD_EXTENSION_SUFFIX
- elif config.aggregation_type == _ASMK:
- output_extension += _ASMK_EXTENSION_SUFFIX
- elif config.aggregation_type == _ASMK_STAR:
- output_extension += _ASMK_STAR_EXTENSION_SUFFIX
- else:
- raise ValueError('Invalid aggregation type: %d' % config.aggregation_type)
-
- # Read index mapping path, if provided.
- if mapping_path:
- images_to_box_feature_files = _ReadMappingBasenameToBoxNames(
- mapping_path, image_names)
-
- # Create output directory if necessary.
- if not tf.io.gfile.exists(output_aggregation_dir):
- tf.io.gfile.makedirs(output_aggregation_dir)
-
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
-
- start = time.time()
- for i in range(num_images):
- if i == 0:
- print('Starting to extract aggregation from images...')
- elif i % _STATUS_CHECK_ITERATIONS == 0:
- elapsed = (time.time() - start)
- print('Processing image %d out of %d, last %d '
- 'images took %f seconds' %
- (i, num_images, _STATUS_CHECK_ITERATIONS, elapsed))
- start = time.time()
-
- image_name = image_names[i]
-
- # Compose output file name, skip extraction for this image if it already
- # exists.
- output_aggregation_filename = os.path.join(output_aggregation_dir,
- image_name + output_extension)
- if tf.io.gfile.exists(output_aggregation_filename):
- print('Skipping %s' % image_name)
- continue
-
- # Load DELF features.
- if config.use_regional_aggregation:
- if not mapping_path:
- raise ValueError(
- 'Requested regional aggregation, but mapping_path was not '
- 'provided')
- descriptors_list = []
- num_features_per_box = []
- for box_feature_file in images_to_box_feature_files[image_name]:
- delf_filename = os.path.join(features_dir,
- box_feature_file + _DELF_EXTENSION)
- _, _, box_descriptors, _, _ = feature_io.ReadFromFile(delf_filename)
- # If `box_descriptors` is empty, reshape it such that it can be
- # concatenated with other descriptors.
- if not box_descriptors.shape[0]:
- box_descriptors = np.reshape(box_descriptors,
- [0, config.feature_dimensionality])
- descriptors_list.append(box_descriptors)
- num_features_per_box.append(box_descriptors.shape[0])
-
- descriptors = np.concatenate(descriptors_list)
- else:
- input_delf_filename = os.path.join(features_dir,
- image_name + _DELF_EXTENSION)
- _, _, descriptors, _, _ = feature_io.ReadFromFile(input_delf_filename)
- # If `descriptors` is empty, reshape it to avoid extraction failure.
- if not descriptors.shape[0]:
- descriptors = np.reshape(descriptors,
- [0, config.feature_dimensionality])
- num_features_per_box = None
-
- # Extract and save aggregation. If using VLAD, only
- # `aggregated_descriptors` needs to be saved.
- (aggregated_descriptors,
- feature_visual_words) = extractor.Extract(descriptors,
- num_features_per_box)
- if config.aggregation_type == _VLAD:
- datum_io.WriteToFile(aggregated_descriptors,
- output_aggregation_filename)
- else:
- datum_io.WritePairToFile(aggregated_descriptors,
- feature_visual_words.astype('uint32'),
- output_aggregation_filename)
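For reference, a minimal sketch of how the library function above can be invoked directly (paths and image names are illustrative; the `extract_aggregation.py` command used in the Detect-to-Retrieve instructions presumably wires these same arguments from command-line flags):

```python
from delf.python.detect_to_retrieve import aggregation_extraction

# Aggregate per-image (or per-box) DELF features into VLAD/ASMK/ASMK* files.
aggregation_extraction.ExtractAggregatedRepresentationsToFiles(
    image_names=['oxford_000001', 'oxford_000002'],           # Illustrative.
    features_dir='/tmp/oxford5k_features/index_0.1',          # .delf files.
    aggregation_config_path='index_aggregation_config.pbtxt',
    mapping_path='/tmp/oxford5k_features/index_mapping_0.1.csv',
    output_aggregation_dir='/tmp/oxford5k_aggregation/index_0.1')
```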
diff --git a/research/delf/delf/python/detect_to_retrieve/boxes_and_features_extraction.py b/research/delf/delf/python/detect_to_retrieve/boxes_and_features_extraction.py
deleted file mode 100644
index 1faef983b2e..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/boxes_and_features_extraction.py
+++ /dev/null
@@ -1,202 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Library to extract/save boxes and DELF features."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import csv
-import math
-import os
-import time
-
-import numpy as np
-import tensorflow as tf
-
-from google.protobuf import text_format
-from delf import delf_config_pb2
-from delf import box_io
-from delf import feature_io
-from delf import utils
-from delf import detector
-from delf import extractor
-
-# Extension of feature files.
-_BOX_EXTENSION = '.boxes'
-_DELF_EXTENSION = '.delf'
-
-# Interval (in number of images) for logging extraction progress.
-_STATUS_CHECK_ITERATIONS = 100
-
-
-def _WriteMappingBasenameToIds(index_names_ids_and_boxes, output_path):
- """Helper function to write CSV mapping from DELF file name to IDs.
-
- Args:
- index_names_ids_and_boxes: List containing 3-element lists with name, image
- ID and box ID.
- output_path: Output CSV path.
- """
- with tf.io.gfile.GFile(output_path, 'w') as f:
- csv_writer = csv.DictWriter(
- f, fieldnames=['name', 'index_image_id', 'box_id'])
- csv_writer.writeheader()
- for name_imid_boxid in index_names_ids_and_boxes:
- csv_writer.writerow({
- 'name': name_imid_boxid[0],
- 'index_image_id': name_imid_boxid[1],
- 'box_id': name_imid_boxid[2],
- })
-
-
-def ExtractBoxesAndFeaturesToFiles(image_names, image_paths, delf_config_path,
- detector_model_dir, detector_thresh,
- output_features_dir, output_boxes_dir,
- output_mapping):
- """Extracts boxes and features, saving them to files.
-
- Boxes are saved to .boxes files. DELF features are extracted for
- the entire image and saved into .delf files. In addition, DELF
- features are extracted for each high-confidence bounding box in the image, and
- saved into files named <image_name>_0.delf, <image_name>_1.delf, etc.
-
- It checks if descriptors/boxes already exist, and skips computation for those.
-
- Args:
- image_names: List of image names. These are used to compose output file
- names for boxes and features.
- image_paths: List of image paths. image_paths[i] is the path for the image
- named by image_names[i]. `image_names` and `image_paths` must have the
- same number of elements.
- delf_config_path: Path to DelfConfig proto text file.
- detector_model_dir: Directory where detector SavedModel is located.
- detector_thresh: Threshold used to decide if an image's detected box
- undergoes feature extraction.
- output_features_dir: Directory where DELF features will be written to.
- output_boxes_dir: Directory where detected boxes will be written to.
- output_mapping: CSV file which maps each .delf file name to the image ID and
- detected box ID.
-
- Raises:
- ValueError: If len(image_names) and len(image_paths) are different.
- """
- num_images = len(image_names)
- if len(image_paths) != num_images:
- raise ValueError(
- 'image_names and image_paths have different number of items')
-
- # Parse DelfConfig proto.
- config = delf_config_pb2.DelfConfig()
- with tf.io.gfile.GFile(delf_config_path, 'r') as f:
- text_format.Merge(f.read(), config)
-
- # Create output directories if necessary.
- if not tf.io.gfile.exists(output_features_dir):
- tf.io.gfile.makedirs(output_features_dir)
- if not tf.io.gfile.exists(output_boxes_dir):
- tf.io.gfile.makedirs(output_boxes_dir)
- if not tf.io.gfile.exists(os.path.dirname(output_mapping)):
- tf.io.gfile.makedirs(os.path.dirname(output_mapping))
-
- names_ids_and_boxes = []
- detector_fn = detector.MakeDetector(detector_model_dir)
- delf_extractor_fn = extractor.MakeExtractor(config)
-
- start = time.time()
- for i in range(num_images):
- if i == 0:
- print('Starting to extract features/boxes...')
- elif i % _STATUS_CHECK_ITERATIONS == 0:
- elapsed = (time.time() - start)
- print('Processing image %d out of %d, last %d '
- 'images took %f seconds' %
- (i, num_images, _STATUS_CHECK_ITERATIONS, elapsed))
- start = time.time()
-
- image_name = image_names[i]
- output_feature_filename_whole_image = os.path.join(
- output_features_dir, image_name + _DELF_EXTENSION)
- output_box_filename = os.path.join(output_boxes_dir,
- image_name + _BOX_EXTENSION)
-
- pil_im = utils.RgbLoader(image_paths[i])
- width, height = pil_im.size
-
- # Extract and save boxes.
- if tf.io.gfile.exists(output_box_filename):
- print('Skipping box computation for %s' % image_name)
- (boxes_out, scores_out,
- class_indices_out) = box_io.ReadFromFile(output_box_filename)
- else:
- (boxes_out, scores_out,
- class_indices_out) = detector_fn(np.expand_dims(pil_im, 0))
- # Using only one image per batch.
- boxes_out = boxes_out[0]
- scores_out = scores_out[0]
- class_indices_out = class_indices_out[0]
- box_io.WriteToFile(output_box_filename, boxes_out, scores_out,
- class_indices_out)
-
- # Select boxes with scores greater than threshold. Those will be the
- # ones with extracted DELF features (besides the whole image, whose DELF
- # features are extracted in all cases).
- num_delf_files = 1
- selected_boxes = []
- for box_ind, box in enumerate(boxes_out):
- if scores_out[box_ind] >= detector_thresh:
- selected_boxes.append(box)
- num_delf_files += len(selected_boxes)
-
- # Extract and save DELF features.
- for delf_file_ind in range(num_delf_files):
- if delf_file_ind == 0:
- box_name = image_name
- output_feature_filename = output_feature_filename_whole_image
- else:
- box_name = image_name + '_' + str(delf_file_ind - 1)
- output_feature_filename = os.path.join(output_features_dir,
- box_name + _DELF_EXTENSION)
-
- names_ids_and_boxes.append([box_name, i, delf_file_ind - 1])
-
- if tf.io.gfile.exists(output_feature_filename):
- print('Skipping DELF computation for %s' % box_name)
- continue
-
- if delf_file_ind >= 1:
- bbox_for_cropping = selected_boxes[delf_file_ind - 1]
- bbox_for_cropping_pil_convention = [
- int(math.floor(bbox_for_cropping[1] * width)),
- int(math.floor(bbox_for_cropping[0] * height)),
- int(math.ceil(bbox_for_cropping[3] * width)),
- int(math.ceil(bbox_for_cropping[2] * height))
- ]
- pil_cropped_im = pil_im.crop(bbox_for_cropping_pil_convention)
- im = np.array(pil_cropped_im)
- else:
- im = np.array(pil_im)
-
- extracted_features = delf_extractor_fn(im)
- locations_out = extracted_features['local_features']['locations']
- descriptors_out = extracted_features['local_features']['descriptors']
- feature_scales_out = extracted_features['local_features']['scales']
- attention_out = extracted_features['local_features']['attention']
-
- feature_io.WriteToFile(output_feature_filename, locations_out,
- feature_scales_out, descriptors_out, attention_out)
-
- # Save mapping from output DELF name to image id and box id.
- _WriteMappingBasenameToIds(names_ids_and_boxes, output_mapping)
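For context, the library deleted above was driven by extract_index_boxes_and_features.py (also removed below). A minimal sketch of calling it directly is shown here; the image name, paths, and threshold are placeholders chosen to match that script's argparse defaults, not values taken from this change.

```python
# Minimal sketch (not part of this change): driving the deleted
# ExtractBoxesAndFeaturesToFiles directly. The image name, all paths, and the
# detection threshold are placeholders.
from delf.python.detect_to_retrieve import boxes_and_features_extraction

boxes_and_features_extraction.ExtractBoxesAndFeaturesToFiles(
    image_names=['radcliffe_camera_000158'],
    image_paths=['/tmp/images/radcliffe_camera_000158.jpg'],
    delf_config_path='/tmp/delf_gld_config.pbtxt',
    detector_model_dir='/tmp/detector_model',
    detector_thresh=0.1,
    output_features_dir='/tmp/features',
    output_boxes_dir='/tmp/boxes',
    output_mapping='/tmp/index_mapping.csv')
```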
diff --git a/research/delf/delf/python/detect_to_retrieve/cluster_delf_features.py b/research/delf/delf/python/detect_to_retrieve/cluster_delf_features.py
deleted file mode 100644
index f77d47b1db9..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/cluster_delf_features.py
+++ /dev/null
@@ -1,214 +0,0 @@
-# Lint as: python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Clusters DELF features using the K-means algorithm.
-
-All DELF local feature descriptors for a given dataset's index images are loaded
-as the input.
-
-Note that:
-- we only use features extracted from whole images (no features from boxes are
- used).
-- the codebook should be trained on Paris images for Oxford retrieval
- experiments, and vice-versa.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import os
-import sys
-import time
-
-from absl import app
-import numpy as np
-import tensorflow as tf
-
-from delf import feature_io
-from delf.python.datasets.revisited_op import dataset
-
-cmd_args = None
-
-# Extensions.
-_DELF_EXTENSION = '.delf'
-
-# Default DELF dimensionality.
-_DELF_DIM = 128
-
-# Pace to report log when collecting features.
-_STATUS_CHECK_ITERATIONS = 100
-
-
-class _IteratorInitHook(tf.estimator.SessionRunHook):
- """Hook to initialize data iterator after session is created."""
-
- def __init__(self):
- super(_IteratorInitHook, self).__init__()
- self.iterator_initializer_fn = None
-
- def after_create_session(self, session, coord):
- """Initialize the iterator after the session has been created."""
- del coord
- self.iterator_initializer_fn(session)
-
-
-def main(argv):
- if len(argv) > 1:
- raise RuntimeError('Too many command-line arguments.')
-
- # Process output directory.
- if tf.io.gfile.exists(cmd_args.output_cluster_dir):
- raise RuntimeError(
- 'output_cluster_dir = %s already exists. This may indicate that a '
- 'previous run already wrote checkpoints in this directory, which would '
- 'lead to incorrect training. Please re-run this script, specifying a '
- 'non-existent directory.' % cmd_args.output_cluster_dir)
- else:
- tf.io.gfile.makedirs(cmd_args.output_cluster_dir)
-
- # Read list of index images from dataset file.
- print('Reading list of index images from dataset file...')
- _, index_list, _ = dataset.ReadDatasetFile(cmd_args.dataset_file_path)
- num_images = len(index_list)
- print('done! Found %d images' % num_images)
-
- # Loop over list of index images and collect DELF features.
- features_for_clustering = []
- start = time.time()
- print('Starting to collect features from index images...')
- for i in range(num_images):
- if i > 0 and i % _STATUS_CHECK_ITERATIONS == 0:
- elapsed = (time.time() - start)
- print('Processing index image %d out of %d, last %d '
- 'images took %f seconds' %
- (i, num_images, _STATUS_CHECK_ITERATIONS, elapsed))
- start = time.time()
-
- features_filename = index_list[i] + _DELF_EXTENSION
- features_fullpath = os.path.join(cmd_args.features_dir, features_filename)
- _, _, features, _, _ = feature_io.ReadFromFile(features_fullpath)
- if features.size != 0:
- assert features.shape[1] == _DELF_DIM
- for feature in features:
- features_for_clustering.append(feature)
-
- features_for_clustering = np.array(features_for_clustering, dtype=np.float32)
- print('All features were loaded! There are %d features, each with %d '
- 'dimensions' %
- (features_for_clustering.shape[0], features_for_clustering.shape[1]))
-
- # Run K-means clustering.
- def _get_input_fn():
- """Helper function to create input function and hook for training.
-
- Returns:
- input_fn: Input function for k-means Estimator training.
- init_hook: Hook used to load data during training.
- """
- init_hook = _IteratorInitHook()
-
- def _input_fn():
- """Produces tf.data.Dataset object for k-means training.
-
- Returns:
- Tensor with the data for training.
- """
- features_placeholder = tf.compat.v1.placeholder(
- tf.float32, features_for_clustering.shape)
- delf_dataset = tf.data.Dataset.from_tensor_slices((features_placeholder))
- delf_dataset = delf_dataset.shuffle(1000).batch(
- features_for_clustering.shape[0])
- iterator = tf.compat.v1.data.make_initializable_iterator(delf_dataset)
-
- def _initializer_fn(sess):
- """Initialize dataset iterator, feed in the data."""
- sess.run(
- iterator.initializer,
- feed_dict={features_placeholder: features_for_clustering})
-
- init_hook.iterator_initializer_fn = _initializer_fn
- return iterator.get_next()
-
- return _input_fn, init_hook
-
- input_fn, init_hook = _get_input_fn()
-
- kmeans = tf.compat.v1.estimator.experimental.KMeans(
- num_clusters=cmd_args.num_clusters,
- model_dir=cmd_args.output_cluster_dir,
- use_mini_batch=False,
- )
-
- print('Starting K-means clustering...')
- start = time.time()
- for i in range(cmd_args.num_iterations):
- kmeans.train(input_fn, hooks=[init_hook])
- average_sum_squared_error = kmeans.evaluate(
- input_fn, hooks=[init_hook])['score'] / features_for_clustering.shape[0]
- elapsed = (time.time() - start)
- print('K-means iteration %d (out of %d) took %f seconds, '
- 'average-sum-of-squares: %f' %
- (i, cmd_args.num_iterations, elapsed, average_sum_squared_error))
- start = time.time()
-
- print('K-means clustering finished!')
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser()
- parser.register('type', 'bool', lambda v: v.lower() == 'true')
- parser.add_argument(
- '--dataset_file_path',
- type=str,
- default='/tmp/gnd_roxford5k.mat',
- help="""
- Dataset file for Revisited Oxford or Paris dataset, in .mat format. The
- list of index images loaded from this file is used to collect local
- features, which are assumed to be in .delf file format.
- """)
- parser.add_argument(
- '--features_dir',
- type=str,
- default='/tmp/features',
- help="""
- Directory where DELF feature files are to be found.
- """)
- parser.add_argument(
- '--num_clusters',
- type=int,
- default=1024,
- help="""
- Number of clusters to use.
- """)
- parser.add_argument(
- '--num_iterations',
- type=int,
- default=50,
- help="""
- Number of iterations to use.
- """)
- parser.add_argument(
- '--output_cluster_dir',
- type=str,
- default='/tmp/cluster',
- help="""
- Directory where clustering outputs are written to. This directory should
- not exist before running this script; it will be created during
- clustering.
- """)
- cmd_args, unparsed = parser.parse_known_args()
- app.run(main=main, argv=[sys.argv[0]] + unparsed)
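The deleted script builds the codebook with the TF1 KMeans estimator. As a rough point of comparison only, an equivalent 128-D codebook could be clustered with scikit-learn; this sketch is not what the script did, and the feature array below is a random stand-in for the descriptors collected from the .delf files.

```python
# Sketch using scikit-learn MiniBatchKMeans instead of the TF1 estimator; the
# feature array is a random stand-in for the collected DELF descriptors.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

features_for_clustering = np.random.rand(10000, 128).astype(np.float32)

kmeans = MiniBatchKMeans(n_clusters=1024, random_state=0, n_init=3)
kmeans.fit(features_for_clustering)
codebook = kmeans.cluster_centers_  # [1024, 128] array of visual words
```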
diff --git a/research/delf/delf/python/detect_to_retrieve/delf_gld_config.pbtxt b/research/delf/delf/python/detect_to_retrieve/delf_gld_config.pbtxt
deleted file mode 100644
index 046aed766ce..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/delf_gld_config.pbtxt
+++ /dev/null
@@ -1,25 +0,0 @@
-model_path: "parameters/delf_gld_20190411/model"
-image_scales: .25
-image_scales: .3536
-image_scales: .5
-image_scales: .7071
-image_scales: 1.0
-image_scales: 1.4142
-image_scales: 2.0
-delf_local_config {
- use_pca: true
- # Note that for the exported model provided as an example, layer_name and
- # iou_threshold are hard-coded in the checkpoint. So, the layer_name and
- # iou_threshold variables here have no effect on the provided
- # extract_features.py script.
- layer_name: "resnet_v1_50/block3"
- iou_threshold: 1.0
- max_feature_num: 1000
- score_threshold: 100.0
- pca_parameters {
- mean_path: "parameters/delf_gld_20190411/pca/mean.datum"
- projection_matrix_path: "parameters/delf_gld_20190411/pca/pca_proj_mat.datum"
- pca_dim: 128
- use_whitening: false
- }
-}
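A DelfConfig text proto such as the one above is parsed the same way the deleted extraction scripts do it; the file path below is a placeholder.

```python
# Sketch: loading a DelfConfig pbtxt, as the deleted extraction scripts do.
# The file path is a placeholder.
import tensorflow as tf
from google.protobuf import text_format
from delf import delf_config_pb2

config = delf_config_pb2.DelfConfig()
with tf.io.gfile.GFile('/tmp/delf_gld_config.pbtxt', 'r') as f:
  text_format.Merge(f.read(), config)

print(config.delf_local_config.max_feature_num)  # 1000 for the config above
```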
diff --git a/research/delf/delf/python/detect_to_retrieve/extract_aggregation.py b/research/delf/delf/python/detect_to_retrieve/extract_aggregation.py
deleted file mode 100644
index 451c4137d93..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/extract_aggregation.py
+++ /dev/null
@@ -1,113 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Extracts aggregation for images from Revisited Oxford/Paris datasets.
-
-The program checks if an image's aggregated representation already exists,
-and skips computation for such images.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import sys
-
-from absl import app
-from delf.python.datasets.revisited_op import dataset
-from delf.python.detect_to_retrieve import aggregation_extraction
-
-cmd_args = None
-
-
-def main(argv):
- if len(argv) > 1:
- raise RuntimeError('Too many command-line arguments.')
-
- # Read list of images from dataset file.
- print('Reading list of images from dataset file...')
- query_list, index_list, _ = dataset.ReadDatasetFile(
- cmd_args.dataset_file_path)
- if cmd_args.use_query_images:
- image_list = query_list
- else:
- image_list = index_list
- num_images = len(image_list)
- print('done! Found %d images' % num_images)
-
- aggregation_extraction.ExtractAggregatedRepresentationsToFiles(
- image_names=image_list,
- features_dir=cmd_args.features_dir,
- aggregation_config_path=cmd_args.aggregation_config_path,
- mapping_path=cmd_args.index_mapping_path,
- output_aggregation_dir=cmd_args.output_aggregation_dir)
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser()
- parser.register('type', 'bool', lambda v: v.lower() == 'true')
- parser.add_argument(
- '--aggregation_config_path',
- type=str,
- default='/tmp/aggregation_config.pbtxt',
- help="""
- Path to AggregationConfig proto text file with configuration to be used
- for extraction.
- """)
- parser.add_argument(
- '--dataset_file_path',
- type=str,
- default='/tmp/gnd_roxford5k.mat',
- help="""
- Dataset file for Revisited Oxford or Paris dataset, in .mat format.
- """)
- parser.add_argument(
- '--use_query_images',
- type=lambda x: (str(x).lower() == 'true'),
- default=False,
- help="""
- If True, processes the query images of the dataset. If False, processes
- the database (ie, index) images.
- """)
- parser.add_argument(
- '--features_dir',
- type=str,
- default='/tmp/features',
- help="""
- Directory where image features are located, all in .delf format.
- """)
- parser.add_argument(
- '--index_mapping_path',
- type=str,
- default='',
- help="""
- Optional CSV file which maps each .delf file name to the index image ID
- and detected box ID. If regional aggregation is performed, this should be
- set. Otherwise, this is ignored.
- Usually this file is obtained as an output from the
- `extract_index_boxes_and_features.py` script.
- """)
- parser.add_argument(
- '--output_aggregation_dir',
- type=str,
- default='/tmp/aggregation',
- help="""
- Directory where aggregation output will be written to. Each image's
- features will be written to a file with the same name, and extension replaced
- by one of
- ['.vlad', '.asmk', '.asmk_star', '.rvlad', '.rasmk', '.rasmk_star'].
- """)
- cmd_args, unparsed = parser.parse_known_args()
- app.run(main=main, argv=[sys.argv[0]] + unparsed)
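The wrapper above only wires command-line flags into the aggregation library; a minimal sketch of the same call, with all paths as placeholders matching the wrapper's defaults:

```python
# Sketch: calling the aggregation library directly, as the deleted wrapper
# does. All paths are placeholders matching its argparse defaults.
from delf.python.datasets.revisited_op import dataset
from delf.python.detect_to_retrieve import aggregation_extraction

_, index_list, _ = dataset.ReadDatasetFile('/tmp/gnd_roxford5k.mat')

aggregation_extraction.ExtractAggregatedRepresentationsToFiles(
    image_names=index_list,
    features_dir='/tmp/features',
    aggregation_config_path='/tmp/aggregation_config.pbtxt',
    mapping_path='/tmp/index_mapping.csv',
    output_aggregation_dir='/tmp/aggregation')
```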
diff --git a/research/delf/delf/python/detect_to_retrieve/extract_index_boxes_and_features.py b/research/delf/delf/python/detect_to_retrieve/extract_index_boxes_and_features.py
deleted file mode 100644
index 80bd721c874..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/extract_index_boxes_and_features.py
+++ /dev/null
@@ -1,151 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Extracts DELF and boxes from the Revisited Oxford/Paris index datasets.
-
-Boxes are saved to .boxes files. DELF features are extracted for the
-entire image and saved into .delf files. In addition, DELF features
-are extracted for each high-confidence bounding box in the image, and saved into
-files named _0.delf, _1.delf, etc.
-
-The program checks if descriptors/boxes already exist, and skips computation for
-those.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import os
-import sys
-
-from absl import app
-from delf.python.datasets.revisited_op import dataset
-from delf.python.detect_to_retrieve import boxes_and_features_extraction
-
-cmd_args = None
-
-_IMAGE_EXTENSION = '.jpg'
-
-
-def main(argv):
- if len(argv) > 1:
- raise RuntimeError('Too many command-line arguments.')
-
- # Read list of index images from dataset file.
- print('Reading list of index images from dataset file...')
- _, index_list, _ = dataset.ReadDatasetFile(cmd_args.dataset_file_path)
- num_images = len(index_list)
- print('done! Found %d images' % num_images)
-
- # Compose list of image paths.
- image_paths = [
- os.path.join(cmd_args.images_dir, index_image_name + _IMAGE_EXTENSION)
- for index_image_name in index_list
- ]
-
- # Extract boxes/features and save them to files.
- boxes_and_features_extraction.ExtractBoxesAndFeaturesToFiles(
- image_names=index_list,
- image_paths=image_paths,
- delf_config_path=cmd_args.delf_config_path,
- detector_model_dir=cmd_args.detector_model_dir,
- detector_thresh=cmd_args.detector_thresh,
- output_features_dir=cmd_args.output_features_dir,
- output_boxes_dir=cmd_args.output_boxes_dir,
- output_mapping=cmd_args.output_index_mapping)
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser()
- parser.register('type', 'bool', lambda v: v.lower() == 'true')
- parser.add_argument(
- '--delf_config_path',
- type=str,
- default='/tmp/delf_config_example.pbtxt',
- help="""
- Path to DelfConfig proto text file with configuration to be used for DELF
- extraction.
- """)
- parser.add_argument(
- '--detector_model_dir',
- type=str,
- default='/tmp/detector_model',
- help="""
- Directory where detector SavedModel is located.
- """)
- parser.add_argument(
- '--detector_thresh',
- type=float,
- default=0.1,
- help="""
- Threshold used to decide if an image's detected box undergoes feature
- extraction. For all detected boxes with detection score larger than this,
- a .delf file is saved containing the box features. Note that this
- threshold is used only to select which boxes are used in feature
- extraction; all detected boxes are actually saved in the .boxes file, even
- those with score lower than detector_thresh.
- """)
- parser.add_argument(
- '--dataset_file_path',
- type=str,
- default='/tmp/gnd_roxford5k.mat',
- help="""
- Dataset file for Revisited Oxford or Paris dataset, in .mat format.
- """)
- parser.add_argument(
- '--images_dir',
- type=str,
- default='/tmp/images',
- help="""
- Directory where dataset images are located, all in .jpg format.
- """)
- parser.add_argument(
- '--output_boxes_dir',
- type=str,
- default='/tmp/boxes',
- help="""
- Directory where detected boxes will be written to. Each image's boxes
- will be written to a file with the same name, and extension replaced by
- .boxes.
- """)
- parser.add_argument(
- '--output_features_dir',
- type=str,
- default='/tmp/features',
- help="""
- Directory where DELF features will be written to. Each image's features
- will be written to a file with the same name, and extension replaced by
- .delf. In addition, DELF features are extracted for each
- high-confidence bounding box in the image, and saved into files named
- _0.delf, _1.delf, etc.
- """)
- parser.add_argument(
- '--output_index_mapping',
- type=str,
- default='/tmp/index_mapping.csv',
- help="""
- CSV file which maps each .delf file name to the index image ID and
- detected box ID. The format is 'name,index_image_id,box_id', including a
- header. The 'name' refers to the .delf file name without extension.
-
- For example, a few lines may be like:
- 'radcliffe_camera_000158,2,-1'
- 'radcliffe_camera_000158_0,2,0'
- 'radcliffe_camera_000158_1,2,1'
- 'radcliffe_camera_000158_2,2,2'
- """)
- cmd_args, unparsed = parser.parse_known_args()
- app.run(main=main, argv=[sys.argv[0]] + unparsed)
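The mapping CSV described in the --output_index_mapping help text can be read back as in the sketch below; the path is a placeholder, and box_id -1 denotes whole-image features per the file-naming convention above.

```python
# Sketch: reading back the 'name,index_image_id,box_id' mapping CSV written by
# the deleted script above. The path is a placeholder; box_id == -1 marks
# whole-image (not per-box) features.
import csv

import tensorflow as tf

with tf.io.gfile.GFile('/tmp/index_mapping.csv', 'r') as f:
  for row in csv.DictReader(f):
    name = row['name']                        # e.g. 'radcliffe_camera_000158_0'
    index_image_id = int(row['index_image_id'])
    box_id = int(row['box_id'])
```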
diff --git a/research/delf/delf/python/detect_to_retrieve/extract_query_features.py b/research/delf/delf/python/detect_to_retrieve/extract_query_features.py
deleted file mode 100644
index 2ff4a5a23f5..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/extract_query_features.py
+++ /dev/null
@@ -1,137 +0,0 @@
-# Lint as: python3
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Extracts DELF features for query images from Revisited Oxford/Paris datasets.
-
-Note that query images are cropped before feature extraction, as required by the
-evaluation protocols of these datasets.
-
-The program checks if descriptors already exist, and skips computation for
-those.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import os
-import sys
-import time
-
-from absl import app
-import numpy as np
-import tensorflow as tf
-
-from google.protobuf import text_format
-from delf import delf_config_pb2
-from delf import feature_io
-from delf import utils
-from delf.python.datasets.revisited_op import dataset
-from delf import extractor
-
-cmd_args = None
-
-# Extensions.
-_DELF_EXTENSION = '.delf'
-_IMAGE_EXTENSION = '.jpg'
-
-
-def main(argv):
- if len(argv) > 1:
- raise RuntimeError('Too many command-line arguments.')
-
- # Read list of query images from dataset file.
- print('Reading list of query images and boxes from dataset file...')
- query_list, _, ground_truth = dataset.ReadDatasetFile(
- cmd_args.dataset_file_path)
- num_images = len(query_list)
- print(f'done! Found {num_images} images')
-
- # Parse DelfConfig proto.
- config = delf_config_pb2.DelfConfig()
- with tf.io.gfile.GFile(cmd_args.delf_config_path, 'r') as f:
- text_format.Merge(f.read(), config)
-
- # Create output directory if necessary.
- if not tf.io.gfile.exists(cmd_args.output_features_dir):
- tf.io.gfile.makedirs(cmd_args.output_features_dir)
-
- extractor_fn = extractor.MakeExtractor(config)
-
- start = time.time()
- for i in range(num_images):
- query_image_name = query_list[i]
- input_image_filename = os.path.join(cmd_args.images_dir,
- query_image_name + _IMAGE_EXTENSION)
- output_feature_filename = os.path.join(cmd_args.output_features_dir,
- query_image_name + _DELF_EXTENSION)
- if tf.io.gfile.exists(output_feature_filename):
- print(f'Skipping {query_image_name}')
- continue
-
- # Crop query image according to bounding box.
- bbox = [int(round(b)) for b in ground_truth[i]['bbx']]
- im = np.array(utils.RgbLoader(input_image_filename).crop(bbox))
-
- # Extract and save features.
- extracted_features = extractor_fn(im)
- locations_out = extracted_features['local_features']['locations']
- descriptors_out = extracted_features['local_features']['descriptors']
- feature_scales_out = extracted_features['local_features']['scales']
- attention_out = extracted_features['local_features']['attention']
-
- feature_io.WriteToFile(output_feature_filename, locations_out,
- feature_scales_out, descriptors_out, attention_out)
-
- elapsed = (time.time() - start)
- print('Processed %d query images in %f seconds' % (num_images, elapsed))
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser()
- parser.register('type', 'bool', lambda v: v.lower() == 'true')
- parser.add_argument(
- '--delf_config_path',
- type=str,
- default='/tmp/delf_config_example.pbtxt',
- help="""
- Path to DelfConfig proto text file with configuration to be used for DELF
- extraction.
- """)
- parser.add_argument(
- '--dataset_file_path',
- type=str,
- default='/tmp/gnd_roxford5k.mat',
- help="""
- Dataset file for Revisited Oxford or Paris dataset, in .mat format.
- """)
- parser.add_argument(
- '--images_dir',
- type=str,
- default='/tmp/images',
- help="""
- Directory where dataset images are located, all in .jpg format.
- """)
- parser.add_argument(
- '--output_features_dir',
- type=str,
- default='/tmp/features',
- help="""
- Directory where DELF features will be written to. Each image's features
- will be written to a file with the same name, and extension replaced by .delf.
- """)
- cmd_args, unparsed = parser.parse_known_args()
- app.run(main=main, argv=[sys.argv[0]] + unparsed)
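The query-cropping step above relies on PIL's crop, which takes a (left, upper, right, lower) box. A standalone sketch follows; the 'bbx' values and image path are made up for illustration.

```python
# Sketch of the query cropping used above. The ground-truth 'bbx' values and
# image path are made up; PIL's crop expects a (left, upper, right, lower) box.
import numpy as np

from delf import utils

ground_truth_bbx = [12.3, 7.8, 410.0, 300.5]
bbox = [int(round(b)) for b in ground_truth_bbx]
im = np.array(utils.RgbLoader('/tmp/images/all_souls_000013.jpg').crop(bbox))
```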
diff --git a/research/delf/delf/python/detect_to_retrieve/image_reranking.py b/research/delf/delf/python/detect_to_retrieve/image_reranking.py
deleted file mode 100644
index 8c115835d63..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/image_reranking.py
+++ /dev/null
@@ -1,303 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Library to re-rank images based on geometric verification."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import io
-import os
-
-import matplotlib.pyplot as plt
-import numpy as np
-from scipy import spatial
-from skimage import feature
-from skimage import measure
-from skimage import transform
-
-from delf import feature_io
-
-# Extensions.
-_DELF_EXTENSION = '.delf'
-
-# Pace to log.
-_STATUS_CHECK_GV_ITERATIONS = 10
-
-# Re-ranking / geometric verification parameters.
-_NUM_TO_RERANK = 100
-_NUM_RANSAC_TRIALS = 1000
-_MIN_RANSAC_SAMPLES = 3
-
-
-def MatchFeatures(query_locations,
- query_descriptors,
- index_image_locations,
- index_image_descriptors,
- ransac_seed=None,
- descriptor_matching_threshold=0.9,
- ransac_residual_threshold=10.0,
- query_im_array=None,
- index_im_array=None,
- query_im_scale_factors=None,
- index_im_scale_factors=None,
- use_ratio_test=False):
- """Matches local features using geometric verification.
-
- First, finds putative local feature matches by matching `query_descriptors`
- against a KD-tree built from the `index_image_descriptors`. Then, attempts to fit an
- affine transformation between the putative feature correspondences using their
- locations.
-
- Args:
- query_locations: Locations of local features for query image. NumPy array of
- shape [#query_features, 2].
- query_descriptors: Descriptors of local features for query image. NumPy
- array of shape [#query_features, depth].
- index_image_locations: Locations of local features for index image. NumPy
- array of shape [#index_image_features, 2].
- index_image_descriptors: Descriptors of local features for index image.
- NumPy array of shape [#index_image_features, depth].
- ransac_seed: Seed used by RANSAC. If None (default), no seed is provided.
- descriptor_matching_threshold: Threshold below which a pair of local
- descriptors is considered a potential match, and will be fed into RANSAC.
- If use_ratio_test==False, this is a simple distance threshold. If
- use_ratio_test==True, this is Lowe's ratio test threshold.
- ransac_residual_threshold: Residual error threshold for considering matches
- as inliers, used in RANSAC algorithm.
- query_im_array: Optional. If not None, contains a NumPy array with the query
- image, used to produce match visualization, if there is a match.
- index_im_array: Optional. Same as `query_im_array`, but for index image.
- query_im_scale_factors: Optional. If not None, contains a NumPy array with
- the query image scales, used to produce match visualization, if there is a
- match. If None and a visualization will be produced, [1.0, 1.0] is used
- (ie, feature locations are not scaled).
- index_im_scale_factors: Optional. Same as `query_im_scale_factors`, but for
- index image.
- use_ratio_test: If True, descriptor matching is performed via ratio test,
- instead of distance-based threshold.
-
- Returns:
- score: Number of inliers of match. If no match is found, returns 0.
- match_viz_bytes: Encoded image bytes with visualization of the match, if
- there is one, and if `query_im_array` and `index_im_array` are properly
- set. Otherwise, it's an empty bytes string.
-
- Raises:
- ValueError: If local descriptors from query and index images have different
- dimensionalities.
- """
- num_features_query = query_locations.shape[0]
- num_features_index_image = index_image_locations.shape[0]
- if not num_features_query or not num_features_index_image:
- return 0, b''
-
- local_feature_dim = query_descriptors.shape[1]
- if index_image_descriptors.shape[1] != local_feature_dim:
- raise ValueError(
- 'Local feature dimensionality is not consistent for query and index '
- 'images.')
-
- # Construct KD-tree used to find nearest neighbors.
- index_image_tree = spatial.cKDTree(index_image_descriptors)
- if use_ratio_test:
- distances, indices = index_image_tree.query(
- query_descriptors, k=2, n_jobs=-1)
- query_locations_to_use = np.array([
- query_locations[i,]
- for i in range(num_features_query)
- if distances[i][0] < descriptor_matching_threshold * distances[i][1]
- ])
- index_image_locations_to_use = np.array([
- index_image_locations[indices[i][0],]
- for i in range(num_features_query)
- if distances[i][0] < descriptor_matching_threshold * distances[i][1]
- ])
- else:
- _, indices = index_image_tree.query(
- query_descriptors,
- distance_upper_bound=descriptor_matching_threshold,
- n_jobs=-1)
-
- # Select feature locations for putative matches.
- query_locations_to_use = np.array([
- query_locations[i,]
- for i in range(num_features_query)
- if indices[i] != num_features_index_image
- ])
- index_image_locations_to_use = np.array([
- index_image_locations[indices[i],]
- for i in range(num_features_query)
- if indices[i] != num_features_index_image
- ])
-
- # If there are not enough putative matches, early return 0.
- if query_locations_to_use.shape[0] <= _MIN_RANSAC_SAMPLES:
- return 0, b''
-
- # Perform geometric verification using RANSAC.
- _, inliers = measure.ransac(
- (index_image_locations_to_use, query_locations_to_use),
- transform.AffineTransform,
- min_samples=_MIN_RANSAC_SAMPLES,
- residual_threshold=ransac_residual_threshold,
- max_trials=_NUM_RANSAC_TRIALS,
- random_state=ransac_seed)
- match_viz_bytes = b''
-
- if inliers is None:
- inliers = []
- elif query_im_array is not None and index_im_array is not None:
- if query_im_scale_factors is None:
- query_im_scale_factors = [1.0, 1.0]
- if index_im_scale_factors is None:
- index_im_scale_factors = [1.0, 1.0]
- inlier_idxs = np.nonzero(inliers)[0]
- _, ax = plt.subplots()
- ax.axis('off')
- ax.xaxis.set_major_locator(plt.NullLocator())
- ax.yaxis.set_major_locator(plt.NullLocator())
- plt.subplots_adjust(top=1, bottom=0, right=1, left=0, hspace=0, wspace=0)
- plt.margins(0, 0)
- feature.plot_matches(
- ax,
- query_im_array,
- index_im_array,
- query_locations_to_use * query_im_scale_factors,
- index_image_locations_to_use * index_im_scale_factors,
- np.column_stack((inlier_idxs, inlier_idxs)),
- only_matches=True)
-
- match_viz_io = io.BytesIO()
- plt.savefig(match_viz_io, format='jpeg', bbox_inches='tight', pad_inches=0)
- match_viz_bytes = match_viz_io.getvalue()
-
- return sum(inliers), match_viz_bytes
-
-
-def RerankByGeometricVerification(input_ranks,
- initial_scores,
- query_name,
- index_names,
- query_features_dir,
- index_features_dir,
- junk_ids,
- local_feature_extension=_DELF_EXTENSION,
- ransac_seed=None,
- descriptor_matching_threshold=0.9,
- ransac_residual_threshold=10.0,
- use_ratio_test=False):
- """Re-ranks retrieval results using geometric verification.
-
- Args:
- input_ranks: 1D NumPy array with indices of top-ranked index images, sorted
- from the most to the least similar.
- initial_scores: 1D NumPy array with initial similarity scores between query
- and index images. Entry i corresponds to score for image i.
- query_name: Name for query image (string).
- index_names: List of names for index images (strings).
- query_features_dir: Directory where query local feature file is located
- (string).
- index_features_dir: Directory where index local feature files are located
- (string).
- junk_ids: Set with indices of junk images which should not be considered
- during re-ranking.
- local_feature_extension: String, extension to use for loading local feature
- files.
- ransac_seed: Seed used by RANSAC. If None (default), no seed is provided.
- descriptor_matching_threshold: Threshold used for local descriptor matching.
- ransac_residual_threshold: Residual error threshold for considering matches
- as inliers, used in RANSAC algorithm.
- use_ratio_test: If True, descriptor matching is performed via ratio test,
- instead of distance-based threshold.
-
- Returns:
- output_ranks: 1D NumPy array with index image indices, sorted from the most
- to the least similar according to the geometric verification and initial
- scores.
-
- Raises:
- ValueError: If `input_ranks`, `initial_scores` and `index_names` do not have
- the same number of entries.
- """
- num_index_images = len(index_names)
- if len(input_ranks) != num_index_images:
- raise ValueError('input_ranks and index_names have different number of '
- 'elements: %d vs %d' %
- (len(input_ranks), len(index_names)))
- if len(initial_scores) != num_index_images:
- raise ValueError('initial_scores and index_names have different number of '
- 'elements: %d vs %d' %
- (len(initial_scores), len(index_names)))
-
- # Filter out junk images from list that will be re-ranked.
- input_ranks_for_gv = []
- for ind in input_ranks:
- if ind not in junk_ids:
- input_ranks_for_gv.append(ind)
- num_to_rerank = min(_NUM_TO_RERANK, len(input_ranks_for_gv))
-
- # Load query image features.
- query_features_path = os.path.join(query_features_dir,
- query_name + local_feature_extension)
- query_locations, _, query_descriptors, _, _ = feature_io.ReadFromFile(
- query_features_path)
-
- # Initialize list containing number of inliers and initial similarity scores.
- inliers_and_initial_scores = []
- for i in range(num_index_images):
- inliers_and_initial_scores.append([0, initial_scores[i]])
-
- # Loop over top-ranked images and get results.
- print('Starting to re-rank')
- for i in range(num_to_rerank):
- if i > 0 and i % _STATUS_CHECK_GV_ITERATIONS == 0:
- print('Re-ranking: i = %d out of %d' % (i, num_to_rerank))
-
- index_image_id = input_ranks_for_gv[i]
-
- # Load index image features.
- index_image_features_path = os.path.join(
- index_features_dir,
- index_names[index_image_id] + local_feature_extension)
- (index_image_locations, _, index_image_descriptors, _,
- _) = feature_io.ReadFromFile(index_image_features_path)
-
- inliers_and_initial_scores[index_image_id][0], _ = MatchFeatures(
- query_locations,
- query_descriptors,
- index_image_locations,
- index_image_descriptors,
- ransac_seed=ransac_seed,
- descriptor_matching_threshold=descriptor_matching_threshold,
- ransac_residual_threshold=ransac_residual_threshold,
- use_ratio_test=use_ratio_test)
-
- # Sort based on (inliers_score, initial_score).
- def _InliersInitialScoresSorting(k):
- """Helper function to sort list based on two entries.
-
- Args:
- k: Index into `inliers_and_initial_scores`.
-
- Returns:
- Tuple containing inlier score and initial score.
- """
- return (inliers_and_initial_scores[k][0], inliers_and_initial_scores[k][1])
-
- output_ranks = sorted(
- range(num_index_images), key=_InliersInitialScoresSorting, reverse=True)
-
- return output_ranks
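A minimal sketch of scoring one query/index pair with the MatchFeatures function deleted above; both .delf paths are placeholders.

```python
# Sketch: geometric-verification score for one query/index pair using the
# deleted MatchFeatures. The two .delf paths are placeholders.
from delf import feature_io
from delf.python.detect_to_retrieve import image_reranking

query_locations, _, query_descriptors, _, _ = feature_io.ReadFromFile(
    '/tmp/query_features/all_souls_000013.delf')
index_locations, _, index_descriptors, _, _ = feature_io.ReadFromFile(
    '/tmp/index_features/radcliffe_camera_000158.delf')

num_inliers, _ = image_reranking.MatchFeatures(
    query_locations,
    query_descriptors,
    index_locations,
    index_descriptors,
    ransac_seed=0,
    use_ratio_test=True)
```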
diff --git a/research/delf/delf/python/detect_to_retrieve/index_aggregation_config.pbtxt b/research/delf/delf/python/detect_to_retrieve/index_aggregation_config.pbtxt
deleted file mode 100644
index ba7ba4e4956..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/index_aggregation_config.pbtxt
+++ /dev/null
@@ -1,10 +0,0 @@
-codebook_size: 65536
-feature_dimensionality: 128
-aggregation_type: ASMK_STAR
-use_l2_normalization: false
-codebook_path: "parameters/rparis6k_codebook_65536/k65536_codebook_tfckpt/codebook"
-num_assignments: 1
-use_regional_aggregation: true
-feature_batch_size: 100
-alpha: 3.0
-tau: 0.0
diff --git a/research/delf/delf/python/detect_to_retrieve/perform_retrieval.py b/research/delf/delf/python/detect_to_retrieve/perform_retrieval.py
deleted file mode 100644
index 2b7a2278925..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/perform_retrieval.py
+++ /dev/null
@@ -1,302 +0,0 @@
-# Lint as: python3
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Performs image retrieval on Revisited Oxford/Paris datasets."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import os
-import sys
-import time
-
-from absl import app
-import numpy as np
-import tensorflow as tf
-
-from google.protobuf import text_format
-from delf import aggregation_config_pb2
-from delf import datum_io
-from delf import feature_aggregation_similarity
-from delf.python.datasets.revisited_op import dataset
-from delf.python.detect_to_retrieve import image_reranking
-
-cmd_args = None
-
-# Aliases for aggregation types.
-_VLAD = aggregation_config_pb2.AggregationConfig.VLAD
-_ASMK = aggregation_config_pb2.AggregationConfig.ASMK
-_ASMK_STAR = aggregation_config_pb2.AggregationConfig.ASMK_STAR
-
-# Extensions.
-_VLAD_EXTENSION_SUFFIX = 'vlad'
-_ASMK_EXTENSION_SUFFIX = 'asmk'
-_ASMK_STAR_EXTENSION_SUFFIX = 'asmk_star'
-
-# Precision-recall ranks to use in metric computation.
-_PR_RANKS = (1, 5, 10)
-
-# Pace to log.
-_STATUS_CHECK_LOAD_ITERATIONS = 50
-
-# Output file names.
-_METRICS_FILENAME = 'metrics.txt'
-
-
-def _ReadAggregatedDescriptors(input_dir, image_list, config):
- """Reads aggregated descriptors.
-
- Args:
- input_dir: Directory where aggregated descriptors are located.
- image_list: List of image names for which to load descriptors.
- config: AggregationConfig used for images.
-
- Returns:
- aggregated_descriptors: List containing #images items, each a 1D NumPy
- array.
- visual_words: If using VLAD aggregation, returns an empty list. Otherwise,
- returns a list containing #images items, each a 1D NumPy array.
- """
- # Compose extension of aggregated descriptors.
- extension = '.'
- if config.use_regional_aggregation:
- extension += 'r'
- if config.aggregation_type == _VLAD:
- extension += _VLAD_EXTENSION_SUFFIX
- elif config.aggregation_type == _ASMK:
- extension += _ASMK_EXTENSION_SUFFIX
- elif config.aggregation_type == _ASMK_STAR:
- extension += _ASMK_STAR_EXTENSION_SUFFIX
- else:
- raise ValueError('Invalid aggregation type: %d' % config.aggregation_type)
-
- num_images = len(image_list)
- aggregated_descriptors = []
- visual_words = []
- print('Starting to collect descriptors for %d images...' % num_images)
- start = time.time()
- for i in range(num_images):
- if i > 0 and i % _STATUS_CHECK_LOAD_ITERATIONS == 0:
- elapsed = (time.time() - start)
- print('Reading descriptors for image %d out of %d, last %d '
- 'images took %f seconds' %
- (i, num_images, _STATUS_CHECK_LOAD_ITERATIONS, elapsed))
- start = time.time()
-
- descriptors_filename = image_list[i] + extension
- descriptors_fullpath = os.path.join(input_dir, descriptors_filename)
- if config.aggregation_type == _VLAD:
- aggregated_descriptors.append(datum_io.ReadFromFile(descriptors_fullpath))
- else:
- d, v = datum_io.ReadPairFromFile(descriptors_fullpath)
- if config.aggregation_type == _ASMK_STAR:
- d = d.astype('uint8')
-
- aggregated_descriptors.append(d)
- visual_words.append(v)
-
- return aggregated_descriptors, visual_words
-
-
-def main(argv):
- if len(argv) > 1:
- raise RuntimeError('Too many command-line arguments.')
-
- # Parse dataset to obtain query/index images, and ground-truth.
- print('Parsing dataset...')
- query_list, index_list, ground_truth = dataset.ReadDatasetFile(
- cmd_args.dataset_file_path)
- num_query_images = len(query_list)
- num_index_images = len(index_list)
- (_, medium_ground_truth,
- hard_ground_truth) = dataset.ParseEasyMediumHardGroundTruth(ground_truth)
- print('done! Found %d queries and %d index images' %
- (num_query_images, num_index_images))
-
- # Parse AggregationConfig protos.
- query_config = aggregation_config_pb2.AggregationConfig()
- with tf.io.gfile.GFile(cmd_args.query_aggregation_config_path, 'r') as f:
- text_format.Merge(f.read(), query_config)
- index_config = aggregation_config_pb2.AggregationConfig()
- with tf.io.gfile.GFile(cmd_args.index_aggregation_config_path, 'r') as f:
- text_format.Merge(f.read(), index_config)
-
- # Read aggregated descriptors.
- query_aggregated_descriptors, query_visual_words = _ReadAggregatedDescriptors(
- cmd_args.query_aggregation_dir, query_list, query_config)
- index_aggregated_descriptors, index_visual_words = _ReadAggregatedDescriptors(
- cmd_args.index_aggregation_dir, index_list, index_config)
-
- # Create similarity computer.
- similarity_computer = (
- feature_aggregation_similarity.SimilarityAggregatedRepresentation(
- index_config))
-
- # Compute similarity between query and index images, potentially re-ranking
- # with geometric verification.
- ranks_before_gv = np.zeros([num_query_images, num_index_images],
- dtype='int32')
- if cmd_args.use_geometric_verification:
- medium_ranks_after_gv = np.zeros([num_query_images, num_index_images],
- dtype='int32')
- hard_ranks_after_gv = np.zeros([num_query_images, num_index_images],
- dtype='int32')
- for i in range(num_query_images):
- print('Performing retrieval with query %d (%s)...' % (i, query_list[i]))
- start = time.time()
-
- # Compute similarity between aggregated descriptors.
- similarities = np.zeros([num_index_images])
- for j in range(num_index_images):
- similarities[j] = similarity_computer.ComputeSimilarity(
- query_aggregated_descriptors[i], index_aggregated_descriptors[j],
- query_visual_words[i], index_visual_words[j])
-
- ranks_before_gv[i] = np.argsort(-similarities)
-
- # Re-rank using geometric verification.
- if cmd_args.use_geometric_verification:
- medium_ranks_after_gv[i] = image_reranking.RerankByGeometricVerification(
- ranks_before_gv[i], similarities, query_list[i], index_list,
- cmd_args.query_features_dir, cmd_args.index_features_dir,
- set(medium_ground_truth[i]['junk']))
- hard_ranks_after_gv[i] = image_reranking.RerankByGeometricVerification(
- ranks_before_gv[i], similarities, query_list[i], index_list,
- cmd_args.query_features_dir, cmd_args.index_features_dir,
- set(hard_ground_truth[i]['junk']))
-
- elapsed = (time.time() - start)
- print('done! Retrieval for query %d took %f seconds' % (i, elapsed))
-
- # Create output directory if necessary.
- if not tf.io.gfile.exists(cmd_args.output_dir):
- tf.io.gfile.makedirs(cmd_args.output_dir)
-
- # Compute metrics.
- medium_metrics = dataset.ComputeMetrics(ranks_before_gv, medium_ground_truth,
- _PR_RANKS)
- hard_metrics = dataset.ComputeMetrics(ranks_before_gv, hard_ground_truth,
- _PR_RANKS)
- if cmd_args.use_geometric_verification:
- medium_metrics_after_gv = dataset.ComputeMetrics(medium_ranks_after_gv,
- medium_ground_truth,
- _PR_RANKS)
- hard_metrics_after_gv = dataset.ComputeMetrics(hard_ranks_after_gv,
- hard_ground_truth, _PR_RANKS)
-
- # Write metrics to file.
- mean_average_precision_dict = {
- 'medium': medium_metrics[0],
- 'hard': hard_metrics[0]
- }
- mean_precisions_dict = {'medium': medium_metrics[1], 'hard': hard_metrics[1]}
- mean_recalls_dict = {'medium': medium_metrics[2], 'hard': hard_metrics[2]}
- if cmd_args.use_geometric_verification:
- mean_average_precision_dict.update({
- 'medium_after_gv': medium_metrics_after_gv[0],
- 'hard_after_gv': hard_metrics_after_gv[0]
- })
- mean_precisions_dict.update({
- 'medium_after_gv': medium_metrics_after_gv[1],
- 'hard_after_gv': hard_metrics_after_gv[1]
- })
- mean_recalls_dict.update({
- 'medium_after_gv': medium_metrics_after_gv[2],
- 'hard_after_gv': hard_metrics_after_gv[2]
- })
- dataset.SaveMetricsFile(mean_average_precision_dict, mean_precisions_dict,
- mean_recalls_dict, _PR_RANKS,
- os.path.join(cmd_args.output_dir, _METRICS_FILENAME))
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser()
- parser.register('type', 'bool', lambda v: v.lower() == 'true')
- parser.add_argument(
- '--index_aggregation_config_path',
- type=str,
- default='/tmp/index_aggregation_config.pbtxt',
- help="""
- Path to index AggregationConfig proto text file. This is used to load the
- aggregated descriptors from the index, and to define the parameters used
- in computing similarity for aggregated descriptors.
- """)
- parser.add_argument(
- '--query_aggregation_config_path',
- type=str,
- default='/tmp/query_aggregation_config.pbtxt',
- help="""
- Path to query AggregationConfig proto text file. This is only used to load
- the aggregated descriptors for the queries.
- """)
- parser.add_argument(
- '--dataset_file_path',
- type=str,
- default='/tmp/gnd_roxford5k.mat',
- help="""
- Dataset file for Revisited Oxford or Paris dataset, in .mat format.
- """)
- parser.add_argument(
- '--index_aggregation_dir',
- type=str,
- default='/tmp/index_aggregation',
- help="""
- Directory where index aggregated descriptors are located.
- """)
- parser.add_argument(
- '--query_aggregation_dir',
- type=str,
- default='/tmp/query_aggregation',
- help="""
- Directory where query aggregated descriptors are located.
- """)
- parser.add_argument(
- '--use_geometric_verification',
- type=lambda x: (str(x).lower() == 'true'),
- default=False,
- help="""
- If True, performs re-ranking using local feature-based geometric
- verification.
- """)
- parser.add_argument(
- '--index_features_dir',
- type=str,
- default='/tmp/index_features',
- help="""
- Only used if `use_geometric_verification` is True.
- Directory where index local image features are located, all in .delf
- format.
- """)
- parser.add_argument(
- '--query_features_dir',
- type=str,
- default='/tmp/query_features',
- help="""
- Only used if `use_geometric_verification` is True.
- Directory where query local image features are located, all in .delf
- format.
- """)
- parser.add_argument(
- '--output_dir',
- type=str,
- default='/tmp/retrieval',
- help="""
- Directory where retrieval output will be written to. A file containing
- metrics for this run is saved therein, with file name "metrics.txt".
- """)
- cmd_args, unparsed = parser.parse_known_args()
- app.run(main=main, argv=[sys.argv[0]] + unparsed)
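The ranking step above reduces to a single argsort over each query's similarities; a tiny illustration:

```python
# Tiny illustration of the ranking step used above: argsort of the negated
# similarities yields index-image indices from most to least similar.
import numpy as np

similarities = np.array([0.12, 0.87, 0.45])
ranks = np.argsort(-similarities)
print(ranks)  # [1 2 0]
```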
diff --git a/research/delf/delf/python/detect_to_retrieve/query_aggregation_config.pbtxt b/research/delf/delf/python/detect_to_retrieve/query_aggregation_config.pbtxt
deleted file mode 100644
index 39a917eef43..00000000000
--- a/research/delf/delf/python/detect_to_retrieve/query_aggregation_config.pbtxt
+++ /dev/null
@@ -1,7 +0,0 @@
-codebook_size: 65536
-feature_dimensionality: 128
-aggregation_type: ASMK_STAR
-codebook_path: "parameters/rparis6k_codebook_65536/k65536_codebook_tfckpt/codebook"
-num_assignments: 1
-use_regional_aggregation: false
-feature_batch_size: 100
diff --git a/research/delf/delf/python/examples/__init__.py b/research/delf/delf/python/examples/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/delf/delf/python/examples/delf_config_example.pbtxt b/research/delf/delf/python/examples/delf_config_example.pbtxt
deleted file mode 100644
index ff2d9c0023c..00000000000
--- a/research/delf/delf/python/examples/delf_config_example.pbtxt
+++ /dev/null
@@ -1,25 +0,0 @@
-model_path: "parameters/delf_gld_20190411/model/"
-image_scales: .25
-image_scales: .3536
-image_scales: .5
-image_scales: .7071
-image_scales: 1.0
-image_scales: 1.4142
-image_scales: 2.0
-delf_local_config {
- use_pca: true
- # Note that for the exported model provided as an example, layer_name and
- # iou_threshold are hard-coded in the checkpoint. So, the layer_name and
- # iou_threshold variables here have no effect on the provided
- # extract_features.py script.
- layer_name: "resnet_v1_50/block3"
- iou_threshold: 1.0
- max_feature_num: 1000
- score_threshold: 100.0
- pca_parameters {
- mean_path: "parameters/delf_gld_20190411/pca/mean.datum"
- projection_matrix_path: "parameters/delf_gld_20190411/pca/pca_proj_mat.datum"
- pca_dim: 40
- use_whitening: false
- }
-}
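The example config above (with pca_dim: 40) would be consumed by the DELF extractor as in the sketch below, mirroring the deleted extract_query_features.py; the config and image paths are placeholders.

```python
# Sketch: running DELF extraction with the example config above. The config
# and image paths are placeholders.
import numpy as np
import tensorflow as tf
from google.protobuf import text_format

from delf import delf_config_pb2
from delf import extractor
from delf import utils

config = delf_config_pb2.DelfConfig()
with tf.io.gfile.GFile('/tmp/delf_config_example.pbtxt', 'r') as f:
  text_format.Merge(f.read(), config)

extractor_fn = extractor.MakeExtractor(config)
im = np.array(utils.RgbLoader('/tmp/images/example.jpg'))
local_features = extractor_fn(im)['local_features']
locations = local_features['locations']      # [num_features, 2]
descriptors = local_features['descriptors']  # 40-D, given pca_dim: 40 above
```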
diff --git a/research/delf/delf/python/examples/detection_example_1.jpg b/research/delf/delf/python/examples/detection_example_1.jpg
deleted file mode 100644
index afdb388f0de..00000000000
Binary files a/research/delf/delf/python/examples/detection_example_1.jpg and /dev/null differ
diff --git a/research/delf/delf/python/examples/detection_example_2.jpg b/research/delf/delf/python/examples/detection_example_2.jpg
deleted file mode 100644
index 5baf54a8088..00000000000
Binary files a/research/delf/delf/python/examples/detection_example_2.jpg and /dev/null differ
diff --git a/research/delf/delf/python/examples/detector.py b/research/delf/delf/python/examples/detector.py
deleted file mode 100644
index fd8aef1cf7f..00000000000
--- a/research/delf/delf/python/examples/detector.py
+++ /dev/null
@@ -1,55 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Module to construct object detector function."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-
-
-def MakeDetector(model_dir):
- """Creates a function to detect objects in an image.
-
- Args:
- model_dir: Directory where SavedModel is located.
-
- Returns:
- Function that receives an image and returns detection results.
- """
- model = tf.saved_model.load(model_dir)
-
- # Input and output tensors.
- feeds = ['input_images:0']
- fetches = ['detection_boxes:0', 'detection_scores:0', 'detection_classes:0']
-
- model = model.prune(feeds=feeds, fetches=fetches)
-
- def DetectorFn(images):
- """Receives an image and returns detected boxes.
-
- Args:
- images: Uint8 array with shape (batch, height, width, 3) containing a batch
- of RGB images.
-
- Returns:
- Tuple (boxes, scores, class_indices).
- """
- boxes, scores, class_indices = model(tf.convert_to_tensor(images))
-
- return boxes.numpy(), scores.numpy(), class_indices.numpy()
-
- return DetectorFn
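Usage of the deleted MakeDetector mirrors the extract_boxes.py script below; a minimal sketch with placeholder paths:

```python
# Sketch: detecting boxes in one image with the deleted MakeDetector. The
# SavedModel directory and image path are placeholders.
import numpy as np

from delf import detector
from delf import utils

detector_fn = detector.MakeDetector('/tmp/detector_model')
im = np.expand_dims(np.array(utils.RgbLoader('/tmp/images/example.jpg')), 0)
boxes, scores, class_indices = detector_fn(im)  # each carries a batch dim
```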
diff --git a/research/delf/delf/python/examples/extract_boxes.py b/research/delf/delf/python/examples/extract_boxes.py
deleted file mode 100644
index 1a3b4886a39..00000000000
--- a/research/delf/delf/python/examples/extract_boxes.py
+++ /dev/null
@@ -1,229 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Extracts bounding boxes from a list of images, saving them to files.
-
-The images must be in JPG format. The program checks if boxes already
-exist, and skips computation for those.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import os
-import sys
-import time
-
-from absl import app
-import matplotlib.patches as patches
-import matplotlib.pyplot as plt
-import numpy as np
-import tensorflow as tf
-
-from delf import box_io
-from delf import utils
-from delf import detector
-
-cmd_args = None
-
-# Extension/suffix of produced files.
-_BOX_EXT = '.boxes'
-_VIZ_SUFFIX = '_viz.jpg'
-
-# Used for plotting boxes.
-_BOX_EDGE_COLORS = ['r', 'y', 'b', 'm', 'k', 'g', 'c', 'w']
-
-# Pace to report extraction log.
-_STATUS_CHECK_ITERATIONS = 100
-
-
-def _ReadImageList(list_path):
- """Helper function to read image paths.
-
- Args:
- list_path: Path to list of images, one image path per line.
-
- Returns:
- image_paths: List of image paths.
- """
- with tf.io.gfile.GFile(list_path, 'r') as f:
- image_paths = f.readlines()
- image_paths = [entry.rstrip() for entry in image_paths]
- return image_paths
-
-
-def _FilterBoxesByScore(boxes, scores, class_indices, score_threshold):
- """Filter boxes based on detection scores.
-
- Boxes with detection score >= score_threshold are returned.
-
- Args:
- boxes: [N, 4] float array denoting bounding box coordinates, in format [top,
- left, bottom, right].
- scores: [N] float array with detection scores.
- class_indices: [N] int array with class indices.
- score_threshold: Float detection score threshold to use.
-
- Returns:
- selected_boxes: selected `boxes`.
- selected_scores: selected `scores`.
- selected_class_indices: selected `class_indices`.
- """
- selected_boxes = []
- selected_scores = []
- selected_class_indices = []
- for i, box in enumerate(boxes):
- if scores[i] >= score_threshold:
- selected_boxes.append(box)
- selected_scores.append(scores[i])
- selected_class_indices.append(class_indices[i])
-
- return np.array(selected_boxes), np.array(selected_scores), np.array(
- selected_class_indices)
-
-
-def _PlotBoxesAndSaveImage(image, boxes, output_path):
- """Plot boxes on image and save to output path.
-
- Args:
- image: Numpy array containing image.
- boxes: [N, 4] float array denoting bounding box coordinates, in format [top,
- left, bottom, right].
- output_path: String containing output path.
- """
- height = image.shape[0]
- width = image.shape[1]
-
- fig, ax = plt.subplots(1)
- ax.imshow(image)
- for i, box in enumerate(boxes):
- scaled_box = [
- box[0] * height, box[1] * width, box[2] * height, box[3] * width
- ]
- rect = patches.Rectangle([scaled_box[1], scaled_box[0]],
- scaled_box[3] - scaled_box[1],
- scaled_box[2] - scaled_box[0],
- linewidth=3,
- edgecolor=_BOX_EDGE_COLORS[i %
- len(_BOX_EDGE_COLORS)],
- facecolor='none')
- ax.add_patch(rect)
-
- ax.axis('off')
- plt.savefig(output_path, bbox_inches='tight')
- plt.close(fig)
-
-
-def main(argv):
- if len(argv) > 1:
- raise RuntimeError('Too many command-line arguments.')
-
- # Read list of images.
- print('Reading list of images...')
- image_paths = _ReadImageList(cmd_args.list_images_path)
- num_images = len(image_paths)
- print(f'done! Found {num_images} images')
-
- # Create output directories if necessary.
- if not tf.io.gfile.exists(cmd_args.output_dir):
- tf.io.gfile.makedirs(cmd_args.output_dir)
- if cmd_args.output_viz_dir and not tf.io.gfile.exists(
- cmd_args.output_viz_dir):
- tf.io.gfile.makedirs(cmd_args.output_viz_dir)
-
- detector_fn = detector.MakeDetector(cmd_args.detector_path)
-
- start = time.time()
- for i, image_path in enumerate(image_paths):
- # Report progress once in a while.
- if i == 0:
- print('Starting to detect objects in images...')
- elif i % _STATUS_CHECK_ITERATIONS == 0:
- elapsed = (time.time() - start)
- print(f'Processing image {i} out of {num_images}, last '
- f'{_STATUS_CHECK_ITERATIONS} images took {elapsed} seconds')
- start = time.time()
-
- # If boxes already exist for this image, skip computation.
- base_boxes_filename, _ = os.path.splitext(os.path.basename(image_path))
- out_boxes_filename = base_boxes_filename + _BOX_EXT
- out_boxes_fullpath = os.path.join(cmd_args.output_dir, out_boxes_filename)
- if tf.io.gfile.exists(out_boxes_fullpath):
- print(f'Skipping {image_path}')
- continue
-
- im = np.expand_dims(np.array(utils.RgbLoader(image_paths[i])), 0)
-
- # Extract and save boxes.
- (boxes_out, scores_out, class_indices_out) = detector_fn(im)
- (selected_boxes, selected_scores,
- selected_class_indices) = _FilterBoxesByScore(boxes_out[0], scores_out[0],
- class_indices_out[0],
- cmd_args.detector_thresh)
-
- box_io.WriteToFile(out_boxes_fullpath, selected_boxes, selected_scores,
- selected_class_indices)
- if cmd_args.output_viz_dir:
- out_viz_filename = base_boxes_filename + _VIZ_SUFFIX
- out_viz_fullpath = os.path.join(cmd_args.output_viz_dir, out_viz_filename)
- _PlotBoxesAndSaveImage(im[0], selected_boxes, out_viz_fullpath)
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser()
- parser.register('type', 'bool', lambda v: v.lower() == 'true')
- parser.add_argument(
- '--detector_path',
- type=str,
- default='/tmp/d2r_frcnn_20190411/',
- help="""
- Path to exported detector model.
- """)
- parser.add_argument(
- '--detector_thresh',
- type=float,
- default=.0,
- help="""
- Detector threshold. Any box with confidence score lower than this is not
- returned.
- """)
- parser.add_argument(
- '--list_images_path',
- type=str,
- default='list_images.txt',
- help="""
- Path to list of images to undergo object detection.
- """)
- parser.add_argument(
- '--output_dir',
- type=str,
- default='test_boxes',
- help="""
- Directory to which bounding boxes will be written. Each image's boxes
- will be written to a file with the same name, with the extension
- replaced by .boxes.
- """)
- parser.add_argument(
- '--output_viz_dir',
- type=str,
- default='',
- help="""
- Optional. If set, a visualization of the detected boxes overlaid on the
- image is produced, and saved to this directory. Each image is saved with
- _viz.jpg suffix.
- """)
- cmd_args, unparsed = parser.parse_known_args()
- app.run(main=main, argv=[sys.argv[0]] + unparsed)
diff --git a/research/delf/delf/python/examples/extract_features.py b/research/delf/delf/python/examples/extract_features.py
deleted file mode 100644
index 1b55cba9fb6..00000000000
--- a/research/delf/delf/python/examples/extract_features.py
+++ /dev/null
@@ -1,142 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Extracts DELF features from a list of images, saving them to file.
-
-The images must be in JPG format. The program checks if descriptors already
-exist, and skips computation for those.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import os
-import sys
-import time
-
-from absl import app
-import numpy as np
-from six.moves import range
-import tensorflow as tf
-
-from google.protobuf import text_format
-from delf import delf_config_pb2
-from delf import feature_io
-from delf import utils
-from delf import extractor
-
-cmd_args = None
-
-# Extension of feature files.
-_DELF_EXT = '.delf'
-
- # Interval (in images) at which to log extraction progress.
-_STATUS_CHECK_ITERATIONS = 100
-
-
-def _ReadImageList(list_path):
- """Helper function to read image paths.
-
- Args:
- list_path: Path to list of images, one image path per line.
-
- Returns:
- image_paths: List of image paths.
- """
- with tf.io.gfile.GFile(list_path, 'r') as f:
- image_paths = f.readlines()
- image_paths = [entry.rstrip() for entry in image_paths]
- return image_paths
-
-
-def main(unused_argv):
- # Read list of images.
- print('Reading list of images...')
- image_paths = _ReadImageList(cmd_args.list_images_path)
- num_images = len(image_paths)
- print(f'done! Found {num_images} images')
-
- # Parse DelfConfig proto.
- config = delf_config_pb2.DelfConfig()
- with tf.io.gfile.GFile(cmd_args.config_path, 'r') as f:
- text_format.Merge(f.read(), config)
-
- # Create output directory if necessary.
- if not tf.io.gfile.exists(cmd_args.output_dir):
- tf.io.gfile.makedirs(cmd_args.output_dir)
-
- extractor_fn = extractor.MakeExtractor(config)
-
- start = time.time()
- for i in range(num_images):
- # Report progress once in a while.
- if i == 0:
- print('Starting to extract DELF features from images...')
- elif i % _STATUS_CHECK_ITERATIONS == 0:
- elapsed = (time.time() - start)
- print(f'Processing image {i} out of {num_images}, last '
- f'{_STATUS_CHECK_ITERATIONS} images took {elapsed} seconds')
- start = time.time()
-
- # If descriptor already exists, skip its computation.
- out_desc_filename = os.path.splitext(os.path.basename(
- image_paths[i]))[0] + _DELF_EXT
- out_desc_fullpath = os.path.join(cmd_args.output_dir, out_desc_filename)
- if tf.io.gfile.exists(out_desc_fullpath):
- print(f'Skipping {image_paths[i]}')
- continue
-
- im = np.array(utils.RgbLoader(image_paths[i]))
-
- # Extract and save features.
- extracted_features = extractor_fn(im)
- locations_out = extracted_features['local_features']['locations']
- descriptors_out = extracted_features['local_features']['descriptors']
- feature_scales_out = extracted_features['local_features']['scales']
- attention_out = extracted_features['local_features']['attention']
-
- feature_io.WriteToFile(out_desc_fullpath, locations_out, feature_scales_out,
- descriptors_out, attention_out)
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser()
- parser.register('type', 'bool', lambda v: v.lower() == 'true')
- parser.add_argument(
- '--config_path',
- type=str,
- default='delf_config_example.pbtxt',
- help="""
- Path to DelfConfig proto text file with configuration to be used for DELF
- extraction.
- """)
- parser.add_argument(
- '--list_images_path',
- type=str,
- default='list_images.txt',
- help="""
- Path to list of images whose DELF features will be extracted.
- """)
- parser.add_argument(
- '--output_dir',
- type=str,
- default='test_features',
- help="""
- Directory to which DELF features will be written. Each image's features
- will be written to a file with the same name, with the extension replaced by .delf.
- """)
- cmd_args, unparsed = parser.parse_known_args()
- app.run(main=main, argv=[sys.argv[0]] + unparsed)
diff --git a/research/delf/delf/python/examples/extractor.py b/research/delf/delf/python/examples/extractor.py
deleted file mode 100644
index a6932b1de58..00000000000
--- a/research/delf/delf/python/examples/extractor.py
+++ /dev/null
@@ -1,262 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Module to construct DELF feature extractor."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import tensorflow as tf
-
-from delf import datum_io
-from delf import feature_extractor
-from delf import utils
-
-# Minimum dimensions below which features are not extracted (empty
-# features are returned). This applies after any resizing is performed.
-_MIN_HEIGHT = 10
-_MIN_WIDTH = 10
-
-
-def MakeExtractor(config):
- """Creates a function to extract global and/or local features from an image.
-
- Args:
- config: DelfConfig proto containing the model configuration.
-
- Returns:
- Function that receives an image and returns features.
-
- Raises:
- ValueError: if config is invalid.
- """
- # Assert the configuration.
- if not config.use_local_features and not config.use_global_features:
- raise ValueError('Invalid config: at least one of '
- '{use_local_features, use_global_features} must be True')
-
- # Load model.
- model = tf.saved_model.load(config.model_path)
-
- # Input image scales to use for extraction.
- image_scales_tensor = tf.convert_to_tensor(list(config.image_scales))
-
- # Input (feeds) and output (fetches) end-points. These are only needed when
- # using a model that was exported using TF1.
- feeds = ['input_image:0', 'input_scales:0']
- fetches = []
-
- # Custom configuration needed when local features are used.
- if config.use_local_features:
- # Extra input/output end-points/tensors.
- feeds.append('input_abs_thres:0')
- feeds.append('input_max_feature_num:0')
- fetches.append('boxes:0')
- fetches.append('features:0')
- fetches.append('scales:0')
- fetches.append('scores:0')
- score_threshold_tensor = tf.constant(
- config.delf_local_config.score_threshold)
- max_feature_num_tensor = tf.constant(
- config.delf_local_config.max_feature_num)
-
- # If using PCA, pre-load required parameters.
- local_pca_parameters = {}
- if config.delf_local_config.use_pca:
- local_pca_parameters['mean'] = tf.constant(
- datum_io.ReadFromFile(
- config.delf_local_config.pca_parameters.mean_path),
- dtype=tf.float32)
- local_pca_parameters['matrix'] = tf.constant(
- datum_io.ReadFromFile(
- config.delf_local_config.pca_parameters.projection_matrix_path),
- dtype=tf.float32)
- local_pca_parameters[
- 'dim'] = config.delf_local_config.pca_parameters.pca_dim
- local_pca_parameters['use_whitening'] = (
- config.delf_local_config.pca_parameters.use_whitening)
- if config.delf_local_config.pca_parameters.use_whitening:
- local_pca_parameters['variances'] = tf.squeeze(
- tf.constant(
- datum_io.ReadFromFile(
- config.delf_local_config.pca_parameters.pca_variances_path),
- dtype=tf.float32))
- else:
- local_pca_parameters['variances'] = None
-
- # Custom configuration needed when global features are used.
- if config.use_global_features:
- # Extra input/output end-points/tensors.
- feeds.append('input_global_scales_ind:0')
- fetches.append('global_descriptors:0')
- if config.delf_global_config.image_scales_ind:
- global_scales_ind_tensor = tf.constant(
- list(config.delf_global_config.image_scales_ind))
- else:
- global_scales_ind_tensor = tf.range(len(config.image_scales))
-
- # If using PCA, pre-load required parameters.
- global_pca_parameters = {}
- if config.delf_global_config.use_pca:
- global_pca_parameters['mean'] = tf.constant(
- datum_io.ReadFromFile(
- config.delf_global_config.pca_parameters.mean_path),
- dtype=tf.float32)
- global_pca_parameters['matrix'] = tf.constant(
- datum_io.ReadFromFile(
- config.delf_global_config.pca_parameters.projection_matrix_path),
- dtype=tf.float32)
- global_pca_parameters[
- 'dim'] = config.delf_global_config.pca_parameters.pca_dim
- global_pca_parameters['use_whitening'] = (
- config.delf_global_config.pca_parameters.use_whitening)
- if config.delf_global_config.pca_parameters.use_whitening:
- global_pca_parameters['variances'] = tf.squeeze(
- tf.constant(
- datum_io.ReadFromFile(config.delf_global_config.pca_parameters
- .pca_variances_path),
- dtype=tf.float32))
- else:
- global_pca_parameters['variances'] = None
-
- if not hasattr(config, 'is_tf2_exported') or not config.is_tf2_exported:
- model = model.prune(feeds=feeds, fetches=fetches)
-
- def ExtractorFn(image, resize_factor=1.0):
- """Receives an image and returns DELF global and/or local features.
-
- If the image is too small, empty features are returned.
-
- Args:
- image: Uint8 array with shape (height, width, 3) containing the RGB image.
- resize_factor: Optional float resize factor for the input image. If given,
- the maximum and minimum allowed image sizes in the config are scaled by
- this factor.
-
- Returns:
- extracted_features: A dict containing the extracted global descriptors
- (key 'global_descriptor' mapping to a [D] float array), and/or local
- features (key 'local_features' mapping to a dict with keys 'locations',
- 'descriptors', 'scales', 'attention').
- """
- resized_image, scale_factors = utils.ResizeImage(
- image, config, resize_factor=resize_factor)
-
- # If the image is too small, return empty features.
- if resized_image.shape[0] < _MIN_HEIGHT or resized_image.shape[
- 1] < _MIN_WIDTH:
- extracted_features = {'global_descriptor': np.array([])}
- if config.use_local_features:
- extracted_features.update({
- 'local_features': {
- 'locations': np.array([]),
- 'descriptors': np.array([]),
- 'scales': np.array([]),
- 'attention': np.array([]),
- }
- })
- return extracted_features
-
- # Input tensors.
- image_tensor = tf.convert_to_tensor(resized_image)
-
- # Extracted features.
- extracted_features = {}
- output = None
-
- if hasattr(config, 'is_tf2_exported') and config.is_tf2_exported:
- predict = model.signatures['serving_default']
- if config.use_local_features and config.use_global_features:
- output_dict = predict(
- input_image=image_tensor,
- input_scales=image_scales_tensor,
- input_max_feature_num=max_feature_num_tensor,
- input_abs_thres=score_threshold_tensor,
- input_global_scales_ind=global_scales_ind_tensor)
- output = [
- output_dict['boxes'], output_dict['features'],
- output_dict['scales'], output_dict['scores'],
- output_dict['global_descriptors']
- ]
- elif config.use_local_features:
- output_dict = predict(
- input_image=image_tensor,
- input_scales=image_scales_tensor,
- input_max_feature_num=max_feature_num_tensor,
- input_abs_thres=score_threshold_tensor)
- output = [
- output_dict['boxes'], output_dict['features'],
- output_dict['scales'], output_dict['scores']
- ]
- else:
- output_dict = predict(
- input_image=image_tensor,
- input_scales=image_scales_tensor,
- input_global_scales_ind=global_scales_ind_tensor)
- output = [output_dict['global_descriptors']]
- else:
- if config.use_local_features and config.use_global_features:
- output = model(image_tensor, image_scales_tensor,
- score_threshold_tensor, max_feature_num_tensor,
- global_scales_ind_tensor)
- elif config.use_local_features:
- output = model(image_tensor, image_scales_tensor,
- score_threshold_tensor, max_feature_num_tensor)
- else:
- output = model(image_tensor, image_scales_tensor,
- global_scales_ind_tensor)
-
- # Post-process extracted features: normalize, PCA (optional), pooling.
- if config.use_global_features:
- raw_global_descriptors = output[-1]
- global_descriptors_per_scale = feature_extractor.PostProcessDescriptors(
- raw_global_descriptors, config.delf_global_config.use_pca,
- global_pca_parameters)
- unnormalized_global_descriptor = tf.reduce_sum(
- global_descriptors_per_scale, axis=0, name='sum_pooling')
- global_descriptor = tf.nn.l2_normalize(
- unnormalized_global_descriptor, axis=0, name='final_l2_normalization')
- extracted_features.update({
- 'global_descriptor': global_descriptor.numpy(),
- })
-
- if config.use_local_features:
- boxes = output[0]
- raw_local_descriptors = output[1]
- feature_scales = output[2]
- attention_with_extra_dim = output[3]
-
- attention = tf.reshape(attention_with_extra_dim,
- [tf.shape(attention_with_extra_dim)[0]])
- locations, local_descriptors = (
- feature_extractor.DelfFeaturePostProcessing(
- boxes, raw_local_descriptors, config.delf_local_config.use_pca,
- local_pca_parameters))
- if not config.delf_local_config.use_resized_coordinates:
- locations /= scale_factors
-
- extracted_features.update({
- 'local_features': {
- 'locations': locations.numpy(),
- 'descriptors': local_descriptors.numpy(),
- 'scales': feature_scales.numpy(),
- 'attention': attention.numpy(),
- }
- })
-
- return extracted_features
-
- return ExtractorFn
diff --git a/research/delf/delf/python/examples/match_images.py b/research/delf/delf/python/examples/match_images.py
deleted file mode 100644
index f14f93f9eb5..00000000000
--- a/research/delf/delf/python/examples/match_images.py
+++ /dev/null
@@ -1,144 +0,0 @@
-# Lint as: python3
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Matches two images using their DELF features.
-
-The matching is done using feature-based nearest-neighbor search, followed by
-geometric verification using RANSAC.
-
-The DELF features can be extracted using the extract_features.py script.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import argparse
-import sys
-
-from absl import app
-import matplotlib
-# Needed before pyplot import for matplotlib to work properly.
-matplotlib.use('Agg')
-import matplotlib.image as mpimg # pylint: disable=g-import-not-at-top
-import matplotlib.pyplot as plt
-import numpy as np
-from scipy import spatial
-from skimage import feature
-from skimage import measure
-from skimage import transform
-
-from delf import feature_io
-
-cmd_args = None
-
-_DISTANCE_THRESHOLD = 0.8
-
-
-def main(unused_argv):
- # Read features.
- locations_1, _, descriptors_1, _, _ = feature_io.ReadFromFile(
- cmd_args.features_1_path)
- num_features_1 = locations_1.shape[0]
- print(f"Loaded image 1's {num_features_1} features")
- locations_2, _, descriptors_2, _, _ = feature_io.ReadFromFile(
- cmd_args.features_2_path)
- num_features_2 = locations_2.shape[0]
- print(f"Loaded image 2's {num_features_2} features")
-
- # Find nearest-neighbor matches using a KD tree.
- d1_tree = spatial.cKDTree(descriptors_1)
- _, indices = d1_tree.query(
- descriptors_2, distance_upper_bound=_DISTANCE_THRESHOLD)
-
- # Select feature locations for putative matches.
- locations_2_to_use = np.array([
- locations_2[i,]
- for i in range(num_features_2)
- if indices[i] != num_features_1
- ])
- locations_1_to_use = np.array([
- locations_1[indices[i],]
- for i in range(num_features_2)
- if indices[i] != num_features_1
- ])
-
- # Perform geometric verification using RANSAC.
- _, inliers = measure.ransac((locations_1_to_use, locations_2_to_use),
- transform.AffineTransform,
- min_samples=3,
- residual_threshold=20,
- max_trials=1000)
-
- print(f'Found {sum(inliers)} inliers')
-
- # Visualize correspondences, and save to file.
- _, ax = plt.subplots()
- img_1 = mpimg.imread(cmd_args.image_1_path)
- img_2 = mpimg.imread(cmd_args.image_2_path)
- inlier_idxs = np.nonzero(inliers)[0]
- feature.plot_matches(
- ax,
- img_1,
- img_2,
- locations_1_to_use,
- locations_2_to_use,
- np.column_stack((inlier_idxs, inlier_idxs)),
- matches_color='b')
- ax.axis('off')
- ax.set_title('DELF correspondences')
- plt.savefig(cmd_args.output_image)
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser()
- parser.register('type', 'bool', lambda v: v.lower() == 'true')
- parser.add_argument(
- '--image_1_path',
- type=str,
- default='test_images/image_1.jpg',
- help="""
- Path to test image 1.
- """)
- parser.add_argument(
- '--image_2_path',
- type=str,
- default='test_images/image_2.jpg',
- help="""
- Path to test image 2.
- """)
- parser.add_argument(
- '--features_1_path',
- type=str,
- default='test_features/image_1.delf',
- help="""
- Path to DELF features from image 1.
- """)
- parser.add_argument(
- '--features_2_path',
- type=str,
- default='test_features/image_2.delf',
- help="""
- Path to DELF features from image 2.
- """)
- parser.add_argument(
- '--output_image',
- type=str,
- default='test_match.png',
- help="""
- Path where an image showing the matches will be saved.
- """)
- cmd_args, unparsed = parser.parse_known_args()
- app.run(main=main, argv=[sys.argv[0]] + unparsed)
diff --git a/research/delf/delf/python/examples/matched_images_example.jpg b/research/delf/delf/python/examples/matched_images_example.jpg
deleted file mode 100644
index bbd0061ac02..00000000000
Binary files a/research/delf/delf/python/examples/matched_images_example.jpg and /dev/null differ
diff --git a/research/delf/delf/python/feature_aggregation_extractor.py b/research/delf/delf/python/feature_aggregation_extractor.py
deleted file mode 100644
index 29496a0c20c..00000000000
--- a/research/delf/delf/python/feature_aggregation_extractor.py
+++ /dev/null
@@ -1,472 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Local feature aggregation extraction.
-
-For more details, please refer to the paper:
-"Detect-to-Retrieve: Efficient Regional Aggregation for Image Search",
-Proc. CVPR'19 (https://arxiv.org/abs/1812.01584).
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import tensorflow as tf
-
-from delf import aggregation_config_pb2
-
-_CLUSTER_CENTERS_VAR_NAME = "clusters"
-_NORM_SQUARED_TOLERANCE = 1e-12
-
-# Aliases for aggregation types.
-_VLAD = aggregation_config_pb2.AggregationConfig.VLAD
-_ASMK = aggregation_config_pb2.AggregationConfig.ASMK
-_ASMK_STAR = aggregation_config_pb2.AggregationConfig.ASMK_STAR
-
-
-class ExtractAggregatedRepresentation(object):
- """Class for extraction of aggregated local feature representation.
-
- Args:
- aggregation_config: AggregationConfig object defining type of aggregation to
- use.
-
- Raises:
- ValueError: If aggregation type is invalid.
- """
-
- def __init__(self, aggregation_config):
- self._codebook_size = aggregation_config.codebook_size
- self._feature_dimensionality = aggregation_config.feature_dimensionality
- self._aggregation_type = aggregation_config.aggregation_type
- self._feature_batch_size = aggregation_config.feature_batch_size
- self._codebook_path = aggregation_config.codebook_path
- self._use_regional_aggregation = aggregation_config.use_regional_aggregation
- self._use_l2_normalization = aggregation_config.use_l2_normalization
- self._num_assignments = aggregation_config.num_assignments
-
- if self._aggregation_type not in [_VLAD, _ASMK, _ASMK_STAR]:
- raise ValueError("Invalid aggregation type: %d" % self._aggregation_type)
-
- # Load codebook
- codebook = tf.Variable(
- tf.zeros([self._codebook_size, self._feature_dimensionality],
- dtype=tf.float32),
- name=_CLUSTER_CENTERS_VAR_NAME)
- ckpt = tf.train.Checkpoint(codebook=codebook)
- ckpt.restore(self._codebook_path)
-
- self._codebook = codebook
-
- def Extract(self, features, num_features_per_region=None):
- """Extracts aggregated representation.
-
- Args:
- features: [N, D] float numpy array with N local feature descriptors.
- num_features_per_region: Required only if computing regional aggregated
- representations, otherwise optional. List of number of features per
- region, such that sum(num_features_per_region) = N. It indicates which
- features correspond to each region.
-
- Returns:
- aggregated_descriptors: 1-D numpy array.
- feature_visual_words: Used only for ASMK/ASMK* aggregation type. 1-D
- numpy array denoting visual words corresponding to the
- `aggregated_descriptors`.
-
- Raises:
- ValueError: If inputs are misconfigured.
- """
- features = tf.cast(features, dtype=tf.float32)
-
- if num_features_per_region is None:
- # Use dummy value since it is unused.
- num_features_per_region = []
- else:
- num_features_per_region = tf.cast(num_features_per_region, dtype=tf.int32)
- if len(num_features_per_region
- ) and sum(num_features_per_region) != features.shape[0]:
- raise ValueError(
- "Incorrect arguments: sum(num_features_per_region) and "
- "features.shape[0] are different: %d vs %d" %
- (sum(num_features_per_region), features.shape[0]))
-
- # Extract features based on desired options.
- if self._aggregation_type == _VLAD:
- # Feature visual words are unused in the case of VLAD, so just return
- # dummy constant.
- feature_visual_words = tf.constant(-1, dtype=tf.int32)
- if self._use_regional_aggregation:
- aggregated_descriptors = self._ComputeRvlad(
- features,
- num_features_per_region,
- self._codebook,
- use_l2_normalization=self._use_l2_normalization,
- num_assignments=self._num_assignments)
- else:
- aggregated_descriptors = self._ComputeVlad(
- features,
- self._codebook,
- use_l2_normalization=self._use_l2_normalization,
- num_assignments=self._num_assignments)
- elif (self._aggregation_type == _ASMK or
- self._aggregation_type == _ASMK_STAR):
- if self._use_regional_aggregation:
- (aggregated_descriptors,
- feature_visual_words) = self._ComputeRasmk(
- features,
- num_features_per_region,
- self._codebook,
- num_assignments=self._num_assignments)
- else:
- (aggregated_descriptors,
- feature_visual_words) = self._ComputeAsmk(
- features,
- self._codebook,
- num_assignments=self._num_assignments)
-
- feature_visual_words_output = feature_visual_words.numpy()
-
- # If using ASMK*/RASMK*, binarize the aggregated descriptors.
- if self._aggregation_type == _ASMK_STAR:
- reshaped_aggregated_descriptors = np.reshape(
- aggregated_descriptors, [-1, self._feature_dimensionality])
- packed_descriptors = np.packbits(
- reshaped_aggregated_descriptors > 0, axis=1)
- aggregated_descriptors_output = np.reshape(packed_descriptors, [-1])
- else:
- aggregated_descriptors_output = aggregated_descriptors.numpy()
-
- return aggregated_descriptors_output, feature_visual_words_output
-
- def _ComputeVlad(self,
- features,
- codebook,
- use_l2_normalization=True,
- num_assignments=1):
- """Compute VLAD representation.
-
- Args:
- features: [N, D] float tensor.
- codebook: [K, D] float tensor.
- use_l2_normalization: If False, does not L2-normalize after aggregation.
- num_assignments: Number of visual words to assign a feature to.
-
- Returns:
- vlad: [K*D] float tensor.
- """
-
- def _ComputeVladEmptyFeatures():
- """Computes VLAD if `features` is empty.
-
- Returns:
- [K*D] all-zeros tensor.
- """
- return tf.zeros([self._codebook_size * self._feature_dimensionality],
- dtype=tf.float32)
-
- def _ComputeVladNonEmptyFeatures():
- """Computes VLAD if `features` is not empty.
-
- Returns:
- [K*D] tensor with VLAD descriptor.
- """
- num_features = tf.shape(features)[0]
-
- # Find nearest visual words for each feature. Possibly batch the local
- # features to avoid OOM.
- if self._feature_batch_size <= 0:
- actual_batch_size = num_features
- else:
- actual_batch_size = self._feature_batch_size
-
- def _BatchNearestVisualWords(ind, selected_visual_words):
- """Compute nearest neighbor visual words for a batch of features.
-
- Args:
- ind: Integer index denoting feature.
- selected_visual_words: Partial set of visual words.
-
- Returns:
- output_ind: Next index.
- output_selected_visual_words: Updated set of visual words, including
- the visual words for the new batch.
- """
- # Handle case of last batch, where there may be fewer than
- # `actual_batch_size` features.
- batch_size_to_use = tf.cond(
- tf.greater(ind + actual_batch_size, num_features),
- true_fn=lambda: num_features - ind,
- false_fn=lambda: actual_batch_size)
-
- # Denote B = batch_size_to_use.
- # K*B x D.
- tiled_features = tf.reshape(
- tf.tile(
- tf.slice(features, [ind, 0],
- [batch_size_to_use, self._feature_dimensionality]),
- [1, self._codebook_size]), [-1, self._feature_dimensionality])
- # K*B x D.
- tiled_codebook = tf.reshape(
- tf.tile(tf.reshape(codebook, [1, -1]), [batch_size_to_use, 1]),
- [-1, self._feature_dimensionality])
- # B x K.
- squared_distances = tf.reshape(
- tf.reduce_sum(
- tf.math.squared_difference(tiled_features, tiled_codebook),
- axis=1), [batch_size_to_use, self._codebook_size])
- # B x K.
- nearest_visual_words = tf.argsort(squared_distances)
- # B x num_assignments.
- batch_selected_visual_words = tf.slice(
- nearest_visual_words, [0, 0], [batch_size_to_use, num_assignments])
- selected_visual_words = tf.concat(
- [selected_visual_words, batch_selected_visual_words], axis=0)
-
- return ind + batch_size_to_use, selected_visual_words
-
- ind_batch = tf.constant(0, dtype=tf.int32)
- keep_going = lambda j, selected_visual_words: tf.less(j, num_features)
- selected_visual_words = tf.zeros([0, num_assignments], dtype=tf.int32)
- _, selected_visual_words = tf.while_loop(
- cond=keep_going,
- body=_BatchNearestVisualWords,
- loop_vars=[ind_batch, selected_visual_words],
- shape_invariants=[
- ind_batch.get_shape(),
- tf.TensorShape([None, num_assignments])
- ],
- parallel_iterations=1,
- back_prop=False)
-
- # Helper function to collect residuals for relevant visual words.
- def _ConstructVladFromAssignments(ind, vlad):
- """Add contributions of a feature to a VLAD descriptor.
-
- Args:
- ind: Integer index denoting feature.
- vlad: Partial VLAD descriptor.
-
- Returns:
- output_ind: Next index (i.e., ind+1).
- output_vlad: VLAD descriptor updated to take into account contribution
- from ind-th feature.
- """
- diff = tf.tile(
- tf.expand_dims(features[ind],
- axis=0), [num_assignments, 1]) - tf.gather(
- codebook, selected_visual_words[ind])
- return ind + 1, tf.tensor_scatter_nd_add(
- vlad, tf.expand_dims(selected_visual_words[ind], axis=1), diff)
-
- ind_vlad = tf.constant(0, dtype=tf.int32)
- keep_going = lambda j, vlad: tf.less(j, num_features)
- vlad = tf.zeros([self._codebook_size, self._feature_dimensionality],
- dtype=tf.float32)
- _, vlad = tf.while_loop(
- cond=keep_going,
- body=_ConstructVladFromAssignments,
- loop_vars=[ind_vlad, vlad],
- back_prop=False)
-
- vlad = tf.reshape(vlad,
- [self._codebook_size * self._feature_dimensionality])
- if use_l2_normalization:
- vlad = tf.math.l2_normalize(vlad, epsilon=_NORM_SQUARED_TOLERANCE)
-
- return vlad
-
- return tf.cond(
- tf.greater(tf.size(features), 0),
- true_fn=_ComputeVladNonEmptyFeatures,
- false_fn=_ComputeVladEmptyFeatures)
-
- def _ComputeRvlad(self,
- features,
- num_features_per_region,
- codebook,
- use_l2_normalization=False,
- num_assignments=1):
- """Compute R-VLAD representation.
-
- Args:
- features: [N, D] float tensor.
- num_features_per_region: [R] int tensor. Contains number of features per
- region, such that sum(num_features_per_region) = N. It indicates which
- features correspond to each region.
- codebook: [K, D] float tensor.
- use_l2_normalization: If True, performs L2-normalization after regional
- aggregation; if False (default), performs componentwise division by R
- after regional aggregation.
- num_assignments: Number of visual words to assign a feature to.
-
- Returns:
- rvlad: [K*D] float tensor.
- """
-
- def _ComputeRvladEmptyRegions():
- """Computes R-VLAD if `num_features_per_region` is empty.
-
- Returns:
- [K*D] all-zeros tensor.
- """
- return tf.zeros([self._codebook_size * self._feature_dimensionality],
- dtype=tf.float32)
-
- def _ComputeRvladNonEmptyRegions():
- """Computes R-VLAD if `num_features_per_region` is not empty.
-
- Returns:
- [K*D] tensor with R-VLAD descriptor.
- """
-
- # Helper function to compose initial R-VLAD from image regions.
- def _ConstructRvladFromVlad(ind, rvlad):
- """Add contributions from different regions into R-VLAD.
-
- Args:
- ind: Integer index denoting region.
- rvlad: Partial R-VLAD descriptor.
-
- Returns:
- output_ind: Next index (i.e., ind+1).
- output_rvlad: R-VLAD descriptor updated to take into account
- contribution from ind-th region.
- """
- return ind + 1, rvlad + self._ComputeVlad(
- tf.slice(
- features, [tf.reduce_sum(num_features_per_region[:ind]), 0],
- [num_features_per_region[ind], self._feature_dimensionality]),
- codebook,
- num_assignments=num_assignments)
-
- i = tf.constant(0, dtype=tf.int32)
- num_regions = tf.shape(num_features_per_region)[0]
- keep_going = lambda j, rvlad: tf.less(j, num_regions)
- rvlad = tf.zeros([self._codebook_size * self._feature_dimensionality],
- dtype=tf.float32)
- _, rvlad = tf.while_loop(
- cond=keep_going,
- body=_ConstructRvladFromVlad,
- loop_vars=[i, rvlad],
- back_prop=False,
- parallel_iterations=1)
-
- if use_l2_normalization:
- rvlad = tf.math.l2_normalize(rvlad, epsilon=_NORM_SQUARED_TOLERANCE)
- else:
- rvlad /= tf.cast(num_regions, dtype=tf.float32)
-
- return rvlad
-
- return tf.cond(
- tf.greater(tf.size(num_features_per_region), 0),
- true_fn=_ComputeRvladNonEmptyRegions,
- false_fn=_ComputeRvladEmptyRegions)
-
- def _PerCentroidNormalization(self, unnormalized_vector):
- """Perform per-centroid normalization.
-
- Args:
- unnormalized_vector: [KxD] float tensor.
-
- Returns:
- per_centroid_normalized_vector: [KxD] float tensor, with normalized
- aggregated residuals. Some residuals may be all-zero.
- visual_words: Int tensor containing indices of visual words which are
- present for the set of features.
- """
- unnormalized_vector = tf.reshape(
- unnormalized_vector,
- [self._codebook_size, self._feature_dimensionality])
- per_centroid_norms = tf.norm(unnormalized_vector, axis=1)
-
- visual_words = tf.reshape(
- tf.where(
- tf.greater(per_centroid_norms, tf.sqrt(_NORM_SQUARED_TOLERANCE))),
- [-1])
-
- per_centroid_normalized_vector = tf.math.l2_normalize(
- unnormalized_vector, axis=1, epsilon=_NORM_SQUARED_TOLERANCE)
-
- return per_centroid_normalized_vector, visual_words
-
- def _ComputeAsmk(self, features, codebook, num_assignments=1):
- """Compute ASMK representation.
-
- Args:
- features: [N, D] float tensor.
- codebook: [K, D] float tensor.
- num_assignments: Number of visual words to assign a feature to.
-
- Returns:
- normalized_residuals: 1-dimensional float tensor with concatenated
- residuals which are non-zero. Note that the dimensionality is
- input-dependent.
- visual_words: 1-dimensional int tensor of sorted visual word ids.
- Dimensionality is shape(normalized_residuals)[0] / D.
- """
- unnormalized_vlad = self._ComputeVlad(
- features,
- codebook,
- use_l2_normalization=False,
- num_assignments=num_assignments)
-
- per_centroid_normalized_vlad, visual_words = self._PerCentroidNormalization(
- unnormalized_vlad)
-
- normalized_residuals = tf.reshape(
- tf.gather(per_centroid_normalized_vlad, visual_words),
- [tf.shape(visual_words)[0] * self._feature_dimensionality])
-
- return normalized_residuals, visual_words
-
- def _ComputeRasmk(self,
- features,
- num_features_per_region,
- codebook,
- num_assignments=1):
- """Compute R-ASMK representation.
-
- Args:
- features: [N, D] float tensor.
- num_features_per_region: [R] int tensor. Contains number of features per
- region, such that sum(num_features_per_region) = N. It indicates which
- features correspond to each region.
- codebook: [K, D] float tensor.
- num_assignments: Number of visual words to assign a feature to.
-
- Returns:
- normalized_residuals: 1-dimensional float tensor with concatenated
- residuals which are non-zero. Note that the dimensionality is
- input-dependent.
- visual_words: 1-dimensional int tensor of sorted visual word ids.
- Dimensionality is shape(normalized_residuals)[0] / D.
- """
- unnormalized_rvlad = self._ComputeRvlad(
- features,
- num_features_per_region,
- codebook,
- use_l2_normalization=False,
- num_assignments=num_assignments)
-
- (per_centroid_normalized_rvlad,
- visual_words) = self._PerCentroidNormalization(unnormalized_rvlad)
-
- normalized_residuals = tf.reshape(
- tf.gather(per_centroid_normalized_rvlad, visual_words),
- [tf.shape(visual_words)[0] * self._feature_dimensionality])
-
- return normalized_residuals, visual_words
diff --git a/research/delf/delf/python/feature_aggregation_extractor_test.py b/research/delf/delf/python/feature_aggregation_extractor_test.py
deleted file mode 100644
index dfba92a2b1b..00000000000
--- a/research/delf/delf/python/feature_aggregation_extractor_test.py
+++ /dev/null
@@ -1,494 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for DELF feature aggregation."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-import numpy as np
-import tensorflow as tf
-
-from delf import aggregation_config_pb2
-from delf import feature_aggregation_extractor
-
-FLAGS = flags.FLAGS
-
-
-class FeatureAggregationTest(tf.test.TestCase):
-
- def _CreateCodebook(self, checkpoint_path):
- """Creates codebook used in tests.
-
- Args:
- checkpoint_path: Directory where codebook is saved to.
- """
- codebook = tf.Variable(
- [[0.5, 0.5], [0.0, 0.0], [1.0, 0.0], [-0.5, -0.5], [0.0, 1.0]],
- name='clusters',
- dtype=tf.float32)
- ckpt = tf.train.Checkpoint(codebook=codebook)
- ckpt.write(checkpoint_path)
-
- def setUp(self):
- self._codebook_path = os.path.join(FLAGS.test_tmpdir, 'test_codebook')
- self._CreateCodebook(self._codebook_path)
-
- def testComputeNormalizedVladWorks(self):
- # Construct inputs.
- # 3 2-D features.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0]], dtype=float)
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.VLAD
- config.use_l2_normalization = True
- config.codebook_path = self._codebook_path
- config.num_assignments = 1
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- vlad, extra_output = extractor.Extract(features)
-
- # Define expected results.
- exp_vlad = [
- 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.316228, 0.316228, 0.632456, 0.632456
- ]
- exp_extra_output = -1
-
- # Compare actual and expected results.
- self.assertAllClose(vlad, exp_vlad)
- self.assertAllEqual(extra_output, exp_extra_output)
-
- def testComputeNormalizedVladWithBatchingWorks(self):
- # Construct inputs.
- # 3 2-D features.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0]], dtype=float)
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.VLAD
- config.use_l2_normalization = True
- config.codebook_path = self._codebook_path
- config.num_assignments = 1
- config.feature_batch_size = 2
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- vlad, extra_output = extractor.Extract(features)
-
- # Define expected results.
- exp_vlad = [
- 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.316228, 0.316228, 0.632456, 0.632456
- ]
- exp_extra_output = -1
-
- # Compare actual and expected results.
- self.assertAllClose(vlad, exp_vlad)
- self.assertAllEqual(extra_output, exp_extra_output)
-
- def testComputeUnnormalizedVladWorks(self):
- # Construct inputs.
- # 3 2-D features.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0]], dtype=float)
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.VLAD
- config.use_l2_normalization = False
- config.codebook_path = self._codebook_path
- config.num_assignments = 1
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- vlad, extra_output = extractor.Extract(features)
-
- # Define expected results.
- exp_vlad = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.5, 0.5, 1.0, 1.0]
- exp_extra_output = -1
-
- # Compare actual and expected results.
- self.assertAllEqual(vlad, exp_vlad)
- self.assertAllEqual(extra_output, exp_extra_output)
-
- def testComputeUnnormalizedVladMultipleAssignmentWorks(self):
- # Construct inputs.
- # 3 2-D features.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0]], dtype=float)
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.VLAD
- config.use_l2_normalization = False
- config.codebook_path = self._codebook_path
- config.num_assignments = 3
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- vlad, extra_output = extractor.Extract(features)
-
- # Define expected results.
- exp_vlad = [1.0, 1.0, 0.0, 0.0, 0.0, 2.0, -0.5, 0.5, 0.0, 0.0]
- exp_extra_output = -1
-
- # Compare actual and expected results.
- self.assertAllEqual(vlad, exp_vlad)
- self.assertAllEqual(extra_output, exp_extra_output)
-
- def testComputeVladEmptyFeaturesWorks(self):
- # Construct inputs.
- # Empty feature array.
- features = np.array([[]])
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.VLAD
- config.codebook_path = self._codebook_path
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- vlad, extra_output = extractor.Extract(features)
-
- # Define expected results.
- exp_vlad = np.zeros([10], dtype=float)
- exp_extra_output = -1
-
- # Compare actual and expected results.
- self.assertAllEqual(vlad, exp_vlad)
- self.assertAllEqual(extra_output, exp_extra_output)
-
- def testComputeUnnormalizedRvladWorks(self):
- # Construct inputs.
- # 4 2-D features: 3 in first region, 1 in second region.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0], [0.0, 2.0]],
- dtype=float)
- num_features_per_region = np.array([3, 1])
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.VLAD
- config.use_l2_normalization = False
- config.codebook_path = self._codebook_path
- config.num_assignments = 1
- config.use_regional_aggregation = True
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- rvlad, extra_output = extractor.Extract(features, num_features_per_region)
-
- # Define expected results.
- exp_rvlad = [
- 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.158114, 0.158114, 0.316228, 0.816228
- ]
- exp_extra_output = -1
-
- # Compare actual and expected results.
- self.assertAllClose(rvlad, exp_rvlad)
- self.assertAllEqual(extra_output, exp_extra_output)
-
- def testComputeNormalizedRvladWorks(self):
- # Construct inputs.
- # 4 2-D features: 3 in first region, 1 in second region.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0], [0.0, 2.0]],
- dtype=float)
- num_features_per_region = np.array([3, 1])
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.VLAD
- config.use_l2_normalization = True
- config.codebook_path = self._codebook_path
- config.num_assignments = 1
- config.use_regional_aggregation = True
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- rvlad, extra_output = extractor.Extract(features, num_features_per_region)
-
- # Define expected results.
- exp_rvlad = [
- 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.175011, 0.175011, 0.350021, 0.903453
- ]
- exp_extra_output = -1
-
- # Compare actual and expected results.
- self.assertAllClose(rvlad, exp_rvlad)
- self.assertAllEqual(extra_output, exp_extra_output)
-
- def testComputeRvladEmptyRegionsWorks(self):
- # Construct inputs.
- # Empty feature array.
- features = np.array([[]])
- num_features_per_region = np.array([])
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.VLAD
- config.codebook_path = self._codebook_path
- config.use_regional_aggregation = True
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- rvlad, extra_output = extractor.Extract(features, num_features_per_region)
-
- # Define expected results.
- exp_rvlad = np.zeros([10], dtype=float)
- exp_extra_output = -1
-
- # Compare actual and expected results.
- self.assertAllEqual(rvlad, exp_rvlad)
- self.assertAllEqual(extra_output, exp_extra_output)
-
- def testComputeUnnormalizedRvladSomeEmptyRegionsWorks(self):
- # Construct inputs.
- # 4 2-D features: 0 in first region, 3 in second region, 0 in third region,
- # 1 in fourth region.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0], [0.0, 2.0]],
- dtype=float)
- num_features_per_region = np.array([0, 3, 0, 1])
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.VLAD
- config.use_l2_normalization = False
- config.codebook_path = self._codebook_path
- config.num_assignments = 1
- config.use_regional_aggregation = True
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- rvlad, extra_output = extractor.Extract(features, num_features_per_region)
-
- # Define expected results.
- exp_rvlad = [
- 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.079057, 0.079057, 0.158114, 0.408114
- ]
- exp_extra_output = -1
-
- # Compare actual and expected results.
- self.assertAllClose(rvlad, exp_rvlad)
- self.assertAllEqual(extra_output, exp_extra_output)
-
- def testComputeNormalizedRvladSomeEmptyRegionsWorks(self):
- # Construct inputs.
- # 4 2-D features: 0 in first region, 3 in second region, 0 in third region,
- # 1 in fourth region.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0], [0.0, 2.0]],
- dtype=float)
- num_features_per_region = np.array([0, 3, 0, 1])
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.VLAD
- config.use_l2_normalization = True
- config.codebook_path = self._codebook_path
- config.num_assignments = 1
- config.use_regional_aggregation = True
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- rvlad, extra_output = extractor.Extract(features, num_features_per_region)
-
- # Define expected results.
- exp_rvlad = [
- 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.175011, 0.175011, 0.350021, 0.903453
- ]
- exp_extra_output = -1
-
- # Compare actual and expected results.
- self.assertAllClose(rvlad, exp_rvlad)
- self.assertAllEqual(extra_output, exp_extra_output)
-
- def testComputeRvladMisconfiguredFeatures(self):
- # Construct inputs.
- # 4 2-D features: 3 in first region, 1 in second region.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0], [0.0, 2.0]],
- dtype=float)
- # Misconfigured number of features; there are only 4 features, but
- # sum(num_features_per_region) = 5.
- num_features_per_region = np.array([3, 2])
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.VLAD
- config.codebook_path = self._codebook_path
- config.use_regional_aggregation = True
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- with self.assertRaisesRegex(
- ValueError,
- r'Incorrect arguments: sum\(num_features_per_region\) and '
- r'features.shape\[0\] are different'):
- extractor.Extract(features, num_features_per_region)
-
- def testComputeAsmkWorks(self):
- # Construct inputs.
- # 3 2-D features.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0]], dtype=float)
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.ASMK
- config.codebook_path = self._codebook_path
- config.num_assignments = 1
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- asmk, visual_words = extractor.Extract(features)
-
- # Define expected results.
- exp_asmk = [-0.707107, 0.707107, 0.707107, 0.707107]
- exp_visual_words = [3, 4]
-
- # Compare actual and expected results.
- self.assertAllClose(asmk, exp_asmk)
- self.assertAllEqual(visual_words, exp_visual_words)
-
- def testComputeAsmkStarWorks(self):
- # Construct inputs.
- # 3 2-D features.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0]], dtype=float)
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.ASMK_STAR
- config.codebook_path = self._codebook_path
- config.num_assignments = 1
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- asmk_star, visual_words = extractor.Extract(features)
-
- # Define expected results.
- exp_asmk_star = [64, 192]
- exp_visual_words = [3, 4]
-
- # Compare actual and expected results.
- self.assertAllEqual(asmk_star, exp_asmk_star)
- self.assertAllEqual(visual_words, exp_visual_words)
-
- def testComputeAsmkMultipleAssignmentWorks(self):
- # Construct inputs.
- # 3 2-D features.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0]], dtype=float)
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.ASMK
- config.codebook_path = self._codebook_path
- config.num_assignments = 3
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- asmk, visual_words = extractor.Extract(features)
-
- # Define expected results.
- exp_asmk = [0.707107, 0.707107, 0.0, 1.0, -0.707107, 0.707107]
- exp_visual_words = [0, 2, 3]
-
- # Compare actual and expected results.
- self.assertAllClose(asmk, exp_asmk)
- self.assertAllEqual(visual_words, exp_visual_words)
-
- def testComputeRasmkWorks(self):
- # Construct inputs.
- # 4 2-D features: 3 in first region, 1 in second region.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0], [0.0, 2.0]],
- dtype=float)
- num_features_per_region = np.array([3, 1])
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.ASMK
- config.codebook_path = self._codebook_path
- config.num_assignments = 1
- config.use_regional_aggregation = True
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- rasmk, visual_words = extractor.Extract(features, num_features_per_region)
-
- # Define expected results.
- exp_rasmk = [-0.707107, 0.707107, 0.361261, 0.932465]
- exp_visual_words = [3, 4]
-
- # Compare actual and expected results.
- self.assertAllClose(rasmk, exp_rasmk)
- self.assertAllEqual(visual_words, exp_visual_words)
-
- def testComputeRasmkStarWorks(self):
- # Construct inputs.
- # 4 2-D features: 3 in first region, 1 in second region.
- features = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0], [0.0, 2.0]],
- dtype=float)
- num_features_per_region = np.array([3, 1])
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.ASMK_STAR
- config.codebook_path = self._codebook_path
- config.num_assignments = 1
- config.use_regional_aggregation = True
-
- # Run tested function.
- extractor = feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
- rasmk_star, visual_words = extractor.Extract(features,
- num_features_per_region)
-
- # Define expected results.
- exp_rasmk_star = [64, 192]
- exp_visual_words = [3, 4]
-
- # Compare actual and expected results.
- self.assertAllEqual(rasmk_star, exp_rasmk_star)
- self.assertAllEqual(visual_words, exp_visual_words)
-
- def testComputeUnknownAggregation(self):
- # Construct inputs.
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = 0
- config.codebook_path = self._codebook_path
- config.use_regional_aggregation = True
-
- # Run tested function.
- with self.assertRaisesRegex(ValueError, 'Invalid aggregation type'):
- feature_aggregation_extractor.ExtractAggregatedRepresentation(
- config)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/feature_aggregation_similarity.py b/research/delf/delf/python/feature_aggregation_similarity.py
deleted file mode 100644
index 991c95c767c..00000000000
--- a/research/delf/delf/python/feature_aggregation_similarity.py
+++ /dev/null
@@ -1,265 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Local feature aggregation similarity computation.
-
-For more details, please refer to the paper:
-"Detect-to-Retrieve: Efficient Regional Aggregation for Image Search",
-Proc. CVPR'19 (https://arxiv.org/abs/1812.01584).
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-
-from delf import aggregation_config_pb2
-
-# Aliases for aggregation types.
-_VLAD = aggregation_config_pb2.AggregationConfig.VLAD
-_ASMK = aggregation_config_pb2.AggregationConfig.ASMK
-_ASMK_STAR = aggregation_config_pb2.AggregationConfig.ASMK_STAR
-
-
-class SimilarityAggregatedRepresentation(object):
- """Class for computing similarity of aggregated local feature representations.
-
- Args:
- aggregation_config: AggregationConfig object defining type of aggregation to
- use.
-
- Raises:
- ValueError: If aggregation type is invalid.
- """
-
- def __init__(self, aggregation_config):
- self._feature_dimensionality = aggregation_config.feature_dimensionality
- self._aggregation_type = aggregation_config.aggregation_type
-
- # Only relevant if using ASMK/ASMK*. Otherwise, ignored.
- self._use_l2_normalization = aggregation_config.use_l2_normalization
- self._alpha = aggregation_config.alpha
- self._tau = aggregation_config.tau
-
- # Only relevant if using ASMK*. Otherwise, ignored.
- self._number_bits = np.array([bin(n).count('1') for n in range(256)])
-
- def ComputeSimilarity(self,
- aggregated_descriptors_1,
- aggregated_descriptors_2,
- feature_visual_words_1=None,
- feature_visual_words_2=None):
- """Computes similarity between aggregated descriptors.
-
- Args:
- aggregated_descriptors_1: 1-D NumPy array.
- aggregated_descriptors_2: 1-D NumPy array.
- feature_visual_words_1: Used only for ASMK/ASMK* aggregation type. 1-D
- sorted NumPy integer array denoting visual words corresponding to
- `aggregated_descriptors_1`.
- feature_visual_words_2: Used only for ASMK/ASMK* aggregation type. 1-D
- sorted NumPy integer array denoting visual words corresponding to
- `aggregated_descriptors_2`.
-
- Returns:
- similarity: Float. The larger, the more similar.
-
- Raises:
- ValueError: If aggregation type is invalid.
- """
- if self._aggregation_type == _VLAD:
- similarity = np.dot(aggregated_descriptors_1, aggregated_descriptors_2)
- elif self._aggregation_type == _ASMK:
- similarity = self._AsmkSimilarity(
- aggregated_descriptors_1,
- aggregated_descriptors_2,
- feature_visual_words_1,
- feature_visual_words_2,
- binarized=False)
- elif self._aggregation_type == _ASMK_STAR:
- similarity = self._AsmkSimilarity(
- aggregated_descriptors_1,
- aggregated_descriptors_2,
- feature_visual_words_1,
- feature_visual_words_2,
- binarized=True)
- else:
- raise ValueError('Invalid aggregation type: %d' % self._aggregation_type)
-
- return similarity
-
- def _CheckAsmkDimensionality(self, aggregated_descriptors, num_visual_words,
- descriptor_name):
- """Checks that ASMK dimensionality is as expected.
-
- Args:
- aggregated_descriptors: 1-D NumPy array.
- num_visual_words: Integer.
- descriptor_name: String.
-
- Raises:
- ValueError: If descriptor dimensionality is incorrect.
- """
- if len(aggregated_descriptors
- ) / num_visual_words != self._feature_dimensionality:
- raise ValueError(
- 'Feature dimensionality for aggregated descriptor %s is invalid: %d;'
- ' expected %d.' % (descriptor_name, len(aggregated_descriptors) /
- num_visual_words, self._feature_dimensionality))
-
- def _SigmaFn(self, x):
- """Selectivity ASMK/ASMK* similarity function.
-
- Args:
- x: Scalar or 1-D NumPy array.
-
- Returns:
- result: Same type as input, with output of selectivity function.
- """
- if np.isscalar(x):
- if x > self._tau:
- result = np.sign(x) * np.power(np.absolute(x), self._alpha)
- else:
- result = 0.0
- else:
- result = np.zeros_like(x)
- above_tau = np.nonzero(x > self._tau)
- result[above_tau] = np.sign(x[above_tau]) * np.power(
- np.absolute(x[above_tau]), self._alpha)
-
- return result
-
- def _BinaryNormalizedInnerProduct(self, descriptors_1, descriptors_2):
- """Computes normalized binary inner product.
-
- Args:
- descriptors_1: 1-D NumPy integer array.
- descriptors_2: 1-D NumPy integer array.
-
- Returns:
- inner_product: Float.
-
- Raises:
- ValueError: If the dimensionality of descriptors is different.
- """
- num_descriptors = len(descriptors_1)
- if num_descriptors != len(descriptors_2):
- raise ValueError(
- 'Descriptors have incompatible dimensionality: %d vs %d' %
- (len(descriptors_1), len(descriptors_2)))
-
- h = 0
- for i in range(num_descriptors):
- h += self._number_bits[np.bitwise_xor(descriptors_1[i], descriptors_2[i])]
-
- # If local feature dimensionality is lower than 8, then use that to compute
- # proper binarized inner product.
- bits_per_descriptor = min(self._feature_dimensionality, 8)
-
- total_num_bits = bits_per_descriptor * num_descriptors
-
- return 1.0 - 2.0 * h / total_num_bits
-
- def _AsmkSimilarity(self,
- aggregated_descriptors_1,
- aggregated_descriptors_2,
- visual_words_1,
- visual_words_2,
- binarized=False):
- """Compute ASMK-based similarity.
-
- If `aggregated_descriptors_1` or `aggregated_descriptors_2` is empty, we
- return a similarity of -1.0.
-
- If binarized is True, `aggregated_descriptors_1` and
- `aggregated_descriptors_2` must be of type uint8.
-
- Args:
- aggregated_descriptors_1: 1-D NumPy array.
- aggregated_descriptors_2: 1-D NumPy array.
- visual_words_1: 1-D sorted NumPy integer array denoting visual words
- corresponding to `aggregated_descriptors_1`.
- visual_words_2: 1-D sorted NumPy integer array denoting visual words
- corresponding to `aggregated_descriptors_2`.
- binarized: If True, compute ASMK* similarity.
-
- Returns:
- similarity: Float. The larger, the more similar.
-
- Raises:
- ValueError: If input descriptor dimensionality is inconsistent, or if
- descriptor type is unsupported.
- """
- num_visual_words_1 = len(visual_words_1)
- num_visual_words_2 = len(visual_words_2)
-
- if not num_visual_words_1 or not num_visual_words_2:
- return -1.0
-
- # Parse dimensionality used per visual word. They must be the same for both
- # aggregated descriptors. If using ASMK, they also must be equal to
- # self._feature_dimensionality.
- if binarized:
- if aggregated_descriptors_1.dtype != 'uint8':
- raise ValueError('Incorrect input descriptor type: %s' %
- aggregated_descriptors_1.dtype)
- if aggregated_descriptors_2.dtype != 'uint8':
- raise ValueError('Incorrect input descriptor type: %s' %
- aggregated_descriptors_2.dtype)
-
- per_visual_word_dimensionality = int(
- len(aggregated_descriptors_1) / num_visual_words_1)
- if len(aggregated_descriptors_2
- ) / num_visual_words_2 != per_visual_word_dimensionality:
- raise ValueError('ASMK* dimensionality is inconsistent.')
- else:
- per_visual_word_dimensionality = self._feature_dimensionality
- self._CheckAsmkDimensionality(aggregated_descriptors_1,
- num_visual_words_1, '1')
- self._CheckAsmkDimensionality(aggregated_descriptors_2,
- num_visual_words_2, '2')
-
- aggregated_descriptors_1_reshape = np.reshape(
- aggregated_descriptors_1,
- [num_visual_words_1, per_visual_word_dimensionality])
- aggregated_descriptors_2_reshape = np.reshape(
- aggregated_descriptors_2,
- [num_visual_words_2, per_visual_word_dimensionality])
-
- # Loop over visual words, compute similarity.
- unnormalized_similarity = 0.0
- ind_1 = 0
- ind_2 = 0
- while ind_1 < num_visual_words_1 and ind_2 < num_visual_words_2:
- if visual_words_1[ind_1] == visual_words_2[ind_2]:
- if binarized:
- inner_product = self._BinaryNormalizedInnerProduct(
- aggregated_descriptors_1_reshape[ind_1],
- aggregated_descriptors_2_reshape[ind_2])
- else:
- inner_product = np.dot(aggregated_descriptors_1_reshape[ind_1],
- aggregated_descriptors_2_reshape[ind_2])
- unnormalized_similarity += self._SigmaFn(inner_product)
- ind_1 += 1
- ind_2 += 1
- elif visual_words_1[ind_1] > visual_words_2[ind_2]:
- ind_2 += 1
- else:
- ind_1 += 1
-
- final_similarity = unnormalized_similarity
- if self._use_l2_normalization:
- final_similarity /= np.sqrt(num_visual_words_1 * num_visual_words_2)
-
- return final_similarity
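A minimal NumPy sketch of the selectivity function used by `_AsmkSimilarity` above, for reference; the values `alpha=3.0` and `tau=0.0` are illustrative assumptions, not values read from any config:

```
import numpy as np

alpha, tau = 3.0, 0.0  # illustrative values

def sigma(x):
  # Zero at or below tau; signed power of the magnitude above tau.
  x = np.asarray(x, dtype=float)
  result = np.zeros_like(x)
  above = x > tau
  result[above] = np.sign(x[above]) * np.abs(x[above])**alpha
  return result

print(sigma(np.array([-0.5, 0.2, 0.9])))  # ~[0., 0.008, 0.729]
```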
diff --git a/research/delf/delf/python/feature_aggregation_similarity_test.py b/research/delf/delf/python/feature_aggregation_similarity_test.py
deleted file mode 100644
index e2f01b1d2a7..00000000000
--- a/research/delf/delf/python/feature_aggregation_similarity_test.py
+++ /dev/null
@@ -1,137 +0,0 @@
-# Copyright 2019 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for DELF feature aggregation similarity."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import tensorflow as tf
-
-from delf import aggregation_config_pb2
-from delf import feature_aggregation_similarity
-
-
-class FeatureAggregationSimilarityTest(tf.test.TestCase):
-
- def testComputeVladSimilarityWorks(self):
- # Construct inputs.
- vlad_1 = np.array([0, 1, 2, 3, 4])
- vlad_2 = np.array([5, 6, 7, 8, 9])
- config = aggregation_config_pb2.AggregationConfig()
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.VLAD
-
- # Run tested function.
- similarity_computer = (
- feature_aggregation_similarity.SimilarityAggregatedRepresentation(
- config))
- similarity = similarity_computer.ComputeSimilarity(vlad_1, vlad_2)
-
- # Define expected results.
- exp_similarity = 80
-
- # Compare actual and expected results.
- self.assertAllEqual(similarity, exp_similarity)
-
- def testComputeAsmkSimilarityWorks(self):
- # Construct inputs.
- aggregated_descriptors_1 = np.array([
- 0.0, 0.0, -0.707107, -0.707107, 0.5, 0.866025, 0.816497, 0.577350, 1.0,
- 0.0
- ])
- visual_words_1 = np.array([0, 1, 2, 3, 4])
- aggregated_descriptors_2 = np.array(
- [0.0, 1.0, 1.0, 0.0, 0.707107, 0.707107])
- visual_words_2 = np.array([1, 2, 4])
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.ASMK
- config.use_l2_normalization = True
-
- # Run tested function.
- similarity_computer = (
- feature_aggregation_similarity.SimilarityAggregatedRepresentation(
- config))
- similarity = similarity_computer.ComputeSimilarity(
- aggregated_descriptors_1, aggregated_descriptors_2, visual_words_1,
- visual_words_2)
-
- # Define expected results.
- exp_similarity = 0.123562
-
- # Compare actual and expected results.
- self.assertAllClose(similarity, exp_similarity)
-
- def testComputeAsmkSimilarityNoNormalizationWorks(self):
- # Construct inputs.
- aggregated_descriptors_1 = np.array([
- 0.0, 0.0, -0.707107, -0.707107, 0.5, 0.866025, 0.816497, 0.577350, 1.0,
- 0.0
- ])
- visual_words_1 = np.array([0, 1, 2, 3, 4])
- aggregated_descriptors_2 = np.array(
- [0.0, 1.0, 1.0, 0.0, 0.707107, 0.707107])
- visual_words_2 = np.array([1, 2, 4])
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.ASMK
- config.use_l2_normalization = False
-
- # Run tested function.
- similarity_computer = (
- feature_aggregation_similarity.SimilarityAggregatedRepresentation(
- config))
- similarity = similarity_computer.ComputeSimilarity(
- aggregated_descriptors_1, aggregated_descriptors_2, visual_words_1,
- visual_words_2)
-
- # Define expected results.
- exp_similarity = 0.478554
-
- # Compare actual and expected results.
- self.assertAllClose(similarity, exp_similarity)
-
- def testComputeAsmkStarSimilarityWorks(self):
- # Construct inputs.
- aggregated_descriptors_1 = np.array([0, 0, 3, 3, 3], dtype='uint8')
- visual_words_1 = np.array([0, 1, 2, 3, 4])
- aggregated_descriptors_2 = np.array([1, 2, 3], dtype='uint8')
- visual_words_2 = np.array([1, 2, 4])
- config = aggregation_config_pb2.AggregationConfig()
- config.codebook_size = 5
- config.feature_dimensionality = 2
- config.aggregation_type = aggregation_config_pb2.AggregationConfig.ASMK_STAR
- config.use_l2_normalization = True
-
- # Run tested function.
- similarity_computer = (
- feature_aggregation_similarity.SimilarityAggregatedRepresentation(
- config))
- similarity = similarity_computer.ComputeSimilarity(
- aggregated_descriptors_1, aggregated_descriptors_2, visual_words_1,
- visual_words_2)
-
- # Define expected results.
- exp_similarity = 0.258199
-
- # Compare actual and expected results.
- self.assertAllClose(similarity, exp_similarity)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/feature_extractor.py b/research/delf/delf/python/feature_extractor.py
deleted file mode 100644
index 9545337f187..00000000000
--- a/research/delf/delf/python/feature_extractor.py
+++ /dev/null
@@ -1,175 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""DELF feature extractor."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-
-
-def NormalizePixelValues(image,
- pixel_value_offset=128.0,
- pixel_value_scale=128.0):
- """Normalize image pixel values.
-
- Args:
- image: a uint8 tensor.
- pixel_value_offset: a Python float, offset for normalizing pixel values.
- pixel_value_scale: a Python float, scale for normalizing pixel values.
-
- Returns:
- image: a float32 tensor of the same shape as the input image.
- """
- image = tf.cast(image, dtype=tf.float32)
- image = tf.truediv(tf.subtract(image, pixel_value_offset), pixel_value_scale)
- return image
-
-
-def CalculateReceptiveBoxes(height, width, rf, stride, padding):
- """Calculate receptive boxes for each feature point.
-
- Args:
- height: The height of feature map.
- width: The width of feature map.
- rf: The receptive field size.
- stride: The effective stride between two adjacent feature points.
- padding: The effective padding size.
-
- Returns:
- rf_boxes: [N, 4] receptive boxes tensor. Here N equals to height x width.
- Each box is represented by [ymin, xmin, ymax, xmax].
- """
- x, y = tf.meshgrid(tf.range(width), tf.range(height))
- coordinates = tf.reshape(tf.stack([y, x], axis=2), [-1, 2])
- # [y,x,y,x]
- point_boxes = tf.cast(
- tf.concat([coordinates, coordinates], 1), dtype=tf.float32)
- bias = [-padding, -padding, -padding + rf - 1, -padding + rf - 1]
- rf_boxes = stride * point_boxes + bias
- return rf_boxes
-
-
-def CalculateKeypointCenters(boxes):
- """Helper function to compute feature centers, from RF boxes.
-
- Args:
- boxes: [N, 4] float tensor.
-
- Returns:
- centers: [N, 2] float tensor.
- """
- return tf.divide(
- tf.add(
- tf.gather(boxes, [0, 1], axis=1), tf.gather(boxes, [2, 3], axis=1)),
- 2.0)
-
-
-def ApplyPcaAndWhitening(data,
- pca_matrix,
- pca_mean,
- output_dim,
- use_whitening=False,
- pca_variances=None):
- """Applies PCA/whitening to data.
-
- Args:
- data: [N, dim] float tensor containing data which undergoes PCA/whitening.
- pca_matrix: [dim, dim] float tensor PCA matrix, row-major.
- pca_mean: [dim] float tensor, mean to subtract before projection.
- output_dim: Number of dimensions to use in output data, of type int.
- use_whitening: Whether whitening is to be used.
- pca_variances: [dim] float tensor containing PCA variances. Only used if
- use_whitening is True.
-
- Returns:
- output: [N, output_dim] float tensor with output of PCA/whitening operation.
- """
- output = tf.matmul(
- tf.subtract(data, pca_mean),
- tf.slice(pca_matrix, [0, 0], [output_dim, -1]),
- transpose_b=True,
- name='pca_matmul')
-
- # Apply whitening if desired.
- if use_whitening:
- output = tf.divide(
- output,
- tf.sqrt(tf.slice(pca_variances, [0], [output_dim])),
- name='whitening')
-
- return output
-
-
-def PostProcessDescriptors(descriptors, use_pca, pca_parameters=None):
- """Post-process descriptors.
-
- Args:
- descriptors: [N, input_dim] float tensor.
- use_pca: Whether to use PCA.
- pca_parameters: Only used if `use_pca` is True. Dict containing PCA
- parameter tensors, with keys 'mean', 'matrix', 'dim', 'use_whitening',
- 'variances'.
-
- Returns:
- final_descriptors: [N, output_dim] float tensor with descriptors after
- normalization and (possibly) PCA/whitening.
- """
- # L2-normalize, and if desired apply PCA (followed by L2-normalization).
- final_descriptors = tf.nn.l2_normalize(
- descriptors, axis=1, name='l2_normalization')
-
- if use_pca:
- # Apply PCA, and whitening if desired.
- final_descriptors = ApplyPcaAndWhitening(final_descriptors,
- pca_parameters['matrix'],
- pca_parameters['mean'],
- pca_parameters['dim'],
- pca_parameters['use_whitening'],
- pca_parameters['variances'])
-
- # Re-normalize.
- final_descriptors = tf.nn.l2_normalize(
- final_descriptors, axis=1, name='pca_l2_normalization')
-
- return final_descriptors
-
-
-def DelfFeaturePostProcessing(boxes, descriptors, use_pca, pca_parameters=None):
- """Extract DELF features from input image.
-
- Args:
- boxes: [N, 4] float tensor which denotes the selected receptive box. N is
- the number of final feature points which pass through keypoint selection
- and NMS steps.
- descriptors: [N, input_dim] float tensor.
- use_pca: Whether to use PCA.
- pca_parameters: Only used if `use_pca` is True. Dict containing PCA
- parameter tensors, with keys 'mean', 'matrix', 'dim', 'use_whitening',
- 'variances'.
-
- Returns:
- locations: [N, 2] float tensor which denotes the selected keypoint
- locations.
- final_descriptors: [N, output_dim] float tensor with DELF descriptors after
- normalization and (possibly) PCA/whitening.
- """
-
- # Get center of descriptor boxes, corresponding to feature locations.
- locations = CalculateKeypointCenters(boxes)
- final_descriptors = PostProcessDescriptors(descriptors, use_pca,
- pca_parameters)
-
- return locations, final_descriptors
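To make the receptive-box arithmetic in `CalculateReceptiveBoxes` concrete, here is a NumPy-only sketch; the values `rf=291`, `stride=32`, `padding=145` match the example in `feature_extractor_test.py` below and are purely illustrative:

```
import numpy as np

def receptive_boxes(height, width, rf, stride, padding):
  # box = stride * [y, x, y, x] + [-padding, -padding, rf - 1 - padding, rf - 1 - padding]
  y, x = np.meshgrid(np.arange(height), np.arange(width), indexing='ij')
  coords = np.stack([y.ravel(), x.ravel()], axis=1).astype(float)
  point_boxes = np.concatenate([coords, coords], axis=1)
  bias = np.array([-padding, -padding, rf - 1 - padding, rf - 1 - padding], float)
  return stride * point_boxes + bias

print(receptive_boxes(1, 2, rf=291, stride=32, padding=145))
# [[-145. -145.  145.  145.]
#  [-145. -113.  145.  177.]]
```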
diff --git a/research/delf/delf/python/feature_extractor_test.py b/research/delf/delf/python/feature_extractor_test.py
deleted file mode 100644
index 0caa51c4321..00000000000
--- a/research/delf/delf/python/feature_extractor_test.py
+++ /dev/null
@@ -1,75 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for DELF feature extractor."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-
-from delf import feature_extractor
-
-
-class FeatureExtractorTest(tf.test.TestCase):
-
- def testNormalizePixelValues(self):
- image = tf.constant(
- [[[3, 255, 0], [34, 12, 5]], [[45, 5, 65], [56, 77, 89]]],
- dtype=tf.uint8)
- normalized_image = feature_extractor.NormalizePixelValues(
- image, pixel_value_offset=5.0, pixel_value_scale=2.0)
- exp_normalized_image = [[[-1.0, 125.0, -2.5], [14.5, 3.5, 0.0]],
- [[20.0, 0.0, 30.0], [25.5, 36.0, 42.0]]]
-
- self.assertAllEqual(normalized_image, exp_normalized_image)
-
- def testCalculateReceptiveBoxes(self):
- boxes = feature_extractor.CalculateReceptiveBoxes(
- height=1, width=2, rf=291, stride=32, padding=145)
- exp_boxes = [[-145., -145., 145., 145.], [-145., -113., 145., 177.]]
-
- self.assertAllEqual(exp_boxes, boxes)
-
- def testCalculateKeypointCenters(self):
- boxes = [[-10.0, 0.0, 11.0, 21.0], [-2.5, 5.0, 18.5, 26.0],
- [45.0, -2.5, 66.0, 18.5]]
- centers = feature_extractor.CalculateKeypointCenters(boxes)
-
- exp_centers = [[0.5, 10.5], [8.0, 15.5], [55.5, 8.0]]
-
- self.assertAllEqual(exp_centers, centers)
-
- def testPcaWhitening(self):
- data = tf.constant([[1.0, 2.0, -2.0], [-5.0, 0.0, 3.0], [-1.0, 2.0, 0.0],
- [0.0, 4.0, -1.0]])
- pca_matrix = tf.constant([[2.0, 0.0, -1.0], [0.0, 1.0, 1.0],
- [-1.0, 1.0, 3.0]])
- pca_mean = tf.constant([1.0, 2.0, 3.0])
- output_dim = 2
- use_whitening = True
- pca_variances = tf.constant([4.0, 1.0])
-
- output = feature_extractor.ApplyPcaAndWhitening(data, pca_matrix, pca_mean,
- output_dim, use_whitening,
- pca_variances)
-
- exp_output = [[2.5, -5.0], [-6.0, -2.0], [-0.5, -3.0], [1.0, -2.0]]
-
- self.assertAllEqual(exp_output, output)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/feature_io.py b/research/delf/delf/python/feature_io.py
deleted file mode 100644
index 9b68586b854..00000000000
--- a/research/delf/delf/python/feature_io.py
+++ /dev/null
@@ -1,196 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Python interface for DelfFeatures proto.
-
-Support read and write of DelfFeatures from/to numpy arrays and file.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import tensorflow as tf
-
-from delf import feature_pb2
-from delf import datum_io
-
-
-def ArraysToDelfFeatures(locations,
- scales,
- descriptors,
- attention,
- orientations=None):
- """Converts DELF features to DelfFeatures proto.
-
- Args:
- locations: [N, 2] float array which denotes the selected keypoint locations.
- N is the number of features.
- scales: [N] float array with feature scales.
- descriptors: [N, depth] float array with DELF descriptors.
- attention: [N] float array with attention scores.
- orientations: [N] float array with orientations. If None, all orientations
- are set to zero.
-
- Returns:
- delf_features: DelfFeatures object.
- """
- num_features = len(attention)
- assert num_features == locations.shape[0]
- assert num_features == len(scales)
- assert num_features == descriptors.shape[0]
-
- if orientations is None:
- orientations = np.zeros([num_features], dtype=np.float32)
- else:
- assert num_features == len(orientations)
-
- delf_features = feature_pb2.DelfFeatures()
- for i in range(num_features):
- delf_feature = delf_features.feature.add()
- delf_feature.y = locations[i, 0]
- delf_feature.x = locations[i, 1]
- delf_feature.scale = scales[i]
- delf_feature.orientation = orientations[i]
- delf_feature.strength = attention[i]
- delf_feature.descriptor.CopyFrom(datum_io.ArrayToDatum(descriptors[i,]))
-
- return delf_features
-
-
-def DelfFeaturesToArrays(delf_features):
- """Converts data saved in DelfFeatures to numpy arrays.
-
- If there are no features, the function returns five empty arrays.
-
- Args:
- delf_features: DelfFeatures object.
-
- Returns:
- locations: [N, 2] float array which denotes the selected keypoint
- locations. N is the number of features.
- scales: [N] float array with feature scales.
- descriptors: [N, depth] float array with DELF descriptors.
- attention: [N] float array with attention scores.
- orientations: [N] float array with orientations.
- """
- num_features = len(delf_features.feature)
- if num_features == 0:
- return np.array([]), np.array([]), np.array([]), np.array([]), np.array([])
-
- # Figure out descriptor dimensionality by parsing first one.
- descriptor_dim = len(
- datum_io.DatumToArray(delf_features.feature[0].descriptor))
- locations = np.zeros([num_features, 2])
- scales = np.zeros([num_features])
- descriptors = np.zeros([num_features, descriptor_dim])
- attention = np.zeros([num_features])
- orientations = np.zeros([num_features])
-
- for i in range(num_features):
- delf_feature = delf_features.feature[i]
- locations[i, 0] = delf_feature.y
- locations[i, 1] = delf_feature.x
- scales[i] = delf_feature.scale
- descriptors[i,] = datum_io.DatumToArray(delf_feature.descriptor)
- attention[i] = delf_feature.strength
- orientations[i] = delf_feature.orientation
-
- return locations, scales, descriptors, attention, orientations
-
-
-def SerializeToString(locations,
- scales,
- descriptors,
- attention,
- orientations=None):
- """Converts numpy arrays to serialized DelfFeatures.
-
- Args:
- locations: [N, 2] float array which denotes the selected keypoint locations.
- N is the number of features.
- scales: [N] float array with feature scales.
- descriptors: [N, depth] float array with DELF descriptors.
- attention: [N] float array with attention scores.
- orientations: [N] float array with orientations. If None, all orientations
- are set to zero.
-
- Returns:
- Serialized DelfFeatures string.
- """
- delf_features = ArraysToDelfFeatures(locations, scales, descriptors,
- attention, orientations)
- return delf_features.SerializeToString()
-
-
-def ParseFromString(string):
- """Converts serialized DelfFeatures string to numpy arrays.
-
- Args:
- string: Serialized DelfFeatures string.
-
- Returns:
- locations: [N, 2] float array which denotes the selected keypoint
- locations. N is the number of features.
- scales: [N] float array with feature scales.
- descriptors: [N, depth] float array with DELF descriptors.
- attention: [N] float array with attention scores.
- orientations: [N] float array with orientations.
- """
- delf_features = feature_pb2.DelfFeatures()
- delf_features.ParseFromString(string)
- return DelfFeaturesToArrays(delf_features)
-
-
-def ReadFromFile(file_path):
- """Helper function to load data from a DelfFeatures format in a file.
-
- Args:
- file_path: Path to file containing data.
-
- Returns:
- locations: [N, 2] float array which denotes the selected keypoint
- locations. N is the number of features.
- scales: [N] float array with feature scales.
- descriptors: [N, depth] float array with DELF descriptors.
- attention: [N] float array with attention scores.
- orientations: [N] float array with orientations.
- """
- with tf.io.gfile.GFile(file_path, 'rb') as f:
- return ParseFromString(f.read())
-
-
-def WriteToFile(file_path,
- locations,
- scales,
- descriptors,
- attention,
- orientations=None):
- """Helper function to write data to a file in DelfFeatures format.
-
- Args:
- file_path: Path to file that will be written.
- locations: [N, 2] float array which denotes the selected keypoint locations.
- N is the number of features.
- scales: [N] float array with feature scales.
- descriptors: [N, depth] float array with DELF descriptors.
- attention: [N] float array with attention scores.
- orientations: [N] float array with orientations. If None, all orientations
- are set to zero.
- """
- serialized_data = SerializeToString(locations, scales, descriptors, attention,
- orientations)
- with tf.io.gfile.GFile(file_path, 'w') as f:
- f.write(serialized_data)
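A minimal write/read round-trip sketch for the helpers above; the file path and array contents are illustrative, with shapes following the docstrings ([N, 2] locations, [N] scales/attention, [N, depth] descriptors):

```
import numpy as np
from delf import feature_io

locations = np.array([[10.0, 20.0], [30.0, 40.0]])
scales = np.array([1.0, 2.0])
descriptors = np.ones([2, 40], dtype=np.float32)
attention = np.array([0.9, 0.8])

# Orientations are omitted here, so they are stored as zeros.
feature_io.WriteToFile('/tmp/example.delf', locations, scales, descriptors,
                       attention)
locations_r, scales_r, descriptors_r, attention_r, orientations_r = (
    feature_io.ReadFromFile('/tmp/example.delf'))
```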
diff --git a/research/delf/delf/python/feature_io_test.py b/research/delf/delf/python/feature_io_test.py
deleted file mode 100644
index 8b68d3b241c..00000000000
--- a/research/delf/delf/python/feature_io_test.py
+++ /dev/null
@@ -1,112 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for feature_io, the python interface of DelfFeatures."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-import numpy as np
-import tensorflow as tf
-
-from delf import feature_io
-
-FLAGS = flags.FLAGS
-
-
-def create_data():
- """Creates data to be used in tests.
-
- Returns:
- locations: [N, 2] float array which denotes the selected keypoint
- locations. N is the number of features.
- scales: [N] float array with feature scales.
- descriptors: [N, depth] float array with DELF descriptors.
- attention: [N] float array with attention scores.
- orientations: [N] float array with orientations.
- """
- locations = np.arange(8, dtype=np.float32).reshape(4, 2)
- scales = np.arange(4, dtype=np.float32)
- attention = np.arange(4, dtype=np.float32)
- orientations = np.arange(4, dtype=np.float32)
- descriptors = np.zeros([4, 1024])
- descriptors[0,] = np.arange(1024)
- descriptors[1,] = np.zeros([1024])
- descriptors[2,] = np.ones([1024])
- descriptors[3,] = -np.ones([1024])
-
- return locations, scales, descriptors, attention, orientations
-
-
-class DelfFeaturesIoTest(tf.test.TestCase):
-
- def testConversionAndBack(self):
- locations, scales, descriptors, attention, orientations = create_data()
-
- serialized = feature_io.SerializeToString(locations, scales, descriptors,
- attention, orientations)
- parsed_data = feature_io.ParseFromString(serialized)
-
- self.assertAllEqual(locations, parsed_data[0])
- self.assertAllEqual(scales, parsed_data[1])
- self.assertAllEqual(descriptors, parsed_data[2])
- self.assertAllEqual(attention, parsed_data[3])
- self.assertAllEqual(orientations, parsed_data[4])
-
- def testConversionAndBackNoOrientations(self):
- locations, scales, descriptors, attention, _ = create_data()
-
- serialized = feature_io.SerializeToString(locations, scales, descriptors,
- attention)
- parsed_data = feature_io.ParseFromString(serialized)
-
- self.assertAllEqual(locations, parsed_data[0])
- self.assertAllEqual(scales, parsed_data[1])
- self.assertAllEqual(descriptors, parsed_data[2])
- self.assertAllEqual(attention, parsed_data[3])
- self.assertAllEqual(np.zeros([4]), parsed_data[4])
-
- def testWriteAndReadToFile(self):
- locations, scales, descriptors, attention, orientations = create_data()
-
- filename = os.path.join(FLAGS.test_tmpdir, 'test.delf')
- feature_io.WriteToFile(filename, locations, scales, descriptors, attention,
- orientations)
- data_read = feature_io.ReadFromFile(filename)
-
- self.assertAllEqual(locations, data_read[0])
- self.assertAllEqual(scales, data_read[1])
- self.assertAllEqual(descriptors, data_read[2])
- self.assertAllEqual(attention, data_read[3])
- self.assertAllEqual(orientations, data_read[4])
-
- def testWriteAndReadToFileEmptyFile(self):
- filename = os.path.join(FLAGS.test_tmpdir, 'test.delf')
- feature_io.WriteToFile(filename, np.array([]), np.array([]), np.array([]),
- np.array([]), np.array([]))
- data_read = feature_io.ReadFromFile(filename)
-
- self.assertAllEqual(np.array([]), data_read[0])
- self.assertAllEqual(np.array([]), data_read[1])
- self.assertAllEqual(np.array([]), data_read[2])
- self.assertAllEqual(np.array([]), data_read[3])
- self.assertAllEqual(np.array([]), data_read[4])
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/normalization_layers/__init__.py b/research/delf/delf/python/normalization_layers/__init__.py
deleted file mode 100644
index 9064f503de1..00000000000
--- a/research/delf/delf/python/normalization_layers/__init__.py
+++ /dev/null
@@ -1,14 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
\ No newline at end of file
diff --git a/research/delf/delf/python/normalization_layers/normalization.py b/research/delf/delf/python/normalization_layers/normalization.py
deleted file mode 100644
index cfb75da7535..00000000000
--- a/research/delf/delf/python/normalization_layers/normalization.py
+++ /dev/null
@@ -1,40 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Normalization layer definitions."""
-
-import tensorflow as tf
-
-
-class L2Normalization(tf.keras.layers.Layer):
- """Normalization layer using L2 norm."""
-
- def __init__(self):
- """Initialization of the L2Normalization layer."""
- super(L2Normalization, self).__init__()
- # A lower bound value for the norm.
- self.eps = 1e-6
-
- def call(self, x, axis=1):
- """Invokes the L2Normalization instance.
-
- Args:
- x: A Tensor.
- axis: Dimension along which to normalize. A scalar or a vector of
- integers.
-
- Returns:
- norm: A Tensor with the same shape as `x`.
- """
- return tf.nn.l2_normalize(x, axis, epsilon=self.eps)
diff --git a/research/delf/delf/python/normalization_layers/normalization_test.py b/research/delf/delf/python/normalization_layers/normalization_test.py
deleted file mode 100644
index ea302566c69..00000000000
--- a/research/delf/delf/python/normalization_layers/normalization_test.py
+++ /dev/null
@@ -1,36 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for normalization layers."""
-
-import tensorflow as tf
-
-from delf.python.normalization_layers import normalization
-
-
-class NormalizationsTest(tf.test.TestCase):
-
- def testL2Normalization(self):
- x = tf.constant([-4.0, 0.0, 4.0])
- layer = normalization.L2Normalization()
- # Run tested function.
- result = layer(x, axis=0)
- # Define expected result.
- exp_output = [-0.70710677, 0.0, 0.70710677]
- # Compare actual and expected.
- self.assertAllClose(exp_output, result)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/pooling_layers/__init__.py b/research/delf/delf/python/pooling_layers/__init__.py
deleted file mode 100644
index 9064f503de1..00000000000
--- a/research/delf/delf/python/pooling_layers/__init__.py
+++ /dev/null
@@ -1,14 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
\ No newline at end of file
diff --git a/research/delf/delf/python/pooling_layers/pooling.py b/research/delf/delf/python/pooling_layers/pooling.py
deleted file mode 100644
index 8244a414b31..00000000000
--- a/research/delf/delf/python/pooling_layers/pooling.py
+++ /dev/null
@@ -1,194 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Pooling layers definitions."""
-
-import tensorflow as tf
-
-
-class MAC(tf.keras.layers.Layer):
- """Global max pooling (MAC) layer.
-
- Maximum Activations of Convolutions (MAC) is constructed by max-pooling
- over the spatial dimensions of each feature map. See
- https://arxiv.org/abs/1511.05879 for a reference.
- """
-
- def call(self, x, axis=None):
- """Invokes the MAC pooling instance.
-
- Args:
- x: [B, H, W, D] A float32 Tensor.
- axis: Dimensions to reduce. By default, dimensions [1, 2] are reduced.
-
- Returns:
- output: [B, D] A float32 Tensor.
- """
- if axis is None:
- axis = [1, 2]
- return mac(x, axis=axis)
-
-
-class SPoC(tf.keras.layers.Layer):
- """Average pooling (SPoC) layer.
-
- Sum-pooled convolutional features (SPoC) is based on the sum pooling of the
- deep features. See https://arxiv.org/pdf/1510.07493.pdf for a reference.
- """
-
- def call(self, x, axis=None):
- """Invokes the SPoC instance.
-
- Args:
- x: [B, H, W, D] A float32 Tensor.
- axis: Dimensions to reduce. By default, dimensions [1, 2] are reduced.
-
- Returns:
- output: [B, D] A float32 Tensor.
- """
- if axis is None:
- axis = [1, 2]
- return spoc(x, axis)
-
-
-class GeM(tf.keras.layers.Layer):
- """Generalized mean pooling (GeM) layer.
-
- Generalized Mean Pooling (GeM) computes the generalized mean of each
- channel in a tensor. See https://arxiv.org/abs/1711.02512 for a reference.
- """
-
- def __init__(self, power=3.):
- """Initialization of the generalized mean pooling (GeM) layer.
-
- Args:
- power: Float power > 0 is an inverse exponent parameter, used during the
- generalized mean pooling computation. Setting this exponent as power > 1
- increases the contrast of the pooled feature map and focuses on the
- salient features of the image. GeM is a generalization of the average
- pooling commonly used in classification networks (power = 1) and of
- spatial max-pooling layer (power = inf).
- """
- super(GeM, self).__init__()
- self.power = power
- self.eps = 1e-6
-
- def call(self, x, axis=None):
- """Invokes the GeM instance.
-
- Args:
- x: [B, H, W, D] A float32 Tensor.
- axis: Dimensions to reduce. By default, dimensions [1, 2] are reduced.
-
- Returns:
- output: [B, D] A float32 Tensor.
- """
- if axis is None:
- axis = [1, 2]
- return gem(x, power=self.power, eps=self.eps, axis=axis)
-
-
-class GeMPooling2D(tf.keras.layers.Layer):
- """Generalized mean pooling (GeM) pooling operation for spatial data."""
-
- def __init__(self,
- power=20.,
- pool_size=(2, 2),
- strides=None,
- padding='valid',
- data_format='channels_last'):
- """Initialization of GeMPooling2D.
-
- Args:
- power: Float, power > 0. is an inverse exponent parameter (GeM power).
- pool_size: Integer or tuple of 2 integers, factors by which to downscale
- (vertical, horizontal)
- strides: Integer, tuple of 2 integers, or None. Strides values. If None,
- it will default to `pool_size`.
- padding: One of `valid` or `same`. `valid` means no padding. `same`
- results in padding evenly to the left/right or up/down of the input such
- that output has the same height/width dimension as the input.
- data_format: A string, one of `channels_last` (default) or
- `channels_first`. The ordering of the dimensions in the inputs.
- `channels_last` corresponds to inputs with shape `(batch, height, width,
- channels)` while `channels_first` corresponds to inputs with shape
- `(batch, channels, height, width)`.
- """
- super(GeMPooling2D, self).__init__()
- self.power = power
- self.eps = 1e-6
- self.pool_size = pool_size
- self.strides = strides
- self.padding = padding.upper()
- data_format_conv = {
- 'channels_last': 'NHWC',
- 'channels_first': 'NCHW',
- }
- self.data_format = data_format_conv[data_format]
-
- def call(self, x):
- tmp = tf.pow(x, self.power)
- tmp = tf.nn.avg_pool(tmp, self.pool_size, self.strides, self.padding,
- self.data_format)
- out = tf.pow(tmp, 1. / self.power)
- return out
-
-
-def mac(x, axis=None):
- """Performs global max pooling (MAC).
-
- Args:
- x: [B, H, W, D] A float32 Tensor.
- axis: Dimensions to reduce. By default, dimensions [1, 2] are reduced.
-
- Returns:
- output: [B, D] A float32 Tensor.
- """
- if axis is None:
- axis = [1, 2]
- return tf.reduce_max(x, axis=axis, keepdims=False)
-
-
-def spoc(x, axis=None):
- """Performs average pooling (SPoC).
-
- Args:
- x: [B, H, W, D] A float32 Tensor.
- axis: Dimensions to reduce. By default, dimensions [1, 2] are reduced.
-
- Returns:
- output: [B, D] A float32 Tensor.
- """
- if axis is None:
- axis = [1, 2]
- return tf.reduce_mean(x, axis=axis, keepdims=False)
-
-
-def gem(x, axis=None, power=3., eps=1e-6):
- """Performs generalized mean pooling (GeM).
-
- Args:
- x: [B, H, W, D] A float32 Tensor.
- axis: Dimensions to reduce. By default, dimensions [1, 2] are reduced.
- power: Float, power > 0 is an inverse exponent parameter (GeM power).
- eps: Float, parameter for numerical stability.
-
- Returns:
- output: [B, D] A float32 Tensor.
- """
- if axis is None:
- axis = [1, 2]
- tmp = tf.pow(tf.maximum(x, eps), power)
- out = tf.pow(tf.reduce_mean(tmp, axis=axis, keepdims=False), 1. / power)
- return out
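A quick numeric check of the GeM limits described in the docstrings above, as a standalone sketch: `power=1` reduces to average pooling (up to the `eps` clamp) and a large power approaches max pooling. The channel values are taken from the example in `pooling_test.py` below.

```
import numpy as np

channel = np.array([0., 2., 4., 6.])  # one channel of a 2x2 feature map

def gem_1d(x, power, eps=1e-6):
  return np.mean(np.maximum(x, eps)**power)**(1. / power)

print(gem_1d(channel, power=1.))    # ~3.0: average pooling (up to the eps clamp)
print(gem_1d(channel, power=100.))  # ~5.92: approaching the max value of 6.0
```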
diff --git a/research/delf/delf/python/pooling_layers/pooling_test.py b/research/delf/delf/python/pooling_layers/pooling_test.py
deleted file mode 100644
index 78653550e45..00000000000
--- a/research/delf/delf/python/pooling_layers/pooling_test.py
+++ /dev/null
@@ -1,84 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for pooling layers."""
-
-import tensorflow as tf
-
-from delf.python.pooling_layers import pooling
-
-
-class PoolingsTest(tf.test.TestCase):
-
- def testMac(self):
- x = tf.constant([[[[0., 1.], [2., 3.]], [[4., 5.], [6., 7.]]]])
- # Run tested function.
- result = pooling.mac(x)
- # Define expected result.
- exp_output = [[6., 7.]]
- # Compare actual and expected.
- self.assertAllClose(exp_output, result)
-
- def testSpoc(self):
- x = tf.constant([[[[0., 1.], [2., 3.]], [[4., 5.], [6., 7.]]]])
- # Run tested function.
- result = pooling.spoc(x)
- # Define expected result.
- exp_output = [[3., 4.]]
- # Compare actual and expected.
- self.assertAllClose(exp_output, result)
-
- def testGem(self):
- x = tf.constant([[[[0., 1.], [2., 3.]], [[4., 5.], [6., 7.]]]])
- # Run tested function.
- result = pooling.gem(x, power=3., eps=1e-6)
- # Define expected result.
- exp_output = [[4.1601677, 4.9866314]]
- # Compare actual and expected.
- self.assertAllClose(exp_output, result)
-
- def testGeMPooling2D(self):
- # Create a testing tensor.
- x = tf.constant([[[1., 2., 3.],
- [4., 5., 6.],
- [7., 8., 9.]]])
- x = tf.reshape(x, [1, 3, 3, 1])
-
- # Checking GeMPooling2D relation to MaxPooling2D for the large values of
- # `p`.
- max_pool_2d = tf.keras.layers.MaxPooling2D(pool_size=(2, 2),
- strides=(1, 1), padding='valid')
- out_max = max_pool_2d(x)
- gem_pool_2d = pooling.GeMPooling2D(power=30., pool_size=(2, 2),
- strides=(1, 1), padding='valid')
- out_gem_max = gem_pool_2d(x)
-
- # Check that for large `p` GeMPooling2D is close to MaxPooling2D.
- self.assertAllEqual(out_max, tf.round(out_gem_max))
-
- # Checking GeMPooling2D relation to AveragePooling2D for the value
- # of `p` = 1.
- avg_pool_2d = tf.keras.layers.AveragePooling2D(pool_size=(2, 2),
- strides=(1, 1),
- padding='valid')
- out_avg = avg_pool_2d(x)
- gem_pool_2d = pooling.GeMPooling2D(power=1., pool_size=(2, 2),
- strides=(1, 1), padding='valid')
- out_gem_avg = gem_pool_2d(x)
- # Check that for `p` equals 1., GeMPooling2D becomes AveragePooling2D.
- self.assertAllEqual(out_avg, out_gem_avg)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/training/README.md b/research/delf/delf/python/training/README.md
deleted file mode 100644
index 41ea2a0b47f..00000000000
--- a/research/delf/delf/python/training/README.md
+++ /dev/null
@@ -1,339 +0,0 @@
-# DELF/DELG Training Instructions
-
-This README documents the end-to-end process for training a local and/or global
-image feature model on the
-[Google Landmarks Dataset v2](https://github.com/cvdfoundation/google-landmark)
-(GLDv2). This can be achieved by following these steps:
-
-1. Install the DELF Python library.
-2. Download the raw images of the GLDv2 dataset.
-3. Prepare the training data.
-4. Run the training.
-
-The next sections will cover each of these steps in greater detail.
-
-## Prerequisites
-
-Clone the [TensorFlow Model Garden](https://github.com/tensorflow/models)
-repository and move into the `models/research/delf/delf/python/training` folder.
-
-```
-git clone https://github.com/tensorflow/models.git
-cd models/research/delf/delf/python/training
-```
-
-## Install the DELF Library
-
-To be able to use this code, please follow
-[these instructions](../../../INSTALL_INSTRUCTIONS.md) to properly install the
-DELF library.
-
-## Download the GLDv2 Training Data
-
-The [GLDv2](https://github.com/cvdfoundation/google-landmark) images are grouped
-in 3 datasets: TRAIN, INDEX, TEST. Images in each dataset are grouped into
-`*.tar` files and individually referenced in `*.csv` files containing training
-metadata and licensing information. The number of `*.tar` files per dataset is
-as follows:
-
-* TRAIN: 500 files.
-* INDEX: 100 files.
-* TEST: 20 files.
-
-To download the GLDv2 images, run the
-[`download_dataset.sh`](./download_dataset.sh) script like in the following
-example:
-
-```
-bash download_dataset.sh 500 100 20
-```
-
-The script takes the following parameters, in order:
-
-* The number of image files from the TRAIN dataset to download (maximum 500).
-* The number of image files from the INDEX dataset to download (maximum 100).
-* The number of image files from the TEST dataset to download (maximum 20).
-
-The script downloads the GLDv2 images under the following directory structure:
-
-* gldv2_dataset/
- * train/ - Contains raw images from the TRAIN dataset.
- * index/ - Contains raw images from the INDEX dataset.
- * test/ - Contains raw images from the TEST dataset.
-
-Each of the three folders `gldv2_dataset/train/`, `gldv2_dataset/index/` and
-`gldv2_dataset/test/` contains the following:
-
-* The downloaded `*.tar` files.
-* The corresponding MD5 checksum files, `*.txt`.
-* The unpacked content of the downloaded files. (*Images are organized in
- folders and subfolders based on the first, second and third character in
- their file name.*)
-* The CSV files containing training and licensing metadata of the downloaded
- images.
-
-*Please note that due to the large size of the GLDv2 dataset, the download can
-take up to 12 hours and up to 1 TB of disk space. In order to save bandwidth and
-disk space, you may want to start by downloading only the TRAIN dataset, the
-only one required for the training, thus saving approximately 95 GB, the
-equivalent of the INDEX and TEST datasets. To further save disk space, the
-`*.tar` files can be deleted after downloading and unpacking them.*
-
-## Prepare the Data for Training
-
-Preparing the data for training consists of creating
-[TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) files from
-the raw GLDv2 images grouped into TRAIN and VALIDATION splits. The training set
-produced contains only the *clean* subset of the GLDv2 dataset. The
-[CVPR'20 paper](https://arxiv.org/abs/2004.01804) introducing the GLDv2 dataset
-contains a detailed description of the *clean* subset.
-
-Generating the TFRecord files containing the TRAIN and VALIDATION splits of the
-*clean* GLDv2 subset can be achieved by running the
-[`build_image_dataset.py`](./build_image_dataset.py) script. Assuming that the
-GLDv2 images have been downloaded to the `gldv2_dataset` folder, the script can
-be run as follows:
-
-```
-python3 build_image_dataset.py \
- --train_csv_path=gldv2_dataset/train/train.csv \
- --train_clean_csv_path=gldv2_dataset/train/train_clean.csv \
- --train_directory=gldv2_dataset/train/*/*/*/ \
- --output_directory=gldv2_dataset/tfrecord/ \
- --num_shards=128 \
- --generate_train_validation_splits \
- --validation_split_size=0.2
-```
-
-*Please refer to the source code of the
-[`build_image_dataset.py`](./build_image_dataset.py) script for a detailed
-description of its parameters.*
-
-The TFRecord files written in the `OUTPUT_DIRECTORY` will be prefixed as
-follows:
-
-* TRAIN split: `train-*`
-* VALIDATION split: `validation-*`
-
-The same script can be used to generate TFRecord files for the TEST split for
-post-training evaluation purposes. This can be achieved by adding the
-parameters:
-
-```
---test_csv_path=gldv2_dataset/train/test.csv \
---test_directory=gldv2_dataset/test/*/*/*/ \
-```
-
-In this scenario, the TFRecord files of the TEST split written in the
-`OUTPUT_DIRECTORY` will be named according to the pattern `test-*`.
-
-*Please note that due to the large size of the GLDv2 dataset, the generation of
-the TFRecord files can take up to 12 hours and up to 500 GB of disk space.*
-
-## Running the Training
-
-For the training to converge faster, it is possible to initialize the ResNet
-backbone with the weights of a pretrained ImageNet model. The ImageNet
-checkpoint is available at the following location:
-[`http://storage.googleapis.com/delf/resnet50_imagenet_weights.tar.gz`](http://storage.googleapis.com/delf/resnet50_imagenet_weights.tar.gz).
-To download and unpack it run the following commands on a Linux box:
-
-```
-curl -Os http://storage.googleapis.com/delf/resnet50_imagenet_weights.tar.gz
-tar -xzvf resnet50_imagenet_weights.tar.gz
-```
-
-### Training with Local Features
-
-Assuming the TFRecord files were generated in the `gldv2_dataset/tfrecord/`
-directory, running the following command should start training a model and
-output the results in the `gldv2_training` directory:
-
-```
-python3 train.py \
- --train_file_pattern=gldv2_dataset/tfrecord/train* \
- --validation_file_pattern=gldv2_dataset/tfrecord/validation* \
- --imagenet_checkpoint=resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5 \
- --dataset_version=gld_v2_clean \
- --logdir=gldv2_training/
-```
-
-*NOTE: The `--use_autoencoder` parameter defaults to `True`, so the model is
-trained with the autoencoder unless that flag is overridden.*
-
-### Training with Local and Global Features
-
-It is also possible to train the model with an improved global features head as
-introduced in the [DELG paper](https://arxiv.org/abs/2001.05027). To do this,
-specify the additional parameter `--delg_global_features` when launching the
-training, like in the following example:
-
-```
-python3 train.py \
- --train_file_pattern=gldv2_dataset/tfrecord/train* \
- --validation_file_pattern=gldv2_dataset/tfrecord/validation* \
- --imagenet_checkpoint=resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5 \
- --dataset_version=gld_v2_clean \
- --logdir=gldv2_training/ \
- --delg_global_features
-```
-
-*NOTE: The `--use_autoencoder` parameter defaults to `True`, so the model is
-trained with the autoencoder unless that flag is overridden.*
-
-### Hyperparameter Guidelines
-
-To improve training convergence, the following hyperparameter values have been
-tested and validated on the infrastructures listed below, with the remaining
-`train.py` flags kept at their **default values**:
-* 8 Tesla P100 GPUs: `--batch_size=256`, `--initial_lr=0.01`
-* 4 Tesla P100 GPUs: `--batch_size=128`, `--initial_lr=0.005`
-
-## Exporting the Trained Model
-
-Assuming the training output, the TensorFlow checkpoint, is in the
-`gldv2_training` directory, running the following commands exports the model.
-
-### DELF local feature-only model
-
-This should be used when you are only interested in having a local feature
-model.
-
-```
-python3 model/export_local_model.py \
- --ckpt_path=gldv2_training/delf_weights \
- --export_path=gldv2_model_local
-```
-
-### DELG global feature-only model
-
-This should be used when you are only interested in having a global feature
-model.
-
-```
-python3 model/export_global_model.py \
- --ckpt_path=gldv2_training/delf_weights \
- --export_path=gldv2_model_global \
- --delg_global_features
-```
-
-### DELG local+global feature model
-
-This should be used when you are interested in jointly extracting local and
-global features.
-
-```
-python3 model/export_local_and_global_model.py \
- --ckpt_path=gldv2_training/delf_weights \
- --export_path=gldv2_model_local_and_global \
- --delg_global_features
-```
-
-### Kaggle-compatible global feature model
-
-To export a global feature model in the format required by the
-[2020 Landmark Retrieval challenge](https://www.kaggle.com/c/landmark-retrieval-2020),
-you can use the following command:
-
-*NOTE*: this command is useful for running the model directly in the
-above-mentioned Kaggle competition; however, it produces a different format than
-the one required in this DELF/DELG codebase (i.e., if you export the model this
-way, the commands found in the [DELG instructions](../delg/DELG_INSTRUCTIONS.md)
-would not work). To export the model in a manner compatible with this codebase,
-use a command similar to the "DELG global feature-only model" one above.
-
-```
-python3 model/export_global_model.py \
- --ckpt_path=gldv2_training/delf_weights \
- --export_path=gldv2_model_global \
- --input_scales_list=0.70710677,1.0,1.4142135 \
- --multi_scale_pool_type=sum \
- --normalize_global_descriptor
-```
-
-## Testing the trained model
-
-### Testing the trained local feature model
-
-After the trained model has been exported, it can be used to extract DELF
-features from two images of the same landmark and to perform a matching test
-between them, based on the extracted features, to validate that they depict
-the same landmark.
-
-Start by downloading the Oxford buildings dataset:
-
-```
-mkdir data && cd data
-wget http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/oxbuild_images.tgz
-mkdir oxford5k_images oxford5k_features
-tar -xvzf oxbuild_images.tgz -C oxford5k_images/
-cd ../
-echo data/oxford5k_images/hertford_000056.jpg >> list_images.txt
-echo data/oxford5k_images/oxford_000317.jpg >> list_images.txt
-```
-
-Make a copy of the
-[`delf_config_example.pbtxt`](../examples/delf_config_example.pbtxt) protobuf
-file, which configures the DELF feature extraction. Update the copy by making
-the following changes:
-
-* set the `model_path` attribute to the directory containing the exported
- model, `gldv2_model_local` in this example
-* add at the root level the attribute `is_tf2_exported` with the value `true`
-* set to `false` the `use_pca` attribute inside `delf_local_config`
-
-The resulting file should resemble the following:
-
-```
-model_path: "gldv2_model_local"
-image_scales: .25
-image_scales: .3536
-image_scales: .5
-image_scales: .7071
-image_scales: 1.0
-image_scales: 1.4142
-image_scales: 2.0
-is_tf2_exported: true
-delf_local_config {
- use_pca: false
- max_feature_num: 1000
- score_threshold: 100.0
-}
-```
-
-Run the following command to extract DELF features for the images
-`hertford_000056.jpg` and `oxford_000317.jpg`:
-
-```
-python3 ../examples/extract_features.py \
- --config_path delf_config_example.pbtxt \
- --list_images_path list_images.txt \
- --output_dir data/oxford5k_features
-```
-
-Run the following command to perform feature matching between the images
-`hertford_000056.jpg` and `oxford_000317.jpg`:
-
-```
-python3 ../examples/match_images.py \
- --image_1_path data/oxford5k_images/hertford_000056.jpg \
- --image_2_path data/oxford5k_images/oxford_000317.jpg \
- --features_1_path data/oxford5k_features/hertford_000056.delf \
- --features_2_path data/oxford5k_features/oxford_000317.delf \
- --output_image matched_images.png
-```
-
-The generated image `matched_images.png` should look similar to this one:
-
-![MatchedImagesDemo](./matched_images_demo.png)
-
-### Testing the trained global (or global+local) feature model
-
-Please follow the [DELG instructions](../delg/DELG_INSTRUCTIONS.md). The only
-modification should be to pass a different `delf_config_path` when doing feature
-extraction, pointing to the newly-trained model. As described in the
-[DelfConfig](../../protos/delf_config.proto), set the `use_local_features` and
-`use_global_features` options according to which feature modalities you are
-using. Note also that you should set `is_tf2_exported` to `true`.
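-
-As a minimal sketch (assuming a global-only model exported to
-`gldv2_model_global`), the relevant `DelfConfig` fields could look like:
-
-```
-model_path: "gldv2_model_global"
-is_tf2_exported: true
-use_local_features: false
-use_global_features: true
-```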
diff --git a/research/delf/delf/python/training/__init__.py b/research/delf/delf/python/training/__init__.py
deleted file mode 100644
index c87f3d895c7..00000000000
--- a/research/delf/delf/python/training/__init__.py
+++ /dev/null
@@ -1,22 +0,0 @@
-# Copyright 2020 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Module for DELF training."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-# pylint: disable=unused-import
-from delf.python.training import build_image_dataset
-# pylint: enable=unused-import
diff --git a/research/delf/delf/python/training/build_image_dataset.py b/research/delf/delf/python/training/build_image_dataset.py
deleted file mode 100644
index 23103d49196..00000000000
--- a/research/delf/delf/python/training/build_image_dataset.py
+++ /dev/null
@@ -1,490 +0,0 @@
-#!/usr/bin/python
-# Copyright 2020 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Converts landmark image data to TFRecords file format with Example protos.
-
-The image data set is expected to reside in JPEG files ending with '.jpg'.
-
-This script converts the training and testing data into
-a sharded data set consisting of TFRecord files
- train_directory/train-00000-of-00128
- train_directory/train-00001-of-00128
- ...
- train_directory/train-00127-of-00128
-and
- test_directory/test-00000-of-00128
- test_directory/test-00001-of-00128
- ...
- test_directory/test-00127-of-00128
-where we have selected 128 shards for both data sets. Each record
-within the TFRecord file is a serialized Example proto. The Example proto
-contains the following fields:
- image/encoded: string containing JPEG encoded image in RGB colorspace
- image/height: integer, image height in pixels
- image/width: integer, image width in pixels
- image/colorspace: string, specifying the colorspace, always 'RGB'
- image/channels: integer, specifying the number of channels, always 3
- image/format: string, specifying the format, always 'JPEG'
- image/id: string, the unique id of the image file
- e.g. '97c0a12e07ae8dd5' or '650c989dd3493748'
-Furthermore, if the data set type is training, it would contain one more field:
- image/class/label: integer, the landmark_id from the input training csv file.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import csv
-import os
-
-from absl import app
-from absl import flags
-
-import numpy as np
-import pandas as pd
-import tensorflow as tf
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string('train_directory', '/tmp/', 'Training data directory.')
-flags.DEFINE_string('test_directory', None,
- '(Optional) Testing data directory. Required only if '
- 'test_csv_path is not None.')
-flags.DEFINE_string('output_directory', '/tmp/', 'Output data directory.')
-flags.DEFINE_string('train_csv_path', '/tmp/train.csv',
- 'Training data csv file path.')
-flags.DEFINE_string('train_clean_csv_path', None,
- ('(Optional) Clean training data csv file path. '
- 'If provided, filters images keeping the ones listed in '
- 'this file. In this case, also outputs a CSV file '
- 'relabeling.csv mapping new labels to old ones.'))
-flags.DEFINE_string('test_csv_path', None,
- '(Optional) Testing data csv file path. If None or absent, '
- 'TFRecords for the images in the test dataset are not '
- 'generated.')
-flags.DEFINE_integer('num_shards', 128, 'Number of shards in output data.')
-flags.DEFINE_boolean('generate_train_validation_splits', False,
- '(Optional) Whether to split the train dataset into '
- 'TRAIN and VALIDATION splits.')
-flags.DEFINE_float('validation_split_size', 0.2,
- '(Optional) The size of the VALIDATION split as a fraction '
- 'of the train dataset.')
-flags.DEFINE_integer('seed', 0,
- '(Optional) The seed to be used while shuffling the train '
- 'dataset when generating the TRAIN and VALIDATION splits. '
- 'Recommended for splits reproducibility purposes.')
-
-_FILE_IDS_KEY = 'file_ids'
-_IMAGE_PATHS_KEY = 'image_paths'
-_LABELS_KEY = 'labels'
-_TEST_SPLIT = 'test'
-_TRAIN_SPLIT = 'train'
-_VALIDATION_SPLIT = 'validation'
-
-
-def _get_all_image_files_and_labels(name, csv_path, image_dir):
- """Process input and get the image file paths, image ids and the labels.
-
- Args:
- name: 'train' or 'test'.
- csv_path: path to the Google-landmark Dataset csv Data Sources files.
- image_dir: directory that stores downloaded images.
- Returns:
- image_paths: the paths to all images in the image_dir.
- file_ids: the unique ids of images.
- labels: the landmark id of all images. When name='test', the returned labels
- will be an empty list.
- Raises:
- ValueError: if input name is not supported.
- """
- image_paths = tf.io.gfile.glob(os.path.join(image_dir, '*.jpg'))
- file_ids = [os.path.basename(os.path.normpath(f))[:-4] for f in image_paths]
- if name == _TRAIN_SPLIT:
- with tf.io.gfile.GFile(csv_path, 'rb') as csv_file:
- df = pd.read_csv(csv_file)
- df = df.set_index('id')
- labels = [int(df.loc[fid]['landmark_id']) for fid in file_ids]
- elif name == _TEST_SPLIT:
- labels = []
- else:
- raise ValueError('Unsupported dataset split name: %s' % name)
- return image_paths, file_ids, labels
-
-
-def _get_clean_train_image_files_and_labels(csv_path, image_dir):
- """Get image file paths, image ids and labels for the clean training split.
-
- Args:
- csv_path: path to the Google-landmark Dataset v2 CSV Data Sources files
- of the clean train dataset. Assumes CSV header landmark_id;images.
- image_dir: directory that stores downloaded images.
-
- Returns:
- image_paths: the paths to all images in the image_dir.
- file_ids: the unique ids of images.
- labels: the landmark id of all images.
- relabeling: relabeling rules created to replace actual labels with
- a continuous set of labels.
- """
- # Load the content of the CSV file (landmark_id/label -> images).
- with tf.io.gfile.GFile(csv_path, 'rb') as csv_file:
- df = pd.read_csv(csv_file)
-
- # Create the dictionary (key = image_id, value = {label, file_id}).
- images = {}
- for _, row in df.iterrows():
- label = row['landmark_id']
- for file_id in row['images'].split(' '):
- images[file_id] = {}
- images[file_id]['label'] = label
- images[file_id]['file_id'] = file_id
-
- # Add the full image path to the dictionary of images.
- image_paths = tf.io.gfile.glob(os.path.join(image_dir, '*.jpg'))
- for image_path in image_paths:
- file_id = os.path.basename(os.path.normpath(image_path))[:-4]
- if file_id in images:
- images[file_id]['image_path'] = image_path
-
- # Explode the dictionary into lists (1 per image attribute).
- image_paths = []
- file_ids = []
- labels = []
- for _, value in images.items():
- image_paths.append(value['image_path'])
- file_ids.append(value['file_id'])
- labels.append(value['label'])
-
- # Relabel image labels to contiguous values.
- unique_labels = sorted(set(labels))
- relabeling = {label: index for index, label in enumerate(unique_labels)}
- new_labels = [relabeling[label] for label in labels]
- return image_paths, file_ids, new_labels, relabeling
-
-
-def _process_image(filename):
- """Process a single image file.
-
- Args:
- filename: string, path to an image file e.g., '/path/to/example.jpg'.
-
- Returns:
- image_buffer: string, JPEG encoding of RGB image.
- height: integer, image height in pixels.
- width: integer, image width in pixels.
- Raises:
- ValueError: if parsed image has wrong number of dimensions or channels.
- """
- # Read the image file.
- with tf.io.gfile.GFile(filename, 'rb') as f:
- image_data = f.read()
-
- # Decode the RGB JPEG.
- image = tf.io.decode_jpeg(image_data, channels=3)
-
- # Check that image converted to RGB
- if len(image.shape) != 3:
- raise ValueError('The parsed image number of dimensions is not 3 but %d' %
- len(image.shape))
- height = image.shape[0]
- width = image.shape[1]
- if image.shape[2] != 3:
- raise ValueError('The parsed image channels is not 3 but %d' %
- (image.shape[2]))
-
- return image_data, height, width
-
-
-def _int64_feature(value):
- """Returns an int64_list from a bool / enum / int / uint."""
- return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
-
-
-def _bytes_feature(value):
- """Returns a bytes_list from a string / byte."""
- return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
-
-
-def _convert_to_example(file_id, image_buffer, height, width, label=None):
- """Build an Example proto for the given inputs.
-
- Args:
- file_id: string, unique id of an image file, e.g., '97c0a12e07ae8dd5'.
- image_buffer: string, JPEG encoding of RGB image.
- height: integer, image height in pixels.
- width: integer, image width in pixels.
- label: integer, the landmark id and prediction label.
-
- Returns:
- Example proto.
- """
- colorspace = 'RGB'
- channels = 3
- image_format = 'JPEG'
- features = {
- 'image/height': _int64_feature(height),
- 'image/width': _int64_feature(width),
- 'image/colorspace': _bytes_feature(colorspace.encode('utf-8')),
- 'image/channels': _int64_feature(channels),
- 'image/format': _bytes_feature(image_format.encode('utf-8')),
- 'image/id': _bytes_feature(file_id.encode('utf-8')),
- 'image/encoded': _bytes_feature(image_buffer)
- }
- if label is not None:
- features['image/class/label'] = _int64_feature(label)
- example = tf.train.Example(features=tf.train.Features(feature=features))
-
- return example
-
-
-def _write_tfrecord(output_prefix, image_paths, file_ids, labels):
- """Read image files and write image and label data into TFRecord files.
-
- Args:
- output_prefix: string, the prefix of output files, e.g. 'train'.
- image_paths: list of strings, the paths to images to be converted.
- file_ids: list of strings, the image unique ids.
- labels: list of integers, the landmark ids of images. It is an empty list
- when output_prefix='test'.
-
- Raises:
- ValueError: if the length of input images, ids and labels don't match
- """
- if output_prefix == _TEST_SPLIT:
- labels = [None] * len(image_paths)
- if not len(image_paths) == len(file_ids) == len(labels):
- raise ValueError('length of image_paths, file_ids, labels should be the' +
- ' same. But they are %d, %d, %d, respectively' %
- (len(image_paths), len(file_ids), len(labels)))
-
- spacing = np.linspace(0, len(image_paths), FLAGS.num_shards + 1, dtype=int)
-
- for shard in range(FLAGS.num_shards):
- output_file = os.path.join(
- FLAGS.output_directory,
- '%s-%.5d-of-%.5d' % (output_prefix, shard, FLAGS.num_shards))
- writer = tf.io.TFRecordWriter(output_file)
- print('Processing shard ', shard, ' and writing file ', output_file)
- for i in range(spacing[shard], spacing[shard + 1]):
- image_buffer, height, width = _process_image(image_paths[i])
- example = _convert_to_example(file_ids[i], image_buffer, height, width,
- labels[i])
- writer.write(example.SerializeToString())
- writer.close()
-
-
-def _write_relabeling_rules(relabeling_rules):
- """Write to a file the relabeling rules when the clean train dataset is used.
-
- Args:
- relabeling_rules: dictionary of relabeling rules applied when the clean
- train dataset is used (key = old_label, value = new_label).
- """
- relabeling_file_name = os.path.join(FLAGS.output_directory,
- 'relabeling.csv')
- with tf.io.gfile.GFile(relabeling_file_name, 'w') as relabeling_file:
- csv_writer = csv.writer(relabeling_file, delimiter=',')
- csv_writer.writerow(['new_label', 'old_label'])
- for old_label, new_label in relabeling_rules.items():
- csv_writer.writerow([new_label, old_label])
-
-
-def _shuffle_by_columns(np_array, random_state):
- """Shuffle the columns of a 2D numpy array.
-
- Args:
- np_array: array to shuffle.
- random_state: numpy RandomState to be used for shuffling.
- Returns:
- The shuffled array.
- """
- columns = np_array.shape[1]
- columns_indices = np.arange(columns)
- random_state.shuffle(columns_indices)
- return np_array[:, columns_indices]
-
-
-def _build_train_and_validation_splits(image_paths, file_ids, labels,
- validation_split_size, seed):
- """Create TRAIN and VALIDATION splits containg all labels in equal proportion.
-
- Args:
- image_paths: list of paths to the image files in the train dataset.
- file_ids: list of image file ids in the train dataset.
- labels: list of image labels in the train dataset.
- validation_split_size: size of the VALIDATION split as a ratio of the train
- dataset.
- seed: seed to use for shuffling the dataset for reproducibility purposes.
-
- Returns:
- splits : tuple containing the TRAIN and VALIDATION splits.
- Raises:
- ValueError: if the image attributes arrays don't all have the same length,
- which makes the shuffling impossible.
- """
- # Ensure all image attribute arrays have the same length.
- total_images = len(file_ids)
- if not (len(image_paths) == total_images and len(labels) == total_images):
- raise ValueError('Inconsistencies between number of file_ids (%d), number '
- 'of image_paths (%d) and number of labels (%d). Cannot '
- 'shuffle the train dataset.' % (total_images,
- len(image_paths),
- len(labels)))
-
- # Stack all image attributes arrays in a single 2D array of dimensions
- # (3, number of images) and group by label the indices of datapoints in the
- # image attributes arrays. Explicitly convert label types from 'int' to 'str'
- # to avoid implicit conversion during stacking with image_paths and file_ids
- # which are 'str'.
- labels_str = [str(label) for label in labels]
- image_attrs = np.stack((image_paths, file_ids, labels_str))
- image_attrs_idx_by_label = {}
- for index, label in enumerate(labels):
- if label not in image_attrs_idx_by_label:
- image_attrs_idx_by_label[label] = []
- image_attrs_idx_by_label[label].append(index)
-
- # Create subsets of image attributes by label, shuffle them separately and
- # split each subset into TRAIN and VALIDATION splits based on the size of the
- # validation split.
- splits = {
- _VALIDATION_SPLIT: [],
- _TRAIN_SPLIT: []
- }
- rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(seed)))
- for label, indexes in image_attrs_idx_by_label.items():
- # Create the subset for the current label.
- image_attrs_label = image_attrs[:, indexes]
- # Shuffle the current label subset.
- image_attrs_label = _shuffle_by_columns(image_attrs_label, rs)
- # Split the current label subset into TRAIN and VALIDATION splits and add
- # each split to the list of all splits.
- images_per_label = image_attrs_label.shape[1]
- cutoff_idx = max(1, int(validation_split_size * images_per_label))
- splits[_VALIDATION_SPLIT].append(image_attrs_label[:, 0 : cutoff_idx])
- splits[_TRAIN_SPLIT].append(image_attrs_label[:, cutoff_idx : ])
-
- # Concatenate all subsets of image attributes into TRAIN and VALIDATION splits
- # and reshuffle them again to ensure variance of labels across batches.
- validation_split = _shuffle_by_columns(
- np.concatenate(splits[_VALIDATION_SPLIT], axis=1), rs)
- train_split = _shuffle_by_columns(
- np.concatenate(splits[_TRAIN_SPLIT], axis=1), rs)
-
- # Unstack the image attribute arrays in the TRAIN and VALIDATION splits and
- # convert them back to lists. Convert labels back to 'int' from 'str'
- # following the explicit type change from 'int' to 'str' for stacking.
- return (
- {
- _IMAGE_PATHS_KEY: validation_split[0, :].tolist(),
- _FILE_IDS_KEY: validation_split[1, :].tolist(),
- _LABELS_KEY: [int(label) for label in validation_split[2, :].tolist()]
- }, {
- _IMAGE_PATHS_KEY: train_split[0, :].tolist(),
- _FILE_IDS_KEY: train_split[1, :].tolist(),
- _LABELS_KEY: [int(label) for label in train_split[2, :].tolist()]
- })
-
-
-def _build_train_tfrecord_dataset(csv_path,
- clean_csv_path,
- image_dir,
- generate_train_validation_splits,
- validation_split_size,
- seed):
- """Build a TFRecord dataset for the train split.
-
- Args:
- csv_path: path to the train Google-landmark Dataset csv Data Sources files.
- clean_csv_path: path to the Google-landmark Dataset v2 CSV Data Sources
- files of the clean train dataset.
- image_dir: directory that stores downloaded images.
- generate_train_validation_splits: whether to split the train dataset into
- TRAIN and VALIDATION splits.
- validation_split_size: size of the VALIDATION split as a ratio of the train
- dataset. Only used if 'generate_train_validation_splits' is True.
- seed: seed to use for shuffling the dataset for reproducibility purposes.
- Only used if 'generate_train_validation_splits' is True.
-
- Returns:
- Nothing. After the function call, sharded TFRecord files are materialized.
- Raises:
- ValueError: if the size of the VALIDATION split is outside (0,1) when TRAIN
- and VALIDATION splits need to be generated.
- """
- # Make sure the size of the VALIDATION split is inside (0, 1) if we need to
- # generate the TRAIN and VALIDATION splits.
- if generate_train_validation_splits:
- if validation_split_size <= 0 or validation_split_size >= 1:
- raise ValueError('Invalid VALIDATION split size. Expected inside (0,1) '
- 'but received %f.' % validation_split_size)
-
- if clean_csv_path:
- # Load clean train images and labels and write the relabeling rules.
- (image_paths, file_ids, labels,
- relabeling_rules) = _get_clean_train_image_files_and_labels(clean_csv_path,
- image_dir)
- _write_relabeling_rules(relabeling_rules)
- else:
- # Load all train images.
- image_paths, file_ids, labels = _get_all_image_files_and_labels(
- _TRAIN_SPLIT, csv_path, image_dir)
-
- if generate_train_validation_splits:
- # Generate the TRAIN and VALIDATION splits and write them to TFRecord.
- validation_split, train_split = _build_train_and_validation_splits(
- image_paths, file_ids, labels, validation_split_size, seed)
- _write_tfrecord(_VALIDATION_SPLIT,
- validation_split[_IMAGE_PATHS_KEY],
- validation_split[_FILE_IDS_KEY],
- validation_split[_LABELS_KEY])
- _write_tfrecord(_TRAIN_SPLIT,
- train_split[_IMAGE_PATHS_KEY],
- train_split[_FILE_IDS_KEY],
- train_split[_LABELS_KEY])
- else:
- # Write to TFRecord a single split, TRAIN.
- _write_tfrecord(_TRAIN_SPLIT, image_paths, file_ids, labels)
-
-
-def _build_test_tfrecord_dataset(csv_path, image_dir):
- """Build a TFRecord dataset for the 'test' split.
-
- Args:
- csv_path: path to the 'test' Google-landmark Dataset csv Data Sources files.
- image_dir: directory that stores downloaded images.
-
- Returns:
- Nothing. After the function call, sharded TFRecord files are materialized.
- """
- image_paths, file_ids, labels = _get_all_image_files_and_labels(
- _TEST_SPLIT, csv_path, image_dir)
- _write_tfrecord(_TEST_SPLIT, image_paths, file_ids, labels)
-
-
-def main(unused_argv):
- _build_train_tfrecord_dataset(FLAGS.train_csv_path,
- FLAGS.train_clean_csv_path,
- FLAGS.train_directory,
- FLAGS.generate_train_validation_splits,
- FLAGS.validation_split_size,
- FLAGS.seed)
- if FLAGS.test_csv_path is not None:
- _build_test_tfrecord_dataset(FLAGS.test_csv_path, FLAGS.test_directory)
-
-
-if __name__ == '__main__':
- app.run(main)
diff --git a/research/delf/delf/python/training/download_dataset.sh b/research/delf/delf/python/training/download_dataset.sh
deleted file mode 100755
index ecbd905eccd..00000000000
--- a/research/delf/delf/python/training/download_dataset.sh
+++ /dev/null
@@ -1,161 +0,0 @@
-#!/bin/bash
-
-# Copyright 2020 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# This script downloads the Google Landmarks v2 dataset. To download the dataset
-# run the script like in the following example:
-# bash download_dataset.sh 500 100 20
-#
-# The script takes the following parameters, in order:
-# - number of image files from the TRAIN split to download (maximum 500)
-# - number of image files from the INDEX split to download (maximum 100)
-# - number of image files from the TEST split to download (maximum 20)
-
-image_files_train=$1 # Number of image files to download from the TRAIN split
-image_files_index=$2 # Number of image files to download from the INDEX split
-image_files_test=$3 # Number of image files to download from the TEST split
-
-splits=("train" "test" "index")
-dataset_root_folder=gldv2_dataset
-
-metadata_url="https://s3.amazonaws.com/google-landmark/metadata"
-ground_truth_url="https://s3.amazonaws.com/google-landmark/ground_truth"
-csv_train=(${metadata_url}/train.csv ${metadata_url}/train_clean.csv ${metadata_url}/train_attribution.csv ${metadata_url}/train_label_to_category.csv)
-csv_index=(${metadata_url}/index.csv ${metadata_url}/index_image_to_landmark.csv ${metadata_url}/index_label_to_category.csv)
-csv_test=(${metadata_url}/test.csv ${ground_truth_url}/recognition_solution_v2.1.csv ${ground_truth_url}/retrieval_solution_v2.1.csv)
-
-images_tar_file_base_url="https://s3.amazonaws.com/google-landmark"
-images_md5_file_base_url="https://s3.amazonaws.com/google-landmark/md5sum"
-num_processes=6
-
-make_folder() {
- # Creates a folder and checks if it exists. Exits if folder creation fails.
- local folder=$1
- if [ -d "${folder}" ]; then
- echo "Folder ${folder} already exists. Skipping folder creation."
- else
- echo "Creating folder ${folder}."
- if mkdir ${folder}; then
- echo "Successfully created folder ${folder}."
- else
- echo "Failed to create folder ${folder}. Exiting."
- exit 1
- fi
- fi
-}
-
-download_file() {
- # Downloads a file from a URL into a specified folder.
- local file_url=$1
- local folder=$2
- local file_path="${folder}/`basename ${file_url}`"
- echo "Downloading file ${file_url} to folder ${folder}."
- pushd . > /dev/null
- cd ${folder}
- curl -Os ${file_url}
- popd > /dev/null
-}
-
-validate_md5_checksum() {
- # Validate the MD5 checksum of a downloaded file.
- local content_file=$1
- local md5_file=$2
- echo "Checking MD5 checksum of file ${content_file} against ${md5_file}"
- if [[ "${OSTYPE}" == "linux-gnu" ]]; then
- content_md5=`md5sum ${content_file}`
- elif [[ "${OSTYPE}" == "darwin"* ]]; then
- content_md5=`md5 -r "${content_file}"`
- fi
- content_md5=`cut -d' ' -f1<<<"${content_md5}"`
- expected_md5=`cut -d' ' -f1<<${max_idx}?${max_idx}:${curr_max_idx}))
- for j in $(seq ${i} 1 ${last_idx}); do download_image_file "${split}" "${j}" "${split_folder}" & done
- wait
- done
-}
-
-download_csv_files() {
- # Downloads all metadata CSV files of a split.
- local split=$1
- local split_folder=$2
- local csv_list="csv_${split}[*]"
- for csv_file in ${!csv_list}; do
- download_file "${csv_file}" "${split_folder}"
- done
-}
-
-download_split() {
- # Downloads all artifacts, metadata CSV files and image files of a single split.
- local split=$1
- local split_folder=${dataset_root_folder}/${split}
- make_folder "${split_folder}"
- download_csv_files "${split}" "${split_folder}"
- download_image_files "${split}" "${split_folder}"
-}
-
-download_all_splits() {
- # Downloads all artifacts, metadata CSV files and image files of all splits.
- make_folder "${dataset_root_folder}"
- for split in "${splits[@]}"; do
- download_split "$split"
- done
-}
-
-download_all_splits
-
-exit 0
diff --git a/research/delf/delf/python/training/global_features/README.md b/research/delf/delf/python/training/global_features/README.md
deleted file mode 100644
index 4ca68316263..00000000000
--- a/research/delf/delf/python/training/global_features/README.md
+++ /dev/null
@@ -1,174 +0,0 @@
-## Global features: CNN Image Retrieval
-
-
-This Python toolbox implements the training and testing of the approach described in the papers:
-
-[![Paper](http://img.shields.io/badge/paper-arXiv.1711.02512-B3181B.svg)](https://arxiv.org/abs/1711.02512)
-
-```
-"Fine-tuning CNN Image Retrieval with No Human Annotation",
-Radenović F., Tolias G., Chum O.,
-TPAMI 2018
-```
-
-[![Paper](http://img.shields.io/badge/paper-arXiv.1604.02426-B3181B.svg)](http://arxiv.org/abs/1604.02426)
-```
-"CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples",
-Radenović F., Tolias G., Chum O.,
-ECCV 2016
-```
-
-Fine-tuned CNNs are used for global feature extraction with the goal of using
-those for image retrieval. The networks are trained on the SfM120k
-landmark images dataset.
-
-
-
-When initializing the network, one of the popular pre-trained architectures
- for classification tasks (such as ResNet or VGG) is used as the network’s
- backbone. The
-fully connected layers of such architectures are discarded, resulting in a fully
-convolutional backbone. Then, given an input image of the size [W × H × C],
-where C is the number of channels, W and H are image width and height,
-respectively; the output is a tensor X with dimensions [W' × H' × K], where
-K is the number of feature maps in the last layer. Tensor X
-can be considered as a set of the input image’s deep local features. For
-deep convolutional features, the simple aggregation approach based on global
-pooling arguably provides the best results. This method is fast, has a small
-number of parameters, and a low risk of overfitting. Keeping this in mind,
-we convert local features to a global descriptor vector using one of the
-retrieval system’s global poolings (MAC, SPoC, or GeM). After this stage,
-the feature vector has dimensionality equal to K (with MAC pooling, for
-example, it is made up of the maximum activation per feature map). The final
-output dimensionality for the most common networks varies from 512 to 2048,
-making this image representation relatively compact.
-
-Pooled vectors are subsequently L2-normalized. The obtained representation is
-then optionally passed through a fully connected (whitening) layer before being
-subjected to a new L2 re-normalization. The final image representation allows
-comparing the resemblance of two images by simply using their inner product.
-
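-As an illustrative sketch (not the library's exact implementation), GeM pooling
-of a [W' × H' × K] feature tensor followed by L2 normalization can be written
-as:
-
-```
-import tensorflow as tf
-
-def gem_pool(features, p=3.0, eps=1e-6):
-  """Generalized-mean (GeM) pooling over the spatial dimensions.
-
-  p=1 reduces to average (SPoC) pooling; a large p approaches max (MAC) pooling.
-  """
-  clamped = tf.maximum(features, eps)  # Clamp activations for numerical stability.
-  pooled = tf.reduce_mean(clamped ** p, axis=[0, 1]) ** (1.0 / p)
-  return tf.nn.l2_normalize(pooled, axis=-1)  # L2-normalized [K] descriptor.
-```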
-
-### Install DELF library
-
-To be able to use this code, please follow
-[these instructions](../../../../INSTALL_INSTRUCTIONS.md) to properly install
-the DELF library.
-
-### Usage
-
-
- Training
-
- Navigate (`cd`) to the folder `DELF_ROOT/delf/python/training/global_features`.
- An example training script is located in `DELF_ROOT/delf/python/training/global_features/train.py`.
- ```
- python3 train.py [--arch ARCH] [--batch_size N] [--data_root PATH]
- [--debug] [--directory PATH] [--epochs N] [--gpu_id ID]
- [--image_size SIZE] [--launch_tensorboard] [--loss LOSS]
- [--loss_margin LM] [--lr LR] [--momentum M] [--multiscale SCALES]
- [--neg_num N] [--optimizer OPTIMIZER] [--pool POOL] [--pool_size N]
- [--pretrained] [--precompute_whitening DATASET] [--resume]
- [--query_size N] [--test_datasets DATASET] [--test_freq N]
- [--test_whiten] [--training_dataset DATASET] [--update_every N]
- [--validation_type TYPE] [--weight_decay N] [--whitening]
- ```
-
- For detailed explanation of the options run:
- ```
- python3 train.py -helpfull
- ```
- Standard training of our models was run with the following parameters:
- ```
-python3 train.py \
---directory="DESTINATION_PATH" \
---gpu_id='0' \
---data_root="TRAINING_DATA_DIRECTORY" \
---training_dataset='retrieval-SfM-120k' \
---test_datasets='roxford5k,rparis6k' \
---arch='ResNet101' \
---pool='gem' \
---whitening=True \
---debug=True \
---loss='triplet' \
---loss_margin=0.85 \
---optimizer='adam' \
---lr=5e-7 --neg_num=3 --query_size=2000 \
---pool_size=20000 --batch_size=5 \
---image_size=1024 --epochs=100 --test_freq=5 \
---multiscale='[1, 2**(1/2), 1/2**(1/2)]'
-```
-
- **Note**: Data and networks used for training and testing are automatically downloaded when using the example training
- script (```DELF_ROOT/delf/python/training/global_features/train.py```).
-
-
-
-
-Training logic flow
-
-**Initialization phase**
-
-1. Checking whether the required datasets (both test and train/val) are present
-in the data folder and automatically downloading them if not.
-1. Setting up the logging and creating a logging/checkpoint directory.
-1. Initializing the model according to the user-provided parameters
-(architecture/pooling/whitening/pretrained, etc.).
-1. Defining loss (contrastive/triplet) according to the user parameters.
-1. Defining optimizer (Adam/SGD with learning rate/weight decay/momentum) according to the user parameters.
-1. Initializing CheckpointManager and resuming from the latest checkpoint if the resume flag is set.
-1. Launching Tensorboard if the flag is set.
-1. Initializing training (and validation, if required) datasets.
-1. Freezing BatchNorm weight updates: since training processes one image at a time, the computed statistics would not be per batch, so the BatchNorm layers are kept frozen (i.e., pretrained ImageNet statistics are used).
-1. Evaluating the network performance before training (on the test datasets).
-
-**Training phase**
-
-The main training loop (for the required number of epochs):
-1. Finding the hard negative pairs in the dataset (using a forward pass
-through the model).
-1. Creating the training dataset from a generator which changes every epoch.
-Each element in the dataset consists of 1 x Query image, 1 x Positive image,
-N x Hard negative images (N is specified by the `neg_num` flag), and an array
-marking the Query (-1), Positive (1) and Negative (0) images (see the sketch
-after this list).
-1. Performing one training step and calculating the final epoch loss.
-1. If validation is required, finding hard negatives in the validation set,
-which has the same structure as the training set, then performing one
-validation step and calculating the loss.
-1. Evaluating on the test datasets every `test_freq` epochs.
-1. Saving checkpoint (optimizer and the model weights).
-
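-A minimal sketch of how one element of this dataset is laid out (assuming
-`neg_num=5`):
-
-```
-# (query_img, positive_img, neg_img_1, ..., neg_img_5, target)
-# target = [-1, 1, 0, 0, 0, 0, 0]   # -1: query, 1: positive, 0: negatives
-```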
-
-
-## Exporting the Trained Model
-
-Assuming the training output (the TensorFlow checkpoint) is located in the
-`--directory` path, the following command exports the model:
-```
-python3 model/export_CNN_global_model.py \
- [--ckpt_path PATH] [--export_path PATH] [--input_scales_list LIST]
- [--multi_scale_pool_type TYPE] [--normalize_global_descriptor BOOL]
- [--arch ARCHITECTURE] [--pool POOLING] [--whitening BOOL]
-```
-*NOTE:* The checkpoint path must point to an `.h5` file.
-
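-For example, a concrete invocation could look like the following (a sketch;
-adjust the checkpoint path and the `--arch`/`--pool`/`--whitening` flags to
-match how the model was trained):
-
-```
-python3 model/export_CNN_global_model.py \
- --ckpt_path=DESTINATION_PATH/checkpoint_epoch_100.h5 \
- --export_path=gldv2_global_model \
- --input_scales_list=0.70710677,1.0,1.4142135 \
- --multi_scale_pool_type=sum \
- --normalize_global_descriptor=True \
- --arch='ResNet101' --pool='gem' --whitening=True
-```
-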
-## Testing the trained model
-After the trained model has been exported, it can be used to extract global
-features similarly as for the DELG model. Please follow
-[these instructions](https://github.com/tensorflow/models/tree/master/research/delf/delf/python/training#testing-the-trained-model).
-
-After training with the standard setup for 100 epochs, the following results
-are obtained on the roxford5k and rparis6k datasets under single-scale
-evaluation:
-```
->> roxford5k: mAP E: 74.88, M: 58.28, H: 30.4
->> roxford5k: mP@k[1, 5, 10] E: [89.71 84.8 79.07],
- M: [91.43 84.67 78.24],
- H: [68.57 53.29 43.29]
-
->> rparis6k: mAP E: 89.21, M: 73.69, H: 49.1
->> rparis6k: mP@k[1, 5, 10] E: [98.57 97.43 95.57],
- M: [98.57 99.14 98.14],
- H: [94.29 90. 87.29]
-```
\ No newline at end of file
diff --git a/research/delf/delf/python/training/global_features/__init__.py b/research/delf/delf/python/training/global_features/__init__.py
deleted file mode 100644
index cd947f3f090..00000000000
--- a/research/delf/delf/python/training/global_features/__init__.py
+++ /dev/null
@@ -1,19 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Global model training."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
diff --git a/research/delf/delf/python/training/global_features/train.py b/research/delf/delf/python/training/global_features/train.py
deleted file mode 100644
index 8558594f62d..00000000000
--- a/research/delf/delf/python/training/global_features/train.py
+++ /dev/null
@@ -1,362 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Training script for Global Features model."""
-
-import math
-import os
-
-from absl import app
-from absl import flags
-from absl import logging
-import numpy as np
-import tensorflow as tf
-import tensorflow_addons as tfa
-
-from delf.python.datasets.sfm120k import dataset_download
-from delf.python.datasets.sfm120k import sfm120k
-from delf.python.training import global_features_utils
-from delf.python.training import tensorboard_utils
-from delf.python.training.global_features import train_utils
-from delf.python.training.losses import ranking_losses
-from delf.python.training.model import global_model
-
-_LOSS_NAMES = ['contrastive', 'triplet']
-_MODEL_NAMES = global_features_utils.get_standard_keras_models()
-_OPTIMIZER_NAMES = ['sgd', 'adam']
-_POOL_NAMES = ['mac', 'spoc', 'gem']
-_PRECOMPUTE_WHITEN_NAMES = ['retrieval-SfM-30k', 'retrieval-SfM-120k']
-_TEST_DATASET_NAMES = ['roxford5k', 'rparis6k']
-_TRAINING_DATASET_NAMES = ['retrieval-SfM-120k']
-_VALIDATION_TYPES = ['standard', 'eccv2020']
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_boolean('debug', False, 'Debug mode.')
-
-# Export directory, training and val datasets, test datasets.
-flags.DEFINE_string('data_root', "data",
- 'Absolute path to the folder containing training data.')
-flags.DEFINE_string('directory', "data",
- 'Destination where trained network should be saved.')
-flags.DEFINE_enum('training_dataset', 'retrieval-SfM-120k',
- _TRAINING_DATASET_NAMES, 'Training dataset: ' +
- ' | '.join(_TRAINING_DATASET_NAMES) + '.')
-flags.DEFINE_enum('validation_type', None, _VALIDATION_TYPES,
- 'Type of the evaluation to use. Either `None`, `standard` '
- 'or `eccv2020`.')
-flags.DEFINE_list('test_datasets', 'roxford5k,rparis6k',
- 'Comma separated list of test datasets: ' +
- ' | '.join(_TEST_DATASET_NAMES) + '.')
-flags.DEFINE_enum('precompute_whitening', None, _PRECOMPUTE_WHITEN_NAMES,
- 'Dataset used to learn whitening: ' +
- ' | '.join(_PRECOMPUTE_WHITEN_NAMES) + '.')
-flags.DEFINE_integer('test_freq', 5,
- 'Run test evaluation every N epochs.')
-flags.DEFINE_list('multiscale', [1.],
- 'Use multiscale vectors for testing, ' +
- 'examples: 1 | 1,1/2**(1/2),1/2 | 1,2**(1/2),1/2**(1/2). '
- 'Pass as a string of comma separated values.')
-
-# Network architecture and initialization options.
-flags.DEFINE_enum('arch', 'ResNet101', _MODEL_NAMES,
- 'Model architecture: ' + ' | '.join(_MODEL_NAMES) + '.')
-flags.DEFINE_enum('pool', 'gem', _POOL_NAMES,
- 'Pooling options: ' + ' | '.join(_POOL_NAMES) + '.')
-flags.DEFINE_bool('whitening', False,
- 'Whether to train model with learnable whitening ('
- 'linear layer) after the pooling.')
-flags.DEFINE_bool('pretrained', True,
- 'Whether to initialize the model with weights pretrained on '
- 'imagenet (if False, random weights are used).')
-flags.DEFINE_enum('loss', 'contrastive', _LOSS_NAMES,
- 'Training loss options: ' + ' | '.join(_LOSS_NAMES) + '.')
-flags.DEFINE_float('loss_margin', 0.7, 'Loss margin.')
-
-# train/val options specific for image retrieval learning.
-flags.DEFINE_integer('image_size', 1024,
- 'Maximum size of longer image side used for training.')
-flags.DEFINE_integer('neg_num', 5, 'Number of negative images per train/val '
- 'tuple.')
-flags.DEFINE_integer('query_size', 2000,
- 'Number of queries randomly drawn per one training epoch.')
-flags.DEFINE_integer('pool_size', 20000,
- 'Size of the pool for hard negative mining.')
-
-# Standard training/validation options.
-flags.DEFINE_string('gpu_id', '0', 'GPU id used for training.')
-flags.DEFINE_integer('epochs', 100, 'Number of total epochs to run.')
-flags.DEFINE_integer('batch_size', 5,
- 'Number of (q,p,n1,...,nN) tuples in a mini-batch.')
-flags.DEFINE_integer('update_every', 1,
- 'Update model weights every N batches, used to handle '
- 'relatively large batches; batch_size effectively '
- 'becomes update_every x batch_size.')
-flags.DEFINE_enum('optimizer', 'adam', _OPTIMIZER_NAMES,
- 'Optimizer options: ' + ' | '.join(_OPTIMIZER_NAMES) + '.')
-flags.DEFINE_float('lr', 1e-6, 'Initial learning rate.')
-flags.DEFINE_float('momentum', 0.9, 'Momentum.')
-flags.DEFINE_float('weight_decay', 1e-6, 'Weight decay.')
-flags.DEFINE_bool('resume', False,
- 'Whether to start from the latest checkpoint in the logdir.')
-flags.DEFINE_bool('launch_tensorboard', False, 'Whether to launch tensorboard.')
-
-
-def main(argv):
- if len(argv) > 1:
- raise RuntimeError('Too many command-line arguments.')
-
- # Manually check if there are unknown test datasets and if the dataset
- # ground truth files are downloaded.
- for dataset in FLAGS.test_datasets:
- if dataset not in _TEST_DATASET_NAMES:
- raise ValueError('Unsupported or unknown test dataset: {}.'.format(
- dataset))
-
- test_data_config = os.path.join(FLAGS.data_root,
- 'gnd_{}.pkl'.format(dataset))
- if not tf.io.gfile.exists(test_data_config):
- raise ValueError(
- '{} ground truth file at {} not found. Please download it '
- 'according to '
- 'the DELG instructions.'.format(dataset, FLAGS.data_root))
-
- # Check if train dataset is downloaded and download it if not found.
- dataset_download.download_train(FLAGS.data_root)
-
- # Creating model export directory if it does not exist.
- model_directory = global_features_utils.create_model_directory(
- FLAGS.training_dataset, FLAGS.arch, FLAGS.pool, FLAGS.whitening,
- FLAGS.pretrained, FLAGS.loss, FLAGS.loss_margin, FLAGS.optimizer,
- FLAGS.lr, FLAGS.weight_decay, FLAGS.neg_num, FLAGS.query_size,
- FLAGS.pool_size, FLAGS.batch_size, FLAGS.update_every,
- FLAGS.image_size, FLAGS.directory)
-
- # Setting up logging directory, same as where the model is stored.
- logging.get_absl_handler().use_absl_log_file('absl_logging', model_directory)
-
- # Set cuda visible device.
- os.environ['CUDA_VISIBLE_DEVICES'] = FLAGS.gpu_id
- global_features_utils.debug_and_log('>> Num GPUs Available: {}'.format(
- len(tf.config.experimental.list_physical_devices('GPU'))),
- FLAGS.debug)
-
- # Set random seeds.
- tf.random.set_seed(0)
- np.random.seed(0)
-
- # Initialize the model.
- if FLAGS.pretrained:
- global_features_utils.debug_and_log(
- '>> Using pre-trained model \'{}\''.format(FLAGS.arch))
- else:
- global_features_utils.debug_and_log(
- '>> Using model from scratch (random weights) \'{}\'.'.format(
- FLAGS.arch))
-
- model_params = {'architecture': FLAGS.arch, 'pooling': FLAGS.pool,
- 'whitening': FLAGS.whitening, 'pretrained': FLAGS.pretrained,
- 'data_root': FLAGS.data_root}
- model = global_model.GlobalFeatureNet(**model_params)
-
- # Freeze running mean and std in batch normalization layers.
- # We do training one image at a time to reduce the memory requirements of
- # the network; therefore, the computed statistics would not be per
- # batch. Instead, we choose freezing - setting the parameters of all
- # batch norm layers in the network to non-trainable (i.e., using original
- # imagenet statistics).
- for layer in model.feature_extractor.layers:
- if isinstance(layer, tf.keras.layers.BatchNormalization):
- layer.trainable = False
-
- global_features_utils.debug_and_log('>> Network initialized.')
-
- global_features_utils.debug_and_log('>> Loss: {}.'.format(FLAGS.loss))
- # Define the loss function.
- if FLAGS.loss == 'contrastive':
- criterion = ranking_losses.ContrastiveLoss(margin=FLAGS.loss_margin)
- elif FLAGS.loss == 'triplet':
- criterion = ranking_losses.TripletLoss(margin=FLAGS.loss_margin)
- else:
- raise ValueError('Loss {} not available.'.format(FLAGS.loss))
-
- # Defining parameters for the training.
- # When pre-computing whitening, we run evaluation before the network training
- # and log it as epoch 0. Actual training always starts from epoch 1.
- start_epoch = 1
- exp_decay = math.exp(-0.01)
- decay_steps = FLAGS.query_size / FLAGS.batch_size
-
- # Define learning rate decay schedule.
- lr_scheduler = tf.keras.optimizers.schedules.ExponentialDecay(
- initial_learning_rate=FLAGS.lr,
- decay_steps=decay_steps,
- decay_rate=exp_decay)
-
- # Define the optimizer.
- if FLAGS.optimizer == 'sgd':
- opt = tfa.optimizers.extend_with_decoupled_weight_decay(
- tf.keras.optimizers.SGD)
- optimizer = opt(weight_decay=FLAGS.weight_decay,
- learning_rate=lr_scheduler, momentum=FLAGS.momentum)
- elif FLAGS.optimizer == 'adam':
- opt = tfa.optimizers.extend_with_decoupled_weight_decay(
- tf.keras.optimizers.Adam)
- optimizer = opt(weight_decay=FLAGS.weight_decay, learning_rate=lr_scheduler)
- else:
- raise ValueError('Optimizer {} not available.'.format(FLAGS.optimizer))
-
- # Initializing logging.
- writer = tf.summary.create_file_writer(model_directory)
- tf.summary.experimental.set_step(1)
-
- # Setting up the checkpoint manager.
- checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
- manager = tf.train.CheckpointManager(
- checkpoint,
- model_directory,
- max_to_keep=10,
- keep_checkpoint_every_n_hours=3)
- if FLAGS.resume:
- # Restores the checkpoint, if existing.
- global_features_utils.debug_and_log('>> Continuing from a checkpoint.')
- checkpoint.restore(manager.latest_checkpoint)
-
- # Launching tensorboard if required.
- if FLAGS.launch_tensorboard:
- tensorboard = tf.keras.callbacks.TensorBoard(model_directory)
- tensorboard.set_model(model=model)
- tensorboard_utils.launch_tensorboard(log_dir=model_directory)
-
- # Log flags used.
- global_features_utils.debug_and_log('>> Running training script with:')
- global_features_utils.debug_and_log('>> logdir = {}'.format(model_directory))
-
- if FLAGS.training_dataset.startswith('retrieval-SfM-120k'):
- train_dataset = sfm120k.CreateDataset(
- data_root=FLAGS.data_root,
- mode='train',
- imsize=FLAGS.image_size,
- num_negatives=FLAGS.neg_num,
- num_queries=FLAGS.query_size,
- pool_size=FLAGS.pool_size
- )
- if FLAGS.validation_type is not None:
- val_dataset = sfm120k.CreateDataset(
- data_root=FLAGS.data_root,
- mode='val',
- imsize=FLAGS.image_size,
- num_negatives=FLAGS.neg_num,
- num_queries=float('Inf'),
- pool_size=float('Inf'),
- eccv2020=True if FLAGS.validation_type == 'eccv2020' else False
- )
-
- train_dataset_output_types = [tf.float32 for i in range(2 + FLAGS.neg_num)]
- train_dataset_output_types.append(tf.int32)
-
- global_features_utils.debug_and_log(
- '>> Training the {} network'.format(model_directory))
- global_features_utils.debug_and_log('>> GPU ids: {}'.format(FLAGS.gpu_id))
-
- with writer.as_default():
-
- # Precompute whitening if needed.
- if FLAGS.precompute_whitening is not None:
- epoch = 0
- train_utils.test_retrieval(
- FLAGS.test_datasets, model, writer=writer,
- epoch=epoch, model_directory=model_directory,
- precompute_whitening=FLAGS.precompute_whitening,
- data_root=FLAGS.data_root,
- multiscale=FLAGS.multiscale)
-
- for epoch in range(start_epoch, FLAGS.epochs + 1):
- # Set manual seeds per epoch.
- np.random.seed(epoch)
- tf.random.set_seed(epoch)
-
- # Find hard-negatives.
- # While hard-positive examples are fixed during the whole training
- # process and are randomly chosen for every epoch, hard-negatives
- # depend on the current CNN parameters and are re-mined once per epoch.
- avg_neg_distance = train_dataset.create_epoch_tuples(model)
-
- def _train_gen():
- return (inst for inst in train_dataset)
-
- train_loader = tf.data.Dataset.from_generator(
- _train_gen,
- output_types=tuple(train_dataset_output_types))
-
- loss = train_utils.train_val_one_epoch(
- loader=iter(train_loader), model=model,
- criterion=criterion, optimizer=optimizer, epoch=epoch,
- batch_size=FLAGS.batch_size, query_size=FLAGS.query_size,
- neg_num=FLAGS.neg_num, update_every=FLAGS.update_every,
- debug=FLAGS.debug)
-
- # Write a scalar summary.
- tf.summary.scalar('train_epoch_loss', loss, step=epoch)
- # Forces summary writer to send any buffered data to storage.
- writer.flush()
-
- # Evaluate on validation set.
- if FLAGS.validation_type is not None and (epoch % FLAGS.test_freq == 0 or
- epoch == 1):
- avg_neg_distance = val_dataset.create_epoch_tuples(model,
- model_directory)
-
- def _val_gen():
- return (inst for inst in val_dataset)
-
- val_loader = tf.data.Dataset.from_generator(
- _val_gen, output_types=tuple(train_dataset_output_types))
-
- loss = train_utils.train_val_one_epoch(
- loader=iter(val_loader), model=model,
- criterion=criterion, optimizer=None,
- epoch=epoch, train=False, batch_size=FLAGS.batch_size,
- query_size=FLAGS.query_size, neg_num=FLAGS.neg_num,
- update_every=FLAGS.update_every, debug=FLAGS.debug)
- tf.summary.scalar('val_epoch_loss', loss, step=epoch)
- writer.flush()
-
- # Evaluate on test datasets every test_freq epochs.
- if epoch == 1 or epoch % FLAGS.test_freq == 0:
- train_utils.test_retrieval(
- FLAGS.test_datasets, model, writer=writer, epoch=epoch,
- model_directory=model_directory,
- precompute_whitening=FLAGS.precompute_whitening,
- data_root=FLAGS.data_root, multiscale=FLAGS.multiscale)
-
- # Saving checkpoints and model weights.
- try:
- save_path = manager.save(checkpoint_number=epoch)
- global_features_utils.debug_and_log(
- 'Saved ({}) at {}'.format(epoch, save_path))
-
- filename = os.path.join(model_directory,
- 'checkpoint_epoch_{}.h5'.format(epoch))
- model.save_weights(filename, save_format='h5')
- global_features_utils.debug_and_log(
- 'Saved weights ({}) at {}'.format(epoch, filename))
- except Exception as ex:
- global_features_utils.debug_and_log(
- 'Could not save checkpoint: {}'.format(ex))
-
-
-if __name__ == '__main__':
- app.run(main)
diff --git a/research/delf/delf/python/training/global_features/train_utils.py b/research/delf/delf/python/training/global_features/train_utils.py
deleted file mode 100644
index 4eb5f80349a..00000000000
--- a/research/delf/delf/python/training/global_features/train_utils.py
+++ /dev/null
@@ -1,382 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Training utilities for Global Features model."""
-
-import os
-import pickle
-import time
-
-import numpy as np
-import tensorflow as tf
-
-from delf.python import whiten
-from delf.python.datasets.revisited_op import dataset as test_dataset
-from delf.python.datasets.sfm120k import sfm120k
-from delf.python.training import global_features_utils
-from delf.python.training.model import global_model
-
-
-def _compute_loss_and_gradient(criterion, model, input, target, neg_num=5):
- """Records gradients and loss through the network.
-
- Args:
- criterion: Loss function.
- model: Network for the gradient computation.
- input: Tuple of query, positive and negative images.
- target: List of indexes to specify queries (-1), positives(1), negatives(0).
- neg_num: Integer, number of negatives per tuple.
-
- Returns:
- loss: Loss for the training step.
- gradients: Computed gradients for the network trainable variables.
- """
- # Record gradients and loss through the network.
- with tf.GradientTape() as tape:
- descriptors = tf.zeros(shape=(0, model.meta['outputdim']), dtype=tf.float32)
- for img in input:
- # Compute descriptor vector for each image.
- o = model(tf.expand_dims(img, axis=0), training=True)
- descriptors = tf.concat([descriptors, o], 0)
-
- queries = descriptors[target == -1]
- positives = descriptors[target == 1]
- negatives = descriptors[target == 0]
-
- negatives = tf.reshape(negatives, [tf.shape(queries)[0], neg_num,
- model.meta['outputdim']])
- # Loss calculation.
- loss = criterion(queries, positives, negatives)
-
- return loss, tape.gradient(loss, model.trainable_variables)
-
-
-def train_val_one_epoch(
- loader, model, criterion, optimizer, epoch, train=True, batch_size=5,
- query_size=2000, neg_num=5, update_every=1, debug=False):
- """Executes either training or validation step based on `train` value.
-
- Args:
- loader: Training/validation iterable dataset.
- model: Network to train/validate.
- criterion: Loss function.
- optimizer: Network optimizer.
- epoch: Integer, epoch number.
- train: Bool, specifies training or validation phase.
- batch_size: Integer, number of (q,p,n1,...,nN) tuples in a mini-batch.
- query_size: Integer, number of queries randomly drawn per one training
- epoch.
- neg_num: Integer, number of negatives per tuple.
- update_every: Integer, update model weights every N batches, used to
- handle relatively large batches; batch_size effectively becomes
- update_every x batch_size.
- debug: Bool, whether debug mode is used.
-
- Returns:
- average_epoch_loss: Average epoch loss.
- """
- batch_time = global_features_utils.AverageMeter()
- data_time = global_features_utils.AverageMeter()
- losses = global_features_utils.AverageMeter()
-
- # Retrieve all trainable variables we defined in the graph.
- tvs = model.trainable_variables
- accum_grads = [tf.zeros_like(tv.read_value()) for tv in tvs]
-
- end = time.time()
- batch_num = 0
- print_frequency = 10
- all_batch_num = query_size // batch_size
- state = 'Train' if train else 'Val'
- global_features_utils.debug_and_log('>> {} step:'.format(state))
-
- # For every batch in the dataset; Stops when all batches in the dataset have
- # been processed.
- while True:
- data_time.update(time.time() - end)
-
- if train:
- try:
- # Train on one batch.
- # Each image in the batch is loaded into memory consecutively.
- for _ in range(batch_size):
- # Because the images are not necessarily of the same size, we can't
- # set the batch size with .batch().
- batch = loader.get_next()
- input_tuple = batch[0:-1]
- target_tuple = batch[-1]
-
- loss_value, grads = _compute_loss_and_gradient(
- criterion, model, input_tuple, target_tuple, neg_num)
- losses.update(loss_value)
- # Accumulate gradients element-wise (a plain list `+=` would only
- # concatenate the lists).
- accum_grads = [ag + g for ag, g in zip(accum_grads, grads)]
-
- # Perform weight update if required.
- if (batch_num + 1) % update_every == 0 or (
- batch_num + 1) == all_batch_num:
- # Do one step for multiple batches. Accumulated gradients are
- # used.
- optimizer.apply_gradients(
- zip(accum_grads, model.trainable_variables))
- accum_grads = [tf.zeros_like(tv.read_value()) for tv in tvs]
- # We break when we run out of range, i.e., we exhausted all dataset
- # images.
- except tf.errors.OutOfRangeError:
- break
-
- else:
- # Validate one batch.
- # We load full batch into memory.
- input = []
- target = []
- try:
- for _ in range(batch_size):
- # Because the images are not necessarily of the same size, we can't
- # set the batch size with .batch().
- batch = loader.get_next()
- input.append(batch[0:-1])
- target.append(batch[-1])
- # We break when we run out of range, i.e., we exhausted all dataset
- # images.
- except tf.errors.OutOfRangeError:
- break
-
- descriptors = tf.zeros(shape=(0, model.meta['outputdim']),
- dtype=tf.float32)
-
- for input_tuple in input:
- for img in input_tuple:
- # Compute the global descriptor vector.
- model_out = model(tf.expand_dims(img, axis=0), training=False)
- descriptors = tf.concat([descriptors, model_out], 0)
-
- # No need to reduce memory consumption (no backward pass):
- # Compute loss for the full batch.
- queries = descriptors[target == -1]
- positives = descriptors[target == 1]
- negatives = descriptors[target == 0]
- negatives = tf.reshape(negatives, [tf.shape(queries)[0], neg_num,
- model.meta['outputdim']])
- loss = criterion(queries, positives, negatives)
-
- # Record loss.
- losses.update(loss / batch_size, batch_size)
-
- # Measure elapsed time.
- batch_time.update(time.time() - end)
- end = time.time()
-
- # Record immediate loss and elapsed time.
- if debug and ((batch_num + 1) % print_frequency == 0 or
- batch_num == 0 or (batch_num + 1) == all_batch_num):
- global_features_utils.debug_and_log(
- '>> {0}: [{1} epoch][{2}/{3} batch]\t'
- ' Time val: {batch_time.val:.3f} (Batch Time avg: {batch_time.avg:.3f})\t'
- ' Data {data_time.val:.3f} (Time avg: {data_time.avg:.3f})\t'
- ' Immediate loss value: {loss.val:.4f} (Loss avg: {loss.avg:.4f})'.format(
- state, epoch, batch_num + 1, all_batch_num,
- batch_time=batch_time,
- data_time=data_time, loss=losses), debug=True, log=False)
- batch_num += 1
-
- return losses.avg
-
-
-def test_retrieval(datasets, net, epoch, writer=None, model_directory=None,
- precompute_whitening=None, data_root='data', multiscale=[1.],
- test_image_size=1024):
- """Testing step.
-
- Evaluates the network on the provided test datasets by computing mAP for the
- easy/medium/hard cases. If `writer` is specified, saves the mAP
- values in a tensorboard supported format.
-
- Args:
- datasets: List of dataset names for model testing (from
- `_TEST_DATASET_NAMES`).
- net: Network to evaluate.
- epoch: Integer, epoch number.
- writer: Tensorboard writer.
- model_directory: String, path to the model directory.
- precompute_whitening: Dataset used to learn whitening. If no
- precomputation required, then `None`. Only 'retrieval-SfM-30k' and
- 'retrieval-SfM-120k' datasets are supported for whitening pre-computation.
- data_root: Absolute path to the data folder.
- multiscale: List of scales for multiscale testing.
- test_image_size: Integer, maximum size of the test images.
- """
- global_features_utils.debug_and_log(">> Testing step:")
- global_features_utils.debug_and_log(
- '>> Evaluating network on test datasets...')
-
- # Precompute whitening.
- if precompute_whitening is not None:
-
- # If whitening already precomputed, load it and skip the computations.
- filename = os.path.join(
- model_directory, 'learned_whitening_mP_{}_epoch.pkl'.format(epoch))
- filename_layer = os.path.join(
- model_directory,
- 'learned_whitening_layer_config_{}_epoch.pkl'.format(
- epoch))
-
- if tf.io.gfile.exists(filename):
- global_features_utils.debug_and_log(
- '>> {}: Whitening for this epoch is already precomputed. '
- 'Loading...'.format(precompute_whitening))
- with tf.io.gfile.GFile(filename, 'rb') as learned_whitening_file:
- learned_whitening = pickle.load(learned_whitening_file)
-
- else:
- start = time.time()
- global_features_utils.debug_and_log(
- '>> {}: Learning whitening...'.format(precompute_whitening))
-
- # Loading db.
- db_root = os.path.join(data_root, 'train', precompute_whitening)
- ims_root = os.path.join(db_root, 'ims')
- db_filename = os.path.join(db_root,
- '{}-whiten.pkl'.format(precompute_whitening))
- with tf.io.gfile.GFile(db_filename, 'rb') as f:
- db = pickle.load(f)
- images = [sfm120k.id2filename(db['cids'][i], ims_root) for i in
- range(len(db['cids']))]
-
- # Extract whitening vectors.
- global_features_utils.debug_and_log(
- '>> {}: Extracting...'.format(precompute_whitening))
- wvecs = global_model.extract_global_descriptors_from_list(net, images,
- test_image_size)
-
- # Learning whitening.
- global_features_utils.debug_and_log(
- '>> {}: Learning...'.format(precompute_whitening))
- wvecs = wvecs.numpy()
- mean_vector, projection_matrix = whiten.whitenlearn(wvecs, db['qidxs'],
- db['pidxs'])
- learned_whitening = {'m': mean_vector, 'P': projection_matrix}
-
- global_features_utils.debug_and_log(
- '>> {}: Elapsed time: {}'.format(precompute_whitening,
- global_features_utils.htime(
- time.time() - start)))
- # Save learned_whitening parameters for a later use.
- with tf.io.gfile.GFile(filename, 'wb') as learned_whitening_file:
- pickle.dump(learned_whitening, learned_whitening_file)
-
- # Saving whitening as a layer.
- bias = -np.dot(mean_vector.T, projection_matrix.T)
- whitening_layer = tf.keras.layers.Dense(
- net.meta['outputdim'],
- activation=None,
- use_bias=True,
- kernel_initializer=tf.keras.initializers.Constant(
- projection_matrix.T),
- bias_initializer=tf.keras.initializers.Constant(bias)
- )
- with tf.io.gfile.GFile(filename_layer, 'wb') as learned_whitening_file:
- pickle.dump(whitening_layer.get_config(), learned_whitening_file)
- else:
- learned_whitening = None
-
- # Evaluate on test datasets.
- for dataset in datasets:
- start = time.time()
-
- # Prepare config structure for the test dataset.
- cfg = test_dataset.CreateConfigForTestDataset(dataset,
- os.path.join(data_root))
- images = [cfg['im_fname'](cfg, i) for i in range(cfg['n'])]
- qimages = [cfg['qim_fname'](cfg, i) for i in range(cfg['nq'])]
- bounding_boxes = [tuple(cfg['gnd'][i]['bbx']) for i in range(cfg['nq'])]
-
- # Extract database and query vectors.
- global_features_utils.debug_and_log(
- '>> {}: Extracting database images...'.format(dataset))
- vecs = global_model.extract_global_descriptors_from_list(
- net, images, test_image_size, scales=multiscale)
- global_features_utils.debug_and_log(
- '>> {}: Extracting query images...'.format(dataset))
- qvecs = global_model.extract_global_descriptors_from_list(
- net, qimages, test_image_size, bounding_boxes,
- scales=multiscale)
-
- global_features_utils.debug_and_log('>> {}: Evaluating...'.format(dataset))
-
- # Convert the obtained descriptors to numpy.
- vecs = vecs.numpy()
- qvecs = qvecs.numpy()
-
- # Search, rank and print test set metrics.
- _calculate_metrics_and_export_to_tensorboard(vecs, qvecs, dataset, cfg,
- writer, epoch, whiten=False)
-
- if learned_whitening is not None:
- # Whiten the vectors.
- mean_vector = learned_whitening['m']
- projection_matrix = learned_whitening['P']
- vecs_lw = whiten.whitenapply(vecs, mean_vector, projection_matrix)
- qvecs_lw = whiten.whitenapply(qvecs, mean_vector, projection_matrix)
-
- # Search, rank, and print.
- _calculate_metrics_and_export_to_tensorboard(
- vecs_lw, qvecs_lw, dataset, cfg, writer, epoch, whiten=True)
-
- global_features_utils.debug_and_log(
- '>> {}: Elapsed time: {}'.format(
- dataset, global_features_utils.htime(time.time() - start)))
-
-
-def _calculate_metrics_and_export_to_tensorboard(vecs, qvecs, dataset, cfg,
- writer, epoch, whiten=False):
- """
- Calculates metrics and exports them to tensorboard.
-
- Args:
- vecs: Numpy array dataset global descriptors.
- qvecs: Numpy array query global descriptors.
- dataset: String, one of `_TEST_DATASET_NAMES`.
- cfg: Dataset configuration.
- writer: Tensorboard writer.
- epoch: Integer, epoch number.
- whiten: Boolean, whether the metrics are computed with whitening used as a
- post-processing step. Affects the name of the exported TensorBoard
- metrics.
- """
- # Search, rank and print test set metrics.
- scores = np.dot(vecs.T, qvecs)
- ranks = np.transpose(np.argsort(-scores, axis=0))
-
- metrics = global_features_utils.compute_metrics_and_print(dataset, ranks,
- cfg['gnd'])
- # Save calculated metrics in a tensorboard format.
- if writer:
- if whiten:
- metric_names = ['test_accuracy_whiten_{}_E'.format(dataset),
- 'test_accuracy_whiten_{}_M'.format(dataset),
- 'test_accuracy_whiten_{}_H'.format(dataset)]
- else:
- metric_names = ['test_accuracy_{}_E'.format(dataset),
- 'test_accuracy_{}_M'.format(dataset),
- 'test_accuracy_{}_H'.format(dataset)]
- tf.summary.scalar(metric_names[0], metrics[0][0], step=epoch)
- tf.summary.scalar(metric_names[1], metrics[1][0], step=epoch)
- tf.summary.scalar(metric_names[2], metrics[2][0], step=epoch)
- writer.flush()
- return None
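
The deleted `train_val_one_epoch` above accumulates gradients over `update_every` mini-batches before applying a single optimizer step. A minimal, self-contained sketch of that pattern follows; the generic `model`, `optimizer`, `loss_fn`, and `batches` names are illustrative assumptions, not identifiers from the original code.

```python
import tensorflow as tf

def accumulate_and_apply(model, optimizer, loss_fn, batches, update_every=4):
  """Applies one optimizer step per `update_every` mini-batches."""
  tvs = model.trainable_variables
  accum_grads = [tf.zeros_like(tv) for tv in tvs]
  for step, (x, y) in enumerate(batches):
    with tf.GradientTape() as tape:
      loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, tvs)
    # Element-wise accumulation; the effective batch size grows by update_every.
    accum_grads = [ag + g for ag, g in zip(accum_grads, grads)]
    if (step + 1) % update_every == 0:
      optimizer.apply_gradients(zip(accum_grads, tvs))
      accum_grads = [tf.zeros_like(tv) for tv in tvs]
```
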
diff --git a/research/delf/delf/python/training/global_features_utils.py b/research/delf/delf/python/training/global_features_utils.py
deleted file mode 100644
index 273dabc46de..00000000000
--- a/research/delf/delf/python/training/global_features_utils.py
+++ /dev/null
@@ -1,221 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Utilities for the global model training."""
-
-import os
-
-from absl import logging
-
-import numpy as np
-import tensorflow as tf
-
-from delf.python.datasets.revisited_op import dataset as revisited_dataset
-
-
-class AverageMeter():
- """Computes and stores the average and current value of loss."""
-
- def __init__(self):
- """Initialization of the AverageMeter."""
- self.reset()
-
- def reset(self):
- """Resets all the values."""
- self.val = 0
- self.avg = 0
- self.sum = 0
- self.count = 0
-
- def update(self, val, n=1):
- """Updates values in the AverageMeter.
- Args:
- val: Float, loss value.
- n: Integer, number of instances.
- """
- self.val = val
- self.sum += val * n
- self.count += n
- self.avg = self.sum / self.count
-
-
-def compute_metrics_and_print(dataset_name,
- sorted_index_ids,
- ground_truth,
- desired_pr_ranks=None,
- log=True):
- """Computes and logs ground-truth metrics for Revisited datasets.
- Args:
- dataset_name: String, name of the dataset.
- sorted_index_ids: Integer NumPy array of shape [#queries, #index_images].
- For each query, contains an array denoting the most relevant index images,
- sorted from most to least relevant.
- ground_truth: List containing ground-truth information for dataset. Each
- entry is a dict corresponding to the ground-truth information for a query.
- The dict has keys 'ok' and 'junk', mapping to a NumPy array of integers.
- desired_pr_ranks: List of integers containing the desired precision/recall
- ranks to be reported. E.g., if precision@1/recall@1 and
- precision@10/recall@10 are desired, this should be set to [1, 10]. The
- largest item should be <= #sorted_index_ids. Default: [1, 5, 10].
- log: Whether to log results using logging.info().
- Returns:
- mAP: (metricsE, metricsM, metricsH) Tuple of the metrics for different
- levels of complexity. Each metrics is a list containing:
- mean_average_precision (float), mean_precisions (NumPy array of
- floats, with shape [len(desired_pr_ranks)]), mean_recalls (NumPy array
- of floats, with shape [len(desired_pr_ranks)]), average_precisions
- (NumPy array of floats, with shape [#queries]), precisions (NumPy array of
- floats, with shape [#queries, len(desired_pr_ranks)]), recalls (NumPy
- array of floats, with shape [#queries, len(desired_pr_ranks)]).
- Raises:
- ValueError: If an unknown dataset name is provided as an argument.
- """
- if dataset_name not in revisited_dataset.DATASET_NAMES:
- raise ValueError('Unknown dataset: {}!'.format(dataset_name))
-
- if desired_pr_ranks is None:
- desired_pr_ranks = [1, 5, 10]
-
- (easy_ground_truth, medium_ground_truth,
- hard_ground_truth) = revisited_dataset.ParseEasyMediumHardGroundTruth(
- ground_truth)
-
- metrics_easy = revisited_dataset.ComputeMetrics(sorted_index_ids,
- easy_ground_truth,
- desired_pr_ranks)
- metrics_medium = revisited_dataset.ComputeMetrics(sorted_index_ids,
- medium_ground_truth,
- desired_pr_ranks)
- metrics_hard = revisited_dataset.ComputeMetrics(sorted_index_ids,
- hard_ground_truth,
- desired_pr_ranks)
-
- debug_and_log(
- '>> {}: mAP E: {}, M: {}, H: {}'.format(
- dataset_name, np.around(metrics_easy[0] * 100, decimals=2),
- np.around(metrics_medium[0] * 100, decimals=2),
- np.around(metrics_hard[0] * 100, decimals=2)),
- log=log)
-
- debug_and_log(
- '>> {}: mP@k{} E: {}, M: {}, H: {}'.format(
- dataset_name, desired_pr_ranks,
- np.around(metrics_easy[1] * 100, decimals=2),
- np.around(metrics_medium[1] * 100, decimals=2),
- np.around(metrics_hard[1] * 100, decimals=2)),
- log=log)
-
- return metrics_easy, metrics_medium, metrics_hard
-
-
-def htime(time_difference):
- """Time formatting function.
- Depending on the value of `time_difference`, outputs the time in an
- appropriate format.
- Args:
- time_difference: Float, time difference between the two events.
- Returns:
- time: String representing time in an appropriate time format.
- """
- time_difference = round(time_difference)
-
- days = time_difference // 86400
- hours = time_difference // 3600 % 24
- minutes = time_difference // 60 % 60
- seconds = time_difference % 60
-
- if days > 0:
- return '{:d}d {:d}h {:d}m {:d}s'.format(days, hours, minutes, seconds)
- if hours > 0:
- return '{:d}h {:d}m {:d}s'.format(hours, minutes, seconds)
- if minutes > 0:
- return '{:d}m {:d}s'.format(minutes, seconds)
- return '{:d}s'.format(seconds)
-
-
-def debug_and_log(msg, debug=True, log=True, debug_on_the_same_line=False):
- """Outputs `msg` to both stdout (if in the debug mode) and the log file.
- Args:
- msg: String, message to be logged.
- debug: Bool, if True, will print `msg` to stdout.
- log: Bool, if True, will redirect `msg` to the logfile.
- debug_on_the_same_line: Bool, if True, will print `msg` to stdout without a
- new line. When using this mode, logging to a logfile is disabled.
- """
- if debug_on_the_same_line:
- print(msg, end='')
- return
- if debug:
- print(msg)
- if log:
- logging.info(msg)
-
-
-def get_standard_keras_models():
- """Gets the standard keras model names.
- Returns:
- model_names: List, names of the standard keras models.
- """
- model_names = sorted(
- name for name in tf.keras.applications.__dict__
- if not name.startswith('__') and
- callable(tf.keras.applications.__dict__[name]))
- return model_names
-
-
-def create_model_directory(training_dataset, arch, pool, whitening, pretrained,
- loss, loss_margin, optimizer, lr, weight_decay,
- neg_num, query_size, pool_size, batch_size,
- update_every, image_size, directory):
- """Based on the model parameters, creates the model directory.
- If the model directory does not exist, the directory is created.
- Args:
- training_dataset: String, training dataset name.
- arch: String, model architecture.
- pool: String, pooling option.
- whitening: Bool, whether the model is trained with global whitening.
- pretrained: Bool, whether the model is initialized with the precomputed
- weights.
- loss: String, training loss type.
- loss_margin: Float, loss margin.
- optimizer: String, optimizer used.
- lr: Float, initial learning rate.
- weight_decay: Float, weight decay.
- neg_num: Integer, number of negative images per train/val tuple.
- query_size: Integer, number of queries per one training epoch.
- pool_size: Integer, size of the pool for hard negative mining.
- batch_size: Integer, batch size.
- update_every: Integer, frequency of the model weights update.
- image_size: Integer, maximum size of longer image side used for training.
- directory: String, destination where trained network should be saved.
- Returns:
- folder: String, path to the model folder.
- """
- folder = '{}_{}_{}'.format(training_dataset, arch, pool)
- if whitening:
- folder += '_whiten'
- if not pretrained:
- folder += '_notpretrained'
- folder += ('_{}_m{:.2f}_{}_lr{:.1e}_wd{:.1e}_nnum{}_qsize{}_psize{}_bsize{}'
- '_uevery{}_imsize{}').format(loss, loss_margin, optimizer, lr,
- weight_decay, neg_num, query_size,
- pool_size, batch_size, update_every,
- image_size)
-
- folder = os.path.join(directory, folder)
- debug_and_log(
- '>> Creating directory if it does not exist:\n>> \'{}\''.format(folder))
- if not os.path.exists(folder):
- os.makedirs(folder)
- return folder
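
A brief usage sketch of the utilities deleted above, assuming the module remains importable as `delf.python.training.global_features_utils` (values and import path are illustrative):

```python
import time
from delf.python.training import global_features_utils as gfu  # assumed import path

meter = gfu.AverageMeter()
start = time.time()
# Feed (loss, batch_size) pairs; `avg` tracks the running weighted mean.
for batch_loss, batch_size in [(0.9, 8), (0.7, 8), (0.5, 4)]:
  meter.update(batch_loss, n=batch_size)
gfu.debug_and_log('avg loss: {:.3f}, elapsed: {}'.format(
    meter.avg, gfu.htime(time.time() - start)))
```
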
diff --git a/research/delf/delf/python/training/install_delf.sh b/research/delf/delf/python/training/install_delf.sh
deleted file mode 100755
index 5e54bf8005c..00000000000
--- a/research/delf/delf/python/training/install_delf.sh
+++ /dev/null
@@ -1,154 +0,0 @@
-#!/bin/bash
-
-# Copyright 2020 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# This script installs the DELF package along with its dependencies. To install
-# the DELF package run the script like in the following example:
-# bash install_delf.sh
-
-protoc_folder="protoc"
-protoc_url="https://github.com/google/protobuf/releases/download/v3.3.0/protoc-3.3.0-linux-x86_64.zip"
-tf_slim_git_repo="https://github.com/google-research/tf-slim.git"
-
-handle_exit_code() {
- # Fail gracefully in case of an exit code different than 0.
- exit_code=$1
- error_message=$2
- if [ ${exit_code} -ne 0 ]; then
- echo "${error_message} Exiting."
- exit 1
- fi
-}
-
-install_tensorflow() {
- # Install TensorFlow 2.2.
- echo "Installing TensorFlow 2.2"
- pip3 install --upgrade tensorflow==2.2.0
- pip3 install tensorflow-addons==0.11.2
- local exit_code=$?
- handle_exit_code ${exit_code} "Unable to install Tensorflow 2.2."
- echo "Installing TensorFlow 2.2 for GPU"
- pip3 install --upgrade tensorflow-gpu==2.2.0
- local exit_code=$?
- handle_exit_code ${exit_code} "Unable to install Tensorflow for GPU 2.2.0."
-}
-
-install_tf_slim() {
- # Install TF-Slim from source.
- echo "Installing TF-Slim from source: ${git_repo}"
- git clone -b v1.1.0 ${tf_slim_git_repo}
- local exit_code=$?
- handle_exit_code ${exit_code} "Unable to clone TF-Slim repository ${tf_slim_git_repo}."
- pushd . > /dev/null
- cd tf-slim
- pip3 install .
- popd > /dev/null
- rm -rf tf-slim
-}
-
-download_protoc() {
- # Installs the Protobuf compiler protoc.
- echo "Downloading Protobuf compiler from ${protoc_url}"
- curl -L -Os ${protoc_url}
- local exit_code=$?
- handle_exit_code ${exit_code} "Unable to download Protobuf compiler from ${tf_slim_git_repo}."
-
- mkdir ${protoc_folder}
- local protoc_archive=`basename ${protoc_url}`
- unzip ${protoc_archive} -d ${protoc_folder}
- local exit_code=$?
- handle_exit_code ${exit_code} "Unable to unzip Protobuf compiler from ${protoc_archive}."
-
- rm ${protoc_archive}
-}
-
-compile_delf_protos() {
- # Compiles DELF protobufs from tensorflow/models/research/delf using the protoc compiler.
- echo "Compiling DELF Protobufs"
- PATH_TO_PROTOC="`pwd`/${protoc_folder}"
- pushd . > /dev/null
- cd ../../..
- ${PATH_TO_PROTOC}/bin/protoc delf/protos/*.proto --python_out=.
- local exit_code=$?
- handle_exit_code ${exit_code} "Unable to compile DELF Protobufs."
- popd > /dev/null
-}
-
-cleanup_protoc() {
- # Removes the downloaded Protobuf compiler protoc after the installation of the DELF package.
- echo "Cleaning up Protobuf compiler download"
- rm -rf ${protoc_folder}
-}
-
-install_python_libraries() {
- # Installs Python libraries upon which the DELF package has dependencies.
- echo "Installing matplotlib, numpy, scikit-image, scipy and python3-tk"
- pip3 install matplotlib numpy scikit-image scipy
- local exit_code=$?
- handle_exit_code ${exit_code} "Unable to install at least one of: matplotlib numpy scikit-image scipy."
- sudo apt-get -y install python3-tk
- local exit_code=$?
- handle_exit_code ${exit_code} "Unable to install python3-tk."
-}
-
-install_object_detection() {
- # Installs the object detection package from tensorflow/models/research.
- echo "Installing object detection"
- pushd . > /dev/null
- cd ../../../..
- export PYTHONPATH=$PYTHONPATH:`pwd`
- pip3 install .
- local exit_code=$?
- handle_exit_code ${exit_code} "Unable to install the object_detection package."
- popd > /dev/null
-}
-
-install_delf_package() {
- # Installs the DELF package from tensorflow/models/research/delf/delf.
- echo "Installing DELF package"
- pushd . > /dev/null
- cd ../../..
- pip3 install -e .
- local exit_code=$?
- handle_exit_code ${exit_code} "Unable to install the DELF package."
- popd > /dev/null
-}
-
-post_install_check() {
- # Checks the DELF package has been successfully installed.
- echo "Checking DELF package installation"
- python3 -c 'import delf'
- local exit_code=$?
- handle_exit_code ${exit_code} "DELF package installation check failed."
- echo "Installation successful."
-}
-
-install_delf() {
- # Orchestrates DELF package installation.
- install_tensorflow
- install_tf_slim
- download_protoc
- compile_delf_protos
- cleanup_protoc
- install_python_libraries
- install_object_detection
- install_delf_package
- post_install_check
-}
-
-install_delf
-
-exit 0
diff --git a/research/delf/delf/python/training/losses/__init__.py b/research/delf/delf/python/training/losses/__init__.py
deleted file mode 100644
index 9064f503de1..00000000000
--- a/research/delf/delf/python/training/losses/__init__.py
+++ /dev/null
@@ -1,14 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
\ No newline at end of file
diff --git a/research/delf/delf/python/training/losses/ranking_losses.py b/research/delf/delf/python/training/losses/ranking_losses.py
deleted file mode 100644
index fc7c2844790..00000000000
--- a/research/delf/delf/python/training/losses/ranking_losses.py
+++ /dev/null
@@ -1,175 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Ranking loss definitions."""
-
-import tensorflow as tf
-
-
-class ContrastiveLoss(tf.keras.losses.Loss):
- """Contrastive Loss layer.
-
- The Contrastive Loss layer computes the contrastive loss for a batch of
- images. Implementation based on: https://arxiv.org/abs/1604.02426.
- """
-
- def __init__(self, margin=0.7, reduction=tf.keras.losses.Reduction.NONE):
- """Initialization of Contrastive Loss layer.
-
- Args:
- margin: Float contrastive loss margin.
- reduction: Type of loss reduction.
- """
- super(ContrastiveLoss, self).__init__(reduction)
- self.margin = margin
- # Parameter for numerical stability.
- self.eps = 1e-6
-
- def __call__(self, queries, positives, negatives):
- """Invokes the Contrastive Loss instance.
-
- Args:
- queries: [batch_size, dim] Anchor input tensor.
- positives: [batch_size, dim] Positive sample input tensor.
- negatives: [batch_size, num_neg, dim] Negative sample input tensor.
-
- Returns:
- loss: Scalar tensor.
- """
- return contrastive_loss(
- queries, positives, negatives, margin=self.margin, eps=self.eps)
-
-
-class TripletLoss(tf.keras.losses.Loss):
- """Triplet Loss layer.
-
- Triplet Loss layer computes triplet loss for a batch of images. Triplet
- loss tries to keep all queries closer to positives than to any negatives.
- Margin is used to specify when a triplet has become too "easy" and we no
- longer want to adjust the weights from it. Differently from the Contrastive
- Loss, Triplet Loss uses squared distances when computing the loss.
- Implementation based on: https://arxiv.org/abs/1511.07247.
- """
-
- def __init__(self, margin=0.1, reduction=tf.keras.losses.Reduction.NONE):
- """Initialization of Triplet Loss layer.
-
- Args:
- margin: Triplet loss margin.
- reduction: Type of loss reduction.
- """
- super(TripletLoss, self).__init__(reduction)
- self.margin = margin
-
- def __call__(self, queries, positives, negatives):
- """Invokes the Triplet Loss instance.
-
- Args:
- queries: [batch_size, dim] Anchor input tensor.
- positives: [batch_size, dim] Positive sample input tensor.
- negatives: [batch_size, num_neg, dim] Negative sample input tensor.
-
- Returns:
- loss: Scalar tensor.
- """
- return triplet_loss(queries, positives, negatives, margin=self.margin)
-
-
-def contrastive_loss(queries, positives, negatives, margin=0.7, eps=1e-6):
- """Calculates Contrastive Loss.
-
- We expect the `queries`, `positives` and `negatives` to be normalized with
- unit length for training stability. The contrastive loss directly optimizes
- the Euclidean distance between descriptors, encouraging positive distances to
- approach 0 while keeping negative distances above the margin.
-
- Args:
- queries: [batch_size, dim] Anchor input tensor.
- positives: [batch_size, dim] Positive sample input tensor.
- negatives: [batch_size, num_neg, dim] Negative sample input tensor.
- margin: Float, contrastive loss margin.
- eps: Float parameter for numerical stability.
-
- Returns:
- loss: Scalar tensor.
- """
- dim = tf.shape(queries)[1]
- # Number of `queries`.
- batch_size = tf.shape(queries)[0]
- # Number of `positives`.
- np = tf.shape(positives)[0]
- # Number of `negatives`.
- num_neg = tf.shape(negatives)[1]
-
- # Preparing negatives.
- stacked_negatives = tf.reshape(negatives, [num_neg * batch_size, dim])
-
- # Preparing queries for further loss calculation.
- stacked_queries = tf.repeat(queries, num_neg + 1, axis=0)
- positives_and_negatives = tf.concat([positives, stacked_negatives], axis=0)
-
- # Calculate an Euclidean norm for each pair of points. For any positive
- # pair of data points this distance should be small, and for
- # negative pair it should be large.
- distances = tf.norm(stacked_queries - positives_and_negatives + eps, axis=1)
-
- positives_part = 0.5 * tf.pow(distances[:np], 2.0)
- negatives_part = 0.5 * tf.pow(
- tf.math.maximum(margin - distances[np:], 0), 2.0)
-
- # Final contrastive loss calculation.
- loss = tf.reduce_sum(tf.concat([positives_part, negatives_part], 0))
- return loss
-
-
-def triplet_loss(queries, positives, negatives, margin=0.1):
- """Calculates Triplet Loss.
-
- Triplet loss tries to keep all queries closer to positives than to any
- negatives. Unlike the Contrastive Loss, the Triplet Loss uses squared
- distances when computing the loss.
-
- Args:
- queries: [batch_size, dim] Anchor input tensor.
- positives: [batch_size, dim] Positive sample input tensor.
- negatives: [batch_size, num_neg, dim] Negative sample input tensor.
- margin: Float, triplet loss margin.
-
- Returns:
- loss: Scalar tensor.
- """
- dim = tf.shape(queries)[1]
- # Number of `queries`.
- batch_size = tf.shape(queries)[0]
- # Number of `negatives`.
- num_neg = tf.shape(negatives)[1]
-
- # Preparing negatives.
- stacked_negatives = tf.reshape(negatives, [num_neg * batch_size, dim])
-
- # Preparing queries for further loss calculation.
- stacked_queries = tf.repeat(queries, num_neg, axis=0)
-
- # Preparing positives for further loss calculation.
- stacked_positives = tf.repeat(positives, num_neg, axis=0)
-
- # Computes *squared* distances.
- distance_positives = tf.reduce_sum(
- tf.square(stacked_queries - stacked_positives), axis=1)
- distance_negatives = tf.reduce_sum(
- tf.square(stacked_queries - stacked_negatives), axis=1)
- # Final triplet loss calculation.
- loss = tf.reduce_sum(
- tf.maximum(distance_positives - distance_negatives + margin, 0.0))
- return loss
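
As a restatement of the two losses implemented above (this summary is derived from the code, not taken from the original files): with queries $q_i$, positives $p_i$, negatives $n_{ij}$, and margin $\tau$,

```latex
L_{\text{contrastive}} = \sum_i \tfrac{1}{2}\,\lVert q_i - p_i \rVert^2
  + \sum_{i,j} \tfrac{1}{2}\,\max\!\bigl(0,\ \tau - \lVert q_i - n_{ij} \rVert\bigr)^2

L_{\text{triplet}} = \sum_{i,j} \max\!\bigl(0,\ \lVert q_i - p_i \rVert^2
  - \lVert q_i - n_{ij} \rVert^2 + \tau\bigr)
```

The test values in the file that follows are consistent with these expressions.
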
diff --git a/research/delf/delf/python/training/losses/ranking_losses_test.py b/research/delf/delf/python/training/losses/ranking_losses_test.py
deleted file mode 100644
index 8e540ca3ce6..00000000000
--- a/research/delf/delf/python/training/losses/ranking_losses_test.py
+++ /dev/null
@@ -1,60 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for Ranking losses."""
-
-import tensorflow as tf
-from delf.python.training.losses import ranking_losses
-
-
-class RankingLossesTest(tf.test.TestCase):
-
- def testContrastiveLoss(self):
- # Testing the correct numeric value.
- queries = tf.math.l2_normalize(tf.constant([[1.0, 2.0, -2.0]]))
- positives = tf.math.l2_normalize(tf.constant([[-1.0, 2.0, 0.0]]))
- negatives = tf.math.l2_normalize(tf.constant([[[-5.0, 0.0, 3.0]]]))
-
- result = ranking_losses.contrastive_loss(queries, positives, negatives,
- margin=0.7, eps=1e-6)
- exp_output = 0.55278635
- self.assertAllClose(exp_output, result)
-
- def testTripletLossZeroLoss(self):
- # Testing the correct numeric value in case if query-positive distance is
- # smaller than the query-negative distance.
- queries = tf.math.l2_normalize(tf.constant([[1.0, 2.0, -2.0]]))
- positives = tf.math.l2_normalize(tf.constant([[-1.0, 2.0, 0.0]]))
- negatives = tf.math.l2_normalize(tf.constant([[[-5.0, 0.0, 3.0]]]))
-
- result = ranking_losses.triplet_loss(queries, positives, negatives,
- margin=0.1)
- exp_output = 0.0
- self.assertAllClose(exp_output, result)
-
- def testTripletLossNonZeroLoss(self):
- # Testing the correct numeric value in case if query-positive distance is
- # bigger than the query-negative distance.
- queries = tf.math.l2_normalize(tf.constant([[1.0, 2.0, -2.0]]))
- positives = tf.math.l2_normalize(tf.constant([[-5.0, 0.0, 3.0]]))
- negatives = tf.math.l2_normalize(tf.constant([[[-1.0, 2.0, 0.0]]]))
-
- result = ranking_losses.triplet_loss(queries, positives, negatives,
- margin=0.1)
- exp_output = 2.2520838
- self.assertAllClose(exp_output, result)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/training/matched_images_demo.png b/research/delf/delf/python/training/matched_images_demo.png
deleted file mode 100644
index b8a4cc9ac89..00000000000
Binary files a/research/delf/delf/python/training/matched_images_demo.png and /dev/null differ
diff --git a/research/delf/delf/python/training/model/__init__.py b/research/delf/delf/python/training/model/__init__.py
deleted file mode 100644
index 3fd7e87af35..00000000000
--- a/research/delf/delf/python/training/model/__init__.py
+++ /dev/null
@@ -1,25 +0,0 @@
-# Copyright 2020 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""DELF model module, used for training and exporting."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-# pylint: disable=unused-import
-from delf.python.training.model import delf_model
-from delf.python.training.model import delg_model
-from delf.python.training.model import export_model_utils
-from delf.python.training.model import resnet50
-# pylint: enable=unused-import
diff --git a/research/delf/delf/python/training/model/delf_model.py b/research/delf/delf/python/training/model/delf_model.py
deleted file mode 100644
index 9d770ba4fd1..00000000000
--- a/research/delf/delf/python/training/model/delf_model.py
+++ /dev/null
@@ -1,240 +0,0 @@
-# Lint as: python3
-# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""DELF model implementation based on the following paper.
-
- Large-Scale Image Retrieval with Attentive Deep Local Features
- https://arxiv.org/abs/1612.06321
-"""
-
-import tensorflow as tf
-
-from delf.python.training.model import resnet50 as resnet
-
-layers = tf.keras.layers
-reg = tf.keras.regularizers
-
-_DECAY = 0.0001
-
-
-class AttentionModel(tf.keras.Model):
- """Instantiates attention model.
-
- Uses two [kernel_size x kernel_size] convolutions and softplus as activation
- to compute an attention map with the same resolution as the featuremap.
- Features are L2-normalized and aggregated using attention probabilities as weights.
- The features (targets) to be aggregated can be the input featuremap, or a
- different one with the same resolution.
- """
-
- def __init__(self, kernel_size=1, decay=_DECAY, name='attention'):
- """Initialization of attention model.
-
- Args:
- kernel_size: int, kernel size of convolutions.
- decay: float, decay for l2 regularization of kernel weights.
- name: str, name to identify model.
- """
- super(AttentionModel, self).__init__(name=name)
-
- # First convolutional layer (called with relu activation).
- self.conv1 = layers.Conv2D(
- 512,
- kernel_size,
- kernel_regularizer=reg.l2(decay),
- padding='same',
- name='attn_conv1')
- self.bn_conv1 = layers.BatchNormalization(axis=3, name='bn_conv1')
-
- # Second convolutional layer, with softplus activation.
- self.conv2 = layers.Conv2D(
- 1,
- kernel_size,
- kernel_regularizer=reg.l2(decay),
- padding='same',
- name='attn_conv2')
- self.activation_layer = layers.Activation('softplus')
-
- def call(self, inputs, targets=None, training=True):
- x = self.conv1(inputs)
- x = self.bn_conv1(x, training=training)
- x = tf.nn.relu(x)
-
- score = self.conv2(x)
- prob = self.activation_layer(score)
-
- # Aggregate inputs if targets is None.
- if targets is None:
- targets = inputs
-
- # L2-normalize the featuremap before pooling.
- targets = tf.nn.l2_normalize(targets, axis=-1)
- feat = tf.reduce_mean(tf.multiply(targets, prob), [1, 2], keepdims=False)
-
- return feat, prob, score
-
-
-class AutoencoderModel(tf.keras.Model):
- """Instantiates the Keras Autoencoder model."""
-
- def __init__(self, reduced_dimension, expand_dimension, kernel_size=1,
- name='autoencoder'):
- """Initialization of Autoencoder model.
-
- Args:
- reduced_dimension: int, the output dimension of the autoencoder layer.
- expand_dimension: int, the input dimension of the autoencoder layer.
- kernel_size: int or tuple, height and width of the 2D convolution window.
- name: str, name to identify model.
- """
- super(AutoencoderModel, self).__init__(name=name)
- self.conv1 = layers.Conv2D(
- reduced_dimension,
- kernel_size,
- padding='same',
- name='autoenc_conv1')
- self.conv2 = layers.Conv2D(
- expand_dimension,
- kernel_size,
- activation=tf.keras.activations.relu,
- padding='same',
- name='autoenc_conv2')
-
- def call(self, inputs):
- dim_reduced_features = self.conv1(inputs)
- dim_expanded_features = self.conv2(dim_reduced_features)
- return dim_expanded_features, dim_reduced_features
-
-
-class Delf(tf.keras.Model):
- """Instantiates Keras DELF model using ResNet50 as backbone.
-
- This class implements the [DELF](https://arxiv.org/abs/1612.06321) model for
- extracting local features from images. The backbone is a ResNet50 network
- that extracts featuremaps from both conv_4 and conv_5 layers. Activations
- from conv_4 are used to compute an attention map of the same resolution.
- """
-
- def __init__(self,
- block3_strides=True,
- name='DELF',
- pooling='avg',
- gem_power=3.0,
- embedding_layer=False,
- embedding_layer_dim=2048,
- use_dim_reduction=False,
- reduced_dimension=128,
- dim_expand_channels=1024):
- """Initialization of DELF model.
-
- Args:
- block3_strides: bool, whether to add strides to the output of block3.
- name: str, name to identify model.
- pooling: str, pooling mode for global feature extraction; possible values
- are 'None', 'avg', 'max', 'gem'.
- gem_power: float, GeM power for GeM pooling. Only used if pooling ==
- 'gem'.
- embedding_layer: bool, whether to create an embedding layer (FC whitening
- layer).
- embedding_layer_dim: int, size of the embedding layer.
- use_dim_reduction: Whether to integrate dimensionality reduction layers.
- If True, extra layers are added to reduce the dimensionality of the
- extracted features.
- reduced_dimension: int, only used if use_dim_reduction is True. The output
- dimension of the autoencoder layer.
- dim_expand_channels: int, only used if use_dim_reduction is True. The
- number of channels of the backbone block used. Default value 1024 is the
- number of channels of backbone block 'block3'.
- """
- super(Delf, self).__init__(name=name)
-
- # Backbone using Keras ResNet50.
- self.backbone = resnet.ResNet50(
- 'channels_last',
- name='backbone',
- include_top=False,
- pooling=pooling,
- block3_strides=block3_strides,
- average_pooling=False,
- gem_power=gem_power,
- embedding_layer=embedding_layer,
- embedding_layer_dim=embedding_layer_dim)
-
- # Attention model.
- self.attention = AttentionModel(name='attention')
-
- # Autoencoder model.
- self._use_dim_reduction = use_dim_reduction
- if self._use_dim_reduction:
- self.autoencoder = AutoencoderModel(reduced_dimension,
- dim_expand_channels,
- name='autoencoder')
-
- def init_classifiers(self, num_classes, desc_classification=None):
- """Define classifiers for training backbone and attention models."""
- self.num_classes = num_classes
- if desc_classification is None:
- self.desc_classification = layers.Dense(
- num_classes, activation=None, kernel_regularizer=None, name='desc_fc')
- else:
- self.desc_classification = desc_classification
- self.attn_classification = layers.Dense(
- num_classes, activation=None, kernel_regularizer=None, name='att_fc')
-
- def global_and_local_forward_pass(self, images, training=True):
- """Run a forward to calculate global descriptor and attention prelogits.
-
- Args:
- images: Tensor containing the dataset on which to run the forward pass.
- training: Indicator of whether the forward pass is running in training mode
- or not.
-
- Returns:
- Global descriptor prelogits, attention prelogits, attention scores,
- backbone weights.
- """
- backbone_blocks = {}
- desc_prelogits = self.backbone.build_call(
- images, intermediates_dict=backbone_blocks, training=training)
- # Prevent gradients from propagating into the backbone. See DELG paper:
- # https://arxiv.org/abs/2001.05027.
- block3 = backbone_blocks['block3'] # pytype: disable=key-error
- block3 = tf.stop_gradient(block3)
- if self._use_dim_reduction:
- (dim_expanded_features, dim_reduced_features) = self.autoencoder(block3)
- attn_prelogits, attn_scores, _ = self.attention(
- block3,
- targets=dim_expanded_features,
- training=training)
- else:
- attn_prelogits, attn_scores, _ = self.attention(block3, training=training)
- dim_expanded_features = None
- dim_reduced_features = None
- return (desc_prelogits, attn_prelogits, attn_scores, backbone_blocks,
- dim_expanded_features, dim_reduced_features)
-
- def build_call(self, input_image, training=True):
- (global_feature, _, attn_scores, backbone_blocks, _,
- dim_reduced_features) = self.global_and_local_forward_pass(input_image,
- training)
- if self._use_dim_reduction:
- features = dim_reduced_features
- else:
- features = backbone_blocks['block3'] # pytype: disable=key-error
- return global_feature, attn_scores, features
-
- def call(self, input_image, training=True):
- _, probs, features = self.build_call(input_image, training=training)
- return probs, features
diff --git a/research/delf/delf/python/training/model/delf_model_test.py b/research/delf/delf/python/training/model/delf_model_test.py
deleted file mode 100644
index 7d5ca44e0c1..00000000000
--- a/research/delf/delf/python/training/model/delf_model_test.py
+++ /dev/null
@@ -1,108 +0,0 @@
-# Lint as: python3
-# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for the DELF model."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-from absl.testing import parameterized
-import tensorflow as tf
-
-from delf.python.training.model import delf_model
-
-
-class DelfTest(tf.test.TestCase, parameterized.TestCase):
-
- @parameterized.named_parameters(
- ('block3_stridesTrue', True),
- ('block3_stridesFalse', False),
- )
- def test_build_model(self, block3_strides):
- image_size = 321
- num_classes = 1000
- batch_size = 2
- input_shape = (batch_size, image_size, image_size, 3)
-
- model = delf_model.Delf(block3_strides=block3_strides, name='DELF')
- model.init_classifiers(num_classes)
-
- images = tf.random.uniform(input_shape, minval=-1.0, maxval=1.0, seed=0)
- blocks = {}
-
- # Get global feature by pooling block4 features.
- desc_prelogits = model.backbone(
- images, intermediates_dict=blocks, training=False)
- desc_logits = model.desc_classification(desc_prelogits)
- self.assertAllEqual(desc_prelogits.shape, (batch_size, 2048))
- self.assertAllEqual(desc_logits.shape, (batch_size, num_classes))
-
- features = blocks['block3']
- attn_prelogits, _, _ = model.attention(features)
- attn_logits = model.attn_classification(attn_prelogits)
- self.assertAllEqual(attn_prelogits.shape, (batch_size, 1024))
- self.assertAllEqual(attn_logits.shape, (batch_size, num_classes))
-
- @parameterized.named_parameters(
- ('block3_stridesTrue', True),
- ('block3_stridesFalse', False),
- )
- def test_train_step(self, block3_strides):
-
- image_size = 321
- num_classes = 1000
- batch_size = 2
- clip_val = 10.0
- input_shape = (batch_size, image_size, image_size, 3)
-
- model = delf_model.Delf(block3_strides=block3_strides, name='DELF')
- model.init_classifiers(num_classes)
-
- optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
-
- images = tf.random.uniform(input_shape, minval=0.0, maxval=1.0, seed=0)
- labels = tf.random.uniform((batch_size,),
- minval=0,
- maxval=model.num_classes - 1,
- dtype=tf.int64)
-
- loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
- from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
-
- def compute_loss(labels, predictions):
- per_example_loss = loss_object(labels, predictions)
- return tf.nn.compute_average_loss(
- per_example_loss, global_batch_size=batch_size)
-
- with tf.GradientTape() as gradient_tape:
- (desc_prelogits, attn_prelogits, _, _, _,
- _) = model.global_and_local_forward_pass(images)
- # Calculate global loss by applying the descriptor classifier.
- desc_logits = model.desc_classification(desc_prelogits)
- desc_loss = compute_loss(labels, desc_logits)
- # Calculate attention loss by applying the attention block classifier.
- attn_logits = model.attn_classification(attn_prelogits)
- attn_loss = compute_loss(labels, attn_logits)
- # Cumulate global loss and attention loss and backpropagate through the
- # descriptor layer and attention layer together.
- total_loss = desc_loss + attn_loss
- gradients = gradient_tape.gradient(total_loss, model.trainable_weights)
- clipped, _ = tf.clip_by_global_norm(gradients, clip_norm=clip_val)
- optimizer.apply_gradients(zip(clipped, model.trainable_weights))
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/training/model/delg_model.py b/research/delf/delf/python/training/model/delg_model.py
deleted file mode 100644
index a29161b0581..00000000000
--- a/research/delf/delf/python/training/model/delg_model.py
+++ /dev/null
@@ -1,178 +0,0 @@
-# Lint as: python3
-# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""DELG model implementation based on the following paper.
-
- Unifying Deep Local and Global Features for Image Search
- https://arxiv.org/abs/2001.05027
-"""
-
-import functools
-import math
-
-from absl import logging
-import tensorflow as tf
-
-from delf.python.training.model import delf_model
-
-layers = tf.keras.layers
-
-
-class Delg(delf_model.Delf):
- """Instantiates Keras DELG model using ResNet50 as backbone.
-
- This class implements the [DELG](https://arxiv.org/abs/2001.05027) model for
- extracting local and global features from images. The same attention layer
- is trained as in the DELF model. In addition, the extraction of global
- features is trained using GeM pooling, an FC whitening layer (also called the
- "embedding layer"), and an ArcFace loss.
- """
-
- def __init__(self,
- block3_strides=True,
- name='DELG',
- gem_power=3.0,
- embedding_layer_dim=2048,
- scale_factor_init=45.25, # sqrt(2048)
- arcface_margin=0.1,
- use_dim_reduction=False,
- reduced_dimension=128,
- dim_expand_channels=1024):
- """Initialization of DELG model.
-
- Args:
- block3_strides: bool, whether to add strides to the output of block3.
- name: str, name to identify model.
- gem_power: float, GeM power parameter.
- embedding_layer_dim: int, dimension of the embedding layer.
- scale_factor_init: float.
- arcface_margin: float, ArcFace margin.
- use_dim_reduction: Whether to integrate dimensionality reduction layers.
- If True, extra layers are added to reduce the dimensionality of the
- extracted features.
- reduced_dimension: Only used if use_dim_reduction is True, the output
- dimension of the dim_reduction layer.
- dim_expand_channels: Only used if use_dim_reduction is True, the
- number of channels of the backbone block used. Default value 1024 is the
- number of channels of backbone block 'block3'.
- """
- logging.info('Creating Delg model, gem_power %f, embedding_layer_dim %d',
- gem_power, embedding_layer_dim)
- super(Delg, self).__init__(block3_strides=block3_strides,
- name=name,
- pooling='gem',
- gem_power=gem_power,
- embedding_layer=True,
- embedding_layer_dim=embedding_layer_dim,
- use_dim_reduction=use_dim_reduction,
- reduced_dimension=reduced_dimension,
- dim_expand_channels=dim_expand_channels)
- self._embedding_layer_dim = embedding_layer_dim
- self._scale_factor_init = scale_factor_init
- self._arcface_margin = arcface_margin
-
- def init_classifiers(self, num_classes):
- """Define classifiers for training backbone and attention models."""
- logging.info('Initializing Delg backbone and attention models classifiers')
- backbone_classifier_func = self._create_backbone_classifier(num_classes)
- super(Delg, self).init_classifiers(
- num_classes,
- desc_classification=backbone_classifier_func)
-
- def _create_backbone_classifier(self, num_classes):
- """Define the classifier for training the backbone model."""
- logging.info('Creating cosine classifier')
- self.cosine_weights = tf.Variable(
- initial_value=tf.initializers.GlorotUniform()(
- shape=[self._embedding_layer_dim, num_classes]),
- name='cosine_weights',
- trainable=True)
- self.scale_factor = tf.Variable(self._scale_factor_init,
- name='scale_factor',
- trainable=False)
- classifier_func = functools.partial(cosine_classifier_logits,
- num_classes=num_classes,
- cosine_weights=self.cosine_weights,
- scale_factor=self.scale_factor,
- arcface_margin=self._arcface_margin)
- classifier_func.trainable_weights = [self.cosine_weights]
- return classifier_func
-
-
-def cosine_classifier_logits(prelogits,
- labels,
- num_classes,
- cosine_weights,
- scale_factor,
- arcface_margin,
- training=True):
- """Compute cosine classifier logits using ArFace margin.
-
- Args:
- prelogits: float tensor of shape [batch_size, embedding_layer_dim].
- labels: int tensor of shape [batch_size].
- num_classes: int, number of classes.
- cosine_weights: float tensor of shape [embedding_layer_dim, num_classes].
- scale_factor: float.
- arcface_margin: float. Only used if greater than zero, and training is True.
- training: bool, True if training, False if eval.
-
- Returns:
- logits: Float tensor [batch_size, num_classes].
- """
- # L2-normalize prelogits, then obtain cosine similarity.
- normalized_prelogits = tf.math.l2_normalize(prelogits, axis=1)
- normalized_weights = tf.math.l2_normalize(cosine_weights, axis=0)
- cosine_sim = tf.matmul(normalized_prelogits, normalized_weights)
-
- # Optionally use ArcFace margin.
- if training and arcface_margin > 0.0:
- # Reshape labels tensor from [batch_size] to [batch_size, num_classes].
- one_hot_labels = tf.one_hot(labels, num_classes)
- cosine_sim = apply_arcface_margin(cosine_sim,
- one_hot_labels,
- arcface_margin)
-
- # Apply the scale factor to logits and return.
- logits = scale_factor * cosine_sim
- return logits
-
-
-def apply_arcface_margin(cosine_sim, one_hot_labels, arcface_margin):
- """Applies ArcFace margin to cosine similarity inputs.
-
- For a reference, see https://arxiv.org/pdf/1801.07698.pdf. ArcFace margin is
- applied to angles from correct classes (as per the ArcFace paper), and only
- if they are <= (pi - margin). Otherwise, applying the margin may actually
- improve their cosine similarity.
-
- Args:
- cosine_sim: float tensor with shape [batch_size, num_classes].
- one_hot_labels: int tensor with shape [batch_size, num_classes].
- arcface_margin: float.
-
- Returns:
- cosine_sim_with_margin: Float tensor with shape [batch_size, num_classes].
- """
- theta = tf.acos(cosine_sim, name='acos')
- selected_labels = tf.where(tf.greater(theta, math.pi - arcface_margin),
- tf.zeros_like(one_hot_labels),
- one_hot_labels,
- name='selected_labels')
- final_theta = tf.where(tf.cast(selected_labels, dtype=tf.bool),
- theta + arcface_margin,
- theta,
- name='final_theta')
- return tf.cos(final_theta, name='cosine_sim_with_margin')
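
A small numeric sketch of the ArcFace margin logic above, using illustrative values only (the helper name is hypothetical and restates `apply_arcface_margin` for a single scalar):

```python
import math

def arcface_margin_scalar(cos_sim, is_correct_class, margin=0.1):
  """Scalar restatement of the margin rule for one (example, class) pair."""
  theta = math.acos(cos_sim)
  # The margin is only added for the correct class, and only while
  # theta <= pi - margin; otherwise adding it could increase the similarity.
  if is_correct_class and theta <= math.pi - margin:
    theta += margin
  return math.cos(theta)

print(arcface_margin_scalar(0.8, True))   # ~0.74: correct-class similarity is penalized
print(arcface_margin_scalar(0.8, False))  # 0.8: other classes are unchanged
```
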
diff --git a/research/delf/delf/python/training/model/delg_model_test.py b/research/delf/delf/python/training/model/delg_model_test.py
deleted file mode 100644
index 3ac2ec5ad24..00000000000
--- a/research/delf/delf/python/training/model/delg_model_test.py
+++ /dev/null
@@ -1,151 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for the DELG model."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-from absl.testing import parameterized
-import tensorflow as tf
-
-from delf.python.training.model import delg_model
-
-
-class DelgTest(tf.test.TestCase, parameterized.TestCase):
-
- @parameterized.named_parameters(
- ('block3_stridesTrue', True),
- ('block3_stridesFalse', False),
- )
- def test_forward_pass(self, block3_strides):
- image_size = 321
- num_classes = 1000
- batch_size = 2
- input_shape = (batch_size, image_size, image_size, 3)
- local_feature_dim = 64
- feature_map_size = image_size // 16 # reduction factor for resnet50.
- if block3_strides:
- feature_map_size //= 2
-
- model = delg_model.Delg(block3_strides=block3_strides,
- use_dim_reduction=True,
- reduced_dimension=local_feature_dim)
- model.init_classifiers(num_classes)
-
- images = tf.random.uniform(input_shape, minval=-1.0, maxval=1.0, seed=0)
-
- # Run a complete forward pass of the model.
- global_feature, attn_scores, local_features = model.build_call(images)
-
- self.assertAllEqual(global_feature.shape, (batch_size, 2048))
- self.assertAllEqual(
- attn_scores.shape,
- (batch_size, feature_map_size, feature_map_size, 1))
- self.assertAllEqual(
- local_features.shape,
- (batch_size, feature_map_size, feature_map_size, local_feature_dim))
-
- @parameterized.named_parameters(
- ('block3_stridesTrue', True),
- ('block3_stridesFalse', False),
- )
- def test_build_model(self, block3_strides):
- image_size = 321
- num_classes = 1000
- batch_size = 2
- input_shape = (batch_size, image_size, image_size, 3)
-
- model = delg_model.Delg(
- block3_strides=block3_strides,
- use_dim_reduction=True)
- model.init_classifiers(num_classes)
-
- images = tf.random.uniform(input_shape, minval=-1.0, maxval=1.0, seed=0)
- labels = tf.random.uniform((batch_size,),
- minval=0,
- maxval=model.num_classes - 1,
- dtype=tf.int64)
- blocks = {}
-
- desc_prelogits = model.backbone(
- images, intermediates_dict=blocks, training=False)
- desc_logits = model.desc_classification(desc_prelogits, labels)
- self.assertAllEqual(desc_prelogits.shape, (batch_size, 2048))
- self.assertAllEqual(desc_logits.shape, (batch_size, num_classes))
-
- features = blocks['block3']
- attn_prelogits, _, _ = model.attention(features)
- attn_logits = model.attn_classification(attn_prelogits)
- self.assertAllEqual(attn_prelogits.shape, (batch_size, 1024))
- self.assertAllEqual(attn_logits.shape, (batch_size, num_classes))
-
- @parameterized.named_parameters(
- ('block3_stridesTrue', True),
- ('block3_stridesFalse', False),
- )
- def test_train_step(self, block3_strides):
- image_size = 321
- num_classes = 1000
- batch_size = 2
- clip_val = 10.0
- input_shape = (batch_size, image_size, image_size, 3)
-
- model = delg_model.Delg(
- block3_strides=block3_strides,
- use_dim_reduction=True)
- model.init_classifiers(num_classes)
-
- optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)
-
- images = tf.random.uniform(input_shape, minval=0.0, maxval=1.0, seed=0)
- labels = tf.random.uniform((batch_size,),
- minval=0,
- maxval=model.num_classes - 1,
- dtype=tf.int64)
-
- loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
- from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
-
- def compute_loss(labels, predictions):
- per_example_loss = loss_object(labels, predictions)
- return tf.nn.compute_average_loss(
- per_example_loss, global_batch_size=batch_size)
-
- with tf.GradientTape() as gradient_tape:
- (desc_prelogits, attn_prelogits, _, backbone_blocks,
- dim_expanded_features, _) = model.global_and_local_forward_pass(images)
- # Calculate global loss by applying the descriptor classifier.
- desc_logits = model.desc_classification(desc_prelogits, labels)
- desc_loss = compute_loss(labels, desc_logits)
- # Calculate attention loss by applying the attention block classifier.
- attn_logits = model.attn_classification(attn_prelogits)
- attn_loss = compute_loss(labels, attn_logits)
-      # Calculate reconstruction loss between the backbone block3 features and
-      # the autoencoder's dimensionality-expanded features.
- block3 = tf.stop_gradient(backbone_blocks['block3'])
- reconstruction_loss = tf.math.reduce_mean(
- tf.keras.losses.MSE(block3, dim_expanded_features))
-      # Combine the global, attention and reconstruction losses and
-      # backpropagate through the descriptor and attention layers together.
- total_loss = desc_loss + attn_loss + reconstruction_loss
- gradients = gradient_tape.gradient(total_loss, model.trainable_weights)
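-    # Clip gradients by global norm to stabilize the update before applying it.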
- clipped, _ = tf.clip_by_global_norm(gradients, clip_norm=clip_val)
- optimizer.apply_gradients(zip(clipped, model.trainable_weights))
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/training/model/export_CNN_global.py b/research/delf/delf/python/training/model/export_CNN_global.py
deleted file mode 100644
index efdd1fe833d..00000000000
--- a/research/delf/delf/python/training/model/export_CNN_global.py
+++ /dev/null
@@ -1,173 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Export global CNN feature tensorflow inference model."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import app
-from absl import flags
-import tensorflow as tf
-
-from delf.python.training.model import global_model
-from delf.python.training.model import export_model_utils
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string('ckpt_path', None, help='Path to saved checkpoint.')
-flags.DEFINE_string('export_path', None,
- help='Path where model will be exported.')
-flags.DEFINE_list(
- 'input_scales_list', None,
- 'Optional input image scales to use. If None (default), an input '
- 'end-point '
- '"input_scales" is added for the exported model. If not None, the '
- 'specified list of floats will be hard-coded as the desired input '
- 'scales.')
-flags.DEFINE_enum(
- 'multi_scale_pool_type', 'None', ['None', 'average', 'sum'],
- "If 'None' (default), the model is exported with an output end-point "
- "'global_descriptors', where the global descriptor for each scale is "
- "returned separately. If not 'None', the global descriptor of each "
- "scale is"
- ' pooled and a 1D global descriptor is returned, with output end-point '
- "'global_descriptor'.")
-flags.DEFINE_boolean('normalize_global_descriptor', False,
- 'If True, L2-normalizes global descriptor.')
-# Network architecture and initialization options.
-flags.DEFINE_string('arch', 'ResNet101',
- 'model architecture (default: ResNet101)')
-flags.DEFINE_string('pool', 'gem', 'pooling options (default: gem)')
-flags.DEFINE_boolean('whitening', False,
- 'train model with learnable whitening (linear layer) '
- 'after the pooling')
-
-
-def _NormalizeImages(images, *args, **kwargs):
-  """Normalize pixel values in image using 'caffe'-style preprocessing.
-
-  Args:
-    images: `Tensor`, images to normalize. Any extra positional or keyword
-      arguments are accepted for signature compatibility and ignored.
-
-  Returns:
-    normalized_images: `Tensor`, normalized images.
-  """
-  images = tf.cast(images, tf.float32)
-  return tf.keras.applications.imagenet_utils.preprocess_input(
-      images, mode='caffe')
-
-
-class _ExtractModule(tf.Module):
- """Helper module to build and save global feature model."""
-
- def __init__(self,
- multi_scale_pool_type='None',
- normalize_global_descriptor=False,
- input_scales_tensor=None):
- """Initialization of global feature model.
- Args:
- multi_scale_pool_type: Type of multi-scale pooling to perform.
- normalize_global_descriptor: Whether to L2-normalize global
- descriptor.
- input_scales_tensor: If None, the exported function to be used
- should be ExtractFeatures, where an input end-point "input_scales" is
- added for the exported model. If not None, the specified 1D tensor of
- floats will be hard-coded as the desired input scales, in conjunction
- with ExtractFeaturesFixedScales.
- """
- self._multi_scale_pool_type = multi_scale_pool_type
- self._normalize_global_descriptor = normalize_global_descriptor
- if input_scales_tensor is None:
- self._input_scales_tensor = []
- else:
- self._input_scales_tensor = input_scales_tensor
-
- self._model = global_model.GlobalFeatureNet(
- FLAGS.arch, FLAGS.pool, FLAGS.whitening, pretrained=False)
-
- def LoadWeights(self, checkpoint_path):
- self._model.load_weights(checkpoint_path)
-
- @tf.function(input_signature=[
- tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8,
- name='input_image'),
- tf.TensorSpec(shape=[None], dtype=tf.float32, name='input_scales'),
- tf.TensorSpec(shape=[None], dtype=tf.int32,
- name='input_global_scales_ind')
- ])
- def ExtractFeatures(self, input_image, input_scales,
- input_global_scales_ind):
- extracted_features = export_model_utils.ExtractGlobalFeatures(
- input_image,
- input_scales,
- input_global_scales_ind,
- lambda x: self._model(x, training=False),
- multi_scale_pool_type=self._multi_scale_pool_type,
- normalize_global_descriptor=self._normalize_global_descriptor,
-        normalization_function=_NormalizeImages)
-
-    named_output_tensors = {}
-    if self._multi_scale_pool_type == 'None':
-      named_output_tensors['global_descriptors'] = tf.identity(
-          extracted_features, name='global_descriptors')
-    else:
-      named_output_tensors['global_descriptor'] = tf.identity(
-          extracted_features, name='global_descriptor')
-    return named_output_tensors
-
- @tf.function(input_signature=[
- tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8, name='input_image')
- ])
- def ExtractFeaturesFixedScales(self, input_image):
- return self.ExtractFeatures(input_image, self._input_scales_tensor,
- tf.range(tf.size(self._input_scales_tensor)))
-
-
-def main(argv):
- if len(argv) > 1:
- raise app.UsageError('Too many command-line arguments.')
-
- export_path = FLAGS.export_path
- if os.path.exists(export_path):
- raise ValueError('export_path %s already exists.' % export_path)
-
- if FLAGS.input_scales_list is None:
- input_scales_tensor = None
- else:
- input_scales_tensor = tf.constant(
- [float(s) for s in FLAGS.input_scales_list],
- dtype=tf.float32,
- shape=[len(FLAGS.input_scales_list)],
- name='input_scales')
- module = _ExtractModule(FLAGS.multi_scale_pool_type,
- FLAGS.normalize_global_descriptor,
- input_scales_tensor)
-
- # Load the weights.
- checkpoint_path = FLAGS.ckpt_path
- module.LoadWeights(checkpoint_path)
- print('Checkpoint loaded from ', checkpoint_path)
-
- # Save the module.
- if FLAGS.input_scales_list is None:
- served_function = module.ExtractFeatures
- else:
- served_function = module.ExtractFeaturesFixedScales
-
- tf.saved_model.save(
- module, export_path, signatures={'serving_default': served_function})
-
-
-if __name__ == '__main__':
- app.run(main)
diff --git a/research/delf/delf/python/training/model/export_global_model.py b/research/delf/delf/python/training/model/export_global_model.py
deleted file mode 100644
index e5f9128a0bf..00000000000
--- a/research/delf/delf/python/training/model/export_global_model.py
+++ /dev/null
@@ -1,183 +0,0 @@
-# Lint as: python3
-# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Export global feature tensorflow inference model.
-
-The exported model may leverage image pyramids for multi-scale processing.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import app
-from absl import flags
-import tensorflow as tf
-
-from delf.python.training.model import delf_model
-from delf.python.training.model import delg_model
-from delf.python.training.model import export_model_utils
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string('ckpt_path', '/tmp/delf-logdir/delf-weights',
- 'Path to saved checkpoint.')
-flags.DEFINE_string('export_path', None, 'Path where model will be exported.')
-flags.DEFINE_list(
- 'input_scales_list', None,
- 'Optional input image scales to use. If None (default), an input end-point '
- '"input_scales" is added for the exported model. If not None, the '
- 'specified list of floats will be hard-coded as the desired input scales.')
-flags.DEFINE_enum(
- 'multi_scale_pool_type', 'None', ['None', 'average', 'sum'],
- "If 'None' (default), the model is exported with an output end-point "
- "'global_descriptors', where the global descriptor for each scale is "
- "returned separately. If not 'None', the global descriptor of each scale is"
- ' pooled and a 1D global descriptor is returned, with output end-point '
- "'global_descriptor'.")
-flags.DEFINE_boolean('normalize_global_descriptor', False,
- 'If True, L2-normalizes global descriptor.')
-flags.DEFINE_boolean('delg_global_features', False,
- 'Whether the model uses a DELG-like global feature head.')
-flags.DEFINE_float(
- 'delg_gem_power', 3.0,
-    'Power for Generalized Mean pooling. Used only if --delg_global_features '
- 'is present.')
-flags.DEFINE_integer(
- 'delg_embedding_layer_dim', 2048,
-    'Size of the FC whitening layer (embedding layer). Used only if '
- '--delg_global_features is present.')
-
-
-class _ExtractModule(tf.Module):
- """Helper module to build and save global feature model."""
-
- def __init__(self,
- multi_scale_pool_type='None',
- normalize_global_descriptor=False,
- input_scales_tensor=None,
- delg_global_features=False,
- delg_gem_power=3.0,
- delg_embedding_layer_dim=2048):
- """Initialization of global feature model.
-
- Args:
- multi_scale_pool_type: Type of multi-scale pooling to perform.
- normalize_global_descriptor: Whether to L2-normalize global descriptor.
- input_scales_tensor: If None, the exported function to be used should be
- ExtractFeatures, where an input end-point "input_scales" is added for
- the exported model. If not None, the specified 1D tensor of floats will
- be hard-coded as the desired input scales, in conjunction with
- ExtractFeaturesFixedScales.
- delg_global_features: Whether the model uses a DELG-like global feature
- head.
- delg_gem_power: Power for Generalized Mean pooling in the DELG model. Used
- only if 'delg_global_features' is True.
- delg_embedding_layer_dim: Size of the FC whitening layer (embedding
- layer). Used only if 'delg_global_features' is True.
- """
- self._multi_scale_pool_type = multi_scale_pool_type
- self._normalize_global_descriptor = normalize_global_descriptor
- if input_scales_tensor is None:
- self._input_scales_tensor = []
- else:
- self._input_scales_tensor = input_scales_tensor
-
- # Setup the DELF model for extraction.
- if delg_global_features:
- self._model = delg_model.Delg(
- block3_strides=False,
- name='DELG',
- gem_power=delg_gem_power,
- embedding_layer_dim=delg_embedding_layer_dim)
- else:
- self._model = delf_model.Delf(block3_strides=False, name='DELF')
-
- def LoadWeights(self, checkpoint_path):
- self._model.load_weights(checkpoint_path)
-
- @tf.function(input_signature=[
- tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8, name='input_image'),
- tf.TensorSpec(shape=[None], dtype=tf.float32, name='input_scales'),
- tf.TensorSpec(
- shape=[None], dtype=tf.int32, name='input_global_scales_ind')
- ])
- def ExtractFeatures(self, input_image, input_scales, input_global_scales_ind):
- extracted_features = export_model_utils.ExtractGlobalFeatures(
- input_image,
- input_scales,
- input_global_scales_ind,
- lambda x: self._model.backbone.build_call(x, training=False),
- multi_scale_pool_type=self._multi_scale_pool_type,
- normalize_global_descriptor=self._normalize_global_descriptor)
-
- named_output_tensors = {}
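-    # Without pooling, one descriptor per scale is returned; otherwise a
-    # single pooled descriptor.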
- if self._multi_scale_pool_type == 'None':
- named_output_tensors['global_descriptors'] = tf.identity(
- extracted_features, name='global_descriptors')
- else:
- named_output_tensors['global_descriptor'] = tf.identity(
- extracted_features, name='global_descriptor')
-
- return named_output_tensors
-
- @tf.function(input_signature=[
- tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8, name='input_image')
- ])
- def ExtractFeaturesFixedScales(self, input_image):
- return self.ExtractFeatures(input_image, self._input_scales_tensor,
- tf.range(tf.size(self._input_scales_tensor)))
-
-
-def main(argv):
- if len(argv) > 1:
- raise app.UsageError('Too many command-line arguments.')
-
- export_path = FLAGS.export_path
- if os.path.exists(export_path):
- raise ValueError('export_path %s already exists.' % export_path)
-
- if FLAGS.input_scales_list is None:
- input_scales_tensor = None
- else:
- input_scales_tensor = tf.constant(
- [float(s) for s in FLAGS.input_scales_list],
- dtype=tf.float32,
- shape=[len(FLAGS.input_scales_list)],
- name='input_scales')
- module = _ExtractModule(FLAGS.multi_scale_pool_type,
- FLAGS.normalize_global_descriptor,
- input_scales_tensor, FLAGS.delg_global_features,
- FLAGS.delg_gem_power, FLAGS.delg_embedding_layer_dim)
-
- # Load the weights.
- checkpoint_path = FLAGS.ckpt_path
- module.LoadWeights(checkpoint_path)
- print('Checkpoint loaded from ', checkpoint_path)
-
- # Save the module
- if FLAGS.input_scales_list is None:
- served_function = module.ExtractFeatures
- else:
- served_function = module.ExtractFeaturesFixedScales
-
- tf.saved_model.save(
- module, export_path, signatures={'serving_default': served_function})
-
-
-if __name__ == '__main__':
- app.run(main)
diff --git a/research/delf/delf/python/training/model/export_local_and_global_model.py b/research/delf/delf/python/training/model/export_local_and_global_model.py
deleted file mode 100644
index a6cee584f87..00000000000
--- a/research/delf/delf/python/training/model/export_local_and_global_model.py
+++ /dev/null
@@ -1,170 +0,0 @@
-# Lint as: python3
-# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Export DELG tensorflow inference model.
-
-The exported model can be used to jointly extract local and global features. It
-may use an image pyramid for multi-scale processing, and will include receptive
-field calculation and keypoint selection for the local feature head.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import app
-from absl import flags
-import tensorflow as tf
-
-from delf.python.training.model import delf_model
-from delf.python.training.model import delg_model
-from delf.python.training.model import export_model_utils
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string(
- 'ckpt_path', '/tmp/delf-logdir/delf-weights', 'Path to saved checkpoint.')
-flags.DEFINE_string('export_path', None, 'Path where model will be exported.')
-flags.DEFINE_boolean(
- 'delg_global_features', True,
- 'Whether the model uses a DELG-like global feature head.')
-flags.DEFINE_float(
- 'delg_gem_power', 3.0,
-    'Power for Generalized Mean pooling. Used only if --delg_global_features '
- 'is present.')
-flags.DEFINE_integer(
- 'delg_embedding_layer_dim', 2048,
-    'Size of the FC whitening layer (embedding layer). Used only if '
- '--delg_global_features is present.')
-flags.DEFINE_boolean(
- 'block3_strides', True,
- 'Whether to apply strides after block3, used for local feature head.')
-flags.DEFINE_float(
- 'iou', 1.0, 'IOU for non-max suppression used in local feature head.')
-flags.DEFINE_boolean(
- 'use_autoencoder', True,
- 'Whether the exported model should use an autoencoder.')
-flags.DEFINE_integer(
-    'autoencoder_dimensions', 128,
-    'Number of dimensions of the autoencoder. Used only if '
-    'use_autoencoder=True.')
-flags.DEFINE_integer(
-    'local_feature_map_channels', 1024,
-    'Number of channels at backbone layer used for local feature extraction. '
-    'Default value 1024 is the number of channels of block3. Used only if '
-    'use_autoencoder=True.')
-
-
-class _ExtractModule(tf.Module):
- """Helper module to build and save DELG model."""
-
- def __init__(self,
- delg_global_features=True,
- delg_gem_power=3.0,
- delg_embedding_layer_dim=2048,
- block3_strides=True,
- iou=1.0):
- """Initialization of DELG model.
-
- Args:
- delg_global_features: Whether the model uses a DELG-like global feature
- head.
- delg_gem_power: Power for Generalized Mean pooling in the DELG model. Used
- only if 'delg_global_features' is True.
- delg_embedding_layer_dim: Size of the FC whitening layer (embedding
- layer). Used only if 'delg_global_features' is True.
- block3_strides: bool, whether to add strides to the output of block3.
- iou: IOU for non-max suppression.
- """
- self._stride_factor = 2.0 if block3_strides else 1.0
- self._iou = iou
-
- # Setup the DELG model for extraction.
- if delg_global_features:
- self._model = delg_model.Delg(
- block3_strides=block3_strides,
- name='DELG',
- gem_power=delg_gem_power,
- embedding_layer_dim=delg_embedding_layer_dim,
- use_dim_reduction=FLAGS.use_autoencoder,
- reduced_dimension=FLAGS.autoencoder_dimensions,
- dim_expand_channels=FLAGS.local_feature_map_channels)
- else:
- self._model = delf_model.Delf(
- block3_strides=block3_strides,
- name='DELF',
- use_dim_reduction=FLAGS.use_autoencoder,
- reduced_dimension=FLAGS.autoencoder_dimensions,
- dim_expand_channels=FLAGS.local_feature_map_channels)
-
- def LoadWeights(self, checkpoint_path):
- self._model.load_weights(checkpoint_path)
-
- @tf.function(input_signature=[
- tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8, name='input_image'),
- tf.TensorSpec(shape=[None], dtype=tf.float32, name='input_scales'),
- tf.TensorSpec(shape=(), dtype=tf.int32, name='input_max_feature_num'),
- tf.TensorSpec(shape=(), dtype=tf.float32, name='input_abs_thres'),
- tf.TensorSpec(
- shape=[None], dtype=tf.int32, name='input_global_scales_ind')
- ])
- def ExtractFeatures(self, input_image, input_scales, input_max_feature_num,
- input_abs_thres, input_global_scales_ind):
- extracted_features = export_model_utils.ExtractLocalAndGlobalFeatures(
- input_image, input_scales, input_max_feature_num, input_abs_thres,
- input_global_scales_ind, self._iou,
- lambda x: self._model.build_call(x, training=False),
- self._stride_factor)
-
- named_output_tensors = {}
- named_output_tensors['boxes'] = tf.identity(
- extracted_features[0], name='boxes')
- named_output_tensors['features'] = tf.identity(
- extracted_features[1], name='features')
- named_output_tensors['scales'] = tf.identity(
- extracted_features[2], name='scales')
- named_output_tensors['scores'] = tf.identity(
- extracted_features[3], name='scores')
- named_output_tensors['global_descriptors'] = tf.identity(
- extracted_features[4], name='global_descriptors')
- return named_output_tensors
-
-
-def main(argv):
- if len(argv) > 1:
- raise app.UsageError('Too many command-line arguments.')
-
- export_path = FLAGS.export_path
- if os.path.exists(export_path):
- raise ValueError(f'Export_path {export_path} already exists. Please '
- 'specify a different path or delete the existing one.')
-
- module = _ExtractModule(FLAGS.delg_global_features, FLAGS.delg_gem_power,
- FLAGS.delg_embedding_layer_dim, FLAGS.block3_strides,
- FLAGS.iou)
-
- # Load the weights.
- checkpoint_path = FLAGS.ckpt_path
- module.LoadWeights(checkpoint_path)
- print('Checkpoint loaded from ', checkpoint_path)
-
- # Save the module
- tf.saved_model.save(module, export_path)
-
-
-if __name__ == '__main__':
- app.run(main)
diff --git a/research/delf/delf/python/training/model/export_local_model.py b/research/delf/delf/python/training/model/export_local_model.py
deleted file mode 100644
index 767d363ef7e..00000000000
--- a/research/delf/delf/python/training/model/export_local_model.py
+++ /dev/null
@@ -1,128 +0,0 @@
-# Lint as: python3
-# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Export DELF tensorflow inference model.
-
-The exported model may use an image pyramid for multi-scale processing, with
-local feature extraction including receptive field calculation and keypoint
-selection.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import app
-from absl import flags
-import tensorflow as tf
-
-from delf.python.training.model import delf_model
-from delf.python.training.model import export_model_utils
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string(
- 'ckpt_path', '/tmp/delf-logdir/delf-weights', 'Path to saved checkpoint.')
-flags.DEFINE_string('export_path', None, 'Path where model will be exported.')
-flags.DEFINE_boolean(
- 'block3_strides', True, 'Whether to apply strides after block3.')
-flags.DEFINE_float('iou', 1.0, 'IOU for non-max suppression.')
-flags.DEFINE_boolean(
- 'use_autoencoder', True,
- 'Whether the exported model should use an autoencoder.')
-flags.DEFINE_integer(
-    'autoencoder_dimensions', 128,
-    'Number of dimensions of the autoencoder. Used only if '
-    'use_autoencoder=True.')
-flags.DEFINE_integer(
-    'local_feature_map_channels', 1024,
-    'Number of channels at backbone layer used for local feature extraction. '
-    'Default value 1024 is the number of channels of block3. Used only if '
-    'use_autoencoder=True.')
-
-
-class _ExtractModule(tf.Module):
- """Helper module to build and save DELF model."""
-
- def __init__(self, block3_strides, iou):
- """Initialization of DELF model.
-
- Args:
- block3_strides: bool, whether to add strides to the output of block3.
- iou: IOU for non-max suppression.
- """
- self._stride_factor = 2.0 if block3_strides else 1.0
- self._iou = iou
- # Setup the DELF model for extraction.
- self._model = delf_model.Delf(
- block3_strides=block3_strides,
- name='DELF',
- use_dim_reduction=FLAGS.use_autoencoder,
- reduced_dimension=FLAGS.autoencoder_dimensions,
- dim_expand_channels=FLAGS.local_feature_map_channels)
-
- def LoadWeights(self, checkpoint_path):
- self._model.load_weights(checkpoint_path)
-
- @tf.function(input_signature=[
- tf.TensorSpec(shape=[None, None, 3], dtype=tf.uint8, name='input_image'),
- tf.TensorSpec(shape=[None], dtype=tf.float32, name='input_scales'),
- tf.TensorSpec(shape=(), dtype=tf.int32, name='input_max_feature_num'),
- tf.TensorSpec(shape=(), dtype=tf.float32, name='input_abs_thres')
- ])
- def ExtractFeatures(self, input_image, input_scales, input_max_feature_num,
- input_abs_thres):
-
- extracted_features = export_model_utils.ExtractLocalFeatures(
- input_image, input_scales, input_max_feature_num, input_abs_thres,
- self._iou, lambda x: self._model(x, training=False),
- self._stride_factor)
-
- named_output_tensors = {}
- named_output_tensors['boxes'] = tf.identity(
- extracted_features[0], name='boxes')
- named_output_tensors['features'] = tf.identity(
- extracted_features[1], name='features')
- named_output_tensors['scales'] = tf.identity(
- extracted_features[2], name='scales')
- named_output_tensors['scores'] = tf.identity(
- extracted_features[3], name='scores')
- return named_output_tensors
-
-
-def main(argv):
- if len(argv) > 1:
- raise app.UsageError('Too many command-line arguments.')
-
- export_path = FLAGS.export_path
- if os.path.exists(export_path):
- raise ValueError(f'Export_path {export_path} already exists. Please '
- 'specify a different path or delete the existing one.')
-
- module = _ExtractModule(FLAGS.block3_strides, FLAGS.iou)
-
- # Load the weights.
- checkpoint_path = FLAGS.ckpt_path
- module.LoadWeights(checkpoint_path)
- print('Checkpoint loaded from ', checkpoint_path)
-
- # Save the module
- tf.saved_model.save(module, export_path)
-
-
-if __name__ == '__main__':
- app.run(main)
diff --git a/research/delf/delf/python/training/model/export_model_utils.py b/research/delf/delf/python/training/model/export_model_utils.py
deleted file mode 100644
index f5419528ffc..00000000000
--- a/research/delf/delf/python/training/model/export_model_utils.py
+++ /dev/null
@@ -1,410 +0,0 @@
-# Lint as: python3
-# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Helper functions for DELF model exporting."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-
-from delf import feature_extractor
-from delf.python.datasets.google_landmarks_dataset import googlelandmarks as gld
-from object_detection.core import box_list
-from object_detection.core import box_list_ops
-
-
-# TODO(andrearaujo): Rewrite this function to be more similar to
-# "ExtractLocalAndGlobalFeatures" below, leveraging autograph to avoid the need
-# for tf.while loop.
-def ExtractLocalFeatures(image, image_scales, max_feature_num, abs_thres, iou,
- attention_model_fn, stride_factor):
- """Extract local features for input image.
-
- Args:
- image: image tensor of type tf.uint8 with shape [h, w, channels].
- image_scales: 1D float tensor which contains float scales used for image
- pyramid construction.
- max_feature_num: int tensor denoting the maximum selected feature points.
- abs_thres: float tensor denoting the score threshold for feature selection.
- iou: float scalar denoting the iou threshold for NMS.
- attention_model_fn: model function. Follows the signature:
- * Args:
- * `images`: Image tensor which is re-scaled.
- * Returns:
- * `attention_prob`: attention map after the non-linearity.
- * `feature_map`: feature map after ResNet convolution.
- stride_factor: integer accounting for striding after block3.
-
- Returns:
- boxes: [N, 4] float tensor which denotes the selected receptive box. N is
- the number of final feature points which pass through keypoint selection
- and NMS steps.
- features: [N, depth] float tensor.
- feature_scales: [N] float tensor. It is the inverse of the input image
- scales such that larger image scales correspond to larger image regions,
- which is compatible with keypoints detected with other techniques, for
- example Congas.
- scores: [N, 1] float tensor denoting the attention score.
-
- """
- original_image_shape_float = tf.gather(
- tf.dtypes.cast(tf.shape(image), tf.float32), [0, 1])
-
- image_tensor = gld.NormalizeImages(
- image, pixel_value_offset=128.0, pixel_value_scale=128.0)
- image_tensor = tf.expand_dims(image_tensor, 0, name='image/expand_dims')
-
- # Hard code the feature depth and receptive field parameters for now.
- # We need to revisit this once we change the architecture and selected
- # convolutional blocks to use as local features.
- rf, stride, padding = [291.0, 16.0 * stride_factor, 145.0]
- feature_depth = 1024
-
- def _ProcessSingleScale(scale_index, boxes, features, scales, scores):
- """Resizes the image and run feature extraction and keypoint selection.
-
- This function will be passed into tf.while_loop() and be called
- repeatedly. The input boxes are collected from the previous iteration
- [0: scale_index -1]. We get the current scale by
- image_scales[scale_index], and run resize image, feature extraction and
- keypoint selection. Then we will get a new set of selected_boxes for
- current scale. In the end, we concat the previous boxes with current
- selected_boxes as the output.
- Args:
- scale_index: A valid index in the image_scales.
- boxes: Box tensor with the shape of [N, 4].
- features: Feature tensor with the shape of [N, depth].
- scales: Scale tensor with the shape of [N].
- scores: Attention score tensor with the shape of [N].
-
- Returns:
- scale_index: The next scale index for processing.
- boxes: Concatenated box tensor with the shape of [K, 4]. K >= N.
- features: Concatenated feature tensor with the shape of [K, depth].
- scales: Concatenated scale tensor with the shape of [K].
- scores: Concatenated score tensor with the shape of [K].
- """
- scale = tf.gather(image_scales, scale_index)
- new_image_size = tf.dtypes.cast(
- tf.round(original_image_shape_float * scale), tf.int32)
- resized_image = tf.image.resize(image_tensor, new_image_size)
-
- attention_prob, feature_map = attention_model_fn(resized_image)
- attention_prob = tf.squeeze(attention_prob, axis=[0])
- feature_map = tf.squeeze(feature_map, axis=[0])
-
- rf_boxes = feature_extractor.CalculateReceptiveBoxes(
- tf.shape(feature_map)[0],
- tf.shape(feature_map)[1], rf, stride, padding)
-
- # Re-project back to the original image space.
- rf_boxes = tf.divide(rf_boxes, scale)
- attention_prob = tf.reshape(attention_prob, [-1])
- feature_map = tf.reshape(feature_map, [-1, feature_depth])
-
- # Use attention score to select feature vectors.
- indices = tf.reshape(tf.where(attention_prob >= abs_thres), [-1])
- selected_boxes = tf.gather(rf_boxes, indices)
- selected_features = tf.gather(feature_map, indices)
- selected_scores = tf.gather(attention_prob, indices)
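-    # Store inverse scales so that larger image scales map to larger
-    # original-image regions.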
- selected_scales = tf.ones_like(selected_scores, tf.float32) / scale
-
- # Concat with the previous result from different scales.
- boxes = tf.concat([boxes, selected_boxes], 0)
- features = tf.concat([features, selected_features], 0)
- scales = tf.concat([scales, selected_scales], 0)
- scores = tf.concat([scores, selected_scores], 0)
-
- return scale_index + 1, boxes, features, scales, scores
-
- output_boxes = tf.zeros([0, 4], dtype=tf.float32)
- output_features = tf.zeros([0, feature_depth], dtype=tf.float32)
- output_scales = tf.zeros([0], dtype=tf.float32)
- output_scores = tf.zeros([0], dtype=tf.float32)
-
- # Process the first scale separately, the following scales will reuse the
- # graph variables.
- (_, output_boxes, output_features, output_scales,
- output_scores) = _ProcessSingleScale(0, output_boxes, output_features,
- output_scales, output_scores)
-
- i = tf.constant(1, dtype=tf.int32)
- num_scales = tf.shape(image_scales)[0]
- keep_going = lambda j, b, f, scales, scores: tf.less(j, num_scales)
-
- (_, output_boxes, output_features, output_scales,
- output_scores) = tf.nest.map_structure(
- tf.stop_gradient,
- tf.while_loop(
- cond=keep_going,
- body=_ProcessSingleScale,
- loop_vars=[
- i, output_boxes, output_features, output_scales, output_scores
- ],
- shape_invariants=[
- i.get_shape(),
- tf.TensorShape([None, 4]),
- tf.TensorShape([None, feature_depth]),
- tf.TensorShape([None]),
- tf.TensorShape([None])
- ]))
-
- feature_boxes = box_list.BoxList(output_boxes)
- feature_boxes.add_field('features', output_features)
- feature_boxes.add_field('scales', output_scales)
- feature_boxes.add_field('scores', output_scores)
-
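-  # Run non-max suppression over the receptive boxes, keeping at most
-  # max_feature_num of them.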
- nms_max_boxes = tf.minimum(max_feature_num, feature_boxes.num_boxes())
- final_boxes = box_list_ops.non_max_suppression(feature_boxes, iou,
- nms_max_boxes)
-
- return final_boxes.get(), final_boxes.get_field(
- 'features'), final_boxes.get_field('scales'), tf.expand_dims(
- final_boxes.get_field('scores'), 1)
-
-
-@tf.function
-def ExtractGlobalFeatures(image,
- image_scales,
- global_scales_ind,
- model_fn,
- multi_scale_pool_type='None',
- normalize_global_descriptor=False,
- normalization_function=gld.NormalizeImages):
- """Extract global features for input image.
-
- Args:
- image: image tensor of type tf.uint8 with shape [h, w, channels].
- image_scales: 1D float tensor which contains float scales used for image
- pyramid construction.
- global_scales_ind: Feature extraction happens only for a subset of
- `image_scales`, those with corresponding indices from this tensor.
- model_fn: model function. Follows the signature:
- * Args:
- * `images`: Batched image tensor.
- * Returns:
- * `global_descriptors`: Global descriptors for input images.
-    multi_scale_pool_type: If not 'None', the global descriptors of all scales
-      are pooled and a single 1D global descriptor is returned.
- normalize_global_descriptor: If True, output global descriptors are
- L2-normalized.
- normalization_function: Function used for normalization.
-
- Returns:
- global_descriptors: If `multi_scale_pool_type` is 'None', returns a [S, D]
- float tensor. S is the number of scales, and D the global descriptor
- dimensionality. Each D-dimensional entry is a global descriptor, which may
- be L2-normalized depending on `normalize_global_descriptor`. If
- `multi_scale_pool_type` is not 'None', returns a [D] float tensor with the
- pooled global descriptor.
-
- """
- original_image_shape_float = tf.gather(
- tf.dtypes.cast(tf.shape(image), tf.float32), [0, 1])
- image_tensor = normalization_function(
- image, pixel_value_offset=128.0, pixel_value_scale=128.0)
- image_tensor = tf.expand_dims(image_tensor, 0, name='image/expand_dims')
-
- def _ResizeAndExtract(scale_index):
- """Helper function to resize image then extract global feature.
-
- Args:
- scale_index: A valid index in image_scales.
-
- Returns:
- global_descriptor: [1,D] tensor denoting the extracted global descriptor.
- """
- scale = tf.gather(image_scales, scale_index)
- new_image_size = tf.dtypes.cast(
- tf.round(original_image_shape_float * scale), tf.int32)
- resized_image = tf.image.resize(image_tensor, new_image_size)
- global_descriptor = model_fn(resized_image)
- return global_descriptor
-
- # First loop to find initial scale to be used.
- num_scales = tf.shape(image_scales)[0]
- initial_scale_index = tf.constant(-1, dtype=tf.int32)
- for scale_index in tf.range(num_scales):
- if tf.reduce_any(tf.equal(global_scales_ind, scale_index)):
- initial_scale_index = scale_index
- break
-
- output_global = _ResizeAndExtract(initial_scale_index)
-
- # Loop over subsequent scales.
- for scale_index in tf.range(initial_scale_index + 1, num_scales):
- # Allow an undefined number of global feature scales to be extracted.
- tf.autograph.experimental.set_loop_options(
- shape_invariants=[(output_global, tf.TensorShape([None, None]))])
-
- if tf.reduce_any(tf.equal(global_scales_ind, scale_index)):
- global_descriptor = _ResizeAndExtract(scale_index)
- output_global = tf.concat([output_global, global_descriptor], 0)
-
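-  # Optionally pool the per-scale descriptors into a single 1D descriptor.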
- normalization_axis = 1
- if multi_scale_pool_type == 'average':
- output_global = tf.reduce_mean(
- output_global,
- axis=0,
- keepdims=False,
- name='multi_scale_average_pooling')
- normalization_axis = 0
- elif multi_scale_pool_type == 'sum':
- output_global = tf.reduce_sum(
- output_global, axis=0, keepdims=False, name='multi_scale_sum_pooling')
- normalization_axis = 0
-
- if normalize_global_descriptor:
- output_global = tf.nn.l2_normalize(
- output_global, axis=normalization_axis, name='l2_normalization')
-
- return output_global
-
-
-@tf.function
-def ExtractLocalAndGlobalFeatures(image, image_scales, max_feature_num,
- abs_thres, global_scales_ind, iou, model_fn,
- stride_factor):
- """Extract local+global features for input image.
-
- Args:
- image: image tensor of type tf.uint8 with shape [h, w, channels].
- image_scales: 1D float tensor which contains float scales used for image
- pyramid construction.
- max_feature_num: int tensor denoting the maximum selected feature points.
- abs_thres: float tensor denoting the score threshold for feature selection.
- global_scales_ind: Global feature extraction happens only for a subset of
- `image_scales`, those with corresponding indices from this tensor.
- iou: float scalar denoting the iou threshold for NMS.
- model_fn: model function. Follows the signature:
- * Args:
- * `images`: Batched image tensor.
- * Returns:
- * `global_descriptors`: Global descriptors for input images.
- * `attention_prob`: Attention map after the non-linearity.
- * `feature_map`: Feature map after ResNet convolution.
- stride_factor: integer accounting for striding after block3.
-
- Returns:
- boxes: [N, 4] float tensor which denotes the selected receptive boxes. N is
- the number of final feature points which pass through keypoint selection
- and NMS steps.
- local_descriptors: [N, depth] float tensor.
- feature_scales: [N] float tensor. It is the inverse of the input image
- scales such that larger image scales correspond to larger image regions,
- which is compatible with keypoints detected with other techniques, for
- example Congas.
- scores: [N, 1] float tensor denoting the attention score.
- global_descriptors: [S, D] float tensor, with the global descriptors for
- each scale; S is the number of scales, and D the global descriptor
- dimensionality.
- """
- original_image_shape_float = tf.gather(
- tf.dtypes.cast(tf.shape(image), tf.float32), [0, 1])
- image_tensor = gld.NormalizeImages(
- image, pixel_value_offset=128.0, pixel_value_scale=128.0)
- image_tensor = tf.expand_dims(image_tensor, 0, name='image/expand_dims')
-
- # Hard code the receptive field parameters for now.
- # We need to revisit this once we change the architecture and selected
- # convolutional blocks to use as local features.
- rf, stride, padding = [291.0, 16.0 * stride_factor, 145.0]
-
- def _ResizeAndExtract(scale_index):
- """Helper function to resize image then extract features.
-
- Args:
- scale_index: A valid index in image_scales.
-
- Returns:
- global_descriptor: [1,D] tensor denoting the extracted global descriptor.
- boxes: Box tensor with the shape of [K, 4].
- local_descriptors: Local descriptor tensor with the shape of [K, depth].
- scales: Scale tensor with the shape of [K].
- scores: Score tensor with the shape of [K].
- """
- scale = tf.gather(image_scales, scale_index)
- new_image_size = tf.dtypes.cast(
- tf.round(original_image_shape_float * scale), tf.int32)
- resized_image = tf.image.resize(image_tensor, new_image_size)
- global_descriptor, attention_prob, feature_map = model_fn(resized_image)
-
- attention_prob = tf.squeeze(attention_prob, axis=[0])
- feature_map = tf.squeeze(feature_map, axis=[0])
-
- # Compute RF boxes and re-project them to the original image space.
- rf_boxes = feature_extractor.CalculateReceptiveBoxes(
- tf.shape(feature_map)[0],
- tf.shape(feature_map)[1], rf, stride, padding)
- rf_boxes = tf.divide(rf_boxes, scale)
-
- attention_prob = tf.reshape(attention_prob, [-1])
- feature_map = tf.reshape(feature_map, [-1, tf.shape(feature_map)[2]])
-
- # Use attention score to select local features.
- indices = tf.reshape(tf.where(attention_prob >= abs_thres), [-1])
- boxes = tf.gather(rf_boxes, indices)
- local_descriptors = tf.gather(feature_map, indices)
- scores = tf.gather(attention_prob, indices)
- scales = tf.ones_like(scores, tf.float32) / scale
-
- return global_descriptor, boxes, local_descriptors, scales, scores
-
- # TODO(andrearaujo): Currently, a global feature is extracted even for scales
- # which are not using it. The obtained result is correct, however feature
- # extraction is slower than expected. We should try to fix this in the future.
-
- # Run first scale.
- (output_global_descriptors, output_boxes, output_local_descriptors,
- output_scales, output_scores) = _ResizeAndExtract(0)
- if not tf.reduce_any(tf.equal(global_scales_ind, 0)):
- # If global descriptor is not using the first scale, clear it out.
- output_global_descriptors = tf.zeros(
- [0, tf.shape(output_global_descriptors)[1]])
-
- # Loop over subsequent scales.
- num_scales = tf.shape(image_scales)[0]
- for scale_index in tf.range(1, num_scales):
- # Allow an undefined number of global feature scales to be extracted.
- tf.autograph.experimental.set_loop_options(
- shape_invariants=[(output_global_descriptors,
- tf.TensorShape([None, None]))])
-
- (global_descriptor, boxes, local_descriptors, scales,
- scores) = _ResizeAndExtract(scale_index)
- output_boxes = tf.concat([output_boxes, boxes], 0)
- output_local_descriptors = tf.concat(
- [output_local_descriptors, local_descriptors], 0)
- output_scales = tf.concat([output_scales, scales], 0)
- output_scores = tf.concat([output_scores, scores], 0)
- if tf.reduce_any(tf.equal(global_scales_ind, scale_index)):
- output_global_descriptors = tf.concat(
- [output_global_descriptors, global_descriptor], 0)
-
- feature_boxes = box_list.BoxList(output_boxes)
- feature_boxes.add_field('local_descriptors', output_local_descriptors)
- feature_boxes.add_field('scales', output_scales)
- feature_boxes.add_field('scores', output_scores)
-
- nms_max_boxes = tf.minimum(max_feature_num, feature_boxes.num_boxes())
- final_boxes = box_list_ops.non_max_suppression(feature_boxes, iou,
- nms_max_boxes)
-
- return (final_boxes.get(), final_boxes.get_field('local_descriptors'),
- final_boxes.get_field('scales'),
- tf.expand_dims(final_boxes.get_field('scores'),
- 1), output_global_descriptors)
diff --git a/research/delf/delf/python/training/model/global_model.py b/research/delf/delf/python/training/model/global_model.py
deleted file mode 100644
index bfeac376955..00000000000
--- a/research/delf/delf/python/training/model/global_model.py
+++ /dev/null
@@ -1,285 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""CNN Image Retrieval model implementation based on the following papers:
-
- [1] Fine-tuning CNN Image Retrieval with No Human Annotation,
- Radenović F., Tolias G., Chum O., TPAMI 2018 [arXiv]
- https://arxiv.org/abs/1711.02512
-
- [2] CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard
- Examples, Radenović F., Tolias G., Chum O., ECCV 2016 [arXiv]
- https://arxiv.org/abs/1604.02426
-"""
-
-import os
-
-import pickle
-import tensorflow as tf
-
-from delf.python.datasets import generic_dataset
-from delf.python.normalization_layers import normalization
-from delf.python.pooling_layers import pooling as pooling_layers
-from delf.python.training import global_features_utils
-
-# Pre-computed global whitening, for most commonly used architectures.
-# Using pre-computed whitening improves the speed of the convergence and the
-# performance.
-_WHITENING_CONFIG = {
- 'ResNet50': 'http://cmp.felk.cvut.cz/cnnimageretrieval_tf'
- '/SFM120k_ResNet50_gem_learned_whitening_config.pkl',
- 'ResNet101': 'http://cmp.felk.cvut.cz/cnnimageretrieval_tf'
- '/SFM120k_ResNet101_gem_learned_whitening_config.pkl',
- 'ResNet152': 'http://cmp.felk.cvut.cz/cnnimageretrieval_tf'
- '/SFM120k_ResNet152_gem_learned_whitening_config.pkl',
- 'VGG19': 'http://cmp.felk.cvut.cz/cnnimageretrieval_tf'
- '/SFM120k_VGG19_gem_learned_whitening_config.pkl'
-}
-
-# Possible global pooling layers.
-_POOLING = {
- 'mac': pooling_layers.MAC,
- 'spoc': pooling_layers.SPoC,
- 'gem': pooling_layers.GeM
-}
-
-# Output dimensionality for supported architectures.
-_OUTPUT_DIM = {
- 'VGG16': 512,
- 'VGG19': 512,
- 'ResNet50': 2048,
- 'ResNet101': 2048,
- 'ResNet101V2': 2048,
- 'ResNet152': 2048,
- 'DenseNet121': 1024,
- 'DenseNet169': 1664,
- 'DenseNet201': 1920,
- 'EfficientNetB5': 2048,
- 'EfficientNetB7': 2560
-}
-
-
-class GlobalFeatureNet(tf.keras.Model):
- """Instantiates global model for image retrieval.
-
- This class implements the [GlobalFeatureNet](
- https://arxiv.org/abs/1711.02512) for image retrieval. The model uses a
- user-defined model as a backbone.
- """
-
- def __init__(self, architecture='ResNet101', pooling='gem',
- whitening=False, pretrained=True, data_root=''):
- """GlobalFeatureNet network initialization.
-
- Args:
- architecture: Network backbone.
- pooling: Pooling method used 'mac'/'spoc'/'gem'.
- whitening: Bool, whether to use whitening.
- pretrained: Bool, whether to initialize the network with the weights
- pretrained on ImageNet.
- data_root: String, path to the data folder where the precomputed
- whitening is/will be saved in case `whitening` is True.
-
- Raises:
- ValueError: If `architecture` is not supported.
- """
- if architecture not in _OUTPUT_DIM.keys():
- raise ValueError("Architecture {} is not supported.".format(architecture))
-
- super(GlobalFeatureNet, self).__init__()
-
- # Get standard output dimensionality size.
- dim = _OUTPUT_DIM[architecture]
-
- if pretrained:
- # Initialize with network pretrained on imagenet.
- net_in = getattr(tf.keras.applications, architecture)(include_top=False,
- weights="imagenet")
- else:
- # Initialize with random weights.
- net_in = getattr(tf.keras.applications, architecture)(include_top=False,
- weights=None)
-
- # Initialize `feature_extractor`. Take only convolutions for
- # `feature_extractor`, always end with ReLU to make last activations
- # non-negative.
- if architecture.lower().startswith('densenet'):
- tmp_model = tf.keras.Sequential()
- tmp_model.add(net_in)
- net_in = tmp_model
- net_in.add(tf.keras.layers.ReLU())
-
- # Initialize pooling.
- self.pool = _POOLING[pooling]()
-
- # Initialize whitening.
- if whitening:
- if pretrained and architecture in _WHITENING_CONFIG:
- # If precomputed whitening for the architecture exists,
- # the fully-connected layer is going to be initialized according to
- # the precomputed layer configuration.
- global_features_utils.debug_and_log(
- ">> {}: for '{}' custom computed whitening '{}' is used."
- .format(os.getcwd(), architecture,
- os.path.basename(_WHITENING_CONFIG[architecture])))
- # The layer configuration is downloaded to the `data_root` folder.
- whiten_dir = os.path.join(data_root, architecture)
- path = tf.keras.utils.get_file(fname=whiten_dir,
- origin=_WHITENING_CONFIG[architecture])
- # Whitening configuration is loaded.
- with tf.io.gfile.GFile(path, 'rb') as learned_whitening_file:
- whitening_config = pickle.load(learned_whitening_file)
- # Whitening layer is initialized according to the configuration.
- self.whiten = tf.keras.layers.Dense.from_config(whitening_config)
- else:
- # In case if no precomputed whitening exists for the chosen
- # architecture, the fully-connected whitening layer is initialized
- # with the random weights.
- self.whiten = tf.keras.layers.Dense(dim, activation=None, use_bias=True)
- global_features_utils.debug_and_log(
- ">> There is either no whitening computed for the "
- "used network architecture or pretrained is False,"
- " random weights are used.")
- else:
- self.whiten = None
-
- # Create meta information to be stored in the network.
- self.meta = {
- 'architecture': architecture,
- 'pooling': pooling,
- 'whitening': whitening,
- 'outputdim': dim
- }
-
- self.feature_extractor = net_in
- self.normalize = normalization.L2Normalization()
-
- def call(self, x, training=False):
- """Invokes the GlobalFeatureNet instance.
-
- Args:
- x: [B, H, W, C] Tensor with a batch of images.
- training: Indicator of whether the forward pass is running in training
- mode or not.
-
- Returns:
- out: [B, out_dim] Global descriptor.
- """
- # Forward pass through the fully-convolutional backbone.
- o = self.feature_extractor(x, training)
- # Pooling.
- o = self.pool(o)
- # Normalization.
- o = self.normalize(o)
-
- # If whitening exists: the pooled global descriptor is whitened and
- # re-normalized.
- if self.whiten is not None:
- o = self.whiten(o)
- o = self.normalize(o)
- return o
-
- def meta_repr(self):
-    """Provides high-level information about the network.
-
-    Returns:
-      meta: string with the information about the network (used
-        architecture, pooling type, whitening, outputdim).
-    """
- tmpstr = '(meta):\n'
- tmpstr += '\tarchitecture: {}\n'.format(self.meta['architecture'])
- tmpstr += '\tpooling: {}\n'.format(self.meta['pooling'])
- tmpstr += '\twhitening: {}\n'.format(self.meta['whitening'])
- tmpstr += '\toutputdim: {}\n'.format(self.meta['outputdim'])
- return tmpstr
-
-
-def extract_global_descriptors_from_list(net, images, image_size,
- bounding_boxes=None, scales=[1.],
- multi_scale_power=1., print_freq=10):
- """Extracting global descriptors from a list of images.
-
- Args:
- net: Model object, network for the forward pass.
- images: Absolute image paths as strings.
- image_size: Integer, defines the maximum size of longer image side.
- bounding_boxes: List of (x1,y1,x2,y2) tuples to crop the query images.
- scales: List of float scales.
- multi_scale_power: Float, multi-scale normalization power parameter.
- print_freq: Printing frequency for debugging.
-
- Returns:
- descriptors: Global descriptors for the input images.
- """
- # Creating dataset loader.
- data = generic_dataset.ImagesFromList(root='', image_paths=images,
- imsize=image_size,
- bounding_boxes=bounding_boxes)
-
- def _data_gen():
- return (inst for inst in data)
-
- loader = tf.data.Dataset.from_generator(_data_gen, output_types=(tf.float32))
- loader = loader.batch(1)
-
- # Extracting vectors.
- descriptors = tf.zeros((0, net.meta['outputdim']))
- for i, input in enumerate(loader):
- if len(scales) == 1 and scales[0] == 1:
- descriptors = tf.concat([descriptors, net(input)], 0)
- else:
- descriptors = tf.concat(
- [descriptors, extract_multi_scale_descriptor(
- net, input, scales, multi_scale_power)], 0)
-
- if (i + 1) % print_freq == 0 or (i + 1) == len(images):
- global_features_utils.debug_and_log(
- '\r>>>> {}/{} done...'.format((i + 1), len(images)),
- debug_on_the_same_line=True)
- global_features_utils.debug_and_log('', log=False)
-
- descriptors = tf.transpose(descriptors, perm=[1, 0])
- return descriptors
-
-
-def extract_multi_scale_descriptor(net, input, scales, multi_scale_power):
- """Extracts the global descriptor multi scale.
-
- Args:
- net: Model object, network for the forward pass.
- input: [B, H, W, C] input tensor in channel-last (BHWC) configuration.
- scales: List of float scales.
- multi_scale_power: Float, multi-scale normalization power parameter.
-
- Returns:
- descriptors: Multi-scale global descriptors for the input images.
- """
- descriptors = tf.zeros(net.meta['outputdim'])
-
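-  # Generalized-mean aggregation over scales: average the descriptors raised
-  # to multi_scale_power, then take the inverse power and L2-normalize.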
- for s in scales:
- if s == 1:
- input_t = input
- else:
- output_shape = s * tf.shape(input)[1:3].numpy()
- input_t = tf.image.resize(input, output_shape,
- method='bilinear',
- preserve_aspect_ratio=True)
- descriptors += tf.pow(net(input_t), multi_scale_power)
-
- descriptors /= len(scales)
- descriptors = tf.pow(descriptors, 1. / multi_scale_power)
- descriptors /= tf.norm(descriptors)
-
- return descriptors
diff --git a/research/delf/delf/python/training/model/global_model_test.py b/research/delf/delf/python/training/model/global_model_test.py
deleted file mode 100644
index b171a089d57..00000000000
--- a/research/delf/delf/python/training/model/global_model_test.py
+++ /dev/null
@@ -1,86 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for the GlobalFeatureNet backbone."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-
-import numpy as np
-from PIL import Image
-import tensorflow as tf
-
-from delf.python.training.model import global_model
-
-FLAGS = flags.FLAGS
-
-
-class GlobalFeatureNetTest(tf.test.TestCase):
- """Tests for the GlobalFeatureNet backbone."""
-
- def testInitModel(self):
- """Testing GlobalFeatureNet initialization."""
- # Testing GlobalFeatureNet initialization.
- model_params = {'architecture': 'ResNet101', 'pooling': 'gem',
- 'whitening': False, 'pretrained': True}
- model = global_model.GlobalFeatureNet(**model_params)
- expected_meta = {'architecture': 'ResNet101', 'pooling': 'gem',
- 'whitening': False, 'outputdim': 2048}
- self.assertEqual(expected_meta, model.meta)
-
- def testExtractVectors(self):
- """Tests extraction of global descriptors from list."""
- # Initializing network for testing.
- model_params = {'architecture': 'ResNet101', 'pooling': 'gem',
- 'whitening': False, 'pretrained': True}
- model = global_model.GlobalFeatureNet(**model_params)
-
- # Number of images to be created.
- n = 2
- image_paths = []
-
- # Create `n` dummy images.
- for i in range(n):
- dummy_image = np.random.rand(1024, 750, 3) * 255
- img_out = Image.fromarray(dummy_image.astype('uint8')).convert('RGB')
- filename = os.path.join(FLAGS.test_tmpdir, 'test_image_{}.jpg'.format(i))
- img_out.save(filename)
- image_paths.append(filename)
-
- descriptors = global_model.extract_global_descriptors_from_list(
- model, image_paths, image_size=1024, bounding_boxes=None,
- scales=[1., 3.], multi_scale_power=2, print_freq=1)
- self.assertAllEqual([2048, 2], tf.shape(descriptors))
-
- def testExtractMultiScale(self):
- """Tests multi-scale global descriptor extraction."""
- # Initializing network for testing.
- model_params = {'architecture': 'ResNet101', 'pooling': 'gem',
- 'whitening': False, 'pretrained': True}
- model = global_model.GlobalFeatureNet(**model_params)
-
- input = tf.random.uniform([2, 1024, 750, 3], dtype=tf.float32, seed=0)
- descriptors = global_model.extract_multi_scale_descriptor(
- model, input, scales=[1., 3.], multi_scale_power=2)
- self.assertAllEqual([2, 2048], tf.shape(descriptors))
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/training/model/resnet50.py b/research/delf/delf/python/training/model/resnet50.py
deleted file mode 100644
index 3718ac5b05f..00000000000
--- a/research/delf/delf/python/training/model/resnet50.py
+++ /dev/null
@@ -1,460 +0,0 @@
-# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""ResNet50 backbone used in DELF model.
-
-Copied over from tensorflow/python/eager/benchmarks/resnet50/resnet50.py,
-because that code does not support dependencies.
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import functools
-import os
-import tempfile
-
-from absl import logging
-import h5py
-import tensorflow as tf
-
-from delf.python.pooling_layers import pooling as pooling_layers
-
-layers = tf.keras.layers
-
-
-class _IdentityBlock(tf.keras.Model):
- """_IdentityBlock is the block that has no conv layer at shortcut.
-
- Args:
- kernel_size: the kernel size of the middle conv layer in the main path
- filters: list of integers, the filters of the 3 conv layers in the main path
- stage: integer, current stage label, used for generating layer names
- block: 'a','b'..., current block label, used for generating layer names
- data_format: data_format for the input ('channels_first' or
- 'channels_last').
- """
-
- def __init__(self, kernel_size, filters, stage, block, data_format):
- super(_IdentityBlock, self).__init__(name='')
- filters1, filters2, filters3 = filters
-
- conv_name_base = 'res' + str(stage) + block + '_branch'
- bn_name_base = 'bn' + str(stage) + block + '_branch'
- bn_axis = 1 if data_format == 'channels_first' else 3
-
- self.conv2a = layers.Conv2D(
- filters1, (1, 1), name=conv_name_base + '2a', data_format=data_format)
- self.bn2a = layers.BatchNormalization(
- axis=bn_axis, name=bn_name_base + '2a')
-
- self.conv2b = layers.Conv2D(
- filters2,
- kernel_size,
- padding='same',
- data_format=data_format,
- name=conv_name_base + '2b')
- self.bn2b = layers.BatchNormalization(
- axis=bn_axis, name=bn_name_base + '2b')
-
- self.conv2c = layers.Conv2D(
- filters3, (1, 1), name=conv_name_base + '2c', data_format=data_format)
- self.bn2c = layers.BatchNormalization(
- axis=bn_axis, name=bn_name_base + '2c')
-
- def call(self, input_tensor, training=False):
- x = self.conv2a(input_tensor)
- x = self.bn2a(x, training=training)
- x = tf.nn.relu(x)
-
- x = self.conv2b(x)
- x = self.bn2b(x, training=training)
- x = tf.nn.relu(x)
-
- x = self.conv2c(x)
- x = self.bn2c(x, training=training)
-
- x += input_tensor
- return tf.nn.relu(x)
-
-
-class _ConvBlock(tf.keras.Model):
- """_ConvBlock is the block that has a conv layer at shortcut.
-
- Args:
- kernel_size: the kernel size of the middle conv layer in the main path
- filters: list of integers, the filters of the 3 conv layers in the main path
- stage: integer, current stage label, used for generating layer names
- block: 'a','b'..., current block label, used for generating layer names
- data_format: data_format for the input ('channels_first' or
- 'channels_last').
- strides: strides for the convolution. Note that from stage 3, the first
- conv layer in the main path uses strides=(2, 2), and the shortcut should
- use strides=(2, 2) as well.
- """
-
- def __init__(self,
- kernel_size,
- filters,
- stage,
- block,
- data_format,
- strides=(2, 2)):
- super(_ConvBlock, self).__init__(name='')
- filters1, filters2, filters3 = filters
-
- conv_name_base = 'res' + str(stage) + block + '_branch'
- bn_name_base = 'bn' + str(stage) + block + '_branch'
- bn_axis = 1 if data_format == 'channels_first' else 3
-
- self.conv2a = layers.Conv2D(
- filters1, (1, 1),
- strides=strides,
- name=conv_name_base + '2a',
- data_format=data_format)
- self.bn2a = layers.BatchNormalization(
- axis=bn_axis, name=bn_name_base + '2a')
-
- self.conv2b = layers.Conv2D(
- filters2,
- kernel_size,
- padding='same',
- name=conv_name_base + '2b',
- data_format=data_format)
- self.bn2b = layers.BatchNormalization(
- axis=bn_axis, name=bn_name_base + '2b')
-
- self.conv2c = layers.Conv2D(
- filters3, (1, 1), name=conv_name_base + '2c', data_format=data_format)
- self.bn2c = layers.BatchNormalization(
- axis=bn_axis, name=bn_name_base + '2c')
-
- self.conv_shortcut = layers.Conv2D(
- filters3, (1, 1),
- strides=strides,
- name=conv_name_base + '1',
- data_format=data_format)
- self.bn_shortcut = layers.BatchNormalization(
- axis=bn_axis, name=bn_name_base + '1')
-
- def call(self, input_tensor, training=False):
- x = self.conv2a(input_tensor)
- x = self.bn2a(x, training=training)
- x = tf.nn.relu(x)
-
- x = self.conv2b(x)
- x = self.bn2b(x, training=training)
- x = tf.nn.relu(x)
-
- x = self.conv2c(x)
- x = self.bn2c(x, training=training)
-
- shortcut = self.conv_shortcut(input_tensor)
- shortcut = self.bn_shortcut(shortcut, training=training)
-
- x += shortcut
- return tf.nn.relu(x)
-
-
-# pylint: disable=not-callable
-class ResNet50(tf.keras.Model):
- """Instantiates the ResNet50 architecture.
-
- Args:
- data_format: format for the image. Either 'channels_first' or
- 'channels_last'. 'channels_first' is typically faster on GPUs while
- 'channels_last' is typically faster on CPUs. See
- https://www.tensorflow.org/performance/performance_guide#data_formats
- name: Prefix applied to names of variables created in the model.
- include_top: whether to include the fully-connected layer at the top of the
- network.
- pooling: Optional pooling mode for feature extraction when `include_top` is
- False. 'None' means that the output of the model will be the 4D tensor
- output of the last convolutional layer. 'avg' means that global average
- pooling will be applied to the output of the last convolutional layer, and
- thus the output of the model will be a 2D tensor. 'max' means that global
- max pooling will be applied. 'gem' means GeM pooling will be applied.
- block3_strides: whether to add a stride of 2 to block3 to make it compatible
- with tf.slim ResNet implementation.
- average_pooling: whether to do average pooling of block4 features before
- global pooling.
- classes: optional number of classes to classify images into, only to be
- specified if `include_top` is True.
- gem_power: GeM power for GeM pooling. Only used if pooling == 'gem'.
- embedding_layer: whether to create an embedding layer (FC whitening layer).
- embedding_layer_dim: size of the embedding layer.
-
- Raises:
- ValueError: in case of invalid argument for data_format.
- """
-
- def __init__(self,
- data_format,
- name='',
- include_top=True,
- pooling=None,
- block3_strides=False,
- average_pooling=True,
- classes=1000,
- gem_power=3.0,
- embedding_layer=False,
- embedding_layer_dim=2048):
- super(ResNet50, self).__init__(name=name)
-
- valid_channel_values = ('channels_first', 'channels_last')
- if data_format not in valid_channel_values:
- raise ValueError('Unknown data_format: %s. Valid values: %s' %
- (data_format, valid_channel_values))
- self.include_top = include_top
- self.block3_strides = block3_strides
- self.average_pooling = average_pooling
- self.pooling = pooling
-
- def conv_block(filters, stage, block, strides=(2, 2)):
- return _ConvBlock(
- 3,
- filters,
- stage=stage,
- block=block,
- data_format=data_format,
- strides=strides)
-
- def id_block(filters, stage, block):
- return _IdentityBlock(
- 3, filters, stage=stage, block=block, data_format=data_format)
-
- self.conv1 = layers.Conv2D(
- 64, (7, 7),
- strides=(2, 2),
- data_format=data_format,
- padding='same',
- name='conv1')
- bn_axis = 1 if data_format == 'channels_first' else 3
- self.bn_conv1 = layers.BatchNormalization(axis=bn_axis, name='bn_conv1')
- self.max_pool = layers.MaxPooling2D((3, 3),
- strides=(2, 2),
- data_format=data_format)
-
- self.l2a = conv_block([64, 64, 256], stage=2, block='a', strides=(1, 1))
- self.l2b = id_block([64, 64, 256], stage=2, block='b')
- self.l2c = id_block([64, 64, 256], stage=2, block='c')
-
- self.l3a = conv_block([128, 128, 512], stage=3, block='a')
- self.l3b = id_block([128, 128, 512], stage=3, block='b')
- self.l3c = id_block([128, 128, 512], stage=3, block='c')
- self.l3d = id_block([128, 128, 512], stage=3, block='d')
-
- self.l4a = conv_block([256, 256, 1024], stage=4, block='a')
- self.l4b = id_block([256, 256, 1024], stage=4, block='b')
- self.l4c = id_block([256, 256, 1024], stage=4, block='c')
- self.l4d = id_block([256, 256, 1024], stage=4, block='d')
- self.l4e = id_block([256, 256, 1024], stage=4, block='e')
- self.l4f = id_block([256, 256, 1024], stage=4, block='f')
-
- # Striding layer that can be used on top of block3 to produce feature maps
- # with the same resolution as the TF-Slim implementation.
- if self.block3_strides:
- self.subsampling_layer = layers.MaxPooling2D((1, 1),
- strides=(2, 2),
- data_format=data_format)
- self.l5a = conv_block([512, 512, 2048],
- stage=5,
- block='a',
- strides=(1, 1))
- else:
- self.l5a = conv_block([512, 512, 2048], stage=5, block='a')
- self.l5b = id_block([512, 512, 2048], stage=5, block='b')
- self.l5c = id_block([512, 512, 2048], stage=5, block='c')
-
- self.avg_pool = layers.AveragePooling2D((7, 7),
- strides=(7, 7),
- data_format=data_format)
-
- if self.include_top:
- self.flatten = layers.Flatten()
- self.fc1000 = layers.Dense(classes, name='fc1000')
- else:
- reduction_indices = [1, 2] if data_format == 'channels_last' else [2, 3]
- reduction_indices = tf.constant(reduction_indices)
- if pooling == 'avg':
- self.global_pooling = functools.partial(
- tf.reduce_mean, axis=reduction_indices, keepdims=False)
- elif pooling == 'max':
- self.global_pooling = functools.partial(
- tf.reduce_max, axis=reduction_indices, keepdims=False)
- elif pooling == 'gem':
- logging.info('Adding GeMPooling layer with power %f', gem_power)
- self.global_pooling = functools.partial(
- pooling_layers.gem, axis=reduction_indices, power=gem_power)
- else:
- self.global_pooling = None
- if embedding_layer:
- logging.info('Adding embedding layer with dimension %d',
- embedding_layer_dim)
- self.embedding_layer = layers.Dense(
- embedding_layer_dim, name='embedding_layer')
- else:
- self.embedding_layer = None
-
- def build_call(self, inputs, training=True, intermediates_dict=None):
- """Building the ResNet50 model.
-
- Args:
- inputs: Images to compute features for.
- training: Whether model is in training phase.
- intermediates_dict: `None` or dictionary. If not None, accumulate feature
- maps from intermediate blocks into the dictionary.
-
- Returns:
- Tensor with featuremap.
- """
-
- x = self.conv1(inputs)
- x = self.bn_conv1(x, training=training)
- x = tf.nn.relu(x)
- if intermediates_dict is not None:
- intermediates_dict['block0'] = x
-
- x = self.max_pool(x)
- if intermediates_dict is not None:
- intermediates_dict['block0mp'] = x
-
- # Block 1 (equivalent to "conv2" in Resnet paper).
- x = self.l2a(x, training=training)
- x = self.l2b(x, training=training)
- x = self.l2c(x, training=training)
- if intermediates_dict is not None:
- intermediates_dict['block1'] = x
-
- # Block 2 (equivalent to "conv3" in Resnet paper).
- x = self.l3a(x, training=training)
- x = self.l3b(x, training=training)
- x = self.l3c(x, training=training)
- x = self.l3d(x, training=training)
- if intermediates_dict is not None:
- intermediates_dict['block2'] = x
-
- # Block 3 (equivalent to "conv4" in Resnet paper).
- x = self.l4a(x, training=training)
- x = self.l4b(x, training=training)
- x = self.l4c(x, training=training)
- x = self.l4d(x, training=training)
- x = self.l4e(x, training=training)
- x = self.l4f(x, training=training)
-
- if self.block3_strides:
- x = self.subsampling_layer(x)
- if intermediates_dict is not None:
- intermediates_dict['block3'] = x
- else:
- if intermediates_dict is not None:
- intermediates_dict['block3'] = x
-
- x = self.l5a(x, training=training)
- x = self.l5b(x, training=training)
- x = self.l5c(x, training=training)
-
- if self.average_pooling:
- x = self.avg_pool(x)
- if intermediates_dict is not None:
- intermediates_dict['block4'] = x
- else:
- if intermediates_dict is not None:
- intermediates_dict['block4'] = x
-
- if self.include_top:
- return self.fc1000(self.flatten(x))
- elif self.global_pooling:
- x = self.global_pooling(x)
- if self.embedding_layer:
- x = self.embedding_layer(x)
- return x
- else:
- return x
-
- def call(self, inputs, training=True, intermediates_dict=None):
- """Call the ResNet50 model.
-
- Args:
- inputs: Images to compute features for.
- training: Whether model is in training phase.
- intermediates_dict: `None` or dictionary. If not None, accumulate feature
- maps from intermediate blocks into the dictionary.
-
- Returns:
- Tensor with featuremap.
- """
- return self.build_call(inputs, training, intermediates_dict)
-
- def restore_weights(self, filepath):
- """Load pretrained weights.
-
- This function loads a .h5 file from the filepath with saved model weights
- and assigns them to the model.
-
- Args:
- filepath: String, path to the .h5 file
-
- Raises:
- ValueError: if the file referenced by `filepath` does not exist.
- """
- if not tf.io.gfile.exists(filepath):
- raise ValueError('Unable to load weights from %s. You must provide a '
- 'valid file.' % (filepath))
-
- # Create a local copy of the weights file for h5py to be able to read it.
- local_filename = os.path.basename(filepath)
- tmp_filename = os.path.join(tempfile.gettempdir(), local_filename)
- tf.io.gfile.copy(filepath, tmp_filename, overwrite=True)
-
- # Load the content of the weights file.
- f = h5py.File(tmp_filename, mode='r')
- saved_layer_names = [n.decode('utf8') for n in f.attrs['layer_names']]
-
- try:
- # Iterate through all the layers assuming the max `depth` is 2.
- for layer in self.layers:
- if hasattr(layer, 'layers'):
- for inlayer in layer.layers:
- # Make sure the weights are in the saved model, and that we are in
- # the innermost layer.
- if inlayer.name not in saved_layer_names:
- raise ValueError('Layer %s absent from the pretrained weights. '
- 'Unable to load its weights.' % (inlayer.name))
- if hasattr(inlayer, 'layers'):
- raise ValueError('Layer %s is not a depth 2 layer. Unable to load '
- 'its weights.' % (inlayer.name))
- # Assign the weights in the current layer.
- g = f[inlayer.name]
- weight_names = [n.decode('utf8') for n in g.attrs['weight_names']]
- weight_values = [g[weight_name] for weight_name in weight_names]
- logging.info('Setting the weights for layer %s', inlayer.name)
- inlayer.set_weights(weight_values)
- finally:
- # Clean up the temporary file.
- tf.io.gfile.remove(tmp_filename)
-
- def log_weights(self):
- """Log backbone weights."""
- logging.info('Logging backbone weights')
- logging.info('------------------------')
- for layer in self.layers:
- if hasattr(layer, 'layers'):
- for inlayer in layer.layers:
- logging.info('Weights for layer: %s, inlayer %s', layer.name,
- inlayer.name)
- weights = inlayer.get_weights()
- logging.info(weights)
- else:
- logging.info('Layer %s does not have inner layers.', layer.name)
diff --git a/research/delf/delf/python/training/tensorboard_utils.py b/research/delf/delf/python/training/tensorboard_utils.py
deleted file mode 100644
index f1d5e3f23e5..00000000000
--- a/research/delf/delf/python/training/tensorboard_utils.py
+++ /dev/null
@@ -1,31 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Utilities for tensorboard."""
-
-from tensorboard import program
-
-from delf.python.training import global_features_utils
-
-
-def launch_tensorboard(log_dir):
- """Runs tensorboard with the given `log_dir`.
-
- Args:
- log_dir: String, directory to launch tensorboard in.
- """
- tensorboard = program.TensorBoard()
- tensorboard.configure(argv=[None, '--logdir', log_dir])
- url = tensorboard.launch()
- global_features_utils.debug_and_log("Launching Tensorboard: {}".format(url))
diff --git a/research/delf/delf/python/training/train.py b/research/delf/delf/python/training/train.py
deleted file mode 100644
index d21decdd49b..00000000000
--- a/research/delf/delf/python/training/train.py
+++ /dev/null
@@ -1,558 +0,0 @@
-# Lint as: python3
-# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Training script for DELF/G on Google Landmarks Dataset.
-
-Uses classification loss, with MirroredStrategy, to support running on multiple
-GPUs.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-import time
-
-from absl import app
-from absl import flags
-from absl import logging
-import tensorflow as tf
-import tensorflow_probability as tfp
-
-# Placeholder for internal import. Do not remove this line.
-from delf.python.datasets.google_landmarks_dataset import googlelandmarks as gld
-from delf.python.training.model import delf_model
-from delf.python.training.model import delg_model
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_boolean('debug', False, 'Debug mode.')
- flags.DEFINE_string('logdir', '/tmp/delf', 'TensorBoard logdir.')
-flags.DEFINE_string('train_file_pattern', '/tmp/data/train*',
- 'File pattern of training dataset files.')
-flags.DEFINE_string('validation_file_pattern', '/tmp/data/validation*',
- 'File pattern of validation dataset files.')
-flags.DEFINE_enum(
- 'dataset_version', 'gld_v1', ['gld_v1', 'gld_v2', 'gld_v2_clean'],
- 'Google Landmarks dataset version, used to determine the number of '
- 'classes.')
-flags.DEFINE_integer('seed', 0, 'Seed to training dataset.')
-flags.DEFINE_float('initial_lr', 0.01, 'Initial learning rate.')
-flags.DEFINE_integer('batch_size', 32, 'Global batch size.')
-flags.DEFINE_integer('max_iters', 500000, 'Maximum iterations.')
-flags.DEFINE_boolean('block3_strides', True, 'Whether to use block3_strides.')
-flags.DEFINE_boolean('use_augmentation', True,
- 'Whether to use ImageNet style augmentation.')
-flags.DEFINE_string(
- 'imagenet_checkpoint', None,
- 'ImageNet checkpoint for ResNet backbone. If None, no checkpoint is used.')
-flags.DEFINE_float(
- 'attention_loss_weight', 1.0,
- 'Weight to apply to the attention loss when calculating the '
- 'total loss of the model.')
-flags.DEFINE_boolean('delg_global_features', False,
- 'Whether to train a DELG model.')
-flags.DEFINE_float(
- 'delg_gem_power', 3.0, 'Power for Generalized Mean pooling. Used only if '
- 'delg_global_features=True.')
-flags.DEFINE_integer(
- 'delg_embedding_layer_dim', 2048,
- 'Size of the FC whitening layer (embedding layer). Used only if '
- 'delg_global_features=True.')
-flags.DEFINE_float(
- 'delg_scale_factor_init', 45.25,
- 'Initial value of the scaling factor of the cosine logits. The default '
- 'value is sqrt(2048). Used only if delg_global_features=True.')
-flags.DEFINE_float('delg_arcface_margin', 0.1,
- 'ArcFace margin. Used only if delg_global_features=True.')
-flags.DEFINE_integer('image_size', 321, 'Size of each image side to use.')
-flags.DEFINE_boolean('use_autoencoder', True,
- 'Whether to train an autoencoder.')
-flags.DEFINE_float(
- 'reconstruction_loss_weight', 10.0,
- 'Weight to apply to the reconstruction loss from the autoencoder when '
- 'calculating total loss of the model. Used only if use_autoencoder=True.')
-flags.DEFINE_float(
- 'autoencoder_dimensions', 128,
- 'Number of dimensions of the autoencoder. Used only if '
- 'use_autoencoder=True.')
-flags.DEFINE_float(
- 'local_feature_map_channels', 1024,
- 'Number of channels at backbone layer used for local feature extraction. '
- 'Default value 1024 is the number of channels of block3. Used only if '
- 'use_autoencoder=True.')
-
-
-def _record_accuracy(metric, logits, labels):
- """Record accuracy given predicted logits and ground-truth labels."""
- softmax_probabilities = tf.keras.layers.Softmax()(logits)
- metric.update_state(labels, softmax_probabilities)
-
-
-def _attention_summaries(scores, global_step):
- """Record statistics of the attention score."""
- tf.summary.image(
- 'batch_attention',
- scores / tf.reduce_max(scores + 1e-3),
- step=global_step)
- tf.summary.scalar('attention/max', tf.reduce_max(scores), step=global_step)
- tf.summary.scalar('attention/min', tf.reduce_min(scores), step=global_step)
- tf.summary.scalar('attention/mean', tf.reduce_mean(scores), step=global_step)
- tf.summary.scalar(
- 'attention/percent_25',
- tfp.stats.percentile(scores, 25.0),
- step=global_step)
- tf.summary.scalar(
- 'attention/percent_50',
- tfp.stats.percentile(scores, 50.0),
- step=global_step)
- tf.summary.scalar(
- 'attention/percent_75',
- tfp.stats.percentile(scores, 75.0),
- step=global_step)
-
-
-def create_model(num_classes):
- """Define DELF model, and initialize classifiers."""
- if FLAGS.delg_global_features:
- model = delg_model.Delg(
- block3_strides=FLAGS.block3_strides,
- name='DELG',
- gem_power=FLAGS.delg_gem_power,
- embedding_layer_dim=FLAGS.delg_embedding_layer_dim,
- scale_factor_init=FLAGS.delg_scale_factor_init,
- arcface_margin=FLAGS.delg_arcface_margin,
- use_dim_reduction=FLAGS.use_autoencoder,
- reduced_dimension=FLAGS.autoencoder_dimensions,
- dim_expand_channels=FLAGS.local_feature_map_channels)
- else:
- model = delf_model.Delf(
- block3_strides=FLAGS.block3_strides,
- name='DELF',
- use_dim_reduction=FLAGS.use_autoencoder,
- reduced_dimension=FLAGS.autoencoder_dimensions,
- dim_expand_channels=FLAGS.local_feature_map_channels)
- model.init_classifiers(num_classes)
- return model
-
-
-def _learning_rate_schedule(global_step_value, max_iters, initial_lr):
- """Calculates learning_rate with linear decay.
-
- Args:
- global_step_value: int, global step.
- max_iters: int, maximum iterations.
- initial_lr: float, initial learning rate.
-
- Returns:
- lr: float, learning rate.
- """
- lr = initial_lr * (1.0 - global_step_value / max_iters)
- return lr
-
-
-def main(argv):
- if len(argv) > 1:
- raise app.UsageError('Too many command-line arguments.')
-
- # ------------------------------------------------------------
- # Log flags used.
- logging.info('Running training script with\n')
- logging.info('logdir= %s', FLAGS.logdir)
- logging.info('initial_lr= %f', FLAGS.initial_lr)
- logging.info('block3_strides= %s', str(FLAGS.block3_strides))
-
- # ------------------------------------------------------------
- # Create the strategy.
- strategy = tf.distribute.MirroredStrategy()
- logging.info('Number of devices: %d', strategy.num_replicas_in_sync)
- if FLAGS.debug:
- print('Number of devices:', strategy.num_replicas_in_sync)
-
- max_iters = FLAGS.max_iters
- global_batch_size = FLAGS.batch_size
- image_size = FLAGS.image_size
- num_eval_batches = int(50000 / global_batch_size)
- report_interval = 100
- eval_interval = 1000
- save_interval = 1000
-
- initial_lr = FLAGS.initial_lr
-
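- # Global-norm threshold used to clip gradients in the train step.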
- clip_val = tf.constant(10.0)
-
- if FLAGS.debug:
- tf.config.run_functions_eagerly(True)
- global_batch_size = 4
- max_iters = 100
- num_eval_batches = 1
- save_interval = 1
- report_interval = 10
-
- # Determine the number of classes based on the version of the dataset.
- gld_info = gld.GoogleLandmarksInfo()
- num_classes = gld_info.num_classes[FLAGS.dataset_version]
-
- # ------------------------------------------------------------
- # Create the distributed train/validation sets.
- train_dataset = gld.CreateDataset(
- file_pattern=FLAGS.train_file_pattern,
- batch_size=global_batch_size,
- image_size=image_size,
- augmentation=FLAGS.use_augmentation,
- seed=FLAGS.seed)
- validation_dataset = gld.CreateDataset(
- file_pattern=FLAGS.validation_file_pattern,
- batch_size=global_batch_size,
- image_size=image_size,
- augmentation=False,
- seed=FLAGS.seed)
-
- train_dist_dataset = strategy.experimental_distribute_dataset(train_dataset)
- validation_dist_dataset = strategy.experimental_distribute_dataset(
- validation_dataset)
-
- train_iter = iter(train_dist_dataset)
- validation_iter = iter(validation_dist_dataset)
-
- # Create a checkpoint directory to store the checkpoints.
- checkpoint_prefix = os.path.join(FLAGS.logdir, 'delf_tf2-ckpt')
-
- # ------------------------------------------------------------
- # Finally, we do everything in distributed scope.
- with strategy.scope():
- # Compute loss.
- # Set reduction to `none` so we can do the reduction afterwards and divide
- # by global batch size.
- loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
- from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
-
- def compute_loss(labels, predictions):
- per_example_loss = loss_object(labels, predictions)
- return tf.nn.compute_average_loss(
- per_example_loss, global_batch_size=global_batch_size)
-
- # Set up metrics.
- desc_validation_loss = tf.keras.metrics.Mean(name='desc_validation_loss')
- attn_validation_loss = tf.keras.metrics.Mean(name='attn_validation_loss')
- desc_train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
- name='desc_train_accuracy')
- attn_train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
- name='attn_train_accuracy')
- desc_validation_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
- name='desc_validation_accuracy')
- attn_validation_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
- name='attn_validation_accuracy')
-
- # ------------------------------------------------------------
- # Setup DELF model and optimizer.
- model = create_model(num_classes)
- logging.info('Model, datasets loaded.\nnum_classes= %d', num_classes)
-
- optimizer = tf.keras.optimizers.SGD(learning_rate=initial_lr, momentum=0.9)
-
- # Setup summary writer.
- summary_writer = tf.summary.create_file_writer(
- os.path.join(FLAGS.logdir, 'train_logs'), flush_millis=10000)
-
- # Setup checkpoint directory.
- checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
- manager = tf.train.CheckpointManager(
- checkpoint,
- checkpoint_prefix,
- max_to_keep=10,
- keep_checkpoint_every_n_hours=3)
- # Restores the checkpoint, if existing.
- checkpoint.restore(manager.latest_checkpoint)
-
- # ------------------------------------------------------------
- # Train step to run on one GPU.
- def train_step(inputs):
- """Train one batch."""
- images, labels = inputs
- # Temporary workaround to avoid some corrupted labels.
- labels = tf.clip_by_value(labels, 0, model.num_classes)
-
- def _backprop_loss(tape, loss, weights):
- """Backpropogate losses using clipped gradients.
-
- Args:
- tape: gradient tape.
- loss: scalar Tensor, loss value.
- weights: keras model weights.
- """
- gradients = tape.gradient(loss, weights)
- clipped, _ = tf.clip_by_global_norm(gradients, clip_norm=clip_val)
- optimizer.apply_gradients(zip(clipped, weights))
-
- # Record gradients and loss through backbone.
- with tf.GradientTape() as gradient_tape:
- # Make a forward pass to calculate prelogits.
- (desc_prelogits, attn_prelogits, attn_scores, backbone_blocks,
- dim_expanded_features, _) = model.global_and_local_forward_pass(images)
-
- # Calculate global loss by applying the descriptor classifier.
- if FLAGS.delg_global_features:
- desc_logits = model.desc_classification(desc_prelogits, labels)
- else:
- desc_logits = model.desc_classification(desc_prelogits)
- desc_loss = compute_loss(labels, desc_logits)
-
- # Calculate attention loss by applying the attention block classifier.
- attn_logits = model.attn_classification(attn_prelogits)
- attn_loss = compute_loss(labels, attn_logits)
-
- # Calculate reconstruction loss between the attention prelogits and the
- # backbone.
- if FLAGS.use_autoencoder:
- block3 = tf.stop_gradient(backbone_blocks['block3'])
- reconstruction_loss = tf.math.reduce_mean(
- tf.keras.losses.MSE(block3, dim_expanded_features))
- else:
- reconstruction_loss = 0
-
- # Combine the global loss, attention loss and reconstruction loss.
- total_loss = (
- desc_loss + FLAGS.attention_loss_weight * attn_loss +
- FLAGS.reconstruction_loss_weight * reconstruction_loss)
-
- # Perform backpropagation through the descriptor and attention layers
- # together. Note that this will increment the number of iterations of
- # "optimizer".
- _backprop_loss(gradient_tape, total_loss, model.trainable_weights)
-
- # Step number, for summary purposes.
- global_step = optimizer.iterations
-
- # Input image-related summaries.
- tf.summary.image('batch_images', (images + 1.0) / 2.0, step=global_step)
- tf.summary.scalar(
- 'image_range/max', tf.reduce_max(images), step=global_step)
- tf.summary.scalar(
- 'image_range/min', tf.reduce_min(images), step=global_step)
-
- # Attention and sparsity summaries.
- _attention_summaries(attn_scores, global_step)
- activations_zero_fractions = {
- 'sparsity/%s' % k: tf.nn.zero_fraction(v)
- for k, v in backbone_blocks.items()
- }
- for k, v in activations_zero_fractions.items():
- tf.summary.scalar(k, v, step=global_step)
-
- # Scaling factor summary for cosine logits for a DELG model.
- if FLAGS.delg_global_features:
- tf.summary.scalar(
- 'desc/scale_factor', model.scale_factor, step=global_step)
-
- # Record train accuracies.
- _record_accuracy(desc_train_accuracy, desc_logits, labels)
- _record_accuracy(attn_train_accuracy, attn_logits, labels)
-
- return desc_loss, attn_loss, reconstruction_loss
-
- # ------------------------------------------------------------
- def validation_step(inputs):
- """Validate one batch."""
- images, labels = inputs
- labels = tf.clip_by_value(labels, 0, model.num_classes)
-
- # Get descriptor predictions.
- blocks = {}
- prelogits = model.backbone(
- images, intermediates_dict=blocks, training=False)
- if FLAGS.delg_global_features:
- logits = model.desc_classification(prelogits, labels, training=False)
- else:
- logits = model.desc_classification(prelogits, training=False)
- softmax_probabilities = tf.keras.layers.Softmax()(logits)
-
- validation_loss = loss_object(labels, logits)
- desc_validation_loss.update_state(validation_loss)
- desc_validation_accuracy.update_state(labels, softmax_probabilities)
-
- # Get attention predictions.
- block3 = blocks['block3'] # pytype: disable=key-error
- prelogits, _, _ = model.attention(block3, training=False)
-
- logits = model.attn_classification(prelogits, training=False)
- softmax_probabilities = tf.keras.layers.Softmax()(logits)
-
- validation_loss = loss_object(labels, logits)
- attn_validation_loss.update_state(validation_loss)
- attn_validation_accuracy.update_state(labels, softmax_probabilities)
-
- return desc_validation_accuracy.result(), attn_validation_accuracy.result(
- )
-
- # `run` replicates the provided computation and runs it
- # with the distributed input.
- @tf.function
- def distributed_train_step(dataset_inputs):
- """Get the actual losses."""
- # Per-replica descriptor, attention and reconstruction losses from train_step.
- desc_per_replica_loss, attn_per_replica_loss, recon_per_replica_loss = (
- strategy.run(train_step, args=(dataset_inputs,)))
-
- # Reduce over the replicas.
- desc_global_loss = strategy.reduce(
- tf.distribute.ReduceOp.SUM, desc_per_replica_loss, axis=None)
- attn_global_loss = strategy.reduce(
- tf.distribute.ReduceOp.SUM, attn_per_replica_loss, axis=None)
- recon_global_loss = strategy.reduce(
- tf.distribute.ReduceOp.SUM, recon_per_replica_loss, axis=None)
-
- return desc_global_loss, attn_global_loss, recon_global_loss
-
- @tf.function
- def distributed_validation_step(dataset_inputs):
- return strategy.run(validation_step, args=(dataset_inputs,))
-
- # ------------------------------------------------------------
- # *** TRAIN LOOP ***
- with summary_writer.as_default():
- record_cond = lambda: tf.equal(optimizer.iterations % report_interval, 0)
- with tf.summary.record_if(record_cond):
- global_step_value = optimizer.iterations.numpy()
-
- # TODO(dananghel): try to load pretrained weights at backbone creation.
- # Load pretrained weights for ResNet50 trained on ImageNet.
- if (FLAGS.imagenet_checkpoint is not None) and (not global_step_value):
- logging.info('Attempting to load ImageNet pretrained weights.')
- input_batch = next(train_iter)
- _, _, _ = distributed_train_step(input_batch)
- model.backbone.restore_weights(FLAGS.imagenet_checkpoint)
- logging.info('Done.')
- else:
- logging.info('Skip loading ImageNet pretrained weights.')
- if FLAGS.debug:
- model.backbone.log_weights()
-
- last_summary_step_value = None
- last_summary_time = None
- while global_step_value < max_iters:
- # input_batch : images(b, h, w, c), labels(b,).
- try:
- input_batch = next(train_iter)
- except tf.errors.OutOfRangeError:
- # Break if we run out of data in the dataset.
- logging.info('Stopping training at global step %d, no more data',
- global_step_value)
- break
-
- # Set learning rate and run the training step over num_gpu gpus.
- optimizer.learning_rate = _learning_rate_schedule(
- optimizer.iterations.numpy(), max_iters, initial_lr)
- desc_dist_loss, attn_dist_loss, recon_dist_loss = (
- distributed_train_step(input_batch))
-
- # Step number, to be used for summary/logging.
- global_step = optimizer.iterations
- global_step_value = global_step.numpy()
-
- # LR, losses and accuracies summaries.
- tf.summary.scalar(
- 'learning_rate', optimizer.learning_rate, step=global_step)
- tf.summary.scalar(
- 'loss/desc/crossentropy', desc_dist_loss, step=global_step)
- tf.summary.scalar(
- 'loss/attn/crossentropy', attn_dist_loss, step=global_step)
- if FLAGS.use_autoencoder:
- tf.summary.scalar(
- 'loss/recon/mse', recon_dist_loss, step=global_step)
-
- tf.summary.scalar(
- 'train_accuracy/desc',
- desc_train_accuracy.result(),
- step=global_step)
- tf.summary.scalar(
- 'train_accuracy/attn',
- attn_train_accuracy.result(),
- step=global_step)
-
- # Summary for number of global steps taken per second.
- current_time = time.time()
- if (last_summary_step_value is not None and
- last_summary_time is not None):
- tf.summary.scalar(
- 'global_steps_per_sec',
- (global_step_value - last_summary_step_value) /
- (current_time - last_summary_time),
- step=global_step)
- if tf.summary.should_record_summaries().numpy():
- last_summary_step_value = global_step_value
- last_summary_time = current_time
-
- # Print to console if running locally.
- if FLAGS.debug:
- if global_step_value % report_interval == 0:
- print(global_step.numpy())
- print('desc:', desc_dist_loss.numpy())
- print('attn:', attn_dist_loss.numpy())
-
- # Run validation every `eval_interval` steps.
- if global_step_value % eval_interval == 0:
- for i in range(num_eval_batches):
- try:
- validation_batch = next(validation_iter)
- desc_validation_result, attn_validation_result = (
- distributed_validation_step(validation_batch))
- except tf.errors.OutOfRangeError:
- logging.info('Stopping eval at batch %d, no more data', i)
- break
-
- # Log validation results to tensorboard.
- tf.summary.scalar(
- 'validation/desc', desc_validation_result, step=global_step)
- tf.summary.scalar(
- 'validation/attn', attn_validation_result, step=global_step)
-
- logging.info('\nValidation(%d)\n', global_step_value)
- logging.info(': desc: %f\n', desc_validation_result.numpy())
- logging.info(': attn: %f\n', attn_validation_result.numpy())
- # Print to console.
- if FLAGS.debug:
- print('Validation: desc:', desc_validation_result.numpy())
- print(' : attn:', attn_validation_result.numpy())
-
- # Save a checkpoint every `save_interval` steps, or if this is the last
- # iteration.
- # TODO(andrearaujo): save only in one of the two ways. They are
- # identical, the only difference is that the manager adds some extra
- # prefixes and variables (eg, optimizer variables).
- if (global_step_value % save_interval
- == 0) or (global_step_value >= max_iters):
- save_path = manager.save(checkpoint_number=global_step_value)
- logging.info('Saved (%d) at %s', global_step_value, save_path)
-
- file_path = '%s/delf_weights' % FLAGS.logdir
- model.save_weights(file_path, save_format='tf')
- logging.info('Saved weights (%d) at %s', global_step_value,
- file_path)
-
- # Reset metrics for next step.
- desc_train_accuracy.reset_states()
- attn_train_accuracy.reset_states()
- desc_validation_loss.reset_states()
- attn_validation_loss.reset_states()
- desc_validation_accuracy.reset_states()
- attn_validation_accuracy.reset_states()
-
- logging.info('Finished training for %d steps.', max_iters)
-
-
-if __name__ == '__main__':
- app.run(main)
diff --git a/research/delf/delf/python/utils.py b/research/delf/delf/python/utils.py
deleted file mode 100644
index 46b62cbdf31..00000000000
--- a/research/delf/delf/python/utils.py
+++ /dev/null
@@ -1,104 +0,0 @@
-# Copyright 2020 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Helper functions for DELF."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-from PIL import Image
-from PIL import ImageFile
-import tensorflow as tf
-
-# To avoid PIL crashing for truncated (corrupted) images.
-ImageFile.LOAD_TRUNCATED_IMAGES = True
-
-
-def RgbLoader(path):
- """Helper function to read image with PIL.
-
- Args:
- path: Path to image to be loaded.
-
- Returns:
- PIL image in RGB format.
- """
- with tf.io.gfile.GFile(path, 'rb') as f:
- img = Image.open(f)
- return img.convert('RGB')
-
-
-def ResizeImage(image, config, resize_factor=1.0):
- """Resizes image according to config.
-
- Args:
- image: Uint8 array with shape (height, width, 3).
- config: DelfConfig proto containing the model configuration.
- resize_factor: Optional float resize factor for the input image. If given,
- the maximum and minimum allowed image sizes in `config` are scaled by this
- factor. Must be non-negative.
-
- Returns:
- resized_image: Uint8 array with resized image.
- scale_factors: 2D float array, with factors used for resizing along height
- and width (If upscaling, larger than 1; if downscaling, smaller than 1).
-
- Raises:
- ValueError: If `image` has incorrect number of dimensions/channels.
- """
- if resize_factor < 0.0:
- raise ValueError('negative resize_factor is not allowed: %f' %
- resize_factor)
- if image.ndim != 3:
- raise ValueError('image has incorrect number of dimensions: %d' %
- image.ndim)
- height, width, channels = image.shape
-
- # Take into account resize factor.
- max_image_size = resize_factor * config.max_image_size
- min_image_size = resize_factor * config.min_image_size
-
- if channels != 3:
- raise ValueError('image has incorrect number of channels: %d' % channels)
-
- largest_side = max(width, height)
-
- if max_image_size >= 0 and largest_side > max_image_size:
- scale_factor = max_image_size / largest_side
- elif min_image_size >= 0 and largest_side < min_image_size:
- scale_factor = min_image_size / largest_side
- elif config.use_square_images and (height != width):
- scale_factor = 1.0
- else:
- # No resizing needed, early return.
- return image, np.ones(2, dtype=float)
-
- # Note that new_shape is in (width, height) format (PIL convention), while
- # scale_factors are in (height, width) convention (NumPy convention).
- if config.use_square_images:
- new_shape = (int(round(largest_side * scale_factor)),
- int(round(largest_side * scale_factor)))
- else:
- new_shape = (int(round(width * scale_factor)),
- int(round(height * scale_factor)))
-
- scale_factors = np.array([new_shape[1] / height, new_shape[0] / width],
- dtype=float)
-
- pil_image = Image.fromarray(image)
- resized_image = np.array(pil_image.resize(new_shape, resample=Image.BILINEAR))
-
- return resized_image, scale_factors
diff --git a/research/delf/delf/python/utils_test.py b/research/delf/delf/python/utils_test.py
deleted file mode 100644
index a07d86d75d8..00000000000
--- a/research/delf/delf/python/utils_test.py
+++ /dev/null
@@ -1,103 +0,0 @@
-# Copyright 2020 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for helper utilities."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-from absl.testing import parameterized
-import numpy as np
-import tensorflow as tf
-
-from delf import delf_config_pb2
-from delf import utils
-
-
-class UtilsTest(tf.test.TestCase, parameterized.TestCase):
-
- @parameterized.named_parameters(
- ('Max-1Min-1', -1, -1, 1.0, False, [4, 2, 3], [1.0, 1.0]),
- ('Max-1Min-1Square', -1, -1, 1.0, True, [4, 4, 3], [1.0, 2.0]),
- ('Max2Min-1', 2, -1, 1.0, False, [2, 1, 3], [0.5, 0.5]),
- ('Max2Min-1Square', 2, -1, 1.0, True, [2, 2, 3], [0.5, 1.0]),
- ('Max8Min-1', 8, -1, 1.0, False, [4, 2, 3], [1.0, 1.0]),
- ('Max8Min-1Square', 8, -1, 1.0, True, [4, 4, 3], [1.0, 2.0]),
- ('Max-1Min1', -1, 1, 1.0, False, [4, 2, 3], [1.0, 1.0]),
- ('Max-1Min1Square', -1, 1, 1.0, True, [4, 4, 3], [1.0, 2.0]),
- ('Max-1Min8', -1, 8, 1.0, False, [8, 4, 3], [2.0, 2.0]),
- ('Max-1Min8Square', -1, 8, 1.0, True, [8, 8, 3], [2.0, 4.0]),
- ('Max16Min8', 16, 8, 1.0, False, [8, 4, 3], [2.0, 2.0]),
- ('Max16Min8Square', 16, 8, 1.0, True, [8, 8, 3], [2.0, 4.0]),
- ('Max2Min2', 2, 2, 1.0, False, [2, 1, 3], [0.5, 0.5]),
- ('Max2Min2Square', 2, 2, 1.0, True, [2, 2, 3], [0.5, 1.0]),
- ('Max-1Min-1Factor0.5', -1, -1, 0.5, False, [4, 2, 3], [1.0, 1.0]),
- ('Max-1Min-1Factor0.5Square', -1, -1, 0.5, True, [4, 4, 3], [1.0, 2.0]),
- ('Max2Min-1Factor2.0', 2, -1, 2.0, False, [4, 2, 3], [1.0, 1.0]),
- ('Max2Min-1Factor2.0Square', 2, -1, 2.0, True, [4, 4, 3], [1.0, 2.0]),
- ('Max-1Min8Factor0.5', -1, 8, 0.5, False, [4, 2, 3], [1.0, 1.0]),
- ('Max-1Min8Factor0.5Square', -1, 8, 0.5, True, [4, 4, 3], [1.0, 2.0]),
- ('Max-1Min8Factor0.25', -1, 8, 0.25, False, [4, 2, 3], [1.0, 1.0]),
- ('Max-1Min8Factor0.25Square', -1, 8, 0.25, True, [4, 4, 3], [1.0, 2.0]),
- ('Max2Min2Factor2.0', 2, 2, 2.0, False, [4, 2, 3], [1.0, 1.0]),
- ('Max2Min2Factor2.0Square', 2, 2, 2.0, True, [4, 4, 3], [1.0, 2.0]),
- ('Max16Min8Factor0.5', 16, 8, 0.5, False, [4, 2, 3], [1.0, 1.0]),
- ('Max16Min8Factor0.5Square', 16, 8, 0.5, True, [4, 4, 3], [1.0, 2.0]),
- )
- def testResizeImageWorks(self, max_image_size, min_image_size, resize_factor,
- square_output, expected_shape,
- expected_scale_factors):
- # Construct image of size 4x2x3.
- image = np.array([[[0, 0, 0], [1, 1, 1]], [[2, 2, 2], [3, 3, 3]],
- [[4, 4, 4], [5, 5, 5]], [[6, 6, 6], [7, 7, 7]]],
- dtype='uint8')
-
- # Set up config.
- config = delf_config_pb2.DelfConfig(
- max_image_size=max_image_size,
- min_image_size=min_image_size,
- use_square_images=square_output)
-
- resized_image, scale_factors = utils.ResizeImage(image, config,
- resize_factor)
- self.assertAllEqual(resized_image.shape, expected_shape)
- self.assertAllClose(scale_factors, expected_scale_factors)
-
- @parameterized.named_parameters(
- ('Max2Min2', 2, 2, 1.0, False, [2, 1, 3], [0.666666, 0.5]),
- ('Max2Min2Square', 2, 2, 1.0, True, [2, 2, 3], [0.666666, 1.0]),
- )
- def testResizeImageRoundingWorks(self, max_image_size, min_image_size,
- resize_factor, square_output, expected_shape,
- expected_scale_factors):
- # Construct image of size 3x2x3.
- image = np.array([[[0, 0, 0], [1, 1, 1]], [[2, 2, 2], [3, 3, 3]],
- [[4, 4, 4], [5, 5, 5]]],
- dtype='uint8')
-
- # Set up config.
- config = delf_config_pb2.DelfConfig(
- max_image_size=max_image_size,
- min_image_size=min_image_size,
- use_square_images=square_output)
-
- resized_image, scale_factors = utils.ResizeImage(image, config,
- resize_factor)
- self.assertAllEqual(resized_image.shape, expected_shape)
- self.assertAllClose(scale_factors, expected_scale_factors)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/delf/python/whiten.py b/research/delf/delf/python/whiten.py
deleted file mode 100644
index d2c72d9f17e..00000000000
--- a/research/delf/delf/python/whiten.py
+++ /dev/null
@@ -1,125 +0,0 @@
-# Copyright 2021 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Whitening learning functions."""
-
-import os
-
-import numpy as np
-
-
-def apply_whitening(descriptors,
- mean_descriptor_vector,
- projection,
- output_dim=None):
- """Applies the whitening to the descriptors as a post-processing step.
-
- Args:
- descriptors: [N, D] NumPy array of L2-normalized descriptors to be
- post-processed.
- mean_descriptor_vector: Mean descriptor vector.
- projection: Whitening projection matrix.
- output_dim: Integer, parameter for the dimensionality reduction. If
- `output_dim` is None, the dimensionality reduction is not performed.
-
- Returns:
- descriptors_whitened: [N, output_dim] NumPy array of L2-normalized
- descriptors `descriptors` after whitening application.
- """
- eps = 1e-6
- if output_dim is None:
- output_dim = projection.shape[0]
-
- descriptors = np.dot(projection[:output_dim, :],
- descriptors - mean_descriptor_vector)
- descriptors_whitened = descriptors / (
- np.linalg.norm(descriptors, ord=2, axis=0, keepdims=True) + eps)
- return descriptors_whitened
-
-
-def learn_whitening(descriptors, qidxs, pidxs):
- """Learning the post-processing of fine-tuned descriptor vectors.
-
- This method of whitening learning leverages the provided labeled data and
- uses linear discriminant projections. The projection is decomposed into two
- parts: whitening and rotation. The whitening part is the inverse of the
- square-root of the intraclass (matching pairs) covariance matrix. The
- rotation part is the PCA of the interclass (non-matching pairs) covariance
- matrix in the whitened space. The described approach acts as a
- post-processing step, equivalently, once the fine-tuning of the CNN is
- finished. For more information about the method refer to the section 3.4
- of https://arxiv.org/pdf/1711.02512.pdf.
-
- Args:
- descriptors: [N, D] NumPy array of L2-normalized descriptors.
- qidxs: List of query indexes.
- pidxs: List of positive pairs indexes.
-
- Returns:
- mean_descriptor_vector: [N, 1] NumPy array, mean descriptor vector.
- projection: [N, N] NumPy array, whitening projection matrix.
- """
- # Calculating the mean descriptor vector, which is used to perform centering.
- mean_descriptor_vector = descriptors[:, qidxs].mean(axis=1, keepdims=True)
- # Intraclass (matching pairs) difference.
- intraclass_difference = descriptors[:, qidxs] - descriptors[:, pidxs]
- covariance_matrix = (
- np.dot(intraclass_difference, intraclass_difference.T) /
- intraclass_difference.shape[1])
-
- # Whitening part.
- projection = np.linalg.inv(cholesky(covariance_matrix))
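- # The inverse of the (lower-triangular) Cholesky factor whitens the intraclass
- # covariance; it plays the role of the inverse square-root mentioned above.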
-
- projected_descriptors = np.dot(projection,
- descriptors - mean_descriptor_vector)
- non_matching_covariance_matrix = np.dot(projected_descriptors,
- projected_descriptors.T)
- eigval, eigvec = np.linalg.eig(non_matching_covariance_matrix)
- order = eigval.argsort()[::-1]
- eigvec = eigvec[:, order]
-
- # Rotational part.
- projection = np.dot(eigvec.T, projection)
- return mean_descriptor_vector, projection
-
-
-def cholesky(matrix):
- """Cholesky decomposition.
-
- Cholesky decomposition suitable for non-positive definite matrices: involves
- adding a small value `alpha` on the matrix diagonal until the matrix
- becomes positive definite.
-
- Args:
- matrix: [K, K] Square matrix to be decomposed.
-
- Returns:
- decomposition: [K, K] Lower-triangular Cholesky factor of `matrix`,
- a matrix with real and positive diagonal entries.
- """
- alpha = 0
- while True:
- try:
- # If the input parameter matrix is not positive-definite,
- # the decomposition fails and we iteratively add a small value `alpha` on
- # the matrix diagonal.
- decomposition = np.linalg.cholesky(matrix + alpha * np.eye(*matrix.shape))
- return decomposition
- except np.linalg.LinAlgError:
- if alpha == 0:
- alpha = 1e-10
- else:
- alpha *= 10
- print(">>>> {}::cholesky: Matrix is not positive definite, adding {:.0e} "
- "on the diagonal".format(os.path.basename(__file__), alpha))
diff --git a/research/delf/delf/python/whiten_test.py b/research/delf/delf/python/whiten_test.py
deleted file mode 100644
index 52cc51e65d1..00000000000
--- a/research/delf/delf/python/whiten_test.py
+++ /dev/null
@@ -1,73 +0,0 @@
-# Lint as: python3
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for whitening module."""
-
-import numpy as np
-import tensorflow as tf
-
-from delf import whiten
-
-
-class WhitenTest(tf.test.TestCase):
-
- def testApplyWhitening(self):
- # Testing the application of the learned whitening.
- vectors = np.array([[0.14022471, 0.96360618], [0.37601032, 0.25528411]])
- # Learn whitening for the `vectors`. The first element in `vectors` is
- # viewed as the example query and the second element is the corresponding
- # positive.
- mean_vector, projection = whiten.learn_whitening(vectors, [0], [1])
- # Apply the computed whitening.
- whitened_vectors = whiten.apply_whitening(vectors, mean_vector, projection)
- expected_whitened_vectors = np.array([[0., 9.99999000e-01],
- [0., -2.81240452e-13]])
- # Compare the obtained whitened vectors with the expected result.
- self.assertAllClose(whitened_vectors, expected_whitened_vectors)
-
- def testLearnWhitening(self):
- # Testing whitening learning function.
- descriptors = np.array([[0.14022471, 0.96360618], [0.37601032, 0.25528411]])
- # Obtain the mean descriptor vector and the projection matrix.
- mean_vector, projection = whiten.learn_whitening(descriptors, [0], [1])
- expected_mean_vector = np.array([[0.14022471], [0.37601032]])
- expected_projection = np.array([[1.18894378e+00, -1.74326044e-01],
- [1.45071361e+04, 9.89421193e+04]])
- # Check that the both calculated values are close to the expected values.
- self.assertAllClose(mean_vector, expected_mean_vector)
- self.assertAllClose(projection, expected_projection)
-
- def testCholeskyPositiveDefinite(self):
- # Testing the Cholesky decomposition for the positive definite matrix.
- descriptors = np.array([[1, -2j], [2j, 5]])
- output = whiten.cholesky(descriptors)
- expected_output = np.array([[1. + 0.j, 0. + 0.j], [0. + 2.j, 1. + 0.j]])
- # Check that the expected output is obtained.
- self.assertAllClose(output, expected_output)
- # Check that the properties of the Cholesky decomposition are satisfied.
- self.assertAllClose(np.matmul(output, output.T.conj()), descriptors)
-
- def testCholeskyNonPositiveDefinite(self):
- # Testing the Cholesky decomposition for a non-positive definite matrix.
- input_matrix = np.array([[1., 2.], [-2., 1.]])
- decomposition = whiten.cholesky(input_matrix)
- expected_output = np.array([[2., -2.], [-2., 2.]])
- # Check that the properties of the Cholesky decomposition are satisfied.
- self.assertAllClose(
- np.matmul(decomposition, decomposition.T), expected_output)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/delf/setup.py b/research/delf/setup.py
deleted file mode 100644
index f0ec02523ec..00000000000
--- a/research/delf/setup.py
+++ /dev/null
@@ -1,37 +0,0 @@
-# Copyright 2017 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Setup script for delf."""
-
-from setuptools import setup, find_packages
-
-install_requires = [
- 'absl-py >= 0.7.1',
- 'protobuf >= 3.8.0',
- 'pandas >= 0.24.2',
- 'numpy >= 1.16.1',
- 'scipy >= 1.2.2',
- 'tensorflow >= 2.2.0',
- 'tf_slim >= 1.1',
- 'tensorflow_probability >= 0.9.0',
-]
-
-setup(
- name='delf',
- version='2.0',
- include_package_data=True,
- packages=find_packages(),
- install_requires=install_requires,
- description='DELF (DEep Local Features)',
-)
diff --git a/research/efficient-hrl/README.md b/research/efficient-hrl/README.md
deleted file mode 100755
index 6c454c687a3..00000000000
--- a/research/efficient-hrl/README.md
+++ /dev/null
@@ -1,65 +0,0 @@
-![TensorFlow Requirement: 1.x](https://img.shields.io/badge/TensorFlow%20Requirement-1.x-brightgreen)
-![TensorFlow 2 Not Supported](https://img.shields.io/badge/TensorFlow%202%20Not%20Supported-%E2%9C%95-red.svg)
-
-Code for performing Hierarchical RL based on the following publications:
-
-"Data-Efficient Hierarchical Reinforcement Learning" by
-Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
-(https://arxiv.org/abs/1805.08296).
-
-"Near-Optimal Representation Learning for Hierarchical Reinforcement Learning"
-by Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
-(https://arxiv.org/abs/1810.01257).
-
-
-Requirements:
-* TensorFlow (see http://www.tensorflow.org for how to install/upgrade)
-* Gin Config (see https://github.com/google/gin-config)
-* Tensorflow Agents (see https://github.com/tensorflow/agents)
-* OpenAI Gym (see http://gym.openai.com/docs, be sure to install MuJoCo as well)
-* NumPy (see http://www.numpy.org/)
-
-
-Quick Start:
-
-Run a training job based on the original HIRO paper on Ant Maze:
-
-```
-python scripts/local_train.py test1 hiro_orig ant_maze base_uvf suite
-```
-
-Run a continuous evaluation job for that experiment:
-
-```
-python scripts/local_eval.py test1 hiro_orig ant_maze base_uvf suite
-```
-
-To run the same experiment with online representation learning (the
-"Near-Optimal" paper), change `hiro_orig` to `hiro_repr`.
-You can also run with `hiro_xy` to run the same experiment with HIRO on only the
-xy coordinates of the agent.
-
-To run on other environments, change `ant_maze` to something else; e.g.,
-`ant_push_multi`, `ant_fall_multi`, etc. See `context/configs/*` for other options.
-
-
-Basic Code Guide:
-
-The code for training resides in train.py. The code trains a lower-level policy
-(a UVF agent in the code) and a higher-level policy (a MetaAgent in the code)
-concurrently. The higher-level policy communicates goals to the lower-level
-policy. In the code, this is called a context. Not only does the lower-level
-policy act with respect to a context (a higher-level specified goal), but the
-higher-level policy also acts with respect to an environment-specified context
-(corresponding to the navigation target location associated with the task).
-Therefore, in `context/configs/*` you will find both specifications for task setup
-as well as goal configurations. Most remaining hyperparameters used for
-training/evaluation may be found in `configs/*`.
-
-NOTE: Not all the code corresponding to the "Near-Optimal" paper is included.
-Namely, changes to low-level policy training proposed in the paper (discounting
-and auxiliary rewards) are not implemented here. Performance should not change
-significantly.
-
-
-Maintained by Ofir Nachum (ofirnachum).
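
To make the two-level structure in the Basic Code Guide above concrete, here is a small, self-contained Python sketch of the control flow: a meta policy emits a goal every few steps, and a lower-level policy acts on the current state together with that goal. All names (`meta_policy`, `low_policy`, `env_step`, `META_PERIOD`) and the toy dynamics are illustrative assumptions, not the actual classes in train.py.

```
import numpy as np

rng = np.random.RandomState(0)
STATE_DIM, GOAL_DIM, ACTION_DIM, META_PERIOD = 4, 2, 2, 10

def meta_policy(state):
  # Higher-level policy: propose a goal (here just a random point in goal space).
  return rng.uniform(-1.0, 1.0, size=GOAL_DIM)

def low_policy(state, goal):
  # Lower-level UVF-style policy: act on the concatenated (state, goal) input.
  context = np.concatenate([state[:GOAL_DIM], goal])
  return np.tanh(rng.randn(ACTION_DIM) + 0.1 * context.sum())

def env_step(state, action):
  # Toy environment: the state drifts with the action; reward is -||state||.
  next_state = state + 0.1 * np.pad(action, (0, STATE_DIM - ACTION_DIM))
  return next_state, -np.linalg.norm(next_state)

state = np.zeros(STATE_DIM)
goal = meta_policy(state)
for t in range(50):
  if t % META_PERIOD == 0:          # the meta-agent re-samples its goal periodically
    goal = meta_policy(state)
  action = low_policy(state, goal)  # the low-level agent acts w.r.t. the goal context
  state, reward = env_step(state, action)
print('final state:', np.round(state, 3), 'final reward:', round(reward, 3))
```

In the real code the meta-agent's goal is exposed to the low-level agent as a context, and both levels are trained (TD3-style) from the resulting transitions.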
diff --git a/research/efficient-hrl/agent.py b/research/efficient-hrl/agent.py
deleted file mode 100644
index 0028ddffa0d..00000000000
--- a/research/efficient-hrl/agent.py
+++ /dev/null
@@ -1,774 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""A UVF agent.
-"""
-
-import tensorflow as tf
-import gin.tf
-from agents import ddpg_agent
-# pylint: disable=unused-import
-import cond_fn
-from utils import utils as uvf_utils
-from context import gin_imports
-# pylint: enable=unused-import
-slim = tf.contrib.slim
-
-
-@gin.configurable
-class UvfAgentCore(object):
- """Defines basic functions for UVF agent. Must be inherited with an RL agent.
-
- Used as lower-level agent.
- """
-
- def __init__(self,
- observation_spec,
- action_spec,
- tf_env,
- tf_context,
- step_cond_fn=cond_fn.env_transition,
- reset_episode_cond_fn=cond_fn.env_restart,
- reset_env_cond_fn=cond_fn.false_fn,
- metrics=None,
- **base_agent_kwargs):
- """Constructs a UVF agent.
-
- Args:
- observation_spec: A TensorSpec defining the observations.
- action_spec: A BoundedTensorSpec defining the actions.
- tf_env: A Tensorflow environment object.
- tf_context: A Context class.
- step_cond_fn: A function indicating whether to increment the num of steps.
- reset_episode_cond_fn: A function indicating whether to restart the
- episode, resampling the context.
- reset_env_cond_fn: A function indicating whether to perform a manual reset
- of the environment.
- metrics: A list of functions that evaluate metrics of the agent.
- **base_agent_kwargs: A dictionary of parameters for base RL Agent.
- Raises:
- ValueError: If 'dqda_clipping' is < 0.
- """
- self._step_cond_fn = step_cond_fn
- self._reset_episode_cond_fn = reset_episode_cond_fn
- self._reset_env_cond_fn = reset_env_cond_fn
- self.metrics = metrics
-
- # expose tf_context methods
- self.tf_context = tf_context(tf_env=tf_env)
- self.set_replay = self.tf_context.set_replay
- self.sample_contexts = self.tf_context.sample_contexts
- self.compute_rewards = self.tf_context.compute_rewards
- self.gamma_index = self.tf_context.gamma_index
- self.context_specs = self.tf_context.context_specs
- self.context_as_action_specs = self.tf_context.context_as_action_specs
- self.init_context_vars = self.tf_context.create_vars
-
- self.env_observation_spec = observation_spec[0]
- merged_observation_spec = (uvf_utils.merge_specs(
- (self.env_observation_spec,) + self.context_specs),)
- self._context_vars = dict()
- self._action_vars = dict()
-
- self.BASE_AGENT_CLASS.__init__(
- self,
- observation_spec=merged_observation_spec,
- action_spec=action_spec,
- **base_agent_kwargs
- )
-
- def set_meta_agent(self, agent=None):
- self._meta_agent = agent
-
- @property
- def meta_agent(self):
- return self._meta_agent
-
- def actor_loss(self, states, actions, rewards, discounts,
- next_states):
- """Returns the next action for the state.
-
- Args:
- state: A [num_state_dims] tensor representing a state.
- context: A list of [num_context_dims] tensor representing a context.
- Returns:
- A [num_action_dims] tensor representing the action.
- """
- return self.BASE_AGENT_CLASS.actor_loss(self, states)
-
- def action(self, state, context=None):
- """Returns the next action for the state.
-
- Args:
- state: A [num_state_dims] tensor representing a state.
- context: A list of [num_context_dims] tensor representing a context.
- Returns:
- A [num_action_dims] tensor representing the action.
- """
- merged_state = self.merged_state(state, context)
- return self.BASE_AGENT_CLASS.action(self, merged_state)
-
- def actions(self, state, context=None):
- """Returns the next action for the state.
-
- Args:
- state: A [-1, num_state_dims] tensor representing a state.
- context: A list of [-1, num_context_dims] tensor representing a context.
- Returns:
- A [-1, num_action_dims] tensor representing the action.
- """
- merged_states = self.merged_states(state, context)
- return self.BASE_AGENT_CLASS.actor_net(self, merged_states)
-
- def log_probs(self, states, actions, state_reprs, contexts=None):
- assert contexts is not None
- batch_dims = [tf.shape(states)[0], tf.shape(states)[1]]
- contexts = self.tf_context.context_multi_transition_fn(
- contexts, states=tf.to_float(state_reprs))
-
- flat_states = tf.reshape(states,
- [batch_dims[0] * batch_dims[1], states.shape[-1]])
- flat_contexts = [tf.reshape(tf.cast(context, states.dtype),
- [batch_dims[0] * batch_dims[1], context.shape[-1]])
- for context in contexts]
- flat_pred_actions = self.actions(flat_states, flat_contexts)
- pred_actions = tf.reshape(flat_pred_actions,
- batch_dims + [flat_pred_actions.shape[-1]])
-
- error = tf.square(actions - pred_actions)
- spec_range = (self._action_spec.maximum - self._action_spec.minimum) / 2
- normalized_error = tf.cast(error, tf.float64) / tf.constant(spec_range) ** 2
- return -normalized_error
-
- @gin.configurable('uvf_add_noise_fn')
- def add_noise_fn(self, action_fn, stddev=1.0, debug=False,
- clip=True, global_step=None):
- """Returns the action_fn with additive Gaussian noise.
-
- Args:
- action_fn: A callable(`state`, `context`) which returns a
- [num_action_dims] tensor representing an action.
- stddev: Standard deviation of the additive Gaussian noise.
- debug: Print debug messages.
- clip: Whether to clip the noisy action to the action spec.
- global_step: If not None, decay the noise stddev over training steps.
- Returns:
- A callable with the same signature as `action_fn` that returns the
- action with additive noise.
- """
- if global_step is not None:
- stddev *= tf.maximum( # Decay exploration during training.
- tf.train.exponential_decay(1.0, global_step, 1e6, 0.8), 0.5)
- def noisy_action_fn(state, context=None):
- """Noisy action fn."""
- action = action_fn(state, context)
- if debug:
- action = uvf_utils.tf_print(
- action, [action],
- message='[add_noise_fn] pre-noise action',
- first_n=100)
- noise_dist = tf.distributions.Normal(tf.zeros_like(action),
- tf.ones_like(action) * stddev)
- noise = noise_dist.sample()
- action += noise
- if debug:
- action = uvf_utils.tf_print(
- action, [action],
- message='[add_noise_fn] post-noise action',
- first_n=100)
- if clip:
- action = uvf_utils.clip_to_spec(action, self._action_spec)
- return action
- return noisy_action_fn
-
- def merged_state(self, state, context=None):
- """Returns the merged state from the environment state and contexts.
-
- Args:
- state: A [num_state_dims] tensor representing a state.
- context: A list of [num_context_dims] tensor representing a context.
- If None, use the internal context.
- Returns:
- A [num_merged_state_dims] tensor representing the merged state.
- """
- if context is None:
- context = list(self.context_vars)
- state = tf.concat([state,] + context, axis=-1)
- self._validate_states(self._batch_state(state))
- return state
-
- def merged_states(self, states, contexts=None):
- """Returns the batch merged state from the batch env state and contexts.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- contexts: A list of [batch_size, num_context_dims] tensor
- representing a batch of contexts. If None,
- use the internal context.
- Returns:
- A [batch_size, num_merged_state_dims] tensor representing the batch
- of merged states.
- """
- if contexts is None:
- contexts = [tf.tile(tf.expand_dims(context, axis=0),
- (tf.shape(states)[0], 1)) for
- context in self.context_vars]
- states = tf.concat([states,] + contexts, axis=-1)
- self._validate_states(states)
- return states
-
- def unmerged_states(self, merged_states):
- """Returns the batch state and contexts from the batch merged state.
-
- Args:
- merged_states: A [batch_size, num_merged_state_dims] tensor
- representing a batch of merged states.
- Returns:
- A [batch_size, num_state_dims] tensor and a list of
- [batch_size, num_context_dims] tensors representing the batch state
- and contexts respectively.
- """
- self._validate_states(merged_states)
- num_state_dims = self.env_observation_spec.shape.as_list()[0]
- num_context_dims_list = [c.shape.as_list()[0] for c in self.context_specs]
- states = merged_states[:, :num_state_dims]
- contexts = []
- i = num_state_dims
- for num_context_dims in num_context_dims_list:
- contexts.append(merged_states[:, i: i+num_context_dims])
- i += num_context_dims
- return states, contexts
-
- def sample_random_actions(self, batch_size=1):
- """Return random actions.
-
- Args:
- batch_size: Batch size.
- Returns:
- A [batch_size, num_action_dims] tensor representing the batch of actions.
- """
- actions = tf.concat(
- [
- tf.random_uniform(
- shape=(batch_size, 1),
- minval=self._action_spec.minimum[i],
- maxval=self._action_spec.maximum[i])
- for i in range(self._action_spec.shape[0].value)
- ],
- axis=1)
- return actions
-
- def clip_actions(self, actions):
- """Clip actions to spec.
-
- Args:
- actions: A [batch_size, num_action_dims] tensor representing
- the batch of actions.
- Returns:
- A [batch_size, num_action_dims] tensor representing the batch
- of clipped actions.
- """
- actions = tf.concat(
- [
- tf.clip_by_value(
- actions[:, i:i+1],
- self._action_spec.minimum[i],
- self._action_spec.maximum[i])
- for i in range(self._action_spec.shape[0].value)
- ],
- axis=1)
- return actions
-
- def mix_contexts(self, contexts, insert_contexts, indices):
- """Mix two contexts based on indices.
-
- Args:
- contexts: A list of [batch_size, num_context_dims] tensor representing
- the batch of contexts.
- insert_contexts: A list of [batch_size, num_context_dims] tensor
- representing the batch of contexts to be inserted.
- indices: A list of a list of integers denoting indices to replace.
- Returns:
- A list of resulting contexts.
- """
- if indices is None: indices = [[]] * len(contexts)
- assert len(contexts) == len(indices)
- assert all([spec.shape.ndims == 1 for spec in self.context_specs])
- mix_contexts = []
- for contexts_, insert_contexts_, indices_, spec in zip(
- contexts, insert_contexts, indices, self.context_specs):
- mix_contexts.append(
- tf.concat(
- [
- insert_contexts_[:, i:i + 1] if i in indices_ else
- contexts_[:, i:i + 1] for i in range(spec.shape.as_list()[0])
- ],
- axis=1))
- return mix_contexts
-
- def begin_episode_ops(self, mode, action_fn=None, state=None):
- """Returns ops that reset agent at beginning of episodes.
-
- Args:
- mode: a string representing the mode=[train, explore, eval].
- Returns:
- A list of ops.
- """
- all_ops = []
- for _, action_var in sorted(self._action_vars.items()):
- sample_action = self.sample_random_actions(1)[0]
- all_ops.append(tf.assign(action_var, sample_action))
- all_ops += self.tf_context.reset(mode=mode, agent=self._meta_agent,
- action_fn=action_fn, state=state)
- return all_ops
-
- def cond_begin_episode_op(self, cond, input_vars, mode, meta_action_fn):
- """Returns op that resets agent at beginning of episodes.
-
- A new episode is begun if the cond op evaluates to `False`.
-
- Args:
- cond: a Boolean tensor variable.
- input_vars: A list of tensor variables.
- mode: a string representing the mode=[train, explore, eval].
- Returns:
- Conditional begin op.
- """
- (state, action, reward, next_state,
- state_repr, next_state_repr) = input_vars
- def continue_fn():
- """Continue op fn."""
- items = [state, action, reward, next_state,
- state_repr, next_state_repr] + list(self.context_vars)
- batch_items = [tf.expand_dims(item, 0) for item in items]
- (states, actions, rewards, next_states,
- state_reprs, next_state_reprs) = batch_items[:6]
- context_reward = self.compute_rewards(
- mode, state_reprs, actions, rewards, next_state_reprs,
- batch_items[6:])[0][0]
- context_reward = tf.cast(context_reward, dtype=reward.dtype)
- if self.meta_agent is not None:
- meta_action = tf.concat(self.context_vars, -1)
- items = [state, meta_action, reward, next_state,
- state_repr, next_state_repr] + list(self.meta_agent.context_vars)
- batch_items = [tf.expand_dims(item, 0) for item in items]
- (states, meta_actions, rewards, next_states,
- state_reprs, next_state_reprs) = batch_items[:6]
- meta_reward = self.meta_agent.compute_rewards(
- mode, states, meta_actions, rewards,
- next_states, batch_items[6:])[0][0]
- meta_reward = tf.cast(meta_reward, dtype=reward.dtype)
- else:
- meta_reward = tf.constant(0, dtype=reward.dtype)
-
- with tf.control_dependencies([context_reward, meta_reward]):
- step_ops = self.tf_context.step(mode=mode, agent=self._meta_agent,
- state=state,
- next_state=next_state,
- state_repr=state_repr,
- next_state_repr=next_state_repr,
- action_fn=meta_action_fn)
- with tf.control_dependencies(step_ops):
- context_reward, meta_reward = map(tf.identity, [context_reward, meta_reward])
- return context_reward, meta_reward
- def begin_episode_fn():
- """Begin op fn."""
- begin_ops = self.begin_episode_ops(mode=mode, action_fn=meta_action_fn, state=state)
- with tf.control_dependencies(begin_ops):
- return tf.zeros_like(reward), tf.zeros_like(reward)
- with tf.control_dependencies(input_vars):
- cond_begin_episode_op = tf.cond(cond, continue_fn, begin_episode_fn)
- return cond_begin_episode_op
-
- def get_env_base_wrapper(self, env_base, **begin_kwargs):
- """Create a wrapper around env_base, with agent-specific begin/end_episode.
-
- Args:
- env_base: A python environment base.
- **begin_kwargs: Keyword args for begin_episode_ops.
- Returns:
- An object with begin_episode() and end_episode().
- """
- begin_ops = self.begin_episode_ops(**begin_kwargs)
- return uvf_utils.get_contextual_env_base(env_base, begin_ops)
-
- def init_action_vars(self, name, i=None):
- """Create and return a tensorflow Variable holding an action.
-
- Args:
- name: Name of the variables.
- i: Integer id.
- Returns:
- A [num_action_dims] tensor.
- """
- if i is not None:
- name += '_%d' % i
- assert name not in self._action_vars, ('Conflict! %s is already '
- 'initialized.') % name
- self._action_vars[name] = tf.Variable(
- self.sample_random_actions(1)[0], name='%s_action' % (name))
- self._validate_actions(tf.expand_dims(self._action_vars[name], 0))
- return self._action_vars[name]
-
- @gin.configurable('uvf_critic_function')
- def critic_function(self, critic_vals, states, critic_fn=None):
- """Computes q values based on outputs from the critic net.
-
- Args:
- critic_vals: A tf.float32 [batch_size, ...] tensor representing outputs
- from the critic net.
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- critic_fn: A callable that processes outputs from critic_net and
- outputs a [batch_size] tensor representing q values.
- Returns:
- A tf.float32 [batch_size] tensor representing q values.
- """
- if critic_fn is not None:
- env_states, contexts = self.unmerged_states(states)
- critic_vals = critic_fn(critic_vals, env_states, contexts)
- critic_vals.shape.assert_has_rank(1)
- return critic_vals
-
- def get_action_vars(self, key):
- return self._action_vars[key]
-
- def get_context_vars(self, key):
- return self.tf_context.context_vars[key]
-
- def step_cond_fn(self, *args):
- return self._step_cond_fn(self, *args)
-
- def reset_episode_cond_fn(self, *args):
- return self._reset_episode_cond_fn(self, *args)
-
- def reset_env_cond_fn(self, *args):
- return self._reset_env_cond_fn(self, *args)
-
- @property
- def context_vars(self):
- return self.tf_context.vars
-
-
-@gin.configurable
-class MetaAgentCore(UvfAgentCore):
- """Defines basic functions for UVF Meta-agent. Must be inherited with an RL agent.
-
- Used as higher-level agent.
- """
-
- def __init__(self,
- observation_spec,
- action_spec,
- tf_env,
- tf_context,
- sub_context,
- step_cond_fn=cond_fn.env_transition,
- reset_episode_cond_fn=cond_fn.env_restart,
- reset_env_cond_fn=cond_fn.false_fn,
- metrics=None,
- actions_reg=0.,
- k=2,
- **base_agent_kwargs):
- """Constructs a Meta agent.
-
- Args:
- observation_spec: A TensorSpec defining the observations.
- action_spec: A BoundedTensorSpec defining the actions.
- tf_env: A Tensorflow environment object.
- tf_context: A Context class.
- sub_context: A Context class whose context-as-action spec defines the
- meta-agent's action space.
- step_cond_fn: A function indicating whether to increment the num of steps.
- reset_episode_cond_fn: A function indicating whether to restart the
- episode, resampling the context.
- reset_env_cond_fn: A function indicating whether to perform a manual reset
- of the environment.
- metrics: A list of functions that evaluate metrics of the agent.
- actions_reg: A scalar weight for an L1 regularizer on the meta-action
- dimensions beyond the first `k`.
- k: Number of leading meta-action dimensions excluded from the regularizer.
- **base_agent_kwargs: A dictionary of parameters for base RL Agent.
- Raises:
- ValueError: If 'dqda_clipping' is < 0.
- """
- self._step_cond_fn = step_cond_fn
- self._reset_episode_cond_fn = reset_episode_cond_fn
- self._reset_env_cond_fn = reset_env_cond_fn
- self.metrics = metrics
- self._actions_reg = actions_reg
- self._k = k
-
- # expose tf_context methods
- self.tf_context = tf_context(tf_env=tf_env)
- self.sub_context = sub_context(tf_env=tf_env)
- self.set_replay = self.tf_context.set_replay
- self.sample_contexts = self.tf_context.sample_contexts
- self.compute_rewards = self.tf_context.compute_rewards
- self.gamma_index = self.tf_context.gamma_index
- self.context_specs = self.tf_context.context_specs
- self.context_as_action_specs = self.tf_context.context_as_action_specs
- self.sub_context_as_action_specs = self.sub_context.context_as_action_specs
- self.init_context_vars = self.tf_context.create_vars
-
- self.env_observation_spec = observation_spec[0]
- merged_observation_spec = (uvf_utils.merge_specs(
- (self.env_observation_spec,) + self.context_specs),)
- self._context_vars = dict()
- self._action_vars = dict()
-
- assert len(self.context_as_action_specs) == 1
- self.BASE_AGENT_CLASS.__init__(
- self,
- observation_spec=merged_observation_spec,
- action_spec=self.sub_context_as_action_specs,
- **base_agent_kwargs
- )
-
- @gin.configurable('meta_add_noise_fn')
- def add_noise_fn(self, action_fn, stddev=1.0, debug=False,
- global_step=None):
- noisy_action_fn = super(MetaAgentCore, self).add_noise_fn(
- action_fn, stddev,
- clip=True, global_step=global_step)
- return noisy_action_fn
-
- def actor_loss(self, states, actions, rewards, discounts,
- next_states):
- """Returns the next action for the state.
-
- Args:
- state: A [num_state_dims] tensor representing a state.
- context: A list of [num_context_dims] tensor representing a context.
- Returns:
- A [num_action_dims] tensor representing the action.
- """
- actions = self.actor_net(states, stop_gradients=False)
- regularizer = self._actions_reg * tf.reduce_mean(
- tf.reduce_sum(tf.abs(actions[:, self._k:]), -1), 0)
- loss = self.BASE_AGENT_CLASS.actor_loss(self, states)
- return regularizer + loss
-
-
-@gin.configurable
-class UvfAgent(UvfAgentCore, ddpg_agent.TD3Agent):
- """A DDPG agent with UVF.
- """
- BASE_AGENT_CLASS = ddpg_agent.TD3Agent
- ACTION_TYPE = 'continuous'
-
- def __init__(self, *args, **kwargs):
- UvfAgentCore.__init__(self, *args, **kwargs)
-
-
-@gin.configurable
-class MetaAgent(MetaAgentCore, ddpg_agent.TD3Agent):
- """A DDPG meta-agent.
- """
- BASE_AGENT_CLASS = ddpg_agent.TD3Agent
- ACTION_TYPE = 'continuous'
-
- def __init__(self, *args, **kwargs):
- MetaAgentCore.__init__(self, *args, **kwargs)
-
-
-@gin.configurable()
-def state_preprocess_net(
- states,
- num_output_dims=2,
- states_hidden_layers=(100,),
- normalizer_fn=None,
- activation_fn=tf.nn.relu,
- zero_time=True,
- images=False):
- """Creates a simple feed forward net for embedding states.
- """
- with slim.arg_scope(
- [slim.fully_connected],
- activation_fn=activation_fn,
- normalizer_fn=normalizer_fn,
- weights_initializer=slim.variance_scaling_initializer(
- factor=1.0/3.0, mode='FAN_IN', uniform=True)):
-
- states_shape = tf.shape(states)
- states_dtype = states.dtype
- states = tf.to_float(states)
- if images: # Zero-out x-y
- states *= tf.constant([0.] * 2 + [1.] * (states.shape[-1] - 2), dtype=states.dtype)
- if zero_time:
- states *= tf.constant([1.] * (states.shape[-1] - 1) + [0.], dtype=states.dtype)
- orig_states = states
- embed = states
- if states_hidden_layers:
- embed = slim.stack(embed, slim.fully_connected, states_hidden_layers,
- scope='states')
-
- with slim.arg_scope([slim.fully_connected],
- weights_regularizer=None,
- weights_initializer=tf.random_uniform_initializer(
- minval=-0.003, maxval=0.003)):
- embed = slim.fully_connected(embed, num_output_dims,
- activation_fn=None,
- normalizer_fn=None,
- scope='value')
-
- output = embed
- output = tf.cast(output, states_dtype)
- return output
-
-
-@gin.configurable()
-def action_embed_net(
- actions,
- states=None,
- num_output_dims=2,
- hidden_layers=(400, 300),
- normalizer_fn=None,
- activation_fn=tf.nn.relu,
- zero_time=True,
- images=False):
- """Creates a simple feed forward net for embedding actions.
- """
- with slim.arg_scope(
- [slim.fully_connected],
- activation_fn=activation_fn,
- normalizer_fn=normalizer_fn,
- weights_initializer=slim.variance_scaling_initializer(
- factor=1.0/3.0, mode='FAN_IN', uniform=True)):
-
- actions = tf.to_float(actions)
- if states is not None:
- if images: # Zero-out x-y
- states *= tf.constant([0.] * 2 + [1.] * (states.shape[-1] - 2), dtype=states.dtype)
- if zero_time:
- states *= tf.constant([1.] * (states.shape[-1] - 1) + [0.], dtype=states.dtype)
- actions = tf.concat([actions, tf.to_float(states)], -1)
-
- embed = actions
- if hidden_layers:
- embed = slim.stack(embed, slim.fully_connected, hidden_layers,
- scope='hidden')
-
- with slim.arg_scope([slim.fully_connected],
- weights_regularizer=None,
- weights_initializer=tf.random_uniform_initializer(
- minval=-0.003, maxval=0.003)):
- embed = slim.fully_connected(embed, num_output_dims,
- activation_fn=None,
- normalizer_fn=None,
- scope='value')
- if num_output_dims == 1:
- return embed[:, 0, ...]
- else:
- return embed
-
-
-def huber(x, kappa=0.1):
- return (0.5 * tf.square(x) * tf.to_float(tf.abs(x) <= kappa) +
- kappa * (tf.abs(x) - 0.5 * kappa) * tf.to_float(tf.abs(x) > kappa)
- ) / kappa
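
The `huber` helper above is a Huber penalty scaled by `1 / kappa`: quadratic for |x| <= kappa and linear beyond. A quick NumPy sketch of the same piecewise function, for reference (illustrative only):

```
import numpy as np

def huber_np(x, kappa=0.1):
  # Quadratic for |x| <= kappa, linear beyond; scaled by 1/kappa as above.
  absx = np.abs(x)
  quad = 0.5 * np.square(x)
  lin = kappa * (absx - 0.5 * kappa)
  return np.where(absx <= kappa, quad, lin) / kappa

x = np.linspace(-0.5, 0.5, 5)
print(np.round(huber_np(x), 4))  # small values grow quadratically, large ones linearly
```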
-
-
-@gin.configurable()
-class StatePreprocess(object):
- STATE_PREPROCESS_NET_SCOPE = 'state_process_net'
- ACTION_EMBED_NET_SCOPE = 'action_embed_net'
-
- def __init__(self, trainable=False,
- state_preprocess_net=lambda states: states,
- action_embed_net=lambda actions, *args, **kwargs: actions,
- ndims=None):
- self.trainable = trainable
- self._scope = tf.get_variable_scope().name
- self._ndims = ndims
- self._state_preprocess_net = tf.make_template(
- self.STATE_PREPROCESS_NET_SCOPE, state_preprocess_net,
- create_scope_now_=True)
- self._action_embed_net = tf.make_template(
- self.ACTION_EMBED_NET_SCOPE, action_embed_net,
- create_scope_now_=True)
-
- def __call__(self, states):
- batched = states.get_shape().ndims != 1
- if not batched:
- states = tf.expand_dims(states, 0)
- embedded = self._state_preprocess_net(states)
- if self._ndims is not None:
- embedded = embedded[..., :self._ndims]
- if not batched:
- return embedded[0]
- return embedded
-
- def loss(self, states, next_states, low_actions, low_states):
- batch_size = tf.shape(states)[0]
- d = int(low_states.shape[1])
- # Sample indices into meta-transition to train on.
- probs = 0.99 ** tf.range(d, dtype=tf.float32)
- probs *= tf.constant([1.0] * (d - 1) + [1.0 / (1 - 0.99)],
- dtype=tf.float32)
- probs /= tf.reduce_sum(probs)
- index_dist = tf.distributions.Categorical(probs=probs, dtype=tf.int64)
- indices = index_dist.sample(batch_size)
- batch_size = tf.cast(batch_size, tf.int64)
- next_indices = tf.concat(
- [tf.range(batch_size, dtype=tf.int64)[:, None],
- (1 + indices[:, None]) % d], -1)
- new_next_states = tf.where(indices < d - 1,
- tf.gather_nd(low_states, next_indices),
- next_states)
- next_states = new_next_states
-
- embed1 = tf.to_float(self._state_preprocess_net(states))
- embed2 = tf.to_float(self._state_preprocess_net(next_states))
- action_embed = self._action_embed_net(
- tf.layers.flatten(low_actions), states=states)
-
- tau = 2.0
- fn = lambda z: tau * tf.reduce_sum(huber(z), -1)
- all_embed = tf.get_variable('all_embed', [1024, int(embed1.shape[-1])],
- initializer=tf.zeros_initializer())
- upd = all_embed.assign(tf.concat([all_embed[batch_size:], embed2], 0))
- with tf.control_dependencies([upd]):
- close = 1 * tf.reduce_mean(fn(embed1 + action_embed - embed2))
- prior_log_probs = tf.reduce_logsumexp(
- -fn((embed1 + action_embed)[:, None, :] - all_embed[None, :, :]),
- axis=-1) - tf.log(tf.to_float(all_embed.shape[0]))
- far = tf.reduce_mean(tf.exp(-fn((embed1 + action_embed)[1:] - embed2[:-1])
- - tf.stop_gradient(prior_log_probs[1:])))
- repr_log_probs = tf.stop_gradient(
- -fn(embed1 + action_embed - embed2) - prior_log_probs) / tau
- return close + far, repr_log_probs, indices
-
- def get_trainable_vars(self):
- return (
- slim.get_trainable_variables(
- uvf_utils.join_scope(self._scope, self.STATE_PREPROCESS_NET_SCOPE)) +
- slim.get_trainable_variables(
- uvf_utils.join_scope(self._scope, self.ACTION_EMBED_NET_SCOPE)))
-
-
-@gin.configurable()
-class InverseDynamics(object):
- INVERSE_DYNAMICS_NET_SCOPE = 'inverse_dynamics'
-
- def __init__(self, spec):
- self._spec = spec
-
- def sample(self, states, next_states, num_samples, orig_goals, sc=0.5):
- goal_dim = orig_goals.shape[-1]
- spec_range = (self._spec.maximum - self._spec.minimum) / 2 * tf.ones([goal_dim])
- loc = tf.cast(next_states - states, tf.float32)[:, :goal_dim]
- scale = sc * tf.tile(tf.reshape(spec_range, [1, goal_dim]),
- [tf.shape(states)[0], 1])
- dist = tf.distributions.Normal(loc, scale)
- if num_samples == 1:
- return dist.sample()
- samples = tf.concat([dist.sample(num_samples - 2),
- tf.expand_dims(loc, 0),
- tf.expand_dims(orig_goals, 0)], 0)
- return uvf_utils.clip_to_spec(samples, self._spec)
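
`InverseDynamics.sample` above draws candidate meta-actions from a Gaussian centered on the observed state change, then appends that mean and the original goal before clipping to the spec. A hedged NumPy sketch of the same sampling scheme (the function name and the fixed clipping range are illustrative assumptions):

```
import numpy as np

def sample_candidate_goals(states, next_states, orig_goals, num_samples,
                           sc=0.5, goal_range=1.0, rng=np.random):
  # Illustrative only; the num_samples == 1 shortcut of the original is omitted.
  goal_dim = orig_goals.shape[-1]
  loc = (next_states - states)[:, :goal_dim]     # mean: observed state change
  scale = sc * goal_range * np.ones_like(loc)    # stddev derived from the spec range
  noisy = rng.normal(loc, scale, size=(num_samples - 2,) + loc.shape)
  candidates = np.concatenate([noisy, loc[None], orig_goals[None]], axis=0)
  return np.clip(candidates, -goal_range, goal_range)

states = np.zeros((3, 4))
next_states = np.ones((3, 4)) * 0.3
orig_goals = np.zeros((3, 2))
print(sample_candidate_goals(states, next_states, orig_goals, num_samples=5).shape)
```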
diff --git a/research/efficient-hrl/agents/__init__.py b/research/efficient-hrl/agents/__init__.py
deleted file mode 100644
index 8b137891791..00000000000
--- a/research/efficient-hrl/agents/__init__.py
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/research/efficient-hrl/agents/circular_buffer.py b/research/efficient-hrl/agents/circular_buffer.py
deleted file mode 100644
index 72f90f0de89..00000000000
--- a/research/efficient-hrl/agents/circular_buffer.py
+++ /dev/null
@@ -1,289 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""A circular buffer where each element is a list of tensors.
-
-Each element of the buffer is a list of tensors. An example use case is a replay
-buffer in reinforcement learning, where each element is a list of tensors
-representing the state, action, reward etc.
-
-New elements are added sequentially, and once the buffer is full, we
-start overwriting them in a circular fashion. Reading does not remove any
-elements, only adding new elements does.
-"""
-
-import collections
-import numpy as np
-import tensorflow as tf
-
-import gin.tf
-
-
-@gin.configurable
-class CircularBuffer(object):
- """A circular buffer where each element is a list of tensors."""
-
- def __init__(self, buffer_size=1000, scope='replay_buffer'):
- """Circular buffer of list of tensors.
-
- Args:
- buffer_size: (integer) maximum number of tensor lists the buffer can hold.
- scope: (string) variable scope for creating the variables.
- """
- self._buffer_size = np.int64(buffer_size)
- self._scope = scope
- self._tensors = collections.OrderedDict()
- with tf.variable_scope(self._scope):
- self._num_adds = tf.Variable(0, dtype=tf.int64, name='num_adds')
- self._num_adds_cs = tf.CriticalSection(name='num_adds')
-
- @property
- def buffer_size(self):
- return self._buffer_size
-
- @property
- def scope(self):
- return self._scope
-
- @property
- def num_adds(self):
- return self._num_adds
-
- def _create_variables(self, tensors):
- with tf.variable_scope(self._scope):
- for name in tensors.keys():
- tensor = tensors[name]
- self._tensors[name] = tf.get_variable(
- name='BufferVariable_' + name,
- shape=[self._buffer_size] + tensor.get_shape().as_list(),
- dtype=tensor.dtype,
- trainable=False)
-
- def _validate(self, tensors):
- """Validate shapes of tensors."""
- if len(tensors) != len(self._tensors):
- raise ValueError('Expected tensors to have %d elements. Received %d '
- 'instead.' % (len(self._tensors), len(tensors)))
- if self._tensors.keys() != tensors.keys():
- raise ValueError('The keys of tensors should always be the same. '
- 'Received %s instead of %s.' %
- (tensors.keys(), self._tensors.keys()))
- for name, tensor in tensors.items():
- if tensor.get_shape().as_list() != self._tensors[
- name].get_shape().as_list()[1:]:
- raise ValueError('Tensor %s has incorrect shape.' % name)
- if not tensor.dtype.is_compatible_with(self._tensors[name].dtype):
- raise ValueError(
- 'Tensor %s has incorrect data type. Expected %s, received %s' %
- (name, self._tensors[name].read_value().dtype, tensor.dtype))
-
- def add(self, tensors):
- """Adds an element (list/tuple/dict of tensors) to the buffer.
-
- Args:
- tensors: (list/tuple/dict of tensors) to be added to the buffer.
- Returns:
- An add operation that adds the input `tensors` to the buffer. Similar to
- an enqueue_op.
- Raises:
- ValueError: If the shapes and data types of input `tensors' are not the
- same across calls to the add function.
- """
- return self.maybe_add(tensors, True)
-
- def maybe_add(self, tensors, condition):
- """Adds an element (tensors) to the buffer based on the condition..
-
- Args:
- tensors: (list/tuple of tensors) to be added to the buffer.
- condition: A boolean Tensor controlling whether the tensors would be added
- to the buffer or not.
- Returns:
- An add operation that adds the input `tensors` to the buffer. Similar to
- a maybe_enqueue_op.
- Raises:
- ValueError: If the shapes and data types of input `tensors' are not the
- same across calls to the add function.
- """
- if not isinstance(tensors, dict):
- names = [str(i) for i in range(len(tensors))]
- tensors = collections.OrderedDict(zip(names, tensors))
- if not isinstance(tensors, collections.OrderedDict):
- tensors = collections.OrderedDict(
- sorted(tensors.items(), key=lambda t: t[0]))
- if not self._tensors:
- self._create_variables(tensors)
- else:
- self._validate(tensors)
-
- #@tf.critical_section(self._position_mutex)
- def _increment_num_adds():
- # Adding 0 to the num_adds variable is a trick to read the value of the
- # variable and return a read-only tensor. Doing this in a critical
- # section allows us to capture a snapshot of the variable that will
- # not be affected by other threads updating num_adds.
- return self._num_adds.assign_add(1) + 0
- def _add():
- num_adds_inc = self._num_adds_cs.execute(_increment_num_adds)
- current_pos = tf.mod(num_adds_inc - 1, self._buffer_size)
- update_ops = []
- for name in self._tensors.keys():
- update_ops.append(
- tf.scatter_update(self._tensors[name], current_pos, tensors[name]))
- return tf.group(*update_ops)
-
- return tf.contrib.framework.smart_cond(condition, _add, tf.no_op)
-
- def get_random_batch(self, batch_size, keys=None, num_steps=1):
- """Samples a batch of tensors from the buffer with replacement.
-
- Args:
- batch_size: (integer) number of elements to sample.
- keys: List of keys of tensors to retrieve. If None retrieve all.
- num_steps: (integer) length of trajectories to return. If > 1 will return
- a list of lists, where each internal list represents a trajectory of
- length num_steps.
- Returns:
- A list of tensors, where each element in the list is a batch sampled from
- one of the tensors in the buffer.
- Raises:
- ValueError: If get_random_batch is called before calling the add function.
- tf.errors.InvalidArgumentError: If this operation is executed before any
- items are added to the buffer.
- """
- if not self._tensors:
- raise ValueError('The add function must be called before get_random_batch.')
- if keys is None:
- keys = self._tensors.keys()
-
- latest_start_index = self.get_num_adds() - num_steps + 1
- empty_buffer_assert = tf.Assert(
- tf.greater(latest_start_index, 0),
- ['Not enough elements have been added to the buffer.'])
- with tf.control_dependencies([empty_buffer_assert]):
- max_index = tf.minimum(self._buffer_size, latest_start_index)
- indices = tf.random_uniform(
- [batch_size],
- minval=0,
- maxval=max_index,
- dtype=tf.int64)
- if num_steps == 1:
- return self.gather(indices, keys)
- else:
- return self.gather_nstep(num_steps, indices, keys)
-
- def gather(self, indices, keys=None):
- """Returns elements at the specified indices from the buffer.
-
- Args:
- indices: (list of integers or rank 1 int Tensor) indices in the buffer to
- retrieve elements from.
- keys: List of keys of tensors to retrieve. If None retrieve all.
- Returns:
- A list of tensors, where each element in the list is obtained by indexing
- one of the tensors in the buffer.
- Raises:
- ValueError: If gather is called before calling the add function.
- tf.errors.InvalidArgumentError: If indices are bigger than the number of
- items in the buffer.
- """
- if not self._tensors:
- raise ValueError('The add function must be called before calling gather.')
- if keys is None:
- keys = self._tensors.keys()
- with tf.name_scope('Gather'):
- index_bound_assert = tf.Assert(
- tf.less(
- tf.to_int64(tf.reduce_max(indices)),
- tf.minimum(self.get_num_adds(), self._buffer_size)),
- ['Index out of bounds.'])
- with tf.control_dependencies([index_bound_assert]):
- indices = tf.convert_to_tensor(indices)
-
- batch = []
- for key in keys:
- batch.append(tf.gather(self._tensors[key], indices, name=key))
- return batch
-
- def gather_nstep(self, num_steps, indices, keys=None):
- """Returns elements at the specified indices from the buffer.
-
- Args:
- num_steps: (integer) length of trajectories to return.
- indices: (rank-1 int Tensor) start indices in the buffer, one per
- trajectory; each trajectory covers num_steps consecutive positions.
- keys: List of keys of tensors to retrieve. If None retrieve all.
- Returns:
- A list of list-of-tensors, where each element in the list is obtained by
- indexing one of the tensors in the buffer.
- Raises:
- ValueError: If gather is called before calling the add function.
- tf.errors.InvalidArgumentError: If indices are bigger than the number of
- items in the buffer.
- """
- if not self._tensors:
- raise ValueError('The add function must be called before calling gather.')
- if keys is None:
- keys = self._tensors.keys()
- with tf.name_scope('Gather'):
- index_bound_assert = tf.Assert(
- tf.less_equal(
- tf.to_int64(tf.reduce_max(indices) + num_steps),
- self.get_num_adds()),
- ['Trajectory indices go out of bounds.'])
- with tf.control_dependencies([index_bound_assert]):
- indices = tf.map_fn(
- lambda x: tf.mod(tf.range(x, x + num_steps), self._buffer_size),
- indices,
- dtype=tf.int64)
-
- batch = []
- for key in keys:
-
- def SampleTrajectories(trajectory_indices, key=key,
- num_steps=num_steps):
- trajectory_indices.set_shape([num_steps])
- return tf.gather(self._tensors[key], trajectory_indices, name=key)
-
- batch.append(tf.map_fn(SampleTrajectories, indices,
- dtype=self._tensors[key].dtype))
- return batch
-
- def get_position(self):
- """Returns the position at which the last element was added.
-
- Returns:
- An int tensor representing the index at which the last element was added
- to the buffer or -1 if no elements were added.
- """
- return tf.cond(self.get_num_adds() < 1,
- lambda: self.get_num_adds() - 1,
- lambda: tf.mod(self.get_num_adds() - 1, self._buffer_size))
-
- def get_num_adds(self):
- """Returns the number of additions to the buffer.
-
- Returns:
- An int tensor representing the number of elements that were added.
- """
- def num_adds():
- return self._num_adds.value()
-
- return self._num_adds_cs.execute(num_adds)
-
- def get_num_tensors(self):
- """Returns the number of tensors (slots) in the buffer."""
- return len(self._tensors)
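
A minimal in-memory Python analogue of the circular-buffer semantics described above, i.e. sequential writes that wrap around once the buffer is full and sampling with replacement; the class below is an illustrative sketch that omits the TF variables, the critical section, and multi-step gathering:

```
import numpy as np

class SimpleCircularBuffer(object):
  """Fixed-capacity buffer; new items overwrite the oldest once full."""

  def __init__(self, buffer_size=1000):
    self._buffer_size = buffer_size
    self._storage = [None] * buffer_size
    self._num_adds = 0

  def add(self, element):
    # The write position wraps around once the buffer is full.
    self._storage[self._num_adds % self._buffer_size] = element
    self._num_adds += 1

  def get_random_batch(self, batch_size, rng=np.random):
    filled = min(self._num_adds, self._buffer_size)
    if filled == 0:
      raise ValueError('Not enough elements have been added to the buffer.')
    indices = rng.randint(0, filled, size=batch_size)  # sample with replacement
    return [self._storage[i] for i in indices]

buf = SimpleCircularBuffer(buffer_size=4)
for i in range(6):   # the first two elements get overwritten
  buf.add((i, i * i))
print(buf.get_random_batch(3))
```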
diff --git a/research/efficient-hrl/agents/ddpg_agent.py b/research/efficient-hrl/agents/ddpg_agent.py
deleted file mode 100644
index 904eb650271..00000000000
--- a/research/efficient-hrl/agents/ddpg_agent.py
+++ /dev/null
@@ -1,739 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""A DDPG/NAF agent.
-
-Implements the Deep Deterministic Policy Gradient (DDPG) algorithm from
-"Continuous control with deep reinforcement learning" - Lilicrap et al.
-https://arxiv.org/abs/1509.02971, and the Normalized Advantage Functions (NAF)
-algorithm "Continuous Deep Q-Learning with Model-based Acceleration" - Gu et al.
-https://arxiv.org/pdf/1603.00748.
-"""
-
-import tensorflow as tf
-slim = tf.contrib.slim
-import gin.tf
-from utils import utils
-from agents import ddpg_networks as networks
-
-
-@gin.configurable
-class DdpgAgent(object):
- """An RL agent that learns using the DDPG algorithm.
-
- Example usage:
-
- def critic_net(states, actions):
- ...
- def actor_net(states, num_action_dims):
- ...
-
- Given a tensorflow environment tf_env,
- (of type learning.deepmind.rl.environments.tensorflow.python.tfpyenvironment)
-
- obs_spec = tf_env.observation_spec()
- action_spec = tf_env.action_spec()
-
- ddpg_agent = agent.DdpgAgent(obs_spec,
- action_spec,
- actor_net=actor_net,
- critic_net=critic_net)
-
- we can perform actions on the environment as follows:
-
- state = tf_env.observations()[0]
- action = ddpg_agent.actor_net(tf.expand_dims(state, 0))[0, :]
- transition_type, reward, discount = tf_env.step([action])
-
- Train:
-
- critic_loss = ddpg_agent.critic_loss(states, actions, rewards, discounts,
- next_states)
- actor_loss = ddpg_agent.actor_loss(states)
-
- critic_train_op = slim.learning.create_train_op(
- critic_loss,
- critic_optimizer,
- variables_to_train=ddpg_agent.get_trainable_critic_vars(),
- )
-
- actor_train_op = slim.learning.create_train_op(
- actor_loss,
- actor_optimizer,
- variables_to_train=ddpg_agent.get_trainable_actor_vars(),
- )
- """
-
- ACTOR_NET_SCOPE = 'actor_net'
- CRITIC_NET_SCOPE = 'critic_net'
- TARGET_ACTOR_NET_SCOPE = 'target_actor_net'
- TARGET_CRITIC_NET_SCOPE = 'target_critic_net'
-
- def __init__(self,
- observation_spec,
- action_spec,
- actor_net=networks.actor_net,
- critic_net=networks.critic_net,
- td_errors_loss=tf.losses.huber_loss,
- dqda_clipping=0.,
- actions_regularizer=0.,
- target_q_clipping=None,
- residual_phi=0.0,
- debug_summaries=False):
- """Constructs a DDPG agent.
-
- Args:
- observation_spec: A TensorSpec defining the observations.
- action_spec: A BoundedTensorSpec defining the actions.
- actor_net: A callable that creates the actor network. Must take the
- following arguments: states, num_actions. Please see networks.actor_net
- for an example.
- critic_net: A callable that creates the critic network. Must take the
- following arguments: states, actions. Please see networks.critic_net
- for an example.
- td_errors_loss: A callable defining the loss function for the critic
- td error.
- dqda_clipping: (float) clips the gradient dqda element-wise between
- [-dqda_clipping, dqda_clipping]. Does not perform clipping if
- dqda_clipping == 0.
- actions_regularizer: A scalar, when positive penalizes the norm of the
- actions. This can prevent saturation of actions for the actor_loss.
- target_q_clipping: (tuple of floats) clips target q values within
- (low, high) values when computing the critic loss.
- residual_phi: (float) [0.0, 1.0] Residual algorithm parameter that
- interpolates between Q-learning and residual gradient algorithm.
- http://www.leemon.com/papers/1995b.pdf
- debug_summaries: If True, add summaries to help debug behavior.
- Raises:
- ValueError: If 'dqda_clipping' is < 0.
- """
- self._observation_spec = observation_spec[0]
- self._action_spec = action_spec[0]
- self._state_shape = tf.TensorShape([None]).concatenate(
- self._observation_spec.shape)
- self._action_shape = tf.TensorShape([None]).concatenate(
- self._action_spec.shape)
- self._num_action_dims = self._action_spec.shape.num_elements()
-
- self._scope = tf.get_variable_scope().name
- self._actor_net = tf.make_template(
- self.ACTOR_NET_SCOPE, actor_net, create_scope_now_=True)
- self._critic_net = tf.make_template(
- self.CRITIC_NET_SCOPE, critic_net, create_scope_now_=True)
- self._target_actor_net = tf.make_template(
- self.TARGET_ACTOR_NET_SCOPE, actor_net, create_scope_now_=True)
- self._target_critic_net = tf.make_template(
- self.TARGET_CRITIC_NET_SCOPE, critic_net, create_scope_now_=True)
- self._td_errors_loss = td_errors_loss
- if dqda_clipping < 0:
- raise ValueError('dqda_clipping must be >= 0.')
- self._dqda_clipping = dqda_clipping
- self._actions_regularizer = actions_regularizer
- self._target_q_clipping = target_q_clipping
- self._residual_phi = residual_phi
- self._debug_summaries = debug_summaries
-
- def _batch_state(self, state):
- """Convert state to a batched state.
-
- Args:
- state: A [num_state_dims] state tensor, or a list/tuple containing one.
- Returns:
- A [1, num_state_dims] tensor.
- """
- if isinstance(state, (tuple, list)):
- state = state[0]
- if state.get_shape().ndims == 1:
- state = tf.expand_dims(state, 0)
- return state
-
- def action(self, state):
- """Returns the next action for the state.
-
- Args:
- state: A [num_state_dims] tensor representing a state.
- Returns:
- A [num_action_dims] tensor representing the action.
- """
- return self.actor_net(self._batch_state(state), stop_gradients=True)[0, :]
-
- @gin.configurable('ddpg_sample_action')
- def sample_action(self, state, stddev=1.0):
- """Returns the action for the state with additive noise.
-
- Args:
- state: A [num_state_dims] tensor representing a state.
- stddev: Standard deviation of the additive Gaussian noise.
- Returns:
- A [num_action_dims] action tensor.
- """
- agent_action = self.action(state)
- agent_action += tf.random_normal(tf.shape(agent_action)) * stddev
- return utils.clip_to_spec(agent_action, self._action_spec)
-
- def actor_net(self, states, stop_gradients=False):
- """Returns the output of the actor network.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- stop_gradients: (boolean) if true, gradients cannot be propagated through
- this operation.
- Returns:
- A [batch_size, num_action_dims] tensor of actions.
- Raises:
- ValueError: If `states` does not have the expected dimensions.
- """
- self._validate_states(states)
- actions = self._actor_net(states, self._action_spec)
- if stop_gradients:
- actions = tf.stop_gradient(actions)
- return actions
-
- def critic_net(self, states, actions, for_critic_loss=False):
- """Returns the output of the critic network.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] tensor representing a batch
- of actions.
- Returns:
- q values: A [batch_size] tensor of q values.
- Raises:
- ValueError: If `states` or `actions' do not have the expected dimensions.
- """
- self._validate_states(states)
- self._validate_actions(actions)
- return self._critic_net(states, actions,
- for_critic_loss=for_critic_loss)
-
- def target_actor_net(self, states):
- """Returns the output of the target actor network.
-
- The target network is used to compute stable targets for training.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- Returns:
- A [batch_size, num_action_dims] tensor of actions.
- Raises:
- ValueError: If `states` does not have the expected dimensions.
- """
- self._validate_states(states)
- actions = self._target_actor_net(states, self._action_spec)
- return tf.stop_gradient(actions)
-
- def target_critic_net(self, states, actions, for_critic_loss=False):
- """Returns the output of the target critic network.
-
- The target network is used to compute stable targets for training.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] tensor representing a batch
- of actions.
- Returns:
- q values: A [batch_size] tensor of q values.
- Raises:
- ValueError: If `states` or `actions' do not have the expected dimensions.
- """
- self._validate_states(states)
- self._validate_actions(actions)
- return tf.stop_gradient(
- self._target_critic_net(states, actions,
- for_critic_loss=for_critic_loss))
-
- def value_net(self, states, for_critic_loss=False):
- """Returns the output of the critic evaluated with the actor.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- Returns:
- q values: A [batch_size] tensor of q values.
- """
- actions = self.actor_net(states)
- return self.critic_net(states, actions,
- for_critic_loss=for_critic_loss)
-
- def target_value_net(self, states, for_critic_loss=False):
- """Returns the output of the target critic evaluated with the target actor.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- Returns:
- q values: A [batch_size] tensor of q values.
- """
- target_actions = self.target_actor_net(states)
- return self.target_critic_net(states, target_actions,
- for_critic_loss=for_critic_loss)
-
- def critic_loss(self, states, actions, rewards, discounts,
- next_states):
- """Computes a loss for training the critic network.
-
- The loss is the mean squared error between the Q value predictions of the
- critic and Q values estimated using TD-lambda.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] tensor representing a batch
- of actions.
- rewards: A [batch_size, ...] tensor representing a batch of rewards,
- broadcastable to the critic net output.
- discounts: A [batch_size, ...] tensor representing a batch of discounts,
- broadcastable to the critic net output.
- next_states: A [batch_size, num_state_dims] tensor representing a batch
- of next states.
- Returns:
- A rank-0 tensor representing the critic loss.
- Raises:
- ValueError: If any of the inputs do not have the expected dimensions, or
- if their batch_sizes do not match.
- """
- self._validate_states(states)
- self._validate_actions(actions)
- self._validate_states(next_states)
-
- target_q_values = self.target_value_net(next_states, for_critic_loss=True)
- td_targets = target_q_values * discounts + rewards
- if self._target_q_clipping is not None:
- td_targets = tf.clip_by_value(td_targets, self._target_q_clipping[0],
- self._target_q_clipping[1])
- q_values = self.critic_net(states, actions, for_critic_loss=True)
- td_errors = td_targets - q_values
- if self._debug_summaries:
- gen_debug_td_error_summaries(
- target_q_values, q_values, td_targets, td_errors)
-
- loss = self._td_errors_loss(td_targets, q_values)
-
- if self._residual_phi > 0.0: # compute residual gradient loss
- residual_q_values = self.value_net(next_states, for_critic_loss=True)
- residual_td_targets = residual_q_values * discounts + rewards
- if self._target_q_clipping is not None:
- residual_td_targets = tf.clip_by_value(residual_td_targets,
- self._target_q_clipping[0],
- self._target_q_clipping[1])
- residual_td_errors = residual_td_targets - q_values
- residual_loss = self._td_errors_loss(
- residual_td_targets, residual_q_values)
- loss = (loss * (1.0 - self._residual_phi) +
- residual_loss * self._residual_phi)
- return loss
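
For reference, the TD target and error computed in `critic_loss` above reduce to a couple of lines of array arithmetic; a small NumPy illustration (with a plain squared-error stand-in for the Huber loss):

```
import numpy as np

q_values = np.array([1.0, 2.0, 0.5])        # critic(states, actions)
target_q_next = np.array([1.5, 0.0, 2.0])   # target critic on next states
rewards = np.array([0.1, 1.0, -0.5])
discounts = np.array([0.99, 0.0, 0.99])     # zero at terminal transitions

td_targets = target_q_next * discounts + rewards
td_errors = td_targets - q_values
loss = np.mean(0.5 * td_errors ** 2)        # squared-error stand-in for the Huber loss
print(td_targets, round(loss, 4))
```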
-
- def actor_loss(self, states):
- """Computes a loss for training the actor network.
-
- Note that the output does not represent an actual loss. It is called a loss only
- in the sense that its gradient w.r.t. the actor network weights is the
- correct gradient for training the actor network,
- i.e. dloss/dweights = (dq/da)*(da/dweights)
- which is the gradient used in Algorithm 1 of Lillicrap et al.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- Returns:
- A rank-0 tensor representing the actor loss.
- Raises:
- ValueError: If `states` does not have the expected dimensions.
- """
- self._validate_states(states)
- actions = self.actor_net(states, stop_gradients=False)
- critic_values = self.critic_net(states, actions)
- q_values = self.critic_function(critic_values, states)
- dqda = tf.gradients([q_values], [actions])[0]
- dqda_unclipped = dqda
- if self._dqda_clipping > 0:
- dqda = tf.clip_by_value(dqda, -self._dqda_clipping, self._dqda_clipping)
-
- actions_norm = tf.norm(actions)
- if self._debug_summaries:
- with tf.name_scope('dqda'):
- tf.summary.scalar('actions_norm', actions_norm)
- tf.summary.histogram('dqda', dqda)
- tf.summary.histogram('dqda_unclipped', dqda_unclipped)
- tf.summary.histogram('actions', actions)
- for a in range(self._num_action_dims):
- tf.summary.histogram('dqda_unclipped_%d' % a, dqda_unclipped[:, a])
- tf.summary.histogram('dqda_%d' % a, dqda[:, a])
-
- actions_norm *= self._actions_regularizer
- return slim.losses.mean_squared_error(tf.stop_gradient(dqda + actions),
- actions,
- scope='actor_loss') + actions_norm
-
- @gin.configurable('ddpg_critic_function')
- def critic_function(self, critic_values, states, weights=None):
- """Computes q values based on critic_net outputs, states, and weights.
-
- Args:
- critic_values: A tf.float32 [batch_size, ...] tensor representing outputs
- from the critic net.
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- weights: A list or Numpy array or tensor with a shape broadcastable to
- `critic_values`.
- Returns:
- A tf.float32 [batch_size] tensor representing q values.
- """
- del states # unused args
- if weights is not None:
- weights = tf.convert_to_tensor(weights, dtype=critic_values.dtype)
- critic_values *= weights
- if critic_values.shape.ndims > 1:
- critic_values = tf.reduce_sum(critic_values,
- range(1, critic_values.shape.ndims))
- critic_values.shape.assert_has_rank(1)
- return critic_values
-
- @gin.configurable('ddpg_update_targets')
- def update_targets(self, tau=1.0):
- """Performs a soft update of the target network parameters.
-
- For each weight w_s in the actor/critic networks, and its corresponding
- weight w_t in the target actor/critic networks, a soft update is:
- w_t = (1 - tau) * w_t + tau * w_s
-
- Args:
- tau: A float scalar in [0, 1]
- Returns:
- An operation that performs a soft update of the target network parameters.
- Raises:
- ValueError: If `tau` is not in [0, 1].
- """
- if tau < 0 or tau > 1:
- raise ValueError('Input `tau` should be in [0, 1].')
- update_actor = utils.soft_variables_update(
- slim.get_trainable_variables(
- utils.join_scope(self._scope, self.ACTOR_NET_SCOPE)),
- slim.get_trainable_variables(
- utils.join_scope(self._scope, self.TARGET_ACTOR_NET_SCOPE)),
- tau)
- update_critic = utils.soft_variables_update(
- slim.get_trainable_variables(
- utils.join_scope(self._scope, self.CRITIC_NET_SCOPE)),
- slim.get_trainable_variables(
- utils.join_scope(self._scope, self.TARGET_CRITIC_NET_SCOPE)),
- tau)
- return tf.group(update_actor, update_critic, name='update_targets')
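
The soft update rule in the docstring above, w_t = (1 - tau) * w_t + tau * w_s, is an exponential moving average of the source weights. A minimal NumPy sketch of that interpolation on plain arrays (illustrative; the actual code updates TF variables via `utils.soft_variables_update`):

```
import numpy as np

def soft_update(source_weights, target_weights, tau=0.005):
  # Move each target weight a fraction tau toward its source counterpart.
  return [(1.0 - tau) * w_t + tau * w_s
          for w_s, w_t in zip(source_weights, target_weights)]

source = [np.ones((2, 2)), np.full((2,), 3.0)]
target = [np.zeros((2, 2)), np.zeros((2,))]
for _ in range(3):
  target = soft_update(source, target, tau=0.5)
print([np.round(w, 3) for w in target])  # converges toward the source weights
```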
-
- def get_trainable_critic_vars(self):
- """Returns a list of trainable variables in the critic network.
-
- Returns:
- A list of trainable variables in the critic network.
- """
- return slim.get_trainable_variables(
- utils.join_scope(self._scope, self.CRITIC_NET_SCOPE))
-
- def get_trainable_actor_vars(self):
- """Returns a list of trainable variables in the actor network.
-
- Returns:
- A list of trainable variables in the actor network.
- """
- return slim.get_trainable_variables(
- utils.join_scope(self._scope, self.ACTOR_NET_SCOPE))
-
- def get_critic_vars(self):
- """Returns a list of all variables in the critic network.
-
- Returns:
- A list of all model variables in the critic network.
- """
- return slim.get_model_variables(
- utils.join_scope(self._scope, self.CRITIC_NET_SCOPE))
-
- def get_actor_vars(self):
- """Returns a list of all variables in the actor network.
-
- Returns:
- A list of all model variables in the actor network.
- """
- return slim.get_model_variables(
- utils.join_scope(self._scope, self.ACTOR_NET_SCOPE))
-
- def _validate_states(self, states):
- """Raises a value error if `states` does not have the expected shape.
-
- Args:
- states: A tensor.
- Raises:
- ValueError: If states.shape or states.dtype are not compatible with
- observation_spec.
- """
- states.shape.assert_is_compatible_with(self._state_shape)
- if not states.dtype.is_compatible_with(self._observation_spec.dtype):
- raise ValueError('states.dtype={} is not compatible with'
- ' observation_spec.dtype={}'.format(
- states.dtype, self._observation_spec.dtype))
-
- def _validate_actions(self, actions):
- """Raises a value error if `actions` does not have the expected shape.
-
- Args:
- actions: A tensor.
- Raises:
- ValueError: If actions.shape or actions.dtype are not compatible with
- action_spec.
- """
- actions.shape.assert_is_compatible_with(self._action_shape)
- if not actions.dtype.is_compatible_with(self._action_spec.dtype):
- raise ValueError('actions.dtype={} is not compatible with'
- ' action_spec.dtype={}'.format(
- actions.dtype, self._action_spec.dtype))
-
-
-@gin.configurable
-class TD3Agent(DdpgAgent):
- """An RL agent that learns using the TD3 algorithm."""
-
- ACTOR_NET_SCOPE = 'actor_net'
- CRITIC_NET_SCOPE = 'critic_net'
- CRITIC_NET2_SCOPE = 'critic_net2'
- TARGET_ACTOR_NET_SCOPE = 'target_actor_net'
- TARGET_CRITIC_NET_SCOPE = 'target_critic_net'
- TARGET_CRITIC_NET2_SCOPE = 'target_critic_net2'
-
- def __init__(self,
- observation_spec,
- action_spec,
- actor_net=networks.actor_net,
- critic_net=networks.critic_net,
- td_errors_loss=tf.losses.huber_loss,
- dqda_clipping=0.,
- actions_regularizer=0.,
- target_q_clipping=None,
- residual_phi=0.0,
- debug_summaries=False):
- """Constructs a TD3 agent.
-
- Args:
- observation_spec: A TensorSpec defining the observations.
- action_spec: A BoundedTensorSpec defining the actions.
- actor_net: A callable that creates the actor network. Must take the
- following arguments: states, num_actions. Please see networks.actor_net
- for an example.
- critic_net: A callable that creates the critic network. Must take the
- following arguments: states, actions. Please see networks.critic_net
- for an example.
- td_errors_loss: A callable defining the loss function for the critic
- td error.
- dqda_clipping: (float) clips the gradient dqda element-wise between
- [-dqda_clipping, dqda_clipping]. Does not perform clipping if
- dqda_clipping == 0.
- actions_regularizer: A scalar, when positive penalizes the norm of the
- actions. This can prevent saturation of actions for the actor_loss.
- target_q_clipping: (tuple of floats) clips target q values within
- (low, high) values when computing the critic loss.
- residual_phi: (float) [0.0, 1.0] Residual algorithm parameter that
- interpolates between Q-learning and residual gradient algorithm.
- http://www.leemon.com/papers/1995b.pdf
- debug_summaries: If True, add summaries to help debug behavior.
- Raises:
- ValueError: If 'dqda_clipping' is < 0.
- """
- self._observation_spec = observation_spec[0]
- self._action_spec = action_spec[0]
- self._state_shape = tf.TensorShape([None]).concatenate(
- self._observation_spec.shape)
- self._action_shape = tf.TensorShape([None]).concatenate(
- self._action_spec.shape)
- self._num_action_dims = self._action_spec.shape.num_elements()
-
- self._scope = tf.get_variable_scope().name
- self._actor_net = tf.make_template(
- self.ACTOR_NET_SCOPE, actor_net, create_scope_now_=True)
- self._critic_net = tf.make_template(
- self.CRITIC_NET_SCOPE, critic_net, create_scope_now_=True)
- self._critic_net2 = tf.make_template(
- self.CRITIC_NET2_SCOPE, critic_net, create_scope_now_=True)
- self._target_actor_net = tf.make_template(
- self.TARGET_ACTOR_NET_SCOPE, actor_net, create_scope_now_=True)
- self._target_critic_net = tf.make_template(
- self.TARGET_CRITIC_NET_SCOPE, critic_net, create_scope_now_=True)
- self._target_critic_net2 = tf.make_template(
- self.TARGET_CRITIC_NET2_SCOPE, critic_net, create_scope_now_=True)
- self._td_errors_loss = td_errors_loss
- if dqda_clipping < 0:
- raise ValueError('dqda_clipping must be >= 0.')
- self._dqda_clipping = dqda_clipping
- self._actions_regularizer = actions_regularizer
- self._target_q_clipping = target_q_clipping
- self._residual_phi = residual_phi
- self._debug_summaries = debug_summaries
-
- def get_trainable_critic_vars(self):
- """Returns a list of trainable variables in the critic network.
- NOTE: This gets the vars of both critic networks.
-
- Returns:
- A list of trainable variables in the critic network.
- """
- return (
- slim.get_trainable_variables(
- utils.join_scope(self._scope, self.CRITIC_NET_SCOPE)))
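This relies on the fact that collection scope filtering in TensorFlow is a prefix (regex) match, so the single 'critic_net' scope string also picks up 'critic_net2'. A minimal sketch of that assumption:

```python
import tensorflow as tf

with tf.variable_scope('critic_net'):
  v1 = tf.get_variable('w1', shape=[1])
with tf.variable_scope('critic_net2'):
  v2 = tf.get_variable('w2', shape=[1])

# scope='critic_net' matches names starting with 'critic_net', which includes
# both 'critic_net/w1' and 'critic_net2/w2'.
both = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='critic_net')
print([v.op.name for v in both])  # ['critic_net/w1', 'critic_net2/w2']
```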
-
- def critic_net(self, states, actions, for_critic_loss=False):
- """Returns the output of the critic network.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] tensor representing a batch
- of actions.
- for_critic_loss: If True, returns the outputs of both critic networks as a
- tuple (values1, values2); used when computing the critic loss.
- Returns:
- q values: A [batch_size] tensor of q values.
- Raises:
- ValueError: If `states` or `actions` do not have the expected dimensions.
- """
- values1 = self._critic_net(states, actions,
- for_critic_loss=for_critic_loss)
- values2 = self._critic_net2(states, actions,
- for_critic_loss=for_critic_loss)
- if for_critic_loss:
- return values1, values2
- return values1
-
- def target_critic_net(self, states, actions, for_critic_loss=False):
- """Returns the output of the target critic network.
-
- The target network is used to compute stable targets for training.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] tensor representing a batch
- of actions.
- for_critic_loss: If True, returns the outputs of both target critic
- networks as a tuple (values1, values2); used when computing the critic loss.
- Returns:
- q values: A [batch_size] tensor of q values.
- Raises:
- ValueError: If `states` or `actions` do not have the expected dimensions.
- """
- self._validate_states(states)
- self._validate_actions(actions)
- values1 = tf.stop_gradient(
- self._target_critic_net(states, actions,
- for_critic_loss=for_critic_loss))
- values2 = tf.stop_gradient(
- self._target_critic_net2(states, actions,
- for_critic_loss=for_critic_loss))
- if for_critic_loss:
- return values1, values2
- return values1
-
- def value_net(self, states, for_critic_loss=False):
- """Returns the output of the critic evaluated with the actor.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- Returns:
- q values: A [batch_size] tensor of q values.
- """
- actions = self.actor_net(states)
- return self.critic_net(states, actions,
- for_critic_loss=for_critic_loss)
-
- def target_value_net(self, states, for_critic_loss=False):
- """Returns the output of the target critic evaluated with the target actor.
-
- Args:
- states: A [batch_size, num_state_dims] tensor representing a batch
- of states.
- Returns:
- q values: A tuple of two identical [batch_size] tensors holding the
- element-wise minimum of the two target critic values.
- """
- target_actions = self.target_actor_net(states)
- noise = tf.clip_by_value(
- tf.random_normal(tf.shape(target_actions), stddev=0.2), -0.5, 0.5)
- values1, values2 = self.target_critic_net(
- states, target_actions + noise,
- for_critic_loss=for_critic_loss)
- values = tf.minimum(values1, values2)
- return values, values
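A rough standalone sketch (with hypothetical critic callables q1 and q2, not from the original file) of the clipped-noise, clipped double-Q target computed above:

```python
import tensorflow as tf

def td3_target_q(q1, q2, target_actor, states,
                 noise_stddev=0.2, noise_clip=0.5):
  # Perturb the target action with clipped Gaussian noise, then take the
  # element-wise minimum of the two target critics (clipped double Q).
  actions = target_actor(states)
  noise = tf.clip_by_value(
      tf.random_normal(tf.shape(actions), stddev=noise_stddev),
      -noise_clip, noise_clip)
  return tf.minimum(q1(states, actions + noise),
                    q2(states, actions + noise))
```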
-
- @gin.configurable('td3_update_targets')
- def update_targets(self, tau=1.0):
- """Performs a soft update of the target network parameters.
-
- For each weight w_s in the actor/critic networks, and its corresponding
- weight w_t in the target actor/critic networks, a soft update is:
- w_t = (1 - tau) * w_t + tau * w_s
-
- Args:
- tau: A float scalar in [0, 1]
- Returns:
- An operation that performs a soft update of the target network parameters.
- Raises:
- ValueError: If `tau` is not in [0, 1].
- """
- if tau < 0 or tau > 1:
- raise ValueError('Input `tau` should be in [0, 1].')
- update_actor = utils.soft_variables_update(
- slim.get_trainable_variables(
- utils.join_scope(self._scope, self.ACTOR_NET_SCOPE)),
- slim.get_trainable_variables(
- utils.join_scope(self._scope, self.TARGET_ACTOR_NET_SCOPE)),
- tau)
- # NOTE: This updates both critic networks.
- update_critic = utils.soft_variables_update(
- slim.get_trainable_variables(
- utils.join_scope(self._scope, self.CRITIC_NET_SCOPE)),
- slim.get_trainable_variables(
- utils.join_scope(self._scope, self.TARGET_CRITIC_NET_SCOPE)),
- tau)
- return tf.group(update_actor, update_critic, name='update_targets')
-
-
-def gen_debug_td_error_summaries(
- target_q_values, q_values, td_targets, td_errors):
- """Generates debug summaries for critic given a set of batch samples.
-
- Args:
- target_q_values: set of predicted next-step q values.
- q_values: current predicted q values from the critic network.
- td_targets: discounted target_q_values with the next-step reward added.
- td_errors: the difference between td_targets and q_values.
- """
- with tf.name_scope('td_errors'):
- tf.summary.histogram('td_targets', td_targets)
- tf.summary.histogram('q_values', q_values)
- tf.summary.histogram('target_q_values', target_q_values)
- tf.summary.histogram('td_errors', td_errors)
- with tf.name_scope('td_targets'):
- tf.summary.scalar('mean', tf.reduce_mean(td_targets))
- tf.summary.scalar('max', tf.reduce_max(td_targets))
- tf.summary.scalar('min', tf.reduce_min(td_targets))
- with tf.name_scope('q_values'):
- tf.summary.scalar('mean', tf.reduce_mean(q_values))
- tf.summary.scalar('max', tf.reduce_max(q_values))
- tf.summary.scalar('min', tf.reduce_min(q_values))
- with tf.name_scope('target_q_values'):
- tf.summary.scalar('mean', tf.reduce_mean(target_q_values))
- tf.summary.scalar('max', tf.reduce_max(target_q_values))
- tf.summary.scalar('min', tf.reduce_min(target_q_values))
- with tf.name_scope('td_errors'):
- tf.summary.scalar('mean', tf.reduce_mean(td_errors))
- tf.summary.scalar('max', tf.reduce_max(td_errors))
- tf.summary.scalar('min', tf.reduce_min(td_errors))
- tf.summary.scalar('mean_abs', tf.reduce_mean(tf.abs(td_errors)))
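For reference, a minimal sketch of how the summarized quantities are conventionally formed (the actual computation lives in the agent's training code, so treat this as an assumption about its shape rather than a copy of it):

```python
import tensorflow as tf

def td_quantities(rewards, discounts, target_q_values, q_values):
  # td_targets: reward plus the discounted next-step target q value.
  td_targets = rewards + discounts * target_q_values
  # td_errors: difference between the targets and the current q estimates.
  td_errors = td_targets - q_values
  return td_targets, td_errors
```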
diff --git a/research/efficient-hrl/agents/ddpg_networks.py b/research/efficient-hrl/agents/ddpg_networks.py
deleted file mode 100644
index 63074dfb91c..00000000000
--- a/research/efficient-hrl/agents/ddpg_networks.py
+++ /dev/null
@@ -1,150 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Sample actor(policy) and critic(q) networks to use with DDPG/NAF agents.
-
-The DDPG networks are defined in "Section 7: Experiment Details" of
-"Continuous control with deep reinforcement learning" - Lilicrap et al.
-https://arxiv.org/abs/1509.02971
-
-The NAF critic network is based on "Section 4" of "Continuous deep Q-learning
-with model-based acceleration" - Gu et al. https://arxiv.org/pdf/1603.00748.
-"""
-
-import tensorflow as tf
-slim = tf.contrib.slim
-import gin.tf
-
-
-@gin.configurable('ddpg_critic_net')
-def critic_net(states, actions,
- for_critic_loss=False,
- num_reward_dims=1,
- states_hidden_layers=(400,),
- actions_hidden_layers=None,
- joint_hidden_layers=(300,),
- weight_decay=0.0001,
- normalizer_fn=None,
- activation_fn=tf.nn.relu,
- zero_obs=False,
- images=False):
- """Creates a critic that returns q values for the given states and actions.
-
- Args:
- states: (castable to tf.float32) a [batch_size, num_state_dims] tensor
- representing a batch of states.
- actions: (castable to tf.float32) a [batch_size, num_action_dims] tensor
- representing a batch of actions.
- for_critic_loss: If True and num_reward_dims > 1, returns the vector
- q values instead of reducing them to a scalar per example.
- num_reward_dims: Number of reward dimensions.
- states_hidden_layers: tuple of hidden layer units for states.
- actions_hidden_layers: tuple of hidden layer units for actions.
- joint_hidden_layers: tuple of hidden layer units after joining states
- and actions using tf.concat().
- weight_decay: Weight decay for the l2 weights regularizer.
- normalizer_fn: Normalizer function, e.g. slim.layer_norm.
- activation_fn: Activation function, e.g. tf.nn.relu, slim.leaky_relu, ...
- zero_obs: If True, zeroes out the first two state dimensions (x, y
- position) before the hidden layers.
- images: If True, treated the same as zero_obs (zeroes out x, y position).
- Returns:
- A tf.float32 [batch_size] tensor of q values, or a tf.float32
- [batch_size, num_reward_dims] tensor of vector q values if
- num_reward_dims > 1.
- """
- with slim.arg_scope(
- [slim.fully_connected],
- activation_fn=activation_fn,
- normalizer_fn=normalizer_fn,
- weights_regularizer=slim.l2_regularizer(weight_decay),
- weights_initializer=slim.variance_scaling_initializer(
- factor=1.0/3.0, mode='FAN_IN', uniform=True)):
-
- orig_states = tf.to_float(states)
- # TD3-style input: the critic sees states and actions concatenated.
- states = tf.concat([tf.to_float(states), tf.to_float(actions)], -1)
- if images or zero_obs:
- # Zero out the (x, y) position dimensions.
- states *= tf.constant([0.0] * 2 + [1.0] * (states.shape[1] - 2))
- actions = tf.to_float(actions)
- if states_hidden_layers:
- states = slim.stack(states, slim.fully_connected, states_hidden_layers,
- scope='states')
- if actions_hidden_layers:
- actions = slim.stack(actions, slim.fully_connected, actions_hidden_layers,
- scope='actions')
- joint = tf.concat([states, actions], 1)
- if joint_hidden_layers:
- joint = slim.stack(joint, slim.fully_connected, joint_hidden_layers,
- scope='joint')
- with slim.arg_scope([slim.fully_connected],
- weights_regularizer=None,
- weights_initializer=tf.random_uniform_initializer(
- minval=-0.003, maxval=0.003)):
- value = slim.fully_connected(joint, num_reward_dims,
- activation_fn=None,
- normalizer_fn=None,
- scope='q_value')
- if num_reward_dims == 1:
- value = tf.reshape(value, [-1])
- if not for_critic_loss and num_reward_dims > 1:
- value = tf.reduce_sum(
- value * tf.abs(orig_states[:, -num_reward_dims:]), -1)
- return value
-
-
-@gin.configurable('ddpg_actor_net')
-def actor_net(states, action_spec,
- hidden_layers=(400, 300),
- normalizer_fn=None,
- activation_fn=tf.nn.relu,
- zero_obs=False,
- images=False):
- """Creates an actor that returns actions for the given states.
-
- Args:
- states: (castable to tf.float32) a [batch_size, num_state_dims] tensor
- representing a batch of states.
- action_spec: (BoundedTensorSpec) A tensor spec indicating the shape
- and range of actions.
- hidden_layers: tuple of hidden layer units.
- normalizer_fn: Normalizer function, e.g. slim.layer_norm.
- activation_fn: Activation function, e.g. tf.nn.relu, slim.leaky_relu, ...
- zero_obs: If True, zeroes out the first two state dimensions (x, y
- position) before the hidden layers.
- images: If True, treated the same as zero_obs (zeroes out x, y position).
- Returns:
- A tf.float32 [batch_size, num_action_dims] tensor of actions.
- """
-
- with slim.arg_scope(
- [slim.fully_connected],
- activation_fn=activation_fn,
- normalizer_fn=normalizer_fn,
- weights_initializer=slim.variance_scaling_initializer(
- factor=1.0/3.0, mode='FAN_IN', uniform=True)):
-
- states = tf.to_float(states)
- orig_states = states
- if images or zero_obs: # Zero-out x, y position. Hacky.
- states *= tf.constant([0.0] * 2 + [1.0] * (states.shape[1] - 2))
- if hidden_layers:
- states = slim.stack(states, slim.fully_connected, hidden_layers,
- scope='states')
- with slim.arg_scope([slim.fully_connected],
- weights_initializer=tf.random_uniform_initializer(
- minval=-0.003, maxval=0.003)):
- actions = slim.fully_connected(states,
- action_spec.shape.num_elements(),
- scope='actions',
- normalizer_fn=None,
- activation_fn=tf.nn.tanh)
- action_means = (action_spec.maximum + action_spec.minimum) / 2.0
- action_magnitudes = (action_spec.maximum - action_spec.minimum) / 2.0
- actions = action_means + action_magnitudes * actions
-
- return actions
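The final rescaling maps the tanh output in [-1, 1] onto the action bounds. A small numeric sketch with made-up bounds:

```python
import numpy as np

minimum, maximum = np.array([-2.0, -2.0]), np.array([2.0, 2.0])  # hypothetical bounds
action_means = (maximum + minimum) / 2.0       # [0., 0.]
action_magnitudes = (maximum - minimum) / 2.0  # [2., 2.]
tanh_out = np.array([0.5, -1.0])               # network output in [-1, 1]
print(action_means + action_magnitudes * tanh_out)  # [ 1. -2.]
```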
diff --git a/research/efficient-hrl/cond_fn.py b/research/efficient-hrl/cond_fn.py
deleted file mode 100644
index cd1a276e136..00000000000
--- a/research/efficient-hrl/cond_fn.py
+++ /dev/null
@@ -1,244 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Defines many boolean functions indicating when to step and reset.
-"""
-
-import tensorflow as tf
-import gin.tf
-
-
-@gin.configurable
-def env_transition(agent, state, action, transition_type, environment_steps,
- num_episodes):
- """True if the transition_type is TRANSITION or FINAL_TRANSITION.
-
- Args:
- agent: RL agent.
- state: A [num_state_dims] tensor representing a state.
- action: Action performed.
- transition_type: Type of transition after action
- environment_steps: Number of steps performed by environment.
- num_episodes: Number of episodes.
- Returns:
- cond: Returns an op that evaluates to true if the transition type is
- not RESTARTING
- """
- del agent, state, action, num_episodes, environment_steps
- cond = tf.logical_not(transition_type)
- return cond
-
-
-@gin.configurable
-def env_restart(agent, state, action, transition_type, environment_steps,
- num_episodes):
- """True if the transition_type is RESTARTING.
-
- Args:
- agent: RL agent.
- state: A [num_state_dims] tensor representing a state.
- action: Action performed.
- transition_type: Type of transition after action
- environment_steps: Number of steps performed by environment.
- num_episodes: Number of episodes.
- Returns:
- cond: Returns an op that evaluates to true if the transition type equals
- RESTARTING.
- """
- del agent, state, action, num_episodes, environment_steps
- cond = tf.identity(transition_type)
- return cond
-
-
-@gin.configurable
-def every_n_steps(agent,
- state,
- action,
- transition_type,
- environment_steps,
- num_episodes,
- n=150):
- """True once every n steps.
-
- Args:
- agent: RL agent.
- state: A [num_state_dims] tensor representing a state.
- action: Action performed.
- transition_type: Type of transition after action
- environment_steps: Number of steps performed by environment.
- num_episodes: Number of episodes.
- n: Return true once every n steps.
- Returns:
- cond: Returns an op that evaluates to true if environment_steps
- equals 0 mod n. We increment the step before checking this condition, so
- we do not need to add one to environment_steps.
- """
- del agent, state, action, transition_type, num_episodes
- cond = tf.equal(tf.mod(environment_steps, n), 0)
- return cond
-
-
-@gin.configurable
-def every_n_episodes(agent,
- state,
- action,
- transition_type,
- environment_steps,
- num_episodes,
- n=2,
- steps_per_episode=None):
- """True once every n episodes.
-
- Specifically, evaluates to True on the 0th step of every nth episode.
- Unlike environment_steps, num_episodes starts at 0, so we do want to add
- one to ensure it does not reset on the first call.
-
- Args:
- agent: RL agent.
- state: A [num_state_dims] tensor representing a state.
- action: Action performed.
- transition_type: Type of transition after action
- environment_steps: Number of steps performed by environment.
- num_episodes: Number of episodes.
- n: Return true once every n episodes.
- steps_per_episode: How many steps per episode. Needed to determine when a
- new episode starts.
- Returns:
- cond: Returns an op that evaluates to true at an episode boundary
- (environment_steps equals 0 mod steps_per_episode) if the ant has fallen
- or num_episodes + 1 equals 0 mod n.
- """
- assert steps_per_episode is not None
- del agent, action, transition_type
- ant_fell = tf.logical_or(state[2] < 0.2, state[2] > 1.0)
- cond = tf.logical_and(
- tf.logical_or(
- ant_fell,
- tf.equal(tf.mod(num_episodes + 1, n), 0)),
- tf.equal(tf.mod(environment_steps, steps_per_episode), 0))
- return cond
-
-
-@gin.configurable
-def failed_reset_after_n_episodes(agent,
- state,
- action,
- transition_type,
- environment_steps,
- num_episodes,
- steps_per_episode=None,
- reset_state=None,
- max_dist=1.0,
- epsilon=1e-10):
- """Every n episodes, returns True if the reset agent fails to return.
-
- Specifically, evaluates to True if the distance between the state and the
- reset state is greater than max_dist at the end of the episode.
-
- Args:
- agent: RL agent.
- state: A [num_state_dims] tensor representing a state.
- action: Action performed.
- transition_type: Type of transition after action
- environment_steps: Number of steps performed by environment.
- num_episodes: Number of episodes.
- steps_per_episode: How many steps per episode. Needed to determine when a
- new episode starts.
- reset_state: State to which the reset controller should return.
- max_dist: Agent is considered to have successfully reset if its distance
- from the reset_state is less than max_dist.
- epsilon: small offset to ensure non-negative/zero distance.
- Returns:
- cond: Returns an op that evaluates to true at the end of the episode
- (environment_steps equals 0 mod steps_per_episode) if the distance between
- `state` and `reset_state` exceeds max_dist.
- """
- assert steps_per_episode is not None
- assert reset_state is not None
- del agent, action, transition_type, num_episodes
- dist = tf.sqrt(
- tf.reduce_sum(tf.squared_difference(state, reset_state)) + epsilon)
- cond = tf.logical_and(
- tf.greater(dist, tf.constant(max_dist)),
- tf.equal(tf.mod(environment_steps, steps_per_episode), 0))
- return cond
-
-
-@gin.configurable
-def q_too_small(agent,
- state,
- action,
- transition_type,
- environment_steps,
- num_episodes,
- q_min=0.5):
- """True of q is too small.
-
- Args:
- agent: RL agent.
- state: A [num_state_dims] tensor representing a state.
- action: Action performed.
- transition_type: Type of transition after action
- environment_steps: Number of steps performed by environment.
- num_episodes: Number of episodes.
- q_min: Returns true if the qval is less than q_min
- Returns:
- cond: Returns an op that evaluates to true if qval is less than q_min.
- """
- del transition_type, environment_steps, num_episodes
- state_for_reset_agent = tf.concat(
- [state[:-1], tf.constant([0], dtype=tf.float32)], -1)
- qval = agent.BASE_AGENT_CLASS.critic_net(
- tf.expand_dims(state_for_reset_agent, 0), tf.expand_dims(action, 0))[0, :]
- cond = tf.greater(tf.constant(q_min), qval)
- return cond
-
-
-@gin.configurable
-def true_fn(agent, state, action, transition_type, environment_steps,
- num_episodes):
- """Returns an op that evaluates to true.
-
- Args:
- agent: RL agent.
- state: A [num_state_dims] tensor representing a state.
- action: Action performed.
- transition_type: Type of transition after action
- environment_steps: Number of steps performed by environment.
- num_episodes: Number of episodes.
- Returns:
- cond: op that always evaluates to True.
- """
- del agent, state, action, transition_type, environment_steps, num_episodes
- cond = tf.constant(True, dtype=tf.bool)
- return cond
-
-
-@gin.configurable
-def false_fn(agent, state, action, transition_type, environment_steps,
- num_episodes):
- """Returns an op that evaluates to false.
-
- Args:
- agent: RL agent.
- state: A [num_state_dims] tensor representing a state.
- action: Action performed.
- transition_type: Type of transition after action
- environment_steps: Number of steps performed by environment.
- num_episodes: Number of episodes.
- Returns:
- cond: op that always evaluates to False.
- """
- del agent, state, action, transition_type, environment_steps, num_episodes
- cond = tf.constant(False, dtype=tf.bool)
- return cond
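A minimal usage sketch (hypothetical; the real call sites are in the training loop, which is not shown here) of how one of these cond_fns produces a boolean op:

```python
import tensorflow as tf

# every_n_steps ignores everything except environment_steps and n, so the
# unused arguments can be passed as None in this illustrative snippet.
should_reset = every_n_steps(agent=None, state=None, action=None,
                             transition_type=None,
                             environment_steps=tf.constant(300),
                             num_episodes=None, n=150)  # 300 % 150 == 0 -> True
reset_flag = tf.cond(should_reset,
                     lambda: tf.constant(1.0),
                     lambda: tf.constant(0.0))
```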
diff --git a/research/efficient-hrl/configs/base_uvf.gin b/research/efficient-hrl/configs/base_uvf.gin
deleted file mode 100644
index 2f3f47b67a3..00000000000
--- a/research/efficient-hrl/configs/base_uvf.gin
+++ /dev/null
@@ -1,68 +0,0 @@
-#-*-Python-*-
-import gin.tf.external_configurables
-
-create_maze_env.top_down_view = %IMAGES
-## Create the agent
-AGENT_CLASS = @UvfAgent
-UvfAgent.tf_context = %CONTEXT
-UvfAgent.actor_net = @agent/ddpg_actor_net
-UvfAgent.critic_net = @agent/ddpg_critic_net
-UvfAgent.dqda_clipping = 0.0
-UvfAgent.td_errors_loss = @tf.losses.huber_loss
-UvfAgent.target_q_clipping = %TARGET_Q_CLIPPING
-
-# Create meta agent
-META_CLASS = @MetaAgent
-MetaAgent.tf_context = %META_CONTEXT
-MetaAgent.sub_context = %CONTEXT
-MetaAgent.actor_net = @meta/ddpg_actor_net
-MetaAgent.critic_net = @meta/ddpg_critic_net
-MetaAgent.dqda_clipping = 0.0
-MetaAgent.td_errors_loss = @tf.losses.huber_loss
-MetaAgent.target_q_clipping = %TARGET_Q_CLIPPING
-
-# Create state preprocess
-STATE_PREPROCESS_CLASS = @StatePreprocess
-StatePreprocess.ndims = %SUBGOAL_DIM
-state_preprocess_net.states_hidden_layers = (100, 100)
-state_preprocess_net.num_output_dims = %SUBGOAL_DIM
-state_preprocess_net.images = %IMAGES
-action_embed_net.num_output_dims = %SUBGOAL_DIM
-INVERSE_DYNAMICS_CLASS = @InverseDynamics
-
-# actor_net
-ACTOR_HIDDEN_SIZE_1 = 300
-ACTOR_HIDDEN_SIZE_2 = 300
-agent/ddpg_actor_net.hidden_layers = (%ACTOR_HIDDEN_SIZE_1, %ACTOR_HIDDEN_SIZE_2)
-agent/ddpg_actor_net.activation_fn = @tf.nn.relu
-agent/ddpg_actor_net.zero_obs = %ZERO_OBS
-agent/ddpg_actor_net.images = %IMAGES
-meta/ddpg_actor_net.hidden_layers = (%ACTOR_HIDDEN_SIZE_1, %ACTOR_HIDDEN_SIZE_2)
-meta/ddpg_actor_net.activation_fn = @tf.nn.relu
-meta/ddpg_actor_net.zero_obs = False
-meta/ddpg_actor_net.images = %IMAGES
-# critic_net
-CRITIC_HIDDEN_SIZE_1 = 300
-CRITIC_HIDDEN_SIZE_2 = 300
-agent/ddpg_critic_net.states_hidden_layers = (%CRITIC_HIDDEN_SIZE_1,)
-agent/ddpg_critic_net.actions_hidden_layers = None
-agent/ddpg_critic_net.joint_hidden_layers = (%CRITIC_HIDDEN_SIZE_2,)
-agent/ddpg_critic_net.weight_decay = 0.0
-agent/ddpg_critic_net.activation_fn = @tf.nn.relu
-agent/ddpg_critic_net.zero_obs = %ZERO_OBS
-agent/ddpg_critic_net.images = %IMAGES
-meta/ddpg_critic_net.states_hidden_layers = (%CRITIC_HIDDEN_SIZE_1,)
-meta/ddpg_critic_net.actions_hidden_layers = None
-meta/ddpg_critic_net.joint_hidden_layers = (%CRITIC_HIDDEN_SIZE_2,)
-meta/ddpg_critic_net.weight_decay = 0.0
-meta/ddpg_critic_net.activation_fn = @tf.nn.relu
-meta/ddpg_critic_net.zero_obs = False
-meta/ddpg_critic_net.images = %IMAGES
-
-tf.losses.huber_loss.delta = 1.0
-# Sample action
-uvf_add_noise_fn.stddev = 1.0
-meta_add_noise_fn.stddev = %META_EXPLORE_NOISE
-# Update targets
-ddpg_update_targets.tau = 0.001
-td3_update_targets.tau = 0.005
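For orientation, a minimal sketch (with hypothetical file paths) of how gin bindings like the ones above are typically loaded before the training graph is built:

```python
import gin.tf

# Hypothetical paths: the training binary combines base_uvf.gin with an
# environment/context config and optional overrides before building agents.
gin.parse_config_files_and_bindings(
    ['configs/base_uvf.gin', 'context/configs/ant_maze.gin'],
    bindings=['train_uvf.num_episodes_train = 100'])
```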
diff --git a/research/efficient-hrl/configs/eval_uvf.gin b/research/efficient-hrl/configs/eval_uvf.gin
deleted file mode 100644
index 7a58241e06a..00000000000
--- a/research/efficient-hrl/configs/eval_uvf.gin
+++ /dev/null
@@ -1,14 +0,0 @@
-#-*-Python-*-
-# Config eval
-evaluate.environment = @create_maze_env()
-evaluate.agent_class = %AGENT_CLASS
-evaluate.meta_agent_class = %META_CLASS
-evaluate.state_preprocess_class = %STATE_PREPROCESS_CLASS
-evaluate.num_episodes_eval = 50
-evaluate.num_episodes_videos = 1
-evaluate.gamma = 1.0
-evaluate.eval_interval_secs = 1
-evaluate.generate_videos = False
-evaluate.generate_summaries = True
-evaluate.eval_modes = %EVAL_MODES
-evaluate.max_steps_per_episode = %RESET_EPISODE_PERIOD
diff --git a/research/efficient-hrl/configs/train_uvf.gin b/research/efficient-hrl/configs/train_uvf.gin
deleted file mode 100644
index 8b02d7a6cb4..00000000000
--- a/research/efficient-hrl/configs/train_uvf.gin
+++ /dev/null
@@ -1,52 +0,0 @@
-#-*-Python-*-
-# Create replay_buffer
-agent/CircularBuffer.buffer_size = 200000
-meta/CircularBuffer.buffer_size = 200000
-agent/CircularBuffer.scope = "agent"
-meta/CircularBuffer.scope = "meta"
-
-# Config train
-train_uvf.environment = @create_maze_env()
-train_uvf.agent_class = %AGENT_CLASS
-train_uvf.meta_agent_class = %META_CLASS
-train_uvf.state_preprocess_class = %STATE_PREPROCESS_CLASS
-train_uvf.inverse_dynamics_class = %INVERSE_DYNAMICS_CLASS
-train_uvf.replay_buffer = @agent/CircularBuffer()
-train_uvf.meta_replay_buffer = @meta/CircularBuffer()
-train_uvf.critic_optimizer = @critic/AdamOptimizer()
-train_uvf.actor_optimizer = @actor/AdamOptimizer()
-train_uvf.meta_critic_optimizer = @meta_critic/AdamOptimizer()
-train_uvf.meta_actor_optimizer = @meta_actor/AdamOptimizer()
-train_uvf.repr_optimizer = @repr/AdamOptimizer()
-train_uvf.num_episodes_train = 25000
-train_uvf.batch_size = 100
-train_uvf.initial_episodes = 5
-train_uvf.gamma = 0.99
-train_uvf.meta_gamma = 0.99
-train_uvf.reward_scale_factor = 1.0
-train_uvf.target_update_period = 2
-train_uvf.num_updates_per_observation = 1
-train_uvf.num_collect_per_update = 1
-train_uvf.num_collect_per_meta_update = 10
-train_uvf.debug_summaries = False
-train_uvf.log_every_n_steps = 1000
-train_uvf.save_policy_every_n_steps = 100000
-
-# Config Optimizers
-critic/AdamOptimizer.learning_rate = 0.001
-critic/AdamOptimizer.beta1 = 0.9
-critic/AdamOptimizer.beta2 = 0.999
-actor/AdamOptimizer.learning_rate = 0.0001
-actor/AdamOptimizer.beta1 = 0.9
-actor/AdamOptimizer.beta2 = 0.999
-
-meta_critic/AdamOptimizer.learning_rate = 0.001
-meta_critic/AdamOptimizer.beta1 = 0.9
-meta_critic/AdamOptimizer.beta2 = 0.999
-meta_actor/AdamOptimizer.learning_rate = 0.0001
-meta_actor/AdamOptimizer.beta1 = 0.9
-meta_actor/AdamOptimizer.beta2 = 0.999
-
-repr/AdamOptimizer.learning_rate = 0.0001
-repr/AdamOptimizer.beta1 = 0.9
-repr/AdamOptimizer.beta2 = 0.999
diff --git a/research/efficient-hrl/context/__init__.py b/research/efficient-hrl/context/__init__.py
deleted file mode 100644
index 8b137891791..00000000000
--- a/research/efficient-hrl/context/__init__.py
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/research/efficient-hrl/context/configs/ant_block.gin b/research/efficient-hrl/context/configs/ant_block.gin
deleted file mode 100644
index d5bd4f01e01..00000000000
--- a/research/efficient-hrl/context/configs/ant_block.gin
+++ /dev/null
@@ -1,67 +0,0 @@
-#-*-Python-*-
-create_maze_env.env_name = "AntBlock"
-ZERO_OBS = False
-context_range = (%CONTEXT_RANGE_MIN, %CONTEXT_RANGE_MAX)
-meta_context_range = ((-4, -4), (20, 20))
-
-RESET_EPISODE_PERIOD = 500
-RESET_ENV_PERIOD = 1
-# End episode every N steps
-UvfAgent.reset_episode_cond_fn = @every_n_steps
-every_n_steps.n = %RESET_EPISODE_PERIOD
-train_uvf.max_steps_per_episode = %RESET_EPISODE_PERIOD
-# Do a manual reset every N episodes
-UvfAgent.reset_env_cond_fn = @every_n_episodes
-every_n_episodes.n = %RESET_ENV_PERIOD
-every_n_episodes.steps_per_episode = %RESET_EPISODE_PERIOD
-
-## Config defaults
-EVAL_MODES = ["eval1", "eval2", "eval3"]
-
-## Config agent
-CONTEXT = @agent/Context
-META_CONTEXT = @meta/Context
-
-## Config agent context
-agent/Context.context_ranges = [%context_range]
-agent/Context.context_shapes = [%SUBGOAL_DIM]
-agent/Context.meta_action_every_n = 10
-agent/Context.samplers = {
- "train": [@train/DirectionSampler],
- "explore": [@train/DirectionSampler],
-}
-
-agent/Context.context_transition_fn = @relative_context_transition_fn
-agent/Context.context_multi_transition_fn = @relative_context_multi_transition_fn
-
-agent/Context.reward_fn = @uvf/negative_distance
-
-## Config meta context
-meta/Context.context_ranges = [%meta_context_range]
-meta/Context.context_shapes = [2]
-meta/Context.samplers = {
- "train": [@train/RandomSampler],
- "explore": [@train/RandomSampler],
- "eval1": [@eval1/ConstantSampler],
- "eval2": [@eval2/ConstantSampler],
- "eval3": [@eval3/ConstantSampler],
-}
-meta/Context.reward_fn = @task/negative_distance
-
-## Config rewards
-task/negative_distance.state_indices = [3, 4]
-task/negative_distance.relative_context = False
-task/negative_distance.diff = False
-task/negative_distance.offset = 0.0
-
-## Config samplers
-train/RandomSampler.context_range = %meta_context_range
-train/DirectionSampler.context_range = %context_range
-train/DirectionSampler.k = %SUBGOAL_DIM
-relative_context_transition_fn.k = %SUBGOAL_DIM
-relative_context_multi_transition_fn.k = %SUBGOAL_DIM
-MetaAgent.k = %SUBGOAL_DIM
-
-eval1/ConstantSampler.value = [16, 0]
-eval2/ConstantSampler.value = [16, 16]
-eval3/ConstantSampler.value = [0, 16]
diff --git a/research/efficient-hrl/context/configs/ant_block_maze.gin b/research/efficient-hrl/context/configs/ant_block_maze.gin
deleted file mode 100644
index cebf775be12..00000000000
--- a/research/efficient-hrl/context/configs/ant_block_maze.gin
+++ /dev/null
@@ -1,67 +0,0 @@
-#-*-Python-*-
-create_maze_env.env_name = "AntBlockMaze"
-ZERO_OBS = False
-context_range = (%CONTEXT_RANGE_MIN, %CONTEXT_RANGE_MAX)
-meta_context_range = ((-4, -4), (12, 20))
-
-RESET_EPISODE_PERIOD = 500
-RESET_ENV_PERIOD = 1
-# End episode every N steps
-UvfAgent.reset_episode_cond_fn = @every_n_steps
-every_n_steps.n = %RESET_EPISODE_PERIOD
-train_uvf.max_steps_per_episode = %RESET_EPISODE_PERIOD
-# Do a manual reset every N episodes
-UvfAgent.reset_env_cond_fn = @every_n_episodes
-every_n_episodes.n = %RESET_ENV_PERIOD
-every_n_episodes.steps_per_episode = %RESET_EPISODE_PERIOD
-
-## Config defaults
-EVAL_MODES = ["eval1", "eval2", "eval3"]
-
-## Config agent
-CONTEXT = @agent/Context
-META_CONTEXT = @meta/Context
-
-## Config agent context
-agent/Context.context_ranges = [%context_range]
-agent/Context.context_shapes = [%SUBGOAL_DIM]
-agent/Context.meta_action_every_n = 10
-agent/Context.samplers = {
- "train": [@train/DirectionSampler],
- "explore": [@train/DirectionSampler],
-}
-
-agent/Context.context_transition_fn = @relative_context_transition_fn
-agent/Context.context_multi_transition_fn = @relative_context_multi_transition_fn
-
-agent/Context.reward_fn = @uvf/negative_distance
-
-## Config meta context
-meta/Context.context_ranges = [%meta_context_range]
-meta/Context.context_shapes = [2]
-meta/Context.samplers = {
- "train": [@train/RandomSampler],
- "explore": [@train/RandomSampler],
- "eval1": [@eval1/ConstantSampler],
- "eval2": [@eval2/ConstantSampler],
- "eval3": [@eval3/ConstantSampler],
-}
-meta/Context.reward_fn = @task/negative_distance
-
-## Config rewards
-task/negative_distance.state_indices = [3, 4]
-task/negative_distance.relative_context = False
-task/negative_distance.diff = False
-task/negative_distance.offset = 0.0
-
-## Config samplers
-train/RandomSampler.context_range = %meta_context_range
-train/DirectionSampler.context_range = %context_range
-train/DirectionSampler.k = %SUBGOAL_DIM
-relative_context_transition_fn.k = %SUBGOAL_DIM
-relative_context_multi_transition_fn.k = %SUBGOAL_DIM
-MetaAgent.k = %SUBGOAL_DIM
-
-eval1/ConstantSampler.value = [8, 0]
-eval2/ConstantSampler.value = [8, 16]
-eval3/ConstantSampler.value = [0, 16]
diff --git a/research/efficient-hrl/context/configs/ant_fall_multi.gin b/research/efficient-hrl/context/configs/ant_fall_multi.gin
deleted file mode 100644
index eb89ad0cb16..00000000000
--- a/research/efficient-hrl/context/configs/ant_fall_multi.gin
+++ /dev/null
@@ -1,62 +0,0 @@
-#-*-Python-*-
-create_maze_env.env_name = "AntFall"
-context_range = (%CONTEXT_RANGE_MIN, %CONTEXT_RANGE_MAX)
-meta_context_range = ((-4, -4, 0), (12, 28, 5))
-
-RESET_EPISODE_PERIOD = 500
-RESET_ENV_PERIOD = 1
-# End episode every N steps
-UvfAgent.reset_episode_cond_fn = @every_n_steps
-every_n_steps.n = %RESET_EPISODE_PERIOD
-train_uvf.max_steps_per_episode = %RESET_EPISODE_PERIOD
-# Do a manual reset every N episodes
-UvfAgent.reset_env_cond_fn = @every_n_episodes
-every_n_episodes.n = %RESET_ENV_PERIOD
-every_n_episodes.steps_per_episode = %RESET_EPISODE_PERIOD
-
-## Config defaults
-EVAL_MODES = ["eval1"]
-
-## Config agent
-CONTEXT = @agent/Context
-META_CONTEXT = @meta/Context
-
-## Config agent context
-agent/Context.context_ranges = [%context_range]
-agent/Context.context_shapes = [%SUBGOAL_DIM]
-agent/Context.meta_action_every_n = 10
-agent/Context.samplers = {
- "train": [@train/DirectionSampler],
- "explore": [@train/DirectionSampler],
-}
-
-agent/Context.context_transition_fn = @relative_context_transition_fn
-agent/Context.context_multi_transition_fn = @relative_context_multi_transition_fn
-
-agent/Context.reward_fn = @uvf/negative_distance
-
-## Config meta context
-meta/Context.context_ranges = [%meta_context_range]
-meta/Context.context_shapes = [3]
-meta/Context.samplers = {
- "train": [@train/RandomSampler],
- "explore": [@train/RandomSampler],
- "eval1": [@eval1/ConstantSampler],
-}
-meta/Context.reward_fn = @task/negative_distance
-
-## Config rewards
-task/negative_distance.state_indices = [0, 1, 2]
-task/negative_distance.relative_context = False
-task/negative_distance.diff = False
-task/negative_distance.offset = 0.0
-
-## Config samplers
-train/RandomSampler.context_range = %meta_context_range
-train/DirectionSampler.context_range = %context_range
-train/DirectionSampler.k = %SUBGOAL_DIM
-relative_context_transition_fn.k = %SUBGOAL_DIM
-relative_context_multi_transition_fn.k = %SUBGOAL_DIM
-MetaAgent.k = %SUBGOAL_DIM
-
-eval1/ConstantSampler.value = [0, 27, 4.5]
diff --git a/research/efficient-hrl/context/configs/ant_fall_multi_img.gin b/research/efficient-hrl/context/configs/ant_fall_multi_img.gin
deleted file mode 100644
index b54fb7c9196..00000000000
--- a/research/efficient-hrl/context/configs/ant_fall_multi_img.gin
+++ /dev/null
@@ -1,68 +0,0 @@
-#-*-Python-*-
-create_maze_env.env_name = "AntFall"
-IMAGES = True
-
-context_range = (%CONTEXT_RANGE_MIN, %CONTEXT_RANGE_MAX)
-meta_context_range = ((-4, -4, 0), (12, 28, 5))
-
-RESET_EPISODE_PERIOD = 500
-RESET_ENV_PERIOD = 1
-# End episode every N steps
-UvfAgent.reset_episode_cond_fn = @every_n_steps
-every_n_steps.n = %RESET_EPISODE_PERIOD
-train_uvf.max_steps_per_episode = %RESET_EPISODE_PERIOD
-# Do a manual reset every N episodes
-UvfAgent.reset_env_cond_fn = @every_n_episodes
-every_n_episodes.n = %RESET_ENV_PERIOD
-every_n_episodes.steps_per_episode = %RESET_EPISODE_PERIOD
-
-## Config defaults
-EVAL_MODES = ["eval1"]
-
-## Config agent
-CONTEXT = @agent/Context
-META_CONTEXT = @meta/Context
-
-## Config agent context
-agent/Context.context_ranges = [%context_range]
-agent/Context.context_shapes = [%SUBGOAL_DIM]
-agent/Context.meta_action_every_n = 10
-agent/Context.samplers = {
- "train": [@train/DirectionSampler],
- "explore": [@train/DirectionSampler],
-}
-
-agent/Context.context_transition_fn = @relative_context_transition_fn
-agent/Context.context_multi_transition_fn = @relative_context_multi_transition_fn
-
-agent/Context.reward_fn = @uvf/negative_distance
-
-## Config meta context
-meta/Context.context_ranges = [%meta_context_range]
-meta/Context.context_shapes = [3]
-meta/Context.samplers = {
- "train": [@train/RandomSampler],
- "explore": [@train/RandomSampler],
- "eval1": [@eval1/ConstantSampler],
-}
-meta/Context.context_transition_fn = @task/relative_context_transition_fn
-meta/Context.context_multi_transition_fn = @task/relative_context_multi_transition_fn
-meta/Context.reward_fn = @task/negative_distance
-
-## Config rewards
-task/negative_distance.state_indices = [0, 1, 2]
-task/negative_distance.relative_context = True
-task/negative_distance.diff = False
-task/negative_distance.offset = 0.0
-
-## Config samplers
-train/RandomSampler.context_range = %meta_context_range
-train/DirectionSampler.context_range = %context_range
-train/DirectionSampler.k = %SUBGOAL_DIM
-relative_context_transition_fn.k = %SUBGOAL_DIM
-relative_context_multi_transition_fn.k = %SUBGOAL_DIM
-task/relative_context_transition_fn.k = 3
-task/relative_context_multi_transition_fn.k = 3
-MetaAgent.k = %SUBGOAL_DIM
-
-eval1/ConstantSampler.value = [0, 27, 0]
diff --git a/research/efficient-hrl/context/configs/ant_fall_single.gin b/research/efficient-hrl/context/configs/ant_fall_single.gin
deleted file mode 100644
index 56bbc070072..00000000000
--- a/research/efficient-hrl/context/configs/ant_fall_single.gin
+++ /dev/null
@@ -1,62 +0,0 @@
-#-*-Python-*-
-create_maze_env.env_name = "AntFall"
-context_range = (%CONTEXT_RANGE_MIN, %CONTEXT_RANGE_MAX)
-meta_context_range = ((-4, -4, 0), (12, 28, 5))
-
-RESET_EPISODE_PERIOD = 500
-RESET_ENV_PERIOD = 1
-# End episode every N steps
-UvfAgent.reset_episode_cond_fn = @every_n_steps
-every_n_steps.n = %RESET_EPISODE_PERIOD
-train_uvf.max_steps_per_episode = %RESET_EPISODE_PERIOD
-# Do a manual reset every N episodes
-UvfAgent.reset_env_cond_fn = @every_n_episodes
-every_n_episodes.n = %RESET_ENV_PERIOD
-every_n_episodes.steps_per_episode = %RESET_EPISODE_PERIOD
-
-## Config defaults
-EVAL_MODES = ["eval1"]
-
-## Config agent
-CONTEXT = @agent/Context
-META_CONTEXT = @meta/Context
-
-## Config agent context
-agent/Context.context_ranges = [%context_range]
-agent/Context.context_shapes = [%SUBGOAL_DIM]
-agent/Context.meta_action_every_n = 10
-agent/Context.samplers = {
- "train": [@train/DirectionSampler],
- "explore": [@train/DirectionSampler],
-}
-
-agent/Context.context_transition_fn = @relative_context_transition_fn
-agent/Context.context_multi_transition_fn = @relative_context_multi_transition_fn
-
-agent/Context.reward_fn = @uvf/negative_distance
-
-## Config meta context
-meta/Context.context_ranges = [%meta_context_range]
-meta/Context.context_shapes = [3]
-meta/Context.samplers = {
- "train": [@eval1/ConstantSampler],
- "explore": [@eval1/ConstantSampler],
- "eval1": [@eval1/ConstantSampler],
-}
-meta/Context.reward_fn = @task/negative_distance
-
-## Config rewards
-task/negative_distance.state_indices = [0, 1, 2]
-task/negative_distance.relative_context = False
-task/negative_distance.diff = False
-task/negative_distance.offset = 0.0
-
-## Config samplers
-train/RandomSampler.context_range = %meta_context_range
-train/DirectionSampler.context_range = %context_range
-train/DirectionSampler.k = %SUBGOAL_DIM
-relative_context_transition_fn.k = %SUBGOAL_DIM
-relative_context_multi_transition_fn.k = %SUBGOAL_DIM
-MetaAgent.k = %SUBGOAL_DIM
-
-eval1/ConstantSampler.value = [0, 27, 4.5]
diff --git a/research/efficient-hrl/context/configs/ant_maze.gin b/research/efficient-hrl/context/configs/ant_maze.gin
deleted file mode 100644
index 3a0b73e30d7..00000000000
--- a/research/efficient-hrl/context/configs/ant_maze.gin
+++ /dev/null
@@ -1,66 +0,0 @@
-#-*-Python-*-
-create_maze_env.env_name = "AntMaze"
-context_range = (%CONTEXT_RANGE_MIN, %CONTEXT_RANGE_MAX)
-meta_context_range = ((-4, -4), (20, 20))
-
-RESET_EPISODE_PERIOD = 500
-RESET_ENV_PERIOD = 1
-# End episode every N steps
-UvfAgent.reset_episode_cond_fn = @every_n_steps
-every_n_steps.n = %RESET_EPISODE_PERIOD
-train_uvf.max_steps_per_episode = %RESET_EPISODE_PERIOD
-# Do a manual reset every N episodes
-UvfAgent.reset_env_cond_fn = @every_n_episodes
-every_n_episodes.n = %RESET_ENV_PERIOD
-every_n_episodes.steps_per_episode = %RESET_EPISODE_PERIOD
-
-## Config defaults
-EVAL_MODES = ["eval1", "eval2", "eval3"]
-
-## Config agent
-CONTEXT = @agent/Context
-META_CONTEXT = @meta/Context
-
-## Config agent context
-agent/Context.context_ranges = [%context_range]
-agent/Context.context_shapes = [%SUBGOAL_DIM]
-agent/Context.meta_action_every_n = 10
-agent/Context.samplers = {
- "train": [@train/DirectionSampler],
- "explore": [@train/DirectionSampler],
-}
-
-agent/Context.context_transition_fn = @relative_context_transition_fn
-agent/Context.context_multi_transition_fn = @relative_context_multi_transition_fn
-
-agent/Context.reward_fn = @uvf/negative_distance
-
-## Config meta context
-meta/Context.context_ranges = [%meta_context_range]
-meta/Context.context_shapes = [2]
-meta/Context.samplers = {
- "train": [@train/RandomSampler],
- "explore": [@train/RandomSampler],
- "eval1": [@eval1/ConstantSampler],
- "eval2": [@eval2/ConstantSampler],
- "eval3": [@eval3/ConstantSampler],
-}
-meta/Context.reward_fn = @task/negative_distance
-
-## Config rewards
-task/negative_distance.state_indices = [0, 1]
-task/negative_distance.relative_context = False
-task/negative_distance.diff = False
-task/negative_distance.offset = 0.0
-
-## Config samplers
-train/RandomSampler.context_range = %meta_context_range
-train/DirectionSampler.context_range = %context_range
-train/DirectionSampler.k = %SUBGOAL_DIM
-relative_context_transition_fn.k = %SUBGOAL_DIM
-relative_context_multi_transition_fn.k = %SUBGOAL_DIM
-MetaAgent.k = %SUBGOAL_DIM
-
-eval1/ConstantSampler.value = [16, 0]
-eval2/ConstantSampler.value = [16, 16]
-eval3/ConstantSampler.value = [0, 16]
diff --git a/research/efficient-hrl/context/configs/ant_maze_img.gin b/research/efficient-hrl/context/configs/ant_maze_img.gin
deleted file mode 100644
index ceed65a0884..00000000000
--- a/research/efficient-hrl/context/configs/ant_maze_img.gin
+++ /dev/null
@@ -1,72 +0,0 @@
-#-*-Python-*-
-create_maze_env.env_name = "AntMaze"
-IMAGES = True
-
-context_range = (%CONTEXT_RANGE_MIN, %CONTEXT_RANGE_MAX)
-meta_context_range = ((-4, -4), (20, 20))
-
-RESET_EPISODE_PERIOD = 500
-RESET_ENV_PERIOD = 1
-# End episode every N steps
-UvfAgent.reset_episode_cond_fn = @every_n_steps
-every_n_steps.n = %RESET_EPISODE_PERIOD
-train_uvf.max_steps_per_episode = %RESET_EPISODE_PERIOD
-# Do a manual reset every N episodes
-UvfAgent.reset_env_cond_fn = @every_n_episodes
-every_n_episodes.n = %RESET_ENV_PERIOD
-every_n_episodes.steps_per_episode = %RESET_EPISODE_PERIOD
-
-## Config defaults
-EVAL_MODES = ["eval1", "eval2", "eval3"]
-
-## Config agent
-CONTEXT = @agent/Context
-META_CONTEXT = @meta/Context
-
-## Config agent context
-agent/Context.context_ranges = [%context_range]
-agent/Context.context_shapes = [%SUBGOAL_DIM]
-agent/Context.meta_action_every_n = 10
-agent/Context.samplers = {
- "train": [@train/DirectionSampler],
- "explore": [@train/DirectionSampler],
-}
-
-agent/Context.context_transition_fn = @relative_context_transition_fn
-agent/Context.context_multi_transition_fn = @relative_context_multi_transition_fn
-
-agent/Context.reward_fn = @uvf/negative_distance
-
-## Config meta context
-meta/Context.context_ranges = [%meta_context_range]
-meta/Context.context_shapes = [2]
-meta/Context.samplers = {
- "train": [@train/RandomSampler],
- "explore": [@train/RandomSampler],
- "eval1": [@eval1/ConstantSampler],
- "eval2": [@eval2/ConstantSampler],
- "eval3": [@eval3/ConstantSampler],
-}
-meta/Context.context_transition_fn = @task/relative_context_transition_fn
-meta/Context.context_multi_transition_fn = @task/relative_context_multi_transition_fn
-meta/Context.reward_fn = @task/negative_distance
-
-## Config rewards
-task/negative_distance.state_indices = [0, 1]
-task/negative_distance.relative_context = True
-task/negative_distance.diff = False
-task/negative_distance.offset = 0.0
-
-## Config samplers
-train/RandomSampler.context_range = %meta_context_range
-train/DirectionSampler.context_range = %context_range
-train/DirectionSampler.k = %SUBGOAL_DIM
-relative_context_transition_fn.k = %SUBGOAL_DIM
-relative_context_multi_transition_fn.k = %SUBGOAL_DIM
-task/relative_context_transition_fn.k = 2
-task/relative_context_multi_transition_fn.k = 2
-MetaAgent.k = %SUBGOAL_DIM
-
-eval1/ConstantSampler.value = [16, 0]
-eval2/ConstantSampler.value = [16, 16]
-eval3/ConstantSampler.value = [0, 16]
diff --git a/research/efficient-hrl/context/configs/ant_push_multi.gin b/research/efficient-hrl/context/configs/ant_push_multi.gin
deleted file mode 100644
index db9b4ed7bbe..00000000000
--- a/research/efficient-hrl/context/configs/ant_push_multi.gin
+++ /dev/null
@@ -1,62 +0,0 @@
-#-*-Python-*-
-create_maze_env.env_name = "AntPush"
-context_range = (%CONTEXT_RANGE_MIN, %CONTEXT_RANGE_MAX)
-meta_context_range = ((-16, -4), (16, 20))
-
-RESET_EPISODE_PERIOD = 500
-RESET_ENV_PERIOD = 1
-# End episode every N steps
-UvfAgent.reset_episode_cond_fn = @every_n_steps
-every_n_steps.n = %RESET_EPISODE_PERIOD
-train_uvf.max_steps_per_episode = %RESET_EPISODE_PERIOD
-# Do a manual reset every N episodes
-UvfAgent.reset_env_cond_fn = @every_n_episodes
-every_n_episodes.n = %RESET_ENV_PERIOD
-every_n_episodes.steps_per_episode = %RESET_EPISODE_PERIOD
-
-## Config defaults
-EVAL_MODES = ["eval2"]
-
-## Config agent
-CONTEXT = @agent/Context
-META_CONTEXT = @meta/Context
-
-## Config agent context
-agent/Context.context_ranges = [%context_range]
-agent/Context.context_shapes = [%SUBGOAL_DIM]
-agent/Context.meta_action_every_n = 10
-agent/Context.samplers = {
- "train": [@train/DirectionSampler],
- "explore": [@train/DirectionSampler],
-}
-
-agent/Context.context_transition_fn = @relative_context_transition_fn
-agent/Context.context_multi_transition_fn = @relative_context_multi_transition_fn
-
-agent/Context.reward_fn = @uvf/negative_distance
-
-## Config meta context
-meta/Context.context_ranges = [%meta_context_range]
-meta/Context.context_shapes = [2]
-meta/Context.samplers = {
- "train": [@train/RandomSampler],
- "explore": [@train/RandomSampler],
- "eval2": [@eval2/ConstantSampler],
-}
-meta/Context.reward_fn = @task/negative_distance
-
-## Config rewards
-task/negative_distance.state_indices = [0, 1]
-task/negative_distance.relative_context = False
-task/negative_distance.diff = False
-task/negative_distance.offset = 0.0
-
-## Config samplers
-train/RandomSampler.context_range = %meta_context_range
-train/DirectionSampler.context_range = %context_range
-train/DirectionSampler.k = %SUBGOAL_DIM
-relative_context_transition_fn.k = %SUBGOAL_DIM
-relative_context_multi_transition_fn.k = %SUBGOAL_DIM
-MetaAgent.k = %SUBGOAL_DIM
-
-eval2/ConstantSampler.value = [0, 19]
diff --git a/research/efficient-hrl/context/configs/ant_push_multi_img.gin b/research/efficient-hrl/context/configs/ant_push_multi_img.gin
deleted file mode 100644
index abdc43402fc..00000000000
--- a/research/efficient-hrl/context/configs/ant_push_multi_img.gin
+++ /dev/null
@@ -1,68 +0,0 @@
-#-*-Python-*-
-create_maze_env.env_name = "AntPush"
-IMAGES = True
-
-context_range = (%CONTEXT_RANGE_MIN, %CONTEXT_RANGE_MAX)
-meta_context_range = ((-16, -4), (16, 20))
-
-RESET_EPISODE_PERIOD = 500
-RESET_ENV_PERIOD = 1
-# End episode every N steps
-UvfAgent.reset_episode_cond_fn = @every_n_steps
-every_n_steps.n = %RESET_EPISODE_PERIOD
-train_uvf.max_steps_per_episode = %RESET_EPISODE_PERIOD
-# Do a manual reset every N episodes
-UvfAgent.reset_env_cond_fn = @every_n_episodes
-every_n_episodes.n = %RESET_ENV_PERIOD
-every_n_episodes.steps_per_episode = %RESET_EPISODE_PERIOD
-
-## Config defaults
-EVAL_MODES = ["eval2"]
-
-## Config agent
-CONTEXT = @agent/Context
-META_CONTEXT = @meta/Context
-
-## Config agent context
-agent/Context.context_ranges = [%context_range]
-agent/Context.context_shapes = [%SUBGOAL_DIM]
-agent/Context.meta_action_every_n = 10
-agent/Context.samplers = {
- "train": [@train/DirectionSampler],
- "explore": [@train/DirectionSampler],
-}
-
-agent/Context.context_transition_fn = @relative_context_transition_fn
-agent/Context.context_multi_transition_fn = @relative_context_multi_transition_fn
-
-agent/Context.reward_fn = @uvf/negative_distance
-
-## Config meta context
-meta/Context.context_ranges = [%meta_context_range]
-meta/Context.context_shapes = [2]
-meta/Context.samplers = {
- "train": [@train/RandomSampler],
- "explore": [@train/RandomSampler],
- "eval2": [@eval2/ConstantSampler],
-}
-meta/Context.context_transition_fn = @task/relative_context_transition_fn
-meta/Context.context_multi_transition_fn = @task/relative_context_multi_transition_fn
-meta/Context.reward_fn = @task/negative_distance
-
-## Config rewards
-task/negative_distance.state_indices = [0, 1]
-task/negative_distance.relative_context = True
-task/negative_distance.diff = False
-task/negative_distance.offset = 0.0
-
-## Config samplers
-train/RandomSampler.context_range = %meta_context_range
-train/DirectionSampler.context_range = %context_range
-train/DirectionSampler.k = %SUBGOAL_DIM
-relative_context_transition_fn.k = %SUBGOAL_DIM
-relative_context_multi_transition_fn.k = %SUBGOAL_DIM
-task/relative_context_transition_fn.k = 2
-task/relative_context_multi_transition_fn.k = 2
-MetaAgent.k = %SUBGOAL_DIM
-
-eval2/ConstantSampler.value = [0, 19]
diff --git a/research/efficient-hrl/context/configs/ant_push_single.gin b/research/efficient-hrl/context/configs/ant_push_single.gin
deleted file mode 100644
index e85c5dfba4d..00000000000
--- a/research/efficient-hrl/context/configs/ant_push_single.gin
+++ /dev/null
@@ -1,62 +0,0 @@
-#-*-Python-*-
-create_maze_env.env_name = "AntPush"
-context_range = (%CONTEXT_RANGE_MIN, %CONTEXT_RANGE_MAX)
-meta_context_range = ((-16, -4), (16, 20))
-
-RESET_EPISODE_PERIOD = 500
-RESET_ENV_PERIOD = 1
-# End episode every N steps
-UvfAgent.reset_episode_cond_fn = @every_n_steps
-every_n_steps.n = %RESET_EPISODE_PERIOD
-train_uvf.max_steps_per_episode = %RESET_EPISODE_PERIOD
-# Do a manual reset every N episodes
-UvfAgent.reset_env_cond_fn = @every_n_episodes
-every_n_episodes.n = %RESET_ENV_PERIOD
-every_n_episodes.steps_per_episode = %RESET_EPISODE_PERIOD
-
-## Config defaults
-EVAL_MODES = ["eval2"]
-
-## Config agent
-CONTEXT = @agent/Context
-META_CONTEXT = @meta/Context
-
-## Config agent context
-agent/Context.context_ranges = [%context_range]
-agent/Context.context_shapes = [%SUBGOAL_DIM]
-agent/Context.meta_action_every_n = 10
-agent/Context.samplers = {
- "train": [@train/DirectionSampler],
- "explore": [@train/DirectionSampler],
-}
-
-agent/Context.context_transition_fn = @relative_context_transition_fn
-agent/Context.context_multi_transition_fn = @relative_context_multi_transition_fn
-
-agent/Context.reward_fn = @uvf/negative_distance
-
-## Config meta context
-meta/Context.context_ranges = [%meta_context_range]
-meta/Context.context_shapes = [2]
-meta/Context.samplers = {
- "train": [@eval2/ConstantSampler],
- "explore": [@eval2/ConstantSampler],
- "eval2": [@eval2/ConstantSampler],
-}
-meta/Context.reward_fn = @task/negative_distance
-
-## Config rewards
-task/negative_distance.state_indices = [0, 1]
-task/negative_distance.relative_context = False
-task/negative_distance.diff = False
-task/negative_distance.offset = 0.0
-
-## Config samplers
-train/RandomSampler.context_range = %meta_context_range
-train/DirectionSampler.context_range = %context_range
-train/DirectionSampler.k = %SUBGOAL_DIM
-relative_context_transition_fn.k = %SUBGOAL_DIM
-relative_context_multi_transition_fn.k = %SUBGOAL_DIM
-MetaAgent.k = %SUBGOAL_DIM
-
-eval2/ConstantSampler.value = [0, 19]
diff --git a/research/efficient-hrl/context/configs/default.gin b/research/efficient-hrl/context/configs/default.gin
deleted file mode 100644
index 65f91e5292d..00000000000
--- a/research/efficient-hrl/context/configs/default.gin
+++ /dev/null
@@ -1,12 +0,0 @@
-#-*-Python-*-
-ENV_CONTEXT = None
-EVAL_MODES = ["eval"]
-TARGET_Q_CLIPPING = None
-RESET_EPISODE_PERIOD = None
-ZERO_OBS = False
-CONTEXT_RANGE_MIN = -10
-CONTEXT_RANGE_MAX = 10
-SUBGOAL_DIM = 2
-
-uvf/negative_distance.summarize = False
-uvf/negative_distance.relative_context = True
diff --git a/research/efficient-hrl/context/configs/hiro_orig.gin b/research/efficient-hrl/context/configs/hiro_orig.gin
deleted file mode 100644
index e39ba96be7b..00000000000
--- a/research/efficient-hrl/context/configs/hiro_orig.gin
+++ /dev/null
@@ -1,14 +0,0 @@
-#-*-Python-*-
-ENV_CONTEXT = None
-EVAL_MODES = ["eval"]
-TARGET_Q_CLIPPING = None
-RESET_EPISODE_PERIOD = None
-ZERO_OBS = True
-IMAGES = False
-CONTEXT_RANGE_MIN = (-10, -10, -0.5, -1, -1, -1, -1, -0.5, -0.3, -0.5, -0.3, -0.5, -0.3, -0.5, -0.3)
-CONTEXT_RANGE_MAX = ( 10, 10, 0.5, 1, 1, 1, 1, 0.5, 0.3, 0.5, 0.3, 0.5, 0.3, 0.5, 0.3)
-SUBGOAL_DIM = 15
-META_EXPLORE_NOISE = 1.0
-
-uvf/negative_distance.summarize = False
-uvf/negative_distance.relative_context = True
diff --git a/research/efficient-hrl/context/configs/hiro_repr.gin b/research/efficient-hrl/context/configs/hiro_repr.gin
deleted file mode 100644
index a0a8057bd3c..00000000000
--- a/research/efficient-hrl/context/configs/hiro_repr.gin
+++ /dev/null
@@ -1,18 +0,0 @@
-#-*-Python-*-
-ENV_CONTEXT = None
-EVAL_MODES = ["eval"]
-TARGET_Q_CLIPPING = None
-RESET_EPISODE_PERIOD = None
-ZERO_OBS = False
-IMAGES = False
-CONTEXT_RANGE_MIN = -10
-CONTEXT_RANGE_MAX = 10
-SUBGOAL_DIM = 2
-META_EXPLORE_NOISE = 5.0
-
-StatePreprocess.trainable = True
-StatePreprocess.state_preprocess_net = @state_preprocess_net
-StatePreprocess.action_embed_net = @action_embed_net
-
-uvf/negative_distance.summarize = False
-uvf/negative_distance.relative_context = True
diff --git a/research/efficient-hrl/context/configs/hiro_xy.gin b/research/efficient-hrl/context/configs/hiro_xy.gin
deleted file mode 100644
index f35026c9e24..00000000000
--- a/research/efficient-hrl/context/configs/hiro_xy.gin
+++ /dev/null
@@ -1,14 +0,0 @@
-#-*-Python-*-
-ENV_CONTEXT = None
-EVAL_MODES = ["eval"]
-TARGET_Q_CLIPPING = None
-RESET_EPISODE_PERIOD = None
-ZERO_OBS = False
-IMAGES = False
-CONTEXT_RANGE_MIN = -10
-CONTEXT_RANGE_MAX = 10
-SUBGOAL_DIM = 2
-META_EXPLORE_NOISE = 1.0
-
-uvf/negative_distance.summarize = False
-uvf/negative_distance.relative_context = True
diff --git a/research/efficient-hrl/context/configs/point_maze.gin b/research/efficient-hrl/context/configs/point_maze.gin
deleted file mode 100644
index 0ea67d2d5ff..00000000000
--- a/research/efficient-hrl/context/configs/point_maze.gin
+++ /dev/null
@@ -1,73 +0,0 @@
-#-*-Python-*-
-# NOTE: For best training, low-level exploration (uvf_add_noise_fn.stddev)
-# should be reduced to around 0.1.
-create_maze_env.env_name = "PointMaze"
-context_range_min = -10
-context_range_max = 10
-context_range = (%context_range_min, %context_range_max)
-meta_context_range = ((-2, -2), (10, 10))
-
-RESET_EPISODE_PERIOD = 500
-RESET_ENV_PERIOD = 1
-# End episode every N steps
-UvfAgent.reset_episode_cond_fn = @every_n_steps
-every_n_steps.n = %RESET_EPISODE_PERIOD
-train_uvf.max_steps_per_episode = %RESET_EPISODE_PERIOD
-# Do a manual reset every N episodes
-UvfAgent.reset_env_cond_fn = @every_n_episodes
-every_n_episodes.n = %RESET_ENV_PERIOD
-every_n_episodes.steps_per_episode = %RESET_EPISODE_PERIOD
-
-## Config defaults
-EVAL_MODES = ["eval1", "eval2", "eval3"]
-
-## Config agent
-CONTEXT = @agent/Context
-META_CONTEXT = @meta/Context
-
-## Config agent context
-agent/Context.context_ranges = [%context_range]
-agent/Context.context_shapes = [%SUBGOAL_DIM]
-agent/Context.meta_action_every_n = 10
-agent/Context.samplers = {
- "train": [@train/DirectionSampler],
- "explore": [@train/DirectionSampler],
- "eval1": [@uvf_eval1/ConstantSampler],
- "eval2": [@uvf_eval2/ConstantSampler],
- "eval3": [@uvf_eval3/ConstantSampler],
-}
-
-agent/Context.context_transition_fn = @relative_context_transition_fn
-agent/Context.context_multi_transition_fn = @relative_context_multi_transition_fn
-
-agent/Context.reward_fn = @uvf/negative_distance
-
-## Config meta context
-meta/Context.context_ranges = [%meta_context_range]
-meta/Context.context_shapes = [2]
-meta/Context.samplers = {
- "train": [@train/RandomSampler],
- "explore": [@train/RandomSampler],
- "eval1": [@eval1/ConstantSampler],
- "eval2": [@eval2/ConstantSampler],
- "eval3": [@eval3/ConstantSampler],
-}
-meta/Context.reward_fn = @task/negative_distance
-
-## Config rewards
-task/negative_distance.state_indices = [0, 1]
-task/negative_distance.relative_context = False
-task/negative_distance.diff = False
-task/negative_distance.offset = 0.0
-
-## Config samplers
-train/RandomSampler.context_range = %meta_context_range
-train/DirectionSampler.context_range = %context_range
-train/DirectionSampler.k = %SUBGOAL_DIM
-relative_context_transition_fn.k = %SUBGOAL_DIM
-relative_context_multi_transition_fn.k = %SUBGOAL_DIM
-MetaAgent.k = %SUBGOAL_DIM
-
-eval1/ConstantSampler.value = [8, 0]
-eval2/ConstantSampler.value = [8, 8]
-eval3/ConstantSampler.value = [0, 8]
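
The point_maze config wires a two-level agent: the meta context samples an absolute (x, y) target, while the low-level context is a subgoal that is refreshed every `meta_action_every_n = 10` steps and kept relative to the state in between (via `relative_context_transition_fn`). A toy, pure-Python rollout loop showing that control flow is sketched below; both policies are placeholders invented for the example.

```python
# Toy rollout illustrating the "new subgoal every C steps, otherwise keep the
# goal relative" pattern configured above (agent/Context.meta_action_every_n).
# meta_policy and low_level_policy are placeholders, not the real networks.
import numpy as np

C = 10  # agent/Context.meta_action_every_n


def meta_policy(state):
  return np.array([2.0, 0.0])  # propose a goal offset from the current state


def low_level_policy(state, goal):
  return 0.1 * goal  # move a small step toward the goal


state = np.zeros(2)
goal = meta_policy(state)
for t in range(30):
  action = low_level_policy(state, goal)
  next_state = state + action
  if (t + 1) % C == 0:
    goal = meta_policy(next_state)    # pick a new subgoal every C steps
  else:
    goal = goal + state - next_state  # otherwise keep the goal relative
  state = next_state
print(state, goal)
```
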
diff --git a/research/efficient-hrl/context/context.py b/research/efficient-hrl/context/context.py
deleted file mode 100644
index 76be00b4966..00000000000
--- a/research/efficient-hrl/context/context.py
+++ /dev/null
@@ -1,467 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Context for Universal Value Function agents.
-
-A context specifies a list of contextual variables, each with
- its own sampling and reward computation methods.
-
-Examples of contextual variables include
- goal states, reward combination vectors, etc.
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import numpy as np
-import tensorflow as tf
-from tf_agents import specs
-import gin.tf
-from utils import utils as uvf_utils
-
-
-@gin.configurable
-class Context(object):
- """Base context."""
- VAR_NAME = 'action'
-
- def __init__(self,
- tf_env,
- context_ranges=None,
- context_shapes=None,
- state_indices=None,
- variable_indices=None,
- gamma_index=None,
- settable_context=False,
- timers=None,
- samplers=None,
- reward_weights=None,
- reward_fn=None,
- random_sampler_mode='random',
- normalizers=None,
- context_transition_fn=None,
- context_multi_transition_fn=None,
- meta_action_every_n=None):
- self._tf_env = tf_env
- self.variable_indices = variable_indices
- self.gamma_index = gamma_index
- self._settable_context = settable_context
- self.timers = timers
- self._context_transition_fn = context_transition_fn
- self._context_multi_transition_fn = context_multi_transition_fn
- self._random_sampler_mode = random_sampler_mode
-
- # assign specs
- self._obs_spec = self._tf_env.observation_spec()
- self._context_shapes = tuple([
- shape if shape is not None else self._obs_spec.shape
- for shape in context_shapes
- ])
- self.context_specs = tuple([
- specs.TensorSpec(dtype=self._obs_spec.dtype, shape=shape)
- for shape in self._context_shapes
- ])
- if context_ranges is not None:
- self.context_ranges = context_ranges
- else:
- self.context_ranges = [None] * len(self._context_shapes)
-
- self.context_as_action_specs = tuple([
- specs.BoundedTensorSpec(
- shape=shape,
- dtype=(tf.float32 if self._obs_spec.dtype in
- [tf.float32, tf.float64] else self._obs_spec.dtype),
- minimum=context_range[0],
- maximum=context_range[-1])
- for shape, context_range in zip(self._context_shapes, self.context_ranges)
- ])
-
- if state_indices is not None:
- self.state_indices = state_indices
- else:
- self.state_indices = [None] * len(self._context_shapes)
- if self.variable_indices is not None and self.n != len(
- self.variable_indices):
- raise ValueError(
- 'variable_indices (%s) must have the same length as contexts (%s).' %
- (self.variable_indices, self.context_specs))
- assert self.n == len(self.context_ranges)
- assert self.n == len(self.state_indices)
-
- # assign reward/sampler fns
- self._sampler_fns = dict()
- self._samplers = dict()
- self._reward_fns = dict()
-
- # assign reward fns
- self._add_custom_reward_fns()
- reward_weights = reward_weights or None
- self._reward_fn = self._make_reward_fn(reward_fn, reward_weights)
-
- # assign samplers
- self._add_custom_sampler_fns()
- for mode, sampler_fns in samplers.items():
- self._make_sampler_fn(sampler_fns, mode)
-
- # create normalizers
- if normalizers is None:
- self._normalizers = [None] * len(self.context_specs)
- else:
- self._normalizers = [
- normalizer(tf.zeros(shape=spec.shape, dtype=spec.dtype))
- if normalizer is not None else None
- for normalizer, spec in zip(normalizers, self.context_specs)
- ]
- assert self.n == len(self._normalizers)
-
- self.meta_action_every_n = meta_action_every_n
-
- # create vars
- self.context_vars = {}
- self.timer_vars = {}
- self.create_vars(self.VAR_NAME)
- self.t = tf.Variable(
- tf.zeros(shape=(), dtype=tf.int32), name='num_timer_steps')
-
- def _add_custom_reward_fns(self):
- pass
-
- def _add_custom_sampler_fns(self):
- pass
-
- def sample_random_contexts(self, batch_size):
- """Sample random batch contexts."""
- assert self._random_sampler_mode is not None
- return self.sample_contexts(self._random_sampler_mode, batch_size)[0]
-
- def sample_contexts(self, mode, batch_size, state=None, next_state=None,
- **kwargs):
- """Sample a batch of contexts.
-
- Args:
- mode: A string representing the mode [`train`, `explore`, `eval`].
- batch_size: Batch size.
- Returns:
- Two lists of [batch_size, num_context_dims] contexts.
- """
- contexts, next_contexts = self._sampler_fns[mode](
- batch_size, state=state, next_state=next_state,
- **kwargs)
- self._validate_contexts(contexts)
- self._validate_contexts(next_contexts)
- return contexts, next_contexts
-
- def compute_rewards(self, mode, states, actions, rewards, next_states,
- contexts):
- """Compute context-based rewards.
-
- Args:
- mode: A string representing the mode ['uvf', 'task'].
- states: A [batch_size, num_state_dims] tensor.
- actions: A [batch_size, num_action_dims] tensor.
- rewards: A [batch_size] tensor representing unmodified rewards.
- next_states: A [batch_size, num_state_dims] tensor.
- contexts: A list of [batch_size, num_context_dims] tensors.
- Returns:
- A [batch_size] tensor representing rewards.
- """
- return self._reward_fn(states, actions, rewards, next_states,
- contexts)
-
- def _make_reward_fn(self, reward_fns_list, reward_weights):
- """Returns a fn that computes rewards.
-
- Args:
- reward_fns_list: A fn or a list of reward fns.
- reward_weights: A list of reward weights.
- """
- if not isinstance(reward_fns_list, (list, tuple)):
- reward_fns_list = [reward_fns_list]
- if reward_weights is None:
- reward_weights = [1.0] * len(reward_fns_list)
- assert len(reward_fns_list) == len(reward_weights)
-
- reward_fns_list = [
- self._custom_reward_fns[fn] if isinstance(fn, (str,)) else fn
- for fn in reward_fns_list
- ]
-
- def reward_fn(*args, **kwargs):
- """Returns rewards, discounts."""
- reward_tuples = [
- reward_fn(*args, **kwargs) for reward_fn in reward_fns_list
- ]
- rewards_list = [reward_tuple[0] for reward_tuple in reward_tuples]
- discounts_list = [reward_tuple[1] for reward_tuple in reward_tuples]
- ndims = max([r.shape.ndims for r in rewards_list])
- if ndims > 1: # expand reward shapes to allow broadcasting
- for i in range(len(rewards_list)):
-        for _ in range(ndims - rewards_list[i].shape.ndims):
-          rewards_list[i] = tf.expand_dims(rewards_list[i], axis=-1)
-        for _ in range(ndims - discounts_list[i].shape.ndims):
-          discounts_list[i] = tf.expand_dims(discounts_list[i], axis=-1)
- rewards = tf.add_n(
- [r * tf.to_float(w) for r, w in zip(rewards_list, reward_weights)])
- discounts = discounts_list[0]
- for d in discounts_list[1:]:
- discounts *= d
-
- return rewards, discounts
-
- return reward_fn
-
- def _make_sampler_fn(self, sampler_cls_list, mode):
- """Returns a fn that samples a list of context vars.
-
- Args:
- sampler_cls_list: A list of sampler classes.
- mode: A string representing the operating mode.
- """
- if not isinstance(sampler_cls_list, (list, tuple)):
- sampler_cls_list = [sampler_cls_list]
-
- self._samplers[mode] = []
- sampler_fns = []
- for spec, sampler in zip(self.context_specs, sampler_cls_list):
- if isinstance(sampler, (str,)):
- sampler_fn = self._custom_sampler_fns[sampler]
- else:
- sampler_fn = sampler(context_spec=spec)
- self._samplers[mode].append(sampler_fn)
- sampler_fns.append(sampler_fn)
-
- def batch_sampler_fn(batch_size, state=None, next_state=None, **kwargs):
- """Sampler fn."""
- contexts_tuples = [
- sampler(batch_size, state=state, next_state=next_state, **kwargs)
- for sampler in sampler_fns]
- contexts = [c[0] for c in contexts_tuples]
- next_contexts = [c[1] for c in contexts_tuples]
- contexts = [
- normalizer.update_apply(c) if normalizer is not None else c
- for normalizer, c in zip(self._normalizers, contexts)
- ]
- next_contexts = [
- normalizer.apply(c) if normalizer is not None else c
- for normalizer, c in zip(self._normalizers, next_contexts)
- ]
- return contexts, next_contexts
-
- self._sampler_fns[mode] = batch_sampler_fn
-
- def set_env_context_op(self, context, disable_unnormalizer=False):
- """Returns a TensorFlow op that sets the environment context.
-
- Args:
- context: A list of context Tensor variables.
- disable_unnormalizer: Disable unnormalization.
- Returns:
- A TensorFlow op that sets the environment context.
- """
- ret_val = np.array(1.0, dtype=np.float32)
- if not self._settable_context:
- return tf.identity(ret_val)
-
- if not disable_unnormalizer:
- context = [
- normalizer.unapply(tf.expand_dims(c, 0))[0]
- if normalizer is not None else c
- for normalizer, c in zip(self._normalizers, context)
- ]
-
- def set_context_func(*env_context_values):
- tf.logging.info('[set_env_context_op] Setting gym environment context.')
- # pylint: disable=protected-access
- self.gym_env.set_context(*env_context_values)
- return ret_val
- # pylint: enable=protected-access
-
- with tf.name_scope('set_env_context'):
- set_op = tf.py_func(set_context_func, context, tf.float32,
- name='set_env_context_py_func')
- set_op.set_shape([])
- return set_op
-
- def set_replay(self, replay):
- """Set replay buffer for samplers.
-
- Args:
- replay: A replay buffer.
- """
- for _, samplers in self._samplers.items():
- for sampler in samplers:
- sampler.set_replay(replay)
-
- def get_clip_fns(self):
- """Returns a list of clip fns for contexts.
-
- Returns:
- A list of fns that clip context tensors.
- """
- clip_fns = []
- for context_range in self.context_ranges:
- def clip_fn(var_, range_=context_range):
- """Clip a tensor."""
- if range_ is None:
- clipped_var = tf.identity(var_)
- elif isinstance(range_[0], (int, long, float, list, np.ndarray)):
- clipped_var = tf.clip_by_value(
- var_,
- range_[0],
- range_[1],)
- else: raise NotImplementedError(range_)
- return clipped_var
- clip_fns.append(clip_fn)
- return clip_fns
-
- def _validate_contexts(self, contexts):
- """Validate if contexts have right specs.
-
- Args:
- contexts: A list of [batch_size, num_context_dim] tensors.
- Raises:
- ValueError: If shape or dtype mismatches that of spec.
- """
- for i, (context, spec) in enumerate(zip(contexts, self.context_specs)):
- if context[0].shape != spec.shape:
- raise ValueError('contexts[%d] has invalid shape %s wrt spec shape %s' %
- (i, context[0].shape, spec.shape))
- if context.dtype != spec.dtype:
- raise ValueError('contexts[%d] has invalid dtype %s wrt spec dtype %s' %
- (i, context.dtype, spec.dtype))
-
- def context_multi_transition_fn(self, contexts, **kwargs):
- """Returns multiple future contexts starting from a batch."""
- assert self._context_multi_transition_fn
- return self._context_multi_transition_fn(contexts, None, None, **kwargs)
-
- def step(self, mode, agent=None, action_fn=None, **kwargs):
- """Returns [next_contexts..., next_timer] list of ops.
-
- Args:
-      mode: a string representing the mode=[train, explore, eval].
-      agent: optional meta agent; if given, its context is stepped before
-        this one and `action_fn` provides a new goal every
-        `meta_action_every_n` steps.
-      action_fn: optional fn mapping the next state to a new goal.
-      **kwargs: kwargs for context_transition_fn.
- Returns:
- a list of ops that set the context.
- """
- if agent is None:
- ops = []
- if self._context_transition_fn is not None:
- def sampler_fn():
- samples = self.sample_contexts(mode, 1)[0]
- return [s[0] for s in samples]
- values = self._context_transition_fn(self.vars, self.t, sampler_fn, **kwargs)
- ops += [tf.assign(var, value) for var, value in zip(self.vars, values)]
- ops.append(tf.assign_add(self.t, 1)) # increment timer
- return ops
- else:
- ops = agent.tf_context.step(mode, **kwargs)
- state = kwargs['state']
- next_state = kwargs['next_state']
- state_repr = kwargs['state_repr']
- next_state_repr = kwargs['next_state_repr']
- with tf.control_dependencies(ops): # Step high level context before computing low level one.
- # Get the context transition function output.
- values = self._context_transition_fn(self.vars, self.t, None,
- state=state_repr,
- next_state=next_state_repr)
- # Select a new goal every C steps, otherwise use context transition.
- low_level_context = [
- tf.cond(tf.equal(self.t % self.meta_action_every_n, 0),
- lambda: tf.cast(action_fn(next_state, context=None), tf.float32),
- lambda: values)]
- ops = [tf.assign(var, value)
- for var, value in zip(self.vars, low_level_context)]
- with tf.control_dependencies(ops):
- return [tf.assign_add(self.t, 1)] # increment timer
- return ops
-
- def reset(self, mode, agent=None, action_fn=None, state=None):
- """Returns ops that reset the context.
-
- Args:
- mode: a string representing the mode=[train, explore, eval].
- Returns:
- a list of ops that reset the context.
- """
- if agent is None:
- values = self.sample_contexts(mode=mode, batch_size=1)[0]
- if values is None:
- return []
- values = [value[0] for value in values]
- values[0] = uvf_utils.tf_print(
- values[0],
- values,
- message='context:reset, mode=%s' % mode,
- first_n=10,
- name='context:reset:%s' % mode)
- all_ops = []
- for _, context_vars in sorted(self.context_vars.items()):
- ops = [tf.assign(var, value) for var, value in zip(context_vars, values)]
- all_ops += ops
- all_ops.append(self.set_env_context_op(values))
- all_ops.append(tf.assign(self.t, 0)) # reset timer
- return all_ops
- else:
- ops = agent.tf_context.reset(mode)
- # NOTE: The code is currently written in such a way that the higher level
- # policy does not provide a low-level context until the second
-    # observation. Instead, we just zero-out low-level contexts.
- for key, context_vars in sorted(self.context_vars.items()):
- ops += [tf.assign(var, tf.zeros_like(var)) for var, meta_var in
- zip(context_vars, agent.tf_context.context_vars[key])]
-
- ops.append(tf.assign(self.t, 0)) # reset timer
- return ops
-
- def create_vars(self, name, agent=None):
- """Create tf variables for contexts.
-
- Args:
- name: Name of the variables.
- Returns:
- A list of [num_context_dims] tensors.
- """
- if agent is not None:
- meta_vars = agent.create_vars(name)
- else:
- meta_vars = {}
- assert name not in self.context_vars, ('Conflict! %s is already '
- 'initialized.') % name
- self.context_vars[name] = tuple([
- tf.Variable(
- tf.zeros(shape=spec.shape, dtype=spec.dtype),
- name='%s_context_%d' % (name, i))
- for i, spec in enumerate(self.context_specs)
- ])
- return self.context_vars[name], meta_vars
-
- @property
- def n(self):
- return len(self.context_specs)
-
- @property
- def vars(self):
- return self.context_vars[self.VAR_NAME]
-
- # pylint: disable=protected-access
- @property
- def gym_env(self):
- return self._tf_env.pyenv._gym_env
-
- @property
- def tf_env(self):
- return self._tf_env
- # pylint: enable=protected-access
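
One detail worth calling out from `Context._make_reward_fn` above: each reward function returns a `(rewards, discounts)` pair, the rewards are summed with per-function weights, and the discounts are multiplied together. The plain-Python sketch below (no TensorFlow; the two toy reward functions are invented for the example) shows the same convention.

```python
# Sketch (plain Python) of the reward-combination convention used by
# Context._make_reward_fn: weighted sum of rewards, product of discounts.
def make_reward_fn(reward_fns, reward_weights=None):
  if reward_weights is None:
    reward_weights = [1.0] * len(reward_fns)

  def reward_fn(states, actions, rewards, next_states, contexts):
    outs = [fn(states, actions, rewards, next_states, contexts)
            for fn in reward_fns]
    total_reward = sum(w * r for (r, _), w in zip(outs, reward_weights))
    total_discount = 1.0
    for _, d in outs:
      total_discount *= d
    return total_reward, total_discount

  return reward_fn


# Example with two toy reward fns operating on scalars.
goal_reward = lambda s, a, r, ns, c: (-abs(ns - c[0]), 1.0)
ctrl_penalty = lambda s, a, r, ns, c: (-a * a, 1.0)
combined = make_reward_fn([goal_reward, ctrl_penalty], [1.0, 0.1])
print(combined(0.0, 0.5, 0.0, 1.0, [3.0]))  # (-2.025, 1.0)
```
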
diff --git a/research/efficient-hrl/context/context_transition_functions.py b/research/efficient-hrl/context/context_transition_functions.py
deleted file mode 100644
index 70326debde4..00000000000
--- a/research/efficient-hrl/context/context_transition_functions.py
+++ /dev/null
@@ -1,123 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Context functions.
-
-Given the current contexts, timer and context sampler, returns new contexts
- after an environment step. This can be used to define a high-level policy
- that controls contexts as its actions.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-import gin.tf
-import utils as uvf_utils
-
-
-@gin.configurable
-def periodic_context_fn(contexts, timer, sampler_fn, period=1):
- """Periodically samples contexts.
-
- Args:
- contexts: a list of [num_context_dims] tensor variables representing
- current contexts.
- timer: a scalar integer tensor variable holding the current time step.
- sampler_fn: a sampler function that samples a list of [num_context_dims]
- tensors.
- period: (integer) period of update.
- Returns:
- a list of [num_context_dims] tensors.
- """
- contexts = list(contexts[:]) # create copy
-  return tf.cond(tf.equal(tf.mod(timer, period), 0), sampler_fn, lambda: contexts)
-
-
-@gin.configurable
-def timer_context_fn(contexts,
- timer,
- sampler_fn,
- period=1,
- timer_index=-1,
- debug=False):
- """Samples contexts based on timer in contexts.
-
- Args:
- contexts: a list of [num_context_dims] tensor variables representing
- current contexts.
- timer: a scalar integer tensor variable holding the current time step.
- sampler_fn: a sampler function that samples a list of [num_context_dims]
- tensors.
- period: (integer) period of update; actual period = `period` + 1.
-    timer_index: (integer) Index of the context list entry that holds the
-      timer.
- debug: (boolean) Print debug messages.
- Returns:
- a list of [num_context_dims] tensors.
- """
- contexts = list(contexts[:]) # create copy
- cond = tf.equal(contexts[timer_index][0], 0)
- def reset():
- """Sample context and reset the timer."""
- new_contexts = sampler_fn()
- new_contexts[timer_index] = tf.zeros_like(
- contexts[timer_index]) + period
- return new_contexts
- def update():
- """Decrement the timer."""
- contexts[timer_index] -= 1
- return contexts
- values = tf.cond(cond, reset, update)
- if debug:
- values[0] = uvf_utils.tf_print(
- values[0],
- values + [timer],
- 'timer_context_fn',
- first_n=200,
- name='timer_context_fn:contexts')
- return values
-
-
-@gin.configurable
-def relative_context_transition_fn(
- contexts, timer, sampler_fn,
- k=2, state=None, next_state=None,
- **kwargs):
- """Contexts updated to be relative to next state.
- """
- contexts = list(contexts[:]) # create copy
- assert len(contexts) == 1
- new_contexts = [
- tf.concat(
- [contexts[0][:k] + state[:k] - next_state[:k],
- contexts[0][k:]], -1)]
- return new_contexts
-
-
-@gin.configurable
-def relative_context_multi_transition_fn(
- contexts, timer, sampler_fn,
- k=2, states=None,
- **kwargs):
- """Given contexts at first state and sequence of states, derives sequence of all contexts.
- """
- contexts = list(contexts[:]) # create copy
- assert len(contexts) == 1
- contexts = [
- tf.concat(
- [tf.expand_dims(contexts[0][:, :k] + states[:, 0, :k], 1) - states[:, :, :k],
- contexts[0][:, None, k:] * tf.ones_like(states[:, :, :1])], -1)]
- return contexts
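
`relative_context_transition_fn` above implements the relative-goal update used by HIRO: after the agent moves from `state` to `next_state`, the first `k` goal dimensions become `goal[:k] + state[:k] - next_state[:k]`, so the goal keeps pointing at the same absolute location. A NumPy sketch with illustrative values:

```python
# NumPy sketch of the relative goal transition: g' = g + s[:k] - s'[:k].
import numpy as np


def relative_goal_transition(goal, state, next_state, k=2):
  new_goal = goal.copy()
  new_goal[:k] = goal[:k] + state[:k] - next_state[:k]
  return new_goal


goal = np.array([3.0, -1.0])       # desired offset from the current state
state = np.array([0.0, 0.0])
next_state = np.array([1.0, 0.5])  # the agent moved by (1.0, 0.5)
print(relative_goal_transition(goal, state, next_state))  # [ 2.  -1.5]
```
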
diff --git a/research/efficient-hrl/context/gin_imports.py b/research/efficient-hrl/context/gin_imports.py
deleted file mode 100644
index 94512cef847..00000000000
--- a/research/efficient-hrl/context/gin_imports.py
+++ /dev/null
@@ -1,25 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Import gin configurable modules.
-"""
-
-# pylint: disable=unused-import
-from context import context
-from context import context_transition_functions
-from context import gin_utils
-from context import rewards_functions
-from context import samplers
-# pylint: enable=unused-import
diff --git a/research/efficient-hrl/context/gin_utils.py b/research/efficient-hrl/context/gin_utils.py
deleted file mode 100644
index ab7c1b2d1dd..00000000000
--- a/research/efficient-hrl/context/gin_utils.py
+++ /dev/null
@@ -1,45 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Gin configurable utility functions.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import gin.tf
-
-
-@gin.configurable
-def gin_sparse_array(size, values, indices, fill_value=0):
- arr = np.zeros(size)
- arr.fill(fill_value)
- arr[indices] = values
- return arr
-
-
-@gin.configurable
-def gin_sum(values):
- result = values[0]
- for value in values[1:]:
- result += value
- return result
-
-
-@gin.configurable
-def gin_range(n):
- return range(n)
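
For reference, `gin_sparse_array` above builds a dense array of length `size` filled with `fill_value` and then writes `values` at the given `indices`. A standalone re-implementation of the same logic (without the gin dependency), plus a sample call:

```python
# Same logic as gin_sparse_array above, shown without the gin decorator.
import numpy as np


def sparse_array(size, values, indices, fill_value=0):
  arr = np.zeros(size)
  arr.fill(fill_value)
  arr[indices] = values
  return arr


print(sparse_array(5, values=[1.0, 2.0], indices=[0, 3]))
# [1. 0. 0. 2. 0.]
```
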
diff --git a/research/efficient-hrl/context/rewards_functions.py b/research/efficient-hrl/context/rewards_functions.py
deleted file mode 100644
index ab560a7f429..00000000000
--- a/research/efficient-hrl/context/rewards_functions.py
+++ /dev/null
@@ -1,741 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Reward shaping functions used by Contexts.
-
- Each reward function should take the following inputs and return new rewards,
- and discounts.
-
- new_rewards, discounts = reward_fn(states, actions, rewards,
- next_states, contexts)
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-import gin.tf
-
-
-def summarize_stats(stats):
- """Summarize a dictionary of variables.
-
- Args:
- stats: a dictionary of {name: tensor} to compute stats over.
- """
- for name, stat in stats.items():
- mean = tf.reduce_mean(stat)
- tf.summary.scalar('mean_%s' % name, mean)
- tf.summary.scalar('max_%s' % name, tf.reduce_max(stat))
- tf.summary.scalar('min_%s' % name, tf.reduce_min(stat))
- std = tf.sqrt(tf.reduce_mean(tf.square(stat)) - tf.square(mean) + 1e-10)
- tf.summary.scalar('std_%s' % name, std)
- tf.summary.histogram(name, stat)
-
-
-def index_states(states, indices):
- """Return indexed states.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
-    indices: (a list of integers or a Numpy integer array) Indices of the
-      state dimensions to select.
- Returns:
- A [batch_size, num_indices] Tensor representing the batch of indexed states.
- """
- if indices is None:
- return states
- indices = tf.constant(indices, dtype=tf.int32)
- return tf.gather(states, indices=indices, axis=1)
-
-
-def record_tensor(tensor, indices, stats, name='states'):
- """Record specified tensor dimensions into stats.
-
- Args:
- tensor: A [batch_size, num_dims] Tensor.
- indices: (a list of integers) Indices of dimensions to record.
- stats: A dictionary holding stats.
- name: (string) Name of tensor.
- """
- if indices is None:
- indices = range(tensor.shape.as_list()[1])
- for index in indices:
- stats['%s_%02d' % (name, index)] = tensor[:, index]
-
-
-@gin.configurable
-def potential_rewards(states,
- actions,
- rewards,
- next_states,
- contexts,
- gamma=1.0,
- reward_fn=None):
- """Return the potential-based rewards.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] Tensor representing a batch
- of actions.
- rewards: A [batch_size] Tensor representing a batch of rewards.
- next_states: A [batch_size, num_state_dims] Tensor representing a batch
- of next states.
- contexts: A list of [batch_size, num_context_dims] Tensor representing
- a batch of contexts.
- gamma: Reward discount.
- reward_fn: A reward function.
- Returns:
- A new tf.float32 [batch_size] rewards Tensor, and
- tf.float32 [batch_size] discounts tensor.
- """
- del actions # unused args
- gamma = tf.to_float(gamma)
- rewards_tp1, discounts = reward_fn(None, None, rewards, next_states, contexts)
- rewards, _ = reward_fn(None, None, rewards, states, contexts)
- return -rewards + gamma * rewards_tp1, discounts
-
-
-@gin.configurable
-def timed_rewards(states,
- actions,
- rewards,
- next_states,
- contexts,
- reward_fn=None,
- dense=False,
- timer_index=-1):
- """Return the timed rewards.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] Tensor representing a batch
- of actions.
- rewards: A [batch_size] Tensor representing a batch of rewards.
- next_states: A [batch_size, num_state_dims] Tensor representing a batch
- of next states.
- contexts: A list of [batch_size, num_context_dims] Tensor representing
- a batch of contexts.
- reward_fn: A reward function.
- dense: (boolean) Provide dense rewards or sparse rewards at time = 0.
- timer_index: (integer) The context list index that specifies timer.
- Returns:
- A new tf.float32 [batch_size] rewards Tensor, and
- tf.float32 [batch_size] discounts tensor.
- """
- assert contexts[timer_index].get_shape().as_list()[1] == 1
- timers = contexts[timer_index][:, 0]
- rewards, discounts = reward_fn(states, actions, rewards, next_states,
- contexts)
- terminates = tf.to_float(timers <= 0) # if terminate set 1, else set 0
- for _ in range(rewards.shape.ndims - 1):
- terminates = tf.expand_dims(terminates, axis=-1)
- if not dense:
- rewards *= terminates # if terminate, return rewards, else return 0
- discounts *= (tf.to_float(1.0) - terminates)
- return rewards, discounts
-
-
-@gin.configurable
-def reset_rewards(states,
- actions,
- rewards,
- next_states,
- contexts,
- reset_index=0,
- reset_state=None,
- reset_reward_function=None,
- include_forward_rewards=True,
- include_reset_rewards=True):
- """Returns the rewards for a forward/reset agent.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] Tensor representing a batch
- of actions.
- rewards: A [batch_size] Tensor representing a batch of rewards.
- next_states: A [batch_size, num_state_dims] Tensor representing a batch
- of next states.
- contexts: A list of [batch_size, num_context_dims] Tensor representing
- a batch of contexts.
- reset_index: (integer) The context list index that specifies reset.
- reset_state: Reset state.
- reset_reward_function: Reward function for reset step.
- include_forward_rewards: Include the rewards from the forward pass.
- include_reset_rewards: Include the rewards from the reset pass.
-
- Returns:
- A new tf.float32 [batch_size] rewards Tensor, and
- tf.float32 [batch_size] discounts tensor.
- """
- reset_state = tf.constant(
- reset_state, dtype=next_states.dtype, shape=next_states.shape)
- reset_states = tf.expand_dims(reset_state, 0)
-
- def true_fn():
- if include_reset_rewards:
- return reset_reward_function(states, actions, rewards, next_states,
- [reset_states] + contexts[1:])
- else:
- return tf.zeros_like(rewards), tf.ones_like(rewards)
-
- def false_fn():
- if include_forward_rewards:
- return plain_rewards(states, actions, rewards, next_states, contexts)
- else:
- return tf.zeros_like(rewards), tf.ones_like(rewards)
-
- rewards, discounts = tf.cond(
- tf.cast(contexts[reset_index][0, 0], dtype=tf.bool), true_fn, false_fn)
- return rewards, discounts
-
-
-@gin.configurable
-def tanh_similarity(states,
- actions,
- rewards,
- next_states,
- contexts,
- mse_scale=1.0,
- state_scales=1.0,
- goal_scales=1.0,
- summarize=False):
- """Returns the similarity between next_states and contexts using tanh and mse.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] Tensor representing a batch
- of actions.
- rewards: A [batch_size] Tensor representing a batch of rewards.
- next_states: A [batch_size, num_state_dims] Tensor representing a batch
- of next states.
- contexts: A list of [batch_size, num_context_dims] Tensor representing
- a batch of contexts.
- mse_scale: A float, to scale mse before tanh.
- state_scales: multiplicative scale for (next) states. A scalar or 1D tensor,
- must be broadcastable to number of state dimensions.
- goal_scales: multiplicative scale for contexts. A scalar or 1D tensor,
- must be broadcastable to number of goal dimensions.
- summarize: (boolean) enable summary ops.
-
-
- Returns:
- A new tf.float32 [batch_size] rewards Tensor, and
- tf.float32 [batch_size] discounts tensor.
- """
- del states, actions, rewards # Unused
- mse = tf.reduce_mean(tf.squared_difference(next_states * state_scales,
- contexts[0] * goal_scales), -1)
- tanh = tf.tanh(mse_scale * mse)
- if summarize:
- with tf.name_scope('RewardFn/'):
- tf.summary.scalar('mean_mse', tf.reduce_mean(mse))
- tf.summary.histogram('mse', mse)
- tf.summary.scalar('mean_tanh', tf.reduce_mean(tanh))
- tf.summary.histogram('tanh', tanh)
- rewards = tf.to_float(1 - tanh)
- return rewards, tf.ones_like(rewards)
-
-
-@gin.configurable
-def negative_mse(states,
- actions,
- rewards,
- next_states,
- contexts,
- state_scales=1.0,
- goal_scales=1.0,
- summarize=False):
- """Returns the negative mean square error between next_states and contexts.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] Tensor representing a batch
- of actions.
- rewards: A [batch_size] Tensor representing a batch of rewards.
- next_states: A [batch_size, num_state_dims] Tensor representing a batch
- of next states.
- contexts: A list of [batch_size, num_context_dims] Tensor representing
- a batch of contexts.
- state_scales: multiplicative scale for (next) states. A scalar or 1D tensor,
- must be broadcastable to number of state dimensions.
- goal_scales: multiplicative scale for contexts. A scalar or 1D tensor,
- must be broadcastable to number of goal dimensions.
- summarize: (boolean) enable summary ops.
-
- Returns:
- A new tf.float32 [batch_size] rewards Tensor, and
- tf.float32 [batch_size] discounts tensor.
- """
- del states, actions, rewards # Unused
- mse = tf.reduce_mean(tf.squared_difference(next_states * state_scales,
- contexts[0] * goal_scales), -1)
- if summarize:
- with tf.name_scope('RewardFn/'):
- tf.summary.scalar('mean_mse', tf.reduce_mean(mse))
- tf.summary.histogram('mse', mse)
- rewards = tf.to_float(-mse)
- return rewards, tf.ones_like(rewards)
-
-
-@gin.configurable
-def negative_distance(states,
- actions,
- rewards,
- next_states,
- contexts,
- state_scales=1.0,
- goal_scales=1.0,
- reward_scales=1.0,
- weight_index=None,
- weight_vector=None,
- summarize=False,
- termination_epsilon=1e-4,
- state_indices=None,
- goal_indices=None,
- vectorize=False,
- relative_context=False,
- diff=False,
- norm='L2',
- epsilon=1e-10,
- bonus_epsilon=0., #5.,
- offset=0.0):
- """Returns the negative euclidean distance between next_states and contexts.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] Tensor representing a batch
- of actions.
- rewards: A [batch_size] Tensor representing a batch of rewards.
- next_states: A [batch_size, num_state_dims] Tensor representing a batch
- of next states.
- contexts: A list of [batch_size, num_context_dims] Tensor representing
- a batch of contexts.
- state_scales: multiplicative scale for (next) states. A scalar or 1D tensor,
- must be broadcastable to number of state dimensions.
- goal_scales: multiplicative scale for goals. A scalar or 1D tensor,
- must be broadcastable to number of goal dimensions.
- reward_scales: multiplicative scale for rewards. A scalar or 1D tensor,
- must be broadcastable to number of reward dimensions.
- weight_index: (integer) The context list index that specifies weight.
- weight_vector: (a number or a list or Numpy array) The weighting vector,
- broadcastable to `next_states`.
- summarize: (boolean) enable summary ops.
- termination_epsilon: terminate if dist is less than this quantity.
- state_indices: (a list of integers) list of state indices to select.
- goal_indices: (a list of integers) list of goal indices to select.
-    vectorize: Return a vectorized form.
-    relative_context: (boolean) if True, the goal is interpreted as
-      state + context (a relative goal).
-    diff: (boolean) if True, return the decrease in distance
-      (old_dist - dist) instead of the negative distance.
-    norm: L1 or L2.
-    epsilon: small offset to ensure non-negative/zero distance.
-    bonus_epsilon: add a bonus of 1 when the distance falls below this value.
-    offset: constant added to every reward.
-
- Returns:
- A new tf.float32 [batch_size] rewards Tensor, and
- tf.float32 [batch_size] discounts tensor.
- """
- del actions, rewards # Unused
- stats = {}
- record_tensor(next_states, state_indices, stats, 'next_states')
- states = index_states(states, state_indices)
- next_states = index_states(next_states, state_indices)
- goals = index_states(contexts[0], goal_indices)
- if relative_context:
- goals = states + goals
- sq_dists = tf.squared_difference(next_states * state_scales,
- goals * goal_scales)
- old_sq_dists = tf.squared_difference(states * state_scales,
- goals * goal_scales)
- record_tensor(sq_dists, None, stats, 'sq_dists')
- if weight_vector is not None:
- sq_dists *= tf.convert_to_tensor(weight_vector, dtype=next_states.dtype)
- old_sq_dists *= tf.convert_to_tensor(weight_vector, dtype=next_states.dtype)
- if weight_index is not None:
- #sq_dists *= contexts[weight_index]
- weights = tf.abs(index_states(contexts[0], weight_index))
- #weights /= tf.reduce_sum(weights, -1, keepdims=True)
- sq_dists *= weights
- old_sq_dists *= weights
- if norm == 'L1':
- dist = tf.sqrt(sq_dists + epsilon)
- old_dist = tf.sqrt(old_sq_dists + epsilon)
- if not vectorize:
- dist = tf.reduce_sum(dist, -1)
- old_dist = tf.reduce_sum(old_dist, -1)
- elif norm == 'L2':
- if vectorize:
- dist = sq_dists
- old_dist = old_sq_dists
- else:
- dist = tf.reduce_sum(sq_dists, -1)
- old_dist = tf.reduce_sum(old_sq_dists, -1)
- dist = tf.sqrt(dist + epsilon) # tf.gradients fails when tf.sqrt(-0.0)
- old_dist = tf.sqrt(old_dist + epsilon) # tf.gradients fails when tf.sqrt(-0.0)
- else:
- raise NotImplementedError(norm)
- discounts = dist > termination_epsilon
- if summarize:
- with tf.name_scope('RewardFn/'):
- tf.summary.scalar('mean_dist', tf.reduce_mean(dist))
- tf.summary.histogram('dist', dist)
- summarize_stats(stats)
- bonus = tf.to_float(dist < bonus_epsilon)
- dist *= reward_scales
- old_dist *= reward_scales
- if diff:
- return bonus + offset + tf.to_float(old_dist - dist), tf.to_float(discounts)
- return bonus + offset + tf.to_float(-dist), tf.to_float(discounts)
-
-
-@gin.configurable
-def cosine_similarity(states,
- actions,
- rewards,
- next_states,
- contexts,
- state_scales=1.0,
- goal_scales=1.0,
- reward_scales=1.0,
- normalize_states=True,
- normalize_goals=True,
- weight_index=None,
- weight_vector=None,
- summarize=False,
- state_indices=None,
- goal_indices=None,
- offset=0.0):
- """Returns the cosine similarity between next_states - states and contexts.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] Tensor representing a batch
- of actions.
- rewards: A [batch_size] Tensor representing a batch of rewards.
- next_states: A [batch_size, num_state_dims] Tensor representing a batch
- of next states.
- contexts: A list of [batch_size, num_context_dims] Tensor representing
- a batch of contexts.
- state_scales: multiplicative scale for (next) states. A scalar or 1D tensor,
- must be broadcastable to number of state dimensions.
- goal_scales: multiplicative scale for goals. A scalar or 1D tensor,
- must be broadcastable to number of goal dimensions.
- reward_scales: multiplicative scale for rewards. A scalar or 1D tensor,
- must be broadcastable to number of reward dimensions.
-    normalize_states: (boolean) l2-normalize the state-difference vector.
-    normalize_goals: (boolean) l2-normalize the goal vector.
-    weight_index: (integer) The context list index that specifies weight.
-    weight_vector: (a number or a list or Numpy array) The weighting vector,
-      broadcastable to `next_states`.
-    summarize: (boolean) enable summary ops.
-    state_indices: (a list of integers) list of state indices to select.
-    goal_indices: (a list of integers) list of goal indices to select.
-    offset: constant added to the similarity.
-
- Returns:
- A new tf.float32 [batch_size] rewards Tensor, and
- tf.float32 [batch_size] discounts tensor.
- """
- del actions, rewards # Unused
- stats = {}
- record_tensor(next_states, state_indices, stats, 'next_states')
- states = index_states(states, state_indices)
- next_states = index_states(next_states, state_indices)
- goals = index_states(contexts[0], goal_indices)
-
- if weight_vector is not None:
- goals *= tf.convert_to_tensor(weight_vector, dtype=next_states.dtype)
- if weight_index is not None:
- weights = tf.abs(index_states(contexts[0], weight_index))
- goals *= weights
-
- direction_vec = next_states - states
- if normalize_states:
- direction_vec = tf.nn.l2_normalize(direction_vec, -1)
- goal_vec = goals
- if normalize_goals:
- goal_vec = tf.nn.l2_normalize(goal_vec, -1)
-
- similarity = tf.reduce_sum(goal_vec * direction_vec, -1)
- discounts = tf.ones_like(similarity)
- return offset + tf.to_float(similarity), tf.to_float(discounts)
-
-
-@gin.configurable
-def diff_distance(states,
- actions,
- rewards,
- next_states,
- contexts,
- state_scales=1.0,
- goal_scales=1.0,
- reward_scales=1.0,
- weight_index=None,
- weight_vector=None,
- summarize=False,
- termination_epsilon=1e-4,
- state_indices=None,
- goal_indices=None,
- norm='L2',
- epsilon=1e-10):
- """Returns the difference in euclidean distance between states/next_states and contexts.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] Tensor representing a batch
- of actions.
- rewards: A [batch_size] Tensor representing a batch of rewards.
- next_states: A [batch_size, num_state_dims] Tensor representing a batch
- of next states.
- contexts: A list of [batch_size, num_context_dims] Tensor representing
- a batch of contexts.
- state_scales: multiplicative scale for (next) states. A scalar or 1D tensor,
- must be broadcastable to number of state dimensions.
- goal_scales: multiplicative scale for goals. A scalar or 1D tensor,
- must be broadcastable to number of goal dimensions.
- reward_scales: multiplicative scale for rewards. A scalar or 1D tensor,
- must be broadcastable to number of reward dimensions.
- weight_index: (integer) The context list index that specifies weight.
- weight_vector: (a number or a list or Numpy array) The weighting vector,
- broadcastable to `next_states`.
- summarize: (boolean) enable summary ops.
- termination_epsilon: terminate if dist is less than this quantity.
- state_indices: (a list of integers) list of state indices to select.
- goal_indices: (a list of integers) list of goal indices to select.
- norm: L1 or L2.
- epsilon: small offset to ensure non-negative/zero distance.
-
- Returns:
- A new tf.float32 [batch_size] rewards Tensor, and
- tf.float32 [batch_size] discounts tensor.
- """
- del actions, rewards # Unused
- stats = {}
- record_tensor(next_states, state_indices, stats, 'next_states')
- next_states = index_states(next_states, state_indices)
- states = index_states(states, state_indices)
- goals = index_states(contexts[0], goal_indices)
- next_sq_dists = tf.squared_difference(next_states * state_scales,
- goals * goal_scales)
- sq_dists = tf.squared_difference(states * state_scales,
- goals * goal_scales)
- record_tensor(sq_dists, None, stats, 'sq_dists')
- if weight_vector is not None:
- next_sq_dists *= tf.convert_to_tensor(weight_vector, dtype=next_states.dtype)
- sq_dists *= tf.convert_to_tensor(weight_vector, dtype=next_states.dtype)
- if weight_index is not None:
- next_sq_dists *= contexts[weight_index]
- sq_dists *= contexts[weight_index]
- if norm == 'L1':
- next_dist = tf.sqrt(next_sq_dists + epsilon)
- dist = tf.sqrt(sq_dists + epsilon)
- next_dist = tf.reduce_sum(next_dist, -1)
- dist = tf.reduce_sum(dist, -1)
- elif norm == 'L2':
- next_dist = tf.reduce_sum(next_sq_dists, -1)
- next_dist = tf.sqrt(next_dist + epsilon) # tf.gradients fails when tf.sqrt(-0.0)
- dist = tf.reduce_sum(sq_dists, -1)
- dist = tf.sqrt(dist + epsilon) # tf.gradients fails when tf.sqrt(-0.0)
- else:
- raise NotImplementedError(norm)
- discounts = next_dist > termination_epsilon
- if summarize:
- with tf.name_scope('RewardFn/'):
- tf.summary.scalar('mean_dist', tf.reduce_mean(dist))
- tf.summary.histogram('dist', dist)
- summarize_stats(stats)
- diff = dist - next_dist
- diff *= reward_scales
- return tf.to_float(diff), tf.to_float(discounts)
-
-
-@gin.configurable
-def binary_indicator(states,
- actions,
- rewards,
- next_states,
- contexts,
- termination_epsilon=1e-4,
- offset=0,
- epsilon=1e-10,
- state_indices=None,
- summarize=False):
- """Returns 0/1 by checking if next_states and contexts overlap.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] Tensor representing a batch
- of actions.
- rewards: A [batch_size] Tensor representing a batch of rewards.
- next_states: A [batch_size, num_state_dims] Tensor representing a batch
- of next states.
- contexts: A list of [batch_size, num_context_dims] Tensor representing
- a batch of contexts.
- termination_epsilon: terminate if dist is less than this quantity.
- offset: Offset the rewards.
-    epsilon: small offset to ensure non-negative/zero distance.
-    state_indices: (a list of integers) list of state indices to select.
-    summarize: (boolean) enable summary ops.
-
- Returns:
- A new tf.float32 [batch_size] rewards Tensor, and
- tf.float32 [batch_size] discounts tensor.
- """
- del states, actions # unused args
- next_states = index_states(next_states, state_indices)
- dist = tf.reduce_sum(tf.squared_difference(next_states, contexts[0]), -1)
- dist = tf.sqrt(dist + epsilon)
- discounts = dist > termination_epsilon
- rewards = tf.logical_not(discounts)
- rewards = tf.to_float(rewards) + offset
- return tf.to_float(rewards), tf.ones_like(tf.to_float(discounts)) #tf.to_float(discounts)
-
-
-@gin.configurable
-def plain_rewards(states, actions, rewards, next_states, contexts):
- """Returns the given rewards.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] Tensor representing a batch
- of actions.
- rewards: A [batch_size] Tensor representing a batch of rewards.
- next_states: A [batch_size, num_state_dims] Tensor representing a batch
- of next states.
- contexts: A list of [batch_size, num_context_dims] Tensor representing
- a batch of contexts.
-
- Returns:
- A new tf.float32 [batch_size] rewards Tensor, and
- tf.float32 [batch_size] discounts tensor.
- """
- del states, actions, next_states, contexts # Unused
- return rewards, tf.ones_like(rewards)
-
-
-@gin.configurable
-def ctrl_rewards(states,
- actions,
- rewards,
- next_states,
- contexts,
- reward_scales=1.0):
- """Returns the negative control cost.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] Tensor representing a batch
- of actions.
- rewards: A [batch_size] Tensor representing a batch of rewards.
- next_states: A [batch_size, num_state_dims] Tensor representing a batch
- of next states.
- contexts: A list of [batch_size, num_context_dims] Tensor representing
- a batch of contexts.
- reward_scales: multiplicative scale for rewards. A scalar or 1D tensor,
- must be broadcastable to number of reward dimensions.
-
- Returns:
- A new tf.float32 [batch_size] rewards Tensor, and
- tf.float32 [batch_size] discounts tensor.
- """
- del states, rewards, contexts # Unused
- if actions is None:
- rewards = tf.to_float(tf.zeros(shape=next_states.shape[:1]))
- else:
- rewards = -tf.reduce_sum(tf.square(actions), axis=1)
- rewards *= reward_scales
- rewards = tf.to_float(rewards)
- return rewards, tf.ones_like(rewards)
-
-
-@gin.configurable
-def diff_rewards(
- states,
- actions,
- rewards,
- next_states,
- contexts,
- state_indices=None,
- goal_index=0,):
- """Returns (next_states - goals) as a batched vector reward."""
- del states, rewards, actions # Unused
- if state_indices is not None:
- next_states = index_states(next_states, state_indices)
- rewards = tf.to_float(next_states - contexts[goal_index])
- return rewards, tf.ones_like(rewards)
-
-
-@gin.configurable
-def state_rewards(states,
- actions,
- rewards,
- next_states,
- contexts,
- weight_index=None,
- state_indices=None,
- weight_vector=1.0,
- offset_vector=0.0,
- summarize=False):
- """Returns the rewards that are linear mapping of next_states.
-
- Args:
- states: A [batch_size, num_state_dims] Tensor representing a batch
- of states.
- actions: A [batch_size, num_action_dims] Tensor representing a batch
- of actions.
- rewards: A [batch_size] Tensor representing a batch of rewards.
- next_states: A [batch_size, num_state_dims] Tensor representing a batch
- of next states.
- contexts: A list of [batch_size, num_context_dims] Tensor representing
- a batch of contexts.
-    weight_index: (integer) Index of the context list entry that specifies
-      weighting.
-    state_indices: (a list of integers or a Numpy integer array) Indices of
-      the state dimensions to select.
- weight_vector: (a number or a list or Numpy array) The weighting vector,
- broadcastable to `next_states`.
-    offset_vector: (a number or a list or Numpy array) The offset vector.
- summarize: (boolean) enable summary ops.
-
- Returns:
- A new tf.float32 [batch_size] rewards Tensor, and
- tf.float32 [batch_size] discounts tensor.
- """
- del states, actions, rewards # unused args
- stats = {}
- record_tensor(next_states, state_indices, stats)
- next_states = index_states(next_states, state_indices)
- weight = tf.constant(
- weight_vector, dtype=next_states.dtype, shape=next_states[0].shape)
- weights = tf.expand_dims(weight, 0)
- offset = tf.constant(
- offset_vector, dtype=next_states.dtype, shape=next_states[0].shape)
- offsets = tf.expand_dims(offset, 0)
- if weight_index is not None:
- weights *= contexts[weight_index]
- rewards = tf.to_float(tf.reduce_sum(weights * (next_states+offsets), axis=1))
- if summarize:
- with tf.name_scope('RewardFn/'):
- summarize_stats(stats)
- return rewards, tf.ones_like(rewards)
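
With the settings used by the configs earlier in this diff (`relative_context = False`, `diff = False`, L2 norm), `negative_distance` above reduces to the negative Euclidean distance between selected next-state dimensions and the goal, and the discount drops to 0 once that distance is below `termination_epsilon`. A NumPy sketch of that core computation (batch values are made up):

```python
# NumPy sketch of the core of negative_distance with default options:
# reward = -||next_state[state_indices] - goal||_2, discount = 0 on arrival.
import numpy as np


def negative_distance_np(next_states, goals, state_indices=(0, 1),
                         termination_epsilon=1e-4):
  ns = next_states[:, list(state_indices)]
  dist = np.sqrt(np.sum((ns - goals) ** 2, axis=-1) + 1e-10)
  rewards = -dist
  discounts = (dist > termination_epsilon).astype(np.float32)
  return rewards, discounts


next_states = np.array([[8.0, 8.0, 0.3], [4.0, 5.0, 0.1]])
goals = np.array([[8.0, 8.0], [8.0, 8.0]])
print(negative_distance_np(next_states, goals))
# first transition has (near-)zero distance -> reward ~0, discount 0
```
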
diff --git a/research/efficient-hrl/context/samplers.py b/research/efficient-hrl/context/samplers.py
deleted file mode 100644
index 15a22df5eb3..00000000000
--- a/research/efficient-hrl/context/samplers.py
+++ /dev/null
@@ -1,445 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Samplers for Contexts.
-
- Each sampler class should define __call__(batch_size).
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import tensorflow as tf
-slim = tf.contrib.slim
-import gin.tf
-
-
-@gin.configurable
-class BaseSampler(object):
- """Base sampler."""
-
- def __init__(self, context_spec, context_range=None, k=2, scope='sampler'):
- """Construct a base sampler.
-
- Args:
- context_spec: A context spec.
-      context_range: A tuple of (minval, maxval), where minval, maxval are
-        floats or Numpy arrays with the same shape as the context.
-      k: (integer) Number of leading context dimensions (the subgoal
-        dimension) used by some samplers.
-      scope: A string denoting scope.
- """
- self._context_spec = context_spec
- self._context_range = context_range
- self._k = k
- self._scope = scope
-
- def __call__(self, batch_size, **kwargs):
- raise NotImplementedError
-
- def set_replay(self, replay=None):
- pass
-
- def _validate_contexts(self, contexts):
- """Validate if contexts have right spec.
-
- Args:
- contexts: A [batch_size, num_contexts_dim] tensor.
- Raises:
- ValueError: If shape or dtype mismatches that of spec.
- """
- if contexts[0].shape != self._context_spec.shape:
- raise ValueError('contexts has invalid shape %s wrt spec shape %s' %
- (contexts[0].shape, self._context_spec.shape))
- if contexts.dtype != self._context_spec.dtype:
- raise ValueError('contexts has invalid dtype %s wrt spec dtype %s' %
- (contexts.dtype, self._context_spec.dtype))
-
-
-@gin.configurable
-class ZeroSampler(BaseSampler):
- """Zero sampler."""
-
- def __call__(self, batch_size, **kwargs):
- """Sample a batch of context.
-
- Args:
- batch_size: Batch size.
- Returns:
- Two [batch_size, num_context_dims] tensors.
- """
- contexts = tf.zeros(
- dtype=self._context_spec.dtype,
- shape=[
- batch_size,
- ] + self._context_spec.shape.as_list())
- return contexts, contexts
-
-
-@gin.configurable
-class BinarySampler(BaseSampler):
- """Binary sampler."""
-
- def __init__(self, probs=0.5, *args, **kwargs):
- """Constructor."""
- super(BinarySampler, self).__init__(*args, **kwargs)
- self._probs = probs
-
- def __call__(self, batch_size, **kwargs):
- """Sample a batch of context."""
- spec = self._context_spec
- contexts = tf.random_uniform(
- shape=[
- batch_size,
- ] + spec.shape.as_list(), dtype=tf.float32)
- contexts = tf.cast(tf.greater(contexts, self._probs), dtype=spec.dtype)
- return contexts, contexts
-
-
-@gin.configurable
-class RandomSampler(BaseSampler):
- """Random sampler."""
-
- def __call__(self, batch_size, **kwargs):
- """Sample a batch of context.
-
- Args:
- batch_size: Batch size.
- Returns:
- Two [batch_size, num_context_dims] tensors.
- """
- spec = self._context_spec
- context_range = self._context_range
- if isinstance(context_range[0], (int, float)):
- contexts = tf.random_uniform(
- shape=[
- batch_size,
- ] + spec.shape.as_list(),
- minval=context_range[0],
- maxval=context_range[1],
- dtype=spec.dtype)
- elif isinstance(context_range[0], (list, tuple, np.ndarray)):
- assert len(spec.shape.as_list()) == 1
- assert spec.shape.as_list()[0] == len(context_range[0])
- assert spec.shape.as_list()[0] == len(context_range[1])
- contexts = tf.concat(
- [
- tf.random_uniform(
- shape=[
- batch_size, 1,
- ] + spec.shape.as_list()[1:],
- minval=context_range[0][i],
- maxval=context_range[1][i],
- dtype=spec.dtype) for i in range(spec.shape.as_list()[0])
- ],
- axis=1)
- else: raise NotImplementedError(context_range)
- self._validate_contexts(contexts)
- state, next_state = kwargs['state'], kwargs['next_state']
- if state is not None and next_state is not None:
- pass
- #contexts = tf.concat(
- # [tf.random_normal(tf.shape(state[:, :self._k]), dtype=tf.float64) +
- # tf.random_shuffle(state[:, :self._k]),
- # contexts[:, self._k:]], 1)
-
- return contexts, contexts
-
-
-@gin.configurable
-class ScheduledSampler(BaseSampler):
- """Scheduled sampler."""
-
- def __init__(self,
- scope='default',
- values=None,
- scheduler='cycle',
- scheduler_params=None,
- *args, **kwargs):
- """Construct sampler.
-
- Args:
- scope: Scope name.
- values: A list of numbers or [num_context_dim] Numpy arrays
- representing the values to cycle.
- scheduler: scheduler type.
- scheduler_params: scheduler parameters.
- *args: arguments.
- **kwargs: keyword arguments.
- """
- super(ScheduledSampler, self).__init__(*args, **kwargs)
- self._scope = scope
- self._values = values
- self._scheduler = scheduler
- self._scheduler_params = scheduler_params or {}
- assert self._values is not None and len(
- self._values), 'must provide non-empty values.'
- self._n = len(self._values)
- # TODO(shanegu): move variable creation outside. resolve tf.cond problem.
- self._count = 0
- self._i = tf.Variable(
- tf.zeros(shape=(), dtype=tf.int32),
- name='%s-scheduled_sampler_%d' % (self._scope, self._count))
- self._values = tf.constant(self._values, dtype=self._context_spec.dtype)
-
- def __call__(self, batch_size, **kwargs):
- """Sample a batch of context.
-
- Args:
- batch_size: Batch size.
- Returns:
- Two [batch_size, num_context_dims] tensors.
- """
- spec = self._context_spec
- next_op = self._next(self._i)
- with tf.control_dependencies([next_op]):
- value = self._values[self._i]
- if value.get_shape().as_list():
- values = tf.tile(
- tf.expand_dims(value, 0), (batch_size,) + (1,) * spec.shape.ndims)
- else:
- values = value + tf.zeros(
- shape=[
- batch_size,
- ] + spec.shape.as_list(), dtype=spec.dtype)
- self._validate_contexts(values)
- self._count += 1
- return values, values
-
- def _next(self, i):
- """Return op that increments pointer to next value.
-
- Args:
- i: A tensorflow integer variable.
- Returns:
- Op that increments pointer.
- """
- if self._scheduler == 'cycle':
- inc = ('inc' in self._scheduler_params and
- self._scheduler_params['inc']) or 1
- return tf.assign(i, tf.mod(i+inc, self._n))
- else:
- raise NotImplementedError(self._scheduler)
-
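-# Worked example for ScheduledSampler (comment only): with values=[v0, v1, v2]
-# and the default 'cycle' scheduler (inc=1), successive calls step the pointer
-# 0 -> 1 -> 2 -> 0 -> ..., and each call returns the selected value tiled to a
-# [batch_size, num_context_dims] tensor for both the context and next context.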
-
-@gin.configurable
-class ReplaySampler(BaseSampler):
- """Replay sampler."""
-
- def __init__(self,
- prefetch_queue_capacity=2,
- override_indices=None,
- state_indices=None,
- *args,
- **kwargs):
- """Construct sampler.
-
- Args:
- prefetch_queue_capacity: Capacity for prefetch queue.
- override_indices: Override indices.
- state_indices: Select certain indices from state dimension.
- *args: arguments.
- **kwargs: keyword arguments.
- """
- super(ReplaySampler, self).__init__(*args, **kwargs)
- self._prefetch_queue_capacity = prefetch_queue_capacity
- self._override_indices = override_indices
- self._state_indices = state_indices
-
- def set_replay(self, replay):
- """Set replay.
-
- Args:
- replay: A replay buffer.
- """
- self._replay = replay
-
- def __call__(self, batch_size, **kwargs):
- """Sample a batch of context.
-
- Args:
- batch_size: Batch size.
- Returns:
- Two [batch_size, num_context_dims] tensors.
- """
- batch = self._replay.GetRandomBatch(batch_size)
- next_states = batch[4]
- if self._prefetch_queue_capacity > 0:
- batch_queue = slim.prefetch_queue.prefetch_queue(
- [next_states],
- capacity=self._prefetch_queue_capacity,
- name='%s/batch_context_queue' % self._scope)
- next_states = batch_queue.dequeue()
- if self._override_indices is not None:
- assert self._context_range is not None and isinstance(
- self._context_range[0], (int, float))
- next_states = tf.concat(
- [
- tf.random_uniform(
- shape=next_states[:, :1].shape,
- minval=self._context_range[0],
- maxval=self._context_range[1],
- dtype=next_states.dtype)
- if i in self._override_indices else next_states[:, i:i + 1]
- for i in range(self._context_spec.shape.as_list()[0])
- ],
- axis=1)
- if self._state_indices is not None:
- next_states = tf.concat(
- [
- next_states[:, i:i + 1]
- for i in range(self._context_spec.shape.as_list()[0])
- ],
- axis=1)
- self._validate_contexts(next_states)
- return next_states, next_states
-
-
-@gin.configurable
-class TimeSampler(BaseSampler):
- """Time Sampler."""
-
- def __init__(self, minval=0, maxval=1, timestep=-1, *args, **kwargs):
- """Construct sampler.
-
- Args:
- minval: Min value integer.
- maxval: Max value integer.
- timestep: Time step between states and next_states.
- *args: arguments.
- **kwargs: keyword arguments.
- """
- super(TimeSampler, self).__init__(*args, **kwargs)
- assert self._context_spec.shape.as_list() == [1]
- self._minval = minval
- self._maxval = maxval
- self._timestep = timestep
-
- def __call__(self, batch_size, **kwargs):
- """Sample a batch of context.
-
- Args:
- batch_size: Batch size.
- Returns:
- Two [batch_size, num_context_dims] tensors.
- """
- if self._maxval == self._minval:
- contexts = tf.constant(
- self._maxval, shape=[batch_size, 1], dtype=tf.int32)
- else:
- contexts = tf.random_uniform(
- shape=[batch_size, 1],
- dtype=tf.int32,
- maxval=self._maxval,
- minval=self._minval)
- next_contexts = tf.maximum(contexts + self._timestep, 0)
-
- return tf.cast(
- contexts, dtype=self._context_spec.dtype), tf.cast(
- next_contexts, dtype=self._context_spec.dtype)
-
-
-@gin.configurable
-class ConstantSampler(BaseSampler):
- """Constant sampler."""
-
- def __init__(self, value=None, *args, **kwargs):
- """Construct sampler.
-
- Args:
- value: A list or Numpy array for values of the constant.
- *args: arguments.
- **kwargs: keyword arguments.
- """
- super(ConstantSampler, self).__init__(*args, **kwargs)
- self._value = value
-
- def __call__(self, batch_size, **kwargs):
- """Sample a batch of context.
-
- Args:
- batch_size: Batch size.
- Returns:
- Two [batch_size, num_context_dims] tensors.
- """
- spec = self._context_spec
- value_ = tf.constant(self._value, shape=spec.shape, dtype=spec.dtype)
- values = tf.tile(
- tf.expand_dims(value_, 0), (batch_size,) + (1,) * spec.shape.ndims)
- self._validate_contexts(values)
- return values, values
-
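-# Illustrative sketch for ConstantSampler (comment only):
-# ConstantSampler(value=[0., 16.]) would emit the same fixed goal for every
-# batch element, i.e. a [batch_size, 2] tensor filled with (0., 16.) returned
-# for both the context and next context.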
-
-@gin.configurable
-class DirectionSampler(RandomSampler):
- """Direction sampler."""
-
- def __call__(self, batch_size, **kwargs):
- """Sample a batch of context.
-
- Args:
- batch_size: Batch size.
- Returns:
- Two [batch_size, num_context_dims] tensors.
- """
- spec = self._context_spec
- context_range = self._context_range
- if isinstance(context_range[0], (int, float)):
- contexts = tf.random_uniform(
- shape=[
- batch_size,
- ] + spec.shape.as_list(),
- minval=context_range[0],
- maxval=context_range[1],
- dtype=spec.dtype)
- elif isinstance(context_range[0], (list, tuple, np.ndarray)):
- assert len(spec.shape.as_list()) == 1
- assert spec.shape.as_list()[0] == len(context_range[0])
- assert spec.shape.as_list()[0] == len(context_range[1])
- contexts = tf.concat(
- [
- tf.random_uniform(
- shape=[
- batch_size, 1,
- ] + spec.shape.as_list()[1:],
- minval=context_range[0][i],
- maxval=context_range[1][i],
- dtype=spec.dtype) for i in range(spec.shape.as_list()[0])
- ],
- axis=1)
- else: raise NotImplementedError(context_range)
- self._validate_contexts(contexts)
- if 'sampler_fn' in kwargs:
- other_contexts = kwargs['sampler_fn']()
- else:
- other_contexts = contexts
- state, next_state = kwargs['state'], kwargs['next_state']
- if state is not None and next_state is not None:
- my_context_range = (np.array(context_range[1]) - np.array(context_range[0])) / 2 * np.ones(spec.shape.as_list())
- contexts = tf.concat(
- [0.1 * my_context_range[:self._k] *
- tf.random_normal(tf.shape(state[:, :self._k]), dtype=state.dtype) +
- tf.random_shuffle(state[:, :self._k]) - state[:, :self._k],
- other_contexts[:, self._k:]], 1)
- #contexts = tf.Print(contexts,
- # [contexts, tf.reduce_max(contexts, 0),
- # tf.reduce_min(state, 0), tf.reduce_max(state, 0)], 'contexts', summarize=15)
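- # The concat below appears to implement the goal transition
- # g' = s + g - s' (keeping the goal fixed in absolute terms as the agent
- # moves); note it is immediately overridden by 'next_contexts = contexts'.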
- next_contexts = tf.concat( #LALA
- [state[:, :self._k] + contexts[:, :self._k] - next_state[:, :self._k],
- other_contexts[:, self._k:]], 1)
- next_contexts = contexts #LALA cosine
- else:
- next_contexts = contexts
- return tf.stop_gradient(contexts), tf.stop_gradient(next_contexts)
diff --git a/research/efficient-hrl/environments/__init__.py b/research/efficient-hrl/environments/__init__.py
deleted file mode 100644
index 8b137891791..00000000000
--- a/research/efficient-hrl/environments/__init__.py
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/research/efficient-hrl/environments/ant.py b/research/efficient-hrl/environments/ant.py
deleted file mode 100644
index feab1eef4c5..00000000000
--- a/research/efficient-hrl/environments/ant.py
+++ /dev/null
@@ -1,141 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Wrapper for creating the ant environment in gym_mujoco."""
-
-import math
-import numpy as np
-import mujoco_py
-from gym import utils
-from gym.envs.mujoco import mujoco_env
-
-
-def q_inv(a):
- return [a[0], -a[1], -a[2], -a[3]]
-
-
-def q_mult(a, b):  # Multiply two quaternions.
- w = a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]
- i = a[0] * b[1] + a[1] * b[0] + a[2] * b[3] - a[3] * b[2]
- j = a[0] * b[2] - a[1] * b[3] + a[2] * b[0] + a[3] * b[1]
- k = a[0] * b[3] + a[1] * b[2] - a[2] * b[1] + a[3] * b[0]
- return [w, i, j, k]
-
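-# Worked example (comment only): rotating the unit x-axis [0, 1, 0, 0] by a
-# quaternion q via q_mult(q_mult(q, [0, 1, 0, 0]), q_inv(q)) gives the torso
-# heading; with the identity quaternion [1, 0, 0, 0] the result is unchanged,
-# so get_ori() below would return atan2(0, 1) = 0.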
-
-class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
- FILE = "ant.xml"
- ORI_IND = 3
-
- def __init__(self, file_path=None, expose_all_qpos=True,
- expose_body_coms=None, expose_body_comvels=None):
- self._expose_all_qpos = expose_all_qpos
- self._expose_body_coms = expose_body_coms
- self._expose_body_comvels = expose_body_comvels
- self._body_com_indices = {}
- self._body_comvel_indices = {}
-
- mujoco_env.MujocoEnv.__init__(self, file_path, 5)
- utils.EzPickle.__init__(self)
-
- @property
- def physics(self):
- # If the mujoco-py version is at least 1.50, return self.sim, whose PyMjData
- # object is used for getting and setting positions and velocities.
- # See https://github.com/openai/mujoco-py/issues/80 for updates to the API.
- if mujoco_py.get_version() >= '1.50':
- return self.sim
- else:
- return self.model
-
- def _step(self, a):
- return self.step(a)
-
- def step(self, a):
- xposbefore = self.get_body_com("torso")[0]
- self.do_simulation(a, self.frame_skip)
- xposafter = self.get_body_com("torso")[0]
- forward_reward = (xposafter - xposbefore) / self.dt
- ctrl_cost = .5 * np.square(a).sum()
- survive_reward = 1.0
- reward = forward_reward - ctrl_cost + survive_reward
- state = self.state_vector()
- done = False
- ob = self._get_obs()
- return ob, reward, done, dict(
- reward_forward=forward_reward,
- reward_ctrl=-ctrl_cost,
- reward_survive=survive_reward)
-
- def _get_obs(self):
- # No cfrc observation
- if self._expose_all_qpos:
- obs = np.concatenate([
- self.physics.data.qpos.flat[:15], # Ensures only ant obs.
- self.physics.data.qvel.flat[:14],
- ])
- else:
- obs = np.concatenate([
- self.physics.data.qpos.flat[2:15],
- self.physics.data.qvel.flat[:14],
- ])
-
- if self._expose_body_coms is not None:
- for name in self._expose_body_coms:
- com = self.get_body_com(name)
- if name not in self._body_com_indices:
- indices = range(len(obs), len(obs) + len(com))
- self._body_com_indices[name] = indices
- obs = np.concatenate([obs, com])
-
- if self._expose_body_comvels is not None:
- for name in self._expose_body_comvels:
- comvel = self.get_body_comvel(name)
- if name not in self._body_comvel_indices:
- indices = range(len(obs), len(obs) + len(comvel))
- self._body_comvel_indices[name] = indices
- obs = np.concatenate([obs, comvel])
- return obs
-
- def reset_model(self):
- qpos = self.init_qpos + self.np_random.uniform(
- size=self.model.nq, low=-.1, high=.1)
- qvel = self.init_qvel + self.np_random.randn(self.model.nv) * .1
-
- # Set everything other than ant to original position and 0 velocity.
- qpos[15:] = self.init_qpos[15:]
- qvel[14:] = 0.
- self.set_state(qpos, qvel)
- return self._get_obs()
-
- def viewer_setup(self):
- self.viewer.cam.distance = self.model.stat.extent * 0.5
-
- def get_ori(self):
- ori = [0, 1, 0, 0]
- rot = self.physics.data.qpos[self.__class__.ORI_IND:self.__class__.ORI_IND + 4] # take the quaternion
- ori = q_mult(q_mult(rot, ori), q_inv(rot))[1:3] # project onto x-y plane
- ori = math.atan2(ori[1], ori[0])
- return ori
-
- def set_xy(self, xy):
- qpos = np.copy(self.physics.data.qpos)
- qpos[0] = xy[0]
- qpos[1] = xy[1]
-
- qvel = self.physics.data.qvel
- self.set_state(qpos, qvel)
-
- def get_xy(self):
- return self.physics.data.qpos[:2]
diff --git a/research/efficient-hrl/environments/ant_maze_env.py b/research/efficient-hrl/environments/ant_maze_env.py
deleted file mode 100644
index 69a10663f4d..00000000000
--- a/research/efficient-hrl/environments/ant_maze_env.py
+++ /dev/null
@@ -1,21 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-from environments.maze_env import MazeEnv
-from environments.ant import AntEnv
-
-
-class AntMazeEnv(MazeEnv):
- MODEL_CLASS = AntEnv
diff --git a/research/efficient-hrl/environments/assets/ant.xml b/research/efficient-hrl/environments/assets/ant.xml
deleted file mode 100755
index 5a49d7f52a0..00000000000
--- a/research/efficient-hrl/environments/assets/ant.xml
+++ /dev/null
@@ -1,81 +0,0 @@
-<!-- ant.xml: 81-line MuJoCo model definition for the ant robot (markup not preserved) -->
diff --git a/research/efficient-hrl/environments/create_maze_env.py b/research/efficient-hrl/environments/create_maze_env.py
deleted file mode 100644
index f6dc4f42190..00000000000
--- a/research/efficient-hrl/environments/create_maze_env.py
+++ /dev/null
@@ -1,97 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-from environments.ant_maze_env import AntMazeEnv
-from environments.point_maze_env import PointMazeEnv
-
-import tensorflow as tf
-import gin.tf
-from tf_agents.environments import gym_wrapper
-from tf_agents.environments import tf_py_environment
-
-
-@gin.configurable
-def create_maze_env(env_name=None, top_down_view=False):
- n_bins = 0
- manual_collision = False
- if env_name.startswith('Ego'):
- n_bins = 8
- env_name = env_name[3:]
- if env_name.startswith('Ant'):
- cls = AntMazeEnv
- env_name = env_name[3:]
- maze_size_scaling = 8
- elif env_name.startswith('Point'):
- cls = PointMazeEnv
- manual_collision = True
- env_name = env_name[5:]
- maze_size_scaling = 4
- else:
- assert False, 'unknown env %s' % env_name
-
- maze_id = None
- observe_blocks = False
- put_spin_near_agent = False
- if env_name == 'Maze':
- maze_id = 'Maze'
- elif env_name == 'Push':
- maze_id = 'Push'
- elif env_name == 'Fall':
- maze_id = 'Fall'
- elif env_name == 'Block':
- maze_id = 'Block'
- put_spin_near_agent = True
- observe_blocks = True
- elif env_name == 'BlockMaze':
- maze_id = 'BlockMaze'
- put_spin_near_agent = True
- observe_blocks = True
- else:
- raise ValueError('Unknown maze environment %s' % env_name)
-
- gym_mujoco_kwargs = {
- 'maze_id': maze_id,
- 'n_bins': n_bins,
- 'observe_blocks': observe_blocks,
- 'put_spin_near_agent': put_spin_near_agent,
- 'top_down_view': top_down_view,
- 'manual_collision': manual_collision,
- 'maze_size_scaling': maze_size_scaling
- }
- gym_env = cls(**gym_mujoco_kwargs)
- gym_env.reset()
- wrapped_env = gym_wrapper.GymWrapper(gym_env)
- return wrapped_env
-
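-# Illustrative usage sketch (comment only): create_maze_env(env_name='AntMaze')
-# returns an AntMazeEnv wrapped for tf_agents; 'AntPush', 'AntFall', 'PointMaze',
-# etc. select other robot/maze combinations, and an 'Ego' prefix (e.g.
-# 'EgoPointMaze') enables the 8-bin egocentric range-sensor observations.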
-
-class TFPyEnvironment(tf_py_environment.TFPyEnvironment):
-
- def __init__(self, *args, **kwargs):
- super(TFPyEnvironment, self).__init__(*args, **kwargs)
-
- def start_collect(self):
- pass
-
- def current_obs(self):
- time_step = self.current_time_step()
- return time_step.observation[0]  # TFPyEnvironment adds a batch dimension; strip it.
-
- def step(self, actions):
- actions = tf.expand_dims(actions, 0)
- next_step = super(TFPyEnvironment, self).step(actions)
- return next_step.is_last()[0], next_step.reward[0], next_step.discount[0]
-
- def reset(self):
- return super(TFPyEnvironment, self).reset()
diff --git a/research/efficient-hrl/environments/maze_env.py b/research/efficient-hrl/environments/maze_env.py
deleted file mode 100644
index cf7d1f2dc0a..00000000000
--- a/research/efficient-hrl/environments/maze_env.py
+++ /dev/null
@@ -1,499 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Adapted from rllab maze_env.py."""
-
-import os
-import tempfile
-import xml.etree.ElementTree as ET
-import math
-import random
-import numpy as np
-import gym
-
-from environments import maze_env_utils
-
-# Directory that contains mujoco xml files.
-MODEL_DIR = 'environments/assets'
-
-
-class MazeEnv(gym.Env):
- MODEL_CLASS = None
-
- MAZE_HEIGHT = None
- MAZE_SIZE_SCALING = None
-
- def __init__(
- self,
- maze_id=None,
- maze_height=0.5,
- maze_size_scaling=8,
- n_bins=0,
- sensor_range=3.,
- sensor_span=2 * math.pi,
- observe_blocks=False,
- put_spin_near_agent=False,
- top_down_view=False,
- manual_collision=False,
- *args,
- **kwargs):
- self._maze_id = maze_id
-
- model_cls = self.__class__.MODEL_CLASS
- if model_cls is None:
- raise "MODEL_CLASS unspecified!"
- xml_path = os.path.join(MODEL_DIR, model_cls.FILE)
- tree = ET.parse(xml_path)
- worldbody = tree.find(".//worldbody")
-
- self.MAZE_HEIGHT = height = maze_height
- self.MAZE_SIZE_SCALING = size_scaling = maze_size_scaling
- self._n_bins = n_bins
- self._sensor_range = sensor_range * size_scaling
- self._sensor_span = sensor_span
- self._observe_blocks = observe_blocks
- self._put_spin_near_agent = put_spin_near_agent
- self._top_down_view = top_down_view
- self._manual_collision = manual_collision
-
- self.MAZE_STRUCTURE = structure = maze_env_utils.construct_maze(maze_id=self._maze_id)
- self.elevated = any(-1 in row for row in structure) # Elevate the maze to allow for falling.
- self.blocks = any(
- any(maze_env_utils.can_move(r) for r in row)
- for row in structure) # Are there any movable blocks?
-
- torso_x, torso_y = self._find_robot()
- self._init_torso_x = torso_x
- self._init_torso_y = torso_y
- self._init_positions = [
- (x - torso_x, y - torso_y)
- for x, y in self._find_all_robots()]
-
- self._xy_to_rowcol = lambda x, y: (2 + (y + size_scaling / 2) / size_scaling,
- 2 + (x + size_scaling / 2) / size_scaling)
- self._view = np.zeros([5, 5, 3]) # walls (immovable), chasms (fall), movable blocks
-
- height_offset = 0.
- if self.elevated:
- # Increase initial z-pos of ant.
- height_offset = height * size_scaling
- torso = tree.find(".//body[@name='torso']")
- torso.set('pos', '0 0 %.2f' % (0.75 + height_offset))
- if self.blocks:
- # If there are movable blocks, change simulation settings to perform
- # better contact detection.
- default = tree.find(".//default")
- default.find('.//geom').set('solimp', '.995 .995 .01')
-
- self.movable_blocks = []
- for i in range(len(structure)):
- for j in range(len(structure[0])):
- struct = structure[i][j]
- if struct == 'r' and self._put_spin_near_agent:
- struct = maze_env_utils.Move.SpinXY
- if self.elevated and struct not in [-1]:
- # Create elevated platform.
- ET.SubElement(
- worldbody, "geom",
- name="elevated_%d_%d" % (i, j),
- pos="%f %f %f" % (j * size_scaling - torso_x,
- i * size_scaling - torso_y,
- height / 2 * size_scaling),
- size="%f %f %f" % (0.5 * size_scaling,
- 0.5 * size_scaling,
- height / 2 * size_scaling),
- type="box",
- material="",
- contype="1",
- conaffinity="1",
- rgba="0.9 0.9 0.9 1",
- )
- if struct == 1: # Unmovable block.
- # Offset all coordinates so that robot starts at the origin.
- ET.SubElement(
- worldbody, "geom",
- name="block_%d_%d" % (i, j),
- pos="%f %f %f" % (j * size_scaling - torso_x,
- i * size_scaling - torso_y,
- height_offset +
- height / 2 * size_scaling),
- size="%f %f %f" % (0.5 * size_scaling,
- 0.5 * size_scaling,
- height / 2 * size_scaling),
- type="box",
- material="",
- contype="1",
- conaffinity="1",
- rgba="0.4 0.4 0.4 1",
- )
- elif maze_env_utils.can_move(struct): # Movable block.
- # The "falling" blocks are shrunk slightly and increased in mass to
- # ensure that it can fall easily through a gap in the platform blocks.
- name = "movable_%d_%d" % (i, j)
- self.movable_blocks.append((name, struct))
- falling = maze_env_utils.can_move_z(struct)
- spinning = maze_env_utils.can_spin(struct)
- x_offset = 0.25 * size_scaling if spinning else 0.0
- y_offset = 0.0
- shrink = 0.1 if spinning else 0.99 if falling else 1.0
- height_shrink = 0.1 if spinning else 1.0
- movable_body = ET.SubElement(
- worldbody, "body",
- name=name,
- pos="%f %f %f" % (j * size_scaling - torso_x + x_offset,
- i * size_scaling - torso_y + y_offset,
- height_offset +
- height / 2 * size_scaling * height_shrink),
- )
- ET.SubElement(
- movable_body, "geom",
- name="block_%d_%d" % (i, j),
- pos="0 0 0",
- size="%f %f %f" % (0.5 * size_scaling * shrink,
- 0.5 * size_scaling * shrink,
- height / 2 * size_scaling * height_shrink),
- type="box",
- material="",
- mass="0.001" if falling else "0.0002",
- contype="1",
- conaffinity="1",
- rgba="0.9 0.1 0.1 1"
- )
- if maze_env_utils.can_move_x(struct):
- ET.SubElement(
- movable_body, "joint",
- armature="0",
- axis="1 0 0",
- damping="0.0",
- limited="true" if falling else "false",
- range="%f %f" % (-size_scaling, size_scaling),
- margin="0.01",
- name="movable_x_%d_%d" % (i, j),
- pos="0 0 0",
- type="slide"
- )
- if maze_env_utils.can_move_y(struct):
- ET.SubElement(
- movable_body, "joint",
- armature="0",
- axis="0 1 0",
- damping="0.0",
- limited="true" if falling else "false",
- range="%f %f" % (-size_scaling, size_scaling),
- margin="0.01",
- name="movable_y_%d_%d" % (i, j),
- pos="0 0 0",
- type="slide"
- )
- if maze_env_utils.can_move_z(struct):
- ET.SubElement(
- movable_body, "joint",
- armature="0",
- axis="0 0 1",
- damping="0.0",
- limited="true",
- range="%f 0" % (-height_offset),
- margin="0.01",
- name="movable_z_%d_%d" % (i, j),
- pos="0 0 0",
- type="slide"
- )
- if maze_env_utils.can_spin(struct):
- ET.SubElement(
- movable_body, "joint",
- armature="0",
- axis="0 0 1",
- damping="0.0",
- limited="false",
- name="spinable_%d_%d" % (i, j),
- pos="0 0 0",
- type="ball"
- )
-
- torso = tree.find(".//body[@name='torso']")
- geoms = torso.findall(".//geom")
- for geom in geoms:
- if 'name' not in geom.attrib:
- raise Exception("Every geom of the torso must have a name "
- "defined")
-
- _, file_path = tempfile.mkstemp(text=True, suffix='.xml')
- tree.write(file_path)
-
- self.wrapped_env = model_cls(*args, file_path=file_path, **kwargs)
-
- def get_ori(self):
- return self.wrapped_env.get_ori()
-
- def get_top_down_view(self):
- self._view = np.zeros_like(self._view)
-
- def valid(row, col):
- return self._view.shape[0] > row >= 0 and self._view.shape[1] > col >= 0
-
- def update_view(x, y, d, row=None, col=None):
- if row is None or col is None:
- x = x - self._robot_x
- y = y - self._robot_y
- th = self._robot_ori
-
- row, col = self._xy_to_rowcol(x, y)
- update_view(x, y, d, row=row, col=col)
- return
-
- row, row_frac, col, col_frac = int(row), row % 1, int(col), col % 1
- if row_frac < 0:
- row_frac += 1
- if col_frac < 0:
- col_frac += 1
-
- if valid(row, col):
- self._view[row, col, d] += (
- (min(1., row_frac + 0.5) - max(0., row_frac - 0.5)) *
- (min(1., col_frac + 0.5) - max(0., col_frac - 0.5)))
- if valid(row - 1, col):
- self._view[row - 1, col, d] += (
- (max(0., 0.5 - row_frac)) *
- (min(1., col_frac + 0.5) - max(0., col_frac - 0.5)))
- if valid(row + 1, col):
- self._view[row + 1, col, d] += (
- (max(0., row_frac - 0.5)) *
- (min(1., col_frac + 0.5) - max(0., col_frac - 0.5)))
- if valid(row, col - 1):
- self._view[row, col - 1, d] += (
- (min(1., row_frac + 0.5) - max(0., row_frac - 0.5)) *
- (max(0., 0.5 - col_frac)))
- if valid(row, col + 1):
- self._view[row, col + 1, d] += (
- (min(1., row_frac + 0.5) - max(0., row_frac - 0.5)) *
- (max(0., col_frac - 0.5)))
- if valid(row - 1, col - 1):
- self._view[row - 1, col - 1, d] += (
- (max(0., 0.5 - row_frac)) * max(0., 0.5 - col_frac))
- if valid(row - 1, col + 1):
- self._view[row - 1, col + 1, d] += (
- (max(0., 0.5 - row_frac)) * max(0., col_frac - 0.5))
- if valid(row + 1, col + 1):
- self._view[row + 1, col + 1, d] += (
- (max(0., row_frac - 0.5)) * max(0., col_frac - 0.5))
- if valid(row + 1, col - 1):
- self._view[row + 1, col - 1, d] += (
- (max(0., row_frac - 0.5)) * max(0., 0.5 - col_frac))
-
- # Draw ant.
- robot_x, robot_y = self.wrapped_env.get_body_com("torso")[:2]
- self._robot_x = robot_x
- self._robot_y = robot_y
- self._robot_ori = self.get_ori()
-
- structure = self.MAZE_STRUCTURE
- size_scaling = self.MAZE_SIZE_SCALING
- height = self.MAZE_HEIGHT
-
- # Draw immovable blocks and chasms.
- for i in range(len(structure)):
- for j in range(len(structure[0])):
- if structure[i][j] == 1: # Wall.
- update_view(j * size_scaling - self._init_torso_x,
- i * size_scaling - self._init_torso_y,
- 0)
- if structure[i][j] == -1: # Chasm.
- update_view(j * size_scaling - self._init_torso_x,
- i * size_scaling - self._init_torso_y,
- 1)
-
- # Draw movable blocks.
- for block_name, block_type in self.movable_blocks:
- block_x, block_y = self.wrapped_env.get_body_com(block_name)[:2]
- update_view(block_x, block_y, 2)
-
- return self._view
-
- def get_range_sensor_obs(self):
- """Returns egocentric range sensor observations of maze."""
- robot_x, robot_y, robot_z = self.wrapped_env.get_body_com("torso")[:3]
- ori = self.get_ori()
-
- structure = self.MAZE_STRUCTURE
- size_scaling = self.MAZE_SIZE_SCALING
- height = self.MAZE_HEIGHT
-
- segments = []
- # Get line segments (corresponding to outer boundary) of each immovable
- # block or drop-off.
- for i in range(len(structure)):
- for j in range(len(structure[0])):
- if structure[i][j] in [1, -1]: # There's a wall or drop-off.
- cx = j * size_scaling - self._init_torso_x
- cy = i * size_scaling - self._init_torso_y
- x1 = cx - 0.5 * size_scaling
- x2 = cx + 0.5 * size_scaling
- y1 = cy - 0.5 * size_scaling
- y2 = cy + 0.5 * size_scaling
- struct_segments = [
- ((x1, y1), (x2, y1)),
- ((x2, y1), (x2, y2)),
- ((x2, y2), (x1, y2)),
- ((x1, y2), (x1, y1)),
- ]
- for seg in struct_segments:
- segments.append(dict(
- segment=seg,
- type=structure[i][j],
- ))
- # Get line segments (corresponding to outer boundary) of each movable
- # block within the agent's z-view.
- for block_name, block_type in self.movable_blocks:
- block_x, block_y, block_z = self.wrapped_env.get_body_com(block_name)[:3]
- if (block_z + height * size_scaling / 2 >= robot_z and
- robot_z >= block_z - height * size_scaling / 2): # Block in view.
- x1 = block_x - 0.5 * size_scaling
- x2 = block_x + 0.5 * size_scaling
- y1 = block_y - 0.5 * size_scaling
- y2 = block_y + 0.5 * size_scaling
- struct_segments = [
- ((x1, y1), (x2, y1)),
- ((x2, y1), (x2, y2)),
- ((x2, y2), (x1, y2)),
- ((x1, y2), (x1, y1)),
- ]
- for seg in struct_segments:
- segments.append(dict(
- segment=seg,
- type=block_type,
- ))
-
- sensor_readings = np.zeros((self._n_bins, 3)) # 3 for wall, drop-off, block
- for ray_idx in range(self._n_bins):
- ray_ori = (ori - self._sensor_span * 0.5 +
- (2 * ray_idx + 1.0) / (2 * self._n_bins) * self._sensor_span)
- ray_segments = []
- # Get all segments that intersect with ray.
- for seg in segments:
- p = maze_env_utils.ray_segment_intersect(
- ray=((robot_x, robot_y), ray_ori),
- segment=seg["segment"])
- if p is not None:
- ray_segments.append(dict(
- segment=seg["segment"],
- type=seg["type"],
- ray_ori=ray_ori,
- distance=maze_env_utils.point_distance(p, (robot_x, robot_y)),
- ))
- if len(ray_segments) > 0:
- # Find out which segment is intersected first.
- first_seg = sorted(ray_segments, key=lambda x: x["distance"])[0]
- seg_type = first_seg["type"]
- idx = (0 if seg_type == 1 else # Wall.
- 1 if seg_type == -1 else # Drop-off.
- 2 if maze_env_utils.can_move(seg_type) else # Block.
- None)
- if first_seg["distance"] <= self._sensor_range:
- sensor_readings[ray_idx][idx] = (self._sensor_range - first_seg["distance"]) / self._sensor_range
-
- return sensor_readings
-
- def _get_obs(self):
- wrapped_obs = self.wrapped_env._get_obs()
- if self._top_down_view:
- view = [self.get_top_down_view().flat]
- else:
- view = []
-
- if self._observe_blocks:
- additional_obs = []
- for block_name, block_type in self.movable_blocks:
- additional_obs.append(self.wrapped_env.get_body_com(block_name))
- wrapped_obs = np.concatenate([wrapped_obs[:3]] + additional_obs +
- [wrapped_obs[3:]])
-
- range_sensor_obs = self.get_range_sensor_obs()
- return np.concatenate([wrapped_obs,
- range_sensor_obs.flat] +
- view + [[self.t * 0.001]])
-
- def reset(self):
- self.t = 0
- self.trajectory = []
- self.wrapped_env.reset()
- if len(self._init_positions) > 1:
- xy = random.choice(self._init_positions)
- self.wrapped_env.set_xy(xy)
- return self._get_obs()
-
- @property
- def viewer(self):
- return self.wrapped_env.viewer
-
- def render(self, *args, **kwargs):
- return self.wrapped_env.render(*args, **kwargs)
-
- @property
- def observation_space(self):
- shape = self._get_obs().shape
- high = np.inf * np.ones(shape)
- low = -high
- return gym.spaces.Box(low, high)
-
- @property
- def action_space(self):
- return self.wrapped_env.action_space
-
- def _find_robot(self):
- structure = self.MAZE_STRUCTURE
- size_scaling = self.MAZE_SIZE_SCALING
- for i in range(len(structure)):
- for j in range(len(structure[0])):
- if structure[i][j] == 'r':
- return j * size_scaling, i * size_scaling
- assert False, 'No robot in maze specification.'
-
- def _find_all_robots(self):
- structure = self.MAZE_STRUCTURE
- size_scaling = self.MAZE_SIZE_SCALING
- coords = []
- for i in range(len(structure)):
- for j in range(len(structure[0])):
- if structure[i][j] == 'r':
- coords.append((j * size_scaling, i * size_scaling))
- return coords
-
- def _is_in_collision(self, pos):
- x, y = pos
- structure = self.MAZE_STRUCTURE
- size_scaling = self.MAZE_SIZE_SCALING
- for i in range(len(structure)):
- for j in range(len(structure[0])):
- if structure[i][j] == 1:
- minx = j * size_scaling - size_scaling * 0.5 - self._init_torso_x
- maxx = j * size_scaling + size_scaling * 0.5 - self._init_torso_x
- miny = i * size_scaling - size_scaling * 0.5 - self._init_torso_y
- maxy = i * size_scaling + size_scaling * 0.5 - self._init_torso_y
- if minx <= x <= maxx and miny <= y <= maxy:
- return True
- return False
-
- def step(self, action):
- self.t += 1
- if self._manual_collision:
- old_pos = self.wrapped_env.get_xy()
- inner_next_obs, inner_reward, done, info = self.wrapped_env.step(action)
- new_pos = self.wrapped_env.get_xy()
- if self._is_in_collision(new_pos):
- self.wrapped_env.set_xy(old_pos)
- else:
- inner_next_obs, inner_reward, done, info = self.wrapped_env.step(action)
- next_obs = self._get_obs()
- done = False
- return next_obs, inner_reward, done, info
diff --git a/research/efficient-hrl/environments/maze_env_utils.py b/research/efficient-hrl/environments/maze_env_utils.py
deleted file mode 100644
index 4f52509b65a..00000000000
--- a/research/efficient-hrl/environments/maze_env_utils.py
+++ /dev/null
@@ -1,164 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Adapted from rllab maze_env_utils.py."""
-import numpy as np
-import math
-
-
-class Move(object):
- X = 11
- Y = 12
- Z = 13
- XY = 14
- XZ = 15
- YZ = 16
- XYZ = 17
- SpinXY = 18
-
-
-def can_move_x(movable):
- return movable in [Move.X, Move.XY, Move.XZ, Move.XYZ,
- Move.SpinXY]
-
-
-def can_move_y(movable):
- return movable in [Move.Y, Move.XY, Move.YZ, Move.XYZ,
- Move.SpinXY]
-
-
-def can_move_z(movable):
- return movable in [Move.Z, Move.XZ, Move.YZ, Move.XYZ]
-
-
-def can_spin(movable):
- return movable in [Move.SpinXY]
-
-
-def can_move(movable):
- return can_move_x(movable) or can_move_y(movable) or can_move_z(movable)
-
-
-def construct_maze(maze_id='Maze'):
- if maze_id == 'Maze':
- structure = [
- [1, 1, 1, 1, 1],
- [1, 'r', 0, 0, 1],
- [1, 1, 1, 0, 1],
- [1, 0, 0, 0, 1],
- [1, 1, 1, 1, 1],
- ]
- elif maze_id == 'Push':
- structure = [
- [1, 1, 1, 1, 1],
- [1, 0, 'r', 1, 1],
- [1, 0, Move.XY, 0, 1],
- [1, 1, 0, 1, 1],
- [1, 1, 1, 1, 1],
- ]
- elif maze_id == 'Fall':
- structure = [
- [1, 1, 1, 1],
- [1, 'r', 0, 1],
- [1, 0, Move.YZ, 1],
- [1, -1, -1, 1],
- [1, 0, 0, 1],
- [1, 1, 1, 1],
- ]
- elif maze_id == 'Block':
- O = 'r'
- structure = [
- [1, 1, 1, 1, 1],
- [1, O, 0, 0, 1],
- [1, 0, 0, 0, 1],
- [1, 0, 0, 0, 1],
- [1, 1, 1, 1, 1],
- ]
- elif maze_id == 'BlockMaze':
- O = 'r'
- structure = [
- [1, 1, 1, 1],
- [1, O, 0, 1],
- [1, 1, 0, 1],
- [1, 0, 0, 1],
- [1, 1, 1, 1],
- ]
- else:
- raise NotImplementedError('The provided MazeId %s is not recognized' % maze_id)
-
- return structure
-
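-# Cell legend for the structures above (as consumed by maze_env.py): 1 = wall,
-# 0 = free space, 'r' = robot start, -1 = chasm/drop-off, and Move.* entries
-# mark movable blocks (e.g. Move.XY in 'Push', Move.YZ in 'Fall').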
-
-def line_intersect(pt1, pt2, ptA, ptB):
- """
- Taken from https://www.cs.hmc.edu/ACM/lectures/intersections.html
-
- This returns the intersection of Line(pt1, pt2) and Line(ptA, ptB) as a tuple
- (xi, yi, valid, r, s); valid is 0 when the lines are (nearly) parallel.
- """
-
- DET_TOLERANCE = 0.00000001
-
- # the first line is pt1 + r*(pt2-pt1)
- # in component form:
- x1, y1 = pt1
- x2, y2 = pt2
- dx1 = x2 - x1
- dy1 = y2 - y1
-
- # the second line is ptA + s*(ptB-ptA)
- x, y = ptA
- xB, yB = ptB
- dx = xB - x
- dy = yB - y
-
- DET = (-dx1 * dy + dy1 * dx)
-
- if math.fabs(DET) < DET_TOLERANCE: return (0, 0, 0, 0, 0)
-
- # now, the determinant should be OK
- DETinv = 1.0 / DET
-
- # find the scalar amount along the "self" segment
- r = DETinv * (-dy * (x - x1) + dx * (y - y1))
-
- # find the scalar amount along the input line
- s = DETinv * (-dy1 * (x - x1) + dx1 * (y - y1))
-
- # return the average of the two descriptions
- xi = (x1 + r * dx1 + x + s * dx) / 2.0
- yi = (y1 + r * dy1 + y + s * dy) / 2.0
- return (xi, yi, 1, r, s)
-
-
-def ray_segment_intersect(ray, segment):
- """
- Check whether the ray originating from (x, y) with direction theta intersects
- the line segment (x1, y1) -- (x2, y2), and return the intersection point if
- there is one (None otherwise).
- """
- (x, y), theta = ray
- # (x1, y1), (x2, y2) = segment
- pt1 = (x, y)
- ray_len = 1
- pt2 = (x + ray_len * math.cos(theta), y + ray_len * math.sin(theta))
- xo, yo, valid, r, s = line_intersect(pt1, pt2, *segment)
- if valid and r >= 0 and 0 <= s <= 1:
- return (xo, yo)
- return None
-
-
-def point_distance(p1, p2):
- x1, y1 = p1
- x2, y2 = p2
- return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
diff --git a/research/efficient-hrl/environments/point.py b/research/efficient-hrl/environments/point.py
deleted file mode 100644
index 9c2fc80bc82..00000000000
--- a/research/efficient-hrl/environments/point.py
+++ /dev/null
@@ -1,97 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Wrapper for creating the ant environment in gym_mujoco."""
-
-import math
-import numpy as np
-import mujoco_py
-from gym import utils
-from gym.envs.mujoco import mujoco_env
-
-
-class PointEnv(mujoco_env.MujocoEnv, utils.EzPickle):
- FILE = "point.xml"
- ORI_IND = 2
-
- def __init__(self, file_path=None, expose_all_qpos=True):
- self._expose_all_qpos = expose_all_qpos
-
- mujoco_env.MujocoEnv.__init__(self, file_path, 1)
- utils.EzPickle.__init__(self)
-
- @property
- def physics(self):
- # If the mujoco-py version is at least 1.50, return self.sim, whose PyMjData
- # object is used for getting and setting positions and velocities.
- # See https://github.com/openai/mujoco-py/issues/80 for updates to the API.
- if mujoco_py.get_version() >= '1.50':
- return self.sim
- else:
- return self.model
-
- def _step(self, a):
- return self.step(a)
-
- def step(self, action):
- action[0] = 0.2 * action[0]
- qpos = np.copy(self.physics.data.qpos)
- qpos[2] += action[1]
- ori = qpos[2]
- # compute increment in each direction
- dx = math.cos(ori) * action[0]
- dy = math.sin(ori) * action[0]
- # ensure that the robot is within reasonable range
- qpos[0] = np.clip(qpos[0] + dx, -100, 100)
- qpos[1] = np.clip(qpos[1] + dy, -100, 100)
- qvel = self.physics.data.qvel
- self.set_state(qpos, qvel)
- for _ in range(0, self.frame_skip):
- self.physics.step()
- next_obs = self._get_obs()
- reward = 0
- done = False
- info = {}
- return next_obs, reward, done, info
-
- def _get_obs(self):
- if self._expose_all_qpos:
- return np.concatenate([
- self.physics.data.qpos.flat[:3], # Only point-relevant coords.
- self.physics.data.qvel.flat[:3]])
- return np.concatenate([
- self.physics.data.qpos.flat[2:3],
- self.physics.data.qvel.flat[:3]])
-
- def reset_model(self):
- qpos = self.init_qpos + self.np_random.uniform(
- size=self.physics.model.nq, low=-.1, high=.1)
- qvel = self.init_qvel + self.np_random.randn(self.physics.model.nv) * .1
-
- # Set everything other than point to original position and 0 velocity.
- qpos[3:] = self.init_qpos[3:]
- qvel[3:] = 0.
- self.set_state(qpos, qvel)
- return self._get_obs()
-
- def get_ori(self):
- return self.physics.data.qpos[self.__class__.ORI_IND]
-
- def set_xy(self, xy):
- qpos = np.copy(self.physics.data.qpos)
- qpos[0] = xy[0]
- qpos[1] = xy[1]
-
- qvel = self.physics.data.qvel
- self.set_state(qpos, qvel)
diff --git a/research/efficient-hrl/environments/point_maze_env.py b/research/efficient-hrl/environments/point_maze_env.py
deleted file mode 100644
index 8d6b8194863..00000000000
--- a/research/efficient-hrl/environments/point_maze_env.py
+++ /dev/null
@@ -1,21 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-from environments.maze_env import MazeEnv
-from environments.point import PointEnv
-
-
-class PointMazeEnv(MazeEnv):
- MODEL_CLASS = PointEnv
diff --git a/research/efficient-hrl/eval.py b/research/efficient-hrl/eval.py
deleted file mode 100644
index 4f5a4b20a53..00000000000
--- a/research/efficient-hrl/eval.py
+++ /dev/null
@@ -1,460 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-r"""Script for evaluating a UVF agent.
-
-To run locally: See run_eval.py
-
-To run on borg: See train_eval.borg
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-import tensorflow as tf
-slim = tf.contrib.slim
-import gin.tf
-# pylint: disable=unused-import
-import agent
-import train
-from utils import utils as uvf_utils
-from utils import eval_utils
-from environments import create_maze_env
-# pylint: enable=unused-import
-
-flags = tf.app.flags
-
-flags.DEFINE_string('eval_dir', None,
- 'Directory for writing logs/summaries during eval.')
-flags.DEFINE_string('checkpoint_dir', None,
- 'Directory containing checkpoints to eval.')
-FLAGS = flags.FLAGS
-
-
-def get_evaluate_checkpoint_fn(master, output_dir, eval_step_fns,
- model_rollout_fn, gamma, max_steps_per_episode,
- num_episodes_eval, num_episodes_videos,
- tuner_hook, generate_videos,
- generate_summaries, video_settings):
- """Returns a function that evaluates a given checkpoint.
-
- Args:
- master: BNS name of the TensorFlow master
- output_dir: The output directory to which the metric summaries are written.
- eval_step_fns: A dictionary of a functions that return a list of
- [state, action, reward, discount, transition_type] tensors,
- indexed by summary tag name.
- model_rollout_fn: Model rollout fn.
- gamma: Discount factor for the reward.
- max_steps_per_episode: Maximum steps to run each episode for.
- num_episodes_eval: Number of episodes to evaluate and average reward over.
- num_episodes_videos: Number of episodes to record for video.
- tuner_hook: A callable(average reward, global step) that updates a Vizier
- tuner trial.
- generate_videos: Whether to generate videos of the agent in action.
- generate_summaries: Whether to generate summaries.
- video_settings: Settings for generating videos of the agent.
-
- Returns:
- A function that evaluates a checkpoint.
- """
- sess = tf.Session(master, graph=tf.get_default_graph())
- sess.run(tf.global_variables_initializer())
- sess.run(tf.local_variables_initializer())
- summary_writer = tf.summary.FileWriter(output_dir)
-
- def evaluate_checkpoint(checkpoint_path):
- """Performs a one-time evaluation of the given checkpoint.
-
- Args:
- checkpoint_path: Checkpoint to evaluate.
- Returns:
- True if the evaluation process should stop
- """
- restore_fn = tf.contrib.framework.assign_from_checkpoint_fn(
- checkpoint_path,
- uvf_utils.get_all_vars(),
- ignore_missing_vars=True,
- reshape_variables=False)
- assert restore_fn is not None, 'cannot restore %s' % checkpoint_path
- restore_fn(sess)
- global_step = sess.run(slim.get_global_step())
- should_stop = False
- max_reward = -1e10
- max_meta_reward = -1e10
-
- for eval_tag, (eval_step, env_base,) in sorted(eval_step_fns.items()):
- if hasattr(env_base, 'set_sess'):
- env_base.set_sess(sess) # set session
-
- if generate_summaries:
- tf.logging.info(
- '[%s] Computing average reward over %d episodes at global step %d.',
- eval_tag, num_episodes_eval, global_step)
- (average_reward, last_reward,
- average_meta_reward, last_meta_reward, average_success,
- states, actions) = eval_utils.compute_average_reward(
- sess, env_base, eval_step, gamma, max_steps_per_episode,
- num_episodes_eval)
- tf.logging.info('[%s] Average reward = %f', eval_tag, average_reward)
- tf.logging.info('[%s] Last reward = %f', eval_tag, last_reward)
- tf.logging.info('[%s] Average meta reward = %f', eval_tag, average_meta_reward)
- tf.logging.info('[%s] Last meta reward = %f', eval_tag, last_meta_reward)
- tf.logging.info('[%s] Average success = %f', eval_tag, average_success)
- if model_rollout_fn is not None:
- preds, model_losses = eval_utils.compute_model_loss(
- sess, model_rollout_fn, states, actions)
- for i, (pred, state, model_loss) in enumerate(
- zip(preds, states, model_losses)):
- tf.logging.info('[%s] Model rollout step %d: loss=%f', eval_tag, i,
- model_loss)
- tf.logging.info('[%s] Model rollout step %d: pred=%s', eval_tag, i,
- str(pred.tolist()))
- tf.logging.info('[%s] Model rollout step %d: state=%s', eval_tag, i,
- str(state.tolist()))
-
- # Report the eval stats to the tuner.
- if average_reward > max_reward:
- max_reward = average_reward
- if average_meta_reward > max_meta_reward:
- max_meta_reward = average_meta_reward
-
- for (tag, value) in [('Reward/average_%s_reward', average_reward),
- ('Reward/last_%s_reward', last_reward),
- ('Reward/average_%s_meta_reward', average_meta_reward),
- ('Reward/last_%s_meta_reward', last_meta_reward),
- ('Reward/average_%s_success', average_success)]:
- summary_str = tf.Summary(value=[
- tf.Summary.Value(
- tag=tag % eval_tag,
- simple_value=value)
- ])
- summary_writer.add_summary(summary_str, global_step)
- summary_writer.flush()
-
- if generate_videos or should_stop:
- # Do a manual reset before generating the video to see the initial
- # pose of the robot, towards which the reset controller is moving.
- if hasattr(env_base, '_gym_env'):
- tf.logging.info('Resetting before recording video')
- if hasattr(env_base._gym_env, 'reset_model'):
- env_base._gym_env.reset_model() # pylint: disable=protected-access
- else:
- env_base._gym_env.wrapped_env.reset_model()
- video_filename = os.path.join(output_dir, 'videos',
- '%s_step_%d.mp4' % (eval_tag,
- global_step))
- eval_utils.capture_video(sess, eval_step, env_base,
- max_steps_per_episode * num_episodes_videos,
- video_filename, video_settings,
- reset_every=max_steps_per_episode)
-
- should_stop = should_stop or (generate_summaries and tuner_hook and
- tuner_hook(max_reward, global_step))
- return bool(should_stop)
-
- return evaluate_checkpoint
-
-
-def get_model_rollout(uvf_agent, tf_env):
- """Model rollout function."""
- state_spec = tf_env.observation_spec()[0]
- action_spec = tf_env.action_spec()[0]
- state_ph = tf.placeholder(dtype=state_spec.dtype, shape=state_spec.shape)
- action_ph = tf.placeholder(dtype=action_spec.dtype, shape=action_spec.shape)
-
- merged_state = uvf_agent.merged_state(state_ph)
- diff_value = uvf_agent.critic_net(tf.expand_dims(merged_state, 0),
- tf.expand_dims(action_ph, 0))[0]
- diff_value = tf.cast(diff_value, dtype=state_ph.dtype)
- state_ph.shape.assert_is_compatible_with(diff_value.shape)
- next_state = state_ph + diff_value
-
- def model_rollout_fn(sess, state, action):
- return sess.run(next_state, feed_dict={state_ph: state, action_ph: action})
-
- return model_rollout_fn
-
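-# Note (comment only): the rollout above treats the critic output as a
-# predicted state delta, so the one-step model prediction is simply
-# state + critic(merged_state, action), evaluated with sess.run for a given
-# (state, action) pair.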
-
-def get_eval_step(uvf_agent,
- state_preprocess,
- tf_env,
- action_fn,
- meta_action_fn,
- environment_steps,
- num_episodes,
- mode='eval'):
- """Get one-step policy/env stepping ops.
-
- Args:
- uvf_agent: A UVF agent.
- state_preprocess: A state preprocess object mapping raw observations to a
- learned state representation.
- tf_env: A TFEnvironment.
- action_fn: A function to produce actions given current state.
- meta_action_fn: A function to produce meta actions given current state.
- environment_steps: A variable to count the number of steps in the tf_env.
- num_episodes: A variable to count the number of episodes.
- mode: a string representing the mode=[train, explore, eval].
-
- Returns:
- A step_fn(sess) that executes one policy/environment step and returns the
- evaluation tensors for that step.
- """
-
- tf_env.start_collect()
- state = tf_env.current_obs()
- action = action_fn(state, context=None)
- state_repr = state_preprocess(state)
-
- action_spec = tf_env.action_spec()
- action_ph = tf.placeholder(dtype=action_spec.dtype, shape=action_spec.shape)
- with tf.control_dependencies([state]):
- transition_type, reward, discount = tf_env.step(action_ph)
-
- def increment_step():
- return environment_steps.assign_add(1)
-
- def increment_episode():
- return num_episodes.assign_add(1)
-
- def no_op_int():
- return tf.constant(0, dtype=tf.int64)
-
- step_cond = uvf_agent.step_cond_fn(state, action,
- transition_type,
- environment_steps, num_episodes)
- reset_episode_cond = uvf_agent.reset_episode_cond_fn(
- state, action,
- transition_type, environment_steps, num_episodes)
- reset_env_cond = uvf_agent.reset_env_cond_fn(state, action,
- transition_type,
- environment_steps, num_episodes)
-
- increment_step_op = tf.cond(step_cond, increment_step, no_op_int)
- with tf.control_dependencies([increment_step_op]):
- increment_episode_op = tf.cond(reset_episode_cond, increment_episode,
- no_op_int)
-
- with tf.control_dependencies([reward, discount]):
- next_state = tf_env.current_obs()
- next_state_repr = state_preprocess(next_state)
-
- with tf.control_dependencies([increment_episode_op]):
- post_reward, post_meta_reward = uvf_agent.cond_begin_episode_op(
- tf.logical_not(reset_episode_cond),
- [state, action_ph, reward, next_state,
- state_repr, next_state_repr],
- mode=mode, meta_action_fn=meta_action_fn)
-
- # Important: do manual reset after getting the final reward from the
- # unreset environment.
- with tf.control_dependencies([post_reward, post_meta_reward]):
- cond_reset_op = tf.cond(reset_env_cond,
- tf_env.reset,
- tf_env.current_time_step)
-
- # Add a dummy control dependency to force the reset_op to run
- with tf.control_dependencies(cond_reset_op):
- post_reward, post_meta_reward = map(tf.identity, [post_reward, post_meta_reward])
-
- eval_step = [next_state, action_ph, transition_type, post_reward, post_meta_reward, discount, uvf_agent.context_vars, state_repr]
-
- if callable(action):
- def step_fn(sess):
- action_value = action(sess)
- return sess.run(eval_step, feed_dict={action_ph: action_value})
- else:
- action = uvf_utils.clip_to_spec(action, action_spec)
- def step_fn(sess):
- action_value = sess.run(action)
- return sess.run(eval_step, feed_dict={action_ph: action_value})
-
- return step_fn
-
-
-@gin.configurable
-def evaluate(checkpoint_dir,
- eval_dir,
- environment=None,
- num_bin_actions=3,
- agent_class=None,
- meta_agent_class=None,
- state_preprocess_class=None,
- gamma=1.0,
- num_episodes_eval=10,
- eval_interval_secs=60,
- max_number_of_evaluations=None,
- checkpoint_timeout=None,
- timeout_fn=None,
- tuner_hook=None,
- generate_videos=False,
- generate_summaries=True,
- num_episodes_videos=5,
- video_settings=None,
- eval_modes=('eval',),
- eval_model_rollout=False,
- policy_save_dir='policy',
- checkpoint_range=None,
- checkpoint_path=None,
- max_steps_per_episode=None,
- evaluate_nohrl=False):
- """Loads and repeatedly evaluates a checkpointed model at a set interval.
-
- Args:
- checkpoint_dir: The directory where the checkpoints reside.
- eval_dir: Directory to save the evaluation summary results.
- environment: A BaseEnvironment to evaluate.
- num_bin_actions: Number of bins for discretizing continuous actions.
- agent_class: An RL agent class.
- meta_agent_class: A Meta agent class.
- state_preprocess_class: A state preprocess class.
- gamma: Discount factor for the reward.
- num_episodes_eval: Number of episodes to evaluate and average reward over.
- eval_interval_secs: The number of seconds between each evaluation run.
- max_number_of_evaluations: The max number of evaluations. If None the
- evaluation continues indefinitely.
- checkpoint_timeout: The maximum amount of time to wait between checkpoints.
- If left as `None`, then the process will wait indefinitely.
- timeout_fn: Optional function to call after a timeout.
- tuner_hook: A callable that takes the average reward and global step and
- updates a Vizier tuner trial.
- generate_videos: Whether to generate videos of the agent in action.
- generate_summaries: Whether to generate summaries.
- num_episodes_videos: Number of episodes to evaluate for generating videos.
- video_settings: Settings for generating videos of the agent.
- eval_modes: A tuple of eval modes.
- eval_model_rollout: Evaluate model rollout.
- policy_save_dir: Optional sub-directory where the policies are
- saved.
- checkpoint_range: Optional. If provided, evaluate all checkpoints in
- the range.
- checkpoint_path: Optional sub-directory specifying which checkpoint to
- evaluate. If None, will evaluate the most recent checkpoint.
- max_steps_per_episode: Maximum number of steps to run each episode for.
- evaluate_nohrl: Whether to also evaluate the agent without its meta agent
- (reported under the 'nohrl' summary tags).
- """
- tf_env = create_maze_env.TFPyEnvironment(environment)
- observation_spec = [tf_env.observation_spec()]
- action_spec = [tf_env.action_spec()]
-
- assert max_steps_per_episode, 'max_steps_per_episode needs to be set'
-
- if agent_class.ACTION_TYPE == 'discrete':
- assert False
- else:
- assert agent_class.ACTION_TYPE == 'continuous'
-
- if meta_agent_class is not None:
- assert agent_class.ACTION_TYPE == meta_agent_class.ACTION_TYPE
- with tf.variable_scope('meta_agent'):
- meta_agent = meta_agent_class(
- observation_spec,
- action_spec,
- tf_env,
- )
- else:
- meta_agent = None
-
- with tf.variable_scope('uvf_agent'):
- uvf_agent = agent_class(
- observation_spec,
- action_spec,
- tf_env,
- )
- uvf_agent.set_meta_agent(agent=meta_agent)
-
- with tf.variable_scope('state_preprocess'):
- state_preprocess = state_preprocess_class()
-
- # run both actor and critic once to ensure networks are initialized
- # and gin configs will be saved
- # pylint: disable=protected-access
- temp_states = tf.expand_dims(
- tf.zeros(
- dtype=uvf_agent._observation_spec.dtype,
- shape=uvf_agent._observation_spec.shape), 0)
- # pylint: enable=protected-access
- temp_actions = uvf_agent.actor_net(temp_states)
- uvf_agent.critic_net(temp_states, temp_actions)
-
- # create eval_step_fns for each action function
- eval_step_fns = dict()
- meta_agent = uvf_agent.meta_agent
- for meta in [True] + [False] * evaluate_nohrl:
- meta_tag = 'hrl' if meta else 'nohrl'
- uvf_agent.set_meta_agent(meta_agent if meta else None)
- for mode in eval_modes:
- # wrap environment
- wrapped_environment = uvf_agent.get_env_base_wrapper(
- environment, mode=mode)
- action_wrapper = lambda agent_: agent_.action
- action_fn = action_wrapper(uvf_agent)
- meta_action_fn = action_wrapper(meta_agent)
- eval_step_fns['%s_%s' % (mode, meta_tag)] = (get_eval_step(
- uvf_agent=uvf_agent,
- state_preprocess=state_preprocess,
- tf_env=tf_env,
- action_fn=action_fn,
- meta_action_fn=meta_action_fn,
- environment_steps=tf.Variable(
- 0, dtype=tf.int64, name='environment_steps'),
- num_episodes=tf.Variable(0, dtype=tf.int64, name='num_episodes'),
- mode=mode), wrapped_environment,)
-
- model_rollout_fn = None
- if eval_model_rollout:
- model_rollout_fn = get_model_rollout(uvf_agent, tf_env)
-
- tf.train.get_or_create_global_step()
-
- if policy_save_dir:
- checkpoint_dir = os.path.join(checkpoint_dir, policy_save_dir)
-
- tf.logging.info('Evaluating policies at %s', checkpoint_dir)
- tf.logging.info('Running episodes for max %d steps', max_steps_per_episode)
-
- evaluate_checkpoint_fn = get_evaluate_checkpoint_fn(
- '', eval_dir, eval_step_fns, model_rollout_fn, gamma,
- max_steps_per_episode, num_episodes_eval, num_episodes_videos, tuner_hook,
- generate_videos, generate_summaries, video_settings)
-
- if checkpoint_path is not None:
- checkpoint_path = os.path.join(checkpoint_dir, checkpoint_path)
- evaluate_checkpoint_fn(checkpoint_path)
- elif checkpoint_range is not None:
- model_files = tf.gfile.Glob(
- os.path.join(checkpoint_dir, 'model.ckpt-*.index'))
- tf.logging.info('Found %s policies at %s', len(model_files), checkpoint_dir)
- model_files = {
- int(f.split('model.ckpt-', 1)[1].split('.', 1)[0]):
- os.path.splitext(f)[0]
- for f in model_files
- }
- model_files = {
- k: v
- for k, v in model_files.items()
- if k >= checkpoint_range[0] and k <= checkpoint_range[1]
- }
- tf.logging.info('Evaluating %d policies at %s',
- len(model_files), checkpoint_dir)
- for _, checkpoint_path in sorted(model_files.items()):
- evaluate_checkpoint_fn(checkpoint_path)
- else:
- eval_utils.evaluate_checkpoint_repeatedly(
- checkpoint_dir,
- evaluate_checkpoint_fn,
- eval_interval_secs=eval_interval_secs,
- max_number_of_evaluations=max_number_of_evaluations,
- checkpoint_timeout=checkpoint_timeout,
- timeout_fn=timeout_fn)
diff --git a/research/efficient-hrl/run_env.py b/research/efficient-hrl/run_env.py
deleted file mode 100644
index 87fad542aea..00000000000
--- a/research/efficient-hrl/run_env.py
+++ /dev/null
@@ -1,129 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Random policy on an environment."""
-
-import tensorflow as tf
-import numpy as np
-import random
-
-from environments import create_maze_env
-
-app = tf.app
-flags = tf.flags
-logging = tf.logging
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_string('env', 'AntMaze', 'environment name: AntMaze, AntPush, or AntFall')
-flags.DEFINE_integer('episode_length', 500, 'episode length')
-flags.DEFINE_integer('num_episodes', 50, 'number of episodes')
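-# Example invocation (assuming MuJoCo and the maze environments are installed):
-#   python run_env.py --env=AntMaze --episode_length=500 --num_episodes=50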
-
-
-def get_goal_sample_fn(env_name):
- if env_name == 'AntMaze':
-    # NOTE: When evaluating (i.e., for the metrics shown in the paper),
-    # we use the commented-out goal sampling function. The uncommented
-    # one is only used for training.
- #return lambda: np.array([0., 16.])
- return lambda: np.random.uniform((-4, -4), (20, 20))
- elif env_name == 'AntPush':
- return lambda: np.array([0., 19.])
- elif env_name == 'AntFall':
- return lambda: np.array([0., 27., 4.5])
- else:
- assert False, 'Unknown env'
-
-
-def get_reward_fn(env_name):
- if env_name == 'AntMaze':
- return lambda obs, goal: -np.sum(np.square(obs[:2] - goal)) ** 0.5
- elif env_name == 'AntPush':
- return lambda obs, goal: -np.sum(np.square(obs[:2] - goal)) ** 0.5
- elif env_name == 'AntFall':
- return lambda obs, goal: -np.sum(np.square(obs[:3] - goal)) ** 0.5
- else:
- assert False, 'Unknown env'
-
-
-def success_fn(last_reward):
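-  # The reward functions above return the negative Euclidean distance to the
-  # goal, so an episode counts as a success if the agent ends within 5 units
-  # of the goal on its final step.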
- return last_reward > -5.0
-
-
-class EnvWithGoal(object):
-
- def __init__(self, base_env, env_name):
- self.base_env = base_env
- self.goal_sample_fn = get_goal_sample_fn(env_name)
- self.reward_fn = get_reward_fn(env_name)
- self.goal = None
-
- def reset(self):
- obs = self.base_env.reset()
- self.goal = self.goal_sample_fn()
- return np.concatenate([obs, self.goal])
-
- def step(self, a):
- obs, _, done, info = self.base_env.step(a)
- reward = self.reward_fn(obs, self.goal)
- return np.concatenate([obs, self.goal]), reward, done, info
-
- @property
- def action_space(self):
- return self.base_env.action_space
-
-
-def run_environment(env_name, episode_length, num_episodes):
- env = EnvWithGoal(
- create_maze_env.create_maze_env(env_name).gym,
- env_name)
-
- def action_fn(obs):
- action_space = env.action_space
- action_space_mean = (action_space.low + action_space.high) / 2.0
- action_space_magn = (action_space.high - action_space.low) / 2.0
- random_action = (action_space_mean +
- action_space_magn *
- np.random.uniform(low=-1.0, high=1.0,
- size=action_space.shape))
- return random_action
-
- rewards = []
- successes = []
- for ep in range(num_episodes):
- rewards.append(0.0)
- successes.append(False)
- obs = env.reset()
- for _ in range(episode_length):
- obs, reward, done, _ = env.step(action_fn(obs))
- rewards[-1] += reward
- successes[-1] = success_fn(reward)
- if done:
- break
- logging.info('Episode %d reward: %.2f, Success: %d', ep + 1, rewards[-1], successes[-1])
-
- logging.info('Average Reward over %d episodes: %.2f',
- num_episodes, np.mean(rewards))
- logging.info('Average Success over %d episodes: %.2f',
- num_episodes, np.mean(successes))
-
-
-def main(unused_argv):
- logging.set_verbosity(logging.INFO)
- run_environment(FLAGS.env, FLAGS.episode_length, FLAGS.num_episodes)
-
-
-if __name__ == '__main__':
- app.run()
diff --git a/research/efficient-hrl/run_eval.py b/research/efficient-hrl/run_eval.py
deleted file mode 100644
index 12f12369c4c..00000000000
--- a/research/efficient-hrl/run_eval.py
+++ /dev/null
@@ -1,51 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-r"""Script for evaluating a UVF agent.
-
-To run locally: See scripts/local_eval.py
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-
-import gin.tf
-# pylint: disable=unused-import
-import eval as eval_
-# pylint: enable=unused-import
-
-flags = tf.app.flags
-FLAGS = flags.FLAGS
-
-
-def main(_):
- tf.logging.set_verbosity(tf.logging.INFO)
- assert FLAGS.checkpoint_dir, "Flag 'checkpoint_dir' must be set."
- assert FLAGS.eval_dir, "Flag 'eval_dir' must be set."
-
- if FLAGS.config_file:
- for config_file in FLAGS.config_file:
- gin.parse_config_file(config_file)
- if FLAGS.params:
- gin.parse_config(FLAGS.params)
-
- eval_.evaluate(FLAGS.checkpoint_dir, FLAGS.eval_dir)
-
-
-if __name__ == "__main__":
- tf.app.run()
diff --git a/research/efficient-hrl/run_train.py b/research/efficient-hrl/run_train.py
deleted file mode 100644
index 1d459d60b7f..00000000000
--- a/research/efficient-hrl/run_train.py
+++ /dev/null
@@ -1,49 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-r"""Script for training an RL agent using the UVF algorithm.
-
-To run locally: See scripts/local_train.py
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow as tf
-
-import gin.tf
-# pylint: disable=unused-import
-import train
-# pylint: enable=unused-import
-
-flags = tf.app.flags
-FLAGS = flags.FLAGS
-
-
-def main(_):
- tf.logging.set_verbosity(tf.logging.INFO)
- if FLAGS.config_file:
- for config_file in FLAGS.config_file:
- gin.parse_config_file(config_file)
- if FLAGS.params:
- gin.parse_config(FLAGS.params)
-
- assert FLAGS.train_dir, "Flag 'train_dir' must be set."
- return train.train_uvf(FLAGS.train_dir)
-
-
-if __name__ == '__main__':
- tf.app.run()
diff --git a/research/efficient-hrl/scripts/local_eval.py b/research/efficient-hrl/scripts/local_eval.py
deleted file mode 100644
index 89ef745a408..00000000000
--- a/research/efficient-hrl/scripts/local_eval.py
+++ /dev/null
@@ -1,76 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Script to run run_eval.py locally.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import os
-from subprocess import call
-import sys
-
-CONFIGS_PATH = 'configs'
-CONTEXT_CONFIGS_PATH = 'context/configs'
-
-def main():
- bb = './'
- base_num_args = 6
- if len(sys.argv) < base_num_args:
- print(
- "usage: python %s "
- " [params...]"
- % sys.argv[0])
- sys.exit(0)
- exp = sys.argv[1]
- context_setting = sys.argv[2]
- context = sys.argv[3]
- agent = sys.argv[4]
- assert sys.argv[5] in ["suite"], "args[5] must be `suite'"
- suite = ""
- binary = "python {bb}/run_eval{suite}.py ".format(bb=bb, suite=suite)
-
- h = os.environ["HOME"]
- ucp = CONFIGS_PATH
- ccp = CONTEXT_CONFIGS_PATH
- extra = ''
- command_str = ("{binary} "
- "--logtostderr "
- "--checkpoint_dir={h}/tmp/{context_setting}/{context}/{agent}/{exp}/train "
- "--eval_dir={h}/tmp/{context_setting}/{context}/{agent}/{exp}/eval "
- "--config_file={ucp}/{agent}.gin "
- "--config_file={ucp}/eval_{extra}uvf.gin "
- "--config_file={ccp}/{context_setting}.gin "
- "--config_file={ccp}/{context}.gin ").format(
- h=h,
- ucp=ucp,
- ccp=ccp,
- context_setting=context_setting,
- context=context,
- agent=agent,
- extra=extra,
- suite=suite,
- exp=exp,
- binary=binary)
- for extra_arg in sys.argv[base_num_args:]:
- command_str += "--params='%s' " % extra_arg
-
- print(command_str)
- call(command_str, shell=True)
-
-
-if __name__ == "__main__":
- main()
diff --git a/research/efficient-hrl/scripts/local_train.py b/research/efficient-hrl/scripts/local_train.py
deleted file mode 100644
index 718c88e8fed..00000000000
--- a/research/efficient-hrl/scripts/local_train.py
+++ /dev/null
@@ -1,76 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Script to run run_train.py locally.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import os
-import random
-from subprocess import call
-import sys
-
-CONFIGS_PATH = './configs'
-CONTEXT_CONFIGS_PATH = './context/configs'
-
-def main():
- bb = '.'
- base_num_args = 6
- if len(sys.argv) < base_num_args:
- print(
- "usage: python %s "
- " [params...]"
- % sys.argv[0])
- sys.exit(0)
- exp = sys.argv[1] # Name for experiment, e.g. 'test001'
- context_setting = sys.argv[2] # Context setting, e.g. 'hiro_orig'
- context = sys.argv[3] # Environment-specific context, e.g. 'ant_maze'
- agent = sys.argv[4] # Agent settings, e.g. 'base_uvf'
- assert sys.argv[5] in ["suite"], "args[5] must be `suite'"
- suite = ""
- binary = "python {bb}/run_train{suite}.py ".format(bb=bb, suite=suite)
-
- h = os.environ["HOME"]
- ucp = CONFIGS_PATH
- ccp = CONTEXT_CONFIGS_PATH
- extra = ''
- port = random.randint(2000, 8000)
- command_str = ("{binary} "
- "--train_dir={h}/tmp/{context_setting}/{context}/{agent}/{exp}/train "
- "--config_file={ucp}/{agent}.gin "
- "--config_file={ucp}/train_{extra}uvf.gin "
- "--config_file={ccp}/{context_setting}.gin "
- "--config_file={ccp}/{context}.gin "
- "--summarize_gradients=False "
- "--save_interval_secs=60 "
- "--save_summaries_secs=1 "
- "--master=local "
- "--alsologtostderr ").format(h=h, ucp=ucp,
- context_setting=context_setting,
- context=context, ccp=ccp,
- suite=suite, agent=agent, extra=extra,
- exp=exp, binary=binary,
- port=port)
- for extra_arg in sys.argv[base_num_args:]:
- command_str += "--params='%s' " % extra_arg
-
- print(command_str)
- call(command_str, shell=True)
-
-
-if __name__ == "__main__":
- main()
diff --git a/research/efficient-hrl/train.py b/research/efficient-hrl/train.py
deleted file mode 100644
index a40e81dbec6..00000000000
--- a/research/efficient-hrl/train.py
+++ /dev/null
@@ -1,670 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-r"""Script for training an RL agent using the UVF algorithm.
-
-To run locally: See run_train.py
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-import time
-import tensorflow as tf
-slim = tf.contrib.slim
-
-import gin.tf
-# pylint: disable=unused-import
-import train_utils
-import agent as agent_
-from agents import circular_buffer
-from utils import utils as uvf_utils
-from environments import create_maze_env
-# pylint: enable=unused-import
-
-
-flags = tf.app.flags
-
-FLAGS = flags.FLAGS
-flags.DEFINE_string('goal_sample_strategy', 'sample',
- 'None, sample, FuN')
-
-LOAD_PATH = None
-
-
-def collect_experience(tf_env, agent, meta_agent, state_preprocess,
- replay_buffer, meta_replay_buffer,
- action_fn, meta_action_fn,
- environment_steps, num_episodes, num_resets,
- episode_rewards, episode_meta_rewards,
- store_context,
- disable_agent_reset):
- """Collect experience in a tf_env into a replay_buffer using action_fn.
-
- Args:
- tf_env: A TFEnvironment.
- agent: A UVF agent.
-    meta_agent: A Meta Agent.
-    state_preprocess: A state preprocessing function that maps raw
-      observations to their learned representations.
-    replay_buffer: A Replay buffer to collect experience in.
- meta_replay_buffer: A Replay buffer to collect meta agent experience in.
- action_fn: A function to produce actions given current state.
- meta_action_fn: A function to produce meta actions given current state.
- environment_steps: A variable to count the number of steps in the tf_env.
- num_episodes: A variable to count the number of episodes.
- num_resets: A variable to count the number of resets.
-    episode_rewards: A variable tracking recent per-episode rewards.
-    episode_meta_rewards: A variable tracking recent per-episode meta rewards.
-    store_context: A boolean indicating whether to store the context in the
-      replay buffers.
-    disable_agent_reset: A boolean that disables the agent from resetting.
-
-  Returns:
-    A collect_experience_op that executes an action and stores the resulting
-    transitions into the replay buffers.
- """
- tf_env.start_collect()
- state = tf_env.current_obs()
- state_repr = state_preprocess(state)
- action = action_fn(state, context=None)
-
- with tf.control_dependencies([state]):
- transition_type, reward, discount = tf_env.step(action)
-
- def increment_step():
- return environment_steps.assign_add(1)
-
- def increment_episode():
- return num_episodes.assign_add(1)
-
- def increment_reset():
- return num_resets.assign_add(1)
-
- def update_episode_rewards(context_reward, meta_reward, reset):
- new_episode_rewards = tf.concat(
- [episode_rewards[:1] + context_reward, episode_rewards[1:]], 0)
- new_episode_meta_rewards = tf.concat(
- [episode_meta_rewards[:1] + meta_reward,
- episode_meta_rewards[1:]], 0)
- return tf.group(
- episode_rewards.assign(
- tf.cond(reset,
- lambda: tf.concat([[0.], episode_rewards[:-1]], 0),
- lambda: new_episode_rewards)),
- episode_meta_rewards.assign(
- tf.cond(reset,
- lambda: tf.concat([[0.], episode_meta_rewards[:-1]], 0),
- lambda: new_episode_meta_rewards)))
-
- def no_op_int():
- return tf.constant(0, dtype=tf.int64)
-
- step_cond = agent.step_cond_fn(state, action,
- transition_type,
- environment_steps, num_episodes)
- reset_episode_cond = agent.reset_episode_cond_fn(
- state, action,
- transition_type, environment_steps, num_episodes)
- reset_env_cond = agent.reset_env_cond_fn(state, action,
- transition_type,
- environment_steps, num_episodes)
-
- increment_step_op = tf.cond(step_cond, increment_step, no_op_int)
- increment_episode_op = tf.cond(reset_episode_cond, increment_episode,
- no_op_int)
- increment_reset_op = tf.cond(reset_env_cond, increment_reset, no_op_int)
- increment_op = tf.group(increment_step_op, increment_episode_op,
- increment_reset_op)
-
- with tf.control_dependencies([increment_op, reward, discount]):
- next_state = tf_env.current_obs()
- next_state_repr = state_preprocess(next_state)
- next_reset_episode_cond = tf.logical_or(
- agent.reset_episode_cond_fn(
- state, action,
- transition_type, environment_steps, num_episodes),
- tf.equal(discount, 0.0))
-
- if store_context:
- context = [tf.identity(var) + tf.zeros_like(var) for var in agent.context_vars]
- meta_context = [tf.identity(var) + tf.zeros_like(var) for var in meta_agent.context_vars]
- else:
- context = []
- meta_context = []
- with tf.control_dependencies([next_state] + context + meta_context):
- if disable_agent_reset:
- collect_experience_ops = [tf.no_op()] # don't reset agent
- else:
- collect_experience_ops = agent.cond_begin_episode_op(
- tf.logical_not(reset_episode_cond),
- [state, action, reward, next_state,
- state_repr, next_state_repr],
- mode='explore', meta_action_fn=meta_action_fn)
- context_reward, meta_reward = collect_experience_ops
- collect_experience_ops = list(collect_experience_ops)
- collect_experience_ops.append(
- update_episode_rewards(tf.reduce_sum(context_reward), meta_reward,
- reset_episode_cond))
-
- meta_action_every_n = agent.tf_context.meta_action_every_n
- with tf.control_dependencies(collect_experience_ops):
- transition = [state, action, reward, discount, next_state]
-
- meta_action = tf.to_float(
- tf.concat(context, -1)) # Meta agent action is low-level context
-
- meta_end = tf.logical_and( # End of meta-transition.
- tf.equal(agent.tf_context.t % meta_action_every_n, 1),
- agent.tf_context.t > 1)
- with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
- states_var = tf.get_variable('states_var',
- [meta_action_every_n, state.shape[-1]],
- state.dtype)
- actions_var = tf.get_variable('actions_var',
- [meta_action_every_n, action.shape[-1]],
- action.dtype)
- state_var = tf.get_variable('state_var', state.shape, state.dtype)
- reward_var = tf.get_variable('reward_var', reward.shape, reward.dtype)
- meta_action_var = tf.get_variable('meta_action_var',
- meta_action.shape, meta_action.dtype)
- meta_context_var = [
- tf.get_variable('meta_context_var%d' % idx,
- meta_context[idx].shape, meta_context[idx].dtype)
- for idx in range(len(meta_context))]
-
- actions_var_upd = tf.scatter_update(
- actions_var, (agent.tf_context.t - 2) % meta_action_every_n, action)
- with tf.control_dependencies([actions_var_upd]):
- actions = tf.identity(actions_var) + tf.zeros_like(actions_var)
- meta_reward = tf.identity(meta_reward) + tf.zeros_like(meta_reward)
- meta_reward = tf.reshape(meta_reward, reward.shape)
-
- reward = 0.1 * meta_reward
- meta_transition = [state_var, meta_action_var,
- reward_var + reward,
- discount * (1 - tf.to_float(next_reset_episode_cond)),
- next_state]
- meta_transition.extend([states_var, actions])
- if store_context: # store current and next context into replay
- transition += context + list(agent.context_vars)
- meta_transition += meta_context_var + list(meta_agent.context_vars)
-
- meta_step_cond = tf.squeeze(tf.logical_and(step_cond, tf.logical_or(next_reset_episode_cond, meta_end)))
-
- collect_experience_op = tf.group(
- replay_buffer.maybe_add(transition, step_cond),
- meta_replay_buffer.maybe_add(meta_transition, meta_step_cond),
- )
-
- with tf.control_dependencies([collect_experience_op]):
- collect_experience_op = tf.cond(reset_env_cond,
- tf_env.reset,
- tf_env.current_time_step)
-
- meta_period = tf.equal(agent.tf_context.t % meta_action_every_n, 1)
- states_var_upd = tf.scatter_update(
- states_var, (agent.tf_context.t - 1) % meta_action_every_n,
- next_state)
- state_var_upd = tf.assign(
- state_var,
- tf.cond(meta_period, lambda: next_state, lambda: state_var))
- reward_var_upd = tf.assign(
- reward_var,
- tf.cond(meta_period,
- lambda: tf.zeros_like(reward_var),
- lambda: reward_var + reward))
- meta_action = tf.to_float(tf.concat(agent.context_vars, -1))
- meta_action_var_upd = tf.assign(
- meta_action_var,
- tf.cond(meta_period, lambda: meta_action, lambda: meta_action_var))
- meta_context_var_upd = [
- tf.assign(
- meta_context_var[idx],
- tf.cond(meta_period,
- lambda: meta_agent.context_vars[idx],
- lambda: meta_context_var[idx]))
- for idx in range(len(meta_context))]
-
- return tf.group(
- collect_experience_op,
- states_var_upd,
- state_var_upd,
- reward_var_upd,
- meta_action_var_upd,
- *meta_context_var_upd)
-
-
-def sample_best_meta_actions(state_reprs, next_state_reprs, prev_meta_actions,
- low_states, low_actions, low_state_reprs,
- inverse_dynamics, uvf_agent, k=10):
- """Return meta-actions which approximately maximize low-level log-probs."""
- sampled_actions = inverse_dynamics.sample(state_reprs, next_state_reprs, k, prev_meta_actions)
- sampled_actions = tf.stop_gradient(sampled_actions)
- sampled_log_probs = tf.reshape(uvf_agent.log_probs(
- tf.tile(low_states, [k, 1, 1]),
- tf.tile(low_actions, [k, 1, 1]),
- tf.tile(low_state_reprs, [k, 1, 1]),
- [tf.reshape(sampled_actions, [-1, sampled_actions.shape[-1]])]),
- [k, low_states.shape[0],
- low_states.shape[1], -1])
- fitness = tf.reduce_sum(sampled_log_probs, [2, 3])
- best_actions = tf.argmax(fitness, 0)
- actions = tf.gather_nd(
- sampled_actions,
- tf.stack([best_actions,
- tf.range(prev_meta_actions.shape[0], dtype=tf.int64)], -1))
- return actions
-
-
-@gin.configurable
-def train_uvf(train_dir,
- environment=None,
- num_bin_actions=3,
- agent_class=None,
- meta_agent_class=None,
- state_preprocess_class=None,
- inverse_dynamics_class=None,
- exp_action_wrapper=None,
- replay_buffer=None,
- meta_replay_buffer=None,
- replay_num_steps=1,
- meta_replay_num_steps=1,
- critic_optimizer=None,
- actor_optimizer=None,
- meta_critic_optimizer=None,
- meta_actor_optimizer=None,
- repr_optimizer=None,
- relabel_contexts=False,
- meta_relabel_contexts=False,
- batch_size=64,
- repeat_size=0,
- num_episodes_train=2000,
- initial_episodes=2,
- initial_steps=None,
- num_updates_per_observation=1,
- num_collect_per_update=1,
- num_collect_per_meta_update=1,
- gamma=1.0,
- meta_gamma=1.0,
- reward_scale_factor=1.0,
- target_update_period=1,
- should_stop_early=None,
- clip_gradient_norm=0.0,
- summarize_gradients=False,
- debug_summaries=False,
- log_every_n_steps=100,
- prefetch_queue_capacity=2,
- policy_save_dir='policy',
- save_policy_every_n_steps=1000,
- save_policy_interval_secs=0,
- replay_context_ratio=0.0,
- next_state_as_context_ratio=0.0,
- state_index=0,
- zero_timer_ratio=0.0,
- timer_index=-1,
- debug=False,
- max_policies_to_save=None,
- max_steps_per_episode=None,
- load_path=LOAD_PATH):
- """Train an agent."""
- tf_env = create_maze_env.TFPyEnvironment(environment)
- observation_spec = [tf_env.observation_spec()]
- action_spec = [tf_env.action_spec()]
-
- max_steps_per_episode = max_steps_per_episode or tf_env.pyenv.max_episode_steps
-
-  assert max_steps_per_episode, 'max_steps_per_episode needs to be set'
-
- if initial_steps is None:
- initial_steps = initial_episodes * max_steps_per_episode
-
- if agent_class.ACTION_TYPE == 'discrete':
- assert False
- else:
- assert agent_class.ACTION_TYPE == 'continuous'
-
- assert agent_class.ACTION_TYPE == meta_agent_class.ACTION_TYPE
- with tf.variable_scope('meta_agent'):
- meta_agent = meta_agent_class(
- observation_spec,
- action_spec,
- tf_env,
- debug_summaries=debug_summaries)
- meta_agent.set_replay(replay=meta_replay_buffer)
-
- with tf.variable_scope('uvf_agent'):
- uvf_agent = agent_class(
- observation_spec,
- action_spec,
- tf_env,
- debug_summaries=debug_summaries)
- uvf_agent.set_meta_agent(agent=meta_agent)
- uvf_agent.set_replay(replay=replay_buffer)
-
- with tf.variable_scope('state_preprocess'):
- state_preprocess = state_preprocess_class()
-
- with tf.variable_scope('inverse_dynamics'):
- inverse_dynamics = inverse_dynamics_class(
- meta_agent.sub_context_as_action_specs[0])
-
- # Create counter variables
- global_step = tf.contrib.framework.get_or_create_global_step()
- num_episodes = tf.Variable(0, dtype=tf.int64, name='num_episodes')
- num_resets = tf.Variable(0, dtype=tf.int64, name='num_resets')
- num_updates = tf.Variable(0, dtype=tf.int64, name='num_updates')
- num_meta_updates = tf.Variable(0, dtype=tf.int64, name='num_meta_updates')
- episode_rewards = tf.Variable([0.] * 100, name='episode_rewards')
- episode_meta_rewards = tf.Variable([0.] * 100, name='episode_meta_rewards')
-
- # Create counter variables summaries
- train_utils.create_counter_summaries([
- ('environment_steps', global_step),
- ('num_episodes', num_episodes),
- ('num_resets', num_resets),
- ('num_updates', num_updates),
- ('num_meta_updates', num_meta_updates),
- ('replay_buffer_adds', replay_buffer.get_num_adds()),
- ('meta_replay_buffer_adds', meta_replay_buffer.get_num_adds()),
- ])
-
- tf.summary.scalar('avg_episode_rewards',
- tf.reduce_mean(episode_rewards[1:]))
- tf.summary.scalar('avg_episode_meta_rewards',
- tf.reduce_mean(episode_meta_rewards[1:]))
- tf.summary.histogram('episode_rewards', episode_rewards[1:])
- tf.summary.histogram('episode_meta_rewards', episode_meta_rewards[1:])
-
- # Create init ops
- action_fn = uvf_agent.action
- action_fn = uvf_agent.add_noise_fn(action_fn, global_step=None)
- meta_action_fn = meta_agent.action
- meta_action_fn = meta_agent.add_noise_fn(meta_action_fn, global_step=None)
- meta_actions_fn = meta_agent.actions
- meta_actions_fn = meta_agent.add_noise_fn(meta_actions_fn, global_step=None)
- init_collect_experience_op = collect_experience(
- tf_env,
- uvf_agent,
- meta_agent,
- state_preprocess,
- replay_buffer,
- meta_replay_buffer,
- action_fn,
- meta_action_fn,
- environment_steps=global_step,
- num_episodes=num_episodes,
- num_resets=num_resets,
- episode_rewards=episode_rewards,
- episode_meta_rewards=episode_meta_rewards,
- store_context=True,
- disable_agent_reset=False,
- )
-
- # Create train ops
- collect_experience_op = collect_experience(
- tf_env,
- uvf_agent,
- meta_agent,
- state_preprocess,
- replay_buffer,
- meta_replay_buffer,
- action_fn,
- meta_action_fn,
- environment_steps=global_step,
- num_episodes=num_episodes,
- num_resets=num_resets,
- episode_rewards=episode_rewards,
- episode_meta_rewards=episode_meta_rewards,
- store_context=True,
- disable_agent_reset=False,
- )
-
- train_op_list = []
- repr_train_op = tf.constant(0.0)
- for mode in ['meta', 'nometa']:
- if mode == 'meta':
- agent = meta_agent
- buff = meta_replay_buffer
- critic_opt = meta_critic_optimizer
- actor_opt = meta_actor_optimizer
- relabel = meta_relabel_contexts
- num_steps = meta_replay_num_steps
-      my_gamma = meta_gamma
- n_updates = num_meta_updates
- else:
- agent = uvf_agent
- buff = replay_buffer
- critic_opt = critic_optimizer
- actor_opt = actor_optimizer
- relabel = relabel_contexts
- num_steps = replay_num_steps
- my_gamma = gamma
- n_updates = num_updates
-
- with tf.name_scope(mode):
- batch = buff.get_random_batch(batch_size, num_steps=num_steps)
- states, actions, rewards, discounts, next_states = batch[:5]
- with tf.name_scope('Reward'):
- tf.summary.scalar('average_step_reward', tf.reduce_mean(rewards))
- rewards *= reward_scale_factor
- batch_queue = slim.prefetch_queue.prefetch_queue(
- [states, actions, rewards, discounts, next_states] + batch[5:],
- capacity=prefetch_queue_capacity,
- name='batch_queue')
-
- batch_dequeue = batch_queue.dequeue()
- if repeat_size > 0:
- batch_dequeue = [
- tf.tile(batch, (repeat_size+1,) + (1,) * (batch.shape.ndims - 1))
- for batch in batch_dequeue
- ]
- batch_size *= (repeat_size + 1)
- states, actions, rewards, discounts, next_states = batch_dequeue[:5]
- if mode == 'meta':
- low_states = batch_dequeue[5]
- low_actions = batch_dequeue[6]
- low_state_reprs = state_preprocess(low_states)
- state_reprs = state_preprocess(states)
- next_state_reprs = state_preprocess(next_states)
-
- if mode == 'meta': # Re-label meta-action
- prev_actions = actions
- if FLAGS.goal_sample_strategy == 'None':
- pass
- elif FLAGS.goal_sample_strategy == 'FuN':
- actions = inverse_dynamics.sample(state_reprs, next_state_reprs, 1, prev_actions, sc=0.1)
- actions = tf.stop_gradient(actions)
- elif FLAGS.goal_sample_strategy == 'sample':
- actions = sample_best_meta_actions(state_reprs, next_state_reprs, prev_actions,
- low_states, low_actions, low_state_reprs,
- inverse_dynamics, uvf_agent, k=10)
- else:
- assert False
-
- if state_preprocess.trainable and mode == 'meta':
- # Representation learning is based on meta-transitions, but is trained
- # along with low-level policy updates.
- repr_loss, _, _ = state_preprocess.loss(states, next_states, low_actions, low_states)
- repr_train_op = slim.learning.create_train_op(
- repr_loss,
- repr_optimizer,
- global_step=None,
- update_ops=None,
- summarize_gradients=summarize_gradients,
- clip_gradient_norm=clip_gradient_norm,
- variables_to_train=state_preprocess.get_trainable_vars(),)
-
- # Get contexts for training
- contexts, next_contexts = agent.sample_contexts(
- mode='train', batch_size=batch_size,
- state=states, next_state=next_states,
- )
- if not relabel: # Re-label context (in the style of TDM or HER).
- contexts, next_contexts = (
- batch_dequeue[-2*len(contexts):-1*len(contexts)],
- batch_dequeue[-1*len(contexts):])
-
- merged_states = agent.merged_states(states, contexts)
- merged_next_states = agent.merged_states(next_states, next_contexts)
- if mode == 'nometa':
- context_rewards, context_discounts = agent.compute_rewards(
- 'train', state_reprs, actions, rewards, next_state_reprs, contexts)
- elif mode == 'meta': # Meta-agent uses sum of rewards, not context-specific rewards.
- _, context_discounts = agent.compute_rewards(
- 'train', states, actions, rewards, next_states, contexts)
- context_rewards = rewards
-
- if agent.gamma_index is not None:
- context_discounts *= tf.cast(
- tf.reshape(contexts[agent.gamma_index], (-1,)),
- dtype=context_discounts.dtype)
- else: context_discounts *= my_gamma
-
- critic_loss = agent.critic_loss(merged_states, actions,
- context_rewards, context_discounts,
- merged_next_states)
-
- critic_loss = tf.reduce_mean(critic_loss)
-
- actor_loss = agent.actor_loss(merged_states, actions,
- context_rewards, context_discounts,
- merged_next_states)
- actor_loss *= tf.to_float( # Only update actor every N steps.
- tf.equal(n_updates % target_update_period, 0))
-
- critic_train_op = slim.learning.create_train_op(
- critic_loss,
- critic_opt,
- global_step=n_updates,
- update_ops=None,
- summarize_gradients=summarize_gradients,
- clip_gradient_norm=clip_gradient_norm,
- variables_to_train=agent.get_trainable_critic_vars(),)
- critic_train_op = uvf_utils.tf_print(
- critic_train_op, [critic_train_op],
- message='critic_loss',
- print_freq=1000,
- name='critic_loss')
- train_op_list.append(critic_train_op)
- if actor_loss is not None:
- actor_train_op = slim.learning.create_train_op(
- actor_loss,
- actor_opt,
- global_step=None,
- update_ops=None,
- summarize_gradients=summarize_gradients,
- clip_gradient_norm=clip_gradient_norm,
- variables_to_train=agent.get_trainable_actor_vars(),)
- actor_train_op = uvf_utils.tf_print(
- actor_train_op, [actor_train_op],
- message='actor_loss',
- print_freq=1000,
- name='actor_loss')
- train_op_list.append(actor_train_op)
-
- assert len(train_op_list) == 4
- # Update targets should happen after the networks have been updated.
- with tf.control_dependencies(train_op_list[2:]):
- update_targets_op = uvf_utils.periodically(
- uvf_agent.update_targets, target_update_period, 'update_targets')
- if meta_agent is not None:
- with tf.control_dependencies(train_op_list[:2]):
- update_meta_targets_op = uvf_utils.periodically(
- meta_agent.update_targets, target_update_period, 'update_targets')
-
- assert_op = tf.Assert( # Hack to get training to stop.
- tf.less_equal(global_step, 200 + num_episodes_train * max_steps_per_episode),
- [global_step])
- with tf.control_dependencies([update_targets_op, assert_op]):
- train_op = tf.add_n(train_op_list[2:], name='post_update_targets')
- # Representation training steps on every low-level policy training step.
- train_op += repr_train_op
- with tf.control_dependencies([update_meta_targets_op, assert_op]):
- meta_train_op = tf.add_n(train_op_list[:2],
- name='post_update_meta_targets')
-
- if debug_summaries:
-    train_utils.gen_debug_batch_summaries(batch)
- slim.summaries.add_histogram_summaries(
- uvf_agent.get_trainable_critic_vars(), 'critic_vars')
- slim.summaries.add_histogram_summaries(
- uvf_agent.get_trainable_actor_vars(), 'actor_vars')
-
- train_ops = train_utils.TrainOps(train_op, meta_train_op,
- collect_experience_op)
-
- policy_save_path = os.path.join(train_dir, policy_save_dir, 'model.ckpt')
- policy_vars = uvf_agent.get_actor_vars() + meta_agent.get_actor_vars() + [
- global_step, num_episodes, num_resets
- ] + list(uvf_agent.context_vars) + list(meta_agent.context_vars) + state_preprocess.get_trainable_vars()
- # add critic vars, since some test evaluation depends on them
- policy_vars += uvf_agent.get_trainable_critic_vars() + meta_agent.get_trainable_critic_vars()
- policy_saver = tf.train.Saver(
- policy_vars, max_to_keep=max_policies_to_save, sharded=False)
-
- lowlevel_vars = (uvf_agent.get_actor_vars() +
- uvf_agent.get_trainable_critic_vars() +
- state_preprocess.get_trainable_vars())
- lowlevel_saver = tf.train.Saver(lowlevel_vars)
-
- def policy_save_fn(sess):
- policy_saver.save(
- sess, policy_save_path, global_step=global_step, write_meta_graph=False)
- if save_policy_interval_secs > 0:
- tf.logging.info(
- 'Wait %d secs after save policy.' % save_policy_interval_secs)
- time.sleep(save_policy_interval_secs)
-
- train_step_fn = train_utils.TrainStep(
- max_number_of_steps=num_episodes_train * max_steps_per_episode + 100,
- num_updates_per_observation=num_updates_per_observation,
- num_collect_per_update=num_collect_per_update,
- num_collect_per_meta_update=num_collect_per_meta_update,
- log_every_n_steps=log_every_n_steps,
- policy_save_fn=policy_save_fn,
- save_policy_every_n_steps=save_policy_every_n_steps,
- should_stop_early=should_stop_early).train_step
-
- local_init_op = tf.local_variables_initializer()
- init_targets_op = tf.group(uvf_agent.update_targets(1.0),
- meta_agent.update_targets(1.0))
-
- def initialize_training_fn(sess):
- """Initialize training function."""
- sess.run(local_init_op)
- sess.run(init_targets_op)
- if load_path:
- tf.logging.info('Restoring low-level from %s' % load_path)
- lowlevel_saver.restore(sess, load_path)
- global_step_value = sess.run(global_step)
- assert global_step_value == 0, 'Global step should be zero.'
- collect_experience_call = sess.make_callable(
- init_collect_experience_op)
-
- for _ in range(initial_steps):
- collect_experience_call()
-
- train_saver = tf.train.Saver(max_to_keep=2, sharded=True)
- tf.logging.info('train dir: %s', train_dir)
- return slim.learning.train(
- train_ops,
- train_dir,
- train_step_fn=train_step_fn,
- save_interval_secs=FLAGS.save_interval_secs,
- saver=train_saver,
- log_every_n_steps=0,
- global_step=global_step,
- master="",
- is_chief=(FLAGS.task == 0),
- save_summaries_secs=FLAGS.save_summaries_secs,
- init_fn=initialize_training_fn)
diff --git a/research/efficient-hrl/train_utils.py b/research/efficient-hrl/train_utils.py
deleted file mode 100644
index ae23ef9f095..00000000000
--- a/research/efficient-hrl/train_utils.py
+++ /dev/null
@@ -1,175 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-r""""""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-from collections import namedtuple
-import os
-import time
-
-import tensorflow as tf
-
-import gin.tf
-
-flags = tf.app.flags
-
-
-flags.DEFINE_multi_string('config_file', None,
- 'List of paths to the config files.')
-flags.DEFINE_multi_string('params', None,
- 'Newline separated list of Gin parameter bindings.')
-
-flags.DEFINE_string('train_dir', None,
- 'Directory for writing logs/summaries during training.')
-flags.DEFINE_string('master', 'local',
- 'BNS name of the TensorFlow master to use.')
-flags.DEFINE_integer('task', 0, 'task id')
-flags.DEFINE_integer('save_interval_secs', 300, 'The frequency at which '
- 'checkpoints are saved, in seconds.')
-flags.DEFINE_integer('save_summaries_secs', 30, 'The frequency at which '
- 'summaries are saved, in seconds.')
-flags.DEFINE_boolean('summarize_gradients', False,
- 'Whether to generate gradient summaries.')
-
-FLAGS = flags.FLAGS
-
-TrainOps = namedtuple('TrainOps',
- ['train_op', 'meta_train_op', 'collect_experience_op'])
-
-
-class TrainStep(object):
- """Handles training step."""
-
- def __init__(self,
- max_number_of_steps=0,
- num_updates_per_observation=1,
- num_collect_per_update=1,
- num_collect_per_meta_update=1,
- log_every_n_steps=1,
- policy_save_fn=None,
- save_policy_every_n_steps=0,
- should_stop_early=None):
- """Returns a function that is executed at each step of slim training.
-
- Args:
- max_number_of_steps: Optional maximum number of train steps to take.
-      num_updates_per_observation: Number of updates per observation.
-      num_collect_per_update: Number of collect steps per update.
-      num_collect_per_meta_update: Number of collect steps per meta update.
-      log_every_n_steps: The frequency, in terms of global steps, at which the
-        loss and global step are logged.
- policy_save_fn: A tf.Saver().save function to save the policy.
- save_policy_every_n_steps: How frequently to save the policy.
- should_stop_early: Optional hook to report whether training should stop.
- Raises:
- ValueError: If policy_save_fn is not provided when
- save_policy_every_n_steps > 0.
- """
- if save_policy_every_n_steps and policy_save_fn is None:
- raise ValueError(
- 'policy_save_fn is required when save_policy_every_n_steps > 0')
- self.max_number_of_steps = max_number_of_steps
- self.num_updates_per_observation = num_updates_per_observation
- self.num_collect_per_update = num_collect_per_update
- self.num_collect_per_meta_update = num_collect_per_meta_update
- self.log_every_n_steps = log_every_n_steps
- self.policy_save_fn = policy_save_fn
- self.save_policy_every_n_steps = save_policy_every_n_steps
- self.should_stop_early = should_stop_early
- self.last_global_step_val = 0
- self.train_op_fn = None
- self.collect_and_train_fn = None
- tf.logging.info('Training for %d max_number_of_steps',
- self.max_number_of_steps)
-
- def train_step(self, sess, train_ops, global_step, _):
- """This function will be called at each step of training.
-
- This represents one step of the DDPG algorithm and can include:
- 1. collect a transition
- 2. update the target network
- 3. train the actor
- 4. train the critic
-
- Args:
- sess: A Tensorflow session.
-      train_ops: A TrainOps tuple of train ops to run.
- global_step: The global step.
-
- Returns:
- A scalar total loss.
- A boolean should stop.
- """
- start_time = time.time()
- if self.train_op_fn is None:
- self.train_op_fn = sess.make_callable([train_ops.train_op, global_step])
- self.meta_train_op_fn = sess.make_callable([train_ops.meta_train_op, global_step])
- self.collect_fn = sess.make_callable([train_ops.collect_experience_op, global_step])
- self.collect_and_train_fn = sess.make_callable(
- [train_ops.train_op, global_step, train_ops.collect_experience_op])
- self.collect_and_meta_train_fn = sess.make_callable(
- [train_ops.meta_train_op, global_step, train_ops.collect_experience_op])
- for _ in range(self.num_collect_per_update - 1):
- self.collect_fn()
- for _ in range(self.num_updates_per_observation - 1):
- self.train_op_fn()
-
- total_loss, global_step_val, _ = self.collect_and_train_fn()
- if (global_step_val // self.num_collect_per_meta_update !=
- self.last_global_step_val // self.num_collect_per_meta_update):
- self.meta_train_op_fn()
-
- time_elapsed = time.time() - start_time
- should_stop = False
- if self.max_number_of_steps:
- should_stop = global_step_val >= self.max_number_of_steps
- if global_step_val != self.last_global_step_val:
- if (self.save_policy_every_n_steps and
- global_step_val // self.save_policy_every_n_steps !=
- self.last_global_step_val // self.save_policy_every_n_steps):
- self.policy_save_fn(sess)
-
- if (self.log_every_n_steps and
- global_step_val % self.log_every_n_steps == 0):
- tf.logging.info(
- 'global step %d: loss = %.4f (%.3f sec/step) (%d steps/sec)',
- global_step_val, total_loss, time_elapsed, 1 / time_elapsed)
-
- self.last_global_step_val = global_step_val
- stop_early = bool(self.should_stop_early and self.should_stop_early())
- return total_loss, should_stop or stop_early
-
-
-def create_counter_summaries(counters):
- """Add named summaries to counters, a list of tuples (name, counter)."""
- if counters:
- with tf.name_scope('Counters/'):
- for name, counter in counters:
- tf.summary.scalar(name, counter)
-
-
-def gen_debug_batch_summaries(batch):
- """Generates summaries for the sampled replay batch."""
- states, actions, rewards, _, next_states = batch
- with tf.name_scope('batch'):
- for s in range(states.get_shape()[-1]):
- tf.summary.histogram('states_%d' % s, states[:, s])
- for s in range(states.get_shape()[-1]):
- tf.summary.histogram('next_states_%d' % s, next_states[:, s])
- for a in range(actions.get_shape()[-1]):
- tf.summary.histogram('actions_%d' % a, actions[:, a])
- tf.summary.histogram('rewards', rewards)
diff --git a/research/efficient-hrl/utils/__init__.py b/research/efficient-hrl/utils/__init__.py
deleted file mode 100644
index 8b137891791..00000000000
--- a/research/efficient-hrl/utils/__init__.py
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/research/efficient-hrl/utils/eval_utils.py b/research/efficient-hrl/utils/eval_utils.py
deleted file mode 100644
index c88efc80fe1..00000000000
--- a/research/efficient-hrl/utils/eval_utils.py
+++ /dev/null
@@ -1,151 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Evaluation utility functions.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import time
-
-import numpy as np
-import tensorflow as tf
-from collections import namedtuple
-logging = tf.logging
-import gin.tf
-
-
-@gin.configurable
-def evaluate_checkpoint_repeatedly(checkpoint_dir,
- evaluate_checkpoint_fn,
- eval_interval_secs=600,
- max_number_of_evaluations=None,
- checkpoint_timeout=None,
- timeout_fn=None):
- """Evaluates a checkpointed model at a set interval."""
- if max_number_of_evaluations is not None and max_number_of_evaluations <= 0:
- raise ValueError(
- '`max_number_of_evaluations` must be either None or a positive number.')
-
- number_of_evaluations = 0
- for checkpoint_path in tf.contrib.training.checkpoints_iterator(
- checkpoint_dir,
- min_interval_secs=eval_interval_secs,
- timeout=checkpoint_timeout,
- timeout_fn=timeout_fn):
- retries = 3
- for _ in range(retries):
- try:
- should_stop = evaluate_checkpoint_fn(checkpoint_path)
- break
- except tf.errors.DataLossError as e:
- logging.warn(
- 'Encountered a DataLossError while evaluating a checkpoint. This '
- 'can happen when reading a checkpoint before it is fully written. '
- 'Retrying...'
- )
- time.sleep(2.0)
-
-
-def compute_model_loss(sess, model_rollout_fn, states, actions):
- """Computes model loss."""
- preds, losses = [], []
- preds.append(states[0])
- losses.append(0)
- for state, action in zip(states[1:], actions[1:]):
- pred = model_rollout_fn(sess, preds[-1], action)
- loss = np.sqrt(np.sum((state - pred) ** 2))
- preds.append(pred)
- losses.append(loss)
- return preds, losses
-
-
-def compute_average_reward(sess, env_base, step_fn, gamma, num_steps,
- num_episodes):
- """Computes the discounted reward for a given number of steps.
-
- Args:
- sess: The tensorflow session.
- env_base: A python environment.
- step_fn: A function that takes in `sess` and returns a list of
- [state, action, reward, discount, transition_type] values.
- gamma: discounting factor to apply to the reward.
- num_steps: number of steps to compute the reward over.
- num_episodes: number of episodes to average the reward over.
- Returns:
- average_reward: a scalar of discounted reward.
- last_reward: last reward received.
- """
- average_reward = 0
- average_last_reward = 0
- average_meta_reward = 0
- average_last_meta_reward = 0
- average_success = 0.
- states, actions = None, None
- for i in range(num_episodes):
- env_base.end_episode()
- env_base.begin_episode()
- (reward, last_reward, meta_reward, last_meta_reward,
- states, actions) = compute_reward(
- sess, step_fn, gamma, num_steps)
- s_reward = last_meta_reward # Navigation
- success = (s_reward > -5.0) # When using diff=False
- logging.info('Episode = %d, reward = %s, meta_reward = %f, '
- 'last_reward = %s, last meta_reward = %f, success = %s',
- i, reward, meta_reward, last_reward, last_meta_reward,
- success)
- average_reward += reward
- average_last_reward += last_reward
- average_meta_reward += meta_reward
- average_last_meta_reward += last_meta_reward
- average_success += success
- average_reward /= num_episodes
- average_last_reward /= num_episodes
- average_meta_reward /= num_episodes
- average_last_meta_reward /= num_episodes
- average_success /= num_episodes
- return (average_reward, average_last_reward,
- average_meta_reward, average_last_meta_reward,
- average_success,
- states, actions)
-
-
-def compute_reward(sess, step_fn, gamma, num_steps):
- """Computes the discounted reward for a given number of steps.
-
- Args:
- sess: The tensorflow session.
- step_fn: A function that takes in `sess` and returns a list of
- [state, action, reward, discount, transition_type] values.
- gamma: discounting factor to apply to the reward.
- num_steps: number of steps to compute the reward over.
- Returns:
- reward: cumulative discounted reward.
- last_reward: reward received at final step.
- """
-
- total_reward = 0
- total_meta_reward = 0
- gamma_step = 1
- states = []
- actions = []
- for _ in range(num_steps):
- state, action, transition_type, reward, meta_reward, discount, _, _ = step_fn(sess)
- total_reward += reward * gamma_step * discount
- total_meta_reward += meta_reward * gamma_step * discount
- gamma_step *= gamma
- states.append(state)
- actions.append(action)
- return (total_reward, reward, total_meta_reward, meta_reward,
- states, actions)
diff --git a/research/efficient-hrl/utils/utils.py b/research/efficient-hrl/utils/utils.py
deleted file mode 100644
index e188316c33b..00000000000
--- a/research/efficient-hrl/utils/utils.py
+++ /dev/null
@@ -1,318 +0,0 @@
-# Copyright 2018 The TensorFlow Authors All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""TensorFlow utility functions.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-from copy import deepcopy
-import tensorflow as tf
-from tf_agents import specs
-from tf_agents.utils import common
-
-_tf_print_counts = dict()
-_tf_print_running_sums = dict()
-_tf_print_running_counts = dict()
-_tf_print_ids = 0
-
-
-def get_contextual_env_base(env_base, begin_ops=None, end_ops=None):
- """Wrap env_base with additional tf ops."""
- # pylint: disable=protected-access
- def init(self_, env_base):
- self_._env_base = env_base
- attribute_list = ["_render_mode", "_gym_env"]
- for attribute in attribute_list:
- if hasattr(env_base, attribute):
- setattr(self_, attribute, getattr(env_base, attribute))
- if hasattr(env_base, "physics"):
- self_._physics = env_base.physics
- elif hasattr(env_base, "gym"):
- class Physics(object):
- def render(self, *args, **kwargs):
- return env_base.gym.render("rgb_array")
- physics = Physics()
- self_._physics = physics
- self_.physics = physics
- def set_sess(self_, sess):
- self_._sess = sess
- if hasattr(self_._env_base, "set_sess"):
- self_._env_base.set_sess(sess)
- def begin_episode(self_):
- self_._env_base.reset()
- if begin_ops is not None:
- self_._sess.run(begin_ops)
- def end_episode(self_):
- self_._env_base.reset()
- if end_ops is not None:
- self_._sess.run(end_ops)
- return type("ContextualEnvBase", (env_base.__class__,), dict(
- __init__=init,
- set_sess=set_sess,
- begin_episode=begin_episode,
- end_episode=end_episode,
- ))(env_base)
- # pylint: enable=protected-access
-
-
-def merge_specs(specs_):
- """Merge TensorSpecs.
-
- Args:
- specs_: List of TensorSpecs to be merged.
- Returns:
- a TensorSpec: a merged TensorSpec.
- """
- shape = specs_[0].shape
- dtype = specs_[0].dtype
- name = specs_[0].name
- for spec in specs_[1:]:
- assert shape[1:] == spec.shape[1:], "incompatible shapes: %s, %s" % (
- shape, spec.shape)
- assert dtype == spec.dtype, "incompatible dtypes: %s, %s" % (
- dtype, spec.dtype)
- shape = merge_shapes((shape, spec.shape), axis=0)
- return specs.TensorSpec(
- shape=shape,
- dtype=dtype,
- name=name,
- )
-
-
-def merge_shapes(shapes, axis=0):
- """Merge TensorShapes.
-
- Args:
- shapes: List of TensorShapes to be merged.
-    axis: optional, the axis along which to merge the shapes.
- Returns:
- a TensorShape: a merged TensorShape.
- """
- assert len(shapes) > 1
- dims = deepcopy(shapes[0].dims)
- for shape in shapes[1:]:
- assert shapes[0].ndims == shape.ndims
- dims[axis] += shape.dims[axis]
- return tf.TensorShape(dims=dims)
-
-
-def get_all_vars(ignore_scopes=None):
- """Get all tf variables in scope.
-
- Args:
- ignore_scopes: A list of scope names to ignore.
- Returns:
- A list of all tf variables in scope.
- """
- all_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
- all_vars = [var for var in all_vars if ignore_scopes is None or not
- any(var.name.startswith(scope) for scope in ignore_scopes)]
- return all_vars
-
-
-def clip(tensor, range_=None):
- """Return a tf op which clips tensor according to range_.
-
- Args:
- tensor: A Tensor to be clipped.
- range_: None, or a tuple representing (minval, maxval)
- Returns:
- A clipped Tensor.
- """
- if range_ is None:
- return tf.identity(tensor)
- elif isinstance(range_, (tuple, list)):
- assert len(range_) == 2
- return tf.clip_by_value(tensor, range_[0], range_[1])
- else: raise NotImplementedError("Unacceptable range input: %r" % range_)
-
-
-def clip_to_bounds(value, minimum, maximum):
- """Clips value to be between minimum and maximum.
-
- Args:
- value: (tensor) value to be clipped.
- minimum: (numpy float array) minimum value to clip to.
- maximum: (numpy float array) maximum value to clip to.
- Returns:
- clipped_value: (tensor) `value` clipped to between `minimum` and `maximum`.
- """
- value = tf.minimum(value, maximum)
- return tf.maximum(value, minimum)
-
-
-clip_to_spec = common.clip_to_spec
-def _clip_to_spec(value, spec):
- """Clips value to a given bounded tensor spec.
-
- Args:
- value: (tensor) value to be clipped.
- spec: (BoundedTensorSpec) spec containing min. and max. values for clipping.
- Returns:
- clipped_value: (tensor) `value` clipped to be compatible with `spec`.
- """
- return clip_to_bounds(value, spec.minimum, spec.maximum)
-
-
-join_scope = common.join_scope
-def _join_scope(parent_scope, child_scope):
- """Joins a parent and child scope using `/`, checking for empty/none.
-
- Args:
- parent_scope: (string) parent/prefix scope.
- child_scope: (string) child/suffix scope.
- Returns:
- joined scope: (string) parent and child scopes joined by /.
- """
- if not parent_scope:
- return child_scope
- if not child_scope:
- return parent_scope
- return '/'.join([parent_scope, child_scope])
-
-
-def assign_vars(vars_, values):
- """Returns the update ops for assigning a list of vars.
-
- Args:
- vars_: A list of variables.
- values: A list of tensors representing new values.
- Returns:
- A list of update ops for the variables.
- """
- return [var.assign(value) for var, value in zip(vars_, values)]
-
-
-def identity_vars(vars_):
- """Return the identity ops for a list of tensors.
-
- Args:
- vars_: A list of tensors.
- Returns:
- A list of identity ops.
- """
- return [tf.identity(var) for var in vars_]
-
-
-def tile(var, batch_size=1):
- """Return tiled tensor.
-
- Args:
- var: A tensor representing the state.
- batch_size: Batch size.
- Returns:
- A tensor with shape [batch_size,] + var.shape.
- """
- batch_var = tf.tile(
- tf.expand_dims(var, 0),
- (batch_size,) + (1,) * var.get_shape().ndims)
- return batch_var
-
-
-def batch_list(vars_list):
- """Batch a list of variables.
-
- Args:
- vars_list: A list of tensor variables.
- Returns:
- A list of tensor variables with additional first dimension.
- """
- return [tf.expand_dims(var, 0) for var in vars_list]
-
-
-def tf_print(op,
- tensors,
- message="",
- first_n=-1,
- name=None,
- sub_messages=None,
- print_freq=-1,
- include_count=True):
- """tf.Print, but to stdout."""
-  # TODO(shanegu): `name` is deprecated. Remove from the rest of the code.
- global _tf_print_ids
- _tf_print_ids += 1
- name = _tf_print_ids
- _tf_print_counts[name] = 0
- if print_freq > 0:
- _tf_print_running_sums[name] = [0 for _ in tensors]
- _tf_print_running_counts[name] = 0
- def print_message(*xs):
- """print message fn."""
- _tf_print_counts[name] += 1
- if print_freq > 0:
- for i, x in enumerate(xs):
- _tf_print_running_sums[name][i] += x
- _tf_print_running_counts[name] += 1
- if (print_freq <= 0 or _tf_print_running_counts[name] >= print_freq) and (
- first_n < 0 or _tf_print_counts[name] <= first_n):
- for i, x in enumerate(xs):
- if print_freq > 0:
- del x
- x = _tf_print_running_sums[name][i]/_tf_print_running_counts[name]
- if sub_messages is None:
- sub_message = str(i)
- else:
- sub_message = sub_messages[i]
- log_message = "%s, %s" % (message, sub_message)
- if include_count:
- log_message += ", count=%d" % _tf_print_counts[name]
- tf.logging.info("[%s]: %s" % (log_message, x))
- if print_freq > 0:
- for i, x in enumerate(xs):
- _tf_print_running_sums[name][i] = 0
- _tf_print_running_counts[name] = 0
- return xs[0]
-
- print_op = tf.py_func(print_message, tensors, tensors[0].dtype)
- with tf.control_dependencies([print_op]):
- op = tf.identity(op)
- return op
-
-
-periodically = common.periodically
-def _periodically(body, period, name='periodically'):
- """Periodically performs a tensorflow op."""
- if period is None or period == 0:
- return tf.no_op()
-
- if period < 0:
- raise ValueError("period cannot be less than 0.")
-
- if period == 1:
- return body()
-
- with tf.variable_scope(None, default_name=name):
- counter = tf.get_variable(
- "counter",
- shape=[],
- dtype=tf.int64,
- trainable=False,
- initializer=tf.constant_initializer(period, dtype=tf.int64))
-
- def _wrapped_body():
- with tf.control_dependencies([body()]):
- return counter.assign(1)
-
- update = tf.cond(
- tf.equal(counter, period), _wrapped_body,
- lambda: counter.assign_add(1))
-
- return update
-
-soft_variables_update = common.soft_variables_update
diff --git a/research/lfads/README.md b/research/lfads/README.md
deleted file mode 100644
index c75b656e474..00000000000
--- a/research/lfads/README.md
+++ /dev/null
@@ -1,224 +0,0 @@
-![TensorFlow Requirement: 1.x](https://img.shields.io/badge/TensorFlow%20Requirement-1.x-brightgreen)
-![TensorFlow 2 Not Supported](https://img.shields.io/badge/TensorFlow%202%20Not%20Supported-%E2%9C%95-red.svg)
-
-# LFADS - Latent Factor Analysis via Dynamical Systems
-
-This code implements the model from the paper "[LFADS - Latent Factor Analysis via Dynamical Systems](http://biorxiv.org/content/early/2017/06/20/152884)". It is a sequential variational auto-encoder designed specifically for investigating neuroscience data, but can be applied widely to any time series data. In an unsupervised setting, LFADS is able to decompose time series data into various factors, such as an initial condition, a generative dynamical system, control inputs to that generator, and a low dimensional description of the observed data, called the factors. Additionally, the observation model is a loss on a probability distribution, so when LFADS processes a dataset, a denoised version of the dataset is also created. For example, if the dataset is raw spike counts, then under the negative log-likelihood loss under a Poisson distribution, the denoised data would be the inferred Poisson rates.
-
-
-## Prerequisites
-
-The code is written in Python 2.7.6. You will also need:
-
-* **TensorFlow** version 1.5 ([install](https://www.tensorflow.org/install/))
-* **NumPy, SciPy, Matplotlib** ([install SciPy stack](https://www.scipy.org/install.html), contains all of them)
-* **h5py** ([install](https://pypi.python.org/pypi/h5py))
-
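-One possible way to install these (an illustrative sketch only, assuming `pip`
-for Python 2.7; adapt to your own environment, e.g. use `tensorflow-gpu` for
-GPU support) is:
-
-```sh
-$ pip install "tensorflow==1.5.*" numpy scipy matplotlib h5py
-```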
-
-## Getting started
-
-Before starting, add the LFADS repository to your `PYTHONPATH` by running the following:
-
-
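-```sh
-$ export PYTHONPATH=$PYTHONPATH:"path/to/your/directory"
-```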
-
-where "path/to/your/directory" is replaced with the path to the LFADS repository (you can get this path by using the `pwd` command). This allows the nested directories to access modules from their parent directory.
-
-## Generate synthetic data
-
-To generate the synthetic datasets, run the following from the top-level lfads directory:
-
-```sh
-$ cd synth_data
-$ ./run_generate_synth_data.sh
-$ cd ..
-```
-
-These synthetic datasets are provided (1) to give insight into how the LFADS algorithm operates, and (2) to provide reasonable starting points for analyses you may want to apply to your own data.
-
-## Train an LFADS model
-
-Now that we have our example datasets, we can train some models! To spin up an LFADS model on the synthetic data, run any of the following commands. For the examples that are in the paper, the important hyperparameters are roughly replicated. Most hyperparameters are insensitive to small changes or won't ever be changed unless you want a very fine level of control. In the first example, all hyperparameter flags are enumerated for easy copy-pasting, but for the rest of the examples only the most important flags (~the first 9) are specified for brevity. For a full list of flags, their descriptions, and their default values, refer to the top of `run_lfads.py`. Please see Table 1 in the Online Methods of the associated paper for definitions of the most important hyperparameters.
-
-```sh
-# Run LFADS on chaotic rnn data with no input pulses (g = 1.5) with spiking noise
-$ python run_lfads.py --kind=train \
---data_dir=/tmp/rnn_synth_data_v1.0/ \
---data_filename_stem=chaotic_rnn_no_inputs \
---lfads_save_dir=/tmp/lfads_chaotic_rnn_no_inputs \
---co_dim=0 \
---factors_dim=20 \
---ext_input_dim=0 \
---controller_input_lag=1 \
---output_dist=poisson \
---do_causal_controller=false \
---batch_size=128 \
---learning_rate_init=0.01 \
---learning_rate_stop=1e-05 \
---learning_rate_decay_factor=0.95 \
---learning_rate_n_to_compare=6 \
---do_reset_learning_rate=false \
---keep_prob=0.95 \
---con_dim=128 \
---gen_dim=200 \
---ci_enc_dim=128 \
---ic_dim=64 \
---ic_enc_dim=128 \
---ic_prior_var_min=0.1 \
---gen_cell_input_weight_scale=1.0 \
---cell_weight_scale=1.0 \
---do_feed_factors_to_controller=true \
---kl_start_step=0 \
---kl_increase_steps=2000 \
---kl_ic_weight=1.0 \
---l2_con_scale=0.0 \
---l2_gen_scale=2000.0 \
---l2_start_step=0 \
---l2_increase_steps=2000 \
---ic_prior_var_scale=0.1 \
---ic_post_var_min=0.0001 \
---kl_co_weight=1.0 \
---prior_ar_nvar=0.1 \
---cell_clip_value=5.0 \
---max_ckpt_to_keep_lve=5 \
---do_train_prior_ar_atau=true \
---co_prior_var_scale=0.1 \
---csv_log=fitlog \
---feedback_factors_or_rates=factors \
---do_train_prior_ar_nvar=true \
---max_grad_norm=200.0 \
---device=gpu:0 \
---num_steps_for_gen_ic=100000000 \
---ps_nexamples_to_process=100000000 \
---checkpoint_name=lfads_vae \
---temporal_spike_jitter_width=0 \
---checkpoint_pb_load_name=checkpoint \
---inject_ext_input_to_gen=false \
---co_mean_corr_scale=0.0 \
---gen_cell_rec_weight_scale=1.0 \
---max_ckpt_to_keep=5 \
---output_filename_stem="" \
---ic_prior_var_max=0.1 \
---prior_ar_atau=10.0 \
---do_train_io_only=false \
---do_train_encoder_only=false
-
-# Run LFADS on chaotic rnn data with no input pulses (g = 1.5) with Gaussian noise
-$ python run_lfads.py --kind=train \
---data_dir=/tmp/rnn_synth_data_v1.0/ \
---data_filename_stem=gaussian_chaotic_rnn_no_inputs \
---lfads_save_dir=/tmp/lfads_chaotic_rnn_inputs_g2p5 \
---co_dim=1 \
---factors_dim=20 \
---output_dist=gaussian
-
-
-# Run LFADS on chaotic rnn data with input pulses (g = 2.5)
-$ python run_lfads.py --kind=train \
---data_dir=/tmp/rnn_synth_data_v1.0/ \
---data_filename_stem=chaotic_rnn_inputs_g2p5 \
---lfads_save_dir=/tmp/lfads_chaotic_rnn_inputs_g2p5 \
---co_dim=1 \
---factors_dim=20 \
---output_dist=poisson
-
-# Run LFADS on multi-session RNN data
-$ python run_lfads.py --kind=train \
---data_dir=/tmp/rnn_synth_data_v1.0/ \
---data_filename_stem=chaotic_rnn_multisession \
---lfads_save_dir=/tmp/lfads_chaotic_rnn_multisession \
---factors_dim=10 \
---output_dist=poisson
-
-# Run LFADS on integration to bound model data
-$ python run_lfads.py --kind=train \
---data_dir=/tmp/rnn_synth_data_v1.0/ \
---data_filename_stem=itb_rnn \
---lfads_save_dir=/tmp/lfads_itb_rnn \
---co_dim=1 \
---factors_dim=20 \
---controller_input_lag=0 \
---output_dist=poisson
-
-# Run LFADS on chaotic RNN data with labels
-$ python run_lfads.py --kind=train \
---data_dir=/tmp/rnn_synth_data_v1.0/ \
---data_filename_stem=chaotic_rnns_labeled \
---lfads_save_dir=/tmp/lfads_chaotic_rnns_labeled \
---co_dim=0 \
---factors_dim=20 \
---controller_input_lag=0 \
---ext_input_dim=1 \
---output_dist=poisson
-
-# Run LFADS on chaotic rnn data with no input pulses (g = 1.5) with Gaussian noise
-$ python run_lfads.py --kind=train \
---data_dir=/tmp/rnn_synth_data_v1.0/ \
---data_filename_stem=chaotic_rnn_no_inputs \
---lfads_save_dir=/tmp/lfads_chaotic_rnn_no_inputs \
---co_dim=0 \
---factors_dim=20 \
---ext_input_dim=0 \
---controller_input_lag=1 \
---output_dist=gaussian
-
-
-```
-
-**Tip**: If you are running LFADS on a GPU and would like to run more than one model concurrently, set the `--allow_gpu_growth=True` flag on each job; otherwise a single model will reserve the entire GPU by default (for performance). Note that you also need a TensorFlow build with GPU support.
-
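-For example, appending the flag to the chaotic RNN with input pulses command from above:
-
-```sh
-$ python run_lfads.py --kind=train \
---data_dir=/tmp/rnn_synth_data_v1.0/ \
---data_filename_stem=chaotic_rnn_inputs_g2p5 \
---lfads_save_dir=/tmp/lfads_chaotic_rnn_inputs_g2p5 \
---co_dim=1 \
---factors_dim=20 \
---output_dist=poisson \
---allow_gpu_growth=True
-```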
-
-## Visualize a training model
-
-To visualize training curves and various other metrics while training an LFADS model, run the following command on your model directory. For example, to launch TensorBoard for the chaotic RNN data with input pulses:
-
-```sh
-tensorboard --logdir=/tmp/lfads_chaotic_rnn_inputs_g2p5
-```
-
-## Evaluate a trained model
-
-Once your model is finished training, there are multiple ways you can evaluate
-it. Below are some sample commands to evaluate an LFADS model trained on the
-chaotic rnn data with input pulses (g = 2.5). The key differences here are
-setting the `--kind` flag to the appropriate mode, as well as the
-`--checkpoint_pb_load_name` flag to `checkpoint_lve` and the `--batch_size` flag
-(if you'd like to make it larger or smaller). All other flags should be the
-same as used in training, so that the same model architecture is built.
-
-```sh
-# Take samples from posterior then average (denoising operation)
-$ python run_lfads.py --kind=posterior_sample_and_average \
---data_dir=/tmp/rnn_synth_data_v1.0/ \
---data_filename_stem=chaotic_rnn_inputs_g2p5 \
---lfads_save_dir=/tmp/lfads_chaotic_rnn_inputs_g2p5 \
---co_dim=1 \
---factors_dim=20 \
---batch_size=1024 \
---checkpoint_pb_load_name=checkpoint_lve
-
-# Sample from prior (generation of completely new samples)
-$ python run_lfads.py --kind=prior_sample \
---data_dir=/tmp/rnn_synth_data_v1.0/ \
---data_filename_stem=chaotic_rnn_inputs_g2p5 \
---lfads_save_dir=/tmp/lfads_chaotic_rnn_inputs_g2p5 \
---co_dim=1 \
---factors_dim=20 \
---batch_size=50 \
---checkpoint_pb_load_name=checkpoint_lve
-
-# Write down model parameters
-$ python run_lfads.py --kind=write_model_params \
---data_dir=/tmp/rnn_synth_data_v1.0/ \
---data_filename_stem=chaotic_rnn_inputs_g2p5 \
---lfads_save_dir=/tmp/lfads_chaotic_rnn_inputs_g2p5 \
---co_dim=1 \
---factors_dim=20 \
---checkpoint_pb_load_name=checkpoint_lve
-```
-
-## Contact
-
-File any issues with the [issue tracker](https://github.com/tensorflow/models/issues). For any questions or problems, this code is maintained by [@sussillo](https://github.com/sussillo) and [@jazcollins](https://github.com/jazcollins).
-
diff --git a/research/lfads/distributions.py b/research/lfads/distributions.py
deleted file mode 100644
index 351d019af2b..00000000000
--- a/research/lfads/distributions.py
+++ /dev/null
@@ -1,493 +0,0 @@
-# Copyright 2017 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# ==============================================================================
-import numpy as np
-import tensorflow as tf
-from utils import linear, log_sum_exp
-
-class Poisson(object):
- """Poisson distributon
-
- Computes the log probability under the model.
-
- """
- def __init__(self, log_rates):
- """ Create Poisson distributions with log_rates parameters.
-
- Args:
- log_rates: a tensor-like list of log rates underlying the Poisson dist.
- """
- self.logr = log_rates
-
- def logp(self, bin_counts):
- """Compute the log probability for the counts in the bin, under the model.
-
- Args:
- bin_counts: array-like integer counts
-
- Returns:
- The log-probability under the Poisson models for each element of
- bin_counts.
- """
- k = tf.to_float(bin_counts)
- # log poisson(k, r) = log(r^k * e^(-r) / k!) = k log(r) - r - log k!
- # log poisson(k, r=exp(x)) = k * x - exp(x) - lgamma(k + 1)
- return k * self.logr - tf.exp(self.logr) - tf.lgamma(k + 1)
-
-
-def diag_gaussian_log_likelihood(z, mu=0.0, logvar=0.0):
- """Log-likelihood under a Gaussian distribution with diagonal covariance.
- Returns the log-likelihood for each dimension. One should sum the
- results for the log-likelihood under the full multidimensional model.
-
- Args:
- z: The value to compute the log-likelihood.
- mu: The mean of the Gaussian
- logvar: The log variance of the Gaussian.
-
- Returns:
- The log-likelihood under the Gaussian model.
- """
-
- return -0.5 * (logvar + np.log(2*np.pi) + \
- tf.square((z-mu)/tf.exp(0.5*logvar)))
-
-
-def gaussian_pos_log_likelihood(unused_mean, logvar, noise):
- """Gaussian log-likelihood function for a posterior in VAE
-
- Note: This function is specialized for a posterior distribution that has the
- form of z = mean + sigma * noise.
-
- Args:
- unused_mean: ignore
- logvar: The log variance of the distribution
- noise: The noise used in the sampling of the posterior.
-
- Returns:
- The log-likelihood under the Gaussian model.
- """
- # ln N(z; mean, sigma) = - ln(sigma) - 0.5 ln 2pi - noise^2 / 2
- return - 0.5 * (logvar + np.log(2 * np.pi) + tf.square(noise))
-
-
-class Gaussian(object):
- """Base class for Gaussian distribution classes."""
- pass
-
-
-class DiagonalGaussian(Gaussian):
- """Diagonal Gaussian with different constant mean and variances in each
- dimension.
- """
-
- def __init__(self, batch_size, z_size, mean, logvar):
- """Create a diagonal gaussian distribution.
-
- Args:
- batch_size: The size of the batch, i.e. 0th dim in 2D tensor of samples.
- z_size: The dimension of the distribution, i.e. 1st dim in 2D tensor.
- mean: The N-D mean of the distribution.
- logvar: The N-D log variance of the diagonal distribution.
- """
- size__xz = [None, z_size]
- self.mean = mean # bxn already
- self.logvar = logvar # bxn already
- self.noise = noise = tf.random_normal(tf.shape(logvar))
- self.sample = mean + tf.exp(0.5 * logvar) * noise
- mean.set_shape(size__xz)
- logvar.set_shape(size__xz)
- self.sample.set_shape(size__xz)
-
- def logp(self, z=None):
- """Compute the log-likelihood under the distribution.
-
- Args:
- z (optional): value to compute likelihood for, if None, use sample.
-
- Returns:
- The likelihood of z under the model.
- """
- if z is None:
- z = self.sample
-
- # This is needed to make sure that the gradients are simple.
- # The value of the function shouldn't change.
- if z == self.sample:
- return gaussian_pos_log_likelihood(self.mean, self.logvar, self.noise)
-
- return diag_gaussian_log_likelihood(z, self.mean, self.logvar)
-
-
-class LearnableDiagonalGaussian(Gaussian):
- """Diagonal Gaussian whose mean and variance are learned parameters."""
-
- def __init__(self, batch_size, z_size, name, mean_init=0.0,
- var_init=1.0, var_min=0.0, var_max=1000000.0):
- """Create a learnable diagonal gaussian distribution.
-
- Args:
- batch_size: The size of the batch, i.e. 0th dim in 2D tensor of samples.
- z_size: The dimension of the distribution, i.e. 1st dim in 2D tensor.
- name: prefix name for the mean and logvar TF variables.
- mean_init (optional): The N-D mean initialization of the distribution.
- var_init (optional): The N-D variance initialization of the diagonal
- distribution.
- var_min (optional): The minimum value the learned variance can take in any
- dimension.
- var_max (optional): The maximum value the learned variance can take in any
- dimension.
- """
-
- size_1xn = [1, z_size]
- size__xn = [None, z_size]
- size_bx1 = tf.stack([batch_size, 1])
- assert var_init > 0.0, "var_init must be positive."
- assert var_max >= var_min, "var_max must be >= var_min."
- assert var_init >= var_min, "var_init must be >= var_min."
- assert var_max >= var_init, "var_max must be >= var_init."
-
-
- z_mean_1xn = tf.get_variable(name=name+"/mean", shape=size_1xn,
- initializer=tf.constant_initializer(mean_init))
- self.mean_bxn = mean_bxn = tf.tile(z_mean_1xn, size_bx1)
- mean_bxn.set_shape(size__xn) # tile loses shape
-
- log_var_init = np.log(var_init)
- if var_max > var_min:
- var_is_trainable = True
- else:
- var_is_trainable = False
-
- z_logvar_1xn = \
- tf.get_variable(name=(name+"/logvar"), shape=size_1xn,
- initializer=tf.constant_initializer(log_var_init),
- trainable=var_is_trainable)
-
- if var_is_trainable:
- z_logit_var_1xn = tf.exp(z_logvar_1xn)
- z_var_1xn = tf.nn.sigmoid(z_logit_var_1xn)*(var_max-var_min) + var_min
- z_logvar_1xn = tf.log(z_var_1xn)
-
- logvar_bxn = tf.tile(z_logvar_1xn, size_bx1)
- self.logvar_bxn = logvar_bxn
- self.noise_bxn = noise_bxn = tf.random_normal(tf.shape(logvar_bxn))
- self.sample_bxn = mean_bxn + tf.exp(0.5 * logvar_bxn) * noise_bxn
-
- def logp(self, z=None):
- """Compute the log-likelihood under the distribution.
-
- Args:
- z (optional): value to compute likelihood for, if None, use sample.
-
- Returns:
- The likelihood of z under the model.
- """
- if z is None:
- z = self.sample
-
- # This is needed to make sure that the gradients are simple.
- # The value of the function shouldn't change.
- if z == self.sample_bxn:
- return gaussian_pos_log_likelihood(self.mean_bxn, self.logvar_bxn,
- self.noise_bxn)
-
- return diag_gaussian_log_likelihood(z, self.mean_bxn, self.logvar_bxn)
-
- @property
- def mean(self):
- return self.mean_bxn
-
- @property
- def logvar(self):
- return self.logvar_bxn
-
- @property
- def sample(self):
- return self.sample_bxn
-
-
-class DiagonalGaussianFromInput(Gaussian):
- """Diagonal Gaussian whose mean and variance are conditioned on other
- variables.
-
- Note: the parameters to convert from input to the learned mean and log
- variance are held in this class.
- """
-
- def __init__(self, x_bxu, z_size, name, var_min=0.0):
- """Create an input dependent diagonal Gaussian distribution.
-
- Args:
- x_bxu: The input tensor from which the mean and variance are computed,
- via a linear transformation of x. I.e.
- mu = Wx + b, log(var) = Mx + c
- z_size: The size of the distribution.
- name: The name to prefix to learned variables.
- var_min (optional): Minimal variance allowed. This is an additional
- way to control the amount of information getting through the stochastic
- layer.
- """
- size_bxn = tf.stack([tf.shape(x_bxu)[0], z_size])
- self.mean_bxn = mean_bxn = linear(x_bxu, z_size, name=(name+"/mean"))
- logvar_bxn = linear(x_bxu, z_size, name=(name+"/logvar"))
- if var_min > 0.0:
- logvar_bxn = tf.log(tf.exp(logvar_bxn) + var_min)
- self.logvar_bxn = logvar_bxn
-
- self.noise_bxn = noise_bxn = tf.random_normal(size_bxn)
- self.noise_bxn.set_shape([None, z_size])
- self.sample_bxn = mean_bxn + tf.exp(0.5 * logvar_bxn) * noise_bxn
-
- def logp(self, z=None):
- """Compute the log-likelihood under the distribution.
-
- Args:
- z (optional): value to compute likelihood for, if None, use sample.
-
- Returns:
- The likelihood of z under the model.
- """
-
- if z is None:
- z = self.sample
-
- # This is needed to make sure that the gradients are simple.
- # The value of the function shouldn't change.
- if z == self.sample_bxn:
- return gaussian_pos_log_likelihood(self.mean_bxn,
- self.logvar_bxn, self.noise_bxn)
-
- return diag_gaussian_log_likelihood(z, self.mean_bxn, self.logvar_bxn)
-
- @property
- def mean(self):
- return self.mean_bxn
-
- @property
- def logvar(self):
- return self.logvar_bxn
-
- @property
- def sample(self):
- return self.sample_bxn
-
-
-class GaussianProcess:
- """Base class for Gaussian processes."""
- pass
-
-
-class LearnableAutoRegressive1Prior(GaussianProcess):
- """AR(1) model where autocorrelation and process variance are learned
- parameters. Assumed zero mean.
-
- """
-
- def __init__(self, batch_size, z_size,
- autocorrelation_taus, noise_variances,
- do_train_prior_ar_atau, do_train_prior_ar_nvar,
- num_steps, name):
- """Create a learnable autoregressive (1) process.
-
- Args:
- batch_size: The size of the batch, i.e. 0th dim in 2D tensor of samples.
- z_size: The dimension of the distribution, i.e. 1st dim in 2D tensor.
- autocorrelation_taus: The auto correlation time constant of the AR(1)
- process.
- A value of 0 is uncorrelated gaussian noise.
- noise_variances: The variance of the additive noise, *not* the process
- variance.
- do_train_prior_ar_atau: Train or leave as constant, the autocorrelation?
- do_train_prior_ar_nvar: Train or leave as constant, the noise variance?
- num_steps: Number of steps to run the process.
- name: The name to prefix to learned TF variables.
- """
-
- # Note the use of the plural in all of these quantities. This is intended
- # to mark that even though a sample z_t from the posterior is thought of as a
- # single sample of a multidimensional gaussian, the prior is actually
- # thought of as U AR(1) processes, where U is the dimension of the inferred
- # input.
- size_bx1 = tf.stack([batch_size, 1])
- size__xu = [None, z_size]
- # process variance, the variance at time t over all instantiations of AR(1)
- # with these parameters.
- log_evar_inits_1xu = tf.expand_dims(tf.log(noise_variances), 0)
- self.logevars_1xu = logevars_1xu = \
- tf.Variable(log_evar_inits_1xu, name=name+"/logevars", dtype=tf.float32,
- trainable=do_train_prior_ar_nvar)
- self.logevars_bxu = logevars_bxu = tf.tile(logevars_1xu, size_bx1)
- logevars_bxu.set_shape(size__xu) # tile loses shape
-
- # \tau, which is the autocorrelation time constant of the AR(1) process
- log_atau_inits_1xu = tf.expand_dims(tf.log(autocorrelation_taus), 0)
- self.logataus_1xu = logataus_1xu = \
- tf.Variable(log_atau_inits_1xu, name=name+"/logatau", dtype=tf.float32,
- trainable=do_train_prior_ar_atau)
-
- # phi in x_t = \mu + phi x_tm1 + \eps
- # phi = exp(-1/tau)
- # phi = exp(-1/exp(logtau))
- # phi = exp(-exp(-logtau))
- phis_1xu = tf.exp(-tf.exp(-logataus_1xu))
- self.phis_bxu = phis_bxu = tf.tile(phis_1xu, size_bx1)
- phis_bxu.set_shape(size__xu)
-
- # process noise
- # pvar = evar / (1- phi^2)
- # logpvar = log ( exp(logevar) / (1 - phi^2) )
- # logpvar = logevar - log(1-phi^2)
- # logpvar = logevar - (log(1-phi) + log(1+phi))
- self.logpvars_1xu = \
- logevars_1xu - tf.log(1.0-phis_1xu) - tf.log(1.0+phis_1xu)
- self.logpvars_bxu = logpvars_bxu = tf.tile(self.logpvars_1xu, size_bx1)
- logpvars_bxu.set_shape(size__xu)
-
- # process mean (zero, but included for completeness)
- self.pmeans_bxu = pmeans_bxu = tf.zeros_like(phis_bxu)
-
- # For sampling from the prior during de-novo generation.
- self.means_t = means_t = [None] * num_steps
- self.logvars_t = logvars_t = [None] * num_steps
- self.samples_t = samples_t = [None] * num_steps
- self.gaussians_t = gaussians_t = [None] * num_steps
- sample_bxu = tf.zeros_like(phis_bxu)
- for t in range(num_steps):
- # process variance used here to make process completely stationary
- if t == 0:
- logvar_pt_bxu = self.logpvars_bxu
- else:
- logvar_pt_bxu = self.logevars_bxu
-
- z_mean_pt_bxu = pmeans_bxu + phis_bxu * sample_bxu
- gaussians_t[t] = DiagonalGaussian(batch_size, z_size,
- mean=z_mean_pt_bxu,
- logvar=logvar_pt_bxu)
- sample_bxu = gaussians_t[t].sample
- samples_t[t] = sample_bxu
- logvars_t[t] = logvar_pt_bxu
- means_t[t] = z_mean_pt_bxu
-
- def logp_t(self, z_t_bxu, z_tm1_bxu=None):
- """Compute the log-likelihood under the distribution for a given time t,
- not the whole sequence.
-
- Args:
- z_t_bxu: sample to compute likelihood for at time t.
- z_tm1_bxu (optional): sample condition probability of z_t upon.
-
- Returns:
- The likelihood of p_t under the model at time t. i.e.
- p(z_t|z_tm1_bxu) = N(z_tm1_bxu * phis, eps^2)
-
- """
- if z_tm1_bxu is None:
- return diag_gaussian_log_likelihood(z_t_bxu, self.pmeans_bxu,
- self.logpvars_bxu)
- else:
- means_t_bxu = self.pmeans_bxu + self.phis_bxu * z_tm1_bxu
- logp_tgtm1_bxu = diag_gaussian_log_likelihood(z_t_bxu,
- means_t_bxu,
- self.logevars_bxu)
- return logp_tgtm1_bxu
-
-
-class KLCost_GaussianGaussian(object):
- """log p(x|z) + KL(q||p) terms for Gaussian posterior and Gaussian prior. See
- eqn 10 and Appendix B in VAE for latter term,
- http://arxiv.org/abs/1312.6114
-
- The log p(x|z) term is the reconstruction error under the model.
- The KL term represents the penalty for passing information from the encoder
- to the decoder.
- To sample KL(q||p), we simply sample
- ln q - ln p
- by drawing samples from q and averaging.
- """
-
- def __init__(self, zs, prior_zs):
- """Create a lower bound in three parts, normalized reconstruction
- cost, normalized KL divergence cost, and their sum.
-
- E_q[ln p(z_i | z_{i+1}) / q(z_i | x)]
- \int q(z) ln p(z) dz = - 0.5 ln(2pi) - 0.5 \sum (ln(sigma_p^2) + \
- sigma_q^2 / sigma_p^2 + (mean_p - mean_q)^2 / sigma_p^2)
-
- \int q(z) ln q(z) dz = - 0.5 ln(2pi) - 0.5 \sum (ln(sigma_q^2) + 1)
-
- Args:
- zs: posterior z ~ q(z|x)
- prior_zs: prior zs
- """
- # L = -KL + log p(x|z), to maximize bound on likelihood
- # -L = KL - log p(x|z), to minimize bound on NLL
- # so 'KL cost' is positive KL divergence
- kl_b = 0.0
- for z, prior_z in zip(zs, prior_zs):
- assert isinstance(z, Gaussian)
- assert isinstance(prior_z, Gaussian)
- # ln(2pi) terms cancel
- kl_b += 0.5 * tf.reduce_sum(
- prior_z.logvar - z.logvar
- + tf.exp(z.logvar - prior_z.logvar)
- + tf.square((z.mean - prior_z.mean) / tf.exp(0.5 * prior_z.logvar))
- - 1.0, [1])
-
- self.kl_cost_b = kl_b
- self.kl_cost = tf.reduce_mean(kl_b)
-
-
-class KLCost_GaussianGaussianProcessSampled(object):
- """ log p(x|z) + KL(q||p) terms for Gaussian posterior and Gaussian process
- prior via sampling.
-
- The log p(x|z) term is the reconstruction error under the model.
- The KL term represents the penalty for passing information from the encoder
- to the decoder.
- To sample KL(q||p), we simply sample
- ln q - ln p
- by drawing samples from q and averaging.
- """
-
- def __init__(self, post_zs, prior_z_process):
- """Create a lower bound in three parts, normalized reconstruction
- cost, normalized KL divergence cost, and their sum.
-
- Args:
- post_zs: posterior z ~ q(z|x)
- prior_z_process: prior AR(1) process
- """
- assert len(post_zs) > 1, "GP is for time, need more than 1 time step."
- assert isinstance(prior_z_process, GaussianProcess), "Must use GP."
-
- # L = -KL + log p(x|z), to maximize bound on likelihood
- # -L = KL - log p(x|z), to minimize bound on NLL
- # so 'KL cost' is positive KL divergence
- z0_bxu = post_zs[0].sample
- logq_bxu = post_zs[0].logp(z0_bxu)
- logp_bxu = prior_z_process.logp_t(z0_bxu)
- z_tm1_bxu = z0_bxu
- for z_t in post_zs[1:]:
- # posterior is independent in time, prior is not
- z_t_bxu = z_t.sample
- logq_bxu += z_t.logp(z_t_bxu)
- logp_bxu += prior_z_process.logp_t(z_t_bxu, z_tm1_bxu)
- z_tm1_bxu = z_t_bxu
-
- kl_bxu = logq_bxu - logp_bxu
- kl_b = tf.reduce_sum(kl_bxu, [1])
- self.kl_cost_b = kl_b
- self.kl_cost = tf.reduce_mean(kl_b)
diff --git a/research/lfads/lfads.py b/research/lfads/lfads.py
deleted file mode 100644
index 925484c62eb..00000000000
--- a/research/lfads/lfads.py
+++ /dev/null
@@ -1,2170 +0,0 @@
-# Copyright 2017 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# ==============================================================================
-"""
-LFADS - Latent Factor Analysis via Dynamical Systems.
-
-LFADS is an unsupervised method to decompose time series data into
-various factors, such as an initial condition, a generative
-dynamical system, control inputs to that generator, and a low
-dimensional description of the observed data, called the factors.
-Additionally, the observations have a noise model (in this case
-Poisson), so a denoised version of the observations is also created
-(e.g. underlying rates of a Poisson distribution given the observed
-event counts).
-
-The main data structure being passed around is a dataset. This is a dictionary
-of data dictionaries.
-
-DATASET: The top level dictionary is simply name (string -> dictionary).
-The nested dictionary is the DATA DICTIONARY, which has the following keys:
- 'train_data' and 'valid_data', whose values are the corresponding training
- and validation data with shape
- ExTxD, E - # examples, T - # time steps, D - # dimensions in data.
- The data dictionary also has a few more keys:
- 'train_ext_input' and 'valid_ext_input', if there are known external inputs
- to the system being modeled, these take on dimensions:
- ExTxI, E - # examples, T - # time steps, I = # dimensions in input.
- 'alignment_matrix_cxf' - If you are using multiple days' data, it's possible
- to align the channels (see manuscript). If so, each dataset will
- contain this matrix, which will be used for both the input adapter and the
- output adapter for each dataset. These matrices, if provided, must be of
- size [data_dim x factors], where data_dim is the number of neurons recorded
- on that day, and factors is chosen and set through the '--factors_dim' flag.
- 'alignment_bias_c' - See alignment_matrix_cxf. This bias is used as
- the offset for the alignment transformation; it is *subtracted* from the
- data, so PCA-style inits can align factors across sessions.
-
-
- If one runs LFADS on data where the true rates are known for some trials
- (say, simulated testing data, as in the example shipped with the paper), then
- one can add three more fields for plotting purposes. These are 'train_truth'
- and 'valid_truth', and 'conversion_factor'. These have the same dimensions as
- 'train_data', and 'valid_data' but represent the underlying rates of the
- observations. Finally, if one needs to convert scale for plotting the true
- underlying firing rates, there is the 'conversion_factor' key.
-"""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-
-import numpy as np
-import os
-import tensorflow as tf
-from distributions import LearnableDiagonalGaussian, DiagonalGaussianFromInput
-from distributions import diag_gaussian_log_likelihood
-from distributions import KLCost_GaussianGaussian, Poisson
-from distributions import LearnableAutoRegressive1Prior
-from distributions import KLCost_GaussianGaussianProcessSampled
-
-from utils import init_linear, linear, list_t_bxn_to_tensor_bxtxn, write_data
-from utils import log_sum_exp, flatten
-from plot_lfads import plot_lfads
-
-
-class GRU(object):
- """Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078).
-
- """
- def __init__(self, num_units, forget_bias=1.0, weight_scale=1.0,
- clip_value=np.inf, collections=None):
- """Create a GRU object.
-
- Args:
- num_units: Number of units in the GRU.
- forget_bias (optional): Hack to help learning.
- weight_scale (optional): Weights are scaled by ws/sqrt(#inputs), with
- ws being the weight scale.
- clip_value (optional): If the recurrent values grow above this value,
- clip them.
- collections (optional): List of additional collections variables should
- belong to.
- """
- self._num_units = num_units
- self._forget_bias = forget_bias
- self._weight_scale = weight_scale
- self._clip_value = clip_value
- self._collections = collections
-
- @property
- def state_size(self):
- return self._num_units
-
- @property
- def output_size(self):
- return self._num_units
-
- @property
- def state_multiplier(self):
- return 1
-
- def output_from_state(self, state):
- """Return the output portion of the state."""
- return state
-
- def __call__(self, inputs, state, scope=None):
- """Gated recurrent unit (GRU) function.
-
- Args:
- inputs: A 2D batch x input_dim tensor of inputs.
- state: The previous state from the last time step.
- scope (optional): TF variable scope for defined GRU variables.
-
- Returns:
- A tuple (state, state), where state is the newly computed state at time t.
- It is returned twice to respect an interface that works for LSTMs.
- """
-
- x = inputs
- h = state
- if inputs is not None:
- xh = tf.concat(axis=1, values=[x, h])
- else:
- xh = h
-
- with tf.variable_scope(scope or type(self).__name__): # "GRU"
- with tf.variable_scope("Gates"): # Reset gate and update gate.
- # We start with bias of 1.0 to not reset and not update.
- r, u = tf.split(axis=1, num_or_size_splits=2, value=linear(xh,
- 2 * self._num_units,
- alpha=self._weight_scale,
- name="xh_2_ru",
- collections=self._collections))
- r, u = tf.sigmoid(r), tf.sigmoid(u + self._forget_bias)
- with tf.variable_scope("Candidate"):
- xrh = tf.concat(axis=1, values=[x, r * h])
- c = tf.tanh(linear(xrh, self._num_units, name="xrh_2_c",
- collections=self._collections))
- new_h = u * h + (1 - u) * c
- new_h = tf.clip_by_value(new_h, -self._clip_value, self._clip_value)
-
- return new_h, new_h
-
-
-class GenGRU(object):
- """Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078).
-
- This version is specialized for the generator, but isn't as fast, so
- we have two. Note this allows for l2 regularization on the recurrent
- weights, but also implicitly rescales the inputs via the 1/sqrt(input)
- scaling in the linear helper routine to be large magnitude, if there are
- fewer inputs than recurrent state.
-
- """
- def __init__(self, num_units, forget_bias=1.0,
- input_weight_scale=1.0, rec_weight_scale=1.0, clip_value=np.inf,
- input_collections=None, recurrent_collections=None):
- """Create a GRU object.
-
- Args:
- num_units: Number of units in the GRU.
- forget_bias (optional): Hack to help learning.
- input_weight_scale (optional): Weights are scaled ws/sqrt(#inputs), with
- ws being the weight scale.
- rec_weight_scale (optional): Weights are scaled ws/sqrt(#inputs),
- with ws being the weight scale.
- clip_value (optional): If the recurrent values grow above this value,
- clip them.
- input_collections (optional): List of additional collections variables
- that input->rec weights should belong to.
- recurrent_collections (optional): List of additional collections variables
- that rec->rec weights should belong to.
- """
- self._num_units = num_units
- self._forget_bias = forget_bias
- self._input_weight_scale = input_weight_scale
- self._rec_weight_scale = rec_weight_scale
- self._clip_value = clip_value
- self._input_collections = input_collections
- self._rec_collections = recurrent_collections
-
- @property
- def state_size(self):
- return self._num_units
-
- @property
- def output_size(self):
- return self._num_units
-
- @property
- def state_multiplier(self):
- return 1
-
- def output_from_state(self, state):
- """Return the output portion of the state."""
- return state
-
- def __call__(self, inputs, state, scope=None):
- """Gated recurrent unit (GRU) function.
-
- Args:
- inputs: A 2D batch x input_dim tensor of inputs.
- state: The previous state from the last time step.
- scope (optional): TF variable scope for defined GRU variables.
-
- Returns:
- A tuple (state, state), where state is the newly computed state at time t.
- It is returned twice to respect an interface that works for LSTMs.
- """
-
- x = inputs
- h = state
- with tf.variable_scope(scope or type(self).__name__): # "GRU"
- with tf.variable_scope("Gates"): # Reset gate and update gate.
- # We start with bias of 1.0 to not reset and not update.
- r_x = u_x = 0.0
- if x is not None:
- r_x, u_x = tf.split(axis=1, num_or_size_splits=2, value=linear(x,
- 2 * self._num_units,
- alpha=self._input_weight_scale,
- do_bias=False,
- name="x_2_ru",
- normalized=False,
- collections=self._input_collections))
-
- r_h, u_h = tf.split(axis=1, num_or_size_splits=2, value=linear(h,
- 2 * self._num_units,
- do_bias=True,
- alpha=self._rec_weight_scale,
- name="h_2_ru",
- collections=self._rec_collections))
- r = r_x + r_h
- u = u_x + u_h
- r, u = tf.sigmoid(r), tf.sigmoid(u + self._forget_bias)
-
- with tf.variable_scope("Candidate"):
- c_x = 0.0
- if x is not None:
- c_x = linear(x, self._num_units, name="x_2_c", do_bias=False,
- alpha=self._input_weight_scale,
- normalized=False,
- collections=self._input_collections)
- c_rh = linear(r*h, self._num_units, name="rh_2_c", do_bias=True,
- alpha=self._rec_weight_scale,
- collections=self._rec_collections)
- c = tf.tanh(c_x + c_rh)
-
- new_h = u * h + (1 - u) * c
- new_h = tf.clip_by_value(new_h, -self._clip_value, self._clip_value)
-
- return new_h, new_h
-
-
-class LFADS(object):
- """LFADS - Latent Factor Analysis via Dynamical Systems.
-
- LFADS is an unsupervised method to decompose time series data into
- various factors, such as an initial condition, a generative
- dynamical system, inferred inputs to that generator, and a low
- dimensional description of the observed data, called the factors.
- Additionally, the observations have a noise model (in this case
- Poisson), so a denoised version of the observations is also created
- (e.g. underlying rates of a Poisson distribution given the observed
- event counts).
- """
-
- def __init__(self, hps, kind="train", datasets=None):
- """Create an LFADS model.
-
- train - a model for training, sampling of posteriors is used
- posterior_sample_and_average - sample from the posterior, this is used
- for evaluating the expected value of the outputs of LFADS, given a
- specific input, by averaging over multiple samples from the approx
- posterior. Also used for the lower bound on the negative
- log-likelihood using IWAE error (Importance Weighted Auto-encoder).
- This is the denoising operation.
- prior_sample - a model for generation - sampling from priors is used
-
- Args:
- hps: The dictionary of hyper parameters.
- kind: The type of model to build (see above).
- datasets: A dictionary of named data_dictionaries, see top of lfads.py
- """
- print("Building graph...")
- all_kinds = ['train', 'posterior_sample_and_average', 'posterior_push_mean',
- 'prior_sample']
- assert kind in all_kinds, 'Wrong kind'
- if hps.feedback_factors_or_rates == "rates":
- assert len(hps.dataset_names) == 1, \
- "Multiple datasets not supported for rate feedback."
- num_steps = hps.num_steps
- ic_dim = hps.ic_dim
- co_dim = hps.co_dim
- ext_input_dim = hps.ext_input_dim
- cell_class = GRU
- gen_cell_class = GenGRU
-
- def makelambda(v): # Used with tf.case
- return lambda: v
-
- # Define the data placeholder, and deal with all parts of the graph
- # that are dataset dependent.
- self.dataName = tf.placeholder(tf.string, shape=())
- # The batch_size will be inferred from the data, as usual.
- # Additionally, the data_dim will be inferred as well, allowing for a
- # single placeholder for all datasets, regardless of data dimension.
- if hps.output_dist == 'poisson':
- # Enforce correct dtype
- assert np.issubdtype(
- datasets[hps.dataset_names[0]]['train_data'].dtype, int), \
- "Data dtype must be int for poisson output distribution"
- data_dtype = tf.int32
- elif hps.output_dist == 'gaussian':
- assert np.issubdtype(
- datasets[hps.dataset_names[0]]['train_data'].dtype, float), \
- "Data dtype must be float for gaussian output dsitribution"
- data_dtype = tf.float32
- else:
- assert False, "NIY"
- self.dataset_ph = dataset_ph = tf.placeholder(data_dtype,
- [None, num_steps, None],
- name="data")
- self.train_step = tf.get_variable("global_step", [], tf.int64,
- tf.zeros_initializer(),
- trainable=False)
- self.hps = hps
- ndatasets = hps.ndatasets
- factors_dim = hps.factors_dim
- self.preds = preds = [None] * ndatasets
- self.fns_in_fac_Ws = fns_in_fac_Ws = [None] * ndatasets
- self.fns_in_fac_bs = fns_in_fac_bs = [None] * ndatasets
- self.fns_out_fac_Ws = fns_out_fac_Ws = [None] * ndatasets
- self.fns_out_fac_bs = fns_out_fac_bs = [None] * ndatasets
- self.datasetNames = dataset_names = hps.dataset_names
- self.ext_inputs = ext_inputs = None
-
- if len(dataset_names) == 1: # single session
- if 'alignment_matrix_cxf' in datasets[dataset_names[0]].keys():
- used_in_factors_dim = factors_dim
- in_identity_if_poss = False
- else:
- used_in_factors_dim = hps.dataset_dims[dataset_names[0]]
- in_identity_if_poss = True
- else: # multisession
- used_in_factors_dim = factors_dim
- in_identity_if_poss = False
-
- for d, name in enumerate(dataset_names):
- data_dim = hps.dataset_dims[name]
- in_mat_cxf = None
- in_bias_1xf = None
- align_bias_1xc = None
-
- if datasets and 'alignment_matrix_cxf' in datasets[name].keys():
- dataset = datasets[name]
- if hps.do_train_readin:
- print("Initializing trainable readin matrix with alignment matrix" \
- " provided for dataset:", name)
- else:
- print("Setting non-trainable readin matrix to alignment matrix" \
- " provided for dataset:", name)
- in_mat_cxf = dataset['alignment_matrix_cxf'].astype(np.float32)
- if in_mat_cxf.shape != (data_dim, factors_dim):
- raise ValueError("""Alignment matrix must have dimensions %d x %d
- (data_dim x factors_dim), but currently has %d x %d."""%
- (data_dim, factors_dim, in_mat_cxf.shape[0],
- in_mat_cxf.shape[1]))
- if datasets and 'alignment_bias_c' in datasets[name].keys():
- dataset = datasets[name]
- if hps.do_train_readin:
- print("Initializing trainable readin bias with alignment bias " \
- "provided for dataset:", name)
- else:
- print("Setting non-trainable readin bias to alignment bias " \
- "provided for dataset:", name)
- align_bias_c = dataset['alignment_bias_c'].astype(np.float32)
- align_bias_1xc = np.expand_dims(align_bias_c, axis=0)
- if align_bias_1xc.shape[1] != data_dim:
- raise ValueError("""Alignment bias must have dimensions %d
- (data_dim), but currently has %d."""%
- (data_dim, align_bias_1xc.shape[1]))
- if in_mat_cxf is not None and align_bias_1xc is not None:
- # (data - alignment_bias) * W_in
- # data * W_in - alignment_bias * W_in
- # So b = -alignment_bias * W_in to accommodate PCA style offset.
- in_bias_1xf = -np.dot(align_bias_1xc, in_mat_cxf)
-
- if hps.do_train_readin:
- # Add to the IO_transformations collection only if we want it to be
- # learnable, because the IO_transformations collection is trained
- # when do_train_io_only is set.
- collections_readin=['IO_transformations']
- else:
- collections_readin=None
-
- in_fac_lin = init_linear(data_dim, used_in_factors_dim,
- do_bias=True,
- mat_init_value=in_mat_cxf,
- bias_init_value=in_bias_1xf,
- identity_if_possible=in_identity_if_poss,
- normalized=False, name="x_2_infac_"+name,
- collections=collections_readin,
- trainable=hps.do_train_readin)
- in_fac_W, in_fac_b = in_fac_lin
- fns_in_fac_Ws[d] = makelambda(in_fac_W)
- fns_in_fac_bs[d] = makelambda(in_fac_b)
-
- with tf.variable_scope("glm"):
- out_identity_if_poss = False
- if len(dataset_names) == 1 and \
- factors_dim == hps.dataset_dims[dataset_names[0]]:
- out_identity_if_poss = True
- for d, name in enumerate(dataset_names):
- data_dim = hps.dataset_dims[name]
- in_mat_cxf = None
- if datasets and 'alignment_matrix_cxf' in datasets[name].keys():
- dataset = datasets[name]
- in_mat_cxf = dataset['alignment_matrix_cxf'].astype(np.float32)
-
- if datasets and 'alignment_bias_c' in datasets[name].keys():
- dataset = datasets[name]
- align_bias_c = dataset['alignment_bias_c'].astype(np.float32)
- align_bias_1xc = np.expand_dims(align_bias_c, axis=0)
-
- out_mat_fxc = None
- out_bias_1xc = None
- if in_mat_cxf is not None:
- out_mat_fxc = in_mat_cxf.T
- if align_bias_1xc is not None:
- out_bias_1xc = align_bias_1xc
-
- if hps.output_dist == 'poisson':
- out_fac_lin = init_linear(factors_dim, data_dim, do_bias=True,
- mat_init_value=out_mat_fxc,
- bias_init_value=out_bias_1xc,
- identity_if_possible=out_identity_if_poss,
- normalized=False,
- name="fac_2_logrates_"+name,
- collections=['IO_transformations'])
- out_fac_W, out_fac_b = out_fac_lin
-
- elif hps.output_dist == 'gaussian':
- out_fac_lin_mean = \
- init_linear(factors_dim, data_dim, do_bias=True,
- mat_init_value=out_mat_fxc,
- bias_init_value=out_bias_1xc,
- normalized=False,
- name="fac_2_means_"+name,
- collections=['IO_transformations'])
- out_fac_W_mean, out_fac_b_mean = out_fac_lin_mean
-
- mat_init_value = np.zeros([factors_dim, data_dim]).astype(np.float32)
- bias_init_value = np.ones([1, data_dim]).astype(np.float32)
- out_fac_lin_logvar = \
- init_linear(factors_dim, data_dim, do_bias=True,
- mat_init_value=mat_init_value,
- bias_init_value=bias_init_value,
- normalized=False,
- name="fac_2_logvars_"+name,
- collections=['IO_transformations'])
- out_fac_W_mean, out_fac_b_mean = out_fac_lin_mean
- out_fac_W_logvar, out_fac_b_logvar = out_fac_lin_logvar
- out_fac_W = tf.concat(
- axis=1, values=[out_fac_W_mean, out_fac_W_logvar])
- out_fac_b = tf.concat(
- axis=1, values=[out_fac_b_mean, out_fac_b_logvar])
- else:
- assert False, "NIY"
-
- preds[d] = tf.equal(tf.constant(name), self.dataName)
- data_dim = hps.dataset_dims[name]
- fns_out_fac_Ws[d] = makelambda(out_fac_W)
- fns_out_fac_bs[d] = makelambda(out_fac_b)
-
- pf_pairs_in_fac_Ws = zip(preds, fns_in_fac_Ws)
- pf_pairs_in_fac_bs = zip(preds, fns_in_fac_bs)
- pf_pairs_out_fac_Ws = zip(preds, fns_out_fac_Ws)
- pf_pairs_out_fac_bs = zip(preds, fns_out_fac_bs)
-
- this_in_fac_W = tf.case(pf_pairs_in_fac_Ws, exclusive=True)
- this_in_fac_b = tf.case(pf_pairs_in_fac_bs, exclusive=True)
- this_out_fac_W = tf.case(pf_pairs_out_fac_Ws, exclusive=True)
- this_out_fac_b = tf.case(pf_pairs_out_fac_bs, exclusive=True)
-
- # External inputs (not changing by dataset, by definition).
- if hps.ext_input_dim > 0:
- self.ext_input = tf.placeholder(tf.float32,
- [None, num_steps, ext_input_dim],
- name="ext_input")
- else:
- self.ext_input = None
- ext_input_bxtxi = self.ext_input
-
- self.keep_prob = keep_prob = tf.placeholder(tf.float32, [], "keep_prob")
- self.batch_size = batch_size = int(hps.batch_size)
- self.learning_rate = tf.Variable(float(hps.learning_rate_init),
- trainable=False, name="learning_rate")
- self.learning_rate_decay_op = self.learning_rate.assign(
- self.learning_rate * hps.learning_rate_decay_factor)
-
- # Dropout the data.
- dataset_do_bxtxd = tf.nn.dropout(tf.to_float(dataset_ph), keep_prob)
- if hps.ext_input_dim > 0:
- ext_input_do_bxtxi = tf.nn.dropout(ext_input_bxtxi, keep_prob)
- else:
- ext_input_do_bxtxi = None
-
- # ENCODERS
- def encode_data(dataset_bxtxd, enc_cell, name, forward_or_reverse,
- num_steps_to_encode):
- """Encode data for LFADS
- Args:
- dataset_bxtxd - the data to encode, as a 3-tensor, with dims
- batch x time x data.
- enc_cell: encoder cell
- name: name of encoder
- forward_or_reverse: string, encode in forward or reverse direction
- num_steps_to_encode: number of steps to encode, 0:num_steps_to_encode
- Returns:
- encoded data as a list with num_steps_to_encode items, in order
- """
- if forward_or_reverse == "forward":
- dstr = "_fwd"
- time_fwd_or_rev = range(num_steps_to_encode)
- else:
- dstr = "_rev"
- time_fwd_or_rev = reversed(range(num_steps_to_encode))
-
- with tf.variable_scope(name+"_enc"+dstr, reuse=False):
- enc_state = tf.tile(
- tf.Variable(tf.zeros([1, enc_cell.state_size]),
- name=name+"_enc_t0"+dstr), tf.stack([batch_size, 1]))
- enc_state.set_shape([None, enc_cell.state_size]) # tile loses shape
-
- enc_outs = [None] * num_steps_to_encode
- for i, t in enumerate(time_fwd_or_rev):
- with tf.variable_scope(name+"_enc"+dstr, reuse=True if i > 0 else None):
- dataset_t_bxd = dataset_bxtxd[:,t,:]
- in_fac_t_bxf = tf.matmul(dataset_t_bxd, this_in_fac_W) + this_in_fac_b
- in_fac_t_bxf.set_shape([None, used_in_factors_dim])
- if ext_input_dim > 0 and not hps.inject_ext_input_to_gen:
- ext_input_t_bxi = ext_input_do_bxtxi[:,t,:]
- enc_input_t_bxfpe = tf.concat(
- axis=1, values=[in_fac_t_bxf, ext_input_t_bxi])
- else:
- enc_input_t_bxfpe = in_fac_t_bxf
- enc_out, enc_state = enc_cell(enc_input_t_bxfpe, enc_state)
- enc_outs[t] = enc_out
-
- return enc_outs
-
- # Encode initial condition means and variances
- # ([x_T, x_T-1, ... x_0] and [x_0, x_1, ... x_T] -> g0/c0)
- self.ic_enc_fwd = [None] * num_steps
- self.ic_enc_rev = [None] * num_steps
- if ic_dim > 0:
- enc_ic_cell = cell_class(hps.ic_enc_dim,
- weight_scale=hps.cell_weight_scale,
- clip_value=hps.cell_clip_value)
- ic_enc_fwd = encode_data(dataset_do_bxtxd, enc_ic_cell,
- "ic", "forward",
- hps.num_steps_for_gen_ic)
- ic_enc_rev = encode_data(dataset_do_bxtxd, enc_ic_cell,
- "ic", "reverse",
- hps.num_steps_for_gen_ic)
- self.ic_enc_fwd = ic_enc_fwd
- self.ic_enc_rev = ic_enc_rev
-
- # Encoder control input means and variances, bi-directional encoding so:
- # ([x_T, x_T-1, ..., x_0] and [x_0, x_1 ... x_T] -> u_t)
- self.ci_enc_fwd = [None] * num_steps
- self.ci_enc_rev = [None] * num_steps
- if co_dim > 0:
- enc_ci_cell = cell_class(hps.ci_enc_dim,
- weight_scale=hps.cell_weight_scale,
- clip_value=hps.cell_clip_value)
- ci_enc_fwd = encode_data(dataset_do_bxtxd, enc_ci_cell,
- "ci", "forward",
- hps.num_steps)
- if hps.do_causal_controller:
- ci_enc_rev = None
- else:
- ci_enc_rev = encode_data(dataset_do_bxtxd, enc_ci_cell,
- "ci", "reverse",
- hps.num_steps)
- self.ci_enc_fwd = ci_enc_fwd
- self.ci_enc_rev = ci_enc_rev
-
- # STOCHASTIC LATENT VARIABLES, priors and posteriors
- # (initial conditions g0, and control inputs, u_t)
- # Note that zs represent all the stochastic latent variables.
- with tf.variable_scope("z", reuse=False):
- self.prior_zs_g0 = None
- self.posterior_zs_g0 = None
- self.g0s_val = None
- if ic_dim > 0:
- self.prior_zs_g0 = \
- LearnableDiagonalGaussian(batch_size, ic_dim, name="prior_g0",
- mean_init=0.0,
- var_min=hps.ic_prior_var_min,
- var_init=hps.ic_prior_var_scale,
- var_max=hps.ic_prior_var_max)
- ic_enc = tf.concat(axis=1, values=[ic_enc_fwd[-1], ic_enc_rev[0]])
- ic_enc = tf.nn.dropout(ic_enc, keep_prob)
- self.posterior_zs_g0 = \
- DiagonalGaussianFromInput(ic_enc, ic_dim, "ic_enc_2_post_g0",
- var_min=hps.ic_post_var_min)
- if kind in ["train", "posterior_sample_and_average",
- "posterior_push_mean"]:
- zs_g0 = self.posterior_zs_g0
- else:
- zs_g0 = self.prior_zs_g0
- if kind in ["train", "posterior_sample_and_average", "prior_sample"]:
- self.g0s_val = zs_g0.sample
- else:
- self.g0s_val = zs_g0.mean
-
- # Priors for controller, 'co' for controller output
- self.prior_zs_co = prior_zs_co = [None] * num_steps
- self.posterior_zs_co = posterior_zs_co = [None] * num_steps
- self.zs_co = zs_co = [None] * num_steps
- self.prior_zs_ar_con = None
- if co_dim > 0:
- # Controller outputs
- autocorrelation_taus = [hps.prior_ar_atau for x in range(hps.co_dim)]
- noise_variances = [hps.prior_ar_nvar for x in range(hps.co_dim)]
- self.prior_zs_ar_con = prior_zs_ar_con = \
- LearnableAutoRegressive1Prior(batch_size, hps.co_dim,
- autocorrelation_taus,
- noise_variances,
- hps.do_train_prior_ar_atau,
- hps.do_train_prior_ar_nvar,
- num_steps, "u_prior_ar1")
-
- # CONTROLLER -> GENERATOR -> RATES
- # (u(t) -> gen(t) -> factors(t) -> rates(t) -> p(x_t|z_t) )
- self.controller_outputs = u_t = [None] * num_steps
- self.con_ics = con_state = None
- self.con_states = con_states = [None] * num_steps
- self.con_outs = con_outs = [None] * num_steps
- self.gen_inputs = gen_inputs = [None] * num_steps
- if co_dim > 0:
- # gen_cell_class here for l2 penalty recurrent weights
- # didn't split the cell_weight scale here, because I doubt it matters
- con_cell = gen_cell_class(hps.con_dim,
- input_weight_scale=hps.cell_weight_scale,
- rec_weight_scale=hps.cell_weight_scale,
- clip_value=hps.cell_clip_value,
- recurrent_collections=['l2_con_reg'])
- with tf.variable_scope("con", reuse=False):
- self.con_ics = tf.tile(
- tf.Variable(tf.zeros([1, hps.con_dim*con_cell.state_multiplier]),
- name="c0"),
- tf.stack([batch_size, 1]))
- self.con_ics.set_shape([None, con_cell.state_size]) # tile loses shape
- con_states[-1] = self.con_ics
-
- gen_cell = gen_cell_class(hps.gen_dim,
- input_weight_scale=hps.gen_cell_input_weight_scale,
- rec_weight_scale=hps.gen_cell_rec_weight_scale,
- clip_value=hps.cell_clip_value,
- recurrent_collections=['l2_gen_reg'])
- with tf.variable_scope("gen", reuse=False):
- if ic_dim == 0:
- self.gen_ics = tf.tile(
- tf.Variable(tf.zeros([1, gen_cell.state_size]), name="g0"),
- tf.stack([batch_size, 1]))
- else:
- self.gen_ics = linear(self.g0s_val, gen_cell.state_size,
- identity_if_possible=True,
- name="g0_2_gen_ic")
-
- self.gen_states = gen_states = [None] * num_steps
- self.gen_outs = gen_outs = [None] * num_steps
- gen_states[-1] = self.gen_ics
- gen_outs[-1] = gen_cell.output_from_state(gen_states[-1])
- self.factors = factors = [None] * num_steps
- factors[-1] = linear(gen_outs[-1], factors_dim, do_bias=False,
- normalized=True, name="gen_2_fac")
-
- self.rates = rates = [None] * num_steps
- # rates[-1] is collected to potentially feed back to controller
- with tf.variable_scope("glm", reuse=False):
- if hps.output_dist == 'poisson':
- log_rates_t0 = tf.matmul(factors[-1], this_out_fac_W) + this_out_fac_b
- log_rates_t0.set_shape([None, None])
- rates[-1] = tf.exp(log_rates_t0) # rate
- rates[-1].set_shape([None, hps.dataset_dims[hps.dataset_names[0]]])
- elif hps.output_dist == 'gaussian':
- mean_n_logvars = tf.matmul(factors[-1],this_out_fac_W) + this_out_fac_b
- mean_n_logvars.set_shape([None, None])
- means_t_bxd, logvars_t_bxd = tf.split(axis=1, num_or_size_splits=2,
- value=mean_n_logvars)
- rates[-1] = means_t_bxd
- else:
- assert False, "NIY"
-
- # We support multiple output distributions, for example Poisson, and also
- # Gaussian. In these two cases respectively, there are one and two
- # parameters (rates vs. mean and variance). So the output_dist_params
- # tensor will have variable sizes via tf.concat and tf.split, along the 1st
- # dimension. So in the case of gaussian, for example, it'll be
- # batch x (D+D), where each D dims is the mean, and then variances,
- # respectively. For a distribution with 3 parameters, it would be
- # batch x (D+D+D).
- self.output_dist_params = dist_params = [None] * num_steps
- self.log_p_xgz_b = log_p_xgz_b = 0.0 # log P(x|z)
- for t in range(num_steps):
- # Controller
- if co_dim > 0:
- # Build inputs for controller
- tlag = t - hps.controller_input_lag
- if tlag < 0:
- con_in_f_t = tf.zeros_like(ci_enc_fwd[0])
- else:
- con_in_f_t = ci_enc_fwd[tlag]
- if hps.do_causal_controller:
- # If controller is causal (wrt to data generation process), then it
- # cannot see future data. Thus, excluding ci_enc_rev[t] is obvious.
- # Less obvious is the need to exclude factors[t-1]. This arises
- # because information flows from g0 through factors to the controller
- # input. The g0 encoding is backwards, so we must necessarily exclude
- # the factors in order to keep the controller input purely from a
- # forward encoding (however unlikely it is that
- # g0->factors->controller channel might actually be used in this way).
- con_in_list_t = [con_in_f_t]
- else:
- tlag_rev = t + hps.controller_input_lag
- if tlag_rev >= num_steps:
- # better than zeros
- con_in_r_t = tf.zeros_like(ci_enc_rev[0])
- else:
- con_in_r_t = ci_enc_rev[tlag_rev]
- con_in_list_t = [con_in_f_t, con_in_r_t]
-
- if hps.do_feed_factors_to_controller:
- if hps.feedback_factors_or_rates == "factors":
- con_in_list_t.append(factors[t-1])
- elif hps.feedback_factors_or_rates == "rates":
- con_in_list_t.append(rates[t-1])
- else:
- assert False, "NIY"
-
- con_in_t = tf.concat(axis=1, values=con_in_list_t)
- con_in_t = tf.nn.dropout(con_in_t, keep_prob)
- with tf.variable_scope("con", reuse=True if t > 0 else None):
- con_outs[t], con_states[t] = con_cell(con_in_t, con_states[t-1])
- posterior_zs_co[t] = \
- DiagonalGaussianFromInput(con_outs[t], co_dim,
- name="con_to_post_co")
- if kind == "train":
- u_t[t] = posterior_zs_co[t].sample
- elif kind == "posterior_sample_and_average":
- u_t[t] = posterior_zs_co[t].sample
- elif kind == "posterior_push_mean":
- u_t[t] = posterior_zs_co[t].mean
- else:
- u_t[t] = prior_zs_ar_con.samples_t[t]
-
- # Inputs to the generator (controller output + external input)
- if ext_input_dim > 0 and hps.inject_ext_input_to_gen:
- ext_input_t_bxi = ext_input_do_bxtxi[:,t,:]
- if co_dim > 0:
- gen_inputs[t] = tf.concat(axis=1, values=[u_t[t], ext_input_t_bxi])
- else:
- gen_inputs[t] = ext_input_t_bxi
- else:
- gen_inputs[t] = u_t[t]
-
- # Generator
- data_t_bxd = dataset_ph[:,t,:]
- with tf.variable_scope("gen", reuse=True if t > 0 else None):
- gen_outs[t], gen_states[t] = gen_cell(gen_inputs[t], gen_states[t-1])
- gen_outs[t] = tf.nn.dropout(gen_outs[t], keep_prob)
- with tf.variable_scope("gen", reuse=True): # ic defined it above
- factors[t] = linear(gen_outs[t], factors_dim, do_bias=False,
- normalized=True, name="gen_2_fac")
- with tf.variable_scope("glm", reuse=True if t > 0 else None):
- if hps.output_dist == 'poisson':
- log_rates_t = tf.matmul(factors[t], this_out_fac_W) + this_out_fac_b
- log_rates_t.set_shape([None, None])
- rates[t] = dist_params[t] = tf.exp(log_rates_t) # rates feed back
- rates[t].set_shape([None, hps.dataset_dims[hps.dataset_names[0]]])
- loglikelihood_t = Poisson(log_rates_t).logp(data_t_bxd)
-
- elif hps.output_dist == 'gaussian':
- mean_n_logvars = tf.matmul(factors[t],this_out_fac_W) + this_out_fac_b
- mean_n_logvars.set_shape([None, None])
- means_t_bxd, logvars_t_bxd = tf.split(axis=1, num_or_size_splits=2,
- value=mean_n_logvars)
- rates[t] = means_t_bxd # rates feed back to controller
- dist_params[t] = tf.concat(
- axis=1, values=[means_t_bxd, tf.exp(tf.clip_by_value(logvars_t_bxd, -hps._clip_value, hps._clip_value))])
- loglikelihood_t = \
- diag_gaussian_log_likelihood(data_t_bxd,
- means_t_bxd, logvars_t_bxd)
- else:
- assert False, "NIY"
-
- log_p_xgz_b += tf.reduce_sum(loglikelihood_t, [1])
-
- # Correlation of inferred inputs cost.
- self.corr_cost = tf.constant(0.0)
- if hps.co_mean_corr_scale > 0.0:
- all_sum_corr = []
- for i in range(hps.co_dim):
- for j in range(i+1, hps.co_dim):
- sum_corr_ij = tf.constant(0.0)
- for t in range(num_steps):
- u_mean_t = posterior_zs_co[t].mean
- sum_corr_ij += u_mean_t[:,i]*u_mean_t[:,j]
- all_sum_corr.append(0.5 * tf.square(sum_corr_ij))
- self.corr_cost = tf.reduce_mean(all_sum_corr) # div by batch and by n*(n-1)/2 pairs
-
- # Variational Lower Bound on posterior, p(z|x), plus reconstruction cost.
- # KL and reconstruction costs are normalized only by batch size, not by
- # dimension, or by time steps.
- kl_cost_g0_b = tf.zeros_like(batch_size, dtype=tf.float32)
- kl_cost_co_b = tf.zeros_like(batch_size, dtype=tf.float32)
- self.kl_cost = tf.constant(0.0) # VAE KL cost
- self.recon_cost = tf.constant(0.0) # VAE reconstruction cost
- self.nll_bound_vae = tf.constant(0.0)
- self.nll_bound_iwae = tf.constant(0.0) # for eval with IWAE cost.
- if kind in ["train", "posterior_sample_and_average", "posterior_push_mean"]:
- kl_cost_g0_b = 0.0
- kl_cost_co_b = 0.0
- if ic_dim > 0:
- g0_priors = [self.prior_zs_g0]
- g0_posts = [self.posterior_zs_g0]
- kl_cost_g0_b = KLCost_GaussianGaussian(g0_posts, g0_priors).kl_cost_b
- kl_cost_g0_b = hps.kl_ic_weight * kl_cost_g0_b
- if co_dim > 0:
- kl_cost_co_b = \
- KLCost_GaussianGaussianProcessSampled(
- posterior_zs_co, prior_zs_ar_con).kl_cost_b
- kl_cost_co_b = hps.kl_co_weight * kl_cost_co_b
-
- # L = -KL + log p(x|z), to maximize bound on likelihood
- # -L = KL - log p(x|z), to minimize bound on NLL
- # so 'reconstruction cost' is negative log likelihood
- self.recon_cost = - tf.reduce_mean(log_p_xgz_b)
- self.kl_cost = tf.reduce_mean(kl_cost_g0_b + kl_cost_co_b)
-
- lb_on_ll_b = log_p_xgz_b - kl_cost_g0_b - kl_cost_co_b
-
- # VAE error averages outside the log
- self.nll_bound_vae = -tf.reduce_mean(lb_on_ll_b)
-
- # IWAE error averages inside the log
- k = tf.cast(tf.shape(log_p_xgz_b)[0], tf.float32)
- iwae_lb_on_ll = -tf.log(k) + log_sum_exp(lb_on_ll_b)
- self.nll_bound_iwae = -iwae_lb_on_ll
-
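- # A standalone sketch, with made-up per-sample bounds, of how the two bounds
- # above differ: the VAE bound averages lb_on_ll_b outside the log, while the
- # IWAE bound averages inside the log (log-sum-exp minus log k), which by
- # Jensen's inequality can only tighten the bound. numpy is assumed to be
- # imported as np at the top of this file.
- toy_lb_b = np.array([-110.0, -100.0, -105.0]) # stand-in for lb_on_ll_b
- toy_k = toy_lb_b.shape[0]
- toy_nll_vae = -np.mean(toy_lb_b) # average outside the log
- toy_m = np.max(toy_lb_b) # stabilized log-sum-exp
- toy_nll_iwae = -(toy_m + np.log(np.sum(np.exp(toy_lb_b - toy_m))) - np.log(toy_k))
- assert toy_nll_iwae <= toy_nll_vae # the IWAE NLL bound is never larger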
- # L2 regularization on the generator, normalized by number of parameters.
- self.l2_cost = tf.constant(0.0)
- if self.hps.l2_gen_scale > 0.0 or self.hps.l2_con_scale > 0.0:
- l2_costs = []
- l2_numels = []
- l2_reg_var_lists = [tf.get_collection('l2_gen_reg'),
- tf.get_collection('l2_con_reg')]
- l2_reg_scales = [self.hps.l2_gen_scale, self.hps.l2_con_scale]
- for l2_reg_vars, l2_scale in zip(l2_reg_var_lists, l2_reg_scales):
- for v in l2_reg_vars:
- numel = tf.reduce_prod(tf.concat(axis=0, values=tf.shape(v)))
- numel_f = tf.cast(numel, tf.float32)
- l2_numels.append(numel_f)
- v_l2 = tf.reduce_sum(v*v)
- l2_costs.append(0.5 * l2_scale * v_l2)
- self.l2_cost = tf.add_n(l2_costs) / tf.add_n(l2_numels)
-
- # Compute the cost for training, part of the graph regardless.
- # The KL cost can be problematic at the beginning of optimization,
- # so we ramp its weight linearly from 0 to 1 over kl_increase_steps
- # steps (and the L2 weight likewise over l2_increase_steps).
- self.kl_decay_step = tf.maximum(self.train_step - hps.kl_start_step, 0)
- self.l2_decay_step = tf.maximum(self.train_step - hps.l2_start_step, 0)
- kl_decay_step_f = tf.cast(self.kl_decay_step, tf.float32)
- l2_decay_step_f = tf.cast(self.l2_decay_step, tf.float32)
- kl_increase_steps_f = tf.cast(hps.kl_increase_steps, tf.float32)
- l2_increase_steps_f = tf.cast(hps.l2_increase_steps, tf.float32)
- self.kl_weight = kl_weight = \
- tf.minimum(kl_decay_step_f / kl_increase_steps_f, 1.0)
- self.l2_weight = l2_weight = \
- tf.minimum(l2_decay_step_f / l2_increase_steps_f, 1.0)
-
- self.timed_kl_cost = kl_weight * self.kl_cost
- self.timed_l2_cost = l2_weight * self.l2_cost
- self.weight_corr_cost = hps.co_mean_corr_scale * self.corr_cost
- self.cost = self.recon_cost + self.timed_kl_cost + \
- self.timed_l2_cost + self.weight_corr_cost
-
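- # A standalone sketch (illustrative step values only) of the warm-up
- # schedule computed above: the weight rises linearly from 0 to 1 over
- # increase_steps steps, starting at start_step, and then stays at 1.
- def _warmup_weight_sketch(step, start_step, increase_steps):
-   return min(max(step - start_step, 0) / float(increase_steps), 1.0)
- assert _warmup_weight_sketch(0, 100, 1000) == 0.0
- assert _warmup_weight_sketch(600, 100, 1000) == 0.5
- assert _warmup_weight_sketch(5000, 100, 1000) == 1.0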
- if kind != "train":
- # save every so often
- self.seso_saver = tf.train.Saver(tf.global_variables(),
- max_to_keep=hps.max_ckpt_to_keep)
- # lowest validation error
- self.lve_saver = tf.train.Saver(tf.global_variables(),
- max_to_keep=hps.max_ckpt_to_keep_lve)
-
- return
-
- # OPTIMIZATION
- # train the io matrices only
- if self.hps.do_train_io_only:
- self.train_vars = tvars = \
- tf.get_collection('IO_transformations',
- scope=tf.get_variable_scope().name)
- # train the encoder only
- elif self.hps.do_train_encoder_only:
- tvars1 = \
- tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
- scope='LFADS/ic_enc_*')
- tvars2 = \
- tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
- scope='LFADS/z/ic_enc_*')
-
- self.train_vars = tvars = tvars1 + tvars2
- # train all variables
- else:
- self.train_vars = tvars = \
- tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
- scope=tf.get_variable_scope().name)
- print("done.")
- print("Model Variables (to be optimized): ")
- total_params = 0
- for i in range(len(tvars)):
- shape = tvars[i].get_shape().as_list()
- print(" ", i, tvars[i].name, shape)
- total_params += np.prod(shape)
- print("Total model parameters: ", total_params)
-
- grads = tf.gradients(self.cost, tvars)
- grads, grad_global_norm = tf.clip_by_global_norm(grads, hps.max_grad_norm)
- opt = tf.train.AdamOptimizer(self.learning_rate, beta1=0.9, beta2=0.999,
- epsilon=1e-01)
- self.grads = grads
- self.grad_global_norm = grad_global_norm
- self.train_op = opt.apply_gradients(
- zip(grads, tvars), global_step=self.train_step)
-
- self.seso_saver = tf.train.Saver(tf.global_variables(),
- max_to_keep=hps.max_ckpt_to_keep)
-
- # lowest validation error
- self.lve_saver = tf.train.Saver(tf.global_variables(),
- max_to_keep=hps.max_ckpt_to_keep_lve)
-
- # SUMMARIES, used only during training.
- # example summary
- self.example_image = tf.placeholder(tf.float32, shape=[1,None,None,3],
- name='image_tensor')
- self.example_summ = tf.summary.image("LFADS example", self.example_image,
- collections=["example_summaries"])
-
- # general training summaries
- self.lr_summ = tf.summary.scalar("Learning rate", self.learning_rate)
- self.kl_weight_summ = tf.summary.scalar("KL weight", self.kl_weight)
- self.l2_weight_summ = tf.summary.scalar("L2 weight", self.l2_weight)
- self.corr_cost_summ = tf.summary.scalar("Corr cost", self.weight_corr_cost)
- self.grad_global_norm_summ = tf.summary.scalar("Gradient global norm",
- self.grad_global_norm)
- if hps.co_dim > 0:
- self.atau_summ = [None] * hps.co_dim
- self.pvar_summ = [None] * hps.co_dim
- for c in range(hps.co_dim):
- self.atau_summ[c] = \
- tf.summary.scalar("AR Autocorrelation taus " + str(c),
- tf.exp(self.prior_zs_ar_con.logataus_1xu[0,c]))
- self.pvar_summ[c] = \
- tf.summary.scalar("AR Variances " + str(c),
- tf.exp(self.prior_zs_ar_con.logpvars_1xu[0,c]))
-
- # cost summaries, separated into different collections for
- # training vs validation. We make placeholders for these, because
- # even though the graph computes these costs on a per-batch basis,
- # we want to report the more reliable metric of per-epoch cost.
- kl_cost_ph = tf.placeholder(tf.float32, shape=[], name='kl_cost_ph')
- self.kl_t_cost_summ = tf.summary.scalar("KL cost (train)", kl_cost_ph,
- collections=["train_summaries"])
- self.kl_v_cost_summ = tf.summary.scalar("KL cost (valid)", kl_cost_ph,
- collections=["valid_summaries"])
- l2_cost_ph = tf.placeholder(tf.float32, shape=[], name='l2_cost_ph')
- self.l2_cost_summ = tf.summary.scalar("L2 cost", l2_cost_ph,
- collections=["train_summaries"])
-
- recon_cost_ph = tf.placeholder(tf.float32, shape=[], name='recon_cost_ph')
- self.recon_t_cost_summ = tf.summary.scalar("Reconstruction cost (train)",
- recon_cost_ph,
- collections=["train_summaries"])
- self.recon_v_cost_summ = tf.summary.scalar("Reconstruction cost (valid)",
- recon_cost_ph,
- collections=["valid_summaries"])
-
- total_cost_ph = tf.placeholder(tf.float32, shape=[], name='total_cost_ph')
- self.cost_t_summ = tf.summary.scalar("Total cost (train)", total_cost_ph,
- collections=["train_summaries"])
- self.cost_v_summ = tf.summary.scalar("Total cost (valid)", total_cost_ph,
- collections=["valid_summaries"])
-
- self.kl_cost_ph = kl_cost_ph
- self.l2_cost_ph = l2_cost_ph
- self.recon_cost_ph = recon_cost_ph
- self.total_cost_ph = total_cost_ph
-
- # Merged summaries, for easy coding later.
- self.merged_examples = tf.summary.merge_all(key="example_summaries")
- self.merged_generic = tf.summary.merge_all() # default key is 'summaries'
- self.merged_train = tf.summary.merge_all(key="train_summaries")
- self.merged_valid = tf.summary.merge_all(key="valid_summaries")
-
- session = tf.get_default_session()
- self.logfile = os.path.join(hps.lfads_save_dir, "lfads_log")
- self.writer = tf.summary.FileWriter(self.logfile)
-
- def build_feed_dict(self, train_name, data_bxtxd, ext_input_bxtxi=None,
- keep_prob=None):
- """Build the feed dictionary, handles cases where there is no value defined.
-
- Args:
- train_name: The key into the datasets, to set the tf.case statement for
- the proper readin / readout matrices.
- data_bxtxd: The data tensor.
- ext_input_bxtxi (optional): The external input tensor.
- keep_prob: The drop out keep probability.
-
- Returns:
- The feed dictionary with TF tensors as keys and data as values, for use
- with tf.Session.run()
-
- """
- feed_dict = {}
- B, T, _ = data_bxtxd.shape
- feed_dict[self.dataName] = train_name
- feed_dict[self.dataset_ph] = data_bxtxd
-
- if self.ext_input is not None and ext_input_bxtxi is not None:
- feed_dict[self.ext_input] = ext_input_bxtxi
-
- if keep_prob is None:
- feed_dict[self.keep_prob] = self.hps.keep_prob
- else:
- feed_dict[self.keep_prob] = keep_prob
-
- return feed_dict
-
- @staticmethod
- def get_batch(data_extxd, ext_input_extxi=None, batch_size=None,
- example_idxs=None):
- """Get a batch of data, either randomly chosen, or specified directly.
-
- Args:
- data_extxd: The data to model, numpy tensors with shape:
- # examples x # time steps x # dimensions
- ext_input_extxi (optional): The external inputs, numpy tensor with shape:
- # examples x # time steps x # external input dimensions
- batch_size: The size of the batch to return.
- example_idxs (optional): The example indices used to select examples.
-
- Returns:
- A tuple with two parts:
- 1. Batched data numpy tensor with shape:
- batch_size x # time steps x # dimensions
- 2. Batched external input numpy tensor with shape:
- batch_size x # time steps x # external input dims
- """
- assert batch_size is not None or example_idxs is not None, "Problems"
- E, T, D = data_extxd.shape
- if example_idxs is None:
- example_idxs = np.random.choice(E, batch_size)
-
- ext_input_bxtxi = None
- if ext_input_extxi is not None:
- ext_input_bxtxi = ext_input_extxi[example_idxs,:,:]
-
- return data_extxd[example_idxs,:,:], ext_input_bxtxi
-
- @staticmethod
- def example_idxs_mod_batch_size(nexamples, batch_size):
- """Given a number of examples, E, and a batch_size, B, generate indices
- [[0, 1, 2, ..., B-1],
- [B, B+1, ..., 2*B-1],
- ...]
- returning those indices as a 2-dim tensor shaped like E/B x B. Note that
- shape is only correct if E % B == 0. If not, then an extra row is generated
- so that the remainder of examples is included. The extra slots in that row
- are filled with a sorted, randomly drawn subset of the example indices (see
- randomize_example_idxs_mod_batch_size for fully randomized behavior).
-
- Args:
- nexamples: The number of examples to batch up.
- batch_size: The size of the batch.
- Returns:
- 2-dim tensor as described above.
- """
- bmrem = batch_size - (nexamples % batch_size)
- bmrem_examples = []
- if bmrem < batch_size:
- #bmrem_examples = np.zeros(bmrem, dtype=np.int32)
- ridxs = np.random.permutation(nexamples)[0:bmrem].astype(np.int32)
- bmrem_examples = np.sort(ridxs)
- example_idxs = list(range(nexamples)) + list(bmrem_examples)
- example_idxs_e_x_edivb = np.reshape(example_idxs, [-1, batch_size])
- return example_idxs_e_x_edivb, bmrem
-
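- # A quick standalone illustration (toy sizes, inlined rather than calling the
- # method above) of the index layout it produces: with 10 examples and a batch
- # size of 4, two full rows cover indices 0-7 and a third row holds the
- # remaining two indices plus two sorted, randomly drawn filler indices.
- toy_n, toy_b = 10, 4
- toy_fill = toy_b - (toy_n % toy_b) # 2 filler slots
- toy_filler = np.sort(np.random.permutation(toy_n)[0:toy_fill])
- toy_idxs = np.reshape(list(range(toy_n)) + list(toy_filler), [-1, toy_b])
- assert toy_idxs.shape == (3, 4)
- assert list(toy_idxs[0]) == [0, 1, 2, 3] and list(toy_idxs[1]) == [4, 5, 6, 7]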
- @staticmethod
- def randomize_example_idxs_mod_batch_size(nexamples, batch_size):
- """Indices 1:nexamples, randomized, in 2D form of
- shape = (nexamples / batch_size) x batch_size. The remainder
- is managed by drawing randomly from 1:nexamples.
-
- Args:
- nexamples: Number of examples to randomize.
- batch_size: Number of elements in batch.
-
- Returns:
- The randomized, properly shaped indices.
- """
- assert nexamples > batch_size, "Problems"
- bmrem = batch_size - nexamples % batch_size
- bmrem_examples = []
- if bmrem < batch_size:
- bmrem_examples = np.random.choice(range(nexamples),
- size=bmrem, replace=False)
- example_idxs = list(range(nexamples)) + list(bmrem_examples)
- mixed_example_idxs = np.random.permutation(example_idxs)
- example_idxs_e_x_edivb = np.reshape(mixed_example_idxs, [-1, batch_size])
- return example_idxs_e_x_edivb, bmrem
-
- def shuffle_spikes_in_time(self, data_bxtxd):
- """Shuffle the spikes in the temporal dimension. This is useful to
- help the LFADS system avoid overfitting to individual spikes or fast
- oscillations found in the data that are irrelevant to behavior. A
- pure 'tabula rasa' approach would avoid this, but LFADS is sensitive
- enough to pick up dynamics that you may not want.
-
- Args:
- data_bxtxd: Numpy array of spike count data to be shuffled.
- Returns:
- S_bxtxd, a numpy array with the same dimensions and contents as
- data_bxtxd, but shuffled appropriately.
-
- """
-
- B, T, N = data_bxtxd.shape
- w = self.hps.temporal_spike_jitter_width
-
- if w == 0:
- return data_bxtxd
-
- max_counts = np.max(data_bxtxd)
- S_bxtxd = np.zeros([B,T,N])
-
- # Intuitively, shuffle spike occurrences (0 or 1), but since we have counts,
- # do it repeatedly, once per count level, up to the max count.
- for mc in range(1,max_counts+1):
- idxs = np.nonzero(data_bxtxd >= mc)
-
- data_ones = np.zeros_like(data_bxtxd)
- data_ones[data_bxtxd >= mc] = 1
-
- nfound = len(idxs[0])
- shuffles_incrs_in_time = np.random.randint(-w, w, size=nfound)
-
- shuffle_tidxs = idxs[1].copy()
- shuffle_tidxs += shuffles_incrs_in_time
-
- # Reflect on the boundaries to not lose mass.
- shuffle_tidxs[shuffle_tidxs < 0] = -shuffle_tidxs[shuffle_tidxs < 0]
- shuffle_tidxs[shuffle_tidxs > T-1] = \
- (T-1)-(shuffle_tidxs[shuffle_tidxs > T-1] -(T-1))
-
- for iii in zip(idxs[0], shuffle_tidxs, idxs[2]):
- S_bxtxd[iii] += 1
-
- return S_bxtxd
-
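- # A standalone check (toy indices, T chosen arbitrarily) of the boundary
- # reflection used in shuffle_spikes_in_time above: jittered time indices that
- # land outside [0, T-1] are mirrored back inside, so no spike mass is lost.
- toy_T = 20
- toy_tidxs = np.array([-3, 0, 5, 19, 22]) # some indices jittered out of range
- toy_tidxs[toy_tidxs < 0] = -toy_tidxs[toy_tidxs < 0]
- toy_over = toy_tidxs > toy_T - 1
- toy_tidxs[toy_over] = (toy_T - 1) - (toy_tidxs[toy_over] - (toy_T - 1))
- assert list(toy_tidxs) == [3, 0, 5, 19, 16]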
- def shuffle_and_flatten_datasets(self, datasets, kind='train'):
- """Since LFADS supports multiple datasets in the same dynamical model,
- we have to be careful to use all the data in a single training epoch. But
- since the datasets may have different data dimensionality, we cannot batch
- examples from data dictionaries together. Instead, we generate random
- batches within each data dictionary, and then randomize these batches
- while holding onto the dataname, so that when it's time to feed
- the graph, the correct in/out matrices can be selected, per batch.
-
- Args:
- datasets: A dict of data dicts. The dataset dict is simply a
- name(string)-> data dictionary mapping (See top of lfads.py).
- kind: 'train' or 'valid'
-
- Returns:
- A flat list, in which each element is a pair ('name', indices).
- """
- batch_size = self.hps.batch_size
- ndatasets = len(datasets)
- random_example_idxs = {}
- epoch_idxs = {}
- all_name_example_idx_pairs = []
- kind_data = kind + '_data'
- for name, data_dict in datasets.items():
- nexamples, ntime, data_dim = data_dict[kind_data].shape
- epoch_idxs[name] = 0
- random_example_idxs, _ = \
- self.randomize_example_idxs_mod_batch_size(nexamples, batch_size)
-
- epoch_size = random_example_idxs.shape[0]
- names = [name] * epoch_size
- all_name_example_idx_pairs += zip(names, random_example_idxs)
-
- np.random.shuffle(all_name_example_idx_pairs) # shuffle in place
-
- return all_name_example_idx_pairs
-
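- # A minimal standalone sketch (hypothetical dataset names and indices) of the
- # structure returned above: whole batches are shuffled, never individual
- # examples, so every batch stays within a single dataset.
- toy_pairs = [('dataA', np.array([0, 1, 2, 3])),
-              ('dataA', np.array([4, 5, 6, 7])),
-              ('dataB', np.array([2, 0, 1, 3]))]
- np.random.shuffle(toy_pairs) # shuffle batches in place, as above
- assert sorted(name for name, _ in toy_pairs) == ['dataA', 'dataA', 'dataB']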
- def train_epoch(self, datasets, batch_size=None, do_save_ckpt=True):
- """Train the model through the entire dataset once.
-
- Args:
- datasets: A dict of data dicts. The dataset dict is simply a
- name(string)-> data dictionary mapping (See top of lfads.py).
- batch_size (optional): The batch_size to use.
- do_save_ckpt (optional): Should the routine save a checkpoint on this
- training epoch?
-
- Returns:
- A tuple with 6 float values:
- (total cost of the epoch, epoch reconstruction cost,
- epoch kl cost, KL weight used this training epoch,
- total l2 cost on generator, and the corresponding weight).
- """
- ops_to_eval = [self.cost, self.recon_cost,
- self.kl_cost, self.kl_weight,
- self.l2_cost, self.l2_weight,
- self.train_op]
- collected_op_values = self.run_epoch(datasets, ops_to_eval, kind="train")
-
- total_cost = total_recon_cost = total_kl_cost = 0.0
- # normalizing by batch done in distributions.py
- epoch_size = len(collected_op_values)
- for op_values in collected_op_values:
- total_cost += op_values[0]
- total_recon_cost += op_values[1]
- total_kl_cost += op_values[2]
-
- kl_weight = collected_op_values[-1][3]
- l2_cost = collected_op_values[-1][4]
- l2_weight = collected_op_values[-1][5]
-
- epoch_total_cost = total_cost / epoch_size
- epoch_recon_cost = total_recon_cost / epoch_size
- epoch_kl_cost = total_kl_cost / epoch_size
-
- if do_save_ckpt:
- session = tf.get_default_session()
- checkpoint_path = os.path.join(self.hps.lfads_save_dir,
- self.hps.checkpoint_name + '.ckpt')
- self.seso_saver.save(session, checkpoint_path,
- global_step=self.train_step)
-
- return epoch_total_cost, epoch_recon_cost, epoch_kl_cost, \
- kl_weight, l2_cost, l2_weight
-
-
- def run_epoch(self, datasets, ops_to_eval, kind="train", batch_size=None,
- do_collect=True, keep_prob=None):
- """Run the model through the entire dataset once.
-
- Args:
- datasets: A dict of data dicts. The dataset dict is simply a
- name(string)-> data dictionary mapping (See top of lfads.py).
- ops_to_eval: A list of tensorflow operations that will be evaluated in
- the tf.session.run() call.
- batch_size (optional): The batch_size to use.
- do_collect (optional): Should the routine collect all session.run
- output as a list, and return it?
- keep_prob (optional): The dropout keep probability.
-
- Returns:
- A list of lists, the internal list is the return for the ops for each
- session.run() call. The outer list collects over the epoch.
- """
- hps = self.hps
- all_name_example_idx_pairs = \
- self.shuffle_and_flatten_datasets(datasets, kind)
-
- kind_data = kind + '_data'
- kind_ext_input = kind + '_ext_input'
-
- total_cost = total_recon_cost = total_kl_cost = 0.0
- session = tf.get_default_session()
- epoch_size = len(all_name_example_idx_pairs)
- evaled_ops_list = []
- for name, example_idxs in all_name_example_idx_pairs:
- data_dict = datasets[name]
- data_extxd = data_dict[kind_data]
- if hps.output_dist == 'poisson' and hps.temporal_spike_jitter_width > 0:
- data_extxd = self.shuffle_spikes_in_time(data_extxd)
-
- ext_input_extxi = data_dict[kind_ext_input]
- data_bxtxd, ext_input_bxtxi = self.get_batch(data_extxd, ext_input_extxi,
- example_idxs=example_idxs)
-
- feed_dict = self.build_feed_dict(name, data_bxtxd, ext_input_bxtxi,
- keep_prob=keep_prob)
- evaled_ops_np = session.run(ops_to_eval, feed_dict=feed_dict)
- if do_collect:
- evaled_ops_list.append(evaled_ops_np)
-
- return evaled_ops_list
-
- def summarize_all(self, datasets, summary_values):
- """Plot and summarize stuff in tensorboard.
-
- Note that everything done in the current function is otherwise done on
- a single, randomly selected dataset (except for summary_values, which are
- passed in).
-
- Args:
- datasets: The dictionary of datasets used in the study.
- summary_values: These summary values are created from the training loop,
- and so summarize the entire set of datasets.
- """
- hps = self.hps
- tr_kl_cost = summary_values['tr_kl_cost']
- tr_recon_cost = summary_values['tr_recon_cost']
- tr_total_cost = summary_values['tr_total_cost']
- kl_weight = summary_values['kl_weight']
- l2_weight = summary_values['l2_weight']
- l2_cost = summary_values['l2_cost']
- has_any_valid_set = summary_values['has_any_valid_set']
- i = summary_values['nepochs']
-
- session = tf.get_default_session()
- train_summ, train_step = session.run([self.merged_train,
- self.train_step],
- feed_dict={self.l2_cost_ph:l2_cost,
- self.kl_cost_ph:tr_kl_cost,
- self.recon_cost_ph:tr_recon_cost,
- self.total_cost_ph:tr_total_cost})
- self.writer.add_summary(train_summ, train_step)
- if has_any_valid_set:
- ev_kl_cost = summary_values['ev_kl_cost']
- ev_recon_cost = summary_values['ev_recon_cost']
- ev_total_cost = summary_values['ev_total_cost']
- eval_summ = session.run(self.merged_valid,
- feed_dict={self.kl_cost_ph:ev_kl_cost,
- self.recon_cost_ph:ev_recon_cost,
- self.total_cost_ph:ev_total_cost})
- self.writer.add_summary(eval_summ, train_step)
- print("Epoch:%d, step:%d (TRAIN, VALID): total: %.2f, %.2f\
- recon: %.2f, %.2f, kl: %.2f, %.2f, l2: %.5f,\
- kl weight: %.2f, l2 weight: %.2f" % \
- (i, train_step, tr_total_cost, ev_total_cost,
- tr_recon_cost, ev_recon_cost, tr_kl_cost, ev_kl_cost,
- l2_cost, kl_weight, l2_weight))
-
- csv_outstr = "epoch,%d, step,%d, total,%.2f,%.2f, \
- recon,%.2f,%.2f, kl,%.2f,%.2f, l2,%.5f, \
- klweight,%.2f, l2weight,%.2f\n"% \
- (i, train_step, tr_total_cost, ev_total_cost,
- tr_recon_cost, ev_recon_cost, tr_kl_cost, ev_kl_cost,
- l2_cost, kl_weight, l2_weight)
-
- else:
- print("Epoch:%d, step:%d TRAIN: total: %.2f recon: %.2f, kl: %.2f,\
- l2: %.5f, kl weight: %.2f, l2 weight: %.2f" % \
- (i, train_step, tr_total_cost, tr_recon_cost, tr_kl_cost,
- l2_cost, kl_weight, l2_weight))
- csv_outstr = "epoch,%d, step,%d, total,%.2f, recon,%.2f, kl,%.2f, \
- l2,%.5f, klweight,%.2f, l2weight,%.2f\n"% \
- (i, train_step, tr_total_cost, tr_recon_cost,
- tr_kl_cost, l2_cost, kl_weight, l2_weight)
-
- if self.hps.csv_log:
- csv_file = os.path.join(self.hps.lfads_save_dir, self.hps.csv_log+'.csv')
- with open(csv_file, "a") as myfile:
- myfile.write(csv_outstr)
-
-
- def plot_single_example(self, datasets):
- """Plot an image relating to a randomly chosen, specific example. We use
- posterior sample and average by taking one example, and filling a whole
- batch with that example, sample from the posterior, and then average the
- quantities.
-
- """
- hps = self.hps
- all_data_names = list(datasets.keys())
- data_name = np.random.permutation(all_data_names)[0]
- data_dict = datasets[data_name]
- has_valid_set = data_dict['valid_data'] is not None
- cf = 1.0 # plotting concern
-
- # posterior sample and average here
- E, _, _ = data_dict['train_data'].shape
- eidx = np.random.choice(E)
- example_idxs = eidx * np.ones(hps.batch_size, dtype=np.int32)
-
- train_data_bxtxd, train_ext_input_bxtxi = \
- self.get_batch(data_dict['train_data'], data_dict['train_ext_input'],
- example_idxs=example_idxs)
-
- truth_train_data_bxtxd = None
- if 'train_truth' in data_dict and data_dict['train_truth'] is not None:
- truth_train_data_bxtxd, _ = self.get_batch(data_dict['train_truth'],
- example_idxs=example_idxs)
- cf = data_dict['conversion_factor']
-
- # plotter does averaging
- train_model_values = self.eval_model_runs_batch(data_name,
- train_data_bxtxd,
- train_ext_input_bxtxi,
- do_average_batch=False)
-
- train_step = train_model_values['train_steps']
- feed_dict = self.build_feed_dict(data_name, train_data_bxtxd,
- train_ext_input_bxtxi, keep_prob=1.0)
-
- session = tf.get_default_session()
- generic_summ = session.run(self.merged_generic, feed_dict=feed_dict)
- self.writer.add_summary(generic_summ, train_step)
-
- valid_data_bxtxd = valid_model_values = valid_ext_input_bxtxi = None
- truth_valid_data_bxtxd = None
- if has_valid_set:
- E, _, _ = data_dict['valid_data'].shape
- eidx = np.random.choice(E)
- example_idxs = eidx * np.ones(hps.batch_size, dtype=np.int32)
- valid_data_bxtxd, valid_ext_input_bxtxi = \
- self.get_batch(data_dict['valid_data'],
- data_dict['valid_ext_input'],
- example_idxs=example_idxs)
- if 'valid_truth' in data_dict and data_dict['valid_truth'] is not None:
- truth_valid_data_bxtxd, _ = self.get_batch(data_dict['valid_truth'],
- example_idxs=example_idxs)
- else:
- truth_valid_data_bxtxd = None
-
- # plotter does averaging
- valid_model_values = self.eval_model_runs_batch(data_name,
- valid_data_bxtxd,
- valid_ext_input_bxtxi,
- do_average_batch=False)
-
- example_image = plot_lfads(train_bxtxd=train_data_bxtxd,
- train_model_vals=train_model_values,
- train_ext_input_bxtxi=train_ext_input_bxtxi,
- train_truth_bxtxd=truth_train_data_bxtxd,
- valid_bxtxd=valid_data_bxtxd,
- valid_model_vals=valid_model_values,
- valid_ext_input_bxtxi=valid_ext_input_bxtxi,
- valid_truth_bxtxd=truth_valid_data_bxtxd,
- bidx=None, cf=cf, output_dist=hps.output_dist)
- example_image = np.expand_dims(example_image, axis=0)
- example_summ = session.run(self.merged_examples,
- feed_dict={self.example_image : example_image})
- self.writer.add_summary(example_summ)
-
- def train_model(self, datasets):
- """Train the model, print per-epoch information, and save checkpoints.
-
- Loop over training epochs. The function that actually does the
- training is train_epoch. This function iterates over the training
- data, one epoch at a time. The learning rate schedule is such
- that it will stay the same until the cost goes up in comparison to
- the last few values, then it will drop.
-
- Args:
- datasets: A dict of data dicts. The dataset dict is simply a
- name(string)-> data dictionary mapping (See top of lfads.py).
- """
- hps = self.hps
- has_any_valid_set = False
- for data_dict in datasets.values():
- if data_dict['valid_data'] is not None:
- has_any_valid_set = True
- break
-
- session = tf.get_default_session()
- lr = session.run(self.learning_rate)
- lr_stop = hps.learning_rate_stop
- i = -1
- train_costs = []
- valid_costs = []
- ev_total_cost = ev_recon_cost = ev_kl_cost = 0.0
- lowest_ev_cost = np.Inf
- while True:
- i += 1
- do_save_ckpt = (i % 10 == 0)
- tr_total_cost, tr_recon_cost, tr_kl_cost, kl_weight, l2_cost, l2_weight = \
- self.train_epoch(datasets, do_save_ckpt=do_save_ckpt)
-
- # Evaluate the validation cost, and potentially save. Note that this
- # routine will not save a validation checkpoint until the kl weight and
- # l2 weights are equal to 1.0.
- if has_any_valid_set:
- ev_total_cost, ev_recon_cost, ev_kl_cost = \
- self.eval_cost_epoch(datasets, kind='valid')
- valid_costs.append(ev_total_cost)
-
- # > 1 may give more consistent results, but not the actual lowest vae.
- # == 1 gives the lowest vae seen so far.
- n_lve = 1
- run_avg_lve = np.mean(valid_costs[-n_lve:])
-
- # conditions for saving checkpoints:
- # KL weight must have finished stepping (>=1.0), AND
- # L2 weight must have finished stepping OR L2 is not being used, AND
- # the current run has a lower LVE than previous runs AND
- # len(valid_costs) > n_lve, i.e. enough costs exist for the running average.
- if kl_weight >= 1.0 and \
- (l2_weight >= 1.0 or \
- (self.hps.l2_gen_scale == 0.0 and self.hps.l2_con_scale == 0.0)) \
- and (len(valid_costs) > n_lve and run_avg_lve < lowest_ev_cost):
-
- lowest_ev_cost = run_avg_lve
- checkpoint_path = os.path.join(self.hps.lfads_save_dir,
- self.hps.checkpoint_name + '_lve.ckpt')
- self.lve_saver.save(session, checkpoint_path,
- global_step=self.train_step,
- latest_filename='checkpoint_lve')
-
- # Plot and summarize.
- values = {'nepochs':i, 'has_any_valid_set': has_any_valid_set,
- 'tr_total_cost':tr_total_cost, 'ev_total_cost':ev_total_cost,
- 'tr_recon_cost':tr_recon_cost, 'ev_recon_cost':ev_recon_cost,
- 'tr_kl_cost':tr_kl_cost, 'ev_kl_cost':ev_kl_cost,
- 'l2_weight':l2_weight, 'kl_weight':kl_weight,
- 'l2_cost':l2_cost}
- self.summarize_all(datasets, values)
- self.plot_single_example(datasets)
-
- # Manage learning rate.
- train_res = tr_total_cost
- n_lr = hps.learning_rate_n_to_compare
- if len(train_costs) > n_lr and train_res > np.max(train_costs[-n_lr:]):
- _ = session.run(self.learning_rate_decay_op)
- lr = session.run(self.learning_rate)
- print(" Decreasing learning rate to %f." % lr)
- # Force the system to run n_lr times while at this lr.
- train_costs.append(np.inf)
- else:
- train_costs.append(train_res)
-
- if lr < lr_stop:
- print("Stopping optimization based on learning rate criteria.")
- break
-
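- # A standalone sketch (toy cost history, hypothetical n_lr) of the learning
- # rate rule described in the train_model docstring above: decay only when the
- # newest training cost exceeds the max of the previous n_lr costs.
- def _should_decay_sketch(cost_history, new_cost, n_lr):
-   return len(cost_history) > n_lr and new_cost > max(cost_history[-n_lr:])
- assert not _should_decay_sketch([10.0, 9.0, 8.5], 8.0, n_lr=2) # improving
- assert _should_decay_sketch([10.0, 9.0, 8.5], 9.5, n_lr=2) # got worse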
- def eval_cost_epoch(self, datasets, kind='train', ext_input_extxi=None,
- batch_size=None):
- """Evaluate the cost of the epoch.
-
- Args:
- data_dict: The dictionary of data (training and validation) used for
- training and evaluation of the model, respectively.
-
- Returns:
- a 3 tuple of costs:
- (epoch total cost, epoch reconstruction cost, epoch KL cost)
- """
- ops_to_eval = [self.cost, self.recon_cost, self.kl_cost]
- collected_op_values = self.run_epoch(datasets, ops_to_eval, kind=kind,
- keep_prob=1.0)
-
- total_cost = total_recon_cost = total_kl_cost = 0.0
- # normalizing by batch done in distributions.py
- epoch_size = len(collected_op_values)
- for op_values in collected_op_values:
- total_cost += op_values[0]
- total_recon_cost += op_values[1]
- total_kl_cost += op_values[2]
-
- epoch_total_cost = total_cost / epoch_size
- epoch_recon_cost = total_recon_cost / epoch_size
- epoch_kl_cost = total_kl_cost / epoch_size
-
- return epoch_total_cost, epoch_recon_cost, epoch_kl_cost
-
- def eval_model_runs_batch(self, data_name, data_bxtxd, ext_input_bxtxi=None,
- do_eval_cost=False, do_average_batch=False):
- """Returns all the goodies for the entire model, per batch.
-
- data_bxtxd and ext_input_bxtxi may have fewer than batch_size examples along
- the first (batch) dimension; in that case this method pads and truncates them
- automatically.
-
- Args:
- data_name: The name of the data dict, to select which in/out matrices
- to use.
- data_bxtxd: Numpy array training data with shape:
- batch_size x # time steps x # dimensions
- ext_input_bxtxi: Numpy array training external input with shape:
- batch_size x # time steps x # external input dims
- do_eval_cost (optional): If true, evaluate the IWAE (Importance Weighted
- Autoencoder) log likelihood bound instead of the VAE version.
- do_average_batch (optional): average over the batch, useful for getting
- good IWAE costs, and model outputs for a single data point.
-
- Returns:
- A dictionary with the outputs of the model decoder, namely:
- prior g0 mean, prior g0 variance, approx. posterior mean, approx.
- posterior variance, the generator initial conditions, the control inputs (if
- enabled), the state of the generator, the factors, and the rates.
- """
- session = tf.get_default_session()
-
- # if fewer than batch_size provided, pad to batch_size
- hps = self.hps
- batch_size = hps.batch_size
- E, _, _ = data_bxtxd.shape
- if E < hps.batch_size:
- data_bxtxd = np.pad(data_bxtxd, ((0, hps.batch_size-E), (0, 0), (0, 0)),
- mode='constant', constant_values=0)
- if ext_input_bxtxi is not None:
- ext_input_bxtxi = np.pad(ext_input_bxtxi,
- ((0, hps.batch_size-E), (0, 0), (0, 0)),
- mode='constant', constant_values=0)
-
- feed_dict = self.build_feed_dict(data_name, data_bxtxd,
- ext_input_bxtxi, keep_prob=1.0)
-
- # Non-temporal signals will be batch x dim.
- # Temporal signals are list length T with elements batch x dim.
- tf_vals = [self.gen_ics, self.gen_states, self.factors,
- self.output_dist_params]
- tf_vals.append(self.cost)
- tf_vals.append(self.nll_bound_vae)
- tf_vals.append(self.nll_bound_iwae)
- tf_vals.append(self.train_step) # not train_op!
- if self.hps.ic_dim > 0:
- tf_vals += [self.prior_zs_g0.mean, self.prior_zs_g0.logvar,
- self.posterior_zs_g0.mean, self.posterior_zs_g0.logvar]
- if self.hps.co_dim > 0:
- tf_vals.append(self.controller_outputs)
- tf_vals_flat, fidxs = flatten(tf_vals)
-
- np_vals_flat = session.run(tf_vals_flat, feed_dict=feed_dict)
-
- ff = 0
- gen_ics = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- gen_states = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- factors = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- out_dist_params = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- costs = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- nll_bound_vaes = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- nll_bound_iwaes = [np_vals_flat[f] for f in fidxs[ff]]; ff +=1
- train_steps = [np_vals_flat[f] for f in fidxs[ff]]; ff +=1
- if self.hps.ic_dim > 0:
- prior_g0_mean = [np_vals_flat[f] for f in fidxs[ff]]; ff +=1
- prior_g0_logvar = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- post_g0_mean = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- post_g0_logvar = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- if self.hps.co_dim > 0:
- controller_outputs = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
-
- # [0] are to take out the non-temporal items from lists
- gen_ics = gen_ics[0]
- costs = costs[0]
- nll_bound_vaes = nll_bound_vaes[0]
- nll_bound_iwaes = nll_bound_iwaes[0]
- train_steps = train_steps[0]
-
- # Convert to full tensors, not lists of tensors in time dim.
- gen_states = list_t_bxn_to_tensor_bxtxn(gen_states)
- factors = list_t_bxn_to_tensor_bxtxn(factors)
- out_dist_params = list_t_bxn_to_tensor_bxtxn(out_dist_params)
- if self.hps.ic_dim > 0:
- # select first time point
- prior_g0_mean = prior_g0_mean[0]
- prior_g0_logvar = prior_g0_logvar[0]
- post_g0_mean = post_g0_mean[0]
- post_g0_logvar = post_g0_logvar[0]
- if self.hps.co_dim > 0:
- controller_outputs = list_t_bxn_to_tensor_bxtxn(controller_outputs)
-
- # slice out the trials in case < batch_size provided
- if E < hps.batch_size:
- idx = np.arange(E)
- gen_ics = gen_ics[idx, :]
- gen_states = gen_states[idx, :]
- factors = factors[idx, :, :]
- out_dist_params = out_dist_params[idx, :, :]
- if self.hps.ic_dim > 0:
- prior_g0_mean = prior_g0_mean[idx, :]
- prior_g0_logvar = prior_g0_logvar[idx, :]
- post_g0_mean = post_g0_mean[idx, :]
- post_g0_logvar = post_g0_logvar[idx, :]
- if self.hps.co_dim > 0:
- controller_outputs = controller_outputs[idx, :, :]
-
- if do_average_batch:
- gen_ics = np.mean(gen_ics, axis=0)
- gen_states = np.mean(gen_states, axis=0)
- factors = np.mean(factors, axis=0)
- out_dist_params = np.mean(out_dist_params, axis=0)
- if self.hps.ic_dim > 0:
- prior_g0_mean = np.mean(prior_g0_mean, axis=0)
- prior_g0_logvar = np.mean(prior_g0_logvar, axis=0)
- post_g0_mean = np.mean(post_g0_mean, axis=0)
- post_g0_logvar = np.mean(post_g0_logvar, axis=0)
- if self.hps.co_dim > 0:
- controller_outputs = np.mean(controller_outputs, axis=0)
-
- model_vals = {}
- model_vals['gen_ics'] = gen_ics
- model_vals['gen_states'] = gen_states
- model_vals['factors'] = factors
- model_vals['output_dist_params'] = out_dist_params
- model_vals['costs'] = costs
- model_vals['nll_bound_vaes'] = nll_bound_vaes
- model_vals['nll_bound_iwaes'] = nll_bound_iwaes
- model_vals['train_steps'] = train_steps
- if self.hps.ic_dim > 0:
- model_vals['prior_g0_mean'] = prior_g0_mean
- model_vals['prior_g0_logvar'] = prior_g0_logvar
- model_vals['post_g0_mean'] = post_g0_mean
- model_vals['post_g0_logvar'] = post_g0_logvar
- if self.hps.co_dim > 0:
- model_vals['controller_outputs'] = controller_outputs
-
- return model_vals
-
- def eval_model_runs_avg_epoch(self, data_name, data_extxd,
- ext_input_extxi=None):
- """Returns all the expected value for goodies for the entire model.
-
- The expected value is taken over hidden (z) variables, namely the initial
- conditions and the control inputs. The expected value is approximate, and is
- computed by drawing batch_size posterior samples for each example.
-
- Args:
- data_name: The name of the data dict, to select which in/out matrices
- to use.
- data_extxd: Numpy array training data with shape:
- # examples x # time steps x # dimensions
- ext_input_extxi (optional): Numpy array training external input with
- shape: # examples x # time steps x # external input dims
-
- Returns:
- A dictionary with the averaged outputs of the model decoder, namely:
- prior g0 mean, prior g0 variance, approx. posterior mean, approx.
- posterior variance, the generator initial conditions, the control inputs (if
- enabled), the state of the generator, the factors, and the output
- distribution parameters, e.g. (rates or mean and variances).
- """
- hps = self.hps
- batch_size = hps.batch_size
- E, T, D = data_extxd.shape
- E_to_process = hps.ps_nexamples_to_process
- if E_to_process > E:
- E_to_process = E
-
- if hps.ic_dim > 0:
- prior_g0_mean = np.zeros([E_to_process, hps.ic_dim])
- prior_g0_logvar = np.zeros([E_to_process, hps.ic_dim])
- post_g0_mean = np.zeros([E_to_process, hps.ic_dim])
- post_g0_logvar = np.zeros([E_to_process, hps.ic_dim])
-
- if hps.co_dim > 0:
- controller_outputs = np.zeros([E_to_process, T, hps.co_dim])
- gen_ics = np.zeros([E_to_process, hps.gen_dim])
- gen_states = np.zeros([E_to_process, T, hps.gen_dim])
- factors = np.zeros([E_to_process, T, hps.factors_dim])
-
- if hps.output_dist == 'poisson':
- out_dist_params = np.zeros([E_to_process, T, D])
- elif hps.output_dist == 'gaussian':
- out_dist_params = np.zeros([E_to_process, T, D+D])
- else:
- assert False, "NIY"
-
- costs = np.zeros(E_to_process)
- nll_bound_vaes = np.zeros(E_to_process)
- nll_bound_iwaes = np.zeros(E_to_process)
- train_steps = np.zeros(E_to_process)
- for es_idx in range(E_to_process):
- print("Running %d of %d." % (es_idx+1, E_to_process))
- example_idxs = es_idx * np.ones(batch_size, dtype=np.int32)
- data_bxtxd, ext_input_bxtxi = self.get_batch(data_extxd,
- ext_input_extxi,
- batch_size=batch_size,
- example_idxs=example_idxs)
- model_values = self.eval_model_runs_batch(data_name, data_bxtxd,
- ext_input_bxtxi,
- do_eval_cost=True,
- do_average_batch=True)
-
- if self.hps.ic_dim > 0:
- prior_g0_mean[es_idx,:] = model_values['prior_g0_mean']
- prior_g0_logvar[es_idx,:] = model_values['prior_g0_logvar']
- post_g0_mean[es_idx,:] = model_values['post_g0_mean']
- post_g0_logvar[es_idx,:] = model_values['post_g0_logvar']
- gen_ics[es_idx,:] = model_values['gen_ics']
-
- if self.hps.co_dim > 0:
- controller_outputs[es_idx,:,:] = model_values['controller_outputs']
- gen_states[es_idx,:,:] = model_values['gen_states']
- factors[es_idx,:,:] = model_values['factors']
- out_dist_params[es_idx,:,:] = model_values['output_dist_params']
- costs[es_idx] = model_values['costs']
- nll_bound_vaes[es_idx] = model_values['nll_bound_vaes']
- nll_bound_iwaes[es_idx] = model_values['nll_bound_iwaes']
- train_steps[es_idx] = model_values['train_steps']
- print('bound nll(vae): %.3f, bound nll(iwae): %.3f' \
- % (nll_bound_vaes[es_idx], nll_bound_iwaes[es_idx]))
-
- model_runs = {}
- if self.hps.ic_dim > 0:
- model_runs['prior_g0_mean'] = prior_g0_mean
- model_runs['prior_g0_logvar'] = prior_g0_logvar
- model_runs['post_g0_mean'] = post_g0_mean
- model_runs['post_g0_logvar'] = post_g0_logvar
- model_runs['gen_ics'] = gen_ics
-
- if self.hps.co_dim > 0:
- model_runs['controller_outputs'] = controller_outputs
- model_runs['gen_states'] = gen_states
- model_runs['factors'] = factors
- model_runs['output_dist_params'] = out_dist_params
- model_runs['costs'] = costs
- model_runs['nll_bound_vaes'] = nll_bound_vaes
- model_runs['nll_bound_iwaes'] = nll_bound_iwaes
- model_runs['train_steps'] = train_steps
- return model_runs
-
- def eval_model_runs_push_mean(self, data_name, data_extxd,
- ext_input_extxi=None):
- """Returns values of interest for the model by pushing the means through
-
- The mean values for both initial conditions and the control inputs are
- pushed through the model instead of sampling (as is done in
- eval_model_runs_avg_epoch).
- This is a quick, approximate way of estimating these values, compared to
- sampling from the posterior many times and then averaging those samples.
-
- Internally, a total of batch_size trials are run through the model at once.
-
- Args:
- data_name: The name of the data dict, to select which in/out matrices
- to use.
- data_extxd: Numpy array training data with shape:
- # examples x # time steps x # dimensions
- ext_input_extxi (optional): Numpy array training external input with
- shape: # examples x # time steps x # external input dims
-
- Returns:
- A dictionary with the estimated outputs of the model decoder, namely:
- prior g0 mean, prior g0 variance, approx. posterior mean, approx.
- posterior variance, the generator initial conditions, the control inputs (if
- enabled), the state of the generator, the factors, and the output
- distribution parameters, e.g. (rates or mean and variances).
- """
- hps = self.hps
- batch_size = hps.batch_size
- E, T, D = data_extxd.shape
- E_to_process = hps.ps_nexamples_to_process
- if E_to_process > E:
- print("Setting number of posterior samples to process to : ", E)
- E_to_process = E
-
- if hps.ic_dim > 0:
- prior_g0_mean = np.zeros([E_to_process, hps.ic_dim])
- prior_g0_logvar = np.zeros([E_to_process, hps.ic_dim])
- post_g0_mean = np.zeros([E_to_process, hps.ic_dim])
- post_g0_logvar = np.zeros([E_to_process, hps.ic_dim])
-
- if hps.co_dim > 0:
- controller_outputs = np.zeros([E_to_process, T, hps.co_dim])
- gen_ics = np.zeros([E_to_process, hps.gen_dim])
- gen_states = np.zeros([E_to_process, T, hps.gen_dim])
- factors = np.zeros([E_to_process, T, hps.factors_dim])
-
- if hps.output_dist == 'poisson':
- out_dist_params = np.zeros([E_to_process, T, D])
- elif hps.output_dist == 'gaussian':
- out_dist_params = np.zeros([E_to_process, T, D+D])
- else:
- assert False, "NIY"
-
- costs = np.zeros(E_to_process)
- nll_bound_vaes = np.zeros(E_to_process)
- nll_bound_iwaes = np.zeros(E_to_process)
- train_steps = np.zeros(E_to_process)
-
- # generator that will yield 0:N in groups of per items, e.g.
- # (0:per-1), (per:2*per-1), ..., with the last group containing <= per items
- # this will be used to feed per=batch_size trials into the model at a time
- def trial_batches(N, per):
- for i in range(0, N, per):
- yield np.arange(i, min(i+per, N), dtype=np.int32)
-
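- # Quick standalone check (toy sizes) of the generator defined above: 10
- # trials in groups of 4 yield index groups of sizes 4, 4 and 2.
- assert [len(g) for g in trial_batches(10, 4)] == [4, 4, 2]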
- for batch_idx, es_idx in enumerate(trial_batches(E_to_process,
- hps.batch_size)):
- print("Running trial batch %d with %d trials" % (batch_idx+1,
- len(es_idx)))
- data_bxtxd, ext_input_bxtxi = self.get_batch(data_extxd,
- ext_input_extxi,
- batch_size=batch_size,
- example_idxs=es_idx)
- model_values = self.eval_model_runs_batch(data_name, data_bxtxd,
- ext_input_bxtxi,
- do_eval_cost=True,
- do_average_batch=False)
-
- if self.hps.ic_dim > 0:
- prior_g0_mean[es_idx,:] = model_values['prior_g0_mean']
- prior_g0_logvar[es_idx,:] = model_values['prior_g0_logvar']
- post_g0_mean[es_idx,:] = model_values['post_g0_mean']
- post_g0_logvar[es_idx,:] = model_values['post_g0_logvar']
- gen_ics[es_idx,:] = model_values['gen_ics']
-
- if self.hps.co_dim > 0:
- controller_outputs[es_idx,:,:] = model_values['controller_outputs']
- gen_states[es_idx,:,:] = model_values['gen_states']
- factors[es_idx,:,:] = model_values['factors']
- out_dist_params[es_idx,:,:] = model_values['output_dist_params']
-
- # TODO
- # model_values['costs'] and other costs come out as scalars, summed over
- # all the trials in the batch; what we want is the per-trial costs.
- costs[es_idx] = model_values['costs']
- nll_bound_vaes[es_idx] = model_values['nll_bound_vaes']
- nll_bound_iwaes[es_idx] = model_values['nll_bound_iwaes']
-
- train_steps[es_idx] = model_values['train_steps']
-
- model_runs = {}
- if self.hps.ic_dim > 0:
- model_runs['prior_g0_mean'] = prior_g0_mean
- model_runs['prior_g0_logvar'] = prior_g0_logvar
- model_runs['post_g0_mean'] = post_g0_mean
- model_runs['post_g0_logvar'] = post_g0_logvar
- model_runs['gen_ics'] = gen_ics
-
- if self.hps.co_dim > 0:
- model_runs['controller_outputs'] = controller_outputs
- model_runs['gen_states'] = gen_states
- model_runs['factors'] = factors
- model_runs['output_dist_params'] = out_dist_params
-
- # You probably do not want the LL associated values when pushing the mean
- # instead of sampling.
- model_runs['costs'] = costs
- model_runs['nll_bound_vaes'] = nll_bound_vaes
- model_runs['nll_bound_iwaes'] = nll_bound_iwaes
- model_runs['train_steps'] = train_steps
- return model_runs
-
- def write_model_runs(self, datasets, output_fname=None, push_mean=False):
- """Run the model on the data in data_dict, and save the computed values.
-
- LFADS generates a number of outputs for each examples, and these are all
- saved. They are:
- The mean and variance of the prior of g0.
- The mean and variance of approximate posterior of g0.
- The control inputs (if enabled).
- The initial conditions, g0, for all examples.
- The generator states for all time.
- The factors for all time.
- The output distribution parameters (e.g. rates) for all time.
-
- Args:
- datasets: A dictionary of named data_dictionaries, see top of lfads.py
- output_fname: a file name stem for the output files.
- push_mean: If False (default), generates batch_size samples for each trial
- and averages the results. if True, runs each trial once without noise,
- pushing the posterior mean initial conditions and control inputs through
- the trained model. False is used for posterior_sample_and_average, True
- is used for posterior_push_mean.
- """
- hps = self.hps
- kind = hps.kind
-
- for data_name, data_dict in datasets.items():
- data_tuple = [('train', data_dict['train_data'],
- data_dict['train_ext_input']),
- ('valid', data_dict['valid_data'],
- data_dict['valid_ext_input'])]
- for data_kind, data_extxd, ext_input_extxi in data_tuple:
- if not output_fname:
- fname = "model_runs_" + data_name + '_' + data_kind + '_' + kind
- else:
- fname = output_fname + data_name + '_' + data_kind + '_' + kind
-
- print("Writing data for %s data and kind %s." % (data_name, data_kind))
- if push_mean:
- model_runs = self.eval_model_runs_push_mean(data_name, data_extxd,
- ext_input_extxi)
- else:
- model_runs = self.eval_model_runs_avg_epoch(data_name, data_extxd,
- ext_input_extxi)
- full_fname = os.path.join(hps.lfads_save_dir, fname)
- write_data(full_fname, model_runs, compression='gzip')
- print("Done.")
-
- def write_model_samples(self, dataset_name, output_fname=None):
- """Use the prior distribution to generate batch_size number of samples
- from the model.
-
- LFADS generates a number of outputs for each sample, and these are all
- saved. They are:
- The mean and variance of the prior of g0.
- The control inputs (if enabled).
- The initial conditions, g0, for all examples.
- The generator states for all time.
- The factors for all time.
- The output distribution parameters (e.g. rates) for all time.
-
- Args:
- dataset_name: The name of the dataset to grab the factors -> rates
- alignment matrices from.
- output_fname: The name of the file in which to save the generated
- samples.
- """
- hps = self.hps
- batch_size = hps.batch_size
-
- print("Generating %d samples" % (batch_size))
- tf_vals = [self.factors, self.gen_states, self.gen_ics,
- self.cost, self.output_dist_params]
- if hps.ic_dim > 0:
- tf_vals += [self.prior_zs_g0.mean, self.prior_zs_g0.logvar]
- if hps.co_dim > 0:
- tf_vals += [self.prior_zs_ar_con.samples_t]
- tf_vals_flat, fidxs = flatten(tf_vals)
-
- session = tf.get_default_session()
- feed_dict = {}
- feed_dict[self.dataName] = dataset_name
- feed_dict[self.keep_prob] = 1.0
-
- np_vals_flat = session.run(tf_vals_flat, feed_dict=feed_dict)
-
- ff = 0
- factors = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- gen_states = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- gen_ics = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- costs = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- output_dist_params = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- if hps.ic_dim > 0:
- prior_g0_mean = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- prior_g0_logvar = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
- if hps.co_dim > 0:
- prior_zs_ar_con = [np_vals_flat[f] for f in fidxs[ff]]; ff += 1
-
- # [0] are to take out the non-temporal items from lists
- gen_ics = gen_ics[0]
- costs = costs[0]
-
- # Convert to full tensors, not lists of tensors in time dim.
- gen_states = list_t_bxn_to_tensor_bxtxn(gen_states)
- factors = list_t_bxn_to_tensor_bxtxn(factors)
- output_dist_params = list_t_bxn_to_tensor_bxtxn(output_dist_params)
- if hps.ic_dim > 0:
- prior_g0_mean = prior_g0_mean[0]
- prior_g0_logvar = prior_g0_logvar[0]
- if hps.co_dim > 0:
- prior_zs_ar_con = list_t_bxn_to_tensor_bxtxn(prior_zs_ar_con)
-
- model_vals = {}
- model_vals['gen_ics'] = gen_ics
- model_vals['gen_states'] = gen_states
- model_vals['factors'] = factors
- model_vals['output_dist_params'] = output_dist_params
- model_vals['costs'] = costs.reshape(1)
- if hps.ic_dim > 0:
- model_vals['prior_g0_mean'] = prior_g0_mean
- model_vals['prior_g0_logvar'] = prior_g0_logvar
- if hps.co_dim > 0:
- model_vals['prior_zs_ar_con'] = prior_zs_ar_con
-
- full_fname = os.path.join(hps.lfads_save_dir, output_fname)
- write_data(full_fname, model_vals, compression='gzip')
- print("Done.")
-
- @staticmethod
- def eval_model_parameters(use_nested=True, include_strs=None):
- """Evaluate and return all of the TF variables in the model.
-
- Args:
- use_nested (optional): For returning values, use a nested dictionary, based
- on variable scoping, or return all variables in a flat dictionary.
- include_strs (optional): A list of strings to use as a filter, to reduce the
- number of variables returned. A variable name must contain at least one
- string in include_strs as a sub-string in order to be returned.
-
- Returns:
- The parameters of the model. This can be in a flat
- dictionary, or a nested dictionary, where the nesting is by variable
- scope.
- """
- all_tf_vars = tf.global_variables()
- session = tf.get_default_session()
- all_tf_vars_eval = session.run(all_tf_vars)
- vars_dict = {}
- strs = ["LFADS"]
- if include_strs:
- strs += include_strs
-
- for i, (var, var_eval) in enumerate(zip(all_tf_vars, all_tf_vars_eval)):
- if any(s in var.name for s in strs):
- if not isinstance(var_eval, np.ndarray): # for H5PY
- print(var.name, """ is not numpy array, saving as numpy array
- with value: """, var_eval, type(var_eval))
- e = np.array(var_eval)
- print(e, type(e))
- else:
- e = var_eval
- vars_dict[var.name] = e
-
- if not use_nested:
- return vars_dict
-
- var_names = vars_dict.keys()
- nested_vars_dict = {}
- current_dict = nested_vars_dict
- for v, var_name in enumerate(var_names):
- var_split_name_list = var_name.split('/')
- split_name_list_len = len(var_split_name_list)
- current_dict = nested_vars_dict
- for p, part in enumerate(var_split_name_list):
- if p < split_name_list_len - 1:
- if part in current_dict:
- current_dict = current_dict[part]
- else:
- current_dict[part] = {}
- current_dict = current_dict[part]
- else:
- current_dict[part] = vars_dict[var_name]
-
- return nested_vars_dict
-
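- # A standalone sketch (hypothetical variable names) of the nesting performed
- # above: scope-delimited names become nested dictionaries keyed by scope.
- toy_flat = {'LFADS/gen/W:0': np.zeros(2), 'LFADS/gen/b:0': np.ones(2)}
- toy_nested = {}
- for toy_name in toy_flat:
-   toy_parts = toy_name.split('/')
-   toy_d = toy_nested
-   for toy_scope in toy_parts[:-1]:
-     toy_d = toy_d.setdefault(toy_scope, {})
-   toy_d[toy_parts[-1]] = toy_flat[toy_name]
- assert sorted(toy_nested['LFADS']['gen'].keys()) == ['W:0', 'b:0']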
- @staticmethod
- def spikify_rates(rates_bxtxd):
- """Randomly spikify underlying rates according a Poisson distribution
-
- Args:
- rates_bxtxd: A numpy tensor of underlying rates with shape:
- batch_size x # time steps x # dimensions
-
- Returns:
- A numpy array with the same shape as rates_bxtxd, but with the event
- counts.
- """
-
- B,T,N = rates_bxtxd.shape
- assert all([B > 0, N > 0]), "problems"
-
- # Because the rates differ at every (batch, time, dim) entry, loop over all of them.
- spikes_bxtxd = np.zeros([B,T,N], dtype=np.int32)
- for b in range(B):
- for t in range(T):
- for n in range(N):
- rate = rates_bxtxd[b,t,n]
- count = np.random.poisson(rate)
- spikes_bxtxd[b,t,n] = count
-
- return spikes_bxtxd
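- # Note: np.random.poisson broadcasts over array-valued rates, so the triple
- # loop above can equivalently be written as a single vectorized call; a quick
- # check on toy rates:
- toy_rates = np.random.rand(2, 3, 4) * 5.0
- toy_spikes = np.random.poisson(toy_rates)
- assert toy_spikes.shape == toy_rates.shape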
diff --git a/research/lfads/plot_lfads.py b/research/lfads/plot_lfads.py
deleted file mode 100644
index c4e1a0332ef..00000000000
--- a/research/lfads/plot_lfads.py
+++ /dev/null
@@ -1,181 +0,0 @@
-# Copyright 2017 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# ==============================================================================
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import matplotlib
-matplotlib.use('Agg')
-from matplotlib import pyplot as plt
-import numpy as np
-import tensorflow as tf
-
-def _plot_item(W, name, full_name, nspaces):
- plt.figure()
- if W.shape == ():
- print(name, ": ", W)
- elif W.shape[0] == 1:
- plt.stem(W.T)
- plt.title(full_name)
- elif W.shape[1] == 1:
- plt.stem(W)
- plt.title(full_name)
- else:
- plt.imshow(np.abs(W), interpolation='nearest', cmap='jet')
- plt.colorbar()
- plt.title(full_name)
-
-
-def all_plot(d, full_name="", exclude="", nspaces=0):
- """Recursively plot all the LFADS model parameters in the nested
- dictionary."""
- for k, v in d.items():
- this_name = full_name+"/"+k
- if isinstance(v, dict):
- all_plot(v, full_name=this_name, exclude=exclude, nspaces=nspaces+4)
- else:
- if exclude == "" or exclude not in this_name:
- _plot_item(v, name=k, full_name=full_name+"/"+k, nspaces=nspaces+4)
-
-
-
-def plot_time_series(vals_bxtxn, bidx=None, n_to_plot=np.inf, scale=1.0,
- color='r', title=None):
-
- if bidx is None:
- vals_txn = np.mean(vals_bxtxn, axis=0)
- else:
- vals_txn = vals_bxtxn[bidx,:,:]
-
- T, N = vals_txn.shape
- if n_to_plot > N:
- n_to_plot = N
-
- plt.plot(vals_txn[:,0:n_to_plot] + scale*np.array(range(n_to_plot)),
- color=color, lw=1.0)
- plt.axis('tight')
- if title:
- plt.title(title)
-
-
-def plot_lfads_timeseries(data_bxtxn, model_vals, ext_input_bxtxi=None,
- truth_bxtxn=None, bidx=None, output_dist="poisson",
- conversion_factor=1.0, subplot_cidx=0,
- col_title=None):
-
- n_to_plot = 10
- scale = 1.0
- nrows = 7
- plt.subplot(nrows,2,1+subplot_cidx)
-
- if output_dist == 'poisson':
- rates = means = conversion_factor * model_vals['output_dist_params']
- plot_time_series(rates, bidx, n_to_plot=n_to_plot, scale=scale,
- title=col_title + " rates (LFADS - red, Truth - black)")
- elif output_dist == 'gaussian':
- means_vars = model_vals['output_dist_params']
- means, variances = np.split(means_vars, 2, axis=2) # bxtxn
- stds = np.sqrt(variances)
- plot_time_series(means, bidx, n_to_plot=n_to_plot, scale=scale,
- title=col_title + " means (LFADS - red, Truth - black)")
- plot_time_series(means+stds, bidx, n_to_plot=n_to_plot, scale=scale,
- color='c')
- plot_time_series(means-stds, bidx, n_to_plot=n_to_plot, scale=scale,
- color='c')
- else:
- assert False, 'NIY'
-
-
- if truth_bxtxn is not None:
- plot_time_series(truth_bxtxn, bidx, n_to_plot=n_to_plot, color='k',
- scale=scale)
-
- input_title = ""
- if "controller_outputs" in model_vals.keys():
- input_title += " Controller Output"
- plt.subplot(nrows,2,3+subplot_cidx)
- u_t = model_vals['controller_outputs'][0:-1]
- plot_time_series(u_t, bidx, n_to_plot=n_to_plot, color='c', scale=1.0,
- title=col_title + input_title)
-
- if ext_input_bxtxi is not None:
- input_title += " External Input"
- plot_time_series(ext_input_bxtxi, n_to_plot=n_to_plot, color='b',
- scale=scale, title=col_title + input_title)
-
- plt.subplot(nrows,2,5+subplot_cidx)
- plot_time_series(means, bidx,
- n_to_plot=n_to_plot, scale=1.0,
- title=col_title + " Spikes (LFADS - red, Spikes - black)")
- plot_time_series(data_bxtxn, bidx, n_to_plot=n_to_plot, color='k', scale=1.0)
-
- plt.subplot(nrows,2,7+subplot_cidx)
- plot_time_series(model_vals['factors'], bidx, n_to_plot=n_to_plot, color='b',
- scale=2.0, title=col_title + " Factors")
-
- plt.subplot(nrows,2,9+subplot_cidx)
- plot_time_series(model_vals['gen_states'], bidx, n_to_plot=n_to_plot,
- color='g', scale=1.0, title=col_title + " Generator State")
-
- if bidx is not None:
- data_nxt = data_bxtxn[bidx,:,:].T
- params_nxt = model_vals['output_dist_params'][bidx,:,:].T
- else:
- data_nxt = np.mean(data_bxtxn, axis=0).T
- params_nxt = np.mean(model_vals['output_dist_params'], axis=0).T
- if output_dist == 'poisson':
- means_nxt = params_nxt
- elif output_dist == 'gaussian': # (means+vars) x time
- means_nxt = np.vsplit(params_nxt,2)[0] # get means
- else:
- assert "NIY"
-
- plt.subplot(nrows,2,11+subplot_cidx)
- plt.imshow(data_nxt, aspect='auto', interpolation='nearest')
- plt.title(col_title + ' Data')
-
- plt.subplot(nrows,2,13+subplot_cidx)
- plt.imshow(means_nxt, aspect='auto', interpolation='nearest')
- plt.title(col_title + ' Means')
-
-
-def plot_lfads(train_bxtxd, train_model_vals,
- train_ext_input_bxtxi=None, train_truth_bxtxd=None,
- valid_bxtxd=None, valid_model_vals=None,
- valid_ext_input_bxtxi=None, valid_truth_bxtxd=None,
- bidx=None, cf=1.0, output_dist='poisson'):
-
- # Plotting
- f = plt.figure(figsize=(18,20), tight_layout=True)
- plot_lfads_timeseries(train_bxtxd, train_model_vals,
- train_ext_input_bxtxi,
- truth_bxtxn=train_truth_bxtxd,
- conversion_factor=cf, bidx=bidx,
- output_dist=output_dist, col_title='Train')
- plot_lfads_timeseries(valid_bxtxd, valid_model_vals,
- valid_ext_input_bxtxi,
- truth_bxtxn=valid_truth_bxtxd,
- conversion_factor=cf, bidx=bidx,
- output_dist=output_dist,
- subplot_cidx=1, col_title='Valid')
-
- # Convert the figure to a numpy array of width x height x 3 (last dim is RGB)
- f.canvas.draw()
- data = np.fromstring(f.canvas.tostring_rgb(), dtype=np.uint8, sep='')
- data_wxhx3 = data.reshape(f.canvas.get_width_height()[::-1] + (3,))
- plt.close()
-
- return data_wxhx3
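
The stacked-trace trick used by `plot_time_series` above (adding `scale * np.arange(n)` so each channel gets its own vertical offset) is easy to reproduce outside the removed module. A minimal sketch; `stacked_traces` is a hypothetical name and is not part of the deleted code:

```python
import numpy as np
import matplotlib.pyplot as plt

def stacked_traces(vals_txn, scale=1.0, color='r'):
    """Plot each column of a T x N array with a constant vertical offset."""
    _, n = vals_txn.shape
    offsets = scale * np.arange(n)       # one constant offset per channel
    plt.plot(vals_txn + offsets, color=color, lw=1.0)
    plt.axis('tight')

# Example: 100 time steps of 5 noisy channels, separated by 4 units each.
stacked_traces(np.random.randn(100, 5), scale=4.0)
plt.show()
```
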
diff --git a/research/lfads/run_lfads.py b/research/lfads/run_lfads.py
deleted file mode 100755
index bd1c0d5e4de..00000000000
--- a/research/lfads/run_lfads.py
+++ /dev/null
@@ -1,815 +0,0 @@
-# Copyright 2017 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# ==============================================================================
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-from lfads import LFADS
-import numpy as np
-import os
-import tensorflow as tf
-import re
-import utils
-import sys
-MAX_INT = sys.maxsize
-
-# Lots of hyperparameters, but most are pretty insensitive. The
-# explanation of these hyperparameters is found below, in the flags
-# section.
-
-CHECKPOINT_PB_LOAD_NAME = "checkpoint"
-CHECKPOINT_NAME = "lfads_vae"
-CSV_LOG = "fitlog"
-OUTPUT_FILENAME_STEM = ""
-DEVICE = "gpu:0" # "cpu:0", or other gpus, e.g. "gpu:1"
-MAX_CKPT_TO_KEEP = 5
-MAX_CKPT_TO_KEEP_LVE = 5
-PS_NEXAMPLES_TO_PROCESS = MAX_INT # if larger than number of examples, process all
-EXT_INPUT_DIM = 0
-IC_DIM = 64
-FACTORS_DIM = 50
-IC_ENC_DIM = 128
-GEN_DIM = 200
-GEN_CELL_INPUT_WEIGHT_SCALE = 1.0
-GEN_CELL_REC_WEIGHT_SCALE = 1.0
-CELL_WEIGHT_SCALE = 1.0
-BATCH_SIZE = 128
-LEARNING_RATE_INIT = 0.01
-LEARNING_RATE_DECAY_FACTOR = 0.95
-LEARNING_RATE_STOP = 0.00001
-LEARNING_RATE_N_TO_COMPARE = 6
-INJECT_EXT_INPUT_TO_GEN = False
-DO_TRAIN_IO_ONLY = False
-DO_TRAIN_ENCODER_ONLY = False
-DO_RESET_LEARNING_RATE = False
-FEEDBACK_FACTORS_OR_RATES = "factors"
-DO_TRAIN_READIN = True
-
-# Calibrated just above the average value for the rnn synthetic data.
-MAX_GRAD_NORM = 200.0
-CELL_CLIP_VALUE = 5.0
-KEEP_PROB = 0.95
-TEMPORAL_SPIKE_JITTER_WIDTH = 0
-OUTPUT_DISTRIBUTION = 'poisson' # 'poisson' or 'gaussian'
-NUM_STEPS_FOR_GEN_IC = MAX_INT # set to num_steps if greater than num_steps
-
-DATA_DIR = "/tmp/rnn_synth_data_v1.0/"
-DATA_FILENAME_STEM = "chaotic_rnn_inputs_g1p5"
-LFADS_SAVE_DIR = "/tmp/lfads_chaotic_rnn_inputs_g1p5/"
-CO_DIM = 1
-DO_CAUSAL_CONTROLLER = False
-DO_FEED_FACTORS_TO_CONTROLLER = True
-CONTROLLER_INPUT_LAG = 1
-PRIOR_AR_AUTOCORRELATION = 10.0
-PRIOR_AR_PROCESS_VAR = 0.1
-DO_TRAIN_PRIOR_AR_ATAU = True
-DO_TRAIN_PRIOR_AR_NVAR = True
-CI_ENC_DIM = 128
-CON_DIM = 128
-CO_PRIOR_VAR_SCALE = 0.1
-KL_INCREASE_STEPS = 2000
-L2_INCREASE_STEPS = 2000
-L2_GEN_SCALE = 2000.0
-L2_CON_SCALE = 0.0
-# scale of regularizer on time correlation of inferred inputs
-CO_MEAN_CORR_SCALE = 0.0
-KL_IC_WEIGHT = 1.0
-KL_CO_WEIGHT = 1.0
-KL_START_STEP = 0
-L2_START_STEP = 0
-IC_PRIOR_VAR_MIN = 0.1
-IC_PRIOR_VAR_SCALE = 0.1
-IC_PRIOR_VAR_MAX = 0.1
-IC_POST_VAR_MIN = 0.0001 # protection from KL blowing up
-
-flags = tf.app.flags
-flags.DEFINE_string("kind", "train",
- "Type of model to build {train, \
- posterior_sample_and_average, \
- posterior_push_mean, \
- prior_sample, write_model_params}")
-flags.DEFINE_string("output_dist", OUTPUT_DISTRIBUTION,
- "Type of output distribution, 'poisson' or 'gaussian'")
-flags.DEFINE_boolean("allow_gpu_growth", False,
- "If true, only allocate amount of memory needed for \
- Session. Otherwise, use full GPU memory.")
-
-# DATA
-flags.DEFINE_string("data_dir", DATA_DIR, "Data for training")
-flags.DEFINE_string("data_filename_stem", DATA_FILENAME_STEM,
- "Filename stem for data dictionaries.")
-flags.DEFINE_string("lfads_save_dir", LFADS_SAVE_DIR, "model save dir")
-flags.DEFINE_string("checkpoint_pb_load_name", CHECKPOINT_PB_LOAD_NAME,
- "Name of checkpoint files, use 'checkpoint_lve' for best \
- error")
-flags.DEFINE_string("checkpoint_name", CHECKPOINT_NAME,
- "Name of checkpoint files (.ckpt appended)")
-flags.DEFINE_string("output_filename_stem", OUTPUT_FILENAME_STEM,
- "Name of output file (postfix will be added)")
-flags.DEFINE_string("device", DEVICE,
- "Which device to use (default: \"gpu:0\", can also be \
- \"cpu:0\", \"gpu:1\", etc)")
-flags.DEFINE_string("csv_log", CSV_LOG,
- "Name of file to keep running log of fit likelihoods, \
- etc (.csv appended)")
-flags.DEFINE_integer("max_ckpt_to_keep", MAX_CKPT_TO_KEEP,
- "Max # of checkpoints to keep (rolling)")
-flags.DEFINE_integer("ps_nexamples_to_process", PS_NEXAMPLES_TO_PROCESS,
- "Number of examples to process for posterior sample and \
- average (not number of samples to average over).")
-flags.DEFINE_integer("max_ckpt_to_keep_lve", MAX_CKPT_TO_KEEP_LVE,
- "Max # of checkpoints to keep for lowest validation error \
- models (rolling)")
-flags.DEFINE_integer("ext_input_dim", EXT_INPUT_DIM, "Dimension of external \
-inputs")
-flags.DEFINE_integer("num_steps_for_gen_ic", NUM_STEPS_FOR_GEN_IC,
- "Number of steps to train the generator initial conditon.")
-
-
-# If there are observed inputs, there are two ways to add that observed
-# input to the model. The first is to treat it as something to be
-# inferred: the observed input is encoded via the encoders and then fed to
-# the generator via the "inferred inputs" channel. Second, one can inject
-# the input directly into the generator. This has the downside of making
-# the generation process strictly dependent on knowing the observed input
-# for any generated trial.
-flags.DEFINE_boolean("inject_ext_input_to_gen",
- INJECT_EXT_INPUT_TO_GEN,
- "Should observed inputs be input to model via encoders, \
- or injected directly into generator?")
-
-# CELL
-
-# The combined recurrent and input weights of the encoder and
-# controller cells are by default set to scale at ws/sqrt(#inputs),
-# with ws=1.0. You can change this scaling with this parameter.
-flags.DEFINE_float("cell_weight_scale", CELL_WEIGHT_SCALE,
- "Input scaling for input weights in generator.")
-
-
-# GENERATION
-
-# Note that the dimension of the initial conditions is separated from the
-# dimensions of the generator initial conditions (and a linear matrix will
-# adapt the shapes if necessary). This is just another way to control
-# complexity. In all likelihood, setting the ic dims to the size of the
-# generator hidden state is just fine.
-flags.DEFINE_integer("ic_dim", IC_DIM, "Dimension of h0")
-# Setting the dimensions of the factors to something smaller than the data
-# dimension is a way to get a reduced dimensionality representation of your
-# data.
-flags.DEFINE_integer("factors_dim", FACTORS_DIM,
- "Number of factors from generator")
-flags.DEFINE_integer("ic_enc_dim", IC_ENC_DIM,
- "Cell hidden size, encoder of h0")
-
-# Controlling the size of the generator is one way to control complexity of
-# the dynamics (there is also l2, which will squeeze out unnecessary
-# dynamics also). The modern deep learning approach is to make these cells
-# as large as tolerable (from a waiting perspective), and then regularize
-# them to death with drop out or whatever. I don't know if this is correct
-# for the LFADS application or not.
-flags.DEFINE_integer("gen_dim", GEN_DIM,
- "Cell hidden size, generator.")
-# The weights of the generator cell are by default set to scale at
-# ws/sqrt(#inputs), with ws=1.0. You can change ws for
-# the input weights or the recurrent weights with these hyperparameters.
-flags.DEFINE_float("gen_cell_input_weight_scale", GEN_CELL_INPUT_WEIGHT_SCALE,
- "Input scaling for input weights in generator.")
-flags.DEFINE_float("gen_cell_rec_weight_scale", GEN_CELL_REC_WEIGHT_SCALE,
- "Input scaling for rec weights in generator.")
-
-# KL DISTRIBUTIONS
-# If you don't know what you are doing here, please leave these alone; the
-# defaults should be fine for most cases, regardless of other parameters.
-#
-# If you don't want the prior variance to be learned, set the
-# following values to the same thing: ic_prior_var_min,
-# ic_prior_var_scale, ic_prior_var_max. The prior mean will be
-# learned regardless.
-flags.DEFINE_float("ic_prior_var_min", IC_PRIOR_VAR_MIN,
- "Minimum variance in posterior h0 codes.")
-flags.DEFINE_float("ic_prior_var_scale", IC_PRIOR_VAR_SCALE,
- "Variance of ic prior distribution")
-flags.DEFINE_float("ic_prior_var_max", IC_PRIOR_VAR_MAX,
- "Maximum variance of IC prior distribution.")
-# If you really want to limit the information from encoder to decoder,
-# increase ic_post_var_min above 0.0.
-flags.DEFINE_float("ic_post_var_min", IC_POST_VAR_MIN,
- "Minimum variance of IC posterior distribution.")
-flags.DEFINE_float("co_prior_var_scale", CO_PRIOR_VAR_SCALE,
- "Variance of control input prior distribution.")
-
-
-flags.DEFINE_float("prior_ar_atau", PRIOR_AR_AUTOCORRELATION,
- "Initial autocorrelation of AR(1) priors.")
-flags.DEFINE_float("prior_ar_nvar", PRIOR_AR_PROCESS_VAR,
- "Initial noise variance for AR(1) priors.")
-flags.DEFINE_boolean("do_train_prior_ar_atau", DO_TRAIN_PRIOR_AR_ATAU,
- "Is the value for atau an init, or the constant value?")
-flags.DEFINE_boolean("do_train_prior_ar_nvar", DO_TRAIN_PRIOR_AR_NVAR,
- "Is the value for noise variance an init, or the constant \
- value?")
-
-# CONTROLLER
-# This parameter critically controls whether or not there is a controller
-# (along with controller encoders) placed into the LFADS graph. If CO_DIM > 0,
-# a controller with CO_DIM-dimensional outputs is built; if it is equal to 0,
-# then there is no controller.
-flags.DEFINE_integer("co_dim", CO_DIM,
- "Number of control net outputs (>0 builds that graph).")
-
-# The controller will be more powerful if it can see the encoding of the entire
-# trial. However, this allows the controller to create inferred inputs that are
-# acausal with respect to the actual data generation process. E.g. the data
-# generator could have an input at time t, but the controller, after seeing the
-# entirety of the trial could infer that the input is coming a little before
-# time t, because there are no restrictions on the data the controller sees.
-# One can force the controller to be causal (with respect to perturbations in
-# the data generator) so that it only sees forward encodings of the data at time
-# t that originate at times before or at time t. One can also control the data
-# the controller sees by using an input lag (forward encoding at time [t-tlag]
-# for controller input at time t). The same can be done in the reverse direction
-# (controller input at time t from reverse encoding at time [t+tlag], in the
-# case of an acausal controller). Setting this lag > 0 (even lag=1) can be a
-# powerful way of avoiding very spiky decodes. Finally, one can manually control
-# whether the factors at time t-1 are fed to the controller at time t.
-#
-# If you don't care about any of this, and just want to smooth your data, set
-# do_causal_controller = False
-# do_feed_factors_to_controller = True
-# causal_input_lag = 0
-flags.DEFINE_boolean("do_causal_controller",
- DO_CAUSAL_CONTROLLER,
- "Restrict the controller create only causal inferred \
- inputs?")
-# Strictly speaking, feeding either the factors or the rates to the controller
-# violates causality, since the g0 gets to see all the data. This may or may not
-# be only a theoretical concern.
-flags.DEFINE_boolean("do_feed_factors_to_controller",
- DO_FEED_FACTORS_TO_CONTROLLER,
- "Should factors[t-1] be input to controller at time t?")
-flags.DEFINE_string("feedback_factors_or_rates", FEEDBACK_FACTORS_OR_RATES,
- "Feedback the factors or the rates to the controller? \
- Acceptable values: 'factors' or 'rates'.")
-flags.DEFINE_integer("controller_input_lag", CONTROLLER_INPUT_LAG,
- "Time lag on the encoding to controller t-lag for \
- forward, t+lag for reverse.")
-
-flags.DEFINE_integer("ci_enc_dim", CI_ENC_DIM,
- "Cell hidden size, encoder of control inputs")
-flags.DEFINE_integer("con_dim", CON_DIM,
- "Cell hidden size, controller")
-
-
-# OPTIMIZATION
-flags.DEFINE_integer("batch_size", BATCH_SIZE,
- "Batch size to use during training.")
-flags.DEFINE_float("learning_rate_init", LEARNING_RATE_INIT,
- "Learning rate initial value")
-flags.DEFINE_float("learning_rate_decay_factor", LEARNING_RATE_DECAY_FACTOR,
- "Learning rate decay, decay by this fraction every so \
- often.")
-flags.DEFINE_float("learning_rate_stop", LEARNING_RATE_STOP,
- "The lr is adaptively reduced, stop training at this value.")
-# Rather than put the learning rate on an exponentially decreasing schedule,
-# the current algorithm monitors the training cost and, if it isn't regularly
-# decreasing, lowers the learning rate. So far, it works fine, though it is
-# not perfect.
-flags.DEFINE_integer("learning_rate_n_to_compare", LEARNING_RATE_N_TO_COMPARE,
- "Number of previous costs current cost has to be worse \
- than, to lower learning rate.")
-
-# This sets a value above which the gradients will be clipped. This hp
-# is extremely useful to avoid an infrequent, but highly pathological,
-# problem whereby the gradient is so large that it destroys the
-# optimization by setting parameters too large, leading to a vicious cycle
-# that ends in NaNs. If it's too large, it's useless; if it's too small,
-# it essentially becomes the learning rate. It's pretty insensitive, though.
-flags.DEFINE_float("max_grad_norm", MAX_GRAD_NORM,
- "Max norm of gradient before clipping.")
-
-# If your optimizations start "NaN-ing out", reduce this value so that
-# the values of the network don't grow out of control. Typically, once
-# this parameter is set to a reasonable value, one stops having numerical
-# problems.
-flags.DEFINE_float("cell_clip_value", CELL_CLIP_VALUE,
- "Max value recurrent cell can take before being clipped.")
-
-# This flag is used for an experiment where one sees if training a model with
-# many days' data can be used to learn the dynamics of a held-out day's data.
-# If you don't care about that particular experiment, this flag should always be
-# false.
-flags.DEFINE_boolean("do_train_io_only", DO_TRAIN_IO_ONLY,
- "Train only the input (readin) and output (readout) \
- affine functions.")
-
-# This flag is used for an experiment where one wants to know if the dynamics
-# learned by the generator generalize across conditions. In that case, you might
-# train up a model on one set of data, and then only further train the encoder
-# on another set of data (the conditions to be tested) so that the model is
-# forced to use the same dynamics to describe that data. If you don't care about
-# that particular experiment, this flag should always be false.
-flags.DEFINE_boolean("do_train_encoder_only", DO_TRAIN_ENCODER_ONLY,
- "Train only the encoder weights.")
-
-flags.DEFINE_boolean("do_reset_learning_rate", DO_RESET_LEARNING_RATE,
- "Reset the learning rate to initial value.")
-
-
-# for multi-session "stitching" models, the per-session readin matrices map from
-# neurons to input factors which are fed into the shared encoder. These are
-# initialized by alignment_matrix_cxf and alignment_bias_c in the input .h5
-# files. They can be fixed or made trainable.
-flags.DEFINE_boolean("do_train_readin", DO_TRAIN_READIN, "Whether to train the \
- readin matrices and bias vectors. False leaves them fixed \
- at their initial values specified by the alignment \
- matrices and vectors.")
-
-
-# OVERFITTING
-# Dropout is done on the input data, on controller inputs (from
-# encoder), on outputs from generator to factors.
-flags.DEFINE_float("keep_prob", KEEP_PROB, "Dropout keep probability.")
-# It appears that the system will happily fit spikes (blessing or
-# curse, depending). You may not want this. Jittering the spikes a
-# bit will help (-/+ bin size, as specified here).
-flags.DEFINE_integer("temporal_spike_jitter_width",
- TEMPORAL_SPIKE_JITTER_WIDTH,
- "Shuffle spikes around this window.")
-
-# General note about helping ascribe controller inputs vs dynamics:
-#
-# If controller is heavily penalized, then it won't have any output.
-# If dynamics are heavily penalized, then generator won't make
-# dynamics. Note this l2 penalty is only on the recurrent portion of
-# the RNNs, as dropout is also available, penalizing the feed-forward
-# connections.
-flags.DEFINE_float("l2_gen_scale", L2_GEN_SCALE,
- "L2 regularization cost for the generator only.")
-flags.DEFINE_float("l2_con_scale", L2_CON_SCALE,
- "L2 regularization cost for the controller only.")
-flags.DEFINE_float("co_mean_corr_scale", CO_MEAN_CORR_SCALE,
- "Cost of correlation (thru time)in the means of \
- controller output.")
-
-# UNDERFITTING
-# If the primary task of LFADS is "filtering" of data and not
-# generation, then it is possible that the KL penalty is too strong.
-# Empirically, we have found this to be the case. So we add a
-# hyperparameter in front of the two KL terms (one for the initial
-# conditions to the generator, the other for the controller outputs).
-# You should always think of the default values as 1.0, and that
-# leads to a standard VAE formulation whereby the numbers that are
-# optimized are a lower-bound on the log-likelihood of the data. When
-# these 2 HPs deviate from 1.0, one cannot make any statement about
-# what those LL lower bounds mean anymore, and they cannot be compared
-# (AFAIK).
-flags.DEFINE_float("kl_ic_weight", KL_IC_WEIGHT,
- "Strength of KL weight on initial conditions KL penatly.")
-flags.DEFINE_float("kl_co_weight", KL_CO_WEIGHT,
- "Strength of KL weight on controller output KL penalty.")
-
-# Sometimes the task can be sufficiently hard to learn that the
-# optimizer takes the 'easy route', and simply minimizes the KL
-# divergence, setting it to near zero, and the optimization gets
-# stuck. These two parameters will help avoid that by getting the
-# optimization to 'latch' on to the main optimization, and only
-# turning on the regularizers later.
-flags.DEFINE_integer("kl_start_step", KL_START_STEP,
- "Start increasing weight after this many steps.")
-# training passes, not epochs, increase by 0.5 every kl_increase_steps
-flags.DEFINE_integer("kl_increase_steps", KL_INCREASE_STEPS,
- "Increase weight of kl cost to avoid local minimum.")
-# Same story for l2 regularizer. One wants a simple generator, for scientific
-# reasons, but not at the expense of hosing the optimization.
-flags.DEFINE_integer("l2_start_step", L2_START_STEP,
- "Start increasing l2 weight after this many steps.")
-flags.DEFINE_integer("l2_increase_steps", L2_INCREASE_STEPS,
- "Increase weight of l2 cost to avoid local minimum.")
-
-FLAGS = flags.FLAGS
-
-
-def build_model(hps, kind="train", datasets=None):
- """Builds a model from either random initialization, or saved parameters.
-
- Args:
- hps: The hyper parameters for the model.
- kind: (optional) The kind of model to build. Training vs inference require
- different graphs.
- datasets: The datasets structure (see top of lfads.py).
-
- Returns:
- an LFADS model.
- """
-
- build_kind = kind
- if build_kind == "write_model_params":
- build_kind = "train"
- with tf.variable_scope("LFADS", reuse=None):
- model = LFADS(hps, kind=build_kind, datasets=datasets)
-
- if not os.path.exists(hps.lfads_save_dir):
- print("Save directory %s does not exist, creating it." % hps.lfads_save_dir)
- os.makedirs(hps.lfads_save_dir)
-
- cp_pb_ln = hps.checkpoint_pb_load_name
- cp_pb_ln = 'checkpoint' if cp_pb_ln == "" else cp_pb_ln
- if cp_pb_ln == 'checkpoint':
- print("Loading latest training checkpoint in: ", hps.lfads_save_dir)
- saver = model.seso_saver
- elif cp_pb_ln == 'checkpoint_lve':
- print("Loading lowest validation checkpoint in: ", hps.lfads_save_dir)
- saver = model.lve_saver
- else:
- print("Loading checkpoint: ", cp_pb_ln, ", in: ", hps.lfads_save_dir)
- saver = model.seso_saver
-
- ckpt = tf.train.get_checkpoint_state(hps.lfads_save_dir,
- latest_filename=cp_pb_ln)
-
- session = tf.get_default_session()
- print("ckpt: ", ckpt)
- if ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path):
- print("Reading model parameters from %s" % ckpt.model_checkpoint_path)
- saver.restore(session, ckpt.model_checkpoint_path)
- else:
- print("Created model with fresh parameters.")
- if kind in ["posterior_sample_and_average", "posterior_push_mean",
- "prior_sample", "write_model_params"]:
- print("Possible error!!! You are running ", kind, " on a newly \
- initialized model!")
- # cannot print ckpt.model_checkpoint_path if there is no ckpt
- print("Are you sure a checkpoint in ", hps.lfads_save_dir,
- " exists?")
-
- tf.global_variables_initializer().run()
-
- if ckpt:
- train_step_str = re.search('-[0-9]+$', ckpt.model_checkpoint_path).group()
- else:
- train_step_str = '-0'
-
- fname = 'hyperparameters' + train_step_str + '.txt'
- hp_fname = os.path.join(hps.lfads_save_dir, fname)
- hps_for_saving = jsonify_dict(hps)
- utils.write_data(hp_fname, hps_for_saving, use_json=True)
-
- return model
-
-
-def jsonify_dict(d):
- """Turns python booleans into strings so hps dict can be written in json.
- Creates a shallow-copied dictionary first, then accomplishes string
- conversion.
-
- Args:
- d: hyperparameter dictionary
-
- Returns: hyperparameter dictionary with bool's as strings
- """
-
- d2 = d.copy() # shallow copy is fine by assumption of d being shallow
- def jsonify_bool(boolean_value):
- if boolean_value:
- return "true"
- else:
- return "false"
-
- for key in d2.keys():
- if isinstance(d2[key], bool):
- d2[key] = jsonify_bool(d2[key])
- return d2
-
-
-def build_hyperparameter_dict(flags):
- """Simple script for saving hyper parameters. Under the hood the
- flags structure isn't a dictionary, so it has to be simplified since we
- want to be able to view the file as text.
-
- Args:
- flags: From tf.app.flags
-
- Returns:
- dictionary of hyper parameters (ignoring other flag types).
- """
- d = {}
- # Data
- d['output_dist'] = flags.output_dist
- d['data_dir'] = flags.data_dir
- d['lfads_save_dir'] = flags.lfads_save_dir
- d['checkpoint_pb_load_name'] = flags.checkpoint_pb_load_name
- d['checkpoint_name'] = flags.checkpoint_name
- d['output_filename_stem'] = flags.output_filename_stem
- d['max_ckpt_to_keep'] = flags.max_ckpt_to_keep
- d['max_ckpt_to_keep_lve'] = flags.max_ckpt_to_keep_lve
- d['ps_nexamples_to_process'] = flags.ps_nexamples_to_process
- d['ext_input_dim'] = flags.ext_input_dim
- d['data_filename_stem'] = flags.data_filename_stem
- d['device'] = flags.device
- d['csv_log'] = flags.csv_log
- d['num_steps_for_gen_ic'] = flags.num_steps_for_gen_ic
- d['inject_ext_input_to_gen'] = flags.inject_ext_input_to_gen
- # Cell
- d['cell_weight_scale'] = flags.cell_weight_scale
- # Generation
- d['ic_dim'] = flags.ic_dim
- d['factors_dim'] = flags.factors_dim
- d['ic_enc_dim'] = flags.ic_enc_dim
- d['gen_dim'] = flags.gen_dim
- d['gen_cell_input_weight_scale'] = flags.gen_cell_input_weight_scale
- d['gen_cell_rec_weight_scale'] = flags.gen_cell_rec_weight_scale
- # KL distributions
- d['ic_prior_var_min'] = flags.ic_prior_var_min
- d['ic_prior_var_scale'] = flags.ic_prior_var_scale
- d['ic_prior_var_max'] = flags.ic_prior_var_max
- d['ic_post_var_min'] = flags.ic_post_var_min
- d['co_prior_var_scale'] = flags.co_prior_var_scale
- d['prior_ar_atau'] = flags.prior_ar_atau
- d['prior_ar_nvar'] = flags.prior_ar_nvar
- d['do_train_prior_ar_atau'] = flags.do_train_prior_ar_atau
- d['do_train_prior_ar_nvar'] = flags.do_train_prior_ar_nvar
- # Controller
- d['do_causal_controller'] = flags.do_causal_controller
- d['controller_input_lag'] = flags.controller_input_lag
- d['do_feed_factors_to_controller'] = flags.do_feed_factors_to_controller
- d['feedback_factors_or_rates'] = flags.feedback_factors_or_rates
- d['co_dim'] = flags.co_dim
- d['ci_enc_dim'] = flags.ci_enc_dim
- d['con_dim'] = flags.con_dim
- d['co_mean_corr_scale'] = flags.co_mean_corr_scale
- # Optimization
- d['batch_size'] = flags.batch_size
- d['learning_rate_init'] = flags.learning_rate_init
- d['learning_rate_decay_factor'] = flags.learning_rate_decay_factor
- d['learning_rate_stop'] = flags.learning_rate_stop
- d['learning_rate_n_to_compare'] = flags.learning_rate_n_to_compare
- d['max_grad_norm'] = flags.max_grad_norm
- d['cell_clip_value'] = flags.cell_clip_value
- d['do_train_io_only'] = flags.do_train_io_only
- d['do_train_encoder_only'] = flags.do_train_encoder_only
- d['do_reset_learning_rate'] = flags.do_reset_learning_rate
- d['do_train_readin'] = flags.do_train_readin
-
- # Overfitting
- d['keep_prob'] = flags.keep_prob
- d['temporal_spike_jitter_width'] = flags.temporal_spike_jitter_width
- d['l2_gen_scale'] = flags.l2_gen_scale
- d['l2_con_scale'] = flags.l2_con_scale
- # Underfitting
- d['kl_ic_weight'] = flags.kl_ic_weight
- d['kl_co_weight'] = flags.kl_co_weight
- d['kl_start_step'] = flags.kl_start_step
- d['kl_increase_steps'] = flags.kl_increase_steps
- d['l2_start_step'] = flags.l2_start_step
- d['l2_increase_steps'] = flags.l2_increase_steps
- d['_clip_value'] = 80 # bounds the tf.exp to avoid INF
-
- return d
-
-
-class hps_dict_to_obj(dict):
- """Helper class allowing us to access hps dictionary more easily."""
-
- def __getattr__(self, key):
- if key in self:
- return self[key]
- else:
- assert False, ("%s does not exist." % key)
- def __setattr__(self, key, value):
- self[key] = value
-
-
-def train(hps, datasets):
- """Train the LFADS model.
-
- Args:
- hps: The dictionary of hyperparameters.
- datasets: A dictionary of data dictionaries. The dataset dict is simply a
- name(string)-> data dictionary mapping (See top of lfads.py).
- """
- model = build_model(hps, kind="train", datasets=datasets)
- if hps.do_reset_learning_rate:
- sess = tf.get_default_session()
- sess.run(model.learning_rate.initializer)
-
- model.train_model(datasets)
-
-
-def write_model_runs(hps, datasets, output_fname=None, push_mean=False):
- """Run the model on the data in data_dict, and save the computed values.
-
- LFADS generates a number of outputs for each example, and these are all
- saved. They are:
- The mean and variance of the prior of g0.
- The mean and variance of approximate posterior of g0.
- The control inputs (if enabled)
- The initial conditions, g0, for all examples.
- The generator states for all time.
- The factors for all time.
- The rates for all time.
-
- Args:
- hps: The dictionary of hyperparameters.
- datasets: A dictionary of data dictionaries. The dataset dict is simply a
- name(string)-> data dictionary mapping (See top of lfads.py).
- output_fname (optional): output filename stem to write the model runs.
- push_mean: if False (default), generates batch_size samples for each trial
- and averages the results. if True, runs each trial once without noise,
- pushing the posterior mean initial conditions and control inputs through
- the trained model. False is used for posterior_sample_and_average, True
- is used for posterior_push_mean.
- """
- model = build_model(hps, kind=hps.kind, datasets=datasets)
- model.write_model_runs(datasets, output_fname, push_mean)
-
-
-def write_model_samples(hps, datasets, dataset_name=None, output_fname=None):
- """Use the prior distribution to generate samples from the model.
- Generates batch_size number of samples (set through FLAGS).
-
- LFADS generates a number of outputs for each example, and these are all
- saved. They are:
- The mean and variance of the prior of g0.
- The control inputs (if enabled)
- The initial conditions, g0, for all examples.
- The generator states for all time.
- The factors for all time.
- The output distribution parameters (e.g. rates) for all time.
-
- Args:
- hps: The dictionary of hyperparameters.
- datasets: A dictionary of data dictionaries. The dataset dict is simply a
- name(string)-> data dictionary mapping (See top of lfads.py).
- dataset_name: The name of the dataset to grab the factors -> rates
- alignment matrices from. Only a concern with models trained on
- multi-session data. By default, uses the first dataset in the data dict.
- output_fname: The name prefix of the file in which to save the generated
- samples.
- """
- if not output_fname:
- output_fname = "model_runs_" + hps.kind
- else:
- output_fname = output_fname + "model_runs_" + hps.kind
- if not dataset_name:
- dataset_name = list(datasets.keys())[0]
- else:
- if dataset_name not in datasets.keys():
- raise ValueError("Invalid dataset name '%s'."%(dataset_name))
- model = build_model(hps, kind=hps.kind, datasets=datasets)
- model.write_model_samples(dataset_name, output_fname)
-
-
-def write_model_parameters(hps, output_fname=None, datasets=None):
- """Save all the model parameters
-
- Save all the parameters to hps.lfads_save_dir.
-
- Args:
- hps: The dictionary of hyperparameters.
- output_fname: The prefix of the file in which to save the generated
- samples.
- datasets: A dictionary of data dictionaries. The dataset dict is simply a
- name(string)-> data dictionary mapping (See top of lfads.py).
- """
- if not output_fname:
- output_fname = "model_params"
- else:
- output_fname = output_fname + "_model_params"
- fname = os.path.join(hps.lfads_save_dir, output_fname)
- print("Writing model parameters to: ", fname)
- # save the optimizer params as well
- model = build_model(hps, kind="write_model_params", datasets=datasets)
- model_params = model.eval_model_parameters(use_nested=False,
- include_strs="LFADS")
- utils.write_data(fname, model_params, compression=None)
- print("Done.")
-
-
-def clean_data_dict(data_dict):
- """Add some key/value pairs to the data dict, if they are missing.
- Args:
- data_dict - dictionary containing data for LFADS
- Returns:
- data_dict with some keys filled in, if they are absent.
- """
-
- keys = ['train_truth', 'train_ext_input', 'valid_data',
- 'valid_truth', 'valid_ext_input', 'valid_train']
- for k in keys:
- if k not in data_dict:
- data_dict[k] = None
-
- return data_dict
-
-
-def load_datasets(data_dir, data_filename_stem):
- """Load the datasets from a specified directory.
-
- Example files look like
- >data_dir/my_dataset_first_day
- >data_dir/my_dataset_second_day
-
- If my_dataset (filename) stem is in the directory, the read routine will try
- and load it. The datasets dictionary will then look like
- dataset['first_day'] -> (first day data dictionary)
- dataset['second_day'] -> (second day data dictionary)
-
- Args:
- data_dir: The directory from which to load the datasets.
- data_filename_stem: The stem of the filename for the datasets.
-
- Returns:
- datasets: a dataset dictionary, with one name->data dictionary pair for
- each dataset file.
- """
- print("Reading data from ", data_dir)
- datasets = utils.read_datasets(data_dir, data_filename_stem)
- for k, data_dict in datasets.items():
- datasets[k] = clean_data_dict(data_dict)
-
- train_total_size = len(data_dict['train_data'])
- if train_total_size == 0:
- print("Did not load training set.")
- else:
- print("Found training set with number examples: ", train_total_size)
-
- valid_total_size = len(data_dict['valid_data'])
- if valid_total_size == 0:
- print("Did not load validation set.")
- else:
- print("Found validation set with number examples: ", valid_total_size)
-
- return datasets
-
-
-def main(_):
- """Get this whole shindig off the ground."""
- d = build_hyperparameter_dict(FLAGS)
- hps = hps_dict_to_obj(d) # hyper parameters
- kind = FLAGS.kind
-
- # Read the data, if necessary.
- train_set = valid_set = None
- if kind in ["train", "posterior_sample_and_average", "posterior_push_mean",
- "prior_sample", "write_model_params"]:
- datasets = load_datasets(hps.data_dir, hps.data_filename_stem)
- else:
- raise ValueError('Kind {} is not supported.'.format(kind))
-
- # infer the dataset names and dataset dimensions from the loaded files
- hps.kind = kind # needs to be added here because it is not saved as a hyperparam
- hps.dataset_names = []
- hps.dataset_dims = {}
- for key in datasets:
- hps.dataset_names.append(key)
- hps.dataset_dims[key] = datasets[key]['data_dim']
-
- # also store down the dimensionality of the data
- # - just pull from one set, required to be same for all sets
- hps.num_steps = list(datasets.values())[0]['num_steps']
- hps.ndatasets = len(hps.dataset_names)
-
- if hps.num_steps_for_gen_ic > hps.num_steps:
- hps.num_steps_for_gen_ic = hps.num_steps
-
- # Build and run the model, for varying purposes.
- config = tf.ConfigProto(allow_soft_placement=True,
- log_device_placement=False)
- if FLAGS.allow_gpu_growth:
- config.gpu_options.allow_growth = True
- sess = tf.Session(config=config)
- with sess.as_default():
- with tf.device(hps.device):
- if kind == "train":
- train(hps, datasets)
- elif kind == "posterior_sample_and_average":
- write_model_runs(hps, datasets, hps.output_filename_stem,
- push_mean=False)
- elif kind == "posterior_push_mean":
- write_model_runs(hps, datasets, hps.output_filename_stem,
- push_mean=True)
- elif kind == "prior_sample":
- write_model_samples(hps, datasets, hps.output_filename_stem)
- elif kind == "write_model_params":
- write_model_parameters(hps, hps.output_filename_stem, datasets)
- else:
- assert False, ("Kind %s is not implemented. " % kind)
-
-
-if __name__ == "__main__":
- tf.app.run()
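
The `kl_start_step`/`kl_increase_steps` (and `l2_*`) flags above describe a warm-up that keeps the optimizer from zeroing out the KL term before it has latched onto the reconstruction objective. The actual schedule is implemented in `lfads.py`, which is not shown in this hunk; purely as an illustration, a plausible linear ramp could look like the hypothetical helper below:

```python
def warmup_weight(step, start_step, increase_steps, max_weight=1.0):
    """Hypothetical linear ramp from 0 to max_weight, starting at start_step."""
    if increase_steps <= 0:
        return max_weight
    ramp = (step - start_step) / float(increase_steps)
    return min(max_weight, max(0.0, ramp))

# With the defaults above (kl_start_step=0, kl_increase_steps=2000):
for step in (0, 500, 1000, 2000, 5000):
    print(step, warmup_weight(step, start_step=0, increase_steps=2000))
```
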
diff --git a/research/lfads/synth_data/generate_chaotic_rnn_data.py b/research/lfads/synth_data/generate_chaotic_rnn_data.py
deleted file mode 100644
index 3de72e58b22..00000000000
--- a/research/lfads/synth_data/generate_chaotic_rnn_data.py
+++ /dev/null
@@ -1,200 +0,0 @@
-# Copyright 2017 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# ==============================================================================
-from __future__ import print_function
-
-import h5py
-import numpy as np
-import os
-import tensorflow as tf # used for flags here
-
-from utils import write_datasets
-from synthetic_data_utils import add_alignment_projections, generate_data
-from synthetic_data_utils import generate_rnn, get_train_n_valid_inds
-from synthetic_data_utils import nparray_and_transpose
-from synthetic_data_utils import spikify_data, gaussify_data, split_list_by_inds
-import matplotlib
-import matplotlib.pyplot as plt
-import scipy.signal
-
-matplotlib.rcParams['image.interpolation'] = 'nearest'
-DATA_DIR = "rnn_synth_data_v1.0"
-
-flags = tf.app.flags
-flags.DEFINE_string("save_dir", "/tmp/" + DATA_DIR + "/",
- "Directory for saving data.")
-flags.DEFINE_string("datafile_name", "thits_data",
- "Name of data file for input case.")
-flags.DEFINE_string("noise_type", "poisson", "Noise type for data.")
-flags.DEFINE_integer("synth_data_seed", 5, "Random seed for RNN generation.")
-flags.DEFINE_float("T", 1.0, "Time in seconds to generate.")
-flags.DEFINE_integer("C", 100, "Number of conditions")
-flags.DEFINE_integer("N", 50, "Number of units for the RNN")
-flags.DEFINE_integer("S", 50, "Number of sampled units from RNN")
-flags.DEFINE_integer("npcs", 10, "Number of PCS for multi-session case.")
-flags.DEFINE_float("train_percentage", 4.0/5.0,
- "Percentage of train vs validation trials")
-flags.DEFINE_integer("nreplications", 40,
- "Number of noise replications of the same underlying rates.")
-flags.DEFINE_float("g", 1.5, "Complexity of dynamics")
-flags.DEFINE_float("x0_std", 1.0,
- "Volume from which to pull initial conditions (affects diversity of dynamics.")
-flags.DEFINE_float("tau", 0.025, "Time constant of RNN")
-flags.DEFINE_float("dt", 0.010, "Time bin")
-flags.DEFINE_float("input_magnitude", 20.0,
- "For the input case, what is the value of the input?")
-flags.DEFINE_float("max_firing_rate", 30.0, "Map 1.0 of RNN to a spikes per second")
-FLAGS = flags.FLAGS
-
-
-# Note that with N small (as set above), the finite size effects
-# will have pretty dramatic effects on the dynamics of the random RNN.
-# If you want more complex dynamics, you'll have to run the script a
-# lot, or increase N (or g).
-
-# Getting hard vs. easy data can be a little stochastic, so we set the seed.
-
-# Pull out some commonly used parameters.
-# These are user parameters (configuration)
-rng = np.random.RandomState(seed=FLAGS.synth_data_seed)
-T = FLAGS.T
-C = FLAGS.C
-N = FLAGS.N
-S = FLAGS.S
-input_magnitude = FLAGS.input_magnitude
-nreplications = FLAGS.nreplications
-E = nreplications * C # total number of trials
-# S is the number of measurements in each dataset, w/ each
-# dataset having a different set of observations.
-ndatasets = N // S # integer division; ok if rounded down
-train_percentage = FLAGS.train_percentage
-ntime_steps = int(T / FLAGS.dt)
-# End of user parameters
-
-rnn = generate_rnn(rng, N, FLAGS.g, FLAGS.tau, FLAGS.dt, FLAGS.max_firing_rate)
-
-# Check to make sure the RNN is the one we used in the paper.
-if N == 50:
- assert abs(rnn['W'][0,0] - 0.06239899) < 1e-8, 'Error in random seed?'
- rem_check = nreplications * train_percentage
- assert abs(rem_check - int(rem_check)) < 1e-8, \
- 'Train percentage * nreplications should be an integer.'
-
-
-# Initial condition generation, and condition label generation. This
-# happens outside of the dataset loop, so that all datasets have the
-# same conditions, which is similar to a neurophys setup.
-condition_number = 0
-x0s = []
-condition_labels = []
-for c in range(C):
- x0 = FLAGS.x0_std * rng.randn(N, 1)
- x0s.append(np.tile(x0, nreplications)) # replicate x0 nreplications times
- # replicate the condition label nreplications times
- for ns in range(nreplications):
- condition_labels.append(condition_number)
- condition_number += 1
-x0s = np.concatenate(x0s, axis=1)
-
-# Containers for storing data across datasets.
-datasets = {}
-for n in range(ndatasets):
- print(n+1, " of ", ndatasets)
-
- # First generate all firing rates. In the next loop, generate all
- # replications; this allows the random state for rate generation to be
- # independent of nreplications.
- dataset_name = 'dataset_N' + str(N) + '_S' + str(S)
- if S < N:
- dataset_name += '_n' + str(n+1)
-
- # Sample neuron subsets. The assumption is the PC axes of the RNN
- # are not unit aligned, so sampling units is adequate to sample all
- # the high-variance PCs.
- P_sxn = np.eye(S,N)
- for m in range(n):
- P_sxn = np.roll(P_sxn, S, axis=1)
-
- if input_magnitude > 0.0:
- # time of "hits" randomly chosen between [1/4 and 3/4] of total time
- input_times = rng.choice(int(ntime_steps/2), size=[E]) + int(ntime_steps/4)
- else:
- input_times = None
-
- rates, x0s, inputs = \
- generate_data(rnn, T=T, E=E, x0s=x0s, P_sxn=P_sxn,
- input_magnitude=input_magnitude,
- input_times=input_times)
-
- if FLAGS.noise_type == "poisson":
- noisy_data = spikify_data(rates, rng, rnn['dt'], rnn['max_firing_rate'])
- elif FLAGS.noise_type == "gaussian":
- noisy_data = gaussify_data(rates, rng, rnn['dt'], rnn['max_firing_rate'])
- else:
- raise ValueError("Only noise types supported are poisson or gaussian")
-
- # split into train and validation sets
- train_inds, valid_inds = get_train_n_valid_inds(E, train_percentage,
- nreplications)
-
- # Split the data, inputs, labels and times into train vs. validation.
- rates_train, rates_valid = \
- split_list_by_inds(rates, train_inds, valid_inds)
- noisy_data_train, noisy_data_valid = \
- split_list_by_inds(noisy_data, train_inds, valid_inds)
- input_train, inputs_valid = \
- split_list_by_inds(inputs, train_inds, valid_inds)
- condition_labels_train, condition_labels_valid = \
- split_list_by_inds(condition_labels, train_inds, valid_inds)
- input_times_train, input_times_valid = \
- split_list_by_inds(input_times, train_inds, valid_inds)
-
- # Turn rates, noisy_data, and input into numpy arrays.
- rates_train = nparray_and_transpose(rates_train)
- rates_valid = nparray_and_transpose(rates_valid)
- noisy_data_train = nparray_and_transpose(noisy_data_train)
- noisy_data_valid = nparray_and_transpose(noisy_data_valid)
- input_train = nparray_and_transpose(input_train)
- inputs_valid = nparray_and_transpose(inputs_valid)
-
- # Note that although we put these 'truth' rates and inputs into this
- # structure, the only data used by LFADS are the noisy data,
- # e.g. the spike trains. The rest is either for printing or posterity.
- data = {'train_truth': rates_train,
- 'valid_truth': rates_valid,
- 'input_train_truth' : input_train,
- 'input_valid_truth' : inputs_valid,
- 'train_data' : noisy_data_train,
- 'valid_data' : noisy_data_valid,
- 'train_percentage' : train_percentage,
- 'nreplications' : nreplications,
- 'dt' : rnn['dt'],
- 'input_magnitude' : input_magnitude,
- 'input_times_train' : input_times_train,
- 'input_times_valid' : input_times_valid,
- 'P_sxn' : P_sxn,
- 'condition_labels_train' : condition_labels_train,
- 'condition_labels_valid' : condition_labels_valid,
- 'conversion_factor': 1.0 / rnn['conversion_factor']}
- datasets[dataset_name] = data
-
-if S < N:
- # Note that this isn't necessary for this synthetic example, but
- # it's useful to see how the input factor matrices were initialized
- # for actual neurophysiology data.
- datasets = add_alignment_projections(datasets, npcs=FLAGS.npcs)
-
-# Write out the datasets.
-write_datasets(FLAGS.save_dir, FLAGS.datafile_name, datasets)
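
The script above relies on `spikify_data` from `synthetic_data_utils.py` (deleted further down in this change) to turn RNN rates into Poisson spike counts. A hedged sketch of the standard recipe, assuming rates are normalized to [0, 1] before scaling by `max_firing_rate`; `spikify` is a hypothetical stand-in, not the removed helper:

```python
import numpy as np

def spikify(rates_txn, rng, dt=0.01, max_firing_rate=30.0):
    """Sample Poisson spike counts per time bin from normalized rates."""
    rates_hz = max_firing_rate * rates_txn   # spikes per second
    return rng.poisson(rates_hz * dt)        # expected count per bin of width dt

rng = np.random.RandomState(0)
spikes = spikify(rng.rand(100, 50), rng)     # 100 time bins x 50 units
```
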
diff --git a/research/lfads/synth_data/generate_itb_data.py b/research/lfads/synth_data/generate_itb_data.py
deleted file mode 100644
index 66bc45d02e9..00000000000
--- a/research/lfads/synth_data/generate_itb_data.py
+++ /dev/null
@@ -1,209 +0,0 @@
-# Copyright 2017 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# ==============================================================================
-from __future__ import print_function
-
-import h5py
-import numpy as np
-import os
-from six.moves import xrange
-import tensorflow as tf
-
-from utils import write_datasets
-from synthetic_data_utils import normalize_rates
-from synthetic_data_utils import get_train_n_valid_inds, nparray_and_transpose
-from synthetic_data_utils import spikify_data, split_list_by_inds
-
-DATA_DIR = "rnn_synth_data_v1.0"
-
-flags = tf.app.flags
-flags.DEFINE_string("save_dir", "/tmp/" + DATA_DIR + "/",
- "Directory for saving data.")
-flags.DEFINE_string("datafile_name", "itb_rnn",
- "Name of data file for input case.")
-flags.DEFINE_integer("synth_data_seed", 5, "Random seed for RNN generation.")
-flags.DEFINE_float("T", 1.0, "Time in seconds to generate.")
-flags.DEFINE_integer("C", 800, "Number of conditions")
-flags.DEFINE_integer("N", 50, "Number of units for the RNN")
-flags.DEFINE_float("train_percentage", 4.0/5.0,
- "Percentage of train vs validation trials")
-flags.DEFINE_integer("nreplications", 5,
- "Number of spikifications of the same underlying rates.")
-flags.DEFINE_float("tau", 0.025, "Time constant of RNN")
-flags.DEFINE_float("dt", 0.010, "Time bin")
-flags.DEFINE_float("max_firing_rate", 30.0,
- "Map 1.0 of RNN to a spikes per second")
-flags.DEFINE_float("u_std", 0.25,
- "Std dev of input to integration to bound model")
-flags.DEFINE_string("checkpoint_path", "SAMPLE_CHECKPOINT",
- """Path to directory with checkpoints of model
- trained on integration to bound task. Currently this
- is a placeholder which tells the code to grab the
- checkpoint that is provided with the code
- (in /trained_itb/..). If you have your own checkpoint
- you would like to restore, you would point it to
- that path.""")
-FLAGS = flags.FLAGS
-
-
-class IntegrationToBoundModel:
- def __init__(self, N):
- scale = 0.8 / float(N**0.5)
- self.N = N
- self.Wh_nxn = tf.Variable(tf.random_normal([N, N], stddev=scale))
- self.b_1xn = tf.Variable(tf.zeros([1, N]))
- self.Bu_1xn = tf.Variable(tf.zeros([1, N]))
- self.Wro_nxo = tf.Variable(tf.random_normal([N, 1], stddev=scale))
- self.bro_o = tf.Variable(tf.zeros([1]))
-
- def call(self, h_tm1_bxn, u_bx1):
- act_t_bxn = tf.matmul(h_tm1_bxn, self.Wh_nxn) + self.b_1xn + u_bx1 * self.Bu_1xn
- h_t_bxn = tf.nn.tanh(act_t_bxn)
- z_t = tf.nn.xw_plus_b(h_t_bxn, self.Wro_nxo, self.bro_o)
- return z_t, h_t_bxn
-
-def get_data_batch(batch_size, T, rng, u_std):
- u_bxt = rng.randn(batch_size, T) * u_std
- running_sum_b = np.zeros([batch_size])
- labels_bxt = np.zeros([batch_size, T])
- for t in xrange(T):
- running_sum_b += u_bxt[:, t]
- labels_bxt[:, t] += running_sum_b
- labels_bxt = np.clip(labels_bxt, -1, 1)
- return u_bxt, labels_bxt
-
-
-rng = np.random.RandomState(seed=FLAGS.synth_data_seed)
-u_rng = np.random.RandomState(seed=FLAGS.synth_data_seed+1)
-T = FLAGS.T
-C = FLAGS.C
-N = FLAGS.N # must be same N as in trained model (provided example is N = 50)
-nreplications = FLAGS.nreplications
-E = nreplications * C # total number of trials
-train_percentage = FLAGS.train_percentage
-ntimesteps = int(T / FLAGS.dt)
-batch_size = 1 # gives one example per ntrial
-
-model = IntegrationToBoundModel(N)
-inputs_ph_t = [tf.placeholder(tf.float32,
- shape=[None, 1]) for _ in range(ntimesteps)]
-state = tf.zeros([batch_size, N])
-saver = tf.train.Saver()
-
-P_nxn = rng.randn(N,N) / np.sqrt(N) # random projections
-
-# unroll RNN for T timesteps
-outputs_t = []
-states_t = []
-
-for inp in inputs_ph_t:
- output, state = model.call(state, inp)
- outputs_t.append(output)
- states_t.append(state)
-
-with tf.Session() as sess:
- # restore the latest model ckpt
- if FLAGS.checkpoint_path == "SAMPLE_CHECKPOINT":
- dir_path = os.path.dirname(os.path.realpath(__file__))
- model_checkpoint_path = os.path.join(dir_path, "trained_itb/model-65000")
- else:
- model_checkpoint_path = FLAGS.checkpoint_path
- try:
- saver.restore(sess, model_checkpoint_path)
- print ('Model restored from', model_checkpoint_path)
- except Exception:
- assert False, ("No checkpoints to restore from, is the path %s correct?"
- %model_checkpoint_path)
-
- # generate data for trials
- data_e = []
- u_e = []
- outs_e = []
- for c in range(C):
- u_1xt, outs_1xt = get_data_batch(batch_size, ntimesteps, u_rng, FLAGS.u_std)
-
- feed_dict = {}
- for t in xrange(ntimesteps):
- feed_dict[inputs_ph_t[t]] = np.reshape(u_1xt[:,t], (batch_size,-1))
-
- states_t_bxn, outputs_t_bxn = sess.run([states_t, outputs_t],
- feed_dict=feed_dict)
- states_nxt = np.transpose(np.squeeze(np.asarray(states_t_bxn)))
- outputs_t_bxn = np.squeeze(np.asarray(outputs_t_bxn))
- r_sxt = np.dot(P_nxn, states_nxt)
-
- for s in xrange(nreplications):
- data_e.append(r_sxt)
- u_e.append(u_1xt)
- outs_e.append(outputs_t_bxn)
-
- truth_data_e = normalize_rates(data_e, E, N)
-
-spiking_data_e = spikify_data(truth_data_e, rng, dt=FLAGS.dt,
- max_firing_rate=FLAGS.max_firing_rate)
-train_inds, valid_inds = get_train_n_valid_inds(E, train_percentage,
- nreplications)
-
-data_train_truth, data_valid_truth = split_list_by_inds(truth_data_e,
- train_inds,
- valid_inds)
-data_train_spiking, data_valid_spiking = split_list_by_inds(spiking_data_e,
- train_inds,
- valid_inds)
-
-data_train_truth = nparray_and_transpose(data_train_truth)
-data_valid_truth = nparray_and_transpose(data_valid_truth)
-data_train_spiking = nparray_and_transpose(data_train_spiking)
-data_valid_spiking = nparray_and_transpose(data_valid_spiking)
-
-# save down the inputs used to generate this data
-train_inputs_u, valid_inputs_u = split_list_by_inds(u_e,
- train_inds,
- valid_inds)
-train_inputs_u = nparray_and_transpose(train_inputs_u)
-valid_inputs_u = nparray_and_transpose(valid_inputs_u)
-
-# save down the network outputs (may be useful later)
-train_outputs_u, valid_outputs_u = split_list_by_inds(outs_e,
- train_inds,
- valid_inds)
-train_outputs_u = np.array(train_outputs_u)
-valid_outputs_u = np.array(valid_outputs_u)
-
-
-data = { 'train_truth': data_train_truth,
- 'valid_truth': data_valid_truth,
- 'train_data' : data_train_spiking,
- 'valid_data' : data_valid_spiking,
- 'train_percentage' : train_percentage,
- 'nreplications' : nreplications,
- 'dt' : FLAGS.dt,
- 'u_std' : FLAGS.u_std,
- 'max_firing_rate': FLAGS.max_firing_rate,
- 'train_inputs_u': train_inputs_u,
- 'valid_inputs_u': valid_inputs_u,
- 'train_outputs_u': train_outputs_u,
- 'valid_outputs_u': valid_outputs_u,
- 'conversion_factor' : FLAGS.max_firing_rate/(1.0/FLAGS.dt) }
-
-# just one dataset here
-datasets = {}
-dataset_name = 'dataset_N' + str(N)
-datasets[dataset_name] = data
-
-# write out the dataset
-write_datasets(FLAGS.save_dir, FLAGS.datafile_name, datasets)
-print ('Saved to ', os.path.join(FLAGS.save_dir,
- FLAGS.datafile_name + '_' + dataset_name))
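
Assuming, as the `h5py` import and `write_datasets` call above suggest, that each dataset is stored as an HDF5 file at the printed path, a quick sanity check of the written file could look like the following sketch (the path and key names are taken from the script above, not guaranteed by this diff):

```python
import os
import h5py

# Hypothetical check of the file written above (path from the final print).
path = os.path.join("/tmp/rnn_synth_data_v1.0/", "itb_rnn_dataset_N50")
with h5py.File(path, "r") as f:
    print(list(f.keys()))            # e.g. 'train_data', 'valid_data', ...
    print(f["train_data"].shape)     # trials x time x neurons
```
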
diff --git a/research/lfads/synth_data/generate_labeled_rnn_data.py b/research/lfads/synth_data/generate_labeled_rnn_data.py
deleted file mode 100644
index 06955854865..00000000000
--- a/research/lfads/synth_data/generate_labeled_rnn_data.py
+++ /dev/null
@@ -1,147 +0,0 @@
-# Copyright 2017 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# ==============================================================================
-from __future__ import print_function
-
-import os
-import h5py
-import numpy as np
-from six.moves import xrange
-
-from synthetic_data_utils import generate_data, generate_rnn
-from synthetic_data_utils import get_train_n_valid_inds
-from synthetic_data_utils import nparray_and_transpose
-from synthetic_data_utils import spikify_data, split_list_by_inds
-import tensorflow as tf
-from utils import write_datasets
-
-DATA_DIR = "rnn_synth_data_v1.0"
-
-flags = tf.app.flags
-flags.DEFINE_string("save_dir", "/tmp/" + DATA_DIR + "/",
- "Directory for saving data.")
-flags.DEFINE_string("datafile_name", "conditioned_rnn_data",
- "Name of data file for input case.")
-flags.DEFINE_integer("synth_data_seed", 5, "Random seed for RNN generation.")
-flags.DEFINE_float("T", 1.0, "Time in seconds to generate.")
-flags.DEFINE_integer("C", 400, "Number of conditions")
-flags.DEFINE_integer("N", 50, "Number of units for the RNN")
-flags.DEFINE_float("train_percentage", 4.0/5.0,
- "Percentage of train vs validation trials")
-flags.DEFINE_integer("nreplications", 10,
- "Number of spikifications of the same underlying rates.")
-flags.DEFINE_float("g", 1.5, "Complexity of dynamics")
-flags.DEFINE_float("x0_std", 1.0,
- "Volume from which to pull initial conditions (affects diversity of dynamics.")
-flags.DEFINE_float("tau", 0.025, "Time constant of RNN")
-flags.DEFINE_float("dt", 0.010, "Time bin")
-flags.DEFINE_float("max_firing_rate", 30.0, "Map 1.0 of RNN to a spikes per second")
-FLAGS = flags.FLAGS
-
-rng = np.random.RandomState(seed=FLAGS.synth_data_seed)
-rnn_rngs = [np.random.RandomState(seed=FLAGS.synth_data_seed+1),
- np.random.RandomState(seed=FLAGS.synth_data_seed+2)]
-T = FLAGS.T
-C = FLAGS.C
-N = FLAGS.N
-nreplications = FLAGS.nreplications
-E = nreplications * C
-train_percentage = FLAGS.train_percentage
-ntimesteps = int(T / FLAGS.dt)
-
-rnn_a = generate_rnn(rnn_rngs[0], N, FLAGS.g, FLAGS.tau, FLAGS.dt,
- FLAGS.max_firing_rate)
-rnn_b = generate_rnn(rnn_rngs[1], N, FLAGS.g, FLAGS.tau, FLAGS.dt,
- FLAGS.max_firing_rate)
-rnns = [rnn_a, rnn_b]
-
-# pick which RNN is used on each trial
-rnn_to_use = rng.randint(2, size=E)
-ext_input = np.repeat(np.expand_dims(rnn_to_use, axis=1), ntimesteps, axis=1)
-ext_input = np.expand_dims(ext_input, axis=2) # these are "a's" in the paper
-
-x0s = []
-condition_labels = []
-condition_number = 0
-for c in range(C):
- x0 = FLAGS.x0_std * rng.randn(N, 1)
- x0s.append(np.tile(x0, nreplications))
- for ns in range(nreplications):
- condition_labels.append(condition_number)
- condition_number += 1
-x0s = np.concatenate(x0s, axis=1)
-
-P_nxn = rng.randn(N, N) / np.sqrt(N)
-
-# generate trials for both RNNs
-rates_a, x0s_a, _ = generate_data(rnn_a, T=T, E=E, x0s=x0s, P_sxn=P_nxn,
- input_magnitude=0.0, input_times=None)
-spikes_a = spikify_data(rates_a, rng, rnn_a['dt'], rnn_a['max_firing_rate'])
-
-rates_b, x0s_b, _ = generate_data(rnn_b, T=T, E=E, x0s=x0s, P_sxn=P_nxn,
- input_magnitude=0.0, input_times=None)
-spikes_b = spikify_data(rates_b, rng, rnn_b['dt'], rnn_b['max_firing_rate'])
-
-# not the best way to do this but E is small enough
-rates = []
-spikes = []
-for trial in xrange(E):
- if rnn_to_use[trial] == 0:
- rates.append(rates_a[trial])
- spikes.append(spikes_a[trial])
- else:
- rates.append(rates_b[trial])
- spikes.append(spikes_b[trial])
-
-# split into train and validation sets
-train_inds, valid_inds = get_train_n_valid_inds(E, train_percentage,
- nreplications)
-
-rates_train, rates_valid = split_list_by_inds(rates, train_inds, valid_inds)
-spikes_train, spikes_valid = split_list_by_inds(spikes, train_inds, valid_inds)
-condition_labels_train, condition_labels_valid = split_list_by_inds(
- condition_labels, train_inds, valid_inds)
-ext_input_train, ext_input_valid = split_list_by_inds(
- ext_input, train_inds, valid_inds)
-
-rates_train = nparray_and_transpose(rates_train)
-rates_valid = nparray_and_transpose(rates_valid)
-spikes_train = nparray_and_transpose(spikes_train)
-spikes_valid = nparray_and_transpose(spikes_valid)
-
-# add train_ext_input and valid_ext_input
-data = {'train_truth': rates_train,
- 'valid_truth': rates_valid,
- 'train_data' : spikes_train,
- 'valid_data' : spikes_valid,
- 'train_ext_input' : np.array(ext_input_train),
- 'valid_ext_input': np.array(ext_input_valid),
- 'train_percentage' : train_percentage,
- 'nreplications' : nreplications,
- 'dt' : FLAGS.dt,
- 'P_sxn' : P_nxn,
- 'condition_labels_train' : condition_labels_train,
- 'condition_labels_valid' : condition_labels_valid,
- 'conversion_factor': 1.0 / rnn_a['conversion_factor']}
-
-# just one dataset here
-datasets = {}
-dataset_name = 'dataset_N' + str(N)
-datasets[dataset_name] = data
-
-# write out the dataset
-write_datasets(FLAGS.save_dir, FLAGS.datafile_name, datasets)
-print ('Saved to ', os.path.join(FLAGS.save_dir,
- FLAGS.datafile_name + '_' + dataset_name))
diff --git a/research/lfads/synth_data/run_generate_synth_data.sh b/research/lfads/synth_data/run_generate_synth_data.sh
deleted file mode 100755
index 9ebc8ce2e5e..00000000000
--- a/research/lfads/synth_data/run_generate_synth_data.sh
+++ /dev/null
@@ -1,40 +0,0 @@
-#!/bin/bash
-
-# Copyright 2017 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# ==============================================================================
-
-SYNTH_PATH=/tmp/rnn_synth_data_v1.0/
-
- echo "Generating chaotic rnn data with no input pulses (g=1.5) with spiking noise"
- python generate_chaotic_rnn_data.py --save_dir=$SYNTH_PATH --datafile_name=chaotic_rnn_no_inputs --synth_data_seed=5 --T=1.0 --C=400 --N=50 --S=50 --train_percentage=0.8 --nreplications=10 --g=1.5 --x0_std=1.0 --tau=0.025 --dt=0.01 --input_magnitude=0.0 --max_firing_rate=30.0 --noise_type='poisson'
-
-echo "Generating chaotic rnn data with no input pulses (g=1.5) with Gaussian noise"
-python generate_chaotic_rnn_data.py --save_dir=$SYNTH_PATH --datafile_name=gaussian_chaotic_rnn_no_inputs --synth_data_seed=5 --T=1.0 --C=400 --N=50 --S=50 --train_percentage=0.8 --nreplications=10 --g=1.5 --x0_std=1.0 --tau=0.025 --dt=0.01 --input_magnitude=0.0 --max_firing_rate=30.0 --noise_type='gaussian'
-
- echo "Generating chaotic rnn data with input pulses (g=1.5)"
- python generate_chaotic_rnn_data.py --save_dir=$SYNTH_PATH --datafile_name=chaotic_rnn_inputs_g1p5 --synth_data_seed=5 --T=1.0 --C=400 --N=50 --S=50 --train_percentage=0.8 --nreplications=10 --g=1.5 --x0_std=1.0 --tau=0.025 --dt=0.01 --input_magnitude=20.0 --max_firing_rate=30.0 --noise_type='poisson'
-
- echo "Generating chaotic rnn data with input pulses (g=2.5)"
- python generate_chaotic_rnn_data.py --save_dir=$SYNTH_PATH --datafile_name=chaotic_rnn_inputs_g2p5 --synth_data_seed=5 --T=1.0 --C=400 --N=50 --S=50 --train_percentage=0.8 --nreplications=10 --g=2.5 --x0_std=1.0 --tau=0.025 --dt=0.01 --input_magnitude=20.0 --max_firing_rate=30.0 --noise_type='poisson'
-
- echo "Generate the multi-session RNN data (no multi-session synth example in paper)"
- python generate_chaotic_rnn_data.py --save_dir=$SYNTH_PATH --datafile_name=chaotic_rnn_multisession --synth_data_seed=5 --T=1.0 --C=150 --N=100 --S=20 --npcs=10 --train_percentage=0.8 --nreplications=40 --g=1.5 --x0_std=1.0 --tau=0.025 --dt=0.01 --input_magnitude=0.0 --max_firing_rate=30.0 --noise_type='poisson'
-
- echo "Generating Integration-to-bound RNN data"
- python generate_itb_data.py --save_dir=$SYNTH_PATH --datafile_name=itb_rnn --u_std=0.25 --checkpoint_path=SAMPLE_CHECKPOINT --synth_data_seed=5 --T=1.0 --C=800 --N=50 --train_percentage=0.8 --nreplications=5 --tau=0.025 --dt=0.01 --max_firing_rate=30.0
-
- echo "Generating chaotic rnn data with external input labels (no external input labels example in paper)"
- python generate_labeled_rnn_data.py --save_dir=$SYNTH_PATH --datafile_name=chaotic_rnns_labeled --synth_data_seed=5 --T=1.0 --C=400 --N=50 --train_percentage=0.8 --nreplications=10 --g=1.5 --x0_std=1.0 --tau=0.025 --dt=0.01 --max_firing_rate=30.0
diff --git a/research/lfads/synth_data/synthetic_data_utils.py b/research/lfads/synth_data/synthetic_data_utils.py
deleted file mode 100644
index cc264ee49fd..00000000000
--- a/research/lfads/synth_data/synthetic_data_utils.py
+++ /dev/null
@@ -1,348 +0,0 @@
-# Copyright 2017 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# ==============================================================================
-from __future__ import print_function
-
-import h5py
-import numpy as np
-import os
-
-from utils import write_datasets
-import matplotlib
-import matplotlib.pyplot as plt
-import scipy.signal
-
-
-def generate_rnn(rng, N, g, tau, dt, max_firing_rate):
- """Create a (vanilla) RNN with a bunch of hyperparameters for generating
-chaotic data.
- Args:
- rng: numpy random number generator
- N: number of hidden units
- g: scaling of recurrent weight matrix in g W, with W ~ N(0,1/N)
- tau: time scale of individual unit dynamics
- dt: time step for equation updates
- max_firing_rate: how to rescale the (-1, 1) firing rates
- Returns:
- the dictionary of these parameters, plus some others.
-"""
- rnn = {}
- rnn['N'] = N
- rnn['W'] = rng.randn(N,N)/np.sqrt(N)
- rnn['Bin'] = rng.randn(N)/np.sqrt(1.0)
- rnn['Bin2'] = rng.randn(N)/np.sqrt(1.0)
- rnn['b'] = np.zeros(N)
- rnn['g'] = g
- rnn['tau'] = tau
- rnn['dt'] = dt
- rnn['max_firing_rate'] = max_firing_rate
- mfr = rnn['max_firing_rate'] # spikes / sec
- nbins_per_sec = 1.0/rnn['dt'] # bins / sec
- # Used for plotting in LFADS
- rnn['conversion_factor'] = mfr / nbins_per_sec # spikes / bin
- return rnn
-
-
-def generate_data(rnn, T, E, x0s=None, P_sxn=None, input_magnitude=0.0,
- input_times=None):
- """ Generates data from a randomly initialized RNN.
- Args:
- rnn: the rnn
- T: Time in seconds to run (divided by rnn['dt'] to get steps, rounded down).
- E: total number of examples
- S: number of samples (subsampling N)
- Returns:
- A list of length E of NxT tensors of the network being run.
- """
- N = rnn['N']
- def run_rnn(rnn, x0, ntime_steps, input_time=None):
- rs = np.zeros([N,ntime_steps])
- x_tm1 = x0
- r_tm1 = np.tanh(x0)
- tau = rnn['tau']
- dt = rnn['dt']
- alpha = (1.0-dt/tau)
- W = dt/tau*rnn['W']*rnn['g']
- Bin = dt/tau*rnn['Bin']
- Bin2 = dt/tau*rnn['Bin2']
- b = dt/tau*rnn['b']
-
- us = np.zeros([1, ntime_steps])
- for t in range(ntime_steps):
- x_t = alpha*x_tm1 + np.dot(W,r_tm1) + b
- if input_time is not None and t == input_time:
- us[0,t] = input_magnitude
- x_t += Bin * us[0,t] # DCS is this what was used?
- r_t = np.tanh(x_t)
- x_tm1 = x_t
- r_tm1 = r_t
- rs[:,t] = r_t
- return rs, us
-
- if P_sxn is None:
- P_sxn = np.eye(N)
- ntime_steps = int(T / rnn['dt'])
- data_e = []
- inputs_e = []
- for e in range(E):
- input_time = input_times[e] if input_times is not None else None
- r_nxt, u_uxt = run_rnn(rnn, x0s[:,e], ntime_steps, input_time)
- r_sxt = np.dot(P_sxn, r_nxt)
- inputs_e.append(u_uxt)
- data_e.append(r_sxt)
-
- S = P_sxn.shape[0]
- data_e = normalize_rates(data_e, E, S)
-
- return data_e, x0s, inputs_e
-
-
-def normalize_rates(data_e, E, S):
- # Normalization, made more complex because of the P matrices.
- # Normalize by min and max in each channel. This normalization will
- # cause offset differences between identical rnn runs, but different
- # t hits.
- for e in range(E):
- r_sxt = data_e[e]
- for i in range(S):
- rmin = np.min(r_sxt[i,:])
- rmax = np.max(r_sxt[i,:])
- assert rmax - rmin != 0, 'Something wrong'
- r_sxt[i,:] = (r_sxt[i,:] - rmin)/(rmax-rmin)
- data_e[e] = r_sxt
- return data_e
-
-
-def spikify_data(data_e, rng, dt=1.0, max_firing_rate=100):
- """ Apply spikes to a continuous dataset whose values are between 0.0 and 1.0
- Args:
- data_e: nexamples length list of NxT trials
- dt: how often the data are sampled
- max_firing_rate: the firing rate that is associated with a value of 1.0
- Returns:
- spikified_e: a list of length b of the data represented as spikes,
- sampled from the underlying poisson process.
- """
-
- E = len(data_e)
- spikes_e = []
- for e in range(E):
- data = data_e[e]
- N,T = data.shape
- data_s = np.zeros([N,T]).astype(np.int)
- for n in range(N):
- f = data[n,:]
- s = rng.poisson(f*max_firing_rate*dt, size=T)
- data_s[n,:] = s
- spikes_e.append(data_s)
-
- return spikes_e
-
-
-def gaussify_data(data_e, rng, dt=1.0, max_firing_rate=100):
- """ Apply gaussian noise to a continuous dataset whose values are between
- 0.0 and 1.0
-
- Args:
- data_e: nexamples length list of NxT trials
- dt: how often the data are sampled
- max_firing_rate: the firing rate that is associated with a value of 1.0
- Returns:
- gauss_e: a list of length b of the data with noise.
- """
-
- E = len(data_e)
- mfr = max_firing_rate
- gauss_e = []
- for e in range(E):
- data = data_e[e]
- N,T = data.shape
- noisy_data = data * mfr + np.random.randn(N,T) * (5.0*mfr) * np.sqrt(dt)
- gauss_e.append(noisy_data)
-
- return gauss_e
-
-
-
-def get_train_n_valid_inds(num_trials, train_fraction, nreplications):
- """Split the numbers between 0 and num_trials-1 into two portions for
- training and validation, based on the train fraction.
- Args:
- num_trials: the number of trials
- train_fraction: (e.g. .80)
- nreplications: the number of spiking trials per initial condition
- Returns:
- a 2-tuple of two lists: the training indices and validation indices
- """
- train_inds = []
- valid_inds = []
- for i in range(num_trials):
- # This line divides up the trials so that within one initial condition,
- # the randomness of spikifying the condition is shared among both
- # training and validation data splits.
- if (i % nreplications)+1 > train_fraction * nreplications:
- valid_inds.append(i)
- else:
- train_inds.append(i)
-
- return train_inds, valid_inds
-
-
-def split_list_by_inds(data, inds1, inds2):
- """Take the data, a list, and split it up based on the indices in inds1 and
- inds2.
- Args:
- data: the list of data to split
- inds1, the first list of indices
- inds2, the second list of indices
- Returns: a 2-tuple of two lists.
- """
- if data is None or len(data) == 0:
- return [], []
- else:
- dout1 = [data[i] for i in inds1]
- dout2 = [data[i] for i in inds2]
- return dout1, dout2
-
-
-def nparray_and_transpose(data_a_b_c):
- """Convert the list of items in data to a numpy array, and transpose it
- Args:
- data_a_b_c: a nested, nested list of length a, with sublist length
- b, with sublist length c.
- Returns:
- a numpy 3-tensor with dimensions a x c x b
-"""
- data_axbxc = np.array([datum_b_c for datum_b_c in data_a_b_c])
- data_axcxb = np.transpose(data_axbxc, axes=[0,2,1])
- return data_axcxb
-
-
-def add_alignment_projections(datasets, npcs, ntime=None, nsamples=None):
- """Create a matrix that aligns the datasets a bit, under
- the assumption that each dataset is observing the same underlying dynamical
- system.
-
- Args:
- datasets: The dictionary of dataset structures.
- npcs: The number of pcs for each, basically like lfads factors.
- nsamples (optional): Number of samples to take for each dataset.
- ntime (optional): Number of time steps to take in each sample.
-
- Returns:
- The dataset structures, with the field alignment_matrix_cxf added.
- This is # channels x npcs dimension
-"""
- nchannels_all = 0
- channel_idxs = {}
- conditions_all = {}
- nconditions_all = 0
- for name, dataset in datasets.items():
- cidxs = np.where(dataset['P_sxn'])[1] # non-zero entries in columns
- channel_idxs[name] = [cidxs[0], cidxs[-1]+1]
- nchannels_all += cidxs[-1]+1 - cidxs[0]
- conditions_all[name] = np.unique(dataset['condition_labels_train'])
-
- all_conditions_list = \
- np.unique(np.ndarray.flatten(np.array(conditions_all.values())))
- nconditions_all = all_conditions_list.shape[0]
-
- if ntime is None:
- ntime = dataset['train_data'].shape[1]
- if nsamples is None:
- nsamples = dataset['train_data'].shape[0]
-
- # In the data workup in the paper, Chethan did intra condition
- # averaging, so let's do that here.
- avg_data_all = {}
- for name, conditions in conditions_all.items():
- dataset = datasets[name]
- avg_data_all[name] = {}
- for cname in conditions:
- td_idxs = np.argwhere(np.array(dataset['condition_labels_train'])==cname)
- data = np.squeeze(dataset['train_data'][td_idxs,:,:], axis=1)
- avg_data = np.mean(data, axis=0)
- avg_data_all[name][cname] = avg_data
-
- # Visualize this in the morning.
- all_data_nxtc = np.zeros([nchannels_all, ntime * nconditions_all])
- for name, dataset in datasets.items():
- cidx_s = channel_idxs[name][0]
- cidx_f = channel_idxs[name][1]
- for cname in conditions_all[name]:
- cidxs = np.argwhere(all_conditions_list == cname)
- if cidxs.shape[0] > 0:
- cidx = cidxs[0][0]
- all_tidxs = np.arange(0, ntime+1) + cidx*ntime
- all_data_nxtc[cidx_s:cidx_f, all_tidxs[0]:all_tidxs[-1]] = \
- avg_data_all[name][cname].T
-
- # A bit of filtering. We don't care about spectral properties, or
- # filtering artifacts, simply correlate time steps a bit.
- filt_len = 6
- bc_filt = np.ones([filt_len])/float(filt_len)
- for c in range(nchannels_all):
- all_data_nxtc[c,:] = scipy.signal.filtfilt(bc_filt, [1.0], all_data_nxtc[c,:])
-
- # Compute the PCs.
- all_data_mean_nx1 = np.mean(all_data_nxtc, axis=1, keepdims=True)
- all_data_zm_nxtc = all_data_nxtc - all_data_mean_nx1
- corr_mat_nxn = np.dot(all_data_zm_nxtc, all_data_zm_nxtc.T)
- evals_n, evecs_nxn = np.linalg.eigh(corr_mat_nxn)
- sidxs = np.flipud(np.argsort(evals_n)) # sort such that 0th is highest
- evals_n = evals_n[sidxs]
- evecs_nxn = evecs_nxn[:,sidxs]
-
- # Project all the channels data onto the low-D PCA basis, where
- # low-d is the npcs parameter.
- all_data_pca_pxtc = np.dot(evecs_nxn[:, 0:npcs].T, all_data_zm_nxtc)
-
- # Now for each dataset, we regress the channel data onto the top
- # pcs, and this will be our alignment matrix for that dataset.
- # |B - A*W|^2
- for name, dataset in datasets.items():
- cidx_s = channel_idxs[name][0]
- cidx_f = channel_idxs[name][1]
- all_data_zm_chxtc = all_data_zm_nxtc[cidx_s:cidx_f,:] # ch for channel
- W_chxp, _, _, _ = \
- np.linalg.lstsq(all_data_zm_chxtc.T, all_data_pca_pxtc.T)
- dataset['alignment_matrix_cxf'] = W_chxp
- alignment_bias_cx1 = all_data_mean_nx1[cidx_s:cidx_f]
- dataset['alignment_bias_c'] = np.squeeze(alignment_bias_cx1, axis=1)
-
- do_debug_plot = False
- if do_debug_plot:
- pc_vecs = evecs_nxn[:,0:npcs]
- ntoplot = 400
-
- plt.figure()
- plt.plot(np.log10(evals_n), '-x')
- plt.figure()
- plt.subplot(311)
- plt.imshow(all_data_pca_pxtc)
- plt.colorbar()
-
- plt.subplot(312)
- plt.imshow(np.dot(W_chxp.T, all_data_zm_chxtc))
- plt.colorbar()
-
- plt.subplot(313)
- plt.imshow(np.dot(all_data_zm_chxtc.T, W_chxp).T - all_data_pca_pxtc)
- plt.colorbar()
-
- import pdb
- pdb.set_trace()
-
- return datasets
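Taken together, the helpers in this deleted module form a short pipeline: generate_rnn builds the network, generate_data rolls it out from given initial states, and spikify_data Poisson-samples spike counts from the normalized rates. The sketch below is illustrative only; the parameter values are assumed (loosely following run_generate_synth_data.sh) and it presumes the TF1/older-NumPy environment these files were written for.

```python
# Illustrative pipeline over the deleted helpers (assumed parameter values).
import numpy as np
from synthetic_data_utils import (generate_rnn, generate_data, spikify_data,
                                  get_train_n_valid_inds, split_list_by_inds)

rng = np.random.RandomState(0)
N, T, dt = 50, 1.0, 0.01                   # units, trial length (s), time step (s)
C, nreplications = 2, 10                   # conditions, spiking replications each
E = C * nreplications                      # total trials

rnn = generate_rnn(rng, N=N, g=1.5, tau=0.025, dt=dt, max_firing_rate=30.0)

# One initial state per condition, repeated for each replication, matching the
# layout used by the generator scripts above.
x0s = np.concatenate(
    [np.tile(rng.randn(N, 1), nreplications) for _ in range(C)], axis=1)

rates, _, _ = generate_data(rnn, T=T, E=E, x0s=x0s)             # E arrays, N x timesteps
spikes = spikify_data(rates, rng, dt=dt, max_firing_rate=30.0)  # Poisson spike counts

train_inds, valid_inds = get_train_n_valid_inds(E, 0.8, nreplications)
spikes_train, spikes_valid = split_list_by_inds(spikes, train_inds, valid_inds)
```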
diff --git a/research/lfads/synth_data/trained_itb/model-65000.data-00000-of-00001 b/research/lfads/synth_data/trained_itb/model-65000.data-00000-of-00001
deleted file mode 100644
index 9459a2a1b72..00000000000
Binary files a/research/lfads/synth_data/trained_itb/model-65000.data-00000-of-00001 and /dev/null differ
diff --git a/research/lfads/synth_data/trained_itb/model-65000.index b/research/lfads/synth_data/trained_itb/model-65000.index
deleted file mode 100644
index dd9c793acf8..00000000000
Binary files a/research/lfads/synth_data/trained_itb/model-65000.index and /dev/null differ
diff --git a/research/lfads/synth_data/trained_itb/model-65000.meta b/research/lfads/synth_data/trained_itb/model-65000.meta
deleted file mode 100644
index 07bd2b9688e..00000000000
Binary files a/research/lfads/synth_data/trained_itb/model-65000.meta and /dev/null differ
diff --git a/research/lfads/utils.py b/research/lfads/utils.py
deleted file mode 100644
index e64825ffc1d..00000000000
--- a/research/lfads/utils.py
+++ /dev/null
@@ -1,367 +0,0 @@
-# Copyright 2017 Google Inc. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# ==============================================================================
-from __future__ import print_function
-
-import os
-import h5py
-import json
-
-import numpy as np
-import tensorflow as tf
-
-
-def log_sum_exp(x_k):
- """Computes log \sum exp in a numerically stable way.
- log ( sum_i exp(x_i) )
- log ( sum_i exp(x_i - m + m) ), with m = max(x_i)
- log ( sum_i exp(x_i - m)*exp(m) )
- log ( sum_i exp(x_i - m) ) + m
-
- Args:
- x_k: k-dimensional list of arguments to log_sum_exp.
-
- Returns:
- log_sum_exp of the arguments.
- """
- m = tf.reduce_max(x_k)
- x1_k = x_k - m
- u_k = tf.exp(x1_k)
- z = tf.reduce_sum(u_k)
- return tf.log(z) + m
-
-
-def linear(x, out_size, do_bias=True, alpha=1.0, identity_if_possible=False,
- normalized=False, name=None, collections=None):
- """Linear (affine) transformation, y = x W + b, for a variety of
- configurations.
-
- Args:
- x: The input tensor to the transformation.
- out_size: The integer size of non-batch output dimension.
- do_bias (optional): Add a learnable bias vector to the operation.
- alpha (optional): A multiplicative scaling for the weight initialization
- of the matrix, in the form \alpha * 1/\sqrt{x.shape[1]}.
- identity_if_possible (optional): just return identity,
- if x.shape[1] == out_size.
- normalized (optional): Option to divide out by the norms of the rows of W.
- name (optional): The name prefix to add to variables.
- collections (optional): List of additional collections. (Placed in
- tf.GraphKeys.GLOBAL_VARIABLES already, so no need for that.)
-
- Returns:
- In the equation, y = x W + b, returns the tensorflow op that yields y.
- """
- in_size = int(x.get_shape()[1]) # from Dimension(10) -> 10
- stddev = alpha/np.sqrt(float(in_size))
- mat_init = tf.random_normal_initializer(0.0, stddev)
- wname = (name + "/W") if name else "/W"
-
- if identity_if_possible and in_size == out_size:
- # Sometimes linear layers are nothing more than size adapters.
- return tf.identity(x, name=(wname+'_ident'))
-
- W,b = init_linear(in_size, out_size, do_bias=do_bias, alpha=alpha,
- normalized=normalized, name=name, collections=collections)
-
- if do_bias:
- return tf.matmul(x, W) + b
- else:
- return tf.matmul(x, W)
-
-
-def init_linear(in_size, out_size, do_bias=True, mat_init_value=None,
- bias_init_value=None, alpha=1.0, identity_if_possible=False,
- normalized=False, name=None, collections=None, trainable=True):
- """Linear (affine) transformation, y = x W + b, for a variety of
- configurations.
-
- Args:
- in_size: The integer size of the non-batch input dimension. [(x),y]
- out_size: The integer size of non-batch output dimension. [x,(y)]
- do_bias (optional): Add a (learnable) bias vector to the operation,
- if false, b will be None
- mat_init_value (optional): numpy constant for matrix initialization; if None,
- initialize randomly using the additional parameters.
- alpha (optional): A multiplicative scaling for the weight initialization
- of the matrix, in the form \alpha * 1/\sqrt{x.shape[1]}.
- identity_if_possible (optional): just return identity,
- if x.shape[1] == out_size.
- normalized (optional): Option to divide out by the norms of the rows of W.
- name (optional): The name prefix to add to variables.
- collections (optional): List of additional collections. (Placed in
- tf.GraphKeys.GLOBAL_VARIABLES already, so no need for that.)
-
- Returns:
- In the equation, y = x W + b, returns the pair (W, b).
- """
-
- if mat_init_value is not None and mat_init_value.shape != (in_size, out_size):
- raise ValueError(
- 'Provided mat_init_value must have shape [%d, %d].'%(in_size, out_size))
- if bias_init_value is not None and bias_init_value.shape != (1,out_size):
- raise ValueError(
- 'Provided bias_init_value must have shape [1,%d].'%(out_size,))
-
- if mat_init_value is None:
- stddev = alpha/np.sqrt(float(in_size))
- mat_init = tf.random_normal_initializer(0.0, stddev)
-
- wname = (name + "/W") if name else "/W"
-
- if identity_if_possible and in_size == out_size:
- return (tf.constant(np.eye(in_size).astype(np.float32)),
- tf.zeros(in_size))
-
- # Note the use of get_variable vs. tf.Variable. this is because get_variable
- # does not allow the initialization of the variable with a value.
- if normalized:
- w_collections = [tf.GraphKeys.GLOBAL_VARIABLES, "norm-variables"]
- if collections:
- w_collections += collections
- if mat_init_value is not None:
- w = tf.Variable(mat_init_value, name=wname, collections=w_collections,
- trainable=trainable)
- else:
- w = tf.get_variable(wname, [in_size, out_size], initializer=mat_init,
- collections=w_collections, trainable=trainable)
- w = tf.nn.l2_normalize(w, dim=0) # x W, so xW_j = \sum_i x_bi W_ij
- else:
- w_collections = [tf.GraphKeys.GLOBAL_VARIABLES]
- if collections:
- w_collections += collections
- if mat_init_value is not None:
- w = tf.Variable(mat_init_value, name=wname, collections=w_collections,
- trainable=trainable)
- else:
- w = tf.get_variable(wname, [in_size, out_size], initializer=mat_init,
- collections=w_collections, trainable=trainable)
- b = None
- if do_bias:
- b_collections = [tf.GraphKeys.GLOBAL_VARIABLES]
- if collections:
- b_collections += collections
- bname = (name + "/b") if name else "/b"
- if bias_init_value is None:
- b = tf.get_variable(bname, [1, out_size],
- initializer=tf.zeros_initializer(),
- collections=b_collections,
- trainable=trainable)
- else:
- b = tf.Variable(bias_init_value, name=bname,
- collections=b_collections,
- trainable=trainable)
-
- return (w, b)
-
-
-def write_data(data_fname, data_dict, use_json=False, compression=None):
- """Write data in HDF5 format.
-
- Args:
- data_fname: The filename of the file in which to write the data.
- data_dict: The dictionary of data to write. The keys are strings
- and the values are numpy arrays.
- use_json (optional): human readable format for simple items
- compression (optional): The compression to use for h5py (disabled by
- default because the library borks on scalars, otherwise try 'gzip').
- """
-
- dir_name = os.path.dirname(data_fname)
- if not os.path.exists(dir_name):
- os.makedirs(dir_name)
-
- if use_json:
- the_file = open(data_fname,'wb')
- json.dump(data_dict, the_file)
- the_file.close()
- else:
- try:
- with h5py.File(data_fname, 'w') as hf:
- for k, v in data_dict.items():
- clean_k = k.replace('/', '_')
- if clean_k is not k:
- print('Warning: saving variable with name: ', k, ' as ', clean_k)
- else:
- print('Saving variable with name: ', clean_k)
- hf.create_dataset(clean_k, data=v, compression=compression)
- except IOError:
- print("Cannot open %s for writing." % data_fname)
- raise
-
-
-def read_data(data_fname):
- """ Read saved data in HDF5 format.
-
- Args:
- data_fname: The filename of the file from which to read the data.
- Returns:
- A dictionary whose keys will vary depending on dataset (but should
- always contain the keys 'train_data' and 'valid_data') and whose
- values are numpy arrays.
- """
-
- try:
- with h5py.File(data_fname, 'r') as hf:
- data_dict = {k: np.array(v) for k, v in hf.items()}
- return data_dict
- except IOError:
- print("Cannot open %s for reading." % data_fname)
- raise
-
-
-def write_datasets(data_path, data_fname_stem, dataset_dict, compression=None):
- """Write datasets in HDF5 format.
-
- This function assumes the dataset_dict is a mapping ( string ->
- to data_dict ). It calls write_data for each data dictionary,
- post-fixing the data filename with the key of the dataset.
-
- Args:
- data_path: The path to the save directory.
- data_fname_stem: The filename stem of the file in which to write the data.
- dataset_dict: The dictionary of datasets. The keys are strings
- and the values data dictionaries (str -> numpy arrays) associations.
- compression (optional): The compression to use for h5py (disabled by
- default because the library borks on scalars, otherwise try 'gzip').
- """
-
- full_name_stem = os.path.join(data_path, data_fname_stem)
- for s, data_dict in dataset_dict.items():
- write_data(full_name_stem + "_" + s, data_dict, compression=compression)
-
-
-def read_datasets(data_path, data_fname_stem):
- """Read datasets in HDF5 format.
-
- This function assumes the files in data_path whose names start with
- data_fname_stem were written by write_datasets. It calls read_data on each
- such file and keys the resulting data_dict by the filename suffix.
-
- Args:
- data_path: The path to the save directory.
- data_fname_stem: The filename stem of the files from which to read the data.
- """
-
- dataset_dict = {}
- fnames = os.listdir(data_path)
-
- print ('loading data from ' + data_path + ' with stem ' + data_fname_stem)
- for fname in fnames:
- if fname.startswith(data_fname_stem):
- data_dict = read_data(os.path.join(data_path,fname))
- idx = len(data_fname_stem) + 1
- key = fname[idx:]
- data_dict['data_dim'] = data_dict['train_data'].shape[2]
- data_dict['num_steps'] = data_dict['train_data'].shape[1]
- dataset_dict[key] = data_dict
-
- if len(dataset_dict) == 0:
- raise ValueError("Failed to load any datasets, are you sure that the "
- "'--data_dir' and '--data_filename_stem' flag values "
- "are correct?")
-
- print (str(len(dataset_dict)) + ' datasets loaded')
- return dataset_dict
-
-
-# NUMPY utility functions
-def list_t_bxn_to_list_b_txn(values_t_bxn):
- """Convert a length T list of BxN numpy tensors to a length B list of TxN
- numpy tensors.
-
- Args:
- values_t_bxn: The length T list of BxN numpy tensors.
-
- Returns:
- The length B list of TxN numpy tensors.
- """
- T = len(values_t_bxn)
- B, N = values_t_bxn[0].shape
- values_b_txn = []
- for b in range(B):
- values_pb_txn = np.zeros([T,N])
- for t in range(T):
- values_pb_txn[t,:] = values_t_bxn[t][b,:]
- values_b_txn.append(values_pb_txn)
-
- return values_b_txn
-
-
-def list_t_bxn_to_tensor_bxtxn(values_t_bxn):
- """Convert a length T list of BxN numpy tensors to a single numpy tensor with
- shape BxTxN.
-
- Args:
- values_t_bxn: The length T list of BxN numpy tensors.
-
- Returns:
- values_bxtxn: The BxTxN numpy tensor.
- """
-
- T = len(values_t_bxn)
- B, N = values_t_bxn[0].shape
- values_bxtxn = np.zeros([B,T,N])
- for t in range(T):
- values_bxtxn[:,t,:] = values_t_bxn[t]
-
- return values_bxtxn
-
-
-def tensor_bxtxn_to_list_t_bxn(tensor_bxtxn):
- """Convert a numpy tensor with shape BxTxN to a length T list of numpy tensors
- with shape BxN.
-
- Args:
- tensor_bxtxn: The BxTxN numpy tensor.
-
- Returns:
- A length T list of numpy tensors with shape BxN.
- """
-
- values_t_bxn = []
- B, T, N = tensor_bxtxn.shape
- for t in range(T):
- values_t_bxn.append(np.squeeze(tensor_bxtxn[:,t,:]))
-
- return values_t_bxn
-
-
-def flatten(list_of_lists):
- """Takes a list of lists and returns a list of the elements.
-
- Args:
- list_of_lists: List of lists.
-
- Returns:
- flat_list: Flattened list.
- flat_list_idxs: Flattened list indices.
- """
- flat_list = []
- flat_list_idxs = []
- start_idx = 0
- for item in list_of_lists:
- if isinstance(item, list):
- flat_list += item
- l = len(item)
- idxs = range(start_idx, start_idx+l)
- start_idx = start_idx+l
- else: # a value
- flat_list.append(item)
- idxs = [start_idx]
- start_idx += 1
- flat_list_idxs.append(idxs)
-
- return flat_list, flat_list_idxs
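The log_sum_exp docstring above derives the usual max-shift identity step by step; a tiny NumPy check (illustrative, independent of the deleted TF1 code) shows why the shift matters numerically.

```python
# Check of log(sum_i exp(x_i)) = log(sum_i exp(x_i - m)) + m, with m = max_i x_i.
import numpy as np

x = np.array([1000.0, 1001.0, 999.0])    # naive np.log(np.sum(np.exp(x))) overflows to inf
m = np.max(x)
stable = np.log(np.sum(np.exp(x - m))) + m
print(stable)                             # ~1001.41, computed without overflow
```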
diff --git a/research/lstm_object_detection/README.md b/research/lstm_object_detection/README.md
deleted file mode 100644
index a696ba3df30..00000000000
--- a/research/lstm_object_detection/README.md
+++ /dev/null
@@ -1,40 +0,0 @@
-# Tensorflow Mobile Video Object Detection
-
-A TensorFlow implementation of mobile video object detection, as proposed in
-the following papers:
-
-
-
-
-
-```
-"Mobile Video Object Detection with Temporally-Aware Feature Maps",
-Liu, Mason and Zhu, Menglong, CVPR 2018.
-```
-\[[link](http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_Mobile_Video_Object_CVPR_2018_paper.pdf)\]\[[bibtex](
-https://scholar.googleusercontent.com/scholar.bib?q=info:hq5rcMUUXysJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAXLdwXcU5g_wiMQ40EvbHQ9kTyvfUxffh&scisf=4&ct=citation&cd=-1&hl=en)\]
-
-
-
-
-
-
-```
-"Looking Fast and Slow: Memory-Guided Mobile Video Object Detection",
-Liu, Mason and Zhu, Menglong and White, Marie and Li, Yinxiao and Kalenichenko, Dmitry
-```
-\[[link](https://arxiv.org/abs/1903.10172)\]\[[bibtex](
-https://scholar.googleusercontent.com/scholar.bib?q=info:rLqvkztmWYgJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAXLdwNf-LJlm2M1ymQHbq2wYA995MHpJu&scisf=4&ct=citation&cd=-1&hl=en)\]
-
-
-## Maintainers
-* masonliuw@gmail.com
-* yinxiao@google.com
-* menglong@google.com
-* yongzhe@google.com
-* lzyuan@google.com
-
-
-## Table of Contents
-
- * Exporting a trained model
diff --git a/research/lstm_object_detection/__init__.py b/research/lstm_object_detection/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/lstm_object_detection/builders/__init__.py b/research/lstm_object_detection/builders/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/lstm_object_detection/builders/graph_rewriter_builder.py b/research/lstm_object_detection/builders/graph_rewriter_builder.py
deleted file mode 100644
index accced2f0fc..00000000000
--- a/research/lstm_object_detection/builders/graph_rewriter_builder.py
+++ /dev/null
@@ -1,147 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Custom version for quantized training and evaluation functions.
-
-The main difference between this and the third_party graph_rewriter_builder.py
-is that this version uses experimental_create_training_graph which allows the
-customization of freeze_bn_delay.
-"""
-
-import re
-import tensorflow.compat.v1 as tf
-from tensorflow.contrib import layers as contrib_layers
-from tensorflow.contrib import quantize as contrib_quantize
-from tensorflow.contrib.quantize.python import common
-from tensorflow.contrib.quantize.python import input_to_ops
-from tensorflow.contrib.quantize.python import quant_ops
-from tensorflow.python.ops import control_flow_ops
-from tensorflow.python.ops import math_ops
-
-
-def build(graph_rewriter_config,
- quant_overrides_config=None,
- is_training=True,
- is_export=False):
- """Returns a function that modifies default graph based on options.
-
- Args:
- graph_rewriter_config: graph_rewriter_pb2.GraphRewriter proto.
- quant_overrides_config: quant_overrides_pb2.QuantOverrides proto.
- is_training: whether in training or eval mode.
- is_export: whether exporting the graph.
- """
- def graph_rewrite_fn():
- """Function to quantize weights and activation of the default graph."""
- if (graph_rewriter_config.quantization.weight_bits != 8 or
- graph_rewriter_config.quantization.activation_bits != 8):
- raise ValueError('Only 8bit quantization is supported')
-
- graph = tf.get_default_graph()
-
- # Insert custom quant ops.
- if quant_overrides_config is not None:
- input_to_ops_map = input_to_ops.InputToOps(graph)
- for q in quant_overrides_config.quant_configs:
- producer = graph.get_operation_by_name(q.op_name)
- if producer is None:
- raise ValueError('Op name does not exist in graph.')
- context = _get_context_from_op(producer)
- consumers = input_to_ops_map.ConsumerOperations(producer)
- if q.fixed_range:
- _insert_fixed_quant_op(
- context,
- q.quant_op_name,
- producer,
- consumers,
- init_min=q.min,
- init_max=q.max,
- quant_delay=q.delay if is_training else 0)
- else:
- raise ValueError('Learned ranges are not yet supported.')
-
- # Quantize the graph by inserting quantize ops for weights and activations
- if is_training:
- contrib_quantize.experimental_create_training_graph(
- input_graph=graph,
- quant_delay=graph_rewriter_config.quantization.delay,
- freeze_bn_delay=graph_rewriter_config.quantization.delay)
- else:
- contrib_quantize.experimental_create_eval_graph(
- input_graph=graph,
- quant_delay=graph_rewriter_config.quantization.delay
- if not is_export else 0)
-
- contrib_layers.summarize_collection('quant_vars')
-
- return graph_rewrite_fn
-
-
-def _get_context_from_op(op):
- """Gets the root context name from the op name."""
- context_re = re.search(r'^(.*)/([^/]+)', op.name)
- if context_re:
- return context_re.group(1)
- return ''
-
-
-def _insert_fixed_quant_op(context,
- name,
- producer,
- consumers,
- init_min=-6.0,
- init_max=6.0,
- quant_delay=None):
- """Adds a fake quant op with fixed ranges.
-
- Args:
- context: The parent scope of the op to be quantized.
- name: The name of the fake quant op.
- producer: The producer op to be quantized.
- consumers: The consumer ops to the producer op.
- init_min: The minimum range for the fake quant op.
- init_max: The maximum range for the fake quant op.
- quant_delay: Number of steps to wait before activating the fake quant op.
-
- Raises:
- ValueError: When producer operation is not directly connected to the
- consumer operation.
- """
- name_prefix = name if not context else context + '/' + name
- inputs = producer.outputs[0]
- quant = quant_ops.FixedQuantize(
- inputs, init_min=init_min, init_max=init_max, scope=name_prefix)
-
- if quant_delay and quant_delay > 0:
- activate_quant = math_ops.greater_equal(
- common.CreateOrGetQuantizationStep(),
- quant_delay,
- name=name_prefix + '/activate_quant')
- quant = control_flow_ops.cond(
- activate_quant,
- lambda: quant,
- lambda: inputs,
- name=name_prefix + '/delayed_quant')
-
- if consumers:
- tensors_modified_count = common.RerouteTensor(
- quant, inputs, can_modify=consumers)
- # Some operations can have multiple output tensors going to the same
- # consumer. Since consumers is a set, we need to ensure that
- # tensors_modified_count is greater than or equal to the length of the set
- # of consumers.
- if tensors_modified_count < len(consumers):
- raise ValueError('No inputs quantized for ops: [%s]' % ', '.join(
- [consumer.name for consumer in consumers]))
diff --git a/research/lstm_object_detection/builders/graph_rewriter_builder_test.py b/research/lstm_object_detection/builders/graph_rewriter_builder_test.py
deleted file mode 100644
index e06a9f5a3d7..00000000000
--- a/research/lstm_object_detection/builders/graph_rewriter_builder_test.py
+++ /dev/null
@@ -1,117 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for graph_rewriter_builder."""
-import mock
-import tensorflow.compat.v1 as tf
-from tensorflow.contrib import layers as contrib_layers
-from tensorflow.contrib import quantize as contrib_quantize
-from tensorflow.python.framework import dtypes
-from tensorflow.python.framework import ops
-from lstm_object_detection.builders import graph_rewriter_builder
-from lstm_object_detection.protos import quant_overrides_pb2
-from object_detection.protos import graph_rewriter_pb2
-
-
-class QuantizationBuilderTest(tf.test.TestCase):
-
- def testQuantizationBuilderSetsUpCorrectTrainArguments(self):
- with mock.patch.object(
- contrib_quantize,
- 'experimental_create_training_graph') as mock_quant_fn:
- with mock.patch.object(contrib_layers,
- 'summarize_collection') as mock_summarize_col:
- graph_rewriter_proto = graph_rewriter_pb2.GraphRewriter()
- graph_rewriter_proto.quantization.delay = 10
- graph_rewriter_proto.quantization.weight_bits = 8
- graph_rewriter_proto.quantization.activation_bits = 8
- graph_rewrite_fn = graph_rewriter_builder.build(
- graph_rewriter_proto, is_training=True)
- graph_rewrite_fn()
- _, kwargs = mock_quant_fn.call_args
- self.assertEqual(kwargs['input_graph'], tf.get_default_graph())
- self.assertEqual(kwargs['quant_delay'], 10)
- mock_summarize_col.assert_called_with('quant_vars')
-
- def testQuantizationBuilderSetsUpCorrectEvalArguments(self):
- with mock.patch.object(contrib_quantize,
- 'experimental_create_eval_graph') as mock_quant_fn:
- with mock.patch.object(contrib_layers,
- 'summarize_collection') as mock_summarize_col:
- graph_rewriter_proto = graph_rewriter_pb2.GraphRewriter()
- graph_rewriter_proto.quantization.delay = 10
- graph_rewrite_fn = graph_rewriter_builder.build(
- graph_rewriter_proto, is_training=False)
- graph_rewrite_fn()
- _, kwargs = mock_quant_fn.call_args
- self.assertEqual(kwargs['input_graph'], tf.get_default_graph())
- mock_summarize_col.assert_called_with('quant_vars')
-
- def testQuantizationBuilderAddsQuantOverride(self):
- graph = ops.Graph()
- with graph.as_default():
- self._buildGraph()
-
- quant_overrides_proto = quant_overrides_pb2.QuantOverrides()
- quant_config = quant_overrides_proto.quant_configs.add()
- quant_config.op_name = 'test_graph/add_ab'
- quant_config.quant_op_name = 'act_quant'
- quant_config.fixed_range = True
- quant_config.min = 0
- quant_config.max = 6
- quant_config.delay = 100
-
- graph_rewriter_proto = graph_rewriter_pb2.GraphRewriter()
- graph_rewriter_proto.quantization.delay = 10
- graph_rewriter_proto.quantization.weight_bits = 8
- graph_rewriter_proto.quantization.activation_bits = 8
-
- graph_rewrite_fn = graph_rewriter_builder.build(
- graph_rewriter_proto,
- quant_overrides_config=quant_overrides_proto,
- is_training=True)
- graph_rewrite_fn()
-
- act_quant_found = False
- quant_delay_found = False
- for op in graph.get_operations():
- if (quant_config.quant_op_name in op.name and
- op.type == 'FakeQuantWithMinMaxArgs'):
- act_quant_found = True
- min_val = op.get_attr('min')
- max_val = op.get_attr('max')
- self.assertEqual(min_val, quant_config.min)
- self.assertEqual(max_val, quant_config.max)
- if ('activate_quant' in op.name and
- quant_config.quant_op_name in op.name and op.type == 'Const'):
- tensor = op.get_attr('value')
- if tensor.int64_val[0] == quant_config.delay:
- quant_delay_found = True
-
- self.assertTrue(act_quant_found)
- self.assertTrue(quant_delay_found)
-
- def _buildGraph(self, scope='test_graph'):
- with ops.name_scope(scope):
- a = tf.constant(10, dtype=dtypes.float32, name='input_a')
- b = tf.constant(20, dtype=dtypes.float32, name='input_b')
- ab = tf.add(a, b, name='add_ab')
- c = tf.constant(30, dtype=dtypes.float32, name='input_c')
- abc = tf.multiply(ab, c, name='mul_ab_c')
- return abc
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/lstm_object_detection/configs/lstm_ssd_interleaved_mobilenet_v2_imagenet.config b/research/lstm_object_detection/configs/lstm_ssd_interleaved_mobilenet_v2_imagenet.config
deleted file mode 100644
index 536d7d53271..00000000000
--- a/research/lstm_object_detection/configs/lstm_ssd_interleaved_mobilenet_v2_imagenet.config
+++ /dev/null
@@ -1,239 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# For training on Imagenet Video with LSTM Interleaved Mobilenet V2
-
-[lstm_object_detection.protos.lstm_model] {
- train_unroll_length: 4
- eval_unroll_length: 4
- lstm_state_depth: 320
- depth_multipliers: 1.4
- depth_multipliers: 0.35
- pre_bottleneck: true
- low_res: true
- train_interleave_method: 'RANDOM_SKIP_SMALL'
- eval_interleave_method: 'SKIP3'
-}
-model {
- ssd {
- num_classes: 30 # Number of classes for the ImageNet VID dataset.
- box_coder {
- faster_rcnn_box_coder {
- y_scale: 10.0
- x_scale: 10.0
- height_scale: 5.0
- width_scale: 5.0
- }
- }
- matcher {
- argmax_matcher {
- matched_threshold: 0.5
- unmatched_threshold: 0.5
- ignore_thresholds: false
- negatives_lower_than_unmatched: true
- force_match_for_each_row: true
- }
- }
- similarity_calculator {
- iou_similarity {
- }
- }
- anchor_generator {
- ssd_anchor_generator {
- num_layers: 5
- min_scale: 0.2
- max_scale: 0.95
- aspect_ratios: 1.0
- aspect_ratios: 2.0
- aspect_ratios: 0.5
- aspect_ratios: 3.0
- aspect_ratios: 0.3333
- }
- }
- image_resizer {
- fixed_shape_resizer {
- height: 320
- width: 320
- }
- }
- box_predictor {
- convolutional_box_predictor {
- min_depth: 0
- max_depth: 0
- num_layers_before_predictor: 3
- use_dropout: false
- dropout_keep_probability: 0.8
- kernel_size: 3
- box_code_size: 4
- apply_sigmoid_to_scores: false
- use_depthwise: true
- conv_hyperparams {
- activation: RELU_6,
- regularizer {
- l2_regularizer {
- weight: 0.00004
- }
- }
- initializer {
- truncated_normal_initializer {
- stddev: 0.03
- mean: 0.0
- }
- }
- batch_norm {
- train: true,
- scale: true,
- center: true,
- decay: 0.9997,
- epsilon: 0.001,
- }
- }
- }
- }
- feature_extractor {
- type: 'lstm_ssd_interleaved_mobilenet_v2'
- conv_hyperparams {
- activation: RELU_6,
- regularizer {
- l2_regularizer {
- weight: 0.00004
- }
- }
- initializer {
- truncated_normal_initializer {
- stddev: 0.03
- mean: 0.0
- }
- }
- batch_norm {
- train: true,
- scale: true,
- center: true,
- decay: 0.9997,
- epsilon: 0.001,
- }
- }
- }
- loss {
- classification_loss {
- weighted_sigmoid {
- }
- }
- localization_loss {
- weighted_smooth_l1 {
- }
- }
- hard_example_miner {
- num_hard_examples: 3000
- iou_threshold: 0.99
- loss_type: CLASSIFICATION
- max_negatives_per_positive: 3
- min_negatives_per_image: 0
- }
- classification_weight: 1.0
- localization_weight: 4.0
- }
- normalize_loss_by_num_matches: true
- post_processing {
- batch_non_max_suppression {
- score_threshold: -20.0
- iou_threshold: 0.5
- max_detections_per_class: 100
- max_total_detections: 100
- }
- score_converter: SIGMOID
- }
- }
-}
-
-train_config: {
- batch_size: 8
- optimizer {
- use_moving_average: false
- rms_prop_optimizer: {
- learning_rate: {
- exponential_decay_learning_rate {
- initial_learning_rate: 0.002
- decay_steps: 200000
- decay_factor: 0.95
- }
- }
- momentum_optimizer_value: 0.9
- decay: 0.9
- epsilon: 1.0
- }
- }
- gradient_clipping_by_norm: 10.0
- batch_queue_capacity: 12
- prefetch_queue_capacity: 4
-}
-
-train_input_reader: {
- shuffle_buffer_size: 32
- queue_capacity: 12
- prefetch_size: 12
- min_after_dequeue: 4
- label_map_path: "path/to/label_map"
- external_input_reader {
- [lstm_object_detection.protos.GoogleInputReader.google_input_reader] {
- tf_record_video_input_reader: {
- input_path: '/data/lstm_detection/tfrecords/test.tfrecord'
- data_type: TF_SEQUENCE_EXAMPLE
- video_length: 4
- }
- }
- }
-}
-
-eval_config: {
- metrics_set: "coco_evaluation_all_frames"
- use_moving_averages: true
- min_score_threshold: 0.5
- max_num_boxes_to_visualize: 300
- visualize_groundtruth_boxes: true
- groundtruth_box_visualization_color: "red"
-}
-
-eval_input_reader {
- label_map_path: "path/to/label_map"
- shuffle: true
- num_epochs: 1
- num_parallel_batches: 1
- num_readers: 1
- external_input_reader {
- [lstm_object_detection.protos.GoogleInputReader.google_input_reader] {
- tf_record_video_input_reader: {
- input_path: "path/to/sequence_example/data"
- data_type: TF_SEQUENCE_EXAMPLE
- video_length: 10
- }
- }
- }
-}
-
-eval_input_reader: {
- label_map_path: "path/to/label_map"
- external_input_reader {
- [lstm_object_detection.protos.GoogleInputReader.google_input_reader] {
- tf_record_video_input_reader: {
- input_path: "path/to/sequence_example/data"
- data_type: TF_SEQUENCE_EXAMPLE
- video_length: 4
- }
- }
- }
- shuffle: true
- num_readers: 1
-}
diff --git a/research/lstm_object_detection/configs/lstm_ssd_mobilenet_v1_imagenet.config b/research/lstm_object_detection/configs/lstm_ssd_mobilenet_v1_imagenet.config
deleted file mode 100644
index cb357ec17ee..00000000000
--- a/research/lstm_object_detection/configs/lstm_ssd_mobilenet_v1_imagenet.config
+++ /dev/null
@@ -1,232 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-# For training on Imagenet Video with LSTM Mobilenet V1
-
-[lstm_object_detection.protos.lstm_model] {
- train_unroll_length: 4
- eval_unroll_length: 4
-}
-
-model {
- ssd {
- num_classes: 30 # Number of classes for the ImageNet VID dataset.
- box_coder {
- faster_rcnn_box_coder {
- y_scale: 10.0
- x_scale: 10.0
- height_scale: 5.0
- width_scale: 5.0
- }
- }
- matcher {
- argmax_matcher {
- matched_threshold: 0.5
- unmatched_threshold: 0.5
- ignore_thresholds: false
- negatives_lower_than_unmatched: true
- force_match_for_each_row: true
- }
- }
- similarity_calculator {
- iou_similarity {
- }
- }
- anchor_generator {
- ssd_anchor_generator {
- num_layers: 5
- min_scale: 0.2
- max_scale: 0.95
- aspect_ratios: 1.0
- aspect_ratios: 2.0
- aspect_ratios: 0.5
- aspect_ratios: 3.0
- aspect_ratios: 0.3333
- }
- }
- image_resizer {
- fixed_shape_resizer {
- height: 256
- width: 256
- }
- }
- box_predictor {
- convolutional_box_predictor {
- min_depth: 0
- max_depth: 0
- num_layers_before_predictor: 3
- use_dropout: false
- dropout_keep_probability: 0.8
- kernel_size: 3
- box_code_size: 4
- apply_sigmoid_to_scores: false
- use_depthwise: true
- conv_hyperparams {
- activation: RELU_6,
- regularizer {
- l2_regularizer {
- weight: 0.00004
- }
- }
- initializer {
- truncated_normal_initializer {
- stddev: 0.03
- mean: 0.0
- }
- }
- batch_norm {
- train: true,
- scale: true,
- center: true,
- decay: 0.9997,
- epsilon: 0.001,
- }
- }
- }
- }
- feature_extractor {
- type: 'lstm_mobilenet_v1'
- min_depth: 16
- depth_multiplier: 1.0
- use_depthwise: true
- conv_hyperparams {
- activation: RELU_6,
- regularizer {
- l2_regularizer {
- weight: 0.00004
- }
- }
- initializer {
- truncated_normal_initializer {
- stddev: 0.03
- mean: 0.0
- }
- }
- batch_norm {
- train: true,
- scale: true,
- center: true,
- decay: 0.9997,
- epsilon: 0.001,
- }
- }
- }
- loss {
- classification_loss {
- weighted_sigmoid {
- }
- }
- localization_loss {
- weighted_smooth_l1 {
- }
- }
- hard_example_miner {
- num_hard_examples: 3000
- iou_threshold: 0.99
- loss_type: CLASSIFICATION
- max_negatives_per_positive: 3
- min_negatives_per_image: 0
- }
- classification_weight: 1.0
- localization_weight: 4.0
- }
- normalize_loss_by_num_matches: true
- post_processing {
- batch_non_max_suppression {
- score_threshold: -20.0
- iou_threshold: 0.5
- max_detections_per_class: 100
- max_total_detections: 100
- }
- score_converter: SIGMOID
- }
- }
-}
-
-train_config: {
- batch_size: 8
- data_augmentation_options {
- random_horizontal_flip {
- }
- }
- data_augmentation_options {
- ssd_random_crop {
- }
- }
- optimizer {
- use_moving_average: false
- rms_prop_optimizer: {
- learning_rate: {
- exponential_decay_learning_rate {
- initial_learning_rate: 0.002
- decay_steps: 200000
- decay_factor: 0.95
- }
- }
- momentum_optimizer_value: 0.9
- decay: 0.9
- epsilon: 1.0
- }
- }
-
- from_detection_checkpoint: true
- gradient_clipping_by_norm: 10.0
- batch_queue_capacity: 12
- prefetch_queue_capacity: 4
- fine_tune_checkpoint: "/path/to/checkpoint/"
- fine_tune_checkpoint_type: "detection"
-}
-
-
-train_input_reader: {
- shuffle_buffer_size: 32
- queue_capacity: 12
- prefetch_size: 12
- min_after_dequeue: 4
- label_map_path: "path/to/label_map"
- external_input_reader {
- [lstm_object_detection.protos.GoogleInputReader.google_input_reader] {
- tf_record_video_input_reader: {
- input_path: "path/to/sequence_example/data"
- data_type: TF_SEQUENCE_EXAMPLE
- video_length: 4
- }
- }
- }
-}
-
-eval_config: {
- metrics_set: "coco_evaluation_all_frames"
- use_moving_averages: true
- min_score_threshold: 0.5
- max_num_boxes_to_visualize: 300
- visualize_groundtruth_boxes: true
- groundtruth_box_visualization_color: "red"
-}
-
-eval_input_reader: {
- label_map_path: "path/to/label_map"
- external_input_reader {
- [lstm_object_detection.protos.GoogleInputReader.google_input_reader] {
- tf_record_video_input_reader: {
- input_path: "path/to/sequence_example/data"
- data_type: TF_SEQUENCE_EXAMPLE
- video_length: 4
- }
- }
- }
- shuffle: true
- num_readers: 1
-}
diff --git a/research/lstm_object_detection/eval.py b/research/lstm_object_detection/eval.py
deleted file mode 100644
index aac25c1182b..00000000000
--- a/research/lstm_object_detection/eval.py
+++ /dev/null
@@ -1,108 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-r"""Evaluation executable for detection models.
-
-This executable is used to evaluate DetectionModels. Example usage:
- ./eval \
- --logtostderr \
- --checkpoint_dir=path/to/checkpoint_dir \
- --eval_dir=path/to/eval_dir \
- --pipeline_config_path=pipeline_config.pbtxt
-"""
-
-import functools
-import os
-import tensorflow.compat.v1 as tf
-from google.protobuf import text_format
-from lstm_object_detection import evaluator
-from lstm_object_detection import model_builder
-from lstm_object_detection.inputs import seq_dataset_builder
-from lstm_object_detection.utils import config_util
-from object_detection.utils import label_map_util
-
-tf.logging.set_verbosity(tf.logging.INFO)
-flags = tf.app.flags
-flags.DEFINE_boolean('eval_training_data', False,
- 'If training data should be evaluated for this job.')
-flags.DEFINE_string('checkpoint_dir', '',
- 'Directory containing checkpoints to evaluate, typically '
- 'set to `train_dir` used in the training job.')
-flags.DEFINE_string('eval_dir', '', 'Directory to write eval summaries to.')
-flags.DEFINE_string('pipeline_config_path', '',
- 'Path to a pipeline_pb2.TrainEvalPipelineConfig config '
- 'file. If provided, other configs are ignored')
-flags.DEFINE_boolean('run_once', False, 'Option to only run a single pass of '
- 'evaluation. Overrides the `max_evals` parameter in the '
- 'provided config.')
-FLAGS = flags.FLAGS
-
-
-def main(unused_argv):
- assert FLAGS.checkpoint_dir, '`checkpoint_dir` is missing.'
- assert FLAGS.eval_dir, '`eval_dir` is missing.'
- if FLAGS.pipeline_config_path:
- configs = config_util.get_configs_from_pipeline_file(
- FLAGS.pipeline_config_path)
- else:
- configs = config_util.get_configs_from_multiple_files(
- model_config_path=FLAGS.model_config_path,
- eval_config_path=FLAGS.eval_config_path,
- eval_input_config_path=FLAGS.input_config_path)
-
- pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
- config_text = text_format.MessageToString(pipeline_proto)
- tf.gfile.MakeDirs(FLAGS.eval_dir)
- with tf.gfile.Open(os.path.join(FLAGS.eval_dir, 'pipeline.config'),
- 'wb') as f:
- f.write(config_text)
-
- model_config = configs['model']
- lstm_config = configs['lstm_model']
- eval_config = configs['eval_config']
- input_config = configs['eval_input_config']
-
- if FLAGS.eval_training_data:
- input_config.external_input_reader.CopyFrom(
- configs['train_input_config'].external_input_reader)
- lstm_config.eval_unroll_length = lstm_config.train_unroll_length
-
- model_fn = functools.partial(
- model_builder.build,
- model_config=model_config,
- lstm_config=lstm_config,
- is_training=False)
-
- def get_next(config, model_config, lstm_config, unroll_length):
- return seq_dataset_builder.build(config, model_config, lstm_config,
- unroll_length)
-
- create_input_dict_fn = functools.partial(get_next, input_config, model_config,
- lstm_config,
- lstm_config.eval_unroll_length)
-
- label_map = label_map_util.load_labelmap(input_config.label_map_path)
- max_num_classes = max([item.id for item in label_map.item])
- categories = label_map_util.convert_label_map_to_categories(
- label_map, max_num_classes)
-
- if FLAGS.run_once:
- eval_config.max_evals = 1
-
- evaluator.evaluate(create_input_dict_fn, model_fn, eval_config, categories,
- FLAGS.checkpoint_dir, FLAGS.eval_dir)
-
-if __name__ == '__main__':
- tf.app.run()
diff --git a/research/lstm_object_detection/evaluator.py b/research/lstm_object_detection/evaluator.py
deleted file mode 100644
index 6ed3e476e8e..00000000000
--- a/research/lstm_object_detection/evaluator.py
+++ /dev/null
@@ -1,337 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Detection model evaluator.
-
-This file provides a generic evaluation method that can be used to evaluate a
-DetectionModel.
-
-"""
-
-import tensorflow.compat.v1 as tf
-from tensorflow.contrib import tfprof as contrib_tfprof
-from lstm_object_detection.metrics import coco_evaluation_all_frames
-from object_detection import eval_util
-from object_detection.core import prefetcher
-from object_detection.core import standard_fields as fields
-from object_detection.metrics import coco_evaluation
-from object_detection.utils import object_detection_evaluation
-
-
-# A dictionary of metric names to classes that implement the metric. The classes
-# in the dictionary must implement
-# utils.object_detection_evaluation.DetectionEvaluator interface.
-EVAL_METRICS_CLASS_DICT = {
- 'pascal_voc_detection_metrics':
- object_detection_evaluation.PascalDetectionEvaluator,
- 'weighted_pascal_voc_detection_metrics':
- object_detection_evaluation.WeightedPascalDetectionEvaluator,
- 'pascal_voc_instance_segmentation_metrics':
- object_detection_evaluation.PascalInstanceSegmentationEvaluator,
- 'weighted_pascal_voc_instance_segmentation_metrics':
- object_detection_evaluation.WeightedPascalInstanceSegmentationEvaluator,
- 'open_images_detection_metrics':
- object_detection_evaluation.OpenImagesDetectionEvaluator,
- 'coco_detection_metrics':
- coco_evaluation.CocoDetectionEvaluator,
- 'coco_mask_metrics':
- coco_evaluation.CocoMaskEvaluator,
- 'coco_evaluation_all_frames':
- coco_evaluation_all_frames.CocoEvaluationAllFrames,
-}
-
-EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'
-
-
-def _create_detection_op(model, input_dict, batch):
- """Create detection ops.
-
- Args:
- model: model to perform predictions with.
- input_dict: A dict holds input data.
- batch: batch size for evaluation.
-
- Returns:
- Detection tensor ops.
- """
- video_tensor = tf.stack(list(input_dict[fields.InputDataFields.image]))
- preprocessed_video, true_image_shapes = model.preprocess(
- tf.to_float(video_tensor))
- if batch is not None:
- prediction_dict = model.predict(preprocessed_video, true_image_shapes,
- batch)
- else:
- prediction_dict = model.predict(preprocessed_video, true_image_shapes)
-
- return model.postprocess(prediction_dict, true_image_shapes)
-
-
-def _extract_prediction_tensors(model,
- create_input_dict_fn,
- ignore_groundtruth=False):
- """Restores the model in a tensorflow session.
-
- Args:
- model: model to perform predictions with.
- create_input_dict_fn: function to create input tensor dictionaries.
- ignore_groundtruth: whether groundtruth should be ignored.
-
- Returns:
- A list of per-frame tensor dictionaries with detections and groundtruth.
- """
- input_dict = create_input_dict_fn()
- batch = None
- if 'batch' in input_dict:
- batch = input_dict.pop('batch')
- else:
- prefetch_queue = prefetcher.prefetch(input_dict, capacity=500)
- input_dict = prefetch_queue.dequeue()
- # Wrap each value in a tuple so single images share the video format.
- for key, value in input_dict.items():
- input_dict[key] = (value,)
-
- detections = _create_detection_op(model, input_dict, batch)
-
- # Print out an analysis of the model.
- contrib_tfprof.model_analyzer.print_model_analysis(
- tf.get_default_graph(),
- tfprof_options=contrib_tfprof.model_analyzer
- .TRAINABLE_VARS_PARAMS_STAT_OPTIONS)
- contrib_tfprof.model_analyzer.print_model_analysis(
- tf.get_default_graph(),
- tfprof_options=contrib_tfprof.model_analyzer.FLOAT_OPS_OPTIONS)
-
- num_frames = len(input_dict[fields.InputDataFields.image])
- ret = []
- for i in range(num_frames):
- original_image = tf.expand_dims(input_dict[fields.InputDataFields.image][i],
- 0)
- groundtruth = None
- if not ignore_groundtruth:
- groundtruth = {
- fields.InputDataFields.groundtruth_boxes:
- input_dict[fields.InputDataFields.groundtruth_boxes][i],
- fields.InputDataFields.groundtruth_classes:
- input_dict[fields.InputDataFields.groundtruth_classes][i],
- }
- optional_keys = (
- fields.InputDataFields.groundtruth_area,
- fields.InputDataFields.groundtruth_is_crowd,
- fields.InputDataFields.groundtruth_difficult,
- fields.InputDataFields.groundtruth_group_of,
- )
- for opt_key in optional_keys:
- if opt_key in input_dict:
- groundtruth[opt_key] = input_dict[opt_key][i]
- if fields.DetectionResultFields.detection_masks in detections:
- groundtruth[fields.InputDataFields.groundtruth_instance_masks] = (
- input_dict[fields.InputDataFields.groundtruth_instance_masks][i])
-
- detections_frame = {
- key: tf.expand_dims(value[i], 0)
- for key, value in detections.items()
- }
-
- source_id = (
- batch.key[0] if batch is not None else
- input_dict[fields.InputDataFields.source_id][i])
- ret.append(
- eval_util.result_dict_for_single_example(
- original_image,
- source_id,
- detections_frame,
- groundtruth,
- class_agnostic=(fields.DetectionResultFields.detection_classes
- not in detections),
- scale_to_absolute=True))
- return ret
-
-
-def get_evaluators(eval_config, categories):
- """Returns the evaluator class according to eval_config, valid for categories.
-
- Args:
- eval_config: evaluation configurations.
- categories: a list of categories to evaluate.
- Returns:
- A list of instances of DetectionEvaluator.
-
- Raises:
- ValueError: if metric is not in the metric class dictionary.
- """
- eval_metric_fn_keys = eval_config.metrics_set
- if not eval_metric_fn_keys:
- eval_metric_fn_keys = [EVAL_DEFAULT_METRIC]
- evaluators_list = []
- for eval_metric_fn_key in eval_metric_fn_keys:
- if eval_metric_fn_key not in EVAL_METRICS_CLASS_DICT:
- raise ValueError('Metric not found: {}'.format(eval_metric_fn_key))
- else:
- evaluators_list.append(
- EVAL_METRICS_CLASS_DICT[eval_metric_fn_key](categories=categories))
- return evaluators_list
-
-
-def evaluate(create_input_dict_fn,
- create_model_fn,
- eval_config,
- categories,
- checkpoint_dir,
- eval_dir,
- graph_hook_fn=None):
- """Evaluation function for detection models.
-
- Args:
- create_input_dict_fn: a function to create a tensor input dictionary.
- create_model_fn: a function that creates a DetectionModel.
- eval_config: an eval_pb2.EvalConfig protobuf.
- categories: a list of category dictionaries. Each dict in the list should
- have an integer 'id' field and string 'name' field.
- checkpoint_dir: directory to load the checkpoints to evaluate from.
- eval_dir: directory to write evaluation metrics summary to.
- graph_hook_fn: Optional function that is called after the evaluation graph
- is completely built. This is helpful to perform additional changes to the
- graph such as optimizing batchnorm. The function should modify
- the default graph.
-
- Returns:
- metrics: A dictionary containing metric names and values from the latest
- run.
- """
-
- model = create_model_fn()
-
- if eval_config.ignore_groundtruth and not eval_config.export_path:
- tf.logging.fatal('If ignore_groundtruth=True then an export_path is '
- 'required. Aborting!!!')
-
- tensor_dicts = _extract_prediction_tensors(
- model=model,
- create_input_dict_fn=create_input_dict_fn,
- ignore_groundtruth=eval_config.ignore_groundtruth)
-
- def _process_batch(tensor_dicts,
- sess,
- batch_index,
- counters,
- losses_dict=None):
- """Evaluates tensors in tensor_dicts, visualizing the first K examples.
-
- This function calls sess.run on tensor_dicts, evaluating the original_image
- tensor only on the first K examples and visualizing detections overlaid
- on this original_image.
-
- Args:
- tensor_dicts: a list of per-frame dictionaries of tensors.
- sess: tensorflow session
- batch_index: the index of the batch amongst all batches in the run.
- counters: a dictionary holding 'success' and 'skipped' fields which can
- be updated to keep track of number of successful and failed runs,
- respectively. If these fields are not updated, then the success/skipped
- counter values shown at the end of evaluation will be incorrect.
- losses_dict: Optional dictionary of scalar loss tensors. Necessary only
- for matching the function signature in third_party eval_util.py.
-
- Returns:
- result_dict: a dictionary of numpy arrays
- result_losses_dict: a dictionary of scalar losses. This is empty if the input
- losses_dict is None. Necessary only for matching the function signature in
- third_party eval_util.py.
- """
- if batch_index % 10 == 0:
- tf.logging.info('Running eval ops batch %d', batch_index)
- if not losses_dict:
- losses_dict = {}
- try:
- result_dicts, result_losses_dict = sess.run([tensor_dicts, losses_dict])
- counters['success'] += 1
- except tf.errors.InvalidArgumentError:
- tf.logging.info('Skipping image')
- counters['skipped'] += 1
- return {}
- num_images = len(tensor_dicts)
- for i in range(num_images):
- result_dict = result_dicts[i]
- global_step = tf.train.global_step(sess, tf.train.get_global_step())
- tag = 'image-%d' % (batch_index * num_images + i)
- if batch_index < eval_config.num_visualizations / num_images:
- eval_util.visualize_detection_results(
- result_dict,
- tag,
- global_step,
- categories=categories,
- summary_dir=eval_dir,
- export_dir=eval_config.visualization_export_dir,
- show_groundtruth=eval_config.visualize_groundtruth_boxes,
- groundtruth_box_visualization_color=eval_config.
- groundtruth_box_visualization_color,
- min_score_thresh=eval_config.min_score_threshold,
- max_num_predictions=eval_config.max_num_boxes_to_visualize,
- skip_scores=eval_config.skip_scores,
- skip_labels=eval_config.skip_labels,
- keep_image_id_for_visualization_export=eval_config.
- keep_image_id_for_visualization_export)
- if num_images > 1:
- return result_dicts, result_losses_dict
- else:
- return result_dicts[0], result_losses_dict
-
- variables_to_restore = tf.global_variables()
- global_step = tf.train.get_or_create_global_step()
- variables_to_restore.append(global_step)
-
- if graph_hook_fn:
- graph_hook_fn()
-
- if eval_config.use_moving_averages:
- variable_averages = tf.train.ExponentialMovingAverage(0.0)
- variables_to_restore = variable_averages.variables_to_restore()
- for key in variables_to_restore.keys():
- if 'moving_mean' in key:
- variables_to_restore[key.replace(
- 'moving_mean', 'moving_mean/ExponentialMovingAverage')] = (
- variables_to_restore[key])
- del variables_to_restore[key]
- if 'moving_variance' in key:
- variables_to_restore[key.replace(
- 'moving_variance', 'moving_variance/ExponentialMovingAverage')] = (
- variables_to_restore[key])
- del variables_to_restore[key]
-
- saver = tf.train.Saver(variables_to_restore)
-
- def _restore_latest_checkpoint(sess):
- latest_checkpoint = tf.train.latest_checkpoint(checkpoint_dir)
- saver.restore(sess, latest_checkpoint)
-
- metrics = eval_util.repeated_checkpoint_run(
- tensor_dict=tensor_dicts,
- summary_dir=eval_dir,
- evaluators=get_evaluators(eval_config, categories),
- batch_processor=_process_batch,
- checkpoint_dirs=[checkpoint_dir],
- variables_to_restore=None,
- restore_fn=_restore_latest_checkpoint,
- num_batches=eval_config.num_examples,
- eval_interval_secs=eval_config.eval_interval_secs,
- max_number_of_evaluations=(1 if eval_config.ignore_groundtruth else
- eval_config.max_evals
- if eval_config.max_evals else None),
- master=eval_config.eval_master,
- save_graph=eval_config.save_graph,
- save_graph_dir=(eval_dir if eval_config.save_graph else ''))
-
- return metrics
diff --git a/research/lstm_object_detection/export_tflite_lstd_graph.py b/research/lstm_object_detection/export_tflite_lstd_graph.py
deleted file mode 100644
index 7e933fb480d..00000000000
--- a/research/lstm_object_detection/export_tflite_lstd_graph.py
+++ /dev/null
@@ -1,138 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-r"""Exports an LSTM detection model to use with tf-lite.
-
-Output file:
-* A tflite compatible frozen graph - $output_directory/tflite_graph.pb
-
-The exported graph has the following input and output nodes.
-
-Inputs:
-'input_video_tensor': a float32 tensor of shape
-[unroll_length, height, width, 3] containing the normalized input image.
-Note that the height and width must be compatible with the height and
-width configured in the fixed_shape_image resizer options in the pipeline
-config proto.
-
-Outputs:
-If add_postprocessing_op is true, the frozen graph includes a
- TFLite_Detection_PostProcess custom op node with four outputs:
- detection_boxes: a float32 tensor of shape [1, num_boxes, 4] with box
- locations
- detection_classes: a float32 tensor of shape [1, num_boxes]
- with class indices
- detection_scores: a float32 tensor of shape [1, num_boxes]
- with class scores
- num_boxes: a float32 tensor of size 1 containing the number of detected boxes
-else:
- the graph has three outputs:
- 'raw_outputs/box_encodings': a float32 tensor of shape [1, num_anchors, 4]
- containing the encoded box predictions.
- 'raw_outputs/class_predictions': a float32 tensor of shape
- [1, num_anchors, num_classes] containing the class scores for each anchor
- after applying score conversion.
- 'anchors': a float32 constant tensor of shape [num_anchors, 4]
- containing the anchor boxes.
-
-Example Usage:
---------------
-python lstm_object_detection/export_tflite_lstd_graph.py \
- --pipeline_config_path path/to/lstm_pipeline.config \
- --trained_checkpoint_prefix path/to/model.ckpt \
- --output_directory path/to/exported_model_directory
-
-The expected output would be in the directory
-path/to/exported_model_directory (which is created if it does not exist)
-with contents:
- - tflite_graph.pbtxt
- - tflite_graph.pb
-Config overrides (see the `config_override` flag) are text protobufs
-(also of type pipeline_pb2.TrainEvalPipelineConfig) which are used to override
-certain fields in the provided pipeline_config_path. These are useful for
-making small changes to the inference graph that differ from the training or
-eval config.
-
-Example Usage (in which we change the NMS iou_threshold to be 0.5 and
-NMS score_threshold to be 0.0):
-python lstm_object_detection/export_tflite_lstd_graph.py \
- --pipeline_config_path path/to/lstm_pipeline.config \
- --trained_checkpoint_prefix path/to/model.ckpt \
- --output_directory path/to/exported_model_directory
- --config_override " \
- model{ \
- ssd{ \
- post_processing { \
- batch_non_max_suppression { \
- score_threshold: 0.0 \
- iou_threshold: 0.5 \
- } \
- } \
- } \
- } \
- "
-"""
-
-import tensorflow.compat.v1 as tf
-
-from lstm_object_detection import export_tflite_lstd_graph_lib
-from lstm_object_detection.utils import config_util
-
-flags = tf.app.flags
-flags.DEFINE_string('output_directory', None, 'Path to write outputs.')
-flags.DEFINE_string(
- 'pipeline_config_path', None,
- 'Path to a pipeline_pb2.TrainEvalPipelineConfig config '
- 'file.')
-flags.DEFINE_string('trained_checkpoint_prefix', None, 'Checkpoint prefix.')
-flags.DEFINE_integer('max_detections', 10,
- 'Maximum number of detections (boxes) to show.')
-flags.DEFINE_integer('max_classes_per_detection', 1,
- 'Maximum number of classes to output per detection box.')
-flags.DEFINE_integer(
- 'detections_per_class', 100,
- 'Number of anchors used per class in Regular Non-Max-Suppression.')
-flags.DEFINE_bool('add_postprocessing_op', True,
- 'Add TFLite custom op for postprocessing to the graph.')
-flags.DEFINE_bool(
- 'use_regular_nms', False,
- 'Flag to set postprocessing op to use Regular NMS instead of Fast NMS.')
-flags.DEFINE_string(
- 'config_override', '', 'pipeline_pb2.TrainEvalPipelineConfig '
- 'text proto to override pipeline_config_path.')
-
-FLAGS = flags.FLAGS
-
-
-def main(argv):
- del argv # Unused.
- flags.mark_flag_as_required('output_directory')
- flags.mark_flag_as_required('pipeline_config_path')
- flags.mark_flag_as_required('trained_checkpoint_prefix')
-
- pipeline_config = config_util.get_configs_from_pipeline_file(
- FLAGS.pipeline_config_path)
-
- export_tflite_lstd_graph_lib.export_tflite_graph(
- pipeline_config,
- FLAGS.trained_checkpoint_prefix,
- FLAGS.output_directory,
- FLAGS.add_postprocessing_op,
- FLAGS.max_detections,
- FLAGS.max_classes_per_detection,
- use_regular_nms=FLAGS.use_regular_nms)
-
-
-if __name__ == '__main__':
- tf.app.run(main)
diff --git a/research/lstm_object_detection/export_tflite_lstd_graph_lib.py b/research/lstm_object_detection/export_tflite_lstd_graph_lib.py
deleted file mode 100644
index e066f11b45f..00000000000
--- a/research/lstm_object_detection/export_tflite_lstd_graph_lib.py
+++ /dev/null
@@ -1,327 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-r"""Exports detection models to use with tf-lite.
-
-See export_tflite_lstd_graph.py for usage.
-"""
-import os
-import tempfile
-
-import numpy as np
-import tensorflow.compat.v1 as tf
-
-from tensorflow.core.framework import attr_value_pb2
-from tensorflow.core.framework import types_pb2
-from tensorflow.core.protobuf import saver_pb2
-from tensorflow.tools.graph_transforms import TransformGraph
-from lstm_object_detection import model_builder
-from object_detection import exporter
-from object_detection.builders import graph_rewriter_builder
-from object_detection.builders import post_processing_builder
-from object_detection.core import box_list
-
-_DEFAULT_NUM_CHANNELS = 3
-_DEFAULT_NUM_COORD_BOX = 4
-
-
-def get_const_center_size_encoded_anchors(anchors):
- """Exports center-size encoded anchors as a constant tensor.
-
- Args:
- anchors: a float32 tensor of shape [num_anchors, 4] containing the anchor
- boxes
-
- Returns:
- encoded_anchors: a float32 constant tensor of shape [num_anchors, 4]
- containing the anchor boxes.
- """
- anchor_boxlist = box_list.BoxList(anchors)
- y, x, h, w = anchor_boxlist.get_center_coordinates_and_sizes()
- num_anchors = y.get_shape().as_list()
-
- with tf.Session() as sess:
- y_out, x_out, h_out, w_out = sess.run([y, x, h, w])
- encoded_anchors = tf.constant(
- np.transpose(np.stack((y_out, x_out, h_out, w_out))),
- dtype=tf.float32,
- shape=[num_anchors[0], _DEFAULT_NUM_COORD_BOX],
- name='anchors')
- return encoded_anchors
-
-
-def append_postprocessing_op(frozen_graph_def,
- max_detections,
- max_classes_per_detection,
- nms_score_threshold,
- nms_iou_threshold,
- num_classes,
- scale_values,
- detections_per_class=100,
- use_regular_nms=False):
- """Appends postprocessing custom op.
-
- Args:
- frozen_graph_def: Frozen GraphDef for SSD model after freezing the
- checkpoint
- max_detections: Maximum number of detections (boxes) to show
- max_classes_per_detection: Number of classes to display per detection
- nms_score_threshold: Score threshold used in Non-maximal suppression in
- post-processing
- nms_iou_threshold: Intersection-over-union threshold used in Non-maximal
- suppression in post-processing
- num_classes: number of classes in SSD detector
- scale_values: a dict with the following key-value pairs
- {y_scale: 10, x_scale: 10, h_scale: 5, w_scale: 5} that are used to decode
- center-size boxes
- detections_per_class: In regular NonMaxSuppression, number of anchors used
- for NonMaxSuppression per class
- use_regular_nms: Flag to set postprocessing op to use Regular NMS instead of
- Fast NMS.
-
- Returns:
- transformed_graph_def: Frozen GraphDef with postprocessing custom op
- appended
- TFLite_Detection_PostProcess custom op node has four outputs:
- detection_boxes: a float32 tensor of shape [1, num_boxes, 4] with box
- locations
- detection_classes: a float32 tensor of shape [1, num_boxes]
- with class indices
- detection_scores: a float32 tensor of shape [1, num_boxes]
- with class scores
- num_boxes: a float32 tensor of size 1 containing the number of detected
- boxes
- """
- new_output = frozen_graph_def.node.add()
- new_output.op = 'TFLite_Detection_PostProcess'
- new_output.name = 'TFLite_Detection_PostProcess'
- new_output.attr['_output_quantized'].CopyFrom(
- attr_value_pb2.AttrValue(b=True))
- new_output.attr['_output_types'].list.type.extend([
- types_pb2.DT_FLOAT, types_pb2.DT_FLOAT, types_pb2.DT_FLOAT,
- types_pb2.DT_FLOAT
- ])
- new_output.attr['_support_output_type_float_in_quantized_op'].CopyFrom(
- attr_value_pb2.AttrValue(b=True))
- new_output.attr['max_detections'].CopyFrom(
- attr_value_pb2.AttrValue(i=max_detections))
- new_output.attr['max_classes_per_detection'].CopyFrom(
- attr_value_pb2.AttrValue(i=max_classes_per_detection))
- new_output.attr['nms_score_threshold'].CopyFrom(
- attr_value_pb2.AttrValue(f=nms_score_threshold.pop()))
- new_output.attr['nms_iou_threshold'].CopyFrom(
- attr_value_pb2.AttrValue(f=nms_iou_threshold.pop()))
- new_output.attr['num_classes'].CopyFrom(
- attr_value_pb2.AttrValue(i=num_classes))
-
- new_output.attr['y_scale'].CopyFrom(
- attr_value_pb2.AttrValue(f=scale_values['y_scale'].pop()))
- new_output.attr['x_scale'].CopyFrom(
- attr_value_pb2.AttrValue(f=scale_values['x_scale'].pop()))
- new_output.attr['h_scale'].CopyFrom(
- attr_value_pb2.AttrValue(f=scale_values['h_scale'].pop()))
- new_output.attr['w_scale'].CopyFrom(
- attr_value_pb2.AttrValue(f=scale_values['w_scale'].pop()))
- new_output.attr['detections_per_class'].CopyFrom(
- attr_value_pb2.AttrValue(i=detections_per_class))
- new_output.attr['use_regular_nms'].CopyFrom(
- attr_value_pb2.AttrValue(b=use_regular_nms))
-
- new_output.input.extend(
- ['raw_outputs/box_encodings', 'raw_outputs/class_predictions', 'anchors'])
- # Transform the graph to append new postprocessing op
- input_names = []
- output_names = ['TFLite_Detection_PostProcess']
- transforms = ['strip_unused_nodes']
- transformed_graph_def = TransformGraph(frozen_graph_def, input_names,
- output_names, transforms)
- return transformed_graph_def
-
-
-def export_tflite_graph(pipeline_config,
- trained_checkpoint_prefix,
- output_dir,
- add_postprocessing_op,
- max_detections,
- max_classes_per_detection,
- detections_per_class=100,
- use_regular_nms=False,
- binary_graph_name='tflite_graph.pb',
- txt_graph_name='tflite_graph.pbtxt'):
- """Exports a tflite compatible graph and anchors for ssd detection model.
-
- Anchors are written to a tensor and tflite compatible graph
- is written to output_dir/tflite_graph.pb.
-
- Args:
- pipeline_config: Dictionary of configuration objects. Keys are `model`,
- `train_config`, `train_input_config`, `eval_config`, `eval_input_config`,
- `lstm_model`. Values are the corresponding config objects.
- trained_checkpoint_prefix: a file prefix for the checkpoint containing the
- trained parameters of the SSD model.
- output_dir: A directory to write the tflite graph and anchor file to.
- add_postprocessing_op: If true, a TFLite_Detection_PostProcess custom op is
- appended to the frozen graph.
- max_detections: Maximum number of detections (boxes) to show
- max_classes_per_detection: Number of classes to display per detection
- detections_per_class: In regular NonMaxSuppression, number of anchors used
- for NonMaxSuppression per class
- use_regular_nms: Flag to set postprocessing op to use Regular NMS instead of
- Fast NMS.
- binary_graph_name: Name of the exported graph file in binary format.
- txt_graph_name: Name of the exported graph file in text format.
-
- Raises:
- ValueError: if the pipeline config contains a model other than ssd or uses an
- image resizer other than fixed_shape_resizer.
- """
- model_config = pipeline_config['model']
- lstm_config = pipeline_config['lstm_model']
- eval_config = pipeline_config['eval_config']
- tf.gfile.MakeDirs(output_dir)
- if model_config.WhichOneof('model') != 'ssd':
- raise ValueError('Only ssd models are supported in tflite. '
- 'Found {} in config'.format(
- model_config.WhichOneof('model')))
-
- num_classes = model_config.ssd.num_classes
- nms_score_threshold = {
- model_config.ssd.post_processing.batch_non_max_suppression.score_threshold
- }
- nms_iou_threshold = {
- model_config.ssd.post_processing.batch_non_max_suppression.iou_threshold
- }
- scale_values = {}
- scale_values['y_scale'] = {
- model_config.ssd.box_coder.faster_rcnn_box_coder.y_scale
- }
- scale_values['x_scale'] = {
- model_config.ssd.box_coder.faster_rcnn_box_coder.x_scale
- }
- scale_values['h_scale'] = {
- model_config.ssd.box_coder.faster_rcnn_box_coder.height_scale
- }
- scale_values['w_scale'] = {
- model_config.ssd.box_coder.faster_rcnn_box_coder.width_scale
- }
-
- image_resizer_config = model_config.ssd.image_resizer
- image_resizer = image_resizer_config.WhichOneof('image_resizer_oneof')
- num_channels = _DEFAULT_NUM_CHANNELS
- if image_resizer == 'fixed_shape_resizer':
- height = image_resizer_config.fixed_shape_resizer.height
- width = image_resizer_config.fixed_shape_resizer.width
- if image_resizer_config.fixed_shape_resizer.convert_to_grayscale:
- num_channels = 1
-
- shape = [lstm_config.eval_unroll_length, height, width, num_channels]
- else:
- raise ValueError(
- 'Only fixed_shape_resizer '
- 'is supported with tflite. Found {}'.format(
- image_resizer_config.WhichOneof('image_resizer_oneof')))
-
- video_tensor = tf.placeholder(
- tf.float32, shape=shape, name='input_video_tensor')
-
- detection_model = model_builder.build(
- model_config, lstm_config, is_training=False)
- preprocessed_video, true_image_shapes = detection_model.preprocess(
- tf.to_float(video_tensor))
- predicted_tensors = detection_model.predict(preprocessed_video,
- true_image_shapes)
- # predicted_tensors = detection_model.postprocess(predicted_tensors,
- # true_image_shapes)
- # The score conversion occurs before the post-processing custom op
- _, score_conversion_fn = post_processing_builder.build(
- model_config.ssd.post_processing)
- class_predictions = score_conversion_fn(
- predicted_tensors['class_predictions_with_background'])
-
- with tf.name_scope('raw_outputs'):
- # 'raw_outputs/box_encodings': a float32 tensor of shape [1, num_anchors, 4]
- # containing the encoded box predictions. Note that these are raw
- # predictions and no Non-Max suppression is applied on them and
- # no decode center size boxes is applied to them.
- tf.identity(predicted_tensors['box_encodings'], name='box_encodings')
- # 'raw_outputs/class_predictions': a float32 tensor of shape
- # [1, num_anchors, num_classes] containing the class scores for each anchor
- # after applying score conversion.
- tf.identity(class_predictions, name='class_predictions')
- # 'anchors': a float32 tensor of shape
- # [num_anchors, 4] containing the anchors as a constant node.
- tf.identity(
- get_const_center_size_encoded_anchors(predicted_tensors['anchors']),
- name='anchors')
-
- # Add global step to the graph, so we know the training step number when we
- # evaluate the model.
- tf.train.get_or_create_global_step()
-
- # graph rewriter
- is_quantized = ('graph_rewriter' in pipeline_config)
- if is_quantized:
- graph_rewriter_config = pipeline_config['graph_rewriter']
- graph_rewriter_fn = graph_rewriter_builder.build(
- graph_rewriter_config, is_training=False, is_export=True)
- graph_rewriter_fn()
-
- if model_config.ssd.feature_extractor.HasField('fpn'):
- exporter.rewrite_nn_resize_op(is_quantized)
-
- # freeze the graph
- saver_kwargs = {}
- if eval_config.use_moving_averages:
- saver_kwargs['write_version'] = saver_pb2.SaverDef.V1
- moving_average_checkpoint = tempfile.NamedTemporaryFile()
- exporter.replace_variable_values_with_moving_averages(
- tf.get_default_graph(), trained_checkpoint_prefix,
- moving_average_checkpoint.name)
- checkpoint_to_use = moving_average_checkpoint.name
- else:
- checkpoint_to_use = trained_checkpoint_prefix
-
- saver = tf.train.Saver(**saver_kwargs)
- input_saver_def = saver.as_saver_def()
- frozen_graph_def = exporter.freeze_graph_with_def_protos(
- input_graph_def=tf.get_default_graph().as_graph_def(),
- input_saver_def=input_saver_def,
- input_checkpoint=checkpoint_to_use,
- output_node_names=','.join([
- 'raw_outputs/box_encodings', 'raw_outputs/class_predictions',
- 'anchors'
- ]),
- restore_op_name='save/restore_all',
- filename_tensor_name='save/Const:0',
- clear_devices=True,
- output_graph='',
- initializer_nodes='')
-
- # Add new operation to do post processing in a custom op (TF Lite only)
-
- if add_postprocessing_op:
- transformed_graph_def = append_postprocessing_op(
- frozen_graph_def, max_detections, max_classes_per_detection,
- nms_score_threshold, nms_iou_threshold, num_classes, scale_values,
- detections_per_class, use_regular_nms)
- else:
- # Return frozen without adding post-processing custom op
- transformed_graph_def = frozen_graph_def
-
- binary_graph = os.path.join(output_dir, binary_graph_name)
- with tf.gfile.GFile(binary_graph, 'wb') as f:
- f.write(transformed_graph_def.SerializeToString())
- txt_graph = os.path.join(output_dir, txt_graph_name)
- with tf.gfile.GFile(txt_graph, 'w') as f:
- f.write(str(transformed_graph_def))
diff --git a/research/lstm_object_detection/export_tflite_lstd_model.py b/research/lstm_object_detection/export_tflite_lstd_model.py
deleted file mode 100644
index 58c674728b5..00000000000
--- a/research/lstm_object_detection/export_tflite_lstd_model.py
+++ /dev/null
@@ -1,65 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Export a LSTD model in tflite format."""
-
-import os
-from absl import flags
-import tensorflow.compat.v1 as tf
-
-from lstm_object_detection.utils import config_util
-
-flags.DEFINE_string('export_path', None, 'Path to export model.')
-flags.DEFINE_string('frozen_graph_path', None, 'Path to frozen graph.')
-flags.DEFINE_string(
- 'pipeline_config_path', '',
- 'Path to a pipeline_pb2.TrainEvalPipelineConfig config file.')
-
-FLAGS = flags.FLAGS
-
-
-def main(_):
- flags.mark_flag_as_required('export_path')
- flags.mark_flag_as_required('frozen_graph_path')
- flags.mark_flag_as_required('pipeline_config_path')
-
- configs = config_util.get_configs_from_pipeline_file(
- FLAGS.pipeline_config_path)
- lstm_config = configs['lstm_model']
-
- input_arrays = ['input_video_tensor']
- output_arrays = [
- 'TFLite_Detection_PostProcess',
- 'TFLite_Detection_PostProcess:1',
- 'TFLite_Detection_PostProcess:2',
- 'TFLite_Detection_PostProcess:3',
- ]
- input_shapes = {
- 'input_video_tensor': [lstm_config.eval_unroll_length, 320, 320, 3],
- }
-
- converter = tf.lite.TFLiteConverter.from_frozen_graph(
- FLAGS.frozen_graph_path,
- input_arrays,
- output_arrays,
- input_shapes=input_shapes)
- converter.allow_custom_ops = True
- tflite_model = converter.convert()
- ofilename = os.path.join(FLAGS.export_path)
- with open(ofilename, 'wb') as f:
- f.write(tflite_model)
-
-
-if __name__ == '__main__':
- tf.app.run()
diff --git a/research/lstm_object_detection/g3doc/Interleaved_Intro.png b/research/lstm_object_detection/g3doc/Interleaved_Intro.png
deleted file mode 100644
index 2b829c997bc..00000000000
Binary files a/research/lstm_object_detection/g3doc/Interleaved_Intro.png and /dev/null differ
diff --git a/research/lstm_object_detection/g3doc/exporting_models.md b/research/lstm_object_detection/g3doc/exporting_models.md
deleted file mode 100644
index 7d501d97efd..00000000000
--- a/research/lstm_object_detection/g3doc/exporting_models.md
+++ /dev/null
@@ -1,49 +0,0 @@
-# Exporting a tflite model from a checkpoint
-
-Starting from a trained model checkpoint, creating a tflite model requires 2
-steps:
-
-* exporting a tflite frozen graph from a checkpoint
-* exporting a tflite model from a frozen graph
-
-## Exporting a tflite frozen graph from a checkpoint
-
-With a candidate checkpoint to export, run the following command from
-tensorflow/models/research:
-
-```bash
-# from tensorflow/models/research
-PIPELINE_CONFIG_PATH={path to pipeline config}
-TRAINED_CKPT_PREFIX=/{path to model.ckpt}
-EXPORT_DIR={path to folder that will be used for export}
-python lstm_object_detection/export_tflite_lstd_graph.py \
- --pipeline_config_path ${PIPELINE_CONFIG_PATH} \
- --trained_checkpoint_prefix ${TRAINED_CKPT_PREFIX} \
- --output_directory ${EXPORT_DIR} \
- --add_postprocessing_op
-```
-
-After export, you should see the directory ${EXPORT_DIR} containing the
-following files:
-
-* `tflite_graph.pb`
-* `tflite_graph.pbtxt`
-
-## Exporting a tflite model from a frozen graph
-
-We then take the exported tflite-compatible frozen graph and convert it to a
-TFLite FlatBuffer file by running the following:
-
-```bash
-# from tensorflow/models/research
-FROZEN_GRAPH_PATH={path to exported tflite_graph.pb}
-EXPORT_PATH={path to filename that will be used for export}
-PIPELINE_CONFIG_PATH={path to pipeline config}
-python lstm_object_detection/export_tflite_lstd_model.py \
- --export_path ${EXPORT_PATH} \
- --frozen_graph_path ${FROZEN_GRAPH_PATH} \
- --pipeline_config_path ${PIPELINE_CONFIG_PATH}
-```
-
-After export, you should see the file ${EXPORT_PATH} containing the FlatBuffer
-model to be used by an application.
diff --git a/research/lstm_object_detection/g3doc/lstm_ssd_intro.png b/research/lstm_object_detection/g3doc/lstm_ssd_intro.png
deleted file mode 100644
index fa62eb533b9..00000000000
Binary files a/research/lstm_object_detection/g3doc/lstm_ssd_intro.png and /dev/null differ
diff --git a/research/lstm_object_detection/inputs/__init__.py b/research/lstm_object_detection/inputs/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/lstm_object_detection/inputs/seq_dataset_builder.py b/research/lstm_object_detection/inputs/seq_dataset_builder.py
deleted file mode 100644
index 55e24820f60..00000000000
--- a/research/lstm_object_detection/inputs/seq_dataset_builder.py
+++ /dev/null
@@ -1,242 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-r"""Sequence example dataset builder.
-
-Creates data sources for DetectionModels from an InputReader config. See
-input_reader.proto for options.
-
-Note: If users wish to also use their own InputReaders with the Object
-Detection configuration framework, they should define their own builder function
-that wraps the build function.
-"""
-import tensorflow.compat.v1 as tf
-import tf_slim as slim
-
-from tensorflow.contrib.training.python.training import sequence_queueing_state_saver as sqss
-from lstm_object_detection.inputs import tf_sequence_example_decoder
-from lstm_object_detection.protos import input_reader_google_pb2
-from object_detection.core import preprocessor
-from object_detection.core import preprocessor_cache
-from object_detection.core import standard_fields as fields
-from object_detection.protos import input_reader_pb2
-from object_detection.utils import ops as util_ops
-
-parallel_reader = slim.parallel_reader
-# TODO(yinxiao): Make the following variable into configurable proto.
-# Padding size for the labeled objects in each frame. Here we assume each
-# frame has a total number of objects less than _PADDING_SIZE.
-_PADDING_SIZE = 30
-
-
-def _build_training_batch_dict(batch_sequences_with_states, unroll_length,
- batch_size):
- """Builds training batch samples.
-
- Args:
- batch_sequences_with_states: A batch_sequences_with_states object.
- unroll_length: Unrolled length for LSTM training.
- batch_size: Batch size for queue outputs.
-
- Returns:
- A dictionary of tensors based on items in input_reader_config.
- """
- seq_tensors_dict = {
- fields.InputDataFields.image: [],
- fields.InputDataFields.groundtruth_boxes: [],
- fields.InputDataFields.groundtruth_classes: [],
- 'batch': batch_sequences_with_states,
- }
- for i in range(unroll_length):
- for j in range(batch_size):
- filtered_dict = util_ops.filter_groundtruth_with_nan_box_coordinates({
- fields.InputDataFields.groundtruth_boxes: (
- batch_sequences_with_states.sequences['groundtruth_boxes'][j][i]),
- fields.InputDataFields.groundtruth_classes: (
- batch_sequences_with_states.sequences['groundtruth_classes'][j][i]
- ),
- })
- filtered_dict = util_ops.retain_groundtruth_with_positive_classes(
- filtered_dict)
- seq_tensors_dict[fields.InputDataFields.image].append(
- batch_sequences_with_states.sequences['image'][j][i])
- seq_tensors_dict[fields.InputDataFields.groundtruth_boxes].append(
- filtered_dict[fields.InputDataFields.groundtruth_boxes])
- seq_tensors_dict[fields.InputDataFields.groundtruth_classes].append(
- filtered_dict[fields.InputDataFields.groundtruth_classes])
- seq_tensors_dict[fields.InputDataFields.image] = tuple(
- seq_tensors_dict[fields.InputDataFields.image])
- seq_tensors_dict[fields.InputDataFields.groundtruth_boxes] = tuple(
- seq_tensors_dict[fields.InputDataFields.groundtruth_boxes])
- seq_tensors_dict[fields.InputDataFields.groundtruth_classes] = tuple(
- seq_tensors_dict[fields.InputDataFields.groundtruth_classes])
-
- return seq_tensors_dict
-
-
-def build(input_reader_config,
- model_config,
- lstm_config,
- unroll_length,
- data_augmentation_options=None,
- batch_size=1):
- """Builds a tensor dictionary based on the InputReader config.
-
- Args:
- input_reader_config: An input_reader_pb2.InputReader proto.
- model_config: A model.proto object containing the config for the desired
- DetectionModel.
- lstm_config: LSTM specific configs.
- unroll_length: Unrolled length for LSTM training.
- data_augmentation_options: A list of tuples, where each tuple contains a
- data augmentation function and a dictionary containing arguments and their
- values (see preprocessor.py).
- batch_size: Batch size for queue outputs.
-
- Returns:
- A dictionary of tensors based on items in the input_reader_config.
-
- Raises:
- ValueError: On invalid input reader proto.
- ValueError: If no input paths are specified.
- """
- if not isinstance(input_reader_config, input_reader_pb2.InputReader):
- raise ValueError('input_reader_config not of type '
- 'input_reader_pb2.InputReader.')
-
- external_reader_config = input_reader_config.external_input_reader
- external_input_reader_config = external_reader_config.Extensions[
- input_reader_google_pb2.GoogleInputReader.google_input_reader]
- input_reader_type = external_input_reader_config.WhichOneof('input_reader')
-
- if input_reader_type == 'tf_record_video_input_reader':
- config = external_input_reader_config.tf_record_video_input_reader
- reader_type_class = tf.TFRecordReader
- else:
- raise ValueError(
- 'Unsupported reader in input_reader_config: %s' % input_reader_type)
-
- if not config.input_path:
- raise ValueError('At least one input path must be specified in '
- '`input_reader_config`.')
- key, value = parallel_reader.parallel_read(
- config.input_path[:], # Convert `RepeatedScalarContainer` to list.
- reader_class=reader_type_class,
- num_epochs=(input_reader_config.num_epochs
- if input_reader_config.num_epochs else None),
- num_readers=input_reader_config.num_readers,
- shuffle=input_reader_config.shuffle,
- dtypes=[tf.string, tf.string],
- capacity=input_reader_config.queue_capacity,
- min_after_dequeue=input_reader_config.min_after_dequeue)
-
- # TODO(yinxiao): Add loading instance mask option.
- decoder = tf_sequence_example_decoder.TFSequenceExampleDecoder()
-
- keys_to_decode = [
- fields.InputDataFields.image, fields.InputDataFields.groundtruth_boxes,
- fields.InputDataFields.groundtruth_classes
- ]
- tensor_dict = decoder.decode(value, items=keys_to_decode)
-
- tensor_dict['image'].set_shape([None, None, None, 3])
- tensor_dict['groundtruth_boxes'].set_shape([None, None, 4])
-
- height = model_config.ssd.image_resizer.fixed_shape_resizer.height
- width = model_config.ssd.image_resizer.fixed_shape_resizer.width
-
- # If data augmentation is specified in the config file, the preprocessor
- # will be called here to augment the data as specified. Most common
- # augmentations include horizontal flip and cropping.
- if data_augmentation_options:
- images_pre = tf.split(tensor_dict['image'], config.video_length, axis=0)
- bboxes_pre = tf.split(
- tensor_dict['groundtruth_boxes'], config.video_length, axis=0)
- labels_pre = tf.split(
- tensor_dict['groundtruth_classes'], config.video_length, axis=0)
- images_proc, bboxes_proc, labels_proc = [], [], []
- cache = preprocessor_cache.PreprocessorCache()
-
- for i, _ in enumerate(images_pre):
- image_dict = {
- fields.InputDataFields.image:
- images_pre[i],
- fields.InputDataFields.groundtruth_boxes:
- tf.squeeze(bboxes_pre[i], axis=0),
- fields.InputDataFields.groundtruth_classes:
- tf.squeeze(labels_pre[i], axis=0),
- }
- image_dict = preprocessor.preprocess(
- image_dict,
- data_augmentation_options,
- func_arg_map=preprocessor.get_default_func_arg_map(),
- preprocess_vars_cache=cache)
- # Pads detection count to _PADDING_SIZE.
- image_dict[fields.InputDataFields.groundtruth_boxes] = tf.pad(
- image_dict[fields.InputDataFields.groundtruth_boxes],
- [[0, _PADDING_SIZE], [0, 0]])
- image_dict[fields.InputDataFields.groundtruth_boxes] = tf.slice(
- image_dict[fields.InputDataFields.groundtruth_boxes], [0, 0],
- [_PADDING_SIZE, -1])
- image_dict[fields.InputDataFields.groundtruth_classes] = tf.pad(
- image_dict[fields.InputDataFields.groundtruth_classes],
- [[0, _PADDING_SIZE]])
- image_dict[fields.InputDataFields.groundtruth_classes] = tf.slice(
- image_dict[fields.InputDataFields.groundtruth_classes], [0],
- [_PADDING_SIZE])
- images_proc.append(image_dict[fields.InputDataFields.image])
- bboxes_proc.append(image_dict[fields.InputDataFields.groundtruth_boxes])
- labels_proc.append(image_dict[fields.InputDataFields.groundtruth_classes])
- tensor_dict['image'] = tf.concat(images_proc, axis=0)
- tensor_dict['groundtruth_boxes'] = tf.stack(bboxes_proc, axis=0)
- tensor_dict['groundtruth_classes'] = tf.stack(labels_proc, axis=0)
- else:
- # Pads detection count to _PADDING_SIZE per frame.
- tensor_dict['groundtruth_boxes'] = tf.pad(
- tensor_dict['groundtruth_boxes'], [[0, 0], [0, _PADDING_SIZE], [0, 0]])
- tensor_dict['groundtruth_boxes'] = tf.slice(
- tensor_dict['groundtruth_boxes'], [0, 0, 0], [-1, _PADDING_SIZE, -1])
- tensor_dict['groundtruth_classes'] = tf.pad(
- tensor_dict['groundtruth_classes'], [[0, 0], [0, _PADDING_SIZE]])
- tensor_dict['groundtruth_classes'] = tf.slice(
- tensor_dict['groundtruth_classes'], [0, 0], [-1, _PADDING_SIZE])
-
- tensor_dict['image'], _ = preprocessor.resize_image(
- tensor_dict['image'], new_height=height, new_width=width)
-
- num_steps = config.video_length // unroll_length
-
- init_states = {
- 'lstm_state_c':
- tf.zeros([height // 32, width // 32, lstm_config.lstm_state_depth]),
- 'lstm_state_h':
- tf.zeros([height // 32, width // 32, lstm_config.lstm_state_depth]),
- 'lstm_state_step':
- tf.constant(num_steps, shape=[]),
- }
-
- batch = sqss.batch_sequences_with_states(
- input_key=key,
- input_sequences=tensor_dict,
- input_context={},
- input_length=None,
- initial_states=init_states,
- num_unroll=unroll_length,
- batch_size=batch_size,
- num_threads=batch_size,
- make_keys_unique=True,
- capacity=batch_size * batch_size)
-
- return _build_training_batch_dict(batch, unroll_length, batch_size)
diff --git a/research/lstm_object_detection/inputs/seq_dataset_builder_test.py b/research/lstm_object_detection/inputs/seq_dataset_builder_test.py
deleted file mode 100644
index 4b894d24f71..00000000000
--- a/research/lstm_object_detection/inputs/seq_dataset_builder_test.py
+++ /dev/null
@@ -1,282 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for dataset_builder."""
-
-import os
-import numpy as np
-import tensorflow.compat.v1 as tf
-
-from google.protobuf import text_format
-from tensorflow.core.example import example_pb2
-from tensorflow.core.example import feature_pb2
-from lstm_object_detection.inputs import seq_dataset_builder
-from lstm_object_detection.protos import pipeline_pb2 as internal_pipeline_pb2
-from object_detection.builders import preprocessor_builder
-from object_detection.core import standard_fields as fields
-from object_detection.protos import input_reader_pb2
-from object_detection.protos import pipeline_pb2
-from object_detection.protos import preprocessor_pb2
-
-
-class DatasetBuilderTest(tf.test.TestCase):
-
- def _create_tf_record(self):
- path = os.path.join(self.get_temp_dir(), 'tfrecord')
- writer = tf.python_io.TFRecordWriter(path)
-
- image_tensor = np.random.randint(255, size=(16, 16, 3)).astype(np.uint8)
- with self.test_session():
- encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).eval()
-
- sequence_example = example_pb2.SequenceExample(
- context=feature_pb2.Features(
- feature={
- 'image/format':
- feature_pb2.Feature(
- bytes_list=feature_pb2.BytesList(
- value=['jpeg'.encode('utf-8')])),
- 'image/height':
- feature_pb2.Feature(
- int64_list=feature_pb2.Int64List(value=[16])),
- 'image/width':
- feature_pb2.Feature(
- int64_list=feature_pb2.Int64List(value=[16])),
- }),
- feature_lists=feature_pb2.FeatureLists(
- feature_list={
- 'image/encoded':
- feature_pb2.FeatureList(feature=[
- feature_pb2.Feature(
- bytes_list=feature_pb2.BytesList(
- value=[encoded_jpeg])),
- ]),
- 'image/object/bbox/xmin':
- feature_pb2.FeatureList(feature=[
- feature_pb2.Feature(
- float_list=feature_pb2.FloatList(value=[0.0])),
- ]),
- 'image/object/bbox/xmax':
- feature_pb2.FeatureList(feature=[
- feature_pb2.Feature(
- float_list=feature_pb2.FloatList(value=[1.0]))
- ]),
- 'image/object/bbox/ymin':
- feature_pb2.FeatureList(feature=[
- feature_pb2.Feature(
- float_list=feature_pb2.FloatList(value=[0.0])),
- ]),
- 'image/object/bbox/ymax':
- feature_pb2.FeatureList(feature=[
- feature_pb2.Feature(
- float_list=feature_pb2.FloatList(value=[1.0]))
- ]),
- 'image/object/class/label':
- feature_pb2.FeatureList(feature=[
- feature_pb2.Feature(
- int64_list=feature_pb2.Int64List(value=[2]))
- ]),
- }))
-
- writer.write(sequence_example.SerializeToString())
- writer.close()
-
- return path
-
- def _get_model_configs_from_proto(self):
- """Creates a model text proto for testing.
-
- Returns:
- A dictionary of model configs.
- """
-
- model_text_proto = """
- [lstm_object_detection.protos.lstm_model] {
- train_unroll_length: 4
- eval_unroll_length: 4
- }
- model {
- ssd {
- feature_extractor {
- type: 'lstm_mobilenet_v1_fpn'
- conv_hyperparams {
- regularizer {
- l2_regularizer {
- }
- }
- initializer {
- truncated_normal_initializer {
- }
- }
- }
- }
- negative_class_weight: 2.0
- box_coder {
- faster_rcnn_box_coder {
- }
- }
- matcher {
- argmax_matcher {
- }
- }
- similarity_calculator {
- iou_similarity {
- }
- }
- anchor_generator {
- ssd_anchor_generator {
- aspect_ratios: 1.0
- }
- }
- image_resizer {
- fixed_shape_resizer {
- height: 32
- width: 32
- }
- }
- box_predictor {
- convolutional_box_predictor {
- conv_hyperparams {
- regularizer {
- l2_regularizer {
- }
- }
- initializer {
- truncated_normal_initializer {
- }
- }
- }
- }
- }
- normalize_loc_loss_by_codesize: true
- loss {
- classification_loss {
- weighted_softmax {
- }
- }
- localization_loss {
- weighted_smooth_l1 {
- }
- }
- }
- }
- }"""
-
- pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
- text_format.Merge(model_text_proto, pipeline_config)
- configs = {}
- configs['model'] = pipeline_config.model
- configs['lstm_model'] = pipeline_config.Extensions[
- internal_pipeline_pb2.lstm_model]
-
- return configs
-
- def _get_data_augmentation_preprocessor_proto(self):
- preprocessor_text_proto = """
- random_horizontal_flip {
- }
- """
- preprocessor_proto = preprocessor_pb2.PreprocessingStep()
- text_format.Merge(preprocessor_text_proto, preprocessor_proto)
- return preprocessor_proto
-
- def _create_training_dict(self, tensor_dict):
- image_dict = {}
- all_dict = {}
- all_dict['batch'] = tensor_dict.pop('batch')
- for i, _ in enumerate(tensor_dict[fields.InputDataFields.image]):
- for key, val in tensor_dict.items():
- image_dict[key] = val[i]
-
- image_dict[fields.InputDataFields.image] = tf.to_float(
- tf.expand_dims(image_dict[fields.InputDataFields.image], 0))
- suffix = str(i)
- for key, val in image_dict.items():
- all_dict[key + suffix] = val
- return all_dict
-
- def _get_input_proto(self, input_reader):
- return """
- external_input_reader {
- [lstm_object_detection.protos.GoogleInputReader.google_input_reader] {
- %s: {
- input_path: '{0}'
- data_type: TF_SEQUENCE_EXAMPLE
- video_length: 4
- }
- }
- }
- """ % input_reader
-
- def test_video_input_reader(self):
- input_reader_proto = input_reader_pb2.InputReader()
- text_format.Merge(
- self._get_input_proto('tf_record_video_input_reader'),
- input_reader_proto)
-
- configs = self._get_model_configs_from_proto()
- tensor_dict = seq_dataset_builder.build(
- input_reader_proto,
- configs['model'],
- configs['lstm_model'],
- unroll_length=1)
-
- all_dict = self._create_training_dict(tensor_dict)
-
- self.assertEqual((1, 32, 32, 3), all_dict['image0'].shape)
- self.assertEqual(4, all_dict['groundtruth_boxes0'].shape[1])
-
- def test_build_with_data_augmentation(self):
- input_reader_proto = input_reader_pb2.InputReader()
- text_format.Merge(
- self._get_input_proto('tf_record_video_input_reader'),
- input_reader_proto)
-
- configs = self._get_model_configs_from_proto()
- data_augmentation_options = [
- preprocessor_builder.build(
- self._get_data_augmentation_preprocessor_proto())
- ]
- tensor_dict = seq_dataset_builder.build(
- input_reader_proto,
- configs['model'],
- configs['lstm_model'],
- unroll_length=1,
- data_augmentation_options=data_augmentation_options)
-
- all_dict = self._create_training_dict(tensor_dict)
- self.assertEqual((1, 32, 32, 3), all_dict['image0'].shape)
- self.assertEqual(4, all_dict['groundtruth_boxes0'].shape[1])
-
- def test_raises_error_without_input_paths(self):
- input_reader_text_proto = """
- shuffle: false
- num_readers: 1
- load_instance_masks: true
- """
- input_reader_proto = input_reader_pb2.InputReader()
- text_format.Merge(input_reader_text_proto, input_reader_proto)
-
- configs = self._get_model_configs_from_proto()
- with self.assertRaises(ValueError):
- _ = seq_dataset_builder.build(
- input_reader_proto,
- configs['model'],
- configs['lstm_model'],
- unroll_length=1)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/lstm_object_detection/inputs/tf_sequence_example_decoder.py b/research/lstm_object_detection/inputs/tf_sequence_example_decoder.py
deleted file mode 100644
index def945b3f07..00000000000
--- a/research/lstm_object_detection/inputs/tf_sequence_example_decoder.py
+++ /dev/null
@@ -1,263 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tensorflow Sequence Example proto decoder.
-
-A decoder to decode string tensors containing serialized
-tensorflow.SequenceExample protos.
-"""
-import tensorflow.compat.v1 as tf
-import tf_slim as slim
-from object_detection.core import data_decoder
-from object_detection.core import standard_fields as fields
-
-tfexample_decoder = slim.tfexample_decoder
-
-
-class BoundingBoxSequence(tfexample_decoder.ItemHandler):
- """An ItemHandler that concatenates SparseTensors to Bounding Boxes.
- """
-
- def __init__(self, keys=None, prefix=None, return_dense=True,
- default_value=-1.0):
- """Initialize the bounding box handler.
-
- Args:
- keys: A list of four key names representing the ymin, xmin, ymax, xmax
- in the Example or SequenceExample.
- prefix: An optional prefix for each of the bounding box keys in the
- Example or SequenceExample. If provided, `prefix` is prepended to each
- key in `keys`.
- return_dense: if True, returns a dense tensor; if False, returns a
- sparse tensor.
- default_value: The value used when the `tensor_key` is not found in a
- particular `TFExample`.
-
- Raises:
- ValueError: if keys is not `None` and also not a list of exactly 4 keys
- """
- if keys is None:
- keys = ['ymin', 'xmin', 'ymax', 'xmax']
- elif len(keys) != 4:
- raise ValueError('BoundingBoxSequence expects 4 keys but got {}'.format(
- len(keys)))
- self._prefix = prefix
- self._keys = keys
- self._full_keys = [prefix + k for k in keys] if prefix else list(keys)
- self._return_dense = return_dense
- self._default_value = default_value
- super(BoundingBoxSequence, self).__init__(self._full_keys)
-
- def tensors_to_item(self, keys_to_tensors):
- """Maps the given dictionary of tensors to a concatenated list of bboxes.
-
- Args:
- keys_to_tensors: a mapping of TF-Example keys to parsed tensors.
-
- Returns:
- [time, num_boxes, 4] tensor of bounding box coordinates, in order
- [y_min, x_min, y_max, x_max]. Whether the tensor is a SparseTensor
- or a dense Tensor is determined by the return_dense parameter. Empty
- positions in the sparse tensor are filled with -1.0 values.
- """
- sides = []
- for key in self._full_keys:
- value = keys_to_tensors[key]
- expanded_dims = tf.concat(
- [tf.to_int64(tf.shape(value)),
- tf.constant([1], dtype=tf.int64)], 0)
- side = tf.sparse_reshape(value, expanded_dims)
- sides.append(side)
- bounding_boxes = tf.sparse_concat(2, sides)
- if self._return_dense:
- bounding_boxes = tf.sparse_tensor_to_dense(
- bounding_boxes, default_value=self._default_value)
- return bounding_boxes
-
-
-class TFSequenceExampleDecoder(data_decoder.DataDecoder):
- """Tensorflow Sequence Example proto decoder."""
-
- def __init__(self):
- """Constructor sets keys_to_features and items_to_handlers."""
- self.keys_to_context_features = {
- 'image/format':
- tf.FixedLenFeature((), tf.string, default_value='jpeg'),
- 'image/filename':
- tf.FixedLenFeature((), tf.string, default_value=''),
- 'image/key/sha256':
- tf.FixedLenFeature((), tf.string, default_value=''),
- 'image/source_id':
- tf.FixedLenFeature((), tf.string, default_value=''),
- 'image/height':
- tf.FixedLenFeature((), tf.int64, 1),
- 'image/width':
- tf.FixedLenFeature((), tf.int64, 1),
- }
- self.keys_to_features = {
- 'image/encoded': tf.FixedLenSequenceFeature((), tf.string),
- 'bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
- 'bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
- 'bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
- 'bbox/ymax': tf.VarLenFeature(dtype=tf.float32),
- 'bbox/label/index': tf.VarLenFeature(dtype=tf.int64),
- 'bbox/label/string': tf.VarLenFeature(tf.string),
- 'area': tf.VarLenFeature(tf.float32),
- 'is_crowd': tf.VarLenFeature(tf.int64),
- 'difficult': tf.VarLenFeature(tf.int64),
- 'group_of': tf.VarLenFeature(tf.int64),
- }
- self.items_to_handlers = {
- fields.InputDataFields.image:
- tfexample_decoder.Image(
- image_key='image/encoded',
- format_key='image/format',
- channels=3,
- repeated=True),
- fields.InputDataFields.source_id: (
- tfexample_decoder.Tensor('image/source_id')),
- fields.InputDataFields.key: (
- tfexample_decoder.Tensor('image/key/sha256')),
- fields.InputDataFields.filename: (
- tfexample_decoder.Tensor('image/filename')),
- # Object boxes and classes.
- fields.InputDataFields.groundtruth_boxes:
- BoundingBoxSequence(prefix='bbox/'),
- fields.InputDataFields.groundtruth_classes: (
- tfexample_decoder.Tensor('bbox/label/index')),
- fields.InputDataFields.groundtruth_area:
- tfexample_decoder.Tensor('area'),
- fields.InputDataFields.groundtruth_is_crowd: (
- tfexample_decoder.Tensor('is_crowd')),
- fields.InputDataFields.groundtruth_difficult: (
- tfexample_decoder.Tensor('difficult')),
- fields.InputDataFields.groundtruth_group_of: (
- tfexample_decoder.Tensor('group_of'))
- }
-
- def decode(self, tf_seq_example_string_tensor, items=None):
- """Decodes serialized tf.SequenceExample and returns a tensor dictionary.
-
- Args:
- tf_seq_example_string_tensor: A string tensor holding a serialized
- tensorflow.SequenceExample proto.
- items: The list of items to decode. These must be a subset of the item
- keys in self._items_to_handlers. If `items` is left as None, then all
- of the items in self._items_to_handlers are decoded.
-
- Returns:
- A dictionary of the following tensors.
- fields.InputDataFields.image - 4D uint8 tensor of shape [seq, None, None, 3]
- containing the decoded frames.
- fields.InputDataFields.source_id - string tensor containing original
- image id.
- fields.InputDataFields.key - string tensor with unique sha256 hash key.
- fields.InputDataFields.filename - string tensor with original dataset
- filename.
- fields.InputDataFields.groundtruth_boxes - 2D float32 tensor of shape
- [None, 4] containing box corners.
- fields.InputDataFields.groundtruth_classes - 1D int64 tensor of shape
- [None] containing classes for the boxes.
- fields.InputDataFields.groundtruth_area - 1D float32 tensor of shape
-        [None] containing object mask area in pixels squared.
- fields.InputDataFields.groundtruth_is_crowd - 1D bool tensor of shape
- [None] indicating if the boxes enclose a crowd.
- fields.InputDataFields.groundtruth_difficult - 1D bool tensor of shape
- [None] indicating if the boxes represent `difficult` instances.
- """
- serialized_example = tf.reshape(tf_seq_example_string_tensor, shape=[])
- decoder = TFSequenceExampleDecoderHelper(self.keys_to_context_features,
- self.keys_to_features,
- self.items_to_handlers)
- if not items:
- items = decoder.list_items()
- tensors = decoder.decode(serialized_example, items=items)
- tensor_dict = dict(zip(items, tensors))
-
- return tensor_dict
-
-
-class TFSequenceExampleDecoderHelper(data_decoder.DataDecoder):
- """A decoder helper class for TensorFlow SequenceExamples.
-
-  To perform this decoding operation, a SequenceExampleDecoder is given a list
-  of ItemHandlers. Each ItemHandler indicates the set of features it consumes.
- """
-
- def __init__(self, keys_to_context_features, keys_to_sequence_features,
- items_to_handlers):
- """Constructs the decoder.
-
- Args:
- keys_to_context_features: A dictionary from TF-SequenceExample context
- keys to either tf.VarLenFeature or tf.FixedLenFeature instances.
- See tensorflow's parsing_ops.py.
- keys_to_sequence_features: A dictionary from TF-SequenceExample sequence
- keys to either tf.VarLenFeature or tf.FixedLenSequenceFeature instances.
- items_to_handlers: A dictionary from items (strings) to ItemHandler
-        instances. Note that the ItemHandlers are provided the keys that they
-        use to return the final item Tensors.
- Raises:
- ValueError: If the same key is present for context features and sequence
- features.
- """
- unique_keys = set()
- unique_keys.update(keys_to_context_features)
- unique_keys.update(keys_to_sequence_features)
- if len(unique_keys) != (
- len(keys_to_context_features) + len(keys_to_sequence_features)):
- # This situation is ambiguous in the decoder's keys_to_tensors variable.
- raise ValueError('Context and sequence keys are not unique. \n'
- ' Context keys: %s \n Sequence keys: %s' %
- (list(keys_to_context_features.keys()),
- list(keys_to_sequence_features.keys())))
- self._keys_to_context_features = keys_to_context_features
- self._keys_to_sequence_features = keys_to_sequence_features
- self._items_to_handlers = items_to_handlers
-
- def list_items(self):
- """Returns keys of items."""
- return self._items_to_handlers.keys()
-
- def decode(self, serialized_example, items=None):
- """Decodes the given serialized TF-SequenceExample.
-
- Args:
- serialized_example: A serialized TF-SequenceExample tensor.
- items: The list of items to decode. These must be a subset of the item
- keys in self._items_to_handlers. If `items` is left as None, then all
- of the items in self._items_to_handlers are decoded.
- Returns:
-      The decoded items, a list of tensors.
- """
- context, feature_list = tf.parse_single_sequence_example(
- serialized_example, self._keys_to_context_features,
- self._keys_to_sequence_features)
- # Reshape non-sparse elements just once:
- for k in self._keys_to_context_features:
- v = self._keys_to_context_features[k]
- if isinstance(v, tf.FixedLenFeature):
- context[k] = tf.reshape(context[k], v.shape)
- if not items:
- items = self._items_to_handlers.keys()
- outputs = []
- for item in items:
- handler = self._items_to_handlers[item]
- keys_to_tensors = {
- key: context[key] if key in context else feature_list[key]
- for key in handler.keys
- }
- outputs.append(handler.tensors_to_item(keys_to_tensors))
- return outputs
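
Before the diff moves on to the decoder's test file, here is a brief illustrative sketch of the box-assembly step documented above: the handler concatenates the four per-frame coordinate lists along a new last axis to produce a [time, num_boxes, 4] tensor in [y_min, x_min, y_max, x_max] order. The sketch is not part of the deleted module and uses NumPy arrays instead of SparseTensors purely for brevity.

```python
import numpy as np

# Two frames, one box per frame, given as parallel coordinate lists
# (the dense analogue of the 'bbox/ymin', 'bbox/xmin', ... features).
ymin = np.array([[0.0], [0.1]], dtype=np.float32)
xmin = np.array([[0.0], [0.2]], dtype=np.float32)
ymax = np.array([[1.0], [0.9]], dtype=np.float32)
xmax = np.array([[1.0], [0.8]], dtype=np.float32)

# Stacking along a new last axis mirrors the sparse concat on axis 2 above.
boxes = np.stack([ymin, xmin, ymax, xmax], axis=2)
print(boxes.shape)  # (2, 1, 4) -> [time, num_boxes, 4]
```
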
diff --git a/research/lstm_object_detection/inputs/tf_sequence_example_decoder_test.py b/research/lstm_object_detection/inputs/tf_sequence_example_decoder_test.py
deleted file mode 100644
index dbbb8d3c744..00000000000
--- a/research/lstm_object_detection/inputs/tf_sequence_example_decoder_test.py
+++ /dev/null
@@ -1,113 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for lstm_object_detection.tf_sequence_example_decoder."""
-
-import numpy as np
-import tensorflow.compat.v1 as tf
-from tensorflow.core.example import example_pb2
-from tensorflow.core.example import feature_pb2
-from tensorflow.python.framework import dtypes
-from tensorflow.python.ops import parsing_ops
-from lstm_object_detection.inputs import tf_sequence_example_decoder
-from object_detection.core import standard_fields as fields
-
-
-class TFSequenceExampleDecoderTest(tf.test.TestCase):
- """Tests for sequence example decoder."""
-
- def _EncodeImage(self, image_tensor, encoding_type='jpeg'):
- with self.test_session():
- if encoding_type == 'jpeg':
- image_encoded = tf.image.encode_jpeg(tf.constant(image_tensor)).eval()
- else:
- raise ValueError('Invalid encoding type.')
- return image_encoded
-
- def _DecodeImage(self, image_encoded, encoding_type='jpeg'):
- with self.test_session():
- if encoding_type == 'jpeg':
- image_decoded = tf.image.decode_jpeg(tf.constant(image_encoded)).eval()
- else:
- raise ValueError('Invalid encoding type.')
- return image_decoded
-
- def testDecodeJpegImageAndBoundingBox(self):
- """Test if the decoder can correctly decode the image and bounding box.
-
-    A random image (represented as an image tensor) is first decoded as the
-    groundtruth image. The same image tensor is then encoded, passed through a
-    sequence example, and decoded again. The groundtruth image and the decoded
-    image are expected to be equal. Similar checks are also applied to labels
-    such as the bounding box.
- """
- image_tensor = np.random.randint(256, size=(256, 256, 3)).astype(np.uint8)
- encoded_jpeg = self._EncodeImage(image_tensor)
- decoded_jpeg = self._DecodeImage(encoded_jpeg)
-
- sequence_example = example_pb2.SequenceExample(
- feature_lists=feature_pb2.FeatureLists(
- feature_list={
- 'image/encoded':
- feature_pb2.FeatureList(feature=[
- feature_pb2.Feature(
- bytes_list=feature_pb2.BytesList(
- value=[encoded_jpeg])),
- ]),
- 'bbox/xmin':
- feature_pb2.FeatureList(feature=[
- feature_pb2.Feature(
- float_list=feature_pb2.FloatList(value=[0.0])),
- ]),
- 'bbox/xmax':
- feature_pb2.FeatureList(feature=[
- feature_pb2.Feature(
- float_list=feature_pb2.FloatList(value=[1.0]))
- ]),
- 'bbox/ymin':
- feature_pb2.FeatureList(feature=[
- feature_pb2.Feature(
- float_list=feature_pb2.FloatList(value=[0.0])),
- ]),
- 'bbox/ymax':
- feature_pb2.FeatureList(feature=[
- feature_pb2.Feature(
- float_list=feature_pb2.FloatList(value=[1.0]))
- ]),
- })).SerializeToString()
-
- example_decoder = tf_sequence_example_decoder.TFSequenceExampleDecoder()
- tensor_dict = example_decoder.decode(tf.convert_to_tensor(sequence_example))
-
- # Test tensor dict image dimension.
- self.assertAllEqual(
- (tensor_dict[fields.InputDataFields.image].get_shape().as_list()),
- [None, None, None, 3])
- with self.test_session() as sess:
- tensor_dict[fields.InputDataFields.image] = tf.squeeze(
- tensor_dict[fields.InputDataFields.image])
- tensor_dict[fields.InputDataFields.groundtruth_boxes] = tf.squeeze(
- tensor_dict[fields.InputDataFields.groundtruth_boxes])
- tensor_dict = sess.run(tensor_dict)
-
- # Test decoded image.
- self.assertAllEqual(decoded_jpeg, tensor_dict[fields.InputDataFields.image])
- # Test decoded bounding box.
- self.assertAllEqual([0.0, 0.0, 1.0, 1.0],
- tensor_dict[fields.InputDataFields.groundtruth_boxes])
-
-
-if __name__ == '__main__':
- tf.test.main()
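
To make the feature layout easier to see at a glance, the hedged sketch below (the helper name `make_sequence_example` is hypothetical, and it assumes the standard `tf.train` proto wrappers exposed through `tensorflow.compat.v1`) builds a SequenceExample with the same feature-list keys the deleted decoder expects, using the higher-level `tf.train` API rather than the raw `feature_pb2` construction in the test above.

```python
import tensorflow.compat.v1 as tf


def make_sequence_example(encoded_frames, xmins, xmaxs, ymins, ymaxs):
  """Builds a SequenceExample with one feature-list entry per frame."""
  def floats(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

  def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

  feature_list = {
      'image/encoded': tf.train.FeatureList(
          feature=[bytes_feature(frame) for frame in encoded_frames]),
      'bbox/xmin': tf.train.FeatureList(feature=[floats(v) for v in xmins]),
      'bbox/xmax': tf.train.FeatureList(feature=[floats(v) for v in xmaxs]),
      'bbox/ymin': tf.train.FeatureList(feature=[floats(v) for v in ymins]),
      'bbox/ymax': tf.train.FeatureList(feature=[floats(v) for v in ymaxs]),
  }
  return tf.train.SequenceExample(
      feature_lists=tf.train.FeatureLists(feature_list=feature_list))


# One frame with a single full-image box; the image bytes are placeholders,
# not a valid JPEG.
example = make_sequence_example(
    encoded_frames=[b'<jpeg bytes>'],
    xmins=[[0.0]], xmaxs=[[1.0]], ymins=[[0.0]], ymaxs=[[1.0]])
serialized = example.SerializeToString()
```
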
diff --git a/research/lstm_object_detection/lstm/__init__.py b/research/lstm_object_detection/lstm/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/lstm_object_detection/lstm/lstm_cells.py b/research/lstm_object_detection/lstm/lstm_cells.py
deleted file mode 100644
index a553073d978..00000000000
--- a/research/lstm_object_detection/lstm/lstm_cells.py
+++ /dev/null
@@ -1,734 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""BottleneckConvLSTMCell implementation."""
-import functools
-
-import tensorflow.compat.v1 as tf
-import tf_slim as slim
-
-from tensorflow.contrib import rnn as contrib_rnn
-from tensorflow.contrib.framework.python.ops import variables as contrib_variables
-import lstm_object_detection.lstm.utils as lstm_utils
-
-
-class BottleneckConvLSTMCell(contrib_rnn.RNNCell):
- """Basic LSTM recurrent network cell using separable convolutions.
-
- The implementation is based on:
- Mobile Video Object Detection with Temporally-Aware Feature Maps
- https://arxiv.org/abs/1711.06368.
-
- We add forget_bias (default: 1) to the biases of the forget gate in order to
- reduce the scale of forgetting in the beginning of the training.
-
- This LSTM first projects inputs to the size of the output before doing gate
- computations. This saves params unless the input is less than a third of the
- state size channel-wise.
- """
-
- def __init__(self,
- filter_size,
- output_size,
- num_units,
- forget_bias=1.0,
- activation=tf.tanh,
- flatten_state=False,
- clip_state=False,
- output_bottleneck=False,
- pre_bottleneck=False,
- visualize_gates=False):
- """Initializes the basic LSTM cell.
-
- Args:
- filter_size: collection, conv filter size.
- output_size: collection, the width/height dimensions of the cell/output.
- num_units: int, The number of channels in the LSTM cell.
- forget_bias: float, The bias added to forget gates (see above).
- activation: Activation function of the inner states.
- flatten_state: if True, state tensor will be flattened and stored as a 2-d
- tensor. Use for exporting the model to tfmini.
- clip_state: if True, clip state between [-6, 6].
- output_bottleneck: if True, the cell bottleneck will be concatenated to
- the cell output.
-      pre_bottleneck: if True, the cell assumes that bottlenecking was
-        performed before the function was called.
- visualize_gates: if True, add histogram summaries of all gates and outputs
- to tensorboard.
- """
- self._filter_size = list(filter_size)
- self._output_size = list(output_size)
- self._num_units = num_units
- self._forget_bias = forget_bias
- self._activation = activation
- self._viz_gates = visualize_gates
- self._flatten_state = flatten_state
- self._clip_state = clip_state
- self._output_bottleneck = output_bottleneck
- self._pre_bottleneck = pre_bottleneck
- self._param_count = self._num_units
- for dim in self._output_size:
- self._param_count *= dim
-
- @property
- def state_size(self):
- return contrib_rnn.LSTMStateTuple(self._output_size + [self._num_units],
- self._output_size + [self._num_units])
-
- @property
- def state_size_flat(self):
- return contrib_rnn.LSTMStateTuple([self._param_count], [self._param_count])
-
- @property
- def output_size(self):
- return self._output_size + [self._num_units]
-
- def __call__(self, inputs, state, scope=None):
- """Long short-term memory cell (LSTM) with bottlenecking.
-
- Args:
- inputs: Input tensor at the current timestep.
- state: Tuple of tensors, the state and output at the previous timestep.
- scope: Optional scope.
-
- Returns:
- A tuple where the first element is the LSTM output and the second is
-      an LSTMStateTuple of the state at the current timestep.
- """
- scope = scope or 'conv_lstm_cell'
- with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
- c, h = state
-
- # unflatten state if necessary
- if self._flatten_state:
- c = tf.reshape(c, [-1] + self.output_size)
- h = tf.reshape(h, [-1] + self.output_size)
-
- # summary of input passed into cell
- if self._viz_gates:
- slim.summaries.add_histogram_summary(inputs, 'cell_input')
- if self._pre_bottleneck:
- bottleneck = inputs
- else:
- bottleneck = slim.separable_conv2d(
- tf.concat([inputs, h], 3),
- self._num_units,
- self._filter_size,
- depth_multiplier=1,
- activation_fn=self._activation,
- normalizer_fn=None,
- scope='bottleneck')
-
- if self._viz_gates:
- slim.summaries.add_histogram_summary(bottleneck, 'bottleneck')
-
- concat = slim.separable_conv2d(
- bottleneck,
- 4 * self._num_units,
- self._filter_size,
- depth_multiplier=1,
- activation_fn=None,
- normalizer_fn=None,
- scope='gates')
-
- i, j, f, o = tf.split(concat, 4, 3)
-
- new_c = (
- c * tf.sigmoid(f + self._forget_bias) +
- tf.sigmoid(i) * self._activation(j))
- if self._clip_state:
- new_c = tf.clip_by_value(new_c, -6, 6)
- new_h = self._activation(new_c) * tf.sigmoid(o)
- # summary of cell output and new state
- if self._viz_gates:
- slim.summaries.add_histogram_summary(new_h, 'cell_output')
- slim.summaries.add_histogram_summary(new_c, 'cell_state')
-
- output = new_h
- if self._output_bottleneck:
- output = tf.concat([new_h, bottleneck], axis=3)
-
- # reflatten state to store it
- if self._flatten_state:
- new_c = tf.reshape(new_c, [-1, self._param_count])
- new_h = tf.reshape(new_h, [-1, self._param_count])
-
- return output, contrib_rnn.LSTMStateTuple(new_c, new_h)
-
- def init_state(self, state_name, batch_size, dtype, learned_state=False):
- """Creates an initial state compatible with this cell.
-
- Args:
- state_name: name of the state tensor
- batch_size: model batch size
- dtype: dtype for the tensor values i.e. tf.float32
- learned_state: whether the initial state should be learnable. If false,
- the initial state is set to all 0's
-
- Returns:
- The created initial state.
- """
- state_size = (
- self.state_size_flat if self._flatten_state else self.state_size)
-    # list of 2 zero tensors or variable tensors, depending on whether
-    # learned_state is true
- # pylint: disable=g-long-ternary,g-complex-comprehension
- ret_flat = [(contrib_variables.model_variable(
- state_name + str(i),
- shape=s,
- dtype=dtype,
- initializer=tf.truncated_normal_initializer(stddev=0.03))
- if learned_state else tf.zeros(
- [batch_size] + s, dtype=dtype, name=state_name))
- for i, s in enumerate(state_size)]
-
- # duplicates initial state across the batch axis if it's learned
- if learned_state:
- ret_flat = [
- tf.stack([tensor
- for i in range(int(batch_size))])
- for tensor in ret_flat
- ]
- for s, r in zip(state_size, ret_flat):
- r.set_shape([None] + s)
- return tf.nest.pack_sequence_as(structure=[1, 1], flat_sequence=ret_flat)
-
- def pre_bottleneck(self, inputs, state, input_index):
- """Apply pre-bottleneck projection to inputs.
-
- Pre-bottleneck operation maps features of different channels into the same
- dimension. The purpose of this op is to share the features from both large
- and small models in the same LSTM cell.
-
- Args:
- inputs: 4D Tensor with shape [batch_size x width x height x input_size].
- state: 4D Tensor with shape [batch_size x width x height x state_size].
-      input_index: integer index indicating which base features the inputs
-        correspond to.
-
- Returns:
- inputs: pre-bottlenecked inputs.
- Raises:
-      ValueError: If the inputs or state tensor does not have the expected rank.
- """
- # Sometimes state is a tuple, in which case it cannot be modified, e.g.
- # during training, tf.contrib.training.SequenceQueueingStateSaver
- # returns the state as a tuple. This should not be an issue since we
- # only need to modify state[1] during export, when state should be a
- # list.
- if len(inputs.shape) != 4:
- raise ValueError('Expect rank 4 feature tensor.')
- if not self._flatten_state and len(state.shape) != 4:
- raise ValueError('Expect rank 4 state tensor.')
- if self._flatten_state and len(state.shape) != 2:
- raise ValueError('Expect rank 2 state tensor when flatten_state is set.')
-
- with tf.name_scope(None):
- state = tf.identity(state, name='raw_inputs/init_lstm_h')
- if self._flatten_state:
- batch_size = inputs.shape[0]
- height = inputs.shape[1]
- width = inputs.shape[2]
- state = tf.reshape(state, [batch_size, height, width, -1])
- with tf.variable_scope('conv_lstm_cell', reuse=tf.AUTO_REUSE):
- scope_name = 'bottleneck_%d' % input_index
- inputs = slim.separable_conv2d(
- tf.concat([inputs, state], 3),
- self.output_size[-1],
- self._filter_size,
- depth_multiplier=1,
- activation_fn=tf.nn.relu6,
- normalizer_fn=None,
- scope=scope_name)
- # For exporting inference graph, we only mark the first timestep.
- with tf.name_scope(None):
- inputs = tf.identity(
- inputs, name='raw_outputs/base_endpoint_%d' % (input_index + 1))
- return inputs
-
-
-class GroupedConvLSTMCell(contrib_rnn.RNNCell):
- """Basic LSTM recurrent network cell using separable convolutions.
-
- The implementation is based on: https://arxiv.org/abs/1903.10172.
-
- We add forget_bias (default: 1) to the biases of the forget gate in order to
- reduce the scale of forgetting in the beginning of the training.
-
- This LSTM first projects inputs to the size of the output before doing gate
- computations. This saves params unless the input is less than a third of the
- state size channel-wise. Computation of bottlenecks and gates is divided
- into independent groups for further savings.
- """
-
- def __init__(self,
- filter_size,
- output_size,
- num_units,
- is_training,
- forget_bias=1.0,
- activation=tf.tanh,
- use_batch_norm=False,
- flatten_state=False,
- groups=4,
- clip_state=False,
- scale_state=False,
- output_bottleneck=False,
- pre_bottleneck=False,
- is_quantized=False,
- visualize_gates=False,
- conv_op_overrides=None):
- """Initialize the basic LSTM cell.
-
- Args:
- filter_size: collection, conv filter size
- output_size: collection, the width/height dimensions of the cell/output
- num_units: int, The number of channels in the LSTM cell.
- is_training: Whether the LSTM is in training mode.
- forget_bias: float, The bias added to forget gates (see above).
- activation: Activation function of the inner states.
- use_batch_norm: if True, use batch norm after convolution
- flatten_state: if True, state tensor will be flattened and stored as a 2-d
- tensor. Use for exporting the model to tfmini
- groups: Number of groups to split the state into. Must evenly divide
- num_units.
- clip_state: if True, clips state between [-6, 6].
- scale_state: if True, scales state so that all values are under 6 at all
- times.
- output_bottleneck: if True, the cell bottleneck will be concatenated to
- the cell output.
-      pre_bottleneck: if True, the cell assumes that bottlenecking was
-        performed before the function was called.
- is_quantized: if True, the model is in quantize mode, which requires
- quantization friendly concat and separable_conv2d ops.
- visualize_gates: if True, add histogram summaries of all gates and outputs
- to tensorboard
- conv_op_overrides: A list of convolutional operations that override the
- 'bottleneck' and 'convolution' layers before lstm gates. If None, the
-        original implementation of separable_conv will be used. The length of
- the list should be two.
-
- Raises:
- ValueError: when both clip_state and scale_state are enabled.
- """
- if clip_state and scale_state:
- raise ValueError('clip_state and scale_state cannot both be enabled.')
-
- self._filter_size = list(filter_size)
- self._output_size = list(output_size)
- self._num_units = num_units
- self._is_training = is_training
- self._forget_bias = forget_bias
- self._activation = activation
- self._use_batch_norm = use_batch_norm
- self._viz_gates = visualize_gates
- self._flatten_state = flatten_state
- self._param_count = self._num_units
- self._groups = groups
- self._scale_state = scale_state
- self._clip_state = clip_state
- self._output_bottleneck = output_bottleneck
- self._pre_bottleneck = pre_bottleneck
- self._is_quantized = is_quantized
- for dim in self._output_size:
- self._param_count *= dim
- self._conv_op_overrides = conv_op_overrides
- if self._conv_op_overrides and len(self._conv_op_overrides) != 2:
-      raise ValueError('Bottleneck and convolutional layers should be '
                       'overridden together.')
-
- @property
- def state_size(self):
- return contrib_rnn.LSTMStateTuple(self._output_size + [self._num_units],
- self._output_size + [self._num_units])
-
- @property
- def state_size_flat(self):
- return contrib_rnn.LSTMStateTuple([self._param_count], [self._param_count])
-
- @property
- def output_size(self):
- return self._output_size + [self._num_units]
-
- @property
- def filter_size(self):
- return self._filter_size
-
- @property
- def num_groups(self):
- return self._groups
-
- def __call__(self, inputs, state, scope=None):
- """Long short-term memory cell (LSTM) with bottlenecking.
-
- Includes logic for quantization-aware training. Note that all concats and
- activations use fixed ranges unless stated otherwise.
-
- Args:
- inputs: Input tensor at the current timestep.
- state: Tuple of tensors, the state at the previous timestep.
- scope: Optional scope.
-
- Returns:
- A tuple where the first element is the LSTM output and the second is
-      an LSTMStateTuple of the state at the current timestep.
- """
- scope = scope or 'conv_lstm_cell'
- with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
- c, h = state
-
- # Set nodes to be under raw_inputs/ name scope for tfmini export.
- with tf.name_scope(None):
- c = tf.identity(c, name='raw_inputs/init_lstm_c')
-        # When pre_bottleneck is enabled, the input h is handled in rnn_decoder.py.
- if not self._pre_bottleneck:
- h = tf.identity(h, name='raw_inputs/init_lstm_h')
-
- # unflatten state if necessary
- if self._flatten_state:
- c = tf.reshape(c, [-1] + self.output_size)
- h = tf.reshape(h, [-1] + self.output_size)
-
- c_list = tf.split(c, self._groups, axis=3)
- if self._pre_bottleneck:
- inputs_list = tf.split(inputs, self._groups, axis=3)
- else:
- h_list = tf.split(h, self._groups, axis=3)
- out_bottleneck = []
- out_c = []
- out_h = []
- # summary of input passed into cell
- if self._viz_gates:
- slim.summaries.add_histogram_summary(inputs, 'cell_input')
-
- for k in range(self._groups):
- if self._pre_bottleneck:
- bottleneck = inputs_list[k]
- else:
- if self._conv_op_overrides:
- bottleneck_fn = self._conv_op_overrides[0]
- else:
- bottleneck_fn = functools.partial(
- lstm_utils.quantizable_separable_conv2d,
- kernel_size=self._filter_size,
- activation_fn=self._activation)
- if self._use_batch_norm:
- b_x = bottleneck_fn(
- inputs=inputs,
- num_outputs=self._num_units // self._groups,
- is_quantized=self._is_quantized,
- depth_multiplier=1,
- normalizer_fn=None,
- scope='bottleneck_%d_x' % k)
- b_h = bottleneck_fn(
- inputs=h_list[k],
- num_outputs=self._num_units // self._groups,
- is_quantized=self._is_quantized,
- depth_multiplier=1,
- normalizer_fn=None,
- scope='bottleneck_%d_h' % k)
- b_x = slim.batch_norm(
- b_x,
- scale=True,
- is_training=self._is_training,
- scope='BatchNorm_%d_X' % k)
- b_h = slim.batch_norm(
- b_h,
- scale=True,
- is_training=self._is_training,
- scope='BatchNorm_%d_H' % k)
- bottleneck = b_x + b_h
- else:
- # All concats use fixed quantization ranges to prevent rescaling
- # at inference. Both |inputs| and |h_list| are tensors resulting
- # from Relu6 operations so we fix the ranges to [0, 6].
- bottleneck_concat = lstm_utils.quantizable_concat(
- [inputs, h_list[k]],
- axis=3,
- is_training=False,
- is_quantized=self._is_quantized,
- scope='bottleneck_%d/quantized_concat' % k)
- bottleneck = bottleneck_fn(
- inputs=bottleneck_concat,
- num_outputs=self._num_units // self._groups,
- is_quantized=self._is_quantized,
- depth_multiplier=1,
- normalizer_fn=None,
- scope='bottleneck_%d' % k)
-
- if self._conv_op_overrides:
- conv_fn = self._conv_op_overrides[1]
- else:
- conv_fn = functools.partial(
- lstm_utils.quantizable_separable_conv2d,
- kernel_size=self._filter_size,
- activation_fn=None)
- concat = conv_fn(
- inputs=bottleneck,
- num_outputs=4 * self._num_units // self._groups,
- is_quantized=self._is_quantized,
- depth_multiplier=1,
- normalizer_fn=None,
- scope='concat_conv_%d' % k)
-
- # Since there is no activation in the previous separable conv, we
- # quantize here. A starting range of [-6, 6] is used because the
- # tensors are input to a Sigmoid function that saturates at these
- # ranges.
- concat = lstm_utils.quantize_op(
- concat,
- is_training=self._is_training,
- default_min=-6,
- default_max=6,
- is_quantized=self._is_quantized,
- scope='gates_%d/act_quant' % k)
-
- # i = input_gate, j = new_input, f = forget_gate, o = output_gate
- i, j, f, o = tf.split(concat, 4, 3)
-
- f_add = f + self._forget_bias
- f_add = lstm_utils.quantize_op(
- f_add,
- is_training=self._is_training,
- default_min=-6,
- default_max=6,
- is_quantized=self._is_quantized,
- scope='forget_gate_%d/add_quant' % k)
- f_act = tf.sigmoid(f_add)
-
- a = c_list[k] * f_act
- a = lstm_utils.quantize_op(
- a,
- is_training=self._is_training,
- is_quantized=self._is_quantized,
- scope='forget_gate_%d/mul_quant' % k)
-
- i_act = tf.sigmoid(i)
-
- j_act = self._activation(j)
- # The quantization range is fixed for the relu6 to ensure that zero
- # is exactly representable.
- j_act = lstm_utils.fixed_quantize_op(
- j_act,
- fixed_min=0.0,
- fixed_max=6.0,
- is_quantized=self._is_quantized,
- scope='new_input_%d/act_quant' % k)
-
- b = i_act * j_act
- b = lstm_utils.quantize_op(
- b,
- is_training=self._is_training,
- is_quantized=self._is_quantized,
- scope='input_gate_%d/mul_quant' % k)
-
- new_c = a + b
- # The quantization range is fixed to [0, 6] due to an optimization in
-        # TFLite. The order of operations is as follows:
- # Add -> FakeQuant -> Relu6 -> FakeQuant -> Concat.
- # The fakequant ranges to the concat must be fixed to ensure all inputs
- # to the concat have the same range, removing the need for rescaling.
- # The quantization ranges input to the relu6 are propagated to its
- # output. Any mismatch between these two ranges will cause an error.
- new_c = lstm_utils.fixed_quantize_op(
- new_c,
- fixed_min=0.0,
- fixed_max=6.0,
- is_quantized=self._is_quantized,
- scope='new_c_%d/add_quant' % k)
-
- if not self._is_quantized:
- if self._scale_state:
- normalizer = tf.maximum(1.0,
- tf.reduce_max(new_c, axis=(1, 2, 3)) / 6)
- new_c /= tf.reshape(normalizer, [tf.shape(new_c)[0], 1, 1, 1])
- elif self._clip_state:
- new_c = tf.clip_by_value(new_c, -6, 6)
-
- new_c_act = self._activation(new_c)
- # The quantization range is fixed for the relu6 to ensure that zero
- # is exactly representable.
- new_c_act = lstm_utils.fixed_quantize_op(
- new_c_act,
- fixed_min=0.0,
- fixed_max=6.0,
- is_quantized=self._is_quantized,
- scope='new_c_%d/act_quant' % k)
-
- o_act = tf.sigmoid(o)
-
- new_h = new_c_act * o_act
- # The quantization range is fixed since it is input to a concat.
- # A range of [0, 6] is used since |new_h| is a product of ranges [0, 6]
- # and [0, 1].
- new_h_act = lstm_utils.fixed_quantize_op(
- new_h,
- fixed_min=0.0,
- fixed_max=6.0,
- is_quantized=self._is_quantized,
- scope='new_h_%d/act_quant' % k)
-
- out_bottleneck.append(bottleneck)
- out_c.append(new_c_act)
- out_h.append(new_h_act)
-
- # Since all inputs to the below concats are already quantized, we can use
- # a regular concat operation.
- new_c = tf.concat(out_c, axis=3)
- new_h = tf.concat(out_h, axis=3)
-
- # |bottleneck| is input to a concat with |new_h|. We must use
- # quantizable_concat() with a fixed range that matches |new_h|.
- bottleneck = lstm_utils.quantizable_concat(
- out_bottleneck,
- axis=3,
- is_training=False,
- is_quantized=self._is_quantized,
- scope='out_bottleneck/quantized_concat')
-
- # summary of cell output and new state
- if self._viz_gates:
- slim.summaries.add_histogram_summary(new_h, 'cell_output')
- slim.summaries.add_histogram_summary(new_c, 'cell_state')
-
- output = new_h
- if self._output_bottleneck:
- output = lstm_utils.quantizable_concat(
- [new_h, bottleneck],
- axis=3,
- is_training=False,
- is_quantized=self._is_quantized,
- scope='new_output/quantized_concat')
-
- # reflatten state to store it
- if self._flatten_state:
- new_c = tf.reshape(new_c, [-1, self._param_count], name='lstm_c')
- new_h = tf.reshape(new_h, [-1, self._param_count], name='lstm_h')
-
- # Set nodes to be under raw_outputs/ name scope for tfmini export.
- with tf.name_scope(None):
- new_c = tf.identity(new_c, name='raw_outputs/lstm_c')
- new_h = tf.identity(new_h, name='raw_outputs/lstm_h')
- states_and_output = contrib_rnn.LSTMStateTuple(new_c, new_h)
-
- return output, states_and_output
-
- def init_state(self, state_name, batch_size, dtype, learned_state=False):
- """Creates an initial state compatible with this cell.
-
- Args:
- state_name: name of the state tensor
- batch_size: model batch size
- dtype: dtype for the tensor values i.e. tf.float32
- learned_state: whether the initial state should be learnable. If false,
- the initial state is set to all 0's
-
- Returns:
- ret: the created initial state
- """
- state_size = (
- self.state_size_flat if self._flatten_state else self.state_size)
-    # list of 2 zero tensors or variable tensors,
-    # depending on whether learned_state is true
- # pylint: disable=g-long-ternary,g-complex-comprehension
- ret_flat = [(contrib_variables.model_variable(
- state_name + str(i),
- shape=s,
- dtype=dtype,
- initializer=tf.truncated_normal_initializer(stddev=0.03))
- if learned_state else tf.zeros(
- [batch_size] + s, dtype=dtype, name=state_name))
- for i, s in enumerate(state_size)]
-
- # duplicates initial state across the batch axis if it's learned
- if learned_state:
- ret_flat = [tf.stack([tensor for i in range(int(batch_size))])
- for tensor in ret_flat]
- for s, r in zip(state_size, ret_flat):
- r = tf.reshape(r, [-1] + s)
- ret = tf.nest.pack_sequence_as(structure=[1, 1], flat_sequence=ret_flat)
- return ret
-
- def pre_bottleneck(self, inputs, state, input_index):
- """Apply pre-bottleneck projection to inputs.
-
- Pre-bottleneck operation maps features of different channels into the same
- dimension. The purpose of this op is to share the features from both large
- and small models in the same LSTM cell.
-
- Args:
- inputs: 4D Tensor with shape [batch_size x width x height x input_size].
- state: 4D Tensor with shape [batch_size x width x height x state_size].
-      input_index: integer index indicating which base features the inputs
-        correspond to.
-
- Returns:
- inputs: pre-bottlenecked inputs.
- Raises:
- ValueError: If pre_bottleneck is not set or inputs is not rank 4.
- """
- # Sometimes state is a tuple, in which case it cannot be modified, e.g.
- # during training, tf.contrib.training.SequenceQueueingStateSaver
- # returns the state as a tuple. This should not be an issue since we
- # only need to modify state[1] during export, when state should be a
- # list.
- if not self._pre_bottleneck:
- raise ValueError('Only applied when pre_bottleneck is set to true.')
- if len(inputs.shape) != 4:
- raise ValueError('Expect a rank 4 feature tensor.')
- if not self._flatten_state and len(state.shape) != 4:
- raise ValueError('Expect rank 4 state tensor.')
- if self._flatten_state and len(state.shape) != 2:
- raise ValueError('Expect rank 2 state tensor when flatten_state is set.')
-
- with tf.name_scope(None):
- state = tf.identity(
- state, name='raw_inputs/init_lstm_h_%d' % (input_index + 1))
- if self._flatten_state:
- batch_size = inputs.shape[0]
- height = inputs.shape[1]
- width = inputs.shape[2]
- state = tf.reshape(state, [batch_size, height, width, -1])
- with tf.variable_scope('conv_lstm_cell', reuse=tf.AUTO_REUSE):
- state_split = tf.split(state, self._groups, axis=3)
- with tf.variable_scope('bottleneck_%d' % input_index):
- bottleneck_out = []
- for k in range(self._groups):
- with tf.variable_scope('group_%d' % k):
- bottleneck_out.append(
- lstm_utils.quantizable_separable_conv2d(
- lstm_utils.quantizable_concat(
- [inputs, state_split[k]],
- axis=3,
- is_training=self._is_training,
- is_quantized=self._is_quantized,
- scope='quantized_concat'),
- self.output_size[-1] / self._groups,
- self._filter_size,
- is_quantized=self._is_quantized,
- depth_multiplier=1,
- activation_fn=tf.nn.relu6,
- normalizer_fn=None,
- scope='project'))
- inputs = lstm_utils.quantizable_concat(
- bottleneck_out,
- axis=3,
- is_training=self._is_training,
- is_quantized=self._is_quantized,
- scope='bottleneck_out/quantized_concat')
- # For exporting inference graph, we only mark the first timestep.
- with tf.name_scope(None):
- inputs = tf.identity(
- inputs, name='raw_outputs/base_endpoint_%d' % (input_index + 1))
- return inputs
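
Stripped of the separable convolutions, batch norm, and quantization bookkeeping, both cells above reduce to the standard LSTM gate arithmetic with a forget bias. The sketch below is illustrative only (the helper `lstm_gate_update` is a hypothetical name, not part of the deleted file) and uses NumPy on a small channel vector to show the recurrence each group computes.

```python
import numpy as np


def sigmoid(x):
  return 1.0 / (1.0 + np.exp(-x))


def lstm_gate_update(c, i, j, f, o, forget_bias=1.0):
  """Mirrors: new_c = c * sigmoid(f + bias) + sigmoid(i) * tanh(j);
  new_h = tanh(new_c) * sigmoid(o)."""
  new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
  new_h = np.tanh(new_c) * sigmoid(o)
  return new_c, new_h


# One spatial position with two channels, starting from a zero cell state.
c = np.zeros(2, dtype=np.float32)
i = j = f = o = np.full(2, 0.5, dtype=np.float32)
print(lstm_gate_update(c, i, j, f, o))
```

GroupedConvLSTMCell applies this same update independently to each channel group, which is what lets its bottleneck and gate convolutions be computed per group.
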
diff --git a/research/lstm_object_detection/lstm/lstm_cells_test.py b/research/lstm_object_detection/lstm/lstm_cells_test.py
deleted file mode 100644
index b296310194d..00000000000
--- a/research/lstm_object_detection/lstm/lstm_cells_test.py
+++ /dev/null
@@ -1,412 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for lstm_object_detection.lstm.lstm_cells."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import tensorflow.compat.v1 as tf
-
-from lstm_object_detection.lstm import lstm_cells
-
-
-class BottleneckConvLstmCellsTest(tf.test.TestCase):
-
- def test_run_lstm_cell(self):
- filter_size = [3, 3]
- output_size = [10, 10]
- num_units = 15
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- learned_state = False
-
- inputs = tf.zeros([4, 10, 10, 3], dtype=tf.float32)
- cell = lstm_cells.BottleneckConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units)
- init_state = cell.init_state(
- state_name, batch_size, dtype, learned_state)
- output, state_tuple = cell(inputs, init_state)
- self.assertAllEqual([4, 10, 10, 15], output.shape.as_list())
- self.assertAllEqual([4, 10, 10, 15], state_tuple[0].shape.as_list())
- self.assertAllEqual([4, 10, 10, 15], state_tuple[1].shape.as_list())
-
- def test_run_lstm_cell_with_flattened_state(self):
- filter_size = [3, 3]
- output_dim = 10
- output_size = [output_dim] * 2
- num_units = 15
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- learned_state = False
-
- inputs = tf.zeros([batch_size, output_dim, output_dim, 3], dtype=tf.float32)
- cell = lstm_cells.BottleneckConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units,
- flatten_state=True)
- init_state = cell.init_state(
- state_name, batch_size, dtype, learned_state)
- output, state_tuple = cell(inputs, init_state)
- self.assertAllEqual([4, 10, 10, 15], output.shape.as_list())
- self.assertAllEqual([4, 1500], state_tuple[0].shape.as_list())
- self.assertAllEqual([4, 1500], state_tuple[1].shape.as_list())
-
- def test_run_lstm_cell_with_output_bottleneck(self):
- filter_size = [3, 3]
- output_dim = 10
- output_size = [output_dim] * 2
- num_units = 15
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- learned_state = False
-
- inputs = tf.zeros([batch_size, output_dim, output_dim, 3], dtype=tf.float32)
- cell = lstm_cells.BottleneckConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units,
- output_bottleneck=True)
- init_state = cell.init_state(
- state_name, batch_size, dtype, learned_state)
- output, state_tuple = cell(inputs, init_state)
- self.assertAllEqual([4, 10, 10, 30], output.shape.as_list())
- self.assertAllEqual([4, 10, 10, 15], state_tuple[0].shape.as_list())
- self.assertAllEqual([4, 10, 10, 15], state_tuple[1].shape.as_list())
-
- def test_get_init_state(self):
- filter_size = [3, 3]
- output_dim = 10
- output_size = [output_dim] * 2
- num_units = 15
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- learned_state = False
-
- cell = lstm_cells.BottleneckConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units)
- init_c, init_h = cell.init_state(
- state_name, batch_size, dtype, learned_state)
-
- self.assertEqual(tf.float32, init_c.dtype)
- self.assertEqual(tf.float32, init_h.dtype)
- with self.test_session() as sess:
- init_c_res, init_h_res = sess.run([init_c, init_h])
- self.assertAllClose(np.zeros((4, 10, 10, 15)), init_c_res)
- self.assertAllClose(np.zeros((4, 10, 10, 15)), init_h_res)
-
- def test_get_init_learned_state(self):
- filter_size = [3, 3]
- output_size = [10, 10]
- num_units = 15
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- learned_state = True
-
- cell = lstm_cells.BottleneckConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units)
- init_c, init_h = cell.init_state(
- state_name, batch_size, dtype, learned_state)
-
- self.assertEqual(tf.float32, init_c.dtype)
- self.assertEqual(tf.float32, init_h.dtype)
- self.assertAllEqual([4, 10, 10, 15], init_c.shape.as_list())
- self.assertAllEqual([4, 10, 10, 15], init_h.shape.as_list())
-
- def test_unroll(self):
- filter_size = [3, 3]
- output_size = [10, 10]
- num_units = 15
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- unroll = 10
- learned_state = False
-
- inputs = tf.zeros([4, 10, 10, 3], dtype=tf.float32)
- cell = lstm_cells.BottleneckConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units)
- state = cell.init_state(
- state_name, batch_size, dtype, learned_state)
- for step in range(unroll):
- output, state = cell(inputs, state)
- self.assertAllEqual([4, 10, 10, 15], output.shape.as_list())
- self.assertAllEqual([4, 10, 10, 15], state[0].shape.as_list())
- self.assertAllEqual([4, 10, 10, 15], state[1].shape.as_list())
-
- def test_prebottleneck(self):
- filter_size = [3, 3]
- output_size = [10, 10]
- num_units = 15
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- unroll = 10
- learned_state = False
-
- inputs_large = tf.zeros([4, 10, 10, 5], dtype=tf.float32)
- inputs_small = tf.zeros([4, 10, 10, 3], dtype=tf.float32)
- cell = lstm_cells.BottleneckConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units,
- pre_bottleneck=True)
- state = cell.init_state(
- state_name, batch_size, dtype, learned_state)
- for step in range(unroll):
- if step % 2 == 0:
- inputs = cell.pre_bottleneck(inputs_large, state[1], 0)
- else:
- inputs = cell.pre_bottleneck(inputs_small, state[1], 1)
- output, state = cell(inputs, state)
- self.assertAllEqual([4, 10, 10, 15], output.shape.as_list())
- self.assertAllEqual([4, 10, 10, 15], state[0].shape.as_list())
- self.assertAllEqual([4, 10, 10, 15], state[1].shape.as_list())
-
- def test_flatten_state(self):
- filter_size = [3, 3]
- output_size = [10, 10]
- num_units = 15
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- unroll = 10
- learned_state = False
-
- inputs_large = tf.zeros([4, 10, 10, 5], dtype=tf.float32)
- inputs_small = tf.zeros([4, 10, 10, 3], dtype=tf.float32)
- cell = lstm_cells.BottleneckConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units,
- pre_bottleneck=True,
- flatten_state=True)
- state = cell.init_state(
- state_name, batch_size, dtype, learned_state)
- for step in range(unroll):
- if step % 2 == 0:
- inputs = cell.pre_bottleneck(inputs_large, state[1], 0)
- else:
- inputs = cell.pre_bottleneck(inputs_small, state[1], 1)
- output, state = cell(inputs, state)
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- output_result, state_result = sess.run([output, state])
- self.assertAllEqual((4, 10, 10, 15), output_result.shape)
- self.assertAllEqual((4, 10*10*15), state_result[0].shape)
- self.assertAllEqual((4, 10*10*15), state_result[1].shape)
-
-
-class GroupedConvLstmCellsTest(tf.test.TestCase):
-
- def test_run_lstm_cell(self):
- filter_size = [3, 3]
- output_size = [10, 10]
- num_units = 16
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- learned_state = False
-
- inputs = tf.zeros([4, 10, 10, 3], dtype=tf.float32)
- cell = lstm_cells.GroupedConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units,
- is_training=True)
- init_state = cell.init_state(
- state_name, batch_size, dtype, learned_state)
- output, state_tuple = cell(inputs, init_state)
- self.assertAllEqual([4, 10, 10, 16], output.shape.as_list())
- self.assertAllEqual([4, 10, 10, 16], state_tuple[0].shape.as_list())
- self.assertAllEqual([4, 10, 10, 16], state_tuple[1].shape.as_list())
-
- def test_run_lstm_cell_with_output_bottleneck(self):
- filter_size = [3, 3]
- output_dim = 10
- output_size = [output_dim] * 2
- num_units = 16
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- learned_state = False
-
- inputs = tf.zeros([batch_size, output_dim, output_dim, 3], dtype=tf.float32)
- cell = lstm_cells.GroupedConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units,
- is_training=True,
- output_bottleneck=True)
- init_state = cell.init_state(
- state_name, batch_size, dtype, learned_state)
- output, state_tuple = cell(inputs, init_state)
- self.assertAllEqual([4, 10, 10, 32], output.shape.as_list())
- self.assertAllEqual([4, 10, 10, 16], state_tuple[0].shape.as_list())
- self.assertAllEqual([4, 10, 10, 16], state_tuple[1].shape.as_list())
-
- def test_get_init_state(self):
- filter_size = [3, 3]
- output_dim = 10
- output_size = [output_dim] * 2
- num_units = 16
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- learned_state = False
-
- cell = lstm_cells.GroupedConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units,
- is_training=True)
- init_c, init_h = cell.init_state(
- state_name, batch_size, dtype, learned_state)
-
- self.assertEqual(tf.float32, init_c.dtype)
- self.assertEqual(tf.float32, init_h.dtype)
- with self.test_session() as sess:
- init_c_res, init_h_res = sess.run([init_c, init_h])
- self.assertAllClose(np.zeros((4, 10, 10, 16)), init_c_res)
- self.assertAllClose(np.zeros((4, 10, 10, 16)), init_h_res)
-
- def test_get_init_learned_state(self):
- filter_size = [3, 3]
- output_size = [10, 10]
- num_units = 16
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- learned_state = True
-
- cell = lstm_cells.GroupedConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units,
- is_training=True)
- init_c, init_h = cell.init_state(
- state_name, batch_size, dtype, learned_state)
-
- self.assertEqual(tf.float32, init_c.dtype)
- self.assertEqual(tf.float32, init_h.dtype)
- self.assertAllEqual([4, 10, 10, 16], init_c.shape.as_list())
- self.assertAllEqual([4, 10, 10, 16], init_h.shape.as_list())
-
- def test_unroll(self):
- filter_size = [3, 3]
- output_size = [10, 10]
- num_units = 16
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- unroll = 10
- learned_state = False
-
- inputs = tf.zeros([4, 10, 10, 3], dtype=tf.float32)
- cell = lstm_cells.GroupedConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units,
- is_training=True)
- state = cell.init_state(
- state_name, batch_size, dtype, learned_state)
- for step in range(unroll):
- output, state = cell(inputs, state)
- self.assertAllEqual([4, 10, 10, 16], output.shape.as_list())
- self.assertAllEqual([4, 10, 10, 16], state[0].shape.as_list())
- self.assertAllEqual([4, 10, 10, 16], state[1].shape.as_list())
-
- def test_prebottleneck(self):
- filter_size = [3, 3]
- output_size = [10, 10]
- num_units = 16
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- unroll = 10
- learned_state = False
-
- inputs_large = tf.zeros([4, 10, 10, 5], dtype=tf.float32)
- inputs_small = tf.zeros([4, 10, 10, 3], dtype=tf.float32)
- cell = lstm_cells.GroupedConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units,
- is_training=True,
- pre_bottleneck=True)
- state = cell.init_state(
- state_name, batch_size, dtype, learned_state)
- for step in range(unroll):
- if step % 2 == 0:
- inputs = cell.pre_bottleneck(inputs_large, state[1], 0)
- else:
- inputs = cell.pre_bottleneck(inputs_small, state[1], 1)
- output, state = cell(inputs, state)
- self.assertAllEqual([4, 10, 10, 16], output.shape.as_list())
- self.assertAllEqual([4, 10, 10, 16], state[0].shape.as_list())
- self.assertAllEqual([4, 10, 10, 16], state[1].shape.as_list())
-
- def test_flatten_state(self):
- filter_size = [3, 3]
- output_size = [10, 10]
- num_units = 16
- state_name = 'lstm_state'
- batch_size = 4
- dtype = tf.float32
- unroll = 10
- learned_state = False
-
- inputs_large = tf.zeros([4, 10, 10, 5], dtype=tf.float32)
- inputs_small = tf.zeros([4, 10, 10, 3], dtype=tf.float32)
- cell = lstm_cells.GroupedConvLSTMCell(
- filter_size=filter_size,
- output_size=output_size,
- num_units=num_units,
- is_training=True,
- pre_bottleneck=True,
- flatten_state=True)
- state = cell.init_state(
- state_name, batch_size, dtype, learned_state)
- for step in range(unroll):
- if step % 2 == 0:
- inputs = cell.pre_bottleneck(inputs_large, state[1], 0)
- else:
- inputs = cell.pre_bottleneck(inputs_small, state[1], 1)
- output, state = cell(inputs, state)
- with self.test_session() as sess:
- sess.run(tf.global_variables_initializer())
- output_result, state_result = sess.run([output, state])
- self.assertAllEqual((4, 10, 10, 16), output_result.shape)
- self.assertAllEqual((4, 10*10*16), state_result[0].shape)
- self.assertAllEqual((4, 10*10*16), state_result[1].shape)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/lstm_object_detection/lstm/rnn_decoder.py b/research/lstm_object_detection/lstm/rnn_decoder.py
deleted file mode 100644
index 185ca130396..00000000000
--- a/research/lstm_object_detection/lstm/rnn_decoder.py
+++ /dev/null
@@ -1,269 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Custom RNN decoder."""
-
-import tensorflow.compat.v1 as tf
-import lstm_object_detection.lstm.utils as lstm_utils
-
-
-class _NoVariableScope(object):
-
- def __enter__(self):
- return
-
- def __exit__(self, exc_type, exc_value, traceback):
- return False
-
-
-def rnn_decoder(decoder_inputs,
- initial_state,
- cell,
- loop_function=None,
- scope=None):
- """RNN decoder for the LSTM-SSD model.
-
- This decoder returns a list of all states, rather than only the final state.
- Args:
- decoder_inputs: A list of 4D Tensors with shape [batch_size x input_size].
- initial_state: 2D Tensor with shape [batch_size x cell.state_size].
- cell: rnn_cell.RNNCell defining the cell function and size.
- loop_function: If not None, this function will be applied to the i-th output
- in order to generate the i+1-st input, and decoder_inputs will be ignored,
- except for the first element ("GO" symbol). This can be used for decoding,
- but also for training to emulate http://arxiv.org/abs/1506.03099.
- Signature -- loop_function(prev, i) = next
- * prev is a 2D Tensor of shape [batch_size x output_size],
- * i is an integer, the step number (when advanced control is needed),
- * next is a 2D Tensor of shape [batch_size x input_size].
- scope: optional VariableScope for the created subgraph.
- Returns:
- A tuple of the form (outputs, state), where:
- outputs: A list of the same length as decoder_inputs of 4D Tensors with
- shape [batch_size x output_size] containing generated outputs.
- states: A list of the same length as decoder_inputs of the state of each
- cell at each time-step. It is a 2D Tensor of shape
- [batch_size x cell.state_size].
- """
- with tf.variable_scope(scope) if scope else _NoVariableScope():
- state_tuple = initial_state
- outputs = []
- states = []
- prev = None
- for local_step, decoder_input in enumerate(decoder_inputs):
- if loop_function is not None and prev is not None:
- with tf.variable_scope('loop_function', reuse=True):
- decoder_input = loop_function(prev, local_step)
- output, state_tuple = cell(decoder_input, state_tuple)
- outputs.append(output)
- states.append(state_tuple)
- if loop_function is not None:
- prev = output
- return outputs, states
-
-def multi_input_rnn_decoder(decoder_inputs,
- initial_state,
- cell,
- sequence_step,
- selection_strategy='RANDOM',
- is_training=None,
- is_quantized=False,
- preprocess_fn_list=None,
- pre_bottleneck=False,
- flatten_state=False,
- scope=None):
- """RNN decoder for the Interleaved LSTM-SSD model.
-
- This decoder takes multiple sequences of inputs and selects the input to feed
- to the rnn at each timestep using its selection_strategy, which can be random,
- learned, or deterministic.
- This decoder returns a list of all states, rather than only the final state.
- Args:
- decoder_inputs: A list of lists of 2D Tensors [batch_size x input_size].
- initial_state: 2D Tensor with shape [batch_size x cell.state_size].
- cell: rnn_cell.RNNCell defining the cell function and size.
- sequence_step: Tensor [batch_size] of the step number of the first elements
- in the sequence.
- selection_strategy: Method for picking the decoder_input to use at each
-      timestep. Must be 'RANDOM' or 'SKIPX' for integer X, where X is the number
- of times to use the second input before using the first.
- is_training: boolean, whether the network is training. When using learned
- selection, attempts exploration if training.
- is_quantized: flag to enable/disable quantization mode.
- preprocess_fn_list: List of functions accepting two tensor arguments: one
- timestep of decoder_inputs and the lstm state. If not None,
- decoder_inputs[i] will be updated with preprocess_fn[i] at the start of
- each timestep.
- pre_bottleneck: if True, use separate bottleneck weights for each sequence.
- Useful when input sequences have differing numbers of channels. Final
- bottlenecks will have the same dimension.
- flatten_state: Whether the LSTM state is flattened.
- scope: optional VariableScope for the created subgraph.
- Returns:
- A tuple of the form (outputs, state), where:
- outputs: A list of the same length as decoder_inputs of 2D Tensors with
- shape [batch_size x output_size] containing generated outputs.
- states: A list of the same length as decoder_inputs of the state of each
- cell at each time-step. It is a 2D Tensor of shape
- [batch_size x cell.state_size].
- Raises:
- ValueError: If selection_strategy is not recognized or unexpected unroll
- length.
- """
- if flatten_state and len(decoder_inputs[0]) > 1:
- raise ValueError('In export mode, unroll length should not be more than 1')
- with tf.variable_scope(scope) if scope else _NoVariableScope():
- state_tuple = initial_state
- outputs = []
- states = []
- batch_size = decoder_inputs[0][0].shape[0].value
- num_sequences = len(decoder_inputs)
- sequence_length = len(decoder_inputs[0])
-
- for local_step in range(sequence_length):
- for sequence_index in range(num_sequences):
- if preprocess_fn_list is not None:
- decoder_inputs[sequence_index][local_step] = (
- preprocess_fn_list[sequence_index](
- decoder_inputs[sequence_index][local_step], state_tuple[0]))
- if pre_bottleneck:
- decoder_inputs[sequence_index][local_step] = cell.pre_bottleneck(
- inputs=decoder_inputs[sequence_index][local_step],
- state=state_tuple[1],
- input_index=sequence_index)
-
- action = generate_action(selection_strategy, local_step, sequence_step,
- [batch_size, 1, 1, 1])
- inputs, _ = (
- select_inputs(decoder_inputs, action, local_step, is_training,
- is_quantized))
- # Mark base network endpoints under raw_inputs/
- with tf.name_scope(None):
- inputs = tf.identity(inputs, 'raw_inputs/base_endpoint')
- output, state_tuple_out = cell(inputs, state_tuple)
- state_tuple = select_state(state_tuple, state_tuple_out, action)
-
- outputs.append(output)
- states.append(state_tuple)
- return outputs, states
-
-
-def generate_action(selection_strategy, local_step, sequence_step,
- action_shape):
- """Generate current (binary) action based on selection strategy.
-
- Args:
- selection_strategy: Method for picking the decoder_input to use at each
-      timestep. Must be 'RANDOM' or 'SKIPX' for integer X, where X is the number
- of times to use the second input before using the first.
- local_step: Tensor [batch_size] of the step number within the current
- unrolled batch.
- sequence_step: Tensor [batch_size] of the step number of the first elements
- in the sequence.
- action_shape: The shape of action tensor to be generated.
-
- Returns:
- A tensor of shape action_shape, each element is an individual action.
-
- Raises:
- ValueError: if selection_strategy is not supported or if 'SKIP' is not
- followed by numerics.
- """
- if selection_strategy.startswith('RANDOM'):
- action = tf.random.uniform(action_shape, maxval=2, dtype=tf.int32)
- action = tf.minimum(action, 1)
-
- # First step always runs large network.
- if local_step == 0 and sequence_step is not None:
- action *= tf.minimum(
- tf.reshape(tf.cast(sequence_step, tf.int32), action_shape), 1)
- elif selection_strategy.startswith('SKIP'):
- inter_count = int(selection_strategy[4:])
- if local_step % (inter_count + 1) == 0:
- action = tf.zeros(action_shape)
- else:
- action = tf.ones(action_shape)
- else:
- raise ValueError('Selection strategy %s not recognized' %
- selection_strategy)
- return tf.cast(action, tf.int32)
-
-
-def select_inputs(decoder_inputs, action, local_step, is_training, is_quantized,
- get_alt_inputs=False):
- """Selects sequence from decoder_inputs based on 1D actions.
-
- Given multiple input batches, creates a single output batch by
-  selecting from the action[i]-th input for the i-th batch element.
-
- Args:
- decoder_inputs: A 2-D list of tensor inputs.
- action: A tensor of shape [batch_size]. Each element corresponds to an index
- of decoder_inputs to choose.
- local_step: The current timestep.
- is_training: boolean, whether the network is training. When using learned
- selection, attempts exploration if training.
- is_quantized: flag to enable/disable quantization mode.
- get_alt_inputs: Whether the non-chosen inputs should also be returned.
-
- Returns:
- The constructed output. Also outputs the elements that were not chosen
- if get_alt_inputs is True, otherwise None.
-
- Raises:
- ValueError: if the decoder inputs contains other than two sequences.
- """
- num_seqs = len(decoder_inputs)
- if not num_seqs == 2:
- raise ValueError('Currently only supports two sets of inputs.')
- stacked_inputs = tf.stack(
- [decoder_inputs[seq_index][local_step] for seq_index in range(num_seqs)],
- axis=-1)
- action_index = tf.one_hot(action, num_seqs)
- selected_inputs = (
- lstm_utils.quantize_op(stacked_inputs * action_index, is_training,
- is_quantized, scope='quant_selected_inputs'))
- inputs = tf.reduce_sum(selected_inputs, axis=-1)
- inputs_alt = None
- # Only works for 2 models.
- if get_alt_inputs:
- # Reverse of action_index.
- action_index_alt = tf.one_hot(action, num_seqs, on_value=0.0, off_value=1.0)
- selected_inputs = (
- lstm_utils.quantize_op(stacked_inputs * action_index_alt, is_training,
- is_quantized, scope='quant_selected_inputs_alt'))
- inputs_alt = tf.reduce_sum(selected_inputs, axis=-1)
- return inputs, inputs_alt
-
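A minimal NumPy sketch of the one-hot masking that `select_inputs` performs (illustrative 2-D shapes; the real op works on 4-D feature maps and additionally passes the masked tensor through `lstm_utils.quantize_op`):

```python
import numpy as np

# Two candidate inputs stacked on a trailing axis, selected per batch element.
large = np.array([[1.0], [1.0]])              # sequence 0
small = np.array([[5.0], [5.0]])              # sequence 1
stacked = np.stack([large, small], axis=-1)   # shape [2, 1, 2]

action = np.array([0, 1])                     # batch 0 -> large, batch 1 -> small
action_index = np.eye(2)[action]              # one-hot mask, shape [2, 2]
selected = np.sum(stacked * action_index[:, None, :], axis=-1)
print(selected.ravel())  # [1. 5.]
```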
-def select_state(previous_state, new_state, action):
- """Select state given action.
-
- Currently only supports binary actions. If the action is 0, the state was
- generated by the large model and we therefore update the state. If the
- action is 1, the state was generated by the small model and, in the
- interleaved model, we skip this state update.
-
- Args:
- previous_state: A state tuple representing state from previous step.
- new_state: A state tuple representing newly computed state.
- action: A tensor with the same shape as the state.
-
- Returns:
- A state tuple selected based on the given action.
- """
- action = tf.cast(action, tf.float32)
- state_c = previous_state[0] * action + new_state[0] * (1 - action)
- state_h = previous_state[1] * action + new_state[1] * (1 - action)
- return (state_c, state_h)
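A small numeric sketch of the state gating implemented by `select_state` above, using NumPy and made-up values: an action of 0 adopts the newly computed state, while an action of 1 keeps the previous one.

```python
import numpy as np

# Per-batch-element gate, broadcast the same way as in select_state.
previous_c, previous_h = np.full((2, 1), 1.0), np.full((2, 1), 2.0)
new_c, new_h = np.full((2, 1), 10.0), np.full((2, 1), 20.0)
action = np.array([[0.0], [1.0]])  # element 0 updates, element 1 keeps old state

state_c = previous_c * action + new_c * (1 - action)
state_h = previous_h * action + new_h * (1 - action)
print(state_c.ravel())  # [10.  1.]
print(state_h.ravel())  # [20.  2.]
```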
diff --git a/research/lstm_object_detection/lstm/rnn_decoder_test.py b/research/lstm_object_detection/lstm/rnn_decoder_test.py
deleted file mode 100644
index 480694f6fde..00000000000
--- a/research/lstm_object_detection/lstm/rnn_decoder_test.py
+++ /dev/null
@@ -1,306 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for lstm_object_detection.lstm.rnn_decoder."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import tensorflow.compat.v1 as tf
-
-from tensorflow.contrib import layers as contrib_layers
-from tensorflow.contrib import rnn as contrib_rnn
-from lstm_object_detection.lstm import rnn_decoder
-
-
-class MockRnnCell(contrib_rnn.RNNCell):
-
- def __init__(self, input_size, num_units):
- self._input_size = input_size
- self._num_units = num_units
- self._filter_size = [3, 3]
-
- def __call__(self, inputs, state_tuple):
- outputs = tf.concat([inputs, state_tuple[0]], axis=3)
- new_state_tuple = (tf.multiply(state_tuple[0], 2), state_tuple[1])
- return outputs, new_state_tuple
-
- def state_size(self):
- return self._num_units
-
- def output_size(self):
- return self._input_size + self._num_units
-
- def pre_bottleneck(self, inputs, state, input_index):
- with tf.variable_scope('bottleneck_%d' % input_index, reuse=tf.AUTO_REUSE):
- inputs = contrib_layers.separable_conv2d(
- tf.concat([inputs, state], 3),
- self._input_size,
- self._filter_size,
- depth_multiplier=1,
- activation_fn=tf.nn.relu6,
- normalizer_fn=None)
- return inputs
-
-
-class RnnDecoderTest(tf.test.TestCase):
-
- def test_rnn_decoder_single_unroll(self):
- batch_size = 2
- num_unroll = 1
- num_units = 64
- width = 8
- height = 10
- input_channels = 128
-
- initial_state = tf.random_normal((batch_size, width, height, num_units))
- inputs = tf.random_normal([batch_size, width, height, input_channels])
-
- rnn_cell = MockRnnCell(input_channels, num_units)
- outputs, states = rnn_decoder.rnn_decoder(
- decoder_inputs=[inputs] * num_unroll,
- initial_state=(initial_state, initial_state),
- cell=rnn_cell)
-
- self.assertEqual(len(outputs), num_unroll)
- self.assertEqual(len(states), num_unroll)
- with tf.Session() as sess:
- sess.run(tf.global_variables_initializer())
- results = sess.run((outputs, states, inputs, initial_state))
- outputs_results = results[0]
- states_results = results[1]
- inputs_results = results[2]
- initial_states_results = results[3]
- self.assertEqual(outputs_results[0].shape,
- (batch_size, width, height, input_channels + num_units))
- self.assertAllEqual(
- outputs_results[0],
- np.concatenate((inputs_results, initial_states_results), axis=3))
- self.assertEqual(states_results[0][0].shape,
- (batch_size, width, height, num_units))
- self.assertEqual(states_results[0][1].shape,
- (batch_size, width, height, num_units))
- self.assertAllEqual(states_results[0][0],
- np.multiply(initial_states_results, 2.0))
- self.assertAllEqual(states_results[0][1], initial_states_results)
-
- def test_rnn_decoder_multiple_unroll(self):
- batch_size = 2
- num_unroll = 3
- num_units = 64
- width = 8
- height = 10
- input_channels = 128
-
- initial_state = tf.random_normal((batch_size, width, height, num_units))
- inputs = tf.random_normal([batch_size, width, height, input_channels])
-
- rnn_cell = MockRnnCell(input_channels, num_units)
- outputs, states = rnn_decoder.rnn_decoder(
- decoder_inputs=[inputs] * num_unroll,
- initial_state=(initial_state, initial_state),
- cell=rnn_cell)
-
- self.assertEqual(len(outputs), num_unroll)
- self.assertEqual(len(states), num_unroll)
- with tf.Session() as sess:
- sess.run(tf.global_variables_initializer())
- results = sess.run((outputs, states, inputs, initial_state))
- outputs_results = results[0]
- states_results = results[1]
- inputs_results = results[2]
- initial_states_results = results[3]
- for i in range(num_unroll):
- previous_state = ([initial_states_results, initial_states_results]
- if i == 0 else states_results[i - 1])
- self.assertEqual(
- outputs_results[i].shape,
- (batch_size, width, height, input_channels + num_units))
- self.assertAllEqual(
- outputs_results[i],
- np.concatenate((inputs_results, previous_state[0]), axis=3))
- self.assertEqual(states_results[i][0].shape,
- (batch_size, width, height, num_units))
- self.assertEqual(states_results[i][1].shape,
- (batch_size, width, height, num_units))
- self.assertAllEqual(states_results[i][0],
- np.multiply(previous_state[0], 2.0))
- self.assertAllEqual(states_results[i][1], previous_state[1])
-
-
-class MultiInputRnnDecoderTest(tf.test.TestCase):
-
- def test_rnn_decoder_single_unroll(self):
- batch_size = 2
- num_unroll = 1
- num_units = 12
- width = 8
- height = 10
- input_channels_large = 24
- input_channels_small = 12
- bottleneck_channels = 20
-
- initial_state_c = tf.random_normal((batch_size, width, height, num_units))
- initial_state_h = tf.random_normal((batch_size, width, height, num_units))
- initial_state = (initial_state_c, initial_state_h)
- inputs_large = tf.random_normal(
- [batch_size, width, height, input_channels_large])
- inputs_small = tf.random_normal(
- [batch_size, width, height, input_channels_small])
-
- rnn_cell = MockRnnCell(bottleneck_channels, num_units)
- outputs, states = rnn_decoder.multi_input_rnn_decoder(
- decoder_inputs=[[inputs_large] * num_unroll,
- [inputs_small] * num_unroll],
- initial_state=initial_state,
- cell=rnn_cell,
- sequence_step=tf.zeros([batch_size]),
- pre_bottleneck=True)
-
- self.assertEqual(len(outputs), num_unroll)
- self.assertEqual(len(states), num_unroll)
- with tf.Session() as sess:
- sess.run(tf.global_variables_initializer())
- results = sess.run(
- (outputs, states, inputs_large, inputs_small, initial_state))
- outputs_results = results[0]
- states_results = results[1]
- initial_states_results = results[4]
- self.assertEqual(
- outputs_results[0].shape,
- (batch_size, width, height, bottleneck_channels + num_units))
- self.assertEqual(states_results[0][0].shape,
- (batch_size, width, height, num_units))
- self.assertEqual(states_results[0][1].shape,
- (batch_size, width, height, num_units))
- # The first step should always update state.
- self.assertAllEqual(states_results[0][0],
- np.multiply(initial_states_results[0], 2))
- self.assertAllEqual(states_results[0][1], initial_states_results[1])
-
- def test_rnn_decoder_multiple_unroll(self):
- batch_size = 2
- num_unroll = 3
- num_units = 12
- width = 8
- height = 10
- input_channels_large = 24
- input_channels_small = 12
- bottleneck_channels = 20
-
- initial_state_c = tf.random_normal((batch_size, width, height, num_units))
- initial_state_h = tf.random_normal((batch_size, width, height, num_units))
- initial_state = (initial_state_c, initial_state_h)
- inputs_large = tf.random_normal(
- [batch_size, width, height, input_channels_large])
- inputs_small = tf.random_normal(
- [batch_size, width, height, input_channels_small])
-
- rnn_cell = MockRnnCell(bottleneck_channels, num_units)
- outputs, states = rnn_decoder.multi_input_rnn_decoder(
- decoder_inputs=[[inputs_large] * num_unroll,
- [inputs_small] * num_unroll],
- initial_state=initial_state,
- cell=rnn_cell,
- sequence_step=tf.zeros([batch_size]),
- pre_bottleneck=True)
-
- self.assertEqual(len(outputs), num_unroll)
- self.assertEqual(len(states), num_unroll)
- with tf.Session() as sess:
- sess.run(tf.global_variables_initializer())
- results = sess.run(
- (outputs, states, inputs_large, inputs_small, initial_state))
- outputs_results = results[0]
- states_results = results[1]
- initial_states_results = results[4]
-
- # The first step should always update state.
- self.assertAllEqual(states_results[0][0],
- np.multiply(initial_states_results[0], 2))
- self.assertAllEqual(states_results[0][1], initial_states_results[1])
- for i in range(num_unroll):
- self.assertEqual(
- outputs_results[i].shape,
- (batch_size, width, height, bottleneck_channels + num_units))
- self.assertEqual(states_results[i][0].shape,
- (batch_size, width, height, num_units))
- self.assertEqual(states_results[i][1].shape,
- (batch_size, width, height, num_units))
-
- def test_rnn_decoder_multiple_unroll_with_skip(self):
- batch_size = 2
- num_unroll = 5
- num_units = 12
- width = 8
- height = 10
- input_channels_large = 24
- input_channels_small = 12
- bottleneck_channels = 20
- skip = 2
-
- initial_state_c = tf.random_normal((batch_size, width, height, num_units))
- initial_state_h = tf.random_normal((batch_size, width, height, num_units))
- initial_state = (initial_state_c, initial_state_h)
- inputs_large = tf.random_normal(
- [batch_size, width, height, input_channels_large])
- inputs_small = tf.random_normal(
- [batch_size, width, height, input_channels_small])
-
- rnn_cell = MockRnnCell(bottleneck_channels, num_units)
- outputs, states = rnn_decoder.multi_input_rnn_decoder(
- decoder_inputs=[[inputs_large] * num_unroll,
- [inputs_small] * num_unroll],
- initial_state=initial_state,
- cell=rnn_cell,
- sequence_step=tf.zeros([batch_size]),
- pre_bottleneck=True,
- selection_strategy='SKIP%d' % skip)
-
- self.assertEqual(len(outputs), num_unroll)
- self.assertEqual(len(states), num_unroll)
- with tf.Session() as sess:
- sess.run(tf.global_variables_initializer())
- results = sess.run(
- (outputs, states, inputs_large, inputs_small, initial_state))
- outputs_results = results[0]
- states_results = results[1]
- initial_states_results = results[4]
-
- for i in range(num_unroll):
- self.assertEqual(
- outputs_results[i].shape,
- (batch_size, width, height, bottleneck_channels + num_units))
- self.assertEqual(states_results[i][0].shape,
- (batch_size, width, height, num_units))
- self.assertEqual(states_results[i][1].shape,
- (batch_size, width, height, num_units))
-
- previous_state = (
- initial_states_results if i == 0 else states_results[i - 1])
- # State only updates during key frames
- if i % (skip + 1) == 0:
- self.assertAllEqual(states_results[i][0],
- np.multiply(previous_state[0], 2))
- self.assertAllEqual(states_results[i][1], previous_state[1])
- else:
- self.assertAllEqual(states_results[i][0], previous_state[0])
- self.assertAllEqual(states_results[i][1], previous_state[1])
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/lstm_object_detection/lstm/utils.py b/research/lstm_object_detection/lstm/utils.py
deleted file mode 100644
index 0c87db4bb20..00000000000
--- a/research/lstm_object_detection/lstm/utils.py
+++ /dev/null
@@ -1,257 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Quantization related ops for LSTM."""
-
-from __future__ import absolute_import
-from __future__ import division
-
-import tensorflow.compat.v1 as tf
-from tensorflow.contrib import framework as contrib_framework
-from tensorflow.contrib import layers as contrib_layers
-from tensorflow.python.training import moving_averages
-
-
-def _quant_var(
- name,
- initializer_val,
- vars_collection=tf.GraphKeys.MOVING_AVERAGE_VARIABLES,
-):
- """Create an var for storing the min/max quantization range."""
- return contrib_framework.model_variable(
- name,
- shape=[],
- initializer=tf.constant_initializer(initializer_val),
- collections=[vars_collection],
- trainable=False)
-
-
-def quantizable_concat(inputs,
- axis,
- is_training,
- is_quantized=True,
- default_min=0,
- default_max=6,
- ema_decay=0.999,
- scope='quantized_concat'):
- """Concat replacement with quantization option.
-
- Allows concat inputs to share the same min/max ranges,
- from experimental/gazelle/synthetic/model/tpu/utils.py.
-
- Args:
- inputs: list of tensors to concatenate.
- axis: dimension along which to concatenate.
- is_training: true if the graph is a training graph.
- is_quantized: flag to enable/disable quantization.
- default_min: default min value for fake quant op.
- default_max: default max value for fake quant op.
- ema_decay: the moving average decay for the quantization variables.
- scope: Optional scope for variable_scope.
-
- Returns:
- Tensor resulting from concatenation of input tensors
- """
- if is_quantized:
- with tf.variable_scope(scope):
- tf.logging.info('inputs: {}'.format(inputs))
- for t in inputs:
- tf.logging.info(t)
-
- min_var = _quant_var('min', default_min)
- max_var = _quant_var('max', default_max)
- if not is_training:
- # If we are building an eval graph just use the values in the variables.
- quant_inputs = [
- tf.fake_quant_with_min_max_vars(t, min_var, max_var) for t in inputs
- ]
- tf.logging.info('min_val: {}'.format(min_var))
- tf.logging.info('max_val: {}'.format(max_var))
- else:
- concat_tensors = tf.concat(inputs, axis=axis)
- tf.logging.info('concat_tensors: {}'.format(concat_tensors))
- # TFLite requires that 0.0 is always in the [min; max] range.
- range_min = tf.minimum(
- tf.reduce_min(concat_tensors), 0.0, name='SafeQuantRangeMin')
- range_max = tf.maximum(
- tf.reduce_max(concat_tensors), 0.0, name='SafeQuantRangeMax')
- # Otherwise we need to keep track of the moving averages of the min and
- # max of the elements of the input tensor.
- min_val = moving_averages.assign_moving_average(
- min_var,
- range_min,
- ema_decay,
- name='AssignMinEma')
- max_val = moving_averages.assign_moving_average(
- max_var,
- range_max,
- ema_decay,
- name='AssignMaxEma')
- tf.logging.info('min_val: {}'.format(min_val))
- tf.logging.info('max_val: {}'.format(max_val))
- quant_inputs = [
- tf.fake_quant_with_min_max_vars(t, min_val, max_val) for t in inputs
- ]
- tf.logging.info('quant_inputs: {}'.format(quant_inputs))
- outputs = tf.concat(quant_inputs, axis=axis)
- tf.logging.info('outputs: {}'.format(outputs))
- else:
- outputs = tf.concat(inputs, axis=axis)
- return outputs
-
-
-def quantizable_separable_conv2d(inputs,
- num_outputs,
- kernel_size,
- is_quantized=True,
- depth_multiplier=1,
- stride=1,
- activation_fn=tf.nn.relu6,
- normalizer_fn=None,
- weights_initializer=None,
- pointwise_initializer=None,
- scope=None):
- """Quantization friendly backward compatible separable conv2d.
-
- This op has the same API as separable_conv2d. The main difference is that an
- additional BiasAdd is manually inserted after the depthwise conv, so that
- the depthwise bias does not have a name conflict with the pointwise bias. The
- motivation for this op is that the quantization script needs a BiasAdd in
- order to recognize the op, which a native call to separable_conv2d does not
- create for the depthwise conv.
-
- Args:
- inputs: A tensor of size [batch_size, height, width, channels].
- num_outputs: The number of pointwise convolution output filters. If it is
- None, the pointwise convolution stage is skipped.
- kernel_size: A list of length 2: [kernel_height, kernel_width] of the
- filters. Can be an int if both values are the same.
- is_quantized: flag to enable/disable quantization.
- depth_multiplier: The number of depthwise convolution output channels for
- each input channel. The total number of depthwise convolution output
- channels will be equal to num_filters_in * depth_multiplier.
- stride: A list of length 2: [stride_height, stride_width], specifying the
- depthwise convolution stride. Can be an int if both strides are the same.
- activation_fn: Activation function. The default value is a ReLU function.
- Explicitly set it to None to skip it and maintain a linear activation.
- normalizer_fn: Normalization function to use instead of biases.
- weights_initializer: An initializer for the depthwise weights.
- pointwise_initializer: An initializer for the pointwise weights.
- scope: Optional scope for variable_scope.
-
- Returns:
- Tensor resulting from the separable convolution.
- """
- if is_quantized:
- outputs = contrib_layers.separable_conv2d(
- inputs,
- None,
- kernel_size,
- depth_multiplier=depth_multiplier,
- stride=1,
- activation_fn=None,
- normalizer_fn=None,
- biases_initializer=None,
- weights_initializer=weights_initializer,
- pointwise_initializer=None,
- scope=scope)
- outputs = contrib_layers.bias_add(
- outputs, trainable=True, scope='%s_bias' % scope)
- outputs = contrib_layers.conv2d(
- outputs,
- num_outputs, [1, 1],
- activation_fn=activation_fn,
- stride=stride,
- normalizer_fn=normalizer_fn,
- weights_initializer=pointwise_initializer,
- scope=scope)
- else:
- outputs = contrib_layers.separable_conv2d(
- inputs,
- num_outputs,
- kernel_size,
- depth_multiplier=depth_multiplier,
- stride=stride,
- activation_fn=activation_fn,
- normalizer_fn=normalizer_fn,
- weights_initializer=weights_initializer,
- pointwise_initializer=pointwise_initializer,
- scope=scope)
- return outputs
-
-
-def quantize_op(inputs,
- is_training=True,
- is_quantized=True,
- default_min=0,
- default_max=6,
- ema_decay=0.999,
- scope='quant'):
- """Inserts a fake quantization op after inputs.
-
- Args:
- inputs: A tensor of size [batch_size, height, width, channels].
- is_training: true if the graph is a training graph.
- is_quantized: flag to enable/disable quantization.
- default_min: default min value for fake quant op.
- default_max: default max value for fake quant op.
- ema_decay: the moving average decay for the quantization variables.
- scope: Optional scope for variable_scope.
-
- Returns:
- Tensor resulting from quantizing the input tensors.
- """
- if not is_quantized:
- return inputs
-
- with tf.variable_scope(scope):
- min_var = _quant_var('min', default_min)
- max_var = _quant_var('max', default_max)
- if not is_training:
- # Just use variables in the checkpoint.
- return tf.fake_quant_with_min_max_vars(inputs, min_var, max_var)
-
- # While training, collect EMAs of ranges seen, store in min_var, max_var.
- # TFLite requires that 0.0 is always in the [min; max] range.
- range_min = tf.minimum(tf.reduce_min(inputs), 0.0, 'SafeQuantRangeMin')
- # We set the lower bound of range_max to prevent range collapse.
- range_max = tf.maximum(tf.reduce_max(inputs), 1e-5, 'SafeQuantRangeMax')
- min_val = moving_averages.assign_moving_average(
- min_var, range_min, ema_decay, name='AssignMinEma')
- max_val = moving_averages.assign_moving_average(
- max_var, range_max, ema_decay, name='AssignMaxEma')
- return tf.fake_quant_with_min_max_vars(inputs, min_val, max_val)
-
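For reference, the moving-average update applied to the range variables during training (via `moving_averages.assign_moving_average`) is equivalent to the plain-Python sketch below; the decay and observed ranges are made-up values, and the real op additionally clamps the range so that it always contains 0.0.

```python
# EMA update of the quantization range variables (illustrative numbers only).
ema_decay = 0.999
min_var, max_var = 0.0, 6.0        # current values of the range variables
range_min, range_max = -0.3, 4.2   # observed (clamped) batch min / max

min_var -= (1 - ema_decay) * (min_var - range_min)
max_var -= (1 - ema_decay) * (max_var - range_max)
print(min_var, max_var)  # approximately -0.0003 and 5.9982
```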
-
-def fixed_quantize_op(inputs, is_quantized=True,
- fixed_min=0.0, fixed_max=6.0, scope='quant'):
- """Inserts a fake quantization op with fixed range after inputs.
-
- Args:
- inputs: A tensor of size [batch_size, height, width, channels].
- is_quantized: flag to enable/disable quantization.
- fixed_min: fixed min value for fake quant op.
- fixed_max: fixed max value for fake quant op.
- scope: Optional scope for variable_scope.
-
- Returns:
- Tensor resulting from quantizing the input tensors.
- """
- if not is_quantized:
- return inputs
-
- with tf.variable_scope(scope):
- # Just use fixed quantization range.
- return tf.fake_quant_with_min_max_args(inputs, fixed_min, fixed_max)
diff --git a/research/lstm_object_detection/lstm/utils_test.py b/research/lstm_object_detection/lstm/utils_test.py
deleted file mode 100644
index f5f5bc75db8..00000000000
--- a/research/lstm_object_detection/lstm/utils_test.py
+++ /dev/null
@@ -1,149 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for lstm_object_detection.lstm.utils."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow.compat.v1 as tf
-from lstm_object_detection.lstm import utils
-
-
-class QuantizableUtilsTest(tf.test.TestCase):
-
- def test_quantizable_concat_is_training(self):
- inputs_1 = tf.zeros([4, 10, 10, 1], dtype=tf.float32)
- inputs_2 = tf.ones([4, 10, 10, 2], dtype=tf.float32)
- concat_in_train = utils.quantizable_concat([inputs_1, inputs_2],
- axis=3,
- is_training=True)
- self.assertAllEqual([4, 10, 10, 3], concat_in_train.shape.as_list())
- self._check_min_max_ema(tf.get_default_graph())
- self._check_min_max_vars(tf.get_default_graph())
-
- def test_quantizable_concat_inference(self):
- inputs_1 = tf.zeros([4, 10, 10, 1], dtype=tf.float32)
- inputs_2 = tf.ones([4, 10, 10, 2], dtype=tf.float32)
- concat_in_train = utils.quantizable_concat([inputs_1, inputs_2],
- axis=3,
- is_training=False)
- self.assertAllEqual([4, 10, 10, 3], concat_in_train.shape.as_list())
- self._check_no_min_max_ema(tf.get_default_graph())
- self._check_min_max_vars(tf.get_default_graph())
-
- def test_quantizable_concat_not_quantized_is_training(self):
- inputs_1 = tf.zeros([4, 10, 10, 1], dtype=tf.float32)
- inputs_2 = tf.ones([4, 10, 10, 2], dtype=tf.float32)
- concat_in_train = utils.quantizable_concat([inputs_1, inputs_2],
- axis=3,
- is_training=True,
- is_quantized=False)
- self.assertAllEqual([4, 10, 10, 3], concat_in_train.shape.as_list())
- self._check_no_min_max_ema(tf.get_default_graph())
- self._check_no_min_max_vars(tf.get_default_graph())
-
- def test_quantizable_concat_not_quantized_inference(self):
- inputs_1 = tf.zeros([4, 10, 10, 1], dtype=tf.float32)
- inputs_2 = tf.ones([4, 10, 10, 2], dtype=tf.float32)
- concat_in_train = utils.quantizable_concat([inputs_1, inputs_2],
- axis=3,
- is_training=False,
- is_quantized=False)
- self.assertAllEqual([4, 10, 10, 3], concat_in_train.shape.as_list())
- self._check_no_min_max_ema(tf.get_default_graph())
- self._check_no_min_max_vars(tf.get_default_graph())
-
- def test_quantize_op_is_training(self):
- inputs = tf.zeros([4, 10, 10, 128], dtype=tf.float32)
- outputs = utils.quantize_op(inputs)
- self.assertAllEqual(inputs.shape.as_list(), outputs.shape.as_list())
- self._check_min_max_ema(tf.get_default_graph())
- self._check_min_max_vars(tf.get_default_graph())
-
- def test_quantize_op_inference(self):
- inputs = tf.zeros([4, 10, 10, 128], dtype=tf.float32)
- outputs = utils.quantize_op(inputs, is_training=False)
- self.assertAllEqual(inputs.shape.as_list(), outputs.shape.as_list())
- self._check_no_min_max_ema(tf.get_default_graph())
- self._check_min_max_vars(tf.get_default_graph())
-
- def test_fixed_quantize_op(self):
- inputs = tf.zeros([4, 10, 10, 128], dtype=tf.float32)
- outputs = utils.fixed_quantize_op(inputs)
- self.assertAllEqual(inputs.shape.as_list(), outputs.shape.as_list())
- self._check_no_min_max_ema(tf.get_default_graph())
- self._check_no_min_max_vars(tf.get_default_graph())
-
- def _check_min_max_vars(self, graph):
- op_types = [op.type for op in graph.get_operations()]
- self.assertTrue(
- any('FakeQuantWithMinMaxVars' in op_type for op_type in op_types))
-
- def _check_min_max_ema(self, graph):
- op_names = [op.name for op in graph.get_operations()]
- self.assertTrue(any('AssignMinEma' in name for name in op_names))
- self.assertTrue(any('AssignMaxEma' in name for name in op_names))
- self.assertTrue(any('SafeQuantRangeMin' in name for name in op_names))
- self.assertTrue(any('SafeQuantRangeMax' in name for name in op_names))
-
- def _check_no_min_max_vars(self, graph):
- op_types = [op.type for op in graph.get_operations()]
- self.assertFalse(
- any('FakeQuantWithMinMaxVars' in op_type for op_type in op_types))
-
- def _check_no_min_max_ema(self, graph):
- op_names = [op.name for op in graph.get_operations()]
- self.assertFalse(any('AssignMinEma' in name for name in op_names))
- self.assertFalse(any('AssignMaxEma' in name for name in op_names))
- self.assertFalse(any('SafeQuantRangeMin' in name for name in op_names))
- self.assertFalse(any('SafeQuantRangeMax' in name for name in op_names))
-
-
-class QuantizableSeparableConv2dTest(tf.test.TestCase):
-
- def test_quantizable_separable_conv2d(self):
- inputs = tf.zeros([4, 10, 10, 128], dtype=tf.float32)
- num_outputs = 64
- kernel_size = [3, 3]
- scope = 'QuantSeparable'
- outputs = utils.quantizable_separable_conv2d(
- inputs, num_outputs, kernel_size, scope=scope)
- self.assertAllEqual([4, 10, 10, num_outputs], outputs.shape.as_list())
- self._check_depthwise_bias_add(tf.get_default_graph(), scope)
-
- def test_quantizable_separable_conv2d_not_quantized(self):
- inputs = tf.zeros([4, 10, 10, 128], dtype=tf.float32)
- num_outputs = 64
- kernel_size = [3, 3]
- scope = 'QuantSeparable'
- outputs = utils.quantizable_separable_conv2d(
- inputs, num_outputs, kernel_size, is_quantized=False, scope=scope)
- self.assertAllEqual([4, 10, 10, num_outputs], outputs.shape.as_list())
- self._check_no_depthwise_bias_add(tf.get_default_graph(), scope)
-
- def _check_depthwise_bias_add(self, graph, scope):
- op_names = [op.name for op in graph.get_operations()]
- self.assertTrue(
- any('%s_bias/BiasAdd' % scope in name for name in op_names))
-
- def _check_no_depthwise_bias_add(self, graph, scope):
- op_names = [op.name for op in graph.get_operations()]
- self.assertFalse(
- any('%s_bias/BiasAdd' % scope in name for name in op_names))
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/lstm_object_detection/meta_architectures/__init__.py b/research/lstm_object_detection/meta_architectures/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/lstm_object_detection/meta_architectures/lstm_ssd_meta_arch.py b/research/lstm_object_detection/meta_architectures/lstm_ssd_meta_arch.py
deleted file mode 100644
index 22edc97ee34..00000000000
--- a/research/lstm_object_detection/meta_architectures/lstm_ssd_meta_arch.py
+++ /dev/null
@@ -1,463 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""LSTM SSD Meta-architecture definition.
-
-General tensorflow implementation of convolutional Multibox/SSD detection
-models with LSTM states, for use on video data. This implementation supports
-both the regular LSTM-SSD and the interleaved LSTM-SSD frameworks.
-
-See https://arxiv.org/abs/1711.06368 and https://arxiv.org/abs/1903.10172
-for details.
-"""
-import abc
-import re
-import tensorflow.compat.v1 as tf
-
-from object_detection.core import box_list_ops
-from object_detection.core import matcher
-from object_detection.core import standard_fields as fields
-from object_detection.meta_architectures import ssd_meta_arch
-from object_detection.utils import ops
-from object_detection.utils import shape_utils
-
-
-class LSTMSSDMetaArch(ssd_meta_arch.SSDMetaArch):
- """LSTM Meta-architecture definition."""
-
- def __init__(self,
- is_training,
- anchor_generator,
- box_predictor,
- box_coder,
- feature_extractor,
- encode_background_as_zeros,
- image_resizer_fn,
- non_max_suppression_fn,
- score_conversion_fn,
- classification_loss,
- localization_loss,
- classification_loss_weight,
- localization_loss_weight,
- normalize_loss_by_num_matches,
- hard_example_miner,
- unroll_length,
- target_assigner_instance,
- add_summaries=True):
- super(LSTMSSDMetaArch, self).__init__(
- is_training=is_training,
- anchor_generator=anchor_generator,
- box_predictor=box_predictor,
- box_coder=box_coder,
- feature_extractor=feature_extractor,
- encode_background_as_zeros=encode_background_as_zeros,
- image_resizer_fn=image_resizer_fn,
- non_max_suppression_fn=non_max_suppression_fn,
- score_conversion_fn=score_conversion_fn,
- classification_loss=classification_loss,
- localization_loss=localization_loss,
- classification_loss_weight=classification_loss_weight,
- localization_loss_weight=localization_loss_weight,
- normalize_loss_by_num_matches=normalize_loss_by_num_matches,
- hard_example_miner=hard_example_miner,
- target_assigner_instance=target_assigner_instance,
- add_summaries=add_summaries)
- self._unroll_length = unroll_length
-
- @property
- def unroll_length(self):
- return self._unroll_length
-
- @unroll_length.setter
- def unroll_length(self, unroll_length):
- self._unroll_length = unroll_length
-
- def predict(self, preprocessed_inputs, true_image_shapes, states=None,
- state_name='lstm_state', feature_scope=None):
- with tf.variable_scope(self._extract_features_scope,
- values=[preprocessed_inputs], reuse=tf.AUTO_REUSE):
- feature_maps = self._feature_extractor.extract_features(
- preprocessed_inputs, states, state_name,
- unroll_length=self._unroll_length, scope=feature_scope)
- feature_map_spatial_dims = self._get_feature_map_spatial_dims(feature_maps)
- image_shape = shape_utils.combined_static_and_dynamic_shape(
- preprocessed_inputs)
- self._batch_size = preprocessed_inputs.shape[0].value / self._unroll_length
- self._states = states
- anchors = self._anchor_generator.generate(feature_map_spatial_dims,
- im_height=image_shape[1],
- im_width=image_shape[2])
- with tf.variable_scope('MultipleGridAnchorGenerator', reuse=tf.AUTO_REUSE):
- self._anchors = box_list_ops.concatenate(anchors)
- prediction_dict = self._box_predictor.predict(
- feature_maps, self._anchor_generator.num_anchors_per_location())
- with tf.variable_scope('Loss', reuse=tf.AUTO_REUSE):
- box_encodings = tf.concat(prediction_dict['box_encodings'], axis=1)
- if box_encodings.shape.ndims == 4 and box_encodings.shape[2] == 1:
- box_encodings = tf.squeeze(box_encodings, axis=2)
- class_predictions_with_background = tf.concat(
- prediction_dict['class_predictions_with_background'], axis=1)
- predictions_dict = {
- 'preprocessed_inputs': preprocessed_inputs,
- 'box_encodings': box_encodings,
- 'class_predictions_with_background': class_predictions_with_background,
- 'feature_maps': feature_maps,
- 'anchors': self._anchors.get(),
- 'states_and_outputs': self._feature_extractor.states_and_outputs,
- }
- # In cases such as exporting the model, the states are always zero. Thus the
- # step should be ignored.
- if states is not None:
- predictions_dict['step'] = self._feature_extractor.step
- return predictions_dict
-
- def loss(self, prediction_dict, true_image_shapes, scope=None):
- """Computes scalar loss tensors with respect to provided groundtruth.
-
- Calling this function requires that groundtruth tensors have been
- provided via the provide_groundtruth function.
-
- Args:
- prediction_dict: a dictionary holding prediction tensors with
- 1) box_encodings: 3-D float tensor of shape [batch_size, num_anchors,
- box_code_dimension] containing predicted boxes.
- 2) class_predictions_with_background: 3-D float tensor of shape
- [batch_size, num_anchors, num_classes+1] containing class predictions
- (logits) for each of the anchors. Note that this tensor *includes*
- background class predictions.
- true_image_shapes: int32 tensor of shape [batch, 3] where each row is
- of the form [height, width, channels] indicating the shapes
- of true images in the resized images, as resized images can be padded
- with zeros.
- scope: Optional scope name.
-
- Returns:
- a dictionary mapping loss keys (`localization_loss` and
- `classification_loss`) to scalar tensors representing corresponding loss
- values.
- """
- with tf.name_scope(scope, 'Loss', prediction_dict.values()):
- keypoints = None
- if self.groundtruth_has_field(fields.BoxListFields.keypoints):
- keypoints = self.groundtruth_lists(fields.BoxListFields.keypoints)
- weights = None
- if self.groundtruth_has_field(fields.BoxListFields.weights):
- weights = self.groundtruth_lists(fields.BoxListFields.weights)
- (batch_cls_targets, batch_cls_weights, batch_reg_targets,
- batch_reg_weights, batch_match) = self._assign_targets(
- self.groundtruth_lists(fields.BoxListFields.boxes),
- self.groundtruth_lists(fields.BoxListFields.classes),
- keypoints, weights)
- match_list = [matcher.Match(match) for match in tf.unstack(batch_match)]
- if self._add_summaries:
- self._summarize_target_assignment(
- self.groundtruth_lists(fields.BoxListFields.boxes), match_list)
- location_losses = self._localization_loss(
- prediction_dict['box_encodings'],
- batch_reg_targets,
- ignore_nan_targets=True,
- weights=batch_reg_weights)
- cls_losses = ops.reduce_sum_trailing_dimensions(
- self._classification_loss(
- prediction_dict['class_predictions_with_background'],
- batch_cls_targets,
- weights=batch_cls_weights),
- ndims=2)
-
- if self._hard_example_miner:
- (loc_loss_list, cls_loss_list) = self._apply_hard_mining(
- location_losses, cls_losses, prediction_dict, match_list)
- localization_loss = tf.reduce_sum(tf.stack(loc_loss_list))
- classification_loss = tf.reduce_sum(tf.stack(cls_loss_list))
-
- if self._add_summaries:
- self._hard_example_miner.summarize()
- else:
- if self._add_summaries:
- class_ids = tf.argmax(batch_cls_targets, axis=2)
- flattened_class_ids = tf.reshape(class_ids, [-1])
- flattened_classification_losses = tf.reshape(cls_losses, [-1])
- self._summarize_anchor_classification_loss(
- flattened_class_ids, flattened_classification_losses)
- localization_loss = tf.reduce_sum(location_losses)
- classification_loss = tf.reduce_sum(cls_losses)
-
- # Optionally normalize by number of positive matches
- normalizer = tf.constant(1.0, dtype=tf.float32)
- if self._normalize_loss_by_num_matches:
- normalizer = tf.maximum(tf.to_float(tf.reduce_sum(batch_reg_weights)),
- 1.0)
-
- with tf.name_scope('localization_loss'):
- localization_loss_normalizer = normalizer
- if self._normalize_loc_loss_by_codesize:
- localization_loss_normalizer *= self._box_coder.code_size
- localization_loss = ((self._localization_loss_weight / (
- localization_loss_normalizer)) * localization_loss)
- with tf.name_scope('classification_loss'):
- classification_loss = ((self._classification_loss_weight / normalizer) *
- classification_loss)
-
- loss_dict = {
- 'localization_loss': localization_loss,
- 'classification_loss': classification_loss
- }
- return loss_dict
-
- def restore_map(self, fine_tune_checkpoint_type='lstm'):
- """Returns a map of variables to load from a foreign checkpoint.
-
- See parent class for details.
-
- Args:
- fine_tune_checkpoint_type: the type of checkpoint to restore from, either an
- SSD/LSTM detection checkpoint (with compatible variable names) or a
- classification checkpoint for initialization prior to training.
- Available options: `classification`, `detection`, `interleaved`,
- `interleaved_pretrain`, and `lstm`.
-
- Returns:
- A dict mapping variable names (to load from a checkpoint) to variables in
- the model graph.
- Raises:
- ValueError: if fine_tune_checkpoint_type is not among
- `classification`/`detection`/`interleaved`/`interleaved_pretrain`/`lstm`.
- """
- if fine_tune_checkpoint_type not in [
- 'classification', 'detection', 'interleaved', 'lstm',
- 'interleaved_pretrain'
- ]:
- raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
- fine_tune_checkpoint_type))
-
- self._restored_networks += 1
- base_network_scope = self.get_base_network_scope()
- if base_network_scope:
- scope_to_replace = '{0}_{1}'.format(base_network_scope,
- self._restored_networks)
-
- interleaved_model = False
- for variable in tf.global_variables():
- if scope_to_replace in variable.op.name:
- interleaved_model = True
- break
-
- variables_to_restore = {}
- for variable in tf.global_variables():
- var_name = variable.op.name
- if 'global_step' in var_name:
- continue
-
- # Remove FeatureExtractor prefix for classification checkpoints.
- if (fine_tune_checkpoint_type == 'classification' or
- fine_tune_checkpoint_type == 'interleaved_pretrain'):
- var_name = (
- re.split('^' + self._extract_features_scope + '/', var_name)[-1])
-
- # When loading from single frame detection checkpoints, we need to
- # remap FeatureMaps variable names.
- if ('FeatureMaps' in var_name and
- fine_tune_checkpoint_type == 'detection'):
- var_name = var_name.replace('FeatureMaps',
- self.get_base_network_scope())
-
- # Load interleaved checkpoint specifically.
- if interleaved_model: # Interleaved LSTD.
- if 'interleaved' in fine_tune_checkpoint_type:
- variables_to_restore[var_name] = variable
- else:
- # Restore non-base layers from the first checkpoint only.
- if self._restored_networks == 1:
- if base_network_scope + '_' not in var_name: # LSTM and FeatureMap
- variables_to_restore[var_name] = variable
- if scope_to_replace in var_name:
- var_name = var_name.replace(scope_to_replace, base_network_scope)
- variables_to_restore[var_name] = variable
- else:
- # Restore from the first model of interleaved checkpoints
- if 'interleaved' in fine_tune_checkpoint_type:
- var_name = var_name.replace(self.get_base_network_scope(),
- self.get_base_network_scope() + '_1', 1)
-
- variables_to_restore[var_name] = variable
-
- return variables_to_restore
-
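As a sketch of the scope rewrite performed above when an interleaved graph is restored from a single-network checkpoint (the variable and scope names here are hypothetical):

```python
# Hypothetical example of the name remapping done in restore_map.
base_network_scope = 'MobilenetV1'
restored_networks = 1
scope_to_replace = '{0}_{1}'.format(base_network_scope, restored_networks)

graph_var_name = 'FeatureExtractor/MobilenetV1_1/Conv2d_0/weights'
checkpoint_var_name = graph_var_name.replace(scope_to_replace, base_network_scope)
print(checkpoint_var_name)  # FeatureExtractor/MobilenetV1/Conv2d_0/weights
```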
- def get_base_network_scope(self):
- """Returns the variable scope of the base network.
-
- Returns:
- The variable scope of the feature extractor base network, e.g. MobilenetV1
- """
- return self._feature_extractor.get_base_network_scope()
-
-
-class LSTMSSDFeatureExtractor(ssd_meta_arch.SSDFeatureExtractor):
- """LSTM SSD Meta-architecture Feature Extractor definition."""
-
- __metaclass__ = abc.ABCMeta
-
- @property
- def clip_state(self):
- return self._clip_state
-
- @clip_state.setter
- def clip_state(self, clip_state):
- self._clip_state = clip_state
-
- @property
- def depth_multipliers(self):
- return self._depth_multipliers
-
- @depth_multipliers.setter
- def depth_multipliers(self, depth_multipliers):
- self._depth_multipliers = depth_multipliers
-
- @property
- def lstm_state_depth(self):
- return self._lstm_state_depth
-
- @lstm_state_depth.setter
- def lstm_state_depth(self, lstm_state_depth):
- self._lstm_state_depth = lstm_state_depth
-
- @property
- def is_quantized(self):
- return self._is_quantized
-
- @is_quantized.setter
- def is_quantized(self, is_quantized):
- self._is_quantized = is_quantized
-
- @property
- def interleaved(self):
- return False
-
- @property
- def states_and_outputs(self):
- """LSTM states and outputs.
-
- This variable includes both LSTM states {C_t} and outputs {h_t}.
-
- Returns:
- states_and_outputs: A list of 4-D float tensors, including the lstm state
- and output at each timestep.
- """
- return self._states_out
-
- @property
- def step(self):
- return self._step
-
- def preprocess(self, resized_inputs):
- """SSD preprocessing.
-
- Maps pixel values to the range [-1, 1].
-
- Args:
- resized_inputs: a [batch, height, width, channels] float tensor
- representing a batch of images.
-
- Returns:
- preprocessed_inputs: a [batch, height, width, channels] float tensor
- representing a batch of images.
- """
- return (2.0 / 255.0) * resized_inputs - 1.0
-
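The normalization above maps pixel values in [0, 255] onto [-1, 1]; a quick sanity check of the formula (a sketch, not part of the original file):

```python
# (2.0 / 255.0) * x - 1.0 sends 0 -> -1.0, 127.5 -> 0.0, 255 -> 1.0.
for pixel in (0.0, 127.5, 255.0):
    print(pixel, (2.0 / 255.0) * pixel - 1.0)
```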
- def get_base_network_scope(self):
- """Returns the variable scope of the base network.
-
- Returns:
- The variable scope of the base network, e.g. MobilenetV1
- """
- return self._base_network_scope
-
- @abc.abstractmethod
- def create_lstm_cell(self, batch_size, output_size, state_saver, state_name):
- """Create the LSTM cell, and initialize state if necessary.
-
- Args:
- batch_size: input batch size.
- output_size: output size of the lstm cell, [width, height].
- state_saver: a state saver object with methods `state` and `save_state`.
- state_name: string, the name to use with the state_saver.
- Returns:
- lstm_cell: the lstm cell unit.
- init_state: initial state representations.
- step: the step
- """
- pass
-
-
-class LSTMSSDInterleavedFeatureExtractor(LSTMSSDFeatureExtractor):
- """LSTM SSD Meta-architecture Interleaved Feature Extractor definition."""
-
- __metaclass__ = abc.ABCMeta
-
- @property
- def pre_bottleneck(self):
- return self._pre_bottleneck
-
- @pre_bottleneck.setter
- def pre_bottleneck(self, pre_bottleneck):
- self._pre_bottleneck = pre_bottleneck
-
- @property
- def low_res(self):
- return self._low_res
-
- @low_res.setter
- def low_res(self, low_res):
- self._low_res = low_res
-
- @property
- def interleaved(self):
- return True
-
- @property
- def interleave_method(self):
- return self._interleave_method
-
- @interleave_method.setter
- def interleave_method(self, interleave_method):
- self._interleave_method = interleave_method
-
- @abc.abstractmethod
- def extract_base_features_large(self, preprocessed_inputs):
- """Extract the large base model features.
-
- Args:
- preprocessed_inputs: preprocessed input images of shape:
- [batch, width, height, depth].
-
- Returns:
- net: the last feature map created from the base feature extractor.
- end_points: a dictionary of feature maps created.
- """
- pass
-
- @abc.abstractmethod
- def extract_base_features_small(self, preprocessed_inputs):
- """Extract the small base model features.
-
- Args:
- preprocessed_inputs: preprocessed input images of shape:
- [batch, width, height, depth].
-
- Returns:
- net: the last feature map created from the base feature extractor.
- end_points: a dictionary of feature maps created.
- """
- pass
diff --git a/research/lstm_object_detection/meta_architectures/lstm_ssd_meta_arch_test.py b/research/lstm_object_detection/meta_architectures/lstm_ssd_meta_arch_test.py
deleted file mode 100644
index 03e8a127460..00000000000
--- a/research/lstm_object_detection/meta_architectures/lstm_ssd_meta_arch_test.py
+++ /dev/null
@@ -1,320 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for meta_architectures.lstm_ssd_meta_arch."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import functools
-
-import numpy as np
-import tensorflow.compat.v1 as tf
-import tf_slim as slim
-
-from lstm_object_detection.lstm import lstm_cells
-from lstm_object_detection.meta_architectures import lstm_ssd_meta_arch
-from object_detection.core import anchor_generator
-from object_detection.core import box_list
-from object_detection.core import losses
-from object_detection.core import post_processing
-from object_detection.core import region_similarity_calculator as sim_calc
-from object_detection.core import standard_fields as fields
-from object_detection.core import target_assigner
-from object_detection.models import feature_map_generators
-from object_detection.utils import test_case
-from object_detection.utils import test_utils
-
-
-MAX_TOTAL_NUM_BOXES = 5
-NUM_CLASSES = 1
-
-
-class FakeLSTMFeatureExtractor(
- lstm_ssd_meta_arch.LSTMSSDFeatureExtractor):
-
- def __init__(self):
- super(FakeLSTMFeatureExtractor, self).__init__(
- is_training=True,
- depth_multiplier=1.0,
- min_depth=0,
- pad_to_multiple=1,
- conv_hyperparams_fn=self.scope_fn)
- self._lstm_state_depth = 256
-
- def scope_fn(self):
- with slim.arg_scope([slim.conv2d], activation_fn=tf.nn.relu6) as sc:
- return sc
-
- def create_lstm_cell(self):
- pass
-
- def extract_features(self, preprocessed_inputs, state_saver=None,
- state_name='lstm_state', unroll_length=5, scope=None):
- with tf.variable_scope('mock_model'):
- net = slim.conv2d(inputs=preprocessed_inputs, num_outputs=32,
- kernel_size=1, scope='layer1')
- image_features = {'last_layer': net}
-
- self._states_out = {}
- feature_map_layout = {
- 'from_layer': ['last_layer'],
- 'layer_depth': [-1],
- 'use_explicit_padding': self._use_explicit_padding,
- 'use_depthwise': self._use_depthwise,
- }
- feature_maps = feature_map_generators.multi_resolution_feature_maps(
- feature_map_layout=feature_map_layout,
- depth_multiplier=(self._depth_multiplier),
- min_depth=self._min_depth,
- insert_1x1_conv=True,
- image_features=image_features)
- return list(feature_maps.values())
-
-
-class FakeLSTMInterleavedFeatureExtractor(
- lstm_ssd_meta_arch.LSTMSSDInterleavedFeatureExtractor):
-
- def __init__(self):
- super(FakeLSTMInterleavedFeatureExtractor, self).__init__(
- is_training=True,
- depth_multiplier=1.0,
- min_depth=0,
- pad_to_multiple=1,
- conv_hyperparams_fn=self.scope_fn)
- self._lstm_state_depth = 256
-
- def scope_fn(self):
- with slim.arg_scope([slim.conv2d], activation_fn=tf.nn.relu6) as sc:
- return sc
-
- def create_lstm_cell(self):
- pass
-
- def extract_base_features_large(self, preprocessed_inputs):
- with tf.variable_scope('base_large'):
- net = slim.conv2d(inputs=preprocessed_inputs, num_outputs=32,
- kernel_size=1, scope='layer1')
- return net
-
- def extract_base_features_small(self, preprocessed_inputs):
- with tf.variable_scope('base_small'):
- net = slim.conv2d(inputs=preprocessed_inputs, num_outputs=32,
- kernel_size=1, scope='layer1')
- return net
-
- def extract_features(self, preprocessed_inputs, state_saver=None,
- state_name='lstm_state', unroll_length=5, scope=None):
- with tf.variable_scope('mock_model'):
- net_large = self.extract_base_features_large(preprocessed_inputs)
- net_small = self.extract_base_features_small(preprocessed_inputs)
- net = slim.conv2d(
- inputs=tf.concat([net_large, net_small], axis=3),
- num_outputs=32,
- kernel_size=1,
- scope='layer1')
- image_features = {'last_layer': net}
-
- self._states_out = {}
- feature_map_layout = {
- 'from_layer': ['last_layer'],
- 'layer_depth': [-1],
- 'use_explicit_padding': self._use_explicit_padding,
- 'use_depthwise': self._use_depthwise,
- }
- feature_maps = feature_map_generators.multi_resolution_feature_maps(
- feature_map_layout=feature_map_layout,
- depth_multiplier=(self._depth_multiplier),
- min_depth=self._min_depth,
- insert_1x1_conv=True,
- image_features=image_features)
- return list(feature_maps.values())
-
-
-class MockAnchorGenerator2x2(anchor_generator.AnchorGenerator):
- """Sets up a simple 2x2 anchor grid on the unit square."""
-
- def name_scope(self):
- return 'MockAnchorGenerator'
-
- def num_anchors_per_location(self):
- return [1]
-
- def _generate(self, feature_map_shape_list, im_height, im_width):
- return [box_list.BoxList(
- tf.constant([[0, 0, .5, .5],
- [0, .5, .5, 1],
- [.5, 0, 1, .5],
- [1., 1., 1.5, 1.5] # Anchor that is outside clip_window.
- ], tf.float32))]
-
- def num_anchors(self):
- return 4
-
-
-class LSTMSSDMetaArchTest(test_case.TestCase):
-
- def _create_model(self,
- interleaved=False,
- apply_hard_mining=True,
- normalize_loc_loss_by_codesize=False,
- add_background_class=True,
- random_example_sampling=False,
- use_expected_classification_loss_under_sampling=False,
- min_num_negative_samples=1,
- desired_negative_sampling_ratio=3,
- unroll_length=1):
- num_classes = NUM_CLASSES
- is_training = False
- mock_anchor_generator = MockAnchorGenerator2x2()
- mock_box_predictor = test_utils.MockBoxPredictor(is_training, num_classes)
- mock_box_coder = test_utils.MockBoxCoder()
- if interleaved:
- fake_feature_extractor = FakeLSTMInterleavedFeatureExtractor()
- else:
- fake_feature_extractor = FakeLSTMFeatureExtractor()
- mock_matcher = test_utils.MockMatcher()
- region_similarity_calculator = sim_calc.IouSimilarity()
- encode_background_as_zeros = False
- def image_resizer_fn(image):
- return [tf.identity(image), tf.shape(image)]
-
- classification_loss = losses.WeightedSigmoidClassificationLoss()
- localization_loss = losses.WeightedSmoothL1LocalizationLoss()
- non_max_suppression_fn = functools.partial(
- post_processing.batch_multiclass_non_max_suppression,
- score_thresh=-20.0,
- iou_thresh=1.0,
- max_size_per_class=5,
- max_total_size=MAX_TOTAL_NUM_BOXES)
- classification_loss_weight = 1.0
- localization_loss_weight = 1.0
- negative_class_weight = 1.0
- normalize_loss_by_num_matches = False
-
- hard_example_miner = None
- if apply_hard_mining:
- # This hard example miner is expected to be a no-op.
- hard_example_miner = losses.HardExampleMiner(
- num_hard_examples=None,
- iou_threshold=1.0)
-
- target_assigner_instance = target_assigner.TargetAssigner(
- region_similarity_calculator,
- mock_matcher,
- mock_box_coder,
- negative_class_weight=negative_class_weight)
-
- code_size = 4
- model = lstm_ssd_meta_arch.LSTMSSDMetaArch(
- is_training=is_training,
- anchor_generator=mock_anchor_generator,
- box_predictor=mock_box_predictor,
- box_coder=mock_box_coder,
- feature_extractor=fake_feature_extractor,
- encode_background_as_zeros=encode_background_as_zeros,
- image_resizer_fn=image_resizer_fn,
- non_max_suppression_fn=non_max_suppression_fn,
- score_conversion_fn=tf.identity,
- classification_loss=classification_loss,
- localization_loss=localization_loss,
- classification_loss_weight=classification_loss_weight,
- localization_loss_weight=localization_loss_weight,
- normalize_loss_by_num_matches=normalize_loss_by_num_matches,
- hard_example_miner=hard_example_miner,
- unroll_length=unroll_length,
- target_assigner_instance=target_assigner_instance,
- add_summaries=False)
- return model, num_classes, mock_anchor_generator.num_anchors(), code_size
-
- def _get_value_for_matching_key(self, dictionary, suffix):
- for key in dictionary.keys():
- if key.endswith(suffix):
- return dictionary[key]
- raise ValueError('key not found {}'.format(suffix))
-
- def test_predict_returns_correct_items_and_sizes(self):
- batch_size = 3
- height = width = 2
- num_unroll = 1
-
- graph = tf.Graph()
- with graph.as_default():
- model, num_classes, num_anchors, code_size = self._create_model()
- preprocessed_images = tf.random_uniform(
- [batch_size * num_unroll, height, width, 3],
- minval=-1.,
- maxval=1.)
- true_image_shapes = tf.tile(
- [[height, width, 3]], [batch_size, 1])
- prediction_dict = model.predict(preprocessed_images, true_image_shapes)
-
-
- self.assertIn('preprocessed_inputs', prediction_dict)
- self.assertIn('box_encodings', prediction_dict)
- self.assertIn('class_predictions_with_background', prediction_dict)
- self.assertIn('feature_maps', prediction_dict)
- self.assertIn('anchors', prediction_dict)
- self.assertAllEqual(
- [batch_size * num_unroll, height, width, 3],
- prediction_dict['preprocessed_inputs'].shape.as_list())
- self.assertAllEqual(
- [batch_size * num_unroll, num_anchors, code_size],
- prediction_dict['box_encodings'].shape.as_list())
- self.assertAllEqual(
- [batch_size * num_unroll, num_anchors, num_classes + 1],
- prediction_dict['class_predictions_with_background'].shape.as_list())
- self.assertAllEqual(
- [num_anchors, code_size],
- prediction_dict['anchors'].shape.as_list())
-
- def test_interleaved_predict_returns_correct_items_and_sizes(self):
- batch_size = 3
- height = width = 2
- num_unroll = 1
-
- graph = tf.Graph()
- with graph.as_default():
- model, num_classes, num_anchors, code_size = self._create_model(
- interleaved=True)
- preprocessed_images = tf.random_uniform(
- [batch_size * num_unroll, height, width, 3],
- minval=-1.,
- maxval=1.)
- true_image_shapes = tf.tile(
- [[height, width, 3]], [batch_size, 1])
- prediction_dict = model.predict(preprocessed_images, true_image_shapes)
-
- self.assertIn('preprocessed_inputs', prediction_dict)
- self.assertIn('box_encodings', prediction_dict)
- self.assertIn('class_predictions_with_background', prediction_dict)
- self.assertIn('feature_maps', prediction_dict)
- self.assertIn('anchors', prediction_dict)
- self.assertAllEqual(
- [batch_size * num_unroll, height, width, 3],
- prediction_dict['preprocessed_inputs'].shape.as_list())
- self.assertAllEqual(
- [batch_size * num_unroll, num_anchors, code_size],
- prediction_dict['box_encodings'].shape.as_list())
- self.assertAllEqual(
- [batch_size * num_unroll, num_anchors, num_classes + 1],
- prediction_dict['class_predictions_with_background'].shape.as_list())
- self.assertAllEqual(
- [num_anchors, code_size],
- prediction_dict['anchors'].shape.as_list())
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/lstm_object_detection/metrics/__init__.py b/research/lstm_object_detection/metrics/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/lstm_object_detection/metrics/coco_evaluation_all_frames.py b/research/lstm_object_detection/metrics/coco_evaluation_all_frames.py
deleted file mode 100644
index 8e6d336cbf7..00000000000
--- a/research/lstm_object_detection/metrics/coco_evaluation_all_frames.py
+++ /dev/null
@@ -1,124 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Class for evaluating video object detections with COCO metrics."""
-
-import tensorflow.compat.v1 as tf
-
-from object_detection.core import standard_fields
-from object_detection.metrics import coco_evaluation
-from object_detection.metrics import coco_tools
-
-
-class CocoEvaluationAllFrames(coco_evaluation.CocoDetectionEvaluator):
- """Class to evaluate COCO detection metrics for frame sequences.
-
- The class overrides two functions: add_single_ground_truth_image_info and
- add_single_detected_image_info.
-
- For video sequence detection evaluation, iterating through the entire
- groundtruth_dict considers all of the unrolled frames in one LSTM training
- sample. Therefore, both groundtruth and detection results of all frames are
- added to the evaluation. This is used when all frames are labeled in the
- video object detection training job.
- """
-
- def add_single_ground_truth_image_info(self, image_id, groundtruth_dict):
- """Add groundtruth results of all frames to the eval pipeline.
-
- This method overrides the function defined in the base class.
-
- Args:
- image_id: A unique string/integer identifier for the image.
- groundtruth_dict: A list of dictionaries, each containing -
- InputDataFields.groundtruth_boxes: float32 numpy array of shape
- [num_boxes, 4] containing `num_boxes` groundtruth boxes of the format
- [ymin, xmin, ymax, xmax] in absolute image coordinates.
- InputDataFields.groundtruth_classes: integer numpy array of shape
- [num_boxes] containing 1-indexed groundtruth classes for the boxes.
- InputDataFields.groundtruth_is_crowd (optional): integer numpy array of
- shape [num_boxes] containing iscrowd flag for groundtruth boxes.
- """
- for idx, gt in enumerate(groundtruth_dict):
- if not gt:
- continue
-
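- # Key each frame as '<image_id>_<frame index>' so that every unrolled
- # frame is evaluated as its own image.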
- image_frame_id = '{}_{}'.format(image_id, idx)
- if image_frame_id in self._image_ids:
- tf.logging.warning(
- 'Ignoring ground truth with image id %s since it was '
- 'previously added', image_frame_id)
- continue
-
- self._groundtruth_list.extend(
- coco_tools.ExportSingleImageGroundtruthToCoco(
- image_id=image_frame_id,
- next_annotation_id=self._annotation_id,
- category_id_set=self._category_id_set,
- groundtruth_boxes=gt[
- standard_fields.InputDataFields.groundtruth_boxes],
- groundtruth_classes=gt[
- standard_fields.InputDataFields.groundtruth_classes]))
- self._annotation_id += (
- gt[standard_fields.InputDataFields.groundtruth_boxes].shape[0])
-
- # Boolean to indicate whether a detection has been added for this image.
- self._image_ids[image_frame_id] = False
-
- def add_single_detected_image_info(self, image_id, detections_dict):
- """Add detection results of all frames to the eval pipeline.
-
- This method overrides the function defined in the base class.
-
- Args:
- image_id: A unique string/integer identifier for the image.
- detections_dict: A list of dictionaries, each containing -
- DetectionResultFields.detection_boxes: float32 numpy array of shape
- [num_boxes, 4] containing `num_boxes` detection boxes of the format
- [ymin, xmin, ymax, xmax] in absolute image coordinates.
- DetectionResultFields.detection_scores: float32 numpy array of shape
- [num_boxes] containing detection scores for the boxes.
- DetectionResultFields.detection_classes: integer numpy array of shape
- [num_boxes] containing 1-indexed detection classes for the boxes.
-
- Raises:
- ValueError: If groundtruth for the image_id is not available.
- """
- for idx, det in enumerate(detections_dict):
- if not det:
- continue
-
- image_frame_id = '{}_{}'.format(image_id, idx)
- if image_frame_id not in self._image_ids:
- raise ValueError(
- 'Missing groundtruth for image-frame id: {}'.format(image_frame_id))
-
- if self._image_ids[image_frame_id]:
- tf.logging.warning(
- 'Ignoring detection with image id %s since it was '
- 'previously added', image_frame_id)
- continue
-
- self._detection_boxes_list.extend(
- coco_tools.ExportSingleImageDetectionBoxesToCoco(
- image_id=image_frame_id,
- category_id_set=self._category_id_set,
- detection_boxes=det[
- standard_fields.DetectionResultFields.detection_boxes],
- detection_scores=det[
- standard_fields.DetectionResultFields.detection_scores],
- detection_classes=det[
- standard_fields.DetectionResultFields.detection_classes]))
- self._image_ids[image_frame_id] = True
diff --git a/research/lstm_object_detection/metrics/coco_evaluation_all_frames_test.py b/research/lstm_object_detection/metrics/coco_evaluation_all_frames_test.py
deleted file mode 100644
index 9c1e7b7546b..00000000000
--- a/research/lstm_object_detection/metrics/coco_evaluation_all_frames_test.py
+++ /dev/null
@@ -1,156 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for video_object_detection.metrics.coco_video_evaluation."""
-
-import numpy as np
-import tensorflow.compat.v1 as tf
-from lstm_object_detection.metrics import coco_evaluation_all_frames
-from object_detection.core import standard_fields
-
-
-class CocoEvaluationAllFramesTest(tf.test.TestCase):
-
- def testGroundtruthAndDetectionsDisagreeOnAllFrames(self):
- """Tests that mAP is calculated on several different frame results."""
- category_list = [{'id': 0, 'name': 'dog'}, {'id': 1, 'name': 'cat'}]
- video_evaluator = coco_evaluation_all_frames.CocoEvaluationAllFrames(
- category_list)
- video_evaluator.add_single_ground_truth_image_info(
- image_id='image1',
- groundtruth_dict=[{
- standard_fields.InputDataFields.groundtruth_boxes:
- np.array([[50., 50., 200., 200.]]),
- standard_fields.InputDataFields.groundtruth_classes:
- np.array([1])
- }, {
- standard_fields.InputDataFields.groundtruth_boxes:
- np.array([[50., 50., 100., 100.]]),
- standard_fields.InputDataFields.groundtruth_classes:
- np.array([1])
- }])
- video_evaluator.add_single_detected_image_info(
- image_id='image1',
- # Detections disagree with the groundtruth on every frame except the last one.
- detections_dict=[{
- standard_fields.DetectionResultFields.detection_boxes:
- np.array([[100., 100., 200., 200.]]),
- standard_fields.DetectionResultFields.detection_scores:
- np.array([.8]),
- standard_fields.DetectionResultFields.detection_classes:
- np.array([1])
- }, {
- standard_fields.DetectionResultFields.detection_boxes:
- np.array([[50., 50., 100., 100.]]),
- standard_fields.DetectionResultFields.detection_scores:
- np.array([.8]),
- standard_fields.DetectionResultFields.detection_classes:
- np.array([1])
- }])
-
- metrics = video_evaluator.evaluate()
- self.assertNotEqual(metrics['DetectionBoxes_Precision/mAP'], 1.0)
-
- def testGroundtruthAndDetections(self):
- """Tests that mAP is calculated correctly on GT and Detections."""
- category_list = [{'id': 0, 'name': 'dog'}, {'id': 1, 'name': 'cat'}]
- video_evaluator = coco_evaluation_all_frames.CocoEvaluationAllFrames(
- category_list)
- video_evaluator.add_single_ground_truth_image_info(
- image_id='image1',
- groundtruth_dict=[{
- standard_fields.InputDataFields.groundtruth_boxes:
- np.array([[100., 100., 200., 200.]]),
- standard_fields.InputDataFields.groundtruth_classes:
- np.array([1])
- }])
- video_evaluator.add_single_ground_truth_image_info(
- image_id='image2',
- groundtruth_dict=[{
- standard_fields.InputDataFields.groundtruth_boxes:
- np.array([[50., 50., 100., 100.]]),
- standard_fields.InputDataFields.groundtruth_classes:
- np.array([1])
- }])
- video_evaluator.add_single_ground_truth_image_info(
- image_id='image3',
- groundtruth_dict=[{
- standard_fields.InputDataFields.groundtruth_boxes:
- np.array([[50., 100., 100., 120.]]),
- standard_fields.InputDataFields.groundtruth_classes:
- np.array([1])
- }])
- video_evaluator.add_single_detected_image_info(
- image_id='image1',
- detections_dict=[{
- standard_fields.DetectionResultFields.detection_boxes:
- np.array([[100., 100., 200., 200.]]),
- standard_fields.DetectionResultFields.detection_scores:
- np.array([.8]),
- standard_fields.DetectionResultFields.detection_classes:
- np.array([1])
- }])
- video_evaluator.add_single_detected_image_info(
- image_id='image2',
- detections_dict=[{
- standard_fields.DetectionResultFields.detection_boxes:
- np.array([[50., 50., 100., 100.]]),
- standard_fields.DetectionResultFields.detection_scores:
- np.array([.8]),
- standard_fields.DetectionResultFields.detection_classes:
- np.array([1])
- }])
- video_evaluator.add_single_detected_image_info(
- image_id='image3',
- detections_dict=[{
- standard_fields.DetectionResultFields.detection_boxes:
- np.array([[50., 100., 100., 120.]]),
- standard_fields.DetectionResultFields.detection_scores:
- np.array([.8]),
- standard_fields.DetectionResultFields.detection_classes:
- np.array([1])
- }])
- metrics = video_evaluator.evaluate()
- self.assertAlmostEqual(metrics['DetectionBoxes_Precision/mAP'], 1.0)
-
- def testMissingDetectionResults(self):
- """Tests if groundtrue is missing, raises ValueError."""
- category_list = [{'id': 0, 'name': 'dog'}]
- video_evaluator = coco_evaluation_all_frames.CocoEvaluationAllFrames(
- category_list)
- video_evaluator.add_single_ground_truth_image_info(
- image_id='image1',
- groundtruth_dict=[{
- standard_fields.InputDataFields.groundtruth_boxes:
- np.array([[100., 100., 200., 200.]]),
- standard_fields.InputDataFields.groundtruth_classes:
- np.array([1])
- }])
- with self.assertRaisesRegexp(ValueError,
- r'Missing groundtruth for image-frame id:.*'):
- video_evaluator.add_single_detected_image_info(
- image_id='image3',
- detections_dict=[{
- standard_fields.DetectionResultFields.detection_boxes:
- np.array([[100., 100., 200., 200.]]),
- standard_fields.DetectionResultFields.detection_scores:
- np.array([.8]),
- standard_fields.DetectionResultFields.detection_classes:
- np.array([1])
- }])
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/lstm_object_detection/model_builder.py b/research/lstm_object_detection/model_builder.py
deleted file mode 100644
index d622558cf75..00000000000
--- a/research/lstm_object_detection/model_builder.py
+++ /dev/null
@@ -1,192 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""A function to build a DetectionModel from configuration."""
-from lstm_object_detection.meta_architectures import lstm_ssd_meta_arch
-from lstm_object_detection.models import lstm_ssd_interleaved_mobilenet_v2_feature_extractor
-from lstm_object_detection.models import lstm_ssd_mobilenet_v1_feature_extractor
-from object_detection.builders import anchor_generator_builder
-from object_detection.builders import box_coder_builder
-from object_detection.builders import box_predictor_builder
-from object_detection.builders import hyperparams_builder
-from object_detection.builders import image_resizer_builder
-from object_detection.builders import losses_builder
-from object_detection.builders import matcher_builder
-from object_detection.builders import model_builder
-from object_detection.builders import post_processing_builder
-from object_detection.builders import region_similarity_calculator_builder as sim_calc
-from object_detection.core import target_assigner
-
-model_builder.SSD_FEATURE_EXTRACTOR_CLASS_MAP.update({
- 'lstm_ssd_mobilenet_v1':
- lstm_ssd_mobilenet_v1_feature_extractor
- .LSTMSSDMobileNetV1FeatureExtractor,
- 'lstm_ssd_interleaved_mobilenet_v2':
- lstm_ssd_interleaved_mobilenet_v2_feature_extractor
- .LSTMSSDInterleavedMobilenetV2FeatureExtractor,
-})
-SSD_FEATURE_EXTRACTOR_CLASS_MAP = model_builder.SSD_FEATURE_EXTRACTOR_CLASS_MAP
-
-
-def build(model_config, lstm_config, is_training):
- """Builds a DetectionModel based on the model config.
-
- Args:
- model_config: A model.proto object containing the config for the desired
- DetectionModel.
- lstm_config: LstmModel config proto that specifies LSTM train/eval configs.
- is_training: True if this model is being built for training purposes.
-
- Returns:
- DetectionModel based on the config.
-
- Raises:
- ValueError: On invalid meta architecture or model.
- """
- return _build_lstm_model(model_config.ssd, lstm_config, is_training)
-
-
-def _build_lstm_feature_extractor(feature_extractor_config,
- is_training,
- lstm_config,
- reuse_weights=None):
- """Builds a ssd_meta_arch.SSDFeatureExtractor based on config.
-
- Args:
- feature_extractor_config: An SSDFeatureExtractor proto config from ssd.proto.
- is_training: True if this feature extractor is being built for training.
- lstm_config: LSTM-SSD specific configs.
- reuse_weights: If the feature extractor should reuse weights.
-
- Returns:
- ssd_meta_arch.SSDFeatureExtractor based on config.
-
- Raises:
- ValueError: On invalid feature extractor type.
- """
-
- feature_type = feature_extractor_config.type
- depth_multiplier = feature_extractor_config.depth_multiplier
- min_depth = feature_extractor_config.min_depth
- pad_to_multiple = feature_extractor_config.pad_to_multiple
- use_explicit_padding = feature_extractor_config.use_explicit_padding
- use_depthwise = feature_extractor_config.use_depthwise
- conv_hyperparams = hyperparams_builder.build(
- feature_extractor_config.conv_hyperparams, is_training)
- override_base_feature_extractor_hyperparams = (
- feature_extractor_config.override_base_feature_extractor_hyperparams)
-
- if feature_type not in SSD_FEATURE_EXTRACTOR_CLASS_MAP:
- raise ValueError('Unknown ssd feature_extractor: {}'.format(feature_type))
-
- feature_extractor_class = SSD_FEATURE_EXTRACTOR_CLASS_MAP[feature_type]
- feature_extractor = feature_extractor_class(
- is_training, depth_multiplier, min_depth, pad_to_multiple,
- conv_hyperparams, reuse_weights, use_explicit_padding, use_depthwise,
- override_base_feature_extractor_hyperparams)
-
- # Extra configs for LSTM-SSD.
- feature_extractor.lstm_state_depth = lstm_config.lstm_state_depth
- feature_extractor.flatten_state = lstm_config.flatten_state
- feature_extractor.clip_state = lstm_config.clip_state
- feature_extractor.scale_state = lstm_config.scale_state
- feature_extractor.is_quantized = lstm_config.is_quantized
- feature_extractor.low_res = lstm_config.low_res
- # Extra configs for interleaved LSTM-SSD.
- if 'interleaved' in feature_extractor_config.type:
- feature_extractor.pre_bottleneck = lstm_config.pre_bottleneck
- feature_extractor.depth_multipliers = lstm_config.depth_multipliers
- if is_training:
- feature_extractor.interleave_method = lstm_config.train_interleave_method
- else:
- feature_extractor.interleave_method = lstm_config.eval_interleave_method
- return feature_extractor
-
-
-def _build_lstm_model(ssd_config, lstm_config, is_training):
- """Builds an LSTM detection model based on the model config.
-
- Args:
- ssd_config: A ssd.proto object containing the config for the desired
- LSTMSSDMetaArch.
- lstm_config: LstmModel config proto that specifies LSTM train/eval configs.
- is_training: True if this model is being built for training purposes.
-
- Returns:
- LSTMSSDMetaArch based on the config.
-
- Raises:
- ValueError: If the feature extractor type is not recognized (i.e. not
- registered in SSD_FEATURE_EXTRACTOR_CLASS_MAP), or if
- lstm_config.interleave_strategy is not recognized.
- ValueError: If unroll_length is not specified in the config file.
- """
- feature_extractor = _build_lstm_feature_extractor(
- ssd_config.feature_extractor, is_training, lstm_config)
-
- box_coder = box_coder_builder.build(ssd_config.box_coder)
- matcher = matcher_builder.build(ssd_config.matcher)
- region_similarity_calculator = sim_calc.build(
- ssd_config.similarity_calculator)
-
- num_classes = ssd_config.num_classes
- ssd_box_predictor = box_predictor_builder.build(hyperparams_builder.build,
- ssd_config.box_predictor,
- is_training, num_classes)
- anchor_generator = anchor_generator_builder.build(ssd_config.anchor_generator)
- image_resizer_fn = image_resizer_builder.build(ssd_config.image_resizer)
- non_max_suppression_fn, score_conversion_fn = post_processing_builder.build(
- ssd_config.post_processing)
- (classification_loss, localization_loss, classification_weight,
- localization_weight, miner, _, _) = losses_builder.build(ssd_config.loss)
-
- normalize_loss_by_num_matches = ssd_config.normalize_loss_by_num_matches
- encode_background_as_zeros = ssd_config.encode_background_as_zeros
- negative_class_weight = ssd_config.negative_class_weight
-
- # Extra configs for lstm unroll length.
- unroll_length = None
- if 'lstm' in ssd_config.feature_extractor.type:
- if is_training:
- unroll_length = lstm_config.train_unroll_length
- else:
- unroll_length = lstm_config.eval_unroll_length
- if unroll_length is None:
- raise ValueError('No unroll length found in the config file')
-
- target_assigner_instance = target_assigner.TargetAssigner(
- region_similarity_calculator,
- matcher,
- box_coder,
- negative_class_weight=negative_class_weight)
-
- lstm_model = lstm_ssd_meta_arch.LSTMSSDMetaArch(
- is_training=is_training,
- anchor_generator=anchor_generator,
- box_predictor=ssd_box_predictor,
- box_coder=box_coder,
- feature_extractor=feature_extractor,
- encode_background_as_zeros=encode_background_as_zeros,
- image_resizer_fn=image_resizer_fn,
- non_max_suppression_fn=non_max_suppression_fn,
- score_conversion_fn=score_conversion_fn,
- classification_loss=classification_loss,
- localization_loss=localization_loss,
- classification_loss_weight=classification_weight,
- localization_loss_weight=localization_weight,
- normalize_loss_by_num_matches=normalize_loss_by_num_matches,
- hard_example_miner=miner,
- unroll_length=unroll_length,
- target_assigner_instance=target_assigner_instance)
-
- return lstm_model
diff --git a/research/lstm_object_detection/model_builder_test.py b/research/lstm_object_detection/model_builder_test.py
deleted file mode 100644
index 9d64b537cdc..00000000000
--- a/research/lstm_object_detection/model_builder_test.py
+++ /dev/null
@@ -1,302 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for lstm_object_detection.tensorflow.model_builder."""
-
-import tensorflow.compat.v1 as tf
-from google.protobuf import text_format
-from lstm_object_detection import model_builder
-from lstm_object_detection.meta_architectures import lstm_ssd_meta_arch
-from lstm_object_detection.protos import pipeline_pb2 as internal_pipeline_pb2
-from object_detection.protos import pipeline_pb2
-
-
-class ModelBuilderTest(tf.test.TestCase):
-
- def create_train_model(self, model_config, lstm_config):
- """Builds a DetectionModel based on the model config.
-
- Args:
- model_config: A model.proto object containing the config for the desired
- DetectionModel.
- lstm_config: LstmModel config proto that specifies LSTM train/eval
- configs.
-
- Returns:
- DetectionModel based on the config.
- """
- return model_builder.build(model_config, lstm_config, is_training=True)
-
- def create_eval_model(self, model_config, lstm_config):
- """Builds a DetectionModel based on the model config.
-
- Args:
- model_config: A model.proto object containing the config for the desired
- DetectionModel.
- lstm_config: LstmModel config proto that specifies LSTM train/eval
- configs.
-
- Returns:
- DetectionModel based on the config.
- """
- return model_builder.build(model_config, lstm_config, is_training=False)
-
- def get_model_configs_from_proto(self):
- """Creates a model text proto for testing.
-
- Returns:
- A dictionary of model configs.
- """
-
- model_text_proto = """
- [lstm_object_detection.protos.lstm_model] {
- train_unroll_length: 4
- eval_unroll_length: 4
- }
- model {
- ssd {
- feature_extractor {
- type: 'lstm_ssd_mobilenet_v1'
- conv_hyperparams {
- regularizer {
- l2_regularizer {
- }
- }
- initializer {
- truncated_normal_initializer {
- }
- }
- }
- }
- negative_class_weight: 2.0
- box_coder {
- faster_rcnn_box_coder {
- }
- }
- matcher {
- argmax_matcher {
- }
- }
- similarity_calculator {
- iou_similarity {
- }
- }
- anchor_generator {
- ssd_anchor_generator {
- aspect_ratios: 1.0
- }
- }
- image_resizer {
- fixed_shape_resizer {
- height: 320
- width: 320
- }
- }
- box_predictor {
- convolutional_box_predictor {
- conv_hyperparams {
- regularizer {
- l2_regularizer {
- }
- }
- initializer {
- truncated_normal_initializer {
- }
- }
- }
- }
- }
- normalize_loc_loss_by_codesize: true
- loss {
- classification_loss {
- weighted_softmax {
- }
- }
- localization_loss {
- weighted_smooth_l1 {
- }
- }
- }
- }
- }"""
-
- pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
- text_format.Merge(model_text_proto, pipeline_config)
-
- configs = {}
- configs['model'] = pipeline_config.model
- configs['lstm_model'] = pipeline_config.Extensions[
- internal_pipeline_pb2.lstm_model]
-
- return configs
-
- def get_interleaved_model_configs_from_proto(self):
- """Creates an interleaved model text proto for testing.
-
- Returns:
- A dictionary of model configs.
- """
-
- model_text_proto = """
- [lstm_object_detection.protos.lstm_model] {
- train_unroll_length: 4
- eval_unroll_length: 10
- lstm_state_depth: 320
- depth_multipliers: 1.4
- depth_multipliers: 0.35
- pre_bottleneck: true
- low_res: true
- train_interleave_method: 'RANDOM_SKIP_SMALL'
- eval_interleave_method: 'SKIP3'
- }
- model {
- ssd {
- feature_extractor {
- type: 'lstm_ssd_interleaved_mobilenet_v2'
- conv_hyperparams {
- regularizer {
- l2_regularizer {
- }
- }
- initializer {
- truncated_normal_initializer {
- }
- }
- }
- }
- negative_class_weight: 2.0
- box_coder {
- faster_rcnn_box_coder {
- }
- }
- matcher {
- argmax_matcher {
- }
- }
- similarity_calculator {
- iou_similarity {
- }
- }
- anchor_generator {
- ssd_anchor_generator {
- aspect_ratios: 1.0
- }
- }
- image_resizer {
- fixed_shape_resizer {
- height: 320
- width: 320
- }
- }
- box_predictor {
- convolutional_box_predictor {
- conv_hyperparams {
- regularizer {
- l2_regularizer {
- }
- }
- initializer {
- truncated_normal_initializer {
- }
- }
- }
- }
- }
- normalize_loc_loss_by_codesize: true
- loss {
- classification_loss {
- weighted_softmax {
- }
- }
- localization_loss {
- weighted_smooth_l1 {
- }
- }
- }
- }
- }"""
-
- pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
- text_format.Merge(model_text_proto, pipeline_config)
-
- configs = {}
- configs['model'] = pipeline_config.model
- configs['lstm_model'] = pipeline_config.Extensions[
- internal_pipeline_pb2.lstm_model]
-
- return configs
-
- def test_model_creation_from_valid_configs(self):
- configs = self.get_model_configs_from_proto()
- # Test model properties.
- self.assertEqual(configs['model'].ssd.negative_class_weight, 2.0)
- self.assertTrue(configs['model'].ssd.normalize_loc_loss_by_codesize)
- self.assertEqual(configs['model'].ssd.feature_extractor.type,
- 'lstm_ssd_mobilenet_v1')
-
- model = self.create_train_model(configs['model'], configs['lstm_model'])
- # Test architecture type.
- self.assertIsInstance(model, lstm_ssd_meta_arch.LSTMSSDMetaArch)
- # Test LSTM unroll length.
- self.assertEqual(model.unroll_length, 4)
-
- model = self.create_eval_model(configs['model'], configs['lstm_model'])
- # Test architecture type.
- self.assertIsInstance(model, lstm_ssd_meta_arch.LSTMSSDMetaArch)
- # Test LSTM configs.
- self.assertEqual(model.unroll_length, 4)
-
- def test_interleaved_model_creation_from_valid_configs(self):
- configs = self.get_interleaved_model_configs_from_proto()
- # Test model properties.
- self.assertEqual(configs['model'].ssd.negative_class_weight, 2.0)
- self.assertTrue(configs['model'].ssd.normalize_loc_loss_by_codesize)
- self.assertEqual(configs['model'].ssd.feature_extractor.type,
- 'lstm_ssd_interleaved_mobilenet_v2')
-
- model = self.create_train_model(configs['model'], configs['lstm_model'])
- # Test architecture type.
- self.assertIsInstance(model, lstm_ssd_meta_arch.LSTMSSDMetaArch)
- # Test LSTM configs.
- self.assertEqual(model.unroll_length, 4)
- self.assertEqual(model._feature_extractor.lstm_state_depth, 320)
- self.assertAllClose(model._feature_extractor.depth_multipliers, (1.4, 0.35))
- self.assertTrue(model._feature_extractor.pre_bottleneck)
- self.assertTrue(model._feature_extractor.low_res)
- self.assertEqual(model._feature_extractor.interleave_method,
- 'RANDOM_SKIP_SMALL')
-
- model = self.create_eval_model(configs['model'], configs['lstm_model'])
- # Test architecture type.
- self.assertIsInstance(model, lstm_ssd_meta_arch.LSTMSSDMetaArch)
- # Test LSTM configs.
- self.assertEqual(model.unroll_length, 10)
- self.assertEqual(model._feature_extractor.lstm_state_depth, 320)
- self.assertAllClose(model._feature_extractor.depth_multipliers, (1.4, 0.35))
- self.assertTrue(model._feature_extractor.pre_bottleneck)
- self.assertTrue(model._feature_extractor.low_res)
- self.assertEqual(model._feature_extractor.interleave_method, 'SKIP3')
-
- def test_model_creation_from_invalid_configs(self):
- configs = self.get_model_configs_from_proto()
- # Test model build failure with wrong input configs.
- with self.assertRaises(AttributeError):
- _ = self.create_train_model(configs['model'], configs['model'])
- with self.assertRaises(AttributeError):
- _ = self.create_eval_model(configs['model'], configs['model'])
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/lstm_object_detection/models/__init__.py b/research/lstm_object_detection/models/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/lstm_object_detection/models/lstm_ssd_interleaved_mobilenet_v2_feature_extractor.py b/research/lstm_object_detection/models/lstm_ssd_interleaved_mobilenet_v2_feature_extractor.py
deleted file mode 100644
index 5a2d4bd0bdc..00000000000
--- a/research/lstm_object_detection/models/lstm_ssd_interleaved_mobilenet_v2_feature_extractor.py
+++ /dev/null
@@ -1,298 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""LSTDInterleavedFeatureExtractor which interleaves multiple MobileNet V2."""
-
-import tensorflow.compat.v1 as tf
-import tf_slim as slim
-
-from tensorflow.python.framework import ops as tf_ops
-from lstm_object_detection.lstm import lstm_cells
-from lstm_object_detection.lstm import rnn_decoder
-from lstm_object_detection.meta_architectures import lstm_ssd_meta_arch
-from lstm_object_detection.models import mobilenet_defs
-from object_detection.models import feature_map_generators
-from object_detection.utils import ops
-from object_detection.utils import shape_utils
-from nets.mobilenet import mobilenet
-from nets.mobilenet import mobilenet_v2
-
-
-class LSTMSSDInterleavedMobilenetV2FeatureExtractor(
- lstm_ssd_meta_arch.LSTMSSDInterleavedFeatureExtractor):
- """LSTM-SSD Interleaved Feature Extractor using MobilenetV2 features."""
-
- def __init__(self,
- is_training,
- depth_multiplier,
- min_depth,
- pad_to_multiple,
- conv_hyperparams_fn,
- reuse_weights=None,
- use_explicit_padding=False,
- use_depthwise=True,
- override_base_feature_extractor_hyperparams=False):
- """Interleaved Feature Extractor for LSTD Models with MobileNet v2.
-
- Args:
- is_training: whether the network is in training mode.
- depth_multiplier: float depth multiplier for feature extractor.
- min_depth: minimum feature extractor depth.
- pad_to_multiple: the nearest multiple to zero pad the input height and
- width dimensions to.
- conv_hyperparams_fn: A function to construct tf slim arg_scope for conv2d
- and separable_conv2d ops in the layers that are added on top of the
- base feature extractor.
- reuse_weights: Whether to reuse variables. Default is None.
- use_explicit_padding: Whether to use explicit padding when extracting
- features. Default is False.
- use_depthwise: Whether to use depthwise convolutions. Default is True.
- override_base_feature_extractor_hyperparams: Whether to override
- hyperparameters of the base feature extractor with the one from
- `conv_hyperparams_fn`.
- """
- super(LSTMSSDInterleavedMobilenetV2FeatureExtractor, self).__init__(
- is_training=is_training,
- depth_multiplier=depth_multiplier,
- min_depth=min_depth,
- pad_to_multiple=pad_to_multiple,
- conv_hyperparams_fn=conv_hyperparams_fn,
- reuse_weights=reuse_weights,
- use_explicit_padding=use_explicit_padding,
- use_depthwise=use_depthwise,
- override_base_feature_extractor_hyperparams=
- override_base_feature_extractor_hyperparams)
- # RANDOM_SKIP_SMALL means the training policy is random and the small model
- # does not update state during training.
- if self._is_training:
- self._interleave_method = 'RANDOM_SKIP_SMALL'
- else:
- self._interleave_method = 'SKIP9'
-
- self._flatten_state = False
- self._scale_state = False
- self._clip_state = True
- self._pre_bottleneck = True
- self._feature_map_layout = {
- 'from_layer': ['layer_19', '', '', '', ''],
- 'layer_depth': [-1, 256, 256, 256, 256],
- 'use_depthwise': self._use_depthwise,
- 'use_explicit_padding': self._use_explicit_padding,
- }
- self._low_res = True
- self._base_network_scope = 'MobilenetV2'
-
- def extract_base_features_large(self, preprocessed_inputs):
- """Extract the large base model features.
-
- Variables are created under the scope of /MobilenetV2_1/
-
- Args:
- preprocessed_inputs: preprocessed input images of shape:
- [batch, width, height, depth].
-
- Returns:
- net: the last feature map created from the base feature extractor.
- end_points: a dictionary of feature maps created.
- """
- scope_name = self._base_network_scope + '_1'
- with tf.variable_scope(scope_name, reuse=self._reuse_weights) as base_scope:
- net, end_points = mobilenet_v2.mobilenet_base(
- preprocessed_inputs,
- depth_multiplier=self._depth_multipliers[0],
- conv_defs=mobilenet_defs.mobilenet_v2_lite_def(
- is_quantized=self._is_quantized),
- use_explicit_padding=self._use_explicit_padding,
- scope=base_scope)
- return net, end_points
-
- def extract_base_features_small(self, preprocessed_inputs):
- """Extract the small base model features.
-
- Variables are created under the scope of /MobilenetV2_2/
-
- Args:
- preprocessed_inputs: preprocessed input images of shape:
- [batch, width, height, depth].
-
- Returns:
- net: the last feature map created from the base feature extractor.
- end_points: a dictionary of feature maps created.
- """
- scope_name = self._base_network_scope + '_2'
- with tf.variable_scope(scope_name, reuse=self._reuse_weights) as base_scope:
- if self._low_res:
- height_small = preprocessed_inputs.get_shape().as_list()[1] // 2
- width_small = preprocessed_inputs.get_shape().as_list()[2] // 2
- inputs_small = tf.image.resize_images(preprocessed_inputs,
- [height_small, width_small])
- # Create end point handle for tflite deployment.
- with tf.name_scope(None):
- inputs_small = tf.identity(
- inputs_small, name='normalized_input_image_tensor_small')
- else:
- inputs_small = preprocessed_inputs
- net, end_points = mobilenet_v2.mobilenet_base(
- inputs_small,
- depth_multiplier=self._depth_multipliers[1],
- conv_defs=mobilenet_defs.mobilenet_v2_lite_def(
- is_quantized=self._is_quantized, low_res=self._low_res),
- use_explicit_padding=self._use_explicit_padding,
- scope=base_scope)
- return net, end_points
-
- def create_lstm_cell(self, batch_size, output_size, state_saver, state_name,
- dtype=tf.float32):
- """Create the LSTM cell, and initialize state if necessary.
-
- Args:
- batch_size: input batch size.
- output_size: output size of the lstm cell, [width, height].
- state_saver: a state saver object with methods `state` and `save_state`.
- state_name: string, the name to use with the state_saver.
- dtype: dtype to initialize lstm state.
-
- Returns:
- lstm_cell: the lstm cell unit.
- init_state: initial state representations.
- step: the step tensor from the state saver, or None if no state saver is given.
- """
- lstm_cell = lstm_cells.GroupedConvLSTMCell(
- filter_size=(3, 3),
- output_size=output_size,
- num_units=max(self._min_depth, self._lstm_state_depth),
- is_training=self._is_training,
- activation=tf.nn.relu6,
- flatten_state=self._flatten_state,
- scale_state=self._scale_state,
- clip_state=self._clip_state,
- output_bottleneck=True,
- pre_bottleneck=self._pre_bottleneck,
- is_quantized=self._is_quantized,
- visualize_gates=False)
-
- if state_saver is None:
- init_state = lstm_cell.init_state('lstm_state', batch_size, dtype)
- step = None
- else:
- step = state_saver.state(state_name + '_step')
- c = state_saver.state(state_name + '_c')
- h = state_saver.state(state_name + '_h')
- c.set_shape([batch_size] + c.get_shape().as_list()[1:])
- h.set_shape([batch_size] + h.get_shape().as_list()[1:])
- init_state = (c, h)
- return lstm_cell, init_state, step
-
- def extract_features(self, preprocessed_inputs, state_saver=None,
- state_name='lstm_state', unroll_length=10, scope=None):
- """Extract features from preprocessed inputs.
-
- The features include the base network features, lstm features and SSD
- features, organized in the following name scope:
-
- /MobilenetV2_1/...
- /MobilenetV2_2/...
- /LSTM/...
- /FeatureMap/...
-
- Args:
- preprocessed_inputs: a [batch, height, width, channels] float tensor
- representing a batch of consecutive frames from video clips.
- state_saver: A state saver object with methods `state` and `save_state`.
- state_name: Python string, the name to use with the state_saver.
- unroll_length: number of steps to unroll the lstm.
- scope: Scope for the base network of the feature extractor.
-
- Returns:
- feature_maps: a list of tensors where the ith tensor has shape
- [batch, height_i, width_i, depth_i]
- Raises:
- ValueError: if interleave_method not recognized or large and small base
- network output feature maps of different sizes.
- """
- preprocessed_inputs = shape_utils.check_min_image_dim(
- 33, preprocessed_inputs)
- preprocessed_inputs = ops.pad_to_multiple(
- preprocessed_inputs, self._pad_to_multiple)
- batch_size = preprocessed_inputs.shape[0].value // unroll_length
- batch_axis = 0
- nets = []
-
- # Batch processing of mobilenet features.
- with slim.arg_scope(mobilenet_v2.training_scope(
- is_training=self._is_training,
- bn_decay=0.9997)), \
- slim.arg_scope([mobilenet.depth_multiplier],
- min_depth=self._min_depth, divisible_by=8):
- # Big model.
- net, _ = self.extract_base_features_large(preprocessed_inputs)
- nets.append(net)
- large_base_feature_shape = net.shape
-
- # Small model.
- net, _ = self.extract_base_features_small(preprocessed_inputs)
- nets.append(net)
- small_base_feature_shape = net.shape
- if not (large_base_feature_shape[1] == small_base_feature_shape[1] and
- large_base_feature_shape[2] == small_base_feature_shape[2]):
- raise ValueError('Large and Small base network feature map dimension '
- 'not equal!')
-
- with slim.arg_scope(self._conv_hyperparams_fn()):
- with tf.variable_scope('LSTM', reuse=self._reuse_weights):
- output_size = (large_base_feature_shape[1], large_base_feature_shape[2])
- lstm_cell, init_state, step = self.create_lstm_cell(
- batch_size, output_size, state_saver, state_name,
- dtype=preprocessed_inputs.dtype)
-
- nets_seq = [
- tf.split(net, unroll_length, axis=batch_axis) for net in nets
- ]
-
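- # Interleave the large and small feature sequences through a shared
- # ConvLSTM; the interleave method selects which network feeds each
- # unrolled step.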
- net_seq, states_out = rnn_decoder.multi_input_rnn_decoder(
- nets_seq,
- init_state,
- lstm_cell,
- step,
- selection_strategy=self._interleave_method,
- is_training=self._is_training,
- is_quantized=self._is_quantized,
- pre_bottleneck=self._pre_bottleneck,
- flatten_state=self._flatten_state,
- scope=None)
- self._states_out = states_out
-
- image_features = {}
- if state_saver is not None:
- self._step = state_saver.state(state_name + '_step')
- batcher_ops = [
- state_saver.save_state(state_name + '_c', states_out[-1][0]),
- state_saver.save_state(state_name + '_h', states_out[-1][1]),
- state_saver.save_state(state_name + '_step', self._step + 1)]
- with tf_ops.control_dependencies(batcher_ops):
- image_features['layer_19'] = tf.concat(net_seq, 0)
- else:
- image_features['layer_19'] = tf.concat(net_seq, 0)
-
- # SSD layers.
- with tf.variable_scope('FeatureMap'):
- feature_maps = feature_map_generators.multi_resolution_feature_maps(
- feature_map_layout=self._feature_map_layout,
- depth_multiplier=self._depth_multiplier,
- min_depth=self._min_depth,
- insert_1x1_conv=True,
- image_features=image_features,
- pool_residual=True)
- return list(feature_maps.values())
diff --git a/research/lstm_object_detection/models/lstm_ssd_interleaved_mobilenet_v2_feature_extractor_test.py b/research/lstm_object_detection/models/lstm_ssd_interleaved_mobilenet_v2_feature_extractor_test.py
deleted file mode 100644
index b285f0e4441..00000000000
--- a/research/lstm_object_detection/models/lstm_ssd_interleaved_mobilenet_v2_feature_extractor_test.py
+++ /dev/null
@@ -1,352 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for lstm_ssd_interleaved_mobilenet_v2_feature_extractor."""
-
-import numpy as np
-import tensorflow.compat.v1 as tf
-import tf_slim as slim
-from tensorflow.contrib import training as contrib_training
-
-from lstm_object_detection.models import lstm_ssd_interleaved_mobilenet_v2_feature_extractor
-from object_detection.models import ssd_feature_extractor_test
-
-
-class LSTMSSDInterleavedMobilenetV2FeatureExtractorTest(
- ssd_feature_extractor_test.SsdFeatureExtractorTestBase):
-
- def _create_feature_extractor(self,
- depth_multiplier,
- pad_to_multiple,
- is_quantized=False):
- """Constructs a new feature extractor.
-
- Args:
- depth_multiplier: float depth multiplier for feature extractor
- pad_to_multiple: the nearest multiple to zero pad the input height and
- width dimensions to.
- is_quantized: whether to quantize the graph.
- Returns:
- an ssd_meta_arch.SSDFeatureExtractor object.
- """
- min_depth = 32
- def conv_hyperparams_fn():
- with slim.arg_scope([slim.conv2d], normalizer_fn=slim.batch_norm), \
- slim.arg_scope([slim.batch_norm], is_training=False) as sc:
- return sc
- feature_extractor = (
- lstm_ssd_interleaved_mobilenet_v2_feature_extractor
- .LSTMSSDInterleavedMobilenetV2FeatureExtractor(False, depth_multiplier,
- min_depth,
- pad_to_multiple,
- conv_hyperparams_fn))
- feature_extractor.lstm_state_depth = int(320 * depth_multiplier)
- feature_extractor.depth_multipliers = [
- depth_multiplier, depth_multiplier / 4.0
- ]
- feature_extractor.is_quantized = is_quantized
- return feature_extractor
-
- def test_feature_extractor_construct_with_expected_params(self):
- def conv_hyperparams_fn():
- with (slim.arg_scope([slim.conv2d], normalizer_fn=slim.batch_norm) and
- slim.arg_scope([slim.batch_norm], decay=0.97, epsilon=1e-3)) as sc:
- return sc
-
- params = {
- 'is_training': True,
- 'depth_multiplier': .55,
- 'min_depth': 9,
- 'pad_to_multiple': 3,
- 'conv_hyperparams_fn': conv_hyperparams_fn,
- 'reuse_weights': False,
- 'use_explicit_padding': True,
- 'use_depthwise': False,
- 'override_base_feature_extractor_hyperparams': True}
-
- feature_extractor = (
- lstm_ssd_interleaved_mobilenet_v2_feature_extractor
- .LSTMSSDInterleavedMobilenetV2FeatureExtractor(**params))
-
- self.assertEqual(params['is_training'],
- feature_extractor._is_training)
- self.assertEqual(params['depth_multiplier'],
- feature_extractor._depth_multiplier)
- self.assertEqual(params['min_depth'],
- feature_extractor._min_depth)
- self.assertEqual(params['pad_to_multiple'],
- feature_extractor._pad_to_multiple)
- self.assertEqual(params['conv_hyperparams_fn'],
- feature_extractor._conv_hyperparams_fn)
- self.assertEqual(params['reuse_weights'],
- feature_extractor._reuse_weights)
- self.assertEqual(params['use_explicit_padding'],
- feature_extractor._use_explicit_padding)
- self.assertEqual(params['use_depthwise'],
- feature_extractor._use_depthwise)
- self.assertEqual(params['override_base_feature_extractor_hyperparams'],
- (feature_extractor.
- _override_base_feature_extractor_hyperparams))
-
- def test_extract_features_returns_correct_shapes_128(self):
- image_height = 128
- image_width = 128
- depth_multiplier = 1.0
- pad_to_multiple = 1
- expected_feature_map_shape = [(2, 4, 4, 640),
- (2, 2, 2, 256), (2, 1, 1, 256),
- (2, 1, 1, 256), (2, 1, 1, 256)]
- self.check_extract_features_returns_correct_shape(
- 2, image_height, image_width, depth_multiplier, pad_to_multiple,
- expected_feature_map_shape)
-
- def test_extract_features_returns_correct_shapes_unroll10(self):
- image_height = 128
- image_width = 128
- depth_multiplier = 1.0
- pad_to_multiple = 1
- expected_feature_map_shape = [(10, 4, 4, 640),
- (10, 2, 2, 256), (10, 1, 1, 256),
- (10, 1, 1, 256), (10, 1, 1, 256)]
- self.check_extract_features_returns_correct_shape(
- 10, image_height, image_width, depth_multiplier, pad_to_multiple,
- expected_feature_map_shape, unroll_length=10)
-
- def test_extract_features_returns_correct_shapes_320(self):
- image_height = 320
- image_width = 320
- depth_multiplier = 1.0
- pad_to_multiple = 1
- expected_feature_map_shape = [(2, 10, 10, 640),
- (2, 5, 5, 256), (2, 3, 3, 256),
- (2, 2, 2, 256), (2, 1, 1, 256)]
- self.check_extract_features_returns_correct_shape(
- 2, image_height, image_width, depth_multiplier, pad_to_multiple,
- expected_feature_map_shape)
-
- def test_extract_features_returns_correct_shapes_enforcing_min_depth(self):
- image_height = 320
- image_width = 320
- depth_multiplier = 0.5**12
- pad_to_multiple = 1
- expected_feature_map_shape = [(2, 10, 10, 64),
- (2, 5, 5, 32), (2, 3, 3, 32),
- (2, 2, 2, 32), (2, 1, 1, 32)]
- self.check_extract_features_returns_correct_shape(
- 2, image_height, image_width, depth_multiplier, pad_to_multiple,
- expected_feature_map_shape)
-
- def test_extract_features_returns_correct_shapes_with_pad_to_multiple(self):
- image_height = 299
- image_width = 299
- depth_multiplier = 1.0
- pad_to_multiple = 32
- expected_feature_map_shape = [(2, 10, 10, 640),
- (2, 5, 5, 256), (2, 3, 3, 256),
- (2, 2, 2, 256), (2, 1, 1, 256)]
- self.check_extract_features_returns_correct_shape(
- 2, image_height, image_width, depth_multiplier, pad_to_multiple,
- expected_feature_map_shape)
-
- def test_preprocess_returns_correct_value_range(self):
- image_height = 128
- image_width = 128
- depth_multiplier = 1
- pad_to_multiple = 1
- test_image = np.random.rand(4, image_height, image_width, 3)
- feature_extractor = self._create_feature_extractor(depth_multiplier,
- pad_to_multiple)
- preprocessed_image = feature_extractor.preprocess(test_image)
- self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0)))
-
- def test_variables_only_created_in_scope(self):
- depth_multiplier = 1
- pad_to_multiple = 1
- scope_names = ['MobilenetV2', 'LSTM', 'FeatureMap']
- self.check_feature_extractor_variables_under_scopes(
- depth_multiplier, pad_to_multiple, scope_names)
-
- def test_has_fused_batchnorm(self):
- image_height = 40
- image_width = 40
- depth_multiplier = 1
- pad_to_multiple = 32
- image_placeholder = tf.placeholder(tf.float32,
- [1, image_height, image_width, 3])
- feature_extractor = self._create_feature_extractor(depth_multiplier,
- pad_to_multiple)
- preprocessed_image = feature_extractor.preprocess(image_placeholder)
- _ = feature_extractor.extract_features(preprocessed_image, unroll_length=1)
- self.assertTrue(any(op.type.startswith('FusedBatchNorm')
- for op in tf.get_default_graph().get_operations()))
-
- def test_variables_for_tflite(self):
- image_height = 40
- image_width = 40
- depth_multiplier = 1
- pad_to_multiple = 32
- image_placeholder = tf.placeholder(tf.float32,
- [1, image_height, image_width, 3])
- feature_extractor = self._create_feature_extractor(depth_multiplier,
- pad_to_multiple)
- preprocessed_image = feature_extractor.preprocess(image_placeholder)
- tflite_unsupported = ['SquaredDifference']
- _ = feature_extractor.extract_features(preprocessed_image, unroll_length=1)
- self.assertFalse(any(op.type in tflite_unsupported
- for op in tf.get_default_graph().get_operations()))
-
- def test_output_nodes_for_tflite(self):
- image_height = 64
- image_width = 64
- depth_multiplier = 1.0
- pad_to_multiple = 1
- image_placeholder = tf.placeholder(tf.float32,
- [1, image_height, image_width, 3])
- feature_extractor = self._create_feature_extractor(depth_multiplier,
- pad_to_multiple)
- preprocessed_image = feature_extractor.preprocess(image_placeholder)
- _ = feature_extractor.extract_features(preprocessed_image, unroll_length=1)
-
- tflite_nodes = [
- 'raw_inputs/init_lstm_c',
- 'raw_inputs/init_lstm_h',
- 'raw_inputs/base_endpoint',
- 'raw_outputs/lstm_c',
- 'raw_outputs/lstm_h',
- 'raw_outputs/base_endpoint_1',
- 'raw_outputs/base_endpoint_2'
- ]
- ops_names = [op.name for op in tf.get_default_graph().get_operations()]
- for node in tflite_nodes:
- self.assertTrue(any(node in s for s in ops_names))
-
- def test_fixed_concat_nodes(self):
- image_height = 64
- image_width = 64
- depth_multiplier = 1.0
- pad_to_multiple = 1
- image_placeholder = tf.placeholder(tf.float32,
- [1, image_height, image_width, 3])
- feature_extractor = self._create_feature_extractor(
- depth_multiplier, pad_to_multiple, is_quantized=True)
- preprocessed_image = feature_extractor.preprocess(image_placeholder)
- _ = feature_extractor.extract_features(preprocessed_image, unroll_length=1)
-
- concat_nodes = [
- 'MobilenetV2_1/expanded_conv_16/project/Relu6',
- 'MobilenetV2_2/expanded_conv_16/project/Relu6'
- ]
- ops_names = [op.name for op in tf.get_default_graph().get_operations()]
- for node in concat_nodes:
- self.assertTrue(any(node in s for s in ops_names))
-
- def test_lstm_states(self):
- image_height = 256
- image_width = 256
- depth_multiplier = 1
- pad_to_multiple = 1
- state_channel = 320
- init_state1 = {
- 'lstm_state_c': tf.zeros(
- [image_height // 32, image_width // 32, state_channel]),
- 'lstm_state_h': tf.zeros(
- [image_height // 32, image_width // 32, state_channel]),
- 'lstm_state_step': tf.zeros([1])
- }
- init_state2 = {
- 'lstm_state_c': tf.random_uniform(
- [image_height // 32, image_width // 32, state_channel]),
- 'lstm_state_h': tf.random_uniform(
- [image_height // 32, image_width // 32, state_channel]),
- 'lstm_state_step': tf.zeros([1])
- }
- seq = {'dummy': tf.random_uniform([2, 1, 1, 1])}
- stateful_reader1 = contrib_training.SequenceQueueingStateSaver(
- batch_size=1,
- num_unroll=1,
- input_length=2,
- input_key='',
- input_sequences=seq,
- input_context={},
- initial_states=init_state1,
- capacity=1)
- stateful_reader2 = contrib_training.SequenceQueueingStateSaver(
- batch_size=1,
- num_unroll=1,
- input_length=2,
- input_key='',
- input_sequences=seq,
- input_context={},
- initial_states=init_state2,
- capacity=1)
- image = tf.random_uniform([1, image_height, image_width, 3])
- feature_extractor = self._create_feature_extractor(depth_multiplier,
- pad_to_multiple)
- with tf.variable_scope('zero_state'):
- feature_maps1 = feature_extractor.extract_features(
- image, stateful_reader1.next_batch, unroll_length=1)
- with tf.variable_scope('random_state'):
- feature_maps2 = feature_extractor.extract_features(
- image, stateful_reader2.next_batch, unroll_length=1)
- with tf.Session() as sess:
- sess.run(tf.global_variables_initializer())
- sess.run(tf.local_variables_initializer())
- sess.run(tf.get_collection(tf.GraphKeys.TABLE_INITIALIZERS))
- sess.run([stateful_reader1.prefetch_op, stateful_reader2.prefetch_op])
- maps1, maps2 = sess.run([feature_maps1, feature_maps2])
- state = sess.run(stateful_reader1.next_batch.state('lstm_state_c'))
- # feature maps should be different because states are different
- self.assertFalse(np.all(np.equal(maps1[0], maps2[0])))
- # state should no longer be zero after update
- self.assertTrue(state.any())
-
- def check_extract_features_returns_correct_shape(
- self, batch_size, image_height, image_width, depth_multiplier,
- pad_to_multiple, expected_feature_map_shapes, unroll_length=1):
- def graph_fn(image_tensor):
- feature_extractor = self._create_feature_extractor(depth_multiplier,
- pad_to_multiple)
- feature_maps = feature_extractor.extract_features(
- image_tensor, unroll_length=unroll_length)
- return feature_maps
-
- image_tensor = np.random.rand(batch_size, image_height, image_width,
- 3).astype(np.float32)
- feature_maps = self.execute(graph_fn, [image_tensor])
- for feature_map, expected_shape in zip(
- feature_maps, expected_feature_map_shapes):
- self.assertAllEqual(feature_map.shape, expected_shape)
-
- def check_feature_extractor_variables_under_scopes(
- self, depth_multiplier, pad_to_multiple, scope_names):
- g = tf.Graph()
- with g.as_default():
- feature_extractor = self._create_feature_extractor(
- depth_multiplier, pad_to_multiple)
- preprocessed_inputs = tf.placeholder(tf.float32, (4, 320, 320, 3))
- feature_extractor.extract_features(
- preprocessed_inputs, unroll_length=1)
- variables = g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
- for variable in variables:
- self.assertTrue(
- any([
- variable.name.startswith(scope_name)
- for scope_name in scope_names
- ]), 'Variable name: ' + variable.name +
- ' is not under any provided scopes: ' + ','.join(scope_names))
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/lstm_object_detection/models/lstm_ssd_mobilenet_v1_feature_extractor.py b/research/lstm_object_detection/models/lstm_ssd_mobilenet_v1_feature_extractor.py
deleted file mode 100644
index cccf740aadd..00000000000
--- a/research/lstm_object_detection/models/lstm_ssd_mobilenet_v1_feature_extractor.py
+++ /dev/null
@@ -1,211 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""LSTMSSDFeatureExtractor for MobilenetV1 features."""
-
-import tensorflow.compat.v1 as tf
-import tf_slim as slim
-from tensorflow.python.framework import ops as tf_ops
-from lstm_object_detection.lstm import lstm_cells
-from lstm_object_detection.lstm import rnn_decoder
-from lstm_object_detection.meta_architectures import lstm_ssd_meta_arch
-from object_detection.models import feature_map_generators
-from object_detection.utils import context_manager
-from object_detection.utils import ops
-from object_detection.utils import shape_utils
-from nets import mobilenet_v1
-
-
-class LSTMSSDMobileNetV1FeatureExtractor(
- lstm_ssd_meta_arch.LSTMSSDFeatureExtractor):
- """LSTM Feature Extractor using MobilenetV1 features."""
-
- def __init__(self,
- is_training,
- depth_multiplier,
- min_depth,
- pad_to_multiple,
- conv_hyperparams_fn,
- reuse_weights=None,
- use_explicit_padding=False,
- use_depthwise=True,
- override_base_feature_extractor_hyperparams=False,
- lstm_state_depth=256):
- """Initializes instance of MobileNetV1 Feature Extractor for LSTMSSD Models.
-
- Args:
- is_training: A boolean whether the network is in training mode.
- depth_multiplier: A float depth multiplier for feature extractor.
- min_depth: A number representing minimum feature extractor depth.
- pad_to_multiple: The nearest multiple to zero pad the input height and
- width dimensions to.
- conv_hyperparams_fn: A function to construct tf slim arg_scope for conv2d
- and separable_conv2d ops in the layers that are added on top of the
- base feature extractor.
- reuse_weights: Whether to reuse variables. Default is None.
- use_explicit_padding: Whether to use explicit padding when extracting
- features. Default is False.
- use_depthwise: Whether to use depthwise convolutions. Default is True.
- override_base_feature_extractor_hyperparams: Whether to override
- hyperparameters of the base feature extractor with the one from
- `conv_hyperparams_fn`.
- lstm_state_depth: An integer specifying the depth of the lstm state.
- """
- super(LSTMSSDMobileNetV1FeatureExtractor, self).__init__(
- is_training=is_training,
- depth_multiplier=depth_multiplier,
- min_depth=min_depth,
- pad_to_multiple=pad_to_multiple,
- conv_hyperparams_fn=conv_hyperparams_fn,
- reuse_weights=reuse_weights,
- use_explicit_padding=use_explicit_padding,
- use_depthwise=use_depthwise,
- override_base_feature_extractor_hyperparams=
- override_base_feature_extractor_hyperparams)
- self._feature_map_layout = {
- 'from_layer': ['Conv2d_13_pointwise_lstm', '', '', '', ''],
- 'layer_depth': [-1, 512, 256, 256, 128],
- 'use_explicit_padding': self._use_explicit_padding,
- 'use_depthwise': self._use_depthwise,
- }
- self._base_network_scope = 'MobilenetV1'
- self._lstm_state_depth = lstm_state_depth
-
- def create_lstm_cell(self, batch_size, output_size, state_saver, state_name,
- dtype=tf.float32):
- """Create the LSTM cell, and initialize state if necessary.
-
- Args:
- batch_size: input batch size.
- output_size: output size of the lstm cell, [width, height].
- state_saver: a state saver object with methods `state` and `save_state`.
- state_name: string, the name to use with the state_saver.
- dtype: dtype to initialize lstm state.
-
- Returns:
- lstm_cell: the lstm cell unit.
- init_state: initial state representations.
- step: the step tensor from the state saver, or None if no state saver is given.
- """
- lstm_cell = lstm_cells.BottleneckConvLSTMCell(
- filter_size=(3, 3),
- output_size=output_size,
- num_units=max(self._min_depth, self._lstm_state_depth),
- activation=tf.nn.relu6,
- visualize_gates=False)
-
- if state_saver is None:
- init_state = lstm_cell.init_state(state_name, batch_size, dtype)
- step = None
- else:
- step = state_saver.state(state_name + '_step')
- c = state_saver.state(state_name + '_c')
- h = state_saver.state(state_name + '_h')
- init_state = (c, h)
- return lstm_cell, init_state, step
-
- def extract_features(self,
- preprocessed_inputs,
- state_saver=None,
- state_name='lstm_state',
- unroll_length=5,
- scope=None):
- """Extracts features from preprocessed inputs.
-
- The features include the base network features, lstm features and SSD
- features, organized in the following name scope:
-
- /MobilenetV1/...
- /LSTM/...
- /FeatureMaps/...
-
- Args:
- preprocessed_inputs: A [batch, height, width, channels] float tensor
- representing a batch of consecutive frames from video clips.
- state_saver: A state saver object with methods `state` and `save_state`.
- state_name: A python string for the name to use with the state_saver.
- unroll_length: The number of steps to unroll the lstm.
- scope: The scope for the base network of the feature extractor.
-
- Returns:
- A list of tensors where the ith tensor has shape [batch, height_i,
- width_i, depth_i]
- """
- preprocessed_inputs = shape_utils.check_min_image_dim(
- 33, preprocessed_inputs)
- with slim.arg_scope(
- mobilenet_v1.mobilenet_v1_arg_scope(is_training=self._is_training)):
- with (slim.arg_scope(self._conv_hyperparams_fn())
- if self._override_base_feature_extractor_hyperparams else
- context_manager.IdentityContextManager()):
- with slim.arg_scope([slim.batch_norm], fused=False):
- # Base network.
- with tf.variable_scope(
- scope, self._base_network_scope,
- reuse=self._reuse_weights) as scope:
- net, image_features = mobilenet_v1.mobilenet_v1_base(
- ops.pad_to_multiple(preprocessed_inputs, self._pad_to_multiple),
- final_endpoint='Conv2d_13_pointwise',
- min_depth=self._min_depth,
- depth_multiplier=self._depth_multiplier,
- scope=scope)
-
- with slim.arg_scope(self._conv_hyperparams_fn()):
- with slim.arg_scope(
- [slim.batch_norm], fused=False, is_training=self._is_training):
- # ConvLSTM layers.
- batch_size = net.shape[0].value // unroll_length
- with tf.variable_scope('LSTM', reuse=self._reuse_weights) as lstm_scope:
- lstm_cell, init_state, _ = self.create_lstm_cell(
- batch_size,
- (net.shape[1].value, net.shape[2].value),
- state_saver,
- state_name,
- dtype=preprocessed_inputs.dtype)
- net_seq = list(tf.split(net, unroll_length))
-
- # Identities added for inputting state tensors externally.
- c_ident = tf.identity(init_state[0], name='lstm_state_in_c')
- h_ident = tf.identity(init_state[1], name='lstm_state_in_h')
- init_state = (c_ident, h_ident)
-
- net_seq, states_out = rnn_decoder.rnn_decoder(
- net_seq, init_state, lstm_cell, scope=lstm_scope)
- batcher_ops = None
- self._states_out = states_out
- if state_saver is not None:
- self._step = state_saver.state('%s_step' % state_name)
- batcher_ops = [
- state_saver.save_state('%s_c' % state_name, states_out[-1][0]),
- state_saver.save_state('%s_h' % state_name, states_out[-1][1]),
- state_saver.save_state('%s_step' % state_name, self._step + 1)
- ]
- with tf_ops.control_dependencies(batcher_ops):
- image_features['Conv2d_13_pointwise_lstm'] = tf.concat(net_seq, 0)
-
- # Identities added for reading output states, to be reused externally.
- tf.identity(states_out[-1][0], name='lstm_state_out_c')
- tf.identity(states_out[-1][1], name='lstm_state_out_h')
-
- # SSD layers.
- with tf.variable_scope('FeatureMaps', reuse=self._reuse_weights):
- feature_maps = feature_map_generators.multi_resolution_feature_maps(
- feature_map_layout=self._feature_map_layout,
- depth_multiplier=(self._depth_multiplier),
- min_depth=self._min_depth,
- insert_1x1_conv=True,
- image_features=image_features)
-
- return list(feature_maps.values())
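
For context, a minimal usage sketch of the feature extractor deleted above (not part of the deleted file): it assumes the `lstm_object_detection` package and its `nets` dependencies are importable under TF 1.x behavior, and stands in a simple `conv_hyperparams_fn` for the one normally built from the pipeline config.

```python
# Minimal sketch; conv_hyperparams_fn is a stand-in for the config-built one.
import tensorflow.compat.v1 as tf
import tf_slim as slim

from lstm_object_detection.models import lstm_ssd_mobilenet_v1_feature_extractor as feature_extractor


def conv_hyperparams_fn():
  # Stand-in hyperparams: batch norm on conv/separable-conv layers.
  with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
                      normalizer_fn=slim.batch_norm):
    with slim.arg_scope([slim.batch_norm], decay=0.97, epsilon=1e-3) as sc:
      return sc


extractor = feature_extractor.LSTMSSDMobileNetV1FeatureExtractor(
    is_training=False,
    depth_multiplier=1.0,
    min_depth=32,
    pad_to_multiple=1,
    conv_hyperparams_fn=conv_hyperparams_fn)

unroll_length = 5
# One clip of 5 consecutive 256x256 frames, stacked along the batch axis.
frames = tf.placeholder(tf.float32, [unroll_length, 256, 256, 3])
feature_maps = extractor.extract_features(
    extractor.preprocess(frames), unroll_length=unroll_length)
# The first feature map is derived from the Conv2d_13_pointwise_lstm endpoint;
# the remaining ones are the SSD layers added on top.
```

With a `state_saver` (as exercised in the test below), the LSTM c/h states are read from and written back to the saver, so state persists across unrolled clips.
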
diff --git a/research/lstm_object_detection/models/lstm_ssd_mobilenet_v1_feature_extractor_test.py b/research/lstm_object_detection/models/lstm_ssd_mobilenet_v1_feature_extractor_test.py
deleted file mode 100644
index 56ad2745dae..00000000000
--- a/research/lstm_object_detection/models/lstm_ssd_mobilenet_v1_feature_extractor_test.py
+++ /dev/null
@@ -1,179 +0,0 @@
-# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Tests for models.lstm_ssd_mobilenet_v1_feature_extractor."""
-
-import numpy as np
-import tensorflow.compat.v1 as tf
-import tf_slim as slim
-from tensorflow.contrib import training as contrib_training
-
-from lstm_object_detection.models import lstm_ssd_mobilenet_v1_feature_extractor as feature_extractor
-from object_detection.models import ssd_feature_extractor_test
-
-
-class LstmSsdMobilenetV1FeatureExtractorTest(
- ssd_feature_extractor_test.SsdFeatureExtractorTestBase):
-
- def _create_feature_extractor(self,
- depth_multiplier=1.0,
- pad_to_multiple=1,
- is_training=True,
- use_explicit_padding=False):
- """Constructs a new feature extractor.
-
- Args:
- depth_multiplier: A float depth multiplier for feature extractor.
- pad_to_multiple: The nearest multiple to zero pad the input height and
- width dimensions to.
- is_training: A boolean whether the network is in training mode.
- use_explicit_padding: A boolean whether to use explicit padding.
-
- Returns:
- An lstm_ssd_meta_arch.LSTMSSDMobileNetV1FeatureExtractor object.
- """
- min_depth = 32
- extractor = (
- feature_extractor.LSTMSSDMobileNetV1FeatureExtractor(
- is_training,
- depth_multiplier,
- min_depth,
- pad_to_multiple,
- self.conv_hyperparams_fn,
- use_explicit_padding=use_explicit_padding))
- extractor.lstm_state_depth = int(256 * depth_multiplier)
- return extractor
-
- def test_feature_extractor_construct_with_expected_params(self):
- def conv_hyperparams_fn():
- with (slim.arg_scope([slim.conv2d], normalizer_fn=slim.batch_norm) and
- slim.arg_scope([slim.batch_norm], decay=0.97, epsilon=1e-3)) as sc:
- return sc
-
- params = {
- 'is_training': True,
- 'depth_multiplier': .55,
- 'min_depth': 9,
- 'pad_to_multiple': 3,
- 'conv_hyperparams_fn': conv_hyperparams_fn,
- 'reuse_weights': False,
- 'use_explicit_padding': True,
- 'use_depthwise': False,
- 'override_base_feature_extractor_hyperparams': True}
-
- extractor = (
- feature_extractor.LSTMSSDMobileNetV1FeatureExtractor(**params))
-
- self.assertEqual(params['is_training'],
- extractor._is_training)
- self.assertEqual(params['depth_multiplier'],
- extractor._depth_multiplier)
- self.assertEqual(params['min_depth'],
- extractor._min_depth)
- self.assertEqual(params['pad_to_multiple'],
- extractor._pad_to_multiple)
- self.assertEqual(params['conv_hyperparams_fn'],
- extractor._conv_hyperparams_fn)
- self.assertEqual(params['reuse_weights'],
- extractor._reuse_weights)
- self.assertEqual(params['use_explicit_padding'],
- extractor._use_explicit_padding)
- self.assertEqual(params['use_depthwise'],
- extractor._use_depthwise)
- self.assertEqual(params['override_base_feature_extractor_hyperparams'],
- (extractor.
- _override_base_feature_extractor_hyperparams))
-
- def test_extract_features_returns_correct_shapes_256(self):
- image_height = 256
- image_width = 256
- depth_multiplier = 1.0
- pad_to_multiple = 1
- batch_size = 5
- expected_feature_map_shape = [(batch_size, 8, 8, 256), (batch_size, 4, 4,
- 512),
- (batch_size, 2, 2, 256), (batch_size, 1, 1,
- 256)]
- self.check_extract_features_returns_correct_shape(
- batch_size,
- image_height,
- image_width,
- depth_multiplier,
- pad_to_multiple,
- expected_feature_map_shape,
- use_explicit_padding=False)
- self.check_extract_features_returns_correct_shape(
- batch_size,
- image_height,
- image_width,
- depth_multiplier,
- pad_to_multiple,
- expected_feature_map_shape,
- use_explicit_padding=True)
-
- def test_preprocess_returns_correct_value_range(self):
- test_image = np.random.rand(5, 128, 128, 3)
- extractor = self._create_feature_extractor()
- preprocessed_image = extractor.preprocess(test_image)
- self.assertTrue(np.all(np.less_equal(np.abs(preprocessed_image), 1.0)))
-
- def test_variables_only_created_in_scope(self):
- scope_name = 'MobilenetV1'
- g = tf.Graph()
- with g.as_default():
- preprocessed_inputs = tf.placeholder(tf.float32, (5, 256, 256, 3))
- extractor = self._create_feature_extractor()
- extractor.extract_features(preprocessed_inputs)
- variables = g.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
- find_scope = False
- for variable in variables:
- if scope_name in variable.name:
- find_scope = True
- break
- self.assertTrue(find_scope)
-
- def test_lstm_non_zero_state(self):
- init_state = {
- 'lstm_state_c': tf.zeros([8, 8, 256]),
- 'lstm_state_h': tf.zeros([8, 8, 256]),
- 'lstm_state_step': tf.zeros([1])
- }
- seq = {'test': tf.random_uniform([3, 1, 1, 1])}
- stateful_reader = contrib_training.SequenceQueueingStateSaver(
- batch_size=1,
- num_unroll=1,
- input_length=2,
- input_key='',
- input_sequences=seq,
- input_context={},
- initial_states=init_state,
- capacity=1)
- extractor = self._create_feature_extractor()
- image = tf.random_uniform([5, 256, 256, 3])
- with tf.variable_scope('zero_state'):
- feature_map = extractor.extract_features(
- image, stateful_reader.next_batch)
- with tf.Session() as sess:
- sess.run(tf.global_variables_initializer())
- sess.run([stateful_reader.prefetch_op])
- _ = sess.run([feature_map])
- # Update states with the next batch.
- state = sess.run(stateful_reader.next_batch.state('lstm_state_c'))
- # State should no longer be zero after update.
- self.assertTrue(state.any())
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/lstm_object_detection/models/mobilenet_defs.py b/research/lstm_object_detection/models/mobilenet_defs.py
deleted file mode 100644
index 4f984240215..00000000000
--- a/research/lstm_object_detection/models/mobilenet_defs.py
+++ /dev/null
@@ -1,142 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Definitions for modified MobileNet models used in LSTD."""
-
-import tensorflow.compat.v1 as tf
-import tf_slim as slim
-from nets import mobilenet_v1
-from nets.mobilenet import conv_blocks as mobilenet_convs
-from nets.mobilenet import mobilenet
-
-
-def mobilenet_v1_lite_def(depth_multiplier, low_res=False):
- """Conv definitions for a lite MobileNet v1 model.
-
- Args:
- depth_multiplier: float depth multiplier for MobileNet.
- low_res: Whether to use a low-resolution conv input for the interleaved model.
-
- Returns:
- Array of convolutions.
-
- Raises:
- ValueError: On invalid channels with provided depth multiplier.
- """
- conv = mobilenet_v1.Conv
- sep_conv = mobilenet_v1.DepthSepConv
-
- def _find_target_depth(original, depth_multiplier):
- # Find the target depth such that:
- # int(target * depth_multiplier) == original
- pseudo_target = int(original / depth_multiplier)
- for target in range(pseudo_target - 1, pseudo_target + 2):
- if int(target * depth_multiplier) == original:
- return target
- raise ValueError('Cannot have %d channels with depth multiplier %0.2f' %
- (original, depth_multiplier))
-
- return [
- conv(kernel=[3, 3], stride=2, depth=32),
- sep_conv(kernel=[3, 3], stride=1, depth=64),
- sep_conv(kernel=[3, 3], stride=2, depth=128),
- sep_conv(kernel=[3, 3], stride=1, depth=128),
- sep_conv(kernel=[3, 3], stride=2, depth=256),
- sep_conv(kernel=[3, 3], stride=1, depth=256),
- sep_conv(kernel=[3, 3], stride=2, depth=512),
- sep_conv(kernel=[3, 3], stride=1, depth=512),
- sep_conv(kernel=[3, 3], stride=1, depth=512),
- sep_conv(kernel=[3, 3], stride=1, depth=512),
- sep_conv(kernel=[3, 3], stride=1, depth=512),
- sep_conv(kernel=[3, 3], stride=1, depth=512),
- sep_conv(kernel=[3, 3], stride=1 if low_res else 2, depth=1024),
- sep_conv(
- kernel=[3, 3],
- stride=1,
- depth=int(_find_target_depth(1024, depth_multiplier)))
- ]
-
-
-def mobilenet_v2_lite_def(reduced=False, is_quantized=False, low_res=False):
- """Conv definitions for a lite MobileNet v2 model.
-
- Args:
- reduced: Determines the expansion factor for the expanded convs. If True, a
- factor of 3 is used. If False, a factor of 6 is used.
- is_quantized: Whether the model is trained in quantized mode.
- low_res: Whether the input to the model is of half resolution.
-
- Returns:
- Array of convolutions.
- """
- expanded_conv = mobilenet_convs.expanded_conv
- expand_input = mobilenet_convs.expand_input_by_factor
- op = mobilenet.op
- return dict(
- defaults={
- # Note: these batch norm parameters affect the architecture;
- # that's why they are here and not in training_scope.
- (slim.batch_norm,): {
- 'center': True,
- 'scale': True
- },
- (slim.conv2d, slim.fully_connected, slim.separable_conv2d): {
- 'normalizer_fn': slim.batch_norm,
- 'activation_fn': tf.nn.relu6
- },
- (expanded_conv,): {
- 'expansion_size': expand_input(6),
- 'split_expansion': 1,
- 'normalizer_fn': slim.batch_norm,
- 'residual': True
- },
- (slim.conv2d, slim.separable_conv2d): {
- 'padding': 'SAME'
- }
- },
- spec=[
- op(slim.conv2d, stride=2, num_outputs=32, kernel_size=[3, 3]),
- op(expanded_conv,
- expansion_size=expand_input(1, divisible_by=1),
- num_outputs=16),
- op(expanded_conv,
- expansion_size=(expand_input(3, divisible_by=1)
- if reduced else expand_input(6)),
- stride=2,
- num_outputs=24),
- op(expanded_conv,
- expansion_size=(expand_input(3, divisible_by=1)
- if reduced else expand_input(6)),
- stride=1,
- num_outputs=24),
- op(expanded_conv, stride=2, num_outputs=32),
- op(expanded_conv, stride=1, num_outputs=32),
- op(expanded_conv, stride=1, num_outputs=32),
- op(expanded_conv, stride=2, num_outputs=64),
- op(expanded_conv, stride=1, num_outputs=64),
- op(expanded_conv, stride=1, num_outputs=64),
- op(expanded_conv, stride=1, num_outputs=64),
- op(expanded_conv, stride=1, num_outputs=96),
- op(expanded_conv, stride=1, num_outputs=96),
- op(expanded_conv, stride=1, num_outputs=96),
- op(expanded_conv, stride=1 if low_res else 2, num_outputs=160),
- op(expanded_conv, stride=1, num_outputs=160),
- op(expanded_conv, stride=1, num_outputs=160),
- op(expanded_conv,
- stride=1,
- num_outputs=320,
- project_activation_fn=(tf.nn.relu6
- if is_quantized else tf.identity))
- ],
- )
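
As a rough illustration of the `_find_target_depth` trick above (an added sketch, mirroring the tests that follow): with `depth_multiplier=0.5` the last separable conv requests a target depth of 2048, because `int(2048 * 0.5) == 1024`, so `Conv2d_13_pointwise` still ends up with exactly 1024 channels after the multiplier is applied.

```python
import tensorflow.compat.v1 as tf
from lstm_object_detection.models import mobilenet_defs
from nets import mobilenet_v1

images = tf.placeholder(tf.float32, (1, 320, 320, 3))
# The lite conv defs are passed straight into the stock MobileNet-v1 builder.
net, _ = mobilenet_v1.mobilenet_v1_base(
    images,
    final_endpoint='Conv2d_13_pointwise',
    min_depth=8,
    depth_multiplier=0.5,
    conv_defs=mobilenet_defs.mobilenet_v1_lite_def(0.5),
    scope='MobilenetV1')
print(net.get_shape().as_list())  # [1, 10, 10, 1024]
```
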
diff --git a/research/lstm_object_detection/models/mobilenet_defs_test.py b/research/lstm_object_detection/models/mobilenet_defs_test.py
deleted file mode 100644
index f1b5bda504b..00000000000
--- a/research/lstm_object_detection/models/mobilenet_defs_test.py
+++ /dev/null
@@ -1,136 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""Tests for lstm_object_detection.models.mobilenet_defs."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import tensorflow.compat.v1 as tf
-from lstm_object_detection.models import mobilenet_defs
-from nets import mobilenet_v1
-from nets.mobilenet import mobilenet_v2
-
-
-class MobilenetV1DefsTest(tf.test.TestCase):
-
- def test_mobilenet_v1_lite_def(self):
- net, _ = mobilenet_v1.mobilenet_v1_base(
- tf.placeholder(tf.float32, (10, 320, 320, 3)),
- final_endpoint='Conv2d_13_pointwise',
- min_depth=8,
- depth_multiplier=1.0,
- conv_defs=mobilenet_defs.mobilenet_v1_lite_def(1.0),
- use_explicit_padding=True,
- scope='MobilenetV1')
- self.assertEqual(net.get_shape().as_list(), [10, 10, 10, 1024])
-
- def test_mobilenet_v1_lite_def_depthmultiplier_half(self):
- net, _ = mobilenet_v1.mobilenet_v1_base(
- tf.placeholder(tf.float32, (10, 320, 320, 3)),
- final_endpoint='Conv2d_13_pointwise',
- min_depth=8,
- depth_multiplier=0.5,
- conv_defs=mobilenet_defs.mobilenet_v1_lite_def(0.5),
- use_explicit_padding=True,
- scope='MobilenetV1')
- self.assertEqual(net.get_shape().as_list(), [10, 10, 10, 1024])
-
- def test_mobilenet_v1_lite_def_depthmultiplier_2x(self):
- net, _ = mobilenet_v1.mobilenet_v1_base(
- tf.placeholder(tf.float32, (10, 320, 320, 3)),
- final_endpoint='Conv2d_13_pointwise',
- min_depth=8,
- depth_multiplier=2.0,
- conv_defs=mobilenet_defs.mobilenet_v1_lite_def(2.0),
- use_explicit_padding=True,
- scope='MobilenetV1')
- self.assertEqual(net.get_shape().as_list(), [10, 10, 10, 1024])
-
- def test_mobilenet_v1_lite_def_low_res(self):
- net, _ = mobilenet_v1.mobilenet_v1_base(
- tf.placeholder(tf.float32, (10, 320, 320, 3)),
- final_endpoint='Conv2d_13_pointwise',
- min_depth=8,
- depth_multiplier=1.0,
- conv_defs=mobilenet_defs.mobilenet_v1_lite_def(1.0, low_res=True),
- use_explicit_padding=True,
- scope='MobilenetV1')
- self.assertEqual(net.get_shape().as_list(), [10, 20, 20, 1024])
-
-
-class MobilenetV2DefsTest(tf.test.TestCase):
-
- def test_mobilenet_v2_lite_def(self):
- net, features = mobilenet_v2.mobilenet_base(
- tf.placeholder(tf.float32, (10, 320, 320, 3)),
- min_depth=8,
- depth_multiplier=1.0,
- conv_defs=mobilenet_defs.mobilenet_v2_lite_def(),
- use_explicit_padding=True,
- scope='MobilenetV2')
- self.assertEqual(net.get_shape().as_list(), [10, 10, 10, 320])
- self._assert_contains_op('MobilenetV2/expanded_conv_16/project/Identity')
- self.assertEqual(
- features['layer_3/expansion_output'].get_shape().as_list(),
- [10, 160, 160, 96])
- self.assertEqual(
- features['layer_4/expansion_output'].get_shape().as_list(),
- [10, 80, 80, 144])
-
- def test_mobilenet_v2_lite_def_is_quantized(self):
- net, _ = mobilenet_v2.mobilenet_base(
- tf.placeholder(tf.float32, (10, 320, 320, 3)),
- min_depth=8,
- depth_multiplier=1.0,
- conv_defs=mobilenet_defs.mobilenet_v2_lite_def(is_quantized=True),
- use_explicit_padding=True,
- scope='MobilenetV2')
- self.assertEqual(net.get_shape().as_list(), [10, 10, 10, 320])
- self._assert_contains_op('MobilenetV2/expanded_conv_16/project/Relu6')
-
- def test_mobilenet_v2_lite_def_low_res(self):
- net, _ = mobilenet_v2.mobilenet_base(
- tf.placeholder(tf.float32, (10, 320, 320, 3)),
- min_depth=8,
- depth_multiplier=1.0,
- conv_defs=mobilenet_defs.mobilenet_v2_lite_def(low_res=True),
- use_explicit_padding=True,
- scope='MobilenetV2')
- self.assertEqual(net.get_shape().as_list(), [10, 20, 20, 320])
-
- def test_mobilenet_v2_lite_def_reduced(self):
- net, features = mobilenet_v2.mobilenet_base(
- tf.placeholder(tf.float32, (10, 320, 320, 3)),
- min_depth=8,
- depth_multiplier=1.0,
- conv_defs=mobilenet_defs.mobilenet_v2_lite_def(reduced=True),
- use_explicit_padding=True,
- scope='MobilenetV2')
- self.assertEqual(net.get_shape().as_list(), [10, 10, 10, 320])
- self.assertEqual(
- features['layer_3/expansion_output'].get_shape().as_list(),
- [10, 160, 160, 48])
- self.assertEqual(
- features['layer_4/expansion_output'].get_shape().as_list(),
- [10, 80, 80, 72])
-
- def _assert_contains_op(self, op_name):
- op_names = [op.name for op in tf.get_default_graph().get_operations()]
- self.assertIn(op_name, op_names)
-
-
-if __name__ == '__main__':
- tf.test.main()
diff --git a/research/lstm_object_detection/protos/__init__.py b/research/lstm_object_detection/protos/__init__.py
deleted file mode 100644
index e69de29bb2d..00000000000
diff --git a/research/lstm_object_detection/protos/input_reader_google.proto b/research/lstm_object_detection/protos/input_reader_google.proto
deleted file mode 100644
index 2c494a62e97..00000000000
--- a/research/lstm_object_detection/protos/input_reader_google.proto
+++ /dev/null
@@ -1,32 +0,0 @@
-syntax = "proto2";
-
-package lstm_object_detection.protos;
-
-import "object_detection/protos/input_reader.proto";
-
-message GoogleInputReader {
- extend object_detection.protos.ExternalInputReader {
- optional GoogleInputReader google_input_reader = 444;
- }
-
- oneof input_reader {
- TFRecordVideoInputReader tf_record_video_input_reader = 1;
- }
-}
-
-message TFRecordVideoInputReader {
- // Path(s) to tfrecords of input data.
- repeated string input_path = 1;
-
- enum DataType {
- UNSPECIFIED = 0;
- TF_EXAMPLE = 1;
- TF_SEQUENCE_EXAMPLE = 2;
- }
- optional DataType data_type = 2 [default=TF_SEQUENCE_EXAMPLE];
-
- // Length of the video sequence. All input video sequences should have the
- // same length in frames, e.g. 5 frames.
- optional int32 video_length = 3;
-}
-
diff --git a/research/lstm_object_detection/protos/pipeline.proto b/research/lstm_object_detection/protos/pipeline.proto
deleted file mode 100644
index 10dd652554a..00000000000
--- a/research/lstm_object_detection/protos/pipeline.proto
+++ /dev/null
@@ -1,69 +0,0 @@
-syntax = "proto2";
-
-package lstm_object_detection.protos;
-
-import "object_detection/protos/pipeline.proto";
-import "lstm_object_detection/protos/quant_overrides.proto";
-
-extend object_detection.protos.TrainEvalPipelineConfig {
- optional LstmModel lstm_model = 205743444;
- optional QuantOverrides quant_overrides = 246059837;
-}
-
-// Message for extra fields needed for configuring LSTM model.
-message LstmModel {
- // Unroll length for training LSTMs.
- optional int32 train_unroll_length = 1;
-
- // Unroll length for evaluating LSTMs.
- optional int32 eval_unroll_length = 2;
-
- // Depth of the lstm feature map.
- optional int32 lstm_state_depth = 3 [default = 256];
-
- // Depth multipliers for multiple feature extractors. Used for interleaved
- // or ensemble models.
- repeated float depth_multipliers = 4;
-
- // Specifies how models are interleaved when multiple feature extractors are
- // used during training. Must be in ['RANDOM', 'RANDOM_SKIP_SMALL'].
- optional string train_interleave_method = 5 [default = 'RANDOM'];
-
- // Specifies how models are interleaved when multiple feature extractors are
- // used during evaluation. Must be in ['RANDOM', 'RANDOM_SKIP', 'SKIPK'].
- optional string eval_interleave_method = 6 [default = 'SKIP9'];
-
- // The stride of the lstm state.
- optional int32 lstm_state_stride = 7 [default = 32];
-
- // Whether to flatten LSTM state and output. Note that this is typically
- // intended only to be modified internally by export_tfmini_lstd_graph_lib
- // to support flatten state for tfmini/tflite. Do not set this field in
- // the pipeline config file unless necessary.
- optional bool flatten_state = 8 [default = false];
-
- // Whether to apply bottleneck layer before going into LSTM gates. This
- // allows multiple feature extractors to use separate bottleneck layers
- // instead of sharing the same one so that different base model output
- // feature dimensions are not forced to be the same.
- // For example:
- // Model 1 outputs feature map f_1 of depth d_1.
- // Model 2 outputs feature map f_2 of depth d_2.
- // Pre-bottlenecking allows lstm input to be either:
- // conv(concat([f_1, h])) or conv(concat([f_2, h])).
- optional bool pre_bottleneck = 9 [default = false];
-
- // Normalize LSTM state, default false.
- optional bool scale_state = 10 [default = false];
-
- // Clip LSTM state at [0, 6], default true.
- optional bool clip_state = 11 [default = true];
-
- // If the model is in quantized training. This field does NOT need to be set
- // manually. Instead, it will be overridden by configs in graph_rewriter.
- optional bool is_quantized = 12 [default = false];
-
- // Downsample input image when using the smaller network in interleaved
- // models, default false.
- optional bool low_res = 13 [default = false];
-}
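
For reference, a sketch of how these extensions appear in a text-format pipeline config (field names follow the protos above; paths and unroll lengths are placeholders, patterned after the example configs shipped with lstm_object_detection):

```
[lstm_object_detection.protos.lstm_model] {
  train_unroll_length: 4
  eval_unroll_length: 4
  lstm_state_depth: 256
}
train_input_reader: {
  label_map_path: "path/to/label_map.pbtxt"
  external_input_reader {
    [lstm_object_detection.protos.GoogleInputReader.google_input_reader] {
      tf_record_video_input_reader: {
        input_path: "path/to/sequence_example/tfrecords"
        data_type: TF_SEQUENCE_EXAMPLE
        video_length: 4
      }
    }
  }
}
```
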
diff --git a/research/lstm_object_detection/protos/quant_overrides.proto b/research/lstm_object_detection/protos/quant_overrides.proto
deleted file mode 100644
index 9dc0eaf86e5..00000000000
--- a/research/lstm_object_detection/protos/quant_overrides.proto
+++ /dev/null
@@ -1,40 +0,0 @@
-syntax = "proto2";
-
-package lstm_object_detection.protos;
-
-// Message to override default quantization behavior.
-message QuantOverrides {
- repeated QuantConfig quant_configs = 1;
-}
-
-// Parameters to manually create fake quant ops outside of the generic
-// tensorflow/contrib/quantize/python/quantize.py script. This may be
-// used to override default behaviour or quantize ops not already supported.
-message QuantConfig {
- // The name of the op to add a fake quant op to.
- required string op_name = 1;
-
- // The name of the fake quant op.
- required string quant_op_name = 2;
-
- // Whether the fake quant op uses fixed ranges. Otherwise, learned moving
- // average ranges are used.
- required bool fixed_range = 3 [default = false];
-
- // The initial minimum value of the range.
- optional float min = 4 [default = -6];
-
- // The initial maximum value of the range.
- optional float max = 5 [default = 6];
-
- // Number of steps to delay before quantization takes effect during training.
- optional int32 delay = 6 [default = 500000];
-
- // Number of bits to use for quantizing weights.
- // Only 8 bit is supported for now.
- optional int32 weight_bits = 7 [default = 8];
-
- // Number of bits to use for quantizing activations.
- // Only 8 bit is supported for now.
- optional int32 activation_bits = 8 [default = 8];
-}
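
Similarly, a hypothetical override in the same text-format config might look like the following; the op names are placeholders, and only the field names and defaults come from the proto above:

```
[lstm_object_detection.protos.quant_overrides] {
  quant_configs: {
    op_name: "some/op/to/quantize"              # placeholder
    quant_op_name: "some/op/to/quantize_quant"  # placeholder
    fixed_range: true
    min: -6
    max: 6
    delay: 500000
  }
}
```
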
diff --git a/research/lstm_object_detection/test_tflite_model.py b/research/lstm_object_detection/test_tflite_model.py
deleted file mode 100644
index a8b5e15e210..00000000000
--- a/research/lstm_object_detection/test_tflite_model.py
+++ /dev/null
@@ -1,53 +0,0 @@
-# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-
-"""Test a tflite model using random input data."""
-
-from __future__ import print_function
-from absl import flags
-import numpy as np
-import tensorflow.compat.v1 as tf
-
-flags.DEFINE_string('model_path', None, 'Path to model.')
-FLAGS = flags.FLAGS
-
-
-def main(_):
-
- flags.mark_flag_as_required('model_path')
-
- # Load TFLite model and allocate tensors.
- interpreter = tf.lite.Interpreter(model_path=FLAGS.model_path)
- interpreter.allocate_tensors()
-
- # Get input and output tensors.
- input_details = interpreter.get_input_details()
- print('input_details:', input_details)
- output_details = interpreter.get_output_details()
- print('output_details:', output_details)
-
- # Test model on random input data.
- input_shape = input_details[0]['shape']
- # Change the following line to feed in your own data.
- input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
- interpreter.set_tensor(input_details[0]['index'], input_data)
-
- interpreter.invoke()
- output_data = interpreter.get_tensor(output_details[0]['index'])
- print(output_data)
-
-
-if __name__ == '__main__':
- tf.app.run()
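
The script above can be pointed at any exported LSTD TFLite model; a typical invocation (the path is a placeholder) looks like:

```
python test_tflite_model.py --model_path=/path/to/lstd_model.tflite
```
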
diff --git a/research/lstm_object_detection/tflite/BUILD b/research/lstm_object_detection/tflite/BUILD
deleted file mode 100644
index 66068925da4..00000000000
--- a/research/lstm_object_detection/tflite/BUILD
+++ /dev/null
@@ -1,81 +0,0 @@
-package(
- default_visibility = ["//visibility:public"],
-)
-
-licenses(["notice"])
-
-cc_library(
- name = "mobile_ssd_client",
- srcs = ["mobile_ssd_client.cc"],
- hdrs = ["mobile_ssd_client.h"],
- deps = [
- "//protos:box_encodings_cc_proto",
- "//protos:detections_cc_proto",
- "//protos:labelmap_cc_proto",
- "//protos:mobile_ssd_client_options_cc_proto",
- "//utils:conversion_utils",
- "//utils:ssd_utils",
- "@com_google_absl//absl/base:core_headers",
- "@com_google_absl//absl/memory",
- "@com_google_absl//absl/types:span",
- "@com_google_glog//:glog",
- "@gemmlowp",
- ],
-)
-
-config_setting(
- name = "enable_edgetpu",
- define_values = {"enable_edgetpu": "true"},
- visibility = ["//visibility:public"],
-)
-
-cc_library(
- name = "mobile_ssd_tflite_client",
- srcs = ["mobile_ssd_tflite_client.cc"],
- hdrs = ["mobile_ssd_tflite_client.h"],
- defines = select({
- "//conditions:default": [],
- "enable_edgetpu": ["ENABLE_EDGETPU"],
- }),
- deps = [
- ":mobile_ssd_client",
- "@com_google_glog//:glog",
- "@com_google_absl//absl/memory",
- "@org_tensorflow//tensorflow/lite:arena_planner",
- "@org_tensorflow//tensorflow/lite:framework",
- "@org_tensorflow//tensorflow/lite/delegates/nnapi:nnapi_delegate",
- "@org_tensorflow//tensorflow/lite/kernels:builtin_ops",
- "//protos:anchor_generation_options_cc_proto",
- "//utils:file_utils",
- "//utils:ssd_utils",
- ] + select({
- "//conditions:default": [],
- "enable_edgetpu": [
- "@libedgetpu//libedgetpu:header",
- ],
- }),
- alwayslink = 1,
-)
-
-cc_library(
- name = "mobile_lstd_tflite_client",
- srcs = ["mobile_lstd_tflite_client.cc"],
- hdrs = ["mobile_lstd_tflite_client.h"],
- defines = select({
- "//conditions:default": [],
- "enable_edgetpu": ["ENABLE_EDGETPU"],
- }),
- deps = [
- ":mobile_ssd_client",
- ":mobile_ssd_tflite_client",
- "@com_google_glog//:glog",
- "@com_google_absl//absl/base:core_headers",
- "@org_tensorflow//tensorflow/lite/kernels:builtin_ops",
- ] + select({
- "//conditions:default": [],
- "enable_edgetpu": [
- "@libedgetpu//libedgetpu:header",
- ],
- }),
- alwayslink = 1,
-)
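
The `enable_edgetpu` config_setting above is toggled with a `--define`. For example, building the LSTD client with EdgeTPU support enabled might look like the following (the target label assumes you build from within this `tflite/` workspace):

```
bazel build //:mobile_lstd_tflite_client --define enable_edgetpu=true
```
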
diff --git a/research/lstm_object_detection/tflite/WORKSPACE b/research/lstm_object_detection/tflite/WORKSPACE
deleted file mode 100644
index 3bce3814f36..00000000000
--- a/research/lstm_object_detection/tflite/WORKSPACE
+++ /dev/null
@@ -1,133 +0,0 @@
-workspace(name = "lstm_object_detection")
-
-load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
-load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")
-
-http_archive(
- name = "bazel_skylib",
- sha256 = "bbccf674aa441c266df9894182d80de104cabd19be98be002f6d478aaa31574d",
- strip_prefix = "bazel-skylib-2169ae1c374aab4a09aa90e65efe1a3aad4e279b",
- urls = ["https://github.com/bazelbuild/bazel-skylib/archive/2169ae1c374aab4a09aa90e65efe1a3aad4e279b.tar.gz"],
-)
-load("@bazel_skylib//lib:versions.bzl", "versions")
-versions.check(minimum_bazel_version = "0.23.0")
-
-# ABSL cpp library.
-http_archive(
- name = "com_google_absl",
- urls = [
- "https://github.com/abseil/abseil-cpp/archive/a02f62f456f2c4a7ecf2be3104fe0c6e16fbad9a.tar.gz",
- ],
- sha256 = "d437920d1434c766d22e85773b899c77c672b8b4865d5dc2cd61a29fdff3cf03",
- strip_prefix = "abseil-cpp-a02f62f456f2c4a7ecf2be3104fe0c6e16fbad9a",
-)
-
-http_archive(
- name = "rules_cc",
- strip_prefix = "rules_cc-master",
- urls = ["https://github.com/bazelbuild/rules_cc/archive/master.zip"],
-)
-
-# GoogleTest/GoogleMock framework. Used by most unit-tests.
-http_archive(
- name = "com_google_googletest",
- urls = ["https://github.com/google/googletest/archive/master.zip"],
- strip_prefix = "googletest-master",
-)
-
-# gflags needed by glog
-http_archive(
- name = "com_github_gflags_gflags",
- sha256 = "6e16c8bc91b1310a44f3965e616383dbda48f83e8c1eaa2370a215057b00cabe",
- strip_prefix = "gflags-77592648e3f3be87d6c7123eb81cbad75f9aef5a",
- urls = [
- "https://mirror.bazel.build/github.com/gflags/gflags/archive/77592648e3f3be87d6c7123eb81cbad75f9aef5a.tar.gz",
- "https://github.com/gflags/gflags/archive/77592648e3f3be87d6c7123eb81cbad75f9aef5a.tar.gz",
- ],
-)
-
-# glog
-http_archive(
- name = "com_google_glog",
- sha256 = "f28359aeba12f30d73d9e4711ef356dc842886968112162bc73002645139c39c",
- strip_prefix = "glog-0.4.0",
- urls = ["https://github.com/google/glog/archive/v0.4.0.tar.gz"],
-)
-
-http_archive(
- name = "zlib",
- build_file = "@com_google_protobuf//:third_party/zlib.BUILD",
- sha256 = "c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1",
- strip_prefix = "zlib-1.2.11",
- urls = ["https://zlib.net/zlib-1.2.11.tar.gz"],
-)
-
-http_archive(
- name = "gemmlowp",
- sha256 = "6678b484d929f2d0d3229d8ac4e3b815a950c86bb9f17851471d143f6d4f7834",
- strip_prefix = "gemmlowp-12fed0cd7cfcd9e169bf1925bc3a7a58725fdcc3",
- urls = [
- "http://mirror.tensorflow.org/github.com/google/gemmlowp/archive/12fed0cd7cfcd9e169bf1925bc3a7a58725fdcc3.zip",
- "https://github.com/google/gemmlowp/archive/12fed0cd7cfcd9e169bf1925bc3a7a58725fdcc3.zip",
- ],
-)
-
-#-----------------------------------------------------------------------------
-# proto
-#-----------------------------------------------------------------------------
-# proto_library, cc_proto_library and java_proto_library rules implicitly depend
-# on @com_google_protobuf//:proto, @com_google_protobuf//:cc_toolchain and
-# @com_google_protobuf//:java_toolchain, respectively.
-# This statement defines the @com_google_protobuf repo.
-http_archive(
- name = "com_google_protobuf",
- strip_prefix = "protobuf-3.8.0",
- urls = ["https://github.com/google/protobuf/archive/v3.8.0.zip"],
- sha256 = "1e622ce4b84b88b6d2cdf1db38d1a634fe2392d74f0b7b74ff98f3a51838ee53",
-)
-
-# java_lite_proto_library rules implicitly depend on
-# @com_google_protobuf_javalite//:javalite_toolchain, which is the JavaLite proto
-# runtime (base classes and common utilities).
-http_archive(
- name = "com_google_protobuf_javalite",
- strip_prefix = "protobuf-384989534b2246d413dbcd750744faab2607b516",
- urls = ["https://github.com/google/protobuf/archive/384989534b2246d413dbcd750744faab2607b516.zip"],
- sha256 = "79d102c61e2a479a0b7e5fc167bcfaa4832a0c6aad4a75fa7da0480564931bcc",
-)
-
-#
-# http_archive(
-# name = "com_google_protobuf",
-# strip_prefix = "protobuf-master",
-# urls = ["https://github.com/protocolbuffers/protobuf/archive/master.zip"],
-# )
-
-# Needed by TensorFlow
-http_archive(
- name = "io_bazel_rules_closure",
- sha256 = "e0a111000aeed2051f29fcc7a3f83be3ad8c6c93c186e64beb1ad313f0c7f9f9",
- strip_prefix = "rules_closure-cf1e44edb908e9616030cc83d085989b8e6cd6df",
- urls = [
- "http://mirror.tensorflow.org/github.com/bazelbuild/rules_closure/archive/cf1e44edb908e9616030cc83d085989b8e6cd6df.tar.gz",
- "https://github.com/bazelbuild/rules_closure/archive/cf1e44edb908e9616030cc83d085989b8e6cd6df.tar.gz", # 2019-04-04
- ],
-)
-
-
-# TensorFlow r1.14-rc0
-http_archive(
- name = "org_tensorflow",
- strip_prefix = "tensorflow-1.14.0-rc0",
- sha256 = "76404a6157a45e8d7a07e4f5690275256260130145924c2a7c73f6eda2a3de10",
- urls = ["https://github.com/tensorflow/tensorflow/archive/v1.14.0-rc0.zip"],
-)
-
-load("@org_tensorflow//tensorflow:workspace.bzl", "tf_workspace")
-tf_workspace(tf_repo_name = "org_tensorflow")
-
-git_repository(
- name = "libedgetpu",
- remote = "sso://coral.googlesource.com/edgetpu-native",
- commit = "83e47d1bcf22686fae5150ebb99281f6134ef062",
-)
diff --git a/research/lstm_object_detection/tflite/mobile_lstd_tflite_client.cc b/research/lstm_object_detection/tflite/mobile_lstd_tflite_client.cc
deleted file mode 100644
index 05a7bbac1b5..00000000000
--- a/research/lstm_object_detection/tflite/mobile_lstd_tflite_client.cc
+++ /dev/null
@@ -1,261 +0,0 @@
-/* Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-==============================================================================*/
-
-#include "mobile_lstd_tflite_client.h"
-
-#include
-
-namespace lstm_object_detection {
-namespace tflite {
-
-std::unique_ptr<MobileLSTDTfLiteClient> MobileLSTDTfLiteClient::Create() {
- auto client = absl::make_unique<MobileLSTDTfLiteClient>();
- if (!client->InitializeClient(CreateDefaultOptions())) {
- LOG(ERROR) << "Failed to initialize client";
- return nullptr;
- }
- return client;
-}
-
-protos::ClientOptions MobileLSTDTfLiteClient::CreateDefaultOptions() {
- const int kMaxDetections = 100;
- const int kClassesPerDetection = 1;
- const double kScoreThreshold = -2.0;
- const double kIouThreshold = 0.5;
-
- protos::ClientOptions options;
- options.set_max_detections(kMaxDetections);
- options.set_max_categories(kClassesPerDetection);
- options.set_score_threshold(kScoreThreshold);
- options.set_iou_threshold(kIouThreshold);
- options.set_agnostic_mode(false);
- options.set_quantize(false);
- options.set_num_keypoints(0);
-
- return options;
-}
-
-std::unique_ptr<MobileLSTDTfLiteClient> MobileLSTDTfLiteClient::Create(
- const protos::ClientOptions& options) {
- auto client = absl::make_unique<MobileLSTDTfLiteClient>();
- if (!client->InitializeClient(options)) {
- LOG(ERROR) << "Failed to initialize client";
- return nullptr;
- }
- return client;
-}
-
-bool MobileLSTDTfLiteClient::InitializeInterpreter(
- const protos::ClientOptions& options) {
- if (options.prefer_nnapi_delegate()) {
- LOG(ERROR) << "NNAPI not supported.";
- return false;
- } else {
- interpreter_->UseNNAPI(false);
- }
-
-#ifdef ENABLE_EDGETPU
- interpreter_->SetExternalContext(kTfLiteEdgeTpuContext,
- edge_tpu_context_.get());
-#endif
-
- // Inputs are: normalized_input_image_tensor, raw_inputs/init_lstm_c,
- // raw_inputs/init_lstm_h
- if (interpreter_->inputs().size() != 3) {
- LOG(ERROR) << "Invalid number of interpreter inputs: " <<
- interpreter_->inputs().size();
- return false;
- }
-
- const std::vector<int> input_tensor_indices = interpreter_->inputs();
- const TfLiteTensor& input_lstm_c =
- *interpreter_->tensor(input_tensor_indices[1]);
- if (input_lstm_c.dims->size != 4) {
- LOG(ERROR) << "Invalid input lstm_c dimensions: " <<
- input_lstm_c.dims->size;
- return false;
- }
- if (input_lstm_c.dims->data[0] != 1) {
- LOG(ERROR) << "Invalid input lstm_c batch size: " <<
- input_lstm_c.dims->data[0];
- return false;
- }
- lstm_state_width_ = input_lstm_c.dims->data[1];
- lstm_state_height_ = input_lstm_c.dims->data[2];
- lstm_state_depth_ = input_lstm_c.dims->data[3];
- lstm_state_size_ = lstm_state_width_ * lstm_state_height_ * lstm_state_depth_;
-
- const TfLiteTensor& input_lstm_h =
- *interpreter_->tensor(input_tensor_indices[2]);
- if (!ValidateStateTensor(input_lstm_h, "input lstm_h")) {
- return false;
- }
-
- // Outputs are:
- // TFLite_Detection_PostProcess,
- // TFLite_Detection_PostProcess:1,
- // TFLite_Detection_PostProcess:2,
- // TFLite_Detection_PostProcess:3,
- // raw_outputs/lstm_c, raw_outputs/lstm_h
- if (interpreter_->outputs().size() != 6) {
- LOG(ERROR) << "Invalid number of interpreter outputs: " <<
- interpreter_->outputs().size();
- return false;
- }
-
- const std::vector<int> output_tensor_indices = interpreter_->outputs();
- const TfLiteTensor& output_lstm_c =
- *interpreter_->tensor(output_tensor_indices[4]);
- if (!ValidateStateTensor(output_lstm_c, "output lstm_c")) {
- return false;
- }
- const TfLiteTensor& output_lstm_h =
- *interpreter_->tensor(output_tensor_indices[5]);
- if (!ValidateStateTensor(output_lstm_h, "output lstm_h")) {
- return false;
- }
-
- // Initialize state with all zeroes.
- lstm_c_data_.resize(lstm_state_size_);
- lstm_h_data_.resize(lstm_state_size_);
- lstm_c_data_uint8_.resize(lstm_state_size_);
- lstm_h_data_uint8_.resize(lstm_state_size_);
-
- if (interpreter_->AllocateTensors() != kTfLiteOk) {
- LOG(ERROR) << "Failed to allocate tensors";
- return false;
- }
-
- return true;
-}
-
-bool MobileLSTDTfLiteClient::ValidateStateTensor(const TfLiteTensor& tensor,
- const std::string& name) {
- if (tensor.dims->size != 4) {
- LOG(ERROR) << "Invalid " << name << " dimensions: " << tensor.dims->size;
- return false;
- }
- if (tensor.dims->data[0] != 1) {
- LOG(ERROR) << "Invalid " << name << " batch size: " << tensor.dims->data[0];
- return false;
- }
- if (tensor.dims->data[1] != lstm_state_width_ ||
- tensor.dims->data[2] != lstm_state_height_ ||
- tensor.dims->data[3] != lstm_state_depth_) {
- LOG(ERROR) << "Invalid " << name << " dimensions: [" <<
- tensor.dims->data[0] << ", " << tensor.dims->data[1] << ", " <<
- tensor.dims->data[2] << ", " << tensor.dims->data[3] << "]";
- return false;
- }
- return true;
-}
-
-bool MobileLSTDTfLiteClient::ComputeOutputLayerCount() {
- // Outputs are: raw_outputs/box_encodings, raw_outputs/class_predictions,
- // raw_outputs/lstm_c, raw_outputs/lstm_h
- CHECK_EQ(interpreter_->outputs().size(), 4);
- num_output_layers_ = 1;
- return true;
-}
-
-bool MobileLSTDTfLiteClient::FloatInference(const uint8_t* input_data) {
- // Inputs are: normalized_input_image_tensor, raw_inputs/init_lstm_c,
- // raw_inputs/init_lstm_h
- CHECK(input_data) << "Input data cannot be null.";
- float* input = interpreter_->typed_input_tensor<float>(0);
- CHECK(input) << "Input tensor cannot be null.";
- // Normalize the uint8 input image with mean_value_, std_value_.
- NormalizeInputImage(input_data, input);
-
- // Copy input LSTM state into TFLite's input tensors.
- float* lstm_c_input = interpreter_->typed_input_tensor<float>(1);
- CHECK(lstm_c_input) << "Input lstm_c tensor cannot be null.";
- std::copy(lstm_c_data_.begin(), lstm_c_data_.end(), lstm_c_input);
-
- float* lstm_h_input = interpreter_->typed_input_tensor<float>(2);
- CHECK(lstm_h_input) << "Input lstm_h tensor cannot be null.";
- std::copy(lstm_h_data_.begin(), lstm_h_data_.end(), lstm_h_input);
-
- // Run inference on inputs.
- CHECK_EQ(interpreter_->Invoke(), kTfLiteOk) << "Invoking interpreter failed.";
-
- // Copy LSTM state out of TFLite's output tensors.
- // Outputs are: raw_outputs/box_encodings, raw_outputs/class_predictions,
- // raw_outputs/lstm_c, raw_outputs/lstm_h
- float* lstm_c_output = interpreter_->typed_output_tensor<float>(2);
- CHECK(lstm_c_output) << "Output lstm_c tensor cannot be null.";
- std::copy(lstm_c_output, lstm_c_output + lstm_state_size_,
- lstm_c_data_.begin());
-
- float* lstm_h_output = interpreter_->typed_output_tensor<float>(3);
- CHECK(lstm_h_output) << "Output lstm_h tensor cannot be null.";
- std::copy(lstm_h_output, lstm_h_output + lstm_state_size_,
- lstm_h_data_.begin());
- return true;
-}
-
-bool MobileLSTDTfLiteClient::QuantizedInference(const uint8_t* input_data) {
- // Inputs are: normalized_input_image_tensor, raw_inputs/init_lstm_c,
- // raw_inputs/init_lstm_h
- CHECK(input_data) << "Input data cannot be null.";
- uint8_t* input = interpreter_->typed_input_tensor<uint8_t>(0);
- CHECK(input) << "Input tensor cannot be null.";
- memcpy(input, input_data, input_size_);
-
- // Copy input LSTM state into TFLite's input tensors.
- uint8_t* lstm_c_input = interpreter_->typed_input_tensor<uint8_t>(1);
- CHECK(lstm_c_input) << "Input lstm_c tensor cannot be null.";
- std::copy(lstm_c_data_uint8_.begin(), lstm_c_data_uint8_.end(), lstm_c_input);
-
- uint8_t* lstm_h_input = interpreter_->typed_input_tensor<uint8_t>(2);
- CHECK(lstm_h_input) << "Input lstm_h tensor cannot be null.";
- std::copy(lstm_h_data_uint8_.begin(), lstm_h_data_uint8_.end(), lstm_h_input);
-
- // Run inference on inputs.
- CHECK_EQ(interpreter_->Invoke(), kTfLiteOk) << "Invoking interpreter failed.";
-
- // Copy LSTM state out of TFLite's output tensors.
- // Outputs are:
- // TFLite_Detection_PostProcess,
- // TFLite_Detection_PostProcess:1,
- // TFLite_Detection_PostProcess:2,
- // TFLite_Detection_PostProcess:3,
- // raw_outputs/lstm_c, raw_outputs/lstm_h
- uint8_t* lstm_c_output = interpreter_->typed_output_tensor<uint8_t>(4);
- CHECK(lstm_c_output) << "Output lstm_c tensor cannot be null.";
- std::copy(lstm_c_output, lstm_c_output + lstm_state_size_,
- lstm_c_data_uint8_.begin());
-
- uint8_t* lstm_h_output = interpreter_->typed_output_tensor<uint8_t>(5);
- CHECK(lstm_h_output) << "Output lstm_h tensor cannot be null.";
- std::copy(lstm_h_output, lstm_h_output + lstm_state_size_,
- lstm_h_data_uint8_.begin());
- return true;
-}
-
-bool MobileLSTDTfLiteClient::Inference(const uint8_t* input_data) {
- if (input_data == nullptr) {
- LOG(ERROR) << "input_data cannot be null for inference.";
- return false;
- }
- if (IsQuantizedModel())
- return QuantizedInference(input_data);
- else
- return FloatInference(input_data);
- return true;
-}
-
-} // namespace tflite
-} // namespace lstm_object_detection
diff --git a/research/lstm_object_detection/tflite/mobile_lstd_tflite_client.h b/research/lstm_object_detection/tflite/mobile_lstd_tflite_client.h
deleted file mode 100644
index e4f16bc945a..00000000000
--- a/research/lstm_object_detection/tflite/mobile_lstd_tflite_client.h
+++ /dev/null
@@ -1,74 +0,0 @@
-/* Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-==============================================================================*/
-
-#ifndef TENSORFLOW_MODELS_LSTM_OBJECT_DETECTION_TFLITE_MOBILE_LSTD_TFLITE_CLIENT_H_
-#define TENSORFLOW_MODELS_LSTM_OBJECT_DETECTION_TFLITE_MOBILE_LSTD_TFLITE_CLIENT_H_
-
-#include
-#include
-
-#include
-#include "mobile_ssd_client.h"
-#include "mobile_ssd_tflite_client.h"
-
-namespace lstm_object_detection {
-namespace tflite {
-
-// Client for LSTD MobileNet TfLite model.
-class MobileLSTDTfLiteClient : public MobileSSDTfLiteClient {
- public:
- MobileLSTDTfLiteClient() = default;
- // Create with default options.
- static std::unique_ptr<MobileLSTDTfLiteClient> Create();
- static std::unique_ptr<MobileLSTDTfLiteClient> Create(
- const protos::ClientOptions& options);
- ~MobileLSTDTfLiteClient() override = default;
- static protos::ClientOptions CreateDefaultOptions();
-
- protected:
- bool InitializeInterpreter(const protos::ClientOptions& options) override;
- bool ComputeOutputLayerCount() override;
- bool Inference(const uint8_t* input_data) override;
-
- private:
- // MobileLSTDTfLiteClient is neither copyable nor movable.
- MobileLSTDTfLiteClient(const MobileLSTDTfLiteClient&) = delete;
- MobileLSTDTfLiteClient& operator=(const MobileLSTDTfLiteClient&) = delete;
-
- bool ValidateStateTensor(const TfLiteTensor& tensor, const std::string& name);
-
- // Helper functions used by Inference functions.
- bool FloatInference(const uint8_t* input_data);
- bool QuantizedInference(const uint8_t* input_data);
-
- // LSTM model parameters.
- int lstm_state_width_ = 0;
- int lstm_state_height_ = 0;
- int lstm_state_depth_ = 0;
- int lstm_state_size_ = 0;
-
- // LSTM state stored between float inference runs.
- std::vector<float> lstm_c_data_;
- std::vector<float> lstm_h_data_;
-
- // LSTM state stored between uint8 inference runs.
- std::vector<uint8_t> lstm_c_data_uint8_;
- std::vector<uint8_t> lstm_h_data_uint8_;
-};
-
-} // namespace tflite
-} // namespace lstm_object_detection
-
-#endif // TENSORFLOW_MODELS_LSTM_OBJECT_DETECTION_TFLITE_MOBILE_LSTD_TFLITE_CLIENT_H_
diff --git a/research/lstm_object_detection/tflite/mobile_ssd_client.cc b/research/lstm_object_detection/tflite/mobile_ssd_client.cc
deleted file mode 100644
index 27bf70109e4..00000000000
--- a/research/lstm_object_detection/tflite/mobile_ssd_client.cc
+++ /dev/null
@@ -1,209 +0,0 @@
-/* Copyright 2019 The TensorFlow Authors. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-==============================================================================*/
-
-#include "mobile_ssd_client.h"
-
-#include
-
-#include