MobileNetV3 QAT TFLite Conversion Issue #1107

Open
tarushbansal opened this issue Jan 15, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@tarushbansal

tarushbansal commented Jan 15, 2024


Describe the bug

This is a similar issue to #368 but for MobileNetV3Large: after Quantisation Aware Training (QAT), I see a large drop in accuracy in the QAT TFLite model compared to the corresponding QAT Keras model.

Minor implementation details: I had to refactor the default MobileNetV3Large Keras code to make it compatible with QAT in the TensorFlow Model Optimisation library. I replaced the Add operations in its Hard Sigmoid function with Rescaling and used a moving-average, output-only quantiser for the Multiply and Rescaling layers in the network. I trained the network for more than 6-7 epochs, with ~5100 batches per epoch (10 samples per batch), but I see no convergence between the Keras and TFLite models as was seen in #368.

I see a number of people in #974 have raised the same issue, but it has not been fixed yet. It's likely that this is a kernel implementation bug similar to #368, so it would be great if a fix could be developed. It might be helpful to note that I didn't face this issue with MobileNetV3Large minimalistic, which makes me wonder whether the issue is in the Multiply layers of the Squeeze-Excite and Hard Swish functions.

Would appreciate any help. Thanks!

System information

TensorFlow version (installed from source or binary): 2.15.0

TensorFlow Model Optimization version (installed from source or binary): 0.7.5

Python version: 3.10.12

Keras Version: 2.15.0

Describe the expected behavior
The QAT Keras model should generate identical outputs to the converted QAT TFLite model.

Describe the current behavior
The converted QAT TFLite model has much lower accuracy than the corresponding QAT Keras model.
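
One way to put a number on the mismatch described above is to run the same inputs through both models and compare the flattened outputs. The helper below is only a sketch: `compare_outputs` is a hypothetical name, and it assumes the two output tensors have already been flattened into plain Python lists of floats of equal length.

```python
def compare_outputs(keras_out, tflite_out):
    """Summarise element-wise disagreement between two flattened outputs.

    Both arguments are assumed to be equal-length sequences of floats,
    e.g. the Keras prediction and the TFLite prediction for one sample.
    """
    if len(keras_out) != len(tflite_out):
        raise ValueError("outputs must have the same number of elements")
    if not keras_out:
        raise ValueError("outputs must not be empty")

    diffs = [abs(a - b) for a, b in zip(keras_out, tflite_out)]
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
    }


if __name__ == "__main__":
    # Toy values only; real outputs would come from the two models.
    print(compare_outputs([1.0, 2.0], [1.0, 2.5]))
```

For a quantised model, a maximum difference of roughly one output quantisation step would usually be unremarkable; differences far above that point to a real divergence between the two runtimes.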

Code to reproduce the issue

This is the refactored part of MobileNetV3:

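from tensorflow.keras import layers

def hard_sigmoid(x):
    # relu6(x + 3) / 6, with the Add ops expressed as Rescaling layers
    return layers.Rescaling(1.0 / 6.0, offset=0.0)(
        layers.ReLU(6.0)(layers.Rescaling(1.0, offset=3.0)(x))
    )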
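
As a sanity check on the refactor itself, the Rescaling form can be compared against the usual definition relu6(x + 3) / 6 on scalars. This is a plain-Python sketch, not the Keras code: `rescaling` is a hypothetical stand-in that assumes a Rescaling layer computes `x * scale + offset`, and `hard_swish` assumes the usual `x * hard_sigmoid(x)` form.

```python
def relu6(x):
    """ReLU clipped at 6."""
    return min(max(x, 0.0), 6.0)


def rescaling(x, scale, offset=0.0):
    """Stand-in for a Rescaling layer: x * scale + offset."""
    return x * scale + offset


def hard_sigmoid_reference(x):
    """Usual definition with an explicit add and divide."""
    return relu6(x + 3.0) / 6.0


def hard_sigmoid_rescaled(x):
    """Same function built only from rescaling and relu6."""
    return rescaling(relu6(rescaling(x, 1.0, offset=3.0)), 1.0 / 6.0)


def hard_swish(x):
    """Hard swish: the input multiplied by its hard sigmoid."""
    return x * hard_sigmoid_rescaled(x)


if __name__ == "__main__":
    for x in (-4.0, -3.0, -1.0, 0.0, 1.5, 3.0, 5.0):
        print(x, hard_sigmoid_reference(x), hard_sigmoid_rescaled(x))
```

The two forms agree up to floating-point rounding, so the refactor by itself should not change the float model; any accuracy gap would have to come from how the layers are quantised or executed.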

The Quantization Config I use for Multiply and Rescaling layers is this:

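import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.python.core.quantization.keras import quantize_config

class CustomQuantizeConfig(quantize_config.QuantizeConfig):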
    """QuantizeConfig which only quantizes layer outputs."""

    def get_weights_and_quantizers(self, layer):
        return []

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        pass

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    def get_output_quantizers(self, layer):
        return [
            tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
                num_bits=8, symmetric=False, narrow_range=False, per_axis=False
            )
        ]

    def get_config(self):
        return {}
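
For readers unfamiliar with what an output-only, asymmetric 8-bit moving-average quantiser does conceptually, here is a simplified pure-Python sketch. It is not TFMOT's implementation: the function names, the decay value of 0.999, and the rounding details are assumptions made for illustration only.

```python
def ema_update(previous, observed, decay=0.999):
    """Exponential moving average, used to track a running min or max."""
    return decay * previous + (1.0 - decay) * observed


def quant_params(range_min, range_max, num_bits=8):
    """Scale and zero point for asymmetric quantisation of a float range."""
    # Widen the range so that 0.0 is always representable.
    range_min = min(range_min, 0.0)
    range_max = max(range_max, 0.0)
    qmax = 2 ** num_bits - 1  # 255 for 8 bits (full range, not narrow)
    scale = (range_max - range_min) / qmax
    if scale == 0.0:
        scale = 1.0  # degenerate range; avoid dividing by zero
    zero_point = int(round(-range_min / scale))
    return scale, zero_point, qmax


def fake_quantize(x, range_min, range_max, num_bits=8):
    """Quantise x to an integer level, then map it back to float."""
    scale, zero_point, qmax = quant_params(range_min, range_max, num_bits)
    q = int(round(x / scale)) + zero_point
    q = min(max(q, 0), qmax)  # clamp to the representable levels
    return (q - zero_point) * scale


if __name__ == "__main__":
    print(fake_quantize(0.3, -1.0, 1.0))   # close to 0.3
    print(fake_quantize(5.0, -1.0, 1.0))   # clamped near the top of the range
```

During training the tracked min and max move slowly towards the observed activation range, which is why a moving-average quantiser can need many steps before the simulated ranges settle.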

I've tried using an AllValuesQuantizer as well since it was mentioned in #368 that MovingAverageQuantizer takes time to converge but that didn't seem to help either.

Screenshots
Screenshot 2024-01-15 at 09 57 11
Screenshot 2024-01-15 at 09 57 47

Additional context
The use case for the network is Monocular Depth Estimation so there is a decoding network attached on top of the MobileNetV3 encoder.

@tarushbansal added the "bug" label Jan 15, 2024
@Alexey234432

I had the same issue and was not able to solve it; it was specific to MobileNetV3 with Keras / TFLite. If I remember correctly, as a workaround I had to switch to the "SLIM"-based implementation (without Keras at all).

@tarushbansal
Author

Hi @Alexey234432. Just out of curiosity, did you have any luck getting the SLIM-based implementation working? It's unfortunate that Keras / TFLite doesn't have a solution for this, since MobileNetV3 is quite a popular choice for edge-device use cases these days.

@Alexey234432

Alexey234432 commented Jan 15, 2024

Yes, it worked for my use case. https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/README.md The SLIM implementation already has 8-bit weights (I guess QAT), and their accuracy after TFLite conversion was good. I think this indicates that the problem is not in the architecture of MobileNetV3 but is likely a bug somewhere in the tooling.

@doyeonkim0
Copy link
Member

Hi @tucan9389
Could you please take a look into this? Thank you!

@tucan9389 tucan9389 self-assigned this Apr 1, 2024
@tucan9389 tucan9389 removed their assignment Jul 17, 2024
@MATTYGILO

@tarushbansal @Alexey234432
Has a solution been found for TensorFlow Keras?

@MATTYGILO

@tucan9389 Has a solution been found?

@tucan9389
Member

@MATTYGILO No. As far as I know, this thread has the most up-to-date information.


5 participants