Bonded clients do not work when switching from 1.4.2 to 2.0.0 or vice versa #740

KlausMu opened this issue Nov 2, 2024 · 13 comments
@KlausMu

KlausMu commented Nov 2, 2024

I am using NimBLE with an ESP32. It seems the way bonded clients are stored in NVS changed from NimBLE 1.4.2 to 2.0.0.

Going from 1.4.2 to 2.0.0
In 2.0.0, I can connect to an already bonded peer (which was bonded in 1.4.2). Even direct advertisement works (see #651)
But when sending data, the result is

D NimBLECharacteristic: >> sendValue
D NimBLECharacteristic: << sendValue: No clients subscribed.
D NimBLECharacteristic: >> sendValue
D NimBLECharacteristic: << sendValue: No clients subscribed.

Normally, in version 1.4.2, when sending data, there is something like

D NimBLECharacteristic: >> setValue: length=8, data=0000510000000000, characteristic UUID=0x2a4d
D NimBLECharacteristic: << setValue
D NimBLECharacteristic: >> notify: length: 8
D NimBLECharacteristicCallbacks: onNotify: default
D NimBLEServer: >> handleGapEvent:
D NimBLECharacteristicCallbacks: onStatus: default
D NimBLECharacteristic: << notify
D NimBLECharacteristic: >> setValue: length=8, data=0000000000000000, characteristic UUID=0x2a4d
D NimBLECharacteristic: << setValue
D NimBLECharacteristic: >> notify: length: 8
D NimBLECharacteristicCallbacks: onNotify: default
D NimBLEServer: >> handleGapEvent:
D NimBLECharacteristicCallbacks: onStatus: default
D NimBLECharacteristic: << notify

As a result, I have to delete all bonds in 2.0.0 and pair the peers again. From then on, everything works as expected.

Going from 2.0.0 to 1.4.2
When going back from 2.0.0 to 1.4.2, the software crashes at

void NimBLEDevice::init(const std::string &deviceName) {
  ble_store_config_init();

If there was no bonded peer in 2.0.0, going back to 1.4.2 works without problem.

@h2zero
Owner

h2zero commented Nov 3, 2024

This appears to be a result of upstream changes in the way the data is stored in NVS. I haven't identified the changes yet, but it may just have to be another of the breaking changes coming with the 2.0.0 release.

@h2zero
Owner

h2zero commented Dec 3, 2024

@KlausMu I have identified the cause of this issue. It was an intended change, and a good one.
The commit here changes the IRK to a random one when the device is freshly flashed and booted, so that every ESP32 has a different IRK rather than all sharing the same one, which was a problem when trying to bond one device with many ESP32s.

@KlausMu
Author

KlausMu commented Dec 3, 2024

Ok, thanks. Do I understand correctly:

Going from 1.4.2 to 2.0.0
ESP32 is freshly flashed and booted, IRK changes, already bonded devices do not work because of the changed IRK.
As an application developer, I could store the last used version (1.4.2 or 2.0.0) in NVS and, when it changes, delete the bonds or at least let the user do it.

Going from 2.0.0 back to 1.4.2
What is happening to the random IRK generated when 2.0.0 is used for the first time? Will it be set back to the default one?
Why is the ESP32 crashing on boot? Anything we can do to avoid this? Is the only advice to erase the flash?

@h2zero
Owner

h2zero commented Dec 3, 2024

Going from 1.4.2 to 2.0.0
ESP32 is freshly flashed and booted, IRK changes, already bonded devices do not work because of the changed IRK.

Correct, this basically changes the device identity.

Going from 2.0.0 back to 1.4.2
What is happening to the random IRK generated when 2.0.0 is used for the first time? Will it be set back to the default one?
Why is the ESP32 crashing on boot? Anything we can do to avoid this? Is the only advice to erase the flash?

The NVS partition will still have that IRK, but the old firmware will not be aware of it.

I can't say for sure what causes the crash; I would need to test it. I suspect it's due to the different data stored in NVS because of the IRK change.

The best bet is to erase the flash, because the NVS format will have changed, or at least to erase the NVS on boot. You could do this by writing a version parameter or something similar and detecting a change.
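
The version-parameter idea could be sketched roughly as follows. This is a host-side illustration only, not code from the library: `KeyValueStore` is a hypothetical in-memory stand-in for an NVS namespace, and the key name `nimble_major` is invented for the example. On the device, the read and write would presumably go through `nvs_get_u32` and `nvs_set_u32` instead. Treating a missing marker as a change is a deliberately conservative assumption.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for an NVS namespace (not an ESP-IDF type).
using KeyValueStore = std::map<std::string, uint32_t>;

// Returns true when the stored NimBLE major version differs from the running
// one (or was never written), then records the running version.
bool nimbleVersionChanged(KeyValueStore& store, uint32_t runningMajor) {
  const std::string key = "nimble_major";  // invented key name for this sketch
  auto it = store.find(key);
  const bool changed = (it == store.end()) || (it->second != runningMajor);
  store[key] = runningMajor;  // remember the version for the next boot
  return changed;
}
```

A caller would erase the bond data whenever this returns true. Note that firmware built before the marker was introduced never writes it, so the first boot after adding this check always reports a change.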

@h2zero
Owner

h2zero commented Dec 9, 2024

@KlausMu I have created a way to detect the downgrade and only erase the NimBLE bond storage info instead of the entire NVS.

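    // Requires #include <nvs.h> and #include <nvs_flash.h>.
    // Initialize NVS; if the partition is full or has a newer format, erase it and retry.
    esp_err_t err = nvs_flash_init();
    if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
        err = nvs_flash_erase();
        if (err == ESP_OK) {
            err = nvs_flash_init();
        }
    }

    if (err != ESP_OK) {
        Serial.printf("nvs_flash_init() failed; err=%d\n", err);
    }

    // Open the namespace in which NimBLE stores its bond data.
    nvs_handle_t nimble_handle;
    err = nvs_open("nimble_bond", NVS_READWRITE, &nimble_handle);
    if (err != ESP_OK) {
        // Do not touch the handle if the namespace could not be opened.
        Serial.printf("nvs_open(\"nimble_bond\") failed; err=%d\n", err);
    } else {
        // Passing NULL as the output buffer only queries whether the blob exists (and its size).
        size_t required_size = 0;
        err = nvs_get_blob(nimble_handle, "rpa_rec_1", NULL, &required_size);
        if (err != ESP_OK) {
            err = nvs_get_blob(nimble_handle, "local_irk_1", NULL, &required_size);
        }

        // Either key means the data was written by 2.x, so erase the whole namespace.
        if (err == ESP_OK) {
            nvs_erase_all(nimble_handle);
            nvs_commit(nimble_handle);
        }

        nvs_close(nimble_handle);
    }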

Give that a try and let me know.

@KlausMu
Author

KlausMu commented Dec 10, 2024

@h2zero I think this could be useful, but I am not 100% sure which cases your code covers.

So when exactly is rpa_rec_1 present, and when is local_irk_1 present?

I did some tests, and I think I understood the following:

rpa_rec_1
is present exactly if a bond from 2.0.0 is in NVS

local_irk_1
in 2.0.0 seems always to be present, no matter if bonds are in NVS or not, no matter if freshly flashed or not
in 1.4.2 present only if 2.0.0 was flashed before, no matter if bonds are in NVS or not

Overall, I think the following pseudo code could be used:

if ((NimBLE == 2.0.0) && (numbonds > 0) && (rpa_rec_1 is NOT present))  {
  // we are in 2.0.0  
  // we have at least one bond
  // but this bond is not from 2.0.0
  // -> we are going from 1.4.2 to 2.0.0 with an already bonded peer from 1.4.2
  // -> we have to delete bonds, because they will connect, but they will not work properly
  nvs_erase_all("nimble_bond") 

} else if ((NimBLE == 1.4.2) && (rpa_rec_1 is present)) {
  // we are in 1.4.2
  // we have a bond from 2.0.0 in NVS
  // -> we are going back from 2.0.0 to 1.4.2 with an already bonded peer from 2.0.0
  // -> we have to delete bonds, otherwise it will crash
  nvs_erase_all("nimble_bond") 

}

Do you think this is correct?
How can I get the version number of NimBLE at runtime?

@h2zero
Owner

h2zero commented Dec 10, 2024

Both of those keys are only present in version 2.x (nimble core 1.5), so in your app you can use the code I provided to detect if they exist and erase the bonds if found. I'd put this right at the beginning of setup, before any NimBLE calls.

@KlausMu
Author

KlausMu commented Dec 10, 2024

Ok, they are only set by 2.0.0. And if I am in 1.4.2, I should simply delete the bonds (and these two blobs) in any case.

But I think the first part of my code (detect upgrade from 1.4.2 to 2.0.0) should be correct, right?
If I am in 2.0.0 and numBonds > 0 and rpa_rec_1 is NOT present (or both blobs are not set, should make no difference), I also have to delete all bonds.

@h2zero
Owner

h2zero commented Dec 10, 2024

Yes, that would be correct as well.

@KlausMu
Author

KlausMu commented Dec 10, 2024

Ok, I'll test it in more detail and give feedback here. Thanks!

@KlausMu
Author

KlausMu commented Dec 11, 2024

Two more questions:

Question 1
NimBLEDevice::getNumBonds(); is only available after NimBLEDevice::init(deviceName); was called.
But init already crashes the ESP32.
Is there any other way to check if there are already bonds stored in NVS? Any other data I could read directly from NVS?
Sorry for asking, I tried digging through ble_store.c and ble_store_util.c, but it is really hard for me to understand ...

Question 2
After erasing the nimble_bond namespace, NimBLEDevice::getNumBonds() still returns the bonded peers.
I believe they are saved somewhere in RAM after they have been read from NVS. Can I force refilling the RAM from NVS?
Or do I have to reboot the ESP32?

@h2zero
Owner

h2zero commented Dec 11, 2024

@KlausMu You can check for peer_sec_1 which will be there if at least 1 bond exists.
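
As a rough sketch of how the three keys mentioned in this thread could be probed together before `NimBLEDevice::init` is called: the struct and function names below are hypothetical, and the key lookup is passed in as a callable so the logic can be tried on a host. On the ESP32, that callable could wrap `nvs_get_blob(handle, key, NULL, &size) == ESP_OK` on the `nimble_bond` namespace.

```cpp
#include <functional>
#include <string>

// Which of the three blobs discussed in this thread are present.
struct BondStoreState {
  bool bondExists;  // "peer_sec_1": at least one bond is stored
  bool rpaExists;   // "rpa_rec_1": only written by NimBLE-Arduino 2.x
  bool irkExists;   // "local_irk_1": only written by NimBLE-Arduino 2.x
};

// hasKey is a caller-supplied probe (hypothetical helper, not a NimBLE API).
BondStoreState probeBondStore(const std::function<bool(const std::string&)>& hasKey) {
  return BondStoreState{hasKey("peer_sec_1"), hasKey("rpa_rec_1"), hasKey("local_irk_1")};
}
```

Keeping the probe separate from the erase decision means the decision can be checked without any NVS access.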

After erasing the nimble_bond namespace, NimBLEDevice::getNumBonds() still returns the bonded peers.
I believe they are saved somewhere in RAM after they have been read from NVS. Can I force refilling the RAM from NVS?
Or do I have to reboot the ESP32?

I can't think of any way to clear the RAM other than reset.

@KlausMu
Author

KlausMu commented Dec 12, 2024

Finally, I got it working. Both when upgrading from 1.4.x to 2.0.x, and when downgrading from 2.0.x to 1.4.x.
Thanks @h2zero for your help.
For those who are interested, I'll post here the complete code of a function called delete_bonds_if_NimBLE_version_changed()
This code is best called as early as possible in setup(). It must be called before NimBLEDevice::init.

// This include is only needed to determine if NimBLE 1.4.x or 2.0.x is used.
// NimBLE 2.0.x is using nimble core 1.5, and only in this version BLE_STORE_OBJ_TYPE_LOCAL_IRK is defined
#include "nimble/nimble/host/include/host/ble_store.h"
#if defined(BLE_STORE_OBJ_TYPE_LOCAL_IRK)
#define NIMBLE_ARDUINO_2_x
#endif

#include <nvs.h>
#include <nvs_flash.h>

void delete_bonds_if_NimBLE_version_changed() {
  // This function checks if bonds are already present when changing from NimBLE 1.4.x to 2.0.x or from 2.0.x back to 1.4.x
  // In these cases, we have to delete the already existing bonds.
  // Otherwise the bonds will not work (when going from 1.4.x to 2.0.x) or the ESP32 will even crash (when going from 2.0.x back to 1.4.x).
  // See https://github.com/h2zero/NimBLE-Arduino/issues/740
  // The names of the NVS namespace and blobs used in this function can be seen here:
  // <nimble/nimble/host/store/config/src/ble_store_nvs.c>
  // NimBLE 1.4.x -> nimble core 1.4
  // NimBLE 2.0.x -> nimble core 1.5

  // startup: init flash
  esp_err_t err = nvs_flash_init();
  if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
    Serial.printf("nvs_flash_init() failed with error=%d, will erase flash\r\n", err);
    err = nvs_flash_erase();
    if (err != ESP_OK) {
      Serial.printf("nvs_flash_erase() failed with error=%d; will return\r\n", err);
      return;
    }
    err = nvs_flash_init();
    if (err != ESP_OK) {
      Serial.printf("nvs_flash_init() failed with error=%d, even after flash was erased; will return\r\n", err);
      return;
    }
  }
 
  // open namespace "nimble_bond" where the bonds are stored
  nvs_handle_t nimble_bond_handle;
  err = nvs_open("nimble_bond", NVS_READWRITE, &nimble_bond_handle);
  if (err != ESP_OK) {
    Serial.printf("nvs_open 'nimble_bond' failed with error=%d, will return\r\n", err);
    return;
  }

  size_t required_size = 0;
  // Key generated during the pairing process. Present if a bond exists, used by NimBLE 1.4.x and NimBLE 2.0.x
  err = nvs_get_blob(nimble_bond_handle, "peer_sec_1", NULL, &required_size);
  bool bond_exists = (err == ESP_OK);
  // Resolvable Private Address (RPA): Bluetooth Device Address that changes periodically.
  // Only present in NimBLE 2.0.x
  err = nvs_get_blob(nimble_bond_handle, "rpa_rec_1", NULL, &required_size);
  bool rpa_exists = (err == ESP_OK);
  // Identity Resolving Key (IRK): Key used for Address Resolution (resolves an RPA).
  // Only present in NimBLE 2.0.x
  err = nvs_get_blob(nimble_bond_handle, "local_irk_1", NULL, &required_size);
  bool irk_exists = (err == ESP_OK);
  // and just for information, what an Identity Address is:
  // Identity Address: An address associated with an RPA that does not change over time. An IRK is required to resolve an RPA to its Identity Address.
 
  // Serial.printf("'peer_sec_1' present: %s; 'rpa_rec_1' present: %s; 'local_irk_1' present: %s\r\n", bond_exists ? "yes" : "no", rpa_exists ? "yes" : "no", irk_exists ? "yes" : "no");
  /*
  running version, NVS content                peer_sec_1 rpa_rec_1  local_irk_1     namespace 'nimble_bond' should be erased
  1.4.x, no bonds                             NO         NO         NO
  1.4.x, with bonds from 1.4.x                YES        NO         NO
  1.4.x, with bonds from 2.0.x                YES        YES        YES             x  (otherwise the ESP32 will crash)
  1.4.x, with bonds from 2.0.x deleted        NO         NO         Y/N (*)        (x) (just to be safe, would work without)
  2.0.x, no bonds                             NO         NO         YES
  2.0.x, with bonds from 1.4.x                YES        NO         YES             x  (otherwise the bonds will not work)
  2.0.x, with bonds from 1.4.x deleted        NO         NO         YES
  2.0.x, with bonds from 2.0.x                YES        YES        YES
  (*) YES or NO, depending on whether the ESP32 has rebooted at least once in 2.0.x after the bond was deleted
  */

  #if !defined(NIMBLE_ARDUINO_2_x)
  // We are in NimBLE 1.4.x. Check if we downgraded from NimBLE 2.0.x
  bool erase_nimble_partition = (rpa_exists || irk_exists);
  if (erase_nimble_partition) {
    Serial.printf("We are using NimBLE 1.4.x, but bonds from NimBLE 2.0.x are present. We have to delete all bonds, otherwise ESP32 will crash! Please bond your peers again.\r\n");
  }
  #else
  // We are in NimBLE 2.0.x. Check if we upgraded from NimBLE 1.4.x
  bool erase_nimble_partition = bond_exists && !(rpa_exists);
  if (erase_nimble_partition) {
    Serial.printf("We are using NimBLE 2.0.x, but bonds from NimBLE 1.4.x are present. We have to delete all bonds, otherwise they will not work! Please bond your peers again.\r\n");
  }
  #endif

  if (erase_nimble_partition) {
    nvs_erase_all(nimble_bond_handle);
    nvs_commit(nimble_bond_handle);
    nvs_close(nimble_bond_handle);
    // ESP needs to be restarted, because NVS data is still in nimble RAM
    Serial.printf("  NVS partition 'nimble_bond' was erased. Now we have to restart the ESP32 to also clear nimble RAM.\r\n");
    ESP.restart();
  } else {
    nvs_close(nimble_bond_handle);
  }
}
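
The erase decision in the function above can also be written as a pure function, which makes it possible to check every row of the table on a host without an ESP32. This is only a sketch: `NimbleMajor` and `shouldEraseBonds` are hypothetical names, and the runtime `running` parameter stands in for the compile-time `NIMBLE_ARDUINO_2_x` switch.

```cpp
// Hypothetical stand-in for the compile-time NIMBLE_ARDUINO_2_x switch.
enum class NimbleMajor { V1_4, V2_0 };

// Mirrors the #if/#else decision in delete_bonds_if_NimBLE_version_changed():
//   1.4.x: erase when any 2.0.x-only blob is present (downgrade).
//   2.0.x: erase when a bond exists without an RPA record (upgrade).
bool shouldEraseBonds(NimbleMajor running, bool bondExists, bool rpaExists, bool irkExists) {
  if (running == NimbleMajor::V1_4) {
    return rpaExists || irkExists;
  }
  return bondExists && !rpaExists;
}
```

On 2.0.x, `irkExists` does not influence the result, which matches the table: `local_irk_1` is present in every 2.0.x row.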

@h2zero h2zero pinned this issue Dec 12, 2024