Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update WiFiMulti.cpp to support FreeRTOS #1878

Closed
wants to merge 10 commits into from

Conversation

richteel
Copy link
Contributor

@richteel richteel commented Dec 4, 2023

When used with FreeRTOS, the delay statement caused the code to stop responding as FreeRTOS was stuck. Replaced delay with another loop to do nothing for 5ms.

Attempting to fix astyle failed test.

richteel and others added 8 commits November 26, 2023 22:26
Added clearAPList method to allow loading of new APs for WiFiMulti.
Added clearAPList method to allow loading of new APs for WiFiMulti.
Changed whitespace before clearAPList method.
When used with FreeRTOS, the delay statement caused the code to stop responding as FreeRTOS was stuck. Replaced delay with another loop to do nothing for 5ms.
Attempted to fix indentations to pass astyle checks
@richteel
Copy link
Contributor Author

richteel commented Dec 4, 2023

@earlephilhower, I've made a few attempts to get this to pass the astyle check but still fails there. If you could let me know what may be wrong here, I will try to correct formatting in the future. In the meantime, I would appreciate it if you could pull this into the code base.

I also had a question regarding having Arduino IDE pick up these changes. I assume a new version needs to be created, which is done on a schedule or major update of the library.

@maxgerhardt
Copy link
Contributor

maxgerhardt commented Dec 4, 2023

The action details always show you why something failed. In this case, you are missing a space before the bracket of an if statement and a space before a variable name and the -.

grafik

You can also execute the checks locally like the CI does

sudo apt install astyle
./tests/restyle.sh
# If anything changed, GIT should return an error and fail the test
git diff --exit-code

@richteel
Copy link
Contributor Author

richteel commented Dec 4, 2023

@maxgerhardt, Thanks, I looked at the output, but did not see the issue on line 562. Thank you, I assumed it was the preceding spaces and did not look at the spaces in the statements. I appreciate your reply and will try running the checks locally as you mentioned.

Copy link
Owner

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give an MCVE showing the failure you're correcting here. WiFi is under use by many folks w/FreeRTOS already so I'm not sure exactly what you're seeing or trying to accomplish here.

Please also explain your reasoning for swapping out a task-switching delay for a busy wait. I'm not grokking any difference except this ties up a complete core while waiting for WiFi vs. letting other tasks run...

// failure when using RTOS
unsigned long tWifiDelayForRtos_start = millis();
while (millis()-tWifiDelayForRtos_start <= 5) {
if(millis()<tWifiDelayForRtos_start) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, with unsigned math there is no need for this check at all. Wraparound deltas just work.

delay(5);
// Replaced delay(5); with the following to prevent
// failure when using RTOS
unsigned long tWifiDelayForRtos_start = millis();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delayMicroseconds(5000); // Avoid task switch in FreeRTOS does the same thing, more simply.

@maxgerhardt
Copy link
Contributor

I think this PR is a direct result of #1873. Let me just test their example code that they say hangs..

@earlephilhower
Copy link
Owner

Yeah, I saw the full app. I was hoping for a nice, small MCVE w/just WiFiMulti and FreeRTOS. The app looks like is uses a SD card for settings and other extraneous stuff which may be the cause of their issue, not WiFiMulti.

The code being changed is essentially the beginning of every WiFi example we have: (start wifi, while !connected delay()) and a common configuration. I would bet anyone using FreeRTOS and WiFi would be doing the same kind of thing...

The change as-is seems to be a pure no-op logically. If it's really needed, then there's another, deeper problem in the WiFi stack because the WiFi chip should be connecting async from whatever the main core is doing anyway.

@maxgerhardt
Copy link
Contributor

I've just tested this with the code.zip in the discussion but cut out the SD card by commenting out all calls to it and instead hardcoding the config object in setup()

  // Overwrite config object
  clockSdCard.sdCardConfig = {
    .hostname = "My Pico W",
    .timezone = "GMT",
    .networks = {
      {
        .ssid = "Vodafone-1E90",
        .pass = "XXXXXXXXXXXXXXX"
      }
    }
  };
  configLoadMillis = millis();

And ran the sketch with delay(5); and delayMicroseconds(5000);. In the discussion it seems like this patch helps reduce the connection tries a ton ("it took 12 resets before it connected and kept working"). I did my own testing and found that, indeed, with the microsecond delay, it never had to retry to connect to WiFi. In the millisecond delay, it sometimes had one connection retry.

Connecting WiFi...

Failed to connect to Wi-Fi. Trying again...
------^^------ WIFI: Added AP Vodafone-1E90 ------^^------
Connecting WiFi...

@earlephilhower
Copy link
Owner

earlephilhower commented Dec 4, 2023

I can't reproduce the failure here using a simplest MCVE with only WiFiMulti.

12:19:59.501 -> Connecting to NOBABIES
12:19:59.501 -> [WIFIMULTI] Adding: 'NOBABIES' xxxx' to list
12:19:59.501 -> [WIFIMULTI] Rescanning to build new list of APs
12:20:00.992 -> [WIFIMULTI] scanList: SSID: 'NOBABIES' -- BSSID: '382C4A90FA80' -- RSSI: -30
12:20:00.992 -> [WIFIMULTI] scanList: SSID: 'NOBABIES' -- BSSID: 'AC84C66CEFF3' -- RSSI: -56
12:20:00.992 -> [WIFIMULTI] scanList: SSID: 'NOBABIES' -- BSSID: '50D4F7FEA34D' -- RSSI: -74
12:20:00.992 -> [WIFIMULTI] Connecting to: SSID: 'NOBABIES' -- BSSID: '382C4A90FA80' -- RSSI: -30
12:20:07.458 -> 

Would you be able to try adding #include <FreeRTOS.h> to the File->Examples->WiFi->WiFiClient.ino example, enable Tools->Debug Port->Serial and Tools->Debug Level->All and capture the output as it runs?

Random guessing here, but what may be happening is noise on the power supply for some PicoWs affecting WiFi performance. When you delay() the core goes to sleep, and the power supply's going to have to adjust down it's supplied current quickly. When you delayMicroseconds() the core runs at 100% meaning the power supply itself doesn't see any transients and should theoretically be less noisy, meaning better WiFi reception. Especially if you're at the limits of reception already.

The debug logs w/RSSI might help see if it's a weak signal problem.

@maxgerhardt
Copy link
Contributor

In a minimal sketch using WiFiMulti and FreeRTOS I can also not determine a difference between delay() and delayMicroseconds(). Each connection is done without retries. My MVCE

#include <Arduino.h>
#include <FreeRTOS.h>
#include <task.h>
#include <semphr.h>
#include <WiFi.h>
#include <WiFiMulti.h>

#define WIFI_SSID "Vodafone-1E90"
#define WIFI_PASS "XXXXXXXXXXXXXX"

WiFiMulti multi;
int _beginLoopCount;
bool skip = false;

void setup() {
    while(!Serial);
    multi.addAP(WIFI_SSID, WIFI_PASS);
    delay(1000);    
    Serial.println("Connecting to " + String(WIFI_SSID));
    Serial.flush();
}

void loop() {
    if(skip) return;
    Serial.println("Running on core " + String(get_core_num()) + " task ID " + String(uxTaskGetTaskNumber(xTaskGetCurrentTaskHandle())));
    unsigned long startConnect = millis();
    if (multi.run() != WL_CONNECTED) {
      if (_beginLoopCount < 5) {
        Serial.println("\nFailed to connect to Wi-Fi. Trying again...");
        _beginLoopCount++;
        delay(1000);
      } else {
        Serial.println("\nFailed to connect to Wi-Fi. Quiting");
        _beginLoopCount = 0;
      }
    } else {
      _beginLoopCount = 0;
      Serial.print("\nWiFi connected, SSID: ");
      Serial.print(WiFi.SSID());
      Serial.print(", IP address: ");
      Serial.println(WiFi.localIP());
      Serial.printf("Connected in %d ms\n", millis() - startConnect);
      skip = true;
    }
}

@earlephilhower
Copy link
Owner

earlephilhower commented Dec 4, 2023

I also tried the full app example (commenting out and hacking in the config like you did @maxgerhardt and making all debug go to USB). I did not use the edited WiFiMulti.cpp/.h in the lib directory, only the core version.

earle@amd:~/Arduino$ diff -w --recursive sketch_dec2a1 sketch_dec2a
diff -w --recursive sketch_dec2a1/DbgPrint.h sketch_dec2a/DbgPrint.h
6,16d5
< 
< #if DEBUG
< #if USE_PICOPROBE
< #define Dbg_begin(...) Serial1.begin(__VA_ARGS__);
< #define Dbg_end(...) Serial1.end(__VA_ARGS__);
< #define Dbg_print(...) Serial1.print(__VA_ARGS__)
< #define Dbg_println(...) Serial1.println(__VA_ARGS__)
< #define Dbg_printf(...) Serial1.printf(__VA_ARGS__)
< #define Dbg_printlnf(...) Serial1.printlnf(__VA_ARGS__)
< #define Dbg_write(...) Serial1.write(__VA_ARGS__)
< #else
24,33d12
< #endif
< #else
< #define Dbg_begin(...)
< #define Dbg_end(...)
< #define Dbg_print(...)
< #define Dbg_println(...)
< #define Dbg_printf(...)
< #define Dbg_printlnf(...)
< #define Dbg_write(...)
< #endif
diff -w --recursive sketch_dec2a1/sketch_dec2a.ino sketch_dec2a/sketch_dec2a.ino
111c111
< 
---
> delay(5000);
122c122,134
<   clockSdCard.begin();
---
>   // Overwrite config object
>   clockSdCard.sdCardConfig = {
>     .hostname = "My Pico W",
>     .timezone = "GMT",
>     .networks = {
>       {
>         .ssid = "NOBABIES",
>         .pass = "xxxx"
>       }
>     }
>   };
>   configLoadMillis = millis();
> //  clockSdCard.begin();
126,128c138,140
<   if (clockSdCard.isCardPresent()) {
<     configLoadMillis = millis();
<   }
---
> //  if (clockSdCard.isCardPresent()) {
> //    configLoadMillis = millis();
> //  }

I've power-cycled the PicoW about ten times so far and it always connects on the 1st try for me. That said, I am ~3 meters from the AP it's connecting to w/a RSSI of about -32. So if this is noise related there's plenty of headroom in the signal it's seeing which might not be the case at the @richteel's location...


12:42:10.160 -> 
12:42:10.160 -> sdcard_init_complete = true
12:42:10.160 -> -----------------------------	0	149248
12:42:10.160 -> END: setup()	0	149248
12:42:10.160 -> .START: Restart Wi-Fi with new Config
12:42:10.160 -> [WIFIMULTI] Adding: 'NOBABIES' xx' to list
12:42:10.160 -> ------^^------ WIFI: Added AP NOBABIES ------^^------
12:42:10.160 -> Connecting WiFi...
12:42:10.160 -> [WIFIMULTI] Rescanning to build new list of APs
12:42:11.683 -> [WIFIMULTI] scanList: SSID: 'NOBABIES' -- BSSID: '382C4A90FA80' -- RSSI: -32
12:42:11.683 -> [WIFIMULTI] scanList: SSID: 'NOBABIES' -- BSSID: 'AC84C66CEFF3' -- RSSI: -52
12:42:11.683 -> [WIFIMULTI] scanList: SSID: 'NOBABIES' -- BSSID: '50D4F7FEA34D' -- RSSI: -67
12:42:11.683 -> [WIFIMULTI] Connecting to: SSID: 'NOBABIES' -- BSSID: '382C4A90FA80' -- RSSI: -32
12:42:18.176 -> 
12:42:18.176 -> WiFi connected, SSID: NOBABIES, IP address: 192.168.1.131
12:42:18.176 -> Connected in 8008 ms
12:42:18.176 -> ------^^------ WIFI: Starting Station Mode ------^^------
12:42:18.176 -> Wi-Fi Begin Completed
12:42:18.176 -> END: Restart Wi-Fi with new Config
12:42:18.176 -> START: Update IP Address
12:42:18.176 -> IP Address has changed to 192.168.1.131
12:42:18.176 -> END: Update IP Address
12:42:18.176 -> START: Print Debug Statement
12:42:18.176 -> Wi-Fi has changed to Station Mode
12:42:18.176 -> END: Print Debug Statement
12:42:18.441 -> ..........checkWiFi	0	147344

@richteel
Copy link
Contributor Author

richteel commented Dec 4, 2023 via email

@richteel
Copy link
Contributor Author

richteel commented Dec 4, 2023 via email

@earlephilhower
Copy link
Owner

Re: delay() vs vTaskDelay()...in this core, delay() calls it behind the scenes, no need to change Arduino code:

extern "C" void delay(unsigned long ms) {
vTaskDelay(ms / portTICK_PERIOD_MS);
}

@earlephilhower
Copy link
Owner

Also, by "power" I meant the onboard power supply. There's an LDO on the PCB which can operate in different modes to bring the 5V USB down to the 3.3V (or lower) required by the chip. I was suggesting that the transients could come from that and not the USB cable or hub itself (although on the ESP8266 there are many, many instances where a poor quality USB cable caused WiFi problems there with Vdroop because it was so power hungry while transmitting...)

Checking the seen RSSI would be a useful thing just to see if it's at the margin or if it's loud and clear to the Pico.

@richteel
Copy link
Contributor Author

richteel commented Dec 5, 2023

Here are the results of running the ScanNetworks example code to get the RSSI for each network.

Beginning scan at 5003
Found 8 networks

                            SSID   ENC     BSSID         CH RSSI
                        tl6cd020  WPA2 00:25:F0:6C:D0:20 10  -62
                      bananabros  WPA2 08:BF:B8:22:DC:30  2  -81
                      Fios-212BS  WPA2 20:C0:47:C4:3F:04  1  -78
                      Fios-X0BEY  WPA2 20:C0:47:EC:33:2E  1  -89
                        T_Travel  WPA2 28:87:BA:88:10:5A 10  -46
                                  AUTO 56:12:65:53:20:5F  6  -87
                         TeelSys  WPA2 C4:41:1E:A9:B8:57 10  -34
                                  WPA2 CA:41:1E:A9:B8:57 10  -33

--- Sleeping ---

Running the minimal example code that @maxgerhardt, posted above, I get the same behavior as I saw in my example code. The Raspberry Pico W seems to lock up as it fails to execute the remaining code. Also, when I attempt to upload a new sketch, I need to unplug it and press the BOOTSEL button while I plug it back in. I tried this on three Pico W devices, with one of them being brand new. Here is the output of the sample code above. It has been sitting for about 10 minutes now without advancing any further.

Connecting to TeelSys
Running on core 0 task ID 3330250393

I tried the code without any references to FreeRTOS and it connects right away. Here is the results of that code.

Connecting to TeelSys
Running on core 0

WiFi connected, SSID: TeelSys, IP address: 10.140.1.58
Connected in 8001 ms

It connects successfully with debugging but takes a long time to connect as it fails with the first attempt.
Here is the debug output with the minimal code example:

[WIFIMULTI] Adding: 'TeelSys' xxxxxxxxx' to list
Connecting to TeelSys
Running on core 0 task ID 3330250457
[WIFIMULTI] Rescanning to build new list of APs
[WIFIMULTI] scanList: SSID: 'TeelSys' -- BSSID: 'C4411EA9B857' -- RSSI: -47
[WIFIMULTI] Connecting to: SSID: 'TeelSys' -- BSSID: 'C4411EA9B857' -- RSSI: -47

Failed to connect to Wi-Fi. Trying again...
Running on core 0 task ID 3330250457
[WIFIMULTI] Rescanning to build new list of APs
[WIFIMULTI] scanList: SSID: 'TeelSys' -- BSSID: 'C4411EA9B857' -- RSSI: -41
[WIFIMULTI] Connecting to: SSID: 'TeelSys' -- BSSID: 'C4411EA9B857' -- RSSI: -41

WiFi connected, SSID: TeelSys, IP address: 10.140.1.58
Connected in 7702 ms

After unplugging the Pico W and plugging it back into the PC, it seemed to lock up again but it did try to make a second attempt. I saw the same behavior on all three Pico W boards that I tried.

[WIFIMULTI] Adding: 'TeelSys' xxxxxxxxx' to list
Connecting to TeelSys
Running on core 0 task ID 3598685913
[WIFIMULTI] Rescanning to build new list of APs
[WIFIMULTI] scanList: SSID: 'TeelSys' -- BSSID: 'C4411EA9B857' -- RSSI: -44
[WIFIMULTI] Connecting to: SSID: 'TeelSys' -- BSSID: 'C4411EA9B857' -- RSSI: -44

Failed to connect to Wi-Fi. Trying again...
Running on core 0 task ID 3598685913
[WIFIMULTI] Rescanning to build new list of APs
[WIFIMULTI] scanList: SSID: 'TeelSys' -- BSSID: 'C4411EA9B857' -- RSSI: -43
[WIFIMULTI] Connecting to: SSID: 'TeelSys' -- BSSID: 'C4411EA9B857' -- RSSI: -43

@earlephilhower
Copy link
Owner

Lockups and slow connections are very different symptoms with very different causes. I suggest debugging them separately.

For lock ups, please make sure you are using the latest and greatest git master and not the 3.6.1 release. #1872 has a fix for an IRQ race condition. Also, make sure you are not using MDNS because it is incompatible w/FreeRTOS #1880.

When you get a lock up, please grab the stack trace for both cores. If that's all scrambled, or PicoProbe can't connect to the SWD debugger, then it's a physical/voltage problem and not SW...

@richteel
Copy link
Contributor Author

richteel commented Dec 5, 2023

Okay, the plot thickens. Delay is not the issue but having only delay in the while statement causes the issue.

I had removed the if block from the while loop and I saw the same behavior as the delay.

        while (!WiFi.connected() && (millis() - start < to)) {
            // Replaced delay(5); with the following to prevent failure when using RTOS
            unsigned long tWifiDelayForRtos_start = millis();
            while (millis() - tWifiDelayForRtos_start <= 5) {
                ;
            }
        }

I then added the if block back in and commented out the while statement and closing bracket and added the delay statement back in and it would connect on the first or second attempt.

        while (!WiFi.connected() && (millis() - start < to)) {
            // Replaced delay(5); with the following to prevent failure when using RTOS
            unsigned long tWifiDelayForRtos_start = millis();
            // while (millis() - tWifiDelayForRtos_start <= 5) {
                if (millis() < tWifiDelayForRtos_start) {
                    // millis wrapped around to zero. Rare to hit this
                    // issue, but it will do this every 49 days.
                    tWifiDelayForRtos_start = millis();
                }
            //}
			delay(5);
        }

Any idea why there needs to be something in addition to delay in the loop?

Calling it a night so will pick this back up tomorrow afternoon.

BTW: Thanks for checking on this and pushing back a bit to find out what is really going on. It looks like it needs more work to get to the bottom of the issue. I just got lucky yesterday by putting that if statement in the loop.

@maxgerhardt
Copy link
Contributor

maxgerhardt commented Dec 5, 2023

Take note I tested this with PlatformIO and the commit d2461a1, not 3.6.1 stable. That is exactly the commit that has the IRQ fix. If any delay somewhere causes a task switch which causes the incorrect re-enabling of IRQs, that could be the problem. Can you manually apply the above commit and retry?

@richteel
Copy link
Contributor Author

richteel commented Dec 5, 2023

Take note I tested this with PlatformIO and the commit d2461a1, not 3.6.1 stable. That is exactly the commit that has the IRQ fix. If any delay somewhere causes a task switch which causes the incorrect re-enabling of IRQs, that could be the problem. Can you manually apply the above commit and retry?

Yes, you are correct. I was doubtful because I thought I would have had that change from the start as I had copied the latest code into my library, but my guess is I had copied the code before this commit. I had pulled the latest code again at some point yesterday when I was doing testing because of Earl's comment about making certain I was using the latest code on GitHub. I had thought that I tried only delay in the loop and was seeing the same issue. I pulled the code again this morning and reverted to having only delay in the loop and it is indeed working.

Thank you both for all the help. I'm closing this pull request as it is not needed and did not fix the root problem.

I have a suspicion that what may have happened is I've been seeing some errors as I edit or overwrite the files in the library. "Failed uploading: cannot execute upload tool: exec: "C:\Users\USER\AppData\Local\Arduino15\packages\rp2040\hardware\rp2040\3.6.1/system/python3/python3": file does not exist". If I uninstall the Pico W board from the Arduino IDE and add it back, it resolves the issue. Not sure why that is happening but is only occasionally, and typically after a reboot if I edit or overwrite the RP2040 files in the library. I think that may have happened and I did not overwrite all the files, only the WiFiMulti. h and cpp files.

@richteel richteel closed this Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants