PiAware won't reconnect to wifi network if it drops off

A friend of mine has his PiAware in his attic connected via wifi to the router downstairs. It has a static IP configured in /boot/piaware-config.txt. It’s an SD card build.

It has a habit of disconnecting from the network and not coming back on, requiring a trip up a ladder and a reboot to get it back on. My suspicion is that it’s the same problem that was described in this post a few years ago. Up in the attic he’s on the edge of his wifi’s reach. However in this case it’s already using a static IP. I think the disconnect effect is happening at the interface level.

I’ve written a script that checks for connectivity every 5 minutes and brings wlan0 down and back up and I’m hoping that’s enough to reconnect it. Not tested it yet, but before I do so and deploy it I wondered if anyone else has seen this behaviour with PiAware or Raspbian, and whether there are any settings in PiAware itself that can deal with this scenario?

Of course the problem could be something else and I’ve yet to see the logs but this seems the most likely culprit.

EDIT 6 days later – the problem in this case was traced to a bug in dhcpcd-5 during background address defence in which it would seg fault, taking PiAware offline until reset. The bug was fixed and a .deb patch produced which will be in the main branch later on. See below for details and the patch.

piaware relies on wpa_supplicant to do the reconnect, which in turn I believe relies to some degree on the underlying wireless driver. It doesn’t do anything special beyond configuring the SSID etc.

Thanks Oliver that’s useful to know in my diags. I see posts related to that around the web which may offer some clues, eg this one. However they all mostly involve settings in the file /etc/network/interfaces.

In the SD card PiAware that file doesn’t seem to be used (on mine it just references the interfaces.d subdirectory, which is empty). So I wondered if PiAware is doing anything directly to manage interfaces.

Starting with 3.8.0 on Buster, the piaware sdcard image uses the same config method as upstream Raspbian. All interfaces use /etc/dhcpcd.conf (including the static IP case). The wifi network details are configured in /etc/wpa_supplicant/wpa_supplicant.conf.

Everything else happens automatically - dhcpcd starts wpa_supplicant on specific interfaces when it sees them appear, wpa_supplicant manages the wifi association and link state of the interface, and dhcpcd does DHCP when it sees the interface link come up (or configures a static IP if told to do that)

Thanks, I appreciate the step by step details. I’ve tweaked my script to test for connectivity; if it’s not there then collect some logs and bounce wlan0 otherwise do nothing. That runs every 5 minutes. It won’t solve a spotty connectivity problem but it will avoid the need to haul out ladders and help rule out a bigger problem, and it works around the problem of the network not reconnecting after a drop, which anecdotally is mentioned a fair amount in forums (for Raspbian in general, not specifically PiAware). Going forward there’s a mesh going in shortly anyway which will fix any range issues, and powerline is available now once additional adapters are procured.

I’ll update on the root cause once I know more. Here’s the script in case it’s useful for someone else in the future with this or a similar problem. I’ve not tested it yet in situ – I’m assuming that taking wlan0 down and up is sufficient to do the same as whatever would happen on a reboot which I know does work.

Stick it in /usr/local/bin/checkwifi

#!/bin/bash

# Check if the router is reachable; if not, save logs and bounce the wifi interface
if ! ping -c 5 192.168.1.1 &> /dev/null; then

  echo "$(date) Wifi dropped and is being restarted" >> /home/pi/wifilog
  cp /var/log/piaware.log "/home/pi/wifilog-$(date +%Y-%m-%d-%H-%M)-piaware.log"
  cp /var/log/syslog "/home/pi/wifilog-$(date +%Y-%m-%d-%H-%M)-syslog"
  chown pi:pi /home/pi/wifilog*

  /usr/sbin/ifconfig wlan0 down
  sleep 5
  /usr/sbin/ifconfig wlan0 up

fi

Open root’s crontab.

$ sudo crontab -e

Add the line

*/5 * * * * /usr/local/bin/checkwifi &> /dev/null

I had the same issue and wrote a similar script. It has been running for several months now, and seems to have resolved my issue.

Might I suggest building in some sort of check mechanism that restarts dhcpcd if WiFi hasn’t been able to connect for, say, 5 minutes or so?

Aren’t you using dhcpcd? In that case using ifconfig up and down might not work.
This is probably a good approach for ppl with dhcpcd:

sudo systemctl restart dhcpcd

I’ve done a small script for people who require their RPi to have internet connectivity:
pingfail · wiedehopf/adsb-scripts Wiki · GitHub

It’s a bit drastic as it reboots the pi, but for remote installations getting the connection going again is the top priority.

I’d need to see the underlying failure first. If it’s a problem with wpa_supplicant, let’s fix wpa_supplicant rather than band-aid it. If it’s a hardware problem I’m not sure that anything short of a reboot will help.

That would probably be useful as a feature: rebooting after 30 minutes without being able to connect to FA.
I’ve seen quite a few posts about not being able to access the RPi easily, having to power cycle it.

I would argue that due to driver variability and other factors, a connectivity watchdog for headless units is exactly what you want.
It might also be that it’s not wpa_supplicant itself, but the interaction with some rare access points.

We do exactly that sort of watchdog on FlightFeeders, but I’m wary of blindly doing it on piaware installs, since even with piaware sdcard images, piaware is often not the only thing running there. (And every time you reboot a Pi, there’s a nonzero chance it never comes back…)

Either way, I need to understand the problem before I can do anything sensible with it; I’ve never seen the problem on my own hardware.

Interesting to see this post so I’ll share my experience though I doubt it adds anything to the discussion. For about 2 years I ran my Pi 3B+ in the attic. Periodically it would stop feeding and be inaccessible on the network. It could take months for this to recur or it might happen twice in a week. In theory (i.e. as measured by my iPhone) the WiFi signal in the attic was strong.

The drops weren’t regular but often enough to be annoying. I ended up putting a WeMo switch on the A/C plug so I can reboot it remotely. Then I added a script to my NAS to check periodically if it could fetch data from the Pi and text me if it couldn’t.

Then I moved the Pi into the house, just below where it had been in the attic. This was the result of moving the antenna. I ran it this way for several months and never had a single network drop. About a month ago I moved the antenna again and put the Pi back in the attic. In that time it has gone offline twice. Nothing about the wireless network has changed, and all the cables and the power brick are the same. I’ve never really become confident about what’s going on that causes this behavior.

However I have a number of IoT devices that use various WiFi chips and some of them have the same behavior of randomly losing network connectivity for no known reason and failing to connect. One of them, using the WINC chip, won’t even reconnect on a watchdog reboot - it has to be power cycled.

This thread suggests using the watchdog feature of the Pi instead. Does anyone have long term experience with how well this works out?

Well, I didn’t mean it as a blanket solution, rather an option you can turn on :)

I hadn’t realised that dhcpcd was being used when a static IP was configured in piaware-config.txt or that it was responsible for bringing up the wpa components. Given that, it makes more sense to do it the way you describe, a restart of dhcpcd.

I’ve tested it and it works to bring a downed interface back up, so I’ll be putting it on my mate’s Pi later on and hopefully that will prevent the need to keep hauling out the ladders. I’ll also be able to grab the existing logs which will hopefully shed more light.

In my script I’m grabbing copies of /var/log/piaware.log and /var/log/syslog and saving them in /home/pi to timestamped copies. Are there any other logs worth grabbing which will help shed light on these disconnects? The reason I’m grabbing copies is so that each one has its latest content relevant to that event, plus to avoid missing anything if they cycle out the file.

For reference by future readers in the script above I’ve replaced

....
  /usr/sbin/ifconfig wlan0 down
  sleep 5
  /usr/sbin/ifconfig wlan0 up
....

with

....
  systemctl restart dhcpcd
....
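
For anyone who wants the whole thing in one place, here is a rough sketch of how the revised script might look with that change folded in. It is not the exact script running on the Pi: the `GATEWAY`, `LOG_DIR`, `PING_CMD` and `RESTART_CMD` variables and the `--run` argument are my own additions so the logic can be tried out without touching the network, and the router address and log paths are simply the ones used earlier in this thread.

```shell
#!/bin/bash
# Sketch of a revised /usr/local/bin/checkwifi that restarts dhcpcd.
# Assumption: the overridable variables below are illustrative only.

GATEWAY="${GATEWAY:-192.168.1.1}"          # router address used in this thread
LOG_DIR="${LOG_DIR:-/home/pi}"             # where the log snapshots are kept
PING_CMD="${PING_CMD:-ping -c 5}"          # reachability test
RESTART_CMD="${RESTART_CMD:-systemctl restart dhcpcd}"

checkwifi() {
  # Router reachable: nothing to do
  if $PING_CMD "$GATEWAY" > /dev/null 2>&1; then
    return 0
  fi

  # Use one timestamp so both snapshots share the same name prefix
  stamp="$(date +%Y-%m-%d-%H-%M)"
  echo "$(date) Wifi dropped, restarting dhcpcd" >> "$LOG_DIR/wifilog"

  # Keep timestamped copies of the logs relevant to this event
  for f in /var/log/piaware.log /var/log/syslog; do
    if [ -r "$f" ]; then
      cp "$f" "$LOG_DIR/wifilog-$stamp-$(basename "$f")"
    fi
  done
  chown pi:pi "$LOG_DIR"/wifilog* 2> /dev/null || true

  # Restarting dhcpcd also brings wpa_supplicant back up on wlan0
  $RESTART_CMD
}

# Only act when asked to, so the file can be sourced for a dry run
if [ "${1:-}" = "--run" ]; then
  checkwifi
fi
```

With this variant the crontab entry would need the extra argument, e.g. `*/5 * * * * /usr/local/bin/checkwifi --run`. Note that it only helps while cron and systemd are still healthy; it does nothing for a Pi that has hung outright.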

UPDATE

I’ve now been on the unit remotely and have the logs. I installed the checkwifi script and restarted the Pi to confirm it all comes back up okay. But coincidentally on that occasion it did not come back and the script did not run. But it was working okay for the short while I was on it before rebooting, hinting at more of a binary fault/no-fault condition. My mate got out the ladders and power-cycled it and I grabbed piaware.log and syslog for the last few days. While he was up in the attic he was on facetime and had a decent enough wireless signal, so poor wifi coverage appears to be a red herring.

Looking at the logs the reason it’s losing network connectivity is dhcpcd is occasionally failing with a seg fault. Every time it has lost connectivity this same pattern is present in the logs. This is an example of one of them.

For reference, he’s got an RPi 3B+ and RPi PSU with a quality SD card. I have the same setup except mine is a 3B and I’m not seeing any seg faults. I don’t have these “hardware address 00:00:00:00:00:00 claims” entries in my syslogs either.

from /var/log/syslog


Jan 19 21:39:05 piaware dhcpcd[530]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.4
Jan 19 21:39:06 piaware dhcpcd[530]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.4
Jan 19 21:39:06 piaware dhcpcd[530]: wlan0: 10 second defence failed for 192.168.1.4
Jan 19 21:39:06 piaware avahi-daemon[267]: Withdrawing address record for 192.168.1.4 on wlan0.
Jan 19 21:39:06 piaware avahi-daemon[267]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.4.
Jan 19 21:39:06 piaware avahi-daemon[267]: Interface wlan0.IPv4 no longer relevant for mDNS.
Jan 19 21:39:06 piaware dhcpcd[530]: wlan0: deleting route to 192.168.1.0/24
Jan 19 21:39:06 piaware dhcpcd[530]: wlan0: deleting default route via 192.168.1.1
Jan 19 21:39:06 piaware systemd[1]: dhcpcd.service: Main process exited, code=killed, status=11/SEGV
Jan 19 21:39:07 piaware avahi-daemon[267]: Interface wlan0.IPv6 no longer relevant for mDNS.
Jan 19 21:39:07 piaware avahi-daemon[267]: Leaving mDNS multicast group on interface wlan0.IPv6 with address fdaa:bbcc:ddee:0:a7d3:c67a:13f8:81de.
Jan 19 21:39:07 piaware avahi-daemon[267]: Withdrawing address record for fda1:f321:cef:1:f6bc:b4dd:ce0c:9cb7 on wlan0.
Jan 19 21:39:07 piaware avahi-daemon[267]: Withdrawing address record for fdaa:bbcc:ddee:0:a7d3:c67a:13f8:81de on wlan0.
Jan 19 21:39:07 piaware systemd[1]: dhcpcd.service: Failed with result 'signal'.
Jan 19 21:39:08 piaware ntpd[553]: Deleting interface #17 wlan0, fe80::6dec:bf26:ff73:ba6c%3#123, interface stats: received=0, sent=1, dropped=0, active_time=1869 secs
Jan 19 21:39:08 piaware ntpd[553]: Deleting interface #22 wlan0, fdaa:bbcc:ddee:0:a7d3:c67a:13f8:81de#123, interface stats: received=0, sent=0, dropped=0, active_time=1765 secs
Jan 19 21:39:08 piaware ntpd[553]: Deleting interface #24 wlan0, fda1:f321:cef:1:f6bc:b4dd:ce0c:9cb7#123, interface stats: received=0, sent=0, dropped=0, active_time=1747 secs
Jan 19 21:39:08 piaware ntpd[553]: Deleting interface #25 wlan0, 192.168.1.4#123, interface stats: received=223, sent=223, dropped=0, active_time=932 secs
Jan 19 21:39:08 piaware ntpd[553]: xxx.xxx.xxx.xxx local addr 192.168.1.4 -> <null>
[ … 15 entries like this, all different IPs, I’ve blanked them as I’ve not checked who they are, I suspect OS and FlightAware services ]

from /var/log/piaware.log


Jan 19 21:39:39 piaware piaware[534]: data isn't making it to FlightAware, reconnecting...
Jan 19 21:39:39 piaware piaware[534]: multilateration data no longer required, disabling mlat client
Jan 19 21:39:40 piaware piaware[534]: fa-mlat-client exited normally
Jan 19 21:39:40 piaware piaware[534]: reconnecting in 6 seconds...
Jan 19 21:39:40 piaware piaware[534]: mlat-client(936): Disconnecting from localhost:30005: Lost connection to multilateration server, no need for input data
Jan 19 21:39:40 piaware piaware[534]: mlat-client(936): Exiting on connection loss
Jan 19 21:39:46 piaware piaware[534]: Connecting to FlightAware adept server at piaware.flightaware.com/1200
Jan 19 21:39:46 piaware piaware[534]: Connection to adept server at piaware.flightaware.com/1200 failed: couldn't open socket: Temporary failure in name resolution
Jan 19 21:39:46 piaware piaware[534]: reconnecting in 4 seconds...
Jan 19 21:39:50 piaware piaware[534]: Connecting to FlightAware adept server at piaware.flightaware.com/1200
Jan 19 21:39:50 piaware piaware[534]: Connection to adept server at piaware.flightaware.com/1200 failed: couldn't open socket: Temporary failure in name resolution
Jan 19 21:39:50 piaware piaware[534]: reconnecting in 5 seconds...
Jan 19 21:39:55 piaware piaware[534]: Connecting to FlightAware adept server at 70.42.6.191/1200
Jan 19 21:39:55 piaware piaware[534]: Connection to adept server at 70.42.6.191/1200 failed: couldn't open socket: network is unreachable
Jan 19 21:39:55 piaware piaware[534]: reconnecting in 4 seconds...

This line and the ones before it seem rather odd. I just checked my current log file on the Pi 3 B+ and there is nothing like any of those there. I’m going to monitor mine if/when it goes offline again to see if something like this shows up. It seems either the Pi itself, the router or something on his network is getting pretty confused. An all-zero MAC address seems to be taking over the assigned IP address. Possibly there is a device on the network that loses track of its own MAC address and then takes the Pi’s IP? I suppose even the Pi could be erroring on its own MAC and claiming its own IP with a bogus MAC.

The Pi’s on a static IP of 192.168.1.4 and the dynamic range is from .10 to .254.

It feels like some edge case bug being triggered by some specific network traffic in some component of PiAware’s glue. Here are a couple of recent examples I found of that same “[network interface]: hardware address 00:00:00:00:00:00 claims [localhost IP]” effect in other projects. Example 1, Example 2. It appears in this PiAware case, whatever’s causing it is resulting in a seg fault in dhcpcd.

Well, that’s not good!

The triggering factor looks like an IP/ARP conflict, though. Here’s the relevant dhcpcd code path:

which if I follow correctly means that the Pi heard more than one broadcast ARP for its own IP with an all-zeros source address. The “defence failed” means that it kept seeing the ARPs even after it sent its own saying “hey back off”, at which point it’s meant to give up, drop the current IP, and try to get a new lease. (Which isn’t gonna work in the static IP case, obviously…)

The segfault follows that; there’s obviously a bug in there somewhere, but things are broken before that point.
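
To check whether a given Pi is hitting the same sequence (repeated ARP claims, a failed defence, then the dhcpcd crash), something along these lines could be run against its syslog. This is only a sketch: the function name `arp_conflict_summary` is made up for illustration, and the match strings are taken from the log excerpt posted above, so other dhcpcd or systemd versions may word these messages differently.

```shell
#!/bin/bash
# Count the three log events seen in this thread's failure pattern.
# Assumption: message wording matches the syslog excerpt posted above.

arp_conflict_summary() {
  logfile="${1:-/var/log/syslog}"

  # Another host (here an all-zeros MAC) claiming the Pi's address
  printf 'claims=%s\n' \
    "$(grep -c 'hardware address .* claims ' "$logfile")"

  # dhcpcd giving up on defending the address
  printf 'defence_failed=%s\n' \
    "$(grep -cF 'second defence failed for' "$logfile")"

  # systemd reporting that dhcpcd died with SIGSEGV
  printf 'segv=%s\n' \
    "$(grep -cF 'dhcpcd.service: Main process exited, code=killed, status=11/SEGV' "$logfile")"
}
```

Calling `arp_conflict_summary /var/log/syslog` prints one `name=count` line per event type. Claims without any segfault entries would suggest the ARP conflict is present but dhcpcd is surviving it.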

Does dhcpcd get restarted OK? If it does restart OK, what does it say on restart? If it’s not getting restarted, it may be as simple as adding a dropin service file to restart it on failure.

(it’s hard to tell where the spurious ARP is coming from - it looks like there are at least some ARP probing tools that generate bad packets that could do this, though, and router firmware being what it is, I’d kinda suspect your router is doing something wrong…)

edit: I looked at my Pi here and dhcpcd will not auto-restart on failure, so the segfault is going to be fatal.

pi@piaware:~ $ systemctl show dhcpcd | grep Restart
Restart=no
RestartUSec=100ms
NRestarts=0

Try creating /etc/systemd/system/dhcpcd.service.d/restart.conf with contents:

[Service]
Restart=on-failure
RestartSec=15s

then systemctl daemon-reload; and see if that helps at all.
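
The same steps can be scripted, which is handy if the drop-in needs to go onto more than one Pi. A minimal sketch follows; the function name `install_dhcpcd_restart_dropin` and its optional directory argument are illustrative (the argument exists so it can be tried against a scratch directory first), while the file name and contents are the ones given above.

```shell
#!/bin/bash
# Write the restart-on-failure drop-in for dhcpcd.
# Assumption: the optional first argument is only a convenience for
# testing; on a real system the default path is the one to use.

install_dhcpcd_restart_dropin() {
  dropin_dir="${1:-/etc/systemd/system/dhcpcd.service.d}"

  mkdir -p "$dropin_dir" || return 1

  # Same contents as suggested in the post above
  cat > "$dropin_dir/restart.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=15s
EOF
}
```

Run as root, `install_dhcpcd_restart_dropin && systemctl daemon-reload` should apply it, after which `systemctl show dhcpcd | grep Restart` is expected to report `Restart=on-failure` rather than `Restart=no`.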

Thanks for sharing the details. I checked mine and it also won’t restart dhcpcd. I’m running a script to capture log files if/when mine goes offline again. I’m curious if it is the same cause or something different. Until I see my cause I’m not going to do any automatic restarts - want to be sure I notice when the problem occurs.

I asked the dhcpcd maintainer about this and got a rapid fix: https://roy.marples.name/cgit/dhcpcd.git/commit/?h=dhcpcd-8&id=40fcdef71dc07985049c7d6edda9198e760a8bcf

I’ll put together a rebuilt dhcpcd package that includes that for testing.

@chrislfa, if you’re feeling adventurous, please try https://flightaware.com/adsb/piaware/files/dhcpcd5_8.1.2-1+rpt1+fa1_armhf.deb and see if it helps. This is a rebuild of the Raspbian package with the change linked above applied.
