PiAware won't reconnect to wifi network if it drops off

Analysis

In my previous logs post you can see that the underlying ARP-related traffic is still present and still causing this ‘claim’ situation, but, since applying the dhcpcd fix, it is no longer causing a seg fault. The seg fault was happening after the IP and route had been dropped but before the new IP and route (which would be the same anyway for a static allocation) were set. This essentially took PiAware offline until dhcpcd was restarted, typically via a power cycle.

Syslog is showing that this claim process continues to take place every 15 minutes. It causes PiAware to relinquish its IP, default route and FlightAware connections, and then get them all back again, a process which takes around 8 seconds from end to end. I decided to dig into this some more this evening and determine what is triggering the effect on the network. Since this process is an RFC 5227 IPv4 address defence, I used Wireshark on another machine to listen for ARP announcements meeting the RFC spec, while tailing syslog to watch for the next occurrence.

Wireshark captured a number of ARP announcements and replies with this configuration (I’ve masked the full ethernet address for privacy):

Sender MAC address: piaware.local (b8:27:eb:xx:xx:xx)
Sender IP address: piaware.local (192.168.1.4)
Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
Target IP address: piaware.local (192.168.1.4)

RFC 5227 states:

A host probes to see if an address is already in use by broadcasting an ARP Request for the desired address. The client MUST fill in the ‘sender hardware address’ field of the ARP Request with the hardware address of the interface through which it is sending the packet. The ‘sender IP address’ field MUST be set to all zeroes; this is to avoid polluting ARP caches in other hosts on the same link in the case where the address turns out to be already in use by another host. The ‘target hardware address’ field is ignored and SHOULD be set to all zeroes. The ‘target IP address’ field MUST be set to the address being probed. An ARP Request constructed this way, with an all-zero ‘sender IP address’, is referred to as an ‘ARP Probe’.
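To make the field layout in that paragraph concrete, here is a minimal Python sketch that packs the 28-byte ARP payload of a probe. The function name and the example MAC (a locally administered placeholder, not the Pi's real address) are my own assumptions for illustration; it only builds the bytes and does not send anything.

```python
import socket
import struct

def build_arp_probe(sender_mac: bytes, target_ip: str) -> bytes:
    """Pack the ARP payload for an RFC 5227 probe (hypothetical helper)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode: ARP Request
        sender_mac,                   # sender hardware address: our interface
        b"\x00" * 4,                  # sender IP: all zeroes (this makes it a probe)
        b"\x00" * 6,                  # target hardware address: ignored, zeroed
        socket.inet_aton(target_ip),  # target IP: the address being probed
    )

# Placeholder MAC, assumed for the example only.
probe = build_arp_probe(bytes.fromhex("020000000001"), "192.168.1.4")
```

Bytes 14 to 17 of the result hold the sender IP, so a capture filter looking for probes would expect those four bytes to be zero.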

I didn’t see any probes with the sender IP set to all zeroes. In the packets captured above, the sender IP is already set to 192.168.1.4 and the sender MAC is that of the Pi. RFC 5227 goes on to say:

An ARP Announcement is identical to the ARP Probe described above, except that now the sender and target IP addresses are both set to the host’s newly selected IPv4 address. The purpose of these ARP Announcements is to make sure that other hosts on the link do not have stale ARP cache entries left over from some other host that may previously have been using the same address. The host may begin legitimately using the IP address immediately after sending the first of the two ARP Announcements; the sending of the second ARP Announcement may be completed asynchronously, concurrent with other networking operations the host may wish to perform.

The data captured matches that of an announcement and is flagged as such in Wireshark.
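The distinction between the two packet types can be sketched as a small classifier. This is only an illustration of the RFC 5227 wording quoted above, not what Wireshark does internally; the function name is hypothetical, and I am assuming the captured packets were ARP Requests (opcode 1), since the capture summary above does not show the opcode.

```python
ZERO_IP = "0.0.0.0"

def classify_arp_request(opcode: int, sender_ip: str, target_ip: str) -> str:
    """Label an ARP packet using the RFC 5227 probe/announcement rules."""
    if opcode != 1:
        return "other"          # not an ARP Request; out of scope for this sketch
    if sender_ip == ZERO_IP:
        return "probe"          # all-zero sender IP: address is only being tested
    if sender_ip == target_ip:
        return "announcement"   # sender and target IP match: address is being claimed
    return "request"            # ordinary ARP resolution of someone else's address

# Values taken from the capture above (opcode assumed to be 1).
kind = classify_arp_request(1, "192.168.1.4", "192.168.1.4")
```

With the captured values, `kind` comes out as `"announcement"`, which agrees with the Wireshark flag.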

Thus it appears that PiAware is getting into an RFC 5227 fight with itself. Every 15 minutes this installation of PiAware is announcing its intent to use the address 192.168.1.4. This announcement is picked up by the same PiAware which then tries to defend its use of the address. This defence fails and PiAware relinquishes the address, default route and all connections. A few seconds later it establishes its newly ‘won’ same IP and re-establishes the route and connections. Until recently this odd situation was often causing the dhcpcd seg fault which has now been patched.

It is as if the networking on this installation has somehow ended up being managed by two components in the OS, leading to this kind of ‘split brain’ operation. This PiAware installation is the 3.8.0 card image and belongs to a friend of mine. It was initially set up with a dynamic IP in piaware-config.txt and then changed to a static IP to work around, and later fix, the lappend bug.

I have the same image and configuration and I am not seeing this behaviour on my network. Indeed I am not seeing any IP claim/defence traffic on the LAN at all, and nothing like this is being recorded in my PiAware’s syslog. The only difference between us is that I’m on a Pi 3B and he’s on a 3B+, so perhaps the different networking chipsets are a factor.

Is anyone else seeing this claim/defence activity in syslog? Any FA devs who might be able to conjecture what is going on here?