Having looked into it some more, it appears this is normal background network activity when two interfaces are negotiating link-local IPs as per RFC 3927. An all-zero MAC is expected, but only as the target hardware address, where it’s ignored; the logs above appear to show a source address that has been set to all zeros. Perhaps a device on the LAN is implementing it wrongly, or perhaps PiAware is getting mixed up in how it interprets it. And perhaps either of those is the cause of the seg fault.
Thanks for digging into the code and reaching out to the dev, @obj; you beat me to it, as I was partway through an email to him. That’s a rapid fix indeed! Thanks for compiling it; I’ll certainly give it a go and see if it appears to stop the crashes (can’t ever be sure until these RFC 3927 negotiations take place, appear in the logs, and it survives). If so then PiAware should stop going down, and I suspect this is also the cause for those few others who are reporting that their PiAware drops offline and doesn’t come back without manual intervention.
Can you clarify some points please?
What’s the correct syntax to install this .deb on a running box? I’m seeing various syntaxes and even a single user mode suggestion. Rebooting’s not a problem, just don’t want to break it as it will need to be reimaged and start over if that happens.
Has Roy confirmed a bug which would be causing the seg fault and this fix is a formal fix for it? I can’t really make sense of the code snippet that I can see on the link you provided and I don’t know how “formal” this is as a proper fix?
If PiAware 3.8.0 is installed from the current SD card image, does this patch need to be applied or will it be rolled into 3.8.0? Or what about 3.8.1?
Download the .deb then sudo dpkg -i <path to the downloaded deb>. You might want to run it under screen just in case if you’re doing it remotely, but I installed it OK over ssh and it didn’t interrupt the connection. May need a reboot (or at least a dhcpcd restart) afterwards.
Not in so many words but it looks like a plausible fix for the crash (use-after-free of astate) and it’s on the main branch for dhcpcd 8 so it’ll go into the next release.
I’ll see what I can do for getting it into 3.8.1 assuming upstream doesn’t get there first (which reminds me that I need to let the raspbian maintainers know, since the dhcpcd in raspbian actually seems to be newer than the package in standard debian; debian has dhcpcd 7.1.0)
Thanks for your assistance @obj, very much appreciated. I installed it on my own PiAware to test it, despite this ‘claim’ trigger apparently not present on my network. It installed okay. I’ll be installing it on my mate’s PiAware later on. I’ll be interested to see if these claim events continue in his syslog but without the seg fault and the dropping off the network. I’ll report back.
Patch installed on mate’s Pi… I’ll report back on any more occurrences of RFC 5227 IP defence in the logs; this time there hopefully should be no seg fault and no network drop-off. Thanks again.
Hey @obj that .deb patch has done the trick. I’m still seeing the same RFC 5227 defence taking place but it’s no longer causing a dhcpcd seg fault.
BEFORE PATCH – dhcpcd often seg faulted after deleting default route and then attempting probe, taking PiAware offline until manually reset.
...
Jan 19 21:39:05 piaware dhcpcd[530]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.4
Jan 19 21:39:06 piaware dhcpcd[530]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.4
Jan 19 21:39:06 piaware dhcpcd[530]: wlan0: 10 second defence failed for 192.168.1.4
Jan 19 21:39:06 piaware avahi-daemon[267]: Withdrawing address record for 192.168.1.4 on wlan0.
Jan 19 21:39:06 piaware avahi-daemon[267]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.4.
Jan 19 21:39:06 piaware avahi-daemon[267]: Interface wlan0.IPv4 no longer relevant for mDNS.
Jan 19 21:39:06 piaware dhcpcd[530]: wlan0: deleting route to 192.168.1.0/24
Jan 19 21:39:06 piaware dhcpcd[530]: wlan0: deleting default route via 192.168.1.1
Jan 19 21:39:06 piaware systemd[1]: dhcpcd.service: Main process exited, code=killed, status=11/SEGV
...
AFTER PATCH – dhcpcd completes deletion, probe and reinstatement of route and is no longer seg faulting.
...
Jan 26 16:53:52 piaware dhcpcd[530]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.4
Jan 26 16:53:53 piaware dhcpcd[530]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.4
Jan 26 16:53:53 piaware dhcpcd[530]: wlan0: 10 second defence failed for 192.168.1.4
Jan 26 16:53:53 piaware avahi-daemon[294]: Withdrawing address record for 192.168.1.4 on wlan0.
Jan 26 16:53:53 piaware avahi-daemon[294]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.4.
Jan 26 16:53:53 piaware avahi-daemon[294]: Interface wlan0.IPv4 no longer relevant for mDNS.
Jan 26 16:53:53 piaware dhcpcd[530]: wlan0: deleting route to 192.168.1.0/24
Jan 26 16:53:53 piaware dhcpcd[530]: wlan0: deleting default route via 192.168.1.1
Jan 26 16:53:53 piaware dhcpcd[530]: wlan0: probing address 192.168.1.4/24
Jan 26 16:53:59 piaware dhcpcd[530]: wlan0: using static address 192.168.1.4/24
Jan 26 16:53:59 piaware avahi-daemon[294]: Joining mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.4.
Jan 26 16:53:59 piaware avahi-daemon[294]: New relevant interface wlan0.IPv4 for mDNS.
Jan 26 16:53:59 piaware avahi-daemon[294]: Registering new address record for 192.168.1.4 on wlan0.IPv4.
Jan 26 16:53:59 piaware dhcpcd[530]: wlan0: adding route to 192.168.1.0/24
Jan 26 16:53:59 piaware dhcpcd[530]: wlan0: adding default route via 192.168.1.1
...
In my previous logs post you can see that the underlying ARP-related traffic is still present and still causing this ‘claim’ situation, but, since applying the dhcpcd fix, it is no longer causing a seg fault. The seg fault was happening after the IP and route had been dropped but before the new IP and route (which would be the same anyway for a static allocation) were set. This essentially took PiAware offline until dhcpcd was restarted, typically via a power cycle.
Syslog is showing that this claim process continues to take place every 15 minutes. It causes PiAware to relinquish its IP, default route and FlightAware connections, and then get them all back again, a process which takes around 8 seconds from end to end. I decided to dig into this some more this evening and determine what is triggering the effect on the network. Since this process is an RFC 5227 IPv4 defence I used Wireshark on another machine to listen for ARP announcements meeting the RFC spec, while tailing syslog to watch for the next occurrence.
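For anyone who wants to check their own syslog for the same pattern, here is a rough Python sketch that pulls out the dhcpcd ‘claims’ lines and groups them into episodes so the spacing between them can be measured. It assumes the log line format shown in the excerpts above; the function names, the `year` default (syslog timestamps carry no year) and the 60 second grouping window are my own illustrative choices, and the month parsing assumes an English locale.

```python
import re
from datetime import datetime

# Matches dhcpcd conflict lines in the format seen in the excerpts above, e.g.
#   Jan 19 21:39:05 piaware dhcpcd[530]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.4
CLAIM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+dhcpcd\[\d+\]:\s+"
    r"(?P<iface>\S+):\s+hardware address (?P<mac>[0-9a-fA-F:]{17}) claims (?P<ip>\S+)"
)


def parse_claims(lines, year=2020):
    """Return (timestamp, iface, mac, ip) for each dhcpcd 'claims' line.

    The year is an assumption supplied by the caller; syslog omits it.
    """
    events = []
    for line in lines:
        m = CLAIM_RE.match(line)
        if m:
            # Collapse whitespace so single-digit days ("Jan  5") still parse.
            stamp = " ".join(m.group("ts").split())
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group("iface"), m.group("mac"), m.group("ip")))
    return events


def episode_starts(events, quiet_seconds=60):
    """Group bursts of claims into episodes; return each episode's first timestamp."""
    starts = []
    last = None
    for ts, *_ in events:
        if last is None or (ts - last).total_seconds() > quiet_seconds:
            starts.append(ts)
        last = ts
    return starts


if __name__ == "__main__":
    sample = [
        "Jan 19 21:39:05 piaware dhcpcd[530]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.4",
        "Jan 19 21:39:06 piaware dhcpcd[530]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.4",
    ]
    events = parse_claims(sample)
    print(len(events), "claim lines in", len(episode_starts(events)), "episode(s)")
```

Feeding it the lines of a full syslog and subtracting consecutive episode start times should show whether the gap really is a steady 15 minutes.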
Wireshark captured a number of ARP announcements and replies with this configuration (I’ve masked the full ethernet address for privacy):
Sender MAC address: piaware.local (b8:27:eb:xx:xx:xx)
Sender IP address: piaware.local (192.168.1.4)
Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
Target IP address: piaware.local (192.168.1.4)
RFC 5227 states:
A host probes to see if an address is already in use by broadcasting an ARP Request for the desired address. The client MUST fill in the ‘sender hardware address’ field of the ARP Request with the hardware address of the interface through which it is sending the packet. The ‘sender IP address’ field MUST be set to all zeroes; this is to avoid polluting ARP caches in other hosts on the same link in the case where the address turns out to be already in use by another host. The ‘target hardware address’ field is ignored and SHOULD be set to all zeroes. The ‘target IP address’ field MUST be set to the address being probed. An ARP Request constructed this way, with an all-zero ‘sender IP address’, is referred to as an ‘ARP Probe’.
I didn’t see any probes, i.e. packets with the sender IP set to all zeroes; in the packets I did capture, the sender IP is 192.168.1.4 and the sender MAC is that of the Pi. RFC 5227 goes on to say:
An ARP Announcement is identical to the ARP Probe described above, except that now the sender and target IP addresses are both set to the host’s newly selected IPv4 address. The purpose of these ARP Announcements is to make sure that other hosts on the link do not have stale ARP cache entries left over from some other host that may previously have been using the same address. The host may begin legitimately using the IP address immediately after sending the first of the two ARP Announcements; the sending of the second ARP Announcement may be completed asynchronously, concurrent with other networking operations the host may wish to perform.
The data captured matches that of an announcement and is flagged as such in Wireshark.
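To make the distinction concrete, here is a minimal Python sketch of the probe/announcement rules as quoted from RFC 5227 above. It only looks at the ARP opcode and the two IP fields; the function name and return labels are my own, and this is not how Wireshark or dhcpcd actually implement the check.

```python
from ipaddress import IPv4Address

ZERO_IP = IPv4Address("0.0.0.0")

ARP_REQUEST = 1  # ARP opcode for a request
ARP_REPLY = 2    # ARP opcode for a reply


def classify_arp(opcode, sender_ip, target_ip):
    """Classify an ARP packet using the RFC 5227 field rules quoted above.

    Returns "probe", "announcement" or "other" (illustrative labels).
    """
    sender = IPv4Address(sender_ip)
    target = IPv4Address(target_ip)
    if opcode == ARP_REQUEST and sender == ZERO_IP:
        # All-zero sender IP: the host is asking whether the target IP is in use.
        return "probe"
    if opcode == ARP_REQUEST and sender == target:
        # Sender and target IP identical: the host is announcing its address.
        return "announcement"
    return "other"


if __name__ == "__main__":
    # Field values from the Wireshark capture above.
    print(classify_arp(ARP_REQUEST, "192.168.1.4", "192.168.1.4"))
```

With the captured values (sender and target IP both 192.168.1.4) this falls into the announcement branch, which agrees with how Wireshark flagged the packets.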
Thus it appears that PiAware is getting into an RFC 5227 fight with itself. Every 15 minutes this installation of PiAware is announcing its intent to use the address 192.168.1.4. This announcement is picked up by the same PiAware which then tries to defend its use of the address. This defence fails and PiAware relinquishes the address, default route and all connections. A few seconds later it establishes its newly ‘won’ same IP and re-establishes the route and connections. Until recently this odd situation was often causing the dhcpcd seg fault which has now been patched.
It is as if the networking on this installation has somehow ended up being managed by two components in the OS, leading to this kind of ‘split brain’ operation. This PiAware installation is the 3.8.0 card image and belongs to a friend of mine. It was set up using a dynamic IP initially in piaware-config.txt and then changed to a static IP to work around then fix the lappend bug.
I have the same image and configuration and I am not seeing this behaviour on my network. Indeed I am not seeing any IP claim/defence traffic on the LAN at all, and nothing like this is being recorded in my PiAware’s syslog. The only difference between us is that I’m on a Pi 3B and he’s on a 3B+, so perhaps different networking chipset modules or something.
Is anyone else seeing this claim/defence activity in syslog? Any FA devs who might be able to conjecture what is going on here?
I have been following this discussion with just a passing interest.
You started off the discussion by saying it had a static IP.
Why?
Was that an attempt to fix some other problem and there are now just more layers of problems?
Is there some reason why you can’t burn a virgin piaware image on a new sdcard and simply edit piaware-config.txt to add the WiFi credentials and the Unique ID and nothing else?
If it still fails, take it to another site and another AP/router and see if that fixes it.
There are lots and lots of Pis running piaware images and WiFi that are not suffering this problem. I am suggesting identifying the cause and then fixing that.
It could be Pi hardware, piaware image on the SDcard as amended, the router or some other device on the network not playing quite right. It shouldn’t be hard to identify what is causing it so that you can return to a relatively vanilla image.
BTW I run a collection of Pi 3B and 3B+ here with piaware images amended for bias-T and another feeder without any WiFi network problems.
No, it’s always had a static IP configured in piaware-config.txt. Regardless of whether a static IP or not is used, in the SD card image it is managed by dhcpcd.
That is what this installation is.
I concur; as I said I’m not seeing it at my site, just at my friend’s site. Don’t forget that the original purpose of this thread related to PiAware dropping off and not coming back on, and that’s been traced to an obscure bug in dhcpcd5 which causes a seg fault. That’s now been fixed by the dev, Roy, and turned into a .deb patch by @obj and applied by me to the affected installation and that’s fixed the primary concern. My attention is now focused on why this installation of PiAware is experiencing this challenge/defend/fail/release/renew cycle every 15 minutes, apparently originating from itself, which was the original trigger of the dhcpcd bug and which continues to take place.
This is a vanilla image. Network analysis so far appears to show that it is PiAware which is causing the problem for itself. I’d like to hear from anyone else who is seeing this behaviour in their syslog, particularly if they’ve also suffered from the primary problem of PiAware going and staying offline, and I’m curious to hear from FA devs if the network behaviour described sparks any ideas given their deeper knowledge of what 3.8.0 is doing under the hood.
Can you raise this with the dhcpcd maintainer? I can poke at the dhcpcd code and make educated guesses but it’s really not something I am very familiar with. Sounds like perhaps dhcpcd is getting the source/destination hardware addresses reversed or something.
Static IP is a relatively uncommon configuration so it may just be that.
I’m sorry, what I mean is that having a static IP configured may be the trigger for the underlying bug, and the bug just hasn’t been noticed in the past because it’s an uncommon configuration. Uncommon or not, it’s still a valid configuration, it doesn’t need to be “fixed”.
The IP world existed with static IP before dhcp became the norm. I’ve spent many sleepless nights trying to calculate netmasks and network addresses and broadcast addresses and the list goes on and on. I know how hard it is to find one errant bit on one host when it was all done manually. And yes, I started before NAT was invented, when every host had a real, routable IP address.
In more recent times dhcp has solved all those error prone processes so for me the logical question is to ask what was the problem that changing from dhcp to fixed IP address tried to solve.
I don’t disagree, but whether you “should” use a static IP (which is a decision that’s entirely up to whoever is managing the device) is irrelevant; the sdcard config supports it, so it should work correctly if configured.
My concern is that the vast majority of piaware users configure their Pi with the default DHCP.
Those who elect to use static IP usually do so because of some other network problem.
As demonstrated in the thread I linked to previously, it is not unusual for those who choose to set up a static IP to get it wrong, often by setting the static IP within the DHCP range but also by getting the broadcast and/or netmask wrong.
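Those particular mistakes are easy enough to rule out mechanically. Below is a rough Python sketch using the standard `ipaddress` module; the function name is made up for illustration, the address and gateway come from the logs in this thread, and the DHCP pool bounds are purely hypothetical since the router's actual pool hasn't been posted.

```python
from ipaddress import IPv4Address, IPv4Interface


def check_static_config(address_with_prefix, gateway, broadcast,
                        dhcp_pool_start, dhcp_pool_end):
    """Return a list of problems found in a static IPv4 configuration."""
    iface = IPv4Interface(address_with_prefix)
    net = iface.network
    problems = []

    # The host address must not be the network or broadcast address itself.
    if iface.ip in (net.network_address, net.broadcast_address):
        problems.append("address is the network or broadcast address")

    # The gateway has to be reachable on the same subnet.
    if IPv4Address(gateway) not in net:
        problems.append("gateway is outside the subnet")

    # The configured broadcast must match the one implied by the netmask.
    if IPv4Address(broadcast) != net.broadcast_address:
        problems.append(f"broadcast should be {net.broadcast_address}")

    # A static address inside the DHCP pool risks a duplicate lease.
    if IPv4Address(dhcp_pool_start) <= iface.ip <= IPv4Address(dhcp_pool_end):
        problems.append("address falls inside the DHCP pool")

    return problems


if __name__ == "__main__":
    # 192.168.1.4/24 and gateway 192.168.1.1 are from the logs in this thread;
    # the pool 192.168.1.100-192.168.1.200 is an assumed example.
    print(check_static_config("192.168.1.4/24", "192.168.1.1", "192.168.1.255",
                              "192.168.1.100", "192.168.1.200"))
```

An empty list means none of these specific mistakes were found; it says nothing about other causes such as a second device or service claiming the same address.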
As the thread title states it is the failure to reconnect that is the problem here rather than the fact that the network regularly fails.
As I have said previously, the engineer in me is interested in fixing the initial problem rather than applying a bandaid and declaring it fixed.
Not in these cases; it’s simply a design decision. I prefer unchanging underlying addresses for major services.
Possibly but that’s not relevant in this case.
The thread title is based on observed symptoms, prior to the diagnosis, so that it will assist future readers who experience the same symptoms.
The initial problem is fixed; it was a bug in the dhcpcd5 component. The unit no longer fails to reconnect to wifi when it drops off. My attention is now focused on the interesting cause of the unit apparently entering into RFC 5227 negotiation with itself as I am not seeing that on my own unit which to all intents and purposes is an identical configuration. It may well be another dhcpcd5 bug. If anyone else sees this same reported claim/defend behaviour in syslog, possibly leading to the same original seg fault bug, I’ll be interested to hear more from them.
As I mentioned earlier in this thread, after I moved my Pi 3B+ to the attic it twice dropped off the network and had to be rebooted. So I installed a script to detect this happening and copy the log files (but not try to restart WiFi or reboot the device). It’s been a couple of weeks now and the problem hasn’t occurred. I have also checked today for ‘defence’ messages in the log and I don’t have any.
The behavior seen on your buddy’s network is quite strange.