PiAware won't reconnect to wifi network if it drops off

@SweetPea11

Static addressing does not indicate there problem.Static addressing is a useful technique for networks that require incoming internet access. It is easier than using a bastion server to translate ports to internal addresses. I use it with routable IP’s on my /25 home network.

I’ve been in contact with Roy for a couple of days and sent him the wireshark trace of the claim/defend sequence. This is happening on my mate’s PiAware every 15 minutes, causing it to lose connectivity for 8 seconds every 15 minutes. The sequence isn’t quite correct though as there’s no probe adhering to RFC 5227. The claim is coming from an interface on the LAN but at the moment it’s not known where it’s sitting.

Roy wondered if dhcpcd was failing to resolve the MAC of the claimant and asked me to add a line to /etc/dhcpcd.conf in the interface section. It’s the IP of the router in this case, something with a known MAC which will cause it to get logged in syslog if dhcpcd is working okay, or not in which case it would help indicate an issue with dhcpcd.

arping 192.168.1.1

Since dhcpcd is controlled by PiAware in the PiAware image, the config file is overwritten on every boot, it’s necessary to HUP it to reload the modified config.

$ sudo dhcpcd -n

The next time the claim happened it revealed that dhcpcd is behaving correctly and is not at fault here. The last line has dhcpcd correctly capturing the MAC of the Sagecom router.

Jan 31 18:52:59 piaware dhcpcd[760]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.4
Jan 31 18:53:00 piaware dhcpcd[760]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.4
Jan 31 18:53:00 piaware dhcpcd[760]: wlan0: 10 second defence failed for 192.168.1.4
Jan 31 18:53:00 piaware avahi-daemon[264]: Withdrawing address record for 192.168.1.4 on wlan0.
Jan 31 18:53:00 piaware avahi-daemon[264]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.4.
Jan 31 18:53:00 piaware avahi-daemon[264]: Interface wlan0.IPv4 no longer relevant for mDNS.
Jan 31 18:53:00 piaware dhcpcd[760]: wlan0: deleting route to 192.168.1.0/24
Jan 31 18:53:00 piaware dhcpcd[760]: wlan0: deleting default route via 192.168.1.1
Jan 31 18:53:00 piaware dhcpcd[760]: wlan0: hardware address d8:a7:56:xx:xx:xx [redacted] claims 192.168.1.1

In order to try and ascertain where this claim is coming from on the LAN, Roy has kindly written a patch which will ammend dhcpcd to save the ARP packets for analysis. @obj I’ve messaged you the patch, would you mind turning it into a .deb file like you did with the fix please? I can then apply it and get the data needed for analysis.

Many thanks.

@chrislfa I’m not a PiAware user, but I am seeing the same claim/defend issue in the syslogs of 2 Raspberry Pi’s running the latest vanilla Raspbian (Buster):

Feb  1 13:36:36 RPi-Mike2 dhcpcd[357]: wlan0: hardware address 00:00:00:00:00:00 claims 192.168.1.76
Feb  1 13:36:36 RPi-Mike2 dhcpcd[357]: wlan0: 10 second defence failed for 192.168.1.76
Feb  1 13:36:36 RPi-Mike2 dhcpcd[357]: wlan0: deleting route to 192.168.1.0/24
Feb  1 13:36:36 RPi-Mike2 dhcpcd[357]: wlan0: deleting default route via 192.168.1.254
Feb  1 13:36:36 RPi-Mike2 dhcpcd[357]: wlan0: rebinding lease of 192.168.1.76
Feb  1 13:36:36 RPi-Mike2 dhcpcd[357]: wlan0: probing address 192.168.1.76/24
Feb  1 13:36:42 RPi-Mike2 dhcpcd[357]: wlan0: leased 192.168.1.76 for 86400 seconds
Feb  1 13:36:42 RPi-Mike2 dhcpcd[357]: wlan0: adding route to 192.168.1.0/24
Feb  1 13:36:42 RPi-Mike2 dhcpcd[357]: wlan0: adding default route via 192.168.1.254

Mine repeat once each hour on the same minute. I’ve done a lot of investigation on this issue and I’m now thinking that it is caused by something coming from my Actiontec T3200M router from my ISP Telus (western Canada).

I’m preparing some network captures and other info to send to Roy shortly (I haven’t contacted him directly yet). If you want to message me, I’ll keep you in the loop on what I find. I just wanted to let you know that you’re not the only one seeing this issue.

3 Likes

Hi @GordDelaney, thanks for the info, that’s really useful to know. The problem does need solving; it causes the PiAware webpage to throw up the dump1090 warning for a few seconds every 15 mins, and it also means that a percentage of MLAT messages are being repeatedly lost according to piaware.log.

piaware[771]: NOTICE from adept server: 25% of multilateration messages (UDP) are not reaching the server - check your network?

I’m also of the conclusion now that this is some interaction between the latest Debian on the RPi and the networking of some routers. I’ve been remoted onto my mate’s network this evening doing a lot of testing. Here’s what I’ve found.

The OS sees the conflict no matter what unique IPv4 it has, meaning it’s not coming from any genuine external IPv4 conflict. It must be an OS interpretation of some situation which is being treated like a conflict. It may be IPv6 related.

Roy wrote a patch which saves the ARP packet contents when this happens. Sure enough they are being saved. They are all identical and they don’t reveal anything that Wireshark hasn’t already revealed, that they are ARP announcements following RFC 5227 Section 2.3.

pi@piaware:/tmp $ md5sum dhcpcd-arp*
c5683475e67e806b48a61e21888cc48f  dhcpcd-arp-0.raw
c5683475e67e806b48a61e21888cc48f  dhcpcd-arp-1.raw
c5683475e67e806b48a61e21888cc48f  dhcpcd-arp-2.raw
...

pi@piaware:/tmp $ xxd dhcpcd-arp-0.raw 
00000000: 0001 0800 0604 0001 0000 0000 0000 c0a8  ................
00000010: 0104 0000 0000 0000 c0a8 0104            ............

0001 is ARP hardware type Ethernet
0800 is ARP protocol type IPv4
06 is ARP hardware size
04 is ARP protocol size
0001 is ARP opcode type Request
000000000000 is sender MAC (on a remote Wireshark this part of the ARP announcement packet shows as the Pi’s MAC)
c0a80104 is sender IP which is 192.168.1.4
000000000000 is target MAC
c0a80104 is target IP which is 192.168.1.4

If I change the PiAware’s IP address in /boot/piaware-config.txt to another unused one and reboot with that, the Pi does indeed use the new address. The problem still happens and all that changes is the IP’s captured in the ARP packets are the new IP, eg for 192.168.1.7 it’s c0a80107.

However during PiAware’s startup there are all kinds of messages involving wlan0, and these are mostly caused by dhcpcd removing the address bindings. This in turn seems to be related to various things which don’t appear on my PiAware at all, eg

piaware dhcpcd[752]: wlan0: carrier lost

after which dhcpcd removes all the address bindings and everything goes through the cycle again. Or

piaware dhcpcd[348]: Too few arguments.

No idea what’s in the PiAware build is triggering that, it’s not appearing in mine.

Some of the error messages relate to IPv6. Perhaps this “conflict” is related to some IPv6 aspect which isn’t being handled correctly by dhcpcd or something else. The router does support IPv6 but it’s not used by the ISP so perhaps there’s some default settings which are causing problems on the LAN.

It’s not a direct Pi-hole interaction

My mate is using Pi-hole as his nameserver and that is configured in /boot/piaware-config.txt. If I change it back to the router and reboot, the conflict process still takes place. The only other IP in the conflg that PiAware knows about is the router which is a Sagemcom CS50001 with ISP branded firmware, offering multiple services as most of these consumer devices do.

The conflict cycle timing is not coming from PiAware

On his network this process of claim/defend/fail/drop/renew is taking place every 15 minutes at 53, 08, 23 and 38 minutes past the hour. This is regardless of when PiAware is booted so it’s being dicated by something external to PiAware. Later today I’m going to ask him to power-cycle the router so it’s back up between those minutes and see if that changes when the conflict takes place.

It’s not vrsfeed

I wrote a short script which uses socat to feed data to a VRS server. I wondered if that could be causing a problem, but stopping it from running makes no difference.

I think my next step will be to capture a complete syslog from reboot to the conflict process taking place and seeing if Roy notices anything related to these wlan0 messages, in particular around IPv6.

2 Likes

@chrislfa

I don’t think it’s ipv6 related - as part of my troubleshooting, I’ve disabled ipv6 (“ipv4only” in /etc/dhcpcd.conf) and disabled ipv6 on the lan side in my router with a router setting. Nothing changed.

I changed IP addresses as well, I’m using DHCP reservations, no help.
I also added a second PI (a PI 4 - the first one was a 3B) and they both get the issue within seconds - first one, then the other.
I’m not running anything other than the standard Raspbian desktop release on both PIs. The first one is monitoring a couple of GPIO pins for some home control stuff - just a couple of cron scripts to collect data.

I tried the same thing to diagnose the peculiar timing with this issue - on rebooting either PI the incident time stays hourly at xx:45 (ie: 09:45, 10:45…). When I rebooted the router, however, the timing shifted to about 10 minutes after the router reboot (ie reboot at 10:30 - errors at 10:38, 11:38…, then after a few hours the timing slipped back to the xx:45 (14:45, 15:45) - and stayed there for days. Fairly baffling!

2 Likes

@chrislfa: I probably won’t be able to get anything to Roy for a day or two - if you want to relay any of my info to him, please fell free.
BTW: I started with the Raspbian release dhcpcd version of 8.1.2, but I’ve since installed Roy’s latest release version of 8.1.6 from 5 days ago - nothing changed with it.

1 Like

That’s ace, really handy to know. So IPv6 may be another way to trigger whatever’s happening, but it’s not the trigger of what’s happening.

That’s also useful to know, I was going to test that at some stage because I’m using a static configuration via PiAware’s config file, which in turn populates /etc/dhcpcd.conf on reboot with static ip_address=... and the other static items. I was wondering whether dynamic allocation would have it act differently.

Very useful, it suggests that PiAware isn’t doing anything specific to cause this, it’s some odd interaction between a Buster environment and one if its components, quite possibly dhcpcd. This wasn’t happening on my mate’s setup when he was running PiAware 3.7.2 which used the pre-Buster Debian.

The timing here has shifted a few minutes too, it’s broadly every 15 minutes, but each one is give or take a couple of minutes. A router reboot at a clearly different offset, like 8 minutes in this case, should make it clear how relevant this is.

I will do, I’ll refer him to this thread and ensure he has the syslog with the pre-claim stuff going on.

My suspicion is that some element of the new Buster is not playing nicely with dhcpcd when it’s in an environment has certain network services/versions used by some consumer routers. Possibly caused by previous assumptions which are no longer valid. PiAware and the like are just casualties of the bouncing going on. The effect on PiAware is to impact the MLAT reliability and to render the local service unavailable while the network is being kicked around.

I appreciate @obj help with compiling patches to test and log things. It’s good to know we can actively help diagnose what impact PiAware is having if any, and to perhaps find a way to work around anything that’s outside it. And also thanks to Roy for his help with this so far, even when it strays into areas typically not his concern.

Are you running any Powerline / HomePlug adapters on your network, or do you have any vehicle charging sockets which are network enabled? Just an angle… this applies to my mate but not me, and I came across this article when I was looking for examples of misbehaving broadcasts.

@chrislfa

No - neither, sorry! That might have been a good catch!
BTW: I sent some packet captures and info to Roy today. I haven’t heard anything back yet, but I’ll keep you posted.

Thanks, it’s great to be able to cross-check these ideas for consistent aspects. I was wondering if a sloppily implemented car charger could be misusing broadcasts to communicate with the car and its app, and this was being picked up by the Pi’s dhcpcd which was acting in good faith to the packet structure but ending up seeing it as an address claim with the wrong source.

I updated my mate’s Pi earlier, it updated a fair bit of Debian but after a reboot the problem remains, with the address claims and loss/reclaim cycles continuing every 15 minutes. Roy’s created another patch to add header info to the ARP packet captures, that may help identify the host or at least show it’s a proper broadcast. I’ll update this thread as more info is revealed!

Can you clarify, are you running any FlightAware software, or dump1090 (which version) or anything that we might have in common (my mate is running PiAware 3.8.0)? It’s beginning to look like some component of such software is triggering an odd corner case in some envionments, leading to this bouncing effect which is technically RFC-compliant.

The same thing has happened at my mate’s now too. Starting 0330 this morning until 0630 it’s slid back to 08, 23, 38, 53 give or take a couple of minutes each time (mostly exact).

No, nothing from FlightAware. Just a plain Raspbian Buster desktop bundle install with very few utility packages added (Wireshark, Apache2, Samba, etc).
I’ve been thinking about your question of other devices on the network - other than “standard stuff” - a desktop PC and printer (wired), a laptop, Samsung phone and tablet (which are off the WiFi network a lot of the time), the only other thing is a Ring Video Doorbell Pro that is connected over WiFi. However, I’ve taken both PIs to a friends house where they both DO NOT exhibit the symptom, I suspect due to a different router model and ISP. But there is also the same model of Ring Doorbell there, so I’m leaning towards the router not the Ring.

I’f you’d like to confirm that I’ll send you the .deb kindly compiled by @obj from Roy’s latest v2 patch. That ammends dhcpcd so that when this happens it captures the raw ARP packets including the headers and saves them to /tmp. They can then be deconstructed. The earlier v1 patch only captured the packet contents.

In my mate’s case the extra packets contain the Pi’s and the router’s hardware addresses for target and sender headers respectively (I’ve left just the vendor parts of the MACs here):

pi@piaware:/tmp $ xxd dhcpcd-arp-0.raw
00000000: b827 ebXX XXXX d8a7 56YY YYYY 0806 0001  .'......V(.K....
00000010: 0800 0604 0001 0000 0000 0000 c0a8 0104  ................
00000020: 0000 0000 0000 c0a8 0104                 ..........

The first 6 bytes (b8 27 eb XX XX XX) are the target hardware address, which is the Pi (its wlan0 interface which is the one it’s using):

pi@piaware:/tmp $ ifconfig
...
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
    inet 192.168.1.4  netmask 255.255.255.0  broadcast 192.168.1.255
    ....
    ether b8:27:eb:XX:XX:XX  txqueuelen 1000  (Ethernet)
    RX packets 275669  bytes 21356489 (20.3 MiB)
    RX errors 0  dropped 0  overruns 0  frame 0
    TX packets 301593  bytes 291249017 (277.7 MiB)
    TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The next 6 bytes (d8 a7 56 YY YY YY) are the sender hardware address, which is his Sagemcom CS50001 TalkTalk router (this handy online tool lets you quickly verify a vendor from the first 3 bytes of the hardware address):

pi@piaware:/tmp $ arp -a
? (192.168.1.1) at d8:a7:56:YY:YY:YY [ether] on wlan0
...

The next 2 bytes are 0806 which is a packet type of ARP. The remaining bytes are as previously analysed in the patch v1 packet capture above.

He has another Pi 3B+ on his network running standard Raspbian Buster (acting as Pi-hole). That’s also on a static allocation and that’s not having any of these issues at all. I can’t find any common factors in this so far. My mate’s PiAware continues to bounce offline for 8 seconds every 15 minutes.

[ Edited to fix the sender and target byte segments which were the wrong way around ]

Yes, please send me Roy’s latest patch. I’d prefer the original source patch since I’ve built and installed Roy’s latest dhcpcd (8.1.6) and I can easily apply the patch and rebuild. Or, if you prefer, you can send me the .deb

Is the Pi-hole PI on a wired Ethernet connection by any chance? I’ve found that if I move either of my machines from WiFi to wired connections, the problem does not occur. I normally prefer wired connections and only found the original problem when I installed the unit in a location with no wired access available that required me to use WiFi.

I will msg it.

Yes his Pihole is on a wired connection next to the router. His PiAware is up in the loft on wifi. I’ll be sure to let Roy know, and I’ve also seen some odd syslog messages relating to wlan0 which I’m going to send him.

I mentioned this to Roy in an email yesterday.

So this plus the ARP payload looking different on the wire sounds suspiciously like a normal ARP packet generated by the local Pi (I guess by the kernel itself) that dhcpcd’s BPF socket is picking up too early, before the ARP payload is fully populated.

I’d start looking at kernel versions…

Has that happened before and is a thing? As far as I can remember this only started happening on a fresh install of PiAware 3.8.0, ie Buster [Lite] in that case. I am positive that I never saw this on my mate’s feeder on PiAware 3.7.2. Now @GordDelaney isn’t running PiAware but IS also running Buster, so that at least is common to this. And we’re both only seeing it on the wlan interface. I wonder if something in Buster has changed how networking is handled, leading to some occasional out of bounds situation which tries to be processed as normal by dhcpcd and then results in the fun we’re seeing.

If it’s the router sending the ARP packet then that might account for different timings, eg my mate’s every 15 mins, GD’s every hour.

Too many variables and not enough hours in the day to play as much as I’d like sadly.

It turns out it is the router sending the packets, not the Pi. Roy mistakenly had the sender and receiver sections misremembered, and @GordDelaney picked up on the switcharoo. No probs, that explains much better what we’re seeing. So in summary:

  • Some routers are sending malformed ARP request packets directly to the Pi. They need to be patched in their firmware to stop doing it.
  • Buster on the Pi, when connected via the wireless port, doesn’t play nicely with these packets and this results in the claim/defend/drop/repeat behaviour which is technically RFC-compliant. The cycle depends on when the router sends them.
  • There may be a bug in the components of the current Buster / Buster Lite which is allowing this to happen which also needs patching.
  • dhcpcd was seg faulting when this happened, that has been patched and is now in the upstream code
  • PiAware 3.8.0 is based on Buster Lite so it’s seeing this same behaviour, it’s not a FlightAware issue per se, only in as much as PiAware is impacted by this on affected environments.

This explains the apparent rarity of this situation, although perhaps with Buster we’ll see more instances initially, especially unrelated to flight-tracking.

1 Like