At the time where the drop started, the syslog is full of entries like these:
Nov 16 10:20:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_dbfs-quart1; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 10:20:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_dbfs-median; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 10:20:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_dbfs-quart3; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 10:20:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_dbfs-peak_signal; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 10:20:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_dbfs-min_signal; value time = 1605518305.500; last cache update = 1605518305.500;
This was running until 59 Minutes later:
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_range-quart1; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_range-median; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_range-quart3; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_range-minimum; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_range-max_range; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_aircraft-recent; value time = 1605518305.500; last cache update = 1605518305.500;
At 11:20 the process obviously restarted again. Since then it works without further issues.
> Nov 16 11:20:32 Raspi3 systemd[1]: dump1090-fa.service: State 'stop-sigterm' timed out. Killing.
> Nov 16 11:20:32 Raspi3 systemd[1]: dump1090-fa.service: Killing process 786 (dump1090-fa) with signal SIGKILL.
> Nov 16 11:20:32 Raspi3 systemd[1]: dump1090-fa.service: Main process exited, code=killed, status=9/KILL
> Nov 16 11:20:32 Raspi3 systemd[1]: dump1090-fa.service: Failed with result 'timeout'.
> Nov 16 11:20:32 Raspi3 systemd[1]: Stopped dump1090 ADS-B receiver (FlightAware customization).
> Nov 16 11:20:32 Raspi3 systemd[1]: Started dump1090 ADS-B receiver (FlightAware customization).
> Nov 16 11:20:32 Raspi3 dump1090-fa[19638]: Mon Nov 16 11:20:32 2020 CET dump1090-fa 4.0 starting up.
> Nov 16 11:20:33 Raspi3 dump1090-fa[19638]: rtlsdr: using device #0: Generic RTL2832U (Realtek, RTL2832U, SN 00001000)
> Nov 16 11:20:33 Raspi3 dump1090-fa[19638]: Detached kernel driver
> Nov 16 11:20:33 Raspi3 kernel: [1990614.340991] r820t 12-001a: destroying instance
> Nov 16 11:20:33 Raspi3 kernel: [1990614.341451] dvb_usb_v2: 'Realtek RTL2832U reference design:1-1.3' successfully deinitialized and disconnected
> Nov 16 11:20:33 Raspi3 dump1090-fa[19638]: Found Rafael Micro R820T tuner
> Nov 16 11:20:33 Raspi3 dump1090-fa[19638]: rtlsdr: tuner gain set to 48.0 dB
> Nov 16 11:20:33 Raspi3 tar1090[785]: /usr/local/share/tar1090/tar1090.sh: line 104: /run/dump1090-fa/aircraft.json: No such file or directory
> Nov 16 11:20:33 Raspi3 tar1090[785]: No aircraft.json found in /run/dump1090-fa! Try restarting dump1090 or reinstalling tar1090 if you switched dump1090 to readsb!
Any ideas what caused this?
The device itself is running all the time meanwhile since 28 days and i did not touch it during the time
You cut off the interesting bit of the logs immediately before the restart, but my guess would be that the hardware got stuck and piaware restarted dump1090-fa after some time with no data seen.
Here we go, filtered on “dump1090”. Before 11:18:54 the entries for dump1090 where the same as in my initial post
Nov 16 11:18:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_cpu-reader; value time = 1605518274.700; last cache update = 1605518274.700;
Nov 16 11:19:02 Raspi3 dump1090-fa[786]: Mon Nov 16 11:19:02 2020 CET Caught SIGTERM, shutting down..
Nov 16 11:19:02 Raspi3 systemd[1]: Stopping dump1090 ADS-B receiver (FlightAware customization)...
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_dbfs-noise; value time = 1605518274.700; last cache update = 1605518274.700;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_dbfs-quart1; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_dbfs-median; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_dbfs-quart3; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_dbfs-peak_signal; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_dbfs-min_signal; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_range-quart1; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_range-median; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_range-quart3; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_range-minimum; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_range-max_range; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_aircraft-recent; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_mlat-recent; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_tisb-recent; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:19:54 Raspi3 collectd[841]: uc_update: Value too old: name = localhost/dump1090-localhost/dump1090_gps-recent; value time = 1605518305.500; last cache update = 1605518305.500;
Nov 16 11:20:32 Raspi3 systemd[1]: dump1090-fa.service: State 'stop-sigterm' timed out. Killing.
Nov 16 11:20:32 Raspi3 systemd[1]: dump1090-fa.service: Killing process 786 (dump1090-fa) with signal SIGKILL.
Nov 16 11:20:32 Raspi3 systemd[1]: dump1090-fa.service: Main process exited, code=killed, status=9/KILL
Nov 16 11:20:32 Raspi3 systemd[1]: dump1090-fa.service: Failed with result 'timeout'.
Nov 16 11:20:32 Raspi3 systemd[1]: Stopped dump1090 ADS-B receiver (FlightAware customization).
Nov 16 11:20:32 Raspi3 systemd[1]: Started dump1090 ADS-B receiver (FlightAware customization).
Nov 16 11:20:32 Raspi3 dump1090-fa[19638]: Mon Nov 16 11:20:32 2020 CET dump1090-fa 4.0 starting up.
Nov 16 11:20:33 Raspi3 dump1090-fa[19638]: rtlsdr: using device #0: Generic RTL2832U (Realtek, RTL2832U, SN 00001000)
Nov 16 11:20:33 Raspi3 dump1090-fa[19638]: Detached kernel driver
Nov 16 11:20:33 Raspi3 dump1090-fa[19638]: Found Rafael Micro R820T tuner
Nov 16 11:20:33 Raspi3 dump1090-fa[19638]: rtlsdr: tuner gain set to 48.0 dB
Nov 16 11:20:33 Raspi3 tar1090[785]: /usr/local/share/tar1090/tar1090.sh: line 104: /run/dump1090-fa/aircraft.json: No such file or directory
Nov 16 11:20:33 Raspi3 tar1090[785]: No aircraft.json found in /run/dump1090-fa! Try restarting dump1090 or reinstalling tar1090 if you switched dump1090 to readsb!
Thanks for explanation where this hour comes from.
-- Logs begin at Sun 2020-11-15 01:30:02 CET, end at Tue 2020-11-17 10:11:05 CET. --
Nov 16 11:19:02 Raspi3 dump1090-fa[786]: Mon Nov 16 11:19:02 2020 CET Caught SIGTERM, shutting down..
Nov 16 11:19:02 Raspi3 systemd[1]: Stopping dump1090 ADS-B receiver (FlightAware customization)...
Nov 16 11:20:32 Raspi3 systemd[1]: dump1090-fa.service: State 'stop-sigterm' timed out. Killing.
Nov 16 11:20:32 Raspi3 systemd[1]: dump1090-fa.service: Killing process 786 (dump1090-fa) with signal SIGKILL.
Nov 16 11:20:32 Raspi3 systemd[1]: dump1090-fa.service: Main process exited, code=killed, status=9/KILL
Nov 16 11:20:32 Raspi3 systemd[1]: dump1090-fa.service: Failed with result 'timeout'.
Nov 16 11:20:32 Raspi3 systemd[1]: Stopped dump1090 ADS-B receiver (FlightAware customization).
Nov 16 11:20:32 Raspi3 systemd[1]: Started dump1090 ADS-B receiver (FlightAware customization).
Nov 16 11:20:32 Raspi3 dump1090-fa[19638]: Mon Nov 16 11:20:32 2020 CET dump1090-fa 4.0 starting up.
Nov 16 11:20:33 Raspi3 dump1090-fa[19638]: rtlsdr: using device #0: Generic RTL2832U (Realtek, RTL2832U, SN 00001000)
Nov 16 11:20:33 Raspi3 dump1090-fa[19638]: Detached kernel driver
Nov 16 11:20:33 Raspi3 dump1090-fa[19638]: Found Rafael Micro R820T tuner
Nov 16 11:20:33 Raspi3 dump1090-fa[19638]: rtlsdr: tuner gain set to 48.0 dB
Thanks. That implies that the main thread is wedged, which is an unusual failure mode I haven’t seen before. If it happens again it’s probably worth trying to catch in the act to track down further.
(I’m slightly suspicious of the pthread_cond_timedwait error path in fifo_dequeue; there’s conflicting information on how to handle error returns & the resulting state of the mutex; I wonder if we’re hitting the error path and the mutex is unexpectedly locked and then we self-deadlock trying to reacquire it)
Whatever i can deliver as data, let me know.
The device is the first one i have built more than a year ago. I’ve never seen such a fault before on that.
I stared at the fifo code for a while and I think I found one path where it could fail and get stuck. It is, literally, a one-in-a-billion chance (which triggers the bug described in this second commit) but that code is called a lot so it’s plausible that it could happen from time to time. (This code is new in 4.0)
Do let me know if you see the failure again, maybe we can get you onto the dev branch code in that case to see if it helps.