Since Piaware 2.1-5, something crashes every now and then

Hi,

Since updating to Piaware 2.1-5 (I also updated dump1090), I noticed that every now and then, maybe once a day, something crashes and then everything just works fine again. Not sure, if it is dump190 or the MLAT-client.

I am monitoring CPU-load and network traffic and can match the crashes to something like this in the logs:

01/04/2016 18:26:52 mlat(23053): Server status: synchronized with 37 nearby receivers
01/04/2016 18:26:52 mlat(23053): Receiver: 457.9 msg/s received 8.1kB/s from receiver
01/04/2016 18:26:52 mlat(23053): Server: 0.6 kB/s from server 0.0kB/s TCP to server 1.2kB/s UDP to server
01/04/2016 18:26:52 mlat(23053): Results: 254.6 positions/minute
01/04/2016 18:26:52 mlat(23053): Aircraft: 20 of 29 Mode S, 40 of 70 ADS-B used
01/04/2016 18:28:01 312685 msgs recv’d from dump1090 (1270 in last 5m); 312598 msgs sent to FlightAware
01/04/2016 18:33:01 314026 msgs recv’d from dump1090 (1341 in last 5m); 313939 msgs sent to FlightAware
01/04/2016 18:38:01 315186 msgs recv’d from dump1090 (1160 in last 5m); 315099 msgs sent to FlightAware
01/04/2016 18:38:39 mlat(23053): Lost connection to localhost:30005
01/04/2016 18:38:39 mlat(23053): Reconnecting in 30.0 seconds
01/04/2016 18:38:39 mlat(23053): Beast-format results connection with localhost:30104: connection lost
01/04/2016 18:38:39 lost connection to dump1090 via faup1090
01/04/2016 18:38:39 reconnecting to dump1090
01/04/2016 18:38:39 no ADS-B data program seen listening on port 30005 for 0 seconds, next check in 60s
01/04/2016 18:39:09 mlat(23053): Connection to localhost:30005 lost: [Errno 111] Connection refused
01/04/2016 18:39:09 mlat(23053): Reconnecting in 30.0 seconds
01/04/2016 18:39:10 mlat(23053): Beast-format results connection with localhost:30104: [Errno 111] Connection refused
01/04/2016 18:39:39 no ADS-B data program seen listening on port 30005 for 60 seconds, next check in 60s
01/04/2016 18:39:40 mlat(23053): Connection to localhost:30005 lost: [Errno 111] Connection refused
01/04/2016 18:39:40 mlat(23053): Reconnecting in 30.0 seconds
01/04/2016 18:39:40 mlat(23053): Beast-format results connection with localhost:30104: [Errno 111] Connection refused
01/04/2016 18:40:10 mlat(23053): Connection to localhost:30005 lost: [Errno 111] Connection refused
01/04/2016 18:40:10 mlat(23053): Reconnecting in 30.0 seconds
01/04/2016 18:40:11 mlat(23053): Beast-format results connection with localhost:30104: [Errno 111] Connection refused
01/04/2016 18:40:39 no ADS-B data program seen listening on port 30005 for 120 seconds, next check in 60s
01/04/2016 18:40:40 mlat(23053): Connection to localhost:30005 lost: [Errno 111] Connection refused
01/04/2016 18:40:40 mlat(23053): Reconnecting in 30.0 seconds
01/04/2016 18:40:41 mlat(23053): Beast-format results connection with localhost:30104: [Errno 111] Connection refused
01/04/2016 18:41:10 mlat(23053): Connection to localhost:30005 lost: [Errno 111] Connection refused
01/04/2016 18:41:10 mlat(23053): Reconnecting in 30.0 seconds
01/04/2016 18:41:12 mlat(23053): Beast-format results connection with localhost:30104: [Errno 111] Connection refused
01/04/2016 18:41:39 no ADS-B data program seen listening on port 30005 for 180 seconds, next check in 60s
01/04/2016 18:41:41 mlat(23053): Connection to localhost:30005 lost: [Errno 111] Connection refused
01/04/2016 18:41:41 mlat(23053): Reconnecting in 30.0 seconds
01/04/2016 18:41:42 mlat(23053): Beast-format results connection with localhost:30104: [Errno 111] Connection refused
01/04/2016 18:41:53 mlat(23053): Receiver status: disconnected
01/04/2016 18:41:53 mlat(23053): Server status: not synchronized with any nearby receivers
01/04/2016 18:41:53 mlat(23053): Receiver: 373.2 msg/s received 6.7kB/s from receiver
01/04/2016 18:41:53 mlat(23053): Server: 0.5 kB/s from server 0.0kB/s TCP to server 0.9kB/s UDP to server
01/04/2016 18:41:53 mlat(23053): Results: 204.5 positions/minute
01/04/2016 18:41:53 mlat(23053): Aircraft: 0 of 0 Mode S, 0 of 0 ADS-B used
01/04/2016 18:42:11 mlat(23053): Connection to localhost:30005 lost: [Errno 111] Connection refused
01/04/2016 18:42:11 mlat(23053): Reconnecting in 30.0 seconds
01/04/2016 18:42:13 mlat(23053): Beast-format results connection with localhost:30104: [Errno 111] Connection refused
01/04/2016 18:42:40 no ADS-B data program seen listening on port 30005 for 241 seconds, next check in 60s
01/04/2016 18:42:41 mlat(23053): Connection to localhost:30005 lost: [Errno 111] Connection refused
01/04/2016 18:42:41 mlat(23053): Reconnecting in 30.0 seconds
01/04/2016 18:42:43 mlat(23053): Beast-format results connection with localhost:30104: [Errno 111] Connection refused
01/04/2016 18:43:01 315327 msgs recv’d from dump1090 (141 in last 5m); 315240 msgs sent to FlightAware
01/04/2016 18:43:12 mlat(23053): Connection to localhost:30005 lost: [Errno 111] Connection refused
01/04/2016 18:43:12 mlat(23053): Reconnecting in 30.0 seconds
01/04/2016 18:43:13 mlat(23053): Beast-format results connection with localhost:30104: [Errno 111] Connection refused
01/04/2016 18:43:40 no ADS-B data program seen listening on port 30005 for 301 seconds, next check in 60s
01/04/2016 18:43:42 mlat(23053): Connection to localhost:30005 lost: [Errno 111] Connection refused
01/04/2016 18:43:42 mlat(23053): Reconnecting in 30.0 seconds
01/04/2016 18:43:43 mlat(23053): Beast-format results connection with localhost:30104: [Errno 111] Connection refused
01/04/2016 18:44:12 mlat(23053): Connection to localhost:30005 lost: [Errno 111] Connection refused
01/04/2016 18:44:12 mlat(23053): Reconnecting in 30.0 seconds
01/04/2016 18:44:13 mlat(23053): Beast-format results connection with localhost:30104: [Errno 111] Connection refused
01/04/2016 18:44:40 no ADS-B data program seen listening on port 30005 for 361 seconds, trying to start it…
01/04/2016 18:44:40 attempting to start fadump1090.sh using ‘invoke-rc.d fadump1090.sh start’…
Starting dump1090 server: dump1090001/04/2016 18:44:40 dump1090 start appears to have been successful
01/04/2016 18:44:42 mlat(23053): Input connected to localhost:30005
01/04/2016 18:44:44 mlat(23053): Beast-format results connection with localhost:30104: connection established
01/04/2016 18:44:50 ADS-B data program ‘dump1090’ is listening on port 30005, so far so good
01/04/2016 18:44:50 Starting faup1090: /usr/lib/piaware/helpers/faup1090 --net-bo-ipaddr localhost --net-bo-port 30005 --stdout --lat 48.360 --lon 8.971
01/04/2016 18:44:50 Started faup1090 (pid 13448) to connect to dump1090
01/04/2016 18:48:01 316056 msgs recv’d from dump1090 (729 in last 5m); 315969 msgs sent to FlightAware
01/04/2016 18:53:01 317105 msgs recv’d from dump1090 (1049 in last 5m); 317018 msgs sent to FlightAware
01/04/2016 18:56:53 mlat(23053): Receiver status: connected
01/04/2016 18:56:53 mlat(23053): Server status: synchronized with 39 nearby receivers
01/04/2016 18:56:53 mlat(23053): Receiver: 360.9 msg/s received 6.5kB/s from receiver
01/04/2016 18:56:53 mlat(23053): Server: 0.4 kB/s from server 0.0kB/s TCP to server 0.9kB/s UDP to server
01/04/2016 18:56:53 mlat(23053): Results: 174.3 positions/minute
01/04/2016 18:56:53 mlat(23053): Aircraft: 12 of 17 Mode S, 30 of 55 ADS-B used

Like I said before, never had this kind of problem before updating to Piaware 2.1-5. It occured to me, because I now can see it clearly (small drops) in my graphs of the CPU-load and network traffic.

Regards,

Kabuse

Looks to me like dump1090 is crashing, so that’ll be the culprit.

What did you upgrade from, what did you upgrade to?

I did a manual update from Piaware 2.1-3 with corresponding dump1090 to Piaware 2.1-5 with corresponding dump1090.

I also rebooted after the update.

There was no new release of the FA dump1090 between piaware 2.1-3 and piaware 2.1-5 …

Argh, I thought in the Piaware 2.1-5 thread it is mentioned to also upgrade dump1090, but maybe I am wrong on this.

Anyway, I just ran the manual update for dump1090, when updating to Piaware 2.1-5 and my rage circles were gone, so I just thougth it did update someting. Actually I did not check the version numbers before and after.

Nevertheless, now I have checked my version of dump1090 again and it says 1.2-3. So I guess, this is the latest one?

What could be the possible reasons for crashes of dump1090? Anything, I could check more deeply by myself?

Unfortunately the FA dump1090 is not very good at logging. You could try starting it by hand from a ‘screen’ session with the same args that the init.d script does. Hopefully you’ll get some errors and/or a coredump message when it dies at least :slight_smile:

dump1090 (well, librtlsdr) is not very stable if there are USB disconnects - in fact the FA dump1090 will just exit or crash if it happens - so it’s worth checking dmesg to see if USB has been dropping out too.

Well, I tried ‘dmesg | grep usb’ a few times, I think this what you mean. Please correct me, if I am wrong.
But my interpretation is that it just shows me initialization, nothing special there ever:

pi@piaware ~ $ dmesg | grep usb
0.689773] usbcore: registered new interface driver usbfs
0.695361] usbcore: registered new interface driver hub
0.700791] usbcore: registered new device driver usb
1.066372] usbcore: registered new interface driver smsc95xx
1.585823] dwc_otg 3f980000.usb: DWC OTG Controller
1.592203] dwc_otg 3f980000.usb: new USB bus registered, assigned bus number 1
1.602333] dwc_otg 3f980000.usb: irq 32, io mem 0x00000000
1.619514] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
1.627743] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
1.637744] usb usb1: Product: DWC OTG Controller
1.643876] usb usb1: Manufacturer: Linux 4.1.15-v7+ dwc_otg_hcd
1.651298] usb usb1: SerialNumber: 3f980000.usb
1.669258] usbcore: registered new interface driver usb-storage
1.810997] usbcore: registered new interface driver usbhid
1.818034] usbhid: USB HID core driver
2.051617] usb 1-1: new high-speed USB device number 2 using dwc_otg
2.261913] usb 1-1: New USB device found, idVendor=0424, idProduct=9514
2.270218] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
2.561626] usb 1-1.1: new high-speed USB device number 3 using dwc_otg
2.682099] usb 1-1.1: New USB device found, idVendor=0424, idProduct=ec00
2.690913] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
2.775510] smsc95xx 1-1.1:1.0 eth0: register ‘smsc95xx’ at usb-3f980000.usb-1.1, smsc95xx USB 2.0 Ethernet, b8:27:eb:7e:35:9b
2.871663] usb 1-1.2: new high-speed USB device number 4 using dwc_otg
2.993914] usb 1-1.2: New USB device found, idVendor=0bda, idProduct=2838
3.002643] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
3.013723] usb 1-1.2: Product: RTL2838UHIDIR
3.013739] usb 1-1.2: Manufacturer: Realtek
3.013746] usb 1-1.2: SerialNumber: 00000001
3.131658] usb 1-1.5: new high-speed USB device number 5 using dwc_otg
3.256871] usb 1-1.5: New USB device found, idVendor=1a40, idProduct=0101
3.266025] usb 1-1.5: New USB device strings: Mfr=0, Product=1, SerialNumber=0
3.277756] usb 1-1.5: Product: USB 2.0 Hub
3.581695] usb 1-1.5.2: new high-speed USB device number 6 using dwc_otg
3.693647] usb 1-1.5.2: New USB device found, idVendor=7392, idProduct=7822
3.702787] usb 1-1.5.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
3.713701] usb 1-1.5.2: Product: 802.11n WLAN Adapter
3.720446] usb 1-1.5.2: Manufacturer: Realtek
3.726514] usb 1-1.5.2: SerialNumber: 00e04c000001
3.821633] usb 1-1.5.3: new full-speed USB device number 7 using dwc_otg
3.950605] usb 1-1.5.3: New USB device found, idVendor=10c4, idProduct=ea60
3.959279] usb 1-1.5.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
3.969904] usb 1-1.5.3: Product: ELV USB-WDE1 Wetterdatenempf\xffffffc3\xffffffa4\xffffffa4nger
3.977883] usb 1-1.5.3: Manufacturer: Silicon Labs
3.984341] usb 1-1.5.3: SerialNumber: 4MZTAAMCG5JJREYG
4.045379] usbcore: registered new interface driver usbserial
4.053819] usbcore: registered new interface driver usbserial_generic
4.062557] usbserial: USB Serial support registered for generic
4.076087] usbcore: registered new interface driver cp210x
4.083926] usbserial: USB Serial support registered for cp210x
4.099000] usb 1-1.5.3: cp210x converter now attached to ttyUSB0
4.192679] usbcore: registered new interface driver rtl8192cu
pi@piaware ~ $

Also, I plugged my RTL-SDR directly to the RasPi and also equipped it with some heatsinks, since I still think that temperature is connected to the MLAT-anomalies. :slight_smile: (In the picture, the stick is not connected to the RasPi, just for illustrating the heatsinks).

Yeah, that dmesg output looks normal. If you can try dump1090 in a screen session, that’s probably the best next step.

Apologies for hijacking the thread but is this an example of the USB disconnects? I’ve have 3 drop outs in the last 24 hours and I’m trying to determine the failing component. RPi? SDR dongle? dump1090_muta appears to successfully restart itself after an hour or so.

Snippet from dmesg output…


[234688.340429] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[234861.437311] usb 1-1-port3: disabled by hub (EMI?), re-enabling...
[234861.437357] usb 1-1.3: USB disconnect, device number 74
[234861.674521] usb 1-1.3: new high-speed USB device number 75 using dwc_otg
[234861.786557] usb 1-1.3: New USB device found, idVendor=0bda, idProduct=2838
[234861.786589] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[234861.786606] usb 1-1.3: Product: RTL2838UHIDIR
[234861.786623] usb 1-1.3: Manufacturer: Realtek
[234861.786638] usb 1-1.3: SerialNumber: 02000360
[235703.021502] usb 1-1-port3: disabled by hub (EMI?), re-enabling...
[235703.021548] usb 1-1.3: USB disconnect, device number 75
[235703.257849] usb 1-1.3: new high-speed USB device number 76 using dwc_otg
[235703.369873] usb 1-1.3: New USB device found, idVendor=0bda, idProduct=2838
[235703.369901] usb 1-1.3: New USB device strings: Mfr=1, Product=

Snippet from dump_1090_mutability.log…


0 unique aircraft tracks
0 aircraft tracks where only one message was seen
CPU load: 0.3%
  0 ms for demodulation
  0 ms for reading from USB
  300 ms for network input and background tasks
Wed Jan  6 14:34:10 2016 UTC  Caught SIGTERM, shutting down..

Wed Jan  6 14:34:41 2016 UTC  dump1090-mutability v1.15~dev starting up.
Using sample converter: UC8, integer/table path
Found 1 device(s):
  0:  Realtek, RTL2838UHIDIR, SN: 02000360

Using device 0: Generic RTL2832U OEM
Found 1 device(s):
0: Realtek, RTL2838UHIDIR, SN: 02000360 (currently selected)
Found Rafael Micro R820T tuner
Closest available gain: 44.50 dB
Setting gain to: 44.50 dB
Gain reported by device: 44.50 dB
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
cb transfer status: 5, canceling...
Wed Jan  6 14:36:04 2016 UTC  Warning: lost the connection to the RTLSDR device.
Wed Jan  6 14:36:09 2016 UTC  Trying to reconnect to the RTLSDR device..
Found 1 device(s):
  0:  Realtek, RTL2838UHIDIR, SN: 02000360

Using device 0: Generic RTL2832U OEM
Found 1 device(s):
0: Realtek, RTL2838UHIDIR, SN: 02000360 (currently selected)
Found Rafael Micro R820T tuner
Setting gain to: 44.50 dB
Gain reported by device: 44.50 dB


Statistics: Wed Jan  6 14:34:42 2016 UTC - Wed Jan  6 14:36:42 2016 UTC
Local receiver:
  274071552 samples processed
  0 samples dropped
  0 Mode A/C messages received
  1687523 Mode-S message preambles received


That is a USB problem, yep.

Given that it’s pointing at possible interference:



usb 1-1-port3: disabled by hub (EMI?), re-enabling...


if you have a cable between the Pi and the dongle, I would try replacing the cable. Or it could be a bad contact in one of the USB connections.

Or it could be power problems, which the Pi+dongle can be really sensitive to and which can manifest as all sorts of weird problems…

Thanks again! I suspect the latter as other weird power glitches have been happening on other equipment but I’m hoping for one of the former. I changed the cable and connected to a different port, so we’ll see.