I’ve decided to spend a bit of time trying to see if I can add native support for the AirSpy SDRs to dump1090-fa. I have the plumbing done and I can actually decode samples, although not well enough for real use. Right now I have two stumbling blocks that I could use some help with.
First, some background…
The AirSpy R2 can:
Sample at 2.5, 10, 12, 20 and 24 Msamples/sec.
Output Float32-IQ, Float32-Real, Int16-IQ, Int16-Real, UInt16-Real and “raw” formats.
However, the 20 and 24 Msamples/sec rates can only be used with the non-IQ output formats.
dump1090-fa:
Expects a fixed sample rate of 2.4 Msamples/sec.
Expects Int16-IQ or Int8-IQ data formats. (We’ll leave the Q11 variant out of the mix for now.)
The first problem is that even when the AirSpy is set to a rate that supports IQ output, dump1090’s 2.4 Msamples/sec isn’t one of the rates the AirSpy supports at all. It’s also kind of pointless to run the AirSpy at such a low sample rate. By setting the AirSpy’s sample rate to 12 and the format to Int16-IQ, I was able to feed the converter and demodulator every fifth sample (12 / 2.4 = 5), but the results are disappointing, to say the least. So demodulators that can operate at the higher rates would be needed.
The other problem is that if we want to use the 20 and 24 rates, we can’t use an IQ format. That means we’d need converters for at least the Int16-Real format.
My math/signal processing skills aren’t up to the task, but I’m trying to get some help.
I’ve worked through the same math (well, not as completely as above) in the past and came away just as confused. I figured it was my own misunderstanding of how it works in general, but now that someone else has brought it up… it would be cool to get it straight.
Sampling at 12 MSPS complex and decimating by 5 should work OK (I’ve done something similar at 19.2 real → 9.6 complex → 2.4 on similar hardware recently), but you do need to decimate properly: feed the input through a low-pass filter with a cutoff below the Nyquist frequency of the new sample rate, before you take every Nth output. I used a 39-tap halfband FIR LPF for each decimate-by-2 step and that worked OK.
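A minimal C sketch of the 12 → 2.4 MSPS case (decimate by 5), assuming interleaved float I,Q input. The n-tap moving average is only a stand-in for a properly designed low-pass filter: at 12 MSPS with n = 5 it has nulls at multiples of 2.4 MHz, exactly the frequencies that would fold onto DC, but it is weak everywhere else and droops in the passband. Because only every nth filter output is kept, "filter, then keep every nth output" collapses to averaging each block of n input samples. `decimate_boxcar_iq` is a hypothetical name, not something in dump1090-fa.

```c
#include <assert.h>
#include <stddef.h>

/* Decimate interleaved float IQ (I,Q,I,Q,...) by n, using an n-tap moving
 * average as a crude anti-alias low-pass. Output j is the average of input
 * samples j*n .. j*n+n-1, i.e. the moving-average output at index j*n+n-1,
 * so filtering and keeping every nth output happen in one pass.
 * Returns the number of complex output samples written. */
size_t decimate_boxcar_iq(const float *in, size_t count, unsigned n, float *out)
{
    assert(n > 0);
    size_t written = 0;
    for (size_t start = 0; start + n <= count; start += n) {
        float acc_i = 0.0f, acc_q = 0.0f;
        for (unsigned k = 0; k < n; k++) {
            acc_i += in[2 * (start + k)];      /* I */
            acc_q += in[2 * (start + k) + 1];  /* Q */
        }
        out[2 * written] = acc_i / (float)n;
        out[2 * written + 1] = acc_q / (float)n;
        written++;
    }
    return written;
}
```

For example, if I repeats the zero-mean pattern 2, -1, -1, 1, -1 (period 5 samples), naive keep-1-of-5 turns it into a constant 2, an alias sitting right on DC, while the averaged version outputs 0.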
dump1090 accepts IQ input mostly because that’s the common baseband format; what it really cares about (for now at least - things are changing) is the magnitude of the carrier, so it needs an envelope detector. Converting to IQ happens to be the first half of a good envelope detector - you just take the magnitude of the IQ signal.
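The "take the magnitude" half of that envelope detector is small. A sketch, assuming interleaved signed 16-bit I,Q (the Int16-IQ format above) and 16-bit magnitude output, using an integer square root so no libm is needed. This is not dump1090-fa’s own converter, which has its own scaling and optimisations; `iq_magnitude` and `isqrt32` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* floor(sqrt(v)), classic bit-by-bit integer square root. */
static uint32_t isqrt32(uint32_t v)
{
    uint32_t res = 0;
    uint32_t bit = 1u << 30;          /* highest power of four that fits */
    while (bit > v)
        bit >>= 2;
    while (bit != 0) {
        if (v >= res + bit) {
            v -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}

/* Magnitude sqrt(I^2 + Q^2) of `count` interleaved Int16 IQ samples. */
void iq_magnitude(const int16_t *iq, size_t count, uint16_t *mag)
{
    assert(iq != NULL && mag != NULL);
    for (size_t n = 0; n < count; n++) {
        int32_t i = iq[2 * n];
        int32_t q = iq[2 * n + 1];
        /* each square is at most 2^30; their sum (at most 2^31) fits in uint32 */
        uint32_t power = (uint32_t)(i * i) + (uint32_t)(q * q);
        mag[n] = (uint16_t)isqrt32(power);   /* at most 46340, fits in 16 bits */
    }
}
```

For example, I = 3, Q = 4 gives 5, and full-scale I = Q = -32768 gives 46340.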
The way the airspy hardware works is that it feeds a low-IF signal into a single-channel ADC and does the conversion to IQ baseband purely in software.
So you could just implement IQ conversion for the higher rates. Or it may already be there: I haven’t checked what the airspy host lib conventions are exactly, but it’s common to do a decimate-by-2 after IQ conversion, so “12MSPS complex” may actually be running the hardware with exactly the same settings and reading exactly the same data as “24MSPS real”. (I’d certainly be surprised if there were some hardware sample rates where the host lib wouldn’t do IQ conversion; it’s the same task regardless of sample rate.)
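One common way to turn real samples at rate fs into IQ is to shift the signal down by fs/4, then low-pass and decimate by 2. At fs/4 the complex oscillator e^(-jπn/2) is just the sequence 1, -j, -1, +j, so the mix needs no multiplies. This is a sketch of that general technique, not libairspy’s code, and its 2-tap average is a deliberately crude stand-in for the halfband filter you would really use. `real_to_iq_fs4` is a hypothetical name; input is Int16-Real, output is interleaved float I,Q at half the input rate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Real Int16 samples at rate fs -> complex float I,Q at rate fs/2.
 * Step 1: multiply by e^(-j*pi*n/2) = 1, -j, -1, +j, ... (shift down by fs/4).
 * Step 2: average each pair of mixed samples (crude low-pass with its null at
 *         fs/2) and keep one output per pair (decimate by 2).
 * Returns the number of complex outputs written. */
size_t real_to_iq_fs4(const int16_t *in, size_t count, float *out_iq)
{
    assert(count % 2 == 0);   /* samples are consumed in pairs */
    size_t written = 0;
    for (size_t n = 0; n < count; n += 2) {
        /* n is even: sample n is multiplied by +1 (n%4==0) or -1 (n%4==2),
         * sample n+1 by -j (n%4==0) or +j (n%4==2). */
        float sign = (n % 4 == 0) ? 1.0f : -1.0f;
        float i_part = sign * (float)in[n];        /* real * (+/-1): I only */
        float q_part = -sign * (float)in[n + 1];   /* real * (-/+j): Q only */
        /* One mixed sample is pure I and the other pure Q, so the
         * 2-sample average is simply half of each. */
        out_iq[2 * written] = 0.5f * i_part;
        out_iq[2 * written + 1] = 0.5f * q_part;
        written++;
    }
    return written;
}
```

A real tone at exactly fs/4 (1000, 0, -1000, 0, …) comes out as a constant (500, 0): the +fs/4 component lands on DC and the -fs/4 image lands on fs/2, where the 2-tap average has its null.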
But there are other ways of extracting the envelope from a real signal. Notably, you can rectify it (take the absolute value) and then feed it through a low-pass filter with a cutoff below the carrier frequency but above the modulation rate, and that will approximate the envelope of the carrier. I’ve tried this too and it also works acceptably.
dump1090-fa is likely to gain a higher-rate demodulator and a bunch of DSP helpers to do FIR filters and the like some time Soon™, but that obviously doesn’t help you right now. I’ll take a look this week and work out if there’s anything immediately releasable that might help you.
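A sketch of that for Int16-Real samples: full-wave rectify (take |x|), then low-pass with a moving average. The rectifier matters: a low-pass below the carrier applied to the raw IF signal would remove the carrier and everything riding on it. `real_envelope` and the window length are assumptions; the window should cover a carrier cycle or two but stay well short of a 0.5 µs ADS-B pulse (12 samples at 24 MSPS), and a proper FIR would do better than a plain moving average.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Envelope of a real (low-IF) signal: |x[n]| averaged over the last `win`
 * samples. The first win-1 outputs ramp up because the window is not yet full. */
void real_envelope(const int16_t *in, size_t count, unsigned win, float *env)
{
    assert(win > 0);
    uint32_t acc = 0;                                   /* running sum of |x| */
    for (size_t n = 0; n < count; n++) {
        int32_t v = in[n];
        acc += (uint32_t)(v < 0 ? -v : v);              /* add newest |x[n]| */
        if (n >= win) {
            int32_t old = in[n - win];
            acc -= (uint32_t)(old < 0 ? -old : old);    /* drop oldest */
        }
        env[n] = (float)acc / (float)win;
    }
}
```

With a 4-sample window, a carrier at fs/4 with amplitude 1000 gives an envelope of 500 and amplitude 3000 gives 1500: the output tracks the amplitude linearly, which is what the demodulator needs.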
The IQ output works for all the native sample rates in both 16bit and float formats.
As a general remark, decimating to 2.4 MSPS IQ will only reduce the probability of good decodes (reduced timing resolution). Also, a proper decimation that preserves the dynamic range coming out of the radio requires good FIR filters, which makes everything harder on cheap or older-generation SBCs. So you really want to demodulate, filter and decode at the input sample rate, without losing any data or resorting to decimation. Start with a simple wideband AM demod and meditate on it.
There are many aspects to consider. First, multi-threading is only useful if you can’t get the job done with one core. As long as your heaviest processing uses less than one core’s worth of CPU, adding more threads to the job (if feasible at all) only adds overhead without bringing any benefit. You really don’t want that on a resource-restricted SBC. The second aspect is the sequential nature of the decoding: your current decodes depend on previous hits (whitelisting). Once you factor that in, there is very little left to parallelize, and the overhead largely exceeds any savings. So, you might say, why not use a beefier x86 machine and enable all the good processing on a single core? Same answer, plus the fact that the challenge works the other way around: it’s relatively easy to get bloated code or stupid settings to saturate all your cores without doing anything useful. Doing it efficiently is a bit more difficult. Ah, and good math matters, of course.
The other sample types work OK and IQ works at 12 and below. If you think IQ should work with all rates then I can try and track it down in libairspy or the firmware.
I was thinking along the same lines, but in my case using the CUDA cores on the nvidia Tegra SoC in my Jetsons. @prog’s right, though: it won’t help much unless you’re CPU-bound on one core. Besides, dump1090-fa is already multi-threaded: acquisition and conversion of data is done on one thread, and demodulation/decoding on another.
When working with you on airspy_adsb, I was considering adding more pipelining than already existed.
That’s the alternative to parallel multithreading: multiple producer and consumer threads that form a pipeline.
But saturating one core on the SBCs is really all you want to do.
That usually leaves you at a reasonable system load.
airspy_adsb got to the point of not gaining much, even on an RPi3, from increasing the processing (trying to decode even when the preamble didn’t look very promising).
And people who wanted every last bit of decoder performance were already running it on an RPi4, so I didn’t push further in the pipeline direction.
Decreasing the preamble filter threshold further than is possible now on an RPi4 yields almost no extra decoded messages; we’re probably talking about doubling the compute power to get maybe half a percent more messages.
While this is true, it doesn’t really help if what you’re trying to do is add airspy support to the existing 2.4 MSPS-only demodulator. If the airspy won’t do arbitrary sample rates, then the only real option is to decimate from something it does support.
DSP is a… big topic. You probably want a book. I found Understanding Digital Signal Processing (Richard Lyons, ISBN 978-0-13-702741-5) to be a pretty good all-round reference. You can find a ton of info online, but it can be a bit incoherent…
Here’s the really, really short version:
You can only represent up to a certain frequency at a given sample rate. e.g. using complex samples, at 1MSPS you can only represent 1MHz of bandwidth, frequencies between +/- 0.5MHz. Signals outside that range will get aliased so they appear at lower frequencies (this is basically the same thing as the “wagon-wheel” optical illusion). You don’t want that in this case. So you need to filter out those higher frequencies before you reduce the sample rate.
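Where an out-of-range signal ends up can be worked out with plain integer arithmetic. A sketch for complex sampling, where anything outside ±fs/2 folds into [-fs/2, +fs/2); frequencies are in Hz, `fs_hz` is assumed even, and `alias_hz` is a hypothetical helper.

```c
#include <assert.h>

/* Apparent frequency (Hz) of a tone at f_hz after complex sampling at fs_hz.
 * The result is folded into [-fs/2, +fs/2). */
long long alias_hz(long long f_hz, long long fs_hz)
{
    assert(fs_hz > 0 && fs_hz % 2 == 0);
    long long half = fs_hz / 2;
    long long r = (f_hz + half) % fs_hz;   /* C's % keeps the sign of the dividend, */
    if (r < 0)
        r += fs_hz;                        /* so fold negative results into [0, fs) */
    return r - half;
}
```

For example, alias_hz(700000, 1000000) is -300000: at 1 MSPS a 0.7 MHz tone shows up at -0.3 MHz. Likewise alias_hz(3000000, 2400000) is 600000, so dropping from 12 MSPS to 2.4 MSPS without filtering puts anything 3 MHz off-centre at +0.6 MHz, right on top of whatever is really there.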
A FIR is a type of digital filter. You implement one with a set of constants (“taps”) that you multiply a window of the input samples by to produce each output sample (i.e. a convolution): for N taps h[0] … h[N−1], y[n] = h[0]·x[n] + h[1]·x[n−1] + … + h[N−1]·x[n−N+1].
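In code, the direct form is just a multiply-accumulate over the taps for each output. A minimal float sketch, written for clarity rather than speed; `fir_filter` is a hypothetical name, and samples before x[0] are treated as zero.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form FIR: y[n] = sum over k of h[k] * x[n-k], for k = 0 .. ntaps-1.
 * Samples before x[0] are treated as zero. Cost: count * ntaps multiplies. */
void fir_filter(const float *h, size_t ntaps, const float *x, size_t count, float *y)
{
    assert(ntaps > 0);
    for (size_t n = 0; n < count; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < ntaps && k <= n; k++)
            acc += h[k] * x[n - k];
        y[n] = acc;
    }
}
```

With taps {0.25, 0.5, 0.25} and input 1, 2, 3, 4, 5, 6 the output is 0.25, 1, 2, 3, 4, 5: a smoothed, one-sample-delayed copy of the ramp.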
The choice of taps controls what sort of filter it is; a halfband filter is just a special class of low-pass filter that happens to have some nice properties for decimating by 2. For general-purpose filters you will want to use a filter design tool to work out suitable values. Octave/Matlab and gnuradio have libraries for this. Search for “Parks-McClellan”/“Remez exchange” or “FIR window design method”. Selecting the right taps is the hard part; actually implementing the FIR is very simple. (Implementing it fast is another question.)
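As an illustration of the window design method (not a substitute for a proper design tool), here is a sketch that designs an odd-length halfband low-pass with its cutoff at fs/4. The ideal taps are sin(πk/2)/(πk) at offset k from the centre, which is exactly zero at every even k except the centre; that is what makes halfband filters cheap for decimate-by-2, since nearly half the multiplies can be skipped. A triangular (Bartlett) window is used only so the code needs no libm; a Hamming, Blackman or Kaiser window, or a Parks-McClellan design, will give a much better stopband. `design_halfband` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define HB_PI 3.14159265358979323846

/* Window-method halfband low-pass (cutoff fs/4), ntaps odd.
 * Ideal response at offset k from the centre: 0.5 for k == 0, otherwise
 * sin(pi*k/2) / (pi*k). sin(pi*k/2) is exactly 0, 1, 0, -1 for |k| mod 4 = 0..3,
 * so no trig is needed and even offsets come out exactly zero.
 * Window: triangular (Bartlett). Taps are normalised to unity gain at DC. */
void design_halfband(float *taps, size_t ntaps)
{
    assert(ntaps % 2 == 1);
    long c = (long)(ntaps - 1) / 2;                /* index of the centre tap */
    double sum = 0.0;
    for (size_t i = 0; i < ntaps; i++) {
        long a = (long)i - c;
        if (a < 0)
            a = -a;                                /* response is symmetric */
        double ideal;
        if (a == 0)
            ideal = 0.5;
        else if (a % 2 == 0)
            ideal = 0.0;                           /* the halfband zeros */
        else
            ideal = ((a % 4 == 1) ? 1.0 : -1.0) / (HB_PI * (double)a);
        double w = 1.0 - (double)a / (double)(c + 1);
        taps[i] = (float)(ideal * w);
        sum += taps[i];
    }
    for (size_t i = 0; i < ntaps; i++)
        taps[i] = (float)(taps[i] / sum);
}
```

With 39 taps (the length mentioned earlier in the thread), 18 of them come out exactly zero and the rest are symmetric about the centre tap, which lands near 0.5.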
Note that for the special case of decimation, you don’t need to compute samples you’re just going to throw away…
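A sketch of that for a decimate-by-m FIR: the outer loop steps m input samples at a time and only computes the outputs that are kept, so the work drops by a factor of m compared with filtering everything and then discarding. A polyphase structure is the more general form of the same idea. `fir_decimate` is a hypothetical name; samples before x[0] are treated as zero.

```c
#include <assert.h>
#include <stddef.h>

/* FIR filter + decimate by m, computing only the kept outputs:
 * out[j] = y[j*m], where y[n] = sum over k of h[k] * x[n-k].
 * Cost: about (count/m) * ntaps multiplies instead of count * ntaps.
 * Returns the number of outputs written. */
size_t fir_decimate(const float *h, size_t ntaps, const float *x, size_t count,
                    unsigned m, float *out)
{
    assert(m > 0);
    size_t j = 0;
    for (size_t n = 0; n < count; n += m) {
        float acc = 0.0f;
        for (size_t k = 0; k < ntaps && k <= n; k++)
            acc += h[k] * x[n - k];
        out[j++] = acc;
    }
    return j;
}
```

With taps {0.25, 0.5, 0.25}, input 1 … 6 and m = 2, this returns 0.25, 2 and 4, the same values a full FIR followed by keeping every 2nd output would give, for half the multiplies.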
May not be worth going that far with it. The low-hanging fruit is to use SIMD instructions. I got something like a 5x speedup on a FIR implementation just by using NEON on a Pi 4; I’d guess you’d get something similar with SSE/AVX et al. Sometimes the compiler can auto-vectorize things, but compilers aren’t great at that; usually you’d need a hand-rolled implementation.
@obj Thanks for the additional pointers and reference!
I’m not seriously pursuing the additional core stuff. It was more of a musing.
@prog I think the issue with IQ at the higher sample rates is that each IQ sample is really two values (I and Q), which doubles the stream rate. No way USB2 can handle that; IQ at 12 already streams at 283 Mb/s. I think we’re going to need a converter for one of the “real” formats.
I was seeing the effects of USB saturation because I had another SDR running at the same time to compare with. When I stopped the other dump1090 instance, the performance at 12/IQ with the simple keep 1 of 5 samples was much improved.