FlightAware Discussions

Working on native AirSpy support for dump1090-fa

In what respect?

It will be also interesting to compare with 20MHz and 24MHz…

You have a couple of conditional branches deep in the FIR code that won’t be helping there.

edit: you may also want to mess with the build flags to try to convince gcc to auto-vectorize; try (assuming a Pi 4) -mfpu=neon-vfpv4 and/or -march=armv8-a+simd and/or -funsafe-math-optimizations and/or -mtune=cortex-a72. ymmv, options not actually tested…

Would be nice if those options were passed based on the actual platform. x86 would probably make use of SSE, SSE2, SSE3, SSSE3, SSE4.1 & 4.2…?

That is in progress.

How would anyone sell a product without proper engineering and stay in business?

Just don’t confuse open source and education.

You shouldn’t need any help with this: https://www.silabs.com/documents/public/application-notes/AN619.pdf

That’s how most time is spent in tech.

Avoid:

  • Branching inside your processing loops
  • floats and doubles
  • Divisions
  • Multiplications (if possible)
  • Calculating a dot product for the samples that will be dismissed (decimated)
  • Harsh quantization to 8-bit or you will end up with 7-bit.
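
A minimal sketch of how several of these points can combine, assuming 12-bit samples held in int16_t and small integer taps so the 32-bit accumulator cannot overflow. The function name and parameters are hypothetical, not taken from dump1090 or airspy_adsb. The window steps by the decimation factor, so dot products for discarded outputs are never computed, and the inner loop is integer-only with no conditional work per tap. It still multiplies; a shift-and-add variant is only possible for suitably chosen taps.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: filter and decimate in a single pass.
 *   in, n_in      input samples (assumed 12-bit range)
 *   taps, n_taps  integer coefficients, summing to roughly 1 << shift
 *   shift         right shift that normalises the result
 *   decim         keep one output per `decim` input samples
 * Returns the number of outputs written to `out`.
 * Note: >> on a negative value is implementation-defined in C
 * (gcc performs an arithmetic shift). */
size_t fir_decimate_i16(const int16_t *in, size_t n_in,
                        const int16_t *taps, size_t n_taps,
                        unsigned shift, size_t decim,
                        int16_t *out)
{
    size_t n_out = 0;
    if (n_in < n_taps || decim == 0)
        return 0;
    /* Step the window by `decim`: dismissed outputs are never computed. */
    for (size_t start = 0; start + n_taps <= n_in; start += decim) {
        int32_t acc = 0;
        /* Integer-only inner loop with no per-tap conditionals. */
        for (size_t k = 0; k < n_taps; k++)
            acc += (int32_t)in[start + k] * taps[k];
        out[n_out++] = (int16_t)(acc >> shift);
    }
    return n_out;
}
```

With taps {1, 2, 1}, shift 2 and decim 2, seven constant input samples of value 4 produce three outputs of value 4.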

Also, you would get more insight about where you are going if you can include the FPS in your results.

Have fun!

@SoNic67 airspy_adsb is running about 15% cpu, dump1090+airspy is running about 40%, and dump1090+NESDR is running about 4%.

@obj The code isn’t even remotely optimized yet. I needed to make it functional before performant. I haven’t even tested it on an ARM architecture platform yet. In any case, I’ve got quite a bit of optimization experience so I don’t think making it performant is going to be an issue. It was actually quite a bit better before I reorganized the code to make it easier to follow and look better. :slight_smile:

@prog Are there any hidden options in airspy_adsb that could produce stats similar to dump1090’s “Local receiver” stats?

Forgot to mention about the FIRs. Right now I’m using a 19 tap profile but my “average of 5” instead of “1 in 5” decimator without a FIR worked almost as well as the FIR + “1 in 5” decimator.

No. There is no point in reducing the decoding performance since the whole program is not CPU bound.

Well, my 10 core/20 thread Core i7 desktop couldn’t care in the least but on an SBC it means more power and heat and fewer cycles available for other stuff.

I am also talking about SBCs. FYI. I am only interested in the technical challenge of getting the best decoder for Airspy radios - which is done. If you want to improve things further you will likely need to shift the entire problem to another domain with different tools, constraints and probably a different mindset as well.
My algo could eventually fit in a small specialized ADSB dongle, but for now I don’t see it being worth the effort and there are many more sophisticated radios on my bench right now, plus all the fancy SDR# DSP plugins that won’t code themselves.

Assuming you mean you sum 5 consecutive samples, divide by 5, output that one value, then move on to the next 5 samples; you can actually just analyze that as a FIR with an impulse response that is [0.2, 0.2, 0.2, 0.2, 0.2]. It’s just a (not great, but better than nothing) LPF:

>> freqz([0.2, 0.2, 0.2, 0.2, 0.2], 1.0, 512, 12e6)
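
The equivalence can also be checked numerically. The sketch below is illustrative only: both function names are made up for this example, and it uses integer division rather than 0.2 coefficients to keep the comparison exact. The first function is the "sum 5, divide by 5, move on" averager; the second runs a 5-tap equal-weight FIR at the full input rate and then keeps 1 output in 5.

```c
#include <stddef.h>
#include <stdint.h>

/* "Average of 5": sum each block of 5 samples and divide by 5. */
size_t average5(const int32_t *in, size_t n_in, int32_t *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i + 5 <= n_in; i += 5)
        out[n_out++] = (in[i] + in[i + 1] + in[i + 2] + in[i + 3] + in[i + 4]) / 5;
    return n_out;
}

/* The same result written as a 5-tap FIR with equal taps (1/5 each),
 * evaluated at every input position, followed by "keep 1 in 5". */
size_t boxcar_then_decimate(const int32_t *in, size_t n_in, int32_t *out)
{
    size_t n_out = 0;
    for (size_t n = 4; n < n_in; n++) {   /* full-rate FIR output */
        int32_t y = 0;
        for (size_t k = 0; k < 5; k++)
            y += in[n - k];               /* taps are all 1 ... */
        y /= 5;                           /* ... scaled by 1/5 */
        if ((n + 1) % 5 == 0)             /* decimator keeps every 5th output */
            out[n_out++] = y;
    }
    return n_out;
}
```

For the inputs 1 through 12, both functions return the two values 3 and 8. The second form does five times the work for the same output, which is why the block-at-a-time version is the one worth running.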

Well, it’s kinda done. I’m only using the default preamble filter, whitelist and timeout settings with airspy_adsb to make the comparison more even. I realize those have nothing to do with the sampling, filtering and decimation but I’d like to get the support to the point where I’d use it on my production feeder.

Yeah I was thinking about what could be done on even the existing dongle but having to produce custom firmware builds would be a good amount of overhead.

Yeah, it’s 5 samples at a time, then move on to the next 5. I can easily try it as a 5 tap FIR with equal coefficients. I’ve also been playing with tap counts that are powers of 2 so I can just mask the tap index with tapcount-1 (i.e. 2^n-1) and let it wrap rather than having to test and reset it.
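
A rough sketch of that masking trick, assuming a 16-entry sample history. The type and function names are hypothetical and not from the dump1090 source. Because the size is a power of 2, ANDing the index with size-1 wraps it without a compare-and-reset branch.

```c
#include <stdint.h>

#define NTAPS    16u            /* must be a power of 2 */
#define TAP_MASK (NTAPS - 1u)   /* 0x0F */

typedef struct {
    int32_t  hist[NTAPS];   /* last NTAPS samples */
    unsigned head;          /* index of the next slot to overwrite */
} ring_t;

/* Store one sample; the AND wraps 15 -> 0 with no test and reset. */
static inline void ring_push(ring_t *r, int32_t sample)
{
    r->hist[r->head] = sample;
    r->head = (r->head + 1u) & TAP_MASK;
}

/* Fetch the sample from `age` pushes ago (0 = most recent).
 * Unsigned arithmetic wraps, so the mask also handles head == 0. */
static inline int32_t ring_get(const ring_t *r, unsigned age)
{
    return r->hist[(r->head - 1u - age) & TAP_MASK];
}
```

After pushing the values 0 through 19, the head index sits at 4, the newest sample reads back as 19 and the oldest retained sample (age 15) is 4.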

Try this multiplication-less, better than boxcar, 5 taps filter: 2 8 10 8 2
Considering the signal is in a buffer of integers:
output = ( (buffer[0] << 1) + (buffer[1] << 3) + (buffer[2] << 3) + (buffer[2] << 1) + (buffer[3] << 3) + (buffer[4] << 1) ) >> 5;

Check if this helps.
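
The same expression wrapped in a function, as a sketch. The name fir5_shift is made up here, and the samples are assumed to be non-negative integers (left-shifting a negative signed value is undefined behaviour in C). The taps sum to 30 while the final shift divides by 32, so the DC gain works out to 30/32 rather than exactly 1.

```c
#include <stdint.h>

/* Taps 2 8 10 8 2 using only shifts and adds; 10*x is formed as 8*x + 2*x.
 * Assumes b[0..4] are non-negative samples.
 * Tap sum is 30, the shift divides by 32, so DC gain is 30/32. */
static inline int32_t fir5_shift(const int32_t *b)
{
    return ((b[0] << 1) + (b[1] << 3) + (b[2] << 3) + (b[2] << 1)
          + (b[3] << 3) + (b[4] << 1)) >> 5;
}
```

Feeding five samples of value 32 gives 30, and a single 32 in the centre position gives 10, which matches the tap values.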

Nevermind. 8+2=10…

If you notice, the 3rd coefficient is 10, which isn’t a power of 2, so the 3rd sample needs to be shifted twice.

Cross Post :slight_smile:

I edited while you were replying :slight_smile:
Don’t know why I saw 16 there.

Took a few looks for me to see it. :slight_smile:

Anyway, the shift method results in slightly fewer messages than the equal 0.2 taps with multiplication.

EDIT: But the 0.2 with multiplication FIR is better than the 19 tap filter or my “average 5” method.
EDIT2: …and it’s about half the CPU utilization of the 19 tap filter.

I’m done for the day. The latest stuff is pushed up to