PiAware exposed to the internet

For those of you who have exposed your PiAware to the internet: someone from so-net.ne.jp has been pounding on my site, downloading aircraft.json every few seconds. The requests carry no HTTP Referer header, so they are not coming in through my web pages; they are just pulling the flight-tracking data directly. I’m near CYYZ, so the site is fairly busy. Maybe that’s the attraction. Anyway, I blocked the entire so-net.ne.jp netblock using iptables. Just a heads-up in case you are seeing an abnormally large amount of data uploaded from your site.
The commands are:

sudo iptables -A INPUT -s 118.236.0.0/16 -j DROP
sudo iptables -A INPUT -s 210.128.0.0/13 -j DROP

Case matters. These rules do not persist after a reboot, so you may want to put them somewhere they will run at boot.
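
To see exactly what those two rules cover before adding them, Python’s standard ipaddress module can expand and test the ranges. This is a minimal sketch; the two networks are the ones quoted above, and is_blocked is just an illustrative helper name.

```python
import ipaddress

# The two ranges from the iptables rules above.
BLOCKED = [
    ipaddress.ip_network("118.236.0.0/16"),
    ipaddress.ip_network("210.128.0.0/13"),
]


def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside any of the BLOCKED networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED)


for net in BLOCKED:
    # first address, last address, and how many addresses each rule covers
    print(net, net[0], net[-1], net.num_addresses)
```

is_blocked("210.135.255.255") is true while is_blocked("210.136.0.1") is false: the /13 alone spans 210.128.0.0 to 210.135.255.255, which is 524,288 addresses, so check that a range that wide doesn’t also catch visitors you want.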

Thanks for the PSA. How did you figure out that this was happening in the first place? And is there a command to see which IP addresses are sucking data?

My ISP’s website has a graph of upload/download data by day. The upload jumped way up and stayed there, probably a month or two ago; I only recently had time to look into it. I had a heck of a time figuring out which of my systems was causing it. Fortunately I don’t have a data cap.

There is a Linux command, “sudo iftop”, which is like “top” but for network traffic instead of CPU, so once I’d figured out which computer was the culprit, finding the IP address wasn’t difficult.

I also proxy the PiAware through my web server rather than connecting it directly to the internet, so the Apache/Linux web server gives me a few more logs to look at. PiAware uses lighttpd; I don’t know whether lighttpd has the same logs, so you’ll have to dig into that. I turned on access.log for a while on the front-end web server. That shows you who is going after what, when, and how often.
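
For an Apache front end like the one described above, a short script can answer “who is going after what, and how often” from access.log. A sketch, assuming the log uses Apache’s standard “combined” format (which records the Referer); json_pollers is a hypothetical helper that counts aircraft.json requests with an empty or “-” Referer per client IP.

```python
import re
from collections import Counter

# Apache "combined" LogFormat:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def json_pollers(lines, target="aircraft.json"):
    """Count requests for `target` that arrived with no Referer, per client IP."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines in some other format
        path = m.group("path").split("?", 1)[0]  # drop the query string
        if path.endswith(target) and m.group("referer") in ("", "-"):
            counts[m.group("ip")] += 1
    return counts
```

For example, json_pollers(open("/var/log/apache2/access.log")).most_common(10) lists the ten busiest clients; that path is the Debian default and may differ on your system.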

This does not really surprise me. SkyAware is intended mostly for local use; if you expose it to the internet at large, YMMV.

You can enable the lighttpd access log.
Instead of using a file, I would recommend using the syslog output, which shouldn’t cause any disk-space issues because the system log is normally rotated automatically so it doesn’t fill the disk.

This is what the start of my /etc/lighttpd/lighttpd.conf looks like:

server.modules = (
        "mod_access",
        "mod_alias",
        "mod_accesslog",
        "mod_redirect",
        )

accesslog.use-syslog = "enable"

I added the server module

        "mod_accesslog",

and enabled it to log to syslog:

accesslog.use-syslog = "enable"

Then you can have a live view of what’s happening:

sudo journalctl -u lighttpd -f
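
Once the requests are visible, one way to separate a scraper from a visitor is to look for clients that poll steadily for hours. A browser with the map open also polls aircraft.json frequently, so the interval alone proves little; this sketch also requires the polling to continue for a long time. It assumes you have already extracted (IP, Unix time) pairs for aircraft.json requests from the log; steady_pollers and its thresholds are illustrative starting points, not tested values.

```python
from collections import defaultdict
from statistics import median


def steady_pollers(events, max_gap=10.0, min_span=3 * 3600):
    """Return {ip: median_gap_seconds} for clients that poll steadily for a long time.

    events: iterable of (ip, unix_time) pairs for aircraft.json requests.
    A client is flagged when the median gap between its consecutive requests
    is at most `max_gap` seconds and its requests span at least `min_span` seconds.
    """
    times = defaultdict(list)
    for ip, t in events:
        times[ip].append(t)

    flagged = {}
    for ip, ts in times.items():
        if len(ts) < 2:
            continue  # a single request has no interval
        ts.sort()
        gap = median([b - a for a, b in zip(ts, ts[1:])])
        if gap <= max_gap and ts[-1] - ts[0] >= min_span:
            flagged[ip] = gap
    return flagged
```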

I find it easier to control resource access via my router. That way I don’t have to separately configure every resource on my network.

This is interesting. Just the other day I noticed a jump in traffic to my systems (which run the web pages from the ADS-B Receiver Project scripts). Looking into it, my firewall logs showed connections every minute (or more often) to the URL “/dump1090/data/aircraft.json?_=1572806170934” on both of my systems. They came from IP 47.241.60.245. I blocked this at the router, and of course it stopped.

Then the next day the connections restarted, but this time from 161.117.35.181, doing exactly the same thing: accessing the URL “/dump1090/data/aircraft.json?_=1572806170934”.

Since the connections were addressed to my IP address (not my DNS name), I configured my router to block incoming HTTP traffic where the IP address is used in the URL (e.g. http://56.34.65.45/dump1090/data/aircraft.json?_=1572806170934), and that seems to have stopped the connections.

(Note 56.34.65.45 is not my IP).

I was wondering whether this was something FlightAware had started doing, so it’s interesting to see someone else notice it recently too. So, one to look out for.
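
The router rule above works because an HTTP/1.1 request carries the name the client used in its Host header; a scraper hitting the bare IP sends something like “Host: 56.34.65.45”. How that particular router implements the rule isn’t stated, so this is only a sketch of the same check in Python, for example for a small proxy or log filter; host_is_ip_literal is a hypothetical name.

```python
import ipaddress


def host_is_ip_literal(host_header: str) -> bool:
    """True if an HTTP Host header names a bare IP address (optionally with a port)."""
    host = host_header.strip()
    if host.startswith("["):
        # Bracketed IPv6 literal, e.g. "[2001:db8::1]:8080"
        host = host[1:host.find("]")] if "]" in host else host[1:]
    elif host.count(":") == 1:
        # IPv4 address or DNS name followed by ":port"
        host = host.rsplit(":", 1)[0]
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False  # a DNS name (or garbage), not an IP literal
```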

Both of those IP addresses are registered to Alibaba.com Singapore.

I had similar problems a while ago: someone just pounding on the PiAware box every few seconds. I don’t know whether they were using my domain or my IP address, though; good on you for thinking to check. I’d block one address and they would show up on another. I ended up blocking the entire IP ranges assigned to a couple of Asian countries. I wish there were a way in PiAware to stop this. I am happy for folks to watch my Skyview map for a while, but no one sits in front of a screen uninterrupted for days on end!

Nope, it’s not a FlightAware thing; given the constant cache-busting timestamp, it’s probably someone scraping your data.
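
The “_=1572806170934” in the logged URL looks like a millisecond Unix timestamp, the kind of cache-buster jQuery appends when caching is disabled, which is normally a different number on every request. If it is epoch milliseconds, it decodes to 3 November 2019, 18:36:10.934 UTC, roughly when the URL was generated. The same value arriving from different IPs on different days (as in the posts above) suggests a script replaying one captured URL. A sketch that spots repeated values, assuming the page would otherwise send a fresh value each time; the function names are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def cache_buster(url: str):
    """Return the `_` query value as an int, or None if absent or non-numeric."""
    values = parse_qs(urlsplit(url).query).get("_")
    if not values or not values[0].isdigit():
        return None
    return int(values[0])


def as_utc(ms: int) -> datetime:
    """Interpret a cache-buster as Unix epoch milliseconds (an assumption)."""
    return datetime.fromtimestamp(ms // 1000, tz=timezone.utc) + timedelta(milliseconds=ms % 1000)


def replayed_values(urls, min_repeats=2):
    """Cache-buster values seen at least `min_repeats` times.

    A page that generates its own cache-buster should not send the same
    value twice, so repeats suggest a script replaying a captured URL.
    """
    counts = Counter(v for v in map(cache_buster, urls) if v is not None)
    return {v: n for v, n in counts.items() if n >= min_repeats}
```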

Don’t expose it to the internet?

Haha! Well yes, of course! But I have friends who sometimes like to check it out. And others with PiAware receivers elsewhere who want to compare what they are seeing with what I’m picking up. And I don’t mind as long as it’s reasonable.

Whitelist their IP addresses and keep the rest blocked.

I keep a list of IP address blocks by geographic area. If there is no legitimate reason to communicate with a region, I reject its addresses in my firewall. In my case I block addresses from Africa, Asia, Eastern Europe and South America.

The easiest way to establish your own list is to log and monitor router traffic.
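
Since the scrapers in this thread tend to reappear at a nearby address once one is blocked, one way to turn logged addresses into a compact list is to widen each to its enclosing prefix and merge overlaps. A sketch with Python’s ipaddress module, IPv4 only; aggregate is a hypothetical helper, the prefix length is your choice, and wider prefixes catch more innocent traffic.

```python
import ipaddress


def aggregate(addresses, prefixlen=24):
    """Widen each IPv4 address to its enclosing /prefixlen and merge overlapping or adjacent blocks."""
    nets = (
        ipaddress.ip_network(f"{addr}/{prefixlen}", strict=False)  # strict=False masks off host bits
        for addr in addresses
    )
    return list(ipaddress.collapse_addresses(nets))
```

Each resulting network can go into the same “iptables -A INPUT -s <network> -j DROP” rule used at the top of the thread. With prefixlen=16, 47.241.60.245 and 161.117.35.181 become 47.241.0.0/16 and 161.117.0.0/16.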

What list provider are you using? If you are using one that is…

If you used the Raspbian install (and then added PiAware), you’d have a licensed copy of RealVNC.

Give your friends the password and it’ll be a whole lot more secure than leaving it open.

I did not know about RealVNC, thank you. But the PiAware box is fronted by an Apache web server that proxies allowed requests through to it, so the PiAware box isn’t totally exposed to the internet. Still, you make a good point and offer a potential solution.

I have a pretty decent site located on a hilltop near me in SW UK.
It is solar powered and the internet is a microwave feed from the town below, also solar powered.
Recently I have seen an increase in outgoing bandwidth usage on the graph.
The site is publicly available, and I now see my data (aircraft.json) being scraped.
I added a block/deny to the entire subnet of the offending IP and next day the IP has changed and the scraping starts over, block that subnet and… rinse and repeat!
The scraper requests it every second, so the logs fill quite quickly.

It would be interesting to hear from others whose sites are publicly available just how widespread this data scraping is.

If someone has a ‘smart’ solution to stop this scraping, I would be keen to hear about it.
The only other alternative might be to stop making my site(s) publicly available, as it pees me off that someone is probably making something from this that I haven’t agreed to or signed up for.
I’m seeing similar scraping of my home site (aircraft.json) but from different IP addresses.

There are plenty of people looking to scrape data. It’s rude, but ultimately if you are going to have something publicly accessible it’s down to you to implement controls that prevent people abusing it.

If that site has limited bandwidth available, I would be inclined to not make it publicly accessible directly, but put a reverse proxy in front of it configured to limit bandwidth and connection rate. Alternatively you could add user authentication.
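
The connection-rate idea can be illustrated with a token bucket, a common rate-limiting technique: each client gets a bucket that refills at a fixed rate, and a request is served only if a token is available. In practice you would use the reverse proxy’s own rate-limiting settings; this Python sketch only shows the logic, and the class names and numbers are illustrative.

```python
import time


class TokenBucket:
    """Allow on average `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new client can make a short burst
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last request, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """One token bucket per client IP."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}  # never evicted here; a long-running version should expire idle clients

    def allow(self, client_ip: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            bucket = self.buckets[client_ip] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

With PerClientLimiter(rate=0.5, capacity=3), a client gets three quick requests and then one every two seconds. That would also slow a legitimate map that refreshes more often, so pick the numbers against your own traffic.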

Interesting for me as well. In which log file did you notice that?

I’m seeing something similar on my Virtual Radar Server (VRS). I’ll often look at it and see that an IP has been connected for hours and hours, and when I check the connection client log, they’ve never touched any HTML or JS files; they’ve just been sucking the JSON the whole time.

As @caius says, it’s rude, but realistically there’s not a lot I can do about it. When I spot them, I add them to the firewall blacklist, but of course there could be people sucking the JSON directly from the Pi as well.

It’s my choice to allow these things to be publicly available and I accept that this is going to happen. I have unlimited bandwidth and they don’t actually use much of it so I just live with it.
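
The VRS observation above (hours of JSON requests, never an HTML or JS file) can be turned into a simple filter for any access log. A sketch, assuming you have (IP, path) pairs from the log; json_only_clients, the asset suffixes and the threshold are illustrative guesses. A returning visitor whose browser cached the page assets could also look JSON-only over a short window, so treat the output as a list to check, not to block automatically.

```python
from collections import defaultdict

# Assumption: files a real browser session fetches when it loads the page.
PAGE_ASSETS = (".html", ".js", ".css")


def json_only_clients(requests, min_requests=100):
    """IPs that fetched JSON at least `min_requests` times but never loaded the page itself.

    requests: iterable of (ip, path) pairs taken from an access or connection log.
    """
    json_hits = defaultdict(int)
    loaded_page = set()
    for ip, path in requests:
        path = path.split("?", 1)[0]  # ignore query strings such as cache-busters
        if path.endswith(".json"):
            json_hits[ip] += 1
        elif path.endswith("/") or path.endswith(PAGE_ASSETS):
            loaded_page.add(ip)
    return sorted(ip for ip, n in json_hits.items() if n >= min_requests and ip not in loaded_page)
```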
