Calculation of values "nearby sites"

Does somebody know for which time frame the values in the table “nearby sites” are calculated/counted?

these values:

EDIT: Found it. It seems to be the median of the last seven days.

In my case there is no median. It just takes one of the previous days (usually 3rd or 4th one back) on “reported” and slaps it in there.

That will probably be the median.
Median - Wikipedia

Note that it’s not an arithmetic average, that’s something different.

There is no median, those are exact numbers.

Yeah, the median is an exact number chosen by a certain principle, which is explained in the article I linked.

Again, the median is not an average.

A continuous probability distribution would make sense in this specific case; a discrete one doesn’t.

But hey, if it’s easier on the server…

This is correct.

Not when the underlying measurements are in terms of UTC days.

What you have shown in the screenshot is exactly the median of your last seven days :wink:

IMO the measurements are the numbers on vertical axis, those get “averaged”. The days are just sampling intervals, those don’t get “averaged”.

day 1: 1000
day 2: 1500
day 3: 1200
day 4: 1700
day 5: 1300
day 6: 1100
day 7: 900

These numbers will be sorted by size and then the value which is at position 4 will be reported. That’s how it is explained in the article posted by wiedehopf.

In my example the median would be 1200 because there are three values lower (900/1000/1100) and three values higher (1300/1500/1700)
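
The same example can be checked in a few lines of Python. This is only a sketch of the sort-and-pick-the-middle rule described above, not FlightAware's actual code; the variable names are made up for illustration.

```python
import statistics

# Daily counts from the example above (day 1 .. day 7).
daily_counts = [1000, 1500, 1200, 1700, 1300, 1100, 900]

# Sort by size and take the value at position 4 (index 3 when counting from 0).
ordered = sorted(daily_counts)
middle_value = ordered[len(ordered) // 2]

print(ordered)                          # [900, 1000, 1100, 1200, 1300, 1500, 1700]
print(middle_value)                     # 1200
print(statistics.median(daily_counts))  # 1200, the standard library agrees
print(statistics.mean(daily_counts))    # about 1242.86, a different number
```

Note that the median (1200) is one of the reported daily values, while the mean (about 1243) is not.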

The median is as you illustrated chosen individually for positions and aircraft count.

Both positions and aircraft count, you have 7 values.
Sort those 7 values, choose the middle value.

There you have your two medians.
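
A small sketch of what "chosen individually" means, assuming each column is simply sorted on its own. The aircraft counts are the ones from the example earlier in the thread; the position counts are invented for illustration and are not real FlightAware data.

```python
import statistics

# Hypothetical last seven days as (aircraft, positions) pairs.
# The position counts are made up for this example.
last_seven_days = [
    (1000, 210_000),
    (1500, 180_000),
    (1200, 250_000),
    (1700, 230_000),
    (1300, 190_000),
    (1100, 260_000),
    (900, 220_000),
]

aircraft = [a for a, _ in last_seven_days]
positions = [p for _, p in last_seven_days]

# Each column is reduced to its own median, independently of the other.
median_aircraft = statistics.median(aircraft)
median_positions = statistics.median(positions)

print(median_aircraft, median_positions)  # 1200 220000
```

With these numbers the aircraft median comes from day 3 and the positions median from day 7, so the two values shown side by side need not belong to the same day.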

No.
Both the median and the average (better known as mean) are measures of location of data.
The median is the value such that half the values are smaller and half are larger. For FA this is calculated as the median over the last 7 days (as pointed out by @foxhunter). For an odd number of values the median is exactly the middle value.

The mean or average is a different measure of location.

I understand that, how it is calculated.
What I am saying is that it’s irrelevant (or badly applied) here, where the granularity of the input data is much higher than what the median is used for in statistics. We are talking about thousands of flights per day; those are better represented by a normal average. We have a data set in the 1000-2000 range, which can be averaged and rounded up or down to an integer number of airplanes.
The median is used only when something like that (an arithmetic mean) makes no physical sense, as with small sets of integer data, maybe under 10. Like throwing a die.

That’s not what the median is for. The median is used when you don’t want the data to be skewed by outliers so much - for example, say someone turns off their receiver for a few days or their antenna falls over or something. Using the mean would result in those two unrepresentative days pulling the average down. Using the median is more representative of the normal situation.

There is no “calculation” in the overview of “nearby sites”. It’s simply a value taken from the list of the last seven days. The example on Wikipedia shows the weakness of the median: if you change the highest value from 40 to 400, the median does not change, because that value is still the highest.

So worst case example:

1, 2, 3, 4, 5, 6, 1000
will still give the median of 4
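
For anyone who wants to try this, here is a quick check using Python's standard `statistics` module. It only illustrates the general behaviour of median versus mean; it says nothing about how the site computes its numbers.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 1000]

print(statistics.median(values))  # 4
print(statistics.mean(values))    # about 145.86, dragged up by the single 1000

# Making the largest value even larger does not move the median at all.
values_bigger = [1, 2, 3, 4, 5, 6, 100_000]
print(statistics.median(values_bigger))  # 4
```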

However, it can still show you a trend, because the next day the set of seven values being counted has changed.

I can only speculate to what FA’s thinking was, but I can give you my opinion and considerations

  • What is a reasonable amount of data to include in the statistic? Here it is a week. It could be a month, a year, or some other amount. Given the currency of the data and its fluctuation, it seems quite reasonable to limit the number of data points included. One week seems right, if admittedly somewhat arbitrary (e.g. why not 2 weeks?).

  • With that amount of data, what is the proper way of summarizing it? There are only 7 data points included. Then we need a measure that is properly robust to small sample size AND will be interpretable. The mean lacks this robustness for small sample sizes (but might have worked well for a month of data, for example).

  • Another consideration is quality of data. While it is tempting to think of the counts of aircraft and positions as perfect, they are unlikely to be that (e.g. TIS-B overcounting, weather influence, etc.). Some protection against interpretation of bad data is helpful. The median will be more robust than the mean (this would likely be true even with a month's worth of data).

Arguably, none of these data are really continuous, or fitting into some statistical distribution. Now, that is okay as we are not trying to perform inference on the distribution itself. In that regard, the mean has no real issues.

Fun fact: if you are a new member and have reported an even number of days, fewer than 7, it takes the average of the two middle values.
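
That is the usual textbook rule for an even number of values, and it is also what Python's `statistics.median` does. A sketch with four made-up daily counts (whether the site follows exactly this rule is the poster's observation, not something verified here):

```python
import statistics

# Hypothetical counts for a site that has reported only four days so far.
four_days = [1000, 1500, 1200, 1700]

ordered = sorted(four_days)          # [1000, 1200, 1500, 1700]
mid = len(ordered) // 2

# With an even count there is no single middle value,
# so the two middle values are averaged.
by_hand = (ordered[mid - 1] + ordered[mid]) / 2

print(by_hand)                       # 1350.0
print(statistics.median(four_days))  # 1350.0
```

In this case the result (1350) is not one of the reported daily values, unlike the seven-day case.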

Thinking as a systems and data administrator in another life (now happily retired), using a median is much simpler and better for representing how a site works. Little math, quick results. Much less thrashing in the servers, and consistent for all users. It just makes sense.