In a real-world database system, suppose query Query1 obtains the median med_a of a set Aset of values, and query Query2 obtains the median med_b of a subset Bset of Aset. If med_a < med_b, what can be inferred about Aset, Bset, and the elements of Aset not in Bset? Can you think of an example where this may be revealing or of possible interest from a privacy standpoint (e.g. what could Aset, Bset, and med_a or med_b be)?
Expert Answer
Let us take a simple example which will help us understand the solution better.
Aset = {1,2,3,4,5,6,7,8,9}. med_a will be 5.
Now, think of Bset = {4,5,6,7,8}. med_b will be 6.
Aset-Bset (i.e. the elements which are in Aset, but not in Bset) = {1,2,3,9}. The median of these remaining elements is (2+3)/2 = 2.5.
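So, from the question, if med_b is greater than med_a, we can infer that at least half of the elements of Bset are larger than med_a, i.e. they come from the upper half of Aset. In simpler words, Bset has been selected with relatively higher values. It also tells us something about the elements of Aset that are not in Bset: they lean toward the lower values. In the example, their median is 2.5, which is below med_a = 5.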
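The numbers in the example can be checked with a short Python sketch. It uses only the standard library's statistics.median; the variable names rest, med_rest and share_above are introduced here for illustration and are not part of the original question.

```python
from statistics import median

# Sets taken from the worked example above.
aset = {1, 2, 3, 4, 5, 6, 7, 8, 9}
bset = {4, 5, 6, 7, 8}
rest = aset - bset            # elements of Aset that are not in Bset

med_a = median(aset)          # median of the full set
med_b = median(bset)          # median of the subset
med_rest = median(rest)       # median of the left-over elements

# Fraction of Bset that lies strictly above med_a.
share_above = sum(1 for x in bset if x > med_a) / len(bset)

print(med_a, med_b, med_rest, share_above)
```

Here 3 of the 5 elements of Bset (6, 7 and 8) lie above med_a, which matches the inference that Bset is drawn mostly from the upper half of Aset.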
As a real-world example, let us say we have a large database of people's incomes in which only a very small number of people earn a lot of money. Hence, the overall median (med_a) will be relatively low. Now, if you select a subset of these people and find that its median (med_b) is higher, you can conclude that the selected group consists mostly of people whose income is on the higher side compared to the original set. This might be a privacy worry because, by carefully querying many different subsets, a person can find a group with a very high median and then target that group of users for hacking or other harmful purposes.
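A minimal sketch of this kind of probing is shown below, assuming a database that only answers median queries. Everything in it is hypothetical: the records, the district attribute, and the median_income function are invented for illustration and do not describe any real system.

```python
from statistics import median

# Hypothetical records: (person_id, district, annual_income). All values are invented.
RECORDS = [
    (1, "north", 28_000), (2, "north", 31_000), (3, "north", 35_000),
    (4, "south", 30_000), (5, "south", 33_000), (6, "south", 36_000),
    (7, "hill", 40_000), (8, "hill", 250_000), (9, "hill", 900_000),
]

def median_income(district=None):
    """Aggregate-only interface: returns a median, never individual rows."""
    incomes = [inc for _, d, inc in RECORDS if district is None or d == district]
    return median(incomes)

# Query1: median over everyone (Aset).
med_a = median_income()

# Query2, repeated: median over each subset (Bset). Keep the subsets
# whose median exceeds med_a, since they lean toward higher incomes.
flagged = {}
for district in ("north", "south", "hill"):
    med_b = median_income(district)
    if med_b > med_a:
        flagged[district] = med_b

print(med_a, flagged)
```

Although no individual income is ever returned, comparing each subset's median with the overall median is enough to single out the "hill" group as the one with mostly high earners.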