What is the difference between the three categories of data cube measures in data mining? How do they work? Please show me with a sample.

Distributive

Algebraic

Holistic

## Expert Answer

**Answer:**

Measures can be organized into three categories based on the kind of aggregate functions used:

- distributive,

- algebraic,

- holistic.

**Distributive.**

An aggregate function is distributive if it can be computed in a distributed manner. Suppose the data are partitioned into *n* sets. We apply the function to each partition, resulting in *n* aggregate values. If the result derived by applying the function to the *n* aggregate values is the same as that derived by applying the function to the entire data set (without partitioning), the function can be computed in a distributed manner.

For example, count() can be computed for a data cube by first partitioning the cube into a set of subcubes, computing count() for each subcube, and then summing up the counts obtained for each subcube. Hence, count() is a distributive aggregate function. For the same reason, sum(), min(), and max() are distributive aggregate functions.

A measure is distributive if it is obtained by applying a distributive aggregate function. Distributive measures can be computed efficiently because they can be computed in a distributed manner.
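As a minimal sketch of the idea, assuming some made-up numbers split into three partitions (the variable names and data are illustrative only), each partition is reduced to a single value and those values are then combined:

```python
# Hypothetical measure values, already split into three partitions (subcubes).
partitions = [[4, 8, 15], [16, 23], [42, 7, 1, 9]]
all_values = [v for part in partitions for v in part]

# Step 1: compute one subaggregate per partition.
sub_counts = [len(p) for p in partitions]
sub_sums = [sum(p) for p in partitions]
sub_mins = [min(p) for p in partitions]
sub_maxs = [max(p) for p in partitions]

# Step 2: combine the subaggregates. Counts are combined by summing them;
# sum, min, and max are combined by applying the same function again.
total_count = sum(sub_counts)
total_sum = sum(sub_sums)
total_min = min(sub_mins)
total_max = max(sub_maxs)

# The combined results match what we get from the unpartitioned data.
print(total_count == len(all_values))  # True
print(total_sum == sum(all_values))    # True
print(total_min == min(all_values))    # True
print(total_max == max(all_values))    # True
```

Each partition contributes exactly one number per function, no matter how many rows it holds, which is why these measures are cheap to roll up.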

**Algebraic.**

An aggregate function is algebraic if it can be computed by an algebraic function with *m* arguments (where *m* is a bounded positive integer), each of which is obtained by applying a distributive aggregate function.

For example, avg() (average) can be computed by sum()/count(), where both sum() and count() are distributive aggregate functions. Similarly, it can be shown that min_N() and max_N() (which find the N minimum and N maximum values, respectively, in a given set) and standard_deviation() are algebraic aggregate functions.

A measure is algebraic if it is obtained by applying an algebraic aggregate function.
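A small sketch may help here. It assumes made-up data and hypothetical helper names (`summarize`, `merge`), and it uses the population form of the standard deviation. Each partition is reduced to three distributive values (count, sum, sum of squares), and avg() and the standard deviation are derived from the merged triple:

```python
import math

# Hypothetical measure values split into three partitions.
partitions = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]

def summarize(values):
    """Reduce a partition to a fixed-size triple: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def merge(a, b):
    """Merge two triples component-wise; every component is distributive."""
    return tuple(x + y for x, y in zip(a, b))

total = (0, 0.0, 0.0)
for part in partitions:
    total = merge(total, summarize(part))

n, s1, s2 = total
avg = s1 / n                  # avg() = sum() / count()
variance = s2 / n - avg ** 2  # population variance from the three subaggregates
std_dev = math.sqrt(variance)

print(avg)      # 5.0
print(std_dev)  # 2.0
```

Here *m* = 3: however large a partition is, it is described by three numbers, so the storage needed per subaggregate is bounded.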

**Holistic.**

An aggregate function is holistic if there is no constant bound on the storage size needed to describe a subaggregate. That is, there does not exist an algebraic function with *m* arguments (where *m* is a constant) that characterizes the computation.

Common examples of holistic functions include median(), mode(), and rank().

A measure is holistic if it is obtained by applying a holistic aggregate function.
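To see why median() does not fit the earlier categories, here is a minimal sketch with made-up data. It uses Python's standard `statistics.median`. Combining per-partition medians does not give the true median, so in general the full set of values has to be kept:

```python
import statistics

# Hypothetical measure values split into three partitions.
partitions = [[1, 2, 3], [4, 5, 100], [6, 7, 8, 9, 10]]
all_values = [v for part in partitions for v in part]

# Attempt a distributed computation: the median of the per-partition medians.
sub_medians = [statistics.median(p) for p in partitions]
median_of_medians = statistics.median(sub_medians)

# The true median needs the whole data set.
true_median = statistics.median(all_values)

print(sub_medians)        # [2, 5, 8]
print(median_of_medians)  # 5
print(true_median)        # 6
```

No fixed-size summary of each partition is enough to recover the exact median in general, which is what the definition means by having no constant bound on the storage for a subaggregate.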

**Examples:**

**Distributive:** Sum(), Count(), Minimum(), Maximum()

**Algebraic:** Average(), StandardDeviation(), MaxN() (N largest values), MinN() (N smallest values), CenterOfMass()

**Holistic:** Median(), MostFrequent(), Rank()