Which Spatial Partition Trees Are Adaptive to Intrinsic Dimension?

Nakul Verma, Samory Kpotufe, and Sanjoy Dasgupta

{naverma, skpotufe, dasgupta}@ucsd.edu

University of California, San Diego

Local covariance dimension

Why is this important?

Spatial trees are at the heart of many machine learning tasks (e.g. regression, nearest neighbor search, vector quantization).

However, they tend to suffer from the curse of dimensionality: the rate at which the diameter of the data decreases as we go down the tree depends on the dimension of the space. In particular, we might require partitions of size O(2^D) to attain small data diameters.

Fortunately, many real world datasets have low intrinsic dimension (e.g. manifolds, sparse datasets), and we would like to benefit from such situations.

Theoretical Guarantees

A set S ⊆ ℝ^D is said to have local covariance dimension (d, ε) if the largest d eigenvalues of its covariance matrix satisfy:

σ_1^2 + σ_2^2 + ⋯ + σ_d^2 ≥ (1 − ε) (σ_1^2 + σ_2^2 + ⋯ + σ_D^2)
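Under this definition, the local covariance dimension is easy to estimate from a sample: it is the smallest d whose top-d eigenvalues capture a (1 − ε) fraction of the total variance. A minimal numpy sketch (function name and synthetic example are illustrative, not from the poster):

```python
import numpy as np

def local_covariance_dim(X, eps=0.1):
    """Smallest d such that the top-d eigenvalues of the covariance
    of X capture at least a (1 - eps) fraction of the total variance."""
    cov = np.cov(X, rowvar=False)
    # Eigenvalues of the covariance matrix, sorted largest first.
    evals = np.sort(np.linalg.eigvalsh(cov))[::-1]
    frac = np.cumsum(evals) / evals.sum()
    return int(np.searchsorted(frac, 1.0 - eps) + 1)

# A noisy 1-dimensional set (a line segment) embedded in R^10:
rng = np.random.default_rng(0)
t = rng.uniform(size=(500, 1))
X = t * np.ones(10) + 0.01 * rng.standard_normal((500, 10))
d_hat = local_covariance_dim(X, eps=0.1)
```

Despite the full-dimensional noise, the estimate recovers the low covariance dimension, which is exactly the robustness that motivates this notion.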

Diameter decrease rate (k): smallest k such that the data diameter is halved every k levels.
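Given measured data diameters at successive tree levels, this rate can be read off directly; a small sketch (the helper name is ours, not the poster's):

```python
def diameter_decrease_rate(diams):
    """Smallest k such that the diameter is at least halved every k
    levels: diams[i + k] <= diams[i] / 2 for every valid i."""
    n = len(diams)
    for k in range(1, n):
        if all(diams[i + k] <= diams[i] / 2 for i in range(n - k)):
            return k
    return None  # the diameter never reliably halves within these levels

# Example: diameters measured at 7 consecutive levels of some tree.
diams = [8.0, 6.0, 4.0, 3.0, 2.0, 1.5, 1.0]
k = diameter_decrease_rate(diams)
```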

We show that:

RP tree: k ≤ O(d)

PD tree: k ≤ O( Σ_i σ_i^2 / σ_1^2 )

2M tree: k ≤ O( min{ d, Σ_i σ_i^2 / σ_1^2 } )

dyadic tree / kd tree: k ≤ O(D log D)

(Here k is the number of levels needed to halve the data diameter.)

Empirical estimates of local covariance dimension.

Axis-parallel splitting rules (dyadic / kd tree) don't always adapt to intrinsic dimension; their upper bounds have matching lower bounds. On the other hand, the irregular splitting rules (RP / PD / 2M trees) always adapt to intrinsic dimension. They therefore tend to perform better on real world tasks.

We show that trees such as RPTree, PDTree, and 2-MeansTree adapt to the intrinsic dimension of the data in terms of the rate at which they decrease diameter down the tree. This has strong implications for the performance of these trees on the various learning tasks they are used for.

Experiments

Vector Quantization

Some real world data with low intrinsic dimension:

Rotating teapot. One degree of freedom (rotation angle).

Movement of a robotic arm. Two degrees of freedom, one for each joint.

Handwritten characters. The tilt angle, thickness, etc. govern the final written form.

Speech. A few anatomical characteristics govern the spoken phonemes.

Hand gestures in Sign Language. Few gestures can follow other gestures.

Loc. cov. dim. estimate at different scales for some real-world datasets.

Spatial Partition Trees

Standard characterizations of intrinsic dimension

Common notions of intrinsic dimension (e.g. box dimension, doubling dimension) originally emerged from fractal geometry. However, they have the following issues in the context of machine learning:

These notions are purely geometric and don't account for the underlying distribution.

They are not robust to distributional noise: e.g. for a noisy manifold, these dimensions can be very high.

They are difficult to verify empirically.

We need a more statistical notion of intrinsic dimension that characterizes the underlying distribution, is robust to noise, and is easy to verify for real world datasets.

Builds a hierarchy of nested partitions of the data space by recursively bisecting the space.

(Illustrations: partitions induced by the dyadic tree, kd tree, and rp tree.)

Quantization error of test data at different levels for various partition trees (built using separate training data). 2-means and PD trees perform the best.
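The quantization error at a given level is, presumably, the mean squared distance from each test point to the mean of its cell; a toy numpy sketch under that assumption (names illustrative):

```python
import numpy as np

def quantization_error(X, cell_ids):
    """Mean squared distance from each point to the mean of its cell."""
    err = 0.0
    for c in np.unique(cell_ids):
        pts = X[cell_ids == c]
        err += ((pts - pts.mean(axis=0)) ** 2).sum()
    return err / len(X)

# Two well-separated clusters: a partition that puts each cluster in
# its own cell has far lower error than a single cell over everything.
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((100, 2)),
               rng.standard_normal((100, 2)) + 20.0])
one_cell = np.zeros(200, dtype=int)
two_cells = np.repeat([0, 1], 100)
e1 = quantization_error(X, one_cell)
e2 = quantization_error(X, two_cells)
```

Deeper levels mean more cells, so a tree whose cells shrink in diameter quickly also drives this error down quickly.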

The trees we consider:

dyadic tree: Pick a coordinate direction and split the data at the midpoint along this direction.

kd tree: Pick a coordinate direction and split the data at the median along this direction.

RP tree: Pick a random direction and split the data at the median along this direction.

PCA/PD tree: Split the data at the median along the principal direction.

2-Means tree: Compute the 2-means solution and split the data according to the cluster assignment.

Nearest Neighbor

Quality of the found neighbor at various levels of the partition trees.

Regression

l2 regression error in predicting the rotation angle at different tree levels.
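The splitting rules described above can be sketched in a few lines of numpy. This is an illustrative sketch, not the authors' code; in particular, the kd rule here picks the largest-variance coordinate, one common choice the poster leaves unspecified:

```python
import numpy as np

def split(X, rule, rng=None):
    """One split of the data X, returning a boolean mask that sends
    each point to the left child. Rules follow the poster's list."""
    if rng is None:
        rng = np.random.default_rng()
    if rule == "dyadic":
        # Coordinate direction; split at the midpoint of the data range.
        j = rng.integers(X.shape[1])
        return X[:, j] <= (X[:, j].min() + X[:, j].max()) / 2
    if rule == "kd":
        # Coordinate direction (largest spread); split at the median.
        j = int(np.argmax(X.var(axis=0)))
        return X[:, j] <= np.median(X[:, j])
    if rule == "rp":
        # Random direction; split at the median of the projections.
        proj = X @ rng.standard_normal(X.shape[1])
        return proj <= np.median(proj)
    if rule == "pd":
        # Principal direction (top eigenvector of covariance); median split.
        _, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
        proj = X @ vecs[:, -1]
        return proj <= np.median(proj)
    if rule == "2means":
        # Lloyd's iterations with k = 2; split by cluster assignment.
        centers = X[rng.choice(len(X), size=2, replace=False)]
        for _ in range(20):
            assign = ((X[:, None, :] - centers) ** 2).sum(-1).argmin(1)
            if 0 < assign.sum() < len(X):
                centers = np.array([X[assign == c].mean(0) for c in (0, 1)])
            else:  # a cluster emptied; stop early
                break
        return assign == 0
    raise ValueError(rule)

# Median-based rules send exactly half of an even sample to the left.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
left = split(X, "rp", rng=rng)
```

Recursing on each side of the mask builds the corresponding partition tree.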

All experiments are done with 10-fold cross validation.
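One natural measure of the neighbor quality reported above is the ratio between the distance to the best training point inside the query's cell and the distance to the true nearest neighbor; a sketch with a toy one-level split (this measure and all names are our assumptions, not the poster's definitions):

```python
import numpy as np

def nn_quality(X_train, cells_train, q, cell_q):
    """Distance to the nearest training point in the query's cell,
    divided by the distance to the true nearest neighbor overall.
    A ratio of 1 means the cell already contains the true NN."""
    dists = np.linalg.norm(X_train - q, axis=1)
    return dists[cells_train == cell_q].min() / dists.min()

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
median = np.median(X[:, 0])
cells = (X[:, 0] <= median).astype(int)   # toy one-level kd-style split
q = X[0] + 0.01                           # query near a training point
ratio = nn_quality(X, cells, q, int(q[0] <= median))
```

Deeper levels give smaller cells, so the ratio tells us how much accuracy is traded for the speedup of searching only one cell.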