Which Spatial Partition Trees Are Adaptive to Intrinsic

Which Spatial Partition Trees Are Adaptive to Intrinsic

Which Spatial Partition Trees Are Adaptive to Intrinsic Dimension?
Nakul Verma, Samory Kpotufe, and Sanjoy Dasgupta
{naverma, skpotufe, dasgupta}@ucsd.edu

University of California, San Diego
Local covariance dimension

Why is this important?
Spatial trees are at the heart of many machine learning tasks (e.g. regression, near neighbor search, vector quantization).
However, they tend to suffer from the curse of dimensionality: the rate
at which the diameter of the data decreases as we go down the tree
depends on the dimension of the space. In particular, we might require
partitions of size O(2D) to attain small data diameters.
Fortunately, many real world data have low intrinsic dimension (e.g.
manifolds, sparse datasets), and we would like to benefit from such
situations.

Theoretical Guarantees

A set S D is said to have local covariance dimension (d, ) if the
largest d eigenvalues of its covariance matrix satisfy:

1
2
1

2
d

2
1

2
D

d=2

Diameter decrease rate (k): Smallest k such that data diameter is halved
every k levels.
We show that:
k O d
RP tree:
number of levels
PD tree:

d=3

d=1

Empirical estimates of local covariance dimension

2M tree:

k O

k O min

2 2
1

d ,

2
i

2
i

2 2
1

needed to halve
the diameter
( k)

dyadic tree / kd tree: k O D log D
Axis parallel splitting rules (dyadic / kd tree) dont always adapt to intrinsic
dimension; the upper bounds have matching lower bounds.
On the other hand, the irregular splitting rules (RP / PD / 2M trees) always
adapt to intrinsic dimension. They therefore tend to perform better on real
world tasks.

We show that
Trees such as RPTree, PDTree and 2-MeansTree adapt to the intrinsic
dimension of the data in terms of the rate at which they decrease diameter
down the tree.
This has strong implications on the performance of these trees on the
various learning tasks they are used for.

Experiments

Vector Quantization

Some real world data with low intrinsic dimension
Rotating teapot. One degree
of freedom (rotation angle).

Movement of a robotic arm.
Two degrees of freedom.
one for each joint.

Loc. cov. dim. estimate at different scales for some real-world datasets.

Spatial Partition Trees
Handwritten characters. The
tilt angle, thickness, etc.
govern the final written form.

Speech. Few anatomical characteristics govern the spoken
phonemes.

Hand gestures in Sign Language. Few
gestures can follow other gestures.

Standard characterizations of intrinsic dimension
Common notions of intrinsic dimension (e.g. Box dimension, Doubling
dimension, etc.) originally emerged from fractal geometry. They,
however, have the following issues in the context of machine learning:
These notions are purely geometrical and dont account for the underlying
distribution.
They are not robust to distributional noise: e.g. for a noisy manifold, these
dimensions can be very high.
They are difficult to verify empirically.

Need a more statistical notion of intrinsic dimension that
characterizes the underlying distribution, is robust to noise, and is
easy to verify for real world datasets.

Builds a hierarchy of nested partitions of the data space by recursively
bisecting the space.
dyadic tree

kd tree

rp tree

Quantization error of test data at different levels for various partition trees
(built using separate training data). 2-means and PD trees perform the best.

Nearest Neighbor

Level 1

Level 2

The trees we consider:
dyadic tree: Pick a coordinate direction and split the data at the mid point
along this direction.

Quality of the found neighbor at various levels of the partition trees.

Regression

kd tree: Pick a coordinate direction and split the data at the median along
this direction.

RP tree: Pick a random direction and split the data at the median along
this direction.

PCA/PD tree: Split the data at the median along the principal direction.
2Means tree: Compute the 2-means solution, and split the data as per the
cluster assignment.

l2 regression error in predicting the rotation angle at different tree levels.
All experiments are done with 10-fold cross validation.

Recently Viewed Presentations

  • Evaluating the Existing Evidence in Complementary Medicine ...

    Evaluating the Existing Evidence in Complementary Medicine ...

    Introduction of Chinese Acupuncture Zhi Ping Li MB. MA. Lic. Acu.
  • Earthquake in Pakistan

    Earthquake in Pakistan

    Health Resources Availability Mapping System The Darfur Case Study HeRAMS SUDAN WHO Country Office Health Information Service Unit * What is HeRAMS CONTEXTUAL INFORMATION Evolution - in absolute numbers - of the Population of Humanitarian Concern (Total Population, IDPs and...
  • Chapter 5 Gases - KauHelping

    Chapter 5 Gases - KauHelping

    The value of m s . has no effect on the energy ,size, shape , or orintation of an orbital, but it determines , how electron are arranged in an orbital. Dr Laila Al-Harbi . 1s1. principal quantum. number ....
  • Plug-In 2009 Template

    Plug-In 2009 Template

    Electric Vehicles: Matching Low-Carbon Technology to People and Policy. Jonn Axsen. Resource and Environmental ManagementSimon Fraser University. August 20, 2014
  • Work Experience Saint Nicholas School 21st  25th May

    Work Experience Saint Nicholas School 21st 25th May

    Work Experience Saint Nicholas School 21st - 25th May 2018 School Coordinator: Mrs Brooks What is Work Experience? An unpaid opportunity for your child to experience working life, whilst they are still at school A chance for them to develop...
  • Environmental Ethics and Metaethics - Auburn University

    Environmental Ethics and Metaethics - Auburn University

    "Metaethics is the attempt to understand the metaphysical, epistemological, semantic, and psychological, presuppositions and commitments of moral thought, talk, and practice. […] Metaethicsexplores as well the connection between values, reasons for action, and human motivation, asking how it ...
  • Region and Zone Chairperson Workshop 1 MD21 and

    Region and Zone Chairperson Workshop 1 MD21 and

    Remind clubs to submit ALL required reports on time. Monthly Membership Report. Service Activities. Report Incoming Officers Online. Encourage attendance at International, Multiple District and District conventions, conferences, training & events ... Lionsclubs.org ...
  • Context Clues - Montgomery Township School District

    Context Clues - Montgomery Township School District

    EXAMPLE: After a time, glaciers, or slowly moving rivers of ice, formed over many parts of the Earth. In this sentence the words "slowly moving rivers of ice" tell us what glaciers are. Synonym context clues are words around a...