11-755/18-797 Machine Learning for Signal Processing
Fundamentals of Linear Algebra, Part II
Class 2. 31 August 2009
Instructor: Bhiksha Raj
31 Aug 2010 11-755/18-797

Administrivia
- Registration: Anyone on waitlist still?
- We have a second TA
  - Sohail Bahmani
  - [email protected]
- Homework: Slightly delayed
  - Linear algebra
  - Adding some fun new problems.
- Use the discussion lists on blackboard.andrew.cmu.edu
- Blackboard: if you are not registered on Blackboard, please register

Overview
- Vectors and matrices
- Basic vector/matrix operations
- Vector products
- Matrix products
- Various matrix types
- Matrix inversion
- Matrix interpretation
- Eigenanalysis
- Singular value decomposition

The Identity Matrix
\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
- An identity matrix is a square matrix where
  - All diagonal elements are 1.0
  - All off-diagonal elements are 0.0
- Multiplication by an identity matrix does not change vectors

Diagonal Matrix
\[ \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \]
- All off-diagonal elements are zero
- Diagonal elements are non-zero
- Scales the axes
  - May flip axes

Diagonal matrix to transform images
- How?

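Stretching
\[
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix}
1 & 1 & \cdot & 2 & \cdot & 2 & 2 & \cdot & 2 & \cdot & 10 \\
1 & 2 & \cdot & 1 & \cdot & 5 & 6 & \cdot & 10 & \cdot & 10 \\
1 & 1 & \cdot & 1 & \cdot & 0 & 0 & \cdot & 1 & \cdot & 1
\end{bmatrix}
\]
- Location-based representation
- Scaling matrix only scales the X axis
  - The Y axis and pixel value are scaled by identity
- Not a good way of scaling.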
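The stretching above can be sketched in a few lines of plain Python. This is only an illustration: the 2x2-pixel `picture` and the helper `matmul` are hypothetical and not part of the slides; only the diagonal scaling matrix is taken from them.

```python
def matmul(A, B):
    """Multiply A (m x k) by B (k x n); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Hypothetical picture in the location-based representation, one column
# per pixel: row 0 = X coordinate, row 1 = Y coordinate, row 2 = pixel value.
picture = [
    [1, 1, 2, 2],
    [1, 2, 1, 2],
    [1, 0, 0, 1],
]

# Diagonal scaling matrix from the slide: doubles X, leaves Y and value alone.
S = [
    [2, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

stretched = matmul(S, picture)
for row in stretched:
    print(row)
```

The stretched X coordinates become 2 and 4, so no pixel lands on X = 3. Gaps like this are one plausible reason the slide calls this "not a good way of scaling".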

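Stretching
- D = [the picture]
\[
A = \begin{bmatrix}
1 & .5 & 0 & 0 & \cdot \\
0 & .5 & 1 & .5 & \cdot \\
0 & 0 & 0 & .5 & \cdot \\
0 & 0 & 0 & 0 & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot
\end{bmatrix} \quad (N \times 2N)
\]
\[ \text{Newpic} = DA \]
- Better way

Modifying color
\[
P = \begin{bmatrix} R & G & B \end{bmatrix}, \qquad
\text{Newpic} = P \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
- Scale only Green

Permutation Matrix
\[
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix} =
\begin{bmatrix} y \\ z \\ x \end{bmatrix}
\]
[Figure: the point (3,4,5) on the original X, Y, Z axes and on the permuted axes X (old Y), Y (old Z), Z (old X)]
- A permutation matrix simply rearranges the axes
  - The row entries are axis vectors in a different order
  - The result is a combination of rotations and reflections
- The permutation matrix effectively permutes the arrangement of the elements in a vector

Permutation Matrix
\[
P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}
\]
\[
\begin{bmatrix}
1 & 1 & \cdot & 2 & \cdot & 2 & 2 & \cdot & 2 & \cdot & 10 \\
1 & 2 & \cdot & 1 & \cdot & 5 & 6 & \cdot & 10 & \cdot & 10 \\
1 & 1 & \cdot & 1 & \cdot & 0 & 0 & \cdot & 1 & \cdot & 1
\end{bmatrix}
\]
- Reflections and 90 degree rotations of images and objects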
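As a small illustration of the permutation slides, the sketch below applies the 3x3 permutation matrix to the point (3,4,5) and then undoes it with the transpose. The helper name `matvec` is an assumption made for this example, not something from the lecture.

```python
def matvec(P, v):
    """Multiply matrix P (list of rows) by column vector v (flat list)."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

# Permutation matrix from the slide: (x, y, z) -> (y, z, x).
P = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]

point = [3, 4, 5]
permuted = matvec(P, point)

# For a permutation matrix the transpose reverses the rearrangement.
P_T = [list(col) for col in zip(*P)]
restored = matvec(P_T, permuted)

print(permuted)  # elements of `point` in the order y, z, x
print(restored)  # the original point again
```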

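Permutation Matrix
\[
P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}
\]
\[
\begin{bmatrix}
x_1 & x_2 & \cdot & \cdot & x_N \\
y_1 & y_2 & \cdot & \cdot & y_N \\
z_1 & z_2 & \cdot & \cdot & z_N
\end{bmatrix}
\]
- Reflections and 90 degree rotations of images and objects
  - Object represented as a matrix of 3-dimensional "position" vectors
  - Positions identify each point on the surface

Rotation Matrix
\[ x' = x\cos\theta - y\sin\theta, \qquad y' = x\sin\theta + y\cos\theta \]
\[
R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad
X = \begin{bmatrix} x \\ y \end{bmatrix}, \qquad
X_{new} = \begin{bmatrix} x' \\ y' \end{bmatrix}
\]
\[ R_\theta X = X_{new} \]
[Figure: the vector (x,y) rotated by θ to (x',y'); equivalently, the X and Y axes rotated by θ]
- A rotation matrix rotates the vector by some angle θ
- Alternately viewed, it rotates the axes
  - The new axes are at an angle θ to the old one

Rotating a picture
\[
R = \begin{bmatrix} \cos 45^\circ & -\sin 45^\circ & 0 \\ \sin 45^\circ & \cos 45^\circ & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
\[
R \begin{bmatrix}
1 & 1 & \cdot & 2 & \cdot & 2 & 2 & \cdot & 2 & \cdot & 10 \\
1 & 2 & \cdot & 1 & \cdot & 5 & 6 & \cdot & 10 & \cdot & 10 \\
1 & 1 & \cdot & 1 & \cdot & 0 & 0 & \cdot & 1 & \cdot & 1
\end{bmatrix} =
\begin{bmatrix}
0 & -\frac{1}{\sqrt{2}} & \cdot & \frac{1}{\sqrt{2}} & \cdot & -\frac{3}{\sqrt{2}} & -\frac{4}{\sqrt{2}} & \cdot & -\frac{8}{\sqrt{2}} & \cdot & 0 \\
\sqrt{2} & \frac{3}{\sqrt{2}} & \cdot & \frac{3}{\sqrt{2}} & \cdot & \frac{7}{\sqrt{2}} & \frac{8}{\sqrt{2}} & \cdot & \frac{12}{\sqrt{2}} & \cdot & \frac{20}{\sqrt{2}} \\
1 & 1 & \cdot & 1 & \cdot & 0 & 0 & \cdot & 1 & \cdot & 1
\end{bmatrix}
\]
- Note the representation: 3-row matrix
  - Rotation only applies on the "coordinate" rows
  - The value does not change
- Why is pacman grainy?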
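A minimal sketch of the 45 degree picture rotation, assuming the counter-clockwise sign convention x' = x cos θ - y sin θ. The helpers `matmul` and `rotation_matrix` are names chosen for this example; the three picture columns are the first explicit columns of the slide's matrix.

```python
import math

def matmul(A, B):
    """Multiply A (m x k) by B (k x n); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def rotation_matrix(theta):
    """3x3 matrix rotating the X and Y rows by theta radians.

    The third row/column is identity so the pixel-value row is untouched.
    """
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

# Columns (1,1,1), (1,2,1), (2,1,1): X row, Y row, pixel-value row.
picture = [
    [1, 1, 2],
    [1, 2, 1],
    [1, 1, 1],
]

rotated = matmul(rotation_matrix(math.radians(45)), picture)
for row in rotated:
    print([round(x, 3) for x in row])
```

The rotated coordinates are no longer integers, so they have to be rounded onto a pixel grid before display. That rounding is one likely contributor to the graininess the slide asks about.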

3-D Rotation
[Figure: original axes X, Y, Z and rotated axes Xnew, Ynew, Znew]
- 2 degrees of freedom
  - 2 separate angles
- What will the rotation matrix be?

Projections
- What would we see if the cone to the left were transparent and we looked at it along the normal to the plane?
  - The plane goes through the origin
  - Answer: the figure to the right
- How do we get this? Projection

Projections
- Each pixel in the cone to the left is mapped onto its "shadow" on the plane in the figure to the right
- The location of the pixel's "shadow" is obtained by multiplying the vector V representing the pixel's location in the first figure by a matrix A
  - Shadow(V) = A V
- The matrix A is a projection matrix

Projections
[Figure: a plane spanned by W1 and W2; the projection drops onto the plane at 90 degrees]
- Consider any plane specified by a set of vectors W1, W2, ..
  - Or matrix [W1 W2 ..]
- Any vector can be projected onto this plane by multiplying it with the projection matrix for the plane
  - The projection is the shadow

Projection Matrix
[Figure: a plane spanned by W1 and W2; the projection drops onto the plane at 90 degrees]
- Given a set of vectors W1, W2, .. which form a matrix W = [W1 W2 ..]
- The projection matrix that transforms any vector X to its projection on the plane is
  - $P = W (W^T W)^{-1} W^T$
  - We will visit matrix inversion shortly

- Magic: any set of vectors from the same plane (that together span it), expressed as a matrix V, will give you the same projection matrix
  - $P = V (V^T V)^{-1} V^T$

Projections
- HOW?

Projections
- Draw any two vectors W1 and W2 that lie on the plane
  - ANY two, so long as they have different angles
- Compose a matrix W = [W1 W2]
- Compose the projection matrix $P = W (W^T W)^{-1} W^T$
- Multiply every point on the cone by P to get its projection
- View it
  - I'm missing a step here: what is it?

Projections
- The projection actually projects it onto the plane, but you're still seeing the plane in 3D
  - The result of the projection is a 3-D vector
  - $P = W (W^T W)^{-1} W^T$ is 3x3, P*Vector is 3x1
  - The image must be rotated till the plane is in the plane of the paper
    - The Z axis in this case will always be zero and can be ignored
    - How will you rotate it? (remember you know W1 and W2)

Projection matrix properties
- The projection of any vector that is already on the plane is the vector itself
  - Px = x if x is on the plane
  - If the object is already on the plane, there is no further projection to be performed
- The projection of a projection is the projection
  - P (Px) = Px

  - That is because Px is already on the plane
- Projection matrices are idempotent
  - $P^2 = P$
  - Follows from the above

Projections: A more physical meaning
- Let W1, W2 .. Wk be "bases"
- We want to explain our data in terms of these "bases"
  - We often cannot do so
  - But we can explain a significant portion of it
- The portion of the data that can be expressed in terms of our vectors W1, W2, .. Wk is the projection of the data on the W1 .. Wk (hyper)plane
  - In our previous example, the "data" were all the points on a cone
  - The interpretation for volumetric data is obvious

Projection: an example with sounds
[Figure: the spectrogram (matrix) of a piece of music, and a set of notes]
- How much of the above music was composed of the above notes?
  - I.e. how much can it be explained by the notes

Projection: one note
[Figure: M = the spectrogram (matrix) of a piece of music; W = the note]
- M = spectrogram; W = note
- $P = W (W^T W)^{-1} W^T$
- Projected Spectrogram = P * M

Projection: one note, cleaned up
[Figure: M = the spectrogram (matrix) of a piece of music; W = the note]
- Floored all matrix values below a threshold to zero

Projection: multiple notes
[Figure: M = the spectrogram (matrix) of a piece of music; W = the notes]
- $P = W (W^T W)^{-1} W^T$
- Projected Spectrogram = P * M

Projection: multiple notes, cleaned up
[Figure: M = the spectrogram (matrix) of a piece of music; W = the notes]
- $P = W (W^T W)^{-1} W^T$
- Projected Spectrogram = P * M

Projection and Least Squares

- Projection actually computes a least squared error estimate
- For each vector V in the music spectrogram matrix
  - Approximation: Vapprox = a*note1 + b*note2 + c*note3 ..
\[
V_{approx} = \begin{bmatrix} note1 & note2 & note3 \end{bmatrix}
\begin{bmatrix} a \\ b \\ c \end{bmatrix}
\]
  - Error vector E = V - Vapprox
  - Squared error energy for V: e(V) = norm(E)^2
- Total error = sum_over_all_V { e(V) } = $\sum_V e(V)$
- Projection computes Vapprox for all vectors such that Total error is minimized
  - It does not give you "a", "b", "c".. though
  - That needs a different operation: the inverse / pseudo inverse

Orthogonal and Orthonormal matrices
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
\begin{bmatrix} 0.707 & -0.354 & 0.612 \\ 0.707 & 0.354 & -0.612 \\ 0 & 0.866 & 0.5 \end{bmatrix}
\]
- Orthogonal Matrix: $AA^T$ = diagonal
  - Each row vector lies exactly along the normal to the plane specified by the rest of the vectors in the matrix
- Orthonormal Matrix: $AA^T = A^TA = I$
  - In addition to being orthogonal, each vector has length exactly = 1.0
  - Interesting observation: in a square matrix, if the length of the row vectors is 1.0, the length of the column vectors is also 1.0

Orthogonal and Orthonormal Matrices
- Orthonormal matrices will retain the relative angles between transformed vectors
  - Essentially, they are combinations of rotations, reflections and permutations
  - Rotation matrices and permutation matrices are all orthonormal matrices
  - The vectors in an orthonormal matrix are at 90 degrees to one another.
- Orthogonal matrices are like orthonormal matrices with stretching
  - The product of a diagonal matrix and an orthonormal matrix

Matrix Rank and Rank-Deficient Matrices
[Figure: P * Cone = the cone flattened onto a plane]
- Some matrices will eliminate one or more dimensions during transformation
  - These are rank deficient matrices
  - The rank of the matrix is the dimensionality of the transformed version of a full-dimensional object
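The projection matrix P = W (WᵀW)⁻¹ Wᵀ from the earlier slides is a convenient example of a matrix that eliminates a dimension. The sketch below is only an illustration under stated assumptions: the plane (spanned by (1,0,1) and (0,1,1)) is hypothetical, the helper names are invented for this example, and `inverse_2x2` only handles the two-column case.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply A (m x k) by B (k x n); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("columns are not linearly independent")
    return [[d / det, -b / det], [-c / det, a / det]]

def projection_matrix(W):
    """P = W (W^T W)^-1 W^T for a matrix W with two independent columns."""
    Wt = transpose(W)
    return matmul(matmul(W, inverse_2x2(matmul(Wt, W))), Wt)

# Hypothetical plane in 3-D spanned by W1 = (1, 0, 1) and W2 = (0, 1, 1).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
P = projection_matrix(W)

# A different pair from the same plane: W1 + W2 and W1 - W2.
V = [[1.0, 1.0], [1.0, -1.0], [2.0, 0.0]]
P_other = projection_matrix(V)

# Projecting twice changes nothing (idempotence).
PP = matmul(P, P)

# The normal to the plane, (1, 1, -1), is flattened to the zero vector,
# which is the dimension that P eliminates.
flattened = matmul(P, [[1.0], [1.0], [-1.0]])
```

Up to floating-point rounding, `P` and `P_other` agree, `PP` equals `P`, and `flattened` is the zero vector, so P maps 3-D objects onto a 2-D plane.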

Matrix Rank and Rank-Deficient Matrices
[Figure: two transformed objects, labelled Rank = 2 and Rank = 1]
- Some matrices will eliminate one or more dimensions during transformation
  - These are rank deficient matrices
  - The rank of the matrix is the dimensionality of the transformed version of a full-dimensional object

Projections are often examples of rank-deficient transforms
[Figure: M = the spectrogram; W = the notes]
- $P = W (W^T W)^{-1} W^T$; Projected Spectrogram = P * M
- The original spectrogram can never be recovered
  - P is rank deficient
- P explains all vectors in the new spectrogram as a mixture of only the 4 vectors in W
  - There are only 4 independent bases
  - Rank of P is 4

Non-square Matrices
\[
X = \begin{bmatrix} x_1 & x_2 & \cdot & \cdot & x_N \\ y_1 & y_2 & \cdot & \cdot & y_N \end{bmatrix}
\ \text{(2D data)}, \qquad
P = \begin{bmatrix} .8 & .9 \\ .1 & .9 \\ .6 & 0 \end{bmatrix}
\ \text{(transform)}
\]
\[
PX = \begin{bmatrix} x_1 & x_2 & \cdot & \cdot & x_N \\ y_1 & y_2 & \cdot & \cdot & y_N \\ z_1 & z_2 & \cdot & \cdot & z_N \end{bmatrix}
\ \text{(3D, rank 2)}
\]

- Non-square matrices add or subtract axes
  - More rows than columns: add axes
    - But does not increase the dimensionality of the data
  - Fewer rows than columns: reduce axes
    - May reduce dimensionality of the data

Non-square Matrices
\[
X = \begin{bmatrix} x_1 & x_2 & \cdot & \cdot & x_N \\ y_1 & y_2 & \cdot & \cdot & y_N \\ z_1 & z_2 & \cdot & \cdot & z_N \end{bmatrix}
\ \text{(3D data, rank 3)}, \qquad
P = \begin{bmatrix} .3 & 1 & .2 \\ .5 & 1 & 1 \end{bmatrix}
\ \text{(transform)}
\]
\[
PX = \begin{bmatrix} x_1 & x_2 & \cdot & \cdot & x_N \\ y_1 & y_2 & \cdot & \cdot & y_N \end{bmatrix}
\ \text{(2D, rank 2)}
\]
- Non-square matrices add or subtract axes
  - More rows than columns: add axes
    - But does not increase the dimensionality of the data
  - Fewer rows than columns: reduce axes
    - May reduce dimensionality of the data

The Rank of a Matrix
\[
\begin{bmatrix} .3 & 1 & .2 \\ .5 & 1 & 1 \end{bmatrix}, \qquad
\begin{bmatrix} .8 & .9 \\ .1 & .9 \\ .6 & 0 \end{bmatrix}
\]
- The matrix rank is the dimensionality of the transformation of a full-dimensioned object in the original space

- The matrix can never increase dimensions
  - Cannot convert a circle to a sphere or a line to a circle
- The rank of a matrix can never be greater than the lower of its two dimensions

The Rank of Matrix
[Figure: M = the spectrogram]
- Projected Spectrogram = P * M
  - Every vector in it is a combination of only 4 bases
- The rank of the matrix is the smallest no. of bases required to describe the output
  - E.g. if note no. 4 in P could be expressed as a combination of notes 1, 2 and 3, it provides no additional information
  - Eliminating note no. 4 would give us the same projection
  - The rank of P would be 3!

Matrix rank is unchanged by transposition
\[
\begin{bmatrix} 0.9 & 0.5 & 0.8 \\ 0.1 & 0.4 & 0.9 \\ 0.42 & 0.44 & 0.86 \end{bmatrix}, \qquad
\begin{bmatrix} 0.9 & 0.1 & 0.42 \\ 0.5 & 0.4 & 0.44 \\ 0.8 & 0.9 & 0.86 \end{bmatrix}
\]
- If an N-D object is compressed to a K-D object by a matrix, it will also be compressed to a K-D object by the transpose of the matrix
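The claim about transposition can be checked numerically on the slide's 3x3 matrix. The `rank` function below is a simple Gaussian-elimination sketch written for this example (with an assumed tolerance of 1e-9 for treating a pivot as zero); it is not a robust numerical rank routine.

```python
def rank(A, tol=1e-9):
    """Rank of A (list of rows) by Gaussian elimination with partial pivoting."""
    M = [[float(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(M[i][c]))
        if abs(M[pivot][c]) < tol:
            continue  # no usable pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            for j in range(c, cols):
                M[i][j] -= f * M[r][j]
        r += 1
    return r

# Matrix from the slide; its third row is 0.4 * row 1 + 0.6 * row 2.
A = [
    [0.9, 0.5, 0.8],
    [0.1, 0.4, 0.9],
    [0.42, 0.44, 0.86],
]
A_T = [list(col) for col in zip(*A)]

rank_A = rank(A)
rank_A_T = rank(A_T)
print(rank_A, rank_A_T)
```

Both ranks come out equal: the matrix and its transpose compress a 3-D object to the same number of dimensions.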

Matrix Determinant
[Figure: parallelogram with sides (r1) and (r2) and diagonal (r1+r2)]
- The determinant is the "volume" of a matrix
- Actually the volume of a parallelepiped formed from its row vectors
  - Also the volume of the parallelepiped formed from its column vectors
- Standard formula for determinant: in text book
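For reference, the standard textbook formulas in the 2-D and 3-D cases are sketched below, with r_1, r_2, r_3 denoting the row vectors. The value is a signed volume: its magnitude is the volume of the parallelepiped, and a negative sign indicates a flip in orientation.

```latex
% 2-D: rows r_1 = (a, b) and r_2 = (c, d) span a parallelogram
\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc

% 3-D: rows r_1, r_2, r_3 span a parallelepiped (scalar triple product)
\det \begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix} = r_1 \cdot (r_2 \times r_3)

% Worked 2-D case: the diagonal matrix that doubles the X axis doubles areas
\det \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} = 2 \cdot 1 - 0 \cdot 0 = 2
```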

Matrix Determinant: Another Perspective
[Figure: an object of Volume = V1 transformed into an object of Volume = V2]
\[
\begin{bmatrix} 0.8 & 0 & 0.7 \\ 1.0 & 0.8 & 0.8 \\ 0.7 & 0.9 & 0.7 \end{bmatrix}
\]
- The determinant is the ratio of N-volumes
  - If V1 is the volume of an N-dimensional object "O" in N-dimensional space
    - O is the complete set of points or vertices that specify the object
  - If V2 is the volume of the N-dimensional object specified by A*O, where A is a matrix that transforms the space
  - |A| = V2 / V1

Matrix Determinants
- Matrix determinants are only defined for square matrices
  - They characterize volumes in linearly transformed space of the same dimensionality as the vectors
- Rank deficient matrices have determinant 0
  - Since they compress full-volumed N-D objects into zero-volume N-D objects
    - E.g. a 3-D sphere into a 2-D ellipse: the ellipse has 0 volume (although it does have area)
- Conversely, all matrices of determinant 0 are rank deficient
  - Since they compress full-volumed N-D objects into zero-volume objects

Multiplication properties
- Properties of vector/matrix products
  - Associative: $A \cdot (B \cdot C) = (A \cdot B) \cdot C$
  - Distributive: $A \cdot (B + C) = A \cdot B + A \cdot C$
  - NOT commutative!!! $A \cdot B \neq B \cdot A$

    - left multiplications ≠ right multiplications
  - Transposition: $(A \cdot B)^T = B^T \cdot A^T$

Determinant properties
- Associative for square matrices: $|A \cdot B \cdot C| = |A| \cdot |B| \cdot |C|$
  - Scaling volume sequentially by several matrices is equal to scaling once by the product of the matrices
- Volume of sum != sum of volumes: $|B + C| \neq |B| + |C|$
  - The volume of the parallelepiped formed by row vectors of the sum of two matrices is not the sum of the volumes of the parallelepipeds formed by the original matrices
- Commutative for square matrices!!! $|A \cdot B| = |B \cdot A| = |A| \cdot |B|$
  - The order in which you scale the volume of an object is irrelevant

Matrix Inversion
- A matrix transforms an N-D object to a different N-D object
- What transforms the new object back to the original?
  - The inverse transformation
- The inverse transformation is called the matrix inverse
\[
T = \begin{bmatrix} 0.8 & 0 & 0.7 \\ 1.0 & 0.8 & 0.8 \\ 0.7 & 0.9 & 0.7 \end{bmatrix}, \qquad
Q = \begin{bmatrix} ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \end{bmatrix} = T^{-1}
\]
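One way to fill in the question marks for the matrix T above is the cofactor (adjugate) formula for a 3x3 inverse. This is a sketch for illustration only: the function names are invented here, and the adjugate formula is used for brevity, not because it is how inverses are computed in practice.

```python
def matmul(A, B):
    """Multiply A (m x k) by B (k x n); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def det3(A):
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def inverse3(A):
    """Inverse of a 3x3 matrix as adjugate / determinant."""
    det = det3(A)
    if abs(det) < 1e-12:
        raise ValueError("matrix is (numerically) rank deficient")
    (a, b, c), (d, e, f), (g, h, i) = A
    adjugate = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adjugate]

# T from the slide.
T = [
    [0.8, 0.0, 0.7],
    [1.0, 0.8, 0.8],
    [0.7, 0.9, 0.7],
]

T_inv = inverse3(T)
should_be_identity = matmul(T_inv, T)
for row in should_be_identity:
    print([round(x, 6) for x in row])
```

Up to rounding, the product printed at the end is the identity matrix, which is the defining property of the inverse.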

Matrix Inversion
[Figure: an object transformed by T and then transformed back by T^{-1}]
\[ T^{-1} T = I \]
- The product of a matrix and its inverse is the identity matrix
  - Transforming an object, and then inverse transforming it gives us back the original object

Inverting rank-deficient matrices
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & .25 & 0.433 \\ 0 & 0.433 & 0.75 \end{bmatrix}
\]
- Rank deficient matrices "flatten" objects
  - In the process, multiple points in the original object get mapped to the same point in the transformed object
- It is not possible to go "back" from the flattened object to the original object
  - Because of the many-to-one forward mapping
- Rank deficient matrices have no inverse

Revisiting Projections and Least Squares
- Projection computes a least squared error estimate

- For each vector V in the music spectrogram matrix
  - Approximation: Vapprox = a*note1 + b*note2 + c*note3 ..
\[
W = \begin{bmatrix} note1 & note2 & note3 \end{bmatrix}, \qquad
V_{approx} = W \begin{bmatrix} a \\ b \\ c \end{bmatrix}
\]
  - Error vector E = V - Vapprox
  - Squared error energy for V: e(V) = norm(E)^2
  - Total error = Total error + e(V)
- Projection computes Vapprox for all vectors such that Total error is minimized
- But WHAT ARE "a", "b" and "c"?

The Pseudo Inverse (PINV)
\[
V_{approx} = W \begin{bmatrix} a \\ b \\ c \end{bmatrix}, \qquad
V \approx W \begin{bmatrix} a \\ b \\ c \end{bmatrix}, \qquad
\begin{bmatrix} a \\ b \\ c \end{bmatrix} = PINV(W) * V
\]
- We are approximating spectral vectors V as the transformation of the vector $[a\ b\ c]^T$
  - Note: we're viewing the collection of bases in W as a transformation
- The solution is obtained using the pseudo inverse
  - This gives us a LEAST SQUARES solution
  - If W were square and invertible, $Pinv(W) = W^{-1}$ and V = Vapprox

Explaining music with one note
[Figure: M = the spectrogram; W = the note; X = PINV(W)*M]
- Recap: $P = W (W^T W)^{-1} W^T$, Projected Spectrogram = P*M
- Approximation: M ≈ W*X
- The amount of W in each vector = X = PINV(W)*M
- W*Pinv(W)*M = Projected Spectrogram = P*M
  - W*Pinv(W) = Projection matrix = $W (W^T W)^{-1} W^T$
  - $PINV(W) = (W^T W)^{-1} W^T$

Explanation with multiple notes
[Figure: M = the spectrogram; W = the notes; X = PINV(W)*M]
- X = Pinv(W) * M; Projected matrix = W*X = W*Pinv(W)*M
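The recipe X = Pinv(W)*M, Projected = W*X can be sketched with two made-up "notes". Everything numeric here is hypothetical, and `pinv_two_columns` implements only the formula (WᵀW)⁻¹Wᵀ from the slide for a W with two linearly independent columns; it is not a general pseudo-inverse.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply A (m x k) by B (k x n); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("columns of W are not linearly independent")
    return [[d / det, -b / det], [-c / det, a / det]]

def pinv_two_columns(W):
    """(W^T W)^-1 W^T, valid when W has two independent columns."""
    Wt = transpose(W)
    return matmul(inverse_2x2(matmul(Wt, W)), Wt)

# Hypothetical notes as columns: note1 = (1, 0, 1), note2 = (0, 1, 1).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical "spectrogram" with two column vectors:
#   (2, 3, 5) = 2*note1 + 3*note2          (fully explained by the notes)
#   (2, 1, 0) = 1*note1 + (1, 1, -1)       (part lies outside the notes)
M = [[2.0, 2.0], [3.0, 1.0], [5.0, 0.0]]

X = matmul(pinv_two_columns(W), M)  # how much of each note is in each vector
projected = matmul(W, X)            # W * Pinv(W) * M, the projected data

print(X)
print(projected)
```

The first column is reproduced exactly, while the second keeps only its note1 component: the part along (1, 1, -1) is orthogonal to both notes and is the least-squares error.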

How about the other way?
[Figure: M = the spectrogram; V = the known matrix; W = ?; U = ?]
- WV ≈ M
- W = M * Pinv(V)
- U = WV

Pseudo-inverse (PINV)
- Pinv() applies to non-square matrices
- Pinv(Pinv(A)) = A
- A*Pinv(A) = projection matrix!
  - Projection onto the columns of A
- If A = K x N matrix and K > N, A projects N-D vectors into a higher-dimensional K-D space
  - Pinv(A)*A = I in this case (provided the columns of A are linearly independent)

Matrix inversion (division)
- The inverse of matrix multiplication
  - Not element-wise division!!
- Provides a way to "undo" a linear transformation
  - Inverse of the unit matrix is itself
  - Inverse of a diagonal is diagonal
  - Inverse of a rotation is a (counter)rotation (its transpose!)
  - Inverse of a rank deficient matrix does not exist!
    - But pseudoinverse exists
- Pay attention to multiplication side!
  - $A \cdot B = C, \quad A = C \cdot B^{-1}, \quad B = A^{-1} \cdot C$
- Matrix inverses defined for square matrices only

  - If matrix not square use a matrix pseudoinverse:
    - $A \cdot B = C, \quad A = C \cdot B^{+}, \quad B = A^{+} \cdot C$
  - MATLAB syntax: inv(a), pinv(a)

What is the Matrix?
- Duality in terms of the matrix identity
  - Can be a container of data
    - An image, a set of vectors, a table, etc
  - Can be a linear transformation
    - A process by which to transform data in another matrix
- We'll usually start with the first definition and then apply the second one on it
  - Very frequent operation
  - Room reverberations, mirror reflections, etc
- Most of signal processing and machine learning are matrix operations!

Eigenanalysis
- If something can go through a process mostly unscathed in character it is an eigen-something
  - Sound example:
- A vector that can undergo a matrix multiplication and keep pointing the same way is an eigenvector
  - Its length can change though
- How much its length changes is expressed by its corresponding eigenvalue
  - Each eigenvector of a matrix has its eigenvalue
- Finding these "eigenthings" is called eigenanalysis

EigenVectors and EigenValues
\[ M = \begin{bmatrix} 1.5 & 0.7 \\ 0.7 & 1.0 \end{bmatrix} \]
[Figure: vectors before and after transformation by M; the black vectors are eigen vectors]
- Vectors that do not change angle upon transformation
  - They may change length
  - $MV = \lambda V$
  - V = eigen vector
  - λ = eigen value
  - Matlab: [V, L] = eig(M)
    - L is a diagonal matrix whose entries are the eigen values
    - V is a matrix whose columns are the eigen vectors
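The slide's MATLAB call `[V, L] = eig(M)` handles general matrices. As a hedged stand-in, the sketch below uses the closed-form solution for a symmetric 2x2 matrix only, with the matrix entries exactly as they appear on the slide; the function name `eig_symmetric_2x2` is an assumption for this example.

```python
import math

def eig_symmetric_2x2(M):
    """Eigenvalues and unit eigenvectors of [[a, b], [b, d]] with b != 0.

    Returns a list of (eigenvalue, (vx, vy)) pairs, largest eigenvalue first.
    """
    (a, b), (_, d) = M
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        vx, vy = b, lam - a  # satisfies (a - lam) * vx + b * vy = 0
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs

M = [[1.5, 0.7], [0.7, 1.0]]
pairs = eig_symmetric_2x2(M)

for lam, (vx, vy) in pairs:
    # Transforming the eigenvector only rescales it by lam.
    Mv = (M[0][0] * vx + M[0][1] * vy, M[1][0] * vx + M[1][1] * vy)
    print(round(lam, 4), (round(vx, 4), round(vy, 4)), tuple(round(x, 4) for x in Mv))
```

Each printed `Mv` is the eigenvector scaled by its eigenvalue, i.e. the direction is unchanged and only the length changes.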

Eigen vector example
[Figure: eigen vector example]

Matrix multiplication revisited
\[ M = \begin{bmatrix} 1.0 & 0.07 \\ 1.1 & 1.2 \end{bmatrix} \]
- Matrix transformation "transforms" the space
  - Warps the paper so that the normals to the two vectors now lie along the axes

A stretching operation
[Figure: two lines drawn on the paper, labelled 1.4 and 0.8]
- Draw two lines
- Stretch / shrink the paper along these lines by factors $\lambda_1$ and $\lambda_2$
  - The factors could be negative: implies flipping the paper
- The result is a transformation of the space

Physical interpretation of eigen vector
\[
V = \begin{bmatrix} V_1 & V_2 \end{bmatrix}, \qquad
L = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}, \qquad
M = V L V^{-1}
\]
- The result of the stretching is exactly the same as transformation by a matrix
- The axes of stretching/shrinking are the eigenvectors
  - The degree of stretching/shrinking are the corresponding eigenvalues
- The EigenVectors and EigenValues convey all the information about the matrix

Eigen Analysis
- Not all square matrices have nice eigen values and vectors
  - E.g. consider a rotation matrix
\[
R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad
X = \begin{bmatrix} x \\ y \end{bmatrix}, \qquad
X_{new} = \begin{bmatrix} x' \\ y' \end{bmatrix}
\]
  - This rotates every vector in the plane
    - No vector remains unchanged
- In these cases the eigen vectors and values are complex
- Some matrices are special however..

Symmetric Matrices
\[ \begin{bmatrix} 1.5 & 0.7 \\ 0.7 & 1 \end{bmatrix} \]
- Matrices that do not change on transposition
  - Row and column vectors are identical
- Symmetric matrix: eigen vectors and eigen values are always real
- Eigen vectors are always orthogonal
  - At 90 degrees to one another

Symmetric Matrices
\[ \begin{bmatrix} 1.5 & 0.7 \\ 0.7 & 1 \end{bmatrix} \]
- Eigen vectors point in the direction of the major and minor axes of the ellipsoid resulting from the transformation of a spheroid
  - The eigen values are the lengths of the axes

Symmetric matrices
- Eigen vectors $V_i$ are orthonormal
  - $V_i^T V_i = 1$
  - $V_i^T V_j = 0, \ i \neq j$
- Listing all eigen vectors in matrix form V
  - $V^T = V^{-1}$
  - $V^T V = I$
  - $V V^T = I$
- $M V_i = \lambda_i V_i$
- In matrix form: $M V = V L$
  - L is a diagonal matrix with all eigen values
- $M = V L V^T$
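The last line follows from the two facts listed just before it. A short derivation, using only the symbols already defined on this slide, and written out as a sum over eigen vectors as well:

```latex
% Start from the matrix form of the eigen equation and use V^{-1} = V^T
M V = V L
\;\Rightarrow\; M = V L V^{-1} = V L V^T

% Expanding the product column by column gives a sum of rank-1 terms
M = \sum_i \lambda_i \, V_i V_i^T
```

Read this way, a symmetric matrix stretches space by $\lambda_i$ along each orthonormal direction $V_i$, which matches the stretching picture from the earlier slides.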