Matrices
A Matrix encodes an operator’s ‘instructions’ on how to transform vectors in a space, based on the coordinate system (Basis Vectors) used. A different coordinate system gives a different matrix for the same operator.
“Applying” a Matrix means that you change this Vector Space and all the objects/vectors you’ve embedded in it in some way. You shrink it, stretch it, rotate it by some angle, flip it inside-out, smush it into lower dimensions, or just leave it alone! In some cases, you can even change your mind and smash an undo button called “Invertibility”.
Types of Matrices
Oh there’s several!
Square Matrix
What it says. An $n \times n$ matrix.
Identity Matrix
A square matrix (because it’s the “do nothing” operator that maps $v \mapsto v$) with 1s on the diagonal and 0s everywhere else, e.g. $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ in 2D.
Singular Matrix
Super important. These are square matrices that lose information. All these statements are saying this same thing:
- “determinant zero”
- “not invertible”
- “rank deficient”
- “columns linearly dependent”
- “rows linearly dependent”
- “nullspace contains nonzero vector” — contains some non-zero vector $x$ such that $Ax = 0$
- “transformation collapses dimensions”
- “zero is an eigenvalue”
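A quick NumPy sketch (the matrix here is just an illustrative pick: its second column is twice its first) showing several of these equivalent statements lining up at once:

```python
import numpy as np

# Columns are linearly dependent: col2 = 2 * col1.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

det = np.linalg.det(A)           # determinant zero
rank = np.linalg.matrix_rank(A)  # rank 1 < 2: rank deficient

# A nonzero vector in the nullspace: A @ x = 0.
x = np.array([2.0, -1.0])
null_image = A @ x
```

Same matrix, three of the bullet points confirmed in three lines.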
Invertible Matrix
You really need to think about what this does to the vector space. A matrix is invertible if it loses no information when it transforms the space: it preserves all independent directions/bases. These do not:
- Flatten dimensions
- Collapse directions
- Merge distinct vectors
like Singular matrices. It’s still a square matrix $A$ that has some other square matrix $B$ such that $AB = BA = I$.
This other square matrix is the Inverse of $A$ and is denoted $A^{-1}$. It has some properties:
- Its Determinant is not zero.
- It has Full Rank
- If $x$ is some vector, $Ax = 0$ has only one solution: $x$ is full of zeroes!
- If $b$ is some vector, $Ax = b$ has just one solution: $x = A^{-1}b$
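Here’s a small NumPy sketch of those properties (the matrix and vector are arbitrary examples I made up):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])      # det = 1, so invertible

A_inv = np.linalg.inv(A)
should_be_I = A @ A_inv         # A times its inverse gives the identity

b = np.array([3.0, 2.0])
x = np.linalg.solve(A, b)       # the one and only solution to A x = b
```

Note that `np.linalg.solve` is preferred over computing `A_inv @ b` directly; it’s faster and numerically better behaved.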
Orthogonal Matrix
These things do rotations and reflections without destroying information: $Q^T Q = I$, so lengths and angles (the geometry) are preserved. PCA components are orthogonal. The $U$ and $V$ in SVD are orthogonal. These are important things, yo.
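A NumPy sketch of “geometry is preserved”, using a plain 45-degree rotation matrix as the example:

```python
import numpy as np

theta = np.pi / 4                       # 45-degree rotation
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Orthogonal: Q^T Q = I, so the transpose IS the inverse.
QtQ = Q.T @ Q

# Lengths survive the transformation.
v = np.array([3.0, 4.0])
len_before = np.linalg.norm(v)
len_after = np.linalg.norm(Q @ v)
```

That “transpose is the inverse” bit is why orthogonal matrices are so cheap to undo.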
Diagonal Matrix
What it says. Just values on the diagonal. These are super nice since you scale independently along all axes. This is the goal of (the middle part of) the SVD decomposition for a reason.
Symmetric Matrix
A Symmetric matrix is one where $A = A^T$.
Conjugate Matrix
Just for matrices with Complex Numbers. A Conjugate matrix just flips the sign of the imaginary part of any complex numbers in a matrix and is denoted $\bar{A}$.
Hermitian Matrix
A Hermitian matrix is when a matrix equals its Conjugate Transpose: $A = A^H$, where $A^H = \overline{A^T}$. Pretty important in ML and Quantum Mechanics. First Transpose, then take the Conjugate.
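A tiny NumPy check (the complex matrix is a made-up example chosen to be Hermitian):

```python
import numpy as np

A = np.array([[2.0 + 0j, 1.0 - 1j],
              [1.0 + 1j, 3.0 + 0j]])

# Conjugate transpose: transpose, then conjugate.
A_H = A.conj().T
is_hermitian = np.allclose(A, A_H)
```

Notice the diagonal entries have to be purely real for this to work out.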
Matrix Operations
Here’s an awesome cookbook of all the stuff you can do with matrices.
Transposition
Transposes are when you turn a matrix $A$’s rows into columns and vice-versa and denote the monstrosity $A^T$. They’re just a different kind of transformation and are useful depending on the problem you’re trying to solve. They have some properties, like $(A^T)^T = A$ and $(AB)^T = B^T A^T$.
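A quick NumPy sketch of the shape-flip and the standard transpose properties (the matrices are arbitrary picks):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # 2x3
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])         # 3x2

# Rows become columns: 2x3 flips to 3x2.
shape_before, shape_after = A.shape, A.T.shape

# (A^T)^T = A, and the product rule reverses the order: (AB)^T = B^T A^T.
double_transpose_ok = np.array_equal(A.T.T, A)
product_rule_ok = np.array_equal((A @ B).T, B.T @ A.T)
```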
Inversion
Sort of like division but with matrices: $A A^{-1} = A^{-1} A = I$. Not all matrices are invertible. $A^{-1}$ ‘un-does’ the transformation of the vector space specified by $A$.
Commutation
In general, $AB \neq BA$. You can verify this yourself with two matrices. But there are cases where $AB = BA$ holds:
- If $B = cI$ for some scalar $c$, then $AB = BA$ (i.e. you can scale the Identity Matrix all you want)
- Diagonal matrices commute with each other (though not with everything else)
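You can verify all of this with a few NumPy lines (the specific matrices are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# In general AB != BA:
commute_general = np.allclose(A @ B, B @ A)          # False for this pair

# But a scaled identity commutes with everything:
C = 3.0 * np.eye(2)
commute_scaled_identity = np.allclose(A @ C, C @ A)  # True

# And two diagonal matrices commute with each other:
D1 = np.diag([1.0, 2.0])
D2 = np.diag([5.0, 7.0])
commute_diagonal = np.allclose(D1 @ D2, D2 @ D1)     # True
```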
Singular Value Decomposition
This is really cool. It decomposes every matrix (which remember is a transformation) into Rotation → Scaling → Rotation: $A = U \Sigma V^T$. It says “Every Matrix is built from rotations + scaling.” That’s it. Every complicated transformation can be decomposed/understood this way!
- $V^T$ rotates. Columns of $V$ are input directions, the right singular vectors.
- $\Sigma$ scales (stretch and shrink) along perpendicular directions; its diagonal holds the stretch factors (the singular values).
- $U$ rotates again. Columns of $U$ are output directions, the left singular vectors.
The “directions” here refer to geometry and not the coordinates (which are representations of the geometry!)
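Here’s the whole story in NumPy (the matrix is an arbitrary symmetric example, chosen so the stretch factors come out as nice whole numbers):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# Rotation -> Scaling -> Rotation.
U, s, Vt = np.linalg.svd(A)

# Gluing the three pieces back together recovers A exactly.
A_rebuilt = U @ np.diag(s) @ Vt

# U is orthogonal (so is Vt): pure rotation/reflection, no distortion.
u_orthogonal = np.allclose(U.T @ U, np.eye(2))
```

`np.linalg.svd` returns the singular values `s` as a 1-D array sorted largest-first; you rebuild the middle matrix with `np.diag`.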
Eigendecomposition
This is very closely related to SVD but not the same. And not all matrices can be eigendecomposed because not all of them have eigenvectors and eigenvalues! This wants to find the directions preserved by the transformation but SVD is more general and wants to find the orthogonal directions stretched independently. Eigendecomposition is: $A = Q \Lambda Q^{-1}$
Where $Q$ contains columns of eigenvectors and $\Lambda$ holds the eigenvalues on its diagonal. Looks suspiciously similar!
Eigendecomposition is based on some ideal: “This transform has some invariant directions.”
Real World™ data is messy. SVD is the more Universal decomposition. “Every transform can be understood as rotations and scaling.” or “Which perpendicular directions experience pure stretching?”
This is why SVD is used a lot more in ML. This is why PCA and Recommender Systems and Transformers rely on it.
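The “invariant directions” idea, sketched in NumPy (symmetric example matrix picked so the eigenvalues are clean):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Q's columns are the eigenvectors; eigvals are the stretch factors
# along those invariant directions.
eigvals, Q = np.linalg.eig(A)

# A = Q @ diag(eigvals) @ Q^-1
A_rebuilt = Q @ np.diag(eigvals) @ np.linalg.inv(Q)

# The defining property: A v = lambda v, direction preserved.
v = Q[:, 0]
direction_preserved = np.allclose(A @ v, eigvals[0] * v)
```

`np.linalg.eig` doesn’t guarantee any particular eigenvalue ordering, so sort before comparing.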
PCA
This is basically centered SVD!
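A sketch of “PCA = centered SVD” in NumPy. The synthetic dataset here (points stretched far more along x than y, with a fixed random seed) is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 points: lots of spread along x, very little along y.
X = rng.normal(size=(200, 2)) * np.array([5.0, 0.5])

# "Centered SVD": subtract the column means, then SVD the data matrix.
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Rows of Vt are the principal components; the first one should point
# (up to sign) along the stretched x-axis.
pc1 = Vt[0]
```

That’s the whole trick: the centering step is the only thing separating PCA from a plain SVD of the data.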
Matrix Properties
Traces
This is just the sum of the diagonal elements of a Square Matrix.
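One line in NumPy, plus a nice bonus fact: the trace equals the sum of the eigenvalues (example matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, 9.0],
              [9.0, 2.0]])

tr = np.trace(A)                      # 1 + 2 = 3; off-diagonals ignored
eig_sum = np.linalg.eigvals(A).sum()  # same number, the long way around
```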
Determinants
This gives you a scalar (boring-ass number) of your matrix/transformation. Different matrices/transformations (rotate, shear, stretch) can have the same determinant! It’s a kind of summary metric. These are only defined for square matrices (think of composability).
- The absolute value tells you how much the objects in the Vector Space are scaled by (e.g. $|\det| = 2$ doubles areas in 2D, $|\det| = 3$ triples volumes in 3D)
- The sign tells you whether the orientation is preserved.
So $\det = -3$ in 2D space means the area is tripled and the orientation is flipped. What about $\det = 0$? This means that the space is collapsed into some lower dimension (4D → 3, 2, 1, or 0). The zero means that “some dimension disappeared.” Consider $A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, which maps $(x, y) \mapsto (x, 0)$. An entire dimension disappeared! So $\det(A) = 0$.
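Both cases in NumPy (a stretch-and-flip matrix, and a collapse-onto-the-x-axis matrix, both arbitrary examples):

```python
import numpy as np

# Scale x by 3 and reflect y: area tripled, orientation flipped.
A = np.array([[3.0,  0.0],
              [0.0, -1.0]])
det_A = np.linalg.det(A)        # -3

# Squash everything onto the x-axis: a whole dimension disappears.
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])
det_P = np.linalg.det(P)        # 0
```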
Rank
This is an easy concept but is pretty important downstream. It’s the number of linearly independent rows or columns of a matrix.
When do you pick rows versus columns? The smallest of the two: if you have a ‘rectangular’ matrix ($m$ rows $\times$ $n$ columns), $\text{rank} \le \min(m, n)$.
A “Full Rank” matrix is one where there are no linearly dependent (not independent!) rows or columns (whichever is smallest). So if you have a matrix that’s 4 rows and 3 columns, the maximum rank possible is 3. Now look at the columns and see if you can figure out if one column depends on the other. Didn’t find any? Awesome, you have a Full Rank matrix.
Found one that depends on the other? Your rank is 2. Found two? Rank 1. See this Wikipedia article on Row Echelon Forms for more.
A Full-Rank matrix has no ‘redundant instructions’ and always has a non-zero Determinant! If you have a 3x3 matrix with rank 2, one column is a linear combination of the others (only 2 independent columns).
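The 4-rows-by-3-columns example above, in NumPy (the specific entries are made up; what matters is the column dependence):

```python
import numpy as np

# 4x3 matrix: maximum possible rank is min(4, 3) = 3.
full = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [1.0, 1.0, 1.0]])

# Inject a redundant instruction: make col 2 = col 0 + col 1.
deficient = full.copy()
deficient[:, 2] = deficient[:, 0] + deficient[:, 1]

rank_full = np.linalg.matrix_rank(full)       # 3: full rank
rank_def = np.linalg.matrix_rank(deficient)   # 2: one column is redundant
```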
Other Tidbits
The Matrix-Vector Derivative Identity
Importantly, if $y = Ax$, then the partial derivative is $\frac{\partial y}{\partial x} = A$, since $y_i = \sum_j A_{ij} x_j$ means $\frac{\partial y_i}{\partial x_j} = A_{ij}$.
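A finite-difference sanity check of this identity in NumPy (the matrix, the test point `x0`, and the step size are all arbitrary choices):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

def f(x):
    return A @ x                  # y = A x

# Numerically estimate the Jacobian column by column;
# it should recover A itself.
x0 = np.array([0.5, -0.2])
eps = 1e-6
J = np.zeros((2, 2))
for j in range(2):
    dx = np.zeros(2)
    dx[j] = eps
    J[:, j] = (f(x0 + dx) - f(x0)) / eps
```

Since $f$ is linear, the finite difference is exact up to floating-point noise.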
Random
A Square Matrix is invertible iff it has Full Rank.
Cramer’s Rule
Easier shown with an example. Heaven forbid you compute things by hand these days…
Replace the $i$-th column of $A$ by $b$ to get $A_i$:
By Cramer’s Rule, $x_i = \frac{\det(A_i)}{\det(A)}$ solves $Ax = b$.
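So let the machine do the example. A NumPy sketch of Cramer’s Rule (the system $A$, $b$ here is a made-up 2x2 example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

# Cramer's Rule: x_i = det(A_i) / det(A), where A_i is A with
# its i-th column replaced by b.
det_A = np.linalg.det(A)
x = np.zeros(2)
for i in range(2):
    A_i = A.copy()
    A_i[:, i] = b
    x[i] = np.linalg.det(A_i) / det_A
```

(In practice you’d call `np.linalg.solve` instead; Cramer’s Rule is lovely for intuition and terrible for big matrices.)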