
Matrices

A Matrix encodes an operator’s ‘instructions’ on how to transform vectors in a space based on the coordinate system (Basis Vectors) used. Different system $\implies$ a different matrix.

“Applying” a Matrix means that you change this Vector Space and all the objects/vectors you’ve embedded in it in some way. You shrink it, stretch it, rotate it by some angle, flip it inside-out, smush it into lower dimensions, or just leave it alone! In some cases, you can even change your mind and smash an undo button called “Inversion”.

Types of Matrices

Oh there’s several!

Square Matrix

What it says. An $n \times n$ matrix.

Identity Matrix

A square matrix (because it’s the “do nothing” operator that maps $V \rightarrow V$) that looks like this.

$$I_n = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$$

Singular Matrix

Super important. These are square matrices that lose information. All these statements are saying this same thing:

  • “determinant zero”
  • “not invertible”
  • “rank deficient”
  • “columns linearly dependent”
  • “rows linearly dependent”
  • “nullspace contains nonzero vector”: there is some non-zero vector $v$ such that $Av = 0$
  • “transformation collapses dimensions”
  • “zero is an eigenvalue”
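
Here’s a minimal numpy sketch of several of these equivalent checks agreeing on one singular matrix (the specific matrix and nullspace vector are just made-up illustrations):

```python
import numpy as np

# A 3x3 matrix whose third column is the sum of the first two,
# so the columns are linearly dependent and the matrix is singular.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 9.0],
              [7.0, 8.0, 15.0]])

print(np.linalg.det(A))          # ~0: "determinant zero"
print(np.linalg.matrix_rank(A))  # 2 < 3: "rank deficient"
print(np.linalg.eigvals(A))      # one eigenvalue is ~0: "zero is an eigenvalue"

# "nullspace contains nonzero vector": v = (1, 1, -1) satisfies Av = 0
v = np.array([1.0, 1.0, -1.0])
print(A @ v)                     # ~[0, 0, 0]
```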

Invertible Matrix

You really need to think about what this does to the vector space. A matrix is invertible if it loses no information when it transforms the space: it preserves all independent directions/bases. These do not:

  • Flatten dimensions
  • Collapse directions
  • Merge distinct vectors

the way Singular matrices do. It’s still a square matrix $A \in \mathbb{R}^{n \times n}$ that has some other square matrix $B \in \mathbb{R}^{n \times n}$ such that

$$AB = BA = I_n$$

This other square matrix $B$ is the Inverse of $A$ and is denoted $A^{-1}$. An invertible $A$ has some properties:

  1. Its Determinant is not zero.
  2. It has Full Rank.
  3. If $x$ is some vector ($x \in \mathbb{R}^n$), $Ax = 0$ has only one solution: $x$ is full of zeroes!
  4. If $b$ is some vector ($b \in \mathbb{R}^n$), $Ax = b$ has just one solution, $x = A^{-1}b$.
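
A quick numpy sketch of properties 3 and 4 (the matrix and right-hand side below are arbitrary examples, not anything canonical):

```python
import numpy as np

# An arbitrary invertible 2x2 matrix (nonzero determinant, full rank).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # AB = I
print(np.allclose(A_inv @ A, np.eye(2)))   # BA = I

# Ax = b has exactly one solution x = A^{-1} b.
x = np.linalg.solve(A, b)                  # numerically preferred over A_inv @ b
print(x)                                   # [1. 3.]
print(np.allclose(A @ x, b))               # True
```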

Orthogonal Matrix

$$AA^T = A^TA = I \implies A^T = A^{-1}$$

These things do rotations and reflections without destroying information. Geometry is preserved. PCA components are orthogonal. The $U$ and $V$ in SVD are orthogonal. These are important things, yo.
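
For instance, a plain 2D rotation (a made-up 45° example) is orthogonal, and you can check both claims numerically:

```python
import numpy as np

theta = np.pi / 4  # 45-degree rotation
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))           # Q^T Q = I, so Q^T = Q^{-1}

v = np.array([3.0, 4.0])
print(np.linalg.norm(v), np.linalg.norm(Q @ v))  # both 5.0: lengths are preserved
```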

Diagonal Matrix

What it says. Just values on the diagonal. These are super nice since you scale independently along all axes. This is the goal of (the middle part of) the SVD decomposition for a reason.

$$A = \begin{bmatrix} 12 & 0 & 0 & \cdots & 0 \\ 0 & -1 & 0 & \cdots & 0 \\ 0 & 0 & 56 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 9 \end{bmatrix}$$

Symmetric Matrix

A Symmetric matrix is one where $A = A^T$.

Conjugate Matrix

Just for matrices with Complex Numbers. A Conjugate matrix just flips the sign of the imaginary part of every complex entry in a matrix and is denoted $\bar{A}$ (sometimes $A^*$).

Hermitian Matrix

A Hermitian matrix is one that equals its Conjugate Transpose: $A = \bar{A}^T$. Pretty important in ML and Quantum Mechanics. First Transpose, then take the Conjugate (either order gives the same result).


Matrix Operations

Here’s an awesome cookbook of all the stuff you can do with matrices.

Transposition

Transposes are when you turn a matrix $A$’s rows into columns and vice-versa and denote the monstrosity $A^T$. They’re just a different kind of transformation and are useful depending on the problem you’re trying to solve. They have some properties.

  • $(A^T)^T = A$
  • $(AB)^T = B^TA^T$
  • $(A+B)^T = A^T + B^T$
  • $\det(A^T) = \det(A)$
  • $(\alpha A)^T = \alpha A^T$
  • $(A^{-1})^T = (A^T)^{-1}$
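
These are easy to sanity-check numerically; here’s a small numpy sketch over random matrices (the seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(np.allclose((A @ B).T, B.T @ A.T))                    # (AB)^T = B^T A^T
print(np.allclose((A + B).T, A.T + B.T))                    # (A+B)^T = A^T + B^T
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))     # det(A^T) = det(A)
print(np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T)))  # (A^{-1})^T = (A^T)^{-1}
```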

Inversion

Sort of like division but with matrices: $A \cdot A^{-1} = I$. Not all matrices are invertible. $A^{-1}$ ‘un-does’ the transformation of the vector space specified by $A$.

Commutation

In general, $AB \ne BA$. You can verify this yourself with two $2 \times 2$ matrices. But there are cases where this holds:

  • $AI = IA = A$
  • $A0 = 0A = 0$
  • If $B = \lambda I$ for some scalar $\lambda$ then $AB = \lambda A = BA$ (i.e. you can scale the Identity Matrix all you want)
  • Diagonal matrices commute with each other
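
A tiny numpy demo of non-commutativity (the two matrices are arbitrary picks):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A @ B)                          # [[2, 1], [4, 3]]: columns of A swapped
print(B @ A)                          # [[3, 4], [1, 2]]: rows of A swapped
print(np.array_equal(A @ B, B @ A))   # False: order matters

# But scaled identities commute with everything.
lam = 3
print(np.array_equal(A @ (lam * np.eye(2)), (lam * np.eye(2)) @ A))  # True
```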

Singular Value Decomposition

This is really cool. It decomposes every matrix (which remember is a transformation) into Rotation → Scaling → Rotation: it says “Every Matrix is built from rotations + scaling.” That’s it. Every complicated transformation can be decomposed/understood this way!

$$A = U\Sigma V^T$$
  • $V^T$ rotates. Columns of $V$ are the input directions, the right singular vectors.
  • $\Sigma$ scales (stretch and shrink) along perpendicular directions; its entries are the stretch factors.
  • $U$ rotates again. Columns of $U$ are the output directions, the left singular vectors.

The “directions” here refer to geometry and not the coordinates (which are representations of the geometry!)
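
Here’s a short numpy sketch of the decomposition on a made-up 3×2 matrix, checking that the pieces really are rotations (orthonormal columns) plus scaling:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(s)                                    # singular values (the stretch factors), largest first
print(np.allclose(U.T @ U, np.eye(2)))      # columns of U are orthonormal (output directions)
print(np.allclose(Vt @ Vt.T, np.eye(2)))    # rows of Vt are orthonormal (input directions)
print(np.allclose(U @ np.diag(s) @ Vt, A))  # rotate -> scale -> rotate rebuilds A exactly
```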

Eigendecomposition

This is very closely related to SVD but not the same. And not all matrices can be eigendecomposed, because not every square matrix has a full set of linearly independent eigenvectors! This wants to find the directions preserved by the transformation, but SVD is more general and wants to find the orthogonal directions stretched independently. Eigendecomposition is:

$$A = P D P^{-1}$$

Where $P$ contains the eigenvectors as columns and $D$ holds the eigenvalues on its diagonal. Looks suspiciously similar!
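
A small numpy sketch (using a symmetric example matrix, which is guaranteed to be diagonalizable) showing the pieces:

```python
import numpy as np

# A symmetric matrix always has a full set of eigenvectors.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, P = np.linalg.eig(A)      # columns of P are eigenvectors
D = np.diag(eigvals)               # eigenvalues on the diagonal

print(eigvals)                                          # 3 and 1 (order may vary)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))         # A = P D P^{-1}
print(np.allclose(A @ P[:, 0], eigvals[0] * P[:, 0]))   # A v = lambda v for an eigenvector
```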

Important

Eigendecomposition is based on an ideal: “This transform has some invariant directions.”

Real World™ data is messy. SVD is the more Universal decomposition: “Every transform can be understood as rotations and scaling,” or “Which perpendicular directions experience pure stretching?”

This is why SVD is used a lot more in ML. This is why PCA and Recommender Systems and Transformers rely on it.

PCA

This is basically the SVD of the mean-centered data matrix!
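
As a sketch of that claim (the synthetic data and mixing matrix below are made up), PCA is just: center the data, take the SVD, and read the principal directions off $V^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake 2-D data with a strong correlation between the two features.
X = rng.standard_normal((200, 2)) @ np.array([[3.0, 1.0],
                                              [1.0, 0.5]])

# PCA = SVD of the *centered* data matrix.
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt                     # rows are the principal directions (orthogonal!)
explained_var = s**2 / (len(X) - 1)

print(components)
print(explained_var)                # variance captured along each principal direction
scores = X_centered @ Vt.T          # the data expressed in the principal-component basis
```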

Matrix Properties

Traces

This is just the sum of the diagonal elements of a Square Matrix: $\mathrm{Tr}(A) = \sum_i A_{i,i}$

Determinants

This gives you a scalar (boring-ass number) of your matrix/transformation. Different matrices/transformations (rotate, shear, stretch) can have the same determinant! It’s a kind of summary metric. These are only defined for square matrices (think of composability).

  • The absolute value tells you how much the objects in the Vector Space are scaled by (e.g. $|\det| = 2$ in 2D means areas double, $|\det| = 3$ in 3D means volumes triple)
  • The sign tells you whether the orientation is preserved.

So $\det(A) = -3$ in 2D space means the area is tripled and the orientation is flipped. What about $\det(A) = 0$? This means that the space is collapsed into some lower dimension (4D → 3, 2, 1, or 0). The zero means that “some dimension disappeared.” Consider:

$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

that maps $(x,y,z,p) \rightarrow (x,y,z,0)$. An entire dimension disappeared! So $\det(A) = 0$.
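
You can confirm both examples numerically (the 2D matrix with determinant $-3$ below is just one arbitrary choice):

```python
import numpy as np

# The projection matrix from above: it zeroes out the fourth coordinate.
A = np.eye(4)
A[3, 3] = 0.0
print(np.linalg.det(A))        # 0.0: a dimension disappeared

# A 2-D example with det = -3: areas are tripled and orientation flips.
B = np.array([[0.0, 3.0],
              [1.0, 0.0]])
print(np.linalg.det(B))        # -3.0
```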

Rank

This is an easy concept but is pretty important downstream. It’s the number of linearly independent rows or columns of a matrix.

When do you pick rows versus columns? It doesn’t actually matter: row rank and column rank are always equal. What the smaller dimension gives you is a bound: for a ‘rectangular’ $m \times n$ matrix (always rows $\times$ columns), $\text{rank} \leq \min(m, n)$.

A “Full Rank” matrix is one where there are no linearly dependent (not independent!) rows or columns, i.e. the rank hits that $\min(m, n)$ bound. So if you have a matrix that’s 4 rows and 3 columns, the maximum rank possible is 3. Now look at the columns and see if you can figure out whether any column depends on the others. Didn’t find any? Awesome, you have a Full Rank matrix.

Found one that depends on the others? Your rank is 2. Found two? Rank 1. See this Wikipedia article on Row Echelon Forms for more.

This means something with transformation!

A Full-Rank matrix has no ‘redundant instructions’ and (if it’s square) always has a non-zero Determinant! If you have a 3×3 matrix with rank 2, one of the columns is a linear combination of the others (a redundant direction).
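
A quick numpy check on the 4-rows-by-3-columns scenario above (both example matrices are made up):

```python
import numpy as np

# 4 rows x 3 columns, so the rank can be at most 3.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 3.0, 5.0],
              [1.0, 1.0, 2.0]])    # third column = first + second

print(np.linalg.matrix_rank(A))   # 2: one column is redundant, so not full rank

B = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
print(np.linalg.matrix_rank(B))   # 3: full rank for a 4x3 matrix
```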

Other Tidbits

The Matrix-Vector Derivative Identity

$$\frac{\partial}{\partial{x}} (x^T A x) = (A + A^T)x$$

Importantly, if $f(x) = x^T x$, then the gradient is $2x$ since $A = I$.
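
You can sanity-check the identity with a finite-difference gradient (random $A$ and $x$, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

analytic = (A + A.T) @ x           # the identity above

# Numerical gradient of f(x) = x^T A x via central differences.
eps = 1e-6
numeric = np.zeros(3)
for i in range(3):
    e = np.zeros(3)
    e[i] = eps
    numeric[i] = ((x + e) @ A @ (x + e) - (x - e) @ A @ (x - e)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))    # True

# Special case A = I: gradient of x^T x is 2x.
print(np.allclose((np.eye(3) + np.eye(3)) @ x, 2 * x))
```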

Random

A Square Matrix is invertible iff it has Full Rank.

Cramer’s Rule

Easier shown with an example. Heaven forbid you compute things by hand these days…

Solve $A\mathbf{x}=\mathbf{b}$ with

$$A=\begin{bmatrix} 2 & 1 & -1\\ -3 & -1 & 2\\ -2 & 1 & 2 \end{bmatrix},\quad \mathbf{x}=\begin{bmatrix}x\\y\\z\end{bmatrix},\quad \mathbf{b}=\begin{bmatrix}8\\-11\\-3\end{bmatrix}, \qquad \det(A) = -1.$$

Replace the $i$-th column of $A$ by $\mathbf{b}$ to get $A_i$:

$$A_1= \begin{bmatrix} 8 & 1 & -1\\ -11 & -1 & 2\\ -3 & 1 & 2 \end{bmatrix},\quad A_2= \begin{bmatrix} 2 & 8 & -1\\ -3 & -11 & 2\\ -2 & -3 & 2 \end{bmatrix},\quad A_3= \begin{bmatrix} 2 & 1 & 8\\ -3 & -1 & -11\\ -2 & 1 & -3 \end{bmatrix}.$$

$$\det(A_1)=-2,\qquad \det(A_2)=-3,\qquad \det(A_3)=1.$$

By Cramer’s Rule,

$$x=\frac{\det(A_1)}{\det(A)}=\frac{-2}{-1}=2,\qquad y=\frac{\det(A_2)}{\det(A)}=\frac{-3}{-1}=3,\qquad z=\frac{\det(A_3)}{\det(A)}=\frac{1}{-1}=-1.$$

$$\boxed{(x,y,z)=(2,\,3,\,-1)}$$
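
If you’d rather not do this by hand, here’s a numpy sketch that runs Cramer’s Rule on the same system and compares it against np.linalg.solve:

```python
import numpy as np

A = np.array([[ 2.0,  1.0, -1.0],
              [-3.0, -1.0,  2.0],
              [-2.0,  1.0,  2.0]])
b = np.array([8.0, -11.0, -3.0])

# Cramer's Rule: replace column i of A with b, take determinants, divide.
det_A = np.linalg.det(A)
x_cramer = np.zeros(3)
for i in range(3):
    A_i = A.copy()
    A_i[:, i] = b
    x_cramer[i] = np.linalg.det(A_i) / det_A

print(x_cramer)                    # [ 2.  3. -1.]
print(np.linalg.solve(A, b))       # same answer, and how you'd actually do it
```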