
Linear Algebra

Basis Space and Basis Vectors

Imagine these in 2D as ‘tiling’ a vector space. Imagine making an even grid with those long pieces from Erector Sets. You can shear/smush them only at certain angles. Now imagine that the long pieces can only stretch or shrink lengthwise. That’s kinda what these are.

Now “applying” a Matrix means that you change this Vector Space and all the vectors you’ve embedded in it in some way. You shrink it, stretch it, rotate it by some angle, flip it inside-out, or just leave it alone! In some cases, you can even change your mind and smash an undo button called the “Inverse” (see Invertible Matrix below).
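To see this in code, here’s a minimal NumPy sketch (NumPy is my choice of tool here, not something these notes prescribe) that rotates, stretches, and then undoes a transform:

```python
import numpy as np

theta = np.pi / 2  # a quarter turn, counter-clockwise
rotate = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
stretch = np.array([[2.0, 0.0],    # double along x, leave y alone
                    [0.0, 1.0]])

v = np.array([1.0, 0.0])
print(rotate @ v)    # ~[0. 1.] -- rotated a quarter turn
print(stretch @ v)   # [2. 0.]  -- stretched along x
print(np.linalg.inv(rotate) @ (rotate @ v))  # ~[1. 0.] -- the "undo button"
```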

Vector Norm

This is a function that takes a vector and maps it to $\mathbb{R}_{\geq 0}$ (only the zero vector gets norm 0) so you get an idea of the ‘size’ or ‘length’ of the vector.

$$\|a\|_p = \left( \sum_{i=1}^{n} |a_i|^p \right)^{1/p}$$

That’s the p-norm. Using that, and for $p = 1, 2, \infty$, you get these norms:

  1. Manhattan $\|a\|_1 = |a_1| + |a_2| + \cdots + |a_n|$
  2. Euclidean $\|a\|_2 = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}$
  3. Infinity $\|a\|_\infty = \max_i |a_i|$
    AKA The Fuck It Norm
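If you want to poke at these, here’s a quick NumPy sketch (the vector is made up for illustration):

```python
import numpy as np

a = np.array([3.0, -4.0])
print(np.linalg.norm(a, 1))       # 7.0 -- Manhattan: |3| + |-4|
print(np.linalg.norm(a, 2))       # 5.0 -- Euclidean: sqrt(3^2 + 4^2)
print(np.linalg.norm(a, np.inf))  # 4.0 -- Infinity: max(|3|, |-4|)
```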

TODO: When does one use each?

Eigenvectors (and Eigenvalues)

These are just vectors (of course) but they relate to matrices: an eigenvector $v$ of a matrix $A$ is one that $A$ merely scales, $Av = \lambda v$, and the scale factor $\lambda$ is the eigenvalue.

TODO: Finish this.
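Until then, here’s a minimal NumPy sketch of the $Av = \lambda v$ idea (the diagonal matrix is just a convenient toy example):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
vals, vecs = np.linalg.eig(A)
v = vecs[:, 0]             # eigenvector paired with vals[0]
print(vals)                # [2. 3.]
print(A @ v, vals[0] * v)  # both [2. 0.] -- A just scales v
```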

Dot and Cross Products

Vectors Only Please

Note that Dot and Cross Products are only defined for Vectors. I mean there are things like the Kronecker Product but that’s not what we’re dealing with here.

Dot Products

These are easy-peasy and tell you about how well two vectors vibe with each other. The result is a number. Consider two vectors of the same size, $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$:

$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta = \sum_{k=1}^{n} a_k b_k$$

That’s about it. If you get a zero, they’re orthogonal (at $90^\circ$ in 2D space). That cosine is a good similarity measure that’s used in all manner of Machine Learning algos, including LLMs. E.g. recall that $\cos(90^\circ) = 0$, which you can take to mean that the vectors aren’t similar at all.
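Here’s a quick NumPy sketch of both the dot product and the cosine trick (the `cosine_similarity` helper is mine, not a library function):

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (||u|| ||v||)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])        # orthogonal to a
c = np.array([2.0, 0.0])        # same direction as a, different length

print(np.dot(a, b))             # 0.0 -- no vibe at all
print(cosine_similarity(a, c))  # 1.0 -- maximum vibe
```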

Cross Products

These work in 3D for the most part and will give you a new vector that is orthogonal/perpendicular to the plane of the two input vectors (which are 3D!). I’ve never used them for anything. Read this for more.
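For completeness, a tiny NumPy sketch anyway:

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])      # the x axis
b = np.array([0.0, 1.0, 0.0])      # the y axis
n = np.cross(a, b)
print(n)                           # [0. 0. 1.] -- the z axis
print(np.dot(n, a), np.dot(n, b))  # 0.0 0.0 -- orthogonal to both inputs
```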

Matrix Rank

This is an easy concept but is pretty important downstream. It’s the number of linearly independent rows or columns of a matrix.

When do you pick rows versus columns? It doesn’t actually matter: the number of linearly independent rows always equals the number of linearly independent columns. For a ‘rectangular’ $m \times n$ matrix (always rows $\times$ columns), $\text{rank} \leq \min(m, n)$.

A “Full Rank” matrix is one where there are no linearly dependent (not independent!) rows or columns, i.e. the rank hits that $\min(m, n)$ ceiling. So if you have a matrix that’s 4 rows and 3 columns, the maximum rank possible is 3. Now look at the columns and see if you can figure out whether any column depends on the others. Didn’t find any? Awesome, you have a Full Rank matrix.

Found a column that depends on the others? Your rank is 2. Found two? Rank 1. See this Wikipedia article on Row Echelon Forms for more.
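NumPy will count this for you; a quick sketch (the matrices are made up for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 0.0, 1.0]])
print(np.linalg.matrix_rank(A))  # 3 -- Full Rank for a 4x3 matrix

B = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0]])  # every column is a multiple of the first
print(np.linalg.matrix_rank(B))  # 1
```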

Identity Matrix

A nice simple square matrix that looks like this.

$$I_n = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$$

Determinants

This gives you a scalar (boring-ass number) from your matrix. This number tells you by how much applying the matrix will scale an area (2D) or volume (3D+) of the space, and its sign tells you about orientation: if the number’s negative you get a mirror image.
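A minimal NumPy sketch of both behaviors:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])     # stretch x by 2, y by 3
print(np.linalg.det(A))        # 6.0 -- areas get 6x bigger

flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])  # swap x and y: a mirror image
print(np.linalg.det(flip))     # -1.0 -- area kept, orientation flipped
```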

TODO: More here…

Commutativity

In general, $AB \ne BA$. You can verify this yourself with two $2 \times 2$ matrices. But there are cases where this holds (sanity-checked in the sketch after this list):

  • $AI = IA = A$
  • $A0 = 0A = 0$ (the zero matrix annihilates everything)
  • If $B = \lambda I$ for some scalar $\lambda$ then $AB = \lambda A = BA$ (i.e. you can scale the Identity Matrix all you want)
  • Diagonal matrices commute with each other
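A quick NumPy sanity check of these rules (matrices made up for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(np.allclose(A @ B, B @ A))  # False -- order matters in general

D1 = np.diag([2.0, 3.0])
D2 = np.diag([5.0, 7.0])
print(np.allclose(D1 @ D2, D2 @ D1))  # True -- diagonal matrices commute
```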

Invertible Matrix

This is a square matrix $A \in \mathbb{R}^{n \times n}$ that has some other square matrix $B \in \mathbb{R}^{n \times n}$ such that

$$AB = BA = I_n$$

This other square matrix $B$ is the Inverse of $A$ and is denoted $A^{-1}$. An invertible $A$ has some properties:

  1. Its Determinant is not zero.
  2. It has Full Rank.
  3. If $x$ is some vector ($x \in \mathbb{R}^n$), $Ax = 0$ has only one solution: $x$ is full of zeroes!
  4. If $b$ is some vector ($b \in \mathbb{R}^n$), $Ax = b$ has just one solution: $x = A^{-1}b$
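A minimal NumPy sketch (one note of my own: in practice you’d call `np.linalg.solve` rather than forming the inverse, for numerical stability):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))  # True -- AB = I

b = np.array([3.0, 2.0])
print(A_inv @ b)               # [1. 1.] -- the unique solution to Ax = b
print(np.linalg.solve(A, b))   # same answer, computed more stably
```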

Singular Matrix

This is a square matrix ($n \times n$) where

  1. The Determinant is Zero
  2. It’s not Full Rank
  3. There’s some non-zero vector $x$ such that $Ax = 0$
  4. It is not invertible!

These things smush a vector space into fewer dimensions. Well, really they map everything into a lower-dimensional subspace (the original space is untouched), but yeah.
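A minimal NumPy sketch of a singular matrix ticking all four boxes:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])       # second row = 2 * first row
print(np.linalg.det(A))          # 0.0 -- determinant is zero
print(np.linalg.matrix_rank(A))  # 1   -- not Full Rank

x = np.array([2.0, -1.0])
print(A @ x)                     # [0. 0.] -- a non-zero x with Ax = 0
# np.linalg.inv(A) would raise LinAlgError: Singular matrix
```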

Transposes

Transposes are when you turn a matrix $A$’s rows into columns and vice-versa and denote the monstrosity $A^T$. They’re just a different kind of transformation and are useful depending on the problem you’re trying to solve. They have some properties (spot-checked after the list):

  • $(A^T)^T = A$
  • $(AB)^T = B^T A^T$
  • $(A+B)^T = A^T + B^T$
  • $\det(A^T) = \det(A)$
  • $(\alpha A)^T = \alpha A^T$
  • $(A^{-1})^T = (A^T)^{-1}$
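A quick NumPy spot-check of a few of these (the matrices are made-up examples):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [5.0, 6.0]])

print(np.allclose((A @ B).T, B.T @ A.T))                    # True -- order reverses
print(np.allclose(np.linalg.det(A.T), np.linalg.det(A)))    # True
print(np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T)))  # True
```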

Miscellaneous

Other Types of Matrices

  • An Orthogonal matrix is one where $A^T = A^{-1}$
  • A Symmetric matrix is one where $A = A^T$
  • A Conjugate matrix just flips the sign of the imaginary part of any complex numbers in a matrix.
  • A Hermitian matrix is when a matrix equals its Conjugate Transpose: $A = \bar{A}^T$
    Pretty important in ML and Quantum Mechanics
  • TODO: Conjugate and Adjoint matrices…
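A minimal NumPy sketch checking the first, second, and fourth definitions (toy matrices of my own):

```python
import numpy as np

theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotations are orthogonal
print(np.allclose(Q.T, np.linalg.inv(Q)))        # True

S = np.array([[1.0, 2.0],
              [2.0, 3.0]])
print(np.allclose(S, S.T))                       # True -- symmetric

H = np.array([[2.0,    1 + 1j],
              [1 - 1j, 3.0   ]])
print(np.allclose(H, H.conj().T))                # True -- Hermitian
```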

Cramer’s Rule

Easier shown with an example. Heaven forbid you compute things by hand these days…

Solve $A\mathbf{x}=\mathbf{b}$ with

$$A=\begin{bmatrix} 2 & 1 & -1\\ -3 & -1 & 2\\ -2 & 1 & 2 \end{bmatrix},\quad \mathbf{x}=\begin{bmatrix}x\\y\\z\end{bmatrix},\quad \mathbf{b}=\begin{bmatrix}8\\-11\\-3\end{bmatrix},\qquad \det(A) = -1.$$

Replace the $i$-th column of $A$ by $\mathbf{b}$ to get $A_i$:

$$A_1= \begin{bmatrix} 8 & 1 & -1\\ -11 & -1 & 2\\ -3 & 1 & 2 \end{bmatrix},\quad A_2= \begin{bmatrix} 2 & 8 & -1\\ -3 & -11 & 2\\ -2 & -3 & 2 \end{bmatrix},\quad A_3= \begin{bmatrix} 2 & 1 & 8\\ -3 & -1 & -11\\ -2 & 1 & -3 \end{bmatrix}.$$

$$\det(A_1)=-2,\qquad \det(A_2)=-3,\qquad \det(A_3)=1.$$

By Cramer’s Rule,

$$x=\frac{\det(A_1)}{\det(A)}=\frac{-2}{-1}=2,\qquad y=\frac{\det(A_2)}{\det(A)}=\frac{-3}{-1}=3,\qquad z=\frac{\det(A_3)}{\det(A)}=\frac{1}{-1}=-1.$$

$$\boxed{(x,y,z)=(2,\,3,\,-1)}$$
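And a minimal NumPy sketch that replays the example (the column-swap loop is a direct transcription of the rule):

```python
import numpy as np

A = np.array([[ 2.0,  1.0, -1.0],
              [-3.0, -1.0,  2.0],
              [-2.0,  1.0,  2.0]])
b = np.array([8.0, -11.0, -3.0])

# Cramer's Rule: x_i = det(A_i) / det(A), A_i = A with column i replaced by b
x = np.empty(3)
for i in range(3):
    A_i = A.copy()
    A_i[:, i] = b
    x[i] = np.linalg.det(A_i) / np.linalg.det(A)

print(x)                      # [ 2.  3. -1.]
print(np.linalg.solve(A, b))  # same answer, and what you'd actually use
```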