🧠 AI Math Roadmap: From Zero to Building GPT From Scratch

Phase 1: Foundations (Start Here)

1. Basic Math Refresher (if needed)

  • Arithmetic, exponents, logarithms
  • Algebra: solving equations, factorization
  • Functions and graphs
  • Inequalities, absolute value
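
Even at this level it pays to verify rules by hand; a minimal Python sketch (standard library only, values chosen arbitrarily):

```python
import math

# Exponent rule: a^m * a^n == a^(m+n)
a, m, n = 2.0, 3.0, 5.0
assert math.isclose(a**m * a**n, a**(m + n))

# Logarithm rules: log(xy) = log(x) + log(y); change of base
x, y = 7.0, 11.0
assert math.isclose(math.log(x * y), math.log(x) + math.log(y))
assert math.isclose(math.log(x, 2), math.log(x) / math.log(2))

# Algebra: solving 3x + 4 = 19 gives x = (19 - 4) / 3 = 5
assert (19 - 4) / 3 == 5.0
```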

2. Linear Algebra (Core for All AI Models)

  • Vectors and vector spaces
  • Dot product, cross product
  • Matrices: multiplication, inverse, transpose
  • Matrix rank, determinant, trace
  • Systems of linear equations (Gaussian elimination)
  • Eigenvalues and eigenvectors
  • Diagonalization and Singular Value Decomposition (SVD)
  • Norms, projections, orthogonality
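
To make several of these concrete, a minimal sketch assuming NumPy is installed (everything else is defined in the snippet):

```python
import numpy as np

v = np.array([3.0, 4.0])
u = np.array([1.0, 1.0])
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

print(v @ v, np.linalg.norm(v))     # dot product 25.0, norm 5.0
proj = (v @ u) / (u @ u) * u        # projection of v onto u

# Solve A @ x = b (Gaussian elimination under the hood)
b = np.array([1.0, 0.0])
x = np.linalg.solve(A, b)

# Eigendecomposition of the symmetric matrix A
eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)                      # [1. 3.]

# SVD reconstructs A; rank, trace, determinant
U, s, Vt = np.linalg.svd(A)
assert np.allclose(U @ np.diag(s) @ Vt, A)
print(np.linalg.matrix_rank(A), np.trace(A), np.linalg.det(A))
```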

📚 Resources:

  • 3Blue1Brown's Essence of Linear Algebra
  • MIT OCW - Linear Algebra (Gilbert Strang)

3. Calculus (Backprop Depends on It)

  • Limits and continuity
  • Derivatives (single-variable, then partial)
  • Chain rule
  • Gradients, Jacobians, Hessians
  • Optimization: critical points, max/min
  • Multivariable integrals
  • Taylor series
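
Gradients can always be sanity-checked numerically; a sketch using central differences (NumPy assumed; the helper name `numerical_grad` is ours):

```python
import numpy as np

def f(x):
    # f(x0, x1) = x0^2 * x1 + sin(x1); gradient: (2*x0*x1, x0^2 + cos(x1))
    return x[0]**2 * x[1] + np.sin(x[1])

def numerical_grad(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

x = np.array([1.5, 0.7])
analytic = np.array([2 * x[0] * x[1], x[0]**2 + np.cos(x[1])])
assert np.allclose(numerical_grad(f, x), analytic, atol=1e-5)
```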

📚 Resources:

  • MIT OCW Calculus I–III
  • Khan Academy

Phase 2: Probabilistic Thinking

4. Probability & Statistics

  • Probability rules: union, intersection, conditional
  • Random variables (discrete and continuous)
  • Expectation, variance, covariance
  • Distributions: Bernoulli, Binomial, Gaussian, Poisson
  • Bayes' Theorem
  • Law of Large Numbers, Central Limit Theorem
  • Entropy, cross-entropy, KL divergence
  • MLE, MAP estimation
  • Markov chains
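
Two of these ideas in runnable form: Bayes' Theorem on a classic diagnostic-test example, and the Central Limit Theorem by simulation (NumPy assumed; the test numbers are made up for illustration):

```python
import numpy as np

# Bayes' Theorem: P(disease | +) = P(+|d) P(d) / P(+)
p_d = 0.01                                   # prior
p_pos_d, p_pos_nd = 0.95, 0.05               # sensitivity, false-positive rate
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)
print(p_pos_d * p_d / p_pos)                 # ~0.16, lower than most people guess

# Central Limit Theorem: means of uniform samples look Gaussian
rng = np.random.default_rng(0)
means = rng.uniform(0, 1, size=(10_000, 30)).mean(axis=1)
print(means.mean(), means.std())             # ~0.5 and ~sqrt(1/12)/sqrt(30) ~ 0.053
```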

📚 Resources:

  • StatQuest (YouTube)
  • Khan Academy - Probability & Stats

Phase 3: Learning to Train

5. Optimization

  • Gradient descent: SGD, Adam, RMSProp
  • Loss functions and minimization
  • Convex vs non-convex functions
  • Saddle points, local/global minima
  • Momentum, learning rate schedules
  • Lagrange multipliers
  • Lipschitz continuity
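
The core loop is small enough to write by hand; a sketch of gradient descent with momentum on a convex quadratic (NumPy assumed, constants arbitrary):

```python
import numpy as np

# Minimize f(w) = ||A @ w - b||^2; the gradient is 2 A^T (A w - b)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0])

def grad(w):
    return 2 * A.T @ (A @ w - b)

w = np.zeros(2)
velocity = np.zeros(2)
lr, beta = 0.05, 0.9                  # learning rate, momentum coefficient

for _ in range(200):
    velocity = beta * velocity - lr * grad(w)
    w = w + velocity

print(w)                              # -> [0.4, -0.2]
print(np.linalg.solve(A, b))          # exact minimizer, for comparison
```

Adam and RMSProp are variations on this loop that rescale the step per coordinate using running gradient statistics.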

📚 Resources:

  • Convex Optimization by Boyd & Vandenberghe
  • Deep Learning (Goodfellow et al.) - Chapters 4 and 8

6. Numerical Methods

  • Floating point precision
  • Numerical differentiation
  • Gradient checking
  • LU, QR decomposition
  • Efficient matrix operations
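
A sketch of why this matters, plus gradient checking in practice (NumPy assumed):

```python
import numpy as np

# Floating point is not exact arithmetic
print(0.1 + 0.2 == 0.3)                                      # False
print(np.float32(1e8) + np.float32(1.0) - np.float32(1e8))   # 0.0: the 1.0 is swamped

# Gradient check: compare an analytic gradient to central differences
def f(w):
    return np.sum(w**3)

def analytic_grad(w):
    return 3 * w**2

w = np.random.default_rng(1).normal(size=4)
eps = 1e-5
numeric = np.array([(f(w + eps * e) - f(w - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
rel_err = (np.linalg.norm(numeric - analytic_grad(w))
           / np.linalg.norm(numeric + analytic_grad(w)))
print(rel_err)    # tiny (~1e-10) here; above ~1e-4 usually signals a bug
```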

📚 Resources:

  • Numerical Recipes (book)
  • MIT OCW Numerical Methods

Phase 4: Building Intelligence

7. Information Theory

  • Entropy
  • Joint, marginal, conditional entropy
  • Cross-entropy loss
  • KL divergence
  • Mutual information
  • Perplexity
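
These quantities are one-liners, and the identity connecting them is worth internalizing; a sketch (NumPy assumed, distributions made up):

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])   # "true" distribution
q = np.array([0.4, 0.4, 0.2])     # model's distribution

entropy = -np.sum(p * np.log2(p))           # H(p) = 1.5 bits
cross_entropy = -np.sum(p * np.log2(q))     # H(p, q)
kl = np.sum(p * np.log2(p / q))             # D_KL(p || q) >= 0

# The identity behind cross-entropy loss: H(p, q) = H(p) + KL(p || q)
assert np.isclose(cross_entropy, entropy + kl)

# Perplexity is exponentiated cross-entropy (base matching the log)
print(2 ** cross_entropy)
```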

📚 Resources:

  • Elements of Information Theory - Cover & Thomas
  • StatQuest - Info Theory

8. Graph Theory

  • Directed Acyclic Graphs (DAGs)
  • Topological sorting
  • Adjacency matrices
  • Attention graphs
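
Topological order is how an autodiff engine schedules its backward pass, a pattern you will reuse in the final phase; here is Kahn's algorithm on its own (standard library only, toy node names):

```python
from collections import defaultdict

def topo_sort(edges):
    """Kahn's algorithm: repeatedly emit nodes with no remaining incoming edges."""
    indegree = defaultdict(int)
    adj = defaultdict(list)
    nodes = set()
    for u, v in edges:               # directed edge u -> v
        adj[u].append(v)
        indegree[v] += 1
        nodes.update((u, v))
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for m in adj[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    return order                     # shorter than `nodes` iff the graph has a cycle

# A tiny compute graph: x feeds h, both feed loss
print(topo_sort([("x", "h"), ("h", "loss"), ("x", "loss")]))   # ['x', 'h', 'loss']
```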

📚 Resources:

  • MIT 6.006 - Intro to Algorithms
  • Khan Academy

9. Combinatorics & Discrete Math

  • Set theory, logic
  • Permutations and combinations
  • Recursion and recurrence relations
  • Trees and graphs
  • Symbolic representations
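
A quick tour in code (standard library only):

```python
import math
from functools import lru_cache

# Permutations and combinations
print(math.perm(5, 2))    # 20 ordered pairs from 5 items
print(math.comb(5, 2))    # 10 unordered pairs

# A recurrence relation, memoized: F(n) = F(n-1) + F(n-2)
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))            # instant with memoization, exponential time without
```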

📚 Resources:

  • Discrete Mathematics by Rosen
  • MIT OCW - Mathematics for CS

Phase 5: Advanced & Specialized Topics

10. Fourier Analysis & Signal Processing

  • Fourier series, transforms
  • Discrete Fourier Transform (DFT, FFT)
  • Convolutions (1D, 2D)
  • Frequency domain filtering
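
The convolution theorem (convolve in time = multiply in frequency) is checkable in a few lines; a sketch with NumPy's FFT:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=64)              # signal
k = rng.normal(size=64)              # kernel
n = len(x)

# Direct circular convolution: (x * k)[i] = sum_m x[m] k[(i - m) mod n]
direct = np.array([sum(x[m] * k[(i - m) % n] for m in range(n))
                   for i in range(n)])

# Via the DFT: multiply the spectra, then invert
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

assert np.allclose(direct, via_fft)
```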

📚 Resources:

  • The Scientist & Engineer's Guide to DSP
  • 3Blue1Brown - Fourier Intuition

11. Topology & Manifolds (Advanced)

  • Metric spaces, continuity
  • Manifolds and embeddings
  • Lipschitz mappings
  • Generalization theory
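
These topics are mostly theoretical, but Lipschitz mappings can at least be probed numerically: a bound |f(a) - f(b)| <= L |a - b| limits how much f stretches distances. A rough empirical sketch (NumPy assumed):

```python
import numpy as np

def f(x):
    return np.tanh(x)    # tanh is 1-Lipschitz, since sup |tanh'(x)| = 1

rng = np.random.default_rng(0)
a = rng.normal(size=100_000)
b = rng.normal(size=100_000)

# Stretch ratios over random pairs: a lower bound on the Lipschitz constant
ratios = np.abs(f(a) - f(b)) / np.abs(a - b)
print(ratios.max())      # approaches 1 and never exceeds it
```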

📚 Resources:

  • Topology by Munkres
  • Geometric Deep Learning literature

12. Category Theory (God Mode)

  • Sets, categories, morphisms
  • Functors, monoids, natural transformations
  • Compositionality of learning systems
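
Compositionality is visible in ordinary code: mapping a composed function is the same as composing the maps (a functor law). A sketch:

```python
def compose(f, g):
    """Morphism composition: (f . g)(x) = f(g(x))."""
    return lambda x: f(g(x))

f = lambda x: x + 1
g = lambda x: x * 2
xs = [1, 2, 3]

# The list functor preserves composition: map(f . g) == map(f) . map(g)
assert list(map(compose(f, g), xs)) == list(map(f, map(g, xs)))

# ...and identity: map(id) == id
identity = lambda x: x
assert list(map(identity, xs)) == xs
```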

📚 Resources:

  • Category Theory for Programmers - Bartosz Milewski
  • nLab (for deep research)

Final Phase: Put It All Together

You'll be able to:

  • Build a neural net from scratch (forward + backprop)
  • Write your own autodiff engine
  • Train it with your own optimizer
  • Implement transformers and attention
  • Build a GPT-style model from raw math and code
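
As a taste of the destination, a minimal scalar autodiff engine in the spirit of micrograd; a sketch, not a framework (only + and * are implemented):

```python
class Value:
    """A scalar that records how it was computed, for reverse-mode autodiff."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():                        # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():                        # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically order the graph (graph theory!), then chain-rule in reverse
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x, y = Value(2.0), Value(3.0)
loss = x * y + x
loss.backward()
print(x.grad, y.grad)   # 4.0 2.0, matching d(xy + x)/dx = y + 1 and d/dy = x
```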