Skip to content

Latest commit

 

History

History
273 lines (167 loc) · 16.6 KB

README.md

File metadata and controls

273 lines (167 loc) · 16.6 KB

18.335J/6.7310J: Introduction to Numerical Methods

This is the repository of course materials for the 18.335J/6.7310J course at MIT, taught by Dr. Andrew Horning, in Spring 2023.

Syllabus

Lectures: Tuesday/Thursday 11am–12:30pm in room 4-370

Office Hours: Wednesday and Thursday at 12:30 - 1:30 in room 2-238C.

Contact: horninga@mit.edu

Topics: Advanced introduction to numerical linear algebra and related numerical methods. Topics include direct and iterative methods for linear systems, eigenvalue decompositions and QR/SVD factorizations, stability and accuracy of numerical algorithms, the IEEE floating-point standard, sparse and structured matrices, and linear algebra software. Other topics may include memory hierarchies and the impact of caches on algorithms, nonlinear optimization, numerical integration, FFTs, and sensitivity analysis. Problem sets will involve use of Julia, a Matlab-like environment (little or no prior experience required; you will learn as you go).

Launch a Julia environment in the cloud: Binder

Prerequisites: Understanding of linear algebra (18.06, 18.700, or equivalents). 18.335 is a graduate-level subject, however, so much more mathematical maturity, ability to deal with abstractions and proofs, and general exposure to mathematics is assumed than for 18.06!

Textbook: The primary textbook for the course is Numerical Linear Algebra by Trefethen and Bau. For a classical (and rigorous) treatment, see Nick Higham's Accuracy and Stability of Numerical Algorithms.

Other Reading: Previous terms can be found in branches of the 18335 git repository. The course notes from 18.335 in much earlier terms can be found on OpenCourseWare. For a review of iterative methods, the online books Templates for the Solution of Linear Systems (Barrett et al.) and Templates for the Solution of Algebraic Eigenvalue Problems are useful surveys.

Grading: 40% problem sets (four psets due / released every other Friday), 30% take-home mid-term exam (second week of April), 30% final project (one-page proposal due Friday April 7, project due Tuesday May 16).

TA/grader: TBD

Collaboration policy: Talk to anyone you want to and read anything you want to, with three exceptions: First, you may not refer to homework solutions from the previous terms in which I taught 18.335. Second, make a solid effort to solve a problem on your own before discussing it with classmates or googling. Third, no matter whom you talk to or what you read, write up the solution on your own, without having their answer in front of you.

Final Projects: The final project will be an 8–15 page paper reviewing some interesting numerical algorithm not covered in the course. See the 18.335 final-projects page for more information, including topics from past semesters.

Assignments

  • Pset 1 is due on Friday, February 24 at 11:59pm.

  • Pset 2 is due on Friday, March 17 at 11:59pm.

  • Pset 3 is due on Friday, April 14 at 11:59pm.

Lecture Summaries and Handouts

Lecture 1 (February 7)

This course is about Numerical Linear Algebra (NLA) and related numerical methods. But why do we need NLA? How does it fit in to other areas of computational science and engineering (CSE)? Three simple examples demonstrate how NLA problems arise naturally when solving problems drawn from across continuous applied mathematics.

  • Solving Poisson's equation: from charge density to electric potential. (Linear systems: LU and Cholesky, iterative methods.)
  • Dynamic Mode Decomposition: learning models from data. (Least squares: QR factorization, SVD, low-rank approximation.)
  • Charge density of interacting electrons: NLA in nonlinear problems. (Eigenvalue problem: QR algorithm, iterative methods)

NLA is often applied in tandem with tools from other fields of mathematics: approximation theory, functional analysis, and statistics, to name a few. We'll focus on NLA, which is a computational workhorse within CSE. The starting point is floating point: how do we respresent real numbers on the computer?

Further Reading: L.N. Trefethen, The Definition of Numerical Analysis.

Lecture 2 (February 9)

  • Floating point arithmetic, exact rounding, and the "fundamental axiom"
  • Catastrophic cancellation, overflow, underflow
  • Forward and backward stability
  • Stability of summation algorithms

Further Reading: L. N. Trefethen, Lectures 13 and 14. Also, see the notebook about floating point.

Lecture 3 (February 14)

  • Vector and matrix norms
  • Jacobian and condition numbers
  • Accuracy <= backward stable algorithms + well-conditioned problems

Further Reading: L. N. Trefethen, Lectures 12 and 15.

Lecture 4 (February 16)

  • Solving Ax = b
  • Condition number of A
  • Orthogonal/Unitary matrices
  • The singular value decomposition (SVD)

** Further Reading:** L. N. Trefethen, Lectures 4 and 5.

Lecture 5 (February 23)

  • Gaussian elimination and A = LU (no row interchanges)
  • Cost (flops) for A = LU and Ux = b
  • Solving Ax=b via A=LU: when to save L?
  • Partial pivoting and backward stability (counterexamples)

** Further Reading:** L. N. Trefethen, Lectures 20, 21, and 22. Can you bring GE's instability to life in this example?

Lecture 6 (February 28)

This lecture is remote (MIT only): https://mit.zoom.us/j/96174583317

  • Partial pivoting and PA=LU
  • Solving Ax=b via PA=LU
  • Backward stability of PA=LU in theory and in practice
  • Sparse LU factors via row and column pivoting

** Further Reading:** L. N. Trefethen, Lectures 21 and 22. See the MATLAB docs for PA=LU for more examples of pivoting in action.

Lecture 7 (March 2)

  • The Cholesky decomposition: symmetric elimination for symmetric positive definite (SPD) matrices
  • Cost (1/2 of A=LU) and stability (backward stable, no pivoting) of Cholesky
  • Column and row pivoting to improve sparsity
  • Least-squares solutions to overdetermined linear systems

** Further Reading:** L. N. Trefethen, Lecture 23. See the MATLAB docs for Cholesky and, e.g., approximate minimal degree (AMD) reordering to reduce fill-in.

Lecture 8 (March 7)

Three ways to compute least-squares solutions to overdetermined linear systems

  • The normal equations
  • The singular value decomposition
  • The QR decomposition

All three formulations lead to the pseudoinverse of A and are mathematically equivalent, but not numerically equivalent. The SVD approach leverages orthogonal projections to simplify the pseudoinverse and avoid squaring the condition number of the resulting n x n diagonal system for the least-squares solution. While the SVD diagonalizes A by rotating/reflecting inputs and outputs, the QR decomposition triangularizes A by rotating/reflecting outputs. The result is an orthogonal basis for the column space of A and a triangular n x n system to solve for the least-squares solution.

Further Reading: L. N. Trefethen, Lectures 11, 18, and 19.

Lecture 9 (March 9)

  • Sensitivity and conditioning of least-squares solutions
  • Gram-Schmidt orthogonalization and A = QR
  • Loss of orthogonality and modified Gram-Schmidt
  • Householder triangularization

Further Reading: L. N. Trefethen, Lectures 7, 8, and 9.

Lecture 10 (March 14)

  • Triangularization vs orthogonalization
  • "Thin" and "full" QR decompositions
  • Computing and applying Householder reflectors
  • Householder QR algorithm
  • Flop count and backward stability

Further Reading: L. N. Trefethen, Lectures 10 and 16. For backward stability analysis, see Chapter 18.3 of Higham.

Lecture 11 (March 16)

  • Householder reflections and Givens rotations demos
  • Eigenvalues, eigenvectors, and diagonalizable matrices
  • Why do we need eigenvectors? (And not just the SVD)

Further Reading: L. N. Trefethen, Lecture 24. Explore the reflections-and-rotations notebook and the least-squares notebook for more computational demos with least-squares and A=QR in Julia.

Lecture 12 (March 21)

  • Eigensolvers must be iterative
  • Simultaneous power iterations
  • The QR iteration for eigenvalues
  • Convergence of QR iterations

Further Reading: L. N. Trefethen, Lecture 25 and 28. For the remarkable stories of John Francis and Vera Kublanovskaya, who independently proposed and analyzed the QR algorithm, see Gene Golub and Frank Uhlig's article.

Lecture 13 (March 23)

  • Accelerating power iterations with shift-and-invert
  • Rayleigh quotient iterations
  • QR iterations with shifts
  • Tridiagonal reduction and QR iterations

Further Reading: L. N. Trefethen, Lectures 26, 27, and 29.

Lecture 14 (April 4)

  • Wilkinson's condition number for simple eigenvalues
  • "Sine Theta" theorem for eigenvectors of Hermitian matrices
  • The pseudospectrum of a matrix

Further Reading: Check out the Pseudospectral Gateway for more about pseudospectral analysis and its remarkable power in applications across applied math. For more about the perturbation theory of invariant subspaces, see Prof. Ipsen excellent summary of Pete Stewart's contributions to the area.

Lecture 15 (April 6)

  • Iterative methods and mat-vec paradigm
  • Build a subspace and extract approximations: Galerkin and Rayleigh-Ritz projections
  • Arnoldi iteration: power iteration meets modified Gram-Schmidt

Further Reading: L. N. Trefethen, Lectures 32 and 33. Check out the SuiteSparse Matrix Collection for four decades worth of sparse, structured matrix examples.

Lecture 16 (April 11)

  • Arnoldi decomposition
  • Approximating eigenvalues with Arnoldi
  • Convergence of Ritz values

Further Reading: L. N. Trefethen, Lectures 33 and 34. For a beautiful explanation of the Ritz values' behavior using potential theory, see the SIAM review paper by Arno Kuijlaars. You can also read about the implicitly restarted Arnoldi iteration and the bells and whistles that made it a standard iterative eigensolver for non-Hermitian matrices.

Lecture 17 (April 13)

  • Arnoldi for Ax=b
  • The method of Generalized Minimal Residuals (GMRES)
  • Convergence of GMRES

Further Reading: L. N. Trefethen, Lecture 35.

Lecture 18 (April 20)

  • Symmetric and symmetric positive definite matrices
  • The Lanczos iteration
  • The method of Conjugate Gradients (CG)

Further Reading: L. N. Trefethen, Lectures 36 and 38.

Lecture 19 (April 25)

  • Convergence of CG
  • CG as an optimization algorithm
  • Gradient descent

Further Reading: L. N. Trefethen, Lecture 38.

Lecture 20 (April 27)

  • Convergence of gradient descent
  • CG vs gradient descent
  • Newton's method

Further Reading: See this excellent introduction to CG from first principles.

Lecture 21 (May 2)

  • Newton's method in higher dimensions
  • Secant and quasi-Newton methods
  • Broyden's updates and BFGS

Further Reading: See Prof. Johnson's notes on the BFGS update (find there also the original references of Broyden, Fletcher, Goldfarb, and Shanno). For a modern perspective on secant and multi-secant methods, see Feng and Saad.

Lecture 22 (May 4)

  • Many big data matrices are approximately low-rank
  • Optimal low-rank approximation and the SVD
  • Randomized range-finders

Further Reading: See the now classic review paper by Halko, Martinsson, and Tropp for an introduction to randomized range-finders and approximate matrix decompositions. This interesting paper by Udell and Townsend explores the origins of low-rank structure in big-data matrices.

Lecture 23 (May 9)

  • Accuracy of randomized range-finders
  • Oversampling and failure probability
  • Accelerating NLA via "sketch-and-solve"

Further Reading: Read, for example, the recent work of Nakatsukasa and Tropp to see how randomized "sketch-and-solve" techniques can accelerate established iterative methods for linear systems and the eigenproblem.

Lecture 24 (May 11)

  • Dynamic Mode Decomposition (DMD)
  • Procrustes problems over matrix manifolds
  • physics informed Dynamic Mode Decomposition (piDMD)

Further Reading: Check out Steven Brunton's youtube channel for a variety of wonderful presentations on DMD and its applications, including a video on piDMD by Peter Baddoo. You can also find the piDMD paper here.

Lecture 25 (May 16)

  • Why (and when) do we count flops?
  • Memory hierarchy and the "ideal cache"
  • Cache-aware matrix-matrix multiplication

Further Reading: Check out these slides and experiments by Prof. Johnson from 2008 on the memory hierarchy.