{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
"# Mandatory assignment 1 in MAT1110, V26\n",
"\n",
"Øyvind Ryan  \n",
"2026-02-12\n",
"\n",
"This mandatory assignment concerns an important application of the\n",
"chain rule in machine learning. There is a lot of text and many\n",
"exercises here, and how much of it you do and read is up to you - the\n",
"only mandatory part is that you actually submit something in Canvas\n",
"(the deadline is 26 February at 14:30). You will receive feedback from\n",
"your group teacher on what you hand in, and all submissions will be\n",
"approved.\n",
"\n",
"Note that most of the exercises in this text are considerably easier\n",
"to solve than reading and understanding the entire text (which is more\n",
"difficult than the rest of the course material, as a number of\n",
"technical matters come up).\n",
"\n",
"We therefore recommend that you start by solving some of the\n",
"exercises, in particular Exercises 1.1, 1.2, 1.3, 1.6 and 1.7 (in that\n",
"order). If this goes well, you can try to read more of the text and\n",
"look at the remaining exercises (1.4, 1.5, 1.8). If this also goes\n",
"well, you can try to read the rest of the text.\n",
"\n",
"The assignment is in English. You decide whether you write your\n",
"answers in English or Norwegian.\n",
"\n",
"On the course pages you can find a Jupyter notebook with the Python\n",
"code in this mandatory assignment, where you can run and experiment\n",
"with the code. This code is somewhat more advanced than what you\n",
"encountered in IN1900.\n",
        "\n",
        "# 1. Backpropagation and the chain rule\n",
        "\n",
        "## 1.1 Introduction\n",
        "\n",
"In machine learning one tries to find a function which approximates a\n",
"given set of measurement data as well as possible. For this to be\n",
"possible computationally, we require that the candidate functions have a\n",
"given form. If the measurement data are $\\mathbf{x}_i\\in\\mathbb{R}^n$,\n",
"$\\mathbf{y}_i\\in\\mathbb{R}^n$, we would like to find an $\\mathbf{F}$ of\n",
"the given form so that “the total error”\n",
"$$\\sum_{i=1}^m | \\mathbf{F}(\\mathbf{x}_i)-\\mathbf{y}_i |^2$$ in the\n",
"measurement data is as small as possible. We assume that the functions\n",
"of the given form are uniquely described by a set of *parameters*\n",
"$w_1,w_2,...$, and we then write\n",
"$$\\mathbf{F}_{w_1,w_2,...}: \\mathbb{R}^n\\to\\mathbb{R}^n$$ for\n",
"$\\mathbf{F}$. Our question becomes: Which choice of parameters gives a\n",
"function $\\mathbf{F}_{w_1,w_2,...}$ with total error\n",
"<span id=\"eq-find-min\">$$f(w_1,w_2,...)=\\sum_{i=1}^m | \\mathbf{F}_{w_1,w_2,...}(\\mathbf{x}_i)-\\mathbf{y}_i |^2 \\qquad(1)$$</span>\n",
"as small as possible? This is a minimisation problem where $w_1,w_2,...$\n",
"are the variables (not $\\mathbf{x}_i$ and $\\mathbf{y}_i$; these are\n",
"given measurement data!). The $f$ in\n",
"<a href=\"#eq-find-min\" class=\"quarto-xref\">Equation 1</a> is also called\n",
"the *loss function*.\n",
        "\n",
        "Finding the $w_1,w_2,...$ which give the minimum in\n",
"<a href=\"#eq-find-min\" class=\"quarto-xref\">Equation 1</a> turns out to\n",
"be difficult. Finding good approximations to this minimum can be\n",
        "simpler: From analysis in several variables we know that the *gradient*\n",
        "of a function $f:\\mathbb{R}^n\\to\\mathbb{R}$,\n",
        "$$\\nabla f = \\begin{pmatrix} \\frac{\\partial f}{\\partial x_1}, \\frac{\\partial f}{\\partial x_2},...,\\frac{\\partial f}{\\partial x_n}\\end{pmatrix},$$\n",
        "points in the direction where $f$ increases the most. Therefore, the\n",
        "function will decrease the most if we move in the direction of the\n",
        "negative gradient, and this is exactly what we will do: We compute the\n",
        "gradient of $f$ given by\n",
"<a href=\"#eq-find-min\" class=\"quarto-xref\">Equation 1</a>, and follow\n",
        "this a little, to reach a point where $f$ has a smaller value.\n",
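"\n",
"The idea of following the negative gradient can be illustrated on a\n",
"simple function of two variables. The function `f` below is chosen\n",
"purely for illustration and is not part of the assignment:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def f(w):\n",
"    return (w[0] - 1)**2 + 2*(w[1] + 3)**2\n",
"\n",
"def grad_f(w):\n",
"    return np.array([2*(w[0] - 1), 4*(w[1] + 3)])\n",
"\n",
"w = np.array([5.0, 5.0])\n",
"for _ in range(200):\n",
"    w = w - 0.1*grad_f(w)  # follow the negative gradient a little\n",
"print(w)                   # close to the minimiser (1, -3)\n",
"```\n",
"\n",
"Each step moves in the direction where `f` decreases the most; the\n",
"factor 0.1 plays the role of the learning rate discussed later.\n",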
        "\n",
        "We will require our candidate function to be on the following form.\n",
        "<span id=\"eq-neuralnet\">$$\n",
        "\\mathbf{F}_{\\mathbf{w}^{(1)},\\mathbf{b}^{(1)},\\mathbf{w}^{(2)},\\mathbf{b}^{(2)},...} = \\mathbf{F}_1(\\mathbf{F}_2(\\cdots \\mathbf{F}_r(\\mathbf{x})))\n",
        " \\qquad(2)$$</span> where all $\\mathbf{w}^{(k)}$, $\\mathbf{b}^{(k)}$ are\n",
        "vectors of length $n$, and where $$\n",
        "\\mathbf{F}_k(\\mathbf{x}) = \\sigma(W^{(k)}\\mathbf{x}+\\mathbf{b}^{(k)}),\n",
        "$$ with <span id=\"eq-circtop\">$$\n",
        "W^{(k)}=\n",
        "\\begin{pmatrix}\n",
        "w_1^{(k)} & w_n^{(k)} & w_{n-1}^{(k)} & \\cdots & w_2^{(k)} \\\\ \n",
        "w_2^{(k)} & w_1^{(k)} & w_n^{(k)} & \\cdots & w_3^{(k)} \\\\\n",
        "w_3^{(k)} & w_2^{(k)} & w_1^{(k)} & \\cdots & w_4^{(k)} \\\\\n",
        "\\vdots & \\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
        "w_n^{(k)} & w_{n-1}^{(k)} & w_{n-2}^{(k)} & \\cdots & w_1^{(k)}\n",
        "\\end{pmatrix}.\n",
        " \\qquad(3)$$</span> Some comments are in order:\n",
        "\n",
        "-   $W^{(k)}$ is a special type of matrix which is constant along\n",
        "    diagonals. Such matrices are called *circulant Toeplitz matrices*,\n",
        "    and have a series of nice properties which can be useful in machine\n",
        "    learning. There also exist effective algorithms to multiply with\n",
        "    such matrices, something which is exploited in the function `cconv`,\n",
        "    to be defined later. `cconv` takes two vectors $\\mathbf{w}$ and\n",
        "    $\\mathbf{x}$ as input, and returns $W\\mathbf{x}$, where $W$ is the\n",
        "    circulant Toeplitz matrix with first column being $\\mathbf{w}$.\n",
        "-   $\\sigma$ above is the (component-wise) *activation function*. This\n",
        "    is first defined as a map from $\\mathbb{R}$ to $\\mathbb{R}$ by\n",
        "    <span id=\"eq-activationdef\">$$\\sigma(x)=1/(1+e^{-x}), \\qquad(4)$$</span>\n",
        "    and extended component-wise to a map from $\\mathbb{R}^n$ to\n",
        "    $\\mathbb{R}^n$ by\n",
        "    $\\sigma\\begin{pmatrix} x_1 \\\\ x_2 \\\\ \\vdots \\\\ x_n \\end{pmatrix} = \\begin{pmatrix} \\sigma(x_1) \\\\ \\sigma(x_2) \\\\ \\vdots \\\\ \\sigma(x_n) \\end{pmatrix}$.\n",
"-   <a href=\"#eq-neuralnet\" class=\"quarto-xref\">Equation 2</a> is also\n",
        "    called a *Neural Network* (NN), and the $\\mathbf{F}_k$’s are called\n",
        "    its *layers*. We see that the parameters of the candidate functions\n",
        "    are split in groups, as dictated by the vectors $\\mathbf{w}^{(i)}$,\n",
        "    $\\mathbf{b}^{(i)}$. When the $W^{(i)}$ have the form in\n",
"    <a href=\"#eq-circtop\" class=\"quarto-xref\">Equation 3</a> (as we\n",
        "    require), the neural network is called a *Convolutional Neural\n",
        "    Network* (CNN). In a general neural network the $W^{(i)}$ can have\n",
        "    other forms as well, and it can be the case that some of the\n",
        "    $W^{(i)}$ are circulant Toeplitz matrices, while others are not.\n",
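"\n",
"The claim that multiplication with a circulant Toeplitz matrix can be\n",
"computed as a circular convolution by means of the FFT can be checked\n",
"numerically. The helper `circulant` below is a hypothetical name used\n",
"only for this check:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def cconv(w, x):\n",
"    # circular convolution computed via the FFT\n",
"    return np.fft.ifft(np.fft.fft(w)*np.fft.fft(x)).real\n",
"\n",
"def circulant(w):\n",
"    # circulant Toeplitz matrix with first column w (hypothetical helper)\n",
"    return np.column_stack([np.roll(w, k) for k in range(len(w))])\n",
"\n",
"w = np.array([1.0, 2.0, 3.0, 4.0])\n",
"x = np.array([0.5, -1.0, 2.0, 0.0])\n",
"print(np.allclose(circulant(w) @ x, cconv(w, x)))  # True\n",
"```\n",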
        "\n",
"Since <a href=\"#eq-neuralnet\" class=\"quarto-xref\">Equation 2</a> is a\n",
        "composition of many functions, we must use the *chain rule* in several\n",
        "variables to find the gradient of\n",
"<a href=\"#eq-find-min\" class=\"quarto-xref\">Equation 1</a>. For this we\n",
        "need to find the *Jacobians* of the “inner functions” in\n",
"<a href=\"#eq-neuralnet\" class=\"quarto-xref\">Equation 2</a>. The Jacobian\n",
        "of $\\mathbf{F}(\\mathbf{x}):\\mathbb{R}^n\\to\\mathbb{R}^m$, denoted\n",
        "$\\mathbf{F}'(\\mathbf{x})$, is defined as the $m\\times n$-matrix $$\n",
        "\\begin{pmatrix} \n",
        "\\frac{\\partial F_1}{\\partial x_1} & \\frac{\\partial F_1}{\\partial x_2} & \\cdots & \\frac{\\partial F_1}{\\partial x_n} \\\\\n",
        "\\frac{\\partial F_2}{\\partial x_1} & \\frac{\\partial F_2}{\\partial x_2} & \\cdots & \\frac{\\partial F_2}{\\partial x_n} \\\\\n",
        "\\vdots                                       & \\vdots                                       & \\vdots & \\vdots \\\\\n",
        "\\frac{\\partial F_m}{\\partial x_1} & \\frac{\\partial F_m}{\\partial x_2} & \\cdots & \\frac{\\partial F_m}{\\partial x_n} \\\\                 \n",
        "\\end{pmatrix}\n",
        "$$ where $F_1,F_2,...,F_m$ are the component functions of $\\mathbf{F}$.\n",
"The chain rule can now be formulated as follows.\n",
        "\n",
        "<span class=\"theorem-title\">**Theorem 1**</span> Let\n",
        "$\\mathbf{F}: \\mathbb{R}^k\\to\\mathbb{R}^m$,\n",
        "$\\mathbf{G}: \\mathbb{R}^n\\to\\mathbb{R}^k$ be differentiable functions.\n",
        "Then the composite function\n",
        "$\\mathbf{H}(\\mathbf{x})=\\mathbf{F}(\\mathbf{G}(\\mathbf{x}))$ is also a\n",
        "differentiable function, and\n",
        "$$\\mathbf{H}'(\\mathbf{x})=\\mathbf{F}'(\\mathbf{G}(\\mathbf{x}))\\mathbf{G}'(\\mathbf{x}).$$\n",
        "\n",
        "We thus obtain the Jacobian of a composite function by multiplying the\n",
        "Jacobians of the involved functions. In particular we need the Jacobian\n",
        "of the activation function, so let us start with this.\n",
        "\n",
        "<span class=\"theorem-title\">**Exercise 1**</span> Recall the definition\n",
        "of the activation function $\\sigma$ in\n",
"<a href=\"#eq-activationdef\" class=\"quarto-xref\">Equation 4</a>. Show\n",
        "that $\\sigma'(x)=\\sigma(x)(1-\\sigma(x))$.\n",
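"\n",
"The identity in Exercise 1 can also be sanity-checked numerically with\n",
"a central difference approximation; this check is of course not a\n",
"substitute for the proof:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def sigma(x):\n",
"    return 1/(1 + np.exp(-x))\n",
"\n",
"x = np.linspace(-4, 4, 9)\n",
"h = 1e-6\n",
"numeric = (sigma(x + h) - sigma(x - h))/(2*h)  # central difference\n",
"print(np.allclose(numeric, sigma(x)*(1 - sigma(x)), atol=1e-8))  # True\n",
"```\n",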
        "\n",
"The component-wise extension of the activation function thus has as\n",
"Jacobian the diagonal matrix with $\\sigma'(x_i)$ on the diagonal.\n",
        "\n",
        "Given measurement data $\\mathbf{x}_i\\in\\mathbb{R}^n$,\n",
        "$\\mathbf{y}_i\\in\\mathbb{R}^n$, we would like to find vectors\n",
"$\\mathbf{w}^{(i)}$, $\\mathbf{b}^{(i)}$ that give a neural network\n",
"(<a href=\"#eq-neuralnet\" class=\"quarto-xref\">Equation 2</a>) with\n",
        "minimum total error $$\n",
        "f\\left( \\mathbf{w}^{(1)},\\mathbf{b}^{(1)},\\mathbf{w}^{(2)},\\mathbf{b}^{(2)},... \\right)\n",
        "= \\sum_{i=1}^m | \\mathbf{F}_{\\mathbf{w}^{(1)},\\mathbf{b}^{(1)},\\mathbf{w}^{(2)},\\mathbf{b}^{(2)},...}(\\mathbf{x}_i)-\\mathbf{y}_i |^2.\n",
        "$$ When we compute the gradient of this we will write\n",
        "$\\mathbf{F}_1\\circ\\mathbf{F}_2$ for the composite function\n",
        "$\\mathbf{x}\\to \\mathbf{F}_1(\\mathbf{F}_2(\\mathbf{x}))$, and more\n",
        "generally $\\mathbf{F}_1\\circ\\cdots\\circ\\mathbf{F}_r$ for the composite\n",
        "function\n",
        "$\\mathbf{x}\\to\\mathbf{F}_1(\\mathbf{F}_2(\\cdots \\mathbf{F}_r(\\mathbf{x})))$.\n",
        "\n",
        "Let us now look at the individual layers. To apply the chain rule\n",
        "correctly we need to set up the points where we evaluate the Jacobians.\n",
        "For this we define\n",
        "\n",
        "<span id=\"eq-yzdef\">$$\\begin{aligned}\n",
        "\\mathbf{z}^{(2r)}     &=\\mathbf{x}    &\\\\\n",
        "\\mathbf{z}^{(2k-1)} &=W^{(k)}\\mathbf{z}^{(2k)}+\\mathbf{b}^{(k)} &1\\leq k\\leq r \\\\\n",
        "\\mathbf{z}^{(2k-2)} &=\\sigma(\\mathbf{z}^{(2k-1)}) & 1\\leq k\\leq r.\n",
        "\\end{aligned} \\qquad(5)$$</span>\n",
        "\n",
        "The first layer can be written $$\n",
        "\\mathbf{F}_r: \n",
        "\\begin{pmatrix} \n",
        "\\mathbf{w}^{(1)} \\\\\n",
        "\\mathbf{b}^{(1)}  \\\\\n",
        "\\vdots \\\\\n",
        "\\mathbf{w}^{(r-1)} \\\\\n",
        "\\mathbf{b}^{(r-1)} \\\\\n",
        "\\mathbf{w}^{(r)} \\\\\n",
        "\\mathbf{b}^{(r)}\n",
        "\\end{pmatrix} \n",
        "\\to\n",
        "\\begin{pmatrix} \n",
        "\\mathbf{w}^{(1)} \\\\\n",
        "\\mathbf{b}^{(1)}  \\\\\n",
        "\\vdots \\\\\n",
        "\\mathbf{w}^{(r-1)} \\\\\n",
        "\\mathbf{b}^{(r-1)} \\\\\n",
        "\\sigma\\left(W^{(r)}\\mathbf{x} +\\mathbf{b}^{(r)}\\right)\n",
"\\end{pmatrix}.\n",
"$$ We have here redefined $\\mathbf{F}_r$ in the sense that the\n",
        "variables, with the exception of $\\mathbf{w}^{(r)},\\mathbf{b}^{(r)}$,\n",
        "are forwarded to the next layer. We continue like this, so that we\n",
        "eliminate the $\\mathbf{w}$ and $\\mathbf{b}$-variables, layer by layer.\n",
"Let us find the Jacobian $\\mathbf{F}_r'$, starting with the last\n",
"$n$ component functions, which have the form\n",
        "$$ \\left( \\mathbf{w}^{(1)},\\mathbf{b}^{(1)},...,\\mathbf{w}^{(r)},\\mathbf{b}^{(r)}\\right) \\to \\sigma(\\mathbf{G}(\\mathbf{w}^{(r)},\\mathbf{b}^{(r)}))$$\n",
        "with\n",
        "$\\mathbf{G}(\\mathbf{w}^{(r)},\\mathbf{b}^{(r)}) = W^{(r)}\\mathbf{x} +\\mathbf{b}^{(r)}$.\n",
        "\n",
        "<span class=\"theorem-title\">**Exercise 2**</span>  \n",
        "\n",
        "1.  Let $w_i, b_i$ be real numbers, and set $\\mathbf{G}(\\mathbf{x})\n",
        "    =\\begin{pmatrix} w_1 & w_3 & w_2 \\\\ w_2 & w_1 & w_3 \\\\ w_3 & w_2 & w_1\\end{pmatrix}\n",
        "    \\begin{pmatrix} x_1 \\\\ x_2 \\\\ x_3 \\end{pmatrix} + \\begin{pmatrix} b_1 \\\\ b_2 \\\\ b_3 \\end{pmatrix}$.\n",
        "    Show that\n",
        "    $$\\mathbf{G}'(\\mathbf{x})=\\begin{pmatrix} w_1 & w_3 & w_2 \\\\ w_2 & w_1 & w_3 \\\\ w_3 & w_2 & w_1\\end{pmatrix}.$$\n",
        "2.  Let $W$ be any $n\\times n$ matrix, and $\\mathbf{b}\\in\\mathbb{R}^n$,\n",
        "    and set $\\mathbf{G}(\\mathbf{x})=W\\mathbf{x} + \\mathbf{b}$. Show that\n",
        "    $\\mathbf{G}'(\\mathbf{x})=W$.\n",
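"\n",
"Part 2 of Exercise 2 can be spot-checked by estimating the Jacobian of\n",
"$\\mathbf{G}(\\mathbf{x})=W\\mathbf{x}+\\mathbf{b}$ column by column with\n",
"finite differences (again a numeric check, not a proof):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"n = 4\n",
"W = rng.standard_normal((n, n))\n",
"b = rng.standard_normal(n)\n",
"G = lambda x: W @ x + b\n",
"\n",
"x0 = rng.standard_normal(n)\n",
"h = 1e-6\n",
"# column j of the Jacobian is approximately (G(x0+h*e_j)-G(x0-h*e_j))/(2h)\n",
"jac = np.column_stack([(G(x0 + h*e) - G(x0 - h*e))/(2*h) for e in np.eye(n)])\n",
"print(np.allclose(jac, W, atol=1e-8))  # True\n",
"```\n",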
        "\n",
        "Let instead $\\mathbf{G}: \\mathbb{R}^{2n}\\to\\mathbb{R}^n$ be defined by\n",
        "$$\n",
        "\\mathbf{G}\\left( w_1,...,w_n,b_1,...,b_n\\right) =\n",
        "\\begin{pmatrix}\n",
        "w_1 & w_n & w_{n-1} & \\cdots & w_2 \\\\ \n",
        "w_2 & w_1 & w_n & \\cdots & w_3 \\\\\n",
        "w_3 & w_2 & w_1 & \\cdots & w_4 \\\\\n",
        "\\vdots & \\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
        "w_n & w_{n-1} & w_{n-2} & \\cdots & w_1\n",
        "\\end{pmatrix}\n",
        "\\mathbf{x}+\\mathbf{b}.\n",
        "$$ In other words, we now consider $w_i$ and $b_i$ as variables, and\n",
        "$\\mathbf{x}$ as a constant.\n",
        "\n",
        "<span class=\"theorem-title\">**Exercise 3**</span> Show that\n",
        "$\\mathbf{G}\\left( w_1,...,w_n,b_1,...,b_n\\right)$ also can be written as\n",
        "$$ \n",
        "\\left(\n",
        "\\begin{array}{ccccc|ccccc}\n",
        "x_1 & x_n & x_{n-1} & \\cdots & x_2 & 1 & 0 & 0 & \\cdots & 0 \\\\\n",
        "x_2 & x_1 & x_n & \\cdots & x_3 & 0 & 1 & 0 & \\cdots & 0 \\\\\n",
        "x_3 & x_2 & x_1 & \\cdots & x_4 & 0 & 0 & 1 & \\cdots & 0 \\\\\n",
        "\\vdots & \\vdots & \\vdots & \\vdots & \\vdots & \\vdots & \\vdots & \\vdots & \\vdots & \\vdots\\\\\n",
        "x_n & x_{n-1} & x_{n-2} & \\cdots & x_1 & 0 & 0 & 0 & \\cdots & 1 \\\\\n",
        "\\end{array}\n",
        "\\right)\n",
        "\\begin{pmatrix} w_1 \\\\ \\vdots \\\\ w_n \\\\ b_1 \\\\ \\vdots \\\\ b_n \\end{pmatrix}\n",
        "$$ Hint: Compute and compare the i’th components of $$\n",
        "\\begin{pmatrix}\n",
        "w_1 & w_n & w_{n-1} & \\cdots & w_2 \\\\ \n",
        "w_2 & w_1 & w_n & \\cdots & w_3 \\\\\n",
        "w_3 & w_2 & w_1 & \\cdots & w_4 \\\\\n",
        "\\vdots & \\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
        "w_n & w_{n-1} & w_{n-2} & \\cdots & w_1\n",
        "\\end{pmatrix}\n",
        "\\begin{pmatrix} x_1 \\\\ x_2 \\\\ x_3 \\\\ \\vdots \\\\ x_n \\end{pmatrix}\n",
        "\\text{ and }\n",
        "\\begin{pmatrix}\n",
        "x_1 & x_n & x_{n-1} & \\cdots & x_2 \\\\ \n",
        "x_2 & x_1 & x_n & \\cdots & x_3 \\\\\n",
        "x_3 & x_2 & x_1 & \\cdots & x_4 \\\\\n",
        "\\vdots & \\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
        "x_n & x_{n-1} & x_{n-2} & \\cdots & x_1\n",
        "\\end{pmatrix}\n",
        "\\begin{pmatrix} w_1 \\\\ w_2 \\\\ w_3 \\\\ \\vdots \\\\ w_n \\end{pmatrix}\n",
        "$$\n",
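"\n",
"The symmetry behind Exercise 3 - that the roles of $\\mathbf{w}$ and\n",
"$\\mathbf{x}$ can be swapped in a circulant product - can be verified\n",
"numerically (the helper `circulant` is a hypothetical name used only\n",
"for this check):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def circulant(v):\n",
"    # circulant Toeplitz matrix with first column v (hypothetical helper)\n",
"    return np.column_stack([np.roll(v, k) for k in range(len(v))])\n",
"\n",
"rng = np.random.default_rng(0)\n",
"w = rng.standard_normal(5)\n",
"x = rng.standard_normal(5)\n",
"print(np.allclose(circulant(w) @ x, circulant(x) @ w))  # True\n",
"```\n",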
        "\n",
"It follows from Exercises 2 and 3 that\n",
"$\\mathbf{G}(w_1,...,w_n,b_1,...,b_n)$ is linear with Jacobian matrix $$\n",
        "\\mathbf{G}'(w_1,...,w_n,b_1,...,b_n) = \n",
        "\\left(\n",
        "\\begin{array}{ccccc|ccccc}\n",
        "x_1 & x_n & x_{n-1} & \\cdots & x_2 & 1 & 0 & 0 & \\cdots & 0 \\\\\n",
        "x_2 & x_1 & x_n & \\cdots & x_3 & 0 & 1 & 0 & \\cdots & 0 \\\\\n",
        "x_3 & x_2 & x_1 & \\cdots & x_4 & 0 & 0 & 1 & \\cdots & 0 \\\\\n",
        "\\vdots & \\vdots & \\vdots & \\vdots & \\vdots & \\vdots & \\vdots & \\vdots & \\vdots & \\vdots\\\\\n",
        "x_n & x_{n-1} & x_{n-2} & \\cdots & x_1 & 0 & 0 & 0 & \\cdots & 1 \\\\\n",
        "\\end{array}\n",
        "\\right)\n",
        "$$\n",
        "\n",
        "For the last part of this project we need the concept of *block\n",
        "matrices*. We give an overview of this in\n",
"Section 1.3.\n",
        "\n",
        "Since $\\sigma\\left(W^{(r)}\\mathbf{x} +\\mathbf{b}^{(r)}\\right)$ does not\n",
        "depend on $\\mathbf{w}^{(1)}$, $\\mathbf{b}^{(1)}$,…,\n",
"$\\mathbf{w}^{(r-1)}$, $\\mathbf{b}^{(r-1)}$, the last $n$ component\n",
"functions in $\\mathbf{F}_r$ have Jacobian\n",
        "$\\left(\\begin{array}{c|c}O & U^{(r)}\\end{array}\\right)$, where\n",
        "\n",
        "<span id=\"eq-urdef\">$$\n",
        "U^{(r)} :=\n",
        "\\sigma'(\\mathbf{z}^{(2r-1)})\n",
        "\\left(\\begin{array}{cccc|cccc}\n",
        "x_1 & x_n   & \\cdots  & x_2               & 1 & 0 & \\cdots & 0 \\\\\n",
        "x_2 & x_1   & \\cdots   & x_3                & 0 & 1 & \\cdots & 0 \\\\\n",
        "\\vdots     & \\vdots       & \\vdots  & \\vdots               & \\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
        "x_n     & x_{n-1}        & \\cdots & x_1            & 0 & 0 & \\cdots & 1 \n",
        "\\end{array}\\right),\n",
        " \\qquad(6)$$</span>\n",
        "\n",
"which is an $n\\times n$ diagonal matrix multiplied with an\n",
"$n\\times(2n)$ matrix (here we have used the block matrix notation from\n",
"Section 1.3). Since the\n",
        "first component functions are simply the identity mapping, we also get\n",
        "<span id=\"eq-firstexpr\">$$\n",
        "\\mathbf{F}_r'=\n",
        "\\left(\\begin{array}{c|c}\n",
        "I          & O \\\\ \\hline\n",
        "O & U^{(r)}\n",
"\\end{array}\\right).\n",
        " \\qquad(7)$$</span> In the same way we write the other layers\n",
        "$\\mathbf{F}_1$,…,$\\mathbf{F}_{r-1}$ as $$\n",
        "\\mathbf{F}_k: \n",
        "\\begin{pmatrix} \n",
        "\\mathbf{w}^{(1)} \\\\\n",
        "\\mathbf{b}^{(1)} \\\\\n",
        "\\vdots \\\\\n",
        "\\mathbf{w}^{(k-1)} \\\\\n",
        "\\mathbf{b}^{(k-1)}\\\\\n",
        "\\mathbf{w}^{(k)} \\\\\n",
        "\\mathbf{b}^{(k)} \\\\\n",
        "\\mathbf{x}\n",
        "\\end{pmatrix} \n",
        "\\to\n",
        "\\begin{pmatrix} \n",
        "\\mathbf{w}^{(1)} \\\\\n",
        "\\mathbf{b}^{(1)} \\\\\n",
        "\\vdots \\\\\n",
        "\\mathbf{w}^{(k-1)} \\\\\n",
        "\\mathbf{b}^{(k-1)}\\\\\n",
        "\\sigma\\left(W^{(k)}\\mathbf{x} +\\mathbf{b}^{(k)}\\right)\n",
        "\\end{pmatrix},$$ where $1\\leq k\\leq r-1$. The only difference from the\n",
        "above is that there are $n$ extra variables $x_i$ here (we shall\n",
        "evaluate in $\\mathbf{x}=\\mathbf{z}^{(2k-1)}$). This means that the only\n",
        "difference for $\\mathbf{F}'_k$, when comparing with $\\mathbf{F}'_r$ in\n",
"<a href=\"#eq-firstexpr\" class=\"quarto-xref\">Equation 7</a>, is that we\n",
        "get extra columns $\\begin{pmatrix} O \\\\ V^{(k)} \\end{pmatrix}$ on the\n",
        "end, where <span id=\"eq-computevk\">$$\n",
        "V^{(k)} := \n",
        "\\sigma'(\\mathbf{z}^{(2k-1)})\n",
        "\\left(\\begin{array}{cccc}\n",
        "w_1^{(k)} & w_n^{(k)}   & \\cdots  & w_2^{(k)}           \\\\\n",
        "w_2^{(k)} & w_1^{(k)}   & \\cdots   & w_3^{(k)}           \\\\\n",
        "\\vdots     & \\vdots       & \\vdots  & \\vdots               \\\\\n",
        "w_n^{(k)}     & w_{n-1}^{(k)}        & \\cdots & w_1^{(k)}         \n",
        "\\end{array}\\right)\n",
        " \\qquad(8)$$</span> (this is the Jacobian of\n",
        "$\\mathbf{x}\\to \\sigma\\left(W^{(k)}\\mathbf{x} + \\mathbf{b}\\right)$). The\n",
        "other columns are deduced in the same way as\n",
"<a href=\"#eq-urdef\" class=\"quarto-xref\">Equation 6</a>, giving\n",
        "$\\left(\\begin{array}{c|c} I & O \\\\ \\hline O & U^{(k)}\\end{array}\\right)$,\n",
        "where <span id=\"eq-computeuk\">$$\n",
        "U^{(k)} :=\n",
        "\\sigma'(\\mathbf{z}^{(2k-1)})\n",
        "\\left(\\begin{array}{cccc|cccc}\n",
        "z_1^{(2k)} & z_n^{(2k)}   & \\cdots  & z_2^{(2k)}               & 1 & 0 & \\cdots & 0 \\\\\n",
        "z_2^{(2k)} & z_1^{(2k)}   & \\cdots   & z_3^{(2k)}                & 0 & 1 & \\cdots & 0 \\\\\n",
        "\\vdots     & \\vdots       & \\vdots  & \\vdots               & \\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
        "z_n^{(2k)}     & z_{n-1}^{(2k)}        & \\cdots & z_1^{(2k)}            & 0 & 0 & \\cdots & 1 \n",
        "\\end{array}\\right)\n",
        " \\qquad(9)$$</span> ($\\mathbf{x}$ in\n",
"<a href=\"#eq-urdef\" class=\"quarto-xref\">Equation 6</a> is replaced by\n",
        "$\\mathbf{z}^{(2k)}$). In other words we get $$\n",
        "\\mathbf{F}_k'=\n",
        "\\left(\\begin{array}{c|c|c}\n",
        "I         & O & O \\\\ \\hline\n",
        "O & U^{(k)} & V^{(k)}\n",
        "\\end{array}\\right).\n",
        "$$\n",
        "\n",
        "Now that we have the Jacobian of each individual layer, we can use the\n",
        "chain rule to find the Jacobian of the entire neural network. If we\n",
        "first apply the chain rule to find the Jacobian of\n",
        "$\\mathbf{F}_{r-1}\\circ\\mathbf{F}_r$ we get <span id=\"eq-oppg1\">$$\n",
        "\\left(\\begin{array}{c|c|c}\n",
        "I_1     & O       &O           \\\\ \\hline\n",
        "O & U^{(r-1)}  & V^{(r-1)} \n",
        "\\end{array} \\right) \n",
        "\\left(\\begin{array}{c|c|c} \n",
        "I_1     & O & O \\\\ \\hline \n",
        "O & I_2     & O \\\\ \\hline \n",
        "O  & O & U^{(r)}\n",
        "\\end{array}\\right).\n",
        " \\qquad(10)$$</span> Here we split the identity matrix $I$ into\n",
        "$\\left(\\begin{array}{c|c} I_1 & O \\\\ \\hline O & I_2 \\end{array}\\right)$,\n",
"so that its rows match the columns of the preceding matrix.\n",
        "\n",
        "<span class=\"theorem-title\">**Exercise 4**</span> Show that\n",
"<a href=\"#eq-oppg1\" class=\"quarto-xref\">Equation 10</a> equals $$\n",
        "\\left(\\begin{array}{c|c|c} \n",
        "I_1     & O & O \\\\ \\hline\n",
        "O & U^{(r-1)} & V^{(r-1)}U^{(r)}\n",
        "\\end{array}\\right)\n",
        "$$\n",
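"\n",
"The block product in Exercise 4 can be spot-checked numerically with\n",
"random blocks of matching sizes. The sizes `p` and `n` below are\n",
"arbitrary choices for this check, which does not replace the general\n",
"argument:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"n, p = 3, 4\n",
"I1, I2 = np.eye(p), np.eye(2*n)\n",
"U1 = rng.standard_normal((n, 2*n))  # plays the role of U^(r-1)\n",
"V1 = rng.standard_normal((n, n))    # plays the role of V^(r-1)\n",
"Ur = rng.standard_normal((n, 2*n))  # plays the role of U^(r)\n",
"Z = np.zeros\n",
"\n",
"left = np.block([[I1, Z((p, 2*n)), Z((p, n))],\n",
"                 [Z((n, p)), U1, V1]])\n",
"right = np.block([[I1, Z((p, 2*n)), Z((p, 2*n))],\n",
"                  [Z((2*n, p)), I2, Z((2*n, 2*n))],\n",
"                  [Z((n, p)), Z((n, 2*n)), Ur]])\n",
"prod = np.block([[I1, Z((p, 2*n)), Z((p, 2*n))],\n",
"                 [Z((n, p)), U1, V1 @ Ur]])\n",
"print(np.allclose(left @ right, prod))  # True\n",
"```\n",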
        "\n",
        "The Jacobian of $\\mathbf{F}_{r-2}\\circ\\mathbf{F}_{r-1}\\circ\\mathbf{F}_r$\n",
        "now becomes <span id=\"eq-oppg2\">$$\n",
        "\\left(\\begin{array}{c|c|c}\n",
        "I_1     & O  & O             \\\\ \\hline\n",
        "O & U^{(r-2)}  & V^{(r-2)} \n",
        "\\end{array} \\right)\n",
        "\\left(\\begin{array}{c|c|c|c} \n",
        "I_1     & O & O & O \\\\ \\hline\n",
        "O & I_2     & O & O \\\\ \\hline \n",
        "O & O & U^{(r-1)} & V^{(r-1)}U^{(r)}\n",
"\\end{array}\\right).\n",
        " \\qquad(11)$$</span>\n",
        "\n",
        "<span class=\"theorem-title\">**Exercise 5**</span> Show that\n",
"<a href=\"#eq-oppg2\" class=\"quarto-xref\">Equation 11</a> becomes $$\n",
        "\\left(\\begin{array}{c|c|c|c} \n",
        "I_1     & O & O & O \\\\ \\hline\n",
        "O & U^{(r-2)} & V^{(r-2)} U^{(r-1)} & V^{(r-2)} V^{(r-1)}U^{(r)}\n",
        "\\end{array}\\right)\n",
        "$$\n",
        "\n",
"If we continue this process to include all layers, we get that the\n",
"$k$’th part in the end becomes <span id=\"eq-kthpart\">$$\n",
        "V^{(1)}\\cdots V^{(k-1)}U^{(k)}.\n",
        " \\qquad(12)$$</span> What remains now is to also apply the Jacobian\n",
        "(=gradient) of the outer function.\n",
        "\n",
        "<span class=\"theorem-title\">**Exercise 6**</span>  \n",
        "\n",
        "1.  Let $g(\\mathbf{x})=|\\mathbf{H}(\\mathbf{x})-\\mathbf{y}|^2$, where\n",
        "    $\\mathbf{H}$ is a differentiable function, and\n",
        "    $\\mathbf{y}\\in\\mathbb{R}^n$. Show that\n",
        "    $$\\nabla g(\\mathbf{x}) = 2(\\mathbf{H}(\\mathbf{x})-\\mathbf{y})^T\\mathbf{H}'(\\mathbf{x}).$$\n",
        "2.  Let\n",
        "    $g(\\mathbf{x})=\\sum_{i=1}^m |\\mathbf{H}_i(\\mathbf{x})-\\mathbf{y_i}|^2$,\n",
        "    where the $\\mathbf{H}_i$ are differentiable functions, and\n",
        "    $\\mathbf{y}_i\\in\\mathbb{R}^n$. Show that\n",
        "    $$\\nabla g(\\mathbf{x}) = \\sum_{i=1}^m 2(\\mathbf{H}_i(\\mathbf{x})-\\mathbf{y}_i)^T\\mathbf{H}_i'(\\mathbf{x}).$$\n",
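"\n",
"Part 1 of Exercise 6 can be sanity-checked numerically on a small test\n",
"function. The map `H` below is an arbitrary smooth example, not\n",
"something from the assignment:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def H(x):     # an arbitrary smooth test map from R^2 to R^2\n",
"    return np.array([np.sin(x[0]) + x[1]**2, x[0]*x[1]])\n",
"\n",
"def H_jac(x): # its Jacobian, computed by hand\n",
"    return np.array([[np.cos(x[0]), 2*x[1]],\n",
"                     [x[1], x[0]]])\n",
"\n",
"y = np.array([0.3, -0.7])\n",
"g = lambda x: np.sum((H(x) - y)**2)\n",
"\n",
"x0 = np.array([0.5, 1.2])\n",
"analytic = 2*(H(x0) - y) @ H_jac(x0)\n",
"h = 1e-6\n",
"numeric = np.array([(g(x0 + h*e) - g(x0 - h*e))/(2*h) for e in np.eye(2)])\n",
"print(np.allclose(analytic, numeric, atol=1e-6))  # True\n",
"```\n",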
        "\n",
"If we combine <a href=\"#eq-kthpart\" class=\"quarto-xref\">Equation 12</a>\n",
"with Exercise 6, the $k$’th\n",
        "part of the gradient for the whole neural network becomes (in the case\n",
        "of one measurement point) <span id=\"eq-kthpartgrad\">$$\n",
"2(\\mathbf{z}^{(0)}-\\mathbf{y})^TV^{(1)}\\cdots V^{(k-1)}U^{(k)}.\n",
" \\qquad(13)$$</span> The first half of this row vector contains the\n",
"partial derivatives with respect to the $w_i^{(k)}$-variables, the\n",
"second half those with respect to the $b_i^{(k)}$-variables.\n",
        "\n",
        "## 1.2 Implementation\n",
        "\n",
        "Let us take a look at how all this can be implemented. The algorithm for\n",
        "following the gradient to get closer to a minimum of the loss function\n",
        "is called *backpropagation*. The algorithm takes the measurements\n",
        "$\\mathbf{x}_i$ and $\\mathbf{y}_i$, the number of layers $r$, and the\n",
        "number of iterations `its`, as input, and returns the updated parameters\n",
        "$\\mathbf{w}^{(i)}$ and $\\mathbf{b}^{(i)}$, after this number of\n",
        "iterations. As before we assume that all matrices are $n\\times n$, all\n",
        "vectors have length $n$, and that there are $m$ measurement points,\n",
        "which we will randomly generate.\n",
        "\n",
        "We start by importing the packages we need:"
      ],
      "id": "e5387760-1c09-4676-9765-1e3487ce6f3e"
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": [
        "import numpy as np"
      ],
      "id": "6ffde748"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We also need functions that implement the activation function, as well\n",
        "as multiplication with its Jacobian, and a function that implements\n",
        "circular convolution."
      ],
      "id": "6be1e7b3-e96b-4563-a784-23486c500164"
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [],
      "source": [
"def sigma(x):\n",
"    # The activation function from Equation 4, applied component-wise\n",
"    return 1/(1+np.exp(-x))\n",
"\n",
"def sigmader(x):\n",
"    # Its derivative (see Exercise 1)\n",
"    return sigma(x)*(1-sigma(x))\n",
"\n",
"def cconv(x, y):\n",
"    # Circular convolution of x and y via the FFT. This equals W y,\n",
"    # where W is the circulant Toeplitz matrix with first column x.\n",
"    X = np.fft.fft(x)\n",
"    Y = np.fft.fft(y)\n",
"    Z = X*Y\n",
"    return np.fft.ifft(Z).real"
      ],
      "id": "a356bf72"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
"We also need code that computes the current value of the loss function:"
      ],
      "id": "4cb22530-407e-49cc-972a-9f64f5ecbeb7"
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [],
      "source": [
"def loss(x, y, w, b):\n",
"    n, m = np.shape(x)\n",
"    n, r = np.shape(w)\n",
"    total = 0\n",
"    for t in range(m):          # Iterate through the measurements\n",
"        z = x[:,t]              # Start the forward pass at the measurement\n",
"        for k in range(r,0,-1): # Iterate through the layers\n",
"            z = cconv(w[:,k-1], z) + b[:,k-1]\n",
"            z = sigma(z)\n",
"        total += sum(abs(z-y[:,t])**2)\n",
"    return total"
      ],
      "id": "5c00baee"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We now have all we need to implement the backpropagation function:"
      ],
      "id": "3dde8315-3022-48ff-98a5-bf101ec2cca3"
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [],
      "source": [
        "def backpropagation(x, y, r, its):\n",
        "    lrate = 0.01  # Learning rate\n",
        "    n, m = np.shape(x)\n",
        "    w = np.random.rand(n, r) # Randomly generated initial parameters\n",
        "    b = np.random.rand(n, r) # Randomly generated initial parameters\n",
        "    for it in range(its):\n",
        "        dw = np.zeros_like(w)\n",
        "        db = np.zeros_like(b)\n",
        "        for t in range(m):\n",
        "            z = np.zeros((n,2*r+1))\n",
        "            z[:,2*r] = x[:,t]\n",
        "            for k in range(r,0,-1):\n",
        "                z[:, 2*k-1] = cconv(w[:,k-1], z[:,2*k]) + b[:,k-1]\n",
        "                z[:, 2*k-2] = sigma(z[:,2*k-1])\n",
        "            \n",
        "            temp = 2*(z[:,0]-y[:,t])\n",
        "            for k in range(1,r+1):\n",
        "                temp = temp*sigmader(z[:,2*k-1])\n",
        "                dw[:,k-1] += cconv( np.flip(z[:,2*k]), temp)\n",
        "                db[:,k-1] += temp\n",
        "                temp = cconv( np.flip(w[:,k-1]), temp)\n",
        "        w = w - lrate*dw\n",
        "        b = b - lrate*db\n",
        "        print(loss(x, y, w, b))\n",
        "    return w, b"
      ],
      "id": "68c80fa9"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
"This code requires several comments. The implementation uses randomly\n",
"generated initial values for the weights $\\mathbf{w}^{(k)}$ and\n",
        "$\\mathbf{b}^{(k)}$. These are stored in $n\\times r$-matrices, where each\n",
        "column represents a layer. The outer `for`-loop in the code goes through\n",
        "the iterations. Each iteration follows the gradient a small bit from the\n",
        "current point to reach a new point. In the next iteration the gradient\n",
        "at the new point is computed, and this is followed a little bit from the\n",
        "new point, and so on. The update of the point by following the gradient\n",
        "a small bit is captured by the last two lines in the outermost\n",
        "`for`-loop. These use the *hyperparameter* `lrate`, which sets the\n",
        "distance to follow the gradient. In machine learning this is also\n",
        "called the *learning rate*, and the algorithm is highly sensitive to\n",
        "this parameter: following the gradient a short distance corresponds to\n",
        "using a first-order approximation of the function, and this\n",
        "approximation is good only near the point where we evaluate it. We\n",
        "should therefore not follow the gradient too far, but neither should we\n",
        "follow it too short a distance, as that would make the algorithm\n",
        "inefficient. In practice a compromise is struck. An algorithm can also\n",
        "set the learning rate\n",
        "adaptively, as some points may allow for following the gradient longer\n",
        "than others.\n",
        "\n",
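        "As a side illustration (a minimal sketch; the one-variable function\n",
        "and the numbers are made up for this example, and are not part of the\n",
        "assignment), gradient descent with a fixed learning rate looks like\n",
        "this for $f(w)=w^2$, whose derivative is $2w$:\n",
        "\n",
        "```python\n",
        "def gradient_step(w, grad, lrate):\n",
        "    # Follow the gradient a small distance from the current point\n",
        "    return w - lrate * grad\n",
        "\n",
        "w = 1.0\n",
        "for _ in range(100):\n",
        "    w = gradient_step(w, 2 * w, lrate=0.1)\n",
        "print(w)  # Close to the minimiser w = 0\n",
        "```\n",
        "\n",
        "With `lrate=0.1` every step multiplies `w` by $1-2\\\\cdot 0.1=0.8$, so\n",
        "the iterates shrink towards the minimiser $w=0$; with `lrate=1.1` they\n",
        "would instead grow in magnitude, which is why this parameter matters so\n",
        "much.\n",
        "\n",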
        "The next `for`-loop goes through the $m$ data points. For each of them\n",
        "the vectors $\\mathbf{z}^{(k)}$ in\n",
        "Equation 5 are computed.\n",
        "This is called the *forward pass* of the algorithm, and is done by\n",
        "applying the activation function and multiplication with $W^{(k)}$ in\n",
        "alternating order (lines 14-15). Once this has been done, the expression\n",
        "Equation 13 for the\n",
        "different parts of the gradient is computed. This is also called the\n",
        "*backward pass* of the algorithm since, as we will see, the order of the\n",
        "computations will be reversed when compared to the forward pass. Let us\n",
        "look at why this is the case.\n",
        "\n",
        "<span class=\"theorem-title\">**Exercise 7**</span> Assume we would like\n",
        "to compute the expression $\\mathbf{x}^TA_1A_2\\cdots A_n$ on a computer,\n",
        "where $\\mathbf{x}\\in\\mathbb{R}^n$ and the $A_i$ are\n",
        "$n\\times n$-matrices. Why is it smart to compute this expression from\n",
        "left to right, rather than from right to left?\n",
        "\n",
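        "As a concrete check (an illustrative sketch; the dimension and the\n",
        "random matrices are arbitrary), both orders of evaluation give the same\n",
        "vector, but the intermediate results have very different sizes:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "rng = np.random.default_rng(0)\n",
        "n = 4\n",
        "x = rng.standard_normal(n)        # x plays the role of x^T\n",
        "A1 = rng.standard_normal((n, n))\n",
        "A2 = rng.standard_normal((n, n))\n",
        "\n",
        "# Left to right: two vector-matrix products, O(n^2) flops each\n",
        "left = (x @ A1) @ A2\n",
        "# Right to left: a matrix-matrix product first, O(n^3) flops\n",
        "right = x @ (A1 @ A2)\n",
        "\n",
        "print(np.allclose(left, right))  # True: same result, different cost\n",
        "```\n",
        "\n",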
        "In terms of complexity it is preferable to compute the quantities from\n",
        "left to right, i.e., in order opposite to the forward pass. When\n",
        "computing Equation 13\n",
        "from left to right, $2(\\mathbf{z}_0-\\mathbf{y})^TV^{(1)}\\cdots V^{(k)}$\n",
        "can be obtained from\n",
        "$2(\\mathbf{z}_0-\\mathbf{y})^TV^{(1)}\\cdots V^{(k-1)}$ by multiplying\n",
        "with $V^{(k)}$ to the right. Our implementation exploits this by storing\n",
        "the $2(\\mathbf{z}_0-\\mathbf{y})^TV^{(1)}\\cdots V^{(k)}$ in the temporary\n",
        "variable `temp`. According to\n",
        "Equation 8, this can be\n",
        "done by\n",
        "\n",
        "-   first multiplying with $\\sigma'(\\mathbf{z}^{(2k-1)})$ to the right\n",
        "    (first line in the innermost `for`-loop),\n",
        "-   then multiplying with $W^{(k)}$ to the right (circular convolution\n",
        "    with the reverse of $\\mathbf{w}^{(k)}$; last line in the innermost\n",
        "    `for`-loop).\n",
        "\n",
        "To get the partial derivatives w.r.t. the $\\mathbf{w}^{(k)}$ and the\n",
        "$\\mathbf{b}^{(k)}$ (the $k$’th part of the gradient), we multiply with\n",
        "$U^{(k)}$ to the right. According to\n",
        "Equation 9, this can be\n",
        "done by first multiplying with $\\sigma'(\\mathbf{z}^{(2k-1)})$ to the\n",
        "right (first line in the innermost `for`-loop). After this,\n",
        "\n",
        "-   storing this value gives the partial derivatives w.r.t. the\n",
        "    $\\mathbf{b}^{(k)}$ (third line in the innermost `for`-loop),\n",
        "-   circular convolution with the reverse of $\\mathbf{z}^{(2k)}$ gives\n",
        "    the partial derivatives w.r.t. the $\\mathbf{w}^{(k)}$ (second line\n",
        "    in the innermost `for`-loop).\n",
        "\n",
        "After the innermost `for`-loop the gradient at the current point\n",
        "(captured by the variables `dw` and `db`) has been updated with the\n",
        "contribution from one more data point.\n",
        "\n",
        "Code for invoking backpropagation can now be as follows."
      ],
      "id": "ffa581d9-15c0-4f7f-9ea4-6ce7f0ac6779"
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "37.486126635221865\n",
            "37.48512012611465\n",
            "37.484093233910855\n",
            "37.483045335032095\n",
            "37.48197578026227\n",
            "37.48088389341876\n",
            "37.47976896994035\n",
            "37.478630275385676\n",
            "37.47746704383529\n",
            "37.476278476190444"
          ]
        }
      ],
      "source": [
        "m = 10   # Number of data points\n",
        "n = 10   # Dimension of the data\n",
        "its = 10 # Number of iterations\n",
        "r = 10   # Number of layers\n",
        "\n",
        "x = np.random.rand(n,m) # Randomly generated measurements\n",
        "y = np.random.rand(n,m) # Randomly generated measurements\n",
        "\n",
        "w, b = backpropagation(x, y, r, its)"
      ],
      "id": "5ebcd374"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<span class=\"theorem-title\">**Exercise 8**</span> If you run the code\n",
        "you will see that the loss function usually decreases in the iterations.\n",
        "Is this always the case? If you increase the learning rate to 0.1, will\n",
        "the loss function decrease when you run the code? If the loss function\n",
        "increases in some iterations, can you think of a way to adjust the\n",
        "learning rate so that the loss function always decreases?\n",
        "\n",
        "## 1.3 Block matrices\n",
        "\n",
        "If we have four matrices $A$, $B$, $C$, $D$, we write\n",
        "<span id=\"eq-blockmatrixdef\">$$\n",
        "\\left(\\begin{array}{c|c} A & B \\\\ \\hline C & D \\end{array}\\right)\n",
        " \\qquad(14)$$</span> for the matrix where the components in these four\n",
        "matrices are placed over, under, to the left, or to the right of each\n",
        "other as indicated. Lines are used to separate the blocks. We call this\n",
        "a *block matrix*.\n",
        "Equation 14 only\n",
        "makes sense if\n",
        "\n",
        "-   $A$ and $B$ have the same number of rows,\n",
        "-   $C$ and $D$ have the same number of rows,\n",
        "\n",
        "for then these can be placed next to each other. In the same way $A$\n",
        "and $C$ must have the same number of columns, and likewise $B$ and $D$. As\n",
        "an example, if $$\n",
        "\\begin{aligned}\n",
        "A &=\\begin{pmatrix} 1 & 1 & 1 \\\\ 1 & 1 & 1 \\end{pmatrix} &\n",
        "B &= \\begin{pmatrix} 2 & 2 \\\\ 2 & 2 \\end{pmatrix} &\n",
        "C &= \\begin{pmatrix} 3 & 3 & 3 \\\\ 3 & 3 & 3 \\\\ 3 & 3 & 3 \\end{pmatrix} &\n",
        "D &= \\begin{pmatrix} 4 & 4 \\\\  4 & 4 \\\\ 4 &4 \\end{pmatrix}\n",
        "\\end{aligned}\n",
        "$$ then $$\n",
        "\\left(\\begin{array}{c|c} A & B \\\\ \\hline C & D \\end{array}\\right)\n",
        "=\n",
        "\\begin{pmatrix} 1 & 1 &1 & 2 & 2 \\\\ 1 & 1 & 1 & 2 & 2 \\\\ 3 & 3 & 3 & 4 & 4 \\\\ 3 & 3 & 3 & 4 & 4 \\\\ 3 & 3 & 3 & 4 & 4 \\end{pmatrix}\n",
        "$$\n",
        "\n",
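        "The example above can be reproduced with NumPy's `np.block` (a small\n",
        "sketch, not part of the assignment):\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "A = np.ones((2, 3))      # block of 1's\n",
        "B = 2 * np.ones((2, 2))  # block of 2's\n",
        "C = 3 * np.ones((3, 3))  # block of 3's\n",
        "D = 4 * np.ones((3, 2))  # block of 4's\n",
        "\n",
        "M = np.block([[A, B], [C, D]])\n",
        "print(M.shape)  # (5, 5)\n",
        "print(M)\n",
        "```\n",
        "\n",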
        "It is easy to generalise to block matrices with more than four blocks.\n",
        "The product of two block matrices\n",
        "$\\left(\\begin{array}{c|c} A & B \\\\ \\hline C & D \\end{array}\\right)$ and\n",
        "$\\left(\\begin{array}{c|c} E & F \\\\ \\hline G & H \\end{array}\\right)$\n",
        "makes sense if\n",
        "\n",
        "-   the number of columns in $A$ equals the number of rows in $E$,\n",
        "-   the number of columns in $B$ equals the number of rows in $G$.\n",
        "\n",
        "Block matrices can then be multiplied in the same way as we multiply\n",
        "ordinary matrices: $$\n",
        "\\left(\\begin{array}{c|c} A & B \\\\ \\hline C & D \\end{array}\\right)\n",
        "\\left(\\begin{array}{c|c} E & F \\\\ \\hline G & H \\end{array}\\right)=\n",
        "\\left(\\begin{array}{c|c} AE+BG & AF+BH \\\\ \\hline CE+DG & CF+DH \\end{array}\\right)\n",
        "$$"
      ],
      "id": "e9746a63-010c-48ca-a817-1ff6482d209e"
    }
  ],
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "path": "/opt/anaconda3/share/jupyter/kernels/python3"
    }
  }
}