Department of Mathematics
The Chinese University of Hong Kong
HSMMC Math Modelling Advanced Training Workshop
May 16, 2026
We now introduce the Gradient Descent Method to obtain an approximate solution to the least-squares problem numerically.
There is a problem of finding the minimum value of a one-variable function \(f(x)\) that can be differentiated as follows.
import numpy as np
import matplotlib.pyplot as plt
# Create x values (matches xmin=-3, xmax=3)
x = np.linspace(-3, 3, 100)
# Define function f(x) = x² + 1
y = x ** 2 + 1
# Create figure
plt.figure(figsize=(6, 4))
# Plot the function (blue, thick line)
plt.plot(x, y, 'blue', linewidth=2)
# Dashed vertical line at x* = 0 (from y=0 to y=1)
plt.plot([0, 0], [0, 1], 'k--') # black dashed line
# Label x* below the x-axis
plt.text(0, -0.3, r'$x^*$', ha='center', fontsize=12)
# Axes settings (match TikZ exactly)
plt.axhline(0, color='black', linewidth=0.8) # x-axis
plt.axvline(0, color='black', linewidth=0.8) # y-axis
plt.xlabel(r'$x$', fontsize=12)
plt.ylabel(r'$f(x)$', fontsize=12)
plt.xlim(-3, 3)## (-3.0, 3.0)
## (-1.0, 5.0)
## ([], [])
## ([], [])
Then, the point \(x^*\) giving the minimum value of the function (it is called the optimal solution) by Fermat’s critical point theorem satisfies the following: \[ f'(x^*) = 0. \] Therefore, we can determine which of the solutions satisfying the equation \(f'(x) = 0\) is the optimal solution. However, when the function is complex, it is not easy to find a critical point by solving the equation. In this case, the critical point is obtained numerically. In this section, we look at the gradient descent method, which is the most popular numerical optimization method to find the minimum value of a given function. The basic idea of the gradient descent method is to find the slope of the function, move it toward downhill, and repeat it until it reaches the extreme value.
Gradient Descent Method] Algorithm
Let’s explain this GDM algorithm. Set \(x_k\) and compute \(g_k = f'(x_k)\). If \(|g_k| < \epsilon\) (it satisfies the critical point theorem within the margin of error we allow), then the algorithm stops and will give \(x_k\) as the optimal solution. Here, \(\epsilon\) (epsilon, tolerance) satisfies \(0 \leq \epsilon \ll 1\). A notation \(``\ll"\) in this inequality means that the epsilon is much smaller than \(1\). Hence, we give a small tolerance epsilon \(\epsilon = 10^{-6}\), which is very close to zero, and with a learning rate \(\eta\). And let the iteration number \(k\) be \(1\). If \(|g_k| > \epsilon\), \(x_{k+1} = x_k - \eta g_k\) is determined using the formula. Then \(g_k = f'(x_k)\) is calculated to determine whether or not the critical point theorem is satisfied. If the critical point theorem is not satisfied, then \(x_1\) is determined similarly. In this way, we create \(x_1, x_2, \dots (\to x^*)\). If the value of \(g_k = f'(x_k)\) is within a given tolerance after some repeated steps \(x_{k+1} = x_k - \eta f'(x_k)\), then the algorithm stops. We expect some \(x_k\) or the limit \(x^*\) to satisfy \(f'(x^*) = 0\).
Let’s see how the GDM works in detail. Suppose the slope of the tangent line to the function \(f\) is negative at the approximate solution \(x_k\) after the \(k\)-th iteration. It is \(g_k = f'(x_k) < 0\) as shown in the figure below. This means the function decreases when \(x_k\) moves from left to right, so we can expect \(x^*\) is located on the right side of \(x_k\). If \(x_k\) moves from the left to the right, then \(x_{k+1}\) is closer to the optimal value \(x^*\). Therefore, we move from \(x_k\) to \(x_{k+1} = x_k - \eta g_k\) with \(-\eta g_k > 0\) direction.
import numpy as np
import matplotlib.pyplot as plt
# ----------------------
# Data for the function
# ----------------------
x = np.linspace(-3, 3, 100)
y = x ** 2 + 1
# ----------------------
# Create plot
# ----------------------
plt.figure(figsize=(7, 5))
# Plot f(x) = x² + 1 (blue thick line)
plt.plot(x, y, color='blue', linewidth=3)
# Dashed vertical line at x* = 0
plt.plot([0, 0], [0, 1], 'k--', linewidth=1.5)
# Red arrow: (-1.5, 3.25) → (-0.75, 1.5625)
plt.arrow(
-1.5, 3.25, # start point
0.75, -1.6875, # dx, dy
color='red',
linewidth=2,
head_width=0.12,
head_length=0.15,
length_includes_head=True
)
# ----------------------
# All labels (exact positions)
# ----------------------
plt.text(-1.7, 3.5, r"$f'(x_k) < 0$", color='red', fontsize=14)
plt.text(-1.5, -0.4, r"$x_k$", ha='center', fontsize=12)
plt.text(-0.75, -0.4, r"$x_{k+1}$", ha='center', fontsize=12)
plt.text(0, -0.4, r"$x^*$", ha='center', fontsize=12)
# ----------------------
# Axes styling (match TikZ)
# ----------------------
plt.axhline(0, color='black', linewidth=1)
plt.axvline(0, color='black', linewidth=1)
plt.xlabel(r'$x$', fontsize=12)
plt.ylabel(r'$f(x)$', fontsize=12)
plt.xlim(-3, 3)## (-3.0, 3.0)
## (-1.0, 5.0)
## ([], [])
## ([], [])
Similarly, the slope of the tangent line to the function \(f\) is positive at the approximate solution \(x_k\) after the \(k\)-th iteration. It is \(g_k = f'(x_k) > 0\) as shown in the figure below. It means the function is increasing when \(x_k\) moves from left to right, so we can expect \(x^*\) is located on the left side of \(x_k\). If \(x_k\) moves to the left, then \(x_{k+1}\) is closer to the optimal value \(x^*\). Therefore, we move from \(x_k\) to \(x_{k+1} = x_k - \eta g_k\) with \(-\eta g_k < 0\) direction. If we repeat the process until \(x_k\) moves to \(x^*\), then \(f'(x_k)\) will converge to \(0\). In this way, we get the approximate solution \(x_k\).
import numpy as np
import matplotlib.pyplot as plt
# Create x values for the curve
x = np.linspace(-3, 3, 100)
y = x ** 2 + 1
# Create figure
plt.figure(figsize=(7, 5))
# Plot the function: f(x) = x² + 1 (blue thick line)
plt.plot(x, y, 'blue', linewidth=3)
# Dashed vertical line at x* = 0
plt.plot([0, 0], [0, 1], 'k--', linewidth=1.5)
# Red arrow: (1.5, 3.25) → (0.75, 1.5625)
plt.arrow(1.5, 3.25, -0.75, -1.6875,
color='red', linewidth=2,
head_width=0.12, head_length=0.15,
length_includes_head=True)
# All labels (exact positions)
plt.text(1.7, 3.5, r"$f'(x_k) > 0$", color='red', fontsize=14)
plt.text(1.5, -0.4, r"$x_k$", ha='center', fontsize=12)
plt.text(0.75, -0.4, r"$x_{k+1}$", ha='center', fontsize=12)
plt.text(0, -0.4, r"$x^*$", ha='center', fontsize=12)
# Axes style (matches TikZ exactly)
plt.axhline(0, color='black', linewidth=1)
plt.axvline(0, color='black', linewidth=1)
plt.xlabel(r'$x$', fontsize=12)
plt.ylabel(r'$f(x)$', fontsize=12)
# Axis limits
plt.xlim(-3, 3)## (-3.0, 3.0)
## (-1.0, 5.0)
## ([], [])
## ([], [])
This shows that the GDM will find \(x_1, x_2, x_3, \dots\) that satisfy \[f(x_1) > f(x_2) > f(x_3) > \dots.\] Here, \(\eta\) determines the magnitude of movements. We call this \(``\)the learning rate\("\). If the learning rate is too large, \(x_{k+1}\) can pass over \(x^*\), or even the function value may increase. So we should be careful to choose a reasonable learning rate \(\eta\).
import numpy as np
import matplotlib.pyplot as plt
# Create figure with two subplots (side by side)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
# --------------------------
# Shared function data
# --------------------------
x = np.linspace(-4, 4, 100)
y = x**2 + 1
# ==========================
# LEFT PLOT: Do not converge (too large step)
# ==========================
ax1.plot(x, y, 'blue', linewidth=3)
# Red horizontal arrow: (-2,5) → (2,5)
ax1.arrow(-2, 5, 4, 0,
color='red', linewidth=2,
head_width=0.2, head_length=0.15, length_includes_head=True)
# Labels
ax1.text(-2, -0.8, r'$x_k$', ha='center', fontsize=12)
ax1.text(2, -0.8, r'$x_{k+1}$', ha='center', fontsize=12)
ax1.text(0, -1.5, 'Do not converge', ha='center', fontsize=12)
# Axes styling (matches TikZ)
ax1.axhline(0, color='black', linewidth=1)
ax1.axvline(0, color='black', linewidth=1)
ax1.set_xlabel(r'$x$')
ax1.set_ylabel(r'$f(x)$')
ax1.set_xlim(-4, 4)## (-4.0, 4.0)
## (-1.0, 10.0)
## []
## []
# ==========================
# RIGHT PLOT: Too slow (small step)
# ==========================
ax2.plot(x, y, 'blue', linewidth=3)
# Three small red arrows
ax2.arrow(-2, 5, 0.25, -0.9375,
color='red', linewidth=2, head_width=0.15, head_length=0.1, length_includes_head=True)
ax2.arrow(-1.75, 4.0625, 0.1875, -0.6211,
color='red', linewidth=2, head_width=0.15, head_length=0.1, length_includes_head=True)
ax2.arrow(-1.5625, 3.4414, 0.15625, -0.4687,
color='red', linewidth=2, head_width=0.15, head_length=0.1, length_includes_head=True)
# Labels
ax2.text(-2, -0.8, r'$x_k$', ha='center', fontsize=12)
ax2.text(-1.75, -0.8, r'$x_{k+1}$', ha='center', fontsize=12)
ax2.text(-1.5625, -0.8, r'$x_{k+2}$', ha='center', fontsize=12)
ax2.text(-1.40625, -0.8, r'$x_{k+3}$', ha='center', fontsize=12)
ax2.text(-1, -1.5, 'Too slow', ha='center', fontsize=12)
# Axes styling
ax2.axhline(0, color='black', linewidth=1)
ax2.axvline(0, color='black', linewidth=1)
ax2.set_xlabel(r'$x$')
ax2.set_ylabel(r'$f(x)$')
ax2.set_xlim(-4, 4)## (-4.0, 4.0)
## (-1.0, 10.0)
## []
## []
import numpy as np
import matplotlib.pyplot as plt
# ==============================================
# 1. Define the function and its derivative (from LaTeX)
# f(x) = x² + 1 f'(x) = 2x
# ==============================================
def f(x):
return x**2 + 1
def df(x):
return 2 * x
# ==============================================
# 2. Gradient Descent Algorithm (EXACT as LaTeX)
# ==============================================
def gradient_descent(x_start, eta=0.1, tol=1e-6, max_iter=1000):
x = x_start
path = [x] # Store iteration path for plotting
for k in range(max_iter):
grad = df(x)
# Convergence condition: |f'(x_k)| ≤ ε
if abs(grad) <= tol:
print(f"lgorithm Succeed! (Converged at iteration {k+1})")
break
# Update rule: x_{k+1} = x_k - η * f'(x_k)
x = x - eta * grad
path.append(x)
else:
print("Did not converge within max iterations")
return x, np.array(path)
# ==============================================
# 3. Run GDM with parameters from LaTeX
# ==============================================
x_opt, path = gradient_descent(x_start=1.5, eta=0.1, tol=1e-6)## lgorithm Succeed! (Converged at iteration 68)
##
## Optimal x* = 0.000000
## Minimum f(x*) = 1.000000
## Iterations: 68
# ==============================================
# 4. Plot 1: Function + Convergence Path (matches LaTeX figure)
# ==============================================
x_plot = np.linspace(-3, 3, 500)
y_plot = f(x_plot)
plt.figure(figsize=(10, 5))
plt.plot(x_plot, y_plot, 'b-', linewidth=2, label=r'$f(x) = x^2 + 1$')
plt.scatter(path, f(path), color='red', s=30, zorder=5)
plt.plot(path, f(path), 'r--', linewidth=1.5, label='GD Path')
plt.axvline(x=0, color='k', linestyle='dashed', linewidth=1, label=r'$x^* = 0$')
plt.xlabel(r'$x$')
plt.ylabel(r'$f(x)$')
plt.title('Gradient Descent on $f(x) = x^2 + 1$')
plt.grid(alpha=0.3)
plt.legend()
plt.xlim(-3, 3)## (-3.0, 3.0)
## (-1.0, 5.0)
# ==============================================
# 5. Demo: Too Large Learning Rate (Diverges)
# ==============================================
print("\n--- Testing TOO LARGE learning rate (η=1.0) ---")##
## --- Testing TOO LARGE learning rate (η=1.0) ---
## Did not converge within max iterations
# ==============================================
# 6. Demo: Too Small Learning Rate (Too Slow)
# ==============================================
print("\n--- Testing TOO SMALL learning rate (η=0.01) ---")##
## --- Testing TOO SMALL learning rate (η=0.01) ---
## Did not converge within max iterations
Find the minimum of \(f(x) = 2x^2 - 3x + 2\). Take \(x_1 = 0\), \(\eta = 0.1\), and \(\epsilon = 10^{-6}\) in the following code.
Solution
From \(f'(x) = 4x - 3 = 0\), the critical point becomes \(x = \frac{3}{4} = 0.75\). Since \(f(x)\) is convex (opens upward), the minimum value \[ f\left(\frac{3}{4}\right) = \frac{7}{8} = 0.875 \] is obtained at \(x = \frac{3}{4}\).
The output of the algorithm is:
In the above example, the points created by the gradient descent method \((x_1, f(x_1)),\ (x_2, f(x_2)),\ (x_3, f(x_3)),\ \dots\) are shown on the coordinate plane along with the graph of the function \(y = f(x)\). From the figure, it is easy to see that the sequence converged to the point \((x^*, f(x^*))\) on the curve with the minimum value.
It can be seen that the sequence converges to the optimal solution \(x^*\).
\[ \begin{array}{cc} \textbf{Parameter} & \textbf{Value} \\ \hline \textrm{Function} & f(x) = 2x^2 - 3x + 2 \\ \textrm{Derivative} & f'(x) = 4x - 3 \\ \textrm{Initial iterate} ~x_1 & 0.0 \\ \textrm{Tolerance}~ \epsilon & 10^{-6} \\ \textrm{Learning rate} ~\eta & 0.1 \\ \textrm{Converged solution} ~ x^* & 0.75 \\ \textrm{Minimum value}~ f(x^*) & 0.875 \\ \textrm{Iterations to converge} & 31 \\ \hline \end{array} \]
Summary of Example 1 Gradient Descent Results
import numpy as np
import matplotlib.pyplot as plt
# ==============================================
# 1. Define function and derivative (from LaTeX)
# f(x) = 2x² - 3x + 2
# f'(x) = 4x - 3
# ==============================================
def f(x):
return 2 * x**2 - 3 * x + 2
def df(x):
return 4 * x - 3
# ==============================================
# 2. Gradient Descent (EXACT parameters from LaTeX)
# ==============================================
x0 = 0.0 # initial iterate
tol = 1e-6 # tolerance
eta = 0.1 # learning rate
max_iter = 300
path = [] # store iteration path
for k in range(max_iter):
path.append(x0)
g0 = df(x0)
# Stopping condition
if abs(g0) <= tol:
print("Algorithm Succeed!")
break
# Update step
x0 = x0 - eta * g0## Algorithm Succeed!
# ==============================================
# 3. Print results (matches LaTeX output)
# ==============================================
print(f"x* = {x0}")## x* = 0.7499998341945602
## |g*| = 6.632217592894563e-07
## f(x*) = 0.8750000000000553
## iteration number = 31
# ==============================================
# 4. Plot (EXACTLY matches your LaTeX figure)
# ==============================================
x_plot = np.linspace(-0.2, 1.2, 200)
y_plot = f(x_plot)
plt.figure(figsize=(8, 6))
# Plot the blue curve
plt.plot(x_plot, y_plot, 'b-', linewidth=2, label=r'$f(x)=2x^2-3x+2$')
# Plot red iteration points
x_path = np.array(path)
y_path = f(x_path)
plt.scatter(x_path, y_path, color='red', s=25, zorder=5)
# Dashed line at optimal point x* = 0.75
plt.plot([0.75, 0.75], [0, 0.875], 'k--', linewidth=1)
plt.text(0.75, -0.05, r'$x^*$', ha='center', fontsize=12)
# Axes settings (same as LaTeX)
plt.axhline(0, color='black', linewidth=0.8)
plt.axvline(0, color='black', linewidth=0.8)
plt.xlabel(r'$x$')
plt.ylabel(r'$f(x)$')
plt.xlim(-0.2, 1.2)## (-0.2, 1.2)
## (0.8, 2.2)
plt.grid(alpha=0.3)
plt.title("Gradient Descent Path on $f(x) = 2x^2 - 3x + 2$")
plt.tight_layout()
plt.show()Find the minimum value of \(f(x) = 9x^2 - 7x + 6\). Use \(x_1 = 0\), \(\eta = 0.1\), and \(\epsilon = 10^{-6}\) with the above GDM code.
Solution
First, compute the derivative: \[ f'(x) = 18x - 7. \] Setting \(f'(x) = 0\) gives the critical point: \[ 18x - 7 = 0 \implies x^* = \frac{7}{18} \approx 0.3889. \] Since \(f''(x) = 18 > 0\), the function is convex, so \(x^* = \frac{7}{18}\) is the minimum point. The minimum value is: \[ f\left(\frac{7}{18}\right) = 9\left(\frac{7}{18}\right)^2 - 7\left(\frac{7}{18}\right) + 6 = \frac{167}{36} \approx 4.6389. \]
Summary of Example 2 Gradient Descent Results
\[ \begin{array}{cc} \textbf{Parameter} & \textbf{Value} \\ \hline \textrm{Function} & f(x) = 9x^2 - 7x + 6 \\ \textrm{Derivative} & f'(x) = 18x - 7 \\ \textrm{Initial iterate}~ x_1 & 0.0 \\ \textrm{Tolerance}~ \epsilon & 10^{-6} \\ \textrm{Learning rate}~ \eta & 0.1 \\ \textrm{True minimum}~ x^* & 7/18 \approx 0.3889 \\ \textrm{Minimum value}~ f(x^*) & 167/36 \approx 4.6389 \\ \hline \end{array} \]
import numpy as np
import matplotlib.pyplot as plt
# --------------------------
# Define the function and its derivative
# --------------------------
def f(x):
return 9 * x**2 - 7 * x + 6
def df(x):
return 18 * x - 7
# --------------------------
# Gradient Descent Parameters
# --------------------------
x = 0.0 # initial value x1 = 0
eta = 0.1 # learning rate η = 0.1
epsilon = 1e-6 # tolerance ε = 10⁻⁶
max_iter = 500
path = [] # to store the path of x values
# --------------------------
# Run Gradient Descent
# --------------------------
for k in range(max_iter):
path.append(x)
gradient = df(x)
if abs(gradient) <= epsilon:
print("Algorithm Succeed!")
break
x = x - eta * gradient## Algorithm Succeed!
# --------------------------
# Print results
# --------------------------
print(f"Optimal x* = {x:.8f}")## Optimal x* = 0.38888894
## Gradient at x* = 9.21e-07
## Minimum f(x*) = 4.63888889
## Iterations taken: 72
# --------------------------
# Plot the function and path
# --------------------------
x_plot = np.linspace(-0.2, 1.0, 200)
y_plot = f(x_plot)
plt.figure(figsize=(8, 5))
plt.plot(x_plot, y_plot, 'b-', linewidth=2, label=r'$f(x)=9x^2-7x+6$')
plt.scatter(path, [f(p) for p in path], color='red', s=25, zorder=5)
plt.axvline(x=7/18, color='k', linestyle='--', label=r'True minimum $x^*=7/18$')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Gradient Descent on $f(x)=9x^2-7x+6$')
plt.legend()
plt.grid(alpha=0.3)
plt.show()The GDM introduced in the first session is for minimizing a function \(f(x)\) of one variable. Let’s generalize the algorithm for minimizing a function of several variables. The least-squares problem we learned is a multivariate function \(f(\mathbf{x})\) with at least two independent variables. The least-squares problem was solved using the error function \(E(\mathbf{u})\) transformed using multiple variables. The least-squares problem can also be solved by the gradient descent method (GDM).
GDM Algorithm for minimizing a multi-variable function
Step 1 Set an initial iterate \(\mathbf{u}_1 \in \mathbb{R}^n\), tolerance \(0\leq\epsilon<1\), initial learning rate \(\eta\) (eta), and iteration number \(k:=1\).
Step 2 Compute \(\mathbf{g}_k = \nabla E(\mathbf{u}_k)\). If \(\|\mathbf{g}_k\| \leq \epsilon\), then stop.
Step 3 Set \(\mathbf{u}_{k+1} = \mathbf{u}_k - \eta \mathbf{g}_k\), \(k:=k+1\) and go to Step 2.
All the steps of the GDM algorithm for minimizing a multi-variable function are the same as for a one-variable function.
Correspondence between One-Variable and Multi-Variable Functions
\[ \begin{array} \hline \textbf{One variable function} & \textbf{Multi-variable function} \\ \hline \textrm{Scalar}~ x & \textrm{Vector}~ \mathbf{u} \\ \textrm{Absolute value}~ |a| & \textrm{Norm of vector}~ \|\mathbf{g}\| \\ \textrm{Derivative}~ f'(x) & \textrm{Gradient}~ \nabla E(\mathbf{u}) \\ \hline \end{array} \]
Let \(x,y\) be independent variables and \(z\) be a third variable. Now we can consider a function \(z=f(x,y)\), which is dependent on \(x\) and \(y\). Functions of several variables can be defined in the same manner.
In a 3-dimensional (coordinate) space, \(z=f(x,y)\) can be considered as a point \((x,y,z)\). As \(x\) and \(y\) move, the graph of \(z=f(x,y)\) becomes a surface. For example, the graph of a two-variable function \(z=f(x,y)=-xye^{-x^2-y^2}\) can be drawn using the codes as follows.
# Import required libraries (standard Python)
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# --------------------------
# Example 1: 3D Surface Plot of f(x,y) = -xye^(-x²-y²)
# --------------------------
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
# Create grid
x_surf = np.linspace(-2, 2, 100)
y_surf = np.linspace(-2, 2, 100)
X_surf, Y_surf = np.meshgrid(x_surf, y_surf)
# Define the surface function
f = -X_surf * Y_surf * np.exp(-X_surf**2 - Y_surf**2)
# Plot 3D surface
surf = ax.plot_surface(X_surf, Y_surf, f, cmap='viridis',
alpha=0.6, edgecolor='none')
ax.set_title('3D Surface of $f(x,y) = -xye^{-x^2 - y^2}$', fontsize=14)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.view_init(30, 30) # Same viewing angle as your LaTeX figure
# Show all plots
plt.tight_layout()
plt.show()The graph of \(f\) in the figure above intuitively shows that \(f\) has a local maximum value at the peak and a local minimum value at the valley. To find those critical points, the notion of a gradient of a multi-variable function is required.
The gradient of a function of two variables \(f(x,y)\) is defined as follows: \[ \operatorname{grad} f(x,y) = \nabla f(x,y) = \begin{bmatrix} \frac{\partial f}{\partial x} \\[6pt] \frac{\partial f}{\partial y} \end{bmatrix}. \]
The gradient of \(f(x,y)\) is a \(2\times1\) vector, where \(\frac{\partial f}{\partial x}\) means the partial derivative of \(f\) with respect to \(x\). The partial derivative with respect to \(x\) considers other variables except \(x\) is constant. Similarly, \(\frac{\partial f}{\partial y}\) means the partial derivative of \(f\) with respect to \(y\).
The gradient of \(z = f(x,y) = -xye^{-x^2-y^2}\) can be found as follows.
The output is: \[ (2x^2 y e^{(-x^2 - y^2)} - y e^{(-x^2 - y^2)},\ 2x y^2 e^{(-x^2 - y^2)} - x e^{(-x^2 - y^2)}) \]
Answer:
\[ \operatorname{grad} f(x,y) = \nabla f(x,y) = \left( 2x^2 y e^{-(x^2+y^2)} - y e^{-(x^2+y^2)},\ 2x y^2 e^{-(x^2+y^2)} - x e^{-(x^2+y^2)} \right) \]
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
# 1. Symbolic gradient computation (SymPy)
x, y = sp.symbols('x y')
f = -x * y * sp.exp(-x**2 - y**2)
# Compute partial derivatives
df_dx = sp.diff(f, x)
df_dy = sp.diff(f, y)
# Print the symbolic gradient (matches your image's formula)
print("Symbolic Gradient (SymPy):")## Symbolic Gradient (SymPy):
## ∂f/∂x = 2*x**2*y*exp(-x**2 - y**2) - y*exp(-x**2 - y**2)
## ∂f/∂y = 2*x*y**2*exp(-x**2 - y**2) - x*exp(-x**2 - y**2)
# 2. Convert to numerical functions (NumPy)
df_dx_num = sp.lambdify((x, y), df_dx, 'numpy')
df_dy_num = sp.lambdify((x, y), df_dy, 'numpy')
f_num = sp.lambdify((x, y), f, 'numpy')
# 3. Visualize the gradient vector field
x_vals = np.linspace(-2, 2, 20)
y_vals = np.linspace(-2, 2, 20)
X, Y = np.meshgrid(x_vals, y_vals)
# Evaluate gradient at grid points
U = df_dx_num(X, Y)
V = df_dy_num(X, Y)
# Plot vector field + surface contours
plt.figure(figsize=(8, 6))
contour = plt.contour(X, Y, f_num(X, Y), cmap='viridis')
plt.clabel(contour, inline=True, fontsize=8)
plt.quiver(X, Y, U, V, color='red', scale=50, width=0.003)
plt.title(r'Gradient Vector Field of $f(x,y) = -x y e^{-x^2 - y^2}$')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(alpha=0.3)
plt.axis('equal')## (np.float64(-2.0), np.float64(2.0), np.float64(-2.0), np.float64(2.0))
Similarly, the gradient of a multi-variable function with three or more variables \(f\) can be obtained. In general, the gradient of the \(n\)-variable function \(f(x_1, x_2, \dots, x_n)\) is defined as: \[ \operatorname{grad} f(x_1, x_2, \dots, x_n) = \nabla f(x_1, x_2, \dots, x_n) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\[6pt] \frac{\partial f}{\partial x_2} \\[6pt] \vdots \\[6pt] \frac{\partial f}{\partial x_n} \end{bmatrix}. \]
Given:
\[ \begin{array}{ccccccc} \hline x_i & u_i \\ \hline 0 & 1 \\ 1 & 3 \\ 2 & 4 \\ 3 & 4 \\ \hline \end{array} \]
The first example of the least-squares problem was as follows. For each data \((x_i, u_i)\), let \(\hat{u}_i\) be the value obtained by substituting \(x_i\) in a linear function \(u = a + bx\). Thus: \[ \hat{u}_i = a + bx_i. \] If this equation’s solution does not exist, it is possible to find \(a, b\) where the square of error \((u_i - \hat{u}_i)^2\) is minimized. The error function \(E(\mathbf{u})\), which is the sum of squared errors \((u_i - \hat{u}_i)^2\), is defined as follows (the factor \(\frac{1}{2}\) is included for computational convenience and does not affect the conclusion):
\[\begin{align*} E(\mathbf{u}) & = E(a,b) = \frac{1}{2} \left\{ (a-1)^2 + (a+b-3)^2 + (a+2b-4)^2 + (a+3b-4)^2 \right\} \\ & = \frac{1}{2} \| A\mathbf{u} - \mathbf{y} \|^2. \end{align*}\]
Let’s use the GDM to find the approximate solution \(\hat{\mathbf{u}}\) that minimizes \(E(\mathbf{u})\).
Set an initial iterate \(\mathbf{u}_1 = (-2.5, -2.5)\), tolerance \(\epsilon = 10^{-6}\), and initial learning rate \(\eta = 0.1\).
GDM Parameters for Least-Squares Problem
\[ \begin{array}{ccc} \textbf{Parameter} & \textbf{Value} \\ \hline \textrm{Initial iterate} ~\mathbf{u}_1 & (-2.5, -2.5) \\ \textrm{Tolerance} ~ \epsilon & 10^{-6} \\ \textrm{Learning rate} ~ \eta & 0.1 \\ \hline \end{array} \]
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# ==============================
# 1. Define Error Function E(a,b)
# ==============================
def error_func(a, b):
return 0.5 * (
(a - 1)**2 +
(a + b - 3)**2 +
(a + 2*b - 4)**2 +
(a + 3*b - 4)**2
)
# ==============================
# 2. Define Gradient of E(a,b)
# ==============================
def gradient(a, b):
dE_da = (a - 1) + (a + b - 3) + (a + 2*b - 4) + (a + 3*b - 4)
dE_db = (a + b - 3) + 2*(a + 2*b - 4) + 3*(a + 3*b - 4)
return np.array([dE_da, dE_db])
# ==============================
# 3. Gradient Descent Parameters (MATCH YOUR LATEX)
# ==============================
u = np.array([-2.5, -2.5]) # initial point
tol = 1e-6 # tolerance
eta = 0.1 # learning rate
max_iter = 300
path = [] # store iteration path
# ==============================
# 4. Run Gradient Descent
# ==============================
for k in range(max_iter):
grad = gradient(u[0], u[1])
grad_norm = np.linalg.norm(grad)
path.append((u[0], u[1], error_func(u[0], u[1])))
if grad_norm <= tol:
print("Algorithm Succeed!")
break
u = u - eta * grad## Algorithm Succeed!
# ==============================
# 5. Print Final Results
# ==============================
print(f"u* = {u}")## u* = [1.49999929 1.00000033]
## E(x*) = 0.500000000000364
## iteration number = 118
# ==============================
# 6. Plot 1: 3D Error Surface + Descent Path
# ==============================
a_vals = np.linspace(-3, 5, 50)
b_vals = np.linspace(-3, 5, 50)
A, B = np.meshgrid(a_vals, b_vals)
Z = error_func(A, B)
path_np = np.array(path)
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(A, B, Z, cmap='viridis', alpha=0.6, edgecolor='none')
ax.plot(path_np[:,0], path_np[:,1], path_np[:,2], 'r-', linewidth=2)
ax.scatter(path_np[:,0], path_np[:,1], path_np[:,2], c='black', s=15)
ax.set_xlabel('a')
ax.set_ylabel('b')
ax.set_zlabel('E(a,b)')
ax.view_init(30, 30)
ax.set_box_aspect([5, 5, 1])
plt.title("3D Error Function & Gradient Descent Path")
plt.show()# ==============================
# 7. Plot 2: Least-Squares Line Fit
# ==============================
x_data = np.array([0, 1, 2, 3])
y_data = np.array([1, 3, 4, 4])
x_fit = np.linspace(-0.5, 3.5, 100)
y_fit = u[0] + u[1] * x_fit
plt.figure(figsize=(8, 6))
plt.scatter(x_data, y_data, c='red', s=60, label='Data points')
plt.plot(x_fit, y_fit, 'b-', linewidth=2, label=f'Fit: y = {u[0]:.2f} + {u[1]:.2f}x')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(alpha=0.3)
plt.legend()
plt.title('Least-Squares Linear Fit')
plt.xlim(-1, 4)## (-1.0, 4.0)
## (-1.0, 5.0)
As we found earlier, we found the least-square line by algorithm from the given data. The least-squares solution is obtained with the algorithm. The least-squares line \(y = \frac{3}{2} + x\) was obtained with the answer as a coefficient.
Summary of GDM least-squares linear fit
\[ \begin{array}{ccc} \textbf{Parameter} & \textbf{Value} \\ \hline \textrm{Initial iterate}~\mathbf{u}_1 & (-2.5, -2.5) \\ \textrm{Tolerance}~ \epsilon & 10^{-6} \\ \textrm{Learning rate}~ \eta & 0.1 \\ \textrm{Converged solution}~ \mathbf{u}^* & (1.5, 1.0) \\ \textrm{Minimum error}~ E(\mathbf{x}^*) & 0.5 \\ \textrm{Iterations to converge}~ & 118 \\ \textrm{Least-squares line}~ & y = x + \frac{3}{2} \\ \hline \end{array} \]
Least-Squares Line Solution
The least-square line is \[ y = \frac{3}{2} + x. \]
Therefore, the line we get from the least-squares solution is \[ y = \frac{3}{2} + x. \]
We can also find a quadratic function \(y = a + bx + cx^2\) that best fits the given data. In this case, the error function \(E(\mathbf{u})\) is as follows: \[ \begin{aligned} E(a,b,c) &= \frac{1}{2}\left\{(a-1)^2 + (a+b+c-3)^2 + (a+2b+4c-4)^2 + (a+3b+9c-4)^2\right\} \\ &= \frac{1}{2} \| A\mathbf{u} - \mathbf{y} \|^2. \end{aligned} \]
Summary of least-squares problems
\[ \begin{array}{ccc} \textbf{Problem Type} & \textbf{Form/Result} \\ \hline \textrm{Linear least-squares model} & y = a + bx \\ \textrm{Linear least-squares solution} & y = x + \frac{3}{2} \\ \textrm{Quadratic least-squares model} & y = a + bx + cx^2 \\ \textrm{Error function for quadratic fit} & E(a,b,c) = \frac{1}{2}\|A\mathbf{u}-\mathbf{y}\|^2 \\ \hline \end{array} \]
import numpy as np
import matplotlib.pyplot as plt
# --------------------------
# 1. Define the error function E(a,b,c)
# --------------------------
def E(a, b, c):
return 0.5 * (
(a - 1)**2 +
(a + b + c - 3)**2 +
(a + 2*b + 4*c - 4)**2 +
(a + 3*b + 9*c - 4)**2
)
# --------------------------
# 2. Compute the gradient of E(a,b,c) (analytical form)
# --------------------------
def grad_E(a, b, c):
# Partial derivative with respect to a
dE_da = (a - 1) + (a + b + c - 3) + (a + 2*b + 4*c - 4) + (a + 3*b + 9*c - 4)
# Partial derivative with respect to b
dE_db = (a + b + c - 3) + 2*(a + 2*b + 4*c - 4) + 3*(a + 3*b + 9*c - 4)
# Partial derivative with respect to c
dE_dc = (a + b + c - 3) + 4*(a + 2*b + 4*c - 4) + 9*(a + 3*b + 9*c - 4)
return np.array([dE_da, dE_db, dE_dc])
# --------------------------
# 3. Gradient Descent Algorithm
# --------------------------
# Initial parameters (matches your code)
u = np.array([1.0, 1.0, 1.0]) # initial value (a,b,c)
tol = 1e-6 # tolerance level
eta = 0.01 # learning rate
max_iter = 5000 # max iterations
for k in range(max_iter):
g = grad_E(u[0], u[1], u[2])
gn = np.linalg.norm(g)
if gn <= tol:
print("Algorithm Succeed!")
break
u = u - eta * g## Algorithm Succeed!
# --------------------------
# 4. Print results (matches your output)
# --------------------------
print(f"u* = {u}")## u* = [ 1.00000138 2.49999724 -0.49999918]
## E(x*) = 1.5966473717177356e-12
## iteration number = 4207
# --------------------------
# 5. Plot the data points and least-squares quadratic curve
# --------------------------
# Original data points
x_data = np.array([0, 1, 2, 3])
y_data = np.array([1, 3, 4, 4])
# Generate curve points
x_curve = np.linspace(-0.5, 3.5, 100)
y_curve = u[0] + u[1] * x_curve + u[2] * (x_curve ** 2)
# Plot
plt.figure(figsize=(8, 6))
plt.scatter(x_data, y_data, color='red', zorder=5, label='Data Points')
plt.plot(x_curve, y_curve, color='blue', label=f'Quadratic Fit: $y = {u[0]:.2f} + {u[1]:.2f}x + {u[2]:.2f}x^2$')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Least-Squares Quadratic Curve via Gradient Descent')
plt.legend()
plt.grid(alpha=0.3)
plt.ylim(-1, 5)## (-1.0, 5.0)
## (-0.5, 4.0)