Let's put the theory into practice. One of the most direct applications of solving linear systems in machine learning is determining the optimal parameters (coefficients) for models like linear regression. We want to find the best-fit line or hyperplane through our data points.

Consider a simple linear regression scenario. We have a set of data points $(x_i, y_i)$, where $x_i$ represents the features and $y_i$ represents the target value. For a model with multiple features, $x_i$ would be a vector. Our goal is to find a set of coefficients, let's call them $\theta$ (often represented by $x$ in the $Ax=b$ form), such that the model's predictions, $\hat{y} = X\theta$, are as close as possible to the actual target values $y$.

The standard approach in linear regression involves minimizing the sum of squared errors between predictions and actual values. This minimization problem leads directly to a system of linear equations known as the Normal Equations:

$$X^T X \theta = X^T y$$

Here:

- $X$ is the matrix of features (often called the design matrix). Each row represents a data point, and each column represents a feature. A column of ones is usually added to $X$ to account for the intercept term.
- $\theta$ is the vector of model coefficients we want to find.
- $y$ is the vector of actual target values.
- $X^T$ is the transpose of the feature matrix $X$.

Notice that this equation has the familiar form $Ax = b$, where $A = X^T X$, $x = \theta$, and $b = X^T y$. Assuming the matrix $A = X^T X$ is invertible, we can solve for $\theta$ using the matrix inverse:

$$\theta = (X^T X)^{-1} X^T y$$
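If you want to see why minimizing squared error leads to the Normal Equations, here is a brief sketch of the standard derivation. The sum of squared errors can be written as

$$J(\theta) = \|X\theta - y\|^2 = (X\theta - y)^T (X\theta - y),$$

and setting its gradient with respect to $\theta$ to zero gives

$$\nabla_\theta J(\theta) = 2X^T X\theta - 2X^T y = 0 \quad\Longrightarrow\quad X^T X\theta = X^T y,$$

which is exactly the system we solve below.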
Let's work through an example using NumPy. Suppose we have some data representing, for instance, the square footage of houses and their corresponding prices. We want to find the linear model `price = intercept + coefficient * square_footage` that best fits this data.

### Setting up the Data

First, let's define our sample data. We'll use two features for a slightly more complex example (e.g., square footage and number of bedrooms) to predict the price.

```python
import numpy as np

# Sample data: [Square Footage, Number of Bedrooms]
X_features = np.array([
    [1500, 3],
    [2000, 4],
    [1200, 2],
    [1800, 3],
    [2500, 4]
])

# Corresponding house prices (in $1000s)
y_target = np.array([300, 450, 250, 400, 550])

# Add a column of ones to X_features for the intercept term
# This creates our design matrix X
X_design = np.hstack([np.ones((X_features.shape[0], 1)), X_features])

print("Design Matrix X (with intercept column):\n", X_design)
print("\nTarget Vector y:\n", y_target)
```

### Calculating Coefficients using the Inverse Method

Now, we apply the Normal Equation formula $\theta = (X^T X)^{-1} X^T y$.

```python
# Calculate X transpose X (XTX)
XTX = X_design.T @ X_design  # Using the @ operator for matrix multiplication

# Calculate X transpose y (XTy)
XTy = X_design.T @ y_target

print("\nX^T * X:\n", XTX)
print("\nX^T * y:\n", XTy)

# Calculate the inverse of XTX
XTX_inv = np.linalg.inv(XTX)
print("\nInverse of (X^T * X):\n", XTX_inv)

# Calculate the coefficients theta
theta = XTX_inv @ XTy

print("\nCalculated Coefficients (theta) using Inverse:\n", theta)
print(f"Intercept: {theta[0]:.2f}")
print(f"Coefficient for Sq Footage: {theta[1]:.2f}")
print(f"Coefficient for Bedrooms: {theta[2]:.2f}")
```

The resulting `theta` vector contains the intercept and the coefficients for each feature (square footage and number of bedrooms).

### Solving using `np.linalg.solve()`

While calculating the inverse works, it's often less numerically stable and computationally more expensive than directly solving the system $Ax = b$. NumPy provides the `np.linalg.solve(A, b)` function, which solves the system without explicitly computing the inverse. It's generally the preferred method.

Let's solve $(X^T X)\theta = X^T y$ using `np.linalg.solve()`. Here, $A = X^T X$ and $b = X^T y$.

```python
# Solve the system (XTX) * theta = XTy directly
theta_solve = np.linalg.solve(XTX, XTy)

print("\nCalculated Coefficients (theta) using np.linalg.solve():\n", theta_solve)
print(f"Intercept: {theta_solve[0]:.2f}")
print(f"Coefficient for Sq Footage: {theta_solve[1]:.2f}")
print(f"Coefficient for Bedrooms: {theta_solve[2]:.2f}")
```

You should see that the coefficients obtained using `np.linalg.solve()` are virtually identical to those calculated using the explicit inverse method. However, for larger or more complex systems, `np.linalg.solve()` offers better performance and numerical accuracy.
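NumPy also offers `np.linalg.lstsq`, which solves the least-squares problem on the design matrix $X$ directly instead of forming $X^T X$ at all; this tends to be more robust when $X^T X$ is nearly singular. A minimal sketch, assuming `X_design` and `y_target` from the cells above:

```python
# Alternative: least squares on X directly, without forming X^T X.
# Assumes X_design and y_target are defined as in the earlier cells.
# rcond=None uses the machine-precision cutoff for small singular values.
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_design, y_target, rcond=None)

print("Coefficients (theta) using np.linalg.lstsq():\n", theta_lstsq)
print("Rank of X_design:", rank)
```

For a small, well-conditioned problem like this one, all three approaches should give essentially the same coefficients.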
### Visualizing the Fit (Simplified 1-Feature Case)

Let's simplify to one feature (e.g., square footage) to visualize the result.

```python
import numpy as np

# Use only square footage
X_features_simple = np.array([1500, 2000, 1200, 1800, 2500])[:, np.newaxis]  # Ensure it's a column vector
y_target_simple = np.array([300, 450, 250, 400, 550])

# Add intercept column
X_design_simple = np.hstack([np.ones((X_features_simple.shape[0], 1)), X_features_simple])

# Solve using np.linalg.solve
XTX_simple = X_design_simple.T @ X_design_simple
XTy_simple = X_design_simple.T @ y_target_simple
theta_simple = np.linalg.solve(XTX_simple, XTy_simple)

print("\nCoefficients for Simple Model (Intercept, Sq Footage Coeff):\n", theta_simple)

# Generate points for the regression line
x_line = np.linspace(1100, 2600, 100)  # Range covering our data
y_line = theta_simple[0] + theta_simple[1] * x_line

# Create Plotly chart data
plot_data = {
    "layout": {
        "title": "Linear Regression Fit (Price vs. Square Footage)",
        "xaxis": {"title": "Square Footage"},
        "yaxis": {"title": "Price ($1000s)"},
        "hovermode": "closest",
        "template": "plotly_white"  # Cleaner look for web
    },
    "data": [
        {
            "type": "scatter",
            "mode": "markers",
            "x": X_features_simple.flatten().tolist(),
            "y": y_target_simple.tolist(),
            "name": "Data Points",
            "marker": {"color": "#228be6", "size": 10}  # blue
        },
        {
            "type": "scatter",
            "mode": "lines",
            "x": x_line.tolist(),
            "y": y_line.tolist(),
            "name": "Regression Line",
            "line": {"color": "#f03e3e", "width": 3}  # red
        }
    ]
}
```
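The `plot_data` dictionary follows Plotly's figure specification, so if you have the `plotly` package installed (it isn't required for the rest of this section), one way to render it is sketched below:

```python
import plotly.graph_objects as go

# Build a figure from the traces and layout defined in plot_data above
fig = go.Figure(data=plot_data["data"], layout=plot_data["layout"])
fig.show()  # Renders the interactive chart in a notebook or browser tab
```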
*Linear regression line fitted to sample data using coefficients derived from solving the Normal Equations.*

This hands-on section demonstrates how solving systems of linear equations is a practical tool for finding parameters in machine learning models. By representing the problem in matrix form ($Ax=b$, or more specifically, $(X^T X)\theta = X^T y$), we can leverage efficient numerical libraries like NumPy to find the optimal coefficients, either through matrix inversion or, preferably, using direct solvers like `np.linalg.solve()`.
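As a final sanity check, you can plug a new data point into the fitted model. The short sketch below is self-contained (it refits the two-feature model from this section) and uses a hypothetical house of 1600 square feet with 3 bedrooms purely for illustration:

```python
import numpy as np

# Refit the two-feature model from this section
X_features = np.array([[1500, 3], [2000, 4], [1200, 2], [1800, 3], [2500, 4]])
y_target = np.array([300, 450, 250, 400, 550])
X_design = np.hstack([np.ones((X_features.shape[0], 1)), X_features])
theta = np.linalg.solve(X_design.T @ X_design, X_design.T @ y_target)

# Hypothetical new house: 1600 sq ft, 3 bedrooms (leading 1 for the intercept)
x_new = np.array([1.0, 1600, 3])
predicted_price = x_new @ theta  # y_hat = x * theta

print(f"Predicted price (in $1000s): {predicted_price:.1f}")
```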