Once you have trained a LinearRegression model using Scikit-learn's fit method, the model learns the optimal parameters (coefficients and intercept) that define the best-fit line (or hyperplane in higher dimensions) through your training data. Understanding these parameters is important for interpreting the relationship between your features and the target variable.
Recall the general equation for a linear regression model with n features:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

Where:

- $\hat{y}$ is the predicted value of the target variable,
- $b_0$ is the intercept,
- $b_1, b_2, \dots, b_n$ are the coefficients for the features $x_1, x_2, \dots, x_n$.
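For instance, with two features and hypothetical parameters $b_0 = 5$, $b_1 = 2$, and $b_2 = -1$, an input with $x_1 = 3$ and $x_2 = 4$ gives $\hat{y} = 5 + 2 \cdot 3 + (-1) \cdot 4 = 7$.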
After fitting a LinearRegression model in Scikit-learn, these learned parameters are stored in specific attributes of the model object:

- The intercept ($b_0$) is stored in the intercept_ attribute. This will be a single floating-point number.
- The coefficients ($b_1, b_2, \dots, b_n$) are stored in the coef_ attribute. This will be a NumPy array, where each element corresponds to the coefficient for a feature. The order of coefficients matches the order of columns in the feature matrix X used during training.

Let's assume you have a fitted model named lr_model:
# Assume X_train and y_train are defined, and lr_model is a fitted
# LinearRegression instance. For example:
from sklearn.linear_model import LinearRegression

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)  # Fit the model first

# Access the intercept (b0)
intercept = lr_model.intercept_
print(f"Intercept (b0): {intercept:.4f}")

# Access the coefficients (b1, b2, ...)
coefficients = lr_model.coef_
print(f"Coefficients (b1, b2, ...): {coefficients}")

# If you have feature names (e.g., from a Pandas DataFrame):
# feature_names = ['feature1', 'feature2', ...]
# for feature, coef in zip(feature_names, coefficients):
#     print(f"  Coefficient for {feature}: {coef:.4f}")
Intercept: The value stored in intercept_ tells you the predicted value of your target variable when all input features are equal to zero. The practical relevance of this value depends heavily on the context of your problem. If having all features at zero is a meaningful scenario in your dataset (e.g., features represent counts or amounts that can be zero), the intercept has a direct interpretation. If zero is outside the realistic range for your features (e.g., predicting house price based on square footage, where zero square footage is impossible), the intercept might be more of a mathematical artifact needed to position the regression line correctly, rather than having a standalone practical meaning.
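As a quick numerical check (a sketch assuming the fitted lr_model from above), a prediction at an all-zeros input reduces to the intercept:

import numpy as np

# One sample with every feature set to zero
x_zero = np.zeros((1, lr_model.coef_.shape[0]))

# The prediction at the origin is exactly b0
print(lr_model.predict(x_zero)[0])  # same value as lr_model.intercept_
print(lr_model.intercept_)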
Coefficients: Each value in the coef_ array quantifies the relationship between a specific feature and the target variable. For example, if the coefficient for feature1 is 15.7, it means:

"Holding all other features constant, a one-unit increase in feature1 is associated with an expected increase of 15.7 units in the target variable."

Conversely, if a coefficient is negative, say -3.2 for feature2, it means:

"Holding all other features constant, a one-unit increase in feature2 is associated with an expected decrease of 3.2 units in the target variable."
The phrase "holding all other features constant" is important. Linear regression assumes that the effect of one feature on the target is independent of the values of other features, and the coefficient reflects this isolated effect.
Visualization of a simple linear regression line $y = 10 + 2x$. The intercept ($b_0 = 10$) is the value of $y$ where the line crosses the y-axis ($x = 0$). The coefficient or slope ($b_1 = 2$) indicates that for every one-unit increase in $x$, $y$ increases by 2 units.
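If you want to reproduce a figure like this yourself, a minimal matplotlib sketch of the line from the caption might look like:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = 10 + 2 * x  # intercept b0 = 10, slope b1 = 2

plt.plot(x, y, label="y = 10 + 2x")
plt.scatter([0], [10], color="red", zorder=3, label="intercept at x = 0")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()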
It's tempting to compare the magnitudes of coefficients to determine which feature is "most important". However, this is often misleading if the features are on different scales.
Consider predicting salary based on years_experience (ranging from 0 to 30) and projects_completed (ranging from 0 to 500). A coefficient of 2000 for years_experience alongside a coefficient of 50 for projects_completed does not automatically mean experience is vastly more impactful. A one-unit change in years is a much larger proportional change than a one-unit change in projects.
To make coefficients directly comparable in terms of importance, you typically need to scale your features first (e.g., using StandardScaler or MinMaxScaler from Scikit-learn, which we will cover in Chapter 4). When features are scaled to have a similar range (like a standard deviation of 1), larger coefficient magnitudes generally indicate a stronger effect on the target variable for a standardized change in that feature. Without scaling, interpret coefficients relative to the units of their respective features.
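As a preview of what that looks like in code (StandardScaler is covered properly in Chapter 4; this is a sketch assuming the same X_train and y_train as before):

from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features to zero mean and unit variance, then fit
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_train, y_train)

# These coefficients describe the effect of a one-standard-deviation
# change in each feature, so their magnitudes are comparable
print(pipeline.named_steps["linearregression"].coef_)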