Okay, let's dive into the mechanics of actually calculating partial derivatives. The concept introduced earlier, treating other variables as constants, is the main idea you need to remember. If you're comfortable calculating derivatives for single-variable functions like f(x), you already have most of the skills needed.
The Process: Freeze and Differentiate
When you need to find the partial derivative of a function with respect to a specific variable, follow these steps:
- Identify the Target Variable: Determine which variable you are differentiating with respect to. For example, if you're calculating $\frac{\partial f}{\partial x}$, your target variable is x.
- Treat Other Variables as Constants: Mentally (or literally, if it helps) replace all other variables in the function with fixed constants. Think of them as numbers like 5, -2, or π.
- Apply Standard Differentiation Rules: Now, differentiate the function using the familiar rules (like the Power Rule, Constant Rule, Sum Rule) only with respect to your target variable. Remember that the derivative of any constant term is zero.
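If you want to see this rule mechanically, symbolic math libraries follow exactly the same logic. Here is a minimal sketch using SymPy (an assumption: the library is installed, e.g. via `pip install sympy`), applied to an arbitrary illustrative function $h(x, y) = x^2 y + \sin(y)$:

```python
# Minimal sketch: SymPy's diff() implements "freeze and differentiate" --
# differentiating with respect to x automatically treats y as a constant.
import sympy as sp

x, y = sp.symbols("x y")
h = x**2 * y + sp.sin(y)  # an arbitrary two-variable function for illustration

print(sp.diff(h, x))  # 2*x*y          (sin(y) is constant in x, so it drops out)
print(sp.diff(h, y))  # x**2 + cos(y)
```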
Let's work through a couple of examples to make this concrete.
Example 1: A Simple Polynomial
Consider the function:
$$f(x, y) = x^2 + y^3 + 4$$
Let's find the partial derivatives with respect to x and y.
Calculating $\frac{\partial f}{\partial x}$ (partial derivative with respect to x):
- Target Variable: x.
- Treat Others as Constants: Treat y as a constant. This means $y^3$ is also treated as a constant. The number 4 is already a constant.
- Differentiate:
  - The derivative of $x^2$ with respect to x is $2x$.
  - The derivative of $y^3$ (treated as a constant) with respect to x is 0.
  - The derivative of 4 (a constant) with respect to x is 0.
Putting it together using the Sum Rule:
$$\frac{\partial f}{\partial x} = 2x + 0 + 0 = 2x$$
Calculating $\frac{\partial f}{\partial y}$ (partial derivative with respect to y):
- Target Variable: y.
- Treat Others as Constants: Treat x as a constant. This means $x^2$ is also treated as a constant. The number 4 is a constant.
- Differentiate:
  - The derivative of $x^2$ (treated as a constant) with respect to y is 0.
  - The derivative of $y^3$ with respect to y is $3y^2$.
  - The derivative of 4 (a constant) with respect to y is 0.
Putting it together:
$$\frac{\partial f}{\partial y} = 0 + 3y^2 + 0 = 3y^2$$
Notice how the process isolates the effect of changing only one variable. When we calculated $\frac{\partial f}{\partial x}$, the $y^3$ term vanished because, from the perspective of x, y isn't changing.
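You can also see the "freezing" numerically. Here is a quick sanity check in plain Python that nudges one input at a time by a small step; the evaluation point $(x, y) = (2, 3)$ and the step size $10^{-6}$ are arbitrary choices for illustration:

```python
# Numerical sanity check of Example 1 via finite differences:
# nudge one variable while literally holding the other fixed.
def f(x, y):
    return x**2 + y**3 + 4

x0, y0, h = 2.0, 3.0, 1e-6

df_dx = (f(x0 + h, y0) - f(x0, y0)) / h  # y frozen; expect 2x = 4
df_dy = (f(x0, y0 + h) - f(x0, y0)) / h  # x frozen; expect 3y^2 = 27

print(round(df_dx, 4), round(df_dy, 4))  # 4.0 27.0
```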
Example 2: Variables Multiplied Together
Let's look at a function structure often seen when dealing with model parameters, like weights (w) and biases (b):
$$g(w, b) = w^2 b + 5w - 2b + 7$$
Calculating $\frac{\partial g}{\partial w}$ (partial derivative with respect to w):
- Target Variable: w.
- Treat Others as Constants: Treat b as a constant.
- Differentiate:
  - Consider the term $w^2 b$. Since b is treated as a constant coefficient, the derivative of $w^2 b$ with respect to w is $b \times 2w = 2wb$. (Think of it like finding the derivative of $5x^2$, which is $5 \times 2x = 10x$; here b plays the role of 5.)
  - The derivative of $5w$ with respect to w is 5.
  - The derivative of $-2b$ (treated as a constant) with respect to w is 0.
  - The derivative of 7 (a constant) with respect to w is 0.
Combining these:
$$\frac{\partial g}{\partial w} = 2wb + 5 + 0 + 0 = 2wb + 5$$
Calculating $\frac{\partial g}{\partial b}$ (partial derivative with respect to b):
- Target Variable: b.
- Treat Others as Constants: Treat w as a constant. This means $w^2$ and $5w$ are also treated as constants.
- Differentiate:
  - Consider the term $w^2 b$. Since $w^2$ is treated as a constant coefficient, the derivative of $w^2 b$ with respect to b is $w^2 \times 1 = w^2$. (Think of it like finding the derivative of $ax$ with respect to x, which is just $a$; here $w^2$ plays the role of $a$ and b plays the role of x.)
  - The derivative of $5w$ (treated as a constant) with respect to b is 0.
  - The derivative of $-2b$ with respect to b is $-2$.
  - The derivative of 7 (a constant) with respect to b is 0.
Combining these:
$$\frac{\partial g}{\partial b} = w^2 + 0 - 2 + 0 = w^2 - 2$$
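As a check, the same SymPy approach from earlier confirms both results (again assuming the library is installed):

```python
import sympy as sp

w, b = sp.symbols("w b")
g = w**2 * b + 5*w - 2*b + 7

print(sp.diff(g, w))  # 2*b*w + 5
print(sp.diff(g, b))  # w**2 - 2
```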
Key Takeaway
Calculating partial derivatives reuses the differentiation rules you learned for single-variable functions. The essential technique is to temporarily "freeze" all variables except the one you are differentiating with respect to, treating them as constants during the calculation. This allows you to determine how the function's output changes as that single target variable changes, holding everything else steady. This technique is fundamental for understanding how to adjust parameters in machine learning models using methods like gradient descent.
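To make that connection concrete, here is a toy sketch of a single gradient descent update on $g(w, b)$ from Example 2, using the partials we derived. The starting point and learning rate are arbitrary illustrative choices, and $g$ itself isn't a realistic loss function (it has no minimum), so this shows only the mechanics of one step:

```python
# Toy sketch: one gradient-descent update on g(w, b) = w^2*b + 5w - 2b + 7,
# using the partial derivatives derived in Example 2.
def grad_g(w, b):
    dg_dw = 2 * w * b + 5  # dg/dw, with b frozen
    dg_db = w**2 - 2       # dg/db, with w frozen
    return dg_dw, dg_db

w, b = 1.0, 2.0  # arbitrary starting parameters
lr = 0.1         # arbitrary learning rate

dg_dw, dg_db = grad_g(w, b)            # (9.0, -1.0) at this point
w, b = w - lr * dg_dw, b - lr * dg_db  # move against the gradient

print(round(w, 4), round(b, 4))  # 0.1 2.1
```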