In this practical section, we use Python's NumPy library to work with feature vectors directly. Feature vectors represent data points, and fundamental operations can be performed on them. The walkthrough illustrates vector manipulation, norms, dot products, and distances: operations performed constantly in machine learning algorithms.

Imagine we have data representing simplified profiles of two online users based on their engagement (e.g., hours spent) with three types of content: articles, videos, and podcasts.

- User A: [10 hours articles, 5 hours videos, 2 hours podcasts]
- User B: [8 hours articles, 9 hours videos, 3 hours podcasts]

We can represent these as vectors in $\mathbb{R}^3$.

## Setting Up the Vectors

First, let's import NumPy and create these vectors.

```python
import numpy as np

# User engagement vectors (hours: articles, videos, podcasts)
user_a = np.array([10, 5, 2])
user_b = np.array([8, 9, 3])

print(f"User A vector: {user_a}")
print(f"User B vector: {user_b}")
```

## Basic Operations

Let's say we want to analyze the difference in engagement between these two users. We can simply subtract one vector from the other.

```python
# Calculate the difference vector
difference = user_a - user_b
print(f"Difference (A - B): {difference}")
```

The result `[2, -4, -1]` shows that User A spent 2 more hours on articles, 4 fewer hours on videos, and 1 fewer hour on podcasts than User B.

Now suppose we want to project what User A's engagement might look like if they increased their activity by 50% across all categories. This is a scalar multiplication.

```python
# Scale User A's engagement by 1.5 (150%)
scaled_user_a = user_a * 1.5
print(f"Scaled User A: {scaled_user_a}")
```

## Calculating Norms (Magnitude)

How can we quantify the "overall engagement" of each user? Vector norms give us a sense of magnitude. Let's calculate the $L_2$ (Euclidean) and $L_1$ (Manhattan) norms.

The $L_2$ norm is calculated as $||v||_2 = \sqrt{\sum_i v_i^2}$.

```python
# L2 norm (Euclidean)
l2_norm_a = np.linalg.norm(user_a)  # default is the L2 norm
l2_norm_b = np.linalg.norm(user_b)

print(f"L2 Norm (User A): {l2_norm_a:.2f}")
print(f"L2 Norm (User B): {l2_norm_b:.2f}")
```

The $L_2$ norm gives the straight-line distance from the origin in the 3D feature space, a general measure of magnitude that takes all components into account.

The $L_1$ norm is calculated as $||v||_1 = \sum_i |v_i|$.

```python
# L1 norm (Manhattan)
l1_norm_a = np.linalg.norm(user_a, ord=1)
l1_norm_b = np.linalg.norm(user_b, ord=1)

print(f"L1 Norm (User A): {l1_norm_a:.2f}")
print(f"L1 Norm (User B): {l1_norm_b:.2f}")
```

The $L_1$ norm is the total sum of engagement hours across categories, which in this context is perhaps the more directly interpretable quantity: total time spent. User B has a higher total engagement time (20 hours) than User A (17 hours), and a slightly larger $L_2$ norm as well (about 12.41 vs. 11.36). Different norms highlight different aspects of a vector's magnitude.

## Dot Product for Similarity

The dot product measures the alignment or similarity between two vectors: a larger positive dot product suggests the vectors point in more similar directions.

The dot product is $A \cdot B = \sum_i A_i B_i$.

```python
# Calculate the dot product
dot_product = np.dot(user_a, user_b)
print(f"Dot Product (A . B): {dot_product}")
```

The result (131) is positive, indicating some alignment in the two users' engagement patterns.
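As a quick sanity check on the formula (a minimal sketch, not something you would need in practice), the same value can be computed with element-wise multiplication followed by a sum:

```python
# Element-wise products, then a sum; equivalent to np.dot for 1-D arrays
# 10*8 + 5*9 + 2*3 = 80 + 45 + 6 = 131
manual_dot = (user_a * user_b).sum()
print(f"Manual dot product: {manual_dot}")  # 131
```

For 1-D arrays, NumPy's `@` operator (`user_a @ user_b`) computes the same inner product.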
To get a standardized measure of similarity, independent of magnitude (total hours), we can calculate the cosine similarity:

$$ \cos(\theta) = \frac{A \cdot B}{||A||_2 \, ||B||_2} $$

```python
# Calculate cosine similarity
cosine_similarity = dot_product / (l2_norm_a * l2_norm_b)
print(f"Cosine Similarity: {cosine_similarity:.2f}")
```

A cosine similarity close to 1 means the vectors point in very similar directions (proportional engagement across categories), a value close to 0 indicates orthogonality (very different patterns), and -1 indicates opposite directions. Here, the value of 0.93 suggests relatively similar engagement profiles, despite differences in specific categories. This metric is widely used in recommendation systems and information retrieval.

## Calculating Distance

How "far apart" are these users in terms of their engagement profiles? We can use norms to calculate distances. The most common is the Euclidean distance, which is simply the $L_2$ norm of the difference vector we calculated earlier:

$$ \text{Distance}(A, B) = ||A - B||_2 $$

```python
# Calculate Euclidean distance (L2 norm of the difference vector)
euclidean_distance = np.linalg.norm(difference)
# Alternatively: np.linalg.norm(user_a - user_b)
print(f"Euclidean Distance between User A and User B: {euclidean_distance:.2f}")
```

This distance condenses into a single number how different the two users' engagement vectors are in the 3D space; lower distances imply more similar users. This concept is fundamental to algorithms like k-Nearest Neighbors (k-NN).

We can also calculate the Manhattan distance (the $L_1$ norm of the difference):

```python
# Calculate Manhattan distance
manhattan_distance = np.linalg.norm(difference, ord=1)
print(f"Manhattan Distance between User A and User B: {manhattan_distance:.2f}")
```

The Manhattan distance sums the absolute differences along each axis ($|2| + |-4| + |-1| = 7$). It represents the distance traveled if you could only move parallel to the feature-space axes, like navigating a street grid.

## Summary

In this practical exercise, you used NumPy to:

- Represent data points (user profiles) as vectors.
- Perform basic vector arithmetic (subtraction, scalar multiplication) to compare and transform data.
- Calculate $L_1$ and $L_2$ norms to understand vector magnitude and total engagement.
- Compute the dot product and cosine similarity to measure the alignment of user preferences.
- Calculate Euclidean and Manhattan distances to quantify the difference between user profiles.

These operations are the building blocks of many machine learning techniques, and being comfortable with their calculation and interpretation in tools like NumPy is an important step toward applying linear algebra effectively. As you progress, you'll see the same operations applied to vectors with many more dimensions, representing complex data features.
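To make the k-NN connection concrete, here is a minimal sketch of a 1-nearest-neighbor lookup over a small pool of users. Users C and D and their engagement values are hypothetical, invented purely for illustration:

```python
import numpy as np

# Engagement vectors (articles, videos, podcasts);
# C and D are hypothetical users added for this sketch
users = {
    "A": np.array([10, 5, 2]),
    "B": np.array([8, 9, 3]),
    "C": np.array([1, 12, 6]),
    "D": np.array([11, 4, 1]),
}

def nearest_user(query, pool):
    """Return the pool member with the smallest Euclidean distance
    to `query`, skipping exact self-matches (distance 0)."""
    best_name, best_dist = None, np.inf
    for name, vec in pool.items():
        dist = np.linalg.norm(query - vec)
        if 0 < dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist

name, dist = nearest_user(users["A"], users)
print(f"Nearest neighbor to User A: {name} (distance {dist:.2f})")
```

With these made-up values, User D (distance $\sqrt{3} \approx 1.73$) is closer to User A than User B is ($\sqrt{21} \approx 4.58$); a full k-NN algorithm generalizes this idea by keeping the k closest neighbors rather than just one.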