Scatter plots are a fundamental tool for visualizing the relationship between two numerical variables. While Matplotlib provides the scatter() function, Seaborn offers scatterplot(), a more powerful and flexible function designed to work seamlessly with Pandas DataFrames and easily incorporate additional variables through visual semantics like color, size, and marker style.
Building upon Seaborn's high-level approach, scatterplot() simplifies the process of creating visually appealing and informative scatter plots directly from your data structures.
scatterplot
The most common use case for a scatter plot is to plot the values of two variables against each other. Let's assume you have data loaded into a Pandas DataFrame. Seaborn's scatterplot() function makes this straightforward. You typically provide the DataFrame to the data argument and specify the column names (as strings) for the x-axis and y-axis using the x and y arguments.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Generate some sample data
np.random.seed(0) # for reproducibility
data = pd.DataFrame({
'x_values': np.random.rand(50) * 10,
'y_values': 2.5 * np.random.rand(50) * 10 + np.random.randn(50) * 5
})
# Create a basic scatter plot
plt.figure(figsize=(8, 5)) # Optional: Adjust figure size
sns.scatterplot(x='x_values', y='y_values', data=data)
# Add title and labels (using Matplotlib functions)
plt.title('Basic Scatter Plot using Seaborn')
plt.xlabel('X Values')
plt.ylabel('Y Values')
# Display the plot
plt.show()
This code generates a simple scatter plot where each point corresponds to a row in the DataFrame, positioned according to its 'x_values' and 'y_values'. Seaborn applies a few defaults of its own here, such as a thin white edge around each marker. For Seaborn's full visual theme, call sns.set_theme() before plotting.
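The data argument is optional. As a minimal sketch, assuming the values live in NumPy arrays rather than a DataFrame, the arrays can be passed directly to x and y. The array names and values below are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical sample arrays (not from a real dataset)
rng = np.random.default_rng(0)
x_values = rng.uniform(0, 10, size=50)
y_values = 2.5 * x_values + rng.normal(0, 5, size=50)

fig, ax = plt.subplots(figsize=(8, 5))
# Pass the arrays themselves instead of column names; no data= argument is needed
sns.scatterplot(x=x_values, y=y_values, ax=ax)
ax.set_xlabel("X Values")
ax.set_ylabel("Y Values")

# The plotted coordinates are stored on the first collection of the Axes
offsets = ax.collections[0].get_offsets()
print(offsets.shape)
```

Call plt.show() afterwards to display the figure. With plain arrays there are no column names to use as axis labels, so the labels are set explicitly.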
hue, size, and style
One of the significant advantages of seaborn.scatterplot is its ability to map other variables in your dataset to visual properties of the points. This allows you to represent more dimensions of your data within a single 2D plot.
hue for Categorical Distinction
The hue parameter is perhaps the most commonly used semantic mapping. It assigns different colors to points based on the values in a categorical column. This is excellent for comparing the relationship between x and y across different groups.
Let's use one of Seaborn's built-in datasets, 'tips', which contains information about restaurant tips. We can explore the relationship between the total bill and the tip amount, differentiated by the day of the week.
import seaborn as sns
import matplotlib.pyplot as plt
# Load the example dataset
tips = sns.load_dataset("tips")
# Create scatter plot with hue mapping
plt.figure(figsize=(9, 6))
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="day")
plt.title('Total Bill vs. Tip by Day')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip Amount ($)')
plt.show()
Scatter plot comparing total bill and tip amount, with points colored by the day of the week.
Seaborn automatically assigns distinct colors to each category ('Thur', 'Fri', 'Sat', 'Sun') and adds a legend.
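If the automatic colors or legend order are not what you want, the hue_order and palette parameters of scatterplot() give you control over both. The following is a minimal sketch, assuming a small made-up DataFrame that stands in for the 'tips' data so that it runs without downloading anything. The values are illustrative only.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up stand-in for the 'tips' data (illustrative values only)
df = pd.DataFrame({
    "total_bill": [10.3, 21.0, 16.5, 30.1, 24.6, 12.9, 18.2, 27.4],
    "tip": [1.7, 3.5, 2.5, 5.0, 3.6, 2.0, 3.0, 4.1],
    "day": ["Sun", "Sat", "Fri", "Thur", "Sun", "Sat", "Fri", "Thur"],
})

day_order = ["Thur", "Fri", "Sat", "Sun"]

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(
    data=df, x="total_bill", y="tip",
    hue="day",
    hue_order=day_order,   # fixes the order of colors and legend entries
    palette="colorblind",  # a named Seaborn palette
    ax=ax,
)

# Read the legend entries back to confirm their order
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Fixing the order explicitly keeps colors consistent across several plots, even when a category is missing from one of them.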
size for Numerical or Categorical Emphasis
The size parameter controls the area of the markers, mapping another variable to this visual property. This can be useful for emphasizing points based on a numerical quantity (like the size of the party) or distinguishing groups.
import seaborn as sns
import matplotlib.pyplot as plt
# Load the example dataset
tips = sns.load_dataset("tips")
# Create scatter plot with size mapping
plt.figure(figsize=(9, 6))
sns.scatterplot(data=tips, x="total_bill", y="tip", size="size") # 'size' column refers to party size
plt.title('Total Bill vs. Tip (Point Size by Party Size)')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip Amount ($)')
# Improve legend placement if needed
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0)
plt.tight_layout(rect=[0, 0, 0.85, 1]) # Adjust layout to make space for legend
plt.show()
Here, larger points indicate larger dining parties. Seaborn handles the mapping of the numerical 'size' column to appropriate marker sizes and generates a legend.
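The default range of marker sizes can be narrow. The sizes parameter accepts a (minimum, maximum) tuple that sets the range of marker areas used for a numeric variable. A minimal sketch follows, assuming a made-up DataFrame with a hypothetical party_size column in place of the 'size' column of 'tips'.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in data; 'party_size' is a hypothetical column name
df = pd.DataFrame({
    "total_bill": [10.3, 21.0, 16.5, 30.1, 24.6, 12.9, 18.2, 48.3],
    "tip": [1.7, 3.5, 2.5, 5.0, 3.6, 2.0, 3.0, 6.7],
    "party_size": [1, 2, 2, 3, 4, 2, 3, 6],
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(
    data=df, x="total_bill", y="tip",
    size="party_size",
    sizes=(20, 200),  # smallest and largest marker area, in points squared
    ax=ax,
)

# Per-point marker areas actually used by the scatter collection
marker_areas = ax.collections[0].get_sizes()
print(marker_areas.min(), marker_areas.max())
```

A wider range makes differences in the mapped variable easier to see, at the cost of more overlap between large markers.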
style for Categorical Marker Shapes
Similar to hue, the style parameter maps a categorical variable to different marker shapes (circles, squares, crosses, etc.). This provides another way to distinguish groups, especially useful in combination with hue or for black-and-white publications.
import seaborn as sns
import matplotlib.pyplot as plt
# Load the example dataset
tips = sns.load_dataset("tips")
# Create scatter plot with style mapping
plt.figure(figsize=(9, 6))
sns.scatterplot(data=tips, x="total_bill", y="tip", style="time") # 'time' column: Lunch or Dinner
plt.title('Total Bill vs. Tip (Marker Style by Time of Day)')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip Amount ($)')
plt.show()
Now, points representing 'Lunch' and 'Dinner' have different marker styles.
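To choose the shapes yourself rather than accept the defaults, the markers parameter accepts a dictionary from category to Matplotlib marker code. This is a minimal sketch with a made-up DataFrame standing in for 'tips'. The particular markers ("o" for circles, "s" for squares) are an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in data (illustrative values only)
df = pd.DataFrame({
    "total_bill": [10.3, 21.0, 16.5, 30.1, 24.6, 12.9, 18.2, 27.4],
    "tip": [1.7, 3.5, 2.5, 5.0, 3.6, 2.0, 3.0, 4.1],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner",
             "Dinner", "Lunch", "Lunch", "Dinner"],
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(
    data=df, x="total_bill", y="tip",
    style="time",
    style_order=["Lunch", "Dinner"],        # order of legend entries
    markers={"Lunch": "o", "Dinner": "s"},  # explicit shape per category
    color="black",                          # single color, e.g. for print
    ax=ax,
)

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Both markers here are filled shapes. Seaborn does not allow mixing filled markers with line-art markers such as "x" or "+" in one style mapping.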
You can combine hue, size, and style in a single plot to represent even more information, but be careful not to make the plot so complex that it becomes difficult to interpret. The following example combines two of them, hue and style.
import seaborn as sns
import matplotlib.pyplot as plt
# Load the example dataset
tips = sns.load_dataset("tips")
# Create scatter plot combining hue and style
plt.figure(figsize=(10, 7))
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="day", style="time")
plt.title('Total Bill vs. Tip (Color by Day, Style by Time)')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip Amount ($)')
# Improve legend placement
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0)
plt.tight_layout(rect=[0, 0, 0.85, 1]) # Adjust layout
plt.show()
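All three semantics can also be used at once. The sketch below assumes a small made-up DataFrame (with a hypothetical party_size column) and maps color, marker shape, and marker area to three different columns. It is meant to show the mechanics rather than to recommend this much encoding in one plot.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in data; column names mirror 'tips' except 'party_size'
df = pd.DataFrame({
    "total_bill": [10.3, 21.0, 16.5, 30.1, 24.6, 12.9, 18.2, 48.3],
    "tip": [1.7, 3.5, 2.5, 5.0, 3.6, 2.0, 3.0, 6.7],
    "day": ["Sat", "Sun", "Sat", "Sun", "Sat", "Sun", "Sat", "Sun"],
    "time": ["Lunch", "Dinner", "Dinner", "Lunch",
             "Lunch", "Dinner", "Dinner", "Lunch"],
    "party_size": [1, 2, 2, 3, 4, 2, 3, 6],
})

fig, ax = plt.subplots(figsize=(7, 5))
sns.scatterplot(
    data=df, x="total_bill", y="tip",
    hue="day",          # color by day
    style="time",       # marker shape by time of day
    size="party_size",  # marker area by party size
    sizes=(30, 200),
    ax=ax,
)

# With several semantics, the legend contains one labeled group per variable
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

The legend grows quickly as semantics are added, which is one practical signal that a plot is carrying too many variables.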
Because Seaborn builds on Matplotlib, you can still use Matplotlib functions to customize the plot after creating it with Seaborn. The seaborn.scatterplot function returns the Matplotlib Axes object on which the plot was drawn. You can capture this object and call its methods for fine-grained control, or use pyplot functions like plt.title(), plt.xlabel(), plt.ylabel(), plt.xlim(), etc.
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
plt.figure(figsize=(9, 6))
# Capture the Axes object returned by Seaborn
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="smoker")
# Customize using the Axes object or plt functions
ax.set_title('Total Bill vs. Tip (Colored by Smoker Status)')
ax.set_xlabel('Total Bill Amount ($)')
plt.ylabel('Tip Received ($)') # Using plt works too
ax.grid(True, linestyle='--', alpha=0.6) # Add a grid
plt.show()
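scatterplot() also accepts an ax argument, so you can draw into Axes that you created yourself, for example to place two scatter plots side by side. The following is a minimal sketch under that assumption, again using a made-up DataFrame with a hypothetical party_size column.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in data (illustrative values only)
df = pd.DataFrame({
    "total_bill": [10.3, 21.0, 16.5, 30.1, 24.6, 12.9, 18.2, 27.4],
    "tip": [1.7, 3.5, 2.5, 5.0, 3.6, 2.0, 3.0, 4.1],
    "party_size": [1, 2, 2, 3, 4, 2, 3, 4],
    "smoker": ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No"],
})

# Create the figure and two Axes with Matplotlib, then hand each to Seaborn
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

left = sns.scatterplot(data=df, x="total_bill", y="tip",
                       hue="smoker", ax=axes[0])
right = sns.scatterplot(data=df, x="party_size", y="tip",
                        hue="smoker", ax=axes[1], legend=False)

# The returned objects are the same Axes that were passed in
left.set_title("Tip vs. Total Bill")
left.set_xlim(0, 40)
right.set_title("Tip vs. Party Size")
right.grid(True, linestyle="--", alpha=0.6)

fig.tight_layout()
```

Passing legend=False to the second call avoids repeating an identical legend in both panels.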
In summary, seaborn.scatterplot offers a convenient and powerful way to create scatter plots in Python. Its direct integration with Pandas DataFrames and its ability to easily map data variables to visual properties like color (hue), size (size), and marker shape (style) make it an excellent tool for exploring relationships within your data. Remember that you retain the full customization power of Matplotlib when needed.
© 2025 ApX Machine Learning