Chapter 1 introduced the key dimensions for evaluating synthetic data. Statistical fidelity, which measures how closely the statistical properties of the synthetic data mirror those of the real data, is fundamental among them. Comparing basic statistics such as the mean ($\mu$) or standard deviation ($\sigma$) of individual features is a useful starting point, but it often fails to capture the complex, high-dimensional relationships present in real datasets. Matching the marginal distribution $P(X_i)$ of every feature $X_i$ does not guarantee that the joint distribution $P(X_1, X_2, \dots, X_n)$ is accurately represented: two datasets can agree on every marginal and still differ in how the features depend on one another.
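The gap between marginal and joint agreement can be seen in a small sketch. The setup is purely illustrative and not taken from any dataset in this course: the "real" data are two correlated Gaussian features, and the hypothetical "synthetic" data are produced by shuffling each column independently, which preserves every per-feature value while breaking the pairing between features. The seed, sample size, and correlation of 0.9 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Illustrative "real" data: two standard-normal features with correlation 0.9.
n = 5000
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
real = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# Hypothetical "synthetic" data: permute each column independently.
# Every column keeps exactly the same set of values, so each marginal
# P(X_i) is identical, but the joint structure is destroyed.
synthetic = np.column_stack([
    rng.permutation(real[:, 0]),
    rng.permutation(real[:, 1]),
])

# Per-feature summaries agree (up to floating-point rounding).
mean_gap = np.abs(real.mean(axis=0) - synthetic.mean(axis=0)).max()
std_gap = np.abs(real.std(axis=0) - synthetic.std(axis=0)).max()

# The dependence between the two features does not.
real_corr = np.corrcoef(real, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]

print(f"largest gap in means:  {mean_gap:.2e}")
print(f"largest gap in stds:   {std_gap:.2e}")
print(f"correlation (real):      {real_corr:.3f}")
print(f"correlation (synthetic): {synth_corr:.3f}")
```

A check based only on per-feature means and standard deviations would rate this synthetic dataset as a near-perfect match, even though the relationship between the two features is gone.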

This chapter focuses on advanced techniques for a more thorough statistical fidelity assessment. You will learn methods to:

Compare multivariate distributions, going beyond single-feature analysis.
Apply rigorous hypothesis tests specifically designed for assessing distributional similarity between datasets.
Analyze and compare the correlation and covariance structures to check that relationships between variables are preserved.
Utilize information-theoretic measures to quantify distributional differences.
Employ propensity score methods to assess the distinguishability of synthetic versus real data points.

The chapter concludes with a practical section where you will implement several multivariate statistical tests using Python libraries.

Sections

2.1 Multivariate Distribution Comparisons
2.2 Hypothesis Testing for Distributional Similarity
2.3 Correlation and Covariance Structure Analysis
2.4 Information-Theoretic Measures
2.5 Propensity Score Evaluation
2.6 Hands-on practical: Implementing Multivariate Tests