Synthetic data is often generated with the goal of preserving the privacy of the individuals in the original dataset. However, generating data is not enough on its own; we must rigorously assess how much privacy is actually preserved. Without that assessment, an attacker may be able to reconstruct sensitive information or identify individuals present in the original data.
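This chapter focuses on practical techniques for quantifying these privacy risks. You will learn about the main privacy risks in synthetic data, membership inference attacks (MIAs), attribute inference attacks, distance-based privacy metrics, and differential privacy considerations, and you will implement a basic MIA in a hands-on practical.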
By the end of this chapter, you will be equipped to implement and interpret several key methods for measuring the privacy characteristics of your synthetic datasets.
4.1 Understanding Privacy Risks in Synthetic Data
4.2 Membership Inference Attacks (MIAs)
4.3 Attribute Inference Attacks
4.4 Distance-Based Privacy Metrics
4.5 Differential Privacy Considerations (if applicable)
4.6 Hands-on practical: Implementing a Basic MIA
© 2025 ApX Machine Learning