Having established what synthetic data is and why it is useful in machine learning, this chapter turns to how it is produced. We will examine fundamental techniques for generating artificial data points, moving from theory to simple applications.
You will learn the core idea behind using models or procedures to generate new data. We will cover methods that produce data by sampling from common statistical distributions. One is the uniform distribution, where every value in a given range is equally likely. Another is the normal distribution, where values cluster around a mean μ with a spread set by the standard deviation σ. We will also look at rule-based systems, which create data according to specific, predefined constraints.
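As a rough preview of the two approaches, the sketch below uses only Python's standard `random` module. It samples from a uniform and a normal distribution, then builds records from fixed rules. The seed, the ranges, and the rules inside the hypothetical `generate_order` helper (quantity limits, price range, discount threshold) are illustrative assumptions, not values taken from any real dataset.

```python
import random

# A dedicated, seeded generator makes the "random" output reproducible.
rng = random.Random(42)

def sample_uniform(n, low, high):
    """Draw n floats where every value in [low, high] is equally likely."""
    return [rng.uniform(low, high) for _ in range(n)]

def sample_normal(n, mu, sigma):
    """Draw n floats clustered around mean mu with standard deviation sigma."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

def generate_order(order_id):
    """Hypothetical rule-based record: every field obeys a predefined constraint."""
    quantity = rng.randint(1, 10)                  # rule: 1 to 10 items per order
    unit_price = round(rng.uniform(5.0, 50.0), 2)  # rule: prices between 5 and 50
    discount = 0.10 if quantity >= 5 else 0.0      # rule: 5+ items get 10% off
    total = round(quantity * unit_price * (1 - discount), 2)
    return {
        "order_id": order_id,
        "quantity": quantity,
        "unit_price": unit_price,
        "discount": discount,
        "total": total,
    }

ages = sample_uniform(5, 18, 65)                    # uniform numerical values
heights = sample_normal(5, mu=170.0, sigma=10.0)    # normally distributed values
orders = [generate_order(i) for i in range(3)]      # rule-based records

print(ages)
print(heights)
print(orders)
```

The statistical samplers only control the overall shape of the data. The rule-based generator guarantees that every record satisfies its constraints, such as the discount always matching the quantity.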
The chapter works through examples of generating simple numerical and categorical data with these foundational approaches. A hands-on practical section then guides you through creating basic synthetic data to help solidify these techniques. By the end of this chapter, you will have a working grasp of elementary methods for synthesizing data from scratch.
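Categorical data can be produced with the same two ideas. The minimal sketch below, again using only the standard library, assumes a hypothetical color feature with made-up category weights and draws it with `random.Random.choices`. It then derives a categorical age group from a numerical value through a fixed rule. The category names, weights, and age thresholds are all illustrative assumptions.

```python
import random
from collections import Counter

rng = random.Random(7)

# Hypothetical categories and sampling weights (assumed for illustration).
COLORS = ["red", "green", "blue"]
WEIGHTS = [0.5, 0.3, 0.2]

def sample_categories(n):
    """Draw n category labels, each chosen in proportion to its weight."""
    return rng.choices(COLORS, weights=WEIGHTS, k=n)

def age_group(age):
    """Rule-based categorical label derived from a numerical value."""
    if age < 30:
        return "young"
    if age < 50:
        return "middle"
    return "senior"

colors = sample_categories(1000)
counts = Counter(colors)  # roughly 500 red, 300 green, 200 blue

ages = [rng.randint(18, 65) for _ in range(5)]
groups = [age_group(a) for a in ages]

print(counts)
print(list(zip(ages, groups)))
```

Weighted sampling controls how often each category appears. The rule keeps the label consistent with the numerical column it is derived from.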
2.1 The Idea of Data Generation Models
2.2 Generating Data from Statistical Distributions
2.3 Introduction to Rule-Based Systems
2.4 Generating Simple Numerical Data
2.5 Generating Simple Categorical Data
2.6 Hands-on Practical: Create Basic Synthetic Data
© 2025 ApX Machine Learning