Exploring Data Augmentation with Generative AI in Python
Data augmentation plays a vital role in training machine learning models by artificially increasing the size and diversity of the training data. This is particularly important when working with deep learning models, where larger datasets lead to better generalization and model robustness. Generative AI offers an exciting and effective approach for augmenting data, especially in fields like computer vision, natural language processing, and speech processing.
In this exploration, we will discuss how Generative AI models like Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models can be used for data augmentation and demonstrate examples using Python.
1. Overview of Generative AI for Data Augmentation
Generative AI models, such as GANs and VAEs, can be used to generate new data samples based on existing data. The main idea is to learn the distribution of the data and then generate synthetic data that resembles the original dataset, improving generalization.
Generative Adversarial Networks (GANs): GANs consist of two neural networks: a generator (which generates new data) and a discriminator (which distinguishes between real and fake data). GANs are often used for generating realistic images, videos, and more.
Variational Autoencoders (VAEs): VAEs learn a probabilistic mapping from input data to a latent space, which can be sampled to generate new data. VAEs are widely used in image and text data augmentation.
Diffusion Models: These models are becoming increasingly popular for tasks like image generation (e.g., Stable Diffusion and DALL-E). They work by iteratively refining noise into coherent data samples.
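To make the diffusion idea concrete, here is a toy NumPy sketch of the *forward* noising process that a diffusion model learns to reverse. The linear beta schedule and step count are illustrative assumptions, not the settings of any particular published model.

```python
import numpy as np

# Forward diffusion: progressively mix clean data with Gaussian noise.
# q(x_t | x_0) = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
T = 1000
betas = np.linspace(1e-4, 0.02, T)       # illustrative linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)     # cumulative signal-retention factors

def noisy_sample(x0, t, rng):
    """Return x_t, a noised version of clean data x0 at timestep t."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

rng = np.random.default_rng(0)
x0 = np.ones((4, 4))                     # a dummy "image"
x_early = noisy_sample(x0, 10, rng)      # still close to the data
x_late = noisy_sample(x0, T - 1, rng)    # nearly pure Gaussian noise
```

At timesteps near T, alpha_bar_t is close to zero, so x_t is essentially pure noise; a trained diffusion model runs this process in reverse, turning noise into new data samples.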
2. Using GANs for Data Augmentation
GANs are a popular choice for image augmentation, especially when you want to create realistic new images based on existing data. Here’s how we can generate synthetic images using a pre-trained GAN model.
2.1. Using a Pre-Trained GAN (Progressive GAN)
For simplicity, let's use a pre-trained Progressive GAN (PGAN) from Facebook Research's pytorch_GAN_zoo, trained on the CelebA-HQ face dataset, to generate new images. (The torch.hub entry point for this repository provides PGAN and DCGAN models; StyleGAN2 weights are not available through it.)
Setup
Install necessary libraries:
pip install torch torchvision matplotlib
Load and generate images:
import torch
import matplotlib.pyplot as plt
# Load a pre-trained Progressive GAN (PGAN) from pytorch_GAN_zoo
model = torch.hub.load('facebookresearch/pytorch_GAN_zoo:hub', 'PGAN', model_name='celebAHQ-512', pretrained=True, useGPU=False)
# Build random latent vectors (noise) to sample from
noise, _ = model.buildNoiseData(1)
# Generate an image using the GAN model
with torch.no_grad():
    generated_image = model.test(noise)
# Convert the output tensor to an H x W x C array
generated_image = generated_image.squeeze(0).detach().cpu().numpy().transpose(1, 2, 0)
generated_image = ((generated_image + 1) / 2).clip(0, 1)  # Map [-1, 1] to [0, 1]
# Display the generated image
plt.imshow(generated_image)
plt.axis('off')
plt.show()
This code generates a synthetic face image from a random latent vector. Sampling different latent vectors produces different images, making this a simple way to augment an image dataset with a GAN.
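To turn one-off generation into actual augmentation, you typically sample many latent vectors in batches and collect the outputs alongside the real data. The helper below is a generic sketch: the `fake_generator` stub stands in for a real generator such as `model.test` from pytorch_GAN_zoo, and its shapes are illustrative.

```python
import torch

def augment_with_generator(generate_fn, latent_dim, num_samples, batch_size=16):
    """Sample latent vectors in batches and collect synthetic images."""
    images = []
    with torch.no_grad():
        for start in range(0, num_samples, batch_size):
            n = min(batch_size, num_samples - start)
            z = torch.randn(n, latent_dim)     # random latent codes
            images.append(generate_fn(z))      # e.g. model.test(z) for pytorch_GAN_zoo
    return torch.cat(images, dim=0)

# Stub generator for illustration: maps latents to fake 3x32x32 "images" in [-1, 1]
fake_generator = lambda z: torch.tanh(z[:, :3, None, None].expand(-1, 3, 32, 32))
synthetic = augment_with_generator(fake_generator, latent_dim=512, num_samples=40)
print(synthetic.shape)  # torch.Size([40, 3, 32, 32])
```

The returned tensor can then be concatenated with the real training set (e.g. via a `ConcatDataset`) before training a downstream model.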
3. Using Variational Autoencoders (VAEs) for Data Augmentation
VAEs are a powerful technique for generating new data, especially when you want to explore variations of the data. Unlike GANs, which pit a generator against a discriminator, VAEs use an encoder-decoder framework and are typically more stable to train.
3.1. VAE Model for Data Augmentation
Here’s a simple implementation of a VAE for augmenting images in the MNIST dataset.
Setup
Install dependencies:
pip install torch torchvision matplotlib
VAE Code Example:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
# VAE Architecture (Encoder + Decoder)
class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        # Encoder
        self.fc1 = nn.Linear(28 * 28, 400)
        self.fc21 = nn.Linear(400, 20)  # Mean of latent variable
        self.fc22 = nn.Linear(400, 20)  # Log variance of latent variable
        # Decoder
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 28 * 28)

    def encode(self, x):
        h1 = torch.relu(self.fc1(x.view(-1, 28 * 28)))
        return self.fc21(h1), self.fc22(h1)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        h3 = torch.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h3)).view(-1, 1, 28, 28)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

# Loss function: reconstruction term plus KL divergence
def loss_function(recon_x, x, mu, logvar):
    BCE = nn.functional.binary_cross_entropy(recon_x.view(-1, 28 * 28), x.view(-1, 28 * 28), reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD

# Instantiate VAE and optimizer
vae = VAE()
optimizer = optim.Adam(vae.parameters(), lr=1e-3)

# Load MNIST dataset (encode() flattens the images itself)
transform = transforms.ToTensor()
train_dataset = datasets.MNIST('.', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)

# Training loop (simplified for augmentation)
num_epochs = 5
for epoch in range(num_epochs):
    for data, _ in train_loader:
        optimizer.zero_grad()
        recon_batch, mu, logvar = vae(data)
        loss = loss_function(recon_batch, data, mu, logvar)
        loss.backward()
        optimizer.step()

# Generate new data by sampling from the latent space
vae.eval()
with torch.no_grad():
    z = torch.randn(1, 20)  # Sample a random latent vector
    generated_image = vae.decode(z).cpu().numpy().squeeze()

# Display the generated image
plt.imshow(generated_image, cmap='gray')
plt.axis('off')
plt.show()
This code trains a simple VAE on the MNIST dataset and then generates new images by sampling from the latent space. You can extend this approach to more complex datasets by adjusting the architecture and latent dimensions.
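One augmentation trick that is specific to VAEs is interpolating between the latent codes of two examples to obtain plausible in-between samples. The sketch below only assumes a decoder with the same shape contract as `vae.decode` above; to keep it self-contained, it uses an untrained stand-in decoder rather than a trained model.

```python
import torch
import torch.nn as nn

def interpolate_latents(decode_fn, z1, z2, steps=8):
    """Decode evenly spaced points on the line between two latent vectors."""
    alphas = torch.linspace(0.0, 1.0, steps)
    zs = torch.stack([(1 - a) * z1 + a * z2 for a in alphas])
    with torch.no_grad():
        return decode_fn(zs)  # shape: (steps, 1, 28, 28) for the MNIST VAE

# Stand-in decoder with the same shape contract as vae.decode (untrained weights)
decoder = nn.Sequential(nn.Linear(20, 400), nn.ReLU(),
                        nn.Linear(400, 28 * 28), nn.Sigmoid())
decode_fn = lambda z: decoder(z).view(-1, 1, 28, 28)

z1, z2 = torch.randn(20), torch.randn(20)
images = interpolate_latents(decode_fn, z1, z2, steps=8)
print(images.shape)  # torch.Size([8, 1, 28, 28])
```

With a trained VAE you would obtain `z1` and `z2` by encoding two real images (taking the mean output of `encode`), so the interpolated samples stay close to the data distribution.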
4. Using Diffusion Models for Data Augmentation
Diffusion models, such as those behind Stable Diffusion and DALL-E 2, have gained popularity in the realm of image generation. These models work by gradually refining noise into structured data. Though they are complex to implement from scratch, pre-trained models are available for easy usage.
4.1. Stable Diffusion for Image Augmentation
You can use a pre-trained model like Stable Diffusion for generating new images based on text prompts or seed images. The Hugging Face library provides a simple interface to work with Stable Diffusion models.
Setup:
pip install diffusers transformers torch
Example Code:
from diffusers import StableDiffusionPipeline
import torch
# Load pre-trained Stable Diffusion model
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
# Generate image from text prompt
prompt = "a futuristic city at sunset"
image = pipe(prompt).images[0]
# Display the generated image
image.show()
This code loads the Stable Diffusion model and generates a new image from the text prompt. You can augment your image dataset with such generated content.
5. Text Augmentation with Generative Models
Generative models like GPT-2 or GPT-3 can be used to augment text data by generating new sentences, paragraphs, or entire articles. Here's an example of using GPT-2 for text generation.
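A minimal version uses the Hugging Face transformers text-generation pipeline; the prompt and sampling settings below are illustrative choices, not fixed requirements.

```python
from transformers import pipeline, set_seed

# Load GPT-2 through the text-generation pipeline
generator = pipeline('text-generation', model='gpt2')
set_seed(42)  # for reproducible samples

# Generate several continuations of a seed sentence
prompt = "The customer was unhappy with the delivery because"
outputs = generator(prompt, max_length=40, num_return_sequences=3, do_sample=True)

for out in outputs:
    print(out['generated_text'])
```

Each generated continuation can serve as an extra training example, for instance in an intent- or sentiment-classification dataset, after a quick quality filter to discard incoherent samples.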