PyTorch Basics#

Introduction to PyTorch#

What is PyTorch?#

PyTorch is a popular open-source deep learning framework known for its flexibility and ease of use. It’s widely adopted in both research and industry for tasks ranging from simple machine learning models to complex neural networks.

Why PyTorch?#

  • Dynamic computation graph: PyTorch builds the computation graph at runtime, which makes models intuitive to write and easy to debug (see the sketch after this list).

  • Strong community support and integration with Python: PyTorch is Pythonic and integrates well with the Python data science stack.

  • GPU acceleration: PyTorch makes it easy to move tensors to and from GPUs (NVIDIA GPUs via CUDA, and Apple silicon via the Metal-based MPS backend), which is crucial for training large models efficiently.
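To make the first point concrete, here is a minimal sketch (the function f is illustrative): because the graph is recorded as the code runs, ordinary Python control flow works inside a model, and autograd differentiates whichever branch actually executed.

import torch

def f(x):
    # Ordinary Python control flow; the graph is recorded as this runs
    if x.sum() > 0:
        return (x * 2).sum()
    return (x - 1).sum()

x = torch.randn(3, requires_grad=True)
y = f(x)
y.backward()   # gradients follow the branch that actually executed
print(x.grad)  # all 2s or all 1s, depending on the branch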

Tensors in PyTorch#

What is a Tensor?#

Tensors are the fundamental data structures in PyTorch, similar to NumPy arrays but with the added capability of being used on a GPU.
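The NumPy relationship is more than an analogy: conversion in both directions is cheap, and on the CPU the two objects share memory. A minimal sketch (variable names are illustrative):

import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(arr)  # zero-copy: shares memory with arr
arr[0] = 99.0
print(t)                   # tensor([99.,  2.,  3.], dtype=torch.float64)
print(t.numpy())           # back to NumPy, again without copying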

Creating Tensors#

%pip install -q torch torchvision torchaudio
Note: you may need to restart the kernel to use updated packages.
import torch

# Creating a tensor from a list
tensor_a = torch.tensor([1.0, 2.0, 3.0])
print(tensor_a)

# Creating a tensor with random values
tensor_b = torch.rand((2, 3))  # A 2x3 matrix of random numbers
print(tensor_b)

# Creating a tensor with zeros
tensor_c = torch.zeros((3, 3))  # A 3x3 matrix of zeros
print(tensor_c)
tensor([1., 2., 3.])
tensor([[0.7223, 0.8251, 0.3011],
        [0.3216, 0.3183, 0.9513]])
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
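Beyond their values, tensors carry metadata that is often worth inspecting:

# Every tensor knows its shape, element type, and device
print(tensor_b.shape)   # torch.Size([2, 3])
print(tensor_b.dtype)   # torch.float32
print(tensor_b.device)  # cpu (until explicitly moved)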

Basic Tensor Operations#

# Reshaping a tensor
tensor_reshaped = tensor_b.view(3, 2)  # Reshape to 3x2
print(tensor_reshaped)

# Tensor addition
tensor_sum = tensor_a + tensor_a
print(tensor_sum)

# Indexing
print(tensor_a[1])  # Access the second element
tensor([[0.7223, 0.8251],
        [0.3011, 0.3216],
        [0.3183, 0.9513]])
tensor([2., 4., 6.])
tensor(2.)
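Two further operations come up constantly: matrix multiplication and broadcasting. A quick sketch with fresh variables (illustrative only):

import torch

m = torch.rand(2, 3)
n = torch.rand(3, 4)

# Matrix multiplication: (2x3) @ (3x4) -> (2x4)
print(m @ n)

# Broadcasting: the (3,)-vector is expanded across each row of m
row = torch.tensor([1.0, 2.0, 3.0])
print(m + row)
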
# Moving a tensor to GPU

# Check which device is available
available_device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

print(f"Available device: {available_device}")

tensor_a_gpu = tensor_a.to(available_device)
print(tensor_a_gpu)
Available device: cpu
tensor([1., 2., 3.])
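Tensors can also be allocated directly on the target device, which avoids a separate host-to-device copy:

# Create the tensor on the selected device from the start
tensor_d = torch.rand((2, 2), device=available_device)
print(tensor_d.device)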

Autograd: Automatic Differentiation#

  • PyTorch’s autograd system automatically calculates gradients, which are essential for training neural networks.

  • Every operation on tensors keeps track of the computation history, allowing PyTorch to backpropagate errors automatically.

# Create a tensor with gradient tracking enabled
x = torch.tensor(2.0, requires_grad=True)

# Perform a computation
y = x ** 2 + 2 * x ** 3

# Backpropagate to compute the gradient
y.backward()

# Print the gradient
print(x.grad)  # Should output 28.0, the derivative of x^2 + 2x^3 at x=2
tensor(28.)

Important

This example shows how PyTorch automatically calculates the gradient of a tensor operation, which is essential for updating the weights during training.
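To connect the gradient to weight updates, here is a minimal hand-rolled gradient-descent step on the same function (the learning rate lr = 0.1 is an arbitrary illustrative choice; in practice you would use an optimizer from torch.optim):

w = torch.tensor(2.0, requires_grad=True)
lr = 0.1

loss = w ** 2 + 2 * w ** 3
loss.backward()        # w.grad is now 28.0, as computed above

with torch.no_grad():  # the update itself must not be tracked
    w -= lr * w.grad
w.grad.zero_()         # reset the gradient before the next step

print(w)               # tensor(-0.8000, requires_grad=True)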

Datasets and DataLoaders#

What is a Dataset in PyTorch?#

  • Purpose: The Dataset class in PyTorch serves as an abstraction that allows you to manage, preprocess, and access your data in a consistent way.

  • Key Features:

    • Handles how data is stored and accessed.

    • Allows for custom data transformations and preprocessing (see the TransformDataset sketch below).

    • Integrates seamlessly with PyTorch’s DataLoader for efficient batching and shuffling.

What is a DataLoader in PyTorch?#

  • Purpose: The DataLoader is an iterable that abstracts the complexity of handling data in batches, shuffling, and parallel loading.

  • Key Features:

    • Efficiently loads data in mini-batches during training.

    • Automatically shuffles data at the start of each epoch (if specified).

    • Supports parallel data loading using multiple workers.

from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data
        self.targets = targets

    def __len__(self):
        # Return the total number of samples
        return len(self.data)

    def __getitem__(self, idx):
        # Retrieve the data sample and label at the specified index
        sample = self.data[idx]
        target = self.targets[idx]
        return sample, target

Explaining __len__ and __getitem__#

  • __len__: Returns the total number of samples in your dataset. PyTorch uses this method to know how many iterations to run during training.

  • __getitem__: Retrieves a specific sample from the dataset using its index. This method returns the data and its corresponding label, which PyTorch uses during training to form mini-batches.
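The Dataset features above mention custom transformations; here is a minimal sketch of that pattern (TransformDataset is a hypothetical variant, not used in the example below):

from torch.utils.data import Dataset

class TransformDataset(Dataset):
    def __init__(self, data, targets, transform=None):
        self.data = data
        self.targets = targets
        self.transform = transform  # e.g., normalization or augmentation

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        if self.transform is not None:
            sample = self.transform(sample)  # applied lazily, per access
        return sample, self.targets[idx]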

import torch

# Generate random data: 6 samples, each with 2 features
torch.manual_seed(0)  # For reproducibility
features = torch.rand(6, 2)

# Generate random target values (e.g., for a regression problem)
targets = torch.rand(6, 1)

print(f"Features:\n{features}")
print(f"\nTarget:\n{targets}")
Features:
tensor([[0.4963, 0.7682],
        [0.0885, 0.1320],
        [0.3074, 0.6341],
        [0.4901, 0.8964],
        [0.4556, 0.6323],
        [0.3489, 0.4017]])

Target:
tensor([[0.0223],
        [0.1689],
        [0.2939],
        [0.5185],
        [0.6977],
        [0.8000]])
from torch.utils.data import DataLoader

# Create an instance of the custom dataset
dataset = CustomDataset(data=features, targets=targets)

# Create a DataLoader
data_loader = DataLoader(dataset, batch_size=2, shuffle=True)

# Example of iterating through the DataLoader
for idx, (batch_data, batch_labels) in enumerate(data_loader):
    print(f"Batch {idx+1}:\n========")
    print(f"Data:\n{batch_data}")
    print(f"Targets:\n{batch_labels}\n")
Batch 1:
========
Data:
tensor([[0.4556, 0.6323],
        [0.4963, 0.7682]])
Targets:
tensor([[0.6977],
        [0.0223]])

Batch 2:
========
Data:
tensor([[0.0885, 0.1320],
        [0.3489, 0.4017]])
Targets:
tensor([[0.1689],
        [0.8000]])

Batch 3:
========
Data:
tensor([[0.3074, 0.6341],
        [0.4901, 0.8964]])
Targets:
tensor([[0.2939],
        [0.5185]])
  • shuffle=True ensures that the data is shuffled at the beginning of each epoch, which helps prevent the model from learning patterns based on the order of the data.
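The DataLoader features listed earlier also mention parallel loading; a minimal sketch reusing the dataset above (num_workers=2 is an arbitrary illustrative value):

# Worker subprocesses prepare batches in the background
parallel_loader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=2)

for batch_data, batch_labels in parallel_loader:
    pass  # a training step would consume each batch here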

🏋️ Exercise: What if we set shuffle to False and batch_size to 3?

Tip

Check the sample size: with only 6 samples, how many batches will batch_size=3 produce?