Building and Training a Feed Forward Neural Network in PyTorch

In this notebook, we’ll build a simple neural network using PyTorch, train it on the SNOTEL dataset, and evaluate its performance. This hands-on exercise will reinforce our understanding of the PyTorch framework and the steps involved in building and training neural networks on real-world data.

Load Libraries#

%pip install -q torch torchvision torchaudio

Note: you may need to restart the kernel to use updated packages.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split


import torch
import torch.nn as nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

available_device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

print(f"Available device: {available_device}")

Available device: cpu

Preparing the Dataset#

Step 1: Load Dataset#

We’ll start by loading the SNOTEL dataset from a CSV file.

snotel_data = pd.read_csv("data/clean_data.csv")
snotel_data.info()
snotel_data.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2469 entries, 0 to 2468
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   swe                 2469 non-null   float64
 1   snowdepth           2469 non-null   float64
 2   tempavg_7_days_avg  2469 non-null   float64
 3   precip_7_days_avg   2469 non-null   float64
 4   snowdensity         2469 non-null   float64
dtypes: float64(5)
memory usage: 96.6 KB

	swe	snowdepth	tempavg_7_days_avg	precip_7_days_avg	snowdensity
0	27.178	91.44	-1.414286	0.653143	0.297222
1	27.686	91.44	-1.528571	0.544286	0.302778
2	27.686	91.44	-0.971429	0.435429	0.302778
3	27.686	88.90	-0.557143	0.399143	0.311429
4	27.686	88.90	-0.271429	0.181429	0.311429

Step 2: Data Split#

We’ll split the data into training, validation, and testing sets. A common split, and the one used here, is 70% training, 15% validation, and 15% testing.

features = snotel_data.drop('snowdensity', axis=1).values
targets = snotel_data['snowdensity'].values

# Split the dataset into training and temp sets (70% train, 30% temp)
features_train, features_temp, targets_train, targets_temp = train_test_split(
    features, targets, test_size=0.3, random_state=0
)

# Further split the temp set into validation and test sets (15% each)
features_val, features_test, targets_val, targets_test = train_test_split(
    features_temp, targets_temp, test_size=0.5, random_state=0
)
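
As a quick check on the two-stage split, the sketch below applies the same two `train_test_split` calls to placeholder arrays with 2,469 rows (the row count reported by `snotel_data.info()`) and prints the resulting set sizes. The names `dummy_features` and `dummy_targets` are stand-ins introduced for illustration, not the SNOTEL data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of rows as the SNOTEL table
# (assumption: only the row count matters for the split sizes).
n_rows = 2469
dummy_features = np.zeros((n_rows, 4))
dummy_targets = np.zeros(n_rows)

# Stage 1: hold out 30% of the rows.
f_train, f_temp, t_train, t_temp = train_test_split(
    dummy_features, dummy_targets, test_size=0.3, random_state=0
)

# Stage 2: split the held-out rows evenly into validation and test sets.
f_val, f_test, t_val, t_test = train_test_split(
    f_temp, t_temp, test_size=0.5, random_state=0
)

split_sizes = {"train": len(f_train), "val": len(f_val), "test": len(f_test)}
for name, size in split_sizes.items():
    print(f"{name}: {size} rows ({size / n_rows:.1%})")
```

Because the row count is not evenly divisible, the validation and test sets can differ by one row.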

Step 3: Preprocess Data#

Now that we’ve split the data, we can apply scaling. The scaler should be fit on the training data and then used to transform the training, validation, and test sets.

scaler = StandardScaler()

scaler.fit(features_train)

# Transform the training, validation, and test sets
features_train = scaler.transform(features_train)
features_val = scaler.transform(features_val)
features_test = scaler.transform(features_test)
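
To illustrate why the scaler is fit on the training data only, here is a minimal sketch on synthetic arrays (`demo_train` and `demo_val` are hypothetical stand-ins). The training features come out exactly standardized, while the validation features are only approximately so, because they reuse the training mean and standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for training and validation features (hypothetical values).
demo_train = rng.normal(loc=50.0, scale=10.0, size=(200, 4))
demo_val = rng.normal(loc=50.0, scale=10.0, size=(50, 4))

# The statistics (mean, standard deviation) come from the training data only.
demo_scaler = StandardScaler().fit(demo_train)
demo_train_scaled = demo_scaler.transform(demo_train)
demo_val_scaled = demo_scaler.transform(demo_val)

print("train mean:", demo_train_scaled.mean(axis=0).round(3))
print("train std: ", demo_train_scaled.std(axis=0).round(3))
print("val mean:  ", demo_val_scaled.mean(axis=0).round(3))

# inverse_transform maps scaled values back to the original units.
demo_restored = demo_scaler.inverse_transform(demo_train_scaled)
```

Fitting on the full dataset instead would let information from the validation and test sets leak into training.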

Step 4: Creating Custom Datasets#

Next, we define a custom Dataset class and create one instance of it for each of the three sets: training, validation, and testing.

class SNOTELDataset(Dataset):
    def __init__(self, data, targets):
        self.data = torch.tensor(data, dtype=torch.float32)
        self.targets = torch.tensor(targets, dtype=torch.float32).view(-1, 1)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        target = self.targets[idx]
        return sample, target

# Create instances of the custom datasets for training, validation, and testing sets
train_dataset = SNOTELDataset(data=features_train, targets=targets_train)
val_dataset = SNOTELDataset(data=features_val, targets=targets_val)
test_dataset = SNOTELDataset(data=features_test, targets=targets_test)
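
One way to confirm what a Dataset returns is to index it directly. The sketch below re-declares a class with the same structure (called `DemoDataset` here so the example runs on its own) and feeds it random placeholder arrays rather than the SNOTEL features.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class DemoDataset(Dataset):
    """Same structure as SNOTELDataset, re-declared so this sketch is self-contained."""

    def __init__(self, data, targets):
        self.data = torch.tensor(data, dtype=torch.float32)
        # Reshape targets to a column so each target has shape (1,)
        self.targets = torch.tensor(targets, dtype=torch.float32).view(-1, 1)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.targets[idx]


rng = np.random.default_rng(0)
demo_dataset = DemoDataset(rng.normal(size=(10, 4)), rng.uniform(size=10))

sample, target = demo_dataset[0]
print(len(demo_dataset))          # number of samples
print(sample.shape, sample.dtype)  # one feature vector
print(target.shape)                # one target value, kept as a 1-element tensor
```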

Step 5: Using DataLoader#

Now, we use DataLoader to manage our data in mini-batches during training, validation, and testing.

# Create DataLoaders for training, validation, and testing sets
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
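
To see what mini-batching looks like in practice, the following sketch iterates over a DataLoader built from 100 random placeholder samples (an arbitrary size chosen for illustration) and records the shape of every batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder tensors: 100 samples with 4 features and 1 target each.
demo_features = torch.randn(100, 4)
demo_targets = torch.rand(100, 1)

demo_loader = DataLoader(
    TensorDataset(demo_features, demo_targets), batch_size=32, shuffle=False
)

# Each iteration yields one (inputs, labels) mini-batch.
batch_shapes = [(inputs.shape, labels.shape) for inputs, labels in demo_loader]
for input_shape, label_shape in batch_shapes:
    print(tuple(input_shape), tuple(label_shape))
```

The last batch is smaller whenever the dataset size is not a multiple of `batch_size`; passing `drop_last=True` to `DataLoader` discards it instead.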

Defining the Neural Network#

We define a simple feedforward neural network using torch.nn.Module.

class SNOTELNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SNOTELNN, self).__init__() # super class to inherit from nn.Module
        # Define the layers
        self.fc1 = nn.Linear(input_size, hidden_size)  # Fully connected layer 1
        self.relu = nn.ReLU()  # ReLU activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Fully connected layer 2
    
    def forward(self, x): # x is the batch of input
        # Define the forward pass
        out = self.fc1(x)  # Pass input through first layer
        out = self.relu(out)  # Apply ReLU activation
        out = self.fc2(out)  # Pass through second layer to get output
        return out

# Instantiate the model and move it to the device (GPU or CPU)
model = SNOTELNN(input_size=features_train.shape[1], hidden_size=128, output_size=1).to(available_device)

The forward method defines how the input data flows through the network layers. It specifies the sequence of operations that the data undergoes as it moves from the input layer to the output layer. This method is automatically called when you pass data through the model (e.g., outputs = model(inputs)).
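
A short sketch of this behaviour, using a re-declared network with the same two-layer structure (`DemoNet` is a stand-in name) and a random batch of eight samples. It also counts the trainable parameters, assuming 4 input features and 128 hidden units as in the model instantiated above.

```python
import torch
import torch.nn as nn


class DemoNet(nn.Module):
    """Two-layer feedforward network with the same structure as SNOTELNN."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))


demo_model = DemoNet(input_size=4, hidden_size=128, output_size=1)
demo_batch = torch.randn(8, 4)  # 8 samples, 4 features each

# Calling the model invokes forward() through nn.Module.__call__
demo_outputs = demo_model(demo_batch)
print(demo_outputs.shape)

# Weights plus biases: (4 * 128 + 128) + (128 * 1 + 1)
n_params = sum(p.numel() for p in demo_model.parameters())
print(n_params)
```

Calling `model(inputs)` rather than `model.forward(inputs)` is the usual convention, since the former also runs any registered hooks.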

Setting the Loss Function and Optimizer#

For this example, we’ll use Mean Squared Error Loss since we’re dealing with a regression problem. We’ll use the Adam optimizer, which is a good default choice due to its adaptive learning rates.

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
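
As a small worked example of both pieces, the sketch below computes the mean squared error by hand for two made-up predictions and compares it with `nn.MSELoss`, then takes a single Adam step on a toy parameter. The values and the toy loss `w ** 2` are illustrative assumptions, not part of the SNOTEL model.

```python
import torch
import torch.nn as nn

# Two made-up predictions and targets.
demo_preds = torch.tensor([[0.30], [0.50]])
demo_labels = torch.tensor([[0.20], [0.70]])

# MSE = mean of squared differences: (0.1**2 + 0.2**2) / 2
manual_mse = ((demo_preds - demo_labels) ** 2).mean()
builtin_mse = nn.MSELoss()(demo_preds, demo_labels)
print(manual_mse.item(), builtin_mse.item())

# One Adam step on a toy parameter with loss w**2 (gradient 2*w).
w = torch.tensor([1.0], requires_grad=True)
toy_optimizer = torch.optim.Adam([w], lr=0.1)
toy_optimizer.zero_grad()
(w ** 2).sum().backward()
toy_optimizer.step()

# On the first step Adam normalizes the gradient by its own magnitude,
# so the parameter moves by roughly the learning rate.
print(w.item())
```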

Training the Network#

We now write the training loop, which includes zeroing the gradients, performing the forward pass, computing the loss, backpropagating, and updating the model parameters. We will also validate the model on the validation set after each epoch.

Note

An epoch refers to one complete pass through the entire training dataset. During each epoch, the model sees every example in the dataset once.
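
The number of parameter updates in one epoch equals the number of mini-batches. A minimal sketch, assuming a training set of 1,728 samples (about 70% of the 2,469 rows) and a batch size of 32:

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

n_train = 1728   # assumed training-set size (about 70% of 2,469 rows)
batch_size = 32

# Placeholder dataset; only its length matters here.
demo_loader = DataLoader(TensorDataset(torch.zeros(n_train, 4)), batch_size=batch_size)

# len(loader) is the number of mini-batches, i.e. optimizer steps per epoch.
steps_per_epoch = len(demo_loader)
print(steps_per_epoch)
print(steps_per_epoch == math.ceil(n_train / batch_size))
```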

num_epochs = 5

# Lists to store the training and validation losses
train_losses = []
val_losses = []

for epoch in range(num_epochs):
    # Training phase
    model.train()
    train_loss = 0.0  # Initialize cumulative training loss
    
    for inputs, labels in train_loader:
        # Move data to the appropriate device
        inputs, labels = inputs.to(available_device), labels.to(available_device)
        
        # Zero the gradients from the previous iteration
        optimizer.zero_grad()
        
        # Perform forward pass
        outputs = model(inputs)
        
        # Compute the loss
        loss = criterion(outputs, labels)
        
        # Perform backward pass (compute gradients)
        loss.backward()
        
        # Update the model parameters
        optimizer.step()
        
        # Accumulate training loss
        train_loss += loss.item()
    
    # Average training loss
    train_loss /= len(train_loader)
    train_losses.append(train_loss)  # Store the training loss for this epoch
    
    # Validation phase
    model.eval()  # Set model to evaluation mode
    val_loss = 0.0
    
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(available_device), labels.to(available_device)  # Move to device
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
    
    # Average validation loss
    val_loss /= len(val_loader)
    val_losses.append(val_loss)  # Store the validation loss for this epoch
    
    print(f'Epoch [{epoch+1}/{num_epochs}], Training Loss: {train_loss:.4f}, Validation Loss: {val_loss:.4f}')

Epoch [1/5], Training Loss: 0.0318, Validation Loss: 0.0206
Epoch [2/5], Training Loss: 0.0160, Validation Loss: 0.0134
Epoch [3/5], Training Loss: 0.0111, Validation Loss: 0.0098
Epoch [4/5], Training Loss: 0.0086, Validation Loss: 0.0075
Epoch [5/5], Training Loss: 0.0069, Validation Loss: 0.0061

# Plotting the training and validation losses
plt.figure(figsize=(10, 5))
plt.plot(range(1, num_epochs + 1), train_losses, label='Training Loss')
plt.plot(range(1, num_epochs + 1), val_losses, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Validation Loss Over Epochs')
plt.legend()
plt.show()

[Figure: Training and Validation Loss Over Epochs (x-axis: Epoch, y-axis: Loss)]

Testing the Model#

# Evaluate the model on the test set and collect predictions
model.eval()  # Set the model to evaluation mode
test_loss = 0.0  # Initialize cumulative test loss
all_preds = []
all_labels = []

with torch.no_grad():  # Disable gradient computation for inference
    for inputs, labels in test_loader:
        # Move data to the appropriate device
        inputs, labels = inputs.to(available_device), labels.to(available_device)
        
        # Perform forward pass
        outputs = model(inputs)
        
        # Compute the loss
        loss = criterion(outputs, labels)
        
        # Accumulate test loss
        test_loss += loss.item()
        
        # Store the predictions and the corresponding labels
        all_preds.extend(outputs.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

# Calculate the average test loss
test_loss /= len(test_loader)
print(f'Test Loss: {test_loss:.4f}')

# Convert lists to numpy arrays for plotting
all_preds = np.array(all_preds)
all_labels = np.array(all_labels)

# Plot observed vs predicted
plt.figure(figsize=(8, 8))
plt.scatter(all_labels, all_preds, alpha=0.7)
plt.xlabel('Observed (Actual) Values')
plt.ylabel('Predicted Values')
plt.title('Observed vs. Predicted Values')
plt.grid(True)
plt.show()

Test Loss: 0.0056

[Figure: Observed vs. Predicted Values (x-axis: Observed (Actual) Values, y-axis: Predicted Values)]
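
The test loss is a mean squared error, so it is expressed in squared units of snow density. As an optional complement, the sketch below computes the root mean squared error (RMSE) and the coefficient of determination (R²) with NumPy. The arrays `demo_labels` and `demo_preds` are made-up values for illustration; the same expressions could be applied to `all_labels` and `all_preds`.

```python
import numpy as np

# Made-up observed and predicted values (not model output).
demo_labels = np.array([0.30, 0.25, 0.40, 0.35])
demo_preds = np.array([0.28, 0.27, 0.37, 0.36])

# Mean squared error and its square root (same units as the target).
mse = np.mean((demo_preds - demo_labels) ** 2)
rmse = np.sqrt(mse)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((demo_labels - demo_preds) ** 2)
ss_tot = np.sum((demo_labels - demo_labels.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"MSE: {mse:.5f}, RMSE: {rmse:.5f}, R^2: {r2:.3f}")
```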

Saving the Model#

Saving your trained model is an essential part of any machine learning project. It allows you to reuse the model for predictions, further training, or sharing with others without having to retrain it from scratch. In PyTorch, saving and loading models is straightforward and can be done using the torch.save and torch.load functions.

# Save the model's state dictionary
torch.save(model.state_dict(), 'snotel_nn_model.pth')


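# Initialize the model architecture and move it to the same device used for training
model = SNOTELNN(input_size=features_train.shape[1], hidden_size=128, output_size=1).to(available_device)

# Load the model's state dictionary, mapping the saved tensors onto that device
model.load_state_dict(torch.load('snotel_nn_model.pth', map_location=available_device, weights_only=True))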

# Set the model to evaluation mode before inference
model.eval()

SNOTELNN(
  (fc1): Linear(in_features=4, out_features=128, bias=True)
  (relu): ReLU()
  (fc2): Linear(in_features=128, out_features=1, bias=True)
)
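
One way to verify a save/load round trip is to check that the restored model reproduces the original model's outputs. The sketch below does this with a stand-in `nn.Sequential` network of the same layer sizes and a temporary file; the helper `make_model` and the file name `demo_model.pth` are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    # Stand-in architecture with the same layer sizes as the printed model.
    return nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 1))


original = make_model()
demo_batch = torch.randn(5, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pth")  # hypothetical file name
    torch.save(original.state_dict(), path)

    # A fresh instance starts with different random weights...
    restored = make_model()
    # ...until the saved state dictionary is loaded into it.
    restored.load_state_dict(torch.load(path, map_location="cpu", weights_only=True))

original.eval()
restored.eval()
with torch.no_grad():
    outputs_match = torch.equal(original(demo_batch), restored(demo_batch))
print(outputs_match)
```

Saving only the state dictionary means the class definition must be available when loading, which is why the architecture is re-created before calling `load_state_dict`.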

Hyperparameter Tuning#

Hyperparameter tuning is a critical step in building machine learning models. Unlike model parameters (like weights and biases), which are learned from the data during training, hyperparameters are the settings you choose before the training process begins. These include:

Learning Rate: Controls how much to adjust the model’s weights with respect to the loss gradient.
Batch Size: Determines the number of training examples utilized in one iteration.
Number of Hidden Layers and Neurons: Specifies the architecture of the neural network.
Optimizer: The algorithm used to update model weights based on the computed gradients (e.g., Adam, SGD).

Tuning these hyperparameters can significantly affect the performance of your model. However, finding the optimal set of hyperparameters can be a challenging and time-consuming process, often requiring experimentation.

Manual vs. Automated Tuning#

Manual Tuning: Involves adjusting hyperparameters based on intuition, experience, or trial and error. While straightforward, this approach can be inefficient and might not always yield the best results.
Automated Tuning: Tools like Optuna can help automate the search for the best hyperparameters. These tools explore the hyperparameter space more systematically and can save a lot of time compared to manual tuning. Optuna publishes sample PyTorch hyperparameter tuning code among its official examples.
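
Manual tuning can be made a little more systematic with a small grid search: train one model per combination of settings and compare validation losses. The sketch below is one possible way to do this, not the tutorial's method. It uses synthetic regression data, a hypothetical helper `train_and_validate`, and an arbitrary search space over learning rate and hidden size.

```python
import itertools

import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (hypothetical): a noisy linear function of 4 features.
x = torch.randn(200, 4)
y = x @ torch.tensor([[0.5], [-0.2], [0.1], [0.3]]) + 0.01 * torch.randn(200, 1)
x_train, y_train, x_val, y_val = x[:150], y[:150], x[150:], y[150:]


def train_and_validate(lr, hidden_size, epochs=50):
    """Train a small network with full-batch updates and return its validation MSE."""
    torch.manual_seed(0)  # same initialization scheme for every configuration
    model = nn.Sequential(nn.Linear(4, hidden_size), nn.ReLU(), nn.Linear(hidden_size, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        criterion(model(x_train), y_train).backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        return criterion(model(x_val), y_val).item()


# Arbitrary search space chosen for illustration.
search_space = {"lr": [1e-4, 1e-3, 1e-2], "hidden_size": [16, 64]}

results = {
    (lr, hidden): train_and_validate(lr, hidden)
    for lr, hidden in itertools.product(search_space["lr"], search_space["hidden_size"])
}

# Pick the configuration with the lowest validation loss.
best_config = min(results, key=results.get)
print(best_config, results[best_config])
```

Configurations are compared on the validation set so that the test set stays untouched until the final evaluation.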

Acknowledgements#

Many thanks to HP Marshall (my advisor) for his mentorship and support.
Many thanks to the e-Science Institute and all organizing members for allowing me to deploy and present this tutorial. A huge thanks to everyone for listening.