How Does Machine Learning Work?

Before an AI is ready for use, it must be trained for the task at hand. In the following, we go step by step through the individual phases of machine learning.

Required Imports

import torch
import torch.utils as utils
import torch.utils.data           # make sure the data submodule is loaded
import torch.optim as optim       # optimizer (used further below)
import torchvision
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
from datetime import datetime     # timestamp for saving the trained model

# check the torch installation
print(torch.__version__)

Basic Definitions

# Path to dataset
dataset_path = "Dataset"

# number and names of classes
num_classes = 4
classes = ("glass", "metal", "paper", "plastic")

Transformations

PyTorch transformations are crucial because they facilitate data preprocessing, augmentation, and preparation, which are essential for training robust and accurate machine learning models. Here are some important transformations:

  1. Normalization: Transformations allow normalization of data to standardize input features. For example, image pixel values can be scaled to a specific range (e.g., 0 to 1 or -1 to 1) to make training more stable.
  2. Tensor Conversion: Converting raw data (e.g., images or arrays) to PyTorch tensors is necessary for compatibility with PyTorch models.
  3. Resizing: Standardizing the input size for a consistent model input.
  4. Cropping: Focusing on specific regions of an image.
  5. Flipping/Rotating: Creating variations to improve model generalization.

transformations = transforms.Compose([
    transforms.Resize(224),
    transforms.RandomVerticalFlip(),
    transforms.RandomHorizontalFlip(),
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])


dataset = datasets.ImageFolder(dataset_path, transform=transformations)
classes = dataset.classes  # class names taken from the folder names
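
A quick sanity check (optional; it assumes the Dataset folder exists and contains one subfolder per class) shows how many images were found and what a single transformed sample looks like:

# optional: inspect the dataset and one transformed sample
print("Number of images:", len(dataset))
print("Classes found:", classes)
img, label = dataset[0]  # the transformations above are applied here
print("Sample tensor shape:", img.shape)  # expected: torch.Size([3, 224, 224])
print("Sample label:", classes[label])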

Dividing the Dataset

The next step is to divide the entire dataset into training data and test data. A split of approximately 80% training data and 20% test data has proven successful.

# split into 2,000 training and 523 test images (approx. 80/20 for this dataset)
train_subset, test_subset = utils.data.random_split(dataset, [2000, 523],
        generator=torch.Generator().manual_seed(1))
train_subset.dataset.transform = transformations
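
The fixed numbers 2,000 and 523 match this particular dataset. If your dataset has a different size, you can compute the split from its length instead (a minimal sketch):

# alternative: compute an 80/20 split from the dataset size
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
train_subset, test_subset = utils.data.random_split(dataset, [train_size, test_size],
        generator=torch.Generator().manual_seed(1))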

Defining the Dataloaders

A PyTorch DataLoader is a utility in the PyTorch library that simplifies and automates the process of loading and managing data for machine learning models. It is particularly useful when working with large datasets that cannot fit into memory at once or when you need to apply data transformations and batching efficiently. We define a data loader for the training data and one for the test data:

train_loader = utils.data.DataLoader(train_subset, batch_size=4, shuffle=True, num_workers=0)
test_loader = utils.data.DataLoader(test_subset, batch_size=4, shuffle=True, num_workers=0)

dataloaders = {"train": train_loader, "test": test_loader}
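
To see what a data loader delivers, you can fetch a single batch (optional; with batch_size=4 and the transformations above, the image tensor should have the shape [4, 3, 224, 224]):

# optional: fetch one batch and inspect its shape
images, labels = next(iter(train_loader))
print("Batch of images:", images.shape)  # torch.Size([4, 3, 224, 224])
print("Batch of labels:", labels)        # four class indices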

Training Algorithm

Now comes the most important part: the program for training the neural network. The function has three parameters: the model, the data loaders, and the number of epochs to run during training:

def training_and_test(model, dataloaders, epochs):

    # lists for data evaluation
    train_acc = []  # training accuracy
    test_acc = []   # test accuracy

Each epoch has two phases: a training phase in which the network learns from the training data, and a test phase in which it is evaluated on the test data:

    # phases
    phases = ["train", "test"]

Optimizer

In PyTorch, an optimizer is a key component of the machine learning workflow, responsible for adjusting the parameters (weights and biases) of a model during training in order to minimize the loss function. It implements specific optimization algorithms, such as Stochastic Gradient Descent (SGD), Adam, or RMSprop, among others. We use Adam.

    
    params_to_update = []  # list of the parameters that will be updated
    for name, param in model.named_parameters():
        if param.requires_grad:
            # Add the parameter to the list if requires_grad is True
            params_to_update.append(param)
            # Print the name of the parameter
            print("\t", name)
    optimizer = optim.Adam(params_to_update)

Loss Function

A PyTorch loss function is a key component in training machine learning models with the PyTorch library. Loss functions are mathematical formulations that measure the difference between the model's predictions and the true target values. By minimizing this loss during training, the model improves its performance.

In PyTorch, loss functions are implemented as classes or functions found in the torch.nn module or as standalone methods in the torch namespace. They are typically used in conjunction with optimization algorithms (e.g., torch.optim.SGD, torch.optim.Adam) to adjust the model's weights.

Cross-Entropy Loss

Cross-entropy loss, also known as log loss, is a widely used loss function in machine learning, particularly for classification tasks. It measures the difference between two probability distributions: the predicted probability distribution produced by the model and the true (target) distribution from the data. It is particularly well-suited for tasks involving multi-class classification.
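
As a small standalone illustration (independent of the training function we are building), CrossEntropyLoss takes raw logits and integer class indices; the lower the value, the better the predictions match the targets:

# standalone example: cross-entropy loss on dummy logits for 4 classes
logits = torch.tensor([[2.0, 0.5, 0.1, 0.1],   # most confident about class 0
                       [0.2, 0.1, 3.0, 0.3]])  # most confident about class 2
targets = torch.tensor([0, 2])                 # true class indices
print(torch.nn.CrossEntropyLoss()(logits, targets))  # small loss, both predictions are correct

Inside the training function itself, the loss function only needs to be instantiated once: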

    
    errorfunction = torch.nn.CrossEntropyLoss()

Now let's start the actual training loop ...

  
    # go through the training loop for all epochs
    for epoch in range(epochs):
        print("Epoch ", epoch + 1, " of " , epochs )

        # carry out training and then evaluation phases
        for phase in phases:
            if phase == 'train':
                model.train()  # training mode
            else:
                model.eval()  # test mode

            # Number of correct predictions
            correct = 0

            # Iterate over all data
            for inputs, labels in dataloaders[phase]:

                # Reset the gradients of the optimizer before each batch
                optimizer.zero_grad()

                # One pass (gradient calculation on or off)
                with torch.set_grad_enabled(phase == 'train'):
                    # Pass the input through the network and compute the output
                    outputs = model(inputs)

                    # Apply the error function -> calculate the error
                    error = errorfunction(outputs, labels)

                    # The index of the maximum value is the predicted class
                    _, predicted = torch.max(outputs, 1)

                    # Optimize the model (only in training)
                    if phase == 'train':
                        # Backpropagation
                        error.backward()
                        # Update the weights
                        optimizer.step()

                # Number of correctly predicted labels
                correct += torch.sum(predicted == labels.data)

            # Calculate the accuracy of the predictions
            accuracy = correct.double() / len(dataloaders[phase].dataset)

            print("accuracy in phase ", phase, " : ", accuracy)

            # Save current accuracy in our history
            if phase == 'test':
                test_acc.append(accuracy.item())
            else:
                train_acc.append(accuracy.item())
    # return the trained model and the accuracy histories as a result
    return model, train_acc, test_acc

Selecting a Model

There are many different types of artificial neural networks. These include perceptrons, feedforward neural networks, convolutional neural networks, and recurrent neural networks. Convolutional neural networks (CNNs) are used in particular in the field of image processing. A CNN consists of several layers that follow one another (similar to a multi-layer feedforward network). There are three types of layers: convolutional layers, pooling layers, and fully-connected layers.
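
To make these three layer types concrete, here is a minimal sketch of a tiny CNN (for illustration only; it is not the model used later in this tutorial):

# illustration only: a tiny CNN with the three typical layer types
class TinyCNN(torch.nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)  # convolutional layer
        self.pool = torch.nn.MaxPool2d(2)                             # pooling layer
        self.fc = torch.nn.Linear(16 * 112 * 112, num_classes)        # fully-connected layer

    def forward(self, x):                         # x: [batch, 3, 224, 224]
        x = self.pool(torch.relu(self.conv(x)))   # -> [batch, 16, 112, 112]
        x = torch.flatten(x, 1)                   # flatten for the linear layer
        return self.fc(x)                         # class scores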

Recurrent Neural Networks

In contrast to feedforward networks, some artificial neurons in recurrent neural networks (RNNs) have feedback connections. Put simply, the output of a neuron is fed back as input to the same or an earlier neuron (a so-called feedback loop). In this way, an RNN can retain information within its neurons and thus has a kind of memory.

For this tutorial I use ResNet18; of course, you can also use other CNNs or recurrent neural networks.

  
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)  # replace the final layer with a 4-class head
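
Optionally (a sketch only, not part of the workflow above), you can freeze the pretrained backbone so that only the new classification head is trained; the optimizer in training_and_test collects only the parameters with requires_grad=True:

# optional: freeze the pretrained layers and train only the new head
for param in model.parameters():
    param.requires_grad = False  # freeze the backbone
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)  # fresh head, requires_grad=True by default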

model, train_hist, test_hist = training_and_test(model, dataloaders, 20)  # train for 20 epochs

Saving the Trained Model

After the training, I save the model so that I can use it again later in my application:

datestring = datetime.now().strftime("%m_%d_%Y_%H_%M_%S")
torch.save(model.state_dict(), "model_" + datestring + ".pth")    
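
To use the saved weights later, recreate the model architecture and load the state dict (a minimal sketch; the file name is the one produced above):

# later, in the application: rebuild the architecture and load the saved weights
model = torchvision.models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
model.load_state_dict(torch.load("model_" + datestring + ".pth"))
model.eval()  # switch to evaluation mode for inference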

Visualize Training Progress

plt.plot(train_hist, label="train")
plt.plot(test_hist, label="test")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()