Perceptron Algorithm implemented from scratch in Python

I am Mercedes
8 min read · Mar 22, 2021


Introduction

The perceptron algorithm is the most basic form of a neural network (NN) used in machine learning, and its design was inspired by human biology. Scientists studied the way that neurons determine their own state by receiving signals from their connections to other neurons and comparing the stimuli received to a threshold. Mathematicians took this idea and applied it to a simple perceptron algorithm that is commonly used for supervised binary classification tasks where the data is linearly separable. The perceptron is able to predict classes by taking a linear function that combines a set of weights with the features.

Neurons determine their own state by receiving signals from their connections. Source: Giphy

Today I want to share how to implement a perceptron algorithm using Python. Why would you bother if you can go “the pip install way” and import some libraries that would handle it for you? Because you would benefit from understanding the most basic unit of a NN, which you can then use as a starting point for more complex operations. Also, sometimes it’s nice to do something from scratch, just like those people that love making their own pasta!

This is how I imagine my grandfather making pasta in Italy. Source: Giphy

The algorithm will be applied to two different datasets: one linearly separable and the other not. I will review the technical details and also address the infinite loop problem in non-linearly separable datasets.

Perceptron Algorithm

The perceptron is a function that maps its input, a real-valued vector (X), to an output value f(X), a single binary value. It performs the mapping by associating a set of weights (w) with the attributes (x), along with a bias (b), the threshold. The function then aggregates the input into a weighted sum and returns 1 or -1 according to the threshold criterion. In machine learning, this process is repeated over several iterations, adjusting the parameters (w and b) until the model’s predictions agree with the target values.

The formula of the algorithm can be visualised like this:

f(X) = sign(w · X + b)

Perceptron formula. Source: Harveycohen

This could also be explained as: the output is 1 if the weighted sum w · x + b is greater than 0, and -1 otherwise.

Other perceptron formula. Source: Wikipedia

where x represents the data attributes. As can be seen, the algorithm consists of a series of steps that will be explained below:
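
To make the formula concrete, here is a tiny worked example with made-up numbers:

```python
import numpy as np

# Made-up weights, bias, and one sample with two attributes
w = np.array([0.5, -0.3])
b = 1
x = np.array([2.0, 4.0])

weighted_sum = np.dot(w, x) + b  # 0.5*2 + (-0.3)*4 + 1 = 0.8
print(np.sign(weighted_sum))     # 1.0, i.e. class +1
```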

1. Initialize random weight vector and constant

To start, I’ve created a class MyBeautifulPerceptron that contains a constructor where I define the initial weights and the bias.
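
The snippet below is a minimal sketch of what this constructor could look like, assuming NumPy and our two-feature dataset:

```python
import numpy as np

class MyBeautifulPerceptron:
    def __init__(self):
        # One random weight per data attribute (two features in our dataset)
        self.w = np.random.rand(2)
        # The bias (our threshold), initialised as a constant 1
        self.b = 1
```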

In the constructor, I initialise the weight vector (w) with random numbers. It’s an array with two numbers, matching the number of features in our dataset, because we need one weight per data attribute.

I also initialise the bias (b) as a constant 1. If the bias were not initialised here, another approach would have been to add a constant column x0 of 1s to the dataset, which would also have required adding a corresponding weight w0 of 1.

2. Multiply weights by input and sum them up (Weighted sum)

After that, I create a function called “predict” that requires the parameters initialised before, plus the training set for x. This function multiplies the weights by the inputs and sums them up, a mathematical operation known as the dot product. We also add the bias to the result so that we can return an array, as in the sketch below.
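
A sketch of such a predict method, under the same assumptions as the constructor above; it also performs the thresholding described in the next step:

```python
    # Inside MyBeautifulPerceptron
    def predict(self, x):
        # Weighted sum: dot product of the inputs and the weights, plus the bias
        weighted_sum = np.dot(x, self.w) + self.b
        # Threshold: +1 for positive values, -1 for negative ones
        return np.sign(weighted_sum)
```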

3. Compare result against the threshold to compute the output (1 or -1)

Inside predict, we use the method np.sign(), which returns 1 if the value is greater than 0 and -1 if it is less than 0. This gives us a prediction that will be either +1 or -1.

4. Update parameters (weights and bias)

Now that we have the results for our initial prediction, I create a method called fit to:

a) Save each hypothesis and calculate which hypothesis is better

In order to do this, we have to compare the predictions with the target. I calculate the accuracy of each prediction and collect all the errors that occurred during training. I do this inside the fit method; see the sketch at the end of this step.

b) Update parameters according to the errors

Next, I add a condition that checks whether the prediction still has errors. This stops the iterations once the predictions are equal to the target. But if the prediction has at least one error, the weights and the bias are updated.

I then calculate the new weight and add it to the variable “w”. Weights are updated based on the errors the model made: the new weight is the old weight plus the contribution of the wrongly predicted samples. This moves the parameters in the direction that brings the predictions closer to the target. Note that in Python I’m using +=, which adds a number to a variable and changes the variable itself. The bias is updated in a similar way, except without x, as it is not associated with a specific input.

c) Repeat the process until maximum accuracy is achieved

To make sure the process is repeated until maximum accuracy is achieved, I wrap everything into a method called “fit” that requires the self parameters w and b, plus the training data. Here I introduce a “while True” loop that contains all the sections of the algorithm explained above.
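
Putting a), b) and c) together, here is a minimal sketch of that fit method; the batch-style update is one plausible reading of the description above, not the only possible one:

```python
    # Inside MyBeautifulPerceptron
    def fit(self, x, y):
        accuracy = []                          # a) track each hypothesis
        while True:
            y_pred = self.predict(x)           # prediction of the current hypothesis
            wrong = y_pred != y                # mask of the wrong predictions
            accuracy.append(1 - wrong.mean())  # accuracy of this hypothesis
            if not wrong.any():                # no errors left: stop iterating
                break
            # b) Each misclassified sample pushes w and b towards its target
            self.w += np.dot(y[wrong], x[wrong])
            self.b += y[wrong].sum()
        return accuracy
```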

5. Solve the infinite loop problem for non-linearly separable data

The algorithm ends when 100% training accuracy is achieved. If we cannot use a straight line to separate the samples, the algorithm ends up in an infinite loop, because 100% will never be reached. This is exactly what happened to me when running this notebook.

What could be the solution to that problem? I dealt with it by adding a maximum number of iterations to the loop: a constant of 100 iterations, checked on every pass, so the loop can never run forever.
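
A sketch of that guard, assuming the 100-iteration constant described above:

```python
    # Inside MyBeautifulPerceptron
    MAX_ITERATIONS = 100  # the 100-iteration constant described above

    def fit(self, x, y):
        accuracy = []
        for _ in range(self.MAX_ITERATIONS):   # bounded loop instead of while True
            y_pred = self.predict(x)
            wrong = y_pred != y
            accuracy.append(1 - wrong.mean())
            if not wrong.any():
                break                          # separable data still converges early
            self.w += np.dot(y[wrong], x[wrong])
            self.b += y[wrong].sum()
        return accuracy
```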

Model Evaluation

The model is evaluated on two datasets. The first is a linearly separable dataset obtained from DataOptimal’s GitHub (LINK). The other is the Breast Cancer Wisconsin (Diagnostic) Data Set from UCI (LINK).

1. Data Preparation

The first dataset contains 2,000 instances that are linearly separable. All of the features are numeric and sit in columns 1 and 2. Column 0 is a dummy feature of 1s, included to add a constant, but it is not used in this experiment because the bias is handled inside the perceptron class. The target is in column 3 (0 or 1), which I pre-processed into -1 or 1. This dataset is used for a binary classification task and was converted from a CSV to a DataFrame, and then to a multidimensional array.
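
A sketch of that pre-processing; the file name here is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical file name; the dataset comes from DataOptimal's GitHub
df = pd.read_csv("dataset1.csv")
data = df.to_numpy()                   # CSV -> DataFrame -> multidimensional array

x = data[:, 1:3]                       # columns 1 and 2: the features
y = np.where(data[:, 3] == 0, -1, 1)   # column 3: target, 0 mapped to -1
```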

Dataset one is linearly separable.

The second dataset contains 569 instances that are non-linearly separable. The features are in columns 1–29, and the target is in column 30. The same pre-processing was applied: I converted the 0 class to -1 and selected only two attributes (columns 1 and 2) to work with the model. This dataset is also used for a binary classification task and was likewise converted from a CSV to a DataFrame, and then to a multidimensional array.
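
The same pattern for the second dataset, again with a hypothetical file name (continuing from the imports above):

```python
# Hypothetical file name for the Breast Cancer Wisconsin (Diagnostic) dataset
df2 = pd.read_csv("breast_cancer.csv")
data2 = df2.to_numpy()

x_nonl = data2[:, 1:3]                       # only columns 1 and 2 are used
y_nonl = np.where(data2[:, 30] == 0, -1, 1)  # column 30: target, 0 mapped to -1
```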

Dataset 2 is non-linearly separable.

2. Experiment Design and Evaluation

The overall design of the experiment was to build a perceptron model and fit it to two different datasets, one of which was not linearly separable. Each dataset was pre-processed and split into two parts: 70% for training and 30% for evaluation. Prior to splitting, the data was randomly shuffled. After that, the data attributes were separated from the class column. This resulted in 8 variables used to evaluate the performance of my perceptron vs the off-the-shelf perceptron: x_train, y_train, x_test, y_test (for dataset 1) and x_train_nonl, y_train_nonl, x_test_nonl, y_test_nonl (for dataset 2). The metrics used to evaluate performance are training and testing accuracy.
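
A sketch of the split and the head-to-head evaluation; using scikit-learn’s train_test_split here is an assumption, and the random seed is illustrative:

```python
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

# 70/30 split with shuffling, as described above (the seed is illustrative)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, shuffle=True, random_state=42
)

# My implementation
model = MyBeautifulPerceptron()
model.fit(x_train, y_train)
my_acc = (model.predict(x_test) == y_test).mean()

# Off-the-shelf baseline
clf = Perceptron()
clf.fit(x_train, y_train)
sk_acc = clf.score(x_test, y_test)

print(f"MyPerceptron: {my_acc:.0%} | sklearn's Perceptron: {sk_acc:.0%}")
```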

3. Evaluation Results

Testing accuracy, Dataset 1 (linear):

- MyPerceptron: 99%
- Sklearn’s Perceptron: 99%

Testing accuracy, Dataset 2 (non-linear):

- MyPerceptron: 62%
- Sklearn’s Perceptron: 62%

Conclusion

We’ve implemented a perceptron algorithm from scratch in Python. The steps followed were: initializing a random weight vector and constant, performing a weighted sum, comparing the result against the threshold to compute the output (1 or -1), updating the parameters (weights and bias), and solving the infinite loop problem for non-linearly separable data.

The results show that the perceptron model works best for binary classification tasks where the data is linearly separable, with a 37-percentage-point drop in accuracy when applied to samples that couldn’t be separated into classes. Comparing my custom model against an off-the-shelf, trusted implementation, my custom model achieved the same accuracy as sklearn’s perceptron on both datasets, for both training and testing.

I personally believe that implementing a perceptron from scratch is a great way to learn the algorithm on a deeper level, and, as the results show, it can match the accuracy of off-the-shelf libraries.

References

  • Dr Jun Li, “Advanced Data Analytics and Algorithms” (Perceptron Algorithm, University of Technology Sydney, Sydney, October 2020).
  • DataOptimal, “6 Steps To Write Any Machine Learning Algorithm From Scratch: Perceptron Case Study”, 2020.
  • A. L. Chandra, “Perceptron Learning Algorithm: A Graphical Explanation Of Why It Works”, Medium, 2020.
