Neural networks are often compared to the brain. This is the analogy typically used to help someone new to the subject understand the ideas behind machine learning and artificial neural networks.
A more precise way to describe these networks is as a mathematical function, since several layers of mathematical and statistical computations are going on behind the scenes.
This article is for people who are genuinely interested in machine learning and want to see how Python neural network code is actually written.
In this article, we’ll demonstrate how to construct a fully connected deep neural network (DNN) from scratch in Python 3.
An Overview of the File Structure for Our Python Neural Network Code
There will be three files created here. The first is the simple_nn.py file, which will be discussed in “Setting Up Helper Functions” and “Building the Neural Network from Scratch.”
We will also have a file named mnist_loader.py to load the test data, as described in “Loading MNIST Data.”
Finally, we will have a file named test.py that will be launched in the terminal to test our neural network.
This file is described in detail in “Running Tests.”
Installation
To follow this tutorial, you will need the NumPy Python library installed.
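Assuming pip is available in your environment (use pip3 or python -m pip as appropriate for your setup), installation typically looks like this:

```
pip install numpy
```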
Importing Modules and Setting Up the Helper Functions
The only two libraries we require are random and NumPy, which we will import right away. The random library will be used to shuffle our training data, while NumPy handles the random initialization of the network’s weights and biases.
We’ll use NumPy, or np (by convention, it is usually imported as np), to speed up our computations. After the imports, we define our two helper functions: sigmoid and sigmoid_prime.
The sigmoid function is used to squash activations into the range (0, 1), as in logistic regression, while the sigmoid_prime function (its derivative) is used during backpropagation to calculate the delta, or gradient.
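A minimal sketch of the top of simple_nn.py, assuming the standard definitions of these two helpers:

```python
import random

import numpy as np


def sigmoid(z):
    # Squash z into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    # Derivative of the sigmoid, used to compute gradients during backpropagation.
    return sigmoid(z) * (1 - sigmoid(z))
```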
Creating the Network Class
This section focuses entirely on building the fully connected neural network. The Network class will contain all of the functions that follow. The first thing we create in our Network class is the constructor, the __init__ method.
The constructor takes one argument, sizes. The sizes variable is a list of numbers representing the number of nodes in each layer of our neural network.
We initialize four properties in our __init__ method. The list of layer sizes, sizes, and the number of layers, num_layers, are both set from the input variable.
Next, we randomly initialize the network’s biases: one bias vector for each layer after the input layer.
Finally, the weights for each connection between consecutive layers are randomly generated. For context, np.random.randn() returns a random sample drawn from the standard normal distribution.
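A sketch of the constructor along the lines described above (the exact layout may differ from the original code):

```python
class Network:
    def __init__(self, sizes):
        # sizes is a list such as [784, 30, 10]: one entry per layer.
        self.sizes = sizes
        self.num_layers = len(sizes)
        # One bias column vector per layer after the input layer.
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        # One weight matrix per pair of adjacent layers.
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]
```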
Feed Forward Function
The feedforward function passes information forward through the neural network. It takes one argument, a, representing the current activation vector.
The function computes the activations at each layer by iterating over all of the biases and weights in the network. The value returned is the prediction: the activations of the last layer.
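A sketch of the feedforward method, assuming it sits inside the Network class:

```python
    def feedforward(self, a):
        # Propagate the activation vector a through every layer.
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a) + b)
        return a
```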
Mini-batch Gradient Descent
Our Network class’s workhorse is Gradient Descent. In this version, we use mini-batch (stochastic) gradient descent, a modified variation of gradient descent.
This indicates that a small batch of data points will be used to update our model. Four required and one optional argument are passed to this method. The four required variables are the training data set, the number of epochs, the size of the mini-batches, and the learning rate (eta).
The test data is an optional argument; we’ll supply it when we eventually evaluate this network. Inside the function, the training data is first converted to a list, and the number of samples is set to the length of that list.
We apply the same conversion to any test data that is passed in. This is because the data sets are actually zips of lists rather than plain lists. We’ll see more about this when we load the MNIST data samples later.
This type-casting is not strictly necessary if we make sure both data sets are provided as lists.
Once we have the data, we loop over the training epochs. An epoch is one complete pass through the training data. In each epoch, we first shuffle the data to ensure randomness and then split it into a list of mini-batches.
For each mini-batch, we call the update_mini_batch function, which is discussed below. If test data was provided, the test accuracy is also reported after each epoch.
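A sketch of the SGD method matching the description above; the print statements used to report progress are an assumption:

```python
    def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None):
        # The data may arrive as zip objects, so convert to lists first.
        training_data = list(training_data)
        n = len(training_data)
        if test_data:
            test_data = list(test_data)
            n_test = len(test_data)
        for epoch in range(epochs):
            # Shuffle, then split the training data into mini-batches.
            random.shuffle(training_data)
            mini_batches = [training_data[k:k + mini_batch_size]
                            for k in range(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print(f"Epoch {epoch}: {self.evaluate(test_data)} / {n_test}")
            else:
                print(f"Epoch {epoch} complete")
```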
Cost Derivative Helper Function
Before we write the backpropagation code itself, let’s create a helper function called cost_derivative. The cost_derivative function returns the error of our output layer.
It takes two inputs: the array of output activations and the expected output values, y.
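A sketch of cost_derivative, assuming the usual quadratic-cost error term:

```python
    def cost_derivative(self, output_activations, y):
        # Error of the output layer for a quadratic cost: (a - y).
        return output_activations - y
```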
Backpropagation Function
We need to keep track of the current activation vector, activation, the list of all activation vectors, activations, and the list of z vectors, zs. The activation of the input layer comes first.
We then loop through each bias and weight. In each iteration, we calculate the z vector as the dot product of the weights and the current activation plus the bias, append it to zs, recompute the activation, and append the new activation to activations.
Now for the math of the backward pass. We first compute the delta, which is the error of the output layer (from cost_derivative) multiplied by the sigmoid prime of the last of the z vectors.
The last layer of nabla_b is set to the delta, and the last layer of nabla_w is set to the dot product of the delta and the second-to-last layer of activations (transposed so the matrix math works out).
After handling these final layers, we repeat the process, starting at the second-to-last layer and working backwards toward the front of the network. The nablas are then returned as a tuple.
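A sketch of the backprop method matching this description; the zero-vector initialization of nabla_b and nabla_w at the top is the same setup mentioned in the next section:

```python
    def backprop(self, x, y):
        # Zero gradients with the same shapes as the biases and weights.
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # Forward pass: store every z vector and every activation.
        activation = x
        activations = [x]
        zs = []
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation) + b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # Backward pass: start from the output layer.
        delta = self.cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Work backwards through the remaining layers.
        for l in range(2, self.num_layers):
            z = zs[-l]
            delta = np.dot(self.weights[-l + 1].transpose(), delta) * sigmoid_prime(z)
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())
        return nabla_b, nabla_w
```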
Updating Mini-batch Gradient Descent
The update_mini_batch function performs the mini-batch updating used by our SGD (stochastic gradient descent) method above. Since it is called from SGD but also relies on backprop, I debated where to put this function, and finally decided to cover it here.
It takes two inputs: the mini-batch and the learning rate, eta. Like our backprop function, it begins by creating zero vectors for the nablas of the biases and weights.
For each input, x, and output, y, in the mini-batch, we call the backprop function to obtain the delta of each nabla array, and add these deltas to the running nabla lists.
Finally, we use the learning rate and the nablas to update the network’s weights and biases. Each value is updated to its current value minus the learning rate divided by the mini-batch size, multiplied by the corresponding nabla value.
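A sketch of update_mini_batch matching the description above:

```python
    def update_mini_batch(self, mini_batch, eta):
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            # Accumulate the gradients from each training example.
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        # Gradient descent step, averaged over the mini-batch.
        self.weights = [w - (eta / len(mini_batch)) * nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (eta / len(mini_batch)) * nb
                       for b, nb in zip(self.biases, nabla_b)]
```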
Evaluate Function
The evaluate function is the final one we need to write. Its only input is the test data. In this function, we feed each input, x, forward to get the network’s output and compare it with the expected result, y.
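A sketch of the evaluate method, assuming the test labels are plain digits rather than one-hot vectors:

```python
    def evaluate(self, test_data):
        # The network's prediction is the index of the largest output activation.
        results = [(np.argmax(self.feedforward(x)), y) for x, y in test_data]
        # Count how many predictions match the expected label y.
        return sum(int(prediction == y) for prediction, y in results)
```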
Complete Code
When we combine all of the code above into simple_nn.py, the overall structure looks like this.
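As an outline, with the method bodies as sketched in the earlier sections, the file is laid out roughly like this:

```python
# simple_nn.py: overall layout; bodies are as sketched in the sections above.
import random

import numpy as np


def sigmoid(z): ...
def sigmoid_prime(z): ...


class Network:
    def __init__(self, sizes): ...
    def feedforward(self, a): ...
    def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None): ...
    def update_mini_batch(self, mini_batch, eta): ...
    def backprop(self, x, y): ...
    def cost_derivative(self, output_activations, y): ...
    def evaluate(self, test_data): ...
```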
Testing the Neural Network
Loading MNIST Data
The MNIST data comes in .pkl.gz format, which we’ll open using gzip and load with pickle. Let’s write a quick function to load this data as a tuple of size three, split into training, validation, and test data.
To make our data easier to work with, we’ll write another function to one-hot encode the y values into a 10-element array. The array will be all 0s except for a 1 at the index of the image’s correct digit.
Using the basic load_data and one_hot_encode functions, we’ll then load our data into a usable format. We’ll write one more wrapper function that converts each x value into a vector of size 784, corresponding to the image’s 784 pixels, and each y value into its one-hot encoded form.
We’ll then zip the x and y values together so that each input is paired with its label. This is done for the training, validation, and test data sets, and the transformed data is returned.
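A sketch of mnist_loader.py along these lines; the file path, the wrapper name load_data_wrapper, and the choice to one-hot encode only the training labels (so that evaluate can compare against plain digit labels) are assumptions:

```python
import gzip
import pickle

import numpy as np


def load_data():
    # Path to the archive is an assumption; point it at wherever mnist.pkl.gz lives.
    with gzip.open("mnist.pkl.gz", "rb") as f:
        training_data, validation_data, test_data = pickle.load(f, encoding="latin1")
    return training_data, validation_data, test_data


def one_hot_encode(j):
    # 10-element column vector that is all zeros except for a 1 at index j.
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e


def load_data_wrapper():
    tr_d, va_d, te_d = load_data()
    # Reshape each image into a 784-element column vector and pair it with its label.
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [one_hot_encode(y) for y in tr_d[1]]
    training_data = zip(training_inputs, training_results)
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = zip(validation_inputs, va_d[1])
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = zip(test_inputs, te_d[1])
    return training_data, validation_data, test_data
```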
Running Tests
Before we begin testing, we’ll make a new file called test.py that imports both the neural network we built earlier (simple_nn) and the MNIST data set loader (mnist_loader).
In this file, all we need to do is load the data, build a network with an input layer of size 784 and an output layer of size 10, run the network’s SGD function on the training data, and then evaluate it on the test data.
Keep in mind that the layer sizes between the 784 input nodes and the 10 output nodes can be whatever we like; only the input and output sizes are fixed.
We don’t even need three layers; we could use four, five, or just two. Have fun experimenting with it.
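A sketch of test.py; the 30-node hidden layer and the hyperparameters (30 epochs, mini-batch size 10, learning rate 3.0) are illustrative choices, not values from the original article:

```python
# test.py: load the data, build the network, train it, and report test accuracy.
import mnist_loader
import simple_nn

# load_data_wrapper is the wrapper function sketched in "Loading MNIST Data".
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

# 784 input nodes, one 30-node hidden layer (any hidden sizes work), 10 output nodes.
net = simple_nn.Network([784, 30, 10])

# Train with mini-batch gradient descent and print accuracy on the test data each epoch.
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
```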
Conclusion
In this article, we created a neural network from scratch using Python 3. Along with the high-level math, we also covered the specifics of the implementation.
We began by implementing the helper functions: the sigmoid and sigmoid_prime functions are crucial for the neurons to work. We then implemented the feedforward function, the fundamental process for passing data through the neural network.
Next, we created the gradient descent function in Python, the engine that drives our neural network. In order to locate “local minima” and optimize their weights and biases, our neural network uses gradient descent. We created the backpropagation function using gradient descent.
By delivering updates when the outputs don’t match the proper labels, this function enables the neural network to “learn.”
Finally, we put our brand-new Python neural network to the test using the MNIST data set. Everything functioned smoothly.
Happy Coding!