Artificial Intelligence (AI) has gained a significant amount of popularity in recent years.
If you are a software engineer, computer scientist, or data science enthusiast in general, then you are probably intrigued by the amazing applications of image processing, pattern recognition and object detection provided by this field.
The most important subfield of AI that you have probably heard about is Deep Learning. This field focuses on powerful algorithms (sets of computer program instructions), known as Neural Networks, that are modeled after the way the human brain functions.
In this article, we’ll go over the concept of Neural Networks and how to build, compile, fit and evaluate these models using Python.
Neural Networks
Neural Networks, or NNs, are a series of algorithms modeled after the biological activity of the human brain. Neural Networks consist of nodes, also called neurons, where the calculations take place.
A vertical collection of nodes is known as a layer. The model consists of one input layer, one output layer, and a number of hidden layers in between.
In the following diagram, the circles represent the nodes and each vertical column of nodes represents a layer. There are three layers in this model.
The nodes of one layer are connected to the nodes of the next layer through connections, as seen below.
Our dataset consists of labeled data. This means that each data entity has been assigned a name (its label).
So for an animal classification dataset we will have images of cats and dogs as our data, with ‘cat’ and ‘dog’ as our labels.
It is important to note that labels need to be converted to numerical values for our model to make sense of them, so our animal labels become ‘0’ for cat and ‘1’ for dog. Both the data and the labels are passed through the model.
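The label conversion described above can be sketched in a few lines of plain Python. The list of labels is made up for illustration, and sorting the label names before numbering them is just one reasonable convention, not something the dataset requires.

```python
# Hypothetical string labels for an animal classification dataset.
labels = ["cat", "dog", "dog", "cat"]

# Give each distinct label an integer code. Sorting keeps the mapping reproducible.
label_to_index = {name: index for index, name in enumerate(sorted(set(labels)))}

# Replace every string label with its integer code.
encoded = [label_to_index[name] for name in labels]

print(label_to_index)  # {'cat': 0, 'dog': 1}
print(encoded)         # [0, 1, 1, 0]
```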
Learning
Data is fed to the model one entity at a time. This data is broken down into chunks and passed through each node of the model. Nodes carry out mathematical operations on these chunks.
You do not need to know the mathematical functions or calculations for this tutorial, but it is important to have a general idea of how these models work. After a series of calculations in one layer, the data is passed on to the next layer, and so on.
Once completed, our model predicts the data label at the output layer (for example, in an animal classification problem we get a prediction ‘0’ for a cat).
The model then compares this predicted value with the actual label.
It measures the difference between the two values, called the loss. The larger the loss, the more the model adjusts its node calculations so that its predictions match the labels more closely next time.
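To make the predict, compare, and adjust cycle concrete, here is a minimal sketch with a single node. Everything in it is an illustrative assumption rather than what Keras does internally: the one input feature, the starting weight and bias, the learning rate, and the choice of a sigmoid output with a cross-entropy loss.

```python
import math

def sigmoid(z):
    # Squash any number into the range (0, 1) so it can be read as a probability.
    return 1.0 / (1.0 + math.exp(-z))

def predict(weight, bias, feature):
    # A single node: weighted input plus bias, passed through the activation.
    return sigmoid(weight * feature + bias)

def loss(prediction, label):
    # Binary cross-entropy: small when the prediction is close to the label.
    return -(label * math.log(prediction) + (1 - label) * math.log(1 - prediction))

# One hypothetical training example and hypothetical starting parameters.
feature, label = 2.0, 1
weight, bias = 0.1, 0.0
learning_rate = 0.5

loss_before = loss(predict(weight, bias, feature), label)

# For a sigmoid output with cross-entropy loss, the gradient with respect to
# the weighted sum is simply (prediction - label).
error = predict(weight, bias, feature) - label
weight -= learning_rate * error * feature
bias -= learning_rate * error

loss_after = loss(predict(weight, bias, feature), label)
print(loss_after < loss_before)  # True
```

A real network repeats this adjustment for every weight in every layer, which is the job of the optimizer.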
Deep Learning Frameworks
To build Neural Networks in code, we use Deep Learning frameworks, which are libraries that we import into our program.
These frameworks are collections of pre-written functions that will help us in this tutorial. We will be using the Keras framework to build our model.
Keras is a Python library that runs on top of a deep learning backend called TensorFlow and makes it easy to create NNs in the form of simple sequential models.
Keras also comes with its own preexisting models that could be used as well. For this tutorial, we will be creating our own model using Keras.
You can learn more about this Deep Learning framework from the Keras website.
Building a Neural Network (Tutorial)
Let’s move on to building a Neural Network using Python.
Problem Statement
Neural Networks are one way to solve AI-based prediction problems. For this tutorial we will be working with the Pima Indians Diabetes dataset, which is available here.
This dataset, published by UCI Machine Learning, contains medical records of female patients of Pima Indian heritage. Our model has to predict whether or not a patient will have an onset of diabetes within five years.
Loading Dataset
Our dataset is a single CSV file called ‘diabetes.csv’ that can easily be manipulated using Microsoft Excel.
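Before creating our model, we need to import our dataset. You can do this with the following code:
import pandas as pd
# Load the CSV file into a DataFrame
data = pd.read_csv('diabetes.csv')
# Features: every column except the label column
x = data.drop(columns='Outcome')
# Labels: the Outcome column (0 or 1)
y = data['Outcome']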
Here we are using the Pandas library to manipulate our CSV file data. read_csv() is a built-in function of Pandas that loads the values in our file into a variable called ‘data’.
The variable x contains our dataset without the outcome (label) column. We achieve this with data.drop(), which returns a copy of the data with the Outcome column removed, while y contains only the outcome (label) data.
Building Sequential Model
Step 1: Importing Libraries
First, we need to import TensorFlow and Keras, along with the components required for our model. The following code allows us to do this:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
For our model we are importing dense layers. These are fully connected layers; i.e., each node in a layer is connected to every node in the next layer.
We are also importing Activation, which gives access to the activation functions that transform each node's output. Optimizers have also been imported to minimize loss.
Adam is a well-known optimizer that helps our model update its node calculations more efficiently. We also import categorical_crossentropy, a function from the cross-entropy family that measures the difference between actual and predicted label values.
Step 2: Designing Our Model
The model I am creating has one input layer (with 16 units), one hidden layer (with 32 units) and one output layer (with 2 units). These numbers are not fixed and will depend entirely on the given problem.
Choosing the right number of units and layers is a skill that improves over time with practice. Activation refers to the function applied to the result of each node's calculation before it is passed on to the next layer.
ReLU and softmax are widely used activation functions: ReLU for the earlier layers and softmax for the output layer of a classifier.
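model = Sequential([
    # 8 inputs: one per feature column left in x after dropping Outcome
    Dense(units=16, input_shape=(8,), activation='relu'),
    Dense(units=32, activation='relu'),
    # 2 outputs: one probability per class (0 or 1)
    Dense(units=2, activation='softmax')
])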
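The two activation functions named in the model can be sketched in plain Python to show what they do to a list of numbers. This is an illustration of the math only, not the Keras implementation, and the input values are made up.

```python
import math

def relu(values):
    # ReLU keeps positive numbers and replaces negative numbers with zero.
    return [max(0.0, v) for v in values]

def softmax(values):
    # Subtracting the maximum first avoids overflow in exp() for large inputs.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    # The results are all positive and add up to 1, so they read as probabilities.
    return [e / total for e in exps]

print(relu([-2.0, 0.0, 3.5]))  # [0.0, 0.0, 3.5]
print(softmax([1.0, 1.0]))     # [0.5, 0.5]
```

With two output units, softmax produces one probability for class 0 and one for class 1.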
Here is what the summary of the model should look like:
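The parameter counts that a model summary reports can be worked out by hand: a dense layer has one weight per input per unit, plus one bias per unit. The sketch below assumes the first layer receives 8 input features (the number of feature columns in the dataset once Outcome is removed); with a different input size the first count changes accordingly.

```python
def dense_param_count(inputs, units):
    # One weight for every (input, unit) pair, plus one bias per unit.
    return inputs * units + units

input_features = 8         # assumption: 8 feature columns feed the first layer
layer_units = [16, 32, 2]  # units of the three Dense layers in the model

counts = []
previous = input_features
for units in layer_units:
    counts.append(dense_param_count(previous, units))
    previous = units  # the outputs of this layer are the inputs of the next

print(counts)       # [144, 544, 66]
print(sum(counts))  # 754
```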
Training the Model
Our model will be trained in two steps, the first being compiling the model (putting the model together) and the next being fitting the model on a given dataset.
This can be done using the model.compile() function followed by the model.fit() function.
model.compile(optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x, y, epochs=30, batch_size=10)
Specifying the ‘accuracy’ metric allows us to observe the accuracy of our model during training.
Since our labels are the integers 0 and 1 and our output layer has two softmax units, we use the sparse categorical cross-entropy loss function to compute the difference between actual and predicted labels. (A binary cross-entropy loss would instead pair with a single sigmoid output unit.)
The dataset is also split into batches of 10 (batch_size) and will be passed through the model 30 times (epochs). For a given dataset, x would be the data and y would be the labels corresponding to the data.
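As a rough check on what batch_size and epochs mean for the amount of training, the arithmetic can be sketched as follows. The row count of 768 is an assumption about diabetes.csv; substitute len(x) for your own copy of the file.

```python
import math

samples = 768    # assumption: number of rows in diabetes.csv
batch_size = 10
epochs = 30

# The last batch of an epoch may be smaller than batch_size, hence the ceiling.
steps_per_epoch = math.ceil(samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 77
print(total_updates)    # 2310
```

Each step is one update of the model's weights, so a smaller batch size means more updates per epoch.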
Testing Model Using Predictions
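To check our model, we make predictions using the predict() function. For simplicity we reuse x here; for a fair evaluation you would predict on test data that the model did not see during training.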
predictions = model.predict(x)
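With a two-unit softmax output, each prediction is a pair of probabilities rather than a label. A minimal sketch of turning such pairs into labels and measuring accuracy is shown below; the probability values and true labels are made up for illustration and are not real model output.

```python
# Hypothetical predict() output: one [P(class 0), P(class 1)] pair per row.
sample_predictions = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]]
true_labels = [0, 1, 1, 1]

# The predicted label is the position of the largest probability in each row.
predicted_labels = [row.index(max(row)) for row in sample_predictions]

# Accuracy is the fraction of predictions that match the true labels.
correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
accuracy = correct / len(true_labels)

print(predicted_labels)  # [0, 1, 0, 1]
print(accuracy)          # 0.75
```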
And that’s it!
You should now have a good understanding of what Neural Networks are, how they work in general, and how to build, train and test a model in Python.
I hope this tutorial gives you the kickstart to create and deploy your own Deep Learning models.
Let us know in the comments if the article was helpful.