Why do we even need ‘Artificial Neural Networks (ANNs)’?
Look at this handwritten number:
How long did it take you to recognize the digits? Most probably less than a few seconds. Have you ever wondered how the brain processes what it sees in a flash? The answer is quite straightforward: the human brain excels at pattern recognition and contextual interpretation thanks to its highly complex neural networks, vast collections of interconnected neurons that process information in a hierarchical yet parallel manner.
Computers, by contrast, are tremendously fast at numerical computation but struggle to recognize patterns. In an attempt to instill artificial intelligence in computers, Dr. Frank Rosenblatt set out to replicate the brain's neural networks in a machine. Once created, these artificial neural networks would enable computers to process complex information and produce useful results. This is precisely what happened.
Today, we study Artificial Neural Networks (ANNs) as a fundamental computational model in machine learning, used to solve complex problems in domains like computer vision and natural language processing.
What is the architecture of ‘Artificial Neural Networks (ANNs)’?
Since computers fundamentally operate on two states, off and on, represented by 0 and 1 respectively, the brain's neural network had to be mapped onto a mathematical model before a machine could work with it.
The biological neurons that make up neural networks are represented by their mathematical counterpart: 'artificial neurons'. In Artificial Neural Networks, artificial neurons receive input, process it, and deliver an output. Each artificial neuron receives multiple inputs, and each input is assigned a weight representing its relative importance. The weighted sum is then passed through an activation function and, in the original formulation, measured against a defined threshold to determine the neuron's output. The activation function introduces non-linearity into the network; without it, the entire network would collapse into a mere linear transformation. You can think of an artificial neuron as a decision-maker that reaches a verdict after weighing the impact of each relevant factor. Since the weights and the decision threshold can be varied, many different decision models can be generated. The earliest artificial neuron was the perceptron, which eventually evolved into a smoother variant now known as the sigmoid neuron.
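The two neuron models described above can be sketched in a few lines. This is a minimal illustration, not a production implementation; the weights, bias, and threshold values are arbitrary examples.

```python
import math

def perceptron(inputs, weights, threshold):
    """Classic perceptron: fire (1) only if the weighted sum exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

def sigmoid_neuron(inputs, weights, bias):
    """Sigmoid neuron: a smooth activation function replaces the hard threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-weighted_sum))

# Two inputs, weighted by their relative importance
print(perceptron([1, 0], weights=[0.6, 0.4], threshold=0.5))              # → 1
print(round(sigmoid_neuron([1, 0], weights=[0.6, 0.4], bias=-0.5), 3))    # → 0.525
```

Note how the perceptron jumps abruptly between 0 and 1, while the sigmoid neuron outputs a value in between; that smoothness is what makes the sigmoid variant easier to train with gradient-based methods.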
The artificial neurons are then organized into layers and connected to each other via synapses. Each synapse carries a weight that defines the strength of the connection and is adjusted during the training phase. The layers fall into three main types: the input layer, hidden layers, and the output layer. The hidden layers are where the learning happens. When a network stacks multiple layers of artificial neurons, it facilitates what we know as deep learning. Such networks are usually referred to as deep neural networks or deep learning models, because the presence of multiple layers allows them to learn intricate, hierarchical features from data.
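To make the layered structure concrete, here is a sketch of a tiny network passing data from a 2-neuron input through a 3-neuron hidden layer to a single output neuron. All weights and biases are made-up illustrative numbers; a real network would learn them during training.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron computes a weighted sum plus bias,
    then applies the activation function."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# 2 inputs → hidden layer of 3 neurons → 1 output neuron
hidden = layer([0.5, -1.0],
               weights=[[0.1, 0.4], [-0.2, 0.3], [0.5, -0.5]],
               biases=[0.0, 0.1, -0.1])
output = layer(hidden, weights=[[0.3, -0.6, 0.9]], biases=[0.05])
print(output)
```

Each list of weights is a synapse bundle feeding one neuron; stacking more `layer` calls is exactly what turns this into a deep network.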
How do the ‘Artificial Neural Networks (ANNs)’ function?
Artificial Neural Networks are trained so that they learn how to respond to the data presented to them.
A known set of data (the training dataset) with known correct answers is passed through the neural network from the input layer to the output layer. This is the feedforward process, during which each layer of neurons produces a result based on its current weights and biases. Once an output has been generated, a loss function quantifies the error between the actual value and the predicted value. The error is then propagated backward through the network to update the weights and biases, minimizing the loss function with optimization methods like gradient descent. This is the backpropagation process. Feedforward and backpropagation are repeated many times until a certain halting criterion is met. The trained artificial neural network can then be used to make predictions on new, unseen data.
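The training loop above can be sketched for the simplest possible case: a single sigmoid neuron learning the logical OR function with gradient descent. This is a deliberately minimal example; the learning rate, epoch count, and mean-squared-error-style gradient are illustrative choices, and a real network would backpropagate through many layers.

```python
import math, random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Training dataset: inputs with known correct answers (logical OR)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

random.seed(0)
w = [random.uniform(-1, 1), random.uniform(-1, 1)]
b = 0.0
lr = 1.0  # learning rate

for epoch in range(2000):                             # repeat until halting criterion
    for x, target in data:
        y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)    # feedforward
        error = y - target                            # error from the loss function
        grad = error * y * (1 - y)                    # propagate back through sigmoid
        w[0] -= lr * grad * x[0]                      # gradient descent updates
        w[1] -= lr * grad * x[1]
        b -= lr * grad

# After training, the neuron reproduces OR on the same inputs
for x, target in data:
    print(x, round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)))
```

The inner loop is one feedforward pass followed by one backpropagation step; repeating it over the dataset is what gradually moves the weights toward values that minimize the loss.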
Are there different types of ‘Artificial Neural Networks (ANNs)’?
Artificial Neural Networks form the foundation for a number of specialized network types. Let's have a brief look at a few popular ones.
Convolutional Neural Networks (CNNs) use convolutional layers to process images and other grid-based data. This makes them well suited to tasks like image classification, object detection, and image generation.
Recurrent Neural Networks (RNNs) contain connections that loop back on themselves. This structure allows them to maintain a memory of previous inputs, making them well suited to sequential data such as time series, natural language, and audio. Popular tasks performed by RNNs include speech recognition, language modeling, and sequence prediction.
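The looping connection can be shown with a single scalar recurrent cell: the new hidden state mixes the current input with the previous state, so earlier inputs keep influencing later outputs. The weights here are illustrative scalars, not learned values; real RNN cells use weight matrices.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: the new hidden state depends on both the current
    input and the previous hidden state (the loop-back connection)."""
    return math.tanh(w_x * x + w_h * h_prev + b)

# Feed a short sequence through the cell; the hidden state carries memory forward
h = 0.0
for x in [1.0, 0.5, -0.5]:
    h = rnn_step(x, h, w_x=0.8, w_h=0.5, b=0.0)
    print(round(h, 3))
```

Because the hidden state is threaded through every step, feeding the same values in a different order yields a different final state, which is exactly the order sensitivity that sequential data demands.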
Generative Adversarial Networks (GANs) consist of two neural networks: a generator and a discriminator. They are trained together in a competitive fashion to generate synthetic data that closely resembles real data distributions. GANs have common applications in image generation, style transfer, and data augmentation.
"In the neural dance of code and connection, artificial networks mirror the brilliance of the mind, reminding us that in every byte, there's the potential for infinite possibilities and profound discoveries."
Do ‘Artificial Neural Networks (ANNs)’ face no challenges during implementation?
Have you ever asked a child to convey a message, only for them to repeat it word for word, including parts that were never meant to be passed on? This is what happens when an Artificial Neural Network overfits. The model learns the training data so well that it captures noise and random fluctuations rather than the underlying patterns, so it performs exceptionally well on the training data but fails to generalize to new, unseen data. Simply stated, the model memorizes the examples instead of learning the pattern, which results in poor predictions.
Another challenge lies in configuration. Hyperparameters (learning rate, batch size, activation functions, weight initialization, etc.) are settings of a machine learning model chosen by the practitioner before the training process begins. This is where hyperparameter tuning comes into play: systematically searching for the combination of hyperparameters that yields optimal model performance. Hyperparameter tuning has multiple implementations, including manual tuning (trial and error), grid search (a systematic search across predefined hyperparameter values), random search (randomly sampling hyperparameter values), and more advanced techniques like Bayesian optimization. A separate validation dataset is typically used to evaluate the model's performance during hyperparameter tuning, while a final test dataset is held out to assess the model's performance after tuning.
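Grid search reduces to a loop over every combination of predefined values, keeping whichever scores best on the validation set. In this sketch, `validation_score` is a hypothetical stand-in: a real version would train an ANN with the given hyperparameters and return its validation accuracy.

```python
from itertools import product

def validation_score(learning_rate, batch_size):
    """Hypothetical stand-in for 'train the model, then score it on the
    validation set'. Here it simply peaks at lr=0.01, batch_size=32."""
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 32) / 1000

# Predefined hyperparameter values to search over
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
}

best_params, best_score = None, float("-inf")
for lr, bs in product(grid["learning_rate"], grid["batch_size"]):
    score = validation_score(lr, bs)      # evaluate each combination
    if score > best_score:
        best_params, best_score = (lr, bs), score

print(best_params)  # → (0.01, 32)
```

Random search follows the same skeleton but samples combinations instead of enumerating them all, which scales better when the grid is large.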
Are ‘Artificial Neural Networks (ANNs)’ the perfect solution to every machine learning problem?
The answer is quite simple: no. Artificial Neural Networks (ANNs) have demonstrated remarkable performance in various domains like computer vision and natural language processing, but they are not universally applicable. A handful of shortcomings limit their scope.
To start with, ANNs require large amounts of labeled data for effective training, and such data is not available for every machine learning problem. Even when the data exists, it may be imbalanced, in which case an ANN alone would not suffice. Secondly, ANNs are computationally expensive to train, often requiring substantial resources such as powerful GPUs or TPUs; for simpler tasks, training an expensive ANN is rarely justified. Third, ANNs are considered black-box models because their decision-making is difficult to interpret, so for applications in which explainability is critical, they are a poor fit. Finally, if a machine learning problem demands low latency for a real-time application, simpler and faster models are the right solution.
Hence, despite being commendably powerful, ANNs are not the ultimate solution to every machine learning problem. Choosing the right machine learning approach depends on a thorough understanding of the problem, the available data, and the specific requirements of the application. As a computer scientist, you must explore the different approaches available and if needed, you can even implement a hybrid one!