Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions

Iqbal H. Sarker (ORCID: 0000-0003-1740-5517)

Review Article | Published: 18 August 2021

SN Computer Science, volume 2, Article number: 420 (2021)
Abstract

Deep learning (DL), a branch of machine learning (ML) and artificial intelligence (AI), is now considered a core technology of the Fourth Industrial Revolution (4IR or Industry 4.0). Owing to its ability to learn from data, DL technology, which originated from artificial neural networks (ANNs), has become a hot topic in the context of computing and is widely applied in application areas such as healthcare, visual recognition, text analytics, cybersecurity, and many more. However, building an appropriate DL model is a challenging task, due to the dynamic nature of and variations in real-world problems and data. Moreover, the lack of core understanding turns DL methods into black-box machines that hamper development at the standard level. This article presents a structured and comprehensive view of DL techniques, including a taxonomy that considers various types of real-world tasks, such as supervised and unsupervised learning. In our taxonomy, we take into account deep networks for supervised or discriminative learning, unsupervised or generative learning, as well as hybrid learning and relevant others. We also summarize real-world application areas where deep learning techniques can be used. Finally, we point out ten potential aspects of future-generation DL modeling with research directions. Overall, this article aims to draw a big picture of DL modeling that can be used as a reference guide for both academia and industry professionals.
Introduction
In the late 1980s, neural networks became a prevalent topic in the areas of machine learning (ML) and artificial intelligence (AI), due to the invention of various efficient learning methods and network structures [52]. Multilayer perceptron networks trained by backpropagation-type algorithms, self-organizing maps, and radial basis function networks were among these innovative methods [26, 36, 37]. Although neural networks were successfully used in many applications, interest in researching the topic later declined. Then, in 2006, “Deep Learning” (DL) was introduced by Hinton et al. [41], based on the concept of the artificial neural network (ANN). Deep learning became a prominent topic thereafter, resulting in a rebirth of neural network research; hence, it is sometimes referred to as “new-generation neural networks”. This is because deep networks, when properly trained, have produced significant success in a variety of classification and regression challenges [52].
Nowadays, DL technology is considered one of the hot topics within machine learning, artificial intelligence, and data science and analytics, due to its ability to learn from the given data. Many corporations, including Google, Microsoft, and Nokia, study it actively, as it can provide significant results in different classification and regression problems and datasets [52]. In terms of working domain, DL is considered a subset of ML and AI, and thus DL can be seen as an AI function that mimics the human brain’s processing of data. The worldwide popularity of “deep learning” is increasing day by day, as shown in our earlier paper [96] based on historical data collected from Google Trends [33]. Deep learning differs from standard machine learning in terms of efficiency as the volume of data increases, as discussed briefly in Section “Why Deep Learning in Today’s Research and Applications?”. DL technology uses multiple layers to represent abstractions of data to build computational models. While deep learning takes a long time to train a model due to its large number of parameters, it takes comparatively little time to run during testing, as compared to other machine learning algorithms [127].
While today’s Fourth Industrial Revolution (4IR or Industry 4.0) typically focuses on technology-driven “automation, smart and intelligent systems”, DL technology, which originated from ANN, has become one of the core technologies to achieve this goal [103, 114]. A typical neural network is mainly composed of many simple, connected processing elements or processors called neurons, each of which generates a series of real-valued activations for the target outcome. Figure 1 shows a schematic representation of the mathematical model of an artificial neuron, i.e., processing element, highlighting input (\(X_i\)), weight (w), bias (b), summation function (\(\sum\)), activation function (f), and the corresponding output signal (y). Neural network-based DL technology is now widely applied in many fields and research areas such as healthcare, sentiment analysis, natural language processing, visual recognition, business intelligence, and cybersecurity, which are summarized in the latter part of this paper.

Schematic representation of the mathematical model of an artificial neuron (processing element), highlighting input (\(X_i\)), weight (w), bias (b), summation function (\(\sum\)), activation function (f), and output signal (y)
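To make this model concrete, the following minimal NumPy sketch (our own illustration; the input, weight, and bias values are arbitrary) computes a single neuron’s output signal \(y = f(\sum_i w_i x_i + b)\) with a sigmoid activation.

```python
import numpy as np

def neuron(x, w, b, f):
    """Single artificial neuron: weighted summation plus bias, then activation."""
    z = np.dot(w, x) + b   # summation function: sum_i w_i * x_i + b
    return f(z)            # activation (transfer) function f

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # inputs X_i
w = np.array([0.4, 0.3, -0.2])   # weights w
b = 0.1                          # bias b
y = neuron(x, w, b, sigmoid)     # output signal y
print(y)                         # ~0.401
```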
Although DL models are successfully applied in the various application areas mentioned above, building an appropriate deep learning model is a challenging task, due to the dynamic nature of and variations in real-world problems and data. Moreover, DL models are typically considered “black-box” machines that hamper the standard development of deep learning research and applications. Thus, for a clear understanding, in this paper we present a structured and comprehensive view of DL techniques considering the variations in real-world problems and tasks. To achieve our goal, we briefly discuss various DL techniques and present a taxonomy by taking into account three major categories: (i) deep networks for supervised or discriminative learning, which are utilized to provide a discriminative function in supervised deep learning or classification applications; (ii) deep networks for unsupervised or generative learning, which are used to characterize the high-order correlation properties or features for pattern analysis or synthesis, and thus can be used as preprocessing for supervised algorithms; and (iii) deep networks for hybrid learning, which integrate both supervised and unsupervised models, as well as relevant others. We take into account such categories based on the nature and learning capabilities of different DL techniques and how they are used to solve problems in real-world applications [97]. Moreover, identifying key research issues and prospects, including effective data representation, new algorithm design, data-driven hyper-parameter learning and model optimization, integrating domain knowledge, adapting to resource-constrained devices, etc., is a key target of this study, which can lead to “Future Generation DL-Modeling”. Thus, the goal of this paper is to serve as a reference guide for those in academia and industry who want to research and develop data-driven smart and intelligent systems based on DL techniques.
The overall contribution of this paper is summarized as follows:
This article focuses on different aspects of deep learning modeling, i.e., the learning capabilities of DL techniques in different dimensions, such as supervised or unsupervised tasks, to function in an automated and intelligent manner, which can serve as a core technology of today’s Fourth Industrial Revolution (Industry 4.0).
We explore a variety of prominent DL techniques and present a taxonomy by taking into account the variations in deep learning tasks and how they are used for different purposes. In our taxonomy, we divide the techniques into three major categories: deep networks for supervised or discriminative learning, deep networks for unsupervised or generative learning, and deep networks for hybrid learning, as well as relevant others.
We have summarized several potential real-world application areas of deep learning, to assist developers as well as researchers in broadening their perspectives on DL techniques. Different categories of DL techniques highlighted in our taxonomy can be used to solve various issues accordingly.
Finally, we point out and discuss ten potential aspects with research directions for future generation DL modeling in terms of conducting future research and system development.
This paper is organized as follows. Section “Why Deep Learning in Today’s Research and Applications?” motivates why deep learning is important to build data-driven intelligent systems. In Section “Deep Learning Techniques and Applications”, we present our DL taxonomy by taking into account the variations of deep learning tasks and how they are used in solving real-world issues, and briefly discuss the techniques, summarizing their potential application areas. In Section “Research Directions and Future Aspects”, we discuss various research issues of deep learning-based modeling and highlight promising topics for future research within the scope of our study. Finally, Section “Concluding Remarks” concludes this paper.
Why Deep Learning in Today’s Research and Applications?
The main focus of today’s Fourth Industrial Revolution (Industry 4.0) is typically technology-driven automation and smart, intelligent systems in various application areas, including smart healthcare, business intelligence, smart cities, cybersecurity intelligence, and many more [95]. Deep learning approaches have grown dramatically in terms of performance in a wide range of applications, including security technologies, particularly as an excellent solution for uncovering complex structure in high-dimensional data. Thus, DL techniques can play a key role in building intelligent data-driven systems according to today’s needs, because of their excellent ability to learn from historical data. Consequently, DL can change the world as well as humans’ everyday life through its automation power and learning from experience. DL technology is therefore relevant to artificial intelligence [103], machine learning [97], and data science with advanced analytics [95], which are well-known areas in computer science, particularly today’s intelligent computing. In the following, we first discuss the position of deep learning in AI, i.e., how DL technology is related to these areas of computing.
The Position of Deep Learning in AI
Nowadays, artificial intelligence (AI), machine learning (ML), and deep learning (DL) are three popular terms that are sometimes used interchangeably to describe systems or software that behave intelligently. In Fig. 2, we illustrate the position of deep learning, compared with machine learning and artificial intelligence. As shown in Fig. 2, DL is a part of ML, which is in turn a part of the broad area of AI. In general, AI incorporates human behavior and intelligence into machines or systems [103], while ML is the method of learning from data or experience [97], which automates analytical model building. DL also represents learning methods from data, where the computation is done through multi-layer neural networks and processing. The term “deep” in the deep learning methodology refers to the concept of multiple levels or stages through which data is processed for building a data-driven model.

An illustration of the position of deep learning (DL), compared with machine learning (ML) and artificial intelligence (AI)
Thus, DL can be considered one of the core technologies of AI, a frontier for artificial intelligence, which can be used for building intelligent systems and automation. More importantly, it pushes AI to a new level, termed “Smarter AI”. As DL models are capable of learning from data, deep learning is also strongly related to “Data Science” [95]. Typically, data science represents the entire process of finding meaning or insights in data in a particular problem domain, where DL methods can play a key role in advanced analytics and intelligent decision-making [104, 106]. Overall, we can conclude that DL technology is capable of changing the current world, particularly as a powerful computational engine, and of contributing to technology-driven automation and smart and intelligent systems, thereby meeting the goals of Industry 4.0.
Understanding Various Forms of Data
As DL models learn from data, an in-depth understanding and representation of data are important to build a data-driven intelligent system in a particular application area. In the real world, data can be in various forms, which typically can be represented as below for deep learning modeling:
Sequential Data Sequential data is any kind of data where the order matters, i.e., a set of sequences. Building a model on such data requires explicitly accounting for the sequential nature of the input. Text streams, audio fragments, video clips, and time-series data are some examples of sequential data.
Image or 2D Data A digital image is made up of a matrix: a rectangular 2D array of numbers, symbols, or expressions arranged in rows and columns. Matrix, pixels, voxels, and bit depth are the four essential characteristics or fundamental parameters of a digital image.
Tabular Data A tabular dataset consists primarily of rows and columns. Thus, tabular datasets contain data in a columnar format, as in a database table. Each column (field) must have a name, and each column may only contain data of the defined type. Overall, it is a logical and systematic arrangement of data in the form of rows and columns based on data properties or features. Deep learning models can learn efficiently from tabular data and allow us to build data-driven intelligent systems.
The above-discussed data forms are common in the real-world application areas of deep learning. Different categories of DL techniques perform differently depending on the nature and characteristics of the data, as discussed briefly in Section “Deep Learning Techniques and Applications” along with a taxonomy presentation. However, in many real-world application areas, standard machine learning techniques, particularly logic-rule- or tree-based techniques [93, 101], can perform well depending on the nature of the application. Figure 3 shows the performance comparison of DL and ML modeling with respect to the amount of data. In the following, we highlight several cases where deep learning is useful for solving real-world problems, according to the main focus of this paper.
DL Properties and Dependencies
A DL model typically follows the same processing stages as machine learning modeling. In Fig. 4, we show a deep learning workflow for solving real-world problems, which consists of three processing steps: data understanding and preprocessing; DL model building and training; and validation and interpretation. However, unlike ML modeling [98, 108], feature extraction in the DL model is automated rather than manual. K-nearest neighbors, support vector machines, decision trees, random forests, naive Bayes, linear regression, association rules, and k-means clustering are some examples of machine learning techniques that are commonly used in various application areas [97]. On the other hand, DL models include the convolutional neural network, recurrent neural network, autoencoder, deep belief network, and many more, discussed briefly along with their potential application areas in Section 3. In the following, we discuss the key properties and dependencies of DL techniques that need to be taken into account before starting to work on DL modeling for real-world applications.

An illustration of the performance comparison between deep learning (DL) and other machine learning (ML) algorithms, where DL modeling from large amounts of data can increase the performance
Data Dependencies Deep learning typically depends on a large amount of data to build a data-driven model for a particular problem domain. The reason is that when the data volume is small, deep learning algorithms often perform poorly [64]. In such circumstances, however, standard machine learning algorithms will perform better if specified rules are used [64, 107].
Hardware Dependencies DL algorithms require large-scale computational operations when training a model on large datasets. Since the advantage of a GPU over a CPU grows with the scale of computation, GPUs are typically used to execute these operations efficiently. Thus, GPU hardware is generally necessary for deep learning training to work properly, and DL therefore relies more on high-performance machines with GPUs than standard machine learning methods do [19, 127].
Feature Engineering Process Feature engineering is the process of extracting features (characteristics, properties, and attributes) from raw data using domain knowledge. A fundamental distinction between DL and other machine-learning techniques is the attempt to extract high-level characteristics directly from data [22, 97]. Thus, DL decreases the time and effort required to construct a feature extractor for each problem.
Model Training and Execution Time In general, training a deep learning algorithm takes a long time due to the large number of parameters involved. For instance, DL models can take more than one week to complete a training session, whereas training with ML algorithms takes relatively little time, from seconds to hours [107, 127]. During testing, however, deep learning algorithms take very little time to run [127] compared to certain machine learning methods.
Black-box Perception and Interpretability Interpretability is an important factor when comparing DL with ML. It is difficult to explain how a deep learning result was obtained, i.e., the “black-box” problem. On the other hand, machine learning algorithms, particularly rule-based machine learning techniques [97], provide explicit logic rules (IF-THEN) for making decisions that are easily interpretable by humans. For instance, in our earlier works, we have presented several rule-based machine learning techniques [100, 102, 105], where the extracted rules are human-understandable and easier to interpret, update, or delete according to the target applications.
The most significant distinction between deep learning and regular machine learning is how performance scales as the volume of data grows. An illustration of the performance comparison between DL and standard ML algorithms is shown in Fig. 3, where DL modeling can increase performance with the amount of data. Thus, DL modeling is extremely useful when dealing with a large amount of data because of its capacity to process vast numbers of features to build an effective data-driven model. In terms of developing and training DL models, deep learning relies on parallelized matrix and tensor operations as well as computing gradients and optimization. Several DL libraries and resources [30], such as PyTorch [82] (with a high-level API called Lightning) and TensorFlow [1] (which also offers Keras as a high-level API), offer these core utilities, including many pre-trained models as well as many other functions necessary for implementation and DL model building.
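As a minimal, hedged illustration of these core utilities (random placeholder data; not tied to any specific application in this paper), the following PyTorch sketch shows the parallelized tensor operations, gradient computation, and optimization step that underlie DL model training.

```python
import torch
import torch.nn as nn

X = torch.randn(256, 20)           # placeholder data: 256 samples, 20 features
y = torch.randint(0, 2, (256,))    # placeholder binary labels

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)    # forward pass (tensor operations) and loss
    loss.backward()                # autograd computes all gradients
    optimizer.step()               # gradient-based parameter update
    print(epoch, loss.item())
```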

A typical DL workflow to solve real-world problems, which consists of three sequential stages: (i) data understanding and preprocessing, (ii) DL model building and training, and (iii) validation and interpretation
Deep Learning Techniques and Applications
In this section, we go through the various types of deep neural network techniques, which typically consider several layers of information-processing stages in hierarchical structures for learning. A typical deep neural network contains multiple hidden layers, in addition to the input and output layers. Figure 5 shows the general structure of a deep neural network (\(hidden \; layer=N\) and N \(\ge\) 2) compared with a shallow network (\(hidden \; layer=1\)). We also present our taxonomy of DL techniques in this section, based on how they are used to solve various problems. However, before exploring the details of the DL techniques, it is useful to review the various types of learning tasks: (i) supervised, a task-driven approach that uses labeled training data; (ii) unsupervised, a data-driven process that analyzes unlabeled datasets; (iii) semi-supervised, a hybridization of the supervised and unsupervised methods; and (iv) reinforcement, an environment-driven approach, discussed briefly in our earlier paper [97]. Thus, to present our taxonomy, we divide DL techniques broadly into three major categories: (i) deep networks for supervised or discriminative learning; (ii) deep networks for unsupervised or generative learning; and (iii) deep networks for hybrid learning, combining both, and relevant others, as shown in Fig. 6. In the following, we briefly discuss each of these techniques, which can be used to solve real-world problems in various application areas according to their learning capabilities.

A general architecture of (a) a shallow network with one hidden layer and (b) a deep neural network with multiple hidden layers

A taxonomy of DL techniques, broadly divided into three major categories: (i) deep networks for supervised or discriminative learning, (ii) deep networks for unsupervised or generative learning, and (iii) deep networks for hybrid learning and relevant others
Deep Networks for Supervised or Discriminative Learning
This category of DL techniques is utilized to provide a discriminative function in supervised or classification applications. Discriminative deep architectures are typically designed to give discriminative power for pattern classification by describing the posterior distributions of classes conditioned on visible data [21]. Discriminative architectures mainly include Multi-Layer Perceptron (MLP), Convolutional Neural Networks (CNN or ConvNet), Recurrent Neural Networks (RNN), along with their variants. In the following, we briefly discuss these techniques.
Multi-layer Perceptron (MLP)
The Multi-layer Perceptron (MLP), a supervised learning approach [83], is a type of feedforward artificial neural network (ANN). It is also known as the foundational architecture of deep neural networks (DNNs) or deep learning. A typical MLP is a fully connected network that consists of an input layer that receives the input data, an output layer that makes a decision or prediction about the input signal, and one or more hidden layers between these two that are considered the network’s computational engine [36, 103]. The output of an MLP network is determined using a variety of activation functions, also known as transfer functions, such as ReLU (Rectified Linear Unit), Tanh, Sigmoid, and Softmax [83, 96]. For training, an MLP employs “Backpropagation” [36], the most extensively used supervised learning algorithm, which is also considered the most basic building block of a neural network. During the training process, various optimization approaches such as Stochastic Gradient Descent (SGD), Limited-memory BFGS (L-BFGS), and Adaptive Moment Estimation (Adam) are applied. MLPs require the tuning of several hyperparameters, such as the number of hidden layers, neurons, and iterations, which can make solving a complicated model computationally expensive. However, through partial fitting, an MLP offers the advantage of learning non-linear models in real time or online [83].
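As one concrete, illustrative realization of these ideas, scikit-learn’s MLPClassifier exposes the hidden-layer sizes, activation, and solver (SGD, Adam, or L-BFGS) as hyperparameters and supports online learning through partial_fit; the data below is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)

# Hidden-layer sizes, activation, and solver are tunable hyperparameters
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), activation='relu',
                    solver='adam', max_iter=300, random_state=1)
mlp.fit(X, y)
print(mlp.score(X, y))

# partial_fit enables online/incremental learning on streaming batches
online = MLPClassifier(hidden_layer_sizes=(32,), random_state=1)
online.partial_fit(X[:250], y[:250], classes=[0, 1])  # classes needed once
online.partial_fit(X[250:], y[250:])
```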
Convolutional Neural Network (CNN or ConvNet)
The Convolutional Neural Network (CNN or ConvNet) [65] is a popular discriminative deep learning architecture that learns directly from the input without the need for manual feature extraction. Figure 7 shows an example of a CNN with multiple convolution and pooling layers. The CNN enhances the design of the traditional ANN, such as regularized MLP networks. Each layer in a CNN considers optimal parameters for a meaningful output and also reduces model complexity. A CNN also uses “dropout” [30], which can deal with the problem of over-fitting that may occur in a traditional network.

An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and pooling layers
CNNs are specifically intended to deal with a variety of 2D shapes and are thus widely employed in visual recognition, medical image analysis, image segmentation, natural language processing, and many more [65, 96]. The capability of automatically discovering essential features from the input without the need for human intervention makes a CNN more powerful than a traditional network. Several variants of the CNN exist in the area, including the Visual Geometry Group (VGG) networks [38], AlexNet [62], Xception [17], Inception [116], ResNet [39], etc., that can be used in various application domains according to their learning capabilities.
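A minimal PyTorch sketch of such an architecture is given below (our illustration, assuming 28x28 single-channel inputs such as digit images); it stacks convolution and pooling layers as in Fig. 7 and applies dropout against over-fitting.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),                       # regularization via dropout
            nn.Linear(32 * 7 * 7, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = SmallCNN()(torch.randn(8, 1, 28, 28))     # batch of 8 -> (8, 10)
```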
Recurrent Neural Network (RNN) and its Variants
A Recurrent Neural Network (RNN) is another popular neural network, which employs sequential or time-series data and feeds the output from the previous step as input to the current step [27, 74]. Like feedforward networks and CNNs, recurrent networks learn from training input; however, they are distinguished by their “memory”, which allows information from previous inputs to influence the current input and output. Unlike a typical DNN, which assumes that inputs and outputs are independent of one another, the output of an RNN depends on the prior elements within the sequence. However, standard recurrent networks suffer from vanishing gradients, which makes learning long data sequences challenging. In the following, we discuss several popular variants of the recurrent network that minimize these issues and perform well in many real-world application domains.
Long short-term memory (LSTM) This is a popular form of RNN architecture that uses special units to deal with the vanishing gradient problem, introduced by Hochreiter et al. [42]. A memory cell in an LSTM unit can store data for long periods, and the flow of information into and out of the cell is managed by three gates. For instance, the “Forget Gate” determines what information from the previous cell state will be memorized and what no-longer-useful information will be removed, while the “Input Gate” determines which information should enter the cell state, and the “Output Gate” determines and controls the outputs. As it solves the issues of training a recurrent network, the LSTM network is considered one of the most successful RNN variants.
Bidirectional RNN/LSTM Bidirectional RNNs connect two hidden layers that run in opposite directions to a single output, allowing them to accept data from both the past and the future. Unlike traditional recurrent networks, bidirectional RNNs are trained to predict in both positive and negative time directions at the same time. A bidirectional LSTM, often known as a BiLSTM, is an extension of the standard LSTM that can increase model performance on sequence classification problems [113]. It is a sequence processing model comprising two LSTMs: one takes the input forward, and the other takes it backward. The bidirectional LSTM in particular is a popular choice in natural language processing tasks.
Gated recurrent units (GRUs) A Gated Recurrent Unit (GRU) is another popular variant of the recurrent network that uses gating methods to control and manage the flow of information between cells in the neural network, introduced by Cho et al. [16]. A GRU is similar to an LSTM but has fewer parameters, as it has a reset gate and an update gate but lacks the output gate, as shown in Fig. 8. Thus, the key difference between a GRU and an LSTM is that a GRU has two gates (reset and update) whereas an LSTM has three (input, output, and forget). The GRU’s structure enables it to capture dependencies from long sequences of data in an adaptive manner, without discarding information from earlier parts of the sequence. The GRU is thus a slightly more streamlined variant that often offers comparable performance and is significantly faster to compute [18]. Although GRUs have been shown to perform better on certain smaller and less frequent datasets [18, 34], both variants of the RNN have proven their effectiveness in producing outcomes.

Basic structure of a gated recurrent unit (GRU) cell consisting of reset and update gates
Overall, the basic property of a recurrent network is that it has at least one feedback connection, which enables activations to loop. This allows the network to perform temporal processing and sequence learning, such as sequence recognition or reproduction, temporal association or prediction, etc. Popular application areas of recurrent networks include prediction problems, machine translation, natural language processing, text summarization, speech recognition, and many more.
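The following hedged PyTorch sketch (placeholder shapes) shows how the three variants discussed above are instantiated; note the doubled feature dimension of the bidirectional LSTM, which concatenates the forward and backward passes.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 50, 32)   # 8 sequences, 50 time steps, 32 features each

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)   # fewer gates
bilstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True,
                 bidirectional=True)   # forward and backward directions

out_lstm, _ = lstm(x)     # shape (8, 50, 64)
out_gru, _ = gru(x)       # shape (8, 50, 64), fewer parameters than the LSTM
out_bi, _ = bilstm(x)     # shape (8, 50, 128): both directions concatenated
```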
Deep Networks for Generative or Unsupervised Learning
This category of DL techniques is typically used to characterize the high-order correlation properties or features for pattern analysis or synthesis, as well as the joint statistical distributions of the visible data and their associated classes [21]. The key idea of generative deep architectures is that precise supervisory information, such as target class labels, is not of concern during the learning process. As a result, the methods under this category are essentially applied to unsupervised learning, as they are typically used for feature learning or data generation and representation [20, 21]. Thus, generative modeling can also be used as preprocessing for supervised learning tasks, which supports discriminative model accuracy. Commonly used deep neural network techniques for unsupervised or generative learning are the Generative Adversarial Network (GAN), Autoencoder (AE), Restricted Boltzmann Machine (RBM), Self-Organizing Map (SOM), and Deep Belief Network (DBN), along with their variants.
Generative Adversarial Network (GAN)
A Generative Adversarial Network (GAN), designed by Ian Goodfellow [32], is a type of neural network architecture for generative modeling that creates new plausible samples on demand. It involves automatically discovering and learning regularities or patterns in the input data so that the model can be used to generate new examples that plausibly could have been drawn from the original dataset. As shown in Fig. 9, a GAN is composed of two neural networks: a generator G that creates new data with properties similar to the original data, and a discriminator D that predicts the likelihood of a subsequent sample being drawn from the actual data rather than produced by the generator. Thus, in GAN modeling, the generator and discriminator are trained to compete with each other: while the generator tries to fool and confuse the discriminator by creating more realistic data, the discriminator tries to distinguish the genuine data from the fake data generated by G.

Schematic structure of a standard generative adversarial network (GAN)
Generally, GANs are designed for unsupervised learning tasks, but they have also proven effective for semi-supervised and reinforcement learning, depending on the task [3]. GANs are also used in state-of-the-art transfer learning research to enforce the alignment of the latent feature space [66]. Inverse models, such as the Bidirectional GAN (BiGAN) [25], can also learn a mapping from data to the latent space, similar to how the standard GAN model learns a mapping from a latent space to the data distribution. The potential application areas of GANs include healthcare, image analysis, data augmentation, video generation, voice generation, pandemics, traffic control, cybersecurity, and many more, and they are increasing rapidly. Overall, GANs have established themselves as a comprehensive domain of independent data expansion and as a solution to problems requiring a generative solution.
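The adversarial training loop can be sketched in a few lines of PyTorch (our illustration; the “real” data here is a synthetic 2D Gaussian standing in for an actual dataset): the discriminator D is updated to separate real from generated samples, then the generator G is updated to fool D.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))  # generator
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(),
                  nn.Linear(32, 1), nn.Sigmoid())                 # discriminator
bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

for step in range(500):
    real = torch.randn(64, 2) + 3.0      # stand-in for samples of real data
    fake = G(torch.randn(64, 8))         # generate from latent noise z

    # Train D: label real samples 1, generated samples 0
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()

    # Train G: try to make D predict "real" (1) for generated samples
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()
```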
Auto-Encoder (AE) and Its Variants
An auto-encoder (AE) [31] is a popular unsupervised learning technique in which neural networks are used to learn representations. Typically, auto-encoders are used to work with high-dimensional data, learning a compressed representation of the input, i.e., dimensionality reduction. The encoder, the code, and the decoder are the three parts of an autoencoder. The encoder compresses the input and generates the code, which the decoder subsequently uses to reconstruct the input. AEs have recently also been used to learn generative models of data [69]. The auto-encoder is widely used in many unsupervised learning tasks, e.g., dimensionality reduction, feature extraction, efficient coding, generative modeling, denoising, anomaly or outlier detection, etc. [31, 132]. Principal component analysis (PCA) [99], which is also used to reduce the dimensionality of huge datasets, is essentially similar to a single-layered AE with a linear activation function. Regularized autoencoders, such as sparse, denoising, and contractive autoencoders, are useful for learning representations for later classification tasks [119], while variational autoencoders can be used as generative models [56], as discussed below.
Sparse Autoencoder (SAE) A sparse autoencoder [73] has a sparsity penalty on the coding layer as a part of its training requirement. SAEs may have more hidden units than inputs, but only a small number of hidden units are permitted to be active at the same time, resulting in a sparse model. Figure 10 shows a schematic structure of a sparse autoencoder with several active units in the hidden layer. This model is thus obliged to respond to the unique statistical features of the training data following its constraints.
Denoising Autoencoder (DAE) A denoising autoencoder is a variant of the basic autoencoder that attempts to improve the representation (to extract useful features) by altering the reconstruction criterion, and thus reduces the risk of learning the identity function [31, 119]. In other words, it receives a corrupted data point as input and is trained to recover the original undistorted input as its output by minimizing the average reconstruction error over the training data, i.e., cleaning, or denoising, the corrupted input. Thus, in the context of computing, DAEs can be considered very powerful filters that can be utilized for automatic pre-processing. A denoising autoencoder could, for example, be used to automatically pre-process an image, thereby boosting its quality for recognition accuracy.
Contractive Autoencoder (CAE) The idea behind a contractive autoencoder, proposed by Rifai et al. [90], is to make the autoencoder robust to small changes in the training dataset. In its objective function, a CAE includes an explicit regularizer that forces the model to learn an encoding that is robust to small changes in input values. As a result, the learned representation’s sensitivity to the training input is reduced. While DAEs encourage the robustness of the reconstruction, as discussed above, CAEs encourage the robustness of the representation.
Variational Autoencoder (VAE) A variational autoencoder [55] has a fundamentally unique property that distinguishes it from the classical autoencoders discussed above, and this property makes it highly effective for generative modeling. Unlike traditional autoencoders, which map the input onto a latent vector, VAEs map the input data into the parameters of a probability distribution, such as the mean and variance of a Gaussian distribution. A VAE assumes that the source data has an underlying probability distribution and then tries to discover the distribution’s parameters. Although this approach was initially designed for unsupervised learning, its use has been demonstrated in other domains such as semi-supervised learning [128] and supervised learning [51].

Schematic structure of a sparse autoencoder (SAE) with several active units (filled circle) in the hidden layer
Although the earlier concept of the AE was typically for dimensionality reduction or feature learning, as mentioned above, AEs have recently been brought to the forefront of generative modeling, alongside popular methods in the area such as the generative adversarial network. AEs have been effectively employed in a variety of domains, including healthcare, computer vision, speech recognition, cybersecurity, natural language processing, and many more. Overall, we can conclude that the auto-encoder and its variants can play a significant role in unsupervised feature learning with neural network architectures.
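A minimal PyTorch sketch of the encoder-code-decoder structure is shown below (illustrative sizes, assuming flattened 28x28 inputs); training minimizes the reconstruction error, and a denoising variant would simply corrupt the input before encoding.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_features=784, code_dim=32):
        super().__init__()
        # Encoder compresses the input into a low-dimensional code
        self.encoder = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        # Decoder reconstructs the input from the code
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.randn(16, 784)             # placeholder input batch
loss = nn.MSELoss()(model(x), x)     # reconstruction error to minimize
# Denoising AE variant: reconstruct x from a corrupted input, e.g.
# loss = nn.MSELoss()(model(x + 0.1 * torch.randn_like(x)), x)
```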
Kohonen Map or Self-Organizing Map (SOM)
A Self-Organizing Map (SOM) or Kohonen Map [59] is another form of unsupervised learning technique for creating a low-dimensional (usually two-dimensional) representation of a higher-dimensional dataset while maintaining the topological structure of the data. The SOM is also known as a neural network-based dimensionality reduction algorithm that is commonly used for clustering [118]. A SOM adapts to the topological form of a dataset by repeatedly moving its neurons closer to the data points, allowing us to visualize enormous datasets and find probable clusters. The first layer of a SOM is the input layer, and the second layer is the output layer or feature map. Unlike other neural networks that use error-correction learning, such as backpropagation with gradient descent [36], SOMs employ competitive learning, which uses a neighborhood function to preserve the topological features of the input space. SOMs are widely utilized in a variety of applications, including pattern identification, health or medical diagnosis, anomaly detection, and virus or worm attack detection [60, 87]. The primary benefit of employing a SOM is that it makes high-dimensional data easier to visualize and analyze in order to understand patterns. The reduction of dimensionality and grid clustering make it easy to observe similarities in the data. As a result, SOMs can play a vital role in developing an effective data-driven model for a particular problem domain, depending on the data characteristics.
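The competitive-learning update can be written from scratch in a short NumPy sketch (our illustration on random placeholder data): each step finds the best-matching unit and pulls it, together with its grid neighbors, toward the sample.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))        # placeholder high-dimensional samples
grid = rng.normal(size=(10, 10, 3))     # 10x10 map of weight vectors
ii, jj = np.meshgrid(np.arange(10), np.arange(10), indexing='ij')

for t in range(2000):
    x = data[rng.integers(len(data))]
    # Best-matching unit (BMU): the neuron closest to the sample
    d = np.linalg.norm(grid - x, axis=2)
    bi, bj = np.unravel_index(np.argmin(d), d.shape)
    # Learning rate and neighborhood radius decay over time
    lr = 0.5 * np.exp(-t / 1000)
    sigma = 3.0 * np.exp(-t / 1000)
    # Gaussian neighborhood function preserves the map's topology
    h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
    grid += lr * h[:, :, None] * (x - grid)   # move neurons toward the sample
```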
Restricted Boltzmann Machine (RBM)
A Restricted Boltzmann Machine (RBM) [75] is a generative stochastic neural network capable of learning a probability distribution over its inputs. Boltzmann machines typically consist of visible and hidden nodes, where each node is connected to every other node, which helps us understand irregularities by learning how the system works under normal circumstances. RBMs are a subset of Boltzmann machines in which connections are restricted to those between the visible layer and the hidden layer, with no connections within a layer [77]. This restriction permits training algorithms, such as the gradient-based contrastive divergence algorithm, to be more efficient than those for Boltzmann machines in general [41]. RBMs have found applications in dimensionality reduction, classification, regression, collaborative filtering, feature learning, topic modeling, and many others. In the area of deep learning modeling, they can be trained in either a supervised or an unsupervised fashion, depending on the task. Overall, RBMs can recognize patterns in data automatically and develop probabilistic or stochastic models, which are utilized for feature selection or extraction, as well as for forming a deep belief network.
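A bare-bones NumPy sketch of one contrastive divergence step (CD-1) is given below (our simplified illustration on random binary placeholder data; the negative phase uses reconstruction probabilities rather than further sampling).

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid = 6, 3
W = rng.normal(scale=0.1, size=(n_vis, n_hid))  # visible-hidden weights only
a, b = np.zeros(n_vis), np.zeros(n_hid)         # visible and hidden biases
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, lr=0.1):
    """One CD-1 step on a binary visible vector v0."""
    global W, a, b
    ph0 = sigmoid(v0 @ W + b)                     # positive phase: infer hiddens
    h0 = (rng.random(n_hid) < ph0).astype(float)  # sample hidden states
    pv1 = sigmoid(h0 @ W.T + a)                   # reconstruct visible units
    ph1 = sigmoid(pv1 @ W + b)                    # re-infer hidden probabilities
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    a += lr * (v0 - pv1)
    b += lr * (ph0 - ph1)

for v in (rng.random((200, n_vis)) < 0.5).astype(float):  # placeholder data
    cd1_update(v)
```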
Deep Belief Network (DBN)
A Deep Belief Network (DBN) [40] is a multi-layer generative graphical model built by stacking several individual unsupervised networks, such as AEs or RBMs, that use each network’s hidden layer as the input for the next layer, i.e., connected sequentially. Thus, we can divide DBNs into (i) the AE-DBN, known as the stacked AE, and (ii) the RBM-DBN, known as the stacked RBM, where the AE-DBN is composed of autoencoders and the RBM-DBN is composed of restricted Boltzmann machines, discussed earlier. The ultimate goal is a fast, unsupervised training technique for each sub-network that depends on contrastive divergence [41]. A DBN can capture a hierarchical representation of input data based on its deep structure. The primary idea behind the DBN is to train unsupervised feed-forward neural networks with unlabeled data before fine-tuning the network with labeled input. One of the most important advantages of the DBN, as opposed to typical shallow learning networks, is that it permits the detection of deep patterns, which allows for reasoning abilities and the capture of the deep difference between normal and erroneous data [89]. A continuous DBN is simply an extension of a standard DBN that allows a continuous range of decimals instead of binary data. Overall, the DBN model can play a key role in a wide range of high-dimensional data applications due to its strong feature extraction and classification capabilities, and it has become one of the significant topics in the field of neural networks.
In summary, the generative learning techniques discussed above typically allow us to generate a new representation of data through exploratory analysis. As a result, these deep generative networks can be utilized as preprocessing for supervised or discriminative learning tasks, as well as to support model accuracy, where unsupervised representation learning can allow for improved classifier generalization.
Deep Networks for Hybrid Learning and Other Approaches
In addition to the above-discussed deep learning categories, hybrid deep networks and several other approaches such as deep transfer learning (DTL) and deep reinforcement learning (DRL) are popular, which are discussed in the following.
Hybrid Deep Neural Networks
Generative models are adaptable, with the capacity to learn from both labeled and unlabeled data. Discriminative models, on the other hand, are unable to learn from unlabeled data yet outperform their generative counterparts in supervised tasks. A framework for training both deep generative and discriminative models simultaneously can enjoy the benefits of both models, which motivates hybrid networks.
Hybrid deep learning models are typically composed of multiple (two or more) basic deep learning models, where the basic model is a discriminative or generative deep learning model discussed earlier. Based on the integration of different basic generative or discriminative models, the following three categories of hybrid deep learning models might be useful for solving real-world problems. These are as follows:
Hybrid \(Model\_1\) : An integration of different generative or discriminative models to extract more meaningful and robust features. Examples could be CNN+LSTM, AE+GAN, and so on.
Hybrid \(Model\_2\) : An integration of generative model followed by a discriminative model. Examples could be DBN+MLP, GAN+CNN, AE+CNN, and so on.
Hybrid \(Model\_3\) : An integration of generative or discriminative model followed by a non-deep learning classifier. Examples could be AE+SVM, CNN+SVM, and so on.
Thus, in a broad sense, we can conclude that hybrid models can be either classification-focused or non-classification-focused depending on the target use. However, most hybrid learning-related studies in the area of deep learning are classification-focused or supervised learning tasks, as summarized in Table 1. Unsupervised generative models with meaningful representations are employed to enhance the discriminative models. Generative models with useful representations can provide more informative and low-dimensional features for discrimination, and they can also help enhance the quality and quantity of the training data, providing additional information for classification.
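As a hedged illustration of the first category (Hybrid \(Model\_1\)), the following PyTorch sketch combines a small CNN feature extractor with an LSTM over time (placeholder input sizes, e.g., short clips of 28x28 frames).

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """CNN extracts per-frame features; LSTM models their temporal order."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())   # -> 8*4*4 = 128 features
        self.lstm = nn.LSTM(input_size=128, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):                  # x: (batch, time, 1, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)  # per-frame CNN
        out, _ = self.lstm(feats)                         # temporal modeling
        return self.head(out[:, -1])       # classify from the last time step

logits = CNNLSTM()(torch.randn(2, 10, 1, 28, 28))   # -> (2, 5)
```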
Deep Transfer Learning (DTL)
Transfer learning is a technique for effectively using previously learned model knowledge to solve a new task with minimal training or fine-tuning. In comparison to typical machine learning techniques [97], DL requires a large amount of training data. As a result, the need for a substantial volume of labeled data is a significant barrier to addressing some essential domain-specific tasks, particularly in the medical sector, where creating large-scale, high-quality annotated medical or health datasets is both difficult and costly. Furthermore, the standard DL model demands a lot of computational resources, such as a GPU-enabled server, even though researchers are working hard to improve this. As a result, Deep Transfer Learning (DTL), a DL-based transfer learning method, might help to address this issue. Figure 11 shows the general structure of the transfer learning process, where knowledge from a pre-trained model is transferred into a new DL model. DTL is especially popular in deep learning right now, since it allows deep neural networks to be trained with very little data [126].

A general structure of the transfer learning process, where knowledge from a pre-trained model is transferred into a new DL model
Transfer learning is a two-stage approach for training a DL model, consisting of a pre-training step and a fine-tuning step in which the model is trained on the target task. Since deep neural networks have gained popularity in a variety of fields, a large number of DTL methods have been presented, making it crucial to categorize and summarize them. Based on the techniques used in the literature, DTL can be classified into four categories [117]: (i) instance-based deep transfer learning, which utilizes instances in the source domain with appropriate weights; (ii) mapping-based deep transfer learning, which maps instances from the two domains into a new data space with better similarity; (iii) network-based deep transfer learning, which reuses part of the network pre-trained in the source domain; and (iv) adversarial-based deep transfer learning, which uses adversarial technology to find transferable features that are suitable for both domains. Due to its high effectiveness and practicality, adversarial-based deep transfer learning has exploded in popularity in recent years. Transfer learning can also be classified into inductive, transductive, and unsupervised transfer learning, depending on the circumstances of the source and target domains and activities [81]. While most current research focuses on supervised learning, how deep neural networks can transfer knowledge in unsupervised or semi-supervised learning may gain further interest in the future. DTL techniques are useful in a variety of fields, including natural language processing, sentiment classification, visual recognition, speech recognition, spam filtering, and relevant others.
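As a hedged sketch of network-based DTL, the torchvision snippet below reuses an ImageNet-pretrained ResNet-18 backbone, freezes its weights, and fine-tunes only a new classification head on the target task (the three-class target is a placeholder).

```python
import torch
import torch.nn as nn
from torchvision import models

# Reuse a network pre-trained in the source domain (ImageNet)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                    # freeze pre-trained layers

# Replace the head for the target task and fine-tune only its parameters
model.fc = nn.Linear(model.fc.in_features, 3)  # 3 target classes (placeholder)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```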
Deep Reinforcement Learning (DRL)
Reinforcement learning takes a different approach to solving the sequential decision-making problem than other approaches we have discussed so far. The concepts of an environment and an agent are often introduced first in reinforcement learning. The agent can perform a series of actions in the environment, each of which has an impact on the environment’s state and can result in possible rewards (feedback) - “positive” for good sequences of actions that result in a “good” state, and “negative” for bad sequences of actions that result in a “bad” state. The purpose of reinforcement learning is to learn good action sequences through interaction with the environment, typically referred to as a policy.

Schematic structure of deep reinforcement learning (DRL) highlighting a deep neural network
Deep reinforcement learning (DRL or deep RL) [9] integrates neural networks with a reinforcement learning architecture to allow agents to learn the appropriate actions in a virtual environment, as shown in Fig. 12. In the area of reinforcement learning, model-based RL is based on learning a transition model that enables modeling of the environment without interacting with it directly, whereas model-free RL methods learn directly from interactions with the environment. Q-learning is a popular model-free RL technique for determining the best action-selection policy for any (finite) Markov Decision Process (MDP) [86, 97]. The MDP is a mathematical framework for modeling decisions based on states, actions, and rewards [86]. In addition, Deep Q-Networks, Double DQN, bidirectional learning, Monte Carlo control, etc. are used in the area [50, 97]. DRL methods incorporate DL models, e.g., deep neural networks (DNNs), based on the MDP principle [71], as policy and/or value function approximators. A CNN, for example, can be used as a component of an RL agent to learn directly from raw, high-dimensional visual inputs. In the real world, DRL-based solutions can be used in several application areas, including robotics, video games, natural language processing, computer vision, and relevant others.
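The tabular Q-learning update at the heart of these methods can be sketched in NumPy (our illustration on a toy chain environment; a DQN would replace the table Q with a deep neural network).

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))      # a DQN replaces this table with a DNN
alpha, gamma, eps = 0.1, 0.9, 0.1        # learning rate, discount, exploration

def step(s, a):
    """Toy chain environment (placeholder): move left/right; reward 1 at the end."""
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s2, float(s2 == n_states - 1)

for episode in range(500):
    s = 0
    for t in range(20):
        # Epsilon-greedy selection balances exploration and exploitation
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
```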

Several potential real-world application areas of deep learning
Deep Learning Application Summary
During the past few years, deep learning has been successfully applied to numerous problems in many application areas. These include natural language processing, sentiment analysis, cybersecurity, business, virtual assistants, visual recognition, healthcare, robotics, and many more. In Fig. 13, we summarize several potential real-world application areas of deep learning. Various deep learning techniques, according to our taxonomy presented in Fig. 6, which includes discriminative learning, generative learning, and hybrid models, discussed earlier, are employed in these application areas. In Table 1, we also summarize various deep learning tasks and the techniques used to solve the relevant tasks in several real-world application areas. Overall, from Fig. 13 and Table 1, we can conclude that the future prospects of deep learning modeling in real-world application areas are huge, and there is broad scope for further work. In the next section, we also summarize the research issues in deep learning modeling and point out potential aspects for future-generation DL modeling.
Research Directions and Future Aspects
While existing methods have established a solid foundation for deep learning systems and research, this section outlines ten potential future research directions based on our study.
Automation in Data Annotation According to the existing literature, discussed in Section 3, most deep learning models are trained on publicly available annotated datasets. However, to build a system for a new problem domain or a recent data-driven system, raw data need to be collected from relevant sources. Thus, data annotation, e.g., the categorization, tagging, or labeling of a large amount of raw data, is important for building discriminative deep learning models or supervised tasks, which is challenging. A technique capable of automatic and dynamic data annotation, rather than manual annotation or hiring annotators, particularly for large datasets, could be more effective for supervised learning as well as for minimizing human effort. Therefore, a more in-depth investigation of data collection and annotation methods, or designing an unsupervised-learning-based solution, could be one of the primary research directions in the area of deep learning modeling.
Data Preparation for Ensuring Data Quality As discussed earlier throughout the paper, deep learning algorithms are highly affected by the quality and availability of data for training, and consequently so is the resultant model for a particular problem domain. Thus, deep learning models may become worthless or yield decreased accuracy if the data is bad, e.g., data sparsity, non-representative samples, poor quality, ambiguous values, noise, data imbalance, irrelevant features, data inconsistency, or insufficient quantity for training. Consequently, such issues in data can lead to poor processing and inaccurate findings, which is a major problem when discovering insights from data. Deep learning models thus also need to adapt to such rising issues in data to capture approximated information from observations. Therefore, effective data pre-processing techniques need to be designed according to the nature and characteristics of the data problem, to handle such emerging challenges; this could be another research direction in the area.
Black-box Perception and Proper DL/ML Algorithm Selection In general, it is difficult to explain how a deep learning result is obtained or how a particular model reaches its ultimate decisions. Although DL models achieve significant performance while learning from large datasets, as discussed in Section 2, this “black-box” perception of DL modeling typically represents weak statistical interpretability, which could be a major issue in the area. On the other hand, ML algorithms, particularly rule-based machine learning techniques, provide explicit logic rules (IF-THEN) for making decisions that are easier to interpret, update, or delete according to the target applications [97, 100, 105]. If the wrong learning algorithm is chosen, unanticipated results may occur, resulting in a loss of effort as well as of the model’s efficacy and accuracy. Thus, taking into account performance, complexity, model accuracy, and applicability, selecting an appropriate model for the target application is challenging, and in-depth analysis is needed for better understanding and decision making.
Deep Networks for Supervised or Discriminative Learning According to our designed taxonomy of deep learning techniques, as shown in Fig. 6, discriminative architectures mainly include MLP, CNN, and RNN, along with their variants, which are applied widely in various application domains. However, designing new techniques or variants of such discriminative techniques, by taking into account model optimization, accuracy, and applicability according to the target real-world application and the nature of the data, could be a novel contribution, which can also be considered a major future aspect in the area of supervised or discriminative learning.
Deep Networks for Unsupervised or Generative Learning As discussed in Section 3, unsupervised learning or generative deep learning modeling is one of the major tasks in the area, as it allows us to characterize the high-order correlation properties or features in data and to generate a new representation of data through exploratory analysis. Moreover, unlike supervised learning [97], it does not require labeled data, due to its capability to derive insights directly from the data for data-driven decision making. Consequently, it can be used as preprocessing for supervised learning or discriminative modeling, as well as for semi-supervised learning tasks, supporting learning accuracy and model efficiency. According to our designed taxonomy of deep learning techniques, as shown in Fig. 6, generative techniques mainly include GAN, AE, SOM, RBM, DBN, and their variants. Thus, designing new techniques or variants for effective data modeling or representation according to the target real-world application could be a novel contribution, which can also be considered a major future aspect in the area of unsupervised or generative learning.
Hybrid/Ensemble Modeling and Uncertainty Handling According to our designed taxonomy of DL techniques, as shown in Fig. 6, this is considered another major category of deep learning tasks. As hybrid modeling enjoys the benefits of both generative and discriminative learning, an effective hybridization can outperform others in terms of performance as well as uncertainty handling in high-risk applications. In Section 3, we have summarized various types of hybridization, e.g., AE+CNN/SVM. Since a group of neural networks is trained with distinct parameters or with separate sub-sampled training datasets, hybridization or ensembles of such techniques, i.e., DL with DL/ML, can play a key role in the area. Thus, designing effective blended discriminative and generative models, rather than naive combinations, could be an important research opportunity for solving various real-world issues, including semi-supervised learning tasks and model uncertainty.
Dynamism in Selecting Threshold/Hyper-parameter Values and Network Structures with Computational Efficiency: In general, the relationship among performance, model complexity, and computational requirements is a key issue in deep learning modeling and applications. Combining algorithmic advancements that improve accuracy with sustained computational efficiency, i.e., achieving maximum throughput while consuming the fewest resources, without significant information loss, can lead to a breakthrough in the effectiveness of deep learning modeling in future real-world applications. The concept of incremental approaches or recency-based learning [100] might be effective in several cases, depending on the nature of the target applications. Moreover, assuming network structures with a static number of nodes and layers, fixed hyper-parameter values or threshold settings, or selecting them by trial and error, may not be effective in many cases, as the optimal settings can shift when the data changes. Thus, a data-driven approach that selects them dynamically could be more effective when building a deep learning model, in terms of both performance and real-world applicability. Such data-driven automation can enable future-generation deep learning modeling with additional intelligence, which could be a significant future aspect in the area as well as an important research direction.
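A small data-driven alternative to trial-and-error selection is sketched below: a randomized search (one simple option among many, alongside Bayesian or incremental approaches) lets validation performance choose the network structure and hyper-parameter values; the candidate values are arbitrary assumptions:

```python
# Minimal sketch of data-driven (rather than trial-and-error) selection:
# a randomized search lets cross-validated performance choose the
# network structure and hyper-parameter values. Candidates are arbitrary.
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
search = RandomizedSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_distributions={
        "hidden_layer_sizes": [(32,), (64,), (64, 32)],  # candidate structures
        "learning_rate_init": [1e-2, 1e-3, 1e-4],
        "alpha": [1e-3, 1e-4, 1e-5],                     # L2 regularization
    },
    n_iter=10, cv=3, random_state=0,
)
search.fit(X, y)  # the data, not manual guessing, picks the configuration
print(search.best_params_)
```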
Lightweight Deep Learning Modeling for Next-Generation Smart Devices and Applications: In recent years, the Internet of Things (IoT), consisting of billions of intelligent and communicating things, and mobile communications technologies have become popular for detecting and gathering human and environmental information (e.g., geo-information, weather data, bio-data, human behaviors, and so on) for a variety of intelligent services and applications. Every day, these ubiquitous smart things or devices generate large amounts of data, requiring rapid data processing on a variety of smart mobile devices [72]. Deep learning technologies can be incorporated to discover underlying properties and to effectively handle such large amounts of sensor data for a variety of IoT applications, including health monitoring and disease analysis, smart cities, traffic flow prediction and monitoring, smart transportation, manufacturing inspection, fault assessment, smart industry or Industry 4.0, and many more. Although the deep learning techniques discussed in Section 3 are considered powerful tools for processing big data, lightweight modeling is important for resource-constrained devices, due to their high computational cost and considerable memory overhead. Thus, techniques such as model optimization, simplification, compression, pruning, generalization, and salient feature extraction might be helpful in several cases. Therefore, constructing lightweight deep learning techniques based on a baseline network architecture, to adapt the DL model to next-generation mobile, IoT, or resource-constrained devices and applications, could be considered a significant future aspect in the area.
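As one minimal example of such compression (a sketch assuming a TensorFlow/Keras workflow), post-training quantization shrinks a trained model into a compact TFLite buffer suitable for resource-constrained devices:

```python
# Minimal sketch of one compression route named above: post-training
# quantization with TensorFlow Lite shrinks a trained Keras model into
# a compact buffer for resource-constrained devices. The tiny model
# here is a stand-in; a real application would train it first.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])
# ... train the model on application data here ...

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantization
tflite_model = converter.convert()  # compact byte buffer for on-device use

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```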
Incorporating Domain Knowledge into Deep Learning Modeling: Domain knowledge, as opposed to general or domain-independent knowledge, is knowledge of a specific, specialized topic or field. For instance, in natural language processing, the properties of the English language typically differ from those of other languages like Bengali, Arabic, or French. Thus, integrating domain-based constraints into the deep learning model could produce better results for such a particular purpose. For instance, a task-specific feature extractor that considers domain knowledge in smart manufacturing for fault diagnosis can resolve the issues of traditional deep-learning-based methods [28]. Similarly, domain knowledge in medical image analysis [58], financial sentiment analysis [49], and cybersecurity analytics [94, 103], as well as conceptual data models that include semantic information (i.e., information meaningful to a system, rather than merely correlational) [45, 121, 131], can play a vital role in the area. Transfer learning could be an effective way to get started on a new challenge with domain knowledge. Moreover, contextual information such as spatial, temporal, social, and environmental contexts [92, 104, 108] can also play an important role in incorporating context-aware computing with domain knowledge for smart decision making, as well as in building adaptive and intelligent context-aware systems. Therefore, understanding domain knowledge and effectively incorporating it into the deep learning model could be another research direction.
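The sketch below illustrates the transfer learning route in Keras: an ImageNet-pretrained backbone is frozen and only a new head is trained for a hypothetical five-class domain-specific task (the dataset and class count are assumptions):

```python
# Minimal sketch of transfer learning as a way to bring in domain
# knowledge: an ImageNet-pretrained backbone is frozen and only a new
# head is trained for a hypothetical five-class domain-specific task.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep the generic visual knowledge fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # domain-specific classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(domain_images, domain_labels)  # train only the new head
```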
Designing General Deep Learning Framework for Target Application Domains: One promising research direction for deep learning-based solutions is to develop a general framework that can handle data diversity, dimensions, stimulation types, etc. Such a general framework would require two key capabilities: an attention mechanism that focuses on the most valuable parts of the input signals, and the ability to capture latent features that enable the framework to learn distinctive and informative representations. Attention models have been a popular research topic because of their intuition, versatility, and interpretability, and they are employed in various application areas like computer vision, natural language processing, text or image classification, sentiment analysis, recommender systems, user profiling, etc. [13, 80]. An attention mechanism can be implemented based on learning algorithms such as reinforcement learning, which is capable of finding the most useful parts through a policy search [133, 134]. Similarly, a CNN can be integrated with a suitable attention mechanism to form a general classification framework, where the CNN serves as a feature learning tool that captures features at various levels and ranges. Thus, designing a general deep learning framework that considers attention as well as latent features for target application domains could be another area to contribute.
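A minimal sketch of this idea (our toy construction, not a published framework) combines a CNN feature extractor with a simple learned attention pooling over spatial positions:

```python
# Minimal sketch (a toy construction, not a published framework):
# a CNN extracts spatial features and a learned attention weighting
# pools the most informative positions before classification.
import tensorflow as tf

inputs = tf.keras.Input(shape=(28, 28, 1))
feats = tf.keras.layers.Conv2D(32, 3, activation="relu")(inputs)  # feature learning
feats = tf.keras.layers.Reshape((-1, 32))(feats)                  # (positions, channels)

scores = tf.keras.layers.Dense(1)(feats)           # relevance score per position
weights = tf.keras.layers.Softmax(axis=1)(scores)  # attention distribution
context = tf.keras.layers.Lambda(
    lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([feats, weights])  # weighted pooling

outputs = tf.keras.layers.Dense(10, activation="softmax")(context)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```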
To summarize, deep learning is a fairly open topic to which academics can contribute by developing new methods or improving existing methods to handle the above-mentioned concerns and tackle real-world problems in a variety of application areas. This can also help researchers conduct a thorough analysis of an application's hidden and unexpected challenges to produce more reliable and realistic outcomes. Overall, we can conclude that addressing the above-mentioned issues and contributing effective and efficient techniques could lead to "Future Generation DL" modeling as well as more intelligent and automated applications.
Concluding Remarks
In this article, we have presented a structured and comprehensive view of deep learning technology, which is considered a core part of artificial intelligence as well as data science. It starts with a history of artificial neural networks and moves to recent deep learning techniques and breakthroughs in different applications. Then, the key algorithms in this area, as well as deep neural network modeling in various dimensions are explored. For this, we have also presented a taxonomy considering the variations of deep learning tasks and how they are used for different purposes. In our comprehensive study, we have taken into account not only the deep networks for supervised or discriminative learning but also the deep networks for unsupervised or generative learning, and hybrid learning that can be used to solve a variety of real-world issues according to the nature of problems.
Deep learning, unlike traditional machine learning and data mining algorithms, can produce extremely high-level data representations from enormous amounts of raw data. As a result, it has provided an excellent solution to a variety of real-world problems. A successful deep learning technique must rest on data-driven modeling that matches the characteristics of the raw data. The sophisticated learning algorithms then need to be trained on the collected data and knowledge related to the target application before the system can assist with intelligent decision-making. Deep learning has proven useful in a wide range of applications and research areas such as healthcare, sentiment analysis, visual recognition, business intelligence, cybersecurity, and many more that are summarized in the paper.
Finally, we have summarized and discussed the challenges faced, the potential research directions, and future aspects in the area. Although deep learning is considered a black-box solution for many applications due to its poor reasoning and interpretability, addressing the identified challenges and future aspects could lead to future-generation deep learning modeling and smarter systems. This can also help researchers perform in-depth analyses to produce more reliable and realistic outcomes. Overall, we believe that our study on neural networks and deep learning-based advanced analytics points in a promising direction and can be used as a reference guide for future research and implementations in relevant application domains by both academic and industry professionals.
Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, et al. Tensorflow: a system for large-scale machine learning. In: 12th USENIX Symposium on operating systems design and implementation (OSDI 16), 2016; p. 265–83.
Abdel-Basset M, Hawash H, Chakrabortty RK, Ryan M. Energy-net: a deep learning approach for smart energy management in iot-based smart cities. IEEE Internet of Things J. 2021.
Aggarwal A, Mittal M, Battineni G. Generative adversarial network: an overview of theory and applications. Int J Inf Manag Data Insights. 2021; p. 100004.
Al-Qatf M, Lasheng Y, Al-Habib M, Al-Sabahi K. Deep learning approach combining sparse autoencoder with svm for network intrusion detection. IEEE Access. 2018;6:52843–56.
Ale L, Sheta A, Li L, Wang Y, Zhang N. Deep learning based plant disease detection for smart agriculture. In: 2019 IEEE Globecom Workshops (GC Wkshps), 2019; p. 1–6. IEEE.
Amarbayasgalan T, Lee JY, Kim KR, Ryu KH. Deep autoencoder based neural networks for coronary heart disease risk prediction. In: Heterogeneous data management, polystores, and analytics for healthcare. Springer; 2019. p. 237–48.
Anuradha J, et al. Big data based stock trend prediction using deep cnn with reinforcement-lstm model. Int J Syst Assur Eng Manag. 2021; p. 1–11.
Aqib M, Mehmood R, Albeshri A, Alzahrani A. Disaster management in smart cities by forecasting traffic plan using deep learning and gpus. In: International Conference on smart cities, infrastructure, technologies and applications. Springer; 2017. p. 139–54.
Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA. Deep reinforcement learning: a brief survey. IEEE Signal Process Mag. 2017;34(6):26–38.
Aslan MF, Unlersen MF, Sabanci K, Durdu A. Cnn-based transfer learning-bilstm network: a novel approach for covid-19 infection detection. Appl Soft Comput. 2021;98:106912.
Bu F, Wang X. A smart agriculture iot system based on deep reinforcement learning. Futur Gener Comput Syst. 2019;99:500–7.
Chang W-J, Chen L-B, Hsu C-H, Lin C-P, Yang T-C. A deep learning-based intelligent medicine recognition system for chronic patients. IEEE Access. 2019;7:44441–58.
Chaudhari S, Mithal V, Polatkan Gu, Ramanath R. An attentive survey of attention models. arXiv preprint arXiv:1904.02874, 2019.
Chaudhuri N, Gupta G, Vamsi V, Bose I. On the platform but will they buy? predicting customers’ purchase behavior using deep learning. Decis Support Syst. 2021; p. 113622.
Chen D, Wawrzynski P, Lv Z. Cyber security in smart cities: a review of deep learning-based applications and case studies. Sustain Cities Soc. 2020; p. 102655.
Cho K, van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
Chollet F. Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2017; p. 1251–258.
Chung J, Gulcehre C, Cho KH, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
Coelho IM, Coelho VN, Luz EJdS, Ochi LS, Guimarães FG, Rios E. A gpu deep learning metaheuristic based model for time series forecasting. Appl Energy. 2017;201:412–8.
Da'u A, Salim N. Recommendation system based on deep learning methods: a systematic review and new directions. Artif Intel Rev. 2020;53(4):2709–48.
Deng L. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans Signal Inf Process. 2014; p. 3.
Deng L, Dong Yu. Deep learning: methods and applications. Found Trends Signal Process. 2014;7(3–4):197–387.
Deng S, Li R, Jin Y, He H. Cnn-based feature cross and classifier for loan default prediction. In: 2020 International Conference on image, video processing and artificial intelligence, volume 11584, page 115841K. International Society for Optics and Photonics, 2020.
Dhyani M, Kumar R. An intelligent chatbot using deep learning with bidirectional rnn and attention model. Mater Today Proc. 2021;34:817–24.
Donahue J, Krähenbühl P, Darrell T. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
Du K-L, Swamy MNS. Neural networks and statistical learning. Berlin: Springer Science & Business Media; 2013.
Dupond S. A thorough review on the current advance of neural network structures. Annu Rev Control. 2019;14:200–30.
Feng J, Yao Y, Lu S, Liu Y. Domain knowledge-based deep-broad learning framework for fault diagnosis. IEEE Trans Ind Electron. 2020;68(4):3454–64.
Garg S, Kaur K, Kumar N, Rodrigues JJPC. Hybrid deep-learning-based anomaly detection scheme for suspicious flow detection in sdn: a social multimedia perspective. IEEE Trans Multimed. 2019;21(3):566–78.
Géron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: concepts, tools, and techniques to build intelligent systems. O’Reilly Media; 2019.
Goodfellow I, Bengio Y, Courville A, Bengio Y. Deep learning, vol. 1. Cambridge: MIT Press; 2016.
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in neural information processing systems. 2014; p. 2672–680.
Google trends. 2021. https://trends.google.com/trends/ .
Gruber N, Jockisch A. Are gru cells more specific and lstm cells more sensitive in motive classification of text? Front Artif Intell. 2020;3:40.
Gu B, Ge R, Chen Y, Luo L, Coatrieux G. Automatic and robust object detection in x-ray baggage inspection using deep convolutional neural networks. IEEE Trans Ind Electron. 2020.
Han J, Pei J, Kamber M. Data mining: concepts and techniques. Amsterdam: Elsevier; 2011.
Haykin S. Neural networks and learning machines, 3/E. London: Pearson Education; 2010.
He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2016; p. 770–78.
Hinton GE. Deep belief networks. Scholarpedia. 2009;4(5):5947.
Hinton GE, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–54.
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
Huang C-J, Kuo P-H. A deep cnn-lstm model for particulate matter (PM2.5) forecasting in smart cities. Sensors. 2018;18(7):2220.
Huang H-H, Fukuda M, Nishida T. Toward rnn based micro non-verbal behavior generation for virtual listener agents. In: International Conference on human-computer interaction, 2019; p. 53–63. Springer.
Hulsebos M, Hu K, Bakker M, Zgraggen E, Satyanarayan A, Kraska T, Demiralp Ça, Hidalgo C. Sherlock: a deep learning approach to semantic data type detection. In: Proceedings of the 25th ACM SIGKDD International Conference on knowledge discovery & data mining, 2019; p. 1500–508.
Imamverdiyev Y, Abdullayeva F. Deep learning method for denial of service attack detection based on restricted Boltzmann machine. Big Data. 2018;6(2):159–69.
Islam MZ, Islam MM, Asraf A. A combined deep cnn-lstm network for the detection of novel coronavirus (covid-19) using x-ray images. Inf Med Unlock. 2020;20:100412.
Ismail WN, Hassan MM, Alsalamah HA, Fortino G. Cnn-based health model for regular health factors analysis in internet-of-medical things environment. IEEE. Access. 2020;8:52541–9.
Jangid H, Singhal S, Shah RR, Zimmermann R. Aspect-based financial sentiment analysis using deep learning. In: Companion Proceedings of the The Web Conference 2018, 2018; p. 1961–966.
Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–85.
Kameoka H, Li L, Inoue S, Makino S. Supervised determined source separation with multichannel variational autoencoder. Neural Comput. 2019;31(9):1891–914.
Karhunen J, Raiko T, Cho KH. Unsupervised deep learning: a short review. In: Advances in independent component analysis and learning machines. 2015; p. 125–42.
Kawde P, Verma GK. Deep belief network based affect recognition from physiological signals. In: 2017 4th IEEE Uttar Pradesh Section International Conference on electrical, computer and electronics (UPCON), 2017; p. 587–92. IEEE.
Kim J-Y, Seok-Jun B, Cho S-B. Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders. Inf Sci. 2018;460:83–102.
Kingma DP, Welling M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
Kingma DP, Welling M. An introduction to variational autoencoders. arXiv preprint arXiv:1906.02691, 2019.
Kiran PKR, Bhasker B. Dnnrec: a novel deep learning based hybrid recommender system. Expert Syst Appl. 2020.
Kloenne M, Niehaus S, Lampe L, Merola A, Reinelt J, Roeder I, Scherf N. Domain-specific cues improve robustness of deep learning-based segmentation of ct volumes. Sci Rep. 2020;10(1):1–9.
Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.
Kohonen T. Essentials of the self-organizing map. Neural Netw. 2013;37:52–65.
Kök İ, Şimşek MU, Özdemir S. A deep learning model for air quality prediction in smart cities. In: 2017 IEEE International Conference on Big Data (Big Data), 2017; p. 1983–990. IEEE.
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. 2012; p. 1097–105.
Latif S, Rana R, Younis S, Qadir J, Epps J. Transfer learning for improving speech emotion classification accuracy. arXiv preprint arXiv:1801.06353, 2018.
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.
LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.
Li B, François-Lavet V, Doan T, Pineau J. Domain adversarial reinforcement learning. arXiv preprint arXiv:2102.07097, 2021.
Li T-HS, Kuo P-H, Tsai T-N, Luan P-C. Cnn and lstm based facial expression analysis model for a humanoid robot. IEEE Access. 2019;7:93998–4011.
Liu C, Cao Y, Luo Y, Chen G, Vokkarane V, Yunsheng M, Chen S, Hou P. A new deep learning-based food recognition system for dietary assessment on an edge computing service infrastructure. IEEE Trans Serv Comput. 2017;11(2):249–61.
Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE. A survey of deep neural network architectures and their applications. Neurocomputing. 2017;234:11–26.
López AU, Mateo F, Navío-Marco J, Martínez-Martínez JM, Gómez-Sanchís J, Vila-Francés J, Serrano-López AJ. Analysis of computer user behavior, security incidents and fraud using self-organizing maps. Comput Secur. 2019;83:38–51.
Lopez-Martin M, Carro B, Sanchez-Esguevillas A. Application of deep reinforcement learning to intrusion detection for supervised problems. Expert Syst Appl. 2020;141:112963.
Ma X, Yao T, Menglan H, Dong Y, Liu W, Wang F, Liu J. A survey on deep learning empowered iot applications. IEEE Access. 2019;7:181721–32.
Makhzani A, Frey B. K-sparse autoencoders. arXiv preprint arXiv:1312.5663, 2013.
Mandic D, Chambers J. Recurrent neural networks for prediction: learning algorithms, architectures and stability. Hoboken: Wiley; 2001.
Marlin B, Swersky K, Chen B, Freitas N. Inductive principles for restricted boltzmann machine learning. In: Proceedings of the Thirteenth International Conference on artificial intelligence and statistics, p. 509–16. JMLR Workshop and Conference Proceedings, 2010.
Masud M, Muhammad G, Alhumyani H, Alshamrani SS, Cheikhrouhou O, Ibrahim S, Hossain MS. Deep learning-based intelligent face recognition in iot-cloud environment. Comput Commun. 2020;152:215–22.
Memisevic R, Hinton GE. Learning to represent spatial transformations with factored higher-order boltzmann machines. Neural Comput. 2010;22(6):1473–92.
Minaee S, Azimi E, Abdolrashidi AA. Deep-sentiment: sentiment analysis using ensemble of cnn and bi-lstm models. arXiv preprint arXiv:1904.04206, 2019.
Naeem M, Paragliola G, Coronato A. A reinforcement learning and deep learning based intelligent system for the support of impaired patients in home treatment. Expert Syst Appl. 2021;168:114285.
Niu Z, Zhong G, Hui Yu. A review on the attention mechanism of deep learning. Neurocomputing. 2021;452:48–62.
Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2009;22(10):1345–59.
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, et al. Pytorch: An imperative style, high-performance deep learning library. Adv Neural Inf Process Syst. 2019;32:8026–37.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.
Pi Y, Nath ND, Behzadan AH. Convolutional neural networks for object detection in aerial imagery for disaster response and recovery. Adv Eng Inf. 2020;43:101009.
Piccialli F, Giampaolo F, Prezioso E, Crisci D, Cuomo S. Predictive analytics for smart parking: A deep learning approach in forecasting of iot data. ACM Trans Internet Technol (TOIT). 2021;21(3):1–21.
Puterman ML. Markov decision processes: discrete stochastic dynamic programming. Hoboken: Wiley; 2014.
Qu X, Lin Y, Kai G, Linru M, Meng S, Mingxing K, Mu L, editors. A survey on the development of self-organizing maps for unsupervised intrusion detection. Mob Netw Appl. 2019; p. 1–22.
Rahman MW, Tashfia SS, Islam R, Hasan MM, Sultan SI, Mia S, Rahman MM. The architectural design of smart blind assistant using iot with deep learning paradigm. Internet of Things. 2021;13:100344.
Ren J, Green M, Huang X. From traditional to deep learning: fault diagnosis for autonomous vehicles. In: Learning control. Elsevier. 2021; p. 205–19.
Rifai S, Vincent P, Muller X, Glorot X, Bengio Y. Contractive auto-encoders: Explicit invariance during feature extraction. In: Icml, 2011.
Rosa RL, Schwartz GM, Ruggiero WV, Rodríguez DZ. A knowledge-based recommendation system that includes sentiment analysis and deep learning. IEEE Trans Ind Inf. 2018;15(4):2124–35.
Sarker IH. Context-aware rule learning from smartphone data: survey, challenges and future directions. J Big Data. 2019;6(1):1–25.
Sarker IH. A machine learning based robust prediction model for real-life mobile phone data. Internet of Things. 2019;5:180–93.
Sarker IH. Cyberlearning: effectiveness analysis of machine learning security modeling to detect cyber-anomalies and multi-attacks. Internet of Things. 2021;14:100393.
Sarker IH. Data science and analytics: an overview from data-driven smart computing, decision-making and applications perspective. SN Comput Sci. 2021.
Sarker IH. Deep cybersecurity: a comprehensive overview from neural network and deep learning perspective. SN Comput Sci. 2021;2(3):1–16.
Sarker IH. Machine learning: algorithms, real-world applications and research directions. SN Comput Sci. 2021;2(3):1–21.
Sarker IH, Abushark YB, Alsolami F, Khan AI. Intrudtree: a machine learning based cyber security intrusion detection model. Symmetry. 2020;12(5):754.
Sarker IH, Abushark YB, Khan AI. Contextpca: Predicting context-aware smartphone apps usage based on machine learning techniques. Symmetry. 2020;12(4):499.
Sarker IH, Colman A, Han J. Recencyminer: mining recency-based personalized behavior from contextual smartphone data. J Big Data. 2019;6(1):1–21.
Sarker IH, Colman A, Han J, Khan AI, Abushark YB, Salah K. Behavdt: a behavioral decision tree learning to build user-centric context-aware predictive model. Mob Netw Appl. 2020;25(3):1151–61.
Sarker IH, Colman A, Kabir MA, Han J. Individualized time-series segmentation for mining mobile phone user behavior. Comput J. 2018;61(3):349–68.
Sarker IH, Furhad MH, Nowrozy R. Ai-driven cybersecurity: an overview, security intelligence modeling and research directions. SN Comput Sci. 2021;2(3):1–18.
Sarker IH, Hoque MM, Uddin MK. Mobile data science and intelligent apps: concepts, ai-based modeling and research directions. Mob Netw Appl. 2021;26(1):285–303.
Sarker IH, Kayes ASM. Abc-ruleminer: User behavioral rule-based machine learning method for context-aware intelligent services. J Netw Comput Appl. 2020;168:102762.
Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A. Cybersecurity data science: an overview from machine learning perspective. J Big data. 2020;7(1):1–29.
Sarker IH, Kayes ASM, Watters P. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J Big Data. 2019;6(1):1–28.
Sarker IH, Salah K. Appspred: predicting context-aware smartphone apps using random forest learning. Internet of Things. 2019;8:100106.
Satt A, Rozenberg S, Hoory R. Efficient emotion recognition from speech using deep learning on spectrograms. In: Interspeec, 2017; p. 1089–1093.
Sevakula RK, Singh V, Verma NK, Kumar C, Cui Y. Transfer learning for molecular cancer classification using deep neural networks. IEEE/ACM Trans Comput Biol Bioinf. 2018;16(6):2089–100.
Shankar D, Narumanchi S, Ananya HA, Kompalli P, Chaudhury K. Deep learning based large scale visual recommendation and search for e-commerce. arXiv preprint arXiv:1703.02344, 2017.
Shao X, Kim CS. Multi-step short-term power consumption forecasting using multi-channel lstm with time location considering customer behavior. IEEE Access. 2020;8:125263–73.
Siami-Namini S, Tavakoli N, Namin AS. The performance of lstm and bilstm in forecasting time series. In: 2019 IEEE International Conference on Big Data (Big Data), 2019; p. 3285–292. IEEE.
Ślusarczyk B. Industry 4.0: are we ready? Pol J Manag Stud. 2018; p. 17
Sumathi P, Subramanian R, Karthikeyan VV, Karthik S. Soil monitoring and evaluation system using edl-asqe: enhanced deep learning model for ioi smart agriculture network. Int J Commun Syst. 2021; p. e4859.
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2015; p. 1–9.
Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C. A survey on deep transfer learning. In: International Conference on artificial neural networks, 2018; p. 270–279. Springer.
Vesanto J, Alhoniemi E. Clustering of the self-organizing map. IEEE Trans Neural Netw. 2000;11(3):586–600.
Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A, Bottou L. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res. 2010;11(12).
Wang J, Liang-Chih Yu, Robert Lai K, Zhang X. Tree-structured regional cnn-lstm model for dimensional sentiment analysis. IEEE/ACM Trans Audio Speech Lang Process. 2019;28:581–91.
Wang S, Wan J, Li D, Liu C. Knowledge reasoning with semantic data for real-time data processing in smart factory. Sensors. 2018;18(2):471.
Wang W, Zhao M, Wang J. Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J Ambient Intell Humaniz Comput. 2019;10(8):3035–43.
Wang X, Liu J, Qiu T, Chaoxu M, Chen C, Zhou P. A real-time collision prediction mechanism with deep learning for intelligent transportation system. IEEE Trans Veh Technol. 2020;69(9):9497–508.
Wang Y, Huang M, Zhu X, Zhao L. Attention-based lstm for aspect-level sentiment classification. In: Proceedings of the 2016 Conference on empirical methods in natural language processing, 2016; p. 606–615.
Wei P, Li Y, Zhang Z, Tao H, Li Z, Liu D. An optimization method for intrusion detection classification model based on deep belief network. IEEE Access. 2019;7:87593–605.
Weiss K, Khoshgoftaar TM, Wang DD. A survey of transfer learning. J Big data. 2016;3(1):9.
Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C. Machine learning and deep learning methods for cybersecurity. Ieee access. 2018;6:35365–81.
Xu W, Sun H, Deng C, Tan Y. Variational autoencoder for semi-supervised text classification. In: Thirty-First AAAI Conference on artificial intelligence, 2017.
Xue Q, Chuah MC. New attacks on rnn based healthcare learning system and their detections. Smart Health. 2018;9:144–57.
Yousefi-Azar M, Hamey L. Text summarization using unsupervised deep learning. Expert Syst Appl. 2017;68:93–105.
Yuan X, Shi J, Gu L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst Appl. 2020;p. 114417.
Zhang G, Liu Y, Jin X. A survey of autoencoder-based recommender systems. Front Comput Sci. 2020;14(2):430–50.
Zhang X, Yao L, Huang C, Wang S, Tan M, Long Gu, Wang C. Multi-modality sensor data classification with selective attention. arXiv preprint arXiv:1804.05493, 2018.
Zhang X, Yao L, Wang X, Monaghan J, Mcalpine D, Zhang Y. A survey on deep learning based brain computer interface: recent advances and new frontiers. arXiv preprint arXiv:1905.04149, 2019; p. 66.
Zhang Y, Zhang P, Yan Y. Attention-based lstm with multi-task learning for distant speech recognition. In: Interspeech, 2017; p. 3857–861.
Author information
Authors and affiliations.
Swinburne University of Technology, Melbourne, VIC, 3122, Australia
Chittagong University of Engineering & Technology, Chittagong, 4349, Bangladesh
Corresponding author
Correspondence to Iqbal H. Sarker.
Ethics declarations
Conflict of interest.
The author declares no conflict of interest.
Additional information
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K. N. and M. Shivakumar.
About this article
Cite this article.
Sarker, I.H. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN COMPUT. SCI. 2, 420 (2021). https://doi.org/10.1007/s42979-021-00815-1
Received : 29 May 2021
Accepted : 07 August 2021
Published : 18 August 2021
DOI : https://doi.org/10.1007/s42979-021-00815-1
Keywords
- Deep learning
- Artificial neural network
- Artificial intelligence
- Discriminative learning
- Generative learning
- Hybrid learning
- Intelligent systems
The Top 17 ‘Must-Read’ AI Papers in 2022
We caught up with experts in the RE•WORK community to find out what the top 17 AI papers are for 2022 so far, so you can add them to your summer must-reads. The papers cover a wide range of topics, including AI in social media and how AI can benefit humanity, and all are free to access.
Interested in learning more? Check out all the upcoming RE•WORK events to find out about the latest trends and industry updates in AI here .
Max Li, Staff Data Scientist – Tech Lead at Wish
Max is a Staff Data Scientist at Wish where he focuses on experimentation (A/B testing) and machine learning. His passion is to empower data-driven decision-making through the rigorous use of data. View Max’s presentation, ‘Assign Experiment Variants at Scale in A/B Tests’, from our Deep Learning Summit in February 2022 here .
1. Bootstrapped Meta-Learning (2022) – Sebastian Flennerhag et al.
The first paper selected by Max proposes an algorithm that allows the meta-learner to teach itself, overcoming the meta-optimisation challenge. The algorithm focuses on meta-learning with gradients, which guarantees improvements in performance. The paper also looks at how bootstrapping opens up new possibilities. Read the full paper here.
2. Multi-Objective Bayesian Optimization over High-Dimensional Search Spaces (2022) – Samuel Daulton et al.
Another paper selected by Max proposes MORBO, a scalable method for multi-objective Bayesian optimisation (BO) over high-dimensional search spaces. MORBO significantly improves sample efficiency, and where existing BO algorithms struggle in high dimensions, it provides a marked improvement over current BO approaches. Read the full paper here.
3. Tabular Data: Deep Learning is Not All You Need (2021) – Ravid Shwartz-Ziv, Amitai Armon
To solve real-life data science problems, selecting the right model to use is crucial. This final paper selected by Max explores whether deep models should be recommended as an option for tabular data. Read the full paper here .
Jigyasa Grover, Senior Machine Learning Engineer at Twitter
Jigyasa Grover is a Senior Machine Learning Engineer at Twitter working in the performance ads ranking domain. Recently, she was honoured with the 'Outstanding in AI: Young Role Model Award' by Women in AI across North America. She is one of the few ML Google Developer Experts globally. Jigyasa previously presented at our Deep Learning Summit and MLOps event in San Francisco earlier this year.
4. Privacy for Free: How does Dataset Condensation Help Privacy? (2022) – Tian Dong et al.
Jigyasa’s first recommendation concentrates on Privacy Preserving Machine Learning, specifically mitigating the leakage of sensitive data in Machine Learning. The paper provides one of the first propositions of using dataset condensation techniques to preserve the data efficiency during model training and furnish membership privacy. This paper was published by Sony AI and won the Outstanding Paper Award at ICML 2022. Read the full paper here .
5. Affective Signals in a Social Media Recommender System (2022) – Jane Dwivedi-Yu et al.
The second paper recommended by Jigyasa talks about operationalising Affective Computing, also known as Emotional AI, for an improved personalised feed on social media. The paper discusses the design of an affective taxonomy customised to user needs on social media. It further lays out the curation of suitable training data by combining engagement data and data from a human-labelling task to enable the identification of the affective response a user might exhibit for a particular post. Read the full paper here .
6. ItemSage: Learning Product Embeddings for Shopping Recommendations at Pinterest (2022) – Paul Baltescu et al.
Jigyasa’s last recommendation is a paper by Pinterest that illustrates the aggregation of both textual and visual information to build a unified set of product embeddings to enhance recommendation results on e-commerce websites. By applying multi-task learning, the proposed embeddings can optimise for multiple engagement types and ensures that the shopping recommendation stack is efficient with respect to all objectives. Read the full article here .
Asmita Poddar, Software Development Engineer at Amazon Alexa
Asmita is a Software Development Engineer at Amazon Alexa, where she works on developing and productionising natural language processing and speech models. Asmita also has prior experience in applying machine learning in diverse domains. Asmita will be presenting at our London AI Summit in September, where she will discuss AI for Spoken Communication.
7. Competition-Level Code Generation with AlphaCode (2022) – Yujia Li et al.
Code-generation systems can help programmers become more productive. Asmita selected this paper, which addresses the problems of incorporating AI innovations into such systems. AlphaCode is a system that creates solutions for problems that require deeper reasoning. Read the full paper here.
8. A Commonsense Knowledge Enhanced Network with Retrospective Loss for Emotion Recognition in Spoken Dialog (2022) – Yunhe Xie et al.
Existing ERSD (emotion recognition in spoken dialog) datasets place limits on a model's reasoning. The final paper selected by Asmita proposes a Commonsense Knowledge Enhanced Network with a retrospective (backward-looking) loss to perform dialog modelling, external knowledge integration and historical state retrospection. The proposed model has been shown to outperform other models. Read the full paper here.
Discover the speakers we have lined up and the topics we will cover at the London AI Summit.
Sergei Bobrovskyi, Expert in Anomaly Detection for Root Cause Analysis at Airbus
Dr. Sergei Bobrovskyi is a Data Scientist within the Analytics Accelerator team of the Airbus Digital Transformation Office. His work focuses on applications of AI for anomaly detection in time series, spanning various use-cases across Airbus. Sergei will be presenting at our Berlin AI Summit in October about Anomaly Detection, Root Cause Analysis and Explainability.
9. LaMDA: Language Models for Dialog Applications (2022) – Romal Thoppilan et al.
The paper chosen by Sergei describes the LaMDA system, which caused a furor this summer when a former Google engineer claimed it had shown signs of being sentient. LaMDA is a family of large language models for dialog applications based on the Transformer architecture. Interesting features of the models are their fine-tuning with human-annotated data and their ability to consult external sources. In any case, this is a very interesting model family, which we might encounter in many of the applications we use daily. Read the full paper here.
10. A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27 (2022) – Yann LeCun
The second paper chosen by Sergei provides a vision of how to progress towards general AI. The study combines a number of concepts, including a configurable predictive world model, behaviour driven by intrinsic motivation, and hierarchical joint embedding architectures. Read the full paper here.
11. Coordination Among Neural Modules Through a Shared Global Workspace (2022) – Anirudh Goyal et al.
This paper chosen by Sergei combines the Transformer architecture underlying most of the recent successes of deep learning with ideas from the Global Workspace Theory from cognitive sciences. This is an interesting read to broaden the understanding of why certain model architectures perform well and in which direction we might go in the future to further improve performance on challenging tasks. Read the full paper here .
12. Magnetic control of tokamak plasmas through deep reinforcement learning (2022) – Jonas Degrave et al.
Sergei chose the next paper, which asks the question of how AI research can benefit humanity. The use of AI to enable safe, reliable and scalable deployment of fusion energy could contribute to solving the pressing problem of climate change. Sergei has said that this is an extremely interesting application of AI technology for engineering. Read the full paper here.
13. TranAd: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data (2022) – Shreshth Tuli, Giuliano Casale and Nicholas R. Jennings
The final paper chosen by Sergei is a specialised paper applying the transformer architecture to the problem of unsupervised anomaly detection in multivariate time series. Many architectures that were successful in other fields eventually get applied to time series as well. The paper shows improved performance on several well-known datasets. Read the full paper here.
Abdullahi Adamu, Senior Software Engineer at Sony
Abdullahi has worked in various industries, including at a market research start-up where he developed models that could extract insights from human conversations about products or services. He moved to Publicis, where he became a Data Engineer and Data Scientist in 2018. Abdullahi will be part of our panel discussion at the London AI Summit in September, where he will discuss Harnessing the Power of Deep Learning.
14. Self-Supervision for Learning from the Bottom Up (2022) – Alexei Efros
This paper chosen by Abdullahi makes compelling arguments for why self-supervision is the next step in the evolution of AI/ML for building more robust models. Overall, the arguments justify why self-supervised learning is important on our journey towards more robust models that generalise better in the wild. Read the full paper here.
15. Neural Architecture Search Survey: A Hardware Perspective (2022) – Krishna Teja Chitty-Venkata and Arun K. Somani
Another paper chosen by Abdullahi notes that, as we move towards edge computing and federated learning, neural architecture search that takes hardware constraints into account will become more critical for ensuring that we have leaner neural network models that balance latency and generalisation performance. This survey gives a bird's-eye view of the various neural architecture search algorithms that take hardware constraints into account to design artificial neural networks offering the best trade-off between performance and accuracy. Read the full paper here.
16. What Should Not Be Contrastive In Contrastive Learning (2021) – Tete Xiao et al.
The paper chosen by Abdullahi highlights the underlying assumptions behind data augmentation methods and how these can be counterproductive in the context of contrastive learning, for example colour augmentation when a downstream task is meant to differentiate the colours of objects. The reported results are promising in the wild. Overall, it presents an elegant solution to using data augmentation for contrastive learning. Read the full paper here.
17. Why do tree-based models still outperform deep learning on tabular data? (2022) – Leo Grinsztajn, Edouard Oyallon and Gael Varoquaux
The final paper selected by Abdullahi works on answering the question of why deep learning models still find it hard to compete with tree-based models on tabular data. It shows that MLP-like architectures are more sensitive to uninformative features in data than their tree-based counterparts. Read the full paper here.
Sign up to the RE•WORK monthly newsletter for the latest AI news, trends and events.
Join us at our upcoming events this year:
· London AI Summit – 14-15 September 2022
· Berlin AI Summit – 4-5 October 2022
· AI in Healthcare Summit Boston – 13-14 October 2022
· Sydney Deep Learning and Enterprise AI Summits – 17-18 October 2022
· MLOps Summit – 9-10 November 2022
· Toronto AI Summit – 9-10 November 2022
· Nordics AI Summit - 7-8 December 2022
Awesome - Most Cited Deep Learning Papers
[Notice] This list is not being maintained anymore because of the overwhelming amount of deep learning papers published every day since 2017.
A curated list of the most cited deep learning papers (2012-2016)
We believe that there exist classic deep learning papers which are worth reading regardless of their application domain. Rather than providing an overwhelming amount of papers, we would like to provide a curated list of awesome deep learning papers that are considered must-reads in certain research domains.
Before this list, other awesome deep learning lists existed, for example, Deep Vision and Awesome Recurrent Neural Networks. Also, after this list came out, another awesome list for deep learning beginners, called Deep Learning Papers Reading Roadmap, was created and has been loved by many deep learning researchers.
Although the Roadmap List includes lots of important deep learning papers, it feels overwhelming to read them all. As mentioned in the introduction, I believe that seminal works can give us lessons regardless of their application domain. Thus, I would like to introduce the top 100 deep learning papers here as a good starting point for overviewing deep learning research.
To get news of newly released papers every day, follow my twitter or facebook page!
Awesome list criteria
- A list of top 100 deep learning papers published from 2012 to 2016 is suggested.
- If a paper is added to the list, another paper (usually from the "More Papers from 2016" section) should be removed to keep the list at 100 papers. (Thus, removing papers is also an important contribution, as well as adding papers.)
- Papers that are important, but failed to be included in the list, will be listed in the More than Top 100 section.
- Please refer to New Papers and Old Papers sections for the papers published in recent 6 months or before 2012.
(Citation criteria)
- < 6 months : New Papers (by discussion)
- 2016 : +60 citations or "More Papers from 2016"
- 2015 : +200 citations
- 2014 : +400 citations
- 2013 : +600 citations
- 2012 : +800 citations
- ~2012 : Old Papers (by discussion)
Please note that we prefer seminal deep learning papers that can be applied to various research areas rather than application papers. For that reason, some papers that meet the criteria may not be accepted while others may be. It depends on the impact of the paper, its applicability to other research, the scarcity of the research domain, and so on.
We need your contributions!
If you have any suggestions (missing papers, new papers, key researchers or typos), please feel free to edit and open a pull request. (Please read the contributing guide for further instructions, though just letting me know the titles of papers can also be a big contribution to us.)
(Update) You can download all top-100 papers with this and collect all authors' names with this. Also, a bib file for all top-100 papers is available. Thanks, doodhwala, Sven and grepinsight!
- Can anyone contribute the code for obtaining the statistics of the authors of Top-100 papers?
Contents
- Understanding / Generalization / Transfer
- Optimization / Training Techniques
- Unsupervised / Generative Models
- Convolutional Neural Network Models
- Image: Segmentation / Object Detection
- Image / Video / Etc
- Natural Language Processing / RNNs
- Speech / Other Domain
- Reinforcement Learning / Robotics
- More Papers from 2016

(More than Top 100)
- New Papers : Less than 6 months
- Old Papers : Before 2012
- HW / SW / Dataset : Technical reports
- Book / Survey / Review
- Video Lectures / Tutorials / Blogs
- Appendix: More than Top 100 : More papers not in the list
Understanding / Generalization / Transfer
- Distilling the knowledge in a neural network (2015), G. Hinton et al. [pdf]
- Deep neural networks are easily fooled: High confidence predictions for unrecognizable images (2015), A. Nguyen et al. [pdf]
- How transferable are features in deep neural networks? (2014), J. Yosinski et al. [pdf]
- CNN features off-the-Shelf: An astounding baseline for recognition (2014), A. Razavian et al. [pdf]
- Learning and transferring mid-Level image representations using convolutional neural networks (2014), M. Oquab et al. [pdf]
- Visualizing and understanding convolutional networks (2014), M. Zeiler and R. Fergus [pdf]
- Decaf: A deep convolutional activation feature for generic visual recognition (2014), J. Donahue et al. [pdf]
Optimization / Training Techniques
- Training very deep networks (2015), R. Srivastava et al. [pdf]
- Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015), S. Loffe and C. Szegedy [pdf]
- Delving deep into rectifiers: Surpassing human-level performance on imagenet classification (2015), K. He et al. [pdf]
- Dropout: A simple way to prevent neural networks from overfitting (2014), N. Srivastava et al. [pdf]
- Adam: A method for stochastic optimization (2014), D. Kingma and J. Ba [pdf]
- Improving neural networks by preventing co-adaptation of feature detectors (2012), G. Hinton et al. [pdf]
- Random search for hyper-parameter optimization (2012) J. Bergstra and Y. Bengio [pdf]
Unsupervised / Generative Models
- Pixel recurrent neural networks (2016), A. Oord et al. [pdf]
- Improved techniques for training GANs (2016), T. Salimans et al. [pdf]
- Unsupervised representation learning with deep convolutional generative adversarial networks (2015), A. Radford et al. [pdf]
- DRAW: A recurrent neural network for image generation (2015), K. Gregor et al. [pdf]
- Generative adversarial nets (2014), I. Goodfellow et al. [pdf]
- Auto-encoding variational Bayes (2013), D. Kingma and M. Welling [pdf]
- Building high-level features using large scale unsupervised learning (2013), Q. Le et al. [pdf]
Convolutional Neural Network Models
- Rethinking the inception architecture for computer vision (2016), C. Szegedy et al. [pdf]
- Inception-v4, inception-resnet and the impact of residual connections on learning (2016), C. Szegedy et al. [pdf]
- Identity Mappings in Deep Residual Networks (2016), K. He et al. [pdf]
- Deep residual learning for image recognition (2016), K. He et al. [pdf]
- Spatial transformer network (2015), M. Jaderberg et al., [pdf]
- Going deeper with convolutions (2015), C. Szegedy et al. [pdf]
- Very deep convolutional networks for large-scale image recognition (2014), K. Simonyan and A. Zisserman [pdf]
- Return of the devil in the details: delving deep into convolutional nets (2014), K. Chatfield et al. [pdf]
- OverFeat: Integrated recognition, localization and detection using convolutional networks (2013), P. Sermanet et al. [pdf]
- Maxout networks (2013), I. Goodfellow et al. [pdf]
- Network in network (2013), M. Lin et al. [pdf]
- ImageNet classification with deep convolutional neural networks (2012), A. Krizhevsky et al. [pdf]
Image: Segmentation / Object Detection
- You only look once: Unified, real-time object detection (2016), J. Redmon et al. [pdf]
- Fully convolutional networks for semantic segmentation (2015), J. Long et al. [pdf]
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015), S. Ren et al. [pdf]
- Fast R-CNN (2015), R. Girshick [pdf]
- Rich feature hierarchies for accurate object detection and semantic segmentation (2014), R. Girshick et al. [pdf]
- Spatial pyramid pooling in deep convolutional networks for visual recognition (2014), K. He et al. [pdf]
- Semantic image segmentation with deep convolutional nets and fully connected CRFs , L. Chen et al. [pdf]
- Learning hierarchical features for scene labeling (2013), C. Farabet et al. [pdf]
Image / Video / Etc
- Image Super-Resolution Using Deep Convolutional Networks (2016), C. Dong et al. [pdf]
- A neural algorithm of artistic style (2015), L. Gatys et al. [pdf]
- Deep visual-semantic alignments for generating image descriptions (2015), A. Karpathy and L. Fei-Fei [pdf]
- Show, attend and tell: Neural image caption generation with visual attention (2015), K. Xu et al. [pdf]
- Show and tell: A neural image caption generator (2015), O. Vinyals et al. [pdf]
- Long-term recurrent convolutional networks for visual recognition and description (2015), J. Donahue et al. [pdf]
- VQA: Visual question answering (2015), S. Antol et al. [pdf]
- DeepFace: Closing the gap to human-level performance in face verification (2014), Y. Taigman et al. [pdf]
- Large-scale video classification with convolutional neural networks (2014), A. Karpathy et al. [pdf]
- Two-stream convolutional networks for action recognition in videos (2014), K. Simonyan et al. [pdf]
- 3D convolutional neural networks for human action recognition (2013), S. Ji et al. [pdf]
Natural Language Processing / RNNs
- Neural Architectures for Named Entity Recognition (2016), G. Lample et al. [pdf]
- Exploring the limits of language modeling (2016), R. Jozefowicz et al. [pdf]
- Teaching machines to read and comprehend (2015), K. Hermann et al. [pdf]
- Effective approaches to attention-based neural machine translation (2015), M. Luong et al. [pdf]
- Conditional random fields as recurrent neural networks (2015), S. Zheng and S. Jayasumana. [pdf]
- Memory networks (2014), J. Weston et al. [pdf]
- Neural turing machines (2014), A. Graves et al. [pdf]
- Neural machine translation by jointly learning to align and translate (2014), D. Bahdanau et al. [pdf]
- Sequence to sequence learning with neural networks (2014), I. Sutskever et al. [pdf]
- Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014), K. Cho et al. [pdf]
- A convolutional neural network for modeling sentences (2014), N. Kalchbrenner et al. [pdf]
- Convolutional neural networks for sentence classification (2014), Y. Kim [pdf]
- Glove: Global vectors for word representation (2014), J. Pennington et al. [pdf]
- Distributed representations of sentences and documents (2014), Q. Le and T. Mikolov [pdf]
- Distributed representations of words and phrases and their compositionality (2013), T. Mikolov et al. [pdf]
- Efficient estimation of word representations in vector space (2013), T. Mikolov et al. [pdf]
- Recursive deep models for semantic compositionality over a sentiment treebank (2013), R. Socher et al. [pdf]
- Generating sequences with recurrent neural networks (2013), A. Graves. [pdf]
Speech / Other Domain
- End-to-end attention-based large vocabulary speech recognition (2016), D. Bahdanau et al. [pdf]
- Deep speech 2: End-to-end speech recognition in English and Mandarin (2015), D. Amodei et al. [pdf]
- Speech recognition with deep recurrent neural networks (2013), A. Graves [pdf]
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups (2012), G. Hinton et al. [pdf]
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition (2012) G. Dahl et al. [pdf]
- Acoustic modeling using deep belief networks (2012), A. Mohamed et al. [pdf]
Reinforcement Learning / Robotics
- End-to-end training of deep visuomotor policies (2016), S. Levine et al. [pdf]
- Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection (2016), S. Levine et al. [pdf]
- Asynchronous methods for deep reinforcement learning (2016), V. Mnih et al. [pdf]
- Deep Reinforcement Learning with Double Q-Learning (2016), H. Hasselt et al. [pdf]
- Mastering the game of Go with deep neural networks and tree search (2016), D. Silver et al. [pdf]
- Continuous control with deep reinforcement learning (2015), T. Lillicrap et al. [pdf]
- Human-level control through deep reinforcement learning (2015), V. Mnih et al. [pdf]
- Deep learning for detecting robotic grasps (2015), I. Lenz et al. [pdf]
- Playing atari with deep reinforcement learning (2013), V. Mnih et al. [pdf]
More Papers from 2016
- Layer Normalization (2016), J. Ba et al. [pdf]
- Learning to learn by gradient descent by gradient descent (2016), M. Andrychowicz et al. [pdf]
- Domain-adversarial training of neural networks (2016), Y. Ganin et al. [pdf]
- WaveNet: A Generative Model for Raw Audio (2016), A. Oord et al. [pdf] [web]
- Colorful image colorization (2016), R. Zhang et al. [pdf]
- Generative visual manipulation on the natural image manifold (2016), J. Zhu et al. [pdf]
- Texture networks: Feed-forward synthesis of textures and stylized images (2016), D Ulyanov et al. [pdf]
- SSD: Single shot multibox detector (2016), W. Liu et al. [pdf]
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size (2016), F. Iandola et al. [pdf]
- Eie: Efficient inference engine on compressed deep neural network (2016), S. Han et al. [pdf]
- Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1 (2016), M. Courbariaux et al. [pdf]
- Dynamic memory networks for visual and textual question answering (2016), C. Xiong et al. [pdf]
- Stacked attention networks for image question answering (2016), Z. Yang et al. [pdf]
- Hybrid computing using a neural network with dynamic external memory (2016), A. Graves et al. [pdf]
- Google's neural machine translation system: Bridging the gap between human and machine translation (2016), Y. Wu et al. [pdf]
Newly published papers (< 6 months) which are worth reading
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (2017), Andrew G. Howard et al. [pdf]
- Convolutional Sequence to Sequence Learning (2017), Jonas Gehring et al. [pdf]
- A Knowledge-Grounded Neural Conversation Model (2017), Marjan Ghazvininejad et al. [pdf]
- Accurate, Large Minibatch SGD:Training ImageNet in 1 Hour (2017), Priya Goyal et al. [pdf]
- Tacotron: Towards end-to-end speech synthesis (2017), Y. Wang et al. [pdf]
- Deep Photo Style Transfer (2017), F. Luan et al. [pdf]
- Evolution Strategies as a Scalable Alternative to Reinforcement Learning (2017), T. Salimans et al. [pdf]
- Deformable Convolutional Networks (2017), J. Dai et al. [pdf]
- Mask R-CNN (2017), K. He et al. [pdf]
- Learning to discover cross-domain relations with generative adversarial networks (2017), T. Kim et al. [pdf]
- Deep voice: Real-time neural text-to-speech (2017), S. Arik et al. [pdf]
- PixelNet: Representation of the pixels, by the pixels, and for the pixels (2017), A. Bansal et al. [pdf]
- Batch renormalization: Towards reducing minibatch dependence in batch-normalized models (2017), S. Ioffe. [pdf]
- Wasserstein GAN (2017), M. Arjovsky et al. [pdf]
- Understanding deep learning requires rethinking generalization (2017), C. Zhang et al. [pdf]
- Least squares generative adversarial networks (2016), X. Mao et al. [pdf]
Classic papers published before 2012
- An analysis of single-layer networks in unsupervised feature learning (2011), A. Coates et al. [pdf]
- Deep sparse rectifier neural networks (2011), X. Glorot et al. [pdf]
- Natural language processing (almost) from scratch (2011), R. Collobert et al. [pdf]
- Recurrent neural network based language model (2010), T. Mikolov et al. [pdf]
- Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion (2010), P. Vincent et al. [pdf]
- Learning mid-level features for recognition (2010), Y. Boureau [pdf]
- A practical guide to training restricted boltzmann machines (2010), G. Hinton [pdf]
- Understanding the difficulty of training deep feedforward neural networks (2010), X. Glorot and Y. Bengio [pdf]
- Why does unsupervised pre-training help deep learning (2010), D. Erhan et al. [pdf]
- Learning deep architectures for AI (2009), Y. Bengio. [pdf]
- Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations (2009), H. Lee et al. [pdf]
- Greedy layer-wise training of deep networks (2007), Y. Bengio et al. [pdf]
- Reducing the dimensionality of data with neural networks (2006), G. Hinton and R. Salakhutdinov. [pdf]
- A fast learning algorithm for deep belief nets (2006), G. Hinton et al. [pdf]
- Gradient-based learning applied to document recognition (1998), Y. LeCun et al. [pdf]
- Long short-term memory (1997), S. Hochreiter and J. Schmidhuber. [pdf]
HW / SW / Dataset
- SQuAD: 100,000+ Questions for Machine Comprehension of Text (2016), Rajpurkar et al. [pdf]
- OpenAI gym (2016), G. Brockman et al. [pdf]
- TensorFlow: Large-scale machine learning on heterogeneous distributed systems (2016), M. Abadi et al. [pdf]
- Theano: A Python framework for fast computation of mathematical expressions (2016), R. Al-Rfou et al.
- Torch7: A matlab-like environment for machine learning, R. Collobert et al. [pdf]
- MatConvNet: Convolutional neural networks for matlab (2015), A. Vedaldi and K. Lenc [pdf]
- Imagenet large scale visual recognition challenge (2015), O. Russakovsky et al. [pdf]
- Caffe: Convolutional architecture for fast feature embedding (2014), Y. Jia et al. [pdf]
Book / Survey / Review
- On the Origin of Deep Learning (2017), H. Wang and B. Raj. [pdf]
- Deep Reinforcement Learning: An Overview (2017), Y. Li, [pdf]
- Neural Machine Translation and Sequence-to-sequence Models (2017): A Tutorial, G. Neubig. [pdf]
- Neural Network and Deep Learning (Book, Jan 2017), Michael Nielsen. [html]
- Deep learning (Book, 2016), Goodfellow et al. [html]
- LSTM: A search space odyssey (2016), K. Greff et al. [pdf]
- Tutorial on Variational Autoencoders (2016), C. Doersch. [pdf]
- Deep learning (2015), Y. LeCun, Y. Bengio and G. Hinton [pdf]
- Deep learning in neural networks: An overview (2015), J. Schmidhuber [pdf]
- Representation learning: A review and new perspectives (2013), Y. Bengio et al. [pdf]
Video Lectures / Tutorials / Blogs
- CS231n, Convolutional Neural Networks for Visual Recognition, Stanford University [web]
- CS224d, Deep Learning for Natural Language Processing, Stanford University [web]
- Oxford Deep NLP 2017, Deep Learning for Natural Language Processing, University of Oxford [web]
(Tutorials)
- NIPS 2016 Tutorials, Long Beach [web]
- ICML 2016 Tutorials, New York City [web]
- ICLR 2016 Videos, San Juan [web]
- Deep Learning Summer School 2016, Montreal [web]
- Bay Area Deep Learning School 2016, Stanford [web]
(Blogs)
- OpenAI [web]
- Distill [web]
- Andrej Karpathy Blog [web]
- Colah's Blog [Web]
- WildML [Web]
- FastML [web]
- TheMorningPaper [web]
Appendix: More than Top 100
- A character-level decoder without explicit segmentation for neural machine translation (2016), J. Chung et al. [pdf]
- Dermatologist-level classification of skin cancer with deep neural networks (2017), A. Esteva et al. [html]
- Weakly supervised object localization with multi-fold multiple instance learning (2017), R. Gokberk et al. [pdf]
- Brain tumor segmentation with deep neural networks (2017), M. Havaei et al. [pdf]
- Professor Forcing: A New Algorithm for Training Recurrent Networks (2016), A. Lamb et al. [pdf]
- Adversarially learned inference (2016), V. Dumoulin et al. [web] [pdf]
- Understanding convolutional neural networks (2016), J. Koushik [pdf]
- Taking the human out of the loop: A review of bayesian optimization (2016), B. Shahriari et al. [pdf]
- Adaptive computation time for recurrent neural networks (2016), A. Graves [pdf]
- Densely connected convolutional networks (2016), G. Huang et al. [pdf]
- Region-based convolutional networks for accurate object detection and segmentation (2016), R. Girshick et al.
- Continuous deep q-learning with model-based acceleration (2016), S. Gu et al. [pdf]
- A thorough examination of the cnn/daily mail reading comprehension task (2016), D. Chen et al. [pdf]
- Achieving open vocabulary neural machine translation with hybrid word-character models (2016), M. Luong and C. Manning. [pdf]
- Very Deep Convolutional Networks for Natural Language Processing (2016), A. Conneau et al. [pdf]
- Bag of tricks for efficient text classification (2016), A. Joulin et al. [pdf]
- Efficient piecewise training of deep structured models for semantic segmentation (2016), G. Lin et al. [pdf]
- Learning to compose neural networks for question answering (2016), J. Andreas et al. [pdf]
- Perceptual losses for real-time style transfer and super-resolution (2016), J. Johnson et al. [pdf]
- Reading text in the wild with convolutional neural networks (2016), M. Jaderberg et al. [pdf]
- What makes for effective detection proposals? (2016), J. Hosang et al. [pdf]
- Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks (2016), S. Bell et al. [pdf]
- Instance-aware semantic segmentation via multi-task network cascades (2016), J. Dai et al. [pdf]
- Conditional image generation with pixelcnn decoders (2016), A. van den Oord et al. [pdf]
- Deep networks with stochastic depth (2016), G. Huang et al., [pdf]
- Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics (2016), Yee Whye Teh et al. [pdf]
- Ask your neurons: A neural-based approach to answering questions about images (2015), M. Malinowski et al. [pdf]
- Exploring models and data for image question answering (2015), M. Ren et al. [pdf]
- Are you talking to a machine? dataset and methods for multilingual image question (2015), H. Gao et al. [pdf]
- Mind's eye: A recurrent visual representation for image caption generation (2015), X. Chen and C. Zitnick. [pdf]
- From captions to visual concepts and back (2015), H. Fang et al. [pdf]
- Towards AI-complete question answering: A set of prerequisite toy tasks (2015), J. Weston et al. [pdf]
- Ask me anything: Dynamic memory networks for natural language processing (2015), A. Kumar et al. [pdf]
- Unsupervised learning of video representations using LSTMs (2015), N. Srivastava et al. [pdf]
- Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding (2015), S. Han et al. [pdf]
- Improved semantic representations from tree-structured long short-term memory networks (2015), K. Tai et al. [pdf]
- Character-aware neural language models (2015), Y. Kim et al. [pdf]
- Grammar as a foreign language (2015), O. Vinyals et al. [pdf]
- Trust Region Policy Optimization (2015), J. Schulman et al. [pdf]
- Beyond short snippets: Deep networks for video classification (2015) [pdf]
- Learning Deconvolution Network for Semantic Segmentation (2015), H. Noh et al. [pdf]
- Learning spatiotemporal features with 3d convolutional networks (2015), D. Tran et al. [pdf]
- Understanding neural networks through deep visualization (2015), J. Yosinski et al. [pdf]
- An Empirical Exploration of Recurrent Network Architectures (2015), R. Jozefowicz et al. [pdf]
- Deep generative image models using a laplacian pyramid of adversarial networks (2015), E. Denton et al. [pdf]
- Gated Feedback Recurrent Neural Networks (2015), J. Chung et al. [pdf]
- Fast and accurate deep network learning by exponential linear units (ELUS) (2015), D. Clevert et al. [pdf]
- Pointer networks (2015), O. Vinyals et al. [pdf]
- Visualizing and Understanding Recurrent Networks (2015), A. Karpathy et al. [pdf]
- Attention-based models for speech recognition (2015), J. Chorowski et al. [pdf]
- End-to-end memory networks (2015), S. Sukbaatar et al. [pdf]
- Describing videos by exploiting temporal structure (2015), L. Yao et al. [pdf]
- A neural conversational model (2015), O. Vinyals and Q. Le. [pdf]
- Improving distributional similarity with lessons learned from word embeddings (2015), O. Levy et al. [pdf](https://www.transacl.org/ojs/index.php/tacl/article/download/570/124)
- Transition-Based Dependency Parsing with Stack Long Short-Term Memory (2015), C. Dyer et al. [pdf]
- Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs (2015), M. Ballesteros et al. [pdf]
- Finding function in form: Compositional character models for open vocabulary word representation (2015), W. Ling et al. [pdf]
- DeepPose: Human pose estimation via deep neural networks (2014), A. Toshev and C. Szegedy [pdf]
- Learning a Deep Convolutional Network for Image Super-Resolution (2014), C. Dong et al. [pdf]
- Recurrent models of visual attention (2014), V. Mnih et al. [pdf]
- Empirical evaluation of gated recurrent neural networks on sequence modeling (2014), J. Chung et al. [pdf]
- Addressing the rare word problem in neural machine translation (2014), M. Luong et al. [pdf]
- On the properties of neural machine translation: Encoder-decoder approaches (2014), K. Cho et al.
- Recurrent neural network regularization (2014), W. Zaremba et al. [pdf]
- Intriguing properties of neural networks (2014), C. Szegedy et al. [pdf]
- Towards end-to-end speech recognition with recurrent neural networks (2014), A. Graves and N. Jaitly. [pdf]
- Scalable object detection using deep neural networks (2014), D. Erhan et al. [pdf]
- On the importance of initialization and momentum in deep learning (2013), I. Sutskever et al. [pdf]
- Regularization of neural networks using dropconnect (2013), L. Wan et al. [pdf]
- Learning Hierarchical Features for Scene Labeling (2013), C. Farabet et al. [pdf]
- Linguistic Regularities in Continuous Space Word Representations (2013), T. Mikolov et al. [pdf]
- Large scale distributed deep networks (2012), J. Dean et al. [pdf]
- A Fast and Accurate Dependency Parser using Neural Networks (2014), D. Chen and C. Manning. [pdf]
Acknowledgement
Thank you for all your contributions. Please make sure to read the contributing guide before you make a pull request.
To the extent possible under law, Terry T. Um has waived all copyright and related or neighboring rights to this work.
How to Read Research Papers: A Pragmatic Approach for ML Practitioners

Is it necessary for data scientists or machine-learning experts to read research papers?
The short answer is yes. And don’t worry if you lack a formal academic background or have only obtained an undergraduate degree in the field of machine learning.
Reading academic research papers may be intimidating for individuals without an extensive educational background. However, a lack of academic reading experience should not prevent data scientists from taking advantage of a valuable source of information and knowledge for machine learning and AI development.
This article provides a hands-on tutorial for data scientists of any skill level to read research papers published in academic venues such as NeurIPS, JMLR, ICML, and so on.
Before diving wholeheartedly into how to read research papers, the first phase covers selecting relevant topics and papers.
Step 1: Identify a topic
The domain of machine learning and data science is home to a plethora of subject areas that may be studied. But this does not necessarily imply that tackling each topic within machine learning is the best option.
Although generalization is advised for entry-level practitioners, when it comes to long-term machine learning career prospects, practitioner and industry interest often shifts to specialization.
Identifying a niche topic to work on may be difficult, but a good rule of thumb is to select an ML field in which you are either interested in obtaining a professional position or already have experience.
Deep Learning is one of my interests, and I'm a Computer Vision Engineer who uses deep learning models in apps to solve computer vision problems professionally. As a result, I'm interested in topics like pose estimation, action classification, and gesture identification.
Based on roles, the following are examples of ML/DS occupations and related themes to consider.

For this article, I’ll select the topic Pose Estimation to explore and choose associated research papers to study.
Step 2: Finding research papers
One of the best tools to use while looking for machine learning-related research papers, datasets, code, and other related materials is PapersWithCode.
We use the search engine on the PapersWithCode website to get relevant research papers and content for our chosen topic, "Pose Estimation."
The search results page contains a short explanation of the searched topic, followed by a table of associated datasets, models, papers, and code. Without going into too much detail, the area of interest for this use case is the "Greatest papers with code" section, which contains the relevant papers related to the task or topic. For the purposes of this article, I'll select DensePose: Dense Human Pose Estimation In The Wild.
Step 3: First pass (gaining context and understanding)

At this point, we’ve selected a research paper to study and are prepared to extract any valuable learnings and findings from its content.
It's only natural that your first impulse is to start taking notes and reading the document from beginning to end, perhaps taking some rest in between. However, gaining context for the content of a research paper first is a more practical way to read it. The title, abstract, and conclusion are the three key parts of any research paper for gaining that understanding.
The goal of the first pass of your chosen paper is to achieve the following:
- Assure that the paper is relevant.
- Obtain a sense of the paper’s context by learning about its contents, methods, and findings.
- Recognize the author’s goals, methodology, and accomplishments.
The title is the first point of information sharing between the authors and the reader. Therefore, research paper titles are direct and composed in a manner that leaves no ambiguity.
The title is the most telling aspect of a paper, since it indicates the study's relevance to your work and gives a brief perception of the paper's content.
In this situation, the title is "DensePose: Dense Human Pose Estimation in the Wild." This gives a broad overview of the work and implies that it examines how to produce accurate pose estimates in realistic, unconstrained environments.
The abstract portion gives a summarized version of the paper. It’s a short section that contains 300-500 words and tells you what the paper is about in a nutshell. The abstract is a brief text that provides an overview of the article’s content, researchers’ objectives, methods, and techniques.
When reading an abstract of a machine-learning research paper, you’ll typically come across mentions of datasets, methods, algorithms, and other terms. Keywords relevant to the article’s content provide context. It may be helpful to take notes and keep track of all keywords at this point.
For the paper "DensePose: Dense Human Pose Estimation In The Wild", I identified the following keywords in the abstract: pose estimation, COCO dataset, CNN, region-based models, real-time.
It's not uncommon to experience fatigue when reading a paper from top to bottom on your first pass, especially for data scientists and practitioners with no prior advanced academic experience. Although extracting information from the later sections of a paper might seem tedious after a long study session, conclusion sections are often short. Hence, reading the conclusion section in the first pass is recommended.
The conclusion section is a brief summary of the authors' contributions and accomplishments, along with promises for future developments and acknowledged limitations.
Before reading the main content of a research paper, read the conclusion section to see if the researcher’s contributions, problem domain, and outcomes match your needs.
Following this brief first pass enables a sufficient understanding and overview of the research paper's scope and objectives, as well as a context for its content. You'll be able to extract more detailed information by going through the paper again with focused attention.
Step 4: Second pass (content familiarization)
Content familiarization builds on the initial steps of the systematic approach to reading research papers presented in this article. This pass involves the introduction section and the figures within the research paper.
As previously mentioned, there's no need to plunge straight into the core of the research paper; acclimatizing to the content first allows an easier and more comprehensive examination of the study in later passes.
Introduction
Introductory sections of research papers are written to provide an overview of the objective of the research efforts. This objective mentions and explains problem domains, research scope, prior research efforts, and methodologies.
It's normal to find parallels to past research work in this area, using similar or distinct methods. Citations of other papers convey the scope and breadth of the problem domain, which broadens the exploratory zone for the reader. Applying the first-pass procedure outlined in Step 3 to those cited papers is likely sufficient at this point.
Another aspect of the benefit provided by the introduction section is the presentation of requisite knowledge required to approach and understand the content of the research paper.
Graphs, diagrams, figures
Illustrative materials within the research paper ensure that readers can comprehend factors that support problem definition or explanations of methods presented. Commonly, tables are used within research papers to provide information on the quantitative performances of novel techniques in comparison to similar approaches.

Generally, the visual representation of data and performance enables the development of an intuitive understanding of the paper's context. In the DensePose paper mentioned earlier, illustrations are used to depict the performance of the authors' approach to pose estimation and to create an overall understanding of the steps involved in generating and annotating data samples.
In the realm of deep learning, it's common to find topological illustrations depicting the structure of artificial neural networks. Again, this helps any reader build an intuitive understanding. Through illustrations and figures, readers may interpret the information themselves and gain a fuller perspective of it without any preconceived notions about what the outcomes should be.

Step 5: Third pass (deep reading)
The third pass of the paper is similar to the second, though it covers a greater portion of the text. The most important thing about this pass is that you avoid any complex arithmetic or technique formulations that may be difficult for you. During this pass, you can also skip over any words and definitions that you don’t understand or aren’t familiar with. These unfamiliar terms, algorithms, or techniques should be noted to return to later.

During this pass, your primary objective is to gain a broad understanding of what’s covered in the paper. Approach the paper, starting again from the abstract to the conclusion, but be sure to take intermediary breaks in between sections. Moreover, it’s recommended to have a notepad, where all key insights and takeaways are noted, alongside the unfamiliar terms and concepts.
The Pomodoro Technique is an effective method of managing time allocated to deep reading or study. Explained simply, the Pomodoro Technique involves the segmentation of the day into blocks of work, followed by short breaks.
What works for me is the 50/15 split, that is, 50 minutes studying and 15 minutes allocated to breaks. I tend to execute this split twice consecutively before taking a more extended break of 30 minutes. If you are unfamiliar with this time management technique, adopt a relatively easy division such as 25/5 and adjust the time split according to your focus and time capacity.
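For illustration only (this is my own toy example, not part of the technique itself), a few lines of Python can enforce whatever split you choose; the 50/15 values mirror the routine described above:

```python
# A minimal sketch of a Pomodoro-style study timer; the 50/15 split and
# two consecutive rounds mirror the routine described above.
import time

def pomodoro(work_min=50, break_min=15, rounds=2):
    for r in range(1, rounds + 1):
        print(f"Round {r}: study for {work_min} minutes.")
        time.sleep(work_min * 60)   # deep reading block
        print(f"Round {r}: break for {break_min} minutes.")
        time.sleep(break_min * 60)  # short rest block
    print("Take an extended break (e.g., 30 minutes).")

if __name__ == "__main__":
    pomodoro()
```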
Step 6: Fourth pass (final pass)
The final pass is typically one that involves an exertion of your mental and learning abilities, as it involves going through the unfamiliar terms, terminologies, concepts, and algorithms noted in the previous pass. This pass focuses on using external material to understand the recorded unfamiliar aspects of the paper.
In-depth studies of unfamiliar subjects have no specified time length, and at times these efforts span days or weeks. The critical factor in a successful final pass is locating the appropriate sources for further exploration.
Unfortunately, there isn’t one source on the Internet that provides the wealth of information you require. Still, there are multiple sources that, when used in unison and appropriately, fill knowledge gaps. Below are a few of these resources.
- The Machine Learning Subreddit
- The Deep Learning Subreddit
- PapersWithCode
- Top conferences such as NIPS, ICML, ICLR
- Research Gate
- Machine Learning Apple
The reference section of a research paper lists the techniques and algorithms that the current paper either draws inspiration from or builds upon, which is why it is a useful source for your deep reading sessions.
Step 7: Summary (optional)
In almost a decade of academic and professional undertakings in technology-associated subjects and roles, the most effective method I've found of ensuring any new information is retained in my long-term memory is the recapitulation of explored topics. By rewriting new information in my own words, either written or typed, I'm able to reinforce the presented ideas in an understandable and memorable manner.

To take it one step further, it's possible to publicize learning efforts and notes through blogging platforms and social media. Attempting to explain a freshly explored concept to a broad audience, assuming the reader isn't accustomed to the topic, requires understanding it in intrinsic detail.
Undoubtedly, reading research papers for novice Data Scientists and ML practitioners can be daunting and challenging; even seasoned practitioners find it difficult to digest the content of research papers in a single pass successfully.
The nature of the data science profession is very practical and involved, meaning there's a requirement for its practitioners to also employ an academic mindset, all the more so as the data science domain is closely associated with AI, which is still a developing field.
To summarize, here are all of the steps you should follow to read a research paper:
- Identify a topic.
- Find associated research papers.
- Read the title, abstract, and conclusion to gain a broad understanding of the research effort's aims and achievements.
- Familiarize yourself with the content by diving deeper into the introduction, including the figures and graphs presented in the paper.
- Use a deep reading session to digest the main content of the paper as you go through it from top to bottom.
- Explore unfamiliar terms, terminologies, concepts, and methods using external resources.
- Summarize essential takeaways, definitions, and algorithms in your own words.
Thanks for reading!
Collection of must-read papers for Data Science, Machine Learning, and Deep Learning engineers
hurshd0/must-read-papers-for-ml
Must-read papers for Data Science, ML, and DL: a curated collection of data science, machine learning, and deep learning papers, reviews, and articles that are on the must-read list.
NOTE: 🚧 In the process of updating; let me know what additional papers, articles, or blogs to add and I will add them here.
👉 ⭐ this repo
Contributing
- 👉 🔃 Please feel free to submit a pull request if links are broken or I am missing any important papers, blogs, or articles.
👇 READ THIS 👇
- 👉 Reading papers with heavy math is hard; it takes time and effort to understand, and most of it is dedication and motivation not to quit. Don't be discouraged: read once, read twice, read thrice... until it clicks and blows you away.
🥇 - Read it first
🥈 - Read it second
🥉 - Read it third
Data Science
📊 Pre-processing & EDA
🥇 📄 Data preprocessing - Tidy data - by Hadley Wickham
📓 General DS
🥇 📄 Statistical Modeling: The Two Cultures - by Leo Breiman
🥈 📄 A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning
- 📹 KDD 2019 Cynthia Rudin's Keynote
🥇 📄 Frequentism and Bayesianism: A Python-driven Primer by Jake VanderPlas
Machine Learning
🎯 General ML
🥇 📄 Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning - by Sebastian Raschka
🥇 📄 A Brief Introduction into Machine Learning - by Gunnar Rätsch
🥉 📄 An Introduction to the Conjugate Gradient Method Without the Agonizing Pain - by Jonathan Richard Shewchuk
🥉 📄 On Model Stability as a Function of Random Seed
🔍 Outlier/Anomaly detection
🥇 📰 Outlier Detection: A Survey
Boosting
🥈 📄 XGBoost: A Scalable Tree Boosting System
🥈 📄 LightGBM: A Highly Efficient Gradient Boosting Decision Tree
🥈 📄 AdaBoost and the Super Bowl of Classifiers - A Tutorial Introduction to Adaptive Boosting
🥉 📄 Greedy Function Approximation: A Gradient Boosting Machine
📖 Unraveling Blackbox ML
🥉 📄 Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation
🥉 📄 Data Shapley: Equitable Valuation of Data for Machine Learning
✂️ Dimensionality Reduction
🥇 📄 A Tutorial on Principal Component Analysis
🥈 📄 How to Use t-SNE Effectively
🥉 📄 Visualizing Data using t-SNE
📈 Optimization
🥇 📄 A Tutorial on Bayesian Optimization
🥈 📄 Taking the Human Out of the Loop: A review of Bayesian Optimization
Famous Blogs
Sebastian Raschka
🎱 🔮 Recommenders
🥇 📄 A Survey of Collaborative Filtering Techniques
🥇 📄 Collaborative Filtering Recommender Systems
🥇 📄 Deep Learning Based Recommender System: A Survey and New Perspectives
🥇 📄 🤔 ⭐ Explainable Recommendation: A Survey and New Perspectives ⭐
Case Studies
🥈 📄 The Netflix Recommender System: Algorithms, Business Value, and Innovation
- Netflix Recommendations: Beyond the 5 stars Part 1
- Netflix Recommendations: Beyond the 5 stars Part 2
🥈 📄 Two Decades of Recommender Systems at Amazon.com
🥈 🌐 How Does Spotify Know You So Well?
👉 More In-Depth study, 📕 Recommender Systems Handbook
Famous Deep Learning Blogs 🤠
🌐 Stanford UFLDL Deep Learning Tutorial
🌐 Distill.pub
🌐 Colah's Blog
🌐 Andrej Karpathy
🌐 Zack Lipton
🌐 Sebastian Ruder
🌐 Jay Alammar
📚 Neural Networks and Deep Learning
⭐ 🥇 📰 The Matrix Calculus You Need For Deep Learning - Terence Parr and Jeremy Howard ⭐
🥇 📰 Deep Learning - Yann LeCun, Yoshua Bengio & Geoffrey Hinton
🥇 📄 Generalization in Deep Learning
🥇 📄 Topology of Learning in Artificial Neural Networks
🥇 📄 Dropout: A Simple Way to Prevent Neural Networks from Overfitting
🥈 📄 Polynomial Regression As an Alternative to Neural Nets
🥈 🌐 The Neural Network Zoo
🥈 🌐 Image Completion with Deep Learning in TensorFlow
🥈 📄 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
🥉 📄 A systematic study of the class imbalance problem in convolutional neural networks
🥉 📄 All Neural Networks are Created Equal
🥉 📄 Adam: A Method for Stochastic Optimization
🥉 📄 AutoML: A Survey of the State-of-the-Art
CNNs (Convolutional Neural Networks)
🥇 📄 Visualizing and Understanding Convolutional Networks - by Matthew Zeiler and Rob Fergus
🥈 📄 Deep Residual Learning for Image Recognition
🥈 📄 AlexNet - ImageNet Classification with Deep Convolutional Neural Networks
🥈 📄 VGG Net - Very Deep Convolutional Networks for Large-Scale Image Recognition
🥉 📄 A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction
🥉 📄 Large-scale Video Classification with Convolutional Neural Networks
🥉 📄 Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
⚫ CapsNet 🔱
🥇 📄 Dynamic Routing Between Capsules
Blog explaining "What are CapsNet, or Capsule Networks?"
Capsule Networks Tutorial by Aurélien Géron
🏞️ 💬 Image Captioning
🥇 📄 Show and Tell: A Neural Image Caption Generator
🥈 📄 Neural Machine Translation by Jointly Learning to Align and Translate
🥈 📄 StyleNet: Generating Attractive Visual Captions with Styles
🥈 📄 Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
🥈 📄 Where to put the Image in an Image Caption Generator
🥈 📄 Dank Learning: Generating Memes Using Deep Neural Networks
🚗 🚶♂️ Object Detection 🦅 🏈
🥈 📄 ResNet - Deep Residual Learning for Image Recognition
🥈 📄 YOLO-You Only Look Once: Unified, Real-Time Object Detection
🥈 📄 Microsoft COCO: Common Objects in Context
- COCO dataset
🥈 📄 (R-CNN) Rich feature hierarchies for accurate object detection and semantic segmentation
🥈 📄 Fast R-CNN
- 💻 Papers with Code
🥈 📄 Faster R-CNN
🥈 📄 Mask R-CNN

🚗 🚶♂️ 👫 Pose Detection 🏃 💃
🥈 📄 DensePose: Dense Human Pose Estimation In The Wild
🥈 📄 Parsing R-CNN for Instance-Level Human Analysis
🔡 🔣 Deep NLP 💱 🔢
🥇 📄 A Primer on Neural Network Models for Natural Language Processing
🥇 📄 Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
🥇 📄 On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
🥇 📄 LSTM: A Search Space Odyssey - by Klaus Greff et al.
🥇 📄 A Critical Review of Recurrent Neural Networks for Sequence Learning
🥇 📄 Visualizing and Understanding Recurrent Networks
⭐ 🥇 📄 Attention Is All You Need ⭐
🥇 📄 An Empirical Exploration of Recurrent Network Architectures
🥇 📄 OpenAI (GPT-2) Language Models are Unsupervised Multitask Learners
🥇 📄 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Google BERT Announcement
🥉 📄 Parameter-Efficient Transfer Learning for NLP
🥉 📄 A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification
🥉 📄 A Survey on Recent Advances in Named Entity Recognition from Deep Learning models
🥉 📄 Convolutional Neural Networks for Sentence Classification
🥉 📄 Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction
🥉 📄 Single Headed Attention RNN: Stop Thinking With Your Head
GANs (Generative Adversarial Networks)
🥇 📄 Generative Adversarial Nets - Goodfellow et al.
📚 GAN Rabbit Hole -> GAN Papers
⭕ ➖ ⭕ GNNs (Graph Neural Networks)
🥉 📄 A Comprehensive Survey on Graph Neural Networks
👨⚕️ 💉 Medical AI 💊 🔬
Machine learning classifiers and fMRI: a tutorial overview - by Francisco Pereira et al.
👇 Cool Stuff 👇
🔊 📄 SoundNet: Learning Sound Representations from Unlabeled Video
🎨 📄 CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms
🎨 📄 Deep Painterly Harmonization
- Github Code
🕺 💃 📄 Everybody Dance Now
- Everybody Dance Now - Youtube Video
⚽ Soccer on Your Tabletop
👱♀️ 💇♀️ 📄 SC-FEGAN: Face Editing Generative Adversarial Network with User's Sketch and Color
📸 📄 Handheld Mobile Photography in Very Low Light
🏯 🕌 📄 Learning Deep Features for Scene Recognition using Places Database
🚅 🚄 📄 High-Speed Tracking with Kernelized Correlation Filters
🎬 📄 Recent progress in semantic image segmentation
Rabbit hole -> 🔊 🌐 Analytics Vidhya Top 10 Audio Processing Tasks and their papers
:blonde_man: -> 👴 📄 Face Aging With Conditional GANs
:blonde_man: -> 👴 📄 Dual Conditional GANs for Face Aging and Rejuvenation
⚖️ 📄 BAGAN: Data Augmentation with Balancing GAN
📰 Capstone Projects 📰
8 Awesome Data Science Capstone Projects
10 Powerful Applications of Linear Algebra in Data Science
Top 5 Interesting Applications of GANs
Deep Learning Applications a beginner can build in minutes
2019-10-28 Started must-read-papers-for-ml repo
2019-10-29 Added analytics vidhya use case studies article links
2019-10-30 Added Outlier/Anomaly detection paper, separated Boosting, CNN, Object Detection, NLP papers, and added Image captioning papers
2019-10-31 Added Famous Blogs from Deep and Machine Learning Researchers
2019-11-1 Fixed markdown issues, added contribution guideline
2019-11-20 Added Recommender Surveys, and Papers
2019-12-12 Added R-CNN variants, PoseNets, GNNs
2020-02-23 Added GRU paper
Title: Reinforcement Learning in Practice: Opportunities and Challenges, by Yuxi Li
Abstract: This article is a gentle discussion about the field of reinforcement learning in practice, about opportunities and challenges, touching a broad range of topics, with perspectives and without technical details. The article is based on both historical and recent research papers, surveys, tutorials, talks, blogs, books, (panel) discussions, and workshops/conferences. Various groups of readers, like researchers, engineers, students, managers, investors, officers, and people wanting to know more about the field, may find the article interesting. In this article, we first give a brief introduction to reinforcement learning (RL), and its relationship with deep learning, machine learning and AI. Then we discuss opportunities of RL, in particular, products and services, games, bandits, recommender systems, robotics, transportation, finance and economics, healthcare, education, combinatorial optimization, computer systems, and science and engineering. Then we discuss challenges, in particular, 1) foundation, 2) representation, 3) reward, 4) exploration, 5) model, simulation, planning, and benchmarks, 6) off-policy/offline learning, 7) learning to learn a.k.a. meta-learning, 8) explainability and interpretability, 9) constraints, 10) software development and deployment, 11) business perspectives, and 12) more challenges. We conclude with a discussion, attempting to answer: "Why has RL not been widely adopted in practice yet?" and "When is RL helpful?".
Top 16 Exciting Deep Learning Project Ideas for Beginners [2023]

Deep Learning Project Ideas
Although a recent technological advancement, the scope of Deep Learning is expanding exponentially. This technology aims to imitate the biological neural network of the human brain. While the origins of Deep Learning date back to the 1950s, it is only with the advancement and adoption of Artificial Intelligence and Machine Learning that it came to the limelight. So, if you are an ML beginner, the best thing you can do is work on some deep learning project ideas.
You don't have to waste time finding the best deep learning research topic for you. This article includes a variety of deep learning project topics in a categorised manner.
We, here at upGrad, believe in a practical approach, as theoretical knowledge alone won't be of help in a real-time work environment. In this article, we will explore some interesting deep learning project ideas which beginners can work on to put their knowledge to the test. You will find top deep learning project ideas for beginners to get hands-on experience in deep learning.
A subset of Machine Learning, Deep Learning leverages artificial neural networks arranged hierarchically to perform specific ML tasks. Deep Learning networks can also use the unsupervised learning approach, learning from unstructured or unlabeled data. Artificial neural networks are just like the human brain, with neuron nodes interconnected to form a web-like structure.
While traditional learning models analyze data using a linear approach, the hierarchical function of Deep Learning systems is designed to process and analyze data in a nonlinear approach.
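To make the hierarchical, nonlinear idea concrete, here is a minimal Keras sketch (assuming TensorFlow is installed; the layer widths are arbitrary illustrations, not recommendations). Each layer applies a linear map followed by a nonlinear activation, so stacked layers can build increasingly abstract features:

```python
# A minimal sketch of hierarchical, nonlinear processing in a deep network.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(64,)),  # low-level features
    layers.Dense(64, activation="relu"),                      # higher-level features
    layers.Dense(10, activation="softmax"),                   # task-specific output
])
model.summary()  # three stacked nonlinear transformations
```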
Deep Learning architectures like deep neural networks, recurrent neural networks, and deep belief networks have found applications in various fields including natural language processing, computer vision, bioinformatics, speech recognition, audio recognition, machine translation, social network filtering, drug design, and even board game programs. As new advances are being made in this domain, it is helping ML and Deep Learning experts to design innovative and functional Deep Learning projects. The more deep learning project ideas you try, the more experience you gain.
Today, we'll discuss a set of amazing Deep Learning projects that are helping us reach new heights of achievement.
In this article, we have covered top deep learning project ideas. We started with some beginner projects which you can solve with ease. Once you finish with these simple projects, I suggest you go back, learn a few more concepts, and then try the intermediate projects. When you feel confident, you can then tackle the advanced projects. If you wish to improve your skills further, you can get your hands on these machine learning courses.
So, here are a few Deep Learning Project ideas which beginners can work on:
Deep Learning Project Ideas: Beginners Level
This list of deep learning project ideas for students is suited for beginners, and those just starting out with ML in general. These deep learning project ideas will get you going with all the practicalities you need to succeed in your career.
Further, if you're looking for deep learning project ideas for your final year, this list should get you going. So, without further ado, let's jump straight into some deep learning project ideas that will strengthen your base and allow you to climb up the ladder.
1. Image Classification with CIFAR-10 dataset
One of the best ways to start experimenting with hands-on deep learning projects for students is working on image classification. CIFAR-10 is a large dataset containing over 60,000 (32×32 size) colour images categorized into ten classes, wherein each class has 6,000 images. The training set contains 50,000 images, whereas the test set contains 10,000 images. The training set is divided into five separate batches, each having 10,000 randomly arranged images, while the test set includes 1,000 images randomly chosen from each of the ten classes.
In this project, you'll develop an image classification system that can identify the class of an input image. Image classification is a pivotal application in the field of deep learning, and hence, you will gain knowledge of various deep learning concepts while working on this project. A minimal starting point is sketched below.
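Here is a minimal sketch of such a classifier in Keras (assuming TensorFlow is installed); the architecture and epoch count are illustrative choices, not tuned values:

```python
# A minimal sketch of a CIFAR-10 image classifier in Keras.
import tensorflow as tf
from tensorflow.keras import layers, models

# Load CIFAR-10: 50,000 training and 10,000 test images, 32x32 RGB.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # one output per CIFAR-10 class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
```

From here, deeper architectures, data augmentation, and regularization are the natural next experiments.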
2. Visual tracking system
A visual tracking system is designed to track and locate moving object(s) in a given time frame via a camera. It is a handy tool that has numerous applications such as security and surveillance, medical imaging, augmented reality, traffic control, video editing and communication, and human-computer interaction.
This system uses a deep learning algorithm to analyze sequential video frames, after which it tracks the movement of target objects between frames (a minimal tracking loop is sketched after the list below). The two core components of this visual tracking system are:
- Target representation and localization
- Filtering and data association
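As a minimal sketch of the frame-by-frame loop, the snippet below uses OpenCV's classical CSRT tracker rather than a learned appearance model (it requires the opencv-contrib-python package, and on some OpenCV 4.x builds the constructor lives under cv2.legacy instead); the video file name is a placeholder:

```python
# A minimal single-object tracking loop with OpenCV's built-in CSRT tracker.
import cv2

cap = cv2.VideoCapture("input_video.mp4")     # hypothetical input file
ok, frame = cap.read()
bbox = cv2.selectROI("Select target", frame)  # draw a box around the target

tracker = cv2.TrackerCSRT_create()            # cv2.legacy.TrackerCSRT_create on some builds
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)       # locate the target in this frame
    if found:
        x, y, w, h = map(int, bbox)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("Tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:           # press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```

A deep learning variant would replace the tracker with a learned detection or appearance model, but the capture-update-draw structure stays the same.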
3. Face detection system
This is one of the excellent deep learning project ideas for beginners. With the advance of deep learning, facial recognition technology has advanced tremendously. Face recognition technology is a subset of object detection that focuses on observing instances of semantic objects. It is designed to track and visualize human faces within digital images.

In this deep learning project, you will learn how to perform human face recognition in real-time. You have to develop the model in Python and OpenCV.
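As a minimal sketch of the real-time loop, the snippet below uses OpenCV's bundled Haar cascade as a classical stand-in for a deep face detector; the capture/detect/draw structure is what a deep-learning-based detector would slot into:

```python
# A minimal real-time face detection loop using OpenCV's Haar cascade.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)  # default webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.imshow("Faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```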
Deep Learning Project Ideas: Intermediate Level
4. Digit recognition system
As the name suggests, this project involves developing a digit recognition system that can classify digits based on the set tenets. Here, you'll be using the MNIST dataset containing images of size 28×28.
This project aims to create a recognition system that can classify digits ranging from 0 to 9, using a combination of a shallow network and a deep neural network, and by implementing logistic regression. Softmax regression (multinomial logistic regression) is the ideal choice for this project: since this technique is a generalization of logistic regression, it is apt for multi-class classification, assuming that all the classes are mutually exclusive. A minimal softmax-regression baseline is sketched below.
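Here is a minimal sketch of that baseline in Keras (assuming TensorFlow is installed): a single dense layer with a softmax over the ten digit classes is exactly softmax regression, and adding hidden layers turns it into the shallow and deep variants mentioned above.

```python
# A minimal softmax-regression (multinomial logistic regression) baseline on MNIST.
import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),    # 28x28 image -> 784-dim vector
    layers.Dense(10, activation="softmax"),  # softmax over the ten digits
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
# Inserting hidden Dense layers here yields the shallow/deep networks above.
```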
5. Chatbot
In this project, you will model a chatbot using IBM Watson's API. Watson is a prime example of what AI can help us accomplish. The idea behind this project is to harness Watson's deep learning abilities to create a chatbot that can engage with humans just like another human being. Chatbots are supremely intelligent and can answer human questions or requests in real-time. This is the reason why an increasing number of companies across all domains are adopting chatbots in their customer support infrastructure.

This project isn't a very challenging one. All you need is Python 2/3 on your machine, a Bluemix account, and, of course, an active Internet connection! If you wish to scale it up a notch, you can visit the GitHub repository and improve your chatbot's features by including an animated car dashboard.
Read: How to make chatbot in Python?
6. Music genre classification system
This is one of the interesting deep learning project ideas. This is an excellent project to nurture and improve your deep learning skills. You will create a deep learning model that uses neural networks to classify the genre of music automatically. For this project, you will use an FMA ( Free Music Archive ) dataset. FMA is an interactive library comprising high-quality and legal audio downloads. It is an open-source and easily accessible dataset that is great for a host of MIR tasks, including browsing and organizing vast music collections.
However, keep in mind that before you can use the model to classify audio files by genre, you will have to extract the relevant information from the audio samples (like spectrograms, MFCCs, etc.). A sketch of this feature-extraction step follows.
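Here is a minimal sketch of that extraction step using librosa (the file path is a placeholder; a real pipeline would loop over the FMA tracks and stack the resulting vectors into a training matrix):

```python
# A minimal MFCC feature-extraction sketch for one audio track.
import numpy as np
import librosa

y, sr = librosa.load("track.mp3", duration=30)      # load a 30-second clip
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # 20 MFCC coefficients per frame
features = np.mean(mfcc, axis=1)                    # average over time frames
print(features.shape)  # (20,) -> one fixed-length feature vector per track
```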
7. Drowsiness detection system
Driver drowsiness is one of the main causes of road accidents. It is natural for drivers on long routes to doze off behind the steering wheel, and stress and lack of sleep can also make drivers feel drowsy. This project aims to prevent and reduce such accidents by creating a drowsiness detection agent.
Here, you will use Python, OpenCV, and Keras to build a system that can detect drivers' closed eyes and alert them if they start to fall asleep at the wheel. If the driver's eyes stay closed for even a few seconds, the system immediately warns the driver, helping prevent serious road accidents. OpenCV monitors and collects the driver's images via a webcam and feeds them into a deep learning model that classifies the driver's eyes as 'open' or 'closed.'
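A minimal sketch of that detection loop is below. The Haar eye cascade ships with OpenCV, while 'eye_state.h5' is a hypothetical CNN you would first train yourself on labelled open/closed eye crops.

```python
# Sketch of the detection loop: crop the eye region with a Haar cascade and
# ask a small Keras CNN whether the eyes are open or closed.
import cv2
from tensorflow.keras.models import load_model

eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")
model = load_model("eye_state.h5")   # hypothetical model trained on eye crops
cap = cv2.VideoCapture(0)
closed_frames = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in eye_cascade.detectMultiScale(gray, 1.1, 5):
        eye = cv2.resize(gray[y:y + h, x:x + w], (24, 24)) / 255.0
        p_open = float(model.predict(eye.reshape(1, 24, 24, 1), verbose=0)[0][0])
        closed_frames = 0 if p_open > 0.5 else closed_frames + 1
    if closed_frames > 15:                   # eyes closed for ~0.5 s of frames
        print("ALERT: drowsiness detected")  # hook up a buzzer or alarm here
    cv2.imshow("driver", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```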
8. Image caption generator
This is one of the trending deep learning project ideas. It is a Python-based deep learning project that combines a Convolutional Neural Network with an LSTM (a type of Recurrent Neural Network) to build a model that can generate captions for an image.
An image caption generator combines computer vision and natural language processing techniques to analyze and identify the context of an image and describe it accordingly in a natural human language (for example, English, Spanish, or Danish). This project will strengthen your knowledge of CNNs and LSTMs, and you will learn how to implement them in a real-world application like this.
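One common way to wire this up is the "merge" architecture: encode the image and the partial caption separately, then merge them to predict the next word. A minimal Keras skeleton, with vocabulary size and caption length as illustrative placeholders:

```python
# Skeleton of the classic CNN + LSTM "merge" captioning architecture: a CNN
# feature vector and a partial caption are encoded separately and merged to
# predict the next word of the caption.
from tensorflow.keras.layers import (Input, Dense, Embedding, LSTM,
                                     Dropout, add)
from tensorflow.keras.models import Model

vocab_size, max_len = 8000, 34                 # placeholders for your dataset

img_in = Input(shape=(2048,))                  # e.g. pooled CNN features
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

txt_in = Input(shape=(max_len,))               # partial caption as word ids
txt_vec = LSTM(256)(Embedding(vocab_size, 256, mask_zero=True)(txt_in))

merged = Dense(256, activation="relu")(add([img_vec, txt_vec]))
out = Dense(vocab_size, activation="softmax")(merged)  # next-word distribution

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
```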
9. Colouring old B&W photos
Automated colourization of B&W images has long been a hot topic of exploration in computer vision and deep learning. Research has shown that a neural network trained on a sufficiently voluminous and rich dataset can plausibly "hallucinate" colours within a black-and-white photograph.
In this image colourization project, you will use Python and OpenCV's DNN module with a network trained on the ImageNet dataset. The aim is to create a coloured reproduction of grayscale images. For this purpose, you will use a pre-trained Caffe model, a prototxt file describing the network, and a NumPy file with colour cluster centres.
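A minimal sketch of this pipeline is below, following the widely used OpenCV colourization sample (Zhang et al.'s model). The three model files are downloads from that release; their names here are taken from the sample and should be treated as assumptions.

```python
# Minimal sketch of the OpenCV DNN colourization pipeline.
import cv2
import numpy as np

net = cv2.dnn.readNetFromCaffe("colorization_deploy_v2.prototxt",
                               "colorization_release_v2.caffemodel")
pts = np.load("pts_in_hull.npy").transpose().reshape(2, 313, 1, 1)
net.getLayer(net.getLayerId("class8_ab")).blobs = [pts.astype(np.float32)]
net.getLayer(net.getLayerId("conv8_313_rh")).blobs = [
    np.full([1, 313], 2.606, dtype=np.float32)]

bgr = cv2.imread("old_photo.jpg")                       # placeholder path
lab = cv2.cvtColor(bgr.astype(np.float32) / 255.0, cv2.COLOR_BGR2LAB)
L = cv2.resize(lab[:, :, 0], (224, 224)) - 50           # mean-centred lightness

net.setInput(cv2.dnn.blobFromImage(L))
ab = net.forward()[0].transpose((1, 2, 0))              # predicted colour
ab = cv2.resize(ab, (bgr.shape[1], bgr.shape[0]))

colour = np.concatenate([lab[:, :, :1], ab], axis=2)    # original L + new ab
out = np.clip(cv2.cvtColor(colour, cv2.COLOR_LAB2BGR), 0, 1)
cv2.imwrite("colourized.jpg", (out * 255).astype("uint8"))
```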
Deep Learning Project Ideas – Advanced Level
Below are some of the best ideas for advanced deep learning projects. These research-level topics will definitely challenge your depth of knowledge.
10. Detectron
Detectron is a Facebook AI Research’s (FAIR) software system designed to execute and run state-of-the-art Object Detection algorithms. Written in Python, this Deep Learning project is based on the Caffe2 deep learning framework.
Detectron has been the foundation for many wonderful research projects including Feature Pyramid Networks for Object Detection ; Mask R-CNN ; Detecting and Recognizing Human-Object Interactions ; Focal Loss for Dense Object Detection ; Non-local Neural Networks , and Learning to Segment Every Thing , to name a few.
Detectron offers a high-quality and high-performance codebase for object detection research. It includes over 50 pre-trained models and is extremely flexible – it supports rapid implementation and evaluation of novel research.
11. WaveGlow
This is one of the interesting deep learning project ideas. WaveGlow is a flow-based generative network for speech synthesis developed and released by NVIDIA. It can generate high-quality speech from mel-spectrograms, blending insights from WaveNet and Glow to facilitate fast, efficient, high-quality audio synthesis without requiring auto-regression.
WaveGlow is implemented as a single network and trained with a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable.
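Because NVIDIA has published pretrained WaveGlow weights on PyTorch Hub, inference can be sketched in a few lines. The hub entry name below follows NVIDIA's published example and is an assumption to verify against the current Hub page; a random tensor stands in for a real mel-spectrogram just to show the single non-autoregressive call.

```python
# Sketch of WaveGlow inference via PyTorch Hub; the hub entry name is an
# assumption based on NVIDIA's published example and may have changed.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
waveglow = torch.hub.load("NVIDIA/DeepLearningExamples:torchhub",
                          "nvidia_waveglow")
waveglow = waveglow.remove_weightnorm(waveglow).to(device).eval()

# An 80-band mel-spectrogram; in practice this comes from a text-to-mel model
# such as Tacotron 2. A random tensor stands in here to show the call shape.
mel = torch.randn(1, 80, 200, device=device)
with torch.no_grad():
    audio = waveglow.infer(mel)   # one non-autoregressive pass, no sample loop
print(audio.shape)                # (1, n_samples) raw waveform
```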
12. OpenCog
The OpenCog project includes the core components and a platform to facilitate AI R&D. It aims to design an open-source Artificial General Intelligence (AGI) framework that can accurately capture the spirit of the human brain's architecture and dynamics. The robot Sophia, whose software has drawn on OpenCog, is one of the better-known projects associated with this line of work.
OpenCog also encompasses OpenCog Prime – an advanced architecture for robot and virtual embodied cognition that includes an assortment of interacting components to give birth to human-equivalent artificial general intelligence (AGI) as an emergent phenomenon of the system as a whole.
13. DeepMimic
DeepMimic is an "example-guided deep reinforcement learning of physics-based character skills." In other words, it is a neural network trained via reinforcement learning to reproduce motion-captured movements through a simulated humanoid or another physical agent.
The idea behind DeepMimic is pretty simple. First, set up a simulation of the thing you wish to animate (you can motion-capture someone making specific movements). Then use that motion-capture data to train a neural network through reinforcement learning. The input is the configuration of the character's arms and legs at each time step, and the reward reflects how closely the simulated motion matches the motion-capture reference at those time steps.
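The reward shaping is the interesting part. Here is a toy, NumPy-only sketch in the spirit of DeepMimic's pose reward, not the authors' code: squared joint-angle error against the reference pose, squashed through an exponential so the reward stays in (0, 1].

```python
# Illustrative imitation reward: the agent is rewarded for matching the
# reference (motion-capture) pose at each time step.
import numpy as np

def pose_imitation_reward(q_sim: np.ndarray, q_ref: np.ndarray,
                          scale: float = 2.0) -> float:
    """Reward approaches 1 as the simulated pose matches the reference."""
    err = np.sum((q_sim - q_ref) ** 2)       # squared joint-angle error
    return float(np.exp(-scale * err))

q_ref = np.array([0.1, -0.4, 0.8])           # reference joint angles (radians)
q_sim = np.array([0.12, -0.35, 0.75])        # simulated character's angles
print(pose_imitation_reward(q_sim, q_ref))   # close to 1 for a close match
```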
14. IBM Watson
One of the most excellent examples of Machine Learning and Deep Learning is IBM Watson. The greatest aspect of IBM Watson is that it allows Data Scientists and ML Engineers/Developers to collaborate on an integrated platform to enhance and automate the AI life cycle. Watson can simplify, accelerate, and manage AI deployments, thereby enabling companies to harness the potential of both ML and Deep Learning to boost business value.
IBM Watson is integrated with Watson Studio to empower cross-functional teams to deploy, monitor, and optimize ML and deep learning models quickly and efficiently. It can automatically generate APIs to help your developers incorporate AI into their applications readily. On top of that, it comes with intuitive dashboards that make it convenient for teams to manage models in production seamlessly.
15. Google Brain
This is one of the excellent deep learning project ideas. The Google Brain project is a deep learning AI research project that began in 2011 at Google. The Google Brain team, led by Google Fellow Jeff Dean, Google Researcher Greg Corrado, and Stanford University Professor Andrew Ng, aimed to bring deep learning and machine learning out of the confines of the lab and into the real world. They designed one of the largest neural networks for ML: it comprised 16,000 computer processors connected together.
To test the capabilities of a neural network of this massive size, the Google Brain team fed the network random thumbnails of cat images sourced from 10 million YouTube videos. The catch is that they never trained the system to recognize what a cat looks like. Yet the system left everyone astonished: it taught itself to identify cats and went on to assemble the features of a cat to complete an image of one.
The Google Brain project demonstrated that software-based neural networks can imitate aspects of the functioning of the human brain, wherein individual neurons learn to detect particular objects.
16. 12 Sigma’s Lung Cancer detection algorithm
12 Sigma has developed an AI algorithm that can reduce diagnostic errors associated with lung cancer in its early stages and detect signs of lung cancer much faster than traditional approaches.
According to Xin Zhong, the co-founder and CEO of 12 Sigma Technologies, conventional cancer detection practices usually take time to detect lung cancer. 12 Sigma's AI system, however, can reduce the diagnosis time, leading to a better rate of survival for lung cancer patients.
Generally, doctors diagnose lung cancer by carefully examining CT scan images to check for small nodules and classify them as benign or malignant. It can take over ten minutes for doctors to visually inspect the patient’s CT images for nodules, plus additional time for classifying the nodules as benign or malignant.
Needless to say, there always remains a possibility of human error. 12 Sigma maintains that its AI system can inspect the CT images and classify nodules within two minutes.
One of the most famous types of artificial neural networks is the CNN, or Convolutional Neural Network, which is used mainly for image and object recognition and classification. CNNs are typically trained with supervised learning, and they can work with both structured and unstructured data.
By integrating CNNs into deep learning pipelines, world-class applications are built. Applications of CNNs include facial recognition, document analysis, natural-history collections, climate analysis, and even advertising, where CNN-powered, data-driven personalized advertising has changed marketing significantly.
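To make the building blocks concrete before the project list, here is a minimal Keras CNN: stacked convolution and pooling layers followed by dense layers. The input shape and class count are illustrative placeholders.

```python
# A minimal convolutional network in Keras: convolution -> pooling blocks
# followed by dense layers for classification.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),                 # halve spatial resolution
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),      # e.g. 10 object classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```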
Therefore, it is fair to say that CNN deep learning projects can add significant weight to your experience and resume. Below are some ideas for CNN deep learning projects, suited to beginner through advanced levels for hands-on experience with CNNs; the difficulty increases as you go down the list.
- Disease detection in plants using MATLAB
With constantly shifting climate conditions and various pathogenic bacteria and fungi, the life spans of plants are decreasing. The use of harsh pesticides can also give rise to new plant diseases. Diagnosing these problems at an early stage can help save plant species that are on the verge of extinction, and research projects like this one help make that happen.
Manually detecting disease is time-consuming, so image processing can make the process swifter. In this project, machine vision equipment collects images and judges whether or not the plant has a fatal disease. It is quite a popular topic among CNN deep learning projects; manually designed features plus a classifier, or a conventional image processing algorithm, are typically used here.
Knowledge of MATLAB is essential to execute this project, along with knowledge of image processing and CNNs.
- Detecting traffic using Python
This recognition system identifies traffic lights along the streets, speed-limit signs, and various other signs such as caution and merge signs. Such recognition systems are expected to play a huge role in smart vehicles and self-driving cars in the future.
According to WHO reports, on average about 1.3 million people die every year in road traffic crashes, and roughly 20 to 50 million more suffer non-fatal injuries. A system like this can therefore play a significant role in reducing those numbers.
This program requires knowledge of Python and of building CNNs. It also requires basic familiarity with Keras, Python libraries such as Matplotlib and PIL, image classification, and scikit-learn.
- Detecting gender and age
As simple as it may sound, estimating age and gender from a photograph is a project that has been around for quite a while. After the emergence of AI, however, the process has become trickier, not least because it is now also important to differentiate real faces from mimicked ones, and continuous changes are being made to improve the outcomes.
The programming language used for executing this project is Python. The objective is to give an approximate estimate of a person's gender and age from their picture. To execute this project, you'll need in-depth knowledge of Python, OpenCV, and CNNs.
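A common route is OpenCV's DNN module with the pretrained Levi and Hassner Caffe models, which classify a face crop into a gender and an age bucket. The file names below follow the widely shared tutorial release and should be treated as assumptions.

```python
# Sketch: classify gender and an age bucket from a pre-cropped face image
# using pretrained Caffe models; model file names are assumptions.
import cv2

GENDERS = ["Male", "Female"]
AGES = ["(0-2)", "(4-6)", "(8-12)", "(15-20)",
        "(25-32)", "(38-43)", "(48-53)", "(60-100)"]
MEAN = (78.4263377603, 87.7689143744, 114.895847746)   # training-set mean

gender_net = cv2.dnn.readNetFromCaffe("gender_deploy.prototxt",
                                      "gender_net.caffemodel")
age_net = cv2.dnn.readNetFromCaffe("age_deploy.prototxt",
                                   "age_net.caffemodel")

face = cv2.imread("face_crop.jpg")                     # placeholder face crop
blob = cv2.dnn.blobFromImage(face, 1.0, (227, 227), MEAN, swapRB=False)

gender_net.setInput(blob)
age_net.setInput(blob)
print("Gender:", GENDERS[gender_net.forward()[0].argmax()])
print("Age bucket:", AGES[age_net.forward()[0].argmax()])
```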
- Language Translating
With the speed at which globalization is becoming the norm, knowing multiple languages is becoming more and more important. However, not everyone has the knack or interest to learn multiple languages, and we often need some command of a language simply for travel. The majority of us rely on Google Translate, which works on the basis of Machine Translation (MT).
Machine translation is an in-demand topic in computational linguistics, where ML is used to translate one language into another, and it belongs among the more advanced deep learning projects. Among the various approaches, Neural Machine Translation (NMT) is considered the most effective, so knowledge of RNN sequence-to-sequence learning is important for this project.
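The core of RNN sequence-to-sequence learning fits in a short Keras skeleton: an encoder LSTM compresses the source sentence into a state vector, and a decoder LSTM generates the target sentence from that state. Token counts below are placeholders, and a production NMT system would add attention on top.

```python
# Skeleton of encoder-decoder (seq2seq) learning for translation.
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

src_tokens, tgt_tokens, latent = 5000, 6000, 256   # illustrative sizes

enc_in = Input(shape=(None, src_tokens))           # one-hot source sequence
_, h, c = LSTM(latent, return_state=True)(enc_in)  # keep only the final state

dec_in = Input(shape=(None, tgt_tokens))           # shifted target sequence
dec_out, _, _ = LSTM(latent, return_sequences=True,
                     return_state=True)(dec_in, initial_state=[h, c])
out = Dense(tgt_tokens, activation="softmax")(dec_out)

model = Model([enc_in, dec_in], out)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
model.summary()
```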
- Hand gesture recognition
Smart devices such as TVs, mobile phones, and cameras are becoming more advanced every day. We are all familiar with gesture control on our smartphones, but it can also be implemented in devices such as TVs.
By incorporating gesture recognition into TVs, people will be able to perform a bunch of basic tasks without needing a remote. Tasks like changing channels, adjusting volume, pausing, and fast-forwarding can all be done with gesture recognition.
For this, a few hundred training videos are required, which are classified into the major gesture classes, like the ones mentioned above. Videos of various people performing the hand gestures serve as training data; when anybody performs a similar gesture, the smart TV's webcam detects it and the TV behaves accordingly. This is definitely a deep learning project on the more advanced side.
These are only a handful of the real-world applications of deep learning built so far. The technology is still very young and developing as we speak. Deep learning holds immense potential for pioneering innovations that can help humankind address some of the fundamental challenges of the real world.
Is Deep Learning just hype, or does it have real-life applications?
Deep learning has recently found a number of useful applications. It is already changing many organizations and is projected to bring about a revolution in practically all industries, from Netflix's well-known movie recommendation system to Google's self-driving cars. Deep learning models are used in everything from cancer diagnosis to presidential election campaigns, and from creating art and literature to making real money, so it would be incorrect to dismiss the technology as a fad. At any given time, Google and Facebook are translating content into hundreds of languages by applying deep learning models to NLP tasks, which is a major success story.
What is the difference between Deep Learning and Machine Learning?
The most significant distinction between deep learning and regular machine learning is how each performs as data scales up. Deep learning techniques do not perform well when the data is small, because deep learning algorithms require a vast amount of data to learn from. Traditional machine learning algorithms, with their handcrafted features and rules, win in this circumstance: in classic machine learning, most features must be identified by a domain expert and then hand-coded according to the domain and data type.
What are the prerequisites for starting out in Deep Learning?
Starting out with deep learning isn't nearly as difficult as some people make it out to be. Before getting into it, you should brush up on a few fundamentals: probability, derivatives, linear algebra, and a few other basic concepts. Any machine learning task requires a fundamental understanding of statistics, and applying deep learning to real-world problems requires a reasonable level of coding ability. Finally, deep learning is built on the foundation of machine learning, so it is hard to begin mastering deep learning without first grasping the basics of machine learning.
- Published: 25 May 2023
Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii
Gary Liu, Denise B. Catacutan, Khushi Rathod, Kyle Swanson, Wengong Jin, Jody C. Mohammed, Anush Chiappino-Pepe, Saad A. Syed, Meghan Fragis, Kenneth Rachwalski, Jakob Magolan, Michael G. Surette, Brian K. Coombes, Tommi Jaakkola, Regina Barzilay, James J. Collins & Jonathan M. Stokes
Nature Chemical Biology (2023)
Subjects: Chemical tools, Small molecules, Virtual screening
Acinetobacter baumannii is a nosocomial Gram-negative pathogen that often displays multidrug resistance. Discovering new antibiotics against A. baumannii has proven challenging through conventional screening approaches. Fortunately, machine learning methods allow for the rapid exploration of chemical space, increasing the probability of discovering new antibacterial molecules. Here we screened ~7,500 molecules for those that inhibited the growth of A. baumannii in vitro. We trained a neural network with this growth inhibition dataset and performed in silico predictions for structurally new molecules with activity against A. baumannii . Through this approach, we discovered abaucin, an antibacterial compound with narrow-spectrum activity against A. baumannii . Further investigations revealed that abaucin perturbs lipoprotein trafficking through a mechanism involving LolE. Moreover, abaucin could control an A. baumannii infection in a mouse wound model. This work highlights the utility of machine learning in antibiotic discovery and describes a promising lead with targeted activity against a challenging Gram-negative pathogen.


Data Availability
GenBank accession numbers for sequencing of abaucin-resistant mutants are BankIt2629921 – OP677864, OP677865, OP677866 and OP677867. GEO accession numbers for RNA sequencing datasets are GSE214305 – GSM6603484 , GSM6603485 , GSM6603486 , GSM6603487 , GSM6603488 , GSM6603489 and GSM6603490 . Source data are provided with this paper.
Code Availability
All custom code used for antibiotic prediction is open source and can be accessed without restriction at https://github.com/chemprop/chemprop . A cloned snapshot used for this paper is available at https://github.com/GaryLiu152/chemprop_abaucin . All commercial software used is described in Methods. Source data are provided with this paper.
Acknowledgements
We thank S. French from McMaster University for technical assistance with fluorescence microscopy experiments. This work was supported by the David Braley Centre for Antibiotic Discovery (to J.M.S.); the Weston Family Foundation (POP and Catalyst to J.M.S.); the Audacious Project (to J.J.C. and J.M.S.); the C3.ai Digital Transformation Institute (to R.B.); the Abdul Latif Jameel Clinic for Machine Learning in Health (to R.B.); the DTRA Discovery of Medical Countermeasures Against New and Emerging (DOMANE) threats program (to R.B.); the DARPA Accelerated Molecular Discovery program (to R.B.); the Canadian Institutes of Health Research (FRN-156361 to B.K.C.); Genome Canada GAPP (OGI-146 to M.G.S.); the Canadian Institutes of Health Research (FRN-148713 to M.G.S.); the Faculty of Health Sciences of McMaster University (to J.M.); the Boris Family (to J.M.); a Marshall Scholarship (to K.S.); and the DOE BER (DE-FG02-02ER63445 to A.C-P.).
Author information
These authors contributed equally: Gary Liu, Denise B. Catacutan, Khushi Rathod.
Authors and Affiliations
Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
Gary Liu, Denise B. Catacutan, Khushi Rathod, Jody C. Mohammed, Meghan Fragis, Kenneth Rachwalski, Jakob Magolan, Brian K. Coombes & Jonathan M. Stokes
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
Kyle Swanson, Wengong Jin, Tommi Jaakkola & Regina Barzilay
Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
Anush Chiappino-Pepe & James J. Collins
Department of Genetics, Harvard Medical School, Boston, MA, USA
Anush Chiappino-Pepe
Department of Medicine, Department of Biochemistry and Biomedical Sciences, Farncombe Family Digestive Health Research Institute, McMaster University, Hamilton, Ontario, Canada
Saad A. Syed & Michael G. Surette
Department of Chemistry and Chemical Biology, McMaster University, Hamilton, Ontario, Canada
Meghan Fragis & Jakob Magolan
Abdul Latif Jameel Clinic for Machine Learning in Health, Massachusetts Institute of Technology, Cambridge, MA, USA
Regina Barzilay & James J. Collins
Department of Biological Engineering, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
James J. Collins
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Contributions
J.M.S. and J.J.C. conceptualized the study; J.M.S., G.L., K.S. and W.J. performed model building and training; J.M.S., D.B.C., K.R. and A.C-P. performed mechanistic investigations; J.M.S., K.R. and S.A.S. performed spectrum of activity experiments; J.C.M. conducted mouse model experiments; M.F. performed chemical synthesis; J.M.S. and J.J.C. wrote the paper; J.M.S., J.J.C., R.B., T.J., M.G.S., B.K.C. and J.M. supervised the research.
Corresponding authors
Correspondence to James J. Collins or Jonathan M. Stokes .
Ethics declarations
Competing interests
J.M.S. is cofounder and scientific director of Phare Bio. J.J.C. is cofounder and scientific advisory board chair of Phare Bio. J.J.C. is cofounder and scientific advisory board chair of Enbiotix. The other authors declare no competing interests.
Peer review
Peer review information
Nature Chemical Biology thanks Jean Francois Collet and the other, anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Model training data and prediction.
( a ) Replicate plot showing primary screening data of 7,684 small molecules for those that inhibited the growth of A. baumannii ATCC 17978 in LB medium at 50 µM. ( b ) Rank-ordered growth inhibition data of the prioritized 240 molecules from our prediction set that were selected for empirical validation (top); rank-ordered growth inhibition data of the 240 predicted molecules with the lowest prediction score (middle); rank-ordered growth inhibition data of the 240 predicted molecules with the highest prediction score that were not found in the training dataset (bottom). Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted. Dashed horizontal line represents the stringent hit cut-off of >80% growth inhibition at 50 µM. ( c ) Growth inhibition of A. baumannii by abaucin (blue) and serdemetan (red) in LB medium. Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted. The structure of serdemetan is shown. ( d ) Growth kinetics of A. baumannii cells after treatment with abaucin at varying concentrations for 6 hours. Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted.
Extended Data Fig. 2 Antibacterial activity of abaucin against human commensal species.
( a ) Growth inhibition of A. baumannii ATCC 17978 by ampicillin (blue) and ciprofloxacin (red) in LB medium. Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted. ( b ) Growth inhibition of B. breve by abaucin. Experiments were conducted in biological duplicate. ( c ) Growth inhibition of B. longum by abaucin. Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted. ( d ) Non-validated (see Fig. 2e ) growth inhibition of E. lenta by abaucin. Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted.
Extended Data Fig. 3 Abaucin mechanism of action.
( a–h ) Growth inhibition of wildtype A. baumannii (WT) and the four independent abaucin-resistant mutants by a collection of diverse antibiotics. From left to right for each plot, the mutants are: A362T variant 1, Y394F, intergenic, and A362T variant 2. Experiments were conducted in biological duplicate. Note that the abaucin-resistant mutants do not display cross-resistance to other antibiotics. ( i ) Structural prediction of wildtype A. baumannii LolE using RoseTTAFold (bottom), with the structural error estimate of each amino acid (top). Position 362 is highlighted orange and resides in a disordered region of the protein. ( j ) same as (i), except with the Y362T abaucin-resistant mutant of LolE. ( k ) RNA sequencing of wildtype A. baumannii treated with 5x MIC abaucin for 4.5 hr (top) or 6 hr (bottom). Data are the mean of biological duplicates. Transcript abundance is normalized to no-drug control cultures grown in identical conditions. Vertical black lines show statistical significance cut-off values. Note the highly significant downregulation of genes involved in the electron transport chain and transmembrane ion transport. ( l ) Growth inhibition of A. baumannii harboring an empty CRISPRi vector (red), or three distinct sgRNAs targeting lolE (blue, teal, and green). All strains were grown in LB medium without induction. Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted. ( m ) qPCR quantifying the expression of lolE relative to the housekeeping gene gltA (left) and gyrB (right) in all four abaucin resistant mutants, normalized to wildtype A. baumannii . Experiments were conducted in biological duplicate with technical triplicates. Bar height represents mean expression.
Supplementary information
Supplementary Information
Supplementary Tables 1–7 and Note.
Reporting Summary
Supplementary Data
Supplementary Data 1: Growth inhibition data against A. baumannii for model training. Supplementary Data 2: Model prediction scores of compounds in the Drug Repurposing Hub. Supplementary Data 3: Experimental validation of (prioritized/poorest/top 240) predictions from the Drug Repurposing Hub. Supplementary Data 4: GO enrichment for up- and down-regulated transcripts in A. baumannii treated with 5x MIC abaucin.
Source data
Source Data Fig. 4a
Raw data of measured bacterial load from mouse wound infection models.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, G., Catacutan, D.B., Rathod, K. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii . Nat Chem Biol (2023). https://doi.org/10.1038/s41589-023-01349-8
Received: 25 March 2022
Accepted: 25 April 2023
Published: 25 May 2023
DOI: https://doi.org/10.1038/s41589-023-01349-8
How to Read a Research Paper – A Guide to Setting Research Goals, Finding Papers to Read, and More
If you work in a scientific field, you should try to build a deep and unbiased understanding of that field. This not only educates you in the best possible way but also helps you envision the opportunities in your space.
A research paper is often the culmination of a wide range of deep and authentic practices surrounding a topic. When writing a research paper, the author thinks critically about the problem, performs rigorous research, evaluates their processes and sources, organizes their thoughts, and then writes. These genuinely-executed practices make for a good research paper.
If you're struggling (like I am) to build a regular habit of reading papers, I've tried to break down the whole process. I've talked to researchers in the field, read a bunch of papers and blogs from distinguished researchers, and jotted down some techniques that you can follow.
Let’s start off by understanding what a research paper is and what it is NOT!
What is a Research Paper?
A research paper is a dense and detailed manuscript that compiles a thorough understanding of a problem or topic. It presents a proposed solution and further research directions, along with the conditions under which the research was carried out, the efficacy of the solution, and potential loopholes in the study.
A research paper is written not only to provide an exceptional learning opportunity but also to pave the way for further advancements in the field. These papers help other scholars germinate the thought seed that can either lead to a new world of ideas or an innovative method of solving a longstanding problem.
What Research Papers are NOT
There is a common notion that a research paper is a well-informed summary of a problem or topic compiled from other sources.
But you shouldn't mistake it for a book or an opinionated account of an individual’s interpretation of a particular topic.
Why Should You Read Research Papers?
What I find fascinating about reading a good research paper is that you can draw on a profound study of a topic and engage with the community from a new perspective on what can be achieved in and around that topic.
I work at the intersection of instructional design and data science. Learning is part of my day-to-day responsibilities. If the source of my education is flawed or inefficient, I'd fail at my job in the long term. This applies to many other jobs in science, especially research-focused ones.
There are three important reasons to read a research paper:
- Knowledge — Understanding the problem from the eyes of someone who has probably spent years solving it and has taken care of all the edge cases that you might not think of at the beginning.
- Exploration — Whether you have a pinpointed agenda or not, there is a very high chance that you will stumble upon an edge case or a shortcoming that is worth following up. With persistent efforts over a considerable amount of time, you can learn to use that knowledge to make a living.
- Research and review — One of the main reasons for writing a research paper is to further development in the field. Researchers read papers to review them for conferences or to do a literature survey of a new field. For example, Yann LeCun's paper on integrating domain constraints into backpropagation set the foundation of modern computer vision back in 1989. After decades of research and development, we have come so far that we are now refining solutions to problems like object detection and optimizing autonomous vehicles.
Not only that, with the help of the internet, you can extrapolate all of these reasons or benefits onto multiple business models. It can be an innovative state-of-the-art product, an efficient service model, a content creator, or a dream job where you are solving problems that matter to you.
Goals for Reading a Research Paper — What Should You Read About?
The first thing to do is to figure out your motivation for reading the paper. There are two main scenarios that might lead you to read a paper:
- Scenario 1 — You have a well-defined agenda/goal and you are deeply invested in a particular field. For example, you’re an NLP practitioner and you want to learn how GPT-4 has given us a breakthrough in NLP. This is always a nice scenario to be in as it offers clarity.
- Scenario 2 — You want to keep abreast of developments across a host of areas, say how a new deep learning architecture helped solve a 50-year-old biological problem of understanding protein structures. This is often the case for beginners or for people who consume their daily dose of news from research papers (yes, they exist!).
If you’re an inquisitive beginner with no starting point in mind, start with scenario 2. Shortlist a few topics you want to read about until you find an area that you find intriguing. This will eventually lead you to scenario 1.
ML Reproducibility Challenge
In addition to these generic goals, if you need an end goal for your habit-building exercise of reading research papers, you should check out the ML reproducibility challenge.

You’ll find top-class papers from world-class conferences that are worth diving deep into and reproducing the results.
They conduct this challenge twice a year, and the next one is coming up in Spring 2021. You should study the past three versions of the challenge; I'll write a detailed post on what to expect, how to prepare, and so on.
Now you must be wondering – how can you find the right paper to read?
How to Find the Right Paper to Read
In order to get some ideas around this, I reached out to my friend, Anurag Ghosh who is a researcher at Microsoft. Anurag has been working at the crossover of computer vision, machine learning, and systems engineering.

Here are a few of his tips for getting started:
- Always pick an area you're interested in.
- Read a few good books or detailed blog posts on that topic and start diving deep by reading the papers referenced in those resources.
- Look for seminal papers around that topic. These are papers that report a major breakthrough in the field and offer a new method or perspective with huge potential for subsequent research. Check out papers from The Morning Paper, or the CVF test-of-time award and Helmholtz Prize winners (if you're interested in computer vision).
- Check out books like Computer Vision: Algorithms and Applications by Richard Szeliski and look for the papers referenced there.
- Have and build a sense of community. Find people who share similar interests, and join groups/subreddits/discord channels where such activities are promoted.
In addition to these invaluable tips, there are a number of web applications that I’ve shortlisted that help me narrow my search for the right papers to read:
- r/MachineLearning — there are many researchers, practitioners, and engineers who share their work along with the papers they've found useful in achieving those results.

- Arxiv Sanity Preserver — built by Andrej Karpathy to accelerate research. It is a repository of 142,846 papers from computer science, machine learning, systems, AI, Stats, CV, and so on. It also offers a bunch of filters, powerful search functionality, and a discussion forum to make for a super useful research platform.

- Google Research — the research teams at Google are working on problems that have an impact on our everyday lives. They share their publications for individuals and teams to learn from, contribute to, and expedite research. They also have a Google AI blog that you can check out.

How to Read a Research Paper
After you have stocked your to-read list comes the process of reading these papers. Remember that NOT every paper is useful to read, so we need a mechanism that can help us quickly screen out the papers that are worth reading.
To tackle this challenge, you can use the Three-Pass Approach by S. Keshav. Instead of starting at the beginning and reading in depth until the end, this approach proposes reading the paper in three passes.
The three pass approach
- The first pass — is a quick scan to capture a high-level view of the paper. Read the title, abstract, and introduction carefully followed by the headings of the sections and subsections and lastly the conclusion. It should take you no more than 5–10 mins to figure out if you want to move to the second pass.
- The second pass — a more focused read, still skipping the technical proofs. Take down all the crucial notes and underline the key points in the margins. Carefully study the figures, diagrams, and illustrations, review the graphs, and mark relevant unread references for further reading. This pass helps you understand the background of the paper.
- The third pass — reaching this pass means you've found a paper that you want to deeply understand or review. The key to the third pass is to reproduce the results of the paper: check all of its assumptions and note every place where your re-implementation and the original results diverge. Make a note of all the ideas for future analysis. This pass should take 5–6 hours for beginners and 1–2 hours for experienced readers.
Tools and Software to Keep Track of Your Pipeline of Papers
If you’re sincere about reading research papers, your list of papers will soon grow into an overwhelming stack that is hard to keep track of. Fortunately, we have software that can help us set up a mechanism to manage our research.
Here are a bunch of them that you can use:
- Mendeley [not free] — you can add papers directly to your library from your browser, import documents, generate references and citations, collaborate with fellow researchers, and access your library from anywhere. This is mostly used by experienced researchers.

- Zotero [free & open source] — Along the same lines as Mendeley but free of cost. You can make use of all the features but with limited storage space.

- Notion — this is great if you are just starting out and want something lightweight with the option to organize your papers, jot down notes, and manage everything in one workspace. It might not stand in comparison with the tools above, but I personally feel comfortable using Notion, and I have created this board to keep track of my progress that you can duplicate:

⚠️ Symptoms of Reading a Research Paper
Reading a research paper can turn out to be frustrating, challenging, and time-consuming especially when you’re a beginner. You might face the following common symptoms:
- You might start feeling dumb for not understanding a thing a paper says.
- Finding yourself pushing too hard to understand the math behind those proofs.
- Struggling to wrap your head around the number of acronyms used in the paper. Just kidding, you'll simply have to look up those acronyms every now and then.
- Being stuck on one paragraph for more than an hour.
Here’s a complete list of emotions that you might undergo as explained by Adam Ruben in this article .
Key Takeaways
We should be all set to dive right in. Here’s a quick summary of what we have covered here:
- A research paper is an in-depth study that offers a detailed explanation of a topic or problem along with the research process, proofs, explained results, and ideas for future work.
- Read research papers to develop a deep understanding of a topic/problem. Then you can either review papers as part of being a researcher, explore the domain and the kind of problems to build a solution or startup around it, or you can simply read them to keep abreast of the developments in your domain of interest.
- If you’re a beginner, start with exploration to soon find your path to goal-oriented research.
- In order to find good papers to read, you can use websites like arxiv-sanity, google research, and subreddits like r/MachineLearning.
- Reading approach — use the three-pass method to screen and then deeply read a paper.
- Keep track of your research, notes, and developments by using tools like Zotero or Notion.
- This can get overwhelming in no time. Make sure you start off easy and increment your load progressively.
Remember: Art is not a single method or step done over a weekend but a process of accomplishing remarkable results over time.
You can also watch the video on this topic on my YouTube channel:
Feel free to respond to this blog or comment on the video if you have some tips, questions, or thoughts!
If this tutorial was helpful, you should check out my data science and machine learning courses on Wiplane Academy. They are comprehensive yet compact and help you build a solid foundation of work to showcase.
International Conference on Robotics and Automation
Shaping the future.
Robotics and Artificial Intelligence Advancements at Georgia Tech are Creating Entirely New Paradigms of What Computing Technology Can Do
Georgia Tech at ICRA 2023
May 29 – June 2 | London
Pictured above: Georgia Tech ‘robot alum’ Curi turns 10 this year.
Georgia Tech @ ICRA is a joint effort by: Institute for Robotics and Intelligent Machines • Machine Learning Center • College of Computing
More than 70 researchers from across the College of Engineering and College of Computing are advancing the state of the art in robotics.
Discover the Experts at ICRA

Georgia Tech is a leading contributor (by number of papers) to ICRA 2023, a research venue focused on robotics and advancements in the development of embodied artificial intelligence.
Partner Organizations
Air Force Research Laboratory • Amazon Robotics • Aurora Innovation • Autodesk • California Institute of Technology • Caltech • CCDC US Army Research Laboratory • Clemson • Columbia University • ETH Zurich • Free University of Bozen-Bolzano • Google Brain • Instituto Tecnologico Y De Estudios Superiores De Monterrey • Johns Hopkins University • MIT • NASA Jet Propulsion Laboratory • NJIT • NVIDIA • Samsung • Simon Fraser University • Stanford • Toyota Research Institute • Umi 2958 Gt-Cnrs • Université De Lorraine • University of British Columbia • University of California, Irvine • University of California, San Diego • University of Edinburgh • University of Illinois Urbana-Champaign • University of Maryland • University of Oxford • University of Southern California • University of Toronto • University of Washington • University of Waterloo • USC Viterbi School of Engineering • USNWC PC
FEATURED RESEARCH
Researchers use novel approach to teach robot to navigate over obstacles.
By Nathan Deen
Quadrupedal robots may be able to step directly over obstacles in their paths thanks to the efforts of a trio of Georgia Tech Ph.D. students.
When it comes to robotic locomotion and navigation, Naoki Yokoyama says most four-legged robots are trained to regain their footing if an obstacle causes them to stumble. Working toward a larger effort to develop a housekeeping robot, Yokoyama and his collaborators — Simar Kareer and Joanne Truong — set out to train their robot to walk over clutter it might encounter in a home.
“The main motivation of the project is getting low-level control over the legs of the robot that also incorporates visual input,” said Yokoyama, a Ph.D. student within the School of Electrical and Computer Engineering. “We envisioned a controller that could be deployed in an indoor setting with a lot of clutter, such as shoes or toys on the ground of a messy home. Whereas blind locomotive controllers tend to be more reactive — if they step on something, they’ll make sure they don’t fall over — we wanted ours to use visual input to avoid stepping on the obstacle altogether.”

RESEARCH TEAM
(pictured at top, from 1st row, left to right):
Simar Kareer, Ph.D. student in computer vision
Joanne Truong, Ph.D. student in robotics
Naoki Yokoyama, Ph.D. student in electrical and computer engineering
Dhruv Batra, Associate Professor, Interactive Computing

Together We Swarm
Georgia Tech’s Yellow Jackets are tackling robotics research from a holistic perspective to develop safe, responsible AI systems that operate in the physical and virtual worlds. Learn about research areas and teams in the chart. Continue below to meet some of our experts and learn about their latest efforts.
Featured Research

Multi-Robot Systems
Together We Swarm, In Research & Robots
“Microrobots have great potential for healthcare and drug delivery; however, these applications are impeded by the inaccurate control of microrobots, especially in swarms.
By collaborating with roboticists, we were able to ‘close the gap’ between single-robot design and swarm control. All the different elements were there. We just made the connection.”
Systematically designing local interaction rules to achieve collective behaviors in robot swarms is a challenging endeavor, especially for microrobots, where size restrictions impose severe sensing, communication, and computation limits. New research demonstrates a systematic approach to controlling the behaviors of microrobots by leveraging the physical interactions in a swarm of 300 3-mm vibration-driven “micro bristle robots” that were designed and fabricated at Georgia Tech. The team’s investigations reveal how physics-driven interaction mechanisms can be exploited to achieve desired behaviors in minimally equipped robot swarms, and they highlight the specific ways in which hardware and software developments aid in achieving collision-induced aggregation.
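As a rough intuition for how aggregation can emerge from collisions alone, here is a toy NumPy simulation assuming a simple "stall on contact" rule; all parameters, the boundary handling, and the reorientation noise are illustrative assumptions rather than the team's actual model or hardware behavior.

```python
# Toy sketch (NumPy) of collision-induced aggregation in a swarm of
# simple self-propelled agents. Parameters are illustrative assumptions;
# this is only a cartoon of MIPS-style clustering, not the team's code.
import numpy as np

rng = np.random.default_rng(0)
N, R, SPEED, ARENA = 300, 0.02, 0.01, 1.0   # robots, radius, step, box size

pos = rng.random((N, 2)) * ARENA            # random starting positions
heading = rng.random(N) * 2 * np.pi         # random headings

for step in range(2000):
    # Each robot moves straight ahead, like a vibration-driven bristle bot.
    step_vec = SPEED * np.stack([np.cos(heading), np.sin(heading)], axis=1)
    pos_next = (pos + step_vec) % ARENA     # periodic boundary for simplicity

    # Robots whose next position overlaps a neighbor stall in place:
    # motility drops in crowded regions, which seeds and grows clusters.
    d = np.linalg.norm(pos_next[:, None, :] - pos_next[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    blocked = (d < 2 * R).any(axis=1)
    pos[~blocked] = pos_next[~blocked]

    # Stalled robots slowly reorient, so clusters can also evaporate.
    heading[blocked] += rng.normal(0.0, 0.3, blocked.sum())

print(f"{blocked.sum()} of {N} robots are in contact after the run")
```

The qualitative effect is the MIPS signature noted below: agents that slow down where the swarm is dense spend more time in dense regions, so clusters form without any sensing or communication.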
Why It Matters
- Microrobots have great potential for healthcare and drug delivery; however, these applications are impeded by the inaccurate control of microrobots, especially in swarms.
- The newly published work is the first demonstration of motility-induced phase separation (MIPS) behaviors on a swarm robotic platform.
- This research promises to help overcome current constraints on deploying microrobots, such as limits on motility, sensing, communication, and computation.

Force and torque sensors are commonly used today to give robots a sense of touch. They are often placed at the “wrist” of a robot arm and allow a robot to precisely sense how much force its gripper applies to the world. However, these sensors are expensive, complex, and fragile. To address these problems, we present Visual Force/Torque Sensing, a method to estimate force and torque without a dedicated sensor. We mount a camera on the robot, focused on the gripper, and train a machine learning algorithm to observe small deflections in the gripper and estimate the forces and torques that caused them. While our method is less accurate than a purpose-built force/torque sensor, it is 100x cheaper and can be used to make touch-sensitive robots that accomplish real-world tasks. (A simplified sketch of this image-to-wrench idea follows the list below.)
- Currently, robots are programmed to avoid obstacles, and touching the environment or a human is often viewed as a failure.
- Future robots that operate in homes and around people will need to make contact with the world to manipulate objects and collaborate with humans.
- Currently available touch sensors are expensive and impractical to mount on home robots.
- This work proposes a method to replace these expensive force sensors with a simple camera, allowing robots to be touch-sensitive around humans.
- The new method dramatically lowers the cost of touch sensing, making robots more accessible, capable, and safe.
- The new method is useful for potential applications in healthcare, such as making a bed and cleaning surfaces.
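Here is the minimal sketch referenced above, assuming PyTorch and synthetic stand-in data: a small network maps a camera view of the gripper to a six-axis force/torque estimate, supervised by readings from a conventional F/T sensor during data collection. Architecture, image size, and training details are illustrative, not the authors' implementation.

```python
# Minimal sketch (PyTorch) of visual force/torque sensing: regress the
# 6-D wrench that deflected the gripper from a camera image of it.
# Shapes, data, and training details are illustrative assumptions.
import torch
import torch.nn as nn

class VisualFTSensor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6),  # [Fx, Fy, Fz, Tx, Ty, Tz]
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.net(img)

model = VisualFTSensor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step: each gripper image is paired with the ground-truth
# wrench recorded by a real F/T sensor during data collection (faked here).
images = torch.randn(8, 3, 96, 96)
wrenches = torch.randn(8, 6)
loss = nn.functional.mse_loss(model(images), wrenches)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.3f}")
```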

Cross-Applications
From Graffiti to Growing Plants: Art Robot Retooled for Hydroponics
“This project highlights how Georgia Tech’s commitment to interdisciplinary research has led to unexpected applications of seemingly unrelated technologies, and serves as a testament to the value of exploring diverse fields of study and collaboration in order to develop innovative solutions to real-world problems.”
Researchers have applied technology developed for robotic art research to a hydroponics robot. The team originally built a cable-driven robot designed to spray-paint graffiti on large buildings. Because cable robot technology scales well to large workspaces, the same robot became an ideal fit for many agricultural applications. Through a collaboration with the N.E.W. Center for Agriculture Technology at Georgia Tech, a plant phenotyping robot was built, tested, and deployed in a hydroponic pilot farm on campus. The robot takes around 75 photos of each plant every day as the plant grows, then uses computer vision to construct 3D models of the plants, which can be used to non-destructively estimate properties such as crop biomass and photosynthetically active area. This allows for tracking, modeling, and predicting plant growth. (A simplified sketch of the measurement idea follows the list below.)
- Efficiency of resources in agriculture – using less water and fertilizer to produce more food – is becoming increasingly important.
- Advancements in agricultural efficiency are driven by a better understanding of plant growth and substrate dynamics: growers need to know how plants grow to know how much to feed them.
- This work addresses plant growth model accuracy; accuracy is currently limited by (1) the difficulty in collecting large-scale detailed data and (2) “noisy” data due to genetic and environmental variations.
- Robotics is advanced by combining two existing robot architectures – cable robots and serial manipulators – and using field tests to autonomously image plants at high quality from many angles.
- In leveraging computer vision algorithms, plant images can be used to generate measurements that are accurate enough to be used in developing more advanced plant growth models.
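As a simplified illustration of the measurement idea referenced above, the NumPy sketch below segments plant pixels with a naive "green dominance" rule and tracks projected canopy area across synthetic daily images. The real system reconstructs 3D models from roughly 75 views per plant, so the segmentation rule, the threshold, and the fake images here are purely illustrative assumptions.

```python
# Toy sketch (NumPy): estimate a plant-growth signal by segmenting green
# pixels in each day's image and tracking projected canopy area. The
# segmentation rule and synthetic data are illustrative assumptions.
import numpy as np

def canopy_area_fraction(rgb: np.ndarray) -> float:
    """Fraction of pixels where green dominates both red and blue."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    plant = (g > r + 10) & (g > b + 10)
    return plant.mean()

# Fake a week of daily images in which the 'plant' region grows.
rng = np.random.default_rng(1)
areas = []
for day in range(7):
    img = rng.integers(0, 80, (128, 128, 3), dtype=np.uint8)  # dark background
    size = 10 + 6 * day
    img[:size, :size, 1] = 200                                # growing green patch
    areas.append(canopy_area_fraction(img))

# A growth model can then be fit to the per-day area measurements.
growth_rate = np.polyfit(np.arange(7), np.array(areas), 1)[0]
print(f"daily area growth: {growth_rate:.4f} (fraction of frame per day)")
```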
Agriculture
Applying a Soft Touch to Berry Picking
“Our research in automated harvesting can be applied to soft fruits — such as blackberries, raspberries, loganberries, and grapes — that require intensive labor to harvest manually. Automating the harvesting process will allow for a drastic increase in productivity and decrease in human labor and time spent harvesting.”

New research outlines the design, fabrication, and testing of a soft robotic gripper for automated harvesting of blackberries and other fragile fruits. The gripper uses three rubber-based fingers that are uniformly actuated by fishing line, much as human tendons actuate fingers. It also includes a ripeness sensor that uses the reflectance of near-infrared (NIR) light to determine ripeness level, as well as an endoscopic camera for blackberry detection and image feedback for robot-arm manipulation. The gripper was used to harvest 139 berries with manual positioning in two separate field tests, and the retention force – the force required to detach a berry from its stem – and the average reflectance value were determined for both ripe and unripe blackberries. The soft gripper was then integrated onto a rigid robot arm and successfully harvested fifteen artificial blackberries in a lab setting using visual servoing, a technique in which the arm iteratively approaches the target using image feedback (a simplified sketch of this loop follows the list below).
- In the blackberry industry, about 40–50% of total labor hours are spent maintaining and harvesting the crop, and harvesting accounts for roughly 56% of the total cost of bringing blackberries to market.
- Automating the harvesting process will allow for a drastic increase in productivity and decrease in human labor/time spent harvesting.
- Very few studies have addressed automated harvesting of soft fruits such as blackberries; this research serves as a proof of concept and a spearhead for further technological development in the area.
- The research can be applied to other soft fruits such as raspberries, loganberries, grapes, etc.
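The sketch below, referenced in the description above, illustrates the harvesting control loop in plain Python: a proportional image-based visual-servoing step that nudges the arm toward the detected berry, plus a ripeness gate on the NIR reflectance reading. The detector, control gain, threshold value, and even the direction of the ripeness comparison are hypothetical placeholders, not the team's calibrated system.

```python
# Toy sketch of the visual-servoing pick loop. All helpers, numbers, and
# the ripeness comparison direction are hypothetical placeholders.
import numpy as np

GAIN = 0.002           # pixels -> meters, assumed proportional control gain
RIPE_THRESHOLD = 0.35  # assumed NIR reflectance below which a berry is ripe

def detect_berry(image: np.ndarray) -> np.ndarray:
    """Hypothetical detector: pixel (u, v) of the berry in the image."""
    return np.array([300.0, 260.0])

def servo_step(image: np.ndarray, image_center=(320.0, 240.0)) -> np.ndarray:
    """One image-based servoing step: command a small corrective motion
    that drives the detected berry toward the image center. The sign
    depends on the assumed camera/arm frame alignment."""
    error = detect_berry(image) - np.asarray(image_center)
    return -GAIN * error

def should_pick(nir_reflectance: float) -> bool:
    """Toy ripeness gate: assume ripe berries reflect less NIR light."""
    return nir_reflectance < RIPE_THRESHOLD

frame = np.zeros((480, 640, 3), dtype=np.uint8)
print("arm correction (m):", servo_step(frame))
print("pick?", should_pick(0.28))
```

In the real system this loop would run each frame, with the arm converging on the berry before the gripper's tendon-driven fingers close around it.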

Explore the Papers
HOW TO READ:
- Gold = Georgia Tech authors
- 1 row = 1 team (lead author is labeled)
- Left column = GT-led teams
- Right column = Teams with GT
- Bars sorted by percent of GT contributors on each team
RESEARCH & ACTIVITIES

Georgia Tech is working with more than 35 organizations on robotics research in ICRA’s main program. Explore the partnerships through the network chart and see the top institutions by number of collaborations.

See you in London!
Project Lead and Web Development: Joshua Preston • Writer: Nathan Deen • Photography and Visual Media: Kevin Beasley, Gerry Chen, Jeremy Collins, Christa Ernst, Joanne Truong • Interactive Data Visualizations and Data Analysis: Joshua Preston

