Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions

SN Computer Science, volume 2, Article number: 420 (2021)

Deep learning (DL), a branch of machine learning (ML) and artificial intelligence (AI), is nowadays considered a core technology of today’s Fourth Industrial Revolution (4IR or Industry 4.0). Due to its ability to learn from data, DL technology, which originated from the artificial neural network (ANN), has become a hot topic in computing and is widely applied in areas such as healthcare, visual recognition, text analytics, cybersecurity, and many more. However, building an appropriate DL model is a challenging task because of the dynamic nature and variations of real-world problems and data. Moreover, the lack of core understanding turns DL methods into black-box machines, which hampers development at the standard level. This article presents a structured and comprehensive view of DL techniques, including a taxonomy that considers various types of real-world tasks such as supervised or unsupervised ones. In our taxonomy, we take into account deep networks for supervised or discriminative learning, unsupervised or generative learning, as well as hybrid learning and relevant others. We also summarize real-world application areas where deep learning techniques can be used. Finally, we point out ten potential aspects of future-generation DL modeling with research directions. Overall, this article aims to draw a big picture of DL modeling that can be used as a reference guide for both academia and industry professionals.


In the late 1980s, neural networks became a prevalent topic in Machine Learning (ML) and Artificial Intelligence (AI), owing to the invention of various efficient learning methods and network structures [ 52 ]. Multilayer perceptron networks trained by “Backpropagation”-type algorithms, self-organizing maps, and radial basis function networks were among these innovations [ 26 , 36 , 37 ]. Although neural networks were used successfully in many applications, research interest in the topic later declined. Then, in 2006, “Deep Learning” (DL) was introduced by Hinton et al. [ 41 ], based on the concept of the artificial neural network (ANN). Deep learning subsequently became a prominent topic and led to a rebirth of neural network research; hence it is sometimes referred to as “new-generation neural networks”. This is because deep networks, when properly trained, have achieved significant success in a variety of classification and regression challenges [ 52 ].

Nowadays, DL technology is considered one of the hot topics within machine learning, artificial intelligence, and data science and analytics, due to its ability to learn from the given data. Many corporations, including Google, Microsoft, and Nokia, study it actively because it can provide significant results in different classification and regression problems and datasets [ 52 ]. In terms of working domain, DL is a subset of ML and AI, and thus can be seen as an AI function that mimics the human brain’s processing of data. The worldwide popularity of “Deep learning” is increasing day by day, as shown in our earlier paper [ 96 ] based on historical data collected from Google Trends [ 33 ]. Deep learning differs from standard machine learning in how well it performs as the volume of data increases, discussed briefly in Section “ Why Deep Learning in Today's Research and Applications? ”. DL technology uses multiple layers to represent abstractions of data and build computational models. While a deep learning model takes a long time to train because of its large number of parameters, it takes a short amount of time to run during testing compared with other machine learning algorithms [ 127 ].

While today’s Fourth Industrial Revolution (4IR or Industry 4.0) typically focuses on technology-driven “automation, smart and intelligent systems”, DL technology, which originated from ANN, has become one of the core technologies to achieve that goal [ 103 , 114 ]. A typical neural network is mainly composed of many simple, connected processing elements called neurons, each of which generates a series of real-valued activations for the target outcome. Figure 1 shows a schematic representation of the mathematical model of an artificial neuron, i.e., a processing element, highlighting the input ( \(X_i\) ), weight ( w ), bias ( b ), summation function ( \(\sum\) ), activation function ( f ) and corresponding output signal ( y ). Neural network-based DL technology is now widely applied in many fields and research areas such as healthcare, sentiment analysis, natural language processing, visual recognition, business intelligence, cybersecurity, and many more, which are summarized in the latter part of this paper.

figure 1

Schematic representation of the mathematical model of an artificial neuron (processing element), highlighting input ( \(X_i\) ), weight ( w ), bias ( b ), summation function ( \(\sum\) ), activation function ( f ) and output signal ( y )
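The neuron model in Fig. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sigmoid activation, the function names, and the example numbers are chosen here for demonstration and are not taken from the paper. The output is computed as y = f(sum of w_i * X_i, plus b).

```python
import math

def sigmoid(z):
    """Logistic activation, one possible choice for f (an assumption)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Single artificial neuron: weighted sum of inputs plus bias, then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # summation function
    return activation(z)                                    # output signal y

# Illustrative values only: the weighted sum is 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0
y = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(y)  # 0.5, since sigmoid(0) = 0.5
```

Swapping `activation` for another function (for example a rectifier) changes the neuron's response without changing the weighted-sum step.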

Although DL models have been applied successfully in the application areas mentioned above, building an appropriate deep learning model is a challenging task because of the dynamic nature and variations of real-world problems and data. Moreover, DL models are typically considered “black-box” machines, which hampers the standard development of deep learning research and applications. Thus, for clear understanding, in this paper we present a structured and comprehensive view of DL techniques considering the variations in real-world problems and tasks. To achieve our goal, we briefly discuss various DL techniques and present a taxonomy with three major categories: (i) deep networks for supervised or discriminative learning, which are utilized to provide a discriminative function in supervised deep learning or classification applications; (ii) deep networks for unsupervised or generative learning, which are used to characterize the high-order correlation properties or features for pattern analysis or synthesis and thus can be used as preprocessing for a supervised algorithm; and (iii) deep networks for hybrid learning, which integrate both supervised and unsupervised models, and relevant others. We choose these categories based on the nature and learning capabilities of different DL techniques and how they are used to solve problems in real-world applications [ 97 ]. Moreover, identifying key research issues and prospects, including effective data representation, new algorithm design, data-driven hyper-parameter learning and model optimization, integrating domain knowledge, and adapting to resource-constrained devices, is one of the key targets of this study, which can lead to “Future Generation DL-Modeling”. Thus, the goal of this paper is to serve as a reference guide for those in academia and industry who want to research and develop data-driven smart and intelligent systems based on DL techniques.

The overall contribution of this paper is summarized as follows:

This article focuses on different aspects of deep learning modeling, i.e., the learning capabilities of DL techniques in different dimensions such as supervised or unsupervised tasks, to function in an automated and intelligent manner, which can play as a core technology of today’s Fourth Industrial Revolution (Industry 4.0).

We explore a variety of prominent DL techniques and present a taxonomy by taking into account the variations in deep learning tasks and how they are used for different purposes. In our taxonomy, we divide the techniques into three major categories such as deep networks for supervised or discriminative learning, unsupervised or generative learning, as well as deep networks for hybrid learning, and relevant others.

We have summarized several potential real-world application areas of deep learning, to assist developers as well as researchers in broadening their perspectives on DL techniques. Different categories of DL techniques highlighted in our taxonomy can be used to solve various issues accordingly.

Finally, we point out and discuss ten potential aspects with research directions for future generation DL modeling in terms of conducting future research and system development.

This paper is organized as follows. Section “ Why Deep Learning in Today's Research and Applications? ” motivates why deep learning is important for building data-driven intelligent systems. In Section “ Deep Learning Techniques and Applications ”, we present our DL taxonomy, taking into account the variations of deep learning tasks and how they are used in solving real-world issues, and briefly discuss the techniques while summarizing their potential application areas. In Section “ Research Directions and Future Aspects ”, we discuss various research issues of deep learning-based modeling and highlight the promising topics for future research within the scope of our study. Finally, Section “ Concluding Remarks ” concludes this paper.

Why Deep Learning in Today’s Research and Applications?

The main focus of today’s Fourth Industrial Revolution (Industry 4.0) is typically technology-driven automation and smart, intelligent systems in various application areas including smart healthcare, business intelligence, smart cities, cybersecurity intelligence, and many more [ 95 ]. Deep learning approaches have grown dramatically in terms of performance in a wide range of applications, including security technologies, particularly as an excellent solution for uncovering complex structure in high-dimensional data. Thus, DL techniques can play a key role in building intelligent data-driven systems according to today’s needs, because of their excellent ability to learn from historical data. Consequently, DL can change the world as well as humans’ everyday life through its automation power and learning from experience. DL technology is therefore relevant to artificial intelligence [ 103 ], machine learning [ 97 ] and data science with advanced analytics [ 95 ], which are well-known areas in computer science and, particularly, in today’s intelligent computing. In the following, we first discuss the position of deep learning in AI, i.e., how DL technology is related to these areas of computing.

The Position of Deep Learning in AI

Nowadays, artificial intelligence (AI), machine learning (ML), and deep learning (DL) are three popular terms that are sometimes used interchangeably to describe systems or software that behave intelligently. In Fig. 2 , we illustrate the position of deep learning compared with machine learning and artificial intelligence. According to Fig. 2 , DL is a part of ML as well as a part of the broad area of AI. In general, AI incorporates human behavior and intelligence into machines or systems [ 103 ], while ML is the method of learning from data or experience [ 97 ], which automates analytical model building. DL also represents methods of learning from data, where the computation is done through multi-layer neural networks and processing. The term “Deep” in the deep learning methodology refers to the concept of multiple levels or stages through which data is processed for building a data-driven model.

figure 2

An illustration of the position of deep learning (DL) compared with machine learning (ML) and artificial intelligence (AI)

Thus, DL can be considered one of the core technologies of AI, a frontier for artificial intelligence, which can be used for building intelligent systems and automation. More importantly, it pushes AI to a new level, termed “Smarter AI”. As DL is capable of learning from data, there is a strong relation between deep learning and “Data Science” [ 95 ] as well. Typically, data science represents the entire process of finding meaning or insights in data in a particular problem domain, where DL methods can play a key role in advanced analytics and intelligent decision-making [ 104 , 106 ]. Overall, we can conclude that DL technology is capable of changing the current world, particularly as a powerful computational engine, and can contribute to technology-driven automation and smart, intelligent systems accordingly, meeting the goal of Industry 4.0.

Understanding Various Forms of Data

As DL models learn from data, an in-depth understanding and representation of data are important to build a data-driven intelligent system in a particular application area. In the real world, data can be in various forms, which typically can be represented as below for deep learning modeling:

Sequential Data Sequential data is any kind of data where the order matters, i.e., a set of sequences. A model built on such data needs to account explicitly for the sequential nature of the input. Text streams, audio fragments, video clips, and time-series data are some examples of sequential data.

Image or 2D Data A digital image is made up of a matrix, i.e., a rectangular 2D array of numbers, symbols, or expressions arranged in rows and columns. Matrix, pixels, voxels, and bit depth are the four essential characteristics or fundamental parameters of a digital image.

Tabular Data A tabular dataset consists primarily of rows and columns. Thus tabular datasets contain data in a columnar format as in a database table. Each column (field) must have a name and each column may only contain data of the defined type. Overall, it is a logical and systematic arrangement of data in the form of rows and columns that are based on data properties or features. Deep learning models can learn efficiently on tabular data and allow us to build data-driven intelligent systems.
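As a rough illustration of the three data forms above, the sketch below represents each one with plain Python structures. All values, column names, and the `validate` helper are hypothetical and exist only for this example.

```python
# Sequential data: the order of the elements carries meaning (e.g. a time series).
sequence = [0.10, 0.40, 0.35, 0.80]

# Image or 2D data: a matrix of pixel intensities arranged in rows and columns.
image = [
    [0, 255, 0],
    [255, 255, 255],
    [0, 255, 0],
]

# Tabular data: every column has a name and a defined type.
columns = {"age": int, "income": float}
rows = [
    {"age": 34, "income": 52000.0},
    {"age": 51, "income": 61500.5},
]

def validate(rows, columns):
    """Check that each row has exactly the named columns with the defined types."""
    return all(
        set(row) == set(columns)
        and all(isinstance(row[name], typ) for name, typ in columns.items())
        for row in rows
    )

print(validate(rows, columns))  # True
```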

The data forms discussed above are common in the real-world application areas of deep learning. Different categories of DL techniques perform differently depending on the nature and characteristics of the data, as discussed briefly in Section “ Deep Learning Techniques and Applications ” together with our taxonomy. However, in many real-world application areas, standard machine learning techniques, particularly logic-rule or tree-based techniques [ 93 , 101 ], can perform well depending on the nature of the application. Figure 3 also shows the performance comparison of DL and ML modeling considering the amount of data. In the following, we highlight several cases where deep learning is useful for solving real-world problems, according to our main focus in this paper.

DL Properties and Dependencies

A DL model typically follows the same processing stages as machine learning modeling. Figure 4 shows a deep learning workflow for solving real-world problems, which consists of three processing steps: data understanding and preprocessing, DL model building and training, and validation and interpretation. However, unlike ML modeling [ 98 , 108 ], feature extraction in a DL model is automated rather than manual. K-nearest neighbor, support vector machines, decision tree, random forest, naive Bayes, linear regression, association rules, and k-means clustering are some examples of machine learning techniques that are commonly used in various application areas [ 97 ]. On the other hand, DL models include the convolutional neural network, recurrent neural network, autoencoder, deep belief network, and many more, discussed briefly with their potential application areas in Section “ Deep Learning Techniques and Applications ”. In the following, we discuss the key properties and dependencies of DL techniques that need to be taken into account before starting to work on DL modeling for real-world applications.

figure 3

An illustration of the performance comparison between deep learning (DL) and other machine learning (ML) algorithms, where DL modeling from large amounts of data can increase the performance

Data Dependencies Deep learning is typically dependent on a large amount of data to build a data-driven model for a particular problem domain. The reason is that when the data volume is small, deep learning algorithms often perform poorly [ 64 ]. In such circumstances, however, the performance of the standard machine-learning algorithms will be improved if the specified rules are used [ 64 , 107 ].

Hardware Dependencies DL algorithms require a large number of computational operations while training a model on large datasets. Because the advantage of a GPU over a CPU grows with the amount of computation, GPUs are mostly used to carry out these operations efficiently. Thus, GPU hardware is generally needed for deep learning training to work well in practice. Therefore, DL relies more on high-performance machines with GPUs than standard machine learning methods do [ 19 , 127 ].

Feature Engineering Process Feature engineering is the process of extracting features (characteristics, properties, and attributes) from raw data using domain knowledge. A fundamental distinction between DL and other machine-learning techniques is the attempt to extract high-level characteristics directly from data [ 22 , 97 ]. Thus, DL decreases the time and effort required to construct a feature extractor for each problem.

Model Training and Execution time In general, training a deep learning algorithm takes a long time because of the large number of parameters in the model. For instance, DL models can take more than one week to complete a training session, whereas training with ML algorithms takes relatively little time, from seconds to hours [ 107 , 127 ]. During testing, deep learning algorithms take very little time to run [ 127 ] compared with certain machine learning methods.

Black-box Perception and Interpretability Interpretability is an important factor when comparing DL with ML. It is difficult to explain how a deep learning result was obtained, i.e., the model is a “black-box”. On the other hand, machine-learning algorithms, particularly rule-based machine learning techniques [ 97 ], provide explicit logic rules (IF-THEN) for making decisions that are easily interpretable for humans. For instance, in our earlier works, we have presented several rule-based machine learning techniques [ 100 , 102 , 105 ], where the extracted rules are human-understandable and easier to interpret, update or delete according to the target applications.

The most significant distinction between deep learning and regular machine learning is how well it performs as the amount of data grows. An illustration of the performance comparison between DL and standard ML algorithms is shown in Fig. 3 , where the performance of DL modeling increases with the amount of data. Thus, DL modeling is extremely useful when dealing with a large amount of data because of its capacity to process vast numbers of features to build an effective data-driven model. Developing and training DL models relies on parallelized matrix and tensor operations as well as on computing gradients and optimization. Several DL libraries and resources [ 30 ], such as PyTorch [ 82 ] (with a high-level API called Lightning) and TensorFlow [ 1 ] (which also offers Keras as a high-level API), offer these core utilities, including many pre-trained models, as well as many other functions needed for implementation and DL model building.

figure 4

A typical DL workflow to solve real-world problems, which consists of three sequential stages (i) data understanding and preprocessing (ii) DL model building and training (iii) validation and interpretation

Deep Learning Techniques and Applications

In this section, we go through the various types of deep neural network techniques, which typically use several layers of information-processing stages in hierarchical structures to learn. A typical deep neural network contains multiple hidden layers in addition to the input and output layers. Figure 5 shows the general structure of a deep neural network ( \(hidden \; layer=N\) and N \(\ge\) 2) compared with a shallow network ( \(hidden \; layer=1\) ). In this section, we also present our taxonomy of DL techniques based on how they are used to solve various problems. However, before exploring the details of the DL techniques, it is useful to review the various types of learning tasks: (i) Supervised: a task-driven approach that uses labeled training data, (ii) Unsupervised: a data-driven process that analyzes unlabeled datasets, (iii) Semi-supervised: a hybridization of both the supervised and unsupervised methods, and (iv) Reinforcement: an environment-driven approach, discussed briefly in our earlier paper [ 97 ]. Thus, to present our taxonomy, we divide DL techniques broadly into three major categories: (i) deep networks for supervised or discriminative learning; (ii) deep networks for unsupervised or generative learning; and (iii) deep networks for hybrid learning combining both, and relevant others, as shown in Fig. 6 . In the following, we briefly discuss each of these techniques that can be used to solve real-world problems in various application areas according to their learning capabilities.

figure 5

A general architecture of (a) a shallow network with one hidden layer and (b) a deep neural network with multiple hidden layers

figure 6

A taxonomy of DL techniques, broadly divided into three major categories: (i) deep networks for supervised or discriminative learning, (ii) deep networks for unsupervised or generative learning, and (iii) deep networks for hybrid learning and relevant others

Deep Networks for Supervised or Discriminative Learning

This category of DL techniques is utilized to provide a discriminative function in supervised or classification applications. Discriminative deep architectures are typically designed to give discriminative power for pattern classification by describing the posterior distributions of classes conditioned on visible data [ 21 ]. Discriminative architectures mainly include Multi-Layer Perceptron (MLP), Convolutional Neural Networks (CNN or ConvNet), Recurrent Neural Networks (RNN), along with their variants. In the following, we briefly discuss these techniques.

Multi-layer Perceptron (MLP)

Multi-layer Perceptron (MLP), a supervised learning approach [ 83 ], is a type of feedforward artificial neural network (ANN). It is also known as the foundation architecture of deep neural networks (DNN) or deep learning. A typical MLP is a fully connected network that consists of an input layer that receives input data, an output layer that makes a decision or prediction about the input signal, and one or more hidden layers between these two that are considered the network’s computational engine [ 36 , 103 ]. The output of an MLP network is determined using a variety of activation functions, also known as transfer functions, such as ReLU (Rectified Linear Unit), Tanh, Sigmoid, and Softmax [ 83 , 96 ]. An MLP is trained with “Backpropagation” [ 36 ], the most extensively used algorithm and a supervised learning technique, which is also known as the most basic building block of a neural network. During the training process, various optimization approaches such as Stochastic Gradient Descent (SGD), Limited Memory BFGS (L-BFGS), and Adaptive Moment Estimation (Adam) are applied. MLP requires tuning of several hyperparameters such as the number of hidden layers, neurons, and iterations, which can make solving a complicated model computationally expensive. However, through partial fit, MLP offers the advantage of learning non-linear models in real-time or online [ 83 ].
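The forward pass of a small fully connected MLP can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, weights, and helper names are assumptions, and training by backpropagation is deliberately left out. Hidden layers use ReLU and the output layer uses Softmax, two of the activation functions named above.

```python
import math

def relu(values):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Softmax with max-subtraction for numerical stability."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, biases):
    """Fully connected layer: one row of weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """layers is a list of (weights, biases); ReLU on hidden layers, softmax on the output."""
    h = x
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(h, weights, biases))

# Hypothetical 2-2-2 network with hand-picked weights.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),   # output layer
]
probs = mlp_forward([1.0, 2.0], layers)
print(probs)  # two class probabilities that sum to 1
```

In practice the weights would be learned with backpropagation and an optimizer such as SGD or Adam rather than set by hand.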

Convolutional Neural Network (CNN or ConvNet)

The Convolutional Neural Network (CNN or ConvNet) [ 65 ] is a popular discriminative deep learning architecture that learns directly from the input without the need for manual feature extraction. As a result, the CNN enhances the design of traditional ANNs such as regularized MLP networks. Figure 7 shows an example of a CNN including multiple convolution and pooling layers. Each layer in a CNN takes into account optimum parameters for a meaningful output and reduces model complexity. A CNN can also use ‘dropout’ [ 30 ] to deal with the problem of over-fitting, which may occur in a traditional network.

figure 7

An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and pooling layers

CNNs are specifically designed to deal with a variety of 2D shapes and are thus widely employed in visual recognition, medical image analysis, image segmentation, natural language processing, and many more [ 65 , 96 ]. The capability of automatically discovering essential features from the input without human intervention makes the CNN more powerful than a traditional network. Several variants of CNN exist, including visual geometry group (VGG) [ 38 ], AlexNet [ 62 ], Xception [ 17 ], Inception [ 116 ], ResNet [ 39 ], etc., which can be used in various application domains according to their learning capabilities.
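The two layer types that make up a CNN, convolution and pooling, can be illustrated on a tiny 2D input. The sketch below is an assumption-laden toy: the image, the hand-written edge kernel, and the function names are invented for this example, and, as in most DL libraries, the "convolution" is computed as a cross-correlation without padding and with stride 1.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# 5x5 toy image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written vertical-edge kernel; a trained CNN would learn such weights.
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, kernel)  # 4x4, every row is [0, 2, 0, 0]
pooled = max_pool2d(feature_map)     # 2x2
print(pooled)  # [[2, 0], [2, 0]]
```

The feature map responds only where the edge is, and pooling halves each spatial dimension while keeping that response.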

Recurrent Neural Network (RNN) and its Variants

A Recurrent Neural Network (RNN) is another popular neural network, which employs sequential or time-series data and feeds the output from the previous step as input to the current step [ 27 , 74 ]. Like feedforward networks and CNNs, recurrent networks learn from training input; however, they are distinguished by their “memory”, which allows information from previous inputs to influence the current input and output. Unlike a typical DNN, which assumes that inputs and outputs are independent of one another, the output of an RNN depends on prior elements within the sequence. However, standard recurrent networks suffer from vanishing gradients, which makes learning long data sequences challenging. In the following, we discuss several popular variants of the recurrent network that mitigate these issues and perform well in many real-world application domains.

Long short-term memory (LSTM) This popular form of RNN architecture, introduced by Hochreiter et al. [ 42 ], uses special units to deal with the vanishing gradient problem. A memory cell in an LSTM unit can store data for long periods, and the flow of information into and out of the cell is managed by three gates. For instance, the ‘Forget Gate’ determines which information from the previous cell state is retained and which is removed because it is no longer useful, while the ‘Input Gate’ determines which information should enter the cell state and the ‘Output Gate’ determines and controls the outputs. As it solves the issues of training a recurrent network, the LSTM network is considered one of the most successful RNNs.

Bidirectional RNN/LSTM Bidirectional RNNs connect two hidden layers that run in opposite directions to a single output, allowing them to accept data from both the past and the future. Bidirectional RNNs, unlike traditional recurrent networks, are trained to predict in both the positive and negative time directions at the same time. A Bidirectional LSTM, often known as a BiLSTM, is an extension of the standard LSTM that can increase model performance on sequence classification problems [ 113 ]. It is a sequence processing model comprising two LSTMs: one takes the input forward and the other takes it backward. Bidirectional LSTM in particular is a popular choice in natural language processing tasks.

Gated recurrent units (GRUs) A Gated Recurrent Unit (GRU), introduced by Cho et al. [ 16 ], is another popular variant of the recurrent network that uses gating methods to control and manage information flow between cells in the neural network. The GRU is similar to an LSTM but has fewer parameters, as it has a reset gate and an update gate but lacks the output gate, as shown in Fig. 8 . Thus, the key difference between a GRU and an LSTM is that a GRU has two gates (reset and update gates) whereas an LSTM has three gates (namely input, output and forget gates). The GRU’s structure enables it to capture dependencies from long sequences of data in an adaptive manner, without discarding information from earlier parts of the sequence. Thus the GRU is a slightly more streamlined variant that often offers comparable performance and is significantly faster to compute [ 18 ]. Although GRUs have been shown to exhibit better performance on certain smaller and less frequent datasets [ 18 , 34 ], both variants of RNN have proven effective in practice.

figure 8

Basic structure of a gated recurrent unit (GRU) cell consisting of reset and update gates
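To make the gate descriptions above concrete, the sketch below implements one time step of an LSTM cell and of a GRU cell with scalar states. It follows a common textbook formulation, which is an assumption here rather than something specified in the paper; the parameter names and values are hypothetical, and some references swap the roles of z and (1 - z) in the GRU update.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pre_activation(params, name, x, h):
    """w*x + u*h + b for the parameter group `name` (scalar toy version)."""
    w, u, b = params[name]
    return w * x + u * h + b

def lstm_step(x, h_prev, c_prev, params):
    f = sigmoid(pre_activation(params, "forget", x, h_prev))  # keep how much of c_prev
    i = sigmoid(pre_activation(params, "input", x, h_prev))   # admit how much new content
    o = sigmoid(pre_activation(params, "output", x, h_prev))  # expose how much of the cell
    g = math.tanh(pre_activation(params, "cand", x, h_prev))  # candidate cell content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

def gru_step(x, h_prev, params):
    z = sigmoid(pre_activation(params, "update", x, h_prev))  # update gate
    r = sigmoid(pre_activation(params, "reset", x, h_prev))   # reset gate
    w, u, b = params["cand"]
    h_cand = math.tanh(w * x + u * (r * h_prev) + b)
    return (1.0 - z) * h_prev + z * h_cand                    # no separate output gate

# Forget gate nearly open, input gate nearly closed: the cell state is carried over.
lstm_params = {"forget": (0.0, 0.0, 20.0), "input": (0.0, 0.0, -20.0),
               "output": (0.0, 0.0, 0.0), "cand": (1.0, 1.0, 0.0)}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.8, params=lstm_params)

# Update gate nearly closed: the previous hidden state is carried over.
gru_params = {"update": (0.0, 0.0, -20.0), "reset": (0.0, 0.0, 0.0),
              "cand": (1.0, 1.0, 0.0)}
h_gru = gru_step(x=1.0, h_prev=0.3, params=gru_params)
print(round(c, 6), round(h_gru, 6))  # 0.8 0.3
```

The LSTM sketch needs four parameter groups (three gates plus the candidate) while the GRU needs three, which is one way to see why the GRU has fewer parameters.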

Overall, the basic property of a recurrent network is that it has at least one feedback connection, which enables activations to loop. This allows the network to do temporal processing and sequence learning, such as sequence recognition or reproduction, temporal association or prediction, etc. Popular application areas of recurrent networks include prediction problems, machine translation, natural language processing, text summarization, speech recognition, and many more.
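The feedback connection and the bidirectional idea described in this subsection can be sketched with a scalar hidden state. This is a simplified illustration under assumptions: the tanh recurrence, the parameter values, and the function names are chosen for this example only.

```python
import math

def rnn_forward(xs, w_x, w_h, b, h0=0.0):
    """Simple recurrence: each state depends on the current input and the previous state."""
    h, states = h0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)  # feedback through w_h * h
        states.append(h)
    return states

def birnn_forward(xs, fwd_params, bwd_params):
    """Run one pass left-to-right and one right-to-left, then pair the states per step."""
    forward = rnn_forward(xs, *fwd_params)
    backward = rnn_forward(xs[::-1], *bwd_params)[::-1]
    return list(zip(forward, backward))

params = (1.0, 0.5, 0.0)  # hypothetical (w_x, w_h, b)

# "Memory": the second input is 0, yet the state is non-zero because of the first input.
states = rnn_forward([1.0, 0.0], *params)

# Changing only the last input affects the backward state at step 0, not the forward one.
a = birnn_forward([1.0, 0.0, 0.0], params, params)
b = birnn_forward([1.0, 0.0, 2.0], params, params)
print(a[0][0] == b[0][0], a[0][1] == b[0][1])  # True False
```

Each time step of the bidirectional output therefore sees context from both the past and the future of the sequence.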

Deep Networks for Generative or Unsupervised Learning

This category of DL techniques is typically used to characterize the high-order correlation properties or features for pattern analysis or synthesis, as well as the joint statistical distributions of the visible data and their associated classes [ 21 ]. The key idea of generative deep architectures is that during the learning process, precise supervisory information such as target class labels is not of concern. As a result, the methods under this category are essentially applied to unsupervised learning, as they are typically used for feature learning or data generation and representation [ 20 , 21 ]. Thus generative modeling can also be used as preprocessing for supervised learning tasks, which helps improve the accuracy of the discriminative model. Commonly used deep neural network techniques for unsupervised or generative learning are Generative Adversarial Network (GAN), Autoencoder (AE), Restricted Boltzmann Machine (RBM), Self-Organizing Map (SOM), and Deep Belief Network (DBN) along with their variants.

Generative Adversarial Network (GAN)

A Generative Adversarial Network (GAN), introduced by Goodfellow et al. [ 32 ], is a type of neural network architecture for generative modeling that creates new plausible samples on demand. It involves automatically discovering and learning regularities or patterns in input data so that the model can generate new examples that could plausibly have been drawn from the original dataset. As shown in Fig. 9 , GANs are composed of two neural networks: a generator G that creates new data having properties similar to the original data, and a discriminator D that predicts the likelihood that a given sample was drawn from the actual data rather than produced by the generator. Thus in GAN modeling, the generator and discriminator are trained to compete with each other. While the generator tries to fool the discriminator by creating more realistic data, the discriminator tries to distinguish the genuine data from the fake data generated by G .

figure 9

Schematic structure of a standard generative adversarial network (GAN)
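The competition between G and D can be expressed through the standard GAN value function V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which D tries to maximize and G tries to minimize. The sketch below computes the corresponding batch losses from discriminator outputs in plain Python. The function names are hypothetical, and the generator loss shown is the commonly used non-saturating variant (maximizing log D(G(z))), which is an assumption of this sketch rather than something specified above.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Negative batch estimate of V(D, G).

    d_real: D(x) for real samples; d_fake: D(G(z)) for generated samples.
    All values are probabilities strictly between 0 and 1.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake):
    """Non-saturating generator loss: -mean(log D(G(z)))."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)
```

When the discriminator cannot tell real from generated data, it outputs 0.5 everywhere and its loss equals 2 log 2; the generator's loss falls as the discriminator assigns higher probabilities to generated samples.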

Generally, GANs are designed for unsupervised learning tasks, but they have also proven useful for semi-supervised and reinforcement learning, depending on the task [ 3 ]. GANs are also used in state-of-the-art transfer learning research to enforce the alignment of the latent feature space [ 66 ]. Inverse models, such as Bidirectional GAN (BiGAN) [ 25 ], can also learn a mapping from data to the latent space, similar to how the standard GAN model learns a mapping from a latent space to the data distribution. The potential application areas of GANs, which are increasing rapidly, include healthcare, image analysis, data augmentation, video generation, voice generation, pandemics, traffic control, cybersecurity, and many more. Overall, GANs have established themselves as a general-purpose approach to data augmentation and to problems that require a generative solution.

Auto-Encoder (AE) and Its Variants

An auto-encoder (AE) [ 31 ] is a popular unsupervised learning technique in which neural networks are used to learn representations. Typically, auto-encoders are used to work with high-dimensional data, where the learned representation serves as a lower-dimensional description of the data (dimensionality reduction). Encoder, code, and decoder are the three parts of an autoencoder. The encoder compresses the input and generates the code, which the decoder subsequently uses to reconstruct the input. AEs have recently been used to learn generative data models [ 69 ]. The auto-encoder is widely used in many unsupervised learning tasks, e.g., dimensionality reduction, feature extraction, efficient coding, generative modeling, denoising, anomaly or outlier detection, etc. [ 31 , 132 ]. Principal component analysis (PCA) [ 99 ], which is also used to reduce the dimensionality of huge data sets, is essentially similar to a single-layered AE with a linear activation function. Regularized autoencoders such as sparse, denoising, and contractive autoencoders are useful for learning representations for later classification tasks [ 119 ], while variational autoencoders can be used as generative models [ 56 ], as discussed below.

Sparse Autoencoder (SAE) A sparse autoencoder [ 73 ] has a sparsity penalty on the coding layer as a part of its training requirement. SAEs may have more hidden units than inputs, but only a small number of hidden units are permitted to be active at the same time, resulting in a sparse model. Figure 10 shows a schematic structure of a sparse autoencoder with several active units in the hidden layer. This model is thus obliged to respond to the unique statistical features of the training data following its constraints.
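One common way to express such a sparsity penalty, assumed here purely for illustration since the text does not fix a particular form, is the Kullback-Leibler divergence between a small target activation rho and the mean activation of each hidden unit over the training data. The function name and the default target value are hypothetical.

```python
import math


def kl_sparsity_penalty(mean_activations, rho=0.05):
    """Sum over hidden units of KL(rho || rho_hat).

    mean_activations: average activation of each hidden unit over a batch,
    each strictly between 0 and 1 (e.g. sigmoid units).
    """
    total = 0.0
    for rho_hat in mean_activations:
        total += rho * math.log(rho / rho_hat)
        total += (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat))
    return total
```

The penalty is zero when every unit's mean activation equals the target and grows as units become active more often, so adding it to the reconstruction loss pushes most hidden units toward inactivity for any given input.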

Denoising Autoencoder (DAE) A denoising autoencoder is a variant of the basic autoencoder that attempts to improve the representation (to extract useful features) by altering the reconstruction criterion, and thus reduces the risk of learning the identity function [ 31 , 119 ]. In other words, it receives a corrupted data point as input and is trained to recover the original undistorted input as its output by minimizing the average reconstruction error over the training data, i.e., cleaning the corrupted input, or denoising. Thus, in the context of computing, DAEs can be considered very powerful filters that can be utilized for automatic pre-processing. A denoising autoencoder, for example, could be used to automatically pre-process an image, thereby improving its quality and, in turn, recognition accuracy.
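The altered reconstruction criterion can be sketched as follows: the model sees a corrupted input, but the error is measured against the clean one. This is an illustrative sketch only; masking noise is just one possible corruption, and the names corrupt, denoising_loss, and reconstruct (standing in for any encoder-decoder pair) are hypothetical.

```python
import random


def corrupt(x, drop_prob, rng):
    """Masking noise: set each component to zero with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]


def denoising_loss(reconstruct, clean_batch, drop_prob=0.3, seed=0):
    """Average squared error between reconstructions of corrupted inputs
    and the original clean inputs."""
    rng = random.Random(seed)
    total = 0.0
    for x in clean_batch:
        x_noisy = corrupt(x, drop_prob, rng)
        x_hat = reconstruct(x_noisy)
        # The target is the clean input x, not the corrupted x_noisy.
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(clean_batch)
```

Under this criterion the identity function is no longer a good solution: it merely reproduces the corrupted input, so its loss grows with the amount of corruption.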

Contractive Autoencoder (CAE) The idea behind a contractive autoencoder, proposed by Rifai et al. [ 90 ], is to make the autoencoder robust to small changes in the training dataset. In its objective function, a CAE includes an explicit regularizer that forces the model to learn an encoding that is robust to small changes in input values. As a result, the learned representation’s sensitivity to the training input is reduced. While DAEs encourage the robustness of reconstruction as discussed above, CAEs encourage the robustness of representation.
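A regularizer of this kind is typically the squared Frobenius norm of the Jacobian of the encoder with respect to the input. For a single sigmoid encoding layer h = sigmoid(W x + b), which is an assumption made here for illustration, this norm has a simple closed form, sketched below with hypothetical function names.

```python
import math


def encode(x, W, b):
    """Sigmoid encoder: h_j = sigmoid(sum_i W[j][i] * x[i] + b[j])."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bj)))
        for row, bj in zip(W, b)
    ]


def contractive_penalty(x, W, b):
    """Squared Frobenius norm of the encoder Jacobian dh/dx at input x.

    For a sigmoid layer, dh_j/dx_i = h_j * (1 - h_j) * W[j][i].
    """
    h = encode(x, W, b)
    return sum(
        (hj * (1.0 - hj)) ** 2 * sum(w * w for w in row)
        for hj, row in zip(h, W)
    )
```

Adding this term, scaled by a weighting coefficient, to the reconstruction loss penalizes encodings that change sharply when the input is perturbed slightly.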

Variational Autoencoder (VAE) A variational autoencoder [ 55 ] has a fundamental property that distinguishes it from the classical autoencoder discussed above and makes it effective for generative modeling. Unlike traditional autoencoders, which map the input onto a latent vector, VAEs map the input data into the parameters of a probability distribution, such as the mean and variance of a Gaussian distribution. A VAE assumes that the source data has an underlying probability distribution and then tries to discover the distribution’s parameters. Although this approach was initially designed for unsupervised learning, its use has been demonstrated in other domains such as semi-supervised learning [ 128 ] and supervised learning [ 51 ].
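Assuming a diagonal Gaussian latent distribution and a standard normal prior (a common choice, adopted here only for illustration), the two ingredients that distinguish a VAE can be sketched in a few lines: drawing a latent sample from the predicted mean and log-variance, and the closed-form KL divergence that regularizes those parameters. The function names are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))
```

In training, this KL term is added to the reconstruction error of the decoded sample; at generation time, new data can be produced by decoding samples drawn directly from the prior.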

figure 10

Schematic structure of a sparse autoencoder (SAE) with several active units (filled circle) in the hidden layer

Although the earlier concept of the AE was typically aimed at dimensionality reduction or feature learning, as mentioned above, AEs have recently been brought to the forefront of generative modeling, alongside the generative adversarial network, which is one of the popular methods in the area. AEs have been effectively employed in a variety of domains, including healthcare, computer vision, speech recognition, cybersecurity, natural language processing, and many more. Overall, we can conclude that the auto-encoder and its variants can play a significant role in unsupervised feature learning with neural network architectures.

Kohonen Map or Self-Organizing Map (SOM)

A Self-Organizing Map (SOM) or Kohonen Map [ 59 ] is another form of unsupervised learning technique for creating a low-dimensional (usually two-dimensional) representation of a higher-dimensional data set while maintaining the topological structure of the data. SOM is also known as a neural network-based dimensionality reduction algorithm that is commonly used for clustering [ 118 ]. A SOM adapts to the topological form of a dataset by repeatedly moving its neurons closer to the data points, allowing us to visualize enormous datasets and find probable clusters. The first layer of a SOM is the input layer, and the second layer is the output layer or feature map. Unlike other neural networks that use error-correction learning, such as backpropagation with gradient descent [ 36 ], SOMs employ competitive learning, which uses a neighborhood function to retain the input space’s topological features. SOM is widely utilized in a variety of applications, including pattern identification, health or medical diagnosis, anomaly detection, and virus or worm attack detection [ 60 , 87 ]. The primary benefit of employing a SOM is that it can make high-dimensional data easier to visualize and analyze in order to understand the patterns. The reduction of dimensionality and grid clustering make it easy to observe similarities in the data. As a result, SOMs can play a vital role in developing an effective data-driven model for a particular problem domain, depending on the data characteristics.
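The competitive learning step described above can be sketched as a single update: find the neuron whose weight vector is closest to the input (the best matching unit), then pull that neuron and its grid neighbors toward the input. The Gaussian neighborhood function, the fixed learning rate, and the function names are assumptions made for this illustration; practical implementations usually decay both the learning rate and the neighborhood width over time.

```python
import math


def best_matching_unit(weights, x):
    """weights maps a grid position (row, col) to that neuron's weight vector."""
    return min(
        weights,
        key=lambda pos: sum((w - xi) ** 2 for w, xi in zip(weights[pos], x)),
    )


def som_update(weights, x, lr=0.5, sigma=1.0):
    """Move the winning neuron and its grid neighbors toward input x."""
    bmu = best_matching_unit(weights, x)
    for pos, w in list(weights.items()):
        # Distance is measured on the output grid, not in input space;
        # this is what preserves the topology of the map.
        grid_dist2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        influence = math.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights[pos] = [wi + lr * influence * (xi - wi) for wi, xi in zip(w, x)]
    return bmu
```

Repeating this update over many inputs gradually arranges the neurons so that nearby grid positions respond to similar data points.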

Restricted Boltzmann Machine (RBM)

A Restricted Boltzmann Machine (RBM) [ 75 ] is also a generative stochastic neural network capable of learning a probability distribution across its inputs. Boltzmann machines typically consist of visible and hidden nodes in which each node is connected to every other node, which helps us understand irregularities by learning how the system works in normal circumstances. RBMs are a subset of Boltzmann machines in which the connections are restricted: nodes are connected only between the visible and hidden layers, with no connections among nodes within the same layer [ 77 ]. This restriction permits training algorithms like the gradient-based contrastive divergence algorithm to be more efficient than those for Boltzmann machines in general [ 41 ]. RBMs have found applications in dimensionality reduction, classification, regression, collaborative filtering, feature learning, topic modeling, and many others. In the area of deep learning modeling, they can be trained in either a supervised or an unsupervised manner, depending on the task. Overall, RBMs can recognize patterns in data automatically and develop probabilistic or stochastic models, which are utilized for feature selection or extraction, as well as for forming a deep belief network.
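Because an RBM has no connections within a layer, all hidden units can be sampled independently given the visible units, and vice versa. The sketch below uses this property in one step of contrastive divergence (CD-1) for a binary RBM. It is a simplified illustration with hypothetical function names, processing a single training vector without mini-batches, momentum, or weight decay.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def hidden_probs(v, W, c):
    """p(h_j = 1 | v); W[i][j] connects visible unit i to hidden unit j."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """p(v_i = 1 | h)."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step: positive phase on the data, negative phase on a
    one-step Gibbs reconstruction. Updates W, b, and c in place."""
    ph0 = hidden_probs(v0, W, c)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)
    ph1 = hidden_probs(v1, W, c)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
```

In a general Boltzmann machine the conditional distributions do not factorize in this way, which is why the restricted structure makes training considerably cheaper.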

Deep Belief Network (DBN)

A Deep Belief Network (DBN) [ 40 ] is a multi-layer generative graphical model built by stacking several individual unsupervised networks such as AEs or RBMs, which use each network’s hidden layer as the input for the next layer, i.e., they are connected sequentially. Thus, we can divide a DBN into (i) AE-DBN, which is known as stacked AE, and (ii) RBM-DBN, which is known as stacked RBM, where AE-DBN is composed of autoencoders and RBM-DBN is composed of restricted Boltzmann machines, discussed earlier. The ultimate goal is to develop a faster unsupervised training technique for each sub-network that depends on contrastive divergence [ 41 ]. A DBN can capture a hierarchical representation of input data based on its deep structure. The primary idea behind DBN is to train unsupervised feed-forward neural networks with unlabeled data before fine-tuning the network with labeled input. One of the most important advantages of DBN, as opposed to typical shallow learning networks, is that it permits the detection of deep patterns, which allows for reasoning abilities and the capture of the deep difference between normal and erroneous data [ 89 ]. A continuous DBN is simply an extension of a standard DBN that allows a continuous range of decimals instead of binary data. Overall, the DBN model can play a key role in a wide range of high-dimensional data applications due to its strong feature extraction and classification capabilities, and it has become one of the significant topics in the field of neural networks.

In summary, the generative learning techniques discussed above typically allow us to generate a new representation of data through exploratory analysis. As a result, these deep generative networks can be utilized as preprocessing for supervised or discriminative learning tasks and can help improve model accuracy, since unsupervised representation learning can allow for improved classifier generalization.

Deep Networks for Hybrid Learning and Other Approaches

In addition to the above-discussed deep learning categories, hybrid deep networks and several other approaches such as deep transfer learning (DTL) and deep reinforcement learning (DRL) are popular, which are discussed in the following.

Hybrid Deep Neural Networks

Generative models are adaptable, with the capacity to learn from both labeled and unlabeled data. Discriminative models, on the other hand, are unable to learn from unlabeled data yet outperform their generative counterparts in supervised tasks. A framework for training both deep generative and discriminative models simultaneously can enjoy the benefits of both models, which motivates hybrid networks.

Hybrid deep learning models are typically composed of multiple (two or more) basic deep learning models, where each basic model is a discriminative or generative deep learning model discussed earlier. Based on the integration of different basic generative or discriminative models, the following three categories of hybrid deep learning models might be useful for solving real-world problems:

Hybrid \(Model\_1\) : An integration of different generative or discriminative models to extract more meaningful and robust features. Examples could be CNN+LSTM, AE+GAN, and so on.

Hybrid \(Model\_2\) : An integration of generative model followed by a discriminative model. Examples could be DBN+MLP, GAN+CNN, AE+CNN, and so on.

Hybrid \(Model\_3\) : An integration of generative or discriminative model followed by a non-deep learning classifier. Examples could be AE+SVM, CNN+SVM, and so on.

Thus, in a broad sense, we can conclude that hybrid models can be either classification-focused or non-classification-focused depending on the target use. However, most of the hybrid learning-related studies in the area of deep learning address classification-focused or supervised learning tasks, as summarized in Table 1 . Unsupervised generative models with meaningful representations are employed to enhance the discriminative models. Generative models with useful representations can provide more informative and low-dimensional features for discrimination, and they can also enhance the quality and quantity of the training data, providing additional information for classification.

Deep Transfer Learning (DTL)

Transfer learning is a technique for effectively using previously learned model knowledge to solve a new task with minimal training or fine-tuning. In comparison to typical machine learning techniques [ 97 ], DL requires a large amount of training data. As a result, the need for a substantial volume of labeled data is a significant barrier to addressing some essential domain-specific tasks, particularly in the medical sector, where creating large-scale, high-quality annotated medical or health datasets is both difficult and costly. Furthermore, the standard DL model demands a lot of computational resources, such as a GPU-enabled server, even though researchers are working hard to improve it. Deep Transfer Learning (DTL), a DL-based transfer learning method, might therefore be helpful in addressing these issues. Figure 11 shows a general structure of the transfer learning process, where knowledge from the pre-trained model is transferred into a new DL model. It is especially popular in deep learning right now since it allows deep neural networks to be trained with very little data [ 126 ].

figure 11

A general structure of transfer learning process, where knowledge from pre-trained model is transferred into new DL model

Transfer learning is a two-stage approach for training a DL model that consists of a pre-training step and a fine-tuning step in which the model is trained on the target task. Since deep neural networks have gained popularity in a variety of fields, a large number of DTL methods have been presented, making it crucial to categorize and summarize them. Based on the techniques used in the literature, DTL can be classified into four categories [ 117 ]. These are (i) instance-based deep transfer learning, which uses instances in the source domain with appropriate weights, (ii) mapping-based deep transfer learning, which maps instances from the two domains into a new data space with better similarity, (iii) network-based deep transfer learning, which reuses part of the network pre-trained in the source domain, and (iv) adversarial-based deep transfer learning, which uses adversarial techniques to find transferable features that are suitable for both domains. Due to its high effectiveness and practicality, adversarial-based deep transfer learning has exploded in popularity in recent years. Transfer learning can also be classified into inductive, transductive, and unsupervised transfer learning depending on the relationship between the source and target domains and tasks [ 81 ]. While most current research focuses on supervised learning, how deep neural networks can transfer knowledge in unsupervised or semi-supervised learning may gain further interest in the future. DTL techniques are useful in a variety of fields including natural language processing, sentiment classification, visual recognition, speech recognition, spam filtering, and relevant others.
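As a toy illustration of the network-based category, the sketch below reuses a "pre-trained" layer as a frozen feature extractor and fine-tunes only a new logistic-regression head on a small target dataset. Everything here is hypothetical: the weights in W_PRE merely stand in for parameters learned on a source task, the data values are invented, and the function names are chosen for this example only.

```python
import math


def frozen_features(x, W_pre):
    """Reused pre-trained layer (linear + ReLU); its weights are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_pre]


def fine_tune_head(data, W_pre, epochs=200, lr=0.5):
    """Train only a new logistic-regression head on top of the frozen features."""
    head_w, head_b = [0.0] * len(W_pre), 0.0
    for _ in range(epochs):
        for x, y in data:
            f = frozen_features(x, W_pre)
            logit = sum(w * fi for w, fi in zip(head_w, f)) + head_b
            p = 1.0 / (1.0 + math.exp(-logit))
            grad = p - y  # gradient of the cross-entropy loss w.r.t. the logit
            head_w = [w - lr * grad * fi for w, fi in zip(head_w, f)]
            head_b -= lr * grad
    return head_w, head_b


def predict(x, W_pre, head_w, head_b):
    f = frozen_features(x, W_pre)
    return 1 if sum(w * fi for w, fi in zip(head_w, f)) + head_b > 0 else 0


# Stand-in for weights learned on a source task (hypothetical values).
W_PRE = [[1.0, 0.0], [0.0, 1.0]]
# Small labeled target-task dataset (hypothetical values).
TARGET_DATA = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.9, 0.1], 1), ([0.1, 0.9], 0)]
head_w, head_b = fine_tune_head(TARGET_DATA, W_PRE)
```

Only the head's few parameters are learned from the target data, which is why this style of transfer can work with far fewer labeled examples than training a whole network from scratch.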

Deep Reinforcement Learning (DRL)

Reinforcement learning takes a different approach to solving the sequential decision-making problem than other approaches we have discussed so far. The concepts of an environment and an agent are often introduced first in reinforcement learning. The agent can perform a series of actions in the environment, each of which has an impact on the environment’s state and can result in possible rewards (feedback) - “positive” for good sequences of actions that result in a “good” state, and “negative” for bad sequences of actions that result in a “bad” state. The purpose of reinforcement learning is to learn good action sequences through interaction with the environment, typically referred to as a policy.

figure 12

Schematic structure of deep reinforcement learning (DRL) highlighting a deep neural network

Deep reinforcement learning (DRL or deep RL) [ 9 ] integrates neural networks with a reinforcement learning architecture to allow the agents to learn the appropriate actions in a virtual environment, as shown in Fig. 12 . In the area of reinforcement learning, model-based RL is based on learning a transition model that enables modeling of the environment without interacting with it directly, whereas model-free RL methods learn directly from interactions with the environment. Q-learning is a popular model-free RL technique for determining the best action-selection policy for any (finite) Markov Decision Process (MDP) [ 86 , 97 ]. An MDP is a mathematical framework for modeling decisions based on states, actions, and rewards [ 86 ]. In addition, Deep Q-Networks, Double DQN, Bi-directional Learning, Monte Carlo Control, etc. are used in the area [ 50 , 97 ]. DRL methods incorporate DL models, e.g. Deep Neural Networks (DNN), based on the MDP principle [ 71 ], as policy and/or value function approximators. A CNN, for example, can be used as a component of RL agents to learn directly from raw, high-dimensional visual inputs. In the real world, DRL-based solutions can be used in several application areas including robotics, video games, natural language processing, computer vision, and relevant others.
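To illustrate the model-free idea, the following sketch runs tabular Q-learning on a hypothetical toy environment: a chain of states in which the agent moves left or right and receives a reward of 1 on reaching the last state. The environment, the hyper-parameter values, and the function name are assumptions made for this example. In DRL, the table Q would be replaced by a neural network that approximates the same action values.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a chain with actions 0 (left) and 1 (right)."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(Q[s])
                a = rng.choice([act for act in (0, 1) if Q[s][act] == best])
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Move Q(s, a) toward the bootstrapped target r + gamma * max Q(s', .).
            target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q
```

After training, taking the action with the larger Q-value in each state yields the learned policy, which in this toy chain is to move right toward the rewarded state.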

figure 13

Several potential real-world application areas of deep learning

Deep Learning Application Summary

During the past few years, deep learning has been successfully applied to numerous problems in many application areas. These include natural language processing, sentiment analysis, cybersecurity, business, virtual assistants, visual recognition, healthcare, robotics, and many more. In Fig. 13 , we have summarized several potential real-world application areas of deep learning. Various deep learning techniques according to our presented taxonomy in Fig. 6 , which includes discriminative learning, generative learning, as well as hybrid models, discussed earlier, are employed in these application areas. In Table 1 , we have also summarized various deep learning tasks and the techniques that are used to solve them in several real-world application areas. Overall, from Fig. 13 and Table 1 , we can conclude that the future prospects of deep learning modeling in real-world application areas are huge and that there is considerable scope for further work. In the next section, we also summarize the research issues in deep learning modeling and point out the potential aspects for future generation DL modeling.

Research Directions and Future Aspects

While existing methods have established a solid foundation for deep learning systems and research, this section outlines the following ten potential future research directions based on our study.

Automation in Data Annotation According to the existing literature, discussed in Section 3 , most deep learning models are trained on publicly available datasets that are annotated. However, to build a system for a new problem domain or a recent data-driven system, raw data must be collected from relevant sources. Thus, data annotation, e.g., categorization, tagging, or labeling of a large amount of raw data, is important for building discriminative deep learning models or supervised tasks, which is challenging. A technique capable of automatic and dynamic data annotation, rather than manual annotation or hiring annotators, particularly for large datasets, could be more effective for supervised learning as well as minimizing human effort. Therefore, a more in-depth investigation of data collection and annotation methods, or the design of an unsupervised learning-based solution, could be one of the primary research directions in the area of deep learning modeling.

Data Preparation for Ensuring Data Quality As discussed throughout the paper, data quality and availability for training strongly affect deep learning algorithms, and consequently the resultant model for a particular problem domain. Thus, deep learning models may become worthless or yield decreased accuracy if the training data is bad, e.g., if it suffers from data sparsity, non-representativeness, poor quality, ambiguous values, noise, data imbalance, irrelevant features, data inconsistency, insufficient quantity, and so on. Consequently, such issues in data can lead to poor processing and inaccurate findings, which is a major problem while discovering insights from data. Thus deep learning models also need to adapt to such rising issues in data, to capture approximated information from observations. Therefore, effective data pre-processing techniques need to be designed according to the nature of the data problem and its characteristics in order to handle such emerging challenges, which could be another research direction in the area.

Black-box Perception and Proper DL/ML Algorithm Selection In general, it is difficult to explain how a deep learning result is obtained or how a particular model arrives at its ultimate decisions. Although DL models achieve significant performance while learning from large datasets, as discussed in Section 2 , this “black-box” perception of DL modeling typically represents weak statistical interpretability that could be a major issue in the area. On the other hand, ML algorithms, particularly rule-based machine learning techniques, provide explicit logic rules (IF-THEN) for making decisions that are easier to interpret, update or delete according to the target applications [ 97 , 100 , 105 ]. If the wrong learning algorithm is chosen, unanticipated results may occur, resulting in a loss of effort as well as of the model’s efficacy and accuracy. Thus, taking into account the performance, complexity, model accuracy, and applicability, selecting an appropriate model for the target application is challenging, and in-depth analysis is needed for better understanding and decision making.

Deep Networks for Supervised or Discriminative Learning According to our designed taxonomy of deep learning techniques, as shown in Fig. 6 , discriminative architectures mainly include MLP, CNN, and RNN, along with their variants that are applied widely in various application domains. However, designing new discriminative techniques or variants of the existing ones by taking into account model optimization, accuracy, and applicability, according to the target real-world application and the nature of the data, could be a novel contribution, which can also be considered as a major future aspect in the area of supervised or discriminative learning.

Deep Networks for Unsupervised or Generative Learning As discussed in Section 3 , unsupervised learning or generative deep learning modeling is one of the major tasks in the area, as it allows us to characterize the high-order correlation properties or features in data, or to generate a new representation of data through exploratory analysis. Moreover, unlike supervised learning [ 97 ], it does not require labeled data, as it can derive insights directly from the data and support data-driven decision making. Consequently, it can be used as preprocessing for supervised learning or discriminative modeling as well as semi-supervised learning tasks, which can improve learning accuracy and model efficiency. According to our designed taxonomy of deep learning techniques, as shown in Fig. 6 , generative techniques mainly include GAN, AE, SOM, RBM, DBN, and their variants. Thus, designing new techniques or variants of them for effective data modeling or representation according to the target real-world application could be a novel contribution, which can also be considered as a major future aspect in the area of unsupervised or generative learning.

Hybrid/Ensemble Modeling and Uncertainty Handling According to our designed taxonomy of DL techniques, as shown in Fig. 6 , this is considered as another major category in deep learning tasks. As hybrid modeling enjoys the benefits of both generative and discriminative learning, an effective hybridization can outperform others in terms of performance as well as uncertainty handling in high-risk applications. In Section 3 , we have summarized various types of hybridization, e.g., AE+CNN/SVM. Since a group of neural networks can be trained with distinct parameters or with separate sub-sampled training datasets, hybridization or ensembles of such techniques, i.e., DL with DL/ML, can play a key role in the area. Thus designing effective blended discriminative and generative models, rather than relying on a naive combination, could be an important research opportunity to solve various real-world issues including semi-supervised learning tasks and model uncertainty.

Dynamism in Selecting Threshold/Hyper-parameter Values, and Network Structures with Computational Efficiency In general, the relationship among performance, model complexity, and computational requirements is a key issue in deep learning modeling and applications. A combination of algorithmic advancements with improved accuracy as well as maintaining computational efficiency, i.e., achieving the maximum throughput while consuming the least amount of resources, without significant information loss, can lead to a breakthrough in the effectiveness of deep learning modeling in future real-world applications. The concept of incremental approaches or recency-based learning [ 100 ] might be effective in several cases depending on the nature of target applications. Moreover, assuming network structures with a static number of nodes and layers, fixing hyper-parameter values or threshold settings, or selecting them by a trial-and-error process may not be effective in many cases, as the best choices can change as the data changes. Thus, a data-driven approach to select them dynamically could be more effective while building a deep learning model in terms of both performance and real-world applicability. This type of data-driven automation can lead to future generation deep learning modeling with additional intelligence, which could be a significant future aspect in the area as well as an important research direction to contribute.

Lightweight Deep Learning Modeling for Next-Generation Smart Devices and Applications In recent years, the Internet of Things (IoT), consisting of billions of intelligent and communicating things, and mobile communications technologies have become popular for detecting and gathering human and environmental information (e.g. geo-information, weather data, bio-data, human behaviors, and so on) for a variety of intelligent services and applications. Every day, these ubiquitous smart things or devices generate large amounts of data, requiring rapid data processing on a variety of smart mobile devices [ 72 ]. Deep learning technologies can be incorporated to discover underlying properties and to effectively handle such large amounts of sensor data for a variety of IoT applications including health monitoring and disease analysis, smart cities, traffic flow prediction and monitoring, smart transportation, manufacture inspection, fault assessment, smart industry or Industry 4.0, and many more. Although the deep learning techniques discussed in Section 3 are considered powerful tools for processing big data, their high computational cost and considerable memory overhead make lightweight modeling important for resource-constrained devices. Thus several techniques such as optimization, simplification, compression, pruning, generalization, important feature extraction, etc. might be helpful in several cases. Therefore, constructing lightweight deep learning techniques based on a baseline network architecture to adapt the DL model for next-generation mobile, IoT, or resource-constrained devices and applications could be considered as a significant future aspect in the area.

Incorporating Domain Knowledge into Deep Learning Modeling Domain knowledge, as opposed to general knowledge or domain-independent knowledge, is knowledge of a specific, specialized topic or field. For instance, in terms of natural language processing, the properties of the English language typically differ from those of other languages like Bengali, Arabic, French, etc. Thus integrating domain-based constraints into the deep learning model could produce better results for such a particular purpose. For instance, a task-specific feature extractor considering domain knowledge in smart manufacturing for fault diagnosis can resolve the issues in traditional deep-learning-based methods [ 28 ]. Similarly, domain knowledge in medical image analysis [ 58 ], financial sentiment analysis [ 49 ], and cybersecurity analytics [ 94 , 103 ], as well as conceptual data models in which semantic information (i.e., information that is meaningful for a system, rather than merely correlational) [ 45 , 121 , 131 ] is included, can play a vital role in the area. Transfer learning could be an effective way to get started on a new challenge with domain knowledge. Moreover, contextual information such as spatial, temporal, social, and environmental contexts [ 92 , 104 , 108 ] can also play an important role in incorporating context-aware computing with domain knowledge for smart decision making as well as building adaptive and intelligent context-aware systems. Therefore, understanding domain knowledge and effectively incorporating it into the deep learning model could be another research direction.

Designing General Deep Learning Framework for Target Application Domains One promising research direction for deep learning-based solutions is to develop a general framework that can handle data diversity, dimensions, stimulation types, etc. The general framework would require two key capabilities: an attention mechanism that focuses on the most valuable parts of input signals, and the ability to capture latent features, which enables the framework to capture distinctive and informative features. Attention models have been a popular research topic because of their intuition, versatility, and interpretability, and have been employed in various application areas like computer vision, natural language processing, text or image classification, sentiment analysis, recommender systems, user profiling, etc. [ 13 , 80 ]. An attention mechanism can be implemented based on learning algorithms such as reinforcement learning, which is capable of finding the most useful part through a policy search [ 133 , 134 ]. Similarly, a CNN can be integrated with suitable attention mechanisms to form a general classification framework, where the CNN can be used as a feature learning tool for capturing features at various levels and ranges. Thus, designing a general deep learning framework considering attention as well as latent features for target application domains could be another area to contribute.
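As one concrete illustration of how a mechanism can "focus on the most valuable parts of input signals", the sketch below implements soft (scaled dot-product) attention in plain Python. Note that this is a differentiable, softmax-based form of attention chosen here for simplicity; it is not the reinforcement-learning-based policy search mentioned above, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Weight each input part by its similarity to the query.

    keys[i] describes input part i and values[i] is its content.
    Returns the attention weights and the weighted sum of the values.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context
```

The weights sum to one and can be inspected directly, which is one reason attention models are often regarded as comparatively interpretable.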

To summarize, deep learning is a fairly open topic to which academics can contribute by developing new methods or improving existing ones to handle the above-mentioned concerns and tackle real-world problems in a variety of application areas. This can also help researchers conduct a thorough analysis of an application’s hidden and unexpected challenges to produce more reliable and realistic outcomes. Overall, we can conclude that addressing the above-mentioned issues and proposing effective and efficient techniques could lead to “Future Generation DL” modeling as well as more intelligent and automated applications.

Concluding Remarks

In this article, we have presented a structured and comprehensive view of deep learning technology, which is considered a core part of artificial intelligence as well as data science. It starts with a history of artificial neural networks and moves to recent deep learning techniques and breakthroughs in different applications. Then, the key algorithms in this area, as well as deep neural network modeling in various dimensions are explored. For this, we have also presented a taxonomy considering the variations of deep learning tasks and how they are used for different purposes. In our comprehensive study, we have taken into account not only the deep networks for supervised or discriminative learning but also the deep networks for unsupervised or generative learning, and hybrid learning that can be used to solve a variety of real-world issues according to the nature of problems.

Deep learning, unlike traditional machine learning and data mining algorithms, can produce extremely high-level data representations from enormous amounts of raw data. As a result, it has provided an excellent solution to a variety of real-world problems. A successful deep learning technique must apply data-driven modeling suited to the characteristics of the raw data. The sophisticated learning algorithms then need to be trained on the collected data and knowledge related to the target application before the system can assist with intelligent decision-making. Deep learning has been shown to be useful in a wide range of applications and research areas such as healthcare, sentiment analysis, visual recognition, business intelligence, cybersecurity, and many more that are summarized in the paper.

Finally, we have summarized and discussed the challenges faced, the potential research directions, and future aspects in the area. Although deep learning is considered a black-box solution for many applications due to its poor reasoning and interpretability, addressing the identified challenges and future aspects could lead to future generation deep learning modeling and smarter systems. This can also help researchers carry out in-depth analysis to produce more reliable and realistic outcomes. Overall, we believe that our study on neural networks and deep learning-based advanced analytics points in a promising direction and can be utilized as a reference guide for future research and implementations in relevant application domains by both academic and industry professionals.

References

Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin Ma, Ghemawat S, Irving G, Isard M, et al. Tensorflow: a system for large-scale machine learning. In: 12th USENIX Symposium on operating systems design and implementation (OSDI 16), 2016; p. 265–283.

Abdel-Basset M, Hawash H, Chakrabortty RK, Ryan M. Energy-net: a deep learning approach for smart energy management in iot-based smart cities. IEEE Internet of Things J. 2021.

Aggarwal A, Mittal M, Battineni G. Generative adversarial network: an overview of theory and applications. Int J Inf Manag Data Insights. 2021; p. 100004.

Al-Qatf M, Lasheng Y, Al-Habib M, Al-Sabahi K. Deep learning approach combining sparse autoencoder with svm for network intrusion detection. IEEE Access. 2018;6:52843–56.


Ale L, Sheta A, Li L, Wang Y, Zhang N. Deep learning based plant disease detection for smart agriculture. In: 2019 IEEE Globecom Workshops (GC Wkshps), 2019; p. 1–6. IEEE.

Amarbayasgalan T, Lee JY, Kim KR, Ryu KH. Deep autoencoder based neural networks for coronary heart disease risk prediction. In: Heterogeneous data management, polystores, and analytics for healthcare. Springer; 2019. p. 237–48.

Anuradha J, et al. Big data based stock trend prediction using deep cnn with reinforcement-lstm model. Int J Syst Assur Eng Manag. 2021; p. 1–11.

Aqib M, Mehmood R, Albeshri A, Alzahrani A. Disaster management in smart cities by forecasting traffic plan using deep learning and gpus. In: International Conference on smart cities, infrastructure, technologies and applications. Springer; 2017. p. 139–54.

Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA. Deep reinforcement learning: a brief survey. IEEE Signal Process Mag. 2017;34(6):26–38.

Aslan MF, Unlersen MF, Sabanci K, Durdu A. Cnn-based transfer learning-bilstm network: a novel approach for covid-19 infection detection. Appl Soft Comput. 2021;98:106912.

Bu F, Wang X. A smart agriculture iot system based on deep reinforcement learning. Futur Gener Comput Syst. 2019;99:500–7.

Chang W-J, Chen L-B, Hsu C-H, Lin C-P, Yang T-C. A deep learning-based intelligent medicine recognition system for chronic patients. IEEE Access. 2019;7:44441–58.

Chaudhari S, Mithal V, Polatkan Gu, Ramanath R. An attentive survey of attention models. arXiv preprint arXiv:1904.02874, 2019.

Chaudhuri N, Gupta G, Vamsi V, Bose I. On the platform but will they buy? predicting customers’ purchase behavior using deep learning. Decis Support Syst. 2021; p. 113622.

Chen D, Wawrzynski P, Lv Z. Cyber security in smart cities: a review of deep learning-based applications and case studies. Sustain Cities Soc. 2020; p. 102655.

Cho K, Van MB, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.

Chollet F. Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2017; p. 1251–258.

Chung J, Gulcehre C, Cho KH, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.

Coelho IM, Coelho VN, da Eduardo J, Luz S, Ochi LS, Guimarães FG, Rios E. A gpu deep learning metaheuristic based model for time series forecasting. Appl Energy. 2017;201:412–8.

Da'u A, Salim N. Recommendation system based on deep learning methods: a systematic review and new directions. Artif Intel Rev. 2020;53(4):2709–48.

Deng L. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans Signal Inf Process. 2014; p. 3.

Deng L, Dong Yu. Deep learning: methods and applications. Found Trends Signal Process. 2014;7(3–4):197–387.


Deng S, Li R, Jin Y, He H. Cnn-based feature cross and classifier for loan default prediction. In: 2020 International Conference on image, video processing and artificial intelligence, volume 11584, page 115841K. International Society for Optics and Photonics, 2020.

Dhyani M, Kumar R. An intelligent chatbot using deep learning with bidirectional rnn and attention model. Mater Today Proc. 2021;34:817–24.

Donahue J, Krähenbühl P, Darrell T. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.

Du K-L, Swamy MNS. Neural networks and statistical learning. Berlin: Springer Science & Business Media; 2013.


Dupond S. A thorough review on the current advance of neural network structures. Annu Rev Control. 2019;14:200–30.


Feng J, Yao Y, Lu S, Liu Y. Domain knowledge-based deep-broad learning framework for fault diagnosis. IEEE Trans Ind Electron. 2020;68(4):3454–64.

Garg S, Kaur K, Kumar N, Rodrigues JJPC. Hybrid deep-learning-based anomaly detection scheme for suspicious flow detection in sdn: a social multimedia perspective. IEEE Trans Multimed. 2019;21(3):566–78.

Géron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: concepts, tools, and techniques to build intelligent systems. O’Reilly Media; 2019.

Goodfellow I, Bengio Y, Courville A. Deep learning, vol. 1. Cambridge: MIT Press; 2016.

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in neural information processing systems. 2014; p. 2672–680.

Google trends. 2021. .

Gruber N, Jockisch A. Are gru cells more specific and lstm cells more sensitive in motive classification of text? Front Artif Intell. 2020;3:40.

Gu B, Ge R, Chen Y, Luo L, Coatrieux G. Automatic and robust object detection in x-ray baggage inspection using deep convolutional neural networks. IEEE Trans Ind Electron. 2020.

Han J, Pei J, Kamber M. Data mining: concepts and techniques. Amsterdam: Elsevier; 2011.

Haykin S. Neural networks and learning machines, 3/E. London: Pearson Education; 2010.

He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.

He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2016; p. 770–78.

Hinton GE. Deep belief networks. Scholarpedia. 2009;4(5):5947.

Hinton GE, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–54.

Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.

Huang C-J, Kuo P-H. A deep cnn-lstm model for particulate matter (pm2.5) forecasting in smart cities. Sensors. 2018;18(7):2220.

Huang H-H, Fukuda M, Nishida T. Toward rnn based micro non-verbal behavior generation for virtual listener agents. In: International Conference on human-computer interaction, 2019; p. 53–63. Springer.

Hulsebos M, Hu K, Bakker M, Zgraggen E, Satyanarayan A, Kraska T, Demiralp Ça, Hidalgo C. Sherlock: a deep learning approach to semantic data type detection. In: Proceedings of the 25th ACM SIGKDD International Conference on knowledge discovery & data mining, 2019; p. 1500–508.

Imamverdiyev Y, Abdullayeva F. Deep learning method for denial of service attack detection based on restricted Boltzmann machine. Big Data. 2018;6(2):159–69.

Islam MZ, Islam MM, Asraf A. A combined deep cnn-lstm network for the detection of novel coronavirus (covid-19) using x-ray images. Inf Med Unlock. 2020;20:100412.

Ismail WN, Hassan MM, Alsalamah HA, Fortino G. Cnn-based health model for regular health factors analysis in internet-of-medical things environment. IEEE Access. 2020;8:52541–9.

Jangid H, Singhal S, Shah RR, Zimmermann R. Aspect-based financial sentiment analysis using deep learning. In: Companion Proceedings of the The Web Conference 2018, 2018; p. 1961–966.

Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–85.

Kameoka H, Li L, Inoue S, Makino S. Supervised determined source separation with multichannel variational autoencoder. Neural Comput. 2019;31(9):1891–914.

Karhunen J, Raiko T, Cho KH. Unsupervised deep learning: a short review. In: Advances in independent component analysis and learning machines. 2015; p. 125–42.

Kawde P, Verma GK. Deep belief network based affect recognition from physiological signals. In: 2017 4th IEEE Uttar Pradesh Section International Conference on electrical, computer and electronics (UPCON), 2017; p. 587–92. IEEE.

Kim J-Y, Seok-Jun B, Cho S-B. Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders. Inf Sci. 2018;460:83–102.

Kingma DP, Welling M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.

Kingma DP, Welling M. An introduction to variational autoencoders. arXiv preprint arXiv:1906.02691, 2019.

Kiran PKR, Bhasker B. Dnnrec: a novel deep learning based hybrid recommender system. Expert Syst Appl. 2020.

Kloenne M, Niehaus S, Lampe L, Merola A, Reinelt J, Roeder I, Scherf N. Domain-specific cues improve robustness of deep learning-based segmentation of ct volumes. Sci Rep. 2020;10(1):1–9.

Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.

Kohonen T. Essentials of the self-organizing map. Neural Netw. 2013;37:52–65.

Kök İ, Şimşek MU, Özdemir S. A deep learning model for air quality prediction in smart cities. In: 2017 IEEE International Conference on Big Data (Big Data), 2017; p. 1983–990. IEEE.

Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. 2012; p. 1097–105.

Latif S, Rana R, Younis S, Qadir J, Epps J. Transfer learning for improving speech emotion classification accuracy. arXiv preprint arXiv:1801.06353, 2018.

LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.

LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.

Li B, François-Lavet V, Doan T, Pineau J. Domain adversarial reinforcement learning. arXiv preprint arXiv:2102.07097, 2021.

Li T-HS, Kuo P-H, Tsai T-N, Luan P-C. Cnn and lstm based facial expression analysis model for a humanoid robot. IEEE Access. 2019;7:93998–4011.

Liu C, Cao Y, Luo Y, Chen G, Vokkarane V, Yunsheng M, Chen S, Hou P. A new deep learning-based food recognition system for dietary assessment on an edge computing service infrastructure. IEEE Trans Serv Comput. 2017;11(2):249–61.

Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE. A survey of deep neural network architectures and their applications. Neurocomputing. 2017;234:11–26.

López AU, Mateo F, Navío-Marco J, Martínez-Martínez JM, Gómez-Sanchís J, Vila-Francés J, Serrano-López AJ. Analysis of computer user behavior, security incidents and fraud using self-organizing maps. Comput Secur. 2019;83:38–51.

Lopez-Martin M, Carro B, Sanchez-Esguevillas A. Application of deep reinforcement learning to intrusion detection for supervised problems. Expert Syst Appl. 2020;141:112963.

Ma X, Yao T, Menglan H, Dong Y, Liu W, Wang F, Liu J. A survey on deep learning empowered iot applications. IEEE Access. 2019;7:181721–32.

Makhzani A, Frey B. K-sparse autoencoders. arXiv preprint arXiv:1312.5663, 2013.

Mandic D, Chambers J. Recurrent neural networks for prediction: learning algorithms, architectures and stability. Hoboken: Wiley; 2001.


Marlin B, Swersky K, Chen B, Freitas N. Inductive principles for restricted boltzmann machine learning. In: Proceedings of the Thirteenth International Conference on artificial intelligence and statistics, p. 509–16. JMLR Workshop and Conference Proceedings, 2010.

Masud M, Muhammad G, Alhumyani H, Alshamrani SS, Cheikhrouhou O, Ibrahim S, Hossain MS. Deep learning-based intelligent face recognition in iot-cloud environment. Comput Commun. 2020;152:215–22.

Memisevic R, Hinton GE. Learning to represent spatial transformations with factored higher-order boltzmann machines. Neural Comput. 2010;22(6):1473–92.


Minaee S, Azimi E, Abdolrashidi AA. Deep-sentiment: sentiment analysis using ensemble of cnn and bi-lstm models. arXiv preprint arXiv:1904.04206, 2019.

Naeem M, Paragliola G, Coronato A. A reinforcement learning and deep learning based intelligent system for the support of impaired patients in home treatment. Expert Syst Appl. 2021;168:114285.

Niu Z, Zhong G, Hui Yu. A review on the attention mechanism of deep learning. Neurocomputing. 2021;452:48–62.

Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2009;22(10):1345–59.

Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, et al. Pytorch: An imperative style, high-performance deep learning library. Adv Neural Inf Process Syst. 2019;32:8026–37.

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.


Pi Y, Nath ND, Behzadan AH. Convolutional neural networks for object detection in aerial imagery for disaster response and recovery. Adv Eng Inf. 2020;43:101009.

Piccialli F, Giampaolo F, Prezioso E, Crisci D, Cuomo S. Predictive analytics for smart parking: A deep learning approach in forecasting of iot data. ACM Trans Internet Technol (TOIT). 2021;21(3):1–21.

Puterman ML. Markov decision processes: discrete stochastic dynamic programming. Hoboken: Wiley; 2014.

Qu X, Lin Y, Kai G, Linru M, Meng S, Mingxing K, Mu L, editors. A survey on the development of self-organizing maps for unsupervised intrusion detection. Mob Netw Appl. 2019; p. 1–22.

Rahman MW, Tashfia SS, Islam R, Hasan MM, Sultan SI, Mia S, Rahman MM. The architectural design of smart blind assistant using iot with deep learning paradigm. Internet of Things. 2021;13:100344.

Ren J, Green M, Huang X. From traditional to deep learning: fault diagnosis for autonomous vehicles. In: Learning control. Elsevier. 2021; p. 205–19.

Rifai S, Vincent P, Muller X, Glorot X, Bengio Y. Contractive auto-encoders: Explicit invariance during feature extraction. In: Icml, 2011.

Rosa RL, Schwartz GM, Ruggiero WV, Rodríguez DZ. A knowledge-based recommendation system that includes sentiment analysis and deep learning. IEEE Trans Ind Inf. 2018;15(4):2124–35.

Sarker IH. Context-aware rule learning from smartphone data: survey, challenges and future directions. J Big Data. 2019;6(1):1–25.


Sarker IH. A machine learning based robust prediction model for real-life mobile phone data. Internet of Things. 2019;5:180–93.

Sarker IH. Cyberlearning: effectiveness analysis of machine learning security modeling to detect cyber-anomalies and multi-attacks. Internet of Things. 2021;14:100393.

Sarker IH. Data science and analytics: an overview from data-driven smart computing, decision-making and applications perspective. SN Comput Sci. 2021.

Sarker IH. Deep cybersecurity: a comprehensive overview from neural network and deep learning perspective. SN Comput Sci. 2021;2(3):1–16.


Sarker IH. Machine learning: Algorithms, real-world applications and research directions. SN Comput Sci. 2021;2(3):1–21.

Sarker IH, Abushark YB, Alsolami F, Khan AI. Intrudtree: a machine learning based cyber security intrusion detection model. Symmetry. 2020;12(5):754.

Sarker IH, Abushark YB, Khan AI. Contextpca: Predicting context-aware smartphone apps usage based on machine learning techniques. Symmetry. 2020;12(4):499.

Sarker IH, Colman A, Han J. Recencyminer: mining recency-based personalized behavior from contextual smartphone data. J Big Data. 2019;6(1):1–21.

Sarker IH, Colman A, Han J, Khan AI, Abushark YB, Salah K. Behavdt: a behavioral decision tree learning to build user-centric context-aware predictive model. Mob Netw Appl. 2020;25(3):1151–61.

Sarker IH, Colman A, Kabir MA, Han J. Individualized time-series segmentation for mining mobile phone user behavior. Comput J. 2018;61(3):349–68.

Sarker IH, Furhad MH, Nowrozy R. Ai-driven cybersecurity: an overview, security intelligence modeling and research directions. SN Comput Sci. 2021;2(3):1–18.

Sarker IH, Hoque MM, Uddin MK. Mobile data science and intelligent apps: concepts, ai-based modeling and research directions. Mob Netw Appl. 2021;26(1):285–303.

Sarker IH, Kayes ASM. Abc-ruleminer: User behavioral rule-based machine learning method for context-aware intelligent services. J Netw Comput Appl. 2020;168:102762.

Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A. Cybersecurity data science: an overview from machine learning perspective. J Big data. 2020;7(1):1–29.

Sarker IH, Kayes ASM, Watters P. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J Big Data. 2019;6(1):1–28.

Sarker IH, Salah K. Appspred: predicting context-aware smartphone apps using random forest learning. Internet of Things. 2019;8:100106.

Satt A, Rozenberg S, Hoory R. Efficient emotion recognition from speech using deep learning on spectrograms. In: Interspeech, 2017; p. 1089–1093.

Sevakula RK, Singh V, Verma NK, Kumar C, Cui Y. Transfer learning for molecular cancer classification using deep neural networks. IEEE/ACM Trans Comput Biol Bioinf. 2018;16(6):2089–100.

Sujay Narumanchi H, Ananya Pramod Kompalli Shankar A, Devashish CK. Deep learning based large scale visual recommendation and search for e-commerce. arXiv preprint arXiv:1703.02344, 2017.

Shao X, Kim CS. Multi-step short-term power consumption forecasting using multi-channel lstm with time location considering customer behavior. IEEE Access. 2020;8:125263–73.

Siami-Namini S, Tavakoli N, Namin AS. The performance of lstm and bilstm in forecasting time series. In: 2019 IEEE International Conference on Big Data (Big Data), 2019; p. 3285–292. IEEE.

Ślusarczyk B. Industry 4.0: are we ready? Pol J Manag Stud. 2018; p. 17.

Sumathi P, Subramanian R, Karthikeyan VV, Karthik S. Soil monitoring and evaluation system using edl-asqe: enhanced deep learning model for iot smart agriculture network. Int J Commun Syst. 2021; p. e4859.

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2015; p. 1–9.

Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C. A survey on deep transfer learning. In: International Conference on artificial neural networks, 2018; p. 270–279. Springer.

Vesanto J, Alhoniemi E. Clustering of the self-organizing map. IEEE Trans Neural Netw. 2000;11(3):586–600.

Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A, Bottou L. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res. 2010;11(12).

Wang J, Liang-Chih Yu, Robert Lai K, Zhang X. Tree-structured regional cnn-lstm model for dimensional sentiment analysis. IEEE/ACM Trans Audio Speech Lang Process. 2019;28:581–91.

Wang S, Wan J, Li D, Liu C. Knowledge reasoning with semantic data for real-time data processing in smart factory. Sensors. 2018;18(2):471.

Wang W, Zhao M, Wang J. Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J Ambient Intell Humaniz Comput. 2019;10(8):3035–43.

Wang X, Liu J, Qiu T, Chaoxu M, Chen C, Zhou P. A real-time collision prediction mechanism with deep learning for intelligent transportation system. IEEE Trans Veh Technol. 2020;69(9):9497–508.

Wang Y, Huang M, Zhu X, Zhao L. Attention-based lstm for aspect-level sentiment classification. In: Proceedings of the 2016 Conference on empirical methods in natural language processing, 2016; p. 606–615.

Wei P, Li Y, Zhang Z, Tao H, Li Z, Liu D. An optimization method for intrusion detection classification model based on deep belief network. IEEE Access. 2019;7:87593–605.

Weiss K, Khoshgoftaar TM, Wang DD. A survey of transfer learning. J Big data. 2016;3(1):9.

Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C. Machine learning and deep learning methods for cybersecurity. Ieee access. 2018;6:35365–81.

Xu W, Sun H, Deng C, Tan Y. Variational autoencoder for semi-supervised text classification. In: Thirty-First AAAI Conference on artificial intelligence, 2017.

Xue Q, Chuah MC. New attacks on rnn based healthcare learning system and their detections. Smart Health. 2018;9:144–57.

Yousefi-Azar M, Hamey L. Text summarization using unsupervised deep learning. Expert Syst Appl. 2017;68:93–105.

Yuan X, Shi J, Gu L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst Appl. 2020; p. 114417.

Zhang G, Liu Y, Jin X. A survey of autoencoder-based recommender systems. Front Comput Sci. 2020;14(2):430–50.

Zhang X, Yao L, Huang C, Wang S, Tan M, Long Gu, Wang C. Multi-modality sensor data classification with selective attention. arXiv preprint arXiv:1804.05493, 2018.

Zhang X, Yao L, Wang X, Monaghan J, Mcalpine D, Zhang Y. A survey on deep learning based brain computer interface: recent advances and new frontiers. arXiv preprint arXiv:1905.04149, 2019; p. 66.

Zhang Y, Zhang P, Yan Y. Attention-based lstm with multi-task learning for distant speech recognition. In: Interspeech, 2017; p. 3857–861.


Author information

Authors and affiliations.

Swinburne University of Technology, Melbourne, VIC, 3122, Australia

Chittagong University of Engineering & Technology, Chittagong, 4349, Bangladesh


Corresponding author

Correspondence to Iqbal H. Sarker .

Ethics declarations

Conflict of interest.

The author declares no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K. N. and M. Shivakumar.


About this article

Cite this article.

Sarker, I.H. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN COMPUT. SCI. 2 , 420 (2021).


Received : 29 May 2021

Accepted : 07 August 2021

Published : 18 August 2021




The Top 17 ‘Must-Read’ AI Papers in 2022


We caught up with experts in the RE•WORK community to find out what the top 17 AI papers are for 2022 so far that you can add to your Summer must reads. The papers cover a wide range of topics including AI in social media and how AI can benefit humanity and are free to access.

Interested in learning more? Check out all the upcoming RE•WORK events to find out about the latest trends and industry updates in AI here .

Max Li, Staff Data Scientist – Tech Lead at Wish

Max is a Staff Data Scientist at Wish where he focuses on experimentation (A/B testing) and machine learning.  His passion is to empower data-driven decision-making through the rigorous use of data. View Max’s presentation, ‘Assign Experiment Variants at Scale in A/B Tests’, from our Deep Learning Summit in February 2022 here .

1. Bootstrapped Meta-Learning (2022) – Sebastian Flennerhag et al.

The first paper selected by Max proposes an algorithm that allows the meta-learner to teach itself, overcoming the meta-optimisation challenge. The algorithm focuses on meta-learning with gradients, which guarantees improvements in performance. The paper also looks at the possibilities that bootstrapping opens up. Read the full paper here .

2. Multi-Objective Bayesian Optimization over High-Dimensional Search Spaces (2022) – Samuel Daulton et al.

Another paper selected by Max proposes MORBO, a scalable method for multi-objective Bayesian optimization (BO) over high-dimensional search spaces. MORBO significantly improves sample efficiency over current BO approaches, including on problems where existing BO algorithms fail. Read the full paper here .

3. Tabular Data: Deep Learning is Not All You Need (2021) – Ravid Shwartz-Ziv, Amitai Armon

To solve real-life data science problems, selecting the right model to use is crucial. This final paper selected by Max explores whether deep models should be recommended as an option for tabular data. Read the full paper here .


Jigyasa Grover, Senior Machine Learning Engineer at Twitter

Jigyasa Grover is a Senior Machine Learning Engineer at Twitter working in the performance ads ranking domain. Recently, she was honoured with the 'Outstanding in AI: Young Role Model Award' by Women in AI across North America. She is one of the few ML Google Developer Experts globally. Jigyasa presented at our Deep Learning Summit and MLOps event in San Francisco earlier this year.

4. Privacy for Free: How does Dataset Condensation Help Privacy? (2022) – Tian Dong et al.

Jigyasa’s first recommendation concentrates on Privacy Preserving Machine Learning, specifically mitigating the leakage of sensitive data in Machine Learning. The paper provides one of the first propositions of using dataset condensation techniques to preserve the data efficiency during model training and furnish membership privacy. This paper was published by Sony AI and won the Outstanding Paper Award at ICML 2022. Read the full paper here .

5. Affective Signals in a Social Media Recommender System (2022) – Jane Dwivedi-Yu et al.

The second paper recommended by Jigyasa talks about operationalising Affective Computing, also known as Emotional AI, for an improved personalised feed on social media. The paper discusses the design of an affective taxonomy customised to user needs on social media. It further lays out the curation of suitable training data by combining engagement data and data from a human-labelling task to enable the identification of the affective response a user might exhibit for a particular post. Read the full paper here .

6. ItemSage: Learning Product Embeddings for Shopping Recommendations at Pinterest (2022) – Paul Baltescu et al.

Jigyasa’s last recommendation is a paper by Pinterest that illustrates the aggregation of both textual and visual information to build a unified set of product embeddings to enhance recommendation results on e-commerce websites. By applying multi-task learning, the proposed embeddings can optimise for multiple engagement types and ensure that the shopping recommendation stack is efficient with respect to all objectives. Read the full article here .

Asmita Poddar, Software Development Engineer at Amazon Alexa

Asmita is a Software Development Engineer at Amazon Alexa, where she works on developing and productionising natural language processing and speech models. Asmita also has prior experience in applying machine learning in diverse domains. Asmita will be presenting at our London AI Summit , in September, where she will discuss AI for Spoken Communication.

7. Competition-Level Code Generation with AlphaCode (2022) – Yujia Li et al.

Systems can help programmers become more productive. Asmita has selected this paper, which addresses the problems with incorporating innovations in AI into these systems. AlphaCode is a system that creates solutions for problems that require deeper reasoning. Read the full paper here .

8. A Commonsense Knowledge Enhanced Network with Retrospective Loss for Emotion Recognition in Spoken Dialog (2022) – Yunhe Xie et al.

There are limits to models’ reasoning on the existing datasets for emotion recognition in spoken dialog (ERSD). The final paper selected by Asmita proposes a Commonsense Knowledge Enhanced Network with a backward-looking loss to perform dialog modelling, external knowledge integration and historical state retrospect. The model has been shown to outperform other models. Read the full paper here .


Discover the speakers we have lined up and the topics we will cover at the London AI Summit.

Sergei Bobrovskyi, Expert in Anomaly Detection for Root Cause Analysis at Airbus

Dr. Sergei Bobrovskyi is a Data Scientist within the Analytics Accelerator team of the Airbus Digital Transformation Office. His work focuses on applications of AI for anomaly detection in time series, spanning various use-cases across Airbus. Sergei will be presenting at our Berlin AI Summit in October about Anomaly Detection, Root Cause Analysis and Explainability.

9. LaMDA: Language Models for Dialog Applications (2022) – Romal Thoppilan et al.

The paper chosen by Sergei describes the LaMDA system, which caused a furore this summer when a former Google engineer claimed it had shown signs of being sentient. LaMDA is a family of large language models for dialog applications based on the Transformer architecture. The interesting features of the models are their fine-tuning with human-annotated data and their ability to consult external sources. In any case, this is a very interesting model family, which we might encounter in many of the applications we use daily. Read the full paper here.

10. A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27 (2022) – Yann LeCun

The second paper chosen by Sergei provides a vision on how to progress towards general AI. The study combines a number of concepts including configurable predictive world model, behaviour driven through intrinsic motivation, and hierarchical joint embedding architectures. Read the full paper here .

11. Coordination Among Neural Modules Through a Shared Global Workspace (2022) – Anirudh Goyal et al.

This paper chosen by Sergei combines the Transformer architecture underlying most of the recent successes of deep learning with ideas from the Global Workspace Theory from cognitive sciences. This is an interesting read to broaden the understanding of why certain model architectures perform well and in which direction we might go in the future to further improve performance on challenging tasks. Read the full paper here .

12. Magnetic control of tokamak plasmas through deep reinforcement learning (2022) – Jonas Degrave et al.

Sergei chose the next paper, which asks the question ‘how can AI research benefit humanity?’. The use of AI to enable safe, reliable and scalable deployment of fusion energy could contribute to solving the pressing problems of climate change. Sergei has said that this is an extremely interesting application of AI technology for engineering. Read the full paper here.

13. TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data (2022) – Shreshth Tuli, Giuliano Casale and Nicholas R. Jennings

The final paper chosen by Sergei is a specialised paper applying the transformer architecture to the problem of unsupervised anomaly detection in multivariate time series. Many architectures that were successful in other fields are, at some point, also applied to time series. The paper shows improved performance on some well-known data sets. Read the full paper here.


Abdullahi Adamu, Senior Software Engineer at Sony

Abdullahi has worked in various industries including working at a market research start-up where he developed models that could extract insights from human conversations about products or services. He moved to Publicis, where he became Data Engineer and Data Scientist in 2018. Abdullahi will be part of our panel discussion at the London AI Summit in September, where he will discuss Harnessing the Power of Deep Learning.

14. Self-Supervision for Learning from the Bottom Up (2022) – Alexei Efros

This paper chosen by Abdullahi makes compelling arguments for why self-supervision is the next step in the evolution of AI/ML. Overall, these arguments further justify why self-supervised learning is important on our journey towards more robust models that generalise better in the wild. Read the full paper here.

15. Neural Architecture Search Survey: A Hardware Perspective (2022) – Krishna Teja Chitty-Venkata and Arun K. Somani

Another paper chosen by Abdullahi recognises that, as we move towards edge computing and federated learning, neural architecture search that takes hardware constraints into account will be more critical in ensuring that we have leaner neural network models that balance latency and generalisation performance. This survey gives a bird's-eye view of the various neural architecture search algorithms that take hardware constraints into account to design artificial neural networks with the best trade-off between performance and accuracy. Read the full paper here.

16. What Should Not Be Contrastive In Contrastive Learning (2021) – Tete Xiao et al.

This paper chosen by Abdullahi highlights the underlying assumptions behind data augmentation methods and how these can be counterproductive in the context of contrastive learning; for example, applying colour augmentation when the downstream task is meant to differentiate the colours of objects. The reported results are promising in the wild. Overall, it presents an elegant solution to using data augmentation for contrastive learning. Read the full paper here.

17. Why do tree-based models still outperform deep learning on tabular data? (2022) – Leo Grinsztajn, Edouard Oyallon and Gael Varoquaux

The final paper selected by Abdullahi works on answering the question of why deep learning models still find it hard to compete with tree-based models on tabular data. It shows that MLP-like architectures are more sensitive to uninformative features in data than their tree-based counterparts. Read the full paper here.



Curated list of awesome lists

Awesome - most cited deep learning papers.

[Notice] This list is not being maintained anymore because of the overwhelming amount of deep learning papers published every day since 2017.

A curated list of the most cited deep learning papers (2012-2016)

We believe that there exist classic deep learning papers which are worth reading regardless of their application domain. Rather than providing an overwhelming number of papers, we would like to provide a curated list of the awesome deep learning papers which are considered must-reads in certain research domains.

Before this list, there exist other awesome deep learning lists , for example, Deep Vision and Awesome Recurrent Neural Networks . Also, after this list comes out, another awesome list for deep learning beginners, called Deep Learning Papers Reading Roadmap , has been created and loved by many deep learning researchers.

Although the Roadmap List includes lots of important deep learning papers, it feels overwhelming for me to read them all. As I mentioned in the introduction, I believe that seminal works can give us lessons regardless of their application domain. Thus, I would like to introduce the top 100 deep learning papers here as a good starting point for an overview of deep learning research.

To get news about newly released papers every day, follow my twitter or facebook page!

Awesome list criteria

(Citation criteria)

Please note that we prefer seminal deep learning papers that can be applied to various areas of research rather than application papers. For that reason, some papers that meet the criteria may not be accepted while others can be. It depends on the impact of the paper, its applicability to other research, the scarcity of the research domain, and so on.

We need your contributions!

If you have any suggestions (missing papers, new papers, key researchers or typos), please feel free to edit and pull a request. (Please read the contributing guide for further instructions, though just letting me know the title of papers can also be a big contribution to us.)

(Update) You can download all top-100 papers with this and collect all authors' names with this. Also, a bib file for all top-100 papers is available. Thanks, doodhwala, Sven and grepinsight!

Understanding / Generalization / Transfer

Optimization / Training Techniques

Unsupervised / Generative Models

Image / Video / Etc

Natural Language Processing / RNNs

Speech / Other Domain

Reinforcement Learning / Robotics

More Papers from 2016

(More than Top 100)

Book / Survey / Review

Video lectures / tutorials / blogs.

Convolutional Neural Network Models

Image: Segmentation / Object Detection

Newly published papers (< 6 months) which are worth reading

Classic papers published before 2012

HW / SW / Dataset


Appendix: More than Top 100


Thank you for all your contributions. Please make sure to read the contributing guide before you make a pull request.

To the extent possible under law, Terry T. Um has waived all copyright and related or neighboring rights to this work.

How to Read Research Papers: A Pragmatic Approach for ML Practitioners


Is it necessary for data scientists or machine-learning experts to read research papers?

The short answer is yes. And don’t worry if you lack a formal academic background or have only obtained an undergraduate degree in the field of machine learning.

Reading academic research papers may be intimidating for individuals without an extensive educational background. However, a lack of academic reading experience should not prevent data scientists from taking advantage of a valuable source of information and knowledge for machine learning and AI development.

This article provides a hands-on tutorial for data scientists of any skill level to read research papers published in academic journals such as NeurIPS , JMLR , ICML, and so on.

Before diving into how to read research papers, the first steps cover selecting a relevant topic and relevant research papers.

Step 1: Identify a topic

The domain of machine learning and data science is home to a plethora of subject areas that may be studied. But this does not necessarily imply that tackling each topic within machine learning is the best option.

Although entry-level practitioners are advised to generalize, long-term career prospects in machine learning, along with practitioner and industry interest, often shift toward specialization.

Identifying a niche topic to work on may be difficult, but it is worthwhile. A rule of thumb is to select an ML field in which you are either interested in obtaining a professional position or already have experience.

Deep learning is one of my interests, and I’m a Computer Vision Engineer who uses deep learning models in apps to solve computer vision problems professionally. As a result, I’m interested in topics like pose estimation, action classification, and gesture identification.

Based on roles, the following are examples of ML/DS occupations and related themes to consider.


For this article, I’ll select the topic Pose Estimation to explore and choose associated research papers to study.

Step 2: Finding research papers

One of the most excellent tools to use while looking at machine learning-related research papers, datasets, code, and other related materials is PapersWithCode .

We use the search engine on the PapersWithCode website to get relevant research papers and content for our chosen topic, “Pose Estimation.” The following image shows you how it’s done.

The search results page contains a short explanation of the searched topic, followed by a table of associated datasets, models, papers, and code. Without going into too much detail, the area of interest for this use case is the “Greatest papers with code” section, which contains the papers relevant to the task or topic. For the purpose of this article, I’ll select DensePose: Dense Human Pose Estimation In The Wild.

Step 3: First pass (gaining context and understanding)


At this point, we’ve selected a research paper to study and are prepared to extract any valuable learnings and findings from its content.

It’s only natural that your first impulse is to start taking notes and reading the document from beginning to end, perhaps taking some rest in between. However, establishing context for the content of a research paper first is a more practical way to read it. The title, abstract, and conclusion are the three key parts of any research paper for gaining that context.

The goal of the first pass of your chosen paper is to achieve the following:

The title is the first point of information sharing between the authors and the reader. Therefore, research paper titles are direct and composed in a manner that leaves no ambiguity.

The title is the most telling aspect of a research paper since it indicates the study’s relevance to your work. Its purpose is to give a brief impression of the paper’s content.

In this situation, the title is “DensePose: Dense Human Pose Estimation in the Wild.” This gives a broad overview of the work and implies that it will look at how to properly provide pose estimations in realistic situations and environments with high levels of activity.

The abstract gives a summarized version of the paper. It’s a short section, typically 300-500 words, that tells you in a nutshell what the paper is about: the article’s content, the researchers’ objectives, and their methods and techniques.

When reading an abstract of a machine-learning research paper, you’ll typically come across mentions of datasets, methods, algorithms, and other terms. Keywords relevant to the article’s content provide context. It may be helpful to take notes and keep track of all keywords at this point.

For the paper “DensePose: Dense Human Pose Estimation In The Wild”, I identified the following keywords in the abstract: pose estimation, COCO dataset, CNN, region-based models, real-time.
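Keeping track of keywords can be as simple as counting how often each one appears in the text you are reading. The sketch below is one hypothetical way to do that in Python; the function name `count_keywords` is made up for illustration, and `sample_abstract` is placeholder text rather than the actual DensePose abstract.

```python
import re


def count_keywords(text, keywords):
    """Count case-insensitive, whole-phrase occurrences of each keyword."""
    counts = {}
    for keyword in keywords:
        # \b keeps "CNN" from matching inside a longer token.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        counts[keyword] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Placeholder text for illustration only, not the real abstract.
sample_abstract = (
    "We study dense pose estimation on the COCO dataset. "
    "Our CNN uses region-based models and runs in real-time. "
    "Pose estimation quality is evaluated on held-out images."
)
keywords = ["pose estimation", "COCO dataset", "CNN", "region-based models", "real-time"]

keyword_counts = count_keywords(sample_abstract, keywords)
for keyword, count in keyword_counts.items():
    print(f"{keyword}: {count}")
```

Keywords that recur often in the abstract are usually good candidates to look up first if they are unfamiliar.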

It’s not uncommon to experience fatigue when reading the paper from top to bottom on your first pass, especially for data scientists and practitioners with no prior advanced academic experience. Although extracting information from the later sections of a paper might seem tedious after a long study session, conclusion sections are often short. Hence, reading the conclusion section in the first pass is recommended.

The conclusion section is a brief summary of the authors’ contributions and accomplishments, along with the work’s limitations and promises for future developments.

Before reading the main content of a research paper, read the conclusion section to see if the researcher’s contributions, problem domain, and outcomes match your needs.

Following this particular brief first pass step enables a sufficient understanding and overview of the research paper’s scope and objectives, as well as a context for its content. You’ll be able to get more detailed information out of its content by going through it again with laser attention.

Step 4: Second pass (content familiarization)

Content familiarization builds directly on the initial steps of the systematic approach to reading research papers presented in this article. It involves the introduction section and the figures within the research paper.

As previously mentioned, there’s no need to plunge straight into the core of the research paper; familiarizing yourself with its content first makes for an easier and more comprehensive examination of the study in later passes.


Introductory sections of research papers are written to provide an overview of the objective of the research efforts. This objective mentions and explains problem domains, research scope, prior research efforts, and methodologies.

It’s normal to find parallels to past research work in this area, using similar or distinct methods. Other papers’ citations provide the scope and breadth of the problem domain, which broadens the exploratory zone for the reader. Perhaps incorporating the procedure outlined in Step 3 is sufficient at this point.

Another benefit of the introduction section is that it presents the requisite knowledge needed to approach and understand the content of the research paper.

Graphs, diagrams, figures

Illustrative materials within the research paper ensure that readers can comprehend factors that support problem definition or explanations of methods presented. Commonly, tables are used within research papers to provide information on the quantitative performances of novel techniques in comparison to similar approaches.

Image showing the comparison of DensePose with other single-person pose estimation solutions.

Generally, the visual representation of data and performance enables the development of an intuitive understanding of the paper’s context. In the DensePose paper mentioned earlier, illustrations are used to depict the performance of the authors’ approach to pose estimation and to create an overall understanding of the steps involved in generating and annotating data samples.

In the realm of deep learning, it’s common to find topological illustrations depicting the structure of artificial neural networks. Again this adds to the creation of intuitive understanding for any reader. Through illustrations and figures, readers may interpret the information themselves and gain a fuller perspective of it without having any preconceived notions about what outcomes should be.

Image showing the cross-cascading architecture of DensePose.

Step 5: Third pass (deep reading)

The third pass of the paper is similar to the second, though it covers a greater portion of the text. The most important thing about this pass is that you avoid any complex mathematics or technical formulations that may be difficult for you. During this pass, you can also skip over any words and definitions that you don’t understand or aren’t familiar with. These unfamiliar terms, algorithms, or techniques should be noted so you can return to them later.


During this pass, your primary objective is to gain a broad understanding of what’s covered in the paper. Approach the paper, starting again from the abstract to the conclusion, but be sure to take intermediary breaks in between sections. Moreover, it’s recommended to have a notepad, where all key insights and takeaways are noted, alongside the unfamiliar terms and concepts.

The Pomodoro Technique is an effective method of managing time allocated to deep reading or study. Explained simply, the Pomodoro Technique involves the segmentation of the day into blocks of work, followed by short breaks.

What works for me is the 50/15 split, that is, 50 minutes studying and 15 minutes allocated to breaks. I tend to execute this split twice consecutively before taking a more extended break of 30 minutes. If you are unfamiliar with this time management technique, adopt a relatively easy division such as 25/5 and adjust the time split according to your focus and time capacity.
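As a rough illustration of the arithmetic behind such a split, the sketch below lays out one study cycle in Python. The function name `build_cycle` and its parameters are hypothetical, and it assumes the 50/15 split is run twice in full before the 30-minute break.

```python
def build_cycle(study_min, break_min, rounds, long_break_min):
    """Return one study cycle as a list of (label, minutes) blocks."""
    blocks = []
    for _ in range(rounds):
        blocks.append(("study", study_min))
        blocks.append(("break", break_min))
    # Assumption: the extended break follows the last short break.
    blocks.append(("long break", long_break_min))
    return blocks


cycle = build_cycle(study_min=50, break_min=15, rounds=2, long_break_min=30)
study_total = sum(minutes for label, minutes in cycle if label == "study")
cycle_total = sum(minutes for _, minutes in cycle)

for label, minutes in cycle:
    print(f"{label:>10}: {minutes} min")
print(f"study time: {study_total} of {cycle_total} min")
```

Passing `study_min=25, break_min=5` instead gives the gentler 25/5 division, which makes it easy to compare how much reading time each split leaves in a session.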

Step 6: Fourth pass (final pass)

The final pass is typically one that involves an exertion of your mental and learning abilities, as it involves going through the unfamiliar terms, terminologies, concepts, and algorithms noted in the previous pass. This pass focuses on using external material to understand the recorded unfamiliar aspects of the paper.
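One way to keep the notes from earlier passes usable in the final pass is to record each unfamiliar term alongside the explanation you eventually find for it. The sketch below is a minimal, hypothetical note-keeping structure in Python; the class `PaperNotes` and its method names are illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class PaperNotes:
    """Notes for one paper: key insights plus terms still to be looked up."""
    title: str
    insights: list = field(default_factory=list)
    # Maps each unfamiliar term to its explanation, or None until resolved.
    unfamiliar: dict = field(default_factory=dict)

    def note_term(self, term):
        """Record an unfamiliar term during the third pass."""
        self.unfamiliar.setdefault(term, None)

    def resolve(self, term, explanation):
        """Attach an explanation found in external material (final pass)."""
        if term not in self.unfamiliar:
            raise KeyError(f"term was never noted: {term}")
        self.unfamiliar[term] = explanation

    def pending(self):
        """Terms that still need further reading."""
        return [t for t, e in self.unfamiliar.items() if e is None]


notes = PaperNotes("DensePose: Dense Human Pose Estimation In The Wild")
notes.note_term("region-based models")
notes.note_term("COCO dataset")
notes.resolve("COCO dataset", "Microsoft COCO: Common Objects in Context")
print(notes.pending())
```

The final pass is finished, in this framing, once `pending()` returns an empty list.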

In-depth studies of unfamiliar subjects have no specified time length, and at times efforts span into the days and weeks. The critical factor to a successful final pass is locating the appropriate sources for further exploration.

Unfortunately, there isn’t one source on the Internet that provides the wealth of information you require. Still, there are multiple sources that, when used in unison and appropriately, fill knowledge gaps. Below are a few of these resources.

The reference sections of research papers list the techniques and algorithms that the current paper either draws inspiration from or builds upon, which is why the reference section is a useful source to use in your deep reading sessions.

Step 7: Summary (optional)

In almost a decade of academic and professional undertakings in technology-associated subjects and roles, the most effective method I have found for ensuring that new information is retained in my long-term memory is the recapitulation of explored topics. By rewriting new information in my own words, either handwritten or typed, I’m able to reinforce the presented ideas in an understandable and memorable manner.


To take it one step further, it’s possible to publicize learning efforts and notes through blogging platforms and social media. Attempting to explain a freshly explored concept to a broad audience, assuming the reader isn’t accustomed to the topic or subject, requires understanding the topic in intricate detail.

Undoubtedly, reading research papers for novice Data Scientists and ML practitioners can be daunting and challenging; even seasoned practitioners find it difficult to digest the content of research papers in a single pass successfully.

The nature of the Data Science profession is very practical and involved. This means its practitioners are required to employ an academic mindset, all the more so because the Data Science domain is closely associated with AI, which is still a developing field.

To summarize, here are all of the steps you should follow to read a research paper:

Thanks for reading!


Collection of must read papers for Data Science, or Machine Learning / Deep Learning Engineer



Must Read Papers for Data Science, ML, and DL

Curated collection of data science, machine learning and deep learning papers, reviews and articles that are on the must-read list.

NOTE: 🚧 In the process of updating. Let me know what additional papers, articles, or blogs to add, and I will add them here.
👉 ⭐ this repo




🥇 - Read it first

🥈 - Read it second

🥉 - Read it third
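The medal legend above implies a reading order. As a small illustration, the Python sketch below sorts a few entries by tier; the `TIER_ORDER` mapping and the `reading_order` helper are hypothetical names, and the three titles are taken from the Dimensionality Reduction section of this list.

```python
# Lower number means read earlier, following the legend above.
TIER_ORDER = {"🥇": 0, "🥈": 1, "🥉": 2}


def reading_order(entries):
    """Sort (medal, title) pairs by tier; ties keep their original order."""
    return sorted(entries, key=lambda entry: TIER_ORDER[entry[0]])


entries = [
    ("🥉", "Visualizing Data using t-SNE"),
    ("🥇", "A Tutorial on Principal Component Analysis"),
    ("🥈", "How to Use t-SNE Effectively"),
]

ordered = reading_order(entries)
for medal, title in ordered:
    print(medal, title)
```

Because Python's `sorted` is stable, papers within the same tier stay in the order in which the list presents them.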

Data Science

📊 Pre-processing & EDA

🥇 📄 Data preprocessing - Tidy data - by Hadley Wickham

📓 General DS

🥇 📄 Statistical Modeling: The Two Cultures - by Leo Breiman

🥈 📄 A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning

🥇 📄 Frequentism and Bayesianism: A Python-driven Primer by Jake VanderPlas

Machine Learning

🎯 General ML

🥇 📄 Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning - by Sebastian Raschka

🥇 📄 A Brief Introduction into Machine Learning - by Gunnar Ratsch

🥉 📄 An Introduction to the Conjugate Gradient Method Without the Agonizing Pain - by Jonathan Richard Shewchuk

🥉 📄 On Model Stability as a Function of Random Seed

🔍 Outlier/Anomaly detection

🥇 📰 Outlier Detection : A Survey

🥈 📄 XGBoost: A Scalable Tree Boosting System

🥈 📄 LightGBM: A Highly Efficient Gradient Boosting Decision Tree

🥈 📄 AdaBoost and the Super Bowl of Classifiers - A Tutorial Introduction to Adaptive Boosting

🥉 📄 Greedy Function Approximation: A Gradient Boosting Machine

📖 Unraveling Blackbox ML

🥉 📄 Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation

🥉 📄 Data Shapley: Equitable Valuation of Data for Machine Learning

✂️ Dimensionality Reduction

🥇 📄 A Tutorial on Principal Component Analysis

🥈 📄 How to Use t-SNE Effectively

🥉 📄 Visualizing Data using t-SNE

📈 Optimization

🥇 📄 A Tutorial on Bayesian Optimization

🥈 📄 Taking the Human Out of the Loop: A review of Bayesian Optimization

Famous Blogs

Sebastian Raschka

🎱 🔮 Recommenders

🥇 📄 A Survey of Collaborative Filtering Techniques

🥇 📄 Collaborative Filtering Recommender Systems

🥇 📄 Deep Learning Based Recommender System: A Survey and New Perspectives

🥇 📄 🤔 ⭐ Explainable Recommendation: A Survey and New Perspectives ⭐

Case Studies

🥈 📄 The Netflix Recommender System: Algorithms, Business Value, and Innovation

🥈 📄 Two Decades of Recommender Systems at

🥈 🌐 How Does Spotify Know You So Well?

👉 More In-Depth study, 📕 Recommender Systems Handbook

Famous Deep Learning Blogs 🤠

🌐 Stanford UFLDL Deep Learning Tutorial


🌐 Colah's Blog

🌐 Andrej Karpathy

🌐 Zack Lipton

🌐 Sebastian Ruder

🌐 Jay Alammar

📚 Neural Networks and Deep Learning Neural Networks

⭐ 🥇 📰 The Matrix Calculus You Need For Deep Learning - Terence Parr and Jeremy Howard ⭐

🥇 📰 Deep learning - Yann LeCun, Yoshua Bengio & Geoffrey Hinton

🥇 📄 Generalization in Deep Learning

🥇 📄 Topology of Learning in Artificial Neural Networks

🥇 📄 Dropout: A Simple Way to Prevent Neural Networks from Overfitting

🥈 📄 Polynomial Regression As an Alternative to Neural Nets

🥈 🌐 The Neural Network Zoo

🥈 🌐 Image Completion with Deep Learning in TensorFlow

🥈 📄 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

🥉 📄 A systematic study of the class imbalance problem in convolutional neural networks

🥉 📄 All Neural Networks are Created Equal

🥉 📄 Adam: A Method for Stochastic Optimization

🥉 📄 AutoML: A Survey of the State-of-the-Art

🥇 📄 Visualizing and Understanding Convolutional Networks -by Andrej Karpathy Justin Johnson Li Fei-Fei

🥈 📄 Deep Residual Learning for Image Recognition

🥈 📄 AlexNet-ImageNet Classification with Deep Convolutional Neural Networks


🥉 📄 A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction

🥉 📄 Large-scale Video Classification with Convolutional Neural Networks

🥉 📄 Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

⚫ CapsNet 🔱

🥇 📄 Dynamic Routing Between Capsules

Blog explaining, "What are CapsNet, or Capsule Networks?"

Capsule Networks Tutorial by Aurélien Géron

🏞️ 💬 Image Captioning

🥇 📄 Show and Tell: A Neural Image Caption Generator

🥈 📄 Neural Machine Translation by Jointly Learning to Align and Translate

🥈 📄 StyleNet: Generating Attractive Visual Captions with Styles

🥈 📄 Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

🥈 📄 Where to put the Image in an Image Caption Generator

🥈 📄 Dank Learning: Generating Memes Using Deep Neural Networks

🚗 🚶‍♂️ Object Detection 🦅 🏈

🥈 📄 ResNet-Deep Residual Learning for Image Recognition

🥈 📄 YOLO-You Only Look Once: Unified, Real-Time Object Detection

🥈 📄 Microsoft COCO: Common Objects in Context

🥈 📄 (R-CNN) Rich feature hierarchies for accurate object detection and semantic segmentation

🥈 📄 Fast R-CNN

🥈 📄 Faster R-CNN

🥈 📄 Mask R-CNN

🚗 🚶‍♂️ 👫 Pose Detection 🏃 💃

🥈 📄 DensePose: Dense Human Pose Estimation In The Wild

🥈 📄 Parsing R-CNN for Instance-Level Human Analysis

🔡 🔣 Deep NLP 💱 🔢

🥇 📄 A Primer on Neural Network Models for Natural Language Processing

🥇 📄 Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

🥇 📄 On the Properties of Neural Machine Translation: Encoder–Decoder Approaches

🥇 📄 LSTM: A Search Space Odyssey - by Klaus Greff et al.

🥇 📄 A Critical Review of Recurrent Neural Networks for Sequence Learning

🥇 📄 Visualizing and Understanding Recurrent Networks

⭐ 🥇 📄 Attention Is All You Need ⭐

🥇 📄 An Empirical Exploration of Recurrent Network Architectures

🥇 📄 Open AI (GPT-2) Language Models are Unsupervised Multitask Learners

🥇 📄 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

🥉 📄 Parameter-Efficient Transfer Learning for NLP

🥉 📄 A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification

🥉 📄 A Survey on Recent Advances in Named Entity Recognition from Deep Learning models

🥉 📄 Convolutional Neural Networks for Sentence Classification

🥉 📄 Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction

🥉 📄 Single Headed Attention RNN: Stop Thinking With Your Head

🥇 📄 Generative Adversarial Nets - Goodfellow et al.

📚 GAN Rabbit Hole -> GAN Papers

⭕ ➖ ⭕ GNNs (Graph Neural Networks)

🥉 📄 A Comprehensive Survey on Graph Neural Networks

👨‍⚕️ 💉 Medical AI 💊 🔬

Machine learning classifiers and fMRI: a tutorial overview - by Francisco et al.

👇 Cool Stuff 👇

🔊 📄 SoundNet: Learning Sound Representations from Unlabeled Video

🎨 📄 CAN: Creative Adversarial Networks Generating “Art” by Learning About Styles and Deviating from Style Norms

🎨 📄 Deep Painterly Harmonization

🕺 💃 📄 Everybody Dance Now

⚽ Soccer on Your Tabletop

👱‍♀️ 💇‍♀️ 📄 SC-FEGAN: Face Editing Generative Adversarial Network with User's Sketch and Color

📸 📄 Handheld Mobile Photography in Very Low Light

🏯 🕌 📄 Learning Deep Features for Scene Recognition using Places Database

🚅 🚄 📄 High-Speed Tracking with Kernelized Correlation Filters

🎬 📄 Recent progress in semantic image segmentation

Rabbit hole -> 🔊 🌐 Analytics Vidhya Top 10 Audio Processing Tasks and their papers

👱‍♂️ -> 👴 📄 Face Aging With Conditional GANs

👱‍♂️ -> 👴 📄 Dual Conditional GANs for Face Aging and Rejuvenation

⚖️ 📄 BAGAN: Data Augmentation with Balancing GAN

📰 Cap Stone Projects 📰

8 Awesome Data Science Capstone Projects

10 Powerful Applications of Linear Algebra in Data Science

Top 5 Interesting Applications of GANs

Deep Learning Applications a beginner can build in minutes

2019-10-28 Started must-read-papers-for-ml repo

2019-10-29 Added analytics vidhya use case studies article links

2019-10-30 Added Outlier/Anomaly detection paper, separated Boosting, CNN, Object Detection, NLP papers, and added Image captioning papers

2019-10-31 Added Famous Blogs from Deep and Machine Learning Researchers

2019-11-1 Fixed markdown issues, added contribution guideline

2019-11-20 Added Recommender Surveys, and Papers

2019-12-12 Added R-CNN variants, PoseNets, GNNs

2020-02-23 Added GRU paper


Title: Reinforcement Learning in Practice: Opportunities and Challenges

Abstract: This article is a gentle discussion about the field of reinforcement learning in practice, about opportunities and challenges, touching a broad range of topics, with perspectives and without technical details. The article is based on both historical and recent research papers, surveys, tutorials, talks, blogs, books, (panel) discussions, and workshops/conferences. Various groups of readers, like researchers, engineers, students, managers, investors, officers, and people wanting to know more about the field, may find the article interesting. In this article, we first give a brief introduction to reinforcement learning (RL), and its relationship with deep learning, machine learning and AI. Then we discuss opportunities of RL, in particular, products and services, games, bandits, recommender systems, robotics, transportation, finance and economics, healthcare, education, combinatorial optimization, computer systems, and science and engineering. Then we discuss challenges, in particular, 1) foundation, 2) representation, 3) reward, 4) exploration, 5) model, simulation, planning, and benchmarks, 6) off-policy/offline learning, 7) learning to learn a.k.a. meta-learning, 8) explainability and interpretability, 9) constraints, 10) software development and deployment, 11) business perspectives, and 12) more challenges. We conclude with a discussion, attempting to answer: "Why has RL not been widely adopted in practice yet?" and "When is RL helpful?".

upGrad blog

Top 16 Exciting Deep Learning Project Ideas for Beginners [2023]

Experienced Developer, Team Player and a Leader with a demonstrated history of working in startups. Strong engineering professional with a Bachelor of Technology (BTech) focused in Computer Science from Indian…

Deep Learning Project Ideas

Although it is a relatively recent technological advancement, the scope of Deep Learning is expanding rapidly. The technology aims to imitate biological neural networks, such as those of the human brain. While the origins of Deep Learning date back to the 1950s, it is only with the advancement and adoption of Artificial Intelligence and Machine Learning that it came into the limelight. So, if you are an ML beginner, the best thing you can do is work on some Deep Learning project ideas.

You don’t have to waste time finding the best deep learning research topic for you. This article presents a variety of deep learning project topics, organised by difficulty.

We, here at upGrad, believe in a practical approach, as theoretical knowledge alone won’t be of help in a real-world work environment. In this article, we explore some interesting deep learning project ideas that beginners can work on to put their knowledge to the test and gain hands-on experience.

A subset of Machine Learning, Deep Learning leverages artificial neural networks arranged hierarchically to perform specific ML tasks. Deep Learning networks can be trained with supervised learning on labeled data, and they can also learn from unstructured or unlabeled data. Artificial neural networks are loosely modelled on the human brain, with neuron nodes interconnected to form a web-like structure.

While traditional learning models analyze data using a linear approach, the hierarchical function of Deep Learning systems is designed to process and analyze data in a nonlinear approach.

Deep Learning architectures like deep neural networks, recurrent neural networks, and deep belief networks have found applications in various fields including natural language processing, computer vision, bioinformatics, speech recognition, audio recognition, machine translation, social network filtering, drug design, and even board game programs. As new advances are being made in this domain, it is helping ML and Deep Learning experts to design innovative and functional Deep Learning projects. The more deep learning project ideas you try, the more experience you gain.

Today, we’ll discuss some of the top Deep Learning projects that are helping us reach new heights of achievement.

This article covers deep learning project ideas at three levels. Start with the beginner projects, which you can solve with ease. Once you finish these, go back, learn a few more concepts and then try the intermediate projects. When you feel confident, you can tackle the advanced projects.

So, here are a few Deep Learning project ideas which beginners can work on:

Deep Learning Project Ideas: Beginners Level

This list of deep learning project ideas for students is suited for beginners, and those just starting out with ML in general. These deep learning project ideas will get you going with all the practicalities you need to succeed in your career.

Further, if you’re looking for deep learning project ideas for final year, this list should get you going. So, without further ado, let’s jump straight into some deep learning project ideas that will strengthen your base and allow you to climb up the ladder.

1. Image Classification with CIFAR-10 dataset

One of the best ways to start getting hands-on with deep learning projects for students is to work on image classification. CIFAR-10 is a dataset of 60,000 colour images (32×32 pixels) categorized into ten classes, with 6,000 images per class. The training set contains 50,000 images and the test set contains 10,000. The training set is divided into five batches of 10,000 images each, arranged in random order. The test set contains exactly 1,000 randomly chosen images from each of the ten classes.

In this project, you’ll develop an image classification system that can identify the class of an input image. Image classification is a pivotal application in the field of deep learning, and hence, you will gain knowledge on various deep learning concepts while working on this project.
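The dataset arithmetic described above can be checked with a short sketch. The constant names and the `cifar10_layout` helper are illustrative only; they are not part of any CIFAR-10 loader, and no images are downloaded here.

```python
# Worked arithmetic for the CIFAR-10 split described in the text.
# All names below are hypothetical helpers for illustration.
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6_000
TEST_IMAGES_PER_CLASS = 1_000
TRAIN_BATCHES = 5
IMAGE_SIDE = 32   # images are 32x32 pixels
CHANNELS = 3      # red, green, blue


def cifar10_layout():
    """Return the sizes implied by the class and split counts."""
    total = NUM_CLASSES * IMAGES_PER_CLASS
    test = NUM_CLASSES * TEST_IMAGES_PER_CLASS
    train = total - test
    return {
        "total": total,
        "train": train,
        "test": test,
        "train_batch_size": train // TRAIN_BATCHES,
        "values_per_image": IMAGE_SIDE * IMAGE_SIDE * CHANNELS,
    }


if __name__ == "__main__":
    for name, value in cifar10_layout().items():
        print(f"{name}: {value}")
```

The `values_per_image` entry is the input size a fully connected classifier would see if each image were flattened into a single vector.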

2. Visual tracking system

A visual tracking system is designed to track and locate moving object(s) in a given time frame via a camera. It is a handy tool that has numerous applications such as security and surveillance, medical imaging, augmented reality, traffic control, video editing and communication, and human-computer interaction.

This system uses a deep learning algorithm to analyze sequential video frames, after which it tracks the movement of target objects between the frames. The two core components of this visual tracking system are:

3. Face detection system

This is one of the excellent deep learning project ideas for beginners. With the advance of deep learning, facial recognition technology has also advanced tremendously. Face detection is a subset of object detection, which focuses on locating instances of semantic objects in images. It is designed to locate and mark human faces within digital images.

In this deep learning project, you will learn how to detect human faces in real time. You have to develop the model in Python and OpenCV.

Deep Learning Project Ideas: Intermediate Level

4. Digit recognition system

As the name suggests, this project involves developing a digit recognition system that can classify handwritten digits. Here, you’ll be using the MNIST dataset, which contains 28×28 grayscale images of handwritten digits.

This project aims to create a recognition system that can classify digits ranging from 0 to 9, using a combination of a shallow network and a deep neural network and by implementing logistic regression. Softmax Regression, also called Multinomial Logistic Regression, is a natural choice for this project. Since this technique is a generalization of logistic regression, it is apt for multi-class classification, assuming that all the classes are mutually exclusive.
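To make the softmax regression idea concrete, here is a minimal sketch of the prediction step in plain Python. The function names, the weight layout (one row of weights per class) and the toy two-pixel input are assumptions for illustration; a real MNIST model would use 784 inputs per image and weights learned by training, which is not shown.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_digit(pixels, weights, biases):
    """Score every class linearly, then pick the most probable one.

    weights holds one row per class; each row has one weight per pixel.
    """
    logits = [
        sum(w * x for w, x in zip(row, pixels)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    # Toy example: ten classes, two "pixels"; only class 3 reacts to pixel 0.
    toy_weights = [[5.0 if k == 3 else 0.0, 0.0] for k in range(10)]
    toy_biases = [0.0] * 10
    digit, probs = predict_digit([1.0, 0.0], toy_weights, toy_biases)
    print(digit, round(sum(probs), 6))
```

Because the probabilities sum to 1, each image is assigned to exactly one digit, which matches the mutually exclusive classes assumption.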

5. Chatbot

In this project, you will model a chatbot using IBM Watson’s API. Watson is a prime example of what AI can help us accomplish. The idea behind this project is to harness Watson’s deep learning abilities to create a chatbot that can engage with humans just like another human being. Chatbots can answer human questions or requests in real time. This is the reason why an increasing number of companies across all domains are adopting chatbots in their customer support infrastructure.

This project isn’t a very challenging one. All you need is Python 2 or 3 on your machine, a Bluemix account, and of course, an active Internet connection! If you wish to scale it up a notch, you can visit the project’s GitHub repository and improve your chatbot’s features by including an animated car dashboard.

6. Music genre classification system

This is an excellent project to nurture and improve your deep learning skills. You will create a deep learning model that uses neural networks to classify the genre of music automatically. For this project, you will use the FMA (Free Music Archive) dataset. FMA is an interactive library of high-quality, legal audio downloads. It is an open and easily accessible dataset that is great for a host of music information retrieval (MIR) tasks, including browsing and organizing vast music collections.

However, keep in mind that before you can use the model to classify audio files by genre, you will have to extract the relevant information from the audio samples (like spectrograms, MFCC, etc.).
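As a rough illustration of what "extracting information from audio samples" means, the sketch below computes a very simple spectrogram: the audio is cut into frames and the magnitude of each frequency component is computed per frame. This is a naive discrete Fourier transform written for clarity, and the function names and non-overlapping framing are assumptions; real projects typically rely on an audio library and add windowing, overlap and mel scaling.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT: magnitude of each frequency bin from 0 up to n/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_size):
    """Split samples into non-overlapping frames and analyse each one."""
    last_start = len(samples) - frame_size
    return [
        magnitude_spectrum(samples[start:start + frame_size])
        for start in range(0, last_start + 1, frame_size)
    ]


if __name__ == "__main__":
    # A pure tone that completes 4 cycles per 32-sample frame.
    tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
    frames = spectrogram(tone, frame_size=32)
    loudest_bin = max(range(len(frames[0])), key=frames[0].__getitem__)
    print(len(frames), loudest_bin)
```

Each row of the result describes which frequencies are present in one short slice of audio, and a genre classifier would be trained on features of this kind rather than on raw samples.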

7. Drowsiness detection system

The drowsiness of drivers is one of the main reasons behind road accidents. It is natural for drivers who frequent long routes to doze off when behind the steering wheel. Even stress and lack of sleep can cause drivers to feel drowsy while driving. This project aims to prevent and reduce such accidents by creating a drowsiness detection agent.  

Here, you will use Python, OpenCV, and Keras to build a system that can detect the closed eyes of drivers and alert them if ever they fall asleep while driving. Even if the driver’s eyes are closed for a few seconds, this system will immediately inform the driver, thereby preventing terrible road accidents. OpenCV will monitor and collect the driver’s images via a webcam and feed them into the deep learning model that will classify the driver’s eyes as ‘open’ or ‘closed.’
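The alerting step can be sketched separately from the image model. The snippet below assumes the classifier has already labelled each webcam frame as 'open' or 'closed', and raises an alert only when the eyes stay closed for several consecutive frames, so that ordinary blinks are ignored. The function name and the default threshold of 15 frames are hypothetical choices for illustration, not values from the project.

```python
def should_alert(eye_states, max_closed_frames=15):
    """Return True once eyes stay 'closed' for max_closed_frames in a row.

    eye_states is an iterable of per-frame labels, 'open' or 'closed',
    as produced by some upstream eye-state classifier (not shown here).
    """
    closed_run = 0
    for state in eye_states:
        if state == "closed":
            closed_run += 1
            if closed_run >= max_closed_frames:
                return True
        else:
            closed_run = 0  # any open frame resets the counter
    return False


if __name__ == "__main__":
    blinking = ["open", "closed", "closed", "open"] * 10
    dozing = ["open"] * 5 + ["closed"] * 20
    print(should_alert(blinking), should_alert(dozing))
```

The threshold trades off reaction time against false alarms, and it would need tuning against the webcam frame rate.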

8. Image caption generator

This is one of the trending deep learning project ideas. It is a Python-based project that leverages Convolutional Neural Networks (CNN) and LSTM (Long Short-Term Memory, a type of Recurrent Neural Network) to build a deep learning model that can generate captions for an image.

An image caption generator combines computer vision and natural language processing techniques to analyze and identify the context of an image and describe it in a natural human language (for example, English, Spanish or Danish). This project will strengthen your knowledge of CNN and LSTM, and you will learn how to implement them in real-world applications such as this one.

9. Colouring old B&W photos

Automated colourization of B&W images has long been a hot topic of exploration in computer vision and deep learning. A recent study stated that if we train a neural network on a voluminous and rich dataset, we can create a deep learning model that can hallucinate colours within a black and white photograph.

In this image colourization project, you will use Python and the OpenCV DNN module with a model trained on the ImageNet dataset. The aim is to create a coloured reproduction of grayscale images. For this purpose, you will use a pre-trained Caffe model, a prototxt file, and a NumPy file.

Deep Learning Project Ideas – Advanced Level

Below are some of the best ideas for advanced deep learning projects. These are deep learning research topics that will definitely challenge your depth of knowledge.

10. Detectron

Detectron is a software system from Facebook AI Research (FAIR) designed to run state-of-the-art object detection algorithms. Written in Python, this Deep Learning project is based on the Caffe2 deep learning framework.

Detectron has been the foundation for many wonderful research projects, including Feature Pyramid Networks for Object Detection; Mask R-CNN; Detecting and Recognizing Human-Object Interactions; Focal Loss for Dense Object Detection; Non-local Neural Networks; and Learning to Segment Every Thing, to name a few.

Detectron offers a high-quality and high-performance codebase for object detection research. It includes over 50 pre-trained models and is extremely flexible – it supports rapid implementation and evaluation of novel research.

11. WaveGlow

WaveGlow is a flow-based generative network for speech synthesis developed and offered by NVIDIA. It can generate high-quality speech from mel-spectrograms. It blends insights from WaveNet and Glow to enable fast, efficient, and high-quality audio synthesis without requiring auto-regression.

WaveGlow is implemented as a single network and trained using a single cost function. The aim is to maximize the likelihood of the training data, which makes the training procedure manageable and stable.
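For readers who want to see what "the likelihood of the training data" means for a flow-based model, the general change-of-variables identity is shown below. The symbols are chosen here for illustration: x is an audio sample, f_θ is the invertible network mapping audio to a latent variable z, and p_z is a simple prior such as a zero-mean spherical Gaussian. This is the generic form for normalizing flows, not a transcription of WaveGlow's exact loss.

```latex
z = f_\theta(x), \qquad
\log p_\theta(x) = \log p_z\bigl(f_\theta(x)\bigr)
  + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|
```

Training maximizes this quantity over the dataset, and because every term can be computed exactly, a single cost function is enough.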

12. OpenCog

The OpenCog project includes the core components and a platform to facilitate AI R&D. It aims to design an open-source Artificial General Intelligence (AGI) framework that can accurately capture the spirit of the human brain’s architecture and dynamics. The humanoid robot Sophia is often cited as an example of this line of work, although it is not a human-level AGI.

OpenCog also encompasses OpenCog Prime – an advanced architecture for robot and virtual embodied cognition that includes an assortment of interacting components to give birth to human-equivalent artificial general intelligence (AGI) as an emergent phenomenon of the system as a whole.

13. DeepMimic

DeepMimic is an “example-guided Deep Reinforcement Learning of Physics-based character skills.” In other words, it is a neural network trained by leveraging reinforcement learning to reproduce motion-captured movements via a simulated humanoid, or any other physical agent. 

The functioning of DeepMimic is conceptually simple. First, you set up a simulation of the thing you wish to animate (you can capture someone making specific movements and try to imitate that). Then, you use the motion capture data to train a neural network through reinforcement learning. The input is the configuration of the arms and legs at different time points, while the reward is based on how closely the simulation matches the real motion at specific time points: the smaller the difference, the higher the reward.
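In practice, an imitation reward should grow as the simulated pose gets closer to the reference motion. One common way to express this, sketched below, is to exponentiate the negative squared error between the two poses so the reward lies between 0 and 1. The function name, the flat list of joint angles and the `scale` value are assumptions for illustration and are not taken from the DeepMimic code.

```python
import math


def pose_reward(simulated, reference, scale=2.0):
    """Reward in (0, 1]: equals 1.0 for a perfect match, decays with error.

    simulated and reference are equal-length lists of joint angles
    (in radians) at the same time point.
    """
    if len(simulated) != len(reference):
        raise ValueError("poses must have the same number of joints")
    squared_error = sum((s - r) ** 2 for s, r in zip(simulated, reference))
    return math.exp(-scale * squared_error)


if __name__ == "__main__":
    target = [0.10, -0.40, 0.75]
    close = [0.12, -0.38, 0.70]
    far = [0.60, 0.20, 0.00]
    print(pose_reward(close, target) > pose_reward(far, target))
```

A reinforcement learning algorithm would sum rewards like this over time, so policies that track the motion capture data more closely score higher.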

14. IBM Watson

One of the most excellent examples of Machine Learning and Deep Learning is IBM Watson. The greatest aspect of IBM Watson is that it allows Data Scientists and ML Engineers/Developers to collaborate on an integrated platform to enhance and automate the AI life cycle. Watson can simplify, accelerate, and manage AI deployments, thereby enabling companies to harness the potential of both ML and Deep Learning to boost business value. 

IBM Watson is integrated with Watson Studio to empower cross-functional teams to deploy, monitor, and optimize ML/Deep Learning models quickly and efficiently. It can automatically generate APIs to help your developers incorporate AI into their applications readily. On top of that, it comes with intuitive dashboards that make it convenient for teams to manage models in production.

15. Google Brain

This is one of the excellent deep learning project ideas. The Google Brain project is a Deep Learning AI research effort that began in 2011 at Google. The Google Brain team, led by Google Fellow Jeff Dean, Google Researcher Greg Corrado, and Stanford University Professor Andrew Ng, aimed to bring Deep Learning and Machine Learning out of the confines of the lab and into the real world. They built one of the largest neural networks for ML at the time, running on 16,000 computer processors connected together.

To test the capabilities of a neural network of this size, the Google Brain team fed the network random thumbnails sourced from 10 million YouTube videos. The catch is that they didn’t train the system to recognize what a cat looks like. Yet the system left everyone astonished: it taught itself how to identify cats and went on to assemble the features of a cat into an image of a cat!

The Google Brain project showed that software-based neural networks can loosely imitate the functioning of the human brain, with individual neurons learning to detect particular objects.

16. 12 Sigma’s Lung Cancer detection algorithm

12 Sigma has developed an AI algorithm that can reduce diagnostic errors associated with lung cancer in its early stages and detect signs of lung cancer much faster than traditional approaches.

According to Xin Zhong, the Co-founder and CEO of 12 Sigma Technologies, conventional cancer detection practices usually take time to detect lung cancer. However, 12 Sigma’s AI algorithm can reduce the diagnosis time, leading to a better rate of survival for lung cancer patients.

Generally, doctors diagnose lung cancer by carefully examining CT scan images to check for small nodules and classify them as benign or malignant. It can take over ten minutes for doctors to visually inspect the patient’s CT images for nodules, plus additional time for classifying the nodules as benign or malignant.

Needless to say, there always remains a high possibility of human error. 12 Sigma maintains that its AI algorithm can inspect the CT images and classify nodules within two minutes.

One of the most famous types of artificial neural networks is the Convolutional Neural Network (CNN), which is mainly used for image and object recognition as well as classification. It is typically trained with supervised Deep Learning, which means that it learns from labeled examples and extracts the relevant features on its own, without hand-crafted feature engineering. It can work with both structured and unstructured data.

Many world-class applications are built by combining CNNs with other deep learning techniques. Applications of CNNs include facial recognition, document analysis, natural history collections, climate analysis and even advertising. In marketing and advertising especially, CNNs have brought a huge change by enabling data-driven personalized advertising.

Therefore, it is fair to say that CNN deep learning projects can add significant weight to one’s experience and resume. Below are some ideas for CNN deep learning projects. These projects suit beginners through advanced learners who want hands-on experience with CNNs. As you go down the list, the difficulty level increases.

With a constantly shifting climate and the spread of pathogenic bacteria and fungi, the life span of plants is decreasing. The use of harsh pesticides in particular may lead to new types of disease emerging within a plant. Diagnosing these problems at an early stage can help us save a variety of plant species that are on the verge of extinction, and deep learning research topics like plant disease detection help make that happen.

It is pretty time-consuming to detect disease manually, so image processing can make the process swifter. In this project, machine vision equipment is used to collect images and judge whether or not the plant has a fatal disease. It is quite a popular topic among CNN deep learning projects. Manually designed features plus a classifier, or a conventional image processing algorithm, are used in this project.

Knowledge of MATLAB is essential to execute this project. Apart from this, knowledge of image processing and CNNs is also required.

A traffic sign recognition system identifies the traffic signal lights along the street, speed limit signs and various other signs such as caution signs, blend signs and so on. This kind of recognition system is expected to play a huge role in smart vehicles and self-driving cars in the future.

According to WHO reports, approximately 1.3 million people die every year in road traffic crashes, and between 20 and 50 million more suffer non-fatal injuries. Therefore, a system like this can play a significant role in reducing the numbers.

This project requires knowledge of Python and of how to build a CNN. It also requires some basic knowledge of Keras, image classification, and Python libraries such as Matplotlib, PIL and scikit-learn.

As simple as it may sound, since the emergence of AI it has become important to differentiate between real and imitated images. Detecting age and gender is a project that has been around for quite a long time. However, since the emergence of AI, the process has become a bit trickier, and continuous changes are being made to improve the outcomes.

The programming language used for this project is Python. The objective is to give an approximate idea of a person’s gender and age from their pictures. To execute this project, you’ll need in-depth knowledge of Python, OpenCV and CNNs.

With the speed at which globalization is becoming the new norm, knowing multiple languages is becoming more and more important. However, not everyone has the knack or interest to learn multiple languages. Apart from that, we are often required to know a certain language even for travelling purposes. The majority of us rely on Google Translate, which works on the basis of Machine Translation (MT).

It is an in-demand topic in computational linguistics, where ML is used to translate one language into another. However, it falls under the more advanced deep learning projects. Among the varied choices, Neural Machine Translation (NMT) is considered to be the most efficient method. Knowledge of RNN sequence-to-sequence learning is important for this project.

Smart devices such as TVs, mobile phones and cameras are becoming more advanced every day. We all are familiar with the feature of gesture control in our smartphones, however, it can also be implemented in devices such as TVs. 

By incorporating gesture recognition programs into TVs, people will be able to perform a bunch of basic tasks without needing a remote. Tasks like changing channels, increasing the volume, pausing, and fast-forwarding can all be done with the help of gesture recognition.

For this, a few hundred training videos are required, which can then be classified into the major classes, like the ones mentioned before. These videos of various people performing the hand gestures are used as training data, and when anybody makes a similar hand gesture, the smart TV’s webcam detects it and the TV behaves accordingly. It is definitely a deep learning project that is more on the advanced side.

These are only a handful of the real-world applications of Deep Learning made so far. The technology is still very young and is developing as we speak. Deep Learning holds immense possibilities to give birth to pioneering innovations that can help humankind address some of the fundamental challenges of the real world.

Is Deep Learning just a hype or does it have real-life applications?

Deep Learning has recently found a number of useful applications. Deep learning is already changing a number of organizations and is projected to bring about a revolution in practically all industries, from Netflix's well-known movie recommendation system to Google's self-driving automobiles. Deep learning models are utilized in everything from cancer diagnosis to presidential election victory, from creating art and literature to making actual money. As a result, it would be incorrect to dismiss it as a fad. At any given time, Google and Facebook are translating content into hundreds of languages. This is accomplished by the application of deep learning models to NLP tasks, and it is a big success story.

What is the difference between Deep Learning and Machine Learning?

The most significant distinction between deep learning and regular machine learning is how well each performs as data scales up. Deep learning techniques do not perform well when the data is small, because deep learning algorithms require a vast amount of data to learn patterns reliably. Traditional machine learning algorithms, on the other hand, with their handcrafted features, win in this circumstance. In traditional machine learning, most of the features used must be chosen by an expert and then hand-coded according to the domain and data type.

What are the prerequisites for starting out in Deep Learning?

Starting out with deep learning isn't nearly as difficult as some people make it out to be. Before getting into deep learning, you should brush up on a few fundamentals. Probability, derivatives, linear algebra, and a few other fundamental concepts should be familiar to you. Any machine learning task necessitates a fundamental understanding of statistics. Deep learning in real-world issues necessitates a reasonable level of coding ability. Deep learning is built on the foundation of machine learning. Without first grasping the basics of machine learning, it is impossible to begin mastering deep learning.

Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii

Nature Chemical Biology (2023)

Acinetobacter baumannii is a nosocomial Gram-negative pathogen that often displays multidrug resistance. Discovering new antibiotics against A. baumannii has proven challenging through conventional screening approaches. Fortunately, machine learning methods allow for the rapid exploration of chemical space, increasing the probability of discovering new antibacterial molecules. Here we screened ~7,500 molecules for those that inhibited the growth of A. baumannii in vitro. We trained a neural network with this growth inhibition dataset and performed in silico predictions for structurally new molecules with activity against A. baumannii. Through this approach, we discovered abaucin, an antibacterial compound with narrow-spectrum activity against A. baumannii. Further investigations revealed that abaucin perturbs lipoprotein trafficking through a mechanism involving LolE. Moreover, abaucin could control an A. baumannii infection in a mouse wound model. This work highlights the utility of machine learning in antibiotic discovery and describes a promising lead with targeted activity against a challenging Gram-negative pathogen.

Data Availability

GenBank accession numbers for sequencing of abaucin-resistant mutants are BankIt2629921 – OP677864, OP677865, OP677866 and OP677867. GEO accession numbers for RNA sequencing datasets are GSE214305 – GSM6603484, GSM6603485, GSM6603486, GSM6603487, GSM6603488, GSM6603489 and GSM6603490. Source data are provided with this paper.

Code Availability

All custom code used for antibiotic prediction is open source and can be accessed without restriction at . A cloned snapshot used for this paper is available at . All commercial software used is described in Methods. Source data are provided with this paper.



We thank S. French from McMaster University for technical assistance with fluorescence microscopy experiments. This work was supported by the David Braley Centre for Antibiotic Discovery (to J.M.S.); the Weston Family Foundation (POP and Catalyst to J.M.S.); the Audacious Project (to J.J.C. and J.M.S.); the Digital Transformation Institute (to R.B.); the Abdul Latif Jameel Clinic for Machine Learning in Health (to R.B.); the DTRA Discovery of Medical Countermeasures Against New and Emerging (DOMANE) threats program (to R.B.); the DARPA Accelerated Molecular Discovery program (to R.B.); the Canadian Institutes of Health Research (FRN-156361 to B.K.C.); Genome Canada GAPP (OGI-146 to M.G.S.); the Canadian Institutes of Health Research (FRN-148713 to M.G.S.); the Faculty of Health Sciences of McMaster University (to J.M.); the Boris Family (to J.M.); a Marshall Scholarship (to K.S.); and the DOE BER (DE-FG02-02ER63445 to A.C-P.).

Author information

These authors contributed equally: Gary Liu, Denise B. Catacutan, Khushi Rathod.

Authors and Affiliations

Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada

Gary Liu, Denise B. Catacutan, Khushi Rathod, Jody C. Mohammed, Meghan Fragis, Kenneth Rachwalski, Jakob Magolan, Brian K. Coombes & Jonathan M. Stokes

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA

Kyle Swanson, Wengong Jin, Tommi Jaakkola & Regina Barzilay

Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA

Anush Chiappino-Pepe & James J. Collins

Department of Genetics, Harvard Medical School, Boston, MA, USA

Anush Chiappino-Pepe

Department of Medicine, Department of Biochemistry and Biomedical Sciences, Farncombe Family Digestive Health Research Institute, McMaster University, Hamilton, Ontario, Canada

Saad A. Syed & Michael G. Surette

Department of Chemistry and Chemical Biology, McMaster University, Hamilton, Ontario, Canada

Meghan Fragis & Jakob Magolan

Abdul Latif Jameel Clinic for Machine Learning in Health, Massachusetts Institute of Technology, Cambridge, MA, USA

Regina Barzilay & James J. Collins

Department of Biological Engineering, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA

James J. Collins

Broad Institute of MIT and Harvard, Cambridge, MA, USA



J.M.S. and J.J.C. conceptualized the study; J.M.S., G.L., K.S. and W.J. performed model building and training; J.M.S., D.B.C., K.R. and A.C-P. performed mechanistic investigations; J.M.S., K.R. and S.A.S. performed spectrum of activity experiments; J.C.M. conducted mouse model experiments; M.F. performed chemical synthesis; J.M.S. and J.J.C. wrote the paper; J.M.S., J.J.C., R.B., T.J., M.G.S., B.K.C. and J.M. supervised the research.

Corresponding authors

Correspondence to James J. Collins or Jonathan M. Stokes .

Ethics declarations

Competing interests.

J.M.S. is cofounder and scientific director of Phare Bio. J.J.C. is cofounder and scientific advisory board chair of Phare Bio. J.J.C. is cofounder and scientific advisory board chair of Enbiotix. The other authors declare no competing interests.

Peer review

Peer review information.

Nature Chemical Biology thanks Jean Francois Collet and the other, anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Model training data and prediction.

( a ) Replicate plot showing primary screening data of 7,684 small molecules for those that inhibited the growth of A. baumannii ATCC 17978 in LB medium at 50 µM. ( b ) Rank-ordered growth inhibition data of the prioritized 240 molecules from our prediction set that were selected for empirical validation (top); rank-ordered growth inhibition data of the 240 predicted molecules with the lowest prediction score (middle); rank-ordered growth inhibition data of the 240 predicted molecules with the highest prediction score that were not found in the training dataset (bottom). Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted. Dashed horizontal line represents the stringent hit cut-off of >80% growth inhibition at 50 µM. ( c ) Growth inhibition of A. baumannii by abaucin (blue) and serdemetan (red) in LB medium. Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted. The structure of serdemetan is shown. ( d ) Growth kinetics of A. baumannii cells after treatment with abaucin at varying concentrations for 6 hours. Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted.

Extended Data Fig. 2 Antibacterial activity of abaucin against human commensal species.

( a ) Growth inhibition of A. baumannii ATCC 17978 by ampicillin (blue) and ciprofloxacin (red) in LB medium. Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted. ( b ) Growth inhibition of B. breve by abaucin. Experiments were conducted in biological duplicate. ( c ) Growth inhibition of B. longum by abaucin. Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted. ( d ) Non-validated (see Fig. 2e ) growth inhibition of E. lenta by abaucin. Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted.

Extended Data Fig. 3 Abaucin mechanism of action.

( a–h ) Growth inhibition of wildtype A. baumannii (WT) and the four independent abaucin-resistant mutants by a collection of diverse antibiotics. From left to right for each plot, the mutants are: A362T variant 1, Y394F, intergenic, and A362T variant 2. Experiments were conducted in biological duplicate. Note that the abaucin-resistant mutants do not display cross-resistance to other antibiotics. ( i ) Structural prediction of wildtype A. baumannii LolE using RoseTTAFold (bottom), with the structural error estimate of each amino acid (top). Position 362 is highlighted orange and resides in a disordered region of the protein. ( j ) Same as ( i ), except with the A362T abaucin-resistant mutant of LolE. ( k ) RNA sequencing of wildtype A. baumannii treated with 5x MIC abaucin for 4.5 hr (top) or 6 hr (bottom). Data are the mean of biological duplicates. Transcript abundance is normalized to no-drug control cultures grown in identical conditions. Vertical black lines show statistical significance cut-off values. Note the highly significant downregulation of genes involved in the electron transport chain and transmembrane ion transport. ( l ) Growth inhibition of A. baumannii harboring an empty CRISPRi vector (red), or three distinct sgRNAs targeting lolE (blue, teal, and green). All strains were grown in LB medium without induction. Experiments were conducted in biological duplicate. Individual replicates with means connected are plotted. ( m ) qPCR quantifying the expression of lolE relative to the housekeeping genes gltA (left) and gyrB (right) in all four abaucin-resistant mutants, normalized to wildtype A. baumannii . Experiments were conducted in biological duplicate with technical triplicates. Bar height represents mean expression.

Supplementary information

Supplementary information.

Supplementary Tables 1–7 and Note.

Reporting Summary

Supplementary data.

Supplementary Data 1: Growth inhibition data against A. baumannii for model training. Supplementary Data 2: Model prediction scores of compounds in the Drug Repurposing Hub. Supplementary Data 3: Experimental validation of (prioritized/poorest/top 240) predictions from the Drug Repurposing Hub. Supplementary Data 4: GO enrichment for up- and down-regulated transcripts in A. baumannii treated with 5x MIC abaucin.

Source data

Source Data Fig. 4a

Raw data of measured bacterial load from mouse wound infection models.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article.

Liu, G., Catacutan, D.B., Rathod, K. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii . Nat Chem Biol (2023).


Received : 25 March 2022

Accepted : 25 April 2023

Published : 25 May 2023



How to Read a Research Paper – A Guide to Setting Research Goals, Finding Papers to Read, and More

If you work in a scientific field, you should try to build a deep and unbiased understanding of that field. This not only educates you in the best possible way but also helps you envision the opportunities in your space.

A research paper is often the culmination of a wide range of deep and authentic practices surrounding a topic. When writing a research paper, the author thinks critically about the problem, performs rigorous research, evaluates their processes and sources, organizes their thoughts, and then writes. These genuinely executed practices make for a good research paper.

If you’re struggling to build a habit of reading papers regularly (like I am), I’ve tried to break down the whole process for you. I've talked to researchers in the field, read a bunch of papers and blogs from distinguished researchers, and jotted down some techniques that you can follow.

Let’s start off by understanding what a research paper is and what it is NOT!

What is a Research Paper?

A research paper is a dense and detailed manuscript that compiles a thorough understanding of a problem or topic. It offers a proposed solution and further research along with the conditions under which it was deduced and carried out, the efficacy of the solution and the research performed, and potential loopholes in the study.

A research paper is written not only to provide an exceptional learning opportunity but also to pave the way for further advancements in the field. These papers help other scholars germinate the thought seed that can either lead to a new world of ideas or an innovative method of solving a longstanding problem.

What Research Papers are NOT

There is a common notion that a research paper is a well-informed summary of a problem or topic written by means of other sources.

But you shouldn't mistake it for a book or an opinionated account of an individual’s interpretation of a particular topic.

Why Should You Read Research Papers?

What I find fascinating about reading a good research paper is that you can draw on a profound study of a topic and engage with the community on a new perspective to understand what can be achieved in and around that topic.

I work at the intersection of instructional design and data science. Learning is part of my day-to-day responsibilities. If the source of my education is flawed or inefficient, I’d fail at my job in the long term. This applies to many other jobs in science, especially those focused on research.

There are three important reasons to read a research paper:

Not only that, with the help of the internet, you can extrapolate all of these reasons or benefits onto multiple business models. It can be an innovative state-of-the-art product, an efficient service model, a content creator, or a dream job where you are solving problems that matter to you.

Goals for Reading a Research Paper — What Should You Read About?

The first thing to do is to figure out your motivation for reading the paper. There are two main scenarios that might lead you to read a paper:

If you’re an inquisitive beginner with no starting point in mind, start with scenario 2. Shortlist a few topics you want to read about until you find an area that you find intriguing. This will eventually lead you to scenario 1.

ML Reproducibility Challenge

In addition to these generic goals, if you need an end goal for your habit-building exercise of reading research papers, you should check out the ML reproducibility challenge.


You’ll find top-class papers from world-class conferences that are worth diving deep into and reproducing the results.

They conduct this challenge twice a year and they have one coming up in Spring 2021. You should study the past three versions of the challenge, and I’ll write a detailed post on what to expect, how to prepare, and so on.

Now you must be wondering – how can you find the right paper to read?

How to Find the Right Paper to Read

To get some ideas around this, I reached out to my friend Anurag Ghosh, who is a researcher at Microsoft. Anurag has been working at the crossover of computer vision, machine learning, and systems engineering.


Here are a few of his tips for getting started:

In addition to these invaluable tips, there are a number of web applications that I’ve shortlisted that help me narrow my search for the right papers to read:




How to Read a Research Paper

After you have stocked your to-read list, then comes the process of reading these papers. Remember that NOT every paper is useful to read and we need a mechanism that can help us quickly screen papers that are worth reading.

To tackle this challenge, you can use the Three-Pass Approach by S. Keshav. This approach proposes that you read the paper in three passes instead of starting from the beginning and diving in deep until the end.

The three pass approach

Tools and Software to Keep Track of Your Pipeline of Papers

If you’re sincere about reading research papers, your list of papers will soon grow into an overwhelming stack that is hard to keep track of. Fortunately, we have software that can help us set up a mechanism to manage our research.

Here are a bunch of them that you can use:




⚠️ Symptoms of Reading a Research Paper

Reading a research paper can turn out to be frustrating, challenging, and time-consuming especially when you’re a beginner. You might face the following common symptoms:

Here’s a complete list of emotions that you might undergo as explained by Adam Ruben in this article.

Key Takeaways

We should be all set to dive right in. Here’s a quick summary of what we have covered here:

Remember: Art is not a single method or step done over a weekend but a process of accomplishing remarkable results over time.

You can also watch the video on this topic on my YouTube channel :

Feel free to respond to this blog or comment on the video if you have some tips, questions, or thoughts!

If this tutorial was helpful, you should check out my data science and machine learning courses on Wiplane Academy. They are comprehensive yet compact and help you build a solid foundation of work to showcase.


International Conference on Robotics and Automation

Shaping the future.

Robotics and Artificial Intelligence Advancements at Georgia Tech are Creating Entirely New Paradigms of What Computing Technology Can Do

Georgia Tech at ICRA 2023

May 29 - June 2 | London

Pictured above: Georgia Tech ‘robot alum’ Curi turns 10 this year.

Georgia Tech @ ICRA is a joint effort by: Institute for Robotics and Intelligent Machines • Machine Learning Center • College of Computing

More than 70 researchers from across the College of Engineering and College of Computing are advancing the state of the art in robotics.

Discover the Experts at ICRA


Georgia Tech is a leading contributor to ICRA 2023, a research venue focused on robotics and advancements in the development of embodied artificial intelligence.

*by number of papers

Partner Organizations

Air Force Research Laboratory • Amazon Robotics • Aurora Innovation • Autodesk • California Institute of Technology • Caltech • CCDC US Army Research Laboratory • Clemson • Columbia University • ETH Zurich • Free University of Bozen-Bolzano • Google Brain • Instituto Tecnologico Y De Estudios Superiores De Monterrey • Johns Hopkins University • MIT • NASA Jet Propulsion Laboratory • NJIT • NVIDIA • Samsung • Simon Fraser University • Stanford • Toyota Research Institute • Umi 2958 Gt-Cnrs • Université De Lorraine • University of British Columbia • University of California, Irvine • University of California, San Diego • University of Edinburgh • University of Illinois Urbana-Champaign • University of Maryland • University of Oxford • University of Southern California • University of Toronto • University of Washington • University of Waterloo • USC Viterbi School of Engineering • USNWC PC


Researchers use novel approach to teach robot to navigate over obstacles.

By Nathan Deen

Quadrupedal robots may be able to step directly over obstacles in their paths thanks to the efforts of a trio of Georgia Tech Ph.D. students.

When it comes to robotic locomotion and navigation, Naoki Yokoyama says most four-legged robots are trained to regain their footing if an obstacle causes them to stumble. Working toward a larger effort to develop a housekeeping robot, Yokoyama and his collaborators — Simar Kareer and Joanne Truong — set out to train their robot to walk over clutter it might encounter in a home.

“The main motivation of the project is getting low-level control over the legs of the robot that also incorporates visual input,” said Yokoyama, a Ph.D. student within the School of Electrical and Computer Engineering. “We envisioned a controller that could be deployed in an indoor setting with a lot of clutter, such as shoes or toys on the ground of a messy home. Whereas blind locomotive controllers tend to be more reactive — if they step on something, they’ll make sure they don’t fall over — we wanted ours to use visual input to avoid stepping on the obstacle altogether.”



(pictured at top, from 1st row, left to right) :

Simar Kareer

Ph.D. student in computer vision

Joanne Truong

Ph.D. student in robotics

Naoki Yokoyama

Assistant Professor Interactive Computing

Dhruv Batra

Associate Professor Interactive Computing


Together We Swarm

Georgia Tech’s Yellow Jackets are tackling robotics research from a holistic perspective to develop safe, responsible AI systems that operate in the physical and virtual worlds. Learn about research areas and teams in the chart. Continue below to meet some of our experts and learn about their latest efforts.

Featured Research


Multi-Robot Systems

Together We Swarm, In Research & Robots 

“Microrobots have great potential for healthcare and drug delivery, however these applications are impeded by the inaccurate control of microrobots, especially in swarms.

By collaborating with roboticists, we were able to ‘close the gap’ between single robot design and swarm control . All the different elements were there. We just made the connection.” 

Systematically designing local interaction rules to achieve collective behaviors in robot swarms is a challenging endeavor, especially in microrobots, where size restrictions imply severe sensing, communication, and computation limitations. New research demonstrates a systematic approach to control the behaviors of microrobots by leveraging the physical interactions in a swarm of 300 3-mm vibration-driven “micro bristle robots” that were designed and fabricated at Georgia Tech. The team’s investigations reveal how physics-driven interaction mechanisms can be exploited to achieve desired behaviors in minimally equipped robot swarms and highlight the specific ways in which hardware and software developments aid in the achievement of collision-induced aggregations.

Why It Matters


Force and torque sensors are commonly used today to give robots a sense of touch. They are often placed at the “wrist” of a robot arm and allow a robot to precisely sense how much force its gripper applies to the world. However, these sensors are expensive, complex, and fragile. To address these problems, we present Visual Force/Torque Sensing, a method to estimate force and torque without a dedicated sensor. We mount a camera to the robot which focuses on the gripper and train a machine learning algorithm to observe small deflections in the gripper and estimate the forces and torques that caused them. While our method is less accurate than a purpose-built force/torque sensor, it is 100x cheaper and can be used to make touch-sensitive robots that accomplish real-world tasks. 



From Graffiti to Growing Plants: Art Robot Retooled for Hydroponics

“This project highlights how Georgia Tech’s commitment to interdisciplinary research has led to unexpected applications of seemingly unrelated technologies, and serves as a testament to the value of exploring diverse fields of study and collaboration in order to develop innovative solutions to real-world problems.”

Researchers have applied technology they developed for robotic art research to a hydroponics robot. The team originally developed a cable-based robot designed to spray paint graffiti on large buildings. Because the cable robot technology scales well to large sizes, the same robot became an ideal fit for many agricultural applications. Through a collaboration with the N.E.W. Center for Agriculture Technology at Georgia Tech, a plant phenotyping robot was built, tested, and deployed in a hydroponic pilot farm on campus. The robot takes around 75 photos of each plant every day as the plant grows, then uses computer vision to construct 3D models of the plants, which can be used to non-destructively estimate properties such as crop biomass and photosynthetically active area. This allows for tracking, modeling, and predicting plant growth.


Applying a Soft Touch to Berry Picking

“Our research in automated harvesting can be applied to soft fruits — such as blackberries, raspberries, loganberries, and grapes — that require intensive labor to harvest manually. Automating the harvesting process will allow for a drastic increase in productivity and decrease in human labor and time spent harvesting.”


New research outlines the design, fabrication, and testing of a soft robotic gripper for automated harvesting of blackberries and other fragile fruits. The gripper uses three rubber-based fingers that are uniformly actuated by fishing line, similar to how human tendons function. It also includes a ripeness sensor that uses the reflectance of near-infrared light to determine ripeness level, as well as an endoscopic camera for blackberry detection and for providing image feedback for robot arm manipulation. The gripper was used to harvest 139 berries with manual positioning in two separate field tests. The retention force (the amount of force required to detach the berry from the stem) and the average reflectance value were determined for both ripe and unripe blackberries. The soft robotic gripper was integrated onto a rigid robot arm and successfully harvested fifteen artificial blackberries in a lab setting using visual servoing, a technique that allows the robot arm to iteratively approach the target using image feedback.


Explore the Papers




Georgia Tech is working with more than 35 organizations on robotics research in ICRA’s main program. Explore the partnerships through the network chart and see the top institutions by number of collaborations.


See you in London!

Project Lead and Web Development: Joshua Preston

Writer: Nathan Deen

Photography and Visual Media: Kevin Beasley, Gerry Chen, Jeremy Collins, Christa Ernst, Joanne Truong

Interactive Data Visualizations and Data Analysis: Joshua Preston


  1. (PDF) Deep Learning Analysis: A Review

  2. Four Deep Learning Papers to Read in February 2021

  3. How to read Machine Learning and Deep Learning Research papers

  4. 🎉 Learning styles research paper. Research Paper Styles. 2019-01-24

  5. The 9 Deep Learning Papers You Need to Know About 3

  6. (PDF) Deep Learning


  1. Lecture : 01

  2. Introduction to Deep Learning

  3. F# Tutorial: Type tests via pattern matching

  4. F# Tutorial: Using the Array.collect function

  5. Open Assistant, Truth GPT, Meta & More! 🚀 Day 8 AI News Roundup

  6. Introduction to Deep Learning (1 of 5)


  1. Deep Learning: A Comprehensive Overview on Techniques ...

    Deep learning (DL), a branch of machine learning (ML) and artificial intelligence (AI) is nowadays considered as a core technology of today's Fourth Industrial Revolution (4IR or Industry 4.0). Due to its learning capabilities from data, DL technology originated from artificial neural network (ANN), has become a hot topic in the context of computing, and is widely applied in various ...

  2. 7 Best Research Papers To Read To Get Started With Deep Learning

    This is especially the case for a beginner who is just trying to get engrossed in the world of deep learning. It might be hard to figure out which research papers are the best starting point for developing new projects and gaining an intuitive understanding of the subject. ... Research Paper: Deep Residual Learning for Image Recognition ...

  3. Four Deep Learning Papers to Read in August 2021

    'Descending through a Crowded Valley — Benchmarking Deep Learning Optimizers' Authors: Schmidt*, Schneider*, Henning (2021) | 📝 Paper | 🗣 Talk | 🤖 Code. One Paragraph Summary: Tuning optimizers is a fundamental ingredient to every Deep Learning-based project. There exist many heuristics such as the infamous starting point for the ...

  4. A Beginner's Guide to Deep Learning Research

    This is the most suitable path for any beginner starting out with deep learning research. First, you should get used to reading a lot of research papers.

  5. The Top 17 'Must-Read' AI Papers in 2022

    Read the full paper here. 3. Tabular Data: Deep Learning is Not All You Need (2021) - Ravid Shwartz-Ziv, Amitai Armon. To solve real-life data science problems, selecting the right model to use is crucial. This final paper selected by Max explores whether deep models should be recommended as an option for tabular data. Read the full paper here.

  6. 23 Deep Learning Papers To Get You Started

    This was a huge step forward in building interpretable deep learning models. In and around 2012, we had powerful GPUs, large labelled training sets, new model architectures and regularization ...

  7. Most Cited Deep Learning Papers

    Also, after this list comes out, another awesome list for deep learning beginners, called Deep Learning Papers Reading Roadmap, has been created and loved by many deep learning researchers. Although the Roadmap List includes lots of important deep learning papers, it feels overwhelming for me to read them all. As I mentioned in the introduction ...

  8. How to Read Research Papers: A Pragmatic Approach for ML Practitioners

    Step 4: Second pass (content familiarization) Content familiarization is a process that's relevant to the initial steps. The systematic approach to reading the research paper presented in this article. The familiarity process is a step that involves the introduction section and figures within the research paper.

  9. Must Read Papers for Data Science, ML, and DL

    Curated collection of Data Science, Machine Learning and Deep Learning papers, reviews and articles that are on must read list. ... Deep Learning Applications a beginner can build in minutes . CHANGELOG. 2019-10-28 Started must-read-papers-for-ml repo. 2019-10-29 Added analytics vidhya use case studies article links.

  10. Reinforcement Learning in Practice: Opportunities and Challenges

    Download PDF Abstract: This article is a gentle discussion about the field of reinforcement learning in practice, about opportunities and challenges, touching a broad range of topics, with perspectives and without technical details. The article is based on both historical and recent research papers, surveys, tutorials, talks, blogs, books, (panel) discussions, and workshops/conferences.

  11. Top 16 Exciting Deep Learning Project Ideas for Beginners [2023]

    3. Face detection system. This is one of the excellent deep learning project ideas for beginners. With the advance of deep learning, facial recognition technology has also advanced tremendously. Face recognition technology is a subset of Object Detection that focuses on observing the instance of semantic objects.

  12. Deep learning-guided discovery of an antibiotic targeting

    Building off this prior research, ... Source data are provided with this paper. ... predicting anti-cancer drug synergy with Deep Learning. Bioinformatics 34, 1538-1546 ...

  13. How to Read a Research Paper

    A research paper is often the culmination of a wide. Search Submit your search query. Forum ... say how a new deep learning architecture has helped us solve a 50-year old biological problem of understanding protein structures. This is often the case for beginners or for people who consume their daily dose of news from research papers (yes, they ...

  14. Research Papers for beginners : r/learnmachinelearning

    Research Papers for beginners . I started projects on Deep Learning almost a year ago, I had done a few projects in my own interests. ... I think the first deep q learning research paper was one where they used an ai on the atari game. The paper is short and is easily reproducible using the AI gym thing. I dont know much but the paper is really ...

  15. What are the must read papers for a beginner in the field of Machine

    on each paragraph of the paper. Yeah, this is soo true. I mean a research paper can be abstract to new readers but seductive since the new readers are very interested (and somewhat overzealous) to the topic, but ended getting demotivated due to not understanding the paper - and not necessarily it's because they aren't capable of understanding the paper; maybe it's the paper's style of writing ...

  16. International Conference on Robotics and Automation

    May 29 - June 2 | London. Pictured above: Georgia Tech 'robot alum' Curi turns 10 this year. Georgia Tech @ ICRA is a joint effort by: Institute for Robotics and Intelligent Machines • Machine Learning Center • College of Computing. More than 70 researchers from across the College of Engineering and College of Computing are advancing ...