
Machine learning articles from across Nature Portfolio

Machine learning is the ability of a machine to improve its performance based on previous results. Machine learning methods enable computers to learn without being explicitly programmed and have multiple applications, for example, in the improvement of data mining algorithms.

Deciphering protein interaction network dynamics with a machine learning-based framework

We developed Tapioca, an integrative ensemble machine learning-based framework, to accurately predict global protein–protein interaction network dynamics. Tapioca enabled the characterization of host regulation during reactivation from latency of an oncogenic virus. Introducing an interactome homology analysis method, we identified a proviral host factor with broad relevance for herpesviruses.

Artificial intelligence uses multi-omic data to predict pancreatic cancer outcomes

We applied an artificial intelligence (AI) approach to a dataset of clinical and advanced multi-omic molecular features from patients with pancreatic adenocarcinoma to predict survival. The results reveal a tumor-type-agnostic platform that can identify parsimonious and robust clinical prediction biomarkers, catalyzing the vision to democratize precision oncology worldwide.

Latest Research and Reviews

Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN

SATURN performs cross-species integration and analysis using both single-cell gene expression and protein representations generated by protein language models.

  • Yanay Rosen
  • Maria Brbić
  • Jure Leskovec

A signal processing and deep learning framework for methylation detection using Oxford Nanopore sequencing

The authors present DeepMod2, a deep-learning-based computational method that allows fast and accurate detection of DNA methylation and epihaplotypes from Oxford Nanopore sequencing data.

  • Mian Umair Ahsan
  • Anagha Gouru

Multiscale biochemical mapping of the brain through deep-learning-enhanced high-throughput mass spectrometry

MEISTER is an integrative experimental and computational framework for mass spectrometry that integrates three-dimensional, organ-wide biomolecular mapping with single-cell analysis for multiscale profiling of spatial–biochemical organization.

  • Yuxuan Richard Xie
  • Daniel C. Castro

Designing proteins with language models

Protein language models learn from diverse sequences spanning the evolutionary tree and have proven to be powerful tools for sequence design, variant effect prediction and structure prediction. What are the foundations of protein language models, and how are they applied in protein engineering?

  • Jeffrey A. Ruffolo

Generative models for protein structures and sequences

Models like ChatGPT and DALL-E2 generate text and images in response to a text prompt. Despite different data and goals, how can generative models be useful for protein engineering?

  • Clara Fannjiang
  • Jennifer Listgarten

Machine learning for functional protein design

Notin, Rollins and colleagues discuss advances in computational protein design with a focus on redesign of existing proteins.

  • Pascal Notin
  • Nathan Rollins
  • Debora Marks

News and Comment

What the EU’s tough AI law means for research and ChatGPT

The EU AI Act is the world’s first major legislation on artificial intelligence and strictly regulates general-purpose models.

  • Elizabeth Gibney

How journals are fighting back against a wave of questionable images

Publishers are deploying AI-based tools to detect suspicious images, but generative AI threatens their efforts.

  • Nicola Jones

Apple Vision Pro: what does it mean for scientists?

The headset opens up possibilities in accessibility and medical research — and raises concerns about human behaviour.

  • Jonathan O'Callaghan

AI-driven detection and analysis of label-free protein aggregates

In this Tools of the Trade article, Khalid Ibrahim (Radenovic and Lashuel labs) describes a tool for the artificial intelligence (AI)-driven detection of cellular aggregates that bypasses the need for fluorescent labelling.

  • Khalid A. Ibrahim

AI chatbot shows surprising talent for predicting chemical properties and reactions

Researchers lightly tweak ChatGPT-like system to offer chemistry insight.

  • Davide Castelvecchi

Machine Learning: Recently Published Documents

An explainable machine learning model for identifying geographical origins of sea cucumber Apostichopus japonicus based on multi-element profile

A comparison of machine learning- and regression-based models for predicting ductility ratio of RC beam-column joints

Alexa, is this a historical record?

Digital transformation in government has brought an increase in the scale, variety, and complexity of records and greater levels of disorganised data. Current practices for selecting records for transfer to The National Archives (TNA) were developed to deal with paper records and are struggling to deal with this shift. This article examines the background to the problem and outlines a project that TNA undertook to research the feasibility of using commercially available artificial intelligence tools to aid selection. The project AI for Selection evaluated a range of commercial solutions varying from off-the-shelf products to cloud-hosted machine learning platforms, as well as a benchmarking tool developed in-house. Suitability of tools depended on several factors, including requirements and skills of transferring bodies as well as the tools’ usability and configurability. This article also explores questions around trust and explainability of decisions made when using AI for sensitive tasks such as selection.

Automated Text Classification of Maintenance Data of Higher Education Buildings Using Text Mining and Machine Learning Techniques

Data-driven analysis and machine learning for energy prediction in distributed photovoltaic generation plants: a case study in Queensland, Australia

Modeling nutrient removal by membrane bioreactor at a sewage treatment plant using machine learning models

Big five personality prediction based in Indonesian tweets using machine learning methods

The popularity of social media has drawn the attention of researchers who have conducted cross-disciplinary studies examining the relationship between personality traits and behavior on social media. Most current work focuses on personality prediction analysis of English texts, but Indonesian has received scant attention. Therefore, this research aims to predict users’ personalities based on Indonesian text from social media using machine learning techniques. This paper evaluates several machine learning techniques, including naive Bayes (NB), K-nearest neighbors (KNN), and support vector machine (SVM), based on semantic features including emotion, sentiment, and publicly available Twitter profile. We predict the personality based on the Big Five personality model, the most appropriate model for predicting user personality in social media. We examine the relationships between the semantic features and the Big Five personality dimensions. The experimental results indicate that the Big Five personality traits exhibit distinct emotional, sentimental, and social characteristics and that SVM outperformed NB and KNN for Indonesian. In addition, we observe several terms in Indonesian that specifically refer to each personality type, each of which has distinct emotional, sentimental, and social features.
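
A minimal sketch (not the study's code; the toy tf-idf features and data below are stand-ins for the semantic features described above) of this kind of comparison, fitting SVM, naive Bayes, and KNN text classifiers with scikit-learn:

```python
# Compare SVM, naive Bayes, and KNN on a (toy) Indonesian text classification task.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: tweets labeled with a personality trait.
tweets = ["saya suka bertemu orang baru", "saya lebih suka sendirian"]
labels = ["extraversion", "introversion"]

for clf in (LinearSVC(), MultinomialNB(), KNeighborsClassifier(n_neighbors=1)):
    model = make_pipeline(TfidfVectorizer(), clf)  # text -> tf-idf -> classifier
    model.fit(tweets, labels)
    print(type(clf).__name__, model.predict(["saya suka bertemu teman"]))
```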

Compressive strength of concrete with recycled aggregate; a machine learning-based evaluation

Temperature prediction of flat steel box girders of long-span bridges utilizing in situ environmental parameters and machine learning

Computer-assisted cohort identification in practice

The standard approach to expert-in-the-loop machine learning is active learning, where, repeatedly, an expert is asked to annotate one or more records and the machine finds a classifier that respects all annotations made until that point. We propose an alternative approach, IQRef , in which the expert iteratively designs a classifier and the machine helps him or her to determine how well it is performing and, importantly, when to stop, by reporting statistics on a fixed, hold-out sample of annotated records. We justify our approach based on prior work giving a theoretical model of how to re-use hold-out data. We compare the two approaches in the context of identifying a cohort of EHRs and examine their strengths and weaknesses through a case study arising from an optometric research problem. We conclude that both approaches are complementary, and we recommend that they both be employed in conjunction to address the problem of cohort identification in health research.
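
A minimal sketch, assuming hypothetical record and rule structures, of the IQRef-style loop described above: the expert iteratively revises a classifier while the machine reports statistics on a fixed, annotated hold-out sample so the expert can decide when to stop:

```python
# Expert-in-the-loop evaluation on a fixed hold-out set (IQRef-style sketch).
from sklearn.metrics import precision_score, recall_score

holdout_records = [{"text": "glaucoma follow-up"}, {"text": "routine eye exam"}]
holdout_labels = [1, 0]  # expert annotations, fixed once up front

def expert_rule_v1(record):              # iteration 1 of the expert's classifier
    return int("glaucoma" in record["text"])

def evaluate(rule):
    preds = [rule(r) for r in holdout_records]
    return precision_score(holdout_labels, preds), recall_score(holdout_labels, preds)

print(evaluate(expert_rule_v1))  # expert inspects these stats, then revises the rule
```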

Analytics Insight

Top 10 Machine Learning Research Papers of 2021

Machine learning research papers showcasing the transformation of the technology

Unbiased gradient estimation in unrolled computation graphs with persistent evolution strategies

Solving high-dimensional parabolic PDEs using the tensor train format

Oops I took a gradient: Scalable sampling for discrete distributions

Optimal complexity in decentralized training

Understanding self-supervised learning dynamics without contrastive pairs

How transferable are features in deep neural networks?

Do we need hundreds of classifiers to solve real-world classification problems?

Knowledge Vault: a web-scale approach to probabilistic knowledge fusion

Scalable nearest neighbor algorithms for high dimensional data

Trends in extreme learning machines


TOPBOTS

The Best of Applied Artificial Intelligence, Machine Learning, Automation, Bots, Chatbots

2020’s Top AI & Machine Learning Research Papers

November 24, 2020 by Mariya Yao

Despite the challenges of 2020, the AI research community produced a number of meaningful technical breakthroughs. GPT-3 by OpenAI may be the most famous, but there are definitely many other research papers worth your attention. 

For example, teams from Google introduced a revolutionary chatbot, Meena, and EfficientDet object detectors in image recognition. Researchers from Yale introduced a novel AdaBelief optimizer that combines many benefits of existing optimization methods. OpenAI researchers demonstrated how deep reinforcement learning techniques can achieve superhuman performance in Dota 2.

To help you catch up on essential reading, we’ve summarized 10 important machine learning research papers from 2020. These papers will give you a broad overview of AI research advancements this year. Of course, there are many more breakthrough papers worth reading as well.

We have also published the top 10 lists of key research papers in natural language processing and computer vision. In addition, you can read our premium research summaries, where we feature the top 25 conversational AI research papers introduced recently.

Subscribe to our AI Research mailing list at the bottom of this article to be alerted when we release new summaries.

If you’d like to skip around, here are the papers we featured:

  • A Distributed Multi-Sensor Machine Learning Approach to Earthquake Early Warning
  • Efficiently Sampling Functions from Gaussian Process Posteriors
  • Dota 2 with Large Scale Deep Reinforcement Learning
  • Towards a Human-like Open-Domain Chatbot
  • Language Models are Few-Shot Learners
  • Beyond Accuracy: Behavioral Testing of NLP models with CheckList
  • EfficientDet: Scalable and Efficient Object Detection
  • Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild
  • An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
  • AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

Best AI & ML Research Papers 2020

1. A Distributed Multi-Sensor Machine Learning Approach to Earthquake Early Warning, by Kévin Fauvel, Daniel Balouek-Thomert, Diego Melgar, Pedro Silva, Anthony Simonet, Gabriel Antoniu, Alexandru Costan, Véronique Masson, Manish Parashar, Ivan Rodero, and Alexandre Termier

Original Abstract

Our research aims to improve the accuracy of Earthquake Early Warning (EEW) systems by means of machine learning. EEW systems are designed to detect and characterize medium and large earthquakes before their damaging effects reach a certain location. Traditional EEW methods based on seismometers fail to accurately identify large earthquakes due to their sensitivity to the ground motion velocity. The recently introduced high-precision GPS stations, on the other hand, are ineffective to identify medium earthquakes due to their propensity to produce noisy data. In addition, GPS stations and seismometers may be deployed in large numbers across different locations and may produce a significant volume of data, consequently affecting the response time and the robustness of EEW systems. 

In practice, EEW can be seen as a typical classification problem in the machine learning field: multi-sensor data are given in input, and earthquake severity is the classification result. In this paper, we introduce the Distributed Multi-Sensor Earthquake Early Warning (DMSEEW) system, a novel machine learning-based approach that combines data from both types of sensors (GPS stations and seismometers) to detect medium and large earthquakes. DMSEEW is based on a new stacking ensemble method which has been evaluated on a real-world dataset validated with geoscientists. The system builds on a geographically distributed infrastructure, ensuring an efficient computation in terms of response time and robustness to partial infrastructure failures. Our experiments show that DMSEEW is more accurate than the traditional seismometer-only approach and the combined-sensors (GPS and seismometers) approach that adopts the rule of relative strength.

Our Summary 

The authors claim that traditional Earthquake Early Warning (EEW) systems that are based on seismometers, as well as recently introduced GPS systems, have their disadvantages with regards to predicting large and medium earthquakes respectively. Thus, the researchers suggest approaching an early earthquake prediction problem with machine learning by using the data from seismometers and GPS stations as input data. In particular, they introduce the Distributed Multi-Sensor Earthquake Early Warning (DMSEEW) system, which is specifically tailored for efficient computation on large-scale distributed cyberinfrastructures. The evaluation demonstrates that the DMSEEW system is more accurate than other baseline approaches with regard to real-time earthquake detection.

What’s the core idea of this paper?

  • Seismometers have difficulty detecting large earthquakes because of their sensitivity to ground motion velocity.
  • GPS stations are ineffective in detecting medium earthquakes, as they are prone to producing lots of noisy data.
  • DMSEEW is based on a stacking ensemble method that:
  • takes sensor-level class predictions from seismometers and GPS stations (i.e., normal activity, medium earthquake, large earthquake);
  • aggregates these predictions using a bag-of-words representation and defines a final prediction for the earthquake category (a minimal sketch of this step follows the list).
  • Furthermore, they introduce a distributed cyberinfrastructure that can support the processing of high volumes of data in real time and allows the redirection of data to other processing data centers in case of disaster situations.
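
A minimal, illustrative sketch of that aggregation step, with assumed class names; DMSEEW trains a meta-classifier on the bag-of-words vector, whereas this sketch simply takes the most frequent class:

```python
# Sensor-level class predictions -> bag-of-words count vector -> final category.
from collections import Counter

CLASSES = ["normal", "medium", "large"]  # assumed class names

def aggregate(sensor_predictions):
    """sensor_predictions: list of class labels, one per GPS station/seismometer."""
    counts = Counter(sensor_predictions)
    bag_of_words = [counts.get(c, 0) for c in CLASSES]  # feature vector for a meta-model
    return CLASSES[max(range(len(CLASSES)), key=lambda i: bag_of_words[i])]

print(aggregate(["normal", "large", "large", "medium", "large"]))  # -> "large"
```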

What’s the key achievement?

  • DMSEEW outperforms the baseline approaches on the reported detection metrics. Against one baseline: precision – 100% vs. 63.2%; recall – 100% vs. 85.7%; F1 score – 100% vs. 72.7%.
  • Against the other baseline: precision – 76.7% vs. 70.7%; recall – 38.8% vs. 34.1%; F1 score – 51.6% vs. 45.0%.

What does the AI community think?

  • The paper received an Outstanding Paper award at AAAI 2020 (special track on AI for Social Impact).

What are future research areas?

  • Evaluating DMSEEW response time and robustness via simulation of different scenarios in an existing EEW execution platform. 
  • Evaluating the DMSEEW system on another seismic network.

2. Efficiently Sampling Functions from Gaussian Process Posteriors, by James T. Wilson, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, Marc Peter Deisenroth

Gaussian processes are the gold standard for many real-world modeling problems, especially in cases where a model’s success hinges upon its ability to faithfully represent predictive uncertainty. These problems typically exist as parts of larger frameworks, wherein quantities of interest are ultimately defined by integrating over posterior distributions. These quantities are frequently intractable, motivating the use of Monte Carlo methods. Despite substantial progress in scaling up Gaussian processes to large training sets, methods for accurately generating draws from their posterior distributions still scale cubically in the number of test locations. We identify a decomposition of Gaussian processes that naturally lends itself to scalable sampling by separating out the prior from the data. Building off of this factorization, we propose an easy-to-use and general-purpose approach for fast posterior sampling, which seamlessly pairs with sparse approximations to afford scalability both during training and at test time. In a series of experiments designed to test competing sampling schemes’ statistical properties and practical ramifications, we demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.

In this paper, the authors explore techniques for efficiently sampling from Gaussian process (GP) posteriors. After investigating the behaviors of naive approaches to sampling and fast approximation strategies using Fourier features, they find that many of these strategies are complementary. They, therefore, introduce an approach that incorporates the best of different sampling approaches. First, they suggest decomposing the posterior as the sum of a prior and an update. Then they combine this idea with techniques from literature on approximate GPs and obtain an easy-to-use general-purpose approach for fast posterior sampling. The experiments demonstrate that decoupled sample paths accurately represent GP posteriors at a much lower cost.
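
A minimal numpy sketch of this prior-plus-update decomposition (Matheron's rule) for an exact GP with an assumed RBF kernel; the paper additionally pairs the prior term with Fourier-feature approximations and sparse GPs for scalability:

```python
# Posterior sample = prior sample + data-driven update (Matheron's rule), exact-GP case.
import numpy as np

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

rng = np.random.default_rng(0)
X, y, noise = np.array([-1.0, 0.5, 2.0]), np.array([0.3, -0.2, 0.8]), 1e-2
Xs = np.linspace(-3, 3, 100)                    # test locations

# Draw one joint prior sample over train and test locations.
Xall = np.concatenate([X, Xs])
K = rbf(Xall, Xall) + 1e-6 * np.eye(len(Xall))  # jitter for numerical stability
f = np.linalg.cholesky(K) @ rng.standard_normal(len(Xall))
f_train, f_test = f[: len(X)], f[len(X):]

# Update term: correct the prior sample so it agrees with the observations.
eps = np.sqrt(noise) * rng.standard_normal(len(X))
Kinv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
posterior_sample = f_test + rbf(Xs, X) @ Kinv @ (y - f_train - eps)
```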

What’s the core idea of this paper?

  • The introduced approach to sampling functions from GP posteriors centers on the observation that it is possible to implicitly condition Gaussian random variables by combining them with an explicit corrective term.
  • The authors translate this intuition to Gaussian processes and suggest decomposing the posterior as the sum of a prior and an update.
  • Building on this factorization, the researchers suggest an efficient approach for fast posterior sampling that seamlessly pairs with sparse approximations to achieve scalability both during training and at test time.
What’s the key achievement?

  • Introducing an easy-to-use and general-purpose approach to sampling from GP posteriors. Decoupled sample paths:
  • avoid many shortcomings of the alternative sampling strategies;
  • accurately represent GP posteriors at a much lower cost; for example, simulation of a well-known model of a biological neuron required only 20 seconds using decoupled sampling, while the iterative approach required 10 hours.

What does the AI community think?

  • The paper received an Honorable Mention at ICML 2020.

Where can you get implementation code?

  • The authors released the implementation of this paper on GitHub.

3. Dota 2 with Large Scale Deep Reinforcement Learning, by Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław “Psyho” Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang

On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.

The OpenAI research team demonstrates that modern reinforcement learning techniques can achieve superhuman performance in such a challenging esports game as Dota 2. The challenges of this particular task for the AI system lie in the long time horizons, partial observability, and high dimensionality of observation and action spaces. To tackle this game, the researchers scaled existing RL systems to unprecedented levels, with thousands of GPUs utilized for 10 months. The resulting OpenAI Five model was able to defeat the Dota 2 world champions and won 99.4% of over 7000 games played during the multi-day showcase.

What’s the core idea of this paper?

  • The goal of the introduced OpenAI Five model is to find the policy that maximizes the probability of winning the game against professional human players, which in practice implies maximizing the reward function with some additional signals like characters dying, resources collected, etc.
  • While the Dota 2 engine runs at 30 frames per second, the OpenAI Five only acts on every 4th frame.
  • At each timestep, the model receives an observation with all the information available to human players (approximated in a set of data arrays) and returns a discrete action, which encodes the desired movement, attack, etc.
  • A policy is defined as a function from the history of observations to a probability distribution over actions and is parameterized as an LSTM with ~159M parameters (a minimal sketch follows this list).
  • The policy is trained with Proximal Policy Optimization, a variant of advantage actor-critic.
  • The OpenAI Five model was trained for 180 days spread over 10 months of real time.
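
Below is a minimal PyTorch sketch of such a policy; the sizes and the observation encoding are illustrative stand-ins, not OpenAI's ~159M-parameter network:

```python
# Illustrative LSTM policy: observation history -> distribution over discrete actions.
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    def __init__(self, obs_dim=64, hidden=128, n_actions=10):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)   # action logits

    def forward(self, obs_seq, state=None):
        out, state = self.lstm(obs_seq, state)     # memory carried across timesteps
        dist = torch.distributions.Categorical(logits=self.head(out[:, -1]))
        return dist, state

policy = LSTMPolicy()
dist, state = policy(torch.randn(1, 8, 64))  # batch of 1, history of 8 observations
action = dist.sample()                       # discrete action for the current frame
```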

What’s the key achievement?

  • OpenAI Five defeated the Dota 2 world champions in a best-of-three match (2–0) and won 99.4% of over 7000 games during a multi-day online showcase.

What are future research areas?

  • Applying the introduced methods to other zero-sum two-team continuous environments.

What are possible business applications?

  • Tackling challenging esports games like Dota 2 can be a promising step towards solving advanced real-world problems using reinforcement learning techniques.

4. Towards a Human-like Open-Domain Chatbot, by Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le

We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher in absolute SSA than the existing chatbots we evaluated. 

In contrast to most modern conversational agents, which are highly specialized, the Google research team introduces Meena, a chatbot that can chat about virtually anything. It’s built on a large neural network with 2.6B parameters trained on 341 GB of text. The researchers also propose a new human evaluation metric for open-domain chatbots, called Sensibleness and Specificity Average (SSA), which can capture important attributes for human conversation. They demonstrate that this metric correlates highly with perplexity, an automatic metric that is readily available. Thus, the Meena chatbot, which is trained to minimize perplexity, can conduct conversations that are more sensible and specific compared to other chatbots. Particularly, the experiments demonstrate that Meena outperforms existing state-of-the-art chatbots by a large margin in terms of the SSA score (79% vs. 56%) and is closing the gap with human performance (86%).

What’s the core idea of this paper?

  • Despite recent progress, open-domain chatbots still have significant weaknesses: their responses often do not make sense or are too vague or generic.
  • Meena is built on a seq2seq model with Evolved Transformer (ET) that includes 1 ET encoder block and 13 ET decoder blocks.
  • The model is trained on multi-turn conversations with the input sequence including all turns of the context (up to 7) and the output sequence being the response.
  • The proposed SSA metric combines two fundamental aspects of a human-like chatbot: making sense and being specific. Human judges label every model response on these two criteria, and SSA is the average of the two resulting scores (a minimal computation sketch follows this list).
  • The research team discovered that the SSA metric shows a high negative correlation (R² = 0.93) with perplexity, a readily available automatic metric that Meena is trained to minimize.
What’s the key achievement?

  • Proposing a simple human-evaluation metric for open-domain chatbots.
  • The best end-to-end trained Meena model outperforms existing state-of-the-art open-domain chatbots by a large margin, achieving an SSA score of 72% (vs. 56%).
  • Furthermore, the full version of Meena, with a filtering mechanism and tuned decoding, further advances the SSA score to 79%, which is not far from the 86% SSA achieved by the average human.
What does the AI community think?

  • “Google’s “Meena” chatbot was trained on a full TPUv3 pod (2048 TPU cores) for 30 full days – that’s more than $1,400,000 of compute time to train this chatbot model.” – Elliot Turner, CEO and founder of Hyperia.
  • “So I was browsing the results for the new Google chatbot Meena, and they look pretty OK (if boring sometimes). However, every once in a while it enters ‘scary sociopath mode,’ which is, shall we say, sub-optimal” – Graham Neubig, Associate professor at Carnegie Mellon University.
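
A minimal sketch of the SSA computation, assuming each response receives 0/1 human judgments for sensibleness and specificity:

```python
# SSA = average of the sensibleness rate and the specificity rate over responses.
def ssa(labels):
    """labels: list of (sensible, specific) 0/1 judgments, one pair per response."""
    n = len(labels)
    sensibleness = sum(s for s, _ in labels) / n
    specificity = sum(p for _, p in labels) / n
    return (sensibleness + specificity) / 2

print(ssa([(1, 1), (1, 0), (0, 0)]))  # -> 0.5
```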

What are future research areas?

  • Lowering the perplexity through improvements in algorithms, architectures, data, and compute.
  • Considering other aspects of conversations beyond sensibleness and specificity, such as, for example, personality and factuality.
  • Tackling safety and bias in the models.
What are possible business applications?

  • Further humanizing computer interactions;
  • improving foreign language practice;
  • making interactive movie and videogame characters relatable.
Where can you get implementation code?

  • Considering the challenges related to safety and bias in the models, the authors haven’t released the Meena model yet. However, they are still evaluating the risks and benefits and may decide otherwise in the coming months.

5. Language Models are Few-Shot Learners, by Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions – something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10× more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3’s few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

The OpenAI research team draws attention to the fact that the need for a labeled dataset for every new language task limits the applicability of language models. Considering that there is a wide range of possible tasks and it’s often difficult to collect a large labeled training dataset, the researchers suggest an alternative solution, which is scaling up language models to improve task-agnostic few-shot performance. They test their solution by training a 175B-parameter autoregressive language model, called GPT-3, and evaluating its performance on over two dozen NLP tasks. The evaluation under few-shot learning, one-shot learning, and zero-shot learning demonstrates that GPT-3 achieves promising results and even occasionally outperforms the state of the art achieved by fine-tuned models.
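
A minimal sketch of the few-shot setup: task demonstrations are packed into a plain-text prompt and the model is asked to continue it, with no gradient updates (the prompt format is illustrative, and `complete` is a hypothetical stand-in for any autoregressive LM call):

```python
# Few-shot prompting: demonstrations go in the context window; no fine-tuning.
demonstrations = [
    ("cheese ->", "fromage"),
    ("house ->", "maison"),
]
query = "cat ->"

prompt = "Translate English to French:\n"
prompt += "\n".join(f"{x} {y}" for x, y in demonstrations)
prompt += f"\n{query}"

# completion = complete(prompt)   # hypothetical call to an autoregressive LM
print(prompt)
```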

What’s the core idea of this paper?

  • The GPT-3 model uses the same model and architecture as GPT-2, including the modified initialization, pre-normalization, and reversible tokenization.
  • However, in contrast to GPT-2, it uses alternating dense and locally banded sparse attention patterns in the layers of the transformer, as in the Sparse Transformer.
  • GPT-3 was evaluated in three settings:
  • Few-shot learning, when the model is given a few demonstrations of the task (typically, 10 to 100) at inference time but with no weight updates allowed.
  • One-shot learning, when only one demonstration is allowed, together with a natural language description of the task.
  • Zero-shot learning, when no demonstrations are allowed and the model has access only to a natural language description of the task.
What’s the key achievement?

  • On the CoQA benchmark, 81.5 F1 in the zero-shot setting, 84.0 F1 in the one-shot setting, and 85.0 F1 in the few-shot setting, compared to the 90.7 F1 score achieved by fine-tuned SOTA.
  • On the TriviaQA benchmark, 64.3% accuracy in the zero-shot setting, 68.0% in the one-shot setting, and 71.2% in the few-shot setting, surpassing the state of the art (68%) by 3.2%.
  • On the LAMBADA dataset, 76.2 % accuracy in the zero-shot setting, 72.5% in the one-shot setting, and 86.4% in the few-shot setting, surpassing the state of the art (68%) by 18%.
  • The news articles generated by the 175B-parameter GPT-3 model are hard to distinguish from real ones, according to human evaluations (with accuracy barely above the chance level at ~52%).
What does the AI community think?

  • “The GPT-3 hype is way too much. It’s impressive (thanks for the nice compliments!) but it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out.” – Sam Altman, CEO and co-founder of OpenAI.
  • “I’m shocked how hard it is to generate text about Muslims from GPT-3 that has nothing to do with violence… or being killed…” – Abubakar Abid, CEO and founder of Gradio.
  • “No. GPT-3 fundamentally does not understand the world that it talks about. Increasing corpus further will allow it to generate a more credible pastiche but not fix its fundamental lack of comprehension of the world. Demos of GPT-4 will still require human cherry picking.” – Gary Marcus, CEO and founder of Robust.ai.
  • “Extrapolating the spectacular performance of GPT3 into the future suggests that the answer to life, the universe and everything is just 4.398 trillion parameters.” – Geoffrey Hinton, Turing Award winner.
What are future research areas?

  • Improving pre-training sample efficiency.
  • Exploring how few-shot learning works.
  • Distillation of large models down to a manageable size for real-world applications.
What are possible business applications?

  • The model with 175B parameters is hard to apply to real business problems due to its impractical resource requirements, but if the researchers manage to distill this model down to a workable size, it could be applied to a wide range of language tasks, including question answering, dialog agents, and ad copy generation.
Where can you get implementation code?

  • The code itself is not available, but some dataset statistics together with unconditional, unfiltered 2048-token samples from GPT-3 are released on GitHub.

6. Beyond Accuracy: Behavioral Testing of NLP models with CheckList, by Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh

Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-the-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.

The authors point out the shortcomings of existing approaches to evaluating the performance of NLP models. A single aggregate statistic, like accuracy, makes it difficult to estimate where the model is failing and how to fix it. The alternative evaluation approaches usually focus on individual tasks or specific capabilities. To address the lack of comprehensive evaluation approaches, the researchers introduce CheckList, a new evaluation methodology for testing NLP models. The approach is inspired by principles of behavioral testing in software engineering. Basically, CheckList is a matrix of linguistic capabilities and test types that facilitates test ideation. Multiple user studies demonstrate that CheckList is very effective at discovering actionable bugs, even in extensively tested NLP models.
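
A minimal sketch of one cell of that matrix, crossing a robustness capability with an invariance (INV) test type: a label-preserving perturbation should leave predictions unchanged, and every flip counts as a failure (the model and perturbation here are toy stand-ins, not the CheckList library API):

```python
# Invariance (INV) test sketch: perturb inputs, check predictions do not change.
def add_typo(text):
    return text.replace("good", "god")  # simple, label-preserving perturbation

def invariance_test(model, examples):
    failures = [x for x in examples if model(x) != model(add_typo(x))]
    return len(failures), len(examples)

model = lambda text: "positive" if "good" in text else "negative"  # toy sentiment model
print(invariance_test(model, ["a good movie", "a good plot, bad acting"]))  # -> (2, 2)
```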

What’s the core idea of this paper?

  • The primary approach to the evaluation of models’ generalization capabilities, which is accuracy on held-out data, may lead to performance overestimation, as the held-out data often contains the same biases as the training data. Moreover, this single aggregate statistic doesn’t help much in figuring out where the NLP model is failing and how to fix these bugs.
  • The alternative approaches are usually designed for evaluation of specific behaviors on individual tasks and thus, lack comprehensiveness.
  • CheckList provides users with a list of linguistic capabilities to be tested, like vocabulary, named entity recognition, and negation.
  • Then, to break down potential capability failures into specific behaviors, CheckList suggests different test types, such as prediction invariance or directional expectation tests in case of certain perturbations.
  • Potential tests are structured as a matrix, with capabilities as rows and test types as columns.
  • The suggested implementation of CheckList also introduces a variety of abstractions to help users generate large numbers of test cases easily.
What’s the key achievement?

  • Evaluation of state-of-the-art models with CheckList demonstrated that even though some NLP tasks are considered “solved” based on accuracy results, the behavioral testing highlights many areas for improvement.
  • In user studies, CheckList helped users identify and test for capabilities not previously considered, produce more thorough and comprehensive tests for previously considered capabilities, and discover many more actionable bugs.
What does the AI community think?

  • The paper received the Best Paper Award at ACL 2020, the leading conference in natural language processing.
What are possible business applications?

  • CheckList can be used to create more exhaustive testing for a variety of NLP tasks.
  • Such comprehensive testing that helps in identifying many actionable bugs is likely to lead to more robust NLP systems.
Where can you get implementation code?

  • The code for testing NLP models with CheckList is available on GitHub.

7. EfficientDet: Scalable and Efficient Object Detection, by Mingxing Tan, Ruoming Pang, Quoc V. Le

Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and EfficientNet backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with single-model and single-scale, our EfficientDet-D7 achieves state-of-the-art 52.2 AP on COCO test-dev with 52M parameters and 325B FLOPs, being 4×–9× smaller and using 13×–42× fewer FLOPs than previous detectors. Code is available on https://github.com/google/automl/tree/master/efficientdet .

The large size of object detection models deters their deployment in real-world applications such as self-driving cars and robotics. To address this problem, the Google Research team introduces two optimizations, namely (1) a weighted bi-directional feature pyramid network (BiFPN) for efficient multi-scale feature fusion and (2) a novel compound scaling method. By combining these optimizations with the EfficientNet backbones, the authors develop a family of object detectors, called EfficientDet. The experiments demonstrate that these object detectors consistently achieve higher accuracy with far fewer parameters and multiply-adds (FLOPs).
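
A minimal PyTorch sketch of BiFPN's fast normalized weighted fusion, the paper's mechanism for learning how much each input resolution contributes (shapes illustrative):

```python
# Fast normalized fusion: O = sum_i(w_i * I_i) / (eps + sum_i w_i), w_i >= 0 via ReLU.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))  # one learnable weight per input
        self.eps = eps

    def forward(self, feats):                        # feats: list of same-shape tensors
        w = torch.relu(self.w)                       # non-negativity, as in the paper
        return sum(wi * f for wi, f in zip(w, feats)) / (w.sum() + self.eps)

fuse = WeightedFusion(n_inputs=2)
out = fuse([torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)])
```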

What’s the core idea of this paper?

  • A weighted bi-directional feature pyramid network (BiFPN) for easy and fast multi-scale feature fusion. It learns the importance of different input features and repeatedly applies top-down and bottom-up multi-scale feature fusion.
  • A new compound scaling method for simultaneous scaling of the resolution, depth, and width for all backbone, feature network, and box/class prediction networks.
  • These optimizations, together with the EfficientNet backbones, allow the development of a new family of object detectors, called EfficientDet.
What’s the key achievement?

  • The EfficientDet model with 52M parameters achieves state-of-the-art 52.2 AP on the COCO test-dev dataset, outperforming the previous best detector by 1.5 AP while being 4× smaller and using 13× fewer FLOPs;
  • with simple modifications, the EfficientDet model achieves 81.74% mIOU accuracy, outperforming DeepLabV3+ by 1.7% on Pascal VOC 2012 semantic segmentation with 9.8× fewer FLOPs;
  • the EfficientDet models are up to 3× to 8× faster on GPU/CPU than previous detectors.
What does the AI community think?

  • The paper was accepted to CVPR 2020, the leading conference in computer vision.
  • The high level of interest in the code implementations of this paper makes this research one of the highest-trending papers introduced recently.
What are possible business applications?

  • The high accuracy and efficiency of the EfficientDet detectors may enable their application for real-world tasks, including self-driving cars and robotics.
Where can you get implementation code?

  • The authors released the official TensorFlow implementation of EfficientDet.
  • The PyTorch implementation of this paper can be found here and here.

8. Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild, by Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi

We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. In order to disentangle these components without supervision, we use the fact that many object categories have, at least in principle, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even if the appearance is not symmetric due to shading. Furthermore, we model objects that are probably, but not certainly, symmetric by predicting a symmetry probability map, learned end-to-end with the other components of the model. Our experiments show that this method can recover very accurately the 3D shape of human faces, cat faces and cars from single-view images, without any supervision or a prior shape model. On benchmarks, we demonstrate superior accuracy compared to another method that uses supervision at the level of 2D image correspondences.

The research group from the University of Oxford studies the problem of learning 3D deformable object categories from single-view RGB images without additional supervision. To decompose the image into depth, albedo, illumination, and viewpoint without direct supervision for these factors, they suggest starting by assuming objects to be symmetric. Then, considering that real-world objects are never fully symmetrical, at least due to variations in pose and illumination, the researchers augment the model by explicitly modeling illumination and predicting a dense map with probabilities that any given pixel has a symmetric counterpart. The experiments demonstrate that the introduced approach achieves better reconstruction results than other unsupervised methods. Moreover, it outperforms the recent state-of-the-art method that leverages keypoint supervision.
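
A toy PyTorch sketch of the symmetry idea: reconstruct from both the raw and the mirrored (depth, albedo) factors and down-weight the mirrored loss where the predicted symmetry probability is low (the paper uses confidence maps inside a probabilistic loss and a full renderer with lighting and viewpoint; plain multiplication and a toy renderer are simplifications):

```python
# Photo-geometric autoencoding sketch: raw + mirrored reconstruction losses.
import torch

def flip(x):
    return torch.flip(x, dims=[-1])             # mirror along the horizontal axis

def reconstruction_loss(image, depth, albedo, sym_prob, render):
    recon = render(depth, albedo)                     # raw decomposition
    recon_flip = render(flip(depth), flip(albedo))    # symmetric counterpart
    loss = (image - recon).abs().mean()
    # Pixels unlikely to have a symmetric counterpart contribute less:
    loss_flip = (sym_prob * (image - recon_flip).abs()).mean()
    return loss + loss_flip

render = lambda depth, albedo: depth * albedo   # toy renderer (ignores light/viewpoint)
img = torch.rand(1, 3, 8, 8)
loss = reconstruction_loss(img, torch.rand(1, 1, 8, 8), torch.rand(1, 3, 8, 8),
                           torch.rand(1, 1, 8, 8), render)
```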

What’s the core idea of this paper?

  • The problem setting is fully unsupervised: there is no access to 2D or 3D ground truth information such as keypoints, segmentation, depth maps, or prior knowledge of a 3D model, and the method learns from an unconstrained collection of single-view images without multiple views of the same instance.
  • The approach works by: leveraging symmetry as a geometric cue to constrain the decomposition; explicitly modeling illumination and using it as an additional cue for recovering the shape; and augmenting the model to account for potential lack of symmetry, particularly by predicting a dense map that contains the probability of a given pixel having a symmetric counterpart in the image.
What’s the key achievement?

  • Qualitative evaluation of the suggested approach demonstrates that it reconstructs 3D faces of humans and cats with high fidelity, containing fine details of the nose, eyes, and mouth.
  • The method reconstructs higher-quality shapes compared to other state-of-the-art unsupervised methods, and even outperforms the DepthNet model, which uses 2D keypoint annotations for depth prediction.

What does the AI community think?

  • The paper received the Best Paper Award at CVPR 2020, the leading conference in computer vision.
What are future research areas?

  • Reconstructing more complex objects by extending the model to use either multiple canonical views or a different 3D representation, such as a mesh or a voxel map.
  • Improving model performance under extreme lighting conditions and for extreme poses.
Where can you get implementation code?

  • The implementation code and demo are available on GitHub.

9. An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale, by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer can perform very well on image classification tasks when applied directly to sequences of image patches. When pre-trained on large amounts of data and transferred to multiple recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

The authors of this paper show that a pure Transformer can perform very well on image classification tasks. They introduce Vision Transformer (ViT), which is applied directly to sequences of image patches by analogy with tokens (words) in NLP. When trained on large datasets of 14M–300M images, Vision Transformer approaches or beats state-of-the-art CNN-based models on image recognition tasks. In particular, it achieves an accuracy of 88.36% on ImageNet, 90.77% on ImageNet-ReaL, 94.55% on CIFAR-100, and 77.16% on the VTAB suite of 19 tasks.
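
A minimal PyTorch sketch of ViT's input pipeline: split the image into fixed-size patches, linearly embed them, prepend a learnable classification token, and add position embeddings (dimensions illustrative):

```python
# ViT-style patch embedding: image -> sequence of patch tokens + [class] token.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img=32, patch=16, dim=64):
        super().__init__()
        n = (img // patch) ** 2
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # linear embedding
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))                 # [class] token
        self.pos = nn.Parameter(torch.zeros(1, n + 1, dim))             # position embeddings

    def forward(self, x):
        tokens = self.proj(x).flatten(2).transpose(1, 2)   # (B, n_patches, dim)
        cls = self.cls.expand(x.shape[0], -1, -1)
        return torch.cat([cls, tokens], dim=1) + self.pos  # ready for the encoder

seq = PatchEmbed()(torch.randn(2, 3, 32, 32))  # -> shape (2, 5, 64)
```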

What’s the core idea of this paper?

  • When applying the Transformer architecture to images, the authors follow as closely as possible the design of the original Transformer designed for NLP.
  • Images are processed by: splitting them into fixed-size patches; linearly embedding each of them; adding position embeddings to the resulting sequence of vectors; prepending an extra learnable ‘classification token’ to the sequence; and feeding the sequence to a standard Transformer encoder.
  • Similarly to Transformers in NLP, Vision Transformer is typically pre-trained on large datasets and fine-tuned to downstream tasks.
What’s the key achievement?

  • Pre-trained on large datasets, Vision Transformer achieves:
  • 88.36% accuracy on ImageNet;
  • 90.77% on ImageNet-ReaL;
  • 94.55% on CIFAR-100;
  • 97.56% on Oxford-IIIT Pets;
  • 99.74% on Oxford Flowers-102;
  • 77.16% on the VTAB suite of 19 tasks.

What does the AI community think?

  • The paper is trending in the AI research community, as evident from the repository stats on GitHub.
  • It is also under review for ICLR 2021, one of the key conferences in deep learning.
What are future research areas?

  • Applying Vision Transformer to other computer vision tasks, such as detection and segmentation.
  • Exploring self-supervised pre-training methods.
  • Analyzing the few-shot properties of Vision Transformer.
  • Exploring contrastive pre-training.
  • Further scaling ViT.
What are possible business applications?

  • Thanks to their efficient pre-training and high performance, Transformers may substitute convolutional networks in many computer vision applications, including navigation, automatic inspection, and visual surveillance.
Where can you get implementation code?

  • The PyTorch implementation of Vision Transformer is available on GitHub.

10. AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients, by Juntang Zhuang, Tommy Tang, Sekhar Tatikonda, Nicha Dvornek, Yifan Ding, Xenophon Papademetris, James S. Duncan

Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) or accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse compared to SGD; for complex settings such as generative adversarial networks (GANs), adaptive methods are typically the default because of their stability. We propose AdaBelief to simultaneously achieve three goals: fast convergence as in adaptive methods, good generalization as in SGD, and training stability. The intuition for AdaBelief is to adapt the step size according to the “belief” in the current gradient direction. Viewing the exponential moving average (EMA) of the noisy gradient as the prediction of the gradient at the next time step, if the observed gradient greatly deviates from the prediction, we distrust the current observation and take a small step; if the observed gradient is close to the prediction, we trust it and take a large step. We validate AdaBelief in extensive experiments, showing that it outperforms other methods with fast convergence and high accuracy on image classification and language modeling. Specifically, on ImageNet, AdaBelief achieves comparable accuracy to SGD. Furthermore, in the training of a GAN on Cifar10, AdaBelief demonstrates high stability and improves the quality of generated samples compared to a well-tuned Adam optimizer. Code is available at https://github.com/juntang-zhuang/Adabelief-Optimizer .

The researchers introduce AdaBelief, a new optimizer that combines the high convergence speed of adaptive optimization methods and the good generalization capabilities of accelerated stochastic gradient descent (SGD) schemes. The core idea behind the AdaBelief optimizer is to adapt the step size based on the difference between the predicted gradient and the observed gradient: the step is small if the observed gradient deviates significantly from the prediction, making us distrust this observation, and the step is large when the current observation is close to the prediction, making us believe in this observation. The experiments confirm that AdaBelief combines the fast convergence of adaptive methods, the good generalizability of the SGD family, and high stability in the training of GANs.
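
A minimal numpy sketch of the update rule for a single parameter tensor: the second-moment EMA tracks the squared deviation of the gradient from its EMA prediction, (g_t − m_t)², instead of g_t² as in Adam (bias correction and the extra ε inside the EMA are omitted for brevity):

```python
# AdaBelief step sketch: step size shrinks when the gradient deviates from the
# EMA prediction (low "belief"), grows when it matches (high "belief").
import numpy as np

def adabelief_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m, s = state
    m = b1 * m + (1 - b1) * grad                 # EMA of gradients (the prediction)
    s = b2 * s + (1 - b2) * (grad - m) ** 2      # EMA of squared *deviation*, not grad**2
    theta = theta - lr * m / (np.sqrt(s) + eps)
    return theta, (m, s)

theta, state = np.zeros(3), (np.zeros(3), np.zeros(3))
theta, state = adabelief_step(theta, np.array([0.1, -0.2, 0.3]), state)
```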

What’s the core idea of this paper?

  • The idea of the AdaBelief optimizer is to combine the advantages of adaptive optimization methods (e.g., Adam) and accelerated SGD optimizers. Adaptive methods typically converge faster, while SGD optimizers demonstrate better generalization performance.
  • If the observed gradient deviates greatly from the prediction, we have a weak belief in this observation and take a small step.
  • If the observed gradient is close to the prediction, we have a strong belief in this observation and take a large step.
  • As a result, AdaBelief aims to combine: fast convergence, like adaptive optimization methods; good generalization, like the SGD family; and training stability in complex settings such as GANs.
What’s the key achievement?

  • In image classification tasks on CIFAR and ImageNet, AdaBelief demonstrates as fast convergence as Adam and as good generalization as SGD.
  • It outperforms other methods in language modeling.
  • In the training of a WGAN, AdaBelief significantly improves the quality of generated images compared to Adam.
What does the AI community think?

  • The paper was accepted to NeurIPS 2020, the top conference in artificial intelligence.
  • It is also trending in the AI research community, as evident from the repository stats on GitHub.
What are possible business applications?

  • AdaBelief can boost the development and application of deep learning models, as it can be applied to the training of any model that numerically estimates parameter gradients.
Where can you get implementation code?

  • Both PyTorch and TensorFlow implementations are released on GitHub.

If you like these research summaries, you might also be interested in the following articles:

  • GPT-3 & Beyond: 10 NLP Research Papers You Should Read
  • Novel Computer Vision Research Papers From 2020
  • AAAI 2021: Top Research Papers With Business Applications
  • ICLR 2021: Key Research Papers

About Mariya Yao

Mariya is the co-author of Applied AI: A Handbook For Business Leaders and former CTO at Metamaven. She "translates" arcane technical concepts into actionable business advice for executives and designs lovable products people actually want to use. Follow her on Twitter at @thinkmariya to raise your AI IQ.

Journal of Machine Learning Research

The Journal of Machine Learning Research (JMLR), established in 2000, provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online.

  • 2024.02.18 : Volume 24 completed; Volume 25 began.
  • 2023.01.20 : Volume 23 completed; Volume 24 began.
  • 2022.07.20 : New special issue on climate change .
  • 2022.02.18 : New blog post: Retrospectives from 20 Years of JMLR .
  • 2022.01.25 : Volume 22 completed; Volume 23 began.
  • 2021.12.02 : Message from outgoing co-EiC Bernhard Schölkopf .
  • 2021.02.10 : Volume 21 completed; Volume 22 began.
  • More news ...

Latest papers

Deep Network Approximation: Beyond ReLU to Diverse Activation Functions Shijun Zhang, Jianfeng Lu, Hongkai Zhao , 2024. [ abs ][ pdf ][ bib ]

Effect-Invariant Mechanisms for Policy Generalization Sorawit Saengkyongam, Niklas Pfister, Predrag Klasnja, Susan Murphy, Jonas Peters , 2024. [ abs ][ pdf ][ bib ]

Pygmtools: A Python Graph Matching Toolkit Runzhong Wang, Ziao Guo, Wenzheng Pan, Jiale Ma, Yikai Zhang, Nan Yang, Qi Liu, Longxuan Wei, Hanxue Zhang, Chang Liu, Zetian Jiang, Xiaokang Yang, Junchi Yan , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Heterogeneous-Agent Reinforcement Learning Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Sample-efficient Adversarial Imitation Learning Dahuin Jung, Hyungyu Lee, Sungroh Yoon , 2024. [ abs ][ pdf ][ bib ]

Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi , 2024. [ abs ][ pdf ][ bib ]

Rates of convergence for density estimation with generative adversarial networks Nikita Puchkin, Sergey Samsonov, Denis Belomestny, Eric Moulines, Alexey Naumov , 2024. [ abs ][ pdf ][ bib ]

Additive smoothing error in backward variational inference for general state-space models Mathis Chagneux, Elisabeth Gassiat, Pierre Gloaguen, Sylvain Le Corff , 2024. [ abs ][ pdf ][ bib ]

Optimal Bump Functions for Shallow ReLU networks: Weight Decay, Depth Separation, Curse of Dimensionality Stephan Wojtowytsch , 2024. [ abs ][ pdf ][ bib ]

Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees Alexander Terenin, David R. Burt, Artem Artemev, Seth Flaxman, Mark van der Wilk, Carl Edward Rasmussen, Hong Ge , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On Tail Decay Rate Estimation of Loss Function Distributions Etrit Haxholli, Marco Lorenzi , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces Hao Liu, Haizhao Yang, Minshuo Chen, Tuo Zhao, Wenjing Liao , 2024. [ abs ][ pdf ][ bib ]

Post-Regularization Confidence Bands for Ordinary Differential Equations Xiaowu Dai, Lexin Li , 2024. [ abs ][ pdf ][ bib ]

On the Generalization of Stochastic Gradient Descent with Momentum Ali Ramezani-Kebrya, Kimon Antonakopoulos, Volkan Cevher, Ashish Khisti, Ben Liang , 2024. [ abs ][ pdf ][ bib ]

Pursuit of the Cluster Structure of Network Lasso: Recovery Condition and Non-convex Extension Shotaro Yagishita, Jun-ya Gotoh , 2024. [ abs ][ pdf ][ bib ]

Iterate Averaging in the Quest for Best Test Error Diego Granziol, Nicholas P. Baskerville, Xingchen Wan, Samuel Albanie, Stephen Roberts , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Nonparametric Inference under B-bits Quantization Kexuan Li, Ruiqi Liu, Ganggang Xu, Zuofeng Shang , 2024. [ abs ][ pdf ][ bib ]

Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box Ryan Giordano, Martin Ingram, Tamara Broderick , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On Sufficient Graphical Models Bing Li, Kyongwon Kim , 2024. [ abs ][ pdf ][ bib ]

Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond Nathan Kallus, Xiaojie Mao, Masatoshi Uehara , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks Sebastian Neumayer, Lénaïc Chizat, Michael Unser , 2024. [ abs ][ pdf ][ bib ]

Improving physics-informed neural networks with meta-learned optimization Alex Bihlo , 2024. [ abs ][ pdf ][ bib ]

A Comparison of Continuous-Time Approximations to Stochastic Gradient Descent Stefan Ankirchner, Stefan Perko , 2024. [ abs ][ pdf ][ bib ]

Critically Assessing the State of the Art in Neural Network Verification Matthias König, Annelot W. Bosman, Holger H. Hoos, Jan N. van Rijn , 2024. [ abs ][ pdf ][ bib ]

Estimating the Minimizer and the Minimum Value of a Regression Function Arya Akhava, Davit Gogolashvili, Alexandre B. Tsybakov , 2024. [ abs ][ pdf ][ bib ]

Modeling Random Networks with Heterogeneous Reciprocity Daniel Cirkovic, Tiandong Wang , 2024. [ abs ][ pdf ][ bib ]

Exploration, Exploitation, and Engagement in Multi-Armed Bandits with Abandonment Zixian Yang, Xin Liu, Lei Ying , 2024. [ abs ][ pdf ][ bib ]

On Efficient and Scalable Computation of the Nonparametric Maximum Likelihood Estimator in Mixture Models Yangjing Zhang, Ying Cui, Bodhisattva Sen, Kim-Chuan Toh , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Decorrelated Variable Importance Isabella Verdinelli, Larry Wasserman , 2024. [ abs ][ pdf ][ bib ]

Model-Free Representation Learning and Exploration in Low-Rank MDPs Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal , 2024. [ abs ][ pdf ][ bib ]

Seeded Graph Matching for the Correlated Gaussian Wigner Model via the Projected Power Method Ernesto Araya, Guillaume Braun, Hemant Tyagi , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization Shicong Cen, Yuting Wei, Yuejie Chi , 2024. [ abs ][ pdf ][ bib ]

Power of knockoff: The impact of ranking algorithm, augmented design, and symmetric statistic Zheng Tracy Ke, Jun S. Liu, Yucong Ma , 2024. [ abs ][ pdf ][ bib ]

Lower Complexity Bounds of Finite-Sum Optimization Problems: The Results and Construction Yuze Han, Guangzeng Xie, Zhihua Zhang , 2024. [ abs ][ pdf ][ bib ]

On Truthing Issues in Supervised Classification Jonathan K. Su , 2024. [ abs ][ pdf ][ bib ]


The Top 17 ‘Must-Read’ AI Papers in 2022

We caught up with experts in the RE•WORK community to find out the top 17 AI papers of 2022 so far, which you can add to your summer must-reads. The papers cover a wide range of topics, including AI in social media and how AI can benefit humanity, and all are free to access.

Interested in learning more? Check out all the upcoming RE•WORK events to find out about the latest trends and industry updates in AI here.

Max Li, Staff Data Scientist – Tech Lead at Wish

Max is a Staff Data Scientist at Wish, where he focuses on experimentation (A/B testing) and machine learning. His passion is to empower data-driven decision-making through the rigorous use of data. View Max’s presentation, ‘Assign Experiment Variants at Scale in A/B Tests’, from our Deep Learning Summit in February 2022 here.

1. Bootstrapped Meta-Learning (2022) – Sebastian Flennerhag et al.

The first paper selected by Max proposes an algorithm that allows the meta-learner to teach itself, overcoming the meta-optimisation challenge. The algorithm focuses on meta-learning with gradients, which guarantees improvements in performance. The paper also looks at the possibilities that bootstrapping opens up. Read the full paper here.

2. Multi-Objective Bayesian Optimization over High-Dimensional Search Spaces (2022) – Samuel Daulton et al.

Another paper selected by Max proposes MORBO, a scalable method for multi-objective Bayesian optimisation (BO) over high-dimensional search spaces. MORBO significantly improves sample efficiency over current BO approaches and delivers results in settings where existing BO algorithms fail. Read the full paper here.

3. Tabular Data: Deep Learning is Not All You Need (2021) – Ravid Shwartz-Ziv, Amitai Armon

To solve real-life data science problems, selecting the right model to use is crucial. This final paper selected by Max explores whether deep models should be recommended as an option for tabular data. Read the full paper here.

top research papers in machine learning

Jigyasa Grover, Senior Machine Learning Engineer at Twitter

Jigyasa Grover is a Senior Machine Learning Engineer at Twitter working in the performance ads ranking domain. Recently, she was honoured with the 'Outstanding in AI: Young Role Model Award' by Women in AI across North America. She is one of the few ML Google Developer Experts globally. Jigyasa previously presented at our Deep Learning Summit and MLOps event in San Francisco earlier this year.

4. Privacy for Free: How does Dataset Condensation Help Privacy? (2022) – Tian Dong et al.

Jigyasa’s first recommendation concentrates on privacy-preserving machine learning, specifically mitigating the leakage of sensitive data in machine learning. The paper provides one of the first propositions of using dataset condensation techniques to preserve data efficiency during model training while furnishing membership privacy. This paper was published by Sony AI and won the Outstanding Paper Award at ICML 2022. Read the full paper here.

5. Affective Signals in a Social Media Recommender System (2022) – Jane Dwivedi-Yu et al.

The second paper recommended by Jigyasa talks about operationalising Affective Computing, also known as Emotional AI, for an improved personalised feed on social media. The paper discusses the design of an affective taxonomy customised to user needs on social media. It further lays out the curation of suitable training data by combining engagement data and data from a human-labelling task to enable the identification of the affective response a user might exhibit for a particular post. Read the full paper here.

6. ItemSage: Learning Product Embeddings for Shopping Recommendations at Pinterest (2022) – Paul Baltescu et al.

Jigyasa’s last recommendation is a paper by Pinterest that illustrates the aggregation of both textual and visual information to build a unified set of product embeddings to enhance recommendation results on e-commerce websites. By applying multi-task learning, the proposed embeddings can optimise for multiple engagement types and ensure that the shopping recommendation stack is efficient with respect to all objectives. Read the full paper here.

Asmita Poddar, Software Development Engineer at Amazon Alexa

Asmita is a Software Development Engineer at Amazon Alexa, where she works on developing and productionising natural language processing and speech models. Asmita also has prior experience in applying machine learning in diverse domains. Asmita will be presenting at our London AI Summit in September, where she will discuss AI for Spoken Communication.

7. Competition-Level Code Generation with AlphaCode (2022) – Yujia Li et al.

Systems can help programmers become more productive, and Asmita has selected this paper, which addresses the problems of incorporating AI innovations into such systems. AlphaCode is a system that creates solutions for problems that require deeper reasoning. Read the full paper here.

8. A Commonsense Knowledge Enhanced Network with Retrospective Loss for Emotion Recognition in Spoken Dialog (2022) – Yunhe Xie et al.

Existing ERSD datasets limit models’ reasoning capabilities. The final paper selected by Asmita proposes a Commonsense Knowledge Enhanced Network with a backward-looking loss to perform dialog modelling, external knowledge integration and historical state retrospection. The model has been shown to outperform other models. Read the full paper here.

top research papers in machine learning

Discover the speakers we have lined up and the topics we will cover at the London AI Summit.

Sergei Bobrovskyi, Expert in Anomaly Detection for Root Cause Analysis at Airbus

Dr. Sergei Bobrovskyi is a Data Scientist within the Analytics Accelerator team of the Airbus Digital Transformation Office. His work focuses on applications of AI for anomaly detection in time series, spanning various use-cases across Airbus. Sergei will be presenting at our Berlin AI Summit in October about Anomaly Detection, Root Cause Analysis and Explainability.

9. LaMDA: Language Models for Dialog Applications (2022) – Romal Thoppilan et al.

The paper chosen by Sergei describes the LaMDA system, which caused a furor this summer when a former Google engineer claimed it had shown signs of being sentient. LaMDA is a family of large language models for dialog applications based on the Transformer architecture. The model family’s most interesting features are fine-tuning with human-annotated data and the ability to consult external sources. In any case, this is a very interesting model family, which we might encounter in many of the applications we use daily. Read the full paper here.

10. A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27 (2022) – Yann LeCun

The second paper chosen by Sergei provides a vision of how to progress towards general AI. The study combines a number of concepts, including a configurable predictive world model, behaviour driven by intrinsic motivation, and hierarchical joint embedding architectures. Read the full paper here.

11. Coordination Among Neural Modules Through a Shared Global Workspace (2022) – Anirudh Goyal et al.

This paper chosen by Sergei combines the Transformer architecture underlying most of the recent successes of deep learning with ideas from the Global Workspace Theory of cognitive science. This is an interesting read to broaden the understanding of why certain model architectures perform well and in which direction we might go in the future to further improve performance on challenging tasks. Read the full paper here.

12. Magnetic control of tokamak plasmas through deep reinforcement learning (2022) – Jonas Degrave et al.

Sergei chose the next paper, which asks the question of how AI research can benefit humanity. The use of AI to enable safe, reliable and scalable deployment of fusion energy could contribute to solving the pressing problem of climate change. Sergei notes that this is an extremely interesting application of AI technology to engineering. Read the full paper here.

13. TranAd: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data (2022) – Shreshth Tuli, Giuliano Casale and Nicholas R. Jennings

The final paper chosen by Sergei is a specialised paper applying the transformer architecture to the problem of unsupervised anomaly detection in multivariate time series. Many architectures that were successful in other fields are now also being applied to time series. The paper shows improved performance on some well-known datasets. Read the full paper here.

top research papers in machine learning

Abdullahi Adamu, Senior Software Engineer at Sony

Abdullahi has worked in various industries including working at a market research start-up where he developed models that could extract insights from human conversations about products or services. He moved to Publicis, where he became Data Engineer and Data Scientist in 2018. Abdullahi will be part of our panel discussion at the London AI Summit in September, where he will discuss Harnessing the Power of Deep Learning.

14. Self-Supervision for Learning from the Bottom Up (2022) – Alexei Efros

This paper chosen by Abdullahi makes compelling arguments for why self-supervision is the next step in the evolution of AI/ML for building more robust models, and why it matters on our journey towards models that generalise better in the wild. Read the full paper here.

15. Neural Architecture Search Survey: A Hardware Perspective (2022) – Krishna Teja Chitty-Venkata and Arun K. Somani

Another paper chosen by Abdullahi argues that as we move towards edge computing and federated learning, neural architecture search that takes hardware constraints into account will be more critical in ensuring that we have leaner neural network models that balance latency and generalisation performance. This survey gives a bird's-eye view of the various neural architecture search algorithms that take hardware constraints into account to design artificial neural networks that give the best trade-off of performance and accuracy. Read the full paper here.

16. What Should Not Be Contrastive In Contrastive Learning (2021) – Tete Xiao et al.

The paper chosen by Abdullahi highlights the underlying assumptions behind data augmentation methods and how these can be counterproductive in the context of contrastive learning; for example, colour augmentation when a downstream task is meant to differentiate the colours of objects. The reported results are promising in the wild. Overall, it presents an elegant solution to using data augmentation for contrastive learning. Read the full paper here.

17. Why do tree-based models still outperform deep learning on tabular data? (2022) – Leo Grinsztajn, Edouard Oyallon and Gael Varoquaux

The final paper selected by Abdullahi works on answering the question of why deep learning models still find it hard to compete with tree-based models on tabular data. It shows that MLP-like architectures are more sensitive to uninformative features in data than their tree-based counterparts. Read the full paper here.

Sign up to the RE•WORK monthly newsletter for the latest AI news, trends and events.

Join us at our upcoming events this year:

  • London AI Summit – 14-15 September 2022
  • Berlin AI Summit – 4-5 October 2022
  • AI in Healthcare Summit Boston – 13-14 October 2022
  • Sydney Deep Learning and Enterprise AI Summits – 17-18 October 2022
  • MLOps Summit – 9-10 November 2022
  • Toronto AI Summit – 9-10 November 2022
  • Nordics AI Summit – 7-8 December 2022


Machine Learning

  • Reports substantive results on a wide range of learning methods applied to various learning problems.
  • Provides robust support through empirical studies, theoretical analysis, or comparison to psychological phenomena.
  • Demonstrates how to apply learning methods to solve significant application problems.
  • Improves how machine learning research is conducted.
  • Prioritizes verifiable and replicable supporting evidence in all published papers.
  • Editor-in-Chief: Hendrik Blockeel


Latest issue

Volume 113, Issue 2

Latest articles

Reinforcement learning tutor better supported lower performers in a math task.

  • Sherry Ruan
  • Emma Brunskill


Goal exploration augmentation via pre-trained skills for sparse-reward long-horizon goal-conditioned reinforcement learning


Goal-conditioned offline reinforcement learning through state space partitioning

  • Mianchu Wang
  • Giovanni Montana


Differentially private Riemannian optimization

  • Bamdev Mishra


DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network

  • Songwen Pei
  • Mingsong Chen


Journal updates

CfP: Discovery Science 2023

Submission Deadline: March 4, 2024

Guest Editors: Rita P. Ribeiro, Albert Bifet, Ana Carolina Lorena

CfP: IJCLR Learning and Reasoning

Call for Papers: Conformal Prediction and Distribution-Free Uncertainty Quantification

Submission Deadline: January 7th, 2024

Guest Editors: Henrik Boström, Eyke Hüllermeier, Ulf Johansson, Khuong An Nguyen, Aaditya Ramdas

CfP: Special Issue on ACML 2023

Guest editors: Vu Nguyen, Dani Yogatama

Submission deadline: June 2, 2023

Journal information

  • ACM Digital Library
  • Current Contents/Engineering, Computing and Technology
  • EI Compendex
  • Google Scholar
  • Japanese Science and Technology Agency (JST)
  • Mathematical Reviews
  • OCLC WorldCat Discovery Service
  • Science Citation Index Expanded (SCIE)
  • TD Net Discovery Service
  • UGC-CARE List (India)


Top Machine Learning (ML) Research Papers Released in 2022

For every Machine Learning (ML) enthusiast, we bring you a curated list of the major breakthroughs in ML research in 2022.

Preetipadma K

Machine learning (ML) has gained much traction in recent years owing to the disruption and development it brings to existing technologies. Every month, hundreds of ML papers from various organizations and universities are uploaded to the internet to share the latest breakthroughs in this domain. As the year ends, we bring you the Top 22 ML research papers of 2022 that created a huge impact in the industry. The following list does not reflect a ranking; the papers have been selected on the basis of the recognitions and awards they received at international machine learning conferences.

  • Bootstrapped Meta-Learning

Meta-learning is a promising field that investigates ways to enable machine learners or RL agents (including their hyperparameters) to learn how to learn in a quicker and more robust manner, and it is a crucial study area for enhancing the efficiency of AI agents.

This 2022 ML paper presents an algorithm that teaches the meta-learner how to overcome the meta-optimization challenge and myopic meta-objectives. The algorithm’s primary objective is meta-learning using gradients, which ensures improved performance. The paper also examines the potential benefits of bootstrapping. The authors highlight several interesting theoretical aspects of this algorithm, and the empirical results achieve a new state of the art (SOTA) on the Atari ALE benchmark as well as increased efficiency in multitask learning.

  • Competition-level code generation with AlphaCode

One of the exciting uses for deep learning and large language models is programming. The rising need for coders has sparked the race to build tools that can increase developer productivity and provide non-developers with tools to create software. However, these models still perform badly when put to the test on more challenging, unforeseen issues that need more than just converting instructions into code.

The popular ML paper of 2022 introduces AlphaCode, a code generation system that, in simulated assessments of programming contests on the Codeforces platform, averaged a rating in the top 54.3%. The paper describes the architecture, training, and testing of the deep-learning model.

  • Restoring and attributing ancient texts using deep neural networks

The epigraphic evidence of the ancient Greek era — inscriptions created on durable materials such as stone and pottery — has often already been damaged by the time it is discovered, rendering the inscribed writings incomprehensible. Machine learning can help restore damaged inscriptions and identify their chronological and geographical origins, helping us better understand our past.

This ML paper proposed Ithaca, a machine learning model built by DeepMind for the textual restoration and geographical and chronological attribution of ancient Greek inscriptions. Ithaca was trained on a database of just under 80,000 inscriptions from the Packard Humanities Institute. It achieved a 62% accuracy rate, compared with historians’ 25% accuracy rate on average; when historians used Ithaca, they quickly reached 72% accuracy.

  • Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

Tuning hyperparameters for large neural networks is expensive, since each trial effectively requires training the network anew. This groundbreaking ML paper of 2022 suggests a novel zero-shot hyperparameter tuning paradigm for tuning massive neural networks more effectively. The research, co-authored by Microsoft Research and OpenAI, describes a novel method called µTransfer that leverages µP to zero-shot transfer hyperparameters from small models, producing nearly optimal HPs on large models without tuning them directly.

This method has been found to reduce the amount of trial and error necessary in the costly process of training large neural networks. By drastically lowering the need to predict which training hyperparameters to use, this approach speeds up research on massive neural networks like GPT-3 and perhaps its successors in the future.
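As a rough illustration (the exact per-layer rules come from the µP parameterization in the paper; the scaling below is a simplified reading, and the numbers are made up):

```python
# Under µP, hidden-weight learning rates (with Adam) shrink like 1/width, so a
# learning rate tuned on a small proxy model can be reused on far wider models.
base_width, tuned_lr = 256, 3e-4              # tuned cheaply on the small model
for width in (256, 1024, 4096, 16384):
    hidden_lr = tuned_lr * base_width / width  # µP-style scaling rule
    print(f"width={width:6d}  hidden-layer lr={hidden_lr:.2e}")
```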

  • PaLM: Scaling Language Modeling with Pathways 

Large neural networks trained for language synthesis and recognition have demonstrated outstanding results on various tasks in recent years. This trending 2022 ML paper introduced the Pathways Language Model (PaLM), a 540-billion-parameter dense decoder-only autoregressive transformer trained on 780 billion tokens of high-quality text.

Although PaLM is based on a typical transformer model architecture, it uses only a decoder and makes changes such as SwiGLU activation, parallel layers, multi-query attention, RoPE embeddings, shared input-output embeddings, no biases, and a modified vocabulary. The paper describes Google’s latest flagship model surpassing several human baselines while achieving state-of-the-art results in numerous zero-, one-, and few-shot NLP tasks.

  • Robust Speech Recognition via Large-Scale Weak Supervision

Machine learning developers have found it challenging to build speech-processing algorithms trained on the vast volume of audio transcripts available on the internet. This year, OpenAI released Whisper, a new state-of-the-art (SotA) speech-to-text model that can transcribe any audio to text and translate it into several languages. It was trained on 680,000 hours of voice data gathered from the internet. According to OpenAI, the model is robust to accents, background noise, and technical terminology. Additionally, it supports transcription in 99 different languages and translation from those languages into English.

The OpenAI paper mentions that the authors ensured about one-third of the audio data is non-English; this diversified dataset helped the team outperform other supervised state-of-the-art models.
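For reference, using the released model takes only a few lines with the open-source openai-whisper package (the file name is a placeholder):

```python
# pip install -U openai-whisper
import whisper

model = whisper.load_model("base")            # also: "small", "medium", "large"
result = model.transcribe("interview.mp3")    # language is auto-detected
print(result["text"])

# translate speech in any supported language into English
english = model.transcribe("interview.mp3", task="translate")["text"]
```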

  • OPT: Open Pre-trained Transformer Language Models

Large language models have demonstrated extraordinary performance on numerous tasks (e.g., zero- and few-shot learning). However, these models are difficult to duplicate without considerable funding due to their high computing costs. Even while the public can occasionally interact with these models through paid APIs, complete research access is still only available to a select group of well-funded labs. This limited access has hindered researchers’ ability to comprehend how and why these language models work, which has stalled progress on initiatives to improve their robustness and reduce ethical drawbacks like bias and toxicity.

The popular 2022 ML paper introduces Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125 million to 175 billion parameters, which the authors share freely and responsibly with interested academics. The biggest OPT model, OPT-175B (not included in the code repository but accessible upon request), is impressively shown to perform similarly to GPT-3 (which also has 175 billion parameters) while using just 15% of GPT-3’s carbon footprint during development and training.

  • A Path Towards Autonomous Machine Intelligence

Yann LeCun is a prominent and respected researcher in the field of artificial intelligence and machine learning. In June, his much-anticipated paper “A Path Towards Autonomous Machine Intelligence” was published on OpenReview. In it, LeCun offered a number of approaches and architectures that might be combined and used to create self-supervised autonomous machines.

He presented a modular architecture for autonomous machine intelligence that combines various models to operate as distinct elements of a machine’s brain and mirror the animal brain. Because all the models are differentiable, they are interconnected to power certain brain-like activities, such as identification and environmental response. The architecture incorporates ideas like a configurable predictive world model, behavior driven by intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.

  • LaMDA: Language Models for Dialog Applications 

Despite tremendous advances in text generation, many of the chatbots available are still rather irritating and unhelpful. This 2022 ML paper from Google describes LaMDA — short for “Language Model for Dialogue Applications” — the system that caused an uproar this summer when a former Google engineer, Blake Lemoine, alleged that it is sentient. LaMDA is a family of large language models for dialog applications built on Google’s Transformer architecture, which is known for its efficiency and speed in language tasks such as translation. The model’s most intriguing features are its ability to be adjusted using human-annotated data and its capability of consulting external sources.

The model, which has 137 billion parameters, was pre-trained on 1.56 trillion words from publicly accessible conversation data and online publications. It is further tuned against three metrics: quality, safety, and groundedness.

  • Privacy for Free: How does Dataset Condensation Help Privacy?

One of the primary proposals in this award-winning ML paper is to use dataset condensation methods to retain data efficiency during model training while also providing membership privacy. The authors argue that dataset condensation, which was initially created to increase training effectiveness, is a better alternative to data generators for producing private data, since it offers privacy for free.

Though existing data generators are used to produce differentially private data for model training to minimize unintended data leakage, they result in high training costs or subpar generalization performance for the sake of data privacy. This study was published by Sony AI and received the Outstanding Paper Award at ICML 2022. 

  • TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data

The use of a model that converts time series into anomaly scores at each time step is essential in any system for detecting time series anomalies. Recognizing and diagnosing anomalies in multivariate time series data is critical for modern industrial applications. Unfortunately, developing a system capable of promptly and reliably identifying abnormal observations is challenging. This is attributed to a shortage of anomaly labels, excessive data volatility, and the expectations of modern applications for ultra-low inference times. 

In this study, the authors present TranAD, a deep transformer network-based anomaly detection and diagnosis model that leverages attention-based sequence encoders to perform inference quickly while being aware of the broader temporal patterns in the data. TranAD employs adversarial training to achieve stability, and focus-score-based self-conditioning to enable robust multi-modal feature extraction. Extensive empirical experiments on six publicly accessible datasets show that TranAD can outperform state-of-the-art baseline methods in detection and diagnosis with data- and time-efficient training.

  • Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding 

In the last few years, generative models called “diffusion models” have been increasingly popular. This year saw these models capture the excitement of AI enthusiasts around the world. 

Going beyond the text-to-image technology of recent times, this outstanding 2022 ML paper introduced Imagen, the viral text-to-image diffusion model from Google. The model achieves a new state-of-the-art FID score of 7.27 on the COCO dataset by combining the deep language understanding of transformer-based large language models with the photorealistic image-generating capabilities of diffusion models. A frozen text-only language model provides the text representation, and a diffusion model followed by two super-resolution upsampling stages, up to 1024×1024, produces the images. It employs several training approaches, including classifier-free guidance, to support both conditional and unconditional generation. Another important feature of Imagen is the use of dynamic thresholding, which stops the diffusion process from saturating specific areas of the picture, a behavior that reduces image quality, particularly when the weight placed on text-conditional generation is large.
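Two of these ingredients are easy to state in code. The sketch below shows generic classifier-free guidance and a simplified dynamic thresholding step (tensor shapes, the guidance weight, and the percentile are illustrative assumptions, not Imagen's exact configuration):

```python
import torch

def classifier_free_guidance(eps_cond, eps_uncond, w=7.5):
    # push the denoiser's prediction away from the unconditional one
    return eps_uncond + w * (eps_cond - eps_uncond)

def dynamic_threshold(x0, percentile=0.995):
    # clip each predicted image to its own percentile s >= 1, then rescale by s,
    # preventing saturated pixels when the guidance weight w is large
    flat = x0.abs().reshape(x0.shape[0], -1)
    s = torch.quantile(flat, percentile, dim=1).clamp(min=1.0).view(-1, 1, 1, 1)
    return x0.clamp(-s, s) / s
```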

  • No Language Left Behind: Scaling Human-Centered Machine Translation

This ML paper introduced one of the most popular Meta projects of 2022: NLLB-200. The paper describes how Meta built and open-sourced this state-of-the-art AI model at FAIR, capable of translating between 200 languages. It covers every aspect of the technology: language analysis, ethical issues, impact analysis, and benchmarking.

No matter what language a person speaks, language accessibility ensures that everyone can benefit from the growth of technology. Meta claims that several of the languages NLLB-200 translates, such as Kamba and Lao, are not currently supported by any translation system in use. The tech behemoth also created a dataset called FLORES-200 to evaluate the effectiveness of NLLB-200 and show that it offers accurate translations. According to Meta, NLLB-200 offers an average of 44% higher-quality translations than its prior model.

  • A Generalist Agent

AI pundits believe that multimodality will play a huge role in the future of artificial general intelligence (AGI). One of the most talked-about ML papers of 2022, by DeepMind, introduces Gato, a generalist agent. Gato is a multi-modal, multi-task, multi-embodiment network, which means that the same neural network (i.e., a single architecture with a single set of weights) can perform all tasks while integrating inherently diverse types of inputs and outputs.

DeepMind claims that the generalist agent can be improved with new data to perform even better on a wider range of tasks. The authors argue that having a general-purpose agent reduces the need for hand-crafting policy models for each domain, increases the volume and diversity of training data, and enables continuous advances in the data, compute, and model scales. A general-purpose agent can also be viewed as a first step toward artificial general intelligence.

Gato demonstrates the versatility of transformer-based machine learning architectures by exhibiting their use in a variety of applications. Unlike previous neural network systems tailored for playing games, stacking blocks with a real robot arm, reading words, or captioning images, Gato is versatile enough to perform all of these tasks on its own, using only a single set of weights and a relatively simple architecture.

  • The Forward-Forward Algorithm: Some Preliminary Investigations 

AI pioneer Geoffrey Hinton is known for his seminal work on backpropagation and deep convolutional neural networks. In his latest paper, presented at NeurIPS 2022, Hinton proposed the “forward-forward algorithm,” a new learning algorithm for artificial neural networks based on our understanding of neural activations in the brain. The approach draws inspiration from Boltzmann machines (Hinton and Sejnowski, 1986) and noise-contrastive estimation (Gutmann and Hyvärinen, 2010). According to Hinton, forward-forward, which is still in its experimental stages, can substitute the forward and backward passes of backpropagation with two forward passes, one with positive data and the other with negative data that the network itself could generate. Further, the algorithm could be implemented more efficiently in hardware and provide a better explanation for the brain’s cortical learning process.

Without employing complicated regularizers, the algorithm obtained a 1.4 percent test error rate on the MNIST dataset in an empirical study, showing that it can be competitive with backpropagation.

The paper also suggests a novel “mortal computing” model that can enable the forward-forward algorithm and understand our brain’s energy-efficient processes.
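A minimal sketch of the local, layer-wise objective, as we read the paper (the threshold theta and the source of negative data are design choices, and random tensors stand in for real inputs):

```python
import torch
import torch.nn.functional as F

def goodness(h):
    return (h ** 2).sum(dim=1)       # "goodness" of a layer: sum of squared activities

def ff_layer_loss(layer, x_pos, x_neg, theta=2.0):
    # the positive pass should score above theta, the negative pass below it;
    # each layer is trained locally, with no backward pass through the whole net
    g_pos = goodness(torch.relu(layer(x_pos)))
    g_neg = goodness(torch.relu(layer(x_neg)))
    return (F.softplus(theta - g_pos) + F.softplus(g_neg - theta)).mean()

layer = torch.nn.Linear(784, 500)
x_pos, x_neg = torch.randn(32, 784), torch.randn(32, 784)  # real vs. synthetic data
ff_layer_loss(layer, x_pos, x_neg).backward()              # updates this layer only
```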

  • Focal Modulation Networks

In humans, the ciliary muscles alter the shape of the eye and hence the radius of curvature of its lens to focus on near or distant objects; changing the shape of the lens changes its focal length. Mimicking this focal modulation behavior in computer vision systems can be tricky.

This machine learning paper introduces FocalNet, an iterative information extraction technique that employs the premise of foveal attention to post-process Deep Neural Network (DNN) outputs by performing variable input/feature-space sampling. Its attention-free design outperforms SoTA self-attention (SA) techniques on a wide range of visual benchmarks. According to the paper, focal modulation consists of three parts (a minimal sketch follows the list below):

a. hierarchical contextualization, implemented using a stack of depth-wise convolutional layers, to encode visual contexts from close-up to a great distance; 

b. gated aggregation to selectively gather contexts for each query token based on its content; and  

c. element-wise modulation or affine modification to inject the gathered context into the query.
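A toy PyTorch rendering of those three parts (the two focal levels, kernel sizes, and dimensions are illustrative assumptions rather than the paper's exact configuration):

```python
import torch
import torch.nn as nn

class FocalModulation(nn.Module):
    def __init__(self, dim, levels=2):
        super().__init__()
        self.levels = levels
        self.f = nn.Linear(dim, 2 * dim + levels + 1)   # query, context, gates
        # (a) hierarchical contextualization: a stack of depth-wise convolutions
        self.ctx_convs = nn.ModuleList(
            nn.Sequential(nn.Conv2d(dim, dim, 3 + 2 * l, padding=1 + l, groups=dim),
                          nn.GELU())
            for l in range(levels))
        self.h = nn.Conv2d(dim, dim, 1)                 # context -> modulator
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                               # x: (B, H, W, C)
        C = x.shape[-1]
        q, ctx, gates = torch.split(self.f(x), (C, C, self.levels + 1), dim=-1)
        ctx, gates = ctx.permute(0, 3, 1, 2), gates.permute(0, 3, 1, 2)
        agg = 0
        for l, conv in enumerate(self.ctx_convs):
            ctx = conv(ctx)
            agg = agg + ctx * gates[:, l:l + 1]         # (b) gated aggregation
        agg = agg + ctx.mean((2, 3), keepdim=True) * gates[:, self.levels:]
        modulator = self.h(agg).permute(0, 2, 3, 1)
        return self.proj(q * modulator)                 # (c) element-wise modulation

out = FocalModulation(dim=64)(torch.randn(2, 16, 16, 64))  # (2, 16, 16, 64)
```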

  • Learning inverse folding from millions of predicted structures

The field of structural biology is being fundamentally changed by cutting-edge technologies in machine learning, protein structure prediction, and innovative ultrafast structural aligners. Time and money are no longer obstacles to obtaining precise protein models and extensively annotating their functionalities. However, determining a protein sequence from its backbone atom coordinates has remained a challenge for scientists. To date, machine learning approaches to this challenge have been constrained by the number of empirically determined protein structures available.

In this ICML Outstanding Paper (Runner Up) , authors explain tackling this problem by increasing training data by almost three orders of magnitude by using AlphaFold2 to predict structures for 12 million protein sequences. With the use of this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers is able to recover native sequence on structurally held-out backbones in 51% of cases while recovering buried residues in 72% of cases. This is an improvement of over 10% over previous techniques. In addition to designing protein complexes, partly masked structures, binding interfaces, and numerous states, the concept generalises to a range of other more difficult tasks.

  • MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Within the AI research community, using video games as a training medium for AI has gained popularity, and autonomous agents have had great success in Atari games, StarCraft, Dota, and Go. Despite these successes, the agents do not generalize beyond a narrow range of activities, in contrast to humans, who continually learn from open-ended tasks.

This thought-provoking 2022 ML paper proposes MineDojo, a unique framework for embodied agent research based on the well-known game Minecraft. In addition to building an internet-scale knowledge base from Minecraft videos, tutorials, wiki pages, and forum discussions, MineDojo provides a simulation suite with tens of thousands of open-ended tasks. Using MineDojo data, the authors propose a novel agent learning methodology that employs massive pre-trained video-language models as a learned reward function. Without requiring an explicitly designed dense shaping reward, the MineDojo autonomous agent can perform a wide range of open-ended tasks specified in free-form language.

  • Is Out-of-Distribution Detection Learnable?

Machine learning (supervised ML) models are frequently trained under the closed-world assumption, namely that the distribution of the testing data will resemble that of the training data. This assumption does not hold in real-world deployments, which causes a considerable decline in performance. While this performance loss is acceptable for applications like product recommendations, developing an out-of-distribution (OOD) detection algorithm is crucial to prevent ML systems from making inaccurate predictions in settings where the data distribution typically drifts over time (e.g., self-driving cars).

In this paper, the authors explore the probably approximately correct (PAC) learning theory of OOD detection, previously posed by researchers as an open problem, to study when OOD detection is learnable. They first focus on identifying a prerequisite for the learnability of OOD detection. They then prove a number of impossibility theorems regarding the learnability of OOD detection in a handful of different scenarios.

  • Gradient Descent: The Ultimate Optimizer 

Gradient descent is a popular optimization approach for training machine learning models and neural networks. The ultimate aim of any machine learning (neural network) method is to optimize parameters, but selecting the ideal step size for an optimizer is difficult since it entails lengthy and error-prone manual work. Many strategies exist for automated hyperparameter optimization; however, they often incorporate additional hyperparameters to govern the hyperparameter optimization process. In this study , MIT CSAIL and Meta researchers offer a unique approach that allows gradient descent optimizers like SGD and Adam to tweak their hyperparameters automatically.

They propose learning the hyperparameters themselves by gradient descent, and learning the hyper-hyperparameters by gradient descent as well, and so on recursively. The paper describes an efficient approach for allowing gradient descent optimizers to autonomously adjust their own hyperparameters, which may be stacked recursively to many levels. As these towers of gradient-based optimizers grow in size, they become significantly less sensitive to the choice of top-level hyperparameters, reducing the burden on the user to search for optimal values.
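A minimal sketch of one level of such a tower, in the spirit of the hypergradient trick the paper generalises (the step-size constant kappa and the quadratic toy objective are our own choices):

```python
import torch

def hypergrad_sgd(loss_fn, w0, lr=0.01, kappa=1e-4, steps=200):
    # SGD whose learning rate is itself updated by gradient descent:
    # w_t = w_{t-1} - lr * g_{t-1}  implies  d loss_t / d lr = -g_t . g_{t-1}
    w = w0.clone().requires_grad_(True)
    g_prev = torch.zeros_like(w)
    for _ in range(steps):
        g = torch.autograd.grad(loss_fn(w), w)[0]
        lr = lr + kappa * torch.dot(g.flatten(), g_prev.flatten()).item()
        w = (w - lr * g).detach().requires_grad_(True)
        g_prev = g
    return w.detach(), lr

# usage: minimise a toy quadratic while the step size adapts on the fly
w, lr = hypergrad_sgd(lambda w: ((w - 3.0) ** 2).sum(), torch.zeros(2))
```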

  • ProcTHOR: Large-Scale Embodied AI Using Procedural Generation 

Embodied AI, the study of agents that learn by perceiving and acting within their environment, is a developing research field that has been influenced by recent advancements in artificial intelligence, machine learning, and computer vision. The paper proposes ProcTHOR, a framework for the procedural generation of Embodied AI environments. ProcTHOR allows researchers to sample arbitrarily large datasets of diverse, interactive, customisable, and performant virtual environments in order to train and evaluate embodied agents across navigation, interaction, and manipulation tasks.

According to the authors, models trained on ProcTHOR using only RGB images and without any explicit mapping or human task supervision achieve cutting-edge results in 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation, including the ongoing Habitat 2022, AI2-THOR Rearrangement 2022, and RoboTHOR challenges. The paper received an Outstanding Paper award at NeurIPS 2022.

  • A Commonsense Knowledge Enhanced Network with Retrospective Loss for Emotion Recognition in Spoken Dialog

Emotion Recognition in Spoken Dialog (ERSD) has recently attracted a lot of attention due to the growth of open conversational data, as integrating emotional states into intelligent spoken human-computer interaction has produced excellent speech recognition systems. It has also been demonstrated that recognizing emotions makes it possible to track the development of human-computer interactions, allowing for dynamic adjustment of conversational strategies that affect the outcome (e.g., customer feedback). However, the volume of current ERSD datasets restricts model development.

This ML paper proposes a Commonsense Knowledge Enhanced Network (CKE-Net) with a retrospective loss to hierarchically carry out dialog modeling, external knowledge integration, and historical state retrospection.





  • Last updated February 2, 2021
  • In AI Origins & Evolution

Top Machine Learning Research Papers Released In 2020

  • by Ram Sagar


We are only two weeks into the last month of the year, and arxiv.org, the popular repository for ML research papers, has already seen close to 600 uploads. This should give one an idea of the pace at which machine learning research is proceeding; keeping track of all of it is almost impossible. Every year, the research that gets the most attention usually comes from companies like Google and Facebook, from top universities like MIT, from research labs, and most importantly from conferences like NeurIPS or ACL.

  • CVPR : 1,470 research papers on computer vision accepted from 6,656 valid submissions.
  • ICLR : 687 out of 2594 papers made it to ICLR 2020 — a 26.5% acceptance rate.
  • ICML : 1088 papers have been accepted from 4990 submissions.

In this article, we have compiled a list of interesting machine learning research work that has made some noise this year. 

Natural Language Processing

This is the seminal paper that introduced the most popular ML model of the year — GPT-3. In the paper, titled “Language Models are Few-Shot Learners,” the OpenAI team used the same model and architecture as GPT-2, including modified initialisation, pre-normalisation, and reversible tokenisation, along with alternating dense and locally banded sparse attention patterns in the layers of the transformer. The GPT-3 model achieved promising results in the zero-shot and one-shot settings, and in the few-shot setting it occasionally surpassed state-of-the-art models.

ALBERT: A Lite BERT

Usually, increasing model size when pretraining natural language representations results in improved performance on downstream tasks, but the training times become longer. To address these problems, the authors presented two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. They also used a self-supervised loss that focuses on modelling inter-sentence coherence and consistently helps downstream tasks with multi-sentence inputs. According to the results, this model established new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters than BERT-large.

Check the paper here.

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList

Microsoft Research, along with the University of Washington and the University of California, in this paper, introduced a model-agnostic and task agnostic methodology for testing NLP models known as CheckList. This is also the winner of the best paper award at the ACL conference this year. It included a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. 

Linformer is a Transformer architecture for tackling the self-attention bottleneck in Transformers. It reduces self-attention to an O(n) operation in both space and time complexity: a new self-attention mechanism lets the researchers compute the contextual mapping in time and memory linear in the sequence length.

Read more about the paper here.
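A single-head sketch of the mechanism (in the real model the projections E and F are learned; random matrices here only demonstrate the shapes and the O(n·k) cost):

```python
import torch

def linformer_attention(q, k, v, E, F):
    # project the *length* dimension of keys and values down to k_dim << n,
    # so attention costs O(n * k_dim) instead of O(n^2)
    k_small, v_small = E @ k, F @ v                 # (k_dim, d) each
    scores = q @ k_small.T / q.shape[-1] ** 0.5     # (n, k_dim)
    return torch.softmax(scores, dim=-1) @ v_small  # (n, d)

n, d, k_dim = 1024, 64, 128
q, k, v = (torch.randn(n, d) for _ in range(3))
E = torch.randn(k_dim, n) / n ** 0.5
F = torch.randn(k_dim, n) / n ** 0.5
out = linformer_attention(q, k, v, E, F)            # shape (1024, 64)
```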

Plug and Play Language Models

Plug and Play Language Models ( PPLM ) are a combination of pre-trained language models with one or more simple attribute classifiers. This, in turn, assists in text generation without any further training. According to the authors, model samples demonstrated control over sentiment styles, and extensive automated and human-annotated evaluations showed attribute alignment and fluency. 

Reformer 

The researchers at Google, in this paper , introduced Reformer. This work showcased that the architecture of a Transformer can be executed efficiently on long sequences and with small memory. The authors believe that the ability to handle long sequences opens the way for the use of the Reformer on many generative tasks. In addition to generating very long coherent text, the Reformer can bring the power of Transformer models to other domains like time-series forecasting, music, image and video generation. 

To overcome the limitations of sparse transformers, Google, in another paper, introduced Performer, which uses an efficient (linear) generalised attention framework and has the potential to directly impact research on biological sequence analysis and more. The authors stated that modern bioinformatics could benefit immensely from faster, more accurate language models for the development of new nanoparticle vaccines.

Check the paper here.

Computer Vision

An image is worth 16x16 words.

“Recent conversation with a friend: @ilyasut: what’s your take on https://t.co/fqVhQNaBWQ? @OriolVinyalsML: my take is: farewell convolutions :)” — Oriol Vinyals (@OriolVinyalsML), October 3, 2020

The irony here is that Transformers, one of the most popular language model architectures, have been made to do computer vision tasks. In this paper, the authors claimed that the vision transformer can go toe-to-toe with state-of-the-art models on image recognition benchmarks, reaching accuracies as high as 88.36% on ImageNet and 94.55% on CIFAR-100. For this, the vision transformer receives its input as a one-dimensional sequence of token embeddings: the image is reshaped into a sequence of flattened 2D patches. The transformer in this work uses a constant width through all of its layers.
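The "16x16 words" are just flattened image patches. A minimal sketch of that tokenisation step (the linear embedding, class token, and position embeddings that follow in the real model are omitted):

```python
import torch

def image_to_patch_tokens(img, patch=16):
    # (C, H, W) -> (num_patches, patch * patch * C): each row is one "word"
    C, H, W = img.shape
    x = img.unfold(1, patch, patch).unfold(2, patch, patch)  # (C, H/p, W/p, p, p)
    return x.permute(1, 2, 0, 3, 4).reshape(-1, C * patch * patch)

tokens = image_to_patch_tokens(torch.randn(3, 224, 224))     # (196, 768)
```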

Unsupervised Learning of Probably Symmetric Deformable 3D Objects


Winner of the CVPR best paper award, in this work, the authors proposed a method to learn 3D deformable object categories from raw single-view images, without external supervision. This method uses an autoencoder that factored each input image into depth, albedo, viewpoint and illumination. The authors showcased that reasoning about illumination can be used to exploit the underlying object symmetry even if the appearance is not symmetric due to shading.

Generative Pretraining from Pixels

In this paper, OpenAI researchers examined whether similar models can learn useful representations for images. For this, they trained a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, the researchers found that a GPT-2-scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, it achieved 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. An even larger model, trained on a mixture of ImageNet and web images, is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of its features.

Reinforcement Learning

Deep reinforcement learning and its neuroscientific implications.

In this paper, the authors provided a high-level introduction to deep RL, discussed some of its initial applications to neuroscience, surveyed its wider implications for research on brain and behaviour, and concluded with a list of opportunities for next-stage research. Although deep RL seems promising, the authors wrote that it is still a work in progress and that its implications for neuroscience should be seen as a great opportunity. For instance, deep RL provides an agent-based framework for studying the way that reward shapes representation, and how representation, in turn, shapes learning and decision making — two issues which together span a large swath of what is most central to neuroscience.

Dopamine-based Reinforcement Learning

Much of what humans do is linked to dopamine, a neurotransmitter that acts as the brain’s reward system (think: the likes on your Instagram page). Keeping this fact in mind, DeepMind, with the help of Harvard labs, analysed dopamine cells in mice and recorded how the mice received rewards while they learned a task. They then checked these recordings for consistency between the activity of the dopamine neurons and standard temporal difference algorithms. This paper proposed an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning. The authors hypothesised that the brain represents possible future rewards not as a single mean but as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel.
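The mechanism can be sketched in a few lines: a population of value predictors with asymmetric learning rates converges to different optimistic and pessimistic estimates that jointly encode a reward distribution (a simplified reading of the paper's model; the bimodal reward and rate ranges are illustrative):

```python
import numpy as np

def distributional_td_step(values, reward, alpha_pos, alpha_neg):
    deltas = reward - values                         # one TD error per "neuron"
    lr = np.where(deltas > 0, alpha_pos, alpha_neg)  # asymmetric learning rates
    return values + lr * deltas

rng = np.random.default_rng(0)
values = np.zeros(10)
alpha_pos = rng.uniform(0.01, 0.2, 10)
alpha_neg = rng.uniform(0.01, 0.2, 10)
for _ in range(5000):
    r = rng.choice([0.0, 1.0])                       # bimodal reward
    values = distributional_td_step(values, r, alpha_pos, alpha_neg)
# each unit settles at a different point of the 0/1 reward distribution, so the
# population as a whole represents the distribution rather than just its mean
```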

Lottery Tickets In Reinforcement Learning & NLP

In this paper, the authors bridged natural language processing (NLP) and reinforcement learning (RL). They examined both recurrent LSTM models and large-scale Transformer models for NLP, and discrete-action-space tasks for RL. The results suggested that the lottery ticket hypothesis is not restricted to supervised learning of natural images, but rather reflects a broader phenomenon in deep neural networks.
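For readers new to the hypothesis, the core operation is simple magnitude pruning; a minimal sketch of one pruning round (the rewind-to-initialisation and retraining loop around it is omitted):

```python
import torch

def lottery_ticket_mask(weights, sparsity=0.8):
    # keep the largest-magnitude weights; the full procedure then rewinds the
    # survivors to their initial values and retrains the resulting subnetwork
    flat = weights.abs().flatten()
    threshold = flat.kthvalue(int(sparsity * flat.numel())).values
    return (weights.abs() > threshold).float()

w = torch.randn(256, 256)
mask = lottery_ticket_mask(w)          # ~20% of entries survive
winning_ticket = w * mask
```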

What Can Learned Intrinsic Rewards Capture


In this paper, the authors explored if the reward function itself can be a good locus of learned knowledge. They proposed a scalable framework for learning useful intrinsic reward functions across multiple lifetimes of experience and showed that it is feasible to learn and capture knowledge about long-term exploration and exploitation into a reward function. 

Miscellaneous

AutoML-Zero

The progress of AutoML has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks, or on similarly restrictive search spaces. In this paper, the authors showed that AutoML can go further: AutoML-Zero automatically discovers complete machine learning algorithms using only basic mathematical operations as building blocks. The researchers demonstrated this by introducing a novel framework that significantly reduces human bias through a generic search space.

Rethinking Batch Normalization for Meta-Learning

Batch normalization is an essential component of meta-learning pipelines, but it poses several challenges in that setting. In this paper, the authors evaluated a range of approaches to batch normalization for meta-learning scenarios and developed a novel approach, TaskNorm. Experiments demonstrated that the choice of batch normalization has a dramatic effect on both classification accuracy and training time for gradient-based and gradient-free meta-learning approaches alike, and that TaskNorm consistently improves performance.

Meta-Learning without Memorisation

Meta-learning algorithms need meta-training tasks to be mutually exclusive, such that no single model can solve all of the tasks at once. In this paper, the authors designed a meta-regularisation objective using information theory that successfully uses data from non-mutually-exclusive tasks to efficiently adapt to novel tasks.

Understanding the Effectiveness of MAML

Model-Agnostic Meta-Learning (MAML) consists of two optimisation loops, of which the inner loop learns new tasks. In this paper, the authors demonstrated that feature reuse, rather than rapid learning, is the dominant factor in MAML's effectiveness. This finding led to the ANIL (Almost No Inner Loop) algorithm, a simplification of MAML in which the inner loop is removed for all but the (task-specific) head of the underlying neural network.

Your Classifier is Secretly an Energy-Based Model

This paper reinterprets a standard discriminative classifier as an energy-based model. In this setting, wrote the authors, the standard class probabilities can still be easily computed. They demonstrated that energy-based training of the joint distribution improves calibration, robustness, and out-of-distribution detection, while also enabling the proposed model to generate samples rivalling the quality of recent GAN approaches. This work improves upon recently proposed techniques for scaling up the training of energy-based models and is the first to achieve performance rivalling the state of the art in both generative and discriminative learning within one hybrid model.
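
As a minimal sketch of that reinterpretation: the same logit vector f(x) yields the usual class probabilities via a softmax and an unnormalized log-density for x via a log-sum-exp, which is the core identity the paper builds on. The logits below are random stand-ins.

```python
# One set of logits f(x), two readings: a classifier p(y|x) and an
# energy-based density model with E(x) = -logsumexp_y f(x)[y].
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)                      # stand-in classifier outputs f(x)

class_probs = F.softmax(logits, dim=-1)          # p(y|x), the discriminative view
log_px_unnorm = torch.logsumexp(logits, dim=-1)  # log p(x) up to a constant
energy = -log_px_unnorm                          # lower energy = higher density

print(class_probs.sum(dim=-1))                   # each row sums to 1
print(energy)
```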

Reverse-Engineering Deep ReLU Networks

This paper investigated the commonly held assumption that a neural network cannot be recovered from its outputs, because the outputs depend on the parameters in a highly nonlinear way. The authors claimed that, by observing only its outputs, one can identify the architecture, weights, and biases of an unknown deep ReLU network. By dissecting the set of region boundaries into components associated with particular neurons, the researchers showed that it is possible to recover the weights of the neurons and their arrangement within the network.

(Note: The list is in no particular order and is a compilation based on the reputation of the publishers, the reception of these research works in popular forums, and feedback from experts on social media. If you think we have missed any exceptional research work, please comment below.)


Collection of must-read papers for Data Science, Machine Learning, and Deep Learning engineers

hurshd0/must-read-papers-for-ml

Must Read Papers for Data Science, ML, and DL

Curated collection of data science, machine learning and deep learning papers, reviews and articles that are on the must-read list.

NOTE: 🚧 in process of updating; let me know what additional papers, articles, or blogs to add and I will add them here.
👉 ⭐ this repo

Contributing

  • 👉 🔃 Please feel free to Submit Pull Request , if links are broken, or I am missing any important papers, blogs or articles.

Maintenance

👇 READ THIS 👇

  • 👉 Reading a paper with heavy math is hard; it takes time and effort to understand, and most of it is dedication and motivation not to quit. Don't be discouraged: read once, read twice, read thrice... until it clicks and blows you away.

🥇 - Read it first

🥈 - Read it second

🥉 - Read it third

Data Science

📊 Pre-processing & EDA

🥇 📄 Data preprocessing - Tidy data - by Hadley Wickham

📓 General DS

🥇 📄 Statistical Modeling: The Two Cultures - by Leo Breiman

🥈 📄 A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning

  • 📹 KDD 2019 Cynthia Rudin's Keynote

🥇 📄 Frequentism and Bayesianism: A Python-driven Primer by Jake VanderPlas

Machine Learning

🎯 General ML

🥇 📄 Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning - by Sebastian Raschka

🥇 📄 A Brief Introduction into Machine Learning - by Gunnar Ratsch

🥉 📄 An Introduction to the Conjugate Gradient Method Without the Agonizing Pain - by Jonathan Richard Shewchuk

🥉 📄 On Model Stability as a Function of Random Seed

🔍 Outlier/Anomaly detection

🥇 📰 Outlier Detection: A Survey

Boosting

🥈 📄 XGBoost: A Scalable Tree Boosting System

🥈 📄 LightGBM: A Highly Efficient Gradient Boosting Decision Tree

🥈 📄 AdaBoost and the Super Bowl of Classifiers - A Tutorial Introduction to Adaptive Boosting

🥉 📄 Greedy Function Approximation: A Gradient Boosting Machine

📖 Unraveling Blackbox ML

🥉 📄 Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation

🥉 📄 Data Shapley: Equitable Valuation of Data for Machine Learning

✂️ Dimensionality Reduction

🥇 📄 A Tutorial on Principal Component Analysis

🥈 📄 How to Use t-SNE Effectively

🥉 📄 Visualizing Data using t-SNE

📈 Optimization

🥇 📄 A Tutorial on Bayesian Optimization

🥈 📄 Taking the Human Out of the Loop: A review of Bayesian Optimization

Famous Blogs

Sebastian Raschka, Chip Huyen

🎱 🔮 Recommenders

🥇 📄 A Survey of Collaborative Filtering Techniques

🥇 📄 Collaborative Filtering Recommender Systems

🥇 📄 Deep Learning Based Recommender System: A Survey and New Perspectives

🥇 📄 🤔 ⭐ Explainable Recommendation: A Survey and New Perspectives ⭐

Case Studies

🥈 📄 The Netflix Recommender System: Algorithms, Business Value, and Innovation

  • Netflix Recommendations: Beyond the 5 stars Part 1
  • Netflix Recommendations: Beyond the 5 stars Part 2

🥈 📄 Two Decades of Recommender Systems at Amazon.com

🥈 🌐 How Does Spotify Know You So Well?

👉 More In-Depth study, 📕 Recommender Systems Handbook

Famous Deep Learning Blogs 🤠

🌐 Stanford UFLDL Deep Learning Tutorial

🌐 Distill.pub

🌐 Colah's Blog

🌐 Andrej Karpathy

🌐 Zack Lipton

🌐 Sebastian Ruder

🌐 Jay Alammar

📚 Neural Networks and Deep Learning

⭐ 🥇 📰 The Matrix Calculus You Need For Deep Learning - Terence Parr and Jeremy Howard ⭐

🥇 📰 Deep Learning - by Yann LeCun, Yoshua Bengio & Geoffrey Hinton

🥇 📄 Generalization in Deep Learning

🥇 📄 Topology of Learning in Artificial Neural Networks

🥇 📄 Dropout: A Simple Way to Prevent Neural Networks from Overfitting

🥈 📄 Polynomial Regression As an Alternative to Neural Nets

🥈 🌐 The Neural Network Zoo

🥈 🌐 Image Completion with Deep Learning in TensorFlow

🥈 📄 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

🥉 📄 A systematic study of the class imbalance problem in convolutional neural networks

🥉 📄 All Neural Networks are Created Equal

🥉 📄 Adam: A Method for Stochastic Optimization

🥉 📄 AutoML: A Survey of the State-of-the-Art

CNNs (Convolutional Neural Networks)

🥇 📄 Visualizing and Understanding Convolutional Networks - by Andrej Karpathy, Justin Johnson, Li Fei-Fei

🥈 📄 Deep Residual Learning for Image Recognition

🥈 📄 AlexNet-ImageNet Classification with Deep Convolutional Neural Networks

🥈 📄 VGG Net-VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION

🥉 📄 A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction

🥉 📄 Large-scale Video Classification with Convolutional Neural Networks

🥉 📄 Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

⚫ CapsNet 🔱

🥇 📄 Dynamic Routing Between Capsules

Blog explaining "What are CapsNets, or Capsule Networks?"

Capsule Networks Tutorial by Aurélien Géron

🏞️ 💬 Image Captioning

🥇 📄 Show and Tell: A Neural Image Caption Generator

🥈 📄 Neural Machine Translation by Jointly Learning to Align and Translate

🥈 📄 StyleNet: Generating Attractive Visual Captions with Styles

🥈 📄 Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

🥈 📄 Where to put the Image in an Image Caption Generator

🥈 📄 Dank Learning: Generating Memes Using Deep Neural Networks

🚗 🚶‍♂️ Object Detection 🦅 🏈

🥈 📄 ResNet-Deep Residual Learning for Image Recognition

🥈 📄 YOLO-You Only Look Once: Unified, Real-Time Object Detection

🥈 📄 Microsoft COCO: Common Objects in Context

  • COCO dataset

🥈 📄 (R-CNN) Rich feature hierarchies for accurate object detection and semantic segmentation

🥈 📄 Fast R-CNN

  • 💻 Papers with Code

🥈 📄 Faster R-CNN

🥈 📄 Mask R-CNN

🚗 🚶‍♂️ 👫 Pose Detection 🏃 💃

🥈 📄 DensePose: Dense Human Pose Estimation In The Wild

🥈 📄 Parsing R-CNN for Instance-Level Human Analysis

🔡 🔣 Deep NLP 💱 🔢

🥇 📄 A Primer on Neural Network Models for Natural Language Processing

🥇 📄 Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

🥇 📄 On the Properties of Neural Machine Translation: Encoder–Decoder Approaches

🥇 📄 LSTM: A Search Space Odyssey - by Klaus Greff et al.

🥇 📄 A Critical Review of Recurrent Neural Networks for Sequence Learning

🥇 📄 Visualizing and Understanding Recurrent Networks

⭐ 🥇 📄 Attention Is All You Need ⭐

🥇 📄 An Empirical Exploration of Recurrent Network Architectures

🥇 📄 Open AI (GPT-2) Language Models are Unsupervised Multitask Learners

🥇 📄 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  • Google BERT Announcement

🥉 📄 Parameter-Efficient Transfer Learning for NLP

🥉 📄 A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification

🥉 📄 A Survey on Recent Advances in Named Entity Recognition from Deep Learning models

🥉 📄 Convolutional Neural Networks for Sentence Classification

🥉 📄 Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction

🥉 📄 Single Headed Attention RNN: Stop Thinking With Your Head

GANs (Generative Adversarial Networks)

🥇 📄 Generative Adversarial Nets - by Goodfellow et al.

📚 GAN Rabbit Hole -> GAN Papers

⭕➖⭕ GNNs (Graph Neural Networks)

🥉 📄 A Comprehensive Survey on Graph Neural Networks

👨‍⚕️ 💉 Medical AI 💊 🔬

Machine learning classifiers and fMRI: a tutorial overview - by Francisco Pereira et al.

👇 Cool Stuff 👇

🔊 📄 SoundNet: Learning Sound Representations from Unlabeled Video

🎨 📄 CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms

🎨 📄 Deep Painterly Harmonization

  • Github Code

🕺 💃 📄 Everybody Dance Now

  • Everybody Dance Now - Youtube Video

⚽ Soccer on Your Tabletop

👱‍♀️ 💇‍♀️ 📄 SC-FEGAN: Face Editing Generative Adversarial Network with User's Sketch and Color

📸 📄 Handheld Mobile Photography in Very Low Light

🏯 🕌 📄 Learning Deep Features for Scene Recognition using Places Database

🚅 🚄 📄 High-Speed Tracking with Kernelized Correlation Filters

🎬 📄 Recent progress in semantic image segmentation

Rabbit hole -> 🔊 🌐 Analytics Vidhya Top 10 Audio Processing Tasks and their papers

:blonde_man: -> 👴 📄 Face Aging With Conditional GANs

:blonde_man: -> 👴 📄 Dual Conditional GANs for Face Aging and Rejuvenation

⚖️ 📄 BAGAN: Data Augmentation with Balancing GAN

labml.ai Annotated PyTorch Paper Implementations

📰 Capstone Projects 📰

8 Awesome Data Science Capstone Projects

10 Powerful Applications of Linear Algebra in Data Science

Top 5 Interesting Applications of GANs

Deep Learning Applications a beginner can build in minutes

Changelog

2019-10-28 Started must-read-papers-for-ml repo

2019-10-29 Added analytics vidhya use case studies article links

2019-10-30 Added Outlier/Anomaly detection paper, separated Boosting, CNN, Object Detection, NLP papers, and added Image captioning papers

2019-10-31 Added Famous Blogs from Deep and Machine Learning Researchers

2019-11-1 Fixed markdown issues, added contribution guideline

2019-11-20 Added Recommender Surveys, and Papers

2019-12-12 Added R-CNN variants, PoseNets, GNNs

2020-02-23 Added GRU paper


Ten Noteworthy AI Research Papers of 2023


This year has felt distinctly different. I've been working in, on, and with machine learning and AI for over a decade, yet I can't recall a time when these fields were as popular and rapidly evolving as they have been this year.

To conclude an eventful 2023 in machine learning and AI research, I'm excited to share 10 noteworthy papers I've read this year. My personal focus has been more on large language models, so you'll find a heavier emphasis on large language model (LLM) papers than computer vision papers this year.

I resisted labeling this article "Top AI Research Papers of 2023" because determining the "best" paper is subjective. The selection criteria were based on a mix of papers I either particularly enjoyed or found impactful and worth noting. (The sorting order is a recommended reading order, not an ordering by perceived quality or impact.)

By the way, if you scroll down to the end of this article, you'll find a little surprise. Thanks for all your support, and I wish you a great start to the new year!

1) Pythia — Insights from Large-Scale Training Runs

With Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling , the researchers originally released 8 LLMs ranging from 70M to 12B parameters (with both weights and data publicly released, which is rare).

But in my opinion, the standout feature of this paper is that the researchers also released the training details, analyses, and insights.


Here are some questions that the Pythia paper addresses:

Does pretraining on duplicated data (i.e., training for >1 epoch) make a difference? It turns out that deduplication does not benefit or hurt performance.

Does training order influence memorization? Unfortunately, it turns out that it does not. "Unfortunately," because if this was true, we could mitigate undesirable verbatim memorization issues by reordering the training data.

Does pretrained term frequency influence task performance? Yes, few-shot accuracy tends to be higher for terms that occur more frequently.

Does increasing the batch size affect training efficiency and model convergence? Doubling the batch size halves the training time but doesn't hurt convergence.

Today, only six months later, these LLMs are by no means groundbreaking. However, I am including this paper because it not only tries to answer interesting questions about training settings but is also a positive example regarding details and transparency. Moreover, the small LLMs in the <1B range are nice templates for small studies and tinkering, or starters for pretraining experiments (here's a link to their GitHub repository).

My wish for 2024 is that we see more studies like this and well-written papers in the coming year!

2) Llama 2: Open Foundation and Fine-Tuned Chat Models

Llama 2: Open Foundation and Fine-Tuned Chat Models is the follow-up paper to Meta's popular first Llama paper. 

Llama 2 models, which range from 7B to 70B parameters, are one of the reasons this paper made it onto this list: these are still among the most capable and widely used openly available models. Worth noting is that the Llama 2 license also permits use in commercial applications (see the Request to Access page for details).


On the model side, what differentiates the Llama 2 suite from many other LLMs is that the models come as standard pretrained models and chat models that have been finetuned via reinforcement learning with human feedback (RLHF, the method used to create ChatGPT) to follow human instructions similar to ChatGPT — RLHF-finetuned models are still rare.


For more details on RLHF and how it's used in Llama 2, see my more comprehensive standalone article below.

LLM Training: RLHF and Its Alternatives

Next to the fact that Llama 2 models are widely used and come with RLHF instruction-finetuned variants, the other reason I decided to include the paper on this list is the accompanying in-depth 77-page research report.

Here, the authors also nicely illustrated the evolution of the Llama 2 70B Chat models, tracing their journey from the initial supervised finetuning stage (SFT-v1) to the final RLHF finetuning stage with PPO (RLHF-v5), with consistent improvements along both the harmlessness and helpfulness axes.


Even though models such as Mixtral-8x7B (more on this later), DeepSeek-67B, and Yi-34B top the larger Llama-2-70B models in public benchmarks, Llama 2 still remains a common and popular choice when it comes to openly available LLMs and developing methods on top of them.

Furthermore, even though some benchmarks indicate that there may be better models, one of the bigger challenges this year has been the trustworthiness of benchmarks. For instance, how do we know that the models haven't been trained on said benchmarks and the scores aren't inflated? In classic machine learning, when someone proposed a new gradient boosting model, it was relatively easy to reproduce the results and check. Nowadays, given how expensive and complex it is to train LLMs (and the fact that most researchers either don't disclose the architecture or the training data details), it is impossible to tell. 

To conclude, it's refreshing to see Meta doubling down on open source even though every other major company is now rolling out its own proprietary large language model (Google's Bard and Gemini, Amazon's Q, Twitter/X's Grok, and OpenAI's ChatGPT).

3) QLoRA: Efficient Finetuning of Quantized LLMs

QLoRA: Efficient Finetuning of Quantized LLMs has been one of the favorite techniques in the LLM research and finetuning community this year because it makes the already popular LoRA (low-rank adaptation) technique more memory efficient. In short, this means that you can fit larger models onto smaller GPUs.


QLoRA stands for quantized LoRA (low-rank adaptation). The standard LoRA method modifies a pretrained LLM by adding low-rank matrices to the weights of the model's layers. These matrices are smaller and, therefore, require fewer resources to update during finetuning.

In QLoRA, the pretrained model's weights are additionally quantized, meaning their numerical precision is reduced by mapping the continuous range of values to a limited set of discrete levels, while the small low-rank adapter matrices are trained on top of the quantized base. This process reduces the model's memory footprint and computational demands, as operations on lower-precision numbers are less memory-intensive.
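
As a minimal sketch of the LoRA side of this (a generic PyTorch layer, not the paper's exact implementation): the frozen base weight is augmented with a trainable low-rank update scaled by alpha/r, so only the small A and B matrices receive gradients. In QLoRA, the frozen base weights would additionally be stored in 4-bit precision.

```python
# Minimal LoRA layer sketch: freeze the pretrained linear layer and train
# only the low-rank matrices A and B in W + (alpha/r) * B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print("trainable params:", trainable)                # 2 * 8 * 768 = 12,288
```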


According to the QLoRA paper, QLoRA reduces the memory requirements of a 65B Llama model so that it fits onto a single 48 GB GPU (like an A100). The 65B Guanaco model, obtained from quantized 4-bit training of 65B Llama, maintains full 16-bit finetuning task performance, reaching 99.3% of ChatGPT's performance after only 24 hours of finetuning.

I've also run many QLoRA experiments this year and found QLoRA a handy tool for reducing GPU memory requirements during finetuning. There's a trade-off, though: the extra quantization step results in an additional computation overhead, meaning the training will be a bit slower than regular LoRA.


LLM finetuning remains as relevant as ever as researchers and practitioners aim to create custom LLMs. And I appreciate techniques like QLoRA that help make this process more accessible by lowering the GPU memory-requirement barrier.

4) BloombergGPT: A Large Language Model for Finance

Looking at all the papers published this year, BloombergGPT: A Large Language Model for Finance may look like an odd choice for a top-10 list because it didn't result in a groundbreaking new insight, methodology, or open-source model. 

I include it because it's an interesting case study in which someone pretrained a relatively large LLM on a domain-specific dataset. Moreover, the description was pretty thorough, which is becoming increasingly rare. This is especially true for papers with authors employed at companies; one of the trends this year was that major companies are becoming increasingly secretive about architecture and dataset details to preserve trade secrets in this competitive landscape. (PS: I don't fault them for that.)

Also, BloombergGPT made me think of all the different ways we can pretrain and finetune models on domain-specific data, as summarized in the figure below (note that this was not explored in the BloombergGPT paper, but it would be interesting to see future studies on that).


In short, BloombergGPT is a 50-billion parameter language model for finance, trained on 363 billion tokens from finance data and 345 billion tokens from a general, publicly available dataset. For comparison, GPT-3 is 3.5x larger (175 billion parameters) but was trained on 1.4x fewer tokens (499 billion).

Why did the authors use an architecture with "only" 50 billion parameters when GPT-3 is 3.5x larger? That one is easy to answer: they adopted the Chinchilla scaling laws and found 50 billion parameters to be a good model size given the available amount of finance data.
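
As a rough back-of-the-envelope sketch of that reasoning, using the ~20-tokens-per-parameter rule of thumb often quoted for Chinchilla (the paper's actual scaling-law fit is more involved than this):

```python
# Chinchilla rule-of-thumb arithmetic for the numbers quoted above.
params = 50e9                      # BloombergGPT parameter count
tokens = 363e9 + 345e9             # finance tokens + general-purpose tokens
print(tokens / params)             # ~14 tokens per parameter
print(f"~{20 * params / 1e12:.1f}T tokens would be Chinchilla-optimal")
```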

Is it worth (pre)training the LLM on the combined dataset from scratch? Based on the paper, the model performs really well in the target domain. However, we don't know whether it's better than a) further pretraining a pretrained model on domain-specific data or b) finetuning a pretrained model on domain-specific data.

Despite this small criticism, it is overall an interesting paper that serves as a case study and example for domain-specific LLMs; plus, it leaves room for further research on pretraining versus finetuning to instill knowledge into an LLM.

(PS: For those curious about a comparison to finetuning, as Rohan Paul shared with me, the "small" AdaptLLM-7B model outperforms BloombergGPT on one dataset and nearly matches its performance on three other finance datasets. Although BloombergGPT appears to be slightly better overall, it's worth noting that training AdaptLLM-7B cost about $100, in contrast to BloombergGPT's multi-million dollar investment.)

5) Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Before discussing the Direct Preference Optimization: Your Language Model is Secretly a Reward Model paper, let's take a short step back and discuss the method it aims to replace, Reinforcement Learning from Human Feedback (RLHF).

RLHF is the main technique behind ChatGPT and the Llama 2 Chat models. In RLHF, which I described in more detail in a separate article, we use a multi-step procedure:

Supervised finetuning: The model is initially trained on a dataset containing instructions and the desired responses.

Reward modeling: Human raters provide feedback on the model's outputs. This feedback is used to create a reward model, which learns to predict what kinds of outputs are to be preferred.

Proximal policy optimization (PPO): The model generates outputs, and the reward model scores each output. The PPO algorithm, a reinforcement learning algorithm used to finetune the model's policy, uses these scores to adjust the policy toward generating higher-quality outputs.


While RLHF is popular and effective, as we've seen with ChatGPT and Llama 2, it's also pretty complex to implement and finicky. 

The Direct Preference Optimization (DPO) paper introduces an algorithm that optimizes language models to align with human preferences without explicit reward modeling or reinforcement learning. Instead, DPO uses a simple classification objective.


In DPO, we still keep the supervised finetuning step (step 1 above), but we replace steps 2 and 3 with a single step to further finetune the model on the preference data. In other words, DPO skips the reward model creation required by RLHF entirely, which significantly simplifies the finetuning process.
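
As a minimal sketch of the objective (using stand-in log-probabilities, not a real model): DPO scores each response by the gap between the policy's and a frozen reference model's log-probabilities, then applies a logistic loss to the preferred-minus-rejected margin.

```python
# Minimal DPO loss sketch over (preferred, rejected) response pairs.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit reward of a response: beta * (policy logprob - reference logprob)
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Stand-in summed log-probabilities for a batch of one preference pair.
lp_c, lp_r = torch.tensor([-12.0]), torch.tensor([-15.0])
ref_c, ref_r = torch.tensor([-13.0]), torch.tensor([-14.0])
print(dpo_loss(lp_c, lp_r, ref_c, ref_r))
```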

How well does it work? There haven't been many models trained with DPO until very recently. (This makes sense because DPO is also a relatively recent method.) However, one recent example is the Zephyr 7B model described in Zephyr: Direct Distillation of LM Alignment . Zephyr-7B is based on a Mistral-7B base LLM that has been finetuned using DPO. (There will be more on Mistral later.)

As the performance tables below reveal, the 7B-parameter Zephyr model outperformed all other models in its size class at the time of its release. Even more impressively, Zephyr-7B even surpassed the 10 times larger 70B-parameter Llama 2 chat model on the conversational MT-Bench benchmark as well.


In summary, the appeal of the DPO paper lies in the simplicity of its method. The scarcity of chat models trained using RLHF, with Llama 2 as a notable exception, can likely be attributed to the complexity of the RLHF approach. Given this, I think it's reasonable to anticipate an increase in the adoption of DPO models in the coming year.

6) Mistral 7B

I must admit that the Mistral 7B paper wasn't among my favorites due to its brevity. However, the model it proposed was quite impactful.

I decided to include the paper on this list because the Mistral 7B model was not only very popular upon release, but also served as the base model, leading to the development of two other notable models: Zephyr 7B and the latest Mistral Mixture of Experts (MoE) approach. These models are good examples of the trend I foresee for small LLMs in (at least) the early half of 2024.

Before we discuss the Zephyr 7B and Mistral MoE models, let's briefly talk about Mistral 7B itself.

In short, the Mistral 7B paper introduces a compact yet powerful language model that, despite its relatively modest size of 7 billion parameters, outperforms larger counterparts, such as the 13B Llama 2 model, on various benchmarks. (Next to the two-times larger Qwen 14B, Mistral 7B was also the base model used in the winning solutions of this year's NeurIPS LLM Finetuning & Efficiency challenge.)


Why exactly it is so good is unclear, but it is most likely due to its training data. Neither Llama 2 nor Mistral discloses its training data, so we can only speculate.

Architecture-wise, the model shares grouped-query attention with Llama 2. While being very similar to Llama 2 overall, one interesting addition to the Mistral architecture is sliding window attention, which saves memory and improves computational throughput for faster training. (Sliding window attention was previously proposed in Child et al. 2019 and Beltagy et al. 2020.)

The sliding window attention mechanism used in Mistral is essentially a fixed-size attention block that allows the current token to attend to only a specific number of previous tokens, instead of all previous tokens.


In the specific case of 7B Mistral, the attention block size is 4096 tokens, and the researchers trained the model with up to 100k token context sizes. To provide a concrete example: in regular self-attention, a model at the 50,000th token can attend to all 49,999 previous tokens. In sliding window self-attention, the Mistral model can only attend to tokens 45,904 to 50,000 (since 50,000 - 4,096 = 45,904).
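
Here is a minimal sketch of the corresponding attention mask, assuming a causal model and the window size described above: token i may attend only to tokens i - window + 1 through i, rather than the full prefix.

```python
# Sliding-window causal mask: True where attention is allowed.
import torch

def sliding_window_mask(seq_len: int, window: int = 4096) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)      # query positions
    j = torch.arange(seq_len).unsqueeze(0)      # key positions
    return (j <= i) & (j > i - window)          # causal AND within the window

mask = sliding_window_mask(8, window=4)
print(mask.int())   # each row has at most 4 allowed (True) entries
```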

However, sliding window attention is mainly used to improve computational performance. The fact that Mistral outperforms larger Llama 2 models is likely not because of sliding window attention but rather despite sliding window attention.

Zephyr and Mixtral

One reason Mistral 7B is an influential model is that it served as the base model for Zephyr 7B, as mentioned earlier in the DPO section. Zephyr 7B, the first popular model trained with DPO to outperform other alternatives, has potentially set the stage for DPO to become the preferred method for finetuning chat models in the coming months.

Another noteworthy model derived from Mistral 7B is the recently released Mistral Mixture of Experts (MoE) model , also known as Mixtral-8x7B. This model matches or exceeds the performance of the larger Llama-2-70B on several public benchmarks.


For more benchmarks, also see the official Mixtral blog post announcement. The team also released a Mixtral-8x7B-Instruct model that has been finetuned with DPO (but as of this writing, there are no benchmarks comparing it to Llama-2-70B-Chat, the RLHF-finetuned model).


GPT-4 is also rumored to be an MoE consisting of 16 submodules. Each of these 16 submodules is rumored to have 111 billion parameters (for reference, GPT-3 has 175 billion parameters). If you read my AI and Open Source in 2023 article approximately two months ago, I mentioned that "It will be interesting to see if MoE approaches can lift open-source models to new heights in 2024". It looks like Mixtral started this trend early, and I am sure that this is just the beginning.

Mixture of Experts 101

If you are new to MoE models, here's a short explanation.


The Switch Transformer, for example, uses 1 expert per token with 4 experts in total. Mixtral-8x7B, on the other hand, consists of 8 experts and uses 2 experts per token.

Why MoEs? Combined, the 8 experts in a 7B model like Mixtral are still ~56B parameters. Actually, it's less than 56B, because the MoE approach is only applied to the FFN (feed forward network, aka fully-connected) layers, not the self-attention weight matrices. So, it's likely closer to 40-50B parameters.

Note that the router reroutes the tokens such that only <14B parameters (2x <7B, instead of all <56B) are used at a time for the forward pass, so the training (and especially inference) will be faster compared to the traditional non-MoE approach.
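
As a minimal sketch of top-2 routing (toy dimensions, not Mixtral's actual implementation): a small router scores all 8 expert FFNs for each token, and only the 2 highest-scoring experts run for that token, so the active parameter count per token stays far below the total.

```python
# Toy top-2 mixture-of-experts FFN: route each token to 2 of 8 experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, n_experts, top_k = 64, 8, 2
experts = nn.ModuleList(
    [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                   nn.Linear(4 * d_model, d_model)) for _ in range(n_experts)]
)
router = nn.Linear(d_model, n_experts)

def moe_ffn(x):                                   # x: (tokens, d_model)
    weights, idx = router(x).topk(top_k, dim=-1)  # pick 2 of 8 experts per token
    weights = F.softmax(weights, dim=-1)
    out = torch.zeros_like(x)
    for k in range(top_k):
        for e in range(n_experts):
            sel = idx[:, k] == e                  # tokens routed to expert e
            if sel.any():
                out[sel] += weights[sel, k, None] * experts[e](x[sel])
    return out

print(moe_ffn(torch.randn(16, d_model)).shape)    # torch.Size([16, 64])
```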

If you want to learn more about MoEs, here's a reading list recommended by Sophia Yang:

The Sparsely-Gated Mixture-of-Experts Layer (2017)

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (2020)  

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (2022)  

Mixture-of-Experts Meets Instruction Tuning (2023)

Furthermore, if you are interested in trying MoE LLMs, also check out the OpenMoE repository, which implemented and shared MoE LLMs earlier this year.

Other Small but Competitive LLMs

Mistral 7B, Zephyr 7B, and Mixtral-8x7B are excellent examples of the progress made in 2023 with small yet capable models featuring openly available weights. Another notable model, a runner-up on my favorite papers list, is Microsoft's phi series.

The secret sauce of phi is training on high-quality data (referred to as “textbook quality data”) obtained by filtering web data.

Released in stages throughout 2023, the phi models include phi-1 (1.3B parameters), phi-1.5 (1.3B parameters), and phi-2 (2.7B parameters). The latter, released just two weeks ago, is already said to match or outperform Mistral 7B, despite being less than half its size.


For more information about the phi models, I recommend the following resources:

Textbooks Are All You Need -- the phi-1 paper

Textbooks Are All You Need II: phi-1.5 Technical Report

The Phi-2: The Surprising Power of Small Language Models announcement

7) Orca 2: Teaching Small Language Models How to Reason

Orca 2: Teaching Small Language Models How to Reason is a relatively new paper, and time will tell whether it has a lasting impact on how we train LLMs in the upcoming months or years. 

I decided to include it because it combines several concepts and ideas. 

One is the idea of distilling data from large, capable models such as GPT-4 to create a synthetic dataset to train small but capable LLMs. This idea was described in the Self-Instruct paper, which came out last year. Earlier this year, Alpaca (a Llama model finetuned on ChatGPT outputs) really popularized this approach.

How does this work? In a nutshell, it's a 4-step process:

Seed a task pool with a set of human-written instructions (175 in this case) and sample instructions from it;

Use a pretrained LLM (like GPT-3) to determine the task category;

Given the new instruction, let a pretrained LLM generate the response;

Collect, prune, and filter the responses before adding them to the task pool.
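
To make the loop concrete, here is a minimal sketch of the four steps, with a hypothetical llm() helper standing in for calls to a real pretrained model's API; the prompts and the length-based filter are illustrative placeholders, not the paper's actual prompts or heuristics.

```python
# Toy Self-Instruct loop with a placeholder llm() call.
import random

def llm(prompt: str) -> str:          # stand-in for a real LLM API call
    return "generated text for: " + prompt[:40]

task_pool = ["Summarize this article.", "Translate this to French."]  # seed tasks

for _ in range(3):                                    # iterations of the loop
    sampled = random.sample(task_pool, k=min(2, len(task_pool)))      # step 1
    new_instruction = llm("Write a new instruction like: " + "; ".join(sampled))
    category = llm("Classify this task: " + new_instruction)          # step 2
    response = llm(new_instruction)                                   # step 3
    if len(response) > 10:                            # step 4: prune/filter
        task_pool.append(new_instruction)

print(len(task_pool), "tasks in pool")
```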


The other idea may not be surprising but is worth highlighting: high-quality data is important for finetuning. For instance, the LIMA paper proposed a human-generated, high-quality dataset of only 1k training examples and showed that finetuning on it can outperform finetuning the same model on 50k ChatGPT-generated responses.


Unlike previous research that heavily relied on imitation learning to replicate outputs from larger models, Orca 2 aims to teach "small" (i.e., 7B and 13B) LLMs various reasoning techniques (like step-by-step reasoning, recall-then-generate, etc.) and to help them determine the most effective strategy for each task. This approach has led Orca 2 to outperform similar-sized models noticeably and even achieve results comparable to models 5-10 times larger.


While we haven't seen any extensive studies on this, the Orca 2 approach might also be able to address the issues with synthetic data highlighted in The False Promise of Imitating Proprietary LLMs paper. Here, the researchers investigated finetuning weaker language models to imitate stronger proprietary models like ChatGPT, using examples such as Alpaca and Self-Instruct. Initially, the imitation models showed promising results, performing well in following instructions and receiving competitive ratings from crowd workers compared to ChatGPT. However, follow-up evaluations revealed that these imitation models only seemed to perform well to a human observer and often generated factually incorrect responses.

8) ConvNets Match Vision Transformers at Scale

In recent years, I've almost exclusively worked with large language transformers or vision transformers (ViTs) due to their good performance. 

Switching gears from language to computer vision papers for the last three entries, what I find particularly appealing about transformers for computer vision is that pretrained ViTs are even easier to finetune than convolutional neural networks. (I summarized a short hands-on talk at CVPR earlier this year here: https://magazine.sebastianraschka.com/p/accelerating-pytorch-model-training). 

To my surprise, I stumbled upon the ConvNets Match Vision Transformers at Scale paper, which shows that convolutional neural networks (CNNs) are, in fact, competitive with ViTs when given access to large enough datasets.


Here, researchers invested compute budgets of up to 110k TPU hours to do a fair comparison between ViTs and CNNs. The outcome was that when CNNs are pretrained with a compute budget similar to what is typically used for ViTs, they can match the performance of ViTs. For this, they pretrained on 4 billion labeled images from JFT and subsequently finetuned the models on ImageNet.

9) Segment Anything

Object recognition and segmentation in images and videos, along with classification and generative modeling, are the main research fields in computer vision. 

To briefly highlight the difference between these two tasks: object detection is about predicting bounding boxes and the associated labels, whereas segmentation classifies each pixel to distinguish between foreground and background objects.


Meta's Segment Anything paper is a notable milestone for open source and image segmentation research. The paper introduces a new task, model, and dataset for image segmentation. The accompanying dataset is the largest segmentation dataset to date, with over 1 billion masks on 11 million images.


However, what's rare and especially laudable is that the researchers used licensed and privacy-respecting images, so the model can be open-sourced without major copyright concerns.

The Segment Anything Model (SAM) consists of three main components.


In slightly more detail, the three components can be summarized as follows:

An image encoder utilizing a masked autoencoder based on a pretrained vision transformer (ViT) that can handle high-resolution inputs. This encoder is run once per image and can be applied before prompting the model.

A prompt encoder that handles two types of prompts: sparse (points, boxes, text) and dense (masks). Points and boxes are represented by positional encodings combined with learned embeddings for each prompt type. And free-form text uses an off-the-shelf text encoder from CLIP. Dense prompts, i.e., masks, are embedded using convolutions and summed element-wise with the image embedding.

A mask decoder maps the image embedding, prompt embeddings, and an output token to a mask. This is a decoder-style transformer architecture that computes the mask foreground probability at each image location.
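
For those who want to try it, here is a hedged usage sketch based on Meta's released segment-anything package (pip install segment-anything); the checkpoint filename and the point prompt below are assumptions for illustration, not values from the paper.

```python
# Prompting SAM with a single foreground point via the segment-anything API.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # assumed local file
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)     # stand-in RGB image
predictor.set_image(image)                          # runs the image encoder once

masks, scores, logits = predictor.predict(
    point_coords=np.array([[320, 240]]),            # one prompt point (x, y)
    point_labels=np.array([1]),                     # 1 = foreground point
)
print(masks.shape, scores)                          # candidate masks + quality scores
```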

Image segmentation is important for applications like self-driving cars, medical imaging, and many others. In just six months, the paper has already been cited more than 1,500 times, and many projects have been built on top of it.

10) Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning is another notable computer vision project from Meta's research division. 

Emu is a text-to-video model that can generate entire videos from text prompts. 

While it's not the first model for impressive text-to-video generation, it compares very favorably to previous works.


As the authors note, the Emu architecture setup is relatively simple compared to previous approaches. One of the main ideas here is that Emu factorizes the generation process into two steps: first, generating an image based on text (using a diffusion model), then creating a video conditioned on both the text and the generated image (using another diffusion model). 

2022 was a big year for text-to-image models like DALL-E 2, Stable Diffusion, and Midjourney. While text-to-image models remain very popular in 2023 (even though LLMs got most of the attention throughout the year), I think that text-to-video models are just about to become more prevalent in online communities in the upcoming year.

Since I am not an image or video designer, I don't have use cases for these tools at the moment; however, text-to-image and text-to-video models are nonetheless interesting to watch as a general measure of progress regarding computer vision.

This magazine is a personal passion project that does not offer direct compensation. However, for those who wish to support me, please consider purchasing a copy of one of my books. If you find them insightful and beneficial, please feel free to recommend them to your friends and colleagues.


Your support means a great deal! Thank you!


CORE MACHINE LEARNING

Revisiting Feature Prediction for Learning Visual Representations from Video

February 15, 2024

This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision. The models are trained on 2 million videos collected from public datasets and are evaluated on downstream image and video tasks. Our results show that learning by predicting video features leads to versatile visual representations that perform well on both motion and appearance-based tasks, without adaptation of the model's parameters; e.g., using a frozen backbone, our largest model, a ViT-H/16 trained only on videos, obtains 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet1K.

Adrien Bardes

Quentin Garrido

Xinlei Chen

Michael Rabbat

Mido Assran

Nicolas Ballas



Computer Science > Machine Learning

Title: FedKit: Enabling Cross-Platform Federated Learning for Android and iOS

Abstract: We present FedKit, a federated learning (FL) system tailored for cross-platform FL research on Android and iOS devices. FedKit pipelines cross-platform FL development by enabling model conversion, hardware-accelerated training, and cross-platform model aggregation. Our FL workflow supports flexible machine learning operations (MLOps) in production, facilitating continuous model delivery and training. We have deployed FedKit in a real-world use case for health data analysis on university campuses, demonstrating its effectiveness. FedKit is open-source at this https URL .
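
As a generic sketch of the aggregation step at the heart of such systems, here is plain federated averaging (FedAvg) over stand-in client weights; FedKit's actual aggregation API may differ.

```python
# Generic FedAvg: average client model parameters, weighted by dataset size.
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weight each client's parameter arrays by its local dataset size."""
    total = sum(client_sizes)
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Two clients with stand-in one-layer models (weight matrix + bias vector).
w_a = [np.ones((4, 2)), np.zeros(2)]
w_b = [3 * np.ones((4, 2)), np.ones(2)]
global_model = fedavg([w_a, w_b], client_sizes=[100, 300])
print(global_model[0][0], global_model[1])   # weighted averages
```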


MIT researchers remotely map crops, field by field

Press contact :, media download.

Four Google Street View photos show rice, cassava, sugarcane, and maize fields.

*Terms of Use:

Images for download on the MIT News office website are made available to non-commercial entities, press and the general public under a Creative Commons Attribution Non-Commercial No Derivatives license . You may not alter the images provided, other than to crop them to size. A credit line must be used when reproducing images; if one is not provided below, credit the images to "MIT."

Four Google Street View photos show rice, cassava, sugarcane, and maize fields.

Previous image Next image

Crop maps help scientists and policymakers track global food supplies and estimate how they might shift with climate change and growing populations. But getting accurate maps of the types of crops that are grown from farm to farm often requires on-the-ground surveys that only a handful of countries have the resources to maintain.

Now, MIT engineers have developed a method to quickly and accurately label and map crop types without requiring in-person assessments of every single farm. The team’s method uses a combination of Google Street View images, machine learning, and satellite data to automatically determine the crops grown throughout a region, from one fraction of an acre to the next. 

The researchers used the technique to automatically generate the first nationwide crop map of Thailand — a smallholder country where small, independent farms make up the predominant form of agriculture. The team created a border-to-border map of Thailand’s four major crops — rice, cassava, sugarcane, and maize — and determined which of the four types was grown, at every 10 meters, and without gaps, across the entire country. The resulting map achieved an accuracy of 93 percent, which the researchers say is comparable to on-the-ground mapping efforts in high-income, big-farm countries.

The team is applying their mapping technique to other countries such as India, where small farms sustain most of the population but the type of crops grown from farm to farm has historically been poorly recorded.

“It’s a longstanding gap in knowledge about what is grown around the world,” says Sherrie Wang, the d’Arbeloff Career Development Assistant Professor in MIT’s Department of Mechanical Engineering, and the Institute for Data, Systems, and Society (IDSS). “The final goal is to understand agricultural outcomes like yield, and how to farm more sustainably. One of the key preliminary steps is to map what is even being grown — the more granularly you can map, the more questions you can answer.”

Wang, along with MIT graduate student Jordi Laguarta Soler and Thomas Friedel of the agtech company PEAT GmbH, will present a paper detailing their mapping method later this month at the AAAI Conference on Artificial Intelligence.

Ground truth

Smallholder farms are often run by a single family or farmer, who subsist on the crops and livestock that they raise. It’s estimated that smallholder farms support two-thirds of the world’s rural population and produce 80 percent of the world’s food. Keeping tabs on what is grown and where is essential to tracking and forecasting food supplies around the world. But the majority of these small farms are in low to middle-income countries, where few resources are devoted to keeping track of individual farms’ crop types and yields.

Crop mapping efforts are mainly carried out in high-income regions such as the United States and Europe, where government agricultural agencies oversee crop surveys and send assessors to farms to label crops from field to field. These “ground truth” labels are then fed into machine-learning models that make connections between the ground labels of actual crops and satellite signals of the same fields. They then label and map wider swaths of farmland that assessors don’t cover but that satellites automatically do.

“What’s lacking in low- and middle-income countries is this ground label that we can associate with satellite signals,” Laguarta Soler says. “Getting these ground truths to train a model in the first place has been limited in most of the world.”

The team realized that, while many developing countries do not have the resources to maintain crop surveys, they could potentially use another source of ground data: roadside imagery, captured by services such as Google Street View and Mapillary, which send cars throughout a region to take continuous 360-degree images with dashcams and rooftop cameras.

In recent years, such services have been able to access low- and middle-income countries. While the goal of these services is not specifically to capture images of crops, the MIT team saw that they could search the roadside images to identify crops.

Cropped image

In their new study, the researchers worked with Google Street View (GSV) images taken throughout Thailand — a country that the service has recently imaged fairly thoroughly, and which consists predominantly of smallholder farms.

Starting with over 200,000 GSV images randomly sampled across Thailand, the team filtered out images that depicted buildings, trees, and general vegetation. About 81,000 images were crop-related. They set aside 2,000 of these, which they sent to an agronomist, who determined and labeled each crop type by eye. They then trained a convolutional neural network to automatically generate crop labels for the other 79,000 images, using various training methods, including iNaturalist, a web-based crowdsourced biodiversity database, and GPT-4V, a "multimodal large language model" that enables a user to input an image and ask the model to identify what it depicts. For each of the 81,000 images, the model generated a label of one of four crops that the image was likely depicting: rice, maize, sugarcane, or cassava.

The researchers then paired each labeled image with the corresponding satellite data taken of the same location throughout a single growing season. These satellite data include measurements across multiple wavelengths, such as a location’s greenness and its reflectivity (which can be a sign of water). 

“Each type of crop has a certain signature across these different bands, which changes throughout a growing season,” Laguarta Soler notes.

The team trained a second model to make associations between a location’s satellite data and its corresponding crop label. They then used this model to process satellite data taken of the rest of the country, where crop labels were not generated or available. From the associations that the model learned, it then assigned crop labels across Thailand, generating a country-wide map of crop types, at a resolution of 10 square meters.
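
As a minimal sketch of this second-stage model (the study's actual model and features may differ), a classifier maps per-location satellite band features to one of the four crop labels; the arrays below are stand-ins, not the study's data.

```python
# Toy second-stage model: satellite band features -> crop label.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(81000, 24))     # e.g., multi-band readings over a season
crops = np.array(["rice", "maize", "sugarcane", "cassava"])
y = crops[rng.integers(0, 4, size=81000)]

clf = RandomForestClassifier(n_estimators=100).fit(X[:60000], y[:60000])
print("held-out accuracy:", clf.score(X[60000:], y[60000:]))
```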

This first-of-its-kind crop map included the locations corresponding to the 2,000 GSV images that the researchers originally set aside and had labeled by the agronomist. These human-labeled images were used to validate the map's labels, and when the team checked whether the map's labels matched the expert "gold standard" labels, they did so 93 percent of the time.

“In the U.S., we’re also looking at over 90 percent accuracy, whereas with previous work in India, we’ve only seen 75 percent because ground labels are limited,” Wang says. “Now we can create these labels in a cheap and automated way.”

The researchers are moving to map crops across India, where roadside images via Google Street View and other services have recently become available.

“There are over 150 million smallholder farmers in India,” Wang says. “India is covered in agriculture, almost wall-to-wall farms, but very small farms, and historically it’s been very difficult to create maps of India because there are very sparse ground labels.”

The team is working to generate crop maps in India, which could be used to inform policies having to do with assessing and bolstering yields, as global temperatures and populations rise.

“What would be interesting would be to create these maps over time,” Wang says. “Then you could start to see trends, and we can try to relate those things to anything like changes in climate and policies.”


share this!

February 16, 2024

Q&A: Machine-learning model tracks trends in public finance research

by Jennifer Ellen French, Georgia State University

What are the leading topics in public finance and budgeting, how have they changed, and what future topics should be more closely researched by professionals and practitioners?

Can Chen and two of his former doctoral students, Shiyang Xiao at Syracuse University and Boyuan Zhao at Florida International University, used a machine-learning technique, structural topic modeling (STM), to identify these themes and their dynamics over the past 40 years for an article recently published in the journal Public Budgeting & Finance.

Using STM, Chen and his colleagues identified 15 latent topics in the areas of public budgeting, public finance and public financial management from the titles and abstracts of 1,028 articles published in the journal from 1981 to 2020. They compared these topics against those covered by the standard exams for Certified Public Finance Officers (CPFO) and found substantial overlap. However, topics that appeared less often may point to underexplored research agendas in PB&F.
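Structural topic modeling is typically run with R's stm package, and the paper's exact pipeline is not described here. As a rough stand-in, the Python sketch below uses plain latent Dirichlet allocation to show the general shape of such an analysis: vectorize titles and abstracts, fit a topic model, and inspect each topic's top words. The toy abstracts, topic count, and library choice are illustrative assumptions.

```python
# Rough stand-in for the paper's analysis. The authors used structural
# topic modeling (commonly run via R's `stm` package); plain LDA, used
# here, illustrates the same basic idea of recovering latent topics.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [  # toy stand-ins for the 1,028 real titles and abstracts
    "municipal bond markets and state debt management",
    "performance budgeting reform in local government",
    "property tax revenue forecasting during recessions",
    "pension fund liabilities and fiscal stress indicators",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(abstracts)

# The paper identified 15 latent topics; a real run would set
# n_components=15 over the full 1981-2020 corpus.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```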

Chen, an associate professor of public management and policy in the Andrew Young School of Policy Studies, directs the college's Ph.D. programs in public policy. After presenting this research at the Next Generation Public Finance conference hosted by Georgia State University, he received helpful feedback and comments, which he gratefully acknowledges. In the Q&A that follows, Chen reveals more about the journal, the findings and his motivation for conducting the study with his colleagues.

What inspired you to do this study?

The journal was 40 years old, so we wanted to do something to celebrate its anniversary, a review of the journal's history. Another reason is that the methodology we used, machine learning, was new to this publication. Traditionally, the articles were manually reviewed. We used technology to do a smart review.

And more importantly, doctoral students sometimes will come to me and ask whether I understand what the big trends are in the field. They must specialize in their first year, so they ask me about the overall landscape of public budgeting and finance: What are the major recent topics in this field? With this study we could look back 40 years and, more importantly for doctoral students, through its recent history to determine trends.

Where did you get the idea to use machine learning and text mining to find the trends and themes?

When I came to Georgia State, AYSPS was promoting its Digital Landscape Initiative, using big data for analytics. So, I thought, "Oh, great! This is a great methodology, and the school wants us to use it."

Other fields, such as engineering, science and technology, use machine learning to help analyze big data sets, but to our understanding, ours is one of the first research efforts to introduce machine-learning and text-mining methodology into the field of public budgeting and finance. This is what we're all about: taking ideas from other disciplines and applying them to our own.

What key trends did your research reveal?

The first and most important thing I should mention is practitioners. The journal was founded to promote a knowledge exchange between practitioners and scholars. We found that, over its history, there has been less and less of this exchange, with fewer practitioners publishing in the journal. We need to promote more engagement with practitioners, and we need doctoral students to better understand the practitioner's point of view on the field.

Our findings have important implications for helping scholars, practitioners and students of government budgeting and finance keep sight of the overall landscape of this literature. It's useful in helping them gain a deeper understanding of the areas of research and form collaborations among researchers with various specializations.

This research can be useful for doctoral students and others in promoting new study topics. We need to look forward and do more research on public budgeting and finance in relation to big challenges in the future, such as health care, technology and climate change. These are important areas we can research in relation to public finance and budgeting to help society address these challenges.

Why is your analysis important? Who will it impact?

First, it's very important for students, practitioners and scholars to know both the big picture and the evolution of the field. It's even more important to think about the future direction of public budgeting and finance research and the areas we need to spend more time studying.

Also, many practitioners were writing academic papers in the early history of the journal and of public budgeting and finance. Now, it's super hard to find these folks—the practitioners—writing and getting published. But this is a very practical field, so scholars need to think about how to write articles that better represent the field and the practice, and to work with practitioners to promote this knowledge exchange.

Provided by Georgia State University
