Wake Up, Machine Learning Researchers!

Tech in 3 · Published in Nerd For Tech · 3 min read · Jun 20, 2021



“After the Covid-19 pandemic hit, many studies emerged suggesting the use of machine learning models to diagnose Covid-19 from medical images. But a recent systematic review by University of Cambridge researchers found these approaches ‘too flawed’ and ‘unsuitable’ for use in patients!”

If you’re from a Data Science / ML (Machine Learning) background, you’ve surely heard time and again about ML having the potential to detect “so-and-so” disease. Many research papers have also been published where, with a limited amount of medical data, ML models showed higher accuracy in disease detection than conventional methods that are either time-consuming or require an expert.

Similarly, after the Covid-19 pandemic began, with no perfect test available for the novel coronavirus, researchers started experimenting with the power of ML to detect Covid-19 from available medical data such as chest radiographs (CXR) and computed tomography (CT) images.

The ML models achieved encouraging results, and a few even promised better diagnostic accuracy than the gold-standard RT-PCR test!

These results felt promising until a recent systematic review examined all the research papers published from 1st January 2020 to 3rd October 2020. A team of researchers led by the University of Cambridge carried out a detailed analysis of these studies, the challenges with their approaches, and the remedies needed to avoid the flaws!

Over 2,212 papers were considered based on the initial search criteria, and after quality screening only 62 were retained, out of which 37 were deep learning papers, 23 were traditional machine learning papers, and 2 were hybrid ones!

But why such a low number? Only 62?

Most of the deep learning papers didn’t mention the following (see the sketch after this list):

  1. How the final model was selected
  2. How the images were pre-processed
  3. Training details (optimizer, loss function, learning rate)
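
To make this concrete, here is a minimal, hypothetical sketch (written in Keras/TensorFlow with a toy architecture and made-up hyperparameters, not taken from any reviewed paper) of the kind of reporting that was missing: an explicit pre-processing step, the optimizer, loss function and learning rate, and a stated criterion for selecting the final model.

```python
# Illustrative only: a toy CXR classifier that documents the details the
# review found missing (pre-processing, optimizer, loss, learning rate,
# and the final-model selection criterion).
import tensorflow as tf

IMG_SIZE = (224, 224)  # assumed input resolution for chest radiographs


def preprocess(image):
    """Pre-processing: resize and scale pixel values to [0, 1]."""
    image = tf.image.resize(image, IMG_SIZE)
    return tf.cast(image, tf.float32) / 255.0


model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(*IMG_SIZE, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # Covid vs. non-Covid
])

# Training details that a reproducible paper should state explicitly.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # optimizer and learning rate
    loss="binary_crossentropy",                              # loss function
    metrics=["accuracy"],
)

# Final-model selection: keep the checkpoint with the lowest validation loss.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.keras", monitor="val_loss", save_best_only=True
)
```

None of this is hard to write down; the point of the review is simply that without these few lines of detail, nobody else can reproduce or trust the reported results.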

And most of the traditional machine learning papers didn’t cover the following (again, see the sketch after the list):

  1. Feature reduction techniques
  2. Model validation techniques
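
Again as a purely hypothetical illustration (synthetic data, arbitrary parameters), this is roughly what covering those two items could look like: PCA as a feature-reduction step and stratified k-fold cross-validation as a model-validation technique.

```python
# Illustrative only: feature reduction (PCA) plus model validation
# (stratified 5-fold cross-validation) on synthetic stand-in data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))     # stand-in for extracted image features
y = rng.integers(0, 2, size=200)    # stand-in Covid / non-Covid labels

# Feature reduction: project the 500 features down to 30 principal components.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=30), SVC(kernel="rbf"))

# Model validation: stratified 5-fold cross-validation instead of a single split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv)
print(f"Cross-validated accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```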

This essentially reflects the quality of ML research being done across the world (especially in healthcare), where most researchers are more interested in getting a paper published, and very few focus on real-world implementation and on the granularity of the data and models used!

As Dr. Michael Roberts from Cambridge’s Department of Applied Mathematics and Theoretical Physics says,

“However, any machine learning algorithm is only as good as the data it’s trained on”

So, what were the reasons for the poor quality of these models when it comes to real-world usage?

  • Poor quality of data
  • Poor quality of ML methodology
  • Poor reproducibility and biases in study design

A few researchers used images from children as their “non-Covid-19” samples! But children are far less vulnerable to the disease than adults, so such a model ends up learning to tell children apart from adults rather than Covid from non-Covid, adding a heavy bias to the training data!
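
A toy simulation (entirely synthetic data, purely illustrative) shows how such a sampling design confounds the label with age: a classifier can score near-perfectly on the biased data without learning anything about Covid at all.

```python
# Illustrative only: paediatric "non-Covid" scans vs. adult "Covid" scans
# let a classifier cheat by detecting age instead of disease.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
covid = rng.integers(0, 2, size=n)  # 1 = Covid, 0 = non-Covid

# Age-related image feature (think rib-cage size): adults supply the Covid
# samples, children supply the non-Covid samples (the biased design).
age_feature = np.where(covid == 1,
                       rng.normal(40, 5, size=n),   # adult scans
                       rng.normal(8, 3, size=n))    # paediatric scans

# A genuinely disease-related feature carrying only a weak signal.
disease_feature = 0.3 * covid + rng.normal(0, 1, size=n)

X = np.column_stack([age_feature, disease_feature])
X_train, X_test, y_train, y_test = train_test_split(X, covid, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print("Accuracy on the biased test set:", clf.score(X_test, y_test))
# Near-perfect accuracy, driven almost entirely by the age feature; the
# model would fail on adult non-Covid patients it has never seen.
```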

The Cambridge researchers’ systematic review also asserted that a model built on data from a single hospital may not even work on data from a hospital in the nearest town!

Despite the many flaws in these ML models, researchers are still positive about the potential of ML, but it requires key modifications and external validation from the medical industry!

Moral: Re-research before you research, and don’t be in a hurry to publish your “State-Of-The-Art” model!

Reference: Roberts, M. et al., “Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans”, Nature Machine Intelligence (2021). https://www.nature.com/articles/s42256-021-00307-0


Tech in 3 · Nerd For Tech

We are a team of 3 friends who are on a mission to make people understand technology in simple terms!