What is machine learning?
Machine learning is an umbrella term that includes many different classes of algorithms, each suited for analysis of different types of data. Machine learning algorithms feature the ability to ‘learn’ associations from data without explicit guidance from humans. Classical machine learning algorithms, such as random forests and support vector machines, can perform well with smaller data sets (N=500-1000) while deep neural networks (DNNs), the newest and most powerful branch of machine learning, require many training samples but can accept varied input such as 2D or 3D image data. DNNs perform best when trained on 100,000+ training samples, although smaller datasets may work depending on the clinical question and the availability of other data types. DNNs have become much more popular recently due to the convergence of abundant data and accessible computing power required to train the models.
How can machine learning help in medical research?
Studies using DNNs specialized for image input have demonstrated physician-level performance in the classification of diabetic retinopathy based on fundoscopy images 1, performance equal to that of dermatologists in the classification of skin lesions 2, and performance superior to that of pathologists on histological image data 3. Each year, there are over 12 million misdiagnoses in the USA alone 4. With parallel diagnostic systems in place, driven by big data and neural networks, physicians can have a second opinion available at all times. Several studies have reliably demonstrated increased diagnostic accuracy when physician assessment and machine learning predictions were combined 5.
How do these algorithms ‘learn’ the correct associations between an input and a diagnosis? During training, each time a wrong diagnosis or prediction is made for a given input, an error signal of corresponding magnitude is back-propagated through the network, adjusting the weights between neurons. This process is repeated iteratively until the network’s prediction accuracy on the training data stops improving. The same learning process is used across varied types of input data, which is one of the reasons DNNs are flexible and can be applied to many problems.
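The training loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how production DNNs are trained: the six-sample dataset, the network size, the learning rate and every function name below are assumptions made up for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return the hidden activations and the predicted probability for one input."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    pred = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, pred


def mean_loss(data, w_hidden, w_out):
    """Mean cross-entropy between predictions and labels over the whole dataset."""
    total = 0.0
    for x, y in data:
        _, p = forward(x, w_hidden, w_out)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train(data, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    history = [mean_loss(data, w_hidden, w_out)]
    for _ in range(epochs):
        for x, y in data:
            hidden, pred = forward(x, w_hidden, w_out)
            # Error at the output: large when the prediction is badly wrong.
            delta_out = pred - y
            # Back-propagate the error to each hidden neuron.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Adjust the weights in proportion to the error they contributed.
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
        history.append(mean_loss(data, w_hidden, w_out))
    return w_hidden, w_out, history


# Hypothetical toy data: two scaled measurements plus a constant bias input,
# labelled 1 (disease) or 0 (no disease).
toy_data = [
    ([0.1, 0.2, 1.0], 0), ([0.2, 0.1, 1.0], 0), ([0.3, 0.3, 1.0], 0),
    ([0.9, 0.8, 1.0], 1), ([0.8, 0.9, 1.0], 1), ([0.7, 0.6, 1.0], 1),
]
w_hidden, w_out, loss_history = train(toy_data)
predictions = [forward(x, w_hidden, w_out)[1] for x, _ in toy_data]
print("loss before training:", round(loss_history[0], 3))
print("loss after training: ", round(loss_history[-1], 3))
```

Real networks have many more layers and rely on libraries that compute these gradients automatically, but the cycle of predicting, measuring the error, propagating it backwards and adjusting the weights is the same.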
Eric Topol, a cardiologist and one of the top 10 most cited researchers in medicine, has written extensively about artificial intelligence (AI) and healthcare for Nature and other sources. In his book Deep Medicine, he does a wonderful job outlining the many ways in which the incorporation of deep learning algorithms in the healthcare workflow can allow physicians to spend more time on humane parts of being a doctor. For instance, outside of diagnosis and prognostication, one underrated way that neural networks may change medicine is by freeing up the physician for more patient-focused tasks. The placement of the keyboard and screen between a physician and their patient has been much derided. Machine learning systems are being developed to capture the dialogue between the physician and the patient and transcribe relevant and accurate notes, removing this obstacle and permitting longer and more direct patient contact.
What are some challenges for deep learning in medicine?
Two obstacles to implementing machine learning research are that medical data are held in ‘silos’ and are often ‘unstructured’.
‘Silos’ means that some of a patient’s data may be in paper charts or spread across multiple electronic medical records (EMRs) that are separate from each other and from any imaging data relevant to the patient. Deep learning algorithms will be most accurate for individual patients when all of their medical data can be incorporated in the analysis.
‘Unstructured’ refers to the fact that medical data may be handwritten or scanned, and therefore not in a format readily acceptable as input by DNNs. In such cases, ‘data pre-processing’ is required to transform the data into a format amenable to analysis with DNNs.
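As a rough illustration of what pre-processing can involve, the sketch below pulls a few numeric fields out of a free-text note so they could be fed to a model. The note format, the field names and the `parse_note` function are all hypothetical; real clinical text is far messier and usually calls for dedicated natural language processing tools.

```python
import re


def parse_note(note):
    """Extract a few structured fields from a hypothetical free-text clinical note."""
    age = re.search(r"(\d+)[- ]year[- ]old", note)
    bp = re.search(r"BP\s*(\d+)\s*/\s*(\d+)", note)
    # Match "smoker" but not "non-smoker" or "non smoker".
    smoker = re.search(r"(?<!non-)(?<!non )\bsmoker\b", note, re.IGNORECASE)
    return {
        "age": int(age.group(1)) if age else None,
        "systolic_bp": int(bp.group(1)) if bp else None,
        "diastolic_bp": int(bp.group(2)) if bp else None,
        "smoker": 1 if smoker else 0,
    }


# Two made-up notes written in slightly different styles.
record_a = parse_note("67-year-old male, non-smoker. BP 142/88 on arrival.")
record_b = parse_note("54 year old female, current smoker, BP 120 / 80.")
print(record_a)
print(record_b)
```

Fields that cannot be found are returned as `None` rather than guessed, so missing data stays visible to whoever builds the dataset.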
Is machine learning really a “black-box”?
Up until recently, a fair criticism of machine learning was that behind the fancy math underlying the changing connections between neurons in the network, no real understanding was gleaned from the system. The neural network may make a correct prediction, but if the ‘why’ is uninterpretable, is the prediction really useful or trustworthy? The field of machine learning has since advanced rapidly, making the predictions of neural network models much more interpretable and transparent.
The Local Interpretable Model-Agnostic Explanations (LIME) technique, published in 2016, allows for the visualization of any classifier’s relative weighting of all variables 6. It can visually depict the relative importance of variables globally for the overall cohort, as well as which variables were most heavily weighted locally for each individual patient.
Furthermore, ‘saliency mapping’ is a technique which can be used to visualize, for instance, which exact parts of an X-ray were considered by an algorithm in its prediction.
(Rajpurkar et al., 2018) 7
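One simple way to get a feel for saliency mapping is occlusion: blank out one pixel at a time and record how much the model's output changes. The sketch below uses that approach instead of the gradient-based methods typically applied to real DNNs, and the 4x4 'image' and `toy_model` are hypothetical stand-ins, not a real X-ray classifier.

```python
def toy_model(image):
    """Hypothetical stand-in classifier: scores mean brightness of the 2x2 centre patch."""
    return sum(image[r][c] for r in (1, 2) for c in (1, 2)) / 4.0


def occlusion_saliency(model, image, baseline=0.0):
    """Return a map of how much the model's score changes when each pixel is blanked."""
    base_score = model(image)
    saliency = []
    for r, row in enumerate(image):
        saliency_row = []
        for c in range(len(row)):
            occluded = [list(values) for values in image]  # copy the image
            occluded[r][c] = baseline                      # blank one pixel
            saliency_row.append(abs(base_score - model(occluded)))
        saliency.append(saliency_row)
    return saliency


image = [
    [0.2, 0.1, 0.3, 0.2],
    [0.1, 0.9, 0.8, 0.1],
    [0.2, 0.7, 1.0, 0.3],
    [0.3, 0.2, 0.1, 0.2],
]
saliency_map = occlusion_saliency(toy_model, image)
for saliency_row in saliency_map:
    print([round(value, 3) for value in saliency_row])
```

Pixels outside the centre patch score zero because this toy model ignores them. For an image classifier, the same kind of map is drawn as a heat map over the original image.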
Moreover, it is possible to determine which combination of inputs maximally activates a fully trained deep neural network, to understand which overall input the algorithm considers most relevant to the problem.
What is needed to do good research with deep neural networks?
Up until even 3-4 years ago, the computational power needed to train deep neural networks was prohibitively expensive and required the use of institutional supercomputers. Now, services such as Google’s Colab provide this processing power for free!
Furthermore, the programming languages have improved and many open-source code libraries are available. Researchers even make their pre-trained models and classifiers available for others to use. People who do not have the data to train a model from scratch can co-opt a pre-trained model to help them with their own task, a process known as transfer learning.
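The idea behind transfer learning can be sketched as follows: the pre-trained part of the model is frozen and reused as a feature extractor, and only a small new output layer is trained on the local data. Everything below is an assumption for illustration; `pretrained_features` is a hypothetical stand-in for the early layers of a real pre-trained network, and the four-sample dataset is invented.

```python
import math


def pretrained_features(x):
    """Hypothetical frozen feature extractor; its internals are never updated here."""
    return [x[0] + x[1], x[0] - x[1], 1.0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(head_weights, x):
    feats = pretrained_features(x)
    return sigmoid(sum(w * f for w, f in zip(head_weights, feats)))


def train_head(data, lr=0.5, epochs=500):
    """Train only the new output layer (the 'head') on top of the frozen features."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            feats = pretrained_features(x)
            error = predict(weights, x) - y
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
    return weights


# A deliberately tiny local dataset: too small to train a full network from scratch.
small_dataset = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.8, 0.9], 1)]
head_weights = train_head(small_dataset)
print("new high-value case:", round(predict(head_weights, [0.9, 0.9]), 3))
print("new low-value case: ", round(predict(head_weights, [0.1, 0.1]), 3))
```

Because only three weights are learned, a handful of labelled examples is enough here. With real pre-trained image models, the same pattern lets a modest local dataset build on features learned from a much larger one.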
Really, the limiting factor in applying deep learning these days is not computing power or code; it is data! If your institution has a dataset of sufficient size, granularity and quality that is free of bias, then applying a deep neural network can be quite feasible and rewarding!
What can we expect to see from deep neural networks in healthcare in the next 2-3 years?
The aforementioned examples of successful implementations of DNNs in diagnostic tasks are all based on retrospective analysis. The important question yet to be answered is “Can the prospective use of neural network algorithms help guide the management of patients towards better outcomes?” Such study designs have been much rarer to date, but will soon become the focus of machine learning research in healthcare.
The first such prospective randomized controlled trial assessing DNN systems in healthcare was published in March 2019 by a group from China 8. Over 1000 patients undergoing colonoscopy were randomized to a control group of routine colonoscopy, or an experimental group in which the physician performing the colonoscopy had access to a second screen containing the output from a neural network polyp detection system (Wang et al., 2019) 8.
Patients in the group featuring assistance from the real-time neural network had 1.9 times more polyps discovered overall, including 1.7 times more adenomas. There were 0.08 ‘false-alarms’ per colonoscopy in the neural network group and none of the polyps detected by human physicians were missed. While this increased detection of polyps has not yet been linked to increased survival, this is the first study reporting on the utility of AI systems for a valid surrogate outcome. The authors are planning a blinded follow-up RCT where one group of endoscopists will have AI-assistance and the other will have sham-AI assistance.
1. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402-2410.
2. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115.
3. Djuric U, Zadeh G, Aldape K, Diamandis P. Precision histology: how deep learning is poised to revitalize histomorphology for personalized cancer care. NPJ Precision Oncology. 2017;1(1):22.
4. Singh H, Meyer AN, Thomas EJ. The frequency of diagnostic errors in outpatient care: estimations from three large observational studies involving US adult populations. BMJ Qual Saf. 2014;23(9):727-731.
5. Miller DD, Brown EW. Artificial intelligence in medical practice: the question to the answer? The American Journal of Medicine. 2018;131(2):129-133.
6. Ribeiro MT, Singh S, Guestrin C. Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386. 2016.
7. Rajpurkar P, Irvin J, Ball RL, et al. Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Medicine. 2018;15(11):e1002686.
8. Wang P, Berzin TM, Brown JRG, et al. Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study. Gut. 2019:gutjnl-2018-317500.