Artificial Intelligence in Medicine – AIM 2020 May

Artificial Intelligence in Medicine (AIM) came into the limelight in 2018, when the Royal College of Physicians, London invited the team from Babylon (a healthcare chatbot) to make a presentation at its 2018 conference. This was followed by the NHS in the UK using chatbots like Babylon to triage patients, in the hope of shortening the time patients wait to consult their GPs.

Triage is normally done by a triage nurse or a GP when a patient calls to get an appointment. Babylon published a paper about the chatbot, and Enrico Coiera has commented on why Babylon may not yet be ready to triage real patients.

What is AI?

Medicine is defined by Merriam Webster as the science and art dealing with the maintenance of health and the prevention, alleviation, or cure of disease. 

The Merriam Webster Dictionary [1] defines Artificial Intelligence as a branch of computer science dealing with the simulation of intelligent behavior in computers; the capability of a machine to imitate intelligent human behavior.

The philosophical question “Is medicine science or art?” has been simply skipped in the Merriam Webster’s definition by taking both science and art in the considered definition. Going back to these basics will help in having a clear idea of what is the current landmark of the research in AI in Medicine and what are the forthcoming challenges and hot research topics the journal will manage and stimulate. Indeed, even though not all the nuances of Artificial Intelligence in Medicine have been identified with these basic definitions, we can recognise some aspects that relate AI in medicine to medical (intelligent) decision-based tasks, such as diagnosis, therapy, prognosis, and monitoring, and to the capability of software (and hardware) tools to support/provide some form of reasoning similar to the human one in the medical domain. (From the editorial by the new Editor-in-Chief: Artificial Intelligence in Medicine and the forthcoming challenges.)



From the time computers came into use in the 1950s, scientists aimed to create programs that were intelligent (thinking and reasoning) like human beings. In 1956 John McCarthy, who coined the term ‘Artificial Intelligence’ (AI) (Ref), proposed at the Dartmouth Conference that “An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.”

This summer has extended into more than half a century. Physicians were also captivated by the potential AI could have in medicine. Given computers’ capacity to store vast amounts of data and their processing power, it was thought that computers would become ‘doctors in a box’, assisting and even surpassing clinicians in tasks like diagnosis (Ref). Against this background, computer scientists and healthcare professionals, mainly from the USA, worked together to form the new discipline of ‘Artificial Intelligence in Medicine’ (AIM).

What is AIM?

In reviewing the emerging field of AI in medicine, Clancey and Shortliffe in 1984 provided the following definition: ‘Medical artificial intelligence is primarily concerned with the construction of AI programs that perform diagnosis and make therapy recommendations. Unlike medical applications based on other programming methods, such as purely statistical and probabilistic methods, medical AI programs are based on symbolic models of disease entities and their relationship to patient factors and clinical manifestations.’ (Ref)

Turing Test

The Turing test, originally called the imitation game by Alan Turing in 1950,[2] is a test of a machine’s ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human. Turing proposed that a human evaluator would judge natural language conversations between a human and a machine designed to generate human-like responses. The evaluator would be aware that one of the two partners in conversation is a machine, and all participants would be separated from one another. The conversation would be limited to a text-only channel such as a computer keyboard and screen so the result would not depend on the machine’s ability to render words as speech.[3] If the evaluator cannot reliably tell the machine from the human, the machine is said to have passed the test. The test results do not depend on the machine’s ability to give correct answers to questions, only how closely its answers resemble those a human would give.

The field of AI had two schools of thought. Proponents of so-called ‘strong’ AI were interested in creating computer systems whose behaviour is at some level indistinguishable from that of humans (Interlude Box IC2.1 – Ref Turing Test). Success in strong AI would result in computer minds that could reside in autonomous physical beings such as robots or perhaps live in ‘virtual’ worlds such as the information space created by something like the Internet. (Ref – Coiera, Enrico. Guide to Health Informatics, 3rd Edition. CRC Press.)

Proponents of ‘weak’ AI looked at human cognition and asked how it could be supported in complex or difficult tasks. For example, a fighter pilot may need the help of intelligent systems to assist in flying an aircraft that is too complex for humans to operate on their own. These ‘weak’ AI systems are not intended to have an independent existence, but instead are a form of ‘cognitive prosthesis’ that supports a human in a variety of tasks.

[Box: The Turing Test. Include a YouTube clip that is less than 2 minutes long.]

The progress of strong AI

The progress of weak AI

What is Machine Learning? 

AI is a branch of computer science that tries to make computers more intelligent. A basic requirement for intelligent behaviour is learning. Most experts believe that without learning there can be no intelligence. Machine learning is a major branch of AI and one of its most rapidly developing subfields (Ref). (This is a key paper for understanding ML and its three branches: Bayesian classifiers, neural networks and decision trees.)

From the very beginning, three major branches of machine learning emerged. Classical work in symbolic learning is described by Hunt et al. [5], in statistical methods by Nilsson [6], and in neural networks by Rosenblatt [7].

Bayesian classifier example and explanation – link

Jordan: ML is an algorithmic field that blends ideas from statistics, computer science and many other disciplines (see below) to design algorithms that process data, make predictions and help make decisions (Michael I. Jordan, UC Berkeley). ML would soon power not only Amazon but essentially any company in which decisions could be tied to large-scale data. New business models would emerge. The phrase “Data Science” began to be used to refer to this phenomenon, reflecting the need of ML algorithms experts to partner with database and distributed-systems experts to build scalable, robust ML systems, and reflecting the larger social and environmental scope of the resulting systems. This confluence of ideas and technology trends has been rebranded as “AI” over the past few years. This rebranding is worthy of some scrutiny.
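To make the Bayesian classifier branch mentioned above a little more concrete, here is a minimal sketch of a naive Bayes classifier in Python. It is only an illustration: the findings, the labels and the six training examples are invented toy data, and the function names `train_naive_bayes` and `predict` are hypothetical, not taken from any of the papers cited here.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count how often each finding occurs with each label.

    examples: list of (set_of_findings, label) pairs.
    """
    label_counts = Counter(label for _, label in examples)
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for findings, label in examples:
        for finding in findings:
            feature_counts[label][finding] += 1
            vocabulary.add(finding)
    return label_counts, feature_counts, vocabulary


def predict(model, findings):
    """Return the label with the highest posterior log-probability."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for finding in vocabulary:
            # Bernoulli likelihood with add-one (Laplace) smoothing
            p = (feature_counts[label][finding] + 1) / (n + 2)
            score += math.log(p if finding in findings else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data, for illustration only (not clinical data).
toy_examples = [
    ({"fever", "cough"}, "flu"),
    ({"fever", "cough", "aches"}, "flu"),
    ({"fever", "aches"}, "flu"),
    ({"sneezing", "runny_nose"}, "cold"),
    ({"sneezing", "cough"}, "cold"),
    ({"runny_nose"}, "cold"),
]
model = train_naive_bayes(toy_examples)
print(predict(model, {"fever", "aches"}))
print(predict(model, {"sneezing", "runny_nose"}))
```

The "naive" part is the assumption that findings are independent given the label, which is rarely true in medicine but keeps the arithmetic simple enough to follow by hand.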


‘Chatbots can be defined as software agents that converse through a chat interface. Now, what that means is that they are software programs that are able to have a conversation, which provides some kind of value to the end user. The user can interact with the chatbot by typing in their end of the conversation, or simply by using their voice, depending on the type of chatbot provided. Virtual assistants like Apple Siri or Amazon Alexa are two examples of popular chatbots interacting via voice rather than text. Typically, the chatbot will greet the user and then invite them to ask some kind of question. When the user replies, the chatbot will parse the input and figure out what’s the intention of the user’s question. Finally, it will respond in some form of consequential or logical manner, either providing information or asking for further details before ultimately answering the question.’ (Ref)
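The greet, parse, work out the intention, respond loop described above can be sketched in a few lines of Python. This is a deliberately simple keyword-overlap sketch, not how Babylon or any real chatbot works; the intent names, keywords and canned responses are all hypothetical.

```python
import re

# Hypothetical intents and the keywords that suggest them.
INTENTS = {
    "book_appointment": {"appointment", "book", "gp"},
    "opening_hours": {"open", "hours", "close", "closing"},
}

# Hypothetical canned responses, one per intent.
RESPONSES = {
    "book_appointment": "Which day would you like to come in?",
    "opening_hours": "The surgery is open on weekdays.",
    "unknown": "Sorry, could you give me a few more details?",
}

GREETING = "Hello, how can I help you today?"


def parse_intent(text):
    """Pick the intent whose keywords overlap most with the user's words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best_intent, best_overlap = "unknown", 0
    for intent, keywords in INTENTS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap
    return best_intent


def reply(text):
    """Answer the user, or ask for more details if the intent is unclear."""
    return RESPONSES[parse_intent(text)]


print(GREETING)
print(reply("I'd like to book an appointment with my GP"))
print(reply("hello"))
```

Real systems replace the keyword overlap with a trained language model, but the overall shape of the conversation loop stays the same.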








Reviews re AIM – historical order
Computer Programs to support clinical decision making – 1987 – Shortliffe
Coming of Age in AI – 2008 – Patel, Shortliffe
Thirty years of AIM: review of research themes – 2015 – AIM – Peek
Artificial intelligence in medicine – 2017 – Hamet
Artificial Intelligence in Medical Practice: The Question to the Answer? – AJM – 2018 – Miller
Topol – The Medscape Editor Eric Topol’s articles about AIM. The image below has all the papers Topol thinks are methodologically good.

Artificial Intelligence – The Revolution Hasn’t Happened Yet

Michael I. Jordan is a Professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at UC Berkeley.

What AI has been used for, historically and at present

Jeremy Howard – TED Talk

The wonderful and terrifying implications of computers that can learn

I’m Jeremy Howard, Enlitic CEO, Kaggle Past President, Singularity U Faculty. Ask me anything about machine learning, future of medicine, technological unemployment, startups, VC, or programming – LINK

Garry Kasparov – TED Talk

Don’t fear intelligent machines. Work with them

Fei-Fei Li – TED Talk

How we’re teaching computers to understand pictures


2018 – 12 27

On algorithms, machines, and medicine [Ref]: The Lancet Oncology piece by Coiera on the thyroid cancer detection study

As we move into a world dominated by algorithms and machine-learned clinical approaches, we must deeply understand the difference between what a machine says and what we must do. Deep learning techniques in particular are transforming our ability to interpret imaging data.

The results of a retrospective preclinical [Ref] study applying deep learning and statistical methods to diagnose thyroid cancer using sonographic images are impressive. When compared with six radiologists on unseen data, in an internal validation dataset, the system correctly detected about the same number of cancers as the radiologists. How generalisable are these results? Training only on patients from one health service or region runs the risk of overfitting to the training data, resulting in brittle, degraded performance in other settings. In this study, although similar machine specificity was achieved on populations from different hospitals, sensitivity dropped to 84.3%. One might anticipate the system to have weaker performance in non-Chinese populations. One remedy is to retrain the system on patients from new target populations. The problem of biases in training data is, however, foundational (Ref), and clinicians must always consider if a machine recommendation is based on data from a population different to their patient. For example, in the study, cancer-free images from patients with thyroid cancer were excluded from training. In real-world settings, such images are included, and their presence might distort algorithm performance.
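Sensitivity and specificity, the two measures discussed above, are simple ratios over the counts of correct and incorrect calls. The sketch below shows the arithmetic in Python; the counts used are hypothetical round numbers chosen for illustration and are not taken from the thyroid cancer study.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of actual cancers that the system flags as cancer."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of cancer-free cases that the system calls cancer-free."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only.
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=80, false_pos=20)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

A drop in sensitivity on an external population means more cancers are missed there, even if specificity (the rate of correctly cleared patients) holds steady.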

The authors make commendable efforts to ensure results are as clinically meaningful as possible. Image augmentation was used to artificially distort training data: randomly cropping, scaling, and otherwise distorting images to mimic variations in real-world image quality. Deep learning systems are often criticised because their recommendations come without an explanation, the logic underpinning a diagnosis hidden. In this study, the pixels in an image that most contributed to a diagnosis were highlighted. A clinician could use the highlighted salient parts of an image to help check the computer interpretation.
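As a rough idea of what image augmentation involves, the sketch below applies a random crop and a simple nearest-neighbour upscaling to a tiny image stored as a list of pixel rows. It is an assumption-laden toy, not the study's pipeline: the function names are hypothetical, and real systems work on full-size images with dedicated imaging libraries.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window from a random position in the image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def scale_nearest(image, factor):
    """Upscale by an integer factor by repeating each pixel (nearest neighbour)."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


# A 4 x 4 toy "image" whose pixel values are just 0..15.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
rng = random.Random(0)  # seeded so the sketch is repeatable

cropped = random_crop(toy_image, 3, 3, rng)
augmented = scale_nearest(cropped, 2)
print(len(augmented), "x", len(augmented[0]))
```

Each training pass can draw a different crop, so the network sees many slightly different versions of the same underlying image.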

Coiera states that ‘Decision support must be embedded in a clinical workflow and is but one part of a web of actions and decisions that lead to patients’ care. In the case of thyroid cancer, ultrasound is one step in a sequence that can lead to biopsy and treatment. In view of concerns that thyroid cancer is both overdiagnosed and overtreated, improved ultrasound detection might deliver little benefit in terms of patients’ outcomes. For example, South Korea has seen a 15-fold increase in thyroid cancer, attributable largely to overdiagnosis, and any diagnostic method that detects more indolent than consequential disease would most likely exacerbate this situation. Certainly, precise automated identification of true negative sonograms might improve a clinician’s confidence to do nothing. For this reason, rather than only comparing human to machine, it is more clinically meaningful to measure the performance of human beings assisted by machine. Such measurements must ultimately take place in clinical trials, recording false-negative identifications and undertreatment as well as overtreatment. Indeed, there is a case that the most pressing decision-support need in thyroid cancer is not in diagnosis but in making the decision to treat.

Thus, excellence in algorithmic performance is essential in our quest for automation, but ultimately we are interested in what a human being decides when using automation in the messy reality of health care. Until our machines are fully embedded in that reality, and see it better than us, our role as clinicians is to be the bridge between machine and decision. At least for now, algorithms do not treat patients, health systems do.’

Published Online December 21, 2018 S1470-2045(18)30835-0

See Online/Articles S1470-2045(18)30762-9

Diagnostic aids

Evaluation of symptom checkers, BMJ –

Isabel –

Babylon –