Over many years of sitting on advisory boards and in research planning and funding review meetings, I have been concerned by how many policymakers, researchers and organisational leaders know so little about evaluation science and conflate it with health economic assessment and implementation science. More concerning still, some of them think a health economic assessment supersedes an evaluation and removes the need to commission one. I speak here of a healthcare services research context, but this issue may extend beyond this discipline. I therefore make a case here for stakeholders, funding decision-makers and researchers alike, to distinguish between the disciplines of implementation science and evaluation science, and between the processes of evaluation and health economic assessment.
The healthcare sector is vast and complex, comprising multiple dimensions of study and implementation. It is driven by a constant need to maximize patient outcomes while ensuring the efficient use of resources. Implementation science and the evaluation process are both critical components within the realm of health and social sciences. They both aim to contribute to improved practices, interventions, and policies within various disciplines, particularly healthcare. While they may appear similar, they encompass different objectives, methodologies, and stages in implementing and assessing evidence-based interventions.
Implementation science, also known as knowledge translation, is a multidisciplinary field that studies how to systematically integrate research findings and evidence-based practices into routine, everyday use. The goal is to improve the quality and effectiveness of health services, social programs, and policies. The fundamental aim of implementation science is to bridge the gap between research and practice, often referred to as the "know-do" gap.
Implementation science uses various theories, models, and frameworks to identify, understand, and address barriers to the adoption, adaptation, integration, scale-up, and sustainability of evidence-based interventions. It takes into account numerous factors, including the complexities of health systems, the variability of human behaviours, and the diversity of social and political contexts.
On the other hand, the evaluation process is a systematic method used to assess the design, implementation, and utility of programs, interventions, or policies. The primary purpose of evaluation is to judge the value or worth of something in order to guide decision-making and improve effectiveness.
There are different types of evaluation, such as formative, summative, process, and outcome. Each type focuses on a different stage of a program's life cycle. Evaluations may consider the fidelity of implementation, the outcomes of the intervention, and the cost-effectiveness, among other aspects. While both implementation science and evaluation processes use mixed methods, their methodological emphases differ. Implementation science prioritizes process-oriented investigations, employing qualitative research to understand human behaviours and system complexities. In contrast, the evaluation process often emphasizes outcome measures, using quantitative methods to assess the degree to which program goals and objectives have been met.
Health economic assessment, often referred to as health economic evaluation, is a tool used to compare the cost-effectiveness of different health interventions. This method evaluates different health programs' benefits relative to their costs, aiming to maximize health outcomes given a particular budget constraint. The health economic assessment primarily adopts a macroeconomic perspective, focusing on the healthcare system's overall cost-effectiveness. This assessment typically employs methods such as cost-effectiveness analysis (CEA), cost-utility analysis (CUA), or cost-benefit analysis (CBA).
The value of health economic assessment lies in its ability to provide a comparative analysis of the efficiency of different healthcare interventions. Quantifying costs and outcomes (often in quality-adjusted life years or QALYs) provides a comprehensive view of the economic value of different healthcare choices. This is particularly valuable when resources are scarce and there is a need to allocate them in a way that yields the greatest health benefit for the largest number of people.
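To make the arithmetic behind such a comparison concrete, here is a minimal sketch of an incremental cost-effectiveness ratio (ICER), the quantity a cost-utility analysis commonly reports: the extra cost of one intervention over a comparator, divided by the extra QALYs it yields. The `Intervention` class, the `icer` function, the cost and QALY figures, and the willingness-to-pay threshold are all hypothetical, chosen only for illustration, and the sketch assumes the new intervention is both more costly and more effective than the comparator.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    name: str
    cost: float   # mean cost per patient (hypothetical currency units)
    qalys: float  # mean QALYs per patient


def icer(new: Intervention, comparator: Intervention) -> float:
    """Incremental cost per additional QALY of `new` versus `comparator`."""
    delta_cost = new.cost - comparator.cost
    delta_qalys = new.qalys - comparator.qalys
    if delta_qalys == 0:
        raise ValueError("No difference in QALYs; the ICER is undefined.")
    return delta_cost / delta_qalys


# Hypothetical figures, for illustration only.
usual_care = Intervention("usual care", cost=10_000.0, qalys=5.0)
new_program = Intervention("new program", cost=14_000.0, qalys=5.5)

ratio = icer(new_program, usual_care)
threshold = 30_000.0  # assumed willingness-to-pay per QALY gained

print(f"ICER: {ratio:,.0f} per QALY gained")
# The threshold comparison is only meaningful when the new option
# costs more and delivers more QALYs than the comparator.
print("Within threshold" if ratio <= threshold else "Above threshold")
```

Note that a ratio like this says nothing about whether the program was delivered as intended or how patients experienced it, which is exactly the territory an evaluation covers.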
On the other hand, health service evaluation focuses on assessing the quality of care delivered in a specific healthcare setting or by a particular healthcare service. It's a process that takes a micro view, examining individual services, care pathways, or providers. It aims to identify areas of improvement and highlight best practices, focusing on effectiveness, efficiency, and equity in care delivery. The measures in health service evaluation might include patient satisfaction, accessibility of care, timeliness, and adherence to clinical guidelines, among others. Health service evaluation aids in pinpointing gaps or deficiencies in care delivery that might not be evident from a macroeconomic view. This detailed scrutiny of specific services can lead to improvements in patient care, satisfaction, and overall health outcomes. It can also highlight systemic issues that might need addressing at a policy level.
The principal differences between health economic assessment and health service evaluation stem from their varying focuses. The former adopts a broader perspective, taking into account the entire health system's economic balance. It often involves policy-level decisions concerning resource allocation, seeking to achieve the maximum health benefits per unit of cost across different healthcare interventions. Conversely, health service evaluation narrows its lens to the individual service, provider, or care pathway level. Its main goal is to improve the quality of care and patient satisfaction within specific healthcare services, not necessarily accounting for the broader economic implications of these improvements. It may, however, indirectly influence economic evaluations by identifying more effective or efficient practices that can then be implemented more broadly, thus improving cost-effectiveness.
Thus, while each field (implementation science, health economic assessment, and the evaluation process) has a critical role in improving the quality and outcomes of health service delivery, they are very distinct from each other. They can be conflated only at the expense of sound research and assessment. It is important for evaluators to be loud and clear in making the difference known, while standing up for appropriate evaluation processes to be considered in health service projects and programs.
At last week's Google I/O developer conference, it was announced that PaLM 2 (Google's latest-generation large language model) will have multimodal capabilities. This means PaLM 2 can interpret images and videos in addition to interpreting and generating text. Previously, OpenAI announced that GPT-4 would have these capabilities too. In other words, multi-modal capabilities will be a standard offering in the new generation of large language models. How is this significant to the healthcare domain in which I operate?
Medical practice, by default, is multi-modal. A clinician must interpret patient and laboratory records, take an oral history, undertake a visual examination, and interpret waveform and radiological investigations. Collectively, these inform the clinician's diagnosis and management of the patient. The previous generation of AI models could contribute only to a narrow set of medical tasks, say electronic record analysis or medical image interpretation, and not in combination. This was mainly due to how the machine learning models were trained (supervised learning on annotated, labelled data) and the intrinsic limitation of the algorithms (even advanced ones) in performing accurately on multi-modal datasets. While regulatory authorities and vendors had a relatively easy task in having an application certified for its task boundaries and safety, these applications met only a narrow set of customer (health services, medical doctors, etc.) requirements. Considering the need to integrate these applications into existing information systems, the economies of scale and ROI were minimal, if not non-existent.
The availability of multi-modal (and potentially multi-outcome) functionality may considerably change the landscape of AI in healthcare. The ability of a single AI application not only to analyse a radiological investigation but also to link it back to the patient's history, derived from analysis of the electronic health record and pathology investigations, will be revolutionary. This will remove the need for stakeholders and purchasers to source multiple AI applications and make it easier for the health service to set up a governance mechanism to monitor the deployment and delivery of AI-enabled services. At a clinical level, physicians using multi-modal AI will have a more comprehensive view when making a diagnosis.
Such applications are not far from entering the commercial space. In the research domain, I was fascinated last year by this study from South Korea, where the authors demonstrated a multi-modal algorithm that adopts a BERT-based architecture to maximize generalization performance for both vision-language understanding tasks (diagnosis classification, medical image-report retrieval, medical visual question answering) and a vision-language generation task (radiology report generation). As you may know, BERT is a masked language model based on the Transformer architecture. Since this study, I have seen a wave of studies showcasing the efficacy of multimodal AI, such as this and this.
Back to Google's announcement last week: as part of the customised offering of PaLM 2 in various domains, Med-PaLM 2, developed to generate medical analysis, was demonstrated too. As per this blog, Med-PaLM 2 will interpret and generate text (answer questions and summarise insights) and have multi-modal functionality to analyse and interpret medical image modalities. Considering that GPT-4 can analyse images and that OpenAI offers external developers access to its API, it is not hard to foresee multi-modal medical AI applications in the market. Of course, as I see it, multi-modal AI is not going to be restricted to LLM architectures, and there are different ways to develop such applications. Also, it is not enough to have multi-modal functionality; you also need to have multi-outcome features.
I write this article not only to signal to healthcare stakeholders (policymakers, funders, health services, etc.) the future of medical AI software but also to forewarn developers of narrow-use-case medical AI to pivot their development strategy to multi-modal AI functionality or be swept away as the floodgates of multi-modal AI are opened.
Health System Academic