Transcript for: Improving the efficiency of AF screening


This transcript accompanies this talk.

Some of the transcript content is reproduced from the accompanying paper under CC BY 4.0.

Title Slide

Topic for today:

  • Potential strategies to improve the efficiency of AF screening, using data from the SAFER Programme.

Clinical Problem: undiagnosed AF

[Click] If atrial fibrillation were adequately treated in England, then it’s estimated that this could save [click] 2,000 lives per year, [click] prevent 7,000 strokes, and result in an additional 425,000 diagnoses.

Potential Solution: AF screening

It has been proposed that undiagnosed AF could be identified through screening. However, “it is unclear whether there is a benefit of formal screening programmes for AF over and above diagnosis of AF only through routine clinical practice” [UK National Screening Committee, 2019]. The ongoing SAFER Trial [click] aims to address this gap in evidence.

Personal Introduction

The work I’m presenting today consists of secondary analysis of data collected in the SAFER Programme. The Trial is being conducted at the University of Cambridge, recruiting patients from East Anglia (in green), and across England (in blue). Many of the analyses involve [click] ECG signal processing.


In this talk I’ll look at three key questions and potential answers:

  1. Is screening for AF effective?
  2. How can today’s screening be optimised?
  3. What might tomorrow’s screening look like?

Firstly, an introduction to the SAFER Programme.

The SAFER Programme

The SAFER Programme of research aims to determine [click] whether screening for AF is effective and cost-effective in reducing stroke and other key outcomes compared to current practice. It consists of [click] four parts [SAFER Website]:

  1. Feasibility study 1: A study in 10 GP Practices, including a face-to-face appointment, which 2,141 participants took part in. This has finished, confirming that it is feasible to deliver an AF screening programme in general practice, and that high numbers of patients accept an offer of AF screening and go on to complete the screening process.
  2. Feasibility study 2: When the COVID-19 pandemic struck, it became clear that screening could not be delivered in the practice. An additional feasibility study was conducted with participants from 3 GP Practices, in which the screening was delivered remotely.
  3. Internal pilot study: The first phase of the trial is now underway. An internal pilot study is being conducted in 36 practices, consisting of 12 intervention practices where patients will be offered screening, and 24 control practices.
  4. Cluster randomised controlled trial: Following the pilot, we expect to move straight into the remainder of the trial, including a further 324 practices. The whole trial, including the internal pilot, will enrol approximately 126,000 patients from 360 practices, a third of whom will be offered screening. They will be followed up for an average of 5 years.

The Screening Process

The three main steps of the screening process which I’ll refer to are as follows [click]:

  1. ECGs recorded at home: Firstly, participants are sent a handheld device which takes 30-second ECG recordings between two thumbs. Participants are asked to record an ECG 4 times per day, for 3 weeks [SAFER Website], producing approximately 84 ECGs per participant.
  2. Automated analysis: Secondly, the ECG recordings are automatically analysed to identify any that may contain AF.
  3. Clinical review: Thirdly, those ECGs which may contain AF are sent for clinical review to make a final diagnosis. Any participants diagnosed with AF then discuss possible anticoagulation with their GP.

Is screening for AF effective?

So, the first question.

Is screening for AF effective and cost-effective in reducing stroke and other key outcomes compared to current practice?

This question hasn’t yet been answered in the ongoing SAFER Trial. However, results were recently published for the STROKESTOP trial, a trial using the same handheld devices in Sweden [ref]. There was a “small net benefit” to screening, as shown by the significantly lower number of events in those invited to screening, compared to the control group. Although there were some differences in trial design, the trial does provide some useful learning points for SAFER, including:

  • For screening to be effective, a high proportion of those invited to screening have to participate, since the endpoint was calculated using an intention-to-treat analysis, as is the case in SAFER. In STROKESTOP, of those invited to screening, 51.3% participated. Indeed, those who “declined the invitation for screening … had higher … stroke risk” [Lowres et al.]. Qualitative researchers in the SAFER team are collecting feedback from non-participants to understand why people decline the invitation to screening.
  • [click] The results also show promise for the choice of combined ischaemic and haemorrhagic stroke as the primary endpoint in SAFER. Both of these endpoints showed a signal towards a difference, although the difference was not statistically significant.

How can today’s screening be optimised? (outline)

So, the second question: How can today’s screening be optimised? My colleagues and I have been working on optimising the acquisition of ECGs, and their automated review. I’ll present preliminary analyses on these two aspects.

ECGs recorded at home

Firstly, [click] we are establishing criteria to prompt a telephone call from the research team to participants to provide additional training when required on taking ECGs.

[Click] Many of the ECGs collected in SAFER are of high quality, such as this ECG here. However, [click] some ECGs are of low quality. These are often difficult, if not impossible, to interpret. Consequently, if a participant records too many low quality ECGs then it may not be possible to accurately identify AF.

[Click] In a retrospective analysis of data from the first SAFER Feasibility Study, we investigated the performance of different criteria for identifying participants who would record a high proportion of low quality ECGs. The analysis was performed on 1,486 participants, who each recorded at least 56 ECGs. The criteria were applied to ECGs received during the first 10 days (i.e. approximately the first half) of screening. We aimed for a high sensitivity to participants with fewer than 75% high quality ECGs, and a low alert rate to minimise the workload associated with the additional calls.

[Click] Here are the results for three candidate criteria. For each criterion, a threshold was learnt from the data. The third criterion, which consisted of triggering a call if more than 25% of a participant’s ECGs were of low quality, resulted in a high sensitivity at the desired alert rate.
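As a rough illustration of this kind of criterion, and not the code used in SAFER, the sketch below flags a participant for a training call when more than 25% of their early ECGs are of low quality, and then computes the sensitivity and alert rate of that rule on a cohort. The `Participant` record, the function names, and the handling of participants with no early ECGs are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical record for one participant (not a SAFER data structure)."""
    early_low_quality: list               # one bool per ECG from the first 10 days
    overall_high_quality_fraction: float  # fraction of high quality ECGs over all of screening


def needs_training_call(early_low_quality, threshold=0.25):
    """Return True if more than `threshold` of the early ECGs are low quality."""
    if not early_low_quality:
        # Assumption: with no ECGs received, this rule does not trigger a call.
        return False
    return sum(early_low_quality) / len(early_low_quality) > threshold


def evaluate(participants, threshold=0.25, target=0.75):
    """Return (sensitivity, alert_rate) of the criterion on a cohort.

    A participant counts as a 'positive' if fewer than `target` of their
    ECGs over the whole screening period were of high quality.
    """
    alerts = [needs_training_call(p.early_low_quality, threshold) for p in participants]
    positives = [p.overall_high_quality_fraction < target for p in participants]
    true_positives = sum(a and pos for a, pos in zip(alerts, positives))
    sensitivity = true_positives / sum(positives) if any(positives) else float("nan")
    alert_rate = sum(alerts) / len(participants)
    return sensitivity, alert_rate
```

Sweeping `threshold` over a range of values and inspecting the resulting pairs is one simple way a threshold could be chosen to give a high sensitivity at an acceptable alert rate.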

Effectiveness of criterion to prompt a training call

[Click] The effectiveness of this criterion was then assessed when it was used in the second SAFER Feasibility Study. This study was smaller, with n participants. n of these participants received a telephone call - reassuringly, this indicates an alert rate of approximately 4% as desired. It’s testament to the SAFER Team that these calls resulted in a reduction in the percentage of low quality recordings, as shown in this box plot. (describe reduction shown in plot)

So, we have found that training calls can be targeted to those participants who would benefit most from them, and that they are effective in improving the quality of ECG recordings.

Automated analysis

A second analysis which informed the Trial’s methodology was an assessment of the performance of algorithm tags for identifying AF, and the workload associated with reviewing ECGs classified with each tag.

I’ll briefly summarise how the algorithm works. [Click] Each ECG is analysed by [click] detecting heartbeats, [click] analysing the heart rhythm, and assessing [click] whether P-waves are present or not. A key question [click] is: which tags should be used to identify ECGs for review?

[click] In a retrospective analysis, I assessed the sensitivity of each tag to AF, and its positive predictive value. This was performed using manual annotations of 911 ECGs containing AF provided by cardiologists. Six tags showed reasonable performance (shown in bold). These were considered as candidates to identify ECGs for clinical review.
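For reference, the two measures can be computed per tag as in the minimal sketch below. It assumes each ECG is represented as a pair of (set of algorithm tags, whether the reference annotation is AF); that representation and the function name are illustrative, not taken from the study.

```python
def tag_performance(ecgs, tag):
    """Return (sensitivity, ppv) of one algorithm tag for identifying AF.

    ecgs: list of (tags, has_af) pairs, where `tags` is a set of tag names
    and `has_af` is the reference annotation for that ECG.
    """
    tagged_with_af = sum(1 for tags, has_af in ecgs if tag in tags and has_af)
    total_with_af = sum(1 for _, has_af in ecgs if has_af)
    total_tagged = sum(1 for tags, _ in ecgs if tag in tags)

    # Sensitivity: proportion of AF ECGs that carry the tag.
    sensitivity = tagged_with_af / total_with_af if total_with_af else float("nan")
    # Positive predictive value: proportion of tagged ECGs that contain AF.
    ppv = tagged_with_af / total_tagged if total_tagged else float("nan")
    return sensitivity, ppv
```

A tag with high sensitivity but low positive predictive value would catch most AF at the cost of many unnecessary reviews, which is why both measures are reported together.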

[click] In a second analysis, I investigated the workload and effectiveness of the combination of tags used in this Feasibility Study, and all possible combinations of the candidate tags. The combination of tags used in the study, shown in the first row, was particularly comprehensive, and resulted in 23,000 ECGs being sent for review. Through this process 54 participants were diagnosed with AF. The remaining rows show selected combinations using subsets of these tags. As the number of tags is reduced, the number of ECGs sent for review is reduced. Only in the last combination, the irregular sequence tag on its own, was there an impact on the diagnosis of AF, with one participant being missed.
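The comparison of tag combinations can be sketched as follows. This is an illustration under simplifying assumptions, not the analysis code: each ECG is a hypothetical (participant id, set of tags, has AF) triple, an ECG is sent for review if it carries any tag in the combination, and a participant is counted as diagnosed if at least one of their AF ECGs is sent for review.

```python
from itertools import combinations


def evaluate_combinations(ecgs, candidate_tags):
    """Return (combination, n_ecgs_for_review, n_participants_diagnosed) for
    every non-empty combination of candidate tags, sorted by review workload.

    ecgs: list of (participant_id, tags, has_af) triples.
    """
    results = []
    for size in range(1, len(candidate_tags) + 1):
        for combo in combinations(candidate_tags, size):
            combo_set = set(combo)
            # An ECG is sent for review if it carries any tag in the combination.
            selected = [e for e in ecgs if e[1] & combo_set]
            # Assumption: reviewing any AF ECG leads to a diagnosis.
            diagnosed = {pid for pid, _, has_af in selected if has_af}
            results.append((combo, len(selected), len(diagnosed)))
    return sorted(results, key=lambda r: r[1])
```

With six candidate tags this enumerates 63 combinations, so an exhaustive comparison of workload against diagnoses is cheap to run.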

Based on this, I recommended that the irregular sequence and fast regular tags be used in SAFER, which in this analysis would have substantially reduced the workload associated with ECG reviewing, whilst maintaining the number of AF diagnoses.

Ordering ECGs for review

When a participant’s ECGs are presented for clinical review, they are currently ordered chronologically. We’re interested in whether they could be ordered according to the probability of an AF diagnosis being made. This could improve the efficiency of the review process, as once an ECG exhibiting AF is found, no further ECGs are reviewed for that participant.

We performed a preliminary investigation by developing a logistic regression model to estimate the probability that an ECG will be diagnosed with AF. It took as inputs: the heart rate, standard deviation of RR-intervals, and classifications provided by the Cardiolund algorithm being used in the trial.
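To make the form of such a model concrete, here is a minimal sketch. The coefficients are placeholders chosen for illustration and were not fitted to SAFER data, and the algorithm's classifications are reduced to a single hypothetical binary flag; the actual model's inputs and coefficients are not given in this talk.

```python
import math
import statistics


def rr_features(beat_times_s):
    """Return (heart rate in bpm, SD of RR-intervals in seconds) from beat times."""
    rr = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    heart_rate = 60.0 / statistics.mean(rr)
    sd_rr = statistics.stdev(rr) if len(rr) > 1 else 0.0
    return heart_rate, sd_rr


# Placeholder coefficients for illustration only; NOT fitted to any real data.
COEFFS = {"intercept": -4.0, "heart_rate": 0.01, "sd_rr": 15.0, "irregular_tag": 2.0}


def af_probability(heart_rate, sd_rr, irregular_tag, coeffs=COEFFS):
    """Logistic regression: probability = 1 / (1 + exp(-z)), z a weighted sum of inputs."""
    z = (
        coeffs["intercept"]
        + coeffs["heart_rate"] * heart_rate
        + coeffs["sd_rr"] * sd_rr
        + coeffs["irregular_tag"] * float(irregular_tag)
    )
    return 1.0 / (1.0 + math.exp(-z))
```

In practice the coefficients would be estimated from ECGs with known review outcomes, for example with a standard logistic regression fitting routine.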

[Click] We found that the number of cardiologist reviews that would have been required in the first two feasibility studies could theoretically have been reduced by approximately a quarter by using this model to order ECGs, whilst still identifying all cases of AF.
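The effect of ordering on workload can be illustrated with a small sketch. It assumes each ECG is a dictionary with hypothetical keys `time`, `p_af` (the estimated probability) and `has_af` (the reviewer's finding), and that review of a participant stops at the first ECG found to exhibit AF.

```python
def reviews_needed(ecgs, order_key):
    """Count the reviews for one participant when ECGs are reviewed in the
    order given by `order_key`, stopping at the first ECG exhibiting AF."""
    count = 0
    for ecg in sorted(ecgs, key=order_key):
        count += 1
        if ecg["has_af"]:
            break
    return count


def chronological(ecg):
    """Current approach: earliest recording first."""
    return ecg["time"]


def by_probability(ecg):
    """Proposed approach: highest estimated probability of AF first."""
    return -ecg["p_af"]
```

For a participant without AF every ECG sent for review is still reviewed under either ordering, so any saving comes only from participants in whom AF is found.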

[Click] We also found that sending for review only those ECGs with at least a threshold probability of AF reduced the number of reviews further.
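A probability threshold could be applied as in the sketch below, which is illustrative only. It uses hypothetical dictionary keys `p_af` and `has_af`, reviews the retained ECGs in order of decreasing probability, and reports whether AF would still have been found, since a threshold set too high could in principle exclude every AF ECG for a participant.

```python
def reviews_with_threshold(ecgs, threshold):
    """Return (n_reviews, af_found) for one participant when only ECGs with
    p_af >= threshold are sent for review, highest probability first."""
    retained = [e for e in ecgs if e["p_af"] >= threshold]
    n_reviews = 0
    for ecg in sorted(retained, key=lambda e: -e["p_af"]):
        n_reviews += 1
        if ecg["has_af"]:
            return n_reviews, True  # review stops once AF is found
    return n_reviews, False
```

Evaluating this over a range of thresholds would show the trade-off between the number of reviews saved and the risk of missing a participant with AF.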

Inter-reviewer agreement

How can today’s screening be optimised?

Today I’ve presented preliminary results on questions relating to optimising the current approach to screening, specifically on: training calls, automated analysis of ECGs, and the review process. In addition, I am particularly interested in:

  • Can we develop a library of ECGs with which to train new reviewers?
  • How can we best evaluate the probability that an ECG: (i) exhibits AF; and (ii) is of diagnostic quality?

What might tomorrow’s screening look like? (outline)

So, the third question: What might tomorrow’s screening look like? As an engineer, it is my hope that the data collected in both the SAFER Programme and follow-on studies will be useful for informing tomorrow’s approach to screening.

Could consumer devices be used for clinical decision making?

A key question going forward will be: Could consumer devices be used for clinical decision making?

Devices like those shown here are commercially available, including the AliveCor device (left), which allows users to record their ECG with a small add-on to their smartphone. Watches by Withings [click], such as that shown here, also allow users to record a 30-second ECG. Fitness trackers and smartwatches which measure the arterial pulse wave are now commonplace [click], and some can identify an irregular pulse which may be indicative of AF. We are starting the SAFER Wearables study, a study to assess the acceptability and performance of wearables such as those shown here for identifying AF in older adults.


None of this would have been possible without:


To conclude:

  • Evidence is emerging showing the clinical benefit of AF screening.
  • Trials are ongoing, which will also assess the cost-effectiveness.
  • If wearables are used to identify AF, they should be like a harness: highly reliable (effective), and good value (cost-effective).