Large-scale remote fear conditioning: Demonstration of associations with anxiety using the FLARe smartphone app

Objectives: We aimed to examine differences in fear conditioning between anxious and nonanxious participants in a single large sample. Materials and methods: We employed a remote fear conditioning task (FLARe) to collect data from participants from the Twins Early Development Study (n = 1,146; 41% anxious vs. 59% nonanxious). Differences between groups were estimated for their expectancy of an aversive outcome towards a reinforced conditional stimulus (CS+) and an unreinforced conditional stimulus (CS−) during acquisition and extinction phases. Results: During acquisition, the anxious group (vs. nonanxious group) showed greater expectancy towards the CS−. During extinction, the anxious group (vs. nonanxious group) showed greater expectancy to both CSs. These comparisons yielded effect size estimates (d = 0.26–0.34) similar to those identified in previous meta-analyses. Conclusion: The current study demonstrates that remote fear conditioning can be used to detect differences between groups of anxious and nonanxious individuals, which appear to be consistent with previous meta-analyses including in-person studies.


| INTRODUCTION
Fear conditioning models aversive associative learning, a key process involved in the development, maintenance, and treatment of anxiety disorders (Craske et al., 2018; Mineka & Oehlberg, 2008; Pittig et al., 2018). Differential fear conditioning tasks use neutral "conditional stimuli" (CS; e.g., images of shapes) and aversive "unconditional stimuli" (US; e.g., loud scream) to experimentally manipulate fear-based learning. During acquisition, one CS is reinforced (CS+) by repeatedly presenting it with the aversive US, while another nonreinforced CS is presented alone (CS−). After multiple presentations, the CS+ typically elicits a conditional response (e.g., sweating) reflecting anticipation of US onset, whereas the CS− does not. Conditional responses can include self-report, behavioral, and physiological/neurobiological changes (Lonsdorf et al., 2017). During extinction, the CS+ and CS− are repeatedly presented without the US. Extinction usually results in a decrease in conditional responses driven by the development of a competing association between the CS+ and safety (Bouton, 1993). Acquisition and extinction model the development and exposure-based treatment of anxiety, respectively.
Multiple studies have examined differences in fear conditioning between anxious and nonanxious participants, with varying patterns of response emerging across different anxiety disorders and outcome measures. During acquisition, anxious participants tend to show either greater responding to the CS+ and CS− (Blechert et al., 2007; Norrholm et al., 2011; Orr et al., 2000), or to the CS− only (Lissek et al., 2009, 2010; Rabinak et al., 2017). This inconsistency is reflected in two meta-analyses, as one found stronger responses to both the CS+ and CS− (Lissek et al., 2005) while the other, more recent analysis found stronger responses to the CS− only (Duits et al., 2015). These findings suggest poor inhibitory responding to safety (CS−) among anxious participants and potentially increased excitatory responding to threat (CS+). One explanation for poor inhibitory responding is that, during threatening or uncertain situations, anxious participants generalize their fear of the CS+ towards nonthreatening stimuli (i.e., the CS−; Duits et al., 2015; for a review of alternative explanations, see Lissek et al., 2005). Findings are somewhat more consistent during extinction, with studies showing stronger responses to the CS+ for anxious participants compared with controls, suggesting that it is difficult for them to develop new inhibitory learning to a previously threatening cue (Blechert et al., 2007; Duits et al., 2015; Lissek et al., 2005; Michael et al., 2007; Norrholm et al., 2011).
Inconsistencies across studies are likely due to heterogeneity in the adopted methodology, sample, and analytical approach (Lonsdorf et al., 2017, 2019; Ney et al., 2018). In addition, effect size estimates from the most recent meta-analysis suggest differences between groups are modest (d = 0.3–0.35; Duits et al., 2015). Therefore, the typically small sample sizes used in fear conditioning research likely mean that many studies are underpowered to detect these effects.
For example, no individual study included in Duits et al.'s (2015) meta-analysis had sufficient power to detect the effect sizes produced by combining samples. The largest study in the meta-analysis had 270 participants (Norrholm et al., 2013). Power calculations (using G*Power; Faul et al., 2007) show 352 participants would be required to detect similar effect sizes (d > 0.3, α = .05, 1 − β = .8) to those observed for the main outcomes in Duits et al. (2015).
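As a rough illustration of where a figure of this size comes from, the sketch below approximates the required sample size for d = 0.3, α = .05, and power = .8. It assumes a two-sided comparison of two independent, equally sized groups and uses a normal approximation; the exact G*Power settings are not stated here, and the function name `n_per_group` is ours. The approximation gives a total of about 350, slightly below the 352 reported from G*Power, which is expected because it ignores the t-distribution correction.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample mean comparison.

    Normal approximation only (assumption): exact t-based software such as
    G*Power returns slightly larger values.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


n = n_per_group(0.3)
print(f"per group: {n}, total: {2 * n}")
```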
To overcome issues around study heterogeneity and power, we developed a smartphone app, Fear Learning and Anxiety Response (FLARe), which delivers a fear conditioning task remotely via smartphone (Purves et al., 2019). Remote delivery removes many barriers to conducting large-scale experiments by vastly increasing the number of simultaneous assessments, and reducing the time and cost needed. FLARe assesses self-reported expectancy of the US (a loud scream) during CS presentations (geometric shapes) throughout acquisition and extinction. We have previously validated FLARe against standard in-person (laboratory) data collection, demonstrating that within-person correlations of fear conditioning outcomes between laboratory and app delivery did not differ from those seen across time using the same delivery mode (Purves et al., 2019).
FLARe presents a novel opportunity to examine differences between anxious and nonanxious individuals within a large sample without the confounds of task, sample, and analysis variability.
The current study is the first attempt to assess differences in fear conditioning between anxious versus nonanxious participants via data collected remotely from a single, large sample using a mobile app, FLARe. The primary analyses investigated differences in fear conditioning between anxious participants self-reporting current or lifetime, clinically relevant anxiety, and nonanxious participants reporting no such experience. In addition, secondary sensitivity analyses examined current and prior anxiety separately. To compare with previous research, we present mean discrimination (CS+ minus CS−) and CS-specific (CS+ or CS−) between-group differences in expectancy scores for each phase (acquisition and extinction). Based on subjective (i.e., self-reported) results from the most recent meta-analysis (Duits et al., 2015), we hypothesized: (i) during acquisition, anxious individuals (compared with nonanxious individuals) would show greater responses to the CS− and poorer discrimination scores, and (ii) during extinction, anxious individuals would display greater responses to both the CS+ and CS−.

The study initially recruited approximately 16,000 families and approximately 8,000 continue to participate (Rimfeld et al., 2019). The cohort continues to be roughly representative of the population in

Figure 1 illustrates sample size from recruitment to analyses. See supplementary information for anxiety and socioeconomic status comparisons between consenters and nonconsenters and Table S1 for information on twin relatedness. Participants were screened out if they had a pre-existing heart condition, a neurological condition, an uncorrected hearing impairment, or were pregnant. Of the participants meeting screening criteria, 382 participants did not complete the task and a further 180 rated the US unpleasantness five out of 10 or lower (1 = "Not unpleasant at all", 10 = "Very unpleasant"). Participants were excluded from analyses (n = 738) for self-reporting that they did not follow the task instructions (e.g., removing their headphones), or if the app detected that (a) it had been closed or (b) the average device volume was lower than 50%. The final number of participants who met criteria to be included in either group (anxious vs. nonanxious) was 1,146.

| Procedure
After downloading and logging into the app, participants were asked to complete consent and screening procedures. Eligible participants continued by supplying demographic information. Next, participants were given setup instructions (see Figure S1). They were instructed to complete the session alone, in a quiet room where they were unlikely to be disturbed. During the setup, the app detects whether (a) the mobile device is connected to headphones and (b) its volume is set to maximum. Participants are only able to begin the fear conditioning task once these two requirements are met. Task-specific instructions were provided following setup (see Figure S2).
We assessed self-reported expectancy of the US during each CS presentation (for average trial-by-trial expectancies see Figure S3).
The CS (circles of different sizes) and US (a loud scream) were chosen because they can easily be delivered via smartphone and are often used in fear conditioning (Lonsdorf et al., 2017).
During the first phase of the task, acquisition, participants completed a total of 24 pseudo-randomized trials (12 per CS). Each trial lasted 8 s (Figure 2b). Throughout each trial, one CS was presented superimposed on a context image of an outdoor scene (Context A; Figure 2a). After 2 s, expectancy ratings became available at the bottom of the screen.
On 75% of CS+ trials, the US occurred during the final 500 ms of the trial. The US never occurred during CS− trials. Each trial was separated by an intertrial interval (ITI) where participants were instructed to focus on a fixation cross. ITI length was randomized to be 2, 2.5, or 3 s. After acquisition, participants had a 10-min break during which they completed the first set of questionnaires (not analyzed here, see Table S2 for details on all questionnaire measures collected during the study).
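The acquisition parameters described above can be sketched as a trial schedule generator. This is an illustrative sketch rather than the FLARe implementation: the pseudo-randomization rule is not specified here, so the cap on consecutive trials of the same type (`max_run`) is an assumption, as are all function and field names.

```python
import random
from itertools import groupby

TRIAL_DURATION_S = 8.0          # each trial lasted 8 s
US_DURATION_S = 0.5             # US occupies the final 500 ms
ITI_CHOICES_S = (2.0, 2.5, 3.0)  # randomized intertrial intervals


def longest_run(seq):
    """Length of the longest run of identical consecutive items."""
    return max(len(list(group)) for _, group in groupby(seq))


def acquisition_schedule(seed=None, n_per_cs=12, reinforcement_rate=0.75,
                         max_run=3):
    """Build 24 acquisition trials (12 per CS), 75% of CS+ trials reinforced.

    max_run is a hypothetical constraint: the source only says trials were
    "pseudo-randomized".
    """
    rng = random.Random(seed)
    n_reinforced = round(n_per_cs * reinforcement_rate)  # 9 of 12 CS+ trials
    us_flags = [True] * n_reinforced + [False] * (n_per_cs - n_reinforced)
    rng.shuffle(us_flags)

    order = ["CS+"] * n_per_cs + ["CS-"] * n_per_cs
    rng.shuffle(order)
    while longest_run(order) > max_run:  # reshuffle until the cap is met
        rng.shuffle(order)

    flags = iter(us_flags)
    trials = []
    for cs in order:
        us = next(flags) if cs == "CS+" else False  # CS- is never reinforced
        trials.append({
            "cs": cs,
            "us": us,
            "us_onset_s": TRIAL_DURATION_S - US_DURATION_S if us else None,
            "iti_s": rng.choice(ITI_CHOICES_S),
        })
    return trials


schedule = acquisition_schedule(seed=1)
print(len(schedule), sum(t["us"] for t in schedule))
```

An extinction schedule would follow the same pattern with 18 trials per CS and a reinforcement rate of zero.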
Following the break, participants completed the second phase of the fear conditioning task, fear extinction. Extinction consisted of 36 trials in total (18 per CS). Trials followed the same format as acquisition, although the context image used was that of an indoor scene (Context B) and neither CS was paired with the US. Although not used here, FLARe has an additional optional phase delivered a day later to allow assessment of fear renewal in Context A (i.e., ABA conditioning). Renewal was not included to minimize participant burden. The AB context switch in the current study is comparable to studies included in Duits et al. (2015) that also switched context after acquisition as part of a return of fear paradigm (e.g., Milad et al., 2008, 2009, 2013). For each trial, measures of task compliance were collected via the app including headphone connection, volume, and whether participants left the app.
Following extinction, participants were redirected to a second set of questionnaires hosted on an external website (Qualtrics, Provo, UT) which contained all the self-report measures used in the current study.
These questionnaires included further items about task compliance, one of which was used to determine whether participants removed their headphones.

| Questionnaire measures
Two questionnaires were used to assign individuals to the anxious or nonanxious group.

Generalized Anxiety Disorder seven-item scale
Participants rated the frequency that they had experienced symptoms of anxiety over the past two weeks on a four-point scale ranging from "Not at all" (0) to "Nearly every day" (3). The Generalized Anxiety Disorder seven-item scale (GAD-7) has been shown to have good criterion validity for detecting anxiety disorders (Spitzer et al., 2006). Total scores range from 0 to 21, with scores of 10 or greater indicating clinically significant moderate to severe anxiety.

Self-reported lifetime diagnoses
Participants were asked "Have you ever been diagnosed with one or more of the following mental health problems by a professional, even if you don't have it currently?" (for a full list of response options, see Table S3). Single-item measures assessing anxiety disorder diagnosis have been shown to have reasonable agreement (76.7%) with more detailed algorithm-based assessments (Davies et al., 2021).
For the primary analyses, participants were included in the anxious group if they had clinically significant current levels of anxiety (GAD-7 ≥ 10; n = 299) or if they reported having a diagnosis of an anxiety disorder across the lifespan (n = 306). Lifetime anxiety disorder diagnosis included generalized anxiety disorder, social phobia, panic attacks, agoraphobia, specific phobia, obsessive compulsive disorder (OCD), or posttraumatic stress disorder (PTSD). Diagnoses of OCD and PTSD were included to reflect the Duits et al. (2015) meta-analysis, which was based on the DSM-IV (American Psychiatric Association, 1994), though no participants ended up reporting a diagnosis of OCD or PTSD (see Table S4 for a breakdown of anxiety diagnoses). Participants reporting that a clinician had diagnosed them as suffering from panic attacks (n = 22) were also included in the anxious group. Panic attacks are not, however, an official anxiety disorder diagnosis but rather a key symptom for some anxiety disorders, in particular panic disorder. This terminology was chosen to represent how mental health disorders are commonly referred to rather than their strict DSM classification (i.e., panic disorder) to help participants recognize and self-report their diagnosis. Single-item measures of self-reported, clinician diagnoses of panic attacks have been shown to have moderate agreement with algorithm-based measures of panic disorder (65.4%; Davies et al., 2021).
The two groups identified through the GAD-7 or by reporting lifetime diagnoses overlapped considerably (n = 132), resulting in the total number of participants in the anxious group being 473. Both groups were included to maximize power, but sensitivity analyses were also conducted using participants with current (n = 299) and prior (i.e., self-reported lifetime diagnosis, but GAD-7 < 10; n = 174) anxiety separately. Participants were included in the nonanxious group if their GAD-7 scores were below five (cut-off for mild anxiety) and they did not report a lifetime mental health diagnosis.
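The grouping rules above can be expressed as a short sketch. It uses the GAD-7 cut-off of 10 or greater given for clinically significant anxiety; the diagnosis labels, the function name `assign_group`, and the set-based data layout are hypothetical choices for illustration and do not reflect how the study data were actually coded.

```python
from typing import Optional, Set

# Hypothetical labels for the lifetime anxiety diagnoses listed in the text.
ANXIETY_DIAGNOSES = {
    "generalized anxiety disorder", "social phobia", "panic attacks",
    "agoraphobia", "specific phobia", "OCD", "PTSD",
}


def assign_group(gad7_total: int, lifetime_diagnoses: Set[str]) -> Optional[str]:
    """Return 'anxious', 'nonanxious', or None (meets neither definition)."""
    current_anxiety = gad7_total >= 10
    lifetime_anxiety = bool(lifetime_diagnoses & ANXIETY_DIAGNOSES)
    if current_anxiety or lifetime_anxiety:
        return "anxious"
    # Nonanxious: below the mild-anxiety cut-off AND no lifetime mental
    # health diagnosis of any kind.
    if gad7_total < 5 and not lifetime_diagnoses:
        return "nonanxious"
    return None


# Group sizes reported in the text: current + lifetime - overlap.
assert 299 + 306 - 132 == 473   # anxious group
assert 306 - 132 == 174         # prior anxiety only (GAD-7 < 10)

print(assign_group(12, set()), assign_group(3, set()), assign_group(7, set()))
```

Note that participants with GAD-7 scores of 5 to 9 and no anxiety diagnosis, or with only a non-anxiety diagnosis, fall into neither group under these rules.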

Expectancy ratings
During each trial, participants were asked to rate their certainty that the trial would end with the occurrence of a loud scream (US). This "expectancy rating" was made using a nine-point scale ranging from one ("certain no scream") to nine ("certain scream"), with five indicating uncertainty ("uncertain"). Expectancy ratings are a valid index of fear conditioning (Boddez et al., 2013) commonly used in investigations comparing anxious and nonanxious individuals (Blechert et al., 2007; Lissek et al., 2009; Norrholm et al., 2011). For both phases, mean expectancy ratings for each stimulus (CS+/CS−) were calculated to index participants' conditioning and extinction. In addition, differences between the mean expectancy ratings for each stimulus (CS+ minus CS−) were calculated to index discrimination learning for both phases.
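The per-phase outcome measures can be illustrated with a minimal sketch. The data layout (a list of stimulus and rating pairs) and the function name `summarise_phase` are assumptions made for the example, and the ratings shown are made-up values, not study data.

```python
from statistics import mean


def summarise_phase(trials):
    """Mean expectancy per stimulus and the discrimination score for one phase.

    trials: iterable of (cs, rating) pairs, cs in {"CS+", "CS-"},
    rating on the 1-9 expectancy scale.
    """
    plus = [rating for cs, rating in trials if cs == "CS+"]
    minus = [rating for cs, rating in trials if cs == "CS-"]
    mean_plus, mean_minus = mean(plus), mean(minus)
    return {
        "cs_plus": mean_plus,
        "cs_minus": mean_minus,
        "discrimination": mean_plus - mean_minus,  # CS+ minus CS-
    }


# Made-up ratings for one participant in one phase.
example = [("CS+", 8), ("CS-", 2), ("CS+", 7), ("CS-", 3)]
print(summarise_phase(example))
```

Larger discrimination scores indicate a clearer separation between the threat cue and the safety cue.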

| Statistical analyses
Two analyses of variance (ANOVAs) were conducted for each phase. The first ANOVA tested mean expectancy ratings for the CS using a two-factor mixed design with group (anxious vs. nonanxious) and stimulus (CS+ vs. CS−) entered as between-subjects and within-subjects factors, respectively. The second tested CS-discrimination scores using a one-way between-subjects design where group (anxious vs. nonanxious) was entered as the between-subjects factor.
For each ANOVA, follow-up tests were conducted for pairwise comparisons using Tukey's Honestly Significant Difference (HSD) test to control for multiple comparisons. Cohen's d effect size estimates were calculated to indicate standardized differences between means, allowing for comparisons with Duits et al.'s (2015) meta-analysis. This process was conducted for the primary analysis comparing participants with current and/or lifetime anxiety to nonanxious participants. Sensitivity analyses were then run, considering current anxiety and prior anxiety cases only. All analyses were conducted in R (3.6.1) using the Stats (3.2.1) and Psych (1.8.12) packages.
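For readers unfamiliar with the effect size used here, the following sketch computes Cohen's d for two independent groups. It assumes the common pooled-standard-deviation form; the exact variant produced by the R packages used in the study is not stated, and the analyses themselves were run in R rather than Python. The input values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b), pooled SD.

    Assumption: pooled-variance form with n - 1 denominators.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Made-up mean expectancy ratings, anxious vs. nonanxious.
anxious = [2, 4, 6]
nonanxious = [1, 3, 5]
print(round(cohens_d(anxious, nonanxious), 2))
```

With this sign convention, positive values indicate higher ratings in the first (anxious) group, matching the direction used for the effect sizes reported in the Results.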
Two additional sensitivity analyses were conducted (see supplementary information Tables S5-S7 and Figure S4). The first considered only one participant from each complete pair of twins to see whether the clustered nature of the data was affecting results. The second included the previously excluded participants to assess whether their data introduced considerable noise or added further usable information. This was done because the majority of excluded participants had been removed for self-reporting headphone removal, yet the point at which they removed their headphones could not be determined (e.g., during extinction or after many acquisition trials).

| RESULTS
Females had approximately three times the odds of being in the anxious group compared with males (odds ratio = 2.96; 95% confidence interval = 2.21–3.98, p < .001; Table 1). The two groups were of similar age, with a mean difference in age of approximately 1.5 months (d = 0.15; t(998.34) = 2.46, p = .01).

| Acquisition
For the primary analyses, there was a statistically significant group × stimulus type interaction on mean expectancy ratings (F(1, 2228) = 26.06, p < .001; Table 2). Post hoc tests showed that, compared with the nonanxious group, the anxious group showed (a) no significant difference in expectancy ratings towards the CS+ (d = −0.14, p = .086) and (b) significantly higher mean expectancy ratings towards the CS− (d = 0.28, p < .001; see Table 3 for mean expectancy scores). There was a statistically significant main effect of group on expectancy discrimination scores indicating that, compared with the nonanxious group, the anxious group had lower expectancy discrimination scores (d = −0.25; F(1, 1144) = 17.20, p < .001). Figure 3 illustrates effect sizes for both phases of the fear conditioning task, and also indicates the findings for subjective outcome measures from the latest meta-analysis for comparison (Duits et al., 2015). Both secondary sensitivity analyses, that is, looking at participants with current and prior anxiety separately, followed the same pattern of results as the primary analyses.
Two additional sensitivity analyses assessing the impact of excluding twin pairs and including participants who disregarded instructions (see supplementary information Tables S5-S7 and Figure S4) showed that the pattern of effects remained the same, though effect sizes varied slightly. In line with our hypotheses, our findings followed the same pattern of effects seen for subjective ratings in the most recent meta-analysis of fear conditioning in the anxiety disorders. Secondary sensitivity analyses showed these effects still stood when analyzing participants with current and prior anxiety separately. Significant anxiety-related differences were also found for discrimination scores during extinction. However, the effect size for this difference was very small (d = −0.14) and was not observed in the prior anxiety only sensitivity analyses.

| DISCUSSION
During acquisition, no anxiety-related difference in expectancy ratings was observed for the CS+. This could have been due to a high reinforcement rate (75%) causing a "strong situation" (Lissek et al., 2006) whereby ambiguity concerning the likelihood of the US occurring was low. In such a case, it is possible that most participants, regardless of anxiety status, would give high expectancy ratings. This may be further explained by a ceiling effect where our expectancy rating scale did not allow for enough variation in confidence levels regarding the upcoming US. As such, both anxious and nonanxious participants may have been constrained to one or two rating levels towards the "certain" end of the scale. The largest difference in expectancy ratings was seen towards the CS−, with anxious participants displaying greater US expectancy to the safety stimulus than nonanxious participants. This finding suggests anxious individuals have a tendency to generalize their threat response from threatening stimuli (CS+) to nonthreatening but perceptually similar stimuli (CS−). Further evidence of this "overgeneralization" is provided through anxious participants' poor discriminatory learning. Overgeneralization is thought to maintain/exacerbate anxiety symptoms by increasing the number of threatening cues an anxious individual perceives in their environment (Lissek, 2012).
During extinction, the anxious group showed greater expectancy ratings for both CSs compared with the nonanxious group. This provides evidence that anxious individuals are resistant to the extinction of both conditioned (CS+) and generalized (CS−) fear. Difficulty extinguishing generalized fear poses a challenge for individuals undergoing exposure therapy. For example, for a patient with a phobia of dogs, exposure therapy focused on a single dog may help reduce their anxiety towards that breed but leave the patient with a generalized phobia towards other dogs. Previous research has shown that strengthening inhibitory learning (i.e., learning that a stimulus can be safe) towards a variety of related stimuli can improve extinction in conditioning tasks and exposure outcomes for anxious individuals (Carpenter et al., 2019; Craske et al., 2008). In practice, a clinician might decide to treat a patient with a phobia of dogs by eventually exposing them to a number of different breeds, colors, and sizes of dog to reduce the patient's symptoms.
Our findings were consistent with a previous meta-analysis (Duits et al., 2015), highlighting important differences in fear conditioning processes between anxious and nonanxious individuals. Though effect sizes were modest, these differences were observed in participants reporting both current and prior anxiety and provide further evidence of fear conditioning's ability to model differences between healthy and at-risk individuals using expectancy rating data (Boddez et al., 2013).
However, the ability to use individual differences in fear conditioning response to predict differences in anxiety or treatment response (i.e., predictive validity) remains the best test of whether findings from human fear conditioning research will help us understand the development and treatment of anxiety (Carpenter et al., 2019). Studies have used overgeneralization of fear and deficits in extinction learning to assess risk (Lommen et al., 2013; Sijbrandij et al., 2013), or predict treatment outcomes (Forcadell et al., 2017; Raeder et al., 2020; Waters & Pine, 2016), in anxiety disorder patients with varying levels of success.
Inconsistent findings could relate to the relatively small effect sizes seen for the differences between groups, as demonstrated by effect sizes in our study and a previous meta-analysis (Duits et al., 2015), which suggests a substantial proportion of variance is unexplained. Larger sample sizes afforded by remote research may improve our detection and prediction of individual differences in anxiety status and treatment response.
Using FLARe, we quickly collected data from a large sample of participants. This allowed us to conduct what we believe to be the largest human fear conditioning study to date. FLARe can easily be adopted in a variety of contexts, including clinical settings. Future research should take advantage of these benefits to reach clinical samples, study differences between specific types of anxiety disorders, and explore unique and interacting contributions between fear conditioning outcomes and other key processes underlying anxiety. In addition, the power afforded by large samples allows the use of more complex research methods, such as longitudinal studies including cohorts, treatment trials, and genetically sensitive designs.

| Limitations
Though remote research vastly increases ease of data collection, control over participant behavior is diminished. Many participants were excluded for not following task instructions (reported in a postexperiment survey), primarily headphone removal during testing.
Self-reported anxiety scores indicated that these participants were more anxious than participants who followed instructions; excluding them may have affected our effect sizes.
While our findings replicated those from a meta-analysis (Duits et al., 2015), key differences in methodology should be highlighted.

Abbreviations: CIs, confidence intervals; CS, conditional stimulus; HSD, Tukey's honestly significant difference; SE, standard error.
limitations relating to concerns around accuracy and memory of reporting. However, the GAD-7 has been shown to have good face validity for identifying individuals with anxiety disorders (Spitzer et al., 2006) and self-reported anxiety diagnoses have been shown to have reasonable agreement (76.7%) with an algorithm-based measure of anxiety disorders (Davies et al., 2021). Agreement for self-reported anxiety diagnoses was lower when looking at specific anxiety disorders separately, which was avoided in the current article.
The effects of anxiety on patterns of fear conditioning were replicated in our study, suggesting that they generalize to groups selected using self-report measures. The meta-analysis also focused specifically on participants with current anxiety, whereas our study looked at participants with current and/or prior anxiety. Secondary sensitivity analyses in our sample showed similar patterns of results when current and prior anxiety were looked at separately. However, smaller effect sizes were observed when analyses were restricted to

Participants were recruited from the Twins Early Development Study (TEDS) via email invitation. TEDS is a longitudinal birth cohort study of twins born in England and Wales between 1994 and 1996.
England and Wales with regard to ethnicity and family socioeconomic factors. To take part in the current study, participants needed an Android or iOS smartphone to download the FLARe app, which delivered the fear conditioning task and collected detailed study information including informed consent. Ethical approval was granted by the King's College London Psychiatry, Midwifery and Nursing Research Ethics Subcommittees (application PNM/09/10-104). All participants who completed the study received a £10 gift voucher as reimbursement.

FIGURE 2 Visualization of experimental design implemented in the FLARe app. Schematic of overall task structure (a) with numbers representing the number of times a stimulus is presented. Schematic of trial structure (b). CS, conditional stimuli; US, unconditional stimulus, a loud human scream played through headphones at a loud volume; Context, an outdoor scene (Context A) during the acquisition phase, an indoor scene (Context B) during the extinction phase.

Using a novel remote fear conditioning task, we examined differences in expectancy ratings during acquisition and extinction between anxious and nonanxious individuals in a large sample of young adults. During acquisition, anxious individuals had larger expectancy ratings towards the CS− (d = 0.29) and smaller discrimination scores (d = −0.25) compared with nonanxious individuals. During extinction, anxious individuals had larger expectancy ratings towards the CS+ (d = 0.34) and CS− (d = 0.26) compared with nonanxious individuals.

FIGURE 3 Barplots of the effect sizes (d) per stimulus type, reflecting the standardized mean difference in expectancy ratings (anxious participants minus nonanxious participants) during acquisition and extinction. Error bars display the unadjusted 95% confidence interval of the effect size estimate. Stars reflect the significance level of the difference between groups calculated using Tukey HSD, which adjusts for multiple comparisons; *p < .05, **p < .01, ***p < .001. Red diamonds indicate the effect size estimates (d) for subjective ratings from the Duits et al. (2015) meta-analysis. CS, conditional stimulus; HSD, honestly significant difference.

TABLE 1 Sample characteristics for each group indicating: the number and proportion of males and females; means and standard errors for age and GAD-7 scores; the number of participants meeting different criteria for inclusion in the anxious group.

TABLE 2 For both phases (acquisition/extinction), results for two ANOVAs testing (i) mean expectancy ratings with group (anxious/nonanxious) as a between-subjects factor and stimulus (CS+/CS−) as a within-subjects factor; (ii) CS-discrimination scores where group (anxious/nonanxious) was entered as the between-subjects factor. Degrees of freedom, F statistics, and p values for each ANOVA at acquisition and extinction. Significant p values (p < .05) are emphasized in bold.

TABLE 3 Mean expectancy rating scores for each group and each phase (acquisition/extinction) with Tukey HSD estimates, Cohen's d estimates, and p values for the difference between groups for all fear conditioning outcomes. Note: Significant p values (p < .05) are emphasized in bold.

First, sample characteristics, such as age, sex, and distribution of anxiety diagnoses, differ across the two studies, as does the method used to obtain them. Previous meta-analyses included studies using clinician assessment, whereas we used self-report measures to group participants. Like clinical interviews, self-report measures have