Graphical Depiction of Longitudinal Study Designs in Health Care Databases


The pharmacoepidemiologic and pharmacoeconomic analysis of databases containing administrative claims and electronic health records has become a routine source of evidence to support regulatory (1) and reimbursement (2) decisions, as well as efficient management of health care organizations. When decision makers understand the study design and analytic choices of a nonrandomized database study and recognize those choices as valid, they have confidence in their decisions based on the study's evidence about the comparative effectiveness and safety of medical products (3,4). Generally, they consider nonexperimental database studies more difficult to review than randomized trials and see the increased complexity, greater variability in design and analysis options, and lack of consistency in presentation of design choices as key barriers to using database evidence for high-stakes decisions.
Unfortunately, some poorly designed studies have led to negative generalizations about the entire field of health care database research rather than a refined view that distinguishes robust evidence from less reliable evidence (5). Confounding from treatment selection based on outcome risk is well known to cause bias (6). Time-related study design flaws can also introduce large biases, including immortal time bias (7), reverse causation (8,9), adjustment for causal intermediates, unobservable time bias (10), and depletion of susceptibles (11,12). The methods sections of study reports should describe the study design and analytic choices clearly enough to allow the reader to judge the validity of findings. However, convoluted prose often makes it difficult for most readers to understand what methods were implemented or identify avoidable design flaws.
Design diagrams provide key information that needs to be considered when evidence is interpreted from pharmacoepidemiologic and pharmacoeconomic studies done with health care databases. Improving transparency in how these studies are designed and implemented will make it easier for reviewers and decision makers to distinguish the useful from the flawed or irrelevant (13). Graphical study design representations were recommended by the most recent guidance for reporting on database studies from the REporting of studies Conducted using Observational Routinely collected health Data statement for pharmacoepidemiology (RECORD-PE) (14), as well as recently published consensus papers by 2 leading professional societies (15,16).
We propose a simple framework of graphical representations that will clarify critical design choices in database analyses of the effectiveness and safety of medical products. A recent consensus statement laid out a set of parameters that define decisions in database study implementation, which, if reported, would increase reproducibility of studies (16). Building on these parameters, we sought to develop a visualization framework that describes study design implementation in a comprehensive, unambiguous, and intuitive way; contains a level of detail that enables reproduction of key study design variables; and uses standardized structure and terminology to simplify review and communication to a broad audience of decision makers. Our multistakeholder group comprised international leaders with more than 75 years of combined experience in academia, regulatory decisions, health technology assessment, journal leadership, payer decision making, and analyses of distributed health care data networks. The example figures and templates are covered by a Creative Commons license. The PowerPoint figures are free to download and adapt, with appropriate attribution, from www.repeatinitiative.org/projects.html.

TERMINOLOGY
The terminology we suggest for temporal anchors is frequently used in descriptions of database studies and in textbooks (17), as well as in the recently published consensus statement (15,16). We define 3 categories of temporal anchors (Table): base anchors, first-order anchors, and second-order anchors. Base anchors are defined in calendar time and describe the source database, that is, the longitudinal streams of administrative or clinical health care data from which an analyzable study data set is derived. First-order anchors are defined in patient event time rather than calendar time and specify the study entry or index date. Second-order anchors are also measured in patient event time and are defined relative to the first-order anchor. We provide more detail on each temporal anchor in the following section.

STUDY DESIGN IMPLEMENTATION IN HEALTH CARE DATABASES

The Nature of Health Care Databases Relevant to Effectiveness Research
Health care databases are derived from transactional databases that record clinical and administrative information for delivering and administering health care. As encounters occur and services are provided, records are generated and tallied. Each addition to the database comes with a service date stamp and is attributed to the patient via a unique patient identification number, thus generating longitudinal patient records of increasing duration. There is substantial literature describing the details of data integration, cleaning, and normalization (18-20).
For each patient, all encounters with the health care system that are reimbursable by health insurance (or are captured by the provider's electronic health record system) can be sorted by the service date in calendar time (Figure 1). Each encounter is associated with information on medical services, diagnoses, procedures, and similar events, plus information on payments (in claims data) or charges (in electronic health record data). The rules and algorithms that stem from a specific study implementation will then be applied to each patient's longitudinal data stream. The study implementation is usually oriented around an event-based timeline anchored to a key event, in contrast to the calendar time arrangement of the raw data (Figure 1) (21).
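The shift from a calendar-time data stream to an event-based timeline can be sketched as follows. This is a minimal illustration with entirely hypothetical records and codes, not any particular database's schema or the authors' implementation:

```python
from datetime import date

# Hypothetical longitudinal stream for one patient: (service_date, record_type, code).
encounters = [
    (date(2017, 3, 1), "Rx", "lisinopril"),
    (date(2016, 11, 5), "Dx", "I10"),        # hypertension diagnosis
    (date(2017, 6, 20), "Dx", "T78.3"),      # angioedema outcome
    (date(2016, 8, 2), "V", "office visit"),
]

# Step 1: sort the raw transactional records in calendar time.
encounters.sort(key=lambda rec: rec[0])

# Step 2: re-anchor on a key event (here, the first dispensing serves as the
# index date) so every record is expressed in patient event time, i.e., days
# relative to day 0.
index_date = next(d for d, kind, _ in encounters if kind == "Rx")
event_time = [((d - index_date).days, kind, code) for d, kind, code in encounters]

print(event_time)
```

Records with negative day offsets fall in baseline assessment windows; records with positive offsets fall in follow-up.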

Dates and Time Windows
Certain principles guide the design and implementation of studies in health care data streams. One of the most important is temporality. Unlike in primary data collection, many measurements in health care databases (for example, patients' baseline characteristics) are measured by reviewing information recorded during multiple health care encounters over time. In primary data collection, a study participant's health state is usually established when the patient is thoroughly interviewed or examined at a study visit. Health care databases have no defined interview date with the investigator team; rather, studies rely on the occurrence of routine visits and other health care encounters to collect information that was recorded during provision of care. Thus, information that may be conceptualized as characterizing a point in time, such as baseline patient characteristics before the start of exposure, is actually recorded during a time window through a series of encounters.

Table (excerpt). Second-Order Anchors (Defined in Patient Event Time, Relative to the First-Order Anchor)

Washout window for exposure: An interval used to define incident exposure. If there is no record of the exposure (and/or comparator) of interest within this interval, the next exposure is considered a "new" initiation; otherwise, it is considered prevalent exposure.

Washout window for outcome: An interval used to define incident outcomes. If there is no record of outcomes within this interval, the next outcome is considered incident.

Exclusion assessment window: An interval during which patient exclusion criteria are assessed.

Covariate assessment window: An interval during which patient covariates are assessed. The covariate assessment window should precede the exposure assessment window in order to avoid adjusting for causal intermediates. It is sometimes called the baseline period.

Exposure assessment window: The window during which exposure status is assessed. Exposure status is defined at the end of the exposure assessment window. The exposure assessment window should precede the follow-up window to avoid reverse causation.

Follow-up window: The interval during which occurrence of the outcome of interest in the study population will be included in the analysis. The follow-up window may involve stockpiling algorithms, grace periods, exposure extension, and/or censoring related to exposure discontinuation.

Anchors in Calendar Time
For a database study to be reproducible, temporal anchors must be defined to specify the underlying longitudinal data used to create a study population (Table). The data extraction date is particularly important to record when working with recent data that are still fluid. The dynamic data flow in a health care database is stabilized by extracting and physically or virtually setting aside requested data for research purposes. However, some administrative records may be corrected or amended retroactively for up to 6 months or longer (22). If the underlying database has data that are dynamically updated over time, a study using the most recently available data extracted today will probably not be exactly replicated using data covering the same period but extracted a year later.
The source data range reflects the calendar date boundaries beyond which encounter information is not captured for patients. Investigators must be clear about the lag between the most recent update to the data source and the calendar time boundaries for data included in their study (study period). For example, investigators may access a data source where the tables containing up-to-date information on patient health care contacts are extracted on 1 January 2019 (data extraction date). The source data range included in those tables covers 1 January 2003 to 31 December 2018. The investigators, however, choose a study period that focuses on time after market entry of a drug and does not use the most recent 6 months, a period during which the data may be more fluid. The data extraction date and source data range do not need to be included in visualization of study design, but reporting them and archiving extracted longitudinal data will make study implementation reproducible (16).

Anchors in Patient Event Time
When an effectiveness or safety study is implemented in a longitudinal database, the time scale shifts from calendar time to patient event time. Specific algorithms define events in the patient timeline. As in randomized controlled trials, where the randomization date is the anchor date, the cohort entry date (CED, also called the index date) is the primary anchor in a nonrandomized database study (Table).
The CED is the date when patients enter the analytic study population. For some study designs, study entry can be defined by an event date (as described under Nested Case-Control Study and in Self-Controlled Study Design Visualization in the Appendix, available at Annals.org). The CED is considered a first-order anchor because most other anchors and parameters used in study implementation will be defined relative to it. The CED is defined by an inclusion rule, along with multiple exclusion criteria that are sequentially applied. Clarity in the definitions and sequence of these criteria is essential. For example, whether exclusions are applied before or after selection of the CED should be clear. If the wrong patients are excluded or if the study entry date is shifted, results may not be reproducible (16).
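Why the sequence of criteria matters can be sketched with a toy example. The episode data and both helper functions below are hypothetical, not the implementation of any cited study:

```python
# Hypothetical: each patient has candidate entry episodes in temporal order,
# each carrying an exclusion flag evaluated as of that episode's date
# (e.g., age < 18 years on that date).
episodes = {
    "patient_A": [(1, False), (2, True)],   # (episode_id, excluded?)
    "patient_B": [(1, True), (2, False)],
}

def first_episode_then_exclude(eps):
    """Ordering 1: select each patient's first episode as the CED,
    then drop the patient if that episode is excluded."""
    episode_id, excluded = eps[0]
    return None if excluded else episode_id

def exclude_then_first_episode(eps):
    """Ordering 2: drop excluded episodes first, then take the earliest
    remaining episode as the CED."""
    eligible = [e for e, excluded in eps if not excluded]
    return eligible[0] if eligible else None

# Patient B illustrates the difference: the two orderings yield different
# study populations and entry dates, so the sequence must be reported.
print(first_episode_then_exclude(episodes["patient_B"]))  # dropped entirely
print(exclude_then_first_episode(episodes["patient_B"]))  # enters at episode 2
```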
Secondary temporal anchors are defined relative to the first-order anchor, the CED. As in temporal ordering in a randomized trial (23), we wish to assess all patient characteristics before the start of exposure to avoid adjusting for causal intermediates. The exclusion assessment window and the covariate assessment window are often defined to begin a set number of days before the CED and end the day before or the day of the CED (Table) (24). These windows are sometimes identical, but in some studies, separate windows may be specified for subsets of exclusion criteria or confounders. For example, history of cancer might be measured over all available time before the CED, whereas recent myocardial infarction might be measured within 30 days before the CED. Research has suggested that use of a flexible window for covariate assessment, starting from the beginning of the available data stream and continuing until the day of the CED, is preferable to use of a fixed window (25). The effect on confounding adjustment may vary by setting (26).
For studies where exposure is not a first-order anchor, it can be defined in an exposure assessment window. This window itself is defined relative to the CED. For example, a cohort study looking at risk for cardiovascular outcomes in patients after percutaneous coronary intervention or acute coronary syndrome defined the CED as the date of hospital discharge (27). Patients were further required to receive clopidogrel for the first time within 7 days after the CED. The exposure of interest was proton-pump inhibitor use, which was assessed during the 21 days before and 7 days after the CED. To avoid immortal time bias, outcomes should not be counted as exposed outcomes until after the exposure definition has been met (28).
In many applications, we want to make sure that the outcome of interest has not yet occurred at the time of study entry. To study newly occurring events, investigators can require an outcome washout window. Similarly, new use of a drug or other treatment can be defined by requiring an exposure washout window of defined duration (Table).
The analytic follow-up window, during which the study population is at risk for developing the outcome of interest, begins after study entry. It may begin on the CED or after an assumed induction window before which there is no biologically plausible effect of exposure on outcome. The maximum analytic window for follow-up is defined by 1 or more censoring criteria. For analyses that focus on follow-up time while patients are exposed to a treatment, the analytic follow-up time may incorporate stockpiling algorithms, grace windows for drug exposures, hypothesized induction windows before the effect of exposure begins, or hypothesized duration of biological risk beyond the end of observed exposure (16,29).
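The "earliest of 1 or more censoring criteria" rule can be sketched as follows, with hypothetical event names and dates:

```python
from datetime import date

def censor_date(candidate_events):
    """End of analytic follow-up = earliest of the censoring criteria that
    actually occurred. Criteria that never occurred are passed as None."""
    observed = [d for d in candidate_events.values() if d is not None]
    return min(observed)

# Hypothetical patient: follow-up ends at the earliest of these dates.
events = {
    "outcome": None,                        # outcome never observed
    "treatment_switch": date(2018, 9, 1),
    "death": None,
    "disenrollment": date(2019, 2, 1),
    "max_follow_up_365d": date(2019, 3, 1),
    "end_of_study_period": date(2019, 12, 31),
}
print(censor_date(events))  # treatment switching censors this patient first
```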
The outcome event date is the date of outcome occurrence during analytic follow-up. For some study designs, such as case-crossover (where assessment windows are anchored on the outcome), the outcome event date is a first-order anchor equal to the CED. In the nested case-control design, secondary temporal anchors may be defined relative to the CED for the underlying source cohort as well as the outcome event date.

GRAPHICAL REPRESENTATION OF DESIGN IMPLEMENTATION
Because of the complexity of the timeline and the interrelated nature of the factors described in this article, researchers often find it helpful to illustrate their study design implementation on the longitudinal health care record of an imaginary patient. However, the design elements represented in a diagram and the level of detail provided in published reports vary widely (30-34). We propose a framework for visualizing the design of nonrandomized database study implementation that uses standardized structure and terminology and focuses on summarizing details of first- and second-order temporal anchors (Table). These design diagrams include bracketed numbers representing time intervals anchored on the CED (day 0). Following conventional mathematical notation, we indicate open intervals (which do not include the end points) with parentheses and closed intervals (which do include the end points) with square brackets. First-order time anchors are represented as columns indicating a date on the patient timeline, whereas second-order anchors (time windows) are represented as separate boxes. Boxes are placed in different rows so that overlap can be easily distinguished. The steps to create the analytic cohort from data tables in the longitudinal source are laid out sequentially from top to bottom in the design diagram. Attrition tables could be incorporated into these diagrams, with patient counts inserted in the relevant rows for exclusion criteria. We used standardized structure and terminology to provide examples of graphical representation for several designs that can be used in nonrandomized database studies, including cohort designs; designs that sample from cohorts (case-control, case-cohort, and 2-stage sampling); and self-controlled study designs.
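The interval notation used in the design diagrams can be made concrete with a small membership check. The function name and its defaults are our own illustration:

```python
def in_window(day, start, end, closed_start=True, closed_end=True):
    """Membership test for an interval of event-time days anchored on the CED
    (day 0). Square brackets in the diagrams denote closed end points (included);
    parentheses denote open end points (excluded)."""
    lower_ok = day >= start if closed_start else day > start
    upper_ok = day <= end if closed_end else day < end
    return lower_ok and upper_ok

# A covariate assessment window written [-183, -1] includes both end points:
print(in_window(-1, -183, -1))                   # day -1 is included
print(in_window(-183, -183, -1))                 # day -183 is included
# A window written [-183, 0) would exclude day 0 itself:
print(in_window(0, -183, 0, closed_end=False))   # day 0 is excluded
```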

Cohort Study
The cohort study design is widely used in research in large health care databases and encompasses a range of designs in which a group of patients enters the study population on the CED. Baseline characteristics or covariates are usually (but not always) defined before and outcome events after the CED. When covariate assessment windows are after the CED (for example, time-varying covariates), these should occur before the relevant exposure assessment window to avoid adjustment for causal intermediates. Numerous variations of the cohort study design could be implemented. These decisions can greatly affect results. For example, study entry could be based on initiation of an exposure of interest, occurrence of a health event, calendar time, or a combination thereof (28). Patients could be allowed to enter only 1 time or every time they meet entry criteria.
Example 1: Exposure-based cohort entry. A cohort study investigated whether angiotensin-converting enzyme inhibitors (ACEIs) differ from angiotensin-receptor blockers (ARBs) with respect to risk for angioedema (35). The inclusion criterion was initiation of use of a study drug (ACEI or ARB) after 183 or more days without dispensings of either of the drug groups being compared (Figure 2). In this study, as in most, patients were allowed to enter the study population only 1 time.

The CED was the date of first prescription for ACEI or ARB. Exclusion criteria were then applied. Patients were excluded if they were younger than 18 years or started receiving both ACEI and ARB on the CED. Patients were also excluded if they had intermittent medical and drug coverage (defined as gaps in coverage >45 days) in the 183 days before, but not including, the CED. The covariate assessment window and washout for incident exposure and outcome were also the 183 days before, but not including, the CED. Follow-up began on the CED and continued until the outcome of interest (angioedema), switching or withdrawal of study drugs, death, disenrollment, 365 days of follow-up, or the end of the study period, whichever came first. In contrast to this study, which defined the CED before applying exclusion criteria, a similar study investigating ACEI versus ARB on risk for angioedema identified every episode of new treatment initiation within the study period and then picked only the first new initiation episode for each patient after exclusion criteria were applied (Figure 3) (36).

Figure 2. Design diagram for Example 1 (exposure-based cohort entry); the follow-up window spans days [0, censor].§ This work is licensed under CC BY, and the original versions can be found at www.repeatinitiative.org/projects.html. ACEI = angiotensin-converting enzyme inhibitor; ARB = angiotensin-receptor blocker.
* Treatment episodes were defined by date of dispensing and days' supply, with a stockpiling algorithm if a new dispensing occurred before the end of days' supply. Gaps of <30 d between end of days' supply and the next dispensing were bridged, and 30 d was added to the days' supply of the last dispensing in an exposure episode.
† Up to 45-d gaps in medical or pharmacy enrollment were allowed.
‡ Baseline conditions included allergic reactions, diabetes, heart failure, ischemic heart disease, and use of nonsteroidal anti-inflammatory drugs.
§ Earliest of outcome of interest (angioedema), switching or withdrawing study drugs, death, disenrollment, 365 d of follow-up, or end of study period.
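The 183-day exposure washout used in Example 1 can be sketched as follows. This is a simplified reading of the new-user rule with hypothetical dates, not the cited study's code:

```python
from datetime import date

WASHOUT_DAYS = 183  # washout window for exposure in Example 1

def first_new_initiation(dispensing_dates, enrollment_start):
    """Return the first study-drug dispensing preceded by >=183 days without
    any study-drug dispensing (and with >=183 days of observable history
    before it), or None if no dispensing qualifies."""
    previous = None
    for d in sorted(dispensing_dates):
        history_ok = (d - enrollment_start).days >= WASHOUT_DAYS
        gap_ok = previous is None or (d - previous).days >= WASHOUT_DAYS
        if history_ok and gap_ok:
            return d  # cohort entry date (CED)
        previous = d
    return None

# Hypothetical patient: enrolled 2015-01-01; the 2015 dispensing has too little
# observable history, but the 2017 dispensing follows a long drug-free gap.
ced = first_new_initiation([date(2015, 3, 1), date(2017, 6, 1)], date(2015, 1, 1))
print(ced)
```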
Example 2: Exposure-based cohort entry restricted to adherent users. A cohort study investigated whether statins differed from glaucoma agents with respect to mortality risk among patients who adhered to statin or glaucoma therapy and were not at high risk for death (37). The CED was defined by initiation of study drug use after at least 12 months of continuous enrollment without any dispensing of study drugs. Nonadherent patients were excluded, where nonadherence was defined as fewer than 3 dispensings of statin or glaucoma therapy within 180 days (Figure 4). The study specified an exclusion assessment window of 12 months before the CED to exclude patients with evidence of dementia or cancer and those without evidence of at least 1 risk factor for a major vascular event (angina; intermittent claudication; hypertension; diabetes; history of stroke, transient ischemic attack, myocardial infarction, arterial surgery, or amputation for vascular disease; or smoking), as well as a 6-month exclusion assessment window to exclude patients with cardiovascular-related hospitalizations. Patients were excluded if they were younger than 65 years on the CED or started receiving both statins and glaucoma agents on the CED. Confounders for eligible patients were captured in a 12-month covariate assessment window before the CED. Follow-up began on the date of the third refill and continued until outcome, death, disenrollment, or end of the study period.

[Figure 4. Follow-up window: days [3rd refill, censor], censored at the earliest of outcome, death, disenrollment, or end of study period. This work is licensed under CC BY, and the original versions can be found at www.repeatinitiative.org/projects.html. CVD = cardiovascular disease; Rx = prescription. Full list and code algorithms are in the Appendix (available at Annals.org).]
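Example 2's adherence restriction and deferred follow-up start can be sketched as a small function. This is an illustration under our own assumptions (names and the exact fencepost handling are ours, not the study's):

```python
from datetime import date

def follow_up_start(dispense_dates):
    """Sketch of Example 2's adherence rule: the CED is the first dispensing,
    and patients with fewer than 3 dispensings within 180 days of the CED are
    excluded. Returns the follow-up start date (the third dispensing), or
    None if the patient is excluded as nonadherent."""
    dates = sorted(dispense_dates)
    ced = dates[0]
    within_180 = [d for d in dates if (d - ced).days <= 180]
    if len(within_180) < 3:
        return None                # nonadherent: excluded from the cohort
    return within_180[2]           # follow-up begins at the third refill
```

The key design point the diagram makes visible is that person-time between the CED and the third refill is not counted as follow-up, which avoids attributing immortal time to the adherent group.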

Nested Case-Control Study
A nested case-control study samples the analytic study population from a fully enumerated source cohort (17). In database studies, the source cohort for a case-control study can be fully enumerated, making nested case-control studies feasible. The CED is the date of entry to the source cohort. Exclusions are applied to the source cohort before or on the CED, and the follow-up window begins on or after the CED. Case patients are identified on the basis of occurrence of the outcome-defining event during the cohort follow-up window. Exposure is assessed in 1 or more windows that fall between the CED and the outcome event date. When risk-set sampling of control patients is used, a fixed number of members of the source cohort who are at risk for the outcome on the date of a given case patient's event are sampled as potential control patients. With such individual matching, a control patient's person-time is anchored by the outcome event date for the case patient to which he or she is matched.
Example 3: A nested case-control study with risk-set sampling. A nested case-control study compared pioglitazone versus other oral antidiabetic agents on risk for bladder cancer (38). The CED for the cohort was first initiation of antidiabetic agent use, defined with a washout window that included all available data before the CED (Figure 5). Patients were required to be enrolled in the primary care database with at least 1 year of medical history before the CED. They were excluded if the first antidiabetic drug prescribed was insulin, they were younger than 40 years on the CED, or they had a history of bladder cancer ever recorded before the CED. The covariate assessment period included all available data before the CED. Follow-up started on the CED and continued until censoring at the first of incident bladder cancer, death, disenrollment, or end of the study period. Control patients were risk-set matched on year of cohort entry, duration of follow-up (from cohort entry), age, and sex.

[Figure 5. Follow-up window: days [0, censor], censored at the first of incident bladder cancer, death, disenrollment, or end of study period. ED = event date.]

DISCUSSION
In this article, we focus on use of graphical representation to clearly communicate design decisions made when generating evidence from administrative and clinical data that were collected as part of routine care, not for research purposes. We provide examples of graphical representation for different study designs using standardized structure and terminology. The figures in this article are freely available for download from drugepi.org, and users can adapt them as needed. We look forward to user experiences and suggestions for improvement.
Visualization of study design is a powerful communication tool that provides a clear and concise summary of study implementation details. It can help consumers of pharmacoepidemiologic and pharmacoeconomic evidence assess how that evidence was generated, but it does not remove the need for examination of study strengths and limitations, including measurement issues, which should be discussed in reports. Recent publications of database studies have provided informative graphs that show key aspects of longitudinal study designs, varying from high-level conceptual descriptions to information-packed diagrams (31, 33, 34). These diagrams depict critical temporal aspects with clarity. Once the basic temporal aspects of a study are understood, it is easier to comprehend the longer prose typically used to describe study design in detail.
We believe that a widely used framework with a common structure and terminology for graphical representation of database study designs would promote clearer understanding of database research. This framework would encourage researchers and reviewers to think systematically about time-related aspects in the context of typical study designs when designing studies or preparing manuscripts. It would also help readers understand critical temporal aspects of a longitudinal database study. Ultimately, these factors support the confidence of decision makers in evidence generated from nonrandomized database studies.

Disclosures: Dr. Schneeweiss reports grants from the U.S. Food and Drug Administration, the Patient-Centered Outcomes Research Institute, and the National Institutes of Health during the conduct of the study and personal fees from WHISCON, equity in Aetion, and being principal investigator of research contracts to Brigham and Women's Hospital from Bayer, Vertex, Boehringer Ingelheim, and the Arnold Foundation outside the submitted work. In addition, Dr. Schneeweiss has a patent for a database system for analysis of longitudinal data sets, with no royalties paid. Dr. Rassen reports that he is an employee of and has an ownership stake in Aetion outside the submitted work. Dr. Murk reports that he is an employee of and holds stock options in Aetion outside the submitted work. Dr. Wang reports being principal investigator on research contracts to Brigham and Women's Hospital from Boehringer Ingelheim, Novartis, Johnson & Johnson, and the Arnold Foundation outside the submitted work. She is also a consultant to Aetion for unrelated work. Dr. Arlett is a full-time employee of the European Medicines Agency. Dr. Dal Pan is a full-time employee of the Food and Drug Administration. Authors not named here have disclosed no conflicts of interest. Disclosures can also be viewed at www.acponline.org/authors/icmje/ConflictOfInterestForms.do?msNum=M18-3079.
The CED was defined as the date of MMR administration. Eligible CEDs required no vaccinations from the immunization schedule and no diagnoses of febrile seizures recorded in the preceding 56 days. Only incident outcomes were included in the analysis, defined as the first inpatient or emergency department code for seizure after 56 days without any codes for seizure. The analytic follow-up windows in which children were considered to be at risk for seizure were days 7 to 10 after MMR vaccination (the hypothesized exposure risk window) and days 14 to 56 after MMR vaccination (the reference window). Days 1 to 6 were treated as induction time, before a biological effect of vaccination on seizure plausibly begins, and days 11 to 13 served as a washout for potential carryover exposure effects. The analysis conditioned on the individual, making comparisons within person, and accounted for the differential person-time in the follow-up windows.
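The time axis of this self-controlled design can be sketched by classifying each post-vaccination day into its analytic window and tallying the person-time each window contributes. The function below is our illustration of the windows described above, not code from the study:

```python
def window(days_since_mmr):
    """Classify a day relative to MMR vaccination (day 0 = vaccination date)
    into the analytic windows described above."""
    if 1 <= days_since_mmr <= 6:
        return "induction"   # before a biological effect is plausible
    if 7 <= days_since_mmr <= 10:
        return "risk"        # hypothesized exposure risk window
    if 11 <= days_since_mmr <= 13:
        return "washout"     # potential carryover exposure effects
    if 14 <= days_since_mmr <= 56:
        return "reference"
    return "outside"

# Person-time per analytic window; the within-person rate comparison must
# account for the windows' unequal lengths (4 risk days vs. 43 reference days).
person_days = {w: sum(window(d) == w for d in range(1, 57))
               for w in ("risk", "reference")}
```

A within-person rate ratio would then compare events per risk-window day against events per reference-window day, which is what "accounted for the differential person-time" refers to.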