Biosimilar monoclonal antibodies: the scientific basis for extrapolation.

Introduction: Biosimilars are biologic products that receive authorization based on an abbreviated regulatory application containing comparative quality and nonclinical and clinical data that demonstrate similarity to a licensed biologic product. Extrapolation of safety and efficacy has emerged as an important way to simplify biosimilar development. Regulatory authorities have generally reached the consensus that extrapolation of similarity from one indication to other approved indications of the reference product can be permitted if it is scientifically justified. Areas covered: Recently, the first biosimilar, biosimilar infliximab (Remsima/Inflectra) to the innovator monoclonal antibody infliximab (Remicade), was approved in the European Union, Canada and South Korea; the USA subsequently approved its first biosimilar, a less complex molecule (filgrastim-sndz). Based on two clinical trials of biosimilar infliximab in patients with rheumatoid arthritis and ankylosing spondylitis, the European Medicines Agency allowed extrapolation to all eight approved indications for innovator infliximab, whereas Health Canada did not permit extrapolation to the indications for ulcerative colitis and Crohn’s disease. These differing decisions on extrapolation of indications for biosimilar infliximab highlight important unanswered regulatory and scientific questions. Here, we propose substantive scientific considerations for indication extrapolation. Expert opinion: The preclinical and clinical criteria that are currently required to merit indication extrapolation have not been rigorously evaluated.

Introduction
A biosimilar is a biologic product authorized on the basis of an abbreviated regulatory application containing comparative quality and nonclinical and clinical data. The application ultimately must show the following, relative to a biologic product that is already licensed (i.e., the reference product): similar physicochemical characteristics and biologic activity; comparable pharmacokinetics; comparable pharmacodynamics, when there is a relevant pharmacodynamic indicator; and similar safety, efficacy and clinical immunogenicity [1]. The extrapolation of clinical safety and efficacy has emerged as an important way to simplify the development process for biosimilars and enable truncated (and less expensive) dossiers. Indeed, regulatory authorities have generally reached consensus that similar safety and efficacy do not need to be shown in every reference product indication. Instead, they will permit extrapolation in certain cases if certain points can be demonstrated and if extrapolation is justified scientifically (Table 1) [1-7].
When considering the scientific justification for extrapolation, regulators have generally emphasized that the reference ('tested') indication and the extrapolated ('target') indications should have comparable patient populations, that the diseases in question should have comparable pathogenesis, and that the two biologics should either have the same mechanism of action and/or receptor in each indication or that relevant data (e.g., pharmacokinetics and pharmacodynamics for each of the indications) should be provided [5,8]. Regulators also generally expect that the tested indication is in a population that is sufficiently sensitive, and preferably the most sensitive, to detect clinically meaningful differences between the biosimilar and the reference product. Similarly, comparative clinical data should be collected in the tested indication using clinical end points that are sensitive enough to elucidate such differences. The scientific rationale is to maximize the opportunity to predict differences in clinical profile, and the inability to detect differences in the most sensitive clinical model would increase confidence in the demonstration of biosimilarity.
Unlike the relatively simple biologics (e.g., erythropoietin, filgrastim, somatropin) that made up the first wave of biosimilars, monoclonal antibodies have multiple biologic functions, are used for purposes (such as TNF inhibition) that are normally not performed by antibodies and are directed to intervene locally in disease processes that are complicated and not very well understood (Figure 1). Regulators have now begun to approve biosimilar monoclonal antibodies. Specifically, the European Medicines Agency (EMA) and Health Canada recently authorized biosimilar infliximab (Inflectra and Remsima) as biosimilar to innovator infliximab (Remicade). The US FDA has not yet licensed a biosimilar monoclonal antibody, but it licensed its first biosimilar, filgrastim-sndz, in March 2015 [9]. This paper discusses the scientific basis of extrapolation, with emphasis on the second wave of biosimilars. The recent regulatory approvals of biosimilar infliximab demonstrate the importance of the underlying scientific questions and underscore the need to tailor the approach toward the unique characteristics of these more complex biologics.

Background
When public discussion of possible pathways to market for follow-on biologics began in earnest in the early 2000s, it quickly became clear that the generic pathway for regulatory approval of nonbiologic drugs would not be suitable. The generic pathway permits approval of abbreviated applications for chemically synthesized, small-molecule drugs based on the demonstration of pharmaceutical equivalence and bioequivalence to their reference products. Two products are considered pharmaceutically equivalent if their active ingredients are identical and their routes of administration, dosage forms and strengths are the same. They are bioequivalent if they are comparably bioavailable (i.e., have the same rate and extent of absorption) after administration in the same molar dose. For systemically active drugs, this often requires a comparison of pharmacokinetic characteristics. When the test and reference products are pharmaceutically equivalent and bioequivalent, they are considered therapeutically equivalent in all indications.
Biologic products differ in important respects from chemically synthesized, small-molecule drugs [10]. They are produced in cell systems or from other biological sources (e.g., blood products). The host cell type influences the properties of the product; the subsequent isolation and production processes can greatly influence the quality, safety and even efficacy of the finished product. Other key differences between nonbiologic and biologic drugs are the considerably larger molecular size of biologics and their complicated structure. Process-related impurities combine with naturally occurring modifications and inevitably lead to product heterogeneity, such as isoforms that differ in the degree or pattern of glycosylation or other post-translational modifications [11]. Some isoforms may be considered impurities and could alter the efficacy, safety and immunogenicity of the product [12]. One of the most important differences between nonbiologic drugs and biologic drugs, however, is the immunogenic potential of the latter. While a few nonbiologic drugs can trigger antibody formation, nearly all biologics do. These anti-biologic antibodies can have a variety of impacts, ranging from no clinical effect to interference with efficacy to serious or even fatal adverse effects [11], including pure red-cell aplasia [13]. Immunogenicity can be triggered by seemingly minor changes in the raw materials or manufacturing process that alter the finished product or introduce impurities [14,15]; analytical and animal testing cannot fully predict the immunogenic response in humans. The potential for immunogenicity lies at the heart of the expectation that all biologic and biosimilar sponsors perform at least some premarket testing in humans, because immunogenicity not only serves as a proxy safety signal (e.g., for infusion reaction, anaphylaxis and development of anti-biologic antibodies), but is also a negative marker for a biological response and maintenance of that response.

Biosimilarity
European regulators developed the concept of similar biologic medicinal products, or biosimilars, in the early 2000s. The pathway for marketing authorization of biosimilars is described in EU legislation, but the requirements are laid out in various overarching and product-specific guidelines issued by the Committee for Medicinal Products for Human Use, which is a scientific committee of the EMA [16-18]. To request marketing authorization for a biosimilar, the manufacturer must submit a full-quality dossier with a comparison of the physicochemical and in vitro biologic characteristics of multiple batches of the biosimilar and the reference product. Comparative preclinical and clinical data are required, following a stepwise model, although the preclinical requirements may be modest. The European requirements have been profoundly influential around the world, including in the USA, and these requirements formed the basis for the 2009 World Health Organization guidelines [5].

Extrapolation
The biosimilar marketing application is more extensive than the generic nonbiologic drug application, but it is truncated in comparison with a full biologic marketing application. Truncation may take the form of an abbreviated preclinical dossier. Typically, regulators do not require a full toxicity program. There is also truncation in the clinical development program. A regulator might not require clinical outcomes for efficacy testing for one or more indications if sensitive pharmacodynamic biomarkers exist and no residual uncertainty remains. Extrapolation between disease conditions and populations may be allowed, if it is scientifically justifiable. Indeed, all major regulators that have developed modern biosimilar pathways have indicated that they will permit extrapolation of indications if it is justified on scientific grounds [6,19]. Based on the regulatory guidelines of the key jurisdictions and the comments of stakeholders available in the public domain, there appears to be a consensus that justifications for extrapolation should be grounded in science and predicated on similarity in the pathogenesis of the disease in the tested and target indications. Regulators also expect similarity in the mechanism of action for the two indications, including the target(s) or receptor(s) for each relevant activity or function of the product; the binding, dose or concentration response and pattern of molecular signaling upon engagement of the target(s) or receptor(s); the relationship between the product structure and its interactions with the target(s) or receptor(s) and the location and expression of the target(s) or receptor(s). Also, regulators expect similarity in pharmacokinetics and drug substance distribution among the different patient populations. 
The scientific justification should also address differences in expected toxicities between the conditions of use and the patient populations under consideration, as well as any other factors that might affect the safety or efficacy of the product in each condition of use and in each patient population [1,8].

First wave of biosimilars
The EMA has evaluated the greatest number of biosimilars to date and therefore has the most experience with extrapolation in practice. The EMA website includes a European Public Assessment Report (EPAR) for each approved biosimilar; the EPAR explains the contents of the application, as well as the agency's evaluation of those contents and its conclusions with respect to extrapolation [20]. For the first wave of biosimilars (filgrastim, somatropin and epoetin alfa), extrapolation of efficacy was straightforward. Specifically, safety and efficacy in the tested indications were considered sufficient justification for extrapolation to other indications. The EMA has generally permitted extrapolation [21]; indeed, the EMA has denied extrapolation only twice. The agency rejected the extrapolation of immunogenicity data of epoetin alfa biosimilars from an immune-compromised, intravenously treated population to an immune-competent, subcutaneously treated population because of a close link between pure red cell aplasia and a subtle formulation change that led to increased immunogenicity [18]. Based on a similar concern regarding the difference in sensitivity of the population, the EMA also restricted extrapolation of immunogenicity data from one patient population to another for filgrastim. This issue was later resolved by requiring a post-marketing commitment to pharmacovigilance [17].

Second wave of biosimilars
The first wave of biosimilars presented a comparatively easy case for extrapolation. The biologics in question are homologues of endogenous factors with a physiologic role. Their therapeutic use is either supportive or supplementary to an endogenous physiologic process, and their biologic activity does not modify the course or pathogenesis of a specific disease. Furthermore, these biosimilars exert their activity with the same mechanism of action and at the same body sites, independent of the underlying disease. In many cases, it is relatively straightforward to define the most sensitive clinical model, including population and end points, and validated pharmacodynamic efficacy markers exist for several of the indications.
However, the situation is different for most of the monoclonal antibodies, including the TNF inhibitors (Figure 1). Recombinant, therapeutic monoclonal antibodies are not physiologic regulators of normal biological responses. Nearly all therapeutic monoclonal antibodies are used as disease-modifying agents, and they may exert their modifying activities at different sites of action depending on the condition. For example, the mechanisms that underlie the efficacy of anti-TNF antibodies in gastrointestinal diseases (e.g., Crohn's disease and ulcerative colitis) may not be expected to be the same as the mechanisms underlying efficacy in psoriasis, in part due to differences in the types of cells and/or cytokine profiles involved in the pathogenesis of the diseases. Furthermore, diseases treated with monoclonal antibodies are often complex, and varying subpopulations of patients may react differently to treatment. Comorbidities in the varying patient populations further complicate the clinical picture. The choice of end points for the biosimilarity exercise and the definition of acceptable margins and confidence intervals can be problematic. If monoclonal antibodies have only modest efficacy, as seen with some oncology products, the level of confidence is reduced for the tested indication. Finally, current knowledge of the mechanisms of action of many antibodies, including the anti-TNF monoclonal antibodies, is extremely limited. It is critical for extrapolation that the biosimilar have a mechanism of action similar to that of the reference product, to the extent that it is known, in the most sensitive population, even if there are no residual uncertainties from the analytical data.

Biosimilar infliximab
The EMA and Health Canada recently approved biosimilar versions of innovator infliximab. The marketing authorization holder for innovator infliximab supported eight indications: rheumatoid arthritis, adult Crohn's disease, pediatric Crohn's disease, ulcerative colitis, pediatric ulcerative colitis, ankylosing spondylitis, psoriatic arthritis and psoriasis [22,23]. Although the biosimilars Inflectra and Remsima were proposed by different marketing authorization holders, they share an active substance and dossier (manufactured by the same company in South Korea). This dossier contained data from only two clinical studies [24,25]. The first was a Phase I pharmacokinetic study in patients with ankylosing spondylitis [25]. The second was a Phase III safety and efficacy study in patients with active moderate-to-severe rheumatoid arthritis [24]. The EMA authorized the product for all eight innovator infliximab indications (Table 2) [22]. In contrast, Health Canada approved biosimilar infliximab for only four of the eight innovator infliximab indications, permitting extrapolation from rheumatoid arthritis and ankylosing spondylitis only to psoriasis and psoriatic arthritis [10,26]. Notably, although the EMA granted all eight indications, the marketing authorization holder of Remsima is currently conducting a comparative trial in patients with active Crohn's disease. While the forthcoming trial is referenced in the EPAR as part of the approved Risk Management Plan, it is unclear what role the sponsor's agreement to conduct this trial played in the EMA's decision to permit the extensive extrapolation of data.
The focal point of discussion regarding extrapolation was the possible difference in the mechanism of action of the drug in the inflammatory bowel disease (IBD) indications, which potentially involve Fc-mediated effector functions, compared with the rheumatologic and dermatologic indications [10,26,27]. The biosimilar was found to have a difference in glycosylation in the Fc part of the antibody molecule, in particular at the level of afucosylated glycans [28]. This analytical observation was translated into a significant difference in a sensitive in vitro functional assay for one Fc-effector-related function (i.e., antibody-dependent cellular cytotoxicity [ADCC]). This difference was not detected in a subsequent, much less sensitive in vitro test model, which was, however, deemed 'closer' to the human situation and therefore considered more relevant [28]. In addition, notwithstanding the fact that ADCC is stated as a secondary mechanism of action, it was also noted by the EMA that other analytical data suggest that the specific crystallizable fragment γ receptor IIIa (FcγRIIIa) may not significantly affect ADCC or monocyte/macrophage activity and that, to date, there are no published reports describing the induction of ADCC by TNF antagonists in patients [28]. By way of contrast, after review of the sponsor's rationale and the literature regarding the mechanism of action, Health Canada concluded that ADCC could not be ruled out as a mechanism of action in the IBDs. Their position on ADCC was further rationalized by the observation that certolizumab pegol, another anti-TNF that lacks the ability to induce ADCC, displays only marginal efficacy in patients with Crohn's disease compared with other anti-TNFs, including infliximab.
Finally, Health Canada noted differences in the safety profile of infliximab between the tested populations and patients with IBD, in particular, the risk of hepatosplenic T-cell lymphoma, which appears to be uniquely associated with adolescent and young adult patients with IBD [26]. In summary, the specific example of biosimilar infliximab illustrates the difficulty of adapting scientific principles that were previously formulated for smaller, 'simpler' biologics to the more complicated situations with disease-modifying monoclonal antibodies that treat complex and incompletely understood pathophysiology, with often unknown or poorly defined mechanisms of action. Nevertheless, we believe that it is possible to define solid criteria that should improve the scientific basis of indication extrapolation.

A scientific approach to extrapolation
Before enumerating the scientific issues surrounding extrapolation, an additional point needs to be made regarding reliance on quality comparisons. Most regulatory guidelines define the analytical biosimilarity exercise as the foundation of extrapolation; in other words, extrapolation will be permitted only if the biosimilar does not differ from the reference product in physicochemical and in vitro characteristics that may be of clinical relevance. In our view, this raises two important questions that will need to be addressed by the scientific and regulatory communities.
First, two biologics will always differ analytically, for example, in glycosylation patterns and level of impurities. Indeed, as the sensitivity of analytical technology improves, more analytical differences between biologics and their biosimilars will be detectable. It is therefore essential to define an extensive set of quality attributes, especially critical quality attributes (CQAs) for each reference product, and to define the acceptable margins and differences. According to the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use Q8, a CQA is defined as "a physical, chemical, biological, or microbiological property or characteristic that should be within an appropriate limit, range, or distribution to ensure the desired product quality" [29]. Determining which of many quality attributes could affect the efficacy and safety of a particular biologic is an important stage in developing and manufacturing a biosimilar [30]. The importance of each CQA is based on the severity of consequences associated with failing to control the attribute (impact) and the amount and relevance of available information (uncertainty) [30]. Defining fixed, validated ranges of acceptable values for CQAs via iterative testing is a key part of developing the manufacturing methods for a biosimilar. In our view, the process of establishing CQAs for biosimilars as compared with innovator products has not been fully completed for the anti-TNFs.
Second, it is often argued that clinical testing is much less sensitive than analytical methodology and therefore less relevant for, or only confirmatory to, the biosimilarity exercise. It could be posited that, if a biosimilar meets analytical specifications, clinical data are unlikely to identify shortcomings in its efficacy and safety. Arguing against this view is that clinically meaningful differences, particularly immunogenicity, are ultimately the most important and, in nearly every case, it will be impossible to totally exclude clinical differences on the basis of analytical and in vitro functional testing alone. Instead of acknowledging this status quo, we explored ways to increase the sensitivity of the clinical testing model as well as other data to further increase confidence in extrapolation. Rather than adopting the principle that differences are not clinically relevant unless proven otherwise, we would generally assume that any difference may be clinically relevant unless proven or justified otherwise. Reversal of the presumption is in the best interest of patients; it also follows logically from increased reliance on sensitivity of the analytical methodology. With regard to extrapolation, any differences observed during biosimilar development, whether in vitro or in humans, naturally add to the residual uncertainty and may warrant additional data or, ultimately, lead to decisions against full or partial extrapolation.

Mechanism of action

Disease pathogenesis
If the disease pathogenesis is unknown, it will be difficult to justify the extrapolation of the efficacy data from the tested indication to the target indication because it will be impossible to show that the biologic has a similar mechanism of action in the different indications.
Biologics, especially monoclonal antibodies, have specific targets, and differences in efficacy among multiple indications may relate to the heterogeneity of patient populations across the diseases as well as differential sensitivities of patients within specific disease populations. Because the full pathogenesis of diseases treated by biologics is complex and not fully elucidated, it should be necessary to demonstrate that the biosimilar interacts with key components of a particular disease pathway through a similar mechanism as the reference product. Furthermore, regulatory authorities should carefully examine the analytical and pathogenesis data for biosimilars to identify properties (e.g., effector function and tissue distribution) that could affect clinical efficacy and recommend that these issues be addressed with appropriate research methods.

The mechanism of action of both indications should be known
The complexity of biologics leads to the conclusion that the mode of action of both tested and target indications should be well understood. Biologics are always mixtures of natural and process-related modified proteins. Even if the mechanism of action of a monoclonal antibody is restricted to the prototype molecule, monoclonal antibodies have multiple biological functions. They react with antigens through the complementarity-determining regions of the antibody molecule; however, there are also a number of Fc-related functions that may be important for the mechanism of action of the product. Finally, the relative contribution of all these biological activities to the clinical effect in different indications may vary.

Pharmacokinetic and local bioavailability data at the sites of action
The preceding requirements represent a high bar that is often unattainable for complex biologics. So, how can insufficient knowledge about the disease pathophysiology or mechanism of action be augmented? For disease-modifying biologics that exert their effect at the site of disease, comparative pharmacokinetic data in different patient populations may support extrapolation to conditions with a different site of action, although the simple presence of the drug at a particular site does not establish effectiveness in different disease states with differing mechanisms of action. Pharmacokinetic data will be important for monoclonal antibodies because their tissue distribution and penetration depend on glycosylation and other physicochemical properties of the antibody. Variations in glycosylation patterns may alter the pharmacokinetic profiles and influence the biological activity by leading to different levels of the active ingredient at the relevant sites of action in the various indications [31,32]. For example, in the case of extrapolation of anti-TNFs from rheumatologic to dermatologic diseases, there is no evidence to suggest different mechanisms of action, other than Fab-related TNF binding, but the relevant pathophysiology and sites of action are clearly different.
The scientific rationale implies a need for clinical efficacy and safety trials when relevant supporting data are absent for a target indication or if significant uncertainties exist regarding the pathophysiology or the mechanism of action in that specific indication. This holds regardless of whether the indication in question is for the most sensitive population.

Sensitivity of the clinical test model
As leading regulatory agencies have noted in the guidelines (Table 1), the sensitivity of the tested indication is critical. Clinical testing should maximize the chance to detect a difference in efficacy and safety (including immunogenicity) between the innovator's product and the biosimilar. Further, the most sensitive population among indications may differ in terms of safety or efficacy. For example, when studied in cells from patients with Crohn's disease, biosimilar infliximab bound less avidly than innovator infliximab to FcγRIIIa if the genotype was V/V and V/F at amino acid residue 158; binding of the two products was similar for the F/F genotype [28]. These results may suggest that patients with Crohn's disease and the V/V or V/F genotype of FcγRIIIa may respond differently to biosimilar infliximab than patients with the wild-type or F/F genotype. Although, in principle, the differences in binding between biosimilar infliximab and innovator infliximab may require separate trials in different patient populations, these additional studies may be incompatible with the abbreviated nature of biosimilar development. In such cases, it will be critical to define one or more populations that are the most sensitive or, at a minimum, sufficiently sensitive, to enable a comparative clinical evaluation of all relevant parameters.
There are four critical elements that collectively determine the sensitivity of the clinical test model: population type (in relationship to effect size), end point(s), dosage(s) and time point(s). We will discuss each of these elements in the next subsections.

Effect size
The biologic effect in the tested indication should be substantial, and the average effect size should not vary widely within the selected patient population. Some complex biologics, especially those used in cancer therapy and in combination therapy, will not meet these conditions for all indications. For example, comparison of the placebo-adjusted response rates from Remicade trials across the six adult indications demonstrates that rheumatoid arthritis is associated with the lowest treatment effect size (Figure 2). Therefore, rheumatoid arthritis is a less sensitive clinical model to detect differences in efficacy between the originator and biosimilar product than indications with higher treatment effect sizes, such as psoriasis, psoriatic arthritis and Crohn's disease [6,19].
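The link between effect size and sensitivity can be illustrated with a rough calculation. The sketch below is not taken from any of the cited trials: the placebo rate, the two effect sizes and the 20% loss of effect are hypothetical values chosen for illustration, and the sample-size formula is the standard normal approximation for detecting a difference between two independent proportions rather than a formal equivalence design. The function names `n_per_arm` and `response_rates` are likewise introduced here for illustration only.

```python
import math
from statistics import NormalDist


def n_per_arm(p_ref, p_test, alpha=0.05, power=0.80):
    """Patients per arm needed to detect p_ref vs p_test (two-sided test,
    normal approximation for two independent proportions)."""
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    variance = p_ref * (1 - p_ref) + p_test * (1 - p_test)
    return math.ceil(z_total ** 2 * variance / (p_ref - p_test) ** 2)


def response_rates(placebo_rate, effect_size, retained_fraction):
    """Response rates for the reference product and for a hypothetical test
    product that retains only part of the placebo-adjusted effect."""
    p_ref = placebo_rate + effect_size
    p_test = placebo_rate + retained_fraction * effect_size
    return p_ref, p_test


# Hypothetical scenarios: same placebo rate and same 20% loss of effect,
# but a small versus a large placebo-adjusted treatment effect.
for label, effect in (("small effect", 0.30), ("large effect", 0.60)):
    p_ref, p_test = response_rates(0.20, effect, retained_fraction=0.80)
    print(f"{label}: gap = {p_ref - p_test:.2f}, "
          f"n per arm = {n_per_arm(p_ref, p_test)}")
```

Under these assumed numbers, the same proportional loss of effect produces a smaller absolute gap in response rates when the treatment effect is small, so more patients are needed to detect it. The comparison also depends on the variability of the end point, which this sketch captures only through the binomial variance.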

End point(s)
When possible, the clinical end points that show the highest treatment effect size (i.e., most sensitive end points) should be used when comparing the effects of the innovator product with those of the biosimilar. The Psoriasis Area and Severity Index is a good example of a sensitive end point; however, the most sensitive end point may not always be a composite index. It may not always be feasible for a clinical trial with a biosimilar to use the same clinical end point as the innovator product; this is acceptable when the treatment effect size is larger with an alternative end point or when practical considerations (e.g., trial duration or treatment group sizes) prohibit use of the same end point as with the innovator product.
Ideally, clinical trials would include pharmacodynamic markers or biomarkers because they are often more sensitive than clinical end points and therefore more suitable to detect potential differences between two products. Unfortunately, there is currently a lack of validated biomarkers in the case of complex biologics such as anti-TNFs, and their clinical usefulness is limited in diseases such as autoimmune disorders. For example, biomarkers often measured in patients include C-reactive protein and rheumatoid factor, but it is unclear at present whether their serum levels correlate with the effect of the drug, the active state of the disease, another immune event such as infection or all of these. Clearly, end points including biomarkers should be correlated to relevant clinical outcomes and properly validated, including their capacity to show relevant differences in potency and safety. Although this might be possible for efficacy biomarkers in the future, currently it seems challenging to identify a single biomarker, or even a combination of biomarkers, that could reliably predict clinical safety with such an extensive set of events, as well as variability among diseases (see the discussion on risk of hepatosplenic T-cell lymphoma in young patients with Crohn's disease in the section on biosimilar infliximab).

Dosage(s)
Another aspect influencing the sensitivity of the tested indication is the dose-response curve of biologics, which is typically not linear but rather plateaus at a high dose. This means there is a wide range of doses (in the plateau) where there is little or no difference in effect. If the biosimilar and reference product are compared in this range of doses (i.e., in the plateau at the top of the dose-response curve), significant differences in potency between the two products may not be detected. Such differences may not be clinically meaningful in this part of the curve, but they could be clinically meaningful in other parts of the curve (e.g., in case of different dosages in other indications). Because the biosimilarity exercise is aimed at detecting differences rather than establishing efficacy and safety per se, dose selection for comparative trials should be guided by this consideration, and the selected dose should not fall on the flat part of the dose-response curve.

Figure legend: Treatment effect = placebo-adjusted difference in the percentages of patients achieving the indicated end point. Adapted from [6]. The sources for the data in the figure with the column number for each indication, counted from left to right, are listed. CD studies: columns 1-2, Targan et al. [39]; column 3, Hanauer et al. [40]; column 4, Present et al. [41]; column 5, Sands et al. [42]. UC studies: Rutgeerts et al. [43]. RA studies: columns 1-3, Lipsky et al. [44]; columns 4-6, St Clair et al. [45]. AS study: van der Heijde et al. [46]. PsA studies: columns 1-3, Antoni et al. [47]; columns 4-6, Antoni et al. [48]. Ps studies: column 1, Reich et al. [49]; column 2, Menter et al. [50]; column 3, Gottlieb et al. [51]. ACR20, ACR50 and ACR70 indicate improvements of at least 20, 50 and 70%, respectively, in ACR score. ASAS20, ASAS50 and ASAS70 indicate improvements of at least 20, 50 and 70%, respectively, in ASAS score. PASI75 indicates an improvement of at least 75% in PASI score.
Moreover, biologic products are frequently 'overdosed', and the precise dose-response relationship is often not fully established for every indication. To observe a difference in potency between the biosimilar and the reference product, the comparison should be made at the lowest effective dose on a thoroughly characterized dose-response curve for the tested indication. As a result, for most biologics, a sensitive dose on the steep part of the dose-response curve will be subtherapeutic. Although this problem may be overcome relatively easily in Phase I pharmacokinetic testing in healthy volunteers (who are often a more sensitive test model than patients in any case), it is much more difficult to overcome in Phase III efficacy and safety trials, where it raises ethical concerns. It may be very challenging in such cases to combine the optimal population, end point and dose into one 'most sensitive' clinical trial.
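The masking effect of the plateau can be made concrete with a simple hyperbolic Emax model. This is a minimal sketch and not a model from the article: the maximal effect, the ED50 values and the assumed 20% lower potency of the test product are hypothetical numbers chosen only to illustrate the shape of the argument.

```python
def emax_response(dose, emax, ed50):
    """Hyperbolic Emax model: effect rises with dose and then plateaus."""
    return emax * dose / (ed50 + dose)


# Hypothetical parameters, for illustration only.
EMAX = 100.0            # maximal effect (arbitrary units), same for both products
ED50_REFERENCE = 1.0    # dose giving half-maximal effect, reference product
ED50_TEST = 1.25        # assumed lower potency (higher ED50), test product


def effect_gap(dose):
    """Difference in effect (reference minus test product) at a given dose."""
    return emax_response(dose, EMAX, ED50_REFERENCE) - emax_response(dose, EMAX, ED50_TEST)


if __name__ == "__main__":
    for dose in (0.5, 1.0, 2.0, 10.0, 50.0):
        print(f"dose={dose:5.1f}  gap in effect={effect_gap(dose):5.2f}")
```

With these assumed values, the same potency difference produces a gap of several effect units near the ED50 but only a fraction of a unit at doses far into the plateau, which is why a comparison at a plateau dose can fail to detect it.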

Time point(s)
Much as setting the therapeutic dose at the plateau level of the dose-response curve may mask potential clinical differences, the time points used in placebo-controlled trials may not be the most suitable for comparative studies of biosimilars. As stated previously, the aim for novel drugs is to independently establish superior efficacy and/or safety compared with placebo or the current standard of care. For chronic diseases such as autoimmune disorders, the clinical response is often slow and does not plateau until several months after treatment initiation. In addition, by the very nature of these diseases, one would naturally be most interested in clinical end points that directly assess or are correlated with long-term outcomes. However, for comparative biosimilar trials, earlier time points, at the beginning of the response to treatment, would in fact represent a much more sensitive test to detect differences [33]. Although the early time points could serve as primary or co-primary end points, the more traditional study end points should be included because most Phase III studies will need to last at least 6-12 months to collect sufficient long-term safety and, in particular, immunogenicity data.
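The same reasoning can be sketched for time points. The example below assumes, purely for illustration, that clinical response approaches a common plateau exponentially and that the test product has a slightly slower onset than the reference product. The functional form, the plateau and the rate constants are hypothetical and do not come from the article or from any trial.

```python
import math


def response_at(week, plateau, rate_per_week):
    """Response rising exponentially toward a plateau over time."""
    return plateau * (1.0 - math.exp(-rate_per_week * week))


# Hypothetical parameters, for illustration only.
PLATEAU = 60.0          # long-term response level (e.g., % responders), same for both
RATE_REFERENCE = 0.20   # onset rate per week, reference product
RATE_TEST = 0.16        # assumed slower onset, test product


def response_gap(week):
    """Difference in response (reference minus test product) at a given week."""
    return response_at(week, PLATEAU, RATE_REFERENCE) - response_at(week, PLATEAU, RATE_TEST)


if __name__ == "__main__":
    for week in (2, 4, 8, 14, 30, 54):
        print(f"week={week:2d}  gap in response={response_gap(week):5.2f}")
```

Under these assumptions, the gap between products is clearly visible in the first weeks and becomes negligible once both have reached the plateau, so a late time point alone would be an insensitive comparison.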

Extrapolation of immunogenicity data
Understanding the immunogenicity of any biologic is important, and biosimilars are no exception. Long-term antibody formation data should be available for the reference indication because most patients do not begin producing anti-drug antibodies (ADAs) to a biologic until 6-12 months after treatment initiation [34,35]. In one study, anti-infliximab antibodies were observed in 44% of patients (33/75) with rheumatoid arthritis 6 months after treatment had begun; in another study, 15% of patients (64/442) with Crohn's disease had anti-infliximab antibodies at week 54 following repeated infliximab treatments beginning at week 0 [34,35]. Accordingly, to obtain full insight into the immunogenicity profile of a proposed biosimilar, comparative ADA data generated after at least 1 year of treatment in the tested indication are essential. This more complete insight into the immunogenicity profile of the biosimilar in the tested indication is necessary before any conclusions can be drawn about immunogenicity in other indications. The extrapolation of immunogenicity potential across indications may be further confounded by the presence or absence of concomitant immunosuppressants; for example, methotrexate decreases the incidence of ADAs to monoclonal antibodies such as infliximab and adalimumab (Figure 3A and B) [19,23,36]. For infliximab, rates of antibody formation are higher in patients treated for psoriasis and Crohn's disease (after a drug-free interval) and lower for patients with rheumatoid arthritis or Crohn's disease receiving concomitant immunosuppressant therapy [23]. Additionally, the assays used to assess immunogenicity in the tested indication should employ current technologies and be fully validated and highly sensitive, in accordance with the relevant regulatory guidelines and scientific consensus.
The measurement of ADAs is complex, in part owing to methodological issues (e.g., the degree of nonspecific binding), and can be complicated by the presence of drug in the serum of patients [37]. Furthermore, development of immunogenicity is slow, and long-term exposure to the therapy may lead to tolerance or, conversely, to higher levels of antibodies [37]. The assays should be sensitive enough to discern subtle differences between similar products so that the results are useful for extrapolation.

Other factors impacting residual uncertainty regarding extrapolation
The previous sections discuss and provide suggestions for addressing the uncertainties caused by the lack of understanding of the pathophysiology and/or mechanism of action of biosimilars. Besides improvement of the sensitivity of the clinical testing model, other information or circumstances may affect decisions around indication extrapolation by regulators and the subsequent confidence of the medical and scientific communities, including experience with the reference product, the tested indication, the availability of a safety database and post-approval studies.

Experience with the reference product in the tested indication
The reference biologic should have been used for the tested indication for a sufficient period of time in clinical practice. This provides a robust body of clinical data about the reference product to which a biosimilar can be compared. More extensive and meaningful real-world safety data help to inform the therapeutic index (i.e., risk-benefit profile) of the tested indication. If the experience base is insufficient to eliminate residual uncertainty about safety in the tested indication, the data will also be insufficient to address the additional uncertainty raised by indication extrapolation.

Prior regulatory approval of the tested indication
Some innovator biologics are not approved and marketed in every major country or are not approved for every use in every country. If the regulator is not familiar with the reference product indication in question (i.e., the reference product indication is approved only in a different regulatory jurisdiction), additional data would be needed from the biosimilar applicant about the use of the reference product for that indication. Moreover, if the indication has not been accepted for the reference product by all major agencies, it again would be incumbent upon the biosimilar applicant to provide justification for extrapolation to that indication. In summary, a biosimilar should be extrapolated to an indication for which the innovator was approved in a well-regulated jurisdiction and sufficient experience exists.

Figure footnote: *Percentages not shown are not available in the US product labeling [36].

A large and reliable safety database for the reference product
Just like extrapolation of efficacy, extrapolation of safety necessitates extensive experience with the use of the reference product in the tested indication, for the reasons noted previously. However, the quality of the safety data is important. Routine pharmacovigilance is based on spontaneous reporting. These data are always selective, rather than representative, due to underreporting. They also often lack full documentation. The biosimilar applicant should therefore work from a reference indication for which there are more reliable postmarketing safety data, for example, from national registries or other registries (e.g., started by a third-party clinical group) or dedicated safety studies. This ensures that the finding of similarity as to the tested indication is grounded in a robust body of data about the reference product and, in turn, ensures that the conclusion of similarity with respect to the tested indication is reliable.

Post-approval studies in the extrapolated indications
The absence of specific clinical data from a randomized controlled trial in extrapolated indications for monoclonal antibodies is currently heavily debated and is expected to remain the subject of controversy for some time. As mentioned previously, in the case of Remsima, a comparative clinical study in patients with Crohn's disease is currently ongoing, and positive data might further increase acceptance of indication extrapolation in the medical community. Although the EMA has been careful to state that any decisions on indication extrapolation cannot be subject to post-approval commitments, there is in fact a precedent for this, as the decision not to extrapolate filgrastim for reasons of immunogenicity was resolved by requiring a post-marketing commitment to pharmacovigilance [17]. In our view, post-approval measures such as long-term, open-label safety studies or patient registries should not be seen as a hurdle. In fact, implementation of such measures should be encouraged to specifically evaluate extrapolated indications. Among other benefits, this would potentially improve the robustness of regulatory decisions around extrapolation, increase the acceptance of biosimilars in the market (in particular, in the target indications) and allow the scientific community to further understand the risk-benefit profile of all biologics.

Conclusions
Extrapolation of indications for biosimilars of therapeutic monoclonal antibodies presents a new set of complex scientific challenges for scientists and regulators. Ultimately, the goal should be to reduce the level of the residual uncertainty about the efficacy and safety of biosimilars and inform scientifically sound, data-driven decisions about full or partial extrapolation.

Expert opinion
For most regulatory guidelines, analytical similarity is the foundation for indication extrapolation; that is, extrapolation will be permitted only when the biosimilar is similar to the reference product in physicochemical and in vitro characteristics that may be clinically relevant. Although analytical data might justify approval of a biosimilar for a specific indication, additional issues should be considered when justifying indication extrapolation. The abbreviated FDA and EMA processes for demonstrating biosimilarity to a reference product are necessary but insufficient for making critical scientific judgments about extrapolating multiple indications for that biosimilar [1,16]. In fact, more recent guidelines from the EMA suggest that applicants include data supporting extrapolation when there is evidence demonstrating different mechanisms of action for a particular claimed indication [8]. For example, regulatory agency approvals of biosimilar infliximab have raised new challenges about making data-driven judgments concerning indication extrapolation. These approvals demonstrate the importance of underlying scientific questions such as mechanism of action and disease pathogenesis, and they underscore the need to refine the indication extrapolation process for the unique characteristics of these more complex biologics. It is critical to establish regulatory pathways for indication extrapolation based on a common evidence framework that includes not only biosimilarity but also relevant data on fundamental issues such as mechanism of drug action, mechanism of pathogenesis and efficacy in sensitive clinical populations. Ultimately, such a framework may allow collection of more uniform data for determining whether indication extrapolation is appropriate.
As biosimilars to complex reference products become more numerous, it will be important to identify key scientific points that should be addressed when considering indication extrapolation. Indication extrapolation should never be the default; rather, it should require key data to augment the physicochemical, efficacy and safety evidence of the biosimilar on a case-by-case basis. As regulatory agencies gain more experience justifying indication extrapolation for biosimilars, it is hoped that gaps in the key data can be identified and subsequently remedied.
It is often argued that, owing to lower sensitivity, clinical testing is less relevant than analytical testing when confirming biosimilarity. It could be posited that, if a biosimilar meets analytical specifications, clinical data are unlikely to identify shortcomings in its efficacy and safety. Arguing against this view is that clinically meaningful differences, particularly in immunogenicity, are ultimately the most important and, in nearly every case, it will be impossible to totally exclude clinical differences on the basis of analytical and in vitro functional testing alone. Instead of accepting this status quo, we consider ways to increase the sensitivity of the clinical testing model as well as other data to further increase confidence in extrapolation. Any differences observed during biosimilar development, whether in vitro or in humans, naturally add to residual uncertainty and may warrant additional testing or, ultimately, lead to decisions against full or partial extrapolation. Developers of biologics and biosimilars should strive to develop more sensitive clinical testing methods, such as clinical study designs that increase effect sizes, biomarkers based on the mechanism of action of a specific monoclonal antibody in a given disease state and methods that improve understanding of pharmacokinetic/pharmacodynamic relationships in clinically sensitive patient populations.