Evaluating a Theory-Based Hypothesis Against Its Complement Using an AIC-Type Information Criterion With an Application to Facial Burn Injury

An information criterion (IC), like the Akaike IC (AIC), can be used to select the best hypothesis from a set of competing theory-based hypotheses. An IC developed to evaluate theory-based order-restricted hypotheses is the Generalized Order-Restricted Information Criterion (GORIC). As for any IC, the values themselves are not interpretable but only comparable. To improve the interpretation regarding the strength of support, GORIC weights and related evidence ratios can be computed. However, if the unconstrained hypothesis (the default) is used as the competing hypothesis, the evidence ratio is affected by neither sample-size nor effect-size when the hypothesis of interest is (also) in agreement with the data. In practice, this means that in such a case strong support for the order-restricted hypothesis is not reflected by a high evidence ratio. Therefore, we introduce the evaluation of an order-restricted hypothesis against its complement using the GORIC (weights). We show how to compute the GORIC value for the complement, which cannot be achieved by current methods. In a small simulation study, we show that the evidence ratio for the order-restricted hypothesis versus the complement increases for larger samples and/or effect-sizes, while the evidence ratio for the order-restricted hypothesis versus the unconstrained hypothesis remains bounded. An empirical example about facial burn injury illustrates our method and shows that using the complement as the competing hypothesis results in much more support for the hypothesis of interest than using the unconstrained hypothesis as the competing hypothesis.

Translational Abstract
In an informative hypothesis, academic expertise (i.e., theory) about the population of interest is included in the hypothesis in terms of order restrictions on the model parameters. As an example, in an ANOVA setting, we might expect that the group means follow a certain order (e.g., $H_1: \mu_1 < \mu_2 < \mu_3 < \mu_4$). As another example, in a linear regression model, we might expect that the (standardized) regression coefficients are subject to multiple one-sided restrictions (e.g., $H_2: \beta_1 > 0;\ \beta_2 > 0;\ \beta_3 > 0$). In the absence of a competing informative hypothesis, an order-restricted hypothesis is typically evaluated against the unconstrained hypothesis $H_u$ (all orderings are allowed). However, a problem arises if the order-restricted hypothesis $H_1$ is in agreement with the data: Then, the estimated parameters for $H_1$ are identical to the estimated parameters of $H_u$. Therefore, we introduce an AIC-type/information-theoretic method for evaluating an informative hypothesis $H_m$ against its complement, where the complement $H_c$ is defined as $H_c$: not $H_m$ (all orderings are allowed except the ordering in $H_m$). The method is illustrated using an empirical example about facial burn injury. In addition, the method is implemented in the user-friendly R package restriktor.

rumination acting as a reminder to the event (Nolen-Hoeksema, Wisco, & Lyubomirsky, 2008). Based on previous research (Van Loey et al., 2014), it is expected that injury characteristics that may be perceived as distressing, such as facial burn injury and larger burns, may be triggers for the activation and prolongation of rumination. In addition, a gender effect is also expected because disfiguring scars resulting from burns may be of greater importance to women as compared with men (Ghriwati et al., 2017). We therefore hypothesized that the means of rumination for men with and without facial burn injury and the mean of rumination for women without facial burn injury would be lower than the mean of rumination for women with facial burn injury. In symbols, this hypothesis can be stated as $H_1: \{\mu_{\text{men; no facial burns}}, \mu_{\text{men; facial burns}}, \mu_{\text{women; no facial burns}}\} \le \mu_{\text{women; facial burns}}$, where the $\mu$'s are the population means for rumination for the four groups determined by gender and facial burns, adjusted for the population effects of some covariates. Note that no particular order is assumed among the first three means. This form of theory-based hypothesis is known as an order-restricted (OR) hypothesis or informative hypothesis (Hoijtink, 2012) because the order of the means is restricted based on theory and/or academic expertise.
To evaluate such an OR hypothesis, three methods can be distinguished, that is, OR hypothesis testing (e.g., Kudô, 1963), model selection using OR information criteria (e.g., Anraku, 1999; Mulder, & Raftery, in press), and model selection using the Bayes factor (e.g., Mulder, Hoijtink, & Klugkist, 2010). In this article, we focus on model selection using information criteria. Akaike's IC (AIC; see, e.g., Akaike, 1973, 1998) is probably the most familiar and widely used information criterion in the social and behavioral sciences. Nevertheless, the AIC is not suitable when the model parameters (e.g., means or regression coefficients) are subject to order restrictions. A modification of the AIC that can deal with simple order restrictions¹ (i.e., the Order-Restricted Information Criterion [ORIC]) in the exponential family was proposed by Anraku (1999). Kuiper, Hoijtink, and Silvapulle (2011) generalized the ORIC (GORIC) to accommodate any linear inequality restrictions in multivariate normal linear models (except for range restrictions, which bound a parameter to a specific interval, e.g., $-1 \le \theta \le 1$). Information criteria like the AIC, ORIC, and GORIC are calculated as minus two times the maximum log-likelihood (under the hypothesized model) plus twice a penalty term value. The main difference between the methods lies in the calculation of the penalty term value, which is less straightforward in the case of order restrictions.
The evaluation of an OR hypothesis (e.g., $H_1$) requires at least one competing hypothesis. Sometimes researchers have another hypothesis of interest and want to know which hypothesis is best, for example $H_1$ versus $H_2: \mu_{\text{men; no facial burns}} \le \mu_{\text{men; facial burns}} \le \mu_{\text{women; no facial burns}} \le \mu_{\text{women; facial burns}}$. The difference between the two is that in $H_2$ the means are fully ordered. In practice, researchers often do not have such a specific competing hypothesis, and only the unconstrained hypothesis $H_u$, where no restrictions are imposed on the model parameters, remains as the competing hypothesis in the set. Therefore, in this article, we focus solely on the set of hypotheses with one OR hypothesis $H_m$ and the unconstrained hypothesis $H_u$.
The hypothesis with the lowest GORIC value is the preferred one. The GORIC values themselves are not interpretable and only the differences between the values can be inspected. To improve the interpretation, so-called GORIC weights ($w_m$) can be computed, which are derived from the Akaike weights (Akaike, 1978; Burnham & Anderson, 2002) and are comparable with posterior model probabilities (see Burnham & Anderson, 2002, pp. 302-305). An IC weight $w_m$ represents the relative likelihood of hypothesis $m$ given the data and the set of $M$ hypotheses (Burnham & Anderson, 2002; Kuiper, 2011; Wagenmakers & Farrell, 2004). For example, if we compare hypothesis $H_1$ against hypothesis $H_u$, we can examine the ratio of the two corresponding weights, that is, $w_1/w_u$. This evidence ratio is considered as the strength of evidence in favor of model $m$ (in this example, $m = 1$) being the best model (Burnham & Anderson, 2002; Wagenmakers & Farrell, 2004).
The evidence ratio should increase for larger samples and/or effect-sizes. However, if the OR hypothesis of interest $H_m$ is in agreement with the data, increasing the sample-size and/or effect-size does not affect the evidence ratio if the unconstrained hypothesis is used as the competing hypothesis (assuming that $H_m$ is true and remains in agreement with the data). In that case, both hypotheses $H_m$ and $H_u$ are in line with the data, since $H_u$ is always in line with the data, and consequently both hypotheses have the same maximized log-likelihood value. Then, the difference in GORIC values equals twice the difference in penalty term values, which are independent of sample-size and effect-size. This case is illustrated in Figure 1, where we generated 500 data sets according to an ANOVA model with four uncorrelated ordered means that are in agreement with $H_1$, with a sample-size of $n = 50$ per group and various effect-sizes $f$ (Cohen, 1988, pp. 274-275). The results (see triangles in Figure 1) show that at first the mean evidence ratio (on a log scale) of $w_1/w_u$ increases slightly for increasing effect-sizes and that afterward it stabilizes at an upper-bound value of approximately $\exp(1.09) \approx 2.97$ on the original scale. It is at this point that the data are, for each simulation run, in agreement with $H_1$, and thus the maximized log-likelihood values of $H_1$ and $H_u$ are the same. The boundary value equals the exponential of the difference in penalty term values between $H_u$ and $H_1$, that is, $\exp(5.00 - 3.91) = \exp(1.09) \approx 2.97$, as will become clear later on. Consequently, strong support for the OR hypothesis is not expressed in a high evidence ratio if it is compared with the unconstrained hypothesis, and many research questions may be erroneously dismissed as irrelevant. It is important to note that the boundary issue is not specific to the order-restricted IC literature but can also be found in the Bayes factor literature (e.g., Mulder et al., 2009, 2010). There, the boundary issue is solved by comparing the order-restricted hypothesis against its complement.
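The bound described above can be checked numerically. The following is a minimal sketch, assuming only the relation stated in the text (an IC value is minus two times the log-likelihood plus twice the penalty term value) and the penalty term values 5.00 and 3.91 quoted for $H_u$ and $H_1$. The function name `evidence_ratio` and the shared log-likelihood values are hypothetical and serve only to show that the log-likelihood cancels.

```python
import math


def evidence_ratio(ll_a, pt_a, ll_b, pt_b):
    """Evidence ratio w_a / w_b from log-likelihoods and penalty term values."""
    goric_a = -2.0 * ll_a + 2.0 * pt_a
    goric_b = -2.0 * ll_b + 2.0 * pt_b
    # The ratio of two IC weights depends only on the difference in IC values
    return math.exp(-0.5 * (goric_a - goric_b))


# Penalty term values quoted in the text: PT_1 = 3.91 and PT_u = 5.00.
# The shared log-likelihood values are arbitrary (hypothetical): once the data
# agree with H_1, LL_1 equals LL_u and cancels out of the ratio.
for shared_ll in (-300.0, -150.0, -10.0):
    ratio = evidence_ratio(shared_ll, 3.91, shared_ll, 5.00)
    print(shared_ll, round(ratio, 2))
```

Whatever value the shared log-likelihood takes, the ratio stays at $\exp(5.00 - 3.91)$, which is why a larger sample or effect cannot raise it.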
The objective of this study is to show that this upper bound issue can also be solved by replacing the unconstrained hypothesis by the complement of the hypothesis of interest (see circles in Figure 1). The complement is defined as $H_c = \neg H_m$, where $\neg$ denotes "not." For example, for the OR hypothesis $H_1: \{\mu_{\text{men; no facial burns}}, \mu_{\text{men; facial burns}}, \mu_{\text{women; no facial burns}}\} \le \mu_{\text{women; facial burns}}$ with four means, there are four ways in which the four means can be ordered in such an ordering. Hypothesis $H_1$ consists of one of these four combinations; therefore, the complement represents the $4 - 1 = 3$ remaining ways in which the four means can be ordered. In this "simple" case, it is easy to write out the complement, but this is often not the case. In many cases, it is a cumbersome or even impossible task to write out all possible combinations that belong to the complement, because the number of combinations increases excessively with the number of parameters. For example, for the OR hypothesis $H_2$ with four means, there are 24 ways (i.e., $4! = 4 \times 3 \times 2 \times 1$) in which the four means can be ordered in a simple ordering. Moreover, the GORIC is often not defined when the complement comprises multiple combinations. This is because the GORIC is only defined for restrictions that form a closed convex cone² (ccc, e.g., $H_1$ and $H_2$).
The novelty of this article is that we introduce the GORIC for the situation in which the restrictions in the complement are not a ccc. We show (a) how to determine the log-likelihood for the complement and (b) how to determine the penalty term value for the complement.
The remainder of this article is organized as follows. First, we provide some technical background about the computation of the GORIC and the corresponding penalty term value for the unconstrained hypothesis and an OR hypothesis $H_m$. Second, we introduce how the GORIC is computed for the complement of $H_m$. Third, we illustrate our method with the empirical example introduced at the beginning of this section. Finally, we give some concluding remarks and recommendations.

Technical Background
Before we introduce the GORIC for the complement, some technical background is needed. The results given in this part are for the linear regression model, where the regression coefficients are subject to linear inequality and/or linear equality restrictions. The method can readily be adapted to multivariate normal linear models. This is briefly discussed in the last section.

Linear Model and Order-Restricted Hypotheses
Consider the standard linear regression model,
$$y_i = \theta_1 x_{i1} + \dots + \theta_p x_{ip} + \epsilon_i, \quad i = 1, \dots, n,$$
where $\theta = (\theta_1, \dots, \theta_p)^T$ is the parameter vector of interest, $x_i = (x_{i1}, \dots, x_{ip})^T$ is a vector of predictor variables³ for person $i$, and $\epsilon = (\epsilon_1, \dots, \epsilon_n)^T$ is a vector of normally distributed random errors: $\epsilon_i \sim N(0, \sigma^2)$. Let the (unconstrained) maximum likelihood estimates (mle's) be denoted by $\hat{\theta}$ and the order-restricted mle's by $\tilde{\theta}_m$. The latter is the solution of maximizing the likelihood under the restrictions in $H_m$, a well-studied restricted optimization problem in the statistical literature (Nocedal & Wright, 2006).
We consider three types of hypotheses, namely $H_u: \theta \in \mathbb{R}^p$, where $\mathbb{R}^p$ is the $p$-dimensional Euclidean space; $H_m: \theta \in C$, where $C$ is also a space in $\mathbb{R}^p$ and is a (reallocated) closed convex cone (Kuiper, Hoijtink, & Silvapulle, 2012); and $H_c: \neg H_m$, which is not necessarily a (reallocated) closed convex cone. Because most applications involve only linear restrictions, we consider only linear hypotheses.

The GORIC
The GORIC for the unconstrained hypothesis $H_u$ is defined as
$$GORIC_u = -2 \times LL_u + 2 \times PT_u,$$
where $LL_u$ is the maximized log-likelihood value and the penalty term value is defined as $PT_u = 1 + p$. Note that $GORIC_u$ equals the AIC for $H_u$.
The GORIC for the OR hypothesis $H_m$ is defined as
$$GORIC_m = -2 \times LL_m + 2 \times PT_m,$$
where $LL_m$ is the maximized log-likelihood value for the OR hypothesis $H_m$ and $PT_m$ is the penalty term value for $H_m$. The penalty term value equals
$$PT_m = 1 + \sum_{j=0}^{p} LP_j(p, \Sigma, H_m) \times j.$$
In a univariate regression model, $\Sigma = (X^T X)^{-1}$ is the unscaled⁴ covariance matrix of the parameters, with $X = (x_1^T, \dots, x_n^T)^T$ of order $n \times p$, and the $LP_j(p, \Sigma, H_m)$ are the level probabilities (chi-bar-square weights). A level probability $LP_j$ is the probability that the OR mle $\tilde{\theta}_m$ has $j$ levels (under the null hypothesis), where $j = p\ -$ "the number of active restrictions"; the LPs sum to 1. We will clarify the computation of $PT_m$ using two examples. Consider Figure 2a, where the unconstrained parameter space is determined by the two parameters $\theta_1$ and $\theta_2$ and is divided into four quadrants ($Q_1$ to $Q_4$). If we assume that $\theta_1$ and $\theta_2$ are independent of each other (i.e., $\Sigma = I$, where $I$ is an identity matrix), then each quadrant is assigned a probability of 0.25 under $H_0: \theta_1 = \theta_2 = 0$. The permissible gray shaded area is defined by the order restrictions $H_3: \theta_1 \ge 0, \theta_2 \ge 0$.

Figure 1. Mean evidence ratio (on a log scale and based on 500 simulations) for hypothesis $H_1: \{\mu_1, \mu_2, \mu_3\} \le \mu_4$ compared with the unconstrained hypothesis (mean $w_1/w_u$) and compared with its complement (mean $w_1/w_c$) when $H_1$ is true, for $n = 50$, and various effect-sizes ($f$).

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Then, the probability that $j = 2$, that is, that none of the restrictions is active (i.e., $j = p - 0 = 2 - 0 = 2$), is 0.25 ($Q_1$). The probability that $j = 1$, that is, that one restriction is active (i.e., $j = 2 - 1 = 1$), is $0.25 + 0.25 = 0.50$ ($Q_2$ and $Q_4$). The probability that $j = 0$, that is, that both restrictions are active (i.e., $j = 2 - 2 = 0$), is 0.25 ($Q_3$). Consequently, the penalty term value for the OR hypothesis $H_3$ can be computed as $PT_3 = 1 + 0 \times 0.25 + 1 \times 0.50 + 2 \times 0.25 = 2$. In addition, consider Figure 2b, where the parameter space is restricted by the order restriction $H_4: \theta_1 \ge \theta_2$. Because the order restriction divides the unconstrained parameter space into two spaces, $Q_1$ and $Q_2$ are now two half-spaces. Again, assume that $\Sigma = I$ and again we have two parameters (i.e., $p = 2$), but now we have only one order restriction (i.e., $q_1 = 1$). Consequently, there can be zero or at most one active restriction, and thus the probability that $j = 0$, that is, that we have two active restrictions, is 0. This is because, if we impose one order restriction on two parameters, one parameter is allowed to vary freely, while the other parameter is restricted by the value of this free parameter. The probability that $j = 1$, that is, that the order restriction is active, is 0.5 ($Q_2$). The probability that $j = 2$, that is, that the order restriction is not active, is 0.5 ($Q_1$). Hence, the penalty term value for the OR hypothesis $H_4$ is computed as $PT_4 = 1 + 0 \times 0 + 1 \times 0.5 + 2 \times 0.5 = 2.5$. Equipped with this knowledge, we rewrite the penalty term $PT_m$. This becomes helpful for determining the penalty for the complement of $H_m$, which is discussed later on. Let $q_1 \ge 0$ be the number of order restrictions and $q_2 \ge 0$ the number of equality restrictions. Then, $p = q_1 + q_2 + F$, with $F$ the number of remaining, free parameters. Then, the penalty for $H_m$ can be rewritten as
$$PT_m = 1 + \sum_{j=F}^{p-q_2} LP_j(p, \Sigma, H_m) \times j = 1 + F + \sum_{k=1}^{q_1} k \times LP_{F+k}(p, \Sigma, H_m), \tag{4}$$
using that $LP_0$ to $LP_{F-1}$ are 0 (because $F$ free parameters in $H_m$ denote that there are always at least $F$ "inactive" restrictions); that $LP_{p-q_2+1}$ to $LP_p$ are 0 (because the $q_2$ equality restrictions in $H_m$ are always active); and that the LPs sum to 1.
From the penalty term value $PT_m$, it follows that for $q_1 = 0$ order restrictions and $q_2$ equality restrictions, and thus $F = p - q_2$ free parameters, the penalty term value for a hypothesis with solely equality restrictions equals $1 + F = 1 + p - q_2$, which equals the penalty term value of the AIC.
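The two worked examples can be reproduced with a few lines of code. This is a minimal sketch of the penalty term as one plus the expected number of levels; the function name `penalty_term` is hypothetical, and the level probabilities are the ones given in the text for $\Sigma = I$.

```python
def penalty_term(level_probs):
    """PT_m = 1 + sum_j j * LP_j, where level_probs[j] holds LP_j for j = 0..p."""
    if abs(sum(level_probs) - 1.0) > 1e-9:
        raise ValueError("level probabilities must sum to 1")
    return 1.0 + sum(j * lp for j, lp in enumerate(level_probs))


pt_3 = penalty_term([0.25, 0.50, 0.25])  # H_3: theta_1 >= 0, theta_2 >= 0
pt_4 = penalty_term([0.00, 0.50, 0.50])  # H_4: theta_1 >= theta_2
pt_u = penalty_term([0.00, 0.00, 1.00])  # no restrictions: reduces to 1 + p
print(pt_3, pt_4, pt_u)
```

With all probability mass on $j = p$, the function returns $1 + p$, the penalty of the unconstrained hypothesis.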
In the case of order restrictions, the exact computation of the level probabilities when $\Sigma \ne I$ and for $q > 4$ is a difficult task in general, because the probabilities can no longer be expressed in closed form. Fortunately, the probabilities can be approximated by using the multivariate normal probability distribution function with additional Monte Carlo steps (Grömping, 2010), or they can be computed easily and with sufficient precision by Monte Carlo simulation (Silvapulle & Sen, 2005; Wolak, 1987).
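To give a feel for the simulation approach, the sketch below estimates the level probabilities for the special case of nonnegativity restrictions ($\theta_1 \ge 0, \dots, \theta_p \ge 0$) with $\Sigma = I$. It is an illustration under these stated assumptions, not the algorithm used by any particular package: with $\Sigma = I$ the restricted estimate is obtained by setting negative coordinates to zero, so no optimizer is needed, whereas the general case requires a restricted optimization in every draw. The function name, the number of draws, and the seed are hypothetical.

```python
import random


def simulate_level_probs_orthant(p, draws=100_000, seed=1):
    """Estimate LP_0..LP_p for H: theta_k >= 0 (k = 1..p), assuming Sigma = I."""
    rng = random.Random(seed)
    counts = [0] * (p + 1)
    for _ in range(draws):
        # Draw an unconstrained estimate under the null (all parameters zero)
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # With Sigma = I, each negative coordinate is set to zero (an active
        # restriction), so the number of levels is the count of positive entries
        levels = sum(1 for value in z if value > 0.0)
        counts[levels] += 1
    return [count / draws for count in counts]


estimated = simulate_level_probs_orthant(p=2)
print([round(lp, 2) for lp in estimated])
```

For $p = 2$ the estimates should lie close to the exact values 0.25, 0.50, and 0.25 used for $H_3$ above.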

Introduction of the GORIC for H c
Here, we introduce the GORIC for the complement of $H_m$, which is defined as
$$GORIC_c = -2 \times LL_c + 2 \times PT_c,$$
where $LL_c$ is the maximized log-likelihood value for the complement of $H_m$ and $PT_c$ is the penalty term value. Recall that for the computation of the GORIC value for $H_m$ the order-restricted hypothesis is required to be a ccc. However, the complement $H_c$ is in many cases not a ccc. For example, the complement of $H_3: \theta_1 \ge 0, \theta_2 \ge 0$, that is, $H_c$, is constructed by the quadrants $Q_2$, $Q_3$, and $Q_4$ (see Figure 3a), and can be written as
$$H_c: \theta_1 < 0 \ \text{and/or} \ \theta_2 < 0.$$
Note that in this case, the complement can be written out easily but, for many hypotheses, it is a difficult or even impossible task to write out the complement. Because $H_c$ is not a ccc, the $LL_c$ and the $PT_c$ values cannot be computed directly like the $LL_m$ and the $PT_m$ values.

Figure 2. Illustration to illuminate the computation of the penalty term value of $H_m$. The gray shaded area is the permissible area under $H_m$.

Computing the Log-Likelihood for H c
To compute the $LL_c$ value, we first need to ascertain whether the restrictions in $H_m$ are in line with the data or not. If at least one inequality restriction is violated, then the data are automatically in line with the complement and the $LL_c$ value equals the $LL_u$ value. This is illustrated for $H_3: \theta_1 \ge 0, \theta_2 \ge 0$ in Figure 3a, where the permissible area is $Q_1$ and the quadrants $Q_2$, $Q_3$, and $Q_4$ form the complement. Because the unconstrained mle's lie in $Q_3$ (here, both restrictions are violated), the data are in line with the complement and $LL_c$ is equal to $LL_u$. Note that the same applies if the mle's lie in $Q_2$ or $Q_4$. On the other hand, if the data are in line with the restrictions in $H_3$, then we have to find the value of $\theta$ in $H_c$ that is closest to the unconstrained mle $\hat{\theta}$, given $\Sigma$, which is denoted by $\tilde{\theta}_c$. Let us inspect a bivariate case with $\Sigma = I$ (solid circles), as depicted in Figure 3b. The solid circles of the contour plot indicate that the two parameters $\theta_1$ and $\theta_2$ are uncorrelated. As a reminder, the lines of the contour plot correspond to parameter values that have equal log-likelihood values, and lines closer to $\hat{\theta}$ result in a higher log-likelihood value, because $\hat{\theta}$ is the value for which the log-likelihood is maximized (without imposing restrictions on the parameters). Clearly, the solution $\tilde{\theta}_c$ is on the boundary of the restricted parameter space $H_m$. Because there are many boundary solutions (see thick black lines), we have to search for the solution that has the shortest distance between $\hat{\theta}$ and the two boundaries, given $\Sigma$. Fortunately, we do not have to investigate each point on the thick black lines but only the points $\tilde{\theta}_{c1}$ and $\tilde{\theta}_{c2}$. The point $\tilde{\theta}_{c1}$ is computed by treating the inequality restriction for $\theta_1$ as an equality restriction (i.e., $\theta_1 = 0, \theta_2 \ge 0$). The point $\tilde{\theta}_{c2}$ is computed analogously, with the restriction for $\theta_2$ treated as an equality restriction (i.e., $\theta_1 \ge 0, \theta_2 = 0$). Thus, in general, there are in total $q_1$ possibilities to be investigated. Notably, in the case of equality restrictions, all $q_2$ equalities are "freed." The log-likelihood value of the point that results in the highest log-likelihood, given $\Sigma$, equals the $LL_c$ value (here, the point is $\tilde{\theta}_{c1}$).
As mentioned above, the solution $\tilde{\theta}_c$ depends on the covariance matrix $\Sigma$. To clarify this, again consider Figure 3b. The solid contour lines show the solution $\tilde{\theta}_c$ if $\Sigma$ is an identity matrix (i.e., $\tilde{\theta}_{c1}$). It can easily be seen that, if the covariance matrix is not an identity matrix (e.g., see dot-dashed line), the solution $\tilde{\theta}_c$ changes (here, it is $\tilde{\theta}_{c2}$).
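The search over the $q_1$ candidate points can be sketched for the bivariate example. The code below is an illustration only, assuming that closeness is measured by the quadratic form with weight matrix $\Sigma^{-1}$ and that $\hat{\theta}$ lies inside $H_3$; it returns the candidate point, not the log-likelihood itself, and it does not claim to mirror how restriktor performs the computation. All names and numeric values are hypothetical.

```python
def closest_complement_point(theta_hat, sigma):
    """Closest boundary point of H_3 (theta_1 >= 0, theta_2 >= 0) to theta_hat.

    Closeness is the quadratic form (theta - theta_hat)' W (theta - theta_hat)
    with W the inverse of sigma. Assumes theta_hat lies inside H_3.
    """
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    w11, w12, w22 = s22 / det, -s12 / det, s11 / det  # inverse of a 2 x 2 matrix

    def distance(theta):
        d1, d2 = theta[0] - theta_hat[0], theta[1] - theta_hat[1]
        return w11 * d1 * d1 + 2.0 * w12 * d1 * d2 + w22 * d2 * d2

    # Candidate 1: treat theta_1 >= 0 as theta_1 = 0, keep theta_2 >= 0
    c1 = (0.0, max(0.0, theta_hat[1] + w12 * theta_hat[0] / w22))
    # Candidate 2: treat theta_2 >= 0 as theta_2 = 0, keep theta_1 >= 0
    c2 = (max(0.0, theta_hat[0] + w12 * theta_hat[1] / w11), 0.0)
    return min((c1, c2), key=distance)


theta_hat = (0.3, 0.5)  # hypothetical unconstrained estimates inside H_3
print(closest_complement_point(theta_hat, ((1.0, 0.0), (0.0, 1.0))))
print(closest_complement_point(theta_hat, ((1.0, 0.5), (0.5, 4.0))))
```

With the identity matrix the first candidate is selected, whereas the second (hypothetical) covariance matrix makes the second candidate the closer one, which mirrors the switch described for Figure 3b.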

Computing the Penalty Term Value for H c
The penalty term value for $H_m$ can be seen as the expected number of "inactive" restrictions (i.e., $j = p\ -$ "the number of active restrictions," which implies that free parameters denote "inactive" restrictions) plus one for the variance term. The expected number of "inactive" restrictions is the sum of the expected number of "inactive" restrictions for $1 + p - q_2$ subspaces (where 0 to $p - q_2$ restrictions are "inactive"). If we apply this principle to the complement, then only two distinct subspaces can be distinguished (and not $1 + p - q_2$). The first subspace is the one not in agreement with the complement (cf. $Q_1$; i.e., the space fully in agreement with $H_m$). For this subspace, the number of "inactive" restrictions equals the number of free parameters in $H_c$, denoted by $F_c$. Note that the free parameters in $H_m$ ($F$) remain free in $H_c$ and the $q_2$ equality restrictions in $H_m$ are "freed" in $H_c$. Thus, $F_c = F + q_2 = p - q_1$. The probability of $F_c$ levels in $H_c$ ($LP^c_{F_c}$) equals the probability of having $p - q_2$ levels in $H_m$ ($LP_{p-q_2}$; cf. the probability of ending up in $Q_1$ is $LP_{p-q_2} = LP_2 = 0.25$), that is, the probability of finding the mle in $H_m$ (under the null). The second subspace is the one fully in agreement with the complement (cf. not $Q_1$, that is, $Q_2$, $Q_3$, and $Q_4$) and has $p = F_c + q_1$ "inactive" restrictions (note that there are no equality restrictions in $H_c$). The corresponding level probability is denoted by $LP^c_{F_c+q_1}$ and equals the probability of not finding the mle in $H_m$ but in $H_c$, which equals $(1 - LP_{p-q_2})$. Mimicking Equation 4, the penalty term value for $H_c$ is given by
$$PT_c = 1 + F_c + q_1 \times LP^c_{F_c+q_1} = 1 + p - q_1 \times LP_{p-q_2}. \tag{7}$$
Another way of establishing $PT_c$ follows from the fact that the complement is the whole space minus the space fully in agreement with $H_m$ and, therefore, its penalty equals the penalty of the whole space (i.e., $1 + p$) minus the expected number of "inactive" restrictions in the space $H_m$. The expression for the latter equals the last part in Equation 4, that is, $q_1 \times LP_{F+q_1}$. Then, it follows that the penalty term value for $H_c$ equals
$$PT_c = 1 + p - q_1 \times LP_{F+q_1}. \tag{8}$$
This expression is equal to Equation 7, because $F + q_1 = p - q_2$. To illustrate, the penalty term value for the complement of $H_3: \theta_1 \ge 0, \theta_2 \ge 0$ is computed as $PT_c = 1 + 2 - 2 \times 0.25 = 2.5$, and the penalty term value for the complement of $H_4: \theta_1 \ge \theta_2$ is computed as $PT_c = 1 + 2 - 1 \times 0.5 = 2.5$. Because the complement of $H_4$ (i.e., $\theta_1 \le \theta_2$) is a ccc, we can use the $PT_m$ formula as well, which also renders a penalty of 2.5. In Appendix A, we illustrate the computation of the $PT_m$ and the $PT_c$ values in the case of three parameters.

Figure 3. The gray shaded area (i.e., $Q_1$) is the permissible area under $H_3$ and the other quadrants (white area) are the permissible areas under its complement $H_c$. (a) $H_3: \theta_1 \ge 0, \theta_2 \ge 0$. The mle's lie in $Q_3$ and are thus in agreement with $H_c$. (b) $H_3: \theta_1 \ge 0, \theta_2 \ge 0$, for $\Sigma = I$ (solid lines) and for $\Sigma \ne I$ (dashed lines). The mle's lie in $Q_1$ and are thus not in agreement with $H_c$.
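The two routes to the complement's penalty term value can be compared directly. The sketch below implements both the shortcut expression and the two-subspace reasoning described in the text; the function names are hypothetical, and `lp_full` stands for $LP_{p-q_2}$, the probability that none of the order restrictions in $H_m$ is active.

```python
def penalty_term_complement(p, q1, lp_full):
    """PT_c = 1 + p - q1 * LP_{p - q2}."""
    if not 0.0 <= lp_full <= 1.0:
        raise ValueError("lp_full must be a probability")
    return 1.0 + p - q1 * lp_full


def penalty_term_complement_two_subspaces(p, q1, lp_full):
    """Same quantity as an expectation over the two subspaces of the complement."""
    f_c = p - q1  # "inactive" restrictions when the estimate falls inside H_m
    return 1.0 + f_c * lp_full + p * (1.0 - lp_full)


# Complement of H_3 (p = 2, q1 = 2, LP_2 = 0.25) and of H_4 (p = 2, q1 = 1, LP_2 = 0.5)
print(penalty_term_complement(2, 2, 0.25), penalty_term_complement(2, 1, 0.5))
print(penalty_term_complement_two_subspaces(2, 2, 0.25))
```

Both functions return the same value for any valid input, since $(p - q_1) \times LP + p \times (1 - LP) = p - q_1 \times LP$.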
In the next section, we show by means of a brief illustration and the results from a simulation study that the evidence ratio for the order-restricted hypothesis compared with its complement is boundless.

Unbounded GORIC Weights in Case of the Complement
Once the GORIC values for $H_m$ and its complement are known, the GORIC weights can be easily obtained as follows:
$$w_s = \frac{\exp(-0.5 \times GORIC_s)}{\exp(-0.5 \times GORIC_m) + \exp(-0.5 \times GORIC_c)}, \tag{9}$$
where the subscript $s$ equals $m$ or $c$ for hypothesis $H_m$ and hypothesis $H_c$, respectively. From these weights, we can determine the evidence ratio for $H_m$ against its complement, $w_m/w_c$. This ratio is interpreted as the strength of evidence for $H_m$ given the data and $H_c$ (Kuiper, 2011; Wagenmakers & Farrell, 2004). For example, for Figure 3b with $\Sigma \ne I$ (e.g., dot-dashed lines), $n = 50$, and $f = 0.20$, the evidence ratio for $H_3: \theta_1 \ge 0, \theta_2 \ge 0$ compared with $H_c$ equals $w_3/w_c = 0.92/0.08 = 11.50$. This ratio tells us that hypothesis $H_3$ is the best of the two (because the ratio is larger than 1) and that $H_3$ is 11.50 times more supported than $H_c$. Notably, because we sampled from $H_3$, $H_3$ is true. In contrast, if we want to determine the evidence ratio for $H_3$ against the unconstrained hypothesis $H_u$, that is, $w_3/w_u$, we have to replace $GORIC_c$ by $GORIC_u$ in Equation 9, and $s$ equals $m$ or $u$ for hypothesis $H_m$ and hypothesis $H_u$, respectively. Note that $w_3$ now no longer equals 0.92, because the weights depend on the set of hypotheses. Therefore, if $H_c$ is replaced by $H_u$, the weights must be recomputed for the two hypotheses in the set. Then, the evidence ratio equals $w_3/w_u = 0.62/0.38 \approx 1.63$. This clearly shows the advantage of using the complement as the competing hypothesis. Namely, for the same data, we obtain a support for $H_3$ of 1.63 (the maximum support) when comparing with $H_u$, or a support of 11.50 when comparing with $H_c$ (using $n = 50$).
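As an illustration of how weights follow from IC values, here is a minimal sketch of the usual Akaike-type weight computation for an arbitrary set of hypotheses. The function name `goric_weights` is hypothetical and the GORIC values in the example are made up for illustration; they are not results from the article. Subtracting the smallest value before exponentiating is a numerical convenience that leaves the weights unchanged.

```python
import math


def goric_weights(goric_values):
    """Weights w_s proportional to exp(-0.5 * GORIC_s), normalized over the set."""
    best = min(goric_values.values())
    # Work with differences from the best value to avoid underflow in exp()
    relative = {name: math.exp(-0.5 * (value - best))
                for name, value in goric_values.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


# Hypothetical GORIC values, chosen only to illustrate the computation
weights = goric_weights({"H_m": 100.0, "H_c": 104.0})
ratio = weights["H_m"] / weights["H_c"]
print(weights, ratio)
```

The evidence ratio depends only on the difference between the two GORIC values: here it equals $\exp(-0.5 \times (100 - 104)) = \exp(2)$.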
In Appendix B, we present a simulation study in which we investigated the performance of the evidence ratio. The simulation results show the benefits of evaluating an OR hypothesis against its complement. While, for small effect-sizes and/or sample-sizes, the difference between the evidence ratio for the true $H_m$ when using the complement or the unconstrained hypothesis as the competing hypothesis is minimal, the difference increases rapidly and profoundly for larger effect-sizes and/or sample-sizes. More importantly, the evidence ratio for the true $H_m$ against its complement is boundless for increasing effect-sizes and/or sample-sizes and might therefore be more compelling, whereas the evidence ratio with the unconstrained hypothesis as the competing hypothesis has an upper bound. Therefore, we recommend replacing the unconstrained hypothesis by the complement of the hypothesis of interest as the competing hypothesis.

Burns Example
To illustrate the method, we analyze the empirical example introduced in the introduction, in which we sought to determine possible risk factors for ruminating thoughts after a burn injury. The data are based on a cohort study consisting of 245 individuals with burns, aged 18 to 74 years. The response variable is rumination. Moreover, for the current illustration, we included gender (0 = men, 1 = women) and facial burns (0 = no, 1 = yes) together with their interaction as predictor variables, and Hospital Anxiety and Depression Scale (HADS; M = 3.85, SD = 3.66), age (M = 41.06, SD = 13.94), and the number of surgical operations, which is a measure of severity of the burns (SO; M = 1.14, SD = 1.76), as covariates.
Reconsider the hypothesis of interest $H_1: \{\mu_{\text{men; no facial burns}}, \mu_{\text{men; facial burns}}, \mu_{\text{women; no facial burns}}\} \le \mu_{\text{women; facial burns}}$. A natural choice to evaluate the OR hypothesis $H_1$ would be an order-restricted 2 × 2 ANCOVA model. Because an ANCOVA is just a special case of the linear regression model, the model can be written as a linear function. To obtain adjusted means for a person with an average score on the covariates, the covariates HADS, age, and SO are centered at their average and are denoted by Z_HADS, Z_age, and Z_SO, respectively. Then, the model can be written as follows:
$$\text{rumination}_i = \theta_1 + \theta_2\,\text{facial burns}_i + \theta_3\,\text{gender}_i + \theta_4\,(\text{gender}_i \times \text{facial burns}_i) + \theta_5\,\text{Z\_HADS}_i + \theta_6\,\text{Z\_age}_i + \theta_7\,\text{Z\_SO}_i + \epsilon_i.$$
On the left-hand side of the = operator, we have the response variable rumination, and on the right-hand side we have the factors facial burns and gender and their interaction, and the centered covariates Z_HADS, Z_age, and Z_SO. The interaction between gender and facial burns is included using the × operator. Then, the four adjusted means with average scores on the covariates are computed as:
$$\mu_{\text{men; no facial burns}} = \theta_1$$
$$\mu_{\text{men; facial burns}} = \theta_1 + \theta_2$$
$$\mu_{\text{women; no facial burns}} = \theta_1 + \theta_3$$
$$\mu_{\text{women; facial burns}} = \theta_1 + \theta_2 + \theta_3 + \theta_4.$$
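The mapping from regression coefficients to adjusted group means can be made concrete as follows. This is a small sketch with hypothetical coefficient values (they are not the estimates from the burns data); it computes the four adjusted means from the intercept, the two main effects, and the interaction, and checks whether the resulting means are in agreement with $H_1$. The function names are hypothetical as well.

```python
def adjusted_means(coefs):
    """Adjusted group means from (intercept, facial burns, gender, interaction)."""
    c1, c2, c3, c4 = coefs
    return {
        "men; no facial burns": c1,
        "men; facial burns": c1 + c2,
        "women; no facial burns": c1 + c3,
        "women; facial burns": c1 + c2 + c3 + c4,
    }


def in_agreement_with_h1(means):
    """H_1: the first three means do not exceed the mean for women with facial burns."""
    target = means["women; facial burns"]
    return all(value <= target
               for group, value in means.items()
               if group != "women; facial burns")


# Hypothetical coefficients, for illustration only
means = adjusted_means((10.0, 1.0, 0.5, 2.0))
print(means, in_agreement_with_h1(means))
```

If the check returns True, the order-restricted and unconstrained estimates coincide, which is exactly the situation in which the evidence ratio against $H_u$ reaches its upper bound.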
The R (R Core Team, 2019) code and the output from the analyses can be found in Appendix C.
The results show that the OR hypothesis $H_1$ is 0.891/0.109 ≈ 8.198 times more supported by the data than its complement. For comparison, the results for the unconstrained hypothesis show that hypothesis $H_1$ is 0.734/0.266 ≈ 2.754 times more supported by the data than $H_u$, which is its maximum support. Assuming $H_1$ is true, using the complement of $H_1$ instead of $H_u$, we now have more compelling evidence.

Summary and Discussion
In this article, we introduced the evaluation of an order-restricted (OR) hypothesis against its complement using the GORIC (weights). The GORIC is an information criterion that can be used to evaluate competing hypotheses in univariate and multivariate normal linear models, where the regression parameters are subject to linear (in)equality restrictions. The interpretation can be improved by computing GORIC weights and related evidence ratios reflecting the strength of evidence for one hypothesis versus another.
We advise researchers to evaluate their theory against its complement $H_c$ instead of the unconstrained hypothesis $H_u$. An advantage of our method is that the evidence ratio for an OR hypothesis $H_m$ compared with its complement is boundless. The evidence ratio for $H_m$ compared with $H_u$ is increased neither by a larger sample-size nor by a larger effect-size, if the data are in agreement with the hypothesis of interest (i.e., theory). Consequently, using the complement as the competing hypothesis leads to much more support for the hypothesis of interest, assuming it is true, compared with using the unconstrained hypothesis as the competing hypothesis. In case $H_m$ is not true (i.e., the complement is true), the results are comparable to evaluating the order-restricted hypothesis against the unconstrained hypothesis. Furthermore, in case $H_m$ is almost true, there is less support for $H_m$ when compared with its complement than when compared with the unconstrained hypothesis. This is because the log-likelihood values are almost identical and the difference between the penalty term values is larger between $H_m$ and $H_u$ than between $H_m$ and $H_c$. Consequently, in such cases, it is also better to evaluate $H_m$ against its complement, as the support for $H_m$ against $H_u$ might be overstated because $H_m$ is not true. Besides its practical benefits, the complement is also more meaningful substantively: When looking at the interpretation of the complement and the unconstrained hypothesis, the latter can be seen as all possible theories including the target hypothesis $H_m$, whereas the former denotes the possible theories without the target hypothesis $H_m$. We believe that the complement makes more sense if you would like to know whether your theory is better than all other theories (and by how much). Comparing your theory against all possible theories including the targeted one makes less sense to us.
The method was illustrated using an empirical example about facial burn injury. In six easy steps (shown in Appendix C), we showed how to compute the evidence ratio of the researchers' theory against its complement using the R package restriktor (see http://www.restriktor.org).
We assumed that researchers often do not have specific competing hypotheses. Although this is probably often the case, it is conceivable that the set of hypotheses contains more than one competing hypothesis. In these cases, the problem that the evidence ratio for $H_m$ against $H_u$ is no longer affected by increasing sample-size and/or effect-size after a specific value can still occur. For example, consider the set with three hypotheses: $H_2$, $H_5: \mu_1 \le \mu_2 \le \mu_3 = \mu_4$ (which is a subset of $H_2$), and the unconstrained hypothesis $H_u$. If $H_5$ is true, then all three hypotheses are true and all evidence ratios are bounded (Kuiper et al., 2011, p. 107). However, the evaluation of a set of multiple OR hypotheses against its complement is less straightforward because determining the complement for multiple hypotheses might not always be trivial (especially for software).
The results presented in this article are for the univariate linear regression model, but they can easily be adapted for the multivariate normal linear model. One should keep in mind that, unlike in the univariate setting, where the order-restricted parameter estimates do not depend on the order-restricted covariance matrix, denoted by $\tilde{\Sigma}$, in the multivariate normal linear model these estimates do depend on $\tilde{\Sigma}$, and $\tilde{\Sigma}$ in turn depends on them (Kuiper et al., 2012). Hence, an iterative procedure is needed to calculate them. The procedure is implemented in restriktor.

Design
We generated 500 samples according to the ANOVA model $y_i = \mu_1 x_{i1} + \ldots + \mu_4 x_{i4} + \epsilon_i$, $i = 1, \ldots, n$, where we assume that the residuals are standard normally distributed. We considered the OR hypothesis $H_1: \mu_1 \le \mu_2 \le \mu_3 \le \mu_4$, its complement $H_c: \neg H_1$, and the unconstrained hypothesis $H_u: \mu_1, \mu_2, \mu_3, \mu_4$. Note that $H_c$ does not equal $\mu_1 \ge \mu_2 \ge \mu_3 \ge \mu_4$; it contains this ordering but also the other 22 simple orderings of $\mu_1$ to $\mu_4$ (excluding the one ordering in $H_m$). Data were generated under hypothesis $H_1$ with four independent (uncorrelated) group means, with $n = 30, 50, 100, 200, 500$ observations per group, and for a variety of differences among the population means, using effect-size $f = 0, 0.10, 0.20, \ldots, 1$ (Cohen, 1988, pp. 274-275). Notably, $f = 0$ corresponds to sampling from the boundary of both $H_m$ and $H_c$. If we sample values from an $H_1$ population with increasing effect-size, this will evidently lead to more and more support for $H_1$. Let the differences between the means, $d$, be equally spaced, where $d$ is defined as

$$d = \frac{2 f \sqrt{p}}{\sqrt{\sum_{i=1}^{p} (2i - 1 - p)^2}}$$

under the restriction that $\sum_{i=1}^{p} \mu_i = 0$ and $\sigma = 1$. Then, the $p$ ordered means can be computed as

$$\mu_i = -\frac{(p - 1) d}{2} + (i - 1) d.$$

Appendix D shows the computed population means for the various effect-sizes ($f$).
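As an illustration of the design, the following minimal Python sketch computes the spacing $d$ and the equally spaced population means for a given effect-size $f$, and then checks that the resulting means reproduce that $f$. It is not part of the original simulation code (which used R); the function names `ordered_means` and `cohens_f` are hypothetical, and the sketch assumes $\sigma = 1$ and means that sum to zero, as stated in the design.

```python
import math


def ordered_means(f, p=4):
    """Equally spaced population means for effect-size f (assumes sigma = 1, sum of means = 0)."""
    # Spacing: d = 2 f sqrt(p) / sqrt(sum_i (2i - 1 - p)^2)
    denom = math.sqrt(sum((2 * i - 1 - p) ** 2 for i in range(1, p + 1)))
    d = 2 * f * math.sqrt(p) / denom
    # Ordered means: mu_i = -(p - 1) d / 2 + (i - 1) d
    means = [-(p - 1) * d / 2 + (i - 1) * d for i in range(1, p + 1)]
    return d, means


def cohens_f(means, sigma=1.0):
    """Cohen's f: standard deviation of the population means divided by sigma."""
    grand_mean = sum(means) / len(means)
    sd_means = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / len(means))
    return sd_means / sigma


# Medium effect-size with four groups, as in the simulation design
d, means = ordered_means(0.30)
print(d, means, cohens_f(means))
```

The round trip through `cohens_f` is a simple sanity check that the reconstructed formula for $d$ and the formula for the means agree with each other.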

Simulation Results
All results are obtained using the R package restriktor (see http://www.restriktor.org) employing the GORIC function. The results of the simulation study are presented in Figures B1, B2, and B3, and are obtained by computing the mean value of the relative evidences across the 500 simulation runs. Furthermore, to improve visibility, we took the natural logarithm of the means and we used a varying range of sample-sizes and effect-sizes.
The results clearly illustrate the benefits of evaluating $H_m$ versus its complement: The mean evidence ratio for $H_1$ versus $H_c$ (mean $w_1/w_c$) increases rapidly for larger effect-sizes (see Figure B1a) and sample-sizes (see Figure B1b), while the mean evidence ratio using the unconstrained hypothesis as competing hypothesis (mean $w_1/w_u$) is clearly bounded after a certain value (see Figures B1c and B1d, respectively). To illustrate, consider Figure B1a, where the mean evidence ratio for $H_1$ versus $H_c$ (mean $w_1/w_c$) for a medium effect-size ($f = 0.30$) and $n = 100$ is $\exp(2.63) \approx 13.87$ (on the original scale), while the mean evidence ratio for $H_1$ versus $H_u$ (mean $w_1/w_u$) is bounded at $\exp(1.92) \approx 6.82$. Note that the value 1.92 equals the difference in penalty term values, with $PT_u - PT_1 = (1.00 + 4.00) - (1.00 + 2.08) \approx 1.92$, which equals the difference in GORIC values, because the log-likelihood values are here the same (i.e., $LL_u = LL_1$).
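The bound can be made concrete with a small Python sketch. It assumes the usual AIC-type definition of weights, in which the weight of a hypothesis is proportional to $\exp(LL - PT)$; the helper `evidence_ratio` and the log-likelihood value of $-140$ are hypothetical and chosen only for illustration, while the penalty term values are those reported in the text.

```python
import math


def evidence_ratio(ll_a, pt_a, ll_b, pt_b):
    """Ratio of weights w_a / w_b, assuming each weight is proportional to exp(LL - PT)."""
    return math.exp((ll_a - pt_a) - (ll_b - pt_b))


# Penalty term values reported in the text
pt_1 = 1.00 + 2.08  # order-restricted hypothesis H_1
pt_u = 1.00 + 4.00  # unconstrained hypothesis H_u

# Hypothetical log-likelihood: when the data agree with H_1, LL_1 equals LL_u
ll = -140.0

ratio_1u = evidence_ratio(ll, pt_1, ll, pt_u)
print(round(ratio_1u, 2))  # exp(1.92), about 6.82

# The common log-likelihood cancels, so a better fit cannot raise the ratio
print(evidence_ratio(-90.0, pt_1, -90.0, pt_u))
```

Because the shared log-likelihood cancels, the ratio of $H_1$ against $H_u$ depends only on the penalty terms once the data agree with $H_1$. Against the complement, the log-likelihoods differ in that situation, so the ratio can keep growing.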
For small effect-sizes and small samples, the mean evidence ratio for $H_1$ using $H_c$ is slightly lower than when using $H_u$. For example, for $f = 0.10$ and $n = 30$, the mean evidence ratio $w_1/w_c$ is $\exp(1.50) \approx 4.48$ and the mean evidence ratio $w_1/w_u$ is $\exp(1.61) \approx 5.00$. In this case, using the complement is slightly more conservative, although the conclusion does not change. Furthermore, the evidence ratio for small effect-sizes ($f \le 0.20$) does not increase very rapidly (see Figure B1b), independent of sample-size. This is because, when examining small effects, the complement is often true (even though the data were generated under $H_1$). This is illustrated in Figure B2. For example, if $f = 0$, the mle's are (apart from some sampling variation) in 23/24 (approximately 95.8%) of the samples not in agreement with $H_1$ (and thus in agreement with $H_c$). Thus, both hypotheses $H_c$ and $H_u$ have the same maximized log-likelihood value with a probability of $prob_{cu} = 23/24$. When $f$ increases, the data (the mle's) will be more and more in agreement with $H_1$, and thus not with its complement $H_c$, and hence the proportion of equal maximized log-likelihood values of $H_u$ and $H_c$ (and thus $prob_{cu}$) decreases. Logically, the proportion of equal maximized log-likelihood values of $H_1$ and $H_u$, that is, $1 - prob_{cu}$, then increases.
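The 23/24 figure for $f = 0$ can be checked with a short Monte Carlo sketch in Python. This is not the original simulation code; the function name `prop_in_agreement`, the number of replications, and the seed are hypothetical. It relies on the fact that in this ANOVA model the unrestricted mle's of the group means are the sample means, so the mle's agree with $H_1$ exactly when the four sample means are in increasing order.

```python
import random


def prop_in_agreement(n=30, reps=4000, seed=1):
    """Share of samples whose four sample means satisfy m1 <= m2 <= m3 <= m4 when f = 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # f = 0: all four population means are 0, residuals are standard normal
        means = [sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n for _ in range(4)]
        if means == sorted(means):
            hits += 1
    return hits / reps


prop_h1 = prop_in_agreement()
# The first value should be close to 1/24, the second close to 23/24 (prob_cu)
print(prop_h1, 1 - prop_h1)
```

With equal population means, each of the 24 orderings of the sample means is equally likely, which is why only about 1 in 24 samples agrees with $H_1$.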
The results presented so far are for the scenario that $H_1$ is true, but we are also interested in the performance if $H_1$ is not true (i.e., $H_c$ is true). Figure B3 shows the results for the situation that the complement is true. Data were generated under the complement of $H_1$, for which we chose $H_c: \mu_1 \ge \mu_2 \ge \mu_3 \ge \mu_4$. The means are given in Appendix D and are now in reversed order compared with the previous simulation. Again, we considered the OR hypothesis $H_1$, its complement $H_c$, and the unconstrained hypothesis $H_u$. The results in Figure B3a show that the mean evidence ratios for $H_1$ versus $H_c$ (mean $w_1/w_c$) and for $H_1$ versus $H_u$ (mean $w_1/w_u$) decrease rapidly for larger $f$. This is because, when the effect-size and/or the sample-size increases, the data (the mle's) will be more and more in agreement with the complement $H_c$ and, of course, also with the unconstrained hypothesis $H_u$. The results shown in Figure B3b are based on the same numerical results as Figure B3a, but now for $H_c$ versus $H_1$ (mean $w_c/w_1$) and for $H_u$ versus $H_1$ (mean $w_u/w_1$). They clearly show the desirable property that if the complement (and also $H_u$) is true, both evidence ratios $w_c/w_1$ and $w_u/w_1$ show more support for larger effect-sizes and sample-sizes.

(Appendices continue)
This document is copyrighted by the American Psychological Association or one of its allied publishers.This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.


Figure 1. Mean evidence ratio (on a log scale and based on 500 simulations) for hypothesis $H_1: \{\mu_1, \mu_2, \mu_3\} \le \mu_4$ compared with the unconstrained hypothesis (mean $w_1/w_u$) and compared with its complement (mean $w_1/w_c$) when $H_1$ is true, for $n = 50$, and various effect-sizes ($f$).

Figure B1. Mean of the evidence ratio (on a log scale) for the situation that the OR hypothesis $H_1$ is true (based on 500 simulations). (a) Hypothesis $H_1$ is compared with its complement $H_c$ (mean $w_1/w_c$), for various effect-sizes ($f$) and for $n = 30, 50, 100$, and $200$. (b) Hypothesis $H_1$ is compared with its complement $H_c$ (mean $w_1/w_c$), for various sample-sizes ($n$) and for $f = 0.10, 0.20, 0.30$, and $0.40$. (c, d) Same as (a, b) but now for hypothesis $H_1$ versus the unconstrained hypothesis $H_u$ (mean $w_1/w_u$).