A global-scale two-layer transient groundwater model: Development and application to groundwater depletion

Groundwater is the world’s largest accessible source of freshwater to satisfy human water needs. More- over, groundwater buffers variable precipitation rates over time, thereby effectively sustaining river ﬂows in times of droughts and evaporation in areas with shallow water tables. In this study, building on previ- ous work, we simulate groundwater head ﬂuctuations and groundwater storage changes in both conﬁned and unconﬁned aquifer systems using a global-scale high-resolution (5 (cid:3) ) groundwater model by deriving new estimates of the distribution and thickness of conﬁning layers. Inclusion of conﬁned aquifer systems (estimated 6–20% of the total aquifer area) improves estimates of timing and amplitude of groundwa- ter head ﬂuctuations and changes groundwater ﬂow paths and groundwater-surface water interaction rates. Groundwater ﬂow paths within conﬁning layers are shorter than paths in the underlying aquifer, while ﬂows within the conﬁned aquifer can get disconnected from the local drainage system due to the low conductivity of the conﬁning layer. Lateral groundwater ﬂows between basins are signiﬁcant in the model, especially for areas with (partially) conﬁned aquifers were long ﬂow paths crossing catchment boundaries are simulated, thereby supporting water budgets of neighboring catchments or aquifer sys- tems. The developed two-layer transient groundwater model is used to identify hot-spots of groundwater depletion. Global groundwater depletion is estimated as 7013 km 3 (137 km 3 y − 1 ) over 1960–2010, which is consistent with estimates of previous studies. This paper presents a global-scale groundwater model simulating lateral ﬂows and head ﬂuctuations caused by changes in climate or human water use over the period 1960–2010. It is the ﬁrst global-scale transient groundwater model that includes a parameterization of unconﬁned and conﬁned systems (presented in two model layers), and simulates lateral-ﬂows and groundwater-surface water interactions in these systems. This results in more realistic estimates of groundwater head ﬂuctuations and storage changes than before.


Introduction
As the world's largest accessible source of freshwater, groundwater plays a vital role in satisfying the basic needs of human society ( Gleeson et al., 2016;UNESCO, 2009 ). It serves as a primary source of drinking water and supplies water for agricultural and industrial activities. During periods of low or no rainfall, groundwater storage provides a natural buffer against water shortage, preserves evaporation in areas with shallow water tables, and sustains baseflows to rivers and wetlands, thereby supporting ecosystem habitats and biodiversity (e.g. Bierkens and van den Hurk, 2007;de Graaf et al., 2013;Wada et al., 2014 ). Moreover, groundwater often flows across topographical and administrative boundaries at considerable rates, supporting water budgets of receiving catchments or aquifers ( Schaller and Fan, 2009 ). However, groundwater resources are increasingly threatened by excessive abstractions, particularly in irrigated areas where abstraction rates are high and groundwater is only slowly renewed ( Gleeson et al., 2011 ). The most direct effects of groundwater depletion are falling water tables and the irreversible loss of groundwater storage. As a consequence, pumping costs increase, and baseflow to rivers and wetlands is reduced, negatively affecting ecosystems, and compaction of the emptied pore space may lead to land subsidence.
Despite the importance of groundwater and the explicit treatment of groundwater recharge and pumping in global studies ( Döll et al., 2014a;Wada et al., 2010 ), most global-scale hydrological models do not include a groundwater flow component. The obvious reason for this missing component is the lack of consistent global-scale hydrogeological information. Such data is needed to obtain a realistic physical representation of the groundwater sys- tem, allowing for simulation of groundwater head dynamics and lateral flows (e.g. Fan et al., 2007;Schaller and Fan, 2009 ). Another reason is that at the usual time scales considered (from season to years), the amount of lateral flow across cell boundaries is likely to be small compared to the vertical exchange between groundwater and the overlying soil through recharge, evaporation and capillary rise ( Lam et al., 2011 ). Moreover, many applications do not require lateral flow. For instance, if the purpose of the modeling exercise is to estimate the volume of groundwater depletion using a volume based approach ( Döll et al., 2014b;Pokhrel et al., 2012;Wada et al., 2010 ), or to relate groundwater storage to land evaporation ( Lam et al., 2011 ), it generally is sufficient to model groundwater as a single reservoir with no connections to neighboring cells.
However, there are a number of applications that do require lateral flow to be explicitly modeled. First, even though the volumes of groundwater traveling across cells is currently limited, it becomes more and more important at higher resolutions Wood et al., 2012 ). This is certainly the case when groundwater abstractions are large and increasing. Second, particularly below megacities in deltas and for irrigated agriculture in developed countries such as the U.S., groundwater abstractions are often from (semi-) confined aquifers. In confined aquifers the spatial influence of abstractions is much larger than in unconfined aquifers, likely exceeding cell boundaries. Third, there is considerable interaction between surface water and groundwater, even in confined aquifers, with the larger rivers. Not accounting for this interaction may overestimate groundwater depletion due to the neglect of increased capture (see e.g. Konikow, 2011 for a critique on the flux-based approach). This groundwater-surface water interaction is difficult to parameterize in a land-surface model without lateral flow. Fourth, it is important to eventually surpass the mere estimation of depletion volumes and move to estimating head declines if one aims to estimate by what time in the future groundwater becomes unattainable or when falling heads will results in rivers running dry during low flow periods. Moreover, a groundwater flow model will be more suitable to estimate the effects of climate variability and change on ecologically relevant groundwater depths and low flows ( Fan, 2015 ). Finally, even though the volumes of water crossing cells may be limited, flow paths crossing aquifer boundaries or even larger catchment boundaries ( de Graaf et al., 2015;Schaller and Fan, 2009 ) can be long and the associated groundwater age considerable. Groundwater age dates are a likely additional source of data to constrain subsurface properties (aquifer depth and permeability) by lack of any direct observations of these quantities ( Gleeson et al., 2016 ). It is beneficial to use these age dates to their full potential in a groundwater flow model.
In this paper we present the results of a two-layer transient groundwater flow model that has been built to simulate groundwater head dynamics affected by changes in climate and human water use. In particular, it is meant to estimate groundwater depletion volumes and associated head declines, thereby accounting for groundwater-surface water interaction and the presence of confining layers. The global model is a first step toward assessing how long groundwater reserves will last. It also prepares the ground for future hyper-resolution land surface models where the effect of groundwater convergence on evaporation and runoff can no longer be parameterized but has to be modeled explicitly .
Previous work on global-scale groundwater models has been done by Fan et al. (2013) andde Graaf et al. (2015) . The study of Fan et al. (2013) produced a first high-resolution global groundwater table map. However, their method does not include hydrogeological information on aquifers (e.g. depths and transmissivities). In addition the hydrological connection between groundwater and rivers, which is the primary drainage of groundwater in humid re-gions, is modeled only implicitly. Moreover, their model requires calibration to head observations and only returns the steady state head distribution. The study of de Graaf et al. (2015) presents the first high-resolution global-scale groundwater model including hydrogeological information and accounting for groundwater-surface water interactions. Lateral groundwater flows for single unconfined aquifers are simulated at steady-state. A relative simple method for aquifer parameterization is used based on available global datasets of lithology ( Hartmann and Moosdorf, 2012 ) and permeability ( Gleeson et al., 2011 ) such that the method provides results for data-poor environments. Also, aquifer thickness is derived globally using terrain attributes. The results are promising. This is shown by the high correlation between observed and simulated averaged groundwater heads, although shallow groundwater depths in local alluvial aquifers in mountain valleys, smaller than the grid resolution ( < 10 km), are not captured. Also, the study 's greatest limitation is that only the steady-state head distribution is reported and temporal changes in groundwater head patterns due to climate or human impacts are not considered. Also, only a single unconfined aquifer is described and vital information on the accessibility and quality of the groundwater water resource is missing. This information is needed to simulate the aquifer sensitivity to storage changes due to climate fluctuations or abstractions. Abstractions are preferentially positioned in confined aquifer systems, as these systems are less sensitive for climate fluctuations and groundwater quality is often better than from unconfined systems ( Foster and Chilton, 2003 ).
The first objective of this study is therefore to represent the groundwater system more realistically, by including aquifers with a confining layer, in order to simulate groundwater head dynamics affected by changes in climate and human water use adequately. The groundwater system is described in more detail by including information on the presence of confining layers, leading to the categorization of world's aquifers in confined and unconfined systems. The improved physical groundwater representation is expected to lead to a more realistic estimate of aquifer sensitivity to changes in storage by climate or human impacts.
The second objective of this study is to provide estimates of groundwater depletion and head declines worldwide. Previous estimates of groundwater depletion (e.g. Döll et al., 2012;Wada et al., 2010;Konikow, 2011;Pokhrel et al., 2012 ) vary by an order of magnitude. For example the flux-based approach of Wada et al. (2010) estimate groundwater depletion at 283 ± 40 km 3 yr −1 2010 as the residual of grid-based groundwater recharge and withdrawal. The approach of Wada et al. (2010) overestimates depletion as it does not account for increased capture due to decreased groundwater discharge, nor for long-distance surface water transports. The volume-based approach of Konikow (2011) estimates groundwater depletion at 145 ± 39 km 3 yr −1 between 2001 and 2008 based on data for the US and other big aquifer systems by extrapolation of groundwater depletion globally using the average ratio of depletion to abstractions observed in the US. Our study will be the first to simulate changes in global groundwater storage in relation to climate and groundwater pumping while including lateral groundwater flow and groundwater-surface water interaction. Including these components will lead to a more realistic estimate of groundwater depletion, that is lower than earlier flux-based estimates and more similar to volume-based estimates.

General
In this study a two-layer groundwater model for the terrestrial part of the world was developed (excluding Greenland and Antarctica). The model consists of two parts: (1) the hydrological Fig. 1. A) Original model structure of the hydrological model PCR-GLOBWB. B) The groundwater store (store 3 in PCR-GLOBWB) is replaced by a groundwater model. The groundwater model is forced with net groundwater recharge minus capillary rise, and surface water levels calculated with PCR-GLOBWB. C) A cross-section illustrating the difference between simulated regional-scale groundwater levels and perched water levels. The latter are simulated as interflow in PCR-GLOBWB.
model PCR-GLOBWB ( van Beek et al., 2011 ) and (2) the groundwater model using MODFLOW ( McDonald and Harbaugh, 20 0 0 ). Both models were run at 5 arc-minute resolution (approx. 10x10 km at the equator) over the period 1960-2010. The groundwater model was forced with outputs from PCR-GLOBWB, specifically via groundwater recharge and river discharges ( Fig. 1 ). A brief description of the models and coupling is given here, a more detailed description is given in de Graaf et al. (2015) .

Global hydrological model
The model PCR-GLOBWB simulates hydrological processes in and between two vertically stacked soil stores (maximum thickness 0.3 and 1.2 m respectively) and one underlying groundwater store. The model was run at 5 resolution at a daily time step. For the climate forcing, monthly data from CRU TS 2.1 ( Mitchell and Jones, 2005 ) was downscaled using ERA-40 ( Uppala et al., 2005 ) and ERA-interim ( Dee et al., 2011 ) reanalysis to obtain a daily climate forcing (see de Graaf et al., 2013 for a more detailed description of this forcing data-set). For each time step and every grid cell the water balance of the soil column is calculated based on the climatic forcing that imposes precipitation (rain or snow depending on temperature), reference potential evapotranspiration, and temperature. Vertical exchange between the soil and groundwater occurs through percolation and capillary rise. In the original version of the PCR-GLOBWB model no lateral groundwater flow is simulated. Instead, groundwater dynamics are described by means of a linear storage-outflow relation (Store 3 in Fig. 1 A) . Specific surface runoff snow-melt, interflow, and baseflow are accumulated along the drainage network that consists of laterally connected surface water elements representing river channels, lakes, or reservoirs. A kinematic wave approximation at a sub-daily time step is used to route surface water to the basin outlet.
PCR-GLOBWB also simulates human water use by calculating at each time step sectoral water demand (i.e. the amount of water that would be abstracted if sufficient water were available) for irrigation, industrial, domestic, and livestock), the associated water withdrawals (partitioned into groundwater and surface water based on availability and pumping capacity), consumptive water use, and return flows. Irrigation is simulated by adding additional water to soils as a function of soil water deficit needed to achieve potential evaporation. Sectorial water demands were estimated using country statistics and population numbers downscaled to 5 resolution. To approximate economic development over the model period, data of Gross Domestic Product, electricity, and household consumption were used. Industrial water demands were kept constant over the year, while domestic demand reflect seasonality in relation to air temperature. Water recycling ratios for industry and domestic use are adopted from Wada et al. (2014) and were calculated per country on the basis of GDP and level of economic development. Irrigation demands were calculated assuming optimal crop growth, using maximum crop transpiration, and accounting for bare soil evaporation and soil water availability. Losses during abstraction and transport are calculated using a country specific irrigation efficiency factor (used from Siebert and Döll, 2010 ). Irrigation evaporation losses during application are not accounted for. Also, domestic irrigation is not included as it is very small compared to crop irrigation. Total water demand was dynamically allocated to surface water (rivers, lakes, and reservoirs) and dependent on actual availability in the resource and accounting for feedbacks such as return flows of unconsumed abstracted water and groundwater-surface water interactions (following de Graaf et al., 2013 ).

Global groundwater model
A two-layer MODFLOW model ( McDonald and Harbaugh, 20 0 0; Schmitz et al., 2009 ) replaces the linear groundwater store of PCR-GLOBWB. Aquifer properties were prescribed and include confined and unconfined aquifer systems (discussed in the next paragraph). The MODFLOW model was forced with outputs from PCR-GLOBWB, specifically the flow between layer 2 and 3 ( Fig. 1 ) consisting of net recharge (recharge minus capillary rise) and river levels. The net recharge is the input for the MODFLOW's recharge (RCH) package, while the MODFLOW's river (RIV) and drain (DRN) packages are used to incorporate interactions between the groundwater and surface water. As PCR-GLOBWB runs on a Cartesian (or regular) grid (geographic projection) we have to take account of the fact that cell areas and volumes of the MODFLOW grid do not vary in space. This is done by adjusting recharge and the storage coefficient in the RCH and BCF packages to account for varying cell areas and volumes associated on a geographic grid (see Sutanudjaja et al., 2011 andde Graaf et al., 2015 for details). Apart from the river water levels, the boundary conditions of the global groundwater model were obtained as fixed heads set equal to global sea-level (reference level 0). Initial conditions were obtained by a steady state-state simulation of hydraulic head with average recharge and then warming up the model by running the year 1960 back-to-back for 20 years to reach dynamic equilibrium.
Three levels of groundwater-surface water interactions are distinguished: (1) large rivers, wider than 25 m, (2) smaller rivers with a width smaller than 25 m, and (3) springs and streams higher up in the valley.
1. For large rivers, interactions are governed by actual groundwater heads and river levels ( HRIV [m]). The latter was calculated from river discharges ( Qchn [m 3 s −1 ]) simulated by PCR-GLOBWB and using channel properties based (channel width and depth) based on geomorphological relations to bankfull discharge ( Qbkfl [m 3 s −1 ]) ( Lacey, 1930;Manning, 1891;Savenije, 2003 ). The RIV-package calculates the water flux between the river and groundwater Qriv [m 3 d −1 ]. Water infiltrates the aquifer if the groundwater head lies below the river head: Qriv is then positive. Water is drained from the aquifer if groundwater head lies above the river head: Qriv is then negative. Qriv for the larger rivers ( Qriv.Big [m 3 d −1 ]) was calculated as: is the river bed conductance calculated from river length (approximated as the diagonal cell length), river bed width and depth and river bed resistance ( 3. Qriv.Big and Qriv.Small quantify flow between streams and aquifers and are the main components of the baseflow, Qbf , which is negative when water is drained from the aquifer. At the 5 resolution however, the main stream is insufficient to represent local sags, springs, and streams higher up in the mountainous area. We assume that groundwater above the floodplain level can be tapped by local springs that are represented by means of a linear storage-outflow relationship. From the DEM and the storage estimated in S3 (in the PCR-GLOBWB run) we calculate the amount of storage ([L], aquifer depth times porosity). From this follows the local drainage flux. To be consistent with the RIV and DRN packages, this linear storage term is also negative when water is drained. Total drainage Qbf [m 3 d −1 ] is thus calculated as: where S 3, flp [m 3 ] is the groundwater storage above the floodplain (obtained from PCR-GLOBWB) and J [d −1 ] is the recession coefficient parameterized based on Kraaijenhof van der Leur (1958) : is the specific yield (for each hydrolithological unit; Table 1 and L [m] is the average distance between streams and rivers as obtained from the drainage density . Globally the local drainage flux sums up to 50% of the total baseflow.

Model runs
PCR-GLOBWB and MODFLOW were only coupled one-way. This means we first run PCR-GLOBWB at a daily timestep for 1960-2010, and force MODFLOW with monthly averaged values of the following outputs: surface water levels are passed to the RIV package, net recharge (groundwater recharge minus capillary rise) is passed to the RCH package, and groundwater abstractions are passed to the WELL package. As we do not know the actual locations of the pumping wells, we simulate abstractions with a pumping well positioned in the middle of a MODFLOW cell (that coincides with a PCR-GLOBWB cell with groundwater abstraction). Also the linear storage term (i.e. J · S 3, flp ) is passed to the WELL package. After that, MODFLOW was run over the period 1960-2010 at monthly time steps with these fluxes and boundary conditions imposed.
This type of coupling ensures that the same amount of water passes through the groundwater model as through the groundwater zone ( Fig. 1 A store 3) of PCR-GLOBWB and also yields the same global average flux between surface and groundwater. Thus, the coupling ensures full terrestrial balance closure. Also, it implicitly includes capillary rise, which is calculated in PCR-GLOBWB  and is included in the net recharge. However, it does not include feedback effects of groundwater levels on the portioning of surface fluxes as a full coupling would do, nor does it explicitly portray the effects of groundwater abstractions on surface water levels.

Aquifer parameterization
The aquifer parameterization is based on available global datasets on lithology (Global Lithology Map (GLiM), ( Hartmann and Moosdorf, 2012 ) and permeability ( Gleeson et al., 2014;2011 ), combined with an estimate of aquifer thicknesses ( de Graaf et al., 2015 ). A detailed description of the aquifer parameterization is given in ( de Graaf et al., 2015 ), a summary is given below since the focus of this study is to extend the aquifer parameterization by categorizing world's aquifers in confined and unconfined systems.

General: aquifer permeability and transmissivity
Aquifer permeabilities were defined by lumping the lithology classes defined by Hartmann and Moosdorf (2012) to 7 combined hydrolithologies (adopted from Gleeson et al., 2011 ), representing broad lithological categories with similar hydrogeological characteristics. These units were subdivided further on the basis of texture for unconsolidated and consolidated sedimentary rocks ( Table 1 ), resulting in a global map representing regional scale permeability. The polygons of the lithological map, delineating a hydrogeological unit, were subsequently gridded to 30 (approx. 1x1 km at the equator) and aggregated as the geometric mean at 5 resolution (approx. 10x10 km at the equator). To calculate transmissivities ( T [m 2 d −1 ]) aquifer thicknesses are needed. These thicknesses were estimated since no global data-set of observed aquifer thickness is available. Using the assumption that productive aquifers coincide with sedimentary basins and sediments below river valleys, the distinction is made between (1) mountain ranges, and (2) sediment basins representing the aquifers. Subsequently, aquifer thicknesses were estimated by relating these to terrain attributes (e.g. curvature) after calibration of these relations with reported aquifer thicknesses from U.S. groundwater modeling studies (see de Graaf et al., 2015 for an extensive description of this procedure to delineate aquifer systems and estimate thicknesses). These estimated thicknesses represent the productive aquifers until the first impermeable basis, meaning that permeable and (thinner) confining units are lumped together into one big aquifer system.

Delineation of confining layers
In this study we delineate confining layers and categorize the aquifers of the world in confined and unconfined aquifers. The categorization is done using information on grain sizes of unconsolidated sediments in the lithological map (GLiM) and addi-  (2012) b Gleeson et al. (2011) , log k μgeo is the geometric mean logarithmic permeability; σ is the standard deviation; f.g. and c.g. are fine-grained and coarse-grained, respectively. c S y is the storage coefficient, average per category. tional information in the sediment property description of GLiM. Fig. 2 A shows the resulting map of confining layers. Some clear spatial inconsistencies can be seen, e.g. for the US over the High Plain aquifer, or the German-Polish border (inconsistencies are discussed in Hartmann and Moosdorf (2012) ). We interpreted the GLiM lithologies of 'fine-grained unconsolidated sediments' and 'mixed-grained unconsolidated sediments' as potential near-surface confining units. About 30% of the mixed grained unconsolidated sediments have layers of clay, silt or till in the sediments description. We assume, based on the uncertainty in the sediment de-scriptions, that this 30% represents the minimum percentage of unconsolidated mixed grained sediments with a confining unit. Additionally, we classified regions of the world where we know confining layers are present but are not fully represented in GLiM; the coastal zones. We reconstructed the inland extent of Holocene fine-grained coastal plains that overlay older, coarser sediments from the Pleistocene. A reconstruction of sea-level rise (e.g. Toscano and Macintyre, 2003 ) and coastal sedimentation (e.g Syvitski et al., 2005 ) implies that 60 0 0 years ago along many Holocene coastal plains the coastline was more seawards compared to the modern situation. Coastlines have receded tens of kilometers since in many deltas (e.g. Stanley and Warne, 1994 ). With sea-level rising over the Holocene, the downstream gradient along the river flattened, and sediments accumulated in the downstream part. The apex of the delta and deposition areas indicate the point where accumulation originated. This accumulation point is reconstructed using the profile curvature along the river, with its occurrence where the curvature changes from convex to concave starting from the coast. The accumulation points were identified globally for larger rivers ( > 25 m wide) with their outlet on, or near the coast (i.e. within 50 km). Points at high elevations, clearly not related to sedimentation in the coastal zone, were excluded. Fig. 3 A shows a schematic cross-section of a coastal aquifer with a confining layer on top. Thickness of the confining layer at the coast is estimated using ocean bathymetry (elevation taken one 5 grid-cell out of the coast). The thickness at the coastline is interpolated along the river using the river gradient until the apex, where thickness is minimal. The interpolation is done for all identified large rivers and thickness of the coastal plains is derived by interpolation between the rivers, bounded by the elevation contours on which the apex are situated ( Fig. 3 B. Following this approach, approximately 11% of the global coastline is classified as confined after interpolation.

Confined aquifers: aquifer transmissivity and storativity
Confined aquifers are defined here as those aquifers overlain by fine grained confining layers. We realize that the term 'confined' aquifers may not entirely cover the different hydrogeological settings that are found in reality. First, aquifers covered with finegrained sediments may still allow vertical flow through the confining layers and are thus in fact leaky or semi-confined aquifers. Second, due to the varying thickness and permeability of the confining layers certain areas may be in fact unconfined, depending on the spatial scale under consideration. Third, in areas with excessive groundwater pumping the drawdown close to wells may cause part of the originally (pressurized) groundwater head to fall below the top of the aquifer, causing it to be partially unconfined. All these cases mean that what we define here as confined aquifers  Table 1 ), for the confined aquifer kh and kv were taken similar to coarse grained unconsolidated sediments. S is the storage coefficient, defined as specific yield (confining layer, values Table 1 ) or storage coefficient (confined aquifer). may be better termed as partially and semi-confined aquifers. Having noted this limitation, for the sake of briefness of terminology, we call all aquifers covered with fine grained confining layers 'confined aquifers', including the hydrogeological settings described above. Finally, we note that in the implementation in MOD-FLOW we use a global two-layer model where for the areas with unconfined aquifers the top layer is given the same hydraulic properties as the underlying aquifer.
The parameter settings used for the unconsolidated confined aquifers and the confining layers are shown in Fig. 4 . For all other rocks and sediments the values from Table 1 are used. In absence of any other information, vertical conductivity was initially assumed to be the same as the horizontal conductivity and next, a calibration procedure was used to calibrate the storage coefficient, the horizontal conductivity as well as the anisotropy ratio kh/kv assuming kh ≥ kv (see Table 1 ). Aquifer storage capacities were parameterized using specific yields for unconfined aquifers ( Table 1 ) and for confined aquifers a storage coefficient of 0.001 ( Heath, 1983 ) was assumed. As mentioned before, the estimated aquifer thicknesses represent productive aquifers until the first impermeable basis. This means that aquifers separated by semipermeable layers are lumped into one big aquifer system. As a result, head declines may be underestimated. By including confining layers where fine-grained sediment properties are present and using an anisotropy ratio kh/kv (with kh ≥ kv), we expect to implicitly correct for neglecting semi-permeable layers and lumping multi-aquifers into one system.
Thickness of coastal confining layers is estimated using ocean bathymetry and interpolation (as described in 2.3.2). For 80% of the coastal confined grid-cells the estimated confining layer thickness is approximately 10% of the total estimated aquifer thickness. Based on this finding and by lack of any other information, we decided to apply a thickness of 10% of the total estimated thickness to all confining layers not located near the coast.

Analyzing the effect of schematization
We analyzed the effects of incorporating confined aquifer systems on simulated groundwater head fluctuations, groundwater flow patterns, groundwater-surface water interactions and storage changes. Three scenarios were formulated describing different spatial distributions of confined and unconfined aquifer systems: (1) unconfined systems only (following de Graaf et al., 2015 ), (2) confined systems for fine grained unconsolidated sediments and mixed grained unconsolidated sediments with fine grained layers (30%) (in GLiM) and coastal confined regions (minimum confining scenario), and (3) confined systems for fine grained and all mixed grained unconsolidated sediments and coastal confined regions (maximum confining scenario). The parameter settings of confined and unconfined systems are used as described above ( Fig. 4 and Table 1 ).

Calibration and validation of simulated groundwater heads
In order to calibrate and validate the groundwater model, timeseries were selected from the US (available from the USGS: www. usgs.gov/water ) and Europe (Rhine-Meuse basin: Sutanudjaja et al., 2011 ). The used time-series have a record covering at least five years and include seasonal variation; for the US ∼280 0 0 stations, for Europe ∼60 0 0 stations were available. An earlier steady-state version of the model ( de Graaf et al., 2015 ) was already validated on all observations provided by Fan et al. (2013) (giving a total of 65 303 cells with observations), which also consist of observations from Spain, Brazil and Australia. However, many of these observations are not time series but single or time-average values. We therefore used these in our previous steady-state version ( de Graaf et al., 2015 ) but not to validate the present transient simulation. In case multiple stations were positioned in one grid cell, we chose not to average these but use them as they are, because the different stations generally have observations over different time periods. Also, stations have different surface elevations or are placed in different aquif ers (i.e. deep or shallow). We randomly selected half of the observations for calibration and half for validation (split sample approach).
Limited calibration was used to better fit the groundwater model to the head observations. The calculation times (approximately 2 weeks for a 1960-2010 run) were too long to allow for an full-fledged automated calibration (e.g. Doherty, 2015 ). As a first step, we ran the groundwater model in steady state to calibrate the horizontal conductivity and the anisotropy ratio of the confining layers. A single global pre-factor was used varying the horizontal and vertical conductivities between 0.01 and 100 times the initial value, changing the anisotropy ratio kh / kv (a total of 9x9 = 81 runs). Using the parameters of the best performing run (the parameter setting with the minimum root mean square error RMSE) we then further calibrated the model performing six transient runs while changing the storage coefficient between 0.1 and 5 times the initial value and allowing the horizontal conductivity and anisotropy ratio to be changed +/ −10%. This calibration was performed using the maximum confining conditions, as we see this as the most realistic scenario. The best performing parameter set (with minimum RMSE) found was also used using the minimum confining conditions, and the results of both runs were compared.
The model performance was validated by evaluating simulatedto observed water table head time-series (the other half of the head observation locations not used for calibration). Heads (with reference to ocean level) instead of depths were used as heads measure potential energy and are therefore more meaningful than depths. The evaluation focused on mean monthly values, timing of fluctuations (given by the cross-correlation coefficient R ), and amplitudes. The latter is compared by the (relative) inter-quantile range error, QRE , calculated as: where, IQ m 7525 and IQ o 7525 are the inter-quantile ranges of the modeled and observed data time series.
The cross-correlation is calculated as: where s m and s o are the sample standard deviations of the modeled (m) and observed (o) samples, and s mo is the sample covariance between the two. It was evaluated which stations performed good agreement in timing ( R > 0.5) and fluctuation magnitude ( Q RE < 50%). Results are shown in scatter plots. Additionally, we compared averaged simulated heads to averaged heads of the observed time series and a wider set of observations including observations of Spain, Brazil, and Australia provided by Fan et al. (2013) . Apart from these global validation measures we also evaluated the performance of the model in three areas where groundwater depletion is severe (as follows from the global analysis, sections 2.7 and 3.5): The High Plains aquifer (USA), The Central Valley of California (USA) and India. For these areas, we compared on an aquifer-by-aquifer basis the average head decline rates (shown in the supplementary online information).

Groundwater flow patterns
The effect of considering confined aquifers on groundwater flow patterns is analyzed by simulating groundwater flow paths. Groundwater flow paths and travel times were simulated for heads averaged over the simulation period, comparing two model setups: unconfined and confined aquifer systems (the unconfined and maximum confining scenarios). Flow paths were calculated with particle tracking in MODPATH ( Pollock, 1994 ), using cell-tocell flux densities [volume/time/area] as input. In MODPATH a flow path is computed by tracking the particle through the subsoil from one cell to the next, from the location of recharge (at the water table) toward the point of extraction or drainage (strong and weak sinks).
Following Schaller and Fan (2009) the spatial redistribution of local recharge was evaluated using the ratio between groundwater drainage and recharge: Qgw / Rgw . The ratio was estimated for sub-basins that we defined using stream-orders, starting from the second stream-order at 5 . The local drainage network was defined at the 5 grid-cell, and for every cell a stream is present. It follows that, for a ratio of 1, all water recharged in the basin is also drained in the basin, or cross basin-boundary groundwater inflows and outflows are approximately equal. If part of the water recharged in a catchment flows to a neighboring catchment and it is not compensated by cross-boundary inflow, the catchment's groundwater drainage is less than groundwater recharge, thus Qgw / Rgw < 1. The catchment is then classified as a 'net groundwater exporter'. If more water is drained than recharged in a catchment, the catchment is classified as a 'net groundwater importer' and Qgw / Rgw > 1. Deviations from 1 are primarily a function of geology and topography, while climate and basin size influence the magnitude of these deviations ( Schaller and Fan, 2009 ). The spatial pattern of net importers and exporters is expected to change by including confined aquifer systems.

Analyzing the global effects of human groundwater use
We used the two-layer transient global groundwater model to analyze the effects of human groundwater withdrawals on groundwater heads. The groundwater abstractions taken from the 1960-2010 run with PCR-GLOBWB (including human water use) were subsequently used as input for the MODFLOW model through the WELL-package. For confined systems all abstractions were located in the confined aquifer, for unconfined systems abstractions were located in the lower layer, which has the same conductivity and storage coefficient as the top layer. The MODFLOW model was additionally forced with PCR-GLOBWB river levels and net recharge outputs (recharge minus capillary rise) that include the effects of abstractions and corresponding return flows over Table 2 Parameter values used in the calibration process.

Delineation of confining layers
Fig. 2 A shows the spatial distribution of the confining layers. For the minimum confining scenario about 6% of the total aquifer area is classified as confined, i.e. belonging to either coastal confined, or fine grained and layered mixed grained unconsolidated sediments (in GLiM, Hartmann and Moosdorf, 2012 ). This 6% is assumed to be the minimum extent of confined aquifers systems worldwide. For the maximum confining scenario about 20% of the total aquifer area is classified as confined, belonging to either coastal confined or fine grained and mixed grained unconsolidated sediments. This scenario is assumed to represent the maximum extent of confined aquifer systems. The relative distribution per continent does not differ much between the two scenarios ( Fig. 2 C). Combining coastal and non-coastal confined aquifers, respectively 5.39 x 10 6 km 2 to 17.34 x 10 6 km 2 of the world's aquifers are classified as confined. Table 2 shows the parameter ranges of horizontal ( k h ) and vertical hydraulic conductivities ( k v ) and storage coefficients ( Sy ) used in the calibration and the resulting 'best' parameter set (with minimum RMSE). It was found that the model performance is best when the ratio between k h and k v is 1:0.1 and when Sy was multiplied by 3. With these parameters the model performance was evaluated for the three scenarios (unconfined, minimum, or maximum confining) focusing on mean monthly values, timing of fluctuations and amplitudes of groundwater heads. Simulated head fluctuations were compared to the observed head time series (for Europe and US) that are not used in the calibration. Note that the locations of these head time series are often biased towards river valleys, coastal ribbons, and productive aquifers.

Calibration and validation
Scatter plots, plotting the grid-cell minimum | Qre | ( Eq. (5) ) and maximum R ( Eq. (6) ), for the three scenarios are given in Fig. 5 . Overall these scatters show good agreement in timing ( R > 0.5), which (slightly) improves when confined aquifer systems are included, seen as more points above the 0.5-line. Also, overall amplitude errors are small (| Qre | < 50%). When including confining layers the amplitude errors increase slightly, seen as more scatter to the 50%-line and above this line. This slight increase in amplitude errors is particularly found in areas with considerable recharge under unconfined conditions and/or where groundwater levels are disconnected from the local drainage in the confined scenario and where the model most likely underestimates groundwater heads. Next, we evaluated for how many head observations (averaged over the grid cell) our simulated heads show good agreement on the two criteria (cluster I in Fig. 5 ), how many perform good on one criteria (good on amplitude: cluster II, good on timing: cluster III), and how many perform bad on both criteria (cluster IV). Results are given in the table of Fig. 5 . For the unconfined scenario 26% of the simulated head time-series show good agreement, compared to 29% for the maximum confined scenario. Examples of time-series of the observed and simulated heads for which timing and amplitude are well captured are given in Fig. 6 . Instead of plotting actual head values, we plotted the anomalies related to their mean values. The examples show that the model is able to capture both timing and amplitude of the observed heads reasonable well, and that including confining layers improves the estimates, seen in particular for the amplitude estimate in the Central Valley example.
The scatter and statistics ( Fig. 7 ) of temporal mean simulated values against mean observed values for the maximum confining scenario show that averaged values are reproduced well (very high R 2 and α close to 1) and are slightly better when only head locations in areas with sediments and sedimentary basins are considered in the comparison. Model performance of the other scenarios is comparable (see the table of Fig. 5 . However, seen from both scatters the trend is for a general underestimation of groundwater heads (meaning simulated groundwater levels are deeper than observed). This appears especially for the higher elevated areas, where shallow water tables are most likely to be sampled but are not captured by our model as a result of the grid resolution. The R 2 of sediment basins is therefore slightly better than when mountain areas are included in the evaluation, however groundwater heads in sediment basins are underestimated. Fig. 8 (top) shows the long-term average groundwater depth simulated for the maximum confining scenario simulated under natural conditions (no groundwater pumping). General patterns in groundwater depths can be identified, and are similar for the three scenarios and comparable to previous results by de Graaf et al. (2015) . At the global-scale, sea level is the main control of groundwater depth and throughout the entire coastal ribbon shallow groundwater tables are found. These areas expand inland where the coastal ribbon or coastal plains meets the major river deltas, like the Mississippi, Indus, and large wetland areas of e.g. South America. At the regional scale, recharge is the main control together with topography. Topography influences for example the heads in the flat areas of central Amazon and the lowlands of South America that receive groundwater from the higher elevated areas. High, flat recharge regions also result in shallow groundwater tables, e.g. the tropical swamps of the Amazon. Regions with low recharge rates show deep groundwater tables; the deserts stand out where groundwater, if present, is now disconnected from the local topography. Also, for mountain regions deep groundwater tables are simulated. In these areas local aquifers in sedimentary pockets in mountain valleys are smaller than the grid resolution Fig. 5. Scatter plots plotting amplitude error (| QRE |) against correlation ( R ) for the three scenarios, the maximum confined scenario shows the best agreement to the observed head time series. When more than one observed time-series was available in a grid-cell, the result of the highest performance value is plotted in the scatter. In the top figure cluster areas are indicated where simulated heads show good performance both on timing and amplitude error (I), good performance of timing (II), good performance on amplitude error (III), and bad performance on both (IV). Percentage of simulated head time-series per cluster for the three scenarios are given in the table, as well as model performance on average groundwater heads.

Global map of groundwater depth
( < 10 km) and are therefore not captured. As a result, groundwater heads in these regions are likely underestimated ( de Graaf et al., 2015 ).
The differences in groundwater depth between the two model layers (top: confining layer, bottom: confined aquifer) are small. Insets of parts of the US and Europe illustrate this ( Fig. 8 bottom). For unconfined systems, where top and bottom layers have equal conductivities, any groundwater depth differences are expected to be small. For confined systems, where conductivities are different for the top confining layer and bottom confined aquifer, the groundwater heads in the confined aquifers are often higher than those in the overlying layers. This is for example seen along the Dutch coast, Po river, Rhone, and along the Gulf of Mexico, including the Mississippi. However, when groundwater recharge does not seep through the confining layer groundwater tables can exceed the hydraulic heads of the confined aquifer. This is seen for example in the Mississippi embayment and the Garonne region (France). Fig. 6. Examples of groundwater head anomaly comparison between measured data (red) and simulated data from the different scenarios; unconfined (yellow), minimal confined (green), maximum confined (blue) for a location in the Central Valley (USA) and a location in the Netherlands. We selected these two locations to show the difference between an unconfined (Central Valley) and a confined (Netherlands) system, for two large aquifer systems. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Groundwater flow patterns
Flow paths simulations were performed using averaged groundwater heads  for the unconfined and maximum confining scenarios. We focus the discussion on part of the US ( Fig. 9 ) as this region contains shallow and deep groundwater tables and confined and unconfined aquifer systems, and has been the focus of related previous studies ( Gleeson et al., 2011;Schaller and Fan, 2009 ). Confined systems are e.g. found for the High Plain aquifer, along the Gulf of Mexico, and Mississippi embayment ( Fig. 9 left). The simulated flow paths show how groundwater recharge is redistributed over time and space ( Fig. 9 middle and right). Longer flow paths, crossing catchments boundaries, provide additional recharge to the importing catchment, thereby supporting water budgets in that catchment. The difference between the unconfined and confining scenario (maximum) is evident ( Fig. 9 ), e.g. seen for the coastal lowland aquifer, Mississippi Embayment aquifer, and the High Plain aquifer. Travel distances and travel times are generally shorter if confining layers are present. This results from the lower permeability of the upper layer, providing higher groundwater tables, higher drainage densities and as a consequence more local groundwater systems. Occasionally flow paths end up in the lower aquifer where they become disconnected from the surface water system over longer distances resulting in longer   The difference in spatial distribution is also shown by the ratio between groundwater baseflow and groundwater recharge ( Qbf / Rch ) per a sub-basin ( Fig. 10 ). In this study the sub-basin is defined by stream-order, starting from the second stream-order. A ratio of 0.1 means 90% of the recharged water is exported, a ratio of 2 or larger means at least half of the groundwater drainage comes from neighboring catchments. The histograms in Fig. 10 show the effect of confining units on the distribution of net groundwater importer and exporter sub-basins. The results show that the change in distribution of groundwater exporter and importer sub-basins is small when confining layers are introduced. Fig. 11 shows, for the maximum confining scenario, a global map of the cumulative depletion (mm per grid cell) of groundwater removed from storage by pumping. The well-known hotspots of groundwater depletion appear, e.g. High Plains aquifer, California, Indus basin, and Saudi-Arabia (e.g. de Graaf et al., 2013;Richey et al., 2015;Scanlon et al., 2012 ). As an additional means of validation we compared the simulated head drops by groundwater depletion with reported rates in a number of well-known stressed aquifers ( Gleeson et al., 2012;Richey et al., 2015 ): the Central Valley, High plains aquifer, and India (see Fig. 1 in the Supplementary  Information). These results show that depletion estimates in California, the Southern Great Plains and India are reasonable to good (of similar order of magnitude and spatial pattern), while those in the northern part of the Great Plains are severely over-estimated. Possible explanations for this mismatch are an underestimation of aquifer permeability and storage coefficient, overestimation of groundwater abstraction rates and the underestimation of enhanced surface water infiltration.

Analyzing the global effects of human groundwater use
We compared the simulated depletion trends for the maximum confining scenario and unconfined scenario, calculated using head declines and storage coefficients ( Fig. 12 ). We also compared these estimates to the difference between recharge and groundwater ab- Table 3 Previous published depletion cumulative over 1960-20 0 0 (as most studies did not simulate until 2010) compared to this study. straction (in case of deficit) per grid-cell, which is the way that depletion rates were estimated using flux-based methods ( Pokhrel et al., 2012;Wada et al., 2010 ). The total trend for the maximum confining scenario shows a global increase of depletion with some periods of recovery between 1960-20 0 0, after which we see an acceleration of depletion until 2010. The recovery periods (1965)(1966)(1967)(1968)(1969)(1977)(1978)(1979)(1980)(1981) must be the result of climate variability-related increases in recharge and associated increased capture from surface water, as we observe an exponential increase in abstraction minus recharge over the same period. The sudden increase in depletion after 1998 is most likely an interplay between climate (e.g. the 1982-83 strong El Niño) and increased surface water and groundwater withdrawals related to socio-economic trends ( Wada et al., 2010 ). Figure 2 (Supplementary Information) shows that effect of hydraulic properties on the groundwater depletion volumes (note volumes, not heads) is considerable, which makes estimates of groundwater depletion by volume-based methods rather uncertain.
Using the maximum confining scenario and the best parameter set, the total groundwater depletion over 1960-20 0 0 is estimated as 2703 km 3 and over 1960-2010 at 7013 km 3 (resulting in an average depletion rate of 431 km 3 over 20 0 0-2010). The depletion estimate is comparable to previous published estimates of volume and flux-based approaches (see Table 3 ). Fig. 12 shows that the un-  Fig. 12. Trend of depletion globally, estimated for the maximum confining and unconfined scenario, and calculated as the cumulative deficit between recharge and abstraction for grid-cells where abstraction is larger than recharge. confined scenario shows in general the same trend. However, the estimated groundwater depletion is larger. The estimated depletion difference is 4803 km 3 . This difference can be explained by the increase in river capture under the confined scenario. The presence of a confining layer results in a larger area of influence of head decline.
Despite the lower permeability of the confining layer, this larger area of influence causes a larger decrease in baseflow or a larger increase in riverbed infiltration. Indeed, the groundwater drainage rate of the confined aquifer model is ∼50 km 3 y −1 lower than that of the unconfined aquifer model, which adds to 30 0 0 km 3 over 1960-2010.
The estimated depletion calculated as the deficit between groundwater recharge and abstraction for cells where more water is abstracted than recharged (flux-based method as used in e.g. Pokhrel et al., 2012;Wada et al., 2010 ) shows much bigger depletion volumes of 26,700 km 3 over 1960-2010 (with an average depletion rate of 323 km 3 over 20 0 0-2010). Again, the difference in estimated depletion can be explained by the increase in surface water capture, which is not accounted for by calculating the recharge-abstraction difference but is included in the groundwa-ter model. The large difference confirms the need to use a lateral groundwater model, accounting for groundwater-surface water interactions, when sensitivity of aquifers to storage changes is studied.

Conclusions
This paper presents a global-scale groundwater model simulating lateral flows and head fluctuations caused by changes in climate or human water use over the period 1960-2010. It is the first global-scale transient groundwater model that includes a parameterization of unconfined and confined systems (presented in two model layers), and simulates lateral-flows and groundwater-surface water interactions in these systems. This results in more realistic estimates of groundwater head fluctuations and storage changes than before.
One of the main contributions of this study is the parameterization of the world's aquifer systems, including information on their vertical structure. The model's aquifer parameterization is based on global data-sets of surface geology and hydraulic properties and topography-based estimates of the vertical structure of the aquifer systems. Only globally available data sets are used in order to keep the methods readily applicable to data-poor environments. The world's aquifers are classified into confined and unconfined systems to understand aquifer sensitivity to groundwater abstractions and to properly project future groundwater level declines.
Results show that including confining layers (estimated 6-20% of the total aquifer area) improve estimates of groundwater head fluctuations and timing and change flow path patters and groundwater-surface water interaction rates. Model performance on average heads and timing of fluctuations is slightly better when confined systems are considered, and best for the maximum scenario (20% of the total aquifer area is confined). Groundwater flow paths within confining layers are shorter compared to paths in the aquifer, while flows within the confined aquifer can get disconnected from the local drainage system due to the low conductivity of the confining layer. This change in groundwater hydrology is reflected in head fluctuations and flow magnitudes. Although model performance in terms of heads is only slightly better when confined aquifers are considered, the estimated depletion rates are closer to previous volume-based estimates (e.g. Konikow, 2011 ) and show that capture of surface-groundwater interaction are simulated more realistically than before.
Lateral groundwater flows between basins are significant in the model, especially for confined aquifer areas where long paths are simulated crossing catchment boundaries, thereby supporting water budgets of neighboring catchments or aquifer systems.
The estimated global depletion over the period 1960-2010 is 7013 km 3 . This value is in the range of earlier published values (e.g. 80 0 0 km 3 ( Wada et al., 2010 ), 50 0 0 km 3 ). Hotspot regions of depletion are found for intensively irrigated areas, like the High Plains aquifer (simulated at 1390 km 3 for 20 0 0) or Central Valley (simulated at 48 km 3 for 20 0 0). Although depletion is strongly overestimated for the Northern High Plains compared to reported data ( ∼240 km 3 reported for 20 0 0 by USGS ( Scanlon et al., 2012 )), simulated temporal fluctuations mimic measured data well. The same holds for Central Valley, where we slightly underestimate total depletion (which is reported at ∼53 km 3 for 20 0 0). The comparison of depletion volumes for confined, unconfined, and estimated as the deficit between recharge and abstraction, shows increased river capture lowers the estimated depletion volume. This confirms that a lateral groundwater model, with realistic parameter settings for its aquifers, is preferred when sensitivity of aquifers to groundwater abstractions is studied.
Of course there are several limitations to our approach. Firstly, the grid-resolution (5 , approx. 10x10 km at the equator) is still too coarse to capture local aquifers in higher and steeper terrain. Groundwater heads are underestimated for these areas. Note however, that perched water tables within the soil are accounted for in the PCR-GLOBWB.
Secondly, our groundwater model is not fully coupled to our hydrological model, meaning the groundwater-surface water interaction is still simplified: two-way groundwater-surface water exchange and the resulting groundwater recharge is first simulated in PCR-GLOBWB. The simulated groundwater recharge, and the calculated river levels are imposed on the MODFLOW RIV package afterwards. Although this approach can be expected to preserve the same groundwater-surface water fluxes in the groundwater model, the effects of groundwater pumping on the surface water system are more intricate and would preferably require a tighter coupling. Thus, this modeling study is the next step towards a fully coupled global surface water-groundwater model. In future work the two models will be fully coupled to incorporate feedback effects on a daily time step basis.
Thirdly, the most obvious limitation is that the aquifer parameterization used in this study is a relative simple approximation of the true 3D-hydrogeological setup of aquifer systems that contain multiple stacked aquifers and aquitards. Our model thus produces only a first-order estimate of head fluctuations, storage loss and average head decline over the entire aquifer set, most likely underestimating depletion rates when abstraction comes from deeper confined aquifer systems. Hence, the global hydrogeological model, although at the edge of what can be expected when using only global information, leaves much room for improvement. A concerted effort of the hydrogeological community to use information from regional on hydrological-and groundwater models holding more detailed information of e.g. presence of confining layers or conductivity values, as well as observations to update the global two-layer model would likely lead to the largest improvement in global groundwater assessments . Also, more detailed information on the precise vertical hydrogeological structure of the aquifers around the observation well screens, used for calibration and validation of the model, would improve the calibration results. Knowledge on the sensitivity of aquifers to changes in climate and human water use is vital and needed to ensure sustainable water use particularly for the semi-arid regions of the world, where groundwater abstractions will intensify due to an increase in drought frequency and duration combined with population growth, expanding irrigation areas, and rising standards of living. Therefore it is important not only to focus only on areas already experiencing groundwater depletion, but to provide a global overview revealing where and when in future new areas experiencing water scarcity will develop in the near future.