UvA-DARE (Digital Academic Repository) Talk and Tools: The best of both worlds in mobile user interfaces for E-coaching

In this paper, a user interface paradigm, called Talk-and-Tools, is presented for automated e-coaching. The paradigm is based on the idea that people interact in two ways with their environment: symbolically and physically. The main goal is to show how the paradigm can be applied in the design of interactive systems that offer an acceptable coaching process. As a proof of concept, an e-coaching system is implemented that supports an insomnia therapy on a smartphone. A human coach was replaced by a cooperative virtual coach that is able to interact with a human coachee. In the interface of the system, we distinguish between a set of personalized conversations ("Talk") and specialized modules that form a coherent structure of input and output facilities ("Tools"). Conversations contained a minimum of variation to exclude unpredictable behavior but included the necessary mechanisms for variation to offer personalized consults and support. A variety of system and user tests was conducted to validate the use of the system. After a 6-week therapy, some users spontaneously reported the experience of building a relationship with the e-coach. It is concluded that the addition of a conversational component fills an important gap in the design of current mobile systems.


Introduction
In the interaction with the world that surrounds us, two types of information flow can be distinguished: a symbolic and a physical one. On the one hand, human beings can interact symbolically with other individuals using verbal and non-verbal signs to express their thoughts or ideas (e.g., asking, providing information, or giving commands). On the other hand, humans can interact with the world physically by directly manipulating the domain (e.g., pushing, moving, lifting) and through sensing (e.g., hearing, seeing, touching). The essential difference between the two interaction types is that interactions of the first type need an interpreter who can bridge the gap between the symbols and their actual meaning and purpose, while interactions of the second type are related in a more direct manner to human perception and action [1,23,39].
The distinction has given rise to two major existing paradigms for designing user interfaces of computerized systems: (a) the so-called conversational or dialog interfaces that mimic the conversational behavior between two social actors [27] and (b) the so-called model world or direct manipulation interfaces that involve continuous representations of objects and real-time feedback of their behavior or manipulations [24]. With the advent of mobile computer systems and other technological progress, there is a need to integrate the two paradigms. Users can "Talk" with these systems supported by a conversational interface, and they can work with these systems supported by "Tools" based on direct manipulation interfaces. Here, the Talk pertains to spoken or written dialogs between human users and/or a computer system that behaves as a social actor, while the Tools enable the observation, analysis, and manipulation of task objects in a physical or virtual domain. We will, therefore, refer to this integration as the Talk-and-Tools paradigm.
In this paper, we will apply the Talk-and-Tools paradigm in the domain of e-coaching for behavior change. The main goal is to show how the paradigm contributes to an acceptable coaching process and to present the mechanisms behind a concrete implementation of the paradigm. A human coach will be replaced by an automated e-coach, a virtual social agent that mimics the role of a personal coach. To make the discussed concepts and their relations more concrete, the paradigm is realized in a mobile application called SleepCare, a system that supports an individual human client (the so-called coachee) with a tailored insomnia therapy on a smartphone. The Talk provides for dialog and periodic consults with the e-coach and the Tools for data acquisition, analysis, and visualization.

Theoretical background of the Talk-and-Tools paradigm
The rationale behind the use of the Talk-and-Tools paradigm is depicted in Fig. 1, where the distinctive interaction channels are represented in a triangular structure. The corners represent the external world and two social actors, and the arrows represent the information that flows between them. The external world is often called the domain of discourse and the social actors may embody the participants in a conversation. The Tools enable the participants to have different views on the external world or different ways to manipulate objects and their features [3].
The domain of discourse can be represented by an object model, i.e., a set of concepts representing the entities, their properties, and relations under discussion. For example, in the insomnia therapy domain presented below, key entities are "bedtime," "adherence," "sleep efficiency," and "exercise". Not all entities discussed in the conversations between the two agents are represented in the domain of discourse; greetings, for example, although vital for human interaction, are not.
The domain of discourse can be replaced by a computer system, containing a database, an ontology, a simulation program, or a digital game, and may even be in connection with a sensor or effector system to connect to the physical world. In the latter case, the system not only enables the measurement or manipulation of the physical conditions, such as light and location, but also conditions of the human body of a human participant (e.g., skin temperature, heartbeat). Moreover, one of the participants (e.g., Participant X) may be simulated by a computer system that behaves as a social agent and that has the ability to access the domain of discourse. To the human user (e.g., Participant Y), the appearance of the computer agent may take different forms, ranging from a simple text field to a human robot or a sophisticated embodied character that interprets and generates speech and shows non-verbal expressions.
The paradigm enables a human user to view interaction with computer systems as (i) a conversation with some intermediary, (ii) a sequence of direct manipulations and observations of some virtual world, or (iii) a combination of the two. The third type of interaction was applied in the design of, for instance, SHRDLU [45], Collagen [34], and the DenK-system [1]. In SHRDLU, the concrete domain was a simulated blocks world; in Collagen, a virtual videocassette recorder; and in DenK, a virtual electron microscope.
To be able to Talk, the computer agent employs an intermediate structure or discourse model for interpreting messages and constructing responses [37]. A simple approach to initiate a response from the agent is to develop a structure that uses a combination of word categories and pattern rules as in Eliza [44] and Alicebot [42]. Others use logic-oriented approaches that include contextual information and the ability to reason about the domain of discourse or about the mental state (e.g., beliefs and goals) of the other communication partner. For this purpose, the agent usually contains information structures about the dialog, the domain, the tasks, and the users. Examples of such structures are dialog grammars or finite state machines to model the dialog [25]; Type-Theory to represent knowledge structures and discourse [1]; the context-change approaches to dialog act interpretation and generation [10]; and Beliefs-Desires-Intentions architectures to reason about, for instance, goals and plans [2,33,43].
Fig. 1 The rationale behind the use of the Talk-and-Tools paradigm: an external world and two social actors who participate in interaction; the arrows show the flow of information between entities
In the present work, Participant X will be replaced by an intermediate structure that mimics the conversational behavior of an e-coach in the insomnia therapy domain; the direct interaction flow will be represented by Tools that form a coherent structure of visual and auditory interaction facilities. To trigger the system's communication activities, we will introduce an activity schedule and a so-called "constraint-based" model that will be explained below.

Coaching in the domain of insomnia
An essential question for the development of a virtual participant is what should be communicated by that participant in its role of a coach at a particular moment in time. Here, we assume that the interactive behavior of the coach is determined by three elements: the typical behavior of a coach in terms of goals, knowledge, and responsibilities; the constraints of the therapy; and the behavior of the other participant. Let us first briefly discuss the typical behavior of a coach.

Coaching
Coaching can be defined as a result-oriented, systematic process in which the coach facilitates the enhancement of life experience and goal-attainment in the personal and/or professional lives of normal, non-clinical clients [17] (p. 254). In the coaching process, two learning dimensions are distinguished on the part of the coachee that strikingly correspond with the Talk-and-Tools paradigm [38]: (1) learning as a social and collaborative practice and (2) learning through individual subjective experiences. The first dimension refers to the idea of a collaborative dialog that unfolds between the coaching parties and where the coachee learns in interaction with the coach (Talk); the second refers to the experiential (perception and action)-oriented process of the coachee (Tools). Through the dialog, the coach and coachee exchange information to achieve alignment, set goals, and discuss the results of the individual activities and exercises [18]. In contrast to therapy and teaching, we assume that coaching involves a more collaborative approach and respect for the coachee's autonomy. Society and the daily lives of people are highly contextual and characterized by a growing degree of uncertainty. What counts as a solution for one person may not count as a solution for another. Consequently, coaches will be cautious in offering solutions and will help coachees to learn rather than teach them. In our view, coaching refers to a more explorative interaction style, a highly personalized and contextualized process of deliberation characterized by a continuous cycle of questioning, advising, agreement, observation, feedback, and adaptation. Since computer systems contain only a fraction of the knowledge necessary to offer an adequate tailored therapy, the role of a coach seems an excellent candidate for a digital agent that supports automated self-help therapies; we will here refer to such systems as e-coaching systems.

E-coaching
E-coaching systems exist in many forms and functions, may include sophisticated avatars, and may be targeted at a variety of domains. Here, we focus on fully automated mobile systems (i.e., without human interference) that aim at a form of sustainable behavior change in the health domain. Health coaching dialog systems have been developed on the basis of research methods from persuasive technology (e.g., [14]) and behavioral medicine (e.g., [36]) and cover a wide range of health domains, ranging from the treatment of obesity to sleep disorders and rehabilitation. In [8], for example, an e-coach was described for supporting overweight people to improve their lifestyle. The e-coach was able to help motivated participants to adhere to the program and effectively lose weight. In [9], an e-coach is described that offers a fully automated treatment for depression, based on behavioral activation. In [11], an e-coach is described that offers an insomnia treatment in six weekly sessions. Also, the research by, for instance, [35,40,41] showed promising effects from this type of approach in the health domain.
While many studies mainly focus on one specific domain, some projects aim at developing a reusable framework for e-coaching. Examples are the work by [7,12,28,31]. In particular, the e-coaches of [7,12] provided fully automated Talk-based communication, although the latter used only text messages instead of a dialog interface. In [7], various theory-driven computational models are introduced to develop a knowledge representation for behavior change counseling, with a focus on modeling counseling knowledge from which dialog actions can be inferred.
In line with [7], we aim at developing a reusable framework based on generic coaching principles, but our perspective starts from interaction, communication, and cybernetics theory. We will incorporate the integration of Talk and Tools as the central interaction paradigm in e-coaching. Before we discuss the underlying principles of our e-coaching model, we first briefly present the insomnia therapy domain that will be used as a proof of concept.

Insomnia and insomnia therapy
Insomnia is a sleep disorder with a high prevalence (about 10% of the population) that can have severe individual and societal consequences (e.g., concentration problems, increased risks of accidents, depression, reduced productivity, increased absenteeism); people with insomnia have difficulty initiating and/or maintaining sleep. Today, it is widely accepted that cognitive behavior therapy (CBT) produces sustainable positive changes in the condition of insomnia [29].
CBT for insomnia (CBT-I) is designed to change dysfunctional beliefs, attitudes, and behavior that support sleep-disruptive habits, thoughts, and emotions and usually rests on two pillars: (1) a behavioral component to unlearn maladaptive habits and to learn sleep improving behavior and (2) a cognitive component to change a person's dysfunctional cognition, such as negative thinking or unrealistic expectations with respect to poor sleep. A CBT-I protocol usually takes between 6 and 10 weeks of coaching sessions and offers a variety of exercise types that differ in aim and properties [30]: sleep restriction, stimulus control, relaxation, cognitive therapy, and sleep hygiene. Sleep restriction involves curtailing the time spent in bed to stabilize the sleep pattern and lengthening sleep time as sleep efficiency improves. Stimulus control aims at restoring the coachee's association of the bed and the bedroom with sleep. Relaxation training involves methods aimed at reducing somatic tension. Cognitive therapy aims at the dysfunctional beliefs and attitudes. Sleep hygiene and education aims to make the person aware of practices and environmental factors that may either be detrimental or beneficial for sleep. These activities require the coachee to maintain multiple interactions with the coach over an extended period of time. The interaction frequencies can range from multiple times a day (e.g., for physical exercises), to once a day (e.g., journaling), or to one or more times per week (e.g., evaluation of progress). The actual intervention is preceded by a 1- or 2-week baseline sleep-diary monitoring period.

The SleepCare e-coaching system
The SleepCare e-coaching system is responsible for an acceptable coaching process that conforms to the proper standards of CBT-I. It behaves as an active partner, i.e., it does not only respond to interactional activities of the coachee, but also operates proactively and takes the initiative to interact with the coachee. The system provides one Talk interface and a set of Tool interfaces. The Talk interface provides the functionality for the conversational component in a WhatsApp-style interface structure, while the Tool interfaces provide direct manipulation and observation of the insomnia domain.
To optimize the interaction, the two interface types are constantly intertwined and, depending on the type of information exchange, Tools can be accessed from the Talk interface and vice versa. Furthermore, the coachee can access any of the Talk-and-Tools interfaces from the main page of the system (i.e., using the provided menu, toolbar, or dashboard after activating the e-coach; see Fig. 6, right picture) or simply by reacting to a reminder/notification. Figure 2 shows the navigation through both interface types.
Before we discuss the unique contribution of the separate Talk-and-Tools interfaces to the system's interactive behavior, we first explain how the coaching process and the behavior are designed and implemented.

The coaching process in the SleepCare system
Basically, the system gives tailored advice to the coachee to do certain exercises, makes appointments for consultation, and gives feedback with respect to the current state of the coachee and progress in the therapy. In addition, it includes a variety of persuasive strategies that are targeted at improving the coachee's adherence to the therapy [6,20,21]. These strategies usually relate to increasing ability and/or motivation of the coachee [15]. For that, a distinction is made between two types of CBT-I-related activities: (a) main activities that have to be performed and experienced by the coachee, for example, sleep restriction and (b) supporting activities, such as activity scheduling and calculating sleep efficiency. To improve ability, supporting activities can be simplified by the Tools; for instance, a paper and pencil sleep diary in CBT-I is replaced by an electronic version in SleepCare. An additional benefit of built-in Tools is that interaction with these Tools may give valuable information to personalize the intervention. So, for instance, sleep efficiency can be automatically calculated on the basis of the electronic sleep diary and integrated in a proposal for bedtimes by the e-coach. Motivational strategies, such as shared decision-making, are usually related to main activities.
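The automatic calculation of sleep efficiency from diary data can be illustrated with a minimal sketch. It assumes the standard definition of sleep efficiency (total sleep time divided by time in bed, as a percentage); the class `DiaryNight` and its field names are hypothetical and are not taken from the SleepCare implementation.

```python
from dataclasses import dataclass


@dataclass
class DiaryNight:
    """One night of a hypothetical electronic sleep diary (durations in minutes)."""
    minutes_in_bed: int
    minutes_asleep: int


def sleep_efficiency(nights: list[DiaryNight]) -> float:
    """Return sleep efficiency in percent over the given nights.

    Standard definition: total sleep time / total time in bed * 100.
    """
    total_in_bed = sum(n.minutes_in_bed for n in nights)
    total_asleep = sum(n.minutes_asleep for n in nights)
    if total_in_bed == 0:
        raise ValueError("no time in bed recorded")
    return 100.0 * total_asleep / total_in_bed


# Illustrative values only; they are not data from the study.
week = [DiaryNight(480, 360), DiaryNight(420, 330), DiaryNight(450, 345)]
print(f"Sleep efficiency: {sleep_efficiency(week):.1f}%")
```

A value computed in this way could then feed into the e-coach's proposal for new bedtimes, as described above.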
The therapy offered by the SleepCare system takes about 6 weeks and includes a number of assignments and consultation dialogs based on exercise types from CBT-I. The coaching process consists of three phases [5]: the opening phase, the intervention phase, and the closure phase (see Fig. 3). In the opening phase, the e-coach explains the details of the therapy, acquires relevant information to personalize the exercises, and establishes a shared commitment to the therapy. In the intervention phase, the actual therapy is conducted and the coachee is supposed to carry out the assignments. The closure phase starts when all assignments have been performed or when the coachee indicates the desire to withdraw; in this phase, the e-coach and coachee evaluate the offered therapy and say goodbye.
The exercises in the intervention phase can be very different in nature (e.g., in duration and intensity) but always follow the same pattern (see Fig. 3). They start with an introductory conversation (Alignment) followed by the establishment of a shared commitment about the assignments (Plan & Commit). Subsequently, the coachee performs the assignments (Task Execution), which are usually evaluated after 1 week (Evaluation). After the evaluation stage and depending on the results of the previous week, an exercise may be reestablished or ended (Closure).
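The fixed pattern of an exercise can be read as a small state machine. The sketch below is one possible reading, not the SleepCare implementation: the stage names come from the text, while the function `next_stage` and the choice to resume a re-established exercise at Plan & Commit are assumptions.

```python
from enum import Enum


class Stage(Enum):
    ALIGNMENT = "Alignment"
    PLAN_AND_COMMIT = "Plan & Commit"
    TASK_EXECUTION = "Task Execution"
    EVALUATION = "Evaluation"
    CLOSURE = "Closure"


def next_stage(current: Stage, reestablish: bool = False) -> Stage:
    """Return the stage that follows `current`.

    `reestablish` only matters after Evaluation: True sends the exercise back
    to Plan & Commit (an assumption about where a re-established exercise
    resumes); False moves on to Closure.
    """
    if current is Stage.ALIGNMENT:
        return Stage.PLAN_AND_COMMIT
    if current is Stage.PLAN_AND_COMMIT:
        return Stage.TASK_EXECUTION
    if current is Stage.TASK_EXECUTION:
        return Stage.EVALUATION
    if current is Stage.EVALUATION:
        return Stage.PLAN_AND_COMMIT if reestablish else Stage.CLOSURE
    raise ValueError("Closure is the final stage")


# Walk through one exercise that is re-established once after its evaluation.
stage = Stage.ALIGNMENT
history = [stage]
repeat_once = True
while stage is not Stage.CLOSURE:
    stage = next_stage(stage, reestablish=repeat_once)
    if stage is Stage.PLAN_AND_COMMIT and len(history) > 2:
        repeat_once = False  # the second evaluation ends the exercise
    history.append(stage)
print(" -> ".join(s.value for s in history))
```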

The behavior of the SleepCare system
To mimic the behavior of an active partner, both an activity schedule and a constraint mechanism are implemented. On the basis of these two mechanisms, notifications and reminders are sent to trigger the coachee to start a particular activity, for instance, a conversation or a therapy-related exercise. The essential difference between the two mechanisms is that planned activities are triggered by a clock event in the schedule, while unplanned activities are triggered by interrupting events that may be detected during the therapy.
The primary task of the activity schedule is to keep track of the various assignments that have been or should be carried out during the therapy, such as consultation dialogs or relaxation exercises. All scheduled assignments result from an agreed contract between the coach and coachee (see Fig. 3). The end time of the activities is usually unknown, but in practice, the time interval of assignments is relatively short (e.g., between 1 and 16 min).
The constraint mechanism detects violations of pre-defined rules (the so-called constraints) such as in cases of nonadherence by the coachee to the agreed assignments (see [5]). Whenever a violated constraint is detected, it has to be repaired by the e-coach by triggering a conversation. The status of constraints (i.e., violated or not) is verified by a constraint checker at least once a day or after a registered event, such as the completion of a conversation or an exercise.
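The difference between the two triggering mechanisms can be sketched as follows. This is a simplified illustration under stated assumptions: the names `Assignment`, `Constraint`, `due_assignments`, and `check_constraints` are hypothetical, and the 24-hour non-adherence rule is invented for the example rather than taken from the therapy protocol.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class Assignment:
    name: str
    start: datetime
    completed: bool = False


@dataclass
class Constraint:
    name: str
    # Returns True when the rule is violated at time `now`.
    is_violated: Callable[[list[Assignment], datetime], bool]


def due_assignments(schedule: list[Assignment], now: datetime) -> list[Assignment]:
    """Planned activities: triggered by a clock event in the schedule."""
    return [a for a in schedule if not a.completed and a.start <= now]


def check_constraints(constraints: list[Constraint],
                      schedule: list[Assignment],
                      now: datetime) -> list[str]:
    """Unplanned activities: one repair conversation per violated constraint."""
    return [c.name for c in constraints if c.is_violated(schedule, now)]


def overdue_by_a_day(schedule: list[Assignment], now: datetime) -> bool:
    # Hypothetical non-adherence rule: some assignment is more than 24 h overdue.
    return any(not a.completed and now - a.start > timedelta(hours=24)
               for a in schedule)


now = datetime(2024, 1, 3, 21, 0)
schedule = [
    Assignment("relaxation", datetime(2024, 1, 1, 21, 0)),
    Assignment("evaluation", datetime(2024, 1, 3, 20, 0)),
]
constraints = [Constraint("non_adherence", overdue_by_a_day)]
print([a.name for a in due_assignments(schedule, now)])
print(check_constraints(constraints, schedule, now))
```

In this reading, the checker would be run at least once a day and after every registered event, and each returned constraint name would trigger a notification for the corresponding repair conversation.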
To enable the functionality of the e-coach, various knowledge repositories can be consulted by the system. We distinguish, for instance, static and dynamic knowledge. Static knowledge covers general and timeless background information about the coaching process and is the same for all coachees. Examples of this type of knowledge are a representation of the constraints, the notifications, and the structure and content of the conversations. The actual realization is based on dynamic knowledge about the coachee, such as subjective opinions about sleep quality, recorded activities, and the current stage of the therapy. This type of knowledge may be considered as a long-term repository reflecting the formal knowledge that is built up during the coaching process.

The Talk Interface
The conversational mechanism of Talk considerably increases the expressivity of the coaching system and enables a designer to incorporate social elements that have a high impact on a person's behavior, such as personalized explanation and shared decision making. Conversations in SleepCare consist of a sequence of natural language moves between coach and coachee and are inspired by the periodic consults with a human coach. In the SleepCare coaching process, conversations may contribute to the following persuasive strategies (cf. [4,14,20]):
1. Tunneling: Conversation enables the e-coach to focus and guide the coachee and tell which activities should be performed and which should be refrained from. This is what we call "tunneling." Tunneling may consist of introducing or discussing activities in the coachee's daily life, such as filling in a sleep diary or changing bedtimes; in other cases, activities and habits such as drinking coffee or alcohol just before bedtime are discouraged. Hence, conversation improves the coachee's ability by tunneling therapy-related activities.
2. Improving transparency: The use of natural language substantially increases the expressivity of the coaching system. Consequently, conversation also enables the e-coach to manifest a variety of speech acts that pertain to the explanation and clarification of the therapy and the use of various Tools. In other words, natural language conversation explains why, how, and when activities should be performed.
3. Creating awareness: An important component of the therapy is the creation of the coachee's awareness about their behavior, thinking, and feelings. For that, conversations are included that discuss results, progress, and actual behavior of the coachee over a past period. These conversations are the stepping-stone to further personalization of the therapy.
4. Tailoring the therapy: Conversation enables the adaptation of exercises and communication to the circumstances and the characteristics of the coachee. To align e-coach and coachee, the therapy starts with a general introductory conversation in which the e-coach and coachee get acquainted with each other. Also, frequent feedback loops are included where the e-coach and coachee reshape the offered techniques, and where shared decision making facilitates the personalization of exercises. Conversational turns enable the user to adapt the amount of information and the presented discourse content: large chunks of text can be broken down into smaller pieces, and options for content choices make it possible to present text that matches the coachee's interests.
5. Building a relationship and improving engagement: The use of conversation introduces a wide range of social elements that create a feeling of engagement and the presence of a social partner that collaborates, judges, and gives motivational support. In particular, personalization, transparency, and speech acts such as "welcome," "praise," and "promise" contribute considerably to the establishment of a relationship that fosters a feeling of trust and commitment in the coachee.
In the design of the conversations, we need a balance between flexibility and rigidity. Impracticable exercises, inaccurate or false information, and irrelevant discourse not only contribute to a feeling of irritation and mistrust but also may even cause dangerous situations in cases of, for instance, car driving or operating complex machinery. Therefore, conversations should be carefully designed with a minimum of variation to rule out unpredictable behavior. On the other hand, conversations should include the necessary mechanisms for variation to include personalization and to improve the coachee's engagement.
To meet these requirements, the dialog component that generates the conversation uses a recursive tree-like specification called the E-Coach Markup Language (ECML) described in [13]. This format provides pre-defined templates to describe the interaction moves and the mechanism to collect knowledge from various sources and to update the coach's knowledge base (the domain of discourse).
Conversations have some flexibility with respect to the information content, the surface structure of the expressions, and their timing. The flexibility of the content and surface structure of the moves is determined by branches in the tree and the variables in the templates. A particular branch in the tree may be selected on the basis of the coachee's choice among a number of pre-defined options or on the basis of contextual information, such as the coachee's sleep efficiency. Also, variables in the templates are assigned a particular value based on contextual information, such as time of day (e.g., "Good morning," "Good evening") or coachee information (e.g., "Your sleep efficiency is 79%"). This idea is illustrated in the evaluation dialogs below where the values of variables are represented in italics and the coachee's selections of the multiple choice responses are in bold (adapted versions of the original Dutch dialogs; E: e-coach, C: coachee):
In general, the flexibility in the coachee's response is limited to the selection of one or more options (three in Dialog I and II). Depending on the presented options, a selection may have different results (see also Fig. 2):
1. Updating the knowledge base of the e-coach, for instance, the coachee's opinion about the therapy
2. Activation of a specific tool, for instance, to fill in the diary, to start a relaxation exercise, or to edit the coachee's personal profile
3. Continuation of the conversation; in case there is only one option to continue, the option is included to break down large chunks of text into smaller ones.
4. Closing the conversation; the status of the conversation changes into "completed."
The status "completed" is included for different reasons. First, the order of conversations plays a crucial role. For instance, a scheduled evaluation about a particular exercise type may not be generated before the introduction is completed. In fact, there is only one active conversation at a time; as long as a conversation is active, other "ready-to-start" conversations are listed and opened only when the active conversation is completed (first in, first out). Also, completed conversations do not generate reminders. Another reason for the status is that it is used as a metric for the coachee's adherence.
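The first-in, first-out handling of conversations can be sketched with a simple queue. The class `ConversationQueue` and its method names are hypothetical; as a simplification, the sketch opens the next waiting conversation as soon as the active one is completed, which is an assumption about timing that the text does not specify.

```python
from collections import deque


class ConversationQueue:
    """At most one active conversation; the others wait in FIFO order."""

    def __init__(self) -> None:
        self.active: str | None = None
        self.waiting: deque[str] = deque()
        self.completed: list[str] = []

    def trigger(self, name: str) -> None:
        """A schedule event or violated constraint makes a conversation ready to start."""
        if self.active is None:
            self.active = name
        else:
            self.waiting.append(name)

    def complete_active(self) -> None:
        """Mark the active conversation as completed and open the next one, if any."""
        if self.active is None:
            raise RuntimeError("no active conversation")
        self.completed.append(self.active)
        self.active = self.waiting.popleft() if self.waiting else None


queue = ConversationQueue()
for name in ["introduction", "evaluation", "relaxation"]:
    queue.trigger(name)
queue.complete_active()
print(queue.active, list(queue.waiting), queue.completed)
```

Because completed conversations are kept in a separate list, the same structure could supply the count of completed conversations that serves as an adherence metric.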
With respect to timing, two types of conversation may be distinguished: planned conversations that are triggered by the schedule (e.g., introductions and evaluations of exercises) and unplanned conversations that are triggered by the violation of a particular constraint (e.g., detection of non-adherence).
Some, but not all, planned conversations are scheduled during the general introductory conversation; new conversations may also be scheduled as a consequence of performing particular conversations. In general, and depending on the behavior of the coachee, the frequency of Talk interactions is around two or three conversations a week.
Each time a conversation is triggered, either by the schedule or by a constraint, a notification is sent to the coachee (e.g., "Let us start the sleep training," "Let us evaluate a relaxation exercise"); the actual realization of the notification is not part of the dialog with the coachee but is presented in the notification area of the smartphone. In case a conversation is ignored, the coachee will be reminded periodically (usually the next day) until the conversation is completed.
In Fig. 4, the type conversation is specified in a class diagram. Planned conversations contain two elements: a notification and a non-empty set of so-called interaction recipes. Unplanned conversations also include a representation of the constraint that triggers the conversation.
Interaction recipes are the central entities in the specification of a conversation. They represent one e-coach turn in the conversation and end with one or more options indicating the coachee's turn. To some extent, they may be compared with the notion of "adjacency pairs" in conversation analysis (see e.g., [26]). Interaction recipes can be called at each stage in the conversation to prepare the interaction with the coachee based on a script containing a set of dialog actions. Dialog actions consist of one or more templates that can be selected on the basis of a condition. The actual text of the conversation is embedded in the templates. A condition is an expression that refers to concepts in the domain of discourse and so defines the current context of the coachee.
In Dialogs I and II, turns E1 and C are the result of the execution of one interaction recipe. Each sentence (or paragraph) in E1 is represented by one template. In both dialogs, E1 is built from the same templates, but the value assigned to the variable in the greeting is different. In E2, the content of the first sentence is based on the coachee's selection, but the last sentence is based on the value of contextual information represented in the knowledge base. In other words, the information in the knowledge base not only influences the value of the variables of the templates, but also the choice of a particular discourse fragment.
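The relation between interaction recipes, dialog actions, templates, and conditions can be made concrete with a small sketch. It does not reproduce ECML, whose syntax is described in [13]; the classes `Template`, `DialogAction`, and `InteractionRecipe`, the dictionary used as a knowledge base, and the rule of taking the first template whose condition holds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

Context = dict  # hypothetical stand-in for the e-coach's knowledge base


@dataclass
class Template:
    text: str  # may contain {variables} that are filled from the context
    condition: Callable[[Context], bool] = lambda ctx: True


@dataclass
class DialogAction:
    templates: list[Template]

    def realize(self, ctx: Context) -> str:
        # Assumed selection rule: use the first template whose condition holds.
        for template in self.templates:
            if template.condition(ctx):
                return template.text.format(**ctx)
        return ""


@dataclass
class InteractionRecipe:
    actions: list[DialogAction]
    options: list[str]  # the replies offered to the coachee

    def coach_turn(self, ctx: Context) -> str:
        parts = [action.realize(ctx) for action in self.actions]
        return " ".join(part for part in parts if part)


greeting = DialogAction([
    Template("Good morning {name}!", lambda ctx: ctx["hour"] < 12),
    Template("Good evening {name}!", lambda ctx: ctx["hour"] >= 18),
    Template("Hello {name}!"),
])
feedback = DialogAction([Template("Your sleep efficiency is {sleep_efficiency}%.")])
recipe = InteractionRecipe(actions=[greeting, feedback], options=["Next"])

context = {"name": "Emily", "hour": 20, "sleep_efficiency": 79}
print("E:", recipe.coach_turn(context))
print("C:", " / ".join(recipe.options))
```

Here the context influences both the value of a variable (the name and the percentage) and the choice of discourse fragment (morning versus evening greeting), mirroring the two roles of the knowledge base described above.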
In Table 1, a more elaborate conversation is given with the actual interface of the smartphone. The fragment illustrates the first evaluation of the bedtime restriction exercise. To cover the main interaction needs for a 6-week CBT-I training program, 19 different conversational types were included (e.g., introduction, planning and commitment, and evaluation). Ten of them are planned conversations and the others are unplanned.
So far, we have discussed the necessary machinery and unique features of the Talk component to generate relevant and tailored conversational sequences without losing control of the coaching process. We will now turn to the second part of the paradigm and show the features of the relatively independent modules representing the Tools part of the system's interface.

Fig. 4 Class diagram of a conversation

Table 1 Left: an example of a SleepCare conversation (translated from Dutch); right: the corresponding screenshot of the first part of the original conversation
E1: Good evening Emily! / Let's see how the exercise bedtime restriction went.
C1: Next
E2: What is your opinion about it? Is it easy or difficult to keep the new bedtimes?
C2: Very easy - Easy - Difficult - Very difficult
E3: Pity you think it is difficult. / Your mean bedtime is 7 hours, our agreement was 6 hours. Your mean sleeping time is 5.75 hours. / In your opinion it didn't go very well, but I think that you kept yourself quite well to our agreement. / You may give yourself a reward. Think of taking a bath, buying a magazine or a book, or spending more time on your hobby.
C3: Next
E4: Let's take a look at your sleep efficiency. When you started the bedtime restriction exercise, it was 71%, now it is 81%.
C4: Next
E5: …
The italics show the values that are based on the e-coach's knowledge base; in bold is the choice of the coachee

The Tools Interface
Tools form a coherent structure of input and output facilities that enable the coachee to get access to the domain of discourse (observation and/or manipulation). They may contain linguistic elements, such as words, sentences, and even paragraphs, but in SleepCare, the interfaces of Tools are mainly represented by interactive visual and auditory interface structures that contain pictures and sounds. Tools were included for the following:
1. Registration of sleep data: an electronic sleep diary functions as a Tool to update sleep-related data, such as time in bed and sleep quality. The information was based on the most relevant items of the consensus sleep diary and data was stored to calculate at least the coachee's sleep efficiency. The interface of the tool consists of four pages: the first two pages ask for the time the coachee went to bed the previous evening and got out of bed in the morning; these pages function as a delimiter of the offered input possibilities in the next page. This next page offers a structure of blocks that each represents a quarter of an hour where the coachee can register three types of information: in_bed/awake, in_bed/sleep, out_of_bed (see Fig. 5, left picture). In future versions of the system, the sleep diary can in principle be complemented or replaced by sensor information that updates the sleep data.
Moreover, the structures provide a disclaimer and enable the exchange of information about the system's stakeholders and settings, and the coachee's profile.

5. Scheduling: to observe the properties of the scheduled assignments, such as starting time and performance status, a corresponding interaction tool exists for each type of assignment. The most important function of the tool is to display the coachee's commitments and adherence to the assignments. For that, the interaction tool displays not only the scheduled starting time and date in the past, present, or future but also the performance status of an assignment by a colored line in front: red means missed, orange means nearly missed (i.e., still possible to perform), green means completed, and gray means planned. Scheduled assignments result from an agreed contract between coach and coachee that is established in a conversational setting and may trigger the generation of notifications and reminders. The middle picture in Fig. 6 shows the agenda that contains information about past and future commitments; the right picture depicts the planned activities of the day.
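The color coding of the scheduling Tool can be illustrated with a small sketch. The four status values and their colors are taken from the description above; the function name and the 2-hour grace period that separates "nearly missed" from "missed" are assumptions made for the example only.

```python
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    MISSED = "red"
    NEARLY_MISSED = "orange"   # still possible to perform
    COMPLETED = "green"
    PLANNED = "gray"


# Assumed grace period in which a late assignment can still be performed.
GRACE = timedelta(hours=2)


def assignment_status(start: datetime, completed: bool, now: datetime) -> Status:
    """Derive the performance status shown by the colored line."""
    if completed:
        return Status.COMPLETED
    if now < start:
        return Status.PLANNED
    if now <= start + GRACE:
        return Status.NEARLY_MISSED
    return Status.MISSED
```

For instance, `assignment_status(start, False, start + timedelta(hours=1)).value` gives `"orange"` under the assumed grace period.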

Testing and user evaluation
In the previous section, we explained how the Talk-and-Tools paradigm was realized in the SleepCare system and how both interface types may contribute to an acceptable coaching process. To evaluate the acceptability of the system and the therapy process in real-life situations, we now turn to the testing and user evaluation phases. We distinguished four phases of testing and evaluation: (a) automated testing for code errors [19]; (b) expert analysis; (c) user evaluation; and, finally, (d) a randomized controlled trial (RCT) with insomnia patients [22]. Below, we will summarize the methodology and some interesting findings.

Automated testing
In automated testing, we aimed in particular at defect detection and reliability estimation, which both require elaborate testing involving all possible input cases of the conversations and the Tools [19]. We tested the system for code errors using a test automation approach running on the Android environment. The approach met the challenges posed by a mobile CBT e-coach, such as the lead time of the system (6 weeks), the dynamic knowledge build-up during the lead time, and unpredictable user behavior and absence of behavior. To that end, a 6-week therapy could be simulated within a few minutes.
To test the build-up of knowledge during the lead time, 32 realistic scenarios were written in collaboration with a sleep psychologist. The scenarios covered the most common sleeping patterns for general testing; more importantly, extreme sleeping cases were covered to make sure that the app responds responsibly to potentially dangerous situations (e.g., sleeping less than 5 h). Each scenario was divided into three elements: activities, dialog paths, and sleep data. The first describes the activities of 1 day; the second contains the details of each conversation performed in a dialog activity; and the third consists of sleep data used for filling in the diary tool, which forms the actual basis of each scenario. These three elements are closely interconnected: sleep data and missed activities directly influence the content and the occurrence of the dialogs and possible paths in a conversation.
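As an illustration of the three scenario elements, the sketch below models a scenario as a list of simulated days and replays it without waiting for real time to pass. It is a simplified assumption of how such a test harness could look; the class names, field names, and the `replay` and `flag_short_sleep` helpers are hypothetical and do not reproduce the test environment described in [19].

```python
from dataclasses import dataclass, field


@dataclass
class DayScenario:
    activities: list          # activities planned for this day
    dialog_paths: dict        # conversation name -> answers chosen in it
    sleep_blocks: list        # one diary state per quarter of an hour


@dataclass
class Scenario:
    name: str
    days: list = field(default_factory=list)


def hours_slept(day):
    """Each 'in_bed/sleep' block stands for a quarter of an hour."""
    return day.sleep_blocks.count("in_bed/sleep") / 4


def replay(scenario, handle_day):
    """Feed every simulated day to `handle_day` without waiting for real time."""
    return [handle_day(number, day) for number, day in enumerate(scenario.days, start=1)]


def flag_short_sleep(number, day, limit_h=5.0):
    """Example check: mark days on which the coachee slept less than 5 h."""
    return (number, hours_slept(day) < limit_h)
```

Replaying a 42-day scenario with `flag_short_sleep` returns one `(day number, flag)` pair per day, which a test can compare with the behavior expected from the e-coach in such extreme cases.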
Since each conversation relates to a specific exercise type, we divided the testing process of the app into parts, each covering one type, or 1 week, at a time. Each part could be tested separately using the scenarios, limiting the number of possible courses through the 6-week SleepCare program. Consequently, failures could be identified effectively and error propagation could be investigated in a smaller behavioral space. Three levels of aggregation were identified on which the issues could occur:
1. Tools level: raw, unprocessed data from the various Tools
2. Aggregated data level: the result of raw data processing (e.g., sleep efficiency)
3. Conversational level: constraints and dialogs
The errors that were most difficult to detect were caused by error propagation. Such errors could only be found by studying the interaction between the e-coach and coachee spanning weeks of insomnia training. Below, we will return to this issue.

Expert evaluation
Because conversations play a prominent role in the therapy, all conversations (including their branches) were evaluated by a sleep therapist and two experts in Dutch communication. The experts were asked to analyze the system specifically on safety and quality of the advice and on coherence and relevance of the conversational units. On the basis of the 32 scenarios, we recorded simulations that displayed the user interaction with the Talk-and-Tools interface under different conditions and time scales. Using the test automation environment, the approach allowed the experts to play back pre-recorded and predefined actions to compare the results with the desired behavior and to detect any errors.
Several issues were observed during these expert tests. For instance, the e-coach complimented the coachee in an evaluation conversation while the sleep data indicated that the coachee did not sleep well. After studying the recordings, it was found that the e-coach presented sleep parameters that did not match the user data from the scenario. These sleep parameters are calculated on the aggregation level using the raw data from the first level. It turned out that there were issues with the storage of sleep diary data (tools level) and with the queries executed to gather data (aggregation level). In addition, the rounding of variables resulted in noticeable mistakes after accumulating sleep data for a week (aggregation level). Thus, the observed bug appeared to be a conversational issue but turned out to have several origins on lower levels of aggregation [19]. Other examples were errors that occurred when a conversation was triggered too soon, because the constraints were not restrictive enough. In addition to these tests, the communication experts improved the language used in the conversations considerably in terms of terminology used, length of the sentences and paragraphs, and grammar (e.g., "avoid passive sentences as much as possible").

Fig. 6 The left picture shows a screen shot of a document that provides non-personalized behavior advice to improve the quality of sleep (in this case, food and drink behavior). The middle and right picture show parts of the schedule interface
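The effect of rounding on aggregated sleep data, mentioned above, can be made concrete with a small sketch. The source does not state which variables were rounded, so the example only illustrates this class of error and is not a reconstruction of the actual defect. Sleep efficiency is taken here in its usual sense, the percentage of time in bed spent asleep; the function names and the sample values are assumptions.

```python
def sleep_efficiency(sleep_min: float, in_bed_min: float) -> float:
    """Percentage of the time in bed that was spent asleep."""
    return 100.0 * sleep_min / in_bed_min


def weekly_efficiency(nights, round_to_hours=False):
    """nights: list of (minutes asleep, minutes in bed) pairs."""
    if round_to_hours:
        # Premature rounding: each night is reduced to whole hours first.
        nights = [(round(s / 60) * 60, round(b / 60) * 60) for s, b in nights]
    total_sleep = sum(s for s, _ in nights)
    total_in_bed = sum(b for _, b in nights)
    return sleep_efficiency(total_sleep, total_in_bed)


# Hypothetical week: 340 min asleep and 440 min in bed every night.
week = [(340, 440)] * 7
exact = weekly_efficiency(week)
rounded = weekly_efficiency(week, round_to_hours=True)
```

With these sample values, the prematurely rounded figure is more than 8 percentage points higher than the exact one, which would be enough to turn a critical evaluation into an undeserved compliment at the conversational level.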

Evaluation of the tools
In addition to the previous tests, three evaluation methods were applied to test the five Tools of the SleepCare system: (a) heuristic evaluation, (b) cognitive walkthrough, and (c) user experience evaluation based on the thinking aloud method. All tests were performed in the laboratory. With respect to the heuristic evaluation, various guidelines were applied on the basis of [16] and [32], ranging from visibility of the system status to the protection of the user's privacy. In the cognitive walkthrough, six activities were evaluated: filling in the sleep diary, reading sleep information, starting a relaxation exercise, finding information about sleep problems, switching off an exercise reminder, and re-scheduling an appointment. In the user experience evaluation, ten subjects participated (five men, five women; age between 20 and 80), of whom three reported having a sleeping problem. None of them had used a sleeping app before. Starting from the introductory screen (Fig. 6, right), subjects had to perform six tasks in line with the activities of the cognitive walkthrough. Afterwards, they had to fill in several questionnaires concerning the usability of the interfaces.
The three evaluation methods resulted in 39 suggestions for improvement, ranging from visibility of the diary status to the addition of a clear privacy statement and the amount of text on the screen. These suggestions were classified into four categories that indicated the seriousness of the problem (number of suggestions in parentheses): 3: high priority, mandatory to solve in a new release (1); 2: average priority, solve if possible (9); 1: low priority, can be done but is not necessary (16); and 0: cosmetic, change only if enough time is left (13). In general, participants became familiar with the interfaces of the Tools and managed to perform the tasks successfully; all participants were satisfied with the system and found it easy to learn. An important finding was that the perceived usefulness reported by participants who had a sleeping problem was considerably higher than that reported by participants without a sleeping problem.
In the final preparation phase for the randomized controlled trial (RCT), a pilot experiment was conducted in which 24 people asynchronously started using the app in their daily life. The pilot experiment focused in particular on the risk of a decrease in participants' sleep efficiency and sleep quality and on the occurrence of technical failures of the system. The results gave us enough confidence in the system and the training program to execute the RCT as a next step in the research process.

Results from the RCT
For an extended overview of the setup, the statistical analysis, and the results of the RCT, we refer to [22]. In the RCT, 74 participants were allocated as an active treatment group to the SleepCare system and 77 participants to a waiting list group that received the intervention after the active group. Participants with a relatively mild insomnia disorder (a mean of 16.4 on a Dutch version of the Insomnia Severity Index) were recruited. Questionnaires measured insomnia severity, dysfunctional beliefs about sleep, and anxiety and depression symptoms; sleep efficiency was measured by the sleep diary tool.
It was found that, compared to the waiting list group, significantly more people in the active group reached a meaningful clinical change on the Insomnia Severity Index (i.e., Δ ISI ≥ 8). Moreover, in the active group, the mean score went down to 9.9; in the waiting list group, the mean score went down to 13.2. These improvements were largely sustained at a 3-month follow-up. Except for dysfunctional beliefs, mild improvements were found on all other measurements. It should be noted, however, that the SleepCare system did not include explicit cognitive exercises to improve on dysfunctional beliefs. Treatment adherence varied considerably across the types of activity. The sleep diary was filled out more than 35 times by 35 participants, 13 participants filled in fewer than 7 diaries, and the other 24 participants filled in between 7 and 35 diaries. Relaxation exercises were less adhered to: 41 participants performed a maximum of 7 exercises and 11 participants did not perform a single relaxation exercise; only 7 participants performed more than 35 exercises. Conversations, on the other hand, were strongly adhered to: 47 participants finished 90% of the offered conversations and only 4 participants finished less than 10% of the conversations.
During the RCT, we received 942 e-mails that could be organized into three categories with respect to their content: the experiment, the therapy, and the usability of the app. Participants asked questions, reported problems, provided suggestions for improvement, and gave their judgment about various aspects of these categories. For instance, participants reported about the therapy (translated from Dutch):
"Bedtime restriction was a revelation. I had read about it, but it is so beautiful to see that the app helped me to give a personal and tailored advice."
But not all of them were so positive:
"Bedtime restriction didn't feel good. The influence on daily life is huge."
Since we aimed at offering the user the experience of a coach, of particular interest are remarks that refer to the relational aspects between the app and the participants. One participant reported:
"I miss a real person to talk to."
But another participant had a more positive opinion:
"Maybe it sounds strange, but I had the feeling of building a bond with my coach, although I am aware that it is just an algorithm. That is why I was shocked somehow by the way the therapy was closed. [..] I expected a closing conversation where my results were evaluated and where the coach wishes me success."
Clearly, the opinion of the users may differ considerably. What is important here is that at least a subgroup of users was engaged in the therapy, even after 6 weeks of frequent interaction, and had the impression of building a positive relationship with the system.

Discussion and concluding remarks
In this paper, the main goal was to show how the Talk-and-Tools paradigm for human-computer interaction could be applied in the design of interactive systems that offer an acceptable coaching process in the domain of e-coaching. For that, we have specified requirements for e-coaching, and, on the basis of the interaction paradigm, designed and implemented an e-coaching system in the domain of insomnia therapy, called SleepCare. Finally, the system was tested empirically on a large group of target users in their daily environment. Let us now summarize the main findings and briefly discuss the contribution of the Talk-and-Tools paradigm to the design of e-coaching systems.
We claimed that coaching refers to a highly personalized and contextualized process of deliberation that requires a continuous cycle of questioning, advising, agreement, observation, feedback, and adaptation. We recognized that from the coachee's point of view, an e-coach should include two learning dimensions: (1) learning as a social and collaborative practice and (2) learning through individual subjective experiences. The SleepCare system offered the first dimension by means of a set of personalized conversations and the second one by presenting a set of exercises that are part of a cognitive behavior therapy for insomnia (CBT-I). In order to improve the coachee's subjective experience, persuasive strategies were applied to increase the coachee's adherence to the individual exercises.
Persuasive strategies aimed at an increase of the coachee's ability and/or motivation. By means of conversation (or Talk), the system enables a designer to incorporate social elements that have a high impact on a person's behavior, such as personalized explanation, shared decision making, creating awareness, and building a bond. Tools were used to simplify particular activities without detracting from the intended individual experience, and they provided valuable information to personalize the intervention.
In SleepCare, conversations were carefully designed with a minimum of variation to rule out unpredictable behavior but included the necessary mechanisms for personalization and improving the coachee's engagement. With respect to timing, two types of conversation were distinguished: planned conversations triggered by a schedule and unplanned conversations triggered by the violation of particular constraints. Tools are represented by relatively independent modules and form a coherent structure of input and output facilities that enable the coachee to get access to the domain of discourse, i.e., the insomnia domain.
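The distinction between the two trigger types can be sketched as follows. This is a minimal illustration, assuming a schedule that maps conversation names to starting times and constraints expressed as predicates over the knowledge base; the names, the example constraint, and its threshold of 3 days are hypothetical.

```python
from datetime import datetime


def due_conversations(now, schedule, constraints, kb):
    """Return the names of the conversations that should be offered now."""
    # Planned conversations are triggered by the schedule.
    planned = [name for name, start in schedule.items() if start <= now]
    # Unplanned conversations are triggered by a violated constraint.
    unplanned = [name for name, violated in constraints.items() if violated(kb)]
    return planned + unplanned


schedule = {"evaluation_week_1": datetime(2020, 1, 8, 20, 0)}
constraints = {
    # Assumed constraint: the diary should not stay empty for 3 days or more.
    "diary_reminder_talk": lambda kb: kb["days_without_diary"] >= 3,
}
```

A constraint that is not restrictive enough fires too early in such a setup, which corresponds to the kind of error reported in the expert evaluation.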
From the various tests and evaluations, we conclude that the system is not only acceptable as an e-coaching system but also effective with respect to its coaching goal. Results of the RCT show that the system had moderate but significant effects on the primary measures of insomnia severity and sleep efficiency. Moreover, the adherence rates to the conversations and the positive remarks referring to a bond with the system, even after 6 weeks, give us the confidence that the addition of an explicit discourse context representing the symbolic level of interaction can be considered a valuable contribution to the coaching process.
Including a symbolic level in the interaction in terms of natural language texts and dialog opens a way to include a variety of powerful persuasive strategies to improve exercise adherence. Designing a system as a social actor with natural language capabilities not only dramatically increases the expressive power of the system but also enables us to include meta-level strategies based on social influence, such as commitment. In the design of the conversations, we balanced flexibility against rigidity. Flexibility improves the personalization of the e-coach's contribution to the characteristics of the coachee and the results of the exercises. On the other hand, dangerous advice by the e-coach should be avoided at all costs. Therefore, conversations were designed with a minimum of variation to preclude unpredictable behavior. We believe that the introduced formalism gives us the necessary stability to guarantee safe advice and the variability to avoid the boredom of impersonal texts.
Tools, on the other hand, emphasize the separation of the functionality of a system into independent, interchangeable modules. Each Tool contains the functionality and an optimized set of interface constructs towards the domain of discourse (the API) and the coachee to focus on one aspect of the desired functionality. The coachee can focus on the assigned task at hand, which is more manageable and can be done in a relatively short time. Furthermore, by inheriting features from the direct manipulation concept, Tools present the coachee with an easy to use, familiar method of interaction. In other words, well-designed Tools increase the coachee's ability and motivation to support the process of behavior change, because the approach can dramatically reduce the amount of effort that should be put into the therapy. To improve adherence to necessary activities (e.g., filling in the sleep diary) and exercises (e.g., relaxation), future Tools could be extended with gaming elements or even replaced by a complete game.
The Talk-and-Tools paradigm offers the best of both worlds: Talk for generating conversations, mimicking the collaborative dimensions in the coaching process; Tools for support of the necessary individual experience and reduction of the coachee's energy spent on activities that do not target the aimed experience. In that sense, Talk may be considered as the glue in a coaching program that sticks together the various coaching elements supported by Tools. We, therefore, believe that the inclusion of an automated conversational component integrated with contextual information fills an important gap in the design of current mobile systems that aim at behavior change.