Abstract
The steadily increasing capabilities of AI systems can have tremendously beneficial impacts on society. However, it is important to simultaneously tackle the possible risks that accompany these developments. Therefore, the relatively young field of AI safety has gained international relevance. In parallel, popular media have debated whether society should react to AI with attitudes such as fear or enthusiasm. However, in order to assess the landscape of AI risks and opportunities, it is first and foremost of relevance not to be afraid, not to be enthusiastic, but to understand, as similarly suggested by Spinoza in the 17th century. In this vein, this thesis performs a transdisciplinary examination of how to address possible instantiations of AI risks with the aid of scientifically grounded hybrid cognitive-affective strategies. The identified strategies are “hybrid” because AI systems cannot be analyzed in isolation: the nature of human entities as well as the properties of human-machine interactions have to be taken into account within a socio-technological framework, one that addresses not only unintentional failures but also intentional malice. In turn, the attribute “cognitive-affective” refers to the inherently affective nature of human cognition.
We consider two disjoint sets of systems: Type I and Type II systems. Type II
systems are systems that are able to consciously create and understand explanatory
knowledge. Conversely, Type I systems are all systems that do not exhibit this
ability. All current AIs are of Type I. However, even though Type II AI does not exist today, its implementation is not physically impossible. Overall, we identify the following non-exhaustive set of 10 tailored hybrid cognitive-affective strategic clusters for AI safety: 1) international (meta-)goals, 2) transdisciplinary Type I/II AI safety and related education, 3) socio-technological feedback-loop, 4) integration of affective, dyadic and social information, 5) security measures and ethical adversarial examples research, 6) virtual reality frameworks, 7) orthogonality-based disentanglement of responsibilities, 8) augmented utilitarianism and ethical goal functions, 9) AI self-awareness and 10) artificial creativity augmentation research.
In the thesis, we also introduce the so-called AI safety paradox, which states, figuratively speaking, that value alignment and control represent conjugate requirements. In theory, with a Type II AI, a mutual value alignment might be achievable via a co-construction of novel values, albeit at the cost of its predictability. Conversely, it is possible to build Type I AI systems that are controllable and predictable, but they would not exhibit a sufficient understanding of human morality. Nevertheless, AI safety can be addressed by a cybersecurity-oriented and risk-centered approach that reformulates it as a discipline which proactively addresses AI risks and reactively responds to their occurring instantiations. In a nutshell, future AI safety requires transdisciplinarily conceived and scientifically grounded dynamics that combine proactive error-prediction and reactive error-correction within a socio-technological feedback-loop, together with the cognizance that it is first of relevance not to be afraid, not to be enthusiastic, but to understand, and that the price of security is eternal creativity.