Abstract
This paper describes a novel algorithm called CON-MODP
for computing Pareto optimal policies for deterministic
multi-objective sequential decision problems. CON-MODP is
a value-iteration-based multi-objective dynamic programming
algorithm that only computes stationary policies. We observe that
for guaranteeing convergence to the unique Pareto optimal set of
deterministic stationary policies, the algorithm needs to perform
a policy
...