
|
PhD Dissertation Defense Abstract: Liviu Panait
The Analysis and Design of Concurrent Learning Algorithms for Cooperative Multiagent Systems

Concurrent learning is the application of several machine-learning algorithms in parallel to automatically discover behaviors for teams of agents. Because machine-learning techniques tend to find better and better solutions when they are allowed additional time and resources, the same would be expected of concurrent learning algorithms. Surprisingly, previous empirical and theoretical analyses have shown this not to be the case. Instead, concurrent learning algorithms often drift towards suboptimal solutions due to the learners' coadaptation to one another. This is very problematic for practitioners interested in employing these techniques to discover optimal team behaviors. This thesis presents theoretical and empirical research on what causes this drift, as well as on how to minimize it. I present evidence that the drift occurs because learners often have distorted perceptions of the overall search space. Interestingly, increasing the accuracy of a learner's perception does not require more sophisticated sensing capabilities; rather, it can be achieved simply by having the learner ignore certain reward information that it has received for performing actions. I provide formal proofs that concurrent learning algorithms will converge to the globally optimal solution if each learner has sufficiently accurate perceptions. This theoretical analysis provides the foundation for the design of novel concurrent learning algorithms that benefit from accurate perceptions of the joint search space. First, the biased cooperative coevolutionary algorithm greatly improves its learners' perceptions by using reward information that was received in the past. Second, the iCCEA algorithm provides learners with functionally equivalent perceptions at a reduced computational cost.
Finally, I describe the lenient multiagent Q-learning algorithm, which benefits from more accurate perceptions when tackling challenging coordination tasks in stochastic domains. |
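
As a rough illustration of how ignoring some reward information can sharpen a learner's perception, the Python sketch below compares two ways a single learner might value its actions in a small cooperative game. It is not the thesis's algorithm: the payoff table `JOINT_REWARD`, the uniformly random partner, and the helper names `perceived_value` and `preferred_action` are all assumptions made for this example. With `samples=1` the learner averages every reward it sees; with larger `samples` it keeps only the best reward out of several tries and ignores the rest.

```python
from itertools import product

# Hypothetical payoff table (an assumption for illustration, not from the thesis).
# Both agents receive JOINT_REWARD[a][b] when the learner plays a and its partner plays b.
# The best joint action is (0, 0), but action 0 is heavily penalized on miscoordination.
JOINT_REWARD = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def perceived_value(action, samples):
    """Expected best reward over `samples` tries against a uniformly random partner.

    samples=1 is the plain average over partner actions; larger values mean
    the learner ignores all but the highest reward it observed.
    """
    row = JOINT_REWARD[action]
    # Enumerate every equally likely sequence of partner responses exactly.
    outcomes = list(product(row, repeat=samples))
    return sum(max(outcome) for outcome in outcomes) / len(outcomes)


def preferred_action(samples):
    """Index of the action with the highest perceived value."""
    values = [perceived_value(a, samples) for a in range(len(JOINT_REWARD))]
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    for samples in (1, 3):
        values = [round(perceived_value(a, samples), 2) for a in range(3)]
        print(samples, values, preferred_action(samples))
```

Under these assumptions, the averaging learner prefers the safe but suboptimal action 2, while the learner that ignores lower rewards across three tries prefers action 0, which belongs to the optimal joint action. Real concurrent learners face partners that adapt rather than act uniformly at random, so this only conveys the intuition.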