A4 Peer-reviewed article in conference proceedings
Aggregating Actor-Critic Value Functions to Support Military Decision-Making
Authors: Vasankari, Lauri; Virtanen, Kai
Editors: Moosaei, Hossein; Kotsireas, Ilias; Pardalos, Panos M.
Conference name: International Conference on the Dynamics of Information Systems
Publisher: Springer Nature Switzerland
Year of publication: 2025
Series: Lecture Notes in Computer Science
Title of the collection: Dynamics of Information Systems: 7th International Conference, DIS 2024, Kalamata, Greece, June 2–7, 2024, Revised Selected Papers
Volume: 14661
First page: 141
Last page: 152
ISBN: 978-3-031-81009-1
eISBN: 978-3-031-81010-7
ISSN: 0302-9743
eISSN: 1611-3349
DOI: https://doi.org/10.1007/978-3-031-81010-7_10
Open access status at time of recording: Not openly available
Open access status of the publication channel: Not an open access channel
URL: https://doi.org/10.1007/978-3-031-81010-7_10
Reinforcement learning (RL) is used to find optimal policies for agents in their environments. The obtained policies can be utilized in decision support, i.e., suggesting or determining optimal actions for different states or observations of the environment. An actor-critic RL method combines policy gradient methods with value functions: the critic estimates the value function, and the actor updates the policy as directed by the critic. Usually, the output of interest is the policy learned by the actor. However, if the environment is defined accordingly, the approximated value function can be used to assess, e.g., an optimal solution for placing military units in an operational theatre. This paper explores the use of the critic, rather than the actor, as the primary output of a decision-support tool, presenting an experiment in a littoral warfare environment.
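The general idea described in the abstract can be illustrated with a minimal tabular actor-critic sketch; this is a toy example of the technique, not the paper's method, and the chain environment, hyperparameters, and ranking step are all illustrative assumptions. The actor keeps softmax policy logits, the critic keeps a state-value table updated by TD(0), and after training the critic's value estimates (rather than the actor's policy) are queried to rank states, analogously to assessing candidate unit placements by value:

```python
# Toy tabular actor-critic; the environment and all constants are illustrative.
import math
import random

random.seed(0)

N = 5               # chain states 0..4; state 4 is terminal (the "objective")
ACTIONS = (-1, +1)  # move left / right
GAMMA = 0.9
ALPHA_V, ALPHA_PI = 0.1, 0.1

V = [0.0] * N                           # critic: state-value table
theta = [[0.0, 0.0] for _ in range(N)]  # actor: policy logits per state

def policy(s):
    """Softmax over the two action logits of state s."""
    exps = [math.exp(t) for t in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]

def step(s, a):
    """Deterministic chain dynamics; reward 1 on reaching the terminal state."""
    s2 = min(max(s + ACTIONS[a], 0), N - 1)
    done = s2 == N - 1
    return s2, (1.0 if done else 0.0), done

for _ in range(3000):
    s = 0
    for _ in range(50):  # cap episode length
        probs = policy(s)
        a = random.choices((0, 1), weights=probs)[0]
        s2, r, done = step(s, a)
        target = r + (0.0 if done else GAMMA * V[s2])
        delta = target - V[s]            # TD error from the critic
        V[s] += ALPHA_V * delta          # critic update
        for a2 in (0, 1):                # actor update: log-softmax gradient
            grad = (1.0 if a2 == a else 0.0) - probs[a2]
            theta[s][a2] += ALPHA_PI * delta * grad
        s = s2
        if done:
            break

# Decision support via the critic: rank non-terminal states by estimated value.
ranking = sorted(range(N - 1), key=lambda s: V[s], reverse=True)
```

After training, states closer to the objective receive higher value estimates, so `ranking` orders candidate states by the critic's assessment; this querying of the learned value function, independently of the actor's policy, is the decision-support use the abstract describes.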