A4 Peer-reviewed article in conference proceedings

How to run a world record? A Reinforcement Learning approach




Authors: Shahsavari Sajad, Immonen Eero, Karami Masoomeh, Haghbayan Mohammadhashem, Plosila Juha

Editors: Ibrahim A. Hameed, Agus Hasan, Saleh Abdel-Afou Alaliyat

Conference name: European Conference on Modelling and Simulation

Publisher: European Council for Modelling and Simulation

Publication year: 2022

Journal: Proceedings: European Conference for Modelling and Simulation

Book title: Proceedings of the 36th ECMS International Conference on Modelling and Simulation ECMS 2022, May 30th – June 3rd, 2022, Ålesund, Norway

Journal name in database: Proceedings - European Council for Modelling and Simulation, ECMS

Series name: Proceedings: European Conference for Modelling and Simulation

Number in series: 36

Volume: 1

First page: 159

Last page: 166

ISBN: 978-3-937436-77-7

ISSN: 2522-2414

eISSN: 2522-2422

Web address: https://www.scs-europe.net/dlib/2022/2022-0159.html

Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/175657877


Abstract

Finding the optimal distribution of the effort exerted by an athlete in competitive sports has been widely investigated in the fields of sport science, applied mathematics and optimal control. In this article, we propose a reinforcement learning-based solution to the optimal control problem in the running race application. The well-known mathematical model of Keller is used to numerically simulate the dynamics of the runner's energy storage and motion. A feed-forward neural network is employed as the probabilistic controller model in continuous action space, which transforms the current state (position, velocity and available energy) of the runner into the predicted optimal propulsive force that the runner should apply in the next time step. A logarithmic barrier reward function is designed to evaluate the performance of simulated races as a continuous, smooth function of the runner's position and time. The neural network parameters are then identified by maximizing the expected reward using an on-policy actor-critic policy-gradient RL algorithm. We trained the controller model for three race lengths: 400, 1500 and 10000 meters, and found the force and velocity profiles that produce a near-optimal solution to the runner's problem. Results conform with Keller's theoretical findings with a relative percent error of 0.59% and are comparable to real world records with a relative percent error of 2.38%, while the same error for Keller's findings is 2.82%.
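To make the simulated environment described above concrete, the sketch below shows how Keller's runner dynamics and a logarithmic-barrier style reward could be stepped in Python. This is not the authors' code: the parameter values are Keller's commonly cited estimates, and the exact reward expression and the constants t_ref, DT and the example force are illustrative assumptions; the paper's formulation may differ.

```python
import numpy as np

# Assumed parameter values (approximately Keller's published estimates).
TAU   = 0.892    # resistive time constant [s]
F_MAX = 12.2     # maximal propulsive force per unit mass [m/s^2]
SIGMA = 41.56    # energy resupply rate per unit mass [W/kg]
E0    = 2409.0   # initial available energy per unit mass [J/kg]
DT    = 0.05     # Euler integration step [s]

def keller_step(x, v, e, f):
    """One Euler step of Keller's equations: dv/dt = f - v/tau, de/dt = sigma - f*v."""
    f = float(np.clip(f, 0.0, F_MAX))         # propulsive force is bounded above
    if e <= 0.0:                              # energy constraint e >= 0:
        f = min(f, SIGMA / max(v, 1e-6))      # spend no faster than it is resupplied
    x_new = x + v * DT
    v_new = v + (f - v / TAU) * DT
    e_new = max(e + (SIGMA - f * v) * DT, 0.0)
    return x_new, v_new, e_new

def barrier_reward(x, t, race_length, t_ref):
    """Hypothetical smooth reward in position and time: rewards closing the gap
    to the finish line while penalising elapsed time (exact form may differ)."""
    gap = max(race_length - x, 1e-3)          # log barrier stays finite before the finish
    return -np.log(gap) - t / t_ref

# Example rollout: a fixed (sub-optimal) force instead of the learned policy.
x, v, e, t, total_reward = 0.0, 0.0, E0, 0.0, 0.0
while x < 400.0 and t < 120.0:
    x, v, e = keller_step(x, v, e, f=8.0)
    t += DT
    total_reward += barrier_reward(x, t, race_length=400.0, t_ref=60.0)
print(f"finished at t = {t:.1f} s, remaining energy = {e:.0f} J/kg")
```

In the paper's setup, the constant force in the rollout would be replaced by samples from the feed-forward policy network conditioned on (x, v, e), and the accumulated reward would drive the actor-critic policy-gradient update.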


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




