feat: adding more related works and introduction content
simojo committed Jan 25, 2024
1 parent 4a8e989 commit d286f61
Showing 2 changed files with 58 additions and 5 deletions.
20 changes: 20 additions & 0 deletions references.bib
@@ -18,6 +18,26 @@ @article{koyama2014
publisher={Springer},
annote={Koyama reviews advances in VCSEL photonics and presents the physical principles underlying VCSEL diodes.}
}
@inproceedings{bernini2021,
author = {Bernini, Nicola and Bessa, Mikhail and Delmas, R\'{e}mi and Gold, Arthur and Goubault, Eric and Pennec, Romain and Putot, Sylvie and Sillion, Fran\c{c}ois},
title = {A few lessons learned in reinforcement learning for quadcopter attitude control},
year = {2021},
isbn = {9781450383394},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3447928.3456707},
doi = {10.1145/3447928.3456707},
abstract = {In the context of developing safe air transportation, our work is focused on understanding how Reinforcement Learning methods can improve the state of the art in traditional control, in nominal as well as non-nominal cases. The end goal is to train provably safe controllers, by improving both training and verification methods. In this paper, we explore this path for controlling the attitude of a quadcopter: we discuss theoretical as well as practical aspects of training neural nets for controlling a crazyflie 2.0 drone. In particular we describe thoroughly the choices in training algorithms, neural net architecture, hyperparameters, observation space etc. We also discuss the robustness of the obtained controllers, both to partial loss of power for one rotor and to wind gusts. Finally, we measure the performance of the approach by using a robust form of a signal temporal logic to quantitatively evaluate the vehicle's behavior.},
booktitle = {Proceedings of the 24th International Conference on Hybrid Systems: Computation and Control},
articleno = {27},
numpages = {11},
location = {Nashville, Tennessee},
series = {HSCC '21},
annote = {This article concisely describes the steps taken to train a
*Crazyflie* quadcopter using reinforcement learning. I will discuss it and
draw on the authors' experience in training a quadcopter for attitude control
via reinforcement learning.}
}
@article{iga2000,
title={Surface-emitting laser-its birth and generation of new optoelectronics field},
author={Iga, Kenichi},
43 changes: 38 additions & 5 deletions thesis.md
@@ -86,11 +86,26 @@ defining a set of desired qualities of a system, the system will learn to
develop its own policy, which is responsible for mapping the instantaneous
state of its environment to its action at a given point in time.

<!-- FIXME: continue to develop idea about why autonomous navigation is not
perfect. -->

<!-- FIXME: try to give a proper motivation for why my approach has validity -->
<!-- FIXME: set it up for my specific project -->
### Reinforcement Learning

RL has long been considered more adaptive than the industry-standard method of
control, PID control, which requires extensive tuning. RL tries to find the
optimal way to map a perceived state to an action by finding what is called a
control policy. Decision making is influenced by a reward signal, which
quantifies the success of the system's actions. A control policy is a set of
rules that defines how a system's state is mapped to its next action. This
mirrors the Markov decision process, in which an agent's action at time
$\tau_{k+1}$ is derived from its state at $\tau_k$. Although a control policy
can be described qualitatively, in practice it is a mapping from a state
tensor to an action tensor, both almost always multidimensional.
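
As a minimal sketch of this mapping (the state layout, the meaning of the
actions, and the linear form of the policy here are illustrative assumptions,
not the interface used in this project), a policy can be written as a function
from a state tensor to an action tensor:

```python
import numpy as np


def policy(state: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Map a state tensor to an action tensor.

    A single linear layer followed by tanh keeps every action component in
    [-1, 1]; a trained policy would supply learned values for `weights` and
    `bias`.
    """
    return np.tanh(weights @ state + bias)


rng = np.random.default_rng(0)
state = rng.standard_normal(12)         # s_k: e.g. pose and velocity at tau_k
weights = rng.standard_normal((4, 12))  # untrained policy parameters
bias = np.zeros(4)

action = policy(state, weights, bias)   # action derived from s_k
print(action.shape)                     # (4,), e.g. four motor commands
```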

Recent investigations into methods of control for quadcopter systems involve
controlling the quadcopter's *attitude*, or its orientation in space. As
opposed to PID control, which requires manually tuning the relationship
between feedback and controller action, RL autonomously solves control
problems by optimizing its actions with respect to a reward metric. This
results in an enhanced ability to react to diverse situations, a capability
sometimes described as generalized intelligence [@bernini2021].

### The Growing UAV Industry

Expand Down Expand Up @@ -362,6 +377,17 @@ because of its refined scope. This will consequently allow for RL training to
take place for a consistent kind of problem, rather than leaving both 'free'
navigation and obstacle avoidance for the quadcopter to handle.

#### Using Reinforcement Learning for Attitude Control

Giving a quadcopter complete control over its attitude requires extensive
training and computational power because of how large the action and state
spaces become. Continuous action and state spaces, as opposed to discrete
ones, require a distinct set of training algorithms because there are
infinitely many possible states and actions.
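
The difference can be made concrete with the `spaces` module of the Gymnasium
library; Gymnasium itself, the space sizes, and the action meanings below are
assumed purely for illustration and are not a stated dependency of this
project.

```python
import numpy as np
from gymnasium import spaces

# Discrete: the agent picks one of finitely many actions, so value-based
# methods can simply enumerate them.
discrete_actions = spaces.Discrete(4)  # e.g. {pitch+, pitch-, roll+, roll-}

# Continuous: each action is a real-valued vector (e.g. four normalized motor
# thrusts); there are infinitely many actions, so policy-gradient or
# actor-critic methods such as DDPG are used instead.
continuous_actions = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

print(discrete_actions.sample())    # an integer in {0, 1, 2, 3}
print(continuous_actions.sample())  # a length-4 float vector in [-1, 1]
```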

<!-- related-work -->
The authors of [@bernini2021] compare the effectiveness of different RL
algorithms for controlling a quadcopter's attitude.
<!-- FIXME: more here -->

# Method of approach
@@ -428,6 +454,13 @@ the I2C protocol.
The method of learning used in this project is the Deep Deterministic Policy
Gradient (DDPG) algorithm, which learns a deterministic policy mapping the
state of our system to an action.
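
A minimal sketch of the two networks DDPG maintains is shown below; the state
and action dimensions, layer widths, and the use of PyTorch are illustrative
assumptions rather than the configuration used in this project.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 12, 4  # placeholder sizes for illustration only


class Actor(nn.Module):
    """Deterministic policy: maps a state tensor to an action tensor."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh(),  # bound actions to [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class Critic(nn.Module):
    """Q-function: scores a (state, action) pair by its expected return."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


actor, critic = Actor(), Critic()
state = torch.randn(1, STATE_DIM)
action = actor(state)           # the action the policy selects for this state
value = critic(state, action)   # the critic's estimate of that action's return
```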

In RL algorithms, training is accomplished by giving the system a feedback
mechanism called a reward. The reward may be based on historical data or on
the most recent state of the system.

One method of finding an optimal policy for controlling a system is to follow
the gradient of the expected reward with respect to the policy's parameters,
adjusting the parameters in the direction that increases that expectation.
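
As a sketch, for a deterministic policy $\mu_\theta$ and a learned
action-value function $Q_\phi$, the standard form of this gradient in DDPG is
(the notation here follows the usual presentation of the algorithm rather than
symbols defined elsewhere in this thesis):

$$
\nabla_\theta J(\theta) \approx
\mathbb{E}_{s \sim \mathcal{D}}\left[
  \left.\nabla_a Q_\phi(s, a)\right|_{a = \mu_\theta(s)}
  \nabla_\theta \mu_\theta(s)
\right],
$$

where $\mathcal{D}$ is a replay buffer of previously visited states.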

<!-- FIXME: reward metric to actually get the thing to perform navigation has
yet to be determined -->

