feat: adding more related works and introduction content
simojo committed Jan 25, 2024
1 parent 4a8e989 commit d286f61
Showing 2 changed files with 58 additions and 5 deletions.
20 changes: 20 additions & 0 deletions references.bib
@@ -18,6 +18,26 @@ @article{koyama2014
publisher={Springer},
annote={Koyama reviews advances in VCSEL photonics and presents the physical principles underlying VCSEL diodes.}
}
@inproceedings{bernini2021,
author = {Bernini, Nicola and Bessa, Mikhail and Delmas, R\'{e}mi and Gold, Arthur and Goubault, Eric and Pennec, Romain and Putot, Sylvie and Sillion, Fran\c{c}ois},
title = {A few lessons learned in reinforcement learning for quadcopter attitude control},
year = {2021},
isbn = {9781450383394},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3447928.3456707},
doi = {10.1145/3447928.3456707},
abstract = {In the context of developing safe air transportation, our work is focused on understanding how Reinforcement Learning methods can improve the state of the art in traditional control, in nominal as well as non-nominal cases. The end goal is to train provably safe controllers, by improving both training and verification methods. In this paper, we explore this path for controlling the attitude of a quadcopter: we discuss theoretical as well as practical aspects of training neural nets for controlling a crazyflie 2.0 drone. In particular we describe thoroughly the choices in training algorithms, neural net architecture, hyperparameters, observation space etc. We also discuss the robustness of the obtained controllers, both to partial loss of power for one rotor and to wind gusts. Finally, we measure the performance of the approach by using a robust form of a signal temporal logic to quantitatively evaluate the vehicle's behavior.},
booktitle = {Proceedings of the 24th International Conference on Hybrid Systems: Computation and Control},
articleno = {27},
numpages = {11},
location = {Nashville, Tennessee},
series = {HSCC '21},
annote = {This article concisely describes the steps taken to train a
*Crazyflie* quadcopter using reinforcement learning. I will discuss it and
draw on the authors' experience in training a quadcopter for attitude control
via reinforcement learning.}
}
@article{iga2000,
title={Surface-emitting laser-its birth and generation of new optoelectronics field},
author={Iga, Kenichi},
43 changes: 38 additions & 5 deletions thesis.md
@@ -86,11 +86,26 @@ defining a set of desired qualities of a system, the system will learn to
develop its own policy, which is responsible for mapping the instantaneous
state of its environment to its action at a given point in time.

<!-- FIXME: continue to develop idea about why autonomous navigation is not
perfect. -->

<!-- FIXME: try to give a proper motivation for why my approach has validity -->
<!-- FIXME: set it up for my specific project -->
### Reinforcement Learning

RL has long been considered more adaptive than the industry-standard method of
control, PID control, which requires extensive tuning. RL tries to find the
optimal way to map a perceived state to an action by finding what is called a
control policy. Decision making is influenced by a reward signal, which
quantifies the success of the system's actions. A control policy is a set of
rules that defines how a system's state is mapped to its next action. This
mirrors the Markov decision process, in which an agent's action at time
$\tau_{k+1}$ is derived from its state at $\tau_k$. Although a control policy
can be described qualitatively, in practice it is a mapping from a state
tensor to an action tensor, both almost always multidimensional.
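
As a minimal sketch of this mapping (the state layout, the meaning of the
actions, and the linear form of the policy here are illustrative assumptions,
not the interface used in this project), a policy can be written as a function
from a state tensor to an action tensor:

```python
import numpy as np


def policy(state: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Map a state tensor to an action tensor.

    A single linear layer followed by tanh keeps every action component in
    [-1, 1]; a trained policy would supply learned values for `weights` and
    `bias`.
    """
    return np.tanh(weights @ state + bias)


rng = np.random.default_rng(0)
state = rng.standard_normal(12)         # s_k: e.g. pose and velocity at tau_k
weights = rng.standard_normal((4, 12))  # untrained policy parameters
bias = np.zeros(4)

action = policy(state, weights, bias)   # action derived from s_k
print(action.shape)                     # (4,), e.g. four motor commands
```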

Recent investigations into methods of control for quadcopter systems involve
controlling the quadcopter's *attitude*, or its orientation in space. As
opposed to PID control, which requires manually tuning the relationship
between feedback and controller action, RL autonomously solves control
problems by optimizing its actions with respect to a reward metric. This
results in an enhanced ability to react to diverse situations, a capability
sometimes described as generalized intelligence [@bernini2021].

### The Growing UAV Industry

Expand Down Expand Up @@ -362,6 +377,17 @@ because of its refined scope. This will consequently allow for RL training to
take place for a consistent kind of problem, rather than leaving both 'free'
navigation and obstacle avoidance for the quadcopter to handle.

#### Using Reinforcement Learning for Attitude Control

Giving a quadcopter complete control over its attitude requires extensive
training and computational power because of how large the action and state
spaces become. Continuous action and state spaces, as opposed to discrete
ones, require a distinct set of training algorithms because there are
infinitely many possible states and actions.
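
The difference can be made concrete with the `spaces` module of the Gymnasium
library; Gymnasium itself, the space sizes, and the action meanings below are
assumed purely for illustration and are not a stated dependency of this
project.

```python
import numpy as np
from gymnasium import spaces

# Discrete: the agent picks one of finitely many actions, so value-based
# methods can simply enumerate them.
discrete_actions = spaces.Discrete(4)  # e.g. {pitch+, pitch-, roll+, roll-}

# Continuous: each action is a real-valued vector (e.g. four normalized motor
# thrusts); there are infinitely many actions, so policy-gradient or
# actor-critic methods such as DDPG are used instead.
continuous_actions = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

print(discrete_actions.sample())    # an integer in {0, 1, 2, 3}
print(continuous_actions.sample())  # a length-4 float vector in [-1, 1]
```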

<!-- related-work -->
The authors of [@bernini2021] compare the effectiveness of different RL
algorithms for controlling a quadcopter's attitude.
<!-- FIXME: more here -->

# Method of approach
@@ -428,6 +454,13 @@ the I2C protocol.
The method of learning used in this project is the Deep Deterministic Policy
Gradient (DDPG) algorithm, which learns a deterministic policy mapping the
state of our system to an action.
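
A minimal sketch of the two networks DDPG maintains is shown below; the state
and action dimensions, layer widths, and the use of PyTorch are illustrative
assumptions rather than the configuration used in this project.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 12, 4  # placeholder sizes for illustration only


class Actor(nn.Module):
    """Deterministic policy: maps a state tensor to an action tensor."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh(),  # bound actions to [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class Critic(nn.Module):
    """Q-function: scores a (state, action) pair by its expected return."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


actor, critic = Actor(), Critic()
state = torch.randn(1, STATE_DIM)
action = actor(state)           # the action the policy selects for this state
value = critic(state, action)   # the critic's estimate of that action's return
```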

In RL algorithms, training is accomplished by giving the system a feedback
mechanism called a reward. The reward may be based on historical data or on
the most recent state of the system.

One method of finding an optimal policy for controlling a system is to follow
the gradient of the expected reward with respect to the policy's parameters,
adjusting the parameters in the direction that increases that expectation.
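
As a sketch, for a deterministic policy $\mu_\theta$ and a learned
action-value function $Q_\phi$, the standard form of this gradient in DDPG is
(the notation here follows the usual presentation of the algorithm rather than
symbols defined elsewhere in this thesis):

$$
\nabla_\theta J(\theta) \approx
\mathbb{E}_{s \sim \mathcal{D}}\left[
  \left.\nabla_a Q_\phi(s, a)\right|_{a = \mu_\theta(s)}
  \nabla_\theta \mu_\theta(s)
\right],
$$

where $\mathcal{D}$ is a replay buffer of previously visited states.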

<!-- FIXME: reward metric to actually get the thing to perform navigation has
yet to be determined -->

