feat: related work on gps-denied positioning
simojo committed Mar 22, 2024
1 parent dfb4786 commit 21c4e2c
Showing 3 changed files with 110 additions and 98 deletions.
1 change: 1 addition & 0 deletions build.sh
@@ -7,6 +7,7 @@
pandoc \
--metadata-file config.yaml \
--lua-filter .filters/abstract-to-meta.lua \
--lua-filter .filters/sec-refs-better.lua \
--lua-filter .filters/pagebreak.lua \
--template template/thesis.tex \
--citeproc \
--csl https://www.zotero.org/styles/journal-of-the-acm \
12 changes: 12 additions & 0 deletions references.bib
@@ -482,3 +482,15 @@
@Misc{gnuplot
year={2024},
howpublished={\url{http://www.gnuplot.info/}}
}
@Misc{engel2012,
  title={Accurate figure flying with a quadrocopter using onboard visual and inertial sensing},
  author={Engel, Jakob and Sturm, J{\"u}rgen and Cremers, Daniel},
  year={2012},
  annote={This paper is written by Jakob Engel, a computer vision researcher
  at Facebook, where he and his team explored controlling a quadcopter using
  an EKF, monocular SLAM, and a PID controller.}
}
195 changes: 97 additions & 98 deletions thesis.md
@@ -290,7 +290,16 @@
regardless of socioeconomic or cultural factors.

# Related work

## GPS-Denied Positioning

This project relies on GPS-denied positioning to estimate the pose of the
quadcopter. GPS-denied positioning is the control of a robotic system using
local measurements to estimate its position rather than coordinates from a
GPS receiver. Typical implementations use some form of computer vision,
whether to measure relative velocity or to measure relative orientation from
fiducial markers such as QR codes or ArUco markers.
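
As a minimal sketch of the idea (hypothetical values, not this project's
implementation), a GPS-denied position estimate can be dead-reckoned by
integrating locally measured velocity, at the cost of accumulating drift:

```python
import numpy as np

def dead_reckon(p0, velocities, dts):
    """Integrate locally measured velocities (e.g., from optical flow)
    into a position estimate. Drift accumulates without an absolute fix."""
    p = np.asarray(p0, dtype=float)
    for v, dt in zip(velocities, dts):
        p = p + np.asarray(v) * dt  # simple Euler integration
    return p

# hypothetical readings: ten 0.1 s steps at a constant 0.5 m/s along x
print(dead_reckon([0, 0, 0], [[0.5, 0, 0]] * 10, [0.1] * 10))  # ~[0.5, 0, 0]
```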

### Using ArUco Markers

<!-- related-work -->
In an effort to demonstrate the ability of a quadcopter to perform basic
@@ -315,6 +324,27 @@
While our project may not use ArUco markers, a comparison can be made
between the effectiveness of ArUco markers and an array of ToF sensors for
determining the local position of the quadcopter.
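
For concreteness, a typical ArUco-based pose measurement looks roughly like
the sketch below, assuming OpenCV's `cv2.aruco` contrib module (the 4.7+
`ArucoDetector` API); the calibration values and marker size are placeholders:

```python
import cv2
import numpy as np

# placeholder intrinsics; a real camera must be calibrated beforehand
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)
MARKER_SIZE = 0.10  # marker edge length in meters (assumed)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

frame = cv2.imread("frame.png")  # hypothetical camera frame
corners, ids, _ = detector.detectMarkers(frame)
if ids is not None:
    # marker corners in the marker's own frame (top-left, clockwise)
    h = MARKER_SIZE / 2
    obj = np.array([[-h, h, 0], [h, h, 0], [h, -h, 0], [-h, -h, 0]],
                   dtype=np.float32)
    # solvePnP recovers the camera-relative pose of the first marker
    ok, rvec, tvec = cv2.solvePnP(obj, corners[0].reshape(-1, 2), K, dist)
    print("marker position relative to camera:", tvec.ravel())
```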

### Using Monocular SLAM for Position Estimation

<!-- related-work -->

Similarly to the PX4's own position estimation, the authors of [@engel2012]
use a downward-facing camera to measure the velocity of the quadcopter
relative to the ground. In addition, they add a forward-facing camera used
for monocular Simultaneous Localization and Mapping (SLAM). Because of the
large gaps in time between sensor readouts, they employ an Extended Kalman
Filter (EKF), which smooths state predictions across varying time steps.
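
To illustrate the role the EKF plays between irregular readouts (a generic
one-dimensional constant-velocity filter, not the authors' actual model):

```python
import numpy as np

class SimpleEKF:
    """1D constant-velocity filter; state = [position, velocity].
    With a linear model, the EKF reduces to this standard form."""

    def __init__(self):
        self.x = np.zeros(2)            # state estimate
        self.P = np.eye(2)              # state covariance
        self.Q = np.diag([1e-4, 1e-2])  # process noise (assumed)
        self.R = np.array([[0.05]])     # measurement noise (assumed)

    def predict(self, dt):
        # dt varies with the gap between sensor readouts
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + self.Q * dt

    def update(self, z):
        H = np.array([[1.0, 0.0]])      # we observe position only
        y = z - H @ self.x              # innovation
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P

ekf = SimpleEKF()
ekf.predict(0.10)            # long gap between readouts
ekf.update(np.array([0.2]))
ekf.predict(0.02)            # short gap
print(ekf.x)
```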

The authors found that monocular SLAM with the front-facing camera
effectively eliminated the error that accumulates while maneuvering a
quadcopter system [@engel2012]. For a similar reason, our project uses an
array of ToF sensors to keep track of the relative location of obstacles
around the Clover. Although our project uses no form of SLAM, by including
the ranging measurements in the reinforcement learning algorithm's state, we
aim to see some form of localization emerge as a learned behavior after
training.
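
A sketch of what including the ranging measurements in the state means in
practice (the helper is hypothetical; the $[0.001, 6]$ m bounds follow
{+@tbl:state}):

```python
import numpy as np

R_MIN, R_MAX = 0.001, 6.0  # ToF range bounds in meters (from the state table)

def build_state(pose, ranges):
    """Concatenate the quadcopter's pose with the ten ToF readings so the
    policy can implicitly localize obstacles around the Clover."""
    r = np.clip(np.asarray(ranges, dtype=float), R_MIN, R_MAX)
    r_norm = (r - R_MIN) / (R_MAX - R_MIN)  # scale to [0, 1]
    return np.concatenate([np.asarray(pose, dtype=float), r_norm])

# hypothetical observation: pose (x, y, z) plus ten ranging sensors
state = build_state([0.0, 0.0, 0.5], [6.0] * 10)
print(state.shape)  # (13,)
```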

## Simulating Quadcopter Dynamics

In order for the control loop to properly decide a course of action based on its
@@ -346,7 +376,6 @@
correction [@de2014].
### Using Reinforcement Learning for Path Planning

<!-- related-work -->

Algorithms such as the famous $\text{A}^{*}$ algorithm require a robotic system
to have a comprehensive understanding of its environment. The authors
in [@hodge2021] use a Proximal Policy Optimization (PPO) algorithm
@@ -361,9 +390,11 @@
previous policy. By ignoring the altitude of the quadcopter, the authors of
[@hodge2021] leave dynamic control up to the reader. This means that their
method provides a way to recommend which movements to make without actually
controlling the system.

Our project follows a very similar path, relying on the onboard flight
controller to navigate to setpoints; however, because the policy has no
long-term memory, we rely on the action-value function, which is trained on
historical training data, to properly weight the decision-making process so
that the quadcopter navigates coherently through an environment.
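
As a sketch of this split between a deterministic policy and an action-value
function (generic Keras networks; the layer sizes are placeholders, not the
trained configuration):

```python
import tensorflow as tf
from tensorflow import keras

STATE_DIM, ACTION_DIM = 13, 3  # e.g., pose plus ten ranges; (x, y, z) setpoint

def make_actor():
    """Policy mu_theta: maps a state to a relative setpoint."""
    return keras.Sequential([
        keras.Input(shape=(STATE_DIM,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(ACTION_DIM, activation="tanh"),  # bounded actions
    ])

def make_critic():
    """Action-value function Q(s, a): scores a (state, action) pair."""
    s_in = keras.Input(shape=(STATE_DIM,))
    a_in = keras.Input(shape=(ACTION_DIM,))
    h = keras.layers.Dense(64, activation="relu")(
        keras.layers.Concatenate()([s_in, a_in]))
    q = keras.layers.Dense(1)(h)
    return keras.Model([s_in, a_in], q)
```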

<!-- related-work -->
The authors in [@doukhi2022] present a hybrid approach of dynamic control and
@@ -509,7 +540,7 @@
define the action space $A$ of the quadcopter by three parameters and their
corresponding range of values and activation functions, detailed in
{+@tbl:action}.

<!-- FIXME: migrate to tabular environment; only problem is citing it. -->

Table: The values that exist in the state space of the quadcopter system.
{#tbl:state}
@@ -568,8 +599,6 @@
Table: The values that exist in the state space of the quadcopter system.
| $\text{range}_9$ | $[0.001, 6]$ | $\text{m}$ | |
+======================+======================+===========================+==========================+

Table: The values that exist in the action space of the quadcopter system and
their corresponding activation functions.
{#tbl:action}
@@ -692,7 +721,7 @@
Gazebo how to resolve relationships between links and joints. Further, plugins,
such as sensors, motors, or other actuators can be defined under the `<gazebo>`
tag.

```xml
<!-- tell gazebo about this link, specifying custom physics properties -->
<gazebo reference="base_link">
<self_collide>0</self_collide>
@@ -819,7 +848,7 @@
when running the simulated PX4 firmware.

Once the Clover reaches its initial position of $(0, 0, 0.5)$, the first action
$a$ is sampled from the policy $\mu_{\theta}$, and noise is added to it. The
sampled action, i.e., a position relative to the Clover's frame, is then
navigated to using `navigate` once again. If the quadcopter is unable to reach
the position derived from $\mu_{\theta}$ within a certain number of seconds, the
action is aborted, and the timeout is taken into account by the reward metric.
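
A sketch of this navigate-with-timeout step, using the Clover's `navigate`
and `get_telemetry` services (the speed, tolerance, and timeout values are
illustrative):

```python
import math
import rospy
from clover import srv

rospy.init_node("step_action")
navigate = rospy.ServiceProxy("navigate", srv.Navigate)
get_telemetry = rospy.ServiceProxy("get_telemetry", srv.GetTelemetry)

def navigate_wait(x, y, z, timeout=10.0, tolerance=0.2):
    """Fly to a setpoint in the Clover's body frame; abort on timeout."""
    navigate(x=x, y=y, z=z, frame_id="body", speed=0.5)
    deadline = rospy.get_time() + timeout
    while rospy.get_time() < deadline:
        t = get_telemetry(frame_id="navigate_target")
        if math.sqrt(t.x**2 + t.y**2 + t.z**2) < tolerance:
            return True        # setpoint reached
        rospy.sleep(0.2)
    return False               # timed out; penalized by the reward metric
```
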
@@ -830,9 +859,9 @@
is from its goal or if the quadcopter has crashed or flipped over.

The reward metric is then recorded, and the gradient of the reward metric is
measured with respect to the action and state variables [@spinningup2018]. The
`keras.optimizers.Adam` optimizing function, provided by TensorFlow, is then
used to modify the policy and the action-value function's weights [@tensorflow].
This is when the learning happens.
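
In outline, one such update is a gradient step like the following (a generic
DDPG-style actor update using the actor and critic sketched earlier; names
are illustrative):

```python
import tensorflow as tf
from tensorflow import keras

actor, critic = make_actor(), make_critic()  # from the earlier sketch
actor_opt = keras.optimizers.Adam(learning_rate=1e-3)

@tf.function
def actor_step(states):
    """Ascend Q(s, mu(s)): nudge the policy toward higher-value actions."""
    with tf.GradientTape() as tape:
        actions = actor(states)
        q = critic([states, actions])
        loss = -tf.reduce_mean(q)  # minimizing -Q maximizes Q
    grads = tape.gradient(loss, actor.trainable_variables)
    actor_opt.apply_gradients(zip(grads, actor.trainable_variables))
    return loss
```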

### Reward Metric {#reward-metric}

@@ -841,13 +870,13 @@
granted to the agent for each episode.

$$
\begin{array}{rl}
(1) & r_t = 0 \\[2ex]
(2) & r_t \leftarrow r_t - 1000 \cdot n_{\text{collisions}} \\[2ex]
(3) & r_t \leftarrow r_t - 1000 \cdot \sqrt{(p_{x_t}^s - x_{{\text{desired}}_t}^s)^2 + (p_{y_t}^s - y_{{\text{desired}}_t}^s)^2} \\[2ex]
(4) & \text{if}\ \ \Delta t > \text{timeout}\ \ \text{then} \\[2ex]
(5) & \ \ \ \ r_t \leftarrow r_t - 1000 \\[2ex]
(6) & \text{if}\ \ p_{z_t}^s + r_t^s \cos{\theta_t^s} <= 0.5 \ \ \text{then} \\[2ex]
(7) & \ \ \ \ r_t \leftarrow r_t - 2000
\end{array}
$$ {#eq:reward-metric}
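
Read as code, {+@eq:reward-metric} amounts to the following transcription
(the flip test on line (6) is abstracted into a boolean):

```python
import math

def episode_reward(n_collisions, p_xy, goal_xy, dt, timeout, flipped):
    """Transcription of the reward metric in {+@eq:reward-metric}."""
    r = 0.0                                  # (1) start from zero
    r -= 1000 * n_collisions                 # (2) penalize collisions
    r -= 1000 * math.dist(p_xy, goal_xy)     # (3) penalize distance from goal
    if dt > timeout:                         # (4)-(5) action timed out
        r -= 1000
    if flipped:                              # (6)-(7) crashed or flipped over
        r -= 2000
    return r
```
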
@@ -1095,6 +1124,7 @@
Putting {+@eq:r_perp_squared_direction_cosines} back into
{+@eq:sum_of_mass_times_r},
<!-- FIXME: write equation 9.1.6 from the book and finish it out from there. -->
<!-- FIXME: not done -->
$$
\begin{array}{rcl}
@@ -1143,6 +1173,7 @@
I_{zz}$ are the moments of inertia along the $x$, $y$, and $z$ axes
<!-- FIXME: reference paper on Newton-Euler formulation -->
<!-- FIXME: reference de2014 -->
<!-- FIXME: more here. Should I even follow through with completing this? -->
### ToF Ranging Sensors
@@ -1292,86 +1323,52 @@
inexpensiveness, small size, and ability to transmit continuously [@raj2020].
<!-- FIXME: beef this up for the second semester -->
# Preliminary Results
<!-- FIXME: this is strictly writing for the first semester -->
In the current implementation of this project, the quadcopter is being trained
to hover at a static position near its origin within a virtual machine provided
by COEX. The quadcopter's reward metric, given in {+@eq:reward}, is reduced by
the quadcopter's distance from the desired position $(x_{\text{desired}},
y_{\text{desired}}, z_{\text{desired}}) = (0\text{m}, 0\text{m}, 1\text{m})$. To
incentivize the quadcopter to hover at $z_{\text{desired}} = 1\text{m}$, a
Gaussian function centered around $z_s = 1\text{m}$ is added to the reward.
$$
\begin{array}{ll}
\text{reward}
\equiv &
100\left(e^{-(z_s - 1)^2} - 1\right) - \\[2ex]
& \sqrt{
\left(x_s - x_\text{desired}\right)^2
+ \left(y_s - y_\text{desired}\right)^2
+ \left(z_s - z_\text{desired}\right)^2
} \\
\end{array}
$$ {#eq:reward}
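
As code, {+@eq:reward} is the following direct transcription:

```python
import math

def hover_reward(x, y, z, desired=(0.0, 0.0, 1.0)):
    """Transcription of {+@eq:reward}: Gaussian hover bonus minus distance."""
    xd, yd, zd = desired
    gaussian = 100 * (math.exp(-(z - 1) ** 2) - 1)  # peaks at z = 1 m
    distance = math.dist((x, y, z), (xd, yd, zd))
    return gaussian - distance

print(hover_reward(0.0, 0.0, 1.0))  # 0.0 at the desired hover point
```
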
The most prominent challenge with using Gazebo to simulate the episodes has
been finding a way for the simulation to step through time at a constant rate.
ROS has a callback mechanism capable of measuring the time elapsed during a
single control loop and throttling its speed to maintain a consistent frequency
of reading state and taking action (see the sketch below). This allows a
predictable control frequency rather than a variable one. With a variable
control frequency, the model may not make appropriate decisions between control
loops. Maintaining consistent timing between control iterations would make the
model's control of the quadcopter system considerably more stable.
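
The throttling mechanism referred to here is `rospy.Rate` (a minimal sketch;
the 10 Hz target is illustrative):

```python
import rospy

rospy.init_node("control_loop")
rate = rospy.Rate(10)  # target a fixed 10 Hz observation-action cycle

while not rospy.is_shutdown():
    # read state, query the policy, publish the setpoint ...
    rate.sleep()  # sleeps just long enough to hold the loop at 10 Hz
```
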
During each episode of training, once the quadcopter flips upside down or flies
out of sight, it is most appropriate to attach an extremely low reward value to
the data for that episode and reset the simulation environment; however,
because the PX4 flight controller software is not preemptive by default,
meaning that it cannot be easily suspended, the flight controller cannot handle
the sudden change in position when the quadcopter is reset using Gazebo.
Although Gazebo offers an option to pause the physics engine, the PX4 flight
controller software must be manually modified to tolerate this. That
modification is not yet part of the current implementation, and the PX4 still
poses an obstacle to switching quickly between episodes. This issue has
prevented the quadcopter system from learning to control itself, because
episodes cannot be reset.
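
For reference, the Gazebo half of an episode reset is exposed through standard
`gazebo_ros` services (sketch below); the obstacle described above is that the
PX4 cannot be suspended in step with the paused physics:

```python
import rospy
from std_srvs.srv import Empty

rospy.init_node("episode_reset")
pause = rospy.ServiceProxy("/gazebo/pause_physics", Empty)
reset = rospy.ServiceProxy("/gazebo/reset_world", Empty)
unpause = rospy.ServiceProxy("/gazebo/unpause_physics", Empty)

def reset_episode():
    """Teleport the world back to its initial state between episodes.
    Without a preemptible PX4, the flight controller mishandles the jump."""
    pause()
    reset()
    unpause()
```
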
# Results
<!--
- FIXME: should have discretized steps to speed up training and make actions
more obvious. Perhaps discretizing by half meters by approximating the
measurements of state to integers, and leaving actions up to moving by one
nearest voxel of the discretized size.
- FIXME: fixing timing issues with sensor readouts, creating the thread handler,
was most important in getting valuable feedback. Feedback has appeared to be
the most important part of training.
- foreshadow that using some form of SLAM or transforming the ToF sensor data
into a point cloud would be better, nodding to this explicit discussion
during the future works section
- FIXME: discuss all training runs and why they failed/succeeded, using the
comments that I embedded in the top of each file.
- need to format training data using a gnuplot script
- FIXME: training appears to converge; need to actually record the drone
navigating to verify this, however.
-->
# Future Work
## Fixing Timing Inconsistencies in PX4
The next step of this project is to fix the timing inconsistencies of the
simulation such that each control loop, or observation-action pair, happens at a
consistent frequency that can be reproduced in the real world. This may involve
creating a custom version of the PX4 flight controller software capable of
preemptive behavior.
Presumably, fixing the timing inconsistency will allow the quadcopter system to
train itself to hover at a static position. Once this is demonstrated, it will
then be appropriate to explore techniques to train the quadcopter to navigate.
## Simulating and Physically Implementing ToF Sensors
In order for the quadcopter to navigate, it must have the array of ToF sensors.
This means the ToF sensors must also be simulated in Gazebo with the exact
specifications of the Adafruit VL53L4CX sensors. Additionally, the ten ToF
sensors must be physically mounted to the quadcopter, which will require a
3D-printed fixture, so time must be taken to draft, print, and simulate this
design.
## Developing Reward Metrics to Incentivize Quadcopter Navigation
Developing reward metrics that incentivize quadcopter navigation is arguably
the crux of this project, and it will require continuous self-assessment and
reevaluation. Drawing on the work in [@doukhi2022], we will determine whether
certain parts of the quadcopter's navigation are appropriate to hand off to the
inertial navigation process.
<!--
- FIXME: doing more training, perhaps on a GPU accelerated computer
- FIXME: perhaps adding nodes that transform the collection of range
measurements into a point cloud in the `body` frame
- this could also be extended to use SLAM, if computing power permits
- FIXME: fine-tuning the reward metric for better training
- FIXME: physically implementing the quadcopter system
- implementing ToF sensors; hooking them up via I2C to the RPi
- creating ROS node that publishes information from the ToF sensors via a
topic
- PID tuning the quadcopter and finishing mounting equipment onto it
- maiden flight via manual control
- creating new ROS node that controls the quadcopter using pre-defined weights
- booting up Raspberry Pi image and testing offboard control
- [optional] because of inconsistencies between simulation and the real world,
  to advance the idea of curriculum learning, modify clover_train or create a
  new package that trains the quadcopter, using saved weights, while it is
  flying, ensuring the heavy use of failsafes.
- This would require a deeper understanding of how the `clover` package
actually interacts with the PX4.
-->
# Appendix
@@ -1414,7 +1411,9 @@
After opening MeshLab, navigate to `File -> Import Mesh` to import the COLLADA
file. Then, selecting
```txt
Filters
-> Quality Measure and Computations
-> Compute Geometric Measures
```
will print the physical properties of the mesh in the lower-right log:
