Add relay snapshot test scenario with low bandwidth on data socket #149
base: master
Minor nitpick here and there.
As for the pull request description and commit:
"This test highlights a race in the relay snapshot mode. This testcase triggers a
race where the trace is unreadable because tracing is stopped even though
data is still in flight."
If I remember correctly, the main problem is the lack of "synchronization" (a data pending phase) at the end of a snapshot record command. The client (lttng-cli) gets a response saying that the snapshot was recorded correctly even if data is still "in flight" toward the relayd. This results in an inconsistent trace (snapshot) on the relayd side if the snapshot is read before all data has made its way to the relayd.
Tracing never needs to be "stopped" for the problem to occur.
Do you agree?
Setting a low bandwidth on the data port only accentuates the problem.
The big question is: what guarantee regarding trace validity does the return of the lttng-snapshot-record command offer?
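To make the missing synchronization concrete, here is a minimal toy model of the race. It is not lttng-tools code: `SlowLink`, `record_snapshot`, and the `wait_for_data` flag are hypothetical names, and the byte counts are arbitrary. It only illustrates how a reply sent before the data socket drains leaves a truncated file on the relay side, and how a data pending phase would avoid that.

```python
from collections import deque


class SlowLink:
    """Toy data socket: delivers at most `rate` bytes per tick (hypothetical)."""

    def __init__(self, rate):
        self.rate = rate
        self.in_flight = deque()  # chunks sent but not yet received by the relay

    def send(self, payload):
        self.in_flight.append(payload)

    def tick(self, relay_file):
        """Advance time by one step, moving up to `rate` bytes to the relay."""
        budget = self.rate
        while budget > 0 and self.in_flight:
            chunk = self.in_flight.popleft()
            relay_file.extend(chunk[:budget])
            if len(chunk) > budget:
                # Put the undelivered remainder back at the front of the queue.
                self.in_flight.appendleft(chunk[budget:])
            budget -= min(budget, len(chunk))

    def data_pending(self):
        return bool(self.in_flight)


def record_snapshot(link, relay_file, packet, wait_for_data):
    """Send one packet and return the bytes readable when the client gets its reply."""
    link.send(packet)
    link.tick(relay_file)  # a little time passes before the reply is sent
    while wait_for_data and link.data_pending():
        link.tick(relay_file)  # hypothetical data pending phase
    return len(relay_file)


packet = bytes(4096)  # arbitrary packet size for the illustration
racy = record_snapshot(SlowLink(rate=512), bytearray(), packet, wait_for_data=False)
synced = record_snapshot(SlowLink(rate=512), bytearray(), packet, wait_for_data=True)
print(racy, synced)  # 512 4096
```

In the racy case the client is told the snapshot is complete while only 512 of the 4096 bytes are on the relay, so a reader opening the trace at that moment sees a packet larger than the file. Note that tracing is never stopped in this model.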
Prepare for addition of new test

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
3aae351
to
9f1f618
@PSRCode Are you okay with the first two commits (cleanup)? I could merge those and wait for a fix before merging the new test.
This commit adds a testcase that simulates a snapshot on a relayd with a data socket with very low bandwidth. This configuration can trigger a race where the trace is unreadable because the snapshot is reported as completed even though data is still in flight.

As of right now, this testcase fails because the trace is unreadable. Babeltrace outputs the following error:

[error] Packet size (4194304 bits) is larger than remaining file size (175104 bits) in trace with UUID "d9e6182e6469405094f839a08f438c3b", at path: "/tmp/tmp.sY3M2G54Oy/raton/snapshot-1-20181203-100618-0/ust/uid/0/64-bit", within stream id 0, at relative path: "chan1_1".

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
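For a sense of scale, the two sizes in the Babeltrace error can be converted from bits to bytes. The short calculation below uses only the numbers printed in that error message; the variable names are mine.

```python
# Sizes reported by Babeltrace, in bits.
packet_bits = 4_194_304
remaining_bits = 175_104

packet_bytes = packet_bits // 8        # size declared in the packet header
remaining_bytes = remaining_bits // 8  # what is actually left in the file
missing_bytes = packet_bytes - remaining_bytes

print(packet_bytes // 1024, "KiB declared")          # 512 KiB declared
print(remaining_bytes, "bytes present")              # 21888 bytes present
print(missing_bytes, "bytes missing")                # 502400 bytes missing
print(f"{remaining_bytes / packet_bytes:.1%} of the packet is on disk")  # 4.2% ...
```

In other words, the packet header announces 512 KiB but only about 4% of that was in the file when the trace was read, which is consistent with most of the packet still being in flight on the slow data socket.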
9f1f618
to
1bf4cb5
I updated the commit message, but we will need to update it again once we come up with a fix.
This test highlights a race in the relay snapshot mode. This testcase triggers a
race where the trace is unreadable because tracing is stopped even though
data is still in flight.
This PR only provides a testcase/reproducer and does not provide a fix.
This PR also includes some cleanups of the snapshot testcases.