Add relay snapshot test scenario with low bandwidth on data socket #149

Open
wants to merge 3 commits into
base: master
from

Contributor

@frdeso frdeso commented Dec 4, 2018

This test highlights a race in the relay snapshot mode. This testcase triggers a
race where the trace is unreadable because tracing is stopped even though
data is still in flight.

This PR only provides a testcase/reproducer and does not provide a fix.

This PR also includes some cleanups of the snapshot testcases.

Contributor

@PSRCode PSRCode left a comment

Minor nitpicks here and there.

As for the pull request description and commit:

"This test highlights a race in the relay snapshot mode. This testcase triggers a
race where the trace is unreadable because tracing is stopped even though
data is still in flight."

If I remember the problem correctly, the main issue is the lack of "synchronization" (a data pending phase) at the end of a snapshot record command. The client (lttng-cli) gets a response saying that the snapshot was recorded correctly even if data is still "in flight" toward the relayd. This results in an inconsistent trace (snapshot) on the relayd side if the snapshot is read before all the data has made its way to the relayd.

Tracing never has to be "stopped" for the problem to occur.

Do you agree?

Setting low bandwidth on the data port only accentuates the problem.

The big question is: what guarantee regarding trace validity does the return of the lttng-snapshot-record command offer?
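
The race described in this review can be illustrated with a toy model. This is only a sketch and not LTTng code: `DataSocket`, `snapshot_record`, and the `wait_for_data` flag are hypothetical names, and the bandwidth and delay figures are assumptions chosen for illustration. The 512 KiB packet size matches the packet size in the Babeltrace error reported in this PR. The model shows that a reply sent without a data pending phase can leave a truncated packet on the relayd side, and that a slow data socket makes this far more likely.

```python
from dataclasses import dataclass

PACKET_BYTES = 524288  # 4194304 bits, i.e. one 512 KiB packet


@dataclass
class DataSocket:
    """Hypothetical model of the data connection toward the relayd."""
    bandwidth_bytes_per_ms: int  # assumed link speed, in bytes per millisecond
    queued_bytes: int = 0        # sent by the consumer, not yet received
    delivered_bytes: int = 0     # already written on the relayd side

    def send(self, nbytes: int) -> None:
        self.queued_bytes += nbytes

    def advance(self, ms: int) -> None:
        # Move as many queued bytes as the bandwidth allows in `ms` milliseconds.
        moved = min(self.bandwidth_bytes_per_ms * ms, self.queued_bytes)
        self.queued_bytes -= moved
        self.delivered_bytes += moved

    def data_pending(self) -> bool:
        return self.queued_bytes > 0


def snapshot_record(sock, packet_bytes, wait_for_data,
                    reply_delay_ms=10, poll_ms=100):
    """Return the number of bytes on the relayd side when the client gets its reply."""
    sock.send(packet_bytes)
    sock.advance(reply_delay_ms)      # time taken to build and send the reply
    if wait_for_data:                 # hypothetical "data pending" phase
        while sock.data_pending():
            sock.advance(poll_ms)
    return sock.delivered_bytes


if __name__ == "__main__":
    slow, fast = 100, 1_000_000       # assumed bandwidths, bytes per millisecond
    print(snapshot_record(DataSocket(slow), PACKET_BYTES, wait_for_data=False))
    print(snapshot_record(DataSocket(slow), PACKET_BYTES, wait_for_data=True))
    print(snapshot_record(DataSocket(fast), PACKET_BYTES, wait_for_data=False))
```

In this model, only the slow socket without the wait returns before the whole packet is delivered. Whether the actual fix should take the form of such a wait is exactly the open question above.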

Prepare for addition of new test

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
@frdeso frdeso force-pushed the tests/add_delay_snapshot_streaming_ust branch from 3aae351 to 9f1f618 Compare December 4, 2018 18:41
@jgalar
Member

jgalar commented Dec 10, 2018

@PSRCode Are you okay with the first two commits (cleanup)? I could merge those and wait for a fix to merge the new test.

@PSRCode
Contributor

PSRCode commented Dec 10, 2018

@jgalar The first two look good. Go ahead.

@frdeso Still waiting for feedback regarding the commit message of the test.

This commit adds a testcase that simulates a snapshot on a relayd with a
data socket with very low bandwidth. This configuration can trigger a
race where the trace is unreadable because the snapshot is reported as
completed even though data is still in flight.

As of right now, this testcase fails because the trace is unreadable.

Babeltrace outputs the following error:
[error] Packet size (4194304 bits) is larger than remaining file size
(175104 bits) in trace with UUID "d9e6182e6469405094f839a08f438c3b", at
path:
"/tmp/tmp.sY3M2G54Oy/raton/snapshot-1-20181203-100618-0/ust/uid/0/64-bit",
within stream id 0, at relative path: "chan1_1".

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
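
To put the numbers from the Babeltrace error in perspective, the short sketch below converts them to bytes and computes how much of the packet is absent from the stream file. The helper `missing_bytes` is a hypothetical name introduced for illustration; it restates the size comparison described by the error message and is not Babeltrace's implementation.

```python
BITS_PER_BYTE = 8


def missing_bytes(packet_size_bits: int, remaining_file_bits: int) -> int:
    """Bytes of the declared packet that are not in the file (0 if it fits)."""
    return max(0, packet_size_bits - remaining_file_bits) // BITS_PER_BYTE


packet_bits = 4194304      # "Packet size" from the error message
remaining_bits = 175104    # "remaining file size" from the error message

print(packet_bits // BITS_PER_BYTE)                 # 524288 bytes (512 KiB)
print(remaining_bits // BITS_PER_BYTE)              # 21888 bytes
print(missing_bytes(packet_bits, remaining_bits))   # 502400 bytes not yet written
```

Only about 4% of the packet had reached the file when the trace was read, which is consistent with data still being in flight on the throttled data socket.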
@frdeso frdeso force-pushed the tests/add_delay_snapshot_streaming_ust branch from 9f1f618 to 1bf4cb5 Compare December 10, 2018 23:53
@frdeso
Contributor Author

frdeso commented Dec 10, 2018

I updated the commit message but we will need to update it again when we come up with a fix.
