Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tls-eio: fix shutdown semantics of tls_eio flow #460

Closed
wants to merge 1 commit into from

Conversation

bikallem
Copy link
Contributor

@bikallem bikallem commented Dec 16, 2022

This PR makes tls_eio flow instance follow shutdown semantics as specified by eio - https://github.com/ocaml-multicore/eio/blob/main/lib_eio/flow.mli#L92-L95)

The rationale for this change is that in addition to following tls semantics, tls-eio has to follow at the minimum flow semantics as espoused by eio.

/cc @talex5 @hannesm

EDIT The test fails on current tls-eio but passes with this PR.

.gitignore Outdated Show resolved Hide resolved
eio/tests/dune Outdated Show resolved Hide resolved
@hannesm
Copy link
Member

hannesm commented Dec 19, 2022

just for me, being a bit slow: does this PR superseed #459 and #458, and fix #457? would @talex5 have some time to report a review of this PR?

@bikallem
Copy link
Contributor Author

bikallem commented Dec 19, 2022

This PR does not supercede #459. However #458 does include this PR and possibly a fix for #458 #459 for the issue as specified in the comment #459 (comment) . This PR is scoped from #458 so that is is easy to review/merge and addresses a specific/particular issue as shown by the test. Hope this clarifies the PR.

EDIT: updated typo and added precise issue location.

Copy link
Contributor

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments in the shutdown function you're modifying explain why I didn't do it your way already:

Not sure if we need to keep both directions open on the underlying flow when closing one direction at the TLS level.
Not obvious how to do this with TLS, so ignore for now.

My understanding is that you can't just shut down one side of the underlying flow to shut down one side of the TLS flow, because e.g. reading from TLS requires writing to the underlying flow.

I removed the code from the fuzz tests that prevent them from checking this case. Then it fails even with this PR applied:

https://github.com/talex5/ocaml-tls/tree/flow-shutdown

With that said, I know very little about the TLS protocol itself. Perhaps @hannesm can clarify whether this is possible.

eio/tests/tls_eio.md Outdated Show resolved Hide resolved
@bikallem
Copy link
Contributor Author

The comments in the shutdown function you're modifying explain why I didn't do it your way already:

Ah, if the intention was not to follow Eio.Flow.two_way semantics, then perhaps tls-eio could expose some other abstraction than eio flow. However, from the perspective of the user of the lib, I have found flow to be the correct and convenient abstraction. And thus the tls-eio api much more easy to onboard and use.

I was bit surprised that shutdown of the tls-eio flow doesn't quite shutdown the flow as advertised by the eio flow api description. This PR attempts to improve on that assuming the intention is to use eio flow to express tls-eio flow.

With regards to general tls close/shutdown semantics, some discussion can be found at #452. However, I believe the issue this PR is attempting address is orthogonal to the more general tls close/shutdown issue.

@hannesm
Copy link
Member

hannesm commented Dec 22, 2022

My understanding is that you can't just shut down one side of the underlying flow to shut down one side of the TLS flow, because e.g. reading from TLS requires writing to the underlying flow.

To elaborate on that:

TLS 1.2 (RFC 5246):

Either party may initiate a close by sending a close_notify alert.
   Any data received after a closure alert is ignored.

   Unless some other fatal alert has been transmitted, each party is
   required to send a close_notify alert before closing the write side
   of the connection.  The other party MUST respond with a close_notify
   alert of its own and close down the connection immediately,
   discarding any pending writes.  It is not required for the initiator
   of the close to wait for the responding close_notify alert before
   closing the read side of the connection.

This is different from e.g. TCP. A close notify signals "I won;t send anymore any data", and also "please don't send any further datta". Surely there can still be data in flight (i.e. TCP packets that have been transmitted), but any pending writes should be dropped, and a close notify should be sent by the party that received the close notify. This also means that for the initiator of close notify there's no further "write" to the TCP session (i.e. no key renegotiation).

TLS 1.3

Here, a close notify has half-close TCP semantics. Once you sent a close notify, you'll not send anything (maybe a fatal error if you received garbage).

Have a look at the thread starting with Message-id: A6C599ED-3F3D-462F-9B39-1FEF6A0B549B@apple.com on the IETF TLS working group mailing list, which indicates that some TLS implementations already use the half-close semantics for TLS versions before 1.3. I understand that could be done, since nobody is able to figure out what is "in-flight" and what is pending (since TCP buffer size of your OS may be huge, and different from mine, so your TLS application may have received the close notify only when all data has been already handed off to the OS).

The mail thread also covers "what should happen when after a close notify has been sent, and a key update with update_requested is received?" -- the answer is nothing (since there won't be any more application data, and a key update is only needed before more application data is sent).

Conclusion

Reading up on the RFCs and the mailing list, I guess we can actually have a unified close_notify semantics for all protocol versions. Still we need to track that close_notify state in the TLS engine to reject any "send_application_data" etc.

@bikallem
Copy link
Contributor Author

It seems we have a few overlapping PRs in flight, the original PR 458 includes the fix in this PR as well as a few others. This PR was split from that PR specifically to address the tls-eio flow semantics to match what is expected of an eio flow and to reduce the diff there.

I removed the code from the fuzz tests that prevent them from checking this case. Then it fails even with this PR applied:
https://github.com/talex5/ocaml-tls/tree/flow-shutdown

Yes, this is already identified in the original PR. See in particular my note on this very specific issue (https://github.com/mirleft/ocaml-tls/pull/458/files#diff-55178e7aaabed769f352085f2da8f50085442362661c355b1dfe0cc3a58843ceR55-R66). I believe we need to update the test to not generate send action after send_close action is generated; unless perhaps we are attempting to test if tls-eio correctly throws error/exception when attempting to write to a closed flow. If the latter, this PR already includes that test.

Reading up on the RFCs and the mailing list, I guess we can actually have a unified close_notify semantics for all protocol versions. Still we need to track that close_notify state in the TLS engine to reject any "send_application_data" etc.

Agreed.

If there are no further objections, could we merge this PR. I would like to focus on #458 after this.

@hannesm
Copy link
Member

hannesm commented Dec 23, 2022

could we merge this PR

Well, there are some issues:

  • this is all one huge commit
  • it contains unrelated questionable changes "vendor" (since you've moved those changes to misc: develop against vendored dependencies #462, they can be removed here)
  • I'd prefer to have the "fuzz test changes" in a separate commit (isn't this similar to tls-eio: improve fuzz tests #463 -- which itself is also not mergeable in its current state, see my comments there)
  • I understand this PR does some calls to the underlying flow in order to improve what? the TLS RFC is pretty clear that you have an established flow (let's say TCP) and you can start a TLS handshake on that flow -> you get a TLS flow. on shutdown/close/teardown you still have a fine TCP connection that can be used for other purposes (or another TLS session)
  • the actual changes in this PR, which behaviour do they fix? is there a test case for this? or do they fix the (revised) fuzz test?

Now, I don't have any clue what "eio flow semantics" are about. But is it desired that the underlying (TCP) flow is closed/shutdown? Since it seems both you @bikallem and @talex5 are working on TLS and EIO, maybe it'd make sense if both of you come to a common understanding what kind of semantics you'd like to have. I can only talk about TLS semantics and the underlying RFCs - and I'm still unsure what the relation between #458 #459 #460 #463 is, and what will fix the reported issue #457.

I'm here to ensure that any change to tls-eio is agreed upon by the two developers/users thereof, and that for me who has some knowledge about tls and ocaml-tls to ensure we've a clean history with commit messages that are understandable in a decade. I already wonder whether I merged the tls-eio PR way too early.

@bikallem bikallem force-pushed the flow-shutdown branch 3 times, most recently from 1b532ac to 190a079 Compare December 23, 2022 17:26
@bikallem
Copy link
Contributor Author

@hannesm apologies. I mistakenly pushed changes from another branch onto this one. I have now removed unrelated changes from this branch.

@bikallem
Copy link
Contributor Author

bikallem commented Dec 23, 2022

I understand this PR does some calls to the underlying flow in order to improve what? the TLS RFC is pretty clear that you have an established flow (let's say TCP) and you can start a TLS handshake on that flow -> you get a TLS flow. on shutdown/close/teardown you still have a fine TCP connection that can be used for other purposes (or another TLS session)

Aha, thanks for this information. I hadn't looked into that part of the TLS rfc. I was wondering how Tls_eio.reneg () is supposed to work after we do Eio.Flow.shutdown tls_flow [All/Send/Receive]? Looks like it won't work. @talex5 do you agree with this assessment? Also, do you think perhaps tls-eio shouldn't expose an eio flow from Tls_eio.server_of_flow/client_of_flow? If you agree, I can close this PR.

Now, I don't have any clue what "eio flow semantics" are about. But is it desired that the underlying (TCP) flow is closed/shutdown? Since it seems both you @bikallem and @talex5 are working on TLS and EIO, maybe it'd make sense if both of you come to a common understanding what kind of semantics you'd like to have.

I would say my work on eio is mostly peripheral, In particular the semantics of flow is well established by @talex5 so I have to defer to him on how it is exactly supposed to work.

@hannesm my warmest apologies for the frustration and noise with the PRs and such. I believe ocaml-tls is one fine piece of OCaml software that I have come to appreciate. Thank you for your excellent work.

@hannesm
Copy link
Member

hannesm commented Dec 23, 2022

I was wondering how Tls_eio.reneg () is supposed to work after we do Eio.Flow.shutdown tls_flow [All/Send/Receive]?

According to the RFC, once close_notify has been sent, no renegotiation (or key update) can be done. Maybe it's worth to mention that with TLS 1.3 a shutdown `Receive or shutdown `All is not possible, only half shutdowns are possible (TODO: figure out how Unix does that for TCP sessions which also only have a way (FIN packet) to notify the other peer that they won't sent any further data).

Also, do you think perhaps tls-eio shouldn't expose an eio flow from Tls_eio.server_of_flow/client_of_flow?

This is how I feel now, in circles trying to understand "what is a 'eio flow'?" -- but so far I haven't seen any concrete semantics.

To me, TLS 1.3 (and even 1.2 and below can be made the same) feel like TCP connection from the semantic point of view. But that doesn't answer what "eio flow" is (or tries to achieve).

@bikallem
Copy link
Contributor Author

This is how I feel now, in circles trying to understand "what is a 'eio flow'?" -- but so far I haven't seen any concrete semantics.

"eio flow" semantics is mostly defined here - https://github.com/ocaml-multicore/eio/blob/main/lib_eio/flow.mli. In particular for this PR, eio defines the flow shutdown semantics as such:

type shutdown_command = [
  | `Receive  (** Indicate that no more reads will be done *)
  | `Send     (** Indicate that no more writes will be done *)
  | `All      (** Indicate that no more reads or writes will be done *)
]

class virtual two_way : object
  inherit source
  inherit sink

  method virtual shutdown : shutdown_command -> unit
end

val shutdown : #two_way -> shutdown_command -> unit
(** [shutdown t cmd] indicates that the caller has finished reading or writing [t]
    (depending on [cmd]).
    This is useful in some protocols to indicate that you have finished sending the request,
    and that the remote peer should now send the response. *)

Tls_eio.server_of_flow/client_of_flow returns an instance of two_way flow object (object oriented object) which in turn defines the shutdown method. Generally, as I understood it tls-eio flow object conforms (or should conform?) to this spec/semantics. After calling Eio.Flow.shutdown tls_flow `Send, we shouldn't be able to write on tls_flow anymore. Similarly for `Receive and `All. Attempt to do so should result in error/exception. That is what I understood and what this PR aims to do - mostly conform to this spec. @talex5 Is this my correct understanding?

@bikallem
Copy link
Contributor Author

Also we already close the underlying connection on Eio.Flow.shutdown tls_flow `All (https://github.com/mirleft/ocaml-tls/blob/main/eio/tls_eio.ml#L159). Perhaps if we want to reuse the underlying tcp connection then we remove shutting down the tcp connection on `` All too and clarify it in the API doc of `tls-eio`.

@bikallem
Copy link
Contributor Author

Closing this PR for now in support of #464

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants