parser: two-phase parsing

With the new changes, the parser returns immediately after the header is parsed and does not begin parsing the body until the next call to `parse()`. For bodiless messages and HEAD responses, it transitions directly to the `complete_in_place` state after the header is parsed, making a further call to `parse()` unnecessary (but still valid).

This two-phase parsing brings a few benefits with almost no complications on the usage side of the API:

- It introduces an optimization opportunity for users who want to attach a body. If they do so immediately after the header is parsed (which seems to be the case most of the time), elastic bodies need no `cb1_` at all and sink bodies need only a small `cb1_` (as it is used only temporarily). This means all the extra space can be utilized for `cb0_`.
- Because parsing the body might complete with an error, returning after the header is parsed allows users to access the header first and encounter the error on the next call to `parse()`.
- Setting the body limit in the middle of parsing the body, or after it, doesn't make much sense, so returning right after the header is parsed provides a window for setting such limits.
- If users want to attach a body, they will almost always do so immediately after the header is parsed. By not continuing to parse the body, we avoid an extra buffer copy operation (in case the user wants to attach a buffer).
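
The control flow described above can be illustrated with a minimal, self-contained sketch. This is a toy model, not the library's parser: `toy_parser`, its `state` values, the one-line "header" holding a decimal body length, and the 64 KiB default limit are all assumptions made for illustration. It only shows that the first `parse()` stops once the header is done, which gives the caller a window to inspect the header and set a limit before the second `parse()` touches the body.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical toy model of two-phase parsing (not the Boost.Http.Proto API).
// Wire format assumed here: "<decimal body length>\n<body bytes>".
enum class state { header, header_done, complete, failed };

class toy_parser {
public:
    explicit toy_parser(std::string input) : in_(std::move(input)) {}

    // Overrides the default limit; meant to be called between the two phases.
    void set_body_limit(std::uint64_t n) { limit_ = n; }

    state current() const { return st_; }
    std::uint64_t declared_size() const { return size_; }
    const std::string& body() const { return body_; }

    // Each call advances at most one phase.
    void parse() {
        if (st_ == state::header) {
            std::size_t nl = in_.find('\n');
            if (nl == std::string::npos)
                return;                         // header incomplete, need more input
            size_ = std::stoull(in_.substr(0, nl));
            pos_ = nl + 1;
            // Bodiless messages complete right away; otherwise stop here
            // and leave the body for the next call.
            st_ = (size_ == 0) ? state::complete : state::header_done;
            return;
        }
        if (st_ == state::header_done) {
            // The toy treats an over-limit or truncated body as an error.
            if (size_ > limit_ || in_.size() - pos_ < size_) {
                st_ = state::failed;
                return;
            }
            body_ = in_.substr(pos_, static_cast<std::size_t>(size_));
            st_ = state::complete;
        }
    }

private:
    std::string in_;
    std::string body_;
    std::size_t pos_ = 0;
    std::uint64_t size_ = 0;
    std::uint64_t limit_ = 64 * 1024;           // assumed default limit
    state st_ = state::header;
};

// Usage: see the header first, set a limit, then parse the body.
inline bool example() {
    toy_parser p("5\nhello");
    p.parse();                                  // phase 1: header only
    bool header_first = p.current() == state::header_done
                     && p.declared_size() == 5;
    p.set_body_limit(16);                       // window for setting the limit
    p.parse();                                  // phase 2: body
    return header_first
        && p.current() == state::complete
        && p.body() == "hello";
}
```

In this sketch, a body that exceeds the limit fails only on the second `parse()`, so the header information stays readable in between, mirroring the second and third points in the list above.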
cppalliance · Jan 12, 2025 · c8199d8
1 parent cfc59eb
commit c8199d8
Showing 4 changed files with 421 additions and 330 deletions.
diff --git a/include/boost/http_proto/parser.hpp b/include/boost/http_proto/parser.hpp
@@ -323,6 +323,18 @@ class BOOST_SYMBOL_VISIBLE
     Sink&
     set_body(Args&&... args);
 
+    /** Sets the maximum allowed size of the body for the current message.
+
+        This overrides the default value specified by
+        @ref config_base::body_limit.
+        The limit automatically resets to the default
+        for the next message.
+
+        @param n The new body size limit in bytes.
+    */
+    void
+    set_body_limit(std::uint64_t n);
+
     /** Return the available body data.
 
         The returned buffer span will be invalidated if any member
@@ -369,9 +381,6 @@ class BOOST_SYMBOL_VISIBLE
     bool
     is_plain() const noexcept;
 
-    void
-    on_headers(system::error_code&);
-
     BOOST_HTTP_PROTO_DECL
     void
     on_set_body() noexcept;
@@ -382,13 +391,17 @@ class BOOST_SYMBOL_VISIBLE
         std::size_t,
         bool);
 
+    std::uint64_t
+    body_limit_remain() const noexcept;
+
     static constexpr unsigned buffers_N = 8;
 
     enum class state
     {
         reset,
         start,
         header,
+        header_done,
         body,
         set_body,
         complete_in_place,
@@ -407,10 +420,11 @@ class BOOST_SYMBOL_VISIBLE
 
     detail::workspace ws_;
     detail::header h_;
-    std::size_t body_avail_ = 0;
+    std::uint64_t body_limit_= 0;
     std::uint64_t body_total_ = 0;
     std::uint64_t payload_remain_ = 0;
     std::uint64_t chunk_remain_ = 0;
+    std::size_t body_avail_ = 0;
     std::size_t nprepare_ = 0;
 
     // used to store initial headers + any potential overread