Discussion:
HTTP/2: allow binary data in header field values
Piotr Sikora
2017-08-29 01:34:31 UTC
Hi,
as discussed with some of you in Prague, I'd like to remove the
restriction on CR, LF & NUL characters and allow binary data in header
field values in HTTP/2.

Both HTTP/2 and HPACK can pass binary data in header field values
without any issues, but RFC7540 placed an artificial restriction on those
characters in order to protect clients and intermediaries converting
requests/responses between HTTP/2 and HTTP/1.1.

Unfortunately, this restriction forces endpoints to use base64
encoding when passing binary data in header field values, which can
easily become a CPU bottleneck.

This is especially true in multi-tier proxy deployments, like CDNs,
which are connected over high-speed networks and often pass metadata
via HTTP headers.

The proposal I have in mind is based on what gRPC is already doing [1], i.e.:

1. Each peer announces that it accepts binary data via HTTP/2 SETTINGS option,

2. Binary header field values are prefixed with NUL byte (0x00), so
that binary value 0xFF is encoded as a header field value 0x00 0xFF.
This allows binary-aware peers to differentiate between binary headers
and VCHAR headers. In theory, this should also protect peers unaware
of this extension from ever accepting such headers, since RFC7540
requires that requests/responses with headers containing NUL byte
(0x00) MUST be treated as malformed and rejected, but I'm not sure if
that's really enforced.

3. Binary-aware peers MUST base64 encode binary header field values
when forwarding them to peers unaware of this extension and/or when
converting to HTTP/1.1.

4. Binary header field values cannot be concatenated, because there is
no delimiter that we can use.
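As an illustrative sketch only (the helper names are hypothetical; no draft existed at this point), points 1-3 might look like:

```python
import base64

# Illustrative sketch of points 1-3: the NUL prefix marks raw binary values
# for binary-aware peers, base64 remains the fallback for everyone else.
BINARY_PREFIX = b"\x00"

def is_vchar_value(value: bytes) -> bool:
    # True if the value is legal as-is under RFC 7230 field-content
    # (VCHAR plus SP and HTAB).
    return all(0x20 <= b <= 0x7E or b == 0x09 for b in value)

def encode_value(value: bytes, peer_supports_binary: bool) -> bytes:
    """Choose the wire form of one header field value for one peer."""
    if is_vchar_value(value):
        return value                        # ordinary VCHAR value, no prefix
    if peer_supports_binary:
        return BINARY_PREFIX + value        # point 2: NUL-prefixed raw binary
    return base64.b64encode(value)          # point 3: base64 for legacy peers

def decode_value(wire: bytes) -> tuple[bytes, bool]:
    """Binary-aware receiver: return (value, is_binary)."""
    if wire[:1] == BINARY_PREFIX:
        return wire[1:], True
    return wire, False
```

A binary-aware receiver simply strips the prefix; anything without it is treated as an ordinary VCHAR value.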

NOTE: This proposal implies that endpoints SHOULD NOT use binary
header field values before receiving HTTP/2 SETTINGS from the peer.
However, since, at least in theory, all RFC7540-compliant peers
unaware of this extension MUST reject requests with headers containing
NUL byte (0x00) with a stream error, endpoints could opportunistically
use binary header field values on the first flight and assume that if
the peer isn't aware of this extension, it will reject the request,
which can subsequently be retried with base64-encoded header field
values.

I'd like to hear if anyone strongly disagrees with this proposal
and/or the binary data in header field values in general. Otherwise,
I'm going to write a draft and hopefully we can standardize this
before HTTP/2-over-QUIC, so that binary header field values can be
supported there natively and not via extension.

[1] https://github.com/grpc/proposal/blob/master/G1-true-binary-metadata.md

Best regards,
Piotr Sikora
Willy Tarreau
2017-08-29 04:10:23 UTC
Hi,
Post by Piotr Sikora
1. Each peer announces that it accepts binary data via HTTP/2 SETTINGS option,
This sounds reasonable.
Post by Piotr Sikora
2. Binary header field values are prefixed with NUL byte (0x00), so
that binary value 0xFF is encoded as a header field value 0x00 0xFF.
This allows binary-aware peers to differentiate between binary headers
and VCHAR headers. In theory, this should also protect peers unaware
of this extension from ever accepting such headers, since RFC7540
requires that requests/responses with headers containing NUL byte
(0x00) MUST be treated as malformed and rejected, but I'm not sure if
that's really enforced.
I've just checked my experimental implementation in haproxy and I'm
indeed missing this check. There are two reasons for this, which I think
could indicate others might have made the same mistake:
- the header values are prefixed with their length, so there's no need
to parse their contents before copying;

- the reminder about the list of forbidden characters is only present in
the Security Considerations section of 7540 and never mentioned in
7541, while it's mostly when processing HPACK that such characters
have a chance to be detected.

So at least some checking would be needed on other implementations (well,
I might very well have produced the only bogus one, but I think that
overlooking this is easy).
Post by Piotr Sikora
3. Binary-aware peers MUST base64 encode binary header field values
when forwarding them to peers unaware of this extension and/or when
converting to HTTP/1.1.
I really don't like this, because it becomes possible for a sender to
produce special values on the other side by benefitting from the
decoding, thus evading certain filtering measures. For example, an
agent may send:

Content-Encoding: \x00\x83\x38\xa9

and the binary-aware recipient having to "encode" it as gzip would
apply base64 to "\x83\x38\xa9" and would produce:

Content-Encoding: gzip

You can apply the same principle to other fields like content-length or
others and easily cause some trouble. If you need this, at the very least
it's important to enclose the encoded value between some "rare enough"
characters to prevent such risks. Something like this, for example:

Binary-Header: $b64<Y2h1bmtlZA==>$

The example above would produce :

Content-Encoding: $b64<gzip>$

That's just an example of course.
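A minimal sketch of that enclosure idea, taking the "$b64<...>$" framing purely as Willy's example rather than any defined format:

```python
import base64

# Sketch of the delimiter idea above: wrapping the base64 form in rare
# markers means a decoded binary value can never collide with a literal
# token such as "gzip". The "$b64<...>$" framing is only Willy's example.
def wrap_binary(value: bytes) -> str:
    return "$b64<" + base64.b64encode(value).decode("ascii") + ">$"

def unwrap_binary(text: str):
    """Return the decoded value, or None if the text is not wrapped."""
    if text.startswith("$b64<") and text.endswith(">$"):
        return base64.b64decode(text[5:-2])
    return None
```

Even a value crafted so that its encoding spells "gzip" stays visibly marked downstream: `wrap_binary(b"gzip")` yields `"$b64<Z3ppcA==>$"`, never a bare `gzip`.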
Post by Piotr Sikora
4. Binary header field values cannot be concatenated, because there is
no delimiter that we can use.
Good point.
Post by Piotr Sikora
NOTE: This proposal implies that endpoints SHOULD NOT use binary
header field values before receiving HTTP/2 SETTINGS from the peer.
However, since, at least in theory, all RFC7540-compliant peers
unaware of this extension MUST reject requests with headers containing
NUL byte (0x00) with a stream error, endpoints could opportunistically
use binary header field values on the first flight and assume that if
peer isn't aware of this extension, then it will reject the request,
This is really the part which needs to be verified I think.

I'm now thinking that prefixing the value with <CR><NUL> instead of <NUL>
could be more effective: while <NUL> may translate to an end of string
on many implementations and simply result in a field silently being
processed as empty, a <CR> in the middle of a string is always an
error, unless the next character is <LF>, in which case what follows is
the next header field. If a gateway were to inappropriately forward
the <CR> to HTTP/1, it would produce an invalid message, even if
truncated on the <NUL>, since the line would end in <CR><CR><LF>.
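To illustrate the leak scenario above (assuming a hypothetical sloppy gateway that truncates the value at NUL, C-string style, before serializing it to HTTP/1):

```python
# Illustration of the leak argument: a sloppy gateway truncates the value
# at the NUL and then emits it as an HTTP/1 field line. This models no
# particular implementation.
CR, NUL = b"\r", b"\x00"

value = CR + NUL + b"\xff\xfe"                    # <CR><NUL> prefix + payload
leaked = value.split(NUL)[0]                      # naive truncation at NUL
h1_line = b"Binary-Header: " + leaked + b"\r\n"

# The emitted line is b"Binary-Header: \r\r\n": it ends in <CR><CR><LF> and
# contains a bare CR, so an HTTP/1 parser should flag it as invalid rather
# than silently accept an empty field.
```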
Post by Piotr Sikora
which can be subsequently retried with base64 encoded header field
values.
I'd like to hear if anyone strongly disagrees with this proposal
and/or the binary data in header field values in general.
I don't disagree, and I think that some fields could later become binary
(dates, etc). I'm just extra careful because I know that most HTTP/1 parsers
are extremely lazy and that at least one H2 parser (the one I wrote) does
not notice the presence of <CR>, <LF> or <NUL> in values. But this one is
not yet deployed, so it can be fixed. I don't know about other ones.

Cheers,
Willy
James M Snell
2017-08-29 05:35:21 UTC
From an implementer point of view, this could be problematic from a
security perspective but with a setting it would be easily mitigated. I for
one would definitely like to see this.
Amos Jeffries
2017-08-29 05:44:40 UTC
Post by Willy Tarreau
I'm now thinking that prefixing the value with <CR><NUL> instead of <NUL>
could be more efficient : while <NUL> may translate to an end of string
on many implementations and simply result in a field silently being
processed as empty, a <CR> in the middle of a string is always an
error,
No. Think TLS certificates and crypto keys in their binary form. Both CR
and NUL are valid "string" characters. I'm suspecting delivery of such
keys in headers is part of where this proposal is coming from.
Post by Willy Tarreau
unless the next character is <LF> in which case what follows is
the next header field. In case a gateway would inappropriately forward
the <CR> to HTTP/1, it would produce an invalid message, even if
truncated on the <NUL> since the line would end in <CR><CR><LF>.
I don't think so. A bare CR in RFC 723x is no longer an end-of-line
character. When placed at the beginning or end of a field-value it is
just part of the BWS / OWS whitespace construction, and when placed
mid-value it is part of the value string [albeit with malformed encoding].

The only invalid part here is the NUL. And ...

My experience from discussions with server admins in the past is that,
on seeing a CR NUL sequence where they expect HTTP/1 MIME-format headers,
people immediately assume that the NUL was supposed to be a LF and do
something to 'fix' the problem by adding it as the message transits the
Apache / IIS server's CGI gateway layer.


Amos
Willy Tarreau
2017-08-29 06:37:02 UTC
Hi Amos,
Post by Willy Tarreau
I'm now thinking that prefixing the value with <CR><NUL> instead of <NUL>
could be more efficient : while <NUL> may translate to an end of string
on many implementations and simply result in a field silently being
processed as empty, a <CR> in the middle of a string is always an
error,
Post by Amos Jeffries
No. Think TLS certificates and crypto keys in their binary form. Both CR and
NUL are valid "string" characters. I'm suspecting delivery of such keys in
headers is part of where this proposal is coming from.
Absolutely. What I mean is that I want to be sure that *if* such a field
escapes control (i.e. is mistakenly passed as-is to H1 via a gateway), it is
invalid before starting to process its contents.
Post by Willy Tarreau
unless the next character is <LF> in which case what follows is
the next header field. In case a gateway would inappropriately forward
the <CR> to HTTP/1, it would produce an invalid message, even if
truncated on the <NUL> since the line would end in <CR><CR><LF>.
Post by Amos Jeffries
I don't think so. A bare-CR in RFC 723x is no longer an end-of-line
character.
That's exactly the point: it's not allowed unless followed by an LF,
so its presence followed by a NUL at the beginning of an accidentally
leaked value increases the likelihood of its detection by an H1 agent.
Post by Amos Jeffries
When placed at the beginning or end of a field-value it is just a
part of the BSP / OSP whitespace construction, and when placed mid-value it
is part of the value string [albeit with malformed encoding].
The only invalid part here is the NUL. And ...
My experience from discussions with server admin in the past is that seeing
CR NUL sequence where they expect HTTP/1 Mime-format headers people assume
immediately that the NUL was supposed to be a LF and do something to 'fix'
the problem by adding it as the message transits the Apache / IIS servers
CGI gateway layer.
The problem I'm having with the NUL alone is that all those doing limited
checks will consider it an end of string. But at least I see that we
agree that there's quite a risk here with legacy implementations.

Willy
Piotr Sikora
2017-08-29 06:56:08 UTC
Hi Willy,
Post by Willy Tarreau
I really don't like this, because it becomes possible for a sender to
produce special values on the other side by benefitting from the
decoding, thus evading certain filtering measures. For example, an
Content-Encoding: \x00\x83\x38\xa9
and the binary-aware recipient having to "encode" it as gzip would
Content-Encoding: gzip
You can apply the same principle to other fields like content-length or
other and easily cause some trouble. If you need this, at the very least
it's important to enclose the encoded value between some "rare enough"
Binary-Header: $b64<Y2h1bmtlZA==>$
Content-Encoding: $b64<gzip>$
That's just an example of course.
Good point, I didn't consider that.
Post by Willy Tarreau
This is really the part which needs to be verified I think.
I'm now thinking that prefixing the value with <CR><NUL> instead of <NUL>
could be more efficient : while <NUL> may translate to an end of string
on many implementations and simply result in a field silently being
processed as empty, a <CR> in the middle of a string is always an
error, unless the next character is <LF> in which case what follows is
the next header field. In case a gateway would inappropriately forward
the <CR> to HTTP/1, it would produce an invalid message, even if
truncated on the <NUL> since the line would end in <CR><CR><LF>.
(...)
I don't disagree and think that some fields could later become binary
(dates, etc). I'm just extra careful for knowing that most HTTP/1 parsers
are extremely lazy and that at least one H2 parser (the one I wrote) does
not notice the presence of <CR>, <LF> or <NUL> in values. But this one is
not yet deployed so it can be fixed. I don't know for other ones.
Sorry, I should have omitted the whole "NOTE". I added it only to
illustrate that in some deployments endpoints could abuse RFC7540's
restrictions to avoid base64 encoding on the first flight when they
assume that the peer is binary-aware and will negotiate the extension.
It wasn't meant to be part of the draft, so please disregard it,
since it clearly side-tracked the discussion.

To reiterate, binary header field values prefixed with the NUL byte (0x00)
would be sent only to binary-aware peers that announced support for
binary headers via an HTTP/2 SETTINGS option; therefore, those headers
would never be seen by HTTP/1.x or HTTP/2 parsers that aren't aware of
this extension, so I'm not sure there is any difference whether
it's <NUL>, <CR> or any combination of those.

Best regards,
Piotr Sikora
Willy Tarreau
2017-08-29 07:04:38 UTC
Post by Piotr Sikora
To reiterate, binary header field values prefixed with NUL byte (0x00)
would be sent only to binary-aware peers that announced support for
binary headers via HTTP/2 SETTINGS option, therefore, those headers
would never be seen by HTTP/1.x or HTTP/2 parsers that aren't aware of
this extension, so I'm not sure if there is any difference whether
it's <NUL>, <CR> or any combination of those.
So if that's done only after the sender has received the recipient's
SETTINGS frame, I think it's safe, possibly at the cost of an extra RTT,
unless such behaviour could be enforced by configuration where that makes
sense.

Willy
Piotr Sikora
2017-08-29 07:57:47 UTC
Hi Willy,
Post by Willy Tarreau
So if that's done only after the sender has received the recipient's
SETTINGS frame, I think it's safe, possibly at the cost of an extra RTT,
unless such behaviour could be enforced by configuration where that makes
sense.
Well, it just means that the first few requests would be sent with
base64-encoded binary header field values (i.e. using standard RFC7540
rules), and after receiving the recipient's SETTINGS frame with this
extension, the sender would start sending "raw" binary header field
values prefixed with the NUL byte (0x00).
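That per-connection switch could be sketched like this (the SETTINGS_BINARY_HEADERS name is hypothetical; no code point had been assigned in this thread):

```python
import base64

# Sketch of the per-connection switch described above: base64 until the
# peer's SETTINGS arrives, raw NUL-prefixed binary afterwards. The name
# SETTINGS_BINARY_HEADERS is hypothetical.
class H2Connection:
    def __init__(self) -> None:
        self.peer_accepts_binary = False    # unknown until SETTINGS received

    def on_settings(self, settings: dict) -> None:
        self.peer_accepts_binary = bool(settings.get("SETTINGS_BINARY_HEADERS"))

    def encode_binary_value(self, value: bytes) -> bytes:
        if self.peer_accepts_binary:
            return b"\x00" + value          # raw binary, NUL-prefixed
        return base64.b64encode(value)      # standard RFC 7540 rules
```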

Hope this clarifies things!

Best regards,
Piotr Sikora
Willy Tarreau
2017-08-29 08:10:45 UTC
Post by Piotr Sikora
Hi Willy,
Post by Willy Tarreau
So if that's done only after the sender has received the recipient's
SETTINGS frame, I think it's safe, possibly at the cost of an extra RTT,
unless such behaviour could be enforced by configuration where that makes
sense.
Well, it just means that the first few requests would be sent with
base64 encoded binary header field values (i.e. using standard RFC7540
rules), and after receiving recipient's SETTINGS frame with this
extension, sender would start sending "raw" binary header field values
prefixed with NUL byte (0x00).
Hope this clarifies things!
Yes that's my understanding as well.

Willy
Amos Jeffries
2017-08-29 05:28:23 UTC
Post by Piotr Sikora
Hi,
as discussed with some of you in Prague, I'd like to remove the
restriction on CR, LF & NUL characters and allow binary data in header
field values in HTTP/2.
Both HTTP/2 and HPACK can pass binary data in header field values
without any issues, but RFC7540 put an artificial restriction on those
characters in order to protect clients and intermediaries converting
requests/responses between HTTP/2 and HTTP/1.1.
The prohibition of these characters is a Security Considerations
requirement in HTTP. It would be best to keep that fact clearly
up-front. It was not a casual / arbitrary design decision, so the
reasons for it cannot just be ignored when implementing or negotiating
extended behaviour.

So long as HTTP/1 <-> HTTP/2 gateways exist the security attacks will
remain a problem. This is not a theoretical problem either;
intermediaries are still fending off active attacks and malformed agent
messages involving these three characters in HTTP/1.x environments before
HTTP/2 mapping even gets involved.

The simple problem is that one cannot guarantee the absence of a mapping
gateway in any transaction. So it HAS to be considered by every agent
involved.
Post by Piotr Sikora
Unfortunately, this restriction forces endpoints to use base64
encoding when passing binary data in header field values, which can
easily become the CPU bottleneck.
It is worth noting that base64 output is encoded more efficiently by
HPACK. So avoiding it is a con, not a pro.
Post by Piotr Sikora
This is especially true in multi-tier proxy deployments, like CDNs,
which are connected over high-speed networks and often pass metadata
via HTTP headers.
It is worth noting that RFC 7540 offers some benefits here. Any of
their internal traffic using the extended ability that gets leaked will
be actively rejected by RFC7540-compliant agents outside.
Post by Piotr Sikora
1. Each peer announces that it accepts binary data via HTTP/2 SETTINGS option,
2. Binary header field values are prefixed with NUL byte (0x00), so
that binary value 0xFF is encoded as a header field value 0x00 0xFF.
This allows binary-aware peers to differentiate between binary headers
and VCHAR headers. In theory, this should also protect peers unaware
of this extension from ever accepting such headers, since RFC7540
requires that requests/responses with headers containing NUL byte
(0x00) MUST be treated as malformed and rejected, but I'm not sure if
that's really enforced.
There is no need for this. With the SETTINGS value already negotiating
the ability, HPACK simply needs to decode the wire syntax into a binary
'string' header.

Agents that comply and reject the headers will not be negotiating to
accept them. If the binary value sent in any particular message header
does not use these trouble characters, there is no harm in letting it
through, so artificially forcing rejection is not beneficial here the
way the RFC 7540 requirement was for default / general use.
Post by Piotr Sikora
3. Binary-aware peers MUST base64 encode binary header field values
when forwarding them to peers unaware of this extension and/or when
converting to HTTP/1.1.
As written, that would violate RFC 7540. This requirement needs to take
the form of prohibiting the sending of binary headers to any peer which
has not explicitly negotiated the extension being defined;
i.e., comply with RFC7540 on all connections unless explicitly
negotiated otherwise on a per-connection basis.
Post by Piotr Sikora
4. Binary header field values cannot be concatenated, because there is
no delimiter that we can use.
Of course they can. Every coding language has some form of array or
linked-list structure available.

Displaying this type of header in *ASCII MIME format*, on the other hand,
requires encoding by the display agent. HTTP/2 does not change any
requirements around display; it is concerned only with on-wire delivery.
Post by Piotr Sikora
NOTE: This proposal implies that endpoints SHOULD NOT use binary
No. MUST NOT. RFC 7540 still applies during this pre-negotiation period.
Agents which assume capabilities not specified in HTTP/2 *will* get into
trouble eventually.
Post by Piotr Sikora
header field values before receiving HTTP/2 SETTINGS from the peer.
However, since, at least in theory, all RFC7540-compliant peers
unaware of this extension MUST reject requests with headers containing
NUL byte (0x00) with a stream error, endpoints could opportunistically
use binary header field values on the first flight and assume that if
peer isn't aware of this extension, then it will reject the request,
which can be subsequently retried with base64 encoded header field
values.
I'd like to hear if anyone strongly disagrees with this proposal
and/or the binary data in header field values in general. Otherwise,
I'm going to write a draft and hopefully we can standardize this
before HTTP/2-over-QUIC, so that binary header field values can be
supported there natively and not via extension.
How do you plan on making it "native HTTP/2" without replacing the whole
HTTP/2 RFC *and* getting that new specification rolled out to the
non-QUIC world?

(I am seriously interested in that answer. Many of us middleware
implementers have been pushing for UTF-8 / binary support in headers for
around 10 years already, and progress has been painfully slow.)

It seems to me you [like several of us] are dreaming of
HTTP/3-over-QUIC. Not HTTP/2-over-QUIC, extended or otherwise. I am very
doubtful that getting all this done before QUIC rolls out is going to be
possible - a negotiable extension is far more realistic and will allow a
testing rollout to happen before everybody in the HTTP world has to
change code for it.

Amos
Willy Tarreau
2017-08-29 06:42:45 UTC
Hi Amos,
Post by Piotr Sikora
2. Binary header field values are prefixed with NUL byte (0x00), so
that binary value 0xFF is encoded as a header field value 0x00 0xFF.
This allows binary-aware peers to differentiate between binary headers
and VCHAR headers. In theory, this should also protect peers unaware
of this extension from ever accepting such headers, since RFC7540
requires that requests/responses with headers containing NUL byte
(0x00) MUST be treated as malformed and rejected, but I'm not sure if
that's really enforced.
There is no need for this. With the SETTINGS value already negotiating the
ability HPACK simply needs to decode the wire syntax into a binary 'string'
header.
There's no *need*, but there is huge value: it saves gateways from having
to check for any forbidden character in the string at the moment they
decide to pass it on, to know whether it has to be encoded or not. Having
the information upfront provides a significant benefit.
Post by Amos Jeffries
I am very doubtful that getting
all this done before QUIC rolls out is going to be possible - a negotiable
extension is far more realistic and will allow a testing rollout to happen
before everybody in the HTTP world has to change code for it.
I tend to think that negotiation is the only really safe way there as well.
We could possibly state that knowledge of the other side's support can
be enforced by configuration; this would allow low-latency profiles in various
web services environments when both ends are well controlled.

Regards,
Willy
Piotr Sikora
2017-08-29 07:26:48 UTC
Hi Amos,
Post by Amos Jeffries
The prohibition of these characters is a Security Considerations requirement
in HTTP. It would be best to keep that fact clearly up-front. It was not a
casual / arbitrary design decision, so the reasons for it cannot just be
ignored when implementing or negotiating extended behaviour.
So long as HTTP/1 <-> HTTP/2 gateways exist the security attacks will remain
a problem. This is not a theoretical problem either, intermediaries are
still fending off active attacks and malformed agent messages involving
these three characters in HTTP/1.x environment before HTTP/2 mapping even
gets involved.
The simple problem is that one cannot guarantee the absence of a mapping
gateway in any transaction. So it HAS to be considered by every agent
involved.
100% agreed. What I meant is that it's an artificial limitation in the
HTTP/2 protocol, since it wouldn't be necessary in an HTTP/2-only world.
Post by Amos Jeffries
It is worth noting that base64 encoding is more efficiently encoded by
HPACK. So avoiding it is a con', not a pro'.
I'm not sure that I agree.

Base64 with Huffman encoding is basically a wash in terms of space
(4/3 * 6 bits ~= 1 byte per byte), and requires somewhat CPU-intensive
transformations.

Binary data without Huffman encoding is also 1 byte per byte, and
requires no CPU-intensive transformations.
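A quick check of that arithmetic (treating ~6 bits per base64 character as a rough average for HPACK's Huffman table, which is an approximation, not an exact figure):

```python
import base64
import math

# Rough numbers behind the "basically a wash" claim: base64 expands 3 bytes
# into 4 characters, and HPACK Huffman coding squeezes the base64 alphabet
# back down to roughly 6 bits per character on average.
raw_len = 1024
b64_len = len(base64.b64encode(bytes(raw_len)))   # ceil(1024 / 3) * 4
assert b64_len == math.ceil(raw_len / 3) * 4      # 1368 bytes pre-Huffman

approx_huffman_len = b64_len * 6 / 8              # ~1026 bytes on the wire
# ...versus 1024 bytes for raw binary without Huffman: nearly 1:1 either
# way, but base64 + Huffman costs two extra transformations in CPU.
```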
Post by Amos Jeffries
There is no need for this. With the SETTINGS value already negotiating the
ability HPACK simply needs to decode the wire syntax into a binary 'string'
header.
Agents that comply and reject the headers will not be negotiating to accept
them. If the binary value sent in any particular message header does not use
these trouble characters there is no harm in letting it through, so
artificially forcing rejection is not beneficial here like the RFC 7540
requirement was for default / general use.
Oh, it wasn't meant to force rejection. The main point of the NUL byte
(0x00) is for binary-aware proxies to know which headers they need to
base64 encode when forwarding to HTTP/1.x or HTTP/2 peers unaware of
this extension.
Post by Amos Jeffries
As written that would violate RFC 7540. This requirement needs to take the
form of prohibiting sending binary headers to any peers which has not
explicitly negotiated the extension being defined.
ie, comply with RFC7540 an all connections unless explicitly negotiated
otherwise on a per-connection basis.
Yes, this wasn't formal enough language, but I meant the same.
Of course they can. Every coding language has some form of array or
linked-list structure available.
To display this type of header in *ASCII MIME format*, on the other hand,
requires encoding by the display agent. HTTP/2 does not change any
requirements around display; it is concerned only with the on-wire delivery.
Fair enough. We could invent a format to allow concatenation of those
header field values, but I'm not sure if that's worth the trouble.
How do you plan on making it "native HTTP/2" without replacing the whole
HTTP/2 RFC *and* getting that new specification rolled out to the non-QUIC
world?
(I am Seriously interested in that answer. Many of us middleware
implementers have been pushing for UTF-8 / binary support in headers for
around 10 years already and progress has been painfully slow).
It seems to me you [like several of us] are dreaming of HTTP/3-over-QUIC.
Not HTTP/2-over-QUIC, extended or otherwise. I am very doubtful that getting
all this done before QUIC rolls out is going to be possible - a negotiable
extension is far more realistic and will allow a testing rollout to happen
before everybody in the HTTP world has to change code for it.
I'm not sure if that was clear from my original email (it probably
wasn't...), but the proposal is to allow binary header field values
(prefixed by a NUL byte) alongside regular VCHAR headers, not to convert
everything (dates, sizes, etc.) into binary values.

As for "native" - what I meant is that in HTTP/2-over-TCP, allowing
binary header field values will be announced via an HTTP/2 SETTINGS
option, and, if we're lucky enough and this gets standardized before
QUIC, then in HTTP/2-over-QUIC, this setting would be supported
natively (i.e. simply enabled by default), so that all
HTTP/2-over-QUIC peers would be binary-aware.

I don't believe this extension requires a lot of changes (assuming
that we're not converting everything to binary), and HTTP/2-over-QUIC
is already different enough that this should be an acceptable change.

Best regards,
Piotr Sikora
Mike Bishop
2017-08-29 17:52:03 UTC
Permalink
As with other typed header fields (and let's be clear, binary blob is just another type), this isn't about changing HTTP/2, it's about changing HTTP. Currently, header fields in HTTP are, by definition, sequences of octets with a scoped range of valid values. If you change the allowed values, that's a change at the semantic layer, not to any given transport mapping. This is the HTTP WG; we can do that, but let's be clear what we're talking about. But we'd need to have reasonable ways of ensuring that the values are sanitized before they're passed to "legacy" HTTP consumers.

As you note, HTTP/2 and HPACK are already perfectly capable of transporting these octets. You can even Huffman-encode a binary blob if you want -- all possible values are listed in the table, though non-ASCII octets are severely disadvantaged. That's precisely what the Security Considerations says -- HTTP/2 (i.e. the TCP mapping) is capable of transporting header values that aren't valid HTTP, and it's the HTTP layer's responsibility to validate that. Obviously, if you rev HTTP to make those valid values, those checks would be modified. The HTTP/QUIC mapping is no different -- it's capable of transporting these values already, but the HTTP layer knows they're not valid.

On the whole, I can see niche situations where this might be useful, but I think it will be difficult to deploy generally. Our stacks essentially act as HTTP/1.1-to-2 intermediaries within client and server; we don't assume that the apps above our layer are HTTP/2-aware, though obviously we expose ways to take advantage of extra features. Unless we wanted to add additional header set/get APIs that supported typing, I suspect we would initially opt not to advertise this extension rather than base64-encode headers upon arrival. That's just extra work for no apparent benefit.

And if we're going to go this route and modify HTTP itself, let's have a reasonable set of types instead of just adding one at a time. 😊

-----Original Message-----
From: Piotr Sikora [mailto:***@google.com]
Sent: Monday, August 28, 2017 6:35 PM
To: HTTP Working Group <ietf-http-***@w3.org>
Cc: Craig Tiller <***@google.com>
Subject: HTTP/2: allow binary data in header field values

Hi,
as discussed with some of you in Prague, I'd like to remove the restriction on CR, LF & NUL characters and allow binary data in header field values in HTTP/2.

Both HTTP/2 and HPACK can pass binary data in header field values without any issues, but RFC7540 put an artificial restriction on those characters in order to protect clients and intermediaries converting requests/responses between HTTP/2 and HTTP/1.1.

Unfortunately, this restriction forces endpoints to use base64 encoding when passing binary data in header field values, which can easily become the CPU bottleneck.

This is especially true in multi-tier proxy deployments, like CDNs, which are connected over high-speed networks and often pass metadata via HTTP headers.

The proposal I have in mind is based on what gRPC is already doing [1], i.e.:

1. Each peer announces that it accepts binary data via HTTP/2 SETTINGS option,

2. Binary header field values are prefixed with NUL byte (0x00), so that binary value 0xFF is encoded as a header field value 0x00 0xFF.
This allows binary-aware peers to differentiate between binary headers and VCHAR headers. In theory, this should also protect peers unaware of this extension from ever accepting such headers, since RFC7540 requires that requests/responses with headers containing NUL byte
(0x00) MUST be treated as malformed and rejected, but I'm not sure if that's really enforced.

3. Binary-aware peers MUST base64 encode binary header field values when forwarding them to peers unaware of this extension and/or when converting to HTTP/1.1.

4. Binary header field values cannot be concatenated, because there is no delimiter that we can use.

NOTE: This proposal implies that endpoints SHOULD NOT use binary header field values before receiving HTTP/2 SETTINGS from the peer.
However, since, at least in theory, all RFC7540-compliant peers unaware of this extension MUST reject requests with headers containing NUL byte (0x00) with a stream error, endpoints could opportunistically use binary header field values on the first flight and assume that if peer isn't aware of this extension, then it will reject the request, which can be subsequently retried with base64 encoded header field values.

I'd like to hear if anyone strongly disagrees with this proposal and/or the binary data in header field values in general. Otherwise, I'm going to write a draft and hopefully we can standardize this before HTTP/2-over-QUIC, so that binary header field values can be supported there natively and not via extension.

[1] https://github.com/grpc/proposal/blob/master/G1-true-binary-metadata.md
Jason Greene
2017-08-29 19:49:23 UTC
Permalink
Just to add to this, a semantic definition would prevent encoding optimization loss between multiple hops (e.g. h2->h1->h2). You ideally want the definition to persist so that any intermediary, and additionally any layer of the stack understands how to interact with it.

-Jason
--
Jason T. Greene
Chief Architect, JBoss EAP
Red Hat
Piotr Sikora
2018-11-07 04:58:03 UTC
Permalink
Reviving this thread now that we have HTTP Core, with semantics
separated from HTTP/1.1 and HTTP/2 messaging.

As of right now, the header values at the semantics layer are limited
to the visible characters:

field-value = *( field-content / obs-fold )
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR / obs-text

where:

VCHAR = %x21-7E
obs-text = %x80-FF

I'm fine with the existing restrictions being enforced at the HTTP/1.1
messaging layer, where binary values could be converted according to
the "Byte Sequence" rules from Structured Headers, however both HTTP/2
and HTTP/3 are perfectly capable of transmitting all octets, so the
semantics layer shouldn't be limited by the fact that HTTP/1.x is a
text-based protocol.
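For illustration, here is a sketch of that conversion using the Byte
Sequence serialisation from the Structured Headers draft (base64 content
wrapped in colons); the function names are my own:

```python
import base64


def serialize_byte_sequence(data: bytes) -> str:
    # Structured Headers "Byte Sequence": base64 content wrapped in colons.
    return ":" + base64.b64encode(data).decode("ascii") + ":"


def parse_byte_sequence(text: str) -> bytes:
    if not (text.startswith(":") and text.endswith(":")):
        raise ValueError("not a Byte Sequence")
    return base64.b64decode(text[1:-1])


value = serialize_byte_sequence(b"\x00\xff")
assert value == ":AP8=:"  # entirely within VCHAR, safe for an HTTP/1.1 hop
assert parse_byte_sequence(value) == b"\x00\xff"
```

An HTTP/1.1 gateway could apply this at the messaging layer while HTTP/2
and HTTP/3 carry the raw octets.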

If this is published as-is, it's going to prevent the use of binary values
in header fields "for the rest of our careers" (according to the WG
chairs), so I guess it's a "now or never" kind of thing.

Thoughts?

Best regards,
Piotr Sikora

Willy Tarreau
2018-11-07 05:53:29 UTC
Permalink
While I can see the value in doing this, having had to deal with value
encoding myself, I also see some obvious problems with it, like transcoding
to older HTTP versions. Despite this, I think it deserves some thinking.

My primary concern really is about the risk that such header fields get
transcoded to H1 and cause huge damage, even where unexpected by the
persons deploying a gateway for instance. One idea against this could
be that we introduce a new header field (please don't beat me) to
indicate whether a message may be downgraded and, if so, down to what version.
We could for example have "Requires: HTTP/2" in an H3 message to
indicate that the message cannot be conveyed over HTTP versions older
than 2. This could be useful over the long term to transport other
semantics that were possibly ambiguous before certain versions. In
this case a message conveying binary data would be expected to pass
this "requires: h2" field to make sure a gateway doesn't pass it over
an older version. And in my opinion it's this signal that needs to be
defined early and before we generalize H{3,2,1} <-> H{3,2,1} gateways.

We could then decide that certain new header fields must be watched
and obeyed by gateways supporting certain versions.

We already have this issue with some protocol elements introduced in
H2, like the "never index" flag in HPACK. It's the reason we've had
to completely redesign the internal HTTP stack in haproxy, because for
now H2 messages are translated to HTTP/1.1 but it's not possible to
keep this type of information there if we need to re-encode to H2.

Another important element to keep in mind is the list delimiter. The
HTTP spec says that a header field may appear multiple times in a
message if and only if it's defined as a comma-delimited list (with
an exception for set-cookie which we all love). With your proposal
to allow all characters and to pass binary data, as soon as a header
field appears multiple times, there will definitely be agents that
fold the values by appending a comma and a space, and this will
break your contents. So we'd probably need to define how such header
fields should (not?) be folded, and which ones it applies to.

Finally, agents are free to trim leading and trailing LWS in values.
Here again it will destroy your contents, so we also need to take care
of this.
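Both hazards are easy to demonstrate (my own illustration, not code from
the thread):

```python
# Two instances of one hypothetical binary header field; the first value
# happens to contain the bytes 0x2C 0x20 (", "), the second ends in 0x20.
values = [b"\x00, \x01", b"\x03 "]

# 1. List folding: agents may join repeated fields with ", ", after which
# embedded 0x2C 0x20 pairs are indistinguishable from the delimiter.
folded = b", ".join(values)
assert folded.split(b", ") == [b"\x00", b"\x01", b"\x03 "]  # not the two original values

# 2. OWS trimming: agents may strip leading/trailing SP / HTAB, silently
# dropping significant bytes from a binary value.
assert b"\x03 ".strip(b" \t") == b"\x03"
```

So any "binary" property would indeed have to forbid both folding and
whitespace trimming for the fields it covers.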

For all these reasons I'm starting to suspect that we'll sooner or
later have to introduce a notion of properties associated with header
fields. One of them could be "binary", which implies no trimming, no
folding. Another one could be "do-not-fold" for set-cookie and possibly
others (i.e. all those supposed to contain a comma like Date or Expires).
We could imagine having a "binary: <name-list>" field passing the list
of binary header names, but it really is a pain to deal with parsers
relying on names found in other header fields.

All such properties could be defined in the core with their default
values (e.g. no-fold for "set-cookie"), and certain HTTP versions could
be able to override the default properties, for instance to specify that
a given field is of type binary and must not be mangled. And it's only
with the minimum required version signal that you can make sure that all
elements along the chain will respect the promise not to touch it.

Regards,
Willy
Mark Nottingham
2018-11-07 06:12:53 UTC
Permalink
Structured Headers effectively creates a new abstraction on top of the restrictions in core; if an implementation wants to expose an API that skips the textual serialisation, it can do that without violating core.

Cheers,
--
Mark Nottingham https://www.mnot.net/

Jeffrey Yasskin
2017-08-29 18:00:40 UTC
Permalink
How does this interact with
https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-01?

Jeffrey