Migrating some high-entropy HTTP headers to Client Hints.

Discussion:

Mike West

2018-11-29 10:22:27 UTC

Hey folks,

Section 9.7 of RFC7231 <https://tools.ietf.org/html/rfc7231#section-9.7>
rightly notes that some of the content negotiation headers user agents
deliver in HTTP requests create substantial fingerprinting surface. I think
it would be beneficial if we took steps to reduce their prevalence on the
wire, and Client Hints looks like a reasonable infrastructure on top of
which to build.

`User-Agent` and `Accept-Language` seem like particularly tasty and
low-hanging fruit, and I've sketched out two proposals as proofs of concept:

* `User-Agent` could be represented as ~four distinct hints: `UA`,
`Model`, `Platform`, and `Arch`: https://github.com/mikewest/ua-client-hints is
a high-level explainer, and
https://tools.ietf.org/html/draft-west-ua-client-hints a sketchy ID for the
new headers.

* `Accept-Language` could be represented as a `Lang` hint:
https://github.com/mikewest/lang-client-hint is a high-level explainer,
https://tools.ietf.org/html/draft-west-lang-client-hint an equally sketchy
ID for the new header.
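
To make the opt-in model concrete, here is a minimal Python sketch of how a user agent might decide which hint headers to attach to a request. It assumes the server advertises its interest through an `Accept-CH` response header and that each hint travels in a `Sec-CH-`-prefixed request header. The exact header names and value syntax are assumptions for illustration rather than text taken from the drafts, and `hints_to_send` is a hypothetical helper.

```python
# Hint names follow the proposal (UA, Model, Platform, Arch, Lang).
# The "Sec-CH-" request-header prefix and the plain-string values are
# illustrative assumptions, not the drafts' normative syntax.
KNOWN_HINTS = ("UA", "Model", "Platform", "Arch", "Lang")


def hints_to_send(accept_ch, values):
    """Return the request headers for hints the server opted into.

    accept_ch: value of the server's Accept-CH header, e.g. "UA, Platform".
    values: mapping from hint name to the value this client would reveal.
    """
    canonical = {name.lower(): name for name in KNOWN_HINTS}
    headers = {}
    for token in accept_ch.split(","):
        name = canonical.get(token.strip().lower())
        # Skip unknown hints and hints the client has no value for.
        if name is not None and name in values:
            headers["Sec-CH-" + name] = values[name]
    return headers
```

With no opt-in from the server (an empty `Accept-CH` value), the helper returns an empty dict, which mirrors the idea that high-entropy details stay off the wire unless requested.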

I'd appreciate y'all's feedback. Thanks!

-mike

Thomas Peterson

2018-11-29 12:08:05 UTC

I would propose that all Accept* headers be included in Client Hints, as
all of them can be used for some level of fingerprinting. For example,
Accept can be used to distinguish between desktop browsers (which
typically send html/xml MIME types) and cURL/wget, which by default send
'*/*'. Many user agents also do their own guesswork on response bodies
anyway (such as looking at the magic number) to determine content type or
encoding, so the impact of a "failed negotiation" of content can be
limited.

Also, is there a particular reason why Sec-CH-Lang omits quality values?

Regards


Mike West

2018-11-29 12:54:53 UTC

Thanks for the feedback!

Post by Thomas Peterson
I would propose that all Accept* headers are included in Client Hints as
all can be used for some level of fingerprinting, e.g. Accept can be used
to distinguish between desktop browsers (which typically have html/xml
MIME types) and cURL/wget which by default have '*/*'.

The philosophy in https://tools.ietf.org/html/draft-west-ua-client-hints is
that it's reasonable to expose basic information about the user agent (e.g.
it's Firefox, not cURL). That level of information seems quite difficult to
hide (given differences in behavior, network stacks, etc.) and quite
valuable to developers, which tips the balance for me towards exposing
brand and major version by default.

With that in mind, `Accept` and `Accept-Encoding` seem to be fairly static
in their relationship to the UA brand and version. Chrome more or less
hard-codes `Accept` and `Accept-Encoding` based on the kind of resource
being asked for, for instance (see
https://cs.chromium.org/chromium/src/net/url_request/url_request_http_job.cc?g=0&l=666
and
places like
https://cs.chromium.org/chromium/src/media/blink/resource_multibuffer_data_provider.cc?rcl=e53d19f7befd7927b6b9727dc88b9ee295c6fa05&l=110
and
https://cs.chromium.org/chromium/src/content/renderer/loader/web_url_loader_impl.cc?rcl=e53d19f7befd7927b6b9727dc88b9ee295c6fa05&l=672
).

With the caveat that I'm sometimes prone to a myopic view of the world from
the standpoint of a web browser: `User-Agent` and `Accept-Language` seem to
contain significantly more entropy, and therefore feel like the right place
to start. I certainly wouldn't suggest that that's where we ought to stop.
:)
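
For readers who want to put a number on "entropy" here, one common yardstick is the Shannon entropy of the observed header values. The following is a minimal sketch under that assumption; `shannon_entropy_bits` is a hypothetical helper, and it says nothing about what real `User-Agent` or `Accept-Language` distributions look like.

```python
import math
from collections import Counter


def shannon_entropy_bits(observations):
    """Return the Shannon entropy, in bits, of a list of observed values."""
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        # Each distinct value contributes -p * log2(p) bits.
        entropy -= p * math.log2(p)
    return entropy
```

A header that every client sends identically yields 0 bits, while a header that splits clients evenly across four values yields 2 bits; the more bits, the more a header helps single out a client.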

Post by Thomas Peterson
Many user agents
also do their own guess work on response bodies anyway (such as looking
at the magic number) to determine content type or encoding, so the
impact of a "failed negotiation" of content can be limited.
Also, is there a particular reason why Sec-CH-Lang omits Quality Values?

https://tools.ietf.org/html/draft-west-lang-client-hint-00#section-4.3
addresses this. In a nutshell, it seems like cruft, and some widely-used
user agents (I spot-checked Chrome and Firefox) implement the weighting
mechanism as a function of the list order. That semantic makes sense to me,
and more doesn't seem to be necessary.
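
As a rough illustration of why list order can stand in for explicit weights, here is a small Python sketch that reduces an `Accept-Language` value to an ordered list of tags. `ordered_languages` is a hypothetical helper; the handling of malformed weights is an assumption of this sketch, and the output list is not the draft's header syntax.

```python
def ordered_languages(accept_language):
    """Reduce an Accept-Language value to an ordered list of language tags."""
    entries = []
    for position, item in enumerate(accept_language.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        quality = 1.0  # RFC 7231: a missing q parameter means q=1
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    # Assumption: treat a malformed weight as "not acceptable".
                    quality = 0.0
        if quality > 0:
            entries.append((-quality, position, tag))
    # Sort by descending quality, keeping the original order for ties.
    return [tag for _, _, tag in sorted(entries)]
```

For a value such as `en-US,en;q=0.9,de;q=0.8`, where the weights simply decrease along the list, the result is the same sequence of tags with the weights dropped, so nothing about the preference order is lost.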

I might well be missing use cases here, which I'd be thrilled to hear about!

-mike


Stephen Farrell

2018-11-29 14:44:52 UTC

Hiya,

Post by Mike West
* `User-Agent` could be represented as ~four distinct hints: `UA`,
`Model`, `Platform`, and `Arch`: https://github.com/mikewest/ua-client-hints is
a high-level explainer, and
https://tools.ietf.org/html/draft-west-ua-client-hints a sketchy ID for the
new headers.

It'd be good to see some progress along these lines, so thanks
for writing that.

If doing so, and in addition to greasing, could it be useful
to define some fixed strings that mean "NOT SAYING" for each
of the new headers?

I'm not sure it'd be a major win, but if those were defined,
then that might help a bit for whatever various UAs do when
they're supposedly in a more privacy-friendly mode - it'd
allow/encourage such UAs to look that little bit more alike.

Cheers,
S.


Mike West

2018-11-30 08:32:02 UTC

Hey Stephen!

Post by Stephen Farrell
It'd be good to see some progress along these lines, so thanks
for writing that.
If doing so, and in addition to greasing, could it be useful
to define some fixed strings that mean "NOT SAYING" for each
of the new headers?
I'm not sure it'd be a major win, but if those were defined,
then that might help a bit for whatever various UAs do when
they're supposedly in a more privacy-friendly mode - it'd
allow/encourage such UAs to look that little bit more alike.

We could certainly define such a constant, but it's not clear how it would
help. I would expect user agents to simply not send the relevant hints,
rather than sending a hint that says "I'm not sending you a hint." Do you
think it's likely that some user agents might make different choices?

-mike

Stephen Farrell

2018-11-30 10:35:38 UTC

Hiya,

Post by Mike West
We could certainly define such a constant, but it's not clear how it would
help. I would expect user agents to simply not send the relevant hints,
rather than sending a hint that says "I'm not sending you a hint." Do you
think it's likely that some user agents might make different choices?

Not sending anything is better yes. Each UA deciding to
send something slightly different would be worse though,
so it may make sense to define it even if one recommends
it not be sent.

S.

Martin Thomson

2018-11-29 23:22:20 UTC

I think maybe I was predisposed to not like this, but I do like it.
Not saying that I'm hugely enthusiastic about doing the CH part, but
the bit where User-Agent becomes fixed is really appealing.

One thing we might consider, if the timing works out, is having user
agents register their UA strings with us. We can bake them into the
QPACK static table so that we can save bits. I don't want to
privilege particular clients overly, so that only works if we have
broad acceptance of the plan.


Mike West

2018-11-30 08:37:57 UTC

Post by Martin Thomson
I think maybe I was predisposed to not like this, but I do like it.
Not saying that I'm hugely enthusiastic about doing the CH part, but
the bit where User-Agent becomes fixed is really appealing.

I am glad the concept overcame your potential predispositions! I think it's
a pretty natural extension of the CH infrastructure, and I think it's a
pretty reasonable compromise that gives us the breathing room to freeze the
UA string while not freezing out developers.

Post by Martin Thomson
One thing we might consider, if the timing works out, is having user
agents register their UA strings with us. We can bake them into the
QPACK static table so that we can save bits. I don't want to
privilege particular clients overly, so that only works if we have
broad acceptance of the plan.

That seems pretty reasonable to me, though I'm not sure how much space we
have to work with in the static table. Chrome's current user agent string
comes in at around 150 characters, depending on platform. Taking that as a
baseline, how many strings could we reasonably encode? Can the table expand
to an extent that would allow it to cover enough of the UA market?

-mike


Martin Thomson

2018-12-04 17:12:18 UTC

Post by Mike West
That seems pretty reasonable to me, though I'm not sure how much space we have to work with in the static table. Chrome's current user agent string comes in at around ~150 characters, depending on platform. Taking that as a baseline, how many strings could we reasonably encode? Can the table expand to an extent that would allow it to cover enough of the UA market?

The cost is in static code size only, so in theory it could be
modestly large. (Those concerned with static code size can replace
the strings with symbols representing the semantics without preserving
the syntax - those 150+ characters could just be a static
IS_CHROME.)
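
A toy Python sketch of that idea follows. It is not the QPACK wire format: the table entries, the `KnownUA` symbols, and the `("indexed", n)` / `("literal", s)` tuples are all hypothetical stand-ins, used only to show how a registered string could collapse to a small index on the wire and to a symbol in memory.

```python
from enum import Enum


class KnownUA(Enum):
    # Hypothetical registered user agents; the numbers play the role of
    # static-table indices.
    EXAMPLE_BROWSER = 0
    EXAMPLE_FETCHER = 1


# Hypothetical static table: symbol -> full (frozen) header value.
STATIC_TABLE = {
    KnownUA.EXAMPLE_BROWSER: "ExampleBrowser/1.0 (frozen string)",
    KnownUA.EXAMPLE_FETCHER: "ExampleFetcher/1.0",
}
REVERSE_TABLE = {value: symbol for symbol, value in STATIC_TABLE.items()}


def encode_user_agent(value):
    """Encode a registered string as an index, anything else as a literal."""
    symbol = REVERSE_TABLE.get(value)
    if symbol is not None:
        return ("indexed", symbol.value)
    return ("literal", value)


def decode_to_symbol(encoded):
    """Map an indexed entry back to its symbol without rebuilding the string."""
    kind, payload = encoded
    if kind == "indexed":
        return KnownUA(payload)
    return None  # Literal values have no registered symbol.
```

A receiver that only cares about semantics could keep the `KnownUA` enum and drop `STATIC_TABLE` entirely, which is the code-size saving described above.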

Mark Nottingham

2018-11-30 00:29:47 UTC

I, for one, welcome our new Client Hint overlords.

Personally, I'd like to see these integrated into the current CH document, rather than as separate drafts. CH still needs some work, so it's not like we're going to get it out the door tomorrow.

However, it seems like Ilya wants to go in a different direction, based upon the notes we received for Bangkok.

Ilya, your thoughts?

--
Mark Nottingham https://www.mnot.net/

Mike West

2018-11-30 08:44:59 UTC

These hints seem pretty clearly separable from the infrastructure upon
which they're built. I'd prefer to split them out into things-in-themselves
that we can point developers towards independently, giving ourselves the
opportunity to explain the rationale and background more coherently than I
think we'll be able to if we bury these in a subsection of the larger
document.

I'll defer to the group as to how y'all would like to handle these, but I'd
prefer several short and focused docs as a reader.

-mike


Yoav Weiss

2018-11-30 10:10:24 UTC

On my list, I want to remove the specific image-related features and move
them to their own specification, with a well-defined browser processing
model.
Anything else that's needed to get CH infra "out the door tomorrow"? :)

Post by Mike West
These hints seem pretty clearly separable from the infrastructure upon
which they're built. I'd prefer to split them out into things-in-themselves
that we can point developers towards independently, giving ourselves the
opportunity to explain the rationale and background more coherently than I
think we'll be able to if we bury these in a subsection of the larger
document.

Similarly, I'd prefer clear distinctions between "CH as infrastructure" and
"Features that use the CH infrastructure".
We've had a lot of confusion and resistance to "CH the infrastructure" due
to some of the features that rely on it, and clearly separating the two
will enable implementations and user-agents to say "I support the CH
infrastructure, and certain features relying on it, but not feature X".

From a procedural perspective, we wouldn't want every added feature to
delay the advancement of "CH as infrastructure".


Ilya Grigorik

2018-11-30 17:00:09 UTC

I agree with Yoav on the direction to separate the infrastructure from the
features.

We started by integrating the two within one spec, but over time realized
that it's hard to cleanly and crisply define many of the concepts without
pulling in and having to replicate a whole lot of existing concepts and
plumbing from HTML, Fetch, and other specs. This problem only gets
harder when we want to spec browser implementation. Hence the reason why we
started pulling out individual hints (e.g. Downlink, RTT, ECT) into the
NetInfo spec, where those concepts and implementations are defined (bonus:
everything is in one place for consistency), and I think it makes sense to
do the same for DPR and the remaining hints in the current spec.

+Mike West <***@google.com> I like your proposal for User-Agent and
Accept-Language. Most of the CH plumbing is already in place (or close to)
in HTML spec, WDYT of defining those hints directly in the HTML spec?


Mike West

2018-12-03 09:48:49 UTC

Post by Ilya Grigorik
Accept-Language. Most of the CH plumbing is already in place (or close to)
in HTML spec, WDYT of defining those hints directly in the HTML spec?

For `User-Agent`: HTML currently relies on Fetch's "default `User-Agent`
value" <https://fetch.spec.whatwg.org/#default-user-agent-value> (for
`navigator.userAgent`
<https://html.spec.whatwg.org/#dom-navigator-useragent>), and Fetch is also
responsible for setting the `User-Agent` value (in step 5.11 of
https://fetch.spec.whatwg.org/#http-network-or-cache-fetch). That seems
like the right direction for the dependency, so I could imagine defining
the entire mechanism in Fetch if this group is willing to point off in that
direction in a future version of RFC7231 that deprecates `User-Agent`.

For `Accept-Language`, HTML defines `navigator.language` and
`navigator.languages` <https://html.spec.whatwg.org/#language-preferences>
with some recommendations about what data to expose, and Fetch sets the
header in step 1.4 of https://fetch.spec.whatwg.org/#fetching. Again, it
seems reasonable for the header work to be exposed in Fetch rather than
pulling it into HTML.

If we agree that these hints are interesting enough to build and ship, then
I'd be perfectly happy going the Fetch route, or doing something
like the drafts I sketched out above. The former approach would require
this group to point elsewhere when deprecating `Accept-Language` and
`User-Agent` in a future iteration of RFC7231, the latter requires some
discussion about which HTTPbis document they'd live in (see Mark's
suggestion to merge into the client hints draft itself). I have a personal
preference for small, focused, and separate documents (and I think Fetch is
already growing some features I'd have preferred to split out), but I also
don't have much context on the client hints discussion here thus far. I'm
happy to defer to y'all.

-mike

Daniel Stenberg

2018-12-03 10:05:06 UTC

Post by Mike West
the entire mechanism in Fetch if this group is willing to point off in that
direction in a future version of RFC7231 that deprecates `User-Agent`.

I would be most curious on *how* we would go ahead and actually deprecate
User-Agent for real on the web. Just saying it in a document like that won't
be enough.

I'm sure that removing that header from requests will break the experience on N%
of the world's web sites, which I doubt browsers would like to impose on their
users. I expect that nobody will remove this header until such an action is
likely to cause only an insignificant amount of user pain, and at the same time
the world's web sites with all their user-agent sniffing logic are unlikely to
actually switch to CH as long as the vast majority of the browsers keep sending
User-Agent headers... The good old "deprecating things on the web" dilemma.

Or are there reasons to believe this time will be different?

--
/ daniel.haxx.se

Mike West

2018-12-03 10:21:00 UTC

Post by Mike West
the entire mechanism in Fetch if this group is willing to point off in
that direction in a future version of RFC7231 that deprecates `User-Agent`.

Post by Daniel Stenberg
I would be most curious on *how* we would go ahead and actually deprecate
User-Agent for real on the web. Just saying it in a document like that won't
be enough.

https://github.com/mikewest/ua-client-hints#a-proposal tries to lay out a
story: in short, we take another stab at Safari's attempt to freeze the UA
string, with the client hint serving as an escape valve for developers who
have legitimate need for the existing string.
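
From a site's point of view, the transition might look something like the Python sketch below: prefer the hint when the client sent it, and otherwise fall back to the frozen `User-Agent` value. The `Sec-CH-UA` header name and the `describe_client` helper are assumptions made for illustration; the explainer linked above is the place to look for the actual proposal.

```python
def describe_client(headers):
    """Pick the best available description of the client.

    headers: mapping of request header names to values.
    Returns a (source, value) tuple.
    """
    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {name.lower(): value for name, value in headers.items()}
    hint = normalized.get("sec-ch-ua")  # assumed header name for the UA hint
    if hint:
        return ("hint", hint)
    frozen = normalized.get("user-agent")
    if frozen:
        return ("frozen-user-agent", frozen)
    return ("unknown", "")
```

Sites that never ask for the hint keep seeing the frozen string and keep working, while sites that need more detail have a path to request it explicitly.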

-mike

Ilya Grigorik

2018-12-05 14:08:07 UTC

Hey folks.

Post by Mark Nottingham
Speaking personally --
I could see splitting out the various task-specific bits into separate
documents (e.g., images).
However, it's not good to ship a framework like this without having it
actually working for a real-world use case, so even if we split out all of
the various CHs into separate docs, I think we'd need to hold the "main"
document until at least one of those is ready.
Just a thought -- replacing User-Agent is probably suitable for that test
case (in addition to the image-focused stuff, or perhaps instead of it). If
we do decide to do that, it might be suitable to put it in the core
document, since that seems like it's pretty central to what's going on here
(the current claims in the document about not replacing UA notwithstanding).

Just to clarify, we do have a shipping implementation of image hints
(Viewport-Width, Width, DPR), as well as network-related hints (ECT, RTT,
Downlink). The latter set is defined in the NetInfo spec [1], and the
proposal is to spec processing definitions for image hints within
Fetch+HTML to resolve current UA implementation and processing gaps; this
would resolve and prevent questions similar to [2], [3]. On that note, the basic
integration for image hints is already in Fetch (see steps 6+7 and related
plumbing in [4]) but instead of linking to the IETF draft for definitions,
the goal is to define those concepts directly against relevant HTML+Fetch
concepts.
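
For readers unfamiliar with the image hints, here is a minimal server-side sketch in Python of how the `Width` hint could drive variant selection. `pick_image_width` is a hypothetical helper; it assumes the headers arrive as a dict keyed by the exact name `Width`, and serving the largest variant when the hint is missing or malformed is a design choice of this sketch, not something the specs mandate.

```python
def pick_image_width(headers, available_widths):
    """Choose which pre-rendered image width to serve.

    headers: mapping of request header names to values.
    available_widths: iterable of widths (in pixels) the server has on disk.
    """
    candidates = sorted(available_widths)
    if not candidates:
        raise ValueError("need at least one candidate width")
    raw = headers.get("Width")
    try:
        wanted = int(raw) if raw is not None else None
    except ValueError:
        wanted = None
    if wanted is None or wanted <= 0:
        # No usable hint: fall back to the largest variant.
        return candidates[-1]
    for width in candidates:
        # Serve the smallest variant that still covers the requested width.
        if width >= wanted:
            return width
    return candidates[-1]
```

A response chosen this way varies by the hint, so a real deployment would also need to reflect that in its caching headers.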

Which is to say, this is editorial shuffling and it doesn't change or
affect the current shipping implementation of either set of hints. I'm excited
by Mike's proposal but I don't think we need to block the framework on that
set of hints: we already have two existing shipping sets active in the wild.

[1] https://wicg.github.io/netinfo/
[2] https://github.com/httpwg/http-extensions/issues/698
[3] https://github.com/httpwg/http-extensions/issues/697
[4] https://fetch.spec.whatwg.org/#fetching

Post by Mark Nottingham
Chair hat on -- what I did notice was that when the update for CH was read
in Bangkok, *many* WG participants expressed surprise at the direction you
were taking it in; most people seemed to think that this document was
almost done in its current form, and there was concern that forming WG
consensus on that was being disregarded. So whatever you do here, please
make sure you get buy-in on the list, and make sure you coordinate with the
chairs. Continuing this discussion and moving towards a common idea of what
the doc(s) should include, when we should ship them, etc. sounds like a
good start.

Yup, fair feedback and will do. The change in course is based on recent
discussions and issues that were raised as we were trying to shepherd the
document through its last stages.

Post by Ilya Grigorik
Accept-Language. Most of the CH plumbing is already in place (or close to)
in HTML spec, WDYT of defining those hints directly in the HTML spec?

+Mike West <***@google.com> yep, defining these in Fetch makes sense to
me. I'll defer to the group on the deprecation.

ig

Mark Nottingham

2018-12-02 03:48:27 UTC

Speaking personally --

I could see splitting out the various task-specific bits into separate documents (e.g., images).

However, it's not good to ship a framework like this without having it actually working for a real-world use case, so even if we split out all of the various CHs into separate docs, I think we'd need to hold the "main" document until at least one of those is ready.

Just a thought -- replacing User-Agent is probably suitable for that test case (in addition to the image-focused stuff, or perhaps instead of it). If we do decide to do that, it might be suitable to put it in the core document, since that seems like it's pretty central to what's going on here (the current claims in the document about not replacing UA notwithstanding).

Chair hat on -- what I did notice was that when the update for CH was read in Bangkok, *many* WG participants expressed surprise at the direction you were taking it in; most people seemed to think that this document was almost done in its current form, and there was concern that forming WG consensus on that was being disregarded. So whatever you do here, please make sure you get buy-in on the list, and make sure you coordinate with the chairs. Continuing this discussion and moving towards a common idea of what the doc(s) should include, when we should ship them, etc. sounds like a good start.

Cheers,


--
Mark Nottingham https://www.mnot.net/