
Author Information

Brian Kardell
  • Developer Advocate at Igalia
  • Original Co-author/Co-signer of The Extensible Web Manifesto
  • Co-Founder/Chair, W3C Extensible Web CG
  • Member, W3C (OpenJS Foundation)
  • Co-author of HitchJS
  • Blogger
  • Art, Science & History Lover
  • Standards Geek
Posted on 03/30/2022

UA gotta be kidding

The UA String... It's a super weird, complex string that browsers send to servers, and is mostly dealt with behind the scenes. How big a deal could it be, really? I mean... It's a string. Well, pull up a chair.

I am increasingly dealing with an ever larger number of things which involve very complex discussions, interrelationships of money, history, new standards, maybe even laws that are ultimately, somehow, about... a string. It's kind of wild to think about.

If you're interested in listening instead, I recently did an Igalia Chats podcast on this topic as well with fellow Igalians Eric Meyer and Alex Dunayev.

To understand any of this, a little background is helpful.

How did it get so complicated?

HTTP's first RFC, RFC 1945, was published in 1996. Section 10.15 defined the User-Agent header as a tokenized string which it said wasn't required, but you should send it. Its intent was for

statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations

Seems reasonable enough, and early browsers did that exactly as expected.

So we got things like NCSA_Mosaic/2.0 (Windows 3.1), and we could count how many of our users were using that (statistical purposes).

But the web was new and there were lots of browsers popping up. Netscape came along, phenomenally well funded and intending to be a "Mosaic killer"; they sent Mozilla/1.0 (Win3.1). Their IPO was the thing that really made the broad public sit up and take notice of the web. It wasn't long before they had largely been declared the winners, impossible to unseat.

However, about this same time, Microsoft licensed the Mosaic source through NCSA's partner (called Spyglass) and created the initial IE in late 1995. It sent Microsoft Internet Explorer/1.0 (Windows 3.1). Interestingly, Apple too got into the race with a browser called Cyberdog released in Feb 1996. It sent a similarly simple string like Cyberdog/2.0 (Macintosh; 68k).

While we say things were taking off fast, it's worth mentioning that most people didn't have access to a computer at all. Among those that did, only a small number of them were really capable systems with graphical UIs. So text-based browsers, like the line mode browser from CERN, which could be used in university systems, for example, really helped expand the people exposed to the bigger idea of the web. It sent a simple string like W3CLineMode/5.4.0 libwww/5.4.0.

So far, so good.

But just then, the interwebs were really starting to hit a tipping point. Netscape quickly became the Chrome of their day (more, really): Super well funded, wanting to be first, and occasionally even just making shit up and shipping it. And, as a result, they had a hella good browser (for the first time). This created a runaway market share.

Oh hai! Are UA Netscape Browser?

Now, if you were a web master in those days, the gaps and bugs between the runaway top browser and the others were kind of frustrating to manage. Netscape was really good in comparison to the rest. It supported frames and lots of interesting things. So, web masters just began creating two websites: a really nice one, with all the bells and whistles, and a much simpler plain one that had all of the content but worked fine even in text-based browsers... Or they just blocked other browsers and told their users to get a real one. And they did this via the UA string.
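To make that concrete, here's a minimal sketch (TypeScript on Node, not any real site's code) of the kind of server-side routing this amounted to: pull the first product token off the User-Agent header and key the experience off of it. The responses and port are made up.

```typescript
import { createServer } from "node:http";

createServer((req, res) => {
  const ua = req.headers["user-agent"] ?? "";
  // RFC 1945 style: the string leads with a "Product/Version" token.
  const product = ua.split("/")[0];

  if (product === "Mozilla") {
    // Netscape: send the fancy version with frames and all the bells and whistles.
    res.end("fancy frameset version");
  } else {
    // Everyone else: the plain version, or a "get a real browser" page.
    res.end("plain version");
  }
}).listen(8080);
```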

Not too long after this became common, many other browsers (like IE and Cyberdog) did implement framesets and started getting a lot better… But it didn't matter.

It didn't matter because people had already placed them in the "less good/doesn't support framesets and other fancy features" column. And they weren't rushing out to change that. Even if they wanted to, we all have other things to do, so it would take a long while before it would be changed everywhere.

If web masters wouldn't change, end-users wouldn't adopt. If users don't adopt, why would your organization even try to fund and compete? Perhaps you can see the chicken-and-egg problem that Microsoft faced at this critical stage...

And so, they lied.

IE began sending Mozilla/1.22 (compatible; MSIE 2.0; Windows 3.1).

Note that in the product token, which was intended to identify the product, they knocked on the door and identified themselves as "Mozilla". Note also that they did identify themselves as MSIE elsewhere in the string.

Why? Well, it's complicated.

For one, they needed to get the content. Secondly, they needed a way to take credit and build on it. Finally - intentionally or not: if you start to win, the tables can turn. Web masters might send good stuff to MSIE and something less to everyone else. So, effectively, they deployed a clever workaround that cheated the particular parsing that was employed at that time (because that's what the spec said it should do) to achieve detection. It was the thing that was in their control.
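A tiny sketch of why that worked, assuming the kind of naive first-token check from before (the strings are the real ones quoted above; the check itself is hypothetical):

```typescript
const netscape = "Mozilla/1.0 (Win3.1)";
const ie = "Mozilla/1.22 (compatible; MSIE 2.0; Windows 3.1)";

// The check webmasters wrote for Netscape...
const looksLikeNetscape = (ua: string) => ua.split("/")[0] === "Mozilla";

console.log(looksLikeNetscape(netscape)); // true
console.log(looksLikeNetscape(ie));       // true - IE now gets the good content
console.log(ie.includes("MSIE"));         // true - yet anyone who cares can still spot IE
```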

Wash, rinse, repeat (and fork)...

So, basically, this just keeps happening. Every time a browser comes along, it's this problem all over again. We have to figure out a new lie that will fall through all of the right cracks in how people are currently parsing/using the UA strings. And we've got all the same pressures.

By the time you get to the release of Chrome 1.0 in 2008 it is sending something like Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.39 Safari/525.19.

Yikes. What is that Frankenstein thing?
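One way to see how we got there: that string carries a token from nearly every ancestor and competitor, so a naive substring check for any of them "succeeds". A quick illustration (the string is the real one above; the checks are hypothetical):

```typescript
const ua =
  "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 " +
  "(KHTML, like Gecko) Chrome/1.0.154.39 Safari/525.19";

// Chrome 1.0 "is" all of these at once, as far as substring sniffing is concerned.
for (const token of ["Mozilla", "AppleWebKit", "KHTML", "Gecko", "Safari", "Chrome"]) {
  console.log(token, ua.includes(token)); // every line logs true
}
```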

But wait! There's more!

As flawed and weird as that is, it's just the beginning of the problem, because, as I say, this string is useful in ways that are sometimes at odds. Perhaps unintentionally, we've also created a system of adversarial advances.

Knowing stuff about the browser does let you do useful things. But the decisioning powers available to you are mostly debatable, weird, and incomplete: you are reasoning about a thing stuck in time, which can become problematic. And so, on the other end, we have to cheat.

That doesn't prevent people from wanting to know the answers to those questions or to do ever more seemingly useful things. "Useful things" can mean even something as simple as product planning and testing, as I say, even for browsers.

This goes wrong so many ways. For example, until not long ago, everything in the world counted Samsung Internet as "Chrome". However, that's not great for Samsung, and it's not necessarily great for all websites either. It is very much not Chrome; it is Chromium-based. Its support matrix and qualities are not the same, in ways that do sometimes matter, at least in the moment. The follow-on effects and ripples of that are just huge - from web masters routing content, to sites making project choices, to which polyfills to send, or even whether users have the inkling to want to try it - all of this is based on our perceptions of those stats.

But, it turns out that if you actually count them right - wow, yes - Samsung Internet is the third most popular mobile browser worldwide, and by a good margin too! And a lot of stuff that should have treated it as totally capable and let it in the door all along should have done exactly that, serving a good experience with the right polyfills too.
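The counting problem is mostly about the order of checks. Here's a hedged sketch - the UA string below is only illustrative of the general shape of a Samsung Internet string, and the detection logic is hypothetical, not any real analytics product's code:

```typescript
const ua =
  "Mozilla/5.0 (Linux; Android 12; SM-G991B) AppleWebKit/537.36 " +
  "(KHTML, like Gecko) SamsungBrowser/19.0 Chrome/102.0.0.0 Mobile Safari/537.36";

// Check for "Chrome" first and Samsung Internet disappears from your stats...
const naive = ua.includes("Chrome") ? "Chrome" : "Other";

// ...check the more specific token first and it shows up as itself.
const better = ua.includes("SamsungBrowser")
  ? "Samsung Internet"
  : ua.includes("Chrome")
    ? "Chrome"
    : "Other";

console.log(naive, better); // "Chrome" "Samsung Internet"
```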

Even trying to keep track of all of this is gnarly, so we've built up whole industries to collect data, make sense of it, and allow people to do "useful stuff" in ways that shield them from all of that. For example, if you use a popular CMS with features that let you say "if it's an iPad", or that even just summarize your stats in far more understandable ways like that, it's probably consulting one of these massive databases - things like whatismybrowser.com, which claims to have information about over 150 million unique UA strings in the wild.

Almost always, these systems involve mapping the UA string (including its lies) to "the facts, as we know them". These are used, often, not just for routing whole pages, but to deliver workarounds for specific devices, for example.
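In spirit, those systems boil down to something like the sketch below - a hypothetical shape, not any vendor's actual schema: key on the raw string, lies and all, and hang "the facts" and known workarounds off of it.

```typescript
interface UAFacts {
  browser: string;
  engine: string;
  deviceClass: string;
  quirks: string[]; // known device/browser-specific workarounds to apply
}

// Real databases hold many millions of entries; one is enough to show the shape.
const knownStrings = new Map<string, UAFacts>([
  [
    "Mozilla/5.0 (Linux; Android 12; SM-G991B) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/19.0 Chrome/102.0.0.0 Mobile Safari/537.36",
    { browser: "Samsung Internet", engine: "Blink", deviceClass: "phone", quirks: [] },
  ],
]);

const lookup = (rawUA: string): UAFacts | undefined => knownStrings.get(rawUA);
```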

God of the UA Gaps

As you can imagine, it's just gotten harder and harder to slide through all the right holes. So now we have kind of a new problem...

What happens when you have a lie that works for 95% of sites, but fails on, say, a few Alexa top 1k sites, or important properties you or your partners own?

Well, you lie differently to those ones.

That's right, there are many levels of lies. Your browser will send different UA strings to some domains, straight up spoofing another browser entirely.

Why? Because it has to. It's basically impossible to slip through all the cracks, and that's the only way to make things work for users that's in the browser's control.

What if the lie isn't enough? Well, you special case another kind of lie. Maybe you force that domain into quirks mode. You have to, because while the problem is on the site, that doesn't matter to regular users - they'll blame your "crappy browser". Worse still, if you're unlucky enough to be a newbie working on a brand new site in that domain, surprise! It doesn't work like almost anything else for some reason you can't explain! So, you try to find a way, another kind of workaround... and on and on it goes.
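If you squint, the pile of special cases inside a browser ends up looking something like this sketch - the domains and overrides here are invented for illustration; the real lists live in each engine's source:

```typescript
interface SiteHack {
  spoofUA?: string;      // send a different UA string to this origin
  forceQuirks?: boolean; // render this origin in quirks mode regardless of its markup
}

// Entirely made-up entries, just to show the shape of the thing.
const siteSpecificHacks = new Map<string, SiteHack>([
  ["legacy-bank.example", { spoofUA: "Mozilla/5.0 (compatible; pretend-to-be-the-big-browser)" }],
  ["old-portal.example", { forceQuirks: true }],
]);

function uaForRequest(origin: string, defaultUA: string): string {
  return siteSpecificHacks.get(origin)?.spoofUA ?? defaultUA;
}
```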

Privacy side effects

Of course, a side effect of all of this is that ultimately all of those simple variants in the UA and work that goes into those giant databases mean that we could know an awful lot about you, by default. So that's not great.

WebKit led on privacy by getting rid of most third-party cookies way, way back. Mozilla followed. Now Chrome is the only one that still allows them, and they're trying to figure out how to follow too.

But, back in 2017, WebKit also froze the UA string. And, since then, we've been working to sort out a path that strikes all of the right balances. We do an experiment, and something breaks. We talk about doing another experiment and some people get very cross. There are, after all, businesses built on the status quo.

Lots of what's happening in standards (and in Chromium) surrounds trying to wrestle all of this into a manageable place. Efforts like UA reduction or Client Hints and many others are trying to find a way.
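To give a flavor of the Client Hints side of that: by default, Chromium-based browsers send a few low-entropy hints as headers, and a server (or script) has to explicitly ask for the higher-entropy detail. A rough sketch, assuming a Chromium browser (Safari and Firefox don't ship this today):

```typescript
// Low-entropy hints sent by default on requests (header names are real):
//   Sec-CH-UA: "Chromium";v="100", "Google Chrome";v="100", ...
//   Sec-CH-UA-Mobile: ?0
//   Sec-CH-UA-Platform: "Windows"
//
// Higher-entropy detail must be requested, e.g. with a response header:
//   Accept-CH: Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List

// The equivalent in-page API; it isn't in every TypeScript DOM lib yet,
// so it's accessed loosely here.
const uaData = (navigator as any).userAgentData;

uaData
  ?.getHighEntropyValues(["platformVersion", "fullVersionList"])
  .then((values: unknown) => console.log(values));
```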

Obviously, it isn't easy.

Y2UA

Because of all of this complexity, there's even some worry that as browser versions hit triple digits (which once seemed like it would take generations), some things could get tripped up in important ways.
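Two of the classic failure modes people worried about, as a hypothetical sketch (the UA fragments are trimmed for readability):

```typescript
// 1. A parser that baked in the assumption that the major version is two digits:
const major = (ua: string) => ua.match(/Chrome\/(\d\d)/)?.[1];
console.log(major("Chrome/99.0.4844.51"));  // "99"
console.log(major("Chrome/100.0.4896.60")); // "10" - oops

// 2. Comparing version "numbers" as strings:
console.log("100" >= "99"); // false - and suddenly a "supported browser" check fails
```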

There are several articles which discuss the various plans to deal with that - and, how this might involve (we hope, temporarily) some more lies.

Virtual (Reality) Lies

An interesting part of this is that occasionally we spawn a whole new paradigm - like mobile devices, dual screen, foldables - or XR.

The XR space is really dominated by new standalone devices that run Android and have a default Chromium browser with, realistically, no competition. Like, none. Not just no engine choice - no actively developed browser choice at all. This is always the case in new paradigms, it seems, until it isn't.

As you might know, Igalia is changing that with our new Wolvic browser. Unfortunately, a lot of really interesting things fall into this same old trap - the "enter VR" button is only presented if the browser is the one that was previously the only real choice, and everything else is treated as mobile or desktop. I'm not sure if it is the sites themselves, or a service or library reasoning about it that way, but that's what happens.

So guess what? We selectively have to lie.

It's hard to overstate just how complex and intertwined this all is and what astounding amounts of money across the industry have been spent adversarially on ... a string.