Security patterns and anti-patterns in embedded development
When it comes to security, telling developers to do (or not do) something can be ineffective. Helping them understand the why behind instructions, by illustrating good and bad practices using stories, can be much more effective. With several such stories Marta Rybczyńska fashioned an interesting talk about patterns and anti-patterns in embedded Linux security at the Embedded Open Source Summit (EOSS), co-located with Open Source Summit North America (OSSNA), on April 16 in Seattle, Washington.
Rybczyńska started the talk by discussing her relevant experience as a security consultant, as well as her background as a developer by training. (Though not mentioned, she is a frequent guest author for LWN as well.) She then moved on to her picks of recent, high-profile examples, including bricked trains, HTTP/2 protocol-implementation issues, leaked signing keys, and the XZ backdoor.
The little engines that couldn't
The first story touched on embedded security practices for devices where security and safety are of the utmost priority: trains. A Polish railway operator, the Lower Silesian Railway (LSR), encountered problems with trains purchased from Newag. LSR sent the trains out to be serviced by Serwis Pojazdów Szynowych (SPS), a competing train maintenance provider, rather than Newag. SPS found that the trains would no longer start, and brought in a third party to investigate the software used in the trains. (A presentation on the investigation was given at the 37th Chaos Communication Congress in 2023.)
What they found, Rybczyńska said, was that the trains were intentionally bricked under certain conditions: for example, when a train had been stopped for a long period of time, or when its GPS coordinates matched those of SPS repair facilities. The trains were also found to lock up on specific dates (possibly meant to force maintenance), and the researchers discovered "cheat codes" that could unlock the trains.
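None of the firmware is public, but it is easy to picture the sort of check the researchers described. The sketch below is purely illustrative; the coordinates, thresholds, and names are all invented and are not taken from the actual trains.

```python
# Purely hypothetical illustration of the lockout conditions described by
# the researchers; coordinates, thresholds, and names are invented.
from datetime import date, timedelta

# Invented bounding box standing in for a competitor's repair facility
BLOCKED_AREAS = [
    {"lat": (51.00, 51.10), "lon": (16.90, 17.00)},
]
MAX_IDLE = timedelta(days=10)   # invented "stopped for too long" threshold

def should_refuse_to_start(lat, lon, last_run, today=None):
    """Anti-pattern: refuse service based on location or idle time."""
    today = today or date.today()
    for area in BLOCKED_AREAS:
        lat_lo, lat_hi = area["lat"]
        lon_lo, lon_hi = area["lon"]
        if lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi:
            return True   # parked at a "forbidden" facility
    if today - last_run > MAX_IDLE:
        return True       # stopped long enough to suggest third-party repair
    return False
```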
Rybczyńska identified several anti-patterns in this story. In addition to the apparent ransomware built into the firmware, the trains also suffered from more general flaws. She mentioned that nearly every train had a different firmware version, with no indication of version control. "So we can have some doubts about the quality of the development process, right?" One might, she said, also have doubts about the certification process that failed to detect any of these problems before certifying the trains for use in public transit. "I don't have access to the certification documents, so I'm not able to say what they're checking, but that's an interesting part."
Finally, Rybczyńska questioned the ethics of the developers for including functionality that would prevent third-party repairs and parts, or allow disabling a train according to its location. "Especially for the GPS conditions, because that is pretty obvious what it is going to do."
HTTP/2 implementations
Having ridden the train story to its conclusion, she then switched tracks to HTTP/2 implementations in embedded systems. She looked at CVE-2023-44487, an HTTP/2 rapid-reset flaw that impacted NGINX, nghttp2, Apache Tomcat, Apache Traffic Server, and others. In this attack, a client sends multiple requests and then cancels them in rapid succession, causing the server to do extra work processing the requests. This can lead to a denial of service, as the server becomes unable to process new incoming requests.
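The mitigation most of the affected servers adopted amounts to rate-limiting stream cancellations. A minimal, hypothetical sketch of such a guard might look like the following; the threshold, window, and structure are invented for illustration and are not taken from any of the servers named above.

```python
# Hypothetical rapid-reset guard: connections that cancel too many streams
# in a short window are treated as abusive. Threshold and window invented.
import time

class ResetGuard:
    def __init__(self, max_resets=100, window=1.0):
        self.max_resets = max_resets   # cancellations tolerated per window
        self.window = window           # seconds
        self.resets = {}               # connection id -> recent reset times

    def on_rst_stream(self, conn_id):
        """Return True if the connection should be closed as abusive."""
        now = time.monotonic()
        recent = [t for t in self.resets.get(conn_id, [])
                  if now - t < self.window]
        recent.append(now)
        self.resets[conn_id] = recent
        return len(recent) > self.max_resets
```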
Part of the problem, said Rybczyńska, was a weakness in the HTTP/2 protocol itself. That does not excuse the vulnerabilities, however. She said developers were responsible for not just implementing standards, but anticipating what might happen. "The protocol is not protecting you from everything." (LWN has recently covered continuation-flood attacks on HTTP/2 that might have been prevented with better implementations of the protocol.)
She also asserted that web servers written for embedded systems were "way less affected than the other ones" because they are subject to more stringent resource allocations. Her thesis was that software written for resource-constrained systems, such as embedded systems, would be less likely to be vulnerable to some attacks. As an example of this, she cited lighttpd, a web server designed for low-resource usage compared to other popular web servers. Lighttpd is not considered vulnerable to CVE-2023-44487 in its default configuration. What it did differently, she said, was to process HTTP/2 frames in batches and set a limit of eight streams per client, rather than 100 or greater as recommended by the RFC. This meant that an attack that debilitated other web servers merely caused lighttpd to increase its resource usage.
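Lighttpd's actual logic lives in its C HTTP/2 state machine, but the effect of a small concurrent-stream cap is simple to sketch; the names and structure below are invented for illustration.

```python
# Sketch of a per-connection cap on concurrent HTTP/2 streams, in the
# spirit of lighttpd's limit of eight; names and structure are invented.
MAX_CONCURRENT_STREAMS = 8   # the RFC recommends allowing 100 or more

class Http2Connection:
    def __init__(self):
        self.open_streams = set()

    def on_headers(self, stream_id):
        if len(self.open_streams) >= MAX_CONCURRENT_STREAMS:
            return "REFUSED_STREAM"   # cheap refusal instead of extra work
        self.open_streams.add(stream_id)
        return "ACCEPTED"

    def on_end_or_reset(self, stream_id):
        self.open_streams.discard(stream_id)
```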
Watch your keys
Next, she turned to an embarrassing incident for hardware vendor MSI from early 2023. The company was the subject of a ransomware attack and data breach to the tune of 1.5TB of data. The stolen data included source code, firmware, and, perhaps worst of all, image-signing keys for UEFI firmware. Used correctly by a hardware vendor, the signing keys would allow the Intel Boot Guard system to verify firmware before loading it at boot time. In the hands of attackers, they would allow distribution of UEFI bootkits that can bypass secure boot features on MSI devices using 11th- through 13th-generation Intel CPUs.
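Boot Guard's checks happen in hardware and early firmware, but the underlying trust model is ordinary public-key signature verification. The rough sketch below uses Python's cryptography package and an RSA key purely for illustration; it is not MSI's or Intel's actual code, and it shows why possession of the private key is all an attacker needs.

```python
# Illustrative signature check only; real Boot Guard verification is done
# in hardware/microcode. Assumes an RSA key for simplicity.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def firmware_is_trusted(firmware: bytes, signature: bytes,
                        pubkey_pem: bytes) -> bool:
    public_key = serialization.load_pem_public_key(pubkey_pem)
    try:
        public_key.verify(signature, firmware,
                          padding.PKCS1v15(), hashes.SHA256())
        return True
    except InvalidSignature:
        return False

# Anyone holding the leaked private key can produce a signature that makes
# firmware_is_trusted() return True for a malicious image.
```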
Rybczyńska identified several anti-patterns in this story that embedded developers should take pains to avoid. Firstly, she noted that the keys could not have been well-secured if they were caught up in a general data breach. Signing keys, she pointed out, should not be on a machine connected to a company's general network. Ideally, they would be stored on hardware tokens or systems that are air-gapped from the main network to reduce the chance they could be exfiltrated.
Better protection of signing keys could have prevented their exposure, but it's not a guarantee. MSI's other sin, in this case, was that the keys had no revocation mechanism. This means attackers can attempt to exploit any of the affected hardware through the entire life of the systems, with no way for MSI or Intel to revoke the vulnerable keys. The one positive in this story, she said, was that MSI had used a separate key for each product rather than a single signing key for all of its products.
XZ, of course
The XZ backdoor episode was a dominant topic at EOSS and OSSNA. If things had gone a bit differently, Rybczyńska said, the backdoor might have been caught by the Yocto project, because XZ versions 5.6.0 and 5.6.1 broke its builds. It went unnoticed only because Yocto's build-system maintainer did not have time to investigate why the builds were failing before the backdoor was discovered elsewhere.
The reason, or one reason, that the compromised versions of XZ wouldn't build is that Yocto does not use the build scripts provided with the source tarball, in part because Yocto targets a broader set of compilers and architectures than mainstream Linux distributions. She surveyed the room and asked how many people really understood Autoconf's m4 scripts. In a room with about 100 attendees, few hands went up. "That's the issue," she said, "hide your backdoor in m4 scripts." Developers, she said, should be using build-system languages that aren't obscure and difficult to read.
She also called out, like many others, that developers need to consider their dependencies. She suggested that having a dependency on a project with a sole maintainer who is underfunded and overworked is something to be wary of. "It is important to consider your dependencies. Those projects maintained by the person in Nebraska, are you really sure you want to use them?"
Rybczyńska wrapped up by summing up some of the lessons learned from the stories in her talk, and reminded the audience that security practices evolve from real-world situations. "Security practices are there for a reason [...] if there's a security practice that is making your life harder, ask the security person why" it exists and see if there's another way to mitigate risk. Odds are, there's a story behind the practice.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to this event.]
Index entries for this article:
Conference: Embedded Open Source Summit/2024
Conference: Open Source Summit North America/2024
Posted Apr 30, 2024 16:15 UTC (Tue)
by rweikusat2 (subscriber, #117920)
[Link] (11 responses)
How many people know that meta/classes/devtool-source.bbclass¹ exists and how many of these understand it?
¹ From the dated Yocto version I've worked with last.
Posted Apr 30, 2024 16:59 UTC (Tue)
by dskoll (subscriber, #1630)
[Link]
This. I haven't used Yocto directly, but I did use Xilinx's PetaLinux, which is built on top of Yocto, and it was a complete and utter mess. Over-engineered, under-documented, and requiring Internet searches to do just about anything.
Posted Apr 30, 2024 17:50 UTC (Tue)
by Paf (subscriber, #91811)
[Link] (4 responses)
Posted Apr 30, 2024 18:03 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (2 responses)
I think the bigger thing here is that Yocto doesn't take upstream's build system; it rewrites it, instead. As a result, backdoors in the build system don't get run, and there's a human tracing through how the build system works in order to redo the important bits in Yocto's system. Because the build system is "executed" by a human, a backdoor that depends on the build system will get caught in that execution.
Posted Apr 30, 2024 19:18 UTC (Tue)
by dezgeg (subscriber, #92243)
[Link]
Posted Apr 30, 2024 21:16 UTC (Tue)
by rweikusat2 (subscriber, #117920)
[Link]
Posted Apr 30, 2024 21:27 UTC (Tue)
by rweikusat2 (subscriber, #117920)
[Link]
Posted Apr 30, 2024 18:23 UTC (Tue)
by atai (subscriber, #10977)
[Link] (4 responses)
Posted Apr 30, 2024 22:27 UTC (Tue)
by willy (subscriber, #9762)
[Link] (3 responses)
When it's something I don't know like m4, cmake or Bazel, I'm screwed. And honestly when it's something like Bazel which appears to exist Because Google Is Better At Everything Than You Are, I am disinclined to learn.
Posted May 1, 2024 11:47 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (1 responses)
Funnily enough, that last step often is looking for the precise flavor of the pthread[s] library, which habitually "fails" because the check for the 'wrong' variant prints an error message. A web search for pthreads "breaking" your cmake script yields a heap of confused examples.
Makefiles aren't exactly anti-pattern-free either. You can make them arbitrarily complex, if not NP-complete. Look at the Linux kernel's build system if you need an example.
Posted May 3, 2024 1:06 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted May 3, 2024 9:40 UTC (Fri)
by mss (subscriber, #138799)
[Link]
Posted Apr 30, 2024 19:40 UTC (Tue)
by epa (subscriber, #39769)
[Link] (8 responses)
There are forces pulling in two opposing directions. On the one hand you want to build from pristine sources, straight out of git, and not from the traditional release tarball. That argues for doing the full autoconf setup each time, rather than relying on a configure script that the maintainer has generated. On the other hand, you want the build to have as few steps as possible and to be understandable in its entirety without having to know m4 and all that.
Just possibly the answer could be to take a step further away from the original sources. Nowadays there is less variety among Unix-like systems, or at least those that 99% of developers care about. Your contributors are not using a mixture of Irix, AIX and old SunOS versions. People using those obscure systems might still run the configure script, but for everyone else why not ship a pre-generated makefile that assumes sensible defaults for a GNU/Linux system? Simple customizations like install root could be set via environment variables. Then Linux distributions could pull the sources, delete the autoconf/ subdirectory just to make sure it’s not used, and build the rest in a predictable way.
Is that at all feasible? Or is it like Microsoft Office, where we agree that only 10% of Autoconf’s functionality is needed, but nobody can agree which 10%?
Posted Apr 30, 2024 20:20 UTC (Tue)
by Paf (subscriber, #91811)
[Link]
Shell-independent shell scripting, babe-y. It's a thing of beauty. Or at least ... a thing.
Posted Apr 30, 2024 20:23 UTC (Tue)
by Paf (subscriber, #91811)
[Link] (3 responses)
The problem - or so it seems to me - is that configure is the main way builds find and report missing dependencies. So I'm building X and the build requires totally-reasonable-but-not-universal library Y (or god forbid it requires obscure library Z). Those dependencies are generally expressed via the configure script. Seems like a non-starter.
Posted Apr 30, 2024 20:50 UTC (Tue)
by epa (subscriber, #39769)
[Link] (2 responses)
There will be exceptions, but generally I think you could define a default build that requires most of the dependencies without having to sniff whether they are available. If that’s not flexible enough for everyone, some will stay using the configure script.
Posted May 1, 2024 7:16 UTC (Wed)
by epa (subscriber, #39769)
[Link] (1 responses)
Autodetection found that libfoo is present on this system. This configure script was invoked with --strict. The optional dependency 'libfoo' has not been specified. You must pass either --with-libfoo or --without-libfoo.
Then build systems for Linux distributions would tend to use the --strict flag and nail down exactly what dependencies they want, leaving nothing to autodetection. That would have stopped the xz attack where the configure script stopped including the Landlock dependency and nobody noticed.
Posted May 3, 2024 1:22 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted May 1, 2024 15:35 UTC (Wed)
by mb (subscriber, #50428)
[Link] (2 responses)
Today there is pretty much *no* good reason for 95% of the autotools mess.
Almost all projects don't care whether the build system works on obscure operating systems of the past. Yet most code and checks in autotools are there because of some obsolete and obscure operating system that it used to support (or claims to still support). Your app won't work on these operating systems *anyway* if you have never actually tested that.
Seriously, if you are still using autotools, you need to migrate away from that mess.
Having autotools in a project is an anti-quality indicator.
Posted May 1, 2024 16:07 UTC (Wed)
by paulj (subscriber, #341)
[Link] (1 responses)
- If a project's build system is not generally a series of declarative statements, with no more than a modicum of small, confined, and clear ad-hoc logic (for whatever transforms or tests are needed), then that is an anti-pattern.
It is quite possible to have a small, clean, fast, declarative build system in auto*, and it's possible to make a mess in pretty much any build-system tool. The anti-pattern is the mess. The anti-pattern is the willingness of the people who made the build system to engage in dirty hacks, rather than read the documentation of the tool and do it properly (whether using the features of the tool properly or extending it cleanly).
Basically: You can evaluate a project simply on the amount of ad-hoc logic they've stuffed into their build system, and the actual build system doesn't really matter that much to this.
Posted May 6, 2024 10:26 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link]
Posted May 6, 2024 10:28 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link]
I fear that in practice, they are kept in a vault so that they can be used automatically, and are vulnerable to being exfiltrated by anyone who can place an "echo $PRIVKEY" in the appropriate place.
meson seems to be this decade's build system of choice for OSS projects.
cmake was more of a year-2010 thing.
>test, and Makefile generation is indeed obscure to all but the most advanced wizards.
>There are good reasons for that—it has a messy job to do.
Some projects have started to migrate away from it like 20 years ago.