A privilege escalation via SCSI pass-through
One of the important attributes of virtualization is complete isolation between virtual machines, so that attackers (or bugs) in one VM cannot interfere with the other VMs. But, as a recent bug report shows, the kernel is vulnerable, in some configurations, to VMs that can read and write the disks of other VMs. That's clearly a serious security problem, but the discussion about patches to fix the bug makes it clear that it may take some time before the fix can be applied.
The problem occurs when programs issue the SCSI pass-through SG_IO ioctl() to a particular disk partition (e.g. /dev/sdb2) or LVM volume, which causes the SCSI command to be sent to the underlying block device (/dev/sdb). The actual commands that can be sent to the device via SG_IO are filtered for processes that don't have the CAP_SYS_RAWIO capability, but there are still dangerous things that can be done. In particular, if a process can write to the partition, it can write to the underlying device without being restricted to the boundaries of that partition.
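To make the boundary problem concrete, here is a minimal sketch in Python that models (rather than performs) the two paths. An ordinary read on a partition has its offset remapped and bounds-checked by the block layer, while a SCSI READ(10) command descriptor block (CDB) carries a logical block address that the device itself interprets. The `Partition` class, its field names, and the sector numbers are hypothetical and chosen only for illustration; the READ(10) CDB layout is the standard one.

```python
import struct
from dataclasses import dataclass

READ_10 = 0x28  # SCSI READ(10) opcode


@dataclass
class Partition:
    """Hypothetical model of a partition such as /dev/sdb2."""
    start: int   # first sector of the partition on the whole disk
    length: int  # number of sectors in the partition

    def remap(self, sector: int, count: int) -> int:
        """Model of the normal block-layer path: offsets are relative to
        the partition and are rejected if they run past its end."""
        if sector < 0 or count < 0 or sector + count > self.length:
            raise ValueError("access beyond end of partition")
        return self.start + sector


def read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a 10-byte READ(10) CDB. The LBA is absolute: the device
    knows nothing about partitions, so no offset or limit is applied."""
    # opcode, flags, 32-bit LBA, group number, 16-bit length, control
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, blocks, 0)
```

With a hypothetical partition starting at sector 1,000,000, `Partition(1_000_000, 500_000).remap(10, 8)` yields device sector 1,000,010, and any request running past the end of the partition is refused. By contrast, `read10_cdb(0, 8)` names sector 0 of the whole disk, regardless of which partition node the ioctl() was issued on.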
For virtualization configurations that mingle partitions or volumes used by different VMs on the same block device, that means that a VM can access—and change—the data on another VM's disk. Worse still, if the host OS stores its own data on that block device, a rogue VM could potentially compromise the host. Exploiting the vulnerability does not require a virtualization (or containerization) scenario, but those are the most likely ways that it could come about. Any process that can open the partition device node will be able to issue the ioctl(), but, on "standard" Linux systems, that ability is typically restricted to root.
Based on the bug report, Paolo Bonzini found the problem back in November 2011, but security problems with SG_IO were known as far back as August 2004. Bonzini posted patches to fix the problem at the end of December (though it would appear that the issue was under discussion on the closed kernel security mailing list in the interim). The proposed fix would disallow most SCSI commands on partition-like devices. So, doing any of the "dangerous" SCSI commands would fail unless the ioctl() is being called on the underlying block device.
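The shape of the proposed check can be sketched as a small Python model. This is not the kernel code: the opcode sets below are hypothetical stand-ins for the real filter tables, the function name is invented, and the assumption that a CAP_SYS_RAWIO holder bypasses both checks is not something the discussion above states.

```python
# Illustrative SCSI opcodes; the kernel's real filter tables are longer.
TEST_UNIT_READY = 0x00
FORMAT_UNIT = 0x04
INQUIRY = 0x12
READ_10 = 0x28
WRITE_10 = 0x2A

# Hypothetical: commands an unprivileged process may send to a whole disk.
UNPRIVILEGED_OK = frozenset({TEST_UNIT_READY, INQUIRY, READ_10, WRITE_10})
# Hypothetical: the few commands still accepted on a partition node.
PARTITION_OK = frozenset({TEST_UNIT_READY, INQUIRY})


def sg_io_allowed(opcode: int, on_partition: bool, has_rawio: bool) -> bool:
    """Decide whether a pass-through command would be accepted."""
    if has_rawio:
        # Assumption: privileged callers are exempt from filtering.
        return True
    if opcode not in UNPRIVILEGED_OK:
        # Existing behavior: unprivileged callers get a filtered set.
        return False
    if on_partition:
        # Proposed behavior: most commands fail on partition-like devices.
        return opcode in PARTITION_OK
    return True
```

Under these assumptions, `sg_io_allowed(WRITE_10, on_partition=False, has_rawio=False)` is true while the same call with `on_partition=True` is false, which is the behavior change that closes the hole and also the one that risks breaking existing users.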
The patches sparked a few comments from Linus Torvalds, mostly regarding error return codes (partly because ENOTTY is badly named for its use as an indication of "no such ioctl"). But, beyond that, he started to wonder whether there might be situations where users do issue SCSI commands to partitions and expect them to be passed down to the block device. It turns out that there is at least one place where it may be a common event: "ejecting" USB sticks and other removable media. Torvalds notes:
And that's the *natural* way to eject a mounted device. Look at the USB memory sticks you have. They are almost all partitioned to have one partition, and that one partition doesn't cover the whole device. And it's that one partition you use to interact with it - it's what you mount, and what you eject.
According to Bonzini, the fact that CDROMEJECT fails on a kernel with his proposed fix doesn't cause any problems in practice. But Torvalds's concern goes beyond that one particular example. The fix has been suggested for merging late in the 3.2 development cycle and his concern was the level of testing that it has been subjected to: "I absolutely do not get the feeling that this has been tested so much and is so obvious that there is no risk of breakage." Based on the discussion, the testing seems to have been focused on ensuring that the security hole was closed, without considering the other impacts that a fairly sweeping change might have.
Torvalds would certainly like to see the vulnerability fixed, but not at the expense of a regression in what users have come to depend on. As he pointed out: "Suddenly totally changing things and saying 'you can't do that on a partition' when clearly people *have* been doing that on partitions isn't something we can do without serious testing." His plan is to wait for the 3.3 merge window to bring in the fix, which should allow some testing time for distributions and others to ensure that the code doesn't have any unintended consequences.
While it is important to fix security holes, it is equally important to keep everything else working, which is the bulk of Torvalds's concern. While the 3.3 development cycle may still not be long enough to shake out all of the places where the SCSI pass-through is used on partial disks (partitions or logical volumes), it certainly will provide more of a chance to do so than would a merge in the final stages of 3.2 development. In the meantime, now that the bug and fix are out in the open, concerned administrators can apply the patch or take other steps to remedy the problem.
Posted Jan 5, 2012 21:51 UTC (Thu)
by dougg (guest, #1894)
[Link] (4 responses)
Anyone thinking about command filtering should consider the SCSI command set (a moving target), the SAT standard and the fact that protocols other than SCSI use the SG_IO ioctl (e.g. SMP).
P.S. One would think Paolo Bonzini might bring up the subject on the linux-scsi list.
Posted Jan 6, 2012 10:01 UTC (Fri)
by drag (guest, #31333)
[Link]
From what I've read...
No. From a file, yes. From a partition: No.
Any block device. It does not have to do with iSCSI or SCSI or anything like that in particular. It's any block device on a storage device that uses the SCSI subsystem, which is going to be most things. That means whole disks, partitions, and logical volumes on most storage devices (such as SATA drives) are vulnerable.
On my KVM virtual machines I use LVM because of the performance advantage of using block devices directly rather than through file-backed storage.
This bug is a bit disheartening.
Posted Jan 6, 2012 10:02 UTC (Fri)
by lacos (guest, #70616)
[Link] (1 responses)
SCSI targets accessible from within a VM would themselves be virtual; for example with storage backed from a file (or partition) on the host machine. That's about the default: virtual disks. However, please look at the title: "SCSI pass-through". The idea is to let the guest use the host's resource directly, with its own driver (strictly restricted to boundaries configured in the host). What's passed through is a partition, not a complete disk. So the configuration is correct, the partition is basically dedicated to the guest. But the boundaries (i.e. partition, not full drive) are not properly enforced by the host. Just my two cents.
Posted Jan 7, 2012 22:02 UTC (Sat)
by giraffedata (guest, #1954)
[Link]
The pass-through that the title refers to is passing through the block layer, so as to access the underlying SCSI storage device instead of the block device. In a virtual machine, the underlying SCSI storage device is a virtual SCSI device which itself uses an underlying real SCSI device as a resource. The issuer of a pass-through ioctl isn't supposed to have any concept of a VM host.
The kind of pass-through you're talking about is also a reasonable concept, but the way you would implement it is by defining a pass-through SCSI command class (analogous to Write or Request Sense or Eject) and having the virtual SCSI device implement it. The Passthrough CDB would include a CDB to be passed through.
It does not make any sense for an "eject" command specifying a virtual device to cause a real flash drive to eject, but there could be a "hosteject" command that ejects the underlying real flash drive. It would use a SCSI passthrough ioctl that specifies a CDB that specifies a Passthrough SCSI command that specifies an Eject command.
Leaving out the whole virtual machine scenario, it's probably just as reasonable to do SCSI pass-through to a partition block device as to a whole-device block device. In both cases, the user is insinuating himself into Linux internals -- the fact that Linux uses a SCSI device in some way to implement the block device.
Posted Jan 9, 2012 11:15 UTC (Mon)
by pbonzini (subscriber, #60935)
[Link]
This suggests that MAINTAINERS needs some care in this area.
In any case, the patches were so intrusive that they could only go in directly through Linus.
Posted Jan 10, 2012 17:47 UTC (Tue)
by rwmj (subscriber, #5474)
[Link] (1 responses)
Posted Jan 11, 2012 11:22 UTC (Wed)
by sorpigal (guest, #36106)
[Link]
Linus is wrong about this