
Twisted in an asyncio world

By Jake Edge
June 22, 2016
PyCon 2016

At PyCon 2016, Amber Brown gave a presentation on the advent of the asyncio module for handling asynchronous I/O in Python 3 and what that means for the Twisted event-driven networking framework. Some think that asyncio "kills" Twisted, but that's not how she sees things. Brown is a core Twisted developer and the release manager for the project. Over the last year or so, she has ported 40,000 lines of Twisted code to Python 3. She has also ported Autobahn|Python and Crossbar.io to Python 3 as part of her day job working on Crossbar.io.

The inspiration behind the talk came from two places. Russell Keith-Magee once asked her why Twisted was still relevant now that asyncio had been added to Python. In addition, Twisted's lead architect Glyph Lefkowitz wrote on his blog in May 2014 that the "report of Twisted's death was an exaggeration". She believes that she is in a unique position to explain what asyncio means for Twisted and what the future holds, hence the talk.

[Amber Brown]

The basic problem that Twisted addresses is handling multiple concurrent I/O operations, generally network I/O. The way that web frameworks (e.g. Django) typically do that is with multiple "runners" to handle requests. These runners are either threads or processes.

But neither threads nor processes will help with the C10k problem: handling 10,000 concurrent connections. Threads are "hard to get right" and have high overhead. With a 128KB stack per thread, 10,000 connections require about 1.3GB just for the stacks. Beyond that, the Python global interpreter lock (GIL) means there will be no parallelism anyway. Furthermore, "you won't do threads properly"; she suggested posting that statement "above your computer".
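The stack figure is simple multiplication. Here is a quick check, assuming that "128KB" means 128 × 1024 bytes reserved per thread:

```python
# Assumption: 128KB = 128 * 1024 bytes of stack reserved per thread.
STACK_BYTES = 128 * 1024
CONNECTIONS = 10_000      # one thread per connection for C10k

total_bytes = STACK_BYTES * CONNECTIONS
print(total_bytes)                     # 1310720000
print(round(total_bytes / 10**9, 2))   # 1.31 (decimal gigabytes)
print(round(total_bytes / 2**30, 2))   # 1.22 (binary gibibytes)
```

That is where the roughly 1.3GB figure comes from. Measured in binary units, it is about 1.22GiB.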

The only good way to handle that many connections in Python is non-threaded asynchronous I/O. Twisted, dating back to 2001, is one of the earliest Python asynchronous I/O frameworks, while asyncio is much newer. But at the core they are identical, she said.

In general, asynchronous I/O uses select() and its relatives (such as poll(), epoll, and kqueue) to wait on a list of file descriptors, which can be sockets, files, or other event sources, until they become ready for reading or writing. When the call returns, it indicates which of the descriptors are ready. Those calls allow programs to handle "thousands and thousands of concurrent connections", she said.
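The following is a minimal sketch of that readiness check using Python's standard select module. The two socket pairs are illustrative stand-ins for client connections, and only one of them has data waiting:

```python
import select
import socket

# Two connected socket pairs stand in for two client connections.
a_server, a_client = socket.socketpair()
b_server, b_client = socket.socketpair()

b_client.sendall(b"ping")   # only connection "b" has sent anything

# Block (for at most 1 second) until at least one server-side socket is readable.
readable, _, _ = select.select([a_server, b_server], [], [], 1.0)

ready = ["a" if s is a_server else "b" for s in readable]
# recv() will not block here: select() already reported data waiting.
received = [s.recv(1024) for s in readable]
print(ready, received)   # ['b'] [b'ping']

for s in (a_server, a_client, b_server, b_client):
    s.close()
```

Real frameworks prefer the more scalable calls, such as epoll or kqueue, where they are available. Python's selectors module wraps those calls behind a single interface.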

To demonstrate that, she ran a live demo on her Mac laptop. Using Twisted running under PyPy, she ran a client and server that made over 10,000 concurrent connections sending ping messages back and forth. Handling more than 10,000 pings per second on consumer hardware shows what asynchronous I/O can do, she said. That's probably more concurrent connections "than your site needs".

Twisted, asyncio, and others rely on "selector loops" so that they do not block. Data is queued to be sent when the network is ready and reads are only done when it is already known that there is data available to be read. These selector loops, also called "I/O loops" or "reactors", allow a higher density per CPU core, without threads. There is no parallelism, but there is concurrency: "you are still handling one thing at a time, but you are a bit smarter about what one thing you are handling when".
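The "one thing at a time, but smarter about which one" idea can be seen with asyncio. In this sketch, asyncio.sleep() stands in for waiting on a client or database. The handler names and delays are made up for illustration:

```python
import asyncio
import threading

log = []

async def handle(name, delay):
    # asyncio.sleep() stands in for waiting on a slow client or a database.
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(delay)   # hands control back to the event loop
    log.append((name, "done", threading.get_ident()))

async def main():
    # Three "requests" in flight at once, all on a single thread.
    await asyncio.gather(handle("a", 0.15), handle("b", 0.05), handle("c", 0.10))

asyncio.run(main())

order = [(name, event) for name, event, _ in log]
threads = {tid for _, _, tid in log}
print(order)
print(len(threads))   # 1
```

All three handlers start before any of them finishes, and they finish in order of how long they wait. Only one thread is ever used.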

This works well when there is high I/O throughput, when clients (such as mobile phones) have high latency, and when each request needs little CPU processing. Calculating pi to a million digits for each connection is not going to work well, but in most cases the program is waiting for the client or for the database.
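The flip side is that anything CPU-bound stalls every connection at once. In this sketch, the hypothetical cpu_bound() function busy-loops for 0.3 seconds without yielding. A ticker task that normally wakes every 10ms shows how long the loop was frozen:

```python
import asyncio
import time

async def ticker(stamps, stop):
    # Records a timestamp roughly every 10 ms whenever the loop is free.
    while not stop.is_set():
        stamps.append(time.monotonic())
        await asyncio.sleep(0.01)

def cpu_bound(seconds):
    # Hypothetical stand-in for heavy computation: busy-loops without
    # ever yielding control to the event loop.
    end = time.monotonic() + seconds
    count = 0
    while time.monotonic() < end:
        count += 1
    return count

async def main():
    stamps, stop = [], asyncio.Event()
    task = asyncio.create_task(ticker(stamps, stop))
    await asyncio.sleep(0.05)    # ticker runs normally for a while
    cpu_bound(0.3)               # the whole loop is frozen for ~0.3 s
    await asyncio.sleep(0.05)    # ticker resumes
    stop.set()
    await task
    return max(b - a for a, b in zip(stamps, stamps[1:]))

worst_gap = asyncio.run(main())
print(f"longest pause between ticks: {worst_gap:.2f}s")
```

One common way around this is asyncio's loop.run_in_executor(), which hands such work to a thread or process pool.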

Asynchronous I/O frameworks generally provide users with an object that stands in for a pending result. Twisted uses Deferred objects for this, while asyncio uses Future objects. They are similar, though a Deferred runs its callbacks as soon as the result is available, while a Future schedules them to run on the next iteration of the event loop.
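The difference in callback timing can be illustrated with a real asyncio Future and ToyDeferred, a hypothetical, heavily simplified Deferred-like class. ToyDeferred is not Twisted's implementation: it has no errbacks and only handles callbacks added before it fires.

```python
import asyncio

events = []

class ToyDeferred:
    """Hypothetical stand-in for Twisted's Deferred: callbacks run synchronously."""

    def __init__(self):
        self.callbacks = []

    def addCallback(self, fn):
        self.callbacks.append(fn)

    def callback(self, result):
        for fn in self.callbacks:   # run the whole chain immediately
            result = fn(result)

def record(label):
    def cb(result):
        events.append((label, result))
        return result               # pass the result along the chain
    return cb

d = ToyDeferred()
d.addCallback(record("deferred callback"))
d.callback(1)
events.append("after d.callback()")

async def main():
    fut = asyncio.get_running_loop().create_future()
    fut.add_done_callback(lambda f: events.append(("future callback", f.result())))
    fut.set_result(2)               # only *schedules* the callback
    events.append("after fut.set_result()")
    await asyncio.sleep(0)          # let the loop run one iteration

asyncio.run(main())
print(events)
# [('deferred callback', 1), 'after d.callback()',
#  'after fut.set_result()', ('future callback', 2)]
```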

In 2012, the asynchronous I/O situation in Python 3 was "a mess". Twisted was not available for Python 3, but Node.js was exploding in popularity and .NET had recently added async/await for asynchronous I/O support. Python 3 needed a "killer feature", she said. Enter asyncio, which was designed with coroutines in mind. Coroutines in Python are a special type of generator. Python 3.5 added the async and await keywords, giving coroutines their own syntax and letting them await Future objects directly.
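The relationship between generators and coroutines can be sketched in a few lines. A generator pauses at each yield and is resumed with send(), and async/await builds on the same suspend-and-resume idea. The function names are illustrative, and asyncio.sleep() stands in for real network waits:

```python
import asyncio

def adder():
    # A plain generator: each yield pauses it; send() resumes it with a value.
    x = yield "waiting for x"
    y = yield f"got x={x}, waiting for y"
    return x + y                  # delivered via StopIteration.value

g = adder()
first = next(g)                   # run to the first yield
second = g.send(10)               # resume with x = 10
try:
    g.send(5)                     # resume with y = 5; the generator returns
except StopIteration as stop:
    final = stop.value            # 15

async def fetch(n):
    await asyncio.sleep(0.01)     # stand-in for waiting on the network
    return n * 2

async def main():
    # Each await suspends main() without blocking the event loop.
    return await asyncio.gather(fetch(1), fetch(2), fetch(3))

results = asyncio.run(main())
print(first, second, final, results)
```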

The asyncio module will help reduce the library API fragmentation that has occurred over time and will also reduce duplication. Other frameworks, such as Twisted, Tornado, and gevent, will be able to adapt their event loops to fit into the asyncio model, so none of them will have to duplicate what is already available in the language. She quoted extensively from the "Interoperability" section of PEP 3156, which is the basis of asyncio, in her slides [Speaker Deck].

So that leads to a question, she said: "Doesn't asyncio replace Twisted?". They are both cooperative, single-threaded frameworks with primitives to support asynchronous programming. They use the same system calls and their I/O loops are architecturally similar. The asyncio transports and protocols were directly inspired by Twisted. Asyncio comes as a standard feature in Python 3.4 and beyond, so perhaps Twisted is not needed any longer?
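Because asyncio's transports and protocols were modeled on Twisted's, a ping/pong exchange in the spirit of Brown's demo can be sketched with the standard library alone. The PongServer and PingClient classes, the 50-client count, and the use of localhost are illustrative assumptions. The demo in the talk used Twisted on PyPy with over 10,000 connections.

```python
import asyncio

class PongServer(asyncio.Protocol):
    """Replies b"pong" to each b"ping"; one instance per connection."""

    def connection_made(self, transport):
        self.transport = transport            # the transport owns the socket

    def data_received(self, data):
        # Simplification: assumes each ping arrives in one chunk. Real
        # protocols buffer, because TCP may split or merge messages.
        if data == b"ping":
            self.transport.write(b"pong")     # queued until the socket is writable

class PingClient(asyncio.Protocol):
    def __init__(self, reply):
        self.reply = reply                    # Future resolved with the answer

    def connection_made(self, transport):
        transport.write(b"ping")

    def data_received(self, data):
        if not self.reply.done():
            self.reply.set_result(data)

async def ping_once(loop, port):
    reply = loop.create_future()
    transport, _ = await loop.create_connection(
        lambda: PingClient(reply), "127.0.0.1", port)
    try:
        return await reply
    finally:
        transport.close()

async def main(n_clients=50):
    loop = asyncio.get_running_loop()
    # Port 0 asks the OS for any free port.
    server = await loop.create_server(PongServer, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    replies = await asyncio.gather(*(ping_once(loop, port) for _ in range(n_clients)))
    server.close()
    await server.wait_closed()
    return replies

replies = asyncio.run(main())
print(len(replies), set(replies))   # 50 {b'pong'}
```

Note the callback style. The event loop calls the protocol methods when a connection is made or data arrives, much as it calls connectionMade() and dataReceived() in Twisted.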

But Brown begs to differ: "asyncio is an apple, Twisted is a fruit salad". There is a huge amount of code and comments in Twisted, nearly 300,000 lines of code (Python and C with tests), including over 100,000 lines of comments. Asyncio has some 24,000 lines currently. That size difference is not from bloat, she said; there are lots of places where the standard library is deficient in terms of networking protocols and the like, so Twisted has filled in a lot of those gaps. There are many features in Twisted that are not available in asyncio, as well.

Tornado is an asynchronous web server framework that has many similar concepts and constructs to those in Twisted. It has its own I/O loop, though it integrates with either Twisted or asyncio. Ultimately, the project may remove its I/O loop and move to using the asyncio version. Over the years, Tornado has changed to adopt the standard Python mechanisms as they have become available. She wondered if that was a model for Twisted moving forward.

But interoperability turns out to be hard. Asyncio is similar, but not the same, and there is no way to map Twisted directly onto asyncio. Her focus is on getting async and await working with Twisted. await obtains the result of a coroutine without blocking the event loop while waiting for it, which allows asynchronous code to be written in a synchronous style. Since coroutines are a special form of generator, the "trampoline" that lets generators work with Deferreds, which has been in Twisted since 2006, can be used to make that work.
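The trampoline idea can be sketched without Twisted. Pending and trampoline() below are hypothetical, minimal stand-ins with no error handling or cancellation. In Twisted, the production version of this idea is the inlineCallbacks decorator. The generator yields pending results, and the trampoline resumes it with each value, looping rather than recursing when a result is already available:

```python
class Pending:
    """Hypothetical minimal Deferred-like object (no errbacks)."""

    def __init__(self):
        self.callbacks, self.fired, self.result = [], False, None

    def add_callback(self, fn):
        if self.fired:
            fn(self.result)
        else:
            self.callbacks.append(fn)

    def fire(self, result):
        self.fired, self.result = True, result
        callbacks, self.callbacks = self.callbacks, []
        for fn in callbacks:
            fn(result)

def trampoline(gen_fn):
    """Drive a generator that yields Pending objects; return a Pending."""
    def wrapper(*args, **kwargs):
        gen = gen_fn(*args, **kwargs)
        outcome = Pending()

        def step(value):
            while True:
                try:
                    pending = gen.send(value)     # run to the next yield
                except StopIteration as stop:
                    outcome.fire(stop.value)      # generator returned
                    return
                if pending.fired:
                    value = pending.result        # already done: loop, don't recurse
                else:
                    pending.add_callback(step)    # resume when it fires
                    return

        step(None)
        return outcome
    return wrapper

requests = {}

def fake_fetch(key):
    # Hypothetical asynchronous operation, fired by hand below.
    requests[key] = Pending()
    return requests[key]

@trampoline
def get_total():
    a = yield fake_fetch("a")   # reads like a blocking call, but is not
    b = yield fake_fetch("b")
    return a + b

got = []
get_total().add_callback(got.append)
requests["a"].fire(2)   # simulate the first response arriving
requests["b"].fire(3)   # ... and then the second
print(got)              # [5]
```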

Two features coming soon will help with interoperability. The @deferredCoroutine decorator will allow coroutines to be wrapped in a Deferred, so that await can be used on a Deferred. The second is the asyncioreactor, a Twisted reactor built on top of asyncio. The patches for those have not been reviewed yet and require changes to asyncio, so they may still be some way off.
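The general mechanism that lets await work on a non-asyncio object is the __await__ method. The following is a hypothetical sketch that bridges a Deferred-like object to an asyncio Future. AwaitablePending is made up for illustration and is not how Twisted implements @deferredCoroutine or its asyncio support:

```python
import asyncio

class AwaitablePending:
    """Hypothetical Deferred-like object that asyncio coroutines can await."""

    def __init__(self):
        self.callbacks, self.fired, self.result = [], False, None

    def add_callback(self, fn):
        if self.fired:
            fn(self.result)
        else:
            self.callbacks.append(fn)

    def fire(self, result):
        self.fired, self.result = True, result
        callbacks, self.callbacks = self.callbacks, []
        for fn in callbacks:
            fn(result)

    def __await__(self):
        # Bridge to asyncio: mirror our result into a Future, then let the
        # event loop suspend the awaiting coroutine on that Future.
        future = asyncio.get_running_loop().create_future()

        def resolve(result):
            if not future.done():
                future.set_result(result)

        self.add_callback(resolve)
        return future.__await__()

async def main():
    loop = asyncio.get_running_loop()
    pending = AwaitablePending()
    loop.call_later(0.01, pending.fire, 42)   # result arrives "later"
    return await pending                      # plain await, no explicit callbacks

value = asyncio.run(main())
print(value)   # 42
```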

There are good reasons to continue to use Twisted, Brown said. It is released often, typically three times per year, though 2016 is set to have five. These are time-based releases that come directly from the trunk. Because of its stability, some people actually deploy from the Twisted trunk, though she is "not going to say it's a good idea."

There are a large number of protocols available in Twisted right out of the box. She put up a list of a dozen or so (e.g. HTTP, DNS, IRC, FTP, POP3, IMAP4), all of which can be glued together in various ways. It is also easy to add protocols. Support for HTTP/2 is coming soon.

There are a number of libraries and frameworks that use Twisted under the hood. These include txacme and txsni for supporting automatic certificate renewal of Let's Encrypt certificates, the hendrix web server, and Autobahn|Python for WebSocket handling, which is "really fast under PyPy", she said.

Twisted is a dependable base; "we try not to break your code". It has deprecation cycles that give a year's warning when things are being removed. It undergoes a lot of code review and automated testing, which allows users to "upgrade with impunity". Twisted is also fast, especially when it is run with PyPy.

Beyond that, Twisted officially supports multiple platforms (most major Linux distributions, FreeBSD, Windows, and OS X). That means that all tests must pass on each supported platform before a branch can be merged to the trunk. It runs on Python 2.7 for all of those platforms; it also supports 3.4 and 3.5 (though there are still some protocols and such that need to be ported) on Linux and FreeBSD. There are only a handful of tests that do not pass under PyPy, almost all of which are due to the code making assumptions that it is running on CPython.

Competition is good, she said. The arrival of asyncio helped get Twisted moving to support Python 3 better. Eventually, Twisted will be calling asyncio and vice versa and there will be full interoperability between them. Those wishing to help make that happen should follow the async-sig mailing list.

A YouTube video of the talk is available for those interested in more details.

[ I would like to thank LWN subscribers for supporting my travel to Portland for PyCon. ]

Index entries for this article
Conference: PyCon/2016




Copyright © 2016, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license