A Gilectomy update

By Jake Edge
May 16, 2018

In a rather short session at the 2018 Python Language Summit, Larry Hastings updated attendees on the status of his Gilectomy project. The aim of that effort is to remove the global interpreter lock (GIL) from CPython. Since his status report at last year's summit, little has happened, which is part of why the session was so short. He hasn't given up on the overall idea, but it needs a new approach.

Gilectomy has been "untouched for a year", Hastings said. He worked on it at the PyCon sprints after last year's summit, but got tired of it at that point. He is "out of bullets" at least with that approach. With his complicated buffered-reference-count approach he was able to get his "gilectomized" interpreter to reach performance parity with CPython—except that his interpreter was running on around seven cores to keep up with CPython on one.

The old adage of "fast, cheap, good; pick any two" has a parallel in the world of the Gilectomy (and other similar projects). In that case the three items are: "high performance", "doesn't break the C API", and "uses multiple cores". CPython doesn't use multiple cores and Gilectomy 1.0 is not high performance, which leads him to consider breaking the C API.

For "Gilectomy 2.0", Hastings will be looking at using a tracing garbage collector (GC), rather than the CPython GC that is based on reference counts. Tracing GCs are more multicore friendly, but he doesn't know anything about them. He also would rather not write his own GC.

This new version of Gilectomy would also have "cheap local object locking". It would distinguish between objects that are only visible in one thread versus those that are visible to two or more. Objects can transition from local to non-local in various ways, some of which are not particularly obvious. It will be difficult to identify all of those, so it is not a change he makes lightly.

His next step will be to get rid of the existing code. He hopes to be able to gain access to the Instagram runtime discussed in the previous summit session and to gilectomize that. Though the code has not been released, he may be able to arrange an agreement with Facebook/Instagram to gain access to the code, he said. Since that version of the interpreter did not break the C API (at least for extensions that Instagram uses), it may mean that Gilectomy 2.0 will actually not have to break that API.

But Thomas Wouters pointed out that there are things that the Instagram experiment is doing that will immediately break big and important Python extensions like NumPy and SciPy. Hastings is a bit sanguine about that: if Gilectomy ever ships in CPython, the core developers can go on vacation for a month and the extension developers will have it all fixed by the time they return. In truth, it would be a great problem to have, but he is so far away from being successful with his Gilectomy efforts that he can't even see it from where he is, he said.

One audience member said that it would be nice to be able to see the Instagram experimental code to determine just what the C API compatibility issues are. Hastings agreed but noted that there was no way to make that happen until the company is ready to do so. There was also a question of how this work fits with the subinterpreter effort. Hastings said that he saw no reason that the Gilectomy and subinterpreters would not play well together.


A Gilectomy update

Posted May 17, 2018 9:32 UTC (Thu) by flewellyn (subscriber, #5047) [Link] (9 responses)

Would it be possible for Python's runtime to use the Boehm-Demers-Weiser GC? It's written for C/C++ and implements mark-sweep.

A Gilectomy update

Posted May 17, 2018 10:11 UTC (Thu) by ehiggs (subscriber, #90713) [Link] (8 responses)

Yes, it's possible to use Boehm (et al), but mark and sweep is not particularly nice for larger data sets since the mark and sweep phases result in walking through memory twice (paging memory a lot). Also, being conservative means the Boehm collector keeps hold of a lot of memory.

One thing I'm particularly interested in, however, is if it would be possible to do the following:

1. Deprecate the threading API (who even uses threads in Python in the face of the GIL?)
2. Make the GIL thread local (reference counts no longer require locks or cache-shattering atomic operations).
3. Provide an Actor library or some other lightweight process abstraction that can only communicate via message passing.
4. Schedule Actors on threads.
5. The GIL is dead! Long live the TLIL!
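As a rough illustration of steps 3 and 4 in the list above, here is a minimal sketch of an actor that owns its state and is reachable only through a mailbox. The `Actor` and `Summer` classes are hypothetical names invented for this example, not part of any existing library, and the sketch still runs under the ordinary GIL in CPython; it only shows the message-passing discipline, not a thread-local interpreter lock.

```python
import queue
import threading


class Actor:
    """Hypothetical minimal actor: private state, one mailbox, one worker thread."""

    _STOP = object()  # sentinel that tells the worker loop to exit

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # The only way other code interacts with the actor.
        self._mailbox.put(message)

    def stop(self):
        # Messages sent before stop() are still processed, in order.
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Summer(Actor):
    """Keeps a running total and reports it on a reply queue."""

    def __init__(self, reply_to):
        # Set up state before the base class starts the worker thread.
        self.total = 0
        self._reply_to = reply_to
        super().__init__()

    def receive(self, message):
        self.total += message  # touched only by the actor's own thread
        self._reply_to.put(self.total)


replies = queue.Queue()
actor = Summer(replies)
for n in (1, 2, 3):
    actor.send(n)
actor.stop()
results = [replies.get() for _ in range(3)]
print(results)
```

Because `total` is only ever modified by the actor's own thread, no lock around it is needed; the mailbox is the single synchronization point.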

A Gilectomy update

Posted May 17, 2018 10:46 UTC (Thu) by mfuzzey (subscriber, #57966) [Link] (2 responses)

"who even uses threads in Python in the face of the GIL?"

I don't think the GIL makes threads completely useless.

If the threads are IO bound rather than compute bound, the GIL shouldn't impact performance that much.

Multithreaded designs can be easier to understand than async callback-based designs. Sometimes (often) the aim is to have code that is easy to write and maintain rather than being high performance. Procedural code with parallelism via threads is, IMHO, *much* easier to understand and reason about than "callback soup".
If the threads mostly work on independent data, locking isn't much of an issue either.

Of course, for independent data a multiprocess design may be better and avoids the GIL, but on some operating systems that lack fork(), process creation is expensive compared to thread creation.

A Gilectomy update

Posted May 17, 2018 12:27 UTC (Thu) by ehiggs (subscriber, #90713) [Link] (1 responses)

>(often) the aim is to have code that is easy to write and maintain rather than being high performance.

Sure, if you write MT code where everything is behind the GIL, then it's easy to maintain since there's no concurrency (in the Python code). If we remove the GIL, then your code that used to work is naked and exposed to all the joys of C style thread interaction. I know I've written code that probably shouldn't have worked, but somehow did (because the GIL protected me).
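To make the concern concrete, here is a small sketch (the `Counter` class and `hammer` function are made up for illustration) of shared mutable state guarded by an explicit lock. An unguarded `self.value += 1` is a read-modify-write that is not guaranteed to be atomic, so code that relies on it may only appear to work; with the lock, the result does not depend on what the interpreter happens to serialize.

```python
import threading


class Counter:
    """Hypothetical shared counter protected by its own lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write a single critical section.
        with self._lock:
            self.value += 1


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [
    threading.Thread(target=hammer, args=(counter, 10_000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)
```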

If we want to keep the threading model and remove the GIL then, IMO, Python will have to offer some hooks for data hiding or immutability. Instead of trying to retrofit this into threading, I suggest just using the aforementioned actor model. We don't need to actually remove threading, I guess.

Maybe I should rephrase the comment:
1. Deprecate the threading API (who is even using the threading module and believes their code will still work when the GIL no longer protects them from actual concurrency?)

A Gilectomy update

Posted May 17, 2018 14:41 UTC (Thu) by mfuzzey (subscriber, #57966) [Link]

The difficulty of multi-threading very much depends on your data model.

If your problem can be partitioned so that each thread is only working on its own data (like the image processing example that was mentioned) then multi-threading is quite straightforward, especially if you use Python modules like Queue for interthread communication of objects.
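A minimal sketch of that partitioned style, assuming Python 3 (where the module is named `queue`; it was `Queue` in Python 2). The `process_chunk` function and the sum-of-squares workload are placeholders chosen for this example only.

```python
import queue
import threading


def process_chunk(chunk_id, chunk, results):
    # Each worker reads only its own chunk; the queue is the sole shared object.
    results.put((chunk_id, sum(x * x for x in chunk)))


data = list(range(12))
chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
results = queue.Queue()

workers = [
    threading.Thread(target=process_chunk, args=(i, chunk, results))
    for i, chunk in enumerate(chunks)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Results may arrive in any order, so key them by chunk id.
partials = dict(results.get() for _ in chunks)
total = sum(partials.values())
print(partials, total)
```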

On the other hand if your data model has lots of mutable state shared between threads then yes it can be a pain to get right and there is indeed a risk that it "appears to work" due to the GIL and will break if/when that is removed.

I think a significant part of Python applications falls into the first category.

A Gilectomy update

Posted May 17, 2018 11:03 UTC (Thu) by excors (subscriber, #95769) [Link]

> 1. Deprecate the threading API (who even uses threads in Python in the face of the GIL?)

I've used threads for interfacing with external devices like cameras - there's a module that uses the straightforward synchronous camera API to receive images, does some processing with numpy/OpenCV/etc, then pushes the image onto a queue for the main thread to receive. The main thread is coordinating several different external devices at once, each with their own thread. The GIL doesn't matter since the threads are either sleeping or are inside numpy/OpenCV (which should release the GIL). Asynchronous APIs to the hardware are either nonexistent or harder to use. I assume multiprocessing would make it expensive to transfer the image data. Threading seems like probably the best solution in that situation.

A Gilectomy update

Posted May 17, 2018 15:14 UTC (Thu) by quietbritishjim (subscriber, #114117) [Link] (1 responses)

> who even uses threads in Python in the face of the GIL?

Plenty of people use multiple threads in Python, even for CPU-bound computation (others already mentioned IO-bound code). It's possible for a C extension module to release the GIL during a call, and many do, most significantly numpy (so long as your matrices don't have Python objects in their entries) and some of the scipy modules.

This makes these efforts to remove the GIL a little misplaced in my view. If you're doing computation that's so intense that you could do with multiple threads, even if the GIL didn't exist it would probably be a better strategy to vectorise your operations using numpy or the like because pure Python is so slow. If you've already done that, you can use more threads and the GIL won't make too much difference anyway; it will just serialise the code that essentially just coordinates calls to functions that do the real work. That code will make up the majority (/all) of your project's code, but a tiny minority of the run time.
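The pattern described above can be sketched with only the standard library. As a stand-in for numpy, this example uses `hashlib`, which is documented to release the GIL while hashing buffers larger than 2047 bytes; the `digest` helper and the block sizes are assumptions made for illustration. The Python code merely coordinates the calls, while the heavy work happens in C.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(block):
    # The hashing itself runs in C; for large inputs the GIL is released.
    return hashlib.sha256(block).hexdigest()


# Four 1 MiB buffers, each filled with a different byte value.
blocks = [bytes([i]) * (1 << 20) for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, blocks))  # map() preserves input order

serial = [digest(b) for b in blocks]
print(threaded == serial)
```

Whether the threaded version is actually faster depends on the machine and the workload; the sketch only shows that the results match and that no explicit locking is needed in the coordinating code.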

A Gilectomy update

Posted May 17, 2018 15:58 UTC (Thu) by rghetta (subscriber, #39444) [Link]

Well, it's true if you can vectorize, but if you cannot, the GIL hurts a lot.
In some of our workloads, despite very heavy use of C extensions, threads spend most of the time waiting for the GIL. The time spent at C level is significant, but not enough to offset the waiting.
To be fair, our application probably would benefit most from the subinterpreter work mentioned previously, but still a GIL-less Python would be very useful.

A Gilectomy update

Posted May 25, 2018 17:29 UTC (Fri) by renejsum (guest, #124634) [Link]

Great idea, not as far-fetched as some might think. I was thinking along these lines, but using the subinterpreter and, for example, the pyworks actor framework. (GitHub.com/pylots/pyworks)

A Gilectomy update

Posted May 27, 2018 7:14 UTC (Sun) by ehiggs (subscriber, #90713) [Link]

> 2. Make the GIL thread local (reference counts no longer require locks or cache shattering atomic operations).
> ...
> 5. The GIL is dead! Long live the TLIL!

What on Earth is a Thread Local Lock? You obviously didn't think this through.

A Gilectomy update

Posted May 25, 2018 21:39 UTC (Fri) by renejsum (guest, #124634) [Link] (1 responses)

I have deep respect for what Larry is trying to do and I wish there was some way to help the effort.

One thing I have been pondering is what it would take to move the effort to PyPy?

PyPy is a modern VM, it's even written in (R)Python, it's fast, and it's being actively developed. Maciej Fijalkowski, Armin Rigo & co are ready to work on removing the GIL. (I have even offered to pay for some of the work.)

I understand the backward compatibility issue and the fear of breaking backward compatibility. (It has been a long 10 years for Python 3 to become mainstream.)

But, still I think it would be the right way into the future for Python to get a *much* better VM, fast and with multicore support...

A Gilectomy update

Posted May 30, 2018 14:26 UTC (Wed) by laike9m (guest, #124719) [Link]

This is so hard, considering Guido has no real interest in this.