SciPy reaches 1.0
After 16 years of evolution, the SciPy project has reached version 1.0. SciPy, a free-software project, has become one of the most popular computational toolkits for scientists from a wide range of disciplines, and is largely responsible for the ascendancy of Python in many areas of scientific research. While the 1.0 release is significant, much of the underlying software has been stable for some time; the "1.0" version number reflects that the project as a whole is on solid footing.
What is SciPy?
The term "SciPy" is overloaded: it refers to a library of scientific and numerical routines used with Python (and here usually spelled "scipy") and to an annual conference devoted to the use of Python in the sciences. It also is used to denote a collection of libraries, packaged together and often distributed as a whole, that work together to form a scientist's toolkit. Beyond that, the name is also used to refer to the organization that guides and curates the various parts of the libraries.
The core pieces of the SciPy collection are:
- The scipy library
- NumPy, the centrally important numerical library for Python
- IPython, the enhanced Python interactive shell (or REPL)
- The matplotlib plotting library
- The symbolic mathematics library SymPy
- The pandas data analysis library
The libraries mentioned above are, for the most part, general-purpose infrastructure. The typical user of SciPy may also install one or more SciKits, which are specialized packages designed to be used with SciPy. They are kept apart from the main SciPy bundle when they are too specialized, have an incompatible license (SciPy uses BSD), or are not quite mature enough. Right now there are 89 SciKits listed on the official page, covering a wide range of specialties. There is software for fuzzy logic, aeronautical engineering, digital signal processing, particle physics, image processing, machine learning, and many more arcane tasks.
As for the scipy library itself, it contains such things as a database of physical constants, a library of special functions (Bessel, hypergeometric, elliptic, etc.), numerical integrators, Fourier transforms, statistics routines, and image processors.
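As a rough illustration of three of the areas just listed, the following sketch pulls one physical constant, evaluates one special function, and performs one numerical integration. The particular constant, function, and integrand are arbitrary choices made for this example, not anything singled out by the project.

```python
import numpy as np
from scipy import constants, integrate, special

# Physical constants: the speed of light in vacuum, in m/s.
c = constants.c

# Special functions: Bessel function of the first kind, order 0, at x = 0.
j0_at_zero = special.jv(0, 0.0)

# Numerical integration: integrate exp(-x**2) over the whole real line.
# The exact value is sqrt(pi); quad also returns an error estimate.
gauss_integral, error_estimate = integrate.quad(
    lambda x: np.exp(-x**2), -np.inf, np.inf
)

print(c, j0_at_zero, gauss_integral)
```

Each sub-package (constants, special, integrate) is imported by name; importing scipy alone does not necessarily load them.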
The 1.0 release
Since SciPy, even when it refers to software, encompasses a large and various collection of programs, each of which has its own version number, the meaning of the SciPy release version needs some explanation. The significance of the 1.0 release is that some organizational milestones and some overall project goals have been reached. Much of the actual module code has been mature and stable for some time.
With this release, SciPy has adopted a formal governance structure with a Benevolent Dictator for Life (Pauli Virtanen), as well as a Steering Council. It has an official Code of Conduct and a Roadmap. The 1.0 version number is meant to reflect the maturity and stability of the organization as much as the code under the SciPy umbrella.
The project now has better Windows support, including both binary downloads and the ability to compile SciPy code using the Microsoft Visual C++ compiler and GFortran. This may strike some readers of this publication as uninteresting, but it is a boon to the users of SciPy who, for one reason or another, need to use Windows.
Some of the major technical milestones reached in this release include a new set of ordinary differential equation solvers and reorganization of the ODE library interface; some better performing functions within the optimizer library, which handles such things as searching for the minimum of functions, curve fitting, and root finding; and assimilation of more LAPACK and BLAS routines, completing the interface to the entire BLAS library. These are sets of Fortran subroutines for linear algebra and matrix computations that are widely used and highly regarded by numerical programmers working in all languages; highly optimized versions are available for various machine architectures. There is also an assortment of improvements to many other routines in the scipy library.
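To give a feel for the reorganized ODE interface and for the root finders in the optimizer library, here is a minimal sketch. It assumes the solve_ivp() function in scipy.integrate as the entry point to the new solvers; the decay equation, the tolerances, and the function cos(x) - x are hypothetical choices made only for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq


def decay(t, y):
    """Right-hand side of dy/dt = -0.5 * y (an illustrative test equation)."""
    return -0.5 * y


# Integrate from t = 0 to t = 4 starting at y = 1, sampling at 9 points.
solution = solve_ivp(
    decay, (0.0, 4.0), [1.0],
    t_eval=np.linspace(0.0, 4.0, 9), rtol=1e-8, atol=1e-10,
)

# Compare with the exact solution exp(-t/2).
exact = np.exp(-0.5 * solution.t)
max_error = np.max(np.abs(solution.y[0] - exact))

# Root finding: locate the x in [0, 1] where cos(x) equals x.
root = brentq(lambda x: np.cos(x) - x, 0.0, 1.0)

print(solution.success, max_error, root)
```

The solver returns an object holding the sample times (solution.t) and the solution values (solution.y), so the result can be checked or plotted directly.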
The SciPy community
While the first official version of SciPy was released in 2001, work on SciPy actually began in the 1990s. This was during the emergence of Python. Many young scientists and engineers, having been raised in the era of Fortran (and sometimes C) as the only language for scientific computing, began to experiment with this new interpreted, dynamically typed, expressive language. Python was fun to program in, but could it be used for serious work? Python's suitability as a glue language helped to answer that question in the affirmative.
SciPy began as a set of Python interfaces to trusted Fortran and C programs for numerical calculations. This makes sense: Fortran is used because its compilers generate fast numerical code, but its string handling and I/O are clumsy, and it has no convenient interfaces to the operating system or for interacting with other programs. Python is good at what Fortran lacks, although, before the emergence of NumPy, it was slow at numerical computation. Using Python to steer compiled numerical routines, and to explore their outputs, exploits the strengths of each language.
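The division of labor described above can be sketched by calling a compiled LAPACK routine directly from Python. This example assumes the low-level wrappers exposed in scipy.linalg.lapack and uses the double-precision linear solver dgesv; the 2x2 system is a made-up example.

```python
import numpy as np
from scipy.linalg import lapack

# A small linear system a @ x = b (values chosen only for illustration):
#   3x + 1y = 9
#   1x + 2y = 8
a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([[9.0], [8.0]])

# dgesv is the LAPACK driver that solves a general linear system by
# LU factorization; Python only prepares the inputs and reads the result.
lu, piv, x, info = lapack.dgesv(a, b)

print(info)  # 0 means the Fortran routine reported success
print(x)
```

In everyday use one would more likely call a higher-level function such as scipy.linalg.solve(), which chooses the LAPACK routine itself; the direct call is shown here only to make the "steering" visible.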
This turned out to be a popular idea. Now the scipy library involves over 500 contributors and is actively developed: in the past 30 days as of this writing, 28 developers have pushed 101 commits to the project (this does not include NumPy nor the other parts of SciPy). The SciPy developers are diverse. They include a physicist who now works for a New Zealand company in data science and forestry, a mathematician and physicist working for Wolfram Research, a researcher in image compression from Massachusetts, an applied mathematician from the Netherlands, a lecturer in natural language processing from Australia, a graduate student in astrophysics at the University of Notre Dame, a mathematician studying dynamical systems at Enthought in Texas, someone working in geographic information systems at MapBox from Colorado, an electronics engineer in New York, a cosmologist at Berkeley, and, of course, hundreds more in many fields of research.
Development is done in the open on GitHub, and the community is welcoming and helpful toward new contributors. This friendly culture may be due to the fact that most SciPy developers are researchers who are creating software with the overriding goal of helping to solve their research problems, with the niceties of programming practice as only a secondary concern.
Some idea of the emphases in development for the near future can be found in SciPy's official Roadmap. This document highlights the need for making the APIs consistent and improving test coverage. There are plans to fix issues with the Fourier Transform routines and clarify the differences between those in scipy and those in NumPy. Other plans include adding features to the interpolation routines (an opportunity for experts in splines to contribute), further generalizing the interface to LAPACK and BLAS routines, removing or rewriting the routines for wavelets (another good opportunity for contributions), improving several important routines for calculating special functions, and perhaps creating a new module for numerical differentiation. From these details and others in the Roadmap, there emerges a general emphasis on not making any radical changes, but continuing to lay a solid foundation for a toolkit that will be generally usable far into the future. Hence the group has an interest in rationalizing the overall organization of the code, creating consistent interfaces that abstract the details of using legacy routines written with different styles and conventions, and keeping the documentation complete and up to date.
As I was preparing this article I came across a fascinating, current research report in Nature Communications on the biophysics of human sperm locomotion. It's not only free to read, but has links to all of the code used in the analysis of the experiments. It's all Python, and uses SciPy. I mention this not only to bring up another example of how SciPy has become so widely adopted, but to suggest that its adoption is part of a growing culture of openness in the sciences. Proprietary tools are being replaced by open-source alternatives; authors are placing data, which used to be kept locked away in laboratory filing cabinets, in open repositories. In addition, computer programs that used to be kept secret are now open to scrutiny and the grip of publishers on the dissemination of journal articles is being weakened. It's tempting to speculate that the spirit of open source is influencing science through the adoption of toolkits such as SciPy, and is reinforcing the movement toward greater transparency in research.
How to get it
SciPy is one of the free software world's huge umbrella projects, such as TeX Live, that consist of scores of other projects, many of which are developed independently of each other. As is usual with such umbrella projects, the version available through your distribution's package manager will be several releases behind the current one. In the case of SciPy, this may very well not matter to you; but if it does, and you desire the 1.0 release, you must get closer to the source.
SciPy supports multiple Python versions. According to the 1.0 release notes, this release requires Python 2.7 or Python 3.4 and later.
The easiest way for a Python user on Linux is probably to use the pip install command; the full incantation is spelled out on the official install page, and simply uses pip to install SciPy's major components, such as NumPy and Jupyter.
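Whichever installation route is used, it can be worth confirming which versions actually ended up on the Python path. A minimal check, assuming only that NumPy and scipy are importable, might look like this:

```python
import numpy
import scipy

# Report the versions that this interpreter actually imports.
print("NumPy", numpy.__version__)
print("SciPy", scipy.__version__)

# The leading component of the version string is the major release number.
scipy_major = int(scipy.__version__.split(".")[0])
print("SciPy major version:", scipy_major)
```

The numbers printed depend entirely on the local installation, so no particular output is implied here.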
Another option, which may be more convenient for Windows users and some
others, is to install the self-contained Python distributions maintained by
several companies, including Enthought, a major
institutional sponsor of the SciPy project and its conferences. As of this
writing, however, the versions packaged in these distributions, while
recent, were slightly behind the latest releases available through pip.
Finally, if you have a few spare gigabytes and are willing to relinquish some control over which versions of various libraries are installed, Sage, the mathematical software covered here back in January, contains SciPy (and Jupyter, as well, since that is the currently preferred notebook interface to Sage).
Documentation
Documenting such a huge project is itself a significant undertaking. There is a reference manual in PDF form. I hesitate to link to it, as the one for version 1.0 weighs in at 2115 pages, and has no table of contents nor index. The first hundred pages are release notes, lists of authors, lists of pull requests, and similar material. This leaves search as the only way to find anything; searching through 2000 pages of PDF is not snappy. (But here is the PDF for those who want it despite the warnings.)
A better choice if you're getting started is the "documentation" link on the SciPy home page. This will lead you to other guides, including collections of recipes and some tutorials.
The typical user of SciPy will only use a fraction of its many
specialized scientific and numerical libraries, so there is no need to have
a huge trove of documentation at hand. One challenge may be to
discover whether there exists a SciPy module that might help you in your
work. Your options for exploring this space are web searches, specifically
site searches aimed at scipy.org, discussion with members of your research
community, browsing the scipy
directory within your local
machine's Python library directory tree (the source code has extensive
docstrings), or using Python's online help system, which is a more
convenient interface to this last approach.
To use the help system, first execute the ipython command;
then, within the REPL, type help(). This works because
ipython imports the pydoc module for you, which defines the
help() command. This command will place you in a documentation
subsystem, which you can exit with a ctrl-D. Within the help system, you
can simply type the names of modules or sub-packages that you want to learn
about. Typing "scipy" will give you a list of top-level modules in the
scipy library; to learn about any one of these, ask about it using import
syntax. For example, you will see "stats" in the list; to learn about the
functions in this package, type "scipy.stats", and you will get extensive,
if concise, documentation about SciPy's statistical functions, including
some examples of use. If you've browsed the source code directly, you'll
notice that this documentation is built from the docstrings, but organized
more conveniently.
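The same docstrings can also be reached from a script rather than the interactive help system. The sketch below is one possible approach using only the standard library's pkgutil and inspect modules; it lists the public sub-packages of the installed scipy and prints the start of the scipy.stats docstring.

```python
import inspect
import pkgutil

import scipy
import scipy.stats

# Walk the installed scipy directory and collect its public sub-packages.
subpackages = sorted(
    info.name
    for info in pkgutil.iter_modules(scipy.__path__)
    if info.ispkg and not info.name.startswith("_")
)
print(subpackages)

# Fetch the module docstring that help("scipy.stats") is built from,
# and show only its first few lines.
stats_doc = inspect.getdoc(scipy.stats)
print("\n".join(stats_doc.splitlines()[:10]))
```

The exact list of sub-packages varies between SciPy versions, so the output is best treated as a starting point for exploration.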
Using SciPy
Let's try out SciPy in the terminal, using IPython. The aim in this section is to provide a feel for interactive exploration in the REPL, and demonstrate the expressive power of the SciPy libraries. First, we need to spend a little time getting briefly acquainted with NumPy, which is the core extension to Python upon which everything else is based. Even if you never use any other part of SciPy, if you ever find yourself using Python to do any type of numerical computation, you will want to at least be aware of what NumPy offers.
NumPy adds a new data type to Python: the NumPy array. These are true
arrays, distinct from Python's lists (and different, as well, from the
array type provided in the standard library). They are
multidimensional, homogeneous (numbers only) collections that store their
elements contiguously in memory, which allows fast, vectorized operations
by the Fortran or C routines that use them. NumPy
provides operators and functions that operate on arrays as a whole or
element-wise; while this is not the place for a NumPy tutorial, here is a
taste of what is essentially an array mini-language within Python, that
will seem familiar to those who have used array languages such as APL,
or array Fortran. If you execute the following Python code (in either
Python2 or Python3):
import numpy as np
a = np.array([1, 2, 3])
a**2
You will get the output array([1, 4, 9]). Note that each
element of the array is squared, and that this would be a
TypeError if we
tried it with a list rather than an array.
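To make the contrast with lists concrete, this small sketch attempts the same operation on a plain list, catches the resulting TypeError, and then shows the explicit loop (here a list comprehension) that a list requires.

```python
import numpy as np

values = [1, 2, 3]

# Squaring a list directly is not defined, so Python raises TypeError.
try:
    values ** 2
    list_error = None
except TypeError as exc:
    list_error = exc

# With a list, each element has to be squared explicitly.
squares_from_list = [v ** 2 for v in values]

# With a NumPy array, the operator applies to every element at once.
squares_from_array = np.array(values) ** 2

print(type(list_error).__name__)
print(squares_from_list, squares_from_array)
```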
You can have multidimensional arrays, too. Here is a 2D array (a matrix):
d = np.array([ [1, 2, 3], [4, 5, 6] ])
d + 1
The way the result is printed shows how matrices are represented, and that the "+" operator is overloaded to act element-wise on arrays:
array([[2, 3, 4],
       [5, 6, 7]])
NumPy provides versions of many mathematical functions, such as
sin() and cos(), that are extended to act element-wise
on arrays, and even makes it possible for you to extend your own functions
to operate the same way.
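A brief sketch of both ideas follows. It uses np.sin() on an array, then np.vectorize() as one way to wrap a scalar-only function; the helper clip_to_unit() is a hypothetical function invented for this example.

```python
import numpy as np

# Built-in element-wise function: sine of each angle in the array.
angles = np.array([0.0, np.pi / 2, np.pi])
sines = np.sin(angles)


def clip_to_unit(x):
    """Scalar-only helper: the `if` tests would fail on a whole array."""
    if x > 1.0:
        return 1.0
    if x < 0.0:
        return 0.0
    return x


# np.vectorize wraps the scalar function so that it accepts arrays.
clip_elementwise = np.vectorize(clip_to_unit)
clipped = clip_elementwise(np.array([-0.5, 0.25, 3.0]))

print(sines)
print(clipped)
```

Note that np.vectorize() is a convenience for notation rather than speed: it still calls the Python function once per element.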
Using what is essentially a domain-specific-language for arrays can be a
mind-expanding style of programming for those who are mainly familiar with
explicit looping over data. NumPy enables you to use an interpreted
scripting language to perform non-trivial, fast numerical computations,
making the rest of the SciPy ecosystem possible.
The first figure is a screen shot of an IPython session showing the use
of some of the linear algebra functions from the scipy library. First we do
the necessary imports, then set some options for the way NumPy will print
numbers, to keep things neat. We define a matrix a, and print
it out, then calculate its inverse, and print that. The inverse of a matrix
gives you the identity matrix (ones on the diagonal) when you
matrix-multiply it with the original. The inverse can be used to solve sets of
linear equations, for example. To verify that b is really the
inverse of a, we multiply the two matrices, and get an
identity matrix as the result.
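For readers following along without the screen shot, a session along these lines can be approximated as below. The matrix values and print options are assumptions chosen for this sketch and are not necessarily those used in the figure.

```python
import numpy as np
from scipy import linalg

# Keep printed numbers short and suppress tiny floating-point noise.
np.set_printoptions(precision=3, suppress=True)

# An illustrative 2x2 matrix and its inverse.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = linalg.inv(a)
print(a)
print(b)

# Multiplying a matrix by its inverse gives the identity matrix.
identity = a.dot(b)
print(identity)

# Solving a @ x = rhs: via the inverse, and via the dedicated solver.
rhs = np.array([5.0, 6.0])
x_via_inverse = b.dot(rhs)
x_via_solve = linalg.solve(a, rhs)
print(x_via_inverse, x_via_solve)
```

For actually solving linear systems, linalg.solve() is generally preferable to forming the inverse explicitly, since it does less work and is less sensitive to rounding.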
Now let's import the fast Fourier transform routines to create a filtered square wave:
from scipy.fftpack import fft, ifft
We'll need one cycle of a square wave to begin; we'll use a numpy array of 100 zeroes followed by 100 ones. We'll also need some x-values for plotting, so we make an x array with the same shape:
sw = np.array([np.zeros(100), np.ones(100)]).reshape(200,)
x = np.arange(0, 200, 1)
We haven't seen all of the things in the lines above explained, but
their meaning should be fairly clear. Just to make sure we did this right,
the figure above shows the square
wave. All we need to do to see it is import pylab and then
pylab.plot(x, sw) (see the matplotlib article linked
above).
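To spell out what the array construction does: stacking the two 100-element arrays yields a 2x100 array, and reshaping to 200 elements lays the rows end to end. The sketch below checks that against np.concatenate(), which is offered here as an alternative way to build the same wave in one step.

```python
import numpy as np

# Stack zeros and ones into a (2, 100) array, then flatten row by row.
stacked = np.array([np.zeros(100), np.ones(100)])
sw_reshaped = stacked.reshape(200,)

# Alternative: join the two pieces end to end directly.
sw = np.concatenate([np.zeros(100), np.ones(100)])

# Integer x-values 0..199, one per sample, for plotting.
x = np.arange(sw.size)

print(stacked.shape, sw_reshaped.shape, sw.shape, x.shape)
print(np.array_equal(sw, sw_reshaped))
```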
Now we pass the array into the fft() routine and store the
Fourier coefficients in the array swt:
swt = fft(sw)
Plotting swt with the pylab.bar() command
reveals the spectrum in the figure above. As an exercise for the reader, if
you now calculate the inverse
transform of the spectrum using ifft(swt) and plot the result,
you should get back the original square wave.
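The round trip can also be checked numerically rather than by eye. This sketch rebuilds the square wave, transforms it, inverts the transform, and compares; the variable names other than sw and swt are chosen for this example.

```python
import numpy as np
from scipy.fftpack import fft, ifft

# One cycle of a square wave: 100 zeros followed by 100 ones.
sw = np.concatenate([np.zeros(100), np.ones(100)])

# Forward transform, then inverse transform of the spectrum.
swt = fft(sw)
recovered = ifft(swt)

# The inverse is returned as a complex array; its imaginary part should
# vanish and its real part should match the original, up to rounding.
round_trip_ok = bool(
    np.allclose(recovered.real, sw) and np.allclose(recovered.imag, 0.0)
)

# Magnitude spectrum, which is what a bar plot would display.
magnitudes = np.abs(swt)

print(round_trip_ok)
print(magnitudes[:6])
```

The first coefficient is the sum of the samples, and for this wave the even harmonics are zero to within rounding error, which is the familiar odd-harmonic structure of a square wave.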
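Finally, to show the effect of truncating the Fourier series, we make a copy of the coefficient array, zero out all the coefficients above the 10th harmonic, and take the inverse transform of that. A plain assignment would only create a second name for the same array, so we call copy(). Because the complex transform stores the negative-frequency coefficients in the second half of the array, the slice to zero out lies in the middle:
swtrunk = swt.copy()
swtrunk[11:-10] = 0
swtrunki = ifft(swtrunk).real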
Plotting the result produces a picture of the low-pass filtered square wave in the figure above.
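Since the input signal is real, the same filter can be written with real-input transforms, which keep only the non-negative frequencies. The sketch below uses numpy.fft.rfft() and irfft() rather than the scipy.fftpack routines used in this article (a substitution made for this example, because NumPy's version returns one complex coefficient per harmonic), and cross-checks the result against the full complex transform.

```python
import numpy as np

# One cycle of a square wave: 100 zeros followed by 100 ones.
sw = np.concatenate([np.zeros(100), np.ones(100)])

# Real-input FFT: 101 complex coefficients for harmonics 0 through 100.
coeffs = np.fft.rfft(sw)

# Copy before modifying, then drop every harmonic above the 10th.
truncated = coeffs.copy()
truncated[11:] = 0

# Inverse real transform; n gives the length of the output signal.
filtered = np.fft.irfft(truncated, n=sw.size)

# Cross-check with the complex transform, where the negative frequencies
# occupy the end of the array and the zeroed slice lies in the middle.
full = np.fft.fft(sw)
full[11:-10] = 0
filtered_complex = np.fft.ifft(full).real

print(filtered.shape, np.allclose(filtered, filtered_complex))
```

The filtered signal is real by construction, so no imaginary part needs to be discarded before plotting.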
I hope that these brief example sessions give you some idea of the interactive, exploratory powers that Python and SciPy provide. This style of expressive, fluid computation was virtually unknown to scientists a generation ago. Even if one had access to such conveniences as precompiled Fourier transform routines, carrying out the simple experiment that we did above would have been comparatively laborious, without the immediate feedback that encourages a constructively playful approach to computation. The SciPy community deserves thanks and congratulations on reaching their recent milestone, and support for future development.
Index entries for this article | |
---|---|
GuestArticles | Phillips, Lee |
Posted Nov 16, 2017 7:46 UTC (Thu)
by efiring (guest, #4543)
[Link] (8 responses)
y = np.zeros((200, ), dtype=float)
Posted Nov 16, 2017 13:09 UTC (Thu)
by leephillips (guest, #100450)
[Link] (7 responses)
Posted Nov 16, 2017 17:01 UTC (Thu)
by efiring (guest, #4543)
[Link] (6 responses)
Posted Nov 16, 2017 18:14 UTC (Thu)
by leephillips (guest, #100450)
[Link]
Posted Nov 16, 2017 18:33 UTC (Thu)
by leephillips (guest, #100450)
[Link] (2 responses)
Posted Nov 17, 2017 3:55 UTC (Fri)
by efiring (guest, #4543)
[Link] (1 responses)
Posted Nov 17, 2017 13:47 UTC (Fri)
by leephillips (guest, #100450)
[Link]
Posted Nov 16, 2017 20:58 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (1 responses)
Posted Nov 17, 2017 13:45 UTC (Fri)
by leephillips (guest, #100450)
[Link]
Posted Nov 16, 2017 15:08 UTC (Thu)
by kpfleming (subscriber, #23250)
[Link] (5 responses)
"many 'more arcane' tasks" != "'many more' arcane tasks", and yet there's no straightforward way to express this to avoid the confusion. Some day we will find alien life which communicates using math, and we'll understand why.
Posted Nov 16, 2017 15:15 UTC (Thu)
by leephillips (guest, #100450)
[Link] (1 responses)
Posted Nov 18, 2017 3:39 UTC (Sat)
by mathstuf (subscriber, #69389)
[Link]
Posted Nov 16, 2017 17:53 UTC (Thu)
by sanjoy (guest, #5026)
[Link] (2 responses)
To respond to the point that
"many 'more arcane' tasks" != "'many more' arcane tasks", and yet there's no straightforward way to express this to avoid the confusion.
I would distinguish the two using a hyphen (which is why I am a hyphen stickler): "many more-arcane tasks" vs. "many more arcane tasks." If one is extremely consistent about hyphenating compound modifiers, then the second form should be interpreted correctly. And don't get me started on the fading of the Oxford (series) comma.
Posted Nov 16, 2017 18:18 UTC (Thu)
by leephillips (guest, #100450)
[Link] (1 responses)
Posted Nov 18, 2017 3:40 UTC (Sat)
by mathstuf (subscriber, #69389)
[Link]
But with the recent Oxford comma lawsuit here, maybe it will be used more.
Posted Nov 16, 2017 16:23 UTC (Thu)
by mgedmin (subscriber, #34497)
[Link] (1 responses)
swtrunk = swt
does not actually copy the array, it merely creates a new alias for the same array object. To copy you would use
swtrunk = swt.copy()
like in the example suggested by efiring.
Posted Nov 16, 2017 16:34 UTC (Thu)
by leephillips (guest, #100450)
[Link]
Posted Nov 17, 2017 10:59 UTC (Fri)
by smitty_one_each (subscriber, #28989)
[Link] (1 responses)
Well, so much for "Skippy".
Posted Nov 17, 2017 20:42 UTC (Fri)
by adam820 (subscriber, #101353)
[Link]
Posted Nov 23, 2017 4:20 UTC (Thu)
by charris (guest, #13263)
[Link] (2 responses)
Posted Nov 23, 2017 12:34 UTC (Thu)
by leephillips (guest, #100450)
[Link] (1 responses)
Python 2.x >= 2.6 or 3.x >= 3.2 and
Do you have a source for the slightly higher versions you've listed?
y[100:] = 1
yft = rfft(y)
yft_trunc = yft.copy()
yft_trunc[21:] = 0
yrecon = irfft(yft_trunc)
The truncated reconstruction can be done using the complex (fft and ifft) forms, as illustrated in the "reconstruct" function towards the end of this notebook: https://currents.soest.hawaii.edu/ocn_data_analysis/_stat.... It requires zeroing a slice in the *middle* of the transformed array.
NumPy >= 1.6