
SciPy reaches 1.0

November 15, 2017

This article was contributed by Lee Phillips

After 16 years of evolution, the SciPy project has reached version 1.0. SciPy, a free-software project, has become one of the most popular computational toolkits for scientists from a wide range of disciplines, and is largely responsible for the ascendancy of Python in many areas of scientific research. While the 1.0 release is significant, much of the underlying software has been stable for some time; the "1.0" version number reflects that the project as a whole is on solid footing.

What is SciPy?

The term "SciPy" is overloaded: it refers to a library of scientific and numerical routines used with Python (and here usually spelled "scipy") and to an annual conference devoted to the use of Python in the sciences. It also is used to denote a collection of libraries, packaged together and often distributed as a whole, that work together to form a scientist's toolkit. Beyond that, the name is also used to refer to the organization that guides and curates the various parts of the libraries.

The core pieces of the SciPy collection are:

  • The scipy library
  • NumPy, which is the centrally important numerical library for Python
  • IPython, which is the enhanced Python interactive shell (or REPL)
  • The matplotlib plotting library
  • The symbolic mathematics library SymPy
  • The pandas data analysis library

Another project that's often installed with SciPy is Jupyter, which can be used in place of IPython or the normal Python REPL. Jupyter uses a web browser as a rich interface to Python that is particularly well suited to scientific or mathematical work, as it can embed graphical output and LaTeX equations.

The libraries mentioned above are, for the most part, general-purpose infrastructure. The typical user of SciPy may also install one or more SciKits, which are specialized packages designed to be used with SciPy. They are kept apart from the main SciPy bundle when they are too specialized, have an incompatible license (SciPy uses BSD), or are not quite mature enough. Right now there are 89 SciKits listed on the official page, covering a wide range of specialties. There is software for fuzzy logic, aeronautical engineering, digital signal processing, particle physics, image processing, machine learning, and many more arcane tasks.

As for the scipy library itself, it contains such things as a database of physical constants, a library of special functions (Bessel, hypergeometric, elliptic, etc.), numerical integrators, Fourier transforms, statistics routines, and image processors.
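To give a flavor of those pieces, here is a minimal sketch that touches three of them: the constants database, a Bessel function from the special-function library, and a numerical integrator. The particular constants, function, and integrand are chosen only for illustration.

```python
import numpy as np
from scipy import constants, integrate, special

# Physical constants: speed of light (m/s) and Planck's constant (J s).
speed_of_light = constants.c
planck = constants.h
print(speed_of_light, planck)

# Special functions: Bessel function of the first kind, order 0, at x = 1.
j0_at_one = special.jv(0, 1.0)
print(j0_at_one)

# Numerical integration: integrate exp(-x**2) over the whole real line.
# quad() returns the value and an estimate of the absolute error; the
# exact answer is sqrt(pi).
value, error_estimate = integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(value, np.sqrt(np.pi))
```

Each subpackage (scipy.constants, scipy.special, scipy.integrate) is imported by name; importing scipy alone does not necessarily pull them all in.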

The 1.0 release

Since SciPy, even when it refers to software, encompasses a large and varied collection of programs, each of which has its own version number, the meaning of the SciPy release version needs some explanation. The significance of the 1.0 release is that some organizational milestones and some overall project goals have been reached. Much of the actual module code has been mature and stable for some time.

With this release, SciPy has adopted a formal governance structure with a Benevolent Dictator for Life (Pauli Virtanen), as well as a Steering Council. It has an official Code of Conduct and a Roadmap. The 1.0 version number is meant to reflect the maturity and stability of the organization as much as the code under the SciPy umbrella.

The project now has better Windows support, including both binary downloads and the ability to compile SciPy code using the Microsoft Visual C++ compiler and GFortran. This may strike some readers of this publication as uninteresting, but it is a boon to the users of SciPy who, for one reason or another, need to use Windows.

Some of the major technical milestones reached in this release include a new set of ordinary differential equation solvers and reorganization of the ODE library interface; some better performing functions within the optimizer library, which handles such things as searching for the minimum of functions, curve fitting, and root finding; and assimilation of more LAPACK and BLAS routines, completing the interface to the entire BLAS library. These are sets of Fortran subroutines for linear algebra and matrix computations that are widely used and highly regarded by numerical programmers working in all languages; highly optimized versions are available for various machine architectures. There is also an assortment of improvements to many other routines in the scipy library.
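As a rough illustration of the ODE and optimizer libraries, the sketch below integrates a simple decay equation with solve_ivp(), which to my understanding is the uniform entry point to the new solvers, and then finds a minimum and a root with routines from scipy.optimize. The test problems and tolerances are assumptions picked because their exact answers are known.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq, minimize_scalar

# ODE: dy/dt = -y with y(0) = 1, whose exact solution is exp(-t).
def decay(t, y):
    return -y

solution = solve_ivp(decay, (0.0, 5.0), [1.0], rtol=1e-8, atol=1e-10)
y_end = solution.y[0, -1]  # value at the final time, t = 5
print(y_end, np.exp(-5.0))

# Optimization: locate the minimum of (x - 2)**2 + 1, which lies at x = 2.
result = minimize_scalar(lambda x: (x - 2.0)**2 + 1.0)
print(result.x)

# Root finding: solve cos(x) = x on the bracket [0, 1].
root = brentq(lambda x: np.cos(x) - x, 0.0, 1.0)
print(root)
```

The tolerances passed to solve_ivp() are tighter than the defaults so that the comparison with the exact solution is meaningful.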

The SciPy community

While the first official version of SciPy was released in 2001, work on SciPy actually began in the 1990s. This was during the emergence of Python. Many young scientists and engineers, having been raised in the era of Fortran (and sometimes C) as the only language for scientific computing, began to experiment with this new interpreted, dynamically typed, expressive language. Python was fun to program in, but could it be used for serious work? Python's suitability as a glue language helped to answer that question in the affirmative.

SciPy began as a set of Python interfaces to trusted Fortran and C programs for numerical calculations. This makes sense: Fortran is used because its compilers generate fast numerical code, but its string handling and I/O are clumsy, and it has no convenient interfaces to the operating system or for interacting with other programs. Python is good at what Fortran lacks, although, before the emergence of NumPy, it was slow at numerical computation. Using Python to steer compiled numerical routines, and to explore their outputs, exploits the strengths of each language.
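The glue idea is still visible in the library today. As a small sketch, the code below calls the LAPACK routine dgesv, a Fortran solver for general linear systems, directly through scipy.linalg.lapack; the 2x2 system is an arbitrary example chosen for illustration.

```python
import numpy as np
from scipy.linalg import lapack

# A small linear system a x = b, stored as double-precision arrays.
a = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([[9.0],
              [8.0]])

# dgesv solves the system by LU factorization. The wrapper returns the
# LU factors, the pivot indices, the solution, and a status flag
# (0 means success).
lu, piv, x, info = lapack.dgesv(a, b)
print(info)
print(x)
```

In everyday use one would call a higher-level function such as scipy.linalg.solve() instead; the low-level wrapper is shown only to make the Fortran layer visible.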

This turned out to be a popular idea. Now the scipy library involves over 500 contributors and is actively developed: in the past 30 days as of this writing, 28 developers have pushed 101 commits to the project (this does not include NumPy nor the other parts of SciPy). The SciPy developers are diverse. They include a physicist who now works for a New Zealand company in data science and forestry, a mathematician and physicist working for Wolfram Research, a researcher in image compression from Massachusetts, an applied mathematician from the Netherlands, a lecturer in natural language processing from Australia, a graduate student in astrophysics at the University of Notre Dame, a mathematician studying dynamical systems at Enthought in Texas, someone working in geographic information systems at MapBox from Colorado, an electronics engineer in New York, a cosmologist at Berkeley, and, of course, hundreds more in many fields of research.

Development is done in the open on GitHub, and the community is welcoming and helpful toward new contributors. This friendly culture may be due to the fact that most SciPy developers are researchers who are creating software with the overriding goal of helping to solve their research problems, with the niceties of programming practice as only a secondary concern.

Some idea of the emphases in development for the near future can be found in SciPy's official Roadmap. This document highlights the need for making the APIs consistent and improving test coverage. There are plans to fix issues with the Fourier Transform routines and clarify the differences between those in scipy and those in NumPy. Other plans include adding features to the interpolation routines (an opportunity for experts in splines to contribute), further generalizing the interface to LAPACK and BLAS routines, removing or rewriting the routines for wavelets (another good opportunity for contributions), improving several important routines for calculating special functions, and perhaps creating a new module for numerical differentiation. From these details and others in the Roadmap, there emerges a general emphasis on not making any radical changes, but continuing to lay a solid foundation for a toolkit that will be generally usable far into the future. Hence the group has an interest in rationalizing the overall organization of the code, creating consistent interfaces that abstract the details of using legacy routines written with different styles and conventions, and keeping the documentation complete and up to date.

As I was preparing this article I came across a fascinating, current research report in Nature Communications on the biophysics of human sperm locomotion. It's not only free to read, but has links to all of the code used in the analysis of the experiments. It's all Python, and uses SciPy. I mention this not only to bring up another example of how SciPy has become so widely adopted, but to suggest that its adoption is part of a growing culture of openness in the sciences. Proprietary tools are being replaced by open-source alternatives; authors are placing data, which used to be kept locked away in laboratory filing cabinets, in open repositories. In addition, computer programs that used to be kept secret are now open to scrutiny and the grip of publishers on the dissemination of journal articles is being weakened. It's tempting to speculate that the spirit of open source is influencing science through the adoption of toolkits such as SciPy, and is reinforcing the movement toward greater transparency in research.

How to get it

SciPy is one of the free software world's huge umbrella projects, such as TeX Live, that consist of scores of other projects, many of which are developed independently of each other. As is usual with such umbrella projects, the version available through your distribution's package manager will be several releases behind the current one. In the case of SciPy, this may very well not matter to you; but if it does, and you desire the 1.0 release, you must get closer to the source.

SciPy supports multiple Python versions. The 1.0 release supports Python 2.7 and, for Python 3, versions 3.4 and later.

The easiest way for a Python user on Linux probably is to use the pip install command—the full incantation is spelled out on the official install page, and simply uses pip to install SciPy's major components, such as NumPy and Jupyter.
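Whichever route you take, it is worth checking what actually got installed. A minimal sketch, assuming only that NumPy and the scipy library are importable:

```python
import sys

import numpy
import scipy

# Report the interpreter and library versions that are actually in use.
python_version = "%d.%d.%d" % sys.version_info[:3]
print("Python", python_version)
print("NumPy ", numpy.__version__)
print("SciPy ", scipy.__version__)

# Compare the major version number against 1 to see whether this is a
# 1.0-or-later scipy.
scipy_major = int(scipy.__version__.split(".")[0])
print("SciPy 1.0 or later:", scipy_major >= 1)
```

If the reported version is older than you expected, the interpreter is probably picking up the distribution's package rather than the one installed with pip.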

Another option, which may be more convenient for Windows users and some others, is to install the self-contained Python distributions maintained by several companies, including Enthought, a major institutional sponsor of the SciPy project and its conferences. As of this writing, however, the versions packaged in these distributions, while recent, were slightly behind the latest releases available through pip.

Finally, if you have a few spare gigabytes and are willing to relinquish some control over which versions of various libraries are installed, Sage, the mathematical software covered here back in January, contains SciPy (and Jupyter, as well, since that is the currently preferred notebook interface to Sage).

Documentation

Documenting such a huge project is itself a significant undertaking. There is a reference manual in PDF form. I hesitate to link to it, as the one for version 1.0 weighs in at 2115 pages, and has no table of contents nor index. The first hundred pages are release notes, lists of authors, lists of pull requests, and similar material. This leaves search as the only way to find anything; searching through 2000 pages of PDF is not snappy. (But here is the PDF for those who want it despite the warnings.)

A better choice if you're getting started is the "documentation" link on the SciPy home page. This will lead you to other guides, including collections of recipes and some tutorials.

The typical user of SciPy will only use a fraction of its many specialized scientific and numerical libraries, so there is no need to have a huge trove of documentation at hand. One challenge may be to discover whether there exists a SciPy module that might help you in your work. Your options for exploring this space are web searches, specifically site searches aimed at scipy.org, discussion with members of your research community, browsing the scipy directory within your local machine's Python library directory tree (the source code has extensive docstrings), or using Python's online help system, which is a more convenient interface to this last approach.

To use the help system, first execute the ipython command; then, within the REPL, type help(). This works because ipython imports the pydoc module for you, which defines the help() command. This command will place you in a documentation subsystem, which you can exit with a ctrl-D. Within the help system, you can simply type the names of modules or sub-packages that you want to learn about. Typing "scipy" will give you a list of top-level modules in the scipy library; to learn about any one of these, ask about it using import syntax. For example, you will see "stats" in the list; to learn about the functions in this package, type "scipy.stats", and you will get extensive, if concise, documentation about SciPy's statistical functions, including some examples of use. If you've browsed the source code directly, you'll notice that this documentation is built from the docstrings, but organized more conveniently.
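The same documentation can also be fetched from a script rather than from the interactive help system. The sketch below uses the standard library's pydoc module to render the help text for one function as a string; scipy.stats.describe is just an example target.

```python
import pydoc

import scipy.stats

# render_doc() builds the text that help() would display; plain() strips
# the overstrike characters that pydoc uses for bold text in a terminal.
text = pydoc.plain(pydoc.render_doc(scipy.stats.describe))

# Show only the first few lines rather than paging through everything.
first_lines = text.splitlines()[:8]
print("\n".join(first_lines))
```

Inside IPython, appending a question mark to a name (for example, scipy.stats.describe?) gives a similar view without leaving the prompt.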

Using SciPy

Let's try out SciPy in the terminal, using IPython. The aim in this section is to provide a feel for interactive exploration in the REPL, and demonstrate the expressive power of the SciPy libraries. First, we need to spend a little time getting briefly acquainted with NumPy, which is the core extension to Python upon which everything else is based. Even if you never use any other part of SciPy, if you ever find yourself using Python to do any type of numerical computation, you will want to at least be aware of what NumPy offers.

NumPy adds a new data type to Python: the NumPy array. These are true arrays, distinct from Python's lists (and different, as well, from the array type provided in the standard library). They are multidimensional, homogeneous (numbers only) collections that store their elements contiguously in memory, which allows fast, vectorized operations by the Fortran or C routines that use them. NumPy provides operators and functions that operate on arrays as a whole or element-wise; while this is not the place for a NumPy tutorial, here is a taste of what is essentially an array mini-language within Python, which will seem familiar to those who have used array languages such as APL, or array Fortran. If you execute the following Python code (in either Python 2 or Python 3):

    import numpy as np
    a = np.array([1, 2, 3])
    a**2

You will get the output array([1, 4, 9]). Note that each element of the array is squared, and that this would be a TypeError if we tried it with a list rather than an array.
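The contrast with a list can be seen directly. In this small sketch the variable names are arbitrary; the point is only that the list needs an explicit comprehension where the array does not.

```python
import numpy as np

values = [1, 2, 3]

# A plain list does not support element-wise arithmetic.
try:
    values ** 2
except TypeError as exc:
    list_error = str(exc)
    print("TypeError:", list_error)

# The list equivalent needs an explicit loop or comprehension ...
squares_list = [v**2 for v in values]

# ... whereas the array expression applies to every element at once.
squares_array = np.array(values) ** 2
print(squares_list, squares_array)
```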

You can have multidimensional arrays, too. Here is a 2D array (a matrix):

    d = np.array([ [1, 2, 3], [4, 5, 6] ])
    d + 1

The way the result is printed shows how matrices are represented, and that the "+" operator is overloaded to act element-wise on arrays:

    array([[2, 3, 4],
           [5, 6, 7]])

NumPy provides versions of many mathematical functions, such as sin() and cos(), that are extended to act element-wise on arrays, and even makes it possible for you to extend your own functions to operate the same way. Using what is essentially a domain-specific-language for arrays can be a mind-expanding style of programming for those who are mainly familiar with explicit looping over data. NumPy enables you to use an interpreted scripting language to perform non-trivial, fast numerical computations, making the rest of the SciPy ecosystem possible.
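Here is a brief sketch of both ideas: built-in functions applied to a whole array, and a scalar-only function wrapped with np.vectorize() so that it accepts arrays. The function clip_negative is a made-up example, not part of NumPy.

```python
import numpy as np

x = np.linspace(0.0, np.pi, 5)

# Built-in universal functions act on every element of the array, so
# sin**2 + cos**2 should come out as an array of ones.
identity_check = np.sin(x)**2 + np.cos(x)**2
print(identity_check)

# A scalar-only function (hypothetical, for illustration) ...
def clip_negative(value):
    if value < 0:
        return 0.0
    return value

# ... wrapped so that it accepts arrays too.
clip_negative_array = np.vectorize(clip_negative)
clipped = clip_negative_array(np.array([-1.5, 0.5, 2.0]))
print(clipped)
```

Note that np.vectorize() is a convenience: it still calls the Python function once per element, so it gives the array notation without the speed of the compiled built-ins.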

[IPython session]

The first figure is a screen shot of an IPython session showing the use of some of the linear algebra functions from the scipy library. First we do the necessary imports, then set some options for the way NumPy will print numbers, to keep things neat. We define a matrix a, and print it out, then calculate its inverse, and print that. The inverse of a matrix gives you the identity matrix (ones on the diagonal) when you matrix-multiply it with the original. The inverse can be used to solve sets of linear equations, for example. To verify that b is really the inverse of a, we multiply the two matrices, and get an identity matrix as the result.
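For readers who cannot see the screen shot, a session along the lines described might look like the following sketch. The matrix values and print options are illustrative assumptions, not necessarily those used in the original session.

```python
import numpy as np
from scipy import linalg

# Print at most three decimals and suppress scientific notation.
np.set_printoptions(precision=3, suppress=True)

# An illustrative invertible matrix.
a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(a)

# Compute the inverse.
b = linalg.inv(a)
print(b)

# Matrix-multiply the two; the result should be the identity matrix.
product = a.dot(b)
print(product)

# The same machinery solves linear systems; solve() is generally
# preferred over forming the inverse explicitly.
rhs = np.array([5.0, 6.0])
x = linalg.solve(a, rhs)
print(x)
```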

Now let's import the fast Fourier transform routines to create a filtered square wave:

    from scipy.fftpack import fft, ifft

We'll need one cycle of a square wave to begin; we'll use a numpy array of 100 zeroes followed by 100 ones. We'll also need some x-values for plotting, so we make an x array with the same shape:

    sw = np.array([np.zeros(100), np.ones(100)]).reshape(200,)
    x = np.arange(0, 200, 1)
[Square wave]

We haven't seen all of the things in the lines above explained, but their meaning should be fairly clear. Just to make sure we did this right, the figure above shows the square wave. All we need to do to see it is import pylab and then pylab.plot(x, sw) (see the matplotlib article linked above).

Now we pass the array into the fft() routine and store the Fourier coefficients in the array swt:

    swt = fft(sw)
[Bar graph]

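Plotting the magnitudes abs(swt) with the pylab.bar() command reveals the spectrum in the figure above; fft() returns complex coefficients, so plotting swt directly would silently discard the imaginary parts. As an exercise for the reader, if you now calculate the inverse transform of the spectrum using ifft(swt) and plot the real part of the result, you should get back the original square wave.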
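The round trip can also be checked numerically rather than by eye. This sketch rebuilds the square wave, transforms it, and applies the inverse transform to the coefficient array; np.concatenate() is used here as an equivalent way to build the signal.

```python
import numpy as np
from scipy.fftpack import fft, ifft

# One cycle of a square wave: 100 zeros followed by 100 ones.
sw = np.concatenate([np.zeros(100), np.ones(100)])

# Forward transform: 200 complex coefficients.
swt = fft(sw)

# The magnitudes are what a bar plot of the spectrum can show.
magnitudes = np.abs(swt)

# Inverse transform of the coefficients (not of the original signal).
roundtrip = ifft(swt)

# The result is complex with a negligible imaginary part.
print(np.allclose(roundtrip.real, sw))
print(np.max(np.abs(roundtrip.imag)))
```

The zero-frequency coefficient, swt[0], is the sum of the samples, which is 100 for this signal.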

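Finally, to show the effect of truncating the Fourier series, we make a copy of the coefficient array, zero out all the coefficients above the 10th harmonic, and take the inverse transform of that. Because fft() stores the negative-frequency coefficients at the end of the array, the slice to zero lies in the middle of the array. The copy must also be explicit, since plain assignment would only create a second name for swt:

    swtrunk = swt.copy()
    swtrunk[11:-10] = 0
    swtrunki = ifft(swtrunk).real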
[Low-pass filtered square wave]

Plotting the result produces a picture of the low-pass filtered square wave in the figure above.
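For a real-valued signal, a low-pass reconstruction has to keep each retained harmonic's negative-frequency partner as well. The sketch below, which assumes the packed output layout of scipy.fftpack's rfft() (the mean followed by real and imaginary pairs), does the truncation two ways and checks that they agree; the variable names are my own.

```python
import numpy as np
from scipy.fftpack import fft, ifft, rfft, irfft

sw = np.concatenate([np.zeros(100), np.ones(100)])
n_harmonics = 10  # keep the mean plus the 10 lowest frequencies

# Complex route: keep indices 0..10 and the matching negative
# frequencies, which fft() stores in the last 10 slots.
swt = fft(sw)
kept = swt.copy()
kept[n_harmonics + 1:-n_harmonics] = 0
recon_complex = ifft(kept).real

# Real route: the mean plus 10 (real, imag) pairs occupy the first
# 21 elements of the packed rfft() output.
packed = rfft(sw)
packed_kept = packed.copy()
packed_kept[2 * n_harmonics + 1:] = 0
recon_real = irfft(packed_kept)

print(np.allclose(recon_complex, recon_real))
print(recon_real.min(), recon_real.max())
```

The reconstruction overshoots 1 and undershoots 0 near the jump, which is the familiar ringing of a truncated Fourier series.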

I hope that these brief example sessions give you some idea of the interactive, exploratory powers that Python and SciPy provide. This style of expressive, fluid computation was virtually unknown to scientists a generation ago. Even if one had access to such conveniences as precompiled Fourier transform routines, carrying out the simple experiment that we did above would have been comparatively laborious, without the immediate feedback that encourages a constructively playful approach to computation. The SciPy community deserves thanks and congratulations on reaching their recent milestone, and support for future development.



SciPy reaches 1.0

Posted Nov 16, 2017 7:46 UTC (Thu) by efiring (guest, #4543) [Link] (8 responses)

Thank you for the nice article about scipy--but please correct the example involving the FFT. The fft function operating on an array of N values, which can be real or complex, returns a complex array with contributions at the N frequencies ordered from 0 to N-1 cycles per record length. Attempting to plot this complex array results in a warning; the imaginary parts are being discarded. For illustrating reconstruction using a truncated transform, the easiest approach is to use the rfft and irfft functions, both of which take real array inputs and yield real array outputs. See the docstrings or other scipy documentation for more information. Briefly, to calculate the sum of the 10 lowest-frequency components, we keep the first 21 elements of the rfft output:

    import numpy as np
    from scipy.fftpack import rfft, irfft

    y = np.zeros((200, ), dtype=float)
    y[100:] = 1
    yft = rfft(y)
    yft_trunc = yft.copy()
    yft_trunc[21:] = 0
    yrecon = irfft(yft_trunc)

SciPy reaches 1.0

Posted Nov 16, 2017 13:09 UTC (Thu) by leephillips (guest, #100450) [Link] (7 responses)

Your way is clearer, I agree. But due to the symmetry of the original function the transform coefficients are pure imaginary, so the result should be the same. Thanks for telling me about rfft though.

SciPy reaches 1.0

Posted Nov 16, 2017 17:01 UTC (Thu) by efiring (guest, #4543) [Link] (6 responses)

Your use of fft and ifft is simply incorrect; it is giving the wrong answer. The contributions from frequencies above the Nyquist (or equivalently, the negative frequencies) are not zero and cannot be neglected. Note that in your truncated reconstruction the approximation to the square wave has only half the amplitude that it should have.

SciPy reaches 1.0

Posted Nov 16, 2017 18:14 UTC (Thu) by leephillips (guest, #100450) [Link]

I note that it gives 90% of the correct amplitude.

SciPy reaches 1.0

Posted Nov 16, 2017 18:33 UTC (Thu) by leephillips (guest, #100450) [Link] (2 responses)

Nevertheless, you are correct. In filtering the complex amplitudes I did not simply low-pass the function, but did something messier, and I should have used the rfft routines. Thank you for pointing it out.

SciPy reaches 1.0

Posted Nov 17, 2017 3:55 UTC (Fri) by efiring (guest, #4543) [Link] (1 responses)

I should have said "half the energy", not "half the amplitude".

The truncated reconstruction can be done using the complex (fft and ifft) forms, as illustrated in the "reconstruct" function towards the end of this notebook: https://currents.soest.hawaii.edu/ocn_data_analysis/_stat.... It requires zeroing a slice in the *middle* of the transformed array.

SciPy reaches 1.0

Posted Nov 17, 2017 13:47 UTC (Fri) by leephillips (guest, #100450) [Link]

That notebook page is excellent - an exhaustive introduction to ffts in scipy.

SciPy reaches 1.0

Posted Nov 16, 2017 20:58 UTC (Thu) by smurf (subscriber, #17840) [Link] (1 responses)

I was wondering why the lower half of the square wave doesn't have any overshoot. After all the two parts should be symmetrical.

SciPy reaches 1.0

Posted Nov 17, 2017 13:45 UTC (Fri) by leephillips (guest, #100450) [Link]

Actually, it would be symmetrical had I plotted the real part instead of the absolute value, but, as efiring points out, the amplitude would still be wrong.

SciPy reaches 1.0

Posted Nov 16, 2017 15:08 UTC (Thu) by kpfleming (subscriber, #23250) [Link] (5 responses)

In an article about an incredibly expressive and powerful (and easy to use) scientific computing system, we also get an example of how terrible English is at the same tasks.

"many 'more arcane' tasks" != "'many more' arcane tasks", and yet there's no straightforward way to express this to avoid the confusion. Some day we will find alien life which communicates using math, and we'll understand why.

SciPy reaches 1.0

Posted Nov 16, 2017 15:15 UTC (Thu) by leephillips (guest, #100450) [Link] (1 responses)

Interesting observation. I could have written, "many additional arcane tasks", but ambiguity in English is actually a feature, not a bug.

SciPy reaches 1.0

Posted Nov 18, 2017 3:39 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

My pun-loving friends would probably be shadows of their current selves without the ambiguities. And the parsing wordplay is my favorite. Or responding in conversation as if puns had been made. It's hard to choose.

SciPy reaches 1.0

Posted Nov 16, 2017 17:53 UTC (Thu) by sanjoy (guest, #5026) [Link] (2 responses)

To respond to the point that "many 'more arcane' tasks" != "'many more' arcane tasks", and yet there's no straightforward way to express this to avoid the confusion:

I would distinguish the two using a hyphen (which is why I am a hyphen stickler): "many more-arcane tasks" vs. "many more arcane tasks." If one is extremely consistent about hyphenating compound modifiers, then the second form should be interpreted correctly. And don't get me started on the fading of the Oxford (series) comma.

SciPy reaches 1.0

Posted Nov 16, 2017 18:18 UTC (Thu) by leephillips (guest, #100450) [Link] (1 responses)

I try to avoid accidentally creating neologisms through hyphenation, but I stand with you in support of the Oxford comma. They can pry it from my cold, dead keyboard.

SciPy reaches 1.0

Posted Nov 18, 2017 3:40 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

Hopefully your keyboard is cold and dead already.

But with the recent Oxford comma lawsuit here, maybe it will be used more.

SciPy reaches 1.0

Posted Nov 16, 2017 16:23 UTC (Thu) by mgedmin (subscriber, #34497) [Link] (1 responses)

Nitpickery: this line

swtrunk = swt

does not actually copy the array, it merely creates a new alias for the same array object. To copy you would use

swtrunk = swt.copy()

like in the example suggested by efiring.

SciPy reaches 1.0

Posted Nov 16, 2017 16:34 UTC (Thu) by leephillips (guest, #100450) [Link]

This is true - it just amounts to an unnecessary second name for the same array.

SciPy reaches 1.0

Posted Nov 17, 2017 10:59 UTC (Fri) by smitty_one_each (subscriber, #28989) [Link] (1 responses)

"SciPy (pronounced “Sigh Pie”)"

Well, so much for "Skippy".

SciPy reaches 1.0

Posted Nov 17, 2017 20:42 UTC (Fri) by adam820 (subscriber, #101353) [Link]

Choosy scientists choose Skippy? :]

SciPy reaches 1.0

Posted Nov 23, 2017 4:20 UTC (Thu) by charris (guest, #13263) [Link] (2 responses)

SciPy 1.0 supports Python 2.7 or 3.4+ and requires NumPy 1.8.1+.

SciPy reaches 1.0

Posted Nov 23, 2017 12:34 UTC (Thu) by leephillips (guest, #100450) [Link] (1 responses)

According to https://www.scipy.org/stackspec.html it's

Python 2.x >= 2.6 or 3.x >= 3.2 and
NumPy >= 1.6

Do you have a source for the slightly higher versions you've listed?

SciPy reaches 1.0

Posted Nov 23, 2017 16:37 UTC (Thu) by jwilk (subscriber, #63328) [Link]


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds