Following Clifford's hint I have had my nerdness tested. The version 2 result is here:
It seems a science/math post is long overdue. So here it is: the resistance of a wave guide/coax cable. Readers of the Feynman Lectures know that you can model a coax cable by an infinite chain of capacitors and inductances like this:
To compute the asymptotic (complex) resistance of this infinite circuit, Feynman instructs you to look at a single iteration and to summarise the rest in some black box with resistance R.
The new resistance R' between the terminals is then easily computed (when the circuit is driven with frequency ω):
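With the inductance L in series and the capacitance C in parallel with the black box (the usual convention for one link of the ladder), this reads
$$R' \;=\; i\omega L + \frac{1}{\,i\omega C + 1/R\,} \;=\; i\omega L + \frac{R}{1+i\omega C R}\,.$$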
Now you argue that an infinite chain should not change its resistance if one more link is added, and thus R = R'. This quadratic equation is solved as
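Setting R' = R above and clearing denominators gives $R^2 - i\omega L\,R - L/C = 0$, hence
$$R \;=\; \frac{i\omega L}{2} \;\pm\; \sqrt{\frac{L}{C}-\frac{\omega^2 L^2}{4}}\,.$$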
The final thing is to remember that the chain of L's and C's is a discrete version of a continuous cable, and thus one should take both L and C to zero while keeping their ratio fixed. We end up with
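In this limit both the $i\omega L/2$ term and the $\omega^2L^2/4$ under the square root disappear, leaving
$$R \;\longrightarrow\; \sqrt{\frac{L}{C}}\,,$$
the (real) characteristic impedance of the cable.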
Note that in this limit the frequency ω has dropped out. So far the Feynman Lectures.
But there is one curious thing: although we have been adding only capacitors and inductances, which have purely imaginary resistances and no Ohmic (real) resistance, the limit is nevertheless real!
How can this be true? When you think about the physics you should be even more irritated: neither capacitors nor inductances do any work, and only Ohmic resistance produces heat. Yet by adding together elements that do not produce heat we have apparently built something that does. After meditating on this fact for a while one settles for the explanation that the energy is carried down the infinite circuit and never returns; thus it is gone and might as well be considered heat. But this is not really convincing.
So we had better study the maths in some more detail. What we have done is to consider a sequence of resistances and to compute the possible fixed points of this iteration. If the resistance converges, it will certainly converge to a fixed point. But who tells you it really converges?
So let's again add one infinitesimal bit of chain. Let us use new variables z and x such that
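One choice that reproduces the vector field quoted below (and the parametrisation I will assume here) is
$$\omega L = z\,x\,,\qquad \omega C = \frac{x}{z}\,,\qquad\text{i.e.}\qquad z=\sqrt{\frac{L}{C}}\,,\quad x=\omega\sqrt{LC}\,.$$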
Thus z is the fixed point resistance we computed above and x is the quantity which we take to 0 (so we can work at O(x)). We do the above calculation again and find that the change in resistance is
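To first order in x,
$$R'-R \;=\; i\omega L - i\omega C\,R^2 + O(x^2) \;=\; i\,\frac{z^2-R^2}{z}\,x + O(x^2)\,.$$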
We can view i(z^2-R^2)/z as a vector field in the complex R plane
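On the imaginary axis, R = iρ with ρ real, the field is
$$i\,\frac{z^2-(i\rho)^2}{z} \;=\; i\,\frac{z^2+\rho^2}{z}\,,$$
which is again purely imaginary (and points upwards), so the flow never leaves the axis and never reaches the fixed points $R=\pm z$.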
We can see (either from the formula or the plot) that for purely imaginary R we will always stay on the imaginary axis and flow to complex infinity! Without Ohmic resistance we will not flow towards the fixed points.
But even if we start off slightly off the imaginary axis we do not spiral in towards one of the fixed points, as one might have thought: the vector field is holomorphic and thus Hamiltonian. Therefore there is a conserved quantity (although right now I am too tired to compute it).
Well, I thought I might not be too tired after all, got confused for two entire hours, and with the help of Enrico (who suggested separation of variables) and Jan found the conserved quantity to be
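Separating variables in $dR/dx = i(z^2-R^2)/z$ gives $\frac{1}{2i}\log\frac{z+R}{z-R} = x + \mathrm{const}$; since x is real, the modulus of the logarithm's argument cannot change along the flow, so one form of the conserved quantity is
$$\left|\frac{R-z}{R+z}\right| \;=\; \mathrm{const}\,.$$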
UPDATE: When I wrote this last night I was too tired to correctly compute the real part of 1-R^2. Thus I got the real components of the vector field wrong, and this explains why I had such trouble finding the correct conserved quantity. After one short night of sleep I noticed my error, and indeed the conserved quantity I had calculated earlier (setting ω=z=1) was correct, but the vector field was wrong (the plots are corrected as well).
Tuesday, December 11, 2007
Wednesday, November 28, 2007
Lehrer Video
I bet many of you out there love Tom Lehrer songs as much as I do. So I hope you enjoy this video showing the master himself performing some of his maths songs:
There are some more Lehrer songs on youtube and especially this superb performance/animation of New Math:
Sunday, November 25, 2007
An example of examples: Series and limits (in German)
As an example of how I would explain a concept in terms of examples and counter-examples, let me cut and paste a text that I wrote for a mailing list, explaining the notion of sequences and limits to a ninth grader. That mailing list runs in German, so the original text was in German; here it is in English.
Let me try a description in prose. First of all, what is a sequence? Simply put, it is a list of numbers that does not end, for example
1, 2, 3, 4, 5 etc.
or
1, 1, 1, 1, 1, 1 etc.
or
1, 1/2, 1/3, 1/4, 1/5, etc
or
3, 3.1, 3.14, 3.141, 3.1415, 3.14159, 3.141592, etc
or
1, -1, 1, -1, 1, -1, etc
Put a bit more formally, a sequence is nothing but a function from the natural numbers into a (number) set of your choice. That is, for every natural number n (the position in the sequence) there is a number a_n. In the first example
a_n = n
in the second example
a_n = 1
in the third example
a_n = 1/n
and in the fourth example a_n is the number you get by taking the first n decimal places of pi. The fifth sequence we can write as
a_n = (-1)^n
Everything clear so far?
Now it may be that a sequence converges to a limit a (that it "has this limit"). Roughly speaking this is supposed to mean that 'in the long run' it gets closer and closer to the number a. This has to be formalised a bit. One possible definition is that for every open interval containing a, at most finitely many members of the sequence do not already lie in this interval, no matter how small the open interval is (as it gets smaller, more members of the sequence will lie outside, but there always remain only finitely many).
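In symbols, the same definition reads: a is the limit of the sequence (a_n) if
$$\#\{\,n\in\mathbb{N}\;:\;a_n\notin I\,\}<\infty\qquad\text{for every open interval } I \text{ with } a\in I\,.$$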
Take, for example, the third example, a_n = 1/n. Its limit is obviously 0. We can check this: think of an open interval containing 0, say ]l,r[. For 0 to be inside, l has to be negative and r positive. Obviously only those a_n with n < 1/r do not lie in the interval; all others do. So we really have only finitely many exceptions, no matter which interval we take.
The second example, a_n = 1, also has a limit, namely of course 1. An open interval containing 1 contains all members of the sequence, so there are no exceptions at all.
These two examples also show that it does not matter at all whether the limit itself occurs in the sequence.
In the definition it is essential, however, that we only allow open intervals. Otherwise we could take, for the 1/n sequence, the closed interval [0, 0]: it contains 0 but not a single member of the sequence, so all of them, that is infinitely many members, lie outside the interval. Think for yourself about which sequences would converge if we allowed closed intervals.
The example with the decimal places of pi is also convergent and has the limit pi.
You can also easily convince yourself that a sequence cannot have several numbers as its limit: if it had two different limits, you could take two open intervals I1 and I2, each containing only one of the two limits, whose intersection is empty (if necessary, shrink them accordingly). Then all but finitely many members of the sequence have to lie in I1. But from this it follows that infinitely many members of the sequence are not in I2. So there is a contradiction with the assumption that there is a limit in I2.
The fifth sequence, which alternates between 1 and -1, is on the other hand not convergent; it has no limit: the only candidates for a limit would be 1 and -1 anyway. So let us look at the open interval
] 1/2 , 3/2 [
Infinitely many members of the sequence lie inside it (namely every second one), but infinitely many members also lie outside, namely the rest. So 1 cannot be a limit, because there is an open interval that contains 1 but misses infinitely many members of the sequence.
That leaves the first sequence, a_n = n. If we only allow 'ordinary' numbers as limits, this sequence has no limit, since its members eventually leave every finite open interval. But we can also admit "infinity" as a limit if we allow it as an upper end of open intervals. For instance,
] l, infinity [
is to be the set of all numbers greater than l. We could now define that a sequence converges to infinity if, in every such interval, all members of the sequence lie inside up to finitely many exceptions. In this sense the first sequence then converges to infinity. In a similar way one can define what it is supposed to mean that a sequence converges to minus infinity.
If I understood Lukas's original example correctly, the point was that his sequence was supposed to have alternating positive and negative entries whose absolute values keep growing. Such a sequence, however, converges neither to infinity nor to minus infinity, for the same reason that the 1, -1, 1, ... sequence does not converge to 1 or -1.
So much for limits and convergence. The example with 1, -1, ... suggests, however, the definition of a similar but in a certain sense weaker notion: the accumulation point. A number a is an accumulation point of a sequence if every open interval containing a, no matter how small, contains infinitely many members of the sequence. Here nothing is said about how many members are allowed to lie outside.
You can quickly convince yourself that a sequence with limit a also has a as an accumulation point (and no further ones). The sequence of 1s and -1s has two accumulation points, namely 1 and -1. In contrast to the limit, a sequence can therefore have several accumulation points.
The sequence 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, etc. has, for example, five accumulation points; the sequence
1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, etc
has all natural numbers as accumulation points. With a little trick ('Cantor's diagonal procedure') one can even come up with a sequence that has all rational or indeed all real numbers as accumulation points.
Of particular interest are sometimes the largest and the smallest accumulation point of a sequence, called the limes superior and limes inferior. It is a property of the real numbers that every sequence of real numbers has at least one accumulation point (if one also admits minus infinity and infinity as accumulation points). This theorem is known as the Bolzano-Weierstrass theorem (see Wikipedia). For the rational numbers it is false (one of the example sequences above is a counter-example; which one?).
Our observation from above can also be turned around: if a (real) sequence has only one accumulation point, i.e. the largest accumulation point equals the smallest, then this point is automatically also the limit of the sequence and the sequence converges. Can you prove this yourself?
So much for my little crash course on the convergence of sequences.
Tuesday, November 06, 2007
An example for example
Tim Gowers has two very interesting posts on using examples early on in a mathematical exposition of a subject. I can only second that and say that this is my favorite way of understanding mathematical concepts: Try to think through the simplest non-trivial example.
Of course, for a mathematician it could be enough just to state a definition or a theorem (including a proof), but very often this leaves one without a proper understanding of the subject. Why this definition and not something else? Where and how can I use the theorem? Why do I have to make assumptions x, y and z?
For my practical purposes I often want to see "the key example" that shows what's going on; the general theory is then often a more or less obvious generalisation, extension, formalisation or abstraction of this key example. This second step is hopefully clear enough that one can come up with it oneself, and it's really the key example one should remember, not the formal wording of the definition/theorem etc.
I am not talking about those examples some mathematicians come up with when pressed for an example, like giving {0} as the example after stating the definition of a vector space. This is useless. I want to see a typical example, one that many (all) other cases are modeled on, not the special one that is different from all other cases. And as important as examples are, of course, counter-examples: what is close but does not quite fit the new definition (and why do we want to exclude it)? What goes wrong if I drop some of the assumptions of the theorem?
I have already talked for too long in the abstract, let me give you some examples:
- What's a sheaf and what do I need it for (at least in connection with D-branes)? Of course, there is the formal definition in terms of a functor from the open sets of a topological space to some category. The Wikipedia article Sheaf reminds you what that is (and explains many interesting things). I think I only really understood what it is after I realised that it's the proper generalisation of a vector bundle for the case at hand: a vector bundle glues some vector space to every point of a topological space and does that in a continuous manner (see, that's basically my definition of a vector bundle). Of course, once we have such objects, we would like to study maps between them (secretly we want to come up with the appropriate category). We already know what maps between vector spaces look like. So we can glue them together point-wise (taking care that we stay continuous) and this gives us maps between vector bundles. But from the vector space case we know that a natural operation is then to look at the kernel of such a map (and maybe a co-kernel if we have a pairing). We can carry this over in a point-wise manner, but, whoops, the 'kernel bundle' is not a vector bundle in general: the dimension can jump! The typical example here is to consider the trivial one dimensional vector bundle over the real line (with coordinate x). Then multiplication in the fibre over the point x by the number x is trivially a fibre-wise linear map. Over any point except x=0 it has zero kernel, but over x=0 the kernel is everything. Thus, generically the kernel fibre has dimension 0, but at the origin it has dimension one (a small formula version of this example is given after this list). Thus, in order to be able to consider kernels (and co-kernels) of linear bundle maps we have to weaken our definition of vector bundle, and that is what a sheaf is: it's like a vector bundle, but relaxed in such a way that linear maps all have kernels and co-kernels.
- When I was a student in Hamburg I had the great pleasure to attend lectures by the late Peter Slodowy (I learned complex analysis from him, as well as representation theory of the Virasoro algebra, gauge theories in the principal bundle language, symplectic geometry and algebraic geometry). The second semester of the algebraic geometry course was about invariants. Without the initial example (which IIRC took over a week to explain) I would have been completely lost in the algebra. The crucial example was: we want to understand the space of square matrices modulo similarity transformations. Once one has learned that the usual algebraic trick is to investigate a space in terms of the algebra of functions living on it (as done in algebraic geometry for polynomial functions, or in non-commutative geometry in terms of continuous functions), one is led to the idea that this moduli space is encoded in the invariants, that is, functions that do not change under similarity transformations. Examples of such functions are of course the trace or the determinant. It turns out that this algebra of invariants (of course the sum or product of two invariant functions is still invariant) is generated by the coefficients of the characteristic polynomial, that is, by the elementary symmetric functions of the eigenvalues (eigenvalues up to permutations). So this should be the algebra of invariants and its dual the moduli space. But wait, we know what the moduli space looks like from linear algebra: we can bring any matrix to Jordan normal form and that's it, matrices with different Jordan normal forms are not similar. But both the zero matrix and the nilpotent Jordan block (in two dimensions, say, the matrix with a single 1 above the diagonal) have the same characteristic polynomial and yet are not related by a similarity transformation. In fact the second one is similar to the matrix with that 1 replaced by any non-zero number epsilon. This shows that there cannot be a continuous (let alone polynomial) invariant which separates the two orbits, as the first orbit is a limit point (epsilon to zero) of points on the second orbit. This example is supposed to illustrate the difference between the naive space of orbits, which can be very badly behaved, and the much nicer space described by the algebra of invariants.
- Let me also give you an example for a case where it's hard to give an example: You will have learned at some point that a distribution is a continuous linear functional on test-functions. Super. Linear is obvious as a condition. But why continuous? Can you come up with a linear functional on test-functions which fails to be continuous? If you have some functional analysis background you might think "ah, continuous is related to bounded, let's find something which is unbounded". Let me assure you, this is the wrong track. It turns out you need the axiom of choice to construct an example (just as you need the axiom of choice to construct a set which is not Lebesgue measurable). Thus you will not be able to write down a concrete example.
- Here is a counter-example: of course R^n is the typical example of a real finite dimensional vector space. But it is very misleading to automatically think of R^n whenever a real vector space is mentioned. People struggled long enough to think of linear maps as abstract objects rather than matrices to get rid of ad hoc basis dependence!
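To spell out the kernel example from the first item in a formula (a minimal version, with the trivial bundle):
$$\phi\colon\ \mathbb{R}\times\mathbb{R}\to\mathbb{R}\times\mathbb{R}\,,\qquad \phi(x,v)=(x,\,x\,v)\,,\qquad (\ker\phi)_x=\begin{cases}\{0\}\,, & x\neq 0\,,\\ \mathbb{R}\,, & x=0\,,\end{cases}$$
so the fibre-wise kernels do not form a vector bundle (the dimension jumps at the origin), only a sheaf.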
Monday, October 29, 2007
Flickr upload
I had not used Flickr in 18 months, and when I wanted to use it today my script didn't work anymore since they have changed their API significantly in the meantime. Simply updating the Perl module was not enough, and I found the documentation on the web rather cryptic. Thus, as a service to the community, this is what finally worked for me:
You need a key and a secret which you can generate here. Of course, you also need the module:
perl -MCPAN -e shell
install Flickr::Upload
Then all you have to do is to copy (or link) all pictures to be uploaded into one directory (/home/robert/fotos/flickr in my case) and run the script. It gives you a URL you have to paste into your browser; there you press OK, and then the upload begins.
#!/usr/bin/perl
use Flickr::API;
use Flickr::Upload;
# Path to pictures to be uploaded
my $flickrdir = '/home/robert/fotos/flickr';
my $flickr_key = 'PUT YOUR KEY HERE';
my $flickr_secret = 'PUT YOUR SECRET HERE';
my $ua = Flickr::Upload->new( {'key' => $flickr_key, 'secret' => $flickr_secret} );
$ua->agent( "perl upload" );
my $frob = getFrob( $ua );
my $url = $ua->request_auth_url('write', $frob);
print "1. Enter the following URL into your browser\n\n",
"$url\n\n",
"2. Follow the instructions on the web page\n",
"3. Hitwhen finished.\n\n";
<>;
my $auth_token = getToken( $ua, $frob );
die "Failed to get authentication token!" unless defined $auth_token;
print "Token is $auth_token\n";
opendir(FLICKR, $flickrdir) || die "Cannot open flickr directory $flickrdir: $!";
while(my $fn = readdir FLICKR){
next unless $fn =~ /[^\.]/;
print "$flickrdir/$fn\n";
$ua->upload(
'auth_token' => $auth_token,
'photo' => "$flickrdir/$fn",
'is_family' => 1
) or print "Failed to upload $fn!\n";
}
sub getFrob {
my $ua = shift;
my $res = $ua->execute_method("flickr.auth.getFrob");
return undef unless defined $res and $res->{success};
# FIXME: error checking, please. At least look for the node named 'frob'.
return $res->{tree}->{children}->[1]->{children}->[0]->{content};
}
sub getToken {
my $ua = shift;
my $frob = shift;
my $res = $ua->execute_method("flickr.auth.getToken",
{ 'frob' => $frob ,
'perms' => 'write'} );
return undef unless defined $res and $res->{success};
# FIXME: error checking, please.
return $res->{tree}->{children}->[1]->{children}->[1]->{children}->[0]->{content};
}
Friday, October 26, 2007
Quantum Field lectures and some notes
The past few weeks here were quite busy since the semester has started (October 15th) and with it the master programme "Theoretical and Mathematical Physics" has become reality: the first seven students (one of them apparently attracted via this blog) have arrived and are now taking classes in mathematical quantum mechanics, differential geometry, string theory, quantum electrodynamics, conformal field theory, general relativity, condensed matter theory and topology (obviously not everybody attends all these courses).
I have already fulfilled my teaching obligation by teaching a block course "Introduction to Quantum Field Theory" in the two weeks before the semester. Even though we had classes both in the morning and the afternoon for two weeks, there was obviously only a limited amount of time and I had to decide which small part of QFT I was going to present. I came up with the following:
- Leave out the canonical formalism completely. Many courses start with it as this was the historical development and students will recognize commutation relations from quantum mechanics classes. But the practical use of it is limited: As soon as you get to interacting theories it becomes complicated and the formalism is just horrible as soon as you have gauge invariance. Of course, it's still possible to use it (and it is the formalism of choice for some investigations) but it's definitely not the simplest choice to be presented in an introductory class.
- Thus, I was going to use the path integral formalism from day one. I spent the first two days introducing it via a series of double (multi) slit (thought) experiments, motivating that a sum over paths is natural in quantum mechanics, and then arguing for the measure factor by demanding the correct classical limit in a saddle point approximation. This heuristic guess for the time evolution was then shown to obey Schrödinger's equation, and thus equivalence with the usual treatment was established, at least for systems with a finite number of degrees of freedom.
- In addition, proceeding via analogies with quantum mechanics can lead to some confusion (at least it did for me when I first learned the subject): the Klein-Gordon equation is often presented as the relativistic version of Schrödinger's equation (after discarding an equation involving a square root because of non-locality). Later it then turns out that the field it describes is not a wave function, as it cannot have a probability interpretation. The instructor will hope that this interpretation is soon forgotten, because it's really strange to think of the vector potential as the wave function of the photon, which would be natural from this perspective. And if the Klein-Gordon field is some sort of wave function, why does it need to be quantised again? So what kind of objects are the field operators, and what do they act on? In analogy with first quantisation one would guess they act on wave functionals that map field configurations on a Cauchy surface to complex numbers and that are square integrable in some functional integral sense. OK, Fock space does the job, but again, that's not obvious.
- All these complications are avoided using path integrals, at least once one gets one's head around these weird infinite dimensional integrals and the fact that in between we have to absorb infinite normalisation constants. But then, only a little bit later, one arrives at Feynman rules, and the partition function for the free field, for example, is a nice simple expression and all the strange integrals are gone (they have been performed in a Gaussian way).
- So instead of requantising an already (pseudo) quantum theory, I introduced the Klein-Gordon equation just as the classical equation of motion of a system which happens to have a continuum of degrees of freedom (I did it via the continuum limit of a "balls with springs" model). Thus, before getting into any fancy quantum business, we solved this field equation (including the phi^4 interaction) classically. Doing that perturbatively, we came up with Feynman rules (tree diagrams only, of course) and a particle-wave duality while still being entirely classical (a schematic version of this iteration is sketched after this list). As I am not aware of a book which covers Feynman diagrams from a classical perspective, I have written up some lecture notes for this part. They also include the discussion of kink solutions, which were an exercise in the course and which suggest the limitations of the perturbative approach and how solitonic objects have to be added by hand. (To be honest, advertising these lecture notes is the true purpose of this post... Please let me know your comments and corrections!)
- The other cut I decided to make was to restrict attention to the scalar field only. I did not discuss spinors or gauge fields. They are interesting subjects in themselves, but I decided to focus on features of quantum field theories rather than the representation theory of the Lorentz group. The Dirac equation is a nice subject by itself, and discussing gauge invariance leading to a kinetic operator which is not invertible (and thus requiring a gauge fixing and eventually ghosts to make it invertible) would have been nice, but there was no time. But as I said, there is a regular course on QED this semester where all these things will be covered.
- These severe cuts allowed us to get quite deep into the subject: when I took a QFT course, we spent the entire first semester discussing only free fields (spin 0, 1/2 and 1). Here, in this course, we managed to get to interacting fields in only two weeks, including the computation of 1-loop diagrams. We computed the self-energy correction and the fish graph (including Schwinger parameters, the Feynman trick and all that) and went through their dimensional regularisation and renormalisation (including a derivation of the important residues of the gamma function; see the master formula below). In the last lecture I could even sketch the idea of the renormalisation group, running coupling constants and why nature seems to use only renormalisable theories for particle physics (as the others have vanishingly small couplings at our scales).
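To give an idea of the classical perturbation theory mentioned above, here is a schematic version (signs and normalisations aside): with a phi^4 interaction the field equation is solved iteratively around a free solution, and each step of the iteration is a tree diagram,
$$(\Box+m^2)\,\phi=-\frac{\lambda}{3!}\,\phi^3\quad\Longrightarrow\quad \phi=\phi_0-\frac{\lambda}{3!}\,G*\phi_0^3+O(\lambda^2)\,,$$
where $\phi_0$ solves the free equation and $G$ is a Green's function of $\Box+m^2$; the $O(\lambda)$ term is precisely the tree-level vertex joining three free waves.

And the master formula behind the one-loop computations is of the type (Euclidean signature)
$$\int\!\frac{d^dp}{(2\pi)^d}\,\frac{1}{(p^2+\Delta)^n}\;=\;\frac{\Gamma\!\bigl(n-\tfrac{d}{2}\bigr)}{(4\pi)^{d/2}\,\Gamma(n)}\,\Delta^{\frac{d}{2}-n}\,,$$
whose poles as $d\to 4$ are the residues of the gamma function mentioned above.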
As far as books are concerned: for the preparation I used Ryder, my favourite QFT text, for large parts (and Schulman's book for the introduction to path integrals in quantum mechanics). Only later did I discover that Zinn-Justin's book has a very similar approach (at least if you ignore all material on fields other than spin 0 and all the discussions of critical phenomena). Only yesterday a copy of the new QFT book by Srednicki arrived on my desk (thanks, CUP!) and from what I have read there so far, this also looks extremely promising!
For your entertainment, I have also uploaded the exercise sheets here:
1 2 3 4 5
PS: If instead of learning QFT in two weeks you want to learn string theory in two minutes, check this out. I didn't know molecules were held together by the strong force, though...
Wednesday, September 19, 2007
The fun of cleaning up
Since early childhood I have hated cleaning up. Now that I am a bit older, however, I sometimes realise it has to be done, especially if other people are involved (visitors, flat-mates, etc.). See, however, this and this.
And yes, when I'm doing DIY or installing devices/cables/networks etc I am usually satisfied with the "great, it works (ahem, at least in principle)" stage.
But today, (via Terry Tao's blog) I came across The Planarity Game, which might have changed my attitude towards tidying up... Have fun!
Tuesday, September 18, 2007
Not quite infinite
Lubos has a memo where he discusses how physicists make (finite) sense of divergent sums like 1+10+100+1000+... or 1+2+3+4+5+... . The latter is, as string theorists know, of course -1/12, as for example explained in GSW. Their trick is to read that sum as the value at s=-1 of the zeta function $\zeta(s)=\sum_{n\ge 1}n^{-s}$ and to define that value via the analytic continuation of this expression, which is well defined only for real part of s>1.
Alternatively, he regularises the sum by introducing a damping factor. Then, in an obscure analogy with minimal subtraction, he throws away the divergent term and takes the finite remainder as the physical value.
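If the damping factor is $e^{-\epsilon n}$ (which is the kind of regulator I take to be meant here), the small-$\epsilon$ expansion reads
$$\sum_{n=1}^{\infty}n\,e^{-\epsilon n}\;=\;\frac{e^{-\epsilon}}{(1-e^{-\epsilon})^{2}}\;=\;\frac{1}{\epsilon^{2}}-\frac{1}{12}+O(\epsilon^{2})\,,$$
so discarding the $1/\epsilon^{2}$ term indeed leaves the famous $-1/12$.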
He justifies this by claiming agreement with experiment (here in the case of a Casimir force). This, I think, is however a bit too weak. If you rely on arguments like this, it is unclear how far they take you when you want to apply them to new problems where you do not yet know the answer. Of course, it is good practice for physicists to take calculational short-cuts. But you should always be aware that you are doing this, and it feels much better if you can say "This is a bit dodgy, I know, and if you really insist we could actually come up with a rigorous argument that gives the same result", i.e. if you have a justification up your sleeve for what you are doing.
Most of the time, when in a physics calculation you encounter an infinity that should not be there (of course, often "infinity" is just the correct result; questions like "how much energy do I have to put into the acceleration of an electron to bring it up to the speed of light?" come to mind), you are actually asking the wrong question. This could, for example, be because you made an idealisation that is not physically justified.
Some examples come to my mind: The 1+2+3+... sum arises when you try to naively compute the commutator of two Virasoro generators L_n for the free boson (the X fields on the string world sheet). There, L_n is given as an infinite sum over bilinears in a_k's, the modes of X. In the commutator, each summand gives a constant from operator ordering and when you sum up these constants you face the sum 1+2+3+...
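Schematically (for a single free boson, with modes obeying $[a_m,a_n]=m\,\delta_{m+n,0}$), the generators are
$$L_n=\frac{1}{2}\sum_{k\in\mathbb{Z}}a_{n-k}\,a_k\,,$$
and reordering the naive expression for $L_0$ so that creation operators stand to the left produces the constant $\tfrac{1}{2}\sum_{k>0}k$; the same constants appear when one computes $[L_n,L_{-n}]$ term by term, and that is where $1+2+3+\dots$ shows up.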
Once you have such an expression, you can of course regularise it. But you should be suspicious whether what you are doing is actually meaningful. For example, it could be that you can come up with two regularisations that give different finite results. In that case you had better have an argument to decide which is the better one.
Such an argument could be a way to realise that the infinity is unphysical in the first place: in the Virasoro example, one should remember that the L_n stand for transformations of the states rather than for observables themselves (outer vs. inner transformations of the observable algebra). Thus you should always apply them to states. But for a state that is a finite linear combination of excitations of the Fock vacuum, there are always only finitely many terms in the sum for L_n that do not annihilate the state. Hence, for each such state the sum is actually finite. The infinite sum is an illusion, and if you take a bit more care about which terms actually contribute you find a result equivalent to the -1/12 value. This calculation is the one you should actually have done, but the zeta function version is of course much faster.
My problem with the zeta function version is that to me (and to all the people I have asked so far) it looks accidental: I know of no extension of the argument that connects it to the rigorous calculation. From the Virasoro algebra perspective it is very unnatural to introduce s, as at least I know of no way to do the calculation with L_n and a_k with a free parameter s.
Another example is the infinities that arise in Feynman diagrams. These arise when you do integrals over all momenta p. There are of course the usual tricks to avoid these infinities. But the reason they work is that the integral over all p is unphysical: for very large p, your quantum field theory is no longer the correct description, and you should include quantum gravity effects or similar things. You should only integrate p up to the scale where these other effects kick in and then do a proper computation that includes those effects. Again, the infinity disappears.
If you have a renormalisable theory you are especially lucky: There you don't really have to know the details of that high energy theory, you can subsume them into a proper redefinition of your coupling constants.
A similar thing can be seen in fluid dynamics: the Navier-Stokes equation has singular solutions, much like Einstein's equations lead to singularities. So what shall we do with, for example, infinite pressure? Well, the answer is simple: the Navier-Stokes equation applies to a fluid. But the fluid equations are only an approximation valid at macroscopic scales. If you look at small scales you find individual water molecules, and this discreteness is what saves you from actually encountering infinite values.
There is an approach to perturbative QFT developed by Epstein and Glaser and explained for example in this book that demonstrates that the usual infinities arise only because you have not been careful enough earlier in your calculation.
There, the idea is that your field operators are actually operator valued distributions and that you cannot always multiply distributions. Sometimes you can, if their singularities (the places where they are not a function but really a distribution) are in different places or in different directions (in a precise sense) but in general you cannot.
The typical situation is that what you want to define (for example delta(x)^2) is still defined for a subset of your test functions. For example, delta(x)^2 is well defined for test functions that vanish in a neighbourhood of 0. So you start with a distribution defined only for those test functions. Then you want to extend that definition to all test functions, even those that are non-zero around 0. It turns out that if you restrict the degree of divergence (the maximum number of derivatives acting on delta; this will later turn out to be related to the superficial scaling dimension) to be below some value, there is a finite dimensional solution space to this extension problem. In the case of phi^4 theory, for example, the two point distribution is fixed up to a multiple of delta(x) and a multiple of the d'Alembertian of delta(x); the solution space is two dimensional (if Lorentz invariance is taken into account). The two coefficients have to be fixed experimentally and are of course nothing but the mass and wave function renormalisation. In this approach the counter terms are nothing but ambiguities of an extension problem for distributions.
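A toy version of this extension problem in one dimension (just to illustrate the mechanism, not the actual phi^4 case): $1/|x|$ defines a distribution on test functions with $\varphi(0)=0$ via $\int \varphi(x)\,dx/|x|$, and one possible extension to all test functions is
$$\langle u,\varphi\rangle\;=\;\int \frac{\varphi(x)-\varphi(0)\,\theta(1-|x|)}{|x|}\,dx\,;$$
any two extensions compatible with the scaling restriction differ by a multiple of $\delta(x)$, and that multiple is the analogue of a renormalisation constant.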
It has been shown in highly technical papers that this procedure is equivalent to BPHZ renormalisation and to dimensional regularisation, and thus it's safe to use the physicist's short-cuts. But it's good to know that the infinities that one cures could have been avoided in the first place.
My last example is of a slightly different flavour: recently I have met a number of mathematical physicists (i.e. mathematicians) who work on very complicated theorems about what they call stability of matter. What they are looking at is the quantum mechanics of molecules in terms of a Hamiltonian that includes a kinetic term for the electrons and Coulomb potentials for the electron-electron and electron-nucleus interactions. The positions of the nuclei are external (classical) parameters, and usually you minimise the energy with respect to them. What you want to show is that the spectrum of this Hamiltonian is bounded from below. This is highly non-trivial, as the Coulomb potential alone is not bounded from below (-1/r becomes arbitrarily negative) and you have to balance it against the kinetic term. Physically, you want to show that you cannot gain an infinite amount of energy by throwing an electron into the nucleus.
Mathematically, this is a problem about complicated PDEs, and people have made progress using very sophisticated tools. What is not clear to me is whether this question is really physical: it could well be that it arises from an over-simplification. The nuclei are not point-like, and thus the true charge distribution is not singular; the physical potential is therefore not unbounded from below. In addition, if you are worried about high energies (as would be around if the electron fell into a nucleus), the Schrödinger equation would no longer be valid and would have to be replaced by a Dirac equation, and then of course the electro-magnetic interaction should no longer be treated classically and a proper QED calculation should be done. Thus if you are worried about what happens to the electron close to the nucleus in Schrödinger theory, you are asking an unphysical question. What could still be a valid result (and it might look very similar to a stability result) is to show that you do not really leave the area of applicability of your theory, as the kinetic term prevents the electrons from spending too much time very close to the nucleus (classically speaking).
What is shared by all these examples is that some calculation of a physically finite quantity encounters infinities that have to be treated, and I have tried to show that those typically arise because earlier in your calculation you were not careful and stretched an approximation beyond its validity. If you had taken that into account there wouldn't have been an infinity, but possibly a much more complicated calculation. And in lucky cases (similar to the renormalisable situation) you can get away with ignoring these complications. However, you can sleep much better if you know that there would have been another calculation without infinities.
Update: I have just found a very nice text by Terry Tao on a similar subject to "knowing there is a rigorous version somewhere".
Thursday, August 16, 2007
Not my two cent
Not only should theoretical physicists be able to estimate any number (or at least its exponent), we also feel that we can say something intelligent about almost any topic, especially if it involves numbers. So today, I will give economics a shot.
As a grad student, I had proposed to a friend (a fellow string theory PhD student) that with about three months of study it should be possible to publish a research paper in economics. That was most likely complete hubris and I never made the effort (but I would like to point out that my best cited paper (364 citations and counting) was written after only three months as a summer student starting from scratch in biophysics (but of course in great company who, however, at that point were also biophysics amateurs)).
About the same time, I helped a friend with the math (mostly linear inequalities) for his thesis in macro-economics, only to find out that his contribution to money market theory was the introduction of a new variable which he showed to be too important to neglect but which unfortunately is not observable... (It was about the amount of a currency that sits not in its home country but somewhere else, which makes the central bank underestimate the relative change when it issues a certain absolute amount of that currency into the market. For example, about two thirds of the US$760 billion are estimated to be overseas, and according to Wikipedia the US dollar is even the official currency in a number of countries other than the US.)
Economics is great for theoretical physicists, as large parts of it are governed by a Schrödinger equation missing an i (a.k.a. the diffusion equation or the Black-Scholes equation), and thus path integral techniques come in handy when computing derivative prices. However, it's probably the deviations from Black-Scholes where the money is made, as I learned from a nice book written by ex-physicists who now make money by telling other people how to make money.
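To make the 'diffusion equation' remark concrete, here is a minimal sketch (my own toy example with made-up parameters, not taken from that book): a European call priced once with the closed-form Black-Scholes formula and once as a path-integral-style Monte Carlo average of the discounted payoff over risk-neutral paths. The two numbers agree up to the Monte Carlo error.

```c
/* Toy comparison: closed-form Black-Scholes price vs. a Monte Carlo
   "sum over paths".  All parameter values are made-up illustration values.
   Compile with: cc blackscholes.c -lm */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define PI 3.14159265358979323846

static double gauss(void) {                       /* Box-Muller */
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
}

static double Phi(double x) {                     /* cumulative normal */
    return 0.5 * erfc(-x / sqrt(2.0));
}

int main(void) {
    double S0 = 100.0, K = 105.0, r = 0.03, sigma = 0.2, T = 1.0;
    int N = 1000000;

    /* closed-form price of a European call */
    double d1 = (log(S0 / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrt(T));
    double d2 = d1 - sigma * sqrt(T);
    double exact = S0 * Phi(d1) - K * exp(-r * T) * Phi(d2);

    /* Monte Carlo: average the discounted payoff over geometric
       Brownian motion paths under the risk-neutral measure */
    double sum = 0.0;
    for (int i = 0; i < N; i++) {
        double ST = S0 * exp((r - 0.5 * sigma * sigma) * T
                             + sigma * sqrt(T) * gauss());
        sum += (ST > K) ? ST - K : 0.0;
    }
    double mc = exp(-r * T) * sum / N;

    printf("closed form: %.4f   Monte Carlo: %.4f\n", exact, mc);
    return 0;
}
```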
Of course this is a bit worrying: Why do consultants consult rather than make money directly? This is probably connected with my problem of understanding economic theory at stage one: All these derivations start out with the assumption that prices are fair and that there cannot be arbitrage, which is just a fancy way of saying that you cannot make a profit, or at least that prices immediately equalise so that you make the same profit with whatever you buy. If there is a random element involved, this applies to the expectation value, and the only thing that varies or that you can influence is the variance. This just means that you cannot expect to make a profit. So why bother?
There are however at least four possibilities to still make profit:
- You counsel other people on how to make money and charge by the hour. Note that you get your money even if your advice was wrong, and of course it can be hard to tell that your advice was wrong: If you suggest playing roulette, always betting on red and doubling the stake after every loss, most people following these instructions will make (small amounts of) money. Too bad that a few people (assuming the table limit is high enough) will have big losses; a small simulation of this doubling strategy follows after this list. But in a poll many people will be happy with your advice. You don't even have to charge by the hour, you can sell your advice with a full money-back guarantee; that way you participate in the winnings but not in the losses, and that's already enough.
- You could actually produce something (even something non-material) and convert your resources (including your time and effort) into profit. But that's surplus and old fashioned. Note that, at least infinitesimally, your profit at time t is proportional to the economic activity A(t), i.e. as long as there is demand, the more sausages the butcher produces the more money he makes.
- You trade for other people in the money market and receive a commission per transaction. As transactions are performed when the situation changes, you will make a profit proportional to the absolute value of the time derivative of A(t). Thus you have an interest in the situation not being too stable and stationary. This includes banks and rating agencies and many more.
- Finally, there is the early bird strategy: You get hold of a commodity (think: shares in a dot-com company or high-risk mortgages) and then convince other people that this commodity is profitable so that they will buy it as well. The price goes up (even if the true value is constant or zero) and indeed the people early in the game make profits. Of course, if the true value is zero these profits are paid by the people who join too late, as in any other pyramid scheme or chain letter. The core of all these schemes is, as Walter Kunhardt pointed out to me:
Give me $100. Then you can ask two other people to give you $100.
Of course, people following strategy three above like it if there is some activity of this type going on...
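Here is the little simulation of the doubling strategy promised in the first item, a sketch with invented parameters (European wheel, a table limit of 1000 units, 200 spins per evening): typically the majority of evenings end with a small plus, a few end with a catastrophic minus, and the average is of course negative, as the house edge demands.

```c
/* Martingale toy simulation: bet on red, double the stake after every loss.
   Table limit, session length and number of players are invented values. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int sessions = 100000;   /* simulated players */
    const int rounds   = 200;      /* spins per evening */
    const int limit    = 1000;     /* table limit per bet */
    int winners = 0;
    double total = 0.0;

    for (int s = 0; s < sessions; s++) {
        double money = 0.0;        /* profit/loss relative to start */
        int bet = 1;
        for (int r = 0; r < rounds; r++) {
            int red = (rand() % 37) < 18;   /* European roulette: p = 18/37 */
            if (red) { money += bet; bet = 1; }
            else     { money -= bet; bet *= 2; if (bet > limit) bet = 1; }
        }
        if (money > 0) winners++;
        total += money;
    }
    printf("fraction of winning evenings: %.3f\n", (double)winners / sessions);
    printf("average result per evening:   %.2f\n", total / sessions);
    return 0;
}
```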
Thursday, August 09, 2007
Julius Wess 1934-2007
Just got an email from Hermann Nicolai:
Dear All,
this is to inform you of the passing away of Julius Wess who
was a teacher and friend to many of us. His untimely death (at the age of 72) is particularly tragic in view of the fact that he
would have been a sure candidate for the Nobel Prize in physics if supersymmetry
is discovered at LHC. We will always remember him as a great physicist
and human being.
Monday, August 06, 2007
Giraffes, Elephants and other scalings
It's not the first time I am blogging about the (as I find, amazing) fact that with some simple scaling arguments you can estimate quite a number of things without knowing them a priori. You could even argue that this is a core competence of the theoretical physicist: If you consider yourself one, you should be able to guesstimate any number and at least get the order of magnitude right. I have been told of job interviews for business consultant positions where the candidates were asked how many bricks the Empire State Building was built from, and it's the physicists who usually are quite good at this.
Today, on the arxiv, Don Page gives some more examples of such calculations which I find quite entertaining (even if Lubos argues that they are too anthropocentric and apparently does not understand the concept of an order of magnitude calculation, where one sets not only h=G=c=1 but also 2=pi=1 (ever tried this in Mathematica and done further calculations?)): Page aims to compute the size of the tallest land animals from first principles and gets it basically right.
The basic argument goes like this: First you assume that chemistry (i.e. the science of molecules) is essential for the existence and dynamics of the animals. Then the mass of the electron and the fine structure constant give you a Rydberg, which is the typical energy scale for atoms (and, via the Bohr radius and the mass of a proton, gives you estimates for the density of both planets and animal bodies). Molecular excitation energies are down by a factor of proton over electron mass. This sets the typical temperature: It should not be so high that all molecules fly apart, but still be warm enough that not all molecular dynamics freezes out.
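Just to check the scales quoted in the last paragraph numerically, here is a little back-of-the-envelope program (my own sketch, not Page's actual calculation):

```c
/* Order-of-magnitude check of the atomic scales used in the argument above.
   SI constants rounded; this is a sketch, not Page's calculation. */
#include <stdio.h>
#include <math.h>

int main(void) {
    const double alpha = 1.0 / 137.036;   /* fine structure constant */
    const double me    = 9.109e-31;       /* electron mass [kg] */
    const double mp    = 1.673e-27;       /* proton mass [kg] */
    const double c     = 2.998e8;         /* speed of light [m/s] */
    const double hbar  = 1.055e-34;       /* [J s] */
    const double kB    = 1.381e-23;       /* [J/K] */
    const double eV    = 1.602e-19;       /* [J] */

    double Ry   = 0.5 * alpha * alpha * me * c * c;   /* Rydberg energy */
    double a0   = hbar / (alpha * me * c);            /* Bohr radius */
    double Emol = Ry * me / mp;                       /* molecular scale */
    double rho  = mp / (a0 * a0 * a0);                /* ~ density of condensed matter */

    printf("Rydberg:         %.1f eV\n", Ry / eV);
    printf("Bohr radius:     %.2e m\n", a0);
    printf("molecular scale: %.1e eV  ~  %.0f K\n", Emol / eV, Emol / kB);
    printf("atomic density:  %.0f kg/m^3\n", rho);
    return 0;
}
```

The Rydberg comes out at 13.6 eV and the molecular scale at a few meV, i.e. of the order of a hundred Kelvin, which is indeed within an order of magnitude of the temperatures at which chemistry-based life operates.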
From this, together with the assumption that at this temperature the thermal energies of atmospheric gases should not, on a large scale, exceed their gravitational binding energies to the planet, you get an estimate of the size of the planet and of the gravity there. The final step is to make sure either that the animals do not break whenever they fall, or that they do not overheat when they move, or that gravity can be overcome so that all parts of the body can be reached by blood (this is where the giraffes come in).
Of course these arguments assume that some facts about animals are not too different from what we find here (and some assumptions, namely the pressure argument and the argument about falling, do not hold if everything happens within a liquid, which is why whales can be much bigger than land animals), but still I find it very interesting that one can "prove" why we are not much smaller or larger.
There is a very entertaining paper which makes similar arguments just the other way round (the title is misleading, it's really about physics rather than biology): It argues why things common in B movies (people or animals much too large or too small) would not work in real life: King Kong, for example, would immediately break all his bones if he took a single step. On the other hand, if we were a bit smaller, we could fall from any height, as the terminal velocity would be much smaller. But at the same time the surface tension of water would pose severe problems with drinking.
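To put a (rough) number on the terminal velocity part: for geometrically similar bodies the mass grows like L^3 and the cross section like L^2, so the terminal velocity grows like the square root of the linear size L. A tiny sketch, normalised to an assumed human value of about 55 m/s:

```c
/* Terminal velocity scaling v ~ sqrt(L) for geometrically similar bodies,
   normalised to a human (L ~ 1.7 m, v ~ 55 m/s; both assumed round numbers). */
#include <stdio.h>
#include <math.h>

int main(void) {
    const double L_human = 1.7, v_human = 55.0;   /* assumed reference values */
    const double sizes[] = {0.005, 0.05, 1.7, 10.0};
    const char  *names[] = {"ant", "mouse", "human", "King Kong"};

    for (int i = 0; i < 4; i++) {
        double v = v_human * sqrt(sizes[i] / L_human);  /* v proportional to sqrt(L) */
        printf("%-9s L = %6.3f m  ->  terminal velocity ~ %5.1f m/s\n",
               names[i], sizes[i], v);
    }
    return 0;
}
```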
I would recommend this paper especially to the author of an article in this week's "Die Zeit" about nano scale machines, which reports among other things about a nano-car with four wheels made of bucky balls. Understanding how things change when you try to scale them down shows how the whole concept of wheels and rolling does not make sense at very small scales: First of all, Brownian motion poses real threats, and the roughness of surfaces at the atomic scale would make any ride very bumpy (the author mentions these two things). But what I think is much more important is that gravity is completely negligible: your nano car would either float in the air or be glued to the surface by electrostatic forces (which, for example, cause most of the experimental headaches for people building submillimetre Cavendish pendulums to check the 1/r law or its modifications due to large extra dimensions), and both perspectives are not compatible with wheels and rolling.
So there are good reasons why we are between one and two meters tall and why our engines and factories are not much smaller.
Tuesday, July 24, 2007
Kids and Computers
Via Mark Jason Dominus' blog I learned about this paper: The Camel has two humps. It's written by computer science professors who wonder why, independent of the teaching method used (and of the programming language paradigm), there seems to be a constant fraction of students who after an introductory course are not able to program a computer.
They claim this is not strongly correlated with intellectual capacity or grades, for example in math. However, what they present is a simple predictor of success in an introductory course on computer programming: Even before the course starts, and assuming that the students have no prior knowledge of programming, you give them a number of problems of the following type:
int a=20;
int b=30;
a=b;
What are the new values of a and b?
The important thing is how to analyse the answers: The students, not having been taught the actual meaning of the programming language, have several possibilities: Either they refuse outright to answer these problems as they do not know the answer. Or they guess. Given the way the question is phrased, they might guess that the equal sign is not a logical operator but some sort of assignment. They still do not know how it works exactly, and there are several possibilities: right to left (the correct one), left to right, some kind of shift that leaves the originating variable 'empty', or some add-and-assign procedure. It doesn't matter which possibility the students decide on; what counts (and this they are not told) is whether they stick to one interpretation across the problems. According to the paper, the final grades of both groups, the consistent and the inconsistent students, follow a Gaussian distribution, but with the consistent students in the region of good marks and the inconsistent students in the fail region.
This brings me to my topic for this post: different philosophies of approaching a computer. Last week, I (for the first time in my life, my previous computer experience being 100% self-study and talking to more experienced users one to one) had to sit through a computer course that demonstrated the content management system (CMS) that LMU websites have to use. It started out with "you have to double click on the blue 'e' to go to the internet" but it got better from there. The whole thing took three hours and wasn't too painful (though not exactly efficient) and finally got me the required account, so I can now post web pages via the CMS. The sad thing about this course was that obviously this is the way many people use computers and software: They are told in a step-by-step manner how to do things and eventually they can perform these steps. In other words, they act like computers themselves.
The problem with this approach of course is that the computer will always stay some sort of black box which is potentially scary and you are immediately lost once something is not as expected.
I think the crucial difference comes once you have been programming yourself. Of course, it is not essential to have written your own little office suite to be able to type a letter in Word, but very often I find myself thinking "if I had written this program, how would I have done it and how would I want the user to invoke this functionality?". This kind of question comes in especially handy when determining what information (in the form of settings and parameters) I have to supply to the computer so that it can complete a certain task. Having some sort of programming experience also helps when you need to find out why the computer is not doing what you expect it to do, some generalised version of debugging: dividing the problem into small parts, checking whether they work, trying alternative ways, etc.
This I consider the most basic and thus most important part of IT literacy, much more fundamental than knowing how to convert a table of numbers into a pie chart in Excel or how to format a formula in TeX (although that can come close, as TeX is Turing complete... but at least you have to be able to define macros etc.). You cannot start early enough with these skills. When you are still a kid you should learn how to write at least a number of simple programs.
20 years ago that was simple: The first computer I had under my fingers (I didn't own one but my friend Rüdi did; mine came later, as my dad had the idea of buying a home assembly kit for an early 68k computer that took months to get going) greeted you with "38911 BASIC BYTES FREE" when you turned it on. Of course you could play games (and many of my mates did only that), but still the initial threshold was extremely low to start out with something along the lines of
10 PRINT "HELLO WORLD": GOTO 10
With a computer running Windows this threshold is much higher: Yes, you have a GUI and can move the mouse, but how can you get the stupid thing to do something slightly non-trivial?
For Linux the situation is slightly better: There the prompt comes naturally, and soon you will start putting several commands into a file to execute, and there is your first shell script. Plus, C and Perl and the like come preinstalled, so the way to the first "Hello world" is not that long.
So parents, if you read this: I think you really do your kids a big favour in the long run if you make sure they get to see a prompt on their computer. An additional plus is of course that Linux runs much better on dated (i.e. second-hand) hardware. Let them play games, no problem, just make sure programming is an option that is available.
And yes, even C can be a first programming language, although all those core dumps can be quite frustrating (of course Perl is much better suited for this, as you can use it like the BASIC of the old days). My first C compiler ran on my Atari ST (after my dad was convinced that with the home-built one we wouldn't get very far), which back then (1985) had only a floppy drive (10 floppies in a pack for 90 DM, roughly 50$) but 1MB RAM (much, much more than the Commodore 64s of those days and nearly twice as much as PCs), so you could run a RAM disk. I had a boot disk that copied the C compiler (and editor and linker etc.) into that RAM disk and off you went with the programming. The boot-up procedure took up to five minutes and had to be repeated every time your code core dumped because you had gotten some stupid pointers wrong. Oh, happy days of the past...
Thursday, June 28, 2007
Near encouters and chaotic spectral statistics
Yesterday, Fritz Haake gave an interesting talk in the ASC colloquium. He explained how the statistics of energy levels observed in classically chaotic systems can be understood.
Classically, it is a characterisation of chaotic behaviour that if you start with similar initial conditions, their distance will grow exponentially over time. This is measured by the Lyapunov exponent. Quantum mechanically, the situation is more complicated, as the notion of paths is no longer available.
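Before going on to the quantum side: the simplest place to see a positive Lyapunov exponent is not a billiard but a one-dimensional toy map (my example, not from the talk). For the logistic map x -> r x (1-x) at r=4 the exponent, defined as the time average of ln|f'(x_n)|, is known to be ln 2, and a few lines of code reproduce that:

```c
/* Lyapunov exponent of the logistic map x -> r x (1-x) at r = 4, estimated
   as the time average of ln|f'(x_n)|.  The exact value is ln 2 ~ 0.6931. */
#include <stdio.h>
#include <math.h>

int main(void) {
    const double r = 4.0;
    double x = 0.1234;            /* arbitrary initial condition */
    const long N = 1000000;
    double sum = 0.0;

    for (long n = 0; n < N; n++) {
        sum += log(fabs(r * (1.0 - 2.0 * x)));   /* |f'(x)| = |r (1 - 2x)| */
        x = r * x * (1.0 - x);
    }
    printf("Lyapunov exponent: %.6f   (ln 2 = %.6f)\n", sum / N, log(2.0));
    return 0;
}
```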
However, it had been noticed quite some time ago that if you quantise a classically chaotic system, the energy levels show characteristic statistics: It's not the individual energy levels that matter, you have to consider the differences between neighbouring levels. If the levels were random, the differences would be Poisson distributed (for a fixed density of states). However, what one observes is a Wigner-Dyson distribution: It starts out like (E-E')^n for some small integer n (which depends on the symmetry of the system) before it falls off exponentially. This is just the distribution one obtains in random matrix theory (where n depends on the ensemble of matrices: orthogonal, unitary or symplectic). This distribution is supposed to be characteristic for chaos and does not depend (beyond the universality classes) on the specific system.
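The shape of that distribution is already visible in the simplest random matrix model, a 2x2 real symmetric (GOE) matrix, for which the spacing of the two eigenvalues follows the Wigner surmise P(s) = (pi/2) s exp(-pi s^2/4). A small sketch (my illustration, not part of the talk):

```c
/* Level spacings of 2x2 GOE matrices ((a b),(b d)) with a,d ~ N(0,1) and
   b ~ N(0,1/2).  The spacing is sqrt((a-d)^2 + 4 b^2), with mean sqrt(pi);
   after rescaling to unit mean spacing the histogram follows the Wigner
   surmise P(s) = (pi/2) s exp(-pi s^2/4), which vanishes linearly at s = 0
   ("level repulsion") instead of being Poissonian. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define PI   3.14159265358979323846
#define BINS 20

static double gauss(void) {                       /* Box-Muller */
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
}

int main(void) {
    const int N = 200000;
    const double smax = 3.0;
    int hist[BINS] = {0};

    for (int i = 0; i < N; i++) {
        double a = gauss(), d = gauss(), b = gauss() / sqrt(2.0);
        double s = sqrt((a - d) * (a - d) + 4.0 * b * b) / sqrt(PI);
        int k = (int)(s / smax * BINS);
        if (k >= 0 && k < BINS) hist[k]++;
    }
    for (int k = 0; k < BINS; k++) {
        double sk = (k + 0.5) * smax / BINS;
        double wigner = 0.5 * PI * sk * exp(-0.25 * PI * sk * sk);
        printf("s = %.2f   simulated %.3f   Wigner surmise %.3f\n",
               sk, hist[k] * BINS / (smax * (double)N), wigner);
    }
    return 0;
}
```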
In the colloquium, Haake now explained the connection between a positive Lyapunov exponent and these level statistics.
Let us assume for simplicity that the hypersurfaces of constant energy in phase space are compact. This is, for example, the case for billiards, the toy systems of the chaos people: You draw some wall in hyperbolic space and study free motion with reflections at this wall. Now you consider very long periodic orbits (it's another property of chaotic systems that these exist). Because there is not too much room in the constant energy surface, there will be a number of points where the periodic orbit nearly self-intersects (it cannot exactly self-intersect, as the equation of motion in phase space is first order). You can think of the periodic orbit as starting from such an encounter point, doing some sort of loop, coming back and leaving along the other loop.
Now, there is a nice fact about chaotic systems: For these self-encounters there is always a nearby periodic orbit which is very similar along the loops but which connects the loops differently at the self-encounter. Here is a simple proof of this fact: The strong dependence on initial conditions is just the same as stability of the boundary value problem. Let's ask which classical paths of the system there are such that x(t0)=x0 and x(t1)=x1. If you now vary x0 or x1 slightly, the solution will only vary a tiny bit, and the variation is exponentially small away from the endpoints x0 and x1! This is easy to see by considering a midpoint x(t) for t0<t<t1: The path has some position and velocity there. Because of the positive Lyapunov exponent, if you vary position or velocity at t, the end-points of the path will vary exponentially. Counting dimensions, you see that an open set of positions and velocities at t maps to an exponentially larger open set of x0 and x1. Thus, 'normal' variation at the end-points corresponds to exponentially small variation of mid-points.
Now you treat the point of near self-encounter of the periodic orbit as a boundary point of the loops and move it a bit to reconnect them differently, and you see that the change of the path along the loops is exponentially small.
Thus for a periodic orbit with n l-fold self-encounters, there are (l!)^n nearby periodic orbits that differ essentially only by reconnections at the self-encounters. This was the classical part of the argument.
On the quantum side, instead of the energy difference between adjacent levels (which is complicated to treat analytically) one should consider the two-point correlation of the density of states. This can be Fourier transformed to the time domain, and for this Fourier transform there is a semiclassical expression coming from path integrals in terms of sums over periodic orbits. The two point correlation then receives contributions from correlations between pairs of periodic orbits. The leading behaviour (as was known for a long time) is determined by the correlation of a periodic orbit with itself.
The new result is that the sub-leading contributions (which sum up to the Wigner-Dyson distribution) can be computed from the combinatorics of a periodic orbit and its correlations with the other periodic orbits obtained by reconnecting at the near-encounter points.
If you want to know the details, you have to look at the papers of Haake's group.
Another approach to these statistics is via the connection of random matrix theory to non-linear sigma models (as string theorists know). Haake claims that the combinatorics of these reconnections is in one to one correspondence to the Feynman diagrams of the NLSM perturbation theory although he didn't go into the details.
BTW, I just received a URL for the videos from Strings 07 for us Linux users which had problems with the files on the conference web page.
Thursday, May 24, 2007
Shameless promotion
Update: Due to some strange web server configuration at LMU, people coming from a blogspot.com address were denied access to the TMP web pages. This should be fixed now.
By now, I have settled a bit in Munich (and yes, I like beer gardens), found a flat to move into in a week and started my new job as scientific coordinator of a new graduate course in theoretical and mathematical physics. There are many new things to learn (for example how to interact with the university's lawyer to come up with documents defining examination and application procedures for the course which satisfy both the scientists and the legal department) and it's quite exciting. The only downside is that right now, as we have to get things going, I have not actively done any physics in the past three weeks.
But today, Wolfgang, my collaborator from Erlangen and former office mate from Bremen comes to visit for two days and we hope to put some of the finishing touches on our entropy project. And yes, Jiangyang, I have not forgotten you and our non-commutative project and will restart working on it very soon. Promise!
What I wanted to talk about is that yesterday, the web page for the Elite Graduate Course in Theoretical and Mathematical Physics went on-line. A lot of things there are still preliminary but we wanted to get as much information out as soon as possible as the deadline (July 15th) for applications for the course starting in fall is approaching fast.
So if you are interested in theoretical physics (including quantum field theories and strings, but not exclusively; there are courses in condensed matter theory and statistical physics/maths as well) and are looking for a graduate school, you should definitely consider us.
Or if you know somebody in that situation, please tell him/her about our program!
I think, at least in Europe, this program is quite unique: It is a very demanding course offering classes in a number of advanced topics of current interest which in a very short time bring students up to the forefront of research. It hinges on the fact that the Munich area, with its two universities (LMU and TUM) and several Max Planck institutes, plus Erlangen university, has an exceptionally large number of leading researchers who teach courses in their areas of specialisation. The program is run jointly by the physics and math departments, and several classes will be taught jointly by a mathematician and a physicist, so students can obtain a wide perspective on topics at the intersection of these disciplines.
In addition to the courses on the web page which are scheduled on a regular basis, there will be a large number of smaller courses on topics of recent interest or more specialised subjects to be decided on close to the time when they will be given.
Furthermore, it is planned (and there are reserved slots in the schedule) to have lectures given by visiting scientists, adding expertise complementing the local one. So if you are reading this and are too far along in your career to apply for a graduate course, but have an idea for an interesting lecture course (one that could, for example, be given at a summer school) that you could teach, and would fancy visiting Munich (I mentioned the beer gardens above), please do get in touch with me! We do have significant money to make this possible.
Friday, April 27, 2007
Packing again
I started this blog two and a half years ago when I had just moved to Bremen and discovered (not too surprisingly) that IUB is not as busy physics wise as DAMTP had been. I wanted to have some forum to discuss whatever crossed my mind and what I would have bored the other people at the morning coffee/tea meetings in Cambridge with.
Now my time here is up and once again I have put nearly all my life into moving boxes. I am about to return my keys, and tomorrow I will be heading to Munich, where next week I will start my new position as "Scientific Coordinator" of a new (to be started in autumn) 'elite' master course in theoretical and mathematical physics chaired by Dieter Lüst.
This promises to be quite an exciting and attractive course which teaches many interesting subjects of choice, ranging from condensed matter theory to QFT/particles and string theory. It will be run by the math and physics departments of both Munich universities, joined by other places like Erlangen and people from the Max-Planck-Institut für Physik (Heisenberg Institute).
So if you are a student about to graduate (or obtain a Vordiplom) with strong interests in theoretical and mathematical physics you should seriously consider applying there!
I will be taking care of all kinds of organisational stuff and admin for this course and will hopefully still have some time for actual physics (as I was promised). In any case, I am looking forward to joining the big Munich string community (parts of which I know from earlier times, like in Berlin) and hope that moving to another (this time: high-price) city will turn out well.
Wednesday, April 04, 2007
Causality issues
Yesterday, I was reading a paper by Ellis, Maartens and MacCallum on "Causality and the Speed of Sound" which made me rethink a few things about causality which I thought I knew and which I now would like to share. See also an old post on faster than light communication.
First of all, there is the connection between causality and special relativity: It is a misconception that a relativistic theory is automatically causal. Just because you contracted all Lorentz indices properly does not mean that in your theory there is no propagation faster than light. There is an easy counter-example: Take the Lagrangian L = f(X), where f is a smooth function (actually quadratic is enough for the effect) of the usual kinetic term X = d_mu phi d^mu phi of a scalar phi. I had already typed up a brief discussion of this theory, but then I realised that this might actually be the basis of a nice exam problem (hey guys, are you reading this???) for the QFT course I am currently teaching. So just a sketch at this point: The equation of motion allows for solutions of the form phi = V_mu x^mu with a constant vector V, and when you expand small fluctuations around this solution you see that they propagate with an adjustable speed of sound depending on f and V.
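For the record, here is the standard computation for a Lagrangian of this type in my own notation (a sketch; I am assuming the Lagrangian meant above is indeed L = f(X)): linearising the equation of motion around the background phi = V·x, the fluctuations see an effective metric built from f' and f'',

$$
\big[\,f'(X)\,\eta^{\mu\nu} + 2 f''(X)\,V^\mu V^\nu\,\big]\,\partial_\mu\partial_\nu\,\delta\phi = 0,
\qquad X = V_\mu V^\mu ,
$$

and for timelike V the speed of sound of the fluctuations is

$$
c_s^2 = \frac{f'(X)}{f'(X) + 2X f''(X)} ,
$$

which exceeds the speed of light whenever X f''(X) < 0 (with signs such that both coefficients in the wave operator are positive). For linear f this reduces to the ordinary massless wave equation, so a genuinely non-linear f is needed for the effect.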
Obviously, this theory is Lorentz invariant, it's only the solution which breaks this invariance (as most interesting solutions of field theories do).
The next thing is how you interpret this result: For suitably chosen V and f you can communicate with space-time points which are space-like to you. So is that really bad? If you think about it (or read the above mentioned paper) you find that this is not necessarily so: You really only get into trouble with causality if you have the possibility to call yourself in the past and tell yourself the lottery numbers of a drawing that lies in the future of your past self.
If you can communicate with space-like points, this can happen: If you send a signal faster than the speed of light to a point P which is space-like to you, then from there it can be sent to your past, part of which is again space-like to P. If the sender at P (a mirror is enough) is moving, the speed of communication (as measured by the respective sender) only has to be infinitesimally faster than the speed of light (if the whole set-up is Lorentz invariant).
In the theory above, however, this cannot happen: Communication using the fluctuations of the field phi always goes to the future as defined by the flow of the vector field V (which we assume to be time-like and future directed). Thus you cannot send signals to points which are upstream in that flow, and all of your past is upstream. And using light (according to the usual light-cones) does not help either.
This only changes if you have two such fields with superluminal fluctuations: Then you can use one field's fluctuations to send to P (which has to be downstream for that field) and the other field to send the signal from P to your past. So strictly speaking, only if you have two such fields is there potential for sci-fi stories or get-rich-fast (or actually: in the past) schemes. But who stops you from having two such fields if one is already around?
At this point, it might be helpful to formalise the notion of "sending signals" a bit further. This also helps to better understand the various notions of velocity which are around when you have non-trivial dispersion relations: As an undergrad you learn that there is the phase velocity, which is w/k, and that there is the group velocity dw/dk, but at least to me nobody really explained why the latter is important. It was only claimed that it is this velocity which is responsible for signal propagation.
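A standard example that makes the distinction concrete (my addition, not specific to the paper discussed here) is the massive Klein-Gordon field: with c=1 the dispersion relation gives

$$
\omega(k)=\sqrt{k^2+m^2},\qquad
v_{\rm ph}=\frac{\omega}{k}=\sqrt{1+\frac{m^2}{k^2}}>1,\qquad
v_{\rm g}=\frac{d\omega}{dk}=\frac{k}{\sqrt{k^2+m^2}}<1,\qquad
v_{\rm ph}\,v_{\rm g}=1,
$$

so the phase velocity exceeds the speed of light while the group velocity, which is what matters for the propagation of wave packets and Cauchy data, stays below it.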
Anyway, what you probably really want is the following: You have some hyperbolic field equation which you solve for some Cauchy data. Then you change the Cauchy data in a compact region K and solve again. Hopefully, the two solutions differ only in the region causally connected to K. For this, it is the highest derivative term in the field equation (the leading symbol) which matters, and if you Fourier transform you see that this is governed by the group velocity.
Formulating this "sending to P and back" procedure is a bit more complicated. My suspicion is that it's like the initial value problem when you have closed time-like loops: Then not all initial data is consistent: If my time fore example is periodic with a period of one year I should only give initial data which produces a solution with the same periodicity. But how exactly does this work for the two superluminal fields?
There is one further complication: If gravity is turned on and I have to give initial data for it as well, things get a lot more complicated, as the question of whether a point with given coordinates is space-like to me depends on the metric. But my guess would be that changes in the metric also propagate at most with the speed of light of the reference metric.
And finally, there is the problem that the theory above (for non-linear f) is a higher derivative theory. Thus the initial value problem in that theory is likely to require more than phi and its time derivative to be given on the Cauchy surface.
First of all, there is the connection between causality and special relativity: It is a misconception that a relativistic theory is automatically causal. Just because you contracted all Lorentz indices properly does not mean that in your theory there is no propagation faster than light. There is an easy counter-example: Take the Lagrangian . where f is a smooth function (actually quadratic is enough for the effect) of the usual kinetic term of a scalar phi. I have already typed up a brief discussion of this theory but then I realised that this might actually be the basis of a nice exam problem (hey guys, are you reading this???) for the QFT course I am currently teaching. So just a sketch at this point: The equation of motion allows for solutions of the form and when you now expand small fluctuations around this solution you see that they propagate with an adjustable speed of sound depending on f and V.
Obviously, this theory is Lorentz invariant, it's only the solution which breaks this invariance (as most interesting solutions of field theories do).
The next thing is how you interpret this result: For suitably chosen V and f you can communicate with space-time points which are space-like to you. So is that really bad? If you think about it (or read the above mentioned paper) you find that this is not necessarily so: You really only get into trouble with causality if you have the possibility to call yourself in the past and tell you the lottery numbers of a drawing in the future of your past self.
If you can communicate with space-points, this can happen: If you send a signal faster than the speed of light to a point P which is space like to you, then from there it can be sent to your past, part of which is again space-like to P. If the sender at P (a mirror is enough) is moving the speed of communication (as measured by the respective sender) has to be only infinitesimally faster than the speed of light (if the whole set-up is Lorentz invariant).
In the theory above, however, this cannot happen: The communication using the fluctuations of the field phi is always to the future as defined by the flow of the vector field V (which we assume to be time-like and future directed). Thus you cannot send signals to points which are upstream in that flow and all of your past is. And using light (according to the usual light-cones) does not help either.
This only changes if you have two such field with superluminus fluctuations: Then you can use one field's fluctuations to send to P (which has to be downstream for that field) and the other field to send the signal from P to your past. So strictly speaking, only if you have two such fields, there is potential for sci-fi stories or get rich fast (or actually: in the past) schemes. But who stops you to have two such fields if one is already around?
At this point, it might be helpful to formalise the notion of "sending signals" a bit further. This also helps to better understand the various notions of velocity which are around when you have non-trivial dispersion relations: As an undergrad you learn that there is the phase velocity which is and that there is the group velocity but at least to me nobody really explained why the later one is important. It was only claimed that it is this velocity which is responsible for signal propagation.
Anyway, what you probably really want is the following: You have some hyperbolic field equation which you solve for some Cauchy data. Then you change the Cauchy data in a compact region K and solve again. Hopefully, the two solution differ only in the region causally connected to K. For this, it is the highest derivative term in the field equation (the leading symbol) which matters and if you Fourier transform you see this is actually the group velocity.
Formulating this "sending to P and back" procedure is a bit more complicated. My suspicion is that it's like the initial value problem when you have closed time-like loops: Then not all initial data is consistent: If my time fore example is periodic with a period of one year I should only give initial data which produces a solution with the same periodicity. But how exactly does this work for the two superluminal fields?
There is one further complication: If gravity is turned on and I have to give initial data for it as well, things get a lot more complicated, as the question of whether a point with given coordinates is space-like to me depends on the metric. But my guess would be that changes in the metric also propagate at most with the speed of light of the reference metric.
And finally, there is the problem that the theory above (for non-linear f) is a higher derivative theory. Thus the initial value problem in that theory is likely to require more than phi and its time derivative to be given on the Cauchy surface.
Tuesday, February 20, 2007
Generalised Geometries and Flux Compactifications
For two days, I have attended a workshop at Hamburg on Generalised Geometries and Flux Compactifications. Even though the talks were generally of amazingly high quality and, with only a few exceptions, very interesting, I refrain from giving you summaries of the individual contributions. If you are interested, have a look at the conference website and check out the speakers' most recent (or upcoming) papers.
I would still like to mention two things, though: Firstly, there are the consequences of having wireless network links available in lecture halls: By now, we are all used to people doing their email during talks if one is less interested in what is currently going on on stage. Or, alternatively, you try not to be so nosy as to read the emails on the laptop of the person in the row in front of you. But what I encountered for the first time was a speaker attributing a certain construction to some reference, somebody from the audience challenging that reference to be the original source of that construction, and then backing up that claim with a quick Spires HEP search.
In this case, it was Maxim Zabzine talking and Martin Rocek claiming to know an earlier reference, to which Maxim replied "No, Martin, don't look it up on your laptop!". But it was already too late....
The other thing is more technical, and if you are not interested in the details of flux compactifications you can stop reading at this point. I am sure most people are aware of this fact, and I had read about it as well, but in the past it had never struck me as so important: In traditional compactifications without fluxes, on Calabi-Yaus say, the geometry is expressed in terms of J and Omega, which are both covariantly constant and which fulfill J^3 = Omega Omega-bar = vol(CY). Both arise from Fierzing the two covariantly constant spinors on the CY. Now, it is a well defined procedure to study deformations of this geometry: For a compact CY, and since the Laplacian (or whichever operator establishes that a form is harmonic) is elliptic, the deformations can be thought of as coming from cohomology classes, which form finite dimensional spaces. So, effectively, one has reduced the infinite dimensional space of forms on the CY to a finite dimensional subspace, and in the end one arrives at a finite number of light (actually massless) fields in the 4d low energy theory.
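To make the counting concrete (a standard illustration I am adding here): the deformations of J live in $H^{1,1}(CY)$ and those of the complex structure are counted by $H^{2,1}(CY)$, so one gets $h^{1,1}$ Kähler and $h^{2,1}$ complex structure moduli. For the quintic, for instance, that means 1 + 101 massless fields instead of an infinite tower of arbitrary forms.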
Or, even more technically: What you do is to rewrite the 10d kinetic operator (some sort of d'Alembertian) as the sum of a 4d d'Alembertian and a 6d Laplacian. The latter is the elliptic operator, and one can decompose all functions on the CY in terms of eigenfunctions of this Laplacian. As a result, the eigenvalues become the mass^2 of the 4d fields, and since the operator is elliptic, the spectrum is discrete. Any function which is not harmonic has a KK-mass which is parametrically of the order of the inverse linear size of the CY.
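Schematically (my paraphrase of this standard Kaluza-Klein argument, signs and factors suppressed):
\[
\Box_{10}\Phi = \big(\Box_4 + \Delta_6\big)\Phi = 0 ,\qquad
\Phi(x,y) = \sum_n \phi_n(x)\,Y_n(y) ,\qquad \Delta_6 Y_n = m_n^2\,Y_n
\;\Rightarrow\; \big(\Box_4 + m_n^2\big)\phi_n = 0 ,
\]
so the harmonic modes ($m_0 = 0$) give the massless 4d fields while everything else sits at the KK scale $m_n \sim n/R$ set by the linear size R of the internal space.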
If you now turn on fluxes, the susy condition on the spinors is no longer that they are covariantly constant (with respect to the Levi-Civita connection) but that they are constant relative to a new connection in which the flux appears as torsion, formally Nabla' = Nabla + H. As a consequence one only has SU(3) structure: One can still Fierz the spinors, but now the resulting forms J and Omega are no longer harmonic. Thus it no longer makes sense to expand them (and their perturbations) in terms of cohomology classes. The above trick to reduce the deformation problem to a finite dimensional one therefore fails, and one no longer has the separation into massless moduli and massive KK-states. In principle, one immediately ends up with infinitely many fields of all kinds of uncontrollable masses (unless one assumes some hierarchy coming from the smallness of the fluxes one has introduced). This is just because there is no longer a natural set of forms to expand things in.
However, today Paul Koerber reported on some progress in that direction for the case of generalised Kähler manifolds: He demonstrated that one can make headway by considering the cohomology with respect to d+H, the twisted differential. But this is still work in progress, and one does not yet have these cohomologies under good control. What is more, almost by definition these cohomologies no longer contain the former moduli, which have now obtained masses from the flux induced superpotential. Those are obviously not in the cohomology and thus still have the same status as the massive KK-modes, which one would like to be parametrically heavier for all this to really make sense.
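For the record (my addition; sign conventions differ in the literature): acting on a form omega, the twisted differential is
\[
d_H\,\omega = d\omega + H\wedge\omega ,\qquad
d_H^2\,\omega = dH\wedge\omega = 0 \quad\text{since}\quad dH = 0 ,
\]
so $d_H$ squares to zero precisely because the flux is closed, and it defines a cohomology just as the ordinary de Rham differential does.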
There were many other interesting talks, some of them on non-geometries, spaces pioneered by Hull and friends in which one has to use stringy transformations like T-dualities when going from one coordinate patch to another. Thus they are at least not ordinary geometries, but since T-duality has a quasi field-theoretical description, they might be amenable to non-commutative geometry.
Monday, January 08, 2007
Trusting voting machines
Before New Year I attended day one of the 26th Chaos Communication Congress (link currently down), the yearly hacker convention organised by the Chaos Computer Club. One of the big topics was voting machines, especially after this and this, and given that there is a strong lobby pushing to introduce voting machines in Germany.
Most attendants agreed that most problems could be avoided by just not using voting machines but old school paper ballots. But there were also arguments in favour, especially for elections with complicated voting systems: In some local elections in Germany, the voter can cast as many votes (70 IIRC) as there are seats in the parliament she is voting for, with up to three votes per candidate. Obviously this is a nightmare for a manual count. The idea behind these systems is to give voters rather than parties more influence on the composition of the parliament (while maintaining proportional representation) than in the list voting systems used most of the time: There, the parties set up sorted lists and the voters just vote for parties, determining the number of seats for each party. These seats are then filled with the candidates from the list, starting from the top. This effectively means that the first list positions of the big parties are not really voted on in the general election, as these people will enter parliament with nearly 100% probability, and only the candidates further down the list are effectively decided on by the voters.
The obvious problem with more complicated voting systems, which might have the advantage of being fairer, is that they are harder to understand and consequently risk being less democratic because too many voters fail to understand them. But voting systems should be the topic of a different post and have already been a topic in the past.
I would like to discuss what you can do to increase trust in voting machines if you decide to use them.
Note that nobody claims you cannot cheat in paper elections. You can. But typically, you can only influence a few votes with reasonable effort and it is unlikely that these will have a big influence; say, for order N voters you only influence O(1) votes. With voting machines, however, there are many imaginable attacks that influence possibly O(N) votes and thus threaten the whole election.
The problem arises from the fact that there are three goals for an election mechanism which are hard to achieve all at the same time: a) the result should be checkable, b) the vote should be anonymous, and c) the voter should not get a receipt of his vote (to prevent vote selling). If you drop any one of these criteria the whole thing becomes much easier.
The current problems with voting machines mostly come from a): The machine just claims that the result is x and there is no way of challenging or verifying it. In addition, you have no real possibility to work out what software the machine is actually running; it could simply have deleted the evil subroutines afterwards, or the manipulation could be even more involved.
A first solution would be for the voting machine to print out the vote on an internal printer and present it to the voter so she can check that it is what she really voted; the printout remains inside the voting machine and is a checkable record of the vote. However, now you have to make sure you do not run into conflict with anonymity, as it should not be possible to later find out what, for example, the 10th vote was.
Here comes my small contribution to this problem: Why not have the machine prove that the result it claims is correct? I would propose the following procedure:
For each voting possibility (party/candidate) there is one separate machine that cannot communicate with the other machines, plus there is one further separate, open machine per polling station. Before the election, in some verifiable, open and random procedure a hyperplane in an M dimensional projective space is determined (M has to be larger than the total number of voters), maybe with different people contributing different shares of the information that goes into the selection of that hyperplane. The idea behind this is the same as the fact that any three points determine a plane in 3-space, and it does not matter which 3 points you are given (as long as they are not collinear).
Then each voter is presented with a random point on that hypersurface (on some electronic device or printout) and she votes by inputting that point into the machine that represents her vote.
After the election, each of the machines holds the information of as many points as votes have been cast for that party/candidate. Then it announces its result (the number of votes), N_i say. The new thing is that it has to prove that it has really been given these N_i votes, and it can do so by demonstrating that it holds the information of N_i points: for example, it requests further M-1-N_i points. After obtaining these it should know the hypersurface and can show this when given one further point with one coordinate missing: From the knowledge of the hypersurface it can compute the remaining coordinate, thus showing that it really holds the information from the N_i votes it claims to have received. The nice thing about this procedure is that it does not matter which N_i points it received; only their number matters. (A toy version of this counting proof is sketched below.)
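Since the above is a protocol idea rather than a precise specification, here is a minimal Python sketch of the counting proof as I read it. Everything in it is my own choice for illustration and not part of the proposal: I work over a finite field, parametrise the hyperplane affinely as x_M = c_0 + c_1 x_1 + ... + c_{M-1} x_{M-1} (so M generic points pin it down), and all names are made up.

import random

P = 2**31 - 1   # a prime; all arithmetic happens in the field GF(P)
M = 10          # dimension: M generic points determine the hyperplane

def make_hyperplane():
    # Secret coefficients c_0..c_{M-1} of x_M = c_0 + c_1 x_1 + ...
    return [random.randrange(P) for _ in range(M)]

def sample_point(c):
    # A random point on the hyperplane; this is what a voter submits.
    xs = [random.randrange(P) for _ in range(M - 1)]
    x_last = (c[0] + sum(ci * xi for ci, xi in zip(c[1:], xs))) % P
    return xs + [x_last]

def solve_mod_p(A, b):
    # Gauss-Jordan elimination over GF(P) for a square system A c = b.
    # With a large prime a random system is non-singular with overwhelming
    # probability, so the sketch does not bother with degenerate cases.
    n = len(A)
    A = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] % P != 0)
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], P - 2, P)       # inverse via Fermat's little theorem
        A[col] = [(v * inv) % P for v in A[col]]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(vr - f * vc) % P for vr, vc in zip(A[r], A[col])]
    return [row[n] for row in A]

def recover(points):
    # Machine side: from M points on the hyperplane, recover its coefficients.
    A = [[1] + pt[:-1] for pt in points]       # row: (1, x_1, ..., x_{M-1})
    b = [pt[-1] for pt in points]
    return solve_mod_p(A, b)

# --- mock count -----------------------------------------------------------
c = make_hyperplane()                              # fixed before the election
ballots = [sample_point(c) for _ in range(4)]      # this machine received 4 votes
claimed = len(ballots)
extra = [sample_point(c) for _ in range(M - claimed)]   # top-up points it requests
c_rec = recover(ballots + extra)

# Challenge: a fresh point with the last coordinate withheld.
probe = sample_point(c)
guess = (c_rec[0] + sum(ci * xi for ci, xi in zip(c_rec[1:], probe[:-1]))) % P
assert guess == probe[-1]   # passes only if the machine really held `claimed` points

In this affine parametrisation the machine needs M - N_i top-up points rather than the M-1-N_i of the projective counting above; the principle is the same either way: a machine that holds fewer points than it claims cannot reconstruct the hyperplane and will fail the challenge with overwhelming probability.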
Of course, this is only the bare procedure and you can wrap it in further encryption, for example to make the transfer of the points from the polling station to the counting computers safer. Furthermore, the dimension should really be taken larger so that no points are accidentally linearly dependent.
And of course, this procedure only prevents the machines from claiming more votes than they actually received. There is nothing which stops them from forgetting votes. Therefore, in this system, the individual machines should be maintained by the parties whose votes they count: These would have a natural interest in not losing any votes.
But at least, this procedure makes sure no votes from one candidate are illegally transferred to another candidate by evil code in a voting machine.