Playing with your formulae, I noticed that the update algorithm could be simplified significantly, at least in the positive-recall case. First, I noticed that time-shifting the Beta distribution by `delta` gives a Generalized Beta Distribution of the First Kind (with `a = 1/delta`, `b = 1` in the terms of that page). Unfortunately, the literature seems to be sparse on that particular formulation (or I'm not using the right search terms), but the formula for moments provided on the page is quite useful. It does look like the GB1 distribution is conjugate for a Bernoulli trial, with an easy analytical formula for the positive-recall update and at least a good numerical moment-matching fit for the negative-recall case.
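To make that correspondence concrete, here is a quick Monte Carlo check (a sketch with made-up parameters, assuming numpy and scipy) that the moments of `p**delta`, with `p ~ Beta(alpha, beta)`, match the GB1 moment formula `B(alpha + n*delta, beta) / B(alpha, beta)`:

```python
import numpy as np
from scipy.special import betaln

alpha, beta, delta = 3.3, 4.4, 0.7  # hypothetical parameters, for illustration only
rng = np.random.default_rng(0)
p = rng.beta(alpha, beta, size=1_000_000)
x = p ** delta  # recall probability time-shifted by delta

for n in (1, 2, 3):
    analytic = np.exp(betaln(alpha + n * delta, beta) - betaln(alpha, beta))
    print(n, (x ** n).mean(), analytic)  # the two columns should agree closely
```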
I've made a notebook over here to explore the possibilities.
Notice that when you multiply the prior by the likelihood (`p`), you can fold it into the power term: `p^((alpha - delta)/delta) * p = p^(((alpha + delta) - delta)/delta)`, which is formally the same as if you had `alpha + delta` instead of `alpha`. So at least for the positive-recall case, you can leave the time part of the model alone and just increment `alpha += delta`. Of course, when `delta == 1.0`, where our prior is a pure Beta distribution, this collapses to the well-known Bayesian update for a Beta prior and a Bernoulli trial. You can also work through the moment formula given on the Wikipedia page to verify that it reproduces your moments when you use `alpha + delta` in the Wikipedia formula (in your shorthand, the `n`th moment is `gamma[n+1]/gamma[1]`). Put another way, when you evolve the posterior distribution back to the original time constant, you get back another pure Beta distribution. Qualitatively, this update makes sense: when `delta` is low, the prior is weighted more heavily towards 1, so a positive recall is less surprising and updates the model less; when `delta` is high, the prior is weighted more heavily towards 0, so a positive recall is more surprising and updates the model more.
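As a sanity check on that claim, an importance-sampling estimate of the posterior moments back at the original time constant should agree with the moments of `Beta(alpha + delta, beta)`. A minimal sketch, again with made-up parameters:

```python
import numpy as np
from scipy.special import betaln

alpha, beta, delta = 3.3, 4.4, 0.7  # hypothetical parameters
rng = np.random.default_rng(1)
p_t = rng.beta(alpha, beta, size=1_000_000)  # recall probability at the time constant t
w = p_t ** delta  # likelihood of a positive recall at time t*delta

for n in (1, 2, 3):
    mc = np.average(p_t ** n, weights=w)  # posterior nth moment, back at time t
    analytic = np.exp(betaln(alpha + delta + n, beta) - betaln(alpha + delta, beta))
    print(n, mc, analytic)  # should agree: the posterior at t is Beta(alpha+delta, beta)
```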
The negative-recall case is trickier, and I haven't made any headway on an analytical form. However, it is relatively easy to match the moments by numerical optimization, adjusting both `alpha` and `beta` in the GB1 distribution; as in the positive-recall update, the time constant is left alone. I do wonder, however, whether adjusting `beta` and `delta` (e.g., by way of adjusting the time constant) would lead to a concise analytical form. That would make a better formal correspondence to the negative-recall update in the pure Beta case.
If the time constant (i.e. the time at which the prior is a pure Beta) is kept fixed, then once the actual half-life has grown to many times the original value, you might run into numerical instabilities. It might be a good idea to always move the time constant out to the estimated half-life. One scheme: do the GB1 update with the time constant fixed, project that posterior out to the estimated half-life, compute the moments there, and moment-match a pure Beta distribution at that point.
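A rough sketch of that telescoping idea for the positive-recall case, where the posterior back at `t` is exactly `Beta(alpha + delta, beta)`; the parameters and the half-life ratio here are made up:

```python
import numpy as np
from scipy.special import betaln

def project_and_fit(alpha, beta, epsilon):
    """Project a Beta(alpha, beta) prior on recall at time t out to time
    t*epsilon and moment-match a pure Beta distribution there."""
    # First two moments of p**epsilon when p ~ Beta(alpha, beta):
    m1 = np.exp(betaln(alpha + epsilon, beta) - betaln(alpha, beta))
    m2 = np.exp(betaln(alpha + 2 * epsilon, beta) - betaln(alpha, beta))
    var = m2 - m1 ** 2
    nu = m1 * (1 - m1) / var - 1  # standard Beta moment matching
    return m1 * nu, (1 - m1) * nu

alpha, beta, delta = 3.3, 4.4, 0.7  # hypothetical parameters
# Positive recall at t*delta: the posterior back at t is Beta(alpha+delta, beta).
# Pretend the new estimated half-life is 1.8x the old time constant:
print(project_and_fit(alpha + delta, beta, 1.8))
```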
What originally led me to investigate this was considering what happens if I review twice in quick succession (my Anki history has some of these due to Anki's lapse algorithm). With Ebisu's current scheme of using a time constant corresponding to the last review interval, that created large numerical problems in my tests. The pure Beta approximation for such a small `delta` is likely not great, and `t` would be very tiny, so a normal review after that would project that not-great approximation out to quite a high `delta`. I think that using the GB1 update, either alone or together with my suggestion of telescoping the time constant out to match the estimated half-life, would probably alleviate some of those issues.
Thank you for your attention and an interesting, well-thought-out algorithm.