Possible performance gains · Issue #521 · chalk-lab/Mooncake.jl · GitHub
Open
nsiccha opened this issue Mar 12, 2025 · 5 comments
Labels
enhancement (performance): Would reduce the time it takes to run some bit of the code

Comments

nsiccha commented Mar 12, 2025

This is a low priority issue.

I've come across some simple Bayesian models for which Mooncake is significantly (~4 times) slower than Enzyme or an alternative, very limited, proof-of-concept Julia AD method (StanBlocksAD.jl). AFAICT, Mooncake should be able to match the performance of Enzyme and StanBlocksAD.jl. It's unclear to me what exactly is "dragging Mooncake down".

Furthermore, for a batched version of that model, neither Enzyme nor Mooncake achieves the same scaling as StanBlocksAD.jl. To summarize, the timings relative to the scalar StanBlocksAD.jl/Enzyme.jl timing are roughly:

BATCH_TYPE        |         Float64  SReal{1, Float64}  SReal{2, Float64}  SReal{4, Float64}  SReal{8, Float64}  SReal{16, Float64}
=====             |         =====    =====              =====              =====              =====              =====
Primal            |         0.35     0.38               0.37               0.37               0.4                0.64
StanBlocksAD      |         1.0      0.99               1.1                1.2                1.6                2.7
Mooncake          |         4.5      4.6                4.6                12.0               21.0               35.0
Enzyme            |         1.0      2.7                3.1                3.4                4.5                7.8
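
To read the scaling in the table above more directly, the relative timings can be divided by the batch width. The short Python sketch below does that arithmetic using only the numbers from the table. It assumes that `SReal{N, Float64}` carries `N` values per evaluation (the thread does not spell this out), and the helper names `per_lane` and `ratio_to` are made up for illustration.

```python
# Relative timings copied from the table above
# (1.0 = scalar StanBlocksAD.jl/Enzyme.jl timing).
# Columns: Float64, then SReal{N, Float64} for N = 1, 2, 4, 8, 16.
# ASSUMPTION: N is the number of values processed per evaluation.
WIDTHS = [1, 1, 2, 4, 8, 16]

TIMINGS = {
    "Primal":       [0.35, 0.38, 0.37, 0.37, 0.4, 0.64],
    "StanBlocksAD": [1.0, 0.99, 1.1, 1.2, 1.6, 2.7],
    "Mooncake":     [4.5, 4.6, 4.6, 12.0, 21.0, 35.0],
    "Enzyme":       [1.0, 2.7, 3.1, 3.4, 4.5, 7.8],
}


def per_lane(timings, widths):
    """Relative cost per batched value: timing divided by batch width."""
    return [t / w for t, w in zip(timings, widths)]


def ratio_to(timings, baseline):
    """Element-wise slowdown of one method relative to another."""
    return [t / b for t, b in zip(timings, baseline)]


for name, timings in TIMINGS.items():
    lanes = ", ".join(f"{x:.2f}" for x in per_lane(timings, WIDTHS))
    print(f"{name:<13} per-lane cost: {lanes}")

slowdown = ratio_to(TIMINGS["Mooncake"], TIMINGS["StanBlocksAD"])
print("Mooncake / StanBlocksAD:", ", ".join(f"{x:.1f}" for x in slowdown))
```

Under that assumption, the per-value cost at width 16 is about 35.0 / 16 ≈ 2.2 for Mooncake, 7.8 / 16 ≈ 0.49 for Enzyme, and 2.7 / 16 ≈ 0.17 for StanBlocksAD.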

Notebook with (slightly different) timings and potentially reproducible code: https://nsiccha.github.io/StanBlocksAD.jl/#why

I don't intend to continue developing StanBlocksAD.jl, but I find it interesting that there are apparently still possible performance gains for something purely Julian. We can discuss what StanBlocksAD.jl does differently from Mooncake and what, if anything, could be ported to Mooncake. But this issue is mainly meant to record this link and to be revisited at some later point.

willtebbutt added the enhancement (performance) label Mar 12, 2025
Member
yebai commented Mar 12, 2025

@nsiccha, can you confirm that the function below is the target benchmark function?

https://github.com/nsiccha/StanBlocksAD.jl/blob/16b1882d4a60b6eeaa4bf436d142cc1ebcb7399b/docs/index.qmd#L213-L229

Author
nsiccha commented Mar 13, 2025

@yebai, yes, exactly. The main work happens in the final StanBlocks.normal_lpdf call.

It uses this overwrite of the StanBlocks.normal_lpdf function to replace Base.sum with StanBlocksAD.my_sum. IIRC that made Mooncake and Enzyme a bit faster.

Using StanBlocks.constview here instead of a regular view (as commented out above) made all versions faster IIRC, because it avoids an allocation that Base.view apparently feels compelled to do.
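
For readers unfamiliar with the workload, the computation being differentiated is a normal log-density summed over observations. The Python sketch below shows that shape with a hand-written accumulation loop standing in for a library sum, plus the analytic gradient that an AD tool would have to reproduce. It is not the source of `StanBlocks.normal_lpdf` or `StanBlocksAD.my_sum`; the names `loop_sum`, `normal_lpdf`, and `normal_lpdf_grad` and the scalar `mu`/`sigma` parameterisation are assumptions made for illustration.

```python
import math


def loop_sum(values):
    """Plain accumulation loop (hypothetical stand-in for a custom sum)."""
    total = 0.0
    for v in values:
        total += v
    return total


def normal_lpdf(y, mu, sigma):
    """Sum over y of the log-density of Normal(mu, sigma)."""
    const = -0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return loop_sum(const - 0.5 * ((yi - mu) / sigma) ** 2 for yi in y)


def normal_lpdf_grad(y, mu, sigma):
    """Analytic gradient of normal_lpdf with respect to (mu, sigma)."""
    d_mu = loop_sum((yi - mu) / sigma**2 for yi in y)
    d_sigma = loop_sum(-1.0 / sigma + (yi - mu) ** 2 / sigma**3 for yi in y)
    return d_mu, d_sigma


y = [0.0, 1.0, 2.0]
print(normal_lpdf(y, 1.0, 1.0))
print(normal_lpdf_grad(y, 1.0, 1.0))
```

The gradient costs only a small constant factor more than the primal pass here, which is the kind of ratio the timings in this thread are measuring.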

Member
yebai commented Mar 21, 2025

@nsiccha, can you try to prepare an MWE that only depends on StanBlocks.jl and Julia's standard library so we can quickly analyse the cause here?

Author
nsiccha commented Mar 21, 2025

@yebai, of course, I'll try to do it next week.

Author
nsiccha commented Mar 28, 2025

Won't manage it this week, but hopefully the next one :)
