Perhaps switch to pointers for matrices and vectors #29

UserAB1236872 · 2014-08-22T10:07:19Z

I got an email recently with some bechmarks. The long and short is: once you get to Mat4 size, it is a significant speed improvement (about 13 ns/op for multiplcation) to use a pointer for Mat4. This is a significant API change and would break a lot of existing code. Doing

m := mgl32.Ident4()
m[0] = 5

Would break. Any code that currently uses any of: At and Set for matrices or X, Y, Z, or W for vectors is safe (SetX and the like would likely needed to be added for Vectors).

The way I see it, there are two potential solutions here:

Redo everything to use pointers. This, as mentioned, breaks existing code.
Add functions like func (m *Mat4) MulTo(dst *Mat4, m2 *Mat4) which would have similar semantics to *MatMN.MulMN(dst, m2).

I like option 2, since there's no compatability breakage, but it does clutter the API and people may mistakenly use the slower code without realizing it.

Thoughts?

The text was updated successfully, but these errors were encountered:

UserAB1236872 · 2014-08-22T10:09:46Z

Another snag with wholesale switching to pointers is the Quaternion type. Right now it's

type Quat struct {
    W float32
    V Vec3
}

Should this be changed to

type Quat struct {
    W float32
    V *Vec3
}

or kept the same and have a pointer to the Quat struct (assuming we go with option 1 above)?

dmitshur · 2014-08-23T23:04:19Z

I was gonna comment on this, but I forgot what I wanted to say.

Honestly, I think it's a hard question and I don't know what's the best thing to do.

ghost · 2014-08-28T18:12:32Z

(i initially brought up the problem)

BenchmarkValue 50000000 45.5 ns/op
BenchmarkReference 50000000 40.5 ns/op
BenchmarkDestination 100000000 29.2 ns/op
BenchmarkBoth 100000000 24.2 ns/op

multiplication by reference: 11% faster
multiplication by value with destination matrix: 36% faster
both pass by ref and have dst matrix: 53% faster

"The long and short is: once you get to Mat4 size, it is a significant speed improvement (about 13 ns/op for multiplication)" @Jragonmiris
13 ns/op doesn't mean anything if you don't also give the speed before :P

you can find the implementation of the benchmark here

as for wether or not to break the api, the library seems to be 6 month old (according to first commit), might as well make it right today then to pay for it later. There's only so much you can break. Also, since the code is mostly generated, you could offer the new (pointer and dst) version as the main version and a deprecated version with the pass-by-value, something like

import "github.com/go-gl/mathgl/mgl32"
or
import "github.com/go-gl/mathgl/dep/mgl32"

for the quaternions, the right answer is "whatever is fastest" but i suspect it's the 2nd option (with the pointer) because quaternion multiplication probably suffers a lot from pass-by-value
quat.go:L92

func (q1 Quat) Mul(q2 Quat) Quat {
return Quat{q1.W*q2.W - q1.V.Dot(q2.V), q1.V.Cross(q2.V).Add(q2.V.Mul(q1.W)).Add(q1.V.Mul(q2.W))}
}

dmitshur · 2014-08-28T18:19:37Z

as for wether or not to break the api, the library seems to be 6 month old (according to first commit), might as well make it right today then to pay for it later. There's only so much you can break. Also, since the code is mostly generated, you could offer the new (pointer and dst) version as the main version and a deprecated version with the pass-by-value, something like

import "github.com/go-gl/mathgl/mgl32"
or
import "github.com/go-gl/mathgl/dep/mgl32"

If you're going to do that, you might as well use gopkg.in with semantic versioning. I have to admit, it's starting to grow on me because it allows one to improve APIs with less fear of breaking dependent code.

UserAB1236872 · 2014-09-03T05:18:23Z

I've done some benchmarking, though with Vec4 rather than Mat4. I was playing around with SIMD (oh, yeah, I'm going to update the package to use SIMD, surprise) and I've found:

With pointers, a destination, and SIMD, performing element-wise multiplications of two float32 vectors is only barely slower than multiplying two float32s. Once pointers come in, it gets tricky:

Two float32s: 0.5ns/op
Two Vec4s SIMD by pointer with preallocated dst: 1.5 ns/op
Two Vec4s SIMD by pointer returned by value: 4.0 ns/op
Two Vec4s SIMD by pointer, allocate new dst every time: 40 ns/op
Two Vec4s SIMD by value, return by value: 7.3 ns/op
What we have now (no SIMD, all by value): 10 ns/op

Reallocating a new dst is over 25x slower than having a preallocated dst. More importantly, it's about 4x slower than what we have now.

What do I mean by "reallocating a new dst"? I mean, essentially, this:

func (v1 *Vec4) ElMul(v2 *Vec4) *Vec4 {
    out := &Vec4{}
    simdElMul(v1, v2, out)
    return out
}

That single step of creating that new Vec4 single handedly wreaks havoc performance. It seems irresponsible to have a function like that, since people will likely use it due to the much nicer syntax than a dst function. More notably: if we simply made every functions return pointers now, all existing code would immediately be changed to that. It's possible a sync.Pool may help, but given the extra type conversions I'm dubious.

Thoughts?

ghost · 2014-09-03T13:11:29Z

well according to those results, keep returning by value and/or allow a dst function on top, this way people can chose, I know my app takes care of creating mat4 in advance. this way if people don't wish to care about performance, they can still use the old non-dst version.

Mul4(in *Vec4) Vec4
and
Mul4dst(in, out *Vec4)

UserAB1236872 · 2014-11-04T09:04:23Z

I think the direction to go for the time being (ignoring any theoretical code changes we can do due to things possibly happening in 8000 December with this package), would be to change all functions to have a pointer receiver, this is possible and will result in minimal code breakage due to auto-addressing. Functions will still take vectors as arguments and return vectors.

MulIn functions will be added, e.g.

func (m1 *Mat4)Mul4In(dst, m2 *Mat4) {
   (*dst) = //do stuff
   // alternatively simdMul4In(m1, m2, dst)
   return dst
}

ghost · 2016-04-08T02:00:11Z

holy crap these results are wrong, what happens is cpu cache kicks in and makes pointer faster then they really are, memory access is super expensive. I don't know what exactly the right solution is, I just know these results shouldn't be considered

dominikh · 2018-02-22T08:01:32Z

Doing

m := mgl32.Ident4()
m[0] = 5

Would break.

As long as Ident4() returns a unique copy, not a pointer to a globally defined variable, this will continue to work fine.

Indexing, as well as the built-in len function, are defined for pointers to arrays (not to be confused with slices, where the same is not true.)

What would break are conversions between mathgl types and arrays or f32/f64 types.

UserAB1236872 · 2018-03-09T10:30:15Z

I think the package has been stable for too long to risk a breaking change this big now.

UserAB1236872 mentioned this issue Nov 4, 2014

Improve code generation with text/template #32

Closed

UserAB1236872 closed this as completed Mar 9, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perhaps switch to pointers for matrices and vectors #29

Perhaps switch to pointers for matrices and vectors #29

Perhaps switch to pointers for matrices and vectors #29

Perhaps switch to pointers for matrices and vectors #29

Comments