SIMD update (NEON, SSE3, SSE4) + Features #72
Merged
* optimize dot product
Some operations (vec4 and mat4) are optimized with NEON, SSE3 and SSE4 intrinsics.

`glm_vec4_dot` now supports SSE3, SSE4 and NEON, but the SSE3 and SSE4 dot products are disabled by default. If the `CGLM_SSE4_DOT` macro is defined and SSE4 is supported, the SSE4 version is used. If `CGLM_SSE3_DOT` is defined and SSE3 is supported, the SSE3 version is used.

New Options:
* `CGLM_SSE4_DOT`: enable SSE4 optimization for dot products
* `CGLM_SSE3_DOT`: enable SSE3 optimization for dot products

New Functions:
* `void glm_vec4_cubic(float s, vec4 dest)`: fills the vec4 as [s^3, s^2, s, 1.0]
* `float glm_mat4_rmc(vec4 r, mat4 m, vec4 c)`: multiplies a row vector, a matrix and a column vector and returns a scalar. This is a handy helper for getting an SMC result for curves.
* `float glm_smc(float s, mat4 m, vec4 c)`: calculates the SMC multiplication using `glm_mat4_rmc()` and `glm_vec4_cubic()`
* `float glm_bezier()`: cubic bezier equation
* `float glm_hermite()`: cubic hermite equation
* `float glm_decasteljau()`: solves the cubic bezier equation using De Casteljau's algorithm

New glmm (SIMD) functions:
* `glmm_vhadds(v)`: horizontal add, returns a register
* `glmm_hadd(v)`: horizontal add, returns a scalar
* `glmm_vdots(a, b)`: dot product; a single lane contains the result, for converting it to a scalar
* `glmm_vdot(a, b)`: dot product; all lanes contain the result, for use with other vector operations

Improvements:
* `glmm_` functions are moved to platform-specific headers.
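
For readers unfamiliar with the SMC form used by the new curve helpers, here is a minimal scalar sketch of the idea. It is not the code from this PR: the `sk_` names and types are hypothetical stand-ins, the column-major layout (`m[col][row]`) is an assumption, and the Bezier basis matrix is the standard textbook one rather than anything taken from the library source.

```c
/* Hypothetical stand-ins for vec4/mat4; not the library's own types. */
typedef float sk_vec4[4];
typedef sk_vec4 sk_mat4[4]; /* assumed column-major: m[col][row] */

/* Fill dest with the cubic power basis [s^3, s^2, s, 1]. */
static void sk_vec4_cubic(float s, sk_vec4 dest) {
  dest[0] = s * s * s;
  dest[1] = s * s;
  dest[2] = s;
  dest[3] = 1.0f;
}

/* Row vector * matrix * column vector, returning a scalar. */
static float sk_mat4_rmc(sk_vec4 r, sk_mat4 m, sk_vec4 c) {
  float sum = 0.0f;
  int   j;
  for (j = 0; j < 4; j++) {
    /* (r * m)[j] is the dot product of r with column j of m */
    float t = r[0] * m[j][0] + r[1] * m[j][1]
            + r[2] * m[j][2] + r[3] * m[j][3];
    sum += t * c[j];
  }
  return sum;
}

/* S * M * C, where S = [s^3, s^2, s, 1]. */
static float sk_smc(float s, sk_mat4 m, sk_vec4 c) {
  sk_vec4 S;
  sk_vec4_cubic(s, S);
  return sk_mat4_rmc(S, m, c);
}

/* Cubic Bezier through the SMC form, with control values p0..p3.
   The basis matrix is symmetric, so the storage order does not matter. */
static float sk_bezier(float s, float p0, float p1, float p2, float p3) {
  sk_mat4 basis = {
    {-1.0f,  3.0f, -3.0f, 1.0f},
    { 3.0f, -6.0f,  3.0f, 0.0f},
    {-3.0f,  3.0f,  0.0f, 0.0f},
    { 1.0f,  0.0f,  0.0f, 0.0f}
  };
  sk_vec4 c = {p0, p1, p2, p3};
  return sk_smc(s, basis, c);
}
```

With evenly spaced control values 0, 1, 2, 3 the curve is linear in `s`, so `sk_bezier(0.5f, 0.0f, 1.0f, 2.0f, 3.0f)` evaluates to 1.5, and the endpoints `s = 0` and `s = 1` return the first and last control values. Swapping in a different basis matrix (for example a Hermite one) reuses the same `sk_smc` path, which is presumably why a generic row-matrix-column helper is convenient here.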