SIMD update (NEON, SSE3, SSE4) + Features by recp · Pull Request #72 · recp/cglm · GitHub

SIMD update (NEON, SSE3, SSE4) + Features #72


Merged: 20 commits, Feb 3, 2019


@recp (Owner) commented Jan 22, 2019

Some vec4 and mat4 operations are now optimized with NEON, SSE3, and SSE4 intrinsics.

glm_vec4_dot now supports SSE3, SSE4, and NEON. The SSE3 and SSE4 dot paths are disabled by default. The SSE4 version is used only if the CGLM_SSE4_DOT macro is defined and SSE4 is supported; the SSE3 version is used only if CGLM_SSE3_DOT is defined and SSE3 is supported.

New Options:

  • CGLM_SSE4_DOT : Enable SSE4 optimization for dot products
  • CGLM_SSE3_DOT : Enable SSE3 optimization for dot products
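
The opt-in behavior described above can be sketched as a compile-time dispatch. This is only an illustration, not cglm's actual source: DEMO_SSE4_DOT, DEMO_SSE3_DOT, and demo_dot are hypothetical stand-ins for the CGLM_SSE4_DOT / CGLM_SSE3_DOT options and glm_vec4_dot, and the fallback here is plain scalar code. The intrinsics used (_mm_dp_ps, _mm_hadd_ps) are the standard SSE4.1 and SSE3 ones.

```c
/* Hypothetical sketch: pick a dot-product path at compile time.
 * A SIMD path is used only when its opt-in macro is defined AND the
 * compiler reports support for that instruction set. */
#if defined(DEMO_SSE4_DOT) && defined(__SSE4_1__)
#  include <smmintrin.h>
static float demo_dot(const float a[4], const float b[4]) {
  /* 0xFF: multiply all four lanes, write the sum to every lane */
  __m128 d = _mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0xFF);
  return _mm_cvtss_f32(d);
}
#elif defined(DEMO_SSE3_DOT) && defined(__SSE3__)
#  include <pmmintrin.h>
static float demo_dot(const float a[4], const float b[4]) {
  __m128 m = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
  m = _mm_hadd_ps(m, m); /* [x+y, z+w, x+y, z+w] */
  m = _mm_hadd_ps(m, m); /* every lane holds x+y+z+w */
  return _mm_cvtss_f32(m);
}
#else
/* Default path: neither option enabled (or not supported). */
static float demo_dot(const float a[4], const float b[4]) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
#endif
```

With a real build, such an option would typically be enabled by defining the macro before including the library headers or by passing it on the compiler command line (for example `-DCGLM_SSE4_DOT` together with a flag that enables SSE4).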

New Functions:

  • void glm_vec4_cubic(float s, vec4 dest): fills dest with [s^3, s^2, s, 1.0]
  • float glm_mat4_rmc(vec4 r, mat4 m, vec4 c): multiplies a row vector, a matrix, and a column vector and returns the scalar result. This is a handy helper for getting the SMC result for curves.
  • float glm_smc(float s, mat4 m, vec4 c): computes the SMC product using glm_mat4_rmc() and glm_vec4_cubic()
  • float glm_bezier(): cubic Bezier equation
  • float glm_hermite(): cubic Hermite equation
  • float glm_decasteljau(): solves the cubic Bezier equation using De Casteljau's method
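
To make the relationship between the cubic vector, the row-matrix-column product, and a curve evaluation concrete, here is a minimal standalone sketch. It does not use cglm: the demo_ functions are hypothetical stand-ins, matrices are plain row-major float[4][4] arrays (an assumption made for readability, not cglm's storage layout), and the Bezier basis matrix is the standard textbook one. The parameter list of demo_bezier is also an assumption, since the PR text does not show the arguments of glm_bezier().

```c
/* Fill dest with [s^3, s^2, s, 1], as described for glm_vec4_cubic. */
static void demo_vec4_cubic(float s, float dest[4]) {
  float ss = s * s;
  dest[0] = ss * s;
  dest[1] = ss;
  dest[2] = s;
  dest[3] = 1.0f;
}

/* Row vector r times matrix m times column vector c, returned as a scalar. */
static float demo_mat4_rmc(const float r[4], const float m[4][4],
                           const float c[4]) {
  float sum = 0.0f;
  for (int i = 0; i < 4; i++) {
    float mc = 0.0f; /* i-th component of m * c */
    for (int j = 0; j < 4; j++) mc += m[i][j] * c[j];
    sum += r[i] * mc;
  }
  return sum;
}

/* S * M * C, where S = [s^3, s^2, s, 1]. */
static float demo_smc(float s, const float m[4][4], const float c[4]) {
  float S[4];
  demo_vec4_cubic(s, S);
  return demo_mat4_rmc(S, m, c);
}

/* Standard cubic Bezier basis matrix (row-major here). */
static const float DEMO_BEZIER[4][4] = {
  {-1.0f,  3.0f, -3.0f, 1.0f},
  { 3.0f, -6.0f,  3.0f, 0.0f},
  {-3.0f,  3.0f,  0.0f, 0.0f},
  { 1.0f,  0.0f,  0.0f, 0.0f}
};

/* Cubic Bezier value at s for end points p0, p1 and control points c0, c1. */
static float demo_bezier(float s, float p0, float c0, float c1, float p1) {
  const float C[4] = {p0, c0, c1, p1};
  return demo_smc(s, DEMO_BEZIER, C);
}

/* Same curve in Bernstein form, useful as a cross-check. */
static float demo_bezier_bernstein(float s, float p0, float c0, float c1,
                                   float p1) {
  float t = 1.0f - s;
  return t * t * t * p0 + 3.0f * t * t * s * c0 + 3.0f * t * s * s * c1 +
         s * s * s * p1;
}
```

Swapping DEMO_BEZIER for another basis matrix (for example a Hermite basis) reuses the same demo_smc path, which is presumably why a generic row-matrix-column helper is convenient for curves.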

New glmm (SIMD) functions:

  • glmm_vhadds(v): horizontal add; returns a register
  • glmm_hadd(v): horizontal add; returns a scalar
  • glmm_vdots(a, b): dot product; a single lane contains the result, which suits converting it to a scalar
  • glmm_vdot(a, b): dot product; all lanes contain the result, so it can be used with other vector operations
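
The difference between "returns register" and "returns scalar" can be illustrated with a small standalone sketch. The demo_ names and the demo_v4 type are hypothetical and are not the glmm_ implementations; the SSE path uses only baseline SSE intrinsics, and a scalar fallback keeps the sketch compilable on targets without SSE. In this sketch the register-returning variants broadcast the sum to all lanes.

```c
#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
typedef __m128 demo_v4;

static demo_v4 demo_load(const float p[4]) { return _mm_loadu_ps(p); }
static void demo_store(float p[4], demo_v4 v) { _mm_storeu_ps(p, v); }
static demo_v4 demo_mul(demo_v4 a, demo_v4 b) { return _mm_mul_ps(a, b); }
static float demo_lane0(demo_v4 v) { return _mm_cvtss_f32(v); }

/* Horizontal add that stays in a register: every lane ends up x+y+z+w. */
static demo_v4 demo_vhadd(demo_v4 v) {
  /* [x,y,z,w] + [y,x,w,z] = [x+y, x+y, z+w, z+w] */
  demo_v4 t = _mm_add_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)));
  /* add the two halves swapped, so all lanes hold the full sum */
  return _mm_add_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(1, 0, 3, 2)));
}
#else
/* Scalar fallback so the sketch still builds without SSE. */
typedef struct { float f[4]; } demo_v4;

static demo_v4 demo_load(const float p[4]) {
  demo_v4 v;
  for (int i = 0; i < 4; i++) v.f[i] = p[i];
  return v;
}
static void demo_store(float p[4], demo_v4 v) {
  for (int i = 0; i < 4; i++) p[i] = v.f[i];
}
static demo_v4 demo_mul(demo_v4 a, demo_v4 b) {
  demo_v4 r;
  for (int i = 0; i < 4; i++) r.f[i] = a.f[i] * b.f[i];
  return r;
}
static float demo_lane0(demo_v4 v) { return v.f[0]; }
static demo_v4 demo_vhadd(demo_v4 v) {
  float s = v.f[0] + v.f[1] + v.f[2] + v.f[3];
  demo_v4 r = {{s, s, s, s}};
  return r;
}
#endif

/* Horizontal add returned as a scalar. */
static float demo_hadd(demo_v4 v) { return demo_lane0(demo_vhadd(v)); }

/* Dot product kept in a register (all lanes), handy for further SIMD math. */
static demo_v4 demo_v4_vdot(demo_v4 a, demo_v4 b) {
  return demo_vhadd(demo_mul(a, b));
}

/* Dot product extracted as a scalar. */
static float demo_v4_dot(demo_v4 a, demo_v4 b) {
  return demo_lane0(demo_v4_vdot(a, b));
}
```

Keeping the result in a register avoids a round trip through a scalar when the next step is another vector operation, such as scaling a vector by the dot product.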

Improvements:

  • glmm_ functions have moved to platform-specific headers.
  • Some glmm_ functions now support NEON.
  • CGLM_SIMD_x86, CGLM_SIMD_ARM, and CGLM_SIMD are defined when the corresponding SIMD support is available.
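
As a rough illustration of how such platform macros can be derived and consumed, the sketch below defines hypothetical DEMO_SIMD_x86, DEMO_SIMD_ARM, and DEMO_SIMD macros from compiler-predefined ones. The exact conditions cglm uses to define CGLM_SIMD_x86, CGLM_SIMD_ARM, and CGLM_SIMD are not shown in this PR text, so the conditions here are assumptions.

```c
/* Hypothetical detection, based on macros that compilers predefine. */
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  define DEMO_SIMD_x86
#endif

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#  define DEMO_SIMD_ARM
#endif

/* Umbrella macro: some SIMD backend is available. */
#if defined(DEMO_SIMD_x86) || defined(DEMO_SIMD_ARM)
#  define DEMO_SIMD
#endif

/* Name of the backend selected at compile time. */
static const char *demo_simd_backend(void) {
#if defined(DEMO_SIMD_x86)
  return "x86";
#elif defined(DEMO_SIMD_ARM)
  return "arm";
#else
  return "none";
#endif
}

/* 1 if any SIMD backend was detected, 0 otherwise. */
static int demo_has_simd(void) {
#if defined(DEMO_SIMD)
  return 1;
#else
  return 0;
#endif
}
```

User code can then guard SIMD-specific paths with a single check on the umbrella macro and fall back to scalar code otherwise.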

@coveralls commented Jan 22, 2019

Coverage decreased (-0.4%) to 11.126% when pulling af088a1 on simd-update into 0f223db on master.

@recp changed the title from "SIMD update (NEON, SSE3, SSE4)" to "SIMD update (NEON, SSE3, SSE4) + Features" on Jan 26, 2019
@recp merged commit 1a34ffc into master on Feb 3, 2019
@recp deleted the simd-update branch on February 3, 2019 14:18