Intel Mesa Code Lands Big Patch Series For Treating Convergent Values As SIMD8

  • Intel Mesa Code Lands Big Patch Series For Treating Convergent Values As SIMD8

    Phoronix: Intel Mesa Code Lands Big Patch Series For Treating Convergent Values As SIMD8

    A patch series six months in the making and consisting of 24 patches by longtime Intel Linux graphics engineer Ian Romanick was merged on Christmas Eve for Mesa 25.0...


  • #2
    What does "convergent values" mean in this context?



    • #3
      In this context, a value is "convergent" if it has the same value in every SIMD lane.

      Intel GPUs can execute SIMD 8, 16 or 32 lanes wide; before this change, convergent values had to be stored in a register as wide as the SIMD execution. After this change, they can be stored in an 8-lane register, and the GPU will automatically replicate it twice or four times for operations on 16 or 32 lanes.
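
      Roughly, in plain C terms, it is the difference between a value every lane shares and a value that varies per lane. A conceptual sketch only (the names push_constant and lane_id are made up for illustration, not Mesa or hardware terms):

      Code:
      /* Models one SIMD8 execution group in plain C. */
      #include <stdio.h>

      #define LANES 8

      int main(void)
      {
          unsigned push_constant = 42;      /* same in every lane -> convergent */
          unsigned per_lane[LANES];

          for (unsigned lane = 0; lane < LANES; lane++) {
              unsigned lane_id = lane;      /* differs per lane -> divergent */
              per_lane[lane] = lane_id * push_constant;
          }

          /* The convergent value needs only one storage slot; the divergent
           * result needs one slot per lane. */
          for (unsigned lane = 0; lane < LANES; lane++)
              printf("lane %u: %u\n", lane, per_lane[lane]);
          return 0;
      }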



      • #4
        Do such Mesa enhancements automatically benefit Rusticl?



        • #5
          Originally posted by farnz View Post
          In this context, a value is "convergent" if it has the same value in every SIMD lane.
          I assume this distinction is a static determination, no?

          Originally posted by The Patch
          Our register allocator is not clever enough to handle scalar allocations. Its fundamental unit of allocation is SIMD8. Start treating convergent values as SIMD8.
          Is this a hardware or software limitation? It still seems awfully wasteful to burn 256 bits on a 32-bit scalar, but I get that it's a lot better than 512 or 1024 bits.

          So, does the hardware actually have scalar registers and they're just not being used?

          Originally posted by farnz View Post
          Intel GPUs can execute SIMD 8, 16 or 32 lanes wide; before this change, convergent values had to be stored in a register as wide as the SIMD execution. After this change, they can be stored in an 8-lane register, and the GPU will automatically replicate it twice or four times for operations on 16 or 32 lanes.
          Thanks for the explanation. When you say "the GPU will automatically replicate it", does that mean adding a SIMD8 vector to a SIMD32 vector will cause the SIMD8 operand to be replicated automatically to match the width of the larger operand?



          • #6
            Originally posted by coder View Post
            I assume this distinction is a static determination, no?
            This is a patch for the Intel graphics compiler, as the article says, so I am pretty sure it is static: the compiler works out which values are convergent and records that information in the machine code (a sketch of the idea is at the end of this post).

            Originally posted by coder View Post
            Is this a hardware or software limitation? It still seems awfully wasteful to burn 256 bits on a 32-bit scalar, but I get that it's a lot better than 512 or 1024 bits.
            Software. Register allocation is a function of the compiler; the hardware doesn't decide which registers an instruction uses.

            The compiler's architecture can only allocate registers in groups of 8. They could change that, but it would require changing a lot of code.

            Originally posted by coder View Post
            So, does the hardware actually have scalar registers and they're just not being used?


            Modern GPUs have scalar ALUs but not scalar registers; the registers are shared with the SIMD ALUs.

            Originally posted by coder View Post
            Thanks for the explanation. When you say "the GPU will automatically replicate it", does that mean adding a SIMD8 vector to a SIMD32 vector will cause the SIMD8 operand to be replicated automatically to match the width of the larger operand?
            I have not seen the code, so I don't know how it is implemented.

            But it could work like that: store convergent values in a SIMD8 register to reduce register usage, then, when using one in an instruction, copy the value to more registers until you have a full SIMD32.
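
            A minimal sketch of that "compiler checks which values are convergent" idea, assuming a toy IR (not NIR or the actual Intel backend): a value is convergent if it is a uniform/constant, or if all of its sources are convergent.

            Code:
            #include <stdbool.h>
            #include <stdio.h>

            enum kind { KIND_UNIFORM, KIND_LANE_ID, KIND_ADD };

            struct value {
                enum kind kind;
                const struct value *src0, *src1;   /* NULL for leaf values */
            };

            static bool is_convergent(const struct value *v)
            {
                switch (v->kind) {
                case KIND_UNIFORM: return true;    /* same in every lane */
                case KIND_LANE_ID: return false;   /* differs per lane   */
                case KIND_ADD:     return is_convergent(v->src0) &&
                                          is_convergent(v->src1);
                }
                return false;
            }

            int main(void)
            {
                struct value u  = { KIND_UNIFORM, NULL, NULL };
                struct value id = { KIND_LANE_ID, NULL, NULL };
                struct value a  = { KIND_ADD, &u, &u };    /* uniform + uniform -> convergent */
                struct value b  = { KIND_ADD, &u, &id };   /* uniform + lane id -> divergent  */

                printf("a convergent: %d\nb convergent: %d\n",
                       is_convergent(&a), is_convergent(&b));
                return 0;
            }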



            • #7
              Originally posted by coder View Post
              Is this a hardware or software limitation? It still seems awfully wasteful to burn 256 bits on a 32-bit scalar, but I get that it's a lot better than 512 or 1024 bits.

              So, does the hardware actually have scalar registers and they're just not being used?
              No...all registers are 256-bit (8 lanes at 32-bit) until Xe2 (Lunarlake/Battlemage) when they start being 512-bit (16 lanes at 32-bit). You've got the right of it—using 8 lanes for a scalar is still pretty wasteful, but a lot better than wasting 16 or 32 lanes. Eventually, we plan to do SIMD1 scalars, where we use only 1 lane, and can pack things more tightly.

              But, as an incremental step, this let us figure out which values are convergent, teach consumers to handle that, and begin taking advantage of the information. We can then look at allocating scalars more efficiently in a second step.
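
              For concreteness, the numbers above work out like this (pure arithmetic, nothing Intel-specific):

              Code:
              /* Bits consumed by one 32-bit convergent value at each allocation width. */
              #include <stdio.h>

              int main(void)
              {
                  const unsigned value_bits = 32;
                  const unsigned widths[] = { 32, 16, 8, 1 };   /* SIMD32, SIMD16, SIMD8, SIMD1 */

                  for (unsigned i = 0; i < sizeof(widths) / sizeof(widths[0]); i++)
                      printf("SIMD%u allocation: %u bits\n", widths[i], widths[i] * value_bits);
                  return 0;   /* prints 1024, 512, 256 and 32 bits */
              }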

              Originally posted by coder View Post
              Thanks for the explanation. When you say "the GPU will automatically replicate it", does that mean adding a SIMD8 vector to a SIMD32 vector will cause the SIMD8 operand to be replicated automatically to match the width of the larger operand?
              Most instructions can implicitly replicate a scalar source out to all the lanes. For example,

              Code:
              add(16)   r2<1>UD    r4<16,16,1>UD    r8.7<0,1,0>UD
              Would add the 16 lanes in register r4 to r8's single lane 7 and store the result in r2. This is effectively free.
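
              In rough C terms (a sketch of the effect only, not of the actual EU region semantics), that instruction behaves like this:

              Code:
              #include <stdint.h>
              #include <stdio.h>

              int main(void)
              {
                  uint32_t r4[16], r8[8], r2[16];

                  for (int i = 0; i < 16; i++) r4[i] = i;         /* arbitrary per-lane data */
                  for (int i = 0; i < 8;  i++) r8[i] = 100 + i;   /* lane 7 holds 107        */

                  uint32_t scalar = r8[7];              /* the <0,1,0> region: one element, */
                  for (int lane = 0; lane < 16; lane++) /* fed to all 16 lanes              */
                      r2[lane] = r4[lane] + scalar;

                  for (int lane = 0; lane < 16; lane++)
                      printf("r2[%d] = %u\n", lane, r2[lane]);
                  return 0;
              }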
              Free Software Developer .:. Mesa and Xorg
              Opinions expressed in these forum posts are my own.

