Description
Based on the results, even the 2GHz quad-core A53 in the TP-Link XDR 6088 can achieve 818 Mbits/sec. I doubt the Raspberry Pi 4's result of only 394 Mbits/sec reflects what the hardware can do, since it has a quad-core A72 @ 1.5GHz. So I switched back to the Arch Linux ARM-based PiKVM distro that my Raspberry Pi 4 usually runs, which uses an armv7l kernel rather than the aarch64 kernel of Raspberry Pi OS, and ran the benchmark again. The result astonished me.
| Device / CPU | OS / Kernel / iperf Param | Speed |
|---|---|---|
| Raspberry Pi 4 / BCM2711* | Debian bookworm / 6.1.63 (aarch64) | 394 Mbits/sec |
| Raspberry Pi 4 / BCM2711* | archlinux / 6.1.61 (armv7l) | 665 Mbits/sec |
With the armv7l kernel, throughput is about 69% higher (665 vs. 394 Mbits/sec) on the same hardware. Why?
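For reference, the 69% figure is simply the ratio of the two measured speeds minus one. A quick check in Python (`relative_speedup` is just a hypothetical helper name for this sketch):

```python
def relative_speedup(fast_mbps: float, slow_mbps: float) -> float:
    """Return how much faster `fast_mbps` is than `slow_mbps`, in percent."""
    return (fast_mbps / slow_mbps - 1.0) * 100.0


# Throughput values from the table (Mbits/sec).
aarch64_mbps = 394.0  # Debian bookworm, 6.1.63 (aarch64)
armv7l_mbps = 665.0   # archlinux (PiKVM), 6.1.61 (armv7l)

print(f"armv7l is {relative_speedup(armv7l_mbps, aarch64_mbps):.1f}% faster")
# → armv7l is 68.8% faster
```

So "about 69%" is (665 - 394) / 394 ≈ 0.688, rounded.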
I searched the web and found a thread describing the same puzzle, but for AES rather than the ChaCha20 cipher used by WireGuard (wg) [1]. Perhaps the kernel's ChaCha20 implementation is not as well optimized for aarch64. I am leaving this issue here to record any further investigation of this performance gap.
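One possible next step is to compare which ChaCha20 implementations each kernel has registered. The sketch below parses `/proc/crypto` (blank-line-separated entries of `key : value` lines) and lists ChaCha-related drivers by priority; `parse_proc_crypto` and `chacha_drivers` are hypothetical helper names. Caveat: as far as I know, WireGuard calls the kernel's ChaCha20-Poly1305 library routines directly rather than going through the crypto API, so this listing only hints at whether an accelerated implementation (for example a NEON one) is built in; it does not prove which code path wg actually uses.

```python
from pathlib import Path


def parse_proc_crypto(text: str) -> list[dict[str, str]]:
    """Split /proc/crypto text into one dict per algorithm entry.

    Entries are separated by blank lines; each line looks like "key : value".
    """
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def chacha_drivers(entries: list[dict[str, str]]) -> list[tuple[str, str, int]]:
    """Return (name, driver, priority) for ChaCha entries, highest priority first."""
    found = [
        (e.get("name", ""), e.get("driver", ""), int(e.get("priority", "0")))
        for e in entries
        if "chacha" in e.get("name", "")
    ]
    return sorted(found, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    proc = Path("/proc/crypto")
    if proc.exists():
        for name, driver, prio in chacha_drivers(parse_proc_crypto(proc.read_text())):
            print(f"{name:32} {driver:32} priority={prio}")
    else:
        print("/proc/crypto not found (not a Linux system?)")
```

Running it under both the aarch64 and armv7l kernels and comparing the driver names might show whether one of them only has a generic (non-accelerated) implementation available.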