haproxy-3.1.7 aborted with large map_sub · Issue #2949 · haproxy/haproxy · GitHub

haproxy-3.1.7 aborted with large map_sub #2949

Closed
felipewd opened this issue Apr 24, 2025 · 3 comments
Labels
status: works as designed This issue stems from a misunderstanding of how HAProxy is supposed to work. type: bug This issue describes a bug.

Comments

@felipewd
felipewd commented Apr 24, 2025

Detailed Description of the Problem

On some machines, we're using haproxy to do path-based routing to certain servers in order to serve video traffic. We map certain UUIDs to certain backends in a map file. Using path,lower,map_sub with a large enough map file (around 2.2M lines), haproxy ended up saturating all cores in user space (%usr) and eventually crashed.

Expected Behavior

Don't crash.

Steps to Reproduce the Behavior

Load a 2M-line map file and try to process it with map_sub.

Do you have any idea what may have caused this?

Not really, but I can confirm changing it to an exact match of the path fixed the problem.

Our URL is usually:

/vod/<uuid>/bla.m3u8

At first we were matching with path,lower,map_sub(/etc/haproxy/uuid.map), but later changing to path,lower,field(3,/),map(/etc/haproxy/uuid.map) fixed the issue.
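
For context, the map file itself just pairs UUIDs with backend names, e.g.:

# illustrative entries, not our real data
0f8fad5b-d9cb-469f-a165-70867728950e osd-03
1c9e6679-7425-40de-944b-e07fc1f90ae7 osd-07

With map_sub, every one of those keys is searched as a substring of the whole lowercased path on every request. With field(3,/), the path /vod/<uuid>/bla.m3u8 is split on "/" (field 1 is the empty string before the leading slash, field 2 is "vod", field 3 is the UUID), and that extracted UUID is then looked up once as an exact key in the map.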

We get around 5000 RPS on this service.

Do you have an idea how to solve the issue?

In our case, doing an exact-match on the map file.

What is your configuration?

global
        log /dev/log    local0
        log /dev/log    local1 notice
        stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
        stats timeout 30s
        user haproxy
        group haproxy
        maxconn 900000
        ulimit-n 3150919

        daemon

defaults
        mode    http

        timeout connect 1s
        timeout server 100s
        timeout server-fin 4s
        timeout client 100s
        timeout client-fin 4s
        timeout http-request 6s
        timeout http-keep-alive 40s
        timeout tunnel 300s

#        log-format      %Ts.%ms\ %ts\ %b\ %HM\ %ST\ %fc\ %bc\ %sc\ %ci:%cp\ %B\ %U\ %TR\ %Tr\ %Tw\ %Tc\ %sslv\ %sslc\ %[capture.req.hdr(0)]\ %HU


frontend stats

bind *:8404 shards by-thread
    stats enable
    stats uri /stats
    stats refresh 5s
    stats auth doritos:fiAcorkOlb

frontend front
        bind :88
        bind *:443 ssl crt /etc/haproxy/cert.pem

#       log global

        http-response set-header server Streaming
        default_backend osd-18 # default backend
        # use_backend %[path,lower,field(3,/),map(/etc/haproxy/uuid.map)] < --- FIXED
        use_backend %[path,lower,map_sub(/etc/haproxy/uuid.map)] #<--- BUG

<tons of backends>

Output of haproxy -vv

HAProxy version 3.1.7-c3f4089 2025/04/17 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2026.
Known bugs: http://www.haproxy.org/bugs/bugs-3.1.7.html
Running on: Linux 5.4.0-212-generic #232-Ubuntu SMP Sat Mar 15 15:34:35 UTC 2025 x86_64
Build options :
  TARGET  = linux-glibc
  CC      = cc
  CFLAGS  = -O2 -g -fwrapv
  OPTIONS = USE_OPENSSL=1 USE_ZLIB=1 USE_PCRE=1 USE_PCRE_JIT=1
  DEBUG   = 

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY -LUA -MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT +PCRE -PCRE2 -PCRE2_JIT +PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC -QUIC_OPENSSL_COMPAT +RT +SHM_OPEN -SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL +ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=16).
Built with OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
Running on OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 9.4.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=SPOP  side=BE     mux=SPOP  flags=HOL_RISK|NO_UPG
       spop : mode=SPOP  side=BE     mux=SPOP  flags=HOL_RISK|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : none

Available filters :
	[BWLIM] bwlim-in
	[BWLIM] bwlim-out
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace

Last Outputs and Backtraces

can't put the full backtrace here, see below

Additional Information

Since we were using 3.0 branch in production, I tried compiling 3.1 in order to see if it helped. So I can confirm the issue exists on both 3.0 and 3.1 branches in our use case.

Crash message attached here.

haproxy-map-3.1-abort.txt

@felipewd felipewd added status: needs-triage This issue needs to be triaged. type: bug This issue describes a bug. labels Apr 24, 2025
@wtarreau
Member

Ouch, 2.2M lines ??? map_sub() and map_reg() are the most expensive matches: they have no option but to try every single pattern one after the other. Usually 1000 lines already cause catastrophic performance and incur noticeable latency on the whole process, but 2.2M, I've never seen that yet :-/ It will definitely take a few seconds, during which all traffic stalls, which is precisely the purpose of the watchdog: to detect that the process is no longer making any progress.

In 3.2 we've implemented yielding at the rule level in order to minimize the latency caused by many rules, but we don't have anything to minimize the latency impact of a single rule. Maybe one dirty workaround could be to split your huge map_sub into smaller pieces evaluated in distinct rules, and set tune.max-rules-at-once to 1 to force yielding after each one. But IMHO that remains a hack, because the request that evaluates this will still have to go through all these patterns and will take ages to evaluate (and the CPU usage will be huge as well).
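
As a rough sketch of that hack (the file names and chunk count here are hypothetical, and tune.max-rules-at-once only exists from 3.2 onwards), the single lookup could be split into several http-request rules that store the first match in a variable:

global
        # 3.2+ only: force a yield after each rule so a single request
        # cannot monopolize a thread for the whole map scan
        tune.max-rules-at-once 1

frontend front
        # hypothetical split of the huge map into smaller chunks; each rule
        # only runs if no previous chunk already produced a match
        http-request set-var(txn.be) path,lower,map_sub(/etc/haproxy/uuid-1.map)
        http-request set-var(txn.be) path,lower,map_sub(/etc/haproxy/uuid-2.map) if !{ var(txn.be) -m found }
        use_backend %[var(txn.be)] if { var(txn.be) -m found }
        default_backend osd-18

Even then, an unlucky request still walks all the chunks, so the total cost stays the same; the splitting only prevents it from stalling the thread long enough to trip the watchdog.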

There is very likely another solution. Algorithmically speaking, it makes no sense to look for 2.2M patterns at random places (which is what map_sub does); it looks like a heavier form of what anti-viruses do. And the risk of one pattern unexpectedly matching inside another one is huge if the patterns are not all the same size. Isn't it possible instead to extract the part that is supposed to be matched against, and compare it to fixed-size values? That way the search for the location would be done once, and then the match against known values would be performed in O(log N) instead of O(N * input_size).
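
Putting the two expressions already shown in this report side by side, that is the difference between:

# scans all ~2.2M keys against the whole path on every request: O(N * input_size)
use_backend %[path,lower,map_sub(/etc/haproxy/uuid.map)]

# extracts the UUID once, then a single exact lookup in the indexed map: roughly O(log N)
use_backend %[path,lower,field(3,/),map(/etc/haproxy/uuid.map)]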

@felipewd
Author

Hi @wtarreau thanks for the fast reply!

Sure, that's what we did. When we changed to an exact match on the map file:

use_backend %[path,lower,field(3,/),map(/etc/haproxy/uuid.map)]

Everything went smoothly.

You can put that on my Wall of Shame, since I didn't know at first the cost of map_sub.

But hey, with less than 1.9M lines it was no biggie, so that's a plus ;-)

I just thought a bug report here might shed some light on this.

This was one of those cases where the map file started with 20 entries and was updated programmatically, and we only noticed today when it exploded :-)

Feel free to mark it as a non-bug.

@wtarreau
Member

I would really love to emit a warning when loading such maps/acls with too many lines, but the problem is that some users would consider that no warning = valid config. And even a single regex can be constructed to take multiple days to evaluate, so the length is not everything. But maybe in a case like this it would at least warn you that the bot feeding the map is going out of control, so it could be useful anyway.

It's possible that the doc is not clear enough about the dangers.

But yeah, at least the issue may serve to help someone else facing the same problem in the future. I'm glad you could work it out!

@wtarreau wtarreau added status: works as designed This issue stems from a misunderstanding of how HAProxy is supposed to work. and removed status: needs-triage This issue needs to be triaged. labels Apr 24, 2025