8000 refactor(942340): move to regex assembly by fzipi · Pull Request #4014 · coreruleset/coreruleset · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

refactor(942340): move to regex assembly #4014

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jun 24, 2025
Merged

Conversation

fzipi
Copy link
Member
@fzipi fzipi commented Feb 24, 2025

what

  • move to regex assembly syntax

In this process, there are some small semantic changes:

  • optional spaces \s*? are just changed to \s*, like we do in other regexes this will be in a followup PR (maybe we need to add some docs on this decision?)
  • operators are consolidated in two main groups: logical and sql
  • quotes are also consolidated and expanded from only ['"] to ['"```] (don't know how to escape the backtick in gh markdown) this will be in a followup PR

tests

To make some sense from this rule, I'm adding here the test payloads so we can figure it out what it should be matching, add more, etc. (you can get these using: yq e '.tests[] | .stages[] | .input | (.uri, .data)' tests/regression/tests/REQUEST-942-APPLICATION-ATTACK-SQLI/942340.yaml)

/get?var=in ( select * from
/get?var=except 	select.1,2
/get?var=except values(1,2)
/get?var=except selecting
email=x' or array[id] is not null--
email=x' or email~all(array[email]);analyze--
email=' and email not similar to id--
email=' or true; foo
email=' or false; foo
email='||true
email='ortrue

See https://regex101.com/r/2Kn3mc/1

changes

  • added more operators (->, *, /), or things are easy to bypass followup PR
  • couldn't find any database that has something called xxor?

@fzipi fzipi changed the title refactor(942340): move to regex assambly refactor(942340): move to regex assembly Feb 24, 2025
@fzipi fzipi force-pushed the rules/942340-move-regex-assembly branch from f062144 to 11745f0 Compare March 3, 2025 12:53
Copy link
Contributor
github-actions bot commented Mar 3, 2025

📊 Quantitative test results for language: eng, year: 2023, size: 10K, paranoia level: 1:
🚀 Quantitative testing did not detect new false positives

@fzipi fzipi force-pushed the rules/942340-move-regex-assembly branch from 11745f0 to 903aba3 Compare March 3, 2025 13:33
@fzipi fzipi requested a review from theseion March 3, 2025 14:11
@fzipi fzipi force-pushed the rules/942340-move-regex-assembly branch from 4be97c9 to f419b0e Compare March 4, 2025 13:19
@fzipi fzipi marked this pull request as ready for review March 4, 2025 14:04
@fzipi fzipi requested a review from theseion March 4, 2025 14:05
@fzipi fzipi enabled auto-merge March 5, 2025 01:44
Copy link
Contributor
@theseion theseion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated regex differs significantly from the original.

Copy link
Contributor
@theseion theseion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore my last review. I was looking at an older version. I'll have to take some time to analyse the updated expression.

@fzipi fzipi force-pushed the rules/942340-move-regex-assembly branch from e422d2c to 1b278f9 Compare March 26, 2025 15:21
@fzipi fzipi force-pushed the rules/942340-move-regex-assembly branch from 242631d to 587185d Compare April 8, 2025 13:51
@fzipi fzipi requested a review from theseion April 8, 2025 13:51
@fzipi fzipi requested a review from theseion April 13, 2025 20:35
Copy link
Contributor
@theseion theseion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be more of these small differences. Don't have the time right now to find them all.

@fzipi fzipi force-pushed the rules/942340-move-regex-assembly branch from 8d4a6f3 to 40ed9f0 Compare April 20, 2025 13:48
@theseion
Copy link
Contributor

I've used the following to validate (formatting is not consistent but so as to make it easy to spot differences).
Original:

(?i)in[\s\x0b]*?\(+[\s\x0b]*?select|(?:
	(?:N?AND|X?X?OR|DIV|LIKE|BETWEEN|NOT)[\s\x0b]+|
	(?:\|\||&&)[\s\x0b]*
)[\s\x0b\+0-9A-Z_a-z]+(?:regexp[\s\x0b]*?\(|sounds[\s\x0b]+like[\s\x0b]*?[\"'`]|[0-9=]+x)|
[\"'`](?:
	[\s\x0b]*?(?:
		[0-9][\s\x0b]*?(?:--|#)|is[\s\x0b]*?(?:
			[0-9].+[\"'`]?[0-9A-Z_a-z]|[\.0-9]+[\s\x0b]*?[^0-9A-Z_a-z].*?[\"'`]
		)
	)|
	[%&<->\^]+[0-9][\s\x0b]*?(?:=|x
8000
?or|div|like|between|and)|
	(?:[^0-9A-Z_a-z]+[\+\-0-9A-Z_a-z]+[\s\x0b]*?=[\s\x0b]*?[0-9][^0-9A-Z_a-z]+|\|?[\-0-9A-Z_a-z]{3,}[^\s\x0b,\.0-9A-Z_a-z]+)[\"'`]|
	[\s\x0b]*(?:
			(?:N?AND|X?X?OR|DIV|LIKE|BETWEEN|NOT)[\s\x0b]+|
			(?:\|\||&&)[\s\x0b]*
		)(?:
			array[\s\x0b]*\[|
			[0-9A-Z_a-z]+(?:
				[\s\x0b]*!?~|
				[\s\x0b]+(?:not[\s\x0b]+)?similar[\s\x0b]+to[\s\x0b]+
			)|(?:tru|fals)e\b
		)
)|
\bexcept[\s\x0b]+(?:select\b|values[\s\x0b]*?\()

Updated:

(?i)in[\s\x0b]*?\(+[\s\x0b]*?select|(?:
	(?:and|n(?:and|ot)|(?:xx?)?or|div|like|between)[\s\x0b]+|
	(?:\|\||&&)[\s\x0b]*?
)[\s\x0b\+0-9A-Z_a-z]+(?:regexp[\s\x0b]*?\(|sounds[\s\x0b]+like[\s\x0b]*?[\"'`]|[0-9=]+x)|
[\"'`](?:
	[\s\x0b]*?(?:
		(?:
			[0-9]+[\s\x0b]*?(?:--|#)|is[\s\x0b]*?(?:
				[0-9].+[\"'`]?[0-9A-Z_a-z]|[\.0-9]+[\s\x0b]*?[^0-9A-Z_a-z].*?[\"'`]
			)|(?:and|n(?:and|ot)|(?:xx?)?or|div|like|between)[\s\x0b]+|
			(?:\|\||&&)[\s\x0b]*?
		)(?:
			array[\s\x0b]*?\[|
			(?:tru|fals)e\b|
			[0-9A-Z_a-z]+(?:
				[\s\x0b]*?!?~|
				[\s\x0b]+(?:not[\s\x0b]+)?similar[\s\x0b]+to[\s\x0b]+
			)
		)|
		[%&<->\^]+[0-9]+[\s\x0b]*?(?:and|n(?:and|ot)|(?:xx?)?or|div|like|between)=
	)|
	(?:[^0-9A-Z_a-z]+[\+\-0-9A-Z_a-z]+[\s\x0b]*?=[\s\x0b]*?[0-9][^0-9A-Z_a-z]+|\|?[\-0-9A-Z_a-z]{3,}[^\s\x0b,\.0-9A-Z_a-z]+)[\"'`]
)|
\bexcept[\s\x0b]+(?:select\b|values[\s\x0b]*?\()

As you can see, there is (should be) only one major change, where grouping was optimised for expressions with the "quotes + spaces" prefix.

@theseion
Copy link
Contributor
theseion commented May 3, 2025

I believe it's good to go. Let me know if you're happy as well.

theseion
theseion previously approved these changes May 3, 2025
@github-actions github-actions bot added the Stale label Jun 5, 2025
@Xhoenix Xhoenix removed the Stale label Jun 13, 2025
fzipi and others added 14 commits June 21, 2025 10:59
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Co-authored-by: Max Leske <250711+theseion@users.noreply.github.com>
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Co-authored-by: Max Leske <250711+theseion@users.noreply.github.com>
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Co-authored-by: Max Leske <250711+theseion@users.noreply.github.com>
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
@fzipi fzipi force-pushed the rules/942340-move-regex-assembly branch from 6f45fed to 9fce961 Compare June 21, 2025 14:00
@fzipi
Copy link
Member Author
fzipi commented Jun 21, 2025

@theseion Can you reapprove so we merge?

@fzipi fzipi added this pull request to the merge queue Jun 24, 2025
Merged via the queue into main with commit 27ee4e5 Jun 24, 2025
6 checks passed
@fzipi fzipi deleted the rules/942340-move-regex-assembly branch June 24, 2025 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0