-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Optimize and improve parameter conversion in OCI8 8000 Statement #2495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
do { | ||
if (!$quote) { | ||
$result = $capture('/[?\'"]/', function ($token) use ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move the closure to a private method, and give it a specific name
@Ocramius please check the reworked version. It contains:
I totally agree. Luckily, unlike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As much as I like the attempt to improve performance here, I am honestly frightened by possible security implications. I'll have to think about those - meanwhile, consider looking at the comments above.
While it might be slower, I really wish to move away from explicit by-ref parameters, and instead rely on a tiny object containing the arguments being passed around. Unsure if that makes any sense, but please let me know if you can envision that, or if it seems like a complication.
*/ | ||
private static function findPlaceholderOrOpeningQuote( | ||
$statement, | ||
&$tokenOffset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having by-ref parameters is not something I'd want to maintain/debug later on. I see why it's made, but is there a way to put them together instead? Can we create an object that is passed down instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can move all the parsing logic (not only the by-ref variables) into a different class. It might look like:
$foo = new Foo($sql);
$newSql = $foo->getSql();
$paramMap = $foo->getParameterMap();
But I don't like that it only does get*()
but doesn't do do*()
. Maybe it could be a proxy (or sub-) class which would translate all messages with positional parameters (__construct()
, bind*()
) to the other object which only understands the named ones?
return false; | ||
} | ||
|
||
if ($token == '?') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strict comparison please
* @param int $fragmentOffset The offset to build the next fragment from | ||
* @param array $fragments Resulting query fragments | ||
* @param string|null $currentLiteralDelimiter Current literal delimiter | ||
* @param array $paramMap Parameter map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit unclear what kind of map this is. Consider using a syntax such as string[]
for the type, and explaining what key and value would be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand it correctly, this is an array<int, string>
, where the keys are numeric (positional), and the strings are the parameter names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's array<int, string>
.
$fragments[] = substr($statement, $fragmentOffset, $tokenOffset - $fragmentOffset); | ||
$fragments[] = $param; | ||
$paramMap[$position] = $param; | ||
$fragmentOffset = ++$tokenOffset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split this into two statements: $tokenOffset += 1; $fragmentOffset = $tokenOffset;
} | ||
|
||
if ($token == '?') { | ||
$position = count($paramMap) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counting shouldn't be repeated: if the parameters of this method are refectored into a tiny object, this can be moved out.
$currentLiteralDelimiter = null; | ||
|
||
do { | ||
if (!$currentLiteralDelimiter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the code correctly, you are currently looping over the structure, and adding to the $paramMap
as you go, checking for the "
starting and closing delimiters. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this logic currently work with multiline literals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It works with multiline literals same as with others (I'll add tests for that). In regexes, we don't search for any character classes which do or do not match the newline characters. We only search for "
, '
and ?
.
* @param string $tokenOffset The offset to start searching from | ||
* @param int $fragmentOffset The offset to build the next fragment from | ||
* @param array $fragments Resulting query fragments | ||
* @param string|null $currentLiteralDelimiter Current literal delimiter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description is a bit unclear/redundant
* @param string $statement The SQL statement to parse | ||
* @param string $tokenOffset The offset to start searching from | ||
* @param int $fragmentOffset The offset to build the next fragment from | ||
* @param array $fragments Resulting query fragments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description is a bit unclear/redundant - what does determine a fragment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A fragment is a part of the original statement not containing placeholders. When re-constructing the query, fragments are joined back together with new parameters between them. For example, for "SELECT * FROM users WHERE id = ? OR name = ?"
, the fragments are "SELECT * FROM users WHERE id = "
, " OR name = "
and ""
.
Probably, this much implementation details in a type description means there should be a dedicated type for that.
$fragments[] = $param; | ||
$paramMap[$position] = $param; | ||
$fragmentOffset = ++$tokenOffset; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return early instead of using else
&$tokenOffset, | ||
&$currentLiteralDelimiter | ||
) { | ||
$token = self::findToken($statement, $tokenOffset, '/(?<!\\\\)' . $currentLiteralDelimiter . '/'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$currentLiteralDelimiter
should be passed to preg_quote()
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't really understand the ?<!
sequence here: can you clarify it? (couldn't make it work on JS regex parsing tools)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably need to have a load of security-sensitive string escape sequences to test against here. As good as your intentions are, this is reeeeeally playing around with fire
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a negative lookbehind: we're searching for the current literal delimiter which is not preceded by a backslash (i.e. not escaped). The backslash is escaped twice: once to avoid special meaning of the backslash a PHP string and once to avoid its special meaning in the regex.
396bc66
to
9c3afca
Compare
@Ocramius let me try running some real app tests with this patch and seeing if there are any anomalies. Shame on me, according to both documentation and how it's implemented in the DBAL, Currently unresolved concerns:
Will be glad to hear from @deeky666 as well. |
1. Reworked the parser syntax to use proper Oracle escaping syntax as '' instead of \'. 2. Moved valid tests to the Functional section to ensure correct SQL syntax and logic.
Please see the update. I ran a large set of mixed unit/integration tests using this branch on Oracle and didn't find any issues. (TBH, I found that the empty string literal was not handled properly in the reworked version and fixed it, added a test). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@morozov this is good to go: 🚢-ing
Current implementation of
OCI8Statement::convertPositionalToNamedPlaceholders()
is relatively slow. In a worst case scenario I observed during profiling the app I'm working on, parsing takes up to 10% of the execution time.It could be sped up by getting rid of a character-by-character string traversal in PHP and using regular expressions.
Besides the performance problem, the current implementation incorrectly handles escaped quotes and quotes of another kind in string literals. Given that in many cases string literals contain user input, it may lead to all sort of issues including security ones:
The suggested implementation is times faster and handles the described above literals correctly.
Additionally, it introduces throwing exception in case of parsing errors. It will be mostly useful in cases if the converter is unable to parse a correct query and thus produces a knowingly invalid result.