-
Notifications
You must be signed in to change notification settings - Fork 8
Remove utf8_encode() #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note that I've not actually been able to test and verify this, but I'm 99.999999% confident it will fix the linked externals.io issue. |
@@ -29,8 +29,8 @@ public function create(string $mailbox, Horde_Imap_Client_Data_Fetch $hordeEmail | |||
// Parse the message body | |||
$parser = new Parser(); | |||
$parser->setText($hordeEmail->getFullMsg()); | |||
$htmlContent = utf8_encode((string) $parser->getMessageBody('html')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code change requires a test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😢 I'm terribly ashamed to admit I didn't write tests in this package (testing something calling IMAP would mean mocking a lot of stuff which defeats the purpose of the tests, or putting in a lot of work into a setup with Vagrant + IMAP server). It was the kind of thing I left for later (this package is more of a prototype than a complete solution)
I'm tempted to try it the old way: merge it and see what happens :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mnapoli Since the symptom is demonstrably a double-encoding of ISO-8859-1 -> UTF-8 and that is precisely what utf8_encode()
would do here, I am very confident that this would fix the problem.
Just as a concrete example of this being what is happening:
https://externals.io/thread/787#email-14713
Sara Golemon a écrit :
'Ã' === "\xC3"
'©' === "\xA9"
'é' === "\xC3\xA9"
If you are ever having encoding issues and you see things that start with Ã
it's a pretty good indicator that this is what is happening, quite a few commonly used non-ASCII chars start with this byte when UTF-8 encoded :-)
Let's go! Thank you @DaveRandom |
This definitely shouldn't be here, and I'm fairly certain is the root cause of mnapoli/externals#11 - the example linked there is a classic UTF-8 double-encoding issue.
The character set of the message is decoded on retrieval, since no decoder is passed to the parser explicitly the default is used, the default decoder always returns valid UTF-8.
Also ftr
utf8_encode()
andutf8_decode()
are almost completely useless (they only make sense when the other relevant charset is ISO-8859-1) and should probably be deprecated in PHP.