-
-
Notifications
You must be signed in to change notification settings - Fork 28
escaped characters #356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
escaped characters #356
Comments
Hi @kartofelek007 , Thanks for the issue report. I'll take a look at this and create a test case to investigate. I know the content of a script tag is treated differently in the underlying DOMDocument, so maybe there's something this library can do to patch this behaviour. Will keep you up to date when I have a fix. Greg. |
Hi @kartofelek007, Please can you check over the changes I've just pushed to branch Unit tests available here: https://github.com/PhpGt/Dom/blob/356-escaped-characters/test/phpunit/HTMLDocumentTest.php#L650-L667 |
I am simple man. I dont know how to use this with composer. |
😄 I'm happy to help. If you are requiring phpgt/dom directly through composer, you can refer to a particular branch by using this syntax: {
"require": {
"phpgt/dom": "dev-356-escaped-characters"
}
} Alternatively, you could clone this repository, use If you need any help integrating, I'm more than happy advising you - I'm really invested in helping get my open source work accessible to all. |
I install it on my page. In composer.lock I have something like: {
"name": "phpgt/dom",
"version": "dev-356-escaped-characters",
"source": {
"type": "git",
"url": "https://github.com/PhpGt/Dom.git",
"reference": "c2f11cc299feefc7ee2faae61178de5b04a17dec"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/PhpGt/Dom/zipball/c2f11cc299feefc7ee2faae61178de5b04a17dec",
"reference": "c2f11cc299feefc7ee2faae61178de5b04a17dec",
"shasum": ""
}, But it still not working. <pre class="line-numbers"><code class="language-js">
const p = document.querySelector("#testNodeText");
const btn = document.querySelector(".btn-textNodeTest");
const word1 = document.createTextNode("Psy");
p.append(word1);
p.append(" są ");
const word2 = document.createTextNode("fajne");
p.append(word2);
btn.addEventListener("click", () => {
console.dir(word1); //wypiszemy sobie by zobaczyć co możemy użyć
word1.textContent = "Koty też";
word2.textContent = "super!";
});
</code></pre>
<script>
{
const p = document.querySelector("#testNodeText");
const btn = document.querySelector(".btn-textNodeTest");
const word1 = document.createTextNode("Psy");
p.append(word1);
p.append(" są ");
const word2 = document.createTextNode("fajne");
p.append(word2);
btn.addEventListener("click", () => {
console.dir(word1); //wypiszemy sobie by zobaczyć co możemy użyć
word1.textContent = "Koty też";
word2.textContent = "super!";
});
}
</script> In normal html elements (pre, code etc) the letters are correct. But in script tag all non eglish letters are escaped (look for example at comment after console.dir |
Thank you for supplying a test case. I'll add this to the unit tests to replicate in a controlled environment. If you keep your eye on the |
Hi @kartofelek007, I've added a test to check that the non-English characters within script tags are not escaped. The test is passing, so I think I need more information from you about your particular use-case, so I can write more tests and isolate this bug properly. Please could you provide me the source HTML that contains the With your help I'm sure we can figure out why this strange encoding issue is occurring. P.S. cool DOOM terminal |
Take a look at scratch.php - I've pushed this to the branch which can be tested within a real web browser. The I serve the Please see this screen recording. I hope this helps get us to a solution: |
Hmmm. Your example work fine.
Edit. I test my funcion in two scanarios in your scratch.php:
|
Hi! Thanks for providing more detail. I can see where your issue lies now - within the I'll see what I can do to improve this functionality, but in the meantime, I think you can resolve your issue by using the HTMLDocument's toString function. Rather than doing this: $html = $document->documentElement;
return $html->innerHTML; do this instead: return (string)$document; You will not see any of this escaping after casting the document to a string. I will see if I can provide a fix to the There is some advice I'd like to share with you at this point, which is to try and avoid concatenating strings of HTML, and instead using the DOM functionality like |
This escape letters outside script tag:
but not in script tag |
Hello @kartofelek007 Sorry for the delay, but I've spent some time tonight trying to get UTF-8 characters working across all tests. Please can you check this is working for you? I believe I've fixed things for you. Greg. P.S. It's on the |
After update this return content where every inline js script (include inline google ad script) look like this:
And most important - generate take long time. For example for my single page this may be 20-30s. I create simple test with 200lines html - generate may take 3-4s? If I delete all script from this html, generate is muuuuuch faster. |
I've just added some more tests to the branch and can confirm that I've somehow added some weird bugs. Sorry to keep this open for so long, I'll revisit it and see if I can fix whatever weird bug I've introduced. |
I'm very poor with php. In dont know how to run test. I do this in my own way: I download your repo, checkout do character branch, install composer, recreate test page (see abowe 29 jul) and run it. Then i run your library on my page. I'm not suprised, because this php functionality is somehow broken and not refined |
@kartofelek007 I made a big mistake with my previous fix. I believe I have now fixed everything in the latest commit - sorry for confusing things. Here's how you can run the tests locally, with the latest version of the code: git checkout 356-escaped-characters
git pull
composer install
vendor/bin/phpunit test/phpunit I believe you should be able to see the correct characters within the Thank you for helping me figure this one out, I'm very interested in hearing how you get on. |
First test on test page and my real page pass correct. Looks like it's working fine now. Good job! |
Great! I'll keep the pull request open for a bit longer while I integrate the branch into a few live apps I'm running. |
* bug: certain elements do not encode properly on render closes #356 * test: test bug #356 provided by @kartofelek007 * test: extra assertions - check escaped string are not in the render * test: Remove unnecessary echo * scratch: Marcin's test isolation * fix: more work towards #356 * tweak: remove scratch file * fix: better solution to #356 using processing entity
I have html with code:
I read my html, add div and print output ($content is html get by ob_get_clean() and it good because i tested it in my other function):
I got:
The text was updated successfully, but these errors were encountered: