If the code of the article has Chinese comments, weird encoding can appear · Issue #7 · croqaz/clean-mark · GitHub

Open
SilenceZhou opened this issue Apr 11, 2020 · 7 comments

Comments

@SilenceZhou

If the code in an article contains Chinese comments, the output shows garbled characters.

You can try this URL (a Chinese blog post):

clean-mark "https://juejin.im/post/5e916011e51d4547153d15c7"

@gaozhao7

I have the same problem. Example:
{@link 包名.类名#方法名(参数类型)} -->
{@link 包名.类名#方法名(参数类型)}

@croqaz
Owner
croqaz commented May 18, 2020

Hi guys, thank you for raising this issue!
I implemented encoding support some time ago: #2
For the website you mention, the encoding cannot be detected from the meta charset.

I will add a new command line flag so you can specify the encoding manually, e.g. --encoding gb2312. I will probably implement this in the next few days.
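
A minimal sketch of the fallback order described above, written in Python rather than taken from the project's code: an explicit override wins, then a charset found in a meta tag, then a default. The function names, the 4096-byte scan window, and the UTF-8 default are assumptions for illustration only, not clean-mark's implementation.

```python
import re

# Matches both <meta charset="gbk"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def resolve_encoding(raw, override=None, default="utf-8"):
    """Pick an encoding: explicit override, then meta charset, then default."""
    if override:
        return override
    # Hypothetical limit: only scan the start of the page for a meta tag.
    match = _META_CHARSET.search(raw[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def decode_page(raw, override=None):
    """Decode page bytes, replacing undecodable bytes instead of raising."""
    return raw.decode(resolve_encoding(raw, override), errors="replace")
```

With a page that declares no charset, `decode_page(raw)` falls back to UTF-8, while `decode_page(raw, override="gb2312")` plays the role of the manual flag proposed above.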

@kiyoakii

That would be great for Chinese users; I just ran into the same issue. Thank you very much.

@croqaz
Owner
croqaz commented Jun 8, 2020

Guys, I didn't have time to look at this issue in depth, sorry about that.
But I did find something, and there's good news and bad news.
The good news is that the encoding is handled correctly all the way through the HTML stage.
The bad news is that the breakdance library, which converts the HTML into Markdown, breaks the encoding inside code blocks.

You can actually check this on your own like this:

clean-mark 'https://juejin.im/post/5e916011e51d4547153d15c7' -t html

You'll see that the HTML is correct. At least it looks correct to me, but I don't understand the language...
So I'll look into this more and see if there's anything I can do.

In the worst case, I'll have to look for an alternative library to convert the HTML into Markdown. If there is one...
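
One way to inspect the HTML stage without reading the language is to pull the text out of the code blocks and list the lines that contain non-ASCII characters. The sketch below uses Python's standard `html.parser`; the class and function names are hypothetical, and it assumes code blocks are wrapped in `<pre>` elements and that the HTML has been saved to a string first.

```python
from html.parser import HTMLParser


class CodeBlockExtractor(HTMLParser):
    """Collect the text content of every <pre> element in an HTML document."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &#x4E2D; into text.
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._depth = 0   # nesting depth of currently open <pre> tags
        self._parts = []  # text pieces of the block being collected

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._parts))
                self._parts = []

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def code_blocks(html):
    """Return the text of each <pre> block found in the HTML string."""
    parser = CodeBlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def non_ascii_lines(block):
    """Return the lines of a code block that contain non-ASCII characters."""
    return [line for line in block.splitlines() if not line.isascii()]
```

Printing `non_ascii_lines(block)` for each block gives a short list of lines to compare by eye against the original page.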

@kiyoakii

I have just checked the HTML generated by the command above, and it is correct.
Thank you for doing this for us and hopefully it will be solved one day.

@croqaz
Owner
croqaz commented Nov 12, 2020

Hi guys, I believe I fixed the issue in the latest commit.
I replaced "breakdance" with "turndown" to convert the HTML into Markdown, and it works much better.
I haven't made a release yet because the tests are still broken, but it would be amazing if you could clone the repo and check a few websites. I'm also thinking of adding a few pages to the tests, just to make sure the app keeps working.
Would you mind giving me 2-3 links to the articles you consider most important?

@codeth99
codeth99 commented Dec 5, 2020

Thanks! Thanks! Thanks! I have cloned the repo and checked a few websites, and it generally works. For example:
https://blog.csdn.net/weixin_33743248/article/details/88733044
😄

However, in this article (https://blog.csdn.net/NextStand/article/details/59535555), some code comments, such as “//输出 test.js”, are lost.
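
A lost comment like the one quoted above could be caught by a small regression check that compares the source code text with the generated Markdown. This is only a sketch in Python under stated assumptions: the function name is hypothetical, both inputs are already plain strings, and only the basic CJK Unified Ideographs block is matched.

```python
import re

# Runs of CJK Unified Ideographs. Assumption: the basic block (U+4E00..U+9FFF)
# is enough for the kind of articles discussed in this thread.
_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def missing_cjk_runs(source_text, markdown):
    """Return CJK runs found in source_text that do not appear in markdown."""
    missing = []
    for run in _CJK_RUN.findall(source_text):
        # Report each missing run once, in order of first appearance.
        if run not in markdown and run not in missing:
            missing.append(run)
    return missing
```

An empty result means every run of Chinese text from the source also appears somewhere in the Markdown; it does not prove the text ended up in the right place.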
