chardet detects UTF-8 XML file as EUC_KR - Possibility to exclude encodings? · Issue #287 · chardet/chardet · GitHub
I have a UTF-8 XML file that chardet detects as EUC_KR. Is there a way to exclude encodings from detection?
I would like to exclude EUC_KR and EUC_JP from the encodings that can be detected, but I can't find any method for excluding encodings.
This is my code:
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
import sys,os,csv
import chardet
from chardet.universaldetector import UniversalDetector
from datetime import datetime
from datetime import date
print(chardet.__version__)
3.0.4
def detect_encode_generic(file):
    detector = UniversalDetector()
    detector.reset()
    with open(file, 'rb') as f:
        for row in f:
            detector.feed(row)
            if detector.done: break
    detector.close()
    return detector.result
infile=os.path.realpath("./2023-01-08_C12_DE35435545485488415265_EUR_000123.xml")
result_gen = detect_encode_generic(infile)
print(f" {infile} is encoded in '{result_gen['encoding']}' with confidence level of {result_gen['confidence']}")
/foo/bar/2023-01-08_C12_DE35435545485488415265_EUR_000123.xml is encoded in 'EUC-KR' with confidence level of 0.99
The UTF-8 XML file that gets detected as EUC_KR is attached:
2023-01-08_C12_DE35435545485488415265_EUR_000123.zip
Best Regards,
Thomas
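
A possible workaround, offered only as a sketch: it does not exclude encodings inside chardet. Instead it tries a strict UTF-8 decode first and consults a detector only when that fails. The idea rests on the assumption that genuine EUC-KR or EUC-JP text rarely happens to be valid UTF-8, so a file that decodes cleanly is treated as UTF-8. The names `guess_encoding` and `fallback` are hypothetical and not part of chardet.

```python
from typing import Callable, Optional


def guess_encoding(
    path: str,
    fallback: Optional[Callable[[bytes], Optional[str]]] = None,
) -> Optional[str]:
    """Return 'utf-8' if the file is valid UTF-8, otherwise ask `fallback`.

    `fallback` is a hypothetical hook: any callable that takes the raw
    bytes and returns an encoding name (or None).
    """
    with open(path, "rb") as f:
        data = f.read()
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid byte sequence.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: defer to the detector, if one was supplied.
        return fallback(data) if fallback is not None else None
```

With chardet installed, the detector could be plugged in as `guess_encoding(infile, fallback=lambda data: chardet.detect(data)["encoding"])`. Note that this reads the whole file into memory, unlike the incremental `UniversalDetector` loop above, so it suits small files such as the attached XML.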