feat: support chardet config file setting #457
base: main
Conversation
@ccoVeille Yeah, hold off for a bit. I'm gonna toss chardet and use our own code to determine if a file is latin1, utf-8, utf-8-bom, utf-16be or utf-16le. Here's a start:

```
package main

import (
	// ...
)

func detectEncoding(data []byte) string {
	// ...
}

func isValidUTF16LE(data []byte) bool { /* ... */ }

func isValidUTF16BE(data []byte) bool { /* ... */ }

func isLikelyLatin1(data []byte) bool {
	// ...
}

func main() {
	// ...
}
```
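For illustration, here is a minimal sketch of what a BOM-plus-validity detector along those lines could look like. The helper name follows the skeleton above, but the heuristics (BOM prefixes, falling back to latin1) are my assumptions, not the code this PR actually ships:

```
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// detectEncoding guesses an encoding from BOMs first, then falls back to a
// UTF-8 validity check. Illustrative sketch only; note that pure-ASCII input
// is reported as utf-8 even though it is equally valid latin1.
func detectEncoding(data []byte) string {
	switch {
	case bytes.HasPrefix(data, []byte{0xEF, 0xBB, 0xBF}):
		return "utf-8-bom"
	case bytes.HasPrefix(data, []byte{0xFF, 0xFE}):
		return "utf-16le"
	case bytes.HasPrefix(data, []byte{0xFE, 0xFF}):
		return "utf-16be"
	case utf8.Valid(data):
		return "utf-8"
	default:
		return "latin1" // assume a single-byte encoding as a last resort
	}
}

func main() {
	fmt.Println(detectEncoding([]byte{0xEF, 0xBB, 0xBF, 'h', 'i'})) // utf-8-bom
}
```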
I would probably ignore the entire set of control characters (0-32). If I recall correctly, they kept the same meaning throughout the whole iso-8859 family and Unicode. Also, IIRC, if we only find byte values <128 it could be both utf8 and latin1, so both should be accepted.
I looked for a library for that, more for examples and ideas than for something to import. I found https://github.com/softlandia/cpd. I didn't check how they handle chars 0-32, but I can tell their test files are interesting, and I like that it considers code pages I wouldn't have thought about. Does anyone know another lib we could look at?
@ccoVeille Thank you for the suggestion. Unfortunately, of the ISO8859s, it only identifies ISO8859-5, not ISO8859-1, which is what I think is the best match for the …
@klaernie Sorry, I don't quite follow. The 0-32 characters (other than TAB, FF, LF, CR) are the best way to determine if a file is text or binary. It's how dos2unix, and many other programs, determine this. And utf-16/32 uses 0-32, so I don't see how they have the "same meaning" when in a utf-8 file, or an iso-8859-1 file.
That's true. The trick is to determine if a file is …
I was thinking of the meaning of the first 32 Unicode codepoints being identical to the first 32 in ASCII - but I didn't think about the fact that, despite being the first 32 codepoints, they are not represented as bytes with the values 0-32 in utf16 and utf32. In utf8, however, this assumption would hold. But no matter, you are indeed correct that this is the only chance to differentiate text from binary files. I think I should not spend too much time on GitHub having just gotten out of bed before my first coffee ;) Thanks a lot for the effort you are putting into this!
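For reference, a minimal sketch of the text-vs-binary heuristic discussed in this thread (illustrative only, not this PR's code; as noted above it only makes sense once the UTF-16/32 cases have already been handled, since those encodings legitimately contain low bytes):

```
package main

import "fmt"

// isProbablyBinary reports whether data contains control bytes below 32 other
// than TAB, LF, FF and CR - the heuristic dos2unix and similar tools use to
// tell text from binary. Sketch only; not the PR's implementation.
func isProbablyBinary(data []byte) bool {
	for _, b := range data {
		if b < 32 {
			switch b {
			case '\t', '\n', '\f', '\r':
				// control characters that legitimately appear in text files
			default:
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(isProbablyBinary([]byte("plain text\n")))      // false
	fmt.Println(isProbablyBinary([]byte{0x00, 0x01, 'h', 'i'})) // true
}
```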
The new chardet library works much better than the old one. Here are the only failures: …
Force-pushed from 7624a20 to f554002
So I added the 159 testdata files from …
(superseded) That's over 20%, so I propose the following solution: editorconfig-checker/README.md, lines 378 to 420 in dde3580
Reading the tests in https://github.com/editorconfig/editorconfig-plugin-tests, I'm pretty sure all the ISO-8859 variants are grouped as … So I think I would reduce the ISO8859 variants to all be treated as … The more important use case should IMHO be that we correctly identify UTF-8, -16 and -32 in both their endiannesses and detect a BOM. This will be the more frequent use case for people wanting to ensure their codebase is up to a modern standard and is not introducing files containing what I'd call legacy encodings.
@klaernie Good feedback. I thought of that too. But what led me to think … But I think you're right. By default, we interpret outdated …

{
  ...
  "Charsets": {
    "Latin1": ["ISO-8859-1"]
  }
  ...
}

Thoughts?

Edit: But note that https://en.wikipedia.org/wiki/Byte_order_mark#UTF-16 says my logic "can result in both false positives and false negatives." I guess if the user has files that fail our checks, they can always exclude them.
Force-pushed from b525ba2 to c8905c0
Force-pushed from c8905c0 to fb7a7c7
Force-pushed from fb7a7c7 to 6f1c4bc
Note: I had to add … Now it's in sync (again) with pkg/config/config.go. I'm not sure how things worked without this.
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## main #457 +/- ##
==========================================
+ Coverage 86.72% 87.37% +0.65%
==========================================
Files 11 11
Lines 1017 1228 +211
==========================================
+ Hits 882 1073 +191
- Misses 102 120 +18
- Partials 33 35 +2
View full report in Codecov by Sentry.
I did not find the wiki page - that uncovers the hints I failed to notice. So we should implement … This also underpins the sentiment I get from the editorconfig wiki - I read it as "use unicode as a first choice" - which I personally also find the most reasonable. I'll hopefully get to review the code later, but that might not be today.
I hear you. I initially thought so as well, but you convinced me to be more lenient given our false positives in identifying … So, note that …
Since we are not an editor, I think we can include support for other character sets. The spec says that …
Agreed. And I can see many people using editorconfig-checker to help them migrate their legacy files to Unicode.
Edit: I don't intend to make any more changes, but take your time. It's a big change.
(superseded) Take your time. I feel it's ready for review, but there are three small changes I am considering: …
const defaultConfidence = 1
const testResultsJson = "test-results.json"
do I understand this correctly as being a test snapshot file? If so, why not reuse snaps - there the order of the tests would not matter?
To be honest, I haven't worked with go-snaps before. Are you suggesting we should here?
Note: If EDITORCONFIG_ADD_NEW_FILES=1 is set in the environment, the test suite will scan for new files in testdata and add them to test-results.json. Otherwise the suite runs only on the testdata files listed in the file.
I found the file very helpful in my debugging, as I could run a git diff after a run to see if anything changed. That was a lot easier than scanning the log output for hundreds of files.
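Roughly, the workflow described above would look like this (using the variable and file names from this thread):

```
# run only against the files already listed in test-results.json
go test ./...

# also pick up new files under testdata/ and record them in test-results.json,
# then inspect what changed
EDITORCONFIG_ADD_NEW_FILES=1 go test ./...
git diff test-results.json
```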
Okay, so basically you implement the reverse behaviour of snaps. Snaps will add new snapshots, but never change existing snapshots unless UPDATE_SNAPS is set to a true value.
In this use case I would think using snaps.MatchStandaloneJSON(t, someValue) would be better.
https://github.com/gkampitakis/go-snaps?tab=readme-ov-file#matchjson
This would mean:
- always scan for testfiles
- during teardown, call snaps.MatchStandaloneJSON() for each test. Although one might argue that matching the snapshot inline would be easier; right now the test architecture touches the central tests slice multiple times, if I understand it correctly.
snaps itself will then generate test failures when a previously created snapshot is not matched.
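To make the suggestion concrete, here is a rough sketch of how that could look with go-snaps. Everything here is an assumption about this codebase: detectCharset is only a stand-in for the PR's real detection function, and the package name, test name, and testdata layout are made up for illustration.

```
package validation_test

import (
	"os"
	"path/filepath"
	"testing"
	"unicode/utf8"

	"github.com/gkampitakis/go-snaps/snaps"
)

// detectCharset is only a placeholder so the example compiles; the real test
// would call the PR's detection code instead.
func detectCharset(data []byte) string {
	if utf8.Valid(data) {
		return "utf-8"
	}
	return "unknown"
}

// TestCharsetSnapshots scans testdata and records one standalone JSON snapshot
// per file. go-snaps writes the snapshot on the first run and fails the test
// if a later run produces a different value, unless UPDATE_SNAPS is set.
func TestCharsetSnapshots(t *testing.T) {
	files, err := filepath.Glob(filepath.Join("testdata", "*"))
	if err != nil {
		t.Fatal(err)
	}
	for _, file := range files {
		t.Run(filepath.Base(file), func(t *testing.T) {
			data, err := os.ReadFile(file)
			if err != nil {
				t.Fatal(err)
			}
			snaps.MatchStandaloneJSON(t, map[string]string{
				"file":    filepath.Base(file),
				"charset": detectCharset(data),
			})
		})
	}
}
```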
UnknownEncoding = "unknown"
// See https://spec.editorconfig.org/#supported-pairs
// CharsetUnset defines the value allowing for file encoding.
We could use the constants at https://github.com/editorconfig/editorconfig-core-go/blob/7404f9a31780afcfa5bc3a62697476e720b39c5d/editorconfig.go#L36 instead.
charsetFound = "utf8bom"
}
if !supported(charsetFound) { |
supportedUTFEncoding() would be a more apt name.
}
// We need to check for UTF16/32 encodings first, as
// UTF16/32 encoded first can be valid UTF8 files (surprisingly).
s/first/files/
I think the test files at … If code review hasn't started, lemme know, and I'll set this to draft status and add them, as well as the comments noted above.
@@ -1,5 +1,6 @@
ifeq ($(OS),Windows_NT)
STDERR=con
EXEEXT=.exe
where did that come from?
@klaernie Windows executables need an .exe extension for Windows to execute them.
testnorace: ## Run test suite without -race which requires cgo
go test -coverprofile=coverage.txt -covermode=atomic ./...
go test -trimpath -coverprofile=coverage.txt -covermode=atomic ./...
go vet ./...
@test -z $(shell gofmt -s -l . | tee $(STDERR)) || (echo "[ERROR] Fix formatting issues with 'gofmt'" && exit 1)
Is that a requirement for the tests to pass, or just to work around a limitation on your dev machine?
Not a requirement. Just needed to run the tests when CGO is not available.
I wonder if there is an optimal set instead of collecting test files from everywhere. But currently I'm on neither side of the fence, so feel free to add them.
There is no binary state of code review; feel free to make changes as you see fit. We maintainers would be stupid to keep you from iterating towards the best solution - after all, you're doing the hard work right now, and I'm very thankful for that! Generally I'm a bit wary of the huge implementation of the test cases, since I still haven't wrapped my head around it fully. It seems fairly complicated, but probably isn't. But I think if you convert this from storing tests in a JSON file to matching snapshots it might become clearer.
The code, I believe, is production ready, but I'm converting this to a draft to revise the test framework and explore using snapshots, as @klaernie suggested. Please be patient, as I haven't worked with this tooling before. I think only using the highest quality test files makes sense too.
Fixes #40
no longer applicable
Per [here](https://github.com//pull/457/files#diff-2a0547966abe4b6fcace630584fe01fee0d2498396cc80c8ee2a36c0d46fae28R202):

```
// The below file fails the test, but it may not be a valid UTF-16LE file.
// For example, the Linux file command doesn't identify the file as
// "Unicode text, UTF-16, little-endian text"
// but simply
// "data"
// but since the file is from
// https://cs.opensource.google/go/x/text/+/master:encoding/testdata/
// I think it's correct to fail the test, and fix the chardet package.
{"candide-utf-16le.txt", "utf16le"},
```