scannerloop stops after encountering a very long line #81
Open
@timmattison

If a file contains a line longer than the 1MB buffer length (don't ask), the scanner.Scan() condition of the scannerloop's for loop evaluates to false. When this happens, line counting for the current file stops at that point and the results reported for that file are incorrect.

gocloc/file.go, line 90 in 7b24285:

	for scanner.Scan() {
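
A minimal sketch of the failure, using small limits so it runs instantly. The helper name `countLines` and the 32-byte limit are assumptions for illustration only, not gocloc code. When a line exceeds the scanner's maximum token size, `Scan()` returns false and `scanner.Err()` reports `bufio.ErrTooLong`, so every line after the long one is silently skipped:

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// countLines counts lines in input with a bufio.Scanner whose maximum token
// size is maxToken bytes. It returns the count reached when the loop ended
// and whatever error the scanner recorded.
// (Hypothetical helper for illustration; not part of gocloc.)
func countLines(input string, maxToken int) (int, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	// Start with a small buffer; the scanner may grow it up to maxToken.
	scanner.Buffer(make([]byte, 0, 16), maxToken)
	n := 0
	for scanner.Scan() {
		n++
	}
	// Without this check the early exit is indistinguishable from EOF.
	return n, scanner.Err()
}

func main() {
	input := "short\n" + strings.Repeat("x", 100) + "\nafter\n"
	n, err := countLines(input, 32)
	// Only the first line is counted; the long line stops the loop.
	fmt.Println(n, errors.Is(err, bufio.ErrTooLong)) // prints "1 true"
}
```

Checking `scanner.Err()` after the loop would at least let the tool report that a file was only partially counted.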

I can see a few possible fixes for this.

  1. Add a new option to set the maximum buffer size, defaulting to 1MB if it is unset:
	if opts.MaxLineLength > 0 {
		scanner.Buffer(buf.Bytes(), opts.MaxLineLength)
	} else {
		scanner.Buffer(buf.Bytes(), 1024*1024)
	}
  2. Scan each file ahead of time to find the longest gap between line endings, then automatically use that as the buffer size. This does require reading the file twice, though.

  3. Change the scannerloop to use something like mmap instead of scanner.
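
As a rough illustration of the first option, the snippet above could be fleshed out along these lines. `Options`, `MaxLineLength` as a plain int field, `newLineScanner`, `countWithOptions`, and the 4096-byte starting buffer are all hypothetical names and values chosen for the sketch; gocloc's real option struct and buffer setup may differ:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// defaultMaxLineLength mirrors the 1MB limit described in the issue.
const defaultMaxLineLength = 1024 * 1024

// Options is a hypothetical stand-in for the tool's option struct.
type Options struct {
	MaxLineLength int // 0 (or negative) means "use the default"
}

// newLineScanner builds a scanner whose maximum token size comes from opts.
func newLineScanner(input string, opts Options) *bufio.Scanner {
	limit := defaultMaxLineLength
	if opts.MaxLineLength > 0 {
		limit = opts.MaxLineLength
	}
	scanner := bufio.NewScanner(strings.NewReader(input))
	// The initial buffer is small and grows on demand up to limit.
	// Note: limits below the initial capacity have no extra effect.
	scanner.Buffer(make([]byte, 0, 4096), limit)
	return scanner
}

// countWithOptions counts lines and surfaces the scanner error, if any.
func countWithOptions(input string, opts Options) (int, error) {
	scanner := newLineScanner(input, opts)
	n := 0
	for scanner.Scan() {
		n++
	}
	return n, scanner.Err()
}

func main() {
	// A 2MB line sandwiched between two short lines.
	input := "short\n" + strings.Repeat("x", 2*1024*1024) + "\nafter\n"

	// Default limit: counting stops at the long line with an error.
	n, err := countWithOptions(input, Options{})
	fmt.Println(n, err != nil)

	// Raised limit: all three lines are counted.
	n, err = countWithOptions(input, Options{MaxLineLength: 4 * 1024 * 1024})
	fmt.Println(n, err)
}
```

The trade-off is that the user has to know a suitable limit in advance, which is what the second option tries to avoid.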

If you're interested in the third one let me know and I'll work on a PR.

The first one probably touches a bit more of the overall design than I should take on for a first PR.

I think the second one is safe, but it does double the I/O required. Disk caching may make the second read much cheaper than a true doubling of raw disk reads, but it still feels like a last resort.
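
For reference, the two-pass idea from the second option could be sketched roughly as follows. The function names `longestLine`, `countLinesTwoPass`, and `countString` are hypothetical, and the sketch assumes the input can be rewound (an `io.ReadSeeker`, which `*os.File` satisfies):

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// longestLine returns the length in bytes of the longest line in r,
// counting the trailing newline when one is present. This is the first pass.
func longestLine(r io.Reader) (int, error) {
	br := bufio.NewReader(r)
	longest, current := 0, 0
	for {
		b, err := br.ReadByte()
		if err == io.EOF {
			if current > longest {
				longest = current
			}
			return longest, nil
		}
		if err != nil {
			return longest, err
		}
		current++
		if b == '\n' {
			if current > longest {
				longest = current
			}
			current = 0
		}
	}
}

// countLinesTwoPass measures the longest line, rewinds, and then scans with
// a maximum token size just large enough to hold that line.
func countLinesTwoPass(rs io.ReadSeeker) (int, error) {
	longest, err := longestLine(rs)
	if err != nil {
		return 0, err
	}
	// Rewind for the second pass; this is the doubled I/O.
	if _, err := rs.Seek(0, io.SeekStart); err != nil {
		return 0, err
	}
	scanner := bufio.NewScanner(rs)
	// One spare byte so the longest line never exactly fills the buffer.
	scanner.Buffer(make([]byte, 0, 4096), longest+1)
	n := 0
	for scanner.Scan() {
		n++
	}
	return n, scanner.Err()
}

// countString is a small convenience wrapper for in-memory input.
func countString(input string) (int, error) {
	return countLinesTwoPass(strings.NewReader(input))
}

func main() {
	input := "short\n" + strings.Repeat("x", 2*1024*1024) + "\nafter\n"
	n, err := countString(input)
	fmt.Println(n, err)
}
```

The scanner's buffer only grows as far as the file actually needs, at the cost of the extra read pass discussed above.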
