racket-xp-mode: Handling very large files

Via Racket Slack, @samth supplied this example file which is over 8 million bytes and 86,000 lines long.

It takes the Racket Mode back end nearly 60 seconds to run check-syntax and prepare the response. The log below (times in milliseconds) accounts for about 51 seconds of that, almost all of it in expansion.

[  debug] racket-mode: 47913 cpu | 48131 real | 3047 gc <= drracket/check-syntax/expanded-expression
[  debug] racket-mode:    0 cpu |    0 real |    0 gc <= drracket/check-syntax/expansion-completed
[  debug] racket-mode:   50 cpu |   50 real |    2 gc <= defs/uses
[  debug] racket-mode:  732 cpu |  733 real |   82 gc <= get-annotations
[  debug] racket-mode:  114 cpu |  114 real |   67 gc <= imports
[  debug] racket-mode: 51073 cpu | 51293 real | 4029 gc <= total /var/tmp/samth-huge.rkt

The response is a single huge s-expression. After our elisp-writeln (which itself takes about 2 seconds), it is 20,452,540 bytes long.

Emacs then seems to freeze when reading the response.

I'm not yet sure how much time is spent attempting an Emacs Lisp read of the process buffer text. Since the process filter will be getting text in smaller chunks, the read is being attempted multiple times before succeeding.
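To make the repeated read attempts concrete, here is a minimal sketch of the general pattern, not Racket Mode's actual process filter. The function name `my-feed-chunks` is hypothetical. It appends each chunk to a buffer and retries a full `read` from the start, treating `end-of-file` as "not complete yet".

```elisp
;; Sketch only: `my-feed-chunks' is a hypothetical name, not part of
;; Racket Mode. It models a filter that receives output in pieces.
(defun my-feed-chunks (chunks)
  "Insert CHUNKS one at a time, attempting a full `read' after each.
Return (VALUE . ATTEMPTS) once a complete s-expression is read,
or nil if CHUNKS run out first."
  (with-temp-buffer
    (let ((attempts 0)
          (result nil))
      (while (and chunks (not result))
        ;; Append the newly arrived text.
        (goto-char (point-max))
        (insert (pop chunks))
        (setq attempts (1+ attempts))
        ;; Every attempt starts over from the beginning of the buffer.
        (goto-char (point-min))
        (condition-case nil
            (setq result (cons (read (current-buffer)) attempts))
          ;; An incomplete s-expression signals `end-of-file'.
          (end-of-file nil)))
      result)))

(my-feed-chunks '("(1 2" " 3 (4" " 5))"))
;; => ((1 2 3 (4 5)) . 3)
```

With a response of about 20 MB arriving in small chunks, the number of failed attempts, and the text each one has to scan, could be substantial. Measuring that is the open question here.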

Even if it got past that, I'm not sure how much time would be spent in the command response handler, which does a dolist over the response sexpr, adding text properties to the buffer.

Clearly the current design isn't intended or able to handle files this large. A 5,000 line file like drracket/private/unit.rkt is closer to the envisioned definition of "large".

What I don't yet know is what to do about it.

Simple mitigation

Of course there can be a mitigation: We can give people a function to add to racket-mode-hook in place of racket-xp-mode itself. Instead of unconditionally enabling that mode, the function would check buffer-size first.

Another mitigation would leave the hook function unchanged. Instead, racket-xp-mode itself would check buffer-size and act differently for large buffers, e.g. set racket-xp-after-change-refresh-delay to nil for such a buffer, and have the manual racket-xp-annotate command warn about such buffers.
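The first mitigation might look something like the following sketch. The names `my-racket-xp-max-buffer-size`, `my-racket-buffer-too-large-p`, and `my-racket-maybe-enable-xp-mode` are hypothetical, and the threshold is an arbitrary guess rather than a measured cutoff. Note that `buffer-size` counts characters, not bytes.

```elisp
;; Sketch only: the `my-' names are hypothetical and the threshold
;; is an arbitrary guess, not a value Racket Mode defines.
(defvar my-racket-xp-max-buffer-size 1000000
  "Buffers larger than this many characters do not get `racket-xp-mode'.")

(defun my-racket-buffer-too-large-p ()
  "Return non-nil if the current buffer exceeds the size threshold."
  (> (buffer-size) my-racket-xp-max-buffer-size))

(defun my-racket-maybe-enable-xp-mode ()
  "Enable `racket-xp-mode' unless the current buffer is very large."
  (if (my-racket-buffer-too-large-p)
      (message "Buffer is large (%d chars); not enabling racket-xp-mode"
               (buffer-size))
    (racket-xp-mode 1)))

;; Use this instead of adding `racket-xp-mode' itself to the hook.
(add-hook 'racket-mode-hook #'my-racket-maybe-enable-xp-mode)
```

A user who still wants annotations in a large buffer could enable racket-xp-mode by hand, accepting the delay.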

"Streaming"

Instead of returning a single response, the back end command could send a stream of notifications.

Although this would probably solve the issue of Emacs "freezing", it's not clear that ~60 seconds of such notifications and updates to the buffer is going to be a good or coherent experience. For example, what if the user edits, prompting a new check-syntax? How/when do we delete text properties for the previous generation?
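One possible answer to the stale-generation question, offered only as a sketch under assumptions: tag every notification with a generation number, clear the old properties when a new generation starts, and drop any notification whose generation is no longer current. All names here (`my-xp-generation`, `my-xp-start-generation`, `my-xp-handle-notification`, and the `my-xp-info` property) are hypothetical. The notification shape is invented for illustration and is not an existing Racket Mode protocol.

```elisp
;; Sketch only: every `my-xp-' name and the notification shape are
;; hypothetical; nothing here is an existing Racket Mode API.
(defvar-local my-xp-generation 0
  "Generation number of the check-syntax run currently being applied.")

(defun my-xp-start-generation ()
  "Begin a new generation in the current buffer and return its number.
Properties left over from the previous generation are removed."
  (setq my-xp-generation (1+ my-xp-generation))
  (with-silent-modifications
    (remove-text-properties (point-min) (point-max) '(my-xp-info nil)))
  my-xp-generation)

(defun my-xp-handle-notification (generation beg end info)
  "Apply INFO to the region BEG..END if GENERATION is still current.
Return non-nil when applied, nil when the notification is stale."
  (when (= generation my-xp-generation)
    (with-silent-modifications
      (put-text-property beg end 'my-xp-info info))
    t))
```

This clears everything up front, so the buffer would show no annotations until the new stream catches up. Whether that is a coherent experience over ~60 seconds is exactly the concern raised above.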

Don't give Emacs the data and use text properties, at all

Another idea is not to return the data in a command response and insert it as text properties, but instead to hold it in the back end. Add a command to query it, e.g. "What are the annotations for this interval of the buffer?" Drawbacks here include managing a new source of state in the back end. Also, we currently consult the properties in an Emacs pre-redisplay-functions hook, so that we can update things on screen as the user navigates. With this design, each such movement of point would need to issue a command to the back end; will that be fast enough to be satisfactory?

Other?

Probably there are other ideas, which I'm not yet thinking of, with their own tradeoffs, which I don't yet know.
