8000 Inconsistent datatype representation causes segfault · Issue #353 · smlnj/legacy · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Inconsistent datatype representation causes segfault #353

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
2 of 12 tasks
zbyrn opened this issue Apr 10, 2025 · 3 comments
Open
2 of 12 tasks

Inconsistent datatype representation causes segfault #353

zbyrn opened this issue Apr 10, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@zbyrn
Copy link
Contributor
zbyrn commented Apr 10, 2025

Version

110.99.7.1 (Latest)

Operating System

  • Any
  • Linux
  • macOS
  • Windows
  • Other Unix

OS Version

No response

Processor

  • Any
  • Arm (using Rosetta)
  • PowerPC
  • Sparc
  • x86 (32-bit)
  • x86-64 (64-bit)
  • Other

System Component

Core system

Severity

Major

Description

functor F(S : sig
  type 'a t
  datatype t' = A | B of int t
end) = struct
  open S
  fun toT (B x) = x | toT A = raise Fail "irrelevant"
end

The CPS IR generated for toT is the following:

toT2(v133[C],v27[PV]) =
         if boxed(v27) [v146] then
            v27.0 -> v103[PV]
            v133(v103)
         else
            v164.1 -> v166[PV]
            v164.0 -> v167[PR2]
            {v60,v166,v167} -> v140
            gethdlr() -> v144[FN]
            v144(v130,v140)

It assumes that A is represented by an integer and B is a box storing int t.

If a structure defines int t to be a boxed type, the B constructor is represented directly by the int t box --- instead of a box storing int t --- causing inconsistency.

Discovered in conversation with @Skyb0rg007

Transcript

functor F(S : sig
  type 'a t
  datatype t' = A | B of int t
end) = struct
  open S
  fun toT (B x) = x | toT A = raise Fail "irrelevant"
end

structure S = F(struct type 'a t = 'a * 'a datatype t' = A | B of int t end)

val x = #1 (S.toT (S.B (1, 2)))
sml bug.sml
Standard ML of New Jersey [Version 110.99.7.1; 64-bit; January 17, 2025]
[opening bug.sml]
zsh: segmentation fault  sml bug.sml

Expected Behavior

No response

Steps to Reproduce

functor F(S : sig
  type 'a t
  datatype t' = A | B of int t
end) = struct
  open S
  fun toT (B x) = x | toT A = raise Fail "irrelevant"
end

structure S = F(struct type 'a t = 'a * 'a datatype t' = A | B of int t end)

val x = #1 (S.toT (S.B (1, 2)))

Additional Information

This bug seems to relate to the fact that t is a type constructor with type variables. Changing the signature from type 'a t to type t makes the compiler generate the correct code.

functor F(S : sig
  type t
  datatype t' = A | B of t
end) = struct
  open S
  fun toT (B x) = x | toT A = raise Fail "irrelevant"
end
toT2(v121[C],v27[PV]) =
            if boxed(v27) [v136] then
               if i63=(v49,(I63t)0) [v134] then
                  v27.0 -> v103[PV]
                  v121(v103)
               else
                  v121(v27)
            else
               v168.1 -> v170[PV]
               v168.0 -> v171[PR2]
               {v60,v170,v171} -> v128
               gethdlr() -> v132[FN]
               v132(v117,v128)

Email address

byronzhong@cs.uchicago.edu

@zbyrn zbyrn added the bug Something isn't working label Apr 10, 2025
@dmacqueen
Copy link
Contributor
dmacqueen commented Apr 12, 2025

This is a scary bug! But not entirely surprising, since we have this long-standing issue of how the choice of "optimized" representations for datatype constructors (or, more precisely, datatype "variants" like the nil and cons variants for lists) and how that choice is limited by type abstraction of various kinds.

This is actually a fairly deep and delicate problem. The underlying polymorphic type system is build on the assumption that all types are the "same", i.e. that all values have the same shape and size. This allows us to use the same algorithm for, say, list reverse or list append without taking the list element type into account. In practice, this is based on the uniform representation assumption that all values are represented physically by a "word" (e.g., 32 or 64 bits, depending on the computer architecture, or specifically the size of general-purpose registers. Then, naively, a discriminated or tagged union (like list = nil | cons) is represented by "tagged" values, like (0,x), (1,y). But it turns out that there is a way of distinguish values based not on the bits themselves, but on how they are interpreted in a particular context. This distinction is between words used as pointers, and words used as numbers or bytes or bit sequences. This distinction is normally important for garbage collection, so in languages like ML (or Lisp, or Pop-2, or ...) we make the distinction between pointers and non-pointers physically manifest by using even numbers as pointers and odd numbers as non-pointers (numbers, etc. where n represents n div 2). Then we have a way of distinguishing values that can be exploited to simplify or streamline the representations of datatype values: E.g., nil = 0, a cons cell is a pointer (to a pair)) -- this means that "tagging" is no longer necessary for lists, for example. One can get fancier and generalize this idea, as Luca Cardelli explored in the early 1980s in his implementation of "Cardelli ML". The problem is the presence of "abstract" types whose representation is unknown limits the type representation information available for calculating these optimized variant or constructor representations, and things get tricky. Hence lurking bugs like the one. I have some further thoughts on the general issue, but I won't go into them here.

@JohnReppy
Copy link
Contributor

I wonder if this is a regression, since I thought that handling this issue was one of the things that FLINT uses runtime type passing for?

@dmacqueen
Copy link
Contributor

It could be a regression, but since none of us understands the FLINT type passing functionality in any detail, it may require some digging and code revision to find and fix the problem. I have never learned or touched that code, but I have revised some other FLINT code that might be connected somehow. It is still the case that much of FLINT remains a "black hole" in the implementation. It needs to be re-implemented (renewed) or replaced, and I would start with a greatly simplified version of the FLINT types. We also need to think more deeply about the general problem of data constructor representation and type abstraction Might make a good Master's thesis topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants
0