Insulating layer?

Posted Oct 12, 2024 6:47 UTC (Sat) by mb (subscriber, #50428)
In reply to: Insulating layer? by NYKevin
Parent article: On Rust in enterprise kernels

If you put refcounting everywhere, you are making the language a lot slower and it would still *not* be fully safe. Think about threads.

There is no such thing as a "safe subset" of C++ and there never will be. If you create a "safe subset", then it will be a completely different language that people will have to learn. Why bother? You would still end up with a language containing massive amount of obsolete backward compatibility stuff next to your new "safe subset" language.

Just switch to a safe language (for new developments). It's much simpler. And it's much safer.

Insulating layer?

Posted Oct 12, 2024 16:45 UTC (Sat) by NYKevin (subscriber, #129325) [Link]

I'm not saying I think this is a good idea, and I'm certainly not trying to argue that it will be a serious competitor to Rust in terms of overall safety. I'm saying that the very same political factor you identify (massive legacy codebases) will also create great pressure for C++ to provide a "safe profile" for use in contexts where Compliance™ is mandatory. It is reasonable to assume that the smart people who are working on this will end up producing something that has at least some improvements in safety over the status quo (which is a very low bar for C++). Compare and contrast MISRA C, which has the advantage of starting from a much simpler language, but the disadvantage of not having smart pointers for simple "object X owns object Y" cases.

Insulating layer?

Posted Oct 14, 2024 9:31 UTC (Mon) by paulj (subscriber, #341) [Link] (36 responses)

Isn't Rust basically a ref-counting language, for long-lived programmes with non-trivial data relationships?

Sure, for a compute-and-complete programme, processing a finite input to give some result, Rust programmes can allocate from the stack, and/or have acyclically-linked heap objects with tight lifetimes. But... for anything long-lived with complex relationships between different data objects, isn't the pattern in Rust to use Rc/Arc? I.e., refcounting?

Insulating layer?

Posted Oct 14, 2024 10:26 UTC (Mon) by farnz (subscriber, #17727) [Link] (14 responses)

Your long-lived data is often refcounted (especially in async Rust) because that's simpler to get right than rearranging so that your data doesn't have complicated lifetime relationships. But your short-lived accesses to that data usually borrow it, rather than refcounting it, because their relationship to the bit of data they need is simple.

And that's something that's not easy to express in Python - the idea that I borrow this data in a simple fashion, even though the structure as a whole is complicated and refcounted.

Insulating layer?

Posted Oct 14, 2024 11:00 UTC (Mon) by paulj (subscriber, #341) [Link] (13 responses)

I have tried playing with Rust. There's a lot to like. There's some stuff I dislike - the macro system especially makes me go "Wut?", Zig's approach of allowing the /same/ language to be used to compute at compile time looks to be much more ergonomic (though, havn't tried Zig yet).

In the little I've played with Rust, the lifetime annotations make sense on simpler stuff. But it seems to get very baroque on more complex stuff. And I havn't gotten over a certain "barrier" on how to design the not-simpler kinds of inter-linked relationships that are unavoidable in real-world stuff. E.g. a network protocol, you may have objects and state for "peers", "connections", "TLS contexts", "Streams", "message entity" (plurality of these) with m:n relationships between them. A peer may have multiple connections, messages may have a lifetime related to a /set/ of peers and not just the peer it was received from, etc. You have to use RC for at least some of these - so now you're dealing with /both/ refcounting AND lifetime annotations for the refcounting objects (in other languages you would embed the RC information in the refcounted object, but in Rust the RC object holds the refcounted object), and there are rules for getting data into and out of RCs and cells and what not. And I end up with long screeds of compiler error messages that give me flashbacks to C++ - maybe Rust's are better, but I've been traumatised. The complexity of implementing simple linked lists in Rust, without unsafe, is a bit of a red flag to me.

I think the local stack variable lifetimes possibly could be dealt with in simpler ways, perhaps with more limited annotations, or perhaps none at all but just compiler analysis. It might be more limited than Rust, but be enough to deal with typical needs for automatic data in long-lived programmes.

What I'd like, for long-lived programmes with non-simple relationships in their data model (e.g., many network protocols) is a language that did ref-counting /really/ well - ergonomically, with low programming complexity, and minimal runtime overhead. Cause refcounting is how all non-trivial, long-lived programmes, that can not tolerate scannning GC, end-up managing state.

Maybe I'm just too stupid for Rust. Like I'm too stupid for Haskell. Maybe refcounting is the right answer for my lack of intelligence - but then it seems many other programmers are like me. And it seems maybe even the best Rust programmers too, given how they always end up reaching for (A)RC too.

Insulating layer?

Posted Oct 14, 2024 11:28 UTC (Mon) by mb (subscriber, #50428) [Link]

>the macro system especially makes me go "Wut?", Zig's approach of allowing the /same/ language
>to be used to compute at compile time looks to be much more ergonomic (though, havn't tried Zig yet).

Rust can do that, too. It's called proc-macro

Challenges learning how to model data structures in Rust

Posted Oct 14, 2024 12:26 UTC (Mon) by farnz (subscriber, #17727) [Link] (10 responses)

One of the challenges I found in moving from C++ to Rust is that in C++, you model ownership in your head; you sort-of know whether a given pointer represents owning something, shared ownership, or simply referring to something someone else owns. Rust doesn't let you play fast-and-loose with ownership like this; the borrow checker means that you have to be extremely clear about who owns what, and when, using T for "this T is embedded here", Box<T> for "this T is owned, but is stored out of line", and Arc/Rc<T> for "ownership of this T is shared".

You can also write a simple linked list in 40 lines of Rust without any unsafe at all - the use of unsafe is because you want a complicated linked list that meets more interesting properties than the simple linked list implementation does.

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 9:09 UTC (Thu) by paulj (subscriber, #341) [Link] (9 responses)

I was referring more to "simple" doubly-linked lists on the complexity side.

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 9:39 UTC (Thu) by farnz (subscriber, #17727) [Link] (8 responses)

Making a doubly-linked list in safe Rust isn't that much more complicated, although it does bring in the need for Rc, since a doubly-linked list has complex ownership (the previous pointer cannot be owning).

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 10:10 UTC (Thu) by paulj (subscriber, #341) [Link] (2 responses)

Yes, course you can do with with RC. But RC is a "fuck it, we'll do it live!" escape hatch to runtime, out from the compile-time lifetime typing system,

To stay within the typed lifetime-scope of Rust, without unsafe, is somewhere between extremely complex and impossible, isn't it? (I thought impossible, but then I thought I saw a tutorial claiming it was possible once - but I can't find it back right now).

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 14:23 UTC (Thu) by farnz (subscriber, #17727) [Link] (1 responses)

The difficult bit is the ownership model if you don't use refcounting; you want each node to be individually mutable, but you also want each node to be shared by both the previous and the next nodes. That's inevitably difficult in a "shared, or mutable, but not both" model as Rust imposes, unless you use refcounting.

Challenges learning how to model data structures in Rust

Posted Oct 18, 2024 14:38 UTC (Fri) by taladar (subscriber, #68407) [Link]

Something would probably be possible with interior mutability but it likely would need locking or some other internal mechanism to prevent mutation from both ways to access it at the same time.

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 11:48 UTC (Thu) by excors (subscriber, #95769) [Link] (4 responses)

That looks like a poor example - it doesn't implement any operations beyond "new" and "append", so it never even uses its "prev" pointers. The much more thorough exploration at https://rust-unofficial.github.io/too-many-lists/ calls a similar Rc<RefCell<...>> design "A Bad Safe Deque" and notes "It was a nightmare to implement, leaks implementation details, and doesn't support several fundamental operations". Eventually it gets to "A Production Unsafe Deque", which looks pretty complicated. Singly-linked lists are much easier (though still not trivial), the double-linking is the big problem.

(It also explains:

> Linked lists are terrible data structures. Now of course there's several great use cases for a linked list: [...] But all of these cases are _super rare_ for anyone writing a Rust program. 99% of the time you should just use a Vec (array stack), and 99% of the other 1% of the time you should be using a VecDeque (array deque). These are blatantly superior data structures for most workloads due to less frequent allocation, lower memory overhead, true random access, and cache locality.
>
> Linked lists are as _niche_ and _vague_ of a data structure as a trie. Few would balk at me claiming a trie is a niche structure that your average programmer could happily never learn in an entire productive career -- and yet linked lists have some bizarre celebrity status.

Generally, you probably shouldn't judge a language on the basis of how easily it can implemented linked lists. On the other hand, one of the listed "great use cases" is "You're writing a kernel/embedded thing and want to use an intrusive list", which is probably much more common amongst LWN readers than your average programmer; it's fair to have different priorities when judging languages.

But I think it's still important to consider whether your design uses (doubly-)linked lists because they really are the best data structure for your needs, or whether it's just because linked lists are really easy to implement in C (and an equivalent of Vec is hard) so they're your default choice for any kind of sequential container, and you're trying to apply the same mindset to Rust where they're really hard (but Vec is easy) and they should never be the default choice.)

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 12:22 UTC (Thu) by khim (subscriber, #9252) [Link]

> But I think it's still important to consider whether your design uses (doubly-)linked lists because they really are the best data structure for your needs, or whether it's just because linked lists are really easy to implement in C (and an equivalent of Vec is hard) so they're your default choice for any kind of sequential container, and you're trying to apply the same mindset to Rust where they're really hard (but Vec is easy) and they should never be the default choice.)

Tragedy of lists (and the reason Rust had to wait till 2015 to be viable) lies with the fact that while double-linked lists are very bad data structure today – that wasn't the case half-century ago when Computer Science was established and first courses that teach people how to program were developed.

Back then computers big, but their memories (and thus programs written for these computers) were small, while RAM was fast and CPU slow.

Linked lists really shined in such environment!

Today computers are tiny, but have memory measured in gagabytes, programs are crazy big and CPUs are hudnreds of times faster than RAM.

Today linked lists are almost never the right data structure for anything and even when they actually better than something like Vec the difference is not large.

But people are creatures of habits: after they have learned to deal with linked lists once they tend not to ever return back and use them as long as they could.

Worse yet: while it's true, today, that Vec is the way to go and linked lists are [mostly] useless… there wasn't any point in time when linked lists stopped being useful and Vec started shining. Transition was very slow and gradual: first Vec was awful and linked lists great, then Vec have become a tiny bit less inefficient and thus become better for some tasks, while linked list have become less efficient and thus worse for these some task, etc.

Thus even today, long after linked lists and all tricks associated with them stopped being important in 99% of cases, people try to apply what they know to Rust.

And yes, that inevitably sends us back to one funeral at time story.

Which is sad, because that means that new generation would have to relearn everything again: while linked lists are no longer useful many other lessons that “old beards” learned are still usefull… but if C/C++ would be replaced with Rust (or any other safe language) via one funeral at time way then all these lessons would be forgotten… and thus repeated.

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 12:42 UTC (Thu) by smurf (subscriber, #17840) [Link]

Well, the kernel contains a veritable heap of linked lists, so we're stuck with them.

The key isn't to implement a doubly-linked-list in Safe Rust, which is not exactly a workable idea for obvious lifetime reasons, but to hide them behind an appropriate abstraction.

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 14:00 UTC (Thu) by paulj (subscriber, #341) [Link] (1 responses)

I agree linked-lists are a terrible data-structure. But that isn't really the point here.

Doubly linked lists are just a well-understood form of data-structure with cycles. Cycles in data-representations in non-trivial programmes are common, even if those programmes (rightly) eschew linked-lists. The linked-list question speaks to the ability to do common kinds of data modelling.

Focusing on the linked-list part misses the point.

Challenges learning how to model data structures in Rust

Posted Oct 17, 2024 15:57 UTC (Thu) by mb (subscriber, #50428) [Link]

> Focusing on the linked-list part misses the point.

I think that focussing on everything must be implemented in safe Rust is missing the point of Rust.
Yes, many things *can* be implemented in safe Rust, but that doesn't mean they should.

For implementing such basic principles as containers and lists, it's totally fine to use unsafe code.
If you can drop reference counting by implementing it in unsafe code and manually checking the rules, then do it. This is actively encouraged by the Rust community. The only thing that you must do is keep interfaces sound and do all of your internal manual safety checks.

Therefore, I don't really understand why reference cycles or linked lists are discussed here. They are not a problem. Use a safe construct such as Rc or use unsafe. It's fine.

Insulating layer?

Posted Oct 20, 2024 11:15 UTC (Sun) by ssokolow (guest, #94568) [Link]

I have tried playing with Rust. There's a lot to like. There's some stuff I dislike - the macro system especially makes me go "Wut?", Zig's approach of allowing the /same/ language to be used to compute at compile time looks to be much more ergonomic (though, havn't tried Zig yet).

The declarative macro syntax is an interesting beast because it's somehow so effective at disguising that a macro is just a match statement to simulate function overloading with some Jinja/Liquid/Twig/Moustache/Handlebars/etc.-esque templating inside, disguised by a more Bourne-like choice of sigil.

...maybe because it's really the only place in the language where you have to code in a Haskell-like functional-recursive style instead of an imperative style if you want to do the fancier tricks?

In the little I've played with Rust, the lifetime annotations make sense on simpler stuff. But it seems to get very baroque on more complex stuff.

True. On the plus side, they are still actively working to identify places where they can remove the need for weird tricks. See, for example, the Precise capturing use<..> syntax section in this week's announcement of Rust 1.82.0.

Maybe I'm just too stupid for Rust. Like I'm too stupid for Haskell. Maybe refcounting is the right answer for my lack of intelligence - but then it seems many other programmers are like me. And it seems maybe even the best Rust programmers too, given how they always end up reaching for (A)RC too.

One of Rust's greatest weaknesses has always been that its "make costs explicit" philosophy has created a language that encourages premature optimization, and people in the community concerned with writing learning materials spend an inordinate amount of time encouraging people to write quick and dirty code (eg. String instead of &str, .clone() everywhere, etc.) first, learn the more advanced stuff later, and always profile before optimizing.

(Of course, to be fair, the same thing occurs more subtly at lower levels with people misjudging the cost of a given machine language opcode and how superscalar architecture will affect costs. It just doesn't result in compiler errors.)

There's no shame in reaching for a reference-counted pointer. (I'm no C++ programmer, but I've never seen any signs of this much agonizing over std::shared_ptr.) However, for better or for worse, having people averse to it does seem to contribute to the trend that Rust implementations of things have low and stable memory usage.

Insulating layer?

Posted Oct 14, 2024 10:31 UTC (Mon) by excors (subscriber, #95769) [Link] (18 responses)

A little bit, but that's not the same as "refcounting everywhere". You might store long-lived complex-ownership state in an Rc/Arc, but I think you typically wouldn't use the Rc much within your code - you'd (automatically) deref it into an uncounted &T reference before passing it down the call stack and operating on it, with the borrow checker ensuring the &T doesn't outlive the Rc<T> it originated from. That means you're rarely touching the reference count (avoiding the performance problems it causes in languages like Python), but you still get the lifetime guarantees (unlike C++ when you mix shared_ptr with basic references and risk use-after-free).

Insulating layer?

Posted Oct 14, 2024 11:03 UTC (Mon) by paulj (subscriber, #341) [Link] (17 responses)

That's a nice comment on how RC and lifetimes can complement each other. Thanks.

I'm still left a bit uneasy about the complexity on all the rules on getting data in and out of RCs and cells and what not, and implementing non-trivial relationships in data, in Rust. It's not a simple language.

Insulating layer?

Posted Oct 15, 2024 8:20 UTC (Tue) by taladar (subscriber, #68407) [Link] (16 responses)

You are confusing the complexity of the problem with the complexity of the language.

The problem of ownership is complex, Rust helps you deal with it by making it explicit, other languages just let you make mistakes without providing any help with it at all.

Insulating layer?

Posted Oct 15, 2024 9:15 UTC (Tue) by paulj (subscriber, #341) [Link] (15 responses)

The way I code in C is that I reduce the complexity of the ownership problem by constraining myself to what I am allowed to do:

- references may be "owning" or "shared"
- An "owning" reference may only ever be stored to automatic, local variables.
- A "shared" reference is ref-counted, and can be stored to multiple variables (using whatever refcounting machinery/helpers I've put in place)

Rust automating these rules and enforcing them, AND allowing me to safely pass a "owning" reference to other functions, taking care of passing the ownership in the process, is nice. In my way of doing things, I will often have to use local helper functions that take "owning" references to do stuff with them. And I have to myself ensure I don't store those pointers somewhere. But, that's what I have to do anyway. Rust enforcing this, unless code is marked 'unsafe' seems nice.

But... for me... Rust's lifetime type system allows too much. I can see it would let me do more than what my own heavily constrained system allows, and with guarantees, but it also drags in a lot of complexity. Complexity I already made a conscious decision to avoid.

A language with a much more limited set of scoping rules for pointers - more like the system I've already been using for years in unsafe languages - could be simpler and just as effective, possibly.

Insulating layer?

Posted Oct 16, 2024 7:44 UTC (Wed) by taladar (subscriber, #68407) [Link] (14 responses)

As long as you are the only one working on your code and you don't make mistakes you might be able to rely on your assumptions but I think the strength of Rust (and similar strict languages) is that they can guarantee that as long as there aren't any compiler bugs all programmers working on your project, including you on a bad day where you do make mistakes, will follow the rules.

Insulating layer?

Posted Oct 17, 2024 9:19 UTC (Thu) by paulj (subscriber, #341) [Link] (13 responses)

Yes, I get that. Rust can enforce rules, and that's great.

My point is more that my own system of ad-hoc, non-compiler-enforced rules for being able to manage the problem of lifetimes of objects and their references is simpler than Rusts'. In my system I basically try to have just 2 scopes for references - the very local, and then the refcounted. The latter are "safe" use-after-free issues (I don't use usually use refcounting machinery that deals with concurrency, but you could).

Rust goes further than that, and provides for arbitrary scopes of lifetimes for references to different objects. Great, lot more powerful. But... it also becomes harder to reason about when you start to make use of that power, it seems to me. I can't keep track of more than a couple of scopes - that's why my ad-hoc safe-guards in lesser languages are so simple, and I think it's part of why I struggle to get my head around Rust's error messages when I try write more interlinked, complex data-representations, in my own toy test/learn coding attempts. The fact that a lot of code ends up going back to refcounted containers suggests I might not be alone, not sure.

What I'm saying is that simple programmers like me might be better off with a "safe" (i.e. enforcing the rules) language with a more constrained, simpler, object lifetime management philosophy.

Insulating layer?

Posted Oct 17, 2024 10:36 UTC (Thu) by Wol (subscriber, #4433) [Link] (12 responses)

This sounds a bit like alloca?

I wonder. Could you create a "head" object who's lifetime is the same as the function that created it, and then create a whole bunch of "body" objects that share the same lifetime?

These objects can now reference each other with much fewer restrictions, because the compiler knows they share the same lifetime. So there's no problem with the head owning the next link down the line etc etc, and each link having a reference to the link above, because they'll all pass out of scope together?

Of course that means you can't access that data structure outside the scope of the function that created it, but that would be enough for a lot of purposes?

Cheers,
Wol

Insulating layer?

Posted Oct 17, 2024 11:45 UTC (Thu) by smurf (subscriber, #17840) [Link] (2 responses)

> that would be enough for a lot of purposes?

That would also be rather pointless for a lot of other purposes (among them: freeing a bunch of body objects, oops you suddenly have scoping problems anyway), so what's the point?

Insulating layer?

Posted Oct 17, 2024 12:25 UTC (Thu) by Wol (subscriber, #4433) [Link] (1 responses)

> so what's the point?

It feels pretty simple to me - cf paulj's comment that full blown Rust just seems to blow his mind.

Does Rust actually have a memory allocator, or does it have a "new"? That was the point of my mention of alloca - as the caller goes out of scope, any memory allocated by alloca goes out of scope as well. You don't free alloca memory.

Could the Rust compiler handle "this memory is now out of scope, just forget about it"? If standard Rust rules say an object cannot own an object with a lifetime longer than itself, just dropping the object can't do any harm, can it?

You're effectively setting up a heap, with its lifetime controlled by "head". So you just drop the entire heap, because every object in the heap has had the heap lifetime constraints imposed on it.

Cheers,
Wol

Insulating layer?

Posted Oct 20, 2024 11:11 UTC (Sun) by ssokolow (guest, #94568) [Link]

Does Rust actually have a memory allocator, or does it have a "new"?

That's the concern of the data type, not the language.

If you want a new, you give the struct at least one private member so it can't be initialized directly and write a public associated function (i.e. a public class method) which constructs and returns an instance (Named new by convention only.) ...but that doesn't automatically mean heap allocation. It's purely a matter of whether there are "correct by construction" invariants that need to be enforced.

If you want to heap-allocate, you either use a type which does it internally like Box<T> (std::unique_ptr in C++) or Vec<T> or you do as they do internally and use the unsafe wrappers around malloc/calloc/realloc/free in std::alloc.

Could the Rust compiler handle "this memory is now out of scope, just forget about it"?

It does. If you want a destructor, you impl Drop and, if you don't, a stack allocation will just be forgotten until the whole frame is popped and a heap allocation will go away when the owning stack object's Drop is run and frees the memory.

Rust's design does a good job of making its memory management appear more sophisticated than it really is. It's really just stack allocation, access control, destructors but no actual language-level support for constructors, and using RAII design patterns to implement everything else in library code on top of manual calls to malloc/realloc/free.

The borrow checker plays no role in machine code generation beyond rejecting invalid programs, which is why things like mrustc can exist.

Insulating layer?

Posted Oct 17, 2024 14:07 UTC (Thu) by paulj (subscriber, #341) [Link] (8 responses)

Samba has "talloc" which supports having heap objects allocated as a hierarchy. Freeing an object frees all its children in the hierarchy.

You could take that kind of tack and allocate stuff on the stack with alloca, sure.

Neither hierarchical allocation, nor stack allocation, address the issue of tracking validity of references though, as such. As you're implying, that requires something else - be it an ad-hoc system of rules that enforce guarantees, assuming the programmers' can hold themselves to applying those rules (and... they will fail to every now and then); or whether they are rules in the language and enforced in the compiler.

The question for me is: What is the most programmer friendly system of rules to guarantee safety?

Rust is one example of that. With a very general lifetime typing system (the most general possible?). That generality brings complexity. Yet, many Rust programmes have to step out of that compile-time lifetime-type system and use runtime ref-counting. So then the question is, if the lesson from Rust is that many many programmes simply go to runtime ref-counting for much of their scoping, would it be possible to just have a less general, simpler lifetime-typing system?

E.g., perhaps it isn't necessary at all to even need to represent lifetime types. Perhaps it would be sufficient to have 2 kinds of references - local and refcounted, with well-defined and safe conversion semantics enforced by the language.

Insulating layer?

Posted Oct 17, 2024 16:00 UTC (Thu) by khim (subscriber, #9252) [Link]

> E.g., perhaps it isn't necessary at all to even need to represent lifetime types. Perhaps it would be sufficient to have 2 kinds of references - local and refcounted, with well-defined and safe conversion semantics enforced by the language.

That's where Rust have started, more-or-less. The language which used this approach, Cyclone, is listed among many languages that “influenced” Rust.

Only it's not practical: language that you are getting as a result doesn't resemble C at all! Or, more precisely: language would look like a C, but it's APIs would be entirely different. Even most trivial functions like strchr are designed around ability to pass reference to local variable somewhere. It's not even possible to pass buffer that you would fill with some values to another function!

I'm not entirely sure all the complexity that Rust have (with covariance and contravariance, reborrows and so on) is needed… technically – but it's 100% wanted. Till Rust 2018 introduced Non-lexical lifetimes Rust was famous not for its safety, but for it's needless strictness. I think almost every article till that era included a mandatory part which was telling you tales how your “fight with the borrow checker” is not in vain and how it enables safety, etc.

After introduction of NLL, reborrows, HRBTs and so on people stopped complaning about that… and started complaining about complexity of the whole thing… but it's highly unlikely that people would accept anything less flexible: they want to write code and not fight with a borrow checker!

> So then the question is, if the lesson from Rust is that many many programmes simply go to runtime ref-counting for much of their scoping, would it be possible to just have a less general, simpler lifetime-typing system?

Sure. Swift does that.

It has quite significant performance penalty, but still much faster than many other popular languages.

Insulating layer?

Posted Oct 17, 2024 17:10 UTC (Thu) by Wol (subscriber, #4433) [Link] (5 responses)

> Neither hierarchical allocation, nor stack allocation, address the issue of tracking validity of references though, as such. As you're implying, that requires something else - be it an ad-hoc system of rules that enforce guarantees, assuming the programmers' can hold themselves to applying those rules (and... they will fail to every now and then); or whether they are rules in the language and enforced in the compiler.

But does it *have* to?

If you have a procedure-level heap, or some other data structure with a guaranteed lifetime, you apply exactly the same borrow-rules but on the heap level. If all references to the heap have the same or shorter lifetime than the heap, when the heap dies so will the references.

So you create a heap for your linked list as early as possible, and you can freely create references WITHIN that heap as much as you like. They'll die with the heap. It's not meant to be an all-singing-all-dancing structure, it's meant to have a couple of simple rules that make it easy to create moderately complex data structures. You probably wouldn't be able to store pointers in it that pointed outside of it, for example. You would have to be careful storing references to items with shorter lifetimes. But if you want to store something with loads of complex *internal* references, it would be fine precisely because of the "everything dies together" rule.

So you'd use the existing rust checking setup, just that it's a lot easier for you as the programmer to keep track of, precisely because "it's an internal self reference, I don't need to care about it".

Cheers,
Wol

Insulating layer?

Posted Oct 17, 2024 17:27 UTC (Thu) by daroc (editor, #160859) [Link] (1 responses)

You may be interested in Vale, a programming language that is trying a bunch of new things around lifetime-based memory safety, including compiler support for "regions", which work almost exactly how you describe. The project also has some interesting ideas about tradeoffs between different memory management strategies. I want to write an article about it at some point.

Insulating layer?

Posted Oct 18, 2024 16:33 UTC (Fri) by paulj (subscriber, #341) [Link]

That is very interesting. The generational references sound useful for performant weak-references. Although, you can not use them for very frequently allocated objects (you can't reuse the memory past <generation size>_MAX).

Insulating layer?

Posted Oct 18, 2024 14:25 UTC (Fri) by taladar (subscriber, #68407) [Link] (2 responses)

That is essentially just an arena allocator. It solves the problem of forgetting to free something but doesn't solve e.g. accessing something that should not be used anymore after a certain operation.

Insulating layer?

Posted Oct 18, 2024 16:24 UTC (Fri) by Wol (subscriber, #4433) [Link] (1 responses)

> but doesn't solve e.g. accessing something that should not be used anymore after a certain operation.

And can't you just apply ordinary Rust rules to that? It's not intended as a way of escaping Rust's rules. It's just meant as a way of enabling the *programmer* to forget abut a lot of the rules on the basis that internal references, pointers, objects will all go invalid at the exact same time. So if A points to B and B points to A and they're in this structure you don't worry about cleanup because they both go poof and emit the magic smoke at the same time.

If A contains a pointer to C with a shorter (or longer) lifetime, Rust will need to check that A destroys the pointer as C goes out of scope, or alternatively that the entire heap cannot go out of scope until C is destroyed.

A simple structure for simple(ish) situations, and more complex structures for where simple doesn't work. And if those complex structures are hard to grasp, it helps if you've realised that the simple structures aren't up to the task (and why).

Cheers,
Wol

Rust has arena allocators

Posted Oct 18, 2024 16:35 UTC (Fri) by kleptog (subscriber, #1183) [Link]

There appear to be several arena allocators for Rust: https://manishearth.github.io/blog/2021/03/15/arenas-in-r...

The basic idea is you have a function with allocates an arena and keeps it alive. Within the arena objects can reference each other as much as they want, including cycles. They can also reference other objects, as long as they live longer than the arena. When your function exits, the arena is cleaned up in one go. Incompatible with destructors (though there are tricks for that), but otherwise looks like what you want.

I know them from PostgreSQL where they have an arena per query so interrupting the query safely throws away everything. There are plenty of use cases for them.

Insulating layer?

Posted Oct 18, 2024 14:23 UTC (Fri) by taladar (subscriber, #68407) [Link]

> many many programmes simply go to runtime ref-counting for much of their scoping

This is not my experience with Rust at all. Refcounting does happen occasionally but the vast, vast majority of cases doesn't use ref-counting at all in Rust. Usually it is no more than a few values (logically speaking, some of them can exist in many copies of course but the one that is referred to by the same name in the code and passed along the same code-paths) even in large programs.

Insulating layer?

Posted Oct 14, 2024 13:08 UTC (Mon) by khim (subscriber, #9252) [Link]

It's true that some people (even among Rust developers) actively want to make it a ref-counting language… but currently… no, it's not a refcounting language.

It's very subtle and very much “something good done for entirely wrong reasons”, but the critical part is that in Rust to increment your counter you need to explicitly call Arc::clone or Rc::clone.

And if you pass your smart pointer around you don't really do any expensive operations at all. This affects efficiency very significantly.

Swift is trying to reach the same point from the other side and automatically eliminate ref-counting from places where compiler can prove it's not needed.

It's an interesting experiment, but I'm sceptical about that approach: that advantage of Rust (every much unplanned, as shown above!) is that pesky .clone that you don't want to see on every line.

That's simple psychology: if you see it – you want to eliminate it (when that's easy to do, it's not a crime to call .clone, if you need to!), if it's invisible – you wouldn't even think about whether what you are doing is expensive or not.

Insulating layer?

Posted Oct 17, 2024 19:16 UTC (Thu) by sunshowers (guest, #170655) [Link]

Yes, it's common to sprinkle in refcounts where necessary. But most of the time you don't need them. So a common pattern is that the components all use refcounts to talk to each other, but use borrows internally.