RFC 0016: Instances vs Types #485

rhazn · 2023-12-15T15:43:10Z

No description provided.

georg-schwarz · 2023-12-15T17:07:30Z

rfc/0016-instances-vs-types/0016-instances-vs-types.md

+```
+
+## Drawbacks
+- This change introduces more structure for blocks but pipelines are still confusing, see Alternatives


With that enforced structure, users might lose the flexibility to define things where they want (as an implication).

Leading to an alternative:

leave as it is and document that the pipe syntax means instantiation and everything else is type definition

decided against by applying principle "Explicit modeling over hidden magic"; not mixing concepts makes things more explicit IMO

Added this as drawback / alternative with the decision as described here.

georg-schwarz

Love it! I think this will immensely help object-oriented programmers to understand what is happening.

joluj

What about the more verbose from-to syntax of pipeline steps?

joluj · 2023-12-15T17:59:34Z

rfc/0016-instances-vs-types/0016-instances-vs-types.md

+
+### Proposed change
+
+Allow **definitions** only outside of pipelines, allow block **instantiations** only inside of pipelines.


Suggested change

-Allow **definitions** only outside of pipelines, allow block **instantiations** only inside of pipelines.
+Allow **definitions** only outside of pipelines, allow block **instantiations** only inside of pipelines and composite blocks.

Or better: write one sentence at the top that a composite block is equivalent to a pipeline definition in this case.

Good point. I added a note for that into the future enhancements, I think we should clean it up with the packages RFC implementation. For this RFC, I would not change anything about that setup.

rhazn · 2023-12-18T14:34:22Z

What about the more verbose from-to syntax of pipeline steps?

Luckily we just deleted them :D.

rhazn · 2023-12-18T14:38:06Z

FYI @dirkriehle regarding our discussion on instances vs types in the recent JValue team meeting. If you have some time, I'd love to get a review from you for this RFC. Otherwise I'll bring it into the next JValue meeting.

georg-schwarz · 2023-12-21T12:16:09Z

rfc/0016-instances-vs-types/0016-instances-vs-types.md

+
+Allow **definitions** only outside of pipelines, allow block **instantiations** only inside of pipelines.
+
+Because pipelines get executed implicitly when executing a Jayvee model (and therefore pipelines get **instantiated** during runtime), it makes sense to bundle all other instantiations in them as well. This means everything outside of a pipeline is a **definition** (**type**), everything inside a pipeline is an **instance**.


Do we always instantiate as singletons within a pipeline?

I don't think so but I am also not sure I understand what you are asking. The interpreter instantiates a new object for every block reference in a pipeline I think.

I think it is a special kind of instantiation: you cannot create two instances of a block, right?
I don't know how to better frame it >.<

Do you mean in Jayvee or in the interpreter?

I think in Jayvee as well. Let's take this example:

pipeline Demonstrator { MyCsvExtractor -> MyTableInterpreter -> MySqliteLoader; MyTableInterpreter -> MyPostgresLoader; }

In my understanding, one instance of MyTableInterpreter would be shared between those two branches of the graph.

Exactly, but within the scope of a pipeline it kinda is. The question is if that is just an interpreter detail or part of the language as well..

However, that is not always true. Example:

pipeline MySecondPipeline { MyExtractor1 -> MyTableInterpreter -> MyLoader; MyExtractor2 -> MyTableInterpreter -> MyLoader; }

Sorry, I think I lost your suggestion on how to change this RFC for this. I think this is an implementation detail of the interpreter so I am unsure what effect it should have here. Can you make that more explicit? 😅

Okay, my suggestion is to specify when an instance is "reused", as in the first example, and when it is not, as in the second example. However, I'm having trouble specifying this. Maybe leave it open then?

Ah, I would leave this out of the scope of this to focus on the description vs. instantiation semantics. Reusing existing instances on instantiation is a nuance that would make this too big I think?
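To make the reuse question in this thread concrete, here is a minimal Python sketch of one possible semantics: one runtime instance per distinct block name within a pipeline. It is not the Jayvee interpreter; `BlockInstance` and `instantiate_pipeline` are hypothetical names introduced only for illustration.

```python
class BlockInstance:
    """Hypothetical runtime object created for one block reference."""

    def __init__(self, name):
        self.name = name
        self.successors = []  # blocks this one pipes its output into


def instantiate_pipeline(pipes):
    """Create one instance per distinct block name and link them along the pipes.

    `pipes` is a list of chains such as [["A", "B"], ["B", "C"]].
    """
    instances = {}
    for chain in pipes:
        for name in chain:
            if name not in instances:  # reuse the instance if the name was seen before
                instances[name] = BlockInstance(name)
        for source, target in zip(chain, chain[1:]):
            # Link each pipe once, even if the same pipe appears in several chains.
            if instances[target] not in instances[source].successors:
                instances[source].successors.append(instances[target])
    return instances


demonstrator = instantiate_pipeline([
    ["MyCsvExtractor", "MyTableInterpreter", "MySqliteLoader"],
    ["MyTableInterpreter", "MyPostgresLoader"],
])
shared = demonstrator["MyTableInterpreter"]
print([block.name for block in shared.successors])  # ['MySqliteLoader', 'MyPostgresLoader']
```

Under this name-keyed assumption, the `MySecondPipeline` example would also yield a single `MyTableInterpreter` with two incoming pipes. Whether that is the intended language semantics or only an interpreter detail is exactly the question left open above.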

dirkriehle · 2023-12-27T13:45:51Z

Thanks for pointing me to this thread. My original thrust was to fix the oftype relationship, which I think remains broken, see email copy below.

This proposal, if I understand it, makes matters worse because it cements the confusion. So it is a good time to have the fundamental discussion, though I don't think this is the best place for it (hence the email). Maybe it can serve as an example though.

This RFC is called Instances vs. types but it is only about block types? Also, it isn't really about introducing types but rather about where code may appear and where not? (In general reducing expressiveness is not a good thing; it is better to work from atomic principles that allow sound expressions so that expressiveness increases.)

I don't think it is unclear what an instance and what a type is in Jayvee. As explained below, we don't have explicit pipeline types, only implicit ones. This can be fixed easily by making the type explicit by allowing it to have a name. Same thing for blocks and block types I would think. Without giving them a name, we can't reuse them, so my assumption is that an anonymous type only ever has exactly one instance (within a running Jayvee instance).

My interpretation of code like "block CarsCSVExtractor oftype CSVExtractor" inside a pipeline description is that you are describing an M0 (to be created at runtime) object structure consisting of an outer pipeline object with several M0 block objects, one of which is called CarsCSVExtractor, whose type is CSVExtractor and whose url attribute is set to a specific value. You write this is a definition of a data source but I don't understand this. It is a description of a block inside a pipeline; there is no explicit type definition here. You then say the block is instantiated. I'd think there is only exactly one block and it gets linked by pipes; it doesn't get instantiated multiple times. If you are going to tell me that this is actually a type definition, I'd be even more confused: It should not relate to another type by oftype and it should not have value-to-attribute assignments then.

As to the proposed change, the confusion for me continues: I think it makes sense to have block types outside of pipeline and then instantiate the block types to blocks inside a concrete pipeline description. So is what follows after "block" outside of a pipeline actually a block type or just a block (instance)?

All of this makes it more urgent to move block types like CSVInterpreter into a system library (M1 level), out of the language (M2 level); see below. (If it hasn't happened yet.)

Hi all,

I wanted to pick up the old topic of modeling levels and types and instances, as discussed in a recent team meeting. I saw that Philip wanted to do something here as well, but I think it is only about closing an old unfinished thread about value types? Go ahead, but I suggest we first finish the discussion triggered below.

Jayvee is just fabulous work and I'm very proud of you and what we have achieved so far. That said, I remain convinced that the overall typing / relationship structure isn't right or at least is unconventional. This goes back to me trying to explain it but gaining no ground about a year ago. We got today's structure because, in Felix's words, he wouldn't know how to implement it any other way than the way he did. So I want to try again, if only to help us see the problems we might run into earlier. Also, with Johannes on board, it may be helpful to rehash some concepts.

My basic point is that our modeling / language structure does not conform to how just about everyone else does it in language design, and I suspect that this is not a good idea, i.e. we are missing something.

First the fundamentals. I'm using the UML terminology of M levels, i.e. M0, M1, M2.

M0 are the runtime objects, M1 are the model objects, and M2 are the language objects. The M2 level defines the language. Logical M2 objects are block and blocktype, value and valuetype, etc. They are expressed as elements in the grammar and as generated classes from which ASTs are created. Logical M1 objects are specific Jayvee programs (models). They are the written code in a .jv file or expressed as the objects in an AST after parsing the .jv file. Logical M0 objects then are the running pipelines. They are created, as runtime objects, by the code of the classes of the objects in an AST as the AST gets interpreted.

Between the objects in one M-level, you can have regular object relationships. The relationships are what the next higher level lets you express. If the M2 level (the Jayvee language) defines a subtype relationship, then two M1-level objects can get related by a subtype relationship.

Traditionally, there is only one relationship, which goes across the levels, which is an is-instance-of relationship. Any Mx level object is logically an instance of an M(x+1) level object. There is no other relationship across the M-levels. (It is always possible to repeat this whole architecture within the M1 level, the model level, to let users at runtime get the benefits of this flexible architecture, but this is beyond the scope of this email. Let's stick with the fundamental structure of a regular programming / modeling language.)

The M2-to-M1 level is-instance-of relationship is established when using the keywords of the language while programming. The M1-to-M0 level is-instance-of relationship is established when you declare instances on the M1 level and give them a type: The declaration may be solely on the M1 level, mixing types and instances of types, but the runtime relationship is between M1 and M0 objects.

The misery starts when people start confusing the is-instance-of relationship with the is-a relationship. Again: is-instance-of establishes an instantiation relationship between two objects on adjacent M-levels. The is-a relationship (also subtyping, generalization, etc.) is a relationship between two objects within one level (the supertype and the subtype) and can only exist if the next higher level actually created an object for this type of relationship (= introduced the concept). If there is no subtyping concept defined on M2 in the Jayvee language definition, obviously users can't express subtype relationships between M1 level objects like specific block types or value types.
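As a loose analogy only (Python is not Jayvee, and the mapping to M-levels is an assumption made for illustration), the two relationships can be sketched with Python's own object model: classes play the role of M1 types, their instances the role of M0 objects, and the built-in `type` stands in for a language-level (M2) concept. `Block` and `Extractor` are hypothetical names.

```python
class Block:                 # an M1-level type
    pass


class Extractor(Block):      # is-a: relates two types on the same level
    pass


my_extractor = Extractor()   # is-instance-of: relates an object to its type

# is-instance-of crosses adjacent levels.
print(isinstance(my_extractor, Extractor))  # True
print(isinstance(Extractor, type))          # True: the class is an instance of the language-level `type`

# is-a stays within one level and is only available because the language defines subclassing.
print(issubclass(Extractor, Block))         # True
```

Asking `issubclass(my_extractor, Block)` raises a `TypeError` in Python, which mirrors the point above: an instance cannot take part in a subtype relationship.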

Let me use these fundamentals now to go over the existing example code that I got confused over back then and that I'm still confused about today. Also, what it implies for libraries and other future additions to Jayvee.

Anonymous types

Before I can get started, I need to address some possibly problematic syntactic shorthands we are using.

The code

pipeline CarsPipeline { ... }

contains an anonymous type specification, the type of the CarsPipeline object, as specified by the { ... } code.

If we want reusable pipelines, we need to introduce pipeline types and name them; this would lead to code like

pipelinetype CarsPipeline { ... };
pipeline myCarsPipeline oftype CarsPipeline;

Please note the potential for inconsistencies in using lower and upper case for the first character of names. The canonical way is to have types on M1 start with a capital letter, and objects on M1 with a lower case letter. "pipeline CarsPipeline ..." screws with this and is likely to confuse users.

My use of oftype here is to indicate an is-instance-of relationship between the runtime M0-level object myCarsPipeline and the M1-level object CarsPipeline. This is a valid use in my book; in Java it would be expressed like:

CarsPipeline myCarsPipeline = new CarsPipeline();

I always thought of "oftype" as a shorthand for "is-of-type" or is-instance-of indicating an instantiation relationship, not a subtyping relationship.

oftype confusion 1

The first block in the CarsPipeline example is

block CarsExtractor oftype HttpExtractor { ... };

Like CarsPipeline in the example code above, CarsExtractor is an instance, not a type; hence, a better name would have been myCarsExtractor. Its type is HttpExtractor.

My understanding from back when, and I hope that this has changed, is that HttpExtractor is an M2 level object and part of the language (because we wouldn't know how to implement it differently). It really should be an M1-level object defined in a library. (We touched on this a couple of times; did we get out of that pit? In any case:) The oftype relationship described under 1. is between a type specification on the M1 level and an object declaration on the M1 level. It can't also go across two M-levels. If we can't fix this, it should at least be two different concepts, oftype1 and oftype2.

A more canonical way, if we really wanted a library M1-level concept like HttpExtractor to be a language-level M2-level concept, would be to introduce the concept as a keyword, i.e. write

httpextractorblock myCarsExtractor { ... }; // yes ugly

oftype confusion 2

Here is the use of the oftype concept as it exists for value types:

valuetype VehicleIdentificationNumber10 oftype text { ... };

This time, a type (VIN10) is created as an instance of the M2-level concept text.

It is really confusing to me here: Think back to your normal programming. You wouldn't make VIN10 a subtype of String and you wouldn't make it an instance of String. What you want is to write

valuetype VIN {
  // text field with constraints
};

so that you can't just stick a VIN10 into anywhere a String is expected (this would blow up quickly).
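A minimal Python sketch of that idea, under stated assumptions: the value type wraps a text field and enforces constraints, but is neither a subtype nor an instance of the text type. The class `VIN`, the helper `shout`, and the 17-character constraint are all placeholders chosen for illustration, not taken from Jayvee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VIN:
    """Hypothetical value type: a text field plus constraints, not a kind of str."""

    text: str

    def __post_init__(self):
        # Placeholder constraint for illustration: 17 alphanumeric characters.
        if len(self.text) != 17 or not self.text.isalnum():
            raise ValueError(f"not a valid VIN: {self.text!r}")


def shout(message: str) -> str:
    """Stand-in for any code that expects plain text."""
    if not isinstance(message, str):
        raise TypeError("expected plain text")
    return message.upper()


vin = VIN("1HGCM82633A004352")
print(isinstance(vin, str))  # False
print(shout(vin.text))       # 1HGCM82633A004352
```

Passing `vin` itself to `shout` fails, so the text has to be unwrapped explicitly wherever plain text is expected.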

This oftype relationship between an M2-level object and an M1-level type object is broken in my book.

I think there should be one definition of oftype on the M2 level, and it specifies the instantiation relationship between two M1 level objects, i.e. user-defined types and their instances.

oftype decimal vs. oftype HttpExtractor

There is one caveat here. Did you notice the intuition for text (lower-case first letter i.e. language-level keyword) and HttpExtractor (upper-case first letter suggesting specified by user even though it isn't if my memory is correct)?

I think it is valid to somehow represent a finite set of built-in value types on the M2-level so that users can create values and value types on the M1 level using built-in value types. Examples:

value myVIN oftype text; // poor modeling just need an example
valuetype VIN {
  // see above and below
};

This does not apply in my book to an ever growing number of block types like HttpExtractor.

This may be an inconsistency, but it is how other languages do it AFAIK. So maybe there is an elegant way of handling a defined finite set of the builtin value types that everyone knows.

M1-level subtyping

Which leads me to the missing is-a relationship or the future goal of letting users specify subtypes. Forgive me for rehashing old examples from ADAP, but I couldn't find examples in the Jayvee example code that worked for me.

I want to be able to write

valuetype Coordinate {};
valuetype CartesianCoordinate extends Coordinate {
attributes: [
x oftype decimal;
y oftype decimal;
z oftype decimal;
];
};
valuetype PolarCoordinate extends Coordinate { ... };

I can then have

value cartesianOrigin oftype CartesianCoordinate {
x: 0;
y: 0;
z: 0;
};

I don't know whether extends is a good keyword, but I do know we shouldn't overload oftype with two meanings.

Libraries

With the recent addition of separate files and import statements, we can now relate types from library files to newly declared instances using oftype and newly specified types using extends.

I recognize that this is probably a hard to digest email and I probably made mistakes. A whiteboard is probably a better idea to wrap your head around these concepts. I hope we can pick this up in an upcoming discussion.

Happy holidays ;-)

Dirk

rhazn added the rfc label Dec 15, 2023

rhazn self-assigned this Dec 15, 2023

rhazn requested review from georg-schwarz and joluj December 15, 2023 15:43

rhazn force-pushed the rfc-0016-instances-vs-types branch from 9b2950d to 2704a9e Compare December 15, 2023 15:46

georg-schwarz reviewed Dec 15, 2023

georg-schwarz approved these changes Dec 15, 2023

joluj reviewed Dec 15, 2023

rhazn force-pushed the rfc-0016-instances-vs-types branch from 2704a9e to 34af978 Compare December 18, 2023 14:26

rhazn requested a review from dirkriehle December 18, 2023 14:37

georg-schwarz mentioned this pull request Dec 19, 2023

Navigate Pipeline via Pipes instead of Blocks #491

Merged

2 tasks

georg-schwarz reviewed Dec 21, 2023

georg-schwarz mentioned this pull request Jan 17, 2024

Refactor block validation multiple pipe inputs #506

Merged

rhazn added 2 commits February 5, 2024 15:06

Initial RFC suggestion

0714901

Added review feedback

660ba73

rhazn force-pushed the rfc-0016-instances-vs-types branch from 1e39a0f to 660ba73 Compare February 5, 2024 14:06

rhazn added 2 commits February 15, 2024 11:56

Initial RFC suggestion

99c9873

Added review feedback

14636cb

rhazn force-pushed the rfc-0016-instances-vs-types branch from 660ba73 to 14636cb Compare February 15, 2024 10:56

rhazn added 2 commits February 15, 2024 11:56

Merge branch 'rfc-0016-instances-vs-types' of https://github.com/jval…

f5f08f6

…ue/jayvee into rfc-0016-instances-vs-types

docs: 📝

91028db

rhazn closed this Feb 16, 2024

github-actions bot locked and limited conversation to collaborators Feb 16, 2024
