8000 fix(gen-jsonld): handle multiple contexts by Silvanoc · Pull Request #2602 · linkml/linkml · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix(gen-jsonld): handle multiple contexts #2602

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Silvanoc
8000
Copy link
Contributor
@Silvanoc Silvanoc commented Apr 4, 2025

CLI supports multiple context files (what returns a tuple), but code mostly expecting a single context (and therefore only a string).
IDEs and linters recognize the inconsistency thanks to typing.

The context can get extended , therefore converting the tuple into a list.

[Update]: trying to analyze some unexpected changes in the resulting JSON-LD I think I have uncovered an unknown issue:

Pending the confirmation of an expert, I am the impression that the patch is correct and fixing both the types inconsistencies and the context generation.

I wonder what's the reason for the lack of code-sharing for @context generation between jsonldgen.py and jsonldcontextgen.py.

‼️ ATTENTION ‼️: snapshot updated without analyzing its correctness (see here) for easy identification of the changes, do not merge until this note has been removed!

Closes #2625

@Silvanoc Silvanoc marked this pull request as draft April 4, 2025 17:28
@Silvanoc Silvanoc force-pushed the fix-json-ld-context-typing branch from 5154af3 to 930bf9c Compare April 4, 2025 17:31
Copy link
codecov bot commented Apr 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.95%. Comparing base (6226784) to head (b6410c0).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2602   +/-   ##
=======================================
  Coverage   82.95%   82.95%           
=======================================
  Files         127      127           
  Lines       13856    13856           
  Branches     3863     3863           
=======================================
  Hits        11494    11494           
  Misses       1722     1722           
  Partials      640      640           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Silvanoc
Copy link
Contributor Author
Silvanoc commented Apr 7, 2025

My changes have a bigger impact on the generated JSON-LD than expected...

Before my changes calling JSON-LD generation from the CLI (linkml generate jsonld) resulted in a tuple due to the click option specification. As a consequence context cannot be None and the context is not getting extended with the context of the prefixes. No additional context will be appended to context, apart from that directly obtained from the LinkML document.

With my changes the context of each prefix is being included and the resulting JSON-LD document is significantly different. I'm lacking the JSON-LD expertise though, to assess its correctness.

@Silvanoc Silvanoc force-pushed the fix-json-ld-context-typing branch 2 times, most recently from e5439ac to 811f2d2 Compare April 7, 2025 15:44
@Silvanoc
Copy link
Contributor Author
Silvanoc commented Apr 7, 2025

The changes in the context for the test schema simple_uri_test.yaml can be seen here.

@Silvanoc Silvanoc force-pushed the fix-json-ld-context-typing branch 2 times, most recently from 4fe1f18 to 9a724c8 Compare April 10, 2025 05:32
@tfliss
Copy link
Contributor
tfliss commented Apr 10, 2025

@Silvanoc I don't have a full root cause, but for now:

https://github.com/linkml/linkml/blob/fix-json-ld-context-typing/linkml/generators/jsonldgen.py#L165
Note how the arguments are passed to the constructor, but not to serialize.
add_prefixes = ContextGenerator(self.original_schema, model=False, metadata=False).serialize()

https://github.com/linkml/linkml/blob/fix-json-ld-context-typing/linkml/generators/jsonldcontextgen.py#L234
And here the arguments are all passed to both the constructor and to serialize.
print(ContextGenerator(yamlfile, **args).serialize(**args))

This causes some odd handling of model / self.model in end_schema:
https://github.com/linkml/linkml/blob/fix-json-ld-context-typing/linkml/generators/jsonldcontextgen.py#L91
where I'm seeing the value of model being true.

I tried getting equivalent behavior using command-line options to generate jsonld-context, but because of the differences above I haven't succeeded. Will probably have to look at this at a later time. Given that the JSON-LD may not parse in either form, do you want to break this work out into a separate issue?

@tfliss
Copy link
Contributor
tfliss commented Apr 10, 2025

@Silvanoc

Also these lines account for the difference in @base between the two forms:

JSON-LD generator adds @base after the rest of the context is generated here:
https://github.com/linkml/linkml/blob/fix-json-ld-context-typing/linkml/generators/jsonldgen.py#L185

The default context generator adds it here:
https://github.com/linkml/linkml/blob/fix-json-ld-context-typing/linkml/generators/jsonldcontextgen.py#L103
but when model is false, the context_body is not used:
https://github.com/linkml/linkml/blob/fix-json-ld-context-typing/linkml/generators/jsonldcontextgen.py#L118

@tfliss
Copy link
Contributor
tfliss commented Apr 10, 2025

@Silvanoc Below are collapsed draft patches that pass arguments into ContextGenerator serialize() the way jsonld-context does. They also move base processing down into ContextGenerator. We should loop in a core dev to check.

expand/collapse Patch for JSON-LD generator handling of context base and serialize parameters
diff --git a/linkml/generators/jsonldgen.py b/linkml/generators/jsonldgen.py
index aa5f9b2b..5e3ecf87 100644
--- a/linkml/generators/jsonldgen.py
+++ b/linkml/generators/jsonldgen.py
diff --git a/linkml/generators/jsonldgen.py b/linkml/generators/jsonldgen.py
index aa5f9b2b..bece1738 100644
--- a/linkml/generators/jsonldgen.py
+++ b/linkml/generators/jsonldgen.py
@@ -162,7 +162,10 @@ class JSONLDGenerator(Generator):
             # model_context = self.schema.source_file.replace('.yaml', '.prefixes.context.jsonld')
             # context = [METAMODEL_CONTEXT_URI, f'file://./{model_context}']
             # TODO: The _visit function above alters the schema in situ
-            add_prefixes = ContextGenerator(self.original_schema, model=False, metadata=False).serialize()
+            add_prefixes = ContextGenerator(
+                self.original_schema, base=base_prefix, model=False, metadata=False, prefixes=False
+            ).serialize(
+                base=base_prefix, model=False, metadata=False, prefixes=False)
             add_prefixes_json = loads(add_prefixes)
             context = [METAMODEL_CONTEXT_URI, add_prefixes_json["@context"]]
         elif isinstance(context, str):  # Some of the older code doesn't do multiple contexts
@@ -181,8 +184,6 @@ class JSONLDGenerator(Generator):
 
         if self.format == "jsonld":
             self.schema["@context"] = context[0] if len(context) == 1 and not base_prefix else context
-            if base_prefix:
-                self.schema["@context"].append({"@base": base_prefix})
         # json_obj["@id"] = self.schema.id
         out = str(as_json(self.schema, indent="  ")) + "\n"
         self.schema = self.original_schema

The change in JSON-LD context generator sets base regardless of the model check.

expand/collapse Patch for JSON-LD CONTEXT handling of context base
diff --git a/linkml/generators/jsonldcontextgen.py b/linkml/generators/jsonldcontextgen.py
index da2d9d92..43e12f3c 100644
--- a/linkml/generators/jsonldcontextgen.py
+++ b/linkml/generators/jsonldcontextgen.py
@@ -98,12 +98,6 @@ class ContextGenerator(Generator):
             comments.source = self.schema.source_file
             context.comments = comments
         context_content = {"xsd": "http://www.w3.org/2001/XMLSchema#"}
-        if base:
-            base = str(base)
-            if "://" not in base:
-                self.context_body["@base"] = os.path.relpath(base, os.path.dirname(self.schema.source_file))
-            else:
-                self.context_body["@base"] = base
         if prefixes:
             for prefix in sorted(self.emit_prefixes):
                 url = str(self.namespaces[prefix])
@@ -120,6 +114,12 @@ class ContextGenerator(Generator):
                 context_content[k] = v
             for k, v in self.slot_class_maps.items():
                 context_content[k] = v
+        if base:
+            base = str(base)
+            if "://" not in base:
+                context_content["@base"] = os.path.relpath(base, os.path.dirname(self.schema.source_file))
+            else:
+                context_content["@base"] = base
         context["@context"] = context_content
         if output:
             with open(output, "w", encoding="UTF-8") as outf:

This results in the model being excluded in both commands below and the base being included in both cases:

  • linkml generate jsonld simple_uri_test.yaml
  • linkml generate jsonld-context --no-model --no-metadata --flatprefixes simple_uri_test.yaml

In jsonld-gen the paths to the imported model contexts are also added to the context, an additional difference. It would change the type of the returned context json to be union(dict,list) serialized to a string?
https://github.com/linkml/linkml/blob/fix-json-ld-context-typing/linkml/generators/jsonldgen.py#L172

The script below generates parsable JSON-LD with data that conforms to the model.
For this to work I had to set the model flag to be true (the default), so the context generated is different than the existing gen-jsonld case) Even though the RDF is now parsable, I haven't done regression testing on the changes or reviewed the generated context in detail. I'd recommend adding or checking that unit tests to cover this.

expand/collapse simple script to generate json-ld and context, read it and write as turtle
import os
import json
from rdflib import Graph

# Generate the JSON-LD
os.chdir('tests/test_scripts/input')
os.system("linkml generate jsonld simple_uri_test.yaml > simple_uri_test.out.jsonld")

# also generate the .context.jsonld for the included LinkML .yaml, doesn't seem to happen automatically
os.system("linkml generate jsonld-context simple_slots.yaml > simple_slots.context.jsonld")
os.chdir('includes')
os.system("linkml generate jsonld-context simple_types.yaml > simple_types.context.jsonld")
os.chdir('..')

# parse and reserialize the resulting RDF/JSON-LD as RDF/Turtle
# the results look plausible, but I haven't gone over them in detail
g = Graph().parse("simple_uri_test.out.jsonld", format='json-ld')
print(g.serialize(format='turtle'))

print("\n\n---------------------\n\n")

# Generate the JSON-LD Context directly and read it back in
ctx_file = "simple_uri_test.out.context.jsonld"
os.system(f"linkml generate jsonld-context --no-metadata --mergeimports --flatprefixes simple_uri_test.yaml > {ctx_file}")
with open(ctx_file) as f:
    ctx = json.load(f)

# Create a data object and make it into JSON-LD by attaching the context
data = {
    "@context": ctx,
    "model": {
        "state": "discombobulated"
    }
}

# Parse the data object as RDF/JSON-LD and re-serialize it as RDF/Turtle
# the triple [] simple_uri_test:model [ foo:state "discombobulated" ] . corresponds to the data.
g2 = Graph().parse(data=data, format='json-ld')
print(g2.serialize(format='turtle'))

The generated RDF from the example script is as follows, which is what I would expect:

@prefix foo: <http://samples.r.us/foo#> .
@prefix simple_uri_test: <https://raw.githubusercontent.com/linkml/linkml/main/tests/test_scripts/input/simple_uri_test/> .

[] simple_uri_test:model [ foo:state "discombobulated" ] .

@Silvanoc
Copy link
Contributor Author

@tfliss I don't get what you mean with following sentence:

Given that the JSON-LD may not parse in either form, do you want to break this work out into a separate issue?

Do you want to have an issue for this PR? I mean there is not much in here, so I don't know what you would want to break into a separate issue...

@Silvanoc
Copy link
Contributor Author
Silvanoc commented Apr 11, 2025

@tfliss the context after applying your patch is closer to the one being currently generated, than that generated applying the patch of this PR. You can see them below:

"@context" from current "main"
[
  "https://w3id.org/linkml/extensions.context.jsonld",
  "simple_slots.context.jsonld",
  "includes/simple_types.context.jsonld",
  "https://w3id.org/linkml/types.context.jsonld",
  {
    "@base": "https://raw.githubusercontent.com/linkml/linkml/main/tests/test_scripts/input/simple_uri_test/"
  }
]
"@context" from current "fix-json-ld-context-typing"
[
  "https://w3id.org/linkml/meta.context.jsonld",
  {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "bar": {
      "@id": "http://samples.r.us/bar_",
      "@prefix": true
    },
    "foo": "http://samples.r.us/foo#",
    "linkml": "https://w3id.org/linkml/",
    "simple_uri_test": "https://raw.githubusercontent.com/linkml/linkml/main/tests/test_scripts/input/simple_uri_test/",
    "@vocab": "https://raw.githubusercontent.com/linkml/linkml/main/tests/test_scripts/input/simple_uri_test/",
    "tag": {
      "@type": "xsd:anyURI",
      "@id": "linkml:tag"
    },
    "value": {
      "@type": "linkml:Any",
      "@id": "linkml:value"
    },
    "extensions": {
      "@type": "linkml:Extension",
      "@id": "linkml:extensions"
    },
    "state": {
      "@id": "foo:state"
    },
    "AnyValue": {
      "@id": "linkml:Any"
    },
    "Extensible": {
      "@id": "linkml:Extensible"
    },
    "Extension": {
      "@id": "linkml:Extension"
    },
    "Model": {
      "@id": "foo:model"
    }
  },
  "https://w3id.org/linkml/extensions.context.jsonld",
  "simple_slots.context.jsonld",
  "includes/simple_types.context.jsonld",
  "https://w3id.org/linkml/types.context.jsonld",
  {
    "@base": "https://raw.githubusercontent.com/linkml/linkml/main/tests/test_scripts/input/simple_uri_test/"
  }
]
"@context" from Tim's patch on "fix-json-ld-context-typing"
[
  "https://w3id.org/linkml/meta.context.jsonld",
  {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@base": "https://raw.githubusercontent.com/linkml/linkml/main/tests/test_scripts/input/simple_uri_test/"
  },
  "https://w3id.org/linkml/extensions.context.jsonld",
  "simple_slots.context.jsonld",
  "includes/simple_types.context.jsonld",
  "https://w3id.org/linkml/types.context.jsonld"
]

Or possibly easier to view are the diffs:

Difference "@context" between "main" and Tim's patch on "fix-json-ld-context-typing"
--- ../main/main.context.ld.json	2025-04-11 14:06:34
+++ ./fix-json-ld-context-typing.context.ld.json	2025-04-11 14:06:16
@@ -1,9 +1,11 @@
 [
+  "https://w3id.org/linkml/meta.context.jsonld",
+  {
+    "xsd": "http://www.w3.org/2001/XMLSchema#",
+    "@base": "https://raw.githubusercontent.com/linkml/linkml/main/tests/test_scripts/input/simple_uri_test/"
+  },
   "https://w3id.org/linkml/extensions.context.jsonld",
   "simple_slots.context.jsonld",
   "includes/simple_types.context.jsonld",
-  "https://w3id.org/linkml/types.context.jsonld",
-  {
-    "@base": "https://raw.githubusercontent.com/linkml/linkml/main/tests/test_scripts/input/simple_uri_test/"
-  }
+  "https://w3id.org/linkml/types.context.jsonld"
 ]
Difference "@context" between "main" and "fix-json-ld-context-typing"
--- ../main/main.context.ld.json	2025-04-11 14:06:34
+++ ./tim.context.ld.json	2025-04-11 14:12:13
@@ -1,4 +1,43 @@
 [
+  "https://w3id.org/linkml/meta.context.jsonld",
+  {
+    "xsd": "http://www.w3.org/2001/XMLSchema#",
+    "bar": {
+      "@id": "http://samples.r.us/bar_",
+      "@prefix": true
+    },
+    "foo": "http://samples.r.us/foo#",
+    "linkml": "https://w3id.org/linkml/",
+    "simple_uri_test": "https://raw.githubusercontent.com/linkml/linkml/main/tests/test_scripts/input/simple_uri_test/",
+    "@vocab": "https://raw.githubusercontent.com/linkml/linkml/main/tests/test_scripts/input/simple_uri_test/",
+    "tag": {
+      "@type": "xsd:anyURI",
+      "@id": "linkml:tag"
+    },
+    "value": {
+      "@type": "linkml:Any",
+      "@id": "linkml:value"
+    },
+    "extensions": {
+      "@type": "linkml:Extension",
+      "@id": "linkml:extensions"
+    },
+    "state": {
+      "@id": "foo:state"
+    },
+    "AnyValue": {
+      "@id": "linkml:Any"
+    },
+    "Extensible": {
+      "@id": "linkml:Extensible"
+    },
+    "Extension": {
+      "@id": "linkml:Extension"
+    },
+    "Model": {
+      "@id": "foo:model"
+    }
+  },
   "https://w3id.org/linkml/extensions.context.jsonld",
   "simple_slots.context.jsonld",
   "includes/simple_types.context.jsonld",

I lack the level of JSON-LD expertise to assess which result is correct. What I can tell is that in neither main nor your patch an entry appears for the foo prefix, but it does with the patch of this PR. Again, I cannot tell was is correct.

@tfliss
Copy link
Contributor
tfliss commented Apr 11, 2025

@Silvanoc Thanks for the great summary and output diffs above. The elements in your branch with type and id entries are due to the model=False argument being changed to True by the time it got to the ContextGenerator logic. The entries with prefix are due to the prefix=False not being passed in and defaulting to True. foo is also a prefix, but has different behavior because it ends in #. Flipping the --prefixes/--noprefixes and --model/--no-model options on the gen-jsonld-context cli can reproduce the extra fields.

linkml generate jsonld-context --no-metadata --model --prefixes simple_uri_test.yaml

One thing I realized is it is not necessary to move the code block in the context generator. While I'd still recommend a core dev review I'm otherwise confident enough today to recommend applying the patches to this PR.

Copy link
Contributor
@tfliss tfliss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just finishing my review. This is a great improvement, thanks for talking through the context behaviors with me, I understand it a lot better now.

At this point I'd recommend:

  • applying the patches we discussed
  • update the snapshot as appropriate to the patched output
  • add a multi-context test parameter case to /test_scripts/test_gen_jsonld.py#test_metamodel_valid_calls()

@Silvanoc Silvanoc force-pushed the fix-json-ld-context-typing branch 3 times, most recently from 504e311 to f3fafd9 Compare April 14, 2025 13:30
@Silvanoc Silvanoc marked this pull request as ready for review April 15, 2025 17:43
@Silvanoc Silvanoc force-pushed the fix-json-ld-context-typing branch 2 times, most recently from 6b1bd2f to 8dc37f4 Compare April 16, 2025 21:10
@amc-corey-cox amc-corey-cox requested a review from tfliss April 23, 2025 16:39
@amc-corey-cox
Copy link
Contributor

@tfliss this looks like @Silvanoc has addressed your comments. Can you pleases take another look and make sure this is ready to go? Great work to both of you and thank you!

@tfliss
Copy link
Contributor
tfliss commented Apr 23, 2025

@tfliss this looks like @Silvanoc has addressed your comments. Can you pleases take another look and make sure this is ready to go? Great work to both of you and thank you!

@amc-corey-cox Can you clarify the procedures/expectations of approval? Does merging to main wait until all of the issues under the parent #2624 are approved? On its own this PR still includes the model info in the snapshot JSON-LD despite the model=False flag being set. Once you confirm that's ok, I'll approve.

@Silvanoc
Copy link
Contributor Author

Does merging to main wait until all of the issues under the parent #2624 are approved?

It's the other way around. This is a sub-issue of #2624. So merging this PR is a pre-condition (among others) for that issue to be fulfilled.

@Silvanoc Silvanoc marked this pull request as draft April 24, 2025 12:09
@Silvanoc
Copy link
Contributor Author

@tfliss @amc-corey-cox I'm trying to improve test coverage and observing some strange things. Therefore moving to "draft" to block merging until I have a clearer understanding.

@Silvanoc Silvanoc force-pushed the fix-json-ld-context-typing branch from 8dc37f4 to 2af9549 Compare May 9, 2025 16:01
Silvanoc added 4 commits May 15, 2025 07:58
CLI supports multiple context files (what returns a tuple), but code
mostly expecting a single context (and therefore only a string). IDEs
and linters recognize the inconsistency thanks to typing.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Validate that appending multiple files on JSON-LD generation works as
expected.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
@Silvanoc Silvanoc force-pushed the fix-json-ld-context-typing branch from 181f138 to b6410c0 Compare May 15, 2025 05:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data typing inconsistencies in JSON-LD generator
3 participants
0