8000 [#405] Convert object to/from JSON by jabolina · Pull Request #411 · infinispan/protostream · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[#405] Convert object to/from JSON #411

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 14, 2025

Conversation

jabolina
Copy link
Member

The field IDs for Instant and Date were ignored during the conversion so the generated JSON was an empty string. The fix was a bit more convoluted. I'll explain all I have and maybe there's a better solution.

  • An Instant is converted by the WrappedMessage to byte[] by writing one int64 field for seconds and int32 for nanos.
  • The Proto to JSON conversion needs to read two fields now. The message-wrapping.proto file describes the seconds part as a oneOf which writes the key as "_value". We still need to write to fields, so we end with {"_value": 0, "wrappedInstantNanos": 0} without any typing.
  • We could use the type as a WrappedMessage, but the conversion from JSON to Instant would fail as it creates the Instant inside the WrappedMessage object. So we need an explicit type for the conversion to create the correct object.

Since we need this conversion from JSON to Instant to work, I've created an adapter in the types module. During the Proto to JSON conversion, we require this adapter to be registered so we can write the field names and the type correctly.

We are still left with converting JSON to Proto byte[]. We need to verify in the WrappedMessage class whether this object corresponds to a known type. Otherwise, the generated byte[] will be entirely different from the one produced by Instant to Proto byte[] serialization.

The Date object suffers from the same thing. The difference is that:

  • We serialize Date as an int64 from the Date#getTime method.
  • This would create a JSON as {"_type": "int64", "_value": 12345}. Which loses any type parameter that this value is a Date object.
  • The conversion from JSON to Date will fail as it returns a primitive long number instead of the Date object.

Closes #405.

@jabolina jabolina requested a review from a team as a code owner January 27, 2025 19:36
@jabolina
Copy link
Member Author
jabolina commented Feb 14, 2025

Directly wrapping objects with the WrappedMessage leads to:

// new WrappedMessage(new WrappedMessage(UUID.randomUUID()))
{
   "_type":"org.infinispan.protostream.WrappedMessage",
   "_value":{
      "_type":"org.infinispan.protostream.WrappedMessage",
      "_value":{
         "_type":"org.infinispan.protostream.commons.UUID",
         "mostSigBitsFixed":15461222357418722979,
         "leastSigBitsFixed":13070220576097665502
      }
   }
}

Using with collections:

// List.of(new WrappedMessage(UUID.randomUUID()), new WrappedMessage(UUID.randomUUID()))
{
   "_type":"collections.ListOfAdapter",
   "elements":[
      {
         "_type":"org.infinispan.protostream.WrappedMessage",
         "_value":{
            "_type":"org.infinispan.protostream.commons.UUID",
            "mostSigBitsFixed":1624897274251135499,
            "leastSigBitsFixed":11937877021784957987
         }
      },
      {
         "_type":"org.infinispan.protostream.WrappedMessage",
         "_value":{
            "_type":"org.infinispan.protostream.commons.UUID",
            "mostSigBitsFixed":12908692626181016473,
            "leastSigBitsFixed":9445129632214002810
         }
      }
   ]
}

I've also removed the use of _value by default in fields defined in the message-wrapping.proto:

// new WrappedMessage(1.23f)
{
   "_type":"org.infinispan.protostream.WrappedMessage",
   "wrappedFloat":1.23
}

@jabolina
Copy link
Member Author

I'll revert this to a draft. There are some corner cases with more nested objects, collections, and adapters that don't create exact same byte array for the conversion of JSON -> Proto.

And by looking at the test failures, the _value field is expected in several places :/

@jabolina jabolina marked this pull request as draft February 14, 2025 21:53
@jabolina
Copy link
Member Author

Some containers were completely ignored by the JSON serialization (ArrayList, HashSet). I'll work to add those as well.

@jabolina
Copy link
Member Author
jabolina commented Apr 1, 2025

Some time later, more than I would like. I've refactored the old code. Instead of doing (parse, write, prettify, etc.) spread all around, I've created specialized classes for each type we would expect in the byte stream: primitives, nested objects, repeated fields, maps, and containers.

This uses the specialized classes with internal delegates, so each class can focus on doing its part and passing the request along. For example, a list of UUIDs: there is the root parser, which passes the request to an object writer for the adapter, which delegates to an array writer, which delegates each UUID to an object writer.

The writers will pass the context along so we can track where we are in the JSON. Leading to the other change. Instead of passing a StringBuilder and writing, we tokenize the byte stream by pushing tokens from the JSON specification into a list. After reading the stream, we write the tokens in order and create the final JSON.

The parsing of JSON to byte array remains more or less the same as before, plus fixes to handle more types. This needed changes to go over again from the start when parsing a document, and we need to pass along which field number it belongs. The downside here is that we go to the bottom of the JSON and build up writing in a nested buffer. This has some extra allocations in some places when we need to wrap a byte array with WRAPPED_MESSAGE and the field.

The changes here are pretty much backwards compatible with what we had before. A JSON written in a previous version should be decoded just fine. Observe that we only changed the expected JSON in a single test: a test that uses an ArrayList container. Everything else is the same.

@jabolina jabolina marked this pull request as ready for review April 1, 2025 20:46
@jabolina jabolina linked an issue Apr 1, 2025 that may be closed by this pull request
@karesti
Copy link
Contributor
karesti commented Apr 7, 2025

@jabolina would this allow to push json from the client and convert to / from protostream ? in hotrod I mean

8000

@jabolina
Copy link
Member Author
jabolina commented Apr 7, 2025

@karesti, correct. A sample application doing that with the Hot Rod client:

public class EncodingRemoteCache {

   private static final String CACHE_NAME = "test-cache";

   private static void registerMagazineSchemaInTheServer(RemoteCacheManager cacheManager, GeneratedSchema schema) throws IOException {
      // Retrieve metadata cache
      RemoteCache<String, String> metadataCache =
            cacheManager.getCache(PROTOBUF_METADATA_CACHE_NAME);

      // Define the new schema on the server too
      metadataCache.put(schema.getProtoFileName(), schema.getProtoFile());
   }

   public static ConfigurationBuilder connectionConfig() {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.addServer()
            .host("127.0.0.1").port(11222)
            .security()
            .authentication()
            .username("admin")
            .password("password");

      String configuration = """
            localCache:
              name: "test-cache"
              statistics: "true"
              encoding:
                mediaType: "application/x-protostream"
            """;
      builder.remoteCache(CACHE_NAME).configuration(configuration);
      return builder;
   }

   public static void main(String[] args) throws IOException {
      GeneratedSchema userSchema = new UserModelSerializationContextInitializerImpl();
      ProtoStreamMarshaller marshaller = new ProtoStreamMarshaller();
      marshaller.register(userSchema);

      ConfigurationBuilder cb = connectionConfig();
      cb.marshaller(marshaller);

      RemoteCacheManager rcm = new RemoteCacheManager(cb.build());
      registerMagazineSchemaInTheServer(rcm, userSchema);
      RemoteCache<String, UserModel> userCache = rcm.getCache(CACHE_NAME);
      RemoteCache<String, String> jsonCache = rcm.getCache(CACHE_NAME)
            .withDataFormat(DataFormat.builder()
                  .valueType(MediaType.APPLICATION_JSON)
                  .valueMarshaller(new UTF8StringMarshaller())
                  .build());

      UserModel user = new UserModel("jabolina", "jose", 28, Instant.EPOCH);
      userCache.put("jose", user);
      System.out.println("GET PROTO: " + userCache.get("jose"));
      System.out.println("GET JSON: " + jsonCache.get("jose"));
      rcm.stop();
   }

   @Proto
   public record UserModel(String username, String name, int age, Instant creation) {

      @ProtoSchema(
            syntax = ProtoSyntax.PROTO3,
            dependsOn = {
                  CommonContainerTypes.class,
            },
            includeClasses = UserModel.class,
            schemaFileName = "user.proto",
            schemaFilePath = "proto/",
            schemaPackageName = "user"
      )
      public interface UserModelSerializationContextInitializer extends GeneratedSchema {}
   }
}

The output here for the system out is:

GET PROTO: UserModel[username=jabolina, name=jose, age=28, creation=1970-01-01T00:00:00Z]
GET JSON: {"_type":"user.UserModel","username":"jabolina","name":"jose","age":28,"creation":0}

I believe the marshaller here is only necessary for the put operation. The server is the one doing the work of transforming the model to JSON. In theory, the Hot Rod client doesn't need to know about the Proto descriptor.

Also, the changes in the PR already exist in previous versions of ProtoStream. The current PR is only refactoring that part.

@karesti
Copy link
Contributor
karesti commented Apr 8, 2025

@jabolina I believed it was something else, like really posting json JSONObject for example, and getting it converted!

@@ -711,6 +719,20 @@ private static <E> E readContainerElementWrapped(ImmutableSerializationContext c
return readMessage(ctx, elementReader, true);
}

public static boolean knownWrappedDescriptor(ImmutableSerializationContext ctx, GenericDescriptor descriptor) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a stream?

   public static boolean knownWrappedDescriptor(ImmutableSerializationContext ctx, GenericDescriptor descriptor) {
      return WRAPPED_KNOWN_TYPES.stream()
            .filter(ctx::canMarshall)
            .map(ctx::getMarshaller)
            .filter(Objects::nonNull)
            .anyMatch(m -> ctx.getDescriptorByName(m.getTypeName()) == descriptor);
   }

We could even remove WRAPPED_KNOWN_TYPES and just declare inline: Stream.of(Instant.class, Date.class)...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I've kept the WRAPPED_KNOWN_TYPES so we don't create the stream every invocation.

*
* @author anistor@redhat.com
* @author José Bolina
* @since 4.4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is sufficiently different to the old code that we can change this to 6.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

if (ast.isEmpty()) return;

int last = ast.size() - 1;
JsonTokenWriter jtw = Objects.requireNonNull(ast.get(last));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a shame we don't baseline on 21 so we could use getLast() 😢 .

*
* @param out The output to write the token.
*/
void write(StringBuilder out);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth making this a higher level interface, e.g. Appendable?

I can't think of a particular use-case, I'm just thinking it might provide more flexibility in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make that work. I'll create the Appendable outside and extend the interface here.

}

@Test
public void testNestedWrappedMessage() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current JSON produced by this is:

{"_type":"org.infinispan.protostream.WrappedMessage","wrappedTypeId":1005,"wrappedMessage":"GTBDX5fqKajeIaHZX/rYYGi9"}

Whereas I would expect the UUID type to be represented in a readable format.

I like the generic marshallAndUnmarshallTest but I think we need to make it more robust by also passing in the expected JSON output so that we can verify it's correct. I'm not sure how we do this in a way that satisfies all of the MarshallingMethodType values though 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a matter of distinguishing when the field was coming from a wrapped message of from a primitive. This now outputs something like:

{
   "_type":"org.infinispan.protostream.WrappedMessage",
   "wrappedMessage":{
      "_type":"org.infinispan.protostream.commons.UUID",
      "mostSigBitsFixed":16476212963255011726,
      "leastSigBitsFixed":11281785963689059814
   }
}

The tests in ProtobufUtilTest do something similar, validating the expected JSON, but it does not have as many tests as here. I'll create a new test dedicated to the JSON conversion and compare it against the contents of a file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created a new test that validates the JSON format. I am unsure if we can approach this in some other way to avoid the many new files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the individual files are good for readability. AFAICT we're still missing test cases for prettyPrint? I don't think we need a test for all cases, but it would be good to have at least one just to be sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good idea. I'll put together some nested objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanemerson, great suggestion! There was a problem where a raw list ("list": ["a", "b"]) didn't have the elements with proper indentation. This is fixed now.

* Instant and Date objects not handled during JSON conversion.
* Additionally, these objects try to convert as primitive types. This
  loses all the type context for the conversion back from JSON.
* Require adapters from the types module to perform the conversion to
  JSON.
* Updates JSON serialization to support nested types in lists and
  wrapped messages.
@ryanemerson ryanemerson merged commit bf16c3c into infinispan:main Apr 14, 2025
3 checks passed
@ryanemerson
Copy link
Contributor

Thanks @jabolina

@jabolina jabolina deleted the 405/instant-json branch April 14, 2025 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failed to convert serialized Instant to JSON Unable to parse json bytes containing WrappedMessage
3 participants
0