What Happened to Marshalling?
In Jankson 2, the Marshalling story needs to change. The 1.2.x API assumed that you would be translating to and from "intermediate value" objects, such as a JsonArray or JsonPrimitive. Because of streaming, this is no longer a safe assumption. So whatever shape the eventual API takes, let's lay down some ground rules:
- People familiar with the 1.2.x API should be able to write simple functions converting between ValueElement and the target type, at the cost of buffering the data beforehand.
- People should be able to do a little more work and create StructuredDataReaders and Writers to take full advantage of streaming.
- Wherever possible, reflection-based serialization and deserialization should Do The Right Thing, including honoring annotation hints in the class.
Design Round 1: Function Factories
The first draft contains ObjectReaderFactory, where you can register either "classic" serializers of the form Function<T, ValueElement> or "streaming" serializers of the form Function<T, StructuredDataReader>. Whichever kind was registered, ObjectReaderFactory can give you a StructuredDataReader for any registered type. You then use the usual transferTo to move that data to the JSON file or wherever you want it to go.
record User(String name, String nonce, long lastLogin) {}
ObjectReaderFactory readerFactory = new ObjectReaderFactory();
readerFactory.registerSerializer(
    User.class,
    (user) -> {
        ObjectElement result = new ObjectElement();
        result.put("name", PrimitiveElement.of(user.name()));
        result.put("nonce", PrimitiveElement.of(user.nonce()));
        result.put("last-login", PrimitiveElement.of(user.lastLogin()));
        return result;
    });
User carl = new User("Carl", "QSbB86Yp6N", 1680699521);
StructuredDataReader reader = readerFactory.getReader(carl);
// Let's pipe it to a StringWriter and print the result to stdout
StringWriter sw = new StringWriter();
JsonWriter writer = new JsonWriter(sw);
reader.transferTo(writer);
System.out.println(sw.toString());
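The shim that lets the factory hand back a reader for a "classic" serializer can be sketched roughly as follows. This is only an illustration under loud assumptions: SketchWriter, SketchReader, BufferedValueReader, SketchReaderFactory, and ShimUser are hypothetical stand-ins invented here, a Map<String, Object> plays the role of ObjectElement, and the "stream" is reduced to a sequence of string tokens. None of these are the real Jankson 2 types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, heavily simplified stand-ins; NOT the real Jankson 2 types.
interface SketchWriter { void write(String token); }
interface SketchReader { void transferTo(SketchWriter writer); }

// Replays an already-buffered value as a stream of tokens.
// Map<String, Object> stands in for ObjectElement.
final class BufferedValueReader implements SketchReader {
    private final Map<String, Object> buffered;

    BufferedValueReader(Map<String, Object> buffered) { this.buffered = buffered; }

    @Override
    public void transferTo(SketchWriter writer) {
        writer.write("{");
        for (Map.Entry<String, Object> entry : buffered.entrySet()) {
            writer.write("key:" + entry.getKey());
            writer.write("value:" + entry.getValue());
        }
        writer.write("}");
    }
}

final class SketchReaderFactory {
    // Internally, everything is stored as a "streaming" serializer.
    private final Map<Class<?>, Function<Object, SketchReader>> serializers = new HashMap<>();

    <T> void registerStreaming(Class<T> type, Function<T, SketchReader> serializer) {
        serializers.put(type, value -> serializer.apply(type.cast(value)));
    }

    // The shim: wrap the classic serializer's buffered output in a reader.
    <T> void registerClassic(Class<T> type, Function<T, Map<String, Object>> serializer) {
        registerStreaming(type, value -> new BufferedValueReader(serializer.apply(value)));
    }

    SketchReader getReader(Object value) {
        Function<Object, SketchReader> serializer = serializers.get(value.getClass());
        if (serializer == null) {
            throw new IllegalArgumentException("No serializer registered for " + value.getClass());
        }
        return serializer.apply(value);
    }
}

record ShimUser(String name, long lastLogin) {}

final class ShimSketch {
    // Registers a classic serializer, then collects the tokens the reader emits.
    static List<String> run() {
        SketchReaderFactory factory = new SketchReaderFactory();
        factory.registerClassic(ShimUser.class, user -> {
            Map<String, Object> result = new LinkedHashMap<>(); // keeps insertion order
            result.put("name", user.name());
            result.put("last-login", user.lastLogin());
            return result;
        });
        List<String> tokens = new ArrayList<>();
        factory.getReader(new ShimUser("Carl", 1680699521L)).transferTo(tokens::add);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The point of the sketch is that the caller of getReader never learns which kind of serializer was registered; the price of the classic path is that the whole value is buffered before the first token is emitted.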
ObjectWriterFactory has "classic" deserializers of the form Function<ValueElement, T>, as well as "streaming" deserializers of the form Function<StructuredDataReader, T>. Again, a shim buffers the data for the classic deserializer, so ObjectWriterFactory can always give you a deserializer function for a given type.
ObjectWriterFactory writerFactory = new ObjectWriterFactory();
writerFactory.register(
    User.class,
    (ValueElement value) -> {
        if (value instanceof ObjectElement obj) {
            User result = new User(
                obj.getString("name").orElse(null),
                obj.getString("nonce").orElse(null),
                obj.getLong("last-login").orElse(0L)
            );
            return result;
        } else {
            // Yes, this is allowed because it's a CheckedFunction
            throw new IOException("Value must be an ObjectElement");
        }
    });
final String CARL_STRING = """
    { "name": "Carl", "nonce": "QSbB86Yp6N", "last-login": 1680699521 }
    """;
StructuredDataReader reader = new JsonReader(new StringReader(CARL_STRING));
StructuredDataWriter writer = ...
// This situation right here is when I realized that we WANT a writer, but
// weren't providing it, so the actual API would wind up being like,
User user = writerFactory.getDeserializer(User.class).apply(reader);
// And that's ugly and asymmetrical.
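The "this is allowed because it's a CheckedFunction" comment depends on a functional interface whose method declares a checked exception, which java.util.function.Function does not. Jankson's own declaration isn't shown on this page, so here is a minimal sketch of the idea. SketchCheckedFunction and CheckedFunctionSketch are hypothetical names, and a plain Map stands in for ObjectElement.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical stand-in; the real Jankson 2 CheckedFunction may be declared differently.
@FunctionalInterface
interface SketchCheckedFunction<T, R, E extends Exception> {
    R apply(T value) throws E;
}

final class CheckedFunctionSketch {
    // A deserializer-style lambda that rejects anything that isn't a Map
    // (Map stands in for ObjectElement). Throwing IOException compiles
    // because apply() declares "throws E" and E is IOException here.
    static final SketchCheckedFunction<Object, String, IOException> READ_NAME = value -> {
        if (value instanceof Map<?, ?> obj) {
            return String.valueOf(obj.get("name"));
        }
        throw new IOException("Value must be an object");
    };

    public static void main(String[] args) throws IOException {
        System.out.println(READ_NAME.apply(Map.of("name", "Carl")));
        try {
            READ_NAME.apply("not an object");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The same lambda body assigned to a java.util.function.Function would not compile, since Function.apply declares no checked exceptions.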
Pros
- Straightforward implementation that would definitely hit all the requirements
- Don't have to implement a deserializer just because you implemented a serializer
Cons
- Seems like a lot of "configuration-like" objects to throw around
- More complicated to use than Marshaller
- Very opaque wording - Serializer and Deserializer versus ObjectReader and ObjectWriter feel backwards
- ObjectWriterFactory has no relation to similarly named ObjectWriter
- CheckedFunction used in writers but not readers, not sure what to do about exceptions.
Design Round 2: Deserializer and Codec
Notice that ObjectWriterFactory gave not a StructuredDataWriter but a deserializer function. This is because, at the end of the day, we need a T, your domain type. For example, the built-in ObjectWriter has a toObject() to get that completed T once everything's written. That tells me there's type information missing.
Enter Deserializer<T>. This is a StructuredDataWriter with additional methods. getResult() will give you your completed T when the data is fully written, and isComplete() tells you if the data that has already been written can be used to construct a complete T. This lets us restore the symmetry between serializers and deserializers, and use them on the hot path of streaming. Existing ObjectWriters will need to be recast to Deserializer, but the result will be a very coherent set of classes.
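To make the shape concrete, here is a minimal sketch of a writer that also exposes isComplete() and getResult(). It assumes a drastically simplified writer interface with a single writeEntry method. SketchDataWriter, SketchDeserializer, StreamedUser, and StreamedUserDeserializer are hypothetical names for illustration, not the actual Jankson 2 declarations.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for StructuredDataWriter.
interface SketchDataWriter {
    void writeEntry(String key, Object value);
}

// The Deserializer<T> idea: a writer that can also hand back a finished T.
interface SketchDeserializer<T> extends SketchDataWriter {
    boolean isComplete();
    T getResult();
}

record StreamedUser(String name, String nonce, long lastLogin) {}

final class StreamedUserDeserializer implements SketchDeserializer<StreamedUser> {
    private final Map<String, Object> seen = new HashMap<>();

    @Override
    public void writeEntry(String key, Object value) { seen.put(key, value); }

    // Complete once every field needed by the constructor has arrived with a usable type.
    @Override
    public boolean isComplete() {
        return seen.get("name") instanceof String
            && seen.get("nonce") instanceof String
            && seen.get("last-login") instanceof Number;
    }

    @Override
    public StreamedUser getResult() {
        if (!isComplete()) {
            throw new IllegalStateException("Not enough data has been written yet");
        }
        return new StreamedUser(
            (String) seen.get("name"),
            (String) seen.get("nonce"),
            ((Number) seen.get("last-login")).longValue());
    }
}

final class DeserializerSketch {
    public static void main(String[] args) {
        StreamedUserDeserializer writer = new StreamedUserDeserializer();
        writer.writeEntry("name", "Carl");
        System.out.println(writer.isComplete()); // still missing two fields
        writer.writeEntry("nonce", "QSbB86Yp6N");
        writer.writeEntry("last-login", 1680699521L);
        System.out.println(writer.isComplete());
        System.out.println(writer.getResult());
    }
}
```

Because the deserializer is itself a writer, a reader can transfer straight into it, and the caller only asks for the T at the end.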
Meanwhile, something interesting has been happening over in DataFixerUpper-land. Codecs have emerged to represent serialization and deserialization into / out of a common data fabric. Packaging the serializer and deserializer for one type together makes a lot of sense, and reduces the number of configuration-like objects people have to deal with.
So let's consider Codec as a way to get a StructuredDataReader or a Deserializer (the specialized type of Writer) for the target type, and CodecManager as a way to get a reader/writer for ANY registered type.
// Note: There's currently an interface/impl split not shown here,
// that I'll likely get rid of.
CodecManager codecManager = new CodecManager();
StructuredDataCodec<User> userCodec = new JsonValueCodec<>(
    User.class,
    (User user) -> {
        ObjectElement result = new ObjectElement();
        result.put("name", PrimitiveElement.of(user.name()));
        result.put("nonce", PrimitiveElement.of(user.nonce()));
        result.put("last-login", PrimitiveElement.of(user.lastLogin()));
        return result;
    },
    (ValueElement value) -> {
        if (value instanceof ObjectElement obj) {
            User result = new User(
                obj.getString("name").orElse(null),
                obj.getString("nonce").orElse(null),
                obj.getLong("last-login").orElse(0L)
            );
            return result;
        } else {
            // Yes, this is allowed because it's a CheckedFunction
            throw new IOException("Value must be an ObjectElement");
        }
    });
codecManager.register(userCodec);
// I'm using an anonymous block to isolate variable names here
{
    // Use codec to pull data from an object
    User carl = new User("Carl", "QSbB86Yp6N", 1680699521);
    StructuredDataReader reader = codecManager.getReader(carl);
    StringWriter sw = new StringWriter();
    StructuredDataWriter writer = new JsonWriter(sw);
    reader.transferTo(writer);
    System.out.println(sw.toString());
}
{
    // Use codec to put data into a new object
    StructuredDataReader reader = new JsonReader(new StringReader(CARL_STRING));
    Deserializer<User> writer = codecManager.getWriter(User.class);
    reader.transferTo(writer);
    User carl = writer.getResult();
    System.out.println(carl);
}
Codecs come with some interesting side effects. You could map a codec from one type to another: the JSON would be deserialized to the original type and then converted to the new type via a mapping Function. It might also become possible to build codecs out of lists or trees of other codecs. I'll continue to explore this path and update this page as I learn more.
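As a rough illustration of the mapping idea, the sketch below derives a codec for java.time.Instant from a codec for Long. It assumes a toy codec that encodes to and decodes from a String rather than working with readers and Deserializers. SketchCodec, its map method, and CodecMappingSketch are hypothetical names and not part of any Jankson 2 API.

```java
import java.time.Instant;
import java.util.function.Function;

// Hypothetical, simplified codec: encodes a T to text and decodes it back.
// The codec discussed in this section works with readers and Deserializers instead.
interface SketchCodec<T> {
    String encode(T value);
    T decode(String data);

    // Derive a codec for U from this codec plus a pair of conversion functions.
    default <U> SketchCodec<U> map(Function<T, U> toNew, Function<U, T> toOriginal) {
        SketchCodec<T> original = this;
        return new SketchCodec<U>() {
            @Override
            public String encode(U value) { return original.encode(toOriginal.apply(value)); }

            @Override
            public U decode(String data) { return toNew.apply(original.decode(data)); }
        };
    }
}

final class CodecMappingSketch {
    static final SketchCodec<Long> LONG_CODEC = new SketchCodec<Long>() {
        @Override
        public String encode(Long value) { return Long.toString(value); }

        @Override
        public Long decode(String data) { return Long.parseLong(data); }
    };

    // Decoding goes text -> Long -> Instant; encoding goes Instant -> Long -> text.
    static final SketchCodec<Instant> INSTANT_CODEC =
        LONG_CODEC.map(Instant::ofEpochSecond, Instant::getEpochSecond);

    public static void main(String[] args) {
        Instant lastLogin = INSTANT_CODEC.decode("1680699521");
        System.out.println(lastLogin);
        System.out.println(INSTANT_CODEC.encode(lastLogin));
    }
}
```

Both directions have to be supplied, which is the flip side of packaging serializer and deserializer together: a mapped codec needs a conversion each way.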
Pros
- Fewer configuration-like objects floating around
- Great potential to leverage function composition
- Similar to code that users are likely to be already familiar with
Cons
- Defining a codec could potentially be a lot of lines of code
- Might be an interface segregation principle failure, requiring a matching deserializer for every registered serializer