Structured Data

You've heard of the Structured Programming Theorem, perhaps. To review, it's been proven that any Turing machine or computer algorithm can be represented with only three components: the sequence, the selection, and the loop. No gotos, no funny business, just a small number of simple, well-understood control structures.

Structured data is the same. You can represent any data using a very small number of components. The relational data model uses the table and the reference, for example. That's a kind of structured data. However, when speaking about Jankson, I will often speak about a "tree" or "document" model of structured data. In this model, the components are the list, the dictionary, and the primitive. Every tree or subtree of data will be one of these three components, which provide complete coverage over any acyclic data.
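
Here's a minimal sketch of that three-component model in plain Java, assuming Java 17 or newer. Every name in it (TreeModelSketch, Node, ListNode, DictNode, PrimitiveNode) is made up for illustration and is not a Jankson type; it only shows that a recursive function over the three cases can cover a whole tree.

```java
import java.util.List;
import java.util.Map;

// Illustrative only: these are hypothetical names, not Jankson types.
final class TreeModelSketch {
    // The three components: every node is exactly one of these.
    sealed interface Node permits ListNode, DictNode, PrimitiveNode {}
    record ListNode(List<Node> items) implements Node {}
    record DictNode(Map<String, Node> entries) implements Node {}
    record PrimitiveNode(Object value) implements Node {} // String, Number, Boolean, or null

    // Recursively counts the primitives (leaves) under a node.
    static int countPrimitives(Node node) {
        if (node instanceof ListNode list) {
            return list.items().stream().mapToInt(TreeModelSketch::countPrimitives).sum();
        }
        if (node instanceof DictNode dict) {
            return dict.entries().values().stream().mapToInt(TreeModelSketch::countPrimitives).sum();
        }
        return 1; // a PrimitiveNode is a leaf
    }

    // The example user document shown later on this page, built by hand.
    static Node sampleUser() {
        Node friend = new DictNode(Map.of(
                "username", new PrimitiveNode("gentle-king-4843"),
                "list-index", new PrimitiveNode(1),
                "starred", new PrimitiveNode(false)));
        return new DictNode(Map.of(
                "username", new PrimitiveNode("old-feather-8065"),
                "last-login", new PrimitiveNode("2026-06-18"),
                "friends", new ListNode(List.of(friend))));
    }
}
```

Because the interface is sealed, any code that walks the tree only ever has to handle those three cases.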

We could add node references to complete the coverage over all types, but these are unsupported and outside the scope of what we're doing here. Let it suffice to say that you can create an acyclic graph out of a cyclic one - for example, if you have a doubly-linked list, you can cut all the reverse pointers and arrive at a simple chain of forward-pointing nodes.
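
The doubly-linked list case can be sketched in a few lines. This is only an illustration with hypothetical names (BackPointerSketch, DNode); it shows that following the forward pointers alone gives you a plain list, which fits the tree model.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not part of Jankson.
final class BackPointerSketch {
    // A doubly-linked node. The 'prev' pointer is what makes the graph cyclic.
    static final class DNode {
        final String value;
        DNode next;
        DNode prev;
        DNode(String value) { this.value = value; }
    }

    // Links b after a, in both directions.
    static void link(DNode a, DNode b) {
        a.next = b;
        b.prev = a;
    }

    // Follows only the forward pointers. 'prev' is never visited, so the
    // result is a simple acyclic chain of values.
    static List<String> toForwardChain(DNode head) {
        List<String> out = new ArrayList<>();
        for (DNode n = head; n != null; n = n.next) {
            out.add(n.value);
        }
        return out;
    }
}
```

The reverse pointers carry no extra information here, so they can be rebuilt from the forward chain when the data is read back in.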

Structured Data Streams

A very common and intuitive way of serializing a tree is to do a depth-first traversal of it. If you examine JSON, YAML, INI, or trivial TOML (TOML and I have beef!! We'll get into that later!), you will see elements listed in depth-first order.

"Structured data streams" are a new feature of Jankson that let you pipe data around arbitrarily. These streams are a linear series of "atoms" which describe that depth-first traversal of the data tree.

Here's an example JSON document:

{
	"username": "old-feather-8065",
	"last-login": "2026-06-18",
	"friends": [
		{ "username": "gentle-king-4843", "list-index": 1, "starred": false }
	]
}

And here's the equivalent structured data stream, one line per atom. (The indents below are not part of the stream; they are only there to help readability.)

OBJECT_START
  KEY "username"
  PRIMITIVE "old-feather-8065"
  KEY "last-login"
  PRIMITIVE "2026-06-18"
  KEY "friends"
  ARRAY_START
    OBJECT_START
      KEY "username"
      PRIMITIVE "gentle-king-4843"
      KEY "list-index"
      PRIMITIVE 1
      KEY "starred"
      PRIMITIVE false
    OBJECT_END
  ARRAY_END
OBJECT_END

Here, the data is "serial", but it hasn't been fully serialized. Something else (a "StructuredDataWriter") needs to consume each of these atoms and perform some action related to them, such as emitting character data or constructing a domain object. This also means that any StructuredDataReader only needs to read character data and emit a stream of atoms, and doesn't need to concern itself with constructing a full object tree. In fact, an intermediate object tree may never be constructed at all. This can greatly reduce the memory footprint of parsing large data files, and lets you get to your domain objects faster.
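
To make the depth-first idea concrete, here's a rough sketch that walks an ordinary tree of Map, List, and primitive values and produces one line per atom, in the same textual form as the listing above. It does not use Jankson's classes; AtomStreamSketch and its methods are hypothetical names, and the atoms are plain strings to keep the example small. It assumes Java 16 or newer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not Jankson's atom or reader types.
final class AtomStreamSketch {
    // Visits a node, then its children in order (depth-first),
    // appending one line per atom to 'out'.
    static void emit(Object node, List<String> out) {
        if (node instanceof Map<?, ?> map) {
            out.add("OBJECT_START");
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                out.add("KEY " + quote(String.valueOf(entry.getKey())));
                emit(entry.getValue(), out);
            }
            out.add("OBJECT_END");
        } else if (node instanceof List<?> list) {
            out.add("ARRAY_START");
            for (Object item : list) {
                emit(item, out);
            }
            out.add("ARRAY_END");
        } else if (node instanceof String s) {
            out.add("PRIMITIVE " + quote(s));
        } else {
            // Numbers, booleans, and null print as-is.
            out.add("PRIMITIVE " + node);
        }
    }

    static String quote(String s) {
        return "\"" + s + "\"";
    }

    // Convenience wrapper: returns the whole atom listing for a tree.
    static List<String> atomsOf(Object root) {
        List<String> out = new ArrayList<>();
        emit(root, out);
        return out;
    }
}
```

Feeding it the example user (built from LinkedHashMap so the keys keep their order) gives the same sequence of atoms as the listing above, minus the indents.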

Using Readers and Writers

The typical workflow for data transformation operations is to create an appropriate reader, create an appropriate writer, and call transferTo to pipe data from one side to the other. For example:

// Takes the String JSON_DATA and parses it as json
JsonReader reader = new JsonReader(new StringReader(JSON_DATA));

// Creates a traditional intermediate Json object tree (the root of the tree will be a ValueElement)
ValueElementWriter writer = new ValueElementWriter();

// The writer will receive each atom the reader produces, until the reader is finished.
reader.transferTo(writer);

// Grabs the ValueElement out of the writer now that the transfer is complete.
ValueElement result = writer.getResult();
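
If you're curious what a pipe like this might look like on the inside, here's a simplified sketch. It is not Jankson's implementation: PipeSketch, Atom, Kind, AtomSource, AtomSink, and TreeSink are hypothetical stand-ins for the reader, writer, and atom types, and the tree is built out of plain maps and lists rather than ValueElements. It assumes Java 16 or newer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical stand-ins, not Jankson's real types.
final class PipeSketch {
    enum Kind { OBJECT_START, OBJECT_END, ARRAY_START, ARRAY_END, KEY, PRIMITIVE }
    record Atom(Kind kind, Object value) {}

    // The "writer" side: receives atoms one at a time.
    interface AtomSink { void accept(Atom atom); }

    // The "reader" side: produces atoms until it returns null.
    interface AtomSource {
        Atom next();
        default void transferTo(AtomSink sink) {
            for (Atom atom = next(); atom != null; atom = next()) {
                sink.accept(atom);
            }
        }
    }

    // A source that replays a prepared list of atoms.
    static AtomSource sourceOf(List<Atom> atoms) {
        var iterator = atoms.iterator();
        return () -> iterator.hasNext() ? iterator.next() : null;
    }

    // A sink that rebuilds a tree of maps and lists from the atoms.
    static final class TreeSink implements AtomSink {
        private final Deque<Object> open = new ArrayDeque<>(); // containers still being filled
        private final Deque<String> keys = new ArrayDeque<>(); // keys waiting for their value
        private Object result;

        @Override public void accept(Atom atom) {
            switch (atom.kind()) {
                case OBJECT_START -> open.push(new LinkedHashMap<String, Object>());
                case ARRAY_START -> open.push(new ArrayList<Object>());
                case KEY -> keys.push((String) atom.value());
                case PRIMITIVE -> attach(atom.value());
                case OBJECT_END, ARRAY_END -> attach(open.pop());
            }
        }

        // Adds a finished value to its parent container, or makes it the root.
        @SuppressWarnings("unchecked")
        private void attach(Object value) {
            Object parent = open.peek();
            if (parent == null) {
                result = value;
            } else if (parent instanceof Map<?, ?>) {
                ((Map<String, Object>) parent).put(keys.pop(), value);
            } else {
                ((List<Object>) parent).add(value);
            }
        }

        Object getResult() { return result; }
    }
}
```

The source knows nothing about what the sink does with the atoms, which is what lets you swap one sink for another without touching the reading side.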

What moves from one side to the other is atoms. And those atoms could go anywhere, like directly into a domain object:

// Same as before
JsonReader reader = new JsonReader(new StringReader(JSON_DATA));

// Define some domain types
// Using @SerializedName here to match the key names in the example JSON above
record Friend(
		String username,
		@SerializedName("list-index") int listIndex,
		boolean starred
		) {};
record User(
		String username,
		@SerializedName("last-login") String lastLogin,
		List<Friend> friends
		) {};

// Same thing here. We create an appropriate writer and pipe the data across.
ObjectWriter<User> writer = new ObjectWriter<>(User.class);
reader.transferTo(writer);

// Presto! We have a domain object. We *did* buffer data at the very end because User is an immutable type -
// but now this has roughly the same overhead as a single "builder" instance.
User user = writer.toObject();

Serialization and Deserialization Are The Same Operation

Going straight from our domain object to JSON character data:

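// Any populated User will do; this one mirrors the example JSON.
User user = new User("old-feather-8065", "2026-06-18", List.of(new Friend("gentle-king-4843", 1, false)));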
Path destFile = Path.of("user.json");

try(BufferedWriter bufferedWriter = Files.newBufferedWriter(destFile)) {
	// You can create/configure ObjectReaderFactory kind of like Marshaller,
	// but a default works for almost all scenarios.
	StructuredDataReader reader = new ObjectReaderFactory().getReader(user);

	// JsonWriter is just another kind of StructuredDataWriter
	JsonWriter writer = new JsonWriter(bufferedWriter);

	reader.transferTo(writer);

	bufferedWriter.flush(); // Just in case.

} catch (IOException | SyntaxError ex) {
	ex.printStackTrace();
}
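
To see why writing JSON text is "just another writer", here's a rough sketch of a consumer that turns atoms straight into compact JSON characters. It is not how Jankson's JsonWriter works internally; JsonTextSinkSketch, Atom, and Kind are hypothetical names, the output has no pretty-printing, and the string escaping only covers backslashes and double quotes. It assumes Java 16 or newer.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: a hypothetical consumer, not Jankson's JsonWriter.
final class JsonTextSinkSketch {
    enum Kind { OBJECT_START, OBJECT_END, ARRAY_START, ARRAY_END, KEY, PRIMITIVE }
    record Atom(Kind kind, Object value) {}

    private final StringBuilder out = new StringBuilder();
    // One flag per open container: true once it has received an element.
    private final Deque<Boolean> hasElements = new ArrayDeque<>();
    // True right after a key, when the next value must not get a comma.
    private boolean afterKey = false;

    void accept(Atom atom) {
        switch (atom.kind()) {
            case OBJECT_START -> { separator(); out.append('{'); hasElements.push(false); }
            case ARRAY_START -> { separator(); out.append('['); hasElements.push(false); }
            case OBJECT_END -> { hasElements.pop(); out.append('}'); }
            case ARRAY_END -> { hasElements.pop(); out.append(']'); }
            case KEY -> {
                separator();
                out.append(quote((String) atom.value())).append(':');
                afterKey = true;
            }
            case PRIMITIVE -> {
                separator();
                out.append(atom.value() instanceof String s ? quote(s) : String.valueOf(atom.value()));
            }
        }
    }

    // Emits a comma when the current container already holds an element.
    private void separator() {
        if (afterKey) {
            afterKey = false;
            return;
        }
        if (!hasElements.isEmpty()) {
            if (hasElements.pop()) {
                out.append(',');
            }
            hasElements.push(true);
        }
    }

    // Minimal escaping only; control characters are not handled here.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    String result() { return out.toString(); }
}
```

Nothing in this consumer cares whether the atoms came from parsing a file or from walking a domain object, which is the sense in which serialization and deserialization are the same operation.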

Shortcuts

The following shortcuts can be used instead of creating explicit readers or writers (try/catch blocks omitted):

ValueElement userJson = Jankson.readJson(JSON_DATA);

User user = Jankson.readJson(
		new StringReader(JSON_DATA),
		JsonReaderOptions.UNSPECIFIED,
		User.class);

Jankson.writeJson(user, Files.newBufferedWriter(destFile));