JSON has become one of the most popular data formats in the past few years, with the rise of REST APIs and NoSQL database solutions. The main reason for this is its lightweight, human-readable format. I have worked with both JSON and XML, and eventually I switched to JSON for these reasons.

These days I am working on a solution that requires sending a large amount of data over the network. In terms of performance and efficiency, the request body is not ideal for this kind of data transfer, so transferring a file is more convenient. However, the size of this file can range from 20 KB to several megabytes as the amount of data being written grows.

Recently I have been investigating the Jackson Streaming API. Its streaming behaviour can improve efficiency to some extent.

Streaming processing (aka incremental processing) is the most efficient way to process JSON content. It has low memory consumption and the lowest processing overhead, and can often match the performance of many binary data formats available on the Java platform.

This performance comes at a cost: streaming is not the most convenient way to process JSON content, for the following reasons.

  • All content has to be read or written in exactly the same order as the input comes in or the output goes out. For cases such as random access, you need to use Data Binding or the Tree Model (both of which actually use the Streaming API for the underlying JSON reading/writing).
  • No Java objects are created unless specifically requested, and even then only very basic types are supported (Strings, byte[] for base64-encoded binary content).
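To make the token-by-token model concrete, here is a minimal sketch using jackson-core's JsonFactory and JsonParser. It assumes the input is a JSON array of objects, and the class and method names (StreamingCount, countObjects) are hypothetical. It counts the objects in the array strictly in input order, without building a tree or binding anything to Java objects.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;

class StreamingCount {
    // Counts the objects in a JSON array of objects, one token at a time.
    // Stops at the first element that is not an object (or at the end of the array).
    static int countObjects(String json) throws IOException {
        JsonFactory factory = new JsonFactory();
        int count = 0;
        try (JsonParser parser = factory.createParser(json)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IOException("Expected a JSON array");
            }
            // Each iteration consumes one array element
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                count++;
                parser.skipChildren(); // jump to the matching END_OBJECT
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String json = "[{\"name\":\"John\"},{\"name\":\"Jenny\"}]";
        System.out.println(countObjects(json)); // 2
    }
}
```

Note how nested content must be skipped explicitly with skipChildren(): the parser never goes back, which is exactly the ordering constraint described above.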

Later, I came across Smile, a data format based on JSON. Smile can also be considered a binary serialization of the generic JSON data model. The format was specified in 2010 by the Jackson JSON processor development team, and the first compliant implementation was included as a Jackson backend in Jackson 1.6, released in September 2010.

The name comes from the 4-byte header, whose first two bytes form a smiley “:)”, followed by a linefeed: a choice made to make it easier to recognize Smile-encoded data files using textual command-line tools.
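Because the header is fixed, a file can be recognized as Smile by checking its first bytes. The sketch below assumes the header layout from the Smile specification: ':', ')', '\n', then a fourth byte holding the format version and flags, where bit 0 marks shared property names and bit 1 marks shared String values. SmileHeader and its methods are hypothetical names, and only the JDK is used.

```java
import java.nio.charset.StandardCharsets;

class SmileHeader {
    // First three header bytes per the Smile spec: ':' ')' '\n'
    private static final byte[] MAGIC = {0x3A, 0x29, 0x0A};

    static boolean hasSmileHeader(byte[] data) {
        if (data.length < 4) {
            return false; // the full header is 4 bytes long
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // The flag helpers read the 4th header byte; call hasSmileHeader() first.
    static boolean sharedNamesEnabled(byte[] data) {
        return (data[3] & 0x01) != 0; // bit 0: shared property names
    }

    static boolean sharedValuesEnabled(byte[] data) {
        return (data[3] & 0x02) != 0; // bit 1: shared String values
    }

    public static void main(String[] args) {
        byte[] sample = {':', ')', '\n', 0x03, (byte) 0xFA};
        System.out.println(hasSmileHeader(sample));      // true
        System.out.println(sharedValuesEnabled(sample)); // true
        byte[] json = "[{}]".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasSmileHeader(json));        // false
    }
}
```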

A sample Smile file looks something like this when opened as text:

:)
úƒnameBTom‚age$²†addressøEPolandI5th avenueùû

Efficiency

Compared to JSON, Smile is both more compact and more efficient to process (both to read and write). Part of this is due to more efficient binary encoding (similar to BSON, CBOR and UBJSON), but an additional feature is the optional use of back-references for property names and values.

Smile has two key features:

  • Binary Encoding
  • Back Referencing

Back Referencing

Back referencing allows property names and/or short (64 bytes or less) String values to be replaced with 1- or 2-byte reference ids.

This feature makes Smile more efficient and useful than other binary encoding formats for JSON. It reduces content size drastically when your data contains repeating keys and values, and a JSON array of objects is very likely to contain many of them.
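The effect of name back-references can be observed by encoding the same data twice, once with the CHECK_SHARED_NAMES generator feature on (its default) and once with it off. This is a sketch assuming Jackson 2.x with jackson-dataformat-smile on the classpath; the class, method and key names are hypothetical. Exact byte counts depend on the data and the Jackson version, but the shared variant should come out smaller.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;
import com.fasterxml.jackson.dataformat.smile.SmileGenerator;

class NameBackRefDemo {
    // Builds an array of n objects that all use the same three keys
    static ArrayNode repeatedKeys(ObjectMapper mapper, int n) {
        ArrayNode array = mapper.createArrayNode();
        for (int i = 0; i < n; i++) {
            ObjectNode row = array.addObject();
            row.put("customerId", i);
            row.put("customerName", "name-" + i);
            row.put("customerCountry", "country-" + i);
        }
        return array;
    }

    // Returns the Smile-encoded size in bytes, with name sharing on or off
    static int smileSize(boolean shareNames, int n) throws Exception {
        SmileFactory factory = new SmileFactory();
        // CHECK_SHARED_NAMES is enabled by default; switch it off to compare
        factory.configure(SmileGenerator.Feature.CHECK_SHARED_NAMES, shareNames);
        ObjectMapper smileMapper = new ObjectMapper(factory);
        return smileMapper.writeValueAsBytes(repeatedKeys(smileMapper, n)).length;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("names shared:     " + smileSize(true, 100));
        System.out.println("names not shared: " + smileSize(false, 100));
    }
}
```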

Flags come into play

Not only that, but the size can be reduced further by using the flags provided by SmileFactory, which are enabled when the factory is initialized. For example, the CHECK_SHARED_STRING_VALUES flag is useful when the JSON contains repeated values, because it enables back referencing for values in addition to keys. This flag is disabled by default.
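A minimal sketch of the value-sharing effect, assuming Jackson 2.x with jackson-dataformat-smile: the same 50 rows, each repeating "Sri Lanka", are encoded with and without CHECK_SHARED_STRING_VALUES. The class name, method name and row layout are hypothetical.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;
import com.fasterxml.jackson.dataformat.smile.SmileGenerator;

class SharedValuesDemo {
    // Encodes 50 rows that repeat the same country value; returns the size in bytes
    static int smileSize(boolean shareValues) throws Exception {
        SmileFactory factory = new SmileFactory();
        if (shareValues) {
            // Off by default: also back-reference short (at most 64 bytes) String values
            factory.enable(SmileGenerator.Feature.CHECK_SHARED_STRING_VALUES);
        }
        ObjectMapper smileMapper = new ObjectMapper(factory);

        ArrayNode rows = smileMapper.createArrayNode();
        for (int i = 0; i < 50; i++) {
            rows.addObject().put("id", i).put("country", "Sri Lanka");
        }
        return smileMapper.writeValueAsBytes(rows).length;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("values shared:     " + smileSize(true));
        System.out.println("values not shared: " + smileSize(false));
    }
}
```

With sharing on, every "Sri Lanka" after the first is written as a short back-reference instead of the full string.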

Available flags for SmileFactory

CHECK_SHARED_NAMES
Whether the generator should check if it can “share” field names when generating content.

CHECK_SHARED_STRING_VALUES
Whether the generator should check if it can “share” short (at most 64 bytes encoded) String values when generating content.

ENCODE_BINARY_AS_7BIT
Whether to use simple 7-bit-per-byte encoding for binary content when writing output.

WRITE_END_MARKER
Whether to write a byte marker that signifies the end of the logical content segment.

WRITE_HEADER
Whether to write the 4-byte header sequence when starting output.
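To check which of these flags are on by default in the Jackson version you actually use, a small sketch like the following can help. It assumes Jackson 2.7 or later (where SmileGenerator.Feature exposes enabledByDefault()). Newer releases define more features than the five above, so it iterates over Feature.values() rather than hard-coding the list; the class name is hypothetical.

```java
import com.fasterxml.jackson.dataformat.smile.SmileFactory;
import com.fasterxml.jackson.dataformat.smile.SmileGenerator;

class SmileFeatureDefaults {
    public static void main(String[] args) {
        SmileFactory factory = new SmileFactory();
        // Print every generator feature this Jackson version defines,
        // its declared default, and its state on a freshly created factory
        for (SmileGenerator.Feature f : SmileGenerator.Feature.values()) {
            System.out.printf("%-28s default=%-5s enabled=%s%n",
                    f.name(), f.enabledByDefault(), factory.isEnabled(f));
        }
    }
}
```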

How Smile reduces file size

Consider the following JSON array:

[
  {
    "id": "id-1244",
    "name": "John",
    "city": "Colombo",
    "country": "Sri Lanka",
    "isActive": true
  },
  {
    "id": "id-1244",
    "name": "Jenny",
    "city": "Kandy",
    "country": "Sri Lanka",
    "isActive": false
  }
]

sample.json (261 bytes)

Then, I converted this JSON data into Smile format.

:)
øúidˆfirstNameCJohn‡lastNameBDoe…emailsøPjohn.doe@mail.comùŽcreatedDateTimeS2019-08-19T20:30:00Zûú@ÄACJaneBBDoeCøPjane.poe@mail.comMjanep@mail.comùDS2019-08-19T20:45:00Zûù

sample-converted.sml (108 bytes)

This conversion reduced the file size drastically: the Smile file is ~58% smaller than the original JSON.

As mentioned earlier, flags can improve this further. I enabled the following flags:

ENCODE_BINARY_AS_7BIT
CHECK_SHARED_STRING_VALUES

The result I got is as follows.

:)
øúidFid-1244ƒnameCJohnƒcityFColombo†countryHSri Lanka‡isActive#ûú@ADJennyBDKandyCD"ûù

sample-converted-with-flags.sml (92 bytes)

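This time the file is ~65% smaller than the original JSON. Since ENCODE_BINARY_AS_7BIT is already enabled by default and only affects binary (byte[]) content, the extra saving comes from CHECK_SHARED_STRING_VALUES, which back-references repeated values such as "id-1244" and "Sri Lanka".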
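The reduction percentages follow directly from the byte counts reported above (261 bytes of JSON, 108 bytes of Smile without flags, 92 bytes with flags):

```latex
\text{reduction} = 1 - \frac{\text{Smile size}}{\text{JSON size}}, \qquad
1 - \frac{108}{261} \approx 0.586 \;(\approx 58.6\%), \qquad
1 - \frac{92}{261} \approx 0.648 \;(\approx 64.8\%)
```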

Here is the source code for these conversions.

First of all, jackson-dataformat-smile needs to be added as a dependency.

<dependency>
    <groupId>com.fasterxml.jackson.dataformat</groupId>
    <artifactId>jackson-dataformat-smile</artifactId>
    <version>2.8.9</version>
</dependency>

The following method creates the Smile file without flags:

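  import com.fasterxml.jackson.databind.JsonNode;
  import com.fasterxml.jackson.databind.ObjectMapper;
  import com.fasterxml.jackson.dataformat.smile.SmileFactory;
  import java.io.IOException;

  // Assumes the enclosing class has a java.io.File field 'file' (pointing at sample.json)
  // and a logger field 'log'
  public void writeSmileFile() throws IOException
  {
    // Default factory: shared (back-referenced) names on, shared string values off
    SmileFactory smileFactory = new SmileFactory();
    ObjectMapper smileMapper = new ObjectMapper(smileFactory);
    ObjectMapper mapper = new ObjectMapper();

    try
    {
      // Parse the JSON file into a tree, then re-encode the tree as Smile bytes
      JsonNode jsonNode = mapper.readValue(file, JsonNode.class);
      byte[] val = smileMapper.writeValueAsBytes(jsonNode);
      writeToFile(val, "target/sample-converted.sml");
    }
    catch (IOException e)
    {
      log.error("Some error occurred in json parsing", e);
      throw e;
    }
  }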

The following method creates the Smile file with flags enabled:

  import com.fasterxml.jackson.dataformat.smile.SmileGenerator;

  // Same assumptions as above: 'file' and 'log' are fields of the enclosing class
  public void writeSmileFileWithFlags() throws IOException
  {
    SmileFactory smileFactory = new SmileFactory();
    // Already on by default; only affects binary (byte[]) values
    smileFactory.enable(SmileGenerator.Feature.ENCODE_BINARY_AS_7BIT);
    // Back-reference repeated short String values, not just property names
    smileFactory.enable(SmileGenerator.Feature.CHECK_SHARED_STRING_VALUES);
    ObjectMapper smileMapper = new ObjectMapper(smileFactory);
    ObjectMapper mapper = new ObjectMapper();

    try
    {
      JsonNode jsonNode = mapper.readValue(file, JsonNode.class);
      byte[] val = smileMapper.writeValueAsBytes(jsonNode);
      writeToFile(val, "target/sample-converted-with-flags.sml");
    }
    catch (IOException e)
    {
      log.error("Some error occurred in json parsing", e);
      throw e;
    }
  }

This is the writeToFile method used in the examples above.

  import java.io.FileOutputStream;

  private void writeToFile(byte[] bytes, String fileDest) throws IOException
  {
    // try-with-resources closes the stream even if write() fails
    try (FileOutputStream fileOutputStream = new FileOutputStream(fileDest))
    {
      fileOutputStream.write(bytes);
    }
    catch (IOException e)
    {
      log.error("Some error occurred in writing file", e);
      throw e;
    }
  }

Smile support for the Jackson Streaming API

The Jackson Smile extension handles reading and writing of data encoded in the Smile data format (“binary JSON”). It extends the standard Jackson streaming API (JsonFactory, JsonParser, JsonGenerator), and as such works seamlessly with all the higher-level data abstractions (data binding, tree model, and pluggable extensions).
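Because SmileFactory is a JsonFactory, the low-level streaming calls look exactly like their JSON counterparts. A minimal sketch, assuming Jackson 2.x with jackson-dataformat-smile; the class and method names and the "Tom"/30 record are hypothetical illustrations.

```java
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

class SmileStreaming {
    // Writes one object with the standard JsonGenerator API; the factory emits Smile bytes
    static byte[] write() throws IOException {
        SmileFactory factory = new SmileFactory();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (JsonGenerator gen = factory.createGenerator(out)) {
            gen.writeStartObject();
            gen.writeStringField("name", "Tom");
            gen.writeNumberField("age", 30);
            gen.writeEndObject();
        }
        return out.toByteArray();
    }

    // Reads the bytes back token by token and returns the "name" value, or null
    static String readName(byte[] smile) throws IOException {
        SmileFactory factory = new SmileFactory();
        try (JsonParser parser = factory.createParser(smile)) {
            JsonToken token;
            while ((token = parser.nextToken()) != null) {
                if (token == JsonToken.FIELD_NAME && "name".equals(parser.getCurrentName())) {
                    parser.nextToken(); // advance to the value
                    return parser.getText();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        byte[] smile = write();
        System.out.println(smile.length + " bytes, name = " + readName(smile));
    }
}
```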

What makes Smile special?

During my investigation, I found a research paper titled "Smart Grid Serialization Comparision: Comparision of serialization for distributed control in the context of the Internet of Things".

The paper presents a quantitative comparison of serializers, measured on the following criteria:

  • Serialization time.
  • Deserialization time.
  • Compression time.
  • Decompression time.
  • Memory use for serialization.
  • Memory use for compression.
  • Serialized message size.
  • Compressed message size.

In this comparison against a set of other binary serialization libraries, Smile placed well on these measures, and, as expected, it did better than plain JSON handled by Jackson.

Several frameworks and systems also use the Smile codec (encoder and decoder):

  • Elasticsearch uses Smile as a transport format and supports access using Smile encoding.
  • Apache Solr can use Smile as the response format with the wt=smile parameter.

Conclusion

Based on the details above, Smile can drastically improve space efficiency. It is a format you should definitely consider for JSON payloads sent over the network, since it can improve both writing and reading, over the network as well as during processing. In the examples above the data size was reduced by roughly 58 to 65%, although the actual saving depends on how much the keys and values repeat.

The main drawback of this solution is readability: the human readability we had with JSON is lost, which can make verification and debugging harder during implementation. Every solution with additional benefits comes with some trade-off. This one can be mitigated with a simple read operation that decodes the Smile content back into JSON.
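One way to do that read operation is to decode the Smile bytes into a tree and print it as indented JSON. This is a sketch assuming Jackson 2.x with jackson-dataformat-smile; SmileToJson and toPrettyJson are hypothetical names.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

import java.io.IOException;

class SmileToJson {
    private static final ObjectMapper SMILE = new ObjectMapper(new SmileFactory());
    private static final ObjectMapper JSON = new ObjectMapper();

    // Decodes Smile bytes into a tree and re-serializes it as indented JSON text
    static String toPrettyJson(byte[] smileBytes) throws IOException {
        JsonNode tree = SMILE.readTree(smileBytes);
        return JSON.writerWithDefaultPrettyPrinter().writeValueAsString(tree);
    }

    public static void main(String[] args) throws IOException {
        // Round trip: JSON text -> Smile bytes -> readable JSON
        byte[] smile = SMILE.writeValueAsBytes(JSON.readTree("{\"name\":\"Tom\",\"age\":30}"));
        System.out.println(toPrettyJson(smile));
    }
}
```

For a file such as target/sample-converted.sml, the bytes can be loaded with java.nio.file.Files.readAllBytes before calling toPrettyJson.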

References

Smile (data interchange format), Wikipedia: https://en.wikipedia.org/wiki/Smile_(data_interchange_format)

Jackson Smile data format module: https://github.com/FasterXML/jackson-dataformats-binary/tree/master/smile

Smart Grid Serialization Comparision (research paper): https://backend.orbit.dtu.dk/ws/portalfiles/portal/127962965/Smart_Grid_Serialization_Comparision.pdf

Smile format specification: https://github.com/FasterXML/smile-format-specification
Smile format specification (full text): https://github.com/FasterXML/smile-format-specification/blob/master/smile-specification.md

Smile design goals: https://github.com/FasterXML/smile-format-specification/blob/master/smile-design-goals.md
