Java Serialization: Hessian

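In the earlier article "Java Serialization: A Deep Dive into Java Serialization and Deserialization", I noted that native Java serialization has three drawbacks:

It does not work across languages
The serialized byte stream is too large
Serialization takes too long

These three drawbacks are hard to accept in day-to-day development, so we need an alternative. Hessian is a good candidate.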

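What Is Hessian

Hessian is a dynamically typed, binary serialization and web services protocol designed for object-oriented transmission.

Like native Java serialization, Hessian uses a binary protocol, but it performs better and produces fewer bytes.

The Hessian serialization specification (http://hessian.caucho.com/doc/hessian-serialization.html) lists the following design goals, which set it apart from native Java serialization: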

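It must self-describe the serialized types, i.e. not require external schema or interface definitions
It must be language-independent, including supporting scripting languages
It must be readable or writable in a single pass
It must be as compact as possible
It must be simple so it can be effectively tested and implemented
It must be as fast as possible
It must support Unicode strings
It must support 8-bit binary data without escaping or using attachments
It must support encryption, compression, signature, and transaction context envelopes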

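Hessian is now at version 2.0. Compared with 1.0, Hessian 2.0 adds compact encodings. Its serialized output is reportedly about 50% the size of native Java serialization, with serialization taking about 30% of the time and deserialization about 20%.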

Hessian Grammar

Hessian's small output size is a direct consequence of its serialization grammar, shown below:

           # starting production
top        ::= value

           # 8-bit binary data split into 64k chunks
binary     ::= x41 b1 b0 <binary-data> binary # non-final chunk
           ::= 'B' b1 b0 <binary-data>        # final chunk
           ::= [x20-x2f] <binary-data>        # binary data of
                                                 #  length 0-15
           ::= [x34-x37] <binary-data>        # binary data of
                                                 #  length 0-1023

           # boolean true/false
boolean    ::= 'T'
           ::= 'F'

           # definition for an object (compact map)
class-def  ::= 'C' string int string*

           # time in UTC encoded as 64-bit long milliseconds since
           #  epoch
date       ::= x4a b7 b6 b5 b4 b3 b2 b1 b0
           ::= x4b b3 b2 b1 b0       # minutes since epoch

           # 64-bit IEEE double
double     ::= 'D' b7 b6 b5 b4 b3 b2 b1 b0
           ::= x5b                   # 0.0
           ::= x5c                   # 1.0
           ::= x5d b0                # byte cast to double
                                     #  (-128.0 to 127.0)
           ::= x5e b1 b0             # short cast to double
           ::= x5f b3 b2 b1 b0       # 32-bit float cast to double

           # 32-bit signed integer
int        ::= 'I' b3 b2 b1 b0
           ::= [x80-xbf]             # -x10 to x2f
           ::= [xc0-xcf] b0          # -x800 to x7ff
           ::= [xd0-xd7] b1 b0       # -x40000 to x3ffff

           # list/vector
list       ::= x55 type value* 'Z'   # variable-length list
           ::= 'V' type int value*   # fixed-length list
           ::= x57 value* 'Z'        # variable-length untyped list
           ::= x58 int value*        # fixed-length untyped list
           ::= [x70-x77] type value* # fixed-length typed list
           ::= [x78-x7f] value*      # fixed-length untyped list

           # 64-bit signed long integer
long       ::= 'L' b7 b6 b5 b4 b3 b2 b1 b0
           ::= [xd8-xef]             # -x08 to x0f
           ::= [xf0-xff] b0          # -x800 to x7ff
           ::= [x38-x3f] b1 b0       # -x40000 to x3ffff
           ::= x59 b3 b2 b1 b0       # 32-bit integer cast to long

           # map/object
map        ::= 'M' type (value value)* 'Z'  # key, value map pairs
           ::= 'H' (value value)* 'Z'       # untyped key, value

           # null value
null       ::= 'N'

           # Object instance
object     ::= 'O' int value*
           ::= [x60-x6f] value*

           # value reference (e.g. circular trees and graphs)
ref        ::= x51 int            # reference to nth map/list/object

           # UTF-8 encoded character string split into 64k chunks
string     ::= x52 b1 b0 <utf8-data> string  # non-final chunk
           ::= 'S' b1 b0 <utf8-data>         # string of length
                                             #  0-65535
           ::= [x00-x1f] <utf8-data>         # string of length
                                             #  0-31
           ::= [x30-x33] <utf8-data>         # string of length
                                             #  0-1023

           # map/list types for OO languages
type       ::= string                        # type name
           ::= int                           # type reference

           # main production
value      ::= null
           ::= binary
           ::= boolean
           ::= class-def value
           ::= date
           ::= double
           ::= int
           ::= list
           ::= long
           ::= map
           ::= object
           ::= ref
           ::= string

A few examples:

boolean

The boolean syntax is as follows:

boolean ::= 'T'
        ::= 'F'

T stands for true and F stands for false.

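The following code demonstrates this:

ByteArrayOutputStream bos = new ByteArrayOutputStream();
HessianOutput hessianOutput = new HessianOutput(bos);
hessianOutput.writeObject(true);
System.out.println("Serialized: " + Arrays.toString(bos.toByteArray()));

// -- Output
Serialized: [84]

84 is the ASCII code for T.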

int

The int syntax is as follows:

int ::= 'I' b3 b2 b1 b0
    ::= [x80-xbf]
    ::= [xc0-xcf] b0
    ::= [xd0-xd7] b1 b0

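An int is a 32-bit signed integer. In its long form it is represented by the octet x49 ('I') followed by the 4 octets of the integer in big-endian order, where value = (b3 << 24) + (b2 << 16) + (b1 << 8) + b0. The other three productions are compact forms for small values.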

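The following code demonstrates this:

ByteArrayOutputStream bos = new ByteArrayOutputStream();
HessianOutput hessianOutput = new HessianOutput(bos);
hessianOutput.writeObject(1024);
System.out.println("Serialized: " + Arrays.toString(bos.toByteArray()));

// -- Output
Serialized: [73, 0, 0, 4, 0]

73 is the ASCII code for I, and 4 << 8 equals 1024. Note that HessianOutput implements the Hessian 1.0 protocol, which always writes the 5-octet form; under the 2.0 grammar above, 1024 also fits the two-octet form [xc0-xcf] b0.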
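To make the four int productions concrete, here is a minimal hand-written sketch of an encoder and decoder that follow the grammar directly. It does not use the Hessian library; the class name HessianIntSketch and its methods are hypothetical, and the zero points (x90, xc8, xd4) are taken from the Hessian 2.0 specification.

```java
import java.util.Arrays;

public class HessianIntSketch {

    // Encode a 32-bit int using the shortest form the Hessian 2.0 grammar allows.
    static byte[] encodeInt(int v) {
        if (v >= -16 && v <= 47) {
            // One octet in x80-xbf; x90 represents 0.
            return new byte[] {(byte) (0x90 + v)};
        }
        if (v >= -2048 && v <= 2047) {
            // Two octets: [xc0-xcf] b0; xc8 is the zero point.
            return new byte[] {(byte) (0xc8 + (v >> 8)), (byte) v};
        }
        if (v >= -262144 && v <= 262143) {
            // Three octets: [xd0-xd7] b1 b0; xd4 is the zero point.
            return new byte[] {(byte) (0xd4 + (v >> 16)), (byte) (v >> 8), (byte) v};
        }
        // Five octets: 'I' followed by the value in big-endian order.
        return new byte[] {'I', (byte) (v >> 24), (byte) (v >> 16), (byte) (v >> 8), (byte) v};
    }

    // Decode any of the four forms back to an int.
    static int decodeInt(byte[] b) {
        int code = b[0] & 0xff;
        if (code >= 0x80 && code <= 0xbf) {
            return code - 0x90;
        }
        if (code >= 0xc0 && code <= 0xcf) {
            return ((code - 0xc8) << 8) + (b[1] & 0xff);
        }
        if (code >= 0xd0 && code <= 0xd7) {
            return ((code - 0xd4) << 16) + ((b[1] & 0xff) << 8) + (b[2] & 0xff);
        }
        if (code == 'I') {
            return ((b[1] & 0xff) << 24) | ((b[2] & 0xff) << 16)
                    | ((b[3] & 0xff) << 8) | (b[4] & 0xff);
        }
        throw new IllegalArgumentException("not a Hessian int");
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 47, 1024, 262143, 262144}) {
            System.out.println(v + " -> " + Arrays.toString(encodeInt(v)));
        }
    }
}
```

In this sketch 1024 encodes to two octets (xcc x00) instead of the five octets of the 'I' form, which is where much of the size saving for small numbers comes from.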

Object

The object syntax is as follows:

class-def  ::= 'C' string int string*

object     ::= 'O' int value*
           ::= [x60-x6f] value*

Its representation is straightforward:

class Car {
  String color;
  String model;
}

out.writeObject(new Car("red", "corvette"));
out.writeObject(new Car("green", "civic"));

---

C                        # class definition (#0)
  x0b example.Car        # type is example.Car
  x92                    # two fields
  x05 color              # color field name
  x05 model              # model field name

O                        # object def (long form)
  x90                    # object definition #0
  x03 red                # color field value
  x08 corvette           # model field value

x60                      # object def #0 (short form)
  x05 green              # color field value
  x05 civic              # model field value
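As a rough illustration of how the trace above maps to bytes, the following sketch writes the same class definition and the two Car instances by hand, without the Hessian library. The class HessianObjectSketch and its helpers are hypothetical, and they only cover the compact string form (0 to 31 characters) and the one-octet int form that this trace needs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class HessianObjectSketch {

    // Compact string: one length octet (0-31 characters) followed by UTF-8 data.
    // The length counts characters; for ASCII text that equals the byte count.
    static void writeShortString(ByteArrayOutputStream out, String s) {
        if (s.length() > 31) {
            throw new IllegalArgumentException("sketch only handles 0-31 characters");
        }
        out.write(s.length());
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.write(utf8, 0, utf8.length);
    }

    // Compact int: a single octet x90 + value, valid for -16..47.
    static void writeSmallInt(ByteArrayOutputStream out, int v) {
        if (v < -16 || v > 47) {
            throw new IllegalArgumentException("sketch only handles -16..47");
        }
        out.write(0x90 + v);
    }

    static byte[] encodeCars() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // class-def ::= 'C' string int string*
        out.write('C');
        writeShortString(out, "example.Car");
        writeSmallInt(out, 2);            // two fields
        writeShortString(out, "color");
        writeShortString(out, "model");

        // object ::= 'O' int value*  (long form, refers to class definition #0)
        out.write('O');
        writeSmallInt(out, 0);
        writeShortString(out, "red");
        writeShortString(out, "corvette");

        // object ::= [x60-x6f] value*  (short form, x60 = class definition #0)
        out.write(0x60);
        writeShortString(out, "green");
        writeShortString(out, "civic");

        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodeCars()) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

The field names are written once in the class definition, and each instance afterwards carries only its values, which is why the second car costs far fewer bytes than a self-contained map would.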

For more details, see the official documentation: http://hessian.caucho.com/doc/hessian-serialization.html

Using Hessian

First add the dependency. At the time of writing, the latest version was 4.0.65.

<dependency>
  <groupId>com.caucho</groupId>
  <artifactId>hessian</artifactId>
  <version>4.0.65</version>
</dependency>

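The class to be serialized implements the Serializable interface:

import java.io.Serializable;

public class Student implements Serializable {

    private String name;

    private int age;

    private Integer height;

    // transient fields are skipped during serialization
    private transient String gender;

    // All-args constructor, getters/setters, and toString() are omitted here.
}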

Hessian serialization

import com.caucho.hessian.io.HessianInput;
import com.caucho.hessian.io.HessianOutput;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class Hessian02Test {
    public static void main(String[] args) throws IOException {
        Student student1 = new Student("张三",18,180,"男");

        // Serialize
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        HessianOutput hessianOutput = new HessianOutput(bos);
        hessianOutput.writeObject(student1);
        System.out.println("Serialized: " + Arrays.toString(bos.toByteArray()));

        // Deserialize
        ByteArrayInputStream bis = new ByteArrayInputStream(bos.toByteArray());
        HessianInput hessianInput = new HessianInput(bis);
        Student student2 = (Student) hessianInput.readObject();
        System.out.println("Deserialized: " + student2);
    }
}

Output

Serialized: [77, 116, 0, 48, 99, 111, 109, 46, 115, 105, 107, 101, 46, 106, 97, 118, 97, 99, 111, 114, 101, 46, 115, 101, 114, 105, 97, 108, 105, 122, 101, 114, 46, 104, 101, 115, 115, 105, 97, 110, 46, 100, 116, 111, 46, 83, 116, 117, 100, 101, 110, 116, 83, 0, 4, 110, 97, 109, 101, 83, 0, 2, -27, -68, -96, -28, -72, -119, 83, 0, 3, 97, 103, 101, 73, 0, 0, 0, 18, 83, 0, 6, 104, 101, 105, 103, 104, 116, 73, 0, 0, 0, -76, 122]
Deserialized: Student(name=张三, age=18, height=180, gender=null)

The output shows that deserialization restored the Student object. Only gender, which is marked transient, comes back as null.
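The byte dump is easier to read once it is split into its parts. The sketch below is a hand-written walker, not part of the Hessian library, that parses exactly this dump. It assumes the Hessian 1.0 layout that HessianOutput produces: a typed map 'M' 't' len type, followed by key/value pairs built from 'S' (string) and 'I' (int), and closed by 'z'. The class name Hessian1DumpSketch is hypothetical, and only the tags that appear in this dump are handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class Hessian1DumpSketch {

    // The serialized Student from the output above.
    static final byte[] DUMP = {77, 116, 0, 48, 99, 111, 109, 46, 115, 105, 107, 101, 46, 106, 97, 118, 97, 99, 111, 114, 101, 46, 115, 101, 114, 105, 97, 108, 105, 122, 101, 114, 46, 104, 101, 115, 115, 105, 97, 110, 46, 100, 116, 111, 46, 83, 116, 117, 100, 101, 110, 116, 83, 0, 4, 110, 97, 109, 101, 83, 0, 2, -27, -68, -96, -28, -72, -119, 83, 0, 3, 97, 103, 101, 73, 0, 0, 0, 18, 83, 0, 6, 104, 101, 105, 103, 104, 116, 73, 0, 0, 0, -76, 122};

    private final byte[] buf;
    private int pos;

    Hessian1DumpSketch(byte[] buf) {
        this.buf = buf;
    }

    private int u8() {
        return buf[pos++] & 0xff;
    }

    // Reads `chars` UTF-8 encoded characters; the length prefix counts
    // characters, not bytes (this sketch handles 1- to 3-byte sequences).
    private String utf8(int chars) {
        int start = pos;
        for (int i = 0; i < chars; i++) {
            int lead = buf[pos] & 0xff;
            pos += lead < 0x80 ? 1 : lead < 0xe0 ? 2 : 3;
        }
        return new String(buf, start, pos - start, StandardCharsets.UTF_8);
    }

    // Reads one value: 'S' b1 b0 <utf8-data> or 'I' b3 b2 b1 b0.
    private Object value() {
        int tag = u8();
        switch (tag) {
            case 'S':
                return utf8((u8() << 8) | u8());
            case 'I':
                return (u8() << 24) | (u8() << 16) | (u8() << 8) | u8();
            default:
                throw new IllegalStateException("unsupported tag " + (char) tag);
        }
    }

    // Parses: 'M' 't' b1 b0 <type> (key value)* 'z'
    static Map<String, Object> parse(byte[] bytes) {
        Hessian1DumpSketch in = new Hessian1DumpSketch(bytes);
        Map<String, Object> out = new LinkedHashMap<>();
        if (in.u8() != 'M' || in.u8() != 't') {
            throw new IllegalStateException("expected a typed map");
        }
        out.put("@type", in.utf8((in.u8() << 8) | in.u8()));
        while (in.buf[in.pos] != 'z') {
            String key = (String) in.value();
            out.put(key, in.value());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse(DUMP));
    }
}
```

The walker finds the class name plus three entries (name, age, height). The field names travel with every object in this format, and gender does not appear at all because it is transient.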

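Note that with Hessian's default configuration, just as with native Java serialization, the class must implement the Serializable interface. Otherwise Hessian throws an exception whose message says the class must implement java.io.Serializable (native serialization throws NotSerializableException in the same situation).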

As with native Java serialization, fields marked transient are neither serialized nor deserialized.
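For comparison, the same transient behavior can be observed with native Java serialization using only the JDK. This is a small sketch; the Person class and the TransientSketch wrapper are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientSketch {

    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;

        String name;
        transient String gender; // skipped by serialization

        Person(String name, String gender) {
            this.name = name;
            this.gender = gender;
        }
    }

    // Serializes the object to a byte array and reads it back.
    static Person roundTrip(Person p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(p);
            }
            try (ObjectInputStream ois =
                         new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (Person) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person back = roundTrip(new Person("Zhang San", "male"));
        // The transient field is not restored, so gender is null here.
        System.out.println("name=" + back.name + ", gender=" + back.gender);
    }
}
```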

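One more point: after an object has been serialized with Hessian, changing the class's fields does not cause deserialization to fail even when no serialVersionUID is declared, because fields are written and matched by name. Classes serialized with Hessian therefore do not need serialVersionUID to mark their version.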

Now let's look at a slightly more complex object.

public class SubStudent extends Student{
    private Integer height;

    private String subValue;
}

SubStudent extends Student and declares a field with the same name, height. Let's serialize and deserialize a SubStudent.

import com.caucho.hessian.io.HessianInput;
import com.caucho.hessian.io.HessianOutput;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class Hessian03Test {
    public static void main(String[] args) throws IOException {
        SubStudent subStudent = new SubStudent();
        subStudent.setName("李四");
        subStudent.setHeight(185);
        subStudent.setSubValue("lisi");
        subStudent.setGender("男");

        // Serialize
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        HessianOutput hessianOutput = new HessianOutput(bos);
        hessianOutput.writeObject(subStudent);

        // Deserialize
        ByteArrayInputStream bis = new ByteArrayInputStream(bos.toByteArray());
        HessianInput hessianInput = new HessianInput(bis);
        SubStudent student2 = (SubStudent) hessianInput.readObject();
        System.out.println("Deserialized: " + student2);
    }
}

Output

Deserialized: SubStudent(height=null, subValue=lisi)

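The deserialized height is null even though we explicitly set it to 185. Why? This comes from how Hessian handles complex objects: it writes all of an object's fields as the entries of a single map keyed by field name. When a parent class and a subclass declare a field with the same name, Hessian serializes the subclass's field first and the parent's field afterwards. On deserialization, the subclass's value (185) is therefore overwritten by the parent's value (null).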
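The overwrite can be reproduced with a few lines of reflection. The sketch below is a simplified model of the behavior described above, not Hessian's actual code: it writes (name, value) pairs subclass first, then reads them back by field name, with the subclass field winning the name lookup. The names ShadowedFieldSketch, Parent, and Child are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShadowedFieldSketch {

    static class Parent {
        Integer height;
    }

    static class Child extends Parent {
        Integer height;   // same name as Parent.height
        String subValue;
    }

    // "Serialize": emit (name, value) pairs, subclass fields first, then superclass fields.
    static List<Map.Entry<String, Object>> write(Object obj) throws IllegalAccessException {
        List<Map.Entry<String, Object>> pairs = new ArrayList<>();
        for (Class<?> c = obj.getClass(); c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.isSynthetic()) {
                    continue;
                }
                f.setAccessible(true);
                pairs.add(new SimpleEntry<String, Object>(f.getName(), f.get(obj)));
            }
        }
        return pairs;
    }

    // "Deserialize": look each name up in a name -> field table and assign in stream order.
    static Child read(List<Map.Entry<String, Object>> pairs) throws IllegalAccessException {
        Map<String, Field> byName = new HashMap<>();
        for (Class<?> c = Child.class; c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.isSynthetic()) {
                    continue;
                }
                f.setAccessible(true);
                byName.putIfAbsent(f.getName(), f); // the subclass field keeps the name
            }
        }
        Child obj = new Child();
        for (Map.Entry<String, Object> e : pairs) {
            Field f = byName.get(e.getKey());
            if (f != null) {
                f.set(obj, e.getValue()); // a later pair with the same name overwrites
            }
        }
        return obj;
    }

    static Child demo() {
        Child c = new Child();
        c.height = 185;        // sets Child.height; Parent.height stays null
        c.subValue = "lisi";
        try {
            return read(write(c));
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Child back = demo();
        System.out.println("height=" + back.height + ", subValue=" + back.subValue);
    }
}
```

In this model the pair ("height", 185) is applied first and the parent's ("height", null) second, so the subclass field ends up null. Avoiding same-named fields across a class hierarchy sidesteps the problem.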

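Hessian vs. Native Java Serialization

As noted at the beginning, native Java serialization performs poorly, and Hessian is a good replacement: it is faster, its output is smaller, and it works across languages with better compatibility. Let's verify the size claim.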

import com.caucho.hessian.io.HessianOutput;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class Hessian04Test {
    public static void main(String[] args) throws IOException {
        Student student = new Student("张三",18,180,"男");

        // Hessian serialization
        ByteArrayOutputStream bos1 = new ByteArrayOutputStream();
        HessianOutput hessianOutput = new HessianOutput(bos1);
        hessianOutput.writeObject(student);
        System.out.println("Hessian serialized size: " + bos1.toByteArray().length);

        // Native Java serialization
        ByteArrayOutputStream bos2 = new ByteArrayOutputStream();
        ObjectOutputStream outputStream = new ObjectOutputStream(bos2);
        outputStream.writeObject(student);
        System.out.println("Native Java serialized size: " + bos2.toByteArray().length);
    }
}

Result

Hessian serialized size: 94
Native Java serialized size: 224

For this object, Hessian's output (94 bytes) is less than half the size of the native Java output (224 bytes).
