Class Loading: JVM Internals You Have Probably Never Heard Of

 2023-01-25
Original author: Five在努力 Original post: https://juejin.cn/post/6909438065396318222

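The OOP-Klass Model in the JVM

The JVM uses the OOP-Klass model to represent Java objects.

OOP, or OOPs (Ordinary Object Pointer), is an ordinary object pointer. Its main job is to represent an object's instance data, and it is stored on the heap.

A Klass describes the concrete type of an instance. It is the VM-level counterpart of a Java class and is stored in Metaspace (JDK 8 and later).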
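The split can be illustrated from plain Java. This is only a sketch, not HotSpot code: each instance carries its own field data, while all instances of a class share one type descriptor, which Java code sees as the Class object. The names OopKlassDemo and Point are made up for this example.

```java
class OopKlassDemo {
    // Hypothetical example class: each instance holds its own x and y.
    static class Point {
        int x, y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // True when both objects are described by the same class metadata.
    static boolean sameType(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(3, 4);
        System.out.println(p.x != q.x);     // instance data differs per object
        System.out.println(sameType(p, q)); // type information is shared
    }
}
```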

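Open the OpenJDK source code.

I built it with JDK 12 here. A separate tutorial on building the JDK source on macOS will follow. If you have not built it yourself, that is fine; it does not affect the rest of this article.

Open the Klass class.

Here we can see that Klass inherits from Metadata, so we jump to Metadata.

Metadata in turn inherits from MetaspaceObj (an object allocated in Metaspace).

Klass itself has several subclasses: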

  • InstanceKlass

    • InstanceMirrorKlass: describes instances of java.lang.Class
    • InstanceRefKlass: describes subclasses of java.lang.ref.Reference
    • InstanceClassLoaderKlass: used to iterate over the classes loaded by a given class loader
  • ArrayKlass

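    • TypeArrayKlass: describes the data structure of Java arrays of primitive types
    • ObjArrayKlass: describes the data structure of Java arrays of reference types

Each of these is covered in turn below.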

InstanceKlass

This is the class representation produced when a class loader loads a class file into memory.

    // An InstanceKlass is the VM level representation of a Java class.
    // It contains all information needed for at class at execution runtime.
    
    //  InstanceKlass embedded field layout (after declared fields):
    //    [EMBEDDED Java vtable             ] size in words = vtable_len
    //    [EMBEDDED nonstatic oop-map blocks] size in words = nonstatic_oop_map_size
    //      The embedded nonstatic oop-map blocks are short pairs (offset, length)
    //      indicating where oops are located in instances of this klass.
    //    [EMBEDDED implementor of the interface] only exist for interface
    //    [EMBEDDED unsafe_anonymous_host klass] only exist for an unsafe anonymous class (JSR 292 enabled)
    //    [EMBEDDED fingerprint       ] only if should_store_fingerprint()==true
    
    
    // forward declaration for class -- see below for definition
    struct JvmtiCachedClassFileData;
    
    class InstanceKlass: public Klass {
      friend class VMStructs;
      friend class JVMCIVMStructs;
      friend class ClassFileParser;
      friend class CompileReplay;
    
     public:
      static const KlassID ID = InstanceKlassID;
    
     protected:
      InstanceKlass(const ClassFileParser& parser, unsigned kind, KlassID id = ID);
    
     public:
      InstanceKlass() { assert(DumpSharedSpaces || UseSharedSpaces, "only for CDS"); }
    
      // See "The Java Virtual Machine Specification" section 2.16.2-5 for a detailed description
      // of the class loading & initialization procedure, and the use of the states.
      enum ClassState {
        allocated,                          // allocated (but not yet linked)
        loaded,                             // loaded and inserted in class hierarchy (but not linked yet)
        linked,                             // successfully linked/verified (but not initialized yet)
        being_initialized,                  // currently running class initializer
        fully_initialized,                  // initialized (successfull final state)
        initialization_error                // error happened during initialization
      };
    
     private:
      static InstanceKlass* allocate_instance_klass(const ClassFileParser& parser, TRAPS);
    ...

Hands-on

This section uses a tool called HSDB.

    public class Test_1 {
    
        public static void main(String[] args) {
            while (true);
        }
    }
    
    class Test_1_A{
        public static String str = "A str";
    
        static {
            System.out.println("A Static Block");
        }
    }
    
    class Test_1_B{
        public static String str = "B str";
    
        static {
            System.out.println("B Static Block");
        }
    }

Run this code, use jps to find the process ID, and enter it into HSDB.

Click Tools and choose the first item, Class Browser.

This lists all the classes associated with the JVM.

Find the class.

The trailing 0x00000007c0060828 is its memory address.

Click Tools, then Inspector, and enter the copied address.

As you can see, the Java class Test_1 above corresponds to an InstanceKlass.

InstanceMirrorKlass

Put simply, InstanceMirrorKlass describes the java.lang.Class objects (the mirrors), which live on the heap.

ArrayKlass

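Similar to InstanceKlass, it stores the metadata of array classes.

Static and dynamic data types

  • Dynamic data types are generated dynamically at run time
  • Static data types are the eight primitive types built into the JVM

In Java, arrays are dynamic data types.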
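The same point can be observed from ordinary Java code. As a small sketch (the class name ArrayTypeDemo is made up for this example): no class file exists for int[], yet every array object still has a class, which the JVM creates itself at run time.

```java
import java.util.Arrays;

class ArrayTypeDemo {
    // Binary name of the class the JVM uses for the given array object.
    static String typeName(Object array) {
        return array.getClass().getName();
    }

    public static void main(String[] args) {
        int[] ints = new int[1];
        String[][] grid = new String[1][1];

        System.out.println(typeName(ints)); // prints "[I"
        System.out.println(typeName(grid)); // prints "[[Ljava.lang.String;"

        // Array classes extend Object and implement Cloneable and Serializable.
        System.out.println(ints.getClass().getSuperclass());
        System.out.println(Arrays.toString(ints.getClass().getInterfaces()));
    }
}
```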

Proof:

Again we use jclasslib, the IDEA plugin used in an earlier article.

    public class Test_1 {
    
        public static void main(String[] args) {
            int[] arr = new int[1];
            //while (true);
        }
    }
    
    class Test_1_A{
        public static String str = "A str";
    
        static {
            System.out.println("A Static Block");
        }
    }
    
    class Test_1_B{
        public static String str = "B str";
    
        static {
            System.out.println("B Static Block");
        }
    }

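Compile and run this code, then look at the disassembled bytecode.

Step one: iconst_1 pushes 1 onto the operand stack.

Step two: newarray 10. This instruction did not appear in the previous article, so let us see what newarray means.

It creates an array of the given primitive type (int, float, char, ...) and pushes its reference onto the top of the stack. The operand 10 is the type code for int.

Inside the JVM, an array of a primitive type is represented by TypeArrayKlass.

What about an array of a reference type?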

    public class Test_2 {
    
        public static void main(String[] args) {
            int[] arr = new int[1];
            //while (true);
            Test_2[] test_2 = new Test_2[1];
        }
    }

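As you can see, a reference-type array uses anewarray. Let us see what anewarray means.

It creates an array of a reference type (such as a class, interface, or array) and pushes its reference onto the top of the stack.

Inside the JVM, an array of a reference type is represented by ObjArrayKlass.

Run the code above, keeping the process alive (for example with a while (true) loop at the end of main), use jps to find the process ID, and attach HSDB.

Select the main thread and click the button to the right of the magnifier to view the thread's stack.

Here we can see 0x000000076ada7858. Click Tools, then Inspector, enter 0x000000076ada7858, and press Enter.

The Inspector shows TypeArrayKlass, which confirms that an array of a primitive type is represented in the JVM by TypeArrayKlass.

Enter 0x000000076ada78d8 and press Enter.

The Inspector shows ObjArrayKlass, which confirms that an array of a reference type is represented in the JVM by ObjArrayKlass.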
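The rule can be written down as a small Java helper. This is only a sketch: it applies the mapping described in this article to the array's component type and does not query the VM. The names ArrayKlassKindDemo and klassKind are made up for this example.

```java
class ArrayKlassKindDemo {
    // Returns the klass kind that, per the text above, backs the given array.
    // The mapping comes from this article; nothing is read from the VM.
    static String klassKind(Object array) {
        Class<?> type = array.getClass();
        if (!type.isArray()) {
            throw new IllegalArgumentException("not an array: " + type);
        }
        // Primitive elements (newarray) versus reference elements (anewarray).
        return type.getComponentType().isPrimitive() ? "TypeArrayKlass" : "ObjArrayKlass";
    }

    public static void main(String[] args) {
        System.out.println(klassKind(new int[1]));    // prints "TypeArrayKlass"
        System.out.println(klassKind(new String[1])); // prints "ObjArrayKlass"
        System.out.println(klassKind(new int[1][1])); // prints "ObjArrayKlass"
    }
}
```

Note that int[][] counts as a reference-type array, because its elements are themselves int[] references.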

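The class loading process

[Figure: the class loading process]

Loading

  1. Obtain the class file for the class from its fully qualified name (where it must come from is not specified)
  2. Parse it into run-time data, that is, an InstanceKlass instance, stored in the method area
  3. Create the Class object for the class on the heap, that is, an InstanceMirrorKlass instance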

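When loading happens

  1. The new, getstatic, putstatic, and invokestatic instructions
  2. Reflection
  3. Initializing a subclass actively loads its parent class first
  4. The startup class (the class containing the main method)
  5. With the dynamic language support of JDK 1.7: if a java.lang.invoke.MethodHandle instance finally resolves to a REF_getStatic, REF_putStatic, or REF_invokeStatic method handle and the class of that handle has not been initialized, its initialization is triggered first

Preloaded: the wrapper classes, String, Thread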

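Where classes are loaded from

  1. Archives such as jar and war files
  2. The network, such as Web Applets
  3. Dynamic generation, such as dynamic proxies and CGLIB
  4. Generation from other files, such as JSP
  5. A database
  6. Encrypted files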
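Dynamic generation is easy to try with the JDK's own dynamic proxies. The sketch below asks java.lang.reflect.Proxy for an implementation of an interface; the resulting class is generated at run time and has no class file on disk. The names GeneratedClassDemo and Greeter are made up for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class GeneratedClassDemo {
    // Hypothetical interface used only for this example.
    interface Greeter {
        String greet(String name);
    }

    // Asks the JDK to generate a class implementing Greeter at run time.
    static Greeter newGreeter() {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("greet")) {
                return "Hello, " + args[0];
            }
            throw new UnsupportedOperationException(method.getName());
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    // True when the object's class was generated by java.lang.reflect.Proxy.
    static boolean isGenerated(Object o) {
        return Proxy.isProxyClass(o.getClass());
    }

    public static void main(String[] args) {
        Greeter greeter = newGreeter();
        System.out.println(greeter.greet("JVM")); // prints "Hello, JVM"
        System.out.println(isGenerated(greeter)); // prints "true"
    }
}
```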

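Verification

  1. File format verification
  2. Metadata verification
  3. Bytecode verification
  4. Symbolic reference verification

Preparation

Memory is allocated for static variables and they are given their initial (zero) values.

Instance variables are assigned when an object is created, so there is no initial-value step for them in this phase.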
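The zero value set during preparation can be observed from Java. In this sketch (the class name PreparationDemo is made up for this example), a static field is read through a method before its own initializer has run, so the read sees the default value rather than the assigned one.

```java
class PreparationDemo {
    // Runs first during initialization: reads b before b's initializer has executed.
    static int seenEarly = readB();
    static int b = 10;

    static int readB() {
        return b;
    }

    public static void main(String[] args) {
        System.out.println(seenEarly); // prints "0", the value from preparation
        System.out.println(b);         // prints "10", the value assigned during initialization
    }
}
```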

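If a static variable is also declared final with a compile-time constant value, the compiler adds a ConstantValue attribute to the field, and the preparation phase assigns that value directly, so there is no separate initial-value step.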

    public class Test_3 {
    
        public static final int a = 10;
        public static int b = 10;
    
        public static void main(String[] args) {
            int[] arr = new int[1];
            
            Test_3[] test_2 = new Test_3[1];
           
        }
    }

After running the code above, look at the fields again.

The final field a has a ConstantValue attribute that the non-final field b lacks, and its constant value is 10, so 10 is assigned directly.

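Resolution

Symbolic references in the constant pool (references into the run-time constant pool) are converted into direct references (memory addresses).

The resolved information is stored in a ConstantPoolCache instance.

  1. Class or interface resolution
  2. Field resolution
  3. Method resolution
  4. Interface method resolution

When resolution happens

Possible approaches:

  1. Resolve the constant pool during the loading phase
  2. Resolve on first use

Change to the directory containing the .class file and run javap -v xxx.class in a console.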

    /Test_3.class
      Last modified 2020年12月23日; size 617 bytes
      MD5 checksum eb92dd7cf9716f003d1f643143892fd3
      Compiled from "Test_3.java"
    public class com.zzz.Test_3
      minor version: 0
      major version: 52
      flags: (0x0021) ACC_PUBLIC, ACC_SUPER
      this_class: #2                          // com/zzz/Test_3
      super_class: #4                         // java/lang/Object
      interfaces: 0, fields: 2, methods: 3, attributes: 1
    Constant pool:
       #1 = Methodref          #4.#31         // java/lang/Object."<init>":()V
       #2 = Class              #32            // com/zzz/Test_3
       #3 = Fieldref           #2.#33         // com/zzz/Test_3.b:I
       #4 = Class              #34            // java/lang/Object
       #5 = Utf8               a
       #6 = Utf8               I
       #7 = Utf8               ConstantValue
       #8 = Integer            10
       #9 = Utf8               b
      #10 = Utf8               <init>
      #11 = Utf8               ()V
      #12 = Utf8               Code
      #13 = Utf8               LineNumberTable
      #14 = Utf8               LocalVariableTable
      #15 = Utf8               this
      #16 = Utf8               Lcom/zzz/Test_3;
      #17 = Utf8               main
      #18 = Utf8               ([Ljava/lang/String;)V
      #19 = Utf8               args
      #20 = Utf8               [Ljava/lang/String;
      #21 = Utf8               arr
      #22 = Utf8               [I
      #23 = Utf8               test_2
      #24 = Utf8               [Lcom/zzz/Test_3;
      #25 = Utf8               StackMapTable
      #26 = Class              #22            // "[I"
      #27 = Class              #24            // "[Lcom/zzz/Test_3;"
      #28 = Utf8               <clinit>
      #29 = Utf8               SourceFile
      #30 = Utf8               Test_3.java
      #31 = NameAndType        #10:#11        // "<init>":()V
      #32 = Utf8               com/zzz/Test_3
      #33 = NameAndType        #9:#6          // b:I
      #34 = Utf8               java/lang/Object
    {
      public static final int a;
        descriptor: I
        flags: (0x0019) ACC_PUBLIC, ACC_STATIC, ACC_FINAL
        ConstantValue: int 10
    
      public static int b;
        descriptor: I
        flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    
      public com.zzz.Test_3();
        descriptor: ()V
        flags: (0x0001) ACC_PUBLIC
        Code:
          stack=1, locals=1, args_size=1
             0: aload_0
             1: invokespecial #1                  // Method java/lang/Object."<init>":()V
             4: return
          LineNumberTable:
            line 3: 0
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0       5     0  this   Lcom/zzz/Test_3;
    
      public static void main(java.lang.String[]);
        descriptor: ([Ljava/lang/String;)V
        flags: (0x0009) ACC_PUBLIC, ACC_STATIC
        Code:
          stack=1, locals=3, args_size=1
             0: iconst_1
             1: newarray       int
             3: astore_1
             4: iconst_1
             5: anewarray     #2                  // class com/zzz/Test_3
             8: astore_2
             9: goto          9
          LineNumberTable:
            line 9: 0
            line 11: 4
            line 12: 9
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0      12     0  args   [Ljava/lang/String;
                4       8     1   arr   [I
                9       3     2 test_2   [Lcom/zzz/Test_3;
          StackMapTable: number_of_entries = 1
            frame_type = 253 /* append */
              offset_delta = 9
              locals = [ class "[I", class "[Lcom/zzz/Test_3;" ]
    
      static {};
        descriptor: ()V
        flags: (0x0008) ACC_STATIC
        Code:
          stack=1, locals=0, args_size=0
             0: bipush        10
             2: putstatic     #3                  // Field b:I
             5: return
          LineNumberTable:
            line 6: 0
    }
    SourceFile: "Test_3.java"

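This is the static (class-file) constant pool of Test_3.class. Run the class, use jps to find the process, attach HSDB, click Tools, then Class Browser, then public class com.zzz.Test_3 @0x00000007c0060828.

Click the link under Constant Pool to view the run-time constant pool.

At index 3, the class entry already points to a memory address, whereas in the static constant pool above the class entry still points to #32.

This shows that during resolution the symbolic reference has been converted into a direct reference.

Initialization

Static blocks are executed and static variables receive their assigned values.

Static field initializers and static blocks are compiled into a <clinit> method at the bytecode level. The order of the statements in that method follows the order in which they are written in the source.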

    public class Test_3 {
    
        public static int a = 10;
        public static int b = 10;
    
        public static void main(String[] args) {
            int[] arr = new int[1];
            
            Test_3[] test_2 = new Test_3[1];
            
        }
    }

Run the code above.

jclasslib shows that the execution order matches the order in the source.
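The textual ordering can also be checked without a bytecode viewer. This is a sketch (the names ClinitOrderDemo, ORDER, and record are made up for this example) in which every static initializer records its own step as it runs.

```java
import java.util.ArrayList;
import java.util.List;

class ClinitOrderDemo {
    // Declared first, so it is ready before the other initializers run.
    static final List<String> ORDER = new ArrayList<>();

    static int first = record("field first");

    static {
        record("static block");
    }

    static int second = record("field second");

    // Appends the step name and returns how many steps have run so far.
    static int record(String step) {
        ORDER.add(step);
        return ORDER.size();
    }

    public static void main(String[] args) {
        System.out.println(ORDER); // prints "[field first, static block, field second]"
    }
}
```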

Hands-on: static initialization order

    public class Test_21 {
    
        public static void main(String[] args) {
            Test_21_A obj = Test_21_A.getInstance();
    
            System.out.println(obj.val1);
            System.out.println(obj.val2);
        }
    }
    
    class Test_21_A{
        public static int val1;
    
        public static int val2 = 1;
    
        public static Test_21_A instance = new Test_21_A();
    
        Test_21_A(){
            val1++;
            val2++;
        }
    
        public static Test_21_A getInstance(){
            return instance;
        }
    }

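Output of the code above:

After preparation val1 and val2 are both 0. Initialization then sets val2 to 1 before instance is created, and the constructor increments both, so the result is 1, 2.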

    public class Test_21 {
    
        public static void main(String[] args) {
            Test_21_A obj = Test_21_A.getInstance();
    
            System.out.println(obj.val1);
            System.out.println(obj.val2);
        }
    }
    
    class Test_21_A{
        public static int val1;
    
        public static Test_21_A instance = new Test_21_A();
    
        Test_21_A(){
            val1++;
            val2++;
        }
        public static int val2 = 1;
    
        public static Test_21_A getInstance(){
            return instance;
        }
    }

What does this version print?

The result is 1, 1. Because val2 is declared after instance, the constructor's val2++ runs first, and the later assignment val2 = 1 overwrites it.

The JVM loads classes lazily

    public class Test_2 {
    
        public static void main(String[] args) {
            System.out.println(Test_2_B.str);
        }
    }
    
    class Test_2_A{
        public static String str = "A str";
    
        static {
            System.out.println("A Static Block");
        }
    }
    
    class Test_2_B extends Test_2_A{
        static {
            System.out.println("B Static Block");
        }
    }

Look at this code and guess the output.

B's static block did not run. Why? As noted above, the JVM loads classes lazily. str is defined in Test_2_A, so Test_2_B is not actually used and its static block is not executed.

Now change it:

    public class Test_2 {
    
    
        public static void main(String[] args) {
            System.out.println(Test_2_B.str);
        }
    }
    
    class Test_2_A{
    
    
        static {
            System.out.println("A Static Block");
        }
    }
    
    class Test_2_B extends Test_2_A{
        //public static String str = "B str";
        public static String str = "A str";
    
        static {
            System.out.println("B Static Block");
        }
    }

With this change, the result is:

The result is clear: B Static Block is printed as well.

    public class Test_2 {
    
    
        public static void main(String[] args) {
            Test_2_C arrs[] = new Test_2_C[1];
        }
    }
    
    class Test_2_A{
    
    
        static {
            System.out.println("A Static Block");
        }
    }
    
    class Test_2_B extends Test_2_A{
        //public static String str = "B str";
        public static String str = "A str";
    
        static {
            System.out.println("B Static Block");
        }
    }
    
    class Test_2_C{
        static {
            System.out.println("C Static Block");
        }
    }

Result:

There is no output, because declaring an array only defines a data type and does not initialize Test_2_C.

    public class Test_2 {
    
        public static void main(String[] args) {
            System.out.println(Test_2_D.str);
        }
    }
    
    class Test_2_A{
    
    
        static {
            System.out.println("A Static Block");
        }
    }
    
    class Test_2_B extends Test_2_A{
        //public static String str = "B str";
        public static String str = "A str";
    
        static {
            System.out.println("B Static Block");
        }
    }
    
    class Test_2_C{
        static {
            System.out.println("C Static Block");
        }
    }
    
    class Test_2_D{
        public static final String str = "A str";
    
        static {
            System.out.println("D Static Block");
        }
    }

Result:

Only A str is printed. Why? Let us look with javap -v.

    public class com.zzz.Test_2
      minor version: 0
      major version: 52
      flags: (0x0021) ACC_PUBLIC, ACC_SUPER
      this_class: #6                          // com/zzz/Test_2
      super_class: #7                         // java/lang/Object
      interfaces: 0, fields: 0, methods: 2, attributes: 1
    Constant pool:
       #1 = Methodref          #7.#21         // java/lang/Object."<init>":()V
       #2 = Fieldref           #22.#23        // java/lang/System.out:Ljava/io/PrintStream;
       #3 = Class              #24            // com/zzz/Test_2_D
       #4 = String             #25            // A str
       #5 = Methodref          #26.#27        // java/io/PrintStream.println:(Ljava/lang/String;)V
       #6 = Class              #28            // com/zzz/Test_2
       #7 = Class              #29            // java/lang/Object
       #8 = Utf8               <init>
       #9 = Utf8               ()V
      #10 = Utf8               Code
      #11 = Utf8               LineNumberTable
      #12 = Utf8               LocalVariableTable
      #13 = Utf8               this
      #14 = Utf8               Lcom/zzz/Test_2;
      #15 = Utf8               main
      #16 = Utf8               ([Ljava/lang/String;)V
      #17 = Utf8               args
      #18 = Utf8               [Ljava/lang/String;
      #19 = Utf8               SourceFile
      #20 = Utf8               Test_2.java
      #21 = NameAndType        #8:#9          // "<init>":()V
      #22 = Class              #30            // java/lang/System
      #23 = NameAndType        #31:#32        // out:Ljava/io/PrintStream;
      #24 = Utf8               com/zzz/Test_2_D
      #25 = Utf8               A str
      #26 = Class              #33            // java/io/PrintStream
      #27 = NameAndType        #34:#35        // println:(Ljava/lang/String;)V
      #28 = Utf8               com/zzz/Test_2
      #29 = Utf8               java/lang/Object
      #30 = Utf8               java/lang/System
      #31 = Utf8               out
      #32 = Utf8               Ljava/io/PrintStream;
      #33 = Utf8               java/io/PrintStream
      #34 = Utf8               println
      #35 = Utf8               (Ljava/lang/String;)V
    {
      public com.zzz.Test_2();
        descriptor: ()V
        flags: (0x0001) ACC_PUBLIC
        Code:
          stack=1, locals=1, args_size=1
             0: aload_0
             1: invokespecial #1                  // Method java/lang/Object."<init>":()V
             4: return
          LineNumberTable:
            line 3: 0
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0       5     0  this   Lcom/zzz/Test_2;
    
      public static void main(java.lang.String[]);
        descriptor: ([Ljava/lang/String;)V
        flags: (0x0009) ACC_PUBLIC, ACC_STATIC
        Code:
          stack=2, locals=1, args_size=1
             0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
             3: ldc           #4                  // String A str
             5: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
             8: return
          LineNumberTable:
            line 6: 0
            line 7: 8
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0       9     0  args   [Ljava/lang/String;
    }
    SourceFile: "Test_2.java"

#4 = String #25 // A str shows that this entry refers to the constant value itself. In other words, the constant str has been written into the constant pool of Test_2.

    public class Test_2 {
    
        public static void main(String[] args) {
            System.out.println(Test_2_E.uuid);
        }
    }
    
    
    class Test_2_E{
        public static final String uuid = UUID.randomUUID().toString();
    
        static {
            System.out.println("E Static Block");
        }
    }

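Result: E Static Block is printed, followed by the generated UUID.

The difference from the previous str example is that str is a compile-time constant. uuid is also declared final, but its value has to be generated at run time, so reading it is an active use of Test_2_E.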
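Both cases can be placed side by side in one program. This is a sketch with made-up names (ConstantUseDemo, ConstHolder, RuntimeHolder, LOG): each holder class logs when its static block runs, and the log size is sampled after each read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConstantUseDemo {
    static final List<String> LOG = new ArrayList<>();

    static class ConstHolder {
        static final String NAME = "A str"; // compile-time constant, copied into the caller

        static {
            LOG.add("ConstHolder initialized");
        }
    }

    static class RuntimeHolder {
        static final String STAMP = String.valueOf(System.nanoTime()); // computed at run time

        static {
            LOG.add("RuntimeHolder initialized");
        }
    }

    // Returns the log size after reading the constant, then after reading the run-time value.
    static int[] run() {
        String name = ConstHolder.NAME;     // does not initialize ConstHolder
        int afterConstant = LOG.size();
        String stamp = RuntimeHolder.STAMP; // initializes RuntimeHolder on first use
        int afterRuntime = LOG.size();
        return new int[] { afterConstant, afterRuntime };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints "[0, 1]"
        System.out.println(LOG);                    // prints "[RuntimeHolder initialized]"
    }
}
```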

    public class Test_2 {
    
        static {
            System.out.println("Static Block");
        }
    
        public static void main(String[] args) throws ClassNotFoundException {
            Class<?> clazz = Class.forName("com.zzz.Test_2_A");
            
        }
    }

[Screenshot: console output of the Class.forName example]
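For comparison, Class.forName also has a three-argument form whose second parameter controls whether the class is initialized. The sketch below uses made-up names (ForNameDemo, Target, LOG) and counts how often the static block of Target has run after each call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ForNameDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Target {
        static {
            LOG.add("Target initialized");
        }
    }

    // Returns the log size after a load-only lookup, then after an initializing lookup.
    static int[] run() {
        try {
            String name = Target.class.getName(); // a class literal does not initialize
            ClassLoader loader = ForNameDemo.class.getClassLoader();

            Class.forName(name, false, loader);   // load without initializing
            int afterLoadOnly = LOG.size();
            Class.forName(name);                  // load and initialize
            int afterInit = LOG.size();
            return new int[] { afterLoadOnly, afterInit };
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints "[0, 1]"
    }
}
```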

How reading a static variable is implemented

    public class Test_3 {
    
        public static void main(String[] args) {
            System.out.println(Test_3_B.str);
            while (true);
        }
    }
    
    class Test_3_A{
        public static String str = "A str";
    
        static {
            System.out.println("A Static Block");
        }
    }
    
    class Test_3_B extends Test_3_A{
        static {
            System.out.println("B Static Block");
        }
    }

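Run this code, run jps in a console, attach HSDB, and find Test_3_A.

Here we can see that the pointer for str is stored in the InstanceMirrorKlass.

The value of the static variable str is stored in the StringTable (the string constant pool discussed earlier).

Check whether Test_3_B has str.

It does not.

Possible implementations:

  1. Look in the mirror class of Test_3_B first and return the value if it is there; otherwise pass the request up the inheritance chain. The cost of this approach clearly grows with the depth of the inheritance chain, so its complexity is O(n)
  2. Use an additional data structure that stores key-value pairs, giving O(1) lookups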
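The two approaches can be compared with a small reflection-based model. This is only a sketch: it is not how HotSpot is written, it ignores fields inherited from interfaces, and the names FieldLookupDemo, Base, Derived, findDeclaringClass, and cachedLookup are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class FieldLookupDemo {
    static class Base {
        static String str = "A str";
    }

    static class Derived extends Base {
    }

    // Approach 1: walk up the superclass chain; the cost grows with inheritance depth.
    static Class<?> findDeclaringClass(Class<?> start, String fieldName) {
        for (Class<?> c = start; c != null; c = c.getSuperclass()) {
            try {
                c.getDeclaredField(fieldName);
                return c;
            } catch (NoSuchFieldException e) {
                // Not declared here; continue with the superclass.
            }
        }
        return null;
    }

    // Approach 2: remember the answer in a key-value structure after the first lookup.
    static final Map<String, Class<?>> CACHE = new HashMap<>();

    static Class<?> cachedLookup(Class<?> start, String fieldName) {
        String key = start.getName() + "." + fieldName;
        return CACHE.computeIfAbsent(key, k -> findDeclaringClass(start, fieldName));
    }

    public static void main(String[] args) {
        System.out.println(findDeclaringClass(Derived.class, "str") == Base.class); // prints "true"
        System.out.println(cachedLookup(Derived.class, "str") == Base.class);       // prints "true"
    }
}
```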

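HotSpot uses the second approach, with the help of another data structure, ConstantPoolCache. The constant pool class ConstantPool has a field _cache that points to this structure, and each record in it is a ConstantPoolCacheEntry.

ConstantPoolCache mainly stores the resolved constant entries needed by certain bytecode instructions, for example the constant pool entries used by [get|put]static, [get|put]field, and invoke[static|special|virtual|interface|dynamic].

ConstantPoolCacheEntry

The constant pool cache is a run-time data structure set aside for the constant pool. It holds interpreter run-time information for all field-access and invoke bytecodes. The cache is created and initialized before the class is actively used, and each cache entry is filled in when it is resolved.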

    ConstantPoolCacheEntry* base() const           { 
      return (ConstantPoolCacheEntry*)((address)this + in_bytes(base_offset()));
    }

This expression means: the address of the ConstantPoolCache object plus the memory size of the ConstantPoolCache object itself.

How it is read

\openjdk\hotspot\src\share\vm\interpreter\bytecodeInterpreter.cpp

    CASE(_getstatic):
            {
              u2 index;
              ConstantPoolCacheEntry* cache;
              index = Bytes::get_native_u2(pc+1);
    
              // QQQ Need to make this as inlined as possible. Probably need to
              // split all the bytecode cases out so c++ compiler has a chance
              // for constant prop to fold everything possible away.
    
              cache = cp->entry_at(index);
              if (!cache->is_resolved((Bytecodes::Code)opcode)) {
                CALL_VM(InterpreterRuntime::resolve_get_put(THREAD, (Bytecodes::Code)opcode),
                        handle_exception);
                cache = cp->entry_at(index);
              }
    ……

The code shows that the ConstantPoolCacheEntry is fetched directly.
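The check-then-resolve pattern in that snippet can be modelled in a few lines of Java. This is a toy stand-in, not the HotSpot data structure: LazyEntryDemo, Entry, and resolveCount are made-up names, and the Supplier plays the role of the call into the runtime that resolves the entry.

```java
import java.util.function.Supplier;

class LazyEntryDemo {
    // Toy stand-in for a cache entry: empty until first use, then reused.
    static final class Entry<T> {
        private final Supplier<T> resolver;
        private T resolved;
        private boolean isResolved;
        int resolveCount; // how many times the resolver actually ran

        Entry(Supplier<T> resolver) {
            this.resolver = resolver;
        }

        T get() {
            if (!isResolved) { // plays the role of the is_resolved check
                resolved = resolver.get();
                isResolved = true;
                resolveCount++;
            }
            return resolved;
        }
    }

    public static void main(String[] args) {
        Entry<String> entry = new Entry<>(() -> "A str");
        entry.get();
        entry.get();
        System.out.println(entry.resolveCount); // prints "1"
    }
}
```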