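The OOP-Klass Model in the JVM
The JVM uses the OOP-Klass model to represent Java objects.
An OOP (Ordinary Object Pointer, "oops" in the plural) represents an object's instance data and lives on the heap.
A Klass describes the concrete type of an instance. At the language level it corresponds to a Java class, and it is stored in Metaspace (since JDK 8).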
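A small Java-level sketch of this split, assuming a hypothetical Point class that is not part of the article: each instance carries its own field values, while all instances share a single type description, which Java code sees as the Class object.

```java
// Hypothetical example; the class and method names are illustrative only.
class OopKlassSketch {
    static class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // True when two distinct instances report the same Class object.
    static boolean instancesShareType() {
        Point p1 = new Point(1, 2); // instance data: x=1, y=2
        Point p2 = new Point(3, 4); // different instance data
        return p1 != p2 && p1.getClass() == p2.getClass();
    }

    public static void main(String[] args) {
        System.out.println(instancesShareType());
    }
}
```

This only observes the model from the Java side; the oop and Klass structures themselves are C++ objects inside HotSpot.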
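Open the OpenJDK source.
I am using a build of JDK 12 here. A separate tutorial on building the JDK from source on macOS will follow. If you have not built it yourself, that is fine and will not get in the way of what follows.
Open the Klass class.
Klass inherits from Metadata, so let's jump to Metadata.
Metadata in turn inherits from MetaspaceObj (an object allocated in Metaspace).
Several classes derive from Klass: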
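- InstanceKlass
  - InstanceMirrorKlass: describes instances of java.lang.Class
  - InstanceRefKlass: describes subclasses of java.lang.ref.Reference
  - InstanceClassLoaderKlass: used when walking the classes loaded by a given class loader
- ArrayKlass
  - TypeArrayKlass: describes the data structure of Java arrays of primitive types
  - ObjArrayKlass: describes the data structure of Java arrays of reference types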
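The list above can be restated as a lookup over Java-level Class objects. This is only a sketch of the mapping as the list describes it: the method klassKind is a hypothetical helper, and it does not query the VM.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class KlassKindSketch {
    // Hypothetical helper: names the Klass kind that, following the list above,
    // would describe the given Java type. It does not inspect HotSpot itself.
    static String klassKind(Class<?> c) {
        if (c.isPrimitive()) {
            return "none"; // primitives such as int have no Klass of their own
        }
        if (c.isArray()) {
            return c.getComponentType().isPrimitive() ? "TypeArrayKlass" : "ObjArrayKlass";
        }
        if (c == Class.class) {
            return "InstanceMirrorKlass";
        }
        if (Reference.class.isAssignableFrom(c) && c != Reference.class) {
            return "InstanceRefKlass"; // the list names the subclasses of Reference
        }
        if (ClassLoader.class.isAssignableFrom(c)) {
            return "InstanceClassLoaderKlass";
        }
        return "InstanceKlass";
    }

    public static void main(String[] args) {
        System.out.println(klassKind(String.class));
        System.out.println(klassKind(Class.class));
        System.out.println(klassKind(WeakReference.class));
        System.out.println(klassKind(int[].class));
        System.out.println(klassKind(String[].class));
    }
}
```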
We will go through them one by one.
InstanceKlass
An InstanceKlass is the structure the VM builds when a class loader loads a class file into memory.
// An InstanceKlass is the VM level representation of a Java class.
// It contains all information needed for at class at execution runtime.
// InstanceKlass embedded field layout (after declared fields):
// [EMBEDDED Java vtable ] size in words = vtable_len
// [EMBEDDED nonstatic oop-map blocks] size in words = nonstatic_oop_map_size
// The embedded nonstatic oop-map blocks are short pairs (offset, length)
// indicating where oops are located in instances of this klass.
// [EMBEDDED implementor of the interface] only exist for interface
// [EMBEDDED unsafe_anonymous_host klass] only exist for an unsafe anonymous class (JSR 292 enabled)
// [EMBEDDED fingerprint ] only if should_store_fingerprint()==true
// forward declaration for class -- see below for definition
struct JvmtiCachedClassFileData;
class InstanceKlass: public Klass {
friend class VMStructs;
friend class JVMCIVMStructs;
friend class ClassFileParser;
friend class CompileReplay;
public:
static const KlassID ID = InstanceKlassID;
protected:
InstanceKlass(const ClassFileParser& parser, unsigned kind, KlassID id = ID);
public:
InstanceKlass() { assert(DumpSharedSpaces || UseSharedSpaces, "only for CDS"); }
// See "The Java Virtual Machine Specification" section 2.16.2-5 for a detailed description
// of the class loading & initialization procedure, and the use of the states.
enum ClassState {
allocated, // allocated (but not yet linked)
loaded, // loaded and inserted in class hierarchy (but not linked yet)
linked, // successfully linked/verified (but not initialized yet)
being_initialized, // currently running class initializer
fully_initialized, // initialized (successfull final state)
initialization_error // error happened during initialization
};
private:
static InstanceKlass* allocate_instance_klass(const ClassFileParser& parser, TRAPS);
...
Hands-on
This part uses a tool called HSDB (the HotSpot Debugger).
public class Test_1 {
public static void main(String[] args) {
while (true);
}
}
class Test_1_A{
public static String str = "A str";
static {
System.out.println("A Static Block");
}
}
class Test_1_B{
public static String str = "B str";
static {
System.out.println("B Static Block");
}
}
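Run this code, find its process id with jps, and attach HSDB to that id.
Click Tools and choose the first entry, Class Browser.
It lists all the classes known to the JVM.
Find the Test_1 class.
The 0x00000007c0060828 shown after it is its memory address.
Click Tools, then Inspector, and paste in the address you copied.
As you can see, the Java class Test_1 corresponds to an InstanceKlass.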
InstanceMirrorKlass
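Put simply, this is the Klass behind Class objects: it describes the java.lang.Class instances (the mirrors), which live on the heap.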
ArrayKlass
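Like InstanceKlass, but it stores the metadata of array classes.
Static and dynamic data types
- Dynamic data types are generated dynamically at runtime
- Static data types are the eight primitive types built into the JVM
In Java, arrays are dynamic data types.
Proof:
We again use jclasslib, the IDEA plugin from the earlier articles.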
public class Test_1 {
public static void main(String[] args) {
int[] arr = new int[1];
//while (true);
}
}
class Test_1_A{
public static String str = "A str";
static {
System.out.println("A Static Block");
}
}
class Test_1_B{
public static String str = "B str";
static {
System.out.println("B Static Block");
}
}
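Compile this code and look at the generated bytecode.
Step 1: iconst_1 pushes the int constant 1 onto the operand stack.
Step 2: newarray 10. This instruction did not appear in the previous article, so let's look at what newarray means (the operand 10 is the type code for int).
It creates an array of the given primitive type (int, float, char, ...) and pushes the reference to it onto the stack.
Inside the JVM, an array of a primitive type is represented by a TypeArrayKlass.
So what about an array of a reference type?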
public class Test_2 {
public static void main(String[] args) {
int[] arr = new int[1];
//while (true);
Test_2[] test_2 = new Test_2[1];
}
}
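As you can see, a reference-type array is created with anewarray. Let's see what anewarray means.
It creates an array of a reference type (class, interface, or array) and pushes the reference to it onto the stack.
Inside the JVM, an array of a reference type is represented by an ObjArrayKlass.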
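Run the code above (with the while (true) loop enabled so that the process stays alive), find the process id with jps, and attach HSDB.
Click the main thread, then click the button to the right of the magnifier to view the thread's stack.
Here we can see 0x000000076ada7858. Click Tools, then Inspector, enter 0x000000076ada7858, and press Enter.
The Inspector shows a TypeArrayKlass, which confirms that an array of a primitive type is represented in the JVM by a TypeArrayKlass.
Enter 0x000000076ada78d8 and press Enter.
This time it shows an ObjArrayKlass, which confirms that an array of a reference type is represented in the JVM by an ObjArrayKlass.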
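The same point can be seen from plain Java without HSDB. The sketch below (class and method names are illustrative) shows that array types have VM-generated names rather than names taken from a source file, and that all arrays with the same element type share one array class regardless of length.

```java
class ArrayClassSketch {
    // Name of the runtime class of an int array.
    static String primitiveArrayName() {
        return new int[1].getClass().getName();
    }

    // Name of the runtime class of a String array.
    static String referenceArrayName() {
        return new String[1].getClass().getName();
    }

    // Arrays of different lengths still share the same array class.
    static boolean lengthDoesNotChangeType() {
        return new int[1].getClass() == new int[5].getClass();
    }

    public static void main(String[] args) {
        System.out.println(primitiveArrayName());      // prints "[I"
        System.out.println(referenceArrayName());      // prints "[Ljava.lang.String;"
        System.out.println(lengthDoesNotChangeType()); // prints "true"
    }
}
```

The names "[I" and "[Ljava.lang.String;" match the descriptors that appear in the constant pool listings later in this article.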
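Class loading process
Loading
- Locate the class file for the class by its fully qualified name (nothing says where it has to come from)
- Parse it into runtime data, that is, an InstanceKlass instance, stored in the method area
- Create the Class object for the class on the heap, that is, the mirror described by InstanceMirrorKlass
When a class is loaded
- The new, getstatic, putstatic, and invokestatic instructions
- Reflection
- Initializing a subclass actively loads its parent class
- The startup class (the class containing main)
- With the dynamic language support added in JDK 7: if a java.lang.invoke.MethodHandle instance resolves to a REF_getStatic, REF_putStatic, or REF_invokeStatic method handle and the class that handle refers to has not been initialized, its initialization is triggered first
Preloaded: the wrapper classes, String, Thread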
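Where classes are loaded from
- Archives such as JAR and WAR files
- The network, for example applets
- Generated at runtime, for example dynamic proxies and CGLIB
- Generated from other files, for example JSP
- A database
- Encrypted files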
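As an illustration of the "generated at runtime" source in the list above, here is a minimal sketch using the JDK's java.lang.reflect.Proxy. The Greeter interface and the handler's behavior are hypothetical and exist only for this example. No class file for the proxy class exists on disk; the JDK generates it when newProxyInstance is called.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class DynamicProxySketch {
    // Hypothetical interface used only for this illustration.
    interface Greeter {
        String greet(String name);
    }

    // Builds a Greeter whose implementing class is generated at runtime.
    static Greeter newGreeter() {
        // The handler answers every call on the proxy; only greet(...) is used here.
        InvocationHandler handler = (proxy, method, args) -> "hello, " + args[0];
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        Greeter g = newGreeter();
        System.out.println(g.greet("jvm"));                   // prints "hello, jvm"
        System.out.println(Proxy.isProxyClass(g.getClass())); // prints "true"
    }
}
```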
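Verification
- File format verification
- Metadata verification
- Bytecode verification
- Symbolic reference verification
Preparation
Memory is allocated for static variables and they are given their initial (zero) values.
Instance variables are assigned when an object is created, so there is no initial-value step for them at this stage.
If a static field is final and has a compile-time constant value, the compiler adds a ConstantValue attribute to the field and the preparation stage assigns that value directly, with no separate zero-value step.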
public class Test_3 {
public static final int a = 10;
public static int b = 10;
public static void main(String[] args) {
int[] arr = new int[1];
Test_3[] test_2 = new Test_3[1];
}
}
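Compile the code above and look at its fields.
The final field a has a ConstantValue attribute that the non-final field b lacks, and the constant value it refers to is 10, so a is assigned 10 directly.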
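The difference between the two kinds of static field can also be observed from Java. In this sketch (all names are illustrative), a method reads each field before its initializer has run: the ordinary static field still holds the zero value from the preparation stage, while the constant field already reads as 10.

```java
class PrepareSketch {
    // These two initializers run first, in source order.
    static int seenB = readB(); // b has not been assigned yet, so this sees 0
    static int seenA = readA(); // A is a compile-time constant, so this sees 10

    static int b = 10;          // assigned later, inside <clinit>
    static final int A = 10;    // carries a ConstantValue attribute

    static int readB() {
        return b;
    }

    static int readA() {
        return A;
    }

    public static void main(String[] args) {
        System.out.println(seenB); // prints 0
        System.out.println(seenA); // prints 10
        System.out.println(b);     // prints 10
    }
}
```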
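Resolution
Symbolic references in the constant pool (references to entries in the runtime constant pool) are converted to direct references (memory addresses).
The resolved information is stored in a ConstantPoolCache instance.
- Class or interface resolution
- Field resolution
- Method resolution
- Interface method resolution
When resolution happens
Two possible approaches:
- Resolve the constant pool during the loading stage
- Resolve an entry when it is first used
Go to the directory containing the .class file and run javap -v xxx.class in a terminal.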
/Test_3.class
Last modified Dec 23, 2020; size 617 bytes
MD5 checksum eb92dd7cf9716f003d1f643143892fd3
Compiled from "Test_3.java"
public class com.zzz.Test_3
minor version: 0
major version: 52
flags: (0x0021) ACC_PUBLIC, ACC_SUPER
this_class: #2 // com/zzz/Test_3
super_class: #4 // java/lang/Object
interfaces: 0, fields: 2, methods: 3, attributes: 1
Constant pool:
#1 = Methodref #4.#31 // java/lang/Object."<init>":()V
#2 = Class #32 // com/zzz/Test_3
#3 = Fieldref #2.#33 // com/zzz/Test_3.b:I
#4 = Class #34 // java/lang/Object
#5 = Utf8 a
#6 = Utf8 I
#7 = Utf8 ConstantValue
#8 = Integer 10
#9 = Utf8 b
#10 = Utf8 <init>
#11 = Utf8 ()V
#12 = Utf8 Code
#13 = Utf8 LineNumberTable
#14 = Utf8 LocalVariableTable
#15 = Utf8 this
#16 = Utf8 Lcom/zzz/Test_3;
#17 = Utf8 main
#18 = Utf8 ([Ljava/lang/String;)V
#19 = Utf8 args
#20 = Utf8 [Ljava/lang/String;
#21 = Utf8 arr
#22 = Utf8 [I
#23 = Utf8 test_2
#24 = Utf8 [Lcom/zzz/Test_3;
#25 = Utf8 StackMapTable
#26 = Class #22 // "[I"
#27 = Class #24 // "[Lcom/zzz/Test_3;"
#28 = Utf8 <clinit>
#29 = Utf8 SourceFile
#30 = Utf8 Test_3.java
#31 = NameAndType #10:#11 // "<init>":()V
#32 = Utf8 com/zzz/Test_3
#33 = NameAndType #9:#6 // b:I
#34 = Utf8 java/lang/Object
{
public static final int a;
descriptor: I
flags: (0x0019) ACC_PUBLIC, ACC_STATIC, ACC_FINAL
ConstantValue: int 10
public static int b;
descriptor: I
flags: (0x0009) ACC_PUBLIC, ACC_STATIC
public com.zzz.Test_3();
descriptor: ()V
flags: (0x0001) ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 3: 0
LocalVariableTable:
Start Length Slot Name Signature
0 5 0 this Lcom/zzz/Test_3;
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: (0x0009) ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=3, args_size=1
0: iconst_1
1: newarray int
3: astore_1
4: iconst_1
5: anewarray #2 // class com/zzz/Test_3
8: astore_2
9: goto 9
LineNumberTable:
line 9: 0
line 11: 4
line 12: 9
LocalVariableTable:
Start Length Slot Name Signature
0 12 0 args [Ljava/lang/String;
4 8 1 arr [I
9 3 2 test_2 [Lcom/zzz/Test_3;
StackMapTable: number_of_entries = 1
frame_type = 253 /* append */
offset_delta = 9
locals = [ class "[I", class "[Lcom/zzz/Test_3;" ]
static {};
descriptor: ()V
flags: (0x0008) ACC_STATIC
Code:
stack=1, locals=0, args_size=0
0: bipush 10
2: putstatic #3 // Field b:I
5: return
LineNumberTable:
line 6: 0
}
SourceFile: "Test_3.java"
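The above is the static constant pool of Test_3.class. Run the class, find the process with jps, attach HSDB, click Tools, click Class Browser, and then click public class com.zzz.Test_3 @0x00000007c0060828.
Click the Constant Pool link below it to view the runtime constant pool.
At index 3 the class entry already points to a memory address, whereas in the static constant pool above the class entry still points to #32.
This shows that the resolution stage has turned the symbolic reference into a direct reference.
Initialization
Static initializer blocks are executed and static variables receive their assigned values.
Static field initializers and static blocks are compiled into a <clinit> method at the bytecode level, and the statements in it appear in the same order as in the source.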
public class Test_3 {
public static int a = 10;
public static int b = 10;
public static void main(String[] args) {
int[] arr = new int[1];
Test_3[] test_2 = new Test_3[1];
}
}
Compile the code above.
jclasslib shows that the statements in <clinit> run in the order in which they were written.
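The same ordering can be checked without a bytecode viewer. In this sketch (the class and the record helper are illustrative), each static initializer and the static block append a marker to a log as they run, so the log reflects the order of the statements in <clinit>.

```java
class ClinitOrderSketch {
    // Declared first so that it exists before anything is logged.
    static final StringBuilder ORDER = new StringBuilder();

    static int a = record("a");

    static {
        ORDER.append("block;");
    }

    static int b = record("b");

    // Appends the field name to the log and returns the value to assign.
    static int record(String fieldName) {
        ORDER.append(fieldName).append(';');
        return 10;
    }

    public static void main(String[] args) {
        System.out.println(ORDER); // prints "a;block;b;"
    }
}
```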
Hands-on: static initialization order
public class Test_21 {
public static void main(String[] args) {
Test_21_A obj = Test_21_A.getInstance();
System.out.println(obj.val1);
System.out.println(obj.val2);
}
}
class Test_21_A{
public static int val1;
public static int val2 = 1;
public static Test_21_A instance = new Test_21_A();
Test_21_A(){
val1++;
val2++;
}
public static Test_21_A getInstance(){
return instance;
}
}
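Output of the code above:
val1 starts at its default value 0 and val2 is assigned 1 before the constructor runs, so after the increments the output is 1 and 2.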
public class Test_21 {
public static void main(String[] args) {
Test_21_A obj = Test_21_A.getInstance();
System.out.println(obj.val1);
System.out.println(obj.val2);
}
}
class Test_21_A{
public static int val1;
public static Test_21_A instance = new Test_21_A();
Test_21_A(){
val1++;
val2++;
}
public static int val2 = 1;
public static Test_21_A getInstance(){
return instance;
}
}
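What does this version print?
It prints 1 and 1. The val2 = 1 initializer now comes after the instance is created, so although the constructor did execute val2++, the later assignment overwrites that value.
The JVM loads classes lazily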
public class Test_2 {
public static void main(String[] args) {
System.out.println(Test_2_B.str);
}
}
class Test_2_A{
public static String str = "A str";
static {
System.out.println("A Static Block");
}
}
class Test_2_B extends Test_2_A{
static {
System.out.println("B Static Block");
}
}
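Look at this code and guess what it prints.
It prints "A Static Block" followed by "A str", but B's static block does not run. Why? As said above, the JVM loads classes lazily. str is defined in Test_2_A, so reading it through Test_2_B is a use of Test_2_A only, and Test_2_B's static block is not executed.
Let's change it.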
public class Test_2 {
public static void main(String[] args) {
System.out.println(Test_2_B.str);
}
}
class Test_2_A{
static {
System.out.println("A Static Block");
}
}
class Test_2_B extends Test_2_A{
//public static String str = "B str";
public static String str = "A str";
static {
System.out.println("B Static Block");
}
}
With this change, the result is:
The outcome is clear: because str is now declared in Test_2_B, "B Static Block" is printed as well.
public class Test_2 {
public static void main(String[] args) {
Test_2_C arrs[] = new Test_2_C[1];
}
}
class Test_2_A{
static {
System.out.println("A Static Block");
}
}
class Test_2_B extends Test_2_A{
//public static String str = "B str";
public static String str = "A str";
static {
System.out.println("B Static Block");
}
}
class Test_2_C{
static {
System.out.println("C Static Block");
}
}
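Result:
Nothing is printed. Creating an array of Test_2_C only defines the array type. It is not an active use of Test_2_C, so Test_2_C is not initialized.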
public class Test_2 {
public static void main(String[] args) {
System.out.println(Test_2_D.str);
}
}
class Test_2_A{
static {
System.out.println("A Static Block");
}
}
class Test_2_B extends Test_2_A{
//public static String str = "B str";
public static String str = "A str";
static {
System.out.println("B Static Block");
}
}
class Test_2_C{
static {
System.out.println("C Static Block");
}
}
class Test_2_D{
public static final String str = "A str";
static {
System.out.println("D Static Block");
}
}
Result:
Only "A str" is printed. What is going on? Let's look at the output of javap -v.
public class com.zzz.Test_2
minor version: 0
major version: 52
flags: (0x0021) ACC_PUBLIC, ACC_SUPER
this_class: #6 // com/zzz/Test_2
super_class: #7 // java/lang/Object
interfaces: 0, fields: 0, methods: 2, attributes: 1
Constant pool:
#1 = Methodref #7.#21 // java/lang/Object."<init>":()V
#2 = Fieldref #22.#23 // java/lang/System.out:Ljava/io/PrintStream;
#3 = Class #24 // com/zzz/Test_2_D
#4 = String #25 // A str
#5 = Methodref #26.#27 // java/io/PrintStream.println:(Ljava/lang/String;)V
#6 = Class #28 // com/zzz/Test_2
#7 = Class #29 // java/lang/Object
#8 = Utf8 <init>
#9 = Utf8 ()V
#10 = Utf8 Code
#11 = Utf8 LineNumberTable
#12 = Utf8 LocalVariableTable
#13 = Utf8 this
#14 = Utf8 Lcom/zzz/Test_2;
#15 = Utf8 main
#16 = Utf8 ([Ljava/lang/String;)V
#17 = Utf8 args
#18 = Utf8 [Ljava/lang/String;
#19 = Utf8 SourceFile
#20 = Utf8 Test_2.java
#21 = NameAndType #8:#9 // "<init>":()V
#22 = Class #30 // java/lang/System
#23 = NameAndType #31:#32 // out:Ljava/io/PrintStream;
#24 = Utf8 com/zzz/Test_2_D
#25 = Utf8 A str
#26 = Class #33 // java/io/PrintStream
#27 = NameAndType #34:#35 // println:(Ljava/lang/String;)V
#28 = Utf8 com/zzz/Test_2
#29 = Utf8 java/lang/Object
#30 = Utf8 java/lang/System
#31 = Utf8 out
#32 = Utf8 Ljava/io/PrintStream;
#33 = Utf8 java/io/PrintStream
#34 = Utf8 println
#35 = Utf8 (Ljava/lang/String;)V
{
public com.zzz.Test_2();
descriptor: ()V
flags: (0x0001) ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 3: 0
LocalVariableTable:
Start Length Slot Name Signature
0 5 0 this Lcom/zzz/Test_2;
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: (0x0009) ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #4 // String A str
5: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
LineNumberTable:
line 6: 0
line 7: 8
LocalVariableTable:
Start Length Slot Name Signature
0 9 0 args [Ljava/lang/String;
}
SourceFile: "Test_2.java"
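#4 = String #25 // A str shows that main refers to a constant value: the constant str has been copied into Test_2's own constant pool, so the code never reads the field from Test_2_D.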
import java.util.UUID;
public class Test_2 {
public static void main(String[] args) {
System.out.println(Test_2_E.uuid);
}
}
class Test_2_E{
public static final String uuid = UUID.randomUUID().toString();
static {
System.out.println("E Static Block");
}
}
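Result:
"E Static Block" is printed before the UUID. The difference from the previous str example is that str was a compile-time constant. uuid is also final, but its value has to be generated at runtime, so reading it is an active use of Test_2_E.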
public class Test_2 {
static {
System.out.println("Static Block");
}
public static void main(String[] args) throws ClassNotFoundException {
Class<?> clazz = Class.forName("com.zzz.Test_2_A");
}
}
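Class.forName(String) initializes the class it loads, which is why reflection appears among the triggers listed earlier. The three-argument overload can load a class without initializing it. The sketch below contrasts the two. The nested class Lazy and the observe helper are hypothetical names for this illustration.

```java
class ForNameSketch {
    static boolean lazyInitialized = false;
    private static boolean[] cached;

    static class Lazy {
        static {
            lazyInitialized = true; // runs only when Lazy is initialized
        }
    }

    // Returns {flag after load-only forName, flag after initializing forName}.
    // The observation is made once and cached, because initialization happens only once.
    static synchronized boolean[] observe() {
        if (cached == null) {
            try {
                // A class literal does not trigger initialization.
                String name = Lazy.class.getName();
                ClassLoader loader = ForNameSketch.class.getClassLoader();

                Class.forName(name, false, loader); // load without initializing
                boolean afterLoadOnly = lazyInitialized;

                Class.forName(name); // load and initialize
                boolean afterInit = lazyInitialized;

                cached = new boolean[] { afterLoadOnly, afterInit };
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
        return cached;
    }

    public static void main(String[] args) {
        boolean[] r = observe();
        System.out.println(r[0]); // prints "false"
        System.out.println(r[1]); // prints "true"
    }
}
```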
How reading a static variable is implemented
public class Test_3 {
public static void main(String[] args) {
System.out.println(Test_3_B.str);
while (true);
}
}
class Test_3_A{
public static String str = "A str";
static {
System.out.println("A Static Block");
}
}
class Test_3_B extends Test_3_A{
static {
System.out.println("B Static Block");
}
}
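Run this code, find the process with jps, and use HSDB to locate Test_3_A.
Here we can see that the reference held by str is stored in the class's mirror (the java.lang.Class instance described by InstanceMirrorKlass).
The value of the static variable str, the string itself, is kept in the StringTable (the string constant pool covered earlier).
Now check whether Test_3_B has a str.
It does not.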
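Possible implementations:
- Look in Test_3_B's mirror first and return the value if it is there; otherwise pass the request up the inheritance chain. The cost of this approach clearly grows with the depth of the inheritance chain: the complexity is O(n)
- Use an additional key-value data structure, which gives O(1) lookups
HotSpot uses the second approach. The additional data structure is ConstantPoolCache, and the constant pool class ConstantPool has a field _cache that points to it. Each entry in it is a ConstantPoolCacheEntry.
ConstantPoolCache mainly stores the resolved constant pool entries that certain bytecode instructions need, for example the entries used by [get|put]static, [get|put]field, and invoke[static|special|virtual|interface|dynamic].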
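The two approaches can be contrasted with a toy model. This is not HotSpot code: TypeInfo, findDeclaringType, and getStatic are hypothetical names, and a HashMap stands in for the cache, whereas HotSpot's ConstantPoolCache is a block of entries addressed by an index taken from the bytecode. The sketch only shows the idea of resolving once by walking the chain and then reusing the stored result.

```java
import java.util.HashMap;
import java.util.Map;

class StaticLookupSketch {
    // Hypothetical stand-in for a class: a name, a superclass link, and its own static fields.
    static final class TypeInfo {
        final String name;
        final TypeInfo parent;
        final Map<String, Object> statics = new HashMap<>();

        TypeInfo(String name, TypeInfo parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    static int chainWalks = 0; // counts how often the slow path runs

    // Approach 1: walk up the inheritance chain, O(depth) per read.
    static TypeInfo findDeclaringType(TypeInfo start, String field) {
        chainWalks++;
        for (TypeInfo t = start; t != null; t = t.parent) {
            if (t.statics.containsKey(field)) {
                return t;
            }
        }
        throw new IllegalArgumentException("no such field: " + field);
    }

    // Approach 2: remember the resolved owner so later reads are a single lookup.
    static final Map<String, TypeInfo> resolved = new HashMap<>();

    static Object getStatic(TypeInfo start, String field) {
        TypeInfo owner = resolved.computeIfAbsent(
                start.name + "." + field, key -> findDeclaringType(start, field));
        return owner.statics.get(field);
    }

    public static void main(String[] args) {
        TypeInfo a = new TypeInfo("Test_3_A", null);
        a.statics.put("str", "A str");
        TypeInfo b = new TypeInfo("Test_3_B", a);

        System.out.println(getStatic(b, "str")); // first read walks the chain
        System.out.println(getStatic(b, "str")); // second read hits the cache
        System.out.println(chainWalks);          // prints 1
    }
}
```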
ConstantPoolCacheEntry
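The constant pool cache is a runtime data structure set aside for a constant pool. It holds the interpreter's runtime information for all field access and invoke bytecodes. The cache is created and initialized before the class is actively used, and each cache entry is filled in when it is resolved.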
ConstantPoolCacheEntry* base() const {
return (ConstantPoolCacheEntry*)((address)this + in_bytes(base_offset()));
}
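This expression takes the address of the ConstantPoolCache object and adds the size of the ConstantPoolCache object itself, which means the entries are laid out directly after the ConstantPoolCache in memory.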
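The address arithmetic can be written out as a small worked example. The sizes below are made-up numbers chosen for illustration, not HotSpot's real values, and entryAddress is a hypothetical helper.

```java
class CacheLayoutSketch {
    // Address of entry number 'index', assuming the entries start right after a header
    // of headerBytes and each entry occupies entryBytes. All sizes are hypothetical.
    static long entryAddress(long cacheAddress, long headerBytes, long entryBytes, int index) {
        long base = cacheAddress + headerBytes; // corresponds to base() in the snippet above
        return base + (long) index * entryBytes;
    }

    public static void main(String[] args) {
        long cache = 0x1000L; // pretend address of the cache object
        // With a 16-byte header and 32-byte entries:
        System.out.println(Long.toHexString(entryAddress(cache, 16, 32, 0))); // prints "1010"
        System.out.println(Long.toHexString(entryAddress(cache, 16, 32, 2))); // prints "1050"
    }
}
```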
How it is read
\openjdk\hotspot\src\share\vm\interpreter\bytecodeInterpreter.cpp
CASE(_getstatic):
{
u2 index;
ConstantPoolCacheEntry* cache;
index = Bytes::get_native_u2(pc+1);
// QQQ Need to make this as inlined as possible. Probably need to
// split all the bytecode cases out so c++ compiler has a chance
// for constant prop to fold everything possible away.
cache = cp->entry_at(index);
if (!cache->is_resolved((Bytecodes::Code)opcode)) {
CALL_VM(InterpreterRuntime::resolve_get_put(THREAD, (Bytecodes::Code)opcode),
handle_exception);
cache = cp->entry_at(index);
}
……
As the code shows, the interpreter fetches the ConstantPoolCacheEntry directly, resolving it first if it has not been resolved yet.