An analysis of Golang's reflection implementation: querying type names

Category: C · Published: 6 years ago


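To keep the compiler from optimizing the code, every example in this article is compiled with the following command (-N disables optimizations, -l disables inlining):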

go build -gcflags "-N -l" [xxxxxx].go

Querying type names

Basic types

package main

import (
	"fmt"
	"reflect"
)

func main() {
	t := reflect.TypeOf(1)
	s := t.Name()
	fmt.Println(s)
}

This program prints the type of 1, which is int.

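The entry point of the main function is the symbol main.main. We set a breakpoint there in gdb and disassemble it. Skipping part of the function prologue, we see: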

   0x0000000000487c6f <+31>:    mov    %rbp,0xa0(%rsp)
   0x0000000000487c77 <+39>:    lea    0xa0(%rsp),%rbp
   0x0000000000487c7f <+47>:    lea    0xfb5a(%rip),%rax        # 0x4977e0
   0x0000000000487c86 <+54>:    mov    %rax,(%rsp)
   0x0000000000487c8a <+58>:    lea    0x40097(%rip),%rax        # 0x4c7d28 <main.statictmp_0>
   0x0000000000487c91 <+65>:    mov    %rax,0x8(%rsp)
   0x0000000000487c96 <+70>:    callq  0x46f210 <reflect.TypeOf>

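Lines 3~4 store the address 0x4977e0 at the top of the stack, and lines 5~6 then store 0x4c7d28 right after it; these are the arguments for the upcoming call. Unlike typical 32-bit code, the Go compiler (with the stack-based calling convention of this version) does not use push here: the frame is reserved in the prologue, and arguments are written with mov into the stack slots addressed through the rsp register.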

Line 7 calls reflect.TypeOf. In the Go source, this function and its helper are defined in src/reflect/type.go:

// TypeOf returns the reflection Type that represents the dynamic type of i.
// If i is a nil interface value, TypeOf returns nil.
func TypeOf(i interface{}) Type {
	eface := *(*emptyInterface)(unsafe.Pointer(&i))
	return toType(eface.typ)
}

// toType converts from a *rtype to a Type that can be returned
// to the client of package reflect. In gc, the only concern is that
// a nil *rtype must be replaced by a nil Type, but in gccgo this
// function takes care of ensuring that multiple *rtype for the same
// type are coalesced into a single Type.
func toType(t *rtype) Type {
	if t == nil {
		return nil
	}
	return t
}

reflect.emptyInterface is a struct that holds the type information and a raw pointer to the data. It is defined in src/reflect/value.go:

// emptyInterface is the header for an interface{} value.
type emptyInterface struct {
	typ  *rtype
	word unsafe.Pointer
}

The two addresses stored earlier, 0x4977e0 and 0x4c7d28, correspond to the typ and word fields respectively:

(gdb) x/16xb $rsp
0xc42003fed0:   0xe0    0x77    0x49    0x00    0x00    0x00    0x00    0x00
0xc42003fed8:   0x28    0x7d    0x4c    0x00    0x00    0x00    0x00    0x00

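Together they form an emptyInterface in memory. Let's inspect the memory they point to. The value 0x01 stored at 0x4c7d28 is exactly the value we passed to reflect.TypeOf, and the first 8 bytes at 0x4977e0, 0x08, are the rtype's leading size field: an int occupies 8 bytes.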

0x4977e0:       0x08    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x4c7d28 <main.statictmp_0>:    0x01    0x00    0x00    0x00    0x00    0x00    0x00    0x00

reflect.rtype is defined in src/reflect/type.go (abridged):

// rtype is the common implementation of most values.
// It is embedded in other struct types.
//
// rtype must be kept in sync with ../runtime/type.go:/^type._type.
type rtype struct {
	size       uintptr
	……
	str        nameOff  // string form
	ptrToThis  typeOff  // type for pointer to this type, may be zero
}

Note that reflect.toType implicitly converts the *reflect.rtype into a reflect.Type, even though reflect.Type looks nothing like rtype: it is an interface.

type Type interface {
	Align() int
	FieldAlign() int
	Method(int) Method
	……
}

Reading the Go source alone seems to lead to a dead end here, so let's return to the assembly and look at how reflect.TypeOf is implemented:

   0x000000000046f210 <+0>:     mov    0x8(%rsp),%rax
   0x000000000046f215 <+5>:     test   %rax,%rax
   0x000000000046f218 <+8>:     je     0x46f22c <reflect.TypeOf+28>
   0x000000000046f21a <+10>:    lea    0xaddbf(%rip),%rcx        # 0x51cfe0 <go.itab.*reflect.rtype,reflect.Type>
   0x000000000046f221 <+17>:    mov    %rcx,0x18(%rsp)
   0x000000000046f226 <+22>:    mov    %rax,0x20(%rsp)
   0x000000000046f22b <+27>:    retq   
   0x000000000046f22c <+28>:    xor    %eax,%eax
   0x000000000046f22e <+30>:    mov    %rax,%rcx
   0x000000000046f231 <+33>:    jmp    0x46f221 <reflect.TypeOf+17>

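As described above, an emptyInterface was built on the stack before reflect.TypeOf was called. Because this function only cares about the type and not the value, it reads only the typ field. Inside the callee that field sits at rsp+0x08, because the call instruction pushed an 8-byte return address on top of the arguments. If typ is nil, the code jumps to +28 and returns two zero words, that is, a nil Type.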

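Interestingly, the function also loads the address 0x51cfe0 (gdb labels it go.itab.*reflect.rtype,reflect.Type) and returns it together with the rtype pointer in its two result slots, 0x18(%rsp) and 0x20(%rsp). We do not yet know what this address is for; we will come back to it shortly.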

   0x0000000000487c9b <+75>:    mov    0x10(%rsp),%rax
   0x0000000000487ca0 <+80>:    mov    0x18(%rsp),%rcx
   0x0000000000487ca5 <+85>:    mov    %rax,0x38(%rsp)
   0x0000000000487caa <+90>:    mov    %rcx,0x40(%rsp)
   0x0000000000487caf <+95>:    mov    0xc0(%rax),%rax
   0x0000000000487cb6 <+102>:   mov    %rcx,(%rsp)
   0x0000000000487cba <+106>:   callq  *%rax

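After reflect.TypeOf returns, rax holds 0x51cfe0 and rcx holds the rtype pointer. Line 5 loads the value stored at offset 0xC0 from that address, line 6 stores the rtype pointer as the first argument, and line 7 calls the function the loaded value points to.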

(gdb) x/64bx 0x51cfe0+0xc0 
0x51d0a0 <go.itab.*reflect.rtype,reflect.Type+192>:     0x80    0xcc    0x46    0x00    0x00    0x00    0x00    0x00
0x51d0a8 <go.itab.*reflect.rtype,reflect.Type+200>:     0xf0    0xd6    0x46    0x00    0x00    0x00    0x00    0x00
0x51d0b0 <go.itab.*reflect.rtype,reflect.Type+208>:     0x60    0xd7    0x46    0x00    0x00    0x00    0x00    0x00
0x51d0b8 <go.itab.*reflect.rtype,reflect.Type+216>:     0xe0    0xbe    0x46    0x00    0x00    0x00    0x00    0x00
0x51d0c0 <go.itab.*reflect.rtype,reflect.Type+224>:     0xd0    0xd7    0x46    0x00    0x00    0x00    0x00    0x00
0x51d0c8 <go.itab.*reflect.rtype,reflect.Type+232>:     0x80    0xd8    0x46    0x00    0x00    0x00    0x00    0x00
0x51d0d0 <go.itab.*reflect.rtype,reflect.Type+240>:     0x90    0xcb    0x46    0x00    0x00    0x00    0x00    0x00
0x51d0d8 <go.itab.*reflect.rtype,reflect.Type+248>:     0x60    0xb9    0x46    0x00    0x00    0x00    0x00    0x00

Disassembling the function at 0x46cc80 shows that it is reflect.(*rtype).Name():

(gdb) disassemble 0x46cc80
Dump of assembler code for function reflect.(*rtype).Name:

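The other values around 0x51d0a0 follow the same regular pattern: they are all addresses of reflect.(*rtype) methods. So 0x51cfe0 is the itab that pairs *reflect.rtype with the reflect.Type interface, as gdb's symbol name go.itab.*reflect.rtype,reflect.Type already suggests, and its tail is a table of method addresses.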

(gdb) disassemble 0x46b960
Dump of assembler code for function reflect.(*rtype).Size:

(gdb) disassemble 0x46cb90
Dump of assembler code for function reflect.(*rtype).PkgPath:

These are exactly the methods exposed by the reflect.Type interface. When we call a method on a Type, what actually runs is the rtype method of the same name:

type Type interface {
	Align() int
	FieldAlign() int
	……
	Name() string
	PkgPath() string
	Size() uintptr
	……
}

reflect.(*rtype).Name() and the helpers it relies on are implemented as follows (abridged):

func (t *rtype) Name() string {
	if t.tflag&tflagNamed == 0 {
		return ""
	}
	s := t.String()
	……
	return s[i+1:]
}

func (t *rtype) String() string {
	s := t.nameOff(t.str).name()
	if t.tflag&tflagExtraStar != 0 {
		return s[1:]
	}
	return s
}

type name struct {
	bytes *byte
}

func (t *rtype) nameOff(off nameOff) name {
	return name{(*byte)(resolveNameOff(unsafe.Pointer(t), int32(off)))}
}
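The tflagNamed check explains why Name() returns an empty string for unnamed types such as []int or *int, while String() still describes them. The sketch below compares the two using only the public reflect API; the describe helper is hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

type t20190107 struct {
	v string
}

// describe returns Name() and String() for the dynamic type of v.
func describe(v interface{}) (name, str string) {
	t := reflect.TypeOf(v)
	return t.Name(), t.String()
}

func main() {
	for _, v := range []interface{}{1, t20190107{"s20190107"}, []int{1}, new(int)} {
		name, str := describe(v)
		fmt.Printf("Name=%q String=%q\n", name, str)
	}
}
```

It should report Name "int" and "t20190107" for the two named types (String "int" and "main.t20190107"), and an empty Name with String "[]int" and "*int" for the unnamed ones.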

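This code shows that the type name is derived from the rtype's address and its str field. The rtype here is the one in the emptyInterface built before calling reflect.TypeOf. Let's print that struct in gdb: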

$4 = {
  size = 0x8, 
  ptrdata = 0x0, 
  hash = 0xf75371fa, 
  tflag = 0x7, 
  align = 0x8, 
  fieldAlign = 0x8, 
  kind = 0x82, 
  alg = 0x529a70, 
  gcdata = 0x4c6cd8, 
  str = 0x3a3, 
  ptrToThis = 0xac60
}
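The dump shows tflag = 0x7. Assuming the Go 1.9-era bit values from src/reflect/type.go (tflagUncommon = 1, tflagExtraStar = 2, tflagNamed = 4), all three bits are set: the type is named, and the string stored at str should carry an extra leading '*' that String() strips. The sketch below replays that logic on a plain Go string. The constant copies and helper names are hypothetical, and the elided part of Name() is assumed to locate the last '.' in the string.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical copies of the Go 1.9-era tflag bits from src/reflect/type.go.
type tflag uint8

const (
	tflagUncommon  tflag = 1 << 0 // an uncommonType follows the rtype
	tflagExtraStar tflag = 1 << 1 // the stored name has an extra '*' prefix
	tflagNamed     tflag = 1 << 2 // the type has a name
)

// stringFromRaw mirrors rtype.String: drop the extra '*' if flagged.
func stringFromRaw(raw string, f tflag) string {
	if f&tflagExtraStar != 0 {
		return raw[1:]
	}
	return raw
}

// nameFromRaw mirrors rtype.Name; locating the last '.' is an assumption
// about the part elided in the listing.
func nameFromRaw(raw string, f tflag) string {
	if f&tflagNamed == 0 {
		return ""
	}
	s := stringFromRaw(raw, f)
	i := strings.LastIndexByte(s, '.') // -1 when there is no package prefix
	return s[i+1:]
}

func main() {
	f := tflag(0x7)
	fmt.Println(f&tflagUncommon != 0, f&tflagExtraStar != 0, f&tflagNamed != 0)
	fmt.Println(nameFromRaw("*int", f))
	fmt.Println(stringFromRaw("*main.t20190107", f), nameFromRaw("*main.t20190107", f))
}
```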

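Finally, let's look at the somewhat more involved resolveNameOff implementation, which lives in the runtime (src/runtime/type.go):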

func resolveNameOff(ptrInModule unsafe.Pointer, off nameOff) name {
	if off == 0 {
		return name{}
	}
	base := uintptr(ptrInModule)
	for md := &firstmoduledata; md != nil; md = md.next {
		if base >= md.types && base < md.etypes {
			res := md.types + uintptr(off)
			if res > md.etypes {
				println("runtime: nameOff", hex(off), "out of range", hex(md.types), "-", hex(md.etypes))
				throw("runtime: name offset out of range")
			}
			return name{(*byte)(unsafe.Pointer(res))}
		}
	}

	// No module found. see if it is a run time name.
	reflectOffsLock()
	res, found := reflectOffs.m[int32(off)]
	reflectOffsUnlock()
	if !found {
		println("runtime: nameOff", hex(off), "base", hex(base), "not in ranges:")
		for next := &firstmoduledata; next != nil; next = next.next {
			println("\ttypes", hex(next.types), "etypes", hex(next.etypes))
		}
		throw("runtime: name offset base pointer out of range")
	}
	return name{(*byte)(res)}
}

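Let's ignore the code from line 17 onward for now. In lines 6~15 the program walks the module list and checks whether the rtype address lies inside a module's type range (base >= md.types && base < md.etypes). If it does, it returns the address at offset off from the start of that range.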

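In other words, the offset in rtype.str is not relative to the rtype's own address, but to the start of the type region of the module that contains the rtype.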
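The lookup can be modeled with plain integers. In this sketch, moduleData keeps only the types/etypes range and the next link of the real module data. The base 0x488000 is the module start address this article uses for the int example; the etypes bound is a made-up value chosen only so that the rtype address falls inside the range.

```go
package main

import "fmt"

// moduleData keeps only the fields the lookup needs; it is a simplified
// stand-in for runtime.moduledata.
type moduleData struct {
	types, etypes uintptr
	next          *moduleData
}

// resolveNameOff follows the module-walking part of the runtime listing:
// find the module whose [types, etypes) range contains ptrInModule, then add
// off to that module's types base rather than to ptrInModule itself.
func resolveNameOff(first *moduleData, ptrInModule uintptr, off int32) (uintptr, bool) {
	if off == 0 {
		return 0, false
	}
	for md := first; md != nil; md = md.next {
		if ptrInModule >= md.types && ptrInModule < md.etypes {
			res := md.types + uintptr(off)
			if res > md.etypes {
				return 0, false // the runtime throws here
			}
			return res, true
		}
	}
	return 0, false // the runtime would consult reflectOffs next
}

func main() {
	// types comes from the article; etypes is a made-up upper bound.
	first := &moduleData{types: 0x488000, etypes: 0x4c0000}
	if addr, ok := resolveNameOff(first, 0x4977e0, 0x3a3); ok {
		fmt.Printf("%#x\n", addr) // 0x4883a3
	}
}
```

With these numbers the int name resolves to 0x488000 + 0x3a3 = 0x4883a3; the rtype address 0x4977e0 only selects the module.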

Like the rtype data, firstmoduledata is a global that is initialized ahead of time. We use IDA to find where it is located.

(Screenshot: firstmoduledata as shown in IDA)

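As shown, this data is stored in the ELF .noptrdata section, which is where Go keeps global data when building a program. So this form of "reflection" relies on information about types and variables that the compiler quietly builds for us at compile time.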

Next, let's look at the memory at the module start address 0x488000 plus rtype.str = 0x3a3, that is, 0x4883a3.

(Screenshot: memory at 0x488000 + 0x3a3)

That is where the string int comes from.
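What sits at that address is not a bare string but an encoded name. Assuming the Go 1.9-era encoding described in src/reflect/type.go (one flag byte, a two-byte big-endian length, then the string bytes; later releases switched to a varint length), a decoder could look like the sketch below. The input bytes in main are invented for illustration.

```go
package main

import "fmt"

// decodeName reads a name blob assuming the Go 1.9-era reflect encoding:
// byte 0 holds flag bits, bytes 1-2 hold the string length (big-endian),
// and the string bytes follow.
func decodeName(b []byte) (string, bool) {
	if len(b) < 3 {
		return "", false
	}
	n := int(b[1])<<8 | int(b[2])
	if len(b) < 3+n {
		return "", false
	}
	return string(b[3 : 3+n]), true
}

func main() {
	// Hypothetical bytes; the flag byte value is made up for illustration.
	blob := []byte{0x00, 0x00, 0x04, '*', 'i', 'n', 't', 0xff}
	s, ok := decodeName(blob)
	fmt.Println(s, ok) // *int true
}
```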

Custom struct types

package main

import (
	"fmt"
	"reflect"
)

type t20190107 struct {
	v string
}

func main() {
	i2 := t20190107{"s20190107"}
	t2 := reflect.TypeOf(i2)
	s2 := t2.Name()

	fmt.Println(s2)
}

This code deliberately defines a struct with an unusual name. Let's look at the disassembly:

   0x0000000000487c6f <+31>:    mov    %rbp,0xc0(%rsp)
   0x0000000000487c77 <+39>:    lea    0xc0(%rsp),%rbp
   0x0000000000487c7f <+47>:    movq   $0x0,0x58(%rsp)
   0x0000000000487c88 <+56>:    movq   $0x0,0x60(%rsp)
   0x0000000000487c91 <+65>:    lea    0x2f868(%rip),%rax        # 0x4b7500
   0x0000000000487c98 <+72>:    mov    %rax,0x58(%rsp)
   0x0000000000487c9d <+77>:    movq   $0x9,0x60(%rsp)
   0x0000000000487ca6 <+86>:    mov    %rax,0x98(%rsp)
   0x0000000000487cae <+94>:    movq   $0x9,0xa0(%rsp)
   0x0000000000487cba <+106>:   lea    0x196ff(%rip),%rax        # 0x4a13c0
   0x0000000000487cc1 <+113>:   mov    %rax,(%rsp)
   0x0000000000487cc5 <+117>:   lea    0x98(%rsp),%rax
   0x0000000000487ccd <+125>:   mov    %rax,0x8(%rsp)
   0x0000000000487cd2 <+130>:   callq  0x40c7e0 <runtime.convT2E>

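Line 5 loads the address 0x4b7500, and lines 6~9 store it together with the length 9 as string headers. Looking at that memory, it holds "s20190107", the string we used to initialize the struct: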

(Screenshot: memory at 0x4b7500 containing "s20190107")

Line 10 loads the address 0x4a13c0. Based on the previous example, this address holds reflect.rtype data; but because it is then passed to runtime.convT2E, its type at this point is runtime._type:

func convT2E(t *_type, elem unsafe.Pointer) (e eface) {
	if raceenabled {
		raceReadObjectPC(t, elem, getcallerpc(unsafe.Pointer(&t)), funcPC(convT2E))
	}
	if msanenabled {
		msanread(elem, t.size)
	}
	x := mallocgc(t.size, t, true)
	// TODO: We allocate a zeroed object only to overwrite it with actual data.
	// Figure out how to avoid zeroing. Also below in convT2Eslice, convT2I, convT2Islice.
	typedmemmove(t, x, elem)
	e._type = t
	e.data = x
	return
}

In fact, runtime._type and reflect.rtype have the same definition (the rtype comment shown earlier requires the two to stay in sync):

type _type struct {
	size       uintptr
	……
	str       nameOff
	ptrToThis typeOff
}

type rtype struct {
	size       uintptr
	……	
	str        nameOff  // string form
	ptrToThis  typeOff  // type for pointer to this type, may be zero
}

Likewise, reflect.emptyInterface and runtime.eface are identical:

type eface struct {
	_type *_type
	data  unsafe.Pointer
}

type emptyInterface struct {
	typ  *rtype
	word unsafe.Pointer
}

So the analysis of the basic type carries over to this case.

Printing the _type with gdb shows that this time the name offset, str = 0x6184, is considerably larger:

$3 = {
  size = 0x10, 
  ptrdata = 0x8, 
  hash = 0xe1c71878, 
  tflag = 0x7, 
  align = 0x8, 
  fieldalign = 0x8, 
  kind = 0x19, 
  alg = 0x529a90, 
  gcdata = 0x4c6dc4, 
  str = 0x6184, 
  ptrToThis = 0xae80
}

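Line 8 of runtime.convT2E allocates memory through the garbage collector (mallocgc), line 11 copies the data the raw pointer refers to into that memory, and lines 12~13 then fill in the eface struct.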
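Because the conversion copies the value into freshly allocated memory, the interface holds its own copy, so changing the original struct afterwards does not affect it. Go's assignment semantics guarantee this regardless of which runtime helper a given compiler version picks (the article's build used runtime.convT2E for the conversion in main). The box helper below is hypothetical.

```go
package main

import "fmt"

type t20190107 struct {
	v string
}

// box converts a struct to interface{}; the interface receives a copy.
func box(x t20190107) interface{} {
	return x
}

func main() {
	i2 := t20190107{"s20190107"}
	e := box(i2)
	i2.v = "changed"
	fmt.Println(e.(t20190107).v) // still s20190107: the interface owns a copy
}
```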

From there, reflect.TypeOf runs exactly as analyzed before. Finally, let's look at the global region that holds the type data:

(Screenshot: global region holding the type data)

Summary

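At compile time, the compiler saves each type's information (runtime._type, equivalently reflect.rtype) in the .rodata section.
For a literal, a statically built value is passed to reflect.TypeOf directly; the returned Type carries an itab whose table holds the addresses of the *rtype methods.
For a variable, a runtime.convT2* conversion function first copies the value into memory allocated by the garbage collector, and then reflect.TypeOf is called.
To get the name, the module data kept in the .noptrdata section is walked to find the module whose type region contains the type information; the offset in the str field is then added to that region's start to locate the name string in memory.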
