Hardcore Golang (Part 5): How Does the GMP Concurrency Scheduling Model Actually Work?

DevOps 2023-09-16 大白菜程序猿

Preface

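We always say Go is well suited to concurrent programming, but where exactly does that come from? With the asynchronous mechanisms of other languages we can reach the same level of concurrency as Go, possibly with even better performance. So how are Go's concurrency and scheduling actually implemented, and what are their strengths and weaknesses?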

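Before reading this article, I recommend reading my earlier post "What Does a Go Program Actually Go Through from Startup to Running"; otherwise many of the terms here may be unfamiliar.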

I also recommend this post, which explains things very clearly and from which I borrowed a lot: zboya.github.io/post/go_sch…

As in the previous article, I am using macOS with Go 1.18.4.

GMP

Go splits each unit of work into a goroutine and then schedules these goroutines onto different threads. The model is as follows:

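As shown in the figure: G is a goroutine, the basic unit of scheduling. P manages the scheduling of Gs and is responsible for dispatching them onto an M. M is an operating-system thread, the place where code actually executes.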

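This shows one advantage of GMP: user-space scheduling. In the traditional concurrency model we need many threads to run many tasks; a classic example is one thread per HTTP request. Requests are numerous, but the number of CPUs is limited, so only a limited number of threads can run at the same time. The CPU therefore has to switch between threads constantly, interrupting over and over, which is expensive. Switching between goroutines, by contrast, is done by the Go runtime in user space and does not require a kernel thread context switch. This is a simplified picture for now; the reality differs somewhat.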

Overall Flow

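r0: runtime0, the runtime code that runs first when the program starts
m0: the startup thread (the program's main thread)
g0: every m has its own g0, which is responsible for scheduling goroutines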

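When the program starts, r0 and m0 run first and perform a large amount of initialization. For the detailed process see my previous post, "What Does a Go Program Actually Go Through from Startup to Running"; here we only look at the key parts related to GMP. The most important function is, once again, schedinit.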

Here the Ps are initialized mainly through procresize and stored in the global variable allp, declared in runtime2.go under the runtime directory.

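func schedinit() {
	...
	// The default maximum number of Ms (threads) is 10000, a common interview question
	sched.maxmcount = 10000
	...
	// Set the number of Ps: it defaults to the number of CPUs, can be overridden
	// by the `$GOMAXPROCS` environment variable at startup, and can be changed
	// later with `runtime.GOMAXPROCS()`
	...
	// Initialize the Ps
	if procresize(procs) != nil {
		throw("unknown runnable goroutine during bootstrap")
	}
	...
}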

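In rt0_go, the main goroutine (runtime.main, which eventually calls our main function) is pushed onto a P's run queue as the first g, and an M then starts executing it.

The instruction that pushes the g is CALL runtime·newproc(SB); newproc is what the go keyword turns into. If this is unclear, the previous post explains it in detail.
Inside newproc, wakep is called once the main goroutine has started. wakep calls startm, which first tries to take an idle M and only creates a new one through newm if none is available. During bootstrap itself, rt0_go instead calls runtime·mstart directly on m0.
schedule is then called, and GMP starts running.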

From this point on, the whole GMP machinery keeps cycling until the program exits.

M

The m struct is defined in runtime2.go under the runtime directory, and, just like allp, there is a global variable allm. The m struct looks like this:

type m struct {
	// Every m has its own g0 goroutine, which runs the scheduling code on a
	// dedicated scheduling stack. When user code needs to run, the m switches
	// from g0's stack to the user goroutine's stack.
	g0      *g     // goroutine with scheduling stack
	morebuf gobuf  // gobuf arg to morestack
...
	// tls is the thread-local storage. Through it, the g, m, p, g0 and so on
	// bound to the current thread can be obtained at any time.
	tls           [tlsSlots]uintptr // thread-local storage (for x86 extern register)
	mstartfn      func()
	// the goroutine currently running on this m
	curg          *g       // current running goroutine
	caughtsig     guintptr // goroutine running during fatal signal
	// the p bound to this worker thread
	p             puintptr // attached p for executing go code (nil if not executing go code)
	nextp         puintptr
	oldp          puintptr // the p that was attached before executing a syscall
	id            int64
	mallocing     int32
	throwing      int32
	// Preemption-related: if this is not the empty string, curg must keep running on this m
	preemptoff    string // if != "", keep curg running on this m
	// a non-zero locks count is another signal that the g must not be preempted
	locks         int32
	dying         int32
	profilehz     int32
	// spinning is true while this m is out of work and looking for some:
	// it checks the run queues and the network poller, tries to steal work
	// or run GC work, and goes to sleep if it finds nothing
	spinning      bool // m is out of work and is actively looking for work
	// the m is blocked on a note
	blocked       bool // m is blocked on a note
...
	doesPark      bool        // non-P running threads: sysmon and newmHandoff never use .park
	// when there is no goroutine to run, the worker thread sleeps on park
	park          note
	// links every worker thread into a list (allm)
	alllink       *m // on allm
	schedlink     muintptr
	lockedg       guintptr
	createstack   [32]uintptr // stack that created this thread.
...
}

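Many experienced developers have already annotated this struct, so I have simply borrowed one of those annotated versions here.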

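When an M has no work, it spins looking for some (in Go 1.18 this logic lives in runtime.findrunnable): it checks its P's local run queue and the global queue, then the network poller, then tries to steal part of another P's work or to run idle GC mark work. If all of that fails, it goes to sleep. When another worker thread wakes it up, it starts spinning again.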

P

The p struct is defined in the same file as m.

type p struct {
	// index of this p in the global allp
	id          int32
	// status of this p
	status      uint32 // one of pidle/prunning/...
	link        puintptr
	// number of calls to schedule; incremented by 1 on every call
	schedtick   uint32     // incremented on every scheduler call
	// number of system calls; incremented by 1 on every system call
	syscalltick uint32     // incremented on every system call
	// used by sysmon to record the syscall time and run time of the monitored p
	sysmontick  sysmontick // last tick observed by sysmon
	// the m bound to this p; nil when the p is idle
	m           muintptr   // back-link to associated m (nil if idle)
	// per-p cache used to allocate tiny and small objects; holds spans of each size class
	mcache      *mcache
	// a cached chunk (512 KB) of heap pages, so page allocation can proceed without a lock
	pcache      pageCache
	raceprocctx uintptr

	deferpool    [5][]*_defer // pool of available defer structs of different sizes (see panic.go)
	deferpoolbuf [5][32]*_defer

	// Cache of goroutine ids, amortizes accesses to runtime·sched.goidgen.
	// a cache of ids that can be assigned to new gs; 16 are reserved at a time
	goidcache    uint64
	goidcacheend uint64

	// Queue of runnable goroutines. Accessed without lock.
	// head and tail of the local run queue, accessed without locks
	runqhead uint32
	runqtail uint32
	// local run queue of runnable gs: a circular queue backed by an array
	runq     [256]guintptr
	// the next g to run, with the highest priority: if the current g still has
	// time left in its time slice, this runnext g runs next and inherits that time
	runnext guintptr

	// Available G's (status == Gdead)
	// free list of dead gs on this p, kept for reuse
	gFree struct {
		gList
		n int32
	}

...
	// for memory alignment
	_ uint32 // Alignment for atomic fields below
...
	// set when this p should enter the scheduler as soon as possible (preemption)
	preempt bool

	// Padding is no longer needed. False sharing is now not a worry because p is large enough
	// that its size class is an integer multiple of the cache line size (for any of our architectures).
}

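When the program starts and is being initialized, all Ps are in the _Pgcstop state; as the Ps are initialized (runtime.procresize), they are set to _Pidle.
When an M needs to run, it calls runtime.acquirep to switch a P to _Prunning, and releases it with runtime.releasep, which sets it back to _Pidle.
When a running G enters a system call, its P is set to _Psyscall; if the system monitor takes the P away at this point (runtime.retake), the P is changed back to _Pidle.
If a GC happens, the Ps are set to _Pgcstop and are switched back to _Prunning in runtime.startTheWorld.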

G

type g struct {
    // a simple struct whose lo and hi fields hold the lower and upper memory addresses of the stack
    stack       stack   // offset known to runtime/cgo
    stackguard0 uintptr // offset known to liblink
    stackguard1 uintptr // offset known to liblink

    _panic *_panic // innermost panic - offset known to liblink
    _defer *_defer // innermost defer
    // the m currently running this g
    m *m // current m; offset known to arm liblink
    // saves this g's context when goroutines are switched
    sched     gobuf
    syscallsp uintptr // if status==Gsyscall, syscallsp = sched.sp to use during gc
    syscallpc uintptr // if status==Gsyscall, syscallpc = sched.pc to use during gc
    stktopsp  uintptr // expected sp at top of stack, to check in traceback
    // used to pass a parameter: while this g sleeps, another goroutine can set param,
    // and this g reads it when it is woken up
    param        unsafe.Pointer // passed parameter on wakeup
    atomicstatus uint32
    stackLock    uint32 // sigprof/scang lock; TODO: fold in to atomicstatus
    // unique goroutine ID
    goid int64
    // approximate time at which the g became blocked
    waitsince  int64  // approx time when the g become blocked
    waitreason string // if status==Gwaiting
    schedlink  guintptr
    // preemption request flag
    preempt        bool     // preemption signal, duplicates stackguard0 = stackpreempt
    paniconfault   bool     // panic (instead of crash) on unexpected fault address
    preemptscan    bool     // preempted g does scan for gc
    gcscandone     bool     // g has scanned stack; protected by _Gscan bit in status
    gcscanvalid    bool     // false at start of gc cycle, true if G has not run since last scan; TODO: remove?
    throwsplit     bool     // must not split stack
    raceignore     int8     // ignore race detection events
    sysblocktraced bool     // StartTrace has emitted EvGoInSyscall about this goroutine
    sysexitticks   int64    // cputicks when syscall has returned (for tracing)
    traceseq       uint64   // trace event sequencer
    tracelastp     puintptr // last P emitted an event for this goroutine
    // the g is locked to run only on this m
    lockedm  muintptr
    sig      uint32
    writebuf []byte
    sigcode0 uintptr
    sigcode1 uintptr
    sigpc    uintptr
    // PC/IP of the caller
    gopc uintptr // pc of go statement that created this goroutine
    // the goroutine's entry function
    startpc    uintptr // pc of goroutine function
    racectx    uintptr
    waiting    *sudog         // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
    cgoCtxt    []uintptr      // cgo traceback context
    labels     unsafe.Pointer // profiler labels
    timer      *timer         // cached timer for time.Sleep
    selectDone uint32         // are we participating in a select and did someone win the race?

    gcAssistBytes int64
}

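When a new goroutine is created with the go keyword, the runtime calls newproc to create the new g. The detailed flow is as follows:

Switch to the system stack with systemstack and call newproc1, which obtains the g.

Try to find a g instance on the P's local free list of gs, then on the global free list.

If none is found, call malg to create a new g instance, allocate its stack and set the stack bounds, then add it to the allgs array, which holds every g.

Save the context used to switch to this g. This step is crucial, because switching gs relies on the sched field: newproc1 arranges sched so that the goroutine's entry function looks as if it had been called from goexit. Execution starts at the function, and when the function returns it lands in goexit.

Generate a unique goid (taken from the P's goidcache) and assign it to the g.

Call runqput to put the g on a run queue. newproc passes next=true, so the new g goes into p.runnext, and the previous runnext, if there was one, is moved to the tail of the local queue. If the local queue is full, half of it, together with that g, is moved to the global queue.

If there is an idle P, no M is currently spinning, and the main goroutine has already started, wakep wakes up or creates an M to run the work.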

Let's look at the states of a g.

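As you can see, G has many states, but don't worry: I won't go into every state transition here (I plan to cover that in another article). For now it is enough to know that only a G in the _Grunnable state can be run by an M. When a G exits, it cleans up after itself, setting the objects it references to nil so they can be garbage collected. goexit deserves special mention: after the scheduler finishes running a G, it does not loop back by itself; instead, goexit calls schedule again to keep scheduling going.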