How GNU OpenMP Ends a Parallel Region

Haibin
Micro Topics
2025-06-04

gcc/libgomp/config/posix/simple-bar.h at 4e47e2f833732c5d9a3c3e69dc753f99b3a56737 · gcc-mirror/gcc

gcc/libgomp/parallel.c at e2bf0b3910de7e65363435f0a7fa606e2448a677 · gcc-mirror/gcc

void
GOMP_parallel_end (void)
{
  struct gomp_task_icv *icv = gomp_icv (false);
  if (__builtin_expect (icv->thread_limit_var != UINT_MAX, 0))
    {
      struct gomp_thread *thr = gomp_thread ();
      struct gomp_team *team = thr->ts.team;
      unsigned int nthreads = team ? team->nthreads : 1;
      gomp_team_end ();
      if (nthreads > 1)
        {
          /* If not nested, there is just one thread in the
             contention group left, no need for atomicity.  */
          if (thr->ts.team == NULL)
            thr->thread_pool->threads_busy = 1;
          else
            {
#ifdef HAVE_SYNC_BUILTINS
              __sync_fetch_and_add (&thr->thread_pool->threads_busy,
                                    1UL - nthreads);
#else
              gomp_mutex_lock (&gomp_managed_threads_lock);
              thr->thread_pool->threads_busy -= nthreads - 1;
              gomp_mutex_unlock (&gomp_managed_threads_lock);
#endif
            }
        }
    }
  else
    gomp_team_end ();
}
ialias (GOMP_parallel_end)

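GOMP_parallel_end() is the core function GNU OpenMP (libgomp) uses to end a parallel region. It tears down the thread team, reclaims resources, and updates the thread pool's usage accounting. The rest of this post walks through its logic and shows how it ends a parallel region.

🔧 Function overview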

void GOMP_parallel_end(void)

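This function marks the end of an OpenMP parallel region. Its responsibilities are:

Check whether a thread limit is in effect:
- If one is set (icv->thread_limit_var != UINT_MAX), the thread pool's threads_busy field needs extra bookkeeping.
Call gomp_team_end():
- Synchronizes the team, tears it down, and cleans up the related data structures. This call is made on both branches.
Update threads_busy (only when a thread limit is in effect):
- If the team had more than one thread (nthreads > 1), adjust threads_busy, the number of threads in the pool currently counted as busy.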
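Seen from user code, all of this happens at the closing brace of a parallel region. The sketch below is only an illustration of that boundary, not code from libgomp: toy_count_team_members is a hypothetical helper, and the statement that GCC reaches GOMP_parallel_end at this point reflects my understanding of how GCC lowers the construct.

```c
#include <assert.h>

/* Counts how many threads executed the parallel region.  Built with
   gcc -fopenmp the region runs on a team of threads; without that flag
   the pragmas are ignored and the body runs once on the caller.  */
static int
toy_count_team_members (int requested)
{
  int count = 0;

  assert (requested >= 1);
#pragma omp parallel num_threads(requested) shared(count)
  {
#pragma omp atomic
    count++;
  }
  /* The region has ended here.  Every thread of the team has passed the
     implicit barrier, so all increments are visible to the caller.  */
  return count;
}
```

The return value is the team size the runtime actually granted, which may be smaller than the requested one, for example when a thread limit applies.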

📜 Step-by-step walkthrough

1️⃣ Get the current thread's ICVs (Internal Control Variables)

struct gomp_task_icv *icv = gomp_icv(false);

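Fetches the internal control variables that apply to the current thread (thread_limit among them).
The false argument requests read-only access, so no new per-task ICV copy is created.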
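The read/write distinction can be sketched in a few lines. This is a simplified model, not libgomp's source: toy_icv, toy_task, toy_thread, and toy_global_icv are hypothetical names, and the lookup order (the task's own ICVs first, then the shared defaults for a read-only request) is my assumption about how the real helper behaves.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for libgomp's ICV, task and thread structures.  */
struct toy_icv { unsigned int thread_limit_var; };
struct toy_task { struct toy_icv icv; };
struct toy_thread
{
  struct toy_task *task;     /* NULL until the thread owns a task */
  struct toy_task storage;   /* backing store for a private copy */
};

/* Shared defaults; UINT_MAX stands for "no thread limit".  */
static struct toy_icv toy_global_icv = { UINT_MAX };

static struct toy_icv *
toy_icv (struct toy_thread *thr, bool write)
{
  assert (thr != NULL);
  if (thr->task != NULL)
    return &thr->task->icv;            /* the task has its own ICVs */
  if (write)
    {
      /* A writer must not touch the shared defaults: copy them first.  */
      thr->storage.icv = toy_global_icv;
      thr->task = &thr->storage;
      return &thr->task->icv;
    }
  return &toy_global_icv;              /* read-only: defaults suffice */
}
```

In this model a reader never allocates anything, which is all GOMP_parallel_end() needs, since it only inspects thread_limit_var.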

2️⃣ Check whether `thread_limit` is set

if (__builtin_expect (icv->thread_limit_var != UINT_MAX, 0))

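If thread_limit_var carries an upper bound (anything other than the default UINT_MAX, which stands for "no limit"), the thread pool's state needs the extra update.
This typically happens when the OMP_THREAD_LIMIT=n environment variable is set.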

3️⃣ Call `gomp_team_end()`: tear down the thread team

gomp_team_end();

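This is the core routine for ending the region. It:
- waits for all worker threads to finish their work;
- cleans up the struct gomp_team data structure;
- restores the calling thread's team state to that of the enclosing level.
On the thread-limit branch, nthreads is read from the team before this call, because afterwards thr->ts.team no longer refers to the team that just ended.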
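The last of those steps is what makes the later thr->ts.team == NULL test meaningful. Here is a minimal sketch of it, with hypothetical toy_ names. It assumes each team stores the state it was entered from and restores it on exit, and it leaves out the barrier wait and the cleanup entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy versions of the per-thread team state.  */
struct toy_team;
struct toy_team_state { struct toy_team *team; };
struct toy_team
{
  unsigned int nthreads;
  struct toy_team_state prev_ts;   /* state to restore when the team ends */
};
struct toy_thread { struct toy_team_state ts; };

/* Enter a team: remember where we came from, then switch to the team.  */
static void
toy_team_start (struct toy_thread *thr, struct toy_team *team,
                unsigned int nthreads)
{
  team->nthreads = nthreads;
  team->prev_ts = thr->ts;
  thr->ts.team = team;
}

/* Leave a team.  Only the final step is modelled: putting the enclosing
   state back in place.  */
static void
toy_team_end (struct toy_thread *thr)
{
  struct toy_team *team = thr->ts.team;

  assert (team != NULL);           /* the caller must be inside a team */
  thr->ts = team->prev_ts;
}
```

Ending an inner team leaves ts.team pointing at the outer team, while ending the outermost team leaves it NULL. Those are exactly the two cases step 4 distinguishes.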

4️⃣ Update the pool's busy-thread count

if (nthreads > 1) {
  if (thr->ts.team == NULL)
    thr->thread_pool->threads_busy = 1;
  else {
    __sync_fetch_and_add(&thr->thread_pool->threads_busy, 1UL - nthreads);
  }
}

✅ Case 1: the thread is no longer inside a nested team

if (thr->ts.team == NULL)
  thr->thread_pool->threads_busy = 1;

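The thread is back at the outermost level, so it is the only thread left in the contention group (the initial thread). A plain store is enough here; no atomic operation is needed.

🔄 Case 2: still inside nested parallelism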

__sync_fetch_and_add(&thr->thread_pool->threads_busy, 1UL - nthreads);

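An atomic add lowers the pool's busy-thread count by nthreads - 1: the finished team's workers are released while the calling thread stays busy.
If atomic builtins are unavailable (HAVE_SYNC_BUILTINS is not defined), the same subtraction is performed under gomp_managed_threads_lock.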
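Adding 1UL - nthreads looks odd for a decrement, so a small worked example may help. The helper below is a hypothetical sketch (toy_release_threads is not a libgomp function) and assumes the counter is an unsigned long.

```c
#include <assert.h>

/* Release the nthreads - 1 extra threads of a finished team from a busy
   counter.  nthreads is unsigned, so 1UL - nthreads is evaluated in
   unsigned long and wraps around; adding the wrapped value has the same
   effect as subtracting nthreads - 1.  Returns the new count.  */
static unsigned long
toy_release_threads (unsigned long *threads_busy, unsigned int nthreads)
{
  assert (nthreads >= 1 && *threads_busy >= nthreads - 1);
  __sync_fetch_and_add (threads_busy, 1UL - nthreads);
  return *threads_busy;
}
```

With a counter of 8 and a team of 4, the call leaves 5 busy threads: three workers are released and the caller remains counted.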

5️⃣ Default case (no thread_limit)

else
  gomp_team_end();

When no thread limit is in effect, calling gomp_team_end() is enough; the thread pool's state is left untouched.

🧠 Summary: how does GNU OpenMP end a parallel region?

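GOMP_parallel_end() is called by the primary thread, the one that encountered the parallel construct, to mark the end of the region.
Through gomp_team_end() it:
- waits at the team's final barrier until all threads are done;
- destroys the team data structure;
- restores the thread's previous team state.
If a thread limit is in effect, it also updates threads_busy, the pool's count of active threads, which later parallel regions can be sized against.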
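To see why the counter has to be given back, it helps to pair the end-side update with a start-side reservation. The model below is a single-threaded sketch under stated assumptions: toy_reserve and toy_release are hypothetical, the clamping rule is my reading of how a start routine could honor the limit, and no atomics or locks are used.

```c
#include <assert.h>
#include <limits.h>

/* Toy pool holding only the counter discussed above.  */
struct toy_pool { unsigned long threads_busy; };

/* Start side: grant at most `wanted` threads without pushing the busy
   count above `limit`, then reserve the extra ones.  The caller is
   already counted as busy, hence the "+ 1" and "- 1".  */
static unsigned int
toy_reserve (struct toy_pool *pool, unsigned int limit, unsigned int wanted)
{
  assert (wanted >= 1 && pool->threads_busy >= 1);
  if (limit == UINT_MAX)
    return wanted;                       /* no limit: no bookkeeping */

  unsigned long room = 1;                /* the caller itself always runs */
  if (limit >= pool->threads_busy)
    room = limit - pool->threads_busy + 1;

  unsigned int granted = wanted < room ? wanted : (unsigned int) room;
  pool->threads_busy += granted - 1;
  return granted;
}

/* End side: give back what the matching toy_reserve call took.  */
static void
toy_release (struct toy_pool *pool, unsigned int limit, unsigned int granted)
{
  if (limit != UINT_MAX && granted > 1)
    pool->threads_busy -= granted - 1;
}
```

With a limit of 6, an outer region asking for 4 threads gets 4, and a nested region asking for 4 more gets only 3. Once both regions have released their threads, the counter is back at 1.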

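If you are studying libgomp's thread-pool management or nested-parallelism behavior (especially thread_limit and proc_bind), note that this logic works hand in hand with GOMP_parallel_start() and gomp_team_start(); the three are worth reading together.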
