Multithreaded program hangs in optimized mode but runs normally at -O0
Problem description
I wrote a simple multithreaded program as follows:
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

static bool finished = false;

int func()
{
    size_t i = 0;
    while (!finished)
        ++i;
    return i;
}

int main()
{
    auto result=std::async(std::launch::async, func);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    finished=true;
    std::cout<<"result ="<<result.get();
    std::cout<<"\nmain thread id="<<std::this_thread::get_id()<<std::endl;
}
It behaves normally in debug mode in Visual Studio or with -O0 in gcc and prints the result after 1 second. But it gets stuck and does not print anything in Release mode or with -O1, -O2, or -O3.
Recommended answer
Two threads accessing a non-atomic, unguarded variable, where at least one of them writes to it, is a data race and therefore undefined behavior (UB). This concerns finished. You could make finished of type std::atomic<bool> to fix this.
My fix:
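#include <atomic>
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

// Brace initialization also compiles before C++17,
// where copy-initializing a std::atomic is ill-formed.
static std::atomic<bool> finished{false};

int func()
{
    size_t i = 0;
    while (!finished)   // atomic load on every iteration
        ++i;
    return i;
}

int main()
{
    auto result=std::async(std::launch::async, func);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    finished=true;      // atomic store, visible to the worker thread
    std::cout<<"result ="<<result.get();
    std::cout<<"\nmain thread id="<<std::this_thread::get_id()<<std::endl;
}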
Output:
result =1023045342
main thread id=140147660588864
Live demo on coliru
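The answer speaks of a variable that is neither atomic nor guarded, so guarding a plain bool is the other way to make the access well-defined. Below is a minimal sketch of that variant, assuming a std::mutex as the guard; the names StopFlag and count_with_mutex are hypothetical and not part of the original answer.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical wrapper: a plain bool whose every access holds the mutex.
class StopFlag {
public:
    void set()
    {
        std::lock_guard<std::mutex> lock(m_);
        value_ = true;
    }
    bool is_set() const
    {
        std::lock_guard<std::mutex> lock(m_);
        return value_;
    }
private:
    mutable std::mutex m_;
    bool value_ = false;
};

// Hypothetical helper: counts in a worker thread until the flag is set.
std::size_t count_with_mutex(std::chrono::milliseconds delay)
{
    StopFlag stop;
    auto worker = std::async(std::launch::async, [&stop] {
        std::size_t i = 0;
        while (!stop.is_set())   // locked read, so no data race
            ++i;
        return i;
    });
    std::this_thread::sleep_for(delay);
    stop.set();                  // locked write
    return worker.get();
}
```

For a single flag like this, std::atomic<bool> is usually the lighter choice; a mutex becomes more attractive when the flag has to change together with other shared data.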
Somebody may think, 'It's a bool, probably one bit. How can this be non-atomic?' (I did when I started with multithreading myself.)
But note that lack of tearing is not the only thing that std::atomic gives you. It also makes concurrent read+write access from multiple threads well-defined, stopping the compiler from assuming that re-reading the variable will always see the same value.
Leaving a bool unguarded and non-atomic can cause additional issues:
- The compiler might decide to keep the variable in a register, or even CSE multiple accesses into one and hoist a load out of a loop.
- The variable might be cached for a CPU core. (In real life, CPUs have coherent caches, so this is not a real problem. But the C++ standard is loose enough to cover hypothetical C++ implementations on non-coherent shared memory, where atomic<bool> with memory_order_relaxed store/load would work but volatile wouldn't. Using volatile for this would be UB, even though it works in practice on real C++ implementations.)
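The load hoisting mentioned in the first bullet can be modeled by hand in a single thread. The sketch below is only an illustration under stated assumptions: loop_rereading, loop_hoisted, the step callback (standing in for the other thread's write), and the limit parameter (which keeps the model finite) are all hypothetical and do not appear in the original program.

```cpp
#include <cassert>
#include <cstddef>

// The loop as written in the source: the flag is read before every iteration.
template <class Step>
std::size_t loop_rereading(const bool& flag, std::size_t limit, Step step)
{
    std::size_t i = 0;
    while (!flag && i < limit) {
        step(i);   // may change the flag, like the other thread would
        ++i;
    }
    return i;
}

// Hand-written model of the loop after the load has been hoisted:
// the flag is read once, and the loop never looks at it again.
template <class Step>
std::size_t loop_hoisted(const bool& flag, std::size_t limit, Step step)
{
    std::size_t i = 0;
    const bool seen = flag;    // single load, moved out of the loop
    if (!seen) {
        while (i < limit) {    // exit condition no longer depends on the flag
            step(i);
            ++i;
        }
    }
    return i;
}
```

With a step that sets the flag on its third call, loop_rereading stops right away while loop_hoisted runs to the limit. The program in the question has no limit, so the hoisted form never exits, which matches the hang seen at -O1 and above.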
To prevent this from happening, the compiler must be told explicitly not to do it.
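One way to tell the compiler explicitly is to spell out the atomic operations. The following is a minimal sketch, assuming that memory_order_relaxed is enough because the flag publishes no other data (the counter comes back through the future, which synchronizes on get()); the helper name count_until_stopped is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>

// Hypothetical helper: counts in a worker thread until `stop` becomes true.
std::size_t count_until_stopped(std::atomic<bool>& stop,
                                std::chrono::milliseconds delay)
{
    auto worker = std::async(std::launch::async, [&stop] {
        std::size_t i = 0;
        // Explicit atomic load on every iteration. Concurrent access is
        // well-defined, and in practice compilers do not hoist this load.
        while (!stop.load(std::memory_order_relaxed))
            ++i;
        return i;
    });

    std::this_thread::sleep_for(delay);
    stop.store(true, std::memory_order_relaxed);  // explicit atomic store
    return worker.get();                          // joins with the worker
}
```

Writing while (!finished) and finished = true on a std::atomic<bool>, as in the fix above, performs the same loads and stores with the default memory_order_seq_cst.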
I'm a little bit surprised by the evolving discussion concerning the potential relation of volatile to this issue. Thus, I'd like to add my two cents:
- Is volatile useful with threads
- Who's afraid of a big bad optimizing compiler?