The C++ Memory Model¶
Data Race: Load Modify Store¶
A data race is what the memory model is there to help you understand and prevent
Here: Load-Modify-Store-Conflict
⟶ Integer access is not atomic
#include <thread>
#include <array>
#include <iostream>
#include <cassert>
int main()
{
    const unsigned long numtimes = 1000*1000*1000UL;
    unsigned long counter = 0;
    auto threadfunc = [&counter]() {
        unsigned long n = numtimes;
        while (n--)
            counter++;
    };
    std::array threads{
        std::thread(threadfunc),
        std::thread(threadfunc),
    };
    for (auto& t: threads)
        t.join();
    std::cout << "expected " << threads.size()*numtimes << ", real " << counter << '\n';
    return 0;
}
$ g++ -O0 load-modify-store.cpp
$ ./a.out
expected 2000000000, real 1066359623
Data Race: Load Modify Store (Explained)¶
An innocent counter++ is
Load from memory
Increment in CPU register
Write back to memory
| CPU A Instr | CPU A Loc | CPU B Instr | CPU B Loc | Memory Glob |
|-------------|-----------|-------------|-----------|-------------|
| load        | 42        |             |           | 42          |
|             | 42        | load        | 42        | 42          |
| inc         | 43        |             | 42        | 42          |
|             | 43        | inc         | 43        | 42          |
|             | 43        | store       | 43        | 43          |
| store       | 43        |             | 43        | 43          |

⟶ Both CPUs incremented, but both write back 43: one increment is lost
Optimizations¶
Your program does not run the way you wrote it
Compiler optimizations (see here for an overview)
CPU optimizations: e.g. read and write reordering
⟶ the order in which writes become visible to other threads/CPUs is not what you think
(Similar for reads)
⟶ no sequential order guaranteed
CPU Optimization: Write Reordering¶
The following code has “undefined behavior”
A thread that accesses the object through global_widget might see the pointer in memory before the memory it points to
This reordering is done by the CPU ⟶
Widget* w = new Widget(...); // <-- write: memory allocation, Widget initialization
global_widget = w; // <-- write: global_widget
Another Race: std::list Is Not Threadsafe¶
#include <list>
#include <array>
#include <thread>
#include <iostream>
int main()
{
    const unsigned long numtimes = 1000*1000UL;
    std::list<int> ints;
    auto threadfunc = [&ints](){
        unsigned long n = numtimes;
        while (n--)
            ints.push_back(666); // <-- non-atomic two-pointer update
    };
    std::array threads{
        std::thread(threadfunc),
        std::thread(threadfunc),
    };
    for (auto& t: threads)
        t.join();
    std::cout << "#elements: " << ints.size() << '\n';
    return 0;
}
$ g++ -O0 list.cpp
$ ./a.out
#elements: 1476027
The list's internal pointers have been updated concurrently and incorrectly: elements were lost
⟶ A typical case for a mutex
Protecting std::list With A Mutex¶
#include <list>
#include <array>
#include <thread>
#include <mutex>
#include <iostream>
int main()
{
    const unsigned long numtimes = 1000*1000UL;
    std::list<int> ints;
    std::mutex ints_lock;
    auto threadfunc = [&ints, &ints_lock](){
        unsigned long n = numtimes;
        while (n--) {
            ints_lock.lock();   // <-- acquire
            ints.push_back(666);
            ints_lock.unlock(); // <-- release
        }
    };
    std::array threads{
        std::thread(threadfunc),
        std::thread(threadfunc),
    };
    for (auto& t: threads)
        t.join();
    std::cout << "#elements: " << ints.size() << '\n';
    return 0;
}
$ g++ -O0 list.cpp
$ ./a.out
#elements: 2000000
What’s A Mutex?¶
A boolean flag, with states “open” and “closed”
Operations “acquire” and “release”
Acquire
    Flag is “open”: set it “closed”
    Flag is “closed”:
        Enqueue yourself onto the list of waiters
        Enter wait state at the same time (otherwise ⟶ Lost Wakeup Problem)
        Retry on wakeup
Release
    Reset flag to “open”
    Wake any waiters
What if there is something wrong with the flag itself, i.e. if accesses to it are reordered?
For example:
Memory Order¶
enum class memory_order
{
relaxed,
consume,
acquire, // <-- cf lock
release, // <-- cf unlock
acq_rel,
seq_cst // <-- atomic ++
};
Release And Acquire Ordering¶
Mutex: memory/flag requirements
Release order: when flag is released (set to “open”), the reader/locker must see all writes that happened before “open”
⟶ CPU must not reorder writes across flag write
Acquire order: the opposite side must cooperate and tag their flag read operations accordingly
⟶ CPU must not reorder reads across flag read
Release And Acquire Ordering: A Naive Spinlock¶
Spinlock: does not block the locker if locked, but rather spins actively until free
⟶ fewer context switches
Good for short critical sections
Not good if both contenders are scheduled on the same CPU
std::atomic_flag is guaranteed to be lock-free; std::atomic<> specializations are not (necessarily)
#include <list>
#include <array>
#include <thread>
#include <atomic>
#include <iostream>
class spinlock
{
public:
    void lock()
    {
        while (true) {
            bool oldstate = _flag.test_and_set(std::memory_order::acquire);
            if (oldstate == false) // <-- flag changed to true -> **locked**
                break;
        }
    }
    void unlock()
    {
        _flag.clear(std::memory_order::release); // <-- send "open" to reader
    }
private:
    std::atomic_flag _flag;
};

int main()
{
    const unsigned long numtimes = 1000*1000UL;
    std::list<int> ints;
    spinlock ints_lock; // <-- mutex replacement
    auto threadfunc = [&ints, &ints_lock](){
        unsigned long n = numtimes;
        while (n--) {
            ints_lock.lock();   // <-- acquire
            ints.push_back(666);
            ints_lock.unlock(); // <-- release
        }
    };
    std::array threads{
        std::thread(threadfunc),
        std::thread(threadfunc),
    };
    for (auto& t: threads)
        t.join();
    std::cout << "#elements: " << ints.size() << '\n';
    return 0;
}
Sequential Consistency: Resolving Load-Modify-Store-Conflict¶
#include <atomic>
#include <thread>
#include <array>
#include <iostream>
#include <cassert>
int main()
{
    const unsigned long numtimes = 1000*1000*1000UL;
    std::atomic<unsigned long> counter = 0;
    auto threadfunc = [&counter]() {
        unsigned long n = numtimes;
        while (n--)
            counter++;
    };
    std::array threads{
        std::thread(threadfunc),
        std::thread(threadfunc),
    };
    for (auto& t: threads)
        t.join();
    std::cout << "expected " << threads.size()*numtimes << ", real " << counter << '\n';
    return 0;
}