Lock-Free Queues
When you use a queue in a concurrent program, you would usually synchronize the push and pop functions with locks: you'd lock before pushing or popping and unlock right before the function returns. However, locks come with overhead, both in creating and acquiring them, so it would be amazing if we could avoid them altogether and still support concurrent operations. For many data structures, such as linked lists, stacks, and queues, we can build synchronized, lock-free versions with the help of the atomic operation compare-and-swap (CAS). In C++, it's called std::atomic<T>::compare_exchange_weak.
What is CAS?
CAS is a CPU instruction first introduced by IBM in 1973. Remember, the only operations that can be truly atomic are bare CPU instructions. I'd like to thank the hardware folks for giving us CAS, because it really is a great and useful instruction.
std::atomic<T>::compare_exchange_weak compares the value of the atomic<T> to the expected value; if they are equal, the atomic's value is set to the desired value. The "weak" variant is also allowed to fail spuriously even when the values match, which is why it is typically called in a loop. Here's rough C++-style pseudocode for what it does:

bool compare_exchange_weak(T& expected, T desired) {
    if (value == expected) {   // value: the atomic's current contents
        value = desired;
        return true;
    }
    expected = value;          // on failure, expected is refreshed with the current value
    return false;
}
Implementation
We will implement the queue as a linked list: you push onto the tail and pop from the head. Our node structure holds a shared_ptr<T> for the data and an atomic<node*> pointing to the next node in the queue. We need atomic types so that we can perform atomic operations on them. I've chosen a shared_ptr<T> for the data so that pop can return a pointer rather than a copy of the value.
push
To start push off, we create the new_node that we want to push onto the end. Next, we atomically load the current node* stored in tail into old_tail. If old_tail->next is nullptr, we know old_tail is still the tail, so we set old_tail->next to new_node. That check-and-set happens as one atomic step thanks to compare_exchange_weak.
If multiple threads call push at the same time, we still won't have a race condition. Let's see what happens when thread a and thread b call push at the same time.

Threads a and b load the same tail into old_tail. Thread a executes its compare_exchange_weak first, so tail->next now points to thread a's new_node. When thread b reaches its compare_exchange_weak, old_tail->next is no longer nullptr, so the exchange fails and thread b goes around the loop again. Thread a's new_node is now the real tail of the list, and it is thread a's responsibility to update tail to point at its new_node. Thread b then loads the up-to-date tail, enqueues its own new_node, and updates tail. This is a success! We managed to enqueue both a's and b's new_node concurrently with no locks!
pop
For pop, you start by loading head into old_head. Then, if old_head still equals head, we set head to head->next, which completes the popping part; all of that is done atomically by compare_exchange_weak. We also have a guard making sure our queue isn't empty. old_head is the node we popped, so if that node isn't nullptr we'll return a pointer to the actual value, data.

The way we handle concurrent pop calls is similar to concurrent push calls. We load old_head, and as long as it's still the head, we pop it using compare_exchange_weak. Otherwise, we load the up-to-date head and try again.
Conclusion
Avoiding locks can make the queue much faster under contention. Lock-free code is very tricky to write, though, so make sure you test your code well. If you want a production-level lock-free queue, or a few other lock-free data structures, check out the Boost library's Lockfree data structures.
Reference
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.8674&rep=rep1&type=pdf