When you use queues in a concurrent program, you would usually synchronize push and pop functions with locks. You’d lock before pushing or popping and unlock right before the function ends. However, locking comes with some overhead in both creating and acquiring them, so it would be amazing if we could avoid using locks all together and still support concurrent operations. For many data structures like linked lists, stacks, and queues, we can make synchronized, lock-free versions of them with the help of the atomic function Compare and Swap (CAS). In C++, its called std::atomic<T>::compare_exchange_weak.

What is CAS?

CAS was a CPU instruction first introduced by IBM in 1973. Remember the only operations that can be truly atomic are the bare CPU instructions. I’d like to thank the hardware guys for giving us CAS because it really is a great and useful instruction.

std::atomic<T>::compare_exchange_weak compares the value of the atomic<T>, to the expected value, and if they equal you replace the value of atomic is set to the desired value. Here’s a rough C++ code for what it’s doing.

Implementation

We will implement the queue as a linked list. You push things on the tail and pop them out on the head. Our node structure includes a shared_ptr<T> for the data and atomic<node*> pointing to the next node in the queue. We need to use atomic types so that we can do the atomic operations on them. I’ve choose to use a shared_ptr<T> for the data so that pop can return a pointer rather than the value of data.

push

To start push off, we create our new_node that we want push to the end. Next, we atomic::load the current node* of the tail on line 19. If old_tail->next is nullptr we know this is still the tail, so we set old_tail->next to new_node. This is all done atomically on line 20 thanks to compare_exchange_weak.

If multiple threads are calling push at the same time, we still wont have a race condition. Let’s see what happens when thread a and thread b call push at the same time.

thread a and b set old_tail to the same tail on line 19. Then, thread a gets to line 20 first and executes the compare_exchange_weak. Thus, tail->next points to thread a’s new_node so when thread b gets to line 20 old_tail->next no longer is nullptr thus it goes around the loop again. Thread a’s new_node is the new tail and its thread a’s responsibility to update tail to its new_node which is done on line 23.

Thread b will then load the up to date tail on line 21, then enqueue its new_node and update the tail. This is a success! We managed to enqueue both a and b’s new_node concurrently with no locks!

pop

For pop you start by loading the head to old_head. Then, if the old_head still equals the head, we set the head to head->next which completes the popping part. All of that is done atomically on line 29. We also have a guard on line 28 making sure that our queue isn’t empty. old_head is the node we popped, so on line 32 if that node isn’t nullptr we’ll return a pointer to the actual value, data.

The way we handle concurrent pop calls is similar to concurrent push calls. We load the old_head, and as long as it’s still the head we pop it using compare_exchange_weak. Otherwise, we load up the new head on line 30 and try again.

Conclusion

Avoiding locks makes the queue much faster. Lock free code can be very tricky to write, so make sure you test your code well. If you want to use a production level lock free queue or a few other lock free data structures, check out the Boost library’s Lockfree data structures here.

Reference

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.8674&rep=rep1&type=pdf

C/C++ | Computer Systems | Low Latency | Distributed Systems | Computer Networks | Operating Systems