IncRef:

```cpp
refcount.fetch_add(1, std::memory_order_relaxed);
```

DecRef:

```cpp
if (refcount.load(std::memory_order_relaxed) == 1 ||
    refcount.fetch_sub(1, std::memory_order_release) == 1) {
    std::atomic_thread_fence(std::memory_order_acquire);
    delete this;
}
```
Is it safe to compare the refcount to 1, and immediately delete the object without an actual release operation? The reasoning is that the only time the refcount can be 1 is if the current thread has the only reference.
I’m not sure if `delete this` is ever safe; I know `this` isn’t allowed to be null. But you could rework this to be not a member function, so it’s `ptr->refcount` and `delete ptr`.
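A minimal sketch of that rework, assuming a hypothetical `Obj` type with the same refcount field. The `destroyed` member is a demo-only hook so deletion can be observed; it is not part of the question's code.

```cpp
#include <atomic>

struct Obj {
    std::atomic<int> refcount{1};
    bool* destroyed = nullptr;  // demo-only hook to observe deletion
    ~Obj() { if (destroyed) *destroyed = true; }
};

// Non-member version of the question's DecRef: no `delete this`;
// the caller passes the pointer explicitly.
void DecRef(Obj* ptr) {
    if (ptr->refcount.load(std::memory_order_relaxed) == 1 ||
        ptr->refcount.fetch_sub(1, std::memory_order_release) == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        delete ptr;
    }
}
```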
I think this has a problem if another thread can create another reference with `IncRef` while you’re in the middle of a `DecRef`. If you always did `refcount--`, their `fetch_add` would see the old value as `0` and know that another thread has already deleted, or is about to delete, the object, and that they were too late in trying to get a reference. Without that, they have no way to distinguish that case from simply being the second reference.

But if that’s impossible, e.g. because new references are created by an existing owner and then given to another thread, then yes, I think this is safe.
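A sketch of that always-decrement scheme, with hypothetical names: since `DecRef` unconditionally decrements, a racing incrementer can detect the dying-object case by seeing the old value `0`. (Recovering from that case is nontrivial; real designs typically use a CAS loop or a weak count, which this sketch omits.)

```cpp
#include <atomic>

struct Obj {
    std::atomic<int> refcount{1};
};

// Returns false if the object was already on its way to deletion.
// fetch_add returns the value *before* the increment, so seeing 0
// means some DecRef has already dropped the last reference.
bool TryIncRef(Obj* ptr) {
    if (ptr->refcount.fetch_add(1, std::memory_order_relaxed) == 0) {
        // Too late; a real implementation would have to back out here.
        return false;
    }
    return true;
}
```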
The `acquire` fence is necessary in case another thread just wrote to the object and then did a `DecRef`, to make sure the stuff they did to the object’s members happens-before the `delete`. And yes, using an `acquire` fence means the load can be `relaxed` instead of `acquire`, and the RMW can be just `release` rather than `acq_rel`. On some ISAs (like AArch64) it might be more efficient to use `acquire` and `acq_rel` operations and avoid a separate fence.
You’re optimizing for the last-reference case by avoiding an RMW there, at the cost of making other calls slower. That might or might not be good, depending on your use-case.
A couple extra instructions to load+branch before you RMW+branch, and that’s another branch that needs to be predicted correctly.
Potentially you get the cache line into MESI Shared state with the read-only access, but then the RMW also misses in cache and has to do a read-for-ownership (RFO) to get it into MESI Exclusive/Modified state, so you have two off-core communications, hopefully pipelined with each other if the `==1` branch predicts correctly. (Other than that, a load right before an RMW is nothing to worry about on typical CPUs.)