Do we need a memory acquire barrier for one-shot spinlocks?

For locks (spin locks, mutexes), we usually need to add an acquire fence when locking to make sure the lock actually works.

But is this necessary for one-shot spin locks? For example:

#include <assert.h>
#include <stdatomic.h>

int val = 0;
atomic_int lock = 0;

void thread0(void)
{
    int tmp = 0;
    if (atomic_compare_exchange_strong_explicit(&lock, &tmp, 1, memory_order_relaxed, memory_order_relaxed)) { // do we need memory_order_acquire here ?
        assert(!val); // will it always succeed?
        val = 1;
    }
}

// same as thread0
void thread1(void)
{
    int tmp = 0;
    if (atomic_compare_exchange_strong_explicit(&lock, &tmp, 1, memory_order_relaxed, memory_order_relaxed)) {
        assert(!val);
        val = 1;
    }
}

More specifically, is the following code correct on the ARMv7-A architecture? (It may differ slightly from the C code above.)

val:
    .long 0
lock:
    .long 0

core0:
    ldr r0, =val
    ldr r1, =lock
    mov r4, #1
2:
    ldrex r2, [r1]
    cmp r2, #0
    beq 1f
    bx  lr  // ret
1:
    strex r3, r4, [r1]
    cmp r3, #0
    bne 2b

    // without acquire fence
    ldr r5, [r0] // is r5 != 0 allowed?



core1:
    ldr r0, =val
    ldr r1, =lock
    mov r4, #1
2:
    ldrex r2, [r1]
    cmp r2, #0
    beq 1f
    bx  lr  // ret
1:
    strex r3, r4, [r1]
    cmp r3, #0
    bne 2b

    dmb ish  // acquire fence
    str r4, [r0]  // store 1

A more specific example (“do work” and “do the clean” must not race):

#include <assert.h>
#include <stdatomic.h>

#define EXIT_FLAG 1
#define WORK_FLAG 2

atomic_int state = 0;

void thread0(void)
{
    int tmp;
    while (1) {
        tmp = 0;
        if (!atomic_compare_exchange_strong_explicit(&state, &tmp, WORK_FLAG, memory_order_relaxed, memory_order_relaxed)) { // do we need acquire here?
            assert(tmp == EXIT_FLAG);
            return;
        }

        // do work

        tmp = WORK_FLAG;
        if (!atomic_compare_exchange_strong_explicit(&state, &tmp, 0, memory_order_release, memory_order_relaxed)) {
            assert(tmp == (EXIT_FLAG | WORK_FLAG));
            // do the clean
            return;
        }
    }
}

void thread1(void)
{
    int tmp = 0;

    while (1) {
        if (atomic_compare_exchange_strong_explicit(&state, &tmp, tmp | EXIT_FLAG, memory_order_acquire, memory_order_relaxed)) // we need acquire here to pair with the release in thread0
            break;
    }

    if (!(tmp & WORK_FLAG)) {
        // do the clean
    }
}

First code block

Both threads do
if (lock.CAS_strong(0, 1, relaxed)) { assert(val == 0); val = 1; }

The assert (in the first code block) always succeeds because only one of the two threads ever runs the if body, and in that thread the assert is sequenced before val=1.

The atomic RMW decides which thread wins the race to be that thread, but it doesn’t need to sync with any previous writer to prevent overlap of their “critical sections”. (https://preshing.com/20120913/acquire-and-release-semantics/)

You don’t have a critical section. The load of val could have happened before the CAS, but that would still be fine because you’d still load the initial value then. And there’s no release store so nothing lets other threads know that your val update is complete.
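To make this concrete, here is a minimal sketch, assuming POSIX threads, that runs the first code block's logic in two threads with relaxed ordering on both CAS paths and counts how many threads enter the if body. The names `contender`, `winners` and `run_round` are hypothetical and do not come from the question. The atomic RMW by itself should guarantee exactly one winner, with no acquire involved.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int val;            /* plain, non-atomic, as in the question */
static atomic_int lock;    /* one-shot flag: 0 = free, 1 = claimed */
static atomic_int winners; /* hypothetical: how many threads ran the if body */

/* Body of thread0/thread1 from the first code block. */
static void *contender(void *arg)
{
    (void)arg;
    int tmp = 0;
    /* Relaxed on both paths: the RMW alone picks a single winner. */
    if (atomic_compare_exchange_strong_explicit(&lock, &tmp, 1,
                                                memory_order_relaxed,
                                                memory_order_relaxed)) {
        assert(!val); /* only the winner ever reads or writes val */
        val = 1;
        atomic_fetch_add_explicit(&winners, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Runs one round with two contending threads; returns the number of winners. */
int run_round(void)
{
    pthread_t t0, t1;
    val = 0;                 /* reset before the threads are created */
    atomic_store(&lock, 0);
    atomic_store(&winners, 0);
    pthread_create(&t0, NULL, contender, NULL);
    pthread_create(&t1, NULL, contender, NULL);
    pthread_join(t0, NULL);  /* join synchronizes, so reading val afterwards is safe */
    pthread_join(t1, NULL);
    return atomic_load(&winners);
}
```

Each call to run_round() should return 1 and leave val == 1. Build with -pthread.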

I wouldn’t call it a one-shot spinlock; that name has potentially misleading implications, such as that you’ll sync-with the lock variable and that other threads can see when you’re done. (BTW, lock needs to be atomic_int, aka _Atomic int, not plain int.)

This is a bit like the guard variable for a static variable with a non-constant initializer, which is one-shot for the whole program. Unfortunately, that does still need acquire, unless you can distinguish the case where another thread very recently finished the init from cases where we’ve already synced-with the init. One option might be a third value for the guard variable plus something like a global all-threads membarrier() system call that runs a barrier on all cores / threads.
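Here is a sketch of the guard-variable idea. It assumes the three-value scheme mentioned above (0 = not started, 1 = in progress, 2 = done) and does not attempt the membarrier() trick. The names `guard`, `config`, `compute_config` and `get_config` are hypothetical. The claiming CAS can stay relaxed, just like the first code block. Acquire is still needed on the fast path and while waiting, because those threads read `config` that another thread wrote.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical three-state guard: 0 = not started, 1 = in progress, 2 = done. */
enum { GUARD_UNINIT = 0, GUARD_BUSY = 1, GUARD_DONE = 2 };

static atomic_int guard = GUARD_UNINIT;
static int config;       /* plain data published through the guard */

/* Stand-in for a non-constant initializer. */
static int compute_config(void) { return 42; }

int get_config(void)
{
    /* Fast path: this acquire pairs with the release store below, so seeing
       GUARD_DONE guarantees the write to config is visible too. */
    if (atomic_load_explicit(&guard, memory_order_acquire) == GUARD_DONE)
        return config;

    int expected = GUARD_UNINIT;
    if (atomic_compare_exchange_strong_explicit(&guard, &expected, GUARD_BUSY,
                                                memory_order_relaxed,
                                                memory_order_relaxed)) {
        /* We won the one-shot race: initialize, then publish with release. */
        config = compute_config();
        atomic_store_explicit(&guard, GUARD_DONE, memory_order_release);
    } else {
        /* Another thread is initializing: wait for it, with acquire. */
        while (atomic_load_explicit(&guard, memory_order_acquire) != GUARD_DONE)
            ; /* spin */
    }
    return config;
}
```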

Second code block

The new code has one thread running an infinite loop around try_lock() / do work / unlock. The other thread spins on a CAS(acquire) to set another bit, which makes the first thread stop and syncs with its release store.

But there are no shared variables other than atomic_int state, so there’s no difference between relaxed and release/acquire. Operations on a single object don’t reorder with each other within the same thread, even when they are relaxed. seq_cst would forbid store-forwarding, where one thread sees its own stores before they’re globally visible (to all threads).

The first thread’s loop will exit (via one if or the other) on its first failed CAS_strong, i.e. as soon as the second thread succeeds at its CAS to set the second bit (EXIT_FLAG).

This isn’t a locking algorithm: the second thread can succeed at setting EXIT_FLAG while the first thread is inside its “critical section”, i.e. while WORK_FLAG is set in state. No strengthening of the memory_order parameter to any of the operations can change this.

Without acquire in the first thread’s CAS that “takes the lock” (setting WORK_FLAG), later operations can become visible to other threads before state changes. That’s a problem for a normal lock, but this is far enough from being a lock that it’s not obvious what exactly “do work” and “do the clean” are supposed to be.

Only one thread will run do the clean; either the first or second thread depending on when the second thread succeeds at a CAS.
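The interleaving described in this answer can be replayed deterministically on a single thread. The sketch below (the `replay` struct and `replay_exit_during_work` are hypothetical names) performs the question's CAS operations in a fixed order. First, thread0 sets WORK_FLAG. Next come thread1's CAS attempts. Last, thread0 tries to clear WORK_FLAG. The replay shows that thread1's CAS succeeds while WORK_FLAG is still set, so the scheme gives no mutual exclusion. It also shows that thread0 is the one that does the clean in this order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define EXIT_FLAG 1
#define WORK_FLAG 2

/* Hypothetical summary of one replayed interleaving. */
struct replay {
    bool exit_set_while_working; /* thread1's CAS succeeded with WORK_FLAG set */
    bool thread0_cleans;
    bool thread1_cleans;
};

/* Replays on a single thread: thread0 sets WORK_FLAG, then thread1 sets
   EXIT_FLAG (during "do work"), then thread0 tries to clear WORK_FLAG. */
struct replay replay_exit_during_work(void)
{
    struct replay r = {false, false, false};
    atomic_int state;
    atomic_init(&state, 0);

    /* thread0 "takes the lock": 0 -> WORK_FLAG. */
    int t0 = 0;
    atomic_compare_exchange_strong_explicit(&state, &t0, WORK_FLAG,
        memory_order_relaxed, memory_order_relaxed);

    /* thread1's loop: the first CAS (expecting 0) fails and loads WORK_FLAG
       into t1; the retry then succeeds even though WORK_FLAG is still set. */
    int t1 = 0;
    while (!atomic_compare_exchange_strong_explicit(&state, &t1, t1 | EXIT_FLAG,
               memory_order_acquire, memory_order_relaxed))
        ;
    r.exit_set_while_working = (t1 & WORK_FLAG) != 0;
    r.thread1_cleans = !(t1 & WORK_FLAG);

    /* thread0 finishes "do work"; WORK_FLAG -> 0 now fails and loads both bits. */
    t0 = WORK_FLAG;
    if (!atomic_compare_exchange_strong_explicit(&state, &t0, 0,
            memory_order_release, memory_order_relaxed))
        r.thread0_cleans = (t0 == (EXIT_FLAG | WORK_FLAG));
    return r;
}
```

Strengthening any of the memory_order arguments here would not change the outcome, because the result depends only on the order of modifications to state.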

Do we need a memory acquire barrier for one-shot spinlocks?

TL;DR: Yes.

It looks like your “one-shot” is meant to describe a device in which you try only once to acquire the lock represented by an atomic object, possibly failing, instead of retrying until succeeding. But retrying is what the “spin” part of “spinlock” describes, so your “one-shot spinlock” is an oxymoron.

The question seems odd in another way, too. If you accept, as you seem to do, that you need acquire or stronger memory semantics on locking for a regular spinlock to have its intended effect (combined with release or stronger when unlocking), then why do you have any doubt about needing the same for your one-shot device? After all, a regular spinlock might well succeed on its first locking attempt, and that’s equivalent to the case where your one-shot device succeeds. That the one-shot device might also fail does not suggest that it could get away with weaker memory semantics in the success case.

In particular, a release store by thread T₁ ensures that all memory operations by T₁ that happen before the store are visible in each thread T₂ that performs an acquire load that observes the release store or another operation that happens after it. Relaxed loads and stores do not provide those guarantees. The memory ordering semantics are about the visibility of non-atomic memory operations, not about the visibility of accesses to the atomic object. There may be special cases where it makes sense to perform locking that does not provide memory-ordering semantics for non-atomic memory operations, but those are atypical.
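The release/acquire guarantee described here is the classic message-passing pattern. Below is a minimal sketch, assuming POSIX threads. The names `payload`, `ready`, `producer`, `consumer` and `message_pass` are illustrative. The producer writes plain data and then does a release store. Once the consumer's acquire load observes that store, its plain read of `payload` must see 123.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;       /* non-atomic data */
static atomic_int ready;  /* publication flag */

static void *producer(void *arg)
{
    (void)arg;
    payload = 123;                                          /* plain write ... */
    atomic_store_explicit(&ready, 1, memory_order_release); /* ... then publish */
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ; /* spin until the release store is observed */
    *out = payload; /* synchronized-with the producer, so this reads 123 */
    return NULL;
}

/* Runs one producer/consumer round and returns what the consumer saw. */
int message_pass(void)
{
    pthread_t p, c;
    int seen = -1;
    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

If both orderings were relaxed, the consumer's read of payload would be a data race, and the program's behavior would be undefined.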

You don’t necessarily need a full barrier / fence per se, however, at least as far as C semantics are concerned. You need only a release paired with an acquire on the lock object. The C stdatomic functions do not assume that this will require fences, though it might be implemented with fences on a given target architecture.
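Standalone fences can also provide the required ordering. The lock below, with hypothetical names `fence_lock`, `try_lock_fenced` and `unlock_fenced`, uses a relaxed CAS followed by atomic_thread_fence(memory_order_acquire) on success. That pairs with a release fence placed before a relaxed store in the unlock. Under C11's fence rules, this pairing should give the same synchronizes-with relationship as an acquire CAS paired with a release store. It is roughly the shape of the question's ARMv7 listing, where `dmb ish` follows the ldrex/strex loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int fence_lock; /* hypothetical lock word: 0 = free, 1 = held */

/* Relaxed CAS; only on success, an acquire fence orders later accesses
   after the successful read of the lock word. */
bool try_lock_fenced(void)
{
    int expected = 0;
    if (!atomic_compare_exchange_strong_explicit(&fence_lock, &expected, 1,
                                                 memory_order_relaxed,
                                                 memory_order_relaxed))
        return false;        /* failure: no ordering needed */
    atomic_thread_fence(memory_order_acquire);
    return true;
}

/* Release fence before a relaxed store: earlier accesses in the critical
   section become visible to whoever next acquires the lock. */
void unlock_fenced(void)
{
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&fence_lock, 0, memory_order_relaxed);
}
```

On ARMv7, compilers typically emit a `dmb` both for an acquire fence and after an acquire CAS, so the two forms will often produce similar code. In C, however, a fence orders more than just the lock object.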

The new issue in your one-shot case is what memory semantics you need or want when acquiring the lock fails. Relaxed ordering should be fine for most purposes in the failure case, because you must then assume that it is unsafe to access any data protected by the lock anyway. Moreover, the C interfaces do not allow you to specify release or acquire / release for the failure case, which makes sense because no store to the atomic object happens in that case. C also forbids the failure ordering from being stronger than the success ordering. So with respect to the C interfaces, the only viable alternatives that avoid sequentially consistent ordering are acquire or acquire / release on success, combined with acquire or relaxed on failure.

The weakest memory ordering that works for standard locking semantics is:

#include <stdatomic.h>

_Atomic int lock = 0;

/*
 * Attempts to acquire the lock, returning 1 on success and 0 on failure.
 * Success affects memory with acquire semantics.
 * Failure has relaxed memory semantics.
 */
_Bool try_lock(void) {
    int tmp = 0;
    return atomic_compare_exchange_strong_explicit(
        &lock, &tmp, 1, memory_order_acquire, memory_order_relaxed);
}

/*
 * Unconditionally releases the lock, affecting memory with release
 * semantics.  Should be called only by a thread that currently holds
 * the lock.
 */
void release_lock(void) {
    atomic_store_explicit(&lock, 0, memory_order_release);
}
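Here is a usage sketch for try_lock() and release_lock(), assuming POSIX threads. Two threads each increment a plain `counter` ITERS times. Each thread retries try_lock() until it succeeds, so the caller supplies the spinning. The block repeats the two functions so that it compiles on its own. The names `worker`, `counter`, `ITERS` and `run_workers` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* try_lock() and release_lock() as given in the answer. */
_Atomic int lock = 0;

_Bool try_lock(void)
{
    int tmp = 0;
    return atomic_compare_exchange_strong_explicit(
        &lock, &tmp, 1, memory_order_acquire, memory_order_relaxed);
}

void release_lock(void)
{
    atomic_store_explicit(&lock, 0, memory_order_release);
}

/* Hypothetical demo: a plain counter protected only by the lock. */
#define ITERS 20000
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        while (!try_lock())
            ;           /* retry: this loop is the "spin" */
        counter++;      /* critical section: non-atomic access */
        release_lock();
    }
    return NULL;
}

/* Runs two workers to completion and returns the final count. */
long run_workers(void)
{
    pthread_t a, b;
    counter = 0;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

Each increment sits between an acquire (on a successful try_lock()) and a release (in release_lock()), so the final count should be exactly 2 * ITERS. With relaxed ordering on success, the increments would be a data race.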

Filed under: Programming knowledge @ 21:32

Tags: c, assembly, arm, atomic, memory-barriers
