Concurrency Correctness (Cache) #42

Open
nhusung opened this issue Apr 28, 2023 · 1 comment
nhusung commented Apr 28, 2023

Shouldn't it be memory_order_acquire here:

const uint32_t s = atomic_load_explicit(s_bucket, memory_order_relaxed);

I'm not an expert on the C/C++ memory model, but as far as I understand it, the compiler or CPU is free to reorder the instructions roughly like this (i.e., move the loads of the bucket values before the load of the status):

    uint64_t bucket_a = bucket->a, bucket_b = bucket->b, bucket_c = bucket->c;
    const uint32_t s = atomic_load_explicit(s_bucket, memory_order_relaxed);
    // abort if locked or if part of a 2-part cache entry
    if (s & 0xc0000000) return 0;
    // abort if different hash
    if ((s ^ (hash>>32)) & 0x3fff0000) return 0;
    // abort if key different
    if (bucket_a != a || bucket_b != b || bucket_c != c) return 0;
    *res = bucket->res;
    // abort if status field changed after compiler_barrier()
    return atomic_load_explicit(s_bucket, memory_order_acquire) == s ? 1 : 0;

memory_order_acquire guarantees that “no reads or writes in the current thread can be reordered before this load,” which would fix the problem. In return one could probably change the memory_order_acquire of the second atomic load into memory_order_relaxed, at least I don’t see any need for memory_order_acquire here.

Also, atomic_compare_exchange_weak in the put functions could probably be replaced by atomic_compare_exchange_weak_explicit with memory_order_acq_rel for the success case and memory_order_relaxed for the failure case. This may have no effect on the generated assembly for strongly ordered architectures such as x86-64, but may have a performance impact on others. Furthermore, it would be great if the compiler_barrier() macro and the accompanying comments could be removed; they are somewhat confusing when reading the code for the first time.

@nhusung nhusung changed the title Concurrency Correctness Concurrency Correctness (Cache) Apr 28, 2023

nhusung commented Apr 30, 2023

In addition, the loads and stores here (and similarly for cache_get6() and cache_put6()) need to be atomic operations:

sylvan/src/sylvan_cache.c

Lines 182 to 183 in 0278500

if (bucket->a != a || bucket->b != b || bucket->c != c) return 0;
*res = bucket->res;

sylvan/src/sylvan_cache.c

Lines 209 to 212 in 0278500

bucket->a = a;
bucket->b = b;
bucket->c = c;
bucket->res = res;

Section 5.1.2.4 of the C23 language specification draft states that if “there is ambiguity about which side effect to a non-atomic object is visible, then there is a data race and the behavior is undefined.” Assume that thread T1 executes cache_get() up to the point immediately before the loads cited above (L182). Then thread T2 executes cache_put() for the same bucket up to the point immediately before the stores cited above (L209). Now, there is no proper happens-before relationship between the cited loads and stores. Therefore, the values visible to T1 are ambiguous and we have undefined behavior. As I read the language specification, it does not matter that this ambiguity is finally “resolved” by checking the status on line 185 in cache_get().
