Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce 'markedForEviction' state for the Item. #183

Closed
wants to merge 1 commit into from

Conversation

igchor
Copy link
Contributor

@igchor igchor commented Dec 15, 2022

It is similar to 'moving' but requires ref count to be 0.

An item which is marked for eviction causes all incRef() calls to that item to fail.

This will be used to ensure that once item is selected for eviction, no one can interfere and prevent the eviction from suceeding.

'markedForEviction' relies on the same 'exlusive' bit as the 'moving' state. To distinguish between those two states, 'moving' add 1 to the refCount. This is hidden from the user, so getRefCount() will not return that extra ref.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 15, 2022
@facebook-github-bot
Copy link
Contributor

@jiayuebao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@igchor igchor force-pushed the eviction_and_moving_bit branch from ba7cbd6 to bd92995 Compare January 9, 2023 18:40
@facebook-github-bot
Copy link
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@haowu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

try {
return ref_.incRef();
} catch (exception::RefcountOverflow& e) {
throw exception::RefcountOverflow(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an extra layer of try-catch here. I think the reason is that we want to use the return value of ref_.incRef for the predicate test instead of just exception.

@therealgymmy Is it okay to regress a bit with refcount over/underflow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I've added this rethrow is to keep the abstraction but if this does affect performance perhaps we could just build the message on RefCount.h level?

*
* An item can only be marked exclusive when `isInMMContainer` returns true.
* An item can only be marked exclusive when `isInMMContainer` returns true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking how we can make the docs better about these state transitions as now they are no longer just marking a bit. I suggest we do it in the following ways.

Taking markForEviction for as an example:

  1. What does the state mean. (An item is "for eviction" indicates the item is ready to be evicted.)
  2. When can an item enter this state. (An item can be marked for eviction when it has no access refcount and is both accessible and in MMContainer)
  3. What does it mean when an item is in the state. (An item is markedForEviction if its exclusive bit is set and accessRef is 0)
  4. What does leaving the state mean? (When unmakeForEviction, the exclusive bit of the item is unset. And does not depend on anything else).

Let's make sure that we touch these 4 points in both markEviction and markMoving.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rewritten the comments.

cachelib/allocator/Refcount.h Outdated Show resolved Hide resolved
cachelib/allocator/Refcount.h Outdated Show resolved Hide resolved
cachelib/allocator/CacheItem.h Show resolved Hide resolved
*
* An item can only be marked exclusive when `isInMMContainer` returns true.
* An item can only be marked exclusive when `isInMMContainer` returns true
* and item is not already exclusive nor moving and the ref count is 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just say it "does not have exclusive bit set and access ref count is 0", which implies it is not moving.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

cachelib/allocator/CacheItem.h Show resolved Hide resolved
cachelib/allocator/CacheItem.h Show resolved Hide resolved
cachelib/allocator/CacheItem.h Show resolved Hide resolved

/** This function attempts to mark item for eviction.
* Can only be called on the item that is moving.*/
bool markForEvictionWhenMoving() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like an item can be linked (inMMContainer) but not accessible and be marked as moving. Then enter forEviction via this function.
Is this a problem? Should we check isLinked in this function as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it does not matter whether the item isLinked at this point or not. However, I actually realised that markForEviction does not need to depend on isAccessible - this was just an optimization I implemented some time ago but it does not seem to matter anymore - I removed that check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we should do it the other way round: both makrForEviction and markMoving should check isAccessible.
Today, moving need the item to be in access container to succeed. (see moveRegularItem in CacheAllocator-inl.h).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we cannot require isAccessible for markMoving because markMoving is used also for SlabRelease and it should work for items not in the Access Container as well.

I can restore the check for markForEviction but as I said it's only an optimization. It's true that we require the item to be accessible in moveRegularItem but item can become inaccessible anytime after being marked as moving anyway (for example by concurrent insertOrReplace).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. We don't require accessible today in markExcluive (Basically mark moving).

Copy link
Contributor

@therealgymmy therealgymmy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@igchor: can you help me understand why we need this change?

Today the goal of preventing anyone else from accessing an item while it is being prepared for eviction is to remove it from accessContainer. Is this insufficient?

My general impression reading the code is that we're changing the responsibility of managing an Item's state (and the right to access an item) to be exclusively within CacheItem logic itself. I don't understand why we're making this change. The change itself also doesn't seem to have reduced the complexity of eviction/moving management.

cachelib/allocator/CacheItem.h Show resolved Hide resolved
* The following four functions are used to track whether or not
* an item is currently in the process of being moved. This happens during a
* slab rebalance or resize operation or during eviction.
* The following two functions correspond to whether or not an item is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking an item for eviction will prevent anyone from incrementing refcounts.

Can we achieve the same by removing this item from mmContainer and accessContainer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, however, as I mentioned in the previous comment removing from accessContainer results in much higher contention (if done under mmContainer lock).

* returns false and leaves refCount_ unmodified.
*/
template <typename P, typename F>
bool atomicUpdateValue(P&& predicate, F&& newValueF) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. Thanks for abstracting this out.

@igchor igchor force-pushed the eviction_and_moving_bit branch from bd92995 to 88309d4 Compare February 9, 2023 22:43
@facebook-github-bot
Copy link
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@haowu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

cachelib/allocator/CacheItem.h Show resolved Hide resolved

/** This function attempts to mark item for eviction.
* Can only be called on the item that is moving.*/
bool markForEvictionWhenMoving() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we should do it the other way round: both makrForEviction and markMoving should check isAccessible.
Today, moving need the item to be in access container to succeed. (see moveRegularItem in CacheAllocator-inl.h).

@igchor igchor force-pushed the eviction_and_moving_bit branch from 88309d4 to b6e7301 Compare February 17, 2023 01:03
@facebook-github-bot
Copy link
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@haowu14 haowu14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending back to you for questions. Maybe I miss something.
Thanks for addressing my previous comments.

} else {
// item is being evicted
return WriteHandle{};
}
return WriteHandle{it, *this};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


/** This function attempts to mark item for eviction.
* Can only be called on the item that is moving.*/
bool markForEvictionWhenMoving() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. We don't require accessible today in markExcluive (Basically mark moving).

cachelib/allocator/CacheItem.h Show resolved Hide resolved
@igchor igchor force-pushed the eviction_and_moving_bit branch from b6e7301 to 2460d1c Compare February 17, 2023 17:13
@facebook-github-bot
Copy link
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@haowu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@haowu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@therealgymmy therealgymmy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@igchor: I made another pass. LGTM overall. Comments inline.

Comment on lines +313 to +314
* An item can only be marked moving when `isInMMContainer` returns true
* and does not have `kExclusive` bit set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we still need to maintain isInMMContainer == true in later changes? I'm wondering if all threads can synchronize operations strictly on the item's refcount & flags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned above, I belive we still need this for SlabRelease but I'll I think about it.

Value exclusiveBitMask = getAdminRef<kExclusive>();

auto predicate = [linkedBitMask, exclusiveBitMask](const Value curValue) {
const bool linked = curValue & linkedBitMask;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: const bool unlinked = !(curValue & linkedBitMask);

easier to read if (unlinked || alreadyExclusive)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return (raw & getAdminRef<kExclusive>()) && ((raw & kAccessRefMask) != 0);
}

/** This function attempts to mark item for eviction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add comments to explain that markForEvictionWhenMoving can only succeed when this item isOnlyMoving == true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately, @igchor: I am wondering if the changes in this PR are introducing a form of "upgrade" and "write" locks to our cache item.

The analogy I have in mind:

  1. Multiple readers can access an item. (No exclusive bit set. IncRef() act as the way to getting hold of the read-lock, which is released when ItemHandle goes out of scope)
  2. Multiple readers and one upgrade can co-exist. (Exclusive bit set. Refcount > 1)
  3. Only one upgrade lock can be held at a time. (First thread setting exclusive bit wins)
  4. Upgrade lock can be upgraded to a write lock but only without any concurrent readers. (Exclusive bit set. Refcount == 1 -> Refcount == 0)
  5. Write lock can be held without an upgrade lock in place as long as no concurrent readers.
  6. Readers are blocked when write lock is held. (Refcount fails when exclusive bit is set)

Once I start thinking in terms of read/upgrade/write, I'm starting to appreciate the simplicity brought by this PR much more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a good analogy. The logic changes slightly once we introduce WaitContext for moving items (in that case, 'readers' wait for 'upgrade' to finish it's work), but the reasoning for 'upgrading' still holds.

We even had an idea of using the 'moving' bit and Wait Context to replace some existing synchronization for Chained Items, but we haven't done any implementation yet.

Comment on lines 393 to 395
// An item is only moving when its refcount is one and only the exclusive
// bit among all the control bits is set. This indicates an item is already
// on its way out of cache and does not need to be moved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still true? Considering we also require this state to mark an item for eviction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the last part is not true - I removed it. We should be able to extend this description in later PR (when we introduce Wait Context for moving items).

Comment on lines +488 to +498
if ((++nCASFailures % 4) == 0) {
// this pause takes up to 40 clock cycles on intel and the lock cmpxchgl
// above should take about 100 clock cycles. we pause once every 400
// cycles or so if we are extremely unlucky.
folly::asm_volatile_pause();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not relevant to this diff but I'm wondering if this is still necessarily. The 40 clock cycles and 100 clock cycles were some measurement from >5 years ago.

It also seems unlikely for this to contend so heavily in typical usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually had some discussion and did some measurements of this internally. I'll try to get the data and see if it makes sense to change this.

const bool alreadyExclusive = curValue & bitMask;
if (!flagSet || alreadyExclusive) {
auto predicate = [linkedBitMask, exclusiveBitMask](const Value curValue) {
const bool linked = curValue & linkedBitMask;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick again: notLinked = !(curValue & linkedBitMask)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* An item can only be marked exclusive when `isInMMContainer` returns true
* and the item is not yet marked as exclusive. This operation is atomic.
* An item can only be marked for eviction when `isInMMContainer` and
* `isAccessible` return true and item does not have `kExclusive` bit set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still true? We don't check for isAccessible here.

Separately, curious if we can remove the condition isInMMContainer == true down the road?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, forgot to update the comment. Done.

The requirement for isInMMContainer == true is mainly needed for SlabRelease. Since SlabRelease does not take MMContainer lock, checking whether item is still in the container is the only possible synchronization mechanism - I'm not sure we can change that.

It is similar to 'moving' but requires ref count to be 0.

An item which is marked for eviction causes all incRef() calls to
that item to fail.

This will be used to ensure that once item is selected for eviction,
no one can interfere and prevent the eviction from suceeding.

'markedForEviction' relies on the same 'exlusive' bit as the 'moving'
state. To distinguish between those two states, 'moving' add 1 to the
refCount. This is hidden from the user, so getRefCount() will not
return that extra ref.
@igchor igchor force-pushed the eviction_and_moving_bit branch from 2460d1c to b9b7db3 Compare February 22, 2023 22:21
@facebook-github-bot
Copy link
Contributor

@igchor has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@haowu14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@haowu14 merged this pull request in 0bc9f66.

@haowu14
Copy link
Contributor

haowu14 commented Mar 9, 2023

@igchor We are ready for the next PR! Thanks for working on this.

@igchor
Copy link
Contributor Author

igchor commented Mar 9, 2023

@haowu14 thanks! The next PR is coming soon!

facebook-github-bot pushed a commit that referenced this pull request Jun 5, 2023
Summary:
This is the next part of the 'critical section' patch after #183.

Original PR (some of the changes already upstreamed): #172

This PR contains changes to findEviction only. The remaining part (changes in SlabRelease code) will be provided in the next (final) PR.

Pull Request resolved: #217

Reviewed By: therealgymmy

Differential Revision: D45071460

Pulled By: haowu14

fbshipit-source-id: 4d44d75182537369a50e3c1ebb10a7c844449875
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants