Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistency for File State #374

Merged
merged 64 commits into from
Jan 15, 2025
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
64 commits
Select commit Hold shift + click to select a range
1350081
Delete file immediately (not async) in CachePurge
foodprocessor Nov 25, 2024
e58a516
Set test timeout to 1s.
foodprocessor Nov 25, 2024
e37f609
Remove unused deleteEvent channel
foodprocessor Nov 25, 2024
63f412d
Use cacheTimeoutMonitor even when timeout is zero.
foodprocessor Nov 26, 2024
3e71a0f
Do not use CachePurge for eviction, even when timeout is zero.
foodprocessor Nov 26, 2024
5d2e6af
fixup
foodprocessor Nov 26, 2024
8a6f792
Merge branch 'zero-timeout-evicts-all' into delete-files-synchronously
foodprocessor Nov 26, 2024
2b62986
Use fileLocks to keep file info coherent
foodprocessor Nov 26, 2024
1704b57
Merge remote-tracking branch 'origin/main' into delete-files-synchron…
foodprocessor Nov 26, 2024
77206e8
Remove locks in Chmod & FlushFile to avoid deadlock
foodprocessor Nov 27, 2024
383e8bf
Use object paths when locking files in RenameDir.
foodprocessor Nov 27, 2024
965676c
Close loopback file handles to avoid issues (especially on Windows).
foodprocessor Nov 27, 2024
4c77941
Prevent attribute data corruption
foodprocessor Nov 27, 2024
60321b5
Fix remaining race conditions detected in file cache
foodprocessor Nov 27, 2024
ff15366
Lock file to protect calls to stat.
foodprocessor Nov 27, 2024
4f3f95a
Merge branch 'fix-race-conditions' into delete-files-synchronously
foodprocessor Nov 27, 2024
7b45c80
Make timeout minimum 1s.
foodprocessor Nov 27, 2024
f810edf
Merge branch 'timeout-minimum-1s' into delete-files-synchronously
foodprocessor Nov 27, 2024
bcb288d
Reinstate test which was flaky due to a race condition
foodprocessor Nov 27, 2024
2ea9b61
Remove protection from stat call, which is not within the scope of th…
foodprocessor Nov 27, 2024
74e3769
Merge branch 'fix-race-conditions' into delete-files-synchronously
foodprocessor Nov 27, 2024
3fe1ede
Change test config default to default timeout (120s)
foodprocessor Nov 27, 2024
5b3a6a1
Slow down test check loops to 100ms
foodprocessor Nov 27, 2024
95f6228
Add custom configs for eviction tests.
foodprocessor Nov 27, 2024
f5707fb
Don't open to prevent eviction in tests (unless testing that mechanic).
foodprocessor Nov 27, 2024
0caeb2f
Open from cache when no object exists
foodprocessor Dec 21, 2024
16c8290
Rename downloadFile to openFileInternal
foodprocessor Dec 21, 2024
6eb909b
openFileInternal runs with flock already locked
foodprocessor Dec 21, 2024
787c623
Remove redundant code
foodprocessor Dec 21, 2024
a5b2e51
Replace lost flock.Inc
foodprocessor Dec 21, 2024
2d0e64a
Add test to verify that OpenFile changes what GetAttr returns when op…
foodprocessor Dec 21, 2024
730c271
Do not use &internal.ObjAttr when you mean nil!
foodprocessor Dec 21, 2024
b0c87be
Merge remote-tracking branch 'main' into delete-files-synchronously
foodprocessor Dec 31, 2024
b3136ef
Merge branch 'local-files-open-right-away' into delete-files-synchron…
foodprocessor Dec 31, 2024
1c83708
Add lazy open state in flock, since it affects the whole file's state…
foodprocessor Jan 1, 2025
d59c324
Refactor isDownloadRequired, and use LazyOpen state variable to allow…
foodprocessor Jan 1, 2025
d9d2845
Refactor openFile
foodprocessor Jan 1, 2025
e21dcc1
Update file state in CreateFile
foodprocessor Jan 1, 2025
80fee42
Use O_CREAT flag when testing creation with OpenFile.
foodprocessor Jan 1, 2025
dc7fc99
Merge branch 'local-files-open-right-away' into refactor-openFile
foodprocessor Jan 1, 2025
78be394
Merge branch 'refactor-openFile' into delete-files-synchronously
foodprocessor Jan 1, 2025
b0cbc15
Update state in DeleteFile
foodprocessor Jan 1, 2025
ffd5aea
Update file state in CachePurge (and not speculatively outside of it)
foodprocessor Jan 1, 2025
2ffa0cb
Improve openFile file state logic
foodprocessor Jan 1, 2025
9f47003
Protect file state in Close and Flush (and Sync by consequence)
foodprocessor Jan 1, 2025
094417d
make flushFileInternal update the file state itself
foodprocessor Jan 1, 2025
5e1a000
make renameCachedFile protect and update file states
foodprocessor Jan 1, 2025
80d659c
When locking more that one file, always go in lexical order to preven…
foodprocessor Jan 2, 2025
143e7c4
Add TODOs for directory operations
foodprocessor Jan 2, 2025
4841a19
Lock all files before renaming or deleting a whole directory in one o…
foodprocessor Jan 2, 2025
56a509c
Merge branch 'delete-files-synchronously' into maintain-file-state
foodprocessor Jan 2, 2025
d6a3c72
Make CachePurge update the file state
foodprocessor Jan 2, 2025
f712e7f
Remove file state flags which are never read, and the complexity they…
foodprocessor Jan 2, 2025
ccaca08
Protect file state in GetAttr and Chmod
foodprocessor Jan 2, 2025
b519f5b
Remove unused flags in platform-specific file cache files
foodprocessor Jan 2, 2025
1defe41
Avoid deadlock when flushFileInternal calls Chmod
foodprocessor Jan 2, 2025
74ed602
Don't use base name to find the object name when WalkDir is recursive
foodprocessor Jan 2, 2025
681b140
Don't search and lock files in DeleteDir, since it is only called on …
foodprocessor Jan 7, 2025
c23d36b
Simplify DeleteDir, since the directory being deleted is empty
foodprocessor Jan 7, 2025
6655269
Revert accidental code comment change
foodprocessor Jan 2, 2025
a45a761
Merge branch 'main' into delete-files-synchronously
foodprocessor Jan 11, 2025
077d979
Move helper functions to where they are used
foodprocessor Jan 11, 2025
0cefa32
Remove deadlock bug from incorrect branch merge
foodprocessor Jan 14, 2025
64997fd
Fix open handle count issue with Rename
jfantinhardesty Jan 15, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion component/file_cache/cache_policy.go
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ type cachePolicy interface {
UpdateConfig(cachePolicyConfig) error

CacheValid(name string) // Mark the file as hit
CachePurge(name string) // Schedule the file for deletion
CachePurge(name string) // Delete the file from cache

IsCached(name string) bool // Whether or not the cache policy considers this file cached

Expand Down
71 changes: 40 additions & 31 deletions component/file_cache/file_cache.go
Original file line number Diff line number Diff line change
Expand Up @@ -442,7 +442,11 @@ func (fc *FileCache) invalidateDirectory(name string) {
if err == nil && d != nil {
if !d.IsDir() {
log.Debug("FileCache::invalidateDirectory : removing file %s from cache", path)
objPath := common.JoinUnixFilepath(name, d.Name())
flock := fc.fileLocks.Get(objPath)
flock.Lock()
fc.policy.CachePurge(path)
flock.Unlock()
} else {
// remember to delete the directory later (after its children)
directoriesToPurge = append(directoriesToPurge, path)
Expand Down Expand Up @@ -541,14 +545,19 @@ func (fc *FileCache) StreamDir(options internal.StreamDirOptions) ([]*internal.O
if token == "" {
for _, entry := range dirents {
entryPath := common.JoinUnixFilepath(options.Name, entry.Name())
if !entry.IsDir() && !fc.fileLocks.Locked(entryPath) {
if !entry.IsDir() {
// This is an overhead for streamdir for now
// As list is paginated we have no way to know whether this particular item exists both in local cache
// and container or not. So we rely on getAttr to tell if entry was cached then it exists in cloud storage too
// If entry does not exists on storage then only return a local item here.
_, err := fc.NextComponent().GetAttr(internal.GetAttrOptions{Name: entryPath})
if err != nil && (err == syscall.ENOENT || os.IsNotExist(err)) {
info, err := entry.Info() // Grab local cache attributes
// get the lock on the file, to allow any pending operation to complete
flock := fc.fileLocks.Get(entryPath)
flock.Lock()
// use os.Stat instead of entry.Info() to be sure we get good info (with flock locked)
info, err := os.Stat(filepath.Join(localPath, entry.Name())) // Grab local cache attributes
flock.Unlock()
// If local file is not locked then only use its attributes otherwise rely on container attributes
if err == nil {
// Case 2 (file only in local cache) so create a new attributes and add them to the storage attributes
Expand Down Expand Up @@ -621,7 +630,13 @@ func (fc *FileCache) RenameDir(options internal.RenameDirOptions) error {
newPath := strings.Replace(path, localSrcPath, localDstPath, 1)
if !d.IsDir() {
log.Debug("FileCache::RenameDir : Renaming local file %s -> %s", path, newPath)
srcFlock := fc.fileLocks.Get(path)
dstFlock := fc.fileLocks.Get(newPath)
srcFlock.Lock()
dstFlock.Lock()
fc.renameCachedFile(path, newPath)
srcFlock.Unlock()
dstFlock.Unlock()
} else {
log.Debug("FileCache::RenameDir : Creating local destination directory %s", newPath)
// create the new directory
Expand Down Expand Up @@ -755,7 +770,6 @@ func (fc *FileCache) validateStorageError(path string, err error, method string,
return nil
}

// DeleteFile: Invalidate the file in local cache.
func (fc *FileCache) DeleteFile(options internal.DeleteFileOptions) error {
log.Trace("FileCache::DeleteFile : name=%s", options.Name)

Expand All @@ -771,11 +785,6 @@ func (fc *FileCache) DeleteFile(options internal.DeleteFileOptions) error {
}

localPath := filepath.Join(fc.tmpPath, options.Name)
err = deleteFile(localPath)
if err != nil && !os.IsNotExist(err) {
log.Err("FileCache::DeleteFile : failed to delete local file %s [%s]", localPath, err.Error())
}

fc.policy.CachePurge(localPath)

return nil
Expand Down Expand Up @@ -1026,12 +1035,6 @@ func (fc *FileCache) closeFileInternal(options internal.CloseFileOptions, flock
if options.Handle.Fsynced() {
log.Trace("FileCache::closeFileInternal : fsync/sync op, purging %s", options.Handle.Path)
localPath := filepath.Join(fc.tmpPath, options.Handle.Path)

err = deleteFile(localPath)
if err != nil && !os.IsNotExist(err) {
log.Err("FileCache::closeFileInternal : failed to delete local file %s [%s]", localPath, err.Error())
}

fc.policy.CachePurge(localPath)
return nil
}
Expand Down Expand Up @@ -1173,6 +1176,11 @@ func (fc *FileCache) FlushFile(options internal.FlushFileOptions) error {
//defer exectime.StatTimeCurrentBlock("FileCache::FlushFile")()
log.Trace("FileCache::FlushFile : handle=%d, path=%s", options.Handle.ID, options.Handle.Path)

// lock the file state
flock := fc.fileLocks.Get(options.Handle.Path)
flock.Lock()
defer flock.Unlock()

// The file should already be in the cache since CreateFile/OpenFile was called before and a shared lock was acquired.
localPath := filepath.Join(fc.tmpPath, options.Handle.Path)
fc.policy.CacheValid(localPath)
Expand Down Expand Up @@ -1312,28 +1320,20 @@ func (fc *FileCache) GetAttr(options internal.GetAttrOptions) (*internal.ObjAttr

// To cover cases 2 and 3, grab the attributes from the local cache
localPath := filepath.Join(fc.tmpPath, options.Name)
flock := fc.fileLocks.Get(options.Name)
flock.Lock()
info, err := os.Stat(localPath)
flock.Unlock()
// All directory operations are guaranteed to be synced with storage so they cannot be in a case 2 or 3 state.
if err == nil && !info.IsDir() {
if exists { // Case 3 (file in cloud storage and in local cache) so update the relevant attributes
// Return from local cache only if file is not under download or deletion
// If file is under download then taking size or mod time from it will be incorrect.
if !fc.fileLocks.Locked(options.Name) {
log.Debug("FileCache::GetAttr : updating %s from local cache", options.Name)
attrs.Size = info.Size()
attrs.Mtime = info.ModTime()
} else {
log.Debug("FileCache::GetAttr : %s is locked, use storage attributes", options.Name)
}
log.Debug("FileCache::GetAttr : updating %s from local cache", options.Name)
attrs.Size = info.Size()
attrs.Mtime = info.ModTime()
} else { // Case 2 (file only in local cache) so create a new attributes and add them to the storage attributes
if !strings.Contains(localPath, fc.tmpPath) {
// Here if the path is going out of the temp directory then return ENOENT
exists = false
} else {
log.Debug("FileCache::GetAttr : serving %s attr from local cache", options.Name)
exists = true
attrs = newObjAttr(options.Name, info)
}
log.Debug("FileCache::GetAttr : serving %s attr from local cache", options.Name)
exists = true
attrs = newObjAttr(options.Name, info)
}
}

Expand Down Expand Up @@ -1375,6 +1375,7 @@ func (fc *FileCache) RenameFile(options internal.RenameFileOptions) error {
return nil
}

// both src and dst file locks should be locked before calling renameCachedFile
func (fc *FileCache) renameCachedFile(localSrcPath string, localDstPath string) {
err := os.Rename(localSrcPath, localDstPath)
if err != nil {
Expand Down Expand Up @@ -1453,6 +1454,10 @@ func (fc *FileCache) TruncateFile(options internal.TruncateFileOptions) error {
func (fc *FileCache) Chmod(options internal.ChmodOptions) error {
log.Trace("FileCache::Chmod : Change mode of path %s", options.Name)

flock := fc.fileLocks.Get(options.Name)
flock.Lock()
defer flock.Unlock()

// Update the file in cloud storage
err := fc.NextComponent().Chmod(options)
err = fc.validateStorageError(options.Name, err, "Chmod", false)
Expand Down Expand Up @@ -1487,6 +1492,10 @@ func (fc *FileCache) Chmod(options internal.ChmodOptions) error {
func (fc *FileCache) Chown(options internal.ChownOptions) error {
log.Trace("FileCache::Chown : Change owner of path %s", options.Name)

flock := fc.fileLocks.Get(options.Name)
flock.Lock()
defer flock.Unlock()

// Update the file in cloud storage
err := fc.NextComponent().Chown(options)
err = fc.validateStorageError(options.Name, err, "Chown", false)
Expand Down
30 changes: 7 additions & 23 deletions component/file_cache/file_cache_linux_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ func (suite *fileCacheLinuxTestSuite) SetupTest() {
rand := randomString(8)
suite.cache_path = common.JoinUnixFilepath(home_dir, "file_cache"+rand)
suite.fake_storage_path = common.JoinUnixFilepath(home_dir, "fake_storage"+rand)
defaultConfig := fmt.Sprintf("file_cache:\n path: %s\n offload-io: true\n timeout-sec: 0\n\nloopbackfs:\n path: %s", suite.cache_path, suite.fake_storage_path)
defaultConfig := fmt.Sprintf("file_cache:\n path: %s\n offload-io: true\n timeout-sec: 1\n\nloopbackfs:\n path: %s", suite.cache_path, suite.fake_storage_path)
jfantinhardesty marked this conversation as resolved.
Show resolved Hide resolved
log.Debug(defaultConfig)

// Delete the temp directories created
Expand Down Expand Up @@ -111,23 +111,15 @@ func (suite *fileCacheLinuxTestSuite) cleanupTest() {

func (suite *fileCacheLinuxTestSuite) TestChmodNotInCache() {
defer suite.cleanupTest()
// Setup
// Setup - create file directly in fake storage
path := "file33"
handle, _ := suite.fileCache.CreateFile(internal.CreateFileOptions{Name: path, Mode: 0777})
suite.fileCache.CloseFile(internal.CloseFileOptions{Handle: handle})

_, err := os.Stat(suite.cache_path + "/" + path)
for i := 0; i < 1000 && !os.IsNotExist(err); i++ {
time.Sleep(10 * time.Millisecond)
_, err = os.Stat(suite.cache_path + "/" + path)
}
suite.assert.True(os.IsNotExist(err))
suite.loopback.CreateFile(internal.CreateFileOptions{Name: path, Mode: 0777})

// Path should be in fake storage
suite.assert.FileExists(suite.fake_storage_path + "/" + path)

// Chmod
err = suite.fileCache.Chmod(internal.ChmodOptions{Name: path, Mode: os.FileMode(0666)})
err := suite.fileCache.Chmod(internal.ChmodOptions{Name: path, Mode: os.FileMode(0666)})
suite.assert.NoError(err)

// Path in fake storage should be updated
Expand Down Expand Up @@ -178,7 +170,6 @@ func (suite *fileCacheLinuxTestSuite) TestChmodCase2() {
err = suite.fileCache.FlushFile(internal.FlushFileOptions{Handle: createHandle})
suite.assert.NoError(err)

// Path should be in the file cache with old mode (since we failed the operation)
info, err := os.Stat(suite.cache_path + "/" + path)
suite.assert.NoError(err)
suite.assert.EqualValues(info.Mode(), newMode)
Expand All @@ -187,6 +178,7 @@ func (suite *fileCacheLinuxTestSuite) TestChmodCase2() {
suite.assert.NoError(err)

// loop until file does not exist - done due to async nature of eviction
jfantinhardesty marked this conversation as resolved.
Show resolved Hide resolved
time.Sleep(time.Second)
_, err = os.Stat(suite.cache_path + "/" + path)
for i := 0; i < 1000 && !os.IsNotExist(err); i++ {
time.Sleep(10 * time.Millisecond)
Expand All @@ -206,23 +198,15 @@ func (suite *fileCacheLinuxTestSuite) TestChownNotInCache() {
defer suite.cleanupTest()
// Setup
path := "file36"
handle, _ := suite.fileCache.CreateFile(internal.CreateFileOptions{Name: path, Mode: 0777})
suite.fileCache.CloseFile(internal.CloseFileOptions{Handle: handle})

_, err := os.Stat(suite.cache_path + "/" + path)
for i := 0; i < 1000 && !os.IsNotExist(err); i++ {
time.Sleep(10 * time.Millisecond)
_, err = os.Stat(suite.cache_path + "/" + path)
}
suite.assert.True(os.IsNotExist(err))
suite.loopback.CreateFile(internal.CreateFileOptions{Name: path, Mode: 0777})

// Path should be in fake storage
suite.assert.FileExists(suite.fake_storage_path + "/" + path)

// Chown
owner := os.Getuid()
group := os.Getgid()
err = suite.fileCache.Chown(internal.ChownOptions{Name: path, Owner: owner, Group: group})
err := suite.fileCache.Chown(internal.ChownOptions{Name: path, Owner: owner, Group: group})
suite.assert.NoError(err)

// Path in fake storage should be updated
Expand Down
Loading
Loading