This is a small Python (3.10.0) project that synchronises a source directory and a destination directory using a client and a server that communicate over IP.
This was developed and tested on Windows 11, so I cannot verify whether it works on Linux.
To run the program, run main.py and specify the existing source and destination folders as the --src and --dst arguments respectively, such as:
python .\src\main.py --src=test_components/src/ --dst=test_components/dst
To run the behavioural tests:
behave .\test\features\
The client component recursively scans the source directory and retrieves the path of every file found. The client then iterates over the list of paths, and determines if the files are:
- New - the path hasn't been saved by the client yet and thus hasn't been synchronised.
- Modified - the path has been seen already, and the file's modification timestamp is more recent than the timestamp of the last scan.
- Deleted - a previously found file is not present in the list of found paths within the directory.
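The three rules above can be sketched roughly as follows. This is only an illustration and not the project's actual code: the function names `scan` and `classify`, and the idea of keeping the known paths in a set, are assumptions made for the example.

```python
from pathlib import Path


def scan(root: Path) -> dict[str, float]:
    """Map each file's path (relative to root) to its modification time."""
    return {
        p.relative_to(root).as_posix(): p.stat().st_mtime
        for p in root.rglob("*")
        if p.is_file()
    }


def classify(known: set[str], found: dict[str, float], last_scan: float):
    """Split paths into new, modified and deleted lists.

    known     - paths the client has already saved (hypothetical storage)
    found     - result of scan() for the current pass
    last_scan - timestamp of the previous scan
    """
    new = sorted(p for p in found if p not in known)
    modified = sorted(p for p in found if p in known and found[p] > last_scan)
    deleted = sorted(known - set(found))
    return new, modified, deleted
```

For example, `classify({"a.txt", "b.txt"}, scan(Path("test_components/src")), last_scan)` would report `b.txt` as deleted if it no longer exists in the source folder.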
Once the lists of new, modified and deleted files have been obtained, the client reads the data of each new or modified file and encodes it using the ISO-8859-1 scheme. This scheme facilitates the sending and writing of files with different extensions while remaining relatively compact, and it has worked well when tested against popular document, video, and audio file types. The client then wraps the encoded data, together with the file's path relative to the source directory, in a JSON object, and sends it to the server over IP using the socket library.
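A minimal sketch of that wrapping step is shown below. It relies on the fact that ISO-8859-1 maps every byte value to exactly one character, so arbitrary file data survives the round trip. The function names and the JSON keys "path" and "data" are assumptions for illustration; the real message layout may differ.

```python
import json


def encode_message(relative_path: str, raw: bytes) -> bytes:
    """Wrap file data and its relative path in a JSON object (assumed keys)."""
    payload = {
        "path": relative_path,
        # Every byte 0-255 maps to one ISO-8859-1 character, so this is lossless.
        "data": raw.decode("iso-8859-1"),
    }
    return json.dumps(payload).encode("utf-8")


def decode_message(message: bytes) -> tuple[str, bytes]:
    """Reverse encode_message: return the relative path and the original bytes."""
    payload = json.loads(message.decode("utf-8"))
    return payload["path"], payload["data"].encode("iso-8859-1")
```

The bytes returned by `encode_message` could then be passed to a socket's `sendall`, and the server could write the bytes returned by `decode_message` to the matching path under the destination folder.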
If the file has been deleted, the client simply sends the path to the server. While this works, the message does not make the intended behaviour explicit, given that it contains only a path. At the cost of some added complexity, this could be improved by creating two separate channels of communication between the client and the server: one for writing data and one for deleting the file at a given path. It would then be clear to the server, or to any future software, what it should do with the data.
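As a rough sketch of how the server side of that idea might look, each channel could be served by its own handler, so the operation never has to be inferred from the shape of the message. Both function names are hypothetical and this is not how the project currently works.

```python
from pathlib import Path


def handle_write(dst_root: Path, relative_path: str, data: bytes) -> None:
    """Handler for the hypothetical 'write' channel: create or overwrite a file."""
    target = dst_root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)


def handle_delete(dst_root: Path, relative_path: str) -> None:
    """Handler for the hypothetical 'delete' channel: remove a file if present."""
    (dst_root / relative_path).unlink(missing_ok=True)
```

With this split, a message arriving on the delete channel can only ever remove a file, which makes the behaviour easier to reason about and to test.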
This project features behavioural tests using the behave library. The high-level scenarios and results are shown below, and a more detailed breakdown can be found in /test/features/*.feature.
- Synchronise single tiered directory - Passed ✅
- Synchronise multi tiered directory - Passed ✅
- Synchronise copied file - Passed ✅
- Synchronise directories when a file is removed - Passed ✅
- The client is given a large file, but it is too large to send. - Passed ✅
- A rejected large file is later deleted from the client directory. - Passed ✅
- Synchronise .mp4 files - Passed ✅
- Synchronise .avi files - Passed ✅
- Synchronise .mov files - Passed ✅
- Synchronise .jpg files - Passed ✅
- Synchronise .tiff files - Passed ✅
- Synchronise .png files - Passed ✅
- Synchronise .mp3 files - Passed ✅
- Synchronise .wav files - Passed ✅
- Synchronise .pdf files - Passed ✅
- Synchronise .docx files - Passed ✅
- Synchronise .xlsx files - Passed ✅
Beyond the communication change described above, another improvement would be to reduce the amount of data sent when a file is modified. Currently, if a file is updated, all of the file's data is resent to the server, which is inefficient. To improve this, the client could first identify the series of modifications, and then upload each one along with its starting position within the file. For example, if a text file containing the string "Hello" were updated to "Hello World!", the client would only need to send " World!" along with an index telling the server to insert it at the end of the string.
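A simplified sketch of this idea follows. It handles only a single contiguous change per file (found by trimming the common prefix and suffix) rather than a full series of modifications, and the names `make_delta`, `apply_delta` and the dictionary keys are assumptions for illustration.

```python
def make_delta(old: bytes, new: bytes) -> dict:
    """Describe the change from old to new as one replaced span."""
    limit = min(len(old), len(new))
    # Length of the common prefix.
    start = 0
    while start < limit and old[start] == new[start]:
        start += 1
    # Length of the common suffix, without overlapping the prefix.
    end = 0
    while end < limit - start and old[-1 - end] == new[-1 - end]:
        end += 1
    return {
        "offset": start,                       # where the change begins
        "removed": len(old) - start - end,     # how many old bytes to drop
        "data": new[start:len(new) - end],     # bytes to insert at offset
    }


def apply_delta(old: bytes, delta: dict) -> bytes:
    """Server side: rebuild the new content from the old content and a delta."""
    start = delta["offset"]
    return old[:start] + delta["data"] + old[start + delta["removed"]:]
```

For the example above, `make_delta(b"Hello", b"Hello World!")` produces an offset of 5, nothing removed, and the data `b" World!"`, so only seven bytes of file content would need to be sent.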
Furthermore, testing could be improved with stub messaging. Currently, the tests in place are behavioural tests for the entire system, so they don't indicate particularly well what has failed at a low level. This could be improved by introducing a messaging interface, which would be implemented using the socket library for real usage, but could also be implemented by a stub messaging class that simply sends and/or receives data through a list. This would allow a finer breakdown of testing. For example, a test could check that when a new file is added, the stub messaging list receives a single entry for that file, and this could be verified standalone without running the server. I do something similar at work, hence the observation!
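A minimal sketch of what such an interface might look like is given below. All class and function names here are hypothetical, and `send_new_files` merely stands in for whatever the client does when it finds new files.

```python
import socket
from typing import Protocol


class Messenger(Protocol):
    """Anything the client can hand an outgoing message to."""

    def send(self, message: bytes) -> None: ...


class SocketMessenger:
    """Real implementation: forwards messages over a TCP connection."""

    def __init__(self, host: str, port: int) -> None:
        self._sock = socket.create_connection((host, port))

    def send(self, message: bytes) -> None:
        self._sock.sendall(message)


class StubMessenger:
    """Test implementation: records messages in a list instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[bytes] = []

    def send(self, message: bytes) -> None:
        self.sent.append(message)


def send_new_files(messenger: Messenger, messages: list[bytes]) -> None:
    """Stand-in for the client logic: send one message per new file."""
    for message in messages:
        messenger.send(message)
```

A unit test could then pass a `StubMessenger` to the client logic and assert that `stub.sent` contains exactly one entry after a single new file is processed, with no server running.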