From 7680d3ac52559a8b67a8eaa9902185e79a75ed87 Mon Sep 17 00:00:00 2001 From: Igor Shishkin Date: Sun, 7 Jul 2024 10:56:33 +0300 Subject: [PATCH] Describe some basics in README (#2) Signed-off-by: Igor Shishkin --- README.md | 38 +++++++++++++++++++++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index ca21c87..7c46030 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,39 @@ # archived -Cloud native service so store versioned data in space-efficient manner +Cloud native service to store versioned data in space-efficient manner + +## How it works + +archived is inspired by `rsync --link-dest` which allowed to store package +mirrors without duplicating data for decades. And now archived makes this +approach unbound from local file systems by using modern era storage services +under the hood like S3. + +To do so archived relies on two storages: metadata and CAS. + +Metadata is a some kind of database to store all of the things: + +* containers - some kind of directories +* versions - immutable version of the data in container +* objects - named data BLOBs with some additional metadata + +Good example of metadata storage is a PostgreSQL database. + +CAS storage is a BLOB storage which stores the data behind objects. +CAS is actually an acronym means Content Addressed Storage which describes +how exactly it operates: stores BLOBs under content aware unique key (SHA256 +is used by default). + +Good example of CAS storage is S3. + +This approach allows to reduce raw data usage by linking duplicates instead +if storing copies. + +## archived components + +archived is built with microservice architecture containing the following +components: + +* access - HTTP server to allow data listing and fetching +* manage - gRPC API to manage containers, versions and objects +* CLI - CLI application to interact with manage component