
LTSM - Lightweight TSM API, Lustre TSM Copytool for Archiving Data, TSM Console Client and Lustre TSM File Storage Queue API and Daemon.


This project consists of five parts:

  1. Lightweight TSM API/library (called tsmapi) supporting operations (archiving, retrieving, deleting, querying).
  2. Lustre TSM Copytool.
  3. TSM console client.
  4. Benchmark and test suite.
  5. Lustre TSM storage API/library and daemon.

Introduction to Lustre HSM

Since version 2.5, Lustre has hierarchical storage management (HSM) capabilities: data can be automatically archived to low-cost storage media such as tape storage systems and seamlessly retrieved when it is accessed on Lustre clients. The Lustre HSM capabilities can be illustrated by a state diagram and the related Lustre commands lfs hsm_archive, lfs hsm_release and lfs hsm_restore, as depicted below.

[Figure: Lustre HSM state diagram]

The initial state is a new file. Once the file is archived, it changes into state archived and an identical copy exists on the HSM storage. If the archived file is modified, the state changes to dirty. As a consequence, one has to re-archive the file to obtain an identical copy on the Lustre side and on the HSM storage again. If the state changes to released, the bulk data of the file is moved to the HSM storage and only the file skeleton is kept. If the file is accessed or the lfs hsm_restore command is executed, the file is restored and the state changes back to archived.
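The state transitions described above can be sketched as a small transition function. This is only an illustration of the diagram: the enum and function names are hypothetical and not part of Lustre or LTSM, and the choice to leave the state unchanged on a non-applicable event is an assumption of this sketch (a real system may report an error instead).

```c
/* File states from the description above. */
enum hsm_state { HSM_NEW, HSM_ARCHIVED, HSM_DIRTY, HSM_RELEASED };

/* Events: the three lfs commands plus a modification of the file.
 * EV_RESTORE also stands for an implicit restore on file access. */
enum hsm_event { EV_ARCHIVE, EV_RELEASE, EV_RESTORE, EV_MODIFY };

/* Return the state that follows s after event e. If the event does not
 * apply in state s, the state is returned unchanged (sketch assumption). */
enum hsm_state hsm_next(enum hsm_state s, enum hsm_event e)
{
    switch (e) {
    case EV_ARCHIVE: /* new or dirty files get an identical HSM copy */
        return (s == HSM_NEW || s == HSM_DIRTY) ? HSM_ARCHIVED : s;
    case EV_MODIFY:  /* the HSM copy is no longer identical */
        return (s == HSM_ARCHIVED) ? HSM_DIRTY : s;
    case EV_RELEASE: /* bulk data may only be dropped if the copy is identical */
        return (s == HSM_ARCHIVED) ? HSM_RELEASED : s;
    case EV_RESTORE: /* bulk data is copied back from the HSM storage */
        return (s == HSM_RELEASED) ? HSM_ARCHIVED : s;
    }
    return s;
}
```

For example, hsm_next(HSM_ARCHIVED, EV_MODIFY) yields HSM_DIRTY, and a subsequent EV_ARCHIVE leads back to HSM_ARCHIVED.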

Lustre HSM Framework

The aim of the Lustre HSM framework is to provide an interface for seamlessly archiving data that is no longer in use and placing it into long-term retrievable archive storage (such as tape drives). The meta information of the archived data, such as the file name, user id, etc., is still present in the Lustre file system, whereas the bulk data is placed on the retrievable storage. When released data is accessed, the bulk data is seamlessly copied from the retrievable storage back to the Lustre file system. In summary, the data is still available regardless of where it is actually stored. To provide this functionality, the HSM framework is embedded in the main Lustre components. The figure below explains it for the case of retrieving data.

[Figure: Lustre HSM architecture] Triggered actions and data flows for retrieving data inside the Lustre HSM framework. The participating nodes (depicted as dashed frames) are: Client, MDS, OSS, Copytool, Storage.

The Lustre client opens a file that has been archived and released, so the bulk data is not available on any OST. The MDT node receives the open call, allocates objects on the OST and notifies the HSM coordinator to copy in the data for the file identifier (FID) of the corresponding file. The copytool receives from the HSM coordinator the event to copy in the data of the FID and notifies the internal data mover to retrieve the data stored on the storage node and copy it to an OST. Once the copy process is completed, the data mover notifies the HSM agent with a copydone message, and the HSM agent finalizes the process with an HSM copydone message back to the MDT node.

TSM Overview

Tivoli Storage Manager (TSM), now renamed to IBM Spectrum Protect, is a client/server product from IBM employed in heterogeneous distributed environments to back up and archive data. The difference between a backup and an archive can be summarized as follows:

  • Backup: A copy of the data is stored in the event the original becomes lost or damaged. Typically an incremental (forever) backup strategy is performed.
  • Archive: Data no longer in day-to-day use is removed from an on-line system and placed into long-term retrievable storage (such as tape drives).

The core capabilities and features of a TSM server are:

  • Compression: Compress the data stream seamlessly on either the client or the server side.
  • Deduplication: Eliminating duplicate copies of repeating data.
  • Collocation: Store and pack the data of a client on as few tapes as possible to reduce the number of media mounts and to minimize tape drive movements.
  • Storage hierarchies: Automatically move data from faster devices to slower devices based on characteristics such as file size or storage capacity.

The last feature is crucial for the Lustre HSM framework, as it allows data to be archived to and retrieved from fast devices, rather than being tied to slow but large and cheap devices (see figure below).

[Figure: storage hierarchy] Storage hierarchy of a TSM server. Fast storage devices with limited space (high cost per GByte) are located at the top of the pyramid, whereas slower storage devices with large space (low cost per GByte) are located at the bottom. Fast storage devices act as a cache layer, where data is automatically migrated by the TSM server based on access pattern and space occupation level. More details can be found in the IBM Tivoli Storage Manager Implementation Guide (pp. 257).

Let r1 denote the transfer rate to the (fast) storage devices s1 and r2 the transfer rate to the (slow) storage devices s2. Make sure to set the migration threshold t from s1 to s2 to a value that suits your use case. Example: Let r1 = 800 MB/sec, r2 = 200 MB/sec, s1 = 150 TB and the migration threshold t = 0.6. If r1 > r2, then after t · s1 / (r1 - r2) seconds, s1 is 100% full. In this example, this yields 150,000 seconds, that is about 41.67 hours, when continuously writing into s1 with rate r1.
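The arithmetic of this example can be reproduced with a few lines of C. The function name is hypothetical, and the sketch assumes decimal units (1 TB = 10^6 MB); it simply evaluates the expression t · s1 / (r1 - r2) given above.

```c
/* Seconds until s1 is full according to t * s1 / (r1 - r2).
 * t     : migration threshold in (0, 1]
 * s1_mb : capacity of the fast storage in MB (decimal, assumption)
 * r1,r2 : transfer rates in MB/sec
 * Returns -1.0 if r1 <= r2, in which case s1 does not fill up. */
double seconds_until_full(double t, double s1_mb, double r1, double r2)
{
    if (r1 <= r2)
        return -1.0;
    return t * s1_mb / (r1 - r2);
}
```

With t = 0.6, s1 = 150e6 MB, r1 = 800 and r2 = 200, the function returns 150000 seconds. Dividing by 3600 gives the roughly 41.67 hours stated in the example.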

Data Organized on TSM Server

The TSM server is an object storage server designed for storing and retrieving named objects. Similar to Lustre, the TSM server has two main storage areas: the database and the data storage. The database contains the metadata of the objects, such as unique identifiers, names and attributes, whereas the data storage contains the object data. Each object stored on the TSM server has a name associated with it that is determined by the client. The object name is composed of:

  • fs: File space name (mount point),
  • hl: High level name (directory name),
  • ll: Low level name (file name),

and must be specified to identify the object for the operations query, archive, retrieve and delete. In addition to the predefined TSM metadata, user-defined metadata of up to 255 bytes can be stored on the TSM server. This user-defined space is employed to store Lustre information as well as a CRC32 checksum and a magic id (see Fig. below).
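How a file path maps to the three name components can be sketched as follows. This is not the project's extract_hl_ll implementation; the function name split_hl_ll and its signature are hypothetical, and the splitting rule is inferred from the debug output shown later in this document (for example fs '/lustre', hl '/.mount', ll '/.test-maxnummp').

```c
#include <string.h>

/* Split fpath into a high-level name (directory) and a low-level name
 * (file name) relative to the file space fs. Both results keep their
 * leading '/'. Returns 0 on success and -1 on invalid input or if the
 * output buffers are too small. */
int split_hl_ll(const char *fs, const char *fpath,
                char *hl, size_t hl_len, char *ll, size_t ll_len)
{
    const char *rest = fpath;

    /* Strip the file space prefix unless the file space is the root "/". */
    if (strcmp(fs, "/") != 0) {
        size_t fs_len = strlen(fs);
        if (strncmp(fpath, fs, fs_len) != 0)
            return -1;
        rest = fpath + fs_len;
    }
    if (rest[0] != '/')
        return -1;

    const char *slash = strrchr(rest, '/'); /* last separator, never NULL here */
    size_t dir_len = (size_t)(slash - rest);

    if (slash[1] == '\0')                   /* path ends in '/': no file name */
        return -1;
    if (dir_len + 2 > hl_len || strlen(slash) + 1 > ll_len)
        return -1;

    if (dir_len == 0) {
        strcpy(hl, "/");                    /* file lies directly in the file space */
    } else {
        memcpy(hl, rest, dir_len);
        hl[dir_len] = '\0';
    }
    strcpy(ll, slash);
    return 0;
}
```

Called with fs "/lustre" and fpath "/lustre/wiki.lustre.org", the sketch produces hl "/" and ll "/wiki.lustre.org".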

Copytool

As described above, a copytool is composed of an HSM proxy and a data mover that run on the same node. Loosely speaking, the copytool receives archive, retrieve and delete actions from the MDT node and triggers data moving operations. In more detail, it is a daemon process that requires a Lustre mount point for accessing Lustre files and processes the actions HSMA_ARCHIVE, HSMA_RESTORE, HSMA_REMOVE and HSMA_CANCEL as depicted below.

The functions ct_archive(session), ct_restore(session) and ct_remove(session) implement the core actions for accessing named objects and for transferring data between the Lustre mount point and the TSM storage.

Implementation Details

To achieve high data throughput by means of parallelism, the copytool employs the producer-consumer model. The data structure of the model is a concurrent queue. The producer is a single thread that receives HSM action items from the MDSs and enqueues them in the queue. There are multiple consumer threads, each with a session opened to the TSM server, which dequeue items and execute the HSM actions (see Fig. below).

Overview of a single-producer multiple-consumer model for a producer thread (P-Thread) and four consumer threads (C-Threads). Three C-Threads have dequeued HSM action items and are executing the actions. The P-Thread has received a new HSM action item from the MDS and enqueues the item into the concurrent queue. Once the enqueue operation is successful, C-Thread_4 is notified to dequeue and execute the new HSM action.
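The single-producer multiple-consumer pattern described above can be sketched with POSIX threads as follows. This is a generic illustration and not the copytool's actual queue: all names, the fixed capacity QUEUE_CAP and the use of integers as stand-ins for HSM action items are assumptions of this sketch.

```c
#include <pthread.h>

#define QUEUE_CAP 8    /* hypothetical capacity for this sketch */
#define ITEM_STOP (-1) /* sentinel telling a consumer to exit */

struct queue {
    int items[QUEUE_CAP]; /* ring buffer of pending items */
    int head, count;
    int processed;        /* number of items handled by consumers */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

static void queue_push(struct queue *q, int item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)  /* producer blocks while the queue is full */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = item;
    q->count++;
    pthread_cond_signal(&q->not_empty); /* wake one waiting consumer */
    pthread_mutex_unlock(&q->lock);
}

static int queue_pop(struct queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)          /* consumer blocks while the queue is empty */
        pthread_cond_wait(&q->not_empty, &q->lock);
    int item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return item;
}

static void *consumer(void *arg)
{
    struct queue *q = arg;
    while (queue_pop(q) != ITEM_STOP) {
        /* A real consumer would execute the HSM action here. */
        pthread_mutex_lock(&q->lock);
        q->processed++;
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}

/* The calling thread acts as the producer: it enqueues n_items items and
 * then one stop sentinel per consumer. Returns the number of processed
 * items, or -1 if n_consumers is out of range. */
int run_demo(int n_consumers, int n_items)
{
    struct queue q = { .head = 0, .count = 0, .processed = 0 };
    pthread_t tid[16];
    int i;

    if (n_consumers < 1 || n_consumers > 16)
        return -1;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);
    for (i = 0; i < n_consumers; i++)
        pthread_create(&tid[i], NULL, consumer, &q);
    for (i = 0; i < n_items; i++)
        queue_push(&q, i);
    for (i = 0; i < n_consumers; i++)
        queue_push(&q, ITEM_STOP);
    for (i = 0; i < n_consumers; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return q.processed;
}
```

Compile with -pthread. The bounded ring buffer makes the producer wait when the consumers fall behind, which is one simple way to limit the number of outstanding items.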

Getting Started

Before using LTSM, working access to a TSM server is required. For testing purposes, one can install a fully working TSM server (e.g. inside a KVM) for a period of 30 days before the license expires. A complete installation and setup guide is provided at TSM Server Installation Guide. Download and install the TSM API client provided at 7.1.X.Y-TIV-TSMBAC-LinuxX86_DEB.tar. Make sure to install tivsm-api64.amd64.deb, which provides the header files dapitype.h, dsmapips.h, ... and the low-level library libApiTSM64.so.

Compile lhsmtool_tsm and ltsmc

Clone the git repository

git clone https://github.com/tstibor/ltsm

and execute

./autogen.sh && ./configure CFLAGS='-g -DDEBUG -O0' --enable-tests

If Lustre header files are not found you will get the message

...
configure: WARNING: cannot find Lustre header files, use --with-lustre-headers=PATH
configure: WARNING: cannot find Lustre API headers and/or Lustre API library
...
configured ltsm:

build fsqapi library           : yes
build tsmapi library and ltsmc : yes
build ltsmfsq and lhsmtool_tsm : no
build test suite               : yes

and only the console client ltsmc and the fsqapi library are built. To also build the Lustre copytool, make sure the Lustre header files and the Lustre library liblustreapi.so are available and the paths are correctly specified, e.g.

./autogen.sh && ./configure CFLAGS='-g -DDEBUG -O0' --with-lustre-src=/usr/local/include/lustre LDFLAGS='-L/usr/local/lib' --with-tsm-headers=/opt/tivoli/tsm/client/api/bin64/sample --enable-tests

If the required TSM and the optional Lustre header files and libraries are found, the following executable files are built:

  • src/lhsmtool_tsm (Lustre TSM Copytool)
  • src/ltsmc (Console client)
  • src/ltsmfsq (Lustre TSM File Storage Queue Daemon)
  • src/test/test_cds (Test suite for data structures (linked-list, queue, hashtable, etc.))
  • src/test/test_tsmapi (Test suite for tsmapi)
  • src/test/test_fsqapi (Test suite for fsqapi)
  • src/test/ltsmbench (Benchmark suite for measuring threaded archive/retrieve performance)

Install or Build DEB/RPM Package

Download and install the prebuilt Debian Stretch package ltsm_0.8.0_amd64.deb or CentOS 7.7 package ltsm-0.8.0-1.x86_64.rpm. Alternatively, you can build the rpm package yourself as follows

git clone https://github.com/tstibor/ltsm && cd ltsm && ./autogen.sh && ./configure && make rpms

or the deb package

git clone https://github.com/tstibor/ltsm && cd ltsm && ./autogen.sh && ./configure && make debs

Make sure your Linux distribution supports systemd. To customize the options of the lhsmtool_tsm daemon started by systemd, edit the file /etc/default/lhsmtool_tsm.

Lustre Copytool

Make sure you have enabled the HSM functionality on the MGS/MDS, e.g. lctl set_param 'mdt.<LNAME>-MDT0000.hsm_control=enabled', that the TSM server is running, and that dsm.sys contains the proper entries such as

SErvername              polaris-kvm-tsm-server
   Nodename             polaris
   TCPServeraddress     192.168.254.102

If the compilation process is successful, running ./src/lhsmtool_tsm --help should print the following information:

usage: ./src/lhsmtool_tsm [options] <lustre_mount_point>
	-a, --archive-id <int> [default: 0]
		archive id number
	-t, --threads <int>
		number of processing threads [default: 1]
	-n, --node <string>
		node name registered on tsm server
	-p, --password <string>
		password of tsm node/owner
	-o, --owner <string>
		owner of tsm node
	-s, --servername <string>
		hostname of tsm server
	-c, --conf <file>
		option conf file
	-v, --verbose {error, warn, message, info, debug} [default: message]
		produce more verbose output
	--abort-on-error
		abort operation on major error
	--daemon
		daemon mode run in background
	--dry-run
		don't run, just show what would be done
	--restore-stripe
		restore stripe information
	--enable-maxmpc
		enable tsm mount point check to infer the maximum number of feasible threads
	-h, --help
		show this help

IBM API library version: 7.1.8.0, IBM API application client version: 7.1.8.0
version: 0.8.0 © 2017 by GSI Helmholtz Centre for Heavy Ion Research

Starting the copytool with 4 threads and archive_id=1 works as follows:

./src/lhsmtool_tsm --restore-stripe -a 1 -v debug -n polaris -p polaris -s 'polaris-kvm-tsm-server' -t 4 /lustre

The parameters -n (node name), -p (password) and -s (server name) have to match those set up on the TSM server (see TSM Server Installation Guide).

[D] 1509708481.724705 [15063] lhsmtool_tsm.c:262 using TSM filespace name '/lustre'
[D] 1509708481.743621 [15063] tsmapi.c:739 dsmSetUp: handle: 0 ANS0302I (RC0)    Successfully done.
[M] 1509708481.743642 [15063] lhsmtool_tsm.c:836 tsm_init: session: 1
[D] 1509708481.927847 [15063] tsmapi.c:785 dsmInitEx: handle: 1 ANS0302I (RC0)    Successfully done.
[D] 1509708481.936992 [15063] tsmapi.c:806 dsmRegisterFS: handle: 1 ANS0242W (RC2062) On dsmRegisterFS the filespace is already registered
[D] 1509708482.241943 [15063] tsmapi.c:1262 [rc=0] extract_hl_ll
fpath: '/lustre/.mount/.test-maxnummp'
fs   : '/lustre'
hl   : '/.mount'
ll   : '/.test-maxnummp'

[I] 1509708482.241976 [15063] tsmapi.c:1020 query structure
fs   : '/lustre'
hl   : '/.mount'
ll   : '/.test-maxnummp'
owner: ''
descr: '*'
[D] 1509708482.242415 [15063] tsmapi.c:1023 dsmBeginQuery: handle: 1 ANS0302I (RC0)    Successfully done.
[D] 1509708482.276913 [15063] tsmapi.c:1039 dsmGetNextQObj: handle: 1 ANS0258I (RC2200) On dsmGetNextQObj or dsmGetData there is more available data
[D] 1509708482.276953 [15063] tsmapi.c:1039 dsmGetNextQObj: handle: 1 ANS0272I (RC121)  The operation is finished
[D] 1509708482.276963 [15063] tsmapi.c:1233 [rc:0] get_qra: 0
[D] 1509708482.277211 [15063] tsmapi.c:1198 dsmBeginTxn: handle: 1 ANS0302I (RC0)    Successfully done.
[D] 1509708482.277226 [15063] tsmapi.c:1208 dsmDeleteObj: handle: 1 ANS0302I (RC0)    Successfully done.
[D] 1509708482.321404 [15063] tsmapi.c:1217 dsmEndTxn: handle: 1 ANS0302I (RC0)    Successfully done.
[D] 1509708482.321415 [15063] tsmapi.c:1240 [rc:0] tsm_del_obj: 0
[I] 1509708482.321424 [15063] tsmapi.c:705 [delete] object # 0
fs: /lustre, hl: /.mount, ll: /.test-maxnummp
object id (hi,lo)                          : (0,34772)
object info length                         : 48
object info size (hi,lo)                   : (0,1) (1 bytes)
object type                                : DSM_OBJ_DIRECTORY
object magic id                            : 71147
crc32                                      : 0x00000000 (0000000000)
archive description                        : node mountpoint check
owner                                      :
insert date                                : 2017/11/03 12:28:01
expiration date                            : 2018/11/03 12:28:01
restore order (top,hi_hi,hi_lo,lo_hi,lo_lo): (2,0,11491,0,0)
estimated size (hi,lo)                     : (0,1) (1 bytes)
lustre fid                                 : [0:0x0:0x0]
lustre stripe size                         : 0
lustre stripe count                        : 0
...
...
...
[I] 1509708482.321437 [15063] tsmapi.c:2118 passed mount point check
[D] 1509708482.321440 [15063] lhsmtool_tsm.c:871 Abort on error 0
[D] 1509708482.322375 [15063] lhsmtool_tsm.c:645 waiting for message from kernel

Note that there is unfortunately no low-level TSM API call to query the maximum number of mount points (that is, parallel sessions). This number is an upper limit on the number of parallel threads of the copytool. To determine the maximum number of mount points and thus set the number of threads appropriately, one can apply a trick and send dummy transactions to the TSM server until a certain error code is received. This is why the output of the node mountpoint check appears (in debug mode) when the copytool is started with option --enable-maxmpc.
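The probing idea can be sketched independently of the TSM API. In the following illustration the probe is an abstract callback that returns 0 while another mount point can still be occupied and a nonzero code otherwise. The callback type, the function names and the fake server used for demonstration are all hypothetical and do not reflect how lhsmtool_tsm or the TSM API implement the check.

```c
/* Hypothetical probe: tries to occupy one more mount point.
 * Returns 0 on success, nonzero once the server refuses. */
typedef int (*probe_fn)(void *ctx);

/* Call probe until it fails or upper is reached and return the number of
 * successful probes, that is the inferred number of mount points. */
int infer_max_mount_points(probe_fn probe, void *ctx, int upper)
{
    int n = 0;

    while (n < upper && probe(ctx) == 0)
        n++;
    return n;
}

/* Stand-in for a server with a fixed mount point limit (demo only). */
struct fake_server {
    int max_mp; /* configured limit */
    int in_use; /* mount points currently occupied */
};

int fake_probe(void *ctx)
{
    struct fake_server *srv = ctx;

    if (srv->in_use >= srv->max_mp)
        return 1; /* models the "certain error code" */
    srv->in_use++;
    return 0;
}
```

With a fake server limited to 3 mount points, infer_max_mount_points(fake_probe, &srv, 16) returns 3, which would then serve as the upper bound for the number of threads.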

Once the copytool is started one can run the Lustre commands lfs hsm_archive, lfs hsm_release, lfs hsm_restore as depicted in the state diagram.

In the following example we archive the Lustre wiki website: wget -O - -o /dev/null http://wiki.lustre.org > /lustre/wiki.lustre.org && sudo lfs hsm_archive /lustre/wiki.lustre.org. If the command was successful you should see:

[M] 1509710509.205682 [16973] lhsmtool_tsm.c:680 copytool fs=ldomov archive#=1 item_count=1
[M] 1509710509.205705 [16973] lhsmtool_tsm.c:735 enqueue action 'ARCHIVE' cookie=0x59f9b6f9, FID=[0x200000401:0x17:0x0]
[D] 1509710509.205717 [16973] lhsmtool_tsm.c:645 waiting for message from kernel
[D] 1509710509.205803 [16976] lhsmtool_tsm.c:596 dequeue action 'ARCHIVE' cookie=0x59f9b6f9, FID=[0x200000401:0x17:0x0]
[M] 1509710509.205823 [16976] lhsmtool_tsm.c:532 '[0x200000401:0x17:0x0]' action ARCHIVE reclen 72, cookie=0x59f9b6f9
[D] 1509710509.207877 [16976] lhsmtool_tsm.c:361 [rc=0] ct_hsm_action_begin on '/lustre/wiki.lustre.org'
[M] 1509710509.207901 [16976] lhsmtool_tsm.c:366 archiving '/lustre/wiki.lustre.org' to TSM storage
[D] 1509710509.208148 [16976] lhsmtool_tsm.c:376 [fd=10] llapi_hsm_action_get_fd()
[D] 1509710509.208196 [16976] lhsmtool_tsm.c:389 [rc=0,fd=10] xattr_get_lov '/lustre/wiki.lustre.org'
[I] 1509710509.208201 [16976] tsmapi.c:1964 tsm_archive_fpath:
fs: /lustre, fpath: /lustre/wiki.lustre.org, desc: (null), fd: 10, *lustre_info: 0x7fd54c7bad70
[D] 1509710509.208534 [16976] tsmapi.c:1792 [rc:0] extract_hl_ll:
fpath: /lustre/wiki.lustre.org
fs   : /lustre
hl   : /
ll   : /wiki.lustre.org

[D] 1509710509.208938 [16976] tsmapi.c:1547 dsmBeginTxn: handle: 1 ANS0302I (RC0)    Successfully done.
[D] 1509710509.208958 [16976] tsmapi.c:1555 dsmBindMC: handle: 1 ANS0302I (RC0)    Successfully done.
[D] 1509710509.208993 [16976] tsmapi.c:1576 dsmSendObj: handle: 1 ANS0302I (RC0)    Successfully done.
[D] 1509710509.209499 [16976] tsmapi.c:1608 dsmSendData: handle: 1 ANS0302I (RC0)    Successfully done.
[I] 1509710509.209510 [16976] tsmapi.c:1615 cur_read: 20313, total_read: 20313, total_size: 20313
[D] 1509710509.209747 [16976] tsmapi.c:1663 dsmEndSendObj: handle: 1 ANS0302I (RC0)    Successfully done.
[D] 1509710509.394623 [16976] tsmapi.c:1674 dsmEndTxn: handle: 1 ANS0302I (RC0)    Successfully done.
[I] 1509710509.394646 [16976] tsmapi.c:1702
*** successfully archived: DSM_OBJ_FILE /lustre/wiki.lustre.org of size: 20313 bytes with settings ***
fs: /lustre
hl: /
ll: /wiki.lustre.org
desc:

[D] 1509710509.440357 [16976] tsmapi.c:1707 [rc:0] tsm_obj_update_crc32, crc32: 0x7d0db330 (2098049840)
[M] 1509710509.440370 [16976] lhsmtool_tsm.c:402 archiving '/lustre/wiki.lustre.org' to TSM storage done
[M] 1509710509.440681 [16976] lhsmtool_tsm.c:325 action completed, notifying coordinator cookie=0x59f9b6f9, FID=[0x200000401:0x17:0x0], err=0
[D] 1509710509.441332 [16976] lhsmtool_tsm.c:338 [rc=0] llapi_hsm_action_end on '/lustre/wiki.lustre.org' ok

Let us use our ltsmc to query (for fun) some TSM object information of /lustre/wiki.lustre.org.

>src/ltsmc -f /lustre --query -v debug -n polaris -p polaris -s 'polaris-kvm-tsm-server' /lustre/wiki.lustre.org
...
...
[D] 1509712134.421465 [18010] tsmapi.c:1303 [rc:0] extract_hl_ll:
fpath: /lustre/wiki.lustre.org
fs   : /lustre
hl   : /
ll   : /wiki.lustre.org

[I] 1509712134.421487 [18010] tsmapi.c:1020 query structure
fs   : '/lustre'
hl   : '/'
ll   : '/wiki.lustre.org'
owner: ''
descr: '*'
[D] 1509712134.421833 [18010] tsmapi.c:1023 dsmBeginQuery: handle: 1 ANS0302I (RC0)    Successfully done.
[D] 1509712134.455811 [18010] tsmapi.c:1039 dsmGetNextQObj: handle: 1 ANS0258I (RC2200) On dsmGetNextQObj or dsmGetData there is more available data
[D] 1509712134.455866 [18010] tsmapi.c:1039 dsmGetNextQObj: handle: 1 ANS0258I (RC2200) On dsmGetNextQObj or dsmGetData there is more available data
[D] 1509712134.455927 [18010] tsmapi.c:1039 dsmGetNextQObj: handle: 1 ANS0272I (RC121)  The operation is finished
[I] 1509712134.455945 [18010] tsmapi.c:705 [query] object # 0
fs: /lustre, hl: /, ll: /wiki.lustre.org
object id (hi,lo)                          : (0,34778)
object info length                         : 48
object info size (hi,lo)                   : (0,20313) (20313 bytes)
object type                                : DSM_OBJ_FILE
object magic id                            : 71147
crc32                                      : 0x7d0db330 (2098049840)
archive description                        :
owner                                      :
insert date                                : 2017/11/03 13:01:49
expiration date                            : 2018/11/03 13:01:49
restore order (top,hi_hi,hi_lo,lo_hi,lo_lo): (2,0,11497,0,0)
estimated size (hi,lo)                     : (0,20313) (20313 bytes)
lustre fid                                 : [0x200000401:0x17:0x0]
lustre stripe size                         : 1048576
lustre stripe count                        : 1

Lustre TSM File Storage Queue Daemon

The goal of the Lustre TSM File Storage Queue Daemon (ltsmfsq for short) is to transfer data to a Lustre file system efficiently and robustly and, in addition, to archive the data seamlessly on a TSM server. A deployed Lustre file system is frequently shared and accessed by thousands of users and can suffer from latency. To overcome this problem, the daemon implements a straightforward socket communication protocol and employs (similar to the copytool architecture) a queue data structure and a multiple producer-consumer model to leverage asynchronous data transfer by using an intermediate local file system. The daemon can be started as follows:

>ltsmfsq -l /fsqdata -i identmap.conf -s 16 -q 12 -v info /lustre
[I] 1579167563.378896 [174822] ltsmfsq.c:137 node: 'polaris', servername: 'tsmserver-8', archive_id: 15, uid: 1000, gid: 2000
[M] 1579167563.382178 [174822] ltsmfsq.c:1618 listening on port 7625 with 16 socket threads, 12 queue worker threads, local fs '/fsqdata' and number of tolerated file errors 16
[M] 1579167563.382268 [174822] ltsmfsq.c:1539 created queue consumer thread fsq_queue/0
[M] 1579167563.382309 [174822] ltsmfsq.c:1539 created queue consumer thread fsq_queue/1
[M] 1579167563.382353 [174822] ltsmfsq.c:1539 created queue consumer thread fsq_queue/2
[M] 1579167563.382387 [174822] ltsmfsq.c:1539 created queue consumer thread fsq_queue/3
[M] 1579167563.382418 [174822] ltsmfsq.c:1539 created queue consumer thread fsq_queue/4
[M] 1579167563.382448 [174822] ltsmfsq.c:1539 created queue consumer thread fsq_queue/5
[M] 1579167563.382480 [174822] ltsmfsq.c:1539 created queue consumer thread fsq_queue/6
[M] 1579167563.382511 [174822] ltsmfsq.c:1539 created queue consumer thread fsq_queue/7
[M] 1579167563.382544 [174822] ltsmfsq.c:1539 created queue consumer thread fsq_queue/8
[M] 1579167563.382575 [174822] ltsmfsq.c:1539 created queue consumer thread fsq_queue/9
[M] 1579167563.382606 [174822] ltsmfsq.c:1539 created queue consumer thread fsq_queue/10
[M] 1579167563.382641 [174822] ltsmfsq.c:1539 created queue consumer thread fsq_queue/11

To interface with the daemon from a client, use the functions declared in fsqapi.h:

void fsq_init(struct fsq_login_t *fsq_login, const char *servername,
              const char *node, const char *password,
              const char *owner, const char *platform,
              const char *fsname, const char *fstype,
              const char *hostname, const int port);
int fsq_fconnect(struct fsq_login_t *fsq_login,
                 struct fsq_session_t *fsq_session);
void fsq_fdisconnect(struct fsq_session_t *fsq_session);
int fsq_fopen(const char *fs, const char *fpath, const char *desc,
              struct fsq_session_t *fsq_session);
ssize_t fsq_fwrite(const void *ptr, size_t size, size_t nmemb,
                   struct fsq_session_t *fsq_session);
int fsq_fclose(struct fsq_session_t *fsq_session);

Tips for Tuning

There are three main knobs for adjusting the archive/retrieve performance.

  1. The Txnbytelimit option, which adjusts the number of bytes the tsmapi buffers before it sends a transaction to the TSM server. The best value depends on the workload; a value of 2 GByte resulted in good performance on most tested machines and setups.
  2. The buffer length TSM_BUF_LENGTH for sending and receiving TSM bulk data. Empirical investigations showed that a buffer length of 2^15 - 4 = 32764 resulted in good performance on most tested machines and setups.
  3. The maximum number of TSM mount points (that is, parallel threaded sessions) and the related QUEUE_MAX_ITEMS setting. As described above, this parameter is crucial for achieving high throughput. QUEUE_MAX_ITEMS determines the maximum number of HSM action items in the queue. That is, no new HSM action items are received until the queue length drops below QUEUE_MAX_ITEMS. This value is set as QUEUE_MAX_ITEMS = 2 * # threads.

For checking the archive/retrieve performance the benchmark tool ltsmbench can be used

>./src/test/ltsmbench --help
usage: ./src/test/ltsmbench [options]
	-z, --size <long> [default: 16777216 bytes]
	-b, --number <int> [default: 16]
	-t, --threads <int> [default: 1]
	-f, --fsname <string> [default: '/']
	-n, --node <string>
	-p, --password <string>
	-s, --servername <string>
	-v, --verbose {error, warn, message, info, debug} [default: message]
	-h, --help

IBM API library version: 7.1.6.0, IBM API application client version: 7.1.6.0
version: 0.5.7-5 © 2017 by GSI Helmholtz Centre for Heavy Ion Research

In this example we benchmark the archive/retrieve performance of a TSM server running inside a KVM with 1 thread

>./src/test/ltsmbench -v warn -n polaris -p polaris -s polaris-kvm-tsm-server -t 1 -z 134217728 -b 4
[measurement]	'tsm_archive_fpath' processed 536870912 bytes in 9.307 secs (57.682 Mbytes / sec)
[measurement]	'tsm_retrieve_fpath' processed 536870912 bytes in 6.381 secs (84.137 Mbytes / sec)

vs 4 threads

>./src/test/ltsmbench -v warn -n polaris -p polaris -s polaris-kvm-tsm-server -t 4 -z 134217728 -b 4
[measurement]	'tsm_archive_fpath' processed 536870912 bytes in 8.444 secs (63.578 Mbytes / sec)
[measurement]	'tsm_retrieve_fpath' processed 536870912 bytes in 5.260 secs (102.070 Mbytes / sec)
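The reported rates can be checked by hand. The small helper below (a hypothetical name, not part of ltsmbench) assumes that "Mbytes" in the output means decimal megabytes (10^6 bytes), which appears consistent with the numbers shown: 4 files of 134217728 bytes give 536870912 bytes, and 536870912 bytes in 9.307 seconds is about 57.7 Mbytes / sec.

```c
/* Throughput in decimal megabytes (10^6 bytes) per second.
 * The decimal unit is an assumption inferred from the sample output. */
double mbytes_per_sec(unsigned long long bytes, double secs)
{
    return (double)bytes / 1e6 / secs;
}
```

Applied to the two archive measurements above, the helper gives roughly 57.7 Mbytes / sec for 1 thread and 63.6 Mbytes / sec for 4 threads, a gain of about 10% in this KVM setup.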

For more TSM server/client tuning tips see Tips for Tivoli Storage Manager Performance Tuning and Troubleshooting

Performance Measurement over 10GBit Network

By means of ltsmbench, the archive and retrieve performance is measured from an ltsm machine to a TSM server. The TSM server is equipped with SSDs bundled into a hardware RAID-5, where the TSM database and transaction logs are located. The TSM bulk data is placed on regular disks bundled into a hardware RAID-6 device. The archive and retrieve performance for an increasing number of threads {1,2,..,16} is visualized as box-and-whisker plots.

[Figures: archive performance, retrieve performance]

Scaling

The Lustre file system can be bound to multiple HSM storage backends, where each binding is identified by a unique number called the archive id. This concept thus allows scaling within the Lustre HSM framework. In the simplest case, only one copytool instance is started, which handles all archive ids in the range {1,2,...,32}. That is, all data is transferred through a single TSM server, though with multi-threaded access. In the fully expanded parallel setup, 32 TSM servers can be leveraged for data transfer as depicted below.

[Figure: parallel setup with multiple TSM servers]

More Information

In the manual pages lhsmtool_tsm.1, ltsmc.1 and ltsmfsq.1 usage details and options of lhsmtool_tsm, ltsmc and ltsmfsq are provided. In addition, a screencast of an older version of this project is provided.

References

A thorough description and code examples of IBM's low-level TSM API/library can be found in the open document Using the Application Programming Interface, Fourth edition (September 2015).

Funding

This project is funded by Intel® through GSI's Intel® Parallel Computing Centers

License

This project itself is licensed under the GPL-2 license. The code, however, depends on IBM's low-level TSM API/libraries, which are distributed under different licenses and thus are not provided in the repository. See IBM Tivoli Storage Manager Server.

Warranty

Note, this project is still under development and in a beta development state. It is however ready to be tested.
