
Building a High Performance Mass Storage System for a Tier1 LHC Site
Vladimir Sapunenko, INFN-CNAF
GRID2012, July 16-21, Dubna, Russia

Tier1 site at INFN-CNAF
CNAF is the National Center of INFN (the Italian National Institute of Nuclear Physics) for research and development in the field of information technologies applied to high-energy physics experiments. Operational since 2005.
July 18, 2012 / Vladimir.Sapunenko@cnaf.infn.it

Mass Storage Challenge
- Several petabytes of data (online and near-line) need to be accessible at any time by thousands of concurrent processes
- The required aggregate data throughput, both on the Local Area Network (LAN) and the Wide Area Network (WAN), is of the order of several GB/s
- Long-term transparent archiving of data is needed
- Frequent configuration changes
- Independent experiments (with independent production managers and end-users) compete for disk and tape resources
- Chaotic access can lead to traffic jams, which must be handled as quasi-ordinary situations

What do we need to meet that challenge?
We need a Mass Storage solution with the following features:
- Grid-enabled
- high performance
- modular
- stable and robust
- targeted to large computing centers (such as WLCG Tier-1s); "large" means custodial of O(10) PB of data
- simple installation and management
- 24x7 operation with limited manpower
- centralized administration

LHCb: CPU used at CERN and Tier-1s in 2012
[Charts: share of CPU used in successful jobs vs. share of CPU used in failed jobs, across CERN, CNAF, GRIDKA, RAL, IN2P3, NIKHEF, PIC, SARA]
CNAF is the first centre after CERN for CPU used, and the last when ranked by the fraction of CPU time wasted by jobs failing for any reason.
The main reason: the stability of the storage system!

Software components
- GPFS as the clustered parallel file system
- TSM as the HSM system
- StoRM as the SRM implementation
- GEMSS as the interface between StoRM, GPFS and TSM
- NAGIOS for alarm and event handling
- QUATTOR as the system configuration manager
- LEMON as the monitoring tool

GPFS
General Parallel File System from IBM:
- Clustered (fault tolerance and redundancy)
- Parallel (scalability)
- Widely used in industry (very well documented, supported by the user community and by IBM)
- Always provides maximum performance (no need to replicate data to increase availability)
- Runs on AIX, Linux (RH, SL) and Windows
- Is NOT bound to IBM hardware!

GPFS (2)
Advanced high-availability features:
- disruption-free maintenance: servers and storage devices can be added or removed while keeping the file systems online
- when storage is added or removed, data can be dynamically rebalanced to maintain optimal performance
Centralized administration:
- cluster-wide operations can be managed from any node in the GPFS cluster
- easy administration model, consistent with standard UNIX file systems
Supports standard file system functions (user quotas, snapshots, etc.)
Many other features not fitting in two slides…

StoRM: STOrage Resource Manager
StoRM is an implementation of the SRM specification, developed at INFN-CNAF, designed to leverage the advantages of cluster file systems (like GPFS) and standard POSIX file systems in a Grid environment.
http://storm.forge.cnaf.infn.it
StoRM provides data management capabilities in a Grid environment to share, access and transfer data among heterogeneous and geographically distributed data centers. It supports direct access (native POSIX I/O calls) to shared files and directories, as well as other standard Grid access protocols. StoRM is adopted in the WLCG computational Grid framework.

A little bit of history
- CASTOR had been the traditional solution for mass storage at CNAF for all VOs since 2003
  - a large variety of issues, both at the setup/admin level and at the VO level (complexity, scalability, stability, …)
  - successfully used in production, despite the large operational overhead
- In parallel to production, in 2006 we started to search for a potentially more scalable, better performing and more robust solution
  - Q1 2007: after extensive comparison tests, GPFS was chosen as the only solution for disk-based storage (it had already been in use at CNAF for a long time before this test)
  - Q2 2007: StoRM (developed at INFN) implements the SRM 2.2 specifications
  - Q3-Q4 2007: StoRM/GPFS in production for D1T0 for LHCb and ATLAS, with clear benefits for both experiments (significantly reduced load on CASTOR)
  - End 2007: a project started at CNAF to realize a complete grid-enabled HSM solution based on StoRM/GPFS/TSM

GEMSS
Grid Enabled Mass Storage System:
- a full HSM (Hierarchical Storage Management) integration of GPFS, TSM and StoRM
- combines GPFS- and TSM-specific features with StoRM to provide a transparent, Grid-friendly HSM solution
- an interface between GPFS and TSM has been implemented to minimize mechanical operations in the tape robotics (mount/unmount, search/rewind)
- StoRM has been extended to include the SRM methods required to manage tapes
- permits minimizing the management effort and increasing reliability
- very positive scalability experience so far, based on the large GPFS installation in production at CNAF since 2005, with ever-growing disk space and number of users

GEMSS Development Timeline (2007-2012)
- D1T0 storage class implemented with StoRM/GPFS for LHCb and ATLAS
- D1T1 storage class implemented with StoRM/GPFS/TSM for LHCb
- D0T1 storage class implemented with StoRM/GPFS/TSM for CMS
- GEMSS used by all LHC and non-LHC experiments in production, for all storage classes
- 2011-2012: introduced a DMAPI server (to support GPFS 3.3/3.4)
ATLAS, ALICE, (CMS) and LHCb, together with all other non-LHC experiments (Argo, Pamela, Virgo, AMS), use GEMSS in production!

GEMSS recall system
The selective recall system in GEMSS uses four processes: yamssEnqueueRecall, yamssMonitor, yamssReorderRecall and yamssProcessRecall.
- yamssEnqueueRecall and yamssReorderRecall manage a FIFO queue of the files to be recalled: they fetch files from the queue and build sorted lists with an optimal file ordering.
- yamssProcessRecall actually creates the recall streams, performs the recalls and manages the error conditions (e.g. retries failed file recalls).
- yamssMonitor supervises the reorder and recall phases.
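The reordering step is the key optimization: recall requests arrive in arbitrary (FIFO) order, but replaying them grouped by tape, and sorted by position within each tape, avoids repeated mounts and long seeks. A minimal sketch of the idea (the tuple layout and field names are illustrative, not the actual yamss data format):

```python
from collections import OrderedDict

def reorder_recalls(requests):
    """Group pending recall requests by tape label, preserving the
    order in which tapes first appear in the FIFO queue, and sort
    the files on each tape by their position on the medium.
    Each request is a (file_name, tape_label, position) tuple."""
    by_tape = OrderedDict()
    for name, tape, pos in requests:
        by_tape.setdefault(tape, []).append((pos, name))
    ordered = []
    for tape, files in by_tape.items():
        # One sequential front-to-back pass per tape:
        for pos, name in sorted(files):
            ordered.append((tape, name))
    return ordered
```

With this ordering each tape is mounted once and read sequentially, which is exactly the kind of mount/search/rewind saving the GPFS-TSM interface aims for.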

GEMSS interface
A set of administrative commands has also been developed (for monitoring, for stopping and starting migrations and recalls, and for performance reporting): almost 50 user interface commands/daemons. Some examples:
- yamssEnqueueRecall (command): a simple command line tool to enqueue into a FIFO the files to be recalled from tape
- yamssLogger (daemon): a centralized logging facility; three log files (for migrations, premigrations and recalls) are centralized for each YAMSS-managed file system
- yamssLs (command): an ls-like interface which, in addition, prints the status of each file: premigrated, migrated or disk-resident
Shipped as an RPM package for installation/distribution.
Provides several STAT files for accurate statistics, including:
- file name
- time stamp
- file size
- tape label

Pre-production tests
- ~24 TB of data moved from tape to disk
- recalls corresponding to five days of typical usage by a large LHC experiment (namely CMS), compacted into one shot and completed in 19 h
- files were spread over ~100 tapes
- 0 failures
- average throughput: ~400 MB/s; peaks up to ~530 MB/s of tape recalls
- up to 6 drives used for recalls; simultaneously, up to 3 drives used for migrations of new data files
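A quick back-of-the-envelope check of the quoted figures (assuming decimal terabytes; the slide does not say whether TB or TiB is meant):

```python
def average_throughput_mb_s(total_tb: float, hours: float) -> float:
    """Average throughput in MB/s for total_tb terabytes (decimal,
    10^12 bytes) moved in the given number of hours."""
    total_bytes = total_tb * 1e12
    seconds = hours * 3600
    return total_bytes / seconds / 1e6

# ~24 TB in 19 h comes out around 350 MB/s, consistent with the
# quoted average of ~400 MB/s given the "~" on both figures.
```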

Conclusions
- We implemented a full HSM system, based on GPFS and TSM, able to satisfy the requirements of the WLCG experiments operating at the Large Hadron Collider.
- StoRM, the SRM service for GPFS, has been extended to manage tape support.
- An interface between GPFS and TSM (GEMSS) was realized to perform tape recalls in an optimal order, achieving high performance.
- A modification to the XrootD library made it possible to interface XrootD and GEMSS.
- GEMSS is the storage solution used in production at our Tier1 as a single integrated system for ALL the LHC and non-LHC experiments.
- The recent improvements in GEMSS have increased the reliability and performance of storage access.
- Results from the experiments' perspective over the latest years of production show the system's reliability and high performance, achieved with moderate effort.
GEMSS is the treasure!