Andrzej M. Goscinski School of Computing and Mathematics

1 Making Parallel Processing on Clusters Efficient, Transparent and Easy for Programmers
  Andrzej M. Goscinski
  School of Computing and Mathematics
  Deakin University
  Joint work with Michael Hobbs, Jackie Silcock and Justin Rough

3 Parallel Processing: User Expectations
  Affordable: supercomputers for a “poor man”
  Performance: good performance
  Ease of Use: free from creation and placement concerns
  Transparency: unaware of location of processes
  Ease of Programming: choice and easy use of communication paradigm

5 Parallel Processing Phases
  Three distinct phases: initialization, execution, termination
  Researchers and manufacturers mainly concentrate on execution to achieve the best performance
  Ease of use of parallel systems and programmer’s time are neglected
  Application developers are discouraged because they have to program many activities that are of an operating system nature

6 Parallelism Management
  Present operating systems that manage clusters are not built to support parallel processing
    Reason: these operating systems do not provide services to manage parallelism
  Parallelism management is the management of parallel processes and computational resources
    Achieve high performance
    Use computational resources efficiently
    Make programming and use of parallel systems easy

7 Parallelism Management
  Parallelism management in parallel programming tools, Distributed Shared Memory and enhanced operating system environments
    has been neglected
    has been left to the application developers
  Application developers must deal
    not only with parallel application development
    but also with the problems of initiation and control for the execution on the cluster
  Transparency and reliability (SSI) have been neglected: users do not see a cluster as a single powerful computer

8 Services for Parallelism Management on Clusters
  Services for parallelism management and transparency:
    Establishment of a virtual machine
    Mapping of processes to computers
    Parallel processes instantiation
    Data (including shared) distribution
    Initialisation of synchronization variables
    Coordination of parallel processes
    Dynamic load balancing

9 Transparency
  Users should see a cluster as a single powerful computer
  Dimensions of parallel processing transparency:
    Location transparency
    Process relation transparency
    Execution transparency
    Device transparency

10 Communication Paradigms
  Two communication paradigms:
    Message Passing (MP)
      Explicit communication between processes of a parallel application
      Fast
      Difficult to use for programmers
    Distributed Shared Memory (DSM)
      Implicit communication between processes of a parallel application through shared memory objects
      Easy to use
      Demonstrates reduced performance
  Claim: operating environments that offer MP and DSM should be provided as a part of a cluster operating system, as they manage system resources
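The contrast between the two paradigms can be illustrated with a small sketch. It uses Python threads on one machine purely as an analogy (not a cluster, and not the GENESIS API); the function names are made up for this example. In the first function each worker explicitly sends a message; in the second, workers communicate implicitly by updating a shared object.

```python
import queue
import threading


def sum_by_message_passing(chunks):
    """Each worker explicitly sends its partial sum to the parent."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # explicit send

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Explicit receive: one message per worker.
    return sum(mailbox.get() for _ in chunks)


def sum_by_shared_memory(chunks):
    """Workers communicate implicitly by updating one shared object."""
    shared = {"total": 0}
    lock = threading.Lock()  # access to the shared object is synchronized

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            shared["total"] += partial  # implicit communication

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]


chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(sum_by_message_passing(chunks), sum_by_shared_memory(chunks))
```

Both functions compute the same result; the difference is whether the programmer writes the send and receive operations or only reads and writes shared data.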

11 What to do?
  Affordable: clusters
  Performance: introduce special services
  Ease of Use: parallelism management
  Transparency: operating systems
  Ease of Programming: message passing and DSM
  Development of cluster operating systems supporting parallel processing
  Services of cluster operating systems:
    Distributed services for transparent communication and management of basic system resources
    Services for parallelism management and transparency

12 Related Systems: Message Passing Systems
  PVM
    A set of cooperating server processes and specialized libraries that support process communication, execution and synchronization
    A virtual machine must be set up by the user
    Provides transparent process creation and termination
  MPI
    Objective is to standardize and coordinate the direction of various message passing applications, tools and environments
    Provides limited process management functions to support parallel processing
  HARNESS
    Does not provide transparency
    Programmers are forced to specify computers and map processes to these computers
    Load imbalance is neglected

13 Related Systems: DSM Systems
  Research concentrates mainly on improving performance
  Ease of use has been neglected
  Munin
    Programmers must label different variables according to the consistency protocol they require
    The initialisation stage requires the application developer to define the number of computers to be used
    Programmers must create a thread on each computer, initialise shared data and create synchronization variables
  TreadMarks
    The application developer has a substantial input into initialisation of DSM processes
    Full transparency is not provided

14 Related Systems: Execution Environments
  PVM, MPI and DSM systems run on top of an operating system; the improvement on that approach is to enhance the operating system itself to support parallel processing
  Beowulf
    Exploits distributed process space to manage parallel processes
    Processes can be started on a remote computer after a logon operation into that computer has completed successfully
    It addresses neither resource allocation nor load balancing
    Transparent process migration is not provided

16 Related Systems: Summary
  All systems but MOSIX are based on middleware: no attempt has been made to develop a comprehensive operating system to support parallel processing on clusters
  The solutions are performance driven: little work has been done on making them programmer friendly
  Problems from the parallel processing point of view:
    Processes are created one at a time, although the primitives provided enable the user to create multiple processes
    These systems (with the exception of MOSIX) do not provide complete transparency
    Virtual machine is not set up automatically
    These systems do not provide load balancing

17 Cluster Execution Environments
  Execution environments that support parallel processing on clusters can be developed using
    Middleware approach: at the application level
    Underware: at the kernel level

19 Middleware - summary
  Middleware allows programmers to
    develop parallel applications (PVM, MPI)
    execute parallel applications on clusters (Beowulf)
    employ shared memory based programming (Munin)
    achieve good execution performance
    take advantage of portability
  Middleware
    does not offer complete transparency
    reduces potential execution performance (services are duplicated)
    forces programmers to be involved in many time consuming and error prone activities that are of an operating system nature
  Conclusion: to provide parallelism management, offer transparency, and make programming and use of a system easy, develop the needed services at the operating system level

20 Cluster operating systems
  A cluster is a special kind of distributed system
  A cluster operating system supporting parallel processing should
    possess the features of a distributed operating system to deal with distributed resources and their management and hide distribution
    exploit additional services to manage parallelism for applications and offer complete transparency
    provide an enhanced programming environment
  Three logical levels of a cluster operating system:
    Basic distributed operating system
    Parallelism management and transparency system
    Programming environment

25 Establishment of a Virtual Machine
  Resource Discovery Server supports adaptive establishment of a virtual machine
  Resource Discovery Server
    Identifies
      Idle and lightly loaded computers
      Computer resources: e.g., processor model, memory size
      Computational load and available memory
      Communication patterns for each process
    Passes information to the Global Scheduling Server per
      Process
      Server
      Averaged over an entire cluster
  Virtual machine changes dynamically
    Some computers become overloaded or out of order
    Some computers become idle
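A minimal sketch of adaptive virtual machine establishment follows. The `ComputerStatus` record, the thresholds and the function name are assumptions made for illustration, not the Resource Discovery Server's actual interface; the point is only that membership is recomputed from current load and availability data.

```python
from dataclasses import dataclass


@dataclass
class ComputerStatus:
    name: str
    load: float          # hypothetical normalized load, 0.0 means idle
    free_memory_mb: int
    alive: bool = True   # False models a computer that is out of order


def establish_virtual_machine(statuses, max_load=0.5, min_memory_mb=64):
    """Return names of computers that are up, lightly loaded and have memory."""
    return sorted(
        s.name
        for s in statuses
        if s.alive and s.load <= max_load and s.free_memory_mb >= min_memory_mb
    )


cluster = [
    ComputerStatus("c1", 0.0, 256),
    ComputerStatus("c2", 0.9, 256),
    ComputerStatus("c3", 0.3, 128),
    ComputerStatus("c4", 0.1, 256, alive=False),
]
vm = establish_virtual_machine(cluster)        # c1 and c3 qualify

cluster[1].load = 0.2                          # c2 becomes lightly loaded
cluster[0].load = 0.95                         # c1 becomes overloaded
vm_later = establish_virtual_machine(cluster)  # membership changes to c2, c3
```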

26 Process Creation Requirements
  Multiple process creation: to create many instances of a process on a single computer or over many computers
  Scalability: must be scalable to many computers
  Complete transparency: must hide the location of all resources and processes
  Three forms of process creation: single, multiple, group
  Creation is invoked when the Execution Manager receives a process create request from a parent process
  Execution Manager notifies Global Scheduler
  Global Scheduler sends the location on which the process should be created
  Execution Manager on the selected computer manages process creation

27 Process Creation: Single and Multiple Services
  Single process creation service
    Similar to the services found in traditional systems supporting parallel processing
    Requires an executable image to be downloaded from disk for each parallel process to be created
  Multiple process creation service
    Supports the concurrent instantiation of a number of processes on a given computer through one creation call
    When many computers are involved in multiple process creation, each computer is addressed in a sequential manner
    Executable image of a parallel child process must be downloaded separately for each computer involved: a scalability problem

28 Process Creation: Group
  Group process creation combines multiple process creation and group communication
  Group process creation service
    Allows multiple processes to be created concurrently on many computers
    A single executable is downloaded from a file server using group communication
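The scalability difference between the three creation services can be made concrete by counting executable-image downloads, as described on the slides: one per process for single creation, one per computer for multiple creation, and one in total for group creation. The function below is an illustrative counting model under that reading, not part of GENESIS.

```python
def image_downloads(service, processes_per_computer):
    """Count executable-image downloads needed for one creation request.

    processes_per_computer lists how many child processes are to be
    created on each selected computer.
    """
    if service == "single":
        # One download for every process created.
        return sum(processes_per_computer)
    if service == "multiple":
        # One download for every computer involved.
        return sum(1 for n in processes_per_computer if n > 0)
    if service == "group":
        # One download, delivered to all computers by group communication.
        return 1 if any(n > 0 for n in processes_per_computer) else 0
    raise ValueError(f"unknown service: {service}")


# Two child processes on each of four computers.
request = [2, 2, 2, 2]
for service in ("single", "multiple", "group"):
    print(service, image_downloads(service, request))
```

For this request the counts are 8, 4 and 1 respectively, which is why only the group service stays constant as computers are added.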

30 Process Duplication: Single Local and Remote
  Parallel processes are instantiated on selected computers by employing process duplication supported by process migration
  Three forms of process duplication:
    Single local and remote
    Multiple local and remote
    Group remote
  Single local and remote process duplication
    Duplication is invoked when the Execution Manager receives a twin request from a parent process
    Execution Manager notifies Global Scheduler
    Global Scheduler sends a location on which the twin should be placed
    If this computer is remote, process migration is employed

31 Process Duplication: Multiple Local and Remote
  Multiple local and remote process duplication is an enhancement of single process duplication
  Duplication is invoked when the Execution Manager receives a multiple duplication request from a parent process
  Execution Manager notifies Global Scheduler
  Global Scheduler sends a location on which the twin should be placed
  If the computer is local
    Process Manager and Space Manager are requested to duplicate multiple copies of process entries and memory spaces
  If the computer is remote
    the parent process is migrated to this destination
    multiple copies of the parent process are duplicated
    the parent process on the remote computer is killed
  Child processes should be duplicated on many computers
    Remote process duplication is performed for each selected computer

32 Process Duplication: Group Remote
  When more than one remote computer is involved in process duplication, the overall performance decreases
    The decrease is caused by migrating a parent process to each remote computer sequentially
  Performance is improved by employing group process migration
    Process Managers and Execution Managers each join a relevant group and use group communication
    The parent process is concurrently migrated to all selected remote computers involved in process duplication

34 Process Migration
  Designed to separate policy from mechanism
  Process Migration Manager acts as the coordinator for migration of the various resources that combine to form a process
  Migration of resources (memory, process entries, buffers) is carried out by the Space, Process and IPC Managers, respectively
  Two forms of process migration: single and group
  Single process migration
    Global Scheduler provides “which” process to “where” computer
    Local Manager requests its remote peer to prepare for a process
    Local Migration Manager requests Space, Process and IPC Managers to migrate respective resources
    Remote Manager informs its local peer of successful migration
    Local Manager requests Space, Process and IPC Managers to delete the respective resources of the migrated process
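The single process migration steps can be traced with a toy model. Everything here (the `Computer` class, the resource names, the `migrate` function) is a hypothetical stand-in chosen for illustration; real managers exchange messages between computers, whereas this sketch only reproduces the order of operations: prepare, migrate each resource, confirm, then delete at the source.

```python
RESOURCES = ("memory", "process_entries", "ipc_buffers")


class Computer:
    """Holds the per-process state kept by the Space, Process and IPC Managers."""

    def __init__(self, name):
        self.name = name
        self.memory = {}           # Space Manager state
        self.process_entries = {}  # Process Manager state
        self.ipc_buffers = {}      # IPC Manager state
        self.prepared = set()      # processes the computer expects to receive


def migrate(pid, source, destination):
    """Move process pid from source to destination; return the step log."""
    log = []
    # The local manager asks its remote peer to prepare for the process.
    destination.prepared.add(pid)
    log.append("prepare")
    # Space, Process and IPC Managers migrate their respective resources.
    for resource in RESOURCES:
        getattr(destination, resource)[pid] = getattr(source, resource)[pid]
        log.append(f"migrate {resource}")
    # The remote manager reports success to its local peer.
    if not all(pid in getattr(destination, r) for r in RESOURCES):
        raise RuntimeError("migration incomplete")
    destination.prepared.discard(pid)
    log.append("confirm")
    # Only after confirmation are the resources deleted at the source.
    for resource in RESOURCES:
        getattr(source, resource).pop(pid)
        log.append(f"delete {resource}")
    return log


a, b = Computer("a"), Computer("b")
a.memory[7], a.process_entries[7], a.ipc_buffers[7] = "pages", "entry", []
steps = migrate(7, a, b)
```

Deleting at the source only after the remote confirmation means a failed transfer leaves the original process intact.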

36 Group Process Migration
  Enhancement of single process migration
  The single communication between the peer Migration Managers, Process Managers, Space Managers and IPC Managers is changed to group communication
  Global Scheduler provides “which” process to “where” computers
  Each server migrates its respective resources to multiple destination computers in a single message using group communication
  Parent process is duplicated on each remote computer
  At the end of a successful migration, the parent process on each remote computer is killed

37 Global Scheduling
  Makes policy decisions of which processes should be mapped to which computers
  Input provided by the Resource Discovery Manager
  Relies on mechanisms of
    Single, multiple and group process creation and duplication services
    Single and group process migration
  The server combines services of
    Static allocation: at the initial stage of parallel processing
    Dynamic load balancing: to react to load fluctuations
  Currently, the Global Scheduler is implemented as a centralized server
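The two services the Global Scheduler combines can be sketched as follows. The policies used here (place on the least loaded computer; migrate one process when the load gap reaches a threshold) and all names are assumptions for illustration, since the slides do not state the actual GENESIS policies.

```python
def static_allocation(n_processes, loads):
    """Map each new process to the currently least loaded computer.

    loads maps a computer name to its process count and is updated in place.
    """
    placement = []
    for _ in range(n_processes):
        target = min(sorted(loads), key=loads.get)
        loads[target] += 1
        placement.append(target)
    return placement


def balance_step(loads, threshold=2):
    """Pick one migration (source, destination), or None if load is balanced."""
    names = sorted(loads)
    busiest = max(names, key=loads.get)
    idlest = min(names, key=loads.get)
    if loads[busiest] - loads[idlest] < threshold:
        return None
    loads[busiest] -= 1
    loads[idlest] += 1
    return busiest, idlest


loads = {"c1": 0, "c2": 1, "c3": 0}
placement = static_allocation(4, loads)  # initial stage of parallel processing
no_move = balance_step(loads)            # loads are even enough, no migration

loads["c1"] = 5                          # a load fluctuation on c1
move = balance_step(loads)               # one process should leave c1
```

The scheduler only decides which process goes where; carrying out the decision is left to the creation, duplication and migration mechanisms, in line with the separation of policy from mechanism.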

42 Distributed Shared Memory
  DSM is an integral component of the operating system
  Since DSM is a memory management function, the DSM system is integrated into the Space Manager
    Shared memory is used as though it were physically shared
    Easy to use shared memory
    Low overhead, improved performance
  Two consistency models supported:
    Sequential: implemented using the invalidation model
    Release: implemented using the write-update model
  Synchronization and coordination of processes
    Semaphores: owned by the Space Manager on a particular computer
    Gaining ownership is distributed and mutually exclusive
    Barriers are used for coordination; their management is centralized
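The difference between the invalidation and write-update models can be shown on a single shared page. This is a toy model under stated assumptions: the class names are invented, and propagating updates at release time is one common way to realize release consistency, not a documented detail of the GENESIS implementation.

```python
class InvalidationPage:
    """Write-invalidate: a write leaves only the writer with a valid copy."""

    def __init__(self, nodes, value=0):
        self.copies = {node: value for node in nodes}  # currently valid copies
        self.owner = nodes[0]

    def write(self, node, value):
        self.copies = {node: value}  # every other copy is invalidated
        self.owner = node

    def read(self, node):
        if node not in self.copies:  # miss: fetch a fresh copy from the owner
            self.copies[node] = self.copies[self.owner]
        return self.copies[node]


class UpdatePage:
    """Write-update: buffered writes are pushed to all copies on release."""

    def __init__(self, nodes, value=0):
        self.copies = {node: value for node in nodes}
        self.pending = {}

    def write(self, node, value):
        self.copies[node] = value
        self.pending[node] = value  # not yet visible to other nodes

    def release(self, node):
        if node in self.pending:
            value = self.pending.pop(node)
            for other in self.copies:
                self.copies[other] = value  # update every copy in place

    def read(self, node):
        return self.copies[node]


inv = InvalidationPage(["a", "b"])
inv.write("a", 5)
b_was_invalidated = "b" not in inv.copies
b_reads = inv.read("b")          # b re-fetches the page from the owner

upd = UpdatePage(["a", "b"])
upd.write("a", 7)
before_release = upd.read("b")   # b still sees the old value
upd.release("a")
after_release = upd.read("b")    # b's copy was updated, not invalidated
```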

46 Easy to Use and Program Environment
  GENESIS system
    Provides an efficient and transparent environment for execution of parallel applications
    Offers transparency
    Relieves programmers from activities such as:
      Selection of computers for a virtual machine for the given application
      Setting up a virtual machine
      Mapping processes to the virtual machine
      Process instantiation using process creation and duplication supported by process migration
      Load balancing

47 Easy to Use and Program Environment
  In the GENESIS system
    Location of the remote computer(s) of the cluster is selected automatically by the Global Scheduler
    Users do not know process location
  Programming of parallel applications has been made easy by providing
    Message passing: standard and PVM
    Distributed Shared Memory
    Powerful primitives: implement sequences of operations and provide transparency
      process_ncreate(GROUP_CREATE, n, "child_prog")
    Process instantiation using process creation and duplication supported by process migration
    Load balancing

50 GENESIS PVM vs. Unix PVM: IPC Latency
  Support for IPC provided by the PVM server in Unix was substituted with GENESIS operating system mechanisms
  To measure the time saved by removing the server, a simple PVM application that exchanges messages (1 kbyte to 100 kbytes) was used
  Round-trip time (including data packing and unpacking) was measured
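A round-trip measurement of this kind can be sketched as a ping-pong loop. The version below is only a stand-in under clear assumptions: it uses in-process queues and `pickle` instead of PVM and a network, so its numbers say nothing about either system; it shows just the shape of the measurement, with packing and unpacking inside the timed region.

```python
import pickle
import queue
import threading
import time


def round_trip_time(size_bytes, repeats=20):
    """Average seconds per echo of a packed message of size_bytes."""
    to_peer, from_peer = queue.Queue(), queue.Queue()

    def echo():
        # The peer returns every message it receives.
        for _ in range(repeats):
            from_peer.put(to_peer.get())

    peer = threading.Thread(target=echo)
    peer.start()
    payload = bytes(size_bytes)
    reply = None
    start = time.perf_counter()
    for _ in range(repeats):
        to_peer.put(pickle.dumps(payload))     # pack and send
        reply = pickle.loads(from_peer.get())  # receive and unpack
    elapsed = time.perf_counter() - start
    peer.join()
    if reply != payload:
        raise RuntimeError("echoed data does not match")
    return elapsed / repeats


# Message sizes spanning the 1 kbyte to 100 kbyte range used on the slide.
for size in (1_000, 10_000, 100_000):
    print(size, round_trip_time(size))
```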

51 GENESIS PVM vs. Unix PVM: Speedup
  The influence of process instantiation was studied with an application in which the amount of work relates to the overall execution time
  Parameters:
    Number of workstations
    GENESIS with and without load balancing

55 Summary
  Nondedicated clusters are commonly available, but they
    Force application developers to program operating system operations
    Do not offer transparency
  Application developers need a computer system that
    Processes applications efficiently
    Uses cluster resources well
    Allows users to see the cluster as a single powerful computer rather than as a set of connected computers
  Proposal: employ a cluster operating system
  Design: cluster operating system with three logical levels
    Distributed operating system
    Parallelism management and transparency system
    Programming environment

56 Summary
  GENESIS: designed and developed as a “proof of concept”
  GENESIS is a system that satisfies user requirements
  GENESIS approach is unique
    Offers both message passing (MP and PVM) and DSM environments
    Services providing parallelism management are integral components of an operating system
    Provides a comprehensive environment to transparently manage system resources
    Programmers do not have to be involved in parallelism management
    Use of the cluster has been made easy
    Complete transparency is offered
    Good performance results have been achieved

57 Future Work
  Port GENESIS to an Intel-like platform
  Use virtual memory to support DSM
  Offer reliable parallel computing services on clusters by employing
    Reliable group communication
    Checkpointing to offer fault tolerance