Pegasus WMS
https://pegasus.isi.edu
Automate, recover, and debug scientific computations
Sat, 10 Feb 2018 18:26:17 +0000

CyVerse Container Camp
https://pegasus.isi.edu/2018/02/05/cyverse-container-camp/
Mon, 05 Feb 2018 17:30:59 +0000

Pegasus team members will be giving an overview of how to use containers in Pegasus workflows at the CyVerse Container Camp, March 7-9 at the University of Arizona, Tucson. The Pegasus portion is currently scheduled for Friday, but the rest of the agenda is packed with related and interesting topics such as how to create Docker and Singularity containers, how to integrate your own codes and data, and how to scale up.

Online Pegasus Office Hours
https://pegasus.isi.edu/2018/01/25/online-pegasus-office-hours/
Fri, 26 Jan 2018 04:46:50 +0000

We will be holding regular online Pegasus Office Hours starting Friday, February 9th at 11 AM Pacific. Initially, they will be held on a bi-monthly basis on the second Friday of the month, and will address user questions and also apprise the community of new developments.

For our first session, we will have an overview presentation on containers in Pegasus. Support for user application containers was first introduced in Pegasus 4.8.0, released in September 2017.

We hope to see you online on February 9th. Please feel free to forward this announcement to interested people.

Visualization of a neutron star merger seen in gravity and matter. Photo/Karan Jani-Georgia Tech.

A collaboration that began 16 years ago between computer scientists at the USC Information Sciences Institute (ISI) and members of the Laser Interferometer Gravitational-Wave Observatory (LIGO) and Virgo projects is opening up a new window onto the nature of the universe.

Pegasus, a specialized computer program developed by a team of ISI computer scientists led by distributed computing expert Ewa Deelman in collaboration with HTCondor, played an important role in a new astronomy discovery announced by LIGO, Oct. 16 in Washington, D.C.

For the first time, scientists have directly detected gravitational waves — ripples in space-time — in addition to light produced by colliding neutron stars. Detected by two identical LIGO detectors on Aug. 17, this marks the first time that a cosmic event has been viewed in both gravitational waves and light.

The software developed at ISI allowed the LIGO scientists to confirm the signal’s significance by conducting rigorous offline analyses of massive amounts of data. As part of this latest discovery, in August 2017 Pegasus managed almost 4,000 workflows with more than nine million tasks.

Professor Duncan Brown of Syracuse University, a member of the LIGO Scientific Collaboration, said: “Thanks to our collaboration with the Pegasus, OSG, and Condor teams, we can now turn around our offline analyses in days not weeks. This is essential for getting confirmations of low-latency alerts and getting our results out to the world.”

Other benefits of Pegasus include automatic recovery after job failures, the ability to automatically manage data flow, and a dashboard to track progress and identify the sources of job failures.

This is not the first time that Pegasus, a workflow management tool initially designed for astronomers, has helped to propel a new scientific discovery. In 2016, LIGO scientists announced the first detection of gravitational waves, confirming a prediction of Einstein’s General Theory of Relativity. The breakthrough, which received the 2017 Nobel Prize in Physics, was made possible in part by Pegasus.

“It is extremely exciting to see years of computer science research and software development have impact on cutting-edge science,” said Deelman.

“Thanks to the funding that Pegasus continues to receive from National Science Foundation and Department of Energy, we hope to improve our solutions and make them available to new science communities.”

The Nobel Prize-winning discovery that gravitational waves exist in the universe, which in turn further confirmed Albert Einstein’s General Theory of Relativity, was made possible in part by a collaboration with USC computer scientists.

By developing a specialized computer program called Pegasus, a team of USC Information Sciences Institute researchers facilitated the work of scientists who this week won the Nobel Prize in Physics for their discovery of gravitational waves that are powerful enough to ripple throughout space and distort the shape of the cosmos.

While the prize is shared by Rainer Weiss of the Massachusetts Institute of Technology, and Caltech scientists Kip Thorne and Barry Barish, the joy over their achievement is shared by colleagues.

“It’s very exciting to see 16 years of collaboration come to this point,” said Ewa Deelman, who spearheaded Pegasus’ development as a research professor in computer science and research director for the ISI at the USC Viterbi School of Engineering. “Gravitational waves, like the ones emitted by merging black holes, give a unique insight into how the universe is structured and how it developed. It’s a basic question about nature.”

Experimental data

The software developed at ISI supported computational research, such as the type conducted to detect cosmic gravitational waves at the Laser Interferometer Gravitational-Wave Observatory (LIGO). It automated the execution of hundreds of thousands of computational tasks on high-end systems, while managing the data flow between tasks and accessing experimental data across wide area networks.

Deelman said the Pegasus program, so named because it was initially designed for astronomers, was first conceived as a virtual data grid that would deliver information based on whether it was available in existing data sets (such as information collected by NASA) or by computing new data on demand.

One challenge that Deelman and her team encountered when designing the program was that the scientists had an idea of their workflow stored in their minds and wanted it enacted exactly that way, not necessarily in the manner suggested by a computer. The computer scientists had to create something that could quickly link the researchers’ ideas to existing data, software and computing resources.

“Pegasus allows you to describe these computational steps [workflow tasks] in an abstract way, providing the logic and data flow between computational steps, and then ties in those abstractions to relevant data sets, software and computational resources that can enact these steps,” Deelman said. “It maps the abstract tasks descriptions onto the available resources and manages their execution in an automated, robust and efficient way.”
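The idea Deelman describes can be illustrated with a small, self-contained Python sketch. It is not the Pegasus API: the `Task` class, the task and file names, and the `SITE_CATALOG` mapping are all hypothetical, chosen only to show how an abstract description (tasks plus the files they read and write) is enough to derive the execution order and then bind each task to a resource.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Task:
    """An abstract workflow step: a name plus the logical files it uses."""
    name: str
    inputs: Set[str] = field(default_factory=set)
    outputs: Set[str] = field(default_factory=set)


def dependencies(tasks: List[Task]) -> Dict[str, Set[str]]:
    """Derive parent -> children edges purely from the data flow."""
    producer = {f: t.name for t in tasks for f in t.outputs}
    children: Dict[str, Set[str]] = defaultdict(set)
    for t in tasks:
        for f in t.inputs:
            if f in producer and producer[f] != t.name:
                children[producer[f]].add(t.name)
    return children


def execution_order(tasks: List[Task]) -> List[str]:
    """Topologically sort the tasks (Kahn's algorithm)."""
    children = dependencies(tasks)
    indegree = {t.name: 0 for t in tasks}
    for kids in children.values():
        for kid in kids:
            indegree[kid] += 1
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order: List[str] = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for kid in sorted(children.get(name, ())):
            indegree[kid] -= 1
            if indegree[kid] == 0:
                ready.append(kid)
    if len(order) != len(tasks):
        raise ValueError("workflow contains a cycle")
    return order


def map_to_sites(order: List[str], catalog: Dict[str, List[str]]) -> Dict[str, str]:
    """Bind each abstract task to the first site that offers its executable."""
    return {name: catalog[name][0] for name in order}


# Hypothetical diamond-shaped workflow and site catalog.
WORKFLOW = [
    Task("analyze", inputs={"f.c1", "f.c2"}, outputs={"f.d"}),
    Task("findrange1", inputs={"f.b1"}, outputs={"f.c1"}),
    Task("findrange2", inputs={"f.b2"}, outputs={"f.c2"}),
    Task("preprocess", inputs={"f.a"}, outputs={"f.b1", "f.b2"}),
]
SITE_CATALOG = {
    "preprocess": ["local"],
    "findrange1": ["cluster", "local"],
    "findrange2": ["cluster", "local"],
    "analyze": ["cluster"],
}

ORDER = execution_order(WORKFLOW)
PLAN = map_to_sites(ORDER, SITE_CATALOG)
```

The tasks are listed in arbitrary order on purpose: the ordering comes from the declared inputs and outputs, not from how the scientist happened to write them down. A real workflow system layers data staging, retries, and scheduling on top of this basic mapping step.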

Pegasus eliminated the need for developing manual commands or scripts. It was designed to quickly recover from failures and ensure the data are delivered securely and efficiently.

Relatively speaking

LIGO’s discovery marked nearly 100 years since Einstein wrote his General Theory of Relativity, when he suggested that space and time are a single continuum, “space-time,” and that matter and energy could warp the shape of space-time and produce gravity.

Deelman led the team that developed Pegasus, which over the years has included USC computer scientists Karan Vahi, Mats Rynge, Rajiv Mayani, Rafael Ferreira da Silva, Gideon Juve, Gaurang Mehta and a cohort of PhD and master’s degree candidates, who wrote the software so that it functioned intuitively and reliably for the scientists. Early on, the team included ISI scientists such as Carl Kesselman, director of the Center for Discovery Informatics at the USC Michelson Center for Convergent Bioscience, which will open next month.

Other researchers from the University of Wisconsin-Madison’s HTCondor team led by Miron Livny also had a hand in the creation of Pegasus and have collaborated on the project since 2003.

“Over the years, Pegasus greatly benefited from collaborations with domain scientists from various projects, including LIGO’s Duncan Brown, Scott Koranda, Kent Blackburn, Peter Couvares and Stuart Anderson, among others. They provided us with real-life computational challenges that make our software relevant and inspiration that keeps us motivated to do better,” Deelman said.

Because Pegasus is open-source software, it is available for other researchers to download without any licensing fees. Deelman said it has been used for other scientific initiatives, including those at the Southern California Earthquake Center, which uses the software to model seismic hazards in SoCal and predict the flow of an earthquake.

Pegasus also has been used to model climate change, track the extinction of monkeys, create better soybeans and understand genetic patterns in diseases and conditions such as schizophrenia.

“With Pegasus, I think there will be many more new exciting discoveries,” Deelman said.

Pegasus receives continued support from the National Science Foundation
https://pegasus.isi.edu/2017/09/14/pegasus-nsf-grant/
Thu, 14 Sep 2017 15:26:54 +0000

The Pegasus team is pleased to announce that it has received a new grant from the National Science Foundation to support new development and maintenance of the Pegasus Workflow Management System. It will support Pegasus for the next 5 years and help address the needs of our diverse user community.

Since 2001, the Pegasus Workflow Management System has been designed, implemented and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target cyberinfrastructure. To support these workflow abstractions, Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing the data flow, job scheduling, fault recovery and adaptation of their applications. Automation enables the delivery of services that consider criteria such as time-to-solution, and that take into account efficient use of resources, the throughput of tasks, and data transfer requests. Pegasus allows scientists to easily monitor and debug their scientific workflows, providing a suite of command line tools and a web-based workflow dashboard. These capabilities allow scientists to do production-grade science at scale using Pegasus. The power of these abstractions was demonstrated in 2015 when Pegasus was used by an international collaboration to harness a diverse set of resources and to manage compute- and data-intensive workflows that confirmed the existence of gravitational waves, as predicted by Einstein’s theory of relativity.

Experience from working with these diverse scientific domains has helped us uncover opportunities for further automation of scientific workflows. The new effort will address these opportunities through innovation in the following areas: automation methods to include resource provisioning ahead of and during workflow execution, data-aware job scheduling algorithms, and data sharing mechanisms in high-throughput environments. Near-term capabilities to be released in the 4.8 software release include:

Integration with Jupyter Notebook;

Support for application container technologies: both Docker and Singularity.

To support a broader group of “long-tail” scientists, the new grant provides funding towards usability improvements as well as outreach, education, and training activities.

The proposed enhancements will be integrated into Pegasus, and distributed to the user community as part of regular Pegasus software releases. This will facilitate adoption and evaluation of these capabilities in the context of real-life applications and computing environments. The data-aware focus targets new classes of applications executing in high-throughput and high-performance environments.

The Pegasus team very much looks forward to our continued collaboration with domain and computer scientists and we also hope to work with new users and communities. Please contact us at pegasus@isi.edu if you would like to discuss your workflow needs and ideas.

Jupyter Support – Pegasus now provides a Python API to declare and manage workflows via Jupyter, which allows workflow creation, execution, and monitoring. The API also provides mechanisms to create Pegasus catalogs (sites, replica, and transformation). More details can be found in the documentation at https://pegasus.isi.edu/docs/4.8.0/jupyter.php

New Features and Improvements

JGlobus is no longer actively supported and is not in compliance with RFC 2818 (https://docs.globus.org/security-bulletins/2015-12-strict-mode). As a result, cleanup jobs using the pegasus-gridftp client would fail against servers that enforce strict mode. We have removed the pegasus-gridftp client and now use gfal clients, because globus-url-copy does not support removing files. If gfal is not available, globus-url-copy is used for cleanup by writing out zero-byte files instead of removing them.
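The fallback described above can be sketched in a few lines of Python. This is an illustration only, not the code Pegasus ships: the function name `cleanup_command` is hypothetical, and the sketch assumes that the gfal remove client is installed as `gfal-rm` and that copying from `file:///dev/null` is an acceptable way to produce a zero-byte remote file.

```python
import shutil
from typing import Callable, List, Optional


def cleanup_command(
    url: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the argv list for cleaning up one remote file.

    `which` is injectable so the choice can be exercised without
    either client being installed.
    """
    if which("gfal-rm"):
        # Assumption: gfal is available, so the remote file can really be removed.
        return ["gfal-rm", url]
    # Fallback: globus-url-copy cannot remove files, so overwrite the
    # target with zero bytes (assumed source: file:///dev/null).
    return ["globus-url-copy", "file:///dev/null", url]
```

Note that in the fallback case the file still exists on the remote server afterwards; only its contents are gone, which frees the space but leaves the directory entry behind.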

[PM-1212] – new defaults for number of transfer and inplace jobs created

Pegasus 4.7.5 Released
https://pegasus.isi.edu/2017/09/05/pegasus-4-7-5-released/
Tue, 05 Sep 2017 22:35:17 +0000

We are happy to announce the release of Pegasus 4.7.5. Pegasus 4.7.5 is a minor release, which contains minor enhancements and bug fixes. This will most likely be the last release in the 4.7 series, and unless you have specific reasons to stay with the 4.7.x series, we recommend upgrading to 4.8.0.

Improvements

[PM-1146] – There doesn’t seem to be a way to get a persistent URL for a workflow in dashboard

[PM-1186] – pegasus-db-admin should list compatibility with latest pegasus version if no changes to schema

[PM-1187] – make scheduler type case insensitive for grid gateway in site catalog

JGlobus is no longer actively supported and is not in compliance with RFC 2818 (https://docs.globus.org/security-bulletins/2015-12-strict-mode). As a result, cleanup jobs using the pegasus-gridftp client would fail against servers that enforce strict mode. We have removed the pegasus-gridftp client and now use gfal clients, because globus-url-copy does not support removing files. If gfal is not available, globus-url-copy is used for cleanup by writing out zero-byte files instead of removing them.

Jupyter Support – Pegasus now provides a Python API to declare and manage workflows via Jupyter, which allows workflow creation, execution, and monitoring. The API also provides mechanisms to create Pegasus catalogs (sites, replica, and transformation). More details can be found in the documentation at https://pegasus.isi.edu/docs/4.8.0dev/jupyter.php