In the first year of this project, substantial progress has been made on all fronts: programmatic, technical, and scientific. As we begin the second year of the project, we are poised to make our first public science demonstrations, building upon substantial technical developments in metadata standards and data access protocols. We have been successful in engaging nearly all participating organizations in substantive work. The NVO project is co-leading international VO initiatives, including the formation of the International Virtual Observatory Alliance, for which the NVO Project Manager serves as chairperson.

NVO Science.

At the spring project team meeting (Tucson, 16-17 April 2002) three scientific demonstration projects were selected from an extensive list of potential projects compiled by the Science Working Group. The demonstrations were chosen based on a number of criteria, including availability of necessary data, feasibility of completion by January 2003, and ability to show results in a matter of a few minutes (i.e., the time one can typically hold the attention of an astronomer passing by a display booth at an AAS meeting). The selected demonstrations are:

- Brown dwarf candidate search.
- Gamma-ray burst follow-up service.
- Galaxy morphology measurement and analysis.

These are described in more detail in WBS 10.1 and 10.2 of this report.

Next year we will develop more complex science demonstrations, and these will incorporate data from our international partners. A major milestone is the August 2003 IAU General Assembly, where we will unveil a second round of demonstrations and participate in a Joint Discussion on virtual observatories and new large telescopes.

NVO Technology.

In collaboration with the European virtual observatory development projects, AstroGrid and AVO, we released V1.0 of the VOTable XML formatting standard for astronomical tables. Using VOTable as a standard output product, some 50 “cone search” services were implemented by 7 different groups within the team. The cone search services respond to a request for information based on a right ascension, declination, and radius about that position. Four software libraries for parsing VOTable documents were written and made available via the team web site. Also, a JHU-based team developed a catalog cross-correlation service for SDSS, 2MASS, and FIRST using Microsoft’s .NET facilities and won second place in a nationwide software development contest.

During the summer of 2002 we developed the specification for a Simple Image Access Protocol, and by the end of the first project year several implementations had been completed. By combining the cone search and SIA services we have the infrastructure necessary for implementing the science demonstration projects.
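To make the shape of these interfaces concrete, the sketch below issues both kinds of query from Python. The base URLs are hypothetical placeholders; the parameter names (RA, DEC, and SR for the cone search; POS and SIZE for SIA) follow the published specifications, and each call returns a VOTable document.

    import urllib.request
    import urllib.parse

    # Hypothetical service endpoints; real base URLs come from the registry.
    CONE_URL = "http://example.org/cgi-bin/cone"
    SIA_URL = "http://example.org/cgi-bin/sia"

    def cone_search(ra_deg, dec_deg, radius_deg):
        """Ask a cone search service for catalog rows within a circle on the sky."""
        query = urllib.parse.urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
        with urllib.request.urlopen(f"{CONE_URL}?{query}") as resp:
            return resp.read()  # a VOTable XML document

    def sia_query(ra_deg, dec_deg, size_deg):
        """Ask a Simple Image Access service for images covering a region."""
        query = urllib.parse.urlencode({"POS": f"{ra_deg},{dec_deg}", "SIZE": size_deg})
        with urllib.request.urlopen(f"{SIA_URL}?{query}") as resp:
            return resp.read()  # a VOTable listing matching images

    # Example: a 0.1-degree cone around (180.0, +2.5)
    votable_xml = cone_search(180.0, 2.5, 0.1)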

Substantial progress has been made on metadata standards, work that supports both the VOTable and SIA specifications. In addition, a standard for resource and service-level metadata has been developed based on the Dublin Core. This standard has been widely reviewed and discussed among the international VO projects.

Next year we will begin to explore methods for creating industry-standard Web Services, and for deploying our initial http and cgi-bin services through WSDL.

The NVO Project.

Despite the somewhat lengthy negotiation process that was required to place all of the subawards under this project, participating organizations were generally able to start work within the first several months. It is a challenge to coordinate work and fully exchange information within a collaboration of this scale, but through a system of working groups, project status reviews, and regular team meetings we have established effective communication and cooperation. The project Executive Committee meets weekly by telecon to address issues as they arise.

The delays in getting all subawards issued led to cost underruns in Year 1. These will be rolled forward to Year 2, and further to Year 3, to help smooth out the strongly front-end-loaded funding profile. Financially the project is in good shape.


Activities by WBS

1 Management

1.1 Science Oversight

The Executive Committee has taken a direct interest in the progress on the selected science demonstration projects:

- A brown dwarf candidate search
- A galaxy morphology analysis
- A gamma-ray burst follow-up service

These have been monitored closely, and when issues have arisen or progress has been less than expected, the EC has intervened accordingly. It will be a challenge to complete all three demonstrations in time for the January AAS meeting, though we remain optimistic of success.

Two members of the EC, R. Hanisch and D. De Young, team member G. Fabbiano, and EPO collaborator J. Mattei, are members of the Astrophysical Virtual Observatory Science Working Group. We are following the AVO science demonstration developments and will work with the AVO, AstroGrid, and other international VO projects to develop science demonstrations in the second year of the project that draw upon data resources and information services from all international partners.

1.2 Technical Oversight

The Executive Committee is also directly involved with technical development activities: metadata standards, interoperability protocols, and web services. We have been actively involved with the IVOA (International Virtual Observatory Alliance) to build a single internationally accepted Simple Image Access Protocol, a follow-on to prior success in establishing the VOTable standard.

We maintained a web site, http://us-vo.org, for the project. This includes a document management system that allows team members to publish documents directly, without going through a human web master. The system already has over 40 documents. The web page also contains archives of several active discussion groups that are associated with the NVO (http://archives.us-vo.org). These include the very active Metadata and VOTable discussion groups, each with several hundred messages. A new discussion group, “semantics,” has been set up to discuss the application of Knowledge Engineering technologies such as DAML+OIL and Topic Maps to astronomy.

1.3 Project and Budget Oversight

Performance Against Schedule. We are on or ahead of schedule in most activities. The detailed project plan shows progress (estimated percent completion) to date. Some scheduled activities need to be modified to reflect changes in approach.


Performance Against Budget. We did not spend the full first-year funding for the project owing to complications in issuing subawards and the associated delays in hiring at many organizations. Many of our university-based team members operate on quarterly billing cycles, and have mechanisms for covering costs internally until invoices are issued and payments are received. It has been difficult, therefore, to have a very accurate picture of to-date spending. Based on invoices received and known commitments, we expect to carry forward approximately 40% of our first year budget. We have made some budget reallocations within the project, moving responsibilities and associated funding to organizations that have been the strongest contributors. One senior member of the team relocated from one participating organization to another, taking responsibilities and work areas with him; SOWs and budgets were adjusted accordingly.

2 Data Models

2.1 Data Models / Data Model Architecture

We established a mailing list for data model discussions (dm@us-vo.org) and began work on proposed nomenclature. J. McDowell visited Strasbourg for the interoperability workshop and held discussions with M. Louys and F. Genova to establish a collaboration with the AVO data model effort. A draft document on the data model architecture has been circulated among the team and to members of the international collaborations.

Fruitful discussions at the April NVO team meeting in Tucson and at the VO conference in Garching have led to agreement on a basic approach, in which we will make small models of aspects of the data and agree on a mechanism for associating such models with datasets and representing them in formats such as VOTable. A document describing the modeling of spectral bandpasses was also written and circulated.

The SAO group has begun modeling existing datasets and elaborating the possible components of the data model. A detailed comparison of the CDS Aladin image archive model and the CXC X-ray data model was carried out and distributed to the team to stimulate discussion.

2.2 Data Models / Data Types

We have established that both images and catalogs have many common attributes; the information content of the CDS catalog description file is closely matched by the information content required to describe image axes. Our investigations emphasize the need to support, at a fundamental level, mosaicked images such as those made by HST and modern ground-based imagers.

We studied image data formats from the archives of participating organizations, and established the importance of unifying the different mosaic image formats (four main variants were identified). These issues and a proposed general approach were described in a talk at the Garching VO conference.


2.3 Data Models / Data Associations

During the Strasbourg discussions we addressed issues of data quality (WBS 2.3.4) as an important component of the VO that should eventually be supported at the level of datasets, calibration quantities, and individual data pixels.

Work in this WBS supports the Metadata Working Group in their definition of space-time metadata. From the data model point of view, it is important to ensure that the mechanisms used to associate the space-time metadata with a dataset are defined generically, so that they can also be used with other kinds of metadata.

A collaboration between CACR, the Caltech Astronomy department, and CDS Strasbourg has been using Topic Map technology to create tools that can federate metadata. We are leveraging the UCD (Uniform Content Descriptor) mechanism—which closely describes the semantic meaning of an astronomical datum—and the central role of UCDs in the VOTable specification. Given that UCDs are already internationally accepted, we can build further semantic tools on them. Topic maps can be used to take a number of related astronomical tables and find the connections and commonalities between the attribute descriptors, so that effective federation and data mining can be machine-assisted.
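A minimal sketch of the idea, assuming each table has been reduced to a mapping from column name to UCD string (the column names, and the pairing logic, are illustrative):

    # Federate two tables by finding columns that share a UCD.
    # Column-name -> UCD mappings would in practice be read from VOTable FIELD elements.
    table_a = {"raj2000": "POS_EQ_RA_MAIN", "dej2000": "POS_EQ_DEC_MAIN", "jmag": "PHOT_JHN_J"}
    table_b = {"ra": "POS_EQ_RA_MAIN", "dec": "POS_EQ_DEC_MAIN", "z": "REDSHIFT_HC"}

    def common_ucds(a, b):
        """Return pairs of column names from the two tables that describe the same quantity."""
        by_ucd_b = {ucd: name for name, ucd in b.items()}
        return [(name, by_ucd_b[ucd], ucd) for name, ucd in a.items() if ucd in by_ucd_b]

    for col_a, col_b, ucd in common_ucds(table_a, table_b):
        print(f"{col_a} <-> {col_b}  ({ucd})")

Matched column pairs such as these are exactly the join keys a federation or data mining tool needs in order to combine the tables automatically.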

3 Metadata Standards

3.1 Metadata Standards / Basic Profile Elements

The Space-Time metadata design (A. Rots) has progressed to the point where it is defined in terms of an XML DTD as well as (and more usefully) an XML Schema. Extensive discussions and experiments have led to various revisions. One final revision will be made before November 1, 2002. When that is done, we can concentrate on writing code to construct and interpret the Space-Time Coordinate objects, as well as to perform transformations. With the help of such tools the Space-Time Coordinate (STC) metadata can actually be used.

The STC metadata project has also shown the path to a metadata generalization that will allow us to express other metadata following a similar design. As part of this work, we have contributed to the effort to define the proper place and use of Uniform Content Descriptors (UCDs).

There are a few issues left concerning the STC metadata. In particular, we will need to find a firm design for defining new coordinate frames such as, for instance, coordinate frames anchored to solar system objects. But these are not of immediate concern, and we have ensured that the current design of the STC metadata allows such extensions. As a sub-issue in this area, we have provided a design for Spatial Region metadata. In the next year we will need to work on interfaces to this metadata design. Various experiments in the Metadata and Data Model groups have made us all realize the importance of such metadata.


R. Hanisch led the draft definition of Resource and Service Metadata (http://bill.cacr.caltech.edu/cfdocs/usvo-pubs/files/ResourceServiceMetadataV5.pdf). An important result of this document has been its description of an architecture for understanding the role of resources and services in the VO. The architecture outlined by the Resource and Service Metadata document makes clear the need for an integrated approach to resource and service registration, in which service descriptions “inherit” the metadata of the resource that provides them. Such an approach will ultimately make registration easier for providers by minimizing the information they must provide as they register or extend more and more services.
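The inheritance idea can be sketched as follows; the field names loosely follow the Dublin Core flavor of the draft, and all values are hypothetical. A service description supplies only what differs from its parent resource:

    # Hypothetical resource record with Dublin Core-style fields.
    resource = {
        "Title": "Example Sky Survey",
        "Publisher": "Example Data Center",
        "Subject": "surveys",
        "Description": "An all-sky imaging survey.",
    }

    # The service record carries only service-specific metadata.
    cone_service = {
        "Title": "Example Sky Survey Cone Search",
        "ServiceURL": "http://example.org/cgi-bin/cone",
    }

    def describe(service, parent):
        """Service descriptions 'inherit' any metadata they do not override."""
        merged = dict(parent)
        merged.update(service)
        return merged

    # Publisher, Subject, and Description come from the resource record.
    full_record = describe(cone_service, resource)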

3.2 Specific Profile Implementations

A white paper describing the relationship between existing metadata standards and the interactions between users and the VO was circulated. In the related NASA ITWG effort, preliminary WSDL profiles were written for services for several NASA archives, and some simple Web services built on these profiles were prototyped.

While substantial work has been done in this area, the anticipated focus on specific metadata profiles in the early part of the VO development has shifted somewhat toward implementations of more generic metadata and transport protocols in support of the VO demonstrations.

The relationship between this effort and the data models effort continues to be clarified. An image specification was nominally made in the data models area but was strongly influenced by the metadata discussion.

3.3 Metadata Representations and Encoding

The bulk of our work in this area has been oriented toward supporting the first year prototype demonstrations. The first major accomplishment in this area was the development of the VOTable XML definition, version 1.0, led by R. Williams (Caltech) and Francois Ochsenbein (CDS/AVO). Besides proving to be a critical component of the cone search and Simple Image Access interfaces, VOTable demonstrated the process of developing standards through an open, international effort.

An important part of the Simple Image Access (SIA) specification is the handling of metadata used for locating and querying image servers. As part of the development of this specification, we identified the metadata required for the various forms of the service and matched them with existing CDS/ESO UCD tags. Where appropriate UCDs were not defined, we defined new UCDs within an experimental namespace (named VOX, for Virtual Observatory eXperimental). The specification enumerates which metadata are required and how they should be represented in the image query and the VOTable response. The specification also lists the metadata needed for registering the service, which is a superset of the Resource and Service Metadata.


In addition to our short-term focus on the first year demos, we have put some effort into long-term metadata solutions. In particular, R. Plante has been assembling requirements and implementation ideas for a general metadata definition framework, resulting in a white paper (http://bill.cacr.caltech.edu/cfdocs/usvo-pubs/files/fw-draft2.pdf). This framework will be further refined in collaboration with the Data Models Working Group.

Issues and Concerns: Since the release of version 1.0 of the VOTable specification, we have examined how the Space-Time Coordinate System metadata might be integrated into VOTable. We realized that this problem exemplified a more general need to associate detailed metadata information with one or more table columns. VOTable thus needs a hook for referencing arbitrary, external schemas so that new metadata can be easily inserted into the VOTable document.

The SIA specification is a prototype developed to support the first year demonstrations; thus, we expect to replace this specification. The approach used to develop the spec was to start by mirroring the architecture of the prototype cone search specification, use existing VOTable capabilities and practices to express query result information, and use existing UCDs wherever possible. This uncovered various shortcomings of these technologies, and we departed from this approach accordingly.

With the current version of the specification complete, we are now focusing on the caching of service metadata in registries. After the completion of the first year demos, efforts will shift to longer-term solutions; these include:



- a comprehensive framework for registering data and services that minimizes redundant information and effort required of providers, and
- a general framework for defining metadata on which to base generic metadata software.

3.4 Profile Applications

In support of the first-year demonstrations, R. Williams, R. Hanisch, and A. Szalay developed a specification for a “Cone Search” interface for gathering information associated with circular regions on the sky from distributed catalogs. This specification has been implemented for over 50 data services to date (see http://skyserver.pha.jhu.edu/VoConeProfile/). Szalay has set up the registration service used to locate compliant cone search services.

The Simple Image Access interface represents the image analog to the catalog cone search; however, it harnesses a wider array of metadata. At the core is a rectangular region search. Since this interface can apply to cutout services as well as static image archives, additional data for describing and reporting precise spatial coverage is provided in both the query and response.

3.5 Metadata Standards / Relationships

No work scheduled during this period.


3.6 Metadata APIs

A number of libraries have been developed for getting information into and out of VOTables. Parsers are available in Perl (HEASARC), Java (CACR), and C++ (VO-India); a library for writing VOTables is available in Perl (NCSA). In addition, a VOTable-to-AIPS++-Table converter has also been developed.
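As an illustration of what such parsers must handle, the sketch below reads a minimal, made-up VOTable with Python's standard XML library; the element names (FIELD, TABLEDATA, TR, TD) are those of the VOTable structure, shown here without a namespace for simplicity.

    import xml.etree.ElementTree as ET

    # A minimal, illustrative VOTable document.
    sample = """<VOTABLE><RESOURCE><TABLE>
      <FIELD name="ra" ucd="POS_EQ_RA_MAIN" datatype="double"/>
      <FIELD name="dec" ucd="POS_EQ_DEC_MAIN" datatype="double"/>
      <DATA><TABLEDATA>
        <TR><TD>180.001</TD><TD>2.503</TD></TR>
        <TR><TD>180.042</TD><TD>2.488</TD></TR>
      </TABLEDATA></DATA>
    </TABLE></RESOURCE></VOTABLE>"""

    root = ET.fromstring(sample)
    fields = [f.get("name") for f in root.iter("FIELD")]            # column names
    rows = [[td.text for td in tr.findall("TD")] for tr in root.iter("TR")]
    print(fields)  # ['ra', 'dec']
    print(rows)    # [['180.001', '2.503'], ['180.042', '2.488']]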

The Simple Image Access (SIA) specification includes a mechanism, referred to as a metadata query, which allows implementing services to describe how they support image queries. In particular, they describe what input parameters they support and what columns they will return in the query result. While this functionality is accessible to end clients, it is primarily intended for use by the central registry service: when an implementing service registers itself, the central registry will send it a metadata query and cache the results. This will allow clients to use the registry to search for compliant services according to the queryable parameters and the information they return.
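A registry-side probe might look like the following sketch; we assume here a FORMAT=METADATA query-parameter convention for triggering the metadata query, with the endpoint hypothetical:

    import urllib.request

    def probe_service_metadata(base_url):
        """Ask an SIA service to describe its own input parameters and output columns.
        Assumes a FORMAT=METADATA convention; the response is a VOTable containing
        parameter and column declarations rather than data rows."""
        with urllib.request.urlopen(base_url + "?FORMAT=METADATA") as resp:
            return resp.read()

    # A registry would call this once at registration time and cache the result:
    # metadata_votable = probe_service_metadata("http://example.org/cgi-bin/sia")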

Currently, the SIA metadata query mechanism does not return (in a standard, specified way) the resource and service metadata; that is, this information is only available via the registry. In the future, however, it is expected that querying the service directly should be the most authoritative way to get this information. Thus, a scheme must be worked out for dynamically gathering this information into registries for efficient access by clients.

It would be good to revise the cone search specification to adopt this metadata query framework. This would make it possible to better integrate cone search registry information with that of the SIA services. A registry for the SIA services is now being set up at JHU.

4 Systems Architecture

4.1 System Design

The system design for the NVO relies strongly upon the Grid technology that is being developed under the NSF NMI initiative and applied in the NSF Teragrid. The design has three main components: Web services support, data analysis support, and collection management support. The web services design has been primarily led by D. Tody and R. Williams. The data analysis support will be provided by the Globus toolkit. The collection management support is being provided by the Storage Resource Broker. Components of the NVO system include portals for accessing images, catalogs, and procedures; interactive web services; batch-oriented survey processing pipelines; and grid services. While these components are oriented towards data and information management, a similar infrastructure is required for knowledge management that expresses the sets of operations that can be performed on a given data model, and defines the relationships between the UCDs that express exact semantics for physical quantities. The knowledge management tools are a current active area of discussion, with multiple options being considered.


The NVO system design document is primarily being driven by the technologies that are being used within the NSF Teragrid. The Teragrid will be the NVO testbed for both large-scale data manipulation and collection replication. Hence the NVO system design will closely follow that of the Teragrid. The data handling systems of the Teragrid are still being debated. Three environments are under consideration: persistent sky survey disk caches, high performance SAN-based data analysis disk caches, and deep archives. Versions of each of these environments either exist at SDSC, Caltech, or NCSA, or are being implemented. The specification of an architecture for the NVO will in part depend upon how the Teragrid decides to integrate these data management systems.

Similarly, the system design for the web services environment depends upon the competing standards from three communities: the Web Services Description Language environment being created by vendors, the Open Grid Services Architecture being developed by the Global Grid Forum, and the Semantic Web architecture being developed by the W3C community. There are efforts to merge the three environments. The challenges are the choices that will be made for authentication, for service discovery registries, and for service instantiation factories. The current Grid Forum services architecture is not yet stable enough for production systems. We have the choice of going with WSDL-based implementations, and then upgrading to the next generation technology, or waiting to see what the final architecture will look like. The services that are being implemented currently within the NVO are an important step, but they will require significant modification to interoperate with the Grid.

4.1.1 System-Level Requirements Definition. The system design for most of the NVO architecture is being driven by practical experience with test systems. Three categories of environments are in active test. They include services oriented towards processing a small amount of data (1000 records or 90 seconds of access), data analysis pipelines that are scaled to process all the images acquired during one day of image collection, and large-scale processing supported by the NVO testbed. The design of the testbed requires an engineering estimate of the computation capacity, I/O bandwidth, and caching capacity. To ascertain a reasonable scale of resources, we are continuing the implementation of a background analysis of the 2MASS collection, in collaboration with J. Good of IPAC. This will require a complete sweep through 10 TB of data, at an expected rate of 3 GB/sec. Good has created the initial pixel reprojection and background normalization routine, which is being applied at SDSC to the 2MASS data. The analysis is compute intensive rather than data intensive. The complexity of the computation appears to be 9000 operations per pixel. The needed data access rates can be sustained from archives without use of high performance disk.

A second observation is related to the cut-out and mosaicking service that has been created by R. Williams for processing the DPOSS collection. Each DPOSS image is a gigabyte in size. The initial version of the service retrieved the entire image from the remote storage system, and applied the cut-out and mosaic generation locally. The time needed to generate the cut-out was dominated by the time needed to move a GB of data over the network. The analysis was then implemented as a remote proxy in the Storage Resource Broker by G. Kremenek. This eliminated the need to move the entire image: the cutout was generated directly on the remote storage system, and only the reduced image transferred to the user. The service then ran much faster.

A third observation is related to the replication of the DPOSS sky survey collection between Caltech and SDSC. The files were registered into a logical name space through execution of a script. The images were then replicated onto a second archive at a relatively slow rate limited by the network bandwidth. The management of data within the NVO testbed will need to rely heavily upon the use of logical name spaces rather than physical file names. The preservation of the NVO logical name space will be one of the major system design requirements. This will be a differentiating factor between the Grid and the NVO testbed. The Grid replica services are currently designed for short-term replication of data, rather than the long-term replication of entire collections.

Four TB of the 2MASS collection are replicated onto a disk cache at SDSC. We have completed the replication of the 2MASS collection onto the HPSS archive at Caltech. This improves reliability by a factor of 10: when the HPSS archive at SDSC is off line, we are able to retrieve images from the Caltech copy. To support the automated replica fail over, we have installed version 1.1.8 of the SRB at Caltech.

We have done a test run of a re-analysis for the DPOSS collection, in collaboration with R. Williams of Caltech. This was done on a 64-processor Sun platform, accessing data from a disk cache. The computation was CPU-limited, taking 410 seconds to process a single 1-GB image on one processor. Using the entire platform, the re-analysis of the complete DPOSS collection could be done in 11 hours, at a sustained I/O rate of 135 MB/sec. This includes writing a new version of the entire collection back to disk, or moving 5.6 TB of data. The goal is to gain a factor of 10 in performance by moving to the Teraflops compute platform and the large 30 TB disk cache.
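These figures can be checked with simple arithmetic:

    # Back-of-the-envelope check of the DPOSS re-analysis figures quoted above.
    hours = 11
    tb_moved = 5.6
    print(tb_moved * 1e12 / (hours * 3600) / 1e6)  # ~141 MB/sec, close to the sustained 135 MB/sec
    # 410 sec per 1-GB image on one processor is ~2.4 MB/sec per processor;
    # across 64 processors that is ~156 MB/sec of input, the same order as the I/O rate.
    print(1e9 / 410 / 1e6 * 64)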

We are also working on engineering estimates for the manipulation of large catalogs. J. Gray has shipped us a copy of the SDSS metadata (80 GB). We have dedicated disk space and compute resources to the analysis support requirements for this catalog.

4.1.2 Component Requirements, and 4.1.3 Interaction with Grid Components and Tools. E-mail exchanges were conducted on WSDL and OGSA interfaces to the grid environment, metadata management, data model specification, and knowledge management with E. Deelman, D. Tody, R. Williams, and R. Plante. SDSC is implementing a set of WSDL data management services for data discovery, data access, collection building, and data replication. The services are being integrated with Grid portal technology to support computations on shared data environments. We expect this approach to be a prototype for the NVO services that are integrated with Grid technology.

4.1.4 Logical Name Space. We upgraded the SRB server at Caltech to version 1.1.8, to support automatic fail over to an alternate replica. This will improve reliability of the system for image access by a factor of ten. This still requires testing the new version with the existing IPAC 2MASS portal.

4.2 Interface Definition

SOAP-based web services are becoming standard in the business community, and are expected to rapidly become the vehicle for sophisticated web applications like the Virtual Observatory. In Year 2 of the NVO project, we expect to start a transition to SOAP of many of the GET/POST-based services that we have defined this year, such as the Cone Search and Simple Image Access Protocol. CACR has been creating simple SOAP Web Services from open-source Apache Tomcat and Axis software. This complements work at JHU, which is using the Microsoft framework for SOAP services. These alternate development paths are necessary to assure interoperability among various implementations. Also see WBS 3.4.
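For orientation, a SOAP 1.1 call is an XML envelope sent by HTTP POST. The sketch below shows the mechanics that the Tomcat/Axis and .NET toolkits generate from WSDL automatically; the endpoint, namespace, and operation name are entirely hypothetical:

    import urllib.request

    # Hypothetical endpoint and operation; real services publish these in their WSDL.
    ENDPOINT = "http://example.org/services/ConeSearch"

    envelope = """<?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <ConeSearch xmlns="http://example.org/vo">
          <ra>180.0</ra><dec>2.5</dec><sr>0.1</sr>
        </ConeSearch>
      </soap:Body>
    </soap:Envelope>"""

    req = urllib.request.Request(
        ENDPOINT,
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": '"http://example.org/vo/ConeSearch"'},
    )
    # response = urllib.request.urlopen(req).read()  # a SOAP envelope containing the result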

4.3 Network Requirements

Work not scheduled until Year 2.

4.4 Computational Requirements

Work not scheduled until Year 2.

4.5 Security Requirements

See WBS 6.2.

5 Data Access/Resource Layer

5.1 Resource and Information Discovery

Work has proceeded along several fronts in the area of resource and information discovery. Following the specification of the Cone Search service, we implemented a registration service that indexes these services. Fifty services are registered. This registry will be extended to include services supporting the Simple Image Access Protocol. Working in collaboration with CDS (Strasbourg), we assigned Uniform Content Descriptors (UCDs) to more than 1400 attributes in the Sloan Digital Sky Survey database. The SDSS database was amended to be compliant with the UCD physical units standards. In making the UCD associations, we noted several gaps in the UCD hierarchy that have since been filled by the CDS. A template was developed that allows the mapping between UCDs and database relations to be incorporated directly into the archive, and thus to support queries based on UCDs. This, in turn, will allow the automatic creation of Topic Maps.

5.2 Data Access Mechanisms

5.2.1 Data Replication. In a wide area computing system, it may be desirable to create remote read-only copies (replicas) of data elements (files)—for example, to reduce access latency, increase robustness, or increase the probability that a file can be found associated with idle computing capacity. A system that includes such replicas requires a mechanism for locating them.

USC/ISI is developing a Replica Location Service (RLS), the next generation of the Globus Replica Catalog (RC). The RC permitted a mapping from logical file names to the physical locations of the particular file. Although the functionality of the RC in terms of the mapping was adequate, the performance and reliability of the system (a centralized server) were low. The new generation, the Replica Location Service, allows the system to be distributed and replicated. The RLS is extensible in that users and applications can extend the information contained within it to other application-specific attributes.

Testing of the alpha prototype of the service is underway. As we progress in the development cycle, we look forward to setting up a testing environment within the NVO framework. We are also in the process of integrating RLS into Chimera (see WBS 6.2.4).

The Replica Location Service is now in beta testing. During this period we are testing the functionality of the service as well as its performance. So far the results are encouraging in both areas; however, further testing still needs to be conducted.
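At its core, a replica location service maintains a mapping from logical names to physical locations. A toy sketch of that mapping follows; the names and URLs are illustrative, with srb:// used informally to suggest SRB-managed copies:

    # Toy replica catalog: logical file name -> list of physical replicas.
    replicas = {}

    def register_replica(logical_name, physical_url):
        replicas.setdefault(logical_name, []).append(physical_url)

    def locate(logical_name):
        """Return all known physical copies; a client picks one, falling back on failure."""
        return replicas.get(logical_name, [])

    register_replica("2mass/ji0340045.fits", "srb://sdsc.example.edu/2mass/ji0340045.fits")
    register_replica("2mass/ji0340045.fits", "srb://caltech.example.edu/2mass/ji0340045.fits")
    print(locate("2mass/ji0340045.fits"))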

5.2.2 Metadata Catalog Service. The Metadata Catalog Service provides a mechanism for storing and accessing metadata, which is information that describes data files or data items. The Metadata Catalog Service (MCS) allows users to query based on attributes of data rather than data names. In addition, the MCS provides management of logical collections of files and of containers that consist of small files that are stored, moved, and replicated together.

At this time, an initial design has been proposed and a Java API to access the catalog has been implemented.

Metadata services require a high level of consistency. In the current design, we have implemented the service as a single centralized unit. Obviously this solution may not be scalable as the size of the metadata increases and accesses to the catalog service increase. As a result, in the future, we may consider a more distributed architecture where we can have access to the information at various locations in the Grid, but still be able to rely on highly up-to-date information.

5.3 Data Access Protocols

Much of the work of the Metadata Working Group concentrated on a protocol by which image data could be published and retrieved—the so-called Simple Image Access Protocol. The word “image” in this context was restricted to sky-registered images—images that have an actual or implied World Coordinate System (WCS) structure—and a single well-defined bandpass specification. However, the standard is capable of representing several publication paradigms:

- a collection of pointed observations,
- a collection of overlapping survey images covering a region,
- a uniform mosaic coverage of a region of the sky, and
- dynamically reprojected images with client-specified WCS parameters.

The Metadata Working Group has also discussed and defined XML data models for point, region, coordinate frame, bandpass, and other astronomical data objects.

5.4 Data Access Portals

The paper “Simple Image Retrieval: Interface Concepts and Issues” (July 2002) presented a conceptual design for implementing uniform image access via services supporting multiple access protocols. The document “Simple Image Access Prototype Specification” was released in late September following much discussion and several drafts.

Several implementations of simple image access were completed during interface development (by STScI, HEASARC, NOAO) and a number of others were in progress as of the end of the reporting period. A related image cutout service developed by Caltech and SDSC for DPOSS data uses scalable grid services to provide access to massive all-sky survey data collections such as DPOSS.

The initial goal of simple image access was to support the NVO science demos while exploring the issues of providing uniform access to heterogeneous, distributed image data holdings. The simple image access service, along with the cone search service developed previously, provides early prototype data access portals. The simple image access interface has since drawn interest from our IVOA partners in Canada, Europe, and the UK, and future development will be a collaborative effort with these partners.

The next step will be to explore the use of web services for data access, and to look into the issues of client access to such services. This will be done by demonstrating simple image services that simultaneously support both URL and WSDL/SOAP based access. A parallel effort is underway to develop a modular data model and metadata framework, which will be integrated with data access as it develops. The simple image access prototype already includes experimental data model components, e.g., for the image world coordinate system, and for characterizing the spectral bandpass of an image.

To keep simple image access “simple” and have it ready in time to support the science demos, the SIA interface is based on simple URLs for requests, using FITS files to return science data. Future challenges will be to provide data access via a web services interface (WSDL/SOAP), and later, via grid-enabled interfaces such as OGSA or Condor-G. A concern is that the effort expended on implementing simple image services not be lost as we develop future, more sophisticated access protocols and services. A potential solution is to separate the access protocol from the service implementation. This approach also has the advantage that a service can potentially support multiple simultaneous access protocols.

6 NVO Services

6.1 Computational Services

The work on computational services is proceeding on two broad fronts. The first is the development of compute- and I/O-intensive services for deployment within the NVO architecture:

- Montage, an astronomical mosaic service funded by the Earth Sciences Technology Office Computing Technologies program. It will deliver science-grade mosaics where terrestrial background emission has been removed. Ultimately, Montage will run operationally on the Teragrid, and deliver on demand custom mosaics according to the user's specification of size, rotation, spatial sampling, coordinates, and WCS projection.

- A general cross-matching engine funded by the National Partnership for Advanced Computational Infrastructure (NPACI) Digital Sky project; this service will have the flexibility to cross-match two tables in memory, or two database catalogs, and will have the option to return probabilistic measures of cross-identification of sources in the two tables.

A Software Engineering Plan, Requirements Specification, Design Specification, and Test Plan have been completed for the Montage project. These documents are available on the project web site at http://montage.ipac.caltech.edu.

The design of Montage separates the functions of finding the images needed to generate a mosaic, reprojection and transformation of images, background removal, and co-addition of images. Thus it is a toolkit whose functions can be controlled by executives or scripts to support many processing scenarios.

The heart of Montage is the reprojection algorithm. An input pixel will overlap several pixels in the output mosaic. We have developed a general algorithm that conserves energy and astrometric accuracy. It uses spherical trigonometry to determine the fractional overlap in the output mosaic pixels.
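The following toy Python version conveys the flux-conserving idea on a flat grid, with a pure scale change standing in for the real WCS transformation; Montage itself computes the pixel overlaps with spherical trigonometry, which this sketch does not attempt. Each input pixel's value is distributed over the output pixels it overlaps in proportion to the overlap area, so total flux is conserved.

    import numpy as np

    def reproject_conserving(img, scale):
        """Toy flux-conserving resampling: distribute each input pixel's value over
        the output pixels it overlaps, weighted by fractional overlap area."""
        ny, nx = img.shape
        out = np.zeros((int(np.ceil(ny * scale)), int(np.ceil(nx * scale))))
        for iy in range(ny):
            for ix in range(nx):
                # Footprint of input pixel (iy, ix) in output pixel coordinates.
                y0, y1 = iy * scale, (iy + 1) * scale
                x0, x1 = ix * scale, (ix + 1) * scale
                for oy in range(int(y0), int(np.ceil(y1))):
                    for ox in range(int(x0), int(np.ceil(x1))):
                        overlap = (min(y1, oy + 1) - max(y0, oy)) * (min(x1, ox + 1) - max(x0, ox))
                        out[oy, ox] += img[iy, ix] * overlap / (scale * scale)
        return out

    img = np.ones((4, 4))
    print(reproject_conserving(img, 1.5).sum())  # 16.0: total flux is preserved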

A fully functional prototype has been deployed for Solaris 2.8, Linux 6.x, and AIX; it is available for download to parties willing to take part in validating the algorithms. The reprojection algorithm is slow—a single 2MASS image takes 4 minutes on a Sun Ultra 10 workstation. The algorithm can be easily parallelized, and we will use this approach to speed up the code. We have begun a collaboration with SDSC to parallelize Montage on the IBM Blue Horizon supercomputer. We have already run Montage on 64 nodes in parallel, where a 1 square degree area (55 2MASS images) can be processed in under 3 minutes.


USC/ISI is also working closely with the IPAC team in porting Montage onto the Grid. ISI has also agreed to be an initial tester of the system. At present USC/ISI is learning about the structure of the Montage code with the hope of using the Chimera system (WBS 6.2.4) to drive the execution of the Montage components. The main concern is access to the data required by Montage. Although we can use protocols such as GridFTP to access individual files, the data is currently stored in containers that can only be indexed by SRB. We are working on indexing the SRB containers so that the necessary data can be retrieved.

Montage will be used to deliver small image mosaics as part of the “Gamma Ray Transients” demonstration project.

The cross-match engine development has been largely geared toward the “brown dwarf demonstration project,” which will cross-match the 2MASS and SDSS point-source catalogs. We have developed a design that is quite general and will support cross-matching between local files and database catalogs, and streaming from distributed catalogs. Our aim is, in fact, to stream the 2MASS and SDSS catalogs and cross-match them on the fly. Thus far, we have delivered code that will cross-match small tables that can be held in memory, and that applies the probabilistic cross-match code used by the NASA Extragalactic Database to match sources. We are currently developing code that will handle database catalogs and streamed data.
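The core positional step of such an engine can be sketched in a few lines of Python; the probabilistic weighting used by the NASA Extragalactic Database code is omitted, and the tolerance and catalogs are illustrative:

    import math

    def ang_sep_deg(ra1, dec1, ra2, dec2):
        """Angular separation in degrees (haversine formula; inputs in degrees)."""
        ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
        a = (math.sin((dec2 - dec1) / 2) ** 2
             + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
        return math.degrees(2 * math.asin(math.sqrt(a)))

    def cross_match(cat1, cat2, tol_arcsec=2.0):
        """Brute-force match of two small in-memory catalogs of (ra, dec) tuples.
        A production engine would index by position rather than compare all pairs."""
        tol = tol_arcsec / 3600.0
        return [(i, j) for i, (ra1, dec1) in enumerate(cat1)
                       for j, (ra2, dec2) in enumerate(cat2)
                       if ang_sep_deg(ra1, dec1, ra2, dec2) < tol]

    print(cross_match([(180.0, 2.5)], [(180.0002, 2.5001), (181.0, 2.5)]))  # [(0, 0)]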

6.2 Computational Resource Management

6.2.1 Computational Request and Planning. We are developing the Request Object Management Environment (ROME) to manage requests for compute- and time-intensive processing and data requests through existing portals; this middleware employs Enterprise technology already widely used in e-business. Most web services and portals employ Apache to manage requests. Apache is efficient and stable, but has no memory of requests submitted to it. Consequently, visitors to web services have no means of monitoring their jobs or resubmitting them, and the service itself has no means of load balancing requests. When large numbers of time- and compute-intensive requests are submitted to NVO services, such functions are essential; without them, users will simply have to wait until job information returns. Users will not tolerate several days of waiting to learn that their job has failed.

ROME will rectify this state of affairs. It will deploy an Enterprise Java application server, the commercial system BEA WebLogic, which accepts and persists time- and compute-intensive requests. It is based on e-business technology used, for example, by banks in managing transactions, but with one major change: ROME will have two components optimized for handling very time-intensive requests. One, the Request Manager, will register requests in a database, and a second, the Process Manager, will perform load balancing by polling the database to find jobs that must be submitted and then sending them for processing on a remote server.


We have delivered design and requirements documents, and have prototyped the following EJB components:

- UserRegistration—Creates a user entry in the DBMS; the user's email address is used as the user ID.
- UpdateUserInfo—A user contacts ROME to update log-in information (e.g., machine name and port).
- RequestSubmission—Creates a request entry in the request DBMS table, and returns a request ID to the user.
- UpdateRequest—Allows a user to send an interrupt request to abort a job.
- GetStatus—A user fetches request status from the DBMS.
- GetRequest—A processor thread asks ROME to search the DBMS for the next request (of the specified application) in the queue.
- UpdateApplicationJobID—Once a processor thread has started a job running successfully, it sends the job ID to ROME.
- SetMessage—A processor thread sends messages from the application to ROME.

Each of these components is a servlet/EJB pair. The servlet accepts an HTTP request from external entities (user/processor) and employs the corresponding EJB to write/retrieve the information to/from DBMS tables.

A Request Processor with multiple processing threads was built to process the requests. A simple dummy application program was used in the server to accept the request parameters and to send a sequence of “processing” messages to ROME (and on to the user).

This prototyping effort was aimed at understanding the challenges involved in using EJB technology under heavy load. We found that:

- An EJB container is very good at maintaining DBMS integrity. When two EJBs try to access a DBMS record simultaneously, the EJB container automatically deals with record locking and data rollback so that only one of the EJB instances will succeed in accessing the record, but it does not ensure that both updates are eventually processed successfully.

- When two processor threads contact ROME requesting the “next” job to process, ROME must ensure that the same request is not given to both of them; a sketch of one way to do this follows this list.
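The second point is essentially an atomic claim on a shared queue. A minimal sketch in relational-database terms, using SQLite purely for illustration (ROME itself uses an EJB container over a commercial DBMS):

    import sqlite3

    def claim_next_request(con, worker_id):
        """Atomically claim the oldest queued request; returns its id or None.
        BEGIN IMMEDIATE takes the write lock before the SELECT, so two workers
        polling at the same time cannot both claim the same row."""
        con.isolation_level = None  # manual transaction control
        con.execute("BEGIN IMMEDIATE")
        try:
            row = con.execute(
                "SELECT id FROM requests WHERE status = 'queued' ORDER BY id LIMIT 1"
            ).fetchone()
            if row:
                con.execute(
                    "UPDATE requests SET status = 'running', worker = ? WHERE id = ?",
                    (worker_id, row[0]),
                )
            con.execute("COMMIT")
            return row[0] if row else None
        except Exception:
            con.execute("ROLLBACK")
            raise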

We are also tracking technology that is being developed through the NSF GriPhyN project, the DOE Particle Physics Data Grid, the DOE SciDAC projects, and the NASA Information Power Grid for the management of computational resources. The two central components are management of the computational resources, and management of the processes that are being run on the computational resources. The former is handled by the Globus toolkit, version 2. The latter is still a research activity. There are multiple versions of workflow management under development, including the Condor DAGMan and associated data scheduling mechanisms, the survey pipeline processing systems used in astronomy, and an advanced knowledge-based processing system under development at SDSC for a DOE SciDAC project. We would expect to start with the current survey pipeline systems, switch processing to a grid-managed environment under Condor when computer resources are exceeded, and then switch to the knowledge-based processing systems for complex queries. The advantage of the knowledge-based systems is their ability to dynamically adjust the workflow based upon results of complex queries to information collections. The conditional relationships between processing steps can be quite complex, as opposed to the simple semantic mapping of output files to input files for the DAGMan system.

6.2.2 Authentication. USC/ISI has evaluated Spitfire, a database access service which allows access to a variety of databases. Spitfire, developed as part of the European DataGrid project, consists of a server as well as client tools. The Spitfire server connects through JDBC to a database using predefined roles. The client can connect directly to the server through HTTP and perform database operations. Even though Spitfire seems on the surface to be an interesting technology, it has many drawbacks in terms of security and support for transactions that span multiple database tables. For example, although Spitfire is based on the Globus Grid Security Infrastructure for authentication, it exhibits security problems in terms of authorization. In tests performed at USC/ISI, we were able to modify the database using a new version of the Spitfire server with an unauthorized client (a client from an earlier version of the code, which did not implement any security). Spitfire also does not currently support transactions that span multiple DB tables. The documentation was also inadequate, as it showed only examples of query operations and not example templates for create, update, or delete operations. USC/ISI has communicated the authentication concerns to the Spitfire developers and is currently studying the possibility of adding transactional support to Spitfire. We are also following the developments within the UK e-Science program for any development in the area of grid-enabled interfaces to databases.

6.2.4 Virtual Data. USC/ISI is working with the University of Chicago on a Virtual Data System, Chimera, which allows users to specify virtual data in terms of transformations and input data. The system is composed of a language and a database for storing the information needed to derive virtual data products. USC/ISI has focused on designing and implementing a planner which enables the translation between an abstract representation of the workflow necessary to produce the virtual data and the concrete steps needed to schedule the computation and data movement.

USC/ISI is currently working on the second version of the planner, which is part of Chimera. The second version allows the planner to map the execution of the workflow onto a heterogeneous set of resources. Currently the planner is rudimentary, and further research is needed to increase the sophistication of the planning algorithm as well as the planner's fault tolerance. ISI is actively working with the AI planning community to increase the capabilities of the planner.

The Virtual Data System language (VDL), developed at the University of Chicago, is specified in both a textual and an XML format. The textual version is intended for use in the manual creation of VDL definitions, for use in tutorial, discussion, and publication contexts. The XML version is intended for use in all machine-to-machine communication contexts, such as when VDL definitions will be automatically generated by application components for inclusion into a VDL definition database. The VDS-1 system, also known as Chimera, is implemented in Java, and currently uses a very simple XML text file format for the persistent storage of VDL definitions. Its virtual data language provides a simple and consistent mechanism for the specification of formal and actual parameters, and a convenient paradigm for the specification of input parameter files. VDS-1 was released in the summer of 2002.

This planner takes an abstract Directed Acyclic Graph (DAG) specified by Chimera and builds a concrete DAG that can then be executed by Condor-G. In the abstract DAG, neither the location where the computation is to take place nor the location of the data is specified. The planner consults the replica catalog to determine which data specified in the abstract DAG already exist, and reduces the DAG to only the minimum number of required computations and data movements. Finally, the planner transforms the abstract DAG into a concrete DAG where the execution locations and the sources of the input data are specified. This DAG is then sent to Condor-G for execution.
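Under simple assumptions (a DAG given as a mapping from job name to its input and output file sets, and the replica catalog reduced to a set of already materialized files), the reduction step looks roughly like this; the job names and files are illustrative:

    def reduce_dag(jobs, existing):
        """Drop jobs whose outputs already exist in the replica catalog, as the
        planner does when turning an abstract DAG into a concrete one.
        `jobs` maps job name -> (set of input files, set of output files);
        `existing` is the set of files the replica catalog already knows about."""
        needed = {}
        for name, (inputs, outputs) in jobs.items():
            if not outputs <= existing:  # at least one output still missing
                needed[name] = (inputs, outputs)
        return needed

    jobs = {
        "reproject": ({"raw.fits"}, {"reproj.fits"}),
        "coadd": ({"reproj.fits"}, {"mosaic.fits"}),
    }
    # If the reprojected image is already replicated, only the co-addition remains;
    # its input becomes a data movement rather than a computation.
    print(reduce_dag(jobs, existing={"raw.fits", "reproj.fits"}))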

7 Service/Data Provider Implementation and Integration

7.1 Service/Data Provider Implementation

Through the publication of the Cone Search and Simple Image Access Protocols, we have made it possible for service and data providers to begin to make information available through VO-compliant interfaces. Within the NVO project we brought some 50 Cone Search services on-line, and as the first project year came to a close several SIA services had already been implemented.

7.2 Service/Data Provider Integration (Hanisch/STScI)

Integration of Cone Search (VOTable) and Simple Image Access Protocol services is a challenge for the initial science demonstrations. As formal work in this area is not scheduled until 2003, the science demonstration teams are experimenting and developing prototypes that will, in time, migrate into additional integration tools and templates.

8 Portals and Workbenches

8.1 Data Location Services

Although formal activities in this area are not scheduled until later, the registration services for the cone search and simple image access protocols directly impinge on this area. Similar prototype efforts as part of the GRB demo project enable searching a hierarchy of surveys to find the “best” available survey image in a given wavelength regime.


8.2 Cross-Correlation Services

Work in this area is primarily funded by other resources. See WBS 6.1 for details.

8.3 Visualization Services

In anticipation of the need to be able to visualize correlations in complex data sets, such as joins between large catalogs, we have been evaluating several extant software packages that might serve as user front-ends. Foremost among these is Partiview, a package developed originally at NCSA and currently supported by the American Museum of Natural History/Hayden Planetarium. R. Hanisch and M. Voit met with program developers B. Abbott and C. Emmart to understand more about its capabilities. Partiview (particle viewer) was designed to render 3-D scenes for complex distributions of particles. It includes, for example, a full 3-D model of the Galaxy as a test data set. Our interest in Partiview is as a visualization tool for n-dimensional parameter sets, where one might plot a V magnitude on one axis, an X-ray magnitude on a second axis, and an infrared color index on a third axis. The ability to view such distributions from arbitrary angles, and to “fly” through and around the data, will be helpful in understanding correlations and in identifying unusual object classes. Partiview is freely available for Unix and Windows platforms.

We have also begun experimenting with Mirage, a 2-D plotting and data exploration tool developed by T. K. Ho (Bell Laboratories). Mirage provides a very flexible user interface and allows for rapid exploration of complex data. One can highlight objects in one 2-dimensional view, and instantly see the same objects in all other views. We expect to use Mirage as one of the visualization tools for the galaxy morphology science demonstration. Mirage is a Java application that installs easily on any Java-enabled platform.

Most recently, we have implemented some enhancements to the CDS Aladin visualization package, including the ability to overlay data directly from VOTables and to plot symbols in colors corresponding to an attribute in the VOTable. For example, objects in a catalog could be marked by position, and a third attribute such as spectral index or ellipticity could be encoded through the color of the plot symbol. Other encoding schemes (symbol size, vectors, etc.) will be explored in the future.

8.4 Theoretical Models

The inclusion of the US theoretical astrophysics community into the NVO framework continues to be a high priority item. In FY 2002 there were continued discussions among theorists interested in establishing a “Theory Virtual Observatory” (TVO) as a working prototype that could be incorporated into the US-NVO.

These discussions focused primarily on the N-body codes being developed for simulation of the evolution of globular clusters, but discussions were also held with groups working on N-body plus hydrodynamic codes, together with groups involved with MHD codes. The intent is to develop libraries of computationally derived datasets that can be directly compared with observations. In addition, there is interest in establishing sets of commonly shared subroutines and software tools for post-processing analysis. Throughout the fiscal year the “TVO Website,” located at http://bima.astro.umd.edu/nemo/tvo, has been maintained and updated.

Work that will lead to incorporation of theoretical astrophysics into the general NVO structure has been initiated in collaboration with J. McDowell (SAO). This effort will begin the definition of the metadata for simulation archives and will design the path needed to implement the publication and archiving of both theory datasets and theory software.

9 Test-Bed

9.1 Grid Infrastructure

We are engaged in initial experiments based on Grid services at USC/ISI, UCSD, SDSC, and NCSA, building upon the TeraGrid collaboration's infrastructure. See WBS 6.1 for details.

9.2 User Support

Work not scheduled in this area until 2003.

9.3 Software Profile

Work not scheduled in this area until 2003.

9.4 Data Archiving and Caching

Work not scheduled in this area until 2003.

9.5 Testbed Operations

Formal activity in this area is not scheduled until 2003, though some use of the Grid testbed is planned for the early science demonstrations.

9.6 Resource Allocation

See WBS 6.2.1.

9.7 Authentication and Security

See WBS 6.2.2.


10 Science Prototypes

10.1 Definition of Essential Astronomical Services

We have defined a set of Core Services for astronomical web services. These include metadata services, basic catalog query functions, basic image access functions, survey footprint functions, and functions for cross-identification. URL-based definitions of these functions have been developed in the Metadata Working Group. Based upon the above tentative specifications, JHU team members have built a prototype multi-layer Web Services application, called SkyQuery, which uses archive-level core services to perform basic functions, using the SOAP protocol. Proper WSDL descriptions have been written for these services, and the services have been successfully built for the SDSS, FIRST, and 2MASS. Templates for these Web Services have been used successfully by other groups (STScI, AstroGrid Edinburgh, Institute of Astronomy Cambridge).

JHU and STScI are in the process of creating a prototype footprint service that can be used to automatically determine overlap areas between several surveys. JHU and STScI have successfully built a simple SOAP-based web service interoperating between the .NET and Java platforms. JHU staff have built a web-services template to turn legacy C applications into Web Services. In collaboration with A. Moore (CMU), we have built several data-mining web services. We are currently building a C# class around the CFITSIO package, which will enable easier handling of legacy FITS files within web services.

10.2 Definition of Representative Query Cases

In order to facilitate the functionality of the NVO and totest software developments, aclear need exists for implementation of representative query cases.

In addition, the earlydemonstration of this capability to the US astronomical community will inform

astronomers in general about the NVO and its ability toenhance scientific inquiry.

Thusin FY 2002 the NVO Science Working Group (SWG) was given the task of developingan appropriate suite of scientific queries that would serve both to test the NVO structureand to demonstrate its capability.

Through the process of many e-mail exchanges and telecons the SWG finally convergedon a set of 13 well-defined scientific inquiries that would

be appropriate to the NVO andwould yield interesting and timely scientific results.

These 13 queries were than presented to an NVO Team meeting held in Tucson 16-17April 2002.

One of the major objectives of this Team Meeting was to converge on a set of three or four Science Demonstration Projects that could be developed in time for presentation at the AAS meeting in January 2003.

Discussions at the Team Meeting thus focused not only on the scientific merits of the 13 inquiries but also on their technical feasibility and their appropriateness to the NVO concept and architecture. At the end of the Tucson meeting, three of the 13 science queries had been chosen, and their technical requirements had been largely defined.

The NVO Executive Committee held a number of meetings with the team members identified to lead each of these demonstrations, and progress in their development has been closely monitored.

10.3 Design, Definition, and Demonstration of Science Capabilities

Gamma-Ray Burst Follow-up Service Demonstration:

The GRB demo comprises several distinct elements:



• Automated response to the discovery of a GRB: A request to include this demo in the Gamma-Ray Burst Coordinates Network (GCN) has been submitted. This will inform the service of bursts within a few (typically < 2) seconds of the initial GRB trigger in satellite flags. Occasional triggers are being received today, but these will become common with the launch of Swift. The GCN provides software to receive these reports, and this software has been modified to initiate the retrieval of data.



• Querying and caching of results. Preliminary scripts for the querying and caching of results have been developed and tested. These are currently being changed to use the SIA and cone search protocols for resources that support them. A more formalized caching mechanism needs to be developed. (A sketch of this kind of multi-archive query appears after this list.)



• Initial notification page. A design for the initial notification of a burst was circulated and comments received. The actual implementation of this page is underway.



• Initiation of user interfaces. Neither of the user interfaces to be used in the demo, Aladin or OASIS, directly supports the VOTable format. Scripts for chopping data into appropriate pieces to start these programs have been developed but will need further work.
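The following sketch illustrates the kind of multi-archive query referred to above: once a burst position arrives from the GCN, a list of cone search and SIA services is swept around that position and the raw VOTable replies are cached. All endpoints are invented; the parameter conventions (RA/DEC/SR for cone search, POS/SIZE for SIA) follow the protocols described in this report.

    # GRB follow-up query sketch; service URLs are hypothetical.
    from urllib.parse import urlencode
    from urllib.request import urlopen

    CONE_SERVICES = ["http://example.edu/cat/conesearch"]  # hypothetical
    SIA_SERVICES = ["http://example.edu/img/sia"]          # hypothetical

    def follow_up(ra_deg, dec_deg, radius_deg=0.2):
        """Query every registered service around the burst position;
        return a dict of raw VOTable responses keyed by service URL."""
        results = {}
        for url in CONE_SERVICES:
            q = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
            with urlopen(f"{url}?{q}") as resp:
                results[url] = resp.read()   # cache raw VOTable bytes
        for url in SIA_SERVICES:
            q = urlencode({"POS": f"{ra_deg},{dec_deg}",
                           "SIZE": 2 * radius_deg})
            with urlopen(f"{url}?{q}") as resp:
                results[url] = resp.read()
        return results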

Galaxy Morphology Demonstration:

This demonstration, which examines the relationship between galaxy morphology and cluster evolution, has been designed to illustrate some of the key functionality of VO infrastructure, including access to data through standard interfaces and grid-based analysis (http://bill.cacr.caltech.edu/cfdocs/usvo-pubs/files/morphdemo-plan2.txt). Development of this demo has progressed along several fronts:

1. Science goal development: With advisement from the Science Working Group, R. Plante, J. Annis, and D. De Young developed the overall plan for the demo.

2. Development of the Simple Image Access interface: With its development led by D. Tody and contributions from the Metadata Working Group, this interface will be used to access image data used by the demo.

3. Identification and support of input data sets: E. Shaya and B. Thomas (NASA ADC/Raytheon) have implemented access to ADC catalog data via a VOTable cone search interface. This service will provide various data about the target clusters. The exact image data that will be used will depend on which of the candidate datasets can be made available via the SIA interface. Candidates include the DSS survey, 2MASS, and the HST WFPC2 data from the Canadian Data Centre. X-ray data will come from the Chandra data archive through a specialized service that can return calculated fluxes; A. Rots and J. McDowell have implemented the basic service. Galaxy catalog data will come from either the CNOC1 catalog from the CDC or the DSS catalog at NCSA.

4. Assembling the Grid-based data management and computing infrastructure: A special working group made up of J. Annis, E. Deelman, and R. Plante was formed to work on this. R. Plante has been testing the use of the mySRB tool for managing the data workspace where data for the demo can be collected. J. Annis and E. Deelman have been defining the technology components required to launch the grid-based analysis of the galaxy images.

11 Outreach and Education

11.1 Strategic Partnerships

NVO Outreach Workshop. The NVO project held an outreach workshop in Baltimore on July 11-12, 2002 that brought together a diverse group of education and outreach experts to identify critical features of NVO that would enable effective outreach. Twenty-six people attended, representing the NASA outreach community, the NSF ground-based astronomy community, museum professionals, amateur astronomers, planetarium builders, and developers of desktop planetarium software. The recommendations emerging from this meeting are setting the agenda for the development of the NVO outreach infrastructure.

A report from the workshop collects and prioritizes the recommendations of the outreach community that were identified at the July workshop. The most critical need is for infrastructure development that will 1) lead non-astronomers who visit NVO to services and information that are most likely to be of interest to them, and 2) simplify the development of education and outreach resources by our partners. We will develop a metadata vocabulary for identifying and categorizing EPO services; work in this area has already been applied to the Resource and Service Metadata document.

Amateur Astronomy Image Archive. We have been working on a feasibility study of an Amateur Astronomers Deep Space Image Archive that would encourage amateurs to publish and request images using NVO protocols. This pilot is in collaboration with Sky and Telescope magazine.

11.2 Education Initiatives

No education initiatives were planned for this year.


11.3 Outreach and Press Activities

No outreach and press activities were planned for this year.


Activities by Organization

California Institute of Technology/Astronomy Department

S.G. Djorgovski, R. Brunner, and A. Mahabal participated in the discussions on the development of science demonstration cases. Work was also done in the following areas:

1. Preparation of the DPOSS data (one of the selected data sets) for the various VO uses and demonstration experiments. Image data reside at both CACR and SDSC. Catalog data are served via

http://dposs.caltech.edu:8080/query.jsp

VOTable format is supported, and a cone search service is under development. Most of this work was done by R. Brunner, with contributions from A. Mahabal.

2. Most of the effort supported by this grant was focused on the exploration of Topic Maps technology for a VO. Most of the work was done by A. Mahabal. In collaboration with CACR and CDS Strasbourg, we have been using Topic Map technology to create tools that can federate metadata. We are using UCDs (Uniform Content Descriptors) in astronomical catalogs as PSIs (Published Subject Indicators) to relate columns from different tables to each other. The tool that we have built allows a user to choose a set of existing UCD-enabled catalogs and build a Topic Map out of the metadata of those tables. That Topic Map is then available to the community and can be used as a data discovery tool. Users can explore combinations of different catalogs to look for compatibility, overlap, cross-matches, and other scientifically enabling activities. As needed, for instance, users can generate and query different Topic Maps for the X-ray, IR, and optical regions by combining metadata for those catalogs.
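The following sketch (not the project’s Topic Map code) illustrates the underlying idea: each UCD acts as a shared subject that ties together semantically equivalent columns from different catalogs. The catalog and column names are invented for the example; the UCD strings follow the CDS UCD vocabulary.

    # Group catalog columns by UCD, as a Topic Map over table metadata does.
    from collections import defaultdict

    # (catalog, column, UCD) triples of the kind VOTable metadata provides;
    # the catalogs and columns here are invented.
    columns = [
        ("catalog_A", "RAJ2000", "POS_EQ_RA_MAIN"),
        ("catalog_A", "DEJ2000", "POS_EQ_DEC_MAIN"),
        ("catalog_B", "ra", "POS_EQ_RA_MAIN"),
        ("catalog_B", "Jmag", "PHOT_JHN_J"),
    ]

    # Each UCD becomes a published subject linking equivalent columns.
    subjects = defaultdict(list)
    for catalog, column, ucd in columns:
        subjects[ucd].append((catalog, column))

    for ucd, members in subjects.items():
        print(ucd, "->", members)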

We have also started adding meaningful external links as part of the basic Topic Map. These include access to the catalogs, computing statistics on individual columns, plotting histograms, etc. While these tools are neither part of Topic Maps nor necessarily developed by us, providing them in this fashion is a step toward semantically connecting the VO tools.

The main Topic Map page can be seen at:

http://www.astro.caltech.edu/~aam/science/topicmaps/ucd.html

The Topic Map generator is accessed by going to:

http://avyakta.caltech.edu:8080/topicmap/

California Institute of Technology/Center for Advanced Computational Research

Work has begun with the NASA/IPAC Extragalactic Database (NED) to create SIA services for its large and diverse image holdings.

Caltech is one of the four core sites of the NSF-funded TeraGrid project (http://www.teragrid.org), which is designed to bring the scientific community toward the new paradigm of Grid computing. Much of the funding of this project goes to high-performance clusters of 64-bit Itanium processors, as well as large “datawulf” style disk storage systems. One of these systems will become part of the NVO testbed, with 15 terabytes allocated for storing large astronomical datasets such as DPOSS, 2MASS, and SDSS. These will be available online, with none of the delays of tape loading, and under NVO access protocols. In this way, we hope to increase acceptance of these protocols in the astronomical community.

CACR is a collaborator in the NASA-funded Montage project (http://montage.ipac.caltech.edu) for creating scientifically credible image mosaics from sky surveys such as 2MASS. Montage allows accurate image reprojection, thus creating federated multi-wavelength images. CACR, with SDSC and USC/ISI, is working on efficient parallel and grid implementations of Montage.
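As an illustration of the reprojection step at the heart of Montage, the sketch below regrids one image onto the projection of another, using the community “reproject” package as a stand-in for Montage itself; both file names are hypothetical.

    # Image reprojection sketch; input files are hypothetical.
    from astropy.io import fits
    from reproject import reproject_interp

    hdu = fits.open("input_2mass.fits")[0]            # hypothetical file
    target_header = fits.getheader("reference.fits")  # hypothetical file

    # Interpolate the input pixels onto the reference projection; the
    # footprint records which output pixels received valid input data.
    reprojected, footprint = reproject_interp(hdu, target_header)

    fits.writeto("reprojected.fits", reprojected, target_header,
                 overwrite=True)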

In collaboration with SDSC, we have implemented a cutout service for the DPOSS archive that takes the data from the nearest of any number of replications of the archive. The SRB (Storage Resource Broker) that underlies the service provides this location transparency; DPOSS is currently at both SDSC and CACR. The SRB also provides protocol transparency, so that the archive can be stored with different mass-storage software (HPSS, Unitree, Sun-QFS, Posix, etc.). The image service responds to requests based on sky position, then finds and opens the relevant image file and extracts the desired pixels. Further processing of the cutout creates a valid FITS World Coordinate System header (for sky registration) from the polynomial Digitized Sky Survey plate solution.
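The sketch below illustrates the pixel-extraction and header-regeneration step of such a cutout service, using the astropy library as a stand-in for the service’s internal code. The file path and position are hypothetical, and a real request would first resolve the nearest archive replica via the SRB.

    # FITS cutout sketch; the input file and position are hypothetical.
    from astropy.io import fits
    from astropy.wcs import WCS
    from astropy.nddata import Cutout2D
    from astropy.coordinates import SkyCoord
    import astropy.units as u

    def make_cutout(path, ra_deg, dec_deg, size_arcmin):
        """Extract a sky-registered cutout around (ra, dec)."""
        with fits.open(path) as hdul:
            data = hdul[0].data
            wcs = WCS(hdul[0].header)
            position = SkyCoord(ra_deg * u.deg, dec_deg * u.deg)
            cutout = Cutout2D(data, position, size_arcmin * u.arcmin,
                              wcs=wcs)
            # Cutout2D adjusts the WCS so the output header still
            # registers the pixels on the sky.
            return fits.PrimaryHDU(cutout.data,
                                   header=cutout.wcs.to_header())

    hdu = make_cutout("plate.fits", 180.0, 2.5, 5.0)  # hypothetical file
    hdu.writeto("cutout.fits", overwrite=True)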

• Made substantial progress in deploying a general cross-match engine, to be used in the “brown dwarfs” project. (A sketch of this kind of positional match appears after this list.)



• Developed mature prototype of the Montage image mosaic service, and successfully ran it on 64 nodes of the IBM Blue Horizon supercomputer.



• Developed design and requirements for ROME; began prototyping efforts.
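The following sketch illustrates a simple positional cross-match of the kind such an engine performs, using astropy’s catalog-matching utilities rather than the project’s own code; the coordinates and the 2-arcsecond tolerance are invented for the example.

    # Positional cross-match sketch with invented coordinates.
    import astropy.units as u
    from astropy.coordinates import SkyCoord

    cat1 = SkyCoord(ra=[10.00, 10.02] * u.deg,
                    dec=[41.00, 41.01] * u.deg)
    cat2 = SkyCoord(ra=[10.001, 10.50] * u.deg,
                    dec=[41.0005, 41.20] * u.deg)

    # For each source in cat1, find its nearest neighbour in cat2.
    idx, sep2d, _ = cat1.match_to_catalog_sky(cat2)

    # Keep only matches within a 2 arcsecond tolerance.
    for i, (j, sep) in enumerate(zip(idx, sep2d)):
        if sep < 2 * u.arcsec:
            print(f"cat1[{i}] matches cat2[{j}] at "
                  f"{sep.to(u.arcsec):.2f}")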


Canadian Astronomy Data Centre/Canadian Virtual Observatory Project

The Canadian Virtual Observatory (CVO) Prototype system has been developed and tested with WFPC2 catalogue content. Deployment of the CVO prototype has been delayed because of unacceptable database performance. Funding has been secured for a major upgrade in database hardware and software. Several months have been invested in identifying the most effective purchasing strategy. A new database system will be in place by March 31, 2003.

The Canadian Astronomy Data Centre (CADC) has committed to participation in the NVO demo project on Galaxy Morphology and will supply catalogues, a cone search service, a WFPC2 image cutout service, and a WFPC2 image retrieval service for that demo.

The Canada-France-Hawaii Telescope Legacy Survey represents valuable content for the Virtual Observatory. CADC is designing and implementing the data processing and distribution system in collaboration with CFHT and TERAPIX. The initial goal is to deliver effective archive services for these data; full integration into the Canadian Virtual Observatory (CVO) prototype system will then be started.

Storage capacity at CADC will reach 40 terabytes, and a 40-node processing array will be deployed in early 2003. Hiring of 1.5 FTE of new staff for CVO has been initiated.

Carnegie-Mellon University/University of Pittsburgh

The NVO NSF funding we received was used to support P. Husing, a programmer working with the Autonlab group at Carnegie Mellon. Husing was tasked with three NVO-related problems, all of which he solved successfully this year.

First, he created simple and complete web documentation of our fast and efficient data-mining applications (see http://www.autonlab.org/astro/). These pages provide the code and examples of how to use the code and the various inputs and outputs. Second, he made the EM Mixture Model code (see Connolly et al. 2000) command-line based, as well as making the code much more modular in nature. This is vital for creating a web service out of this algorithm, as well as for providing users with the underlying kd-tree technology. Third, he worked successfully with the JHU SDSS database group on interfacing our EM Mixture Model code with the SDSS SQL database. This was done in the Microsoft .NET architecture, where an http request was sent to the server to a) extract som