Thursday, December 23, 2010

From everyone in Operations we would like to wish you a happy holiday season, and wish you good fortune and progress in all of your endeavors in 2011!

The GOC will continue to provide round-the-clock coverage in case of emergencies over the holiday season. You can continue to open tickets at https://ticket.grid.iu.edu/goc/open, send us mail at goc@opensciencegrid.org, or call us at +1 (317)-274-9699. However, we will be operating on holiday procedures on Friday the 24th and Friday the 31st. During this time any reported non-emergency issues will be handled on the next business day.

Thanks to everyone for a very successful year! And here's to your continued growth and prosperity in 2011.

Wednesday, December 8, 2010

The schedule for the GGUS Certificate change has been moved. The change will now be made at 14:00 UTC on Thursday, December 9, 2010, which is 8:00am local time for the GOC. This will allow us to greatly shorten the ticket exchange outage. Please note that while no problems are expected, we will check to make sure that all tickets were exchanged correctly during the time it takes to switch our trusted certificate.

Again, during this time, all Alarm-level events will generate telephone calls.

Tuesday, December 7, 2010

The GOC will upgrade the following services beginning on Tuesday, December 14th, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered. We encourage users to test affected services before the production release.

GOC Ticket 1.31 (https://ticket.grid.iu.edu)

ITB version is now available for testing at https://ticket-itb.grid.iu.edu

Updated backend script that generates data cache.
Added a new class definition for monthly delta
Added capability to cache the DB fetched records for daily and monthly transfers
Added a join to the JobUsageRecord to fetch the number of jobs for hourly transfers.
RSV Collector (rsv.grid.iu.edu)

Major updates to the RSV report scripts. The script will no longer rely on t2.unl.edu web services and it will use rsvprocess.grid.iu.edu for RSV Metric data.
OSG TWiki (https://twiki.grid.iu.edu)

ITB version is now available for testing at https://twiki-itb.grid.iu.edu; we encourage users to test this service before the production release.

(Related to the last TWiki X509 authentication) Streamlined the Apache configuration files so that users can now access http:// URLs without being redirected to https://. Also, https:// will autologin if a valid X509 certificate is provided.

On December 9, 2010, at 8:00 UTC (2:00 AM EST) the GGUS certificate allowing exchange of tickets with the GOC will be updated. This will require an update to the trusted certificate list at the GOC which will occur during working hours in the Eastern time zone. Between these events, tickets will not propagate between the two systems. After the update at the GOC, any tickets created during the 2:00-8:00 interval will be manually propagated as appropriate. Alarm level events will generate telephone calls as usual during this period.

The GOC has requested that this change be delayed until working hours in the US but has not yet received a response to this request. We will update this notification if the situation changes.

Tuesday, November 30, 2010

OSG Operations and Integration are pleased to announce the release of OSG version 1.2.16

The following components are affected:

* All OSG software installations

This release incorporates a variety of fixes and changes to various OSG software components. The CA cert scripts have been updated to support a new format for CA certs. The existing CA cert format is still supported but the new format will allow the security group to distribute certs that work with the hash names that OpenSSL 1.0 uses. A summary of the main changes is as follows:

* vdt-ca-manage and vdt-cert-update scripts have been updated to allow the CA certificates to be distributed in the new format
* Bestman and SRM-Client-LBNL have been updated to 2.2.1.3.16
* Bestman2, Bestman-Client, and SRM-Tester3-LBNL have been updated to 2.0.3
* Symlinks to various log files have been added to the $VDT_LOCATION/logs directory
* PHP has been updated to 5.2.14 to fix a small security issue
* UberFTP has been updated to 2.6

Update instructions can be found on the OSG Twiki under the OSG 1.2 update instructions ( https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/OSG12UpdateInstructions ).

The release notes for the VDT 2.0.0p23 release underlying this release can be found here ( http://vdt.cs.wisc.edu/releases/2.0.0/release-p23.html ).

Thursday, November 18, 2010

Fermilab has informed us that ongoing problems have led to a Gratia outage since approximately 3PM Central time. Please note that this will impact Gratia data in the various places where it is used, including the Gratia portions of MyOSG and OSG Display.

There is no estimated time of repair yet, but we will pass along updates as we receive them. Your patience is appreciated.

Wednesday, November 17, 2010

The Gratia and Resource Selection (ReSS) services operated at Fermilab will be unavailable from 05:00 CST to 08:00 CST (11:00-14:00 UTC) due to a repair of a circuit breaker in Fermilab's facility. Gratia probes will automatically send all the data to the Gratia collectors after they come back up, so no action on the part of site administrators is necessary.

Also OSG resources FNAL_FERMIGRID and FNAL_GPGRID_1 will
be unavailable from 04:00 CST until 12:00 CST.

Tuesday, November 16, 2010

The GOC will upgrade the following services beginning on Tuesday, November 23rd, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered. We encourage users to test affected services before the production release.
OIM 2.28 (https://oim.grid.iu.edu)
ITB version is now available for testing at https://oim-itb.grid.iu.edu

* Changed the logic used to determine which downtimes can be edited by users [OIM-59]
* Added AUP Confirmation check boxes to Resource / SC and VO forms. [OIM-57]
* Updated contact association view to add details for each item as to which contact type users are associated with [OIM-61]
* (Patched Already) Installed capacity report bug fix by Karthik.

OSG TWiki (https://twiki.grid.iu.edu)
ITB version is now available for testing at https://twiki-itb.grid.iu.edu; we encourage users to test this service before the production release.

* Authentication method will be switched from username/password system to x509 system. [OIM-45]
* GOC Service monitor will be installed to monitor conditions of the TWiki service. [MISC-84]

MyOSG 1.29 (https://myosg.grid.iu.edu)

ITB version is now available for testing at https://myosg-itb.grid.iu.edu

* Enabled GGUS title update and updated to the latest WSDL for the GGUS accessor
* Updated jssecacerts file for new iwrgustrain.fzk.de HTTP cert
* GGUS2GOC now sets ASSOCIATED_R_ID/NAME (if the selected site is a resource name)

Software Cache (https://software.grid.iu.edu)
ITB version is now available for testing at https://software-itb.grid.iu.edu

* VO Sponsor list is now loaded dynamically from MyOSG

All services

* Install RHEL updates, specifically a critical glibc security update that addresses a vulnerability described here: https://www.redhat.com/security/data/cve/CVE-2010-3856.html
* As the glibc update requires a reboot to take effect, we will be installing all other available updates at this time, excluding those known to adversely affect performance

Wednesday, November 3, 2010

The GOC will upgrade the following services beginning on Tuesday, November 9th, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

BDII Service Upgrade Plan

The first step in the BDII Upgrade Plan will be implemented. This will consist of a BDII Version 5 machine being brought up at is3.grid.iu.edu. This instance will NOT be put into DNS Round Robin, but will be a production quality service available for testing of Version 5 of the BDII Software.

The full BDII Upgrade document is available at https://docs.google.com/document/d/1sYZyZcPtWD3ZXp6-YQF7Wzty_9wJuQRbokDcSWgd7b4/edit?authkey=CJOH65UJ&hl=en#.
RSV-Client (GOC Internal Service)
Fixed an issue where the following probes were not passing HTTP error codes to the probe output.

Tuesday, October 26, 2010

Data instability was seen today in the Indianapolis-based BDII (is2.grid.iu.edu) between the hours of 18:30 UTC and 21:45 UTC. During this time availability of OSG Sites Data was inconsistent.

The problem was traced back to an IP renumbering project which had unforeseen effects on our internal DNS and LDAP. As soon as the issue was identified, the maintenance was stopped and service was rolled back to the pre-maintenance configuration.

Tuesday, October 19, 2010

OSG Operations and Integration are pleased to announce the release of OSG version 1.2.15

The following components are affected:

* CE installations
* Client installations
* Bestman installations

This release includes the following updates:

* The Gratia GridFTP probe will be properly configured to report results
* The Generic Information Provider has been updated
* The OSG Discovery Tools have been updated
* Other small changes, see the VDT release notes for full details:
http://vdt.cs.wisc.edu/releases/2.0.0/release-p22.html

The GOC will upgrade the following services beginning on Tuesday, October 26th, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

Tuesday, October 26th at 14:00 UTC

OIM 2.26 (https://oim.grid.iu.edu)

ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

ITB version is now available for testing at https://ticket-itb.grid.iu.edu; we encourage users to test this service before the production release.

Added submitter information when a ticket is initially created (similar to previous fix made on missing submitter when non-Footprint agent updates a ticket)
Moved the form explanations verbiage to each submission page
Minor CSS change for Filter box on ticket navigator (shorten it slightly)

Redhat Enterprise Linux Software Updates

We will be installing software updates that do not require system restarts, so no downtime is expected due to this. Since last month, the only software update Redhat has released that affects GOC systems and is marked Critical is a security update to the java-1.6.0-sun package. There are also a number of updates of lesser importance, which we will also install, with a few exceptions. Some of those noncritical updates are to the kernel, glibc, and related packages, which would require system restarts, so we will not be installing those at this time. The only packages we will update will be those that do not require system restarts.

Monday, October 11, 2010

Due to a site-wide machine room reorganization the GOC will be moving its Indianapolis-based servers to a new rack beginning at 14:00 UTC on Tuesday, October 12, 2010.

The moving of each server will take only a short time, probably no more than 10 minutes, but we are reserving up to an hour in case of unexpected complications. We will be using the DNS round-robin to shift traffic to the Bloomington servers during this time where possible (CEMon/BDII, MyOSG, the OSG Software Cache, and the GOC Ticketing System).

Services that will be affected include:

BDII: Will remain online but may experience degraded service. We have seen ATLAS SAM test failures while doing similar maintenance.
OSG TWiki: Will experience a short downtime during this window.
OSG Information Management (OIM): Will experience a short downtime during this window.

The move to the new rack should result in improved cooling and power infrastructure for the GOC's Indianapolis-based servers. We appreciate everyone's patience during this transition.

Wednesday, October 6, 2010

Due to a site-wide machine room reorganization the GOC will be moving its Indianapolis-based servers to a new rack beginning at 14:00 UTC on Tuesday, October 12, 2010.

The moving of each server will take only a short time, probably no more than 10 minutes, but we are reserving up to an hour in case of unexpected complications. We will be using the DNS round-robin to shift traffic to the Bloomington servers during this time where possible (CEMon/BDII, MyOSG, the OSG Software Cache, and the GOC Ticketing System).

Services that will be affected include:

BDII: Will remain online but may experience degraded service. We have seen ATLAS SAM test failures while doing similar maintenance.
OSG TWiki: Will experience a short downtime during this window.
OSG Information Management (OIM): Will experience a short downtime during this window.

The move to the new rack should result in improved cooling and power infrastructure for the GOC's Indianapolis-based servers. We appreciate everyone's patience during this transition.

The GOC will upgrade the following services beginning on Tuesday, October 12th, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

CEMon/BDII - *ITB Only* (http://is-itb.grid.iu.edu)

* Install-Capacitor filtering script / Storage capacity adjustments
o processing of storage capacity based on resource group and resource name
o adjustment of storage capacity based on OIM data
* Added new monitors & munin metrics.
* Increased the number of processors assigned to 2

Software Cache (https://software.grid.iu.edu)

* For the Certificate form, vo-sponsor file was updated.

OSG Display 1.0.7 (http://display.grid.iu.edu)

* Changed the data load frequency indicator to 10 minutes. The 5-minute update was causing too much load on the source Gratia DB, so data is now updated every 10 minutes.

GOC Ticket 1.28 (https://ticket.grid.iu.edu)

Bug Fixes

* Fixed the bug where non-admin edit doesn't mark the generic submitter's name in the description and metadata [GOCTICKET-75]
* Fixed the mailto encoding issue for the ticket viewer

All Servers - Renumber Private VLAN IPs

This will not affect the public IPs of any servers; it will only affect the internal private VLAN that the GOC uses to allow servers to talk privately amongst themselves. There may be slight service irregularities, but nothing that is expected to last more than a few minutes as the various nameservice caches expire.

This release also incorporates smaller fixes to a variety of components distributed in the OSG stack. Although any given change is not very large, a large number of components have been updated.

Highlights of the changes include:

* setup.sh is no longer sourced by the Globus jobmanager, reducing the load and I/O on gatekeepers
* Tomcat's settings have been altered to use 512MB of memory; this helps to avoid the out-of-memory errors in CEMon seen by larger sites
* the vdt-updater script has been altered so that subsequent updates will not require a second backup
* Glexec has been updated and includes a fix for users that have run a grid-proxy-init instead of voms-proxy-init
* small security updates to MySQL, Apache httpd, and OpenLDAP
* LFC has been updated to 1.7.4-7

See the VDT release notice for more detailed information. Notable fixes include support for grid-proxy-init in Glexec and security fixes in MySQL, Apache, and OpenLDAP.

Update instructions can be found on the OSG Twiki under the OSG 1.2 update instructions:

Tuesday, September 21, 2010

The GOC will upgrade the following services beginning on Tuesday, September 28th, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

OIM 2.25 (https://oim.grid.iu.edu)

ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

Thursday, September 9, 2010

The GOC will upgrade the following services beginning on Tuesday, September 14th, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.
OIM 2.24 (https://oim.grid.iu.edu)

ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

Added filter for submitter contact type so that submitters can't edit Resource / SC / VO records. Added submitter contact to Misc contact on all entities if necessary [OIM-43]
For Resource and Support Center form, added code to pre-populate required contacts with submitter's contact. [OIM-46]
Fix the way site is ordered inside resource group selector (and site selector) [OIM-45]
Improve "update confirmation date" feature [OIM-44]
Fixed the line-wrapping issue by moving application menu to the top.
Made the header look more similar to other applications.
Fixed the wrong access check for contact association.
MyOSG 1.26 (https://myosg.grid.iu.edu)

ITB version is now available for testing at https://myosg-itb.grid.iu.edu; we encourage users to test this service before the production release.

Update bdii information table [MYOSG-86]
Fixed the issue where invalid timezone (invalid for PHP, but valid for Java) can cause invalid assignment of timezone. Added indicator of timezone next to the DN.
Fixed the issue where long URL caused by long time duration for google chart api can cause graph to not render [MYOSG-85]
Various Cosmetic Changes
GOC RSV clients / cacert-verify-igtf-central-probe will be updated to the latest version.
GOC Ticket 1.26 (https://ticket.grid.iu.edu)

ITB version is now available for testing at https://ticket-itb.grid.iu.edu; we encourage users to test this service before the production release.

We will be upgrading our ITB instances to the latest version of BDII version 5.7.1. This upgrade will *not* be applied to our production.

Software Cache (https://software.grid.iu.edu)

DOEGrid Certificate request & renewal pages will be installed on software.grid.iu.edu. You can test these forms at http://software-itb.grid.iu.edu/cert/
All Services

The following system administration updates will be made during this production update.

SSHD: Disabling GSSAPI authentication for SSH, as it is unused and is a security risk
Apache: Disabling mod_proxy_connect, as it is unused and is a security risk
Apache: Disabling HTTP TRACE method, as it is unused and is a security risk
Apache: Disabling ServerSignature, as it is a security risk
Apache: Reducing information available via ServerTokens, as it is a security risk
Apache: Disabling SSL on all servers that do not actually use SSL (servers that use SSL will be unaffected), as it is a security risk
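
As a rough illustration of the Apache items above, the sketch below checks a configuration file's text for the corresponding settings. TraceEnable, ServerSignature, ServerTokens, and the proxy_connect_module LoadModule line are standard Apache httpd names; the function name, the choice of "Prod" as the reduced ServerTokens level, and the wording of the findings are assumptions made for this example and are not taken from the GOC's actual configuration.

```python
# Hypothetical audit helper; not the GOC's actual tooling.
EXPECTED = {
    "TraceEnable": "off",      # disables the HTTP TRACE method
    "ServerSignature": "off",  # removes the server footer on generated pages
    "ServerTokens": "prod",    # ASSUMPTION: the announcement does not name a level
}


def audit_apache_config(text):
    """Return a list of findings for a block of Apache configuration text."""
    canonical = {name.lower(): name for name in EXPECTED}
    seen = {}
    findings = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        directive = parts[0].lower()  # directive names are case-insensitive
        if directive == "loadmodule" and len(parts) >= 2:
            if parts[1] == "proxy_connect_module":
                findings.append("mod_proxy_connect is still loaded")
        elif directive in canonical and len(parts) >= 2:
            seen[canonical[directive]] = parts[1].lower()
    for name, wanted in EXPECTED.items():
        got = seen.get(name)
        if got != wanted:
            findings.append(f"{name} is {got or 'unset'}, expected {wanted}")
    return findings
```

Calling audit_apache_config() on the text of a configuration file returns an empty list when all four settings match, and one message per mismatch otherwise.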

This release has updates to three components. Xrootd, Gratia, and the Fermi SRM clients have been updated. The updates are fairly minor and do not necessitate an immediate update unless you are encountering an issue that is corrected in these updates.

Notable fixes include a correction for a syntax issue when specifying the redirector on a data server. Gratia has been updated to correct several bugs and has added accounting probes for Xrootd. Fermi SRM clients have been updated as well.

The release notes for the VDT 2.0.0p20 release underlying this release can be found here: http://vdt.cs.wisc.edu/releases/2.0.0/release-p20.html

Update instructions can be found on the OSG twiki under the OSG 1.2.12: https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/OSG12UpdateInstructions

Tuesday, August 24, 2010

Earlier today, it was announced that the GOC would briefly bring the TWiki and OIM down today to perform maintenance. Unfortunately, that outage period has been pushed back. We anticipate these outages to be brief and to occur between 14:00 and 17:00 EDT.

The GOC would like to clarify the service announcement for this week by explicitly announcing that the services to be affected by today's release include TWiki and OIM. We anticipate these outages to be brief and to occur between 9:00 and noon EST.

Monday, August 23, 2010

The WLCG has confirmed with the GOC that Service Availability Monitoring messages are now arriving properly. We are currently re-sending the missed records after receiving the confirmation that the mechanism is fixed and will continue to track this issue to ensure there is no recurrence.

This morning, Service Availability Monitoring problems were reported in which sites were showing in unknown status. The GOC is currently giving this issue top priority. We will send another notification once we can confirm with our WLCG collaborators that the issue is fully resolved, and we will re-send any data that was not properly received during this time.

We apologize greatly for the inconvenience and thank you for your patience.

Tuesday, August 17, 2010

The GOC will upgrade the following services beginning on Tuesday, August 24th, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.
GOC Ticket Synchronizer 1.11

We will be installing Redhat software updates to all production services except for BDII. This will require reboots in most cases; however, in those cases where we have redundancy via DNS round-robin, we will be making use of this to reduce or eliminate service interruption. Only the updates that have already been installed on the ITB hosts will be installed on the production hosts. The production BDII servers will not be updated in this release.

We will be rebooting is1.grid.iu.edu, which has been responding sluggishly since an attempt to install a software update, which was rolled back. This is one of the two BDII servers, and we will be utilizing the DNS round robin to shift traffic to the other server, which is not being rebooted at this time, so we expect no interruption in service.
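
As a rough illustration of how the DNS round robin allows one BDII server to be rebooted without interrupting service, the sketch below models a two-host pool and drops one host from rotation. The function name and the simplified rotation model are assumptions for illustration only; in practice the rotation is handled by the name servers and client resolvers, not by code like this.

```python
from itertools import cycle, islice


def round_robin_answers(pool, in_maintenance, count):
    """Return the hosts that `count` successive lookups would be sent to.

    Simplified model: hosts under maintenance are removed from the pool,
    and the remaining hosts are handed out in strict rotation.
    """
    active = [host for host in pool if host not in in_maintenance]
    if not active:
        raise RuntimeError("no hosts left in rotation")
    return list(islice(cycle(active), count))


BDII_POOL = ["is1.grid.iu.edu", "is2.grid.iu.edu"]

# Normal operation: lookups alternate between the two servers.
normal = round_robin_answers(BDII_POOL, set(), 4)

# While is1 is rebooted, every lookup lands on is2.
during_reboot = round_robin_answers(BDII_POOL, {"is1.grid.iu.edu"}, 3)
```

With both hosts in rotation the answers alternate; with is1.grid.iu.edu removed, all traffic goes to is2.grid.iu.edu until is1 is added back.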

Wednesday, August 4, 2010

The GOC will upgrade the following services beginning on Tuesday, August 10th, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

OIM 2.23 (https://oim.grid.iu.edu)

ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

ITB version is now available for testing at https://twiki-test.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:Added new style for CEMon/BDII (http://is.grid.iu.edu)

There has been a security update to OpenLDAP; Redhat has backported a patch to OpenLDAP while maintaining the same major version, as is customary with Redhat Enterprise Linux. This is documented here: http://rhn.redhat.com/errata/RHSA-2010-0542.html . At the same time, there are a number of other Redhat updates we would like to install on the two BDII servers, including this kernel update: http://rhn.redhat.com/errata/RHSA-2010-0504.html . We will be making use of our DNS round-robin setup between the two servers to ensure that one of them will always be up and responding to queries.

Timekeeping Improvements on Virtual Machines

The unreliability of the timekeeping on all GOC virtual machines has been a cause for concern for some time. We have followed VMware's best practices (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427) in terms of ntpd parameters, but this has not been enough, despite VMware's assurances that the "divider=10" kernel parameter is no longer necessary as of RHEL 5.3. After extensive testing we have determined that certain kernel command-line parameters will greatly improve the timekeeping, but in order to make use of them a virtual machine must be rebooted. We will therefore be systematically rebooting GOC services that reside on VMs after altering the kernel parameters to improve the timekeeping. We will also be taking advantage of the maintenance window to apply all recent Redhat security updates. This will affect all GOC services other than BDII (which does not reside on virtual machines), but downtime will in all cases be limited to under 15 minutes, and very likely under 5 minutes.

Tuesday, July 20, 2010

The GOC will upgrade the following services beginning on Tuesday, July 27th, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

Host relocation at Indianapolis site

GOC staff will be relocating the servers at the Indianapolis site to another rack and installing additional equipment. The following services will be down during the relocation:
OIM
OSG TWiki

The following services will be redirected to Bloomington and will remain up, although there may be a temporary decrease in performance during the maintenance:
CEMon/BDII
GOC backend database cluster
GOC RSV client (monitors various GOC services)
GOC Ticket
MyOSG
MyOSG consolidater for RSV data
Software Cache/CA Certificate Repository

OIM 2.22 (https://oim.grid.iu.edu)

ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:
Added link to OSG homepage on OSG header logo.
Fixed a broken link on the home page.

MyOSG 1.23 (https://myosg.grid.iu.edu)

ITB version is now available for testing at https://myosg-itb.grid.iu.edu; we encourage users to test this service before the production release.

At 10:40 CDT (11:40 EDT, 15:40 UTC) on 2010/07/15, the status of the Gratia upgrades was as follows:

The OSG-PROD service completed its table upgrade at 18:24 CDT 2010/07/14 (19:24 EDT, 23:24 UTC) and started receiving data shortly thereafter.
At 19:08 CDT (20:08 EDT, 00:08 UTC on 2010/07/15) the reporting DB IP was switched over to the collector DB, meaning that the reporting "snapped forward" in time and continues to catch up as data come in from remote probes. As of this time, the downtime for all services except OSG-TRANSFER is considered complete.

At 21:51 CDT (22:51 EDT, 02:51 UTC on 2010/07/15) the OSG-TRANSFER table upgrade was completed. The service had to be restarted due to a possible timed out connection, meaning that there is a small possibility that some older probes "froze" and may have to be restarted. As of this time, however, the downtime for OSG-TRANSFER is considered complete.

At 10:13 CDT on 2010/07/15 (11:13 EDT, 15:13 UTC) the reporting DB was observed to have caught up to the collector DB and the IP was switched back and backups enabled. Reporter URLs are now pulling their data from the reporting DB.
At this time, the Gratia upgrade is complete and no more disruptions are anticipated.
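
The times in this status report are given in CDT, EDT, and UTC. For readers who want to reproduce the conversions, a minimal sketch follows. It assumes fixed summer offsets (CDT = UTC-5, EDT = UTC-4), which hold for July 2010; the function name is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: fixed daylight-saving offsets, valid for July 2010 only.
CDT = timezone(timedelta(hours=-5), "CDT")
EDT = timezone(timedelta(hours=-4), "EDT")
UTC = timezone.utc


def in_all_zones(year, month, day, hour, minute, tz=CDT):
    """Render one instant in CDT, EDT, and UTC, keyed by zone name."""
    instant = datetime(year, month, day, hour, minute, tzinfo=tz)
    return {
        zone.tzname(None): instant.astimezone(zone).strftime("%Y/%m/%d %H:%M")
        for zone in (CDT, EDT, UTC)
    }


# The OSG-TRANSFER table upgrade completed at 21:51 CDT on 2010/07/14.
transfer_done = in_all_zones(2010, 7, 14, 21, 51)
```

For the OSG-TRANSFER completion time, this gives 21:51 CDT and 22:51 EDT on 2010/07/14, and 02:51 UTC on 2010/07/15, matching the figures quoted above.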

If anyone believes their probe has become stuck (this should be only a rare occurrence), they should check for processes with "gratia" in the command string (ps auwwx | grep gratia) and kill any that were started yesterday. All probes except dCache-transfer will recover automatically; dCache-transfer probe should be restarted using:

service gratia-dcache-transfer stop
service gratia-dcache-transfer start
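
The check described above (look for "gratia" processes that were started yesterday) can also be scripted. The following is only a sketch: it assumes the process list was captured with "ps -eo pid,lstart,args" on a Linux host with English day and month names, and the function name and sample paths are hypothetical. It only reports candidate PIDs; deciding whether to kill them is left to the administrator.

```python
from datetime import datetime


def stale_gratia_pids(ps_output, cutoff):
    """Return PIDs of processes with "gratia" in the command line that
    were started before `cutoff`.

    `ps_output` is expected to be the text produced by
    `ps -eo pid,lstart,args` (an assumption of this sketch).
    """
    pids = []
    for line in ps_output.splitlines():
        # pid, then five lstart tokens (e.g. Wed Jul 14 08:55:10 2010), then the command
        fields = line.split(None, 6)
        if len(fields) < 7 or not fields[0].isdigit():
            continue  # header line or anything unparseable
        started = datetime.strptime(" ".join(fields[1:6]), "%a %b %d %H:%M:%S %Y")
        if "gratia" in fields[6] and started < cutoff:
            pids.append(int(fields[0]))
    return pids


# Hypothetical sample output; the paths are illustrative only.
SAMPLE = """\
  PID                  STARTED COMMAND
 4021 Wed Jul 14 08:55:10 2010 /usr/bin/python /opt/gratia/probe/condor_meter.py
 4100 Thu Jul 15 09:00:02 2010 /usr/bin/python /opt/gratia/probe/condor_meter.py
 4200 Wed Jul 14 07:00:00 2010 /usr/sbin/sshd
"""

stuck = stale_gratia_pids(SAMPLE, cutoff=datetime(2010, 7, 15))
```

In the sample, only the probe process started on July 14 is reported; the one started on July 15 and the unrelated sshd process are left alone.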

2) OSG-DAILY and OSG-ITB services have been upgraded successfully and are receiving incoming probe data.

3) Based on a very rough estimate of progress (the relative sizes of the upgrading table file and its temporary counterpart on disk), OSG-PROD is expected to come online around 18:30 CDT (19:30 EDT, 23:30 UTC) this evening.

4) At the time OSG-PROD comes back online, we will swap the reporting-DB IP over to the collector DB and the reporting will start to catch up. Meanwhile the upgrades will be replicated to the reporting DB. All service downtimes except OSG-TRANSFER will be considered complete at this time.

5) OSG-TRANSFER is expected to complete by the same token around or before 23:30 CDT (00:30 EDT, 04:30 UTC), at which time it will quietly start receiving data without human intervention.

6) The reporting-DB IP will be switched back to the reporting DB at such time as the replication has caught up, which is likely to be a day or two from now. This will be transparent and will not affect user-visible service.

One particular note: please do not try to kill / restart probes until such time as that service has been marked available in OIM: A hang is expected until that time.

Monday, July 12, 2010

All FNAL-based Gratia services will be down on 2010/07/14 for OS and service
upgrades. In addition, the previously announced decommissioning of the legacy
redirector service will also take place at this time.

Gratia release notes for v1.06.16:

* Principal improvement is to the housekeeping feature: this is expected to greatly reduce and hopefully eliminate the instances of significant data lag in reporting.

Outage details:

* At about 09:00 CDT (14:00 UTC), incoming data to the GRATIA-OSG-PROD, GRATIA-OSG-ITB, GRATIA-OSG-TRANSFER and GRATIA-OSG-DAILY services will be stopped in such a way as to hopefully eliminate the possibility of probes becoming "stuck" as has happened in the past (note: probe release currently in VDT test cycle will eliminate this completely).

* There will be one data-collection outage of about 20 minutes, with a shorter
outage in reporting as services are shuffled between highly-available servers and the reporting services are upgraded.

* There will be another outage in data collection of much longer duration as
collector services are upgraded. Because a DB schema upgrade is involved, this could be several hours in duration. During this period, reporting services will be available but data will of course be stale.

* When all collector upgrades are complete, data collection will resume. This is not expected to be later than 16:00 CDT (21:00 UTC), but given the uncertainty in the time required to upgrade each schema, service may actually resume somewhat earlier or later than this estimate. In any event, the MyOSG downtime page will have the latest details.

* During any data collection outage, data are retained on probes and re-sent when collector service resumes.

* At some point during this period, the legacy redirector service which has up to now redirected probe data sent to obsolete addresses and port numbers, will be deactivated. The upcoming demise of this service has been announced previously.
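
The retain-and-resend behavior mentioned in the outage details (data are kept on the probe and re-sent when the collector returns) can be pictured with the small sketch below. This is not the Gratia probe implementation; the class name, the send callback, and the in-memory backlog are assumptions chosen to keep the example short.

```python
from collections import deque


class BufferingProbe:
    """Toy store-and-forward probe: keeps records until delivery succeeds."""

    def __init__(self, send):
        # `send` is a hypothetical callable returning True when the
        # collector accepted the record and False when it is unreachable.
        self._send = send
        self._backlog = deque()

    def report(self, record):
        """Queue a new record, then try to deliver everything queued."""
        self._backlog.append(record)
        self.flush()

    def flush(self):
        """Deliver queued records in order; stop at the first failure.

        Returns the number of records still waiting.
        """
        while self._backlog:
            if not self._send(self._backlog[0]):
                break  # collector still down; keep the record for later
            self._backlog.popleft()
        return len(self._backlog)
```

Records reported while the collector is down stay in the backlog, and the next successful call delivers them in their original order, so nothing is lost during the outage.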

Friday, July 9, 2010

All FNAL-based Gratia services will be down on 2010/07/14 for OS and service
upgrades. In addition, the previously announced decommissioning of the legacy
redirector service will also take place at this time.

Gratia release notes for v1.06.16:

* Principal improvement is to the housekeeping feature: this is expected to greatly reduce and hopefully eliminate the instances of significant data lag in reporting.

Outage details:

* At about 9am, incoming data to the GRATIA-OSG-PROD, GRATIA-OSG-ITB, GRATIA-OSG-TRANSFER and GRATIA-OSG-DAILY services will be stopped in such a way as to hopefully eliminate the possibility of probes becoming "stuck" as has happened in the past (note: probe release currently in VDT test cycle will eliminate this completely).

* There will be one data-collection outage of about 20 minutes, with a shorter outage in reporting as services are shuffled between highly-available servers and the reporting services are upgraded.

* There will be another outage in data collection of much longer duration as collector services are upgraded. Because a DB schema upgrade is involved, this could be several hours in duration. During this period, reporting services will be available but data will of course be stale.

* When all collector upgrades are complete, data collection will resume.

* During any data collection outage, data are retained on probes and re-sent when collector service resumes.

* At some point during this period, the legacy redirector service, which has until now redirected probe data sent to obsolete addresses and port numbers, will be deactivated. The upcoming demise of this service has been announced previously.

Thanks for your help and time,

Chris Green

Tuesday, July 6, 2010

The GOC will upgrade the following services beginning on Tuesday, July 13th, 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

MyOSG 1.22 (https://myosg.grid.iu.edu)
ITB version is now available for testing at https://myosg-itb.grid.iu.edu ; we encourage users to test this service before the production release.

Thursday, July 1, 2010

This update replaces Java 5 with Java 6 because Java 5 is past its end of life and no longer has security updates. This affects all software in the OSG software stack that uses Java. In addition, we have updated to the latest version of Java 6, 1.6.0_20.

This release updates several software components to new versions, see the complete list below.

Monday, June 21, 2010

The GOC will upgrade the following services beginning on Tuesday, June 22nd 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

OSG TWiki (https://twiki.grid.iu.edu)
ITB version is now available for testing at https://twiki-test.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

* Added PDF Generator plugin (GenPDFAddOn)

OIM 2.21 (https://oim.grid.iu.edu)
ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

* Removed indentation for vo form / science vo info section
* Split the field of science list into 3 columns
* Patched bug that caused null URL to be displayed on VO form

MyOSG 1.21 (https://myosg.grid.iu.edu)
ITB version is now available for testing at https://myosg-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

* Fixed hidden service filter bug
* Implemented initial version of Misc/Operational Status Page.
* Updated OIM hierarchy indicator for resource group
* RSV Status Map: Updated logic so that a non-reporting site, or a site whose latest resource status change is older than N days, will be treated as "Non Reporting". Added new information to display such sites with "NR" icons.
* RSV Status Map: Updated the map bubble style to be more in line with the rest of MyOSG. Also added some wording to make it easier to understand why no resource status is displayed.
* RSV Status Map: Updated the default URLs related to RSV status map so that "show non reporting site" check mark is checked by default.
* RSV Status Map: Set the default configuration parameter for "expired status change" to be 30 days.
* RSV Status Map: Added legend and link to a page explaining status calculation method.
* Made it easier to understand resource group / resource, site entities since this view is not exposed via display.grid.
* Other minor cosmetic changes and minor bug fixes.
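
The "Non Reporting" rule in the RSV Status Map notes above can be sketched in a few lines of Python. The function and parameter names are hypothetical and this is not the MyOSG implementation; only the 30-day default and the "NR" label come from the release notes.

```python
from datetime import datetime, timedelta

EXPIRED_STATUS_CHANGE_DAYS = 30  # default stated in the release notes


def display_status(last_status, last_change, now,
                   max_age_days=EXPIRED_STATUS_CHANGE_DAYS):
    """Return the status to draw for a site, or "NR" for Non Reporting."""
    if last_status is None or last_change is None:
        return "NR"  # the site has never reported
    if now - last_change > timedelta(days=max_age_days):
        return "NR"  # latest status change is older than the cutoff
    return last_status
```

For example, a site whose last status change was 21 days ago keeps its reported status under the 30-day default, while one last seen 52 days ago is shown as "NR".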

GOC-TX
Release Notes:

* Moving internal synchronization table to data1/2.grid.iu.edu

GOC Ticket 1.21 (https://ticket.grid.iu.edu)
ITB version is now available for testing at https://ticket-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

* Released new version of ticket navigator (switched /navigator2 to /navigator and renamed original /navigator to /navigatorold)
* Updated the application icon and favicon (from tea cup to an orange tag)
* Fixed the bug where phone number didn't get populated (patched on production)

Tuesday, June 15, 2010

The GOC will upgrade the following services beginning on Tuesday, June 22nd 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

OSG TWiki (https://twiki.grid.iu.edu)
ITB version is now available for testing at https://twiki-test.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

* Added PDF Generator plugin (GenPDFAddOn)

OIM 2.21 (https://oim.grid.iu.edu)
ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

* Removed indentation for vo form / science vo info section
* Split the field of science list into 3 columns
* Patched bug that caused null URL to be displayed on VO form

MyOSG 1.21 (https://myosg.grid.iu.edu)
ITB version is now available for testing at https://myosg-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

* Fixed hidden service filter bug
* Implemented initial version of Misc/Operational Status Page.
* Updated OIM hierarchy indicator for resource group
* RSV Status Map: Updated logic so that a non-reporting site, or a site whose latest resource status change is older than N days, will be treated as "Non Reporting". Added new information to display such sites with "NR" icons.
* RSV Status Map: Updated the map bubble style to be more in line with the rest of MyOSG. Also added some wording to make it easier to understand why no resource status is displayed.
* RSV Status Map: Updated the default URLs related to RSV status map so that "show non reporting site" check mark is checked by default.
* RSV Status Map: Set the default configuration parameter for "expired status change" to be 30 days.
* RSV Status Map: Added legend and link to a page explaining status calculation method.
* Made it easier to understand resource group / resource, site entities since this view is not exposed via display.grid.
* Other minor cosmetic changes and minor bug fixes.

GOC-TX
Release Notes:

* Moving internal synchronization table to data1/2.grid.iu.edu

GOC Ticket 1.21 (https://ticket.grid.iu.edu)
ITB version is now available for testing at https://ticket-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

* Released new version of ticket navigator (switched /navigator2 to /navigator and renamed original /navigator to /navigatorold)
* Updated the application icon and favicon (from tea cup to an orange tag)
* Fixed the bug where phone number didn't get populated (patched on production)

Wednesday, June 9, 2010

OSG Operations and Integration are pleased to announce the release of OSG version 1.2.10.

This update fixes several minor bugs in software found in the OSG software stack and adds some new functionality to xrootd to support ATLAS and Tier3 requirements.

This is an update for several of the components in the OSG software stack. The changes are primarily to provide xrootd changes requested by ATLAS as well as to correct a few minor issues. This update is recommended mainly for admins interested in the new features present in the xrootd update.
Components updated
This release updates several software components to new versions; see the complete list below.
* Xrootd
* Bestman, SRM-Client-LBNL, and SRM-Tester-LBNL 2.2.1.3.12
* MyProxy and GSIOpenSSH 5.1
* vdt-ca-manage
* Apache
* osg-version

Please see the VDT release notes for more details: http://vdt.cs.wisc.edu/releases/2.0.0/release-p17.html

Update instructions can be found on the OSG twiki under the OSG 1.2 update instructions: https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/OSG12UpdateInstructions

Tuesday, June 1, 2010

The GOC will upgrade the following services beginning on Tuesday, June 8th 2010 at 14:00 UTC. The GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

BDII (ldap://is.grid.iu.edu:{2170,2180})

ITB servers are running the latest CEMon collector; the BDIIs on the ITB servers are now available for testing at ldap://is-itb.grid.iu.edu:2170 (OSG BDII) and ldap://is-itb.grid.iu.edu:2180 (WLCG Interop BDII); we encourage users to test this service before the production release.

Release Notes:

* Upgrade CEMon collector to latest available production release version.

OIM 2.20 (https://oim.grid.iu.edu)
ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

* Modified VO registration/edit forms to allow pure resource provider VOs to be registered; number of required fields for such VOs is smaller. (Related: https://ticket.grid.iu.edu/goc/viewer?id=8417)
* Modified ordering of type of contacts shown on resource, VO, SC display pages to be more intuitive
* Added filters to prevent OIM registrations from sending notifications to various contacts in debug mode.
* Updated jQuery/jQuery-UI libraries to the latest version.

MyOSG 1.20 (https://myosg.grid.iu.edu)
ITB version is now available for testing at https://myosg-itb.grid.iu.edu; we encourage users to test this service before the production release.

The GOC will upgrade the following services beginning on Tuesday, May 25 2010 at 14:00 UTC.

Special Note about short expected intermittent outages

A short intermittent outage (~10 minutes) is expected for all the services listed below while the GOC adjusts the virtual machine settings one VM host at a time. For example, myosg2 might be down for 10 minutes while myosg1 is still available for anyone trying to access myosg.grid.iu.edu. Additionally, the GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

Update 2010-05-24: The GOC will also restart the NSCD daemon with an updated configuration on the BDII servers is1 and is2 during this maintenance. This is expected to be completely transparent to the outside world, and users of the BDII should not experience any problems.

OIM 2.19 (https://oim.grid.iu.edu)

ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:
* Added direct link to register a new vo, sc, or resource on the home page.
* Converted descriptive text to tooltips on homepage.
* Added tooltips on VO page (Expect more of this in the future!)

Patched since last release:
* Fixed an issue which caused a null pointer exception when an empty Log ID was parsed by a specific Log function.
* Fixed authorization bug in Contact and ProfileEdit Servlets.

MyOSG 1.19 (https://myosg.grid.iu.edu)

ITB version is now available for testing at https://myosg-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:
* Switched the order of metadata / comment update on RT accessor so that when RT sends an update, the metadata shown on it has the latest information.
* Added logic for BNL so that it will use comment action instead of correspondence if ticket status is going to be resolved in order to avoid re-opening of the ticket due to ticket update (a necessary change due to metadata / comment reorder)
* Updated GOC-TX so that if the GGUS ticket status is currently "verified" and the new status is "solved", GOC-TX will not apply the update. We already do something similar for RT; this logic is needed because the concept of a "verified" status currently exists only in GGUS.
* Added code to append custom field information for RT during ticket update.
* Added CF.{VO Name} for GOC > RT conversion.
* Updating Associate RG name to "Resource Group Name" for GOC>BNL converter
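
The "verified"/"solved" rule in the notes above amounts to a small guard on status transitions. A minimal sketch follows, assuming statuses are plain strings compared case-insensitively; the function name is hypothetical and this is not the GOC-TX code.

```python
def should_apply_status_update(current_status, new_status):
    """Return False for the one transition the release note says to skip."""
    # Case-insensitive comparison is an assumption made for this sketch.
    current = current_status.strip().lower()
    new = new_status.strip().lower()
    # A ticket already "verified" in GGUS must not be moved back to "solved".
    return not (current == "verified" and new == "solved")
```

Every other transition is passed through unchanged, so the guard only blocks the downgrade that would otherwise undo a "verified" ticket.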

GOC Ticket 1.19 (https://ticket.grid.iu.edu)

ITB version is now available for testing at https://ticket-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:
* Made Ticket Viewer show priority tags when a ticket has higher priority
* [For GOC Staff only] On the My Tickets page, changed the table header to reflect the fact that it includes CC-ed tickets

Friday, May 14, 2010

The OSG TWiki experienced a second brief outage while administrators were
investigating reported service slowness. The TWiki was restarted as
soon as alarms were received and the recorded downtime was approximately
30 minutes.

The OSG TWiki experienced a brief outage while administrators were
investigating reported service slowness. The TWiki was restarted as
soon as alarms were received and the recorded downtime was less than
10 minutes.

Friday, May 7, 2010

The GOC has scheduled removal of is2.grid.iu.edu from round-robin for is.grid.iu.edu on Monday, May 10, 2010 11:00 EDT (15:00 UTC), to enable work on resolving potential operating system and/or hardware issues. We do not have an ETA for when is2 will be added back into the BDII round-robin, but we expect it to be out of commission for at least 2 weeks. We will send a follow-up notification when we have a clearer idea. During the time is2 is offline, the is1 server is expected to handle all the BDII traffic without any issues; the GOC will monitor the situation.

Tuesday, May 4, 2010

OSG Operations was made aware of an issue of some CEMon clients on production CEs not reporting to the OSG CEMon collector; consequently, those CEs did not show up on the BDII.

GOC staff and USCMS admins have started troubleshooting this issue, and have found a temporary workaround by removing one of the two CEMon collectors (on is2) from activity. The raw LDIF data is still being rsynced between the two servers. At this point all information is available in the OSG BDII and being sent properly to the WLCG. Investigations of the cause and work toward a fix will continue when business hours resume tomorrow.

The GOC will upgrade the following services beginning on Tuesday, May 11 2010 at 14:00 UTC. No outages are expected for any of the services listed but the GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered.

OIM 2.18 (https://oim.grid.iu.edu)

ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

* Added "sort by" section for VO [Jira ticket MYOSG-32]
* Cleaned up verbiage on Query editor
* Updated formatting of service name display (to use smaller font)
* [Experimental Feature]: Added BDII Information page under Resource Group - shows selected information from BDII like version numbers, and some other statistics. Note that this feature is still in experimental mode, and may be modified further in future releases.

GOC-TX Ticket Exchange/Sync System 1.5 (tx.grid.iu.edu)

The ITB TX server has been functioning properly.

Release Notes:

* Added capability to access GOC's metadata server
* Replaced GGUS internal field that was added to ticket update with metadata.
* Added new FP>RT custom field conversion using newly available metadata

* Removed erroneous conversion code for ticket type field from GGUS to FP

GOC Ticket 1.18 (https://ticket.grid.iu.edu)

ITB version is now available for testing at https://ticket-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

* Updated how db connections are handled to allow failover to secondary server
* Made the resource form able to preselect resource names (per MyOSG ticket submitter change)
* Updated various menu/sub-menu links - based on recent changes to the location of various forms
* [GOC Staff and Ticket Editors]: Updated ticket URL for ticket cluster view
* [GOC Staff and Ticket Editors]: Added new "My Tickets" page
* Added more error logs in case TX assignee cannot be found
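
The first note in the list above mentions database connections that fail over to a secondary server. The release notes do not say how this is done; the following is only a generic Python sketch of the pattern, in which `connect_with_failover` and the zero-argument connector callables are hypothetical.

```python
def connect_with_failover(connectors):
    """Try each connector in order and return the first connection made.

    `connectors` is a list of zero-argument callables, primary first.
    """
    errors = []
    for connect in connectors:
        try:
            return connect()
        except OSError as exc:  # ConnectionError and socket errors derive from OSError
            errors.append(exc)  # remember the failure and try the next server
    raise ConnectionError(f"all {len(errors)} database servers failed: {errors}")
```

Listing the primary first means the secondary is only contacted after the primary has failed, and the collected errors are reported together if no server can be reached.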

Thursday, April 22, 2010

OSG Operations and Integration are pleased to announce the release of OSG version 1.2.9.

This is an update for several of the components in the OSG software stack. The changes are primarily to fix minor bugs or add some new functionality. Upgrading is not needed unless you are affected by one of these bugs or you need the new functionality provided in an update.

Wednesday, April 21, 2010

The GOC will upgrade the following services beginning on Tuesday, April 27th at 14:00 UTC.

Special Note about short expected intermittent outages
A short intermittent outage (~5 minutes) is expected for all the services listed below while the GOC adjusts the virtual machine settings one VM host at a time. For example, myosg2 might be down for 5 minutes while myosg1 will still be available for anyone trying to access myosg.grid.iu.edu. Additionally, the GOC reserves four hours (14:00 - 18:00 UTC) in the unlikely event that unexpected problems are encountered. Services impacted include:

GOCTicket (Round-robined between two service endpoints)

GOC-TX

GOC VOMS

MyOSG (Round-robined between two service endpoints)

OIM

OSG Display

Secmon RA Agent (For security team)

Software Cache (Round-robined between two service endpoints)

Twiki

OIM 2.17 (https://oim.grid.iu.edu)

ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

Updated MySQL Connector and patched various connection-related issues (some of the critical fixes are already installed on the current production instance)

Various minor bug fixes

MyOSG 1.17 (https://myosg.grid.iu.edu)

ITB version is now available for testing at https://myosg-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

Added status change XML that can be optionally pulled from RGstatusHistory XML

Updated logic on RSV status map to remove resource groups that do not have any resources reporting RSV status data

RSV status map will now honor the disabled field on the site table

Added HEPSPEC XML tag to misccpuinfo per Brian's request

Under Resource Group / GIP validation status page

Updated the LDIF URLs to match resource name rather than resource_group name; placed LDIF URL links under resource

Thursday, April 8, 2010

There will be a disaster recovery exercise performed by the IU GRNOC from Saturday, April 10, 2010, 12:00 UTC until Sunday, April 11, 2010, 12:00 UTC, which includes the server that houses the Footprints ticket system. The GOC does not anticipate an outage of its ticket system, but please be advised that there may be some unanticipated irregularities or intermittent problems during the specified window.

Wednesday, March 31, 2010

OSG Operations and Integration are pleased to announce the release of OSG version 1.0.6

This is a security update for all OSG installations using Gratia probes to report accounting information. The urgency of this update depends on the following factors:

* If your resource is using Gratia probes to report accounting information and is using Condor or Managed Fork, you should apply this update to prevent authorized local users from gaining elevated privileges
* Other resources using Gratia probes for accounting can treat this as a low priority security update that may prevent authorized local users from being able to run a DoS attack on Gratia reporting
* Resources not using Gratia do not need to apply this update since it does not apply in this case

This release updates two software components; see the complete list below.

* Gratia probes
* osg-version

Please see the VDT release notes for more details: http://vdt.cs.wisc.edu/releases/1.10.1/release-p25.html

Update instructions can be found on the OSG twiki under the OSG 1.0.6 update instructions: https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/OSG106UpdateInstructions

Sites using rpms to install Gratia probes should update their rpms to the latest versions in accordance with the guidelines given in the summary.

Friday, March 26, 2010

During a test of the Footprints server redundancy this afternoon, the GRNOC at IU mistakenly re-sent some old ticket emails as a part of the sendmail turn up on the second server. The duplicate sendmail queue was halted and emptied. We are investigating all ticket exchanges to ensure that there are no further issues. We apologize to any OSG collaborator that may have been inconvenienced by these emails and are here to answer any questions.

Tuesday, March 23, 2010

The GOC will upgrade the following services beginning on Tuesday, March 30th at 14:00 UTC. No outages are expected for any of the services listed but the GOC reserves the hours of 14:00 - 18:00 UTC in the unlikely event that unexpected problems are encountered.

ITB version is now available for testing at https://oim-itb.grid.iu.edu; we encourage users to test this service before the production release.

Release Notes:

Added warnings if user removes a service from a resource or removes a VOReportName or VOReportName-FQAN or VOResourceOwner from a VO but navigates away from the edit page without hitting the update button. (Related ticket: https://ticket.grid.iu.edu/goc/viewer?id=8115).

Cleaned up profile edit and contact edit pages to (a) disallow altering person flag if contact is mapped to a registered DN (b) show associated DN in person section.

Modified registration page so its fields are consistent with contact edit or profile edit pages.

Updated the authorization module to deal with unregistered and disabled-registered DNs more elegantly; also modified the home page to print more useful warnings if either case or the no-DN case is hit.

Modified top menu to ensure unregistered and disabled-registered DN cases are taken into account.

Added About and Legend sideviews for contact edit and profile edit pages.

Updates to DB connection module that will hopefully prevent "Ran out of Active Connection" error from recurring.

Updated submenu style to make it consistent with other GOC applications.

Removed active field from resource_group view and DB table since it was never used. In the future active field will be removed from facility, site, and support center fields to make the status of those entities clearer and easier to understand.

Added capability to display help tooltip. Added tooltip for active/disable column with some description.

In support center legacy view, fixed an issue where, when contact information could not be found, information from the previous row was displayed.

In status map, added check to see if a facility actually has any sites under it (and warn if there isn't any); Set a default camera angle / locations etc to show for Google Earth display.

Updated submenu style to make it consistent with other GOC applications.

Minor changes to JavaScript, and display styles.

(Experimental) BDII Information Gatherer: In the Resource Group menu, added an experimental "information to display" drop-down item that shows treemaps of number of jobs, number of CPUs, etc., based on information collected from the BDII.

GOC-TX Ticket Exchange/Sync System 1.2 (tx.grid.iu.edu)

The ITB TX server has been functioning properly.

Release Notes:

Updates to allow ticket exchange with BNL RT (Used by USATLAS support center). This setup will replace existing email-based ticket exchange sometime in April assuming production level tests succeed.

Updates to allow GGUS ticket submission form to pass along resource_group (or resource_name) as a parameter instead of resource_name (used in previous version); Destination VO on GOC ticket should be set if automatic resource_name or resource_group_name or concerned_VO based SC mapping is performed.

Wednesday, March 17, 2010

OSG Operations and Integration are pleased to announce the release of OSG version 1.2.8.

This is a security update for all OSG installations using Gratia probes to report accounting information. The urgency of this update depends on the following factors:

If your resource is using Gratia probes to report accounting information and is using Condor or Managed Fork, you should apply this update to prevent authorized local users from gaining elevated privileges

Other resources using Gratia probes for accounting can treat this as a low priority security update that may prevent authorized local users from being able to run a DoS attack on Gratia reporting

Resources not using Gratia do not need to apply this update since it does not apply in this case

This release also updates several software components; see the complete list below.

Gratia probes

osg-version

This update corrects a bug in the Gratia probes that may pose a security risk to resources in certain instances. Please see the VDT release notes for more details: http://vdt.cs.wisc.edu/releases/2.0.0/release-p15.html

Complete update instructions can be found at https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/OSG128UpdateInstructions

Sites using rpms to install Gratia probes should update their rpms to the latest versions in accordance with the guidelines given in the summary.

If you are updating from a version prior to 1.2.0 or installing the OSG stack for the first time see the full installation instructions at https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/

OSG Operations Team

Open Science Grid Operations Center

Based at Indiana University, the OSG Operations Group provides a single point of operational support for the Open Science Grid (OSG). Operations performs real time Grid monitoring and problem tracking, provides support to users, developers and systems administrators, maintains grid services, provides security incident response, and maintains information repositories.