Periodically one bank of memory was disabled because too many 'Correctable ECC' errors occurred; after a reboot things returned to normal. The cause of the failures was a faulty DIMM. The faulty DIMM was identified by moving half of the modules to another machine (lhcb064), where the failures then appeared. The failures were reproducible by running memtest for long enough.

08/04/2008

Faulty DIMM replaced

03/11/2008

lhcb065.usc.cesga.es

On 20/10/2008 the memory of lhcb079 was swapped with the memory of this machine. After about a week and several 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory.

01/11/2008

All problems in the Caton machines were related to the power supply. The power supplies were changed on all machines between August and September 2008

After too many 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory

15/10/2008

After unplugging and replugging all the memory, the problem disappeared

16/10/2008

lhcb066.usc.cesga.es

Periodically one bank of memory was disabled because too many 'Correctable ECC' errors occurred; after a reboot things returned to normal. The cause of the failures was a faulty DIMM. The faulty DIMM was identified by moving half of the modules to another machine (lhcb064), where the failures then appeared. The failures were reproducible by running memtest for long enough.

08/04/2008

Faulty DIMM replaced

03/11/2008

lhcb065.usc.cesga.es

On 20/10/2008 the memory of lhcb079 was swapped with the memory of this machine. After about a week and several 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory.

01/11/2008

All problems in the Caton machines were related to the power supply. The power supplies were changed on all machines between August and September 2008

After too many 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory

15/10/2008

After unplugging and replugging all the memory, the problem disappeared

16/10/2008

lhcb066.usc.cesga.es

Periodically one bank of memory was disabled because too many 'Correctable ECC' errors occurred; after a reboot things returned to normal. The cause of the failures was a faulty DIMM. The faulty DIMM was identified by moving half of the modules to another machine (lhcb064), where the failures then appeared. The failures were reproducible by running memtest for long enough.

08/04/2008

Faulty DIMM replaced

03/11/2008

lhcb065.usc.cesga.es

On 20/10/2008 the memory of lhcb079 was swapped with the memory of this machine. After about a week and several 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory.

01/11/2008

All problems in the Caton machines were related to the power supply. The power supplies were changed on all machines between August and September 2008

After too many 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory

15/10/2008

After unplugging and replugging all the memory, the problem disappeared

16/10/2008

lhcb066.usc.cesga.es

Periodically one bank of memory was disabled because too many 'Correctable ECC' errors occurred; after a reboot things returned to normal. The cause of the failures was a faulty DIMM. The faulty DIMM was identified by moving half of the modules to another machine (lhcb064), where the failures then appeared. The failures were reproducible by running memtest for long enough.

08/04/2008

Faulty DIMM replaced

03/11/2008

Changed:

<<

lhcb065.usc.cesga.es

On 20/10/2008 the memory of lhcb079 was swapped with the memory of this machine. After about a week and several 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory.

01/11/2008

>>

lhcb065.usc.cesga.es

On 20/10/2008 the memory of lhcb079 was swapped with the memory of this machine. After about a week and several 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory.

01/11/2008

All problems in the Caton machines were related to the power supply. The power supplies were changed on all machines between August and September 2008

faulty sensor fan and faulty power supply (coming from lhcb085 on 20/05/2008)

13/05/2008

motherboard and power supply replaced

2/07/2008

>>

lhcb070.usc.cesga.es

faulty sensor fan and faulty power supply (coming from lhcb085 on 20/05/2008)

13/05/2008

motherboard and power supply replaced

02/07/2008

lhcb074.usc.cesga.es

unknown

02/06/2008

After unplugging and replugging all the memory, the problem was solved

13/06/2008

lhcb054.usc.cesga.es

faulty DIMM

30/06/2008

DIMM replaced

28/07/2008

lhcb079.usc.cesga.es

After too many 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory

15/10/2008

After unplugging and replugging all the memory, the problem disappeared

16/10/2008

Added:

>>

lhcb066.usc.cesga.es

Periodically one bank of memory was disabled because too many 'Correctable ECC' errors occurred; after a reboot things returned to normal. The cause of the failures was a faulty DIMM. The faulty DIMM was identified by moving half of the modules to another machine (lhcb064), where the failures then appeared. The failures were reproducible by running memtest for long enough.

08/04/2008

Faulty DIMM replaced

03/11/2008

lhcb065.usc.cesga.es

On 20/10/2008 the memory of lhcb079 was swapped with the memory of this machine. After about a week and several 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory.