It's a relatively common problem, when something goes wrong in a SAN, for ext3 to detect the disk write errors and remount the filesystem read-only. That's all well and good, but once the SAN is fixed I can't figure out how to re-remount the filesystem read-write without rebooting.

I have tried all sorts of different mount/tune2fs/dmsetup commands and I cannot figure out how to get it to un-flag the block device as write-protected. Rebooting will fix it, but I'd much rather do it online. An hour of googling has gotten me nowhere either. Save me, ServerFault.
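
For anyone landing here with the same symptom, a quick way to confirm which filesystems the kernel currently has mounted read-only is to read the options column of /proc/mounts. The sketch below is only an illustration: the function name list_ro_mounts is made up for this example, and the optional file argument exists purely so it can be tried against a saved copy of the mount table.

```shell
#!/bin/sh
# list_ro_mounts [FILE]: print the mount point of every entry whose
# option list (4th field) contains the exact option "ro".
# FILE defaults to /proc/mounts; the function name is hypothetical.
list_ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}
```

Calling list_ro_mounts with no argument reads the live mount table. Options such as errors=remount-ro are not matched, because only an exact "ro" entry counts.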

Hmm, a couple of questions. "Its a relatively common problem when something goes wrong in a SAN": why is your SAN so unreliable? I'd check that out first. Have you tried just unmounting with umount and then mounting it again? Is there a good reason why you need to do a remount? I usually only need to remount my root filesystems after maintenance.
– The Unix Janitor, Mar 18 '10 at 23:00

umount bounces on open file handles, which are often from processes you'd much rather have exit sanely.
– cagenut, Mar 19 '10 at 15:47

I have a similar issue where, after a SAN problem, VM disks are read-only and attempting to remount causes the same error as in the OP. The VMs are on ESXi 4.1 with fibre channel storage. Rebooting the VM fixes the problem. I don't personally think that this has anything to do with multipath. Surely there must be a way to fix it without rebooting, especially since some services (apache) tend to keep running on a read-only FS.
– Will, Dec 26 '12 at 21:22

I came here looking for a solution to my own problem (which is different, a corrupt disk). I smiled instead. +1 for "The hell you are"
– user1207217, Mar 23 '13 at 8:51

I have the exact same issue as this, but I'm using LVM. Same lvdisplay would give me a "read failed after 0 of 4096 at 449197309952: Input/output error" until I did a "multipath -r", then LVM started displaying everything right without errors. I still can't get the partition to remount, though. Can't unmount either, says device is busy. If I shut down all processes using the device, I can unmount and then remount successfully, but I'd prefer just being able to remount the device read-write, as I should be able to...
– mpontes, May 15 '13 at 10:06

I know FreeBSD, not Linux. But for FreeBSD it's mount -rw /mnt/foo, so this one looks the most right to me.
– Chris S♦, Mar 19 '10 at 0:27

I have never had this work in the scenario outlined in the question. Once the disk is marked read-only due to errors, it has always taken a reboot for me.
– Alex, Mar 19 '10 at 0:52

I'll edit this into the OP, but Alex is right here; the problem appears to be below the filesystem:
[root@localhost ~]# mount -o remount,rw /mnt/foo
mount: block device /dev/mapper/mpath0 is write-protected, mounting read-only
– cagenut, Mar 19 '10 at 15:37
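
The error above suggests the read-only flag sits on the device-mapper device rather than in ext3 itself. One sequence that could be tried in that situation is: reload the multipath maps (the multipath -r another commenter mentions), clear the block device's read-only flag with blockdev --setrw, then retry the remount. The sketch below only prints the commands by default (DRY_RUN=1). The helper names run and recover_rw are hypothetical, and there is no guarantee this clears the flag on any given kernel, driver, or array combination.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN is 1 (the default),
# otherwise execute it. Hypothetical helper for this sketch.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recover_rw DEVICE MOUNTPOINT: attempt an online return to read-write.
recover_rw() {
    dev=$1
    mnt=$2
    run multipath -r                  # force a reload of the multipath maps
    run blockdev --setrw "$dev"       # clear the read-only flag on the block device
    run mount -o remount,rw "$mnt"    # ask the filesystem to go read-write again
}
```

For example, recover_rw /dev/mapper/mpath0 /mnt/foo prints the three commands for review; setting DRY_RUN=0 and running as root would execute them.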

Have you tried unmounting the partition and remounting it? I've had data errors before with a drive, and unmounting (or remount,rw) has fixed it for me. This was with SATA drives (and older EIDE/SCSI). However, in your situation, I am wondering if the issue is that the drive channel needs to be reset. I'm wondering whether HDIO_DRIVE_RESET could be sent through an ioctl somehow. blockdev can be used to force a reread of the partition table, which might do it. IDE exposes the reset with hdparm -w; perhaps with your FC drives you've got a way to send the ioctl to the channel.
– deleted, Mar 20 '10 at 0:38
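
Before reaching for resets on FC/SCSI-backed disks, it may be worth looking at the per-device state the kernel exposes in sysfs (/sys/block/<disk>/device/state, typically "running" or "offline"). The following is a sketch under that assumption about the sysfs layout. scsi_states is a hypothetical helper, and its root argument exists only so it can be pointed at a test directory.

```shell
#!/bin/sh
# scsi_states [ROOT]: print "<disk> <state>" for every block device that
# exposes a SCSI state file. ROOT defaults to /sys. Hypothetical helper.
scsi_states() {
    root=${1:-/sys}
    for f in "$root"/block/*/device/state; do
        [ -r "$f" ] || continue           # skip devices without a state file
        disk=${f#"$root"/block/}          # strip the leading path
        disk=${disk%%/*}                  # keep only the disk name
        printf '%s %s\n' "$disk" "$(cat "$f")"
    done
}
```

The partition-table reread mentioned above corresponds to blockdev --rereadpt <device>. Whether either step helps with a write-protected multipath device is not something this sketch can promise.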

I am a fan of preventing the issue in the first place. Most enterprise UNIX boxes will retry filesystem operations more or less forever. You as an administrator need to do some homework before tuning your MPIO configuration. If your application should wait until the device returns to a usable state, then here is a solution. In your /etc/multipath.conf, make sure that the device type you care about has "no_path_retry" set to "queue". Setting this will cause failed I/Os to queue until there is a valid path. We have done this for our EMC Symmetrix/DMX boxes to work around hiccups under certain conditions (drive/controller/SRDF path failures and recovery). When you want to fail the device manually during a failure it gets more complicated, as you will need to use tools like dmsetup to flush/fail I/Os, or temporarily change the multipath.conf file and rescan devices, etc.
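
As a rough illustration of the setting described above, a device stanza in /etc/multipath.conf might look like the fragment below. The vendor and product strings are assumptions for the example and must match what your array actually reports; only no_path_retry queue comes from the description above.

```
devices {
    device {
        # vendor/product are placeholders; match them to your array
        vendor         "EMC"
        product        "SYMMETRIX"
        # queue failed I/O until a path comes back instead of erroring out
        no_path_retry  queue
    }
}
```

The trade-off is that with queueing enabled, processes doing I/O to the device block for as long as every path stays down.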

This approach has saved our bacon countless times and is our standard for hundreds of boxes on a multicabinet/multivendor SAN with replication for disaster recovery.

Yep, it's not that exact bug, since I'm running much newer versions than those they reference, but all sorts of similar situations can cause it. The world of fibre channel, HBAs/HBA firmware/HBA drivers, array firmware, switch firmware, fabric design, device-mapper/multipathd config, LVM, and ext3 is just plain a lot of moving parts. Work on enough environments and you'll see this scenario caused by a grab bag of similar but not identical problems. The question at hand is how to recover/remount without rebooting.
– cagenut, Mar 19 '10 at 17:35

Yes, I got that. The question is how to change it.
– cagenut, Apr 8 '10 at 15:53

@cagenut: No, that wasn't what you asked. That may be what you should have asked, though.
– Teddy, Apr 8 '10 at 17:18

Right there in the post, Teddy: "I cannot figure out how to get it to un-flag the block device as write-protected."
– cagenut, Apr 8 '10 at 22:30

What a pedantic and worthless answer. The question displays the problem accurately and has all the information available, yet you took the time to write a non-answer. Would downvote if I had enough reputation on serverfault.
– mpontes, May 15 '13 at 9:16

This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post.
– HopelessN00b♦, Jan 22 '14 at 12:39

Linux simply doesn't cope well enough with medium-to-large-scale SANs.
You MUST give it some care and fine-tune the IO timeouts and multipath timeout handling; they're all pretty much at desktop-ready defaults.

Default disk IO timeout of 30 seconds? The above thread? The note from Red Hat (outdated as it may be) stating they cannot handle a "State change notification" gracefully, the way it was intended? That Red Hat by default put the multipath bindings in a location (/var/lib) that would not be accessible at the load time of the multipath driver? That you can't recursively hot-disable a PCI hotplug HBA and temporarily, automagically take all dependent LUNs offline until it has been replaced? That it has no multithreaded HW init and takes "a while" to come up with >1k LUNs? Udev, being a shell script...
– darkfader, Oct 17 '11 at 15:18
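
To see where the 30-second disk timeout mentioned above still applies, the per-device SCSI command timeout can be read from sysfs (/sys/block/<disk>/device/timeout, in seconds). A sketch under that assumption: short_timeouts is a hypothetical helper, the threshold is whatever you pass in, and the root argument exists only for testing.

```shell
#!/bin/sh
# short_timeouts MIN [ROOT]: print "<disk> <seconds>" for every disk whose
# SCSI command timeout is below MIN seconds. ROOT defaults to /sys.
# Hypothetical helper for illustration.
short_timeouts() {
    min=$1
    root=${2:-/sys}
    for f in "$root"/block/*/device/timeout; do
        [ -r "$f" ] || continue           # skip devices without a timeout file
        secs=$(cat "$f")
        disk=${f#"$root"/block/}          # strip the leading path
        disk=${disk%%/*}                  # keep only the disk name
        [ "$secs" -lt "$min" ] && printf '%s %s\n' "$disk" "$secs"
    done
    return 0
}
```

Raising a value is a matter of writing a new number of seconds to the same file as root (for example, echo 60 > /sys/block/sdX/device/timeout, where 60 is only an illustrative figure). The change does not persist across reboots unless it is reapplied, for instance from a udev rule.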