mmcrfs is failing with "No such device", and "Error accessing disks"

‏2013-01-30T19:15:47Z

Hey all, I'm pretty new to GPFS, so bear with me.

I'm trying to set up a two-node cluster (both quorum nodes). Both machines have the OS installed on sda and a free disk on sdb, so pretty simple, right? I've been following guides such as http://www.ibm.com/developerworks/wikis/display/hpccentral/gpfs+quick+start+guide+for+linux and everything goes well right up until I need to run mmcrfs. Both machines have passwordless ssh access set up, and I've even tried chmod'ing the block device under /dev (I don't know whether that was a good idea or not).

Anyway, here is any output that may help. I've been at this for about two days with no luck, so anything is a huge help. I hope it's just some simple issue.

That's all I can think of. If anything else is needed, please let me know.


Re: mmcrfs is failing with "No such device", and "Error accessing disks"

Sorry, I overlooked this in the original post: it looks like you don't have NSD servers defined for either NSD, which GPFS takes to mean that both disks are directly visible on both nodes (as would be the case if you had a SAN). Use mmchnsd, or mmdelnsd followed by mmcrnsd, to redefine the NSDs with a primary server.

I'm not quite sure what you mean. Here is some output from mmlscluster; it says ITE5 is the primary node, with ITE6 being a secondary node. Is that different from a primary NSD server? If I do have a primary and secondary set up, is something wrong in diskdef.txt, maybe?

The primary/secondary configuration server nodes listed by mmlscluster are entirely unrelated to NSD server definitions. You need to specify an NSD server when creating an NSD for a disk device that is visible from some but not all nodes, using the second field of the NSD descriptor. Please see the mmcrnsd man page.
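For illustration, a descriptor file along these lines puts the server name in the second field. This is only a sketch with the node and disk names from this thread; the disk-usage and failure-group values are assumptions, not taken from the original diskdef.txt:

```shell
# Hypothetical diskdef.txt for this cluster. Colon-separated descriptor format:
# DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName
cat > diskdef.txt <<'EOF'
/dev/sdb:ITE5::dataAndMetadata:1:gpfs3nsd
/dev/sdb:ITE6::dataAndMetadata:2:gpfs4nsd
EOF
# Then, on one node: mmcrnsd -F diskdef.txt   (mmcrnsd rewrites the file)
```

With the server field filled in, GPFS knows which node actually owns each /dev/sdb instead of assuming both disks are visible everywhere.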



Re: mmcrfs is failing with "No such device", and "Error accessing disks"

Thanks for your help, everyone, but I still seem to be hitting the same issue. After running
<pre class="jive-pre">
mmchnsd "gpfs3nsd:ITE5;gpfs4nsd:ITE6"
</pre>
as truongv suggested, I tried to recreate the file system but am receiving the same error. My first thought was that the diskdef.txt file might be wrong (since some config had changed), but even with the NSD servers in diskdef.txt it's still trying to open gpfs4nsd on ITE5, which is obviously impossible.

Here is the output of mmlsnsd after the NSDs have been given appropriate servers:
<pre class="jive-pre">
 File system   Disk name    NSD servers
---------------------------------------------------------------------------
 (free disk)   gpfs3nsd     ITE5
 (free disk)   gpfs4nsd     ITE6
</pre>

What I think is weird is that the log seems to complain that no such device "gpfs1" exists, when the man page for mmcrfs specifically says that the device must not already exist under /dev. Maybe I'm reading it wrong.
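For what it's worth, the device argument works the other way around from a regular device file. A quick sketch (device and file names from this thread; the mount point and mmcrfs options are assumptions):

```shell
# "gpfs1" is the device name mmcrfs will CREATE under /dev, so it must not
# exist before the command runs.
test ! -e /dev/gpfs1 && echo "ok: /dev/gpfs1 does not exist yet"
# Then, for example: mmcrfs /gpfs gpfs1 -F diskdef.txt -A yes
```

So the "No such device" in the log refers to the underlying NSDs it cannot reach, not to /dev/gpfs1 itself.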


Re: mmcrfs is failing with "No such device", and "Error accessing disks"

I would try:
- dd-read the raw device /dev/sdb to make sure you can see the disk
- create the file system on just one disk and see which one is good/bad
- run each command on the respective NSD server to see if it makes any difference

Do you have a /var/mmfs/etc/nsddevices user exit script?
Can you show the output of mmdevdiscover on both nodes? Also the /var/mmfs/gen/mmsdrfs file.
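A minimal sketch of the first check (device name from this thread; run it on each node as root, and adjust the device path if needed):

```shell
# Raw-read sanity check: if dd cannot read the device, GPFS cannot use it either.
DEV=/dev/sdb
if [ -b "$DEV" ]; then
  dd if="$DEV" of=/dev/null bs=1M count=10
else
  echo "$DEV is not a block device on this node"
fi
# An nsddevices user exit, if present, overrides GPFS's device discovery:
ls -l /var/mmfs/etc/nsddevices 2>/dev/null || echo "no nsddevices user exit"
```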


The NSD id of gpfs4nsd doesn't look right. The first part should contain an IP address instead of zeros.
<pre class="jive-pre">
gpfs4nsd 000000005106F8A6 /dev/sdb ITE6 server node
</pre>
However, this doesn't explain why ITE5 can't access ITE6's NSD disk, gpfs4nsd. I suggest you start everything from scratch, but follow the book exactly this time. Make sure your hostnames resolve; check /etc/hosts or DNS on all nodes. Since you can't obtain a lock, make sure you don't have any issue with ssh/rsh.