Troubleshooting tips for NetWare SSH and SFTP

Environment

Situation

This document attempts to give a starting point for troubleshooting problems connecting to or authenticating to NetWare SSH (including SFTP and SCP). It may be expanded from time to time.

Resolution

Enable the Highest Debug Logging

If users cannot connect or authenticate through the OpenSSH suite (offered through SSHD.NLM), edit the /etc/ssh/sshd_config file. Find the LogLevel parameter and set it to:

LogLevel DEBUG3

To put this change in place, SSHD.NLM must be unloaded and reloaded.

Obtain an sshd.log of the problem occurring.

After reproducing the problem, obtain the sys:etc/ssh/logs/sshd.log file. (Older events may have been archived in sshd1.log, sshd2.log, etc.) There will be extensive areas of the logs which are of little concern. Knowing what to look for for certain important events is very useful. Common areas of concern in sshd.log are discussed below. When looking over a log, it will help to scroll through and find these sections, or do an electronic search for the syntax involved. By looking at these areas, the nature of the problem can often be ascertained. For example, it might be seen that the user object was not found; or that the server and/or volume the user is trying to reach was not found during the applicable searches.

Identifying your session:

One of the easiest ways to find your session in a log is to search on the user name and/or IP address in use by the client. This will yield such results as:

The number in <brackets> after "Session" or "Shell" is the session number, also known as the thread number. Activities that are general and don't necessarily relate to one session will show session 0, for example:

17 Nov - 02:51:42[0001670975] (aa6aa520)SSHD Session <0>

Individual sessions will get numbered 1 and higher, and the system will increment the session number for each session that comes in. More than one session can be going on at once, so be careful to pay attention to the one you are attempting to follow.

Searching for the user in eDirectory:

In the log messages, a normal, successful ldap search for a user will take the following form:

- The first line shows that the user name being searched on in the above log information is "sftpu1" and that SSHD is searching for a object whose "uniqueid" attribute matches that name. It is possible that some user objects may not have this attribute, depending upon the tool which created the user. Older tools like NWADMN32 did not create that attribute.

- The second line shows the searchBase (the location in the tree where the search is starting) as well as the scope, which will be either "one level" for a simple context search, or "subtree" to include all sub-containers. These parameters are controlled by the "eDirNameContext" settings in sys:\etc\ssh\sshd_config. If more than one line with that setting is present in the config file, multiple debug lines will occur. If a user is not found, it is typically because the correct context is not being searched.

- The third and fourth lines show that the search had a successful result. "attribute = uniqueid" was found where "value = sftpu1". If these lines do not appear, then the search was not successful. No "error" message will be shown. The absence of these "attribute" and "value" lines would be the main clue that no result was found.

- If the search fails, then it is possible that a new, almost identical search will be done where SSHD searches on "cn" rather than "uniqueid". This can help if the user is missing the "uniqueid" attribute. However, in a default tree and default SSHD configuration, it is common that the ldap anonymous bind does not have enough rights to browse "cn", so this search would typically fail also.

- Shortly after a user is found, additional lines will show whether the password was correctly validated, and what home directory was read out of eDir. For example:

Similarly to user searches, this shows the name being searched, the object type being filtered against, and the searchBase and scope of the search. (See the above "Searching for users" section for more details.) The presence of the "attribute", "value" and "returned" messages shows the success. This search is also controlled by the "eDirNameContext" settings mentioned above.

Then a broader server search will be done, to identify all NCP Servers in the eDirNameContext search areas. It will look like:

Multiple servers may be found, so portions of this pattern may repeat. The message "SERVER1 is not existing" simply means that when SSHD.NLM went to add the server name to it's list, the name was not already in it's list. This is normal.

Note: If a certain server is not found in this search, users will not be able to navigate to that server, or be placed in a home directory on that server. If a server is not found, it is typically because it's context is not being covered by the "eDirNameContext" setting(s) in sshd_config.

Searching for volumes in eDirectory

After the server searches are done, a volume search will be done, and will look like this:

Note: If a certain volume is not found, users will not be able to navigate to that volume, or be placed in a home directory on that volume. If a volume is not found, it is typically because it's context is not being covered by the "eDirNameContext" setting(s) in sshd_config.

Reaching a user's home server or home directory:

For a user to reach their home directory, the home server and home volume must be found in the ldap searches described above. If a user is not reaching a their home directory or a location to which they are trying to 'cd', review the debug search output and see if the server and volume in question were found in the initial searches. Note: SSHD does not search for servers and volume upon user request. It builds a server and volume list after the initial login, and then will use that static list during the rest of the user's session.

In addition to *finding* the server and volume in question, SSHD must be able to create an identity token and path context to the server and volume in question. These can be found by searching for mention of calls to "create_identity" and "NXCreatePathContext". These functions are provided by LIBC.NLM and can be a source of errors. The "call" being made will appear, and soon after, the resulting "return" will occur. Both will show the name of the function, so they should be easy to find. Just be sure to look at the correct session/thread number (see "Identifying Your Session" above). If an error occurred in one of these APIs, part of the output will list 4 error codes, in this format:

rc: 117, errno 7, h_errno: 10110, clienterrno: 0

The first code (rc) is usually the return code from LIBC itself. LIBC codes can be found in KB 10100304 (here). Occasionally, the rc: is an eDirectory error code. eDir error codes can be referenced in KB 10010902 (here). That document was original written for older versions of DS but the information still applies.

The second code (errno) is not generally useful. It typically will report "7" both in cases of success and failure.

The third code (h_errno) may or may not be useful. A code in the 10000 range is typically a Winsock error. For example, if LIBC received an error from Winsock (communication, host name resolution, etc) it will be shown here. Sometimes a value of 997 is shown here. The source of "997" has not been determined by the author of this document, but it occurs in enough different conditions that it is considered too generic, and not helpful. Sometimes a NCP or eDir error appear here. Also, the first (LIBC) error code, when looked up in the LIBC error codes document, might instruct you to check this code for clarification.

The fourth code (clienterrno) will often contain an eDirectory error code. eDir error codes can be referenced in KB 10010902 (here). That document was original written for older versions of DS but the information still applies.

The create_identity call will typically contain the fully distinguished user name. The NXCreatePathContext call will typically contain the server name and volume name where the user's "home" has been determined. These are good values to double check.

Between these 2 calls, as well as after the second call, messages containing home directory information will typically be found. This can help you determine whether the system is, for some reason, deciding that it needs to alter ("fix up") the target destination. If the user is having a hard time reaching the right directory, those lines may give clues. If unexpected alterations of "fix up" is occurring to the home directory information, then some of the settings in sys:/etc/ssh/sshd_config may be coming into play. Check that file for these settings, and read the comments next to them to understand their impact: