Troubleshooting

The most common errors when bringing up a store are typos and
misconfiguration. You can also run into network port
conflicts, especially if a previous deployment failed and you
are starting over. In that case, be sure to remove all partial
store data and configuration, and kill any leftover processes.
Processes associated with a store, as reported by "jps -m",
are one of the following:

StorageNodeAgentImpl

ManagedService

If you kill the StorageNodeAgentImpl, it should also kill its
managed processes.
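As a sketch, the Storage Node Agent's process ID can be located from the
shell. The "jps -m" output below is invented so the pipeline can run without
a live store; on a real host, pipe the actual "jps -m" output instead:

```shell
# Hypothetical "jps -m" output, standing in for a live host; the PIDs and
# -root argument are made up, the class names come from the text above.
sample_jps_output='12345 StorageNodeAgentImpl -root /var/kvroot
12346 ManagedService -root /var/kvroot
12347 ManagedService -root /var/kvroot'

# Find the Storage Node Agent's PID; killing it also brings down the
# ManagedService processes it controls, as noted above.
sna_pid=$(printf '%s\n' "$sample_jps_output" | awk '/StorageNodeAgentImpl/ {print $1}')
echo "$sna_pid"
# On a live host: kill "$sna_pid"
```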

You can use the monitoring tab in the Admin Console to look at
various log files.

There are detailed log files available in
KVROOT/storename/log as well as logs of the
bootstrap process in KVROOT/*.log. The
bootstrap logs are most useful in diagnosing initial startup
problems. The logs in storename/log appear
once the store has been configured. The logs on the host
chosen for the admin process are the most detailed and include
a store-wide consolidated log file:
KVROOT/storename/log/storename_*.log

Each line in a log file is prefixed with the date of the message,
its severity, and the name of the component that issued it.

When looking for more context for events at a given time, use
the timestamp and component name to narrow down the section of
log to peruse.

Error messages in the logs are tagged with "SEVERE", so you
can grep for that string when troubleshooting. SEVERE error
messages are also displayed in the Admin's Topology tab, in the
output of the CLI's show events command, and
in the output of the ping command.
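For instance, a quick scan for errors might look like the following. The
scratch log file and its line layout are invented for illustration (the real
format may differ), but the grep options are standard:

```shell
# Build a scratch log file so the sketch runs anywhere; the timestamp/
# severity/component layout below is a made-up stand-in for the real format.
log=$(mktemp)
cat > "$log" <<'EOF'
2013-04-01 10:15:02 INFO [rg1-rn1] Replication node started
2013-04-01 10:15:07 SEVERE [rg1-rn2] Unable to contact master
2013-04-01 10:15:09 INFO [admin1] Plan finished
EOF

# Show error messages, with one line of surrounding context for each hit:
grep -C 1 SEVERE "$log"

severe_count=$(grep -c SEVERE "$log")
rm -f "$log"
echo "$severe_count"
```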

In addition to log files, these directories may also contain
*.perf files, which are performance files for the Replication
Nodes.

Where to Find Error Information

As your store operates, you can discover information about
any problems that may be occurring by looking at the plan
history and by looking at error logs.

The plan history indicates if any configuration or
operational actions you attempted to take against the store
encountered problems. This information is available as the
plan executes and finishes. Errors are reported in the plan
history each time an attempt to run the plan fails. The plan
history can be seen using the CLI show plan
command, or in the Admin's Plan History
tab.
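For example, the plan history could be inspected from the Administration CLI.
The host, port, and install path below are placeholders, and the sketch only
echoes the command (assuming the usual runadmin entry point) so it can run
anywhere; remove the echo to actually launch the CLI:

```shell
# Placeholder values; substitute your own deployment's details.
KVHOME=/opt/kv
ADMIN_HOST=node01
ADMIN_PORT=5000

# Launch the Administration CLI, then use "show plan" at the resulting
# prompt to review the plan history.
cmd="java -jar $KVHOME/lib/kvstore.jar runadmin -host $ADMIN_HOST -port $ADMIN_PORT"
echo "$cmd"
```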

Other problems may occur asynchronously. You can learn about
unexpected failures, service downtime, and performance issues
through the Admin's critical events display in the Logs tab,
or through the CLI's show events command.
Events come with a time stamp, and the description may
contain enough information to diagnose the issue. In other
cases, more context may be needed, and the administrator may
want to see what else happened around that time.

The store-wide log consolidates logging output from all
services. Browsing this file might give you a more complete
view of activity during the problem period. It can be viewed
using the Admin's Logs tab, by using the CLI's
logtail command, or by directly viewing
the <storename>_N.log file in the
<KVROOT>/<storename>/log directory. It is also
possible to download the store-wide log file using the Admin's
Logs tab.
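A minimal sketch of reading the consolidated file directly: it builds a
scratch copy of the directory layout so it runs anywhere, with the store
name and root as placeholders; on a live admin host, point KVROOT at the
actual store root and drop the setup lines:

```shell
# Scratch layout standing in for a real store root.
KVROOT=$(mktemp -d)
STORENAME=mystore
mkdir -p "$KVROOT/$STORENAME/log"
echo "consolidated entry" > "$KVROOT/$STORENAME/log/${STORENAME}_0.log"

# The consolidated log rotates, so match every generation with a glob and
# read the most recent lines (use "tail -f" instead for live monitoring):
last=$(cat "$KVROOT/$STORENAME/log/${STORENAME}"_*.log | tail -n 1)
echo "$last"
rm -rf "$KVROOT"
```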

Service States

Oracle NoSQL Database uses three different types of services, all of which
should be running correctly in order for your store to be
in a healthy state. The three service types are the Admin,
Storage Nodes, and Replication Nodes. You should have
multiple instances of these services running throughout
your store.

Each service has a status that can be viewed using any of
the following:

The Topology tab in the Admin Console

The show topology command in the
Administration CLI.

The ping command.

The status values can be one of the following:

STARTING

The service is coming up.

RUNNING

The service is running normally.

STOPPING

The service is stopping. This may take some time as
some services can be involved in time-consuming
activities when they are asked to stop.

WAITING_FOR_DEPLOY

The service is waiting for commands or
acknowledgments from other services during its
startup processing. If it is a Storage Node, it is
waiting for the initial deploy-SN command. Other
services should transition out of this phase
without any administrative intervention from the
user.

STOPPED

The service was stopped intentionally and cleanly.

ERROR_RESTARTING

The service is in an error state. Oracle NoSQL Database attempts
to restart the service.

ERROR_NO_RESTART

The service is in an error state and is not automatically
restarted. Administrative intervention is required.

UNREACHABLE

The service is not reachable by the Admin. If the status
was seen using a command issued by the Admin, this
state may mask a STOPPED or ERROR state.

A healthy service begins with STARTING.
It may transition to WAITING_FOR_DEPLOY
for a short period before going on to
RUNNING.
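The statuses above can be checked mechanically. In this sketch the service
names and line layout are invented stand-ins for live status output, but
the status keywords are the documented ones:

```shell
# Hypothetical per-service status lines; on a live system this information
# would come from ping or show topology, in whatever format they emit.
statuses='sn1 RUNNING
rg1-rn1 RUNNING
rg1-rn2 ERROR_RESTARTING
admin1 UNREACHABLE'

# Flag anything not running normally:
problems=$(printf '%s\n' "$statuses" | grep -v ' RUNNING$')
echo "$problems"
```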

ERROR_RESTARTING and
ERROR_NO_RESTART indicate that there has
been a problem that should be investigated. An
UNREACHABLE service may be in that
state only temporarily; if the state persists, the
service may actually be in an
ERROR_RESTARTING or
ERROR_NO_RESTART state.

Note that the Admin's Topology tab only shows abnormal
service statuses. A service that is
RUNNING does not display its status in
that tab.

Useful Commands

The following commands may be useful to you when
troubleshooting your KVStore: