Troubleshooting Sidekiq

Sidekiq is the background job processor GitLab uses to run tasks
asynchronously. When things go wrong, it can be difficult to troubleshoot.
These situations also tend to be high-pressure because a production system's
job queue may be filling up. Users will notice when this happens because new
branches may not show up and merge requests may not be updated. The following
troubleshooting steps will help you diagnose the bottleneck.

Note: GitLab administrators and users should consider working through these
debugging steps with GitLab Support so the backtraces can be analyzed by our
team. They may reveal a bug or a necessary improvement in GitLab.

Note: In any of the backtraces, be wary of suspecting cases where every
thread appears to be waiting in the database, Redis, or waiting to acquire
a mutex. This may mean there's contention in the database, for example,
but look for one thread that is different from the rest. This other thread
may be using all available CPU, or holding the Ruby global interpreter lock,
preventing other threads from continuing.

Thread dump

Send the TTIN signal to the Sidekiq process and it will output thread
backtraces in the log file.

kill -TTIN <sidekiq_pid>
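
If you don't already know the Sidekiq process ID, one way to find it (an
assumption; the exact command depends on your installation) is with pgrep:

pgrep -f sidekiq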

Check in /var/log/gitlab/sidekiq/current or $GITLAB_HOME/log/sidekiq.log for
the backtrace output. The backtraces will be lengthy and generally start with
several WARN level messages. Here's an example of a single thread's backtrace:
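
The exact format depends on the Sidekiq version; as an illustrative sketch
(hypothetical timestamps, IDs, and file paths, not output from a real
system), each stack frame appears as one WARN line:

2023-01-01T00:00:00.000Z pid=3517 tid=abc123 WARN: /opt/gitlab/embedded/service/gitlab-rails/app/workers/post_receive.rb:15:in 'perform'
2023-01-01T00:00:00.000Z pid=3517 tid=abc123 WARN: /opt/gitlab/embedded/service/gem/ruby/3.0.0/gems/sidekiq-6.5.7/lib/sidekiq/processor.rb:202:in 'execute_job'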

In some cases Sidekiq may be hung and unable to respond to the TTIN signal.
Move on to other troubleshooting methods if this happens.

Process profiling with perf

Linux has a process profiling tool called perf that is helpful when a certain
process is eating up a lot of CPU. If you see high CPU usage and Sidekiq won't
respond to the TTIN signal, this is a good next step.

If perf is not installed on your system, install it with apt-get or yum:
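
Package names vary by distribution; as a sketch (verify the package name
against your distribution's repositories):

# Ubuntu
sudo apt-get install linux-tools-common linux-tools-generic

# RHEL/CentOS
sudo yum install perf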

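To capture a profile, attach perf to the Sidekiq process, let it run for
30-60 seconds, stop it with Ctrl-C, and then view the report:

sudo perf record -p <sidekiq_pid>
sudo perf report

The report might then look roughly like this (an illustrative sketch, not
output from a real system, using the Nokogiri case described below):

Samples: 348K of event 'cycles'
 97.33%  ruby  nokogiri.so  [.] xmlXPathNodeSetMergeAndClear
  0.22%  ruby  libruby.so   [.] vm_exec_core
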
Above you see sample output from a perf report. It shows that 97% of the CPU is
being spent inside Nokogiri and xmlXPathNodeSetMergeAndClear. For something
this obvious, the next step is to investigate which GitLab job would use
Nokogiri and XPath. Combine this with TTIN or gdb output to find the
corresponding Ruby code.

The GNU Project Debugger (gdb)

gdb can be another effective tool for debugging Sidekiq. It gives you a more
interactive way to look at each thread and see what's causing problems.

Note: Attaching to a process with gdb suspends the normal operation
of the process (Sidekiq does not process jobs while gdb is attached).
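
To attach, pass the Sidekiq process ID to gdb and then inspect the threads.
A sketch of a typical session (<sidekiq_pid> stands in for your actual
Sidekiq process ID):

gdb -p <sidekiq_pid>

(gdb) info threads            # list all threads
(gdb) thread 2                # switch to thread 2
(gdb) bt                      # backtrace of the current thread
(gdb) thread apply all bt     # backtraces of every thread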

Once you're done debugging with gdb, be sure to detach from the process and
exit:

detach
exit

Check for blocking queries

Sometimes Sidekiq processes jobs so quickly that it causes database
contention. Check for blocking queries when the backtraces above show that
many threads are stuck in the database adapter.

The PostgreSQL wiki has details on the query you can run to see blocking
queries; the query differs based on PostgreSQL version. See the wiki's
Lock Monitoring page for the query details.
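
For example, on PostgreSQL 9.6 and later, a minimal query (a sketch built on
the pg_blocking_pids() function, not the wiki's exact query) lists sessions
that are currently blocked and the PIDs blocking them:

SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;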