ABRT Backtrace Deduplication Service

Summary

The backtrace deduplication service addresses the many duplicate crash reports that ABRT submits to Red Hat Bugzilla. It helps ABRT users find duplicate reports before filing a new bug, and it helps package maintainers triage, reassign, and merge bugs that are already reported.

Current status

Detailed Description

The backtrace deduplication server is a collection of newly developed tools that will be deployed on the ABRT Retrace Server hardware, which is part of the Fedora infrastructure. ABRT will contain a client tool and integration with the server.

Benefit to Fedora

Red Hat Bugzilla receives many duplicate crash reports from ABRT clients, even for a single component. This makes ABRT reports less useful and causes developers to give them lower priority. Red Hat Bugzilla also receives many low-quality reports, which should be closed without intervention from maintainers. For example, the simple-scan component is badly affected by low-quality ABRT reports: many of its bug reports are duplicates, and some incorrectly show __libc_message and similar functions as the crash function.

Red Hat Bugzilla also contains multiple crash reports filed on end-user applications that are caused by a single bug in a library. These crash reports are then analyzed repeatedly by different developers, which wastes their time.

Scope

Backtrace deduplication service for C/C++ backtraces, which
takes a backtrace and component, and checks it against
backtraces in Bugzilla from all related components (the
components of libraries used by the crashed binary)

name: faf-btserver-find-duplicates

HTTP interface to the backtrace deduplication service,
implemented as a CGI script

name: faf-btserver-cgi

must contain a machine interface (plain text)

must contain a human interface (HTML)

distinguishes between the two by reading the HTTP_ACCEPT
environment variable

Apache configuration file to activate the CGI script

requires a backtrace, component name, and operating system
version from the user

responds with a list of bug ids, bug components, operating
system versions, and similarities:

625354 glib2 14 94%
688952 glib2 15 94%
654789 emacs 14 92%
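
The two interfaces described above can be sketched roughly as follows. This is only an illustration, not the faf-btserver-cgi implementation: the function names (wants_html, render_plain, render_html, respond) and the tuple layout of a match are assumptions made for the example, and a real script would parse the Accept header more carefully than a substring test.

```python
import html


def wants_html(environ):
    """Return True when the HTTP_ACCEPT variable asks for HTML (simplified check)."""
    return "text/html" in environ.get("HTTP_ACCEPT", "")


def render_plain(matches):
    """Machine interface: one match per line, as in the listing above."""
    return "\n".join(
        f"{bug_id} {component} {os_version} {similarity}%"
        for bug_id, component, os_version, similarity in matches
    )


def render_html(matches):
    """Human interface: the same data as a minimal HTML table."""
    rows = []
    for bug_id, component, os_version, similarity in matches:
        cells = (bug_id, component, os_version, f"{similarity}%")
        rows.append(
            "<tr>"
            + "".join(f"<td>{html.escape(str(c))}</td>" for c in cells)
            + "</tr>"
        )
    return "<table>" + "".join(rows) + "</table>"


def respond(environ, matches):
    """Pick the content type and body based on the CGI environment."""
    if wants_html(environ):
        return "text/html", render_html(matches)
    return "text/plain", render_plain(matches)
```

In a CGI script, environ would be os.environ, and the returned content type would be written as the Content-Type header before the body.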

Crash report cleanup service, which merges crashes that are
already reported in Bugzilla. It also finds low-quality reports
and duplicates, and closes or reassigns them. The implementation consists of four scripts:

faf-btserver-cluster

The merging is done on a component level, where similar
bugs from the same component are merged, and also on a
cross-component level, where bugs from applications are
matched to those of their library dependencies, and bugs
in libraries are detected by searching duplicates between
components with shared dependencies.

Achieve the right balance between blaming the application
and blaming the library. For example, many applications
crash in a strcmp call, but we can
reasonably assume there is no bug
in strcmp.

Compute distances and similarity indices between a bug
(its backtrace) and all relevant bugs

Compute backtrace quality

Store the computed data in a bug report

The number of crash combinations to check is
huge. Optimizations might be needed to limit checks to
backtraces that have the same library calls on the stack.
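
To make the distance computation and the suggested optimization more concrete, here is a minimal sketch. It assumes a backtrace is reduced to a list of function names and uses plain Levenshtein edit distance as the metric; the page does not say which metric faf-btserver-cluster uses, so that choice, the function names, and the data layout are all assumptions. The prefilter pairs up bugs that share any frame, which is a simplification of "the same library calls on the stack".

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of function names."""
    prev = list(range(len(b) + 1))
    for i, frame_a in enumerate(a, start=1):
        cur = [i]
        for j, frame_b in enumerate(b, start=1):
            cost = 0 if frame_a == frame_b else 1
            cur.append(min(prev[j] + 1,          # delete a frame
                           cur[j - 1] + 1,       # insert a frame
                           prev[j - 1] + cost))  # keep or substitute
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Similarity index in [0, 1]; 1.0 means identical frame sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def candidate_pairs(backtraces):
    """Yield (bug_a, bug_b) pairs whose backtraces share at least one frame.

    backtraces maps a bug id to its list of function names. Pairs with no
    common frame are never compared, which avoids checking every combination.
    """
    by_function = {}
    for bug_id, frames in backtraces.items():
        for name in set(frames):
            by_function.setdefault(name, set()).add(bug_id)
    seen = set()
    for bugs in by_function.values():
        ordered = sorted(bugs)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                if (first, second) not in seen:
                    seen.add((first, second))
                    yield first, second
```

Only the pairs produced by candidate_pairs would then be passed to similarity, and the resulting index stored with the bug report.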

faf-btserver-prepare-actions

find similar bugs in the bug reports

check bug statuses and generate a list of desired
actions to be performed on Bugzilla
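
One possible shape for the generated action list is sketched below. It is hypothetical: the action tuples, the status check, and the rule of keeping the lowest-numbered open bug are assumptions for illustration and are not taken from faf-btserver-prepare-actions.

```python
def prepare_actions(cluster, library_component=None):
    """Turn one cluster of similar bugs into a list of desired actions.

    cluster is a list of dicts with 'id', 'component' and 'status' keys
    (a hypothetical layout). Bugs that are already closed are left alone.
    """
    open_bugs = [bug for bug in cluster if bug["status"] != "CLOSED"]
    if not open_bugs:
        return []
    # Assumption: keep the oldest (lowest-numbered) open bug.
    keeper = min(open_bugs, key=lambda bug: bug["id"])
    actions = []
    for bug in open_bugs:
        if bug is not keeper:
            actions.append(("close_duplicate", bug["id"], keeper["id"]))
    if library_component and keeper["component"] != library_component:
        actions.append(("reassign", keeper["id"], library_component))
    return actions


def describe(action):
    """Render one action as a log line for review before anything is pushed."""
    if action[0] == "close_duplicate":
        return f"close bug {action[1]} as duplicate of bug {action[2]}"
    return f"reassign bug {action[1]} to component {action[2]}"
```

Keeping the actions as plain data means the same list can be written to a text log for review or applied to Bugzilla.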

faf-btserver-push-actions-bugzilla

Performs desired actions on Bugzilla

If a bug is detected that is filed on an application but
belongs to a library, it will either be reassigned or a
comment will be added:

It appears that this bug should be moved to
component glib2. Other bugs from emacs (bug #644532) and
evolution (bugs #758654, #749564) are duplicates of this
bug. Please consider marking them as duplicates and
moving this bug to glib2.

faf-btserver-actions-log - generates a log of desired actions
on Bugzilla in a text file; this is useful for development,
tweaking, and debugging

User Experience

Duplicate crash reports are resolved by closing all bugs except one as duplicates, with the remaining open bug being reassigned to a common library.

Dependencies

None

Contingency Plan

ABRT continues to use duplicate hashes to detect duplicates, as
it does now. Without the backtrace deduplication server, ABRT bugs
are still filed on the software component that owns the crashed
binary. Duplicates within a single component can be closed by
extending an existing script, without having a server deployed.