<!-- The actual name of your feature page should look something like: Features/YourFeatureName. This keeps all features in the same namespace -->

<!-- The actual name of your feature page should look something like: Features/YourFeatureName. This keeps all features in the same namespace -->

−

= ABRTBacktraceDeduplication =

+

= ABRT Backtrace Deduplication Service =

== Summary ==

== Summary ==

−

Backtrace deduplication server solves the problem of many duplicate crash reports being submitted by ABRT to Red Hat Bugzilla. It is

+

Backtrace deduplication service solves the problem of many duplicate crash reports being submitted by ABRT to Red Hat Bugzilla. It helps ABRT users to find duplicate reports before filing a new bug, and it helps package maintainers to triage/reassign/merge already reported bugs.

−

designed to help ABRT users to find duplicate reports before filing a new bug, and to help package maintainers to triage/reassign/merge already reported bugs.

−

−

Backtrace deduplication server is a collection of newly-developed tools that will be deployed on the retrace server hardware, which is

−

a part of Fedora infrastructure. ABRT will contain a client tool and integration with the server.

== Owner ==

== Owner ==

Line 22:

Line 18:

* Name: [[User:mlichvar|Miroslav Lichvar]]

* Name: [[User:mlichvar|Miroslav Lichvar]]

* Email: mlichvar at redhat.com

* Email: mlichvar at redhat.com

+

+

* Name: Jan Smejda

== Current status ==

== Current status ==

* Targeted release: [[Releases/17|Fedora 17]]

* Targeted release: [[Releases/17|Fedora 17]]

−

* Last updated: 2012-01-17

+

* Last updated: 2012-03-26

−

* Percentage of completion: 60%

+

* Percentage of completion: 100%

== Detailed Description ==

== Detailed Description ==

<!-- Expand on the summary, if appropriate. A couple sentences suffices to explain the goal, but the more details you can provide the better. -->

<!-- Expand on the summary, if appropriate. A couple sentences suffices to explain the goal, but the more details you can provide the better. -->

+

Backtrace deduplication server is a collection of newly-developed tools that will be deployed on the ABRT Retrace Server hardware, which is a part of Fedora infrastructure. ABRT will contain a client tool and integration with the server.

== Benefit to Fedora ==

== Benefit to Fedora ==

−

<!-- What is the benefit to the platform? If this is a major capability update, what has changed? If this is a new feature, what capabilities does it bring? Why will Fedora become a better distribution or project because of this feature?-->

+

# Red Hat Bugzilla receives a lot of duplicate crash reports from ABRT clients, even for a single component. This makes ABRT reports less useful and causes developers to give ABRT reports lower priority. It also receives many low-quality reports, which should be closed without intervention from maintainers. For example, the simple-scan component is heavily affected by low-quality ABRT reports: many of its bug reports are duplicates, and some incorrectly show __libc_message and similar functions as the crash function.

+

# Red Hat Bugzilla contains multiple crash reports, filed against end-user applications, that are caused by a single bug in a library. These reports are then analyzed multiple times by different developers, which wastes their time.

== Scope ==

== Scope ==

−

<!-- What work do the developers have to accomplish to complete the feature in time for release? Is it a large change affecting many parts of the distribution or is it a very isolated change? What are those changes?-->

+

<ol>

+

<li>Implementation of backtrace metrics and indexes in [http://fedorahosted.org/btparser Btparser].

+

<ol>

+

<li>Damerau-Levenshtein distance</li>

+

<li>Jaro-Winkler distance</li>

+

</ol>

+

</li>

+

<li>Implementation of backtrace optimization in Btparser.</li>

+

<li>Backtrace deduplication service for C/C++ backtraces, which

+

takes a backtrace and component, and checks backtraces from all

+

related components (of libraries used by the crashed binary) in

+

Bugzilla

+

<ul>

+

<li>name: faf-btserver-find-duplicates</li>

+

</ul>

+

</li>

+

<li>HTTP interface to the backtrace deduplication service,

+

implemented as a CGI script

+

<ul>

+

<li>name: faf-btserver-cgi</li>

+

<li>must contain a machine interface (plain text)</li>

+

<li>must contain a human interface (HTML)</li>

+

<li>distinguishes between the two by reading HTTP_ACCEPT

+

environment variable</li>

+

<li>Apache configuration file to activate the CGI script</li>

+

<li>requires a backtrace, component name, and operating system

+

version from the user</li>

+

<li>responds with a list of bug ids, bug components, operating

+

system version, and similarity:

+

<pre>625354 glib2 14 94%

+

688952 glib2 15 94%

+

654789 emacs 14 92%</pre>

+

</li>

+

</ul>

+

</li>

+

<li>Crash report cleanup service, which merges crashes that are

+

already reported in Bugzilla. It also finds low-quality reports

+

and duplicates, and closes or reassigns them. The implementation consists of four scripts:

+

<ul>

+

<li>faf-btserver-cluster

+

<ul>

+

<li>The merging is done on a component level, where similar

+

bugs from the same component are merged, and also on a

+

cross-component level, where bugs from applications are

+

matched to those of their library dependencies, and bugs

+

in libraries are detected by searching duplicates between

+

components with shared dependencies.</li>

+

<li>Achieve the right balance between application bug and

+

library bug blaming. For example, many applications are

+

crashing on a <code>strcmp</code> call, but we can

+

reasonably assume there is no bug

+

in <code>strcmp</code>.</li>

+

<li>Compute distances and similarity indices between a bug

+

(backtrace of bug) and all relevant bugs</li>

+

<li>Compute backtrace quality</li>

+

<li>Store the computed data in a bug report</li>

+

<li>The number of crash combinations to check is

+

huge. Optimizations might be needed to limit checks to

+

backtraces having the same library calls on stack.</li>

+

</ul>

+

</li>

+

<li>faf-btserver-prepare-actions

+

<ul>

+

<li>find similar bugs in the bug reports</li>

+

<li>check bug statuses and generate a list of desired

+

actions to be performed on Bugzilla</li>

+

</ul>

+

</li>

+

<li>faf-btserver-push-actions-bugzilla

+

<ul>

+

<li>Performs desired actions on Bugzilla</li>

+

<li>If a bug that is filed on an application but belongs to

+

a library is detected, it will be either reassigned or a

+

comment will be added:<br/>

+

<code>It appears that this bug should be moved to

+

component glib2. Other bugs from emacs (bug #644532) and

+

evolution (bugs #758654, #749564) are duplicates of this

+

bug. Please consider marking them as duplicates and

+

moving this bug to glib2.</code>

+

</li>

+

</ul>

+

</li>

+

<li>faf-btserver-actions-log - generate a log of desired actions

+

on Bugzilla in a text file; this is good for development,

+

tweaking, debugging</li>

+

</ul>

+

</li>

+

<li>Synchronization script to update server metadata &mdash; bugs,

+

backtraces, builds, RPMs</li>

+

<li>[https://fedorahosted.org/abrt ABRT] client using

+

Backtrace deduplication server</li>

+

</ol>
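
The distance metrics listed in the first Scope item can be illustrated with a small sketch. This is not the Btparser implementation: it assumes a backtrace is reduced to a list of function names, uses the restricted (optimal string alignment) variant of Damerau-Levenshtein distance, and the names <code>osa_distance</code> and <code>similarity</code> are hypothetical.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Works on any two sequences, for example lists of function names
    taken from two backtraces (an assumption made for illustration).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i leading items of a
    for j in range(cols):
        d[0][j] = j  # insert all j leading items of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent items counts as a single edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a, b):
    """Map the distance to a value in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


# Hypothetical frames: the two middle calls are swapped.
first = ["strcmp", "g_str_equal", "g_hash_table_lookup", "main"]
second = ["strcmp", "g_hash_table_lookup", "g_str_equal", "main"]
print(osa_distance(first, second), similarity(first, second))
```

Normalizing by the longer backtrace is one possible way to turn a distance into a percentage like the one shown in the service response; the real service may weight frames differently.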

== How To Test ==

== How To Test ==

+

# via ABRT

+

# via web interface

<!-- This does not need to be a full-fledged document. Describe the dimensions of tests that this feature is expected to pass when it is done. If it needs to be tested with different hardware or software configurations, indicate them. The more specific you can be, the better the community testing can be.

<!-- This does not need to be a full-fledged document. Describe the dimensions of tests that this feature is expected to pass when it is done. If it needs to be tested with different hardware or software configurations, indicate them. The more specific you can be, the better the community testing can be.

Line 54:

Line 147:

== User Experience ==

== User Experience ==

<!-- If this feature is noticeable by its target audience, how will their experiences change as a result? Describe what they will see or notice. -->

<!-- If this feature is noticeable by its target audience, how will their experiences change as a result? Describe what they will see or notice. -->

+

# Maintainers: ABRT will open fewer duplicate bugs

+

# Maintainers: Bugs across components will be marked as duplicates

+

## by adding a comment to each bug with links to the other bugs

+

## by closing all bugs except one as duplicates, with the remaining open bug being reassigned to a common library

== Dependencies ==

== Dependencies ==

+

None

<!-- What other packages (RPMs) depend on this package? Are there changes outside the developers' control on which completion of this feature depends? In other words, completion of another feature owned by someone else and might cause you to not be able to finish on time or that you would need to coordinate? Other upstream projects like the kernel (if this is not a kernel feature)? -->

<!-- What other packages (RPMs) depend on this package? Are there changes outside the developers' control on which completion of this feature depends? In other words, completion of another feature owned by someone else and might cause you to not be able to finish on time or that you would need to coordinate? Other upstream projects like the kernel (if this is not a kernel feature)? -->

== Contingency Plan ==

== Contingency Plan ==

−

<!-- If you cannot complete your feature by the final development freeze, what is the backup plan? This might be as simple as "None necessary, revert to previous release behaviour." Or it might not. If you feature is not completed in time we want to assure others that other parts of Fedora will not be in jeopardy. -->

+

ABRT uses duplicate hashes to detect duplicates as usual. Without

+

the backtrace deduplication server, ABRT bugs are still filed on the

+

software component that owns the crashed binary. Duplicates within

+

a single component can be closed by extending an existing script,

+

without having a server deployed.

== Documentation ==

== Documentation ==

<!-- Is there upstream documentation on this feature, or notes you have written yourself? Link to that material here so other interested developers can get involved. -->

<!-- Is there upstream documentation on this feature, or notes you have written yourself? Link to that material here so other interested developers can get involved. -->

−

*

+

No documentation is currently available.

+

+

If you want to see more details about implementation, you can check the source code:

<!-- The Fedora Release Notes inform end-users about what is new in the release. Examples of past release notes are here: http://docs.fedoraproject.org/release-notes/ -->

<!-- The Fedora Release Notes inform end-users about what is new in the release. Examples of past release notes are here: http://docs.fedoraproject.org/release-notes/ -->

<!-- The release notes also help users know how to deal with platform changes such as ABIs/APIs, configuration or data file formats, or upgrade concerns. If there are any such changes involved in this feature, indicate them here. You can also link to upstream documentation if it satisfies this need. This information forms the basis of the release notes edited by the documentation team and shipped with the release. -->

<!-- The release notes also help users know how to deal with platform changes such as ABIs/APIs, configuration or data file formats, or upgrade concerns. If there are any such changes involved in this feature, indicate them here. You can also link to upstream documentation if it satisfies this need. This information forms the basis of the release notes edited by the documentation team and shipped with the release. -->

−

*

+

Fedora's bug reporting tool (ABRT) now uses sophisticated new server-side algorithms to discover duplicate bugs and direct new reports to the right operating system component.

== Comments and Discussion ==

== Comments and Discussion ==

−

* See [[Talk:Features/YourFeatureName]] <!-- This adds a link to the "discussion" tab associated with your page. This provides the ability to have ongoing comments or conversation without bogging down the main feature page -->

+

* See [[Talk:Features/ABRTBacktraceDeduplication]]

−

−

[[Category:FeaturePageIncomplete]]

+

[[Category:FeatureAcceptedF17]]

<!-- When your feature page is completed and ready for review -->

<!-- When your feature page is completed and ready for review -->

<!-- remove Category:FeaturePageIncomplete and change it to Category:FeatureReadyForWrangler -->

<!-- remove Category:FeaturePageIncomplete and change it to Category:FeatureReadyForWrangler -->

<!-- After review, the feature wrangler will move your page to Category:FeatureReadyForFesco... if it still needs more work it will move back to Category:FeaturePageIncomplete-->

<!-- After review, the feature wrangler will move your page to Category:FeatureReadyForFesco... if it still needs more work it will move back to Category:FeaturePageIncomplete-->

<!-- A pretty picture of the page category usage is at: https://fedoraproject.org/wiki/Features/Policy/Process -->

<!-- A pretty picture of the page category usage is at: https://fedoraproject.org/wiki/Features/Policy/Process -->

ABRT Backtrace Deduplication Service

Summary

Backtrace deduplication service solves the problem of many duplicate crash reports being submitted by ABRT to Red Hat Bugzilla. It helps ABRT users to find duplicate reports before filing a new bug, and it helps package maintainers to triage/reassign/merge already reported bugs.

Current status

Detailed Description

Backtrace deduplication server is a collection of newly-developed tools that will be deployed on the ABRT Retrace Server hardware, which is a part of Fedora infrastructure. ABRT will contain a client tool and integration with the server.

Benefit to Fedora

Red Hat Bugzilla receives a lot of duplicate crash reports from ABRT clients, even for a single component. This makes ABRT reports less useful and causes developers to give ABRT reports lower priority. It also receives many low-quality reports, which should be closed without intervention from maintainers. For example, the simple-scan component is heavily affected by low-quality ABRT reports: many of its bug reports are duplicates, and some incorrectly show __libc_message and similar functions as the crash function.

Red Hat Bugzilla contains multiple crash reports, filed against end-user applications, that are caused by a single bug in a library. These reports are then analyzed multiple times by different developers, which wastes their time.

Scope

Backtrace deduplication service for C/C++ backtraces, which
takes a backtrace and component, and checks backtraces from all
related components (of libraries used by the crashed binary) in
Bugzilla

name: faf-btserver-find-duplicates

HTTP interface to the backtrace deduplication service,
implemented as a CGI script

name: faf-btserver-cgi

must contain a machine interface (plain text)

must contain a human interface (HTML)

distinguishes between the two by reading HTTP_ACCEPT
environment variable

Apache configuration file to activate the CGI script

requires a backtrace, component name, and operating system
version from the user

responds with a list of bug ids, bug components, operating
system version, and similarity:

625354 glib2 14 94%
688952 glib2 15 94%
654789 emacs 14 92%

Crash report cleanup service, which merges crashes that are
already reported in Bugzilla. It also finds low-quality reports
and duplicates, and closes or reassigns them. The implementation consists of four scripts:

faf-btserver-cluster

The merging is done on a component level, where similar
bugs from the same component are merged, and also on a
cross-component level, where bugs from applications are
matched to those of their library dependencies, and bugs
in libraries are detected by searching duplicates between
components with shared dependencies.

Achieve the right balance between application bug and
library bug blaming. For example, many applications are
crashing on a strcmp call, but we can
reasonably assume there is no bug
in strcmp.

Compute distances and similarity indices between a bug
(backtrace of bug) and all relevant bugs

Compute backtrace quality

Store the computed data in a bug report

The number of crash combinations to check is
huge. Optimizations might be needed to limit checks to
backtraces having the same library calls on stack.

faf-btserver-prepare-actions

find similar bugs in the bug reports

check bug statuses and generate a list of desired
actions to be performed on Bugzilla

faf-btserver-push-actions-bugzilla

Performs desired actions on Bugzilla

If a bug that is filed on an application but belongs to
a library is detected, it will be either reassigned or a
comment will be added:It appears that this bug should be moved to
component glib2. Other bugs from emacs (bug #644532) and
evolution (bugs #758654, #749564) are duplicates of this
bug. Please consider marking them as duplicates and
moving this bug to glib2.

faf-btserver-actions-log - generate a log of desired actions
on Bugzilla in a text file; this is good for development,
tweaking, debugging
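
The HTTP interface described above for faf-btserver-cgi can be sketched from both ends. The following is only an illustration under stated assumptions: the helper names <code>wants_html</code> and <code>parse_matches</code> are hypothetical, the Accept header check ignores quality values for brevity, and the response is assumed to hold exactly four whitespace-separated fields per line, as in the sample response.

```python
def wants_html(environ):
    """Decide between the human (HTML) and machine (plain text) interface.

    A CGI script receives the request's Accept header in the HTTP_ACCEPT
    environment variable. This simplified check only looks for text/html
    among the listed media types and ignores q-values.
    """
    accept = environ.get("HTTP_ACCEPT", "")
    media_types = [part.split(";")[0].strip().lower()
                   for part in accept.split(",")]
    return "text/html" in media_types


def parse_matches(text):
    """Parse a plain-text response into (bug_id, component, release, percent)."""
    matches = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        bug_id, component, release, percent = fields
        matches.append(
            (int(bug_id), component, release, int(percent.rstrip("%"))))
    return matches


# The sample response from the feature page.
response = """625354 glib2 14 94%
688952 glib2 15 94%
654789 emacs 14 92%"""

for bug_id, component, release, percent in parse_matches(response):
    print(bug_id, component, release, percent)
```

A client such as ABRT could use the parsed similarity to decide whether to offer an existing bug to the user before filing a new one; the threshold for that decision is not specified on this page.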

User Experience

by closing all bugs except one as duplicates, with the remaining open bug being reassigned to a common library

Dependencies

None

Contingency Plan

ABRT uses duplicate hashes to detect duplicates as usual. Without
the backtrace deduplication server, ABRT bugs are still filed on the
software component that owns the crashed binary. Duplicates within
a single component can be closed by extending an existing script,
without having a server deployed.

Documentation

No documentation is currently available.

If you want to see more details about implementation, you can check the source code: