Sebastian Baltes

Edit and Comment History of Stack Overflow Threads

TL;DR:This blog post explains how to retrieve the edit and comment history of Stack Overflow threads using the SOTorrent dataset.

Information about the most recent database layout and dataset version can be found on the project page.

Creating the Edit History Table

In our MSR 2018 paper, we describe a close relationship between post edits and comments.
To support a further investigation of this relationship, we show how to create a new table EditHistory, which aggregates all title and body edits of Stack Overflow posts, together with post comments.
Using this table, one can easily retrieve the edit and comment history of individual threads (see below).
The BigQuery versions of the following queries can be found here.

Then, we retrieve the edit history of the Stack Overflow thread mentioned in this earlier blog post with the following query:

SELECT *
FROM EditHistory
WHERE PostId IN (
SELECT PostId
FROM Threads
WHERE ParentId = (
SELECT ParentID
FROM Threads
# this is an answer id, the question id 3758606 yields the same result
WHERE PostId=3758880
)
)
ORDER BY CreationDate;

The following figure shows the result of the above query (truncated):

Possible Extensions

To retrieve the edit history on the level of individual text or code blocks, one could either modify the table creation query or join the InitialBody and BodyEdit events with table PostBlockVersion, using the PostHistoryId of the events.

More Information

First insights into the evolution of Stack Overflow posts, and in particular the connection of edits and comments, can be found in our research paper about SOTorrent.
The dataset is available as a database dump (including import scripts) on Zenodo and as a public Google BigQuery dataset. Visit the project page to get the most recent version.