Detecting Denial-of-Service Attacks on Social Media: Applying NLP to Network Security

This page provides extra details for the data used in my NAACL-2018 paper:

Using Social Media Text to Detect Denial-of-Service Attacks: Applying NLP to Network Security
Nathanael Chambers and Ben Fry and James McMasters
NAACL-2018, New Orleans, USA. June 2018.

Tweets Corpus

Unfortunately, Twitter's terms of service prohibit us from making our collected tweets available. I will not send raw tweets, so please do not request them from me. You can duplicate our dataset by using Twitter's web-based search tool with date constraints and the company name as a keyword.
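As a rough sketch of what such a search might look like, the snippet below builds a query string from a company name and a date range. It assumes the `since:` and `until:` operators of Twitter's search box, and it assumes that `until:` excludes the named day. The function name `build_search_query`, the phrase quoting of multi-word names, and the example dates are illustrative only; they are not taken from our collection process.

```python
from datetime import date, timedelta


def build_search_query(keyword: str, first_day: date, last_day: date) -> str:
    """Return a search string covering first_day through last_day inclusive."""
    if last_day < first_day:
        raise ValueError("last_day must not precede first_day")
    # Assumption: `until:` excludes the named day, so step one day past the end.
    until = last_day + timedelta(days=1)
    # Assumption: quote multi-word names so they are searched as a phrase.
    term = f'"{keyword}"' if " " in keyword else keyword
    return f"{term} since:{first_day.isoformat()} until:{until.isoformat()}"


# Example with hypothetical window boundaries.
print(build_search_query("Bank of America", date(2012, 9, 9), date(2012, 9, 29)))
```

The resulting string can be pasted into the search box; the date range itself has to be chosen per attack.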

Attack Dates List

The tweets collected for this project come from 20-day windows around known historical DDoS attacks. We searched old news articles for past attacks and compiled a list of those whose date could be established with certainty from the news or related sources. The following services and dates are the final set of attacks. Note that these are the dates of the attacks themselves, not the publication dates of the news articles. The distinction matters because news articles also generate Twitter chatter, but the days on which they appear are not necessarily attack days (in fact, they often follow the day of the attack).

Organization Name                       Attack Date(s) (YYYY-MM-DD)
Ancestry.com                            2014-06-16, 2014-06-17
Bank of America                         2012-09-19
BBC Website                             2015-03-14, 2015-12-31
Bitcoin                                 2014-02-11
Blizzard                                2016-08-03, 2016-08-23, 2016-08-24, 2016-08-31
Call of Duty                            2014-09-20
Chase Bank                              2012-09-19
Department of Justice (USA)             2012-01-19
DNS                                     2016-10-21
Evernote                                2014-06-10
Feedly                                  2014-06-11
Federal Bureau of Investigation (FBI)   2012-01-19
Femsplain                               2015-03-08
Github                                  2015-03-27
GetResponse                             2014-04-26, 2014-04-27
GoDaddy                                 2012-09-10
Hadopi                                  2012-01-19
JANET Network                           2015-12-08
Komodia                                 2015-02-20, 2015-02-21, 2015-02-22
Library of Congress                     2016-07-18, 2016-07-19
MPAA                                    2012-01-19
NameCheap                               2014-02-20
Newsweek                                2016-09-29
Pirate Bay                              2012-05-16, 2012-11-13
Planned Parenthood                      2015-07-29
Playstation                             2014-12-25
PNC Bank                                2012-09-19
PNC Bank                                2012-09-26
PNC Bank                                2012-09-27
Reddit                                  2013-04-19
RIAA                                    2012-01-19
Spamhaus                                2013-03-18, 2013-03-19, 2013-03-22
Tor Network                             2014-12-26
Universal Music                         2012-01-19
ustream                                 2012-05-09
Wells Fargo                             2012-09-19
Wells Fargo                             2012-09-25, 2012-09-25
XBox Live                               2014-12-25
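To turn this list into collection windows, something like the following sketch may help. It is not our collection code. It assumes each window extends 10 days on either side of an attack date (the text above only says "20 day windows around" an attack, so the exact placement is a guess), and it merges windows that overlap when one organization was attacked several times in quick succession. The names `ATTACKS`, `HALF_WINDOW`, `collection_windows`, and `is_attack_day` are hypothetical.

```python
from datetime import date, timedelta

# A few rows copied from the list above.
ATTACKS = {
    "Blizzard": ["2016-08-03", "2016-08-23", "2016-08-24", "2016-08-31"],
    "Reddit": ["2013-04-19"],
}

# Assumption: the window is centered, 10 days before and 10 days after.
HALF_WINDOW = timedelta(days=10)


def collection_windows(attack_dates):
    """Return sorted, non-overlapping (start, end) date ranges, both inclusive."""
    merged = []
    for day in sorted(date.fromisoformat(d) for d in attack_dates):
        start, end = day - HALF_WINDOW, day + HALF_WINDOW
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # Overlaps or directly follows the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_attack_day(day, attack_dates):
    """True only for the attack dates themselves, not the surrounding days."""
    return day.isoformat() in attack_dates


for org, dates in ATTACKS.items():
    for start, end in collection_windows(dates):
        print(org, start.isoformat(), end.isoformat())
```

Under these assumptions the four Blizzard attacks collapse into a single window, which avoids collecting the same tweets more than once. Every day inside a window that is not itself an attack date would then serve as a non-attack day.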

Neural Network Code

Our neural network models (Neural1 and Neural2 in the paper) are written in Python and use DyNet for learning.

PLDAttack: Partially Labeled LDA

Our generative model is a modified version of Partially Labeled LDA, implemented in Java. It is not currently packaged as an easy-to-use library. If you are interested in using it, please send me an email.

Questions?

Other questions can be sent to Nate Chambers. The two co-authors with Chambers were undergraduate students at the time of this project.