The Digital Methods Initiative is a contribution to doing research into the "natively digital". Consider, for example, the hyperlink, the thread and the tag. Each may 'remediate' older media forms (reference, telephone chain, book index), and genealogical histories remain useful (Bolter & Grusin, 1999; Elsaesser, 2005; Kittler, 1995). At the same time new media environments - and the software-makers - have implemented these concepts, algorithmically, in ways that may resist familiar thinking as well as methods (Manovich, 2005; Fuller, 2007). In other words, the effort is not simply to import well-known methods - be they from humanities, social science or computing. Rather, the focus is on how methods may change, however slightly or wholesale, owing to the technical specificities of new media.

"How Different are Digital Methods?"

The Web archiving specialist, Niels Brügger, has written: "[U]nlike other well-known media, the Internet does not simply exist in a form suited to being archived, but rather is first formed as an object of study in the archiving, and it is formed differently depending on who does the archiving, when, and for what purpose" (Brügger, 2005). That the object of study is co-constructed in the means by which it is 'tamed' or 'corralled' by method and technique is a classic point from the sociology and philosophy of science and elsewhere. For example, when one studies the Internet Archive, what stands out is not so much that the Internet is archived, but how it is. Unlike a Web search engine, at archive.org's Wayback Machine, one queries a URL, not a keyword. Moreover, one cannot 'surf' or search the Web as it was at some given date. In other words, a series of decisions was taken on how to build the archive, and those decisions constrain the type of research one can perform. One can study the evolution of a single site (or multiple sites) over time by collecting snapshots from the dates that a page was indexed. One can also go back in time to a Website for evidentiary purposes. Is that how one may wish to study the history of the Web? What kinds of research questions may be asked fruitfully, and which may not, given the constraints? Digital methods perhaps begin with coming to grips with given forms of objects under study.
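To make the constraint concrete, here is a minimal sketch of what 'querying a URL, not a keyword' can look like in practice. It assumes the address pattern commonly seen at web.archive.org (a prefix, a 14-digit capture timestamp, then the original URL); that pattern, the function names and the capture dates are illustrative assumptions, not something specified above. The sketch only builds addresses and groups them by year; it does not contact the archive.

```python
from collections import defaultdict

# Assumed address prefix for archived captures (not stated in the text above).
WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(timestamp: str, original_url: str) -> str:
    """Build the address of one capture: prefix / timestamp / original URL."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def snapshots_by_year(timestamps):
    """Group capture timestamps (YYYYMMDDhhmmss) by their four-digit year."""
    by_year = defaultdict(list)
    for ts in sorted(timestamps):
        by_year[ts[:4]].append(ts)
    return dict(by_year)


# Hypothetical capture dates, invented for illustration only.
captures = ["20050214093000", "20031102120000", "20050930180500"]

for year, stamps in snapshots_by_year(captures).items():
    for ts in stamps:
        print(year, snapshot_url(ts, "http://example.org/"))
```

Note what the sketch cannot do: there is no function that takes a keyword and a date. The research design follows the archive's design, one URL at a time.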

Brügger seems to go a step further, however, in arguing that methodological standardisation is unlikely if not impossible. To Brügger the form assumed by the object of study depends on its creator, in this case the particular archivist. Does such a thought imply that digital method, if method remains the right word, is more of an art than a science, where tacit knowledge and skill are paramount? Can there be no instruments, only tools? Data are less gathered than they are sculpted, or 'scraped', as the term has it. Perhaps 'data-mining' is appropriate in the sense that there is always some waste, or slurry, that runs off. Digital methods may have to have more patience with the lack of exhaustiveness in data sets than would be the norm in other sciences.

"I study virtual methods. How would I relate to what you are doing with 'digital methods'?"

The origins of 'virtual methods' may lie in the U.K. 'Virtual Society?' research program of the late 1990s (Woolgar, 2002). In particular, the question mark in the program's title was emphasized. The research challenged the then dominant division between the real and the virtual realms, empirically demonstrating instead the embeddedness of the Internet in society. The desire to innovate methodologically saw perhaps its greatest challenge in ethnography, with the effort to put forward and defend a new strain of scholarship, 'virtual ethnography', that combined the terrains of 'the ground' with the online (Hine, 2000; see also Miller & Slater, 2000). Special skills, and methods, were developed to gain entry to and study communities now rooted both in the offline and the online. How should the introductory email message be written, and to whom? How should the online survey be designed? Questions revolved around how to adapt methods from social science to the online environment.

If one were to contrast the challenges of virtual methods with those of digital methods, one could begin by thinking about the embeddedness of society in the Internet. Thus the important question mark from the earlier research program shifts: virtual? society. The methods and skills developed here strive to put society on display. How can the Internet be made to show what's happening in society?

In this respect, the Dutch newspaper, the NRC Handelsblad, published an in-house study of home-grown right-wing Websites over the past few years (NRC Handelsblad, 2007). The remarkable line in the article, which seemed unusual for those accustomed to reading at least implicit distinctions between 'the real' and 'the virtual', read: "The Internet reflects the increasing hardening [of the right-wing] in the Netherlands."* Thus here the Web becomes the site to study social trends. More to the point, how improbable is it to study right-wing movement trends without the Internet? The special skills once entailed how one would embed oneself successfully within the groups, and report with distance. Now, the 'digital methods' question becomes, how to collect and analyze the data to distill such trends from the Web?

"You mentioned 'society' quite a bit. I am from humanities. How should we think about 'culture'? Is there also a 'humanities' approach to your work?"

"e-Social Science" is one term used to describe how to study the online now that the environments have become digital. Normally the skills with which one collects data and forms them into something palatable for existing methods are under-emphasized. Data cleaning is something that is taught tacitly, and the results are applied formally. (There is hardly ever a footnote on how one has cleaned the data, though there are occasionally cases that insist that the data have not been cleaned well.) Here the cleaning is the issue. For example, it may come as a surprise both to the casual user and to the formal social scientist that a Google search engine return has millions of results, but that one can only access 1,000 of them. There was a quip in the 1990s that it would take a scientist most of her life to go through the millions of results. Well, there are only 1,000 to deal with. That is what Google 'serves'. This is a simple example of the difference between having a sense of the 'digital' and the Internet, and having less of one. "Less of a sense" refers to the quest for exhaustiveness, prevalent in social science critiques of the humanities' case study approach. It also refers to the bewilderment amongst humanities scholars with respect to the great sea of information and how to deal with it. However, it is seen as unreasonable for an engine to serve more than one thousand results, and the 1,000 maximum has become standard. Moreover, not too many users set preferences to more than 10 results per page; typically they do not look past the first page (see the work by Bernard Jansen and Amanda Spink). Digital methods begin with understanding the culture of programming (why serve 'only' 1,000 results) as well as the culture of use (why not look past the first few).

"'Distilling Social Trends from the Internet' by digital methods is interesting, but it seems like something a journalist would like to do. Is your work journalistic? Is the work also theoretical? What's the difference?"

Journalism has methodological needs now that the Internet has become a significant meta-source. Normally the question concerns the trustworthiness of a source. "Snowballing" from source to source was once a 'social network' issue, to speak in terms of method. Who else should I speak to? That's the question at the conclusion of the interview, if trust has been built. The relationship between 'who else should I speak to' and 'who else do you link to' is asymmetrical for journalism, but the latter is the question asked by search engines when recommending information. How does one think through the difference between source recommendations made verbally and those made through online links? Is search the beginning of the quest for information that ends with some grounded interview reality beyond the net, whereby we happily maintain the divide between some real and some virtual? Or is that too simplistic? The divide in our ideal source set (real or virtual, grounded or googled) raises the question of what comes next. What do we 'look up' upon conclusion of the interview to check the reality? The Internet may not be changing the hierarchy of sources for some (e.g., proposals to ban Wikipedia as a citable source), but it may well be changing the order of checking.

"Could you illustrate what you mean by the 'natively digital'? What's a clickable setting?"

One could argue, classically, that certain objects could not survive outside of their digital environments. Consider debates about human memory, and the length of a telephone number. With the mobile phone there was the idea that no one would be able to remember the longer nine-digit telephone numbers as compared to the seven-digit number of the landline. Mobile phone numbers would have to 'live' as programmed (stored) in the phone. Similarly, at what character length does a URL become 'digital-only'? There are URLs on advertising billboards, but at a particular length, a URL is best stored in a clickable setting.

Tinyurl.com, in existence since 2002, markets itself nowadays as having transformed over 47 million long URLs into tiny URLs. It asks, "which URL would you [prefer]"?

Of course one could ask at what character length the 47 million tiny URLs themselves become too long to remember, e.g., tinyurl.com/47000000. (Actually it uses both digits and letters, so URLs look more like this: http://tinyurl.com/25txg.) Whilst a trivial example, the point is that a URL, with a particular character length, is like the stored mobile phone number. It's best served in a digital-only environment. There are no hard-copy URL address books, but there is a browser history file (with a 9-day expiration by default), a browser cache and social bookmarking sites for URL storage.
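The arithmetic behind the shorter codes can be sketched briefly. The example below assumes, purely for illustration, that a short code is a counter written in base 36 (the ten digits plus 26 lowercase letters). Whether tinyurl.com actually assigns its codes this way is not stated here and should not be inferred; the function names are hypothetical.

```python
import string

# 0-9 followed by a-z: 36 symbols in total.
ALPHABET = string.digits + string.ascii_lowercase


def encode_base36(n: int) -> str:
    """Write a non-negative integer using digits and lowercase letters."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    chars = []
    while n:
        n, remainder = divmod(n, 36)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base36(code: str) -> int:
    """Read a base-36 string back into an integer."""
    return int(code, 36)


count = 47_000_000
print(len(str(count)))            # 8 characters when written in decimal
print(len(encode_base36(count)))  # 5 characters when written in base 36
print(decode_base36("25txg"))     # 3631300, if the code were a base-36 number
```

Five characters in base 36 cover just over 60 million values (36 to the fifth power is 60,466,176), so under this assumption a code stays at five characters well past the 47 million mark, whereas the decimal counter already needs eight.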

There is a second point, which is less classic. Internet archivists often ask, where does a website start and where does it end? It is an important practical question, and there is some theory behind it, too. It resonates in film and TV thought, inspired by new media, because film and TV-makers as well as fan communities continue the story on both ends, online. Could the website 'begin' with a tinyurl, which redirects to the site? That a page has been made into a tinyurl -- is that important enough to archive, so that we can recall how long URLs have been handled in digital environments, 2002-2007?

The site then has content that may be fed from elsewhere, say adverts loaded on the basis of where you are located (through IP-to-geo). Where one archives determines what is archived. Tinyurls are only one example of where a site starts, and linked, embedded and syndicated content expand where it ends.
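How location can shape the archived object may be sketched as follows. Everything in the example is hypothetical: the lookup table, the country codes and the advert texts are invented for illustration, and the address ranges are ones reserved for documentation rather than real networks. Actual IP-to-geo services work from far larger databases.

```python
import ipaddress

# Hypothetical IP-to-geo table built on documentation-only address ranges.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "NL",
    ipaddress.ip_network("198.51.100.0/24"): "US",
}

# Hypothetical adverts keyed by country code.
ADVERTS = {"NL": "advert in Dutch", "US": "advert in English"}


def country_for(ip: str) -> str:
    """Return the country code for an address, or 'unknown' if not listed."""
    address = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        if address in network:
            return country
    return "unknown"


def page_as_served(ip: str) -> str:
    """Assemble the page a visitor (or an archiving crawler) would receive."""
    advert = ADVERTS.get(country_for(ip), "generic advert")
    return f"site content + {advert}"


# Two archivists crawling the same URL from different locations.
print(page_as_served("192.0.2.10"))
print(page_as_served("198.51.100.7"))
```

The same URL yields two different captures, which is the sense in which where one archives determines what is archived.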