I am interested in how typical open data users, such as journalists, researchers, companies, developers, and others, find out about new open data sets today. For example, how do sources like search engines, re-distributors of open data, tech media, and government open data platforms compare? Is there any research into this question? (This question seems particularly relevant to this site, since many people have posted questions about how to find particular data sets.)

It's a combination; I believe no individual answer covers it yet. New releases generally occur according to a schedule, so following updates, e.g. via social media or an RSS feed, is popular. For journalists, see our slide 32 here.
– Ulrich Feb 3 '14 at 15:01

They don't look for open datasets as such. They are interested in certain areas, follow what happens there, and then bump into announcements, links, tweets, ...
– Jan Doggen Mar 8 '17 at 9:36

For datasets where a single authoritative source exists (for instance, a government office often fulfills this role for data concerning its jurisdiction), it's helpful to find an RSS feed or revisions page that lists additions and updates to datasets.
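
As a rough illustration of the RSS approach, here is a minimal sketch that lists feed items you have not seen before. It assumes a plain RSS 2.0 feed; the function names are made up for this example, and the feed URL you pass to `fetch_feed` is whatever the data publisher actually provides (Atom feeds use different element names and would need different parsing).

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url: str) -> bytes:
    """Download the raw feed body (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def parse_items(feed_xml: str | bytes) -> list[dict]:
    """Return title, link and guid for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        items.append({
            "title": item.findtext("title", default=""),
            "link": link,
            # Assumption: fall back to the link when the feed omits <guid>.
            "guid": item.findtext("guid", default=link),
        })
    return items


def unseen_items(feed_xml: str | bytes, seen_guids: set[str]) -> list[dict]:
    """Keep only the items whose guid has not been recorded yet."""
    return [item for item in parse_items(feed_xml) if item["guid"] not in seen_guids]
```

In practice you would store the guids returned by each run and pass them in as `seen_guids` on the next run, so that only newly published datasets show up.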

I've also found that Google Alerts are helpful for this task, even beyond open data. Once you've found an existing dataset using a particular query, the Alert will email you (based on your settings) whenever another similar piece of content is posted.

It is something I am asked a lot about at work. We call the process 'Data Landscaping'.

In terms of sites, http://www.programmableweb.com/ can be useful, but it often comes up a bit light. We would then start with data marketplaces and data providers (http://www.quandl.com/ for example). And of course, search engines and following the right accounts on Twitter.
Building data curation experience plays a huge part.

We often have to start with exploring what a client has already.
I use a 'magic quadrant' with axes of 'Distant and Close' data and 'Dark and Light' data.
Distant data: 'you know is out there, but don't have in your organisation'.
Close data: 'you have access to directly'.
Dark data: 'you don't really know what to do with'.
Light data: 'you know how to use to benefit your organisation'.
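
To make the quadrant concrete, here is a small sketch of how datasets could be sorted into the four cells. The `Dataset` class, its field names, and the example dataset names are all hypothetical and only stand in for whatever inventory a client actually has.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    in_house: bool    # True = 'close' data, False = 'distant' data
    understood: bool  # True = 'light' data, False = 'dark' data


def quadrant(dataset: Dataset) -> str:
    """Name the quadrant a dataset falls into, e.g. 'distant/dark'."""
    distance = "close" if dataset.in_house else "distant"
    clarity = "light" if dataset.understood else "dark"
    return f"{distance}/{clarity}"


def landscape(datasets: list) -> dict:
    """Group dataset names by quadrant; empty quadrants stay visible as gaps."""
    grid = {
        f"{distance}/{clarity}": []
        for distance in ("close", "distant")
        for clarity in ("light", "dark")
    }
    for dataset in datasets:
        grid[quadrant(dataset)].append(dataset.name)
    return grid
```

An empty or crowded cell in the resulting grid is the kind of gap that prompts the next question about where to look.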

I then map against 'The Data Journey' (I am producing a white paper on this - message me if you would like a copy), which shows how data moves and can be put to work.

The combination of these two assets helps show gaps in the landscape and raises questions that focus where to look next. I have not, as yet, seen any research into how people are finding new data.

Your site links to datacatalogs.org, which claims to have 337 data catalogs ... is there a reason you only have 100? (Is it just an issue of 'portals' having more than one 'catalog'?)
– Joe Aug 16 '13 at 15:47

Not all of the catalogs have the same search APIs. I've only implemented the search call for CKAN, Junar, and Socrata, and they account for a bit over 100 portals.
– Thomas Levine Aug 16 '13 at 17:12
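
For readers wondering what a per-portal search call looks like, here is a minimal sketch for CKAN only, using its action API's `package_search` endpoint. This is not the implementation referred to in the comment above; the helper names are invented for illustration, the portal URL is whatever CKAN instance you want to query, and Junar and Socrata expose different APIs that are not shown here.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request


def ckan_search_url(portal_url: str, query: str, rows: int = 10) -> str:
    """Build a package_search URL for a CKAN portal."""
    base = portal_url.rstrip("/")
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base}/api/3/action/package_search?{params}"


def parse_ckan_results(body: str | bytes) -> list[dict]:
    """Extract dataset names and titles from a package_search response."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful search")
    return [
        {"name": package.get("name"), "title": package.get("title")}
        for package in payload["result"]["results"]
    ]


def search_portal(portal_url: str, query: str, rows: int = 10) -> list[dict]:
    """Run the search against a live portal (requires network access)."""
    url = ckan_search_url(portal_url, query, rows)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_ckan_results(response.read())
```

Looping `search_portal` over a list of known CKAN portals gives a crude cross-portal search; each additional platform needs its own URL builder and response parser.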

In research, one common way is to simply read other research articles and see what datasets they have used. Also, when a research lab creates a dataset, the lab usually advertises it to colleagues (research meetings, email, conferences, etc.) and might publish a paper presenting the new dataset (1).

My apologies. Is it considered good etiquette to delete this comment? Or should I edit it? I'm relatively new here.
– wirefire Feb 15 '17 at 21:22

No worries. I am relatively new to the site as well and do not know if there is a "correct" way to do it. Maybe others can comment on this. Then again, if it were my answer I would either delete it or update it with additional information not mentioned yet.
– eigenvector Feb 15 '17 at 21:28

As an addendum, even though it may not sound like the most obvious source, NICAR (National Institute for Computer-Assisted Reporting) maintains a very active listserv where data journalists correspond with one another and share new data nearly in real time as it's released. If something's happening in current events, a data journalist somewhere is looking for and sharing data on the topic. You can sign up for the listserv on NICAR's website.