Millie: In the Balkans we have an issue of forest fires and consequent air quality

09:31:35 [PhilA]

... I want to know if my child can go out on the street

09:31:42 [PhilA]

... we have kids building air quality monitors

09:31:51 [PhilA]

... we move to solutions too quickly

09:32:33 [edsu]

cjg: it's hard to develop all the apps/visualizations people want ; giving them the data and empowering them to do it seems like a no brainer -- except to people who don't want new interesting visualizations of their data :)

09:32:45 [PhilA]

JeniT: For Bob - you spoke about the need for collecting data about people using the data and restricting terrorists' access - that's not the usual definition of open data

09:32:56 [PhilA]

BobS: I see a spectrum, not a point

09:33:17 [cjg]

I generally tell people that "open" means removing as many barriers as possible

09:33:30 [PhilA]

... we're going to have rock solid stuff - it will be there and accurate for 9 years. Then there's softer and softer - we need to cover the spectrum

09:33:31 [cjg]

the barriers can be technical, social or legal.

09:33:58 [cjg]

"as open as possible" can still be used to describe data which is confidential.

Concluding remark from first session: "Open data is a means, not an end. Come at it from what real world problems it will solve."

09:42:24 [cjg]

"

09:42:46 [HadleyBeeman]

HadleyBeeman has joined #odw

09:42:49 [markbirbeck]

Paul Davidson introducing James King — senior principal scientist at Adobe — to talk about how PDF is more open than we all think it is.

09:42:57 [edsu]

BobS++ concur

09:44:27 [markbirbeck]

Structure of talk: open data paradigm, PDF itself, and then its role in open data.

09:44:42 [bhyland]

bhyland has joined #odw

09:44:44 [StevenPemberton]

s/:/-

09:44:46 [jpcs1]

jpcs1 has joined #odw

09:45:08 [markbirbeck]

Organisations taking data, shaping it and presenting it.

09:45:33 [markbirbeck]

…but others — the "processors" — would prefer to deal with the raw data...

09:45:58 [markbirbeck]

…they might present that too, but also use the data to draw new conclusions, or use it for advocacy.

09:46:17 [markbirbeck]

…A further group is that of the tool providers, who will help us process this data.

09:46:40 [markbirbeck]

…About 30% of the room are providers...

09:46:53 [markbirbeck]

…80% are processors...

09:47:23 [markbirbeck]

…most are consumers, and some are tool providers.

09:48:11 [markbirbeck]

…PDF will be 20 years old this June.

09:48:23 [cjg]

cjg has joined #odw

09:48:24 [markbirbeck]

…PDF and Acrobat are different beasts.

09:48:57 [markbirbeck]

…The internals of PDF have always been published, and it became an ISO Standard in 2008.

09:49:06 [PhilA]

PhilA: Nice approach to backwards compatibility from Adobe for PDF

09:49:18 [markbirbeck]

…A PDF 1.0 doc is also a 1.7 doc — always backwards compatible.

09:49:27 [bhyland]

Jim King: PDF will be 20 years old this June. PDF 1.7 became an ISO Standard in July 2008. ISO work on PDF is ongoing.

09:49:40 [edsu]

hopefully mozilla's pdf.js will get a mention ...

09:50:55 [markbirbeck]

…To make the PDF spec into a 'proper' ISO Standard the team at Adobe had to go through the entire document…very thoroughly…

09:51:12 [amp]

amp has joined #odw

09:51:31 [markbirbeck]

…PDFs are abundant, containing lots of useful information.

09:51:50 [cjg]

I had surprisingly good results converting our student union committee minutes from PDF to RDF: http://lemur.ecs.soton.ac.uk/~cjg/TheyWorkForSUSU -- just looking at where on the page text appears gives more semantics than the naive pdf2utf8 (or 2html) approach.

09:52:12 [markbirbeck]

…It's a format that distinguishes between text and graphics, and can be used to produce good looking documents.

09:52:22 [markbirbeck]

…But it's not a data format.

09:52:27 [edsu]

cjg: i think that's roughly what google scholar does when it scrapes pdfs

09:52:55 [bschloss]

bschloss has joined #odw

09:53:15 [markbirbeck]

…Billions of documents out there, but difficult to extract any data that's in there.

09:53:38 [edsu]

cjg: grabbing the largest text at the top of the first page as the title
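A minimal sketch of the heuristic cjg and edsu describe above: use where text sits on the page, and how big it is, to guess the title. It assumes some PDF extraction tool has already produced text spans with a font size and a vertical position; the `TextSpan` type, the `guess_title` helper and the sample page are all invented for illustration and are not taken from any tool mentioned in the session.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    """One run of text from a page (hypothetical extractor output)."""
    text: str
    font_size: float  # in points
    y_top: float      # distance from the top of the page; smaller means higher up


def guess_title(spans):
    """Return the text of the largest-font span; ties go to the span nearest the top."""
    if not spans:
        return None
    best = min(spans, key=lambda s: (-s.font_size, s.y_top))
    return best.text


# Invented first page: a running header, a large heading, and body text.
first_page = [
    TextSpan("Students' Union Committee", 11.0, 40.0),
    TextSpan("Minutes of the March meeting", 18.0, 90.0),
    TextSpan("Present: A, B, C", 10.0, 140.0),
]

print(guess_title(first_page))
```

A plain text dump would lose both the size and the position, which is why the layout-aware approach can recover more structure than a naive conversion.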

09:53:51 [markbirbeck]

…If pages *contain* graphics then extract that with something like Illustrator.

09:54:09 [markbirbeck]

…If pages are text then there's a bunch of software that can process the text.

09:54:23 [markbirbeck]

…(A big list is on Wikipedia.)

09:54:37 [bschloss]

There is a 'spectrum of open data' -- totally free, available forever, no recording of downloader is one end of that spectrum, but airlines, investment markets, sports leagues, available job listing websites, retailers are all doing open data on a slightly different point on the spectrum.

09:54:50 [markbirbeck]

…And if the pages are images (i.e., rather than *containing* images) then need to go the OCR route.

…If you're making PDFs, here's what you could do to make things easier.

09:57:37 [markbirbeck]

…Making files that both contain raw data and look good is difficult.

09:58:32 [markbirbeck]

…There *is* software around that can embed metadata to provide structural information.

09:58:40 [AndreaP]

AndreaP has joined #odw

09:59:09 [bschloss]

Seems to me that any producer of a PDF who wants it to be available to people with no sight is hopefully providing a table or textual alternative rendering in the PDF for any diagram or image in the PDF, yes?

09:59:17 [markbirbeck]

…The structural information would be stuff like reading order, tags such as headers, footnotes, figures, maths, and so on.

09:59:42 [markbirbeck]

…Tools can make use of this extra data which will make the extraction process much more reliable.

10:00:13 [markbirbeck1]

markbirbeck1 has joined #odw

10:00:25 [markbirbeck1]

…A second thing to do is make use of the attachment facility.

10:00:37 [markbirbeck1]

scribenick: markbirbeck1

10:01:08 [lottebelice]

lottebelice has joined #odw

10:01:27 [markbirbeck1]

…Raw data on its own is probably insufficient for doing something useful.

…PDF can be used well and powerfully, and of course it's clear that some people aren't using it well.

10:07:08 [edsu]

heh, re: billions of dollars worth of information that's unusable, you have to wonder if that's by design, not by accident ...

10:07:14 [markbirbeck1]

…You didn't mention XMP, though, which includes RDF.
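Since XMP is serialised as RDF/XML, the Dublin Core fields of a packet can be read with an ordinary XML parser. The sketch below assumes the packet has already been pulled out of the PDF (the extraction step and the surrounding `<?xpacket ...?>` wrapper are omitted); the packet content and the `dc_values` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the lookup below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented example packet in the usual XMP shape: RDF/XML inside x:xmpmeta.
XMP_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">Example report</rdf:li></rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq><rdf:li>Example Author</rdf:li></rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def dc_values(packet, prop):
    """Return every rdf:li value found under the given Dublin Core property."""
    root = ET.fromstring(packet)
    return [li.text for li in root.findall(f".//dc:{prop}//rdf:li", NS)]


print(dc_values(XMP_PACKET, "title"))
print(dc_values(XMP_PACKET, "creator"))
```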

10:07:24 [markbirbeck1]

…You also didn't mention accessibility.

10:07:44 [bhyland]

Peter Murry-Rust - Scientific publishers are paid $10B/yr worldwide to lock up scholarly publishing, that is after governments spend $100B/yr globally on scientific funding for R&D in the first place. He is looking for people to help him in his mission to unlock the enormous value locked in PDFs.

10:08:45 [serena_v]

serena_v has joined #odw

10:08:46 [bhyland]

s/Murry-Rust/Murray-Rust

10:09:08 [markbirbeck1]

James: The accessibility aspects are quite mature in PDF, and the structured aspects help that.

10:09:13 [roger]

roger has joined #odw

10:09:15 [StevenPemberton]

PDF is a page description language, so not in a reading order necessarily

10:10:04 [markbirbeck1]

…We don't have much control over what people produce, although things have improved in the last 5 years.

10:10:45 [bhyland]

@edsu - perhaps re: your comment above. My experience suggests that we're more thoughtful publishing structured data about data sets (metadata) because they are fewer in quantity whereas PDF are like water, they are everywhere and almost "too easy" to create but the mere click of "Print —> PDF" …

10:11:03 [bhyland]

s/but the/by the

10:11:07 [markbirbeck1]

speaker: For many people PDF data is closed data.

10:11:31 [yaso]

yaso has joined #odw

10:11:55 [markbirbeck1]

speaker2: You've outlined many things I didn't know were possible, so why is there not the uptake on these features?

10:12:45 [bhyland]

@hadleybeeman - because the tools are proprietary, complex to use … at least harder than clicking "Print —> PDF" and well let's face it, people are lazy and hand entered metadata has been proven to be *very* challenging and highly inconsistent.

10:13:36 [markbirbeck1]

James: Not sure if it's our fault. In some areas there have been successes, perhaps where there's industry interest or our sales people have promoted a feature.

10:13:52 [StevenPemberton]

s/speaker2/hadleybeeman/

10:14:07 [markbirbeck1]

s/speaker2/HadleyBeeman/

10:14:21 [markbirbeck]

markbirbeck has joined #odw

10:15:08 [alex]

If they want stuff like metadata to be adopted, then surely they need to encourage support in tools other than their own (OpenOffice; Word)

… We can & should learn from the Free Software Foundation. Supports giving people the ability to make copies of content. Highlighted the importance of content being portable to many other computer systems, both on & off the Web, for it to be considered truly open.

11:00:34 [bhyland]

Panel Convener is Jeni … She puts the following question to Rufus. Q) There is debate on how to manage metadata, to embed or not ...

Rufus: Regarding embedding, it almost becomes an AI project to figure out metadata that is embedded. It can be a nightmare. The beauty of keeping it separate is it is easier on tools & therefore treatment by tools. He is supportive of graceful degradation.

11:03:44 [DeirdreLee]

DeirdreLee has joined #odw

11:04:09 [bhyland]

Tyng-Ruey Chuang: Prefers to have structured schema as part of the data (?)

11:04:59 [bhyland]

Omar: Mainly, the important thing is to get agreement on format, then all kinds of good things can happen. Linking tables & metadata to Web pages (authoritative) is really important.

11:05:37 [bhyland]

Stuart: We've been using this word "metadata" which leads us to schema information. In RDF world, we can click through to it & immediately see it.

11:06:14 [bhyland]

… Using RDF model, you don't have to scramble all over the Web, rather, you get bits of schema info back because it is carried *with* the data.

11:07:15 [bhyland]

… Highlighted the perils of carrying so much provenance information that it drowns out the important data itself.

11:07:17 [cjg]

Quite simply, tabular data requires a lower cognitive load to work with. Most people can't be bothered to learn to think in graphs. So tabular is more open because it's easier to comprehend.

11:07:34 [edsu]

aside: embedded metadata (facebook opengraph, schema.org) is getting published because it is getting used

11:07:47 [HadleyBeeman]

cjg I wonder how much of that is because our computer science training wasn't very graph-focused. Next generation might be different?

11:07:54 [edsu]

i don't buy the argument that it needs to be separate ...

11:08:13 [bhyland]

Questions from the audience ...

11:08:59 [bhyland]

Ivan: When we speak of metadata, my biggest issue is what vocabularies to use. It is the biggest problem we have to solve, even more important than the data format/model … if we had widely available vocabularies, it would solve many problems.

11:09:13 [cjg]

HadleyBeeman: I'm talking about the people who maintain my data. They are *not* computer scientists… they are in finance, buildings & estates, catering...

Rufus: If you meet most developers, and start talking about vocabularies, "they'll run for the hills." Been part of countless long fights over which vocab to use. Suggested a new site called http://GiveMeTheDamnSchema.org as a joint project of cygri and Rufus ;-)

11:10:23 [HadleyBeeman]

cjg: Ah, I see. Yes, different user base there.

11:10:55 [cjg]

I went to see what they already had, tidied it all up in excel and moved it to google spreadsheets so it was easy to grab automatically.

11:11:15 [andyhedges]

andyhedges has joined #odw

11:11:15 [bhyland]

… What is the minimum to make CSV files useful. Just give me the basics, string, integer. This is *our* problem, not publishers. I'm all about 'reducing the time' … open vs. closed data.
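One way to picture "just the basics, string, integer" is a CSV file accompanied by a tiny per-column type map. The sketch below is only an illustration of that idea: the `COLUMN_TYPES` descriptor, the `read_typed_csv` helper and the sample rows are hypothetical and do not follow any particular packaging specification.

```python
import csv
import io

# The only two types asked for in the discussion; anything richer is out of scope here.
CASTS = {"string": str, "integer": int}


def read_typed_csv(text, column_types):
    """Parse CSV text and cast each cell using a {column name: type name} map."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        rows.append(
            {name: CASTS[column_types[name]](value) for name, value in row.items()}
        )
    return rows


# Invented data and descriptor.
CSV_TEXT = "building,rooms\nLibrary,42\nGym,7\n"
COLUMN_TYPES = {"building": "string", "rooms": "integer"}

for row in read_typed_csv(CSV_TEXT, COLUMN_TYPES):
    print(row)
```

Keeping the descriptor this small means a publisher can write it by hand, which fits the "ease of publishing" priority raised in the same answer.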

11:11:25 [edsu]

problem hasn't been schemas per se, as much as it has been schemas divorced from their actual use

11:11:28 [bhyland]

… Licensing is a lower priority for many.

11:11:40 [bhyland]

… Ease of publishing is king

11:12:00 [cjg]

Also, I want to create a collection of SPARQL queries which produce useful spreadsheet downloads for humans to consume. Secretaries are a whizz with Excel, but only if the file loads first time. Telling them TSV can be "easily imported" is already outside their comfort zone.
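A rough sketch of the "SPARQL query in, spreadsheet out" idea, assuming the endpoint returns the standard SPARQL JSON results format (`head.vars` plus `results.bindings`). The query and the HTTP call are left out; `SAMPLE` is an invented result set and `sparql_json_to_csv` is a hypothetical helper name.

```python
import csv
import io


def sparql_json_to_csv(results):
    """Flatten SPARQL JSON results into CSV text, one column per query variable."""
    variables = results["head"]["vars"]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(variables)
    for binding in results["results"]["bindings"]:
        # Unbound variables are simply absent from a binding; write them as empty cells.
        writer.writerow([binding.get(v, {}).get("value", "") for v in variables])
    return out.getvalue()


# Invented result set with one unbound value and one value containing a comma.
SAMPLE = {
    "head": {"vars": ["room", "capacity"]},
    "results": {
        "bindings": [
            {
                "room": {"type": "literal", "value": "Seminar Room 1"},
                "capacity": {
                    "type": "literal",
                    "value": "30",
                    "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                },
            },
            {"room": {"type": "literal", "value": "Lab, ground floor"}},
        ]
    },
}

print(sparql_json_to_csv(SAMPLE))
```

Comma-separated output with proper quoting is the variant most likely to open on a double-click; whether that holds for a given spreadsheet program and locale would need checking.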

11:12:01 [bhyland]

… Our mission is to reduce the cost & RDF, at the moment, is not doing that.

11:12:59 [bhyland]

Omar: If we want to bring data together, we have to harmonize into a common model. I don't know whether developers should have to be encumbered with that responsibility. But it is a real problem to solve.

11:13:51 [bhyland]

Bhyland notes, (not in a comment), there is a wide spectrum of opinions in the room & that is good to stimulate that discussion. Deepening understanding is key to all of this.

11:14:13 [AndyS]

AndyS has joined #odw

11:14:34 [bhyland]

Stuart: Finding the stuff in the first place, with schematic markup answering provenance information, is critical to solving the hurdles we face with better use of open data on the Web.

11:15:18 [alex]

cjg: I played with SharePoint/Excel integration yesterday, and it looks like you can get Excel to live-update from SharePoint lists; I suppose something similar could be done with s/SharePoint/SPARQL endpoint/

11:15:35 [bhyland]

John Snelson: Vocabularies have their place, but search is a great way to find data that is not expressed perhaps as nicely as we'd like...

11:15:36 [alex]

then SPARQL would be truly Enterprise™

11:15:43 [masao]

masao has joined #odw

11:16:42 [alex]

it would also be possible to embed the metadata for a table in a second sheet of an XLSX/ODS file, instead of prepending it to a CSV file

11:16:46 [cjg_]

cjg_ has joined #odw

11:16:52 [bhyland]

Questions from the mob: You've got to help represent/model data, but that is not the entire story. It is a "horses for courses" kind of thing. Please be careful not to reinvent RDF with JSON glasses on.

11:16:55 [pascalRomain_]

pascalRomain_ has joined #odw

11:18:17 [bhyland]

IBM guy - Dealing with data is hard. It is harder than process. We won't solve problems with data exchange standards alone. One thing we haven't heard about today is Best Practices and Architectural processes. We need to rise above data formats and really focus on data patterns, best practices.

11:18:22 [cjg]

I have this horrific image of people creating n-triples documents in Excel...

Omar: "I think we've been spoiled by the Web" because search engines have done a good job. The question is, can we make this Web of Data thing work such that we publish our metadata & data and have it easily found. This is the question.

11:22:57 [pieterc]

cjg: spreadsheets are for calculations, not data. CSV is a format which people use with spreadsheet programs, thus not suited for the job. Got your point?

11:23:43 [bhyland]

Peter Murray-Rust: To Omar - what do you do with things that are labelled as tables but really are not tables?

11:23:50 [cjg]

yeah, maybe we need a nice "CSV" editor?

11:24:04 [cjg]

Or even a "table" editor, using PMR's description.

11:24:07 [bhyland]

Omar: Smart people are working on it … it's complicated.

11:24:22 [pieterc]

cjg: thought of it as well already

11:24:36 [cjg]

basically a cut-down google docs.

11:24:46 [pieterc]

cjg: open refine? ;)

11:24:47 [BartvanLeeuwe]

BartvanLeeuwe has joined #odw

11:24:50 [bhyland]

John Snelson: Need to be able to break out & work with data in a schema-less fashion.

11:25:04 [cjg]

with a magic table heading

11:25:39 [bhyland]

John Sheridan asked, in the world of tables & CSVs and [screw the metadata], how are you prepared to deal with the license matter?

11:26:39 [bhyland]

Rufus: I didn't say, 'screw the metadata'. Rather, we need simplicity and innovation about process. He suggested having multiple parties be part of the "packaging process".

11:28:02 [bhyland]

… Clearly a license has to come from an authoritative source. Gave example about data from Bank of England. Two important points, we need minimal metadata and … [someone else augment please, scribe missed second point]

11:28:12 [cjg]

*if* the source of the metadata is the same website as the data then that's probably good enough for me.

11:28:47 [bhyland]

Wrap up from panelists - 'wear your data on the outside, use HTTP URIs to describe things if putting on the Web.'

11:29:00 [alex]

s/data/schemas/

11:29:16 [bhyland]

John: Great opportunity for tool developers to liberate data.

11:29:23 [StevenPemberton]

Topic: Lightning Talks with a linked data theme

11:29:28 [StevenPemberton]

Scribe: Steven Pemberton

11:29:30 [bhyland]

End of panel facilitated by Jeni. Thanks all.

11:29:35 [StevenPemberton]

scribenick: StevenPemberton

11:29:36 [pieterc]

I have a problem with the fact that the data can be processed with quick bash scripts, or other low-barrier scripting languages, but the metadata needs a JSON parser

Since Open Data is a means to several valuable ends, IBM is talking to our clients about thoughts of "becoming a Contextual Enterprise" and we emphasize the critical need to dynamically assemble context for every key input and output of their work, including the context of external data they import. See http://www.research.ibm.com/files/pdfs/gto_booklet_executive_review_march_12.pdf for very high-level summary of our recently released Global Technology Outlook.

11:51:26 [StevenPemberton]

... GTFS/DSPL formats

11:51:43 [StevenPemberton]

Topic: Linked Data at the Science Museum, Tristan Roddis, Cogapp

11:52:04 [StevenPemberton]

Tristan: We work with cultural heritage. Will talk about science museum now

11:52:11 [rtroncy]

rtroncy has joined #odw

11:52:11 [StevenPemberton]

... also a plea for help

11:52:36 [StevenPemberton]

... Science Museum is august and venerable, with loads of internal systems, we are trying to consolidate them

11:52:55 [rtroncy]

rtroncy has joined #odw

11:53:11 [StevenPemberton]

... we extract, and convert to linked data

11:53:20 [StevenPemberton]

... triple store

11:53:45 [pieterc]

rtroncy: how active is the development of Datalift? I haven't seen a lot of activity on the SCM

11:53:57 [StevenPemberton]

... built a data model, in cooperation with British Library, British Museum [others], see the paper

11:54:09 [StevenPemberton]

... use that to drive the website

11:54:29 [StevenPemberton]

... my plea for help is what should be the next steps

11:54:45 [StevenPemberton]

... how can we make it more open?

11:55:02 [StevenPemberton]

... Publication strategies, stable URIs, dereferenceable etc

11:55:15 [StevenPemberton]

... IS the data model interoperable

11:55:21 [StevenPemberton]

s/IS/Is/

11:56:02 [StevenPemberton]

Topic: Open Linked Education: a new Community Group, Madi Solomon, Pearson

11:56:23 [StevenPemberton]

Madi: I am new to W3C, and open linked data devotee

11:56:51 [StevenPemberton]

... Pearson is a publishing company, owns Financial Times and some Penguin books.

… Key part of this: the annotations. They're in the OData spec. Defined for: URI namespace for entity primary keys, URIs for entity types, properties and directionality of links

13:14:35 [naomi]

naomi has joined #odw

13:14:46 [bhyland]

bhyland has joined #odw

13:15:03 [HadleyBeeman]

… Annotations are visible to the consumer, mappings done against the SPARQL endpoint are visible

13:15:30 [HadleyBeeman]

… Allows you to reconstruct the source triples you've just queried, if you'd ever want to.

13:16:24 [HadleyBeeman]

… Implementation issues: Our naive approach: if you ask for an entity, a DESCRIBE will give you what you want. It was too unspecified, so you have to use CONSTRUCT, which led to sorting and identification issues.

13:16:31 [roger]

roger has joined #odw

13:17:03 [HadleyBeeman]

… OData allows the server to do paging. If there's been a server-side limit imposed, you don't know that.

13:17:43 [HadleyBeeman]

… Biggest implementation issue: because we're turning primary keys into URI identifiers, every entity in the entity set has to have the same base URI. Not a problem in most cases, but potentially.
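To make the base-URI constraint concrete, here is a hypothetical mapping between primary keys and URIs. It is not the implementation being presented; `FILM_BASE`, `key_to_uri` and `uri_to_key` are invented names, and the point is only that a key can be recovered from a URI when every entity in the set shares one base.

```python
from urllib.parse import quote, unquote


def key_to_uri(base_uri, key):
    """Build an entity URI by appending the percent-encoded primary key to the base."""
    return base_uri + quote(str(key), safe="")


def uri_to_key(base_uri, uri):
    """Recover the primary key; fails for resources that live under a different base."""
    if not uri.startswith(base_uri):
        raise ValueError(f"{uri} is not under {base_uri}")
    return unquote(uri[len(base_uri):])


FILM_BASE = "http://example.org/film/"  # hypothetical base URI for one entity set

uri = key_to_uri(FILM_BASE, "Blade Runner")
print(uri)
print(uri_to_key(FILM_BASE, uri))
```

A resource of the same type published under another base URI would raise `ValueError` here, which is the "not a problem in most cases, but potentially" situation described above.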

13:17:58 [HadleyBeeman]

… [Example query to select a simple film]

13:18:07 [pieterc`]

pieterc` has joined #odw

13:18:10 [HadleyBeeman]

… [Example query to enumerate films]

13:18:49 [HadleyBeeman]

… [example query to show property navigation]

13:19:09 [jpcs1]

jpcs1 has joined #odw

13:19:49 [HadleyBeeman]

… That's all leading up to a bunch of questions. First, and the one I'm most interested in discussing here: how important does the group think interoperability between standards is? Do standards need to interoperate? Do different standards bodies' standards need to interoperate? Whose responsibility is it?

13:20:15 [francois]

francois has joined #odw

13:20:49 [AndreaP]

AndreaP has joined #odw

13:20:51 [HadleyBeeman]

… More questions: what could the W3C LDP WG learn from OData and vice versa. OData changed in response to feedback/requirements. Now on third iteration… Should these requirements and use cases be shared between groups?

13:21:34 [HadleyBeeman]

…

13:22:01 [HadleyBeeman]

… Finally, is there a shared meta-model for entity-oriented view of data resources between the two?

13:22:14 [HadleyBeeman]

LeighDodds: Do you have a sense of uptake?

13:22:36 [JeniT]

(uptake of OData)

13:22:48 [HadleyBeeman]

Kal: hard to tell because search discovery of OData endpoints is hard. Probably more not visible to the Web than those that are.

13:23:09 [bschloss]

[I think the SAP ERP platform, recent version, has APIs to get information as ODATA]

13:23:41 [pieterc`]

pieterc` has joined #odw

13:23:46 [HadleyBeeman]

ivan: There have been several attempts to get these groups together. For all kinds of personal reasons, it did not work out. There is a community group at W3C on OData vs RDF; the group is silent, empty.

13:24:05 [HadleyBeeman]

Kal: It shouldn't be "OData vs RDF". They should be coexistent and work together.

13:24:42 [yaso]

yaso has joined #odw

13:24:47 [bhyland]

My question is (and I'm not being snarky or flip), why OData? Isn't this MS trying to redo RDF? RDF has matured and is well-documented. It is not perfect & use is far from ubiquitous however, why fragment?

Neil: I'll focus more specifically on health and health sensor data. I've recently joined this group, and this is one of the projects we're working on.

13:26:37 [HadleyBeeman]

… We're working on a cloud platform for large-scale graph storage. Public and private data. That seems to be a tension that is coming across throughout today. Therefore, Linked (Open|Closed) (Big) Data

… We've been working with DERI on a CKAN-like Linked Data Global Repository. Faster and more searchable.

13:27:30 [HadleyBeeman]

… We're also involved in the W3C LDP WG

13:28:43 [HadleyBeeman]

… With the University of Singapore, we've been working on health care sensors. Temperature monitor, heart rate monitor, establish patient history. Challenge: how to combine sensor data with patient specific data from their health record, which might be different to medical best practice, clinical recommendations, etc?

13:29:16 [HadleyBeeman]

… We're making this sensor data linkable - 10m triples per person per week, for example - standardise, and link to data about effective drugs.
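As a rough illustration of what "linkable" sensor data could look like, the sketch below turns one heart-rate reading into three N-Triples lines and works out the average rate implied by the figure above, reading "10m" as ten million. The `http://example.org/vocab/` terms, the URIs and the reading itself are placeholders, not the project's actual vocabulary or data.

```python
from datetime import datetime, timezone

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/vocab/"  # hypothetical vocabulary namespace


def reading_to_ntriples(reading_uri, patient_uri, bpm, when):
    """Serialise one heart-rate reading as three N-Triples statements."""
    ts = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        f"<{reading_uri}> <{EX}patient> <{patient_uri}> .",
        f'<{reading_uri}> <{EX}heartRate> "{bpm}"^^<{XSD}integer> .',
        f'<{reading_uri}> <{EX}observedAt> "{ts}"^^<{XSD}dateTime> .',
    ]


# Ten million triples spread evenly over one week, per person.
TRIPLES_PER_SECOND = 10_000_000 / (7 * 24 * 3600)

lines = reading_to_ntriples(
    "http://example.org/reading/1",
    "http://example.org/patient/42",
    72,
    datetime(2013, 1, 1, tzinfo=timezone.utc),  # arbitrary timestamp
)
print("\n".join(lines))
print(round(TRIPLES_PER_SECOND, 1), "triples per second per person")
```

At three triples per reading, that rate corresponds to several readings per second per patient, which gives a sense of why the storage platform matters.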

13:29:45 [HadleyBeeman]

… Announced in Nov, just working out how to do this. Open, closed and anonymisable data involved.

13:30:51 [HadleyBeeman]

… We are handling temporal data and binary data. Do we want to convert binary sensory data, with an established community of tools, into RDF? Maybe not. If not, how to work with the binary and the (other) linked data?

13:31:05 [HadleyBeeman]

… These things keep me… well, not awake at night, but certainly busy during the coffee break.

13:31:19 [floppy]

floppy has joined #odw

13:32:03 [masao]

masao has joined #odw

13:32:10 [HadleyBeeman]

… Non-technical challenges: main motivator for this paper: most open health data is on hospital numbers, costs of services, etc. But these are questions for policy makers; not as much emphasis on medical research.

13:32:42 [yaso_]

yaso_ has joined #odw

13:32:57 [HadleyBeeman]

… Found data on ECG and HBR stuff… but not as much emphasis of having a "broad church" of open medical health care data to generate further epidemiological and clinical research.

13:33:30 [HadleyBeeman]

… Generating these datasets is labour-intensive. One researcher said teams of researchers working on a dataset would be useful… How to do on the Web?

13:33:35 [floppy1]

floppy1 has joined #odw

13:33:57 [HadleyBeeman]

… Could be that we have more administrative hospital data than clinical data because it's easier to lobby governments than universities and researchers?

13:34:50 [HadleyBeeman]

… There still isn't much best practice on this. Vocabularies, dataset engineering patterns. We have patterns for building modular software… is there an equivalent here?

13:35:19 [HadleyBeeman]

… Ex: There is an ECG ontology I came across… should I use it?

13:35:44 [HadleyBeeman]

Questions

13:36:09 [markbirbeck]

markbirbeck has joined #odw

13:36:14 [HadleyBeeman]

BillR: You should look at Linked Data Patterns, LeighDodds is one of the authors

Albert: We work with historical censuses, encoded in thousands of .xls spreadsheets. We would like to uniformly query them, but they are extremely messy. We'd like to transform them into RDF Data Cube and other vocabularies using SPARQL queries?

13:38:44 [HadleyBeeman]

Question: Bob Schloss: The value we seem to be talking about is mashups between datasets with unexpected results. Mapping was one of the first join points. What other join points do you see and do you agree this is critical?

13:38:51 [BartvanLeeuwen]

BartvanLeeuwen has joined #odw

13:39:15 [markbirbeck1]

markbirbeck1 has joined #odw

13:39:55 [HadleyBeeman]

Kal: Yes, I agree. Increasingly, I see a lot of time-series value type data, sets combined in a way to expose latent knowledge. Biggest problem is vocabulary interoperability. OData doesn't have them so we can't do conceptual joins with data tagged with different systems.

13:40:07 [rszeno]

rszeno has joined #odw

13:40:18 [HadleyBeeman]

Bob: Let's reuse the requirements gathered from XBRL in the Financial industry. They do have publicly listed businesses.

13:41:21 [HadleyBeeman]

Neil: Open data is administrative, government-driven. People want to answer local questions, so that has driven a lot of the applications. But in that healthcare example, it's not geographically-specific. New disease patterns may not be tied to parts of a city.

13:41:43 [lottebelice]

lottebelice has joined #odw

13:42:26 [HadleyBeeman]

… With regard to the vocabularies question… I don't want to learn about all the vocabs out there. In the same way I can modularly take a bit of a software library to see what's in it, I'd like to do the same with a vocabulary. I want to conceptualise my data first, and modularly pick a vocabulary.

13:42:47 [HadleyBeeman]

Kal: The individual is an interesting join-point. For governments and otherwise.

13:42:59 [rszeno]

rszeno has left #odw

13:43:22 [roger]

roger has joined #odw

13:43:24 [HadleyBeeman]

Albert: In some domains, historical data is so badly degraded… and it may not have been intended to be comparable.

13:43:58 [rtroncy]

rtroncy has joined #odw

13:44:29 [HadleyBeeman]

TomHeath: Re data engineering patterns: we do need to go further than Leigh's book. Hack-y stuff (download, GREP, etc), ad-hoc processes. Things going on in the Hadoop community to describe these processes

13:44:59 [HadleyBeeman]

Neil: The term dataset engineering patterns… [coining a new phrase]

13:45:01 [markbirbeck]

markbirbeck has joined #odw

13:45:47 [HadleyBeeman]

Michael (from the EC): to Neil: re the link between closed/sensitive/open data… Are you looking at aggregated personal data that then can be opened? As in other areas of sensitive public data

13:46:48 [HadleyBeeman]

Neil: we don't quite have a generic process for anonymising sensitive data. Some organisations do that… I'm just in the early stages of learning the issues around that.

13:47:33 [HadleyBeeman]

questionasker?: concerned about applying the label of "open data" to data that's locked behind a query API. Do you share my concerns?

13:48:33 [HadleyBeeman]

Kal: OData entity set that conforms to the standard is enumerable… It's an ATOM feed with Next links in it. You can download it. Also, a data dump isn't any better — you're relying on the server's capacity to provide the data and the data being up to date.
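Kal's point that a conforming entity set is enumerable can be sketched as a loop that keeps following the Atom `rel="next"` link until a page has none. The two feed pages below are invented and held in memory so the example runs offline; `fetch` stands in for a real HTTP GET, and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Invented two-page feed; the server chose the page size, the client just follows links.
PAGES = {
    "http://example.org/odata/Films": """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>http://example.org/odata/Films(1)</id></entry>
  <entry><id>http://example.org/odata/Films(2)</id></entry>
  <link rel="next" href="http://example.org/odata/Films?$skiptoken=2"/>
</feed>""",
    "http://example.org/odata/Films?$skiptoken=2": """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>http://example.org/odata/Films(3)</id></entry>
</feed>""",
}


def fetch(url):
    """Stand-in for an HTTP GET that returns the feed document as text."""
    return PAGES[url]


def enumerate_entity_ids(start_url, fetch_page=fetch):
    """Collect entry ids from every page, following rel="next" links to the end."""
    ids = []
    url = start_url
    while url is not None:
        feed = ET.fromstring(fetch_page(url))
        ids.extend(entry.findtext(f"{ATOM}id") for entry in feed.findall(f"{ATOM}entry"))
        next_links = [
            link.get("href")
            for link in feed.findall(f"{ATOM}link")
            if link.get("rel") == "next"
        ]
        url = next_links[0] if next_links else None
    return ids


print(enumerate_entity_ids("http://example.org/odata/Films"))
```

The loop never needs to know the server-side page limit, though it still depends on the server staying up for the whole walk, which is the caveat Kal raises about dumps as well.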

13:48:44 [HadleyBeeman]

… I can see your point but I think it applies to all open data.

13:49:04 [HadleyBeeman]

questionasker?: If I were going to mortgage my house to fund a startup on this data, I would see this as a problem.

Johnlsheridan: It's 2020 and we've seen the failure of the world's first multibillion dollar open data corporation. How did this happen?

14:52:36 [yaso]

Yes, I'm having connection problems

14:53:08 [HadleyBeeman]

Conor Riffle: We've been looking at lots of business models. Sponsorship would be hard to scale to that level.

14:53:15 [yaso_]

yaso_ has joined #odw

14:53:31 [HadleyBeeman]

… Also look at people like Google who make tons of apps and sell ads on that.

14:54:01 [HadleyBeeman]

JohnLsheridan: which of the eight business models Michele has identified could scale to that level?

14:54:50 [yaso__]

yaso__ has joined #odw

14:55:07 [HadleyBeeman]

Miguel: Usually, all the four actors are able to manage a huge amount of data. We have some enablers - usually they are scalable - but they do not serve end users. They're in a wholesale position in the value chain. Examples: Microsoft, Socrata.

14:55:27 [HadleyBeeman]

… Many of them have other business lines, even outside the boundary of public sector information.

14:56:31 [HadleyBeeman]

Irina: I think you'd want lots and lots of smaller companies, not one big one. As small music app companies are threatening the big distributors, a big company doesn't fit.

14:57:10 [yaso]

yaso has joined #odw

14:57:16 [HadleyBeeman]

Bart: The Fire Department wants to be the authoritative source of information. They won't make a business out of it, but they will engage to have usable data.

14:57:50 [HadleyBeeman]

Michele: Risk to opening up data… fear of losing control. But benefit: they will be seen as the authoritative source. We see both.

14:58:19 [HadleyBeeman]

Lotte: open data can bring big benefits to companies.

14:58:45 [yaso_]

yaso_ has joined #odw

14:59:39 [HadleyBeeman]

questionasker?: Do we all agree that we should build public infrastructure, basic datasets to build business models on top of… If we don't do it fast, a big multi-billion company maybe wants to become a public infrastructure provider? Or the market will collapse and transform in another way. We, as a community, need to identify the basic datasets which will be the "streets" of open data.

14:59:59 [HadleyBeeman]

JohnLsheridan: What are the basic datasets of interest for fire services?

15:00:21 [HadleyBeeman]

Bart: Address data. Real streets. We don't have "highways" for open data yet; we have "rural roads."

15:00:49 [HadleyBeeman]

… Large companies taking over scares the Fire departments as well. "What if a company over in America is holding our data?" An important discussion to have.

15:01:05 [HadleyBeeman]

Johnlsheridan: Do you see CDP becoming that sort of infrastructure provider?

15:02:02 [HadleyBeeman]

Conor: I think we are. Especially where companies are contributing pollutants to that atmosphere, it impacts all of us. But we see it's useful where people can make money out of it. Investors will use it. But there's more to do with it. We need a hybrid model: some monetisable, some open.

15:03:11 [HadleyBeeman]

Bernadette: I'd recast the question: It would give me great joy if, next year, there were 20 for-profit (not grant-funded) companies of 10-100 people, with $2-20m in gross revenues, using this technology to share information. We don't need yet another social network or cow-tipping site.

15:03:47 [HadleyBeeman]

… If they are venture-funded, it would be with a social enterprise angle.

15:04:31 [HadleyBeeman]

Chris Metcalf: In the US, I feel like we're seeing the steam come out of pure open data. We need to show the benefits, which are often business. We work with small businesses to do that. We need to focus on that in the community.

15:05:44 [HadleyBeeman]

Bob: Infrastructure isn't always provided by regulators, grant makers and hackers/coders. It's sometimes created by lawyers and judges. I think some orgs and agencies are hesitating to publish open data because they're afraid of inaccurate records and resulting harm and subsequent lawsuits. We may need some case law to determine this.

15:05:51 [bhyland]

bhyland has joined #odw

15:05:53 [HadleyBeeman]

… To Conor: because your data can impact stock price, do you have T&Cs to cover that?

15:06:48 [yaso]

yaso has joined #odw

15:07:02 [HadleyBeeman]

Conor: We do have cleverly-written T&Cs. Many, many companies agree to them. Other orgs can learn from our lessons: we don't own the data submitted to us.

15:07:44 [HadleyBeeman]

… To Chris: Yes, we need to create value from things built on public data, but also as a provider: how can we increase the value all along the chain?

15:08:44 [HadleyBeeman]

Michele: What we see: one of the benefits is people correcting data and pushing it back to the publisher. Enhancing it, geotagging, improving our metadata.

15:09:32 [HadleyBeeman]

… There was a company who wanted to make money out of the data, and we want them to succeed. But this is a public sector answer, I realise.

15:09:44 [DeirdreLee]

DeirdreLee has joined #odw

15:10:16 [HadleyBeeman]

Lotte: Do not forget SMEs like ours: manufacturers, consulting services, pharmacies… they are the ones who will recreate the value in the data.

15:10:21 [albertm`]

albertm` has joined #odw

15:10:30 [HadleyBeeman]

… New standards, new protocols, new releases, new things.

15:11:15 [HadleyBeeman]

questionasker?: This isn't a level playing field. In the development of the Web, it's a case of survival of the fittest, driven by quality, quantity and cost.

15:11:55 [HadleyBeeman]

… Chances are high that whoever that company is in the future, they are here today. I'm hearing that open data should be a communal type where everyone has a chance. Those at the front will probably stay there; this is a call to them to maintain the lead.

15:12:10 [AndreaP]

AndreaP has joined #odw

15:12:31 [HadleyBeeman]

s/questionasker/phil tetlow

15:12:33 [AndyS]

AndyS has joined #odw

15:12:52 [HadleyBeeman]

questionasker?: Can we learn from the open source business models?

15:13:02 [JeniT]

s/questionasker/Thijs/

15:13:11 [HadleyBeeman]

Miguel: Yes, one of our models is called "open source like".

15:13:22 [JeniT]

s/Miguel/Michele/

15:14:33 [HadleyBeeman]

… where reusers do not pay. As with OpenCorporates, licences allowing non-commercial reuse.

15:14:56 [HadleyBeeman]

Conor: Ask: How did the open source software people monitise it? A lot of them got burned.

15:15:19 [HadleyBeeman]

Thijs: Training, consultancy,

15:15:56 [HadleyBeeman]

Bart: In the Netherlands, the interesting datasets are often 3GB downloads. They will pay someone to maintain it in a usable form for them. That's the added value.

Bart: Services model similar to what Red Hat does: good packaging and great support for enterprises.

15:17:11 [StevenPemberton]

s/tetlow?:/tetlow:

15:17:24 [StevenPemberton]

s/Thijs?:/Thijs:/

15:17:48 [HadleyBeeman]

Irina: CKAN is both open source and open data. How do you make it sustainable for businesses who publish data? Isn't that only an issue for businesses who only sell data? If it's a by-product of something else, it may drive more traffic

15:18:52 [StevenPemberton]

s/monitise/monetise/

15:18:57 [HadleyBeeman]

John: final thoughts

15:18:58 [JeniT_]

JeniT_ has joined #odw

15:19:40 [HadleyBeeman]

Lotte: We're seeing a shift from the fear of publishing to the network of data and content. Besides data, I look forward to opening more videos and content.

15:20:17 [atlets]

atlets has joined #odw

15:20:51 [HadleyBeeman]

Michele: The first enabler is the government itself. Gov has to build the governmental infrastructure. Inspiring motto from Federal CIO of USA: Everything should be an API.

Jer Thorp, "Any kind of data reserve that exists has not been lying in wait beneath the surface; data are being created, in vast quantities, every day. Finding value from data is much more a process of cultivation than it is one of extraction or refinement."

15:54:41 [BartvanLeeuwen]

BartvanLeeuwen has joined #odw

15:55:09 [DeirdreLee]

... having libraries for existing designers' tools would enable easy access to Open Data

15:55:28 [DeirdreLee]

... as would low-level examples and list of data catalogues

15:55:51 [DeirdreLee]

... This is an example of how Open Data could be opened up to another community

15:56:22 [DeirdreLee]

... small effort for Open Data practitioners, but would be of great benefit to other communities

15:56:40 [StevenPemberton_]

StevenPemberton_ has joined #odw

15:57:22 [DeirdreLee]

... easy access to Open Data would enable designers (and other communities) to see the value within the data and enable them to use it and extract knowledge from it

15:57:48 [DeirdreLee]

subtopic: Benedikt Groß, Royal College of Art, Large Scale Data & Speculative Maps

The HBR article by Jer Thorp nicely supports the thoughts of the speakers (I think): "As we proceed towards profit and progress with data, let us encourage artists, novelists, performers and poets to take an active role in the conversation. In doing so we may avoid some of the mistakes that we made with the old oil."

16:02:48 [DeirdreLee]

... Metrography visualises the London tube map with Open Street Map data as a mental map, by mapping actual locations to the tube map using mathematical models

16:03:35 [StevenPemberton_]

He showed the mapping from true life to the tube map, and then reversed the process to make a real map with the same distortions

julian: do you see yourself creating a toolbox for visualising open data?

16:09:46 [albertm`]

albertm` has joined #odw

16:10:24 [DeirdreLee]

benedikt: great to release tools, but you can't just release source code; you need documentation and examples too, which is time-consuming

16:11:02 [st]

st has joined #odw

16:11:28 [DeirdreLee]

Alvaro: you can't just release code/tools/projects; you are also responsible for maintaining them (like kids :) )

16:11:37 [yvesr]

had very good experiences with http://d3js.org/ for data visualisation - very powerful toolkit

16:12:13 [DeirdreLee]

Question from audience

16:12:24 [bhyland]

@Alvaro, Interesting analogy, Open Source is like a marriage, 'it comes back and you have to answer questions… it is also like children, you cannot let them out into the wild [without guidance]' ;-)

16:12:58 [DeirdreLee]

Aivan: if you had to convince CNN in an elevator pitch to use the same approach as the BBC, how would you do it?

16:13:35 [DeirdreLee]

Olivier (BBC, from audience): focus on your own data, and use Open Data where possible to fill the gaps

16:13:59 [DeirdreLee]

TimBL: Who publlishes data about their own products?

16:14:42 [yvesr]

s/Olivier/yvesr :)

16:15:11 [ivan]

s/Aivan/Ivan/

16:15:18 [DeirdreLee]

... if people publich data about their own products, there won't be a need for CNN to publish data

16:15:22 [albertm``]

albertm`` has joined #odw

16:15:59 [StevenPemberton]

s/publich/publish

16:16:24 [bhyland]

I invite everyone to publish information about their organization, project, product and/or service on the Web today using http://dir.w3.org.

16:16:25 [bhyland]

If you care, it is an entirely Linked Data app. If you don't care, just fill out the form, publish the dir.ttl file produced for you automagically (like FOAF-a-Matic) on the public Web and submit it for harvesting.

16:16:33 [StevenPemberton]

s/publlishes/publishes

16:16:40 [DeirdreLee]

sofia: so much in archives, not just about publishing data, but reusing data

CNN will have to put out metadata or risk losing sales or eyeballs. Let's learn from history, where first movers got the value (like the airlines that listed their schedules and prices on GDSs; other airlines then followed rapidly so as not to be at a disadvantage).