Data standards: sampling chickens in an open data way

Here’s an example of why open data standards are important: Campylobacter, which is the biggest cause of food poisoning illness amongst humans. It’s commonly found in chickens, and the Food Standards Agency is actively monitoring for it. So, how do you create a useful set of data standards for it?

Cost? How is it determined? Specified for each dataset. For the chicken dataset, it’s the cost per entire chicken. Is it a sensible thing to have in?

Chicken costs

“Date” is a much simpler concept than “Cost”. Cost raises a lot of questions that someone using the dataset needs to figure out – and that means (perhaps) it shouldn’t be part of a standard. The cost is important – not to the FSA, but to the consumer.

Why just “a chicken”? It’s the standard unit when testing for Campylobacter. You can do a comparison with chicken breasts, but the standard is “whole plucked chicken with skin on”. Why not just call it “unit cost”?

The temperature is really important. Product temperature? Sample temperature? The temperature needs to be recorded at the time it is sampled, and when it arrives at the lab.

They need to know when and where in the process the sample was taken. This context is useful.

Data on rearing conditions (organic, free-range, corn-fed, etc) and breeds is in another set of data.

Addresses are a “nightmare”

IDs are a struggle. Do they need a universal one? Probably not. A sample one? Probably. Do they need their own one, or a provided one – from the retailer for example? Is there a need for a separate sample ID and chicken ID – will there be multiple samples from one chicken? They use UPRNs to identify retailers – they’re more accurate than addresses, particularly for shopping centres.
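To make the fields discussed so far a bit more concrete, here is a minimal sketch of what one sample record might look like. Every field name here is hypothetical, not the FSA’s actual schema; the only outside fact assumed is that a UPRN is a numeric identifier of up to 12 digits.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SampleRecord:
    """One Campylobacter sample. All field names are illustrative assumptions."""

    sample_id: str             # ID for this sample (own or retailer-provided)
    retailer_uprn: str         # UPRN of the retailer, used instead of an address
    sampled_at: datetime       # when the sample was taken
    process_stage: str         # where in the process it was taken, e.g. "retail"
    temp_at_sampling_c: float  # temperature when sampled, in degrees Celsius
    temp_at_lab_c: float       # temperature on arrival at the lab
    chicken_id: Optional[str] = None  # only needed if one chicken yields several samples

    def __post_init__(self) -> None:
        # Assumption: a UPRN is 1 to 12 ASCII digits.
        uprn = self.retailer_uprn
        if not (uprn.isascii() and uprn.isdigit() and len(uprn) <= 12):
            raise ValueError(f"not a plausible UPRN: {uprn!r}")
```

Keeping the chicken ID optional leaves the “multiple samples from one chicken?” question open rather than forcing an answer into the record.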

Location is really important, because each local authority has its own sample selection process. The FSA doesn’t know where businesses are in the UK – there’s no register. They want to be able to use historical data to target discretionary inspections.

It appears that addresses are a nightmare. For example, there are good reasons for including counties, but they’re just a mess. There are ceremonial counties that are actually unitary authorities. The Post Office has deprecated counties. That’s why they’ve fallen back on UPRNs.

Make it linkable

Make the data linkable – it’ll save work in the long-term. They publish data about every sample, but they don’t include the address explicitly. There is a unique identifier, so the address can be looked up from that identifier. That information is already open data.
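As a rough illustration of what linking on an identifier means in practice, the sketch below joins sample records to a separate address lookup keyed on UPRN. The records, UPRNs, addresses and the `link_addresses` helper are all invented for the example; real data would come from the published sample dataset and an open address source.

```python
def link_addresses(samples, address_by_uprn):
    """Return a copy of each sample with an 'address' field looked up by UPRN.

    Samples whose UPRN is missing from the lookup get address=None
    rather than being dropped, so gaps in the linked data stay visible.
    """
    return [
        {**sample, "address": address_by_uprn.get(sample["uprn"])}
        for sample in samples
    ]


# Invented example data: the published samples carry only the identifier...
samples = [
    {"sample_id": "S-001", "uprn": "100000000001"},
    {"sample_id": "S-002", "uprn": "100000000002"},
]

# ...and the address lives in a separate, already-open dataset.
address_by_uprn = {
    "100000000001": "Example Retailer, 1 Example Street",
}

linked = link_addresses(samples, address_by_uprn)
```

The sample data never has to carry the messy address itself: it only needs the identifier to be present and consistent.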

There is existing data about chicken rearing and conditions, so as the data sets get released you will be able to start looking at chicken movements (and not just for chickens – for all food animals). Is there anything missing from this set of data standards that will allow that linking to happen? What will this data be linked with, and how easy is that to do? Nobody wants to spend a lot of time pondering what, exactly, a cow is… Can we lock this down usefully?

Once you get there, some interesting paths for analysis emerge. Is there anything about abattoir premises that’s interesting that emerges from this? Species specific? Rate of processing?

'So fowl and fair a dataset I have not seen' @foodgov lead session on chicken, food poisoning and data #odcamp