The SitePoint Forums have moved.

You can now find them here.
This forum is now closed to new posts, but you can browse existing content.
You can find out more information about the move and how to open a new account (if necessary) here.
If you get stuck you can get support by emailing forums@sitepoint.com

If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

Identifying Missing Data Fields?

Hi All,

I'm hoping someone can offer some insite into this one!

I'm making a database application that will have several users collect data from various sources and enter it as they gather it. I'm looking for a method of reporting progress on entering all the data ... maybe expressed as a percentage. So, say there are 20 fields in a given row, if 5 are still empty, I'd be able to determine that the row was 75% complete. Another useful tool would be to highlight which fields are still incomplete.

Example:

Say I've got a table that contains various data about 100 restaurants in my city. After 2 weeks of data gathering, I'd like to be able to see how close I am to having complete data for (restaurant _a), (restaurant_b), etc each individually, or all 100, as a percentage. Then, I'd like to be able to generate a report that show which data are still missing for each restaurant.

I was thinking along the lines of COUNTing NULL values or something, but I can't get it to work properly. Or maybe a giant WHERE clause ...

PHP Code:

SELECT COUNT(*)
FROM Table1, Table2, Table3 // , ... (for all tables)
WHERE Column1 IS NULL OR Column2 IS NULL OR Column2 IS NULL // OR ... (for all columns);

which isn't right because you're not going to put each restaurant in its own table -- i won't let you

as far as determining the percentage of attributes that each restaurant has or has not got data for, you would be in a better position to judge how to write that query after you have actually amassed some data, and the first step towards doing that is to decide which attributes you want to record

Each product can contain many components. Each components can potentially be in many products. Each component may have many substances in it, and each substance may exist in any number of components, hence the lookup tables.

For a given product, I'd like to be able to see which components are missing substance data, which specific data are missing, and what percentage are missing. I'd like to be able to tell, for a given component, what % is complete. Also, for a given product, what % is complete (ie sum up all the components in the product).

Are there any straight forward methods of making these types of queries?

whether a column is NULL or NOT NULL depends entirely on whether the business rules of the application dictate that there should always be a value

I the application doesn't require that all the attributes have values, so I currently have unentered data represented as NULL.

if you want to count rows, use COUNT(*), which ignores NULLs

I think I see: to count the rows without data, I count all the rows, then all the rows with data and take the difference? But what about counting empty attributes? I need this to get a % of completeness for each component.

a LEFT JOIN with WHERE ... IS NULL will also do it

I'll look into that. Thanks!

which some?

Well, any really. I'd like to be able to generate a list of components for a given product that still needs more info (either more Substance data, or more straight Component attributes) ... basically anything short of fully complete. I'm hoping I don't have to make a string of 'OR colunm1 IS NULL OR Column2 IS NULL OR ..." for all the columns.

are you suggesting that "fully complete" means that a product must have every component?

In my model, a product contains many components, and data will be collected about the components ... not directly about the products. So, to be fully complete, a product must have every data for each of its components. So, each product, when entered into the database, has all its components, just not the data about each component or the data about which substances are in each component.

not that the PC table does not contain NULLs, rather, the NULLs come from the way a LEFT OUTER JOIN works, and represents a missing row

... maybe, I'm not sure if that's what I want or not. I haven't seen CROSS JOIN or LEFT OUTER JOIN before ... only LEFT JOIN or OUTER JOIN. I'll need to read or experiment a bit to see what they are doing.