Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

I'm currently working on a project that lets users input the names of freight vehicles for particular packages. Eg:

Company: White Star Line
Vehicle: The Titanic
Vehicle Type: Ship

At the present time, all three of those fields are unrestricted text fields. So effectively the data is useless for anything except display. If someone enters the exact same details as above, the DB has no means of realising it's the same ship as a record with the exact same entries.

We cannot restrict any of these fields to a set of predefined values (too many potential entries, need to be open to anything), but I'd love to unify some of the more frequently used values. I'm wondering about the most elegant/efficient way of doing this.

My current line of thinking is to add an extra set of database tables with 'recognised company' & 'recognised vehicle', which work in parallel to the existing table. So if a user enters "White Star Line" as the company, and there's a pre-approved entry in the "recognised company" table, it forms an association with that company in an optional 'recognised_company_id' field.

Similarly, if the user enters White Star Line AND 'The Titanic' as ship, and there's an entry for that ship WITH that brand, it enters the association into recognised_vehicle_id.

A few things:

Autocomplete on the input fields could hopefully eliminate some spelling errors/syntax differences on identical brands/vehicles.

Reviewing a count of duplicates of the existing table could help determine which recognised companies/vehicles should be manually added.

This is far from a pretty solution, but it's the best I've got. I'm guessing this is something folks have to do quite regularly, and I'm just wondering - is there a more graceful approach to what I'm trying to achieve? Any rules of thumb for tackling this sort of issue?

1 Answer
1

One possibility would be to build a list of the values you currently have, and make users pick from that list, or enter it separately to get it on the list.

I do this all the time in various web interfaces - when I need to pay a bill through my online bank, for example, I first have to go create a payee. I curse this when I am only paying a one-time bill, but I love that I don't have to enter my account number and other details every time I make a car payment, mortgage payment or pay an electric bill - only the first time.

This will also reduce the noise from "almost" duplicates, for example if one person spells it Weber's and another spells it Webbers hopefully the second user will just use the first even if they don't think it's correct (and if they're both wrong you still only have to correct one row).

If you make users see the long-term benefit there should be no screaming about the up-front pain.

Hi - thanks for the response. This is an option, but I suspect the proportion of recognised entries will be less than 25%. If it was higher this would certainly be the way to go, but I think the added inconvenience wouldnt' be justified for just 25% of cases.
–
PlankTonApr 24 '12 at 20:30

So 75% of the entries are unique and will remain unique? (And not unique as in the Weber's vs. Webbers case, but truly representing different company/vehicle/vehicle type combinations?)
–
Aaron Bertrand♦Apr 24 '12 at 20:32

It's a crude estimate, but yeah. The vehicles used are upgraded regularly & the fleet is enormous, so the likelihood of any given entry having been entered before is't particularly high.
–
PlankTonApr 24 '12 at 22:27

Then maybe build a list of just the 25% most common ones, and let them pick from the list or just enter free text. Might be the right compromise between complete mayhem and complete control. :-)
–
Aaron Bertrand♦Apr 24 '12 at 22:30