I have had plenty of issues with the Jet database engine, most commonly dates not being imported correctly. I know about this issue and will commonly create a schema.ini. However, for less advanced users, Tableau's use of Jet is causing severe data quality issues. For example, I was working with a colleague who had pulled some data in from Excel. We were using one of the dimensions (location) when I noticed a much larger than usual NULL category (the data had some legitimate NULLs in it). It turns out the first 50,000 rows of the data had a numerical tag for the locations, and the last 20,000 were alphanumeric. Jet, having decided location was purely numeric, then replaced these latter locations with NULLs.

I guarantee you (Tableau) that this is happening more often than you think, and people are making decisions based on bad data. I cannot say this strongly enough: this is Tableau's worst failing by far.

Suggestions for partially fixing this (in the absence of replacing Jet):

If someone changes a data type (number to string), warn the user that the Jet engine may have dropped some data. Automatically write out a schema.ini based on what Tableau now knows about the data, and offer to reimport.

Upon import, list the columns and show the user what Jet thinks they are, along with the number of NULLs returned in each column. Allow the user to make changes, and automatically create a schema.ini...

After import, Tableau would look at any columns with a suspiciously high number of NULLs and do a trial import of just those columns, this time checking many more rows (if not all); write out a schema.ini if there's a problem, and reimport.
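For what it's worth, the schema.ini Tableau would need to write for suggestions like these is small. A hypothetical example (file and column names are invented; the [section] header must match the text file's name, and schema.ini lives in the same folder as the file):

```ini
[sales.csv]
ColNameHeader=True
Format=CSVDelimited
MaxScanRows=0
Col1=OrderID Long
Col2=Location Text Width 255
Col3=Amount Double
```

MaxScanRows=0 asks Jet to scan every row rather than sampling, and explicit ColN entries override the guess entirely for those columns.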

I reiterate, this is not a problem for many of us on this forum who are used to the failings of JET. However, this is a problem for many. I would like members of the Tableau team to comment on this post please.

I've also toyed with the idea that Tableau could apply some heuristics of its own to generate a schema.ini to overcome some of JET's worst failings - I'll have a think about the ideas you have suggested and see if I have any more ideas to add.

There's one point in your posting that I don't agree with. You say:

> I reiterate, this is not a problem for many of us on this forum who are used to the failings of JET.

I'm not so sure about that. I would definitely count myself amongst the people who are very used to JET's failings. I very often use schema.ini files. But I still regularly get caught out by JET's habit of silently NULLing data which doesn't agree with its initial guess about data types. This is particularly bad when the outliers which get NULLed are the most important values.

I had an example recently where the distribution of system response times was cut off at just over 30 seconds (actually just over 30,000 ms). We initially interpreted this as an indication of a time-out setting somewhere and had people head off down the path of looking for where the time-out was happening. I then happened to spot that the maximum response time was actually 32,767 ms. JET had just made it a short integer and NULLed everything that didn't fit. In this case there was no harm done - just a slightly embarrassing retraction of some advice I'd given - but it's very easy to see how any of us could fall into the trap and make wrong decisions on the basis of mangled data.
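That particular trap is scriptable to detect after the fact: a numeric column whose maximum lands exactly on a common integer type bound is a red flag that larger values were truncated or NULLed. A minimal sketch (the response-time data here is hypothetical):

```python
# Common upper bounds of integer types Jet might silently pick.
SUSPICIOUS_BOUNDS = {127, 255, 32767, 65535, 2147483647}

def looks_truncated(values):
    """Return True if the column's maximum equals a known type bound,
    which suggests values above the bound were lost."""
    present = [v for v in values if v is not None]
    return bool(present) and max(present) in SUSPICIOUS_BOUNDS

response_ms = [120, 4500, 29800, 32767, 31002]  # hypothetical response times
print(looks_truncated(response_ms))  # True: the max is exactly 32,767
```

A max sitting exactly on a bound can of course be a coincidence, so this is a prompt to look closer, not proof of damage.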

One other very nasty habit of JET's which I have fallen foul of recently is silently ignoring everything beyond 4 GB in a large file. That one would be very easy for Tableau to spot and warn about (and I think one of the Tableau folk, probably Robert, said he had made a note to look at doing that when I suggested it a little while ago, so hopefully this one may appear some time soon).

Great points Richard - I agree I get caught out as well - I guess I was trying to make the point that the problem is much larger than the odd post here on the forum, but most people don't even realize it is a problem...

Just thinking about this some more, your suggestions have triggered an idea for me on how to audit the results returned by JET. One of the really difficult things is always locating the bad rows - especially on big input files. (I know Joe always advocates an ETL process, which I agree is the best way to go if you have the skills, the tools and the time...)

The idea I'm going to play with is this:

1) Clone the physical data file.

2) Connect to one copy using the default JET settings and let JET do its worst.

3) Define a schema.ini entry for the other copy, and define every field as text.

4) Define a custom SQL connection which joins the two versions. Filter on rows where a column is NULL in one copy and not NULL in the other copy.

Not sure how effective this will be, but it seems like it could be a useful audit step. I'll give it a go on a known problematic data source...
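The same compare-two-reads idea can be sketched outside Jet entirely. In this Python sketch the "typed" pass guesses the type from only the first few rows (imitating TypeGuessRows/MaxScanRows) while the "all-text" pass keeps everything, and the function returns the values the typed pass would silently NULL (the sample data is hypothetical):

```python
import csv
import io

# Hypothetical file: the first rows are numeric, later ones are not -
# exactly the case that trips up Jet's row-sampling type guess.
SAMPLE = """location
101
102
Bay-7
"""

def is_numeric(value):
    """Crude numeric test, standing in for Jet's type coercion."""
    return value.replace(".", "", 1).isdigit()

def audit_column(text, scan_rows=2):
    """Guess the column type from the first scan_rows rows, then return
    the values that a typed read would NULL but an all-text read keeps."""
    rows = [r[0] for r in csv.reader(io.StringIO(text))][1:]  # drop header
    guessed_numeric = all(is_numeric(v) for v in rows[:scan_rows])
    return [v for v in rows if guessed_numeric and not is_numeric(v)]

print(audit_column(SAMPLE))  # ['Bay-7']: present as text, NULLed when typed
```

In the real clone-and-join version, the returned list corresponds to the rows your custom SQL filter would surface: NULL in the default-settings copy, not NULL in the all-text copy.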

Agreed Joe - but a schema is no use for people who have never heard of one, or who don't even realize there's a problem. The format-to-text suggestion is a good one - I didn't realize that Jet would take note of that. It fails again, of course, against a text file.

Interestingly, on my computer (Windows 7), TypeGuessRows isn't found in the place suggested, and setting it to zero and rebooting makes no difference...
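For anyone else hunting for it: the key's location depends on the driver and OS bitness, and TypeGuessRows only governs the Excel ISAM anyway - for text/CSV files the equivalent knob is MaxScanRows (in the registry or in schema.ini) - which could explain it having no effect. Commonly cited locations (paths may vary by driver version, so check which driver Tableau is actually loading):

```
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text\MaxScanRows
```

The Wow6432Node path is where 32-bit Jet settings live on 64-bit Windows, which is the usual reason the key "isn't found" in the documented place.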

Does Tableau want to give users the ability to deal with data types when connecting to a data source via MS Jet via a user interface?

Does the benefit of making it easy for a user to set the data types for MS Jet offset the cost of an added dialog box and an extra click?

I don't know all the reasons or factors that affect a decision like this, but it seems to me that, so far, the choice is that Tableau depends on the data source for the data types. This makes sense for any normal database, where the metadata is already there; but for text files and Excel files the metadata has to be generated. So with Tableau's current approach, I feel it becomes the user's responsibility to maintain the data types, because MS Jet is a poor guesser.

Here is another situation to consider when reading the whole file to guess the data types: If you have a 4GB text file that you want to connect Tableau to, do you want the data loader to read the whole file to determine the metadata, and then read the whole file a second time to load it, or do you want to be able to tell the data loader what the metadata is before loading?

I prefer to load my data into a database before Tableau because I can control things like metadata (and nearly every data source I want to connect Tableau to can benefit from some kind of transformation). The downside of ETL, as Richard mentioned, is that it requires additional software, know-how, and time.

It is a difficult situation, a balance of user interface/experience and concept complexity. Do you know of an application that gracefully handles text file imports in a nice user friendly way?

Alex, ping me via email at djewett@tableausoftware.com. We have some changes coming, and I'd like to be sure you are in our beta program for it. (Richard and Joe, we are already counting on you for beta testing.) v6.1 beta will start in early May.

Hi Dan, I am having issues for the same reason, querying fields with mixed data types from Excel through Tableau. I ended up having to make changes to the JET driver in the registry to make it work but would sure appreciate an easier and more granular way through Tableau itself.

Me too - I have big problems importing CSV files into Tableau 7.0, even small files of 28,000 rows (the "normal" files in our company are CSV files of about 800,000 rows)...

I usually work with these files in other BI programs without any problem...

Is there any solution on the way that will replace this "horrible" MS JET module in Tableau, or at least a way to specify the field types as in Excel (where the same files are imported correctly...), without being obliged to deal with the SCHEMA.INI for each file to be imported?

For now the workaround is to convert all the CSV or Excel files we actually use, one by one, into SQL, writing down the definition and the length for each field... it is a really loooong process for files with 60-90 fields each...

And all this only because the TXT import procedure could be done better and more completely...
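If hand-writing a schema.ini per file is the blocker, generating one automatically is scriptable. A minimal sketch that scans every row of a CSV (so it cannot be fooled by a change partway through the file) and declares each column Text unless all of its values parse as numbers - the numeric detection is deliberately crude, and real use would also need to handle dates, widths and character sets:

```python
import csv
import os

def write_schema_ini(csv_path, ini_path="schema.ini"):
    """Scan every row of a CSV and emit a schema.ini entry for it.
    A column is declared Double only if all of its values look numeric;
    anything else falls back to Text, which Jet never NULLs."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        numeric = [True] * len(header)
        for row in reader:
            for i, v in enumerate(row):
                if v and not v.replace(".", "", 1).lstrip("-").isdigit():
                    numeric[i] = False
    with open(ini_path, "w") as out:
        # The section name must be the bare file name, not the full path.
        out.write(f"[{os.path.basename(csv_path)}]\n")
        out.write("ColNameHeader=True\nFormat=CSVDelimited\n")
        for i, name in enumerate(header):
            kind = "Double" if numeric[i] else "Text Width 255"
            out.write(f"Col{i + 1}={name} {kind}\n")
```

Run once per file (or over a whole directory) and drop the resulting schema.ini next to the data files; a 60-90 field file then takes seconds rather than hours.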