Suggestion for learning SQL

As I continue in my job search, it's becoming clear that I seriously need to expand some skills. First up: I need to learn some SQL. Most of the jobs I'm looking at are mostly non-technical, but require some level of data mining and analysis, all of which is accomplished via SQL, variant unknown.

I've gone through some of the W3Schools online SQL tutorial and it was informative, but I don't feel confident that I could produce a correct query without a reference in front of me. Are there any books, online classes, etc. that would at least give me a reasonable ability to put SQL proficiency on my resume and be able to talk intelligently about queries in an interview?

I'm also starting a Master's later this year that has a couple classes on SQL and DB design, but anything I could learn ahead of that would be really helpful.

You need to write a small application on your own that uses a SQL backend. You'll stumble through it but by the end, you'll feel much better about it.

A suggestion you could do is make a phone book application (record names with phone numbers) and make sure to keep names in one table with numbers in another and make in so one name can have multiple numbers.

1. Get at least a basic understanding of set theory. I cannot tell you how often you find people trying to do things in very difficult ways using SQL because they lack any understanding of set theory.2. Learn SQL.

If it's not, it might be a good idea to use Access or LibreOffice Calc to model some data interesting to you, enter data and create statistics using SELECTs. For instance, you could model statistics for basketball. Points scored, rebounds; extract season averages, etc.

I've gone through some of the W3Schools online SQL tutorial and it was informative, but I don't feel confident that I could produce a correct query without a reference in front of me.

As mentioned above, you need to "get it". It's not about syntax - it's a way of thinking about things. It's much simpler than most programmers make it out to be - looking at their schemas and SQL can make your head explode.

Quote:

Are there any books, online classes, etc. that would at least give me a reasonable ability to put SQL proficiency on my resume and be able to talk intelligently about queries in an interview?

Read a book called The Art of SQL. It does a pretty good job. In the end, though, I'm not sure how you expect to get through an interview based on some home study, unless the interviewer simply sucks, or isn't looking for much.

The older (much older) editions of CJ Date's "An Introduction to Database Systems" explains the high-level logic (relational algebra) that you need to understand modern SQL databases very well. Because it predates SQL's takeover of the market, he explains the concepts more on their own, before getting bogged down in syntax. I was able to follow it all pretty easily in high school. If you get one of the old editions of this, you can skip the parts that talk about DB models that have been abandoned.

Edit: apparently, even the newer editions maintain its more theoretical focus. I was under the impression that he'd "sold out" and made the later revisions more "howto-ish".

Thanks for the suggestions folks. I ordered a couple of books from Amazon. I don't need the back-end programming experience as much as I need the ability to extract and analyze data, so I'm more focused on that aspect although I will have to learn some DB administration. I appreciate the help, espcially the link to the Stanford class.

In the end, though, I'm not sure how you expect to get through an interview based on some home study, unless the interviewer simply sucks, or isn't looking for much.

Using SQL isn't a primary focus for the job, it's just the method to extract the data I'll be working on. The job isn't technical in nature.

Edit: although the listing has already disappeared so it seems I missed that one. GDFS...need to hurry up and study so I don't miss the next one.

The biggest problem I seem to find is that candidates can run simple queries, but don't seem to grok the relationships between tables and what they mean in terms of usage.

An example schema:

Code:

Stores: Store_ID Name

Sales: Sale_ID Store_ID Date Amount

Safety_Incidents: Safety_Incident_ID Store_ID Date Cost

If I wanted you to produce a report that showed me columns for Store Name, Number of and Cost of Safety Incidents and Number of and Total of Sales, I'd expect you to know what's wrong with the following query:

Sadly, a significant number of interviewees with 5-10+ years of SQL experience will produce that query when presented with that situation in an interview. I don't care if you botch the syntax a bit, but if you write this query, there's a lot about databases you don't understand.

The glaring issue I see with that example is that the inner joins could result in stores being omitted completely if they have no sales, or no safety incidents. You'd want to left join to the stores table to make sure you got some output for every store.

Which is a good example of how bad logic in SQL can burn you. It's not hard to write a query that runs. Making sure the output is correct to your intent is the trick.

The glaring issue I see with that example is that the inner joins could result in stores being omitted completely if they have no sales, or no safety incidents. You'd want to left join to the stores table to make sure you got some output for every store.

Which is a good example of how bad logic in SQL can burn you. It's not hard to write a query that runs. Making sure the output is correct to your intent is the trick.

There are two one to many relationships, both being joined and no common grain on which it isn't a multiplicative result. A store that has 2 sales and 3 incidents would produce 6 rows to be aggregated.

The glaring issue I see with that example is that the inner joins could result in stores being omitted completely if they have no sales, or no safety incidents. You'd want to left join to the stores table to make sure you got some output for every store.

Which is a good example of how bad logic in SQL can burn you. It's not hard to write a query that runs. Making sure the output is correct to your intent is the trick.

There are two one to many relationships, both being joined and no common grain on which it isn't a multiplicative result. A store that has 2 sales and 3 incidents would produce 6 rows to be aggregated.

Oh yeah you'd want to aggregate to a single store_id before joining. Derp.

There are two one to many relationships, both being joined and no common grain on which it isn't a multiplicative result. A store that has 2 sales and 3 incidents would produce 6 rows to be aggregated.

To be fair, a substantial portion of the issue here is the retarded syntax of SQL, and it's encouragement of the 'one massive statement' style of development, rather than something based on relational algebra, ala D, which would encourage modularity.

There are two one to many relationships, both being joined and no common grain on which it isn't a multiplicative result. A store that has 2 sales and 3 incidents would produce 6 rows to be aggregated.

To be fair, a substantial portion of the issue here is the retarded syntax of SQL, and it's encouragement of the 'one massive statement' style of development, rather than something based on relational algebra, ala D, which would encourage modularity.

I've found that the more I do SQL, the more modular I write. Just for clarity and ease of modification. Discovering the WITH clause, for example, changed my goddamn world.

To be fair, a substantial portion of the issue here is the retarded syntax of SQL, and it's encouragement of the 'one massive statement' style of development, rather than something based on relational algebra, ala D, which would encourage modularity.

There's no reason for statements in programming. I find it more readable and logical to nest expressions than to break them into separate statements, especially if there is no enforced order of execution of the code.

As a non-SQL expert, could someone post the "correct" way to structure that example report query? I know I've been bitten by that double-counting issue before (and resolved it through trial and error), but would appreciate a refresher on it.

As a non-SQL expert, could someone post the "correct" way to structure that example report query? I know I've been bitten by that double-counting issue before (and resolved it through trial and error), but would appreciate a refresher on it.

Dracorat's query works fine. Richard's option of two queries works fine as well. Both are going to get to the same end, which is to find a common level of granularity on which you can relate the data.

I don't consider this an expert topic for SQL. It's not the first thing you learn about SQL, but I would say it's fundamental. In interviewing someone with less experience, it wouldn't bother me as much. In interviewing someone claiming plenty of SQL experience, it would seriously impact my impression of them.

As a non-SQL expert, could someone post the "correct" way to structure that example report query? I know I've been bitten by that double-counting issue before (and resolved it through trial and error), but would appreciate a refresher on it.

I would think about it more in terms of what it's doing than syntax. It's all about the logic.

So if you have the following tables:

Code:

Store_id name1 foo2 bar3 baz

Code:

sale_id store_id amount1 1 502 1 753 2 1504 2 905 2 656 3 207 3 12

Code:

incident_id store_id cost1 1 252 1 103 2 204 3 30

The original query would do the following when joining stores to incidents:

Code:

name incident_id costfoo 1 25foo 2 10bar 3 20baz 4 30

So far, so good. But now we join in the sales data and things get awkward.

You've inadvertently created a Cartesian product of the incidents and sales for each location resulting in inaccurate results. In larger datasets, it's really easy to miss a mistake like that as the output looks just fine. But it's not.

I find it helps to break down the needed output as much as possible and then bring it all together. With the above example, I would first write a query to just hit the incidents table and aggregate counts and amounts, then validate it's output. Then do the same with sales data. Once both pieces look good, it's just a matter of hooking them together with a WITH clause, or as joined sub-queries.

Huh, just tried it: looks like Sybase finally allows correlated subqueries in the FROM clause without having to explicitly bring the outer table into the inner query's scope. I'm certain this didn't work in prior versions.

Quote:

In interviewing someone with less experience, it wouldn't bother me as much.

It would definitely influence my decision if the position required them to work with data, period. Cardinality is not just a SQL quirk. I couldn't just brush off this mistake unless the candidate were a true newb.

Quote:

"with" lets you do both without the appearance of statements (implying a method of execution). However, I suppose the difference is only cosmetic.

To be fair, it's not always cosmetic. Lacking the flexibility / composability found in more modern SQL flavors, I find myself using SELECT INTO #NamedSubExpression a lot, if only for my own sanity. This affects OoO, not to mention the DB's optimizer behavior.

But yes -- in a sane world there would be no different. Just as the following C# fragments are completely identical once compiled:

Sorry for being late to the game, but I used to teach SQL, and the general plan of learning SQL by just hacking out a simple app is greatly flawed.

There are several levels of "knowing SQL"

Basic understanding of what it is

The basic syntax for basic usage

Getting a basic understanding of all the vast things SQL can do, I why you might use them.

People who learn SQL by doing tend to short-circuit their knowledge by focusing on the basics, and then over-using the basics while ignoring all the advanced functions that they should be using simply because they now know a way instead of the right way.

Hacking out an app is a great way to get to basic knowledge. But, IMHO, nobody should ever stop there unless they want to stay mediocre. It would be like learning how to code by hacking out if/then blocks without ever learning about objects and classes and how to use them to your advantage.

Step 2 is that you need to pick a platform (MSSQL, Oracle, DB2) and get a book that surveys through all the developer functionality of the platform (There is tons of overlap, regardless of which you learn you'll get a reasonable overview of capabilities.) Go through it and discover what modern SQL platforms can do, and make sure you mentally think through what each thing is for in the real world. The goal is to get a good enough feel of what you can do that in the future that SQL can help you with. You don't need to memorize syntax for everything, just remember enough to look up stuff when you recognize you have a problem that SQL can help with.

IMHO, there is a vast skill difference between guys who "use sql" (they know all the CRUD operations and how to use them) and guys who "know SQL" (they've done Step 2 and know what SQL can do beyond the basics of CRUD) There is a massive amount more you can deliver if you do step 2 and learn beyond the CRUD operations, and it will establish you as a person who can be trusted with data in your future position. Although >80% of SQL is CRUD stuff, the other 20% is usually where the big wins are of stuff that can make your life a LOT easier if you understand it.

SELECT d.Deptnum, Count(*) as Emps, Sum(e.Salary) as Salsum, Budget FROM Emp as e JOIN Dept as d ON e.Deptnum = d.Deptnum GROUP BY d.Deptnum, Budget HAVING Sum(e.Salary) > Budget

The first, despite the rampant capitalization , is IMHO, a far nicer syntax, and more closely follows the logical structure of what is requested. And it would be nicer still with the use of sensible names rather than T1/T2/T3.

But alas, we're stuck using SQL, which while acceptable and perhaps even preferable for the simplest queries, sucks for more complex stuff.

SELECT d.Deptnum, Count(*) as Emps, Sum(e.Salary) as Salsum, Budget FROM Emp as e JOIN Dept as d ON e.Deptnum = d.Deptnum GROUP BY d.Deptnum, Budget HAVING Sum(e.Salary) > Budget

The first, despite the rampant capitalization , is IMHO, a far nicer syntax, and more closely follows the logical structure of what is requested. And it would be nicer still with the use of sensible names rather than T1/T2/T3.

But alas, we're stuck using SQL, which while acceptable and perhaps even preferable for the simplest queries, sucks for more complex stuff.

The first query has more differences than assigning to variables. It also uses a syntax closer to relational algebra, which looks terser and more elegant. I assume you could say:

which I would prefer unless certain expressions are reused or contain certain sufficiently distinct logic from the rest of the query that it makes sense to separate and name them for clarity.

Yep - you still have that option.

But the advantage of separating out and naming the subqueries is that for many tables or pairs of tables, certain subqueries are very common, so why the copy-pasta?

Furthermore, hitting people's working memory limit (about 7) is also much less likely to happen when the concept of say, per dept employee summary (which would probably include the summary & join above), is as clear and usable as the underlying tables.

Naturally the query optimizer can choose not to create any attributes of these intermediate relations when they're not needed later on.

As a non-SQL expert, could someone post the "correct" way to structure that example report query? I know I've been bitten by that double-counting issue before (and resolved it through trial and error), but would appreciate a refresher on it.

Dracorat's query works fine. Richard's option of two queries works fine as well. Both are going to get to the same end, which is to find a common level of granularity on which you can relate the data.

I don't consider this an expert topic for SQL. It's not the first thing you learn about SQL, but I would say it's fundamental. In interviewing someone with less experience, it wouldn't bother me as much. In interviewing someone claiming plenty of SQL experience, it would seriously impact my impression of them.

Ha, yeah, I probably should have chosen a word other than "expert". I've got some experience writing CRUD stored procedures and the odd report query that could get fairly heavy, but each time I'd learn/google just enough to get through it and return to my normal C# work--with the result being that I would oftentimes forget what I had learned.

Maybe this thread will be the impetus for me actually remembering this key concept about relational databases