I want to get serious about my mostly C# code blog, but I'm not sure what sample code would be most useful to the average programmer. For the last 10 years I've mainly focused on the important but unglamorous stuff like file I/O, low-level Internet protocols, parallel programming, Windows internals, generics, web service integration and all the nitty-gritty stuff that normally gives programmers grief and causes them to do boneheaded things.

So, any thoughts from the coders here about what would be most useful? Currently I'm working on a sample to capture a database schema to an XML file using .NET's generic DB access. Turns out our $5,000 Redgate tool will only do SQL Server and not ODBC or any other provider. So I had to write a schema capture/comparison tool myself to catch changes in one of our ancient systems. I'm going to build a stripped-down version of it for my blog. But I'm at a loss as to what to do next.

C++ was my bread and butter for more than a decade. When they were building the first version of .NET, I used to beg them for pointers on the newsgroups.

One exchange went something like this:

Me: Please give me pointers.
MS: Think of the GC as your own personal chef. We do the chopping and keep the sharp knives away from you.
Me: But ... what if I'm a better chef than you? What if I have better quality knives?
MS: You may or may not be, but most people aren't, and we're aiming this at the lowest common denominator. We want a world where software doesn't leak.

I've been a total believer ever since. I used to be the guy that showed up at your cube with a printout from BoundsChecker, saying "here's the 457 memory leaks in your 400 lines of code." Since then I've seen many years of mediocre programmers writing code that at least won't make the system leak until you have to reboot the server. Before managed code, the servers needed to be rebooted regularly. On most of my systems now, the only downtime is when they pick up Windows updates. Our weak outsourced team can't even fuck that up.

But years later I personally still see the GC as this clunky thing, so yeah, thanks for the suggestion. I'm not an expert on the GC but this is the kind of nerdy thing I like. My GC kung fu will get stronger. Challenge accepted.

The DB capture-to-XML article is almost done. I just need to install MySQL and make sure I can capture from that provider. It works really well. It can capture the schema from any generic data source using any .NET provider and is already capturing tables, views, indexes and procedures. It's a fully functional console app and it's only 200 lines of code, and fluffy code at that.

The most common C# mistakes we find are all memory leaks from people who still believe in the GC secret sauce. The failure mode shifts from "you leaked it because you forgot to free it" to "you leaked it because you held a ref you didn't know about".
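A minimal sketch of the "ref you didn't know about" case, assuming the classic culprit of a long-lived event. The `Clock`, `Widget` and `LeakDemo` names are made up for illustration. The static event holds a delegate, the delegate holds the subscriber, and the GC cannot touch the subscriber until it unsubscribes.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class Clock
{
    // A static event lives as long as the process. Every subscribed delegate
    // keeps a strong reference to the object whose method it points at.
    public static event EventHandler Tick;

    public static void RaiseTick() => Tick?.Invoke(null, EventArgs.Empty);
}

public sealed class Widget : IDisposable
{
    public Widget() { Clock.Tick += OnTick; }      // the hidden reference starts here

    private void OnTick(object sender, EventArgs e) { }

    public void Dispose() { Clock.Tick -= OnTick; } // the only way to let go of it
}

public static class LeakDemo
{
    // Created in a separate, non-inlined method so no local variable in the
    // caller keeps the Widget alive; only the weak reference escapes.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference CreateWidget() => new WeakReference(new Widget());

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void DisposeWidget(WeakReference weak) => (weak.Target as Widget)?.Dispose();

    // Forces a full collection and reports whether the object is still reachable.
    public static bool SurvivesCollection(WeakReference weak)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return weak.IsAlive;
    }
}
```

Calling `LeakDemo.SurvivesCollection(LeakDemo.CreateWidget())` reports that the widget is still alive even though no code holds a variable pointing at it. After `DisposeWidget` removes the handler, the widget becomes eligible for collection, though exactly when it is reclaimed is up to the runtime.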

Hey, check out the Database Schema Capture sample. It's a console app that captures pretty much every data point the provider will cough up and dumps it to an XML file. It uses .NET's generic DB access layer, System.Data.Common. I tested it on a bunch of different providers. It was fun to write and I hope a few people find it useful.
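A rough sketch of how a capture like that can be built on System.Data.Common. This is not the blog sample itself; the class and method names are hypothetical. It relies on `DbConnection.GetSchema()`, which returns the provider's list of schema collections, and `DataSet.WriteXml` for the dump. On .NET Core and later, `DbProviderFactories.GetFactory` only finds providers that were registered first with `DbProviderFactories.RegisterFactory`.

```csharp
using System;
using System.Data;
using System.Data.Common;
using System.IO;

public static class SchemaCapture
{
    // Asks the provider which schema collections it supports, then pulls each one.
    public static DataSet Capture(DbConnection connection)
    {
        var snapshot = new DataSet("SchemaSnapshot");
        DataTable collections = connection.GetSchema(); // the MetaDataCollections list
        foreach (DataRow row in collections.Rows)
        {
            string name = (string)row["CollectionName"];
            try
            {
                DataTable table = connection.GetSchema(name);
                table.TableName = name;
                snapshot.Tables.Add(table);
            }
            catch (Exception)
            {
                // Assumption for this sketch: some providers refuse to return a
                // collection without restrictions, so skip those and keep going.
            }
        }
        return snapshot;
    }

    // Serializes a snapshot to a string, handy for inspecting or diffing in memory.
    public static string ToXml(DataSet snapshot)
    {
        using (var writer = new StringWriter())
        {
            snapshot.WriteXml(writer, XmlWriteMode.WriteSchema);
            return writer.ToString();
        }
    }

    // providerName is the provider's invariant name, e.g. "System.Data.Odbc".
    public static void CaptureToFile(string providerName, string connectionString, string xmlPath)
    {
        DbProviderFactory factory = DbProviderFactories.GetFactory(providerName);
        using (DbConnection connection = factory.CreateConnection())
        {
            connection.ConnectionString = connectionString;
            connection.Open();
            Capture(connection).WriteXml(xmlPath, XmlWriteMode.WriteSchema);
        }
    }
}
```

Because nothing here names a concrete provider type, the same code path serves SQL Server, ODBC, MySQL or anything else that ships an ADO.NET provider. What comes back in each collection is entirely up to that provider.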

Yep, on the list, but first I want to do a comparison class that produces a delta XML file from two input XML files. This kind of stuff has just killed us: someone renames a column or changes its width, a week later some production system blows up, and 8 hours of troubleshooting later I say "hey, did you rename column x to column y?" We already had a fancy Redgate tool that does that, but it only works with SQL Server and we were also running into this on older ODBC systems. The code I wrote is 100% generic and even captures all the provider info, reserved keywords, data types, etc. Now I build "schema awareness" into our important systems. If you try to slip an undocumented database change in there, code like this will bust your ass. I got tired of playing Where's Waldo with the database.
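One way such a comparison could work, sketched with LINQ to XML. It assumes the snapshot layout that `DataSet.WriteXml` produces (one child element per captured row under the root) and simply treats each row's serialized form as its identity. A renamed or resized column therefore shows up as one removed row plus one added row. The `SchemaDelta` name and the output element names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SchemaDelta
{
    private static readonly XNamespace Xsd = "http://www.w3.org/2001/XMLSchema";

    // Indexes every row element by its canonical (unformatted) XML text,
    // skipping the inline xs:schema block if the snapshot has one.
    private static Dictionary<string, XElement> Index(XDocument doc)
    {
        var rows = new Dictionary<string, XElement>();
        foreach (XElement row in doc.Root.Elements().Where(e => e.Name.Namespace != Xsd))
        {
            rows[row.ToString(SaveOptions.DisableFormatting)] = row;
        }
        return rows;
    }

    // Returns a delta document: rows only in "before" and rows only in "after".
    public static XElement Compare(XDocument before, XDocument after)
    {
        Dictionary<string, XElement> oldRows = Index(before);
        Dictionary<string, XElement> newRows = Index(after);

        return new XElement("SchemaDelta",
            new XElement("Removed",
                oldRows.Where(p => !newRows.ContainsKey(p.Key)).Select(p => new XElement(p.Value))),
            new XElement("Added",
                newRows.Where(p => !oldRows.ContainsKey(p.Key)).Select(p => new XElement(p.Value))));
    }
}
```

With two snapshot files on disk, `SchemaDelta.Compare(XDocument.Load(oldPath), XDocument.Load(newPath)).Save(deltaPath)` writes the delta. An empty `Removed` and `Added` pair means nobody touched the schema.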

Been thinking about the GC and memory/resource allocation. People also think the voodoo inside the GC magically stops them from leaking other resources like system handles and GDI objects. It doesn't; the GC only tracks managed memory. I'll admit some of what the GC does is still voodoo to me, too, so that should be a good blog post.
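A small illustration of the handle side of this, using a file handle as the stand-in for any OS resource. The GC will eventually finalize an abandoned `FileStream`, but nothing says when, so the handle is released deterministically with `using`. The `HandleDemo` name is hypothetical.

```csharp
using System;
using System.IO;

public static class HandleDemo
{
    // Writes the bytes and returns the resulting file length.
    public static long WriteAndMeasure(string path, byte[] data)
    {
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            stream.Write(data, 0, data.Length);
        } // Dispose runs here and closes the OS file handle, whatever the GC is doing.

        // The file was opened with FileShare.None, so touching it again right
        // away is only safe because the handle has already been released.
        return new FileInfo(path).Length;
    }
}
```

The same pattern applies to GDI objects such as `Bitmap`, `Pen` and `Graphics`: they implement `IDisposable`, and forgetting to dispose them leaks handles no matter how healthy the managed heap looks.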

What about design patterns? Does anyone still care about them? I see them everywhere in Microsoft's stuff and I see people use Microsoft's MVC but I don't hear a lot of people talking about them. Maybe I just don't get out much.

Ddrak wrote:
Now you need a way to create a database from the XML you generate.

I've been chipping away at this code but I hit a snag. The generic provider for SQL Server does not report enough info to reconstitute the database. I am having to tweak the capture code to capture some provider-specific stuff like identity columns. Other than that, I should absolutely be able to reconstitute a database from the XML schema snapshot.

Only said it because we've got a home-grown tool at work for exporting and recreating databases, except we just export to SQL scripts (a bit like MySQL's backups). It's always a pain to keep it working as SQL Server adds new features.

My "recombobulator" tool outputs an SQL script. You're welcome to use it when it's done.

I just about have it building the table structure perfectly. All that's left, I think, are default column values. But it's getting the identity columns correct now that the capture tool grabs sys.all_objects, sys.tables, sys.columns and sys.identity_columns if it sees the SQL Server provider.

The cool thing about the table generation is that the generic schema gives a format string and parameter list for all column data types, which I use to build each column of the script. It should also be generating gracefully on non-SQL Server providers, too. For example, when it can't know for sure whether a key is a primary key, as a last resort it checks whether the key's name starts with "PK_". But if it has the provider-specific info to know for sure, it will go by that.
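A sketch of both ideas. In the ADO.NET `DataTypes` schema collection the format string and parameter list are normally exposed as the `CreateFormat` and `CreateParameters` columns, with values along the lines of `varchar({0})` and `max length`. The helper names below are hypothetical, and the case-insensitive match on "PK_" is an assumption of this example rather than something stated above.

```csharp
using System;
using System.Globalization;

public static class ColumnScript
{
    // Builds the type portion of a column definition from the provider's
    // format string. Types with no parameters may have no format at all,
    // in which case the bare type name is used.
    public static string BuildType(string typeName, string createFormat, params object[] args)
    {
        if (string.IsNullOrEmpty(createFormat))
        {
            return typeName;
        }
        // Throws FormatException if the format expects more arguments than supplied.
        return string.Format(CultureInfo.InvariantCulture, createFormat, args);
    }

    // Trusts provider-specific metadata when it exists and falls back to the
    // naming convention only when the provider could not say either way.
    public static bool IsPrimaryKey(string constraintName, bool? providerSaysPrimaryKey)
    {
        if (providerSaysPrimaryKey.HasValue)
        {
            return providerSaysPrimaryKey.Value;
        }
        return constraintName != null
            && constraintName.StartsWith("PK_", StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, `ColumnScript.BuildType("varchar", "varchar({0})", 50)` yields `varchar(50)`, and `ColumnScript.IsPrimaryKey("PK_Orders", null)` falls through to the name check and returns true.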

Still to do are constraints (another biggie but easier than tables) and procedures. This blog post will also update the first one and should be a good companion to it.

Expand on stackoverflow.com answers, but on your blog. A lot of people simply write the correct answer to a question on there, but never explain why, or go into the history of why things work that way. It would provide you with endless content and even allow you to mooch off of an already-popular site for SEO (assuming you'd copy/paste the original questions and answers into your posts). I haven't met a developer co-worker in the last 5 years that doesn't use stackoverflow, even if just once a month.

Taxious wrote:
Expand on stackoverflow.com answers, but on your blog. A lot of people simply write the correct answer to a question on there, but never explain why, or go into the history of why things work that way. It would provide you with endless content and even allow you to mooch off of an already-popular site for SEO (assuming you'd copy/paste the original questions and answers into your posts). I haven't met a developer co-worker in the last 5 years that doesn't use stackoverflow, even if just once a month.

So you're saying I should link to my blog from Stackoverflow, or just copy and paste the question and let Google find the answer on my blog?

Either way, that's an awesome idea, thank you. If I'm seeing your strategies for getting page views correctly, I'm seeing a) aggregate/regurgitate existing content and b) piggyback on existing content. See, I can make good content, and that's a good way to go, too. But you are basically machine generating content while you sit back and watch, which seems more, uh, profitable. Not to mention less work.

You would copy and paste both the question and answer into a new blog post on your site. Then below that, in the same post, you could write extra details about why .NET assumes binary data output (versus PHP default), why MS chose to do that, when and if it changed, other (better, worse, or more popular) encryption options, or ANYTHING else to expand on the answer given.

Yeah, that's awesome. It's a deep well. But more than that, I never thought of doing blog posts with a question/answer format. I like it. Though I think that makes me their SEO competitor rather than riding their SEO coat-tails. But that's fine, too. Google seems to like my content, and it doesn't hurt my search rankings that Google hosts my content.

I'm really excited about all this stuff. It's been a great learning experience.

This database "recombobulator" is kinda bugging me. The keys and constraints gave me some grief. Now I'm just down to UNIQUE constraints, which SQL Server seems to treat as a special case.

In fact, if you choose "CREATE TO-->SCRIPT" for a table with a UNIQUE constraint, it will generate the script for everything but that. My theory is that UNIQUE constraints are not returned by sys.all_objects, which is where my code was looking for them, so it missed them too. They are returned by sys.indexes instead, along with all the IX_ type indexes, and it's the only query I've found which has them.

This has been a huge undertaking, but the blog needs this "pillar content", and I want to create the whole series, so ... Next up will be views, then procedures.

Ddrak wrote:
Pretty sure unique constraints are just implemented as indexes under the hood.

Yep, but unique constraints do not show up in a sys.all_objects query with all the other constraints and PKs. I was assuming I could look at a table and dump all its dependent objects in one shot. Right now, dumping a table's dependent objects doesn't give me unique constraints. Like I said, I need to use sys.indexes instead for this type of constraint.
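For reference, a sketch of the sys.indexes route. `sys.indexes` has an `is_unique_constraint` flag, and joining through `sys.index_columns` and `sys.columns` recovers the column list in key order. The query ignores schemas for brevity, and the class and method names are made up; how the rows get read and grouped is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueConstraintScript
{
    // One row per (constraint, column), ordered so columns arrive in key order.
    public const string Query = @"
SELECT t.name AS table_name, i.name AS constraint_name, c.name AS column_name
FROM sys.indexes i
JOIN sys.tables t ON t.object_id = i.object_id
JOIN sys.index_columns ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.is_unique_constraint = 1
ORDER BY t.name, i.name, ic.key_ordinal";

    // Brackets an identifier, doubling any closing bracket inside it.
    private static string Quote(string identifier)
    {
        return "[" + identifier.Replace("]", "]]") + "]";
    }

    // Builds the statement that recreates one unique constraint.
    public static string Build(string table, string constraint, IEnumerable<string> columns)
    {
        string columnList = string.Join(", ", columns.Select(Quote));
        return "ALTER TABLE " + Quote(table) +
               " ADD CONSTRAINT " + Quote(constraint) +
               " UNIQUE (" + columnList + ");";
    }
}
```

Feeding the grouped query results into `Build` gives one `ALTER TABLE ... ADD CONSTRAINT ... UNIQUE (...)` statement per constraint, which can be appended after the `CREATE TABLE` statements in the generated script.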

As best I can tell, default constraints are the opposite. They only seem to show up in sys.all_objects and nowhere else. So it's like playing Where's Waldo for some of these objects.

I think I have it all worked out now, and one more good late night coding session should finish off keys and constraints.

First cut at the table structure with keys and constraints. A couple little bugs but it's "code complete".

I'm testing it on a huge insurance system. Here's some generated SQL below that shows a little bit of everything. Other than a couple of malformed constraints, the script builds the whole damn table structure. There are thousands of constraints in my test database and I've just about got it perfect.