Monday, July 15, 2013

Do you have strange characters in your Web app? Like a reversed question mark? Or characters like these: ⁿ‼↨♫☼◙ Sometimes it happens where you should see an apostrophe or a quote character like : Daniel◙s Bar-B-Q .

Here's one reason why: Oracle and Unicode. My scenario went like this:

A user has a Document with some Unicode characters on it like the fancy "smart quote" characters (a Unicode u2019 for apostrophe)

The user does a copy & paste to put the text into the Web app's comments field (a Java String)

The web app saves the String into an Oracle VARCHAR2(4000) column called comments

The Oracle instance has a popular character setting: US-ASCII-7, which does not support Unicode characters

Oracle converts the character by truncating the leftmost bits, rendering an unprintable character ASCII(19) into the comments column

Later, the Webapp reads the Oracle column, displays it to the browser, and displays the unprintable ASCII(19) character. as something Strange like ♫ or ◙.

Throw an error. This would rollback transactions and force developers to handle or prevent the errors - pbbbt... what else you got?

Remove the Unicode character(s). That would change the size of the value stored.

Convert the character to something else, quietly. This is actually what happens, but it has it's own pitfalls: silent but deadly... pitfalls

Oracle can actually do some impressive up-converts : like from Windows-1252 to UTF-8. But unfortunately, the down-convert smashing of "smart quote" chars into ASCII uses the most brain-dead algorithm - truncating the left most bits - and u2019 becomes x19, which is unprintable. And if you try to convert that to XML, x19 is not even a valid XML character and the parser crashes.

Why Unicode? What are Smart Quotes?So how did these Unicode characters get into my App anyhow? "My app only uses English, it's a simple app", you might be saying. The answer is that popular apps like Microsoft Office will automatically change your " or ' characters into more fancy, curly quote characters as you type. Nice, eh? Why not make your text look better? Who wouldn't want that? So now millions of MS Word documents with Unicode chars are floating around the planet, even your users are typing them, and they are copy & pasting those strings into your app. So THAT's where the Unicode came from for my scenario above.
And truly this should be OK, because tons of systems support Unicode already, Java and other languages use Unicode for their Strings internally already. And lots of apps like your Web browser support Unicode too. Oracle supports Unicode as well, but you do have to be aware at installation time. Oracle's US7ASCII setting is very popular, including on my Oracle instance, but I wouldn't recommend it. In today's world, you need Unicode, UTF-8 is well supported, and would have helped me here.

Now What? How do I Fix It?You have a few options:

1. Long-term, switch to UTF-8. UTF-8 is well-supported and it's a super-set of ASCII, so all of the old ASCII text should convert in all of your VARCHAR2 columns. For almost all characters, the size is still just 8-bits, but can be larger for smart-quotes and others. However, converting an entire Oracle instance is nuclear - if you have a large database, this can take time, and the conversion your DBA recommends may be akin to backing up everything, and re-importing all of your data: hours or days of effort.

2. Use NVARCHAR2. You might consider the Oracle NVARCHAR2 column type just for your large comment-style fields that need free-form text. This will only require an ALTER TABLE on that column. But you will have to remember this for new Tables with similar needs.

3. Use a BLOB. If you store the field in a BLOB column, Oracle will not do any conversion. For columns where you don't care about indexes and search, like a comments section on your app, this should be OK. In Java, for example, you may not have to change any code at all. This is because you can still directly use this column as a String, i.e. JDBC Statement.setString() and getString() methods work fine without the messy JDBC BLOB handling. Or, JPA's @Lob annotation can also be placed on a String for minimum coding effort. As an added benefit, your users can paste the entire contents of "War and Peace" without having a size limit of 4000 in VARCHAR2. Note: Oracle's CLOB column type will still do character conversion, so it will not help here.

4. Filter the characters. In the short term I ended up using a regex to filter out and remove any unprintable characters. Unfortunately this lost the character, turning "Dan's Bar-B-Q" into "Dans Bar-B-Q". Users were not too happy with that, but it did stop the bleeding until I can implement one of the above solutions.

Conclusion, RecommendationIf you have strange characters appearing in your application, check the character set in your database engine. Oracle, SQL Server, Postgres, MySQL - they can all support a wide range of settings. Also, think about the design of your fields, do you really want VARCHAR2(4000) for a comments field? Maybe BLOB would be better?

Use UTF-8 in your database engine

Use a BLOB (careful, not CLOB)for a comments column. You'll get a big size, and still use a simple String in memory

Make a separate table for all of the comment fields, you probably have many comment fields in your app:

Other SolutionsWindows-1252. This Microsoft-created character set looks attractive because it is still 8-bit, and it includes the curvy smart-quote characters and many other useful punctuation characters. Oracle supports it too, calling it WE8MSWIN1252 in their settings. However, this is really just a stop-gap, a hack, until good Unicode support arrived. So I cannot recommend this unless you are desperate. Technically it would work, but now that UTF-8 support is so good everywhere, and with disk space at pennies per GB, there's no need for it, and IMHO, Windows-1252 character set support should start to fade.

Monday, May 27, 2013

Oracle stored procedures are not my favorite slice of technology, but I have to use them sometimes. In this case I had a stored procedure that I needed to call: it returns a set of rows - a result set. But Oracle Stored procedures do not return anything by design - other database systems like SQLServer and Postgres can, but not Oracle. So what's a Git-R-Donepragmatic developer to do?

The stored procedure in question used an OUT parameter with a "cursor" type, this is how Oracle can "return" the results of a select statement. (I can feel my CS professor grading this now: "D-, NEVER use Out Params!" some languages like Java do not even have out params). And then the NEXT challenge: How do you call that from Java? Good question - this case doesn't happen in the abridged JDBC docs, nor in the Hibernate nor Spring frameworks docs either.

JDBC has standard support for ResultSets returned from a stored procedure CallableStatment, this is how many DB vendors like Postgres and SQLServer work. In those drivers, Statement.exeecuteQuery() returns a ResultSet. But Oracle has a long history of having backwards and low-compatibility issues for JDBC details,and this is another case of Oracle jamming their square-peg stored procedures into the round hole of JDBC. But you can make it work.

Here are the "challenges":
* Oracle Stored Procedures CANNOT return a result set (it's an old design?)
* Oracle "gives" you SYS_REFCURSOR on an OUT param (not a JDBC standard, so that's an odd feeling)
* Oracle's JDBC driver does not have a JDBC-standard way to do this, they require OracleType.CURSOR
* Oracle's driver rejects the "OTHER" SQL Type, so you cannot use this JDBC Standard either

Step 1: Your stored procedure

Oracle stored procedures can use an OUT param with type SYS_REFCURSOR to return results of a SQL Select:

Step 2: CallableStatment

Of course, been there, done that - the ol' JDBC way to call stored procedures:

CallableStatement cs = connection.prepareCall("call jay_proc(?)");

Step 3: register the OUT param

Feeling squeemish? It's OK... we are almost finished ...you'll feel a small prick in your good programming sense:

cs.registerOutParameter(1, OracleType.CURSOR);
cs.execute();

Step 4: blink... cast the ResultSet ??!

err.. Yes, it's true, the object returned is actually a ResultSet, but you have to cast it:

ResultSet rs = (ResultSet) cs.getObject(1);

Step 5: Use the ResultSet as normal, JDBC calls

Whew! It's over.. Now we have what we wanted, a ResultSet:

while( rs.next()){

Conclusion

Oracle does not directly support stored procedures that return ResultSet, but with these 3 or 4 steps, you can get the stored procedure with a cursor to work. I assume that Oracle's JDBC driver does the work of mapping their cursor object to a JDBC-compliant ResultSet, which is nice. But if I had a preference, I would wish for Oracle to support procedures that can return rows from a select. Then you could use the execute() method to return a ResultSet directly without the odd casting operation in step 4.

Ideally you would use a SQL Select statement and the PreparedStatement JDBC object, but I was saddled with an existing stored procedure in this case. I do not recommend using stored procedures for almost any use case, if you can avoid it. Instead I prefer to use Java's standard JPA as much as possible.

Friday, October 1, 2010

Yesterday I was in a meeting with several enterprise application developers, the users of our application were complaining that page load times had increased to 4 seconds. One of the users mentioned that google loads in under a second.

What was the general response? Well, this is pretty good, we're not google after all. If we had the resources of google, maybe we could get our page load times down.

To me 4 seconds is an eternity for a user application, on most of the apps I've worked on we've usually had response times less than 300ms. It is usually fairly easy to meet that goal with a combination of caching, sql tuning and using tools like jawr.

I usually don't like to release code with high page load times unless there is a good reason.

Friday, December 18, 2009

Google Public DNS is primarily aimed at users of the internet. I had run across it surfing the web and filed it away. But it wasn't until my configured DNS servers for my local network went down (again) that I decided to give it a shot.

A couple of things that I notice right away is that the IP addresses are easy to remember:

8.8.8.8 and 8.8.4.4

That's actually something I am going to be able to recall next time I find myself with broken DNS services. The second thing is that my ping to these are 24ms roughly, that's not too bad. The real test is lookup speed, and this is where Google is riding on the edge as they appear to be doing some aggressive pre-lookup of DNS entries on slow DNS servers. Are they going to break TTL? A lot of people design fail-over using a lower TTL in their DNS responses so that the cache will be flushed if a failover is needed to a new IP address.

In any case, I've giving it a go for a while and so far my surfing does feel faster.

So if your public facing users are complaining about a slow web experience on your site, one suggestion is to point them to the free Google DNS servers to see if that improves their experience.