Thursday, December 29, 2005

I knew that the VM does not exit as long as there is at least one non-daemon thread running. But there were a couple of new things I learned. Some snippets from the above article:

There are two ways a thread can become a daemon thread (or a user thread, for that matter) without putting your soul at risk. First, you can explicitly specify a thread to be a daemon thread by calling setDaemon(true) on a Thread object. Note that the setDaemon() method must be called before the thread's start() method is invoked. Once a thread has started executing (i.e., its start() method has been called), its daemon status cannot be changed. The second technique for creating a daemon thread is based on an often overlooked feature of Java's threading behavior: if a thread creates a new thread and does not call setDaemon() to explicitly set the new thread's daemon status, the new thread inherits the daemon status of the thread that created it. In other words, unless setDaemon(false) is called, all threads created by daemon threads will be daemon threads; similarly, unless setDaemon(true) is called, all threads created by user threads will be user threads.
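Both behaviours can be seen in a quick sketch (plain Java; the class and thread names are mine):

```java
public class DaemonDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread daemon = new Thread(() -> {
            // A thread created inside a daemon thread inherits daemon status
            Thread child = new Thread(() -> { });
            System.out.println("child is daemon: " + child.isDaemon());
        });
        daemon.setDaemon(true);   // must be called before start()
        daemon.start();
        daemon.join();            // wait, or the VM may exit before the daemon prints
        System.out.println("daemon flag: " + daemon.isDaemon());
    }
}
```

Calling setDaemon() after start() would throw an IllegalThreadStateException.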

Monday, December 26, 2005

I have often wanted to learn more about the class file format of Java class files. This study led me to interesting discoveries regarding what debug information is stored in class files and how it is stored.

When we compile a Java source using the 'javac' exe, the generated class file by default contains some debug info: only the line number and source file information is generated. Hence, when a stack trace is printed on the screen, we can see the source file name and the line number printed as well. Also, when using log4j, I remember using a layout that can print the line number of each log statement - I bet log4j uses the debug information present in the class file to do this.

The javac exe also has a '-g' option, which generates all debugging information, including local variables. To understand this better, I compiled a Java source twice: once with the -g option and once without. I then decompiled both class files to see how the decompiled output differs. The class file compiled with the debug option showed all the local variable names exactly as in the original Java file, whereas in the one compiled without the debug option, the decompiler had to invent names for the local variables because the local variable information was not stored in the class file.

Another interesting fact is that a class file can contain certain attributes that are non-standard, i.e. vendor specific. Surprised? So was I. Please find below a snippet from the VM spec:

Compilers for Java source code are permitted to define and emit class files containing new attributes in the attributes tables of class file structures. Java Virtual Machine implementations are permitted to recognize and use new attributes found in the attributes tables of class file structures. However, all attributes not defined as part of this Java Virtual Machine specification must not affect the semantics of class or interface types. Java Virtual Machine implementations are required to silently ignore attributes they do not recognize.

For instance, defining a new attribute to support vendor-specific debugging is permitted. Because Java Virtual Machine implementations are required to ignore attributes they do not recognize, class files intended for that particular Java Virtual Machine implementation will be usable by other implementations even if those implementations cannot make use of the additional debugging information that the class files contain.

While designing systems, I often came across the moot point of choosing between a hierarchical database and a relational flat-table database. There are some domain problems where hierarchical databases such as LDAP and MS Directory Services make sense over RDBMSs.

The hierarchical database model existed before the far more familiar relational model, back in the early mainframe days. Hierarchical databases were blown away by relational versions because it was difficult to model a many-to-many relationship: the very basis of the hierarchical model is that each child element has only one parent element.

In his article, Scott Ambler makes the following statement: Hierarchical databases fell out of favor with the advent of relational databases due to their lack of flexibility, because they wouldn't easily support data access outside the original design of the data structure. For example, in the customer-order schema you could only access an order through a customer; you couldn't easily find all the orders that included the sale of a widget because the schema isn't designed to allow that.

I found an article on the microsoft site which discusses the areas where directory services can be used in place of Relational databases.

Thursday, December 22, 2005

It took me some time to appreciate the inheritance hierarchy in log4j. It's quite simple to understand the log levels and their usage.

It is because of this hierarchy that we can have any arbitrary level of granularity in logging. For example, suppose we have 10 components in a distributed environment and we want to shut down the logging of one component. This can be easily done in log4j using config files. We can also disable logging at a class level - provided we have a Logger for that class, obtained using the Logger.getLogger(MyClass.class) method. Another example is if you want to switch off logging of one or more modules in an application.
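log4j's logger hierarchy is closely mirrored by the JDK's own java.util.logging package, so the per-component granularity can be sketched with the JDK API alone (the logger names here are invented; in log4j you would do the same thing with its Logger.getLogger and a config file):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GranularityDemo {
    public static void main(String[] args) {
        // Loggers are named hierarchically, e.g. one per component or class
        Logger componentA = Logger.getLogger("app.componentA");
        Logger componentB = Logger.getLogger("app.componentB");

        // Silence one component without touching the others
        componentB.setLevel(Level.OFF);

        System.out.println(componentA.isLoggable(Level.INFO));
        System.out.println(componentB.isLoggable(Level.INFO));
    }
}
```

componentA inherits its effective level from the root of the hierarchy, while componentB's explicit OFF overrides it - exactly the inheritance behaviour described above.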

We can even make dynamic changes to the configuration file at runtime, and these changes will get reflected, provided we have used the configureAndWatch() method.

Classloaders, as the name suggests, are responsible for loading classes within the JVM. Before your class can be executed or accessed, it must become available via a classloader. Given a class name, a classloader locates the class and loads it into the JVM. Classloaders are Java classes themselves. This raises the question: if classloaders are Java classes themselves, who or what loads them? When you execute a Java program (i.e., by typing java at a command prompt), it executes and launches a native Java launcher. By native, I mean native to your current platform and environment. This native Java launcher contains a classloader called the bootstrap classloader. This bootstrap classloader is native to your environment and is not written in Java. The main function of the bootstrap classloader is to load the core Java classes.

The JVM implements two other classloaders by default. The bootstrap classloader loads the extension and application classloaders into memory. Both are written in Java. As mentioned before, the bootstrap classloader loads the core Java classes (for example, classes from the java.util package). The extension classloader loads classes that extend the core Java classes (e.g., classes from the javax packages, or the classes under the ext directory of your runtime). The application classloader loads the classes that make up your application.
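A small program makes this arrangement visible (the bootstrap loader is reported as null by the reflection API):

```java
public class LoaderDemo {
    public static void main(String[] args) {
        // Core classes (e.g. java.lang.String) are loaded by the bootstrap
        // classloader, which getClassLoader() reports as null
        System.out.println(String.class.getClassLoader() == null);
        // Our own class comes from the application classloader
        System.out.println(LoaderDemo.class.getClassLoader() != null);
    }
}
```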

In application servers, each separately deployed web application and EJB gets its own classloader (normally; this is certainly the case in WebLogic). This classloader is derived from the application classloader and is responsible for that particular EJB or web application. This new classloader loads all classes that the webapp or EJB requires that are not already part of the Java core classes or the extension packages. It is also responsible for loading and unloading of classes, a feature missing from the default classloaders. This feature helps in hot deployment of applications.

When WebLogic starts up, it uses the Java-supplied application classloader to load the classes that make up its runtime. It then launches individual classloaders, derived from the Java application classloader, which load the classes for individual applications. The individual classloaders are invisible to the classloaders of the other applications; hence, classes loaded for one particular application will not be seen by another application.

What if you want to make a single class available to all applications? Load it in a top-level classloader. This could be in the classpath of WebLogic. When WebLogic starts, it will automatically load this class in memory using the Java-supplied application classloader, and all sub-application classloaders get access to it. However, the negatives of this approach are clear too. First, you lose the capability of hot deploy for this particular class in all individual applications. Second, any change in this class means that the server needs to be restarted, as there is no mechanism for a Java application classloader to reload classes. You will need to weigh the pros and cons before you take this approach.

WebLogic Server allows you to deploy newer versions of application modules such as EJBs while the server is running.
This process is known as hot-deploy or hot-redeploy and is closely related to classloading. Java classloaders do not have any standard mechanism to undeploy or unload a set of classes, nor can they load new versions of classes. In order to make updates to classes in a running virtual machine, the classloader that loaded the changed classes must be replaced with a new classloader. When a classloader is replaced, all classes that were loaded from that classloader (or any classloaders that are offspring of that classloader) must be reloaded. Any instances of these classes must be re-instantiated.

Tuesday, December 20, 2005

Often I have heard people complain that Swing apps are slow. I too have had some bad experiences with slow Swing apps, but then I have also seen some very cool, high-performance applications. So what's the secret of these fast Swing apps? I think the reason can be found in James Gosling's answer:

The big problem that Swing has is that it's big and complicated. And the reason that it's big and complicated is that it is the most fully featured, flexible, whatever-way-you-want-to-measure-it UI [user interface] tool kit on the planet. You can do the most amazing things with Swing. It's sort of the 747 cockpit of UI design tool kits, and flying a 747 is a little intimidating, and what people really want is a Cessna. It is definitely possible to write good, speedy Swing applications, but you need to be a master of Swing and threading and know all the potholes that hog memory.

Wednesday, December 14, 2005

The Synchronizer Token pattern addresses the problem of duplicate form submissions. A synchronizer token is set in a user's session and included with each form returned to the client. When that form is submitted, the synchronizer token in the form is compared to the synchronizer token in the session. The tokens should match the first time the form is submitted. If the tokens do not match, then the form submission may be disallowed and an error returned to the user. Token mismatch may occur when the user submits a form, then clicks the Back button in the browser and attempts to resubmit the same form.

On the other hand, if the two token values match, then we are confident that the flow of control is exactly as expected. At this point, the token value in the session is modified to a new value and the form submission is accepted.
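The whole flow fits in a few lines. This sketch uses a HashMap to stand in for the HTTP session, with method names modelled on the pattern (hypothetical; this is not the Struts API itself):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class TokenDemo {
    static final String TOKEN_KEY = "synchronizer.token";

    // Generate a fresh token and store it in the "session";
    // the same value would also be embedded as a hidden form field.
    static String saveToken(Map<String, Object> session) {
        String token = UUID.randomUUID().toString();
        session.put(TOKEN_KEY, token);
        return token;
    }

    // Compare the submitted token with the session's copy,
    // resetting it so that a resubmission of the same form fails.
    static boolean isTokenValid(Map<String, Object> session, String formToken) {
        Object expected = session.remove(TOKEN_KEY);
        return expected != null && expected.equals(formToken);
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();
        String token = saveToken(session);                // rendered into the form
        System.out.println(isTokenValid(session, token)); // first submit: accepted
        System.out.println(isTokenValid(session, token)); // Back-button resubmit: rejected
    }
}
```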

Struts has in-built support for the Synchronizer Token pattern. Methods such as saveToken() and isTokenValid() are present in the Action class.

Another way of preventing a duplicate submit is using JavaScript to disable the submit button after it has been pressed once. But this does not work well with all browsers.

Databases support different types of locking strategies. With optimistic locking strategies, concurrency is higher but data integrity guarantees are weaker; with pessimistic locking strategies, concurrency is lower but data integrity is stronger. Basically, when it comes to locking and transaction isolation levels, it's a trade-off between data integrity and concurrency.

Databases support various levels of lock granularity. The lowest level is a single row. Sometimes the database doesn't have enough resources to lock each individual row. In such cases, the database can acquire locks on a single data or index page, a group of pages, or an entire table. The granularity of locks depends on the memory available to the database. Servers with more memory can support more concurrent users because they can acquire and release more locks.

We can give locking hints to the database while sending queries; these help us override locking decisions made by the database. For instance, in SQL Server, we can specify the ROWLOCK hint with an UPDATE statement to convince SQL Server to lock each row affected by that data modification.

While dealing with transactions, the type of locking used depends on the transaction isolation level specified for that particular transaction. SQL Server supports implicit and explicit transactions. By default, each INSERT, UPDATE, and DELETE statement runs within an implicit transaction. Explicit transactions, on the other hand, must be specified by the programmer. Such transactions are enclosed in a BEGIN TRANSACTION ... COMMIT TRANSACTION block.

There are two common types of locks present in all databases: shared locks and exclusive locks. Shared locks (S) allow transactions to read data with SELECT statements. Other connections are allowed to read the data at the same time; however, no transactions are allowed to modify data until the shared locks are released. Exclusive locks (X) completely lock the resource from any type of access, including reads. They are issued when data is being modified through INSERT, UPDATE and DELETE statements.

Lost Updates: Lost updates occur when two or more transactions select the same row and then update the row based on the value originally selected. Each transaction is unaware of other transactions. The last update overwrites updates made by the other transactions, which results in lost data.

Dirty Read: Uncommitted dependency occurs when a second transaction selects a row that is being updated by another transaction. The second transaction is reading data that has not been committed yet and may be changed by the transaction updating the row.

Nonrepeatable Read: Inconsistent analysis occurs when a second transaction accesses the same row several times and reads different data each time.

Phantom reads: These occur when an insert or delete action is performed against a row that belongs to a range of rows being read by a transaction. The transaction's first read of the range of rows shows a row that no longer exists in the second or succeeding read, as a result of a deletion by a different transaction. Similarly, as the result of an insert by a different transaction, the transaction's second or succeeding read shows a row that did not exist in the original read.

These concurrency problems can be solved using the correct transaction isolation level in our programs. To balance between high concurrency and data integrity, we can choose from the following isolation levels:

READ UNCOMMITTED: This is the lowest level where transactions are isolated only enough to ensure that physically corrupt data is not read. Does not protect against dirty read, phantom read or non-repeatable reads.

READ COMMITTED: Shared locks are held while the data is being read, so a transaction cannot read data that another transaction has modified but not yet committed. Protects against dirty reads, but the other problems remain.

REPEATABLE READ: This setting disallows dirty and non-repeatable reads. But phantom reads are still possible.

SERIALIZABLE: This is the highest level, where transactions are completely isolated from one another. Protects against dirty read, phantom read and non-repeatable read.
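JDBC exposes these same four levels as constants on java.sql.Connection (applied to a connection with setTransactionIsolation()). As a rough sketch of which level is the weakest cure for each anomaly (my own mapping, following the descriptions above):

```java
import java.sql.Connection;

public class IsolationDemo {
    // Weakest ANSI isolation level that prevents the given anomaly.
    // The return values are the standard JDBC constants.
    static int weakestLevelPreventing(String anomaly) {
        switch (anomaly) {
            case "dirty read":          return Connection.TRANSACTION_READ_COMMITTED;
            case "non-repeatable read": return Connection.TRANSACTION_REPEATABLE_READ;
            case "phantom read":        return Connection.TRANSACTION_SERIALIZABLE;
            default: throw new IllegalArgumentException(anomaly);
        }
    }

    public static void main(String[] args) {
        // In real code you would call con.setTransactionIsolation(level)
        System.out.println(weakestLevelPreventing("dirty read")
                == Connection.TRANSACTION_READ_COMMITTED);
        System.out.println(weakestLevelPreventing("phantom read")
                == Connection.TRANSACTION_SERIALIZABLE);
    }
}
```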

There are other types of locks available in databases, such as the 'update lock', 'intent lock', 'schema lock', etc. The update lock helps prevent deadlocks in the database. For more information about these, check out the following links:

Tuesday, December 13, 2005

This seems to be a favourite question during interviews: how do you delete duplicate rows in a table? Well, we can have many strategies:

- Capture one instance of the unique rows using a SELECT DISTINCT .., and dump the results into a temp table. Delete all of the rows from the original table and then insert the rows from the temp table back into the original table.

- If there is an identity column, then we can also write a query to get all the rows that have duplicate entries:
SELECT Field1, Field2, Count(ID)
FROM Foo1
GROUP BY Foo1.Field1, Foo1.Field2
HAVING Count(Foo1.ID) > 1
-----------------------------
Then loop through the resultset and get a cursor for the query:
SELECT Field1, Field2, ID
FROM Foo1
WHERE Field1 = @FIELD1 AND Field2 = @FIELD2
-----------------------------
Use the cursor to delete all the duplicate rows returned except one.

While working on projects, I often had tussles with erstwhile database developers who used to drive the design of the system using data models. Most of them came from a non-OO background and were adept at ER diagrams. But the problem is that a data model cannot and should not drive the OO model. I came across this excellent article by Scott Ambler discussing this dilemma.

In the above article, the author argues that object schemas can be quite different from physical data schemas. Using two lucid examples, he shows how different object schemas can map to the same data schema and how it is possible for a single object schema to correctly map to several data schemas. There are situations where the OO model varies significantly from the ERD, for example in cases of inheritance and non-persistent classes. Also, OO models can convey more specification on the class diagram when representing certain things, such as associations, than an ERD can.

I personally believe that if you are building an OO system, then you should first go for OO class modelling and then think about resolving the object-relational impedance mismatch.

A multidimensional database (MDB) is a type of database that is optimized for data warehouse and online analytical processing (OLAP) applications. Multidimensional databases are frequently created using input from existing relational databases.

A multidimensional database - or a multidimensional database management system (MDDBMS) - implies the ability to rapidly process the data in the database so that answers can be generated quickly. A number of vendors provide products that use multidimensional databases. Approaches to how data is stored and the user interface vary.

Conceptually, a multidimensional database uses the idea of a data cube to represent the dimensions of data available to a user. For example, "sales" could be viewed in the dimensions of product model, geography, time, or some additional dimension. In this case, "sales" is known as the measure attribute of the data cube and the other dimensions are seen as feature attributes. Additionally, a database creator can define hierarchies and levels within a dimension (for example, state and city levels within a regional hierarchy).
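The idea can be illustrated with a toy fact table in plain Java (the data and field names are invented): each Sale row carries the measure attribute ("amount") plus feature attributes, and a roll-up along one dimension is just a grouped aggregation.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CubeDemo {
    // One cell of the fact table: feature attributes + the measure
    record Sale(String product, String region, int year, double amount) {}

    public static void main(String[] args) {
        List<Sale> facts = List.of(
            new Sale("ModelA", "East", 2005, 100.0),
            new Sale("ModelA", "West", 2005, 50.0),
            new Sale("ModelB", "East", 2005, 70.0));

        // Roll up the "sales" measure along the product dimension
        Map<String, Double> byProduct = facts.stream()
            .collect(Collectors.groupingBy(Sale::product,
                     Collectors.summingDouble(Sale::amount)));

        System.out.println(byProduct.get("ModelA"));
    }
}
```

An OLAP engine does essentially this, but pre-aggregated across every combination of dimensions and hierarchy levels so the answers come back quickly.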

Tuesday, December 06, 2005

Cursors are database objects that allow us to manipulate data in a set on a row-by-row basis or on a group of rows at one time. Cursors are quite popular among database developers because a row can be updated in place as it is read; there is no need to fire a separate SQL query.

But there is also a lot of confusion regarding the exact definition of cursors, because different people use the term in different contexts. For example, when a database administrator talks of cursors, he means the server-side cursor he uses inside stored procedures. But if a JDBC application developer talks about a cursor, he may mean the pointer in the JDBC ResultSet object. I did a bit of research on the web, and finally cleared a few cobwebs from my head. Here's a snapshot of what I learned:

There are implicit cursors and explicit cursors. An implicit cursor is, well, just a piece of memory used by the database server to work with resultsets internally. Implicit cursors are not accessible via an externally exposed API, whereas explicit cursors are.

Server-side cursors and client-side cursors: The difference between a client-side cursor and a server-side cursor in classic ADO is substantial and can be confusing. A server-side cursor allows you to manipulate rows on the server through calls on the client, usually storing all or a portion of the data in TEMPDB. A client-side cursor fetches data into a COM object on the client. Its name comes from the fact that the buffered data on the client exhibits cursor-like behaviors: you can scroll through it and, potentially, update it. The behavior difference manifests itself in a few ways. Fetching a large resultset into a client cursor causes a big performance hit on the initial fetch, while server cursors result in increased memory requirements for SQL Server and require a dedicated connection the whole time you're fetching. Client-side cursors can be dangerous because using them has the side effect of retrieving all records from the server that are a result of your command/SQL statement. If you execute a procedure or SQL statement that could retrieve 10,000 records, and use a client-side cursor, you had better have enough memory to hold 10,000 records on the client. Also, control will not return to your code/application until all of these records are retrieved from the server, which can make your application appear slow to users. If you suspect that you are going to retrieve a large number of rows, client-side cursors are not the way to go.

DYNAMIC, STATIC, and KEYSET cursors: Dynamic cursors will show changes made on the base table as you scroll through the cursor. Static cursors copy the base table to the tempdb; the cursor then reads from the tempdb, so any changes happening on the base table will not be reflected as the cursor scrolls. Keyset cursors are in between dynamic and static cursors. A keyset cursor copies the keys of the base table's selected records into the tempdb, so the cursor selects the rows from the tempdb but the data from the base table. So changes to the base table will be seen, but newly inserted records will not be.

Next, I tried to find out how these cursors can be manipulated using .NET and Java APIs.

In .NET, the DataReader object acts as a wrapper class for a server-side cursor. The DataReader provides a read-only and forward-only cursor. More info about this is here.

In JDBC, we handle everything using ResultSet objects. Since JDBC 2.0/3.0, we have ResultSets that are scrollable backwards and that also allow update operations. So I believe if you configure a ResultSet to be scrollable backwards and to allow update operations, you are essentially dealing with a server-side cursor.

Code snippet:

Statement stmt = con.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE,
                                     ResultSet.CONCUR_READ_ONLY);
ResultSet srs = stmt.executeQuery("SELECT COF_NAME, PRICE FROM COFFEES");
---------------------------------------------------------------------------------
The above code makes the ResultSet scroll-sensitive, so it will reflect changes made to it while it is open. This is the equivalent of a dynamic cursor. In a dynamic cursor, the engine repeats the query each time the cursor is accessed, so new members are added and existing members are removed as the database processes changes to the database. The second argument is one of two ResultSet constants for specifying whether a result set is read-only or updatable: CONCUR_READ_ONLY and CONCUR_UPDATABLE. Specifying the constant TYPE_FORWARD_ONLY creates a nonscrollable result set, that is, one in which the cursor moves only forward. If you do not specify any constants for the type and updatability of a ResultSet object, you will automatically get one that is TYPE_FORWARD_ONLY and CONCUR_READ_ONLY (as is the case when you are using only the JDBC 1.0 API). Keep in mind, though, that no matter what type of result set you specify, you are always limited by what your DBMS and driver actually provide.

Monday, December 05, 2005

Recently, I wanted to see the hex dump of a binary file on a server. The server machine did not have my favourite hex GUI editors. I struggled to download a hex dump utility from the web, when my friend showed me how to use the 'debug.exe' tool in DOS to get a hex dump of a file.

On the DOS prompt type:
C:\>debug <filename>
-d
-----------------------
The 'd' (dump) command would dump the first 128 bytes of the file. Typing 'd' again would give the hex dump of the next block of bytes.

You can type '?' to get a list of all the commands, and 'q' to quit from debug mode.

A few years back, I often used to lament the lack of packet sniffing APIs available in Java or C#.NET. I knew of the WinPcap library, but I wanted a wrapper around it. Fortunately some good guys in the open source community have built these wrappers around WinPcap.

A DataReader is a stream of data that is returned from a database query. When the query is executed, the first row is returned to the DataReader via the stream. The stream then remains connected to the database, poised to retrieve the next record. The DataReader reads one row at a time from the database and can only move forward, one record at a time. As the DataReader reads the rows from the database, the values of the columns in each row can be read and evaluated, but they cannot be edited.

The DataReader can only get its data from a data source through a managed provider. The DataSet can also get its data from a data source via a managed provider, but the data source can also be loaded manually, even from an XML file on a hard drive.

The DataReader supports access to multiple resultsets, one at a time, in the order they are retrieved. (Methods to check out are NextResult() and Read().)

When the SqlDataAdapter's Fill method is executed it opens its associated SqlConnection object (if not already open) and issues its associated SqlCommand object against the SqlConnection. Behind the scenes, a SqlDataReader is created implicitly and the rowset is retrieved one row at a time in succession and sent to the DataSet. Once all of the data is in the DataSet, the implicit SqlDataReader is destroyed and the SqlConnection is closed.

The DataSet's disconnected nature allows it to be transformed into XML and sent over the wire via HTTP if appropriate. This makes it ideal as the return vehicle from business-tier objects and Web services. A DataReader cannot be serialized and thus cannot be passed between physical-tier boundaries where only string (XML) data can go.

There are other times when a DataReader can be the right choice, such as when populating a list or retrieving 10,000 records for a business rule. When a huge amount of data must be retrieved to a business process, even on a middle tier, it can take a while to load a DataSet, pass the data to it on the business tier from the database, and then store it in memory. The footprint could be quite large and with numerous instances of it running (such as in a Web application where hundreds of users may be connected), scalability would become a problem. If this data is intended to be retrieved and then traversed for business rule processing, the DataReader could speed up the process as it retrieves one row at a time and does not require the memory resources that the DataSet requires.

Most databases have built-in functions, such as GetDate, DateAdd, and ObjectName. Such built-in functions are useful, but you can't alter their functionality in any way, and that's why UDFs are so powerful and necessary. UDFs allow you to add custom solutions for unique application-specific problems. A UDF is actually a kind of subroutine that contains T-SQL statements and can return a scalar value or a table value; hence you can call a UDF from a SELECT statement, whereas a stored procedure needs to be invoked using the 'EXEC' command. So when to use what? The answer depends on the problem situation. If you have an operation such as a query with a FROM clause that requires a rowset to be drawn from a table or set of tables, then a function will be the appropriate choice. However, when you want to use that same rowset in your application, the better choice would be a stored procedure.

The other differences between a UDF and a stored procedure are as follows:

The application blocks that comprise the Enterprise Library are the following:

- Caching Application Block. This application block allows developers to incorporate a local cache in their applications.
- Configuration Application Block. This application block allows applications to read and write configuration information.
- Data Access Application Block. This application block allows developers to incorporate standard database functionality in their applications.
- Cryptography Application Block. This application block allows developers to include encryption and hashing functionality in their applications.
- Exception Handling Application Block. This application block allows developers and policy makers to create a consistent strategy for processing exceptions that occur throughout the architectural layers of enterprise applications.
- Logging and Instrumentation Application Block. This application block allows developers to incorporate standard logging and instrumentation functionality in their applications.
- Security Application Block. This application block allows developers to incorporate security functionality in their applications. Applications can use the application block in a variety of situations, such as authenticating and authorizing users against a database, retrieving role and profile information, and caching user profile information.

Most of us have worked on transactions at one time or the other. Transactions can be broadly classified as local or distributed. Local transactions are of the simplest form, in which the application performs CRUD operations against a single datasource/database. The simplest form of relational database access involves only the application, a resource manager, and a resource adapter. The resource manager can be a relational database management system (RDBMS), such as Oracle or SQL Server. All of the actual database management is handled by this component. The resource adapter is the component that is the communications channel, or request translator, between the "outside world", in this case the application, and the resource manager. In Java applications, this is a JDBC driver.

In a distributed transaction, the transaction accesses and updates data on two or more networked resources, and therefore must be coordinated among those resources. These resources could consist of several different RDBMSs housed on a single server, for example, Oracle, SQL Server, and Sybase; or they could include several instances of a single type of database residing on a number of different servers. In any case, a distributed transaction involves coordination among the various resource managers. This coordination is the function of the transaction manager. The transaction manager is responsible for making the final decision either to commit or roll back any distributed transaction. A commit decision should lead to a successful transaction; rollback leaves the data in the database unaltered.

The first step of the distributed transaction process is for the application to send a request for the transaction to the transaction manager. Although the final commit/rollback decision treats the transaction as a single logical unit, there can be many transaction branches involved. A transaction branch is associated with a request to each resource manager involved in the distributed transaction. Requests to three different RDBMSs, therefore, require three transaction branches. Each transaction branch must be committed or rolled back by the local resource manager. The transaction manager controls the boundaries of the transaction and is responsible for the final decision as to whether or not the total transaction should commit or rollback. This decision is made in two phases, called the Two-Phase Commit Protocol.

In the first phase, the transaction manager polls all of the resource managers (RDBMSs) involved in the distributed transaction to see if each one is ready to commit. If a resource manager cannot commit, it responds negatively and rolls back its particular part of the transaction so that data is not altered.

In the second phase, the transaction manager determines if any of the resource managers have responded negatively, and, if so, rolls back the whole transaction. If there are no negative responses, the transaction manager commits the whole transaction and returns the results to the application.
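The two-phase decision described above can be sketched as a toy model in plain Java. Note that ResourceManager and TransactionManager here are illustrative names of my own, not the real JTA/XA interfaces:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two-phase commit decision logic (not a real XA implementation).
interface ResourceManager {
    boolean prepare();   // phase 1: "are you ready to commit your branch?"
    void commit();       // phase 2: make this branch permanent
    void rollback();     // phase 2: undo this branch
}

class TransactionManager {
    // Returns true if the distributed transaction committed.
    boolean run(List<ResourceManager> branches) {
        boolean allReady = true;
        for (ResourceManager rm : branches) {      // phase 1: poll every resource manager
            if (!rm.prepare()) { allReady = false; break; }
        }
        for (ResourceManager rm : branches) {      // phase 2: all commit, or all roll back
            if (allReady) rm.commit(); else rm.rollback();
        }
        return allReady;
    }

    public static void main(String[] args) {
        ResourceManager good = new ResourceManager() {
            public boolean prepare() { return true; }
            public void commit() { System.out.println("branch committed"); }
            public void rollback() { System.out.println("branch rolled back"); }
        };
        List<ResourceManager> branches = new ArrayList<ResourceManager>();
        branches.add(good);
        System.out.println(new TransactionManager().run(branches));
    }
}
```

A single negative vote in phase 1 is enough to roll back every branch in phase 2.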

In VS.NET, whenever we add a web-reference to a webservice, what happens behind the scenes is that a proxy is created which contains the URL to the webservice.

Now if we need to shift the webservice to a production server, do we have to create the proxy again? (The IP address of the webservice server would have changed.)

Fortunately there is an easy way out. Just right-click on the proxy in VS.NET and change its URL property from 'static' to 'dynamic'. And voila, VS.NET automatically adds code to the proxy class and web.config. The newly added code in the proxy will first check for a URL key in the web.config and use it to bind to the webservice. So when we move the webservice to a different server, we just have to change the URL in the web.config file.

An interesting aspect of the package was the clear separation of object creation from the way the objects are pooled. There is a PoolableObjectFactory interface that provides a generic interface for managing the lifecycle of a pooled instance.

By contract, when an ObjectPool delegates to a PoolableObjectFactory:

- makeObject is called whenever a new instance is needed.
- activateObject is invoked on every instance before it is returned from the pool.
- passivateObject is invoked on every instance when it is returned to the pool.
- destroyObject is invoked on every instance when it is being "dropped" from the pool (whether due to the response from validateObject, or for reasons specific to the pool implementation).
- validateObject is invoked in an implementation-specific fashion to determine if an instance is still valid to be returned by the pool. It will only be invoked on an "activated" instance.
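The real interfaces live in Apache Commons Pool; the self-contained sketch below just mimics that contract (the simplified interface and SimpleObjectPool class are my own, for illustration) to show when each lifecycle callback fires:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified stand-in for org.apache.commons.pool.PoolableObjectFactory.
interface PoolableObjectFactory<T> {
    T makeObject();                // called when a new instance is needed
    void activateObject(T obj);    // before an instance leaves the pool
    void passivateObject(T obj);   // when an instance returns to the pool
    boolean validateObject(T obj); // is this (activated) instance still good?
    void destroyObject(T obj);     // when an instance is dropped from the pool
}

class SimpleObjectPool<T> {
    private final PoolableObjectFactory<T> factory;
    private final Deque<T> idle = new ArrayDeque<T>();

    SimpleObjectPool(PoolableObjectFactory<T> factory) { this.factory = factory; }

    T borrowObject() {
        T obj = idle.isEmpty() ? factory.makeObject() : idle.pop();
        factory.activateObject(obj);
        if (!factory.validateObject(obj)) { // invalid instances get dropped
            factory.destroyObject(obj);
            return borrowObject();          // and a fresh one is tried
        }
        return obj;
    }

    void returnObject(T obj) {
        factory.passivateObject(obj);
        idle.push(obj);
    }

    public static void main(String[] args) {
        PoolableObjectFactory<StringBuilder> f = new PoolableObjectFactory<StringBuilder>() {
            public StringBuilder makeObject() { return new StringBuilder(); }
            public void activateObject(StringBuilder o) { o.setLength(0); } // reset on checkout
            public void passivateObject(StringBuilder o) {}
            public boolean validateObject(StringBuilder o) { return true; }
            public void destroyObject(StringBuilder o) {}
        };
        SimpleObjectPool<StringBuilder> pool = new SimpleObjectPool<StringBuilder>(f);
        StringBuilder buf = pool.borrowObject();
        pool.returnObject(buf);
    }
}
```

The nice part of the design is exactly the separation mentioned above: the pool decides *when* to call each method, the factory decides *what* creation, validation and destruction mean.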

In this article the author argues that using thread pools makes sense whenever you have a large number of tasks that need to be processed and each task is short-lived, e.g. web servers, FTP servers etc. But when the tasks are long-running and few in number, it may make sense to actually spawn a thread for each task. Another common threading model is to have a single background thread and task queue for tasks of a certain type. AWT and Swing use this model, in which there is a GUI event thread, and all work that causes changes in the user interface must execute in that thread.

Java 5.0 has come up with a new package, java.util.concurrent, that contains a lot of cool classes if you need to implement threading in your applications.
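A minimal example of the new package's thread pool support, submitting one short-lived task to a fixed-size pool:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// java.util.concurrent in action: a fixed-size thread pool processing
// a short-lived task, the workload thread pools are best suited for.
public class PoolDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Future<Integer> result = pool.submit(new Callable<Integer>() {
            public Integer call() { return 6 * 7; } // runs on a pool thread
        });
        System.out.println(result.get()); // blocks until the task completes
        pool.shutdown();                  // no new tasks; existing ones finish
    }
}
```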

Thursday, November 17, 2005

There is a new class in Java 5.0 - the StringBuilder class - that can be used whenever we are doing some heavy-duty string manipulation and do not wish to get bogged down by the immutable property of strings. But then what happened to the good old StringBuffer class? Well, it's still there. Here's what the JavaDoc says:

This StringBuilder class provides an API compatible with StringBuffer, but with no guarantee of synchronization. This class is designed for use as a drop-in replacement for StringBuffer in places where the string buffer was being used by a single thread (as is generally the case). Where possible, it is recommended that this class be used in preference to StringBuffer as it will be faster under most implementations.

Instances of StringBuilder are not safe for use by multiple threads. If such synchronization is required then it is recommended that StringBuffer be used.
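A quick illustration of the drop-in replacement - the append API is the same as StringBuffer's, just without the synchronization:

```java
// StringBuilder as a drop-in replacement for StringBuffer in
// single-threaded code: identical API, no synchronization overhead.
public class BuilderDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 3; i++) {
            sb.append("part").append(i).append(';'); // calls chain, like StringBuffer
        }
        System.out.println(sb.toString()); // prints part1;part2;part3;
    }
}
```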

Wednesday, November 09, 2005

I still see many people with a lot of myths about Unicode. I guess the reason for this is that a lot of people still feel that Unicode is an encoding format that uses 16 bits to represent a character. Let's put a few things in perspective here:

Unicode is a standard which has defined a character code for every character in most of the speaking languages in the world. Also it has defined a character code for items such as scientific, mathematical, and technical symbols, and even musical notation. These character codes are also known as code points.

Unicode characters may be encoded at any code point from U+0000 to U+10FFFF, i.e. Unicode reserves 1,114,112 (= 2^20 + 2^16) code points, and currently assigns characters to more than 96,000 of those code points. The first 256 codes precisely match those of ISO 8859-1, the most popular 8-bit character encoding in the "Western world"; as a result, the first 128 characters are also identical to ASCII.

The size of the code unit used for expressing those code points may differ: 8 bits (for UTF-8), 16 bits (for UTF-16), or 32 bits (for UTF-32). So what this means is that there are several formats for storing Unicode code points. When combined with the byte order of the hardware (BE or LE), they are known officially as "character encoding schemes". They are also known by their UTF acronyms, which stand for "Unicode Transformation Format".

UTF-8 is widely used because its first 128 code points are encoded exactly as in ASCII, and although up to four bytes may be needed per character, only one byte is required for most text in the English-speaking world. UTF-32 uses a fixed four bytes per code point, while UTF-16 uses two bytes for most characters and four bytes (a surrogate pair) for supplementary characters.

So to put it in other words, Unicode text can be represented in more than one way, including UTF-8, UTF-16 and UTF-32. So, hey... what's this UTF?

A Unicode transformation format (UTF) is an algorithmic mapping from every Unicode code point to a unique byte sequence. UTF-8 is most common on the web. UTF-16 is used by Java and Windows. UTF-32 is used by various Unix systems. The conversions between all of them are algorithmically based, fast and lossless. This makes it easy to support data input or output in multiple formats, while using a particular UTF for internal storage or processing.
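A small Java demonstration of the same code point producing different byte counts under different UTFs (the sample character U+00E9, "é", is chosen arbitrarily):

```java
import java.io.UnsupportedEncodingException;

// One code point, three encoding forms: the byte count depends on the UTF.
public class UtfDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u00E9";                               // U+00E9, a single code point
        System.out.println(s.getBytes("UTF-8").length);    // 2 bytes (above ASCII range)
        System.out.println(s.getBytes("UTF-16BE").length); // 2 bytes (one 16-bit code unit)
        System.out.println(s.getBytes("UTF-32BE").length); // 4 bytes (fixed width)
        System.out.println("A".getBytes("UTF-8").length);  // 1 byte (ASCII range)
    }
}
```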

During i18n and localization, we often come across basic fundamental questions such as: How many bytes make a character? How many characters/bytes are present in a string?

Each character gets encoded into bytes according to a specific charset. For example, ASCII uses a 7-bit encoding, i.e. each char is represented by 7 bits; ANSI/Cp1252 uses an 8-bit encoding; UTF-16 uses 16-bit code units. UTF-8, a popular encoding on the internet, is a multibyte Unicode charset. So if someone asks how many bytes make a character, the answer is: it depends on the charset used to encode the character.

Another interesting point in Java is the difference between a 'char' and a character. When we do String.length() in Java, we get the number of chars in the string. But a Unicode character may be made up of more than one 'char'. This blog post throws light on the concept: http://forum.java.sun.com/thread.jspa?threadID=671720

Snippet from the above blog:

A char is not necessarily a complete character. Why? Supplementary characters exist in the Unicode charset. These are characters that have code points above the base set, and they have values greater than 0xFFFF. They extend all the way up to 0x10FFFF. That's a lot of characters. In Java, these supplementary characters are represented as surrogate pairs - pairs of char units that fall in a specific range. The leading or high surrogate value is in the 0xD800 through 0xDBFF range. The trailing or low surrogate value is in the 0xDC00 through 0xDFFF range. What kinds of characters are supplementary? You can find out more from the Unicode site itself.

So, if length won't tell me how many characters are in a String, what will? Fortunately, the J2SE 5.0 API has a new String method: codePointCount(int beginIndex, int endIndex). This method will tell you how many Unicode code points are between the two indices. The index values refer to code unit or char locations, so endIndex - beginIndex for the entire String is equivalent to the String's length. Anyway, here's how you might use the method:

int charLen = myString.length();
int characterLen = myString.codePointCount(0, charLen);
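A small runnable demonstration of the char-vs-character difference (the supplementary code point U+10400 is chosen arbitrarily):

```java
// One supplementary character occupies two chars (a surrogate pair) in a
// Java String, so length() and codePointCount() disagree.
public class CodePointDemo {
    public static void main(String[] args) {
        // "A" plus the supplementary character U+10400
        String s = "A" + new String(Character.toChars(0x10400));
        System.out.println(s.length());                      // 3 chars (1 + surrogate pair)
        System.out.println(s.codePointCount(0, s.length())); // 2 characters
    }
}
```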

Tuesday, November 08, 2005

I still see a lot of applications that are vulnerable to SQL injection attacks because of dynamic SQL built using string concatenation. It is a common myth that to circumvent SQL injection attacks we 'have' to use stored procedures. Even if we are using dynamic SQL, it is pretty simple to avoid these attacks.

In .NET the following techniques can be used:

Step 1. Constrain input - validate input using client-side and server-side validation (e.g. using regular expressions).
Step 2. Use parameters with stored procedures. There is one caveat with stored procedures: if your stored procedure uses the 'EXEC' command, which takes a string, then the same vulnerability exists there too.
Step 3. Use parameters with dynamic SQL. (In .NET it's simple to have named parameters even for dynamic SQL.)
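The same point holds on the Java side. The sketch below (the class, table and query are made up for illustration) shows why concatenation is dangerous and how a bound parameter avoids it:

```java
// Why string concatenation is dangerous: the classic ' OR '1'='1 payload
// rewrites the query's meaning. Parameter binding avoids this entirely.
public class InjectionDemo {
    static String naiveQuery(String name) {
        // VULNERABLE: user input becomes part of the SQL text
        return "SELECT * FROM users WHERE name = '" + name + "'";
    }

    public static void main(String[] args) {
        String attack = "x' OR '1'='1";
        System.out.println(naiveQuery(attack));
        // With java.sql.PreparedStatement the same input stays a literal:
        //   PreparedStatement ps = conn.prepareStatement(
        //       "SELECT * FROM users WHERE name = ?");
        //   ps.setString(1, attack); // bound as data; query shape unchanged
    }
}
```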

I have been interested in datawarehousing concepts for a long time, but unfortunately never got a chance to work on them. Here are some common concepts you need to know to understand any datawarehousing jargon.

Data Warehousing: An enterprise-wide implementation that replicates data from the same publication table on different servers/platforms to a single subscription table. This implementation effectively consolidates data from multiple sources.

Data Mining: The process of finding hidden patterns and relationships in data. For instance, a consumer goods company may track 200 variables about each consumer. There are scores of possible relationships among the 200 variables. Data mining tools will identify the significant relationships.

OLAP (On-Line Analytical Processing): Describes the systems used not for application delivery, but for analyzing the business, e.g., sales forecasting, market trends analysis, etc. These systems are also more conducive to heuristic reporting and often involve multidimensional data analysis capabilities.

Thursday, October 20, 2005

I often used to write utility classes that would simplify the coding of frequently used tasks. For example, to access a DataReader for binding to a grid, we have to write all the database plumbing code again and again. Hence MS has released some utility wrapper classes that can be used to simplify such routine tasks. A good article explaining this is at http://aspnet.4guysfromrolla.com/articles/070203-1.aspx

Wednesday, October 19, 2005

After authenticating a user using forms authentication, we may want to restrict access to certain parts of the website to certain users - i.e. authorize users.

To implement Role based authorization we would need to set up a database containing info about which role a user belongs to. Then we need to construct a Principal object specifying which role the user belongs to and assign it to the HttpContext user property.

I often wondered what the advantage was of compiling files into .NET modules and linking them together using the al.exe tool. I also noticed that al.exe actually only creates a stub dll; the netmodule files really have to be physically deployed too.

I guess the advantages of compiling to netmodules are:

Each module may be independently developed by a separate set of developers.

Each module may be written in any .net language.

A multimodule assembly does have memory advantages: if you put rarely-used types in one module, that module will only be loaded when one of its types is first referenced. If the rarely-used types are never used in an execution of the assembly, the module is not loaded at all.

When we go to the default GAC folder in Windows, located at "C:\WINNT\assembly", we see the dlls that are registered in the GAC. But what if we have different versions of the same DLL registered in the GAC? How can a single Windows folder allow 2 files with the same name to exist?

To understand this, go to the DOS prompt and navigate to "C:\WINNT\assembly". Do a 'dir' command and see the contents. There would be a 'GAC' folder and inside the folder, there is a folder for each Assembly. Each assembly folder has a folder for each version. Inside this version folder there is the actual DLL. Try this out :)

Tuesday, October 18, 2005

Often in my applications I needed to persist some Java data objects as XML. For this I used JDOM, which was quite cool to use because of its Java-like API. But there is one more XML data binding framework, known as "Castor", which can handle all the marshalling and unmarshalling of objects to XML for you, either automatically or with the help of a mapping file. Castor is indeed a cool XML data binding framework.

Friday, October 07, 2005

The .NET API has quite a few classes to help us manipulate XML data. We have the XmlReader class and its subclasses such as XmlTextReader and XmlValidatingReader, which provide a forward-only, non-cached, read-only cursor over XML data. An interesting point is that XmlReader uses a 'pull model', unlike SAX's push event model.

Then we have XPathDocument, which provides a fast, read-only cache for XML document processing using XSLT. Quite good if you need to use XPath to query the XML data.

We also have a special class for loading a DataSet directly into an XML form for manipulation. This is the XmlDataDocument class, which can load either relational data or XML data and manipulate it using the W3C Document Object Model (DOM).

Wednesday, October 05, 2005

I have been a strong advocate of Struts right from the start. I often see developers getting confused over making a choice between struts, velocity, tiles etc.

The point is that Struts is a "controller-centric" MVC solution. The "View" and the "Model" parts are pluggable thanks to the plug-in architecture of Struts 1.1. Hence it is possible to use Tiles or Velocity templates in place of plain JSP for the View in Struts. Similarly, the Model can interact with EJB, JDO, Hibernate, etc.

The various choices available in server-side Java may seem daunting to a new developer, but with experience we come to realize and appreciate the value of each solution and where it fits best :)

Monday, October 03, 2005

At first I thought that the "package" access-specifier in Java is equivalent to the "internal" access-specifier in .NET. But actually there is a subtle difference.

The 'package' access-specifier in Java limits access to a particular package/namespace, whereas the 'internal' access-specifier in .NET limits access to the containing assembly. Now, a .NET assembly can contain more than one namespace/package. So is there any way to restrict the scope of a member of a class to the same namespace?

Surprisingly, NO, there is none in .NET. So if we were designing by 'separation of concerns', maybe it would make sense to map one namespace to one physical assembly.

Wednesday, September 28, 2005

There are a couple of things we need to keep in mind while doing a file-upload in ASP.NET.

If the upload fails it could be because of the following reasons:

ASP.NET limits the size of file uploads for security purposes. The default limit is 4 MB. This can be changed by modifying the maxRequestLength attribute of Machine.config's <httpRuntime> element.

If we are using the HtmlInputFile html control, then we need to set the Enctype property of the HtmlForm to "multipart/form-data" for this control to work properly, or we might get 'null' for the PostedFile property.

Friday, September 23, 2005

Developers often get confused when they see the following methods (they all sound the same):
RegisterWellKnownServiceType()
RegisterWellKnownClientType()
RegisterActivatedServiceType()
RegisterActivatedClientType()

Finally I got hold of an image that explains the above in a lucid manner...

In Java, when a member is declared 'protected', it is also given 'package' access automatically, i.e. a protected member can be accessed by other members of the same package.

But in .NET, the 'protected' access specifier means that only subclasses can see the member. If we want other members of the assembly to see it too, then we need to use the 'protected internal' access specifier.

Thus 'protected' in Java is equivalent to 'protected internal' in .NET.

Tuesday, September 20, 2005

I often used to get frustrated with the speed of Eclipse on my machine. Then I heard from a friend how I can increase the initial memory allocated to Eclipse. You just have to pass values from the command prompt using the -vmargs argument.

eclipse -vmargs -Xms256m -Xmx256m

This single change gave me a very good "perceived" performance benefit while using the IDE.

Friday, September 16, 2005

I have come across this slang many times over the last few years. So what does "eating your own dog food" mean?

"A company that eats its own dog food sends the message that it considers its own products the best on the market."

"This slang was popularized during the dotcom craze when some companies did not use their own products and thus could 'not even eat their own dog food'. An example would've been a software company that created operating systems but used its competitor's software on its corporate computers."

Thursday, September 15, 2005

I have often wondered how download managers work. I wanted to know the internal working that makes it possible for these components to download faster.

The main functions of a download manager are:
• Resuming interrupted downloads (i.e., downloading only the rest of the file instead of restarting the process from the very beginning);
• Scheduled operation: connecting to the Internet, downloading a list of specific files and disconnecting according to a user-defined schedule (e.g. at night when the connection quality is usually higher, while the connection rates are lower);
• Some download managers have additional functions: searching for files on WWW and FTP servers by name, downloading files in several "streams" from one or from different mirror servers, etc.

Most download managers use the concept of "multi-connection downloading": the file is downloaded in several segments through multiple connections and reassembled at the user's PC. To understand how this works, we first need to understand a feature of web servers. A lot of web servers (HTTP and FTP) today support the "resume download" function, which means that if your download is interrupted or stopped, you can resume downloading the file from where you left it. But the question now arises: how does the client (web browser) tell the server what part of the file it wants or where to resume the download? Is this a standard, or server-proprietary? I was surprised to find out that it is the HTTP protocol itself that has support for "range downloads", i.e. when you request a resource, you can also specify what portion/segment of the resource you want. This information is passed from the client as an HTTP header, e.g. "Range: bytes=0-499" for the first 500 bytes.

Now what download managers do is start a number of threads that download different portions of the resource. So the download manager will make another request with a header like this:

GET http://lrc.aiha.com/English/Training/Dldmgrs-Eng.pdf?Cache HTTP/1.1
Host: lrc.aiha.com
Accept: */*
User-Agent: DA 7.0
Proxy-Authorization: Basic bmFyZW5kcjpuYXJlbjEyNDM=
Connection: Close
Range: bytes=96143-192286

This solves the mystery of how download managers are able to simultaneously download different portions of the resource.
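The per-connection segment boundaries can be computed quite simply. The helper below is a sketch of my own (the class and method names are hypothetical, not from any real download manager):

```java
// Hypothetical helper showing how a download manager might compute the
// Range header for each of its parallel connections.
public class RangeSplitter {
    static String[] rangeHeaders(long size, int parts) {
        String[] headers = new String[parts];
        long chunk = size / parts;
        for (int i = 0; i < parts; i++) {
            long start = i * chunk;
            // the last segment absorbs any remainder from the integer division
            long end = (i == parts - 1) ? size - 1 : start + chunk - 1;
            headers[i] = "Range: bytes=" + start + "-" + end;
        }
        return headers;
    }

    public static void main(String[] args) {
        // split a 1,000,000-byte file across 4 connections
        for (String h : rangeHeaders(1000000, 4)) {
            System.out.println(h);
        }
    }
}
```

Each thread then issues its own GET with its own Range header and writes the received bytes at the matching offset of the output file.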

Imp Note: To resume interrupted downloads, it is not enough to use a download manager: the server from which the file is being downloaded must also support download resumption. Unfortunately, some servers do not support this function, and are called "non-resumable". On such servers your download manager won't help (no increase in speed either), as the server will simply ignore the HTTP Range header.

But I was still confused about how exactly this increases speed. After all, if the current bandwidth is "fully utilized" with one connection, how does making more connections help? The answers I found on the net are as below:

Normal TCP connections, as used by HTTP, reach a maximum throughput well below the available bandwidth in circumstances with even moderate amounts of packet loss and signal latency (because of packet loss and latency, some packets have to be retransmitted). Multiple TCP connections can help to alleviate these effects and, in doing so, provide faster downloads and better utilization of the available bandwidth.

Opening more connections means less sharing with others. Web servers are set up to split their bandwidth into several streams to support as many users downloading as possible. As an example, if the download manager created eight connections to the server, the server thinks it is transmitting to eight different users and delivers all eight streams to the same user. Each of the eight requests asks for data starting at a different location in the file.

I always wished someone could provide me with a neat API for accessing the Windows registry using Java - I did not want to get into the nitty-gritty of JNI calls. Fortunately there is an open-source library available at: http://www.trustice.com/java/jnireg/

Monday, September 12, 2005

Recently I wanted to add a whole directory structure to a ClearCase VOB, and I was surprised to see that the graphical explorer did not have an option to recursively add an entire directory structure to source control. Thankfully I found a command-line tool ("clearfsimport") with which I could import/add all directories and files to the VOB. The general usage of the command is along the lines of "clearfsimport -recurse <source-dir> <target-vob-dir>".

Friday, September 09, 2005

I have seen a lot of programs using the double-checked locking pattern in their singleton classes to avoid the overhead of synchronization for each method call.

But it is now known that the double-checked locking idiom does not work. This is because of the memory model of Java, which allows out-of-order writes to memory; i.e. there exists a window of time when the instance reference is non-null but the instance is still not fully initialized (the constructor has not returned)!
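For reference, under the revised Java 5 memory model (JSR-133), declaring the field volatile repairs the idiom; this is a sketch of that fix, not a claim about older JVMs:

```java
// Double-checked locking, fixed for the Java 5 memory model. Without
// 'volatile' a thread could observe a non-null reference to a partially
// constructed instance; with it, the write happens-after construction.
public class Singleton {
    private static volatile Singleton instance; // 'volatile' is essential

    private Singleton() {}

    public static Singleton getInstance() {
        if (instance == null) {                  // first check, no lock
            synchronized (Singleton.class) {
                if (instance == null) {          // second check, under the lock
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance()); // same instance
    }
}
```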

Thursday, September 08, 2005

Quite often we may feel the need for an immutable collection, i.e. one that users cannot modify and bring into an invalid state, but can only read. The Collections API has methods to help us with this. For example, to get an unmodifiable list, use the following code:

List ul = Collections.unmodifiableList(list);

Check out the other methods of the Collections class, which give you helpers to make other collection types unmodifiable. Any attempt to modify the returned list, whether direct or via its iterator, results in an UnsupportedOperationException.
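For example, reads go through the wrapper while any mutation attempt throws:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Reading the unmodifiable view works; mutating it throws.
public class ImmutableDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("a");
        List<String> ul = Collections.unmodifiableList(list);
        System.out.println(ul.get(0)); // reading is fine
        try {
            ul.add("b");               // mutation attempt on the view
        } catch (UnsupportedOperationException e) {
            System.out.println("modification rejected");
        }
    }
}
```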

The 'immutability' aspect can also be used as a design pattern for concurrent read-and-write access to a collection (think multiple users or threads): anyone who needs to modify the collection makes a copy of the collection object and modifies that.

Tuesday, August 30, 2005

Recently I had to convert a byte[] array to a string in .NET. I knew there should be some method somewhere which takes an encoding and returns the string accordingly. I searched the byte[] type, then the string class, but could not find any appropriate method. Finally I found it in the System.Text.Encoding class (the GetString method), which has subclasses such as System.Text.ASCIIEncoding, System.Text.UnicodeEncoding, System.Text.UTF7Encoding and System.Text.UTF8Encoding.

Thursday, August 25, 2005

In quite a few Unix programs (shell scripts) I see the following command:

someprog > /dev/null 2>&1

The > /dev/null 2>&1 part sends standard output to /dev/null (the null device, which discards everything written to it) and redirects standard error (2) to the same place as standard output (1). Basically it runs the command without producing any output on the terminal.

Wednesday, August 24, 2005

I have been using VNC for years, and I always assumed it to be a product of a company named RealVNC. It was only recently that I realised that VNC (Virtual Network Computing) is an open-source technology that was first developed on the Unix platform (I had thought VNC worked only on Windows). VNC servers are available for all platforms: Unix X-Window, Solaris CDE, Linux, Windows XP, etc.

Thursday, August 18, 2005

When I was first introduced to Java, most of the program debugging was done through console/file logging. (using the ubiquitous System.out.println).

But today we have so many GUI debuggers at our disposal: debuggers built into IDEs (such as IBM VisualAge for Java, Symantec VisualCafe, Borland JBuilder and Eclipse), stand-alone GUI debuggers (such as Jikes, the Java Platform Debugger Architecture's javadt, and JProbe), and text-based, command-line driven ones (such as Sun's JDB). My favourite is the Eclipse debugger, and it serves most of my purposes. But there are some other cool stand-alone debuggers available too. Here are links to a few of them:
http://www.bluemarsh.com/java/jswat/
http://www.debugtools.com/index.html
http://freshmeat.net/projects/jikesdebugger/

If any of the above still do not meet your needs, the Java platform has introduced the Java Debugging APIs, which you may use to create a debugger that specifically meets your needs. The revised Java Debugger (JDB) serves both as a proof of concept for the Java Debugging API and as a useful debugging tool. It was rewritten to use the Java Debug Interface (JDI) and is part of the JDK.

Tuesday, August 16, 2005

It's been some time since I worked on any new XML processing program. My favourite Java XML library has always been JDOM (promoted by Jason Hunter). JDOM was so intuitive for Java developers to use, and it also made excellent use of the Java collections.

But today, there are quite a number of alternatives that one needs to study before deciding on which library to use.

The javaw command is identical to java, except that with javaw there is no associated console window. Use javaw when you don't want a command prompt window to appear. The javaw launcher will, however, display a dialog box with error information if a launch fails for some reason.

So if your application is 100% GUI, then you can use javaw to launch it instead of java.exe.

Tuesday, August 09, 2005

Recently I faced a peculiar problem when I compiled an application using JDK 1.5 and then tried to run it inside Tomcat running Java 1.4. I got a runtime error - an unsupported class version (49 vs. 48). I was bewildered, but on searching the net I understood that the class file format versions generated by the JDK 1.5 compiler (major version 49) and the JDK 1.4 compiler (major version 48) are not the same.

My application was not using any of the new JDK 1.5 features, yet the problem still arose. I think the reason is the host of new features introduced in JDK 1.5. For example, in Java 1.5 you can write "Integer i = 7;" (autoboxing), and it will compile under Java 1.5 and run inside a Java 1.5 JVM.

But if the above code were run inside a Java 1.4 JVM, it would fail. Hence Sun has put in a check to ensure that a runtime error is raised if the class files are of a newer version than the JVM supports.

Monday, August 01, 2005

Even if we are using JUnit and NUnit for executing our unit test cases, there are still some loopholes left. For e.g. how do we test that all paths in the code have been covered in the test cases? What code are the tests actually testing? What code isn't being tested? Is the test suite getting out of date?

Code coverage tools prove to be invaluable in such occasions. These tools instrument the code/binaries to discover those sections of the code that have not been tested.

Sometimes we may wish to change the default bootstrap classes that the Java compiler compiles against.

javac.exe provides us with a '-bootclasspath' option for that.

I found this option useful when I was using an older version of CORBA interfaces which were not compatible with the "org.omg.*" CORBA interfaces in JDK 1.5. So I included the old CORBA interfaces jar file before rt.jar in the bootstrap classpath, and the problem was solved.

In Eclipse, the problem was solved just by changing the order of jar files in the "Build path" menu.

Found out that there is an alternative way to pass Java source file names to the javac compiler :)

Here's what the Java docs say - there are two ways to pass source code file names to javac:
- For a small number of source files, simply list the file names on the command line.
- For a large number of source files, list the file names in a file, separated by blanks or line breaks. Then use the list file name on the javac command line, preceded by an @ character.

When I first heard about DBC (Design by Contract) four years back, little did I know how important a role it can play in software development. But my experience over the past few years has made me wiser. I have seen so much cluttered code where, inside each method, the first thing done is checking all the parameters (and this is sometimes replicated on the client side too). How many times have I seen a try/catch for a condition that would never occur if the method were called correctly.

The "Design By Contract" paradigm eliminates all this. It specifies that a method should have a pre-condition, a post-condition and an invariant. At first, I found the concept of invariant a bit foggy to understand.

Well, actually it is very simple. An invariant is a condition on a class's state that always holds between public method calls. Classes should specify their invariants: what is true before and after executing any public method.
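Plain Java assert statements can approximate all three contract elements. This is a sketch of my own (the Account class is made up; assertions must be enabled with java -ea to actually fire):

```java
// Design by Contract approximated with Java assertions (run with -ea).
// The class invariant (balance >= 0) must hold before and after every
// public method; deposit() also checks its pre- and post-conditions.
public class Account {
    private long balance;

    public void deposit(long amount) {
        assert amount > 0 : "pre-condition: amount must be positive";
        long before = balance;
        balance += amount;
        assert balance == before + amount : "post-condition: balance increased by amount";
        assert invariant() : "class invariant violated";
    }

    public long getBalance() { return balance; }

    private boolean invariant() { return balance >= 0; }

    public static void main(String[] args) {
        Account acc = new Account();
        acc.deposit(10);
        acc.deposit(5);
        System.out.println(acc.getBalance());
    }
}
```

With the contract stated in one place, callers no longer need defensive parameter checks scattered through the client code.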

Wednesday, July 20, 2005

The problem with using Java on Cygwin is that java.exe is a Windows program and expects paths to be in Windows style, whereas Cygwin expects paths to be in Unix style.

Hence if you run a simple Java command like the one below, you will get an error:
java -classpath /cygdrive/d/naren MyJavaProgram

Even the following would give an error, because Cygwin cannot handle a Windows path on the command prompt:
java -classpath d:\naren MyJavaProgram

So the solution is to use a Cygwin utility known as cygpath.exe. This tool converts Unix paths to Windows paths before they are passed on:
java -classpath `cygpath -wp $CLASSPATH` [arguments]

Tuesday, July 19, 2005

Came across a cool tool today that can convert Java byte-code to .NET IL. What does this mean to developers? Well, it means that if you have a Java library, you need not manually port it to .NET - just use the IKVM tool to do it for you.

In the CORBA world, an object is a programming entity with an identity, an IDL (Interface Definition Language)-defined interface, and an implementation. An object is an abstract concept and cannot serve client requests. To do so, an object must be incarnated or given bodily form - that is, its implementation must be activated. The servant gives the CORBA object its implementation. At any moment, only one servant incarnates a given object, but over an object's lifetime many (different) servants can incarnate the object at different points in time. The terms creation and destruction apply to objects, while the terms incarnation and etherealization apply to servants.

Once an object is created, it can alternate between many activations and deactivations during its lifetime. To serve requests, an object must:
1) Be activated, if it is not active.
2) Be associated with a servant, if it does not already have one. Just because an object is active does not mean that it has an associated servant. You can configure/program the POA to use a new servant upon request.

A client views a CORBA object as an object reference. The fact that a client has an object reference does not mean that a servant is incarnating the object at that time. In fact, the object reference's existence does not indicate an object's existence. If the object does not exist (that is, it has been destroyed), the client will receive an OBJECT_NOT_EXIST error when it tries to access the object using the object reference. However, as noted above, if the object is in a deactivated condition, it will activate, a process transparent to the client.

Servant: A programming language entity that exists in the context of a server and implements a CORBA object. In non-OO languages like C and COBOL, a servant is implemented as a collection of functions that manipulate data (e.g., an instance of a struct or record) that represent the state of a CORBA object. In OO languages like C++ and Java, servants are object instances of a particular class.

The POA distinguishes between the CORBA object reference (IOR) and the implementation object that does the work. This implementation object is called a servant. A BOA-based approach has the IOR and servant existing at the same time. A POA-based approach can support this, but can also support IORs existing without being associated with servants, and also servants existing without being associated with IORs.

Obviously, the association between an IOR and a servant has to be made at some point, to make the servant a useable CORBA object. But this association can be done on-demand. Consider the following example scenarios to motivate the advantages of on-demand association:

A pool of servants can be instantiated, and then associated in turn with IORs, as needed.

A set of IORs can be created for the purposes of publishing the references to the Name Service, without going through the work to actually instantiate the servants.

Moreover, the POA allows a single servant to simultaneously support several IORs. All of the above significantly contribute to scalable applications.

HOW DOES THE POA MAKE THE ASSOCIATION BETWEEN SERVANTS AND CORBA OBJECTS?

This is where the Object ID and the POA Active Object Map come in. For a given POA, the Object ID identifies a specific CORBA object, and it is used as the key in the Active Object Map to look up the servant.