Dustin's Pages

Friday, December 9, 2016

As I've worked with legacy Java code over the years, I've run into subtle logic and performance issues that could be traced back to improperly overridden Object.equals(Object) methods. Although the concept behind the "equals" method is seemingly simple, Josh Bloch points out in Effective Java that "Overriding the equals method seems simple, but there are many ways to get it wrong, and the consequences can be dire. The easiest way to avoid problems is not to override the equals method, in which case each instance is equal only to itself." In this post, I look at one of "the many ways" to get an equals(Object) wrong: failing to compare exactly the same characteristics of the two objects being evaluated for equality.

The next code listing is for class MismatchedFieldAccessor. This class's equals(Object) method is flawed because it compares the class's direct attribute someString to the value retrieved from the other object's getSomeString(). In most Java classes, comparing a class's field to its accessor/get method will work properly because the accessor/get method simply returns the associated field. In this example class, however, the accessor/get method does more than simply return the field, and that makes the comparison of the field to the get/accessor method in the equals(Object) method inconsistent. (Note that the idea of having a "get" method do this type of thing is not being recommended here, but merely exists as a simple-to-understand example.)
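The following is only a minimal sketch of the kind of class described above, not the post's original listing. The constructor and the exact behavior of the accessor (prepending a label) are assumptions chosen purely for illustration.

```java
import java.util.Objects;

/** Sketch of a class whose equals(Object) compares a field to an accessor. */
final class MismatchedFieldAccessor {
    private final String someString;

    MismatchedFieldAccessor(final String newString) {
        this.someString = newString;
    }

    /** Hypothetical accessor that does more than return the field. */
    public String getSomeString() {
        return "Some String: " + someString;
    }

    @Override
    public boolean equals(final Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        final MismatchedFieldAccessor other = (MismatchedFieldAccessor) obj;
        // Flaw: this object's raw field is compared to the OTHER object's accessor result.
        return Objects.equals(this.someString, other.getSomeString());
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(someString);
    }

    public static void main(final String[] args) {
        final MismatchedFieldAccessor first = new MismatchedFieldAccessor("test");
        final MismatchedFieldAccessor second = new MismatchedFieldAccessor("test");
        System.out.println(first.equals(first));   // true, via the identity shortcut
        System.out.println(first.equals(second));  // false, although the fields match
    }
}
```

With this sketch, two separately constructed instances holding the same string are never equal, because "test" is compared against "Some String: test".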

The second unit test makes use of the handy EqualsVerifier library to identify an issue with this equals(Object) implementation (emphasis added):

java.lang.AssertionError: Reflexivity: object does not equal an identical copy of itself:
dustin.examples.brokenequals.MismatchedFieldAccessor@0
If this is intentional, consider suppressing Warning.IDENTICAL_COPY
For more information, go to: http://www.jqno.nl/equalsverifier/errormessages
at nl.jqno.equalsverifier.EqualsVerifier.handleError(EqualsVerifier.java:381)
at nl.jqno.equalsverifier.EqualsVerifier.verify(EqualsVerifier.java:367)
at dustin.examples.brokenequals.MismatchedFieldAccessorTest.testEqualsWithEqualsVerifier(MismatchedFieldAccessorTest.java:36)

This post has covered one of the many ways in which an equals method can go bad if not carefully implemented, reviewed, and tested. Fortunately, the fix for this particular problem is easy: always compare the exact same field or same method's returned object of the two instances being compared for equality. In the example used in this post, comparing the two "someString" fields directly would have made the "equals" method work properly.
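As a rough illustration of that fix, the sketch below compares the same field on both instances. The class name ConsistentFieldComparison, its constructor, and the accessor's behavior are hypothetical and exist only to keep the example self-contained.

```java
import java.util.Objects;

/** Sketch of an equals(Object) that compares the same field on both objects. */
final class ConsistentFieldComparison {
    private final String someString;

    ConsistentFieldComparison(final String newString) {
        this.someString = newString;
    }

    /** Hypothetical accessor that still does more than return the field. */
    public String getSomeString() {
        return "Some String: " + someString;
    }

    @Override
    public boolean equals(final Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        final ConsistentFieldComparison other = (ConsistentFieldComparison) obj;
        // Field is compared to field, so the accessor's extra work no longer matters.
        return Objects.equals(this.someString, other.someString);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(someString);
    }

    public static void main(final String[] args) {
        final ConsistentFieldComparison first = new ConsistentFieldComparison("test");
        final ConsistentFieldComparison second = new ConsistentFieldComparison("test");
        System.out.println(first.equals(second));  // true
    }
}
```

Comparing getSomeString() to getSomeString() on both sides would be equally consistent; the point is only that both sides use the same source.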

Wednesday, November 23, 2016

Because I work with Linux- and Windows-based machines for development, I often find myself wishing that I had some of the handy command-line Linux tools available in my Windows environments. Cygwin, PowerShell, and custom Groovy scripts written to emulate Linux tools have helped, but I was pleasantly surprised to recently learn that Bash on Ubuntu on Windows 10 is available. In this post, I briefly summarize some of the steps to make bash available on Windows. More detailed instructions with helpful screen snapshots can be found in the Installation Guide.

The Windows Subsystem for Linux (WSL) is described in the Frequently Asked Questions page as "a new Windows 10 feature that enables you to run native Linux command-line tools directly on Windows, alongside your traditional Windows desktop and modern store apps." This same FAQ page states that enabling WSL "downloads a genuine Ubuntu user-mode image, created by Canonical."

Verify that Windows 10 installation is 64-bit "system type" and has an "OS Build" of at least 14393.0.

Turn on "Developer Mode"

Enable Windows Subsystem for Linux (WSL) ["Turn Windows Features On or Off" GUI]

Enable Windows Subsystem for Linux (WSL) [PowerShell Command-line]

Restart Computer as Directed

Run bash from Command Prompt (Downloads Canonical's Ubuntu on Windows)

Create a Unix User

Once the steps described above have been executed, bash can be easily used in the Windows 10 environment. A few basic commands are shown in the next screen snapshot. It shows running bash in the Command Prompt and running a few common Linux commands while in that bash shell.

As a developer who uses both Windows and Linux, having bash available in Windows 10 is a welcome addition.

The following interfaces and classes are contrived examples that will be used in this post to illustrate inheritance of Javadoc comments on methods. Some inherited/implementing methods include their own Javadoc comments that fully or partially override the comments of the parent's/interface's methods, and others simply reuse the documentation of the parent's/interface's methods.

The next screen snapshot shows the contents of the package that includes the interfaces and classes whose code listings are shown above (not all the classes and interfaces in the package had their code listings shown).

The three classes of most interest here from a method Javadoc perspective are the classes Dog, Cat, and Horse because they implement several interfaces and extend MammalWithHair, which extends Mammal, which extends Animal.

The next screen snapshot is of the Javadoc for the Animal class rendered in a web browser.

The Animal class doesn't inherit any methods from a superclass and doesn't implement any methods from an interface and is not very interesting for this blog post's topic. However, other classes shown here extend this class and so it is interesting to see how its method comments affect the inheriting classes' methods' descriptions.

The next two screen snapshots are of the Javadoc for the Mammal and MammalWithHair classes as rendered in a web browser. There are no Javadoc comments of any significance on Mammal, but there is one method comment for a new method introduced by MammalWithHair.

The next three screen snapshots are of subsets of Javadoc documentation in a web browser for the interfaces Herbivorous, Carnivorous, and Omnivorous. These interfaces provide documentation for methods that will be inherited by classes that implement these interfaces.

With the generated Javadoc methods documentation for the parent classes and the interfaces shown, it's now time to look at the generated documentation for the methods of the classes extending those classes and implementing those interfaces.

The methods in the Dog class shown earlier generally used {@inheritDoc} in conjunction with additional text. The results of inheriting method Javadoc comments from extended classes and implemented interfaces combined with additional text provided in Dog's comments are shown in the next screen snapshots.

The last set of screen snapshots demonstrates that the Dog class's documentation mixes the documentation of its "parents" with its own specific documentation. This is not surprising. The Dog class's methods generally explicitly inherited Javadoc documentation from the parents (base classes and interfaces), but the Cat class mostly has no Javadoc comments on its methods, except for the eat method, which simply uses {@inheritDoc}. The generated web browser output from this class is shown in the next screen snapshots.

The methods in Cat that had no Javadoc comments applied show up in the generated web browser documentation with documentation inherited from their base classes or interfaces and the documentation on these methods includes the phrase "Description copied from class:" or "Description copied from interface:" as appropriate. The one Cat method that does explicitly include the documentation tag {@inheritDoc} does copy the parent's method's documentation, but does not include the "Description copied from" message.

The Horse class's methods are generally not documented at all and so their generated documentation includes the message "Description copied from...". The eat() and giveBirth() methods of the Horse class override the @param portion and so the parameter documentation for these two methods in the generated web browser documentation (shown in the next set of screen snapshots) is specific to Horse.

From the above code listings and screen snapshots of generated documentation from that code, some observations can be made regarding the inheritance of methods' Javadoc comments by extending and implementing classes. These observations are also described in the javadoc tool documentation:

Javadoc comments are inherited from the parent class's methods and from implemented interface methods either implicitly, when no text is specified (no Javadoc at all or empty Javadoc /** */), or explicitly, through use of {@inheritDoc}.

Use of {@inheritDoc} explicitly states that comments should be inherited.

javadoc documentation: "Insert the {@inheritDoc} inline tag in a method main description or @return, @param, or @throws tag comment. The corresponding inherited main description or tag comment is copied into that spot."

Implicit and explicit inheritance of method documentation can be achieved in combination by using {@inheritDoc} tags in different locations within the method comment.
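The observations above can be sketched in a single small hierarchy. This is not the code from the post: the Feedable interface and the method bodies are hypothetical, and the Dog, Cat, and Horse classes here only mimic the three commenting styles described (explicit {@inheritDoc} plus extra text, no comment at all, and an overridden @param).

```java
/** Runs the hypothetical classes; the interesting part is the Javadoc, not the output. */
final class JavadocInheritanceDemo {
    public static void main(final String[] args) {
        System.out.println(new Dog().eat("kibble"));
        System.out.println(new Cat().eat("tuna"));
        System.out.println(new Horse().eat("hay"));
    }
}

/** Hypothetical interface that carries the general documentation. */
interface Feedable {
    /**
     * Eats the provided food.
     *
     * @param food Food that will be eaten.
     * @return Description of the meal.
     */
    String eat(String food);
}

class Dog implements Feedable {
    /**
     * {@inheritDoc} Dogs typically eat very quickly.
     */
    @Override
    public String eat(final String food) {
        return "Dog eats " + food;
    }
}

class Cat implements Feedable {
    // No Javadoc comment: the interface's description is copied implicitly and
    // the generated page is marked "Description copied from interface:".
    @Override
    public String eat(final String food) {
        return "Cat eats " + food;
    }
}

class Horse implements Feedable {
    /**
     * @param food Grass or hay that the horse will eat.
     */
    @Override
    public String eat(final String food) {
        return "Horse eats " + food;
    }
}
```

In the Horse case only the @param text is specific to the class; the missing main description and @return are filled in from the interface, which is the combination of implicit and explicit inheritance noted in the last observation.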

Given the above observations and given the advertised "Method Comments Algorithm", a good rule of thumb for writing Javadoc from the perspective of the HTML generated from the Javadoc is to define general comments at as high a level as possible and allow automatic inheritance of the extended classes' and implemented interfaces' methods' Javadoc documentation to take place, adding or overriding only portions of a method's Javadoc text that are necessary to clarify or enhance the description for a lower-level method. This is better than copying and pasting the same comment on all methods in an inheritance or implementation hierarchy and needing to then keep them all updated together.

This post has focused on the web browser presentation of generated Javadoc methods' documentation. Fortunately, the most commonly used Java IDEs (NetBeans [CTRL+hover], IntelliJ IDEA [CTRL+Q / Settings], Eclipse [F2 / hover / Javadoc View], and JDeveloper [CTRL-D]) support presentation of Javadoc that generally follows the same rules of method documentation inheritance. This means that Java developers can often write less documentation and almost entirely avoid repeated documentation in inheritance and implementation hierarchies.

Most of the issues encountered and discussed in Java related to floating-point representation and arithmetic are caused by the inability to precisely represent (usually) decimal (base ten) floating point numbers with an underlying binary (base two) representation. In this post, I focus on similar consequences that can result from mixing fixed-point numbers (as stored in a database) with floating-point numbers (as represented in Java).
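A quick way to see the underlying representation problem is the classic 0.1 + 0.2 comparison. The class and method names below are made up for this sketch.

```java
import java.math.BigDecimal;

/** Shows that some base-ten fractions have no exact binary double representation. */
final class BinaryFloatingPointDemo {

    /** Compares the double sum of 0.1 and 0.2 to the double literal 0.3. */
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    /** Performs the same comparison with decimal arithmetic. */
    static boolean decimalSumIsExact() {
        final BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0;
    }

    public static void main(final String[] args) {
        System.out.println(0.1 + 0.2);            // 0.30000000000000004
        System.out.println(doubleSumIsExact());   // false
        System.out.println(decimalSumIsExact());  // true
    }
}
```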

The Oracle database allows numeric columns of the NUMBER data type to be expressed with two integers that represent "precision" and "scale". The PostgreSQL implementation of the numeric data type is very similar. Both Oracle's NUMBER(p,s) and PostgreSQL's numeric(p,s) allow the same datatype to represent essentially an integral value (precision specified but scale not specified), a fixed-point number (precision and scale specified), or a floating-point number (neither precision nor scale specified). Simple Java/JDBC-based examples in this post will demonstrate this.

For the examples in this post, a simple table named DOUBLES in Oracle and doubles in PostgreSQL will be created. The DDL statements for defining these simple tables in the two databases are shown next.

With the DOUBLES table created in the Oracle database and the PostgreSQL database, I'll next use a simple JDBC PreparedStatement to insert the value of java.lang.Math.PI into each table for all three columns. The following Java code snippet demonstrates this insertion.
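The original snippet is not reproduced here, so the following is only a sketch of what such an insertion could look like. The column names (int_value, fixed_value, float_value) are assumptions, since the post's DDL is not shown, and the caller is assumed to supply an open JDBC Connection to either database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Sketch of inserting Math.PI into three numeric columns with a PreparedStatement. */
final class PiInserter {

    /** Hypothetical column names; only the table name comes from the post. */
    static final String INSERT_SQL =
        "INSERT INTO doubles (int_value, fixed_value, float_value) VALUES (?, ?, ?)";

    /**
     * Inserts Math.PI into all three columns of one new row.
     *
     * @param connection Open connection supplied by the caller (assumed).
     * @return Number of rows inserted.
     * @throws SQLException If the insertion fails.
     */
    static int insertPi(final Connection connection) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            // The same double is bound three times; each column's definition
            // determines what is actually stored.
            statement.setDouble(1, Math.PI);
            statement.setDouble(2, Math.PI);
            statement.setDouble(3, Math.PI);
            return statement.executeUpdate();
        }
    }

    public static void main(final String[] args) {
        // No database is contacted here; a real run needs a Connection.
        System.out.println(INSERT_SQL);
    }
}
```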

The output of running the above Java insertion and querying code against the Oracle and PostgreSQL databases respectively is shown in the next two screen snapshots.

Comparing Math.PI to Oracle's NUMBER Columns

Comparing Math.PI to PostgreSQL's numeric Columns

The simple examples using Java and Oracle and PostgreSQL demonstrate issues that might arise when specifying precision and scale on the Oracle NUMBER and PostgreSQL numeric column types. Although there are situations when fixed-point numbers are desirable, it is important to recognize that Java does not have a fixed-point primitive data type and use BigDecimal or a fixed-point Java library (such as decimal4j or Java Math Fixed Point Library) to appropriately deal with the fixed-point numbers retrieved from database columns expressed as fixed points. In the examples demonstrated in this post, nothing is really "wrong", but it is important to recognize the distinction between fixed-point numbers in the database and floating-point numbers in Java because arithmetic that brings the two together may not have the results one would expect.
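The effect can be approximated without a database. The sketch below uses BigDecimal.setScale to stand in for a column declared with a scale; the helper name storeWithScale and the choice of RoundingMode.HALF_UP are assumptions made for illustration rather than a statement of exactly how either database rounds.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Simulates a round trip of Math.PI through a fixed-point column. */
final class FixedPointRoundTripDemo {

    /** Stands in for storing a double in a column with the given scale (assumed to round). */
    static BigDecimal storeWithScale(final double value, final int scale) {
        return BigDecimal.valueOf(value).setScale(scale, RoundingMode.HALF_UP);
    }

    public static void main(final String[] args) {
        final BigDecimal fixed = storeWithScale(Math.PI, 5);
        System.out.println(fixed);                           // 3.14159
        System.out.println(storeWithScale(Math.PI, 0));      // 3
        // Reading the fixed-point value back into a double does not restore Math.PI.
        System.out.println(fixed.doubleValue() == Math.PI);  // false
        // Staying in BigDecimal keeps the arithmetic in fixed-point terms.
        System.out.println(fixed.multiply(new BigDecimal("2")));  // 6.28318
    }
}
```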

In Java and other programming languages, one needs to be concerned about more than the effect of arithmetic operations and available precision on the "correctness" of floating-point numbers. The developer also needs to be aware of how these numbers are stored in relational database columns in the Oracle and PostgreSQL databases to understand how precision and scale designations on those columns can affect the representation of the stored floating-point number. This is especially applicable if the representations queried from the database are to be used in floating-point calculations. This is another of many examples where it is important for the Java developer to understand the database schema being used.

Saturday, October 22, 2016

The Paysa Blog recently featured a post Silicon Valley’s Most Valuable Skills in which they looked at the most valuable software development skills in the United States in terms of average salary and in terms of job openings listing the skill. Of particular interest to me was the portion of the post on programming languages and how average salaries and number of opportunities correlate to each of the listed programming languages. I briefly summarize some of the observations I found interesting in this post and refer interested parties to the original post for further details on the methodology and results. For anyone wanting a slideshow providing a high-level summary of this post that highlights nine programming languages associated with numerous job listings and with salaries over $120,000 (US) in the United States, see 9 tech skills that pay over $120,000 and are in demand.

The blog post summarizes a key observation from the collected data depicted in the chart above: "We found that skills considered to be less common often resulted in a higher salary." As an example, the post describes how the increasingly less commonly requested skill of Objective C (because of the ascent of Swift) is one of the most highly compensated, while the most highly requested programming language skill on the chart (SQL) is associated with one of the lowest average salaries.

Another emphasized observation in this post is, "Almost 20 percent of the jobs we saw indicated a need or understanding of SQL." This is more evidence that it benefits software developers to avoid contracting SQLphobia.

There is coverage of regional salary and available positions related to programming languages in this post, but I would have loved to see the salary and job openings mapped to individual metropolitan areas in the United States in a chart or charts because the "same salary" can actually be quite different from a "supported lifestyle" point of view between areas. Speaking of regions, this information is obviously less useful to developers residing and/or working for employers outside of the United States, but I still think these observations can provide interesting perspective for those developers. It was a bit surprising to me that Paysa found that "San Francisco companies are looking for applicants who specialize in Java" and that "Boulder, Colorado, proved to be the city where JavaScript skills are most in demand."

Software development opportunities are about more than monetary compensation. Many of the best software developers I know want interesting challenges, the ability to learn new things, and other non-monetary compensation from their work. Even compensation is often tricky to compare as benefits packages and perks can differ significantly. Another factor that complicates things is the use of "average salaries" because these averages can be significantly affected by region and may not represent large ranges in compensation for certain programming language skills. However, after acknowledging these caveats, I still find it interesting to see which programming languages have, according to Paysa's collected data, the highest average associated salaries and the most available job openings in the United States.

Monday, October 17, 2016

I have been interested in the progress of Project Valhalla for quite a while, but Brian Goetz's recent message "Project Valhalla: Goals" has raised my level of interest. I have frequently enjoyed Goetz's writing because he combines two characteristics I want most in a technical author: he knows the subjects he writes about much more deeply than what he is writing about, but is also able to present these concepts at a level approachable to the rest of us who lack his depth of knowledge in that area. The mail message "Project Valhalla: Goals" is significant in several ways and is highly approachable; it should be read directly by anyone interested in why Project Valhalla is so exciting. Although I recommend reading the original, approachable message, I collect some of my observations from reading this message in this post.

During my software developer career, regardless of the programming language I am using, I have typically found that most software development consists of a series of trade-offs. It is very common to run into areas where the best performing code is less readable than slower code. This trade-off is, in fact, what leads to premature optimization. The danger of premature optimization is that it is "premature" because the performance gain achieved by the less readable code is actually not needed and so one is effectively exchanging "more dangerous" or "more expensive" code for an unneeded performance benefit.

In Java, a common trade-off of this type is when using objects. Objects can often be easier to use and are required for use with the highly used standard Java collections, but the overhead of objects can be costly in terms of memory and performance. Goetz points out in "Project Valhalla: Goals" that Project Valhalla has the potential to be one of those relatively rare situations in which performance can be achieved along with "safety, abstraction, encapsulation, expressiveness, [and] maintainability."

Goetz provides a concise summary of the costs associated with objects and maintaining object identity. From this brief explanation of the drawbacks of maintaining object identity in cases in which it is not needed, Goetz moves to the now expected description of how value types for Java could address this issue. In addition to briefly describing advantages of value types, Goetz also provides some alternate names and phrases for value types that might help to understand them better:

"Aggregates, like Java classes, that renounce their identity"

"Codes like a class, works like an int"

"Faster Objects"

"Programmable Primitives"

"Cheaper objects"

"Richer primitives"

Regarding value types, Goetz writes, "We need not force users to choose between abstraction/encapsulation/safety and performance. We can have both." It's not every day that we can have our cake and eat it too.
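To make "object identity" concrete, here is a small sketch in today's Java. It does not use any Valhalla syntax; the Point class is a made-up example of an aggregate that carries identity even though only its state matters.

```java
import java.util.Objects;

/** A simple aggregate whose instances have identity whether or not it is wanted. */
final class Point {
    private final int x;
    private final int y;

    Point(final int x, final int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(final Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Point)) {
            return false;
        }
        final Point other = (Point) obj;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(final String[] args) {
        final Point first = new Point(1, 2);
        final Point second = new Point(1, 2);
        // Same state, but two distinct objects, each with its own identity.
        System.out.println(first == second);       // false
        System.out.println(first.equals(second));  // true
        // Two ints with the same value are simply the same value.
        final int a = 1000;
        final int b = 1000;
        System.out.println(a == b);                // true
    }
}
```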

In "Project Valhalla: Goals", Goetz also discusses the goal of "extend[ing] generics to allow abstraction over all types, including primitives, values, and even void." He uses examples of the JDK needing to supply multiple methods in its APIs to cover items that are not reference types but must be supported by the APIs because "generics are currently limited to abstracting only over reference types." Goetz points out that even when autoboxing allows a primitive to be used in the API expecting the reference type corresponding to the primitive (such as an int argument being autoboxed to an Integer reference), this boxing comes at a performance cost. With these explanations of the issues in place, Goetz summarizes, "Everyone would be better off if we could write a generic class or method once -- and abstract over all the possible data types, not just reference types." He adds, "Being able to write things once ... means simpler, more expressive, more regular, more testable, more composable libraries without giving up performance when dealing with primitives and values, as boxing does today."
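The current situation can be seen in a few lines of ordinary Java. The class and method names in this sketch are hypothetical; Function, IntUnaryOperator, and IntStream are the existing JDK types, with the latter two being examples of the hand-written primitive specializations the JDK supplies alongside its generic APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

/** Contrasts generic (boxed) APIs with their primitive-specialized counterparts. */
final class BoxingDemo {

    /** Generic collection: every element is an Integer object that must be unboxed. */
    static int sumBoxed(final List<Integer> values) {
        int sum = 0;
        for (final Integer value : values) {
            sum += value;  // implicit unboxing on each iteration
        }
        return sum;
    }

    /** Primitive-specialized stream: no Integer objects are involved. */
    static int sumPrimitive(final int[] values) {
        return IntStream.of(values).sum();
    }

    public static void main(final String[] args) {
        // Generic functional interface: argument and result are boxed Integers.
        final Function<Integer, Integer> boxedDoubler = n -> n * 2;
        // Separate interface that exists only because generics cannot take int.
        final IntUnaryOperator primitiveDoubler = n -> n * 2;

        System.out.println(boxedDoubler.apply(21));             // 42
        System.out.println(primitiveDoubler.applyAsInt(21));    // 42
        System.out.println(sumBoxed(Arrays.asList(1, 2, 3)));   // 6
        System.out.println(sumPrimitive(new int[] {1, 2, 3}));  // 6
    }
}
```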

Goetz concludes "Project Valhalla: Goals" with the statement, "Valhalla may be motivated by performance considerations, but a better way to view it as enhancing abstraction, encapsulation, safety, expressiveness, and maintainability -- 'without' giving up performance." I really like Project Valhalla from this perspective: we can get many of the benefits of using objects and reference types while not giving up the performance benefits of using primitives.

Project Valhalla: Goals provides much to think about in a concise and approachable manner. Reading this has increased my interest in Project Valhalla's future and I hope we can see it manifest in the JDK.

Wednesday, October 12, 2016

I recently had to deal with some legacy code with significant performance issues. It was more challenging than I thought it should have been to fix the issues and I was reminded throughout the process of how relatively simple good practices could have made the code easier to fix and might have even helped prevent the troublesome code from being written in the first place. I collect some of those reminders and reaffirming observations in this post.

I was having difficulty following what the code was doing and realized that some of the difficulty was due to the naming of the methods and parameters. Several methods had names that implied the method had a single major responsibility when it actually had multiple responsibilities. There are advantages to a method having only a single responsibility, but even methods with multiple responsibilities are more readable and understandable when their name appropriately communicates the multiple responsibilities. As I changed the method names to reflect their multiple responsibilities, I was able to more clearly understand all the things they were doing and, in some cases, break the methods up into separate methods that each had a single responsibility and together implemented the multiple responsibilities of a single method.
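The two-step approach described above (rename first, then split) might look something like the following. Everything in this sketch is hypothetical, including the ReportService class and its method names; it is a conceptual illustration and not the legacy code in question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical service used to illustrate renaming and then splitting a method. */
final class ReportService {
    private final List<String> stored = new ArrayList<>();
    private final List<String> notifications = new ArrayList<>();

    // Step 1: imagine this method was originally named storeReport(...), which hid
    // two of its three responsibilities. The new name makes all of them visible.
    void validateStoreAndAnnounceReport(final String report) {
        validateReport(report);
        storeReport(report);
        announceReport(report);
    }

    // Step 2: each responsibility now lives in its own single-purpose method.
    void validateReport(final String report) {
        if (report == null || report.isEmpty()) {
            throw new IllegalArgumentException("Report must not be empty");
        }
    }

    void storeReport(final String report) {
        stored.add(report);
    }

    void announceReport(final String report) {
        notifications.add("Stored: " + report);
    }

    List<String> storedReports() {
        return stored;
    }

    List<String> notifications() {
        return notifications;
    }

    public static void main(final String[] args) {
        final ReportService service = new ReportService();
        service.validateStoreAndAnnounceReport("Q3");
        System.out.println(service.storedReports());
        System.out.println(service.notifications());
    }
}
```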

Another issue that reduced readability of the code and added to confusion was out-of-date comments. These comments were likely correct at one point, but had become obsolete or incorrect and often described something quite different than what the code was actually doing. As I removed some of these comments and corrected and updated others, the code was easier to read because the code and comments were in harmony. I tried to only leave comments that are necessary because the code itself does not provide an easy way to express the behaviors.

Another common characteristic of this challenging piece of legacy code was the use of several overloaded methods. The types of the parameters in some cases were exactly the same, just in different orders to allow the methods to be overloaded. The methods were far clearer in terms of intent and in terms of differentiation between them when I broke the "clever" use of overloaded methods and named the methods more appropriately for what they were doing. Part of this was changing the method names to reflect the entire reason for the particular ordering of the parameters in the first place. For example (these are conceptual illustrations rather than the methods I worked on), if the methods were to convert an instance of ClassA to an instance of ClassB and vice versa, I changed the method names and signatures from convert(A, B) and convert(B, A) to convertAToB(A, B) and convertBToA(B, A). This made the calling code much more readable because one did not need to know that the local variables passed to these respective methods were of type A or of type B to know which method was being called.
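A conceptual sketch of that renaming follows. The fields on A and B and the conversion logic are invented for the example; only the method names convert, convertAToB, and convertBToA come from the description above.

```java
/** Contrasts order-dependent overloads with explicitly named methods. */
final class Converters {

    // Before: overloads told apart only by the order of the parameter types.
    static void convert(final A source, final B target) {
        target.number = Integer.parseInt(source.text);
    }

    static void convert(final B source, final A target) {
        target.text = Integer.toString(source.number);
    }

    // After: the name states the direction, so call sites read unambiguously.
    static void convertAToB(final A source, final B target) {
        target.number = Integer.parseInt(source.text);
    }

    static void convertBToA(final B source, final A target) {
        target.text = Integer.toString(source.number);
    }

    public static void main(final String[] args) {
        final A first = new A();
        first.text = "42";
        final B second = new B();

        convert(first, second);      // Which direction? Depends on the variable types.
        convertAToB(first, second);  // Direction is clear from the name alone.
        System.out.println(second.number);  // 42
    }
}

/** Hypothetical type holding a textual value. */
final class A {
    String text = "";
}

/** Hypothetical type holding a numeric value. */
final class B {
    int number;
}
```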

The following summarizes the relatively small issues that together made this a difficult piece of code to understand and fix:

Methods doing too many things (too many responsibilities)

Poorly named methods, arguments, and local variables that insufficiently described the constructs or even incorrectly described them

Overloaded methods with the same arguments but very different functionality that added to the confusion

A colleague remarked that it would have been better if the methods and variables had been named something nonsensical like "fred" and "bob" than to have their incorrect names that seemed explanatory but ended up being misleading.

Obsolete and outdated comments that were now misleading or blatantly incorrect

Each of these issues by itself was a small thing. When combined in the same area of code, however, they compounded the negative effect of each other and together made the code difficult to understand and therefore difficult to change or fix. Even with a powerful IDE and unit tests, this code was difficult to work with. Making the code more readable was a more effective tool in diagnosing the issue and fixing it than even the powerful IDE and unit tests. More importantly, had this code been written in a more readable fashion in the first place, it would more likely have been implemented correctly and not required a fix later.