Dustin's Pages

Tuesday, September 30, 2014

Earlier this summer, it was announced on the Coverity Blog and via Andy Chou's Tweet that Coverity Code Spotter Beta is available. That 8 July 2014 blog post describes Coverity Code Spotter as "a free and simple to use cloud-based service built upon Coverity source code analysis technology for finding often hard-to-detect bug-causing issues in Java source code." The blog post also states that "for the duration of the beta period, participants are welcome to upload as much code as they would like and submit builds for analysis as often as they would like."

Yesterday's (29 September 2014) press release (issued in conjunction with JavaOne 2014), "Coverity Launches Code Spotter™ in Free Beta Version to Speed Defect Detection in Java Code," restates some of these observations regarding Coverity Code Spotter. It states, "Built on Coverity’s static code analysis technology, Code Spotter is available for free to the software development community during the beta period." The press release, like the July blog post, describes the types of issues in Java code that Code Spotter detects: "the most common and critical issues in Java code bases, including resource leaks, race conditions, concurrency issues, control flow issues, null pointer dereferences, issues detected by the open source FindBugs tool, copy and paste errors, and many other software defects resulting in incorrect or unpredictable program behavior."

Dennis Chu, Senior Product Manager for Coverity, provided answers to some questions I had. Those questions and answers are shown next.

Q: Is this free for open source and proprietary code bases?
A: Yes, both open source and proprietary Java code bases can utilize Code Spotter without any limitations during the beta period.

Q: Is the uploaded code made available in any way to others?
A: The uploaded code is kept completely private.

Q: Are the analysis results of the uploaded code available for others with traceability to the code that was analyzed?
A: It is currently possible for a user to download analysis results (which include issues detected as well as code snippets that help understand these issues) and share them with anyone they wish. We are working on a set of team-oriented features that would allow users to publish their results to other users within the Code Spotter application.

Q: How long does the uploaded code remain on Coverity's cloud? Can it be completely removed if desired?
A: The code can be completely removed with a click of a button. By default, the code and the results are removed within 30 days of analysis completion. Further, the code is not actually stored on Coverity servers. Instead, the code (and the analysis results) are stored in Amazon's S3 under tight access control.

Monday, September 22, 2014

The Javadoc for the ChoiceFormat class states that ChoiceFormat "allows you to attach a format to a range of numbers" and is "generally used in a MessageFormat for handling plurals." This post describes java.text.ChoiceFormat and provides some examples of applying it in Java code.

One of the most noticeable differences between ChoiceFormat and other "format" classes in the java.text package is that ChoiceFormat does not provide static methods for accessing instances of ChoiceFormat. Instead, ChoiceFormat provides two constructors that are used for instantiating ChoiceFormat objects. The Javadoc for ChoiceFormat highlights and explains this:

ChoiceFormat differs from the other Format classes in that you create a ChoiceFormat object with a constructor (not with a getInstance style factory method). The factory methods aren't necessary because ChoiceFormat doesn't require any complex setup for a given locale. In fact, ChoiceFormat doesn't implement any locale specific behavior.

Constructing ChoiceFormat with Two Arrays

The first of two constructors provided by ChoiceFormat accepts two arrays as its arguments. The first array is an array of primitive doubles that represent the smallest value (starting value) of each interval. The second array is an array of Strings that represent the names associated with each interval. The two arrays must have the same number of elements because there is an assumed one-to-one mapping between the numeric (double) intervals and the Strings describing those intervals. If the two arrays do not have the same number of elements, the following exception is encountered.

Exception in thread "main" java.lang.IllegalArgumentException: Array and limit arrays must be of the same length.
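A minimal sketch that provokes this exception on purpose; the class name MismatchedArraysDemo and its sample limits and formats are illustrative assumptions. It returns the exception message instead of letting the exception escape, and it only prints that message rather than relying on its exact wording, which could differ between JDK versions.

```java
import java.text.ChoiceFormat;

class MismatchedArraysDemo {
    /**
     * Tries to build a ChoiceFormat from three limits but only two formats.
     * Returns the IllegalArgumentException message, or null if no exception
     * was thrown.
     */
    static String tryMismatchedArrays() {
        double[] limits = {0, 60, 90};       // three interval starts
        String[] formats = {"Fail", "Pass"}; // only two names
        try {
            new ChoiceFormat(limits, formats);
            return null;
        } catch (IllegalArgumentException ex) {
            return ex.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryMismatchedArrays());
    }
}
```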

The Javadoc for the ChoiceFormat(double[], String[]) constructor states that the first array parameter is named "limits", is of type double[], and is described as "limits in ascending order." The second array parameter is named "formats", is of type String[], and is described as "corresponding format strings." According to the Javadoc, this constructor "constructs with the limits and the corresponding formats."

Use of the ChoiceFormat constructor accepting two array arguments is demonstrated in the next code listing (the writeGradeInformation(ChoiceFormat) method and fredsTestScores variable will be shown later).
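A minimal sketch of a two-arrays construction for letter grades, assuming the cutoffs 0, 60, 70, 80, and 90 (inferred from the output shown later in this post); the names ArraysGradeFormat and buildGradeFormat are hypothetical.

```java
import java.text.ChoiceFormat;

class ArraysGradeFormat {
    // Inclusive lower limit of each grade interval, in ascending order.
    private static final double[] LIMITS = {0, 60, 70, 80, 90};
    // Letter grade for each interval; must have the same length as LIMITS.
    private static final String[] GRADES = {"F", "D", "C", "B", "A"};

    /** Builds the grade ChoiceFormat with the two-arrays constructor. */
    static ChoiceFormat buildGradeFormat() {
        return new ChoiceFormat(LIMITS, GRADES);
    }

    public static void main(String[] args) {
        ChoiceFormat gradeFormat = buildGradeFormat();
        // 75.6 is >= 70 and < 80, so it falls in the 'C' interval.
        System.out.println(gradeFormat.format(75.6));
    }
}
```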

The example above satisfies the expectations of the illustrated ChoiceFormat constructor. The two arrays have the same number of elements, the first (double[]) array has its elements in ascending order, and the second (String[]) array has its "formats" in the same order as the corresponding interval-starting limits in the first array.

The writeGradeInformation(ChoiceFormat) method referenced in the code snippet above demonstrates use of a ChoiceFormat instance based on the two arrays to "format" provided numerical values as Strings. The method's implementation is shown next.
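A hypothetical stand-in for writeGradeInformation(ChoiceFormat), sketched under the assumption that it formats each score and then the average score. Unlike the method named in the post, this version takes the scores as a parameter and returns the lines instead of only printing them, which makes the behavior easy to check.

```java
import java.text.ChoiceFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GradeReport {
    /**
     * Formats each score, then the average score, with the supplied
     * ChoiceFormat. Returns one line per score plus one for the average.
     */
    static List<String> gradeInformation(ChoiceFormat gradeFormat, double[] scores) {
        List<String> lines = new ArrayList<>();
        for (double score : scores) {
            // format(double) returns the String of the interval containing score
            lines.add(score + " is a '" + gradeFormat.format(score) + "'.");
        }
        double average = Arrays.stream(scores).average().orElse(Double.NaN);
        lines.add("The average score (" + average + ") is a '"
                + gradeFormat.format(average) + "'.");
        return lines;
    }

    public static void main(String[] args) {
        ChoiceFormat gradeFormat = new ChoiceFormat(
                new double[] {0, 60, 70, 80, 90},
                new String[] {"F", "D", "C", "B", "A"});
        double[] scores = {75.6, 88.8, 97.3, 43.3};
        gradeInformation(gradeFormat, scores).forEach(System.out::println);
    }
}
```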

The code above uses the ChoiceFormat instance provided to "format" test scores. Instead of printing a numeric value, the "format" prints the String associated with the interval that numeric value falls within. The next code listing shows the definition of fredsTestScores used in these examples.
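A sketch of what the fredsTestScores definition could look like. The four values come from the output below; the double[] type, the constant name FREDS_TEST_SCORES, and the averageScore helper are assumptions.

```java
import java.util.Arrays;

class FredsScores {
    /** Fred's test scores, as they appear in the output below. */
    static final double[] FREDS_TEST_SCORES = {75.6, 88.8, 97.3, 43.3};

    /** Arithmetic mean of the scores (about 76.25). */
    static double averageScore() {
        return Arrays.stream(FREDS_TEST_SCORES).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(FREDS_TEST_SCORES));
        System.out.println(averageScore());
    }
}
```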

Running these test scores through the ChoiceFormat instance instantiated with two arrays generates the following output:

75.6 is a 'C'.
88.8 is a 'B'.
97.3 is a 'A'.
43.3 is a 'F'.
The average score (76.25) is a 'C'.

Constructing ChoiceFormat with a Pattern String

The ChoiceFormat(String) constructor that accepts a String-based pattern may be more appealing to developers who are comfortable using String-based patterns with similar formatting classes such as DateFormat and DecimalFormat. The next code listing demonstrates use of this constructor. The pattern provided to the constructor leads to an instance of ChoiceFormat that should format the same way as the ChoiceFormat instance created in the earlier example with the constructor that takes two arrays.
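A minimal sketch of the pattern-based equivalent, assuming the same 0/60/70/80/90 cutoffs. Each limit#format pair starts an interval at (and including) its limit, and | separates the pairs. Because ChoiceFormat.equals compares limits and formats, the pattern-based and arrays-based instances compare equal here. Any space typed before a | becomes part of the preceding format string, which may explain the trailing spaces visible in some output later in this post.

```java
import java.text.ChoiceFormat;

class PatternGradeFormat {
    /** "limit#format" pairs separated by '|'; each limit is inclusive. */
    static final String GRADE_PATTERN = "0#F|60#D|70#C|80#B|90#A";

    static ChoiceFormat buildGradeFormat() {
        return new ChoiceFormat(GRADE_PATTERN);
    }

    public static void main(String[] args) {
        ChoiceFormat fromPattern = buildGradeFormat();
        ChoiceFormat fromArrays = new ChoiceFormat(
                new double[] {0, 60, 70, 80, 90},
                new String[] {"F", "D", "C", "B", "A"});
        System.out.println(fromPattern.format(88.8));        // B
        System.out.println(fromPattern.equals(fromArrays));  // true
    }
}
```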

The writeGradeInformation method called here is the same as the one called earlier and the output is also the same (not shown here because it is the same).

ChoiceFormat Behavior on the Extremes and Boundaries

The examples so far have worked well with test scores in the expected ranges. Another set of test scores will now be used to demonstrate some other features of ChoiceFormat. This new set of test scores is set up in the next code listing and includes an "impossible" negative score and another "likely impossible" score above 100.
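A sketch of such a boundary score set, with values taken from the output that follows; the class and constant names are hypothetical, and the pattern-based instance is rebuilt inline so the block stands alone.

```java
import java.text.ChoiceFormat;

class BoundaryScores {
    /** Includes an "impossible" negative score and a score above 100. */
    static final double[] BOUNDARY_TEST_SCORES =
            {-25.0, 0.0, 20.0, 60.0, 70.0, 80.0, 90.0, 100.0, 115.0};

    public static void main(String[] args) {
        ChoiceFormat gradeFormat = new ChoiceFormat("0#F|60#D|70#C|80#B|90#A");
        for (double score : BOUNDARY_TEST_SCORES) {
            System.out.println(score + " is a '" + gradeFormat.format(score) + "'.");
        }
    }
}
```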

When the set of test scores above is run through either of the ChoiceFormat instances created earlier, the output is as shown next.

-25.0 is a 'F '.
0.0 is a 'F '.
20.0 is a 'F '.
60.0 is a 'D '.
70.0 is a 'C '.
80.0 is a 'B '.
90.0 is a 'A'.
100.0 is a 'A'.
115.0 is a 'A'.
The average score (56.666666666666664) is a 'F '.

The output just shown demonstrates that the "limits" set in the ChoiceFormat constructors are "inclusive," meaning that those limits apply to the specified limit and above (until the next limit). In other words, each range contains the numbers greater than or equal to its limit and less than the next limit. The Javadoc documentation for ChoiceFormat describes this with a mathematical description:

X matches j if and only if limit[j] ≤ X < limit[j+1]
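To make the rule concrete, here is a hypothetical helper, matchIndex, that applies the documented condition, including the clamping to the first or last index described in the next paragraph. It illustrates the Javadoc rule rather than reproducing ChoiceFormat's source, and main compares it with a real ChoiceFormat.

```java
import java.text.ChoiceFormat;

class ChoiceMatchRule {
    /**
     * Returns the index j with limits[j] <= x < limits[j + 1], using the
     * first index when x is below every limit and the last index when x is
     * at or above the last limit. limits must be non-empty and ascending.
     */
    static int matchIndex(double[] limits, double x) {
        int j = 0;
        // Advance while x reaches the next (inclusive) lower limit.
        while (j + 1 < limits.length && x >= limits[j + 1]) {
            j++;
        }
        return j;
    }

    public static void main(String[] args) {
        double[] limits = {0, 60, 70, 80, 90};
        String[] grades = {"F", "D", "C", "B", "A"};
        ChoiceFormat format = new ChoiceFormat(limits, grades);
        for (double x : new double[] {-25, 59.99, 60, 89.99, 90, 115}) {
            // For ascending limits, both columns should agree.
            System.out.println(x + " -> " + grades[matchIndex(limits, x)]
                    + " / " + format.format(x));
        }
    }
}
```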

The output from the boundaries test scores example also demonstrates another characteristic of ChoiceFormat described in its Javadoc documentation: "If there is no match, then either the first or last index is used, depending on whether the number (X) is too low or too high." Because there is no match for -25.0 in the provided ChoiceFormat instances, the lowest range ('F', with a limit of 0) is applied to that number even though it falls below that range. In these test score examples, no limit higher than the "90" for an "A" is specified, so all scores of 90 or higher (including those above 100) are formatted as "A". Let's suppose that we wanted to enforce the ranges of scores to be between 0 and 100 or else have the formatted result indicate "Invalid" for scores less than 0 or greater than 100. This can be done as shown in the next code listing.
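A minimal sketch of both approaches. Double.NEGATIVE_INFINITY, the -\u221E pattern prefix, and the 100< pattern syntax come from the discussion that follows this output; where the post says the arrays-based instance uses "a number slightly bigger than 100", this sketch assumes ChoiceFormat.nextDouble(100.0), the smallest double greater than 100.0, which is also how the pattern parser treats 100< (so the two instances compare equal). Class and method names are hypothetical.

```java
import java.text.ChoiceFormat;

class BoundedGradeFormats {
    /** Arrays-based: negative infinity catches too-low scores. */
    static ChoiceFormat arraysBased() {
        double[] limits = {
            Double.NEGATIVE_INFINITY, 0, 60, 70, 80, 90,
            ChoiceFormat.nextDouble(100.0) // smallest double greater than 100.0
        };
        String[] formats = {
            "Invalid - Too Low", "F", "D", "C", "B", "A", "Invalid - Too High"
        };
        return new ChoiceFormat(limits, formats);
    }

    /** Pattern-based: a leading minus sign plus U+221E means negative infinity; "100<" means above 100. */
    static ChoiceFormat patternBased() {
        return new ChoiceFormat(
            "-\u221E#Invalid - Too Low|0#F|60#D|70#C|80#B|90#A|100<Invalid - Too High");
    }

    public static void main(String[] args) {
        ChoiceFormat arrays = arraysBased();
        ChoiceFormat pattern = patternBased();
        for (double score : new double[] {-25.0, 0.0, 100.0, 115.0}) {
            System.out.println(score + " is a '" + arrays.format(score)
                    + "' / '" + pattern.format(score) + "'.");
        }
    }
}
```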

When the above method is executed, its output shows that both approaches enforce boundary conditions better.

-25.0 is a 'Invalid - Too Low'.
0.0 is a 'F'.
20.0 is a 'F'.
60.0 is a 'D'.
70.0 is a 'C'.
80.0 is a 'B'.
90.0 is a 'A'.
100.0 is a 'A'.
115.0 is a 'Invalid - Too High'.
The average score (56.666666666666664) is a 'F'.
-25.0 is a 'Invalid - Too Low '.
0.0 is a 'F '.
20.0 is a 'F '.
60.0 is a 'D '.
70.0 is a 'C '.
80.0 is a 'B '.
90.0 is a 'A '.
100.0 is a 'A '.
115.0 is a 'Invalid - Too High'.
The average score (56.666666666666664) is a 'F '.

The last code listing demonstrates using Double.NEGATIVE_INFINITY (in the arrays-based example) and -\u221E (a minus sign followed by the Unicode INFINITY character, in the pattern-based example) to establish the lowest possible limit boundary. For scores above 100.0 to be formatted as invalid, the arrays-based ChoiceFormat uses a number slightly bigger than 100 as the lower limit of that invalid range. The String/pattern-based ChoiceFormat instance provides greater flexibility and exactness in specifying the lower limit of the "Invalid - Too High" range as any number greater than 100.0 using the less-than symbol (<).

Handling None, Singular, and Plural with ChoiceFormat

I opened this post by quoting the Javadoc stating that ChoiceFormat is "generally used in a MessageFormat for handling plurals," but have not yet demonstrated this common use in this post. I will demonstrate a portion of this (plurals without MessageFormat) very briefly here for completeness, but a much more complete explanation (plurals with MessageFormat) of this common usage of ChoiceFormat is available in the Java Tutorials' Handling Plurals lesson (part of the Internationalization trail).

The next code listing demonstrates application of ChoiceFormat to handle singular and plural cases.
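A minimal sketch that produces the output below with the pattern constructor alone (no MessageFormat); the pattern string and the describe helper are assumptions reconstructed from that output.

```java
import java.text.ChoiceFormat;

class CactusPlurals {
    /** Zero, singular, and several plural phrasings keyed on the count. */
    static final ChoiceFormat CACTUS_FORMAT = new ChoiceFormat(
        "0#no cacti|1#a cactus|2#a couple cacti|3#a few cacti|4#many cacti|10#a plethora of cacti");

    static String describe(int count) {
        return count + ": I own " + CACTUS_FORMAT.format(count) + ".";
    }

    public static void main(String[] args) {
        for (int count = 0; count <= 10; count++) {
            System.out.println(describe(count));
        }
    }
}
```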

Running the example in the last code listing leads to output that is shown next.

0: I own no cacti.
1: I own a cactus.
2: I own a couple cacti.
3: I own a few cacti.
4: I own many cacti.
5: I own many cacti.
6: I own many cacti.
7: I own many cacti.
8: I own many cacti.
9: I own many cacti.
10: I own a plethora of cacti.

One Final Symbol Supported by ChoiceFormat's Pattern

One other symbol that ChoiceFormat's pattern parsing recognizes as the separator between a limit and its format string is \u2264 (≤). This is demonstrated in the next code listing and the output that follows it. Note that in this example the \u2264 works effectively the same as the simpler # sign shown earlier.
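A small sketch comparing \u2264 with # in otherwise identical patterns (the patterns and class name are illustrative). Both parse to the same limits and formats, so the two instances compare equal.

```java
import java.text.ChoiceFormat;

class LessThanOrEqualSymbol {
    /** Uses '#' between each limit and its format. */
    static final ChoiceFormat WITH_HASH =
        new ChoiceFormat("0#no cacti|1#a cactus|2#a couple cacti");
    /** Uses the less-than-or-equal sign (U+2264) in place of '#'. */
    static final ChoiceFormat WITH_LE =
        new ChoiceFormat("0\u2264no cacti|1\u2264a cactus|2\u2264a couple cacti");

    public static void main(String[] args) {
        for (int count = 0; count <= 3; count++) {
            System.out.println(count + ": " + WITH_HASH.format(count)
                    + " | " + WITH_LE.format(count));
        }
        System.out.println(WITH_HASH.equals(WITH_LE)); // true
    }
}
```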

The "limits" double[] array provided to the ChoiceFormat(double[], String[]) constructor should list the limits from left to right in ascending numerical order. When this is not the case, no exception is thrown, but the logic is almost certainly not going to be correct, because numbers formatted against the instance of ChoiceFormat will "match" the wrong intervals. This same expectation applies to the constructor accepting a pattern.
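A sketch of what goes wrong, using deliberately unsorted, hypothetical limits. The two-arrays constructor accepts them without complaint, and the resulting matches differ from what a sorted table would give. The exact results depend on ChoiceFormat's internal search, so the block only checks that they are not the intended grades.

```java
import java.text.ChoiceFormat;

class UnsortedLimitsDemo {
    /** Limits deliberately NOT in ascending order; no exception is thrown. */
    static final ChoiceFormat UNSORTED = new ChoiceFormat(
        new double[] {90, 0, 60},
        new String[] {"A", "F", "D"});

    public static void main(String[] args) {
        // A sorted table (0#F|60#D|90#A) would give "A" for 95 and "F" for 50.
        System.out.println(UNSORTED.format(95)); // not the intended "A"
        System.out.println(UNSORTED.format(50)); // not the intended "F"
    }
}
```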

ChoiceFormat allows \u221E and -\u221E to be used for specifying lower range limits via its single String (pattern) constructor.

The ChoiceFormat constructor accepting a String pattern is a bit more flexible than the two-arrays constructor: using the < symbol, it allows one to specify a lower limit boundary as everything above a certain amount without including that amount itself.
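A minimal sketch of the difference between # (inclusive) and < (exclusive) at a single limit; the two-interval patterns are illustrative. The getLimits() check shows that 60< is stored as ChoiceFormat.nextDouble(60.0), the smallest double above 60.

```java
import java.text.ChoiceFormat;

class ExclusiveLimitDemo {
    /** "60#D": 60 itself is a D (inclusive limit). */
    static final ChoiceFormat INCLUSIVE = new ChoiceFormat("0#F|60#D");
    /** "60<D": only values greater than 60 are a D (exclusive limit). */
    static final ChoiceFormat EXCLUSIVE = new ChoiceFormat("0#F|60<D");

    public static void main(String[] args) {
        System.out.println(INCLUSIVE.format(60.0)); // D
        System.out.println(EXCLUSIVE.format(60.0)); // F
        // '<' is stored as the next double above the stated limit.
        System.out.println(EXCLUSIVE.getLimits()[1] == ChoiceFormat.nextDouble(60.0)); // true
    }
}
```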

Symbols and characters with special meaning in the String patterns provided to the single String ChoiceFormat constructor include #, <, \u2264 (≤), \u221E (∞), and |.

Conclusion

ChoiceFormat allows formatting of numeric ranges to be customized so that specific ranges can have different and specific representations. This post has covered several different aspects of numeric range formatting with ChoiceFormat, but parsing numeric ranges from Strings using ChoiceFormat was not covered in this post.

Monday, September 15, 2014

Coverity, Inc. issued a press release this morning announcing that "the LibreOffice team" has "analyzed more than 9 million lines of code to find and fix more than 6,000 defects." In the press release, Zack Samocha, senior director of products for Coverity, states, "LibreOffice’s remarkable results after just two years [since October 2012] of using the Coverity Scan service reiterates the mission criticality of software testing for the open source community to find and fix software defects early."

This press release cites the Coverity Scan 2013 Open Source Report in explaining the degree of success the LibreOffice team has achieved. Specifically, according to the press release, the LibreOffice team has "reduced the project’s defect density from .8 to .08."

I was curious about some of the specific details associated with LibreOffice's use of Coverity Scan to reduce defects and improve quality and so took advantage of an offer to ask Zack Samocha some questions. The remainder of this post indicates my questions, Zack's answers, and some related references.

Q: What programming languages are used for LibreOffice (all/mostly C++ or some Java or other languages)?
A: The language used for LibreOffice is mostly C++.

Q: What is an example of one of the most common types of defects discovered and fixed in LibreOffice?
A: The top issues were:

Error handling issues = 2271

Null pointer dereferences = 1796

Uninitialized members = 1145

Q: Is this typical of other open source projects analyzed with Coverity Scan?
A: This is comparable for OSS projects [with more than] 1 million lines of code (LOC).

Q: What is an example of one of the most serious types of defects (high-impact) discovered and fixed? Is this typical of other open source projects analyzed with Coverity Scan?
A: The most serious are memory related. For example, illegal memory accesses (there were 23) and memory corruptions (there were 17). This amount is common for such a large code base.

Q: Are there any additional metrics regarding the fixes to LibreOffice using Coverity Scan such as number of developers or number of person hours spent on the effort? Is there any estimate of how much of this effort was identifying the issues (running the scan) versus fixing them and testing the fixes?
A: In the past year, LibreOffice fixed more than 8,500 defects. Assuming at least one hour per defect, which is conservative, that's about 365 days of work for a single developer.

Q: How does Coverity Scan differ from FindBugs, PMD, other static analysis tools, and IDEs' built-in static analysis support? What advantages does Coverity Scan offer instead of or in addition to those tools?
A: At Coverity, we believe in open source collaboration. Coverity complements FindBugs, PMD and others. In fact, in Coverity Scan and our enterprise products, we provide FindBugs defects in the same workflow as defects found by Coverity development testing solutions. Coverity and FindBugs are looking for different things. Coverity is designed to find critical, crash-causing defects, whereas FindBugs is best suited for finding coding-style and best-practice issues. To illustrate the point, we conducted an experiment with the Jenkins open source build server under a controlled environment. We found 296 critical defects while FindBugs found 1010 coding style issues. There were only 30 defects that were found by both solutions. Coverity analysis tends to be more interprocedural; in addition, Coverity covers the OWASP Top 10 for security issues.

"The work on using the Coverity reports for LibreOffice is done by a variety of LibreOffice developers, some on the building and testing, others on the other work to fix the issues. In the first months many hundred improvements have been made, making LibreOffice more robust, better."

Before demonstrating Java 8 style date/time parsing and formatting with examples, it is illustrative to compare the Javadoc descriptions for DateFormat/SimpleDateFormat and DateTimeFormatter. The table that follows contains differentiating information that can be gleaned directly or indirectly from a comparison of the Javadoc for each formatting class. Perhaps the most important observations from this table are that the new DateTimeFormatter is thread-safe and immutable, and the general overview it gives of the APIs that DateTimeFormatter provides for parsing and formatting dates and times.

The remainder of this post uses examples to demonstrate formatting and parsing dates in Java 8 with the java.time constructs. The examples will use the following string patterns and instances.

/** Pattern to use for String representation of Dates/Times. */
private final String dateTimeFormatPattern = "yyyy/MM/dd HH:mm:ss z";
/**
* java.util.Date instance representing now that can
* be formatted using SimpleDateFormat based on my
* dateTimeFormatPattern field.
*/
private final Date now = new Date();
/**
* java.time.ZonedDateTime instance representing now that can
* be formatted using DateTimeFormatter based on my
* dateTimeFormatPattern field.
*
* Note that ZonedDateTime needed to be used in this example
* instead of java.time.LocalDateTime or java.time.OffsetDateTime
* because the format provided by my dateTimeFormatPattern field
* includes a time zone name ("z"). Calling
* DateTimeFormatter.format(TemporalAccessor) on a formatter
* built from such a pattern leads to a DateTimeException for
* instances of TemporalAccessor that do not carry a time zone ID
* (LocalDateTime has no zone at all and OffsetDateTime has only
* an offset).
*/
private final ZonedDateTime now8 = ZonedDateTime.now();
/**
* String that can be used by both SimpleDateFormat and
* DateTimeFormatter to parse respective date/time instances
* from this String.
*/
private final String dateTimeString = "2014/09/03 13:59:50 MDT";

Before Java 8, the standard Java approach for dates and times was via the Date and Calendar classes and the standard approach to parsing and formatting dates was via DateFormat and SimpleDateFormat. The next code listing demonstrates these classical approaches.
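A minimal sketch of the classic approach, reusing the post's pattern and date/time string; the class name and helper methods are hypothetical, and Locale.US is an assumption added so that zone abbreviations such as MDT are handled consistently regardless of the default locale. Because SimpleDateFormat is not thread-safe, the sketch creates a new instance per call rather than sharing one.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class LegacyDateFormatting {
    /** Same pattern as the dateTimeFormatPattern field above. */
    static final String PATTERN = "yyyy/MM/dd HH:mm:ss z";

    /** Formats a Date in the JVM's default time zone. */
    static String format(Date date) {
        // SimpleDateFormat is not thread-safe, so one is created per call.
        DateFormat format = new SimpleDateFormat(PATTERN, Locale.US);
        return format.format(date);
    }

    /** Parses text such as "2014/09/03 13:59:50 MDT" into a Date. */
    static Date parse(String text) throws ParseException {
        return new SimpleDateFormat(PATTERN, Locale.US).parse(text);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(format(new Date()));
        System.out.println(parse("2014/09/03 13:59:50 MDT"));
    }
}
```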

With Java 8, the preferred date/time classes are no longer in the java.util package but in the java.time package. Similarly, the preferred date/time formatting/parsing classes are no longer in the java.text package, but instead come from the java.time.format package.

The java.time package offers numerous classes for modeling dates and/or times. These include classes that model dates only (no time information), classes that model times only (no date information), classes that model date and time information, classes that use time zone information, and classes that do not incorporate time zone information. The approach for formatting and parsing these is generally similar, though the characteristics of the class (whether it supports date, time, or time zone information, for example) affect which patterns can be applied. In this post, I use the ZonedDateTime class for my examples because it includes date, time, and time zone information and therefore allows me to use a matching pattern that involves all three of those characteristics, as a Date or Calendar instance does. This makes it easier to compare the old and new approaches.

The DateTimeFormatter class provides ofPattern methods to provide an instance of DateTimeFormatter based on the provided date/time pattern String. One of the format methods can then be called on that instance of DateTimeFormatter to get the date and/or time information formatted as a String matching the provided pattern. The next code listing illustrates this approach to formatting a String from a ZonedDateTime based on the provided pattern.
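A minimal sketch of formatting a ZonedDateTime with ofPattern; the class and method names are hypothetical, and passing Locale.US is an assumption that pins down the locale-sensitive zone name printed for z. Because DateTimeFormatter is immutable and thread-safe, a single shared instance is fine.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class ModernFormatting {
    /** Immutable and thread-safe, so one shared instance suffices. */
    static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss z", Locale.US);

    static String format(ZonedDateTime dateTime) {
        return FORMATTER.format(dateTime); // equivalent to dateTime.format(FORMATTER)
    }

    public static void main(String[] args) {
        ZonedDateTime sample = ZonedDateTime.of(
            2014, 9, 3, 13, 59, 50, 0, ZoneId.of("America/Denver"));
        System.out.println(format(sample));
        System.out.println(format(ZonedDateTime.now()));
    }
}
```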

Parsing a date/time class from a String based on a pattern is straightforward, and there are a couple of ways to do it. One approach is to pass the instance of DateTimeFormatter to the static ZonedDateTime.parse(CharSequence, DateTimeFormatter) method, which returns an instance of ZonedDateTime derived from the provided character sequence and based on the provided pattern. This is illustrated in the next code listing.
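A minimal sketch of the static ZonedDateTime.parse(CharSequence, DateTimeFormatter) approach; the names are hypothetical and Locale.US is again an assumption. A mismatch between text and pattern surfaces as an unchecked DateTimeParseException.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class ModernParsingStatic {
    static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss z", Locale.US);

    /** Parses text such as "2014/09/03 13:59:50 MDT". */
    static ZonedDateTime parse(String text) {
        return ZonedDateTime.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parse("2014/09/03 13:59:50 MDT"));
    }
}
```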

A second approach to parsing ZonedDateTime from a String is via DateTimeFormatter's parse(CharSequence, TemporalQuery<T>) method. This is illustrated in the next code listing which also provides an opportunity to demonstrate use of a Java 8 method reference (see ZonedDateTime::from).
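A minimal sketch of the DateTimeFormatter.parse(CharSequence, TemporalQuery) approach, in which the method reference ZonedDateTime::from serves as the TemporalQuery; the class name and Locale.US are assumptions as before.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class ModernParsingQuery {
    static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss z", Locale.US);

    static ZonedDateTime parse(String text) {
        // ZonedDateTime::from builds the result from the parsed fields.
        return FORMATTER.parse(text, ZonedDateTime::from);
    }

    public static void main(String[] args) {
        System.out.println(parse("2014/09/03 13:59:50 MDT"));
    }
}
```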

Very few projects have the luxury of being a greenfield project that can start with Java 8. Therefore, it's helpful that there are classes that connect the pre-JDK 8 date/time classes with the new date/time classes introduced in JDK 8. One example of this is the ability of JDK 8's DateTimeFormatter to provide an instance of a descendant of the pre-JDK 8 abstract java.text.Format class via the DateTimeFormatter.toFormat() method. This is demonstrated in the next code listing.
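A minimal sketch of the bridge. toFormat() yields a java.text.Format whose parseObject returns a generic TemporalAccessor, while the toFormat(TemporalQuery) overload lets parseObject return a ZonedDateTime directly. The class name and the choice of Locale.US are assumptions.

```java
import java.text.Format;
import java.text.ParseException;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LegacyFormatBridge {
    static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss z", Locale.US);

    /** A java.text.Format view of FORMATTER for APIs that expect a Format. */
    static final Format LEGACY_FORMAT = FORMATTER.toFormat();

    /** Variant whose parseObject returns a ZonedDateTime. */
    static final Format LEGACY_ZONED_FORMAT = FORMATTER.toFormat(ZonedDateTime::from);

    public static void main(String[] args) throws ParseException {
        ZonedDateTime sample = ZonedDateTime.of(
            2014, 9, 3, 13, 59, 50, 0, ZoneId.of("America/Denver"));
        String text = LEGACY_FORMAT.format(sample); // Format.format(Object)
        ZonedDateTime parsed = (ZonedDateTime) LEGACY_ZONED_FORMAT.parseObject(text);
        System.out.println(text + " -> " + parsed);
    }
}
```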

The Instant class is especially important when working with both pre-JDK 8 Date and Calendar classes in conjunction with the new date and time classes introduced with JDK 8. The reason Instant is so important is that java.util.Date has a static from(Instant) method and a toInstant() instance method for getting a Date from an Instant and getting an Instant from a Date, respectively. Because Instant is so important in migrating pre-Java 8 date/time handling to Java 8 baselines, the next code listing demonstrates formatting and parsing instances of Instant.
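A minimal sketch of formatting and parsing Instant values alongside the Date conversions; the class name is hypothetical, and UTC as the formatter's override zone is an assumption. An Instant carries no time zone, so a pattern with date, time, and zone fields needs withZone to format it; when parsing, a zone found in the text (such as MDT) takes precedence over the override zone.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

class InstantFormatting {
    /** Override zone (UTC, an assumption) lets the pattern print an Instant. */
    static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss z", Locale.US)
                         .withZone(ZoneId.of("UTC"));

    static String format(Instant instant) {
        return FORMATTER.format(instant);
    }

    /** Date, time, and zone in the text resolve to an Instant. */
    static Instant parse(String text) {
        return FORMATTER.parse(text, Instant::from);
    }

    public static void main(String[] args) {
        Date legacyNow = new Date();
        Instant instant = legacyNow.toInstant(); // Date -> Instant
        Date roundTrip = Date.from(instant);     // Instant -> Date
        System.out.println(format(instant));
        System.out.println(parse("2014/09/03 13:59:50 MDT"));
        System.out.println(legacyNow.equals(roundTrip)); // true
    }
}
```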

The output from running the above demonstration is shown in the next screen snapshot.

Conclusion

The JDK 8 date/time classes and related formatting and parsing classes are much more straightforward to use than their pre-JDK 8 counterparts. This post has attempted to demonstrate how to apply these new classes and to take advantage of some of their benefits.