You just released
SPECjAppServer2001 and SPECjAppServer2002 last year. Why are you
releasing another SPECjAppServer benchmark so soon?

A2:

The two previous benchmarks
(SPECjAppServer2001 and SPECjAppServer2002) were essentially
repackaged versions of the ECperf benchmark, which was designed
to the J2EE 1.2 standards specification. The design and layout
were left largely unchanged so that the benchmarks could be
released quickly. SPECjAppServer2004 is an enhanced version of
the benchmark, with a modified workload and broader coverage of
J2EE 1.3 standard capabilities.

Historically, SPEC creates a
new version of a benchmark every 3 to 4 years, providing a large
number of published results to compare. By releasing new
SPECjAppServer benchmarks so frequently, you are making it
difficult to do trend studies. Can you tell us the shelf life of
this benchmark?

A3:

SPEC intends to keep
SPECjAppServer2004 for as long as it can before developing a new
benchmark, but it also needs to move the benchmark along as new
standards and technologies evolve and old standards and
technologies become obsolete. The exact shelf life is not
predictable and depends largely on the evolution of the J2EE
platform.

In both SPECjAppServer2001 and
SPECjAppServer2002, the load drivers access the application via
a direct connection to the EJBs. In the SPECjAppServer2004
benchmark, the load drivers access the application through the
web layer (for the dealer domain) and the EJBs (for the
manufacturing domain) to stress more of the capabilities of the
J2EE application servers. In addition, SPECjAppServer2004 adds
extensive use of the JMS and MDB infrastructure.
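
To give a feel for the MDB programming model that this messaging
workload exercises, here is a minimal sketch of a J2EE 1.3 (EJB 2.0)
message-driven bean. It is not taken from the benchmark kit; the
class name, message payload, and processing logic are purely
illustrative assumptions.

import javax.ejb.MessageDrivenBean;
import javax.ejb.MessageDrivenContext;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Hypothetical MDB, for illustration only -- not part of the SPECjAppServer2004 kit.
public class WorkOrderListenerBean implements MessageDrivenBean, MessageListener {

    private MessageDrivenContext ctx;

    // Container callbacks required by the EJB 2.0 MDB contract.
    public void setMessageDrivenContext(MessageDrivenContext ctx) {
        this.ctx = ctx;
    }

    public void ejbCreate() {
    }

    public void ejbRemove() {
        this.ctx = null;
    }

    // Invoked by the container for each JMS message delivered to the bean's destination.
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                String payload = ((TextMessage) message).getText();
                // Illustrative placeholder for the asynchronous processing an
                // application would perform here, typically inside a
                // container-managed transaction.
                System.out.println("Received message: " + payload);
            }
        } catch (Exception e) {
            // Marking the transaction for rollback causes the container to redeliver.
            ctx.setRollbackOnly();
        }
    }
}

The container invokes onMessage() once per delivered JMS message,
usually within a container-managed transaction; this asynchronous
delivery path is the kind of messaging infrastructure the benchmark
now exercises.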

Does this benchmark replace
SPECjAppServer2001 and SPECjAppServer2002?

A5:

Yes. SPEC is providing a
six month transition period from the date of the
SPECjAppServer2004 release. During this period, SPEC will
accept, review and publish results from all three benchmark
versions. After this period, results from SPECjAppServer2001 and
SPECjAppServer2002 will no longer be accepted by SPEC for
publication.

The performance metric is
jAppServer Operations Per Second ("SPECjAppServer2004 JOPS").
It is calculated by adding the metrics of the dealership
management application in the dealer domain and the
manufacturing application in the manufacturing domain:

SPECjAppServer2004 JOPS = JOPS(Dealer Domain) + JOPS(Manufacturing Domain)

SPECjAppServer2004 was
developed by the Java subcommittee's core design team. BEA,
Borland, Darmstadt University of Technology, HP, IBM, Intel,
Oracle, Pramati, Sun and Sybase participated in the design,
implementation and testing phases of the product.
SPECjAppServer2004 is not a refresh of the older SPECjAppServer
(2001, 2002) benchmarks. In addition to the EJB tier exercised in
the older SPECjAppServer benchmarks, SPECjAppServer2004 also
extensively exercises the web tier and the messaging
infrastructure.

SPECjAppServer2001 and
SPECjAppServer2002 both had a price/performance metric. Why
doesn't SPECjAppServer2004 have one?

A16:

SPECjAppServer2001 and
SPECjAppServer2002 were the first benchmarks released by SPEC
that contained a price/performance metric. They were released
for a year with the price/performance metric as an experiment,
so that SPEC could determine whether the benefit of this metric
was worth the costs involved. The SPEC OSSC (Open Systems
Steering Committee) reviewed the arguments for and against the
price/performance metric and voted to remove it from new
benchmarks.

Although there is no
price/performance metric, you provide a BOM (bill of materials)
for reproducing results. Can I create my own price/performance
metric and report it alongside SPEC's published results?

A17:

SPEC does not endorse any
price/performance metric for the SPECjAppServer2004 benchmark.
Whether vendors or other parties can use the performance data to
establish and publish their own price/performance information is
beyond the scope and jurisdiction of SPEC. Note that the
benchmark run rules do not prohibit the use of
$/"SPECjAppServer2004 JOPS" calculated from pricing obtained
using the BOM.

No. Results between
standard and distributed categories of SPECjAppServer2004 cannot
be compared; any public claims that attempt to compare
categories will be considered a violation of SPEC fair use
guidelines.

Do you permit benchmark results
to be estimated or extrapolated from existing results?

A22:

No. This is an
implementation benchmark, and all published results have been
achieved by the submitter and reviewed by the committee. Results
cannot be extrapolated accurately due to the complexity of the
benchmark.

SPECjAppServer2004 is designed
to test the performance of a representative J2EE application and
each of the components that make up the application environment,
e.g., hardware, application server, JVM, and database.

Can I use SPECjAppServer2004 to
determine the size of the server I need?

A27:

SPECjAppServer2004 should not
be used to size a J2EE 1.3 application server configuration,
because it is based on a specific workload. There are numerous
assumptions made about the workload, which might or might not
apply to other user applications. SPECjAppServer2004 is a tool
that provides a level playing field for comparing J2EE
1.3-compatible application server products.

In addition to the hardware for
the system under test (SUT), one or more client machines are
required, as well as the network equipment to connect the
clients to the SUT. The number and size of client machines
required by the benchmark will depend on the injection rate to
be applied to the workload.

A SPEC member has run the
benchmark on a Pentium 4 1.6GHz laptop system with 512MB of RAM
and a 30GB hard drive. The benchmark completed successfully with
an injection rate of 2. This is not a valid configuration that
you can use to report results, however, as it does not meet the
durability requirements of the benchmark.

Yes, but you are
required to run the files provided with the benchmark if you are
publishing results. As a general rule, modifying the source code
is not allowed. Specific items (the load program, for example)
can be modified to port the application to your environment.
Areas where you are allowed to make changes are listed in the
SPECjAppServer2004 Run and Reporting
Rules. Any changes made must be disclosed in the submission
file when submitting results.

Why do you insist on J2EE
products with CTS certification? Do you and/or any certifying
body validate this?

A36:

CTS certification ensures that
the application server being tested is a J2EE technology-based
application server and not a benchmark-special application
server that is crafted specifically for SPECjAppServer2004. The
CTS certification is validated by Sun Microsystems, Inc.

In our initial tests we have
seen good scalability with three 4-CPU systems (two systems for
the J2EE application server and one system for the database
server). SPEC did not explicitly restrict scalability in the
benchmark.

How well does the benchmark
scale in both scale-up and scale-out configurations?

A40:

SPECjAppServer2004 has been
designed and tested with both scale-up and scale-out
configurations. The design of the benchmark does not limit
scaling in either direction. How well it scales in a particular
configuration depends largely on the capabilities of the
underlying hardware and software components.

The
SPECjAppServer2004 Run and Reporting Rules do not preclude
third-party submission of benchmark results, but result
submitters must abide by the licensing restrictions of all the
products used in the benchmark; SPEC is not responsible for
vendor (hardware or software) licensing issues. Many products
include a restriction on publishing benchmark results without
the express written permission of the vendor.

The following table shows the
approximate raw data size used to load the database for
different benchmark injection rates:

IR      Size
100     430MB
500     2.1GB
1000    4.2GB

Actual storage space consumed by the RDBMS and all the
supporting structures (i.e. indices) is far higher, however. It
is not unreasonable, for example, to have the database consuming
5GB of disk space to support runs at IR=100. There are a large
number of factors -- both RDBMS- and configuration-dependent --
that influence the actual disk space required.
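
As a back-of-the-envelope aid, the raw data volume in the table grows
roughly linearly with the injection rate (around 4.2 to 4.3 MB per IR
unit). The small Java sketch below turns that observation, plus an
assumed overhead multiplier for indexes and other RDBMS structures,
into a rough disk-footprint estimate; both constants are illustrative
assumptions derived from the figures quoted above, not SPEC numbers.

public class DbSizeEstimate {

    // Raw data grows roughly linearly with IR: ~430MB at IR=100 (see table above).
    private static final double RAW_MB_PER_IR = 4.3;

    // Illustrative overhead multiplier for indexes, logs and other RDBMS
    // structures, roughly consistent with the ~5GB-at-IR=100 example above;
    // the real figure is highly RDBMS- and configuration-dependent.
    private static final double RDBMS_OVERHEAD_FACTOR = 10.0;

    public static double estimatedRawDataMB(int injectionRate) {
        return injectionRate * RAW_MB_PER_IR;
    }

    public static double estimatedDiskFootprintMB(int injectionRate) {
        return estimatedRawDataMB(injectionRate) * RDBMS_OVERHEAD_FACTOR;
    }

    public static void main(String[] args) {
        int ir = 100;
        System.out.printf("IR=%d: ~%.0f MB raw data, ~%.0f MB on disk (rough estimate)%n",
                ir, estimatedRawDataMB(ir), estimatedDiskFootprintMB(ir));
    }
}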

Can you describe the DB
contents? Do you have jpegs or gifs of cars, or any dynamic
content such as pop-ups or promotional items?

A47:

The DB comprises text and
numeric data. We do not include jpegs or gifs, as these are
better served as static web content than stored in the DB. We do
not include dynamic content, as this represents web content and
is usually not part of general DB usage. The client-side
processing of such content is not measured in SPECjAppServer2004.

If the size of the DB is very
small, almost all of it can be cached. Is this realistic?

A48:

We have significantly increased
the database size in SPECjAppServer2004. While still relatively
small, the chances of caching the whole database in memory have
been significantly reduced. Since SPECjAppServer2004 focuses on
evaluating application server performance, a small but
reasonably sized database seems to be far more appropriate than
using database sizes equivalent to the ones used in pure
database benchmarks.

What is typically the ratio of
read vs. write/update operations on the DB?

A49:

An exact answer to this
question is not possible, because it depends on several factors,
including the injection rate and the application server and
database products being used. Lab measurements with a specific
application and database server at an injection rate of 80 have
shown a database read vs. write/update ratio of approximately 4.
Your mileage may vary.

Why didn't you select several
DB sizes, like those in TPC-H and TPC-W?

A50:

The size of the database data
scales stepwise, corresponding to the injection rate for the
benchmark. Multiple scaling factors for database loading would
add yet another benchmark category. Since we are trying to
measure application server performance, it is best to keep the
database scaling linear for all submissions.

In this benchmark, the size of
the DB is a step function of the IR. This makes it difficult to
compare beyond each step -- between a configuration reporting
IR=50 and one reporting IR=65, for example, as each has a
different-sized database. Wouldn't it be fairer to compare
against the same-sized DB?

A51:

No. As we increase the
load on the application server infrastructure, it is realistic
to increase the size of the database as well. Typically, larger
organizations have a higher number of transactions and larger
databases. Both the load injection and the larger database will
put more pressure on the application server infrastructure. This
will ensure that at a higher IR the application server
infrastructure will perform more work than at a lower IR, making
the results truly comparable.

I have heard that DB
performance had a significant influence on previous
SPECjAppServer benchmarks. What have you done to reduce this
influence?

A52:

In SPECjAppServer2004, a
significant amount of functionality has been incorporated into
the application server layer (e.g., servlets, JSPs, JMS). As a
result, the influence of the database relative to the
application server has been reduced somewhat. In addition, the
scaling of the database has been increased, which results in a
more realistic configuration with reduced table/row contention.
The database continues to be a key component of the benchmark,
however, since it is representative of a typical J2EE
application. Because of this fact, database configuration and
tuning will continue to be very important for performance.

Are results sensitive to
components outside of the SUT -- e.g., client driver machines?
If they are, how can I obtain optimal performance: with a) a few
powerful driver machines or b) a larger number of less powerful
driver machines?

A54:

SPECjAppServer2004 results are
not that sensitive to the type of client driver machines, as
long as they are powerful enough to drive the workload for the
given injection rate. Experience shows that if the client
machines are overly stressed, one cannot reach the throughput
required for the given injection rate.

This is an end-to-end solution
benchmark. How can I determine where the bottlenecks are? Can
you provide a profile or some guidance on tuning issues?

A55:

Unfortunately, every
combination of hardware, software, and specific configuration
poses a different set of bottlenecks. It would be difficult or
impossible to provide tuning guidance covering such a broad
range of components and configurations. Such guidance becomes
feasible only as the scope is narrowed to a specific set of
products and configurations. Please contact the respective
software and/or hardware vendors for tuning guidance for their
products.

Is it realistic to use a very
large configuration that would eliminate typical garbage
collection? How much memory is required to eliminate GC for IR=100,
IR=500, and IR=1000?

A56:

Section 2.9.1 of the Run
Rules states that the steady state period must be
representative of a 24-hour run. This means that if no garbage
collection is done during the steady state, it should not be done
during an equivalent 24-hour run either. Due to the complexity of
the benchmark and the amount of garbage it generates, it is
unrealistic to configure a setup to run for 24 hours without any
GC. Even if it were possible, such memory requirements have not
been established and would vary according to many factors.