Network Monitoring Experimentations 5

Previous articles in this series showed a lot of interesting capabilities that Wireshark and similar packet capture utilities offer, such as seeing the behind the scenes SQL statements that do not appear in a 10046 trace when Autotrace is enabled in SQL*Plus. But there is more that Wireshark offers to assist with analysis of Oracle data transfers across the network. Today’s article explores some of these additional features while experimenting with different array fetch sizes, using Oracle Database 11.2.0.1 with default listener settings on the server side and Oracle client 11.1.0.7 on the client side, with an 802.11g wireless network between the two computers (greater network packet latency than a wired connection).

The test table and SQL statement used in this article are similar to those used in a previous article, which demonstrated that the execution plan on Oracle Database 11.2.0.1 differed from the plan used on earlier releases. That difference may affect your ability to reproduce the test results for today’s article if you are using an older release of Oracle Database.

Windows client computers do not include an operating system sleep command, so if the client is running on Windows, create a file named sleep.vbs in the current folder used by SQL*Plus with the following code (the script can then be invoked from SQL*Plus with, for example, host cscript //nologo sleep.vbs 10):

Const lngSleepSecondsDefault = 600

Dim lngSleepSeconds
Dim objScriptArguments
Dim strArgument

Set objScriptArguments = WScript.Arguments

If objScriptArguments.Count = 0 Then
    ' No command line argument supplied - use the default sleep time
    lngSleepSeconds = lngSleepSecondsDefault
Else
    ' Use the first numeric command line argument as the sleep time in seconds
    For Each strArgument In objScriptArguments
        If IsNumeric(strArgument) Then
            lngSleepSeconds = CLng(strArgument)
            Exit For
        End If
    Next
    If lngSleepSeconds <= 0 Then
        lngSleepSeconds = lngSleepSecondsDefault
    End If
End If

' WScript.Sleep expects milliseconds
WScript.Sleep lngSleepSeconds * 1000

Before enabling the Wireshark capture, execute the test SQL statement a couple of times so that a cold cache and physical reads do not unfairly slow the first execution (the one with the small array fetch size) once packet logging is enabled. Start the Wireshark capture on the client, and then execute the above test script in SQL*Plus on the client computer. The client will fetch approximately 999,001 rows from the database, and will likely produce a Wireshark packet capture totaling about 600MB. Later processing of the packet capture, as depicted later in this article, will certainly tax the memory limits of 32-bit operating systems, and may result in Wireshark crashing (the latest beta version of Wireshark was used in this test). Additionally, processing of the resulting packet capture is extremely CPU intensive – you have been warned.

First, we will start the analysis by right-clicking one of the TNS type packets and selecting Follow TCP Stream. The resulting window that is eventually displayed will not be used; we are just using that feature to enter the Filter criteria shown near the top of the Wireshark window with the light green background in the picture below:

As shown above, when a line is checked in the lower pane, such as Time delta from previous captured frame, the status bar at the bottom of the window is updated with a field name, in this case frame.time_delta. If you manually review the Time column in the Wireshark window you will see that the time delta between packets varies quite a bit. I wonder if the time delta values vary depending on the array fetch size… we could extract the information and process it in a spreadsheet package to determine whether the time deltas change. More on this later. The packet frame length for this packet is 704 bytes.

—

If we scroll down to the bottom of the lower pane, we see that Wireshark has attempted to decode this TNS packet, everything except the control characters and binary data in the data portion of the packet.

We see that the TNS packet length is 650 of the packet frame’s 704 bytes, and the actual data contained in the packet is 640 of those 650 bytes. The status bar shows that the field name for the data length is data.len – we will use this information later. The data section shows that this is the submission of the first SQL statement in our test script.
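The byte counts above nest neatly inside one another. A quick sketch of that layering, assuming the common header sizes for Ethernet II, IPv4 without options, and TCP without options (the actual capture could differ slightly):

```python
# Hypothetical breakdown of the 704-byte frame described above.
# The header sizes are typical values, not read from this capture.
ETH_HEADER = 14   # Ethernet II header
IP_HEADER = 20    # IPv4 header, no options
TCP_HEADER = 20   # TCP header, no options

frame_len = 704   # frame.len reported by Wireshark
tns_len = 650     # TNS packet length reported by Wireshark
data_len = 640    # data.len reported by Wireshark

# The TNS packet accounts for the frame minus the lower-layer headers...
print(frame_len - (ETH_HEADER + IP_HEADER + TCP_HEADER))  # 650
# ...and the remaining bytes in the TNS packet are TNS header/flag bytes.
print(tns_len - data_len)  # 10
```

If the two printed values match the lengths Wireshark reports, the assumed header sizes are consistent with the capture.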

—

The following screenshot is showing where the server is returning the data to the client early during the execution of the test script. In this case, with the array fetch size set to 10, the server had to break the selected data into two network packets, with the first containing bytes 0 through 1459, and the second containing bytes 1460 through 1760 (just 301 bytes in the second packet).
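The two-packet split follows directly from TCP segmentation. Assuming the typical Ethernet maximum segment size of 1,460 bytes (the capture's negotiated MSS could differ), the arithmetic works out to exactly the byte ranges shown above:

```python
import math

MSS = 1460            # typical TCP maximum segment size on Ethernet (assumed)
total_payload = 1761  # bytes 0 through 1760 returned for the 10-row fetch

packets = math.ceil(total_payload / MSS)
last_packet_bytes = total_payload - (packets - 1) * MSS
print(packets, last_packet_bytes)  # 2 301
```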

—

Scrolling down near the end of the packet capture, we see one of the network packets where the array fetch size of 5,000 was in use. The displayed time delta is 0.000341 seconds, which is very good for a wireless network.

—

Moving down to the next packet, we find that this packet is 855 bytes in length, but the reassembled TCP length is 8,155 bytes, compared to the value of 1,761 that we found when the array fetch size of 10 was in use.

The field name for the reassembled TCP length, as displayed in the status bar, is tcp.reassembled.length – we will use this later.

—

Now for the exciting part, graphing the result. If you had to choose one array fetch size from 10, 50, 100, 500, 1000, or 5000, which should you use, and why? From the Statistics menu in Wireshark, select the IO Graphs menu item. You should see something like this if you set the Tick Interval to 10 seconds and the Pixels per tick to 10. The Tick Interval value determines how the statistics are grouped together, in this case into 10 second intervals. The Pixels per tick value determines the horizontal spacing between the data points when displayed on screen.

The width of the first bump represents the time required to retrieve the data 10 rows at a time. The second bump represents the time required to retrieve 50 rows at a time, and the third bump the time required to retrieve 100 rows at a time. The remaining bumps show the time required to fetch 500, 1000, and 5000 rows at a time, respectively. The height of the bumps, in this case, indicates the number of packets transmitted between the server and the client in each of the 10 second time intervals. When the client retrieved 10 rows at a time, only about 50% as many network packets flowed between the server and client as when the client retrieved 50 or 100 rows at a time – this of course increased the time required to transmit all of the requested rows.
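Under the hood, the IO Graph is simply bucketing packet timestamps into tick intervals and counting. A minimal sketch of that grouping, using invented timestamps rather than values from this capture:

```python
from collections import Counter

TICK = 10  # seconds, matching the Tick Interval setting in Wireshark

# Hypothetical packet timestamps, in seconds from the start of the capture
timestamps = [0.5, 3.2, 9.9, 10.1, 12.7, 25.0, 25.1, 25.2]

# Group each timestamp into the 10 second bucket it falls in, then count
packets_per_tick = Counter(int(t // TICK) * TICK for t in timestamps)
for tick_start in sorted(packets_per_tick):
    print(tick_start, packets_per_tick[tick_start])
# 0 3
# 10 2
# 20 3
```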

—

By changing Packets/Tick to Bytes/Tick we are able to determine how much data (including overhead) is actually contained in the packets flowing across the network every 10 seconds. While the bumps in the previous screenshot showed that roughly the same number of packets flowed across the network for the array fetch sizes of 50 and 100, this screenshot shows that the amount of usable data (plus overhead) in the packets was about 20% less – this explains why the bump representing the array fetch size of 50 is about 20% longer than the bump representing the array fetch size of 100.

—

Interesting, but why were there only about half as many packets transferred every 10 seconds when the array fetch size was set to 10 compared to when it was set to 50? Did I hear someone say “wait and see”, or was it “latency”? :-) If we change the Units field to Advanced we are able to find out what is happening. For the Filter of Graph 1 I specified tcp.stream eq 1 (the same filter as was entered automatically when we earlier selected Follow TCP Stream). For the Calc entry I selected Max(*) and then in the next field to the right I specified frame.time_delta_displayed – we saw earlier where this field was obtained in the main Wireshark window. I then set the scale to 100000 so that I was able to see both the peaks and valleys in the graph. The graph shows that there is a maximum latency between packets of about 50ms when the array fetch size was set to 10, and that maximum latency dropped significantly until the array fetch size increased to 1,000. With the array fetch size set to 5,000 the maximum latency is nearly identical to the maximum latency when the array fetch size was set to 10 – at this point the client is essentially choking on all of the data that it is receiving, taking roughly 50ms to process the rows and send the next fetch request to the server.
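The Max(*) calculation per tick interval can be mimicked in a few lines. This sketch uses invented (timestamp, delta) pairs, not values from this capture:

```python
TICK = 10  # seconds, matching the Tick Interval setting

# Hypothetical (timestamp, frame.time_delta) pairs for illustration only
samples = [(1.0, 0.0004), (4.0, 0.050), (9.0, 0.0003),
           (11.0, 0.0002), (15.0, 0.0009)]

# Keep the largest time delta seen within each 10 second bucket
max_delta = {}
for ts, delta in samples:
    bucket = int(ts // TICK) * TICK
    max_delta[bucket] = max(max_delta.get(bucket, 0.0), delta)
print(max_delta)  # {0: 0.05, 10: 0.0009}
```

A spike in the per-bucket maximum, like the 0.05 value here, is exactly the kind of 50ms stall described in the paragraph above.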

—

Neat, what else are we able to do? This time we will turn off the first graph and create two additional graphs. The first of the additional graphs shows the sum of the sizes of the TNS section of the network packets for each of the 10 second time intervals, while the second of the additional graphs shows the maximum data length in the TNS section – how much actual data is stuffed into the network packets (per the configured Oracle SDU, and may actually be broken into multiple packets). A logarithmic scale is selected for the data since the sum of the TNS section lengths will be considerably larger than the maximum data length in any given 10 second time interval. The green line, representing the amount of actual data in the packets, shows that in a best case scenario, when the array fetch size is set to 10 all of the packets are leaving without being full, or more correctly, the SDU size is lower – so, not only are half as many packets transmitted per second, but in the best case each packet also contains less data. The situation improves a bit when the fetch array size increased to 50, and then stabilized at a higher value when the fetch array size was bumped to 100.

—

The previous graph is somewhat useful, yet at the same time somewhat useless. Rather than looking at the maximum of how full the data packets were in each 10 second time interval, it is probably more useful to look at how full the packets were on average. Well, the following graph does not actually show that – it shows the amount of data Oracle is packaging before transmission, as controlled by the session data unit (SDU) size, which typically has a maximum default value of 8KB. The following modified version of the previous chart shows how the average SDU size of the packets changed as the array fetch size increased. The effective average SDU size doubled as the array fetch size increased from 100 to 500 – does that mean that the transfer completed twice as fast? No, as shown in the previous graphs, but it might have made a greater difference with a gigabit wired network.

—

Wireshark also provides the option of controlling the graph type. In the following screenshot I added the tcp.reassembled.length as Graph 4, changed Graph 2 to an Impulse style (vertical line) and changed Graph 3 to FBar style (vertical rectangle). Notice that the tcp.reassembled.length values are greater when the array fetch size is 100 than when it is at any other value. I believe that the reason why this is the case is because on average there was a greater percentage of “overhead” bytes in the TNS section of the packets when array fetching 100 rows at a time when compared to array fetching, for example, 500 rows at a time.

—

With the above in mind, what is the ideal array fetch size for this SQL statement? Stop playing with your data packets and answer the question. :-)

——–

Added May 13, 2010:

First 4 packets retrieving rows after the initial single row fetch, fetch array size set to 10, response to 3 fetch calls (click to zoom):

First 4 packets retrieving rows after the initial single row fetch, fetch array size set to 5,000, response to 1 fetch call (click to zoom):

Hi Charles,
When I look at the graphs I would choose an ArraySize of 500. In fact, the values less than 500 have greater response times (for both bytes and packets), and above 500 the values deviate less noticeably.
As for the latency, I note that the latency is quite long for ArraySize 10 and 5000, while it is lower for ArraySize 50, 100, 500 and 1000
(although I note that the latency time also increases as the ArraySize increases).

So in conclusion, my answer is ArraySize = 500.

But I wanted to make a couple of considerations:
the first, I repeated the test while monitoring the elapsed times. The elapsed times decrease
as the ArraySize increases. But if the response time is lower, that does not mean I believe it is the most efficient, right?

I could imagine “always set ArraySize to 5000 and my problems are solved.” I believe
that does not work well, because it certainly increases memory usage, and then I consider the latency, which is my second consideration.

I think we must take into consideration the SDU, which has a default value of 8K.
In the tests I conducted on my database, each row is 109 bytes, so the
SDU is filled with about 8192/109 = 75 rows (i.e. the 8K SDU divided by the row length). If the values from your tests are not very different from mine, this should explain why the lowest latency occurs for ArraySize values between 50 and 100, or am I wrong….

Thank you for posting your analysis. Based on the graphs that were provided, I agree with you that a fetch array size of 500 seems to be ideal.

I think that there is a problem with the test, at least with the design of the data for the test. Each row retrieved was intended to be roughly 875 bytes long (7*8 bytes for column C1 + 100 * 8 bytes for column C2 + overhead). Your observation that each line/row is about 109 bytes disagrees with the intended row length. So, what is happening?

I must admit that I did not catch this earlier. Notice in the third picture (networkmonitoring5-3.jpg) that we see in the packet detail at the bottom of the picture a series of many letter A’s immediately after a number – this is the first row of the fetch. Compare that picture with the fourth picture (networkmonitoring5-4.jpg) which shows one of the later packets in the fetch array size 5000 test. The fourth picture does not show the sequence of the letter A’s – it shows a single letter A, but that is actually part of column C1. Strange… until you read Jonathan Lewis’ recent blog article on the topic of SQL*Net compression. With Jonathan’s post in mind, it now becomes a bit more clear why there was such a significant decrease in the time to fetch 50 rows at a time compared to 10 rows at a time – I would need to double-check the packets, but it looks like 49 of every 50 rows (98%) in each fetch could be compressed when the fetch array size was set to 50, while only 9 of every 10 rows (90%) in each fetch could be compressed when the fetch array size was set to 10. That does not sound like much of a difference until you compare the uncompressed byte size with the compressed byte size. The larger fetch array size also should have reduced the number of partially filled network packets leaving the server.
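The byte arithmetic behind that comparison can be sketched as follows. Note that this models the hypothesis stated above (that only the first row of each fetch is sent uncompressed), and the 875-byte and 86-byte per-row figures are the approximate sizes discussed in this article, not exact measurements:

```python
# Approximate per-row sizes from the article (estimates, not measurements)
FULL = 875        # bytes for a fully transmitted row
COMPRESSED = 86   # bytes for a SQL*Net-compressed row

def fetch_bytes(array_size):
    """Hypothesized bytes per fetch if only the first row of each
    fetch were transmitted in full."""
    return FULL + (array_size - 1) * COMPRESSED

# Average bytes per row under this hypothesis, for the two fetch sizes
print(fetch_bytes(10) / 10)    # 164.9
print(fetch_bytes(50) / 50)    # 101.78
```

Under this model a fetch size of 50 would move roughly 60% as many bytes per row as a fetch size of 10, which is the kind of difference the paragraph above is reasoning about.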

In short, a smaller fetch array size, such as 100, may be ideal with a different data set that contains less frequently repeated data (thus less compression takes place).

—
Edit May 9, 2010: after reviewing the Wireshark packet capture, the fetch array size did not affect the SQL*Net compression – only the first row in the result set was transmitted in full, with the rest of the rows taking advantage of SQL*Net compression.

For example, if I take the last case (ArraySize = 5000), I get 85925641/999001, or about 86 bytes transmitted per row. But I know I should have 9 * 8 = 72 bytes for column c1 + 100 * 8 = 800 bytes for column c2, a total of 872 bytes per row. In conclusion I think it takes advantage of SQL*Net compression, as discussed by J. Lewis in the link you highlighted.

But what does this mean? That the SDU transmits up to 8192/86, or about 95 rows? (SDU size (8192) / 86 bytes transmitted per row)
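The commenter's two calculations can be reproduced directly from the figures they quote:

```python
total_bytes = 85925641  # bytes observed for the ArraySize = 5000 case
rows = 999001           # rows fetched by the test script

# Average bytes transmitted per row
bytes_per_row = total_bytes / rows
print(round(bytes_per_row))  # 86

# Approximate rows fitting in one default-sized SDU buffer
SDU = 8192
print(SDU // round(bytes_per_row))  # 95
```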

Sorry for the delay. There is a very long “Network Monitoring Experimentations 6” blog article that is planned for Friday. I don’t want to give away any of the details about that article just yet, but hopefully it will be useful/helpful.

I did not have a chance to look at a network packet trace yesterday – the packet trace that I captured for tomorrow’s blog article was over 2GB in size, and that caused the 32 bit client to become unstable when loading the trace file at the end of the capture (the entire trace file was lost in the process).

I think that it might be important to mention that the first row transmitted by the server contains values for all columns. It appears that starting with the second row the server starts transmitting only those columns that change in value from the previous row. That means that in this test case fewer rows fit into the first 8192 byte SDU unit compared to the second 8192 byte SDU unit, or for that matter the first network packet compared to the second network packet.

Last night I added 2 more screen captures to this blog article that show the result of the client’s second fetch call, which is the first fetch call that utilizes the specified arraysize. It might be important to mention here that, by default when transferring TCP traffic, the sender sends 2 network packets and then the receiver sends an ACK packet to acknowledge that the previous 2 network packets were received.

The first screen capture shows the first 4 network packets that were received by the client once the specified arraysize of 10 took effect. In the picture, you can see the first row ends at byte 03d2 (hex), the next row ends at byte 0428 (hex), which implies that each row after the first row requires about 86 bytes in the packet. With the array fetch size of 10, the first network packet is, for all intents and purposes, fully used. The second packet contains just 0161 (hex) bytes ~ 353 bytes, compared to the first packet which contained 05F0 (hex) bytes ~ 1520 bytes. The second and third fetch requests with the specified arraysize resulted in packets that were only about 66% filled – 03F2 (hex) bytes ~ 1010 bytes.

Compare the above with the second screen capture that shows the first 4 network packets that were received by the client once the specified arraysize of 5,000 took effect. In the picture, you can see the first row ends at byte 03d2 (hex), the next row ends at byte 0428 (hex), which implies that each row after the first row requires about 86 bytes in the packet. With the array fetch size of 5000, the first network packet is, for all intents and purposes, fully used (1520 bytes). The second, third, and fourth packets are also fully used (1520 bytes).

By the end of the fourth packet, the array fetch size of:
* 10 just finishes transferring row A0001085, with a total time delta of 0.003852 seconds.
* 5,000 was just starting to transfer row A0001114, with a total time delta of 0.001436.
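The per-row size and packet payloads quoted above can be checked directly from the hex offsets mentioned in the two screen captures:

```python
# Hex offsets taken from the packet details described above
row1_end = 0x03D2   # end of the first (fully transmitted) row
row2_end = 0x0428   # end of the second row

print(row2_end - row1_end)  # 86 bytes for the second (compressed) row
print(0x05F0)               # 1520 - payload of a full network packet
print(0x0161)               # 353 - payload of the short second packet
print(0x03F2)               # 1010 - payload of a ~66% filled packet
```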

So, after the first row (which is transmitted in full), subsequent transmissions could take advantage of SQL*Net compression.

An observation:
if, for example, the transmitted rows have all distinct values (i.e. no SQL*Net compression), and assuming that the average length of a row is 400 bytes, I assume that the SDU can transmit at most 8192/400 = about 20 rows. If this is true, I expect that an ArraySize value greater than 20 brings no additional benefit. Is this consideration correct? Or are there other factors to consider?

Good question – that is the reason why I created the follow-up blog article (click the “Forward to the Next Post in the Series” link at the top of this blog article).

In the follow-up article, table T6 was created with unique (distinct) values in each of the rows – thus there was less chance (maybe no chance) for SQL*Net compression. There is a summary table near the bottom of the follow-up article that compares the transfer time of tables T5 and T6 using 6 different computer configurations and various array fetch sizes.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
  2    SYSDATE TODAY
  3  FROM
  4    DUAL;

TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
  SYSDATE TODAY
FROM
  DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
  2    SYSDATE TODAY
  3  FROM
  4    DUAL;

********************
Greater than and less than signs in code sections are often interpreted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: