Item RTA000086732

Last updated . . . 01/17/97
Bad performance with a sockets application


We are trying to use the CICS sockets interface. While doing some
performance testing against an OEM system we could see, through the use
of a sniffer, that our TCP/IP only sends a frame after receiving an ACK
to the previous frame. I think that this is not in accordance with the
sliding window mechanism, with which I should be able to send more
data without waiting for a confirmation, provided that the window is
not full.

We haven't got the trace in softcopy, but I'll transcribe a typical
sequence. Note this has nothing to do with slow start, as our connection
is established only once and the delay is always observed.

Trace:
Time    OEM -----------------------------------------------  IBM
                                    .
                                    .
                                    .
                                    .

0       ->  SEQ 026920DB / ACK 2D34A4FD / WINDOW 4000
+3 ms   <-  SEQ 2D34A4FD / ACK 026920DB / PUSH (108 bytes) / WINDOW 64000
  More data is now pending to be sent from the IBM side (the
  application has issued another socket send).
+373 ms ->  SEQ 026920DB / ACK 2D34A569 / WINDOW 4000
+3 ms   <-  SEQ 2D34A569 / ACK 026920DB / PUSH (108 bytes) / WINDOW 64000
This last frame was held until the ACK from the OEM system was received.

When we do a SEND, TCP/IP in MVS automatically sets PUSH, and we
have no way to make it wait for more data so that the data can be
sent together more efficiently.

Can you comment on these two aspects?

We did some testing with FTP (not a sockets application) and it is
behaving correctly by not waiting for an ACK if the window is not
closed.


ANSWER

First the delay of acknowledgements:

In MVS TCP/IP V3R1 you can specify the DELAYACKS parameter either on a
PORT statement or on a GATEWAYS entry in the PROFILE.TCPIP data set.
This prevents MVS TCP/IP from sending an acknowledgment at once.
Instead, it waits for a short amount of time to see if it can
'piggyback' the ACK bit on an outgoing TCP segment.

Now for the sliding window:

FTP is a socket application like any other socket application.  So from
a TCP protocol layer point of view, there should not be any difference
between what you see for your own application and for FTP.  An
application cannot use the socket API to order the TCP protocol layer
to set the PUSH bit.  It is entirely up to the TCP protocol layer to
decide whether to set the PUSH bit.  Normally the PUSH bit is set if
the segment being sent empties the TCP send buffer, and it often does,
as data is usually sent as soon as it is written.

It seems that you are familiar with the intricate details of TCP, but
could what you see be the result of the slow-start mechanism?  The
first segment on a new connection will always be sent as a single
segment, and any succeeding segments will not be sent until this first
segment has been acknowledged.  TCP/IP will then increase the so-called
congestion window and send two segments before waiting for a new
acknowledgment, and so on, exponentially increasing the congestion
window.  The sender can send up to the minimum of the congestion
window and the advertised window.
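
The growth described above can be illustrated with a small Python
sketch.  It is only a simplified model, not the MVS implementation:
the function names are made up for this example, the segment size
(MSS) of 536 bytes is an assumed value, and the congestion window is
simply doubled once per round trip.  The advertised window of 4000
bytes is the value the OEM system shows in the trace.

```python
from typing import List


def usable_window(cwnd_segments: int, advertised_window: int, mss: int) -> int:
    """Bytes the sender may have outstanding: min(congestion window, advertised window)."""
    return min(cwnd_segments * mss, advertised_window)


def slow_start_rounds(advertised_window: int, mss: int, rounds: int) -> List[int]:
    """Usable window for each round trip while the congestion window doubles."""
    cwnd = 1  # a new connection starts with a single segment
    result = []
    for _ in range(rounds):
        result.append(usable_window(cwnd, advertised_window, mss))
        cwnd *= 2  # simplified model: one doubling per acknowledged round
    return result


if __name__ == "__main__":
    # mss=536 is an assumption; 4000 is the OEM window from the trace.
    print(slow_start_rounds(advertised_window=4000, mss=536, rounds=5))
    # prints [536, 1072, 2144, 4000, 4000]
```

After three round trips the advertised window, not the congestion
window, becomes the limit in this model.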

Two concepts matter when you deal with TCP performance: the
Silly-Window-Syndrome (SWS), and the NAGLE algorithm, which deals with
the SWS.

The silly window syndrome occurs when small amounts of data, instead
of full size segments, are transmitted over a network.  It is normally
caused by a sender who transmits small segments and a receiver who
advertises a small window.  The NAGLE algorithm prevents a sending TCP
from entering this syndrome.

See W. Richard Stevens' books (TCP/IP Illustrated volumes 1 and 2) for
more details on these concepts - in particular Volume 2 chapter 26.

Based on your trace entries, here is a detailed explanation of
what goes on in the MVS TCP protocol layer:

    OEM                                MVS
    -------------------------------------------------------------------
        -ACK--S(20DB) A(A4FD) W(4000) -->
                            - Connection enters idle state, no
                              ACKS are expected by MVS
                            - Application writes 108 bytes to API
                            - In idle state and transmit will empty
                              send buffer: Transmit short segment
                              and set PSH bit
      <--S(A4FD) A(20DB) W(64000) PSH L(108) --- TCP transmits
                            - Connection is in not-idle state,
                              an ACK is expected
                            - Application writes another 108 bytes to
                              API
                            - Not in idle state and NAGLE
                              algorithm is not disabled: Suspend
                              transmission until an ACK is received
                              or API fills sufficient data into send
                              buffer to fill a segment or
                              to create a segment that is half the size
                              of the current advertised window size
                              (in your example: 2000 bytes)

                             - Waiting -

        -ACK--S(20DB) A(A569) W(4000) -->
                            - An ACK was received and whatever data
                              is residing in the TCP send buffer will
                              be transmitted.  As this write will empty
                              the TCP send buffer again, the PSH
                              bit is set.
      <--S(A569) A(20DB) W(64000) PSH L(108) --- TCP transmits
                            - Connection is in not-idle state,
                              an ACK is expected
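
The send decision walked through above can be sketched in Python as
follows.  This is a minimal model of the three conditions named in the
explanation (idle connection, a full segment, or half the advertised
window), not the actual MVS code.  The function name and the segment
size of 536 bytes are assumptions made for the example.

```python
def nagle_may_send(idle: bool, buffered: int, mss: int, peer_window: int,
                   nagle_enabled: bool = True) -> bool:
    """Return True if the data in the send buffer may be transmitted now."""
    if buffered == 0:
        return False  # nothing to send
    if not nagle_enabled or idle:
        return True   # no ACK outstanding (or NAGLE disabled): send at once
    if buffered >= mss:
        return True   # enough data to fill a segment
    # enough data for a segment of half the advertised window
    return buffered >= peer_window // 2


if __name__ == "__main__":
    MSS = 536          # assumed segment size, not taken from the trace
    OEM_WINDOW = 4000  # window advertised by the OEM system in the trace

    # First write of 108 bytes: connection is idle, segment goes out.
    print(nagle_may_send(True, 108, MSS, OEM_WINDOW))   # prints True
    # Second write of 108 bytes: an ACK is outstanding, data is held.
    print(nagle_may_send(False, 108, MSS, OEM_WINDOW))  # prints False
    # ACK arrives: connection is idle again, held data goes out.
    print(nagle_may_send(True, 108, MSS, OEM_WINDOW))   # prints True
```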

If performance is of great concern, you have the following options:

1. Do not delay ACKs on the OEM system.  This may cause some
   extra TCP segments to cross the network, but in your situation
   it will probably improve response times: instead of delaying
   its ACK for 200 msec, the OEM system sends it immediately, which
   forces MVS to send the pending data in its TCP send buffer.
   If you are able to customize the delayed-ACK timer on the OEM
   system, you might be able to set it to, for example, 25 msec
   instead of the normal default of 200 msec.
   From a performance point of view, this is a better solution than
   having the OEM application send a one-byte segment.  The ACK
   is sent from the OEM TCP protocol layer as soon as the MVS
   data has arrived, and does not wait for the 108 bytes to be
   passed up to the OEM application.  By default MVS does
   not delay ACKs.  You can force MVS to delay ACKs at the link level
   or at the PORT level, as described at the start of this answer.

2. Do as you already did: let the OEM application respond with a
   one-byte buffer as soon as it has received the 108 bytes from your
   CICS application.  When the OEM application writes this one-byte
   buffer to the OEM TCP layer, the OEM TCP layer is in an idle
   state (it does not expect any ACKs from MVS), and will write the
   one-byte segment immediately, piggybacking the pending ACK to
   MVS on this one-byte segment.

3. Use UDP instead of TCP.  UDP datagrams are sent immediately,
   so from a performance point of view UDP may be desirable, but
   you have to consider carefully whether UDP will be sufficient
   for your reliability requirements.
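
To get a rough feeling for what option 1 could mean, the cycle seen in
the trace can be turned into a throughput figure.  The Python sketch
below is plain arithmetic on the trace values (108 bytes, 373 msec wait
for the ACK, 3 msec until the next segment leaves MVS).  The second
calculation assumes that the whole wait could be cut to 25 msec, which
ignores network latency and is therefore optimistic.  The function name
is made up for this example.

```python
def bytes_per_second(payload_bytes: int, ack_wait_ms: float,
                     send_turnaround_ms: float) -> float:
    """Throughput when one segment is sent per ACK wait plus send turnaround."""
    cycle_seconds = (ack_wait_ms + send_turnaround_ms) / 1000.0
    return payload_bytes / cycle_seconds


if __name__ == "__main__":
    # Values from the trace: 108 bytes every 373 + 3 msec.
    observed = bytes_per_second(108, 373, 3)
    # Assumption: the wait for the ACK shrinks to 25 msec.
    tuned = bytes_per_second(108, 25, 3)
    print(round(observed, 1))  # prints 287.2
    print(round(tuned, 1))     # prints 3857.1
```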


There is a socket option called TCP_NODELAY.  In MVS it is only
honored if your network interface is via an offload box (3172-3 in
offload mode).  For other network interfaces it is ignored.
TCP_NODELAY disables the NAGLE algorithm for the connection it
applies to.
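
For reference, this is how the option is typically set through a
BSD-style sockets API, shown here in Python rather than with the CICS
sockets interface.  It is only a sketch of the general call; the
function name is made up for this example, and as stated above MVS
honors the option only when an offload box is used.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with the NAGLE algorithm disabled (TCP_NODELAY)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value turns TCP_NODELAY on for this socket only.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    # Read the option back to confirm that the stack accepted it.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```

The option applies per socket, so it has to be set on every connection
that should bypass the NAGLE algorithm.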

S e a r c h - k e y w o r d s:
NAGLE SWS PERFORMANCE TCP CICS SOCKET UDP SLOWSTART SLOW START