rick@nyit.UUCP (Rick Ace) (02/26/86)
[KCN (a little cyanide for the line-eater)]

Consider an application consisting of a client and a server process connected via TCP sockets. The client sends a request to the server for some text, and the server sends the text over the socket via write() calls of some arbitrary size. Following the text are more bytes (call them "control information") sent by the server, which must be distinguished from the text. The client has no a priori knowledge of the exact number of bytes of text coming down the tube from the server; therefore, the client-server protocol dictates that the server shall send one out-of-band character following the text to alert the client that the end of the text has occurred. The server will then send some bytes of control information following the OOB character.

The essence of this scenario is that the receiver (the client) must be able to determine the position of the OOB mark within the in-band stream coming from the sender; the application demands this because the client must know where the text ends and the control information begins.

In practice, this application worked fine most of the time; however, sometimes it would just hang. A little analysis showed that when it hung, the client had missed the OOB mark. The logic in the client went like this:

    1. Establish connection with server.
    2. Request server to begin transfer of text.
    3. If (ioctl_SIOCATMARK_says_I_am_at_the_mark) goto 7.
    4. Read text with read() call.
    5. Process text that was just read.
    6. Goto 3.
    7. Read control information with read() call.
    ...and so on.

I thought, gee, maybe there's a race and you must catch SIGURG to avoid it. So, I adapted the logic from oob() in rlogin.c to the application. No help. After a little head-scratching, it seemed that there was no way to avoid this race:

Assume we're in the above loop at step 3, and that the last read() from the client has gobbled up all the in-band data in the socket.
The client executes step 3 and observes the OOB data has not arrived. The client issues its read() call, which does a CHMK and enters the kernel at IPL 0. The OOB mark arrives at the client's host's Ethernet interface, followed immediately by more in-band bytes. The hardware interrupt suspends the user in the early stage of read() before it has had a chance to enter the socket logic. tcp_input() notes that the socket (which at that point contained no unread in-band bytes) is now SS_RCVATMARK and duly notifies the user with a SIGURG. Then, the in-band bytes following the OOB mark are dumped into the socket's so_rcv queue. The network interrupt is dismissed and control returns to the client process, which then enters soreceive(). (Note that the client has not had the opportunity to "see" its SIGURG yet because it hasn't emerged from the kernel.) The "do" loop in soreceive() now clears SS_RCVATMARK! This obliterates any knowledge in the client's socket database of the position of the out-of-band mark.

Although the client will receive its SIGURG upon returning from the read() syscall, it is not equipped with enough information to pinpoint the position of the OOB mark with respect to the bytes it just got from the read() call. (That is, because soreceive() cleared SS_RCVATMARK, the client's SIOCATMARK ioctl call will never yield a "true" value in this scenario. Since the client is totally dependent upon the result of the ioctl for its orientation, it has lost its orientation.)

(Note: I am aware of the constraint that the stream socket abstraction does not guarantee the support of multiple out-of-band characters. This application sends exactly one out-of-band character, and therefore does not violate that constraint.)

In the final implementation of the application, I abandoned out-of-band data entirely, and the intermittent failures disappeared.
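
The post does not say what replaced the out-of-band byte, so the following is only an illustration of one way the end of the text could be marked entirely in-band: each piece of text is preceded by a 2-byte length, and a zero length means "end of text; control information follows". The helper names (write_all, read_all, send_chunk, send_end_of_text, recv_chunk) and the header layout are hypothetical and are not taken from the original application.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>   /* htons(), ntohs() */

/* Write exactly len bytes, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes, retrying on short reads. Returns 0 or -1. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;              /* error or premature end of file */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sender: one chunk = 2-byte big-endian length, then that many bytes. */
static int send_chunk(int fd, const void *text, uint16_t len)
{
    uint16_t hdr = htons(len);
    if (write_all(fd, &hdr, sizeof hdr) < 0)
        return -1;
    return write_all(fd, text, len);
}

/* Sender: a zero-length chunk marks the end of the text (assumed convention). */
static int send_end_of_text(int fd)
{
    return send_chunk(fd, "", 0);
}

/*
 * Receiver: read the next chunk into buf (capacity cap).
 * Returns the payload length, 0 at end of text, or -1 on error.
 */
static long recv_chunk(int fd, void *buf, size_t cap)
{
    uint16_t hdr;
    size_t len;

    if (read_all(fd, &hdr, sizeof hdr) < 0)
        return -1;
    len = ntohs(hdr);
    if (len > cap)
        return -1;                  /* chunk too large for caller's buffer */
    if (read_all(fd, buf, len) < 0)
        return -1;
    return (long)len;
}
```

With a scheme like this the client would call recv_chunk() in a loop until it returns 0 and then read the control information directly, so its position in the stream would never depend on SS_RCVATMARK or on the timing of SIGURG.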

The 4.2bsd Interprocess Communication Primer (Leffler, Fabry, Joy) illustrates a similar application of out-of-band processing in a code fragment called oob() in Section 5 (ADVANCED TOPICS). It would appear that this fragment is also susceptible to failure due to the race described above. Similar code appears in rlogin.c.

To summarize: although it is likely that a process receiving IPC data from a socket can pinpoint the position of an out-of-band data byte with respect to the bytes in the in-band stream, it is not guaranteed that this is always possible.

-----
Rick Ace
Computer Graphics Laboratory
New York Institute of Technology
Old Westbury, NY 11568
(516) 686-7644
{decvax,seismo}!philabs!nyit!rick