hans@uunet.uu.net (Hans Buurman) (02/08/89)
In article <615@dutrun.UUCP> I write:
(program running under 3.5 did not run under 4.0 due to function select()
in both the program and libsunview.a)

>Could somebody out there
>a) tell me which rules the linker uses ?
>b) guess where Sun screwed up ?

As several people have pointed out this was not an unreasonable behaviour
from the compiler. My apologies to Sun for suggesting this. I am left
wondering why the program used to work in the first place.....

Hans

Disclaimer: any opinions above are my own.

Hans Buurman                   | hans@duttnph.UUCP
Pattern Recognition Group      | mcvax!hp4nl!dutrun!duttnph!hans
Faculty of Applied Physics     | tel. 31 - (0) 15 - 78 46 94
Delft University of Technology |
mac@mrk.ardent.com (Michael McNamara) (02/08/89)
In article <615@dutrun.UUCP> mcvax!duttnph!hans@uunet.uu.net (Hans Buurman) writes:
>One user on a Sun 3/60 that has recently been upgraded to SunOS 4.0 was
>complaining about a function dumping core that should not have been used
>at all. It turned out that he had a function called select() in his
>program....

>Could somebody out there
>a) tell me which rules the linker uses ?

There is this neat facility on UNIX. :-) It's called man. :-) If you
want to know how something works, you type man. :-) Try man ld. :-)

The program works as coded. From your description, it sounds like his
compile command was

    cc -o myprog myprog1.o myprog2.o -lsunwindow

This is automatically expanded to:

    cc -o myprog myprog1.o myprog2.o -lsunwindow -lc

One of the linker's jobs is to link up references to external routines.
Its rules are well defined. Look in the manual for ld. This behaviour
is unchanged since early unix days:

    ... If a named file is a library, it is searched exactly once at
    the point it is encountered in the argument list.  Only those
    routines defining an unresolved external reference are loaded...
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    You've already supplied select(), hence your version is used.

As your fictitious programmer supplied a routine called select(), and
linked it ahead of the libraries, his select() would be used instead of
any select defined in any library. Although he didn't call select, one
of the library routines did, and got his instead of the one in libc.a.

This is all as it should be, and allows one to quietly overload routine
names in any library. Note that if select() was multiply defined in
object files, a warning would be posted.

Now the ability to quietly override library routines is of somewhat
questionable utility, and violates the principle of least surprise;
although I have used this feature on occasion. I used it once to attach
statistics collecting to fopen/fclose. I supplied my own fopen/fclose,
which jot down statistics, then call openf/closef.

Then I extracted the system's versions of fopen/fclose via "ar x libc.a"
and used emacs to change the names of the procedure definitions in the
.o from fopen/fclose to openf/closef (being careful not to change calls
to fopen/fclose lurking in the library). Then when I link, everything
calls my fopen, which calls openf, from my doctored .o's extracted from
libc.a, and the extra fopen definition (from libc.a) is not used.

>b) guess where Sun screwed up ?

Sorry. Not Sun's mistake. Program works as coded.

Sun (and every other Unix vendor) might want to change ld so that it
issues a warning message when a procedure supplied in a library is
already defined from some earlier object or library...

[[ That would be nice. I have often been tempted to name a function
"wait" for whatever reason, and I'm sure that naive users have made
similar mistakes. --wnl ]]

Michael McNamara
mac@ardent.com
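
The rule quoted from the ld manual can be sketched as a toy model in C. This is only an illustration of the "load a library member only if it defines a still-unresolved reference" behaviour, not how ld is implemented. The member name panel.o, the symbol panel_create, and every struct and function name below are hypothetical; only myprog1.o, ndet_select.o and select come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16

/* One object file: the symbols it defines and the ones it references. */
struct object {
    const char *name;
    const char *defines[4];      /* NULL-terminated */
    const char *references[4];   /* NULL-terminated */
};

/* Link state: for every symbol seen so far, which object defines it. */
struct symtab {
    const char *sym[MAX_SYMS];
    const char *definer[MAX_SYMS];   /* NULL while still unresolved */
    int count;
};

/* Find a symbol, adding it as unresolved if it has not been seen yet. */
static int lookup(struct symtab *t, const char *sym)
{
    int i;
    for (i = 0; i < t->count; i++)
        if (strcmp(t->sym[i], sym) == 0)
            return i;
    assert(t->count < MAX_SYMS);
    t->sym[t->count] = sym;
    t->definer[t->count] = NULL;
    return t->count++;
}

/* Load an object; in this toy model the first definition seen is kept. */
static void load(struct symtab *t, const struct object *o)
{
    int i, s;
    for (i = 0; o->defines[i] != NULL; i++) {
        s = lookup(t, o->defines[i]);
        if (t->definer[s] == NULL)
            t->definer[s] = o->name;
    }
    for (i = 0; o->references[i] != NULL; i++)
        lookup(t, o->references[i]);
}

/* Does this library member define a symbol that is still unresolved? */
static int satisfies(const struct symtab *t, const struct object *o)
{
    int i, s;
    for (i = 0; o->defines[i] != NULL; i++)
        for (s = 0; s < t->count; s++)
            if (t->definer[s] == NULL && strcmp(t->sym[s], o->defines[i]) == 0)
                return 1;
    return 0;
}

/* Search a library: pull in only the members that resolve something. */
static void search_archive(struct symtab *t, const struct object *m,
                           int n, int *loaded)
{
    int i, progress = 1;
    while (progress) {
        progress = 0;
        for (i = 0; i < n; i++)
            if (!loaded[i] && satisfies(t, &m[i])) {
                load(t, &m[i]);
                loaded[i] = 1;
                progress = 1;
            }
    }
}

/* The thread's scenario: which object ends up supplying select()? */
const char *demo_select_definer(void)
{
    static const struct object prog =
        { "myprog1.o", { "main", "select", NULL }, { "panel_create", NULL } };
    static const struct object lib[2] = {
        { "ndet_select.o", { "select", NULL }, { NULL } },
        { "panel.o", { "panel_create", NULL }, { "select", NULL } },
    };
    int loaded[2] = { 0, 0 };
    struct symtab t;

    t.count = 0;
    load(&t, &prog);                     /* objects named on the command line */
    search_archive(&t, lib, 2, loaded);  /* then the library, searched once */
    return t.definer[lookup(&t, "select")];
}
```

Calling demo_select_definer() from a main() returns "myprog1.o" in this model: the library member that defines select is never loaded, because select was already resolved by the time the library was searched.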
gandalf@csli.stanford.edu (Juergen Wagner) (02/08/89)
[I am sending the reply to the entire list because I think there's been
some confusion about the new select(2) syntax.]

In article <615@dutrun.UUCP> mcvax!duttnph!hans@uunet.uu.net (Hans Buurman) writes:
>...
>One user on a Sun 3/60 that has recently been upgraded to SunOS 4.0 was
>complaining about a function dumping core that should not have been used
>at all. It turned out that he had a function called select() in his
>program.

Hmm... There is also a system call select(2) in SunOS4.0. This isn't
particularly notable but the syntax has changed from 3.x to 4.x. Unless
done deliberately, one shouldn't use names of system calls for one's own
functions.

>One of the SunView routines he called (to initialize a panel) also used a
>function called select(), which is in ndet_select.o in libsunwindow.a. It
>looks like the compiler had linked the sunwindow calls to select() to his
>own program.

    /*
     * Ndet_select.c - Notifier's version of select....

And if you look at the declaration:

    extern int
    select(nfds, readfds, writefds, exceptfds, timeout)
        register int nfds;
        fd_set *readfds, *writefds, *exceptfds;
        struct timeval *timeout;

What the linker did was correct. It used the libsunwindow.a version of
select because this library was mentioned first on the cc line:
something like

    cc -o foo foo.c -lsunwindow -lpixrect

Even if you used the standard version of select, the core dump should
still be there. Consult your man page for select(2) for the changed
syntax.

--
Juergen Wagner
gandalf@csli.stanford.edu
wagner@arisia.xerox.com

[[ On this topic, I have noticed that the old select kernel call was
retained for backward compatibility. A program that uses "select" and
that is compiled and linked on a 3.x machine will, under most
circumstances, still work correctly under 4.x (because it's using the
old select kernel call). But before you can compile it on a 4.x
machine, you *must* make some changes. Read the new manual page for
select(2) to see how it must be changed.

This backward compatibility move was likely documented, but I'm too busy
(or is that "lazy") to go look it up right now. To be more specific:
the new select can handle file descriptor masks longer than 4 bytes
(thus it can handle file descriptors >= 32). The old one assumed that
the mask was 4 bytes. You can use old executables provided that you
never try to "select" on a fd >= 32. --wnl ]]
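
For readers wondering what a call in the fd_set style looks like, here is a minimal sketch. It assumes a present-day POSIX system (hence <sys/select.h>, which is not taken from the SunOS 4.0 manual), and the helper names wait_readable and demo_pipe are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_sec seconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, long timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&readfds);          /* start from an empty set */
    FD_SET(fd, &readfds);       /* mark the one descriptor of interest */
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* The width is highest descriptor + 1, not getdtablesize(). */
    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}

/*
 * Hypothetical self-check: poll the read end of a fresh pipe,
 * optionally after writing one byte into the write end.
 */
int demo_pipe(int write_first)
{
    int p[2];
    int ready;

    if (pipe(p) != 0)
        return -1;
    if (write_first && write(p[1], "x", 1) != 1)
        ready = -1;
    else
        ready = wait_readable(p[0], 0);   /* zero timeout: just poll */
    close(p[0]);
    close(p[1]);
    return ready;
}
```

The point of the sketch is that the caller never touches the mask as an int: the set is built with FD_ZERO and FD_SET, and the width passed to select is derived from the descriptor actually in use.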
guy@uunet.uu.net (Guy Harris) (02/22/89)
>[[ On this topic, I have noticed that the old select kernel call was
>retained for backward compatibility.

No, it wasn't. "select" in 3.x was system call number 93, and "select"
in 4.0 is system call number 93. There is no "old select kernel call"
in 4.0.

>A program that uses "select" and that is compiled and linked on a 3.x
>machine will, under most circumstances, still work correctly under 4.x
>(because it's using the old select kernel call).

A program that uses "select" "properly" and that is compiled and linked
on a 3.x machine will, assuming no other binary compatibility problems
occur, still work correctly under 4.x under *any* circumstances (modulo
bugs in 4.x).

A program that uses "select" "improperly" - i.e., passes in pointers to
"int"s rather than "fd_set"s, but passes the result of "getdtablesize()"
as the first argument under the assumption that it will always return a
number less than or equal to the number of bits in an "int" - is quite
likely to fail under 4.x.

Replace "3.x" with "4.2BSD", and "4.x" with "4.3BSD", and the above
statements are pretty much correct; the change to "select" from 3.x to
4.0 is a change from the 4.2BSD version to the 4.3BSD version.

I don't know whether the 4.2BSD or SunOS 3.x documentation made it clear
that it was a bad idea to use "getdtablesize" - or, at least, to use it
without cutting its value off at 32 - because the system might be
changed to support more than (# of bits per "int") file descriptors, or
not; it may well have *encouraged* the use of "getdtablesize()", which
is unfortunate.

>But before you can compile it on a 4.x machine, you *must* make some
>changes.

Only if you've been using "select" "improperly", as indicated.

[[ I realized that there didn't need to be two separate kernel calls
just the other day while tracking down yet another select-related bug.
When I saw the described behavior (a 3.x executable that uses select
still working under 4.0) I assumed that they had just retained the old
call.

After delving into things deeper, I know *exactly* why things work the
way they do. An fd_set is an array of longs holding the bitmask. The
first long corresponds to the *lowest* numbered file descriptors.
Therefore, if you never use a "width" greater than 31, your program
stands a very good chance of working under 3.x (also 4.2BSD) as well as
4.x (also 4.3BSD), because the kernel will never need anything beyond
the first long. Neat, huh?

As for using select "properly" under 3.x, please tell me where in the
3.x documentation it is stated that one must use a fd_set pointer in a
select call. The manual page sure doesn't say anything about it. It
also encouraged the use of "getdtablesize", unfortunately. Although
fd_set was defined in <sys/types.h>, none of the macros associated with
it were defined anywhere (much less documented). This all made it
rather hard to use select "properly" under 3.x. --wnl ]]
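
The layout described in the note above can be illustrated with a toy model. This is a sketch, not the system's fd_set: the type toy_fd_set, its 256-descriptor capacity, and all function names are hypothetical, and a 32-bit word stands in for the long of the machines discussed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SETSIZE   256   /* hypothetical capacity, in descriptors */
#define TOY_WORD_BITS 32    /* stands in for a 32-bit long */
#define TOY_WORDS     (TOY_SETSIZE / TOY_WORD_BITS)

typedef struct {
    uint32_t bits[TOY_WORDS];   /* bits[0] holds descriptors 0..31 */
} toy_fd_set;

void toy_zero(toy_fd_set *s)
{
    memset(s, 0, sizeof *s);
}

void toy_set(toy_fd_set *s, int fd)
{
    assert(fd >= 0 && fd < TOY_SETSIZE);
    s->bits[fd / TOY_WORD_BITS] |= (uint32_t)1 << (fd % TOY_WORD_BITS);
}

int toy_isset(const toy_fd_set *s, int fd)
{
    assert(fd >= 0 && fd < TOY_SETSIZE);
    return (int)((s->bits[fd / TOY_WORD_BITS] >> (fd % TOY_WORD_BITS)) & 1u);
}

/* Number of words that must be examined for a given select() width. */
int toy_words_needed(int width)
{
    return (width + TOY_WORD_BITS - 1) / TOY_WORD_BITS;
}

/* The first word: what a caller passing a single int mask would supply. */
uint32_t demo_mask_for(int fd)
{
    toy_fd_set s;

    toy_zero(&s);
    toy_set(&s, fd);
    return s.bits[0];
}
```

In this model demo_mask_for(3) yields the same value as the old-style mask 1 << 3, while a descriptor of 32 or above leaves the first word untouched, which is why a single-int caller cannot express it.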