gvr@cs.brown.edu (George V. Reilly) (11/30/90)
% You must concatenate part1.tex and part2.tex together to form
% portableC.tex
% remove [portableC] from the \documentstyle command below if you
% prefer the old format. However, be sure to somehow include the
% section marked `% incorporate any additional commands I find necessary'.
\documentstyle[portableC]{article}
\pagestyle{headings}
\begin{document}
\bibliographystyle{alpha}
% The number between brackets is the minor revision number which
% must be removed when we finally agree on the contents.
\title{{\bf Notes on Writing\\Portable Programs in C}\\
       {\small (Nov 1990, 8th Revision)} }
\author{A. Dolenc%
        \protect\thanks{Internet: \id{ado@sauna.hut.fi}.} \\
        A. Lemmke \\
        {\em Helsinki University of Technology} \\
        D. Keppel%
        \protect\thanks{Internet: \id{pardo@cs.washington.edu}.} \\
        {\em CS\&E, University of Washington} \\
        {\normalsize and} \\
        G. V. Reilly%
        \protect\thanks{Internet: \id{gvr@cs.brown.edu}.} \\
        {\em Dept.\ of Computer Science, Brown University} }
\maketitle
{ \abstract
\parskip=4pt plus 1pt
\parindent=0pt
This document describes the features and non-features of different C~preprocessors, compilers, and environments. As such, it is an incomplete document, growing as information is gathered. It contains some material concerning ANSI~C but it is not a substitute for the Standard itself. We assume the reader is familiar with the C~programming language.
\endabstract }
\pagebreak
\tableofcontents
\pagebreak
\parskip=4pt plus 1pt
\parindent=0pt
\raggedbottom

%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Foreword}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

We will call a program {\em portable\/} if adapting it to a new environment is easier than rewriting it for that environment.
This document is mainly for those who have {\em never\/} ported a program to another platform --- a specific hardware and software environment --- and, evidently, for those who plan to write large systems which must be used across different vendor machines. If you have already done some porting, you may not find the information herein very useful. We suggest that \cite{style} be read in conjunction with this document.\footnote{\cite{style} can be obtained via {\em anonymous FTP\/} from \site{cs.washington.edu} in \file{\twiddle{}ftp/pub/cstyle.tar.Z}\@.} Posters to the newsgroup \ng{comp.lang.c} have repeatedly recommended \cite{MH} and \cite{AK} (none of the information herein has been taken from those two references). {\bf Disclaimer:} We will attempt to keep the information herein updated, but it can happen that some of it may be incorrect at the time of reading. The code fragments presented are intended to make applications ``more'' portable, meaning that they may fail with some compilers and/or environments. {\footnotesize This document can be obtained via anonymous FTP from \site{sauna.hut.fi} [130.233.251.253] in \file{\twiddle{}ftp/pub/CompSciLab/doc}. The files \file{portableC.tex}, \file{portableC.sty}, \file{portableC.bib}, and \file{portableC.ps.Z} are the \LaTeX\ source and style files, {\sc Bib}\TeX\ and the compressed {\sc PostScript}, respectively. Alternatively, there is a site in the US from which one can obtain all four files, \site{cs.washington.edu} [128.95.1.4] in \file{\twiddle{}ftp/pub/cport.tar.Z}\@. All files are in the public domain. Comments, suggestions, flames, eggs, and requests for copies via e-mail should be directed to \id{ado@sauna.hut.fi}. 
}

%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Introduction}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The aim of this document is to collect the experience of several people who have had to write and/or port programs written in~C to more than one platform.

In order to keep this document within reasonable bounds, we must restrict ourselves to programs which must execute under Unix-like operating systems and those which implement a reasonable Unix-like environment. The only exception we will consider is VMS\@.

A wealth of information can be obtained from programs that have been written to run on several platforms. This is the case of publicly available software such as that developed by the Free Software Foundation and the MIT X~Consortium.

When discussing portability, one focuses on two issues:
\begin{description}
\item[The language,] which includes the preprocessor and the syntax and the semantics of the language.
\item[The environment,] which includes the location and contents of header files and the run-time library.
\end{description}

We include in our discussions the standardization efforts upon the language and the environment. Special attention will be given to floating-point representations and arithmetic, to limitations of specific compilers, and to VMS\@.

Our main focus will be {\em boiler-plate\/} problems. Systems programming, \e.g. raw I/O from terminals, and twisted code associated with bizarre interpretations of \cite{ansi} --- henceforth referred to as the Standard --- are not extensively covered in this document.\footnote{We regard this document as a living entity growing as needed and as information is gathered.
Future versions of this document may contain a lot of such information.}

%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Standardization Efforts}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

All standards have a good side and an evil side. Due to the nature of this document, we are forced to focus our attention on the latter.

The American National Standards Institute (ANSI) has recently approved a standard for the C~programming language \cite{ansi}. The Standard concentrates on the syntax and semantics of the language and specifies a minimum environment (the name and contents of some header files and the specification of some run-time library functions).

Copies of the ANSI~C Standard (ANSI X3.159--1989) can be obtained from the following address:
{\small
\begin{center}
\begin{tabular}{l}
American National Standards Institute\\
Sales Department\\
1430 Broadway\\
New York, NY 10018\\
(Voice) (212) 642--4900\\
(Fax) (212) 302--1286\\
\end{tabular}
\end{center}
}

%=============================================================================
\subsection{ANSI~C}
%=============================================================================

%-----------------------------------------------------------------------------
\subsubsection{Translation Limits}
%-----------------------------------------------------------------------------

We first bring to the reader's attention the fact that the Standard states some environmental limits. These limits are {\em lower bounds}, meaning that a correct (compliant) compiler may refuse to compile an otherwise-correct program that exceeds one of those limits.\footnote{Maybe there {\em are\/} people out there who still write compilers in FORTRAN after all\ldots.}

Below are the limits that we judge to be the most important. The ones related to the preprocessor are listed first.
\begin{itemize}
\item {\em 8~nesting levels of conditional inclusion.}
\item {\em 8~nesting levels for \<\#include>d files.}
\item {\em 32~nesting levels of parenthesized expressions within a full expression.} This will probably occur when using macros.
\item {\em 1024~macro identifiers simultaneously.} This can happen if one includes too many header files.
\item {\em 509~characters in a logical source line.} This is a serious restriction if it applies {\em after\/} preprocessing. Since a macro expansion always results in one line, this affects the maximum size of a macro. It is unclear what the Standard means by a logical source line in this context and in most implementations this limit will probably apply {\em before\/} macro expansion.
\item {\em 6~significant initial characters in an external identifier.} Usually this constraint is imposed by the environment, \e.g. the linker, and not by the compiler.
\item {\em 127~members in a single structure or union.}
\item {\em 31~arguments in one function call.} This may cause trouble with functions that accept a variable number of arguments. Therefore, when designing such functions, it is advisable either to keep the number of parameters within reasonable bounds or to supply alternative interfaces, \e.g. using arrays.
\end{itemize}

It is really unfortunate that some of these limits may force a programmer to code in a less elegant way. We are of the opinion that the remaining limits stated in the Standard can usually be obeyed if one follows ``good'' programming practices.

However, these limits may break programs that {\em generate\/} C~code such as compiler-compilers and many \C++~compilers.
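Returning to the limit on arguments in a function call, the following is a minimal sketch of an array-based alternative to a variadic interface, assuming an ANSI~C compiler. The names \<sum\_array> and \<NELEMS> are hypothetical and serve only as illustration; they do not come from any particular library.

```c
#include <stddef.h>   /* size_t */

/* Hypothetical helper: number of elements in a true array (not a pointer). */
#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Array-based alternative to a variadic sum(count, ...): the caller
 * passes two arguments no matter how many values are to be added,
 * so the call never approaches the 31-argument limit.
 */
long sum_array(const int *values, size_t count)
{
    long total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

A caller would typically keep the values in a static or automatic array and write \<sum\_array(values, NELEMS(values))>.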
%-----------------------------------------------------------------------------
\subsubsection{Unspecified and Undefined Behavior}
%-----------------------------------------------------------------------------

The following are examples of unspecified and undefined behavior:
\begin{enumerate}
\item The order in which the function designator and the arguments in a function call are evaluated.
\item The order in which the preprocessor operators \<\#> (stringizing) and \<\#\#> (concatenation) are evaluated during macro substitution.
\item The representation of floating-point types.
\item An identifier is used that is not visible in the current scope.
\item A pointer is converted to something other than an integral or pointer type.
\end{enumerate}

The list is long. One of the main reasons for explicitly defining what is {\em not\/} covered by the Standard is to allow the implementor of the C~environment to make use of the most efficient alternative.

%=============================================================================
\subsection{POSIX}
%=============================================================================

% arl: We should order the release9 (10 ?) manual \ldots maybe LK does ?
The objective of the POSIX working group P1003.1 is to define a common interface for Unix. Granted, the ANSI~C standard does specify the contents of some header files and the behavior of some library functions but it falls short of defining a useful environment. This is the task of P1003.1.

We do not know how far P1003.1 addresses the problems presented in this document as at the moment we lack proper documentation. Hopefully, this will be corrected in a future release of this document.

%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Preprocessors}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Preprocessors can behave differently in several ways.
For those who need them, there are good publicly available preprocessors that are ANSI~C--compliant. One such preprocessor is the one distributed with the X~Window System developed by the MIT X~Consortium.

%=============================================================================
\subsection{Command Options}
%=============================================================================

The interpretation of the \<-I> command option can differ from one system to another. Moreover, it is not covered by the Standard. For example, the directive \<\#include "dir/file.h"> in conjunction with \<-I..> would cause most preprocessors in a Unix-like environment to search for \file{file.h} in \file{../dir}, but under VMS, \file{file.h} is only searched for in the subdirectory \file{dir} in the current working directory.

%=============================================================================
\subsection{\<\#pragma> and \<\#elif>}
%=============================================================================

Directives are very much the same in all preprocessors, except that some preprocessors may not know about the \<defined> operator in a \<\#if> directive or about the \<\#pragma> and \<\#elif> directives.

The \<\#pragma> directive should pose no problems even to old preprocessors {\em if it comes indented}.\footnote{Old preprocessors only take directives that begin with \<\#> in the first column.} Furthermore, it is advisable to enclose them with \<\#ifdef>s in order to document on which platform they make sense:
\begin{verbatim}
#ifdef <platform-specific-symbol>
  #pragma ...
#endif
\end{verbatim}

Beware of \<\#pragma> directives that alter the semantics of the program and consider the case when they are not recognized by a particular compiler. Evidently, if the behavior of the program relies on their correct interpretation then, in order for the program to be portable, all target platforms must recognize them properly.
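For preprocessors that know neither \<\#elif> nor the \<defined> operator, a chain of alternatives can be written with nested \<\#ifdef> and \<\#else> directives instead. The following is a minimal sketch of that idea. \<SYSV> and \<vms> are the platform symbols used elsewhere in this document; \<FLAVOR\_NAME> and \<flavor\_name> are hypothetical names, and the function definition itself assumes an ANSI~C compiler (only the directives are the point here).

```c
/*
 * Equivalent of
 *     #if defined(SYSV) ... #elif defined(vms) ... #else ... #endif
 * written without #elif and without the defined operator.
 * Every directive keeps its # in the first column.
 */
#ifdef SYSV
# define FLAVOR_NAME "System V"
#else
# ifdef vms
#  define FLAVOR_NAME "VMS"
# else
#  define FLAVOR_NAME "BSD"   /* assumed default when neither symbol is set */
# endif
#endif

/* Hypothetical accessor so the selected branch can be inspected at run time. */
const char *flavor_name(void)
{
    return FLAVOR_NAME;
}
```

The price is one extra nesting level per alternative, which counts against the limit of 8~nesting levels of conditional inclusion.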
%=============================================================================
\subsection{Concatenation}
%=============================================================================

Concatenation of symbols has two variants. One is the old K\&R \cite{KR1} style that simply relied on the fact that the preprocessor replaced comments such as \</**/> with nothing. Obviously, that does not result in concatenation if the preprocessor includes a space in the output. The ANSI~C Standard defines the operators \<\#\#> and (implicit) concatenation of adjacent strings. Since both styles are a fact of life it is useful to include the following in one's header files:\footnote{Some have suggested using \<\#if \_\_STDC\_\_> instead of simply \<\#ifdef \_\_STDC\_\_> to test if the compiler is ANSI-compliant because of compilers that are {\em not}, but define \<\_\_STDC\_\_> equal to zero.}
\begin{verbatim}
#ifdef __STDC__
# define GLUE(a,b) a##b
#else
# define GLUE(a,b) a/**/b
#endif
\end{verbatim}
If needed, one could define similar macros to \<GLUE> several arguments.\footnote{\<GLUE(a,GLUE(b,c))> would not result in the concatenation of \<a>, \<b>, and \<c>.}

%=============================================================================
\subsection{Token Substitution}
%=============================================================================

Some preprocessors perform token substitution within quotes while others do not. Therefore, this is intrinsically non-portable. The Standard disallows it but provides a mechanism to obtain the same results.
The following should work with ANSI-compliant preprocessors or with the ones that perform token substitution within quotes:
\begin{verbatim}
#ifdef __STDC__
# define MAKESTRING(s) # s
#else
# define MAKESTRING(s) "s"
#endif
\end{verbatim}

%=============================================================================
\subsection{Miscellaneous}
%=============================================================================

\begin{itemize}
\item We would {\em not\/} trust the following to work on {\em all\/} preprocessors:
\begin{verbatim}
#define D define
#D this that
\end{verbatim}
The Standard does not allow such a syntax (see~\S3.8.3 \P20 in \cite{ansi}).
\item Many preprocessors ignored, or still ignore, text after the \<\#else>, \<\#elif>, and \<\#endif> directives. However, the Standard forbids anything but comments after these directives.
\item Some preprocessors will consider it an error to \<\#undef> something that has not been \<\#define>d, although it is allowed to do so.
\item Finally, we must add that the Standard has fortunately included a \<\#error> directive with obvious semantics. Indent the \<\#error> since old preprocessors do not recognize it.
\end{itemize}

%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{The Language}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

%=============================================================================
\subsection{The Syntax}
%=============================================================================

The syntax defined in the Standard is a {\em superset\/} of the one defined in K\&R~\cite{KR1}. It follows that if one restricts oneself to the latter, there should be no problems with an ANSI~C--compliant compiler {\em with respect to syntax}. The {\em semantics\/} are, however, another problem altogether and are covered superficially in the next section.
The Standard extends the syntax with the following:
\begin{enumerate}
\item The inclusion of the keywords \<const>, \<enum>, \<signed>, \<void>, and \<volatile>.
\item The inclusion of additional constant suffixes to indicate their type.
\item The ellipsis (``\<...>'') notation to indicate a variable number of arguments.
\item Function prototypes.
\item Trigraph notation for specifying otherwise-unobtainable characters in restricted character sets.
\end{enumerate}

We encourage the use of the reserved words \<const> and \<volatile> since they aid in documenting the code. It is useful to add the following to one's header files if the code must be compiled by a non-conforming compiler as well:
\begin{verbatim}
#ifndef __STDC__
# define const
# define volatile
#endif
\end{verbatim}
However, one must then make sure that the behavior of the application does not depend on the presence of such keywords. (Evidently, programs that contain identifiers with those names must be modified to conform to the Standard.)

The trigraph notation can bring unexpected results when a program is compiled by an ANSI-compliant compiler, \e.g. strings such as~\<"??!"> will produce~\<"|">. Watch out!

%=============================================================================
\subsection{The Semantics}
%=============================================================================

The syntax does not pose any problem with regard to interpretation because it can be defined precisely. However, programming languages are always described using a natural language, \e.g. English, and this can lead to different interpretations of the same text.

Evidently, \cite{KR1} does not provide an unambiguous definition of the C~language otherwise there would have been no need for a standard. Although the Standard is much more precise, there is still room for different interpretations in situations such as \<f(p=\&a, p=\&b, p=\&c)>. Does this mean \<f(\&a,\&b,\&c)> or \<f(\&c,\&c,\&c)>?
Even ``simple'' cases such as \<a[i] = b[i++]> are compiler-dependent \cite{style}.

As stated in the Introduction, we would like to exclude such topics. The reader is instead directed to the Usenet newsgroups \ng{comp.std.c} or \ng{comp.lang.c} where such discussions take place and from where the above example was taken. {\em The Journal of C~Language Translation}\footnote{Address is 2051, Swans Neck Way, Reston, Virginia 22091, USA\@.} could, perhaps, be a good reference. Another possibility is to obtain a clarification from the Standards Committee, whose address is:
{\small
\begin{center}
\begin{tabular}{l}
X3 Secretariat, CBEMA\\
311 1st St NW Ste 500\\
Washington DC, USA\\
\end{tabular}
\end{center}
}

Finally, we mention that a complete list of the differences between ``ordinary''~C and ANSI~C can be found in the Second Edition of~K\&R~\cite{KR2}. A slightly less up-to-date list can also be found in~\cite{HS}.

%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Unix Flavors: System~V and BSD}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

A long time ago (1969), Unix said ``{\tt papa}'' for the first time at AT\&T (then called Bell Laboratories, or Ma Bell for the intimate) on a PDP-7.

Everyone liked Unix very much and the widespread use we see today is probably due to the relative simplicity of its design and of its implementation. (It is written, of course, mostly in~C\@.) However, these facts also contributed to everyone developing their own dialect. In particular, the University of California at Berkeley distributes the so-called BSD\footnote{Berkeley Software Distribution} Unix whereas AT\&T now distributes (sells) System~V Unix. All other versions of Unix are descendants of one of these major dialects.

The differences between these two major flavors should not upset most application programs. In fact, we would even say that most differences are just annoying.
BSD~Unix has an enhanced signal handling capability and implements sockets. However, {\em all\/} Unix flavors differ significantly in their raw I/O interface (that is, the \<ioctl> system call), and this should be avoided if possible.

The reader interested in knowing more about the past and future of Unix can consult \cite{unix1,unix2}.

%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Header Files}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Many useful system header files are in different places in different systems, or they define different symbols. We will assume henceforth that the application has been developed on a BSD-like Unix and must be ported to a System~V-like Unix or VMS or a Unix-like system with header files that comply with the Standard.

In the following sections, we show how to handle the simplest cases that arise in practice. Some of the code that appears below was derived from the header file \file{Xos.h} which is part of the X~Window System distributed by MIT\@. We have added changes, \e.g. to support VMS\@.

Many header files are unprotected in many systems, notably those derived from BSD version~4.2 and earlier. By ``unprotected'' we mean that an attempt to include a header file more than once will either cause compilation errors (\e.g. due to recursive or nested includes) or, in some implementations, warnings from the preprocessor stating that symbols are being redefined. It is good practice to protect header files.

%=============================================================================
\subsection{\file{ctype.h}}
%=============================================================================

\file{ctype.h} provides {\em almost\/} the same functionality on all systems, except that some symbols must be renamed.
\begin{verbatim}
#ifdef SYSV
# define _ctype_ _ctype
# define toupper _toupper
# define tolower _tolower
#endif
\end{verbatim}
Under System~V, \<toupper> and \<tolower> are also defined and will check the validity of their arguments and perform the conversion only if necessary. Under BSD-derived systems, one must normally remember to check the validity of the arguments. The following solution might be acceptable to most:
\begin{verbatim}
#ifdef SYSV
# define TOUPPER(c) toupper(c)
#else /* !SYSV */
# define TOUPPER(c) (islower(c)?toupper(c):(c))
#endif
\end{verbatim}
{\em The definitions in \file{<ctype.h>} are not portable across character sets.}

%=============================================================================
\subsection{\file{fcntl.h} and \file{sys/file.h}}
%=============================================================================

Many files that a BSD-like system expects to find in the \file{sys} directory are placed in \file{/usr/include} in System~V\@. Other systems, such as VMS, do not even have a \file{sys} directory.\footnote{Under VMS, since a path such as \file{<sys/file.h>} will evaluate to \file{sys:file.h}, it is sufficient to equate the logical name \file{sys} to \file{sys\$library}.}

The symbols used in the \<open> function call are defined in different header files in the two types of systems:
\begin{verbatim}
#ifdef SYSV
# include <fcntl.h>
#else
# include <sys/file.h>
#endif
\end{verbatim}
In some systems, \e.g. BSD~4.3 and SunOS, it does not make a difference which one is used because both define the \<O\_xxxx> symbols.

%=============================================================================
\subsection{\file{errno.h}}
%=============================================================================

The semantics of the error number may differ from one system to another and the list may differ as well (\e.g. BSD systems have more error numbers than System~V). Some systems, \e.g.
SunOS, define the global symbol \<errno> which will hold the last error detected by the run-time library. This symbol is not {\em declared\/} in most systems, although it is required by the Standard that such a symbol be defined (see~\S4.1.3 of \cite{ansi}). It is, of course, available in all Unix implementations.

The most portable way to print error messages is to use \<perror>.

%=============================================================================
\subsection{\file{math.h}}
%=============================================================================

System~V has more definitions in this header file than BSD-like systems. The corresponding library has more functions as well. This header file is unprotected under VMS and Cray, and in that case we must protect it ourselves:
\begin{verbatim}
#if defined(CRAY) || defined(VMS)
# ifndef __MATH__
#  define __MATH__
#  include <math.h>
# endif
#else
# include <math.h>
#endif
\end{verbatim}

%=============================================================================
\subsection{\file{strings.h} {\em vs.\ }\file{string.h}}
%=============================================================================

Some systems cannot be treated as System~V or BSD, but are really special cases, as one can see in the following:
\begin{verbatim}
#ifdef SYSV
# ifndef SYSV_STRINGS
#  define SYSV_STRINGS
# endif
#endif
#ifdef _STDH_ /* ANSI C Standard header files */
# ifndef SYSV_STRINGS
#  define SYSV_STRINGS
# endif
#endif
#ifdef macII
# ifndef SYSV_STRINGS
#  define SYSV_STRINGS
# endif
#endif
#ifdef vms
# ifndef SYSV_STRINGS
#  define SYSV_STRINGS
# endif
#endif
#ifdef SYSV_STRINGS
# include <string.h>
# define index strchr
# define rindex strrchr
#else
# include <strings.h>
#endif
\end{verbatim}
As one can easily observe, System~V-like Unix systems use different names for \<index> and \<rindex> and place them in different header files. Although VMS better supports System~V features, it must be treated as a special case.
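New code can also be written against the Standard names \<strchr> and \<strrchr>, with the mapping applied in the opposite direction on systems that only provide \file{<strings.h>}. The following is a minimal sketch of that variant, assuming an ANSI~C compiler. The symbol \<BSD\_STRINGS> and the function \<base\_name> are hypothetical and appear here only for illustration.

```c
#ifdef BSD_STRINGS          /* hypothetical symbol: only <strings.h> exists */
# include <strings.h>
# define strchr  index
# define strrchr rindex
#else
# include <string.h>
#endif

/*
 * Return a pointer to the component of path that follows the last '/',
 * or path itself if it contains no '/'.
 */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');

    return (slash != 0) ? slash + 1 : path;
}
```

Which direction to map is a matter of taste; mapping towards the Standard names has the advantage that the macros disappear once the old systems do.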
%=============================================================================
\subsection{\file{time.h} and \file{types.h}}
%=============================================================================

When using \file{time.h}, one must also include \file{types.h}. The following code does the trick:
\begin{verbatim}
#ifdef macII
# include <time.h> /* on a Mac II we need this one as well */
#endif
#ifdef SYSV
# include <time.h>
#else
# ifdef vms
#  include <time.h>
# else
#  ifdef CRAY
#   ifndef __TYPES__ /* it is not protected under CRAY */
#    define __TYPES__
#    include <sys/types.h>
#   endif
#  else
#   include <sys/types.h>
#  endif /* of ifdef CRAY */
#  include <sys/time.h>
# endif /* of ifdef vms */
#endif
\end{verbatim}
The above is not sufficient in order for the code to be portable since the structure that defines time values is not the same in all systems. Different systems vary in the way \<time\_t> values are represented. The Standard, for instance, only requires that it be an arithmetic type. Recognizing this difficulty, the Standard defines a function called \<difftime> to compute the difference between two time values of type \<time\_t>, and \<mktime>, which takes a broken-down time (a \<struct tm>) and produces a value of type \<time\_t>.

%=============================================================================
\subsection{\file{varargs.h} {\em vs.\ }\file{stdarg.h}}\label{varargsh}
%=============================================================================

In some systems the definitions in both header files are contradictory. For instance, the following will produce compilation errors, \e.g. under VMS:
\begin{verbatim}
#include <varargs.h>
#include <stdio.h>
\end{verbatim}
This is because \file{<stdio.h>} includes \file{<stdarg.h>} which in turn redefines all the symbols (\<va\_start>, \<va\_end>, etc.)\ in \file{<varargs.h>}. This is incorrect behavior because Standard header files should not include other Standard header files.
Furthermore, the method used in \file{<varargs.h>} for defining variadic functions is incompatible with the Standard (see~\S\ref{ansic} for more information on variadic functions).

The solution we adopt is to always include \file{<varargs.h>} last and not to define in the same module both functions that use \file{<varargs.h>} and functions that use the ellipsis notation.

%=============================================================================
\subsection{\file{sys/wait.h}}
%=============================================================================

This one is lacking in some systems (\e.g. Altos and Xenix). HP-UX does define it but one must use macros to access the fields of the \<wait struct>, instead of using the names of the fields.

The \<wait struct> uses bit-fields and if the platform does not define it one must do it oneself and care must be taken with respect to byte ordering (see {\bf Byte ordering} in~\S\ref{tp}).

%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\section{Run-time Library}
%+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

% System~V vs. BSD
% The Tektronix manual has some good stuff about this
% arl: o I think hpux manuals have too. hpux is a sysV based
%        system which has nowadays lots of bsd features.
%      o Sun is also sysV based 'all the goodies' from bsd
%        implemented os. Mostly you can program with it like
%        bsd or sysV or mixed \ldots it tries (?) to support both.
%      o some X11 manuals might help, because X is 'portable'
%      o 88open manuals & stuff. 88open is a consortium
%        which describes portability of software & binaries
%        between Motorola 88k based computers.
%      o we should have here something about signals too ?
%        the stuff is not so portable, but in extensive hacking
%        you need signals \ldots I have some information of that.
% ado: Many functions have the same functionality in various systems
%      but they differ on (i) the type of value they return and (ii)
%      the setting of errno. E.g., printf&friends,rewind.

This section admittedly contains very little information compared to \cite{MH}. We direct the reader to that reference for more information.

Time and time again, it happens that the target platform does not have all the library functions needed by a given application. This is particularly true with mathematical functions. We would like to remind the reader that the sources to 4.3BSD are publicly available, and may be obtained at several sites, \e.g. \site{funic.funet.fi} [128.214.6.100] in \file{\twiddle{}ftp/pub/bsd-sources}, the contents of which are cloned from \site{uunet.uu.net}. Read the copyright notices before using them.

%=============================================================================
\subsection{Mathematical Functions}
%=============================================================================

%-----------------------------------------------------------------------------
\subsubsection{\<cbrt> and \<pow>}
%-----------------------------------------------------------------------------

\<cbrt(x)> evaluates the cube root of its argument, that is,~$x^{1/3}$. \<pow(x,y)> evaluates~$x^y$. Some systems implement neither of these, or just the latter. In that case, one can define \<pow> as a function of \<exp> and \<log>, and if one has \<pow> but not \<cbrt>, one can write the latter as a function of the former:
\begin{verbatim}
#define pow(x,y) (exp(log(x)*(y)))
#define cbrt(x)  (pow((x),1./3.))
\end{verbatim}
Thus defined, \<pow> only admits a strictly positive first argument~\<x>, since \<log(x)> must be defined. If the argument~\<x> is negative, then a result can be evaluated if~\<y> is an integer and one must implement such a function oneself (a predicate which determines if~\<y> is an integer is usually not available).
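For the case of a negative~\<x> and an integer exponent, one possibility is a separate function that takes the exponent as an \<int>, so that no integer test on a \<double> is needed. The following is a minimal sketch using repeated squaring, assuming an ANSI~C compiler; the name \<ipow> is hypothetical and is not part of any standard library.

```c
/*
 * Hypothetical helper: x raised to the integer power n.
 * Works for negative x because it never calls log().
 * For x == 0 and n < 0 the division by zero is left to the
 * floating-point environment, as with any other 1.0/0.0.
 */
double ipow(double x, int n)
{
    double result = 1.0;
    /* Magnitude of n as an unsigned value; safe even for the most
     * negative int because the subtraction is done in unsigned. */
    unsigned int k = (n < 0) ? 0u - (unsigned int) n : (unsigned int) n;

    while (k != 0) {
        if (k & 1u)          /* current bit of the exponent is set */
            result *= x;
        x *= x;              /* square the base for the next bit */
        k >>= 1;
    }
    return (n < 0) ? 1.0 / result : result;
}
```

This sketch returns~$1.0$ when \<n> is zero, whatever the value of~\<x>.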
The macro definitions of \<pow> and \<cbrt> given above are a ``poor man's'' solution to the problem but acceptable in many situations. In order to obtain numerically robust and accurate results one must investigate other alternatives such as obtaining the source code for the 4.3BSD implementation via anonymous FTP as mentioned at the beginning of this Section.

It should be mentioned that if the argument~\<y> is zero then implementations differ on the result. The 4.3BSD implementation always returns~$1.0$; others may return undefined values, flag an error, or return not-a-number.

%-----------------------------------------------------------------------------
\subsubsection{\<rand>}
%-----------------------------------------------------------------------------

\<rand> returns a pseudo-random integer in the range 0 to~\<RAND\_MAX>, which is guaranteed only to be at least 32,767. Do not rely on \<rand> returning results over a much wider range.

%=============================================================================
\subsection{Memory allocation and initialization}
%=============================================================================

%-----------------------------------------------------------------------------
\subsubsection{\<alloca>}
%-----------------------------------------------------------------------------

\<alloca(n)> allocates the number of bytes specified by~\<n> and returns a pointer to the allocated memory. This space is --- for all practical purposes --- automatically deallocated (freed) when the block scope is exited. More specifically, the storage is deallocated {\em no sooner\/} than the exit from the block scope; the implementation is allowed to do the freeing at function exit, upon the next call to \<alloca>, or at any other moment deemed appropriate. The example below illustrates {\em incorrect\/} usage of \<alloca>:
\begin{verbatim}
foo ()
{
    char *sto;
    {
        sto = alloca (10);
        use (sto);      /* Correct. */
    }
    use (sto);          /* Error: storage may have been freed.
*/ } \end{verbatim} Conceptually, the space is allocated on a stack, so allocation can be as fast as just adjusting the stack pointer if the machine has one, and several regions can be freed at once by simply readjusting the stack pointer. However, it is hard to implement \<alloca> both portably and efficiently. \<alloca> is not available on all platforms and as such is not required by the Standard. However, there are public domain implementations that work in a wide variety of cases, but which can be slow and which can delay freeing arbitrarily\footnote{A public domain implementation of \<alloca> can be obtained from the Free Software Foundation (GNU); try \site{prep.ai.mit.edu} in \file{\twiddle{}ftp/pub/gnu}.}. Thus, while it is very desirable to use \<alloca> when it is available, because of efficiency considerations, it is highly recommended that the code be written so that \<malloc> and \<free> can easily replace it, if and when necessary. %----------------------------------------------------------------------------- \subsubsection{\<bcopy> {\em vs.\ }\<memcpy> and \<memmove>} %----------------------------------------------------------------------------- \<bcopy(s1,s2,n)> copies the string~\<s1> into~\<s2>, whereas \<memcpy(s1,s2,n)> copies~\<s2> into~\<s1>. \<bcopy> can be found in BSD-like systems, and some implementations handle overlapping strings, while others do not. \<memcpy> and \<memmove> are implemented in the other camp (System~V); \<memcpy> does not handle overlapping strings, whereas \<memmove> does. The normal solution is to use macros. %----------------------------------------------------------------------------- \subsubsection{\<bzero> {\em vs.\ }\<memset>} %----------------------------------------------------------------------------- \<bzero(s,n)> is equivalent to \<memset(s,0,n)>. The former is implemented in BSD-like systems, whereas the latter is implemented in System~V-like systems and is required by the Standard. 
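
The macro approach suggested for \<bcopy> applies equally to \<bzero>;
a minimal sketch covering both follows. The names \<COPY\_BYTES> and
\<ZERO\_BYTES> and the configuration symbol \<HAVE\_BSD\_BYTEOPS> are
hypothetical, and the header \<strings.h> is assumed to declare the
BSD functions where they exist. Note that the macros take the
destination first, as \<memmove> does, and that the BSD branch is only
as safe for overlapping regions as the local \<bcopy>.
\begin{verbatim}
#include <string.h>

#ifdef HAVE_BSD_BYTEOPS         /* hypothetical configuration symbol */
#include <strings.h>            /* assumed home of bcopy and bzero */
#define COPY_BYTES(dst, src, n) bcopy((src), (dst), (n))
#define ZERO_BYTES(dst, n)      bzero((dst), (n))
#else
#define COPY_BYTES(dst, src, n) ((void) memmove((dst), (src), (n)))
#define ZERO_BYTES(dst, n)      ((void) memset((dst), 0, (n)))
#endif
\end{verbatim}
Application code then calls only \<COPY\_BYTES> and \<ZERO\_BYTES>, so
the choice between the two camps is confined to one header.
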
See also {\bf Initialization} in~\S\ref{misc}.

%-----------------------------------------------------------------------------
\subsubsection{\<malloc> and \<free>}
%-----------------------------------------------------------------------------

\<malloc> is available in all C~implementations and its behavior is
very well defined except in boundary conditions. Not all
implementations accept a zero-sized request. There are other minor
differences such as the return type being \<char~*> in some
implementations and \<void~*> in others.

In a similar vein, some implementations of \<free> do not accept
\<NULL> as an argument. Worse, though, is that some implementations
allowed the caller to use the pointer even {\em after\/} it had been
\<free>d so long as no other call to \<malloc> was performed. Portable
code must not rely on such behavior.

%-----------------------------------------------------------------------------
\subsubsection{\<realloc>}
%-----------------------------------------------------------------------------

\<realloc(sto,n)> takes a pointer to a region allocated with \<malloc>
and grows or shrinks the region so that it is of size~\<n>. The return
value from \<realloc> is a pointer to the resized storage; if the
storage was grown ``in place'', the return value is the same as
\<sto>. If the region was moved, then the old contents are copied to
the new storage (if~\<n> is smaller than the old size, then only the
first~\<n> bytes are copied). If the region is grown, the new storage
at the end is uninitialized and may contain garbage.

Under ANSI C:
\begin{itemize}
\item If \<sto == NULL>, then \<realloc> acts like \<malloc>.
\item If \<n == 0>, then \<realloc> acts like \<free>.
\item If \<sto == NULL> {\em and\/} \<n == 0>, the results are
      undefined.
\end{itemize}

For non-ANSI versions of \<realloc>, specifying \<NULL> as the storage
or \<0>~as the new size causes undefined behavior.
Thus, it is recommended that portable programs, {\em even those
written in ANSI~C}, not use these features. If it is necessary to rely
on them, use a macro or write a function that can be configured to
check for those cases explicitly.

%=============================================================================
\subsection{Miscellaneous}
%=============================================================================

%-----------------------------------------------------------------------------
\subsubsection{\<scanf>}
%-----------------------------------------------------------------------------

\<scanf> can behave differently on different platforms because its
descriptions, including the one in the Standard, allow for different
interpretations under some circumstances. The most portable input
parser is the one you write yourself.

Some versions of the \<scanf> family modify and then restore arguments
which are string constants. These implementations cause problems when
string constants are placed in read-only memory (see
{``String constants''} in~\S\ref{misc}). If the string is actually a
constant, then some workaround is needed; usually a compiler flag may
be used to indicate that such constants should be placed in writable
memory instead. If such a flag is not available then the code must be
modified.

%-----------------------------------------------------------------------------
\subsubsection{\<setjmp> and \<longjmp>}
%-----------------------------------------------------------------------------

Quoting anonymously from \ng{comp.std.c}, ``pre-X3.159 implementations
of \<setjmp> and \<longjmp> often did not meet the requirements of the
Standard. Often they didn't even meet their own documented specs. And
the specs varied from system to system. Thus it is wise not to depend
too heavily on the exact standard semantics for this
facility\ldots''.

In other words, it is not that you should {\em not\/} use them but be
careful if you do.
Furthermore, the behavior of a \<longjmp> invoked from a nested signal
handler\footnote{That is, a function invoked as a result of a signal
raised during the handling of another signal. See~\S4.6.2.1 \P15 in
\cite{ansi}.} is undefined.

Finally, the symbols \<\_setjmp> and \<\_longjmp> are only defined
under SunOS, BSD, and HP-UX\@. Some systems do not implement \<setjmp>
and friends at all.

%-----------------------------------------------------------------------------
\subsubsection{Signal Handling}
%-----------------------------------------------------------------------------

We would like to point out one problem when handling signals generated
by hardware, such as \<SIGFPE> and \<SIGSEGV>\@. There are two
possibilities on a normal exit from the signal handler: (i)~the
offending instruction is re-executed, or (ii)~it is not. The first
possibility may cause an infinite loop, and the only portable solution
is to \<longjmp> out of the signal handler.
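
The escape route described above might look like the sketch below. The
names \<call\_guarded>, \<on\_fpe>, \<fpe\_env>, \<do\_nothing>, and
\<raise\_fpe> are ours, and \<raise> (an ANSI~C function) merely stands
in for a genuine hardware fault so that the example is repeatable.
Whether the signal remains blocked after the \<longjmp>, and whether
the handler stays installed, varies between systems, so treat this as
an illustration rather than a complete recipe.
\begin{verbatim}
#include <setjmp.h>
#include <signal.h>

static jmp_buf fpe_env;         /* where to resume after a SIGFPE */

static void on_fpe(int sig)
{
    (void) sig;
    longjmp(fpe_env, 1);        /* never return to the faulting code */
}

/* Run fn(); return 0 if it completed, -1 if SIGFPE interrupted it. */
int call_guarded(void (*fn)(void))
{
    void (*old)(int);

    old = signal(SIGFPE, on_fpe);
    if (setjmp(fpe_env) != 0) {
        signal(SIGFPE, old);    /* we arrived here via longjmp */
        return -1;
    }
    fn();
    signal(SIGFPE, old);
    return 0;
}

/* Demonstration stand-ins for real computations. */
void do_nothing(void) { }
void raise_fpe(void)  { raise(SIGFPE); }
\end{verbatim}
Here \<call\_guarded(do\_nothing)> returns~0 and
\<call\_guarded(raise\_fpe)> returns~$-1$; the handler itself never
returns normally, so the question of re-execution does not arise.
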