henry (05/15/82)
I have had several complaints about floating-point programs that used to work and now fail. My investigation has been limited, and it's not impossible that there are hidden problems of some kind, but it looks very much like programmer carelessness is the major problem.

The major cause of "Floating exception - core dumped" in a floating-point program that hasn't done anything unusual is a division by zero. Our old floating-point simulator did not detect this as an error; the hardware croaks on it every time, deliberately.

The programs I have looked at so far have contained two serious mistakes: they opened files using open() or fopen() without checking to see whether those opens succeeded, and they read values using scanf() or fscanf() without checking to see whether those scans succeeded. A failure in either of these operations would leave the input variables unaltered. There may be other such operations, although none come to mind just now. Although it is not guaranteed, the most likely value for a variable which has never been set to anything is zero. And any division by such a variable will get "Floating exception - core dumped". One program I tested bombed in exactly this way when I invoked it with nonexistent data files.

As far as I can determine, there is nothing wrong with the new hardware except that it finds programmers' carelessness that previously was not detected. I have checked the math library -- exp(), log(), and their relatives -- and can find nothing wrong with it; when I test these routines, they work.

I am willing to help anyone who has run into trouble. But I must insist, first, that the programs in question check for the problems described above and report such errors. This is good style in any event: filenames are typed by people, who make errors. The same often applies to data files. It is most unwise (and quite poor programming) for a program to blindly charge ahead without checking for such things.
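
For anyone unsure what such checking looks like, here is a minimal sketch in C. It assumes a data file holding two numbers whose quotient is wanted. The function name read_ratio() and the RATIO_* status codes are invented for this illustration and come from no library; adapt the idea to whatever the real program reads.

```c
#include <stdio.h>

/* Hypothetical status codes, invented for this sketch. */
enum { RATIO_OK = 0, RATIO_OPEN_FAILED, RATIO_SCAN_FAILED, RATIO_ZERO_DIVISOR };

/*
 * Read two numbers from the file `path` and store their quotient in *out.
 * Returns RATIO_OK on success.  On any failure it reports the problem on
 * stderr, returns a nonzero code, and leaves *out untouched.
 */
int read_ratio(const char *path, double *out)
{
    FILE *fp;
    double num, den;

    fp = fopen(path, "r");
    if (fp == NULL) {                   /* the open can fail: check it */
        fprintf(stderr, "cannot open %s\n", path);
        return RATIO_OPEN_FAILED;
    }

    /* fscanf() returns the number of items it actually converted. */
    if (fscanf(fp, "%lf %lf", &num, &den) != 2) {
        fprintf(stderr, "%s: expected two numbers\n", path);
        fclose(fp);
        return RATIO_SCAN_FAILED;
    }
    fclose(fp);

    /* Even correctly read data may hold a zero divisor. */
    if (den == 0.0) {
        fprintf(stderr, "%s: divisor is zero\n", path);
        return RATIO_ZERO_DIVISOR;
    }

    *out = num / den;
    return RATIO_OK;
}
```

A caller would test the return value and quit with a message, rather than use the result, whenever it is not RATIO_OK. The point is only that every open and every scan has its result examined before any arithmetic is done on the values.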