hamilton@intersil.uucp (Fred Hamilton) (01/23/90)
I've just started running MemWatchII to aid (hopefully) in tracking down the source of some intermittent crashes in my system. MemWatch has identified for me a number of programs that write over low memory.

Now I understand that the OS needs section(s) of memory all to itself (that's what I'm assuming "low memory" is reserved for), but why do *any* applications write to low memory? What's the appeal? Or is it done by accident? By compilers? How and why do all these programs that trash low mem do it?

On a related note, I've wondered about these "XXXXXXX won't run with my 590 because it was expecting to see 00 in location $00, but an early version of the FastFileSystem would write different values to location $00 causing XXXXXXX to crash" messages. The solution was "get the latest revision of FFS". I don't understand that. Why was game/application "XXXXXXX" writing to and/or worried about the value in location 0 in the first place? Why was FFS "wrong" and the program not?

Finally, since upgrading to WShell1.2, I've gotten a few "Warning- Hanging Forbid!" messages after running some applications. What is a hanging forbid and how serious is it? Should I report it to the people who made the software that hung the forbid?

Thanks in advance for any enlightenment.

--
Fred Hamilton
Harris Semiconductor
Santa Clara, CA
Any views, comments, or ideas expressed here are entirely my own. Even good ones.
cmcmanis@stpeter.Sun.COM (Chuck McManis) (01/24/90)
In article <67.25bb84f0@intersil.uucp> (Fred Hamilton) writes:
> Now I understand that the OS needs section(s) of memory all to itself
> (that's what I'm assuming "low memory" is reserved for), but why do *any*
> applications write to low memory? What's the appeal? Or is it done by
> accident? by compilers? How and why do all these programs that trash
> low mem do it?

Well, a lot of people program in C, and sooner or later they use pointers. Consider the following code fragment:

    struct {
        int a, b, c, d;
        char string[80];
    } *mystruct;
    ...
    mystruct->a = CalculateSomething();
    strcpy(SomeString, mystruct->string);
    ...
    foo = mystruct->a;
    printf("%s", mystruct->string);
    ...

What do you notice? Well, you will notice that the structure that mystruct points to occupies 96 bytes of memory (88 if you are using 16 bit integers). And the application is using it properly (i.e., it's dereferencing the pointer to get at the members). But what if the application forgot to initialize the pointer to anything? Guess what: if it is a global or static variable, it defaults to 0. Now if we had an MMU and a process address space it might warn us that we were reading/writing outside of our address space; unfortunately we don't.

To further complicate matters, unless someone writes into variable 'b' above (which sits at address 4 when the pointer is 0) and clobbers ExecBase, the program might perform flawlessly. So your program works fine until the process that actually owns the memory you've been writing into needs it and finds that it has been stomped. Or maybe you have overwritten an interrupt vector, and the next time the floppy gets accessed your machine crashes.

> On a related note, I've wondered about these "XXXXXXX won't run with my 590
> because it was expecting to see 00 in location $00, but an early version
> of the FastFileSystem would write different values to location $00 causing
> XXXXXXX to crash" messages. The solution was "get the latest revision of
> FFS". I don't understand that.
> Why was game/application "XXXXXXX" writing
> to and/or worried about the value in location 0 in the first place? Why
> was FFS "wrong" and the program not?

Unless you "own" location 0, meaning that you've called the system memory allocator and it has given you the 4 bytes at location zero for your program to use, it is illegal to write to it. However, a side effect of modifying location zero is that when you crash, some of the alerts that get displayed will show the contents of location 0 on one side of the '.'. Consequently, a very clever form of debugging for low level code was to write a debugging value to 0 (say a 1 for the subroutine Foo, and a 2 for subroutine Bar) and, when the system crashed, look at that value to determine which routine you were in. The FastFileSystem used this technique, and not all of the debugging code was removed before the initial release.

> Finally, since upgrading to WShell1.2, I've gotten a few "Warning-
> Hanging Forbid!" messages after running some applications. What is a
> hanging forbid and how serious is it? Should I report it to the people
> who made the software that hung the forbid?

A hanging forbid occurs when a process calls the Forbid() exec function without a matching call to the Permit() function. The purpose of Forbid() is to lock out the task scheduler for a moment because the operation your program is doing cannot be split between instructions. (The classic example of this is when you are modifying system lists which the next task in the queue may reference; you must guarantee that they are consistent before you allow a task switch.) As it turns out, things like Wait() and interrupts will break a Forbid(), so the system doesn't always halt, but whenever control returns to the task that has an outstanding Forbid(), control will stay with that task until it calls either Wait() or Permit().

A common use for Forbid() is to terminate a child task or process.
Generally what happens is that the child process or task is getting ready to exit, so it cleans up all of the memory it used and the resources it allocated, and then wishes to send a message to its parent saying that it is ok to clean it up and unload it. One of the paradoxes of this situation is that the child may need to do one last thing after it sends the message but before it is actually ready to be unloaded. To accomplish this a simple technique is used: the child calls Forbid(), replies to (or sends) the message to its parent that it is ready to die, and then, because it is under a forbid and knows the parent won't get the message before it's ready, it does its final cleaning and calls Wait(0).

The effect of the Wait(0) is twofold. First, the Forbid() is broken even though it was never "Permitted", because the Wait() call forces a task switch. Secondly, because there were no signal bits to check for, the process/task is permanently placed in the "not ready to run" queue, so it won't execute any more instructions. Now when the parent removes the task, it is guaranteed not to be running and is thus safe to remove from the system. If you try to remove a running task, there is a chance that the scheduler will restart it after you have released its memory but before it has been removed from the run queue. This will often cause a system crash.

So in summary, a hanging Forbid() may indicate a bug or it may indicate a child process that has begun the process of exiting, but you can't really tell if that is good or bad.

--Chuck McManis
uucp: {anywhere}!sun!cmcmanis   BIX: cmcmanis   ARPAnet: cmcmanis@Eng.Sun.COM
These opinions are my own and no one else's, but you knew that didn't you.
"If it didn't have bones in it, it wouldn't be crunchy now would it?!"
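
To make the structure-pointer point from the reply above a little more concrete, here is a small sketch in portable C that works out which low-memory addresses each member would touch if the pointer held 0. It only computes offsets and never dereferences a null pointer. The names Record, address_through_null, check_layout and print_low_memory_map are made up for this illustration, and int32_t is an assumption used to pin the "32 bit integer" layout on any compiler.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>  /* int32_t */
#include <stdio.h>

/* Same shape as the structure in the post. int32_t is an assumption made
 * here so the 32-bit-int layout is the same on any compiler. */
struct Record {
    int32_t a, b, c, d;
    char string[80];
};

/* Address a member would land on if the structure pointer held 0:
 * base address 0 plus the member's offset. Nothing is dereferenced. */
size_t address_through_null(size_t member_offset)
{
    const size_t null_base = 0;
    return null_base + member_offset;
}

/* Sanity-check the figures quoted in the post for 32 bit integers. */
void check_layout(void)
{
    assert(sizeof(struct Record) == 96);     /* 4 * 4 + 80 */
    assert(offsetof(struct Record, b) == 4); /* second int */
}

/* Print where each member would fall in low memory. */
void print_low_memory_map(void)
{
    printf("sizeof(struct Record) = %zu bytes\n", sizeof(struct Record));
    printf("a      -> address %zu\n",
           address_through_null(offsetof(struct Record, a)));
    printf("b      -> address %zu\n",
           address_through_null(offsetof(struct Record, b)));
    printf("c      -> address %zu\n",
           address_through_null(offsetof(struct Record, c)));
    printf("d      -> address %zu\n",
           address_through_null(offsetof(struct Record, d)));
    printf("string -> address %zu\n",
           address_through_null(offsetof(struct Record, string)));
}
```

Calling check_layout() and then print_low_memory_map() from a main() shows member 'b' landing on address 4, the location the reply identifies with ExecBase, which is why a write to 'b' through a zero pointer is the one that does immediate damage while a write to 'a' only disturbs location 0.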