[comp.sys.hp] rtprio for X

lee@iris.ucdavis.edu (Peng Lee) (09/08/89)

	I notice the X server for HPUX 6.5 doesn't have the rtprio (real time
priority) option that the earlier X server had.  Would someone tell me why?

	Since I can't su on this system and my system administrator doesn't
want to give my group the rtprio privilege, I have no choice but to use
the slowwww server without rtprio if I want to use some of the new features
(such as disabling the reset key in the xlock program).

Thanks
-Peng (lee@iris.ucdavis.edu)

jack@hpindda.HP.COM (Jack Repenning) (09/14/89)

> I notice the X server for HPUX 6.5 doesn't have the rtprio (real time
> priority) option that the earlier X server had.  Would someone tell me
> why?

I wasn't on the development team for X11, but I am a notes loud-mouth
(inside the company as well as outside), and I was involved in the
discussions at the time.  I also worked in HP's Real Time Executive
(RTE) environment for a number of years, where half the programs in
the system run as real-time, so I have some experience in real time
programming.

Here's a totally unofficial summary of why the X server business went
the way it did:

Writing a program to work reliably as a real-time process is complex,
arcane, and costly, and when an rtprio program doesn't "work
reliably," what that generally means is your system is hung, and has
to be power-cycled to reboot (dirty file systems, work destroyed, that
sort of thing).  The team didn't have the resources to do a good job
of it at the time, and a poor job would not have been a good idea at
all.

Furthermore, making a whole system work reliably when it includes
real-time processes is an almost equally black art, even when all the
processes in the system work well by themselves.  The work consists
"merely" of picking the right ordering of the priorities, for all
programs that interact in any way.  But there are an endless number of
ways for programs to "interact," mistakes are costly, and problems can
be hard to test and reproduce.  Here, the problem would have to be
handled by explaining the details to the end users (you and your
SysAdmin, for example), and getting that much explanation across is
even harder than doing it oneself.

You may ask, "but if it worked with the earlier one, why wouldn't it
work with the newer one?"  It's a good question; answering it is part
of the complex, arcane, costly process of making it work.  But it most
definitely did *not* work with the new one.

As a matter of fact, even the earlier servers weren't supposed to have
the rtprio feature: it was yanked early on from the design and the
documents.  Someone just forgot to yank it from the code.

It's all rather a shame.  I used to use it, too (that's how I, along
with many others, discovered that it was *not* working with the new
one), and I miss it.

You can achieve some of the same effect by nice(1)ing most everything
but the server - nice(1) is considerably safer than rtprio(1), because
priorities are still allowed to drift, so the dispatcher can cover for
your mistakes.  I've turned my .x11start file into a /bin/ksh file,
and made sure that the ksh has "bgnice" set.  That way, everything
started from .x11start is niced a bit.  Try that out.
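
For what it's worth, here's a minimal sketch of that setup (the client
and window-manager names are only examples; this is just the shape of
the idea, not HP's shipped file):

	#!/bin/ksh
	# .x11start as a ksh script.  With bgnice set, ksh runs
	# background jobs at reduced priority, so everything started
	# here ends up niced relative to the (un-niced) X server.
	set -o bgnice

	xclock &	# example clients; each is backgrounded,
	hpterm &	# so each one gets niced via bgnice
	hpwm &		# window manager too (substitute your own)
	wait		# hold the session open until the clients exit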

-------------------------------------------------------------
Jack Repenning - Information Networks Division,
		 Hewlett Packard Corporation
uucp:   ... {allegra,decvax,ihnp4,ucbvax} !hplabs!hpda!jack
  or:   ... jack@hpda.hp.com
HPDesk: Jack REPENNING  /HP6600/UX
USMail: 43LN; 19420 Homestead Ave; Cupertino, CA  95014
Phone:  408/447-3380                     HPTelnet: 1-447-3380
-------------------------------------------------------------

grzm@zyx.SE (Gunnar Blomberg) (09/14/89)

In article <4310057@hpindda.HP.COM> jack@hpindda.HP.COM (Jack Repenning) writes:
>> I notice the X server for HPUX 6.5 doesn't have the rtprio (real time
>> priority) option that the earlier X server had.  Would someone tell me
>> why?
>[...]
>You can achieve some of the same effect, by nice(1)ing most everything
>but the server - nice(1) is considerably safer than rtprio(1), because
>priorities are still allowed to drift, so the dispatcher can cover for
>your mistakes.  I've turned my .x11start file into a /bin/ksh file,
>and made sure that the ksh has "bgnice" set.  That way, everything
>started from .x11start is niced a bit.  Try that out.

  We used to run our X servers at real time priority too (though we did
it ourselves -- nobody had any idea that there was any support for it
in the X server), but after some troubles with crashes (which, by the
way, did not seem to be caused by that anyway), we changed to using
nice(1).  The way we do it is by running the X server (as well as the
window manager) "unniced".  This gives the server higher priority than
practically everything (and not only the stuff explicitly given lower
priority), which is definitely what I want.

  The way I do it is by running x11start at a negative niceness of 20
and then, in my .x11start, running .realx11start at a (relative) positive
niceness of 20.  This runs the X server at high priority and anything
I start in .realx11start at normal priority.  I guess you could get
the same effect in a more centralized way, but this works for me...
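
  Concretely, the layering looks about like this (System V nice(1)
syntax, where a doubled minus means a negative increment and needs
superuser; the client and window-manager names are only examples):

	# Start the whole session 20 steps above normal priority:
	nice --20 x11start

	# --- in .x11start ---
	hpwm &				# window manager stays at the raised priority
	nice -20 $HOME/.realx11start	# everything else drops back to normal

	# --- in .realx11start ---
	xclock &			# ordinary clients, at normal priority
	hpterm &
	wait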
-- 
"The CPU that has most influenced  | Gunnar Blomberg
 the Unix system is unquestionably | ZYX Sweden AB, Bangardsg 13,
 the Intel 80386."                 | S-753 20  Uppsala, Sweden
--David Fiedler, BYTE, May 1989    | email: grzm@zyx.SE

raveling@isi.edu (Paul Raveling) (09/15/89)

In article <4310057@hpindda.HP.COM>, jack@hpindda.HP.COM (Jack
Repenning) writes:
> 
> Writing a program to work reliably as a real-time process is complex,
> arcane, and costly, and when an rtprio program doesn't "work
> reliably," what that generally means is your system is hung, and has
> to be power-cycled to reboot (dirty file systems, work destroyed, that
> sort of thing). ...
>  
> Furthermore, making a whole system work reliably when it includes
> real-time processes is an almost equally black art, even when all the
> processes in the system work well by themselves.  ...

	Sounds like we come from real time system environments that
	must have different architectures.  Ever since I began refining
	my favorite architecture (in 1972), my experience has been
	the opposite.

	Given a good kernel architecture, it's been easy to get real time
	applications to work, even running ALL processes with preemptive
	scheduling.  Of the family of systems I usually mention in this
	context, EPOS (vintage 1975) is the one I refer to most often.
	The last bug report I heard on it before we shut down the last
	PDP-11/45's running it was that the timer keeping track of how
	long it had been up overflowed at somewhere around 7 months.

	I keep wishing UNIX had the sort of facilities EPOS and others
	had to handle multi-process applications.  It's MURDER to try to
	do the same things on UNIX.


----------------
Paul Raveling
Raveling@isi.edu

jack@hpindda.HP.COM (Jack Repenning) (09/19/89)

>        Given a good kernel architecture, it's been easy to get real time
>        applications to work, even running ALL processes with preemptive
>        scheduling.  Of the family of systems I usually mention in this

That's interesting.  Although I can point to a number of details in
the kernel (UNIX or otherwise) that can make things easier or harder,
I've always found that the most intransigent problems were resource
deadlocks between the individual programs.  How can EPOS deal with
that?

j

cjames@hpldsla.HP.COM (Craig James) (09/21/89)

>	Sounds like we come from real time system environments that
>	must have different architectures.  ...
>
> 	Given a good kernel architecture, it's been easy to get real time...
> 	applications to work, even running ALL processes with preemptive
> 	scheduling.  
>
> Paul Raveling
> Raveling@isi.edu

It's hard to see how that is possible in a system with real-time capabilities.
After all, if I start up a process that runs until it explicitly gives up
the CPU, and that process goes into an infinite loop, there is no escape but
to reboot.

Perhaps the system you described didn't provide such abilities?

Craig James, HP Labs
"These are my opinions, not HP's"

raveling@isi.edu (Paul Raveling) (09/21/89)

In article <4310059@hpindda.HP.COM>, jack@hpindda.HP.COM (Jack
Repenning) writes:
> >        Given a good kernel architecture, it's been easy to get real time
> >        applications to work, even running ALL processes with preemptive
> >        scheduling.  Of the family of systems I usually mention in this
> 
> That's interesting.  Although I can point to a number of details in
> the kernel (UNIX or otherwise) that can make things easier or harder,
> I've always found that the most intransigent problems were resource
> deadlocks between the individual programs.  How can EPOS deal with
> that?

	EPOS didn't have a sure-fire cure for a classic deadlock,
	but its architecture made it easy for multiple processes
	to use cooperative rather than competitive techniques.

	Most multiprocess systems in the '70s tended to use some
	variant of semaphores for interprocess synchronization and
	"communication".  EPOS used a signal/wait facility that was
	based on message transmission and queueing rather than on
	semaphores.  When there's a bit more time I may be able
	to dig up some prose describing this signal/wait facility.


	EPOS also had lock/unlock system calls to handle the sort
	of resource locking that semaphores are natural for, but it
	wasn't necessary to use them very often.  Most of the lock/unlock
	usage was for things like storage allocation, where a call
	to allocate or release memory locks the control blocks for
	a particular storage pool while it works.

	At the kernel level there were a number of spots where EPOS
	used an even simpler lock when possible:  Disabling interrupts.
	Critical regions doing things like manipulating queues of
	control blocks simply disabled for the minimum possible time,
	often as little as two machine instructions.

	So we didn't attempt deadlock resolution because we never got
	one.  (A simple deadlock would be something like process A locks
	resource 1, process B locks resource 2, then A locks 2 & B
	locks 1.)  I did have some ideas for untangling deadlocks that
	involve signal/wait and the resource wait processes used by
	the lock/unlock mechanism, but we just didn't need to work
	on the problem.


	BTW, the resource locks were implemented with signal/wait.
	The conceptually simplest way to do it is about like this
	(guess I'd better mention that event data (from signals) was
	queued if the receiving process wasn't waiting for the event
	in question, so that multiple requests for a resource go onto
	the lock process' event queue):

	ProcessId	resource_process;

	lock(resource_process)		/*  Used to lock resource	*/
		{
		signal (LOCK_EVENT, resource_process);
		waits  (LOCK_EVENT, resource_process, NULL);
		}

	unlock(resource_process)	/*  Used to unlock resource	*/
		{signal (UNLOCK_EVENT, resource_process);}


	/*	Housekeeping calls:	*/

	ProcessId create_lock() {return create_process(lock_process);}
	delete_lock(resource_process) {delete_process(resource_process);}


	lock_process ()			/*  Process to control resource	*/
		{
		ProcessId	locker;

		while (TRUE)
		    {
		    waits  (LOCK_EVENT, NULL, &locker);   /* wait for a lock request   */
		    signal (LOCK_EVENT, locker);	  /* grant the lock to locker  */
		    waits  (UNLOCK_EVENT, locker, NULL);  /* wait for locker's unlock  */
		    }
		}


	In truth we used some extra logic to avoid context switches
	in the most common cases -- locking an unlocked resource and
	unlocking a resource no one was waiting for.  Also, the signals
	were slightly more complicated because resource wait processes
	belonged to the kernel job, while the lock/unlock calls often
	came from a user job.  (Jobs were isolated from each other,
	including having separate process name spaces, except for
	the cross-job interface between the kernel and other jobs.
	Cross-job signalling could be done only from code executing
	in kernel mode.)

	The deadlock untangling scheme I was thinking about would
	add some different signal/wait interaction to lock processes
	and would probably add a system process (daemon if you prefer)
	to supervise deadlock resolution.


----------------
Paul Raveling
Raveling@isi.edu

raveling@isi.edu (Paul Raveling) (09/23/89)

In article <3140004@hpldsla.HP.COM>, cjames@hpldsla.HP.COM (Craig James)
writes:
> 
> It's hard to see how that is possible in a system with real-time capabilities.
> After all, if I start up a process that runs until it explicitly gives up
> the CPU, and that process goes into an infinite loop, there is no escape but
> to reboot.
> 
> Perhaps the system you described didn't provide such abilities?

	In such a case the machine would go CPU-bound, but nothing
	drastic would happen unless the looping process was a crucial
	system process, such as a device driver.

	Processes in loaded applications were offspring of an Exec process,
	which was the first process created in each job except the kernel
	job.  The Exec ran as a VERY high priority process, and its offspring
	were restricted to priorities below those of the Exec and all
	system processes other than the Idle process.

	The normal way out of the loop was for the user to hit control-C
	on the keyboard.  The terminal i/o process for the associated
	terminal signalled the Exec, which then preempted the looping
	process.  For ^C the Exec froze its offspring, including the looper,
	and resumed accepting commands from the terminal.

	The usual next command was "MEND", which stood for Multi-Environment
	Native Debugger.  This allowed the usual sorts of debugging operations
	on the job's processes, including single-stepping the loop or
	patching data up and resuming execution.


	For a different answer, a prior incarnation of the same kernel
	architecture also implemented time slicing.  This proved
	unnecessary in EPOS' event-driven real time environment, but
	would be appropriate in a system with more diversified use.
	

----------------
Paul Raveling
Raveling@isi.edu

raveling@isi.edu (Paul Raveling) (09/23/89)

	(About what happens if a process goes into a loop under EPOS)...

	On looking at my last response, I see it may not have made clear
	enough that all activity necessary for the system to run and
	stay healthy occurs in processes with very high priorities.
	A "normal" application process, even at a priority that allows
	real time response and bursts of high CPU use, doesn't interfere
	with system activity in device driver processes, Execs, etc.

	Also, none of the high priority processes did anything that
	monopolized the CPU for long.   Context switching was fast
	(45 times faster than UNIX), so that the device drivers could
	quickly get control, respond to an event such as an interrupt
	or i/o request, and relinquish control back to the previously
	active process.  This quick-response event-handling type of
	operation is well suited to preemptive scheduling.

	BTW, a "normal" application on this system handled real time
	speech.  In speech conferencing it used both LPC and CVSD
	vocoders, with CVSD demanding the higher bandwidth.  My memory's
	unclear now, but I think the meter we put on the PDP-11/45's bus
	showed about 33% CPU utilization for a single CVSD channel at
	10 kHz bandwidth (or maybe it was 16 kHz).


----------------
Paul Raveling
Raveling@isi.edu

jack@hpindda.HP.COM (Jack Repenning) (09/30/89)

EPOS does seem to have provided a lot of good stuff for developing
real-time, event-driven, light-weight processes.

Unfortunately, we were trying to achieve the same effect with
preexisting, multi-user, time-sliced UNIX processes.

And in the case raised by the basenote, we were trying to take a small
piece of this emulation of a pure real-time system, and apply it
freely, from the outside, to programs never developed for that
environment.

The latter problem is the one I meant was "hard".

j

raveling@isi.edu (Paul Raveling) (10/04/89)

In article <4310061@hpindda.HP.COM>, jack@hpindda.HP.COM (Jack
Repenning) writes:
> EPOS does seem to have provided a lot of good stuff for developing
> real-time, event-driven, light-weight processes.

	BTW, EPOS processes weren't light-weight.  LWP's are
	becoming popular mainly because current OS's (at least UNIXes)
	lack adequate facilities for multi-process applications
	using "normal" processes.


----------------
Paul Raveling
Raveling@isi.edu