[comp.compression] Self Extracting Files

pollarda@physc1.byu.edu (06/15/91)

PKZip as well as several other file compression utilities I have seen
have the option to have the files self extracting.  I understand that
the files have the machine code to self extract along with the data
combined somehow.  But how exactly does it work?

What tells the computer to stop loading in the file as machine code
and handle the rest as data?

What codes are needed to be able to achieve this?  And what changes
are needed to be made to the program in order for it to work on
this kind of file?

Does anyone have any ideas?

While I am at it, a friend has shown me something else quite curious
and I have told him I would try to find out what is going on. . .

Some programs have something embedded in them so that if you "type"
them, they will display the program name and the material will stop
scrolling on the screen and the DOS prompt will appear again.

e.g.

C:> type program.exe

The Software
(c) 19xx The Software Co.

C:>

If there is anyone with the answer (or at least a good guess at one)
to either of these two questions, I would greatly appreciate it
if you could let me know what is going on.  Both of these have me
stumped...

Thanx a bunch,

Art Pollard

BitNet: PollardA@Xray.Byu.Edu
Uncle Sam's Express:
600 N. 195 E. #31, Provo, UT. 84606
Phone Stuff:
(801) 373-0339 (Home --Late!!!)
(801) 378-4490 (Work)

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (06/15/91)

What does any of this have to do with comp.compression? Sounds like IBM
programming group fodder.

In article <309pollarda@physc1.byu.edu> pollarda@physc1.byu.edu writes:
> What tells the computer to stop loading in the file as machine code
> and handle the rest as data?

It just loads in the whole thing and jumps to a spot near the beginning.
Then both the code and the data are in memory. The code uncompresses the
data, then jumps to the beginning of the newly uncompressed program.

> Some programs have something embedded in them so that if you "type"
> them, they will display the program name and the material will stop
> scrolling on the screen and the DOS prompt will appear again.

The DOS programs interpret ^Z as an end-of-file marker. They'll stop
when they hit ^Z in a file, even if there's more data past that.
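
As a rough illustration of that behavior, here is a minimal C sketch. It is
not the actual DOS code: text_length is a hypothetical helper that reports
how many bytes of a file a text-mode reader such as "type" would show before
it reaches the first ^Z (byte value 1Ah).

```c
#include <stddef.h>

#define DOS_EOF 0x1A  /* Ctrl-Z, the DOS text-mode end-of-file byte */

/* Hypothetical helper: number of leading bytes of buf that a text-mode
 * reader would display, i.e. everything before the first Ctrl-Z, or all
 * len bytes if the buffer contains no Ctrl-Z. */
size_t text_length(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == DOS_EOF)
            break;
    }
    return i;
}
```

Everything after the returned length, machine code included, is simply never
printed.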

---Dan

d88-jwa@byse.nada.kth.se (Jon W{tte) (06/15/91)

In article <309pollarda@physc1.byu.edu> pollarda@physc1.byu.edu writes:

   combined somehow.  But how exactly does it work?

   What tells the computer to stop loading in the file as machine code
   and handle the rest as data?

It doesn't. Everything is loaded as code, but the flow in the program
never reaches the addresses where the data is. Look at it like a small
program like:

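/* compressed payload, linked into the program as ordinary data */
char stuffed_data [ ] = { 0 , 1 , /* ... */ } ;

int main ( void )
{
	/* more_data and unstuff stand in for the real extraction code */
	while ( more_data ) {
		unstuff ( stuffed_data ) ;
	}
	return 0 ;
}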

   type foo.exe

   The Software
   (c) 19xx The Software Co.

   C:>

Just add the text to the beginning with an end-of-file in there.
Maybe some magic numbers still need to be there, but they can be
erased by backspaces... I'm not so very at home with MS-DOS any
longer.



Now, on the mac, this info is in the "vers" resource and comes up
when you "get info" on the icon - and a self-extractor would have
the code in the usual place (CODE resources) and the data in the
data fork - so you could open it transparently with the original
program as well, which doesn't need to look at the resource fork.
A file system to die for ! :-)

All of this has little to do with comp.compression, except maybe
that archive writers need to be aware that not all file systems
are flat...

--
						Jon W{tte
						h+@nada.kth.se
						- Speed !

davidsen@sixhub.UUCP (Wm E. Davidsen Jr) (06/17/91)

In article <309pollarda@physc1.byu.edu> pollarda@physc1.byu.edu writes:
| 
| PKZip as well as several other file compression utilities I have seen
| have the option to have the files self extracting.  I understand that
| the files have the machine code to self extract along with the data
| combined somehow.  But how exactly does it work?

  The expander is the first part of the program, which is fixed length
so it knows what to skip.

| What tells the computer to stop loading in the file as machine code
| and handle the rest as data?

  In some cases it all gets loaded into memory; in others the header
info causes only part of the file to be loaded, and the full filename
(DOS 3.x and later) is used to find the real data.


| Some programs have something embedded in them so that if you "type"
| them, they will display the program name and the material will stop
| scrolling on the screen and the DOS prompt will appear again.

  The DOS type command stops when it sees a Ctrl-Z character (decimal 26).
-- 
bill davidsen - davidsen@sixhub.uucp (uunet!crdgw1!sixhub!davidsen)
    sysop *IX BBS and Public Access UNIX
    moderator of comp.binaries.ibm.pc and 80386 mailing list
"Stupidity, like virtue, is its own reward" -me

rennyk@apex.com (Renny K) (06/18/91)

In article <309pollarda@physc1.byu.edu> pollarda@physc1.byu.edu writes:
>
>PKZip as well as several other file compression utilities I have seen
>have the option to have the files self extracting.  I understand that
>the files have the machine code to self extract along with the data
>combined somehow.  But how exactly does it work?
>What tells the computer to stop loading in the file as machine code
>and handle the rest as data?

There is an EXE Header, as it is called, at the beginning of EVERY EXE file.
The DOS loader looks at this information before loading the file for length,
where to load, relocation etc.  It's possible to change the amount of
information loaded by modifying this header.
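
A minimal C sketch of reading that length out of the header, assuming the
standard "MZ" layout (bytes-in-last-page word at offset 2, count of 512-byte
pages at offset 4). The function name exe_declared_size is made up for this
example; it is not taken from PKZip or any other tool.

```c
#include <stddef.h>

/* Read a little-endian 16-bit value from a byte buffer. */
static unsigned rd16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Return the number of file bytes (header included) that the EXE header
 * declares as belonging to the program, or 0 if hdr does not look like
 * an EXE header.  File bytes past this offset are not loaded by DOS. */
unsigned long exe_declared_size(const unsigned char *hdr, size_t len)
{
    unsigned last_page_bytes, pages;

    if (len < 6 || hdr[0] != 'M' || hdr[1] != 'Z')
        return 0;
    last_page_bytes = rd16(hdr + 2);  /* bytes used in last page, 0 = full */
    pages = rd16(hdr + 4);            /* number of 512-byte pages */
    if (pages == 0)
        return 0;
    if (last_page_bytes == 0)
        return (unsigned long)pages * 512UL;
    return (unsigned long)(pages - 1) * 512UL + last_page_bytes;
}
```

If the file on disk is longer than the value returned, the extra bytes are
data that a self-extractor can read back by opening its own file.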

>What codes are needed to be able to achieve this?  And what changes
>are needed to be made to the program in order for it to work on
>this kind of file?

You don't need any special codes.  It's done by DOS (and the linker).

>Some programs have something embedded in them so that if you "type"
>them, they will display the program name and the material will stop
>scrolling on the screen and the DOS prompt will appear again.
>If there is anyone with the answer (or at least a good guess at one)
>to either of these two questions, I would greatly appreciate it
>if you could let me know what is going on.

This is also easy to do.  It's much easier in a .COM program than an .EXE
program.

In a COM program:

	Start off the program with a jump instruction, followed by the message
	and then end the message with the hex character 01Ah.  This is the 
	DOS End-Of-File character, which will cause the type command to stop.

	ex:

		JMP	START		; skip over the message text

		DB	'My Program',13,10
		DB	'Version 1.0',13,10
		DB	'Copyright 1991, Renny Koshy',13,10,10,10,01Ah

	START:				; program code begins here

In an EXE program it's harder because of the header in the beginning, and I
don't know how to do it without showing SOME garbage (i.e. the header).


-- 
-------------------------------------------------------------------------------
Renny Koshy						rennyk@apex.com
Apex Computer, Redmond, WA.

aaron@backyard.bae.bellcore.com (Aaron Akman) (06/19/91)

In article <1991Jun17.205910.17669@apex.com>, rennyk@apex.com (Renny K) writes:
>In article <309pollarda@physc1.byu.edu> pollarda@physc1.byu.edu writes:
> 
>There is an EXE Header, as it is called, at the beginning of EVERY EXE file.
>The DOS loader looks at this information before loading the file for length,
>where to load, relocation etc.  It's possible to change the amount of informa-

This goes back a bit, but I appended data to an EXE this way:
Compiled the EXE completely, created a little utility program to
open(file.exe), seek(file.exe, EOF), and write(file.exe, extra stuff).
As I recall, I tried 2 methods for ``finding'' the data from within
the executing EXE:

(1) have the utility modify the initial value of
one of the EXE's global variables with an exact offset.  You could figure
out how to find a global variable by compiling a little program like:

char *cgoff = "STUFF NUMBER HERE AND DO ATOI IN YOUR PROGRAM";

Probably the utility just has to search for that string in the
file...get the size w/an lseek and make sure not to alter the
size...after executing the utility, cgoff might look like this (if you
could look at it in the compiled EXE): 

char *cgoff = "27435\0NUMBER HERE AND DO ATOI IN YOUR PROGRAM";

(2) have the utility program write an identifiable marker where the
data begins, and search for that when your executing EXE opens itself.
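
A minimal C sketch of both methods, working on an in-memory copy of the EXE
image. The names find_bytes and patch_offset are hypothetical, and the
placeholder text is whatever string was compiled into the program. Method (2)
is just find_bytes called with the marker; method (1) overwrites the start of
the placeholder with the decimal offset and a NUL, leaving the file size
unchanged.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Find the first occurrence of pat (patlen bytes) in buf; -1 if absent. */
long find_bytes(const unsigned char *buf, size_t len,
                const char *pat, size_t patlen)
{
    size_t i;

    if (patlen == 0 || patlen > len)
        return -1;
    for (i = 0; i + patlen <= len; i++) {
        if (memcmp(buf + i, pat, patlen) == 0)
            return (long)i;
    }
    return -1;
}

/* Method (1): overwrite the start of the placeholder string inside the
 * image with the decimal offset plus a terminating NUL.  The image size
 * does not change.  Returns 0 on success, -1 on failure. */
int patch_offset(unsigned char *image, size_t len,
                 const char *placeholder, unsigned long offset)
{
    char digits[32];
    size_t plen = strlen(placeholder);
    long pos = find_bytes(image, len, placeholder, plen);
    int n;

    if (pos < 0)
        return -1;
    n = snprintf(digits, sizeof digits, "%lu", offset);
    if (n < 0 || (size_t)n >= plen)
        return -1;  /* digits plus NUL must fit inside the placeholder */
    memcpy(image + pos, digits, (size_t)n + 1);
    return 0;
}
```

The running program then calls atol (or atoi) on the patched string and seeks
to that offset in its own file.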

___________________________
Aaron Akman
aaron@backyard.bellcore.com
908-699-8019

mycroft@kropotki.gnu.ai.mit.edu (Charles Hannum) (06/20/91)

In article <1991Jun19.140345.18650@bellcore.bellcore.com> aaron@backyard.bae.bellcore.com (Aaron Akman) writes:

   In article <1991Jun17.205910.17669@apex.com>, rennyk@apex.com (Renny K) writes:
   >In article <309pollarda@physc1.byu.edu> pollarda@physc1.byu.edu writes:
   > 
   >There is an EXE Header, as it is called, at the beginning of EVERY EXE file.
   >The DOS loader looks at this information before loading the file for length,
   >where to load, relocation etc.  It's possible to change the amount of informa-

   This goes back a bit, but I appended data to an EXE this way:
   Compiled the EXE completely, created a little utility program to
   open(file.exe), seek(file.exe, EOF), and write(file.exe, extra stuff).
   As I recall, I tried 2 methods for ``finding'' the data from within
   the executing EXE:

   [stuff deleted]

This is unnecessary and unreliable.  The .EXE header tells you exactly
how large the executable portion is.  Most programs also put their own
header on the data, so they know exactly how large it is.  If you
choose to do this, you should make your header compatible with a .EXE
header, but with a different magic value.  (It only requires 16 bytes
-- a small price to pay for the flexibility it adds.)  This also
allows you to add more than one data segment to your program, each
with a different magic value, and simply have your loader search the
headers for the right one.  This is very fast.
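
One way to picture that search, as a C sketch under stated assumptions: every
block in the file, the EXE itself included, starts with a 2-byte magic value
followed by the same two length words an EXE header uses (bytes in last page
at offset 2, 512-byte page count at offset 4). The layout of the remaining
bytes, the 16-byte minimum, and the names block_size and find_block are
invented for this example.

```c
#include <stddef.h>

#define SEG_HEADER_SIZE 16  /* assumed minimum header size */

/* Read a little-endian 16-bit value. */
static unsigned seg_rd16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Size of the block starting at p, header included, computed from the
 * EXE-style "bytes in last page" and "page count" fields. */
static unsigned long block_size(const unsigned char *p)
{
    unsigned last = seg_rd16(p + 2);
    unsigned pages = seg_rd16(p + 4);

    if (pages == 0)
        return 0;
    if (last == 0)
        return (unsigned long)pages * 512UL;
    return (unsigned long)(pages - 1) * 512UL + last;
}

/* Walk the chain of blocks in file, starting with the EXE at offset 0,
 * and return the offset of the first block whose magic matches, or -1. */
long find_block(const unsigned char *file, size_t len, const char magic[2])
{
    size_t off = 0;

    while (len - off >= SEG_HEADER_SIZE) {
        unsigned long size;

        if (file[off] == (unsigned char)magic[0] &&
            file[off + 1] == (unsigned char)magic[1])
            return (long)off;
        size = block_size(file + off);
        if (size < SEG_HEADER_SIZE || size > len - off)
            return -1;  /* corrupt or truncated chain */
        off += size;
    }
    return -1;
}
```

Because each header gives the size of its block, the walk touches only one
header per block instead of scanning every byte of the file.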

rosenkra@convex.com (William Rosencranz) (06/20/91)

i am surprised nobody brought another issue up, so i will. i think
it is relevant, too...

i am extremely reluctant to advocate self extracting archives for 2
reasons: 1) in order to get at the stuff, u have to execute something
(meaning more chance of viral infection), and 2) you can only extract
on a particular system. the latter may make sense if what u archived
will only work on that particular system tho it still makes it impossible
to read an included text file like docs without the target system. however,
the former is impossible to overlook. self extracting files are probably
most useful for third party software distribution where these issues
are probably moot (tho i know of at least one commercial package which
was distributed with a virus, non-intentionally).

-bill
rosenkra@convex.com
--
Bill Rosenkranz            |UUCP: {uunet,texsun}!convex!c1yankee!rosenkra
Convex Computer Corp.      |ARPA: rosenkra%c1yankee@convex.com

brad@looking.on.ca (Brad Templeton) (06/20/91)

Without getting too doshish here, how do you find the .exe file?  DOS
doesn't tell you the name of the command you are executing.  I assumed
most of these self-extractors just took the compressed codes as data that
was loaded in with the program.   Is there a file descriptor sitting around
to your program or something?

Even on Unix the first argument is not assured to be the name of the program
you ran.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

ts@uwasa.fi (Timo Salmi) (06/20/91)

In article <1991Jun20.034508.17792@convex.com> rosenkra@convex.com (William Rosencranz) writes:
>
>i am surprised nobody brought another issue up, so i will. i think
>it is relevant, too...

Perhaps because in the course of time they have been pointed out
quite frequently.  (But maybe not in this newsgroup).

>i am extremely reluctant to advocate self extracting archives for 2
>reasons: 1) in order to get at the stuff, u have to execute something
>(meaning more chance of viral infection), and 2) you can only extract
>on a particular system. the latter may make sense if what u archived
:

Yes these are good and relevant points.  In fact the ones that are
usually deemed problematic.  They certainly are why we try to avoid
having self-extracting files on our FTP site, with the natural
exception of (un)archiving software for obvious logical reasons. 

...................................................................
Prof. Timo Salmi
Moderating at garbo.uwasa.fi anonymous ftp archives 128.214.12.37
School of Business Studies, University of Vaasa, SF-65101, Finland
Internet: ts@chyde.uwasa.fi Funet: gado::salmi Bitnet: salmi@finfun

bowling@ucunix.san.uc.edu (Brian D. Bowling) (06/20/91)

In article <1991Jun20.040437.11896@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
|Without getting too doshish here, how do you find the .exe file?  DOS
|doesn't tell you the name of the command you are executing.  I assumed
|most of these self-extractors just took the compressed codes as data that
|was loaded in with the program.   Is there a file descriptor sitting around
|to your program or something?
|
|Even on Unix the first argument is not assured to be the name of the program
|you ran.
|-- 
|Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

Look up in the PSP.  The info should be stored there by DOS.
Isn't DOS fun???

Brian

sander@cwi.nl (Sander Plomp) (06/20/91)

>i am surprised nobody brought another issue up, so i will. i think
>it is relevant, too...

>i am extremely reluctant to advocate self extracting archives for 2
>reasons: 1) in order to get at the stuff, u have to execute something
>(meaning more chance of viral infection), and 2) you can only extract
>on a particular system. the latter may make sense if what u archived
>will only work on that particular system tho it still makes it impossible
>to read an included text file like docs without the target system. however,
>the former is impossible to overlook. self extracting files are probably
>most useful for third party software distribution where these issues
>are probably moot (tho i know of at least one commercial package which
>was distributed with a virus, non-intentionally).

>-bill
>rosenkra@convex.com

I, too, hate the selfextracting archives, for exactly the same reasons.
Yes, they are:
 (1) The ideal virus carrier.
 (2) Exceptionally non portable. Nearly always MS-DOS and nothing but
     MS-DOS.
and,
 (3) Since we all have the archiver around anyway, what's the use of
     sending the uncompression routines along every time?  Remember, we
     were talking about compression. (I know it's not that much, but
     since it's useless and dangerous anyway..)

The reason that self extracting archives were invented was simple. When
SEA distributed ARC as shareware via BBS systems, people often got
incomplete versions. Of course you cannot use ARC to distribute ARC so
the self extracting archive was invented as a way to make sure everyone
got both the programs and the manual.

For doing this, self extracting archives are very useful. For nearly any
other purpose they are a pain in the ass.

\footnote{
  I wonder if anybody patented self extracting archives or something
  like that. These days every neat or not so neat trick seems to get
  patented. Anybody know?
}
-- 
Sander Plomp
Internet: sander@cwi.nl
Fidonet: 2:283/500.4

churchh@ut-emx.uucp (Henry Churchyard) (06/21/91)

In article <1991Jun20.060401.20338@ucunix.san.uc.edu>, bowling@ucunix.san.uc.edu (Brian D. Bowling) writes:
> In article <1991Jun20.040437.11896@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
> |Without getting too doshish here, how do you find the .exe file?  DOS
> |doesn't tell you the name of the command you are executing.  I assumed
> 
> Look up in the PSP.  The info should be stored there by DOS.

  Actually, the info is recorded at the end of a program's copy of the
DOS environment in DOS versions 3.0 and above.  Environment variables
are stored as null-terminated strings; there's an extra null after the
last environment variable, then a two-byte count, and then the full
path name of the currently executing program follows.  You find the
segment location of your copy of the environment at a specified place
in the PSP (the word at offset 2Ch), but I don't think the name of the
program is there.
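
A minimal C sketch of that walk, operating on a plain byte buffer that holds
a copy of the environment block rather than on real DOS memory. It assumes
the DOS 3.0+ layout: the strings, one extra null, a two-byte count, then the
program path. The function name program_path is made up for this example, and
the empty-environment case is not treated specially.

```c
#include <stddef.h>
#include <string.h>

/* Given a copy of a DOS 3.0+ environment block of at most len bytes,
 * return a pointer to the program path stored after it, or NULL if the
 * block is malformed or truncated. */
const char *program_path(const char *env, size_t len)
{
    size_t i = 0;

    /* Skip each NAME=value string; an empty string ends the list. */
    while (i < len && env[i] != '\0') {
        const char *end = memchr(env + i, '\0', len - i);

        if (end == NULL)
            return NULL;
        i = (size_t)(end - env) + 1;
    }
    if (i >= len)
        return NULL;
    i += 1;                 /* the extra null that ends the list */
    if (len - i < 2)
        return NULL;
    i += 2;                 /* the two-byte count, normally 1 */
    if (i >= len || memchr(env + i, '\0', len - i) == NULL)
        return NULL;        /* the path must be null-terminated too */
    return env + i;
}
```

On a real DOS system the buffer would be the memory at the environment
segment taken from the PSP, and the returned path is what a self-extractor
would open to read its own appended data.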

Followups redirected to comp.os.msdos.programmer.
--
         --Henry Churchyard     churchh@emx.cc.utexas.edu

frankb@sbsvax.cs.uni-sb.de (Frank Bauernoeppel) (06/21/91)

In article <3745@charon.cwi.nl>, sander@cwi.nl (Sander Plomp) writes:
> I, too, hate the selfextracting archives, for exactly the same reasons.
> Yes, they are:
>  (1) The ideal virus carrier.
>  (2) Exceptionally non portable. Nearly always MS-DOS and nothing but
>      MS-DOS.

There is no problem. I can "unzip" self-extracting DOS .exe archives
on a SUN, a VAX or even on a PC without executing them. Since unzip
comes in C, only the native C compiler/linker has the chance to infect 
the program. 
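
To show why no execution is needed, here is a minimal C sketch that finds the
zip data inside a self-extracting file purely by reading bytes. It is not
Info-ZIP's actual code: find_zip_end_record is a hypothetical helper that
scans backward for the zip "end of central directory" signature (the bytes
'P', 'K', 5, 6), whose record has a fixed part of 22 bytes.

```c
#include <stddef.h>
#include <string.h>

#define EOCD_MIN_SIZE 22  /* fixed part of the end-of-central-directory record */

/* Scan backward through file for the end-of-central-directory signature
 * and return its offset, or -1 if the file holds no such record. */
long find_zip_end_record(const unsigned char *file, size_t len)
{
    static const unsigned char sig[4] = { 'P', 'K', 5, 6 };
    size_t i;

    if (len < EOCD_MIN_SIZE)
        return -1;
    /* Start at the last position where a full record could still fit. */
    for (i = len - EOCD_MIN_SIZE + 1; i-- > 0; ) {
        if (memcmp(file + i, sig, sizeof sig) == 0)
            return (long)i;
    }
    return -1;
}
```

The extractor stub in front of the archive is treated as opaque bytes and is
never run.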

Source code is available at simtel20.army.mil (among others). Thanks
to the people at INFO-ZIP!

BTW, unzip is just one example.
                   ~~~~~~~~~~~
	Greetings 	Frank

frankb@cs.uni-sb.de