po@volta.ece.utexas.edu (02/04/89)
In my program, I am using opendir() to read in the names of text files from a directory. How can I tell whether a file is text or an object file ? Is there a better way than using : system("file filename > /tmp/tempfile") Thanks, Po
gwyn@smoke.BRL.MIL (Doug Gwyn ) (02/04/89)
In article <192@henry.ece.utexas.edu> po@volta.ece.utexas.edu () writes:
>How can I tell whether a file is text or an object file ?
>Is there a better way than using :
> system("file filename > /tmp/tempfile")

Anything you do will be pretty much what "file" does, namely, inspect yay many bytes of the file to see if there is a known "magic number" header present or if there are non-displayable byte values present.
tale@pawl.rpi.edu (David C Lawrence) (02/04/89)
Just something to note here, too, since this is c.u.questions and not c.u.wizards (wizards already know this). `file' is a handy utility for a lot of things, but alas, it is quite easy to trip it up. For one thing, it does not know lisp (as indicated on the manual page) and for another, comments can really screw it up. In fact, cpp directives can screw it up too.

I have a programme that starts out with lots of #defines. `file' loves to think that it is just ascii text. Another programme has a large comment section at the beginning of the form /*text\n *more text\n */. This, alas, is but ascii text to `file'. (Aside: I am curious how it determines something is English text rather than just ascii text.)

Another thing that can screw up `file' is short, tabular information, as near as I can tell. I have a README in my bin directory which consists of the name of each programme, the language it is written in and a message about what it does. This is reported by `file' to be *roff, et al, input. Similar behaviour has been exhibited on other files.

One thing I -love- file for is reading raster headers; it tells me useful information in a nice compact way -- size of image, encoding information (run-length encoded, standard format, etc).

For the purposes of the original poster, `file' is a great way to find scripts and machine executables; don't rely on it for source files especially, though.

Dave
--
tale@rpitsmts.bitnet, tale%mts@rpitsgw.rpi.edu, tale@pawl.rpi.edu
guy@auspex.UUCP (Guy Harris) (02/05/89)
>Anything you do will be pretty much what "file" does,
>namely, inspect yay many bytes of the file to see if there is a
>known "magic number" header present or if there are non-displayable
>byte values present.

Note, of course, that:

1) the set of known magic numbers is not "constant" in any sense, so you have to pick a set and hope it's sufficient (or let "file" do it, if your system supports an S5-style "file", and make sure the "/etc/magic" file is up-to-date and complete);

2) you should use "isprint" to determine what is a "non-displayable byte value"; do NOT assume that any byte with its 8th bit set is necessarily non-printable (actually, if you have files with character codes longer than 1 byte, e.g. files containing Kanji, it gets even more complicated).
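A minimal C sketch of the "read some bytes and guess" approach, using isprint() as suggested in point 2. The function names looks_like_text and file_looks_like_text and the 512-byte sample size are illustrative assumptions, not what any particular "file" does, and multi-byte encodings such as Kanji are not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if every byte in buf is printable
 * (per isprint in the current locale) or whitespace, 0 otherwise.
 * An empty buffer counts as text. */
int looks_like_text(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* buf[i] is unsigned char, so it is a valid ctype argument */
        if (!isprint(buf[i]) && !isspace(buf[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical wrapper: samples the first 512 bytes of the file.
 * Returns 1 (looks like text), 0 (does not), or -1 if the file
 * cannot be opened. The sample size is an arbitrary choice. */
int file_looks_like_text(const char *path)
{
    unsigned char buf[512];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return looks_like_text(buf, n);
}
```

For isprint() to accept 8-bit characters that the user's locale considers printable, the calling program would need to call setlocale(LC_CTYPE, "") first; in the default "C" locale only 7-bit ASCII counts as printable.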
debra@alice.UUCP (Paul De Bra) (02/05/89)
In article <192@henry.ece.utexas.edu> po@volta.ece.utexas.edu () writes: >In my program, I am using opendir() to read in the names of >text files from a directory. >How can I tell whether a file is text or an object file ? >Is there a better way than using : > system("file filename > /tmp/tempfile") The only way is to look at the contents of the file, which is what the utility "file" does too. So read a number of bytes from the file, and then guess, depending on what you see. Paul. -- ------------------------------------------------------ |debra@research.att.com | uunet!research!debra | ------------------------------------------------------
leo@philmds.UUCP (Leo de Wit) (02/07/89)
In article <192@henry.ece.utexas.edu> po@volta.ece.utexas.edu () writes: |In my program, I am using opendir() to read in the names of |text files from a directory. |How can I tell whether a file is text or an object file ? |Is there a better way than using : | system("file filename > /tmp/tempfile") Since Unix does not have the notion of a file type (at least not like V..) you'll end up doing something equivalent to what 'file' does. Using file perhaps enhances portability. Try as a variant: pp = popen("exec file filename","r"); to get the lines of 'file' into your program (and use 'pclose', not 'fclose' to close the stream). You can even grep for 'text' in the output from file. Leo.
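A rough C sketch of the popen() variant, assuming a POSIX system with "file" on the PATH. The function name file_says_text, the buffer sizes, and the single-quoting of the file name are illustrative assumptions; since the name is pasted into a shell command, this sketch simply refuses names containing a single quote.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: runs "file" on path via popen() and looks for
 * the word "text" in its description.
 * Returns 1 if found, 0 if not, -1 on any error. */
int file_says_text(const char *path)
{
    char cmd[1024];
    char line[1024];
    size_t plen = strlen(path);
    FILE *pp;
    int n;
    int is_text = 0;

    /* The name goes inside single quotes in a shell command, so a
     * name containing a single quote is rejected outright. */
    if (strchr(path, '\'') != NULL)
        return -1;

    n = snprintf(cmd, sizeof cmd, "exec file '%s'", path);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;

    pp = popen(cmd, "r");
    if (pp == NULL)
        return -1;

    while (fgets(line, sizeof line, pp) != NULL) {
        const char *desc = line;

        /* "file" normally prints the name first; skip it so that a
         * name such as "context.o" does not match by accident. */
        if (strncmp(line, path, plen) == 0)
            desc = line + plen;
        if (strstr(desc, "text") != NULL)
            is_text = 1;
    }

    /* A popen() stream must be closed with pclose(), not fclose(). */
    if (pclose(pp) == -1)
        return -1;
    return is_text;
}
```

This avoids the temporary file entirely. The caveats raised elsewhere in the thread still apply: the answer is only as good as the guess that "file" makes.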
dg@lakart.UUCP (David Goodenough) (02/10/89)
tale@pawl.rpi.edu (David C Lawrence) sez: Stuff about file(1) deleted > (Aside: I am curious how it determines something is English > text rather than just ascii text.) I'd hazard a guess that it looks at the letter distributions. English has well defined (well fairly well defined) ratios of letters. So you count how many E's, T's etc. etc. occur, see how close you are to the "standard". If you are close, say it's English, else say it's ascii. This may be wrong - those in the know are welcome to correct me, but it's one possibility that could be made to work. -- dg@lakart.UUCP - David Goodenough +---+ IHS | +-+-+ ....... !harvard!xait!lakart!dg +-+-+ | AKA: dg%lakart.uucp@xait.xerox.com +---+
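Purely to illustrate the guess above (and not a description of what any real "file" does), here is a small C sketch that measures what share of the letters in a string belong to the set "etaoin", which are commonly cited as the most frequent letters in English. The function name common_letter_percent and the choice of letter set are assumptions; any cutoff for "close enough to English" would have to be picked by experiment.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns the percentage (0..100) of alphabetic
 * characters in s that are one of e, t, a, o, i, n (case-insensitive).
 * Returns 0 when s contains no letters at all. */
int common_letter_percent(const char *s)
{
    long letters = 0;
    long common = 0;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;

        if (!isalpha(c))
            continue;
        letters++;
        if (strchr("etaoin", tolower(c)) != NULL)
            common++;
    }
    return letters == 0 ? 0 : (int)(common * 100 / letters);
}
```

A caller could compare the result against a threshold chosen from sample files; source code and tabular data would be expected to score differently from prose, but how reliably is an open question.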
mchinni@ardec.arpa (Michael J. Chinni, SMCAR-CCS-E) (02/11/89)
From: po@volta.ece.utexas.edu:
> Is there a better way than using :
> system("file filename > /tmp/tempfile")

Better in what sense? If better than using "file", not really (for reasons mentioned in other responses). If better than using "system" and 'fopen("/tmp/tempfile","r");', try using "popen" (i.e. popen("file filename","r"); ).

Michael J. Chinni (<mchinni@ardec.arpa>)
US Army ARDEC
Picatinny Arsenal, New Jersey
bph@buengc.BU.EDU (Blair P. Houghton) (02/11/89)
In article <949@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes: > >Note, of course, that: > > 1) the set of known magic numbers is not "constant" in any > sense, so you have to pick a set and hope it's sufficient (or > let "file" do it, if your system supports an S5-style "file", > and make sure the "/etc/magic" file is up-to-date and > complete); Hokay, mah innorance is REALLY gonna show, now. At least this is c.u.q and not c.u.w... Just what are magic numbers for and of what significance is the /etc/magic file. Every once in a while I get an error saying something about bad magic numbers or whatnot, and I usually just punt. What's the scoop? --Blair "Amaze your friends and coworkers: alias Presto-chango 'mv -f * /bit/bucket'"
guy@auspex.UUCP (Guy Harris) (02/13/89)
>Just what are magic numbers for

Some software subsystems mark files of some format that they know about with some particular number or set of characters, usually at the front of the file.

The name, at least as I remember it being used in UNIX systems, originally referred to stuff stuck at the front of object and executable files that flagged the type of executable file. Back in the dark ages, I think an executable file began with a PDP-11 branch instruction that jumped around a header that specified the size of the code+data in the file; the instruction was a 16-bit number, value octal 407.

When separate text&data, and split I&D, executables were added, "407" as the first 16 bits of the executable file was treated as a flag indicating that the image was a non-separate text&data image, and 410 was used for separate text&data and 411 (? - I haven't dealt with PDP-11s in ages...) meant separate I&D. (Also, the header was no longer part of the image itself, since 410 and 411 would branch further than 407 did and I guess they thought it was silly to pad the header to cope with this...).

Anyway, the term "magic number" then applied to other flags stuck at the front of files by subsystems; e.g., 0177555 (word size) for a very old archive format, 0177545 (16 bits) for an older archive format, "!<arch>\n" as a string for the "modern" archive format (4.xBSD/S5R2 and later), etc.

>and of what significance is the /etc/magic file.

The S5 version of the "file" command (S5R2 and later, anyway) has a file "/etc/magic" that it can read to find out some of the magic numbers/strings it should look for; that way, you can extend its repertoire without having to hack the source code. SunOS 3.2 and later use a "file" based on this, with some extensions.

>Every once in a while I get an error saying something about bad
>magic numbers or whatnot, and I usually just punt. What's the scoop?
That probably means you tried to link or execute a file that the linker or the "exec" call didn't recognize as an object or executable file, because it didn't have a proper magic number at the front of it; hence, "bad magic number".
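To make the idea concrete, here is a small C sketch that checks the start of a buffer against the magic numbers named above. The function name classify_magic and the label strings are illustrative; the 16-bit values are read low byte first on the assumption that the file was written in PDP-11 byte order, and the meaning attached to 0411 carries the same uncertainty as in the text. A real tool would consult a much larger table, such as /etc/magic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a short description if the first
 * bytes of buf match one of the magic numbers discussed above,
 * or NULL if nothing matches. */
const char *classify_magic(const unsigned char *buf, size_t n)
{
    /* String magic for the "modern" archive format. */
    if (n >= 8 && memcmp(buf, "!<arch>\n", 8) == 0)
        return "archive";

    if (n >= 2) {
        /* Assumption: 16-bit word stored low byte first (PDP-11). */
        unsigned word = (unsigned)buf[0] | ((unsigned)buf[1] << 8);

        switch (word) {
        case 0407:    return "executable, non-separate text&data";
        case 0410:    return "executable, separate text&data";
        case 0411:    return "executable, separate I&D";
        case 0177545: return "older archive";
        case 0177555: return "very old archive";
        }
    }
    return NULL;
}
```

A linker or exec() performing a check of this kind and finding no match is what produces a "bad magic number" complaint.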