ast@cs.vu.nl (Andy Tanenbaum) (05/09/88)
Here is a primitive file.c If anyone thinks he or she can tell the
difference between French and English, here's your big chance. I knocked
this off in an hour, so it's not real smart. Improvements are welcome.
----------------------------- file.c ----------------------------------
/* file - report on file type. Author: Andy Tanenbaum */
#include <blocksize.h>
#include <ar.h>
#include <stat.h>
#define A_OUT 001401 /* magic number for executables */
#define SPLIT 002040 /* second word on split I/D binaries */
#define XBITS 00111 /* rwXrwXrwX (x bits in the mode) */
#define ENGLISH 25 /* cutoff for determining if text is Eng.*/
char buf[BLOCK_SIZE];
main(argc, argv)
int argc;
char *argv[];
{
/* This program uses some heuristics to try to guess about a file type by
* looking at its contents.
*/
int i;
if (argc < 2) usage();
for (i = 1; i < argc; i++) file(argv[i]);
}
file(name)
char *name;
{
int i, fd, n, magic, second, mode, nonascii, special, funnypct, etaoins;
long engpct;
char c;
struct stat st_buf;
printf("%s: ", name);
/* Open the file, stat it, and read in 1 block. */
fd = open(name, 0);
if (fd < 0) {
printf("cannot open\n");
return;
}
n = fstat(fd, &st_buf);
if (n < 0) {
printf("cannot stat\n");
close(fd);
return;
}
mode = st_buf.st_mode;
n = read(fd, buf, BLOCK_SIZE);
if (n < 0) {
printf("cannot read\n");
close(fd);
return;
}
/* Check to see if file is an archive. */
magic = (buf[1]<<8) | (buf[0]&0377);
if (magic == ARMAG) {
printf("archive\n");
close(fd);
return;
}
/* Check to see if file is an executable binary. */
if (magic == A_OUT) {
/* File is executable. Check for split I/D. */
printf("executable ");
second = (buf[3]<<8) | (buf[2]&0377);
if (second == SPLIT)
printf(" separate I & D space\n");
else
printf(" combined I & D space\n");
close(fd);
return;
}
/* Check to see if file is a shell script. */
if (mode & XBITS) {
/* Not a binary, but executable. Probably a shell script. */
printf("shell script\n");
close(fd);
return;
}
/* Check for ASCII data and certain punctuation. */
nonascii = 0;
special = 0;
etaoins = 0;
for (i = 0; i < n; i++) {
c = buf[i];
if (c & 0200) nonascii++;
if (c == ';' || c == '{' || c == '}' || c == '#') special++;
if (c == '*' || c == '<' || c == '>' || c == '/') special++;
if (c >= 'A' && c <= 'Z') c = c - 'A' + 'a';
if (c == 'e' || c == 't' || c == 'a' || c == 'o') etaoins++;
if (c == 'i' || c == 'n' || c == 's') etaoins++;
}
if (nonascii == 0) {
/* File only contains ASCII characters. Continue processing. */
funnypct = 100 * special/n;
engpct = 100L * (long) etaoins/n;
if (funnypct > 1) {
printf("C program\n");
} else {
if (engpct > (long) ENGLISH)
printf("English text\n");
else
printf("ASCII text\n", engpct);
}
close(fd);
return;
}
/* Give up. Call it data. */
printf("data\n");
close(fd);
return;
}
usage()
{
printf("Usage: file name ...\n");
exit(1);
}henry@utzoo.uucp (Henry Spencer) (05/11/88)
Note that a fairly sophisticated implementation of file(1) appeared in
comp.sources.unix in February.
--
NASA is to spaceflight as | Henry Spencer @ U of Toronto Zoology
the Post Office is to mail. | {ihnp4,decvax,uunet!mnetor}!utzoo!henrynick@nswitgould.OZ (Nick Andrew) (05/12/88)
in article <707@ast.cs.vu.nl>, ast@cs.vu.nl (Andy Tanenbaum) says: > > Here is a primitive file.c If anyone thinks he or she can tell the > difference between French and English, here's your big chance. I knocked > this off in an hour, so it's not real smart. Improvements are welcome. > ... primitive file.c follows ... Some weeks ago, I ported the public domain file(1) from the comp.sources.unix distribution (v13i027 and v13i028). It works very well, though I didn't take the time to add definitions for Minix a.out files or archives. I seem to recall a few changes required in the Minix /usr/include files, most notably sys/types.h and sys/stat.h. I had already made these and some other changes to the library (eg adding John Gilmore's pd Getopt). The location of the "magic" file was changed to /usr/lib/magic, because my hard disk partition 1 is mounted on /usr (helps with portability) and /etc is still on the ramdisk. If there is demand, I'll post my changes to the net. Nick.
nick@nswitgould.OZ (Nick Andrew) (05/12/88)
in article <8550@nswitgould.OZ>, nick@nswitgould.OZ (Nick Andrew) says: ... > Some weeks ago, I ported the public domain file(1) from the ^^^^^^^^^^^^^ Should have been "Ian Darwin's (copyright but redistributable)" > and some other changes to the library (eg adding John Gilmore's pd Getopt). ^^^^^^^^^^^^^^^^^ Should have been "Henry Spencer's" Sorry about these problems with my attributions, folks.