ast@cs.vu.nl (Andy Tanenbaum) (05/09/88)
Here is a primitive file.c. If anyone thinks he or she can tell the
difference between French and English, here's your big chance. I knocked
this off in an hour, so it's not real smart. Improvements are welcome.

----------------------------- file.c ----------------------------------
/* file - report on file type.          Author: Andy Tanenbaum */

#include <blocksize.h>
#include <ar.h>
#include <stat.h>

#define A_OUT 001401            /* magic number for executables */
#define SPLIT 002040            /* second word on split I/D binaries */
#define XBITS 00111             /* rwXrwXrwX (x bits in the mode) */
#define ENGLISH 25              /* cutoff for determining if text is Eng.*/

char buf[BLOCK_SIZE];

main(argc, argv)
int argc;
char *argv[];
{
/* This program uses some heuristics to try to guess about a file type by
 * looking at its contents.
 */

  int i;

  if (argc < 2) usage();
  for (i = 1; i < argc; i++) file(argv[i]);
}

file(name)
char *name;
{
  int i, fd, n, magic, second, mode, nonascii, special, funnypct, etaoins;
  long engpct;
  char c;
  struct stat st_buf;

  printf("%s: ", name);

  /* Open the file, stat it, and read in 1 block. */
  fd = open(name, 0);
  if (fd < 0) {
        printf("cannot open\n");
        return;
  }

  n = fstat(fd, &st_buf);
  if (n < 0) {
        printf("cannot stat\n");
        close(fd);
        return;
  }
  mode = st_buf.st_mode;

  n = read(fd, buf, BLOCK_SIZE);
  if (n < 0) {
        printf("cannot read\n");
        close(fd);
        return;
  }

  /* Check to see if file is an archive. */
  magic = (buf[1]<<8) | (buf[0]&0377);
  if (magic == ARMAG) {
        printf("archive\n");
        close(fd);
        return;
  }

  /* Check to see if file is an executable binary. */
  if (magic == A_OUT) {
        /* File is executable.  Check for split I/D. */
        printf("executable ");
        second = (buf[3]<<8) | (buf[2]&0377);
        if (second == SPLIT)
                printf(" separate I & D space\n");
        else
                printf(" combined I & D space\n");
        close(fd);
        return;
  }

  /* Check to see if file is a shell script. */
  if (mode & XBITS) {
        /* Not a binary, but executable.  Probably a shell script. */
        printf("shell script\n");
        close(fd);
        return;
  }

  /* Check for ASCII data and certain punctuation. */
  nonascii = 0;
  special = 0;
  etaoins = 0;
  for (i = 0; i < n; i++) {
        c = buf[i];
        if (c & 0200) nonascii++;
        if (c == ';' || c == '{' || c == '}' || c == '#') special++;
        if (c == '*' || c == '<' || c == '>' || c == '/') special++;
        if (c >= 'A' && c <= 'Z') c = c - 'A' + 'a';
        if (c == 'e' || c == 't' || c == 'a' || c == 'o') etaoins++;
        if (c == 'i' || c == 'n' || c == 's') etaoins++;
  }

  if (nonascii == 0) {
        /* File only contains ASCII characters.  Continue processing. */
        funnypct = 100 * special/n;
        engpct = 100L * (long) etaoins/n;
        if (funnypct > 1) {
                printf("C program\n");
        } else {
                if (engpct > (long) ENGLISH)
                        printf("English text\n");
                else
                        printf("ASCII text\n", engpct);
        }
        close(fd);
        return;
  }

  /* Give up.  Call it data. */
  printf("data\n");
  close(fd);
  return;
}

usage()
{
  printf("Usage: file name ...\n");
  exit(1);
}
henry@utzoo.uucp (Henry Spencer) (05/11/88)
Note that a fairly sophisticated implementation of file(1) appeared in
comp.sources.unix in February.
--
NASA is to spaceflight as            |  Henry Spencer @ U of Toronto Zoology
the Post Office is to mail.          | {ihnp4,decvax,uunet!mnetor}!utzoo!henry
nick@nswitgould.OZ (Nick Andrew) (05/12/88)
in article <707@ast.cs.vu.nl>, ast@cs.vu.nl (Andy Tanenbaum) says:
>
> Here is a primitive file.c. If anyone thinks he or she can tell the
> difference between French and English, here's your big chance. I knocked
> this off in an hour, so it's not real smart. Improvements are welcome.
> ... primitive file.c follows ...

Some weeks ago, I ported the public domain file(1) from the
comp.sources.unix distribution (v13i027 and v13i028). It works very well,
though I didn't take the time to add definitions for Minix a.out files or
archives.

I seem to recall that a few changes were required in the Minix /usr/include
files, most notably sys/types.h and sys/stat.h. I had already made these
and some other changes to the library (eg adding John Gilmore's pd Getopt).

The location of the "magic" file was changed to /usr/lib/magic, because my
hard disk partition 1 is mounted on /usr (helps with portability) and /etc
is still on the ramdisk.

If there is demand, I'll post my changes to the net.

Nick.
nick@nswitgould.OZ (Nick Andrew) (05/12/88)
in article <8550@nswitgould.OZ>, nick@nswitgould.OZ (Nick Andrew) says:
...
> Some weeks ago, I ported the public domain file(1) from the
                               ^^^^^^^^^^^^^
Should have been "Ian Darwin's (copyright but redistributable)"

> and some other changes to the library (eg adding John Gilmore's pd Getopt).
                                                   ^^^^^^^^^^^^^^^^^
Should have been "Henry Spencer's"

Sorry about these problems with my attributions, folks.