mrice@caen.engin.umich.edu (Michael Rice) (08/10/90)
What I want to do is store some type of file identifier for each file, so that when I get a new file I can create this identifier and check it against the identifiers for my other files. If they match, then I would know that the file is a duplicate. The way I implemented this (the first thing that came into my mind) was to add up the first X bytes and use the sum as the identifier (checksum?). With X at about 1000 bytes, I found 5 matches that really were duplicates and 1 set that matched but were not duplicates. This was with about 100 files. Any ideas on a better way to do this, more accurate, etc.? I am doing this in Turbo C if that matters. Any help is appreciated.

Mike
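For reference, the scheme described above might look roughly like the following. This is only a sketch, not the poster's actual code: the names `sum_bytes` and `sum_file` and the `limit` parameter (the "X" in the post) are hypothetical. It also hints at one reason the sum can report a false match: addition ignores byte order, so any two inputs with the same bytes in a different order produce the same value.

```c
#include <stdio.h>

/* Hypothetical helper: add up at most `limit` leading bytes of a buffer.
 * This is the additive identifier described in the post. */
unsigned long sum_bytes(const unsigned char *buf, size_t len, size_t limit)
{
    unsigned long sum = 0;
    size_t n = (len < limit) ? len : limit;
    size_t i;

    for (i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Hypothetical helper: the same sum over the first `limit` bytes of a file.
 * Returns 0 and stores the sum in *out on success, -1 if the file
 * cannot be opened. */
int sum_file(const char *path, size_t limit, unsigned long *out)
{
    FILE *fp = fopen(path, "rb");   /* binary mode matters on MS-DOS */
    unsigned long sum = 0;
    size_t count = 0;
    int c;

    if (fp == NULL)
        return -1;
    while (count < limit && (c = getc(fp)) != EOF) {
        sum += (unsigned char)c;
        count++;
    }
    fclose(fp);
    *out = sum;
    return 0;
}
```

Calling `sum_bytes` on the two-byte strings "ab" and "ba" gives the same result, which is the kind of collision a plain sum cannot distinguish.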
few@quad1.quad.com (Frank Whaley) (08/10/90)
In article <1990Aug9.193919.2996@caen.engin.umich.edu>, mrice@caen.engin.umich.edu (Michael Rice) writes:
>What I want to do is store some type of file identifier for each file
>so when I get a new file I can create this identifier and check it
>against the identifiers for my other files. If they match then I would
>know that file is a duplicate.

On Unix, the stat() function produces a device and inode number pair that is unique for each file. Unfortunately, MS-DOS implementations of stat() typically do not produce unique numbers. Some time back I posted a replacement stat() function which used the "Parent Directory Cluster Number" and "Entry Count In Directory" fields from the ffblk structure to produce a unique inode number. This has worked well enough for me -- I haven't received any bug reports about missing "File in use" messages since the change.

The following code demonstrates fetching these magic numbers.

-----
#include <dir.h>
#include <stdio.h>      /* printf() in the TEST driver */

extern unsigned char _osmajor;
extern unsigned char _osminor;

/*
 *  devino - simulate fetching device and inode for MSDOS
 *
 *  Returns 0 on success, -1 if the file cannot be found.
 */
int
devino(filename, device, inode)
char *filename;
int *device;
long *inode;
{
    int *ec;        /*  entry count in directory  */
    int *pc;        /*  parent dir cluster number  */
    int osver = (_osmajor * 10) + _osminor;
    struct ffblk ff;

    /*  get information from directory entry  */
    if ( findfirst(filename, &ff, 0x17) != 0 )
    {
        return ( -1 );
    }

    /*  offsets into the reserved area depend on the DOS version  */
    ec = (int *)&ff.ff_reserved[13];
    pc = (int *)&ff.ff_reserved[(osver >= 32) ? 15 : 19];

    *device = ff.ff_reserved[(osver > 20) ? 0 : 1];
    *inode = ((long)*pc << 16) + *ec;

    return ( 0 );
}

#ifdef TEST
int
main(argc, argv)
int argc;
char *argv[];
{
    int i;
    int device;
    long inode;

    for ( i = 1; i < argc; i++ )
    {
        if ( devino(argv[i], &device, &inode) < 0 )
        {
            printf("%s: devino error\n", argv[i]);
        }
        else
        {
            printf("%s: %d %ld\n", argv[i], device, inode);
        }
    }

    return ( 0 );
}
#endif

--
Frank Whaley
Senior Development Engineer
Quadratron Systems Incorporated
few@quad1.quad.com
uunet!ccicpg!quad1!few

Water separates the people of the world;
Wine unites them.
jpd@pc.usl.edu (Dugal James P.) (08/11/90)
In article <1990Aug9.193919.2996@caen.engin.umich.edu> mrice@caen.engin.umich.edu (Michael Rice) writes:
>What I want to do is store some type of file identifier for each file
>so when I get a new file I can create this identifier and check it

Mike, how about using the stat() function and concatenating st_dev with st_ino? If this doesn't work, I have a function I wrote for Lattice C several years ago that does work. However, the st_ino value may well change after a disk reorganization.

--James

--
-- James Dugal, N5KNX              Internet: jpd@usl.edu
Associate Director                 Ham packet: n5knx@k5arh
Computing Center                   US Mail: PO Box 42770 Lafayette, LA 70504
University of Southwestern LA.     Tel. 318-231-6417
U.S.A.
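A minimal sketch of the st_dev/st_ino suggestion above, assuming a POSIX system where stat() fills in both fields meaningfully (which, as noted earlier in the thread, MS-DOS compilers of the time typically did not). The helper name `same_file` is hypothetical. Rather than concatenating the two values into one number, it compares the pair directly, which amounts to the same test. Note that this tells you whether two paths name the same file on disk; it says nothing about separate files that happen to have identical contents.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths refer to the same file
 * (same st_dev and st_ino), 0 if they differ, -1 if stat() fails on
 * either path. */
int same_file(const char *a, const char *b)
{
    struct stat sa;
    struct stat sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;

    return (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino) ? 1 : 0;
}
```

Storing the (st_dev, st_ino) pair alongside each known file would let a new file be checked with one stat() call, subject to the caveat above that the inode value may change after a disk reorganization.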