[comp.os.msdos.programmer] Unique file determination

mrice@caen.engin.umich.edu (Michael Rice) (08/10/90)

What I want to do is store some type of file identifier for each file
so that when I get a new file I can create this identifier and check it
against the identifiers for my other files.  If they match, then I would
know that the file is a duplicate.

The way I implemented this (the first thing that came to
mind) was to add up the first X bytes of each file and use the sum
as the identifier (a checksum?).  With X at about 1000 bytes, I
found 5 sets of files that really were duplicates and 1 set that
matched but were not duplicates.  This was with about 100 files.
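A minimal sketch of the checksum described above, assuming the file's bytes have already been read into memory (for example with fread()); the name sum_first_bytes is hypothetical. It makes two weaknesses of a plain sum concrete: addition ignores byte order, so rearranged bytes give the same value, and nothing past the first X bytes is ever examined. Either could produce a false match like the one reported.

```c
#include <stddef.h>

/*
 * sum_first_bytes - hypothetical helper illustrating the checksum
 * described above: add up the first `limit` bytes of a buffer that
 * already holds a file's contents.
 */
unsigned long
sum_first_bytes(const unsigned char *buf, size_t len, size_t limit)
{
	unsigned long sum = 0;
	size_t n = (len < limit) ? len : limit;	/*  bytes actually summed  */
	size_t i;

	for ( i = 0; i < n; i++ )
	{
		sum += buf[i];
	}
	return ( sum );
}
```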

Any ideas on a better, more accurate way to do this?
I am doing this in Turbo C, if that matters.
Any help is appreciated.
Mike
 

few@quad1.quad.com (Frank Whaley) (08/10/90)

In article <1990Aug9.193919.2996@caen.engin.umich.edu>,
mrice@caen.engin.umich.edu (Michael Rice) writes:
>What I want to do is store some type of file identifier for each file
>so that when I get a new file I can create this identifier and check it
>against the identifiers for my other files.  If they match, then I would
>know that the file is a duplicate.

On Unix, the stat() function returns a device and inode number pair
that is unique to each file.  Unfortunately, MS-DOS implementations of stat() typically
do not produce unique numbers.  Some time back I posted a replacement
stat() function which used the "Parent Directory Cluster Number" and
"Entry Count In Directory" fields from the ffblk structure to produce
a unique inode number.  This has worked well enough for me -- I haven't
received any bug reports about missing "File in use" messages since
the change.

The following code demonstrates fetching these magic numbers.
-----
#include <dir.h>

extern unsigned char _osmajor;
extern unsigned char _osminor;

/*
 *	devino - simulate fetching device and inode for MSDOS
 */
int
devino(filename, device, inode)
char *filename;
int *device;
long *inode;
{
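	unsigned *ec;	/*  entry count in directory  */
	unsigned *pc;	/*  parent dir cluster number  */
	int osver = (_osmajor * 10) + _osminor;
	struct ffblk ff;

	/*  get information from directory entry; attribute 0x17 =
	    read-only, hidden, system, and directory  */
	if ( findfirst(filename, &ff, 0x17) != 0 )
	{
		return ( -1 );
	}

	/*  field offsets within ff_reserved depend on the DOS version  */
	ec = (unsigned *)&ff.ff_reserved[13];
	pc = (unsigned *)&ff.ff_reserved[(osver >= 32) ? 15 : 19];

	*device = ff.ff_reserved[(osver > 20) ? 0 : 1];
	/*  unsigned arithmetic keeps a set high bit in either word
	    from sign-extending into the other  */
	*inode = (long)(((unsigned long)*pc << 16) | *ec);

	return ( 0 );
}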

#ifdef TEST
#include <stdio.h>

int
main(argc, argv)
int argc;
char *argv[];
{
	int i;
	int device;
	long inode;

	for ( i = 1; i < argc; i++ )
	{
		if ( devino(argv[i], &device, &inode) < 0 )
		{
			printf("%s: devino error\n", argv[i]);
		}
		else
		{
			printf("%s: %d %ld\n", argv[i], device, inode);
		}
	}
	return ( 0 );
}
#endif
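The packing step in devino() can be isolated as a small, portable sketch. It assumes C99 <stdint.h>, which Turbo C does not provide, and the names pack_inode, inode_cluster and inode_entry are hypothetical. Treating both 16-bit fields as unsigned avoids the sign extension a signed int would introduce when the cluster number or entry count has its high bit set, and the two fields can be recovered from the packed value.

```c
#include <stdint.h>

/*
 * pack_inode - hypothetical helper: place the parent directory cluster
 * number in the high 16 bits and the entry count in the low 16 bits,
 * mirroring the combination devino() performs.
 */
uint32_t
pack_inode(uint16_t parent_cluster, uint16_t entry_count)
{
	return ( ((uint32_t)parent_cluster << 16) | entry_count );
}

/*  recover the parent directory cluster number  */
uint16_t
inode_cluster(uint32_t inode)
{
	return ( (uint16_t)(inode >> 16) );
}

/*  recover the entry count within that directory  */
uint16_t
inode_entry(uint32_t inode)
{
	return ( (uint16_t)(inode & 0xFFFFu) );
}
```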
-- 
Frank Whaley
Senior Development Engineer
Quadratron Systems Incorporated
few@quad1.quad.com
uunet!ccicpg!quad1!few

Water separates the people of the world;
Wine unites them.

jpd@pc.usl.edu (Dugal James P.) (08/11/90)

In article <1990Aug9.193919.2996@caen.engin.umich.edu> mrice@caen.engin.umich.edu (Michael Rice) writes:
>What I want to do is store some type of file identifier for each file
>so that when I get a new file I can create this identifier and check it
Mike, how about using the stat() function and concatenating st_dev with
st_ino?  If this doesn't work, I have a function I wrote for Lattice C
several years ago that does work.  However, the st_ino value may well change
after a disk reorganization.
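James's suggestion can be sketched for a POSIX system, where stat() does fill in meaningful st_dev and st_ino values (under MS-DOS compilers these may not be unique, as noted earlier in the thread). The names file_key and same_file are hypothetical. The key is built as a printable "device:inode" string so it could be stored alongside other files' identifiers. These fields identify one file reached through different paths; two separate copies with identical contents have different inodes, so this does not detect duplicate data.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * file_key - hypothetical helper: write "device:inode" for path into buf.
 * Returns 0 on success, -1 if stat() fails or buf is too small.
 */
int
file_key(const char *path, char *buf, size_t size)
{
	struct stat st;
	int n;

	if ( stat(path, &st) != 0 )
		return ( -1 );

	/*  concatenate st_dev and st_ino as decimal numbers  */
	n = snprintf(buf, size, "%llu:%llu",
		(unsigned long long)st.st_dev, (unsigned long long)st.st_ino);
	if ( n < 0 || (size_t)n >= size )
		return ( -1 );
	return ( 0 );
}

/*
 * same_file - hypothetical helper: 1 if both paths name the same file
 * (same device and inode), 0 if they differ or either stat() fails.
 */
int
same_file(const char *a, const char *b)
{
	char ka[64], kb[64];

	if ( file_key(a, ka, sizeof ka) != 0 || file_key(b, kb, sizeof kb) != 0 )
		return ( 0 );
	return ( strcmp(ka, kb) == 0 );
}
```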

--James
-- 
-- James Dugal,	N5KNX		Internet: jpd@usl.edu
Associate Director		Ham packet: n5knx@k5arh
Computing Center		US Mail: PO Box 42770  Lafayette, LA  70504
University of Southwestern LA.	Tel. 318-231-6417	U.S.A.