[comp.sources.misc] v02i060: duonly - Program to display du output ONLY for a directory

dennis@rlgvax.UUCP (Dennis.Bednar) (02/22/88)

Comp.sources.misc: Volume 2, Issue 60
Submitted-By: "Dennis.Bednar" <dennis@rlgvax.UUCP>
Archive-Name: duonly

SHORT blurb from the duonly.help file, just to summarize what
this program is for:

Prints disk usage block sizes only for directories,
but NOT including sizes contributed by sub-directories.

This tool was developed because when viewing the output of a
regular du, it was difficult to see which directories were
the real culprits occupying the most disk blocks.
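
The arithmetic behind that summary can be sketched in a few lines of C.
This is not code from the archive below, only a minimal illustration: the
names du_entry, is_direct_child and own_blocks are made up for this sketch,
and it assumes paths are written without a trailing slash, the way du
prints them.  A directory's own size is its du total minus the du totals
of its immediate subdirectories.

```c
#include <stddef.h>
#include <string.h>

/* One line of du output: cumulative block count and directory path.
 * (Hypothetical type, used only in this sketch.) */
struct du_entry {
    unsigned long blocks;
    const char *path;
};

/* Return nonzero if child is an immediate subdirectory of parent.
 * Assumes parent has no trailing slash. */
static int is_direct_child(const char *parent, const char *child)
{
    size_t n = strlen(parent);

    /* child must be parent followed by a slash ... */
    if (strncmp(parent, child, n) != 0 || child[n] != '/')
        return 0;
    /* ... and then exactly one more non-empty component */
    return child[n + 1] != '\0' && strchr(child + n + 1, '/') == NULL;
}

/* Blocks used by entry i itself: its du total minus the totals
 * of its immediate subdirectories. */
long own_blocks(const struct du_entry *e, size_t count, size_t i)
{
    long total = (long)e[i].blocks;
    size_t k;

    for (k = 0; k < count; k++)
        if (is_direct_child(e[i].path, e[k].path))
            total -= (long)e[k].blocks;
    return total;
}
```

Given the du lines "5 ./a/b", "8 ./a", "3 ./c" and "20 .", own_blocks()
yields 5 for ./a/b, 3 for ./a and 9 for the top directory.  The program
in the archive gets the same figures by building a tree rather than by
comparing path strings.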

#--------------- CUT HERE ---------------
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#	duonly.c
#	duonly.help
# This archive created: Wed Feb 17 20:31:04 EST 1988
#
if test -f duonly.c
then
echo shar: will not over-write existing file 'duonly.c'
else
echo x - duonly.c
# ............    F  I   L   E      B  E  G  .......... duonly.c
cat << '\SHAR_EOF' > duonly.c
/*
 * duonly.c
 * dennis bednar Feb 17, 1988
 *
 * Reads du output from stdin, and prints for each directory ONLY
 * the space used within that directory itself.  The size is the
 * number of blocks used by the directory itself, plus any
 * files contained directly within that directory.
 *
 * NOTE:
 * Much of the source code in dugraph.c (contributed to comp.sources.misc)
 * is borrowed "as is".  The functions I added are at the end.
 * I also added the "father" link.  All of the brothers of the
 * first son, including the first son, have the same father node.
 * That is, if we have /a/b, and /a/c, then "b" and "c" are brothers,
 * and "b"'s and "c"'s father are both "a".  I also added a new
 * variable "me_size" which is the size of the directory, not including
 * the sizes of the sons immediately below it.
 *
 * NOTE: If you do "du /dir1/dir2/dir3 | duonly", then the sizes of
 * /dir1 and /dir1/dir2 will be 0, since they were NOT included in
 * the original output of du!!!
 *
 * NOTE: if you do "du /dir1/dir2/dir3| duonly", then the root
 * placeholder points to a cell for "", and NOT to "dir1" as
 * you might expect.  This is because of how read_input() parses
 * its input.  It is not a problem per se, except that the node
 * containing a blank name should not be printed in this case.
 * So, the solution is to add an extra check and avoid the print
 * when the name is "".  By the way the root for "" only occurs
 * because the pathnames are absolute (begin with a /).
 *
 * NOTE: although the directory names in the output are sorted, there is
 * one minor anomaly that occurs when you have input as follows
 * (sizes omitted, since irrelevant):
 * ../dir
 * ../dir/RCS
 * ../dir.ext
 * The important point is that we have both a "/" and a "." after the
 * same name "dir".
 * A sort would place the "." before the "/" (ie line 3 before line 2),
 * but the sort performed by du is only within each directory level, so
 * that duonly would output line 2 before line 3 (since ../dir and
 * ../dir.ext are sorted brothers, and ../dir/RCS is a sorted son of
 * ../dir) !!
 *
 */
#if 0
Path: rlgvax!sundc!seismo!uunet!husc6!think!ames!necntc!ncoast!allbery
From: drw@culdev1.UUCP (Dale Worley)
Newsgroups: comp.sources.misc
Subject: Prettyprint a du listing
Message-ID: <6803@ncoast.UUCP>
Date: 17 Dec 87 02:33:54 GMT
Sender: allbery@ncoast.UUCP
Organization: Cullinet Software, Westwood, MA, USA
Lines: 447
Approved: allbery@ncoast.UUCP
X-Archive: comp.sources.misc/8712/8

I've always wanted to get a du listing that shows how the space is
being used graphically.  I finally wrote a program to digest a du
listing and print out a tree, where each directory occupies lines
proportionally to how much space the files in it consume.

To run it, type "du | dugraph" or some such.  The listing for each
directory starts with a blank space showing space occupied by files
directly in that directory, then the subtrees for each subdirectory
(in descending order of size).  If the subdirectories at the bottom
get so small that they occupy less than 1 line each, they are all
merged into an entry "(etc.)".

The entire listing always occupies 60 lines (the value of 'length').
This program has tab-width = 5.
--------------------------dugraph.c---------------------------------
#endif
/* program to make a pretty graph out of a du report */

#include <stdio.h>
#include <stdlib.h>	/* malloc(), exit() */
#include <string.h>

/* number of lines the listing should occupy */
int	length = 60;
/* message for suppressed directories */
#define	SUPPRESSED	"(etc.)"

/* format of a tree node */
struct node {
			struct node	*lson;	/* left son */
			struct node	*rbrother;/* right brother */
			struct node	*father; /* parent node, up ptr */
			unsigned long	size;	/* size reported by du, incl. sons */
			unsigned long	me_size;	/* size of directory only */
			int			loc;		/* location we will print it at */
			int			print_col;/* column to print name in */
			int			print_limit;
								/* location we can't print on or
								 * after */
			int			last;	/* are we last son of our father? */
			char			name[1];	/* name */
		  };

/* root of the tree */
struct node	*root = NULL;
/* total size of things listed */
unsigned long	total_size;
/* current line number we are on (0-origin) */
int			current_line = 0;
/* list of where to put bars */
int			bar_list[50];
/* number of bars in the list */
int			bar_count = 0;

/* declare functions */
void			read_input();
struct node	*insert_in_tree();
void			dfs();
void			dfs1();
void			missing_sizes();
void			sort_size();	/* sort tree by size */
void			sort_name();	/* sort tree alphabetically */
void			calc_loc();
void			blank();
void			mark_last();
void			calc_pc();
void			output();
void			position();
void			show_node();
void			my_space();

main()
	{
	struct node	*t;	/* scratch */

	/* read the input and form a tree */
	read_input();
	root->size = 0;
	/* put sizes on entries that have none */
	dfs(NULL, missing_sizes);
	/* sort each directory */
	dfs(sort_name, NULL);

	/* for each directory, subtract the space used up by
	 * all children one level below me, so that we
	 * can tell how much space is occupied by this
	 * directory only.
	 * IMPORTANT: my_space() only reads "size" and writes "me_size",
	 * so it must run after missing_sizes() has filled in the sizes;
	 * the traversal order itself does not matter.
	 */
	dfs(my_space, NULL);

	/* print the tree */
	dfs(show_node, NULL);
	exit(0);

	/* unused left-over code I might need later. */

	/* calculate the total size */
	total_size = 0;
	for (t = root->lson; t != NULL; t = t->rbrother)
		total_size += t->size;
	/* calculate the location of each directory */
	/* blank out subdirectories that get scrunched together at the bottom */
	root->print_limit = length;
	dfs(calc_loc, blank);
	/* print out the tree */
	for (t = root->lson; t != NULL; t = t->rbrother)
		{
		/* mark the last son of each directory */
		/* figure out the print columns */
		t->print_col = 0;
		dfs1(calc_pc, mark_last, t);
		dfs1(output, NULL, t);
		}
	/* put blank space at end */
	position(length);
	}

/* read input and form a tree */
void read_input()
	{
	unsigned long	size;		/* size read from input */
	char			name[100];	/* directory name read from input */

	/* make the dummy node at the top of the tree */
	root = (struct node *)malloc(sizeof (struct node));
	root->name[0] = '\0';
	root->lson = NULL;
	root->rbrother = NULL;	/* the dummy root has no brothers */
	root->father = NULL;	/* if walking up the ladder, don't fall off ! */
	/* read the next line of input; stop at EOF or at the first line
	 * that does not parse, and never overrun the name buffer */
	while (fscanf(stdin, "%lu %99s", &size, name) == 2)
		{
		/* insert (or find) the directory in the tree and save its size */
		insert_in_tree(name)->size = size;
		}
	}

/* insert (or find) a directory in the tree */
struct node *insert_in_tree(name)
	char	*name;		/* name of the directory */
	{
	struct node	*t;	/* pointer for searching down through tree */
	char			*np;	/* points to next part of directory name to be
					 * examined */
	struct node	*t1;	/* scratch pointer */
	char			*np1;/* scratch pointer */

	/* read through the name, one directory-part at a time, and hunt
	 * down the tree, constructing nodes as needed */
	for (t = root, np = name; np != NULL; np = np1)
		{
		/* extract the next directory-part */
		if ((np1 = strchr(np, '/')) != NULL)
			{
			/* we found a slash, replace it with a null, and position
			 * np1 to point to the remainder of the name */
			/* can store node.name=="" when name begins with / */
			*np1++ = '\0';
			}
		/* else */
			/* we found no slash, so we are at the end of the name
			 * np1 has been set to NULL for us by strchr */
		/* search the sons of this node for a node with the proper name */
		for (t1 = t->lson; t1 != NULL && strcmp(t1->name, np) != 0;
				t1 = t1->rbrother)
			;
		/* did we find one? */
		if (t1 != NULL)
			/* yes, go to it */
			t = t1;
		else
			{
			/* no, make one */
			t1 = (struct node *)malloc(sizeof(struct node) + strlen(np));
			strcpy(t1->name, np);
			t1->lson = NULL;
			t1->rbrother = NULL;
			t1->father = t;
			t1->size = 0;
			/* insert it in tree */
			t1->rbrother = t->lson;
			t->lson = t1;
			t = t1;
			}
		}
	return t;
	}

/* depth-first-search routine */
void dfs(pre_routine, post_routine)
	void	(*pre_routine)();	/* routine to execute before scanning
						 * descendants */
	void	(*post_routine)();	/* routine to execute after scanning
						 * descendants */
	{
	dfs1(pre_routine, post_routine, root);
	}

/* depth-first-search service routine */
void dfs1(pre_routine, post_routine, t)
	void	(*pre_routine)();	/* routine to execute before scanning
						 * descendants */
	void	(*post_routine)();	/* routine to execute after scanning
						 * descendants */
	struct node *t;		/* node to operate on */
	{
	struct node *t1;		/* scratch pointer */

	/* if it exists, execute the pre-routine */
	if (pre_routine != NULL)
		pre_routine(t);
	/* call self on sons of this node */
	for (t1 = t->lson; t1 != NULL; t1 = t1->rbrother)
		dfs1(pre_routine, post_routine, t1);
	/* if it exists, execute the post-routine */
	if (post_routine != NULL)
		post_routine(t);
	}

/* add missing sizes */
void missing_sizes(t)
	struct node	*t;
	{
	struct node	*t1;		/* scratch pointer */
	unsigned long	s;		/* scratch */

	if (t->size == 0)
		{
		/* size is missing, we have to calculate it */
		s = 0;
		for (t1 = t->lson; t1 != NULL; t1 = t1->rbrother)
			s += t1->size;
		t->size = s;
		}
	}

/* sort the directories under a directory by size */
void sort_size(t)
	struct node	*t;
	{
	struct node	*p1, *p2, *p3, *pp;		/* scratch pointers */
	int			nodes, n;				/* scratch */

	/* count the number of nodes */
	nodes = 0;
	for (p1 = t->lson; p1 != NULL; p1 = p1->rbrother)
		nodes++;
	/* just a simple and inefficient bubble sort */
	for (n = 1; n < nodes; n++)
		for (p1 = NULL, p2 = t->lson, p3 = p2->rbrother; p3 != NULL;
				p1 = p2, p2 = p3, p3 = p3->rbrother)
			{
			if (p2->size < p3->size)
				{
				/* exchange the nodes p2 and p3 */
				pp = p3->rbrother;
				p3->rbrother = p2;
				p2->rbrother = pp;
				if (p1 != NULL)
					p1->rbrother = p3;
				else
					t->lson = p3;
				/* exchange the values of p2 and p3 */
				pp = p2;
				p2 = p3;
				p3 = pp;
				}
			}
	}

/* calculate the print location */
void calc_loc(t)
	struct node	*t;
	{
	unsigned long	cs;		/* scratch */
	struct node	*t1, *t2;	/* scratch pointers */
	int			print_limit;
						/* location next directory after t will
						 * be printed */

	if (t == root)
		cs = 0;
	else
		{
		/* figure out how much is in the directory itself */
		for (t1 = t->lson, cs = 0; t1 != NULL; t1 = t1->rbrother)
			{
			cs += t1->size;
			}
		/* cs is the size accounted for by subdirectories */
		cs = t->size - cs;
		}
	/* cs is the size of the files in the directory itself */
	/* convert cs to lines */
	cs = cs*length/total_size + t->loc;
	/* calculate where next directory after t will be */
	print_limit = t->print_limit;
	/* assign locations */
	for (t1 = t->lson, t2 = NULL; t1 != NULL; t2 = t1, t1 = t1->rbrother)
		{
		/* make sure we don't run into next directory */
		if (cs >= print_limit)
			{
			cs = print_limit-1;
			}
		t1->loc = cs;
		if (t2 != NULL)
			t2->print_limit = cs;
		cs += t1->size*length/total_size;
		}
	if (t2 != NULL)
		t2->print_limit = print_limit;
	}

/* figure out which directories to blank out */
void blank(t)
	struct node	*t;
	{
	struct node	*t1, *t2, *t3;		/* loop pointers */

	/* return if there aren't at least two sons */
	if (t->lson == NULL || t->lson->rbrother == NULL)
		return;
	for (t1 = NULL, t2 = t->lson, t3 = t2->rbrother; t3 != NULL;
			t1 = t2, t2 = t3, t3 = t3->rbrother)
		if (t2->loc == t3->loc)
			{
			/* replace t2 and succeeding nodes with "(etc.)" */
			t3 = (struct node *)malloc(sizeof (struct node) +
				sizeof (SUPPRESSED) - 1);
			strcpy(t3->name, SUPPRESSED);
			t3->lson = t3->rbrother = NULL;
			t3->loc = t2->loc;
			if (t1 == NULL)
				t->lson = t3;
			else
				t1->rbrother = t3;
			}
	}

/* mark the last son of each directory */
void mark_last(t)
	struct node	*t;
	{
	struct node	*t1, *t2;	/* scratch pointers */
	t->last = 0;
	for (t1 = t->lson, t2 = NULL; t1 != NULL; t2 = t1, t1 = t1->rbrother)
		;
	if (t2 != NULL)
		t2->last = 1;
	}

/* calculate the print columns */
void calc_pc(t)
	struct node	*t;
	{
	struct node	*t1;		/* scratch pointer */
	int			c;		/* column sons will be printed in */

	c = t->print_col + strlen(t->name) + 5;
	for (t1 = t->lson; t1 != NULL; t1 = t1->rbrother)	
		t1->print_col = c;
	}

/* write the output */
void output(t)
	struct node	*t;
	{
	position(t->loc);
	printf("--%s%s", t->name, (t->lson != NULL ? "--+" : ""));
	/* remove the bar for our father if we are the last son */
	if (t->last)
		bar_count--;
	/* add the location of the bar to the bar list if we have a son */
	if (t->lson != NULL)
		{
		bar_list[bar_count] = t->print_col + strlen(t->name) + 5 - 1;
		bar_count++;
		}
	}

/* position to a specific line */
void position(line)
	int	line;		/* line number */
	{
	int	i;			/* counts through the bar list */
	int	j;			/* current column number */

	/* for every line we need to go down */
	for (; current_line < line; current_line++)
		{
		putchar('\n');
		/* print the bars for this line */
		j = 0;
		for (i = 0; i < bar_count; i++)
			{
			for (; j < bar_list[i]; j++)
				putchar(' ');
			if (current_line == line-1 && i == bar_count-1)
				putchar('+');
			else
				putchar('|');
			j++;
			}
		}
	}
#if 0
-----------------------------------example-----------------------------------
--.--+
     |
     |
     |
     |
     |
     |
     |
     +--scpp--+--ftps--+
     |        |        +--scpp--+--temp
     |        +--error
     |        |
     |        +--shar--+
     |                 |
     |                 +--temp
     +--uemacs--+--uemacs3.9
     |
     |
     |
     |
     |
     |
     +--patch--+--dist
     |         |
     |         |
     |         +--build
     |
     |
     |
     +--sccs--+--all
     |        |
     |        |
     |        |
     |        +--(etc.)
     +--yacctest
     |
     |
     |
     +--yacc
     |
     |
     |
     +--rnsend--+--dist1
     |          +--dist3
     |          +--dist2
     +--bin--+
     |       +--source
     +--sources
     |
     +--kwfrob
     +--rsts.tape
     +--rnews
     +--ftp-server
     +--(etc.)






------------------------------end of example------------------------------
-- 
Dale Worley    Cullinet Software      ARPA: culdev1!drw@eddie.mit.edu
UUCP: ...!seismo!harvard!mit-eddie!culdev1!drw
Nothing shocks me -- I'm a scientist.
#endif


/* beginning of new code added by dennis bednar */
/*
 * Print the full name associated with the path to this component.
 * Since each node contains only one component of the full path name,
 * in order to print out the entire name, we have to print the
 * "father part of the component", followed by the component.
 * This is why I invented the "father" up pointer.
 * This is done recursively, I might add.
 *
 * the root is a dummy holder, it contains no real name.
 * The top node which contains the first real name will have
 * a father pointer to root.
 * Rather, root -> lson contains the first real root of the tree.
 * This is why root->rbrother is NULL.
 *
 * So do NOT print the root node's name.
 * Also, when printing root->lson's father, it will be "".
 */
void	path_print();	/* declared here because show_node() calls it */

void
show_node( p )
	struct	node	*p;
{
	if (p == root)	/* just a placeholder, its lson is the real root */
		return;	/* don't print gibberish */

	/* avoid printing root's lson node whose name is "".
	 * This occurs when first line of stdin contains a
	 * directory name beginning with a leading /.
	 * PS, since root node's name is also "", I probably could remove
	 * the "if" check above, and let this if stmt do the work of
	 * both, but I didn't feel like it.
	 */
	if (p -> name[0] == '\0')
		return;

	/* avoid printing /a and /a/b whose size is zero, just
	 * because we did a "du /a/b/c | duonly".
	 */
	if (p -> me_size == 0L)
		return;

/*	printf("'%s' and my father is <%s>\n", p -> name, p -> father -> name); */
	printf("%ld\t", p -> me_size);
	path_print(p);
	putchar( '\n' );	/* terminate line */
}

/*
 * recursively print the full path by printing the "part before me",
 * then printing my component name.
 */
void
path_print( p )
	struct	node	*p;
{
	/* this should not ever happen, because of how path_print() is written */
	if (p == NULL)
		return;		/* paranoid */

	/* this code is structured so that a leading slash will NOT
	 * be printed before the first component, but rather ONLY
	 * between all components, and NOT after the last component.
	 */
	if (p -> father == root)
		{
		printf("%s", p ->name );
		return;
		}
	else
		{
		/* recursion */
		path_print( p -> father );	/* print components before me */
		printf( "/%s", p -> name );	/* print my component */
		}
}

/*
 * compute size of this directory only, ie don't include
 * size of children directories.  This leaves space only
 * occupied by this directory remaining in the "me_size".
 */
void
my_space( p )
	struct	node	*p;
{
	long	total = 0;
	struct	node	*s;	/* each son of the parent p */

	total = p -> size;	/* blocks used by this directory */

	/* subtract blocks used by immediate children directories.
	 * WILL NOT WORK CORRECTLY if a child is a file !!!!!
	 * this will leave total containing only the number of blocks
	 * in this directory.
	 */
	for ( s = p -> lson; s; s = s -> rbrother)
		total -= s -> size;

	/* store my size only for this parent */
	p -> me_size = total;
}

/*
 * sort the directories under a directory by name.
 *
 * this is just the old "sort()" routine [sort() renamed to
 * sort_size() for clarity] with a minor change, where
 * we now compare names, not size.  That is, we sort
 * the brothers or peers at each level.
 * SEE ALSO the anomaly concerning "." and "/" with regard
 * to sorting, described at the top of this file.
 */
void sort_name(t)
	struct node	*t;
	{
	struct node	*p1, *p2, *p3, *pp;		/* scratch pointers */
	int			nodes, n;				/* scratch */
	int	cmp;

	/* count the number of nodes */
	nodes = 0;
	for (p1 = t->lson; p1 != NULL; p1 = p1->rbrother)
		nodes++;
	/* just a simple and inefficient bubble sort */
	for (n = 1; n < nodes; n++)
		for (p1 = NULL, p2 = t->lson, p3 = p2->rbrother; p3 != NULL;
				p1 = p2, p2 = p3, p3 = p3->rbrother)
			{
			cmp = strcmp( p2 -> name, p3 -> name );
			if (cmp > 0)	/* not alphabetized, p3 1st, p2 2nd */
				{
				/* exchange the nodes p2 and p3 */
				pp = p3->rbrother;
				p3->rbrother = p2;
				p2->rbrother = pp;
				if (p1 != NULL)
					p1->rbrother = p3;
				else
					t->lson = p3;
				/* exchange the values of p2 and p3 */
				pp = p2;
				p2 = p3;
				p3 = pp;
				}
			}
	}

\SHAR_EOF
# ............    F  I   L   E      E  N  D  .......... duonly.c
fi # end of overwriting check
if test -f duonly.help
then
echo shar: will not over-write existing file 'duonly.help'
else
echo x - duonly.help
# ............    F  I   L   E      B  E  G  .......... duonly.help
cat << '\SHAR_EOF' > duonly.help
duonly

Prints disk usage block sizes only for directories,
but NOT including sizes contributed by sub-directories.

This tool was developed because when viewing the output of a
regular du, it was difficult to see which directories were
the real culprits occupying the most disk blocks.
For example, if you did a "du /usr", the last line would be
something like:

18112	/usr

even though /usr might contain only sub-directories and no files.
Hence, it might be that a sub-directory of /usr was the real
culprit for being a disk hog, and not /usr (or a sub-sub-directory,
etc., of /usr might be the culprit).

Duonly is designed to read standard input generated by "du", and print
information in the same style as du:

size	directory_name

However, the size reported by duonly is the size in blocks
contributed "ONLY by the directory_name itself, including
the sum of the sizes of all regular files directly beneath
directory_name".  It does NOT include the sizes inherited
by ANY of the children directories.  Thus the size reported
by duonly might be thought of as ONLY for that directory.


SAMPLE USAGE

	du | duonly
	du /usr | duonly


NOTES

The directory names are sorted alphabetically within each level.
There is one minor anomaly in the sorted order of directory
names. Duonly will output this order:
	10	dir
	5	dir/subdir
	3	dir.2
but a true "du | duonly | sort +1", with sorting applied to
the second column, would really output:
	10	dir
	3	dir.2
	5	dir/subdir
since "." lexically precedes "/".
With the exception noted, the rest of the output generated by duonly
is in the correct order.
\SHAR_EOF
# ............    F  I   L   E      E  N  D  .......... duonly.help
fi # end of overwriting check
# end of shell archive
exit 0
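
The sorting anomaly described in the NOTES of duonly.help comes from
comparing names one path component at a time instead of comparing whole
path strings.  The following is only a rough sketch of that difference,
not part of the archive: component_cmp is a made-up name, and the
ordering of "." before "/" assumes an ASCII character set.

```c
#include <string.h>

/* Compare two paths one component at a time, roughly the order a
 * per-level sort of a directory tree produces: a directory and all
 * of its descendants come before a sibling with a longer name. */
int component_cmp(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        size_t la = strcspn(a, "/");    /* length of next component of a */
        size_t lb = strcspn(b, "/");    /* length of next component of b */
        size_t n = la < lb ? la : lb;
        int c = strncmp(a, b, n);

        if (c != 0)
            return c;
        if (la != lb)
            return la < lb ? -1 : 1;    /* shorter component sorts first */
        a += la;
        b += lb;
        if (*a == '/')
            a++;
        if (*b == '/')
            b++;
    }
    if (*a == '\0' && *b == '\0')
        return 0;
    return *a == '\0' ? -1 : 1;         /* a parent precedes its descendants */
}
```

Here component_cmp("dir/subdir", "dir.2") is negative, which matches the
order duonly prints, while strcmp("dir/subdir", "dir.2") is positive on
an ASCII system, which is the order "sort" would give.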
-- 
FullName:	Dennis Bednar
UUCP:		{uunet|sundc}!rlgvax!dennis
USMail:		CCI; 11490 Commerce Park Dr.; Reston VA 22091
Telephone:	+1 703 648 3300