[comp.lang.perl] Simplifying paths. A hairy regex

muir@cae780.csi.com (David Muir Sharnoff) (03/02/91)

It's 10pm, everyone else has gone home, I just have to 
share this...  The most twisted regex that I've had to build.

I wanted to get rid of extra junk from unix filenames.

To that end, I transform:

	a//c            -> a/c
	a/./c           -> a/c
	a/c/.           -> a/c
	a/b/../c        -> a/c
	a/c/d/..        -> a/c

Note: I do not transform //a/c -> /a/c as that would break Apollo filenames.

------------ perl starts here ----------
sub simplify 
{
	for $p (@_) {
		while($p =~ s!(/\.(/))|(^(.+/)/)|(/\.$)|([^/.]+/\.\./)|(/[^/.]+/\.\.$)!\2\4!) {;}
	}
}
------------ perl ends here ----------

Please let me know if I overlooked something...  This is
production code and I support it in-house.
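
For readers untangling the one-liner, the same substitution can be laid
out with /x, one alternative per line.  This is a minimal sketch in
present-day Perl (5.10 or later, for the // operator), not the posted
production code.  The name simplify_regex is made up for illustration,
and the replacement uses $2 and $4 where the original wrote \2\4.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the posted regex, spelled out with /x.
# Only groups 2 and 4 survive into the replacement text.
sub simplify_regex {
    my ($p) = @_;
    1 while $p =~ s{
          (/\.(/))             # "/./" becomes "/"   (group 2)
        | (^(.+/)/)            # "x//" becomes "x/"  (group 4); a leading "//" stays
        | (/\.$)               # trailing "/." is dropped
        | ([^/.]+/\.\./)       # "name/../" is dropped
        | (/[^/.]+/\.\.$)      # trailing "/name/.." is dropped
    }{ ($2 // '') . ($4 // '') }xe;
    return $p;
}

print simplify_regex($_), "\n"
    for qw(a//c a/./c a/c/. a/b/../c a/c/d/.. //a/c a/b.c/../d);
```

The five cases from the table all come out as a/c, and //a/c is left
alone.  The last input shows a gap: [^/.]+ is not anchored at a slash,
so in a/b.c/../d it matches only the "c" after the dot and the result
is a/b.d.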

$,= ' ';$japh = "Just/not/really/../../another/././perl/hacker,/./today/../."; while($japh =~ s!(/\.(/))|(^(.+/)/)|(/\.$)|([^/.]+/\.\./)|(/[^/.]+/\.\.$)!\2\4!) {;}; print (split(m,/+,,$japh))[1-4];

-- 
David Muir Sharnoff.			"RISC is about one year ahead"
muir@csi.com				(415) 358-3664 (415) 644-0441
Comdisco Systems Inc.  919 East Hillsdale Blvd, Foster City, CA 94404

merlyn@iwarp.intel.com (Randal L. Schwartz) (03/03/91)

In article <11249@cae780.csi.com>, muir@cae780 (David Muir Sharnoff) writes:
| It's 10pm, everyone else has gone home, I just have to 
| share this...  The most twisted regex that I've had to build.
| 
| I wanted to get rid of extra junk from unix filenames.
| 
| To that end, I transform:
| 
| 	a//c            -> a/c
| 	a/./c           -> a/c
| 	a/c/.           -> a/c
| 	a/b/../c        -> a/c
| 	a/c/d/..        -> a/c
| 
| Note: I do not transform //a/c -> /a/c as that would break Apollo filenames.
| 
| ------------ perl starts here ----------
| sub simplify 
| {
| 	for $p (@_) {
| 		while($p =~ s!(/\.(/))|(^(.+/)/)|(/\.$)|([^/.]+/\.\./)|(/[^/.]+/\.\.$)!\2\4!) {;}
| 	}
| }
| ------------ perl ends here ----------

I'd deal with it more like what it is... a series of commands to execute:

sub simplify {
	local(@source,@dest);
	local($body);
	for $p (@_) {
		@source = split(/\//, $p);
		@dest = ();
			# start each path with an empty stack
		$body = 0;
			# ($body > 0) means have seen non-null entries
		for (@source) {
			push(@dest,$_) unless $body && /^\.{0,2}$/;
				# don't push body entries that are null,
				# single dot, or double dot
			pop(@dest) if $body && /^\.\.$/;
				# double dot in body means back up one
			$body++ if length;
				# enter the body after initial null entries
		}
		$p = join("/",@dest);
	}
}

Hmm.  After writing this code, I know it breaks on "../..".  Yuck.
But I'm late for my next appointment.  The flash I just had (if
someone wants to make it work) is to have an inviolate "prefix string"
consisting of all the null and ".." entries from the head of the
string as in /^((\.\.)?\/)+/, and then push and pop the rest as above.
When you reassemble the string, you glue together the prefix and the
stack.  Maybe I'll try finishing that off later this evening.

print "Just another Perl hacker", # on a tight schedule today... durn.
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/03/91)

In article <1991Mar2.202521.29658@iwarp.intel.com> merlyn@iwarp.intel.com (Randal L. Schwartz) writes:
> Hmm.  After writing this code, I know it breaks on "../..".  Yuck.
> But I'm late for my next appointment.  The flash I just had (if
> someone wants to make it work) is to have an inviolate "prefix string"
> consisting of all the null and ".." entries from the head of the
> string as in /^((\.\.)?\/)+/, and then push and pop the rest as above.

No, that fails on foo/../../bar. You'll have to keep track of the number
of leading ..'s, then apply your solution, increasing the number of ..'s
by one every time you pop an empty stack.
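
The counting scheme described above might look like the sketch below.
It is one possible reading, not code from anyone in the thread: the
name simplify_path is hypothetical, the leading run of slashes is kept
verbatim so that //a/c survives, and returning "." for a path that
cancels out completely is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: purely textual simplification with a stack of
# names plus a count of ".." entries that had nothing left to cancel.
sub simplify_path {
    my ($path) = @_;
    # Split off the leading slashes and keep them exactly as given.
    my ($root, $rest) = $path =~ m{^(/*)(.*)\z}s;
    my @stack;         # ordinary components seen so far
    my $updots = 0;    # ".." entries that popped an empty stack
    for my $part (split m{/+}, $rest) {
        next if $part eq '' || $part eq '.';
        if ($part eq '..') {
            if (@stack) { pop @stack }    # cancel the previous name
            else        { $updots++ }     # nothing to cancel: keep it
        }
        else {
            push @stack, $part;
        }
    }
    my $out = $root . join('/', ('..') x $updots, @stack);
    # Assumption: an empty result means "the current directory".
    return length($out) ? $out : '.';
}

print simplify_path($_), "\n"
    for qw(foo/../../bar ../.. a/b/../c //a/c a/b.c/../d);
```

With this bookkeeping foo/../../bar comes out as ../bar and ../.. is
returned unchanged, because every ".." that finds the stack empty is
counted and put back at the front when the path is reassembled.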

I think the regexp version is more natural; I basically did it that way
when I wrote the same routine in C a month back.

---Dan

rbj@uunet.UU.NET (Root Boy Jim) (03/05/91)

In article <11249@cae780.csi.com> muir@csi.com  (David Muir Sharnoff) writes:

?I wanted to get rid of extra junk from unix filenames.

Why bother?

?To that end, I transform:
?
?	a//c            -> a/c
?	a/./c           -> a/c
?	a/c/.           -> a/c
?	a/b/../c        -> a/c
?	a/c/d/..        -> a/c

Beware: a/b/../c is not always a/c if b is a symlink.
This bothers some people, but not others.
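
The symlink caveat can be checked with a small experiment.  The sketch
below assumes a Unix-like system with symlink support; the directory
layout is invented for the demonstration and is built in a scratch
directory using the standard File::Temp and Cwd modules.

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Temp qw(tempdir);

# Hypothetical layout:
#   $top/a/c            a real directory
#   $top/elsewhere/c    another real directory
#   $top/a/b            a symlink pointing at $top/elsewhere/sub
my $top = realpath(tempdir(CLEANUP => 1));
for my $dir (qw(a a/c elsewhere elsewhere/sub elsewhere/c)) {
    mkdir "$top/$dir" or die "mkdir $dir: $!";
}
symlink "$top/elsewhere/sub", "$top/a/b" or die "symlink: $!";

# What the textual rewrite a/b/../c -> a/c assumes ...
my $textual  = realpath("$top/a/c");
# ... versus what the filesystem resolves when it follows the link.
my $physical = realpath("$top/a/b/../c");

print "textual:  $textual\n";
print "physical: $physical\n";
```

Here the two results differ: following the link, a/b/../c lands in
elsewhere/c rather than a/c, because ".." is taken relative to the
directory the link points into.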
-- 
		[rbj@uunet 1] stty sane
		unknown mode: sane