[comp.os.vms] TABs, SPaces, and TPU: TAB_IT.COM, DETAB.COM

u3369429@murdu.OZ (Michael Bednarek) (10/29/87)

Hi again!

Remember my earlier TPU procedure DETAB?
In case you don't: it replaces TABs with the appropriate number of spaces.
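For readers who haven't seen DETAB, the core computation is small. A TAB at 0-based column c is replaced by TAB_STOP - (c mod TAB_STOP) spaces. That is what the expression `(co/TAB_STOP+1)*TAB_STOP-co` in the DETAB source computes with integer division. The Python sketch below only illustrates that rule; it is not a translation of the TPU procedure. It assumes plain text without Fortran carriage control and uniform tab stops (default 8).

```python
def detab(line: str, tab_stop: int = 8) -> str:
    """Replace each TAB with the SPs needed to reach the next tab stop.

    Columns are counted from 0, so with tab_stop=8 the stops fall at
    columns 8, 16, 24, ... (screen columns 9, 17, 25, ...).
    """
    out = []
    col = 0  # 0-based column of the next output character
    for ch in line:
        if ch == "\t":
            pad = tab_stop - col % tab_stop  # distance to the next stop
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


print(repr(detab("abc\tde\tf", 4)))  # 'abc de  f'
```

For a single line this gives the same result as Python's built-in `str.expandtabs(tab_stop)`.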

Now, I have a number of files that suffer from excessive spaces, and I would
like to replace those spaces with TABs, thus saving disk space. So I came up
with the following tool, TAB_IT, which does just that.
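The replacement rule that TAB_IT applies is described in its built-in help and implemented in the TPU procedure Tab_It. It can be sketched in Python as follows:

- a single SP is never replaced;
- a run of SPs becomes a TAB only if it reaches the next tab stop;
- a stop just one column away is left as an SP.

This is an illustrative restatement, not the TPU code itself. It assumes uniform tab stops and no Fortran carriage control.

```python
def entab(line: str, tab_stop: int = 8) -> str:
    """Replace runs of SPs that reach a tab stop with TABs (TAB_IT's rules)."""
    out = []
    col = 0  # 0-based screen column of the next character
    i = 0
    while i < len(line):
        ch = line[i]
        if ch != " ":
            out.append(ch)
            if ch == "\t":
                col += tab_stop - col % tab_stop  # existing TAB jumps to next stop
            else:
                col += 1
            i += 1
            continue
        # Length of the run of SPs starting here.
        j = i
        while j < len(line) and line[j] == " ":
            j += 1
        run = j - i
        dist = tab_stop - col % tab_stop  # columns to the next tab stop
        if run > 1 and dist > 1 and run >= dist:
            out.append("\t")  # these `dist` SPs reach the stop: one TAB
            i += dist
            col += dist
        else:
            # Lone SP, stop only one column away, or run too short:
            # keep one SP and look at the rest of the run again.
            out.append(" ")
            i += 1
            col += 1
    return "".join(out)


print(repr(entab("a       b")))  # 'a\tb'
```

Keeping one SP and re-examining the rest of the run reproduces what Tab_It does when it moves one column and searches again.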

Because TAB_IT has to deal with the peculiarities of files with Fortran
carriage control, DETAB had to be rewritten and is also included.
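In a file with Fortran carriage control, column 1 holds the carriage-control character, so TAB_IT and DETAB shift every tab stop one column to the right. They do this in two ways:

- they subtract 2 instead of 1 from the column (via the variable COOKIE);
- they build an explicit tab-stop list "2 10 18 ...".

Below is a Python sketch of that list-building loop. It mirrors the TPU code in both procedures, including its cut-off once a value exceeds 512. The function name is hypothetical.

```python
def ftn_tab_stops(tab_stop: int = 8, limit: int = 512) -> list:
    """1-based tab-stop columns for a Fortran carriage-control file."""
    stops = [2]              # first stop: column 2, right after the CC column
    col = tab_stop + 2
    while True:
        stops.append(col)
        if col > limit:      # same exit test as `ExitIf l>512` in the TPU code
            break
        col += tab_stop
    return stops


print(ftn_tab_stops()[:4])  # [2, 10, 18, 26]
```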

My greatest achievement so far is a document of 10735 lines from the
Australian Bureau of Statistics which shrank from 2000 blocks to 1230 blocks,
replacing almost half a million (453681) blanks with 60992 TABs.
The cost on a VAX 8650:
  Buffered I/O count:          100      Peak working set size:  2048
  Direct I/O count:            274      Peak page file size:    3783
  Page faults:                9683
  Charged CPU time:     0 00:01:31.58
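The figures quoted above are consistent with each other. Each TAB replaces the SPs it stands for, so the file should shrink by (blanks removed - TABs inserted) characters. The DETAB run described below expands the TABbed file from 611731 to 1004420 characters, which is exactly the same amount. A quick check in Python, assuming 512-byte VMS disk blocks:

```python
BLOCK_BYTES = 512                                   # one VMS disk block

blanks_removed, tabs_inserted = 453_681, 60_992
blocks_before, blocks_after = 2000, 1230
tabbed_chars, expanded_chars = 611_731, 1_004_420   # DETAB figures

bytes_saved = blanks_removed - tabs_inserted        # characters removed
blocks_saved = blocks_before - blocks_after

print(bytes_saved)                                  # 392689
print(blocks_saved * BLOCK_BYTES)                   # 394240
print(expanded_chars - tabbed_chars)                # 392689
print(round(blanks_removed / tabs_inserted, 2))     # 7.44 SPs per TAB on average
```

The 770 blocks saved (394240 bytes) are slightly more than the 392689 characters removed. The difference is presumably record-format and block-rounding overhead.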

However, when I tried to test the reverse function, DETAB, I had to kill the
process after 90 minutes of CPU time and 40 million page faults. It seems that
TPU is severely brain damaged when it comes to replacing short strings (TABs)
with longer strings (SPs) on a large scale. I realise that a lot of dynamic
memory allocation is needed, but 40 million page faults when a file of
611731 characters is to be expanded to 1004420 seems a bit gross.

Was this behaviour of TPU discussed here before? Are there work-arounds?
Has it been SPRed?

When the file to be DETABbed was split into 11 smaller files of 1000 lines
each, the costs were:
  Buffered I/O count:          345      Peak working set size:  2048
  Direct I/O count:            686      Peak page file size:    3849
  Page faults:               36080
  Charged CPU time:     0 00:26:37.04

It works, but it seems rather inefficient. Maybe it's cheaper with files of
500 lines, but I got tired of trying.
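Splitting is safe because TAB expansion never looks past the end of a line. The column count restarts at every record, so processing the file in pieces of N lines and concatenating the results gives the same output as processing it whole. Below is a minimal Python sketch of that chunked approach. The 1000-line default matches the split used above. `split_lines` and `detab_in_chunks` are hypothetical helper names, and Python's `str.expandtabs` stands in for DETAB (no Fortran carriage control assumed).

```python
def split_lines(lines, chunk=1000):
    """Yield successive pieces of at most `chunk` lines."""
    for start in range(0, len(lines), chunk):
        yield lines[start:start + chunk]


def detab_in_chunks(text, chunk=1000, tab_stop=8):
    """Expand TABs piece by piece; the result equals expanding the whole text."""
    pieces = []
    for piece in split_lines(text.split("\n"), chunk):
        pieces.append("\n".join(line.expandtabs(tab_stop) for line in piece))
    return "\n".join(pieces)


print(len(list(split_lines(list(range(10735))))))  # 11 pieces, the last one 735 lines
```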

By the way, how many of you knew that DIFFERENCES can rid a file of TABs and
replace them with blanks? To wit:
$ Differences/Ignore=Pretty/NoNumber/Output=DOC.DETAB DOC.TAB NL:
  Buffered I/O count:          123      Peak working set size:  1868
  Direct I/O count:            353      Peak page file size:    2577
  Page faults:                3226
  Charged CPU time:     0 00:07:55.72

The output file needs some editing, of course, and it doesn't work for
non-standard tab stops or for Fortran carriage-control files, but otherwise
it's fine. Still a bit expensive, though.

However, can somebody explain why DIFFERENCES reports NO DIFFERENCES when
it compares NL: and <any-file>?


TPU question: I want to keep the user informed of the program's progress by
issuing a message every, say, 1000 lines. How do I find out the line number
of the current line?
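Whatever TPU offers for the current line number, one workaround is to keep your own line counter and report whenever it reaches a multiple of 1000. DETAB already applies the same modulo test to its TAB counter (`fT-fT/1000*1000=0`). The Python sketch below only illustrates that counting pattern; it is not TPU, and `report_progress` is a hypothetical name.

```python
def report_progress(lines, every=1000):
    """Return a progress message for every `every`-th line."""
    messages = []
    for number, _line in enumerate(lines, start=1):  # our own line counter
        if number % every == 0:  # DETAB writes this test as fT-fT/1000*1000=0
            messages.append(f"line {number} processed")
    return messages


print(len(report_progress(["x"] * 10735)))  # 10
```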


Michael Bednarek
Institute of Applied Economic and Social Research (IAESR)
Melbourne University, Parkville 3052, AUSTRALIA, Phone : +61 3 344 5744
Domain: u3369429@{murdu.oz.au | ucsvc.dn.mu.oz.au}  or  mb@munnari.oz.au
"bang": ...UUNET.UU.NET!munnari!{murdu.oz | ucsvc.dn.mu.oz}!u3369429

"POST NO BILLS."

...................... Cut between dotted lines and save ......................
$!.............................................................................
$! VAX/VMS archive file created by VMS_SHAR V-5.03 07-Oct-1987
$! which was written by Michael Bednarek (U3369429@ucsvc.dn.mu.oz.au)
$! To unpack, simply save and execute (@) this file.
$!
$! This archive was created by U3369429 (Michael Bednarek)
$! on Wednesday 28-OCT-1987 17:14:42.66
$!
$! It contains the following 2 files:
$! TAB_IT.COM DETAB.COM
$!=============================================================================
$ Set Symbol/Scope=(NoLocal,NoGlobal)
$ Version=F$GetSYI("VERSION") ! See what VMS version we have here:
$ If Version.ges."V4.4" then goto Version_OK
$ Write SYS$Output "Sorry, you are running VMS ",Version, -
                ", but this procedure requires V4.4 or higher."
$ Exit 44
$Version_OK: CR[0,8]=13
$ Pass_or_Failed="failed!,passed."
$ Goto Start
$Convert_File:
$ Read/Time_Out=0/Error=No_Error1/Prompt="creating ''File_is'" SYS$Command ddd
$No_Error1: Define/User_Mode SYS$Output NL:
$ Edit/TPU/NoSection/NoDisplay/Command=SYS$Input/Output='File_is' -
        VMS_SHAR_DUMMY.DUMMY
f:=Get_Info(Command_Line,"File_Name");b:=Create_Buffer("",f);
o:=Get_Info(Command_Line,"Output_File");Set(Output_File,b,o);
Position(Beginning_of(b));Loop x:=Erase_Character(1);Loop ExitIf x<>"V";
Move_Vertical(1);x:=Erase_Character(1);Append_Line;
Move_Horizontal(-Current_Offset);EndLoop;Move_Vertical(1);
ExitIf Mark(None)=End_of(b) EndLoop;Position(Beginning_of(b));Loop
x:=Search("`",Forward,Exact);ExitIf x=0;Position(x);Erase_Character(1);
If Current_Character='`' then Move_Horizontal(1);else
Copy_Text(ASCII(INT(Erase_Character(3))));EndIf;EndLoop;Exit;
$ Delete VMS_SHAR_DUMMY.DUMMY;*
$ Checksum 'File_is
$ Success=F$Element(Check_Sum_is.eq.CHECKSUM$CHECKSUM,",",Pass_or_Failed)+CR
$ Read/Time_Out=0/Error=No_Error2/Prompt=" CHECKSUM ''Success'" SYS$Command ddd
$No_Error2: Return
$Start:
$ File_is="TAB_IT.COM"
$ Check_Sum_is=1834644605
$ Copy SYS$Input VMS_SHAR_DUMMY.DUMMY
X$ Verify='F$Verify(0)
X$ Facility_Name=`009"TAB_IT"
X$ Facility_Version=`009"V-1.01 28-Oct-1987"
X$!
X$ On Error then Continue
X$!
X$!Michael Bednarek`009`009u3369429@{murdu.oz.au | ucsvc.dn.mu.oz.au}
X$!Institute of Applied Economic`009-- or --
X$!  and Social Research (IAESR)`009...{UUNET.UU.NET | seismo.CSS.GOV}!munnari!
X$!Melbourne University`009`009   {murdu.oz | ucsvc.dn.mu.oz}!u3369429
X$!Parkville 3052, Phone : +61 3 344 5744
X$!AUSTRALIA
X$!
X$! Copyright (c) 1987, by Michael Bednarek
X$! The distribution of this file is unrestricted as long as this notice
X$! remains intact.
X$!
X$! Usage: @TAB_IT {Input_File Output_File | ?}
X$!
X$ Say="Write SYS$Output"
X$ Say Facility_Name," ",Facility_Version
X$ Say ""
X$ If P1.eqs."" then Inquire P1 "_Input"
X$ If (P1.nes."?") .and. (P1.nes."") then goto Get_P2
X$ Type SYS$Input
XUsage: TAB_IT Input_File Output_File
X
XTAB_IT assumes that your output device (Terminal/Printer) has TAB-STOPS
Xat every 8th position. It will replace all appropriate blanks with TABs,
Xthus reducing the file size.
X
XTAB_IT will also cater for files with a Record Attribute=Fortran Carriage
XControl.
X
XIf you want to make TAB_IT use different TAB-STOPS, set the global symbol
XTT_TABSTOPS to that value, e.g.: $ TT_TABSTOPS==12
X
XBEWARE: This program has not been extensively tested with TAB_STOP other
X`009than 8, apart from establishing that it runs.
X`009TAB_IT does not cater for asymmetric TAB-STOPS, e.g.:"1 6 7 41 73"
X
XThe program reports the number of removed SPs and inserted TABs.
XIf no SPs could be removed, no output file is written.
X$ Read/End_Of_File=Exit/Error=Exit/Time_Out=30/Prompt="more? " SYS$Command P1
X$ If F$Extract(0,1,F$Edit(P1,"UpCase,Collapse")).nes."Y" then goto Exit
X$ Type SYS$Input
XThis procedure checks for the global symbol TT_TABSTOPS. If the condition
X 1<TT_TABSTOPS<100 is true, that value is used, else TT_TABSTOPS is set to 8.
X
XTT_TABSTOPS will be communicated to TPU as a two-digit number mis-using the
X"/JOURNAL=" qualifier. The TPU program will then extract that number and use
Xit for the variable TAB_STOP.
X$ Read/End_Of_File=Exit/Error=Exit/Time_Out=90/Prompt="more? " SYS$Command P1
X$ If F$Extract(0,1,F$Edit(P1,"UpCase,Collapse")).nes."Y" then goto Exit
X$ Type SYS$Input
XThe Method of the TPU routine:
X
XThe input file is searched for spaces (SP) using the pattern SPAN(" ").
XThis pattern returns a range of found SPs (rnge), the length of which is
Xreturned by LENGTH(rnge). The function GET_INFO(b,"OFFSET_COLUMN") returns
Xthe screen column where that SP was found, observing TABs.
XThen, the number of SPs are compared to the distance to next TAB-STOP,
Xand if sufficient SPs were found, all SPs up to the next TAB-STOP are
Xreplaced with a TAB.
X
XSingle SPs are never replaced, nor are SPs in position #1 if the input file
Xhas a record attribute (RAT) "Fortran Carriage Control". The RAT of any
Xinput file is passed to TPU mis-using the "/JOURNAL=" qualifier.
X$ Goto Exit
X$!
X$Get_P2:
X$ If P2.eqs."" then Inquire P2 "_Output"
X$ If P2.eqs."" then P2=P1
X$!
X$ TT_TABSTOPS=F$Integer("''TT_TABSTOPS'")`009! Allow the user to override
X$ If TT_TABSTOPS.le.1 .or. TT_TABSTOPS.gt.99 -`009! this parameter
X`009then TT_TABSTOPS=8`009`009`009! Default=8
X$ On Warning then goto Exit
X$ JOU=F$FAO("!2ZL",TT_TABSTOPS)+F$File_Attributes(P1,"RAT")
X$!
X$ Edit/TPU/NoSection/NoDisplay/Command=SYS$INPUT/Journal='JOU'/Output='P2 'P1
XProcedure Tab_It
X On_Error
X  Return;
X EndOn_Error
X Loop
X  rnge:=Search(pat,forward,exact);`009`009! Search for next SP
X  Position(rnge);`009`009`009`009! Position thereon
X  l:=Length(rnge);`009`009`009`009! How many SPs are here?
X  If (l>1) then`009`009`009`009`009! Don't bother with one SP
X   co:=Get_Info(b,"Offset_Column")-COOKIE;`009! Offset of the first SP
X   If co>=0 then`009`009`009`009! don't touch pos.1 in FTN file
X    x:=(co/TAB_STOP+1)*TAB_STOP-co;`009`009! How far to the next TAB
X    If (x>1) then`009`009`009`009! don't replace 1 SP
X     If l>=x then`009`009`009`009! All blanks up to the TAB?
X      Erase_Character(x);`009`009`009! Erase SPs up to the next TAB
X      Copy_Text(TAB);`009`009`009`009! Insert a TAB
X      ex:=ex+x;`009`009`009`009`009! Count erased SPs
X      it:=it+1;`009`009`009`009`009! Count inserted TABs
X     else`009`009`009`009`009! SPs don't reach next TAB
X      Move_Horizontal(l);`009`009`009! skip over SPs
X     EndIf;
X    else`009`009`009`009`009! only 1 SP to the next TAB
X     Move_Horizontal(1);
X    EndIf;
X   else`009`009`009`009`009`009! skip pos.1 in FTN file
X    Move_Horizontal(1);
X   EndIf;
X  else`009`009`009`009`009`009! only 1 SP found
X   Move_Horizontal(1);
X  EndIf;
X EndLoop;
XEndProcedure;
X
Xf:=Get_Info(Command_Line,"Journal_File");
XTAB_STOP:=Int(Substr(f,1,2));
Xj:=Substr(f,3,255);
X
Xf:=Get_Info(Command_Line,"File_Name");
Xb:=Create_Buffer("",f);
XSet (Output_File,b,File_Parse(Get_Info(Command_Line,"Output_File")));
X
Xex:=0;`009! Counter for saved SPs
Xit:=0;`009! Counter for inserted TABs
XTAB:=ASCII(9);
XSP:=ASCII(32);
Xpat:=Span(SP);
XIf j="FTN" then
X Message("%TAB_IT-I-RATISFTN, Input file "+f+" is FTN");
X COOKIE:=2;
X l:=TAB_STOP+2;
X xt:="2";
X Loop
X  xt:=xt+" "+STR(l);
X  ExitIf l>512;
X  l:=l+TAB_STOP;
X EndLoop;
X Set(Tab_Stops,b,xt);
Xelse
X COOKIE:=1;
X Set(Tab_Stops,b,TAB_STOP);
XEndIf;
XMessage("%TAB_IT-I-TABSET, TAB_STOP set to "+STR(TAB_STOP));
X
XPosition (Beginning_of(b));
XTab_It;
X
XIf ex=0 then
X Message("%TAB_IT-I-NOREPL, No blanks could be replaced with TABs,");
X Message(FAO("!3(_)no output file was written."));
Xelse
V Message("%TAB_IT-I-REPL, "+FAO("!UL blank!%S replaced by !UL TAB!%S.",ex,it))
X;
XEndIf;
X
XExit;
X$Exit: Verify=F$Verify(Verify)
$ GoSub Convert_File
$ File_is="DETAB.COM"
$ Check_Sum_is=1213485252
$ Copy SYS$Input VMS_SHAR_DUMMY.DUMMY
X$ Verify='F$Verify(0)
X$ Facility_Name=`009"DETAB"
X$ Facility_Version=`009"V-2.01 28-Oct-1987"
X$!
X$ On Error then Continue
X$!
X$!Michael Bednarek`009`009u3369429@{murdu.oz.au | ucsvc.dn.mu.oz.au}
X$!Institute of Applied Economic`009-- or --
X$!  and Social Research (IAESR)`009...{UUNET.UU.NET | seismo.CSS.GOV}!munnari!
X$!Melbourne University`009`009   {murdu.oz | ucsvc.dn.mu.oz}!u3369429
X$!Parkville 3052, Phone : +61 3 344 5744
X$!AUSTRALIA
X$!
X$! Copyright (c) 1987, by Michael Bednarek
X$! The distribution of this file is unrestricted as long as this notice
X$! remains intact.
X$!
X$! Usage: @DETAB {Input_File Output_File | ?}
X$!
X$ Say="Write SYS$Output"
X$ Say Facility_Name," ",Facility_Version
X$ Say ""
X$ If P1.eqs."" then Inquire P1 "_Input"
X$ If (P1.nes."?") .and. (P1.nes."") then goto Get_P2
X$ Type SYS$Input
XUsage: DETAB Input_File Output_File
X
XDETAB assumes that your output device (Terminal/Printer) has TAB-STOPS
Xat every 8th position. It will replace all TABs with blanks, thus
Xincreasing the file size.
X
XDETAB will also cater for files with a Record Attribute=Fortran Carriage
XControl.
X
XIf you want to make DETAB use different TAB-STOPS, set the global symbol
XTT_TABSTOPS to that value, e.g.: $ TT_TABSTOPS==12
X
XBEWARE: This program has not been extensively tested with TAB_STOP other
X`009than 8, apart from establishing that it runs.
X`009DETAB does not cater for asymmetric TAB-STOPS, e.g.:"1 6 7 41 73"
X
XThe program reports the number of removed TABs and inserted SPs.
XIf no TABs could be found, no output file is written.
X$ Read/End_Of_File=Exit/Error=Exit/Time_Out=30/Prompt="more? " SYS$Command P1
X$ If F$Extract(0,1,F$Edit(P1,"UpCase,Collapse")).nes."Y" then goto Exit
X$ Type SYS$Input
XThis procedure checks for the global symbol TT_TABSTOPS. If the condition
X 1<TT_TABSTOPS<100 is true, that value is used, else TT_TABSTOPS is set to 8.
X
XTT_TABSTOPS will be communicated to TPU as a two-digit number mis-using the
X"/JOURNAL=" qualifier. The TPU program will then extract that number and use
Xit for the variable TAB_STOP.
X$ Goto Exit
X$!
X$Get_P2:
X$ If P2.eqs."" then Inquire P2 "_Output"
X$ If P2.eqs."" then P2=P1
X$!
X$ On Warning then goto Exit
X$ EOF=F$Integer(F$File_Attributes(P1,"EOF"))
X$ If EOF.le.50 then goto Small
X$ BEL[0,7]=7
V$ Say "ATTENTION: TPU has been known to loop indefinitely with large files.",B
XEL
X$ Read/End_Of_File=Exit/Error=Exit/Time_Out=30 -
X`009/Prompt="Continue regardless? " SYS$Command c
X$ If F$Extract(0,1,F$Edit(c,"UpCase,Collapse")).nes."Y" then goto Exit
X$!
V$Small: TT_TABSTOPS=F$Integer("''TT_TABSTOPS'")`009! Allow the user to overrid
Xe
X$ If TT_TABSTOPS.le.1 .or. TT_TABSTOPS.gt.99 -`009! this parameter
X`009then TT_TABSTOPS=8`009`009`009! Default=8
X$ JOU=F$FAO("!2ZL",TT_TABSTOPS)+F$File_Attributes(P1,"RAT")
X$!
X$ Edit/TPU/NoSection/NoDisplay/Command=SYS$INPUT/Journal='JOU'/Output='P2 'P1
XProcedure Detab
X On_Error
X  Return;
X EndOn_Error
X Loop
X  Position (Search(TAB,forward,exact));`009`009! Position on next TAB
X  fT:=fT+1;`009`009`009`009`009! Count it
X  co:=Get_Info(b,"Offset_Column")-COOKIE;`009! Get column offset
X  p:=(co/TAB_STOP+1)*TAB_STOP-co;`009`009! How far to next TAB_STOP
X  Tp:=Tp+p;`009`009`009`009`009! Count them
X! "Any problem can be solved by a sufficient level of indirection":
X  f:=FAO("!!!UL!AS!AS",p,"*",SP);`009! Create FAO control string: "!<p>* "
X  Erase_Character(1);`009`009`009`009! Erase old TAB
X  Copy_Text(FAO(f));`009`009`009`009! Write SPs instead
X  If fT-fT/1000*1000=0 then
X   Message("");
X   Message("%DETAB-I-PROGRESS, "
X`009+STR(fT)+"th TAB found, SPs inserted: "+STR(Tp));
X   Message(Current_Line);
X  EndIf;
X EndLoop;
XEndProcedure;
X
XfT:=0;`009! Counter for found TABs
XTp:=0;`009! Counter for inserted blanks
XTAB:=ASCII(9);
XSP:=ASCII(32);
X
Xf:=Get_Info(Command_Line,"Journal_File");
XTAB_STOP:=Int(Substr(f,1,2));
Xj:=Substr(f,3,255);
X
Xf:=Get_Info(Command_Line,"File_Name");
Xb:=Create_Buffer("",f);
XSet (Output_File,b,File_Parse(Get_Info(Command_Line,"Output_File")));
X
XIf j="FTN" then
X Message("%DETAB-I-RATISFTN, Input file "+f+" is FTN");
X COOKIE:=2;
X l:=TAB_STOP+2;
X xt:="2";
X Loop
X  xt:=xt+" "+STR(l);
X  ExitIf l>512;
X  l:=l+TAB_STOP;
X EndLoop;
X Set(Tab_Stops,b,xt);
Xelse
X COOKIE:=1;
X Set(Tab_Stops,b,TAB_STOP);
XEndIf;
XMessage("%DETAB-I-TABSET, TAB_STOP set to "+STR(TAB_STOP));
X
XPosition (Beginning_of(b));
XDetab;
X
XIf fT=0 then
X Message("%DETAB-I-NOREPL, No TABS were found, no output file was written.");
Xelse
X Message("%DETAB-I-REPL, "+FAO("!UL TAB!%S replaced by !UL blank!%S.",fT,Tp));
XEndIf;
X
XExit;
X$Exit: Verify=F$Verify(Verify)
$ GoSub Convert_File
$ Exit