[comp.lang.c] How do I SHORTEN a file without rewriting it?

alex@bilver.UUCP (Alex Matulich) (10/23/90)

I have a very large file which was made by writing a bunch of data
structures out to disk.  When I wish to delete a structure from this
data file, I simply read all the structures following the one to be deleted
and write them out one record-length forward, one at a time.  I use the
fseek(), fread(), and fwrite() functions for this.

This process does the job, but it leaves an unused record at the end of
the file, and of course the file length remains unchanged.
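
A minimal sketch of the shift-forward deletion described above, assuming
fixed-size records and a stream opened for update in binary mode ("r+b").
The name delete_record() and its parameters are illustrative; they do not
come from the original post.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Remove record number `index` (0-based) from a file holding `count`
 * fixed-size records by copying every later record one slot toward the
 * start of the file.  Returns 0 on success, -1 on error.  The file keeps
 * its old length, so the last slot still holds a stale copy afterwards. */
int delete_record(FILE *fp, long index, long count, size_t record_size)
{
    char *buf;
    long i;
    int status = 0;

    assert(fp != NULL && record_size > 0);
    if (index < 0 || index >= count)
        return -1;

    buf = malloc(record_size);
    if (buf == NULL)
        return -1;

    for (i = index + 1; i < count; i++) {
        /* Read record i, then write it over record i - 1.  The fseek()
         * between the read and the write is required on update streams. */
        if (fseek(fp, i * (long)record_size, SEEK_SET) != 0 ||
            fread(buf, record_size, 1, fp) != 1 ||
            fseek(fp, (i - 1) * (long)record_size, SEEK_SET) != 0 ||
            fwrite(buf, record_size, 1, fp) != 1) {
            status = -1;
            break;
        }
    }

    free(buf);
    if (fflush(fp) != 0)
        status = -1;
    return status;
}
```

Nothing in this sketch shortens the file; it only reproduces the leftover
record at the end that the question is about.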

Is there a way to shorten a file, that is, chop some data off the end of
it, so that it doesn't consume as much physical space on the disk?  The
file I have is too big to read into memory and write back out again, and
there is not enough room on the disk to write out a temporary file.

You can email any help you want, but I'll be looking in this newsgroup
for answers also.  Thanks...

-- 
 _ |__  Alex Matulich   (alex@bilver.UUCP)
 /(+__>  Unicorn Research Corp, 4621 N Landmark Dr, Orlando, FL 32817
//| \     UUCP:  ...uunet!tarpit!bilver!alex
///__)     bitnet:  IN%"bilver!alex@uunet.uu.net"

bad@atrain.sw.stratus.com (Bruce Dumes) (10/24/90)

In article <1162@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
>
>Is there a way to shorten a file, that is, chop some data off the end of
>it, so that it doesn't consume as much physical space on the disk?  The
>file I have is too big to read into memory and write back out again, and
>there is not enough room on the disk to write out a temporary file.
>

Have you thought about using ftruncate()?

Bruce


--
Bruce Dumes			|  "You don't see many of *these* nowdays, |
bad@zen.cac.stratus.com		|   do you?"				   |

michi@ptcburp.ptcbu.oz.au (Michael Henning) (10/25/90)

alex@bilver.UUCP (Alex Matulich) writes:

>Is there a way to shorten a file, that is, chop some data off the end of
>it, so that it doesn't consume as much physical space on the disk?  The
>file I have is too big to read into memory and write back out again, and
>there is not enough room on the disk to write out a temporary file.

Ftruncate() (BSD call) will do the job. Under AIX (maybe others), there
is an fclear() call that allows you to punch holes into a file at arbitrary
places. The blocks corresponding to the hole(s) are returned to the file system.
In SysV.4, you can use fcntl() to do the same.
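
A minimal sketch of the ftruncate() route, assuming a BSD or POSIX-style
system that provides open(), fstat() and ftruncate(). The helper name
drop_last_record() and its parameters are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* ask the headers to declare ftruncate() */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Shorten the file at `path` by one record of `record_size` bytes.
 * Returns 0 on success, -1 on error or if the file is shorter than
 * one record. */
int drop_last_record(const char *path, off_t record_size)
{
    struct stat st;
    int fd;
    int rc = -1;

    assert(path != NULL && record_size > 0);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* Ask the system for the current size, then cut one record off. */
    if (fstat(fd, &st) == 0 && st.st_size >= record_size)
        rc = ftruncate(fd, st.st_size - record_size);

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Called once after the records have been shifted forward, this would remove
the leftover slot at the end of the file.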

							Michi.

-- 
      -m------- Michael Henning			+61 75 950255
    ---mmm----- Pyramid Technology		+61 75 522475 FAX
  -----mmmmm--- Research Park, Bond University	michi@ptcburp.ptcbu.oz.au
-------mmmmmmm- Gold Coast, Q 4229, AUSTRALIA	uunet!munnari!ptcburp.oz!michi

alex@bilver.UUCP (Alex Matulich) (10/26/90)

In article <179@ptcburp.ptcbu.oz.au> michi@ptcburp.ptcbu.oz.au (Michael Henning) writes:
>>Is there a way to shorten a file, that is, chop some data off the end of
>
>Ftruncate() (BSD call) will do the job. Under AIX (maybe others), there
>is an fclear() call that allows you to punch holes into a file at arbitrary
>places. The blocks corresponding to the hole(s) are returned to the file system.
>In SysV.4, you can use fcntl() to do the same.

All very fine suggestions, provided I am running unix or a derivative of
unix.  A couple ANSI C-compilers I have looked at for MS-DOS do not have
these functions.

I was hoping there was a portable ANSI-ish way to accomplish this, but
it's beginning to look like that's not the case.

Thanks to all those who replied to my question!

-- 
 _ |__  Alex Matulich   (alex@bilver.UUCP)
 /(+__>  Unicorn Research Corp, 4621 N Landmark Dr, Orlando, FL 32817
//| \     UUCP:  ...uunet!tarpit!bilver!alex
///__)     bitnet:  IN%"bilver!alex@uunet.uu.net"

lairdb@crash.cts.com (Laird Broadfield) (10/27/90)

In article <2830@lectroid.sw.stratus.com> bad@atrain.sw.stratus.com (Bruce Dumes) writes:
>In article <1162@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
>>
>>Is there a way to shorten a file, that is, chop some data off the end of
>>it, so that it doesn't consume as much physical space on the disk?  The
>>file I have is too big to read into memory and write back out again, and
>>there is not enough room on the disk to write out a temporary file.
>
>Have you thought about using ftruncate()?

Okay, tell us where in K&R you find ftruncate.  I don't see it in
the TurboC manual, or K&R2.  Perhaps YOUR funky.lib has it, but not
everyone's does.  

As long as we're on the subject, does anyone have a neat-o method for
getting rid of records from the _beginning_ of a file?  (Standard preferred,
but if there's an MSDOS way, I'll accept it....)


-- 
--  Laird P. Broadfield                        | Year after year, site after
    UUCP: {akgua, sdcsvax, nosc}!crash!lairdb  | site, and I still can't think
    INET: lairdb@crash.cts.com                 | of a funny enough .sig.

henry@zoo.toronto.edu (Henry Spencer) (10/28/90)

In article <1244@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
>I was hoping there was a portable ANSI-ish way to accomplish this...

Basically, no.  ANSI C unfortunately cannot be very ambitious about file
operations, since there are so many brain-damaged operating systems out
there that are nevertheless viable C environments.
-- 
The type syntax for C is essentially   | Henry Spencer at U of Toronto Zoology
unparsable.             --Rob Pike     |  henry@zoo.toronto.edu   utzoo!henry

thurlow@convex.com (Robert Thurlow) (10/29/90)

In <5289@crash.cts.com> lairdb@crash.cts.com (Laird Broadfield) writes:

>In article <2830@lectroid.sw.stratus.com> bad@atrain.sw.stratus.com (Bruce Dumes) writes:
>>In article <1162@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
>>>
>>>Is there a way to shorten a file, that is, chop some data off the end of
>>>it, so that it doesn't consume as much physical space on the disk?  The
>>>file I have is too big to read into memory and write back out again, and
>>>there is not enough room on the disk to write out a temporary file.
>>
>>Have you thought about using ftruncate()?

>Okay, tell us where in K&R you find ftruncate.  I don't see it in
>the TurboC manual, or K&R2.  Perhaps YOUR funky.lib has it, but not
>everyone's does.  

Take it easy, Laird.  ftruncate() is a Berkeley Unix system call, and
would probably have nothing to do with K&R or Turbo C in any case.
While it isn't all that useful to point to it without saying where it
comes from, there very well might have been a lookalike function in
someone's C library.  And there was no clue to readers of comp.lang.c
that Alex wanted an MS-DOS solution.

Some people don't think enough about the fact that C is everywhere,
and that <insert your OS of choice here> is not reflective of the way
the whole world works.  It annoys me wherever I see it.  Bruce didn't
qualify where he found ftruncate(), but Alex committed an equal crime
by not saying he needed a very operating-system-dependent call on
MS-DOS, not BSD 4.3.  Some people get really confused between language
features and standard library routines and system calls, as well.  All
we can do is try to add to the information, rather than bitch because
people are talking about something Turbo C doesn't happen to support.

Rob T
--
Rob Thurlow, thurlow@convex.com or thurlow%convex.com@uxc.cso.uiuc.edu
----------------------------------------------------------------------
"This opinion was the only one available; I got here kind of late."

userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) (10/29/90)

In article <5289@crash.cts.com>, lairdb@crash.cts.com (Laird Broadfield) writes:
>In article <2830@lectroid.sw.stratus.com> bad@atrain.sw.stratus.com (Bruce Dumes) writes:
>>In article <1162@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
>>>
>>>Is there a way to shorten a file, that is, chop some data off the end of
>>    <<< deletions >>>
>>Have you thought about using ftruncate()?
>
>Okay, tell us where in K&R you find ftruncate.  I don't see it in
>     <<< deletions >>>
>As long as we're on the subject, does anyone have a neat-o method for
>getting rid of records from the _beginning_ of a file?  (Standard preferred,
>but if there's an MSDOS way, I'll accept it....)
>
Have you thought about using f_pre_truncate()? I think you'll find
it with ftruncate(), but in smiley.lib, not in funky.lib. :-)
 
-------------------+-------------------------------------------
Al Dunbar          |
Edmonton, Alberta  |   this space for rent
CANADA             |
-------------------+-------------------------------------------

Bob.Stout@p6.f506.n106.z1.fidonet.org (Bob Stout) (10/29/90)

  Since the Unix folks were quick to answer with environment-specific answers 
to an inherently non-portable question, I can feel free to tell you how to do 
it under MS-DOS since this is apparently what you were looking for.

  Use a DOS low-level (file handle) open call to open the file. Then, still 
using the low-level DOS services, seek to the position where you wish to 
truncate the file, then perform a DOS write of zero bytes. The same technique 
can be used to extend a file by seeking to a position past its present end.

  Several DOS C compilers already contain non-ANSI functions in their standard 
libraries to do this - look for chsize(). Hope this helps...
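
One way to hide the difference between chsize() and ftruncate() behind a
single call is sketched below. HAVE_CHSIZE is a hypothetical build flag
(define it yourself for a DOS compiler whose <io.h> supplies chsize());
SET_FD_SIZE and set_file_length() are likewise illustrative names, and
the non-DOS branch assumes a POSIX-style system.

```c
#define _POSIX_C_SOURCE 200809L   /* ask the headers to declare ftruncate() */
#include <assert.h>
#include <fcntl.h>

#if defined(HAVE_CHSIZE)          /* hypothetical flag, see the note above */
#include <io.h>
#define SET_FD_SIZE(fd, size) chsize((fd), (size))
#else
#include <sys/types.h>
#include <unistd.h>
#define SET_FD_SIZE(fd, size) ftruncate((fd), (off_t)(size))
#endif

#ifndef O_BINARY
#define O_BINARY 0                /* only DOS-style libraries need this flag */
#endif

/* Set the length of the file at `path` to `new_size` bytes.
 * Returns 0 on success, -1 on error. */
int set_file_length(const char *path, long new_size)
{
    int fd;
    int rc;

    assert(path != NULL && new_size >= 0);

    fd = open(path, O_WRONLY | O_BINARY);
    if (fd < 0)
        return -1;

    rc = SET_FD_SIZE(fd, new_size);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

This is still not ANSI C; it only collects the non-portable part in one
place.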
 

lairdb@crash.cts.com (Laird Broadfield) (10/29/90)

In article <thurlow.657134836@convex.convex.com> you write:
>In <5289@crash.cts.com> lairdb@crash.cts.com (Laird Broadfield) writes:
>
>>In article <2830@lectroid.sw.stratus.com> bad@atrain.sw.stratus.com (Bruce Dumes) writes:
>>>In article <1162@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
>>>>
>>>>Is there a way to shorten a file, that is, chop some data off the end of
>>>>it, so that it doesn't consume as much physical space on the disk?  The
>>>>file I have is too big to read into memory and write back out again, and
>>>>there is not enough room on the disk to write out a temporary file.
>>>
>>>Have you thought about using ftruncate()?
>
>>Okay, tell us where in K&R you find ftruncate.  I don't see it in
>>the TurboC manual, or K&R2.  Perhaps YOUR funky.lib has it, but not
>>everyone's does.  
>
>Take it easy, Laird.  ftruncate() is a Berkeley Unix system call, and
>would probably have nothing to do with K&R or Turbo C in any case.
>While it isn't all that useful to point to it without saying where it
>comes from, there very well might have been a lookalike function in
>someone's C library.  And there was no clue to readers of comp.lang.c
>that Alex wanted an MS-DOS solution.
>
>Some people don't think enough about the fact that C is everywhere,
>and that <insert your OS of choice here> is not reflective of the way
>the whole world works.  It annoys me wherever I see it.  ...

[flame control at 30%]
I don't particularly want to drag this out much further, but I've
received at least one flame on my message as well as your (non-flame,
(mostly) well-thought-out) response.  Perhaps my response was a little
vitriolic, but Bruce's had more than a small air of "What a stupid 
question."

1)  Alex did not ask for an MSDOS solution, he asked for a C solution.
    A proper response might have been "Let me explain the difference
    between language features and library functions ...()... therefore
    you should RTFM, libraries differ."  I don't think "Here's the function
    you blind idiot." with no mention of OS or compiler differences was
    valid.  In fact, _I_ was the first one to mention TurboC, and I could
    as well have mentioned my 8080-CP/M environment C compiler's libraries,
    the point is the same.  (It doesn't have ftruncate() either.)  (We
    may never know what Alex's environment is, he's probably too embarrassed
    by now over the fuss it's kicked up.  :-) )

2)  K&R is not the ANSI standard.  K&R2 _incorporates_ the ANSI standard,
    and a whole lot of other stuff, including library _guidelines_ which
    have, substantially, been followed by compiler vendors on _many_ 
    targets.  (Actually, I don't have it handy here, the library stuff
    may be part of the ANSI document, but I'm pretty sure if so they
    are "Additions and Appendices" rather than part of the standard itself.)

3)  I was not "bitching" that Bruce did not provide a tailored MSDOS-
    specific answer; I was "annoyed", as you say you are, by his assumption
    that BSD is the world.  Again, perhaps I was a little harsh in my
    response, and admittedly most academic computing fosters this view,
    but people do need to keep in mind that C implements on an incredibly
    wide range of targets and OSs.  Personally I think MSDOS is among the
    worst, but that's neither here nor there.

Anyway:
 1) Alex, read and understand about the differences between library and
    language, and then please _do_ ask appropriate questions.  Just
    remember to include sufficient background so net.land can help.
 2) Bruce, I didn't really mean to reply to a spitball with a tac-nuke,
    but you should remember that Berkeley, and *IX for that matter, is
    not the world.  Please accept one minor-apology token if you feel
    you deserve one.
 3) Rob, your comments are appropriate and well-taken.

'Nuff said.  ("yow!  let's burn some bandwidth, binky!")
-- 
--  Laird P. Broadfield                        | Year after year, site after
    UUCP: {akgua, sdcsvax, nosc}!crash!lairdb  | site, and I still can't think
    INET: lairdb@crash.cts.com                 | of a funny enough .sig.

John.Passaniti@f201.n260.z1.FIDONET.ORG (John Passaniti) (10/30/90)

 > As long as we're on the subject, does anyone have a
 > neat-o method for getting rid of records from the
 > _beginning_ of a file?  (Standard preferred,
 > but if there's an MSDOS way, I'll accept it....)

     If you are interested in a SLEAZY way to do this 
quickly, and if your records are multiples of the size of 
your minimum allocation units (clusters in MS-DOS), then you 
can modify the directory entry of the file to point to 
whatever record you want to start at.  Change the file size 
as necessary, and free the unused clusters at the beginning 
of the file.

     I said it was sleazy...


--  
*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*
John Passaniti - via FidoNet node 1:260/230
UUCP: ...!rochester!ur-valhalla!rochgte!201!John.Passaniti
INTERNET: John.Passaniti@f201.n260.z1.FIDONET.ORG
*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*

Bob.Stout@p6.f506.n106.z1.fidonet.org (Bob Stout) (10/30/90)

In a message of <Oct 27 00:12>, Laird Broadfield (lairdb@crash.cts.com ) writes: 


 >Okay, tell us where in K&R you find ftruncate.  I don't see it in
 >the TurboC manual, or K&R2.  Perhaps YOUR funky.lib has it, but not
 >everyone's does.  

  While you have your TC manual open, turn to chsize()...

 >As long as we're on the subject, does anyone have a neat-o method for
 >getting rid of records from the _beginning_ of a file?  (Standard 
 >preferred, but if there's an MSDOS way, I'll accept it....)

  That one you'll have to write yourself. 

david@csource.oz.au (david nugent) (11/02/90)

In <1162@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:

>Is there a way to shorten a file, that is, chop some data off the end of
>it, so that it doesn't consume as much physical space on the disk?  The
>file I have is too big to read into memory and write back out again, and
>there is not enough room on the disk to write out a temporary file.

Write zero bytes at that position.

Some C libraries have a chsize() function which does exactly that.
Since those libraries also don't seem to allow writing of zero bytes
you will need to create your own write function.


chsize.c:


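  int chsize (int fd, long newsize)
  {
     int r = -1;     /* assume failure until the zero-byte write succeeds */

     /* Seek to the new end of file, then write zero bytes there. */
     if (lseek (fd, newsize, SEEK_SET) != -1L)
          r = _write (fd, NULL, 0);
     return r;
  }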

  
_write.asm:

.model small,c

.code

_write PROC, fd:WORD, buf:PTR, count:WORD

   mov bx,[fd]
   mov cx,[count]
   mov dx,[buf]
   mov ah,40H
   int 21H
   jnc .W0
   mov ax,-1
.W0:
   ret

_write ENDP
   
-- 

        Fidonet: 3:632/348   SIGnet: 28:4100/1  Imex: 90:833/387
              Data:  +61-3-885-7864   Voice: +61-3-826-6711
 Internet/ACSnet: david@csource.oz.au    Uucp: ..!uunet!munnari!csource!david

dfoster@jarthur.Claremont.EDU (Derek R. Foster) (11/03/90)

In article <747@csource.oz.au> david@csource.oz.au (david nugent) writes:
>In <1162@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
>
>>Is there a way to shorten a file, that is, chop some data off the end of
>>it, so that it doesn't consume as much physical space on the disk?  The
>>file I have is too big to read into memory and write back out again, and
>>there is not enough room on the disk to write out a temporary file.
>
>Write zero bytes at that position.

If this works, it isn't documented in the Microsoft C manuals I have.
(And believe me, I searched!) After SEVERAL calls to Microsoft,
(Two separate people told me it couldn't be done from either within C or
through DOS! I thought these people were supposed to be knowledgeable!)
and a great deal of loud cursing, I was finally led to the chsize()
function. This seems to be the only way of doing this from within
Microsoft C, (And I suspect Turbo C as well.) If you are using
streams, you will probably have to close your stream, reopen the file
using handles, chsize() it, close it again, reopen using streams...
What a mess. But it works, and is better than (in my case) copying
a 20-meg file to a shorter length...

>Some C libraries have a chsize() function which does exactly that.
>Since those libraries also don't seem to allow writing of zero bytes
>you will need to create your own write function.

I'm not sure why it is preferable to create one's own write function
instead of just using chsize(). What is the advantage?

Derek Riippa Foster

ho@hoss.unl.edu (Tiny Bubbles...) (11/05/90)

In <9505@jarthur.Claremont.EDU> dfoster@jarthur.Claremont.EDU (Derek R. Foster) writes:
>In article <747@csource.oz.au> david@csource.oz.au (david nugent) writes:
>>In <1162@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
>>>Is there a way to shorten a file, that is, chop some data off the end of
>>>it, so that it doesn't consume as much physical space on the disk?  The
>>Write zero bytes at that position.
>                                              If you are using
>streams, you will probably have to close your stream, reopen the file
>using handles, chsize() it, close it again, reopen using streams...

If I were doing it, I'd just flush the stream, use the fileno() macro
to get the handle number, and then either close it or fseek() to the 
beginning (just to be safe).

I often do crappy things like changing file modes, etc. in this manner,
and it works fine.  Just flush before you access with handles.  Again,
I'm paranoid.  Most of the time, the flush and seek aren't needed.
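
A minimal sketch of the flush-then-fileno() approach, assuming a library
with fileno() and a POSIX-style ftruncate() (a DOS library would call
chsize() at that point instead). The name truncate_stream() is made up for
illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* ask the headers for fileno(), ftruncate() */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Truncate the file behind an open stream to `new_size` bytes without
 * closing and reopening it.  Returns 0 on success, -1 on error. */
int truncate_stream(FILE *fp, long new_size)
{
    assert(fp != NULL && new_size >= 0);

    /* Push buffered output to the file before touching the handle. */
    if (fflush(fp) != 0)
        return -1;

    if (ftruncate(fileno(fp), (off_t)new_size) != 0)
        return -1;

    /* Seek afterwards so the stream's position is known to be valid. */
    return fseek(fp, 0L, SEEK_SET) == 0 ? 0 : -1;
}
```

The final fseek() is the "just to be safe" step: it leaves the stream at a
position that certainly exists in the shortened file.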
--
        ... Michael Ho, University of Nebraska
Internet: ho@hoss.unl.edu | "Mine... is the last voice that you will ever hear."

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (11/05/90)

In <9505@jarthur.Claremont.EDU> dfoster@jarthur.Claremont.EDU (Derek R.
Foster) writes:

>>>Is there a way to shorten a file...

>>Write zero bytes at that position.

>If this works, it isn't documented in the Microsoft C manuals I have.
>[chsize()] seems to be the only way of doing this from within
>Microsoft C...

A write of zero bytes using the MS-DOS "write to file handle" system
call will indeed truncate the file at the current seek position.
User programs could well contain statements of the sort:

     count = read(fin, buf, count);
     if (count != -1)
          (void) write(fout, buf, count);

A zero-byte write could truncate the file connected with fout.  If we
had seeked to near the beginning of a long file and were trying to
overwrite some of the data without losing the rest, truncation would be
disastrous.

So it's at least an even bet that your C library carefully avoids
passing zero-byte writes on to MS-DOS.

The chsize() function, on the other hand, most likely simply does a
seek followed by zero-byte write.
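
The kind of guard a library might use can be sketched as below. This is an
assumption about how such a wrapper could look, not the code of any real C
library; guarded_write() is a hypothetical name, and the underlying call is
the POSIX-style write().

```c
#include <assert.h>
#include <unistd.h>

/* Write `count` bytes from `buf` to `fd`, but never hand a zero-byte
 * request to the operating system.  Returns the number of bytes written,
 * or -1 on error. */
long guarded_write(int fd, const void *buf, unsigned count)
{
    assert(buf != NULL || count == 0);

    /* A zero-byte request is answered here and never reaches the system,
     * so it cannot truncate anything. */
    if (count == 0)
        return 0;

    return (long)write(fd, buf, count);
}
```

With this in place, the read/write fragment above cannot truncate fout by
accident when read() returns zero at end of file.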
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi
A pointer is not an address.  It is a way of finding an address. -- me

dougs@videovax.tv.tek.com (Doug Stevens) (11/06/90)

> In <1162@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
> 	Is there a way to shorten a file ... ?

Turbo-C does indeed include chsize() in the library. Their implementation
uses exactly the same trick, i.e., a seek to the desired point of
truncation, and then a write of 0 bytes.

Look at page 1308 of the MS-DOS Encyclopedia (under Interrupt 21H, Function
40H, 'Write File or Device'):

	'If CX=0, the file is truncated or extended to the current 
	file pointer location'.

(CX is the number of bytes to write).

	

otto@tukki.jyu.fi (Otto J. Makela) (11/06/90)

In article <9505@jarthur.Claremont.EDU> dfoster@jarthur.Claremont.EDU (Derek R. Foster) writes:
   In article <747@csource.oz.au> david@csource.oz.au (david nugent) writes:
   >In <1162@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
   >[how do I shorten a file ?]
   >Write zero bytes at that position. [MeSsy-DOS only solution]

   If this works, it isn't documented in the Microsoft C manuals I have.
   (And believe me, I searched!) After SEVERAL calls to Microsoft,
   (Two separate people told me it couldn't be done from either within C or
   through DOS! I thought these people were supposed to be knowledgeable!)
   and a great deal of loud cursing, I was finally led to the chsize()
   function. This seems to be the only way of doing this from within
   Microsoft C, (And I suspect Turbo C as well.) If you are using
   streams, you will probably have to close your stream, reopen the file
   using handles, chsize() it, close it again, reopen using streams...
   What a mess. But it works, and is better than (in my case) copying
   a 20-meg file to a shorter length...

Look at the fileno() function in your manual (your library does have it,
I hope).  Returns the file descriptor (handle as it's called in MeSsy-DOS)
for the given stream.
--
   /* * * Otto J. Makela <otto@jyu.fi> * * * * * * * * * * * * * * * * * * */
  /* Phone: +358 41 613 847, BBS: +358 41 211 562 (CCITT, Bell 24/12/300) */
 /* Mail: Kauppakatu 1 B 18, SF-40100 Jyvaskyla, Finland, EUROPE         */
/* * * Computers Rule 01001111 01001011 * * * * * * * * * * * * * * * * */

smryan@garth.UUCP (Steven Ryan) (11/07/90)

In article <1990Oct27.231207.5611@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>operations, since there are so many brain-damaged operating systems out

Like Unix.

NOS, on the other hand, delinks following tracks (truncates the file) for all
sequential writes.
-- 
...!uunet!ingr!apd!smryan                                       Steven Ryan
...!{apple|pyramid}!garth!smryan              2400 Geng Road, Palo Alto, CA

drd@siia.mv.com (David Dick) (11/08/90)

In <747@csource.oz.au> david@csource.oz.au (david nugent) writes:

>In <1162@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:

>>Is there a way to shorten a file, that is, chop some data off the end of
>>it, so that it doesn't consume as much physical space on the disk?  The
>>file I have is too big to read into memory and write back out again, and
>>there is not enough room on the disk to write out a temporary file.

>Write zero bytes at that position.

Of course, this won't work on a UNIX system.
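
A small check of that point, assuming a POSIX-style system: after a seek
and a zero-byte write(), the file size is expected to be unchanged. The
function name size_after_zero_byte_write() is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Seek to `pos` in the file at `path`, write zero bytes there, and
 * return the resulting file size, or -1 on error. */
long size_after_zero_byte_write(const char *path, long pos)
{
    struct stat st;
    long result = -1;
    int fd;

    assert(path != NULL && pos >= 0);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* The zero-byte write is the MS-DOS truncation trick; here it is
     * only expected to return 0 and leave the file alone. */
    if (lseek(fd, (off_t)pos, SEEK_SET) != (off_t)-1 &&
        write(fd, "", 0) == 0 &&
        fstat(fd, &st) == 0)
        result = (long)st.st_size;

    close(fd);
    return result;
}
```

On such a system the returned size should equal the original size, which
is why ftruncate() is the call to use there.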