henseler@uniol.UUCP (Herwig Henseler) (02/28/90)
Hello World,

When Un*x mailers read a mailbox, they have to scan the whole file for
"From_" lines to detect the top of each message. So does elm. I can hardly
imagine a more ineffective way of achieving this aim! The mailbox format
is old enough to deserve an improvement...

Idea: Why not index these positions in a second file, so that only this
      file (with seek positions for every "From_" line together with the
      "From:" entry, the "Subject:" line and the total number of lines)
      has to be scanned to build the internal tables for elm. This would be
      _much_ faster!

This would not break any standard, because this file need not exist
for the incoming mailbox. Maybe some MTAs will adopt this mechanism!
Even if not, it would speed up reading my folders.

The only danger is deleting mails from an indexed mailbox, but this can
be detected via the last-modified date of the file.

Comments?

        bye, Herwig
--
## Herwig Henseler (CS-Student) D-2930 Varel, Tweehoernweg 69 | Brain fault- ##
## EMail: henseler@uniol.UUCP (..!uunet!unido!uniol!henseler) | core dumped ##
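[Editor's sketch] The side-file idea above might look roughly like the following. This is a minimal illustration, not anything elm implements: the function names (build_index, save_index, load_index), the JSON index format, and the choice to compare both modification time and size are all assumptions made for this example. It also assumes that every line starting with "From " begins a message, i.e. that body lines are escaped as ">From ".

```python
import json
import os


def build_index(mbox_path):
    """Scan the mailbox once; return one dict per message."""
    entries = []
    current = None
    in_headers = False
    with open(mbox_path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            if line.startswith(b"From "):
                # Assumed message separator: record where it starts.
                current = {"offset": offset, "from": "", "subject": "", "lines": 0}
                entries.append(current)
                in_headers = True
            if current is None:
                continue  # junk before the first message
            current["lines"] += 1  # count includes the "From " line itself
            if in_headers:
                lower = line.lower()
                if line in (b"\n", b"\r\n"):
                    in_headers = False  # blank line ends the header block
                elif lower.startswith(b"from:"):
                    current["from"] = line[5:].strip().decode("latin-1")
                elif lower.startswith(b"subject:"):
                    current["subject"] = line[8:].strip().decode("latin-1")
    return entries


def save_index(mbox_path, index_path):
    """Write the index plus the mailbox's mtime and size for staleness checks."""
    st = os.stat(mbox_path)
    data = {"mtime_ns": st.st_mtime_ns, "size": st.st_size,
            "entries": build_index(mbox_path)}
    with open(index_path, "w") as f:
        json.dump(data, f)


def load_index(mbox_path, index_path):
    """Return cached entries, or None if the index is missing or stale."""
    try:
        with open(index_path) as f:
            data = json.load(f)
    except (OSError, ValueError):
        return None
    st = os.stat(mbox_path)
    if data.get("mtime_ns") != st.st_mtime_ns or data.get("size") != st.st_size:
        return None  # mailbox changed since the index was written
    return data["entries"]
```

A reader would call load_index first and fall back to build_index (followed by save_index) when it returns None. Because the index is optional, a mailer that ignores it still sees an ordinary mailbox.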
syd@DSI.COM (Syd Weinstein) (03/01/90)
henseler@uniol.UUCP (Herwig Henseler) writes:

>When Un*x mailers read a mailbox, they have to scan the whole file for
>"From_" lines to detect the top of each message. So does elm. I can hardly
>imagine a more ineffective way of achieving this aim! The mailbox format
>is old enough to deserve an improvement...

>Idea: Why not index these positions in a second file, so that only this
>      file (with seek positions for every "From_" line together with the
>      "From:" entry, the "Subject:" line and the total number of lines)
>      has to be scanned to build the internal tables for elm. This would be
>      _much_ faster!

This was discussed a while back in the development group. Two proposals
were considered: one was to embed the index in the file itself as a fake
pseudo-message; the second was to use a separate file, with the sub-ideas
of one file per user or one file per mail file.

However, this whole point becomes less important as we head toward the
Content-Length: header, which allows seeking over the body, and we
do need to read the headers anyway.
--
=====================================================================
Sydney S. Weinstein, CDP, CCP                   Elm Coordinator
Datacomp Systems, Inc.                          Voice: (215) 947-9900
syd@DSI.COM or {bpa,vu-vlsi}!dsinc!syd          FAX:   (215) 938-0235
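[Editor's sketch] The Content-Length: approach mentioned above can be illustrated as follows: read each message's headers, then seek past the body instead of reading it. This is a sketch under stated assumptions, not elm's code. The function name scan_with_content_length is hypothetical, the header value is assumed to be a byte count of the body, and messages without a usable header fall back to line-by-line scanning for the next "From " line.

```python
import os


def scan_with_content_length(mbox_path):
    """Return a list of (offset, headers) pairs, skipping bodies by seeking."""
    messages = []
    with open(mbox_path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            if not line.startswith(b"From "):
                # Separator blank line, or body text of a message that
                # had no Content-Length (the slow fallback path).
                continue
            headers = {}
            while True:
                line = f.readline()
                if line in (b"", b"\n", b"\r\n"):
                    break  # end of headers (or end of file)
                if line[:1] in b" \t":
                    continue  # folded continuation line; ignored in this sketch
                name, sep, value = line.partition(b":")
                if sep:
                    key = name.strip().lower().decode("latin-1")
                    headers[key] = value.strip().decode("latin-1")
            messages.append((offset, headers))
            length = headers.get("content-length", "")
            if length.isdigit():
                # Jump over the body without reading it.
                f.seek(int(length), os.SEEK_CUR)
    return messages
```

One consequence worth noting: a body line that happens to start with "From " is skipped rather than mistaken for a new message, but only if the Content-Length value is accurate. A wrong value would leave the scanner in the middle of a body, so a careful reader would probably want to verify that a "From " line follows the seek.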
ror@grassys.bc.ca (Richard O'Rourke) (03/01/90)
In article <1990Feb28.230830.9818@DSI.COM>, syd@DSI.COM (Syd Weinstein) writes:
> henseler@uniol.UUCP (Herwig Henseler) writes:
#
# >When Un*x mailers read a mailbox, they have to scan the whole file for
# >"From_" lines to detect the top of each message. So does elm. I can hardly
# >imagine a more ineffective way of achieving this aim! The mailbox format
# >is old enough to deserve an improvement...
#
# >Idea: Why not index these positions in a second file, so that only this
# >      file (with seek positions for every "From_" line together with the
# >      "From:" entry, the "Subject:" line and the total number of lines)
# >      has to be scanned to build the internal tables for elm. This would be
# >      _much_ faster!
# This was discussed a while back in the development group. Two proposals
# were considered: one was to embed the index in the file itself as a fake
# pseudo-message; the second was to use a separate file, with the sub-ideas
# of one file per user or one file per mail file.
#

If you're going to go to this much trouble, I respectfully recommend
reading the applicable X.400 documents on message database handling.

I'm not suggesting you spend your next year implementing X.400. I am
suggesting that if you are going to take the step of using 'mangled'
message files or some sort of keyed or database message system, a
perusal of the applicable standards is in order. It would be a step in
the right direction.

# =====================================================================
# Sydney S. Weinstein, CDP, CCP                   Elm Coordinator
# Datacomp Systems, Inc.                          Voice: (215) 947-9900
# syd@DSI.COM or {bpa,vu-vlsi}!dsinc!syd          FAX:   (215) 938-0235
les@chinet.chi.il.us (Leslie Mikesell) (03/02/90)
In article <1990Feb28.230830.9818@DSI.COM> syd@DSI.COM writes:

>However, this whole point becomes less important as we head toward the
>Content-Length: header which allows for seeking over the body anyway
>and we do need to read the headers anyway.

I'd like to see Content-Length: handling for compatibility with the
AT&T PMX mailers (attach/detach of multi-part messages would be nice
too). Does anything else currently use it?

However, it would still save time to have an optional copy of the headers
and file offsets of each message stored in a second file. Perhaps you could
just dump the internal index when saving a mailbox over a certain size;
the next time, check for that file and, if it exists, checkpoint the last
entry to verify that the mailbox is unchanged up to that point, then merge
in any appended items.

If you want something really different, I'd like to see something like
a zoo archive with the body compressed as an optional storage format.

Les Mikesell
les@chinet.chi.il.us
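[Editor's sketch] The checkpoint-and-merge step described above could be sketched like this. The names scan_from and refresh_index are hypothetical, and the index is reduced to a list of (offset, "From " line) pairs to keep the example short. The check compares only the last indexed entry against the mailbox, as the post suggests; that catches deletions and most rewrites but is not proof that every earlier message is untouched.

```python
def scan_from(f, start):
    """Yield (offset, from_line) for each "From " line at or after start."""
    f.seek(start)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            return
        if line.startswith(b"From "):
            yield offset, line


def refresh_index(mbox_path, index):
    """Bring a saved index up to date.

    index is a list of (offset, from_line) pairs from an earlier run.
    If the last pair still matches the mailbox, only the tail is rescanned
    and appended messages are merged in; otherwise everything is rescanned.
    """
    with open(mbox_path, "rb") as f:
        if index:
            last_offset, last_line = index[-1]
            f.seek(last_offset)
            if f.readline() == last_line:
                # Checkpoint holds: keep the old entries and rescan from
                # the last known message onward.
                tail = list(scan_from(f, last_offset))
                return list(index[:-1]) + tail
        # No index, or the checkpoint failed: rebuild from scratch.
        return list(scan_from(f, 0))
```

With a large folder that only ever grows, the rescan touches just the last known message and whatever was appended after it, which is where the time saving would come from.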