[news.admin] Messages with >80-character lines

dce@mips.UUCP (David Elliott) (10/11/87)

A year or so ago, I began noticing news postings with lines longer
than 80 characters. These can be a real pain to read, and at one
time I actually had my global rn kill file set up to junk all
articles from Apollo (where most of these were coming from at the
time).

Anyway, with the proliferation of window systems on the net, I
believe that we may be seeing more and more of this type of thing.

First of all, is this a problem? If so, what can we do about it?
If not, convince me that I shouldn't care (remember that it may
be a while before I can get a wide terminal for home, where I read
news).

-- 
David Elliott		dce@mips.com  or  {ames,decwrl,prls}!mips!dce

jef@unisoft.uucp (Jef Poskanzer) (10/15/87)

In the referenced message, dce@mips.UUCP (David Elliott) wrote:
}A year or so ago, I began noticing news postings with lines longer
}than 80 characters. These can be a real pain to read, and at one
}time I actually had my global rn kill file set up to junk all
}articles from Apollo (where most of these were coming from at the
}time).
}
}Anyway, with the proliferation of window systems on the net, I
}believe that we may be seeing more and more of this type of thing.
}
}First of all, is this a problem? If so, what can we do about it?
}If not, convince me that I shouldn't care (remember that it may
}be a while before I can get a wide terminal for home, where I read
}news).

It is a problem, and what we should do about it is fix the news-reading
and news-transferring programs to handle such messages in a reasonable
manner.  Real soon now, >80 character lines will become the norm, and
we had better be ready for them.

Many people have now discovered that the easiest and most natural way
to make text be screen-width-independent is to use <newline> as a
paragraph separator, not a line separator.  The program that displays
the text to the user then becomes responsible for breaking the paragraphs
up into screen lines.  You would not believe how much nicer this makes
things.  Not only does it solve the problem of different people using
different size windows and different width fonts, it also makes composing
text much more of a pleasure - no more reformatting.

Unfortunately, many programs have built-in limits on line length.
For example, pretty much every mailer on the DOD Internet does
disgusting things to lines >80 characters.  The SMTP protocol
specifies a maximum line length of 1000 characters.  And of course,
vi simply loses.

You can be sure that any programs I write can handle arbitrary-length
lines.  The rest of you had better start hacking...
---
Jef

    Jef Poskanzer  unisoft!jef@ucbvax.Berkeley.Edu  ...ucbvax!unisoft!jef
                    Fools rush in and get the best seats.

                     ...and now, a word from our sponsor:
    "The opinions expressed are those of the author and do not necessarily
       represent those of UniSoft Corp, its staff, or its management."

rees@apollo.uucp (Jim Rees) (10/16/87)

    Many people have now discovered that the easiest and most natural way
    to make text be screen-width-independent is to use <newline> as a
    paragraph separator, not a line separator.  The program that displays
    the text to the user then becomes responsible for breaking the paragraphs
    up into screen lines.  You would not believe how much nicer this makes
    things.  Not only does it solve the problem of different people using
    different size windows and different width fonts, it also makes composing
    text much more of a pleasure - no more reformatting.

I don't see why we should have to change the format of the text as sent.
It's easy to tell where lines and paragraphs end with the existing
format.  Lines end in a single NL, paras end in a double NL.  You can
still write a filter that reformats paras to your favorite line length.
This is in fact what the news reading interface (emacs based) that I
used to use did.
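
A minimal sketch of such a filter, assuming paragraphs are separated by one or more empty lines and that every paragraph may be re-wrapped (which is not true of tables, code, or quoted text). The function name `reflow` and the default width are illustrative only; this is not the emacs-based interface mentioned above.

```python
import re
import textwrap


def reflow(text, width=72):
    """Re-wrap each blank-line-separated paragraph to at most `width` columns."""
    # A run of two or more newlines is taken as the paragraph boundary.
    paragraphs = re.split(r"\n{2,}", text.strip("\n"))
    # Collapse the existing line breaks inside a paragraph, then wrap again.
    wrapped = [textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs]
    return "\n\n".join(wrapped)
```

The same article can then be shown at any width by calling `reflow(article, width=columns)` at display time, leaving the transmitted text unchanged.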

david@ms.uky.edu (David Herron -- Resident E-mail Hack) (10/18/87)

>>    Many people have now discovered that the easiest and most natural way
>>    to make text be screen-width-independent is to use <newline> as a
>>    paragraph separator, not a line separator.

PLEASE

We (ukma) exchange a lot of news with BITNET sites.  In particular,
an IBM machine at the U of Pennsylvania, and a VMS Vax cluster at
the U of Louisville.  In both cases their operating systems limit
text files to some maximum number of characters per line.  (The
IBM machine limits it to 132 columns and I don't know what the
VMS machine limits itself to).

In addition ... the file transfers are going over BITNET.  In this
case, BITNET means   CARD PUNCHES   virtual style.  The news is
transferred using a PUNCH deck (Maybe a print deck ... same problems)
in fixed length records.  We're talking truncation city folks!

The point is that this network is rapidly growing away from its
roots as a UUCP-only network.  We've got greater use of the Internet
going on as well as (potentially) BITNET.  To an extent we can't
violate the standards of other networks and expect to get away
with it.  Instead, we need to be able to live with them.
-- 
<---- David Herron,  Local E-Mail Hack,  david@ms.uky.edu, david@ms.uky.csnet
<----                    {rutgers,uunet,cbosgd}!ukma!david, david@UKMA.BITNET
<---- I thought that time was this neat invention that kept everything
<---- from happening at once.  Why doesn't this work in practice?

fair@ucbarpa.Berkeley.EDU (Erik E. Fair) (10/18/87)

David, are you telling me that we are bound by the most restrictive
set of standards network-wide that any one transport forces on us?

That's not reasonable. The reasonable approach is to do a trivial
encapsulation or encoding that makes it possible to move USENET
articles (no matter what their characteristics are) through BITNET,
or any other strange network.

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.berkeley.edu

david@ms.uky.edu (David Herron -- Resident E-mail Hack) (10/18/87)

In article <21314@ucbvax.BERKELEY.EDU> fair@ucbarpa.Berkeley.EDU (Erik E. Fair) writes:
>David, are you telling me that we are bound by the most restrictive
>set of standards network-wide that any one transport forces on us?

hmmmm .... weeeelll...  

>That's not reasonable. The reasonable approach is to do a trivial
>encapsulation or encoding that makes it possible to move USENET
>articles (no matter what their characteristics are) through BITNET,
>or any other strange network.

yes, I did exactly that for a long time with a news feed we had
coming from GaTech's sole Unix machine on BITNET (gtfelix).  We
used a little pipeline of "compress -d file | btoa" on the sending
side and "atob | uncompress" on the receiving side.  I still use
that same set of stuff with the feed to the VMS machine.
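
The compress-then-encode idea can be sketched as follows. This is only an illustration: Python's `zlib` and `base64` modules stand in for compress and btoa (they are different formats), and the names `encapsulate` and `decapsulate` and the 64-column record width are assumptions, not anything used on the actual feed.

```python
import base64
import zlib


def encapsulate(article: bytes, width: int = 64) -> str:
    """Compress an article and encode it as short, printable ASCII lines."""
    encoded = base64.b64encode(zlib.compress(article)).decode("ascii")
    # Cut the encoded text into fixed-size pieces so no line exceeds `width`.
    return "\n".join(encoded[i:i + width] for i in range(0, len(encoded), width))


def decapsulate(payload: str) -> bytes:
    """Reverse encapsulate(): drop the line breaks, decode, decompress."""
    return zlib.decompress(base64.b64decode("".join(payload.split())))
```

Whatever the article's own line lengths, the transported form never has a line longer than `width`, provided the intermediate network preserves the encoding alphabet.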

BUT ... compress and atob/btoa don't run on the IBM 308x that's
our other neighbor on BITNET.  ALSO, in both cases their underlying
operating systems have that record-oriented mentality.

I agree that it's ridiculous that silly details of the transport
system, or other operating systems' storage methods, should
cause us to stunt the development of the software.

BUT

Some of us (you included) are trying to free this network from its
reliance on Unix.  Building the WorldNet and such like.  But what will
the IBM people on BITNET think if they start seeing every article come
in with 2000 character long lines because someone on a Unix machine
wanted "automatic formatting" of his paragraphs?  They'll only be able
to read the first 80 (132?) characters of each paragraph.

YES ... that 3081 at Penn State and the VMS machine at U of L could
patch up their news to use some other storage method.  But they
will gripe every inch of the way and will end up with a slower system
to boot.  (likely).



In essence you're looking down your noses at these people, and just
continuing the old tradition of saying "My <x> is better than yours".
Of course, they do it just as much as we do.  WHICH DOESN'T MAKE IT
ANY MORE CORRECT A THING TO DO.  Each <x> has its good points and
bad points.  BASIC is still around because it's an easy-to-use language
and is very good at certain tasks that just need to be solved quickly.
IBMs are still around because some people just prefer that mind-set.
(I personally don't understand why, they just do).




All I wanted to say in my original posting was that we should always
keep in mind the least-common-denominator.  At the moment it's 80x24
screens.  But I really like the 66-line by 96-column display on
my Blit... :-)


>	Erik E. Fair	ucbvax!fair	fair@ucbarpa.berkeley.edu


-- 
<---- David Herron,  Local E-Mail Hack,  david@ms.uky.edu, david@ms.uky.csnet
<----                    {rutgers,uunet,cbosgd}!ukma!david, david@UKMA.BITNET
<---- I thought that time was this neat invention that kept everything
<---- from happening at once.  Why doesn't this work in practice?

blarson@skat.usc.edu (Bob Larson) (10/19/87)

In article <7526@g.ms.uky.edu> david@ms.uky.edu (David Herron -- Resident E-mail Hack) writes:
>In article <21314@ucbvax.BERKELEY.EDU> fair@ucbarpa.Berkeley.EDU (Erik E. Fair) writes:
>>That's not reasonable. The reasonable approach is to do a trivial
>>encapsulation or encoding that makes it possible to move USENET
>>articles (no matter what their characteristics are) through BITNET,
>>or any other strange network.

>yes, I did exactly that for a long time with a news feed we had

>BUT ... compress and atob/btoa don't run on the IBM 308x that's
>our other neighbor on BITNET.

Who said it had to be compress and btoa?

>I agree that it's ridiculous that silly details of the transport
>system, or other operating systems' storage methods, should
>cause us to stunt the development of the software.

>Some of us (you included) are trying to free this network from its
>reliance on Unix.  Building the WorldNet and such like.  But what will
>the IBM people on BITNET think if they start seeing every article come
>in with 2000 character long lines because someone on a Unix machine
>wanted "automatic formatting" of his paragraphs?  They'll only be able
>to read the first 80 (132?) characters of each paragraph.

So who's forcing them to truncate????  Why can't they set up some
continuation line convention?  A possible example would be to put a \
in column 80 to indicate that the next line is really part of the
current line.  The only programs that would have to know about such a
convention already have to do ascii <-> ebcdic conversion, etc.  (So
it looks ugly to the news readers on the IBM system.  If they care,
they can fix their software.)
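
One way such a convention might look, as a sketch under stated assumptions: records are at most 80 characters, a backslash in column 80 marks a continuation, and the backslash survives the ascii <-> ebcdic conversion. The names `fold` and `unfold` are illustrative. Any line of 80 or more characters is folded, so a record that is exactly 80 characters long and ends in a backslash is always a continuation and never ambiguous.

```python
def fold(line, width=80):
    """Split one logical line into records of at most `width` characters."""
    records = []
    body = width - 1  # room left once the marker takes the last column
    while len(line) >= width:
        records.append(line[:body] + "\\")  # marker lands in column `width`
        line = line[body:]
    records.append(line)  # final piece, always shorter than `width`
    return records


def unfold(records, width=80):
    """Rebuild logical lines from records produced by fold()."""
    lines, current = [], ""
    for rec in records:
        if len(rec) == width and rec.endswith("\\"):
            current += rec[:-1]  # continuation: strip marker, keep going
        else:
            lines.append(current + rec)
            current = ""
    if current:  # input ended in the middle of a folded line
        lines.append(current)
    return lines
```

Only the gateway needs to call `fold` on the way in and `unfold` on the way out; readers on the record-oriented side see the raw records unless their software also calls `unfold`.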

While we're talking about fixing the news problems caused by bitnet,
could they standardize an ascii <-> ebcdic conversion table for this
use and make sure that tabs don't get converted to spaces?  (The
conversion breaks patch files, sendmail.cf files, etc.)

>In essence you're looking down your noses at these people, and just
>continuing the old tradition of saying "My <x> is better than yours".

No, we're saying if you are a single person who wants to talk to
several thousand that already speak the same language, trying to
insist that those thousands always use a subset of their language that
you happen to speak so you don't have to bother to learn the rest of
the language probably won't get you far.

--
Bob Larson		Arpa: Blarson@Ecla.Usc.Edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson		blarson@skat.usc.edu
Prime mailing list (requests):	info-prime-request%fns1@ecla.usc.edu

henry@utzoo.UUCP (Henry Spencer) (10/19/87)

> It is a problem, and what we should do about it is fix the news-reading
> and news-transferring programs to handle such messages in a reasonable
> manner.  Real soon now, >80 character lines will become the norm, and
> we had better be ready for them.

Better yet, we should stay compatible with existing practice -- a matter
of considerable importance in a network like this, where coordinated software
updates are utterly impossible -- and let the long-linists fix *their*
software to present text the way they like it while adhering to existing
standards for inter-system transmission.

> Many people have now discovered that the easiest and most natural way
> to make text be screen-width-independent is to use <newline> as a
> paragraph separator, not a line separator...

Actually, text formatters discovered that it was quite possible to have
text be output-device-width-independent without this silly incompatibility
some twenty or more years ago.  Just notice the empty line that separates
the paragraphs.  (Oh yes, and read the ASCII standard about the meaning of
newline, so you know what you're trying to be compatible with.)
-- 
"Mir" means "peace", as in           |  Henry Spencer @ U of Toronto Zoology
"the war is over; we've won".        | {allegra,ihnp4,decvax,utai}!utzoo!henry

karl@haddock.ISC.COM (Karl Heuer) (10/22/87)

In article <37e7ff5a.b8ab@apollo.uucp> rees@apollo.uucp (Jim Rees) writes:
>I don't see why we should have to change the format of the text as sent.
>It's easy to tell where lines and paragraphs end with the existing
>format.  Lines end in a single NL, paras end in a double NL.

I wish this were true.  Unfortunately, there are some folks out there who use
"\n[ \t][ \t]*" rather than "\n\n" as their paragraph separator.

Write a filter that recognizes both formats, you say?  Good idea, but now I
have to worry about the people who think that indentation is a good way to
highlight quoted text.  And their counterparts who believe that the quoted
text should be left as is, and the reply indented.  Intelligent intervention
is required at this point, and since AI doesn't exist, that means a human.
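
To make the difficulty concrete, here is a small sketch of a splitter that accepts both separators. The pattern and the name `split_paragraphs` are illustrative assumptions; the second alternative is a lookahead version of "\n[ \t][ \t]*" that keeps the indentation with the paragraph it starts.

```python
import re

# Either an empty line, or a newline followed by an indented, non-empty line.
PARA_BREAK = re.compile(r"\n\n+|\n(?=[ \t]+\S)")


def split_paragraphs(text):
    """Split text wherever either paragraph convention appears."""
    return [p for p in PARA_BREAK.split(text) if p.strip()]


# Both conventions are recognized ...
blank_style = "First para\ncontinues.\n\nSecond para."
indent_style = "  First para\ncontinues.\n  Second para."

# ... but indented quoted text is indistinguishable from paragraph starts,
# so every quoted line is split off as its own "paragraph".
quoted = "My reply\n    quoted line one\n    quoted line two\nmore reply"
```

Here `split_paragraphs(quoted)` yields three pieces even though a human reader sees a reply with a two-line quotation inside it.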

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

jerry@oliveb.UUCP (Jerry Aguirre) (10/22/87)

In article <7526@g.ms.uky.edu> david@ms.uky.edu (David Herron -- Resident E-mail Hack) writes:
>All I wanted to say in my original posting was that we should always
>keep in mind the least-common-denominator.  At the moment it's 80x24
>screens.  But I really like the 66-line by 96-column display on
>my Blit... :-)

ACTUALLY I HAVE SEEN MORE THAN A FEW
ARTICLES THAT WERE POSTED ON UPPERCASE
ONLY TERMINALS.  SOME WERE EVEN
RESTRICTED TO 40 COLUMNS.  IF WE ARE
GOING TO RESTRICT OURSELVES TO THE
LEAST-COMMON-DENOMINATOR THEN LET US USE
40 COLUMN UPPERCASE ONLY.  OH, NO
BRACES, TILDE, OR PIPE SYMBOLS BECAUSE
THEY DON'T PRINT ON SOME TERMINALS.

					All :-) if you couldn't tell.

					Jerry Aguirre
					Systems Administration
					Olivetti ATC

(Actually when I read an all UPPER CASE article I am left with the
impression that the writer has been SHOUTING at me.)

nick@nswitgould.OZ (Nick Andrew) (10/22/87)

in article <7526@g.ms.uky.edu>, david@ms.uky.edu (David Herron -- Resident E-mail Hack) says:
| 
| Some of us (you included) are trying to free this network from its
| reliance on Unix.  Building the WorldNet and such like.  But what will
| the IBM people on BITNET think if they start seeing every article come
| in with 2000 character long lines because someone on a Unix machine
| wanted "automatic formatting" of his paragraphs?  They'll only be able
| to read the first 80 (132?) characters of each paragraph.
| 

	Gee whiz, if the IBM OS can't handle it then it must be accomplished
by the gateway machine(s). How many gateways are there between BITNET and
UUCP?  It should be a (relatively) simple matter for each gateway processor
to fold long lines before the IBMs get hold of it.  Slower?  Nah ... a couple
of instructions!

ACSnet:    nick@nswitgould.oz	zeta@runx.ips.oz
UUCP:      ...!uunet!munnari!nswitgould.oz!nick
Fidonet:   3:713/602
ACSgate:   3:713/603 (nick@zeta.fido@nswitgould.oz in development)

"Anything that is moral for a group to do is moral for one person to do"
			- Clark Fries in Heinlein's "Podkayne of Mars".

allbery@ncoast.UUCP (Brandon Allbery) (10/23/87)

As quoted from <7526@g.ms.uky.edu> by david@ms.uky.edu (David Herron -- Resident E-mail Hack):
+---------------
| In article <21314@ucbvax.BERKELEY.EDU> fair@ucbarpa.Berkeley.EDU (Erik E. Fair) writes:
| >That's not reasonable. The reasonable approach is to do a trivial
| >encapsulation or encoding that makes it possible to move USENET
| >articles (no matter what their characteristics are) through BITNET,
| >or any other strange network.
| 
| BUT ... compress and atob/btoa don't run on the IBM 308x that's
| our other neighbor on BITNET.  ALSO, in both cases their underlying
| operating systems have that record-oriented mentality.
+---------------

At least one Fido-compatible system uses an encoding such that lines end in
^M and paragraphs end in ^M^J; and the Fido standard is paragraph-oriented,
NOT line-oriented, so as to encourage word wrapping.  I suggest that a system
like this, with ^M inserted between words to force lines to < 80 characters,
would work fine without breaking filesystems based on fixed-length records
rather than variable-length ones (i.e. lines).  (For UNIX, ^M and ^J would
seem to be natural choices.  These can easily be changed to ^M and ^M^J for
non-UNIX sites, à la "text mode" umodem and kermit.)
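
A rough sketch of the scheme suggested above, using the ^M / ^M^J variant. It is not taken from any Fido software; the names `encode` and `decode` are hypothetical, and it assumes paragraphs contain single spaces between words and no word longer than the record width, since otherwise the round trip is not exact.

```python
import textwrap

SOFT = "\r"    # ^M: soft break inserted between words, safe to undo
HARD = "\r\n"  # ^M^J: real end of paragraph


def encode(paragraphs, width=79):
    """Join wrapped pieces with soft breaks and end each paragraph hard."""
    return "".join(
        SOFT.join(textwrap.wrap(p, width=width)) + HARD for p in paragraphs
    )


def decode(data):
    """Recover paragraphs: split on hard breaks, turn soft breaks into spaces."""
    parts = data.split(HARD)
    if parts and parts[-1] == "":
        parts.pop()  # the text after the final hard break is empty
    return [p.replace(SOFT, " ") for p in parts]
```

A fixed-length-record system can store each piece between breaks as one record, while a paragraph-oriented reader calls `decode` and wraps to whatever width it likes.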
-- 
Brandon S. Allbery		     necntc!ncoast!allbery@harvard.harvard.edu
  {{harvard,mit-eddie}!necntc,well!hoptoad,sun!mandrill!hal}!ncoast!allbery

rhorn@infinet.UUCP (Rob Horn) (10/24/87)

I don't think that these simple solutions will work well.  There are
many aspects of an article that have to be handled properly:
indentation of quotations, marking inclusions from previous postings,
poetry, pictures, etc.

A new format that can convey all this information and still meet the
needs of the most restrictive transport mechanisms would be to define
a minimal set of TeX macros that encompass the kinds of text
structures that news articles need.  Then the news reading software
can tailor the formatted display to the capabilities of the display
hardware.  The super fancy bitmap displays get spiffy formats and the
CRT users get the same old stuff.  This would have one drawback(?).
Since information like ``prior article inclusion'' is now formatted
locally a poster could not control whether the display uses >>'s,
font, or indentation to signify inclusion.  Similarly other text
structures might display differently on different devices.

I don't think this overall approach is yet practical.  A recognizing
filter seems plausible enough, although more complex than these early
posters seem to realize.  But on the display side I think that the
computational load would be excessive.  I can just imagine our poor
little 11/750 attempting to run multiple TeX's for all our news
readers.  Maybe some fast CRT oriented substitute could be dreamed up.

Another set of problems is implementing this in a manner that allows
for a rational transition period.  It must be able to coexist with
prior versions of software for several years --- this being the
approximate lifespan of obsolete versions of news software.  It must
be easy to add as an upgrade.  Both pose real difficulties. 

TeX is not the only suitable system, but it is well suited to conveying
structure independently from text.  If the other problems can be
overcome then the selection of what formatting system to use becomes
interesting.  I have my doubts about the suitability of either troff
or Postscript (and thus NeWS) because both of these are too close to
the display device and have already mapped some of the textual
structures into specific formatting concepts.



-- 
				Rob  Horn
	UUCP:	...harvard!adelie!infinet!rhorn
	Snail:	Infinet,  40 High St., North Andover, MA
	(Note: harvard!infinet path is in maps but not working yet)

dce@mips.UUCP (David Elliott) (10/25/87)

I'm really glad to see that my question has sparked so much thought
and discussion.

With C news in its Alpha stage, it might be nice to have some kind
of interim solution. For example, if the new news posting mechanism
(still inews?) could look at the message and if it finds any long
lines (where "long" can be arbitrarily set to 80 characters for now),
it prints the message:

	Warning: This article contains lines longer than 80 characters,
	         making it difficult for some people to read. The
	         article has been sent, but you should refrain from
	         doing this in the future.

This may be better to do in the news posting front ends (postnews,
Pnews, etc.), which could allow users to edit the article again to
remedy the situation, but this is more work to implement.

Another idea might be to have some sites (backbones) scan articles
for long lines, and send mail to the poster with a message similar 
to the warning above.

Yet another idea might be to add a Max-Line-Length: header field,
generated by inews for articles with lines longer than 80 (again,
chosen arbitrarily, and I would even suggest 40 in this case).
This field could be used by news software to reformat articles if
the user wishes, or in the rn KILL file to junk such messages (as
I said, if it looks too hard to read, I tend to toss it).
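
As a sketch of how the warning and the header might fit together: the code below is hypothetical and is not part of inews, B news, or C news. `Max-Line-Length:` is only the field proposed above, and representing headers as a list of strings is an assumption made for brevity.

```python
def max_line_length(body):
    """Length of the longest line, with tabs expanded to 8-column stops."""
    return max(len(line.expandtabs()) for line in body.split("\n"))


def check_article(headers, body, limit=80):
    """Return (headers, warning); both are unchanged/None for narrow articles."""
    longest = max_line_length(body)
    if longest <= limit:
        return headers, None
    # Record the measured width so readers and KILL files can act on it.
    headers = headers + [f"Max-Line-Length: {longest}"]
    warning = (
        f"Warning: This article contains lines longer than {limit} "
        "characters, making it difficult for some people to read."
    )
    return headers, warning
```

A posting front end could show the warning and offer to re-edit, while a reader could match on the added header to reformat or skip the article.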

-- 
David Elliott		dce@mips.com  or  {ames,decwrl,prls}!mips!dce

henry@utzoo.UUCP (Henry Spencer) (10/27/87)

> ... Unfortunately, there are some folks out there who use
> "\n[ \t][ \t]*" rather than "\n\n" as their paragraph separator...
> Write a filter that recognizes both formats, you say?  Good idea, but now I
> have to worry about the people who think that indentation is a good way to
> highlight quoted text... [and so on]

However, it is probably easier (if that is the word) and less painful to
convince people to adhere to standards in such things than to convince them
to shift to a new and *incompatible* standard.  The former can be at least
partly automated, by the way.
-- 
PS/2: Yesterday's hardware today.    |  Henry Spencer @ U of Toronto Zoology
OS/2: Yesterday's software tomorrow. | {allegra,ihnp4,decvax,utai}!utzoo!henry

kimcm@ambush.UUCP (Kim Chr. Madsen) (10/30/87)

In article <8831@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:

>However, it is probably easier (if that is the word) and less painful to
>convince people to adhere to standards in such things than to convince them
>to shift to a new and *incompatible* standard.  The former can be at least
>partly automated, by the way.

Which standard?????

The standards for writing style are dependent upon several things:

	1) Whom are you writing to (Newspaper, Tech. Journal, Letter
	   to Mom, etc.)
	2) Where do you come from (different countries have different
	   style standards).

etc. etc.

				Kim Chr. Madsen.

henry@utzoo.UUCP (Henry Spencer) (11/07/87)

> Which standard?????
> The standards for writing style are dependent upon several things:
> 	1) Whom are you writing to (Newspaper, Tech. Journal, Letter
> 	   to Mom, etc.)

So we set up a specific standard for Usenet.  No big deal, except for the
highly non-trivial problem of getting people to adhere to it.  My point
remains:  getting people to use a new standard will be easier if it doesn't
require scrapping everything that exists and starting over.
-- 
Those who do not understand Unix are |  Henry Spencer @ U of Toronto Zoology
condemned to reinvent it, poorly.    | {allegra,ihnp4,decvax,utai}!utzoo!henry