[news.misc] Messages with >80-character lines

jef@unisoft.uucp (Jef Poskanzer) (10/15/87)

In the referenced message, dce@mips.UUCP (David Elliott) wrote:
}A year or so ago, I began noticing news postings with lines longer
}than 80 characters. These can be a real pain to read, and at one
}time I actually had my global rn kill file set up to junk all
}articles from Apollo (where most of these were coming from at the
}time).
}
}Anyway, with the proliferation of window systems on the net, I
}believe that we may be seeing more and more of this type of thing.
}
}First of all, is this a problem? If so, what can we do about it?
}If not, convince me that I shouldn't care (remember that it may
}be a while before I can get a wide terminal for home, where I read
}news).

It is a problem, and what we should do about it is fix the news-reading
and news-transferring programs to handle such messages in a reasonable
manner.  Real soon now, >80 character lines will become the norm, and
we had better be ready for them.

Many people have now discovered that the easiest and most natural way
to make text be screen-width-independent is to use <newline> as a
paragraph separator, not a line separator.  The program that displays
the text to the user then becomes responsible for breaking the paragraphs
up into screen lines.  You would not believe how much nicer this makes
things.  Not only does it solve the problem of different people using
different size windows and different width fonts, it also makes composing
text much more of a pleasure - no more reformatting.
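
As a rough illustration of the display-side wrapping described above, here
is a minimal Python sketch. It assumes every newline-terminated line is one
paragraph. The name display_article and the widths are hypothetical and are
not taken from any existing news reader.

```python
import textwrap


def display_article(article: str, width: int = 80) -> str:
    """Wrap each newline-terminated paragraph to the reader's screen width."""
    screen_lines = []
    for paragraph in article.split("\n"):
        if paragraph.strip():
            # One stored line may become several screen lines.
            screen_lines.append(textwrap.fill(paragraph, width=width))
        else:
            # An empty paragraph is shown as a blank screen line.
            screen_lines.append("")
    return "\n".join(screen_lines)


if __name__ == "__main__":
    sample = "This paragraph is stored as a single long line. " * 4
    print(display_article(sample.strip(), width=40))
```

The same stored article can be shown at width 40 or width 132 without the
author reformatting anything.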

Unfortunately, many programs have built-in limits on line length.
For example, pretty much every mailer on the DOD Internet does
disgusting things to lines >80 characters.  The SMTP protocol
specifies a maximum line length of 1000 characters.  And of course,
vi simply loses.

You can be sure that any programs I write can handle arbitrary-length
lines.  The rest of you had better start hacking...
---
Jef

    Jef Poskanzer  unisoft!jef@ucbvax.Berkeley.Edu  ...ucbvax!unisoft!jef
                    Fools rush in and get the best seats.

                     ...and now, a word from our sponsor:
    "The opinions expressed are those of the author and do not necessarily
       represent those of UniSoft Corp, its staff, or its management."

rees@apollo.uucp (Jim Rees) (10/16/87)

    Many people have now discovered that the easiest and most natural way
    to make text be screen-width-independent is to use <newline> as a
    paragraph separator, not a line separator.  The program that displays
    the text to the user then becomes responsible for breaking the paragraphs
    up into screen lines.  You would not believe how much nicer this makes
    things.  Not only does it solve the problem of different people using
    different size windows and different width fonts, it also makes composing
    text much more of a pleasure - no more reformatting.

I don't see why we should have to change the format of the text as sent.
It's easy to tell where lines and paragraphs end with the existing
format.  Lines end in a single NL, paras end in a double NL.  You can
still write a filter that reformats paras to your favorite line length.
This is in fact what the news reading interface (emacs based) that I
used to use did.
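
A filter of that kind might look like the following Python sketch. It
assumes paragraphs are separated by one or more empty lines and that
everything inside a paragraph may be re-flowed; the name reflow and the
default width are hypothetical.

```python
import re
import textwrap


def reflow(article: str, width: int = 72) -> str:
    """Re-fill blank-line-separated paragraphs to the given line length."""
    paragraphs = re.split(r"\n{2,}", article.strip("\n"))
    filled = []
    for paragraph in paragraphs:
        # Join the existing short lines, then break them again at `width`.
        words = " ".join(paragraph.split())
        filled.append(textwrap.fill(words, width=width))
    return "\n\n".join(filled)


if __name__ == "__main__":
    print(reflow("Lines end in\na single NL.\n\nParas end in\na double NL.", 40))
```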

david@ms.uky.edu (David Herron -- Resident E-mail Hack) (10/18/87)

>>    Many people have now discovered that the easiest and most natural way
>>    to make text be screen-width-independent is to use <newline> as a
>>    paragraph separator, not a line separator.

PLEASE

We (ukma) exchange a lot of news with BITNET sites.  In particular,
an IBM machine at Penn State, and a VMS Vax cluster at
the U of Louisville.  In both cases their operating systems limit
text files to some maximum number of characters per line.  (The
IBM machine limits it to 132 columns and I don't know what the
VMS machine limits itself to).

In addition ... the file transfers are going over BITNET.  In this
case, BITNET means   CARD PUNCHES   virtual style.  The news is
transferred using a PUNCH deck (Maybe a print deck ... same problems)
in fixed length records.  We're talking truncation city folks!

The point is that this network is rapidly growing away from its
roots as a UUCP-only network.  We've got greater use of the Internet
going on as well as (potentially) BITNET.  To an extent we can't
violate the standards of other networks and expect to get away
with it.  Instead, we need to be able to live with them.
-- 
<---- David Herron,  Local E-Mail Hack,  david@ms.uky.edu, david@ms.uky.csnet
<----                    {rutgers,uunet,cbosgd}!ukma!david, david@UKMA.BITNET
<---- I thought that time was this neat invention that kept everything
<---- from happening at once.  Why doesn't this work in practice?

fair@ucbarpa.Berkeley.EDU (Erik E. Fair) (10/18/87)

David, are you telling me that we are bound by the most restrictive
set of standards network-wide that any one transport forces on us?

That's not reasonable. The reasonable approach is to do a trivial
encapsulation or encoding that makes it possible to move USENET
articles (no matter what their characteristics are) through BITNET,
or any other strange network.

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.berkeley.edu

david@ms.uky.edu (David Herron -- Resident E-mail Hack) (10/18/87)

In article <21314@ucbvax.BERKELEY.EDU> fair@ucbarpa.Berkeley.EDU (Erik E. Fair) writes:
>David, are you telling me that we are bound by the most restrictive
>set of standards network-wide that any one transport forces on us?

hmmmm .... weeeelll...  

>That's not reasonable. The reasonable approach is to do a trivial
>encapsulation or encoding that makes it possible to move USENET
>articles (no matter what their characteristics are) through BITNET,
>or any other strange network.

yes, I did exactly that for a long time with a news feed we had
coming from GaTech's sole Unix machine on BITNET (gtfelix).  We
used a little pipeline of "compress < file | btoa" on the sending
side and "atob | uncompress" on the receiving side.  I still use
that same set of stuff with the feed to the VMS machine.
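
The shape of that pipeline can be sketched in Python. This is not the real
compress/btoa pair: zlib stands in for compress and the standard library's
Ascii85 codec stands in for btoa, so the output is not interchangeable with
either tool. The function names and the 78-column wrap are assumptions.

```python
import base64
import zlib


def encode_for_transport(article: bytes, width: int = 78) -> bytes:
    """Compress, then armor as short printable lines (sending side)."""
    packed = zlib.compress(article)
    # wrapcol inserts a newline so that no output line exceeds `width`.
    return base64.a85encode(packed, wrapcol=width)


def decode_from_transport(payload: bytes) -> bytes:
    """Undo the armor, then decompress (receiving side)."""
    return zlib.decompress(base64.a85decode(payload))


if __name__ == "__main__":
    original = b"x" * 2000 + b"\n"  # one very long line
    wire = encode_for_transport(original)
    print(max(len(line) for line in wire.split(b"\n")))
    print(decode_from_transport(wire) == original)
```

Because every line on the wire is short, the original line lengths no
longer matter to the transport.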

BUT ... compress and atob/btoa don't run on the IBM 308x that's
our other neighbor on BITNET.  ALSO, in both cases their underlying
operating systems have that record-oriented mentality.

I agree that it's ridiculous that silly details of the transport
system, or other operating systems' storage methods, should
cause us to stunt the development of the software.

BUT

Some of us (you included) are trying to free this network from its
reliance on Unix.  Building the WorldNet and such like.  But what will
the IBM people on BITNET think if they start seeing every article come
in with 2000 character long lines because someone on a Unix machine
wanted "automatic formatting" of his paragraphs?  They'll only be able
to read the first 80 (132?) characters of each paragraph.

YES ... that 3081 at Penn State and the VMS machine at U of L could
patch up their news to use some other storage method.  But they
will gripe every inch of the way and will end up with a slower system
to boot.  (likely).



In essence you're looking down your noses at these people, and just
continuing the old tradition of saying "My <x> is better than yours".
Of course, they do it just as much as we do.  WHICH DOESN'T MAKE IT
ANY MORE CORRECT A THING TO DO.  Each <x> has its good points and
bad points.  BASIC is still around because it's an easy to use language
and is very good at certain tasks that just need to be solved quickly.
IBMs are still around because some people just prefer that mind-set.
(I personally don't understand why, they just do).




All I wanted to say in my original posting was that we should always
keep in mind the least-common-denominator.  At the moment it's 80x24
screens.  But I really like the 66line by 96 column display on
my Blit... :-)


>	Erik E. Fair	ucbvax!fair	fair@ucbarpa.berkeley.edu


-- 
<---- David Herron,  Local E-Mail Hack,  david@ms.uky.edu, david@ms.uky.csnet
<----                    {rutgers,uunet,cbosgd}!ukma!david, david@UKMA.BITNET
<---- I thought that time was this neat invention that kept everything
<---- from happening at once.  Why doesn't this work in practice?

blarson@skat.usc.edu (Bob Larson) (10/19/87)

In article <7526@g.ms.uky.edu> david@ms.uky.edu (David Herron -- Resident E-mail Hack) writes:
>In article <21314@ucbvax.BERKELEY.EDU> fair@ucbarpa.Berkeley.EDU (Erik E. Fair) writes:
>>That's not reasonable. The reasonable approach is to do a trivial
>>encapsulation or encoding that makes it possible to move USENET
>>articles (no matter what their characteristics are) through BITNET,
>>or any other strange network.

>yes, I did exactly that for a long time with a news feed we had

>BUT ... compress and atob/btoa don't run on the IBM 308x that's
>our other neighbor on BITNET.

Who said it had to be compress and btoa?

>I agree that it's ridiculous that silly details of the transport
>system, or other operating systems' storage methods, should
>cause us to stunt the development of the software.

>Some of us (you included) are trying to free this network from its
>reliance on Unix.  Building the WorldNet and such like.  But what will
>the IBM people on BITNET think if they start seeing every article come
>in with 2000 character long lines because someone on a Unix machine
>wanted "automatic formatting" of his paragraphs?  They'll only be able
>to read the first 80 (132?) characters of each paragraph.

So who's forcing them to truncate????  Why can't they set up some
continuation line convention?  A possible example would be to put a \
in column 80 to indicate that the next line is really part of the
current line.  The only programs that would have to know about such a
convention already have to do ascii <-> ebcdic conversion, etc.  (So
it looks ugly to the news readers on the IBM system.  If they care,
they can fix their software.)

While we're talking about fixing the news problems caused by bitnet,
could they standardize an ascii <-> ebcdic conversion table for this
use and make sure that tabs don't get converted to spaces?  (The
conversion breaks patch files, sendmail.cf files, etc.)

>In essence you're looking down your noses at these people, and just
>continuing the old tradition of saying "My <x> is better than yours".

No, we're saying if you are a single person who wants to talk to
several thousand that already speak the same language, trying to
insist that those thousands always use a subset of their language that
you happen to speak so you don't have to bother to learn the rest of
the language probably won't get you far.

--
Bob Larson		Arpa: Blarson@Ecla.Usc.Edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson		blarson@skat.usc.edu
Prime mailing list (requests):	info-prime-request%fns1@ecla.usc.edu

henry@utzoo.UUCP (Henry Spencer) (10/19/87)

> It is a problem, and what we should do about it is fix the news-reading
> and news-transferring programs to handle such messages in a reasonable
> manner.  Real soon now, >80 character lines will become the norm, and
> we had better be ready for them.

Better yet, we should stay compatible with existing practice -- a matter
of considerable importance in a network like this, where coordinated software
updates are utterly impossible -- and let the long-linists fix *their*
software to present text the way they like it while adhering to existing
standards for inter-system transmission.

> Many people have now discovered that the easiest and most natural way
> to make text be screen-width-independent is to use <newline> as a
> paragraph separator, not a line separator...

Actually, text formatters discovered that it was quite possible to have
text be output-device-width-independent without this silly incompatibility
some twenty or more years ago.  Just notice the empty line that separates
the paragraphs.  (Oh yes, and read the ASCII standard about the meaning of
newline, so you know what you're trying to be compatible with.)
-- 
"Mir" means "peace", as in           |  Henry Spencer @ U of Toronto Zoology
"the war is over; we've won".        | {allegra,ihnp4,decvax,utai}!utzoo!henry

fair@ucbarpa.Berkeley.EDU (Erik E. Fair) (10/20/87)

There are two issues here:

	1. Netnews transport
	2. Netnews presentation

The first issue is perfectly clear to me: all systems should be
able to transmit netnews articles through their gizzards without
change (excepting those changes in the headers that are mandated
by normal netnews operation, like updating "path:"). If there is
some type of netnews article that some minority of the network
can't swallow, then they're broken, and should be fixed, or left
on the periphery of netnews distribution so that their brokenness
won't affect the rest of the network. I do not intend to preclude
IBM systems from storing things in some internal format that is
more efficient for them; I just want them to understand that when
they transmit such an article to the outside world that the article
should be converted back to what the rest of the network views as
"normal": ASCII, with no transliterations, substitutions, or other
information loss.

The second issue is a bit more thorny. Taken to logical extreme,
we need to write articles in some formatting or page description
language, which the user interfaces interpret for whatever display
the user is using. SGML, anyone? Or perhaps {n,t,dit}roff? Maybe
PostScript?

Whatever we finally choose should be relatively easy to interpret,
easy to learn and write things in (nroff with -ms isn't so bad, if
you don't do anything too fancy), and yet powerful enough to do the
sort of fancy things you might see on a Sun or Macintosh. Not a
weekend hack project, it seems to me.

Given that none of the existing user interfaces is prepared to deal
with this sort of thing automatically (sure, you can pipe articles
to external interpreters, but that's not the point), we have to
make some assumptions, and the prevailing assumptions are 24 lines
of 80 ASCII characters, with various format effectors like tabs,
blank lines and form feeds. People who violate these assumptions
should bear in mind that in making their articles harder to read
on what is certainly the standard display size on the USENET today,
they are decreasing the probability that their message will be read
and understood.

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.berkeley.edu

david@ms.uky.edu (David Herron -- Resident E-mail Hack) (10/20/87)

In article <4756@oberon.USC.EDU> blarson@skat.usc.edu (Bob Larson) writes:
>In article <7526@g.ms.uky.edu> david@ms.uky.edu (David Herron -- Resident E-mail Hack) writes:
>>yes, I did exactly that for a long time with a news feed we had
>>BUT ... compress and atob/btoa don't run on the IBM 308x that's
>>our other neighbor on BITNET.
>Who said it had to be compress and btoa?

compress/btoa happen to be what I used since it worked with a
couple of the sites I wanted to exchange news with.  It could
be something else.

>>Some of us (you included) are trying to free this network from its
	[ I am speaking to Erik here ... ]
>>reliance on Unix.  Building the WorldNet and such like.  But what will
>>the IBM people on BITNET think if they start seeing every article come
>>in with 2000 character long lines because someone on a Unix machine
>>wanted "automatic formatting" of his paragraphs?  They'll only be able
>>to read the first 80 (132?) characters of each paragraph.

>So who's forcing them to truncate???? 

BITNET itself is doing the truncation.  Think that IBM only makes
high speed CARD PUNCHs and BITNET begins to make sense.  

> Why can't they set up some
>continuation line convention?  

To be honest, I don't think they really see the problem.

Also there's a bit of a chicken-and-egg problem.  There are news
readers and transport agents for IBM mainframes...  But the use
isn't very widespread.  And the people doing it don't really understand
that tab preservation or { and } preservation are needed things.
Also ... most of the equivalent sort of traffic gets handled by
their LISTSERVers.  It's a distributed mailing list handler which
allows people to subscribe/unsubscribe by themselves, and automagically
subscribes them to the nearest LISTSERV.

I don't really see a good solution.  Potentially they could be a
valuable addition to this WorldNet thingie we're trying to build.
Doing things which are against their operating systems' assumptions
is as irritating to them as their doing things against our os's
assumptions.


> A possible example would be to put a \
>in column 80 to indicate that the next line is really part of the
>current line.

Bad example.  Suppose a Makefile is posted which has a line exactly 80
characters long with a \ as the last character ...


I just remembered there's already one format in use that could be
used ... The Listserv-punch format ... it can at least handle
long records virtually within an 80 column punch file.  And there's
software around to encode/decode it already ... I'll have to look
into this ...

I think their current software works sort-of like ours.  A batch arrives,
gets put into a /usr/spool/news equivalent, and is transmitted from
there.  In order to not transmit the munged copy of the article 
across the net they'd have to do a non-trivial re-working of their
systems to achieve an effect which won't even be visible to themselves.

>While we're talking about fixing the news problems caused by bitnet,
>could they standardize an ascii <-> ebcdic conversion table for this
>use and make sure that tabs don't get converted to spaces?  (The
>conversion breaks patch files, sendmail.cf files, etc.)

For my part ... I've determined that the only munging (at least for
the news software at psuvm.bitnet) happens for articles which go:

	psuvax1 -> psuvm.bitnet -> ukma

(i.e. were transmitted by UREP) For our feeds to the outside world I
block any articles which arrived here via psuvax1 ... The only disaster
I know of from a munged article which went through these links was one
of the patches-for-patch.  Its tabs were changed to spaces causing it
to be useless unless you used the -l switch, but many people didn't
know about -l.  (It so happened that sdcsvax -> burdvax -> psuvax1 ->
ukma -> cbosgd was VERY VERY fast that day).




I think we all understand the problems.

Personally I don't like the idea of using VERY VERY long lines anyway.
It looks ugly and like the person doesn't know how to use their editor
very well.

Many people have pointed out that you look for paragraph breaks by
blank lines.  Yah, I use blank lines for paragraph breaks, but not
all do.  For another case, what will happen to the above quotations
if they get automatically-formatted on display?  Won't they stop
looking like quotations?


-- 
<---- David Herron,  Local E-Mail Hack,  david@ms.uky.edu, david@ms.uky.csnet
<----                    {rutgers,uunet,cbosgd}!ukma!david, david@UKMA.BITNET
<---- I thought that time was this neat invention that kept everything
<---- from happening at once.  Why doesn't this work in practice?

owens@psuvax1.psu.edu (Robert Michael Owens) (10/21/87)

first, ascii machine -> ascii machine over bitnet using urep can be done
so that an image of the file is transferred using existing code. as alf
would say -- no problem. the problem occurs when an ascii machine ->
unknown machine transfer occurs. in this case the ascii machine must
assume the lowest common denominator (a brain damaged ibm machine). hence,
the unix ascii byte stream must be converted into either ibm punch or
print records. most ibm systems require punch records to be exactly 80
characters (fixed length records) and print records to be less than or
equal to 132 characters (variable length records). also a ccc (would
you believe channel command code) or asa character should be prepended
to a print record. furthermore several records (which may have a nop
ccc's so the user can't really see them) may have to be prepended to the
file (how many can say :read card).
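
The effect of fixed 80-character punch records on long lines can be shown
with a minimal Python sketch. It ignores the EBCDIC translation and the
ccc/asa prefix entirely and only models padding and truncation; the name
to_punch_records is hypothetical and is not how urep is implemented.

```python
RECORD_LENGTH = 80  # fixed-length punch record, as described above


def to_punch_records(text: str, record_length: int = RECORD_LENGTH) -> list[str]:
    """Turn each line into exactly one fixed-length record."""
    records = []
    for line in text.split("\n"):
        # Short lines are padded with spaces; anything past the record
        # length is silently lost.
        records.append(line[:record_length].ljust(record_length))
    return records


if __name__ == "__main__":
    for record in to_punch_records("short line\n" + "y" * 100):
        print(len(record), repr(record[:20]))
```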

to make things worse there is no one ebcdic standard (hence the left and
right bracket problem). the ebcdic tab is not correctly interpreted by some
(most) ibm packages (editors, etc), etc. urep converts a file so that if
the file is printed on either the ascii or ebcdic host, the listings are the
same. (how many can say skip to prime page)
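
The bracket problem is easy to reproduce with two EBCDIC code pages that
ship with Python, cp037 and cp500. Which tables the BITNET gateways really
used is not stated here, so this pairing is only an assumption chosen to
show the effect; through_gateway is a hypothetical name.

```python
def through_gateway(text: str, encode_table: str = "cp037",
                    decode_table: str = "cp500") -> str:
    """Translate to EBCDIC with one table and back to text with another."""
    return text.encode(encode_table).decode(decode_table)


if __name__ == "__main__":
    line = "a[i] = b[j];"
    # Same table in both directions: the line survives.
    print(through_gateway(line, "cp037", "cp037"))
    # Different tables: letters survive, brackets do not.
    print(through_gateway(line, "cp037", "cp500"))
```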

second, some (most) ibm hosts can handle very long records. the problem is
not all hosts can. big blue solves this problem by encapsulating the file
(how many can say diskdump or netdata) when it is xfered. also, some jes'es
and rscs2 (as does urep 3.0) can also handle spanned records. rscs1 also
had spanned records but in a way which was pretty much incompatible with
everything else (special ccc's).

owens

hay. i don't know what i'm talking about either. i just wrote the code.

clewis@stm386.UUCP (Chris Lewis) (10/21/87)

Regarding the discussions about 80 character truncations etc...

I haven't actually seen BITNET, but isn't it primarily VM/CMS machines
communicating by RSCS?  (Good 'ol "CP SET PUN ROUTE...", "PUNCH FILE..." 
etc.)  (I used to be a bit of a VM/CMS hacker till I saw the light :-)
That's a little archaic - if you changed the "PUNCH FILE" to "DISK DUMP"
it would be able to transmit any kind of file (read: RECFM V, 
LRECL=anything).  "DISK LOAD" on the other side.  And, it's relatively
easy to have the software figure out itself which to do (to maintain
compatibility with both).  Depending on the software
there might be some things you'd have to do with the article reading
code ("rnews" equivalent).  Mind you, this is hacking - and if you
think that getting people to upgrade their USENET software on UUCP is
hard...

IBM had (3-4 years ago) an internal news system that worked sort of similarly -
I believe that it was an ultra-trivial set of EXECs that merely read
incoming "punch" files, figured out which "newsgroup" each was in and
then appended the article to a "newsgroup file".  There was a central
repository where you sent articles, and it broadcast them to all sites
that "subscribed to the net".  The user interface
was merely "link to news disk" and then the user xedit'd the newsgroups
they wanted to see.  Xedit can handle files of any width.  As you can 
well imagine, the traffic wasn't particularly high.

The point I'm trying to make is that, yes, the virtual card punch is
RECFM=F, LRECL=80, but with the standard software available for
handling the PUNCH this is no longer a limitation on what you can 
send.  Must we remain compatible with a limit on a "peripheral network"
that hasn't been a limit on those systems for ages?   (predates VM/SP!)

Yes, lines over 80 columns are a bit of a pain on my VT100, but c'est la 
vie.

Of course, ASCII<->EBCDIC translation is a real b***h with those thingies.
Braces (there are two different pairs of codes for these - depending
on the peripheral), square brackets (is anybody's 3274 GENed to display
these?), tildes (Yes Virginia, there is a tilde in EBCDIC... somewhere),
carets (Weeelll, you can download a font for one...), tabs? (what's a tab?)

Actually, as far as articles originating in ASCII-land are concerned,
I would prefer that BITNET store them totally unchanged (In ASCII, RECFM=U).
Then, when a user wants to see something, BITNET figures out whether
to use its normal article "presenter" or to run it thru an ASCII-EBCDIC
translation first.  Then, ASCII-land articles
that go to another ASCII-land site via some BITNET site have no changes
whatsoever.  Of course though, I'm dreaming...
-- 
Chris Lewis, International Semi-Tech Microelectronics Inc.
{uunet|utzoo}!mnetor!stm386!clewis

karl@haddock.ISC.COM (Karl Heuer) (10/22/87)

In article <37e7ff5a.b8ab@apollo.uucp> rees@apollo.uucp (Jim Rees) writes:
>I don't see why we should have to change the format of the text as sent.
>It's easy to tell where lines and paragraphs end with the existing
>format.  Lines end in a single NL, paras end in a double NL.

I wish this were true.  Unfortunately, there are some folks out there who use
"\n[ \t][ \t]*" rather than "\n\n" as their paragraph separator.

Write a filter that recognizes both formats, you say?  Good idea, but now I
have to worry about the people who think that indentation is a good way to
highlight quoted text.  And their counterparts who believe that the quoted
text should be left as is, and the reply indented.  Intelligent intervention
is required at this point, and since AI doesn't exist, that means a human.
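
A filter that recognizes both separators, and the way it goes wrong on
indented quotations, can be sketched in Python. The regular expression is
one possible reading of the two conventions above, and split_paragraphs is
a hypothetical name.

```python
import re

# A paragraph break is either one or more blank lines, or a newline that
# is followed by an indented line.
PARA_BREAK = re.compile(r"\n(?:[ \t]*\n)+|\n(?=[ \t])")


def split_paragraphs(text: str) -> list[str]:
    """Split on blank-line breaks and on indentation breaks."""
    pieces = PARA_BREAK.split(text.strip("\n"))
    return [piece.strip() for piece in pieces if piece.strip()]


if __name__ == "__main__":
    print(split_paragraphs("One\ntwo\n\nThree\n\tFour\nfive"))
    # Indented quoted text is mistaken for a run of new paragraphs, and
    # the reply that follows is glued onto the last quoted line.
    print(split_paragraphs("Reply.\n    quoted one\n    quoted two\nMore reply."))
```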

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

karl@haddock.ISC.COM (Karl Heuer) (10/22/87)

In article <7541@e.ms.uky.edu> david@ms.uky.edu (David Herron -- Resident E-mail Hack) writes:
>In article <4756@oberon.USC.EDU> blarson@skat.usc.edu (Bob Larson) writes:
>>A possible example would be to put a \ in column 80 to indicate that the
>>next line is really part of the current line.
>
>Bad example.  Suppose a Makefile is posted which has a line exactly 80
>characters long with a \ as the last character ...

Then, by this convention, the first 79 characters would be displayed on the
first line, followed by a \ for continuation, and the 80th character (\) would
appear alone on the second line.  Completely unambiguous, although ugly.
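
A minimal Python sketch of that convention, including the Makefile case.
It assumes the transport neither pads nor strips the folded lines; the
names fold and unfold are hypothetical.

```python
WIDTH = 80  # a physical line is at most 80 columns


def fold(line: str, width: int = WIDTH) -> list[str]:
    """Split one logical line into physical lines of at most `width` columns."""
    pieces = []
    # Any line of `width` or more characters is folded, so a physical line
    # with "\" in the last column is always a continuation marker.
    while len(line) >= width:
        pieces.append(line[:width - 1] + "\\")
        line = line[width - 1:]
    pieces.append(line)
    return pieces


def unfold(physical_lines: list[str], width: int = WIDTH) -> list[str]:
    """Rebuild the logical lines from folded physical lines."""
    logical, pending = [], ""
    for piece in physical_lines:
        if len(piece) == width and piece.endswith("\\"):
            pending += piece[:-1]  # drop the marker, keep collecting
        else:
            logical.append(pending + piece)
            pending = ""
    if pending:  # input ended in the middle of a continuation
        logical.append(pending)
    return logical


if __name__ == "__main__":
    makefile_line = "x" * 79 + "\\"  # exactly 80 columns, ends in "\"
    print([len(piece) for piece in fold(makefile_line)])
    print(unfold(fold(makefile_line)) == [makefile_line])
```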

>Personally I don't like the idea of using VERY VERY long lines anyway.
>It looks ugly and like the person doesn't know how to use their editor
>very well.

Well, the suggestion was that the newsreading program should know how to
display it properly.

>... For another case, what will happen to the above quotations if they get
>automatically-formatted on display?  Won't they stop looking like quotations?

Again, not if the newsreader formatter is smart.  The quoted text would look
like ">very long line of text\n" internally, but would display as if it were
">very\n>long\n>line\n>of text\n\n".  More generally, the internal format
should be something like ">\{margin}very long line of text\{para}" so that
strings other than ">" can be properly replicated.
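
A newsreader formatter along those lines might be sketched in Python as
follows. The set of quote markers recognized (">", "|" and "}") is an
assumption drawn from common posting habits, and display_quoted is a
hypothetical name.

```python
import re
import textwrap

# Leading run of quote markers, each optionally followed by one space.
QUOTE_PREFIX = re.compile(r"(?:[>|}] ?)*")


def display_quoted(line: str, width: int = 80) -> list[str]:
    """Wrap one stored line, repeating its quote prefix on every screen line."""
    prefix = QUOTE_PREFIX.match(line).group(0)
    body = line[len(prefix):].strip()
    # The prefix uses up part of the screen width.
    wrapped = textwrap.wrap(body, width=max(width - len(prefix), 1))
    return [prefix + piece for piece in wrapped] or [prefix.rstrip()]


if __name__ == "__main__":
    for screen_line in display_quoted(">very long line of text", 6):
        print(screen_line)
```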

(Btw, I use a similar convention when writing C programs.  I try to avoid
breaking a line just because it's getting close to the margin -- the person
reading the code may have a different screen width or tab stops.  Now someone
just needs to write an editor that will display such long lines in a more
conventional format.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

jerry@oliveb.UUCP (Jerry Aguirre) (10/22/87)

In article <7526@g.ms.uky.edu> david@ms.uky.edu (David Herron -- Resident E-mail Hack) writes:
>All I wanted to say in my original posting was that we should always
>keep in mind the least-common-denominator.  At the moment it's 80x24
>screens.  But I really like the 66line by 96 column display on
>my Blit... :-)

ACTUALLY I HAVE SEEN MORE THAN A FEW
ARTICLES THAT WERE POSTED ON UPPERCASE
ONLY TERMINALS.  SOME WERE EVEN
RESTRICTED TO 40 COLUMNS.  IF WE ARE
GOING TO RESTRICT OURSELVES TO THE
LEAST-COMMON-DENOMINATOR THEN LET US USE
40 COLUMN UPPERCASE ONLY.  OH, NO
BRACES, TILDE, OR PIPE SYMBOLS BECAUSE
THEY DON'T PRINT ON SOME TERMINALS.

					All :-) if you couldn't tell.

					Jerry Aguirre
					Systems Administration
					Olivetti ATC

(Actually when I read an all UPPER CASE article I am left with the
impression that the writer has been SHOUTING at me.)

nick@nswitgould.OZ (Nick Andrew) (10/22/87)

in article <7526@g.ms.uky.edu>, david@ms.uky.edu (David Herron -- Resident E-mail Hack) says:
| 
| Some of us (you included) are trying to free this network from its
| reliance on Unix.  Building the WorldNet and such like.  But what will
| the IBM people on BITNET think if they start seeing every article come
| in with 2000 character long lines because someone on a Unix machine
| wanted "automatic formatting" of his paragraphs?  They'll only be able
| to read the first 80 (132?) characters of each paragraph.
| 

	Gee whiz, if the IBM OS can't handle it then it must be accomplished
by the gateway machine(s). How many gateways are there between BITNET and
UUCP?  It should be a (relatively) simple matter for each gateway processor
to fold long lines before the IBMs get hold of it.  Slower?  Nah ... a couple
of instructions!

ACSnet:    nick@nswitgould.oz	zeta@runx.ips.oz
UUCP:      ...!uunet!munnari!nswitgould.oz!nick
Fidonet:   3:713/602
ACSgate:   3:713/603 (nick@zeta.fido@nswitgould.oz in development)

"Anything that is moral for a group to do is moral for one person to do"
			- Clark Fries in Heinlein's "Podkayne of Mars".

IRWIN@pucc.Princeton.EDU (Irwin Tillman) (10/22/87)

I distribute a VM/CMS implementation of netnews, and it deals properly
with lines > 80 characters.  Since it has hooks for communicating with
other sites, it is up to the local news admin to specify a transport
mechanism that will preserve long lines (and do ASCII/EBCDIC character
translation "properly" if it is necessary).  Two methods that may be
available (depending on software and hardware available at each site)
are SENDFILE and ftp.
 
Irwin Tillman           BITNET: IRWIN@PUCC
Princeton University    UUCP: {allegra,ihnp4,cbosgd}!psuvax1!PUCC.BITNET!IRWIN

allbery@ncoast.UUCP (Brandon Allbery) (10/23/87)

As quoted from <7526@g.ms.uky.edu> by david@ms.uky.edu (David Herron -- Resident E-mail Hack):
+---------------
| In article <21314@ucbvax.BERKELEY.EDU> fair@ucbarpa.Berkeley.EDU (Erik E. Fair) writes:
| >That's not reasonable. The reasonable approach is to do a trivial
| >encapsulation or encoding that makes it possible to move USENET
| >articles (no matter what their characteristics are) through BITNET,
| >or any other strange network.
| 
| BUT ... compress and atob/btoa don't run on the IBM 308x that's
| our other neighbor on BITNET.  ALSO, in both cases their underlying
| operating systems have that record-oriented mentality.
+---------------

At least one Fido-compatible system uses an encoding such that lines end in
^M and paragraphs end in ^M^J; and the Fido standard is paragraph-oriented,
NOT line-oriented, so as to encourage word wrapping.  I suggest that a system
like this, with ^M inserted between words to force lines to < 80 characters,
would work fine without breaking filesystems based on fixed-length records
rather than variable- length ones (i.e. lines).  (For UNIX, ^M and ^J would
seem to be natural choices.  These can easily be changed to ^M and ^M^J for
non-UNIX sites, a' la "text mode" umodem and kermit.)
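
The UNIX variant of that scheme (^M between words as a soft line break, ^J
at the end of a paragraph) can be sketched in Python. It assumes words are
separated by single spaces; encode_paragraph and decode_paragraph are
hypothetical names, not part of any Fido software.

```python
import textwrap

SOFT_BREAK = "\r"  # ^M: line ends here, paragraph continues
PARA_END = "\n"    # ^J: paragraph ends here


def encode_paragraph(paragraph: str, limit: int = 79) -> str:
    """Insert soft breaks between words so every segment is under 80 columns."""
    segments = textwrap.wrap(paragraph, width=limit,
                             break_long_words=False, break_on_hyphens=False)
    # A single word longer than `limit` is left whole and would still
    # exceed the limit; this sketch does not handle that case.
    return SOFT_BREAK.join(segments) + PARA_END


def decode_paragraph(encoded: str) -> str:
    """Turn soft breaks back into spaces, giving one long line again."""
    return encoded.rstrip(PARA_END).replace(SOFT_BREAK, " ")


if __name__ == "__main__":
    text = " ".join("word%d" % i for i in range(60))
    wire = encode_paragraph(text)
    print(max(len(segment) for segment in wire.rstrip("\n").split("\r")))
    print(decode_paragraph(wire) == text)
```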
-- 
Brandon S. Allbery		     necntc!ncoast!allbery@harvard.harvard.edu
  {{harvard,mit-eddie}!necntc,well!hoptoad,sun!mandrill!hal}!ncoast!allbery

rhorn@infinet.UUCP (Rob Horn) (10/24/87)

I don't think that these simple solutions will work well.  There are
many aspects of an article that have to be handled properly:
indentation of quotations, marking inclusions from previous postings,
poetry, pictures, etc.

One new format that could convey all this information and still meet
the needs of the most restrictive transport mechanisms would be a
minimal set of TeX macros that encompass the kinds of text
structures that news articles need.  Then the news reading software
can tailor the formatted display to the capabilities of the display
hardware.  The super fancy bitmap displays get spiffy formats and the
CRT users get the same old stuff.  This would have one drawback(?).
Since information like ``prior article inclusion'' is now formatted
locally a poster could not control whether the display uses >>'s,
font, or indentation to signify inclusion.  Similarly other text
structures might display differently on different devices.

I don't think this overall approach is yet practical.  A recognizing
filter seems plausible enough, although more complex than these early
posters seem to realize.  But on the display side I think that the
computational load would be excessive.  I can just imagine our poor
little 11/750 attempting to run multiple TeX's for all our news
readers.  Maybe some fast CRT oriented substitute could be dreamed up.

Another set of problems is implementing this in a manner that allows
for a rational transition period.  It must be able to coexist with
prior versions of software for several years --- this being the
approximate lifespan of obsolete versions of news software.  It must
be easy to add as an upgrade.  Both pose real difficulties. 

TeX is not the only suitable system, but it is well suited to conveying
structure independently from text.  If the other problems can be
overcome then the selection of what formatting system to use becomes
interesting.  I have my doubts about the suitability of either troff
or Postscript (and thus NeWS) because both of these are too close to
the display device and have already mapped some of the textual
structures into specific formatting concepts.



-- 
				Rob  Horn
	UUCP:	...harvard!adelie!infinet!rhorn
	Snail:	Infinet,  40 High St., North Andover, MA
	(Note: harvard!infinet path is in maps but not working yet)

dce@mips.UUCP (David Elliott) (10/25/87)

I'm really glad to see that my question has sparked so much thought
and discussion.

With C news in its Alpha stage, it might be nice to have some kind
of interim solution. For example, if the new news posting mechanism
(still inews?) could look at the message and if it finds any long
lines (where "long" can be arbitrarily set to 80 characters for now),
it prints the message:

	Warning: This article contains lines longer than 80 characters,
	         making it difficult for some people to read. The
		 article has been sent, but you should refrain from
		 doing this in the future.

This may be better to do in the news posting front ends (postnews,
Pnews, etc.), which could allow users to edit the article again to
remedy the situation, but this is more work to implement.
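
The check itself is small.  A rough sketch in Python, assuming the article
is available as one string; long_lines and warn_if_long are hypothetical
names, not part of inews, postnews, or Pnews, and expanding tabs to
8-column stops is an added assumption about how width should be counted.

```python
import sys

LIMIT = 80  # "long" is set arbitrarily to 80 characters, as suggested above

WARNING = (
    "Warning: This article contains lines longer than {limit} characters,\n"
    "         making it difficult for some people to read.\n"
)

def long_lines(article, limit=LIMIT):
    """Return the 1-based numbers of lines wider than `limit` columns."""
    return [n for n, line in enumerate(article.splitlines(), start=1)
            if len(line.expandtabs()) > limit]

def warn_if_long(article, limit=LIMIT, out=sys.stderr):
    """Write the warning once if any line is too long; return the offenders."""
    offenders = long_lines(article, limit)
    if offenders:
        out.write(WARNING.format(limit=limit))
    return offenders
```

A front end could use the returned line numbers to drop the user back into
the editor at the first offending line.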

Another idea might be to have some sites (backbones) scan articles
for long lines, and send mail to the poster with a message similar 
to the warning above.

Yet another idea might be to add a Max-Line-Length: header field,
generated by inews for articles with lines longer than 80 (again,
chosen arbitrarily, and I would even suggest 40 in this case).
This field could be used by news software to reformat articles if
the user wishes, or in the rn KILL file to junk such messages (as
I said, if it looks too hard to read, I tend to toss it).
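
A sketch of how the header idea might look, in Python.  Only the field name
Max-Line-Length comes from the suggestion above; the value format (the
decimal length of the longest body line) and both function names are
assumptions made for illustration.

```python
def add_max_line_length(article, threshold=80):
    """Add a Max-Line-Length header if the longest body line exceeds threshold."""
    # Headers end at the first blank line; everything after it is the body.
    head, sep, body = article.partition("\n\n")
    longest = max((len(line) for line in body.splitlines()), default=0)
    if longest <= threshold:
        return article
    return head + "\nMax-Line-Length: " + str(longest) + sep + body

def declared_max_line_length(article):
    """Return the value of the Max-Line-Length header, or None if absent."""
    head = article.partition("\n\n")[0]
    for line in head.splitlines():
        name, colon, value = line.partition(":")
        if colon and name.strip().lower() == "max-line-length":
            return int(value.strip())
    return None
```

A reader could compare declared_max_line_length against the terminal width
and either reformat the article or skip it.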

-- 
David Elliott		dce@mips.com  or  {ames,decwrl,prls}!mips!dce

henry@utzoo.UUCP (Henry Spencer) (10/27/87)

> ... Unfortunately, there are some folks out there who use
> "\n[ \t][ \t]*" rather than "\n\n" as their paragraph separator...
> Write a filter that recognizes both formats, you say?  Good idea, but now I
> have to worry about the people who think that indentation is a good way to
> highlight quoted text... [and so on]

However, it is probably easier (if that is the word) and less painful to
convince people to adhere to standards in such things than to convince them
to shift to a new and *incompatible* standard.  The former can be at least
partly automated, by the way.
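
A sketch of the sort of filter the quoted text has in mind, in Python,
accepting both "\n\n" and "\n[ \t][ \t]*" as paragraph separators.  The
name split_paragraphs is hypothetical.  It has exactly the weakness the
quoted text complains about: text indented for quotation is split into one
paragraph per line.

```python
import re

# A paragraph break is either one or more blank (or whitespace-only) lines,
# or a newline followed by a line that starts with a space or tab.
_BREAK = re.compile(r"\n(?:[ \t]*\n)+|\n(?=[ \t])")

def split_paragraphs(text):
    """Split text into paragraphs, each rejoined as a single long line."""
    pieces = _BREAK.split(text.strip("\n"))
    return [" ".join(p.split()) for p in pieces if p.strip()]
```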
-- 
PS/2: Yesterday's hardware today.    |  Henry Spencer @ U of Toronto Zoology
OS/2: Yesterday's software tomorrow. | {allegra,ihnp4,decvax,utai}!utzoo!henry

kimcm@ambush.UUCP (Kim Chr. Madsen) (10/30/87)

In article <8831@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:

>However, it is probably easier (if that is the word) and less painful to
>convince people to adhere to standards in such things than to convince them
>to shift to a new and *incompatible* standard.  The former can be at least
>partly automated, by the way.

Which standard?????

The standards for writing style are dependent upon several things:

	1) Whom are you writing to (Newspaper, Tech. Journal, Letter
	   to Mom, etc.)
	2) Where do you come from (different countries have different
	   style standards).

etc. etc.

				Kim Chr. Madsen.

henry@utzoo.UUCP (Henry Spencer) (11/07/87)

> Which standard?????
> The standards for writing style are dependent upon several things:
> 	1) Whom are you writing to (Newspaper, Tech. Journal, Letter
> 	   to Mom, etc.)

So we set up a specific standard for Usenet.  No big deal, except for the
highly non-trivial problem of getting people to adhere to it.  My point
remains:  getting people to use a new standard will be easier if it doesn't
require scrapping everything that exists and starting over.
-- 
Those who do not understand Unix are |  Henry Spencer @ U of Toronto Zoology
condemned to reinvent it, poorly.    | {allegra,ihnp4,decvax,utai}!utzoo!henry