brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (06/18/91)
Here's the ping data from nyu.edu to berkeley.edu over the last ten minutes or so. It shows a minute of communication, followed by ten seconds of ``unreachable network''; four minutes of communication; half a minute of ``unreachable network''; another three minutes of communication; another half minute of ``unreachable network''; another two and a half minutes of communication; nine seconds of ``unreachable network''; and then just over a minute of communication. Keep in mind that an active TCP connection---e.g., a remote login---dies the second the network becomes unreachable. An objective observer would have to conclude that, no matter how good the IP service was while it was responding, the network was simply unusable for interactive work during this period.

And how good was that IP service? The minimum round-trip time was rather impressive: one sixth of a second. But the maximum (not counting the unreachable periods) was awful: over ten seconds. The average was over a second, and the sample standard deviation was a whopping 1.8 seconds. That means you'd expect one packet in every few hundred to have a delay of seven seconds or more. Many studies have shown that people work best with a consistent feedback delay---it's much more important that the standard deviation be small than that the minimum be small.

Don't try to tell me that four gateways between NYU and Berkeley crashed in quick succession, only to have our Super Duper Dynamic Routing Protocols bravely intercede, restoring service within 30 seconds of each crash in ways that mere mortals could never hope to understand. The pattern of Berkeley being reachable, then unreachable a moment later, has continued while I typed this article. I see this behavior all the time, even on links as short as BNL to NYU.

I see only two possible explanations for these problems.
Either the routes are flapping wildly, losing packets every time a router decides to ``optimize'' its choices, or sudden bursts of extra activity push the ``optimal'' route over the edge of its maximum load. Can anyone propose another explanation? Whatever the reasons, the Internet is simply not usable for interactive work coast-to-coast during times like this.

Both problems, by the way, are easy to fix. To solve the second, split data among three or four usable routes, instead of loading it all onto one which will crash at the first increase in load. To solve the first, use semi-static routing---update routing tables every few days instead of every few minutes, so that (as control theory experts will observe) routing changes are too slow to hit a resonant frequency of any link.

---Dan

PING berkeley.edu (128.32.133.1): 56 data bytes
64 bytes from 128.32.133.1: icmp_seq=64. time=779. ms
64 bytes from 128.32.133.1: icmp_seq=65. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=66. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=67. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=68. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=69. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=70. time=280. ms
64 bytes from 128.32.133.1: icmp_seq=72. time=4649. ms
64 bytes from 128.32.133.1: icmp_seq=73. time=4069. ms
64 bytes from 128.32.133.1: icmp_seq=74. time=3229. ms
64 bytes from 128.32.133.1: icmp_seq=75. time=3149. ms
64 bytes from 128.32.133.1: icmp_seq=76. time=3369. ms
64 bytes from 128.32.133.1: icmp_seq=77. time=2610. ms
64 bytes from 128.32.133.1: icmp_seq=78. time=1850. ms
64 bytes from 128.32.133.1: icmp_seq=79. time=1080. ms
64 bytes from 128.32.133.1: icmp_seq=80. time=719. ms
64 bytes from 128.32.133.1: icmp_seq=81. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=82. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=83. time=7079. ms
64 bytes from 128.32.133.1: icmp_seq=84. time=6079. ms
64 bytes from 128.32.133.1: icmp_seq=85. time=5109. ms
64 bytes from 128.32.133.1: icmp_seq=86. time=4239. ms
64 bytes from 128.32.133.1: icmp_seq=87. time=3299. ms
64 bytes from 128.32.133.1: icmp_seq=88. time=2299. ms
64 bytes from 128.32.133.1: icmp_seq=90. time=400. ms
64 bytes from 128.32.133.1: icmp_seq=91. time=2209. ms
64 bytes from 128.32.133.1: icmp_seq=92. time=2879. ms
64 bytes from 128.32.133.1: icmp_seq=93. time=1890. ms
64 bytes from 128.32.133.1: icmp_seq=94. time=1140. ms
64 bytes from 128.32.133.1: icmp_seq=95. time=340. ms
64 bytes from 128.32.133.1: icmp_seq=96. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=97. time=290. ms
64 bytes from 128.32.133.1: icmp_seq=98. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=99. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=100. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=101. time=270. ms
64 bytes from 128.32.133.1: icmp_seq=102. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=103. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=104. time=250. ms
64 bytes from 128.32.133.1: icmp_seq=105. time=330. ms
64 bytes from 128.32.133.1: icmp_seq=106. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=107. time=250. ms
64 bytes from 128.32.133.1: icmp_seq=108. time=249. ms
64 bytes from 128.32.133.1: icmp_seq=109. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=110. time=230. ms
64 bytes from 128.32.133.1: icmp_seq=111. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=112. time=299. ms
64 bytes from 128.32.133.1: icmp_seq=113. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=115. time=5469. ms
64 bytes from 128.32.133.1: icmp_seq=116. time=4479. ms
64 bytes from 128.32.133.1: icmp_seq=117. time=3599. ms
64 bytes from 128.32.133.1: icmp_seq=118. time=2609. ms
64 bytes from 128.32.133.1: icmp_seq=120. time=829. ms
64 bytes from 128.32.133.1: icmp_seq=121. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=122. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=123. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=124. time=209. ms
64 bytes from 128.32.133.1: icmp_seq=125. time=310. ms
64 bytes from 128.32.133.1: icmp_seq=126. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=127. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=128. time=179. ms
64 bytes from 128.32.133.1: icmp_seq=129. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=130. time=230. ms
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
64 bytes from 128.32.133.1: icmp_seq=141. time=280. ms
64 bytes from 128.32.133.1: icmp_seq=142. time=330. ms
64 bytes from 128.32.133.1: icmp_seq=143. time=250. ms
64 bytes from 128.32.133.1: icmp_seq=144. time=159. ms
64 bytes from 128.32.133.1: icmp_seq=145. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=146. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=159. time=230. ms
64 bytes from 128.32.133.1: icmp_seq=160. time=189. ms
64 bytes from 128.32.133.1: icmp_seq=161. time=420. ms
64 bytes from 128.32.133.1: icmp_seq=162. time=500. ms
64 bytes from 128.32.133.1: icmp_seq=176. time=279. ms
64 bytes from 128.32.133.1: icmp_seq=175. time=1279. ms
64 bytes from 128.32.133.1: icmp_seq=177. time=460. ms
64 bytes from 128.32.133.1: icmp_seq=178. time=610. ms
64 bytes from 128.32.133.1: icmp_seq=179. time=290. ms
64 bytes from 128.32.133.1: icmp_seq=180. time=179. ms
64 bytes from 128.32.133.1: icmp_seq=181. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=182. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=183. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=184. time=229. ms
64 bytes from 128.32.133.1: icmp_seq=185. time=290. ms
64 bytes from 128.32.133.1: icmp_seq=186. time=280. ms
64 bytes from 128.32.133.1: icmp_seq=187. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=188. time=419. ms
64 bytes from 128.32.133.1: icmp_seq=189. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=190. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=191. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=192. time=189. ms
64 bytes from 128.32.133.1: icmp_seq=203. time=430. ms
64 bytes from 128.32.133.1: icmp_seq=204. time=410. ms
64 bytes from 128.32.133.1: icmp_seq=205. time=710. ms
64 bytes from 128.32.133.1: icmp_seq=206. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=207. time=160. ms
64 bytes from 128.32.133.1: icmp_seq=208. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=209. time=160. ms
64 bytes from 128.32.133.1: icmp_seq=210. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=211. time=160. ms
64 bytes from 128.32.133.1: icmp_seq=212. time=530. ms
64 bytes from 128.32.133.1: icmp_seq=213. time=500. ms
64 bytes from 128.32.133.1: icmp_seq=214. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=215. time=280. ms
64 bytes from 128.32.133.1: icmp_seq=216. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=217. time=410. ms
64 bytes from 128.32.133.1: icmp_seq=218. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=219. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=220. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=221. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=222. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=255. time=8839. ms
64 bytes from 128.32.133.1: icmp_seq=259. time=4840. ms
64 bytes from 128.32.133.1: icmp_seq=260. time=4120. ms
64 bytes from 128.32.133.1: icmp_seq=264. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=256. time=8500. ms
64 bytes from 128.32.133.1: icmp_seq=261. time=3500. ms
64 bytes from 128.32.133.1: icmp_seq=262. time=2510. ms
64 bytes from 128.32.133.1: icmp_seq=257. time=7510. ms
64 bytes from 128.32.133.1: icmp_seq=258. time=6510. ms
64 bytes from 128.32.133.1: icmp_seq=263. time=1500. ms
64 bytes from 128.32.133.1: icmp_seq=265. time=8719. ms
64 bytes from 128.32.133.1: icmp_seq=266. time=7739. ms
64 bytes from 128.32.133.1: icmp_seq=268. time=5779. ms
64 bytes from 128.32.133.1: icmp_seq=270. time=3779. ms
64 bytes from 128.32.133.1: icmp_seq=271. time=2849. ms
64 bytes from 128.32.133.1: icmp_seq=272. time=1850. ms
64 bytes from 128.32.133.1: icmp_seq=273. time=1330. ms
64 bytes from 128.32.133.1: icmp_seq=274. time=510. ms
64 bytes from 128.32.133.1: icmp_seq=275. time=570. ms
64 bytes from 128.32.133.1: icmp_seq=276. time=430. ms
64 bytes from 128.32.133.1: icmp_seq=277. time=580. ms
64 bytes from 128.32.133.1: icmp_seq=278. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=279. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=280. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=281. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=282. time=160. ms
64 bytes from 128.32.133.1: icmp_seq=283. time=510. ms
64 bytes from 128.32.133.1: icmp_seq=284. time=350. ms
64 bytes from 128.32.133.1: icmp_seq=285. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=286. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=287. time=340. ms
64 bytes from 128.32.133.1: icmp_seq=288. time=440. ms
64 bytes from 128.32.133.1: icmp_seq=289. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=290. time=540. ms
64 bytes from 128.32.133.1: icmp_seq=291. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=292. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=293. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=294. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=295. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=296. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=297. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=298. time=2390. ms
64 bytes from 128.32.133.1: icmp_seq=301. time=8849. ms
64 bytes from 128.32.133.1: icmp_seq=307. time=5229. ms
64 bytes from 128.32.133.1: icmp_seq=308. time=6680. ms
64 bytes from 128.32.133.1: icmp_seq=309. time=5720. ms
64 bytes from 128.32.133.1: icmp_seq=310. time=4720. ms
64 bytes from 128.32.133.1: icmp_seq=312. time=4110. ms
64 bytes from 128.32.133.1: icmp_seq=314. time=2140. ms
64 bytes from 128.32.133.1: icmp_seq=315. time=1350. ms
64 bytes from 128.32.133.1: icmp_seq=316. time=350. ms
64 bytes from 128.32.133.1: icmp_seq=320. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=321. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=327. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=328. time=310. ms
64 bytes from 128.32.133.1: icmp_seq=329. time=1030. ms
64 bytes from 128.32.133.1: icmp_seq=330. time=310. ms
64 bytes from 128.32.133.1: icmp_seq=331. time=440. ms
64 bytes from 128.32.133.1: icmp_seq=332. time=470. ms
64 bytes from 128.32.133.1: icmp_seq=333. time=910. ms
64 bytes from 128.32.133.1: icmp_seq=334. time=160. ms
64 bytes from 128.32.133.1: icmp_seq=335. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=336. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=337. time=290. ms
64 bytes from 128.32.133.1: icmp_seq=338. time=1010. ms
64 bytes from 128.32.133.1: icmp_seq=339. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=340. time=580. ms
64 bytes from 128.32.133.1: icmp_seq=351. time=320. ms
64 bytes from 128.32.133.1: icmp_seq=352. time=310. ms
64 bytes from 128.32.133.1: icmp_seq=353. time=350. ms
64 bytes from 128.32.133.1: icmp_seq=354. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=355. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=356. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=357. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=358. time=370. ms
64 bytes from 128.32.133.1: icmp_seq=359. time=310. ms
64 bytes from 128.32.133.1: icmp_seq=360. time=400. ms
64 bytes from 128.32.133.1: icmp_seq=361. time=430. ms
64 bytes from 128.32.133.1: icmp_seq=362. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=363. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=364. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=365. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=366. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=367. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=368. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=369. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=370. time=430. ms
64 bytes from 128.32.133.1: icmp_seq=371. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=372. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=373. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=374. time=450. ms
64 bytes from 128.32.133.1: icmp_seq=375. time=3580. ms
64 bytes from 128.32.133.1: icmp_seq=376. time=5750. ms
64 bytes from 128.32.133.1: icmp_seq=377. time=4770. ms
64 bytes from 128.32.133.1: icmp_seq=378. time=3770. ms
64 bytes from 128.32.133.1: icmp_seq=379. time=3460. ms
64 bytes from 128.32.133.1: icmp_seq=380. time=2460. ms
64 bytes from 128.32.133.1: icmp_seq=382. time=580. ms
64 bytes from 128.32.133.1: icmp_seq=383. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=384. time=1820. ms
64 bytes from 128.32.133.1: icmp_seq=385. time=990. ms
64 bytes from 128.32.133.1: icmp_seq=386. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=387. time=8060. ms
64 bytes from 128.32.133.1: icmp_seq=388. time=7470. ms
64 bytes from 128.32.133.1: icmp_seq=389. time=6480. ms
64 bytes from 128.32.133.1: icmp_seq=390. time=5860. ms
64 bytes from 128.32.133.1: icmp_seq=391. time=5020. ms
64 bytes from 128.32.133.1: icmp_seq=392. time=4020. ms
64 bytes from 128.32.133.1: icmp_seq=393. time=3020. ms
64 bytes from 128.32.133.1: icmp_seq=394. time=2140. ms
64 bytes from 128.32.133.1: icmp_seq=395. time=1150. ms
64 bytes from 128.32.133.1: icmp_seq=396. time=420. ms
64 bytes from 128.32.133.1: icmp_seq=397. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=398. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=399. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=400. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=401. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=402. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=403. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=404. time=280. ms
64 bytes from 128.32.133.1: icmp_seq=405. time=870. ms
64 bytes from 128.32.133.1: icmp_seq=406. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=407. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=408. time=1310. ms
64 bytes from 128.32.133.1: icmp_seq=409. time=610. ms
64 bytes from 128.32.133.1: icmp_seq=410. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=411. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=412. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=413. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=414. time=280. ms
64 bytes from 128.32.133.1: icmp_seq=415. time=310. ms
64 bytes from 128.32.133.1: icmp_seq=416. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=417. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=418. time=1580. ms
64 bytes from 128.32.133.1: icmp_seq=419. time=780. ms
64 bytes from 128.32.133.1: icmp_seq=420. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=421. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=422. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=423. time=350. ms
64 bytes from 128.32.133.1: icmp_seq=424. time=750. ms
64 bytes from 128.32.133.1: icmp_seq=425. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=426. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=427. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=428. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=429. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=430. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=431. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=432. time=310. ms
64 bytes from 128.32.133.1: icmp_seq=433. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=434. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=435. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=436. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=437. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=438. time=270. ms
64 bytes from 128.32.133.1: icmp_seq=439. time=6870. ms
64 bytes from 128.32.133.1: icmp_seq=440. time=6570. ms
64 bytes from 128.32.133.1: icmp_seq=441. time=5580. ms
64 bytes from 128.32.133.1: icmp_seq=442. time=4620. ms
64 bytes from 128.32.133.1: icmp_seq=447. time=970. ms
64 bytes from 128.32.133.1: icmp_seq=448. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=449. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=459. time=4250. ms
64 bytes from 128.32.133.1: icmp_seq=463. time=650. ms
64 bytes from 128.32.133.1: icmp_seq=464. time=4570. ms
64 bytes from 128.32.133.1: icmp_seq=467. time=2590. ms
64 bytes from 128.32.133.1: icmp_seq=470. time=2649. ms
64 bytes from 128.32.133.1: icmp_seq=471. time=3180. ms
64 bytes from 128.32.133.1: icmp_seq=474. time=2389. ms
64 bytes from 128.32.133.1: icmp_seq=477. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=478. time=300. ms
64 bytes from 128.32.133.1: icmp_seq=479. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=481. time=280. ms
64 bytes from 128.32.133.1: icmp_seq=482. time=250. ms
64 bytes from 128.32.133.1: icmp_seq=483. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=487. time=1220. ms
64 bytes from 128.32.133.1: icmp_seq=489. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=490. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=491. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=492. time=330. ms
64 bytes from 128.32.133.1: icmp_seq=493. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=496. time=1410. ms
64 bytes from 128.32.133.1: icmp_seq=497. time=580. ms
64 bytes from 128.32.133.1: icmp_seq=498. time=1579. ms
64 bytes from 128.32.133.1: icmp_seq=499. time=750. ms
64 bytes from 128.32.133.1: icmp_seq=500. time=660. ms
64 bytes from 128.32.133.1: icmp_seq=501. time=640. ms
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
64 bytes from 128.32.133.1: icmp_seq=549. time=960. ms
64 bytes from 128.32.133.1: icmp_seq=555. time=340. ms
64 bytes from 128.32.133.1: icmp_seq=556. time=350. ms
64 bytes from 128.32.133.1: icmp_seq=557. time=720. ms
64 bytes from 128.32.133.1: icmp_seq=558. time=269. ms
64 bytes from 128.32.133.1: icmp_seq=559. time=550. ms
64 bytes from 128.32.133.1: icmp_seq=560. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=561. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=563. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=562. time=1399. ms
64 bytes from 128.32.133.1: icmp_seq=564. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=565. time=370. ms
64 bytes from 128.32.133.1: icmp_seq=566. time=419. ms
64 bytes from 128.32.133.1: icmp_seq=567. time=300. ms
64 bytes from 128.32.133.1: icmp_seq=568. time=610. ms
64 bytes from 128.32.133.1: icmp_seq=569. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=570. time=189. ms
64 bytes from 128.32.133.1: icmp_seq=571. time=640. ms
64 bytes from 128.32.133.1: icmp_seq=572. time=310. ms
64 bytes from 128.32.133.1: icmp_seq=576. time=5789. ms
64 bytes from 128.32.133.1: icmp_seq=582. time=1190. ms
64 bytes from 128.32.133.1: icmp_seq=584. time=660. ms
64 bytes from 128.32.133.1: icmp_seq=585. time=230. ms
64 bytes from 128.32.133.1: icmp_seq=586. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=587. time=440. ms
64 bytes from 128.32.133.1: icmp_seq=588. time=490. ms
64 bytes from 128.32.133.1: icmp_seq=643. time=230. ms
64 bytes from 128.32.133.1: icmp_seq=644. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=645. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=646. time=250. ms
64 bytes from 128.32.133.1: icmp_seq=647. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=648. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=649. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=650. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=651. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=653. time=1869. ms
64 bytes from 128.32.133.1: icmp_seq=655. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=656. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=657. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=658. time=270. ms
64 bytes from 128.32.133.1: icmp_seq=659. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=660. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=661. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=662. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=663. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=664. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=665. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=671. time=1480. ms
64 bytes from 128.32.133.1: icmp_seq=673. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=674. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=675. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=676. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=677. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=678. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=679. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=680. time=230. ms
64 bytes from 128.32.133.1: icmp_seq=681. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=682. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=683. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=684. time=230. ms
64 bytes from 128.32.133.1: icmp_seq=685. time=2229. ms
64 bytes from 128.32.133.1: icmp_seq=688. time=3779. ms
64 bytes from 128.32.133.1: icmp_seq=689. time=3319. ms
64 bytes from 128.32.133.1: icmp_seq=690. time=2480. ms
64 bytes from 128.32.133.1: icmp_seq=691. time=1570. ms
64 bytes from 128.32.133.1: icmp_seq=694. time=3319. ms
64 bytes from 128.32.133.1: icmp_seq=695. time=3009. ms
64 bytes from 128.32.133.1: icmp_seq=696. time=2289. ms
64 bytes from 128.32.133.1: icmp_seq=697. time=1569. ms
64 bytes from 128.32.133.1: icmp_seq=698. time=1000. ms
64 bytes from 128.32.133.1: icmp_seq=699. time=160. ms
64 bytes from 128.32.133.1: icmp_seq=700. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=701. time=259. ms
64 bytes from 128.32.133.1: icmp_seq=702. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=703. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=704. time=6839. ms
64 bytes from 128.32.133.1: icmp_seq=708. time=3389. ms
64 bytes from 128.32.133.1: icmp_seq=709. time=2579. ms
64 bytes from 128.32.133.1: icmp_seq=710. time=1710. ms
64 bytes from 128.32.133.1: icmp_seq=711. time=910. ms
64 bytes from 128.32.133.1: icmp_seq=712. time=230. ms
64 bytes from 128.32.133.1: icmp_seq=713. time=289. ms
64 bytes from 128.32.133.1: icmp_seq=714. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=715. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=716. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=717. time=219. ms
64 bytes from 128.32.133.1: icmp_seq=718. time=290. ms
64 bytes from 128.32.133.1: icmp_seq=719. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=720. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=721. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=722. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=723. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=724. time=230. ms
64 bytes from 128.32.133.1: icmp_seq=725. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=726. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=727. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=728. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=730. time=1340. ms
64 bytes from 128.32.133.1: icmp_seq=731. time=1090. ms
64 bytes from 128.32.133.1: icmp_seq=732. time=250. ms
64 bytes from 128.32.133.1: icmp_seq=733. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=734. time=230. ms
64 bytes from 128.32.133.1: icmp_seq=735. time=330. ms
64 bytes from 128.32.133.1: icmp_seq=736. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=737. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=738. time=250. ms
64 bytes from 128.32.133.1: icmp_seq=739. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=740. time=310. ms
64 bytes from 128.32.133.1: icmp_seq=741. time=160. ms
64 bytes from 128.32.133.1: icmp_seq=742. time=250. ms
64 bytes from 128.32.133.1: icmp_seq=743. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=744. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=745. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=746. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=747. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=748. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=749. time=290. ms
64 bytes from 128.32.133.1: icmp_seq=750. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=751. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=752. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=753. time=290. ms
64 bytes from 128.32.133.1: icmp_seq=754. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=755. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=756. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=757. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=758. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=759. time=280. ms
64 bytes from 128.32.133.1: icmp_seq=760. time=160. ms
64 bytes from 128.32.133.1: icmp_seq=761. time=590. ms
64 bytes from 128.32.133.1: icmp_seq=762. time=4119. ms
64 bytes from 128.32.133.1: icmp_seq=765. time=3829. ms
64 bytes from 128.32.133.1: icmp_seq=766. time=2929. ms
64 bytes from 128.32.133.1: icmp_seq=767. time=1949. ms
64 bytes from 128.32.133.1: icmp_seq=769. time=780. ms
64 bytes from 128.32.133.1: icmp_seq=771. time=3989. ms
64 bytes from 128.32.133.1: icmp_seq=772. time=2989. ms
64 bytes from 128.32.133.1: icmp_seq=776. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=777. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=778. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=779. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=780. time=430. ms
64 bytes from 128.32.133.1: icmp_seq=781. time=1540. ms
64 bytes from 128.32.133.1: icmp_seq=782. time=660. ms
64 bytes from 128.32.133.1: icmp_seq=783. time=460. ms
64 bytes from 128.32.133.1: icmp_seq=784. time=250. ms
64 bytes from 128.32.133.1: icmp_seq=785. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=786. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=787. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=788. time=350. ms
64 bytes from 128.32.133.1: icmp_seq=789. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=790. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=791. time=380. ms
64 bytes from 128.32.133.1: icmp_seq=792. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=793. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=794. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=795. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=796. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=797. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=798. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=799. time=180. ms
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
ping: wrote berkeley.edu 64 chars, ret=-1
64 bytes from 128.32.133.1: icmp_seq=830. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=831. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=832. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=833. time=160. ms
64 bytes from 128.32.133.1: icmp_seq=834. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=835. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=836. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=837. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=838. time=160. ms
64 bytes from 128.32.133.1: icmp_seq=844. time=209. ms
64 bytes from 128.32.133.1: icmp_seq=849. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=850. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=913. time=2770. ms
64 bytes from 128.32.133.1: icmp_seq=916. time=4139. ms
64 bytes from 128.32.133.1: icmp_seq=917. time=3159. ms
64 bytes from 128.32.133.1: icmp_seq=918. time=2249. ms
64 bytes from 128.32.133.1: icmp_seq=921. time=4209. ms
64 bytes from 128.32.133.1: icmp_seq=922. time=4179. ms
64 bytes from 128.32.133.1: icmp_seq=923. time=3329. ms
64 bytes from 128.32.133.1: icmp_seq=924. time=2469. ms
64 bytes from 128.32.133.1: icmp_seq=925. time=1490. ms
64 bytes from 128.32.133.1: icmp_seq=926. time=500. ms
64 bytes from 128.32.133.1: icmp_seq=927. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=928. time=189. ms
64 bytes from 128.32.133.1: icmp_seq=929. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=930. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=931. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=932. time=249. ms
64 bytes from 128.32.133.1: icmp_seq=933. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=934. time=7019. ms
64 bytes from 128.32.133.1: icmp_seq=936. time=5209. ms
64 bytes from 128.32.133.1: icmp_seq=937. time=4489. ms
64 bytes from 128.32.133.1: icmp_seq=938. time=3639. ms
64 bytes from 128.32.133.1: icmp_seq=939. time=3069. ms
64 bytes from 128.32.133.1: icmp_seq=941. time=1660. ms
64 bytes from 128.32.133.1: icmp_seq=942. time=1000. ms
64 bytes from 128.32.133.1: icmp_seq=943. time=430. ms
64 bytes from 128.32.133.1: icmp_seq=944. time=189. ms
64 bytes from 128.32.133.1: icmp_seq=945. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=946. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=947. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=948. time=189. ms
64 bytes from 128.32.133.1: icmp_seq=949. time=230. ms
64 bytes from 128.32.133.1: icmp_seq=950. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=951. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=952. time=169. ms
64 bytes from 128.32.133.1: icmp_seq=953. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=954. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=955. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=956. time=209. ms
64 bytes from 128.32.133.1: icmp_seq=959. time=3259. ms
64 bytes from 128.32.133.1: icmp_seq=960. time=2530. ms
64 bytes from 128.32.133.1: icmp_seq=961. time=1670. ms
64 bytes from 128.32.133.1: icmp_seq=963. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=964. time=209. ms
64 bytes from 128.32.133.1: icmp_seq=965. time=260. ms
64 bytes from 128.32.133.1: icmp_seq=967. time=290. ms
64 bytes from 128.32.133.1: icmp_seq=968. time=209. ms
64 bytes from 128.32.133.1: icmp_seq=970. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=971. time=220. ms
64 bytes from 128.32.133.1: icmp_seq=972. time=219. ms
64 bytes from 128.32.133.1: icmp_seq=973. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=974. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=975. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=976. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=977. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=1068. time=10509. ms
64 bytes from 128.32.133.1: icmp_seq=1069. time=9569. ms
64 bytes from 128.32.133.1: icmp_seq=1071. time=7639. ms
64 bytes from 128.32.133.1: icmp_seq=1070. time=8709. ms
64 bytes from 128.32.133.1: icmp_seq=1097. time=360. ms
64 bytes from 128.32.133.1: icmp_seq=1098. time=340. ms
64 bytes from 128.32.133.1: icmp_seq=1099. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=1100. time=240. ms
64 bytes from 128.32.133.1: icmp_seq=1101. time=320. ms
64 bytes from 128.32.133.1: icmp_seq=1102. time=390. ms
64 bytes from 128.32.133.1: icmp_seq=1103. time=390. ms
64 bytes from 128.32.133.1: icmp_seq=1104. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=1105. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=1106. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=1107. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=1113. time=230. ms
64 bytes from 128.32.133.1: icmp_seq=1119. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=1120. time=320. ms
64 bytes from 128.32.133.1: icmp_seq=1121. time=210. ms
64 bytes from 128.32.133.1: icmp_seq=1122. time=250. ms
64 bytes from 128.32.133.1: icmp_seq=1123. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=1124. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=1125. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=1126. time=170. ms
64 bytes from 128.32.133.1: icmp_seq=1128. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=1129. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=1130. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=1131. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=1132. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=1133. time=160. ms
64 bytes from 128.32.133.1: icmp_seq=1134. time=230. ms
64 bytes from 128.32.133.1: icmp_seq=1135. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=1136. time=340. ms
64 bytes from 128.32.133.1: icmp_seq=1137. time=290. ms
64 bytes from 128.32.133.1: icmp_seq=1138. time=370. ms
64 bytes from 128.32.133.1: icmp_seq=1139. time=359. ms
64 bytes from 128.32.133.1: icmp_seq=1140. time=300. ms
64 bytes from 128.32.133.1: icmp_seq=1141. time=1940. ms
64 bytes from 128.32.133.1: icmp_seq=1142. time=3909. ms
64 bytes from 128.32.133.1: icmp_seq=1144. time=5119. ms
64 bytes from 128.32.133.1: icmp_seq=1147. time=2879. ms
64 bytes from 128.32.133.1: icmp_seq=1149. time=1330. ms
64 bytes from 128.32.133.1: icmp_seq=1150. time=660. ms
64 bytes from 128.32.133.1: icmp_seq=1151. time=329. ms
64 bytes from 128.32.133.1: icmp_seq=1152. time=370. ms
64 bytes from 128.32.133.1: icmp_seq=1153. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=1154. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=1155. time=270. ms
64 bytes from 128.32.133.1: icmp_seq=1158. time=4839. ms
64 bytes from 128.32.133.1: icmp_seq=1163. time=319. ms
64 bytes from 128.32.133.1: icmp_seq=1164. time=190. ms
64 bytes from 128.32.133.1: icmp_seq=1165. time=200. ms
64 bytes from 128.32.133.1: icmp_seq=1166. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=1167. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=1168. time=200. ms
ms 64 bytes from 128.32.133.1: icmp_seq=1170. time=5809. ms 64 bytes from 128.32.133.1: icmp_seq=1169. time=6819. ms 64 bytes from 128.32.133.1: icmp_seq=1174. time=1809. ms 64 bytes from 128.32.133.1: icmp_seq=1176. time=360. ms 64 bytes from 128.32.133.1: icmp_seq=1177. time=280. ms 64 bytes from 128.32.133.1: icmp_seq=1178. time=300. ms 64 bytes from 128.32.133.1: icmp_seq=1179. time=339. ms 64 bytes from 128.32.133.1: icmp_seq=1180. time=230. ms 64 bytes from 128.32.133.1: icmp_seq=1181. time=420. ms 64 bytes from 128.32.133.1: icmp_seq=1182. time=370. ms 64 bytes from 128.32.133.1: icmp_seq=1183. time=259. ms 64 bytes from 128.32.133.1: icmp_seq=1184. time=510. ms 64 bytes from 128.32.133.1: icmp_seq=1185. time=650. ms 64 bytes from 128.32.133.1: icmp_seq=1186. time=5919. ms 64 bytes from 128.32.133.1: icmp_seq=1187. time=6109. ms 64 bytes from 128.32.133.1: icmp_seq=1188. time=5119. ms 64 bytes from 128.32.133.1: icmp_seq=1192. time=3160. ms 64 bytes from 128.32.133.1: icmp_seq=1193. time=2200. ms 64 bytes from 128.32.133.1: icmp_seq=1195. time=839. ms 64 bytes from 128.32.133.1: icmp_seq=1201. time=2110. ms 64 bytes from 128.32.133.1: icmp_seq=1202. time=1140. ms 64 bytes from 128.32.133.1: icmp_seq=1203. time=2129. ms 64 bytes from 128.32.133.1: icmp_seq=1204. time=1170. ms 64 bytes from 128.32.133.1: icmp_seq=1205. time=230. ms ping: wrote berkeley.edu 64 chars, ret=-1 ping: wrote berkeley.edu 64 chars, ret=-1 ping: wrote berkeley.edu 64 chars, ret=-1 ping: wrote berkeley.edu 64 chars, ret=-1 ping: wrote berkeley.edu 64 chars, ret=-1 ping: wrote berkeley.edu 64 chars, ret=-1 ping: wrote berkeley.edu 64 chars, ret=-1 ping: wrote berkeley.edu 64 chars, ret=-1 ping: wrote berkeley.edu 64 chars, ret=-1 64 bytes from 128.32.133.1: icmp_seq=1215. time=440. ms 64 bytes from 128.32.133.1: icmp_seq=1216. time=330. ms 64 bytes from 128.32.133.1: icmp_seq=1217. time=650. ms 64 bytes from 128.32.133.1: icmp_seq=1218. time=170. 
ms 64 bytes from 128.32.133.1: icmp_seq=1219. time=170. ms 64 bytes from 128.32.133.1: icmp_seq=1220. time=4989. ms 64 bytes from 128.32.133.1: icmp_seq=1225. time=230. ms 64 bytes from 128.32.133.1: icmp_seq=1226. time=380. ms 64 bytes from 128.32.133.1: icmp_seq=1227. time=660. ms 64 bytes from 128.32.133.1: icmp_seq=1228. time=290. ms 64 bytes from 128.32.133.1: icmp_seq=1229. time=300. ms 64 bytes from 128.32.133.1: icmp_seq=1230. time=400. ms 64 bytes from 128.32.133.1: icmp_seq=1231. time=340. ms 64 bytes from 128.32.133.1: icmp_seq=1234. time=260. ms 64 bytes from 128.32.133.1: icmp_seq=1239. time=260. ms 64 bytes from 128.32.133.1: icmp_seq=1240. time=230. ms 64 bytes from 128.32.133.1: icmp_seq=1241. time=340. ms 64 bytes from 128.32.133.1: icmp_seq=1242. time=270. ms 64 bytes from 128.32.133.1: icmp_seq=1243. time=170. ms 64 bytes from 128.32.133.1: icmp_seq=1244. time=170. ms 64 bytes from 128.32.133.1: icmp_seq=1245. time=200. ms 64 bytes from 128.32.133.1: icmp_seq=1246. time=390. ms 64 bytes from 128.32.133.1: icmp_seq=1247. time=400. ms 64 bytes from 128.32.133.1: icmp_seq=1248. time=370. ms 64 bytes from 128.32.133.1: icmp_seq=1249. time=680. ms 64 bytes from 128.32.133.1: icmp_seq=1250. time=759. ms 64 bytes from 128.32.133.1: icmp_seq=1251. time=360. ms 64 bytes from 128.32.133.1: icmp_seq=1252. time=230. ms 64 bytes from 128.32.133.1: icmp_seq=1253. time=260. ms 64 bytes from 128.32.133.1: icmp_seq=1254. time=300. ms 64 bytes from 128.32.133.1: icmp_seq=1255. time=370. ms 64 bytes from 128.32.133.1: icmp_seq=1256. time=200. ms 64 bytes from 128.32.133.1: icmp_seq=1257. time=210. ms 64 bytes from 128.32.133.1: icmp_seq=1258. time=190. ms 64 bytes from 128.32.133.1: icmp_seq=1269. time=6619. ms 64 bytes from 128.32.133.1: icmp_seq=1270. time=5619. ms 64 bytes from 128.32.133.1: icmp_seq=1271. time=4629. ms 64 bytes from 128.32.133.1: icmp_seq=1272. time=5399. ms 64 bytes from 128.32.133.1: icmp_seq=1273. time=4399. 
ms 64 bytes from 128.32.133.1: icmp_seq=1275. time=2400. ms 64 bytes from 128.32.133.1: icmp_seq=1276. time=1780. ms 64 bytes from 128.32.133.1: icmp_seq=1277. time=3729. ms 64 bytes from 128.32.133.1: icmp_seq=1278. time=4239. ms 64 bytes from 128.32.133.1: icmp_seq=1279. time=3240. ms 64 bytes from 128.32.133.1: icmp_seq=1280. time=2340. ms 64 bytes from 128.32.133.1: icmp_seq=1281. time=1340. ms 64 bytes from 128.32.133.1: icmp_seq=1282. time=709. ms 64 bytes from 128.32.133.1: icmp_seq=1283. time=220. ms 64 bytes from 128.32.133.1: icmp_seq=1284. time=180. ms 64 bytes from 128.32.133.1: icmp_seq=1285. time=190. ms 64 bytes from 128.32.133.1: icmp_seq=1286. time=200. ms 64 bytes from 128.32.133.1: icmp_seq=1287. time=190. ms 64 bytes from 128.32.133.1: icmp_seq=1288. time=880. ms 64 bytes from 128.32.133.1: icmp_seq=1289. time=170. ms 64 bytes from 128.32.133.1: icmp_seq=1291. time=250. ms 64 bytes from 128.32.133.1: icmp_seq=1292. time=750. ms 64 bytes from 128.32.133.1: icmp_seq=1293. time=2039. ms 64 bytes from 128.32.133.1: icmp_seq=1294. time=1089. ms 64 bytes from 128.32.133.1: icmp_seq=1295. time=290. ms 64 bytes from 128.32.133.1: icmp_seq=1296. time=180. ms 64 bytes from 128.32.133.1: icmp_seq=1297. time=200. ms 64 bytes from 128.32.133.1: icmp_seq=1298. time=280. ms 64 bytes from 128.32.133.1: icmp_seq=1299. time=650. ms 64 bytes from 128.32.133.1: icmp_seq=1300. time=180. ms 64 bytes from 128.32.133.1: icmp_seq=1301. time=180. ms 64 bytes from 128.32.133.1: icmp_seq=1302. time=200. ms 64 bytes from 128.32.133.1: icmp_seq=1303. time=200. ms 64 bytes from 128.32.133.1: icmp_seq=1304. time=170. ms 64 bytes from 128.32.133.1: icmp_seq=1305. time=220. ms 64 bytes from 128.32.133.1: icmp_seq=1306. time=359. ms 64 bytes from 128.32.133.1: icmp_seq=1307. time=270. ms 64 bytes from 128.32.133.1: icmp_seq=1308. time=210. ms 64 bytes from 128.32.133.1: icmp_seq=1309. time=180. ms 64 bytes from 128.32.133.1: icmp_seq=1310. time=240. 
ms 64 bytes from 128.32.133.1: icmp_seq=1311. time=240. ms 64 bytes from 128.32.133.1: icmp_seq=1312. time=220. ms 64 bytes from 128.32.133.1: icmp_seq=1313. time=230. ms 64 bytes from 128.32.133.1: icmp_seq=1314. time=210. ms 64 bytes from 128.32.133.1: icmp_seq=1315. time=190. ms
----berkeley.edu PING Statistics----
1316 packets transmitted, 681 packets received, 48% packet loss
round-trip (ms) min/avg/max = 159/1106/10509
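The figures Dan quotes (minimum, mean, and the 1.8-second sample standard deviation) can be recovered directly from output like the above. A minimal sketch, assuming BSD-style ping lines with `time=NNN. ms` fields; the function name and sample are mine, not from the thread:

```python
import re
import statistics

def rtt_stats(ping_output):
    """Extract round-trip times (ms) from BSD ping output and
    summarize them the way the trailing statistics line does."""
    times = [float(m.group(1))
             for m in re.finditer(r"time=(\d+(?:\.\d+)?)", ping_output)]
    return {
        "min": min(times),
        "avg": statistics.mean(times),
        "max": max(times),
        "stdev": statistics.stdev(times),  # sample standard deviation
    }

# A few representative lines from the trace above.
sample = """\
64 bytes from 128.32.133.1: icmp_seq=64. time=779. ms
64 bytes from 128.32.133.1: icmp_seq=65. time=180. ms
64 bytes from 128.32.133.1: icmp_seq=72. time=4649. ms
"""
```

Run over the full 681-reply trace, this reproduces the min/avg/max = 159/1106/10509 summary and adds the standard deviation that ping itself doesn't print.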
brian@ucsd.Edu (Brian Kantor) (06/19/91)
The real solution is to fix telnet and its ilk so that they don't kill the connection when they get what could well be a temporary error like network unreachable. It's not at all unusual to get temporary errors like that whilst rerouting is taking place. - Brian
paul@uxc.cso.uiuc.edu (Paul Pomes - UofIllinois CSO) (06/19/91)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >Keep in mind that an active TCP connection---e.g., a remote login---dies >the second that the network becomes unreachable. An objective observer >would have to conclude that, no matter how good the IP service was while >it was responding, the network was simply unusable for interactive work >during this period. You must have an older TCP. Berkeley TCP ignores net/host unreachables once the connection is established. 600 lines of ping data is not that informative. traceroute data before, during, and after the glitches would help a lot more. There were also power outages affecting SURAnet yesterday that may have had an effect. /pbp -- Paul Pomes, Computing Services Office University of Illinois - Urbana Email to Paul-Pomes@uiuc.edu
mrc@milton.u.washington.edu (Mark Crispin) (06/19/91)
In article <2039.Jun1803.33.1391@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >Keep in mind that an active TCP connection---e.g., a remote login---dies >the second that the network becomes unreachable. The above statement is only true with certain broken versions of BSD TCP software. It is not true on other operating systems; nor is it true on fixed versions of BSD (you can thank me for nagging NeXT to fix it). What happened is that long ago some cretin thought that it was important to tell user programs about `network unreachable' errors (which generally are transient) so he made the TCP I/O system calls fail if it happened. The problem is that 99.99% of TCP user software considers a failure of a read() or write() system call to be a hard failure that necessitates the termination of the program. The fix is to change the `network unreachable' handling in the kernel so that it will cause an open() system call to fail, but no-op it for read() and write(). I believe the Host Requirements RFC mandates this. The idea is that `network unreachable' will stop you from opening a connection in the first place, but won't affect an existing connection. A few stubborn individuals continue to insist that every user program in the world should instead interpret the error code and figure out from that whether or not it is a hard error, but fortunately they are in a dwindling minority. -- DoD#105
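Mark's fix lives in the kernel, but the soft/hard distinction he describes can be sketched in userland too. A hedged illustration (the function and the error sets are my own, not code from any of the systems discussed): classify the unreachable errors as transient, per the Host Requirements RFC's advice that they must not abort an established connection, and only abort on genuinely fatal ones.

```python
import errno

# Errors that are generally transient for an *established* connection:
# a failed read()/write() with one of these should be retried, not
# treated as the death of the connection.  (Assumed set, for illustration.)
SOFT_ERRORS = {errno.ENETUNREACH, errno.EHOSTUNREACH}

# Errors that genuinely end the conversation.
HARD_ERRORS = {errno.ECONNRESET, errno.EPIPE, errno.ETIMEDOUT}

def is_soft_error(err):
    """True if a failed socket I/O call should be retried rather than
    taken as a hard failure that necessitates terminating the program."""
    return err in SOFT_ERRORS
```

The point of Mark's kernel fix is that user programs never see the soft errors at all on read()/write(); this sketch is what a program would have to do on systems that still surface them.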
emv@msen.com (Ed Vielmetti) (06/19/91)
> 10 minutes of ping data, showing awful results
there is unfortunately no wide-area newsgroup where reports like this
are welcomed; the party line is that you're supposed to contact your
local network service provider to tell them that something's wrong,
and they are supposed to pass the question up and down the line to
pinpoint the problem.
there was a recent article in ny.nysernet describing some outages in
nysernet which would have affected the nyu to berkeley connection;
here's a quote.
We are experiencing intermittent connectivity problems in central
NY this evening as a result of a combination of router and line
problems. The resultant routing fluctuations have made access
to NSS10 (also in central NY) also intermittent at times. We have
dispatched technicians and expect the problems to be cleared by 1am
(6/18/91). (from levinn@nisc.psi.net)
as a point of order -- there's no good reason that a temporary
"network unreachable" error should cause a perfectly good TCP
connection to be torn down. unfortunately i can't quote an RFC on
this point but I think it's in 1122/1123.
--Ed
barmar@think.com (Barry Margolin) (06/19/91)
In article <2039.Jun1803.33.1391@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >And how good was that IP service? The minimum round trip time was rather >impressive: one sixth of a second. But the maximum (not counting the >unreachable periods) was awful: over ten seconds. The average was over a >second, and the sample standard deviation was a whopping 1.8 seconds. Sounds like a problem at the NYU end. I tried pinging Berkeley.EDU from here for a few minutes, and my maximum time is closer to your minimum. My times were: minimum 90ms, maximum 370ms, average 120ms. And we're 18 hops away (7 hops across NEARnet, 7 hops across the NSFnet T3 backbone, and 4 hops across BARRnet and Berkeley's internal network). I sent one packet every five seconds for ten minutes, and had 5% packet loss. I'm now trying a similar test using UDP, to see if I can elicit any ICMP errors (ping doesn't notice down routes, because we use a default route for everything outside the TMC local network, and routers don't generate ICMP errors for ICMP packets). I ran it for about five minutes (again, 1/5 Hz) without a single error (the UDP implementation I'm using performs automatic retransmission for query/response protocols, so it would only signal an error for several lost packets in a row or ICMP errors). >That means you'd expect one packet in every few hundred to have a delay >of seven seconds or more. Many studies have shown that people work best >with a consistent feedback delay---it's much more important that the >standard deviation be small than that the minimum be small. I haven't bothered computing the std.dev. of my times. A casual glance at them (since I only pinged every five seconds, my sample size is easy to look at manually) shows that most are in the 100-120ms range. I think people can live with that. >I see this behavior all the >time, even on links as short as BNL to NYU. More evidence that the problem is somewhere near NYU. 
Other posters have mentioned that Nysernet has been having problems lately. -- Barry Margolin, Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar
barmar@think.com (Barry Margolin) (06/19/91)
In article <1991Jun18.173511.7510@milton.u.washington.edu> mrc@milton.u.washington.edu (Mark Crispin) writes: >In article <2039.Jun1803.33.1391@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >>Keep in mind that an active TCP connection---e.g., a remote login---dies >>the second that the network becomes unreachable. >The above statement is only true with certain broken versions of BSD >TCP software. It is not true on other operating systems; nor is it >true on fixed versions of BSD (you can thank me for nagging NeXT to >fix it). Unfortunately, some of the most popular BSD derivatives still have this bug. I think it may still be in SunOS. Also, BSD is not the only system with this bug. Symbolics Genera also has it. And even though the Symbolics error signalling system makes it much easier for an application to recognize and handle this error gracefully, none of them seem to. -- Barry Margolin, Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (06/19/91)
In article <35911@ucsd.Edu> brian@ucsd.Edu (Brian Kantor) writes: > The real solution is to fix telnet and its ilk so that it doesn't kill > the connectionn when it gets what could well be a temporary error like > network unreachable. It's not at all unusual to get temporary errors > like that whilst rerouting is taking place. I agree, that would keep connections alive. Provided, that is, that you convince Sun (among others) to change this behavior, and replace all the old machines out there. But you missed my point. Just because service isn't interrupted doesn't mean it's usable. Folks, an average of 1 second round trip time, with 1.8 seconds standard deviation, is just abominable on a route that could easily handle several times more data with a round trip time under a quarter second. Let me explain what's really happening on the NYU-Berkeley connection. On the average there's not too much data on each link: some segments of the ``optimal'' route are at a mere 50% or 90% capacity, with slightly sub-``optimal'' routes nearby lying unused. So I see round trip times of well under a second. Suddenly a few too many people start ftp requests in the same second. The ``optimal'' route is quickly overwhelmed, packets die like flies, and my round trip time goes down the drain. The sub-``optimal'' routes are still carrying almost no traffic. Our Super Duper Dynamic Routing Protocols see the disaster and respond, bravely throwing packets to those no longer suboptimal routes until a permanent lifeline has been established. In the meantime there's been a service interruption or delay of several seconds up to a few minutes. Soon the same thing happens again. Again the route is flooded. Again service disappears. Again the routers intercede and revert to their original routing decisions. And so it goes, on and on through the night. At higher loads, a funny thing happens. The load regularly bursts over the top of what the current route can handle. 
Within seconds a router changes its decisions---but the other end simultaneously comes to the opposite conclusions. By the time each burst of packets has made its round trip, the routers have changed their decisions again, feeding their already obsolete data back into the loop. And so the routes rapidly flap. Down goes the network. In the meantime, any dolt can see that the network backbone is multiply connected. While one route degenerates, several parallel routes cruise along at 1% or 3% capacity. Sure, they didn't look ``optimal'' five minutes before, because they meant some extra T1 or even 56kb hops. But if every router simply split its data between the three best routes, the whole network would be able to handle a far higher load before *anything* crashed. A funny thing happens, by the way, when you start using split routes. It no longer matters much whether you dynamically optimize or not. If your optimal link goes down, who cares? You're already sending most of your packets along the three or four slightly suboptimal links. Think of it as a backup battery system. Not just a backup battery, but a constantly online backup battery---an uninterruptible power supply, in fact a supply with three or four big backup batteries that will keep you alive just as well as the power company. So there's no point in rushing to react to every little problem. That way lies inefficiency, route flapping, and madness. You might as well leave routes constant for a while---a day, say. Just keep track of how well the routes worked, and the next day adjust the packet flow by a little bit on each line, making sure never to overload one sensible route or to ignore another. I've left out of this story any notes on why NYU-Berkeley was so slow--- why the ``optimal'' routes were so close to capacity that they kept getting pushed over the edge. 
Suffice it to say that the entire net, rather than just isolated pockets, will be seeing similar loads within two or three years, unless we act now to split packets across every available line. ---Dan
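Dan's proposal can be put in miniature: instead of loading everything onto the single "optimal" route, spread traffic over the few best routes in proportion to their capacity, and adjust those proportions only slowly. A hedged sketch; the route names, costs, and capacities below are invented for illustration, and a real router would need far more state than this:

```python
def split_weights(routes, k=3):
    """Split traffic across the k lowest-cost routes, weighting each
    by its capacity, instead of sending 100% down the single best one."""
    best = sorted(routes, key=lambda r: r["cost"])[:k]
    total = sum(r["capacity"] for r in best)
    return {r["name"]: r["capacity"] / total for r in best}

# Hypothetical parallel paths between two backbone sites (capacity in kb/s).
routes = [
    {"name": "T1-direct",      "cost": 1, "capacity": 1544},
    {"name": "T1-via-chicago", "cost": 2, "capacity": 1544},
    {"name": "56k-backup",     "cost": 5, "capacity": 56},
    {"name": "56k-spare",      "cost": 6, "capacity": 56},
]
```

Under this scheme the loss of the "optimal" T1 removes only about half the carried load, which is the backup-battery property Dan describes: the suboptimal paths are already warm, so no sudden rerouting decision is needed.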
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (06/19/91)
In article <1991Jun18.224350.21721@Think.COM> barmar@think.com writes: > In article <2039.Jun1803.33.1391@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > >And how good was that IP service? The minimum round trip time was rather > >impressive: one sixth of a second. But the maximum (not counting the > >unreachable periods) was awful: over ten seconds. The average was over a > >second, and the sample standard deviation was a whopping 1.8 seconds. > Sounds like a problem at the NYU end. If it were, I wouldn't be able to get reliable connections from (e.g.) Princeton to NYU while BNL was getting a consistent ``network unreachable''. It is within PSI's reach, but this is all beside the point. Today, the network, with its current routing protocols, cannot handle the user-generated load between two major sites. Don't you think this is a problem? ---Dan
oleary@sura.net (dave o'leary) (06/19/91)
Yes, Dan, there are lots of problems on the Internet as far as route flaps and gateway crashes and congestion, etc. but I think you may have seen a particularly bad instance. A lot of the technology being used is less robust than it needs to be, but hey, we operations types have to keep our jobs somehow.... :-) Your message was dated at 3:30 GMT, which is 7:30 EST, right? That is about when things started to look bad here in College Park as power to the area was gone and one of the routers died of heat stroke. Although our going down shouldn't have affected NYU to Berkeley much, and we were mostly still up then.
Script started on Wed Jun 19 00:23:41 1991
% traceroute -g nyu.edu berkeley.edu
traceroute to berkeley.edu (128.32.133.1), 30 hops max, 40 byte packets
 1  sura6 (128.167.1.6)  20 ms  10 ms  0 ms
 2  nss (192.80.214.254)  10 ms  0 ms  10 ms
 3  Ithaca.NY.NSS.NSF.NET (129.140.74.9)  40 ms  30 ms  40 ms
 4  lan.cornell.site.psi.net (192.35.82.1)  50 ms  50 ms  50 ms
 5  cornell.syr.pop.psi.net (128.145.30.1)  90 ms  50 ms  40 ms
 6  albpop.syr.pop.psi.net (128.145.20.2)  60 ms  syr.wp.pop.psi.net (128.145.91.1)  70 ms  albpop.syr.pop.psi.net (128.145.20.2)  70 ms
 7  wp.nyc.pop.psi.net (128.145.84.1)  50 ms  albpop.nyc2.pop.psi.net (128.145.80.1)  80 ms  wp.nyc.pop.psi.net (128.145.84.1)  60 ms
 8  nyc_P1.lan.nyc2.pop.psi.net (128.145.218.2)  60 ms  70 ms  70 ms
 9  nyc2.nyu.site.psi.net (128.145.44.2)  90 ms  130 ms  80 ms
10  NYU.EDU (128.122.128.2)  80 ms  330 ms  90 ms
11  NYEGRESS.NYU.EDU (128.122.128.44)  70 ms  70 ms  70 ms
12  nyu.nyc2.pop.psi.net (128.145.44.1)  70 ms  80 ms  70 ms
13  nyc_C1.lan.nyc2.pop.psi.net (128.145.218.1)  150 ms  80 ms  90 ms
14  nyc2.albany.pop.psi.net (128.145.80.2)  90 ms  70 ms  90 ms
15  syrpop.albany.pop.psi.net (128.145.20.1)  120 ms  80 ms  80 ms
16  syr.cornell.site.psi.net (128.145.30.2)  70 ms  70 ms  70 ms
17  NSS.TN.CORNELL.EDU (192.35.82.100)  80 ms  80 ms  140 ms
18  Ann_Arbor.MI.NSS.NSF.NET (129.140.81.10)  240 ms  90 ms  100 ms
19  Salt_Lake_City.UT.NSS.NSF.NET (129.140.79.17)  170 ms  180 ms  400 ms
20  Palo_Alto.CA.NSS.NSF.NET (129.140.77.15)  260 ms  450 ms  240 ms
21  SU1.BARRNET.NET (131.119.254.5)  210 ms  250 ms  200 ms
22  UCB2.BARRNET.NET (131.119.2.4)  250 ms  300 ms  250 ms
23  inr-22-dmz.Berkeley.EDU (128.32.252.22)  240 ms  210 ms  190 ms
24  inr-35.Berkeley.EDU (128.32.168.35)  220 ms  190 ms  200 ms
25  ucbvax.Berkeley.EDU (128.32.133.1)  210 ms  230 ms  230 ms
%
script done on Wed Jun 19 00:24:12 1991
Wow, this worked a lot better than I expected. Relatively consistent results, no packet loss, no route flaps (I am guessing that the albpop/ syr/wp stuff is cisco load balancing?). Anyway, the NYU-Berkeley path looks to be in pretty good shape now. Have fun, dave o'leary SURAnet NOC Mgr. oleary@sura.net (301)982-3214
oleary@sura.net (dave o'leary) (06/19/91)
In article <17719.Jun1904.23.0991@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >In article <35911@ucsd.Edu> brian@ucsd.Edu (Brian Kantor) writes: >> The real solution is to fix telnet and its ilk so that it doesn't kill >> the connectionn when it gets what could well be a temporary error like >> network unreachable. It's not at all unusual to get temporary errors >> like that whilst rerouting is taking place. > >I agree, that would keep connections alive. Provided, that is, that you >convince Sun (among others) to change this behavior, and replace all the >old machines out there. But you missed my point. > >Just because service isn't interrupted doesn't mean it's usable. Folks, >an average of 1 second round trip time, with 1.8 seconds standard >deviation, is just abominable on a route that could easily handle >several times more data with a round trip time under a quarter second. > Do you have stats on the utilization of the lines, the memory on the gateways, etc. I agree that 1000 msec average is pretty unreasonable but I'm not sure what the basis for your claim is - what kind of latencies do you anticipate through a loaded regional network gateway for example? >Let me explain what's really happening on the NYU-Berkeley connection. >On the average there's not too much data on each link: some segments of >the ``optimal'' route are at a mere 50% or 90% capacity, with slightly >sub-``optimal'' routes nearby lying unused. So I see round trip times of >well under a second. > 50% utilization on a point to point line sampled less frequently than every couple of seconds is starting to look like real congestion. 90% is not pretty. Queuing delay, retransmissions due to delayed acks, dropped packets (large swings in buffer allocations) etc. Life becomes harsh very rapidly. If you sample frequently enough, you can see 100% utilization. Without context these numbers don't mean a lot. 
>Suddenly a few too many people start ftp requests in the same second. >The ``optimal'' route is quickly overwhelmed, packets die like flies, >and my round trip time goes down the drain. The sub-``optimal'' routes >are still carrying almost no traffic. Our Super Duper Dynamic Routing >Protocols see the disaster and respond, bravely throwing packets to >those no longer suboptimal routes until a permanent lifeline has been >established. In the meantime there's been a service interruption or >delay of several seconds up to a few minutes. > I can only think of one case where something like this would happen but it would be pretty unusual. Distance vector protocols don't decide whether a link is up or down - they receive that information from a link level protocol. The link level protocol doesn't care if packets are getting thrown away because the router doesn't have any more buffer space. So the line stays up, routes remain installed in the router's tables, and the "optimal" link is still used. The case where this doesn't happen (brief consideration says that this is applicable to both link state and distance vector protocols) is when the packets that are getting thrown away are the routing updates. However, there are at least a couple of things that prevent this from being too common - first, typically multiple routing updates have to be thrown away before routes time out. Also, since routing info flows in the opposite direction from user traffic, the link has to be congested in both directions for this to be a problem. Another reason, and probably the clincher, is that CPU's can generate routing updates faster than the interfaces can put packets in the queue. So if there is a lot of buffer space on an interface, the CPU can queue a routing update into that space rapidly, much faster than the interface can drain the queue, or another interface could fill the queue (via the CPU). 
Some of this changes with routers that copy directly between interfaces using a local interface CPU and such but this only makes the proposed scenario less probable. >Soon the same thing happens again. Again the route is flooded. Again >service disappears. Again the routers intercede and revert to their >original routing decisions. And so it goes, on and on through the night. > >At higher loads, a funny thing happens. The load regularly bursts over >the top of what the current route can handle. Within seconds a router >changes its decisions---but the other end simultaneously comes to the >opposite conclusions. By the time each burst of packets has made its >round trip, the routers have changed their decisions again, feeding >their already obsolete data back into the loop. And so the routes >rapidly flap. Down goes the network. > Actually this did happen, in the "old" NSFnet backbone, with the Fuzzballs, using Hello as the IGP and DDCMP as the link level protocol. However, in this case, the link level protocol tried to be smart, and retransmit frames that were lost due to buffer problems at the other end. The fuzzballs didn't have a lot of free memory lying around (despite heroic efforts by a certain individual) and (I'm trying to remember, this was a while ago when I didn't understand this stuff very well :-) so anyway, buffer thrashing resulted and route flapping did occur kind of as you describe. They were 56kb lines, and trying to cram the load of a busy ethernet down 2 or 3 of these slow speed lines was not pretty. Dave Mills can without doubt explain what was breaking much more clearly than I ever could. >In the meantime, any dolt can see that the network backbone is >multiply connected. While one route degenerates, several parallel routes >cruise along at 1% or 3% capacity. Sure, they didn't look ``optimal'' >five minutes before, because they meant some extra T1 or even 56kb hops. 
>But if every router simply split its data between the three best routes, >the whole network would be able to handle a far higher load before >*anything* crashed. How do you determine what is a "parallel route"? When your routing protocol works at the network layer (or the IP layer, anyway), your routers keep routing tables of IP routes and forwarding decisions are made using the destination IP address. However, congestion occurs on links between a pair of routers (I think this is called a subnet in ISO-ese). So the router can't really balance across the three best links - what are the three "best"? Since you are forwarding *packets* through the network, rather than streams, you have to keep track of a lot of stuff - and you can't anticipate what is coming next. In a circuit switching network (i.e. telephone calls) lots of assumptions are made about the calls that allow the network to do this kind of "balancing". >A funny thing happens, by the way, when you start using split routes. It >no longer matters much whether you dynamically optimize or not. If your >optimal link goes down, who cares? You're already sending most of your >packets along the three or four slightly suboptimal links. Think of it >as a backup battery system. Not just a backup battery, but a constantly >online backup battery---an uninterruptible power supply, in fact a >supply with three or four big backup batteries that will keep you alive >just as well as the power company. > >So there's no point in rushing to react to every little problem. That >way lies inefficiency, route flapping, and madness. You might as well >leave routes constant for a while---a day, say. Just keep track of how >well the routes worked, and the next day adjust the packet flow by a >little bit on each line, making sure never to overload one sensible >route or to ignore another. > I think I'm missing something here. (of course it is getting a little late :-( ). 
I consider a reaction time on the order of hours to be an engineering
decision, not a routing decision. This addresses the problems I started
to delineate above, but it isn't clear to me what you are measuring.
How do you respond to outages? How do you "keep track of how well a
route worked"? What is a "sensible route"? Maybe you know one when you
see one, but can you code that into a routing protocol?

Congestion on a link occurs on a second-by-second basis (or more often
in some cases), so correcting things on the order of days won't really
solve the problem. Although if you are proposing a kind of dynamic
bandwidth allocation...well, that's been thought of too. Merit was going
to do something like that with the IDNX's of the original "new" NSFnet
backbone, late 1988. I'm not sure what really came of that, other than
that instead of trying to reallocate bandwidth they ended up just adding
more everywhere.

The packet flows are bursty second to second, so if you can handle the
load now, it doesn't mean that you can handle it a second from now. But
you might be able to handle it all day tomorrow without dropping
anything. How do you plan for that, other than building in lots of extra
capacity (which just solves the problem anyway)?

>I've left out of this story any notes on why NYU-Berkeley was so slow---
>why the ``optimal'' routes were so close to capacity that they kept
>getting pushed over the edge. Suffice it to say that the entire net,
>rather than just isolated pockets, will be seeing similar loads within
>two or three years, unless we act now to split packets across every
>available line.
>
>---Dan

We *are* splitting traffic across every available line. Every line isn't
running at the same level of utilization, but when we are running RIP we
don't really have much of a choice. OSPF and IGRP allow the costing of
interfaces so that better balance can be achieved. Okay, so we've
started to address the problem within one autonomous system.
Now it's time to cross over into the NSFnet backbone. Or maybe the ESnet
backbone, or Milnet. Of course, they are possibly using different IGPs,
and we lose all the costing information anyway when we cross over from
one AS to another. Is BGP the answer here? Maybe, if we (the network
service providers) can agree on how to assign OSPF costs consistently
across different routing domains, and on a system to translate between
OSPF and IGRP metrics (it's starting to look messy...).

The problems certainly aren't trivial. And neither are the answers.
Yes, we had better get to work.

dave o'leary
SURAnet NOC Manager
oleary@sura.net
(301)982-3214
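[The objection above, that a router forwarding independent packets by
destination address cannot safely spray them across several "best"
links without reordering a stream, is what per-flow splitting schemes
address: hash each conversation to one link, and let different
conversations spread across the parallel routes. A minimal sketch, with
invented link names, not any deployed router's algorithm:]

```python
# Sketch of per-flow splitting across several "equally good" next hops.
# Hashing the (src, dst) pair keeps every packet of one conversation on
# the same link (so TCP segments stay in order), while distinct
# conversations spread across the parallel routes. Link names invented.
import zlib

NEXT_HOPS = ["link-0", "link-1", "link-2"]

def pick_next_hop(src_ip: str, dst_ip: str) -> str:
    """Deterministically map a flow to one of the parallel links."""
    key = f"{src_ip}->{dst_ip}".encode()
    return NEXT_HOPS[zlib.crc32(key) % len(NEXT_HOPS)]

# Same flow always maps to the same link:
assert pick_next_hop("128.122.128.2", "128.32.133.1") == \
       pick_next_hop("128.122.128.2", "128.32.133.1")
```

[The design point is that the forwarding decision needs no per-flow
state at all: the hash *is* the state, so the router stays as stateless
as ordinary destination-based forwarding.]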
jas@proteon.com (John A. Shriver) (06/19/91)
You state:

   Keep in mind that an active TCP connection---e.g., a remote login---
   dies the second that the network becomes unreachable. An objective
   observer would have to conclude that, no matter how good the IP
   service was while it was responding, the network was simply unusable
   for interactive work during this period.

TCP connections are not lost when you get a network unreachable!!!
Please see RFC 1122. That is a bug in the Berkeley (4.2bsd) TCP/IP
implementation, which unfortunately (for its bugginess) has been the
base for more TCP implementations than any other. Its bugs must not be
taken as gospel. Please complain to the vendor of your TCP/IP
implementation to fix this bug. The MIT V6 UNIX TCP/IP written in
1980-81 did not drop TCP connections on host/net unreachable, although
the user telnet was nice enough to tell you that it was happening. The
SMTP/TCP would just sit there and keep trying.

Nonetheless, cross-country backbone routes should not be thrashing like
this. I'm not trying to apologise for the network, but I think the blame
for the problem should be more properly shared between the network and
the host. There may be some stability problem in our baling-wire
inter-AS routing protocols (EGP & BGP) prompted by some link failure.
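[The rule Shriver cites is RFC 1122's soft-vs-hard distinction for ICMP
errors on an established connection. The decision can be sketched as a
small function; this is illustrative pseudologic with simplified names
and states, not any real stack's code:]

```python
# Sketch of RFC 1122's rule for ICMP destination-unreachable messages
# arriving on an established TCP connection: "net unreachable" and
# "host unreachable" are *soft* errors, to be recorded while
# retransmission continues; only a few codes justify aborting.
SOFT = {"net_unreachable", "host_unreachable", "source_route_failed"}
HARD = {"protocol_unreachable", "port_unreachable"}

def state_after_icmp(state: str, icmp_code: str) -> str:
    """Return the connection's new state after an ICMP error arrives."""
    if state == "ESTABLISHED" and icmp_code in SOFT:
        return "ESTABLISHED"   # note the error; keep retransmitting
    if icmp_code in HARD:
        return "CLOSED"        # genuinely fatal: abort the connection
    return state

# The 4.2bsd behavior complained about above would return "CLOSED" for
# "net_unreachable"; per RFC 1122 the connection survives:
assert state_after_icmp("ESTABLISHED", "net_unreachable") == "ESTABLISHED"
```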
jbvb@FTP.COM (James B. Van Bokkelen) (06/19/91)
   ... Keep in mind that an active TCP connection---e.g., a remote
   login---dies the second that the network becomes unreachable. ...

Some implementations (many 4BSD-based ones) actually drop the connection
on an ICMP error, but others keep it in place and continue
retransmitting for quite a while (PC/TCP) or forever (KA9Q). You get
lousy response time, but you don't lose your type-ahead.

James B. VanBokkelen          26 Princess St., Wakefield, MA 01880
FTP Software Inc.             voice: (617) 246-0900  fax: (617) 246-0901
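[A stack that "continues retransmitting for quite a while" typically
backs its retransmission timer off exponentially up to a cap, which is
why a half-minute unreachable episode like those in the ping data can
be absorbed without dropping the connection. A toy schedule with
made-up constants, not PC/TCP's or KA9Q's actual timers:]

```python
# Toy exponential retransmission backoff: each retry waits twice as
# long as the last, capped at rto_max. The constants are illustrative.
def backoff_schedule(rto=1.0, rto_max=64.0, tries=8):
    """Return the list of waits (seconds) before each retransmission."""
    waits = []
    for _ in range(tries):
        waits.append(rto)
        rto = min(rto * 2, rto_max)
    return waits

sched = backoff_schedule()
print(sched)           # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 64.0]
# The first six retries alone span more than a 30-second outage:
print(sum(sched[:6]))  # 63.0
```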
ejm@riscit.NOC.Vitalink.COM (Erik J. Murrey) (06/19/91)
In article <1991Jun19.044623.19628@sura.net>, oleary@sura.net (dave
o'leary) writes:
|> traceroute to berkeley.edu (128.32.133.1), 30 hops max, 40 byte packets
|>  1  sura6 (128.167.1.6)  20 ms  10 ms  0 ms
|>  2  nss (192.80.214.254)  10 ms  0 ms  10 ms
|>  3  Ithaca.NY.NSS.NSF.NET (129.140.74.9)  40 ms  30 ms  40 ms
|>  4  lan.cornell.site.psi.net (192.35.82.1)  50 ms  50 ms  50 ms
|>  5  cornell.syr.pop.psi.net (128.145.30.1)  90 ms  50 ms  40 ms
|>  6  albpop.syr.pop.psi.net (128.145.20.2)  60 ms  syr.wp.pop.psi.net (128.145.91.1)  70 ms  albpop.syr.pop.psi.net (128.145.20.2)  70 ms
|>  7  wp.nyc.pop.psi.net (128.145.84.1)  50 ms  albpop.nyc2.pop.psi.net (128.145.80.1)  80 ms  wp.nyc.pop.psi.net (128.145.84.1)  60 ms
|>  8  nyc_P1.lan.nyc2.pop.psi.net (128.145.218.2)  60 ms  70 ms  70 ms
|>  9  nyc2.nyu.site.psi.net (128.145.44.2)  90 ms  130 ms  80 ms
|> 10  NYU.EDU (128.122.128.2)  80 ms  330 ms  90 ms
|> 11  NYEGRESS.NYU.EDU (128.122.128.44)  70 ms  70 ms  70 ms
|> 12  nyu.nyc2.pop.psi.net (128.145.44.1)  70 ms  80 ms  70 ms
...

Actually, PSI has been having some problems recently with a T-span into
cornell. This is probably the cause of your problems.
---
Erik J. Murrey
Vitalink Communications NOC
ejm@NOC.Vitalink.COM
...!uunet!NOC.Vitalink.COM!ejm