beame@maccs.dcss.mcmaster.ca (Carl Beame) (04/18/91)
I have a customer who has an interesting NFS problem. The person is copying several thousand files a day from a Novell server to a SUN NFS server. About 8 out of every 1000 files are corrupted by zeros in the first 512 bytes. We tracked the problem down and found that all of the corrupted files occurred when the following NFS calls were made.

    PC sends                        Sun replies
    =================               ===================
    Create file                     --------------
    (timeout retrans)
    Create file (same XID)          OK (created)
    Write 512 bytes data (<>0)      OK (written)
    --------------------            OK (created)
    Write 4096 bytes data           OK (written)
    .
    .
    .

What seems to be happening is that the second create, even though it has the same XID as the first, is executed after the first write of data. The SUN therefore treats the file as having a 512 byte hole at the beginning, since the file was created again after the first write.

This is a very very slow SUN running SunOS 3.5. Does anyone know if there is anything this person can do to stop this occurring? (Buying more horse power is NOT a possible solution).

- Carl Beame
  Beame & Whiteside Software Ltd.
  Beame@McMaster.CA
  (416) 648-6556
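For illustration only, the effect described above can be imitated on a local file. This is a minimal sketch, not NFS code: it assumes that re-executing the create truncates the file to zero length, and it uses ordinary file operations in place of the NFS calls. The names `replay_sequence` and `demo` are made up for this example.

```python
import os
import tempfile


def replay_sequence(path):
    """Apply the operations in the order the server seems to have run them."""
    # First Create: make an empty file.
    with open(path, "wb"):
        pass
    # Write 512 bytes of non-zero data at offset 0.
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(b"\x01" * 512)
    # Retransmitted Create, executed late: assumed to truncate the file again.
    with open(path, "wb"):
        pass
    # Write 4096 bytes at offset 512; nothing exists below that offset now.
    with open(path, "r+b") as f:
        f.seek(512)
        f.write(b"\x02" * 4096)
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Run the sequence on a temporary file and return the final contents."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return replay_sequence(path)
    finally:
        os.remove(path)


data = demo()
# True when the first 512 bytes read back as zeros.
print(len(data), data[:512] == bytes(512))
```

The file ends up at its full length, but the first 512 bytes read back as zeros, which matches the corruption pattern seen in the damaged files.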
thurlow@convex.com (Robert Thurlow) (04/20/91)
In <280D9593.23762@maccs.dcss.mcmaster.ca> beame@maccs.dcss.mcmaster.ca (Carl Beame) writes:

[about getting zeros in the first blocks of files.]

> This is a very very slow SUN running Sun OS 3.5. Does anyone
>know if there is anything this person can do to stop this occurring?
>(Buying more horse power is NOT a possible solution).

Upgrade the OS! SunOS 3.5 dates back to before people started using the duplicate request cache. The duplicate create is getting dealt with after the first write because separate NFS daemons are not aware of that possibility. The only problem is what OS to upgrade to. I'd think 4.0.3 would be pretty good if they feel nervous about making the leap to 4.1.1.

They could also double or treble the mount timeouts on the client side if they have control over that, at the cost of performance hits when packets really do get lost.

Rob T
--
Rob Thurlow, thurlow@convex.com
An employee and not a spokesman for Convex Computer Corp., Dallas, TX
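To show roughly what a duplicate request cache buys you, here is a toy sketch. It is not the SunOS implementation: `DuplicateRequestCache`, `ToyServer` and `run` are hypothetical names, files are held in memory, and a real cache also has to cope with requests that are still in progress and with several daemons sharing the cache. The idea is only that a reply is remembered per (client, XID), and a retransmission gets the remembered reply instead of being executed again.

```python
from collections import OrderedDict


class DuplicateRequestCache:
    """Remembers recent replies keyed by (client, xid). Hypothetical sketch."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, client, xid):
        return self.entries.get((client, xid))

    def remember(self, client, xid, reply):
        self.entries[(client, xid)] = reply
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry


class ToyServer:
    """In-memory stand-in for a file server; not an NFS implementation."""

    def __init__(self, cache=None):
        self.files = {}
        self.cache = cache

    def handle(self, client, xid, op, name, offset=0, data=b""):
        if self.cache is not None:
            cached = self.cache.lookup(client, xid)
            if cached is not None:
                return cached  # retransmission: replay the reply, do not re-execute
        if op == "create":
            self.files[name] = bytearray()  # assumed to truncate an existing file
            reply = "OK (created)"
        elif op == "write":
            buf = self.files[name]
            if len(buf) < offset:
                buf.extend(bytes(offset - len(buf)))  # a hole reads back as zeros
            buf[offset:offset + len(data)] = data
            reply = "OK (written)"
        else:
            raise ValueError("unknown operation: " + op)
        if self.cache is not None:
            self.cache.remember(client, xid, reply)
        return reply


def run(server):
    """Replay the call order from the original post against a server."""
    server.handle("pc", 1, "create", "f")
    server.handle("pc", 2, "write", "f", 0, b"\x01" * 512)
    server.handle("pc", 1, "create", "f")  # late retransmission, same XID
    server.handle("pc", 3, "write", "f", 512, b"\x02" * 4096)
    return bytes(server.files["f"])


without_cache = run(ToyServer())
with_cache = run(ToyServer(DuplicateRequestCache()))
print(without_cache[:512] == bytes(512), with_cache[:512] == b"\x01" * 512)
```

Without the cache the late create wipes the first write and the first 512 bytes come back as zeros; with it, the retransmitted create is answered from the cache and the data survives.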