TCP-group 1992
Copy of personal note to Gerard
- To: "Mike Bilow, mikebw@ids.jvnc.net" <mikebw@IDS.JVNC.NET>
- Subject: Copy of personal note to Gerard
- From: Antonio Querubin <tony@mpg.phys.hawaii.edu>
- Date: Wed, 8 Jan 92 23:21:32 HST
> As far as I know, "smtp gateway" works fine. The problem, I think, is
> in the way that timers are handled. We see this in a lot of different
> places, primarily in RSPF and the mailbox, but I find it encouraging
> (in the debugging sense) that you see the same kinds of crashes in the
> regular smtp system as in the mailbox, if you push it hard enough. I
> will look into this again and see if anything suggests itself. Your
> clue is really very valuable, and provides the first new insight I
> have had on this in the last couple of months.
Yes, SMTP seems to be one of the primary starting points for
GRINOS crashing around here. The usual sequence of events goes like
this:
The SMTP timer kicks in. SMTP tries to resolve a hostname with the
nameservers. The nameservers either do not respond in time or return
unusable info. In either case, SMTP does NOT take the intelligent
approach, which would be to time out and punt the mail to the
'smtp gateway'. The mail file stays locked. Minutes later, timers begin
to fail. Within a few hours NOS grinds to a halt (and the watchdog timer
does NOT kick in). One favorite theory here is that a timer value used in
the name resolver code is getting so large that it's overflowing into an
adjacent byte and corrupting it (just a hunch).
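
To make the behaviour I would expect concrete, here is a rough sketch in C of the sequence above with the fallback in place. This is NOT taken from the NOS source: the struct, the enum, the function names and the stub resolvers are all made up for illustration. The only point is that a resolver timeout or a bad answer should send the job to the gateway, and that the mail file lock is released on every path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical resolver result codes; not from the NOS source. */
enum resolve_status { RESOLVE_OK, RESOLVE_TIMEOUT, RESOLVE_BAD_DATA };

/* Hypothetical state for one queued message. */
struct mail_job {
    const char *dest_host;  /* host named in the recipient address */
    const char *relay;      /* host the job is actually handed to */
    int locked;             /* nonzero while the mail file is locked */
};

/* A resolver is modelled as a callback so it can be stubbed out. */
typedef enum resolve_status (*resolver_fn)(const char *host);

/* Stub resolvers standing in for a dead and a healthy nameserver. */
enum resolve_status stub_timeout(const char *host)
{
    (void)host;
    return RESOLVE_TIMEOUT;
}

enum resolve_status stub_ok(const char *host)
{
    (void)host;
    return RESOLVE_OK;
}

/*
 * Choose where to send the job. Any resolver failure falls back to
 * the gateway. The lock is taken on entry and always dropped before
 * returning. Returns 1 for direct delivery, 0 if the gateway is used.
 */
int smtp_pick_relay(struct mail_job *job, resolver_fn resolve,
                    const char *gateway)
{
    enum resolve_status st;
    int direct;

    assert(job != NULL && resolve != NULL && gateway != NULL);

    job->locked = 1;
    st = resolve(job->dest_host);
    direct = (st == RESOLVE_OK);
    job->relay = direct ? job->dest_host : gateway;
    /* ...the actual transfer to job->relay would happen here... */
    job->locked = 0;

    return direct;
}
```

With stub_timeout as the resolver, the job ends up pointed at the gateway and the lock is clear afterwards, which is exactly what does not seem to happen here.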
Here's another problem that MAY be timer related:
There seems to be no way to turn off the NNTP log messages. When NNTP
kicks, SOMETIMES a -more- prompt will appear on the console screen as
a result of the NNTP log messages. The NNTP session ceases all
further processing. Within a few hours NOS hangs. On other
occasions, an NNTP session will remain in a CLOSE WAIT state forever
and always with 54 bytes in the receive queue. When this happens,
resetting or kicking the socket closes it off but does NOT restart the
NNTP timer (it stays at 0). An 'nntp kick' also fails to restart the timer.
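
For the timer that stays at 0, what I would expect is something along these lines: whatever ends the session (normal close, reset, or kick), the poll timer gets re-armed from its configured interval. Again, this is only a sketch under my own assumptions. The struct layout and the function names are hypothetical and I have not checked them against the real NNTP code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical countdown timer; field names are illustrative only. */
struct poll_timer {
    long interval_s;   /* configured period; 0 means polling is off */
    long remaining_s;  /* seconds until next firing; 0 means stopped */
};

/* Ways a session can end (names made up for this sketch). */
enum close_reason { CLOSE_NORMAL, CLOSE_RESET, CLOSE_KICKED };

/* Load the countdown from the configured interval. */
void timer_start(struct poll_timer *t)
{
    t->remaining_s = t->interval_s;
}

/*
 * Called whenever a session ends. The reason is deliberately
 * ignored: every exit path re-arms the timer, so a reset or a kick
 * cannot leave it stuck at 0. A disabled timer stays disabled.
 */
void nntp_session_done(struct poll_timer *t, enum close_reason why)
{
    assert(t != NULL);
    (void)why;
    if (t->interval_s > 0)
        timer_start(t);
}
```

If the real code only restarts the timer on the normal-close path, that would explain why a reset or an 'nntp kick' leaves it at 0.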
Your thoughts?
Tony