Upgrade Linux Kernel 2.4.18 (Gentoo XFS only?) to 2.4.19 or later.



Ratt
01-15-2003, 02:15 PM
It has come to my attention... or rather, it has BIT ME IN MY F'ING ASS, that at a bare minimum, the Gentoo 2.4.18-XFS kernel TCP stack is b0rked.

I suspect this particular bug is a lower level issue, and affects the base 2.4.18 source tree, not just the Gentoo XFS sources.

Anyway, my point being, if you are running 2.4.18 and are having ANY sort of network issues/latency/SYN errors, etc..., you need to upgrade to 2.4.19 or later.

I spent the better part of a month looking over ACLs, routing tables, iptables configs, and a grip of other things... and finally, after many long-distance calls to California, Washington and several other states, we tracked it down to the stupid broken IP stack in the 2.4.18 kernel rev. Go figure. It was an utter nightmare, and I don't want anyone else to go through this heinous attempt to solve a network issue that really isn't one.

From this point forward, if you aren't running a kernel rev beyond 2.4.18, I'm going to blame any network issues you have on the broken stack. Be forewarned! I don't want to ever hear about 2.4.18 again. ARRRRG

S_B_R
01-15-2003, 04:03 PM
Interesting.... Is there any more information readily available on this bug?

jgorrell
01-15-2003, 10:48 PM
Call me a noob but how do you check this? :)

I know how to install linux and get ShowEQ running. Nothing more :)

S_B_R
01-15-2003, 11:11 PM
uname -r
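
For anyone who wants more than the bare command, here is a rough sketch that compares the running kernel against 2.4.19, the version recommended at the top of this thread. The helper name version_lt is made up for illustration, and the sketch assumes GNU sort with the -V (version sort) option is available.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 sorts strictly before version $2.
# Assumes GNU sort, which provides -V (version sort).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Strip any distro suffix, e.g. "2.4.18-14" becomes "2.4.18".
running="$(uname -r)"
base="${running%%-*}"

if version_lt "$base" "2.4.19"; then
    echo "kernel $running is older than 2.4.19"
else
    echo "kernel $running is 2.4.19 or later"
fi
```

This only reports the version; it says nothing about whether a given box actually shows the network problem.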

Lisa
01-16-2003, 06:12 AM
Did you submit a bug report? http://bugs.gentoo.org

Sneaky
01-16-2003, 08:00 AM
Does this affect all distros of Linux or just certain ones? I run RH 7.2.


Yeah I know another newb question. Hey I am working on it......

Ratt
01-16-2003, 09:49 AM
I cannot confirm if it's in other distros, as the bug is fairly difficult to reproduce, and unless you know exactly what you are looking for, it won't be readily obvious that there's even a problem.

I'm pretty confident that this is a problem affecting every distro, as the issue is with the IP stack and I seriously doubt Gentoo modified the stack for any reason (why would they?). Unless they did, it's a problem with the base source tree.

I'm not trying to be evasive about the nature of this bug, I simply don't know the specifics, other than the fact that every machine I had on .18 had the exact same problems when connecting to certain hosts for certain services. In this case it was SMTP, though the problem is not with the SMTP protocol but with the way a session is initiated (SYN, etc...). With the limited testing we did after we found that .18 was the culprit (and .19 did not exhibit this behavior), it would affect any sort of connection initiation due to what I believe to be malformed SYN packets.

However, again, I can't say that it's specifically a malformed SYN packet, as the packets looked perfectly normal to me, and I sent a sniffer log to Cisco (thinking it was an IOS problem with a router), and they said everything looked normal as well.

So if you're following me, you might understand why I can't give too many specifics, because the bug is hard to reproduce and a real pain in the ass. But I do know that .19 fixes the issue and resolved all the heartache and hair pulling.... and I really just have no desire to dissect the .18 source and check it against the .19 to see what's changed.

If someone wants to run a diff of the Gentoo XFS sources between .18 and .19 to see what the difference is (and/or also with other distros), we can include the results here. It might be an interesting intellectual exercise. However, since .19 fixes the problem, for me personally the issue is moot, as the problem has already been addressed, either intentionally or inadvertently.

OgerSEQ
01-16-2003, 11:27 AM
Ratt! I'm glad you got this figured out. Now we can get you to focus your powers on the problems at hand! We NEED you man!

forty-two
01-16-2003, 05:24 PM
Hmm, I had issues I couldn't manage to pinpoint with Debian 3.2 and kernel 2.4.18... I'll update the kernel and see if it happens in Debian too ;)

CiscoKid
01-16-2003, 05:47 PM
Hrmmmmm, and I'm wondering if this is the cause of my random Ethernet drops with RH8...currently using 2.4.18.something...


Is there a 2.4.19 update yet for RH8?

forty-two
01-16-2003, 07:50 PM
Now on 2.4.19. Wasn't the problem, or was only part of it...:(

sauron
01-17-2003, 02:37 AM
For me, and others running Red Hat 7.2.... I did uname -r and got "2.4.7-10". Is this issue affecting us? Do we need to upgrade the kernel?

fee
01-17-2003, 07:51 AM
Ratt,

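Kinda sounds like ECN was enabled. Check it with 'cat /proc/sys/net/ipv4/tcp_ecn' or sysctl. ECN == Explicit Congestion Notification, and it can cause mass hysteria like you have described: some older routers and firewalls drop SYN packets that have the ECN bits set, which would explain trouble with certain hosts only.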


Feel free to disable it using sysctl or echo.
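
A minimal sketch of the check and the "echo" route described above, for anyone who wants it spelled out. The /proc path is the one from this post; the ECN_FILE variable and the function names are made up for illustration, and writing to the real file needs root.

```shell
#!/bin/sh
# Path from the post above. ECN_FILE is a hypothetical override, used only
# so the sketch can be tried against a scratch file instead of the real one.
ECN_FILE="${ECN_FILE:-/proc/sys/net/ipv4/tcp_ecn}"

# Print the current setting.
show_ecn() {
    cat "$ECN_FILE"
}

# Turn ECN off for the running kernel (lost at reboot, needs root).
# The sysctl equivalent is: sysctl -w net.ipv4.tcp_ecn=0
disable_ecn() {
    echo 0 > "$ECN_FILE"
}

# Only report if the file is readable on this machine.
if [ -r "$ECN_FILE" ]; then
    echo "tcp_ecn is currently: $(show_ecn)"
fi
```

Calling disable_ecn as root switches it off until the next reboot. To make it stick, the usual approach is a line reading net.ipv4.tcp_ecn = 0 in /etc/sysctl.conf.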


fee

Ratt
01-17-2003, 01:19 PM
Well... fuck me with a stick, you're right, Fee.

I'm just gonna go shave my hair now, since it's just clumps anyway.

*stumbles off mumbling expletives*

Nacona
01-19-2003, 12:21 PM
[root@XXX nacona] uname -r
2.4.18-14
[root@XXX nacona] cat /proc/sys/net/ipv4/tcp_ecn
0

Does that mean it is on or not? *l*
;) :P

MizeryDeAria
02-14-2003, 10:00 PM
I believe 0 means false/off/disabled
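
To spell that out, a small sketch that turns the number into words. The function name describe_ecn is made up, and the meaning of 2 is taken from the documentation for later kernels; as far as I know the 2.4 kernels in this thread only use 0 and 1.

```shell
#!/bin/sh
# Hypothetical helper: translate a tcp_ecn value into plain words.
describe_ecn() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        # Value 2 is an assumption carried over from later kernel docs.
        2) echo "enabled for incoming requests only" ;;
        *) echo "unrecognized value: $1" ;;
    esac
}

describe_ecn 0   # prints: disabled
```

So the 0 in the output above means ECN is off on that box.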