Tuesday, February 13, 2007

Unnumbered interfaces confuse Quagga

The whole reason I was reading e-mail on a Sunday was not to look for telnetd exploits.

I was logged in because punchin, the IPsec remote-access server that Team IPsec runs (sometimes called a VPN server, but I hate that term because it's pushed by too many middlebox vendors), was having routing problems.

As stated before, Solaris implements tunnels as point-to-point interfaces. For a remote-access server like we have in punchin, this means every external IP address gets a tunnel interface. (Until we had Tunnel Reform, this meant only one client per external IP address, which messed up NATs for multiple clients.) A tunnel interface has two addresses - a local one and a remote one. The local one can be shared with other tunnels or even with a different local interface (like the local ethernet). Such interfaces are called unnumbered interfaces.

A remote-access server does forward packets, and is therefore by definition a router. One of our servers just swapped out Zebra (from an older OpenSolaris/Nevada build) for Quagga. We use Quagga's OSPF to learn the topology of the Sun internal network (the SWAN).

As clients "punch out", their tunnel gets destroyed. Now each of these tunnels shares the same local IP address with our ethernet to the SWAN. Unfortunately, these "interface down" events confuse Quagga, and suddenly all of my punchin clients can't move bits to the internal network anymore.

There is a workaround, and that's to assign a different local IP address than the one that is directly connected to the SWAN for use with all of the client tunnels. It's not that painful, as I only lose one out of 256 possible client addresses (our engineering ones only have a /24 from which to allocate client addresses). Still, as an esteemed colleague said, "I hope that's not the *whole* solution."
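For the address arithmetic, here is a rough Python sketch. The 192.0.2.0/24 prefix is a documentation range standing in for a real client pool, and picking the second address as the dedicated tunnel-local address is purely an assumption for illustration.

```python
import ipaddress

# Hypothetical client pool: 192.0.2.0/24 is a documentation prefix,
# not the real engineering allocation.
pool = ipaddress.ip_network("192.0.2.0/24")
all_addrs = list(pool)  # all 256 addresses in the /24

# Assumption: set one pool address aside as the local end shared by
# every client tunnel, instead of reusing the SWAN-facing address.
tunnel_local = all_addrs[1]

# Everything else remains available as a remote (client) address.
client_addrs = [a for a in all_addrs if a != tunnel_local]

print(f"local end for all tunnels: {tunnel_local}")
print(f"addresses left for clients: {len(client_addrs)} of {len(all_addrs)}")
```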

It isn't, and I would like to ask the Quagga community (as I've already asked our local routing folks, Paul Jakma and Alan Maguire) to make sure that Quagga and its routing protocols play nicely with unnumbered interfaces. It'll allow me to plumb tunnels until I'm all out of address space! :)


How OpenSolaris did its job during this telnet mess

I don't have a tag for general Security because dammit, I'm still a networking person who works on security!

Anyway, you've seen elsewhere about how Alan H. turned around the S10 fix as quickly as he could. I'm going to tell you how Alan already found this:

D 1.67 07/02/11 19:46:41 danmcd 90 89 00009/00010/04896
6523815 LARGE vulnerability in telnetd

when he went to file a bug that'd already been putback into Nevada/OpenSolaris.

The best place to see what happened is to visit the OpenSolaris discussions, especially this thread.

I was reading e-mail on a Sunday because of an operations problem I was having with one of our punchin IPsec remote-access servers. (I'll discuss the problem, a routing one, in a followup entry later today.) I found the initial note and read the PDF file to which "skunsul" so graciously provided a link. MAN I was embarrassed. After trying it on some lab machines and my laptop, I brought up the in.telnetd source (at the line number provided by Kingcope). My first approach was to verify the content of the $USER environment variable fed to in.telnetd. I compiled and ran the fix, which seemed to work. Great! Time to find some code reviewers.

My only regret about this was not putting the review on security-discuss@opensolaris.org or networking-discuss@opensolaris.org. I'll try to do better next time, especially for something that was announced on an opensolaris list initially. Anyway, two reviewers (OpenSolaris board member and well-known Sun Good Guy Casper Dik, and crypto framework expert Krishna Yenduri) suggested that login(1) is already getopt-compliant, and that I should just pass "--" between the rest of the arguments and the contents of $USER, no matter how *&^$-ed up it is. Because it was a Sunday, I didn't get rapid turnaround on e-mail replies. This is why the putback didn't happen until six hours after I'd read the note from skunsul. Krishna also recommended (in the spirit of open development) that I place the diffs on the very thread, and I did just that.
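To make the reviewers' suggestion concrete, here is a minimal Python sketch of why "--" helps. It is not the actual in.telnetd change (that code is C); it uses Python's getopt module, which follows the same convention as getopt(3). The option string "pf:" and the user name "-froot" are assumptions chosen for illustration, not a statement of login(1)'s real option set.

```python
import getopt

# Hypothetical option string for a login-like program: -p takes no
# argument, -f takes a user name. Illustration only.
OPTSTRING = "pf:"


def parse(argv):
    """Return (options, operands) as a getopt-style parser sees argv."""
    return getopt.getopt(argv, OPTSTRING)


# Attacker-controlled $USER value that happens to look like an option.
hostile_user = "-froot"

# Without the separator, the value is consumed as the -f option.
unsafe_opts, unsafe_operands = parse(["-p", hostile_user])

# With "--", option parsing stops and the value stays a plain operand.
safe_opts, safe_operands = parse(["-p", "--", hostile_user])

print("without --:", unsafe_opts, unsafe_operands)
print("with --:   ", safe_opts, safe_operands)
```

The appeal of this approach over checking $USER by hand is that the caller doesn't have to guess which strings are dangerous; the parser is simply told where the options end.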

Anyone I know here who happened to have seen the initial note would've jumped on this in the same way - please don't think I did something others wouldn't do. My point is - this is the first security exploit reported to us via OpenSolaris, and I think the "Open" part of OpenSolaris helped out the code, as well as Sun's customers.
