Tuesday, November 3, 2015

From 0-to-illumos on OmniOS r151016

Today we updated OmniOS to its next stable release: r151016. You can click the link to see its release notes, and you may notice a brief mention the illumos-tools package.

I want to see more people working on illumos. A way to help that is to get people started on actually BUILDING illumos more quickly. To that end, r151016 contains everything to bring up an illumos development environment. You can develop small on it, but this post is going to discuss how we make building all of illumos-gate from scratch easier. (I plan on updating the older post on small/focused compilation after ws(1) and bldenv(1) effectively merge into one tool.)

The first thing you want to do is install OmniOS. The latest release media can be found here, on the Installation page.

After installation, your system is a blank slate. You'll need to set a root password, create a non-root user, and finally add networking parameters. The OmniOS wiki's General Administration Guide covers how to do this.

I've added a new building illumos page to the OmniOS wiki that should detail how straightforward the process is. You should be able to kick off a full nightly(1ONBLD) build quickly enough. If you don't want to edit one of the omnios-illumos-* samples in /opt/onbld/env, just make sure you have a $USER/ws directory, clone one of illumos-gate or illumos-omnios into $USER/ws/testws and use one of the template /opt/onbld/env/omnios-illumos-* files corresponding to illumos-gate or illumos-omnios. For example:

omnios(~)[0]% mkdir ws
omnios(~)[0]% cd ws
omnios(~/ws)[0]% git clone https://github.com/illumos/illumos-gate/ testws

omnios(~/ws)[0]% /bin/time /opt/onbld/bin/nightly /opt/onbld/env/omnios-illumos-gate
You can then look in testws/log/log-date&time/mail_msg to see how your build went.

Monday, April 20, 2015

Quick Reminder -- tcp_{xmit,recv}_hiwat and high-bandwidth*delay networks

I was recently working with a colleague on connecting two data centers via an IPsec tunnel. He was using iperf (coming soon to OmniOS bloody along with netperf) to test the bandwidth, and was disappointed in his results.

The amount of memory you need to hold a TCP connection's unacknowledged data is the Bandwidth-Delay product. The defaults shipped in illumos are small on the receive side:

bloody(~)[0]% ndd -get /dev/tcp tcp_recv_hiwat
128000
bloody(~)[0]% 
and even smaller on the transmit side:
bloody(~)[0]% ndd -get /dev/tcp tcp_xmit_hiwat
49152
bloody(~)[0]% 

Even platforms with Automatic tuning, the maximums they use are often not set highly enough.

Introducing IPsec into the picture adds additional latency (if not so much for encryption thanks to AES-NI & friends, then for the encapsulation and checks). This often is enough to take what are normally good enough maximums and invalidate them as too small. To change these on illumos, you can use the ndd(1M) command shown above, OR you can use the modern, persists-across-reboots, ipadm(1M) command:

bloody(~)[1]% sudo ipadm set-prop -p recv_buf=1048576 tcp
bloody(~)[0]% sudo ipadm set-prop -p send_buf=1048576 tcp
bloody(~)[0]% ipadm show-prop -p send_buf tcp
PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
tcp   send_buf              rw   1048576      1048576      49152        4096-1048576
bloody(~)[0]% ipadm show-prop -p recv_buf tcp
PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
tcp   recv_buf              rw   1048576      1048576      128000       2048-1048576
bloody(~)[0]% 

There's future work there in not only increasing the upper bound (easy), but also adopting the automatic tuning so the maximum just isn't taken right off the bat.

Sunday, March 15, 2015

New HDC service: Calendaring (or, The Limitation Game)

I'll start by stating my biases: I don't like data bloat like ASN.1, XML, or even bloaty protocols like HTTP. (Your homework: Would a 1980s-developed WAN-scale RPC have obviated HTTP? Write a paper with your answer to that question, with support.) I understand the big problems they attempt to solve. I also still think not enough people in the business were paying attention in OS (or Networking) class when seeing the various attempts at data representation during the 80s and 90s. Also, I generally like pushing intelligence out to the end-nodes, and in client/server models, this means the clients. CalDAV rubs me the wrong way on the first bias, and MOSTLY the right way on my second bias, though the clients I use aren't very smart. I will admit near-complete ignorance of CalDAV. I poked a little at its RFC, looking up how Alarms are implemented, and discovered that mostly, Alarm processing is a client issue. ("This specification makes no attempt to provide multi-user alarms on group calendars or to find out for whom an alarm is intended.")

I've configured Radicale on my Home Data Center. I need to publicly thank Lauri Tirkkonen (aka. lotheac on Freenode) for the IPS publisher which serves me up Radicale. Since my target audience is my family-of-four, I wasn't particularly concerned with its reported lack of scalability. I also didn't want to have CalDAV be a supplicant of Apache or another web server for the time. If I decide to revisit my web server choices, I may move CalDAV to that new webserver (likely nginx). I got TLS and four users configured on stock Radicale.

My job was to make an electronic equivalent of our family paper calendar. We have seven (7) colors/categories for this calendar (names withheld from the search engines): Whole-Family, Parent1, Parent2, Both-Parents, Child1, Child2, Both-Children. I thought, given iCal (10.6), Calendar.app (10.10), or Calendar (iOS), it wouldn't be too hard for these to be created and shared. I was mildly wrong.

I'm not sure if what I had to do was a limitation of my clients, of Radicale, or of CalDAV itself, but I had to create seven (7) different accounts, each with a distinct ends-in-'/' URL:

  • https://.../Whole-Family.ics/
  • https://.../Parent1.ics/
  • https://.../Parent2.ics/
  • https://.../Both-Parents.ics/
  • https://.../Child1.ics/
  • https://.../Child2.ics/
  • https://.../Both-Children.ics/
I had to configure N (large N) devices or machine-logins with these seven (7) accounts.

Luckily, Radicale DID allow me to restrict Child1's and Child2's write access to just their own calendars. Apart from that, we want the whole family to read all of the calendars. This means the colors are uniform across all of our devices (stored on the server). It also means any alarms (per above) trigger on ALL of our devices. This makes alarms (something I really like in my own Calendar) useless. Modulo the alarms problem (which can be mitigated by judicious use of iOS's Reminders app and a daily glance at the calendar), this seems to end up working pretty well, so far.

Both children recently acquired iPhones. Which means if I open this service outside our internal home network, we can schedule calendars no matter where we are, and get up to date changes no matter where we are. That will be extremely convenient.

I somewhat hope that one of my half-dozen readers will find something so laughably wrong with how I configured things that any complaints I make will be rendered moot. I'm not certain, however, that will be the case.

Sunday, November 9, 2014

Toolsmiths - since everything is software now anyway...

A recent twitter storm occurred in light of last week's #encryptnews event.

I was rather flattered when well-known whistleblower Thomas Drake retweeted this response of mine:

The mention of "buying usable software" probably makes sense to someone who's used to dealing with Commercial, Off-The-Shelf (COTS) software. We don't live in a world where COTS is necessarily safe anymore. There was a period (which I luckily lived and worked in), where Defense Department ARPA money was being directed specifically to make COTS software more secure and high-assurance. Given the Snowden revelations, however, COTS can possibly be a vulnerability as much as it could be a strength.

In the seminal Frederick Brooks book, The Mythical Man-Month, he describes one approach to software engineering: The Surgical Team. See here and scroll down for a proper description. Note the different roles for such a team.

Given that most media is equivalent to software (easily copied, distributed, etc.), I wonder if media organizations shouldn't adopt certain types of those organizational roles that have been until now the domain of traditional software. In particular, the role of the Toolsmith should be one that modern media organizations adopt. Ignoring traditional functions of "IT", a toolsmith for, say, an investigative organization should be well-versed in what military types like to call Defensive Information Warfare. Beyond just the mere use of encryption (NOTE: ANYONE who equates encryption with security should be shot, or at least distrusted), such Toolsmiths should enable their journalists (who would correspond to the surgeon or the assistant in the surgical team model) to do their job in the face of strong adversaries. An entity that needs a toolsmith will also need a software base, and unless the entity has resources enough to create an entire software stack, that entity will need Free Open-Source Software (for various definitions of Free and Open I won't get into for fear of derailing my point).

I haven't been working in security much since the Solaris Diaspora, so I'm a little out of touch with modern threat environments. I suspect it's everything I'd previous imagined, just more real, and where the word "foreign" can be dropped from "major foreign governments". Anyone who cares about keeping their information and themselves safe should, in my opinion, have at least a toolsmith on their staff. Several organizations do, or at least have technology experts, like the ACLU's Christopher Soghoian, for example. The analogy could probably extend beyond security, but I wanted to at least point out the use of an effective toolsmith.

Monday, July 21, 2014

Happy (early) 20th anniversary, IPv6

My first full-time job out of school was with the The U.S. Naval Research Laboratory. It was a spectacular opportunity. I was going to be working on next-generation (at the time) Internet Protocol research and development.

When I joined in early 1994, the IPng proposals had been narrowed to three:

  • SIPP - Simple Internet Protocol Plus. 8-byte addresses, combined with a routing header that could, in theory, extend the space even further (inherited from IPng contender PIP).
  • TUBA - TCP Using Big Addresses. The use of OSI's CLNP with proven IPv4 transports TCP and UDP running over it.
  • CATNIP - Common Architecture for the Internet. I never understood this proposal, to be honest, but I believe it was an attempt to merge CLNP and IPv4.

NRL, well, my part of NRL, anyway placed its bet on SIPP. I was hired to help build SIPP for then-nascent 4.4BSD. (The first 10 months were actually on 4.3 Net/2 as shipped by BSDI!) It was a great team to work with, and our 1995 USENIX paper displayed our good work.

Ooops... I'm getting a bit ahead of myself.

The announcement of the IPng winner was to be at the 30th IETF meeting in Toronto, late in July. Some of us were fortunate to find out early that what would become IPv6 was SIPP, but with 16-byte addresses. Since I was building this thing, I figured it was time to get to work before Toronto.

20 years ago today, I sent this (with slightly reordered header fields) mail out to a subset of people. I didn't use the public mailing list, because I couldn't disclose SIPP-16 (which became IPv6) before the Toronto meeting. I also discovered some issues that later implementors would discover, as you can see.

From: "Daniel L. McDonald" <danmcd>
Subject: SIPP-16 stuff
To: danmcd (Daniel L. McDonald), cmetz (Craig Metz), atkinson (Ran Atkinson),
 deering@parc.xerox.com, Bob.Hinden@eng.sun.com,
 bob.gilligan@eng.sun.com, francis@cactus.ntt.jp,
 rxg@thumper.bellcore.com, set@thumper.bellcore.com, bound@zk3.dec.com,
 christian.huitema@sophia.inria.fr, conta@lassie.ucx.lkg.dec.com,
 grehan@flotsm.ozy.dec.com, nordmark@jurassic-248.Eng.Sun.COM,
 bill.simpson@um.cc.umich.edu, rj@sgi.com
Date: Thu, 21 Jul 1994 19:20:33 -0500 (EST)
Cc: vjs@sgi.com
X-Mailer: ELM [version 2.4 PL23]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-ID:  <9407220020.aa02835@sundance.itd.nrl.navy.mil>
Content-Length:  1578
Status: RO
X-Status: 
X-Keywords: NotJunk         
X-UID: 155

SIPP folks,

Has anyone tried quick-n-dirty SIPP-16 mods yet?

We have managed to send/receive SIPP-16 pings across both Ethernet and
loopback.  UDP was working with SIPP-8, and we're working on it for SIPP-16.
Minor multicast cases were working for SIPP-8 also, and will be moved to
SIPP-16.  TCP will be forthcoming once we're comfortable with some of the
protocol control block changes.

My idea for the SIPP-16 sockaddr_sipp and sipp_addr is something like:

 struct sipp_addr {
  u_long words[4];
 };

 struct sockaddr_sipp {
  u_char ss_len;     /* For BSD routing tree code. */
  u_char ss_family;
  u_short ss_port;
  u_long ss_reserved;
  struct sipp_addr ss_addr;
 };

We've managed to use the above to configure our interfaces, and send raw
SIPP-16 ICMP pings.  I've a feeling the routing tree will get hairy with the
new sockaddr_sipp.  The size discrepancy between the sockaddr_sipp, and the
conventional sockaddr will cause other compatibility issues to arise.
(E.g. SIOCAIFADDR will not work with SIPP, but SIOCAIFADDR_SIPP will.)

We look forward to the implementors meeting, so we can talk about bloody
gory details, experience with certain internals (PCBs!), and to find out
how far behind we still are.

Dan McD, Craig Metz, & Ran Atkinson
--
Dan McDonald       | Mail:  {danmcd,mcdonald}@itd.nrl.navy.mil --------------+
Computer Scientist | WWW:   http://wintermute.itd.nrl.navy.mil/danmcd.html   |
Naval Research Lab | Phone: (202) 404-7122        #include <disclaimer.h>    |
Washington, DC     | "Rise from the ashes, A blaze of everyday glory" - Rush +

Funny how many defunct-or-at-least-renamed organizations are in that mail (Sun, DEC, Bellcore) are in that mail. BTW, for Solarish systems, the SIOCSLIFADDR (note the 'L') became the ioctl of choice for longer sockaddrs. Also, this was before I discovered uintN_t data types.

If it wasn't clear from the text of the mail, we actually transmitted IPv6 packets across an Ethernet that day. It's possible these were the first IPv6 packets ever sent on a wire. (Other early implementations used IPv6-in-IPv4 exclusively.) I won't fully claim that honor here, but I do believe it could be true.

Monday, June 2, 2014

Home Data Center 2.0 - dogfooding again!

Over six years ago, I put together my first home data center (HDC), which I assembled around a free CPU that was given to me.
A lot has happened in those six years. I've moved house, been through three different employers (and yes, I count Oracle as a different employer, for reasons you can see here), and most relevant to this blog post - technology has improved.
My old home server was an energy pig, loud, and hitting certain limits. The Opteron Model 185 has a TDP of 110 watts, and worse, the original power supply in the original HDC broke, and I replaced it with a LOUD one from a Sun w2100z workstation. I also replaced other parts over the years as things evolved. What I ended up with at the start of 2014 was:
  • AMD Opteron Model 185 - No changes here.
  • Tyan S2866 - Same here, too.
  • 4GB of ECC RAM - Up from 2GB of ECC, to the motherboard's maximum. I tried at first with two additional GB of non-ECC, but one nightly build of illumos-gate where I saw a single-bit error in one built binary was enough to convince me about ECC's fundamental goodness.
  • Two Intel S3500 80GB SATA SSDs - I use these as mirrored root, and mirrored slog, leaving alone ~20GB slices (16 + 4) each. I'm under the assumption that the Intel disk controller will do proper wear-leveling, and what-not. (Any corrections are most appreciated!) These replace two different, lesser-brand 64GB SSDs that crapped out on me.
  • Two Seagate ST2000DL003 2TB SATA drives. - I bought these on clearance a month before the big Thailand flood that disrupted the disk-drive market. At $30/TB, I still haven't found as good of a deal, and the batch on sale were of sufficient quality to not fail me or my mirrored data (so says ZFS, anyway).
  • Lian Li case - I still like the overall mechanical design of this brother-in-law recommended case. I already mentioned the power supply, so I'll skip that here.
  • A cheap nVidia 8400 card - It runs twm on a 1920x1200 display, good enough!
  • OpenIndiana - After moving OpenSolaris from SVR4 to IPS, I used OpenSolaris until Oracle happened. OI was a natural stepping stone off of OpenSolaris.
I gave a talk on how I use my HDC. I'll update that later in this post, but suffice to say, between the energy consumption and the desire for me and my family to enable more services, I figured it was time to upgrade the hardware. With my new job at OmniTI, I also wanted to start dogfooding something I was working with. I couldn't use NexentaStor with my HDC, because of the non-storage functions of Illumos I was using. OmniOS, on the other hand, was going to be a near-ideal candidate to replace OpenIndiana, especially given its server focus.
As before, I started with a CPU for the system. The Socket 1150 Xeon E3 chips, which we had on one server at Nexenta (to help with the Illumos bring up of Intel's I210 and I217 ethernet chip, alongside Joyent and Pluribus), seemed an ideal candidate. Some models had low power draws, and they had all of the features needed to exploit more advanced Illumos features like KVM, if I ever needed it. I also considered the Socket 2011 Xeon E5 chips, but decided that I really didn't need more than 32GB of RAM for the forseeable future. So with that in mind, I asked OmniTI's Supermicro sales rep to put together a box for me. Here's what I got:
  • Intel Xeon E3 1265L v3 - This CPU has a TDP of 45 watts, that's 40% of the TDP of the old CPU. It clocks slightly slower, but otherwise is quite the upgrade with 4 cores, hyperthreading (looking like 8 CPUs to Illumos), and all of the modern bells and whistles like VT-x with EPT and AES-NI. It also is being used in at least one shipping illumos-driven product, which is nice to know.
  • Supermicro X10SLM-LN4F motherboard - This motherboard has four Intel I210 Gigabit ethernet ports on it. I only need two for now, thanks to Crossbow, but I have plans that my paranoia about separate physical LANs may require one or both of those last two. I'm using all four of its 6Gbit SATA ports, and it has two more 3Gbit ones for later. (I'll probably move the SSDs to the 3Gbit ones, because of latency vs. throughput, if I go to a 4-spinning-rust storage setup.) I've disabled USB3 for now, but if/when illumos supports it, I'll be able to test it here.
  • 32 GB of ECC RAM - Maxxed out now. So far, this hasn't been a concern.
  • Same drives as the old one - I moved them right over from the old setup. Installed OmniOS (see below), but basically did "zpool split", "zpool export" from the old server, and "zpool import" on the new one. ZFS again for the win!
  • Supermicro SC732D4 - The case, while not QUITE as cabling-friendly as the old Lian Li, has plastic disk trays that are an improvement over just screwing them in place on the Lian Li. The case comes standard with a four-disk 3.5" cage, and I added a four-disk 2.5" cage to mine. The 500W power supply seems to be an energy improvement, and is DEFINITELY quieter.
  • OmniOS r151010 - For my home server use, I'm going to be using the stable OmniOS release, which as of very recently became r151010. Every six months, therefore, I'll be getting a new OmniOS to use on this server. I haven't tried installing X or twm just yet, but that, and possibly printer support for my USB color printer, are the only things lacking over my old OI install.
I've had this hardware running for about two weeks now. It does everything the old server did, and a few new things.
  • File Service - NFS, and as of very recently, CIFS as well. The latter is entirely to enable scan-to-network-disk scanning. This happens in the global zone, on the "internal network" NIC.
  • Router - This is a dedicated zone which serves as the default router and NAT box. It also redirects external web and Minecraft requests (see below) to their respective zones. It also serves as an IPsec-protected remote access point. Ex-Sun people will know exactly what I'm talking about. It uses an internal vNIC, and a dedicated external NIC.
  • Webserver - As advertised. Right now it just serves static content on port 80 (www.kebe.com), but I may expand this, if I don't put HTTPS service in another zone later. This sits on an internal vNIC, and its inbound traffic is directed by the NAT/Router.
  • Minecraft - My children discovered Minecraft in the past year or so. Turns out, Illumos does a good job of serving Minecraft. With this new server, and running the processes as 32-bit ones (implicit 4Gig limit), I can host two Minecraft servers easily now. This sits on an internal vNIC as well.
  • Work - For now, this is just a place for me to store files for my job and build things. Soon, I plan on using another IPsec tunnel in the Router zone, an etherstub, and making this a part of my office, sitting in my house. Once that happens, I'll be using a dedicated NIC (for separation) to plug my work-issued laptop into.
  • Remote printing - I have a USB color printer that the global zone can share (via lpd). To be honest, I don't have this working on OmniOS just yet, but I'll get that back.
  • DHCP and DNS - Some people assume these are part of a router, but that's not necessarily the case. In this new instantiation, they'll live in the same zone as the webserver (which has a default route installed but is NOT the router). For this new OmniOS install, I'm switching to the ISC DHCP daemon. I hope to upstream it to omnios-build after some operational experience.
Not quite two weeks now, and so far, so good. My kids haven't noticed any lags in Minecraft, and I've built illumos-gate from scratch, both DEBUG and non-DEBUG, in less than 90 minutes. We'll see how DHCP holds up when Homeschool Book Club shows up with Moms carrying smartphones, tablets, and laptops, plus even a kid or two bringing a Minecraft-playing laptop as well for after the discussion.

Wednesday, February 26, 2014

It's just me, I think, but return()s in loops can be bad.

I was reviewing some code tonight. It was a simple linked-list match which originally looked like:

obj_t *
lookup(match_t key)
{
        obj_t *p;

        for (p = list_head(); p; p = list_next()) {
                if (p->val == key)
                        return (p);
        }

        return (NULL);
}

Not bad. But it turns out the list in question needed mutually exclusive access, so the reviewee inserted the mutex into this code.


obj_t *
lookup(match_t key)
{
        obj_t *p;

        mutex_enter(list_lock());
        for (p = list_head(); p; p = list_next()) {
                if (p->val == key) {
                        mutex_exit(list_lock());
                        return (p);
                }
        }

        mutex_exit(list_lock());
        return (NULL);
}

Eeesh, two places to call mutex_exit(). I suppose a good compiler would recognize the common basic blocks and optimize them out, but that's still mildly ugly to look at. Still, that above code just rubbed me the wrong way, even though I KNOW there are other bits of Illumos that are like the above. I didn't block the reviewer, but I did write down what I thought it should look like:


obj_t *
lookup(match_t key)
{
        obj_t *p;

        mutex_enter(list_lock());

        p = list_head();
        while ( p != NULL && p->val != key)
                p = list_next();

        mutex_exit(list_lock());
        return (p);
}

That seems simpler. The operation is encapsulated in the mutex_{enter,exit} section, and there are no escape hatches save those in the while boolean. (It's the always-drop-the-mutex-upon-return that makes language constructs like monitors look appealing.)

I think I'm probably making a bigger deal out of this than I should, but the last code looks more readable to me.

One thing the reviewee suggested to me was that a for loop like before, but with breaks, would be equally clean w.r.t. only having one place to drop the mutex. I think the reviewee is right, and it allows for more sophisticated exits from a loop.

Some people would even use "goto fail" here, but we know what can happen when that goes wrong. :)