Thursday, December 12, 2013

What I learned from my Atari 8-bit days

Happy Throwback Thursday! Some time ago, also on Throwback Thursday, I tweeted a link to a document I wish I had when I was much younger:

I wish I'd had it, because it may have helped me save my first 8-bit Atari computer (an 800XL) from having its POKEY chip fried by a dumb copy-ROM-into-RAM loop. Beyond learning not to blindly write into hardware registers, my Atari 8-bits ended up teaching me a surprising amount. A fair amount of what I learned helped me mature into a proper Computer Scientist and Software Engineer.

Be Careful of the Next Version

I generally look forward to upgrades. Bugs get fixed, features get added, things move faster, and if you're really lucky, you get more than one of those with one upgrade. It doesn't always turn out nicely, though. Sometimes, the next version changes things enough that things that once worked no longer do. Other times, the next version just plain sucks.

8-bit Atari owners had two serious negative encounters - one of each kind. The unexpected change was the transition from the original 400 & 800 models to the XL (and later XE) series. The reason this was a problem is actually best described later.

Atari's DOS (almost every 8-bit machine's disk drivers were called "DOS") lingered on version 2.0 from 1980 until 1984. To accompany new "enhanced density" 5.25" floppy drives, Atari released DOS 3. DOS 3 falls squarely into the "just plain sucks" category. It was a poor design, including such misfeatures as:

  • Larger block sizes (2048 bytes vs. 128 bytes), which led to wasted disk space and sometimes less overall capacity if anything barely spilled into the next block.
  • One-way migration. Once your data moved to DOS 3, it wasn't going back.
  • An overbearing help system that took up disk space (already at a premium).
I didn't know what it was called at the time, but DOS 3 suffered from the Second-System Effect. Luckily, Atari ended up offering DOS 2.5, which looked like DOS 2.0, save for both support for enhanced-density floppies AND the ability to migrate DOS 3 files back to DOS 2.x.
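
To put rough numbers on the block-size complaint, here is a small Python sketch. It uses the two allocation sizes quoted above and ignores any per-sector or per-block bookkeeping bytes, which is a simplifying assumption. The function name and the sample file sizes are mine, for illustration only.

```python
def allocated_bytes(file_size: int, unit_size: int) -> int:
    """Bytes a file occupies when space is handed out in whole units."""
    units = -(-file_size // unit_size)  # ceiling division
    return units * unit_size


# Sample file sizes are arbitrary, chosen to show the "barely spilled" case.
for size in (100, 2049, 5000):
    small = allocated_bytes(size, 128)
    large = allocated_bytes(size, 2048)
    print(f"{size:5d}-byte file: wastes {small - size:4d} bytes in 128-byte units, "
          f"{large - size:4d} bytes in 2048-byte blocks")
```

A file that spills a single byte past 2048 pays for a whole second block, which is the "barely spilled into the next block" case from the list.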

Declare Your String Sizes

Jumping from Pascal or even BASIC to a language like C could be confusing to some. "What do you mean strings are just a character array?" If you cut your teeth on Atari BASIC, you already had an inkling of what was going on.

The classic Microsoft BASIC took up more than the 8K bytes that 8-bit Ataris had reserved for the cartridge slot. The resulting shrinkage of Atari BASIC included the array-like requirements for strings. On classic Microsoft BASIC:


10 A$="HELLO, WORLD"
20 PRINT "THE TEST STRING IS: ", A$
But you had to declare the string size in Atari BASIC:

5 DIM A$(100)
10 A$="HELLO, WORLD"
20 PRINT "THE TEST STRING IS: ", A$
One could not have an array of strings in Atari BASIC, and some of the classic BASIC array operators took on new significance in Atari BASIC. See here for a treatise on the subject.

Don't Depend on Implementation Details

I mentioned the transition from the 400 and 800 to the XL series. Several pieces of software broke when they loaded onto an XL. The biggest reason was that these programs, to save cycles, would jump directly into various ROM routines that were supposed to be accessed through a documented table of JMP instructions. To save the three cycles of an additional JMP, programs would often inline the table entries into their programs. The XL series included a rewritten ROM, which scrambled a large portion of where these routines were implemented. BOOM, no more working code.
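
The failure mode is easy to model. The following Python sketch is a toy, not Atari's actual ROM layout: every address and routine name in it is hypothetical. It only shows why a call through a stable table survives a ROM rewrite while a copied-out address does not.

```python
# All addresses and routine names here are hypothetical, for illustration.
PUT_CHAR_VECTOR = 0xE000  # documented entry point that stays put across ROMs


def make_rom(put_char_address: int) -> dict:
    """Build a toy ROM: one vector-table entry plus the routine it targets."""
    return {
        "vectors": {PUT_CHAR_VECTOR: put_char_address},
        "routines": {put_char_address: lambda ch: f"printed {ch!r}"},
    }


def jump(rom: dict, address: int, arg: str) -> str:
    """Jump straight to an address; fail if no routine starts there."""
    routine = rom["routines"].get(address)
    if routine is None:
        raise RuntimeError(f"no routine at {address:#06x}")
    return routine(arg)


def call_via_vector(rom: dict, vector: int, arg: str) -> str:
    """Look the routine up through the documented table, then jump."""
    return jump(rom, rom["vectors"][vector], arg)


old_rom = make_rom(0xF100)  # where the routine lived in the old ROM
new_rom = make_rom(0xF2A0)  # rewritten ROM: same vector, new location

HARD_CODED = 0xF100  # address a program copied out of the old table

print(call_via_vector(old_rom, PUT_CHAR_VECTOR, "A"))
print(call_via_vector(new_rom, PUT_CHAR_VECTOR, "A"))
print(jump(old_rom, HARD_CODED, "A"))
try:
    jump(new_rom, HARD_CODED, "A")
except RuntimeError as err:
    print("broken on the new ROM:", err)
```

The extra lookup in call_via_vector stands in for the additional JMP: a small, fixed cost that buys independence from where the implementation happens to live.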

Atari, to their credit, released a "Translator" boot disk, which loaded a variant of the old 800 ROM into the XL's extended, bank-switched, RAM, and ran the system using the old 800 ROM. This allowed the broken software to continue to work.

You WILL Have Rejected Submissions

Owning an 8-bit Atari meant you subscribed to at least one of Antic or ANALOG. I was an ANTIC subscriber until I graduated high school. I even tried to submit, twice, type-in programs with accompanying articles to ANTIC. Both were terrible, and rightly rejected by the editor. I'm honestly afraid to remember what they were.

And William Gibson's a Pretty Good Writer

Speaking of Antic, check out this article from September, 1985, especially Part 3 of the article. I immediately scoured the Waukesha County Library System trying to find Neuromancer, and wasn't disappointed... not at all. 16-year-old me really liked this book, and wouldn't have discovered it before college were it not for ANTIC, which I'd have not read without my 8-bit Atari.

Saturday, September 7, 2013

I Have No Whistle to Blow, But I Must Scream

I'm sure all twelve of you readers out there know what's been going on with respect to recent revelations about NSA activity. Among other things is the unnerving discovery that NSA has been attempting to actively dumb-down security for the Internet.

In the second linked article, Bruce Schneier calls upon people to blow the whistle on, "how the NSA and other agencies are subverting routers, switches, the internet backbone, encryption technologies and cloud systems." Here's the deal:

I have never been asked to introduce back-doors or weaken security in Solaris, OpenSolaris, Oracle Solaris 11 (for the four months I worked on it post-barn-door-closing), or Illumos. If there are weaknesses there, it was not because of any deliberate effort on my part.

You can view the kernel IPsec protocol sources (AH & ESP) here, by looking at ipsec*.c, sadb.c, spd.c, spdsock.c, keysock.c and header files in the directory above it. You can see the IPsec management utilities here. According to at least one well-known security researcher, the Illumos (nee OpenSolaris) IPsec code isn't bollocks.

There is no open-source for IKE, because the libike.so.1 library was mostly OEM code, from a vendor whose technical lead let me co-write an RFC with him. You can use the various observability and debugging tools in Illumos to see how things work, however, if you wish.

If you want to write your own, better, key management application for Illumos (or even Oracle Solaris), you can use PF_KEY to control the IPsec SADB. I detail the subsequent additions to RFC 2367 on my day-one-of-OpenSolaris blog post. If you want to work on IPsec in totally-open-source Illumos, you have my blessing, and I'll definitely be reviewing (and maybe integrating if you pass code reviews) your code.

Monday, March 25, 2013

Broad-Spectrum Dogfooding, or Why I Miss Jurassic.

I think most of you dozen readers know what I mean, when I refer to dogfooding. Some people think of Microsoft when they hear the term, but I first heard it from the same person via his being a Sun customer, AND via my old roommate, who worked for him.

I saw this Tweet last week:

I then checked out the blog post. It dealt with how an iSCSI LAN can be a failure point, partially due to the weakness of the ones-complement TCP/IP checksum.

Reading this reminded me of an old bug we found at Sun in either NFS or an Ethernet device driver. The only way we caught it was by using IPsec (AH in particular) and seeing packets fail the authentication check. The corrupt NFS packets had 16 bits' worth of 1s (0xffff) where there should have been 16 bits' worth of 0s (0x0000). Using the standard TCP/IP checksum, there's no difference between those two values, no matter where they fall in the packet. IPsec, however, even with HMAC-MD5, exposed the corruption clearly when the packet authentication check failed. This bug wouldn't have been discovered were it not for the Solaris Team's big honking server, jurassic, and how its multiple concurrent uses interacted with each other.
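
For anyone who wants to see the 0x0000 versus 0xffff blind spot for themselves, here is a minimal Python sketch. The checksum follows the ones-complement algorithm described in RFC 1071. The payload bytes and the key are made up, and this is not the Solaris code or the original corrupt packet.

```python
import hashlib
import hmac


def internet_checksum(data: bytes) -> int:
    """16-bit ones-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Made-up payloads that differ only in one 16-bit run of zeros vs. ones.
good = b"NFS data" + b"\x00\x00" + b"more data"
bad = b"NFS data" + b"\xff\xff" + b"more data"

same_checksum = internet_checksum(good) == internet_checksum(bad)

key = b"hypothetical shared key"
good_mac = hmac.new(key, good, hashlib.md5).digest()
bad_mac = hmac.new(key, bad, hashlib.md5).digest()

print("checksums match:", same_checksum)
print("HMAC-MD5 values match:", good_mac == bad_mac)
```

In ones-complement arithmetic 0xffff is just another representation of zero, so adding it to a nonzero running sum changes nothing once the carry is folded back in. A keyed hash has no such blind spot.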

Even before there was OpenSolaris, people knew about jurassic. Solaris people's (not any old Sun people... Solaris people) posts on IETF mailing lists often showed user@jurassic. Jurassic served as the NFS source of home directories, and until the early 2000s e-mail inboxes as well. Every two weeks the in-development Solaris build would be placed upon jurassic. As a Solaris developer, if your changes broke jurassic, you fixed those changes immediately, or risked getting your changes yanked out. Not breaking jurassic was a great motivator for code quality. Also, if you had a new feature, you wanted it used on jurassic, even if not by everyone.

Once the basic IPsec protocols - AH & ESP - went into Solaris 8, I convinced the jurassic maintainers to protect all traffic between jurassic and a couple of workstations. One was mine, naturally. I encrypted all of my traffic to jurassic. Since we only had 100Mbit in our building at that time, the performance hit wasn't too bad, relatively speaking. Another belonged to an NFS developer, who I'd somehow convinced to run AH, because I was already running ESP (and AH used fewer cycles for protection). It was this NFS developer, surprised he wasn't getting data corruption while others were, who helped suss out the bug in question.

At this point, I'd like to have a moment of silence for all of the made-public Solaris information that Oracle has since put back in its box. I could've had a bug id here, folks, A REAL BUG ID!!!

So for a few of us, jurassic also served as an IPsec testbed. It also was helpful in determining that nobody else's cleartext performance dropped while a few of us were running with IPsec-protected network traffic (put more succinctly, connection policy latching worked). Other services would run on jurassic as well: DNS, IMAP, and others I'm sure I'm forgetting. Jurassic core dumps eventually would be used to test out the then-new mdb (oh, those early ::findleaks results...), and I'm sure more than a few DTrace scripts helped diagnose some jurassic-discovered bugs.

At Nexenta, we make a dedicated storage appliance. Naturally, we use them inside where appropriate. We Nexentians (especially the ones in Lowell) use Illumos from other distributions for even greater effect. My Illumos Home Data Center talk touches upon these at about 10:43 in. We use Illumos to host VMs (Thank you Joyent), we use it for site-to-site VPNs, we will be using it for public services at some point, and everything I mentioned runs on Illumos. It's not quite the magnifying glass Jurassic was, but we do what we can.

I believe Oracle still has jurassic around; I know it did prior to my 2011 departure. I suspect it's helping Oracle Solaris even today. I suspect, however, that a less dense, but more widely instantiated broad-spectrum dogfooding continues on in Illumos today.

Tuesday, February 26, 2013

Delegated ZFS, cloning, and SCM

Well THAT was a long break from blogging...

One of the things that's happened in the illumos community is a subtle shift of the main illumos source repository from being primarily Mercurial to being primarily Git. This means I've had to learn Git. At first, I wasn't sure why people were so rabidly pro-Git. I found one of the big reasons:

everywhere(~/ws)[0]% /bin/time git clone git-illumos git-illumos.copy
Cloning into git-illumos.copy...
done.

real       11.8
user        4.7
sys         3.2
everywhere(~/ws)[0]% /bin/time hg clone illumos-clone illumos-clone.copy
updating working directory
44332 files updated, 0 files merged, 0 files removed, 0 files unresolved

real     1:52.6
user       28.9
sys        25.4
everywhere(~/ws)[0]% 

Wow! Yeah, I can see why this would appeal to people. I'm still using Mercurial in a fair number of places, both for my illumos work and for Nexenta as well. I should show one other thing that both SCM cloning operations do: take up disk space.

everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G   100G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% /bin/time git clone git-illumos git-illumos.copy

  *** SNIP! *** 

everywhere(~/ws)[0]% sync
everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G  99.6G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% /bin/time hg clone illumos-clone illumos-clone.copy

  *** SNIP! *** 

everywhere(~/ws)[0]% sync
everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   199G  98.7G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% 

I believe Git will also take up less disk space, but still, that's approximately half a gig or more for an illumos workspace. If it's populated, say with a preinstalled proto area and compiled objects, that'll be even larger.

Consider one of the great strengths of ZFS: its copy-on-write architecture. Take a local, on-disk master repo, say one you're pulling directly from the source, and make it its own filesystem. Child/downstream workspaces from your on-disk master now can be created using low-latency ZFS operations. Only two problems need to be solved: non-privileged usage, and SCM correction to properly designate the parent/child or upstream/downstream relationship.

Another useful ZFS feature is administrative delegation. Put simply, an administrator can allow an ordinary user to perform selected ZFS primitives on a given filesystem, and its descendants in the ZFS filesystem tree. For example:

everywhere(~)[0]% zfs allow rpool/export/home/danmcd
everywhere(~)[0]% zfs allow rpool/export/home/danmcd/ws
---- Permissions on rpool/export/home/danmcd/ws ----------------------
Local+Descendent permissions:
        user danmcd clone,create,destroy,mount,promote,snapshot
everywhere(~)[0]% 

I (as root) delegated several permissions for a subdirectory of $HOME to me (as danmcd). From here, I can create new filesystems in ~/ws, as well as destroy them, clone them, mount, snapshot, and promote them. All of these are useful operations. The syntax for delegation is mostly straightforward: zfs allow -ld clone,create,destroy,mount,promote,snapshot rpool/export/home/danmcd/ws. The -ld flags enable local and descendant permission propagation.

First thing I did was zfs create rpool/export/home/danmcd/ws/illumos-clone, followed by hg clone ssh://anonhg@hg.illumos.org/illumos-gate illumos-clone. This populates my local Mercurial illumos repo. I can perform a similar operation with git. Per my above timing examples, I did so with git-illumos.

I wrote a script to clone, promote, and reparent Git and Mercurial workspaces using ZFS operations. It's called zclone and it's here for download. It's still a work in progress, and I'd like to maybe have it end up in usr/src/tools in illumos-gate someday. (I'll try and update this particular post as things evolve.)
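
This is not zclone itself, but a rough Python sketch of the kind of steps such a script might run for a Git workspace, under some assumptions: the snapshot name is made up, the mount paths are guessed from the dataset names above, and the function only builds and prints the commands rather than executing anything. Reparenting a Mercurial workspace would need a different last step (editing the default path in .hg/hgrc), which is left out here.

```python
import shlex


def workspace_clone_commands(parent_dataset: str, child_dataset: str,
                             parent_path: str, child_path: str,
                             snapshot_name: str = "sketch") -> list:
    """Return the commands for a ZFS-backed clone of a Git workspace."""
    snapshot = f"{parent_dataset}@{snapshot_name}"
    return [
        # Freeze the parent's current contents.
        ["zfs", "snapshot", snapshot],
        # Create a copy-on-write child filesystem from that snapshot.
        ["zfs", "clone", snapshot, child_dataset],
        # Point the child's upstream at the local parent workspace.
        ["git", "-C", child_path, "remote", "set-url", "origin", parent_path],
    ]


base = "rpool/export/home/danmcd/ws"
commands = workspace_clone_commands(
    f"{base}/git-illumos", f"{base}/git-illumos.zc",
    "/export/home/danmcd/ws/git-illumos",      # assumed mount point
    "/export/home/danmcd/ws/git-illumos.zc",   # assumed mount point
)
for command in commands:
    print(" ".join(shlex.quote(part) for part in command))
```

The snapshot and clone steps are where the delegated snapshot, clone, create, and mount permissions come into play; the last step handles the parent/child relationship on the SCM side.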

Check out the times, and the disk space (not) used:

everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G   100G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% /bin/time zclone git-illumos git-illumos.zc
Created rpool/export/home/danmcd/ws/git-illumos.zc,
    a zfs clone of rpool/export/home/danmcd/ws/git-illumos

real        1.0
user        0.0
sys         0.0
everywhere(~/ws)[0]% /bin/time zclone illumos-clone illumos-clone.zc
Created rpool/export/home/danmcd/ws/illumos-clone.zc,
    a zfs clone of rpool/export/home/danmcd/ws/illumos-clone

real        1.0
user        0.0
sys         0.0
everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G   100G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% 

These are constant-time operations, folks. And like I said earlier, I suppose it's possible to have the local master repos populated with pre-compiled objects, header files in proto areas (an illumos build trick), and other disk-intensive operations pre-performed.

A quick search didn't yield me any results in this area: using ZFS to help make source trees take up less space. I'm surprised nobody's blogged about this or documented it, but I may have missed something. Either way, it doesn't hurt to mention it again.