Tuesday, February 26, 2013

Delegated ZFS, cloning, and SCM

Well THAT was a long break from blogging...

One of the things that's happened in the illumos community is a subtle shift of the main illumos source repository from being primarily Mercurial to being primarily Git. This means I've had to learn Git. At first, I wasn't sure why people were so rabidly pro-Git. I found one of the big reasons:

everywhere(~/ws)[0]% /bin/time git clone git-illumos git-illumos.copy
Cloning into git-illumos.copy...
done.

real       11.8
user        4.7
sys         3.2
everywhere(~/ws)[0]% /bin/time hg clone illumos-clone illumos-clone.copy
updating working directory
44332 files updated, 0 files merged, 0 files removed, 0 files unresolved

real     1:52.6
user       28.9
sys        25.4
everywhere(~/ws)[0]% 

Wow! Yeah, I can see why this would appeal to people. I'm still using Mercurial in a fair amount of places, both for my illumos work and for Nexenta as well. I should show one other thing that both SCM cloning operations do: take up disk space.

everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G   100G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% /bin/time git clone git-illumos git-illumos.copy

  *** SNIP! *** 

everywhere(~/ws)[0]% sync
everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G  99.6G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% /bin/time hg clone illumos-clone illumos-clone.copy

  *** SNIP! *** 

everywhere(~/ws)[0]% sync
everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   199G  98.7G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% 

I believe Git will also take up less disk space, but still, that's approximately half a gig or more for an illumos workspace. If it's populated, say with a preinstalled proto area and compiled objects, that'll be even larger.

Consider one of the great strengths of ZFS: its copy-on-write architecture. Take a local, on-disk master repo, say one you're pulling directly from the source, and make it its own filesystem. Child/downstream workspaces from your on-disk master now can be created using low-latency ZFS operations. Only two problems need to be solved: non-privileged usage, and SCM correction to properly designate the parent/child or upstream/downstream relationship.

Another useful ZFS feature is administrative delegation. Put simply, an administrator can allow an ordinary user to perform selected ZFS primitives on a given filesystem, and its descendants in the ZFS filesystem tree. For example:

everywhere(~)[0]% zfs allow rpool/export/home/danmcd
everywhere(~)[0]% zfs allow rpool/export/home/danmcd/ws
---- Permissions on rpool/export/home/danmcd/ws ----------------------
Local+Descendent permissions:
        user danmcd clone,create,destroy,mount,promote,snapshot
everywhere(~)[0]% 

I (as root) delegated several permissions for a subdirectory of $HOME to me (as danmcd). From here, I can create new filesystems in ~/ws, as well as destroy them, clone them, mount, snapshot, and promote them. All of these are useful operations. The syntax for delegation is mostly straightforward: zfs allow -ld clone,create,destroy,mount,promote,snapshot rpool/export/home/danmcd/ws. The -ld flags enable local and descendant permission propagation.

First thing I did was zfs create rpool/export/home/danmcd/ws/illumos-clone, followed by hg clone ssh://anonhg@hg.illumos.org/illumos-gate illumos-clone. This populates my local Mercurial illumos repo. I can perform a similar operation with git. Per my above timing examples, I did so with git-illumos.

I wrote a script to clone, promote, and reparent Git and Mercurial workspaces using ZFS operations. It's called zclone and it's here for download. It's still a work in progress, and I'd like to maybe have it end up in usr/src/tools in illumos-gate someday. (I'll try and update this particular post as things evolve.)

Check out the times, and the disk space (not) used:

everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G   100G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% /bin/time zclone git-illumos git-illumos.zc
Created rpool/export/home/danmcd/ws/git-illumos.zc,
    a zfs clone of rpool/export/home/danmcd/ws/git-illumos

real        1.0
user        0.0
sys         0.0
everywhere(~/ws)[0]% /bin/time zclone illumos-clone illumos-clone.zc
Created rpool/export/home/danmcd/ws/illumos-clone.zc,
    a zfs clone of rpool/export/home/danmcd/ws/illumos-clone

real        1.0
user        0.0
sys         0.0
everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G   100G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% 

These are constant-time operations, folks. And like I said earlier, I suppose its possible to have the local master repos populated with pre-compiled objects, header files in proto areas (an illumos build trick), and other disk-intensive operations pre-performed.

A quick search didn't yield me any results in this area: using ZFS to help make source trees take up less space. I'm surprised nobody's blogged about this or documented it, but I may have missed something. Either way, it doesn't hurt to mention it again.