Tuesday, February 26, 2013

Delegated ZFS, cloning, and SCM

Well THAT was a long break from blogging...

One of the things that's happened in the illumos community is a subtle shift of the main illumos source repository from being primarily Mercurial to being primarily Git. This means I've had to learn Git. At first, I wasn't sure why people were so rabidly pro-Git. I found one of the big reasons:

everywhere(~/ws)[0]% /bin/time git clone git-illumos git-illumos.copy
Cloning into git-illumos.copy...
done.

real       11.8
user        4.7
sys         3.2
everywhere(~/ws)[0]% /bin/time hg clone illumos-clone illumos-clone.copy
updating working directory
44332 files updated, 0 files merged, 0 files removed, 0 files unresolved

real     1:52.6
user       28.9
sys        25.4
everywhere(~/ws)[0]% 

Wow! Yeah, I can see why this would appeal to people. I'm still using Mercurial in a fair number of places, both for my illumos work and for Nexenta as well. I should show one other thing that both SCM cloning operations do: take up disk space.

everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G   100G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% /bin/time git clone git-illumos git-illumos.copy

  *** SNIP! *** 

everywhere(~/ws)[0]% sync
everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G  99.6G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% /bin/time hg clone illumos-clone illumos-clone.copy

  *** SNIP! *** 

everywhere(~/ws)[0]% sync
everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   199G  98.7G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% 

Git also appears to take up less disk space (going by the rounded FREE column above, roughly 0.4G for the Git clone versus 0.9G for the Mercurial one), but still, that's approximately half a gig or more for an illumos workspace. If it's populated, say with a preinstalled proto area and compiled objects, it'll be even larger.

Consider one of the great strengths of ZFS: its copy-on-write architecture. Take a local, on-disk master repo, say one you're pulling directly from the source, and make it its own filesystem. Child/downstream workspaces from your on-disk master now can be created using low-latency ZFS operations. Only two problems need to be solved: non-privileged usage, and SCM correction to properly designate the parent/child or upstream/downstream relationship.
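The ZFS side of that idea boils down to two primitives: snapshot the master, then clone the snapshot. Here's a minimal sketch of that, not the script I describe later. The dataset names mirror the layout used in this post, the snapshot name and the child dataset name are made up for illustration, and the `run` helper only prints the commands unless you set DRYRUN=0.

```shell
#!/bin/sh
# Sketch: make a child workspace as a ZFS clone of a master workspace.
# Assumptions: dataset names follow this post's layout; the snapshot
# name ("zc-base") and the ".child" dataset are hypothetical.

# Print each command by default; set DRYRUN=0 to actually run it.
run() {
    if [ "${DRYRUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# zfs_workspace_clone PARENT_DATASET CHILD_DATASET SNAPNAME
zfs_workspace_clone() {
    parent=$1
    child=$2
    snap=$3
    # A clone is always made from a snapshot, so take one first.
    run zfs snapshot "$parent@$snap"
    # The clone shares every unmodified block with that snapshot.
    run zfs clone "$parent@$snap" "$child"
}

zfs_workspace_clone rpool/export/home/danmcd/ws/git-illumos \
    rpool/export/home/danmcd/ws/git-illumos.child zc-base
```

Because the clone only stores blocks that diverge from the snapshot, a fresh child costs almost nothing until you start changing files in it.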

Another useful ZFS feature is administrative delegation. Put simply, an administrator can allow an ordinary user to perform selected ZFS primitives on a given filesystem, and its descendants in the ZFS filesystem tree. For example:

everywhere(~)[0]% zfs allow rpool/export/home/danmcd
everywhere(~)[0]% zfs allow rpool/export/home/danmcd/ws
---- Permissions on rpool/export/home/danmcd/ws ----------------------
Local+Descendent permissions:
        user danmcd clone,create,destroy,mount,promote,snapshot
everywhere(~)[0]% 

I (as root) delegated several permissions for a subdirectory of $HOME to me (as danmcd). From here, I can create new filesystems in ~/ws, as well as destroy them, clone them, mount, snapshot, and promote them. All of these are useful operations. The syntax for delegation is mostly straightforward: zfs allow -ld danmcd clone,create,destroy,mount,promote,snapshot rpool/export/home/danmcd/ws. The -l flag applies the permissions to the named filesystem itself, and -d applies them to its descendants.

First thing I did was zfs create rpool/export/home/danmcd/ws/illumos-clone, followed by hg clone ssh://anonhg@hg.illumos.org/illumos-gate illumos-clone. This populates my local Mercurial illumos repo. I can perform a similar operation with git. Per my above timing examples, I did so with git-illumos.
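Put together, the delegation and population steps might look like the sketch below. The dataset names, the permission list, and the Mercurial URL come from this post; GIT_URL is a hypothetical variable standing in for whatever Git upstream you use, and the `run` helper and function name are mine. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: delegate ZFS permissions, then create and populate one
# filesystem per master repo. GIT_URL is a hypothetical variable.

# Print each command by default; set DRYRUN=0 to actually run it.
run() {
    if [ "${DRYRUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_masters() {
    user=danmcd
    ws=rpool/export/home/danmcd/ws
    perms=clone,create,destroy,mount,promote,snapshot

    # As root: delegate locally (-l) and on descendants (-d).
    run zfs allow -ld "$user" "$perms" "$ws"

    # As the user, from the directory where $ws is mounted:
    # one filesystem per master repo, then populate it.
    run zfs create "$ws/illumos-clone"
    run hg clone ssh://anonhg@hg.illumos.org/illumos-gate illumos-clone

    # Same idea for Git, only if an upstream URL was supplied.
    if [ -n "${GIT_URL:-}" ]; then
        run zfs create "$ws/git-illumos"
        run git clone "$GIT_URL" git-illumos
    fi
}

setup_masters
```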

I wrote a script to clone, promote, and reparent Git and Mercurial workspaces using ZFS operations. It's called zclone and it's here for download. It's still a work in progress, and I'd like to maybe have it end up in usr/src/tools in illumos-gate someday. (I'll try and update this particular post as things evolve.)
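To give a feel for the "reparent" part, here is one way that step could look. This is not the zclone script itself, just a sketch under my own assumptions: the function name and arguments are made up, the Git case assumes the workspace already has an "origin" remote, and the Mercurial case simply appends a [paths] section (a later setting in hgrc overrides an earlier one of the same name).

```shell
#!/bin/sh
# Sketch: point a freshly ZFS-cloned workspace at its on-disk parent.
# Hypothetical helper, not taken from zclone.

# reparent CHILD_DIR PARENT_DIR
reparent() {
    child=$1
    parent=$2
    if [ -d "$child/.git" ]; then
        # Git: repoint the existing "origin" remote at the parent.
        git -C "$child" remote set-url origin "$parent"
    elif [ -d "$child/.hg" ]; then
        # Mercurial: append a [paths] section naming the parent.
        printf '[paths]\ndefault = %s\n' "$parent" >> "$child/.hg/hgrc"
    else
        echo "reparent: $child is not a Git or Mercurial workspace" >&2
        return 1
    fi
}
```

For example, `reparent ~/ws/illumos-clone.zc ~/ws/illumos-clone` would make a plain `hg pull` in the child fetch from the local master rather than from wherever the master itself was cloned.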

Check out the times, and the disk space (not) used:

everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G   100G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% /bin/time zclone git-illumos git-illumos.zc
Created rpool/export/home/danmcd/ws/git-illumos.zc,
    a zfs clone of rpool/export/home/danmcd/ws/git-illumos

real        1.0
user        0.0
sys         0.0
everywhere(~/ws)[0]% /bin/time zclone illumos-clone illumos-clone.zc
Created rpool/export/home/danmcd/ws/illumos-clone.zc,
    a zfs clone of rpool/export/home/danmcd/ws/illumos-clone

real        1.0
user        0.0
sys         0.0
everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G   100G         -    66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% 

These are constant-time operations, folks. And like I said earlier, I suppose it's possible to have the local master repos populated with pre-compiled objects, header files in proto areas (an illumos build trick), and other disk-intensive operations pre-performed.
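If you want to see the savings per workspace rather than per pool, `zfs list` can show each dataset's own usage next to the snapshot it was cloned from. A minimal sketch, with a made-up function name and a print-only `run` helper (set DRYRUN=0 to really run it):

```shell
#!/bin/sh
# Sketch: show per-dataset space usage and clone origins under a
# workspace dataset. Function name is hypothetical.

# Print each command by default; set DRYRUN=0 to actually run it.
run() {
    if [ "${DRYRUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# show_clone_usage WORKSPACE_DATASET
show_clone_usage() {
    # used: space charged to the dataset itself and its descendants;
    # referenced: data reachable from it (shared blocks included);
    # origin: the snapshot a clone was created from ("-" if none).
    run zfs list -r -o name,used,referenced,origin "$1"
}

show_clone_usage rpool/export/home/danmcd/ws
```

A fresh clone should show a tiny "used" value next to a "referenced" value about the size of its parent.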

A quick search didn't yield me any results in this area: using ZFS to help make source trees take up less space. I'm surprised nobody's blogged about this or documented it, but I may have missed something. Either way, it doesn't hurt to mention it again.

3 comments:

  1. I've been doing something like this with hg for a while, and yes, it's an excellent time saver, and saves large amounts of disk space too. (I do keep an unpacked copy of the closed bins, and do "make rootdirs" in my zfs parent dataset.)

    Thanks for posting a git version of this handy tool.

  2. Like it or not, here's my take on this (with surprisingly similar name) among my other git plugins: https://github.com/jimklimov/git-scripts

    Announced in mailing lists quite a while ago :)

    Usage: e.g. "git pull" into local workspace which is just a replica of upstream, "git zclone" it to spawn a build workspace (with downloads and publish-repos pointed via envvars into another dataset) and build there. Then instead of a "make clean" you can just "zfs destroy" this workspace.
