varakh.de

My Setup - Journey to NixOS Part II

This post is part of the #nixjourney series.

Recently, I came across a blog post in which someone laid out his reasons for moving away from nix on his Mac, where he maintained his dotfiles using Home Manager. This made me think. Some points are valid: /nix/store is large (mine is at ~267G on my desktop right now, as I keep building for all hosts on that machine and tend to never clean it up), and quickly “changing” a hot key takes more time than editing the configuration file directly. Though, that cannot come as a surprise, right? It’s the trade-off you make for being the new cool kid on the block (or for reproducibility, easy rollback) and central management. It plays nicely together if you manage more than your dotfiles with nix, like a lot more. That’s where nix shines. It’s not supposed to work for everyone. If it doesn’t work for you, that’s totally fine. It does for me though. As you might know, I have been Cosplaying as a Sysadmin for quite some time, which made me finally move to NixOS from Arch (btw) over two years ago.

That post was the perfect motivation for me to continue the #nixjourney series.


When I started looking into the NixOS documentation, some guides, and starters, I really had no clue what was going on or how nice the final result would be. I’ve learned a lot on the way. It took several months to migrate all hosts during my spare time. In the beginning, I was confused about all the terms. I knew my goal and hoped that nix could deliver it, but why would there be different solutions? Nix, NixOS, Nix Flakes, Nix channels, Home Manager, and sops for secrets?! It took some reading to get this sorted out, but once it was, I noticed flakes are actually what I want. To be honest, without flakes, I would probably never have fully switched to “nixify” my setup. Flakes give you a .lock file, similar to package-lock.json in Node applications. It determines exactly which versions and dependencies you’ll get, but on an operating system level, whereas Nix channels are more “volatile” and don’t provide the same guarantees. You definitely have more control with flakes and I would recommend that everyone go with flakes.
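
To make the lock file idea more concrete, here is a minimal flake sketch. The release branch, the system value, and the ./hosts/ziost path are assumptions chosen for illustration. Running nix flake lock next to such a file records the exact nixpkgs revision in flake.lock, and every later build uses that revision until the lock file is updated.

```nix
# flake.nix (minimal sketch; names and versions are illustrative)
{
  description = "NixOS configuration";

  inputs = {
    # Tracks a release branch; the exact commit gets pinned in flake.lock
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.11";
  };

  outputs = { self, nixpkgs, ... }: {
    nixosConfigurations.ziost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";     # assumption
      modules = [ ./hosts/ziost ]; # assumes hosts/ziost/default.nix exists
    };
  };
}
```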

While researching how to get a jump start, I thought it would be nice to learn by doing and to have a solid baseline I could extend, thus I started my journey by picking up nix-starter-configs [1]. Nix as a language felt alien. Despite the fact that I know some functional programming, it was harder than I thought to get started and to write each line of code. In the beginning, I spent more time on researching, reading, and asking than on actually getting my setup done. In retrospect, the relatively complex starter didn’t help either. It might have even confused me more. In all honesty, it was a brutal and steep learning curve to get basic things done. After a while though, results were reasonable. I started to enjoy how simple it was to get applications ready and fully configured through nix code/modules, with no manual config tuning on any server afterward. That was huge! I also loved the implicit documentation it forces you to do. You’re documenting your steps with git commits (or at least I did that). Your nix setup is a git repository, the same as with any other code. With this, you have all the benefits of versioning, but on an operating system level. Derivations even force you to add proper dependencies to any of your bash scripts, which you’ve likely put together late the other day and never touched again. Furthermore, nix modules force you to structure your setup, add custom options, and wire up dependencies on other modules properly, for example by leveraging lib.mkDefault.
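
As a rough illustration of a custom option combined with lib.mkDefault, a module could look like the sketch below. The my.webserver option namespace is hypothetical and only serves as an example. lib.mkDefault sets a value with low priority, so a plain assignment elsewhere (for instance in a host file) wins without any conflict.

```nix
# modules/webserver.nix (hypothetical module, for illustration only)
{ config, lib, ... }:
let
  cfg = config.my.webserver; # hypothetical option namespace
in
{
  # Custom option that other modules or hosts can switch on
  options.my.webserver.enable = lib.mkEnableOption "an opinionated web server setup";

  config = lib.mkIf cfg.enable {
    # Default that a host can still override with a plain assignment
    services.nginx.enable = lib.mkDefault true;
    networking.firewall.allowedTCPPorts = [ 80 443 ];
  };
}
```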

Instead of starting by moving my desktop machines to NixOS, I decided to move some of my VPS first. I kept the old ones alive… in case I decided otherwise halfway. Another reason behind this was that I didn’t want to disrupt my daily routine yet. At the end of the day, it was about experimenting and seeing how it goes, taking the safe route. I also knew that fully migrating all of my dotfiles, and in addition my tiling window manager setup, to nix would take more time than going with basic servers. The plan was simple. Iterate, move quickly, make mistakes, recover, and have No-Go’s show up as early in the process as possible to evaluate if this is more than a prototype, something I want to keep. Turns out: Yes, I really want this to stay and never move back (hopefully, or until something better comes up).

Even if something breaks (which actually has never happened to me since I started using NixOS), having the ability to easily roll back by booting into the previous version feels like a huge step forward. Each build/deploy creates a new, bootable generation which you can select in your bootloader when your machine comes up. Back on my Arch machines, I did sometimes encounter issues with dependencies, or manual interventions were required, which I then needed to repeat for all of my hosts. That was annoying. I haven’t seen this (until now) with my current NixOS setup.

My setup

Let’s directly jump into it. I currently manage eight hosts using NixOS. The entire setup is “nixified”, meaning that I don’t add plain config files to a specific location, but instead use the provided modules or write them, though some scheme files are imported from plain text files.

The structure evolved the more I added to my setup and is still based on the initial baseline I touched on above.

├── commons
│   ├── modules     # Opinionated modules (with my configuration)
│   ├── presets     # Opinionated presets for desktop, server etc., importing modules/ or snippets/
│   └── snippets    # Opinionated snippet options to directly use in imports (with my configuration)
├── hosts           # Host configuration, importing presets and specific host configuration
│   ├── ando (vps)
│   ├── hoth (vps)
│   ├── ilum (laptop)
│   ├── kuat (vps)
│   ├── mantell (homeserver)
│   ├── rion (Raspberry Pi 3)
│   ├── ukio (Raspberry Pi 4)
│   └── ziost (desktop)
├── modules         # Independent modules not provided by upstream
├── overlays        # Overlays
└── pkgs            # Derivations for custom packages

The commons/ folder contains opinionated nix files. This means that they represent my configuration or setup, like theming, keyboard shortcuts, or other settings, usually wrapped into a custom module under commons/modules/. They typically use the mkIf guard pattern: the opinionated settings only apply when the corresponding .enable option is set, which is either the option of an upstream module or of one of my custom-written modules (not present in upstream at all and located under modules/).

Here’s an example for btop to apply my settings like the dracula theme.

{ config, lib, ... }:
{
  config = lib.mkIf config.programs.btop.enable {
    programs.btop = {
      settings = {
        color_theme = "dracula";
        theme_background = false;
      };
      themes = {
        # ....
      };
    };
  };
}

As you can see, if I now set programs.btop.enable anywhere in my config, I’ll automatically get my settings.

Files inside the presets/ folder import modules and set the enable option for them depending on a use case. I have a desktop.nix and a server.nix. This allows me to import a single preset in my actual host file, and then all my defaults are applied. If I need to override any defaults for a host (when I don’t need a specific module), I can simply set the module’s enable option to false (or not import the preset at all).
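
A preset along these lines might look like the following sketch. The imported module file name and the choice of OpenSSH as the example service are assumptions for illustration. Because the preset uses lib.mkDefault, a host that does not want the service can turn it off with a plain services.openssh.enable = false; in its own configuration.

```nix
# commons/presets/server.nix (sketch; imported file names are hypothetical)
{ lib, ... }:
{
  imports = [
    ../modules/webserver.nix # hypothetical opinionated module
    ../snippets/locale.nix
  ];

  # Preset-level default; hosts can override it with a plain assignment
  services.openssh.enable = lib.mkDefault true;
}
```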

The snippets/ folder contains nix snippets which I tend to re-use often but have no intention of modularising. They’re so simple that I don’t think they’re worth wrapping in an additional module, so I pull them in via imports instead. Here’s an example of such a snippet.

# locale.nix
{ ... }:
{
    console.font = "Lat2-Terminus16";
    console.keyMap = "de-latin1-nodeadkeys";
}

The hosts/ folder contains the files for all my hosts. When adding a new host, I add a new folder, adapt my flake.nix to import it under a hostname, and that’s about it. This means I import a preset, a specific hardware-configuration.nix which is generated during install, and some boilerplate code around users, Home Manager, and deployment (see below).
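
As a rough sketch, a host file following this pattern could look like the one below. The file name default.nix, the relative preset path, and the stateVersion value are assumptions for illustration; users, Home Manager, and deployment boilerplate are left out.

```nix
# hosts/ando/default.nix (sketch; paths and values are illustrative)
{ ... }:
{
  imports = [
    ./hardware-configuration.nix     # generated during install
    ../../commons/presets/server.nix # pulls in all server defaults
  ];

  networking.hostName = "ando";

  # Should match the release the host was first installed with (assumed here)
  system.stateVersion = "25.11";
}
```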

Secret management

When you move to a nixified setup, you don’t want any secret hard-coded in your nix files. Those would be tracked in git or put into the nix store when you build. Instead, what you really want is proper secret management. There are some alternatives out there, but what has worked best for me is sops-nix. It’s pretty flexible and supports secrets per host. You configure YAML file(s) for a host and edit them with the sops command, e.g., sops ./hosts/mantell/secrets.yaml. The result is asymmetrically encrypted against a key of your own machine (or multiple ones) and the key of the host machine (automatically derived from its SSH setup for example, though this can also be done manually). It also works per user on a certain host. Pretty convenient. With this setup, you can reference any of your secrets in your nix files.
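
The host-side wiring for sops-nix can be sketched roughly as follows, assuming the sops-nix NixOS module is already imported through the flake. The secret name example-api-token is made up for illustration, and the SSH host key path is the usual OpenSSH default rather than something specific to my hosts.

```nix
# Part of a host configuration (sketch; assumes the sops-nix module is imported)
{ ... }:
{
  # Encrypted YAML file that lives next to the host configuration
  sops.defaultSopsFile = ./secrets.yaml;

  # Derive the host's age key from its SSH host key
  sops.age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];

  # Declares one secret; at activation it is decrypted to a file
  # (by default under /run/secrets/<name>)
  sops.secrets."example-api-token" = { }; # hypothetical secret name
}
```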

Here’s an example to secure a reverse proxy with basic auth.

{ config, ... }:
let
    fqdn = "mydomain.tld";
    port = 8080; # placeholder: port of the proxied backend
in
{
    # Decrypted secret file is owned by the nginx user
    sops.secrets.nginx-basicauth-fqdn.owner = "nginx";

    services.nginx.virtualHosts."${fqdn}" = {
        forceSSL = true;
        enableACME = true;
        locations."/" = {
            proxyPass = "http://127.0.0.1:${toString port}";
            extraConfig = ''
            auth_basic "Protected";
            auth_basic_user_file ${config.sops.secrets.nginx-basicauth-fqdn.path};
            '';
        };
    };
}

Notice that I’ve leveraged the auth_basic_user_file directive from nginx directly. Most NixOS modules actually have a dedicated option to support files as secret inputs. If they only support an environment file, the sops YAML file can host the entire environment file as well. The only catch is that the underlying application needs to have a configuration option to get the necessary application secrets in. Most modern applications have one, or the gap can be worked around with environment files. If they absolutely don’t support that, I personally think it’s worth requesting it from upstream. So, if you’re an application developer, please consider directly supporting file secrets.
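
For the environment file case, a minimal sketch could look like this. The service name myapp and the secret name myapp-env are hypothetical, and the service itself is assumed to be defined elsewhere. The secret holds the whole KEY=value file, and systemd loads it when the unit starts.

```nix
# Sketch: feeding a sops secret to a service as an environment file
{ config, ... }:
{
  # The secret's value is the complete environment file content
  sops.secrets."myapp-env" = { }; # hypothetical secret name

  # "myapp" is a hypothetical service assumed to be defined elsewhere
  systemd.services.myapp.serviceConfig.EnvironmentFile =
    config.sops.secrets."myapp-env".path;
}
```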

When you change the recipients/keys for a secret file, make sure to update the keys with sops updatekeys ./hosts/<name>/secrets.yaml. Otherwise, the file cannot be decrypted properly with the changed keys!

Building and deployment

To get the state of your nix files onto a machine, you need to build the nix generation, either on the host itself or by “deploying” it from your local machine. For my desktop hosts, I build directly on the machine with sudo nixos-rebuild switch --flake .\#ziost, or with boot instead of switch to have it go live on the next reboot. This means that you need to have your Nix git repository on the machine, check out the state you’d like to build, and execute the command above, where ziost is one of my flake’s configurations (my desktop machine).

This works fine, but you might not want to go the same route with your servers. What I do instead is build the nix generation on my machine and then deploy it over SSH. For this, I set up deploy-rs in my flake.nix file for each host to define how I want to deploy, e.g., over SSH with a specific user on the remote machine. Let’s imagine I’d like to deploy the latest state to my server ando: I’d simply invoke deploy .\#ando on my machine in the git repository. deploy-rs sends over all necessary data and the server is updated to my desired state. It also gives you a lot of options for how you want the deploy process to happen. You can also decide to build remotely instead.
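
A deploy-rs node definition in flake.nix can be sketched as below, following the structure from the deploy-rs README. The hostname, the SSH user, the system value, and the release branch are placeholders, not my real values.

```nix
# flake.nix (sketch with deploy-rs; hostnames and users are placeholders)
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.11";
    deploy-rs.url = "github:serokell/deploy-rs";
  };

  outputs = { self, nixpkgs, deploy-rs, ... }: {
    nixosConfigurations.ando = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./hosts/ando ];
    };

    deploy.nodes.ando = {
      hostname = "ando.example.org"; # placeholder address
      sshUser = "deploy";            # hypothetical remote user
      profiles.system = {
        user = "root"; # user that activates the profile on the target
        path = deploy-rs.lib.x86_64-linux.activate.nixos
          self.nixosConfigurations.ando;
      };
    };

    # Lets `nix flake check` validate the deploy definitions
    checks = builtins.mapAttrs
      (system: deployLib: deployLib.deployChecks self.deploy)
      deploy-rs.lib;
  };
}
```

With a definition like this in place, deploy .\#ando builds the system profile locally and activates it on the target over SSH.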

Upgrading

When I upgrade, I invoke nix flake update to update all defined inputs. This also updates the aforementioned flake.lock file. Afterward, I build the toplevel target for all my hosts separately to see if it builds successfully before I move on to deploying that state. No worries though, deploy-rs would obviously not deploy something which does not build!

# build a single target, replace <host> with the proper nix configuration (visible through nix flake show)
nix build .\#nixosConfigurations.<host>.config.system.build.toplevel

For a lot of hosts, this can get quite tedious to do. I use a one-liner to do it for all my defined hosts:

nix eval .\#nixosConfigurations --raw --apply 'c: builtins.concatStringsSep "\n" (builtins.attrNames c)' | xargs -I'{}' nix build .\#nixosConfigurations.'{}'.config.system.build.toplevel --out-link result-'{}'

Looks scary, but you can put it into a shell alias or into a nix development shell for your nixos-config git repository. It iterates over all defined NixOS configurations, builds each of them, and stores the result as result-<name>.
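
One way to ship the one-liner with a development shell is sketched below. The command name build-all-hosts, the system value, and the release branch are made up for illustration. After nix develop, the command is available on the PATH.

```nix
# flake.nix (sketch; only the dev shell part is shown)
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.11";

  outputs = { self, nixpkgs, ... }:
    let
      system = "x86_64-linux"; # assumption
      pkgs = nixpkgs.legacyPackages.${system};

      # Wraps the one-liner into a command (hypothetical name)
      buildAllHosts = pkgs.writeShellScriptBin "build-all-hosts" ''
        nix eval .#nixosConfigurations --raw \
          --apply 'c: builtins.concatStringsSep "\n" (builtins.attrNames c)' \
          | xargs -I'{}' nix build \
              .#nixosConfigurations.'{}'.config.system.build.toplevel \
              --out-link result-'{}'
      '';
    in
    {
      devShells.${system}.default = pkgs.mkShell {
        packages = [ buildAllHosts ];
      };
    };
}
```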

If that is successful, I deploy one by one as it really depends on the host when I want to do it.

There’s also a way to visualize the changes between nix generations, e.g., which packages have updates, with nvd. What I do there is a) build the current generation, b) update the flake inputs, and lastly c) build the “next” generation, then compare with nvd diff <current> <next>.

I wrapped this into a little convenience script called nvd-diff-prev-next. Here’s an example output when I call that for my VPS ando:

nix-shell-env ❯ nvd-diff-prev-next ando
⚠️ Resetting lock file...
<<< result-ando-prev
>>> result-ando-next
Version changes:
[U.]  #01  linux              6.12.61, 6.12.61-modules x2, 6.12.61-modules-shrunk -> 6.12.63, 6.12.63-modules x2, 6.12.63-modules-shrunk
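
A script in this spirit could be packaged as a derivation roughly like the sketch below. It is a guess at the three steps described above, not the actual script: in particular, resetting flake.lock through git before building is an assumption based on the output line, and the file location is hypothetical.

```nix
# pkgs/nvd-diff-prev-next.nix (hypothetical sketch, usable via callPackage)
{ writeShellApplication, nvd, git }:
writeShellApplication {
  name = "nvd-diff-prev-next";
  runtimeInputs = [ nvd git ];
  text = ''
    host="$1"
    target=".#nixosConfigurations.$host.config.system.build.toplevel"

    # Assumption: start from the committed lock file
    echo "Resetting lock file..."
    git checkout -- flake.lock

    # a) build the current generation
    nix build "$target" --out-link "result-$host-prev"

    # b) update the flake inputs, c) build the next generation
    nix flake update
    nix build "$target" --out-link "result-$host-next"

    # Compare both closures
    nvd diff "result-$host-prev" "result-$host-next"
  '';
}
```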

OS releases

NixOS releases come out every six months, following a year and month versioning scheme (for example 25.05 and 25.11). When this happens, some options you use might have been deprecated or replaced. I typically upgrade the flake input references, then invoke builds and see what breaks (if anything). For strict validation, I tend to use --abort-on-warn when building the toplevel target so that even warnings and future deprecations are caught.

This release cycle sounds exhausting, like you would need to change a lot every half year. In reality though, options are pretty stable, they have automatic migrations, and they usually stay backwards compatible while only emitting a deprecation warning. For the past OS releases, my changes averaged 10-20 lines of code, most of them targeting deprecations which would still have worked fine.
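
One mechanism behind such migrations is lib.mkRenamedOptionModule, which forwards an old option name to its replacement and emits a warning when the old name is still used. The option names in this sketch are hypothetical and only demonstrate the pattern.

```nix
# Sketch: keeping an old option name working after a rename
{ lib, ... }:
{
  imports = [
    # Old path [ "my" "oldName" ] is forwarded to [ "my" "newName" ]
    (lib.mkRenamedOptionModule [ "my" "oldName" ] [ "my" "newName" ])
  ];

  # The new option has to exist for the rename to point at it
  options.my.newName = lib.mkOption {
    type = lib.types.str;
    default = "example";
    description = "Hypothetical option used to demonstrate a rename.";
  };
}
```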

Wrap up

Now with my setup explained, what made me ultimately move and commit to it? Reproducibility, easy fallback to boot into a (previously working) generation. Versioning all my setups with git, sharing (configuration) code in one central place. And, lastly, declaring what I want instead of changing what is provided to me as a base made me do the final switch. Without any exaggeration, it feels revolutionary when you’ve fully moved to it. It also costs less time to maintain, which was ultimately the reason why I looked into alternatives in the first place. There are multiple tools out there that give a similar feeling on non-declarative distributions (Ansible, Puppet). None of them felt right to me though, more like a workaround. Nix is different in every aspect. It feels way more natural than tools you plug in on top, because the problem is solved directly in the system, without that “alien feeling” that it “doesn’t belong there”.

Obviously, there are caveats. It’s a trade-off nevertheless. Despite giving you all the benefits and actually saving you time after the migration, the nix ecosystem can be a rabbit hole. It’s code, so your imagination is the limit on how you optimize your setup, where to improve, what to fine-tune, and how much time you spend on this. If you go too deep, then it might not be the time saver anymore, though it will stay fun nevertheless. It’s really up to you to hold yourself back from over-optimizing and iterating over it more than needed. 😅

For me personally, it worked out quite well, but it would be unfair to say that I can recommend it to everyone. I think there’s a clear audience for this: people willing to take on the steep learning curve and people who value reproducibility, stability, and versioning for host setup. For a casual user, I think a standard GNU/Linux distribution will do way better. I think showing them “what is possible” would even scare people off. Though, the moment you face maintaining a lot of hosts, it might be a fit for you. Remember, you decide what exactly you “nixify”. You don’t need to “nixify” your entire setup including user configuration. You can also only have the base system “nixified”.

The next aspect I’d personally like to dive deeper into is some automation around CI/CD and a (local network) cache for Nix. I can imagine that my homeserver (mantell) regularly does some build tests, creates merge requests in my git repository, and automatically deploys the main branch to all of my hosts (at least the servers). It could also leverage nvd to provide information about what has been updated, which version/generation is deployed on which host, and the like.

That’s quite advanced, though it would even reduce the maintenance efforts further where I need to take action only if something breaks, I want to explicitly change something, or there’s a new NixOS release. To be honest, I am not sure that I’ll ever dive into that as I am quite happy about my current state.

What nix gave me is that it makes maintaining the hosts enjoyable again. It’s not a burden, really. It’s actually fun! Something I haven’t felt for quite some time with the previous setups using other distributions (before you ask: reducing the number of hosts is not viable for me, I do need them).


1: FYI, I still use a modified version of it as my daily driver, but it might not be the most elegant or most up-to-date starter you could use. It has worked fine for me for nearly 2.5 years and I have no intention of changing it as of now.

#linux #nix #nixjourney #operations