What do I care about?

“Never attempt to write about what you don’t care about.” – Gerald Weinberg

The things I care about:

My overall health and sanity. I will not be able to provide for my family without taking care of the basics: eat well, sleep well, and put the body through its paces.

My family. It has been an entirely different life since we decided to build a family. A hidden capacity has been unlocked, along with less tolerance for BS and a greater sense of urgency. I now have to consider the family’s interests in my decisions.

My colleagues and the work that I produce with them. When working with a team, I have to consider our complementary skills and where we can best contribute to our goals. As an individual contributor, I do not like half-measures. Granted, corners will inevitably be cut, but I aim to do the best possible work within the constraints given.

My friends, whom I have managed to keep for so long. Some of the friends we’ve made remember a different version of me, as if their working copy needs rebasing.

In some ways I would like to be able to influence the things I care about. While I’m still able, I would like them to stay important.

The Battle of Helm’s Deep

I’m currently migrating a production Kubernetes cluster from Helm v2 to v3.

We’ve been using Helm to install our services for almost 4 years, but Helm v2 has been deprecated since last year and everyone seems to have moved to Helm v3.

Helm v3 no longer depends on Tiller, the server-side daemon that coordinated the installation of Kubernetes resources from a chart’s templates.

This problem is not unique to me

Props to the Helm team for creating a helpful migration video. This has eased a lot of my worry about breaking not just one, but multiple services running in our production cluster. I went through the tutorial and was able to migrate one Redis release. I could still use Helm v2 in our deployments, which is highly appreciated.
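
For anyone facing the same migration, here is a minimal sketch of converting a single release with the helm-2to3 plugin. It assumes that `helm` on the PATH is the Helm v3 binary and that the plugin is already installed. `migrate_release` is a hypothetical helper name for illustration, not something Helm provides.

```shell
#!/bin/sh
# Sketch only. Assumes `helm` is Helm v3 and the helm-2to3 plugin is installed:
#   helm plugin install https://github.com/helm/helm-2to3

# migrate_release RELEASE: convert one Helm v2 release to Helm v3.
migrate_release() {
    release="$1"
    # Simulate the conversion first; --dry-run makes no changes.
    helm 2to3 convert "$release" --dry-run || return 1
    # Convert for real. Without --delete-v2-releases, the v2 release
    # data is retained, so Helm v2 can still see the release.
    helm 2to3 convert "$release"
}
```

A call such as `migrate_release redis` (the release name is an assumption) would then convert one release at a time, which keeps the blast radius small on a production cluster.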

Driving on the US Highway 1

Four years ago, my wife and I went on a vacation to the US. It was my first time driving a car on US soil. From San Francisco, we drove north to Napa Valley and from there all the way south to San Diego. We met new people and visited old friends along the way. I wanted to see the Bixby Bridge so we took the Pacific Coast Highway, but had to turn back due to bad weather and a landslide.

Build for operability

In a previous post, I mentioned something about a mullet model of production: operate a service with reliability and simplicity. I intend to expand on the terms I’ve used there.

In software-as-a-service (SaaS), production refers to the ensemble of software used to deliver a service (e.g. an eCommerce site). If you are a web developer, this includes the code that you’ve written in some language, the database where your data is stored, and the other parts needed to run your service (e.g. hosting infrastructure, instrumentation, etc.).

Consider the Primary Function of your service. Any feature to be built must support that Primary Function. The job of an eCommerce SaaS is to facilitate orders. Customers must be able to visit the site, add products to their cart, and pay for them. It is not enough to write the features: there has to be supporting software for these features to do their job well.

Operability refers to the degree to which a service can be supported as it performs its Primary Function. Operability varies a lot depending on the type of service. A few of my guide questions are: (1) Can you understand what the code does at 2am while running in production? (2) How long would it take to recall how a feature works after not making any changes for several months? (3) How difficult would it be to extend an existing feature to support a new requirement? These questions impose a lot on the software used to run the services and also its supporting tools. Having a simple, understandable codebase with sufficient test coverage helps a lot. Having a good suite of supporting tools (e.g. alert tracking, instrumentation, etc.) also helps.

Tools and techniques are not enough. Without a team skilled in building and operating what they’ve built, operability would be very difficult to achieve. The team ties everything together. There will be some specialist roles within a team, but everyone on the team should have a good mental model of how production works.

Recommended resources

  1. Above the Line, Below the Line. Building reliable services requires a working understanding of the continuously shifting dependencies.
  2. The Soviet Union’s Philosophy of Weapons Design (Chapter 87 of Digest). Build tools with simplicity and reliability in mind (e.g. AK-47).
  3. Charity Majors’ Twitter account.

Developers on-call and deploying on a Friday

I’ve been supporting a SaaS product that we’ve built from the ground up for the past four years. This service, despite some bad initial decisions and staff churn, managed to survive and bring in some revenue to the owners. Today I was paged (received a message) about a critical feature that is still broken in production. This was related to problems that were identified yesterday. Users could not get their jobs done.

Rather than wait until Monday and let the stress build up, I decided to deploy three bugfixes to production. I’m writing this early Saturday morning and I just finished deploying and testing in production.

It sucks to be on-call and be exposed to angry customers. I’ve made a lot of changes to make on-call suck less over the past few years. Running a service these days involves more moving parts compared to FTPing a tarball and bouncing the web server back then.

I learned the hard way that the software we’ve built (and the other dependencies we use) could end up harming us in ways we could not anticipate. I would rather be ready to deal with the problem than predict every possible error case. This led to what I would refer to as a mullet model of production: the service has to run smoothly as users perceive it and be easy to operate while running. Operability is not a new idea, but having worked as a sysadmin, I would want the services that I am responsible for to be relatively easy to troubleshoot.

Deploying on a Friday is taboo in some software teams. What I’ve seen is that it’s usually a symptom of a bigger problem. For example, not having good tooling for deploying code to production. Or perhaps a team issue where the new developers are left to deal with the consequences left by their former colleagues. This list of problems could go on.

To new developers reading this and frowning about on-call: not everything is bad and by being on-call you are preventing a bigger catastrophe from happening. Good luck out there!

Instant vs brewed coffee

I switched to instant coffee a few years ago when I became a parent. I just wanted my caffeine hit done with minimal fuss. This led to preparing several cups of coffee during the day (I only needed a steady supply of coffee and hot water).

Lately I missed the smell of freshly brewed coffee beans (probably due to not hanging out at coffee houses in recent years). I bought a bag of Arabica beans and took out my brewing equipment from the cabinet (my brewing gear consists of a French press and an electric grinder, nothing fancy).

Having to prepare coffee using a press these days seems laborious to me, but at least it allows me to throttle my caffeine intake by adding friction to the process. One batch equals one-and-a-half mugs of this freshly brewed stimulant, which is enough to jumpstart the day.

Pushing past the stupid hour

Just before sleeping I had an idea for a bug that I’ve been working on. Identifying the problem took most of the time. While brainstorming for ideas, I noticed my mind was giving me all these SWAGs (silly wild-ass guesses). After a short pause, I ruled out these ideas and eventually found the culprit.

I would have preferred to get enough sleep before engaging in this type of work (and let my mind work on the problem in the background), but I had a hunch that a solution was nearby. Sometimes you just need to push because of what’s at the top of your mental stack.

Upgrading cert-manager from v0.10 to v1.2.0

I found out recently that I could no longer request SSL certificates using cert-manager’s deprecated APIs. This article describes the steps I took to upgrade cert-manager and some error messages I ran into during the process. The upgrade took 1 hour and 15 minutes in total.

Prerequisites

  • Kubernetes 1.16+ (I used 1.18)
  • kubectl 1.16+ (I used 1.18)

Backup secrets

$ kubectl get -o yaml -n cert-manager secrets > cert-manager-secrets.yaml

Backup relevant objects

$ kubectl get -o yaml \
    --all-namespaces \
    issuer,clusterissuer,certificates > cert-manager-backup.yaml

Uninstall the old cert-manager

The old cert-manager was installed using a Helm chart:

$ helm delete <helm-release-name>

Delete the cert-manager namespace

$ kubectl delete namespace cert-manager

Remove the old CRDs

$ kubectl delete crd clusterissuers.certmanager.k8s.io
$ kubectl delete crd issuers.certmanager.k8s.io
$ kubectl delete crd challenges.certmanager.k8s.io
$ kubectl delete crd certificates.certmanager.k8s.io

Check for stuck CRDs

In case CRDs could not be deleted, check for finalizers in the CRD’s manifest. Remove the finalizers from the CRD’s manifest and try to delete the CRD again.
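
As a rough sketch of that step, the finalizers can be cleared with a merge patch instead of editing the manifest by hand. `remove_stuck_crd` is a hypothetical helper name; it assumes `kubectl` is pointed at the right cluster and that discarding the finalizers is acceptable because the old cert-manager controller is already gone.

```shell
#!/bin/sh
# Sketch only: clear the finalizers on a CRD stuck in deletion, then delete it.

# remove_stuck_crd CRD_NAME
remove_stuck_crd() {
    crd="$1"
    # Print the finalizers currently set, for the record.
    kubectl get crd "$crd" -o jsonpath='{.metadata.finalizers}'
    # Replace the finalizer list with an empty one.
    kubectl patch crd "$crd" --type=merge -p '{"metadata":{"finalizers":[]}}'
    # A CRD already marked for deletion may disappear as soon as the patch
    # lands, so tolerate "not found" here.
    kubectl delete crd "$crd" --ignore-not-found
}
```

For example, `remove_stuck_crd challenges.certmanager.k8s.io` would target one of the old CRDs listed above.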

Install cert-manager

This time, I installed cert-manager using Jetstack’s manifests and did not use Helm.

$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.2.0/cert-manager.yaml

Verify pods are running

$ kubectl get pods -n cert-manager

Example output:

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-789fdcb77f-7qcgg              1/1     Running   0          3m6s
cert-manager-cainjector-6f6d6cb496-hzhzt   1/1     Running   0          3m7s
cert-manager-webhook-5c79844f4f-kwskp      1/1     Running   0          3m5s
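
Instead of re-running `kubectl get pods` until everything shows `Running`, one option is to block until the pods report ready. This is a sketch under assumptions: `wait_for_cert_manager` is a hypothetical helper, and the default timeout of 120 seconds is an arbitrary choice, not a recommendation from the cert-manager project.

```shell
#!/bin/sh
# Sketch only: wait until every pod in the cert-manager namespace is Ready.

# wait_for_cert_manager [TIMEOUT]   e.g. wait_for_cert_manager 300s
wait_for_cert_manager() {
    timeout="${1:-120s}"
    # Returns non-zero if any pod is not Ready before the timeout expires.
    kubectl wait --for=condition=Ready pods --all \
        -n cert-manager --timeout="$timeout"
}
```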

Update API endpoints from backup

I recommend using a text editor to find-and-replace certmanager.k8s.io/v1alpha1 with cert-manager.io/v1.
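
If you prefer the command line to a text editor, the same find-and-replace can be sketched with `sed`. `update_api_version` is a hypothetical helper name. It only rewrites the `apiVersion` string; outdated fields inside the manifests still need to be fixed by hand.

```shell
#!/bin/sh
# Sketch only: rewrite the old cert-manager API group/version in a backup file.

# update_api_version FILE
update_api_version() {
    file="$1"
    # -i.bak edits in place and keeps the original as FILE.bak
    # (this form works with both GNU and BSD sed).
    sed -i.bak 's#certmanager\.k8s\.io/v1alpha1#cert-manager.io/v1#g' "$file"
}
```

Running `update_api_version cert-manager-backup.yaml` leaves the untouched original in `cert-manager-backup.yaml.bak`, which is handy if the restore needs a second attempt.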

Remove outdated syntax (e.g. http01) (see Issuer/ClusterIssuer issues).

Apply manifests to restore from backup

$ kubectl apply -f cert-manager-secrets.yaml
$ kubectl apply -f cert-manager-backup.yaml

Wandering, Part One

“If you don’t know where you are going, any road will get you there.” – Lewis Carroll

My father gave that quote to me as he asked what my plans were after high school. He was willing to pay for college, but I had to decide what to study and see it through until graduation. I chose to study computer science despite discouragement from the people around me at the time.

I mentioned that luck (fortunate accidents) played a part in getting into a computer science program because I barely prepared for the exam. Staying in the program was a different problem. Not knowing what a computer science program entailed, I struggled for the first half of my stay at university. Things came to a point where I had to convince university officials that I could finish the course.

I finished while working part-time at the university and eventually stayed a few more years to consult for them on software projects. This was the time when the World Wide Web had started to transform into Web 2.0.