Crowds and getting the first vaccine done

I woke up at 2:00am yesterday to line up for my first COVID-19 vaccine shot. The whole process took seven hours, most of it spent waiting in line on a dark street corner with close to a thousand other people.

I arrived at the vaccination site at 2:30am and found that the line had already turned a corner from the entrance. This meant at least 100 people had lined up earlier (I found out I was the 221st person in line). I took my place at the end of the line, and a few minutes later, two more people fell in behind me. Then another two. Within two hours, the line had doubled in size.

People have found new ways of killing time while in line. Most of us brought our phones to read social media (which is normal behavior these days). Some were playing games and others were catching up on their streaming shows. Those who lined up with companions spent the time the old-fashioned way: talking endlessly. This pandemic can keep Filipinos apart, but it cannot keep them from talking to each other about anything under the sun. Some were complaining about the line while others were making fun of our situation.

We, as a people, have also turned the new health protocols into a kind of theater. Social distancing was observed only after local government officials stepped in to check, and before someone else could take pictures of the line. Everyone kept their masks on while in line (which was good enough for me), but the face shield (a thin plastic sheet covering the face, required outside one's residence) proved too inconvenient, and most people wore theirs on top of their heads. I personally found the face shield to be a symbol of this precaution theater (low cost and questionable value-add).

The vaccination site opened its doors at 6:30 that morning. Most of us who had been in line for several hours wore the same sleepy look on our faces. Some started to check their documents; we only needed an ID and a QR code. True to form for people subjected to more red tape than usual all our lives, some brought more proof than required. The vaccination staff (doctors and nurses) arrived shortly and began to settle in.

Crowd control at the vaccination site was well organized, and there were enough personnel to direct people where to line up next and to keep our distance from one another. Not much talking, just point and check. Finally, they let us go upstairs where the chairs were set up and the staff were waiting for us. The line was processed in batches of 12 people to avoid crowding at the vaccination stations. There were three stations: one for identity verification (using the QR code from earlier), another for the vaccine administration (handled by two staff members), and an observation area in case there were side effects right after the dose. A staff member reminded us that our next dose would be one month later.

I eventually finished at 9:30, seven hours after I stood in line. To celebrate, I immediately lined up to buy food at the nearest Jollibee.

What do I care about?

“Never attempt to write about what you don’t care about.” – Gerald Weinberg

The things I care about:

My overall health and sanity. I will not be able to provide for my family without taking care of the basics: eat well, sleep well, and put the body through its paces.

My family. It’s an entirely different life once we decided to build a family. A hidden capacity was unlocked, along with less tolerance for BS and a greater sense of urgency. I now have to consider the family’s interests in my decisions.

My colleagues and the work that I produce with them. When working with a team, I have to consider our complementary skills and where we can best contribute to our goals. As an individual contributor, I do not like half-measures. Granted, corners will inevitably be cut, but I aim to do the best possible work within the given constraints.

My friends who I have managed to keep for so long. Some of the friends we’ve made remember a different version of myself, as if their working copy needs rebasing.

In some ways I would like to be able to influence the things I care about. While I’m still able, I would like them to stay important.

The Battle of Helm’s Deep

I’m currently migrating a production Kubernetes cluster from Helm v2 to v3.

We’ve been using Helm to install our services for almost four years, but Helm v2 has been deprecated since last year and everyone seems to have moved to Helm v3.

Helm v3 no longer depends on a server-side daemon called Tiller, which coordinates the installation of Kubernetes resources from a chart’s template.

This is not a problem unique to me.

Props to the Helm team for creating a helpful migration video. It eased a lot of my worry about breaking not just one, but multiple services running in our production cluster. I went through the tutorial and managed to migrate one Redis release. I could still use Helm v2 in our deployments during the migration, which I highly appreciated.
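For reference, the path the Helm team recommends uses the official helm-2to3 plugin. A sketch of the commands, assuming `helm3` is the v3 binary and `redis` is the release being converted (cluster state and release names are from my setup, so treat these as illustrative):

```shell
# Install the official 2to3 plugin into Helm v3
helm3 plugin install https://github.com/helm/helm-2to3

# Copy Helm v2 configuration (repositories, plugins) over to v3
helm3 2to3 move config

# Dry-run the conversion of a single release first
helm3 2to3 convert redis --dry-run

# Convert for real once the dry run looks sane
helm3 2to3 convert redis

# Only after ALL releases are converted: remove v2 config and Tiller
helm3 2to3 cleanup
```

Converting one release at a time (starting with something low-risk like Redis) is what let me keep Helm v2 working for the remaining releases.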


Driving on the US Highway 1

Four years ago, my wife and I went on a vacation to the US. It was my first time driving a car on US soil. From San Francisco, we drove north to Napa Valley, and from there all the way south to San Diego. We met new people and visited old friends along the way. I wanted to see the Bixby Bridge so we took the Pacific Coast Highway, but we had to turn back due to bad weather and a landslide.

Build for operability

In a previous post, I mentioned something about a mullet model of production: operate a service with reliability and simplicity. I intend to expand on some of the terms I used there.

In software-as-a-service (SaaS), production refers to the ensemble of software used to deliver a service (e.g. an eCommerce site). If you are a web developer, this includes the code you’ve written in some language, the database where your data is stored, and the other parts needed to run your service (e.g. hosting infrastructure, instrumentation, etc.).

Consider the Primary Function of your service. Any feature to be built must support that Primary Function. The job of an eCommerce SaaS is to facilitate orders: customers must be able to visit the site, add products to their cart, and pay for them. It is not enough to write the features: there has to be supporting software for these features to do their job well.

Operability refers to the degree to which a service can be supported as it performs its Primary Function. Operability varies a lot depending on the type of service. A few of my guide questions are: (1) Can you understand what the code does at 2am while it is running in production? (2) How long would it take to recall how a feature works after not making any changes for several months? (3) How difficult would it be to extend an existing feature to support a new requirement? These questions demand a lot from the software used to run the services and from its supporting tools. Having a simple, understandable codebase with sufficient test coverage helps a lot. Having a good suite of supporting tools (e.g. alert tracking, instrumentation, etc.) also helps.

Tools and techniques are not enough. Without a team skilled in building and operating what they’ve built, operability would be very difficult to achieve. The team ties everything together. There will be some specialist roles within a team, but everyone on the team should have a good mental model of how production works.

Recommended resources

  1. Above the Line, Below the Line. Building reliable services requires a working understanding of the continuously shifting dependencies.
  2. The Soviet Union’s Philosophy of Weapons Design (Chapter 87 of Digest). Build tools with simplicity and reliability in mind (e.g. the AK-47).
  3. Charity Majors’ Twitter account.

Developers on-call and deploying on a Friday

I’ve been supporting a SaaS product that we built from the ground up over the past four years. Despite some bad initial decisions and staff churn, the service managed to survive and bring in some revenue to the owners. Today I was paged (received a message) about a critical feature that was still broken in production, related to problems identified yesterday. Users could not get their jobs done.

Rather than wait until Monday and let the stress build up, I decided to deploy three bugfixes to production. I’m writing this early Saturday morning, and I just finished deploying and testing in production.

It sucks to be on-call and be exposed to angry customers. I’ve made a lot of changes to make on-call suck less over the past few years. Running a service these days involves more moving parts compared to FTPing a tarball and bouncing the web server back then.

I learned the hard way that the software we’ve built (and the dependencies we use) can end up harming us in ways we cannot anticipate. I would rather be ready to deal with a problem than try to predict every possible error case. This led to what I refer to as a mullet model of production: the service has to run smoothly as users perceive it, and be easy to operate while running. Operability is not a new idea, but having worked as a sysadmin, I want the services I am responsible for to be relatively easy to troubleshoot.

Deploying on a Friday is taboo in some software teams. What I’ve seen is that it’s usually a symptom of a bigger problem. For example, not having good tooling for deploying code to production. Or perhaps a team issue where the new developers are left to deal with the consequences left by their former colleagues. This list of problems could go on.

To new developers reading this and frowning about on-call: not everything is bad and by being on-call you are preventing a bigger catastrophe from happening. Good luck out there!

Instant vs brewed coffee

I switched to instant coffee a few years ago when I became a parent. I just wanted my caffeine hit with minimal fuss. This led to preparing several cups of coffee during the day (I only needed a steady supply of coffee and hot water).

Lately I’ve missed the smell of freshly brewed coffee (probably from not hanging out at coffee houses in recent years). I bought a bag of Arabica beans and took my brewing equipment out of the cabinet (my gear consists of a French press and an electric grinder, nothing fancy).

Preparing coffee with a press seems laborious to me these days, but at least it allows me to throttle my caffeine intake by adding friction to the process. One batch equals one and a half mugs of this freshly brewed stimulant, which is enough to jumpstart the day.

Pushing past the stupid hour

Just before sleeping I had an idea about a bug I’ve been working on. Identifying the problem took most of the time. While brainstorming, I noticed my mind was throwing up SWAGs (silly wild-ass guesses). After a short pause, I ruled out those ideas and eventually found the culprit.

I would have preferred to get enough sleep before engaging in this type of work (and let my mind work on the problem in the background), but I had a hunch that a solution was nearby. Sometimes you just need to push because of what’s at the top of your mental stack.


Upgrading cert-manager from v0.10 to v1.2.0

I found out recently that I could no longer request SSL certificates using cert-manager’s deprecated APIs. This article describes the steps I took to upgrade cert-manager and some error messages found during the process. Total upgrade time took 1 hour and 15 minutes.

Prerequisites

  • kubernetes 1.16+ (I used 1.18)
  • kubectl 1.16+ (I used 1.18)

Backup secrets

$ kubectl get -o yaml -n cert-manager secrets > cert-manager-secrets.yaml

Backup relevant objects

$ kubectl get -o yaml \
    --all-namespaces \
    issuer,clusterissuer,certificates > cert-manager-backup.yaml

Uninstall the old cert-manager

The old cert-manager was installed using a Helm chart:

$ helm delete <helm-release-name>

Delete the cert-manager namespace

$ kubectl delete namespace cert-manager

Remove the old CRDs

$ kubectl delete crd clusterissuers.certmanager.k8s.io
$ kubectl delete crd issuers.certmanager.k8s.io
$ kubectl delete crd challenges.certmanager.k8s.io
$ kubectl delete crd certificates.certmanager.k8s.io

Check for stuck CRDs

In case a CRD cannot be deleted, check for finalizers in the CRD’s manifest. Remove the finalizers from the manifest and try deleting the CRD again.
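A stuck CRD can usually be unblocked by clearing its finalizers with a patch instead of editing the manifest by hand. A sketch of that, using one of the cert-manager CRDs above as the example (these commands assume access to the same cluster):

```shell
# Show the finalizers currently set on the stuck CRD
kubectl get crd certificates.certmanager.k8s.io \
    -o jsonpath='{.metadata.finalizers}'

# Clear the finalizers with a merge patch, then retry the delete
kubectl patch crd certificates.certmanager.k8s.io \
    --type merge -p '{"metadata":{"finalizers":[]}}'
kubectl delete crd certificates.certmanager.k8s.io
```

Clearing finalizers skips whatever cleanup the controller was supposed to do, so this is a last resort after the old cert-manager is already gone.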

Install cert-manager

This time, I installed using jetstack’s manifests and did not use Helm.

$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.2.0/cert-manager.yaml

Verify pods are running

$ kubectl get pods -n cert-manager

Example output:

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-789fdcb77f-7qcgg              1/1     Running   0          3m6s
cert-manager-cainjector-6f6d6cb496-hzhzt   1/1     Running   0          3m7s
cert-manager-webhook-5c79844f4f-kwskp      1/1     Running   0          3m5s

Update API endpoints from backup

I recommend using a text editor to find-and-replace certmanager.k8s.io/v1alpha1 with cert-manager.io/v1.

Remove outdated syntax (e.g. http01) (see Issuer/ClusterIssuer issues).

Apply manifests to restore from backup

$ kubectl apply -f cert-manager-secrets.yaml
$ kubectl apply -f cert-manager-backup.yaml
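After the apply, it is worth confirming that the restored objects are picked up by the new cert-manager. A quick check (READY may take a minute or two to flip as issuers and certificates reconcile):

```shell
# Cluster-scoped and namespaced objects should reappear under their old names
kubectl get clusterissuer
kubectl get issuer,certificate --all-namespaces

# Wait until every certificate reports Ready
kubectl wait --for=condition=Ready certificate --all \
    --all-namespaces --timeout=300s
```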
