On platform engineering

This is my summary of The Future of Ops Jobs is Platform Engineering.

DevOps has succeeded in unifying developer and operations roles, but the problem remains: how can developers operate their code? Skills for operating an increasingly complex cloud infrastructure remain valuable, and the difficult infrastructure parts have been factored out as their own services (e.g. EKS versus bootstrapping your own Kubernetes cluster).

Platform engineering as vendor engineering

Vendor engineering refers to having the necessary experience with another vendor’s API. I encountered this term in The Future of Ops Jobs, which was one of the first attempts at describing this new role.

While most infrastructure tools are now companies of their own, teams are selecting these tools as part of their own platform’s toolchain. This involves making sense of multiple vendor APIs, and providing a path for the team to use daily. All this work is to focus the team’s attention on the product.

I noticed that teams usually realize this type of work is needed while building out a SaaS product. The work is necessary and, if not done properly, could distract the team from its mission.

Towards a self-service tier

All this tooling gives rise to the self-service tier, and the article describes what a self-service platform should offer (e.g., deploying a service, instrumenting deployments, etc.). The goal is for an engineer to quickly bring up a new service using the toolchain provided by the platform engineer. The toolchain serves as the “blessed stack”: what the team should use and what the organization supports.

What happens to operations roles now?

This raises the question: are platform engineers just developers who happen to be assigned to build tools? Platform engineering fails when the developers assigned to the job have little experience operating cloud software and, worse, little empathy for making their teammates’ experience with the platform better.

The article compares a platform engineer’s job to a typical operations job. What caught my attention was the focus on running less software.

Job titles are lagging indicators

Similar to observability, I think we’ll see “platform engineer” job posts within a year or two.

Weather-weather

We have a local saying (“weather-weather lang yan”). Storms eventually pass. I took this picture today while on my morning walk:

Manila Bay (October 2022)

The waters were not always as calm as this: last week, there was a typhoon. Today is a new day and another chance to do well if we can ride the turbulence.

Filtering in JSON:API

I needed to filter data when accessing an API that follows the JSON:API specification. For those not familiar with JSON:API, it provides a set of conventions on how to structure JSON responses. It also provides some guidance on how to request data. Similar to semantic versioning, the set of conventions cuts down on the debate over how to structure APIs.

Filtering by Passing Query Parameters

JSON:API describes how to filter data through the use of a filter query parameter.

Example (fetching all posts made by author Joe):

http://example.com/api/posts?filter[author]=joe

What I don’t see in the examples is how to pass in a list of values, for example when filtering by multiple authors. For now, I’ll assume that it is acceptable to pass in a comma-separated list of values:

http://example.com/api/posts?filter[author]=jack,jill
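As a rough sketch, here is one way to build such URLs in Ruby. It assumes the bracketed filter[field]=value form and comma-separated lists shown on the JSON:API recommendations page; the filtered_url helper is hypothetical, and a given server may expect something different.

```ruby
require "uri"

# Hypothetical helper: build a JSON:API-style filter URL.
# `filters` maps a field name to a single value or a list of values.
# Lists are joined with commas (an assumption, not something the spec mandates).
def filtered_url(base, filters)
  query = filters.map do |field, values|
    joined = Array(values).map { |v| URI.encode_www_form_component(v.to_s) }.join(",")
    # Brackets are left unencoded for readability; a strict client
    # may need to percent-encode them.
    "filter[#{field}]=#{joined}"
  end.join("&")

  query.empty? ? base : "#{base}?#{query}"
end

puts filtered_url("http://example.com/api/posts", author: "joe")
puts filtered_url("http://example.com/api/posts", author: %w[jack jill])
```

Each value is escaped on its own, so a value that itself contains a comma would not be mistaken for a list separator on the client side. How the server treats it is still up to the server.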

See also

Query Parameter Families

Filtering example in Drupal

Change column default value in ActiveRecord

Use change_column_default, which accepts three arguments (the table name, the column name, and the new default value).
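A minimal migration sketch follows. The posts table, the status column, and the "draft" value are made up for illustration, and the file only runs inside a Rails app:

```ruby
# db/migrate/20221001000000_change_status_default_on_posts.rb
# Hypothetical table (posts), column (status), and default ("draft").
class ChangeStatusDefaultOnPosts < ActiveRecord::Migration[7.0]
  def change
    # Positional form: table name, column name, new default value.
    # change_column_default :posts, :status, "draft"

    # The from:/to: form tells Rails the previous default as well,
    # so the migration can be rolled back.
    change_column_default :posts, :status, from: nil, to: "draft"
  end
end
```

The positional form is fine inside separate up and down methods. Inside change, the from:/to: form is the safer choice because Rails cannot otherwise work out the old default on rollback.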

See also

API Documentation (v7.0.3)

Change the default value for table column with migration

PS

It is never too late for TIL (Today I Learned) type of blog posts. I hesitated to make these types of posts because I can save the StackOverflow URL anyway. Unfortunately, I now have 12 years’ worth of StackOverflow URLs saved on my computer and I have only just started to curate them.

You’re welcome.

Doing the hard stuff early

I’ve been waking up early for the past ten years. This habit has carried over, and has served me well, now that I have children. I sleep early and wake up a few hours before everyone else in my household. I use this time to work on what I deem to be the hard stuff.

Everyone struggles with controlling attention, so try not to get distracted in the first hour after waking up.

It helps to have something difficult lined up, such as going to the gym (or going for a walk indoors when it’s raining). It also helps to have something easy enough lined up, such as a failing test from yesterday’s work session.

Certain tasks, such as planning and reflection, require some headspace to be effective. I do these things early in the morning, too.

Domain Events using RailsEventStore

Context

In 2020, I implemented a backend process responsible for paying out sellers in an e-commerce platform. This consists of two distinct phases: (1) calculating the weekly payout amount and (2) transferring funds using Stripe.

Calculating the payout amount works by accounting for the previous week’s sales and deducting fees and refunds. After computing the payout amount, a request is made to transfer funds, which happens some time later. A summary report containing all relevant information is also generated and sent out to sellers.

The initial version of the payout process did not consider errors occurring while calculating the payout amount or transferring funds. This made debugging challenging, particularly when it happened months after a transfer had been completed.

Improving payouts

I wanted to enhance the payout process by adding well-known checkpoints: (1) when the process has successfully computed the payout amount, (2) when the process has determined that the platform has enough funds to transfer, and (3) when the funds have been transferred to sellers successfully. Along the way, errors could occur and we also need to be aware of these errors (and provide the necessary manual intervention).

One approach would be to use a state machine, but I needed something that could capture a payout process’s journey through the checkpoints I’ve defined above. I also did not want to litter my ActiveRecord models with callbacks, because they become difficult to debug for various reasons.

I found this library called RailsEventStore that provides a way to define application events, publish these events, and subscribe to these events. This is made possible by having a single repository of events as a single table. Furthermore, RailsEventStore does not require any fancy storage backend. I was able to make this work using an existing PostgreSQL database (event storage) and Redis (publish-subscribe).

Domain events

A domain event is a record of a fact occurring in some part of a software system. An event could be something like “order has been confirmed” or “customer has signed up for an account”. Other parts of a large software system could listen to these events and perform additional work (e.g. send emails, compute a rollup table, etc.). What events provide is a way to decouple these side effects from the main task of some feature.
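To make the decoupling concrete, here is a toy in-memory event bus in plain Ruby. It is not RailsEventStore’s API, only a sketch of the idea; EventBus, OrderConfirmed, and the two handlers are hypothetical names.

```ruby
# Toy in-memory event bus (NOT RailsEventStore's API), for illustration only.
OrderConfirmed = Struct.new(:order_id, keyword_init: true)

class EventBus
  def initialize
    @handlers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a callable to run whenever an event of `event_class` is published.
  def subscribe(event_class, handler)
    @handlers[event_class] << handler
  end

  # Call every handler subscribed to this event's class, in subscription order.
  def publish(event)
    @handlers[event.class].each { |handler| handler.call(event) }
  end
end

bus = EventBus.new
emails = []            # stand-in for "send emails"
rollup = Hash.new(0)   # stand-in for "compute a rollup table"

bus.subscribe(OrderConfirmed, ->(event) { emails << "Order #{event.order_id} confirmed" })
bus.subscribe(OrderConfirmed, ->(_event) { rollup[:confirmed_orders] += 1 })

# The main task only publishes the fact; it knows nothing about emails or rollups.
bus.publish(OrderConfirmed.new(order_id: 42))
```

Adding another side effect means adding another subscriber, with no change to the code that confirms the order.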

Example domain events

I’ve defined several events specific to payouts (e.g., PayoutComputed, FundsTransferCompleted, PayoutSendingSuccessful, etc.). I also defined events to capture error conditions (e.g., FundsTransferFailed, etc.).

I’ve set up a subscription to the FundsTransferFailed event. When an error occurs, the subscription kicks off an ActiveJob to send the necessary alerts.

The listing below shows how an event is published (ignore the Honeycomb span blocks):

A simple audit trail

In order to trace what happened for a particular payout run, RailsEventStore provides a way to enumerate a stream of events (I organized mine by payout run using an ID). This gives me a time-ordered list of events and the parameters passed for each event.
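The idea of reading back one payout run can be sketched with a toy append-only log in plain Ruby. This is not RailsEventStore’s API; EventLog, the stream names, and the event data are all made up for illustration.

```ruby
require "time"

# Toy append-only event log (NOT RailsEventStore's API), for illustration only.
Recorded = Struct.new(:stream, :type, :data, :at, keyword_init: true)

class EventLog
  def initialize
    @records = []
  end

  # Append an event to a named stream; returns self so calls can be chained.
  def append(stream:, type:, data: {}, at: Time.now)
    @records << Recorded.new(stream: stream, type: type, data: data, at: at)
    self
  end

  # Events for one stream, in the order they were appended.
  def read(stream)
    @records.select { |record| record.stream == stream }
  end
end

log = EventLog.new
# Hypothetical stream names (one per payout run) and payloads.
log.append(stream: "payout-run-7", type: "PayoutComputed", data: { amount_cents: 12_500 })
log.append(stream: "payout-run-8", type: "PayoutComputed", data: { amount_cents: 4_000 })
log.append(stream: "payout-run-7", type: "FundsTransferFailed", data: { reason: "insufficient funds" })

log.read("payout-run-7").each do |event|
  puts "#{event.at.iso8601} #{event.type} #{event.data}"
end
```

Because events are only ever appended, reading a stream back gives the history of that payout run without any extra bookkeeping.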

See also

RailsEventStore

Domain-Driven Rails

Avoid premature abstractions

Stop me if you’ve heard this before:

You may have had a few ideas on creating “reusable” components to “save labor” and “deliver quickly.” Inspired, you decided to set aside some time to build your idea, and you were able to build a library that resembles the first iteration of your vision. You then proceeded to “encourage” your team members to adopt this new “framework,” and they started using it in their new projects. Unfortunately, other concerns beckoned, and you had to pause development and work on something else.

Meanwhile, your teammates started to prepare feedback on this new library, but no guidance was forthcoming. Realizing they were on their own with regard to using this new library “to deliver quickly,” they proceeded to put in workarounds to get the new library to work the way it needs to in the real world.

The new library, which aimed to “save labor” and “deliver quickly,” ended up costing time in terms of training and workarounds (and you will hear comments like “it sounded like a good idea at the time”).

Remember: just because you can, doesn’t mean you should.

2018 Japan Trip (Day 7)

Shinjuku to HND

We woke up at 5:00am to get ready for our return to Manila. We took an Uber to the bus terminal (nobody wants to walk that distance again). Tokyo is designed for travelers and removes much of the friction of getting around.

ANA Flight from HND to MNL

Turbulent flight on the way back to Manila.

NAIA Terminal 3

I almost forgot how crowded it was at the arrival section: men wearing work-variety barong standing around, people selling real estate and cellphone load, and a duty free stall that did not make sense to me.