System Building Manifesto 

It’s hard for highly technical people not to dominate conversations about tech. But in the role of Engineering Manager it’s important not to do this: ownership should sit with the people doing the work, not their managers.

So how do you manage people with less experience than you and not become a dictator?

Something I’ve been working on with my teams lately is coming up with high-level guidelines to give them to work with, highlighting common pitfalls and encouraging the best practices that come from the experienced people in the organisation. Having a common understanding of what’s good or best helps people move in the right direction while giving them the freedom to design and build as they like, as long as the guidelines are not too specific and leave room for interpretation that may vary slightly with each team’s or engineer’s individual context.

For example, I would not give my teams a guideline of “Code Coverage >80%”. This is too specific; depending on the application a team is working on, they may be happy with 70 or even 60%, and that’s OK. A better way to phrase this, if coverage is important to you, would be “Teams should value and have high test coverage”.

Even this, though, is still too specific. If you have poor assertions, it doesn’t matter what % coverage you have, right? Code coverage has a higher purpose, and it alone does not serve that purpose, so it’s better to focus on the higher-level goals.

Code coverage, for me, is part of test automation, and the goal of test automation is to reduce bugs, production issues, and so on. Those, in my opinion, are the better things to focus on. From my example below:

Systems should have test automation that brings confidence and inspires courage in engineers

Where I mention test automation, I describe the behaviour I have seen specifically in high-performing teams. I’ve worked in teams where the “deploy” button is pressed with little regard for the impacts, because the engineers are confident in the pipelines, monitoring, and rollbacks that are in place. This, for me, is the high-level goal I want my engineers to strive for: real Continuous Delivery.

So here’s the full list I have in draft. Feel free to comment; I’ll do some follow-up posts with deeper dives into some of them.

I used the word “Manifesto” because when I showed them to another manager, that’s what he called it, and I thought it was cool 🙂

Guiding principles for Systems

  • Systems should be Domain Specific, responsible for one or few domains
  • Systems should be small in the overwhelming majority of cases. Small systems limit complexity
  • Systems should be consistent in design in the overwhelming majority of cases
  • Systems should be easy to build
  • Systems should have test automation that brings confidence and inspires courage in engineers
  • Systems should be easy to contribute to, not require extensive training
  • Systems should have Cross Cutting concerns addressed and shared in an easy and consistent way
  • Systems operate independently for the purpose of testing and debugging
  • Systems have consistent agreed upon telemetry for monitoring
    • Telemetry is a solved cross cutting concern for non-domain specific metrics
  • Systems are built on Modern up-to-date frameworks and platforms
  • Systems use Continuous Integration as a principle not a tool, merge and deploy often and in small increments
  • A System scales horizontally, both within a site and across multiple sites. With this comes redundancy: users experience zero downtime during instance and site outages
  • Systems have owners, who are responsible for the long-term health of the systems and who have contributors as customers

Performant .NET APIs

I’m going to conflate two topics here as they kind of go together to make the title. The first isn’t dotnet specific; it’s a general API principle.

Systems Should Own their Own data

If you can make this work, there are a lot of advantages. What does this mean in practice though?

It means that tables in your database should only ever be read and written by a single system, and there are a lot of pros to this. Essentially, the below is what I am recommending you AVOID.

How else to proceed though? There are several options that you may or may not be aware of. I’ll mention a few now but won’t go into specifics:

Backend for Frontend

Event Sourcing

Data Materialization

I have other blogs on these. But what you are asking is: what’s the benefit, right?

If you control data from within a closed system, it’s easier to control, and a pattern that becomes easy here is known as a “write-through cache”.

Most of you will be familiar with a “pull-through cache”; this is the most common caching pattern. The logic flows like this:

  1. Inbound request for X
  2. Check cache for key X
  3. If X found in Cache return
  4. Else Get X from Database, Update Cache, Return X

So on access we update the cache, and we set an expiry time of Y. Our data is usually stale by Y or less at any given time, unless the request hits the DB, in which case it’s fresh but slow.
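The pull-through flow above can be sketched like this (a minimal in-memory sketch; the `loadFromDb` delegate is a hypothetical stand-in for your real data access):

```csharp
using System;
using System.Collections.Concurrent;

class PullThroughCache
{
    private readonly ConcurrentDictionary<string, (string Value, DateTime Expiry)> _cache = new();
    private readonly TimeSpan _ttl;
    private readonly Func<string, string> _loadFromDb; // stand-in for the real DB call

    public PullThroughCache(TimeSpan ttl, Func<string, string> loadFromDb)
    {
        _ttl = ttl;
        _loadFromDb = loadFromDb;
    }

    public string Get(string key)
    {
        // 2-3. Check cache for key X; if found (and not expired) return it
        if (_cache.TryGetValue(key, out var entry) && entry.Expiry > DateTime.UtcNow)
            return entry.Value;

        // 4. Else get X from the database, update the cache, return X
        var value = _loadFromDb(key);
        _cache[key] = (value, DateTime.UtcNow + _ttl);
        return value;
    }
}
```

Note that the staleness window here is exactly the TTL passed in: between writes elsewhere and expiry here, readers see the old value.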

A write-through cache is easy to implement when the system reading is tightly coupled with the system writing (or, as I recommend, they are the same system).

In this scenario the logic works the same, with one difference, when writing we update the cache with the object we are writing, example:

  1. Inbound update for key X
  2. Update database for key X
  3. Update Cache for key X

This way every write forces a cache update and your cache is always fresh. How fresh it is in practice, though, depends on how we implement the cache; we could do this with a local or a remote cache.
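The write path above can be sketched the same way (again a minimal sketch; `saveToDb` stands in for the real database write):

```csharp
using System;
using System.Collections.Concurrent;

class WriteThroughCache
{
    private readonly ConcurrentDictionary<string, string> _cache = new();
    private readonly Action<string, string> _saveToDb; // stand-in for the real DB write

    public WriteThroughCache(Action<string, string> saveToDb) => _saveToDb = saveToDb;

    public void Update(string key, string value)
    {
        // 2. Update database for key X
        _saveToDb(key, value);
        // 3. Update cache for key X, so subsequent reads are always fresh
        _cache[key] = value;
    }

    public bool TryGet(string key, out string value) => _cache.TryGetValue(key, out value);
}
```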

For small datasets (1s or 10s of gigabytes in size) I recommend a local cache. But if we have a cluster of 3 servers, for example, how does this work? I generally recommend using a message bus: in the example above, step 3 sends a message, all APIs in a cluster subscribe to updates on this bus on startup, and use it to know when to re-request updates from the DB as update events occur, keeping the cache fresh. In my experience this sort of pattern leads to 1-4 seconds of cache freshness, depending on your scale (slightly more if geographically distributed).
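The bus-driven refresh can be sketched with a tiny in-memory bus standing in for whatever you actually run (Kafka, RabbitMQ, etc.); the bus and the `loadFromDb` delegate here are hypothetical stand-ins:

```csharp
using System;
using System.Collections.Generic;

// Tiny in-memory stand-in for a real message bus
class InMemoryBus
{
    private readonly List<Action<string>> _subscribers = new();
    public void Subscribe(Action<string> handler) => _subscribers.Add(handler);
    public void Publish(string key) { foreach (var s in _subscribers) s(key); }
}

// Each API node keeps a local cache and refreshes a key when an update event arrives
class CacheNode
{
    private readonly Dictionary<string, string> _cache = new();
    private readonly Func<string, string> _loadFromDb; // stand-in for the real DB read

    public CacheNode(InMemoryBus bus, Func<string, string> loadFromDb)
    {
        _loadFromDb = loadFromDb;
        // On startup, subscribe to update events and re-request the row from the DB
        bus.Subscribe(key => _cache[key] = _loadFromDb(key));
    }

    public string Get(string key) =>
        _cache.TryGetValue(key, out var v) ? v : (_cache[key] = _loadFromDb(key));
}
```

With a real bus the refresh is asynchronous, which is where the 1-4 seconds of staleness comes from.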

This isn’t dotnet specific either, but it leads me to my next point. Once you have 10-15 GB of cache in RAM, how do you handle it? How do you query it? Which brings me to the next part.

Working with big-RAM

I’m going to use an example where we used immutable collections to store the data. An update means rebuilding the whole collection in this case; we did this because the data updated infrequently. For more frequently updated data, DON’T do this.

We then used LINQ to query into them. Collections were 80 MB to 1.2 GB in size in RAM, and some of them had multiple keys to look up; this was the tricky bit.

The example Data we had was Geographic data (cities, states, points of interest, etc), and we had in the collections multiple languages, so lookups generally had a “Key” plus another “Language Id” to get the correct translation.

So the initial LINQ query for this was like:

_cityLanguage.FirstOrDefault(x => x.Key.KeyId == cityId && x.Key.LanguageId == languageId).Value;

The results of this are below

| Method                 | N     | Mean     | Error    | StdDev   |
|------------------------|-------|----------|----------|----------|
| LookupCityWithLanguage | 60000 | 929.3 ms | 17.79 ms | 17.48 ms |

You can see the mean response here is almost a second, which isn’t a nice user experience.

The next method we tried was to create a dictionary keyed on the two fields. To do this on a POCO you need to implement the Equals and GetHashCode methods so that the dictionary can hash and compare the keys, like below.

class LanguageKey
{
    public LanguageKey(int languageId, int keyId)
    {
        LanguageId = languageId;
        KeyId = keyId;
    }

    public int LanguageId { get; }
    public int KeyId { get; }

    public override bool Equals(object obj)
    {
        if (!(obj is LanguageKey)) return false;
        var o = (LanguageKey) obj;
        return o.KeyId == KeyId && o.LanguageId == LanguageId;
    }

    public override int GetHashCode()
    {
        return LanguageId.GetHashCode() ^ KeyId.GetHashCode();
    }

    public override string ToString()
    {
        return $"{LanguageId}:{KeyId}";
    }
}

So the code we end up with for the lookup is like this:

_cityLanguage[new LanguageKey(languageId,cityId)];

And the results are

| Method                 | N     | Mean     | Error   | StdDev   |
|------------------------|-------|----------|---------|----------|
| LookupCityWithLanguage | 60000 | 332.3 ns | 6.61 ns | 10.86 ns |

Now we can see we’ve gone from milliseconds to nanoseconds, a pretty big jump.

The next approach we tried was using a “Lookup” object to store the index. Below is the code to create the lookup and how to access it.

// in ctor warmup
lookup = _cityLanguage.ToLookup(x => new LanguageKey(x.Key.LanguageId, x.Key.KeyId));
// in method
lookup[new LanguageKey(languageId, cityId)].FirstOrDefault().Value;

And the results are similar

| Method                 | N     | Mean     | Error    | StdDev   |
|------------------------|-------|----------|----------|----------|
| LookupCityWithLanguage | 60000 | 386.3 ns | 17.79 ns | 51.89 ns |

At Agoda, though, we prefer to measure APIs by looking at the high percentiles (usually the P90 or P99). Below is a peek at how the API is handling responses.

It’s consistently below 4 ms at P90, which is a pretty good experience.

Overall the “write-through cache” approach is a winner for microservices, where it’s common for systems to own their own data.

NOTE: this testing was done on an API written in netcore 3.1; I’ll post updates on what it does when we upgrade to 6 🙂

How to Promote an Engineer

To understand how to promote we need to understand why we have titles, and what they are for.

Most modern tech companies (Amazon, etc.) have the IC (Individual Contributor) level track concept, so I will use this as a basis. It works in roughly these levels:

  • IC1 – Associate Engineer
  • IC2 – Engineer
  • IC3 – Senior Engineer
  • IC4 – Lead
  • Etc

Titles are important for recognising people’s achievements and for setting them targets to drive personal improvement. They also help with reconciliation of compensation, to make sure people are paid what they deserve, but I don’t personally believe compensation is their primary purpose.

They can also have a negative cultural impact if used in the wrong way. Sometimes people use their title to boss or lord over others, but my advice on this is that it’s not a problem with the title; it’s a problem with the person, and this is toxic behavior. If they can’t fix it, show them the door.

Titles are also used implicitly to set the expectations of others. When people work on a team with someone of a higher title than them, they should hopefully be inspired to be a better engineer, and in turn help drive their own career progression.

Usually titles come with a defined list of qualities that can be quite subjective and high level, like “Practices the latest in CI/CD technology”. While useful as a guide, these aren’t very actionable or objective things you can give to an engineer to do.

Many managers I talk to about career progression tend to look at goal setting as a method for engineers to prove themselves. I like goal setting and I do it a lot, but when it comes to career progression I think it’s a bit flawed, and I’ll explain why.

Some examples people used with me for goals for Engineers were:

  • Do a blog post
  • Lead a project to completion
  • Do a tech talk

Goals like this are fine, but if used for career progression you effectively create a checklist for a promotion, and after the engineer has done X, Y, Z on the list, we promote them. This doesn’t mean that after being promoted they will continue to do these things. Take the example of an engineer who is set the above three goals, does them in a quarter or two, gets promoted to senior engineer, then goes back to doing the same thing he did before. He’s not likely to inspire those around him who are doing the same job now but don’t have the title. In fact, it may even have a negative impact on the team.

And when you are asked by someone “why is he senior?”, is a response of “He did a design review and a tech talk 2 years ago” going to be a good answer?

So when are goals ok?

Goals, I believe, are good for the short term. They are good for pushing someone out of their comfort zone, giving them a taste of something, or defeating fear. A bit like young children and swimming: children are usually worried about getting wet and will cry and complain, but once you finally get them in the water it’s hard to get them out. Goal setting is good for pushing people out of their comfort zones, and also for giving people a taste of something new that they otherwise would never have tried, perhaps in the example of cross-training, or opening a conversation about new career paths.

But back to career progression

If we want people to be doing the above things, they should be self-motivated to do them, not doing them because they are led by the promotion carrot. So what we are after is more a change of mindset as opposed to a “to do” list, so that it becomes part of their day-to-day thinking.

How to change or measure people’s mindset?

You can’t measure that, I’ve found, but the best proxy I’ve found is the behavior people exhibit. The advantage of using behavior is that it is a day-to-day thing. The way people conduct themselves when dealing with others, specific to engineering scenarios, on a day-to-day basis is something you can set goals, or more so expectations, around.

Setting expectations of behavior is something that Ben Horowitz talks about; he wrote a blog a long time ago called “Good PM, Bad PM”, applying this to Product Managers in the 90s and 00s.

If we promote people based on the day-to-day behavior they exhibit, they are likely to continue this behavior as it is part of their routine. They are unlikely to degrade their behavior over time, and if more goals are set around improving behavior, they will most likely progress.

Taking the example of the “inspiration from the senior engineer on my team”: if we assume that behavior is consistent over time, then the question “why is he senior?” becomes easier to answer, in that he acts in fashion A, B, C on a day-to-day basis.

I have some examples of what I set, to try to explain the method:

  • A Senior Engineer identifies and helps with skill gaps in his immediate area, escalating when they are too much for him to handle. He is the guy that says in a stand up “hey Bob, you haven’t had much experience in system X how about you pick up that task today.” He encourages continuous improvement of the team in the day to day.

The above is an expectation around collaboration and system ownership, from my Senior Engineer expectations. You can see how it’s worded as a day-to-day behavior expectation around being a positive influence in the team.

The thing missing from this that’s present in the Horowitz article, though, is the “Bad PM”. Horowitz remarks on calling out explicit negatives in behavior as well. This is very useful for calling out common bad behaviors people pick up within the organization (or in the industry in general) and helping to correct them.

Here’s an example from my basic Engineer Expectations:

  • An Engineer tests. They employ automation to do so and understand when to use unit vs system vs integration testing. An engineer does not have a “tester” on their team whose responsibility it is to do the testing.

This is a common pitfall in the industry, especially among older engineers who used to work on teams where they did have “testers”. Engineers like this, when any form of quality role is attached to their team, think they have a “tester”, and this is very bad not only for cross-functional teams but also for the correct use of automation. So by calling out this negative behavior we help to correct it by setting the expectations.

Be careful though: the expectations I have here are specific to my context. Not everyone should have the same expectations; there will be things unique to your company, team, etc. that mean they should change. From the example above, maybe you do have “testers” on your team, and that is OK for you.

In closing, I would recommend trying to set “behavior expectations” around your career levels as a method to drive the right change in your staff for promotions.

Build System integration with Environment Variables

Different CI systems expose an array of information in environment variables for you to access, for example the commit hash, branch, etc., which is handy if you are writing CI tooling. Some of them even seek to standardize these conventions.

This post is primarily about collating that info into a single source for lookup. Ideally, if you are writing tooling that you want a lot of people to use, you should support multiple CI systems to increase adoption.

The first thing we need to do is tell which system is running. Each CI platform has a convention to allow you to do this, which we’ll talk about first.

Below is a table of each major build system and example bash of how to detect that the process is running in it, as well as a link to documentation on the env vars the system exposes.

| Build system   | Detection (bash)                                                              |
|----------------|-------------------------------------------------------------------------------|
| Jenkins        | `"$JENKINS_URL" != ""`                                                         |
| Travis         | `"$CI" = "true"` and `"$TRAVIS" = "true"`                                      |
| AWS CodeBuild  | `"$CODEBUILD_CI" = "true"`                                                     |
| TeamCity       | `"$TEAMCITY_VERSION" != ""`                                                    |
| Circle CI      | `"$CI" = "true"` and `"$CIRCLECI" = "true"`                                    |
| Semaphore CI   | `"$CI" = "true"` and `"$SEMAPHORE" = "true"`                                   |
| Drone CI       | `"$CI" = "drone"` and `"$DRONE" = "true"`                                      |
| Heroku         | `"$CI" = "true"`                                                               |
| Appveyor CI    | `"$CI" = "true"` or `"$CI" = "True"`; `"$APPVEYOR" = "true"` or `"$APPVEYOR" = "True"` |
| GitLab CI      | `"$GITLAB_CI" != ""`                                                           |
| GitHub Actions | `"$GITHUB_ACTIONS" != ""`                                                      |
| Bitbucket      | `"$CI" = "true"`                                                               |
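Pulling a few of those detection checks together, a helper might look like this (a sketch covering a handful of the systems listed; extend it with the rest as needed):

```shell
#!/bin/sh
# Print the name of the CI system we appear to be running in, or "unknown".
detect_ci() {
  if [ -n "$JENKINS_URL" ]; then echo "jenkins"
  elif [ "$TRAVIS" = "true" ]; then echo "travis"
  elif [ -n "$TEAMCITY_VERSION" ]; then echo "teamcity"
  elif [ "$CIRCLECI" = "true" ]; then echo "circleci"
  elif [ -n "$GITHUB_ACTIONS" ]; then echo "github-actions"
  elif [ -n "$GITLAB_CI" ]; then echo "gitlab"
  else echo "unknown"
  fi
}

detect_ci
```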

Below are 4 commonly used parameters as an example. There are many more available, but as you can see from this list there is a lot of commonality.

| Build system   | Branch     | Commit     | PR #             | Build ID      |
|----------------|------------|------------|------------------|---------------|
| GitHub Actions | GITHUB_REF | GITHUB_SHA | can get from ref | GITHUB_RUN_ID |

For TeamCity, a common workaround for its lack of env vars is to place a root-level set of parameters that will be inherited by every config on the server:


env.TEAMCITY_BUILD_URL = %teamcity.serverUrl%/viewLog.html?

The `set-env` command is disabled in GitHub Actions

Recent security updates in GitHub Actions prevented you from setting environment variables with the `set-env` workflow command, but there is a pretty easy workaround that I am going to show.

Here’s an example command where we set a version number (APP_VERSION in this case is the name of the env variable) that will be used in various subsequent steps:

run: echo "::set-env name=APP_VERSION::${{ env.MAJOR_MINOR_VERSION }}${{ github.run_number }}"        

We can instead append a string like this to the file referenced by the GITHUB_ENV variable:

run: echo "APP_VERSION=${{ env.MAJOR_MINOR_VERSION }}${{ github.run_number }}" >> $GITHUB_ENV

Later we can use it the same way we used environment variables before, as seen below, where we build a docker container using the version number in a subsequent step:

run: docker build -t mydockerreg.internal/${{ env.APP_NAME }}:$APP_VERSION -f ${{ env.DOCKER_FILE_PATH }} .

It’s that easy.

If you are running on-prem it’s tempting to just re-enable the command, but if you do, and one day you want to scale your build workload into GitHub’s cloud, then you’ll have problems; better to be compliant imo.

The full details about this from GitHub are here

Building dotnet in Containers

A lot of people use the dotnet CLI to build their projects, and it’s a very handy tool, but there are a lot of versions of dotnet out there these days, and maintaining build agents in your CI with all the right versions can be troublesome.

That’s where containers can help you and make sure you are building with the right SDK.

I’ll use a recent example from a project we did in GitHub Actions.

In this build we run:

dotnet restore mySolution.sln
dotnet build mySolution.sln --configuration Debug --no-restore
dotnet test --no-build

In this project we are using the dotnet 3.1 SDK.

So to run these in docker we can simply use a docker run with the SDK container:

docker run -v ${PWD}:/src --workdir /src mcr.microsoft.com/dotnet/core/sdk:3.1 dotnet build mySolution.sln

This will mount the current working directory into the container and use the sdk inside to run the build.

This isn’t the best approach though: the nuget cache for the container will be lost at the end of the build, as it’s stored in user scope.

A better approach is to use a multi-stage dockerfile, which takes advantage of caching in the docker layers. You can add dockerfiles from the Visual Studio IDE that have things preconfigured for good performance for you.

Adding a docker file using visual studio

The basic dockerfile looks like the below:

FROM mcr.microsoft.com/dotnet/core/aspnet:3.1 AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS builder
WORKDIR /src
COPY *.sln ./
COPY ["Agoda.Api.WebApi/Agoda.Api.WebApi.csproj", "Agoda.Api.WebApi/"]
COPY ["Agoda.Api.Core/Agoda.Api.Core.csproj", "Agoda.Api.Core/"]
COPY docker-compose.dcproj ./
RUN dotnet restore "Agoda.Api.WebApi/Agoda.Api.WebApi.csproj"
COPY . .
WORKDIR "/src/Agoda.Api.WebApi"
RUN dotnet build "Agoda.Api.WebApi.csproj" -c Release -o /app

FROM builder AS publish
RUN dotnet publish "Agoda.Api.WebApi.csproj" -c Release -o /app

FROM base AS final
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "Agoda.Api.WebApi.dll"]

There’s a couple of points to note here.

Firstly, you can see the files are copied into the container to build: project files first, with restore done separately. The reason for this is that we create a layer with the csproj files, so that this layer only rebuilds if the project files change (i.e. generally when you update your dependencies). So this layer will remain cached on your local machine or on the CI agent, and will not rebuild unless you update the csproj.

The second thing to note is that there are multiple FROM statements. This is because it is a multi-stage dockerfile: it has a large builder container that starts with the SDK, and a smaller base container that the output of the build is copied into, in order to have a small output container for production.

So to build this one we now simply run:

docker build . -f Agoda.Api.WebApi/Dockerfile -t agodaapi:1.0

This should be run from the root dir of your solution, as Visual Studio puts the dockerfile in the sub folder.

You can then run a docker push to publish your newly built container.

You can use a similar approach with a dockerfile for your unit test project, but in that case you don’t need a multi-stage build; you just need a normal dockerfile that uses the SDK image and runs “dotnet test” as the entrypoint.
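A sketch of such a test dockerfile might look like this (the solution layout is an assumption; adjust paths for your project):

```dockerfile
# Test image: uses the full SDK, no multi-stage needed
FROM mcr.microsoft.com/dotnet/core/sdk:3.1
WORKDIR /src
COPY . .
RUN dotnet restore
ENTRYPOINT ["dotnet", "test", "--no-restore"]
```

Building the image restores dependencies; running the container runs the tests.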

My Arcade Machine

So people have been seeing this in my video feed on calls the last few days so I thought I’d write a post about it.

A few years ago I found a guy in Thailand through Facebook who builds replica arcade machines, and ordered my favorite game from the 80s, called Gauntlet. This was one of the first 4-player machines, and it’s infamous because it has infinite levels.


Internally, though, it’s not like the original; just the outside is the same as the original, including the descriptions and text of the characters you can play in the game.

Apparently a lot of the artwork from these old games is available online, and he was able to download it, print vinyl stickers, and make it look just like the original, but with a few additions: the white buttons were not on the original, and the original didn’t have 6 buttons either. The extra buttons are there to support other games.

Also, the coin slots are different from the original, which had 4 coin slots (one dedicated to each player), as described in the instructions seen printed below.


The 3rd and 4th players only have 4 buttons, because there are no 3- or 4-player arcade games that have 6 buttons.

The original screen was a 19″ CRT, but the screen inside is a 24″ LCD he was able to fit while still maintaining the original dimensions.

Which leads me to what’s on the inside.


Inside there is a PC with MAME and a few other emulators for old consoles, about 20,000 games in total, all legal ones that are out of copyright of course 🙂



If you want to know where to contact this guy to build you one: unfortunately he has disappeared, his Facebook page deleted and his phone disconnected. I haven’t been to his house to check it out because I’ve lost the address too. But restoring these old things is a common profession, so I’d do some searching and I’m sure you’ll turn up someone.


From Code Standards to Automation with Analyzers and Code Fixes

We started to talk about standards between teams and system owners at Agoda a couple of years ago. We first started out on C#; the idea was to come up with a list of recommendations for developers, for a few reasons.

One was that we follow polyglot programming here, and we would sometimes get developers more familiar with Java, JavaScript, and other languages working on the C# projects, who would often misuse or get lost in some of the features (e.g. when JavaScript developers find dynamics in C#, or Java developers get confused by extension methods).

Beyond this, we wanted to encourage a level of good practice: drawing on the knowledge of the people we have, we could advise on potential pitfalls and drawbacks of certain paths. In short, the standards should not be prescriptive, as in “you must do it this way”. They should be more “Don’t do this”, but also teach at the same time, as in “Don’t do this, here’s why, and here’s some example code”. They also include some guidance, as in “We recommend this way, or this, or this, or even this, depending on your context”, but we avoid “do this”.


The output was the standards repository that we’ve now open sourced.

It’s primarily markdown documents, which allow us to easily document, and also to use pull requests and issues to start conversations around changes and evolve.

But we had an “If you build it they will come” problem. We had standards, but people either couldn’t find them or didn’t read them, and even if they did, they’d probably forget half of them within a week.

So the question was how do you go about implementing standards amongst 150 core developers and hundreds more casual contributors in the organisation?

We turned to static code analysis first; the Roslyn API in C# is one of the most mature language APIs on the market imo, and we were able to write rules for most of the standards (especially the “Don’t do this” ones).

This gave birth to a cross-department effort that resulted in a code fix library that we like to call Agoda Analyzers.


Initially we were able to add them into the project via the published nuget package and have them present in the IDE, and they are available here.


But like most linting systems they tend to just “error” at the developer without much information, which we don’t find very helpful, so we quickly moved to Sonarqube with its GitHub integration.

This allows a good experience for code reviewers and contributors. The contributor gets inline comments on their pull request from our bot.


This allows the user time to fix issues before they get to the code reviewers, so most common mistakes are fixed prior to code review.

Also, the link (the 3 dots at the end of the comment) deep-links into Sonarqube’s web UI, to documentation that we write on each rule.


This allows for not just “Don’t do this”, but also to achieve “Don’t do this, here’s why, and here’s some example code”.

Static code analysis is not a silver bullet though. Things like design patterns are hard to encompass, but we find that you can catch most of the crazy stuff with it, leaving the code review to be more about design and less about naming conventions and “wtf” moments from code reviewers, like reviewing the time a node developer finds dynamics in C# for the first time and decides to have some fun.

We are also trying the same approach with other languages internally; TypeScript and Scala are the two other main languages we work in, so stay tuned for more on this.




The Transitive Dependency Dilemma

It sounds like the title of a Big Bang Theory episode, but it’s not. Instead, it’s an all-too-common problem that breaks the single responsibility rule, and I see it regularly.

And I’ve seen first hand how much pain this can cause: a friend of mine (Tomz) spent 6 weeks trying to update a library in a website after the library had a breaking change.

The root cause of this issue comes from the following scenario: my project depends on MyLibrary1 and MyLibrary2, and both libraries depend on the same 3rd-party library (or it could be an internal one too).


Now let’s take the example where each library refers to a different version.


Now which version do we use? You can solve this issue in dotnet with assembly binding redirects, and nuget even does this for you (other languages have similar approaches too). However, when there is a major version bump with breaking changes, it doesn’t work.

If you take the example as well of having multiple libraries (in my friend Tomz’s case there were about 30 libraries that depended on the logging library that had a major version bump), this can get real messy real fast, and a logging library is usually the most common scenario. So let’s use this as the example.



So what can we change to handle this better?

I point you to the single responsibility principle. In the above scenario, MyLibrary1 and 2 are now technically responsible for logging, because they have a direct dependency on the logging library, and this is where the problem lies. They should only be responsible for “My thing 1” and “My thing 2”, and leave the logging to another library.

There are two approaches to solve this that I can recommend, each with their own flaws.

The first is exposing an interface that you can implement in MyProject.
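A sketch of what that interface approach looks like (the names here are illustrative, not from any real library):

```csharp
using System;

// Defined inside MyLibrary1: the library depends only on this interface,
// not on any concrete logging package.
public interface ILibraryLogger
{
    void Info(string message);
    void Error(string message, Exception ex);
}

// MyLibrary1 code logs through the interface it owns
public class MyThing1
{
    private readonly ILibraryLogger _log;
    public MyThing1(ILibraryLogger log) => _log = log;

    public int DoWork(int x)
    {
        _log.Info($"Doing work with {x}");
        return x * 2;
    }
}

// Implemented in MyProject, adapting the interface to whatever logging
// library (and version) the project actually uses.
public class ConsoleLoggerAdapter : ILibraryLogger
{
    public void Info(string message) => Console.WriteLine($"INFO: {message}");
    public void Error(string message, Exception ex) => Console.WriteLine($"ERROR: {message}: {ex}");
}
```

Only MyProject now references a concrete logging library, so a major version bump in that library touches one place instead of thirty.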


This also helps if you want to reuse your library in other projects; you won’t be dictating to the project how to log.

The problem with this approach, though, is that you end up implementing a lot of ILoggers in the parent.

The other approach is to use a shared common interfaces library.


The problem with this approach, however, is that when you need to update the common interfaces library with a breaking change, it becomes near impossible, because you end up with most of your company depending on a single library. So I prefer the former approach.