Build System integration with Environment Variables

Different CI systems expose a variety of an array of information in environment variables for you to access, for example commit hash, branch, etc which is handy if you are writing CI tooling. Some of them even seek to standardize these conventions.

This post is primarily about collating that info into a single source for lookup. Ideally if you are writing tooling that you want a lot of people use you should support multiple CI systems to increase adoption.

As we look at each the first thing we need to do is tell which system is running, each CI platform has a convention to allow you to do this that we’ll talk about first

Below is a table of each Major build system and example bash of how to detect that the process is running in them, as a well as link to documentation on Env Vars that the system exposes.

Jenkins“$JENKINS_URL” != “”
Travis“$CI” = “true”
“$TRAVIS” = “true”
AWS Codebuild“$CODEBUILD_CI” = “true”
Teamcity“$TEAMCITY_VERSION” != “”
Circle CI“$CI” = “true”
“$CIRCLECI” = “true”
Semaphore CI“$CI” = “true”
“$SEMAPHORE” = “true”
Drone CI“$CI” = “drone”
“$DRONE” = “true”
Heroku“$CI” = “true”
Appveyor CI“$CI” = “true” || “$CI” = “True”
“$APPVEYOR” = “true” || “$APPVEYOR” = “True”
Gitlab CI“$GITLAB_CI” != “”
Github Actions“$GITHUB_ACTIONS” != “”
Bitbucket“$CI” = “true”

Below is 4 commonly used Parameters as an example, there are much more available, but as you can see form this list there is a lot of commonality.

Build SystembranchcommitPR #Build ID
Github Actions GITHUB_REFGITHUB_SHAcan get from RefGITHUB_RUN_ID

For Teamcity a common work around to it’s lack of env vars is to place a root level set of parameters that will inherit to every config on the server


env.TEAMCITY_BUILD_URL = %teamcity.serverUrl%/viewLog.html?

The `set-env` command is disabled in GitHub Actions

Recent security updates in GitHub actions prevented you from using the Environment variables, but there is a pretty easy work around that i am going to show.

Here’s an example command where we set a version number (APP_VERSION in this case is the name of the env variable) that will be used in various subsequent steps

run: echo "::set-env name=APP_VERSION::${{ env.MAJOR_MINOR_VERSION }}${{ github.run_number }}"        

We can instead pipe a string like this this to the GITHUB_ENV variable

run: echo "APP_VERSION=${{ env.MAJOR_MINOR_VERSION }}${{ github.run_number }}" >> $GITHUB_ENV

Later we can use it the same way we did environment variables before as seen below e we build a docker container using the version number in a subsequent step

run: docker build -t mydockerreg.internal/${{ env.APP_NAME }}:$APP_VERSION -f ${{ env.DOCKER_FILE_PATH }} .

It’s that easy.

If you are running on prem it’s tempting to just enable environment variables, but if you do and one day you want to scale your build workload into GitHub cloud then you’ll have problems, better to be complaint imo.

The full details about this from GitHub are here

Building dotnet in Containers

A lot of people use the dotnet cli to build their projects, and its a very handy tool, but there’s a lot of version of dotnet out there these days, and maintaining build agents in your CI with all the right version can be troublesome.

So that’s where containers can help you, and make sure you are building with the right sdk.

I’ll use a recent example from a project we did in github acitons.

in this build we run

dotnet restore mySolution.sln
dotnet build mySolution.sln --configuration Debug --no-restore
dotnet test --no-build

This project in question we are using dotnet 3.1 sdk.

So to run these in docker we can simple use a docker run with the sdk container

docker run -v ${PWD}:/scr --workdir /src dotnet build mysolution.sln

This will mount the current working directory into the container and use the sdk inside to run the build.

This isn’t the best approach though, the nuget cache for the container will be lost at the end of the build as its stored in user scope.

A better approach is to use a multi-staged docker file, then it will take advantage of cache in the docker chunks. You can add docker files from teh visual studio IDE that have things preconfigured for good performance for you.

Adding a docker file using visual studio

The basic docker file looks like the below

FROM AS base

FROM AS builder
COPY *.sln ./
COPY ["Agoda.Api.WebApi/Agoda.Api.WebApi.csproj", "Agoda.Api.WebApi/"]
COPY ["Agoda.Api.Core/Agoda.Api.Core.csproj", "Agoda.Api.Core/"]
COPY docker-compose.dcproj ./
RUN dotnet restore "Agoda.Api.WebApi/Agoda.Api.WebApi.csproj"
COPY . .
WORKDIR "/src/Agoda.Api.WebApi"
RUN dotnet build "Agoda.Api.WebApi.csproj" -c Release -o /app

FROM build AS publish
RUN dotnet publish "Agoda.Api.WebApi.csproj" -c Release -o /app

FROM base AS final
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "Agoda.Api.WebApi.dll"]

There’s a couple of points to note here.

Firstly you can see the files are copied into the container to build, project files first, and restore is done separately. The reason for this is we create a chunk with the csproj files so that this chunk will only rebuild if the project files change (i.e. this is generally when you update your dependencies), so this chunk will remain cached on your local or on the CI agent and not rebuild unless you update the csproj.

The second thing to note is that there is multiple FROM statements, this is because it is a multi stage docker file, so it has a large builder container that starts with the SDK, and a smaller base container that the output of the build is copied to in order to have a small output container for production.

So to build this one we simply now run:

docker build . -f Agoda.Api.WebApi/Dockerfile -t agodaapi:1.0

This should be run from the root dir of your solution, and Visual Studio puts the docker file in the sub folder.

You can then run a docker push to publish your newly built container.

You can use the similar approach by using a dockerfile with your unit test project, but in that case you don’t need a multistage, you just need a normal docker file that use sdk image and runs “dotnet test” as the entrypoint.

My Arcade Machine

So people have been seeing this in my video feed on calls the last few days so I thought I’d write a post about it.

A few years ago I found a guy in Thailand through Facebook that builds replica Arcade machines. And ordered my favorite game from the 80s called Gauntlet, this was one of the first 4 player machines, and its infamous because it has infinite levels.


Internally though, its not like the original, just the outside is the same as the original, include the descriptions and text of the characters you can play in the game.

Apparently a lot of the artwork from these old games is available online and he was able to download it and print vynal stickers and make it look just like the original. But with a few additions, the white buttons were not on the original, and also the original didn’t have 6 buttons. The addition of the buttons is to support other games.

Also the coin slots are different from the original which had 4 coins (one dedicated to each player), as describe on the instructions seen printed below.

2020-03-31 17.24.59

The 3rd and 4th players only have only 4 buttons because there is no 3 or 4 play arcade games that have 6 buttons.

Original screen was 19″ CRT but there screen inside is 24″ LCD he was able to fit while still maintaining the original dimensions.

Which leads me to whats on the inside.

2020-03-31 14.06.13

Inside there is a PC with MAME and a few other emulators for old consoles. about 20,000 games in total, all legal ones that are out of copy write of course 🙂



If you are wanting to know where to contact this guy to build you one unfortunately he has disappeared, his Facebook page deleted and his phone disconnected, I haven’t been to his house to check it out because I’ve lost the address too. But its a common profession restoring these old things so I’d do some searching and I’m sure you’ll turn up someone.


From Code Standards to Automation with Analyzers and Code Fixes

We started to talk about standards between teams and system owners at Agoda a couple of years ago. We first started out on C#, the Idea was to come up with a list of recommendations for developers, for a few reasons.

One was we follow polyglot programming here and we would sometimes get developers more familiar with Java, JavaScript and other languages that would be working on the C# projects and would often misused or get lost in some of the features (e.g. When JavaScript developers find dynamics in C#, or Java Developer get confused with Extension Methods).

Beyond this we want to encourage a level of good practice, drawing on the knowledge of the people we have we could advise on potential pit falls and drawbacks of certain paths. In short the standards should not be prescriptive ones, as in “you must do it this way”, they should be more “Don’t do this”, but also teach at the same time, as in “Don’t do this, here’s why, and here’s some example code”. But also includes some guidance as well, as in “We recommend this way, or this, or this, or even this, depending on your context”, but we avoid “do this”.


The output was the standards repository that we’ve now open sourced

It’s primarily markdown documents that allow us to easily document, and also use pull requests and issues to start conversation around changes and evolve.

But we had a “If you build it they will come” problem. We had standards, but people either couldn’t find them, didn’t read them, and even if they did, they’ll probably forget half of them within a week.

So the question was how do you go about implementing standards amongst 150 core developers and hundreds more casual contributors in the organisation?

We turned to static code analysis first, the Roslyn API in C# is one of the most mature Language Apis on the market imo. And we were able to write rules for most of the standards (especially the “Don’t do this” ones).

This gave birth to a cross department effort that resulted in a Code fix library here that we like to call Agoda Analyzers.


Initially we were able to add them into the project via the published nuget package and have them present in the IDE, and they are available here.


but like most linting systems they tend to just “error” at the developer without much information, which we don’t find very helpful, so we quickly moved to Sonarqube with it’s github integration.

This allows a good experience for code reviewers and contributors. The contributor get’s inline comments on their pull request from our bot.


This allows the user time to fix issues before they get to the code reviewers, so most common mistakes are fixed prior to code review.

Also the link (the 3 dots at the end of the comment) Deep links into sonarqube’s WebUI to documentation that we write on each rule.


This allows for not just “Don’t do this”, but also to achieve “Don’t do this, here’s why, and here’s some example code”.

Static code analysis is not a silver bullet though, things like design patterns are hard to encompass, but we find that most of the crazy stuff you can catch with it, leaving the code review to be more about design and less about naming conventions and “wtf” moments from code reviewers when reviewing the time a node developer finds dynamics in C# for the first time and decides to have some fun.

We are also trying the same approach with other languages internally, such as TypeScript and Scala are our two other main languages we work in, so stay tuned for more on this.




The Transitive Dependency Dilemma

It sounds like the title of a Big Bang Theory episode, but its not, instead is an all to common problem that breaks the single responsibility rule that I see regularly.

And I’ve seen first hand how much butt hurt this can cause, a friend of mine (Tomz) spent 6 weeks trying to update a library in a website, the library having a breaking change.

The root cause of this issue comes from the following scenario, My Project depends on my Library 1 and 2. My Libraries both depend on the same 3rd party library (or it could be an internal one too).


Now lets take the example that each library refers to a different version.


Now which version do we use, you can solve this issue in dotnet with assembly binding redirects, and nuget even does this for you (other languages have similar approaches too). However, when there is a major version bump and breaking changes it doesn’t work.

If you take the example as well of having multiple libraries (In my friend tomz case there was about 30 libraries that depended on the logging library that had a major version bump) this can get real messy real fast, and a logging library is usually the most common scenario. So lets use this as the example.



So what can we change to handle this better?

I point you to the single responsibility principle. In the above scenario, MyLibrary1 and 2 are now technically responsible for logging because they have a direct dependency on the logging library, this is where the problem lies. They should only be responsible for “My thing 1” and “My thing 2”, and leave the logging to another library.

There is two approaches to solve this that i can recommend, each with their own flaws.

The first is exposing an interface that you can implement in MyProject,


This also helps with if you want to reuse you library in other projects, you wont be dictating to the project how to log.

The problem with this approach though is that you end up implementing a lot of ILoggers in the parent

The other approach is to use a shared common Interfaces library.


The problem with this approach however is when you need to update the Common interfaces library with a breaking change it becomes near impossible, because you end up with most of your company depending on a single library. So I prefer the former approach.

Your Load Balancer Will Kill You

Let’s start by talking about the traditional way people scale applications.

You have a startup, you have a new idea, so you throw something out there fast, maybe it’s a Rails app with a mongoDB backend if you’re unlucky, or something like that

Now thing are going pretty good, maybe you have time and you re-write into something sensible at this point as you business grows and you get more devs, so now you’re on a nice react website with dotnet or node backend or something. But your going slow due to too many users, so you start to scale, first thing people do is this, load balancer in front, horizontally scale the web layer

Now that doesn’t seem too bad, with a few cache optimizations you’re probably handling a few thousands users simultaneous and felling happy. But you keep growing and now you need to handle tens of thousands, so the architecture starts to break out vertically.

So lets imagine something like the below, and if we’ve been good little vegemites we have a good separation of domains, so are able to scale the db by separating the domains out into microservices on the backend with their own independent data.

Our web site then ends up looking a bit like a BFF (Backend For Frontend), and we scale nicely and are able to start to scale up to tens or into the hundreds of thousands of users. And if you are using AWS especially you are going have these lovely Elastically Scaling Load balancers everywhere.

Now when everything is working its fine, but let’s look at a failure scenario.

One of the API B server’s goes offline, like dead, total hardware failure. What happens in the seconds that follow.

To start, let’s look at load balancer redundancy methods, LBs use a health-check endpoint, an aggressive setting would be to ping it every 2 seconds, then after 2 consecutive failures failures take the node offline.

Let’s also take the example we are getting 1,000 requests per second from our BFF.

Second 1
Lose 333 Requests

Second 2
Lose 333 Requests
Health check fails first time

Second 3
Lose 333 Requests

Second 4
Lose 333 Requests
Health check fails second time and LB stops sending traffic to node

So in this scenario we’ve lost about 1300 requests, but we’ve recovered.

Now you say, but how about we get more aggressive with the health check? This only goes so far.

At scale, the more common outage are not ones where things going totally offline (although this does happen), they are ones where things go “slow”.

Now imagine we have aggressive LB health checks (the ones above are already very aggressive so you cant get much more usually), and things are going “slow” to the point health checks are randomly timing out, you’ll start to see nodes pop on and offline randomly, your load will get unevenly distributed to the point usually where you may even have periods of no nodes online, 503s FTW!. I’ve witnessed this first hand, it happens with agressive health checks 🙂

Next is, what happens if your load balancer goes offline? While load balancers are generally very reliable, things like config updates and firmware updates are times when they most commonly fail, but even then, they still can succumb to hardware failure.

If you are running in e-commerce like I have been for the last 15 odd years then traffic is money, every bit of traffic you lose can potentially be costing you money.

Also when you start to get into very large scale, the natural entropy on hardware means hardware failure becomes more common. For example, if you have say 5,000 physical server in your cloud, how often will you have a failure that takes applications offline.

And it doesn’t matter if you are running AWS cloud, kubernetes, etc hardware failure still takes things offline, your VMs and containers may restart with little to no data loss, but they still go offline for periods.

How do we deal with this then? How about Client-side Weighted round-robin?

WTF is that? I hear you say. Good Question!

It’s were we move the load balancing mechanism to the client that is calling the backend. There is several advantages to doing this.

This is usually coupled with a service discovery system, we use consul these days, but there is lots out there.

The basic concept is that the client get a list of all available nodes for a given service. They will the maintain they own in memory list of them and round robin through them similar to a load balancer.

This removes infrastructure (i.e. cost and complexity)

The big differences comes though that the client can retry the request on a different node. You can implement retries when you have a load balancer in front of you, but you are in effect rolling the dice, having the knowledge of the endpoint on the client side means that the retries can be targeted at a different server to the one that errored, or timed out.

What’s the Weighting part though?

Each client maintains its own list of failures, so for example if a client got a 500 or timeout from a node, it would weight him down and start to call him less, this cause a node specific back off, which is extremely important in the more common outage at scale of its “slow”, so if a particular node has been smashed a bit too much by something and is overloaded the clients will slow back off that guy.

Let’s look at the same scenario as before with API B and a node going offline. We usually set our timeouts to 500ms to 1 second for our API requests, so let’s say 1 second, as the requests start to fail they will retry on the next node in the list, and weight down the offline server in the local clients list of servers/weighting, so here’s what it looks like:

Second 1
220 Requests take 1 second longer

Second 2
60 Requests take 1 second longer

Second 3
3 Requests take 1 second longer

Second 4
3 Requests take 1 second longer

Second 5

The Round robin weighting kicks in at the first failure, as we only have 3 web servers in this scenario and they are high volume the back-off isn’t decremented in periods of seconds its in number of requests.

Eventually we get to the point that we are trying the API once every few seconds with a request from each web server until he comes back online, or until the service discover system kicks in and tells us he’s dead (which usually takes under 10 seconds)

But the end result is 0 lost requests.

And this is why I don’t use load balancers any more 🙂

Importing Custom TypeScript tslint rules into Sonarqube

I’ll be the first to say I am not a fan of sonarqube, but is the only tool out there that can do the job we need. Getting TypeScript working with it was royal butt hurt, but we got there in the end so I wanted to share our journey.

The best way we found to work with it was to store our rules in our tslint config in source control with our own settings and use it as, this is good because it’ll help keep the sonarqube server rules in sync with the developers.

The problem we run into is that the rules need to exist on the server, so if you for example add the react-tslint rules to your project they also need to be defined in the sonarqube server here



Once they are there sonar understand the rules, but will not process them, but rather than setting up the processing on the server we decide to use our build.

So what we do is

  1. Import ALL rules to sonar server (once off)
  2. run tslint and export failed rules to file
  3. import failed rules using sonar runner (instead of letting runner do analysis)

The server is aware of ALL rules, but its our tslint output that tells it which ones have failed, so you can disable rules in your tsling config that the server is aware of and it won’t report them.

This then means that the local developer experience and the sonarqube report should be a lot more in sync than having to maintain the server processing, and means it easier to run multiple project on the one server with disparate rule sets.

The hard part here though is the import of rules

For our initial import we did the follow rule sets:

  1. react-tslint
  2. tslint-eslint-rules
  3. tslint-consistent-codestyle
  4. tslint-microsoft-contrib

And I have created some powershell scripts that generates the format that is needed from the rules git repos.

To use this clone each of the above repos, then run its corresponding script to generate the output file, then copy and paste this into the section in the sonarqube admin page (it’s ok, this is a once off step).

[gist /]

you should create one record below for each of the four imports, then paste the output from each powershell script into the boxes on the right, as seen below.


Once this is done you need to restart the sonarqube server for the rules to get picked up

WARNING: check for duplicate rule names, there is some (I forgot which ones sorry) and they prevent the sonarqube server from starting and you will need to edit the SQL database to fix it.

Then browse to your rule set and active the rules into it. I recommend just creating a single rule set and put everything in it, like i said you can control the rules from your tslint run, and just add all rules to all projects on the sonarqube server side.


After this run your sonarqube analysis build (see here if you haven’t built it yet ) and you are away.


Sonarqube with a MultiLanguage Project, TypeScript and dotnet

Sonarqube is a cool tool, but getting multiple languages to work with it can be hard, especially because each language has its own plugin maintained by different people most of the time, so the implementations are different, so for each language you need to learn a new sonar plugin.

In our example we have a frontend project using React/Typescript and dotnet for the backend.

For C# we use the standard ootb rules from microsoft, plus some of our own custom rules.

For typescript we follow a lot of recommendations from AirBnB but have some of our own tweaks to it.

In the example I am using an end to end build in series, but in reality we use build chains to speed things up so our actual solution is quite more complex than this.

So the build steps look something like this

  1. dotnet restore
  2. Dotnet test, bootstrapped with dotcover
  3. Yarn install
  4. tslint
  5. yarn test
  6. Sonarqube runner

Note: In this setup we do not get the Build Test stats in Teamcity though, so we cannot block builds for test coverage metrics.

So lets cover the dotnet side first, I mentioned our custom rules, I’ll do a separate blog post about getting them into sonar and just cover the build setup in this post.

with the dotnet restore setup is pretty simple, we do use a custom nuget.config file for our internal nuget server, i would recommend always using a custom nuget config file, your IDEs will pick this up and use its settings.

dotnet restore\nuget.config MyCompany.MyProject.sln

The dotnet test step is a little tricky, we need to boot strap it with dotcover.exe, using the analyse command and output HTML format that sonar will consume (yes, sonar wants the HTML format).

%teamcity.tool.JetBrains.dotCover.CommandLineTools.DEFAULT%\dotcover.exe analyse /TargetExecutable="C:\Program Files\dotnet\dotnet.exe" /TargetArguments="test MyCompany.MyProject.sln" /AttributeFilters="+:MyCompany.MyProject.*" /Output="dotCover.htm" /ReportType="HTML" /TargetWorkingDir=.

echo "this is working"

Lastly sometimes the error code on failing tests is non zero, this causes the build to fail, so by putting the second echo line here it mitigates this.

Typescript We have 3 steps.

yarn install, which just call that exact command

Out tslint step is a command line step below, again we need to use a second echo step because when there is linting errors it returns a non zero exit code and we need to process to still continue.

node ".\node_modules\tslint\bin\tslint" -o issues.json -p "tsconfig.json" -t json -c "tslint.json" -e **/*.spec.tsx -e **/*.spec.ts
echo "this is working"

This will generate an lcov report, now i need to put a disclaimer here, lcov has a problem where it only reports coverage on the files that where executed during the test, so if you have code that is never touched by tests they will not appear on your lcov report, sonarqube will give you the correct numbers. So if you get to the end and find that sonar is reporting numbers a lot lower than what you thought you had this is probably why.

Our test step just run yarn test, but here is the fill command in the package json for reference.

"scripts": {
"test": "jest –silent –coverage"

Now we have 3 artifacts, two coverage reports and a tslint report.

The final step takes these, runs an analysis on our C# code, then uploads everything

We use the sonarqube runner plugin from sonarsource


The important thing here is the additional Parameters that are below


You can see our 3 artifacts that we pass is, we also disable the typescript analysis and rely on our analysis from tslint. The reason for this is it allows us to control the analysis from the IDE, and keep the analysis that is done on the IDE easily in sync with the Sonarqube server.

Also if you are using custom tslint rules that aren’t in the sonarqube default list you will need to import them, I will do another blog post about how we did this in bulk for the 3-4 rule sets we use.

Sonarqube without a language parameter will auto detect the languages, so we exclude files like scss to prevent it from processing those rules.

This isn’t needed for C# though because we use the nuget packages, i will do another blog post about sharing rules around.

And that’s it, you processing should work and turn out something like the below. You can see in the top right both C# and Typescript lines of code are reported, so this reports Bugs, code smells, coverage, etc is the combined values of both languages in the project.


Happy coding!