<h1>Olo Engineering Blog</h1>
<p>Stories, tips, tricks, and more from the engineering team at Olo · <a href="https://olo.engineering">https://olo.engineering</a></p>
<h1>How we migrated our Webhook Service to Kafka</h1>
<p>2022-10-11 · <a href="https://olo.engineering/posts/migrating-webhooks-to-kafka/">https://olo.engineering/posts/migrating-webhooks-to-kafka/</a></p>
<h2>Webhooks as part of our open platform</h2>
<p>Olo uses <a href="https://en.wikipedia.org/wiki/Webhook" target="_blank">webhooks</a> extensively to notify our partners when important events take place in our ordering platform. Webhooks are an important part of our open platform philosophy and allow integrations between Olo and external systems to be customized in a way that suits our customers’ specific operational needs.</p>
<p>Consider a webhook that signals when an order has been placed. Restaurants might use this webhook information to:</p>
<ul>
<li>display the order details on a kitchen display system (KDS) to help with food preparation</li>
<li>store the information in a database for payments reconciliation</li>
<li>ship information to a 3rd party business intelligence tool for advanced reporting</li>
</ul>
<p>In another scenario, a webhook triggers whenever a new user has signed up, and a customer may use that information to keep their mailing list or CRM system up to date.</p>
<p>The design gives partners an efficient way to respond to the events they are interested in without needing to poll for changes (and it saves us the associated load on our system).</p>
<p>At a technical level, webhooks are implemented by posting a message to an HTTP endpoint that our partner provides. We try to ensure that all messages reach their destination, and have implemented several resilience measures such as retries and circuit breakers in cases of transient issues.</p>
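<p>As a rough sketch, delivery boils down to an HTTP POST with bounded retries. The F# below is an illustration rather than our production code: the retry count, the backoff, and the error handling are simplified assumptions, and the real sender layers circuit breakers and other resilience measures on top.</p>
<pre><code>open System
open System.Net.Http
open System.Text
open System.Threading.Tasks

let httpClient = new HttpClient()

// Simplified delivery: POST the payload, retrying transient failures with
// exponential backoff. Production adds circuit breakers, per-endpoint state,
// dead-lettering, and so on.
let sendWebhook (endpointUrl: string) (payloadJson: string) =
    let rec attempt (n: int) = task {
        try
            use content = new StringContent(payloadJson, Encoding.UTF8, "application/json")
            let! response = httpClient.PostAsync(endpointUrl, content)
            response.EnsureSuccessStatusCode() |> ignore
        with :? HttpRequestException when n < 3 ->
            // Transient failure: wait 2^n seconds, then try again.
            do! Task.Delay(TimeSpan.FromSeconds(2.0 ** float n))
            return! attempt (n + 1)
    }
    attempt 0
</code></pre>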
<h2>Webhooks as a potential scaling issue</h2>
<p>Our Webhook Sending Service is responsible for composing and scheduling webhooks for delivery. In our first incarnation of this service, when our ordering platform wanted to publish a webhook, it would send a small message to the Webhook Sending Service using AWS Simple Queue Service (SQS). The service would then query our transactional database for the necessary data before building and sending the webhook.</p>
<img src="https://olo.engineering/img/EventBasedWSS/webhook-using-aws.svg" alt="Flow chart to show the Webhook Sending Service using AWS Simple Queue Service (SQS) ">
<p>Applications could send webhooks easily since they only needed a few key pieces of information. For example: the Order ID and the type of event that had taken place. The Webhook Sending Service took care of the rest.</p>
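<p>In code, publishing a trigger was tiny. The sketch below is illustrative only: the field names, the ID type, and the queue URL are our assumptions, not the actual contract.</p>
<pre><code>open Amazon.SQS
open Amazon.SQS.Model
open Newtonsoft.Json

// The "thin" trigger: just enough for the Webhook Sending Service to look
// everything else up in the transactional database.
type WebhookTrigger = { OrderId: int64; EventType: string }

let sqs = new AmazonSQSClient()

let publishTrigger (queueUrl: string) (trigger: WebhookTrigger) = task {
    let request =
        SendMessageRequest(
            QueueUrl = queueUrl,
            MessageBody = JsonConvert.SerializeObject trigger)
    let! _response = sqs.SendMessageAsync(request)
    return ()
}
</code></pre>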
<p>One downside of this approach was the additional load on a transactional database that was already busy doing other important work, like processing orders. We were hoping to address scaling concerns by using an alternative source for this data.</p>
<h2>Considering Kafka</h2>
<p>We already relied on Kafka for internal communication between Olo subsystems and services. As an example, we publish an event whenever an order has been placed, and if an internal service needs to do something when an order is placed, the service simply subscribes to this Kafka topic.</p>
<p>Kafka and SQS are both first-class messaging technologies at Olo, and over time we’ve developed guidance on when to prefer one over the other. We generally prefer SQS as a means of providing inexpensive point-to-point messaging when an event can be treated as an instruction to a service working asynchronously, and when there’s only a single subscriber. Kafka is our preferred option for events we describe in the past tense (Order Placed, Credit Card Billed) or when we’d need to have multiple (or dynamic) subscribers.</p>
<p>In our case, the subscriber-agnostic nature of Kafka meant that our order submission code would not have to change to add a new subscriber. The code has no awareness of which subscribers are reading from the topic at all. All of this results in low coupling between our services.</p>
<h2>Leveraging Kafka streams instead of SQS</h2>
<p>We determined that the Kafka events would likely work well as the triggering mechanism for sending our webhooks. The events already contained the vast majority of the data we needed (we use the <a href="https://codesimple.blog/2019/02/16/events-fat-or-thin/" target="_blank">fat events</a> pattern), and we wouldn’t have to make too many changes in order to be able to construct our webhooks without having to query the database at all.</p>
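<p>To make the contrast with the thin SQS trigger concrete, a fat Order Placed event might look something like the type below. The fields are invented for illustration; the real event schema is far richer.</p>
<pre><code>// A "fat" event carries the data itself, so a consumer can build the webhook
// payload from the event alone, with no database lookup.
type OrderLine =
    { Name: string
      Quantity: int
      Price: decimal }

type OrderPlacedEvent =
    { OrderId: int64
      StoreNumber: int
      PlacedAtUtc: System.DateTime
      CustomerName: string
      Lines: OrderLine list
      Total: decimal }
</code></pre>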
<p>On top of that - and almost as exciting - this design would allow us to add new types of webhooks in the future without having to modify our core order submission logic. This would reduce the risk of such changes significantly, and the team in charge of the Webhook Sending Service would be able to iterate on webhook-related functionality quickly and with a high degree of autonomy.</p>
<p>An approach we considered but decided against was to add the missing data to SQS. This would have allowed the Webhook Sending Service to avoid querying the database, but it would not have given us the low coupling between our order submission logic and our webhook sending logic that we got from using Kafka instead.</p>
<img src="https://olo.engineering/img/EventBasedWSS/add-missing-data.svg" alt="Flow chart to add missing data to SQS that allowed Webhook Sending Service to avoid querying the database">
<h2>A challenging migration</h2>
<p>While migrating from one source of data to another may sound easy in theory, the devil is in the details. We had a number of challenges during this project, including:</p>
<ul>
<li>Coordination was required in order to avoid sending duplicates or dropping webhooks during the migration period.</li>
<li>We wanted to ensure that we were sending the exact same data as before and we needed to preserve formatting for things like phone numbers and decimal places.</li>
</ul>
<h2>Migrating safely</h2>
<p>Our primary goal was to perform the migration without changing any of the webhooks sent to our partners. We decided that the safest way would be to divide the migration into two distinct steps:</p>
<ol>
<li>Compare: Build webhooks using Kafka as the source, but don’t send these yet. Instead, compare the payloads with the payloads that use SQS and the database as their source, and iterate to fix any discrepancies.</li>
<li>Migrate: Once we reach a 100% match, we can confidently switch to sending webhooks based on Kafka data.</li>
</ol>
<p>Each of the two steps has some interesting challenges.</p>
<h2>Sampled comparisons using Scientist.NET</h2>
<p>We were able to reuse an existing homegrown framework for comparisons at Olo, so implementing this part was quick. The framework uses <a href="https://github.com/scientistproject/Scientist.net" target="_blank">Scientist.NET</a> to perform the comparisons, streams the mismatches to a Redshift database, and reports the results to Datadog for observation.</p>
<p>From our application, all we had to do was generate the legacy and Kafka-based webhook payloads and hand these to the framework along with instructions on how to compare the two JSON blobs; everything else just worked out of the box. Whenever a mismatch showed up we would use the data in Redshift to manually compare the two payloads and work to correct the discrepancy.</p>
<p>One interesting thing to note about the comparisons is that due to how the data from SQS and Kafka is received, doing the comparisons required us to make additional database calls. To avoid unnecessarily increasing database load, we decided to use sampling to specify how many webhooks should go through the comparison logic. A feature flag allowed us to run comparisons for a small percentage of webhooks and in turn let us control the number of additional database calls we were comfortable making. After running comparisons for a few hours we felt confident that we had processed a representative sample of webhooks.</p>
<img src="https://olo.engineering/img/EventBasedWSS/generate-legacy-webhook-payloads.svg" alt="Flow chart to show how to generate the legacy and Kafka-based webhook payloads">
<p>As a result of the comparisons we found a few different classes of discrepancies:</p>
<ul>
<li>Missing data: The Kafka events were missing a few pieces of information needed for our webhooks. In these cases we modified the events to start including the data we needed.</li>
<li>Null vs blank string: We had some cases where legacy payloads would say <code>null</code> to indicate that data was missing and our new payloads had a blank string instead. In order to maintain backwards compatibility we updated the new payloads to match the legacy behavior.</li>
<li>Number precision: Floating point fields didn’t always serialize consistently when running experiments. We accepted the few discrepancies as false positives.</li>
</ul>
<p>We spent a few weeks running comparisons and fixing discrepancies. Eventually we were ready to proceed with the migration.</p>
<h2>Migrating safely with a simple Redis lock</h2>
<p>The SQS-based requests to send a webhook and the Kafka events could be off in either direction by a few seconds due to things like batching or a backed up queue. In the Webhook Sending Service we had different threads for reading from SQS and from Kafka.</p>
<p>All of this would have made it challenging to switch from SQS to Kafka in one move. We could theoretically have ended up with duplicates if the same webhook were sent both from SQS and from Kafka. We could have dropped messages if we switched before we processed from SQS and the corresponding Kafka event had already passed.</p>
<p>To protect against these issues we decided to use a simple Redis lock. The SQS and Kafka consumers both try to acquire a semaphore - implemented as a <a href="https://redis.io/commands/setnx/" target="_blank">SETNX</a> command - and whichever one succeeds sends the webhook message. The strategy was to run both consumers in parallel for about a minute, making sure that we didn’t drop any webhooks, and using the Redis lock to avoid any duplicates.</p>
<img src="https://olo.engineering/img/EventBasedWSS/redis-lock.svg" alt="Flow to show using the Redis lock to avoid any duplicate">
<p>Using SETNX is very simple, but it has <a href="https://redis.io/commands/setnx/#design-pattern-locking-with-codesetnxcode" target="_blank">some limitations</a> when used for locking, so we wanted to verify that it was good enough for our specific, short-lived scenario or if we would have to spend more effort implementing something like <a href="https://redis.io/docs/reference/patterns/distributed-locks/" target="_blank">Redlock</a>. In our staging environment we set up a custom webhook recipient endpoint and used a load test to send a few thousand webhooks with both SQS and Kafka enabled, using SETNX as a locking mechanism. This allowed us to confirm that we got exactly one copy of each webhook and that the simple lock would do the trick.</p>
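<p>The heart of the lock is a single atomic Redis command. Here is a minimal sketch with StackExchange.Redis; the key naming and the TTL are our assumptions for illustration.</p>
<pre><code>open System
open StackExchange.Redis

let redis = ConnectionMultiplexer.Connect("localhost:6379")

// SET key value NX with a TTL: the write succeeds for exactly one caller,
// and the key expires on its own after the short parallel-run window.
let tryAcquireSendLock (webhookId: string) : bool =
    let db = redis.GetDatabase()
    let key : RedisKey = RedisKey.op_Implicit ("webhook-lock:" + webhookId)
    let value : RedisValue = RedisValue.op_Implicit "sent"
    db.StringSet(key, value, Nullable(TimeSpan.FromMinutes 5.0), When.NotExists)
</code></pre>
<p>Both consumers call this before sending; whichever one wins the race delivers the webhook, and the loser simply drops its copy.</p>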
<h2>Big Enough design upfront</h2>
<p>On this project we did a fair amount of design work upfront and iterated on the design as we learned more about the problem at hand. Creating diagrams made it easier to discuss and update the design, allowing us to ensure that everyone on the team was on the same page. Spending a couple of extra days ensuring we had a decent design in place seemed like a good investment, and limited the number of surprises during the implementation phase. We also found it helpful to be able to reference the diagram later in the project, during implementation and testing.</p>
<p>Our design relied on feature flags to control the migration process. First, they allowed us to enable Kafka processing and payload sampling until we were happy with the result. Next, we used them to run Kafka in parallel with SQS using the Redis lock to prevent duplicates, before shutting off SQS processing and relying only on the Kafka pipeline.</p>
<img src="https://olo.engineering/img/EventBasedWSS/working-diagram.png" alt="Working diagram to show work">
<h2>Doing the migration</h2>
<p>We decided to sample and migrate a few types of webhooks at a time, starting with a few low-volume ones and gradually working our way to the busier ones as we gained confidence. We ran several of these migrations in parallel - sampling some types of webhooks while completing the migrations for others.</p>
<p>In order to keep track of all of the feature flags and ensure that we got it right, we created a step by step process that we followed for each webhook. While the actual plan is slightly more detailed, the overall workflow looks something like the following:</p>
<h3>Sampling Phase</h3>
<ol>
<li>Turn on Kafka processing, enabling reading off the stream but not sending.</li>
<li>Sample Kafka vs. SQS payload accuracy at 5%. Report any inaccuracies and fix.</li>
<li>When satisfied, shut off sampling.</li>
<li>Shut off Kafka processing and observe that the stream is no longer being read from.</li>
</ol>
<h3>Migration Phase</h3>
<ol>
<li>Turn on the sending mechanism for Kafka. This will enable the sending feature for Kafka, but not the processing feature.</li>
<li>Enable the Redis lock. At this point SQS will always grab the lock as we haven’t yet enabled Kafka stream processing.</li>
<li>Turn on Kafka processing. Now, SQS and Kafka will both attempt to send each webhook; the locking mechanism ensures only one of them actually sends the payload (see the sketch after this list).</li>
<li>Turn off SQS processing leaving only the Kafka process running.</li>
<li>Turn off the Redis locking mechanism.</li>
</ol>
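<p>Putting the flags and the lock together, the send path shared by both consumers looked conceptually like the sketch below. The flag names are made up, and the two helpers stand in for the real implementations sketched earlier.</p>
<pre><code>// Feature flags controlling the migration (names are illustrative).
type MigrationFlags =
    { SqsProcessingEnabled: bool
      KafkaProcessingEnabled: bool
      KafkaSendingEnabled: bool
      RedisLockEnabled: bool }

// Stand-ins for the real implementations.
let tryAcquireSendLock (webhookId: string) : bool = true
let sendWebhook (payload: string) = printfn "sent: %s" payload

type Source = Sqs | Kafka

let trySend (flags: MigrationFlags) (source: Source) (webhookId: string) (payload: string) =
    let enabled =
        match source with
        | Sqs -> flags.SqsProcessingEnabled
        | Kafka -> flags.KafkaProcessingEnabled && flags.KafkaSendingEnabled
    // With the lock enabled, only the consumer that wins the SETNX race sends;
    // with it disabled, an enabled consumer sends unconditionally.
    if enabled && (not flags.RedisLockEnabled || tryAcquireSendLock webhookId) then
        sendWebhook payload
</code></pre>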
<p>To keep track of how far along we were in the migration process for each webhook we created a simple checklist in Jira:</p>
<img src="https://olo.engineering/img/EventBasedWSS/migration-checklist.png" alt="Migration checklist in Jira" width="640px">
<p>Our service was already instrumented quite extensively, and during the migration process we kept an eye on the level of webhooks sent from our system, verifying that this did not change.</p>
<p>Where we did see a big change, however, was in the number of database calls made from our service. Post migration, we reduced the number of database calls per second from within the Webhook Sending Service by almost three orders of magnitude.</p>
<h2>Wrapping it up</h2>
<p>By re-using our Kafka stream, we were able to:</p>
<ul>
<li>Remove millions of extra database queries a day.</li>
<li>Eliminate extra HTTP calls made to SQS from within our highest-traffic applications, saving ~15ms on average per order.</li>
<li>Future-proof our webhook sending service by leveraging an ever-expanding stream of information.</li>
</ul>
<img src="https://olo.engineering/.netlify/functions/ga?v=1&_v=j83&t=pageview&dr=https%3A%2F%2Frss-feed-reader.com&_s=1&dh=olo.engineering&dp=%2Fposts%2Fmigrating-webhooks-to-kafka%2F&ul=en-us&de=UTF-8&dt=How%20we%20migrated%20our%20Webhook%20Service%20to%20Kafka&tid=GA ID. Update (ignore) me. Note, that this is not compatible with the not-yet-commonly used version 4 of Google Analytics" width="1" height="1" style="display:none" alt="">
<h1>Cyclic releases</h1>
<p>2022-10-27 · <a href="https://olo.engineering/posts/cyclic-releases/">https://olo.engineering/posts/cyclic-releases/</a></p>
<p>In late 2019 Olo reached an inflection point. Our core platform, lovingly crafted for over a decade, had worked its way to a weekly release cadence, but that was no longer fast enough to meet the needs of the business. We needed to deliver changes faster, with less risk and more stability. But with an industry standing built on reliability and our reputation on the line, how could we get closer to a Continuous Delivery future?</p>
<h2>The Problem</h2>
<p>In 2019 the release cadence for Olo’s core platform was once a week. Code freeze for our core platform happened on Thursday afternoon, and code was deployed Tuesday morning. To some folks joining the organization this was so cool. For those who came from organizations with monthly or even bi-annual releases, releasing weekly felt amazing!</p>
<p>But even then, there were cracks in the system. Deployments were a lot of work. Deploys were managed by the Release Rotation, a small team of experienced engineers, rotating each week. They coordinated validation across many teams. They followed up on signals from our monitoring tools. The deploy steps were automated, but a team pushed the buttons to start it, and went through a long checklist of before and after steps. Meanwhile, the schedule created a mad scramble for teams. There was a Thursday morning rush to get branches merged so they wouldn’t see an extra week of delay, and that rush wasn’t helping the success rate of release branches. As we spent a year and a half in rapid growth of our engineering department, these problems became more pronounced.</p>
<p>Between November of 2019 and January of 2020, our deployment success rate deteriorated.</p>
<img src="https://olo.engineering/img/cyclic-releases/deployment-failure-rate-aug-oct.svg" alt="Deployment Failure Rate Aug2019-Oct2019">
<img src="https://olo.engineering/img/cyclic-releases/deployment-failure-rate-nov-jan.svg" alt="Deployment Failure Rate Nov2019-Jan2020">
<p>As more people were contributing code to each release, releases became larger and there were more opportunities for a late-breaking bug to be introduced, which would delay a release or cause a rollback of all changes in the release. As a result, teams had extra unplanned work <em>which reduced capacity for planned work</em>.</p>
<h2>Analysis</h2>
<p>Olo has many independently deployable services, and moving to that model had been our strategy for scaling & accelerating development for many years, but we came to realize that we'd always have some monolithic elements to our system that multiple people are working in. Instead of relying on that going away, we came to embrace the monolith of our core platform a bit more by making it easier to work with. Monoliths have a bad reputation particularly because they have infrequent high-risk releases, and we knew we could reduce a lot of that pain by deploying more often, but the question was, how?</p>
<p>An idea we’d been discussing for months was “release our core platform twice a week”. This <em>could</em> be a big help, but it also might not. We all felt that the amount of work in each release was a risk factor to reduce, but it was also clear that wasn’t enough. With our high deployment failure rate, this was hard to get buy-in on.</p>
<p>We could double down on test automation. We knew this was needed. If we had “enough” tests, we could feel confident in shipping more. There were two problems with this path. First, how would we know we had enough? Where could we draw a finish line? The second problem was putting specifics into “doubling down”. We are a growth stage company that needs to innovate rapidly, and we had hit a huge elbow in growth from the first coronavirus shutdown. We could not afford to put a large chunk of engineering capacity into automation.</p>
<p>We knew that we needed to get to a continuous delivery future, but how?</p>
<h2>The Solution</h2>
<p>In January 2020, Laura Edwards joined Olo as a QA Manager. She’d worked on a similar transformation at her last employer, and had a proposal for Olo. Though she called it “prancing the ponies,” which is a long story for another day, it soon went by the name “cyclic releases” at Olo, and it was as simple as it was powerful: <strong>take away the artificial delays</strong>.</p>
<p>To illustrate, here’s what our “golden path” release process had looked like:</p>
<img src="https://olo.engineering/img/cyclic-releases/release-process-golden-path.svg" alt="Flowchart for ordering platform release process golden path">
<p>This is what the target path looked like:</p>
<img src="https://olo.engineering/img/cyclic-releases/platform-release-process-path.svg" alt="Flowchart for ordering platform targeted release process path">
<p>So simple. Don’t wait for Thursday to code freeze - freeze as soon as the previous release is verified in Prod. Don’t wait for Tuesday to release - release as soon as you safely can after verification.</p>
<p>The power in this approach comes from the feedback loops it creates. Instead of a system that makes us wonder “have we done enough to go faster?”, we create a system that goes faster <em>if and only if</em> we do things to improve the system. Our speed is dictated by our release quality, our testing speed, our build and deploy tooling. When you think about it, this system of “cyclic releases” is the essence of <a href="https://continuousdelivery.com/" target="_blank">Continuous Delivery</a>. It didn’t have fancy tools and it was manual in all sorts of ways it ought to be automated. But it delivers a release each time the code is ready.</p>
<h2>Rollout</h2>
<p>Changing a system is hard with software - it’s even harder with people. As those of us involved started thinking about this idea, all kinds of ideas came up about the second, third, and twelfth steps down the road. This was exciting! But we held this back and started small.</p>
<p>There were two things we changed when we started. First, we changed the Release Rotation, constraining it to teams and engineers most bought in to working on improvements here. Second, we stopped paying attention to the day of the week. We had a daily meeting at 3 PM ET, to answer the question “are we ready to release tomorrow morning?” Then we put all the power in the hands of the release rotation to make this decision.</p>
<p>Our communications emphasized the smallness of this change. We weren’t changing the tests we’d run. We weren’t changing the steps in the deployment. We weren’t lowering quality, and we <em>definitely</em> weren’t pushing for long hours. We were just making the decision to deploy a quality-driven one, rather than calendar-driven.</p>
<p>This change got extensive communication. We had a presentation to all of engineering. We hosted Q&A sessions. We wrote reams of documentation. We had several ad hoc meetings with teams. We met one on one with everyone from VPs to individual engineers that were just not sure.</p>
<p>Our primary message: Get changes into production <em>safely</em> and <em>quickly</em> in a <em>sustainable</em> way.</p>
<h2>Results</h2>
<p>We started this new process in May of 2020. We expected to see a rocky start, as teams got used to this difference, but after a month we were humming along:</p>
<img src="https://olo.engineering/img/cyclic-releases/avg-days-between-releases.svg" alt="Chart for avg days between releases">
<h2>Conclusion</h2>
<p>The journey to safe, sustainable, and rapid software release is a long one, and this was just the first step. Since this first step, we’ve created a team for improving our release process, re-imagined our verification and accountability, invested a ton in test automation, and created more separations to allow pieces of our platform to deploy on their own. We continue to improve, deploying at an average of just under 1 release per day. But we’re not satisfied yet. We believe that increased deployment frequency, in concert with improving the other <a href="https://www.thoughtworks.com/radar/techniques/four-key-metrics" target="_blank">Four Key Metrics</a> will enable us to innovate faster and deliver more value to our customers.</p>
<img src="https://olo.engineering/.netlify/functions/ga?v=1&_v=j83&t=pageview&dr=https%3A%2F%2Frss-feed-reader.com&_s=1&dh=olo.engineering&dp=%2Fposts%2Fcyclic-releases%2F&ul=en-us&de=UTF-8&dt=Cyclic%20releases&tid=GA ID. Update (ignore) me. Note, that this is not compatible with the not-yet-commonly used version 4 of Google Analytics" width="1" height="1" style="display:none" alt="">
<h1>F# in Action at Olo on the Olo Pay Project</h1>
<p>2023-02-03 · <a href="https://olo.engineering/posts/f-sharp-in-action/">https://olo.engineering/posts/f-sharp-in-action/</a></p>
<h1>Introduction</h1>
<p>I’m Adam Anderson, a Staff Engineer at Olo and founding member of the Olo Pay team. Much of Olo’s tech stack is based on Microsoft .NET, where C# is the <em>de facto</em> standard language. At the beginning of the project, we had a clean slate in front of us: We knew Olo Pay was going to be a brand-new service with its own repository, and we had the freedom to make a lot of decisions about what technologies we were going to use, including the language. We chose F#. In this post, I’ll share why we chose it, how it affected us, and what it’s like to own an F# application in production.</p>
<h1>Why we chose F#</h1>
<p>Coming into the project, we had a team of 6 Senior+ Software Engineers: 3 had prior experience with F# projects in production, and the other 3 had been exposed to F# by working on a project that was written in C# but whose tests were all written in F#. As an aside, I think this is a great way to provide a low-risk stepping stone for interested engineers to introduce F# into a project and start gaining experience and confidence.</p>
<p>Our team wanted to use F# because they liked its pragmatic blend of safety (especially but not limited to null safety), succinctness, and the ability to program in the functional paradigm where possible, combined with the ability to apply the object/imperative paradigm as needed, such as for interoperability with C# code or for opting into mutable values for performance reasons.</p>
<h1>Implementation</h1>
<h2>IDE/Editor support</h2>
<p>F# IDE/editor support in general is not as good as C#’s, but this isn’t such a terrible criticism considering just how good C#’s editor support is. Our team used a combination of Visual Studio 2019/2022 and Visual Studio Code on the project at first. Each one had relative strengths and weaknesses: Visual Studio’s F# CodeLens type annotations were useful and powerful, but were still experimental and would often render in the wrong place, obscuring other code until the entire application was restarted. This feature is still marked as experimental in Visual Studio 2022. In contrast, Code’s type annotations rendered reliably, but lacked navigation links. Code also offers little support in the editor for running tests, and less for debugging them. Because of this, engineers tended to hop back and forth between the two depending on the type of work they were doing. Later, some engineers started using Rider, which boasts CodeLens type annotations comparable to Code’s and full support for test running and debugging, as well as the most F# refactorings. Most of our engineers feel Rider offers the best F# development environment for our needs out of the ones that we used.</p>
<h2>Application frameworks</h2>
<p>The new service we were building consisted of a webhook listener and several worker services. For the webhook listener, we chose <a href="https://saturnframework.org/">Saturn</a>, a web framework which is itself built upon <a href="https://github.com/giraffe-fsharp/Giraffe">Giraffe</a>, a functional web “micro-framework”. Our webhook listener didn’t have complex needs, so we haven’t pushed these frameworks hard. Because we haven’t leveraged the unique characteristics of these frameworks but have taken on their additional cognitive overhead, we would probably not make this choice again. We still love these frameworks, and would choose to use them for projects that benefit enough from their strengths to justify their extra cognitive costs. Our workers use the stock .NET Core Worker Service template, and they currently get deployed as Windows services and also run on Linux as Systemd services in our local Docker development environment. The web and workers were all scaffolded by dotnet CLI templates in F#; the only difference from scaffolding them as C# was specifying the language with the -lang flag.</p>
<h2>Libraries</h2>
<p>One of the great strengths of F# is the ability to write functional code while being able to leverage the many high-quality packages of the wider .NET world. We use some of the same popular packages you’d expect to see in any other .NET project, such as Dapper and Json.NET. We don’t use many F#-specific packages besides Saturn and Giraffe, but <a href="https://demystifyfp.gitbook.io/fstoolkit-errorhandling">FsToolkit.ErrorHandling</a> deserves a mention for adding so many generally useful functions.</p>
<h2>Language features</h2>
<p>If I were asked which language features we benefited from most, it would be hard to give a definitive answer, or at least a flashy one. The language features we benefited from the most were the usual suspects: immutability by default, composable functions and data, functions as fine-grained interfaces, automatically curried functions, and making the absence of a value (nulls) visible as a type that must be dealt with at compile time. We chose not to use any type providers, which are probably among the flashier features. We made that decision because the most mature database type provider is for MSSQL, and we are using PostgreSQL, whose type provider is less mature.</p>
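<p>A tiny, self-contained illustration of those usual suspects (example code, not Olo Pay code):</p>
<pre><code>// Absence of a value is explicit: the compiler forces callers to handle None.
type Customer = { Name: string; Email: string option }

// Records are immutable by default; "updating" one produces a copy.
let withEmail email customer = { customer with Email = Some email }

// Functions are automatically curried: partial application yields a reusable
// function from Customer to Customer.
let withSupportEmail = withEmail "support@example.com"

let describe customer =
    // Pattern matching makes the None case impossible to forget.
    match customer.Email with
    | Some email -> sprintf "%s (%s)" customer.Name email
    | None -> customer.Name

let alice = { Name = "Alice"; Email = None }
printfn "%s" (alice |> withSupportEmail |> describe) // Alice (support@example.com)
</code></pre>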
<h2>Tooling</h2>
<p>The F# ecosystem is completely compatible with all the tools one might avail themself of in a C# project, but there are a few alternative tools that are commonly used in the F# community. Two we used were Paket and FAKE. Paket is not F#-specific; it can be used in any .NET project, but seems to be most prevalent in the F# community. FAKE isn’t F#-specific, either; it can be used to build any .NET project, but being an F#-based DSL, it probably wouldn’t appeal to audiences who don’t already know F#.</p>
<p><a href="https://fsprojects.github.io/Paket/">Paket</a> is an alternative package manager. It had deterministic restores via lock files long before NuGet did, but NuGet has slowly been catching up in features over the years. However, there are still enough additional features that I am glad we chose Paket, and I would use it again on future projects. Some of the features we appreciated were the lock file (now also provided by NuGet), the solution-wide view of package dependencies (NuGet’s view remains project-centric), and a much more readable version constraint syntax (compare <a href="https://fsprojects.github.io/Paket/nuget-dependencies.html#Version-constraints">Paket’s</a> syntax to <a href="https://docs.microsoft.com/en-us/nuget/concepts/package-versioning#version-ranges">NuGet’s</a>). I’ve also appreciated <a href="https://fsprojects.github.io/Paket/paket-why.html"><em>dotnet paket why <package_id></em></a> more than once while trying to understand why a specific version of some transient dependency was being resolved.</p>
<p><a href="https://fake.build/">FAKE </a>is a F#-based DSL for build scripts. In addition to its core library, it has a wide variety of plugins available to integrate with various build servers and tools. We started out with FAKE, but over time, we found that we were spending more time looking up how to execute the equivalent of a CLI command we already knew how to write then it would have taken to just execute it as a shell command. We ended up dumping FAKE in favor of a home-grown PowerShell build script using only half as many lines of code. I would not use FAKE for a modern .NET Core project because of how easy-to-use the modern dotnet CLI is, but I would consider using it if I needed to write a build script for a .NET Framework 4.x project (where FAKE provides assistance with MSBuild), or if I wanted to write FSX scripts for .NET 4.x that needed to consume NuGet packages, something that is much trickier pre-F# 5.0.</p>
<h1>Scripting</h1>
<p>A nice side benefit of the F# language is its inbuilt support for scripting. Simply create a text file with an .fsx extension, write some F#, then run it from the command line with no build necessary! This came in handy for us a few times when we had a need to write some quick developer tooling that made use of NuGet packages. Modern FSX scripts can easily load NuGet packages with a simple directive, providing a simpler alternative than either adding a console project to our solution, or installing, loading, and consuming a NuGet package with PowerShell.</p>
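<p>For example, a throwaway script might look like this; the package and version are arbitrary choices, and any NuGet package works the same way:</p>
<pre><code>// tool.fsx (run with: dotnet fsi tool.fsx)
// The nuget directive (F# 5+) fetches the package on the fly; no project file
// or build step is needed.
#r "nuget: Newtonsoft.Json, 13.0.3"

open Newtonsoft.Json

let report = {| Service = "example-tool"; Healthy = true |}
printfn "%s" (JsonConvert.SerializeObject report)
</code></pre>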
<h1>Build & Deploy (+ coverage)</h1>
<p>F# didn’t have an impact either way on our build & deployment process. We use the same tools as many of our C# projects at Olo: TeamCity and Octopus Deploy, underscoring that F# projects are just .NET projects at the end of the day.</p>
<p>Later in the project’s life, we decided to add code coverage reporting, and here again we found F# to be well supported. We chose to use JetBrains dotCover, and it worked exactly as expected with no language-specific changes in behavior. The code coverage reports even included highlighting of the source code to show which lines were covered.</p>
<h1>Hiring</h1>
<p>As we grew our team, it was difficult to find candidates who already knew F# due to its relative lack of popularity. We supplemented those who we were able to find with engineers who were open-minded and curious about the language and the functional paradigm. Because our expanded team now included members who were less familiar with F# syntax and functional programming, we instituted regular exercise sessions of code katas to be implemented in F#, either as homework with a show-and-tell, or together as a mob. The cadence of these meetings has since slowed as the team’s skills improved, but they still continue to this day.</p>
<h1>Outcomes/What if?</h1>
<p>In retrospect, it’s difficult to say if choosing F# for Olo Pay helped or not, because we have nothing to compare to directly. For one well-documented close comparison of a system implemented first in C# and then in F#, see <a href="https://web.archive.org/web/20160812070612/http://simontylercousins.net/does-the-language-you-use-make-a-difference-revisited/">this paper by Simon Cousins</a>. I can say that we’ve consistently delivered on time without needing to compromise our code quality. Compared to past projects I’ve been part of, we have seen far fewer null reference exceptions: none at all for at least the last 6 months, which is as far back as our exception logs go. Bugs have been extremely rare, and are rarely caused by the code doing something different than the author intended, but rather by unforeseen behavior of external systems. For this reason, I think I can reasonably say that F# has helped our team move faster, because we have had to spend less time diagnosing and fixing bugs, giving us more time to deliver features. There’s something of a fun factor in play, too: writing F# <em>feels good</em>. Whether it’s for fun, for a kata, or for business, coding in F# is something that we enjoy and look forward to.</p>
<img src="https://olo.engineering/.netlify/functions/ga?v=1&_v=j83&t=pageview&dr=https%3A%2F%2Frss-feed-reader.com&_s=1&dh=olo.engineering&dp=%2Fposts%2Ff-sharp-in-action%2F&ul=en-us&de=UTF-8&dt=F%23%20in%20Action%20at%20Olo%20on%20the%20Olo%20Pay%20Project&tid=GA ID. Update (ignore) me. Note, that this is not compatible with the not-yet-commonly used version 4 of Google Analytics" width="1" height="1" style="display:none" alt="">
<h1>Mentoring Junior Engineers the Olo Way</h1>
<p>2023-03-28 · <a href="https://olo.engineering/posts/mentoring-jr-engineers/">https://olo.engineering/posts/mentoring-jr-engineers/</a></p>
<p>Beginning work as a new junior engineer can be challenging and intimidating. It is one of the responsibilities of senior engineers to make juniors feel welcome while also bringing them up to speed as soon as possible so they can make meaningful contributions to their team. This is not only beneficial to the company, but also good for the junior's sense of accomplishment and well-being. This post largely draws on my own experience on-boarding two juniors and the overwhelmingly positive feedback I received.</p>
<p>One of Olo Engineering’s values is empathy. In Olo’s case, we choose to focus on <em>cognitive empathy</em>, which can be defined as making an effort to understand the perspectives and mental state of those we interact with. This is key in successfully on-boarding a junior: put yourself in their shoes. Get your mind in a state that is similar to where theirs is. Seniors have been there; that first engineering job brings both positive and negative emotions. We’re thrilled and excited to demonstrate our skills and accomplishments to the larger team. However, there are also times when we feel completely overwhelmed by the scale of the domain we are working in. Recall your first job and the feelings that experience evokes. How did you feel? Nervous? Unsure of yourself?</p>
<p>Doing this can help seniors avoid some of the common traps they can fall into, traps that leave them less than helpful to the very people they are trying to support. As humans gain experience and confidence, it’s easy to forget how it felt to be new and inexperienced. This makes perfect sense from a biological perspective: our brains are wired up to suppress those uncomfortable feelings. So in order to be empathetic to juniors, seniors have to make a conscious effort to recall those old experiences and how they felt.</p>
<p>At Olo, this process is built right into our job description as seniors. It is expected that we take time to mentor and on-board junior engineers during their first few weeks (when a senior is paired with a junior, we refer to each other as "on-boarding buddies", or "buddies" for short). In less mature organizations, juniors are thrown into the mix with barely any direction given. Worse, new engineers often receive little feedback, unless it’s negative. This can result in loss of morale, productivity, and increased turnover.</p>
<p>One example where I practiced empathy for my buddy is when we had a company-wide meeting during their first week. As is often the case, these meetings are full of insider jargon and acronyms. While Olo’s documentation around this terminology is solid, I had a feeling my on-boarding buddy had not yet had a chance to study it. Or perhaps he wasn’t aware it even existed. I knew that in my first few days, I could not remember all of the jargon that was thrown at me. I took the initiative to send him a direct message about what the terms meant as they were brought up in the meeting. He was immediately thankful and told me later on that the meeting was much more understandable to him as he was armed with the context around what the terms meant.</p>
<p>Another example with a different buddy involved helping her set up her Git environment. She had gotten herself in a state where she had numerous merge conflicts, and being new to Git, she was at a loss as to where to go next. We got on a Zoom session and spent time getting everything cleaned up and under control. All the while, I was explaining why things got to a state they were in and how to avoid some of these common pitfalls in the future.</p>
<p>In both instances, I repeatedly made it a point to tell my buddies never to be afraid to ask me for help. If I have time at that moment, I am more than happy to help them. If I am busy, I may not respond right away. This doesn’t mean I am ignoring them; rather, I am heads down and focused, and I will eventually get back to them. If I find I cannot help, I will be sure to hand my buddy off to someone who can, while orchestrating the whole experience as much as possible. This is essentially the script I give them in the first few days. The sense of relief I have seen from juniors when explaining this -- even over a remote call -- is impressive.</p>
<p>Both of these examples seem trivial. After all, how hard is it to tell someone what a specific <a href="https://en.wikipedia.org/wiki/Three-letter_acronym">TLA</a> represents? Or how to fix a merge conflict? Regardless, I’ve been amazed by the feedback I received from both of my buddies. They were extremely appreciative of the time and care I took in explaining and guiding them through whatever hurdle they were facing.</p>
<p>In both of these cases, I tried my best to practice <em>cognitive empathy</em>: putting my mind in a similar space to where my buddies’ minds were, which likely involved feelings of being lost, confused, and afraid to ask for help. This is a terrible place for a new engineer to be, so it is the job of the senior to constantly encourage juniors to ask questions. Let them know, repeatedly: don’t be afraid to ask for help.</p>
<img src="https://olo.engineering/.netlify/functions/ga?v=1&_v=j83&t=pageview&dr=https%3A%2F%2Frss-feed-reader.com&_s=1&dh=olo.engineering&dp=%2Fposts%2Fmentoring-jr-engineers%2F&ul=en-us&de=UTF-8&dt=Mentoring%20Junior%20Engineers%20the%20Olo%20Way&tid=GA ID. Update (ignore) me. Note, that this is not compatible with the not-yet-commonly used version 4 of Google Analytics" width="1" height="1" style="display:none" alt="">