AWS

Tommy + the technology of AWS

AWS Examples

Below are some of my projects involving AWS, grouped by company. Read on for details about each project, and chat with me to follow up on any topic you'd like to hear more about!

pull.systems

EV Observability + Analytics

Staff Engineer

2023 - 2024

Project: Pull Workbench v1

Upon joining, I quickly came up to speed on the stack behind the early version of Pull Workbench. It was very buggy, but it demonstrated the initial ideas and had a solid set of modern technologies and patterns established in the codebase, which made for a good starting point.

I was entrusted to help our CTO hire several additional employees, so for the first several months I joined and conducted interviews while working with the existing AI and full stack team to deliver features and solidify the system. After some initial catch-up fixing the early bugs that worried our business partners, we aimed to keep the system fully working with every merge, which gave our partners confidence that the team could deliver.

From there, I developed full stack features solo or by pairing with team members, and ultimately led a squad of 5 team members alongside a second squad that together comprised our engineering team.

Much of my time went into authoring complex analytics SQL queries using the impressive Kysely library, a fluent, type-safe query builder that we used for our Postgres and Redshift databases. Given the nature of the product, we had to decide which queries could run in real time and which queries and subqueries needed to be computed offline as part of a network of Airflow DAGs.

On the MLOps side, I advocated for traceability and reproducibility (determinism) of all models and artifacts, and integrated with systems that supported this, such as Airflow to coordinate DAGs of ML training jobs and SageMaker's metadata API. We drove these through model lifecycle automations that produced and stored models, artifacts and metadata, which our analytics stack in turn consumed at runtime or in batch.
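As a hedged illustration of what reproducibility and traceability can look like in practice, the sketch below derives a deterministic, content-addressed ID for a training run from its inputs, so identical inputs always map to the same artifact key and any change produces a new one. The `TrainingRunSpec` fields, `canonicalJson`, `artifactId` and the `model-` key format are all hypothetical, and this does not use SageMaker's metadata API; it is only a sketch of the idea, not the system described above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical description of everything that determines a training run's output.
interface TrainingRunSpec {
  datasetUri: string;
  datasetVersion: string;
  codeCommit: string;
  hyperparameters: Record<string, number | string>;
  randomSeed: number;
}

// Serializes a value as JSON with object keys sorted recursively,
// so logically equal specs always produce the same string.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Content-addressed ID (hypothetical format): same inputs give the same ID,
// and any change to the inputs gives a different ID.
function artifactId(spec: TrainingRunSpec): string {
  const digest = createHash("sha256").update(canonicalJson(spec)).digest("hex");
  return `model-${digest.slice(0, 16)}`;
}
```

Because key order is normalized before hashing, two specs that differ only in how their fields were written out resolve to the same ID, which is what lets a stored artifact be traced back to, and regenerated from, the exact inputs that produced it.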

On the frontend, I helped us deliver an initial version of the Pattern Editor: a UI and set of APIs that users could use to assemble their own patterns of interest, such as anomalous ranges of quantities that may themselves be derived from other user-defined patterns. This required not only a DAG-aware UI but also a layer that converted the JSON representation of these patterns from the frontend into type-safe Kysely queries to be executed against Redshift.
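To illustrate the two pieces just described, here is a minimal, hypothetical sketch (not the actual Pull Workbench code). It orders user-defined patterns so that each one is evaluated after the patterns it derives from, using Kahn's topological sort and rejecting cycles, and it compiles a simple range pattern into a parameterized SQL predicate. The `Pattern` shape, its field names, and the helpers `evaluationOrder` and `rangePredicate` are assumptions; the real layer emitted type-safe Kysely queries, which this dependency-free sketch swaps for plain parameterized SQL strings.

```typescript
// Hypothetical JSON shape for a user-defined pattern (an assumption for illustration).
interface Pattern {
  id: string;
  quantity: string; // column or derived quantity the range check applies to
  min?: number;
  max?: number;
  dependsOn: string[]; // IDs of other patterns this one is derived from
}

// Kahn's algorithm: returns pattern IDs so that dependencies come first.
// Throws if a dependency is unknown or the patterns contain a cycle (not a DAG).
function evaluationOrder(patterns: Pattern[]): string[] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const p of patterns) {
    inDegree.set(p.id, p.dependsOn.length);
    dependents.set(p.id, []);
  }
  for (const p of patterns) {
    for (const dep of p.dependsOn) {
      const list = dependents.get(dep);
      if (!list) throw new Error(`Unknown dependency "${dep}" in pattern "${p.id}"`);
      list.push(p.id);
    }
  }
  const queue = [...inDegree].filter(([, d]) => d === 0).map(([id]) => id);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    order.push(id);
    for (const next of dependents.get(id)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  if (order.length !== patterns.length) throw new Error("Patterns contain a cycle");
  return order;
}

// Compiles one range pattern into a parameterized WHERE-clause fragment.
// Column names are checked against an allow-list instead of being trusted from JSON.
function rangePredicate(
  p: Pattern,
  allowedColumns: Set<string>,
): { sql: string; params: number[] } {
  if (!allowedColumns.has(p.quantity)) throw new Error(`Unknown quantity "${p.quantity}"`);
  const clauses: string[] = [];
  const params: number[] = [];
  if (p.min !== undefined) {
    params.push(p.min);
    clauses.push(`${p.quantity} >= $${params.length}`);
  }
  if (p.max !== undefined) {
    params.push(p.max);
    clauses.push(`${p.quantity} <= $${params.length}`);
  }
  return { sql: clauses.length > 0 ? clauses.join(" AND ") : "TRUE", params };
}
```

Validating column names and passing values as parameters keeps user-authored JSON from ever being spliced into SQL directly; a query builder like Kysely provides the same protection while also type-checking column references at compile time.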

Key Results

Led 5-person squad delivering Pattern Editor enabling custom anomaly detection workflows
Processed 10M+ daily records with type-safe SQL queries using Kysely
Improved hiring velocity conducting 30+ technical interviews while building product

Intertru.ai

AI-assisted Hiring

Lead Engineer

2023 - 2024

Project: Interview Builder

Interview Builder is where customers went to define the values they wanted to find in their ideal candidate, and map those to attributes and, ultimately, to interview questions that the Intertru AI was pre-trained to assess.

My role was to work closely with the CTO to understand what had been proven out on the ML side, so that we could design a UI that intuitively extracted the necessary inputs from the customer while providing predefined templates as starting points to ease them into the process.

The application was built on React, TypeScript, GraphQL (backed by DynamoDB) and AWS Amplify. I built it very quickly with simple backends so that we could iterate on the frontend and get the experience right before investing significant time and effort into an ideal backend. This approach made iterations faster and produced less collateral damage and throwaway code as we refined the user experience.

We then added instrumentation to measure the feature's usage, its performance and any bugs that turned up before releasing it to production, where it was initially used internally to surface any shortcomings before customers were exposed to it.

Project: Candidate Summary

The Candidate Summary page summarized a candidate's performance across multiple interview stages, presenting radar charts that showed their degree of fit against the values and attributes being evaluated for their position, as defined in the Interview Builder.

I built the frontend in React and TypeScript and integrated it with the backend, which I partially built. The backend leveraged RAG and ran several machine learning models to produce scores and explainable AI output: for example, models that broke interview transcripts down into quotable fragments, evaluated their relevance against configured company values, and called ChatGPT APIs to obtain summaries and scores related to that content.

Key Results

Delivered MVP in 6 weeks enabling rapid iteration on customer interview workflows
Built AI-powered candidate evaluation dashboard enabling data-driven hiring decisions
Reduced time-to-create interview templates by 70% with intuitive UI design

Appen AI

Formerly Figure Eight

Senior Director, Eng

2022 - 2023

Project: DevOps as a Practice

Instead of keeping DevOps, infrastructure and tests completely separate from development teams, I moved the needle so that product development teams could own more of their own infrastructure and tests, reducing back-and-forth and empowering teams to deliver.

We used DevSpace, which meant any developer or team could stand up a reproducible, isolated stack in the cloud with multiple services and frontends running, and modify the infrastructure definitions and code themselves, directly, without needing permission or filing tickets with another team.

This enabled product engineers to experiment and test more through declarative infrastructure and configuration management while still protecting our production environments, unshackling them to realize their potential as the experts in the software.

At the same time, I worked to reduce the outsized role our amazing DevOps team was playing in the day-to-day management and enhancement of environments, which unfairly impeded expert developers by introducing red tape and inter-team processes that didn't add value.

Project: ML Platform Enhancements

I ran Appen's ML Platform, which FAANG companies and many other startups and enterprises used to automate and scale their ML practices, including running both supervised and unsupervised workloads, as well as Appen's global annotation workforce, which enabled customers to leverage our crowdsourced professionals to elastically obtain labeling and quality-checking services for text, voice, image, video and LIDAR annotation, training and validation use cases.

I reported to the CTO and directed multiple full stack teams, each with their own tech leads and range of engineering skills, handling both regular maintenance and product enhancements using technologies like SageMaker, React, K8s (Kubernetes), Spark, Kafka, Airflow, Spring (Boot), Ruby, Python, Java, TypeScript and SQL.

Maintenance included regular updates to infrastructure, bug fixes, and performance optimizations across the platform. We migrated more and more services to K8s (Kubernetes), with Ambassador as our API gateway, where we could consolidate cross-cutting logic like auth and versioning.

Enhancements included changes to simplify the UX, kill redundant or unused features, add measurement to inform our choices, and larger efforts like Enterprise OAuth.

Director, Engineering

2021 - 2022

Project: Enterprise OAuth

There were 4 different websites built in different technologies and acquired from different companies, plus some APIs, that all needed unified sign-up, sign-in and sign-out. Each had its own separate user store, including 3rd-party vendor users who logged in with vendors and then authenticated to us with a hidden token.

It was a stalled project, so I started from missing requirements, incomplete designs and misleading progress indicators, and focused other leaders and teams on delivery through tested, working software, treating tested user stories and on-the-ground learnings as the units of progress instead of large, outdated, waterfall-style PRDs.

I contributed directly in React / TypeScript, Node.js / Express, Ruby on Rails and custom gems, OAuth configuration, and Java Spring with runtime-loaded SPI implementations from across separate application domains.

A complex architecture was at play, and the teams did not know each other and weren't working as a single unit, so the landscape was difficult and rife with demoralized team members.

Although my team was to play only one part among many on the project, I quickly realized that there was no single leader or coherent plan, which led to plenty of blame games and treading water.

With permission from our VP of Engineering, I took charge of the teams and worked with product to firm up requirements. I also replaced the initially conceived solution architecture, which had been created somewhat in a vacuum and would not have worked, with one that would. To do that, I dug in, ran all the services and web apps myself, and learned the multiple data stores and existing auth mechanisms, including auth via 3rd-party vendors to some parts of the system.

I delivered the project within 5 months and for my efforts was rewarded not long after with a promotion.

Key Results

Reduced deployment lead time by 75% enabling product teams to self-serve infrastructure
Ran ML platform to support 100K+ annotation jobs daily across FAANG clients
Unified authentication across 4 legacy systems reducing login friction by 85%

Sourceability

Electronic Component Parts Distributor

Senior Engineering Manager

2019 - 2020

Project: Sourceability Insights

My PM and the business wanted to show other teams that a fast-paced, fail-fast approach with daily releases (as opposed to 1-3 times per year) would serve us much better: we could learn quickly, iterate and pivot without huge, costly investments in products that did not meet expectations or deadlines.

Before hiring my team, I set up a CI/CD pipeline and the basic framework of a site that could sustain a heavy, intense crawl from Google.

New hires all released to production on their first day of work. This was a principle I brought to the table: the process should be so automated and simple that someone could set up and deploy a small feature within their first few hours at Sourceability.

Within 3 months of inception, our parts and datasheets website, which also incorporated proprietary availability and quality scores, was used to successfully sell a 3-year Analytics API contract to an international multibillion-dollar company. It also drove organic traffic and taught us how to scale up to sustain Google crawls of the hundreds of thousands of electronic component parts in our inventory, while scaling down outside of crawls and other high-traffic moments.

Full Stack - React, Node.js, TypeScript, Kubernetes, GitLab
Functional Reactive Programming - RxJS, Highland.js
Daily Production Deploys - Canary Deployment w/ K8s
Constant Collaboration - No “throwing over the wall”
CI/CD Automation Pipeline - Every user story gets an instant shareable environment
Coaching / Mentoring / Leading diverse team

Key Results

Secured $3M analytics API contract within 3 months of product launch
Achieved 400% increase in organic search index uptake through SEO
Enabled team to deploy on day one reducing time-to-first-deploy from weeks to hours

MapR Technologies

Big Data / Hadoop Distributor

Principal Engineer, DevOps

2015 - 2018

Project: DevOps Portal + CICD

A portal bringing together version control, automated test definitions and statuses, quality metrics, Jira tickets, CI/CD jobs, and supporting infrastructure definitions and status in a single place to aid release management and DevOps practices.

Behind the scenes, pipelines built with K8s (Kubernetes), Mesos, GitHub and Jenkins automatically provisioned environments, deployed our software and ran extensive tests on it, including complex multi-cloud platform scale tests across Google Cloud and AWS, as well as on-prem tests on bare metal and OpenStack.

Project: DevOps Dojo

I initiated and led "Scala Dojo", partnering with QA engineers and developers interested in adopting functional programming, which led to Coursera certifications and improved team morale and interest.

That effort broadened and continued as "DevOps Dojo", a recurring collaboration initiative that produced the DevOps Portal + CICD by engaging across teams and disciplines to identify the true priority pain points and solutions that scaled across multiple teams, so that we could address them via CI/CD and our DevOps practices.

Project: Spyglass

Observability was introduced to MapR via the Spyglass Project, which sought to collect workload metrics as well as application-specific metrics across all the tools and infrastructure of the MapR Hadoop stack.

My responsibilities included automating the build and deployment of the full Hadoop stack under development, authoring and executing automated tests, mentoring junior teammates to do the same, collaborating with dev teams to ensure they plugged into our CI/CD and test framework nicely, and troubleshooting problems that arose.

Introduced Scrum process and Jira to the team
Acted as scrum master in early days of project
Developed automation to manage ESXi environments for CI/CD, involving heterogeneous MapR cluster installation, configuration and testing

Key Results

Unified 5 disparate DevOps tools into single portal reducing context switching by 80%
Trained 8+ engineers in Scala and functional programming with Coursera certifications
Implemented comprehensive observability across Hadoop stack monitoring 100+ metrics

Lookout

Mobile Security

Senior Test Engineer

2014 - 2015

Project: Test Automation

From my experience at Progressive Insurance, where I had learned to build my own code instrumentor for dynamic analysis and used it alongside existing static analysis tools to test and measure deployed systems, I had a rich understanding of how to analyze the runtime behavior of mobile apps, which was part of the special sauce of Lookout's mobile security backend.

With this mutual interest in mind, I joined and, as an engineer, helped create and standardize automated test suites, developing the libraries, frameworks and declarative jobs that covered the types of tests needed to assure the correctness and security of our own software offerings.

Key Results

Authored scale test automation exercising 4 integrated stacks identifying 20+ scaling bottlenecks
Measured and increased code coverage between 20% and 50% for various repos within 9 months
