Steve Niemitz
Software Engineer
Over 15 years of experience in massive-scale distributed systems, databases, storage formats, and low-level optimization. I’ve had a diverse career, leading teams and projects across finance, ad-tech, and social media.
- Location: Brooklyn, New York, United States
- Email: steve@niemi.tz
- Website: https://steve.niemi.tz
- steveniemitz
- Steve Niemitz
Experience
– present
Senior Staff Engineer at Twitter
With 8 years of experience on multiple teams at Twitter, I’ve had the unique opportunity to see the impact of my decisions and designs over long time periods. I’ve designed, built, and led teams for multiple systems that push the limits of the industry. My ability to build deep cross-organizational relationships over that time allowed me to positively transform multiple organizations and improve large portions of Twitter’s tech stack.
Key Roles
– present
Realtime Platform Tech Lead
Twitter significantly changed priorities in late 2022, and with those changes came a new team structure for Data Platform. Our focus shifted primarily to on-prem stream processing, with me leading a team of ~20 engineers working on related projects.
The largest ongoing project is running Flink on-prem, primarily as a runner for Apache Beam.
I developed an MVP, implemented most major components, and built a team around executing the work needed to run Flink in the Twitter environment on-prem. Since Twitter primarily uses Aurora and Mesos (rather than k8s), I built an Aurora resource provider for Flink to better integrate it with the Twitter ecosystem.
Currently (as of early 2023) there are multiple teams evaluating the Flink environment for their use-cases.
–
Data Processing Tech Lead
In late 2020 we realized that our team (Revenue Data Platform) had an opportunity to drive significant impact across more than just our org. I, along with Revenue and Platform leadership (Director+), drafted a proposal to merge our team with the existing Heron (streaming) and Scalding (batch) platform teams.
The merged team would have the mandate to drive next-gen data processing at Twitter using Beam (and Dataflow).
I was the tech lead for the overall team (20+ developers at its peak), coordinating the efforts of different sub-teams to drive our goal of Beam adoption. Over the ~2 years of running the team, Beam/Dataflow usage grew from ~20 to 600+ batch and streaming jobs running on Dataflow across the company.
–
Revenue Data Platform Tech Lead
Revenue Data Platform evolved from lessons learned working on TC⚡DC and was formed in 2017 with the goal of modernizing the revenue data stack at Twitter. By leveraging the cloud (Google Cloud Platform), we were able to significantly improve how data was used by revenue teams.
The team’s largest project, the Large Data Collider (LDC), completely transformed how analytic data was aggregated and queried across Twitter and continues to be used today at massive scales.
I was the tech lead for the team (15+ developers at its peak), coordinating the work and acting as the largest individual contributor to the development of LDC and the multi-year migration of several legacy systems onto the new platform.
Projects
Large Data Collider (LDC)
LDC is a platform that handles aggregating and querying analytic data. It is used at Twitter to power most user- and advertiser-facing analytics, such as those for tweets, ads, and videos. LDC supports most common aggregation functions (sum, min, max) as well as more complex probabilistic aggregations such as unique counts via HyperLogLog and TopK via frequency sketches.
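As a toy illustration of the aggregation model (plain Scala with hypothetical field names, not LDC's actual engine), the core idea is grouping events by their dimensions and folding each group with the configured aggregations:

```scala
// Illustrative only -- not LDC code. Event/Agg shapes and field names are hypothetical.
case class Event(advertiserId: Long, country: String, impressions: Long, spendMicros: Long)
case class Agg(impressions: Long, spend: Long, maxSpend: Long) {
  def add(e: Event): Agg =
    Agg(impressions + e.impressions, spend + e.spendMicros, maxSpend max e.spendMicros)
}

// Group events by their dimensions, then fold each group with sum/max aggregations.
def rollup(events: Seq[Event]): Map[(Long, String), Agg] =
  events
    .groupBy(e => (e.advertiserId, e.country))
    .map { case (dims, es) => dims -> es.foldLeft(Agg(0L, 0L, 0L))(_ add _) }
```

LDC performs this kind of rollup continuously over streaming input and serves the results from the storage backends listed below.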
The transition from ad-hoc analytics to LDC allowed product teams to build scalable analytics into their products quickly, without needing to worry about the underlying complexity. In our ads analytics product, for example, teams could add new dimensions and aggregations end to end in hours, rather than the weeks or months it previously took.
At Twitter, LDC aggregates on the order of tens of millions of incoming events per second, and serves millions of incoming analytic queries per second over petabytes of aggregate data at extremely low latency (<100ms p99).
LDC supports multiple storage backends such as Druid, Bigtable, BigQuery, and CockroachDB.
I designed the system, implemented most major parts of it, and built a team of 15+ developers around it. Additionally, I evangelized it within Twitter to drive adoption both inside and outside our organization, growing it from just a few revenue datasets to 20+.
Beam/Dataflow
A major portion of the work on LDC was revamping the streaming and batch data processing story at Twitter. In order to do the processing that LDC required in an efficient and cost effective way, we needed a new, more sophisticated data processing engine. In 2017 I began evaluating several technologies and chose Beam (running on Google Dataflow) as the system to build our aggregation engine on.
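For context on the programming model, below is a minimal word-count-style Beam pipeline sketched with the open-source Scio Scala API (this is the public Scio library, not Twitter's internal Beam libraries; the input/output arguments are placeholders):

```scala
import com.spotify.scio._

// A minimal Beam pipeline via Scio: read text, count words, write results.
// The runner (DirectRunner, DataflowRunner, etc.) is chosen via standard Beam options.
object WordCount {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)
    sc.textFile(args("input"))
      .flatMap(_.split("""\s+""").filter(_.nonEmpty))
      .countByValue
      .map { case (word, count) => s"$word\t$count" }
      .saveAsTextFile(args("output"))
    sc.run()
  }
}
```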
Over the next several years, I evangelized Beam/Dataflow within Twitter, driving significant adoption across the entire company. By 2022 we had grown to 600+ pipelines running on Beam across Twitter.
As mentioned above, I, along with senior management, led an effort to move our team from the Revenue org to the Platform org to better drive Beam adoption across the company.
As part of working on Beam, I started, and was the largest contributor to, our internal Beam libraries (with 20+ contributors within Twitter), and became a committer on the OSS Beam project. My work on the Dataflow runner (in our fork) around monitoring and optimization allowed us to get significantly more performance out of it than the stock runner, leading to significant cost savings.
Many of the optimizations were then merged upstream to the OSS runner.
The transition from Scalding (batch) or Heron (streaming) to Beam unlocked multiple significant advantages for product teams. Beam is significantly more reliable and scalable than the existing solutions, allowing teams to focus on delivering business value rather than carrying operational burden. Additionally, many jobs developed in Beam simply could not run on Scalding due to scale. Being able to run these massive jobs allowed many teams to innovate in new ways and develop products that would not have been possible previously.
finagle-grpc
Twitter uses Finagle for almost all RPCs. The library supports various protocols via an easily extensible model.
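A protocol implementation in Finagle ultimately surfaces as a Service[Req, Rep]; the sketch below shows that model with the standard open-source Finagle HTTP API (not the finagle-grpc code itself):

```scala
import com.twitter.finagle.{Http, Service}
import com.twitter.finagle.http.{Request, Response}
import com.twitter.util.{Await, Future}

object FinagleHttpSketch {
  // Every Finagle protocol exposes servers and clients as Service[Req, Rep];
  // a gRPC binding follows the same shape with gRPC request/response types.
  val hello: Service[Request, Response] = new Service[Request, Response] {
    def apply(req: Request): Future[Response] = {
      val rep = Response()
      rep.contentString = "hello"
      Future.value(rep)
    }
  }

  def main(args: Array[String]): Unit = {
    val server = Http.serve(":8080", hello)         // the server wraps a Service
    val client = Http.newService("localhost:8080")  // the client is also a Service
    println(Await.result(client(Request("/"))).contentString)
    Await.result(server.close())
  }
}
```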
I collaborated with Vladimir from the CSL team to build a gRPC client and server on top of Finagle. Both are fully compatible with “standard” gRPC clients/servers and use the standard generated protobuf stubs.
The introduction of finagle-grpc unlocked the ability for engineers at Twitter to much more easily interoperate with gRPC servers in key areas such as ML and Big Data and was quickly adopted across multiple teams.
JVM Runtime Code-Generation
A common performance challenge at Twitter was efficiently serializing and deserializing data based on a schema. While ahead-of-time compiled options such as Scrooge worked well, we frequently needed more flexibility than statically compiled schemas could provide. Additionally, for formats such as Apache Avro, it is essentially impossible to compile a reader for a given schema ahead of time.
My solution to the performance problem was to build a runtime code generation library that could be used to build readers and writers from schemas dynamically at runtime. Once that library was in place, I could quickly build code generators for various formats.
While the JVM has a few good code generation libraries, such as ASM and ByteBuddy, they are very low level and difficult to use without deep knowledge of the JVM. My library instead targets a much higher-level abstraction: rather than building up sequences of JVM opcodes and stack manipulations, developers compose expression trees.
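A heavily simplified sketch of the idea (a hypothetical expression-tree ADT, not the internal library's actual API):

```scala
// Hypothetical, minimal expression-tree ADT for illustration only.
sealed trait Expr
case class Const(value: Any)                                    extends Expr
case class Local(name: String)                                  extends Expr
case class Call(target: Expr, method: String, args: List[Expr]) extends Expr
case class Block(exprs: List[Expr])                             extends Expr

// A reader for a record schema is assembled as a tree of "read field" calls,
// which the library would then compile down to JVM bytecode behind the scenes.
def readerFor(fieldNames: List[String]): Expr =
  Block(fieldNames.map(f => Call(Local("decoder"), "read" + f.capitalize, Nil)))
```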
This library was very successful and is leveraged heavily at Twitter for the Avro and Thrift formats. For Avro, for example, our reader built on runtime code generation performs over 5x faster than the OSS one, resulting in massive performance and cost savings.
Data Format Optimization
Twitter uses Scrooge and Apache Avro on the JVM extensively for at-rest data storage.
Using the runtime code-gen library above, I built optimized codecs for both Avro and Scrooge, both of which operate on schemas (native Avro schemas for Avro, and Schemer schemas for Thrift). The Avro codec is 5x faster than the native Avro implementation, and the Scrooge codec is 10x faster.
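For reference, the stock (interpreted) Avro decode path that the optimized codec replaces looks roughly like this, using the standard Apache Avro Java API called from Scala:

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

// Stock Avro decoding: GenericDatumReader walks the schema for every record,
// which is the per-record interpretation overhead that runtime code-gen removes.
def decode(schema: Schema, bytes: Array[Byte]): GenericRecord = {
  val reader  = new GenericDatumReader[GenericRecord](schema)
  val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
  reader.read(null, decoder)
}
```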
These codecs are widely used at Twitter, providing a significant performance and cost benefit when used.
–
Software Engineer at TellApart
TellApart was an ad-tech company doing Dynamic Product Advertising on the major ad exchanges. It was acquired by Twitter in 2015. I worked on the team responsible for running AWS automation, Aurora/Mesos, and monitoring infrastructure.
Projects
Aurora/Mesos
The TellApart infrastructure ran across thousands of AWS instances in multiple zones. Previously, developers deployed their changes across the fleet using a variety of bespoke scripts. I led the rollout of Aurora/Mesos to standardize deployments and homogenize compute resources.
I spearheaded an effort in the OSS Aurora community to add support for Docker containers to Aurora, working closely with the Aurora community and eventually becoming a member of the Aurora PMC.
Scales
TellApart used Finagle extensively (specifically ThriftMux) for JVM services (such as the exchange bidders), but also used Python for a significant number of services. The use of ThriftMux made RPC between JVM and Python services difficult, since there was no first-class support for it in Python.
I designed and implemented an RPC stack in Python called Scales that fully implemented the ThriftMux protocol, along with service discovery, request cancellation, client-side load balancing, and observability.
Scales was used successfully at scale, generating millions of client QPS across services at TellApart and greatly increasing the reliability of those RPCs compared to the existing Thrift libraries.
–
Software Engineer at Blackstone
Blackstone is an Alternative Asset Management firm. I joined in 2011 with the new CTO and worked to form a brand new software engineering team from the ground up.
Projects
BXAccess
BXAccess is the investor-facing portal for Blackstone. Investors could log in to access documents and view their portfolios. I led the team and designed and implemented most of the platform’s major functionality.
Niagara
Niagara is the platform that calculates the per-LP (and GP) returns for most Blackstone funds. Accountants enter transactions into it which feed into various configurable calculations. Accountants can run ad-hoc reports from it, or generate pre-configured reports such as quarterly investor statements.
The design of Niagara was a significant departure from previous iterations of the accounting system. Rather than hard-coded calculations, the entire system was designed from the ground up to be completely dynamic, with all calculations stored in a configuration database. Users could change, and more importantly, view and audit every data point calculated by the system.
I led the team, working across the stack from the front-end UI (HTML/JS) to the backend APIs and database. I took over the project in the middle of its execution, and under my leadership we turned it around, quickly iterating from a non-working, hard-coded prototype to onboarding over 10 funds with dynamic calculations in less than a year.
Core Infrastructure
I ran the core infrastructure library team, building most of the libraries and infrastructure automation (CI/CD, deployment, etc.) and staffing a team around it.
This included libraries for:
- RPC (WCF)
- Data Access
- Logging
- Authentication / Authorization
–
Software Engineer at CapitalIQ
CapitalIQ (now S&P Capital IQ) is a financial research platform. I joined in 2007 and worked on various projects.
Projects
Transcripts
I, along with a product manager, built and launched a global transcription product in 2010 from the ground up. The product allowed users of the platform to view earnings calls being transcribed in real-time.
I teamed up with the product manager, as well as a professional transcriptionist, to staff a large (100+ person) team of transcriptionists who would transcribe calls in real time. This included sourcing an office abroad, building an IT team, and handling the logistics of getting the team the equipment it needed to bootstrap.
I evaluated the technology of multiple professional transcription products, meeting with their developers to determine the feasibility of integrating them with the platform we wanted to build. I built the realtime transcript ingestion platform that would power this product, as well as the prototype of its front end. I then staffed up a team to help build out the collection tooling and the front end.
At peak, the system handled hundreds of calls running concurrently.
Infrastructure
I worked with a team of 3 other senior developers on the infrastructure and production support team. This involved building deployment automation, CI/CD, and core libraries.
One particularly interesting project I worked on as part of this team was an automatic thread-dumper service for ASP.NET. It allowed near-zero-pause thread stack collection in .NET applications and let us track down various application stalls, reducing the p99 latency of commonly accessed areas of the website by as much as 90%. I wrote more about this and eventually expanded its scope into a full debugging library.
Read more here
Publications
Modernizing Twitter's ad engagement analytics platform by Google
How Twitter Migrated its On-Prem Analytics to Google Cloud by Google Cloud NEXT
Visualizing Cloud Bigtable Access Patterns at Twitter for Optimizing Analytics (Cloud Next '18) by Google Cloud NEXT
Imagine There's No Server by TellApart
Using Docker, Aurora, and Mesos to deploy applications at TellApart
Education
–
Bachelor’s degree in Computer Science from Rensselaer Polytechnic Institute
Skills
- Scala (Master)
- Java (Master)
- JVM (Master)
- Python (Intermediate)
- RPC (Master)
- Data Formats (Master)
- Databases (Master)
- Data Processing (Master)