Steve Niemitz
Software Engineer
Over 15 years of experience in massive-scale distributed systems, databases, storage formats, and low-level optimization. I’ve had a diverse career, leading teams and projects across finance, ad-tech, and social media.
- Location: Brooklyn, New York, United States
- Email: steve@niemi.tz
- Website: https://steve.niemi.tz
- steveniemitz
- Steve Niemitz
Experience
– present
Senior Staff Engineer at Twitter
With 8 years of experience on multiple teams at Twitter, I’ve had the unique opportunity to see the impact of my decisions and designs over long time periods. I’ve designed, built, and led teams for multiple systems that push the limits of the industry. My ability to build deep cross-organizational relationships over that time allowed me to positively transform multiple organizations and improve large portions of Twitter’s tech stack.
Key Roles
– present
Realtime Platform Tech Lead
Twitter significantly changed priorities in late 2022, and with those changes came a new team structure for Data Platform. Our focus shifted primarily to on-prem stream processing, with me leading a team of ~20 engineers working on related projects.
The largest ongoing project is running Flink on-prem, primarily as a runner for Apache Beam.
I developed an MVP, implemented most major components, and built a team around executing the work needed to run Flink in the Twitter environment on-prem. Since Twitter primarily uses Aurora and Mesos (rather than k8s), I built an Aurora resource provider for Flink to better integrate it with the Twitter ecosystem.
Currently (as of early 2023) there are multiple teams evaluating the Flink environment for their use-cases.
–
Data Processing Tech Lead
In late 2020 we realized that our team (Revenue Data Platform) had an opportunity to drive significant impact across more than just our org. I, along with Revenue and Platform leadership (Director+), drafted a proposal to merge our team with the existing Heron (streaming) and Scalding (batch) platform teams.
The merged team would have the mandate to drive next-gen data processing at Twitter using Beam (and Dataflow).
I was the tech lead for the overall team (20+ developers at its peak), coordinating the efforts of different sub-teams to drive our goal of Beam adoption. Over the ~2 years of running the team, Beam/Dataflow usage grew from ~20 to 600+ batch and streaming jobs running on Dataflow across the company.
–
Revenue Data Platform Tech Lead
Revenue Data Platform evolved from lessons learned working on TC⚡DC and was formed in 2017 with the goal of modernizing the revenue data stack at Twitter. By leveraging the cloud (Google Cloud Platform), we were able to significantly improve how data was used by revenue teams.
The team’s largest project, the Large Data Collider (LDC), completely transformed how analytic data was aggregated and queried across Twitter and continues to be used today at massive scales.
I was the tech lead for the team (15+ developers at its peak), coordinating the work and acting as the largest individual contributor to the development of LDC and the multi-year migration of several legacy systems onto the new platform.
Projects
Large Data Collider (LDC)
LDC is a platform that handles aggregating and querying analytic data. It is used at Twitter to power most user- and advertiser-facing analytics, such as those for tweets, ads, and videos. LDC supports most common aggregation functions (sum, min, max) as well as more complex probabilistic aggregations such as unique counts via HyperLogLog and TopK via frequency sketches.
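As a toy illustration of the aggregation model (plain Scala with hypothetical field names, not LDC's actual engine), the core idea is grouping events by their dimensions and folding each group with the configured aggregations:

```scala
// Illustrative only -- not LDC code. Event/Agg shapes and field names are hypothetical.
case class Event(advertiserId: Long, country: String, impressions: Long, spendMicros: Long)
case class Agg(impressions: Long, spend: Long, maxSpend: Long) {
  def add(e: Event): Agg =
    Agg(impressions + e.impressions, spend + e.spendMicros, maxSpend max e.spendMicros)
}

// Group events by their dimensions, then fold each group with sum/max aggregations.
def rollup(events: Seq[Event]): Map[(Long, String), Agg] =
  events
    .groupBy(e => (e.advertiserId, e.country))
    .map { case (dims, es) => dims -> es.foldLeft(Agg(0L, 0L, 0L))(_ add _) }
```

LDC performs this kind of rollup continuously over streaming input and serves the results from the storage backends listed below.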
The transition from ad-hoc analytics to LDC allowed product teams to build scalable analytics into their products quickly, without needing to worry about the underlying complexity. In our ads analytics product, for example, teams could add new dimensions and aggregations end to end in hours, rather than the weeks or months it previously took.
At Twitter, LDC aggregates on the order of tens of millions of incoming events per second, and serves millions of incoming analytic queries per second over petabytes of aggregate data at extremely low latency (<100ms p99).
LDC supports multiple storage backends such as Druid, Bigtable, BigQuery, and CockroachDB.
I designed the system, implemented most major parts of it, and built a team of 15+ developers around it. Additionally, I evangelized it within Twitter to drive adoption both inside and outside our organization, growing it from just a few revenue datasets to 20+.
Beam/Dataflow
A major portion of the work on LDC was revamping the streaming and batch data processing story at Twitter. In order to do the processing that LDC required in an efficient and cost effective way, we needed a new, more sophisticated data processing engine. In 2017 I began evaluating several technologies and chose Beam (running on Google Dataflow) as the system to build our aggregation engine on.
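For context on the programming model, below is a minimal word-count-style Beam pipeline sketched with the open-source Scio Scala API (this is the public Scio library, not Twitter's internal Beam libraries; the input/output arguments are placeholders):

```scala
import com.spotify.scio._

// A minimal Beam pipeline via Scio: read text, count words, write results.
// The runner (DirectRunner, DataflowRunner, etc.) is chosen via standard Beam options.
object WordCount {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)
    sc.textFile(args("input"))
      .flatMap(_.split("""\s+""").filter(_.nonEmpty))
      .countByValue
      .map { case (word, count) => s"$word\t$count" }
      .saveAsTextFile(args("output"))
    sc.run()
  }
}
```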
Over the next several years, I evangelized Beam/Dataflow within Twitter, driving significant adoption across the entire company. By 2022 we had grown to 600+ pipelines running on Beam across Twitter.
As mentioned above, I, along with senior management, led an effort to move our team from the Revenue org to the Platform org to better drive Beam adoption across the company.
As part of working on Beam, I started, and was the largest contributor to, our internal Beam libraries (with 20+ contributors within Twitter), and became a committer on the OSS Beam project. My work on the Dataflow runner (in our fork) around monitoring and optimization allowed us to get significantly more performance out of it than the stock runner, leading to significant cost savings.
Many of the optimizations were then merged upstream to the OSS runner.
The transition from Scalding (batch) or Heron (streaming) to Beam unlocked multiple significant advantages for product teams. Beam is significantly more reliable and scalable than the existing solutions, allowing teams to focus on delivering business value rather than carrying operational burden. Additionally, many jobs developed in Beam simply could not run on Scalding due to scale. Being able to run these massive jobs allowed many teams to innovate in new ways and develop products that would not have been possible previously.
finagle-grpc
Twitter uses Finagle for almost all RPCs. The library supports various protocols via an easily extensible model.
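A protocol implementation in Finagle ultimately surfaces as a Service[Req, Rep]; the sketch below shows that model with the standard open-source Finagle HTTP API (not the finagle-grpc code itself):

```scala
import com.twitter.finagle.{Http, Service}
import com.twitter.finagle.http.{Request, Response}
import com.twitter.util.{Await, Future}

object FinagleHttpSketch {
  // Every Finagle protocol exposes servers and clients as Service[Req, Rep];
  // a gRPC binding follows the same shape with gRPC request/response types.
  val hello: Service[Request, Response] = new Service[Request, Response] {
    def apply(req: Request): Future[Response] = {
      val rep = Response()
      rep.contentString = "hello"
      Future.value(rep)
    }
  }

  def main(args: Array[String]): Unit = {
    val server = Http.serve(":8080", hello)         // the server wraps a Service
    val client = Http.newService("localhost:8080")  // the client is also a Service
    println(Await.result(client(Request("/"))).contentString)
    Await.result(server.close())
  }
}
```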
I collaborated with Vladimir from the CSL team to build a gRPC client and server on top of Finagle. Both are fully compatible with “standard” gRPC clients/servers and use the standard generated protobuf stubs.
The introduction of finagle-grpc unlocked the ability for engineers at Twitter to much more easily interoperate with gRPC servers in key areas such as ML and Big Data and was quickly adopted across multiple teams.
JVM Runtime Code-Generation
A common performance challenge at Twitter was efficiently serializing and deserializing data based on a schema. While ahead-of-time compiled options such as Scrooge worked well, we frequently needed more flexibility than statically compiled schemas could provide. Additionally, for formats such as Apache Avro, it is essentially impossible to compile a reader for a given schema ahead of time.
My solution to the performance problem was to build a runtime code generation library that could be used to build readers and writers from schemas dynamically at runtime. Once that library was in place, I could quickly build code generators for various formats.
While the JVM has a few good code generation libraries, such as ASM and ByteBuddy, they are very low level and difficult to use without deep knowledge of the JVM. My library instead targets a much higher-level abstraction: rather than building up sequences of JVM opcodes and stack manipulations, developers compose expression trees.
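A heavily simplified sketch of the idea (a hypothetical expression-tree ADT, not the internal library's actual API):

```scala
// Hypothetical, minimal expression-tree ADT for illustration only.
sealed trait Expr
case class Const(value: Any)                                    extends Expr
case class Local(name: String)                                  extends Expr
case class Call(target: Expr, method: String, args: List[Expr]) extends Expr
case class Block(exprs: List[Expr])                             extends Expr

// A reader for a record schema is assembled as a tree of "read field" calls,
// which the library would then compile down to JVM bytecode behind the scenes.
def readerFor(fieldNames: List[String]): Expr =
  Block(fieldNames.map(f => Call(Local("decoder"), "read" + f.capitalize, Nil)))
```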
This library was very successful and is leveraged heavily at Twitter for the Avro and Thrift formats. For Avro, for example, our reader built on runtime code generation performs over 5x faster than the OSS one, resulting in massive performance and cost savings.
Data Format Optimization
Twitter uses Scrooge and Apache Avro on the JVM extensively for at-rest data storage.
Using the runtime code-gen library above, I built optimized codecs for both Avro and Scrooge, both of which operate on schemas (native Avro schemas for Avro, and Schemer schemas for Thrift). The Avro codec is 5x faster than the native Avro implementation, and the Scrooge codec is 10x faster.
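For reference, the stock (interpreted) Avro decode path that the optimized codec replaces looks roughly like this, using the standard Apache Avro Java API called from Scala:

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

// Stock Avro decoding: GenericDatumReader walks the schema for every record,
// which is the per-record interpretation overhead that runtime code-gen removes.
def decode(schema: Schema, bytes: Array[Byte]): GenericRecord = {
  val reader  = new GenericDatumReader[GenericRecord](schema)
  val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
  reader.read(null, decoder)
}
```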
These codecs are widely used at Twitter, providing a significant performance and cost benefit when used.
–
Software Engineer at TellApart
TellApart was an ad-tech company doing Dynamic Product Advertising on the major ad exchanges. It was acquired by Twitter in 2015. I worked on the team responsible for running AWS automation, Aurora/Mesos, and monitoring infrastructure.
Projects
Aurora/Mesos
The TellApart infrastructure ran across thousands of AWS instances in multiple zones. Previously, developers deployed their changes across the fleet using a variety of bespoke scripts. I led the rollout of Aurora/Mesos to standardize deployments and homogenize compute resources.
I spearheaded an effort in the OSS Aurora community to add support for Docker containers to Aurora, working closely with the Aurora community and eventually becoming a member of the Aurora PMC.
Scales
TellApart used Finagle extensively (specifically ThriftMux) for JVM services (such as the exchange bidders), but also used Python for a significant number of services. The use of ThriftMux made RPC between JVM and Python services difficult, since there was no first-class support for it in Python.
I designed and implemented an RPC stack in Python called Scales that fully implemented the ThriftMux protocol, along with service discovery, request cancellation, client-side load balancing, and observability.
Scales was used successfully at scale, generating millions of client QPS across services at TellApart and greatly increasing the reliability of those RPCs compared to the existing Thrift libraries.
–
Software Engineer at Blackstone
Blackstone is an Alternative Asset Management firm. I joined in 2011 with the new CTO and worked to form a brand new software engineering team from the ground up.
Projects
BXAccess
BXAccess is the investor-facing portal for Blackstone. Investors could log in to access documents and view their portfolios. I led the team and designed and implemented most of the platform’s major functionality.
Niagara
Niagara is the platform that calculates the per-LP (and GP) returns for most Blackstone funds. Accountants enter transactions into it which feed into various configurable calculations. Accountants can run ad-hoc reports from it, or generate pre-configured reports such as quarterly investor statements.
The design of Niagara was a significant departure from previous iterations of the accounting system. Rather than hard-coded calculations, the entire system was designed from the ground up to be completely dynamic, with all calculations stored in a configuration database. Users could change, and more importantly, view and audit every data point calculated by the system.
I led the team, working across the stack from the front-end UI (HTML/JS) to the backend APIs and database. I took over the project in the middle of its execution, and under my leadership we turned it around, quickly iterating from a non-working, hard-coded prototype to onboarding over 10 funds with dynamic calculations in less than a year.
Core Infrastructure
I ran the core infrastructure library team, building most of the libraries and infrastructure automation (CI/CD, deployment, etc.) and staffing a team around it.
This included libraries for:
- RPC (WCF)
- Data Access
- Logging
- Authentication / Authorization
–
Software Engineer at CapitalIQ
CapitalIQ (now S&P Capital IQ) is a financial research platform. I joined in 2007 and worked on various projects.
Projects
Transcripts
I, along with a product manager, built and launched a global transcription product in 2010 from the ground up. The product allowed users of the platform to view earnings calls being transcribed in real-time.
I teamed up with the product manager, as well as a professional transcriptionist, to staff a large (100+ person) team of transcriptionists who would transcribe calls in real time. This included sourcing an office abroad, building an IT team, and handling the logistics of getting the team the equipment it needed to bootstrap.
I evaluated the technology of multiple professional transcription products, meeting with their developers to determine the feasibility of integrating them with the platform we wanted to build. I built the realtime transcript ingestion platform that would power this product, as well as the prototype of its front end. I then staffed up a team to help build out the collection tooling and the front end.
At peak, the system handled hundreds of calls running concurrently.
Infrastructure
I worked with a team of 3 other senior developers on the infrastructure and production support team. This involved building deployment automation, CI/CD, and core libraries.
One particularly interesting project I worked on as part of this team was an automatic thread-dumper service for ASP.NET. It allowed near-zero-pause thread stack collection in .NET applications and let us track down various application stalls, reducing the p99 latency of commonly accessed areas of the website by as much as 90%. I wrote more about this and eventually expanded its scope into a full debugging library.
Read more here
Publications
Modernizing Twitter's ad engagement analytics platform by Google
How Twitter Migrated its On-Prem Analytics to Google Cloud by Google Cloud NEXT
Visualizing Cloud Bigtable Access Patterns at Twitter for Optimizing Analytics (Cloud Next '18) by Google Cloud NEXT
Imagine There's No Server by TellApart
Using Docker, Aurora, and Mesos to deploy applications at TellApart
Education
–
Bachelor’s degree in Computer Science from Rensselaer Polytechnic Institute
Skills
- Scala (Master)
- Java (Master)
- JVM (Master)
- Python (Intermediate)
- RPC (Master)
- Data Formats (Master)
- Databases (Master)
- Data Processing (Master)