About Me

I like to hang out at the bleeding edge of open source and big data.

Day to day, I'm working on using data to save the world at Tesla.

In my free time, I'm also an Ironman triathlete, yogi, foodie, and amateur board game maker.

Blog Posts

  • 21 Dec 2020 » Stuck on Kafka

    Low volume data pipelines in Kafka tend to get stuck; there is more data to process but your consumers aren’t moving forward. And it’s not because you are doing anything wrong in your application. In fact, it’s part of the design! (read more)

  • 01 Jan 2020 » High Performance Kafka Producers

    After my Scaling a Kafka Consumer post, it only seemed fair to take a dive into the producer side of the world too. It’s got its own set of problems and tuning fun that we can dive into. (read more)

  • 04 Dec 2019 » Vertically scaling Kafka consumers

    When scaling up Kafka consumers, particularly when dealing with a large number of partitions across a number of topics, you can run into some unexpected bottlenecks. They get even worse when dealing with geographically remote clusters. The defaults will get you surprisingly far, but then you are...(read more)

  • 18 Nov 2019 » New (Open Source!) Tooling: Kafka Keystore Building

    The easiest way to set up basic authentication with Kafka is to use x509 certificates. However, getting these certificates into a place where they are actually usable by your Kafka client can be frustrating and error prone. That is why we recently released a ...(read more)

  • 04 Nov 2019 » A guide to Kafka Consumer Freshness

    In my recent talk at Kafka Summit I mentioned that users don’t think in offsets, but rather in amounts of time - minutes, hours - that a consumer is behind. When you say, “We might have pr...(read more)

  • 04 Oct 2019 » Kafka Upgrade Validation

    If you attended Kafka Summit, or followed along on Twitter, you probably heard many people mentioning that you really really ought to upgrade your Kafka installation. No surprise, it often will fix many obscure bugs (aka those you are guaranteed to hit at scale), while increasing performance and ...(read more)

  • 28 Apr 2019 » From Git Noob to Wizard in 5 minutes

    Looking to improve your efficiency with Git? Learn the secrets to go from novice to master to wizard. Not only that, but it can make life significantly easier and faster - every day. (read more)

  • 07 Apr 2019 » Just Right Parallelism in Akka Streams

    Reliably scaling and managing streaming ingest - particularly when dealing with IoT - is a challenging problem. Not only do you have to be low latency, correct, and able to handle high volumes, you also get huge messages and bursty devices. On top of that, firmware developers have their own goals and are not optim...(read more)

  • 09 Mar 2019 » Partial Multi-module Maven builds for pull requests

    As your Maven projects get larger, it can take a non-trivial amount of time to complete a build, particularly if you are running each module in sequence due to code limitations. In this case, you should think about only building the code that changes and its downstream dependencies. (read more)

  • 03 Mar 2019 » HDFS Block Metrics - Missing vs Corrupt

    You are starting to move away from your Hadoop vendor - it was great for getting started, but you want to control your own destiny, reap huge savings, or institute advanced management. Once you start managing your own Hadoop cluster there are many metrics you will need to start collecting and...(read more)

  • 16 Sep 2018 » Dockerizing Jenkins Maven builds

    Many legacy build pipelines leverage Jenkins. If you get lucky, you will at least find the time to move to a Jenkinsfile - the same power as Jenkins, but now actually codified, rather than fragile point and click. (read more)

  • 30 Jul 2017 » Starting at Tesla

    I’m excited to announce that I’ll be starting a new job on Monday… at Tesla Motors! They have a mission that is incredibly exciting - nothing less than trying to save the world. With lots of opportunity and potential for impact, I can’t wait to get started. (read more)

  • 05 Jun 2017 » Hard isn't Valuable: Looking back on Fineo

    I’ve decided it’s time to wrap up Fineo. I took a shot for a while (nearly two years!), but I’m way past my original time deadline to get traction and well out of (allocated) money. I’ve spent the last few weeks writing up some of the interesting architecture/design work I did, so at leas...(read more)

  • 22 May 2017 » Building Up Fineo's Continuous Integration with Jenkins

    Getting a robust continuous-integration (CI) suite was an early priority at Fineo. By spending some upfront time getting good infrastructure in place we could move dramatically faster down the road; with a distributed, micro/nano-service based architecture, in-depth testing across the stack is a ...(read more)

  • 18 May 2017 » Interlude: Self-Improving Architecture and Design

    A short break from the Fineo architecture. Recently I got to thinking about self-improving systems. Specifically, I liked the idea of a self-improving system where the actors in and around the system are incentivized to continually improve the system. This makes sense in the context of a ...(read more)

  • 17 May 2017 » Handling Errors in Fineo

    Passing pipeline processing errors back to the user was not originally built into the Fineo platform (a big oversight). However, we managed to add support for it over only a couple of weeks. Moreover, we were able to make it feel basically seamless with the existing platform. (read more)

  • 15 May 2017 » Supporting Schema Evolution and Addition in Fineo

    Fineo’s architecture is designed to help people go faster, while having to do less by leveraging our NextSQL system. On the surface, it’s not that much different from the Lambda Architecture - a realtime serving layer and an offline batch processi...(read more)

  • 12 May 2017 » Scaling Out Fineo

    A deeper look into how Fineo manages its seamless scalability across the multi-layer architecture. By enabling each layer to scale independently and leaning on existing, fully-managed services we can enable wildly scalable infrastructure without notably increasing operations effort (and often dec...(read more)

  • 10 May 2017 » Using DynamoDB for Time Series Data

    Time is the major component of IoT data storage. You have to be able to quickly traverse time when doing any useful operation on IoT data (in essence, IoT data is just a bunch of events over time). (read more)

  • 08 May 2017 » Multi-tenant SQL Security In-Depth

    Multi-tenancy is an abstraction for a big, hard group of problems that touches on security, scalability, resource consumption and quality of service. Generally, attempting to back-fit multi-tenancy is, at best, hacky and less than satisfying; at worst, it’s a recipe for disaster. (read more)

  • 05 May 2017 » Translating SQL queries for schema on NoSQL

    Fineo uses a novel semi-schemaful approach to unlock the potential of NoSQL data stores, while simultaneously enabling ‘metalytics’ queries by providing an engine that seamlessly supports everything from nearline, operational queries (e.g. low latency, small scale) to deep, ad-hoc analytics. Prim...(read more)

  • 02 May 2017 » Implementing Dynamic Schema At Scale

    A rehash of the Dynamic Schema at Scale post, covering how we implemented reading semi-schematized data from a NoSQL store. I touch on query translation and the process of updating schema. (read more)

  • 01 May 2017 » Scaling up for an IoT World

    With Fineo’s Beta availability (link), I thought it would be interesting to look at how Fineo actually supports IoT-scale ingest and eliminates the need for traditional pipelines and the mainta...(read more)

  • 27 Apr 2017 » An investment thesis

    What’s new and interesting? What’s worth focusing on? It’s worth taking a step back and looking at the larger picture. It helps make sure you are doing the right thing, for the right reasons. Doing that every day means nothing gets done, but too infrequently means getting lost in the weeds and, pe...(read more)

  • 30 Mar 2016 » Alexa as an API makes smart homes a reality

    For years, hobbyists have been hacking their homes to create smarter parts that respond to their every whim. Smart homes were thrust even further into the public consciousness with the first Iron Man (2008) movie. Suddenly, everyone wanted self-tinting, weather-informing windows or a home assistant ...(read more)

  • 09 Mar 2016 » Dynamic, Lazy Schema at Scale

    Schema management is some of the most painful database work, and anything you can do to make it easier can dramatically reduce an enterprise’s iteration interval. At Fineo we are focused on delivering a scalable, enterprise grade time-series platform. While we do lot...(read more)

  • 28 Feb 2016 » Fineo Internals - Simpsons Did It

    I’d like to talk a bit about the AWS-focused ingest pipeline that we developed at Fineo. Not too ironically, it’s very similar to the pipeline that Netflix discussed in a recent ...(read more)

  • 26 Oct 2015 » Choosing Hadoop Deployment tooling

    Picking the right tools for deployment can be tricky and have long-lasting effects on your organization. Over at Fineo we have the luxury of doing everything from scratch. This means no concern with legacy tools, monitoring, or just cruft. Instead, we have the op...(read more)

  • 02 Oct 2015 » Did some prettyfying

    Updating the layout/look and feel of the site a little bit. Please let me know if things no longer work for you! (read more)

  • 21 Aug 2015 » Building RPMs from Maven projects on OSX

    Packaging software is a necessary evil, and for enterprise software RPMs even more so, but you might as well find out how to manage it when you really just want to do all your work from your Mac.(read more)

  • 20 Aug 2015 » Sorry for the blog thrash

    Don't know if you noticed, but there may have been a bunch of RSS thrash from updates to my Maven Shade post. Jekyll had updated some dependencies(read more)

  • 17 Aug 2015 » Using Maven Shade to Run Multiple Versions in a JVM

    Sometimes, you will want to run multiple versions of the same library in the same JVM. Maybe you are writing a framework to run arbitrary user code or maybe you are just integrating with legacy code; either way, you will need a way to run both versions of some library in the same JVM - enter the ...(read more)

  • 13 Aug 2015 » How to set up your tri bike for a fast, smooth race

    I've had to explain different parts of my bike setup a couple different times, so I thought I would do the end-to-end writeup with the 'whys' (the important part) of each component(read more)

  • 18 Jun 2015 » Dev Tip- Using Gradle without hating it

    Gradle is starting to become mature enough to be used as a viable build system. However, when trying to build with Gradle there are some easy idioms that help you up the learning curve(read more)

  • 23 Apr 2015 » Scalable Real Time Query

    How do you manage realtime queries and analytics over the same logical data?(read more)

  • 06 Dec 2014 » Ironman Cozumel 2014 Race Report

    Ironman Cozumel 2014 was a heck of a race. A 2.4mi swim half into the current, 112mi bike ride with brutal headwinds and a marathon. Together, it was enough to put me into the hospital.(read more)

  • 17 Nov 2014 » Ironman Cozumel 2014

    Heads up (and links) for my upcoming Ironman race(read more)

  • 11 Jun 2013 » HBase Consistent Secondary Indexing

    Secondary indexing for HBase is a difficult problem, but remains perennially popular. Various implementations exist, but all fall short either in features or latency. [Phoenix](http://www.github.com/forcedotcom/phoenix) is soon gaining support for what we consider "HBase Consistent" Secondary ind...(read more)

  • 01 Dec 2012 » Guest Blogging

    I was recently asked to write a few guest blog posts about HBase. I'd had some ideas bouncing around for a while (and a little personal brand expansion is never a bad thing), so I started working on it. Here are some of my thoughts from the experience(read more)

  • 05 Nov 2012 » Rolling Java GC Logs

    In Java 1.6_34, support for rolling GC logs was added. However, the documentation is wrong and hard to find. Here's how you manage it(read more)

  • 09 Jul 2012 » Consistent Enough Secondary Indexing

    Distributed systems inherently trade off consistency against availability, making it very difficult (but doable) to implement secondary indexes at scale(read more)

  • 02 May 2012 » Table References in the HBase Shell

    How to use the new table references in the HBase shell(read more)

  • 01 May 2012 » Heads Down, Thumbs Up

    Of late, I've been really busy - let me explain.(read more)

  • 03 Feb 2012 » HBase Eclipse Support

    How to develop for HBase within Eclipse(read more)

  • 02 Jan 2012 » Building Big

    A lot of this post is based on my recent discussions with a few companies - both big and small - who are attempting to 'change the paradigm' either of society in general,(read more)

  • 10 Dec 2011 » High Tech Colors

    This is the point where Jobs fell out of love with the style of cool, dark colors and more in favor of the style of many of Apple's(read more)

  • 02 Dec 2011 » Technical Leadership

    Yesterday I was talking with a potential future boss about what I was looking for in my next position(read more)

  • 26 Nov 2011 » Vagrant + Chef - Tips and Tricks

    Some general tips and tricks for using Vagrant and Chef. Don't know what I'm talking about? I'll explain the latest in VM coolness too.(read more)

  • 22 Nov 2011 » The Worst Case

    Just go and do it. The worst case is your life stays exactly the same.(read more)

  • 17 Nov 2011 » Intro To Culvert

    Culvert is a secondary indexing platform for BigTable, which means it provides everything you need to write indexes(read more)

  • 16 Nov 2011 » Filling in the BigTable Gaps

    BigTable is an amazingly scalable system, but has some missing features.(read more)

  • 11 Nov 2011 » Welcome to My Blog

    First post! Welcome to my blog/about me...(read more)