I like to hang out at the bleeding edge of open source and big data.
Day to day, I'm working on using data to save the world at Tesla.
In my free time I'm also an Ironman triathlete, yogi, foodie and amateur board game maker.
Low-volume data pipelines in Kafka tend to get stuck; there is more data to process but your consumers aren’t moving forward. And it’s not because you are doing anything wrong in your application. In fact, it’s part of the design! (read more)
After my Scaling a Kafka Consumer post, it only seemed fair to take a dive into the producer side of the world too. It has its own set of problems and tuning fun that we can dive into. (read more)
When scaling up Kafka consumers, particularly when dealing with a large number of partitions across a number of topics, you can run into some unexpected bottlenecks. They get even worse when dealing with geographically remote clusters. The defaults will get you surprisingly far, but then you are...(read more)
The easiest way to set up basic authentication with Kafka is to use x509 certificates. However, getting these certificates into a place where they are actually usable by your Kafka client can be frustrating and error-prone. That is why we recently released a ...(read more)
In my recent talk at Kafka Summit I mentioned that users don’t think in offsets, but rather in amounts of time - minutes, hours - that a consumer is behind. When you say, “We might have pr...(read more)
If you attended Kafka Summit, or followed along on Twitter, you probably heard many people mentioning that you really, really ought to upgrade your Kafka installation. No surprise, it will often fix many obscure bugs (aka those you are guaranteed to hit at scale), while increasing performance and ...(read more)
Looking to improve your efficiency with Git? Learn the secrets to go from novice to master to wizard. Not only that, but mastering Git can make life significantly easier and faster - every day. (read more)
Reliably scaling and managing streaming ingest - particularly when dealing with IoT - is a challenging problem. Not only do you have to be low latency, correct and high volume, you also get huge messages and bursty devices. On top of that, firmware developers have their own goals and are not optim...(read more)
As your Maven projects get larger it can take a non-trivial amount of time to complete a build, particularly if you are running each module in sequence due to code limitations. In this case, you should think about only having to build code that changes and its downstream dependencies. (read more)
You are starting to move away from your Hadoop vendor - it was great for getting started, but you want to control your own destiny, reap huge cost savings or institute advanced management. Once you start managing your own Hadoop cluster there are many metrics you will need to start collecting and...(read more)
Many legacy build pipelines leverage Jenkins. If you get lucky, you will at least find the time to move to a Jenkinsfile - the same power as Jenkins, but now actually codified, rather than fragile point and click. (read more)
I’m excited to announce that I’ll be starting a new job on Monday… at Tesla Motors! They have a mission that is incredibly exciting - nothing less than trying to save the world. With lots of opportunity and potential for impact, I can’t wait to get started. (read more)
I’ve decided it’s time to wrap up Fineo. I took a shot for a while (nearly two years!), but I’m way past my original time deadline to get traction and well out of (allocated) money. I’ve spent the last few weeks writing up some of the interesting architecture/design work I did, so at leas...(read more)
Getting a robust continuous-integration (CI) suite was an early priority at Fineo. By spending some upfront time getting good infrastructure in place we could move dramatically faster down the road; with a distributed, micro/nano-service based architecture, in-depth testing across the stack is a ...(read more)
A short break from the Fineo architecture. Recently I got to thinking about self-improving systems. Specifically, I liked the idea of a self-improving system where the actors in and around the system are incentivized to continually improve the system. This makes sense in the context of a ...(read more)
Passing pipeline processing errors back to the user was not originally built into the Fineo platform (a big oversight). However, we managed to add support for it over only a couple of weeks. Moreover, we were able to make it feel basically seamless with the existing platform. (read more)
Fineo’s architecture is designed to help people go faster, while having to do less by leveraging our NextSQL system. At the surface, it’s not that much different from the Lambda Architecture - a realtime serving layer and an offline batch processi...(read more)
A deeper look into how Fineo manages its seamless scalability across the multi-layer architecture. By enabling each layer to scale independently and leaning on existing, fully-managed services we can enable wildly scalable infrastructure without notably increasing operations effort (and often dec...(read more)
Time is the major component of IoT data storage. You have to be able to quickly traverse time when doing any useful operation on IoT data (in essence, IoT data is just a bunch of events over time). (read more)
Multi-tenancy is an abstraction for a big, hard group of problems that touches on security, scalability, resource consumption and quality of service. Generally, attempting to back-fit multi-tenancy is, at best, hacky and less than satisfying; at worst, it’s a recipe for disaster. (read more)
Fineo uses a novel semi-schemaful approach to unlock the potential of NoSQL data stores, while simultaneously enabling ‘metalytics’ queries by providing an engine that seamlessly supports everything from nearline, operational queries (e.g. low latency, small scale) to deep, ad-hoc analytics. Prim...(read more)
A rehash of Dynamic Schema at Scale, covering how we implemented reading semi-schematized data from a NoSQL store. I touch on query translation and the processing of schema updates. (read more)
With Fineo’s Beta availability, I thought it would be interesting to look at how Fineo actually supports IoT-scale ingest and eliminates the need for traditional pipelines and the mainta...(read more)
What’s new and interesting? What’s worth focusing on? It’s worth taking a step back and looking at the larger picture. It helps make sure you are doing the right thing, for the right reasons. Doing that every day means nothing gets done, but too infrequently means getting lost in the weeds and, pe...(read more)
For years, hobbyists have been hacking their homes to create smarter parts that respond to their every whim. Smart homes were thrust even further into the public consciousness with the first Iron Man (2008) movie. Suddenly, everyone wanted self-tinting, weather-informing windows or a home assistant ...(read more)
Schema management is some of the most painful database work and anything you can do to make it easier can dramatically reduce an enterprise’s iteration interval. At Fineo we are focused on delivering a scalable, enterprise-grade time-series platform. While we do lot...(read more)
I’d like to talk a bit about the AWS-focused ingest pipeline that we developed at Fineo. Not too ironically, it’s very similar to the pipeline that Netflix discussed in a recent ...(read more)
Picking the right tools for deployment can be tricky and have long-lasting effects on your organization. Over at Fineo we have the luxury of doing everything from scratch. This means no concern with legacy tools, monitoring, or just cruft. Instead, we have the op...(read more)
Updating the layout/look and feel of the site a little bit. Please let me know if things no longer work for you! (read more)
Packaging software is a necessary evil, and for enterprise software RPMs even more so, but you might as well find out how to manage it when you really just want to do all your work from your Mac.(read more)
Don't know if you noticed, but there may have been a bunch of RSS thrash from updates to my Maven Shade post. Jekyll had updated some dependencies(read more)
Sometimes, you will want to run multiple versions of the same library in the same JVM. Maybe you are writing a framework to run arbitrary user code or maybe you are just integrating with legacy code; either way, you will need a way to run both versions of some library in the same JVM - enter the ...(read more)
I've had to explain different parts of my bike setup a couple different times, so I thought I would do the end-to-end writeup with the 'whys' (the important part) of each component(read more)
Gradle is starting to become mature enough to be used as a viable build system. However, when trying to build with Gradle, a few easy idioms can help you up the learning curve(read more)
How do you manage realtime queries and analytics over the same logical data?(read more)
Ironman Cozumel 2014 was a heck of a race. A 2.4mi swim half into the current, 112mi bike ride with brutal headwinds and a marathon. Together, it was enough to put me into the hospital.(read more)
Heads up (and links) for my upcoming Ironman race(read more)
Secondary indexing for HBase is a difficult problem, but remains perennially popular. Various implementations exist, but all fall short either in features or latency. [Phoenix](http://www.github.com/forcedotcom/phoenix) is soon gaining support for what we consider "HBase Consistent" Secondary ind...(read more)
I was recently asked to write a few guest blog posts about HBase. I'd had some ideas bouncing around for a while (and a little personal brand expansion is never a bad thing), so I started working on it. Here are some of my thoughts from the experience(read more)
In Java 1.6_34, support for rolling GC logs was added. However, the documentation is wrong and hard to find. Here's how you manage it(read more)
Distributed systems inherently trade off consistency or availability, making it very difficult (but doable) to implement secondary indexes at scale(read more)
How to use the new table references in the HBase shell(read more)
Of late, I've been really busy - let me explain.(read more)
How to develop for HBase within Eclipse(read more)
A lot of this post is based on my recent discussions with a few companies - both big and small - who are attempting to 'change the paradigm' either of society in general,(read more)
This is the point where Jobs fell out of love with the style of cool, dark colors and more in favor of the style of many of Apple's(read more)
Yesterday I was talking with a potential future boss about what I was looking for in my next position(read more)
Some general tips and tricks for using Vagrant and Chef. Don't know what I'm talking about? I'll explain the latest in VM coolness too.(read more)
Just go and do it. The worst case is your life stays exactly the same.(read more)
Culvert is a secondary indexing platform for BigTable, which means it provides everything you need to write indexes(read more)
BigTable is an amazingly scalable system, but has some missing features.(read more)
First post! Welcome to my blog/about me...(read more)