In this third edition of Jampper Stories, our brilliant Senior Software Engineer, Claudio Freire, gives us a glimpse of the “tech” behind “adtech”, telling us a little about the technological challenges of programmatic Real-Time Bidding, the love of caching, the programming languages we use, the tools we make… Let’s dive right in! ⤵️
At the beginning of time (for me), I was working for a company that offered accounting services to the government. For those that haven’t had the pleasure(?), the level of bureaucracy and inertia you can find in those places is only rivaled by the amount of politics involved in absolutely every step of every project, big or small. I left that for Livra, and a few years later I joined Jampp.
My time at Livra is hard to separate from my time at Jampp because the atmosphere and culture were very similar, which by the way is no coincidence: both companies were founded by the same people.
Livra was a real learning experience from day 1.
I’d be given a project in a field I knew little about, using tools I hadn’t used before, and that’s how I learned Python almost from scratch (I had used the language a bit for game scripting, but nothing serious until then).
I thought it was great and after spending some time in the government’s payroll, I could appreciate just how lucky I was.
I’m part of the Core team here at Jampp, where we develop our bidder and reporting software. But there wasn’t always a Core team. At the beginning, there wasn’t even a bidder.
When I started, I had to research real-time bidding (of which I knew next to nothing), and even the adtech business. That’s a typical day in software development here at Jampp: learning new things 🤓. To be honest, it should be like that in all software development teams, but it doesn’t always happen. Of course we don’t switch technologies every day, but there’s enough change to keep things interesting, and that’s not even including production support (or “putting out fires”, to use the proper technical term). Believe me, with hundreds of servers serving hundreds of thousands of requests per second, something goes wrong all the time, so we’re never idle.
Production support. What a trip! 😅 When the bidder was in diapers, we had a switch to turn it off in a pinch, should some algorithm go nuts. Nowadays, flipping that switch would be unthinkable. It would be equivalent to shutting down the stock exchange. Sometimes I think the biggest challenge is managing that insanely stringent unspoken requirement: to always be up and running. “Constant Uptime” means our systems are spending (and ideally also making) money 24/7, 365 days a year.
It also means we need to make the infrastructure more efficient without interrupting day-to-day operations. Usually, the hardest questions to answer when designing a new feature or planning a system overhaul are: “How do we migrate the old data?” and “How do we switch this feature on without a hiccup?”. In the Core Team, the software we produce is the core of the business and stopping the business for a week or two while you “migrate stuff” is really not an option, so you have to get creative.
There’s no silver bullet, ever. We use automation and A/B testing a lot, and we roll out every new piece of software with what’s called a canary deployment (a close cousin of “blue-green” deployment). That’s basically when you try out new software on a small fraction of real traffic, since a simulated environment won’t necessarily catch all issues.
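To make the "small fraction of real traffic" idea concrete, here is a minimal sketch of one common way to split traffic: hash a request identifier and send a fixed percentage of requests to the new build. This is only an illustration under stated assumptions, not a description of Jampp's actual deployment tooling; the names `routes_to_canary`, `split_traffic` and the `req-N` identifiers are all hypothetical.

```python
import hashlib


def routes_to_canary(request_id: str, canary_percent: float) -> bool:
    """Deterministically decide whether a request goes to the canary build.

    Hypothetical helper: hashing the id (instead of calling random()) means
    the same request id always lands on the same side of the split.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a bucket in [0, 10000),
    # i.e. hundredths of a percent.
    bucket = int.from_bytes(digest[:4], "big") % 10000
    return bucket < canary_percent * 100


def split_traffic(request_ids, canary_percent):
    """Partition request ids into (canary, stable) lists."""
    canary, stable = [], []
    for request_id in request_ids:
        if routes_to_canary(request_id, canary_percent):
            canary.append(request_id)
        else:
            stable.append(request_id)
    return canary, stable


if __name__ == "__main__":
    ids = [f"req-{i}" for i in range(10000)]
    canary, stable = split_traffic(ids, canary_percent=5.0)
    print(f"canary: {len(canary)} requests, stable: {len(stable)} requests")
```

With a split like this, monitoring can compare error rates and latency between the two groups before the new build takes all the traffic.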
Monitoring is the other crucial tool at our disposal when something new is being tested. But not everything is solvable this way: some bigger projects can’t be tested under those terms, so we have to think outside the box 🤯.
We have hundreds of servers receiving more than half a million requests per second all the time — that’s certain to give anyone headaches if you dwell on it — so the TL;DR version is that we use massive amounts of caching to make all that fast.
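As a rough illustration of why caching helps at that request rate, here is a minimal sketch of an in-process cache with a time-to-live (TTL) sitting in front of a slow lookup. It is a toy under stated assumptions, not one of the caches mentioned in this post: `TTLCache`, `get_or_compute` and `slow_lookup` are hypothetical names, and a production cache would also need eviction, size limits and thread safety.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry is easy to test
        self._entries = {}           # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value, calling compute(key) only on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # fresh hit: skip the slow path
        value = compute(key)         # miss or expired: do the slow work once
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def slow_lookup(campaign_id):
        # Hypothetical stand-in for a database or network call.
        time.sleep(0.1)
        return {"campaign": campaign_id, "active": True}

    cache = TTLCache(ttl_seconds=30)
    cache.get_or_compute("campaign-1", slow_lookup)   # slow: fills the cache
    cache.get_or_compute("campaign-1", slow_lookup)   # fast: served from memory
```

The trade-off is staleness: a value can be up to `ttl_seconds` old, which is usually acceptable for data that changes far less often than it is read.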
We use AWS for most of our cloud infrastructure, and Memcached for caching alongside some homegrown caches. The whole bidder is written in Python, so it’s easy to work with, and we then use Cython to transpile Python into C so it runs as fast as possible.
We use Tornado as our web framework, and we publish a lot of what we do as open source libs: Chorde and sharedbuffers are a few examples of those. We make heavy use of Postgres for various purposes, mostly analytics. Everything our system does (auctions, bids, impressions, clicks) is stripped of all personal information, aggregated into anonymous rows, and fed into Postgres, out of which we can make tons of reports.
Our team also uses EMR (Hadoop, Hive, Presto, Spark), and some “big data” tools we made ourselves.
All components communicate either through a non-persistent pub-sub bus built on top of ZeroMQ (zmq), or through persistent streams using Amazon Kinesis. And of course we use and abuse S3 a lot; it’s really unbeatable for storage. (Plus it’s cheap and efficient, if you can design your application around its strengths.)
I think what makes our infrastructure unique is that we care about efficiency a lot. Many companies out there will use whatever pre-built tool they can get their hands on (and which their engineers know) and try to hammer each and every nail with it, throwing money at the problem when things get tricky. We try to avoid that 🙅.
We’ve built quite a few custom tools (even when there’s already something in the open source ecosystem that could do the trick). A good example is how we do aggregation for reporting. Instead of using one of the big data tools and spending, figuratively, a truckload of money on it, we manage that task with just a little bit of cleverness and a custom-designed architecture that pushes the aggregation task as close to the bidders as possible, reducing overhead to a minimum. That system is aggregating more than half a million requests per second with only 2 (rather small) machines and a tiny little help from all the systems pushing information into it. And in near real-time to boot!
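The general idea of pushing aggregation close to the source can be sketched in a few lines: each producer keeps local counters and periodically ships a small batch of partial totals, so the central machine merges a few batches instead of seeing every event. This is a simplified illustration, not Jampp's actual architecture; `LocalAggregator`, `CentralAggregator` and the `(campaign, event)` key are assumptions made for the example.

```python
from collections import Counter


class LocalAggregator:
    """Would run inside each (hypothetical) bidder process."""

    def __init__(self):
        self._counts = Counter()

    def record(self, campaign, event):
        # One cheap in-memory increment per event, no network traffic.
        self._counts[(campaign, event)] += 1

    def flush(self):
        """Hand over the partial totals collected so far and start afresh."""
        batch, self._counts = self._counts, Counter()
        return batch


class CentralAggregator:
    """Merges pre-aggregated batches instead of individual events."""

    def __init__(self):
        self.totals = Counter()

    def merge(self, batch):
        # Counter.update adds counts key by key.
        self.totals.update(batch)


if __name__ == "__main__":
    bidder_a, bidder_b = LocalAggregator(), LocalAggregator()
    for _ in range(1000):
        bidder_a.record("campaign-1", "bid")
    for _ in range(500):
        bidder_b.record("campaign-1", "bid")
    bidder_b.record("campaign-1", "click")

    central = CentralAggregator()
    for bidder in (bidder_a, bidder_b):
        central.merge(bidder.flush())   # 2 small batches instead of 1501 events
    print(dict(central.totals))
```

The work the central machine does grows with the number of distinct keys per flush, not with the raw event rate, which is what keeps a design like this cheap.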
Personally, I think the coolest projects tend to be small, high-value “micro-projects”. Even though I’ll deny it until my dying breath, we devs are, in our hearts, suckers for shiny things ✨. At least until they disappoint, which they usually do (see, I’m already denying it). For me there were quite a few of those little star micro-projects in our internal monitoring tools. One that is shiny and quite fun to watch (and very useful) shows all servers and the anomalies that happen in them in real-time and in a graphical, intuitive way you can check at a glance.
We’re also working on one that will provide visibility into what’s going on in that torrent of traffic while at the same time being mesmerizingly cool 🤩… but internal monitoring tools aren’t cool to anyone but us, so forget everything I said.
Going into some of the BIG projects… we’re redesigning a bunch of systems, eliminating usability, scalability and maintainability barriers. It’s even been prompting reviews of long-upheld “wisdom” that fell out of sync with reality, spurring rewrites and redesigns based on new insight. The project has huge potential, and that makes it very cool. It’s also monstrously complex, and that makes it… Well, also very cool 😎 Long story short, when it’s done, we’ll have opened up the gates to improvements that were really difficult before so keep an eye on our tech blog, I’m pretty sure there will be many a blog post when it’s ready.
I see Jampp as a solution to the maximization of happiness because it’s challenging and interesting, while still being relaxed and informal. We spend most of our lives at work, so if you can’t socialize comfortably at the office and be intellectually and professionally challenged at the same time, there’s zero chance for growth and happiness. I don’t like being bored, not a tiny bit, and that’s why Jampp is the perfect fit for me.
Probably the thing that hits you the most is its scale. You don’t notice at first in tech, because all you see are requests to be processed. But nowadays, everybody has a mobile device and in many markets, smartphones have become the new PCs. Mobile is conquering the world, and you can feel that in the scale of what we do, and the amount of information that flows: we are processing more than 500K auctions per second.
I don’t put much stock in quotes and motivational phrases. I find their corniness a turnoff. Any attempt at shallow motivational talk is de-motivational for me; it triggers a “geez” moment. But, perhaps because it resonated with what I was going through at the time, or perhaps because I found meaning in it (probably not the same meaning the author intended it to have, but who cares), I find the following quote offers an accurate description of something I believe to be truly important in development work:
“Avoid entropy-driven work” — Minnen Ratta
Entropy is both a measure of chaos and information. I understand “entropy-driven work” to be work that follows the path of least resistance, that moves randomly without a clear goal, just because it’s easy to move in that direction. Software development is roughly 80% thinking, 10% writing, and 10% rewriting, with a pinch of inspiration. Nobody should ever start writing code without first defining the desired end result.
Fixing a locomotive’s engine, replacing the car wheels, painting every car a different color and attaching new cars (that you build in-house, of course), all while the train is moving, because it can’t be stopped. At 350kph, because it’s a TGV. And with one hand only, because you need the other to hold the coffee mug ☕