Serverless. There will be dragons.


I’m trying to wrap my head around the latest “hot thingy”, serverless… While most of my experience has been in the AWS Lambda space, I assume it’s not much different in the rest of the ecosystem.

Starting up

I agree that if you read any of the Hello World tutorials out there, it looks simple. Write 20 lines of code, run one command, and voilà, you have an API. Whoa… not so fast, buddy. Here’s an (incomplete) list of things you will need to know in order to use it in production: Terraform or CloudFormation, API Gateway, Cognito (or some other authorizer) and its integration with the gateway, API schemas, wiring everything up with IAM, setting up Route 53 and Certificate Manager, building CloudWatch dashboards. For starters. God forbid you mess up one of the IAM permissions; hackers are just around the corner. Then add RDS, or better yet DynamoDB, because serverless. Oh, you want to integrate DynamoDB with API Gateway? Of course you can, it’s a serverless world. Go read the Apache VTL spec and start programming in that, because it’s simple. Not.
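To give you a taste of that “simplicity”: here is a sketch of what a direct API Gateway → DynamoDB integration looks like in a VTL request mapping template. The `Orders` table and `orderId` parameter are hypothetical names of mine, but `$input.params(...)` is the real API Gateway template variable:

```vtl
## Request mapping template for a DynamoDB GetItem integration.
## Yes, this templating language is what stands between your URL
## parameter and your database query.
{
  "TableName": "Orders",
  "Key": {
    "orderId": { "S": "$input.params('orderId')" }
  }
}
```

And that’s the easy direction; you’ll want a second template to unwrap DynamoDB’s typed attribute format (`{"S": ...}`, `{"N": ...}`) on the way back out.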

You have more than one function? Now you have to think about orchestration.
Make them async and communicate via SNS? Easy on paper. In prod, you’ll most likely end up with a spaghetti of queues for any medium-sized application, and good luck untangling that when you jump into a new project and try to understand what’s happening and how it works.
Have them call each other, like you would when calling a different microservice? Sure, but you pay for the time function A spends waiting for function B to complete. Have to chain 5–7 calls to fulfill a request, and that happens 50 times per second? Bad luck…
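Here’s roughly what that synchronous chaining looks like — a sketch of mine, with a hypothetical downstream function name, using boto3’s `invoke` call. The key point is in the comments: the caller is billed for the entire wait.

```python
import json


def build_payload(event):
    """Shape the request for the downstream function (pure, so it's testable offline)."""
    return json.dumps({"order_id": event.get("order_id")})


def handler(event, context):
    # boto3 ships in the Lambda runtime; imported here so this sketch
    # stays importable on a machine without the AWS SDK installed.
    import boto3

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName="function-b",         # hypothetical downstream function
        InvocationType="RequestResponse",  # synchronous: blocks until B returns
        Payload=build_payload(event),
    )
    # While that call was in flight, THIS function's billed duration kept
    # ticking. Chain 5-7 of these and the first function pays for all of it.
    return json.loads(response["Payload"].read())
```

Async invocation (`InvocationType="Event"`) avoids the double billing, but then you’re back to queues and the spaghetti problem above.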
But wait, we have Step Functions. Right… Let’s start programming state machines in JSON, orchestrate them and debug them… Really? It looks like this:
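(The original post had a screenshot here; in its place, a hypothetical two-step state machine of my own in Amazon States Language, with made-up function ARNs — note how much JSON it takes to express “call A, retry on failure, then call B”.)

```json
{
  "Comment": "Hypothetical order flow: validate, then charge. Two steps.",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
      "Retry": [
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "OrderFailed"
        }
      ],
      "Next": "ChargeCard"
    },
    "ChargeCard": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
      "End": true
    },
    "OrderFailed": {
      "Type": "Fail",
      "Error": "OrderValidationError",
      "Cause": "Validation did not pass"
    }
  }
}
```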

Easy peasy…

Speaking of orchestration. We switched from the monolith to microservices, then came the whole jungle. Paraphrasing here the late Joe Armstrong on object oriented programming: “You wanted a banana but what you got was a gorilla holding the banana and the entire jungle”. You’re my hero.
Kubernetes, Istio, sidecars, circuit breakers, service discovery, client-side load balancing, bulkheading, etc. Where are those in the serverless world? Oh, you don’t need them, because they are not really needed? Are you sure?
When you see someone like Kelsey Hightower pushing back in favor of monoliths, when Sam Newman, the author of “Monolith to Microservices”, tells you that microservices should be a last resort, and when Istio moves back to being a monolith, you might start to wonder if going serverless is the move. To close this paragraph, I still remember a quip from a while ago:

Or better, remember poor Clippy.

Logging and tracing. So, you have an issue in production. You have a request that spans 5 or 7 lambdas, because architecture. First, there’s a delay until you can see the logs. Second, you need to identify the request across logs and correlate it, because each lambda gets its own log. Imagine you have a system of 30–40 lambdas at 100 req/s, and you try to do the above. Good luck. There are external tools to help you with that. Go cough up some more cash if you really want to understand what’s going on.
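The do-it-yourself version of that correlation is boilerplate you end up writing in every single handler. A minimal sketch, assuming a `correlation_id` field in the event — that key name is my convention here, not an AWS standard:

```python
import json
import logging
import uuid

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def get_correlation_id(event):
    # Reuse the caller's ID when present, so log lines from every lambda
    # in the chain can be joined on one field; mint a fresh one at the edge.
    return event.get("correlation_id") or str(uuid.uuid4())


def handler(event, context):
    correlation_id = get_correlation_id(event)
    # Emit structured JSON so a log query tool can filter on the field.
    logger.info(json.dumps({
        "correlation_id": correlation_id,
        "message": "processing request",
    }))
    # ...and remember to thread the ID into every downstream invocation,
    # in every one of your 30-40 lambdas, forever.
    downstream_event = {**event, "correlation_id": correlation_id}
    return downstream_event
```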

Hard limits. They come in many flavors: how long you can run, how much memory you can use, payload size, concurrency and platform-wide limits, etc. Besides being a moving target, they strictly follow Murphy’s law: if something can go south, it will, and at the worst moment. For example, when you have a traffic burst and you need to scale. Alternatively, set them loose, leave a recursive loop in the code, and you’ll end up like the ex-Googler who woke up a while ago to a $90K bill on GCP. Sorry, I misplaced the link.
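Defending against the timeout limit is on you, by the way. One common pattern, sketched here with a hypothetical batch-processing shape of my own — `get_remaining_time_in_millis()` is the real Lambda context method:

```python
def process_batch(records, context, safety_margin_ms=10_000):
    """Process as many records as the remaining Lambda time allows.

    Bails out before the hard timeout kills the function mid-record,
    returning the leftovers so the caller can re-queue them.
    """
    done = []
    for record in records:
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break  # stop early instead of being killed by the platform
        done.append(record)  # stand-in for the real per-record work
    return done, records[len(done):]  # (processed, leftover to re-queue)
```

Every nontrivial handler grows a guard like this eventually — the platform won’t warn you, it will just cut the cord.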

Which brings me to pricing. If you have low traffic, *maybe* it’s a good solution. But if you have constant traffic, don’t even think about it. Two lambdas and API Gateway at around 100 req/s will make you cough up around $2K per month. You can handle that easily with a couple of $150/month machines and an ALB. Maybe less. Two machines, just for redundancy. In my tests I can handle that load on one machine with 4 cores / 8 GB RAM, at 20–30% system load. Of course, it’s a biased test, under very specific conditions, but do the math.
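Here is that math, back-of-the-envelope. The rates are the published us-east-1 on-demand prices at the time of writing (check the current pricing pages yourself); the 200 ms average duration and 512 MB memory are my assumptions, not the author’s measurements:

```python
# Lambda:      $0.20 per 1M requests + $0.0000166667 per GB-second
# API Gateway: $3.50 per 1M REST API requests

REQ_PER_SEC = 100
SECONDS_PER_MONTH = 60 * 60 * 24 * 30

monthly_requests = REQ_PER_SEC * SECONDS_PER_MONTH   # 259,200,000 requests
lambda_invocations = 2 * monthly_requests            # two lambdas per request

request_cost = lambda_invocations / 1e6 * 0.20       # per-invocation charge
gb_seconds = lambda_invocations * 0.2 * 0.5          # 200 ms at 512 MB each
compute_cost = gb_seconds * 0.0000166667             # duration charge
apigw_cost = monthly_requests / 1e6 * 3.50           # gateway charge

total = request_cost + compute_cost + apigw_cost
print(f"~${total:,.0f}/month")                       # ~$1,875/month
```

Nudge the duration or memory up a little and you’re past $2K — the same ballpark as the claim above, versus ~$300 for the two machines and change for the ALB.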

Deployment. What will you use? CloudFormation? Terraform? AWS CDK? The Serverless Framework? What are the quirks? (Hint: each of them has… many.) Now go read a few hundred pages of docs to understand one of the above. Then try to configure it properly for atomic management of deployments. Have fun. Understand how to build and deploy layers.
You want canary deployments with traffic mirroring? Again, good luck. I haven’t been able to set up something like this; I’m probably not smart enough. The docs say it’s possible.
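For a sense of scale: here is a hypothetical, deliberately minimal Terraform sketch of mine for deploying a single function — resource and file names are made up, and this still omits the API Gateway wiring, log retention, alarms, and everything else from the list at the top:

```hcl
# Minimal-ish Terraform for ONE Lambda function. Multiply by 30-40.

resource "aws_iam_role" "fn" {
  name = "hello-fn-role"   # hypothetical name
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "logs" {
  role       = aws_iam_role.fn.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

resource "aws_lambda_function" "hello" {
  function_name = "hello"
  role          = aws_iam_role.fn.arn
  runtime       = "python3.12"
  handler       = "app.handler"
  filename      = "build/hello.zip"   # hypothetical build artifact
}
```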

Speaking of tooling: yes, you can run unit tests locally. No, don’t even think of going further than that. There are tools, like LocalStack, that advertise that capability. Don’t. It’s not a replica of the live environment. You will regret that decision later.

Okay. Rant over… You can read more in this HN thread if you want: https://news.ycombinator.com/item?id=26855037 (I had been thinking about writing this post for a long time; the thread was the catalyst).

Last words. If you notice errors in the above, please let me know. I’m far from considering myself a subject-matter expert, and I’m willing to learn.

À bon entendeur, salut! (A word to the wise is enough.)

Disclaimer one: this is my biased opinion. Don’t take it for granted. Do your homework. Investigate. Maybe it’s the right technology for you. I definitely got some of the above points wrong, and I expect a lot of bashing in the comments :) … But.
Go back to first principles. Build a well-architected monolith. You need different areas with different scalability requirements and different failure domains? Maybe look into actor systems: Elixir/Erlang (a GenServer is a much better serverless abstraction, imho); Akka, ZIO or Vaughn Vernon’s Vlingo platform on the JVM; Bastion or Tokio on Rust; Go and goroutines, with a pinch of salt (hint: they don’t really offer the same level of isolation as the others).
Once you have that and you’re still not happy, apply the strangler pattern and split it into microservices. Still not happy? Then maybe it’s a good time to look at serverless. I really don’t know what problems it would solve better than the previous options, but maybe there are some. I hope it works for you. Start small. Hit the limits. See if you can live with them.
Yes, I know, there are people out there running full apps on serverless only. I admire them. Maybe they found the fit. I haven’t, yet.

Disclaimer two: these are strictly my personal opinions. No former, current or future employer involved.
