Start building with AWS today!
Learn from Nielsen Marketing Cloud how to process 55TB of data per day while maintaining quality, performance, and cost using a fully automated serverless pipeline.
~Filmed prior to COVID-19~
#AWS
It is amazing to see the power of Lambda functions.
It was a bit annoying that the interviewer kept on interrupting him
Is it possible to have a million Lambda invocations in one second? Not of the same Lambda; I mean the maximum number of concurrent Lambdas per second.
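Invocations per second and concurrency are not the same number. AWS describes concurrency as roughly the request rate multiplied by the average duration, so a short function can sustain many invocations per second with fewer concurrent executions. The sketch below assumes that relationship and a default account quota of 1,000 concurrent executions per Region (the usual default, which can be raised through a quota request); the function names are hypothetical.

```python
# Default per-Region concurrency quota for an AWS account (assumption: the
# common default; it can be increased via a service quota request).
DEFAULT_REGIONAL_CONCURRENCY = 1_000


def required_concurrency(requests_per_second: int, avg_duration_ms: int) -> int:
    """Estimate concurrent executions needed: rate x duration, rounded up.

    Integer milliseconds are used to avoid floating-point rounding surprises.
    """
    if requests_per_second < 0 or avg_duration_ms < 0:
        raise ValueError("inputs must be non-negative")
    # Ceiling division of (rps * ms) / 1000 using integers only.
    return (requests_per_second * avg_duration_ms + 999) // 1000


def fits_quota(requests_per_second: int, avg_duration_ms: int,
               quota: int = DEFAULT_REGIONAL_CONCURRENCY) -> bool:
    """True if the estimated concurrency stays within the given quota."""
    return required_concurrency(requests_per_second, avg_duration_ms) <= quota


if __name__ == "__main__":
    # Hypothetical workload: one million invocations/second at 100 ms each.
    print(required_concurrency(1_000_000, 100))  # -> 100000
    print(fits_quota(1_000_000, 100))            # -> False
```

Under these assumptions, a million 100 ms invocations per second would need on the order of 100,000 concurrent executions, far above the default quota, while 5,000 requests per second at 200 ms lands exactly at 1,000.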
Good video, thanks.
You should consider using RDS Proxy to help overcome the connection limit to essentially
What blackboard are you using? Thank you.
It would be great to know whether Lambda's recent change in billing granularity, from 100 ms to 1 ms, has reduced the costs even further.
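One way to estimate the effect is to compare billed GB-seconds under both rounding rules: each invocation's duration is rounded up to the billing granularity, then multiplied by the configured memory. The sketch below assumes that model, ignores the per-request charge, and leaves out the per-GB-second price (check the current AWS Lambda pricing page); the sample durations and function names are hypothetical.

```python
from typing import Iterable


def billed_ms(duration_ms: int, granularity_ms: int) -> int:
    """Round one invocation's duration up to the billing granularity."""
    return (duration_ms + granularity_ms - 1) // granularity_ms * granularity_ms


def billed_gb_seconds(durations_ms: Iterable[int], memory_mb: int,
                      granularity_ms: int) -> float:
    """Total billed GB-seconds for a batch of invocations at a given memory size."""
    total_ms = sum(billed_ms(d, granularity_ms) for d in durations_ms)
    return (total_ms / 1000) * (memory_mb / 1024)


if __name__ == "__main__":
    # Hypothetical invocation durations (ms) for a 512 MB function.
    sample = [12, 37, 101, 250]
    old = billed_gb_seconds(sample, 512, 100)  # 100 ms rounding
    new = billed_gb_seconds(sample, 512, 1)    # 1 ms rounding
    print(f"old={old:.3f} GB-s, new={new:.3f} GB-s, saving={1 - new / old:.0%}")
```

The saving depends heavily on the duration distribution: functions that finish well under 100 ms benefit most, while long-running functions see almost no change.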
300K opex per year for such a large operation is a no-brainer. If the same thing were done on-prem, it would have run into millions. Optimization is unnecessary.
Tell us that sweet spot
I think it's a pleasure to work there, you know. Challenging and interesting tasks.
Using a bunch of lambdas for this sounds like it's way more expensive (cost-wise) than it should be. Even after accounting for the convenience.
A timeline:
9:02 https://aws.amazon.com/fr/this-is-my-architecture/?tma.sort-by=item.additionalFields.airDate&tma.sort-order=desc
Waoooo🔥
Isn't there a limit of 1000 concurrent lambdas per AWS account?
So funny, your system works so well and follows a very nice scalable approach that your partners think you're doing something wrong, amazing! 😮
I can't even imagine how this system can handle 250 billion events a day. The architecture looks very elegant and extremely optimized. The most interesting takeaway for me was how they rate-limited the system and were able to reduce the cost per billion events simply by tuning the Lambda configurations. Excellent insights. Loved this.
+1
55 TB of data! this is cloud computing pornography 😅
Rich, quick case study. Good job.
I did not see the real-time processing workload; I think these workloads are batch and ETL based on AWS.
How much data is that RDS Postgres instance holding? Is there one record for each event?
Only your credit card can stop you, you know.
Superb
I have no idea what these guys are talking about. Where do I start learning about it?
A $300,000 system can easily handle this load. On top of that, when it comes to storage, you can get your ROI within 1-2 months. The only benefit I see in using AWS is when you don't use much storage. Look at the storage price; it's been high for so long, and we all know an SSD doesn't die from READs, it dies from WRITEs. Most developers' usage is READ-based. That's how AWS profits.
I would rather over-provision with a company server.
The guy asking the questions is legend. He was asking the very same questions that were coming to my mind. If it weren't for his bad-ass questions, this video would have been an average video. Thanks guys. Loved this video.
Nielsen
Garbage In
Garbage Out
Debugging Lambda functions is supremely annoying.
We love this blackboard and the colors you use in these videos. Can you share some details on which markers and blackboard you are using?
Do you somehow manage the order of events before sending them to the different networks?
There is no way it costs $1000 per day. I don't know if they are getting special pricing, but storing 30 TB of data daily on S3 would cost about $600, and that amount of egress traffic (e.g., the mentioned 250 Mbps) also costs a lot. And that's assuming none of the Lambdas or EMR even produce logs and metrics (CloudWatch can also cost a lot).
One of the best This Is My Architecture episodes to date. Great job, gents
What was the specific reason for using RDS instead of DynamoDB, given that DynamoDB is write-friendly?
Would love to see their RDS bill.
That was AWSome!
Nice!
Isn't Lambda concurrency capped at 1k?
How long did it take to implement the system? How many people worked (or are still working) on this?
I see a lot of supporting services which are probably used in the background but are not explicitly mentioned. This includes CloudWatch, which is probably monitoring the Lambda functions and databases. CloudWatch can also be used with CloudWatch Insights or Contributor Insights to help optimize the Lambda functions. SNS can be used to open a system ticket for issues.
How about your EMR configuration? Can you publish the number and type of nodes?
Cost: 1000 USD/day (at 6:26). For such a big system, it is cheap.
Does it include everything on the blackboard ( lambda, SQS, RDS, S3, EMR) and outbound bandwidth ?
It looks simple, but with that amount of processing, you are better off using Kubernetes. The Lambda cost would be at least 100,000+ USD a day, and that's just for one of the four Lambda services in his architecture diagram.
Nice integration of services and description on scale!
Implementing rate limiting so that you don't hurt your partners is next-level badass!
Use Kinesis streams instead of SQS. This change makes a BIG difference in the bill, and the performance is much better!
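A common way to cap how fast requests go out to each downstream partner is a token bucket: tokens refill at a fixed rate up to a burst capacity, and a request is sent only if a token is available. This is a swapped-in, generic technique, not necessarily what Nielsen built (the video describes tuning the Lambda configuration). The class name, parameters, and the injectable clock are assumptions for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket (hypothetical helper, not the video's implementation).

    Allows on average `rate` requests per second, with bursts of up to `capacity`.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Take `tokens` if available; return False (caller should back off) otherwise."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


if __name__ == "__main__":
    # Hypothetical: one bucket per partner, each with its own agreed request rate.
    buckets = {"partner-a": TokenBucket(rate=50, capacity=100),
               "partner-b": TokenBucket(rate=5, capacity=10)}
    sent = sum(buckets["partner-b"].try_acquire() for _ in range(20))
    print(f"partner-b accepted {sent} of 20 immediate requests")
```

Keeping one bucket per partner means a slow or strict partner throttles only its own traffic; rejected events would typically stay on the queue and be retried later rather than dropped.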
What about the RDS storage cost, which increases by a few TB per day?
I will be moving to Lambda/Serverless for my next projects. I can write Python or Node functions within Lambda and scale very easily. Awesome!
This is what an elegant solution looks like. Well done guys. 🙂
RDS could be DynamoDB if they don't need full SQL.