Nielsen: Processing 55TB of Data Per Day with AWS Lambda




Learn from Nielsen Marketing Cloud how to process 55TB of data per day while maintaining quality, performance, and cost using a fully automated serverless pipeline.

~Filmed prior to COVID-19~


49 thoughts on “Nielsen: Processing 55TB of Data Per Day with AWS Lambda”
  1. $300K opex per year for such a large operation is a no-brainer. If the same thing were done on-prem, it would have run into the millions. Optimization is unnecessary.

  2. Using a bunch of Lambdas for this sounds way more expensive than it should be, even after accounting for the convenience.

  3. I can't even imagine how this system can handle 250 BN events a day. The architecture looks very elegant and extremely optimized. The most interesting takeaway for me was how they rate-limited the system and were able to reduce the cost per BN events simply by tuning the Lambda configurations. Excellent insights. Loved this.

  4. A $300,000 system can easily handle this load. On top of that, when it comes to storage, you can reach ROI within 1-2 months. The only benefit I see in using AWS is when you don't use much storage. Look at the storage price: it's been high for so long, and we all know an SSD doesn't die from reads, it dies from writes. Most developers' usage is read-based. That's how AWS profits.

    I'd rather over-provision with a company server.

  5. The guy asking the questions is a legend. He was asking the very same questions that were coming to my mind. If it weren't for his bad-ass questions, this would have been an average video. Thanks guys. Loved this video.

  6. There is no way it costs $1000 per day. I don't know if they are getting special pricing, but storing 30 TB of data daily on S3 would cost about $600, and that amount of egress traffic (e.g. the 250 mbps mentioned) also costs a lot. And that is assuming the Lambdas and EMR produce no logs or metrics at all (CloudWatch can also cost a lot).

  7. I see a lot of supporting services that are probably used in the background but are not explicitly mentioned. This includes CloudWatch, which is probably monitoring the Lambda functions and databases. CloudWatch can also be used with CloudWatch Logs Insights or Contributor Insights to help optimize the Lambda functions. SNS could also be used to open a system ticket when issues come up.

  8. It looks simple, but with that large an amount of processing you are better off using Kubernetes. The Lambda cost would be at least 100,000+ USD a day, and that's just for one of the four Lambda services in his architecture diagram.
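
Several comments above argue from the same handful of numbers, so it may help to put them side by side. The following is a minimal sketch that uses only figures quoted in the title and the comments (55 TB/day, 250 BN events/day, $300K/year, $1000/day). It contains no AWS pricing, and the function name `unit_costs` is purely illustrative.

```python
# Figures quoted in the title and in the comments above.
# None of these are AWS list prices; they are the commenters' numbers.
DATA_TB_PER_DAY = 55          # from the video title
EVENTS_BN_PER_DAY = 250       # comment 3
OPEX_USD_PER_YEAR = 300_000   # comment 1
COST_USD_PER_DAY = 1_000      # comment 6


def unit_costs(cost_per_day, tb_per_day, events_bn_per_day):
    """Return (USD per TB, USD per billion events) for a given daily cost."""
    return cost_per_day / tb_per_day, cost_per_day / events_bn_per_day


# The yearly opex figure expressed per day, to compare with the $1000/day claim.
opex_per_day = OPEX_USD_PER_YEAR / 365

per_tb, per_bn = unit_costs(COST_USD_PER_DAY, DATA_TB_PER_DAY, EVENTS_BN_PER_DAY)

print(f"$300K/year is about ${opex_per_day:.2f} per day")
print(f"$1000/day is about ${per_tb:.2f} per TB processed")
print(f"$1000/day is ${per_bn:.2f} per billion events")
```

On these numbers the two quoted totals are in the same ballpark (roughly $822 versus $1000 per day), which works out to about $18 per TB or $4 per billion events. Whether that is achievable at list prices is exactly what comments 6 and 8 dispute, and this sketch does not settle it.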
