AWS Services You Should Avoid

Get ready for some personal and definitely opinionated opinions!

AWS comes with many components that cover different areas of concern. But most are neither general-purpose nor cheap enough to drop into any project. Used in the wrong context, they end up wasting time, resources, and money, and they will create developer friction and frustration for months.

Here are the top 5 AWS Services you should avoid:

  1. Cognito
  2. CloudFormation
  3. ElastiCache
  4. Kinesis
  5. Lambda

Cognito — User Management/Mobile Login

Cognito sounds great on paper. It will take care of user management for you, including Google Sign-In and Facebook Login, and it will handle user role assignments, password resets, and password rules. It can do both desktop and mobile. It can work with public user pools and private user pools (to restrict access to private enterprise resources).

Cognito will help you save time by presenting its own UI when requesting permission from users.

Cognito has competitors: Auth0, OneLogin, and Okta, to name a few. It is also possible to implement all of the above login workflows yourself, given enough time and team size.

But if you use Cognito for a while, ugly edges will reveal themselves. You will discover that Cognito cannot be used for native mobile logins. When you are building a mobile application and trying to get a user to log in via Facebook, an integration like Firebase will prompt the user to open the Facebook app and simply authenticate/authorize (assuming they are already logged in on the Facebook app).

With Cognito, they are instead presented with an embedded WebView prompting them to log into Facebook all over again. This creates a lot of friction, so you file tickets with the team asking about native logins.

[Screenshot: a GitHub issue asking Cognito to support native Facebook login]

This was filed literally 16 hours before this writing, on 2020-02-19. Source: GitHub.

Cognito does promise that it will work with any OAuth2 identity provider. We thought this meant we could use it to log in with WeChat, to reduce friction in user registration for apps in China. After research, we ended up going with WeChat’s own SDK instead. There is no simple way of implementing social logins with Cognito.

If mobile user management is among your requirements, you would do better with Firebase social login. It supports Facebook and Google native social logins out of the box, and it gives you enough support on the server side to create user records for WeChat users.

There is also the option of rolling your own authentication strategy and using the SDKs provided by Google/Apple/Facebook/WeChat. This is the safest route if you can afford the time and engineering resources. But if you are short on manpower/time, definitely do not go with Cognito. It cannot deliver in the mobile context.

CloudFormation — Programmatically Configure AWS Resources

I hate CloudFormation. I hate “Stacks.” Parameters are very clunky to work with. The formatting of the template files is super verbose. Of course, the new YAML CF is better than the old JSON CF, but it’s still tough to read long, complex stacks. Also, you still have an internet full of JSON CloudFormation as your body of reference.

Circular dependency errors are soul-crushing.

Do you need to reference the attributes of one resource from a different stack? You can’t. You need to use outputs. Okay, what if an output of that stack changes and needs to be reflected in the other stack? You have to remember everywhere it’s used and intervene manually.

Do you want to modularize a piece of the infrastructure so you can reuse it across multiple resources? Sorry, you need to use “nested stacks,” giving you even more stacks on stacks that are going to get stuck in IN_PROGRESS or ROLLBACK mode. To be clear, what I am saying is: do not use nested stacks.

There is no drift detection or reconciliation, and with no drift detection comes great uncertainty.

I don’t like all the Fn::Sub, !Ref, and Fn::Join. I honestly cannot imagine a whole infrastructure with multiple accounts, VPCs, subnets, and peering connections documented entirely in CloudFormation.

Luckily, the solution is easy! Use Terraform! Terraform is super awesome, and we have been using it since its early days. It’s cloud-agnostic. It has a great module system that you can leverage heavily without fearing for your life.

You can even do logic on top of these modules to include or exclude resources based on variables, using Terraform’s count attribute.
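
For example, here is a minimal sketch of the count trick (Terraform 0.12+ syntax; the variable, resource, and AMI are hypothetical): a bastion host that only exists in environments that opt in.

```hcl
# Create the bastion only where enable_bastion = true.
variable "enable_bastion" {
  type    = bool
  default = false
}

resource "aws_instance" "bastion" {
  count         = var.enable_bastion ? 1 : 0
  ami           = "ami-0abcdef1234567890" # hypothetical AMI id
  instance_type = "t3.micro"
}
```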

Here’s a great article about drift with Terraform.

HCL is beautiful to read and write. Terraform plan is amazing for studying infrastructure changes and giving you the confidence that you are changing what you expect. (Yes, CF has change sets if you must.)

The moral of the story is: use Terraform to de-stress yourself. Use Terraform so that you do not spend a whole day fretting over how you will possibly make changes to a stack that is depended on from above and below by other stacks. terraform plan. terraform apply.

ElastiCache — Out-of-the-Box Managed Redis

Many a startup has fallen prey to ElastiCache. It usually happens when the team is under-staffed and rushing for a deadline, and someone types Redis into the AWS console:

[Screenshot: searching for Redis in the AWS console surfaces ElastiCache]
Hooray, out-of-the-box Redis that I don’t need to deploy!

On the setup screen, you name your cluster and accept the defaults.

[Screenshot: the ElastiCache setup screen with its default settings]

Voila! Begin using Redis. Perfect.

Obviously it works great.

What’s the problem? It is expensive, and no one notices for months. We’ve seen this happen a few times already when we did cost analysis for a couple of clients. Firstly, the defaults give you a cluster with a beefy cache.r5.large instance. Secondly, that cache.* prefix means this instance costs $0.216/hr instead of $0.126/hr, a 71.4% premium. Then you might think you need one each for dev, qa, and prod. That all adds up to $$$. You don’t want to be the one who caused the $3,000 ElastiCache line item on the AWS bill.
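
A quick back-of-the-envelope check of those numbers (using the hourly prices quoted above; actual prices vary by region and change over time):

```python
# Hourly prices quoted above: ElastiCache cache.r5.large vs. a comparable r5.large.
cache_r5_large = 0.216  # $/hr
r5_large = 0.126        # $/hr

premium = cache_r5_large / r5_large - 1
print(f"premium: {premium:.1%}")  # -> premium: 71.4%

# Three always-on nodes (dev, qa, prod) at ~730 hours per month:
monthly = 3 * cache_r5_large * 730
print(f"monthly: ${monthly:,.2f}")  # -> monthly: $473.04
```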

Run a Dockerized Redis on EC2 for a while first. Make sure it serves your purposes. Figure out how the sizing works for you. Remember, Redis is meant to hold ephemeral data in memory, so please don’t treat it as a DB. It’s best to design your system assuming that Redis may or may not lose whatever is inside. Clustering or HA via Sentinel can come later, when you know you need it!
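
To make the “Redis may lose whatever is inside” point concrete, here is a minimal cache-aside sketch using redis-py (the DB lookup is a hypothetical stand-in): if the cache gets wiped, the system keeps working, just a bit slower.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)  # e.g. the Docker container on your EC2 box

def load_profile_from_db(user_id: str) -> dict:
    # Hypothetical stand-in for the real source of truth (MySQL, Postgres, ...).
    return {"id": user_id, "name": "example"}

def get_user_profile(user_id: str) -> dict:
    # Cache-aside: if Redis has the value, use it; if Redis was wiped, fall
    # through to the database and repopulate. Nothing is lost either way.
    cached = r.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)
    profile = load_profile_from_db(user_id)
    r.setex(f"user:{user_id}", 3600, json.dumps(profile))  # expire after an hour
    return profile
```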

Kinesis — General-Purpose Data Queue

Kinesis, in Amazon’s own words:

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information.

Streaming data, such as video processing/uploading, is a good fit for Kinesis. But not all data/records/events should go into Kinesis. It is not a general-purpose enterprise event bus or queue.

A little context:

We once joined a project midway and had to support an ETL pipeline that included Kinesis in the deployment. The application collected small JSON records and stuffed them into Kinesis with the Python boto3 API. On the other side, worker processes running inside EC2/ECS pulled these records with boto3 and processed them. We then discovered that retrieving records out of Kinesis Streams when you have multiple workers is non-trivial.
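
For flavor, the producer side is the easy part. A hedged sketch of what “stuffing small JSON records into Kinesis with boto3” looked like (the stream name and record shape are hypothetical):

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def put_event(event: dict) -> None:
    # The partition key decides which shard the record lands on; records with
    # the same key keep their ordering within that shard.
    kinesis.put_record(
        StreamName="etl-events",  # hypothetical stream name
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event["id"]),
    )

put_event({"id": 42, "payload": "small json record"})
```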

You can develop a consumer application for Amazon Kinesis Data Streams using the Kinesis Client Library (KCL). Although you can use the Kinesis Data Streams API to get data from a Kinesis data stream, we recommend that you use the design patterns and code for consumer applications provided by the KCL.

Imagine you are running 4 workers, all listening to the same Kinesis stream for events. How do you prevent them from working on the same events? In traditional queueing systems, you use locks and owner columns to ensure each task is handled by only one worker. Amazon SQS will mark a message as in flight while one of the workers is processing it, so that the other workers do not do duplicate work.
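
For comparison, here is a minimal SQS consumer sketch (the queue URL is hypothetical): the visibility timeout gives you the one-worker-per-message behavior for free.

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/tasks"  # hypothetical

def handle(body: str) -> None:
    print("processing", body)  # stand-in for real work

while True:
    # Long-poll for work. A received message becomes invisible to other
    # workers for the duration of the queue's visibility timeout.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        handle(msg["Body"])
        # Deleting acknowledges success; if the worker crashes before this,
        # the message reappears and another worker picks it up.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```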

To implement the same pattern with Kinesis Streams, you must use the KCL, which is written in Java (nothing wrong with Java). The KCL agent is spun up as a second daemon process in the background, and it forwards events to your application as it receives them.

This means all of your Python/Ruby/Node Docker images that handle records from Kinesis must install Java dependencies and run the KCL Java process in the background. The KCL communicates with the main program via stdin/stdout/stderr, so logging from the main application becomes problematic as well! If you have a bug in the application, you will not be able to print relevant logs to STDOUT, because doing so will cause errors in the KCL agent.

Furthermore, in order to have multiple workers, Kinesis Streams requires you to use multiple shards. Each worker claims a shard. More shards = bigger bills. And this worker-to-shard mapping is stored in DynamoDB! So, to recap: in order to use Kinesis as an enterprise queue, you must:

  1. Bundle KCL agents into your running environment, which increases load and run time and makes logging errors complicated.
  2. Buy multiple shards to work on things in parallel.
  3. Buy more storage in DynamoDB to track which worker may work on which shard.

It is both cumbersome and difficult to apply Kinesis Streams correctly, especially when what you are really looking for is an enterprise queue. You would do better with the SNS/SQS combination, or with a queuing framework that sits on top of Redis or a traditional database.

Lambda — Serverless REST API Replacement

Lambdas are great for the following tasks:

  1. Serving/redirecting requests at the CloudFront edge (Lambda@Edge).
  2. Reacting to events from SNS or SQS — small asynchronous tasks such as image transformation (writing to S3), OCR, and ETL (when volume is small and you can batch-process).

Lambda is horrible for:

A replacement for REST API endpoints.

Take the example of an online book-shopping REST API:

  • /api/v1/books — GET/POST/PUT/DELETE
  • /api/v1/users — GET/POST/PUT
  • /api/v1/carts — GET/POST/PUT/DELETE
  • /api/v1/search — GET
  • /api/v1/stats — GET
  • … Many more.

A mature internal REST API can and will have hundreds of routes.

Normally, a company may choose an API gateway or Nginx to mount multiple microservices: one for books, one for users, one for carts, etc. Or they might wise up, use a monolith, and put all of the modules into one backend repository.

With the Lambda serverless paradigm, you end up with one Lambda function per route.

Each deployment now means you are pushing updates to hundreds of Lambda resources.

It becomes difficult to test single routes and look at the code in AWS’s Lambda UI. Each Lambda is configured via environment variables individually. If you have 5 variables to connect to a MySQL RDS instance, that suddenly means you have 5 * 100 routes = 500 variables to configure inside the CloudFormation template.

Furthermore, each of them will try to establish its own database connections when handling requests. There is literally no out-of-the-box way for beginners to share resources between Lambdas.
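
To picture what one-Lambda-per-route means, here is a sketch of one of those hundreds of functions, in the API Gateway proxy-integration shape (the route and payload are hypothetical):

```python
import json

# Handles exactly one route: GET /api/v1/books/{id}. Multiply by a few hundred.
def handler(event, context):
    book_id = (event.get("pathParameters") or {}).get("id")
    # Each function reads its own copy of the DB env vars and opens its own
    # connections; there is no shared pool across the fleet.
    return {
        "statusCode": 200,
        "body": json.dumps({"id": book_id, "title": "example"}),
    }
```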

Next, QA comes to you and asks for a backend to use in QA environments. You proceed to add a “QA-” prefix to all Lambda names and look at the Lambda dashboard. Now you have twice as many small Lambdas.

Next, DevOps comes to you and asks for a backend to use in production environments. You proceed to add a “Prod-” prefix to all Lambda names and look at the dashboard. Now you have three times as many small Lambdas.

You then discover bugs in your code, leading to request handlers throwing HTTP 500 errors and customers complaining about things not working. You dig into the CloudWatch logs. You now have three sets of logs all intermixed. This is unreadable.

A couple of weeks later, you discover that a teammate checked in code such that Lambdas are now calling each other recursively. This is really bad, because you no longer know how long a request can take to complete. This is not a Lambda-only problem; microservices can call each other by mistake too (and this is why we should use monoliths when we can).

QA comes to you asking how they can run the entire backend stack locally with Docker (a reasonable thing to ask), so they can run Selenium or Cucumber tests against it. How would you do it when you have several hundred routes, each tied up in a small Lambda? You can’t. Your laptop is not powerful enough to run all of the routes at once.

DevOps comes to you: they would also like to run unit tests in CircleCI or Jenkins to test your feature branch before it gets merged in. But they have limited computing resources and cannot spin up hundreds of small Docker processes just to run the unit tests.

What should have stayed a monolith, or a few small and separate microservices, is now impossible to run all at once in a local environment.

Do not attempt to use Lambda for REST API endpoints if you treasure your delivery time, sanity, and work throughput. Use a regular web framework for implementing REST APIs: Express/Flask/Spring/Go Revel/Rails. They all work as intended, and you will have an easier time using them.
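
For contrast, here is a minimal Flask sketch (routes and payloads hypothetical): the whole API is one process that QA can run locally in a single Docker container.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# All routes live in one process: one deploy, one log stream, one shared DB pool.
@app.route("/api/v1/books", methods=["GET", "POST"])
def books():
    if request.method == "POST":
        return jsonify(request.get_json()), 201
    return jsonify([])

@app.route("/api/v1/search", methods=["GET"])
def search():
    return jsonify({"q": request.args.get("q", ""), "results": []})

if __name__ == "__main__":
    app.run(port=8000)
```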

Conclusion

Well, there you go: 5 AWS services we love to hate. Obviously there are proper ways to use all of these, but if you are new, tread carefully, or learn some design patterns from the experts!

Triggered? Let us have it in the comments or clap us to our senses!
