The guide I wish I had
TL;DR: Deploying Fargate services is not as straightforward as you may think, especially if you’re used to the current EC2 configuration and are now trying to migrate running services. It took me a couple of days and a few dozen CloudFormation deployment iterations to track down my missing or wrong settings before I got my first Fargate container running. Scroll to the bottom to see my final list of settings you should be aware of while migrating, as well as some CloudFormation template snippets.
So AWS released FARGATE for ECS last week at re:Invent. Watching the presentation during the keynote and reading the fresh documentation released with it made me think I’d only need a simple change to my existing template: set my ECS service’s LaunchType to FARGATE, and I’d be ready to deploy my first infrastructure-free service! Awesome, right?! Well, not quite.
Before running through the unfortunate events of struggling to migrate a single production service from the ECS-EC2 configuration to FARGATE, here’s the layout of my production architecture: I’m running around 30 micro-services on 7 different ECS clusters, each with an ALB on top. Everything is deployed with CloudFormation templates based on the great [open source reference architecture by awslabs](https://github.com/awslabs/ecs-refarch-cloudformation).
The first clue to what I was up against started with the innocent line:
Value of property RequiresCompatibilities must be of type List of String
“That’s easy,” I thought; “so I got a few parameter types wrong... I’ll be a little more thorough and be done with it...” Not exactly. 🙃
Instead of going on and on about how I stumbled across this error and that exception, here’s a partial list of them:
> Fargate only supports network mode awsvpc
> Fargate requires that the privileged setting be false at the container level
> Fargate requires log configuration options to include awslogs-stream-prefix to support log driver awslogs
> Fargate requires task definition to have execution role ARN to support log driver awslogs
> Fargate requires that 'memory' be defined at the task level
> No Fargate configuration exists for given values
> The provided target group <Target-Group-ARN> has target type instance, which is incompatible with the awsvpc network mode specified in the task definition
> Network Configuration must be provided when networkMode 'awsvpc' is specified
> You cannot specify an IAM role for services that require a service linked role
> Placement strategies are not supported with FARGATE launch type
> CannotPullContainerError: API error (500): Get https://000.dkr.ecr.us-east-1.amazonaws.com/v2/
Each line of the list above sent me to a different part of the CloudFormation documentation looking for clues on what went wrong. I’ll provide a solution for each of these errors, but first I’d like to list *what you should do to begin with* instead.
The ECS service configuration has to be changed
- LaunchType: determines whether you run on EC2 or FARGATE.
- NetworkConfiguration: it turns out that FG services **must** run in the awsvpc network mode. To achieve that, you need to set AwsvpcConfiguration, and under it decide whether to AssignPublicIp, as well as the SecurityGroups and Subnets in which you plan to deploy your services.
- Role: this is a tricky one. The service has to use a service-linked role. At the time of writing, this kind of role cannot be created using CloudFormation, so read the docs on how to create one; if an ECS cluster is already deployed, though, the role should already exist in your account’s IAM. NOTE: you should NOT define a Role! Let Fargate pick up the service-linked role automatically.
- Memory: “Fargate requires that ‘memory’ be defined at the task level”, meaning you should remove any Memory setting from your service settings and define it on the task instead.
Relevant API documentation: CreateService, Amazon EC2 Container Service (docs.aws.amazon.com)
Here’s more or less how it should look:
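A minimal sketch of a Fargate service resource that follows the settings listed above. It is an illustration under assumptions, not a verbatim production template: the logical names (MyCluster, MyTaskDefinition, MyTargetGroup, ServiceSecurityGroup, PrivateSubnet1, PrivateSubnet2), the container name, the port, and the desired count are all hypothetical placeholders.

```yaml
# Sketch only: every !Ref target below is a hypothetical resource or
# parameter assumed to exist elsewhere in the same template.
Service:
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref MyCluster
    LaunchType: FARGATE
    DesiredCount: 2
    TaskDefinition: !Ref MyTaskDefinition
    # Deliberately no Role property: ECS picks up the service-linked role.
    # Deliberately no PlacementStrategies: not supported with FARGATE.
    NetworkConfiguration:
      AwsvpcConfiguration:
        AssignPublicIp: DISABLED
        SecurityGroups:
          - !Ref ServiceSecurityGroup
        Subnets:
          - !Ref PrivateSubnet1
          - !Ref PrivateSubnet2
    LoadBalancers:
      - ContainerName: my-service   # must match the container name in the task definition
        ContainerPort: 8080
        TargetGroupArn: !Ref MyTargetGroup
```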
Obviously, you can use ENABLED for the public IP and set your subnets to be the public ones. Since my containers have no reason to be reachable from the outside world, they are deployed in private subnets with an available route to a NAT gateway.
Money time: Changing the TaskDefinition
- Cpu: with FG, you have to provide a CPU unit count (in vCPU units) for the task.
- RequiresCompatibilities: should contain FARGATE. Note that it has to be a list of strings, not a single string.
- NetworkMode: bridge by default; it should now be changed to awsvpc.
- Memory: “Fargate requires that ‘memory’ be defined at the task level”. Beyond that, the docs do not agree with each other.
  One description of the container-level memory field says: “If your containers will be part of a task using the Fargate launch type, this field is optional and the only requirement is that the total amount of memory reserved for all containers within a task be lower than the task memory value.”
  From the CloudFormation TaskDefinition doc: “If you are using the Fargate launch type, this field is required and you must use one of the following values, which determines your range of valid values for the cpu parameter”
  So? Should you set memory or not? It seems that CloudFormation doesn’t allow the memoryReservation option. If you’re using a different deployment method, here’s what the API docs add: “You must specify a non-zero integer for one or both of memory or memoryReservation in container definitions. If you specify both, memory must be greater than memoryReservation.”
  How much memory should be set? Follow this list:
  - 512MB, 1GB, 2GB: available cpu value is 256 (.25 vCPU)
  - 1GB, 2GB, 3GB, 4GB: available cpu value is 512 (.5 vCPU)
  - 2GB, 3GB, 4GB, 5GB, 6GB, 7GB, 8GB: available cpu value is 1024 (1 vCPU)
  - Between 4GB and 16GB in 1GB increments: available cpu value is 2048 (2 vCPU)
  - Between 8GB and 30GB in 1GB increments: available cpu value is 4096 (4 vCPU)
- LogConfiguration: if you’re using CloudWatch Logs like me, you have to set awslogs-stream-prefix for the setting to work. Actually, I’d like to get an explanation for this one… 🤔
- Roles: with FG you have to set an execution role and a task role. There is already a predefined AmazonECSTaskExecutionRolePolicy managed policy, with which you can create a role and provide its ARN to both ExecutionRoleArn and TaskRoleArn (unless you require extra permissions for your task role, in which case you can change the created role as you please).
- Note 1: you cannot use the Privileged setting for containers on FG.
- Note 2: you cannot use PlacementStrategy; placement is determined by FG.
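The execution role from the Roles item above can be sketched roughly as follows. This is a minimal example under assumptions: the logical name TaskExecutionRole is hypothetical, and the role only attaches the AWS-managed task execution policy, so any extra permissions your task needs would have to be added separately.

```yaml
# Sketch only: TaskExecutionRole is a hypothetical logical name.
TaskExecutionRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            Service: ecs-tasks.amazonaws.com   # lets ECS tasks assume this role
          Action: sts:AssumeRole
    ManagedPolicyArns:
      # AWS-managed policy covering ECR image pulls and CloudWatch Logs writes
      - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
```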
Here’s my TaskDefinition:
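A minimal sketch of a Fargate-compatible task definition that matches the settings listed above, rather than a verbatim copy of any production template. The family and container name, the image URI (using the 000 placeholder account ID), the port, and the referenced TaskExecutionRole and LogGroup resources are all assumptions and are expected to be defined elsewhere in the template.

```yaml
# Sketch only: names, image, port and referenced resources are hypothetical.
TaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: my-service
    RequiresCompatibilities:        # must be a list of strings
      - FARGATE
    NetworkMode: awsvpc             # the only mode Fargate supports
    Cpu: "256"                      # .25 vCPU
    Memory: "512"                   # task-level memory; a valid pairing for cpu 256
    ExecutionRoleArn: !GetAtt TaskExecutionRole.Arn
    TaskRoleArn: !GetAtt TaskExecutionRole.Arn
    ContainerDefinitions:
      - Name: my-service
        Image: 000.dkr.ecr.us-east-1.amazonaws.com/my-service:latest
        Essential: true
        # No Privileged setting: not allowed on Fargate
        PortMappings:
          - ContainerPort: 8080
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref LogGroup
            awslogs-region: !Ref AWS::Region
            awslogs-stream-prefix: my-service   # required on Fargate
```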
And my TargetGroup:
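A minimal sketch of a target group that works with awsvpc tasks. The key setting is TargetType: ip; the VPC reference, port, protocol, and health check path are hypothetical placeholders and should be adapted to your own service.

```yaml
# Sketch only: VPC, port and health check path are hypothetical.
TargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    VpcId: !Ref VPC
    Port: 8080
    Protocol: HTTP
    TargetType: ip          # the default, instance, is incompatible with awsvpc tasks
    HealthCheckPath: /health
```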
Finally, my service is RUNNING! Cool huh? . . . Almost…
I just realized that my entire CI process (working with the awesome drone.io) only deploys a “normal” ECS TaskDefinition, which lacks the above configuration completely when registering a new task. Since it’s a plugin I created, there’s some work ahead 🛑. If you’re using Drone and looking into Fargate, you can expect that soon enough. So what now? I now have a completely useless container running on 100% managed infrastructure… 🤓 Up ahead is integration with CI.
On a more serious note, I wish AWS had taken care of such a list or at least a preparation guide, and if not that, then an example template would have saved tons of hours for me and probably others out there.
Upon my successful deployment I came across a similar document describing some frustration with FG, written by none other than Medium Engineering, so kudos to you guys and to you, Bob Corsaro. I wish I had seen this one a little earlier 😝: “Starting FARGATE” (medium.engineering)
At last, my list of errors and ways to handle them
> Fargate only supports network mode 'awsvpc'
# Yep, check it out [here](http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-service-networkconfiguration.html)
> Fargate requires that the 'privileged' setting be 'false' at the container level
# Or not set at all; you can't have privileged containers with FG.
# I was using it for some OpenFiles Linux settings.
# That's not an option anymore.
> Fargate requires log configuration options to include awslogs-stream-prefix to support log driver awslogs
# Simply add it to your log configuration and set a prefix.
> Fargate requires task definition to have execution role ARN to support log driver awslogs
# It speaks for itself: using CloudWatch Logs? You need an execution role.
> Fargate requires that 'memory' be defined at the task level
# Tasks are now "first class citizens" (AWS naming).
# Having memory set for each of them is part of that.
> No Fargate configuration exists for given values
# Memory and CPU have to be set [according to specific rules](http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ecs-taskdefinition.html#cfn-ecs-taskdefinition-memory),
# e.g. with cpu 256 (.25 vCPU) the available memory values are 512MB, 1GB and 2GB.
# Setting 800, for example, is not an option, and this error would be the result.
> The provided target group <Target-Group-ARN> has target type instance, which is incompatible with the awsvpc network mode specified in the task definition
# See my TargetGroup: TargetType should be set to 'ip'.
> Network Configuration must be provided when networkMode 'awsvpc' is specified
# That one is on me; [read the documentation about it](http://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-networking.html) and check
# the 'Service' example for the complete setting.
> You cannot specify an IAM role for services that require a service linked role
# This one took a while to get; the API docs were more detailed
# than the CloudFormation ones. You CANNOT set a 'role' for your
# service when running on FG; ECS will pick it up automatically.
# Even if you set the service-linked role explicitly, you'll get
# this error until the setting is completely removed.
> Placement strategies are not supported with FARGATE launch type
# It's clear: remove the setting if you have it.
> CannotPullContainerError: API error (500): Get [https://000.dkr.ecr.us-east-1.amazonaws.com/v2/](https://000.dkr.ecr.us-east-1.amazonaws.com/v2/)
# If you see this one, the problem is probably with your subnets:
# if you didn't enable auto-assignment of a public IP, and the
# subnets you deployed to have no NAT routing
# (e.g. public subnets), this may be the result.
My takeaways
- Read the API documentation before the CloudFormation one
- But always read both
- Assume that every migration like this will take some time and have its own birth pangs
- Take into account all processes involved with infrastructural changes, e.g. templating and saving it in VCS, CI processes, plugins, etc.
- Don’t be too early an adopter. I’m kidding, this one won’t happen
Thanks for reading. I hope this helps other people struggling with Fargate out there, or at least softens their pain. Please let me know about any mistakes, improvements or general questions; I welcome any kind of feedback.