This CloudFormation template deploys all the necessary infrastructure in AWS to support Metaflow’s integration points and extend its capabilities into the cloud. A brief snapshot of its components is as follows:
Amazon S3 Bucket - Metaflow uses Amazon S3 as a centralized data repository for all data that’s leveraged by and generated for its flows. This template creates a dedicated private bucket and all appropriate permissions.
AWS Batch Compute Environment - To extend Metaflow’s compute capabilities to the cloud, AWS Batch provides a simple API that runs container-based jobs to completion on Amazon Elastic Container Service (Amazon ECS).
AWS Step Functions and Amazon EventBridge IAM Resources - While Step Functions state machines aren’t explicitly created by this template, Metaflow’s 2.0+ releases include functionality to allow a 1:1 Flow <-> State Machine relationship. To support this, the template creates IAM roles and policies that allow Metaflow to deploy and trigger Step Functions state machines.
Amazon DynamoDB Table - Metaflow leverages DynamoDB to store information related to branching paths in flows executed by AWS Step Functions. This template deploys the appropriate table and overlays necessary permissions for AWS Batch and AWS Step Functions to communicate with it.
Amazon SageMaker Notebook Instance - Metaflow’s API allows for easy access to flow results and information, which can be cleanly displayed in a Jupyter notebook. Amazon SageMaker notebook instances provide a fully managed notebook environment with dedicated and customizable compute resources.
Metadata and Database Services on AWS Fargate and Amazon Relational Database Service - To facilitate persistent programmatic access to flow information, Metaflow provides a Metadata Service that can run on cloud resources and be accessed remotely. This CloudFormation template leverages AWS Fargate and Amazon Relational Database Service to deploy the Metadata Service automatically.
Amazon API Gateway - To provide secure, encrypted access to a user’s Metadata Service, this CloudFormation template uses Amazon API Gateway as a TLS termination point and an optional point of basic API authentication via key.
Amazon VPC Networking - All underlying network components are deployed to facilitate connectivity for the resources leveraged by Metaflow. Specifically, the template uses a VPC with two customizable subnets and Internet connectivity.
AWS Identity and Access Management - Roles specific to Metaflow will be provisioned by this template in order to provide “principle of least privilege” access to resources such as AWS Batch and Amazon SageMaker notebook instances. Additionally, an optional role can be created that provides restricted access to only the resources Metaflow requires. This gives users who don’t need full access to all AWS resources an easy way to use Metaflow.
Once stack creation is complete, you’ll find an “Outputs” tab that contains values for the components generated by this CloudFormation template. Those values correspond to the environment variables (listed next to the outputs) that you’ll set to enable cloud features within Metaflow.
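If you prefer to read the outputs programmatically rather than from the console, the JSON returned by `aws cloudformation describe-stacks --stack-name <YOUR_STACK_NAME>` carries the same values under `Stacks[].Outputs[]`. The following is a minimal sketch that turns that JSON into a key/value mapping. The helper name `stack_outputs` and all sample values are hypothetical; the output keys you see will depend on your stack.

```python
import json


def stack_outputs(describe_stacks_json: str) -> dict:
    """Map OutputKey -> OutputValue for the first stack in the response."""
    response = json.loads(describe_stacks_json)
    outputs = response["Stacks"][0].get("Outputs", [])
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}


# Hypothetical, trimmed response used only for illustration.
sample = json.dumps({
    "Stacks": [{
        "StackName": "metaflow",
        "Outputs": [
            {"OutputKey": "MetaflowUserRoleArn",
             "OutputValue": "arn:aws:iam::123456789012:role/example-role"},
            {"OutputKey": "LoadBalancerUIDNSName",
             "OutputValue": "example-123.us-west-2.elb.amazonaws.com"},
        ],
    }]
})

for key, value in stack_outputs(sample).items():
    print(f"{key} = {value}")
```

Each value can then be assigned to the environment variable listed next to that output.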
Did you choose to enable “APIBasicAuth” and/or “CustomRole” and are wondering how they work? Then you’re in the right place! Below are some details on what happens when those features are enabled and how to make use of them.
APIBasicAuth - In addition to TLS termination, Amazon API Gateway provides the ability to authenticate requests to the Metaflow Metadata Service using an API key. Note that for security reasons, CloudFormation doesn’t include the key itself in the output; it only outputs the ID of the API key for your stack. Use the AWS CLI command below to output the key, and then export it to the METAFLOW_SERVICE_AUTH_KEY environment variable.
aws apigateway get-api-key --api-key <YOUR_KEY_ID_FROM_CFN> --include-value | grep value
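Because `grep value` prints the whole JSON line rather than the bare key, it can help to parse the response instead. The sketch below assumes you have captured the JSON printed by the command above; it reads the `value` field and builds the matching `export` line. The helper name `auth_key_export` and the sample response are hypothetical.

```python
import json
import shlex


def auth_key_export(get_api_key_json: str) -> str:
    """Build a shell export line from `get-api-key --include-value` JSON."""
    key = json.loads(get_api_key_json)["value"]
    # shlex.quote keeps the line safe even if the key has shell-special characters.
    return f"export METAFLOW_SERVICE_AUTH_KEY={shlex.quote(key)}"


# Hypothetical response; a real one contains your actual key in "value".
sample_response = json.dumps({
    "id": "abc123",
    "name": "example-key",
    "enabled": True,
    "value": "EXAMPLEKEYVALUE",
})

print(auth_key_export(sample_response))
```

The AWS CLI’s global `--query value --output text` options are another way to print only the key, should you want to stay in the shell.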
CustomRole - This template can create an optional role that can be assumed by users (or applications) and that includes limited permissions to only the resources required by Metaflow, including access only to the Amazon S3 bucket, AWS Batch Compute Environment, and Amazon SageMaker Notebook Instance created by this template. You will, however, need to modify the trust policy for the role to grant access to the principals (users/roles/accounts) who will assume it, and you’ll also need to have your users configure an appropriate role-assumption profile. The ARN of the Custom Role can be found in the “Outputs” tab of the CloudFormation stack under MetaflowUserRoleArn. To modify the trust policy to allow new principals, follow the directions here. Once you’ve granted access to the principals of your choice, have your users create a new profile for the AWS CLI that assumes the role ARN by following the directions here.
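As a rough illustration of the two pieces involved, the sketch below builds a trust policy statement that lets a list of principals call `sts:AssumeRole`, and an AWS CLI profile section that assumes the role through `role_arn` and `source_profile`. The function names, the profile name `metaflow`, and every ARN are hypothetical placeholders. In practice you would merge the principals into the role’s existing trust policy rather than replace it.

```python
import configparser
import io
import json


def trust_policy(principal_arns):
    """Trust policy document allowing the given principals to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": list(principal_arns)},
            "Action": "sts:AssumeRole",
        }],
    }


def cli_profile(profile_name, role_arn, source_profile="default"):
    """Render a section for ~/.aws/config that assumes role_arn."""
    config = configparser.ConfigParser()
    config[f"profile {profile_name}"] = {
        "role_arn": role_arn,
        "source_profile": source_profile,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder ARNs; substitute your own principal and MetaflowUserRoleArn.
example_role = "arn:aws:iam::123456789012:role/example-role"
print(json.dumps(trust_policy(["arn:aws:iam::123456789012:user/alice"]), indent=2))
print(cli_profile("metaflow", example_role))
```

With a profile like this in place, users can select it through the AWS CLI’s `--profile` option or the `AWS_PROFILE` environment variable.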
EnableUI - Please note: This section can be ignored if EnableUI is set to false (this is the default value).
This template deploys the UI with authentication using Amazon Cognito. For Cognito to work, you’ll need to provide a DNS name and an SSL certificate from AWS Certificate Manager (ACM). That means you’ll need a few additional steps if using the UI:
Set EnableUI to “true”, and in addition to this, set:
PublicDomainName to the domain name you chose
CertificateArn to the ARN of your ACM certificate
After deployment, look up the LoadBalancerUIDNSName output value. You’ll need to modify DNS settings to point your domain name to that name.