MDAA dataops-lambda module
`npm install @aws-mdaa/dataops-lambda`

The Data Ops Lambda CDK application deploys the resources required to orchestrate data operations on the data lake using Lambda functions. The functions can currently be triggered by S3 EventBridge notifications (which must be enabled on the source buckets) and by EventBridge rules.
Lambda Layers - Lambda layers which can be used in Lambda functions (inside or outside of this config)

Lambda Functions - Lambda functions for use in DataOps

* May be optionally VPC bound with configurable VPC, Subnet, and Security Group Parameters
  * Can use an existing security group (from Project, for instance), or create a new security group per function
  * If creating a per-function security group:
    * All egress allowed by default (configurable)
    * No ingress allowed (not configurable)
* Function environment encrypted using project KMS Key
* Encrypted DLQ automatically added for each Lambda with configurable retry/retention parameters

EventBridge Rules - EventBridge rules for triggering Lambda functions with events such as S3 Object Created Events

* EventBridge Notifications must be enabled on any bucket for which a rule is specified
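
The requirement that EventBridge notifications be enabled on a source bucket can be met outside this module. The following is a minimal sketch using boto3, not part of MDAA itself: the helper name `enable_eventbridge_notifications` and the bucket name in the usage comment are hypothetical. Because `put_bucket_notification_configuration` replaces the bucket's entire notification configuration, the sketch reads the existing configuration first and only adds the `EventBridgeConfiguration` key.

```python
def enable_eventbridge_notifications(s3_client, bucket):
    """Turn on EventBridge delivery for a bucket while keeping its other notifications.

    `s3_client` is expected to behave like a boto3 S3 client.
    Returns the notification configuration that was written.
    """
    # Fetch the current configuration so existing SNS/SQS/Lambda targets survive.
    config = dict(s3_client.get_bucket_notification_configuration(Bucket=bucket))
    # The response carries request metadata that is not valid in the put call.
    config.pop("ResponseMetadata", None)
    # An empty mapping is all that is needed to enable EventBridge delivery.
    config["EventBridgeConfiguration"] = {}
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=config,
    )
    return config


# Usage (requires boto3 and AWS credentials; the bucket name is a placeholder):
#   import boto3
#   enable_eventbridge_notifications(boto3.client("s3"), "my-source-bucket")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before running it against a real account.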
Add the following snippet to your mdaa.yaml under the modules: section of a domain/env in order to use this module:
```yaml
dataops-lambda: # Module Name can be customized
  module_path: "@aws-mdaa/dataops-lambda" # Must match module NPM package name
  module_configs:
    - ./dataops-lambda.yaml # Filename/path can be customized
```

Module config (`./dataops-lambda.yaml`):

```yaml
# (Required) Name of the Data Ops Project
# The project whose resources will be used by these functions.
# Other resources within the project can be referenced in the functions config using
# the "project:" prefix on the config value.
projectName: dataops-project-test

functions:
  # (Optional) Function Description
  - description: Function descriptions
    # Function source code directory
    srcDir: ./src/lambda/test
    # Code path to the Lambda handler function.
    handler: test.lambda_handler
    # The runtime for the function source code.
    runtime: python3.8
    # The role with which the Lambda function will be executed
    roleArn: ssm:/sample-org/instance1/generated-role/lambda/arn
    # Number of times (0-2) Lambda will retry before the invocation event
    # is sent to DLQ.
    retryAttempts: 2
    # The max age of an invocation event before it is sent to DLQ, either due to
    # failure, or insufficient Lambda capacity.
    maxEventAgeSeconds: 3600
    # (Optional) Number of seconds after which the function will time out.
    # Default is 3 seconds
    timeoutSeconds: 10
    # (Optional) Set of environment variables and values to be passed to function
    environment:
      env-var-name: value
    # (Optional) Number of reserved concurrent instances to be configured on the function.
    # Ensures function always has this amount of concurrency available, but
    # is subtracted from the overall account-wide concurrency limits.
    # Default is to not reserve concurrency, and use the account-wide pool.
    reservedConcurrentExecutions: 100
    # (Optional) Size of function execution memory in MB
    # Default is 128MB
    memorySizeMB: 512
    # (Optional) Size of function ephemeral storage in MB
    # Default is 1024MB
    ephemeralStorageSizeMB: 1024
    # Integration with Event Bridge for the purpose
    # of triggering this function with Event Bridge rules
    eventBridge:
      # Number of times Event Bridge will attempt to trigger this function
      # before sending event to DLQ. Note that Event Bridge Lambda invocation
      # is async, so Lambda Function execution errors will generally be handled
      # on the Lambda side itself.
      retryAttempts: 10
      # The max age of an event before Event Bridge sends it to DLQ.
      maxEventAgeSeconds: 3600
      # List of s3 buckets and prefixes which will be monitored via EventBridge in order to trigger this function
      # Note that the S3 Bucket must have Event Bridge Notifications enabled.
      s3EventBridgeRules:
        testing-event-bridge-s3:
          # The bucket producing event notifications
          buckets: [sample-org-dev-instance1-datalake-raw]
          # Optional - The S3 prefix to match events on
          prefixes: [data/test-lambda/]
          # Optional - Can specify a custom event bus for S3 rules, but note that S3 EventBridge notifications
          # are initially sent only to the default bus in the account, and would need to be
          # forwarded to the custom bus before this rule would match.
          eventBusArn: "arn:{{partition}}:events:{{region}}:{{account}}:event-bus/some-custom-name"
      # List of generic Event Bridge rules which will trigger this function
      eventBridgeRules:
        testing-event-bridge:
          description: "testing"
          eventBusArn: "arn:{{partition}}:events:{{region}}:{{account}}:event-bus/some-custom-name"
          eventPattern:
            source:
              - "glue.amazonaws.com"
            detail:
              some_event_key: some_event_value
        testing-event-bridge-schedule:
          description: "testing"
          # (Optional) - Rules can be scheduled using a crontab expression
          scheduleExpression: "cron(0 20 * * ? *)"
          # (Optional) - If specified, this input will be passed as the event payload to the function.
          # If not specified, the matched event payload will be passed as input.
          input:
            some-test-input-obj:
              some-test-input-key: test-value

  # Example of a function which is VPC bound using a custom security group for this function
  - functionName: testfunvpc
    srcDir: ./src/lambda/test
    handler: test.lambda_handler
    runtime: python3.9
    roleArn: ssm:/sample-org/instance1/generated-role/lambda/arn
    # If vpcConfig is specified, Lambda will be VPC bound
    vpcConfig:
      # Required - The VPC on which the lambda will be bound
      vpcId: "some-vpc-id"
      # Required - the list of subnet ids on which ENIs will be created for the Lambda
      subnetIds:
        - "some-subnet-id"
      # Optional - If specified, custom security group egress rules will be generated, and
      # outgoing traffic from the function will be limited to these rules.
      # If not specified, all outgoing traffic from the function will be permitted.
      securityGroupEgressRules:
        # Allow egress to a CIDR range
        ipv4:
          - cidr: 10.0.0.0/8
            port: 443
            protocol: tcp
        # Allow egress to another Security Group
        sg:
          - sgId: sg-12312412412
            port: 443
            protocol: tcp
        # Allow egress to a prefixlist
        prefixList:
          - prefixList: some-prefixlist-id
            port: 443
            protocol: tcp

  # Example of a function which is VPC bound using an existing security group
  - functionName: testfunvpcexistingsg
    srcDir: ./src/lambda/test
    handler: test.lambda_handler
    runtime: python3.9
    roleArn: ssm:/sample-org/instance1/generated-role/lambda/arn
    # If vpcConfig is specified, Lambda will be VPC bound
    vpcConfig:
      # Required - The VPC on which the lambda will be bound
      vpcId: "some-vpc-id"
      # Required - the list of subnet ids on which ENIs will be created for the Lambda
      subnetIds:
        - "some-subnet-id"
      # Optional - If specified, this security group will be bound to the Lambda function vpc interfaces
      # In this example, we are using a security group generated by the DataOps Project module
      securityGroupId: project:securityGroupId/test-security-group

  # Example of a function which uses Lambda layers
  - functionName: testlayerfunction
    srcDir: ./src/lambda/test
    handler: test.lambda_handler
    runtime: python3.9
    roleArn: ssm:/sample-org/instance1/generated-role/lambda/arn
    layerArns:
      some-existing-layer: some-existing-layer-arn
    generatedLayerNames:
      - "test-layer"

  # Example of a function which uses a DockerBuild
  - functionName: testdockerfunction
    srcDir: ./src/lambda/docker
    roleArn: ssm:/sample-org/instance1/generated-role/lambda/arn
    # If true, lambda function will be built and deployed using Docker
    # In this case, the srcDir is expected to contain a Dockerfile
    dockerBuild: true
```
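
For orientation, a handler such as `test.lambda_handler` might distinguish between the payloads that the rules in this config deliver. The following is a minimal sketch, not the module's own code: the helper `extract_s3_object` and the returned dictionaries are hypothetical, and it assumes S3 "Object Created" events arrive in the standard EventBridge envelope, with `source` set to `aws.s3` and the bucket name and object key under `detail`.

```python
def extract_s3_object(event):
    """Return (bucket, key) for an S3 'Object Created' event from EventBridge, else None."""
    if not isinstance(event, dict):
        return None
    if event.get("source") != "aws.s3" or event.get("detail-type") != "Object Created":
        return None
    detail = event.get("detail", {})
    return detail["bucket"]["name"], detail["object"]["key"]


def lambda_handler(event, context):
    """Entry point referenced in the config as `handler: test.lambda_handler`."""
    s3_object = extract_s3_object(event)
    if s3_object is not None:
        bucket, key = s3_object
        # Real processing of the new object would start here.
        return {"trigger": "s3", "bucket": bucket, "key": key}
    # Any other payload: an event matched by a generic rule, or the
    # static `input` configured on a rule.
    return {"trigger": "other", "payload": event}
```

With the scheduled rule from the sample config, the handler would receive the configured `input` object rather than an S3 event, and so would take the second branch.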
The dockerBuild feature enables building Lambda functions and layers using Docker containers for complex dependencies and custom runtime environments.
```yaml
lambdaFunctions:
  layers:
    - layerName: my-custom-layer
      src: ./lambda/layer
      description: 'Layer with complex dependencies'
      dockerBuild: true
```

Directory Structure:

```
lambda/layer/
├── Dockerfile
├── requirements.txt
└── (other files)
```

Example Dockerfile for Layer:

```dockerfile
FROM public.ecr.aws/lambda/python:3.13
COPY requirements.txt .
RUN pip install -r requirements.txt -t /asset/python
CMD ["echo", "Layer built successfully"]
```
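
Once a layer like this is attached, it can be useful to confirm that a dependency is really being loaded from the layer rather than from the function package. Lambda extracts layer contents under `/opt`, and the Python runtimes put `/opt/python` on the import path. The sketch below relies on that; the helper name `is_from_layer` is hypothetical, and whether the contents of `/asset/python` end up at `/opt/python` is an assumption based on the Dockerfile above.

```python
import importlib.util
from pathlib import Path

# Where the Python runtime finds layer packages inside a Lambda environment.
LAYER_ROOT = Path("/opt/python")


def is_from_layer(module_name, layer_root=LAYER_ROOT):
    """Return True if a top-level module would be imported from under `layer_root`."""
    spec = importlib.util.find_spec(module_name)
    # Missing modules, and built-in or frozen modules, have no file location.
    if spec is None or not spec.has_location or spec.origin is None:
        return False
    origin = Path(spec.origin).resolve()
    return Path(layer_root).resolve() in origin.parents
```

Calling `is_from_layer("numpy")` from inside a function that uses the layer is one way to check the packaging without importing the module itself.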
```yaml
lambdaFunctions:
  functions:
    - functionName: my-docker-function
      srcDir: ./lambda/docker-function
      dockerBuild: true
      description: "Function built with Docker"
```

Example Dockerfile for Function:

```dockerfile
FROM public.ecr.aws/lambda/python:3.13
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py ${LAMBDA_TASK_ROOT}
CMD ["app.lambda_handler"]
```
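
The Dockerfile above copies an `app.py` and sets `app.lambda_handler` as the command, so the image needs a module with that function. A minimal sketch of such a file follows; the response shape is purely illustrative and not something the module requires.

```python
# app.py
import json
import platform


def lambda_handler(event, context):
    """Echo basic information so a deployed image can be smoke-tested."""
    # Events are usually JSON objects, but guard against other payload types.
    received_keys = sorted(event) if isinstance(event, dict) else []
    body = {
        "python": platform.python_version(),
        "received_keys": received_keys,
    }
    return {"statusCode": 200, "body": json.dumps(body)}
```

Invoking the function with a small test event and checking the reported Python version is a quick way to confirm that the image, rather than a managed runtime, is serving requests.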
Use Docker build when your Lambda requires:
- Native libraries (e.g., NumPy, Pandas with C extensions)
- System-level dependencies
- Custom compiled modules
- Large dependency sets that exceed Lambda layer size limits