Serverless plugin to deploy Glue Jobs
npm install serverless-glue
I have been away from the world of Glue, so it is difficult for me to maintain this plugin: I do not have an AWS account to test and improve it. If you want to keep updating the repository, contact me privately so I can add you as a maintainer and publish new versions.
Regards from Chile!
Serverless-glue is an open source, MIT-licensed project that has been able to grow thanks to the community. It is the result of an idea I refused to let fall into oblivion, and of many hours of work after hours.
If you want to help me you can do it in the following ways:
- With a donation through Paypal here.
- Sharing your feedback here.
I hope you liked this project and it is useful for you.
Any problems? Join the Slack channel.
---
The principal changes are available here
---
This is a plugin for the Serverless Framework that lets you deploy AWS Glue jobs and triggers.
1. Run npm install --save-dev serverless-glue
2. Add serverless-glue to the plugins section of serverless.yml
```yml
plugins:
  - serverless-glue
```
How it works
Before Serverless deploys, the plugin generates CloudFormation resources from your configuration and adds them to the Serverless template.
So any Glue job deployed with this plugin is part of your stack too.
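As a rough illustration of the kind of resource involved, a Spark job like the super-glue-job example could end up in the stack as an AWS::Glue::Job similar to the sketch below. The logical ID SuperGlueJob, the script location, and the exact set of properties are assumptions made for illustration; the template the plugin actually generates may differ.

```yml
# Hypothetical sketch, not the plugin's literal output.
Resources:
  SuperGlueJob: # assumed logical ID
    Type: AWS::Glue::Job
    Properties:
      Name: super-glue-job
      Role: arn:aws:iam::000000000:role/someRole
      GlueVersion: "2.0"
      Command:
        Name: glueetl # Glue's command name for Spark jobs
        PythonVersion: "3"
        ScriptLocation: s3://someBucket/glueJobs/script.py # assumed upload location
      ExecutionProperty:
        MaxConcurrentRuns: 3
```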
Configure your Glue jobs at the root of serverless.yml like this:
```yml
Glue:
  bucketDeploy: someBucket # Required
  createBucket: true # Optional, default = false
  createBucketConfig: # Optional
    ACL: private # Optional, private | public-read | public-read-write | authenticated-read
    LocationConstraint: af-south-1
    GrantFullControl: 'STRING_VALUE' # Optional
    GrantRead: 'STRING_VALUE' # Optional
    GrantReadACP: 'STRING_VALUE' # Optional
    GrantWrite: 'STRING_VALUE' # Optional
    GrantWriteACP: 'STRING_VALUE' # Optional
    ObjectLockEnabledForBucket: true # Optional
    ObjectOwnership: BucketOwnerPreferred # Optional
  s3Prefix: some/s3/key/location/ # optional, default = 'glueJobs/'
  tempDirBucket: someBucket # optional, default = '{serverless.serviceName}-{provider.stage}-gluejobstemp'
  tempDirS3Prefix: some/s3/key/location/ # optional, default = ''. The job name will be appended to the prefix name
  jobs:
    - name: super-glue-job # Required
      id: # Optional, string
      scriptPath: src/script.py # Required script will be named with the name after '/' and uploaded to s3Prefix location
      Description: # Optional, string
      tempDir: true # Optional true | false
      type: spark # spark / spark_streaming / pythonshell # Required
      glueVersion: python3-2.0 # Required "python3.9-1.0" | "python3.9-2.0" | "python3.9-3.0" | "python3-1.0" | "python3-2.0" | "python3-3.0" | "python2-1.0" | "python2-0.9" | "scala2-1.0" | "scala2-0.9" | "scala2-2.0" | "scala3-3.0"
      role: arn:aws:iam::000000000:role/someRole # Required
      MaxCapacity: 1 #Optional
      MaxConcurrentRuns: 3 # Optional
      WorkerType: Standard # Optional, G.1X | G.2X
      NumberOfWorkers: 1 # Optional
      SecurityConfiguration: # Optional, name of security configuration
      Connections: # Optional
        - some-conection-string
        - other-conection-string
      Timeout: # Optional, number
      MaxRetries: # Optional, number
      DefaultArguments: # Optional
        class: string # Optional
        scriptLocation: string # Optional
        extraPyFiles: string # Optional
        extraJars: string # Optional
        userJarsFirst: string # Optional
        usePostgresDriver: string # Optional
        extraFiles: string # Optional
        disableProxy: string # Optional
        jobBookmarkOption: string # Optional
        enableAutoScaling: string # Optional
        enableS3ParquetOptimizedCommitter: string # Optional
        enableRenameAlgorithmV2: string # Optional
        enableGlueDatacatalog: string # Optional
        enableMetrics: string # Optional
        enableObservabilityMetrics: string # Optional
        enableContinuousCloudwatchLog: string # Optional
        enableContinuousLogFilter: string # Optional
        continuousLogLogGroup: string # Optional
        continuousLogLogStreamPrefix: string # Optional
        continuousLogConversionPattern: string # Optional
        enableSparkUi: string # Optional
        sparkEventLogsPath: string # Optional
        additionalPythonModules: string # Optional
        customArguments: # Optional; these are user-specified custom default arguments that are passed into cloudformation with a leading -- (required for glue)
          custom_arg_1: custom_value
          custom_arg_2: other_custom_value
      SupportFiles: # Optional
        - local_path: path/to/file/or/folder/ # Required if SupportFiles is given, you can pass a folder path or a file path
          s3_bucket: bucket-name-where-to-upload-files # Required if SupportFiles is given
          s3_prefix: some/s3/key/location/ # Required if SupportFiles is given
          execute_upload: True # Boolean, True to execute upload, False to not upload. Required if SupportFiles is given
      Tags:
        job_tag_example_1: example1
        job_tag_example_2: example2
  triggers:
    - name: some-trigger-name # Required
      Description: # Optional, string
      StartOnCreation: True # Optional, True or False
      schedule: 30 12 ? * # Optional, CRON expression. The trigger will be created with On-Demand type if the schedule is not provided.
      Tags:
        trigger_tag_example_1: example1
      actions: # Required. One or more jobs to trigger
        - name: super-glue-job # Required
          args: # Optional
            custom_arg_1: custom_value
            custom_arg_2: other_custom_value
          timeout: 30 # Optional, if set, it overwrites specific jobs timeout when job starts via trigger
          SecurityConfiguration: # Optional, name of security configuration
```
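Glue expects job arguments to carry a leading --. As a sketch of what that means for customArguments, the two custom arguments in the example would reach the job's DefaultArguments in CloudFormation roughly as follows. The surrounding resource is omitted, and the exact output of the plugin is an assumption.

```yml
# Hypothetical fragment of the generated AWS::Glue::Job properties.
DefaultArguments:
  "--custom_arg_1": custom_value
  "--custom_arg_2": other_custom_value
```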
You can define a lot of jobs...
```yml
Glue:
  bucketDeploy: someBucket
  jobs:
    - name: jobA
      scriptPath: scriptA
      ...
    - name: jobB
      scriptPath: scriptB
      ...
```
And a lot of triggers...
```yml
Glue:
  triggers:
    - name:
      ...
    - name:
      ...
```
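For reference, a configuration that uses only the parameters marked as required might look like the sketch below. The bucket name, role ARN, job name, and script path are placeholders, and the choice of a Python shell job is just an example.

```yml
# Minimal sketch: a single Python shell job, required parameters only.
Glue:
  bucketDeploy: someBucket # placeholder bucket name
  jobs:
    - name: jobA
      scriptPath: src/jobA.py # placeholder script path
      type: pythonshell
      glueVersion: python3-1.0
      role: arn:aws:iam::000000000:role/someRole # placeholder role ARN
```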
Glue configuration parameters:

|Parameter|Type|Description|Required|
|-|-|-|-|
|bucketDeploy|String|S3 bucket name|true|
|createBucket|Boolean|If true, a bucket named after bucketDeploy is created before deployment. Helpful if you have not created the bucket yourself|false|
|createBucketConfig|createBucketConfig|Bucket configuration used when creating the bucket on S3|false|
|s3Prefix|String|S3 prefix name|false|
|tempDirBucket|String|S3 bucket name for the Glue temporary directory. If omitted, the bucket name is generated with the pattern {serverless.serviceName}-{provider.stage}-gluejobstemp|false|
|tempDirS3Prefix|String|S3 prefix name for the Glue temporary directory|false|
|jobs|Array|Array of Glue jobs to deploy|true|

createBucketConfig parameters:

|Parameter|Type|Description|Required|
|-|-|-|-|
|ACL|String|The canned ACL to apply to the bucket. Possible values include: private, public-read, public-read-write, authenticated-read|false|

Job parameters:

|Parameter|Type|Description|Required|
|-|-|-|-|
|name|String|Name of the job|true|
|id|String|Logical ID in CloudFormation for the job|false|
|Description|String|Description of the job|false|
|scriptPath|String|Script path in the project|true|
|tempDir|Boolean|Whether the job requires a temporary folder. If true, the plugin creates a bucket for it|false|
|type|String|Type of the job. Allowed values: spark, spark_streaming, pythonshell|true|
|glueVersion|String|Language and Glue version to use, in the format [language][version]-[glue version]. Allowed values: python3.9-1.0, python3.9-2.0, python3.9-3.0, python3-1.0, python3-2.0, python3-3.0, python2-1.0, python2-0.9, scala2-1.0, scala2-0.9, scala2-2.0, scala3-3.0|true|
|role|String|ARN of the role used to execute the job|true|
|MaxCapacity|Double|The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs|false|
|MaxConcurrentRuns|Double|Maximum number of concurrent runs of the job|false|
|MaxRetries|Int|Maximum number of retries in case of failure|false|
|Timeout|Int|Job timeout in minutes|false|
|WorkerType|String|The type of predefined worker that is allocated when a job runs. Accepts a value of Standard, G.1X, or G.2X.|false|
|NumberOfWorkers|Integer|Number of workers|false|
|SecurityConfiguration|String|The name of the security configuration that the job should use|false|
|Connections|List|A list of connections used by the job|false|
|DefaultArguments|Object|Special parameters used by AWS Glue. For more information, see the AWS documentation|false|
|SupportFiles|List|List of supporting files for the Glue job that need to be uploaded to S3|false|
|Tags|JSON|The tags to use with this job. You may use tags to limit access to the job. For more information about tags in AWS Glue, see AWS Tags in AWS Glue in the developer guide.|false|

Trigger parameters:

|Parameter|Type|Description|Required|
|-|-|-|-|
|name|String|Name of the trigger|true|
|schedule|String|CRON expression|false|
|actions|Array|An array of jobs to trigger|true|
|Description|String|Description of the trigger|false|
|StartOnCreation|Boolean|Whether the trigger starts when created. Not supported for ON_DEMAND triggers|false|

Only On-Demand and Scheduled triggers are supported.
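
To illustrate the two supported kinds, the sketch below defines one scheduled trigger and one on-demand trigger for the same job. The trigger names are hypothetical, and the schedule value assumes the six-field AWS cron syntax; check it against the AWS Glue documentation before relying on it.

```yml
Glue:
  triggers:
    - name: nightly-trigger # hypothetical name; scheduled because a schedule is given
      schedule: "0 2 * * ? *" # assumed AWS cron syntax: every day at 02:00 UTC
      StartOnCreation: True
      actions:
        - name: super-glue-job
    - name: manual-trigger # hypothetical name; on-demand because no schedule is given
      actions:
        - name: super-glue-job
```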

Parameters for each entry in actions:

|Parameter|Type|Description|Required|
|-|-|-|-|
|name|String|The name of the Glue job to trigger|true|
|timeout|Integer|Job execution timeout. If set, it overrides the job's own timeout when the job is started by the trigger|false|
|args|Map|Job arguments|false|
|Tags|JSON|The tags to use with this trigger. For more information about tags in AWS Glue, see AWS Tags in AWS Glue in the developer guide.|false|

To deploy, just run serverless deploy.

---
- Fix an undefined prefix being generated when the s3Prefix parameter is omitted

[2.11.1] - 2022-09-13

- Add support for custom logical IDs for jobs
- Fix Pascal Case generation for sections of names that are only numeric

[2.10.0] - 2022-09-12

- Add support for Python 3.9 shell jobs

[2.9.0] - 2022-06-03

- Add support for Glue 3.0 (Spark 3.1.1/Python 3.7)
- The aws-s3 client is now created with the region defined in the "provider" section of serverless.yml
- The hard-coded path generator is replaced by the "path" package, to solve problems when running the package on Windows
- Trailing "/" characters in tempDirS3Prefix are automatically removed to avoid wrong paths in S3

[2.8.0] - 2022-03-31

- Add a check for whether the bucket exists before creating it

[2.7.0] - 2022-02-25

- Add the MaxCapacity configuration for jobs

[2.6.0] - 2022-02-25

- Add support for the SecurityConfiguration property

[2.5.0] - 2022-02-14

- Add the createBucketConfig feature to set the bucket creation configuration
- Removed the message shown when support files are not found; a message is now logged when support files exist
- Improve the createBucket example in the documentation

[2.4.1] - 2022-02-01

- Fix schema typo that blocks serverless 3

[2.4.0] - 2022-01-17

- Fix NumberOwfWorkers typo
- Added Timeout, MaxRetries and Description parameters to Glue Job arguments. Added Description and StartOnCreation parameters to Glue Job Trigger arguments
- Added SupportFiles to Glue Job arguments, handling the upload to S3 of files relevant to the Glue job(s)

[2.3.0] - 2021-12-23

- Implement Custom Arguments for Jobs

[2.2.0] - 2021-12-22

- Implement Tags for jobs and triggers

[2.1.1] - 2021-12-21

- Remove the empty connections object from the CF template when no connection is specified