Archive for the ‘AWS’ Category

In this article, I’ll walk you through creating a lightweight Continuous Integration and Continuous Deployment (CI/CD) pipeline using a Bash script to automate uploads to an AWS S3 bucket. This solution accelerates deployment and feedback cycles without complex tools or costly infrastructure—a practical “poor man’s CI/CD pipeline” for DevOps engineers and developers alike. I’ll also compare this approach with two alternatives to illustrate different ways to achieve the same outcome, highlighting their strengths and trade-offs.

Why Automate AWS S3 Uploads?

With AWS S3’s rise as a scalable, cost-effective storage solution, manually uploading artifacts for testing or deployment can bottleneck development workflows. Automating this process saves time, ensures consistency, and aligns with DevOps principles of efficiency and repeatability. This article presents a Bash-based solution and evaluates it against other options to demonstrate versatile problem-solving.

Approach 1: The Bash Script Solution

How It Works

This script uses the AWS CLI to upload files from two local directories to an S3 bucket. Key features include:

  • Input Flexibility: Accepts file extensions (e.g., .zip, .jar) or all as arguments.
  • Dual-Directory Support: Scans dir1 and dir2 for matching files.
  • Branch Management: Checks out a specified Git branch (defaults to master).
  • Error Handling: Validates AWS CLI presence and parameter usage.

The Script

#!/bin/bash

# Configuration
BUCKET_NAME="<bucket-name>"
S3_LOCATION="<s3-location>"
DIR1="<directory-path-1>"
DIR2="<directory-path-2>"
DEFAULT_BRANCH="master"

# Verify AWS CLI installation
if ! command -v aws >/dev/null 2>&1; then
    echo "Error: AWS CLI is not installed. Please install it and configure your credentials."
    exit 1
fi

# Function to upload files to S3
upload_to_s3() {
    local dir="$1"
    local extension="$2"
    # Process substitution keeps the loop in the current shell (so 'exit' aborts the
    # script), and read -r handles filenames that contain spaces.
    while IFS= read -r file; do
        filename=$(basename "$file")
        if aws s3 cp "$file" "s3://$BUCKET_NAME/$S3_LOCATION/$filename"; then
            echo "Uploaded: $filename"
        else
            echo "Error: Failed to upload $filename"
            exit 1
        fi
    done < <(find "$dir" -type f -name "*$extension")
}

# Checkout specified branch
BRANCH_NAME="${BRANCH_NAME:-$DEFAULT_BRANCH}"
echo "Switching to branch: $BRANCH_NAME"
git checkout "$BRANCH_NAME" || { echo "Error: Failed to checkout $BRANCH_NAME"; exit 1; }

# Process arguments
if [ $# -eq 0 ]; then
    echo "Error: Please provide at least one file extension (e.g., '.zip') or 'all'."
    echo "Usage: $0 [extension1] [extension2] ... | $0 all"
    exit 1
elif [ "$1" == "all" ] && [ $# -eq 1 ]; then
    echo "Uploading all files from $DIR1 and $DIR2..."
    upload_to_s3 "$DIR1" ""
    upload_to_s3 "$DIR2" ""
else
    for ext in "$@"; do
        if [ "$ext" == "all" ]; then
            echo "Error: 'all' cannot be combined with specific extensions."
            echo "Usage: $0 [extension1] [extension2] ... | $0 all"
            exit 1
        fi
        echo "Uploading files with extension: $ext"
        upload_to_s3 "$DIR1" "$ext"
        upload_to_s3 "$DIR2" "$ext"
    done
fi

echo "Upload process completed."

Customization

Adjust BUCKET_NAME, S3_LOCATION, DIR1, and DIR2 to fit your needs. Set BRANCH_NAME (e.g., BRANCH_NAME=dev ./upload.sh all) for branch-specific uploads.
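
For reference, a couple of typical invocations might look like this (assuming the script is saved as upload.sh and made executable):

# Upload only .zip and .jar artifacts from both directories on the default branch
./upload.sh .zip .jar

# Switch to the dev branch and upload everything from both directories
BRANCH_NAME=dev ./upload.sh all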

Approach 2: GitHub Actions

How It Works: Define a YAML workflow in your GitHub repository (e.g., .github/workflows/deploy.yml) to upload files to S3 on code pushes.

Example Workflow:

name: Deploy to S3
on: [push]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Upload to S3
        run: aws s3 sync ./artifacts/ s3://<bucket-name>/<s3-location>/

Trade-Offs: Offers automation tied directly to Git events, but requires hosting the code in a GitHub repository and managing AWS credentials as repository secrets.
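
If you use the GitHub CLI, the two secrets referenced in the workflow above can be stored from a terminal; the secret names simply mirror the ones the example workflow expects:

# Store the AWS credentials as repository secrets (gh prompts for each value)
gh secret set AWS_ACCESS_KEY_ID
gh secret set AWS_SECRET_ACCESS_KEY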

Approach 3: Jenkins Pipeline

How It Works: Configure a Jenkins pipeline (e.g., via a Jenkinsfile) to poll a Git repo and upload artifacts to S3.

Example Pipeline:

pipeline {
    agent any
    stages {
        stage('Checkout') {
            steps {
                git branch: 'master', url: '<repo-url>'
            }
        }
        stage('Upload to S3') {
            steps {
                sh 'aws s3 sync ./artifacts/ s3://<bucket-name>/<s3-location>/'
            }
        }
    }
}

Trade-Offs: Provides enterprise-grade features but demands server setup and maintenance, making it overkill for simple use cases.

Comparing Alternatives

While the Bash script is a lightweight, cost-free solution, other approaches can achieve the same goal. Here’s a comparison of the three methods:

| Approach | Description | Pros | Cons | Best For |
| --- | --- | --- | --- | --- |
| 1. Bash Script | A shell script using AWS CLI to upload files from local directories to S3. | Zero cost; highly customizable; runs locally with minimal setup; fast execution | Limited scalability; manual execution; no built-in CI/CD triggers | Solo devs, small teams, quick tests |
| 2. GitHub Actions | A workflow in GitHub Actions to automate S3 uploads on push or pull requests. | Free tier available; integrates with Git; scalable with triggers; UI dashboard | Requires GitHub repo; learning curve for YAML; internet dependency | Teams using GitHub, CI/CD beginners |
| 3. Jenkins Pipeline | A Jenkins job to automate S3 uploads, triggered by SCM polling or webhooks. | Robust and scalable; extensive plugin support; on-premises option | Setup complexity; resource-intensive; maintenance overhead | Enterprises, complex workflows |

Key Benefits of the Bash Approach

  1. Time Efficiency: Automates uploads in seconds, ideal for rapid testing.
  2. Cost Savings: No paid tools or cloud instances—just Bash and AWS CLI.
  3. Simplicity: Minimal setup, perfect for quick wins or resource-constrained environments.

Compared to GitHub Actions and Jenkins, it lacks native CI/CD triggers but excels in simplicity and cost.

Conclusion

The Bash script offers a fast, free, and flexible way to automate S3 uploads, serving as a “poor man’s CI/CD pipeline” for small-scale needs. While GitHub Actions and Jenkins provide more robust automation, the script’s simplicity makes it a compelling choice for quick deployments or learning projects. Whatever your context, understanding these options equips you to tailor solutions to real-world challenges—a skill any engineering team would value.


The benefits of running databases in AWS are compelling, but how do you get your data there? In this session, we will explore, at a very high level, how to use the AWS Database Migration Service (DMS) to migrate on-premises SQL Server tables to DynamoDB in AWS.
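
At that high level, the moving parts are a source endpoint (SQL Server), a target endpoint (DynamoDB), a replication instance, and a replication task. The following is only a rough sketch, in which every ARN, file name, and identifier is a placeholder to replace with your own; creating and starting the task from the AWS CLI looks roughly like this:

# Create a replication task that copies the selected SQL Server tables to DynamoDB
aws dms create-replication-task \
    --replication-task-identifier sqlserver-to-dynamodb \
    --source-endpoint-arn <source-endpoint-arn> \
    --target-endpoint-arn <target-endpoint-arn> \
    --replication-instance-arn <replication-instance-arn> \
    --migration-type full-load \
    --table-mappings file://table-mappings.json

# Start the migration once the task reports a ready status
aws dms start-replication-task \
    --replication-task-arn <replication-task-arn> \
    --start-replication-task-type start-replication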

I will write up a follow-up blog post focusing on the nitty-gritty details of this migration. Until then, happy cloud surfing 🙂

[Slideshow: migrating on-premises SQL Server tables to DynamoDB with AWS DMS]


We are living in a world where phones and other devices with advanced biometric authentication are increasingly the norm. Apart from the tremendous convenience they offer, they also provide stronger security: there is no passcode to type in and no need to worry about someone watching over your shoulder. It should be the same with the databases that store our most valuable information. In this article, I am going to show you how we can achieve something similar with a Redshift database hosted in the Amazon cloud: logging in without maintaining a password.

Commonly, Amazon Redshift users log on to the database by providing a database username and password, or by using a password file (.pgpass) in the user’s home directory with psql. Both options require you to maintain passwords somewhere, which is not always desirable. As an alternative, we can configure our systems to let users generate database credentials and log on to the database based on their IAM credentials, on the fly.

Amazon Redshift provides the GetClusterCredentials API action to generate temporary database user credentials. We can configure our SQL client with Amazon Redshift JDBC or ODBC drivers that manage the process of calling the GetClusterCredentials action. They do so by retrieving the database user credentials and establishing a connection between your SQL client and your Amazon Redshift database. You can also use your database application to programmatically call the GetClusterCredentials action, retrieve database user credentials, and connect to the database.

Create an IAM Role or User With Permissions to Call GetClusterCredentials

Our SQL client needs permission to call the GetClusterCredentials action on our behalf. We manage those permissions by creating an IAM role and attaching an IAM permissions policy that grants (or restricts) access to the GetClusterCredentials action and related actions.

Create an IAM user or role.

Using the IAM service, create an IAM user or role. You can also use an existing user or role; for example, if you created an IAM role for identity provider access, you can attach the necessary IAM policies to that role. I used an existing role for my test, but here is how to create a new user if you need to.

Go to the IAM service in the AWS console and click Add user.

[Screenshot: Add user in the IAM console]

You can choose either Programmatic access or AWS Management Console access.

Create and attach a policy to the above user

[Screenshot]

Go to Policies and click Create Policy.

[Screenshot: the Create Policy page]

I picked 'Create Your Own Policy' so I could copy and paste the code below, but you can let AWS build one for you if you choose 'Policy Generator'.

[Screenshot: the Create Your Own Policy editor]

Copy and paste the policy document below into the above screen. Make sure to update the “Resource” field for your own cluster; see the naming convention for Redshift resource ARNs here.

Once you have the policy document, validate it for any errors and then click 'Create Policy'.


{
    "Version": "2012-10-17",
    "Statement": [
    {
        "Sid": "Stmt1510160971000",
        "Effect": "Allow",
        "Action": [
          "redshift:GetClusterCredentials"
         ],
        "Resource": [
            "arn:aws:redshift:us-west-2:1234567890:dbuser:datag/temp_creds_user",
            "arn:aws:redshift:us-west-2:1234567890:dbname:datag/dataguser"
         ]
     }
  ]
}

 

Attach the above policy

Once you have created the new policy, attach it to the user as shown below. This grants the user the required privileges.

[Screenshot: attaching the policy to the user]

Click Add Permission.

[Screenshot: Add Permission]

Select the policy to attach.

[Screenshot: selecting the policy to attach]

Click Apply permission.
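
If you prefer the command line over the console, the same steps can be scripted with the AWS CLI. This is only a sketch: the policy file name and policy name are hypothetical, the IAM user name is a placeholder (it need not match the database user), and the account ID is the example one from the policy above.

# Create the IAM user (skip this if you are reusing an existing user or role)
aws iam create-user --user-name redshift-temp-creds

# Create the policy from the JSON document shown above, saved locally as policy.json
aws iam create-policy \
    --policy-name RedshiftGetClusterCredentials \
    --policy-document file://policy.json

# Attach the policy to the user (the policy ARN is returned by the previous command)
aws iam attach-user-policy \
    --user-name redshift-temp-creds \
    --policy-arn arn:aws:iam::1234567890:policy/RedshiftGetClusterCredentials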

Create a Database User and Database Groups

You can create a database user that you use to log on to the cluster database. If you create temporary user credentials for an existing user, you can disable the user’s password to force the user to log on with the temporary password. Alternatively, you can use the GetClusterCredentials Autocreate option to automatically create a new database user.

create user temp_creds_user password disable;
create group auto_login_group with user temp_creds_user;
grant all on all tables in schema public to group auto_login_group;

[Screenshot: running the above queries in SQLWorkbench]

Use the admin account (and its password) to run the above queries in SQLWorkbench.

Connecting through a SQL Client Tool – Configuring a JDBC Connection

You can configure your SQL client with an Amazon Redshift JDBC (or ODBC) driver that manages the process of creating database user credentials and establishing a connection between your SQL client and your Amazon Redshift database.

Download the latest Amazon Redshift JDBC driver from the Configure a JDBC Connection page.

Important: The Amazon Redshift JDBC driver must be version 1.2.7.1003 or later.

Create a JDBC URL with the IAM credentials options

jdbc:redshift:iam://examplecluster.abcd1234.us-west-2.redshift.amazonaws.com:5439/dbname

In the SQLWorkbench URL field, use the connection string below, substituting your own access key and secret key:

jdbc:redshift:iam://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dbname?AccessKeyID=abcd&SecretAccessKey=abcde1234567890fghijkl

Add the JDBC options that the driver uses to call the GetClusterCredentials API action (don’t include these options if you call the GetClusterCredentials API action programmatically). As the screenshot below from SQLWorkbench shows, the connection succeeds even without providing a password.

[Screenshot: SQLWorkbench connecting successfully without a password]
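
For reference, the credential-generating options can also be placed directly in the JDBC URL. This is only a sketch: the option names (DbUser, AutoCreate, DbGroups) come from the Redshift JDBC driver’s IAM options, and the user and group are the ones created earlier.

jdbc:redshift:iam://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dbname?AccessKeyID=abcd&SecretAccessKey=abcde1234567890fghijkl&DbUser=temp_creds_user&AutoCreate=false&DbGroups=auto_login_group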

Connecting through the Redshift CLI or API – Generating IAM Database Credentials

To generate database credentials, run the Redshift CLI command below with your cluster identifier and the database user created above.

aws redshift get-cluster-credentials --cluster-identifier exampleCluster --db-user temp_creds_user --db-name birch --duration-seconds 3600

Below is an example of the output, showing a database password generated on the fly that can be used to log in to Redshift with psql. You can easily automate this in Bash so that the generated password is captured and supplied at login, eliminating the copy-and-paste work; a sketch of that automation follows the screenshots.

[Screenshot: example get-cluster-credentials output with the temporary password]

[Screenshot: logging in with psql using the returned password]

Supply the returned password to the psql command to log in.
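
Here is a minimal sketch of that automation. It assumes jq is installed and reuses the example cluster endpoint, database, and user from above; the user name is read back from the response because the returned value carries an IAM: prefix.

#!/bin/bash
# Request temporary credentials for the Redshift cluster
CREDS=$(aws redshift get-cluster-credentials \
    --cluster-identifier exampleCluster \
    --db-user temp_creds_user \
    --db-name birch \
    --duration-seconds 3600)

# Pull the generated user name and password out of the JSON response
DB_USER=$(echo "$CREDS" | jq -r '.DbUser')
DB_PASSWORD=$(echo "$CREDS" | jq -r '.DbPassword')

# psql reads the password from PGPASSWORD, so nothing needs to be typed or pasted
PGPASSWORD="$DB_PASSWORD" psql \
    -h examplecluster.abcd1234.us-west-2.redshift.amazonaws.com \
    -p 5439 -d birch -U "$DB_USER"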

Happy coding!
