Backend Service

Overview

The Saut backend service is deployed on AWS infrastructure. All AWS-managed resources are hosted in the me-south-1 (Bahrain) region.

Various AWS resources are used to run the backend service. The following sections discuss each of these resources and how they are used.


VPC

The Saut backend service operates within its dedicated VPC, isolated from other virtual networks in the AWS cloud. This design ensures robust resource isolation for enhanced security. Access to these resources is managed through well-configured subnets and security groups.

The VPC is created with three subnet groups spanning two availability zones.

  1. A public subnet, called the ingress subnet, receives all inbound traffic from the internet.
  2. A private subnet, called the application subnet, runs all web server instances. Traffic from ingress is routed to application.
  3. An isolated subnet, called the data-storage subnet, hosts the Aurora PostgreSQL database and ElastiCache instances. Traffic from application is routed to data-storage.

Security groups are configured in such a way that resource access and traffic flow unidirectionally: ingress -> application -> data-storage.

A NAT Gateway is set up for internet access; this provides a secure outbound internet access configuration for resources deployed in the private subnets.

AWS CDK Code Snippet

``` typescript
import { CfnOutput, Stack, StackProps, aws_ec2 as ec2 } from 'aws-cdk-lib';
import { Construct } from 'constructs';
export class VpcStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    const appPrefix = process.env.APP_PREFIX;
    if (appPrefix) {
      // create vpc
      const vpc = new ec2.Vpc(this, `${appPrefix}`, {
        maxAzs: 2,
        cidr: '10.0.0.0/21',
        enableDnsSupport: true,
        natGateways: 1,
        subnetConfiguration: [
          // for load balancers and internet facing
          {
            cidrMask: 24,
            name: 'ingress',
            subnetType: ec2.SubnetType.PUBLIC,
          },
          // for application instances and containers
          {
            cidrMask: 24,
            name: 'application',
            subnetType: ec2.SubnetType.PRIVATE_WITH_NAT,
          },
          // for database, redis, elastic search and other storage solutions
          {
            cidrMask: 24,
            name: 'data-storage',
            subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
          },
        ],
      });
      new CfnOutput(this, 'VPC ARN:', {
        value: vpc.vpcArn,
      });
    } else {
      throw new Error('App prefix must be defined');
    }
  }
}
```

Database: AWS Aurora PostgreSQL Cluster

The database solution for the system is an Aurora PostgreSQL cluster, sized db.t3.medium, providing a balance of performance and cost-efficiency. The cluster is hosted within the data-storage subnet and secured by a dedicated security group, which restricts access exclusively to ECS tasks deployed in the application subnet. This configuration ensures a secure and isolated environment for database operations.

Current Configuration

  • Writer Instance: The cluster currently operates with a single writer instance, which handles all write and read operations.
  • Security: Access is tightly controlled through network configurations, ensuring that only authorized ECS tasks can communicate with the database.

Future Scalability

The architecture is designed with scalability in mind. While it currently utilizes a single writer instance, the cluster can be expanded to support a multi-reader, single-writer architecture. This setup allows for the addition of read replicas, enabling horizontal scaling of read operations to handle increased workload demands.
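
The snippet below is a minimal sketch of that expansion, reusing the `instanceProps`, `dbCreds` and `vpc` values from the Aurora stack shown further down; the reader identifier is illustrative and not part of the current deployment.

``` typescript
// Hypothetical extension of the AuroraStack below: adds one provisioned read replica.
// `appPrefix`, `dbCreds`, `vpc` and `instanceProps` are the values defined in that stack.
const dbCluster = new rds.DatabaseCluster(this, `${appPrefix}-db`, {
  engine: rds.DatabaseClusterEngine.auroraPostgres({
    version: rds.AuroraPostgresEngineVersion.VER_15_2,
  }),
  credentials: rds.Credentials.fromSecret(dbCreds),
  defaultDatabaseName: process.env.DB_NAME,
  writer: rds.ClusterInstance.provisioned(`${appPrefix}-db-writer`, {
    ...instanceProps,
  }),
  // Read replicas scale read traffic horizontally; more can be appended later.
  readers: [
    rds.ClusterInstance.provisioned(`${appPrefix}-db-reader-1`, {
      ...instanceProps,
    }),
  ],
  vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
  vpc,
});
```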

Backups

Aurora PostgreSQL supports point-in-time restore. In addition, we use AWS Backup to perform daily backups of the database.
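
As a minimal sketch, the daily AWS Backup plan could be declared in CDK next to the cluster as shown below; the identifiers and the 35-day retention are illustrative assumptions, not the deployed configuration.

``` typescript
import { aws_backup as backup } from 'aws-cdk-lib';

// Illustrative only: a daily backup plan (35-day retention) covering the Aurora cluster.
// `appPrefix` and `dbCluster` are the values defined in the Aurora stack below.
const backupPlan = backup.BackupPlan.daily35DayRetention(
  this,
  `${appPrefix}-db-backup-plan`,
);
backupPlan.addSelection(`${appPrefix}-db-backup-selection`, {
  resources: [backup.BackupResource.fromArn(dbCluster.clusterArn)],
});
```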

With its robust features and flexibility, the Aurora PostgreSQL cluster ensures both reliable performance and the ability to grow as the application's needs evolve.

AWS CDK Code Snippet

``` typescript
import {
  CfnOutput,
  Stack,
  StackProps,
  aws_rds as rds,
  aws_secretsmanager as secretsmanager,
  aws_ec2 as ec2,
} from 'aws-cdk-lib';
import { ProvisionedClusterInstanceProps } from 'aws-cdk-lib/aws-rds';
import { Construct } from 'constructs';

export class AuroraStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const appPrefix = process.env.APP_PREFIX;
    if (
      appPrefix &&
      process.env.VPC_ID &&
      process.env.DB_SECRET_ARN &&
      process.env.DB_NAME
    ) {
      const vpc = ec2.Vpc.fromLookup(this, `${appPrefix}-vpc-lookup`, {
        vpcId: process.env.VPC_ID,
      });
      const dbCreds = secretsmanager.Secret.fromSecretCompleteArn(
        this,
        `${appPrefix}-db-secret-lookup`,
        process.env.DB_SECRET_ARN || '',
      );
      const instanceProps: ProvisionedClusterInstanceProps = {
        instanceType: ec2.InstanceType.of(
          ec2.InstanceClass.BURSTABLE3,
          ec2.InstanceSize.MEDIUM,
        ),
        isFromLegacyInstanceProps: true,
      };
      const dbCluster = new rds.DatabaseCluster(this, `${appPrefix}-db`, {
        engine: rds.DatabaseClusterEngine.auroraPostgres({
          version: rds.AuroraPostgresEngineVersion.VER_15_2,
        }),
        credentials: rds.Credentials.fromSecret(dbCreds),
        defaultDatabaseName: process.env.DB_NAME,
        writer: rds.ClusterInstance.provisioned(`${appPrefix}-db-writer`, {
          ...instanceProps,
        }),
        vpcSubnets: {
          subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
        },
        vpc,
      });
      new CfnOutput(this, 'Aurora Cluster Identifier:', {
        value: dbCluster.clusterIdentifier,
      });
    } else {
      throw new Error('APP_PREFIX, VPC_ID, DB_SECRET_ARN and DB_NAME must be defined');
    }
  }
}
```

AWS ElastiCache: In-Memory Cache

An AWS ElastiCache instance, sized cache.t3.micro, is utilized as the in-memory caching solution. This instance is deployed within the data-storage subnet and secured by a dedicated security group, allowing access only to ECS tasks running in the application subnet.

Powered by open-source Redis, ElastiCache plays a dual role in the system:

  • Caching: It accelerates data retrieval and reduces database load by serving as a high-performance cache.
  • Background Job Storage: It acts as a reliable storage layer for the background job service, ensuring efficient task queuing and processing.

This integration enhances overall application responsiveness and scalability, providing a seamless user experience.

AWS CDK Code Snippet

``` typescript
import {
  CfnOutput,
  Stack,
  StackProps,
  aws_ec2 as ec2,
  aws_elasticache as elasticache,
} from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class RedisStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    const appPrefix = process.env.APP_PREFIX;
    if (appPrefix && process.env.VPC_ID) {
      const vpc = ec2.Vpc.fromLookup(this, `${appPrefix}-vpc-lookup`, {
        vpcId: process.env.VPC_ID,
      });
      // create elasticache cluster:
      const redisSecurityGroup = new ec2.SecurityGroup(
        this,
        `${appPrefix}-redis-security-group`,
        { vpc: vpc },
      );
      const redisSubnetGroup = new elasticache.CfnSubnetGroup(
        this,
        `${appPrefix}-redis-subnet-group`,
        {
          cacheSubnetGroupName: `${appPrefix}-redis-subnet-group`,
          description: 'The redis subnet group id',
          subnetIds: vpc.isolatedSubnets.map(subnet => subnet.subnetId),
        },
      );
      const redisCluster = new elasticache.CfnCacheCluster(
        this,
        `${appPrefix}-redis`,
        {
          cacheNodeType: 'cache.t3.micro',
          engine: 'redis',
          engineVersion: '7.0',
          numCacheNodes: 1,
          vpcSecurityGroupIds: [redisSecurityGroup.securityGroupId],
          cacheSubnetGroupName: redisSubnetGroup.cacheSubnetGroupName,
        },
      );
      redisCluster.addDependency(redisSubnetGroup);
      new CfnOutput(this, 'Redis URI:', {
        value: redisCluster.attrRedisEndpointAddress,
      });
    } else {
      throw new Error('App prefix and VPC ID must be defined');
    }
  }
}
```

AWS ECR: Container Repository

The system utilizes Amazon Elastic Container Registry (ECR) as the central repository for managing Docker images. This repository serves as a secure and scalable solution for storing and retrieving container images required by the backend services.

Image Creation Process

  • Source Code to Image: The backend Git repository is seamlessly transformed into optimized Docker images using AWS CodeBuild. CodeBuild automates the process of pulling the latest code, building it into a container image, and pushing the finalized image to the ECR repository.
  • Optimized for Deployment: The generated images are lightweight and production-ready, ensuring fast startup times and efficient resource utilization for ECS tasks.

Access Control

  • Restricted Access: Access to the ECR repository is tightly controlled to ensure security and compliance.
  • CodeBuild: Only the CodeBuild service role is allowed to push images to ECR during the build process.
  • ECS Tasks: ECS task roles are exclusively permitted to pull images from ECR during deployment.

By integrating Amazon ECR with CodeBuild and ECS, the system achieves a streamlined, secure, and efficient workflow for building, storing, and deploying containerized applications.

AWS CDK Code Snippet

``` typescript
import {
  CfnOutput,
  Duration,
  RemovalPolicy,
  Stack,
  StackProps,
  aws_ecr as ecr,
} from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class EcrStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const appPrefix = process.env.APP_PREFIX;
    if (appPrefix) {
      const repo = new ecr.Repository(this, `${appPrefix}`, {
        repositoryName: `${appPrefix}`,
        removalPolicy: RemovalPolicy.DESTROY,
        lifecycleRules: [
          {
            maxImageAge: Duration.days(90),
          },
        ],
      });
      new CfnOutput(this, 'ECR ARN:', {
        value: repo.repositoryArn,
      });
    } else {
      throw new Error('App prefix must be defined');
    }
  }
}
```

AWS Elastic Container Service: Backend application servers

The system leverages Amazon Elastic Container Service (ECS) with AWS Fargate for deploying and managing application servers. This architecture ensures a cost-efficient, scalable, and high-performance deployment model, optimized for varying workloads.

ECS Cluster and Services

The ECS cluster is configured with three distinct Fargate services to address different operational needs:

1. Web Application Servers

  • Hosts the core application logic.
  • Integrated with an Application Load Balancer for efficient traffic distribution.
  • Equipped with auto-scaling capabilities, enabling dynamic adjustment of task instances based on real-time demand.

2. Background Job Service

  • Designed for processing asynchronous tasks.
  • Includes auto-scaling support to handle spikes in job queue workloads efficiently.

3. One-Off Service for Remote Host Execution

  • Configured to handle specific, on-demand tasks requiring remote execution.
  • Lightweight and task-specific to minimize resource usage.

Cost Efficiency and Resource Management

  • Minimal Task Sizes: Each service is configured with the smallest task sizes necessary for efficient operation, ensuring cost optimization without compromising performance.
  • Auto-Scaling: All services are equipped with auto-scaling, allowing resource allocation to flex dynamically based on usage, which eliminates over-provisioning and reduces operational costs.

Containerized Deployment

  • Image Source: All ECS tasks are container-based and securely pull the required Docker images from Amazon ECR.
  • Task Isolation: Each Fargate task operates within its own security boundary, ensuring robust isolation and controlled access.

Security and Permissions

  • Dedicated Security Groups: Each ECS Fargate task is assigned its own security group. These groups are configured to grant precise permissions required for accessing system resources, such as:
    • Databases
    • In-Memory Cache
    • S3 Buckets
    • Other Subsystem Components

By combining the flexibility of AWS Fargate with the scalability of ECS, the system achieves a deployment strategy that excels in cost efficiency, performance, and adaptability to changing demands.

AWS CDK Code Snippet

``` typescript
import {
  CfnOutput,
  CfnResource,
  Stack,
  StackProps,
  aws_ec2 as ec2,
  aws_elasticache as elasticache,
  aws_secretsmanager as secretsmanager,
  aws_ecs as ecs,
  aws_rds as rds,
  aws_ssm as ssm,
  aws_ecr as ecr,
  aws_ecs_patterns as ecs_patterns,
  aws_iam as iam,
  Duration,
} from 'aws-cdk-lib';
import { ssmList } from '../../ssm_parameters';
import { Construct } from 'constructs';

export class EcsStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    const appPrefix = process.env.APP_PREFIX;
    if (appPrefix) {
      const environment = {
        RAILS_ENV: 'production',
        RACK_ENV: 'production',
        NODE_ENV: 'production',
        REDIS_URL: `redis://${process.env.REDIS_HOST}:6379`,
        RAILS_SERVE_STATIC_FILES: '1',
        RAILS_LOG_TO_STDOUT: '1',
      };
      const redisSecurityGroup = ec2.SecurityGroup.fromLookupById(
        this,
        `${appPrefix}-redis-sg-lookup`,
        process.env.REDIS_SG_ID || '',
      );
      const redisConnections = new ec2.Connections({
        securityGroups: [redisSecurityGroup],
        defaultPort: ec2.Port.tcp(6379),
      });
      const dbCreds = secretsmanager.Secret.fromSecretCompleteArn(
        this,
        `${appPrefix}-db-secret-lookup`,
        process.env.DB_SECRET_ARN || '',
      );
      const dbSecurityGroup = ec2.SecurityGroup.fromLookupById(
        this,
        `${appPrefix}-db-sg-lookup`,
        process.env.AURORA_SG_ID || '',
      );
      const dbConnections = new ec2.Connections({
        securityGroups: [dbSecurityGroup],
        defaultPort: ec2.Port.tcp(5432),
      });
      const dbCluster = rds.DatabaseCluster.fromDatabaseClusterAttributes(
        this,
        `${appPrefix}-redis-db-lookup`,
        {
          clusterIdentifier: dbCreds
            .secretValueFromJson('dbClusterIdentifier')
            .toString(),
          port: 5432,
        },
      );
      let secrets: Record<string, ecs.Secret> = {
        DB_HOST: ecs.Secret.fromSecretsManager(dbCreds, 'host'),
        DB_DATABASE: ecs.Secret.fromSecretsManager(dbCreds, 'dbname'),
        DB_USERNAME: ecs.Secret.fromSecretsManager(dbCreds, 'username'),
        DB_PASSWORD: ecs.Secret.fromSecretsManager(dbCreds, 'password'),
      };
      ssmList.map(item => {
        const param = ssm.StringParameter.fromStringParameterName(
          this,
          `${appPrefix}-ssm-lookup-${item.name}`,
          `${appPrefix}-${item.name}`,
        );
        secrets[item.env_name] = ecs.Secret.fromSsmParameter(param);
      });

      const ecrRepo = ecr.Repository.fromRepositoryArn(
        this,
        `${appPrefix}-ecr-lookup`,
        process.env.ECR_ARN || '',
      );

      const webTaskDef = new ecs.FargateTaskDefinition(
        this,
        `${appPrefix}-web-task-definition`,
        {
          cpu: 512,
          memoryLimitMiB: 1024,
        },
      );

      // web def
      const webContainer = webTaskDef.addContainer('web-prod', {
        image: ecs.ContainerImage.fromEcrRepository(ecrRepo, 'cdk-deploy'),
        command: ['bundle', 'exec', 'puma', '-C', 'config/puma.rb'],
        memoryReservationMiB: 1024,
        cpu: 512,
        essential: true,
        environment,
        logging: new ecs.AwsLogDriver({
          streamPrefix: `${appPrefix}-web-container`,
        }),
        secrets,
      });

      webContainer.addPortMappings({
        containerPort: 3000,
        protocol: ecs.Protocol.TCP,
      });

      const vpc = ec2.Vpc.fromLookup(this, `${appPrefix}-vpc-lookup`, {
        vpcId: process.env.VPC_ID,
      });

      const ecsCluster = new ecs.Cluster(this, `${appPrefix}-cluster`, { vpc });

      const fargateWebService =
        new ecs_patterns.ApplicationLoadBalancedFargateService(
          this,
          `${appPrefix}-web-service`,
          {
            memoryLimitMiB: 1024,
            cpu: 512,
            cluster: ecsCluster,
            taskDefinition: webTaskDef,
          },
        );
      fargateWebService.targetGroup.configureHealthCheck({
        path: '/health',
        healthyThresholdCount: 2,
        interval: Duration.seconds(30),
      });

      const webScaling = fargateWebService.service.autoScaleTaskCount({
        minCapacity: 1,
        maxCapacity: 2,
      });

      webScaling.scaleOnCpuUtilization(`${appPrefix}-web-service-cpu-scaling`, {
        targetUtilizationPercent: 40,
        scaleInCooldown: Duration.seconds(60),
        scaleOutCooldown: Duration.seconds(60),
      });

      webScaling.scaleOnMemoryUtilization(
        `${appPrefix}-web-service-memory-scaling`,
        {
          targetUtilizationPercent: 40,
          scaleInCooldown: Duration.seconds(60),
          scaleOutCooldown: Duration.seconds(60),
        },
      );

      dbConnections.allowDefaultPortFrom(fargateWebService.service);
      redisConnections.allowDefaultPortFrom(fargateWebService.service);

      // worker def
      const workerTaskDef = new ecs.FargateTaskDefinition(
        this,
        `${appPrefix}-worker-task-definition`,
        {
          cpu: 512,
          memoryLimitMiB: 1024,
          taskRole: webTaskDef.taskRole,
          executionRole: webTaskDef.executionRole,
        },
      );

      workerTaskDef.addContainer('worker', {
        image: ecs.ContainerImage.fromEcrRepository(ecrRepo, 'cdk-deploy'),
        command: ['bundle', 'exec', 'sidekiq', '-C', 'config/sidekiq.yml'],
        memoryLimitMiB: 1024,
        cpu: 512,
        environment,
        secrets,
        logging: new ecs.AwsLogDriver({
          streamPrefix: `${appPrefix}-worker-container`,
        }),
      });

      const fargateWorkerService = new ecs.FargateService(
        this,
        `${appPrefix}-worker-service`,
        {
          cluster: ecsCluster,
          taskDefinition: workerTaskDef,
          enableExecuteCommand: true,
        },
      );

      const workerScaling = fargateWorkerService.autoScaleTaskCount({
        minCapacity: 1,
        maxCapacity: 2,
      });
      workerScaling.scaleOnCpuUtilization(
        `${appPrefix}-worker-service-cpu-scaling`,
        {
          targetUtilizationPercent: 40,
          scaleInCooldown: Duration.seconds(60),
          scaleOutCooldown: Duration.seconds(60),
        },
      );

      workerScaling.scaleOnMemoryUtilization(
        `${appPrefix}-worker-service-memory-scaling`,
        {
          targetUtilizationPercent: 40,
          scaleInCooldown: Duration.seconds(60),
          scaleOutCooldown: Duration.seconds(60),
        },
      );

      dbConnections.allowDefaultPortFrom(fargateWorkerService);
      redisConnections.allowDefaultPortFrom(fargateWorkerService);

      // oneoff service
      const oneOffTaskDef = new ecs.FargateTaskDefinition(
        this,
        `${appPrefix}-oneoff-task-definition`,
        {
          cpu: 512,
          memoryLimitMiB: 1024,
          taskRole: webTaskDef.taskRole,
          executionRole: webTaskDef.executionRole,
        },
      );

      oneOffTaskDef.addContainer('oneoff', {
        image: ecs.ContainerImage.fromEcrRepository(ecrRepo, 'cdk-deploy'),
        command: [
          'sh',
          '-c',
          'bundle exec rails db:migrate && bundle exec puma -C config/puma.rb',
        ],
        memoryLimitMiB: 1024,
        cpu: 512,
        environment,
        secrets,
        logging: new ecs.AwsLogDriver({
          streamPrefix: `${appPrefix}-oneoff-container`,
        }),
      });

      const fargateOneoffService = new ecs.FargateService(
        this,
        `${appPrefix}-oneoff-service`,
        {
          cluster: ecsCluster,
          taskDefinition: oneOffTaskDef,
          enableExecuteCommand: true,
        },
      );

      dbConnections.allowDefaultPortFrom(fargateOneoffService);
      redisConnections.allowDefaultPortFrom(fargateOneoffService);

      const ssmManagedPolicy = {
        Version: '2012-10-17',
        Statement: [
          {
            Effect: 'Allow',
            Action: ['ssm:DescribeParameters'],
            Resource: '*',
          },
          {
            Effect: 'Allow',
            Action: ['ssm:GetParameters'],
            Resource: `arn:aws:ssm:${process.env.AWS_REGION}:${process.env.AWS_ACCOUNT_ID}:parameter/${appPrefix}-*`,
          },
          {
            Effect: 'Allow',
            Action: ['ssm:GetParameter'],
            Resource: `arn:aws:ssm:${process.env.AWS_REGION}:${process.env.AWS_ACCOUNT_ID}:parameter/${appPrefix}-*`,
          },
          {
            Effect: 'Allow',
            Action: ['ssm:GetParameterHistory'],
            Resource: `arn:aws:ssm:${process.env.AWS_REGION}:${process.env.AWS_ACCOUNT_ID}:parameter/${appPrefix}-*`,
          },
        ],
      };
      webTaskDef.executionRole?.addManagedPolicy(
        new iam.ManagedPolicy(this, `${appPrefix}-ssm-managed-policy`, {
          document: iam.PolicyDocument.fromJson(ssmManagedPolicy),
        }),
      );
    } else {
      throw new Error('App prefix must be defined');
    }
  }
}
```

AWS CodePipeline: Continuous Deployment

The deployment process is fully automated using AWS CodePipeline, which integrates seamlessly with the application's GitHub repository. This setup ensures efficiency, performance, and scalability, making it an essential component of the CI/CD workflow.

1. Integration with GitHub

  • CodePipeline is configured with a trigger hook to monitor the GitHub repository for changes.
  • Any push to the repository automatically initiates the pipeline, eliminating the need for manual intervention.

2. Three Stages of Automation

  • Stage 1: Source
    • Clones the repository from GitHub, ensuring that the latest code changes are fetched.
  • Stage 2: Build
    • Triggers an AWS CodeBuild project to execute the build script.
    • This step containerizes the application into a Docker image and securely pushes the image to Amazon ECR.
  • Stage 3: Deploy
    • Uses the artifact generated by CodeBuild to deploy the application updates to the three ECS services (web application servers, background job service, and one-off tasks).

3. Slack Notifications

  • The pipeline is integrated with Slack to notify the team about the progress of each stage.
  • These real-time updates keep the team informed and enhance collaboration during deployments.

Benefits of Automation

  • Efficiency: CodePipeline eliminates manual steps, streamlining the deployment process and reducing the risk of errors.
  • Performance: The automated workflow ensures quick deployments, enabling faster release cycles.
  • Scalability: The pipeline is designed to scale with the application’s growth, supporting larger repositories, more complex builds, and additional services seamlessly.

Security and Permissions

  • Resource Access: The pipeline is granted the IAM permissions required to interact with critical AWS resources such as:

    • Amazon ECR for Docker images.
    • Amazon ECS for deploying updates.
    • CodeBuild for building containerized artifacts.

With AWS CodePipeline, the deployment process is not just automated but optimized for high performance, ensuring the application is always up-to-date and ready to handle growing demands.
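
This section has no accompanying snippet; the sketch below outlines how such a three-stage pipeline could be declared in CDK. The GitHub connection ARN, repository coordinates, and service lookup are assumptions for illustration, and only one of the three ECS deploy actions is shown.

``` typescript
import {
  Stack,
  StackProps,
  aws_codebuild as codebuild,
  aws_codepipeline as codepipeline,
  aws_codepipeline_actions as actions,
  aws_ecs as ecs,
} from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class PipelineStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const sourceOutput = new codepipeline.Artifact();
    const buildOutput = new codepipeline.Artifact();

    // CodeBuild project running the repository's buildspec: builds the Docker
    // image and pushes it to ECR. Privileged mode is required for Docker builds.
    const buildProject = new codebuild.PipelineProject(this, 'BackendBuild', {
      environment: { privileged: true },
    });

    // Existing web service, looked up by ARN (assumed env var, for illustration only).
    const webService = ecs.BaseService.fromServiceArnWithCluster(
      this,
      'WebServiceLookup',
      process.env.WEB_SERVICE_ARN || '',
    );

    new codepipeline.Pipeline(this, 'BackendPipeline', {
      stages: [
        {
          stageName: 'Source',
          actions: [
            new actions.CodeStarConnectionsSourceAction({
              actionName: 'GitHub',
              connectionArn: process.env.GITHUB_CONNECTION_ARN || '', // assumed env var
              owner: 'example-org', // placeholder
              repo: 'backend',      // placeholder
              branch: 'main',
              output: sourceOutput,
            }),
          ],
        },
        {
          stageName: 'Build',
          actions: [
            new actions.CodeBuildAction({
              actionName: 'DockerBuild',
              project: buildProject,
              input: sourceOutput,
              outputs: [buildOutput],
            }),
          ],
        },
        {
          stageName: 'Deploy',
          actions: [
            // One deploy action shown; the real pipeline deploys the web, worker and one-off services.
            new actions.EcsDeployAction({
              actionName: 'DeployWeb',
              service: webService,
              input: buildOutput, // expects imagedefinitions.json produced by the build
            }),
          ],
        },
      ],
    });
  }
}
```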


AWS Application Load Balancer

An AWS Application Load Balancer (ALB) is a critical component in our infrastructure, enabling efficient routing of HTTP and HTTPS traffic to application servers. It simplifies the complexities of request distribution, ensuring optimal performance, scalability, and security for the application.

1. Traffic Routing and Autoscaling

  • The ALB intelligently routes incoming requests to the appropriate ECS tasks, distributing the load evenly across the servers.
  • It provides vital metrics and health check data to ECS, enabling auto-scaling based on real-time traffic demand, ensuring that the system scales dynamically without over-provisioning resources.

2. SSL Termination and Certificate Validation

  • The ALB handles SSL termination, decrypting HTTPS traffic before forwarding it to the application servers running in ECS tasks.
  • This offloads the cryptographic workload from application servers, improving their performance.
  • It also manages SSL certificate validation, ensuring secure communication and boosting user trust.

3. Security and Access Control

  • The ALB is configured with strict security group rules, allowing only legitimate traffic to reach the application servers.
  • These rules enforce access restrictions at the network level, protecting against unauthorized or malicious requests.

The ALB setup is handled by the ECS Patterns construct (ApplicationLoadBalancedFargateService); the CDK code shown in the ECS section above also provisions the ALB.
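
For completeness, SSL termination can be expressed on the same construct; the fragment below is a sketch of the extra props that could be passed to the ApplicationLoadBalancedFargateService in the ECS stack, assuming the ACM certificate ARN is supplied through a hypothetical CERTIFICATE_ARN environment variable.

``` typescript
import {
  aws_certificatemanager as acm,
  aws_elasticloadbalancingv2 as elbv2,
  aws_ecs_patterns as ecs_patterns,
} from 'aws-cdk-lib';

// Sketch only: `ecsCluster` and `webTaskDef` are the values defined in the ECS stack above.
const certificate = acm.Certificate.fromCertificateArn(
  this,
  `${appPrefix}-cert-lookup`,
  process.env.CERTIFICATE_ARN || '', // hypothetical env var
);

const fargateWebService =
  new ecs_patterns.ApplicationLoadBalancedFargateService(
    this,
    `${appPrefix}-web-service`,
    {
      cluster: ecsCluster,
      taskDefinition: webTaskDef,
      certificate,                               // terminate TLS at the ALB
      protocol: elbv2.ApplicationProtocol.HTTPS, // HTTPS listener on port 443
      redirectHTTP: true,                        // redirect HTTP (80) to HTTPS (443)
    },
  );
```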


AWS S3 - File storage

We leverage an Amazon S3 bucket to store and manage all file assets used by the system, such as images, PDF files, and other media attachments. The setup prioritizes security, performance, and efficiency, ensuring optimal storage and access.

1. Secure Access Control

  • By default, the S3 bucket access is restricted to prevent unauthorized access.
  • Only specific AWS resources, such as the application servers, are granted the necessary permissions to interact with the bucket.
  • File names are hashed to make it difficult to infer their contents from the names, adding an extra layer of security.

2. Folder Structure for Segmentation

  • The bucket is organized into two subfolders:
    • public folder: Files in this folder are distributed via Amazon CloudFront CDN, ensuring faster access to frequently requested files by end users.
    • private folder: Files in this folder can only be accessed through signed URLs, which are time-limited and securely generated to prevent unauthorized access.

3. Performance Optimization

  • The integration with CloudFront significantly reduces latency for accessing files stored in the public folder, providing a seamless experience for end users.
  • Signed URL implementation for the private folder ensures secure and efficient file access without exposing sensitive data.
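
Signed URLs for the private folder are generated at the application layer rather than in infrastructure code. Purely for illustration, the equivalent call with the AWS SDK for JavaScript v3 looks like the following; the bucket name and key prefix are placeholders.

``` typescript
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({ region: 'me-south-1' });

// Returns a time-limited URL for an object under the private/ prefix.
export async function privateFileUrl(key: string): Promise<string> {
  const command = new GetObjectCommand({
    Bucket: 'example-assets-bucket', // placeholder bucket name
    Key: `private/${key}`,
  });
  // The URL stops working after 15 minutes.
  return getSignedUrl(s3, command, { expiresIn: 900 });
}
```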

4. Scalability and Reliability

  • Amazon S3 is designed to handle high volumes of traffic and large file sizes, ensuring the system can scale effortlessly as storage and access demands grow.
  • Redundancy built into S3 ensures high availability and data durability.
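
Unlike the earlier sections, no CDK snippet accompanies this one; the sketch below shows one way the bucket and the CloudFront distribution for the public/ prefix could be declared, with names and settings that are illustrative rather than the deployed configuration.

``` typescript
import {
  RemovalPolicy,
  Stack,
  StackProps,
  aws_cloudfront as cloudfront,
  aws_cloudfront_origins as origins,
  aws_s3 as s3,
} from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class AssetsStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Private-by-default bucket; application servers are granted access separately.
    const bucket = new s3.Bucket(this, 'AssetsBucket', {
      blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
      encryption: s3.BucketEncryption.S3_MANAGED,
      removalPolicy: RemovalPolicy.RETAIN,
    });

    // CloudFront distribution serving only the public/ prefix of the bucket.
    new cloudfront.Distribution(this, 'PublicAssetsCdn', {
      defaultBehavior: {
        origin: new origins.S3Origin(bucket, { originPath: '/public' }),
        viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
      },
    });
  }
}
```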

AWS Lambda - PDF generation service

To meet the requirement of generating a majority of system reports as PDFs, we utilize AWS Lambda to deploy a robust PDF service. This solution is powered by a Node.js-based PDF generator that uses Puppeteer, an excellent tool for rendering high-quality PDFs from HTML.

Why Puppeteer?

  • Puppeteer is built on Google Chrome’s rendering engine, ensuring highly accurate PDF generation with consistent layouts, fonts, and styles.
  • It supports modern web standards, making it ideal for converting complex HTML structures, including CSS and JavaScript, into pixel-perfect PDFs.

AWS Lambda and Puppeteer Integration

  • Puppeteer is packaged with custom AWS Lambda layers, allowing the service to run efficiently within the Lambda environment.
  • Lambda's on-demand scaling ensures the service is cost-effective, handling spikes in PDF generation requests without over-provisioning resources.

Service Setup

  1. HTML Input and Output Options

    • The service accepts HTML content as input, processes it using Puppeteer, and generates either:
      • A downloadable S3 link for the generated PDF.
      • Binary data for direct consumption by other services.
  2. API Access via AWS API Gateway

    • The service is exposed through a REST API endpoint using AWS API Gateway.
    • This simplifies access for clients and external applications, providing a seamless interface to interact with the PDF generator.
  3. Enhanced Security with API Keys

    • Access to the API endpoint is secured with an API key, ensuring that only authorized users and systems can utilize the service.
    • This access control mechanism reduces the risk of unauthorized usage while maintaining simplicity for authorized clients.
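
A hedged sketch of how this wiring could look in CDK follows: a Node.js function with a Chromium/Puppeteer layer, exposed through API Gateway with an API key and usage plan. The layer ARN, asset path, memory size and timeout are illustrative assumptions.

``` typescript
import {
  Duration,
  Stack,
  StackProps,
  aws_apigateway as apigateway,
  aws_lambda as lambda,
} from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class PdfServiceStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Lambda layer packaging headless Chromium for Puppeteer (assumed env var).
    const chromiumLayer = lambda.LayerVersion.fromLayerVersionArn(
      this,
      'ChromiumLayer',
      process.env.CHROMIUM_LAYER_ARN || '',
    );

    const pdfFn = new lambda.Function(this, 'PdfGenerator', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda/pdf-service'), // placeholder path
      layers: [chromiumLayer],
      memorySize: 1024,
      timeout: Duration.seconds(30),
    });

    // REST endpoint in front of the function; every method requires an API key.
    const api = new apigateway.LambdaRestApi(this, 'PdfApi', {
      handler: pdfFn,
      defaultMethodOptions: { apiKeyRequired: true },
    });

    // API key plus a usage plan binding the key to the deployed stage.
    const apiKey = api.addApiKey('PdfApiKey');
    const usagePlan = api.addUsagePlan('PdfUsagePlan', {
      apiStages: [{ api, stage: api.deploymentStage }],
    });
    usagePlan.addApiKey(apiKey);
  }
}
```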

Observability and Monitoring: System Reliability

To maintain system performance, detect issues early, and enable seamless debugging, our setup integrates robust observability tools. This includes AWS CloudWatch for logging and New Relic for application performance monitoring (APM) and error tracking, with real-time notifications via Slack. Observability is a crucial component of continuous deployment (CD), enabling proactive maintenance and minimizing downtime.

Logging with AWS CloudWatch

  1. Centralized Historical Logging

    • All application server and AWS Lambda logs are consolidated in AWS CloudWatch, providing a centralized platform for log storage.
    • CloudWatch efficiently handles historical log data, ensuring logs are retained and accessible for troubleshooting and auditing.
  2. Log Querying with CloudWatch Insights

    • CloudWatch Insights enables advanced querying and visualization of log data, making it easy to identify patterns, diagnose issues, and analyze historical trends.
  3. Why CloudWatch?

    • Scalable: Handles logs from multiple sources without impacting performance.
    • Cost-effective: Pay-as-you-go pricing ensures efficiency for varying log volumes.
    • Integrated: Works seamlessly with AWS services, streamlining log management.

Application Performance Monitoring with New Relic

  1. Comprehensive APM

    • New Relic is deployed to monitor application performance metrics, including response times, throughput, and system resource utilization.
    • It provides detailed insights into application behavior, helping identify bottlenecks and optimize performance.
  2. Error Monitoring and Alerts

    • New Relic tracks critical errors and provides a real-time error report, enabling quick identification and resolution of issues.
  3. Slack Integration for Real-Time Notifications

    • New Relic’s Slack integration ensures that critical alerts, such as downtimes or high-priority errors, are immediately sent to the team’s Slack channel.
    • This setup allows for real-time collaboration and faster response to potential problems.
  4. Why New Relic?

    • User-friendly: Intuitive dashboards make monitoring straightforward.
    • Proactive: Alerts and detailed error insights reduce mean time to recovery (MTTR).
    • Flexible: Easily integrates with multiple platforms, including Slack, for streamlined workflows.

Why Observability Matters in CD

  • Proactive Maintenance: Detect and resolve issues before they impact end users.
  • Faster Debugging: Historical logs and real-time performance data simplify troubleshooting.
  • Continuous Improvement: Insights from monitoring tools enable iterative enhancements to the system.
  • Reduced Downtime: Real-time alerts empower teams to address critical issues swiftly, minimizing impact on users.

By combining AWS CloudWatch for comprehensive logging and New Relic for performance monitoring and error tracking, with Slack integration for instant alerts, our observability setup ensures a secure, scalable, and reliable deployment pipeline.
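
Within the ECS stack shown earlier, the log driver can also be pointed at a log group with an explicit retention period so historical logs are kept for a defined window; the retention value below is an assumption for illustration.

``` typescript
import { aws_ecs as ecs, aws_logs as logs } from 'aws-cdk-lib';

// Sketch only: explicit log group with retention for the web container
// (`appPrefix` is the value defined in the ECS stack).
const webLogGroup = new logs.LogGroup(this, `${appPrefix}-web-logs`, {
  retention: logs.RetentionDays.THREE_MONTHS, // assumed retention period
});

const logging = new ecs.AwsLogDriver({
  streamPrefix: `${appPrefix}-web-container`,
  logGroup: webLogGroup,
});
```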


AWS Infra Architecture Diagram

``` mermaid
flowchart TD
  Cloudflare
  subgraph VPC
    direction TB

    subgraph PublicSubnet["Public Subnet"]
      ALB["Application Load Balancer"]
      NAT["NAT Gateway"]      
    end

    subgraph PrivateWithNAT["Private Subnet with NAT"]
      subgraph ECSCluster["ECS Cluster"]
        WebService["Web Service"]
        BackgroundService["Background Service"]
        OneOffService["One-Off Service"]
      end
    end

    subgraph PrivateIsolated["Private Isolated Subnet"]
      Aurora["Aurora Postgres"]
      Elasticache["Elasticache (Redis)"]
    end
  end

  %% Connections
  NAT --> PrivateWithNAT
  PublicSubnet --> NAT
  ALB -->|Target Group| WebService
  WebService --> Aurora
  WebService --> Elasticache
  BackgroundService --> Elasticache
  BackgroundService --> Aurora
  OneOffService -->|Exec Commands| RemoteHost[Remote Host]
  OneOffService --> Elasticache
  OneOffService --> Aurora  
  Cloudflare --> ALB
```