Sunday, 16 February 2020

AWS Notes


NAT

End Point







  • Custom security groups do not have inbound allow rules (all inbound traffic is denied by default)
  • Default security groups do have inbound allow rules (allowing traffic from within the group)
  • All outbound traffic is allowed by default in custom and default security groups

  • S3 pre-signed URLs can be used to provide temporary access to a specific object to those who do not have AWS credentials
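
The idea behind a pre-signed URL can be sketched without AWS: whoever holds a secret signs the object path together with an expiry time, and anyone holding the resulting URL can use it until it expires. The block below is a minimal illustration using an HMAC. It is not AWS Signature Version 4, and `SECRET`, `sign_url` and `verify_url` are hypothetical names made up for this sketch.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key; stands in for the signer's credentials


def sign_url(path, ttl_seconds, now=None):
    """Return a URL for `path` that stays valid for `ttl_seconds`."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    message = f"{path}:{expires}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'signature': signature})}"


def verify_url(url, now=None):
    """Accept the URL only if the signature matches and it has not expired."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    message = f"{parsed.path}:{expires}".encode()
    expected = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    # Constant-time comparison, then the expiry check
    return hmac.compare_digest(signature, expected) and now < expires
```

With the AWS SDK for Python, the real equivalent is the S3 client's `generate_presigned_url` method, which takes the operation name, the bucket/key parameters and an expiry in seconds.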

Options for storing logs:
  • CloudWatch Logs
  • Centralized logging system (e.g. Splunk)
  • Custom script and store on S3

  • Kinesis Data Firehose is the easiest way to load streaming data into data stores and analytics tools
  • Captures, transforms, and loads streaming data
  • Enables near real-time analytics with existing business intelligence tools and dashboards
  • Firehose can invoke an AWS Lambda function to transform incoming data before delivering it to a destination
  • For Amazon Redshift destinations, streaming data is delivered to your S3 bucket first
  • Kinesis Data Firehose then issues an Amazon Redshift COPY command to load data from your S3 bucket to your Amazon Redshift cluster
  • If data transformation is enabled, you can optionally back up source data to another Amazon S3 bucket
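
A transformation Lambda for Firehose receives a batch of base64-encoded records and returns one result per `recordId`. The following is a minimal sketch of such a handler. The `message` field and the uppercase transformation are assumptions chosen purely for illustration.

```python
import base64
import json


def lambda_handler(event, context):
    """Uppercase a hypothetical `message` field in each Firehose record."""
    output = []
    for record in event["records"]:
        raw = record["data"]  # base64-encoded payload as delivered by Firehose
        try:
            payload = json.loads(base64.b64decode(raw))
            payload["message"] = str(payload.get("message", "")).upper()
            data = base64.b64encode((json.dumps(payload) + "\n").encode()).decode()
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, AttributeError):
            # Not valid base64/JSON, or not a JSON object: hand the record back as failed
            output.append({"recordId": record["recordId"], "result": "ProcessingFailed", "data": raw})
    return {"records": output}
```

Every incoming `recordId` must appear in the response, which is why failed records are returned unchanged rather than dropped.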

DynamoDB charges:
  • DynamoDB is more cost effective for read heavy workloads
  • Priced based on provisioned throughput (read/write) regardless of whether you use it or not
  • Write throughput per hour for every 10 units
  • Read throughput per hour for every 50 units
  • Indexed data storage
  • Internet data transfer (outside of a region)
  • S3 event notifications triggering a Lambda function is completely serverless and cost-effective
  • AWS Glue can trigger ETL jobs that will transform that data and load it into a data store such as S3
  • Kinesis Data Streams is used for processing data, rather than extracting and transforming it. The Kinesis consumers are EC2 instances which are not as cost-effective as serverless solutions
  • AWS Data Pipeline can be used to automate the movement and transformation of data, but it relies on other services to actually transform the data
  • Virtual Private Gateway: The Amazon VPC side of a VPN connection
  • Customer Gateway: Your side of a VPN connection


  • Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run
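  • Amazon Redshift is a data warehouse used for analytics; querying data in place in S3 requires Redshift Spectrum and a running cluster, so Athena is the simpler serverless choice for analyzing data in S3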
  • AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. It is not used for analyzing data in S3
  • AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals

  • The AWS KMS API can be used for encrypting data keys (envelope encryption)
  • AWS Certificate Manager is a service that lets you easily provision, manage, and deploy public and private Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates for use with AWS services and your internal connected resources
  • The AWS Security Token Service (STS) is a web service that enables you to request temporary, limited-privilege credentials for AWS Identity and Access Management (IAM) users or for users that you authenticate (federated users)
  • IAM access keys are used for signing programmatic requests you make to AWS

  • ElastiCache is a web service that makes it easy to deploy and run Memcached or Redis protocol-compliant server nodes in the cloud
  • The in-memory caching provided by ElastiCache can be used to significantly improve latency and throughput for many read-heavy application workloads or compute-intensive workloads
    Memcached                                            Redis
    Not persistent                                       Data is persistent
    Cannot be used as a data store                       Can be used as a data store
    Supports large nodes with multiple cores or threads  Not multi-threaded
    Scales out and in, by adding and removing nodes      Scales by adding shards, not nodes
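
How an in-memory cache helps a read-heavy workload can be sketched with the cache-aside pattern: read from the cache first and only hit the database on a miss. The `TTLCache` class below is an in-process stand-in written for this example, not an ElastiCache, Memcached or Redis client, and `read_through` and `load_from_db` are hypothetical names.

```python
import time


class TTLCache:
    """In-process stand-in for a cache node (hypothetical; not an ElastiCache client)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, time.monotonic() + self.ttl)


def read_through(cache, key, load_from_db):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value)
    return value
```

Repeated reads of the same key within the TTL are served from memory, so the database only sees the first request.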




  • There are no additional charges for using Regional Edge Caches
  • You can write to regional edge caches too
  • Glacier objects are visible through S3 only (not Glacier directly)
  • The contents of an archive that has been uploaded cannot be modified
  • Uploading archives is synchronous
  • Downloading archives is asynchronous
  • Retrieval can take a few hours

  • Amazon S3 offers versioning. You can use versioning to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket. With versioning, you can easily recover from both unintended user actions and application failures
  • Amazon EBS, EFS, and CloudFront do not offer versioning

  • There is a charge if you delete data within 90 days; this early-deletion charge applies to deleting data, not to retrieving it
  • Retrieved data is available for 24 hours by default (can be changed)
  • Amazon Glacier must complete a job before you can get its output
  • Glacier automatically encrypts data at rest using AES 256 symmetric keys and supports secure transfer of data over SSL
  • Retrieved data will not be encrypted if it was uploaded unencrypted
  • Amazon Elastic Transcoder is a highly scalable, easy-to-use and cost-effective way for developers and businesses to convert (or “transcode”) video and audio files from their source format into versions that will play back on devices like smartphones, tablets and PCs
  • MediaConvert converts file-based content for broadcast and multiscreen delivery
  • Data Pipeline helps you move, integrate, and process data across AWS compute and storage resources, as well as your on-premises resources
  • Rekognition is a deep learning-based visual analysis service

  • EFS:
    • EFS is elastic and grows and shrinks as you add and remove data
    • Can concurrently connect 1 to 1000s of EC2 instances, from multiple AZs
    • A file system can be accessed concurrently from all AZs in the region where it is located
    • Throughput can be 10+ GB per second
  • EBS volumes live in a single AZ and are generally attached to one instance at a time (Multi-Attach is limited to Provisioned IOPS volumes)
  • S3 is an object store, not a file system (S3 data is stored redundantly across multiple facilities in the region)
  • Storage Gateway is used for on-premises storage management


  • CloudFront is ideal for caching static content such as image files and would improve performance for users
  • Moving the files to EBS would not make accessing the files easier or improve performance
  • Reducing the file size of the images may result in better retrieval times, however CloudFront would still be the preferable option
  • Using Spot EC2 instances may reduce EC2 costs but it won't improve user experience
  • Access policies define access to resources and can be associated with resources (buckets and objects) and users
  • You can use the AWS Policy Generator to create a bucket policy for your Amazon S3 bucket
  • You can define permissions on objects when uploading and at any time afterwards using the AWS Management Console
  • CRR is an Amazon S3 feature that automatically replicates data across AWS Regions
  • With CRR, every object uploaded to an S3 bucket is automatically replicated to a destination bucket in a different AWS Region that you choose
  • AMIs that are backed by EBS snapshots can be copied between regions
  • You cannot modify an ASG launch configuration; you must create a new launch configuration and specify the copied AMI

  • Elastic Load Balancing automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, and IP addresses
  • Elastic Load Balancing provides fault tolerance for applications by automatically balancing traffic across targets – Amazon EC2 instances, containers and IP addresses – and Availability Zones while ensuring only healthy targets receive traffic
  • AWS Budgets gives you the ability to set custom budgets that alert you when your costs or usage exceed (or are forecasted to exceed) your budgeted amount
  • Budget alerts can be sent via email and/or Amazon Simple Notification Service (SNS) topic

  • AWS Device Farm is an app testing service that lets you test and interact with your Android, iOS, and web apps on many devices at once, or reproduce issues on a device in real time
  • Amazon Cognito lets you add user sign-up, sign-in, and access control to your web and mobile apps quickly and easily. It is not used for testing
  • Amazon WorkSpaces is a managed, secure cloud desktop service

  • Single-node clusters do not support data replication
  • Manual backups are not automatically deleted when you delete a cluster

Cross-region replication allows you to replicate across regions:
  • Amazon DynamoDB global tables provide a fully managed solution for deploying a multi-region, multi-master database
  • When you create a global table, you specify the AWS regions where you want the table to be available
  • DynamoDB performs all of the necessary tasks to create identical tables in these regions, and propagate ongoing data changes to all of them

  • Basic monitoring sends EC2 metrics to CloudWatch about ASG instances every 5 minutes
  • Detailed monitoring can be enabled and sends metrics every 1 minute (chargeable)
  • When the launch configuration is created from the CLI, detailed monitoring of EC2 instances is enabled by default
  • When you enable Auto Scaling group metrics, Auto Scaling sends sampled data to CloudWatch every minute

  • Redis engine stores data persistently
  • Memcached engine does not store data persistently
  • Redis engine supports Multi-AZ using read replicas in another AZ in the same region
  • You can have a fully automated, fault tolerant ElastiCache-Redis implementation by enabling both cluster mode and multi-AZ failover
  • Memcached engine does not support Multi-AZ failover or replication
  • ALB supports IP addresses as targets
  • Using IP addresses as targets allows load balancing any application hosted in AWS or on-premises, with the IP addresses of the application back-ends as targets
  • On-premises targets require a VPN or Direct Connect connection
  • Amazon Relational Database Service (Amazon RDS) is a managed service that makes it easy to set up, operate, and scale a relational database in the cloud
  • Multi-AZ RDS creates a replica in another AZ and synchronously replicates to it (DR only)

  • You can authenticate using an MFA device in the following ways:
    • Through the AWS Management Console – the user is prompted for a user name, password and authentication code
    • Using the AWS API – restrictions are added to IAM policies and developers can request temporary security credentials and pass MFA parameters in their AWS STS API requests
    • Using the AWS CLI by obtaining temporary security credentials from STS (aws sts get-session-token)
  • Peering connections can be created with VPCs in different regions (available in most regions now)
  • Data sent between VPCs in different regions is encrypted (traffic charges apply)
  • Must update route tables to configure routing
  • Must update the inbound and outbound rules for VPC security group to reference security groups in the peered VPC
  • When creating a VPC peering connection with another account you need to enter the account ID and VPC ID from the other account

  • Kinesis Data Streams enables you to build custom applications that process or analyze streaming data for specialized needs
  • Amazon Kinesis Data Analytics is the easiest way to process and analyze real-time, streaming data
  • Kinesis Data Analytics can use standard SQL queries to process Kinesis data streams
  • Kinesis Data Analytics can ingest data from Kinesis Streams and Kinesis Firehose
  • RAID 0 = striping: data is written across multiple disks, which increases performance but provides no redundancy
  • RAID 1 = mirroring: creates 2 copies of the data, which provides redundancy but does not increase performance
  • SSD, Provisioned IOPS (io1) provides higher performance than General Purpose SSD (gp2), and you can specify the IOPS required, up to 50 IOPS per GB and a maximum of 32000 IOPS
  • RDS read replicas cannot be created from EC2 instances
  • EBS and EFS have no equivalent of an S3 lifecycle policy for expiring (deleting) data at a specified time
  • With S3 you can create a lifecycle action using the "expiration action element" which expires objects (deletes them) at the specified time
  • S3 lifecycle actions apply to any storage class, including Glacier, however Glacier would not allow immediate download
  • A password policy can be defined for enforcing password length, complexity etc. (applies to all users)
  • You can allow or disallow the ability to change passwords using an IAM policy
  • Public subnets are subnets that have:
    • “Auto-assign public IPv4 address” set to “Yes” which will assign a public IP
    • A subnet route table with a route to an attached Internet Gateway
  • The instance will also need a security group with an inbound rule allowing the traffic
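
The conditions for reaching an instance from the Internet can be written down as a small check. This is a simplified model for study purposes, not an AWS API: the `Subnet` and `Instance` classes are hypothetical, and the security group is reduced to a set of allowed inbound ports.

```python
from dataclasses import dataclass, field


@dataclass
class Subnet:
    auto_assign_public_ip: bool
    routes: dict = field(default_factory=dict)  # destination CIDR -> target ID


@dataclass
class Instance:
    subnet: Subnet
    inbound_ports: set = field(default_factory=set)  # ports the security group allows


def is_public_subnet(subnet):
    """Public subnet: auto-assigned public IPs and a default route to an Internet Gateway."""
    default_target = subnet.routes.get("0.0.0.0/0", "")
    return subnet.auto_assign_public_ip and default_target.startswith("igw-")


def reachable_from_internet(instance, port):
    """Reachable only if the subnet is public and the security group allows the port."""
    return is_public_subnet(instance.subnet) and port in instance.inbound_ports
```

A subnet whose default route points at a NAT Gateway (`nat-...`) fails the check, which matches the private-subnet case.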

  • Amazon EMR is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data
  • EMR utilizes a hosted Hadoop framework running on Amazon EC2 and Amazon S3
  • EMR uses Apache Hadoop as its distributed data processing engine which is an open source, Java software framework that supports data-intensive distributed applications running on large clusters of commodity hardware


19th Dec


  1. Availability Zones are distinct locations that are engineered to be isolated from failures in other Availability Zones
  2. Availability Zones are connected with low latency, high throughput, and highly redundant networking
  3. With EC2 you have full control at the operating system layer
  4. RDS is a fully managed service and you do not have access to the underlying EC2 instance (no root access)
  5. The “DeleteOnTermination” value relates to EBS volumes not EC2 instances
An EBS-backed EC2 instance has been configured with some proprietary software that uses an embedded license. You need to move the EC2 instance to another Availability Zone (AZ) within the region. How can this be accomplished? Choose the best answer.   
Ans:
  • You can take a snapshot, launch an instance in the destination AZ. Stop the instance, detach its root volume, create a volume from the snapshot you took and attach it to the instance. However, this is not the best option
  • The easiest and recommended option is to create an AMI (image) from the instance and launch an instance from the AMI in the other AZ. AMIs are backed by snapshots which in turn are backed by S3 so the data is available from any AZ within the region
  • There’s no way to move an EC2 instance from the management console
  • You cannot perform a copy operation to move the instance

  • Kinesis Data Firehose is the easiest way to load streaming data into data stores and analytics tools
  • Captures, transforms, and loads streaming data
  • Can invoke a Lambda function to transform data before delivering it to destinations
  • Firehose destinations include:
    • Amazon S3
    • Amazon Redshift
    • Amazon Elasticsearch Service
    • Splunk
  • For Splunk destinations, streaming data is delivered to Splunk, and it can optionally be backed up to your S3 bucket concurrently

  • You cannot create a deny rule with a security group
  • You cannot use the route table to create security rules
  • NAT Gateways are used for allowing instances in private subnets to access the Internet; they do not provide any inbound services
  • Network ACLs can be used to apply deny rules to lists of specific IP addresses
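
The difference between the two rule models can be sketched in a few lines. This is a simplified model, not an AWS API: the function names and the `(rule_number, cidr, action)` tuple layout are assumptions made for this example. It relies on Network ACL rules being evaluated in ascending rule-number order with the first match winning, while security groups only ever hold allow rules.

```python
import ipaddress


def security_group_allows(allow_cidrs, source_ip):
    """Security groups hold allow rules only; anything unmatched is implicitly denied."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allow_cidrs)


def nacl_allows(rules, source_ip):
    """Network ACL: evaluate rules in ascending rule-number order; first match wins."""
    ip = ipaddress.ip_address(source_ip)
    for _number, cidr, action in sorted(rules):
        if ip in ipaddress.ip_network(cidr):
            return action == "allow"
    return False  # implicit final deny when no rule matches
```

Blocking a single address therefore needs a low-numbered NACL deny rule ahead of the broader allow; a security group alone cannot express it.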

  • EFS is a fully-managed service that makes it easy to set up and scale file storage in the Amazon Cloud
  • EFS uses the NFSv4.1 protocol
  • EFS is elastic and grows and shrinks as you add and remove data
  • Can concurrently connect 1 to 1000s of EC2 instances, from multiple AZs
  • A file system can be accessed concurrently from all AZs in the region where it is located
  • Amazon EFS is designed to burst to allow high throughput levels for periods of time
  • Amazon SQS queues can be either standard or first-in-first-out (FIFO)
  • Standard queues provide a loose-FIFO capability that attempts to preserve the order of messages
  • Standard queues provide at-least-once delivery, which means that each message is delivered at least once
  • FIFO (first-in-first-out) queues preserve the exact order in which messages are sent and received
  • FIFO queues are available in limited regions currently
  • If you use a FIFO queue, you don’t have to place sequencing information in your message
  • FIFO queues provide exactly-once processing, which means that each message is delivered once and remains available until a consumer processes it and deletes it
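
Because standard queues deliver at least once, a consumer may see the same message twice and should be idempotent. A minimal sketch of one way to handle that is to track message IDs that have already been processed. `process_messages` and the `(message_id, body)` tuples are hypothetical and stand in for whatever a real SQS client returns.

```python
def process_messages(messages, handler, seen_ids=None):
    """Process each (message_id, body) pair once, skipping redelivered duplicates."""
    seen_ids = set() if seen_ids is None else seen_ids
    processed = []
    for message_id, body in messages:
        if message_id in seen_ids:
            continue  # duplicate delivery from a standard queue
        handler(body)
        seen_ids.add(message_id)
        processed.append(message_id)
    return processed
```

In a real deployment the set of seen IDs would have to live in shared, durable storage so that it survives restarts and is visible to every consumer.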

