Wednesday 7 October 2020

Event-Driven ETL job using Python on AWS

Automate an ETL processing CI/CD pipeline for COVID-19 data using Python and cloud services.


What is the Challenge?

The challenge is to implement an event-driven Python ETL processing job that runs daily. It downloads and processes US COVID-19 data from different data sources and creates a dashboard to visualize it using AWS QuickSight.

Challenge details can be found at the link here

Approach and Steps towards problem solving

  1. ETL Job: Created a Lambda function in Python, as the challenge author asked, and scheduled it as a daily job using a CloudWatch Events rule.
  2. Extraction & Transformation: The Lambda code downloads two CSV files from different URLs, which I did using the pandas Python library. A separate Python transformation module converts columns such as date into the proper format. I filtered the dataframes to US data only, as asked, and kept only the required columns (date, cases, deaths, recovered). The two data sets are merged into one, and the merged data is loaded into a database in the next step.
  3. Load: The transformed, well-formatted data returned by the Python module above is then loaded into a PostgreSQL RDS instance. Initially I loaded the data into DynamoDB, but later realized that QuickSight has no direct integration with DynamoDB as a data source. I also tried writing the same data to S3 as a JSON file and generated a dashboard from it in QuickSight. I considered using DynamoDB Streams to load the daily data and update the JSON file in S3, but appending to the JSON file in S3 every day did not seem optimal, so I ended up inserting the data into a PostgreSQL RDS instance.
  4. Notifications: On successful completion of the ETL job, a notification email reports the number of records inserted into the database that day. Any failures or exceptions in the Lambda code are also published to an SNS topic, and an email is sent through an SNS subscription.
  5. Infrastructure as Code: Implemented a CloudFormation template for all the AWS resources used. It is integrated with the CI/CD pipeline, which updates the infrastructure as needed whenever code changes are pushed to the code repository. See the CloudFormation template here.
  6. CI/CD Pipeline: Set up an automated CI/CD pipeline using the CloudFormation template and AWS services such as CodePipeline, CodeBuild and CodeDeploy. The pipeline is triggered whenever code is pushed to the code repository. It has three stages: Source, Build and Deploy. The Build stage uses the command below to package the Lambda function code and generate an output template YAML file. The packaged code is deployed as a Python Lambda function.

aws cloudformation package --template-file etl-covid19-cloudformation.yml --s3-bucket cloud-guru-challenge-etl-lambda-code-rahul --output-template-file output.yml
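
The extraction and transformation step (item 2 above) can be illustrated with a minimal sketch. This is not the code from the repository: it uses Python's standard csv module in place of pandas so that it runs without extra dependencies, and the sample data, column names (date, cases, deaths, Date, Country/Region, Recovered) and function names are assumptions made up for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample inputs; the real sources and column names may differ.
CASES_CSV = """date,cases,deaths
2020-01-21,1,0
2020-01-22,2,0
"""
RECOVERED_CSV = """Date,Country/Region,Recovered
2020-01-22,US,5
2020-01-22,Italy,7
2020-01-23,US,6
"""


def parse_date(text):
    """Convert an ISO-style date string into a datetime.date object."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def load_cases(csv_text):
    """Read the first source (assumed US-only) into {date: {cases, deaths}}."""
    rows = {}
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows[parse_date(record["date"])] = {
            "cases": int(record["cases"]),
            "deaths": int(record["deaths"]),
        }
    return rows


def load_recovered(csv_text, country="US"):
    """Read the second source, keeping only rows for the given country."""
    rows = {}
    for record in csv.DictReader(io.StringIO(csv_text)):
        if record["Country/Region"] == country:
            rows[parse_date(record["Date"])] = int(record["Recovered"])
    return rows


def merge_by_date(cases, recovered):
    """Inner-join both data sets on date, sorted by date."""
    shared_days = sorted(cases.keys() & recovered.keys())
    return [
        {"date": day, **cases[day], "recovered": recovered[day]}
        for day in shared_days
    ]
```

Calling merge_by_date(load_cases(CASES_CSV), load_recovered(RECOVERED_CSV)) keeps only the dates present in both sources, which is the same idea as an inner merge on the date column in pandas.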


QuickSight Dashboard

Below is the QuickSight dashboard generated from the above ETL processing job which displays US Covid19 data in different visuals.



CloudFormation diagram

You can refer to my GitHub repository above for the working code and the different templates used.


Learnings and Challenges faced

  1. Created Python Lambda layers manually from an EC2 instance for the pandas and psycopg2 Python libraries, which are used for manipulating the CSV data and working with the PostgreSQL database. Later moved the layers into the CloudFormation template as well.
  2. CloudFormation was the most interesting part of the whole exercise and I learned a lot. I ended up creating all the resources through CloudFormation, including the entire CI/CD pipeline. I come from a DevOps background and really enjoyed implementing it this way.
  3. Initially loaded the data into DynamoDB, but it is not available as a data source for QuickSight. I figured out a way to dump JSON data into S3 and generate the dashboard from it, but it did not feel like an optimal solution, so I ended up loading the data into PostgreSQL and generating the dashboard from that.
  4. Used the CodePipeline build stage to package the Lambda code and generate the output template file, then used the same template file in the deploy stage to update the CloudFormation stack. I was stuck at this stage for some time but figured it out.
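
The daily load into PostgreSQL mentioned in item 3 can be sketched as follows. This is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for the PostgreSQL RDS instance so it can run anywhere, the table name covid_us and the function load_rows are hypothetical, and skipping dates that are already stored is my assumption about how a daily job might avoid duplicates, not something taken from the repository. SQLite's INSERT OR IGNORE is used here; the PostgreSQL counterpart is INSERT ... ON CONFLICT DO NOTHING, and psycopg2 uses %s placeholders instead of ?.

```python
import sqlite3


def load_rows(conn, rows):
    """Insert rows whose date is not stored yet; return how many were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS covid_us ("
        "date TEXT PRIMARY KEY, cases INTEGER, deaths INTEGER, recovered INTEGER)"
    )
    before = conn.execute("SELECT COUNT(*) FROM covid_us").fetchone()[0]
    # Rows that collide with an existing primary key (date) are skipped.
    conn.executemany(
        "INSERT OR IGNORE INTO covid_us (date, cases, deaths, recovered) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM covid_us").fetchone()[0]
    return after - before
```

The returned count is the kind of number that could be reported in the daily notification email, for example after calling load_rows(sqlite3.connect(":memory:"), [("2020-01-22", 2, 0, 5)]).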


Conclusion

It is a very interesting challenge by Forrest Brazeal, and I am amazed by the amount of AWS learning and practical experience it has given me :) It was a great learning experience, and I would really appreciate any feedback or comments on the approach and the blog.

Thank you :)

https://medium.com/@rahulwadekar/event-driven-python-on-aws-cloudguruchallenge-2123453ac75e

https://www.linkedin.com/in/rahul-wadekar-04420b12/



Thursday 23 July 2020

Docker Container Commands


Docker is a platform for developers and sysadmins to build, run, and share applications with containers.

This article shows the different commands that can be used to manage Docker containers.


List running containers

docker ps
docker container ps
docker container ls



List both running and stopped containers

docker ps -a
docker container ps -a
docker container ls -a



Show disk usage by container

docker ps -s
docker container ls -s




Filter and list containers based on STATUS

docker ps --filter status=running
docker container ls --filter status=running



Filter and list containers based on exit code

docker ps -a --filter 'exited=0'
docker container ls -a --filter 'exited=0'




Filter and list containers based on NAME

docker ps --filter "name=nginx"
docker container ls --filter "name=nginx"




Formatting: outputs the ID and Command entries separated by a colon (:) for all running containers:

docker container ls --format "{{.ID}}: {{.Command}}"




Formatting: To list all running containers with their labels in a table format you can use:

docker container ls --format "table {{.ID}}\t{{.Labels}}"
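
The --format option is also handy when the output has to be consumed by a script. The sketch below is one possible way to do that from Python, assuming the Docker CLI is installed and on the PATH; the function names list_containers and parse_json_lines are made up for this example. It relies on the json template function, which makes docker print one JSON object per container.

```python
import json
import subprocess


def parse_json_lines(output):
    """Turn `--format "{{json .}}"` output into a list of dicts, one per line."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def list_containers(include_stopped=False):
    """Run `docker container ls` and return one dict per container."""
    cmd = ["docker", "container", "ls", "--format", "{{json .}}"]
    if include_stopped:
        cmd.append("-a")  # same as `docker container ls -a`
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_json_lines(result.stdout)
```

Each dict uses the same field names as the templates above, so something like [c["ID"] for c in list_containers()] would give the IDs of the running containers.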



Create a container from existing docker image without running it

docker create <IMAGE_NAME>
docker container create <IMAGE_NAME>

docker create nginx
docker container create --name nginx-proxy nginx



Rename existing container name

docker rename <EXISTING_CONTAINER_NAME> <NEW_CONTAINER_NAME>
docker container rename <EXISTING_CONTAINER_NAME> <NEW_CONTAINER_NAME>

docker container rename nginx-proxy new_nginx_proxy
docker container ls -a




Delete a container

docker rm <CONTAINER_ID>

docker rm nginx
docker container stop new_nginx_proxy
docker container rm new_nginx_proxy




Start docker container

docker start <CONTAINER_ID>
docker container start <CONTAINER_ID>

docker start nginx
docker container start nginx




Stop a running container

docker stop <CONTAINER_ID>

docker container stop nginx



Stop a running container and start it again

docker restart <CONTAINER_ID>
docker container restart <CONTAINER_ID>

docker restart nginx
docker container restart nginx



Pause processes in a running container

docker pause <CONTAINER_ID>
docker container pause <CONTAINER_ID>

docker pause nginx
docker container pause nginx



Unpause processes in a running container

docker unpause <CONTAINER_ID>
docker container unpause <CONTAINER_ID>

docker unpause nginx
docker container unpause nginx



Block until one or more containers stop, then print their exit codes

docker wait <CONTAINER_ID>
docker container wait <CONTAINER_ID>

docker wait nginx
docker container wait nginx



Kill a running container

docker kill <CONTAINER_ID>
docker container kill <CONTAINER_ID>

docker kill nginx
docker container kill nginx



Attach local standard input, output, and error streams to a running container

docker attach <CONTAINER_ID>
docker container attach <CONTAINER_ID>

docker run -itd --name alpine alpine
docker attach alpine



Hope this list of Docker container commands will be helpful to someone in need.

Thank you.



Docker Management Commands


This article shows a list of commonly used Docker management commands.
It also contains a consolidated view of all the commands in a single image below, which can be used as a quick reference or cheat sheet.

Docker Management Commands for quick reference


Management Commands

Below is the list of Docker management commands available. The best way to get this list is to run docker --help on your command line; it displays all the available management commands.

  •   container   Manage containers
  •   image        Manage images
  •   volume      Manage volumes
  •   network     Manage networks
  •   builder       Manage builds
  •   config        Manage Docker configs
  •   context      Manage contexts
  •   plugin        Manage plugins
  •   secret         Manage Docker secrets
  •   service       Manage services
  •   stack          Manage Docker stacks
  •   swarm        Manage Swarm
  •   node           Manage Swarm nodes
  •   system        Manage Docker
  •   trust            Manage trust on Docker images

Manage containers

Usage:  docker container COMMAND

Commands:
  •   attach      Attach local standard input, output, and error streams to a running container
  •   commit   Create a new image from a container's changes
  •   cp            Copy files/folders between a container and the local filesystem
  •   create      Create a new container
  •   diff          Inspect changes to files or directories on a container's filesystem
  •   exec        Run a command in a running container
  •   export      Export a container's filesystem as a tar archive
  •   inspect     Display detailed information on one or more containers
  •   kill           Kill one or more running containers
  •   logs          Fetch the logs of a container
  •   ls              List containers
  •   pause        Pause all processes within one or more containers
  •   port           List port mappings or a specific mapping for the container
  •   prune        Remove all stopped containers
  •   rename     Rename a container
  •   restart       Restart one or more containers
  •   rm            Remove one or more containers
  •   run           Run a command in a new container
  •   start         Start one or more stopped containers
  •   stats         Display a live stream of container(s) resource usage statistics
  •   stop          Stop one or more running containers
  •   top           Display the running processes of a container
  •   unpause   Unpause all processes within one or more containers
  •   update     Update configuration of one or more containers
  •   wait         Block until one or more containers stop, then print their exit codes


Manage images

Usage:  docker image COMMAND

Commands:
  •   build        Build an image from a Dockerfile
  •   history     Show the history of an image
  •   import      Import the contents from a tarball to create a filesystem image
  •   inspect     Display detailed information on one or more images
  •   load          Load an image from a tar archive or STDIN
  •   ls              List images
  •   prune       Remove unused images
  •   pull          Pull an image or a repository from a registry
  •   push         Push an image or a repository to a registry
  •   rm            Remove one or more images
  •   save         Save one or more images to a tar archive (streamed to STDOUT by default)
  •   tag           Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE


Manage volumes

Usage:  docker volume COMMAND

Commands:
  •   create       Create a volume
  •   inspect     Display detailed information on one or more volumes
  •   ls              List volumes
  •   prune       Remove all unused local volumes
  •   rm            Remove one or more volumes

Manage networks

Usage:  docker network COMMAND

Commands:
  •   connect       Connect a container to a network
  •   create          Create a network
  •   disconnect  Disconnect a container from a network
  •   inspect        Display detailed information on one or more networks
  •   ls                 List networks
  •   prune          Remove all unused networks
  •   rm               Remove one or more networks


Manage builds

Usage:  docker builder COMMAND

Commands:
  •   build        Build an image from a Dockerfile
  •   prune       Remove build cache


Manage Docker configs

Usage:  docker config COMMAND

Commands:
  •   create       Create a config from a file or STDIN
  •   inspect     Display detailed information on one or more configs
  •   ls              List configs
  •   rm            Remove one or more configs

Manage contexts

Usage:  docker context COMMAND

Commands:
  •   create       Create a context
  •   export      Export a context to a tar or kubeconfig file
  •   import      Import a context from a tar or zip file
  •   inspect     Display detailed information on one or more contexts
  •   ls              List contexts
  •   rm            Remove one or more contexts
  •   update      Update a context
  •   use           Set the current docker context

Manage plugins

Usage:  docker plugin COMMAND

Commands:
  •   create       Create a plugin from a rootfs and configuration. Plugin data directory must contain config.json and rootfs directory.
  •   disable     Disable a plugin
  •   enable      Enable a plugin
  •   inspect     Display detailed information on one or more plugins
  •   install       Install a plugin
  •   ls              List plugins
  •   push         Push a plugin to a registry
  •   rm            Remove one or more plugins
  •   set            Change settings for a plugin
  •   upgrade    Upgrade an existing plugin

Manage Docker secrets

Usage:  docker secret COMMAND

Commands:
  •   create       Create a secret from a file or STDIN as content
  •   inspect     Display detailed information on one or more secrets
  •   ls              List secrets
  •   rm            Remove one or more secrets

Manage services

Usage:  docker service COMMAND

Commands:
  •   create       Create a new service
  •   inspect     Display detailed information on one or more services
  •   logs          Fetch the logs of a service or task
  •   ls              List services
  •   ps             List the tasks of one or more services
  •   rm            Remove one or more services
  •   rollback   Revert changes to a service's configuration
  •   scale        Scale one or multiple replicated services
  •   update     Update a service


Manage Docker stacks

Usage:  docker stack [OPTIONS] COMMAND

Options:
      --orchestrator string   Orchestrator to use (swarm|kubernetes|all)

Commands:
  •   deploy      Deploy a new stack or update an existing stack
  •   ls              List stacks
  •   ps             List the tasks in the stack
  •   rm            Remove one or more stacks
  •   services    List the services in the stack

Manage Swarm

Usage:  docker swarm COMMAND

Commands:
  •   ca                 Display and rotate the root CA
  •   init               Initialize a swarm
  •   join              Join a swarm as a node and/or manager
  •   join-token    Manage join tokens
  •   leave            Leave the swarm
  •   unlock         Unlock swarm
  •   unlock-key  Manage the unlock key
  •   update         Update the swarm

Manage Swarm nodes

Usage:  docker node COMMAND

Commands:
  •   demote       Demote one or more nodes from manager in the swarm
  •   inspect       Display detailed information on one or more nodes
  •   ls                List nodes in the swarm
  •   promote     Promote one or more nodes to manager in the swarm
  •   ps               List tasks running on one or more nodes, defaults to current node
  •   rm              Remove one or more nodes from the swarm
  •   update        Update a node

Manage Docker

Usage:  docker system COMMAND

Commands:
  •   df             Show docker disk usage
  •   events      Get real time events from the server
  •   info          Display system-wide information
  •   prune       Remove unused data

Manage trust on Docker images

Usage:  docker trust COMMAND

Management Commands:
  •   key          Manage keys for signing Docker images
  •   signer      Manage entities who can sign Docker images

Commands:
  •   inspect     Return low-level information about keys and signatures
  •   revoke      Remove trust for an image
  •   sign          Sign an image


Hope this list of Docker management commands is helpful to someone in need.

Thank you.

https://medium.com/@rahulwadekar/docker-management-commands-a36a3784045