Build RESTful Microservices with AWS Lambda and API Gateway
Learn how to design, configure, secure, and test HTTP endpoints using AWS Lambda as the backend.
The AWS developer tools CodeBuild, CodeDeploy, and CodePipeline are discussed in the context of a CI/CD pipeline.
Cloud9 is a cloud-based IDE for building cloud-native applications. In this post, I will show how to develop, test, and deploy a serverless app using Cloud9.
In this post, we will see what AWS Step Functions are and what types of problems this service is best suited to solve, with an example.
IaC is one of the pillars of Cloud DevOps, and CloudFormation is the tool that lets us provision resources using declarative programming.
The power of infrastructure-as-code is illustrated by launching a 4-node AWS Redshift cluster, performing some analysis, and destroying the resources, all us...
Why we need the AWS X-Ray service and the key components of X-Ray are covered in this post.
Configure a static website with S3 and distribute it using CloudFront.
In this post, we will look at 5 ways to encrypt data in S3: SSE-S3, SSE-KMS, SSE-C, CSE-KMS and CSE-C.
Rotation of CMKs, importing key material for CMKs, and deletion of CMKs are covered in this post.
Key policies are resource-based policies which are tied to your CMK. And if you want a principal to be able to access your CMK, then a key policy must be in ...
Many of the AWS services rely on KMS for their encryption needs. KMS allows encryption of data at rest. In this post, we will look at the components that mak...
Install Docker on AWS, find and use images from the public Docker registry, and finally build your own Docker images using Dockerfiles.
A high-level overview of AWS SQS, SNS, and SES
Storage fundamentals of AWS are discussed here. Non-S3 storage is covered in depth.
Storage fundamentals of AWS are discussed here. S3 is covered in depth.
The Identity and Access Management service, commonly referred to as IAM is a key security service within AWS and is likely the first service you will encount...
Configure Jenkins plugins to talk to S3 and GitHub and build a simple pipeline which will upload a file checked into GitHub to S3. Each time you make a chang...
This post covers installing Jenkins on EC2 and configuring the Blue Ocean plugin for building pipelines.
A glossary of terms pertaining to AWS
Spark’s use of functional programming is illustrated with an example. The intuition for using pure functions and DAGs is explained.
When to use Spark? What are the modes in which Spark can run? When not to use Spark? and What are some alternatives to Spark? - these questions are answered ...
KSQL is a SQL-like interface for building stream processing applications. In this post, I will show how to convert Kafka topics into Streams and Tables and the...
A data pipeline captures the movement and transformation of data from one place/format to another. It often runs on schedule and feeds data into multiple das...
A common use-case in building stream processing applications is to filter and enrich an incoming stream of data and send it to a new topic or create an inter...
Faust is a stream processing Python library which allows us to read a stream of data from a Kafka Topic, process it, and store the processed data into another...
Faust provides a few options for state storage that we need to understand before we start building streaming applications in production. Here, we are going t...
It is worth understanding how Kafka stores data to better appreciate how the brokers achieve such high throughput. Kafka simply has a data directory on disk:...
The role of data schemas, Apache Avro and Schema Registry in Kafka is explained using an example.
How to create and administer Kafka topics? In this post, I will show 3 methods to manage Kafka topics. This will be useful when troubleshooting arcane issues
The Confluent REST Proxy provides a RESTful interface to a Kafka cluster. Learn how to produce, consume, view and administer Kafka cluster using simple pytho...
Use Kafka Connect to stream data from a log file and a SQL table into Kafka using Python and the Kafka Connect REST API
Covers commonly used commands when working in the Kafka ecosystem: the Kafka CLI, Kafka Connect, Kafka REST Proxy, KSQL, and Faust
I will show how to build a simple data pipeline using Apache Airflow to retrieve data from S3 and load it into a Redshift cluster
Let’s understand the rise and popularity of Data Lakes and the need for modern Cloud-based Data Lakes.
The power of infrastructure-as-code is illustrated by launching a 4-node AWS Redshift cluster, performing some analysis, and destroying the resources, all us...
The MapReduce programming technique was designed to analyze massive data sets across a cluster. In this post, we will get a sense of how Hadoop MapReduce works
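The MapReduce idea behind Hadoop can be sketched in plain Python as a toy, single-machine word count (the three phases below are the same ones Hadoop distributes across a cluster; the sample documents are made up):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

In Hadoop, the map and reduce functions run on different machines and the shuffle happens over the network, but the data flow is exactly this.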
Understand what the EKS service provides, then create an EKS cluster using eksctl and connect to the cluster using kubectl. Finally, destroy the resources.
Kubernetes provides ConfigMaps and Secrets resource kinds to allow you to separate configuration from a pod’s spec. This separation makes it easier to manage ...
Volumes, PersistentVolumes, PersistentVolumeClaims are explored in depth with the use of an example to demonstrate the need for these and how to use them.
Init Containers let you perform some tasks or check some preconditions before the main application container starts. In this post, I will use an example to s...
Is your pod ready as soon as the container starts? This is the key question we will explore in this post through the use of Kubernetes liveness and readiness...
Kubernetes by default uses RollingUpdates strategy. In this post, we will learn how to trigger, pause, resume and view a rollout and demonstrate a rollback.
Instead of creating Pods, we will create deployments and use an example 3-tier application to illustrate scaling and load balancing.
Autoscaling uses metrics server to collect metrics about the cluster and uses this info to make scaling decisions. In this demo, I will show how to create me...
In this post, I will show how to use Kubernetes service discovery mechanisms: env variables and DNS, to design a multi-pod n-tier application.
In this post, I will show how to run a multi-container pod that implements a three tier application in a Kubernetes namespace.
Learn why you can’t live without these Services in Kubernetes and what exactly they solve.
In this post, I will first cover what pods are, how to create, destroy and configure them. I will then run an Nginx web server on Kubernetes cluster.
A basic level architecture overview of Kubernetes is the focus of this post. Clusters, Nodes, Control Plane, Pods, Services and Deployments are touched upon.
Once you’ve decided on using Kubernetes, you have a variety of methods for deploying Kubernetes. Single-node, Multi-node, Vendor managed, on-prem etc.
Overview of Kubernetes, what it does, how it does it and alternatives to Kubernetes.
Before you can leverage text data in a machine learning model, you must first transform it into a series of columns of numbers, or vectors. There are many diffe...
Are James Bond movies the best in the thriller category? I will answer this using 1500 user movie reviews of 500 top thriller movies using NLP and Topic...
Using NLP and Dimensionality Reduction (t-SNE), group cosmetics based on their chemical ingredients.
1000 user reviews are clustered using K-means and Agglomerative Hierarchical clustering algorithms. Silhouette analysis is conducted to evaluate the effectiv...
In this post, I will show how we can cluster movies based on IMDB and Wiki plot summaries. We will quantify the similarity of movies based on their plot summ...
Two documents are similar if their vectors are similar. In this post, we will explore this idea through an example. A heatmap of Amazon books similarity is d...
Scrape IMDB movie reviews and construct a dataset. Perform shallow parsing on user reviews using spaCy and pattern.
Scrape, clean and normalize 100+ Gutenberg texts and apply basic text analysis
Lexical diversity is a measure of how many different words are used in a text. The goal of this notebook is to use NLTK to explore the lexical diversity...
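As a quick illustration of the idea (the sample sentence is made up and no NLTK is needed), lexical diversity is just the ratio of unique words to total words:

```python
def lexical_diversity(text):
    # Ratio of unique words (types) to total words (tokens)
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens)

sample = "the cat sat on the mat and the dog sat too"
print(round(lexical_diversity(sample), 2))  # 0.73 (8 unique words out of 11)
```

NLTK adds proper tokenization (punctuation, contractions), but the measure itself is this simple ratio.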
A glossary of NLP terms and notes
We will delve into the concept of Vector Spaces by developing an intuition behind this idea through some simple examples. There are many applications and alg...
Predict School budgets using a machine learning pipeline. The target variable is multi-class, multi-label, and we have a mix of numeric and text features. We w...
Text Analysis is a major application field for machine learning algorithms. However, the raw data, a sequence of symbols, cannot be fed directly to the algorit...
When working with text data, you are often required to write regular expressions to pre-process the data, extract useful information from the data, create ne...
Using simple SQL filtering, aggregations, and joins to answer business questions.
Use PostgreSQL JSON operators and functions to explore and understand a couple of datasets stored inside a database
A variety of SQL leetcode questions that I solved. This post captures Day 1 problems and their solutions.
When data is stored in arrays we can make use of some special Postgres operators like ANY and CONTAINS to query, filter and aggregate records
Use requests library to send GET requests to an API endpoint, parse the response, handle pagination, and store the transformed data into a postgres table.
Set up the dvdrental database locally for demonstrating features of PostgreSQL
Use PostgreSQL JSON operators and functions to explore and understand a couple of datasets stored inside a database
Convert a SQLite database to a Postgres database. For example, when you are given database.sqlite and want to do your analysis in Postgres, read on
Work with dates and times in SQL
Since we don’t have DESCRIBE TABLE in Postgres, we need to write this query again and again, so I am documenting it.
We will look at a real-world scenario of a health-care company's data and learn how to query JSON data using plain PostgreSQL.
Install Postgres, administer it using the CLI, and connect to it using a Jupyter notebook.
Using simple SQL filtering, aggregations, and joins to answer business questions.
Use PostgreSQL JSON operators and functions to explore and understand a couple of datasets stored inside a database
When data is stored in arrays we can make use of some special Postgres operators like ANY and CONTAINS to query, filter and aggregate records
Use requests library to send GET requests to an API endpoint, parse the response, handle pagination, and store the transformed data into a postgres table.
Set up the dvdrental database locally for demonstrating features of PostgreSQL
Use PostgreSQL JSON operators and functions to explore and understand a couple of datasets stored inside a database
Convert a SQLite database to a Postgres database. For example, when you are given database.sqlite and want to do your analysis in Postgres, read on
Work with dates and times in SQL
Since we don’t have DESCRIBE TABLE in Postgres, we need to write this query again and again, so I am documenting it.
Install Postgres, administer it using the CLI, and connect to it using a Jupyter notebook.
KSQL is a SQL-like interface for building stream processing applications. In this post, I will show how to convert Kafka topics into Streams and Tables and the...
A common use-case in building stream processing applications is to filter and enrich an incoming stream of data and send it to a new topic or create an inter...
Faust is a stream processing Python library which allows us to read a stream of data from a Kafka Topic, process it, and store the processed data into another...
Faust provides a few options for state storage that we need to understand before we start building streaming applications in production. Here, we are going t...
It is worth understanding how Kafka stores data to better appreciate how the brokers achieve such high throughput. Kafka simply has a data directory on disk:...
The role of data schemas, Apache Avro and Schema Registry in Kafka is explained using an example.
How to create and administer Kafka topics? In this post, I will show 3 methods to manage Kafka topics. This will be useful when troubleshooting arcane issues
The Confluent REST Proxy provides a RESTful interface to a Kafka cluster. Learn how to produce, consume, view and administer Kafka cluster using simple pytho...
Use Kafka Connect to stream data from a log file and a SQL table into Kafka using Python and the Kafka Connect REST API
Covers commonly used commands when working in the Kafka ecosystem: the Kafka CLI, Kafka Connect, Kafka REST Proxy, KSQL, and Faust
Visually and Statistically detect Stationarity in a time series. Methods to convert a non-stationary time series to stationary are discussed with examples.
Understand the basics of Autoregressive models and how we can use PACF and AIC/BIC to identify the order of AR model.
How to know if your time series is White Noise or a Random Walk? How to detect a stationary process? What is the relation between these concepts?
Autocorrelation is the correlation of a single time series with a lagged copy of itself. In this post, we will explore this concept with a couple of examples.
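A minimal, stdlib-only sketch of the concept (the sample series is made up): lag-k autocorrelation correlates the series with a copy of itself shifted by k steps.

```python
def autocorr(x, lag=1):
    # Correlation between x[t] and x[t+lag], using the overall mean and variance
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

# A perfectly alternating series is strongly negatively autocorrelated at lag 1
series = [1, -1, 1, -1, 1, -1, 1, -1]
print(autocorr(series, lag=1))  # -0.875

# A steadily increasing series is positively autocorrelated
print(autocorr([1, 2, 3, 4, 5], lag=1))  # 0.4
```

In practice you would use `pandas.Series.autocorr` or statsmodels' ACF plot, but the computation underneath is this.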
How to calculate correlation and regression coefficients for two time series?
Using pandas we will explore two types of window functions: rolling and expanding metrics for time series data.
Downsampling and aggregation of time series to observe trends and comparing time series that have different frequencies.
Changing the time series frequency is a very common operation as there are many cases when you want to compare time series with different frequencies.
Basics of manipulating time series like slicing, changing frequency, shifting, lags, diffs, percent_changes are explored in this post.
I will show how to build a simple data pipeline using Apache Airflow to retrieve data from S3 and load it into a Redshift cluster
The following topics are covered: Benefits of containerizing, why choose containers over VM, Docker basics, install postgres from Dockerhub and finally build...
Install Docker on AWS, find and use images from the public Docker registry, and finally build your own Docker images using Dockerfiles.
Learn about networking fundamentals in Docker. I explore the three pre-configured networks with examples.
What is the difference between Expose, PublishAll and Publish when talking about port mapping in docker? Also learn how to run docker container in detached m...
Build a Docker image from a container using the docker commit command and modify the default command using the --change flag
Build your first Docker image from a Dockerfile and run a simple Go binary inside the container.
Suppose your grandma caught you twiddling with a blue whale icon on your laptop and you have to explain what it is. How would you explain it?
To understand the EC2 Container Service (AWS ECS), I will approach this from an Architectural Context, then I’ll cover the Computational Context, and then finally,...
Host a static gallery website on S3 and distribute the content to Edge locations using a CloudFront Web Distribution.
We will explore how AWS Elastic Beanstalk can be used to help deploy and scale applications without having to worry about provisioning resources manually.
Amazon Elastic Beanstalk, AWS Lambda and AWS Batch are covered in this post.
Amazon EC2 Container Service and Elastic Container Registry are explained.
Create an Amazon EC2 Linux instance, connect to it and extract instance metadata.
AMI, instance types, instance purchase options, tenancy, user data, storage options and security of EC2 instances are covered in this post.
Learn how to design, configure, secure, and test HTTP endpoints using AWS Lambda as the backend.
Process CodeCommit events with a Lambda Function to create custom SNS notifications, containing useful information about branch, author and message for each ...
Process AWS S3 events using AWS Lambda. A simple example of uploading an image to a bucket will create a zipped version of it in the same bucket with zip pre...
An overview of AWS Lambda service and its key features.
Cloud9 is a cloud-based IDE for building cloud-native applications. In this post, I will show how to develop, test, and deploy a serverless app using Cloud9.
In this post, we will see what AWS Step Functions are and what types of problems this service is best suited to solve, with an example.
Learn to process SNS notifications with an AWS Lambda function which will upload a file to S3 upon receipt of a message.
Cloud9 is a cloud-based IDE for building cloud-native applications. In this post, I will show how to develop, test, and deploy a serverless app using Cloud9.
IaC is one of the pillars of Cloud DevOps, and CloudFormation is the tool that lets us provision resources using declarative programming.
Continuous Delivery requirements and tools are discussed extensively in this post.
Suppose your grandma caught you twiddling with a blue whale icon on your laptop and you have to explain what it is. How would you explain it?
Configure Jenkins plugins to talk to S3 and GitHub and build a simple pipeline which will upload a file checked into GitHub to S3. Each time you make a chang...
This post covers installing Jenkins on EC2 and configuring the Blue Ocean plugin for building pipelines.
Predict School budgets using a machine learning pipeline. The target variable is multi-class, multi-label, and we have a mix of numeric and text features. We w...
Approach to solving a binary classification problem
Unsupervised Learning: Clustering using R and Python
Predict Seismic bumps using Logistic Regression in R
PCA in R
Principal Components Analysis and Linear Discriminant Analysis applied to BreastCancer Wisconsin Diagnostic dataset in R
Basic networking resources within AWS are covered through the use of LucidChart diagrams.
An introduction to Elastic Load Balancing and the types of ELBs are covered in this post.
Filter inbound and outbound traffic using NACLs and Security Groups within your VPC.
Everything about VPC subnets and how to design for high availability and resiliency.
AWS Virtual Private Cloud (VPC) allows you to provision a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network...
What comes to mind when you think of a neural network? In this post, we will develop the intuition for understanding how a neural network works and introduce...
Use a Faster Region-Based Convolutional Neural Network (Faster R-CNN) to detect objects under different driving conditions
A glossary of terms used in CNN.
How to construct an MLP in PyTorch. A glossary of terms.
This post will introduce how to use PyTorch to build and train neural networks
The story of how an object goes from Pending to Persisted.
Build a simple Flask application which connects to Postgres using SQLAlchemy.
SQLAlchemy offers several layers of abstraction and convenient tools for interacting with a database. In this post, we will understand the purpose of each la...
For a long time I have been a fan of psycopg2 and resisted using SQLAlchemy, but maybe not anymore.
Using Flask + Pandas + Plotly + Dash, deploy an interactive data dashboard to Heroku
Group Airbnb listings based on similarity. Used folium maps to display the clustering results for Chelsea neighborhood
An interactive 3-D scatter plot of t-SNE features showing similar airbnb listings is shown here
Visualize Airbnb price distribution across 5 boroughs of New York City. Using GeoPandas and Folium maps, created some interesting visualizations
In this post we will think about the graph patterns to apply to the graph database and then we will perform the queries using Cypher
What is a property graph, and how do you create one in Neo4j using Cypher? These questions are answered in this post using an example.
In this post I cover how to install Neo4j Desktop on a Mac and explore what is available in the tool. Basics of graph databases are discussed.
Used Dimensionality Reduction to reduce 2100 features to 50 principal features
PCA in R
Principal Components Analysis and Linear Discriminant Analysis applied to BreastCancer Wisconsin Diagnostic dataset in R
Predict Seismic bumps using Logistic Regression in R
PCA in R
Principal Components Analysis and Linear Discriminant Analysis applied to BreastCancer Wisconsin Diagnostic dataset in R
Create RDS subnet groups, RDS database clusters, manage access to RDS cluster using security groups and finally connect to your RDS cluster and create tables.
AWS RDS and AWS non-relational databases are the two main families of databases offered by AWS. In this post I will introduce the various databases available on AW...
Learn when to use ElastiCache Service within your applications to improve your overall performance.
Using Flask + Pandas + Plotly + Dash, deploy an interactive data dashboard to Heroku
Group Airbnb listings based on similarity. Used folium maps to display the clustering results for Chelsea neighborhood
Unsupervised Learning: Clustering using R and Python
1000 user reviews are clustered using K-means and Agglomerative Hierarchical clustering algorithms. Silhouette analysis is conducted to evaluate the effectiv...
In this post, I will show how we can cluster movies based on IMDB and Wiki plot summaries. We will quantify the similarity of movies based on their plot summ...
Unsupervised Learning: Clustering using R and Python
Content based recommendation using Jaccard Similarity.
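Content-based recommendation with Jaccard similarity boils down to comparing attribute sets; a minimal sketch with hypothetical items described by genre-tag sets (Jaccard similarity is the size of the intersection over the size of the union):

```python
def jaccard(a, b):
    # |intersection| / |union| of two attribute sets
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical items described by their genre tags
gladiator = {"action", "drama", "history"}
braveheart = {"action", "drama", "war"}
toy_story = {"animation", "comedy", "family"}

print(jaccard(gladiator, braveheart))  # 0.5 -> similar, worth recommending
print(jaccard(gladiator, toy_story))   # 0.0 -> dissimilar
```

To recommend items for a user, you would compute this score between an item they liked and every candidate item, then rank the candidates.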
Make recommendations based on the knowledge of the crowd. Example, people who watch Gladiator have also watched the Matrix. We will take a dataset and genera...
What are recommendation engines and what type of data is well suited for these problems?
Configure Jenkins plugins to talk to S3 and GitHub and build a simple pipeline which will upload a file checked into GitHub to S3. Each time you make a chang...
This post covers installing Jenkins on EC2 and configuring the Blue Ocean plugin for building pipelines.
Approach to solving a binary classification problem
Predict Seismic bumps using Logistic Regression in R
Using Flask + Pandas + Plotly + Dash, deploy an interactive data dashboard to Heroku
The power of infrastructure-as-code is illustrated by launching a 4-node AWS Redshift cluster, performing some analysis, and destroying the resources, all us...
IaC is one of the pillars of Cloud DevOps, and CloudFormation is the tool that lets us provision resources using declarative programming.
The power of infrastructure-as-code is illustrated by launching a 4-node AWS Redshift cluster, performing some analysis, and destroying the resources, all us...
Everything about manipulation of dataframes using Pandas
Melting turns columns into rows, whereas pivot takes the unique values from a column and creates new columns.
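A minimal pandas sketch of both operations (the table and column names are made up):

```python
import pandas as pd

# A "wide" table: one column per year
wide = pd.DataFrame({"city": ["NYC", "LA"], "2020": [100, 80], "2021": [110, 90]})

# melt: the year columns become rows in a single "year" column
long = wide.melt(id_vars="city", var_name="year", value_name="sales")

# pivot: the unique values of "year" become columns again
back = long.pivot(index="city", columns="year", values="sales")

print(long.shape)               # (4, 3)
print(back.loc["NYC", "2021"])  # 110
```

Melt and pivot are inverses of each other, which is why round-tripping through both recovers the original values.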
A glossary of terms used in CNN.
This post will introduce how to use PyTorch to build and train neural networks
How to construct an MLP in PyTorch. A glossary of terms.
This post will introduce how to use PyTorch to build and train neural networks
Use a Faster Region-Based Convolutional Neural Network (Faster R-CNN) to detect objects under different driving conditions
A glossary of terms used in CNN.
A data pipeline captures the movement and transformation of data from one place/format to another. It often runs on schedule and feeds data into multiple das...
I will show how to build a simple data pipeline using Apache Airflow to retrieve data from S3 and load it into a Redshift cluster
Dealing with outliers and choosing the type of scaling are covered in this post
Extract data from S3, clean the dataset, deal with missing values and load the cleaned dataset to S3
Used Dimensionality Reduction to reduce 2100 features to 50 principal features
Create a binary bag-of-words representation for amenities and host verifications and Tfidf representation for description of each airbnb listing
Before you can leverage text data in a machine learning model, you must first transform it into a series of columns of numbers, or vectors. There are many diffe...
Create a binary bag-of-words representation for amenities and host verifications and Tfidf representation for description of each airbnb listing
A common use-case in building stream processing applications is to filter and enrich an incoming stream of data and send it to a new topic or create an inter...
Faust is a stream processing Python library which allows us to read a stream of data from a Kafka Topic, process it, and store the processed data into another...
The general steps for using SageMaker are discussed in this post. The training and inference processes are explained in detail.
As an introduction to using SageMaker’s High Level Python API we will look at a relatively simple problem. Namely, we will use the Boston Housing Dataset to ...
Spark’s use of functional programming is illustrated with an example. The intuition for using pure functions and DAGs is explained.
When to use Spark? What are the modes in which Spark can run? When not to use Spark? and What are some alternatives to Spark? - these questions are answered ...
When you have multiple GitHub profiles, you want to be prompted for your username and password.
Everything about conda channels
Basics of linear algebra using NumPy: things like calculating the norm and the dot product.
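For a quick illustration (the vectors are made up), the norm and dot product take a couple of NumPy calls:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

print(np.linalg.norm(a))  # 5.0 (Euclidean length: sqrt(3^2 + 4^2))
print(np.dot(a, b))       # 11.0 (3*1 + 4*2)

# The dot product also gives the cosine of the angle between the vectors
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

Cosine similarity computed this way is the workhorse behind many of the document-similarity posts listed here.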
Principal Components Analysis and Linear Discriminant Analysis applied to BreastCancer Wisconsin Diagnostic dataset in R
Install Postgres, administer it using the CLI, and connect to it using a Jupyter notebook.
Get an overview of Amazon Kinesis to collect, process and analyze real-time streaming data.
Predict Seismic bumps using Logistic Regression in R
When working with text data, you are often required to write regular expressions to pre-process the data, extract useful information from the data, create ne...
The power of infrastructure-as-code is illustrated by launching a 4-node AWS Redshift cluster, performing some analysis, and destroying the resources, all us...
Approach to solving a binary classification problem
Predict School budgets using a machine learning pipeline. The target variable is multi-class, multi-label, and we have a mix of numeric and text features. We w...
Use PostgreSQL JSON operators and functions to explore and understand a couple of datasets stored inside a database
Using NLP and Dimensionality Reduction (t-SNE), group cosmetics based on their chemical ingredients.
I will show how to build a simple data pipeline using Apache Airflow to retrieve data from S3 and load it into a Redshift cluster
Used Dimensionality Reduction to reduce 2100 features to 50 principal features
The role of data schemas, Apache Avro and Schema Registry in Kafka is explained using an example.
KSQL is a SQL-like interface for building stream processing applications. In this post, I will show how to convert Kafka topics into Streams and Tables and the...