Posts by Tag

AWS

AWS S3 Encryption Mechanisms

7 minute read

In this post, we will look at 5 ways to encrypt data in S3: SSE-S3, SSE-KMS, SSE-C, CSE-KMS and CSE-C.

AWS KMS Key Management

9 minute read

This post covers rotating CMKs, importing key material for CMKs and deleting CMKs.

AWS KMS Key Policies

8 minute read

Key policies are resource-based policies that are tied to your CMK. If you want a principal to be able to access your CMK, then a key policy must be in ...

AWS Key Management Service (KMS)

15 minute read

Many of the AWS services rely on KMS for their encryption needs. KMS allows encryption of data at rest. In this post, we will look at the components that mak...

Docker on AWS EC2

18 minute read

Install Docker on AWS, find and use images from the public Docker registry, and finally build your own Docker images using Dockerfiles.

AWS Identity and Access Management (IAM)

16 minute read

The Identity and Access Management service, commonly referred to as IAM, is a key security service within AWS and is likely the first service you will encount...

AWS Glossary

3 minute read

A glossary of terms pertaining to AWS

Data Engineering

Process a stream of data using Faust

5 minute read

Faust is a stream processing Python library that allows us to read a stream of data from a Kafka topic, process it and store the processed data into another...

Storage options for stream processing apps

2 minute read

Faust provides a few options for state storage that we need to understand before we start building streaming applications in production. Here, we are going t...

Kafka: How Kafka stores data?

5 minute read

It is worth understanding how Kafka stores data to better appreciate how the brokers achieve such high throughput. Kafka simply has a data directory on disk:...

Managing Kafka Topics

11 minute read

How do you create and administer Kafka topics? In this post, I will show 3 methods to manage Kafka topics. This will be useful when troubleshooting arcane issues.

A practical guide to using Kafka REST Proxy

10 minute read

The Confluent REST Proxy provides a RESTful interface to a Kafka cluster. Learn how to produce, consume, view and administer a Kafka cluster using simple pytho...

Kafka commonly used commands

1 minute read

Covers commonly used commands when working in the Kafka ecosystem, namely Kafka CLI, Kafka Connect, Kafka REST Proxy, KSQL and Faust.

The Rise of Cloud Data Lakes

9 minute read

Let's understand the rise and popularity of Data Lakes and the need for modern cloud-based Data Lakes.

MapReduce Programming Example

3 minute read

The MapReduce programming technique was designed to analyze massive data sets across a cluster. In this post, we will get a sense of how Hadoop MapReduce works.

Kubernetes

Kubernetes ConfigMaps and Secrets

12 minute read

Kubernetes provides ConfigMaps and Secrets resource kinds to allow you to separate configuration from pod’s specs. This separation makes it easier to manage ...

Kubernetes Volumes

15 minute read

Volumes, PersistentVolumes and PersistentVolumeClaims are explored in depth, using an example to demonstrate why they are needed and how to use them.

Kubernetes Init Containers

6 minute read

Init Containers let you perform some tasks or check some preconditions before the main application container starts. In this post, I will use an example to s...

Kubernetes Liveness and Readiness Probes

11 minute read

Is your pod ready as soon as the container starts? This is the key question we will explore in this post through the use of Kubernetes liveness and readiness...

Kubernetes Rolling Updates and Rollbacks

10 minute read

By default, Kubernetes uses the RollingUpdate strategy. In this post, we will learn how to trigger, pause, resume and view a rollout and demonstrate a rollback.

Kubernetes Deployments in Depth

9 minute read

Instead of creating Pods, we will create deployments and use an example 3-tier application to illustrate scaling and load balancing.

Kubernetes Autoscaling Demonstration

13 minute read

Autoscaling uses the metrics server to collect metrics about the cluster and uses this info to make scaling decisions. In this demo, I will show how to create me...

Kubernetes Multi-Container Pods

10 minute read

In this post, I will show how to run a multi-container pod that implements a three-tier application in a Kubernetes namespace.

Kubernetes Architecture

1 minute read

A basic architecture overview of Kubernetes is the focus of this post. Clusters, Nodes, Control Plane, Pods, Services and Deployments are touched upon.

Deploying Kubernetes options

3 minute read

Once you’ve decided on using Kubernetes, you have a variety of methods for deploying it: single-node, multi-node, vendor-managed, on-prem, etc.

Kubernetes Overview

4 minute read

Overview of Kubernetes, what it does, how it does it and alternatives to Kubernetes.

NLP

Create features out of text

15 minute read

Before we can leverage text data in a machine learning model, we must first transform it into a series of columns of numbers or vectors. There are many diffe...

Document Clustering

20 minute read

1000 user reviews are clustered using K-means and Agglomerative Hierarchical clustering algorithms. Silhouette analysis is conducted to evaluate the effectiv...

Clustering movies based on their plots

13 minute read

In this post, I will show how we can cluster movies based on IMDB and Wiki plot summaries. We will quantify the similarity of movies based on their plot summ...

Document Similarity

10 minute read

Two documents are similar if their vectors are similar. In this post, we will explore this idea through an example. A heatmap of Amazon books similarity is d...

Scrape IMDB movie reviews

17 minute read

Scrape IMDB movie reviews and construct a dataset. Perform shallow parsing on user reviews using spaCy and pattern.

Lexical Diversity

7 minute read

Lexical diversity is a measure of how many different words are used in a text. The goal of this notebook is to use NLTK to explore the lexical diversity...

Understanding Vector Space Models

4 minute read

We will delve into the concept of Vector Spaces by developing an intuition behind this idea through some simple examples. There are many applications and alg...

Text as numeric data

7 minute read

Text Analysis is a major application field for machine learning algorithms. However, the raw data, a sequence of symbols, cannot be fed directly to the algorit...

The World of Regex

28 minute read

When working with text data, you are often required to write regular expressions to pre-process the data, extract useful information from the data, create ne...

SQL

Day 1: Leetcode SQL problems

11 minute read

A variety of LeetCode SQL questions that I solved. This post captures the Day 1 problems and their solutions.

How to work with ARRAYs in Postgres

5 minute read

When data is stored in arrays, we can make use of special Postgres operators such as ANY and @> (contains) to query, filter and aggregate records.

postgres

How to work with ARRAYs in Postgres

5 minute read

When data is stored in arrays, we can make use of special Postgres operators such as ANY and @> (contains) to query, filter and aggregate records.

Deep Learning

Implement gradient descent in python

6 minute read

Using a dataset, implement the gradient descent algorithm to find the line that best separates the points. Explore the training loop with epochs and learning...

Why error functions need to be continuous?

4 minute read

We’ll see what an error function is and why it needs to be continuous and differentiable. We’ll also explore the difference between discrete and continuous predi...

Neural Network Basic Introduction

8 minute read

What comes to mind when you think of a neural network? In this post, we will develop the intuition for understanding how a neural network works and introduce...

Kafka

Process a stream of data using Faust

5 minute read

Faust is a stream processing Python library that allows us to read a stream of data from a Kafka topic, process it and store the processed data into another...

Storage options for stream processing apps

2 minute read

Faust provides a few options for state storage that we need to understand before we start building streaming applications in production. Here, we are going t...

Kafka: How Kafka stores data?

5 minute read

It is worth understanding how Kafka stores data to better appreciate how the brokers achieve such high throughput. Kafka simply has a data directory on disk:...

Managing Kafka Topics

11 minute read

How do you create and administer Kafka topics? In this post, I will show 3 methods to manage Kafka topics. This will be useful when troubleshooting arcane issues.

A practical guide to using Kafka REST Proxy

10 minute read

The Confluent REST Proxy provides a RESTful interface to a Kafka cluster. Learn how to produce, consume, view and administer a Kafka cluster using simple pytho...

Kafka commonly used commands

1 minute read

Covers commonly used commands when working in the Kafka ecosystem, namely Kafka CLI, Kafka Connect, Kafka REST Proxy, KSQL and Faust.

Time Series

Intro to Random Cut Forest Algorithm

5 minute read

Random Cut Forests are primarily used for detecting anomalous data points. In this post, we will cover how RCF works at a very high level.

Autocorrelation of a time series

7 minute read

Autocorrelation is the correlation of a single time series with a lagged copy of itself. In this post, we will explore this concept with a couple of examples.

Resampling a time series: Upsampling

8 minute read

Changing the time series frequency is a very common operation, as there are many cases when you want to compare time series with different frequencies.

Basics of manipulating time series

9 minute read

The basics of manipulating time series, such as slicing, changing frequency, shifting, lags, diffs and percent changes, are explored in this post.

Docker

Containerize a Flask application

9 minute read

The following topics are covered: benefits of containerizing, why to choose containers over VMs, Docker basics, install Postgres from Docker Hub and finally build...

Docker on AWS EC2

18 minute read

Install Docker on AWS, find and use images from the public Docker registry, and finally build your own Docker images using Dockerfiles.

Docker Port Mapping by example

7 minute read

What is the difference between Expose, PublishAll and Publish when talking about port mapping in Docker? Also learn how to run a Docker container in detached m...

How to explain Docker to your Grandma?

10 minute read

Suppose your grandma caught you twiddling with a Blue Whale icon on your laptop and you have to explain what that is. How would you explain it?

Machine Learning

AWS Compute

Introduction to AWS ECS

6 minute read

To understand AWS ECS (EC2 Container Service), I will approach it from an architectural context, then I’ll cover the computational context, and then finally,...

AWS Elastic Beanstalk explained

9 minute read

We will explore how AWS Elastic Beanstalk can be used to help deploy and scale applications without having to worry about provisioning resources manually.

AWS EC2 service overview

14 minute read

AMI, instance types, instance purchase options, tenancy, user data, storage options and security of EC2 instances are covered in this post.

Serverless

Process S3 Events using AWS Lambda

4 minute read

Process AWS S3 events using AWS Lambda. In a simple example, uploading an image to a bucket creates a zipped version of it in the same bucket with zip pre...

DevOps

How to explain Docker to your Grandma?

10 minute read

Suppose your grandma caught you twiddling with a Blue Whale icon on your laptop and you have to explain what that is. How would you explain it?

AWS Networking

VPC subnets

13 minute read

Everything about VPC subnets and how to design for high availability and resiliency.

SQLAlchemy

Flask and SQLAlchemy

7 minute read

Build a simple Flask application that connects to Postgres using SQLAlchemy.

SQLAlchemy Layers explained

4 minute read

SQLAlchemy offers several layers of abstraction and convenient tools for interacting with a database. In this post, we will understand the purpose of each la...

Pandas

Data Visualization

New York City Airbnb price distribution

7 minute read

Visualize the Airbnb price distribution across the 5 boroughs of New York City, with some interesting visualizations created using GeoPandas and Folium maps.

Neo4j

Cypher Query Language to query Neo4j

3 minute read

In this post, we will think about the graph patterns to apply to the graph database, and then we will perform the queries using Cypher.

Neo4j and Graph database basics

5 minute read

In this post, I cover how to install Neo4j Desktop on a Mac and explore what is available in the tool. The basics of graph databases are discussed.

PCA

R

AWS Databases

Create your first Amazon RDS Database

4 minute read

Create RDS subnet groups, RDS database clusters, manage access to RDS cluster using security groups and finally connect to your RDS cluster and create tables.

Databases on AWS

11 minute read

AWS RDS and AWS non-relational databases are the two main families of databases offered by AWS. In this post I will introduce the various databases available on AW...

Unsupervised Learning

Clustering

Document Clustering

20 minute read

1000 user reviews are clustered using K-means and Agglomerative Hierarchical clustering algorithms. Silhouette analysis is conducted to evaluate the effectiv...

Clustering movies based on their plots

13 minute read

In this post, I will show how we can cluster movies based on IMDB and Wiki plot summaries. We will quantify the similarity of movies based on their plot summ...

Recommendation Systems

Making non-personalized recommendations

7 minute read

Make recommendations based on the knowledge of the crowd. For example, people who watched Gladiator have also watched The Matrix. We will take a dataset and genera...

Jenkins

Classification

Cloud

IaC

Neural Networks

Pytorch

CNN

Airflow

Data Cleaning

Data Preprocessing

Feature Engineering

Create features out of text

15 minute read

Before we can leverage text data in a machine learning model, we must first transform it into a series of columns of numbers or vectors. There are many diffe...

Faust

Process a stream of data using Faust

5 minute read

Faust is a stream processing Python library that allows us to read a stream of data from a Kafka topic, process it and store the processed data into another...

AWS Sagemaker

Building a model using Sagemaker

5 minute read

The general steps for using Sagemaker are discussed in this post. The training and inference processes are explained in detail.

Spark

git

conda

Linear Algebra

LDA

jupyter

AWS BigData

AWS Kinesis Overview

15 minute read

Get an overview of Amazon Kinesis to collect, process and analyze real-time streaming data.

Logistic Regression

Cross Validation

regex

The World of Regex

28 minute read

When working with text data, you are often required to write regular expressions to pre-process the data, extract useful information from the data, create ne...

AWS RedShift

Supervised Learning

pipelines

EDA

tsne

Data Pipeline

Dimensionality Reduction

Avro

KSQL
