all posts

2020

Building a model using SageMaker

5 minute read

This post discusses the general steps for using SageMaker. The training and inference processes are explained in detail.

Process a stream of data using Faust

5 minute read

Faust is a stream processing Python library that allows us to read a stream of data from a Kafka topic, process it and store the processed data into another...

Storage options for stream processing apps

2 minute read

Faust provides a few options for state storage that we need to understand before we start building streaming applications in production. Here, we are going t...

2019

Kafka: How Kafka stores data?

5 minute read

It is worth understanding how Kafka stores data to better appreciate how the brokers achieve such high throughput. Kafka simply has a data directory on disk:...

Process S3 Events using AWS Lambda

4 minute read

Process AWS S3 events using AWS Lambda. A simple example of uploading an image to a bucket will create a zipped version of it in the same bucket with zip pre...

Introduction to AWS ECS

6 minute read

To understand AWS ECS (EC2 Container Service), I will approach this from an Architectural Context, then I’ll cover the Computational Context, and then finally,...

Create your first Amazon RDS Database

4 minute read

Create RDS subnet groups, RDS database clusters, manage access to RDS cluster using security groups and finally connect to your RDS cluster and create tables.

Databases on AWS

11 minute read

AWS RDS and AWS non-relational databases are the two main families of databases offered by AWS. In this post I will introduce the various databases available on AW...

Create features out of text

15 minute read

Before we can leverage text data in a machine learning model, we must first transform it into a series of columns of numbers or vectors. There are many diffe...

Day 1: Leetcode SQL problems

11 minute read

A variety of LeetCode SQL questions that I solved. This post captures the Day 1 problems and their solutions.

Managing Kafka Topics

11 minute read

How do you create and administer Kafka topics? In this post, I will show 3 methods to manage Kafka topics. This will be useful when troubleshooting arcane issues.

A practical guide to using Kafka REST Proxy

10 minute read

The Confluent REST Proxy provides a RESTful interface to a Kafka cluster. Learn how to produce, consume, view and administer a Kafka cluster using simple pytho...

Kafka commonly used commands

1 minute read

Covers commonly used commands when working in the Kafka ecosystem, namely Kafka CLI, Kafka Connect, Kafka REST Proxy, KSQL and Faust.

How to work with ARRAYs in Postgres

5 minute read

When data is stored in arrays, we can make use of some special Postgres operators like ANY and @> (contains) to query, filter and aggregate records.

New York City Airbnb price distribution

7 minute read

Visualize the Airbnb price distribution across the 5 boroughs of New York City, with some interesting visualizations created using GeoPandas and Folium maps.

Document Clustering

20 minute read

1000 user reviews are clustered using K-means and Agglomerative Hierarchical clustering algorithms. Silhouette analysis is conducted to evaluate the effectiv...

Clustering movies based on their plots

13 minute read

In this post, I will show how we can cluster movies based on IMDB and Wiki plot summaries. We will quantify the similarity of movies based on their plot summ...

Document Similarity

10 minute read

Two documents are similar if their vectors are similar. In this post, we will explore this idea through an example. A heatmap of Amazon books similarity is d...

Scrape IMDB movie reviews

17 minute read

Scrape IMDB movie reviews and construct a dataset. Perform shallow parsing on user reviews using spaCy and pattern.

Kubernetes ConfigMaps and Secrets

12 minute read

Kubernetes provides the ConfigMap and Secret resource kinds to allow you to separate configuration from pod specs. This separation makes it easier to manage ...

Kubernetes Volumes

15 minute read

Volumes, PersistentVolumes, PersistentVolumeClaims are explored in depth with the use of an example to demonstrate the need for these and how to use them.

Kubernetes Init Containers

6 minute read

Init Containers let you perform some tasks or check some preconditions before the main application container starts. In this post, I will use an example to s...

Kubernetes Liveness and Readiness Probes

11 minute read

Is your pod ready as soon as the container starts? This is the key question we will explore in this post through the use of Kubernetes liveness and readiness...

Kubernetes Rolling Updates and Rollbacks

10 minute read

Kubernetes uses the RollingUpdate strategy by default. In this post, we will learn how to trigger, pause, resume and view a rollout and demonstrate a rollback.

Kubernetes Deployments in Depth

9 minute read

Instead of creating Pods, we will create deployments and use an example 3-tier application to illustrate scaling and load balancing.

Kubernetes Autoscaling Demonstration

13 minute read

Autoscaling uses the metrics server to collect metrics about the cluster and uses this info to make scaling decisions. In this demo, I will show how to create me...

Kubernetes Multi-Container Pods

10 minute read

In this post, I will show how to run a multi-container pod that implements a three-tier application in a Kubernetes namespace.

Kubernetes Architecture

1 minute read

A basic architecture overview of Kubernetes is the focus of this post. Clusters, Nodes, Control Plane, Pods, Services and Deployments are touched upon.

Deploying Kubernetes options

3 minute read

Once you’ve decided on using Kubernetes, you have a variety of methods for deploying Kubernetes. Single-node, Multi-node, Vendor managed, on-prem etc.

Kubernetes Overview

4 minute read

Overview of Kubernetes, what it does, how it does it and alternatives to Kubernetes.

Lexical Diversity

7 minute read

Lexical diversity is a measure of how many different words are used in a text. The goal of this notebook is to use NLTK to explore the lexical diversity...

2018

Containerize a Flask application

9 minute read

The following topics are covered: Benefits of containerizing, why choose containers over VMs, Docker basics, install Postgres from Docker Hub and finally build...

Understanding Vector Space Models

4 minute read

We will delve into the concept of Vector Spaces by developing an intuition behind this idea through some simple examples. There are many applications and alg...

The Rise of Cloud Data Lakes

9 minute read

Let's understand the rise and popularity of Data Lakes and the need for modern Cloud-based Data Lakes.

Pandas

less than 1 minute read

World of DataFrames, Series and their indexes

2017

Pandas Indexes Demystified

10 minute read

Pandas indexes are the most confusing thing about pandas. So let’s try to understand them. What are the advantages of using indices instead of...

Text as numeric data

7 minute read

Text Analysis is a major application field for machine learning algorithms. However, the raw data, a sequence of symbols, cannot be fed directly to the algorit...

The World of Regex

28 minute read

When working with text data, you are often required to write regular expressions to pre-process the data, extract useful information from the data, create ne...

Creating dataframes

3 minute read

What are the different ways in which you can create a dataframe? Using the pd.DataFrame() constructor. Zip lists to build a DataFrame. Building dataframes w...

Flask and SQLAlchemy

7 minute read

Build a simple Flask application which connects to Postgres using SQLAlchemy.

SQLAlchemy Layers explained

4 minute read

SQLAlchemy offers several layers of abstraction and convenient tools for interacting with a database. In this post, we will understand the purpose of each la...

AWS Kinesis Overview

15 minute read

Get an overview of Amazon Kinesis to collect, process and analyze real-time streaming data.

MapReduce Programming Example

3 minute read

The MapReduce programming technique was designed to analyze massive data sets across a cluster. In this post, we will get a sense of how Hadoop MapReduce works.

2016

Cypher Query Language to query Neo4j

3 minute read

In this post, we will think about the graph patterns to apply to the graph database and then perform the queries using Cypher.

Neo4j and Graph database basics

5 minute read

In this post I cover how to install Neo4j Desktop on Mac and explore what is available in the tool. The basics of graph databases are also discussed.

AWS Elastic Beanstalk explained

9 minute read

We will explore how AWS Elastic Beanstalk can be used to help deploy and scale applications without having to worry about provisioning resources manually.

VPC subnets

13 minute read

Everything about VPC subnets and how to design for high availability and resiliency.

AWS S3 Encryption Mechanisms

7 minute read

In this post, we will look at 5 ways to encrypt data in S3: SSE-S3, SSE-KMS, SSE-C, CSE-KMS and CSE-C.

AWS KMS Key Management

9 minute read

Rotating CMKs, importing key material for CMKs and deleting CMKs are covered in this post.

AWS KMS Key Policies

8 minute read

Key policies are resource-based policies that are tied to your CMK. If you want a principal to be able to access your CMK, then a key policy must be in ...

AWS Key Management Service (KMS)

15 minute read

Many of the AWS services rely on KMS for their encryption needs. KMS allows encryption of data at rest. In this post, we will look at the components that mak...

Docker on AWS EC2

18 minute read

Install Docker on AWS, find and use images from the public Docker registry and finally build your Docker images using Dockerfiles.

AWS EC2 service overview

14 minute read

AMI, instance types, instance purchase options, tenancy, user data, storage options and security of EC2 instances are covered in this post.

Docker Port Mapping by example

7 minute read

What is the difference between Expose, PublishAll and Publish when talking about port mapping in Docker? Also learn how to run a Docker container in detached m...

How to explain Docker to your Grandma?

10 minute read

Suppose your grandma caught you twiddling with a Blue Whale icon on your laptop and you have to explain what it is. How would you explain it?

AWS Identity and Access Management (IAM)

16 minute read

The Identity and Access Management service, commonly referred to as IAM, is a key security service within AWS and is likely the first service you will encount...

AWS Glossary

3 minute read

A glossary of terms pertaining to AWS
