Day 1: Leetcode SQL problems

Introduction

A variety of SQL leetcode questions that I solved. This post captures Day 1 problems and their solutions.

1. Classes More Than 5 Students

URL: https://leetcode.com/problems/classes-more-than-5-students/

Question: There is a table courses with columns: student and class

Please list out all classes which have more than or equal to 5 students.

For example, the table:

student class
A Math
B English
C Math
D Biology
E Math
F Computer
G Math
H Math
I Math

Should output:

class
Math
Note:
The students should not be counted duplicate in each course.

Solution

# Write your MySQL query statement below
/*
- Group by class and get the count
- Filter the groups using a having statement
*/

SELECT
    class
FROM
    courses
GROUP BY class
HAVING COUNT(DISTINCT student) >= 5;

2. Second highest salary

Write a SQL query to get the second highest salary from the Employee table.

Id Salary
1 100
2 200
3 300

For example, given the above Employee table, the query should return 200 as the second highest salary. If there is no second highest salary, then the query should return null.

SecondHighestSalary
200

Solution:

SELECT
    Max(Salary) as SecondHighestSalary
FROM Employee
WHERE Salary NOT IN (
SELECT
    MAX(Salary)
FROM Employee);

3. User activity

Table: Activity

Column Name Type
user_id int
session_id int
activity_date date
activity_type enum

There is no primary key for this table, it may have duplicate rows. The activity_type column is an ENUM of type (‘open_session’, ‘end_session’, ‘scroll_down’, ‘send_message’). The table shows the user activities for a social media website. Note that each session belongs to exactly one user.

Write an SQL query to find the average number of sessions per user for a period of 30 days ending 2019-07-27 inclusively, rounded to 2 decimal places. The sessions we want to count for a user are those with at least one activity in that time period.

The query result format is in the following example:

Activity table:

user_id session_id activity_date activity_type
1 1 2019-07-20 open_session
1 1 2019-07-20 scroll_down
1 1 2019-07-20 end_session
2 4 2019-07-20 open_session
2 4 2019-07-21 send_message
2 4 2019-07-21 end_session
3 2 2019-07-21 open_session
3 2 2019-07-21 send_message
3 2 2019-07-21 end_session
3 5 2019-07-21 open_session
3 5 2019-07-21 scroll_down
3 5 2019-07-21 end_session
4 3 2019-06-25 open_session
4 3 2019-06-25 end_session

Result table:

average_sessions_per_user
1.33

User 1 and 2 each had 1 session in the past 30 days while user 3 had 2 sessions so the average is (1 + 1 + 2) / 3 = 1.33.

This is my solution

# Write your MySQL query statement below

SELECT IFNULL(ROUND(AVG(num_sessions),2),0) as average_sessions_per_user
FROM (
SELECT
    user_id,
    COUNT(distinct session_id) as num_sessions
FROM
    Activity
WHERE activity_date BETWEEN DATE_SUB('2019-07-28', INTERVAL 30 DAY) AND DATE('2019-07-27')
GROUP BY user_id) as cte

4. Shortest distance in a line

Table point holds the x coordinate of some points on x-axis in a plane, which are all integers. Write a query to find the shortest distance between two points in these points.

x
-1
0
2

The shortest distance is ‘1’ obviously, which is from point ‘-1’ to ‘0’. So the output is as below:

shortest
1

Note: Every point is unique, which means there is no duplicates in table point.

Follow-up: What if all these points have an id and are arranged from the left most to the right most of x axis?

Solution:

SELECT
    MIN(ABS(x1.x - x2.x)) as shortest
FROM point as x1, point as x2
WHERE x1.x != x2.x

Intuition: Calculate the distances between each two points first, and then display the minimum one.

Algorithm: To get the distances of each two points, we need to join this table with itself and use ABS() function since the distance is nonnegative. One trick here is to add the condition in the join to avoid calculating the distance between a point with itself.

SELECT
    p1.x, p2.x, ABS(p1.x - p2.x) AS distance
FROM
    point p1
        JOIN
    point p2 ON p1.x != p2.x
;
Note: The columns p1.x, p2.x are only for demonstrating purpose, so they are not actually needed in the end.
Taking the sample data for example, the output would be as below.

| x  | x  | distance |
|----|----|----------|
| 0  | -1 | 1        |
| 2  | -1 | 3        |
| -1 | 0  | 1        |
| 2  | 0  | 2        |
| -1 | 2  | 3        |
| 0  | 2  | 2        |
At last, use MIN() to select the smallest value in the distance column.

5. Pattern of recursive subqueries

Project Employees II:

Table: Project

Column Name Type
project_id int
employee_id int

(project_id, employee_id) is the primary key of this table. employee_id is a foreign key to Employee table. Table: Employee

Column Name Type
employee_id int
name varchar
experience_years int

employee_id is the primary key of this table.

Write an SQL query that reports all the projects that have the most employees.

The query result format is in the following example:

Project table:

project_id employee_id
1 1
1 2
1 3
2 1
2 4

Employee table:

employee_id name experience_years
1 Khaled 3
2 Ali 2
3 John 1
4 Doe 2

Result table:

project_id
1

The first project has 3 employees while the second one has 2.

SOLUTION: This example presents a pattern which I run into quite often. I need to correct my thought process for such problems.

My approach is to:

  • first GROUP BY project_id and count the number of employees in that project.
  • next use MAX num_employees in the outer query to calculate the value of the num employees.
  • next use GROUP BY project_id and filter the groups that are equal to the value in step 2.

So, this means, I have a query within a query within a query! Can I do better than this?

A slightly better approach:

  • first GROUP BY project_id and count the number of employees in that project, ORDER BY num_projects desc and LIMIT to 1.
  • In the outer query, first GROUP BY project_id and in the HAVING clause, equate the COUNT with value from the above query. This way, you will get all the project_id’s even if there are multiple projects with the same number of employees.
SELECT
    project_id
FROM Project
GROUP BY project_id
HAVING COUNT(*) = (

    SELECT COUNT(*) as emp_count
    FROM Project
    GROUP BY project_id
    ORDER BY emp_count DESC
    LIMIT 1    
)

6. Self-join Employee-Manager question

The Employee table holds all employees including their managers. Every employee has an Id, and there is also a column for the manager Id.

+----+-------+--------+-----------+
| Id | Name  | Salary | ManagerId |
+----+-------+--------+-----------+
| 1  | Joe   | 70000  | 3         |
| 2  | Henry | 80000  | 4         |
| 3  | Sam   | 60000  | NULL      |
| 4  | Max   | 90000  | NULL      |
+----+-------+--------+-----------+
Given the Employee table, write a SQL query that finds out employees who earn more than their managers. For the above table, Joe is the only employee who earns more than his manager.

+----------+
| Employee |
+----------+
| Joe      |
+----------+

Solution: The trick here is visualizing what the values of ManagerId are going to look like. Imagine, Manager to be a separate table and you wish to augment the Employee table with Manager table’s data. So, we should join Employee table and Manager table on Employee.managerId = Manager.id.

select
	e.name as Employee
from employee as e
left join employee as m
on e.managerid = m.id
where e.salary > m.salary;

7. Trips

The Trips table holds all taxi trips. Each trip has a unique Id, while Client_Id and Driver_Id are both foreign keys to the Users_Id at the Users table. Status is an ENUM type of (‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’).

Id Client_Id Driver_Id City_Id Status Request_at
1 1 10 1 completed 2013-10-01
2 2 11 1 cancelled_by_driver 2013-10-01
3 3 12 6 completed 2013-10-01
4 4 13 6 cancelled_by_client 2013-10-01
5 1 10 1 completed 2013-10-02
6 2 11 6 completed 2013-10-02
7 3 12 6 completed 2013-10-02
8 2 12 12 completed 2013-10-03
9 3 10 12 completed 2013-10-03
10 4 13 12 cancelled_by_driver 2013-10-03

The Users table holds all users. Each user has an unique Users_Id, and Role is an ENUM type of (‘client’, ‘driver’, ‘partner’).

Users_Id Banned Role
1 No client
2 Yes client
3 No client
4 No client
10 No driver
11 No driver
12 No driver
13 No driver

Write a SQL query to find the cancellation rate of requests made by unbanned users (both client and driver must be unbanned) between Oct 1, 2013 and Oct 3, 2013. The cancellation rate is computed by dividing the number of canceled (by client or driver) requests made by unbanned users by the total number of requests made by unbanned users.

For the above tables, your SQL query should return the following rows with the cancellation rate being rounded to two decimal places.

Day Cancellation Rate
2013-10-01 0.33
2013-10-02 0.00
2013-10-03 0.50

I have been amazed at what we can do with the CASE statement. Whenever someone says calculate the rate/percent/proportion/avg on a Categorical column like status in our case, then that is a big hint to use CASE statement with an aggregation.

Another common pattern that I have observed is when two columns (that are fk: client_id and driver_id) in a table reference the primary key of a single column in another table (user_id in users table), then the way we filter the main table(i.e trips) is by using 2 WHERE clauses with either IN or NOT IN operators.

SELECT
	request_at as Day,
	ROUND(SUM(CASE WHEN status = 'completed' THEN 0 ELSE 1 END)*1.0/COUNT(status),2) as "Cancellation Rate"
FROM (
    SELECT
		*
	FROM trips
	WHERE client_id NOT IN (SELECT users_id
			FROM users
			WHERE banned = 'Yes') AND
		driver_id NOT IN (SELECT users_id
			FROM users
			WHERE banned = 'Yes') AND
        request_at BETWEEN DATE('2013-10-01') AND DATE('2013-10-03')
) as cte
GROUP BY request_at

8. Count Student Number in Departments

A university uses 2 data tables, student and department, to store data about its students and the departments associated with each major.

Write a query to print the respective department name and number of students majoring in each department for all departments in the department table (even ones with no current students).

Sort your results by descending number of students; if two or more departments have the same number of students, then sort those departments alphabetically by department name.

The student is described as follow:

Column Name Type
student_id Integer
student_name String
gender Character
dept_id Integer

where student_id is the student’s ID number, student_name is the student’s name, gender is their gender, and dept_id is the department ID associated with their declared major.

And the department table is described as below:

Column Name Type
dept_id Integer
dept_name String

where dept_id is the department’s ID number and dept_name is the department name.

Here is an example input:

student table:

student_id student_name gender dept_id
1 Jack M 1
2 Jane F 1
3 Mark M 2

department table:

dept_id dept_name
1 Engineering
2 Science
3 Law

The Output should be:

dept_name student_number
Engineering 2
Science 1
Law 0
SELECT
    dept_name,
    COUNT(student_id) as student_number
FROM department as d
LEFT JOIN student as s
ON d.dept_id = s.dept_id
GROUP BY dept_name
ORDER BY student_number desc, dept_name;

9. Customers placing the largest number of orders

Query the customer_number from the orders table for the customer who has placed the largest number of orders. It is guaranteed that exactly one customer will have placed more orders than any other customer. The orders table is defined as follows:

Column Type
order_number (PK) int
customer_number int
order_date date
required_date date
shipped_date date
status char(15)
comment char(200)

Sample Input:

order_number customer_number order_date required_date shipped_date status comment
1 1 2017-04-09 2017-04-13 2017-04-12 Closed  
2 2 2017-04-15 2017-04-20 2017-04-18 Closed  
3 3 2017-04-16 2017-04-25 2017-04-20 Closed  
4 3 2017-04-18 2017-04-28 2017-04-25 Closed  

Sample Output

customer_number
3

Explanation:

The customer with number ‘3’ has two orders, which is greater than either customer ‘1’ or ‘2’ because each of them only has one order. So the result is customer_number ‘3’. Follow up: What if more than one customer have the largest number of orders, can you find all the customer_number in this case?

SELECT
    customer_number
FROM orders
GROUP BY customer_number
HAVING COUNT(order_number) = (
    SELECT COUNT(order_number)
        FROM orders
    GROUP BY customer_number
    ORDER BY COUNT(order_number) DESC LIMIT 1);

Using a window function:

SELECT
    cte1.customer_number
FROM (
SELECT
    cte.customer_number,
    RANK() OVER(ORDER BY total desc) n_rank
FROM (
SELECT
    customer_number,
    COUNT(*) as total
FROM
    orders
GROUP BY customer_number) cte) cte1
WHERE n_rank = 1;

10. Active Businesses

Table: Events

Column Name Type
business_id int
event_type varchar
occurences int

(business_id, event_type) is the primary key of this table. Each row in the table logs the info that an event of some type occured at some business for a number of times.

Write an SQL query to find all active businesses.

An active business is a business that has more than one event type with occurences greater than the average occurences of that event type among all businesses.

The query result format is in the following example:

Events table:

business_id event_type occurences
1 reviews 7
3 reviews 3
1 ads 11
2 ads 7
3 ads 6
1 page views 3
2 page views 12

Result table:

business_id
1

Average for ‘reviews’, ‘ads’ and ‘page views’ are (7+3)/2=5, (11+7+6)/3=8, (3+12)/2=7.5 respectively. Business with id 1 has 7 ‘reviews’ events (more than 5) and 11 ‘ads’ events (more than 8) so it is an active business.

SELECT
		business_id
FROM Events as e1
INNER JOIN (
		SELECT
			event_type,
			ROUND(AVG(occurences),2) as avg_occurrence
		FROM Events
		GROUP BY event_type) as e2
ON e1.event_type = e2.event_type
WHERE occurences > avg_occurrence
GROUP BY business_id
HAVING COUNT(*) > 1

Don’t over use CASE statement like this:

SELECT
		business_id
FROM Events as e1
LEFT JOIN (
		SELECT
			event_type,
			ROUND(AVG(occurences),2) as avg_occurrence
		FROM Events
		GROUP BY event_type) as e2
ON e1.event_type = e2.event_type
WHERE 1 = CASE WHEN occurences > avg_occurrence THEN 1 ELSE 0 END
GROUP BY business_id
HAVING COUNT(*) > 1

Tags:

Updated: