Skip to content

Latest commit

 

History

History
177 lines (134 loc) · 6.32 KB

File metadata and controls

177 lines (134 loc) · 6.32 KB
comments difficulty edit_url tags
true
Medium
Database

中文文档

Description

Table: Posts

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| post_id     | int     |
| user_id     | int     |
| post_date   | date    |
+-------------+---------+
post_id is the primary key (column with unique values) for this table.
Each row of this table contains post_id, user_id, and post_date.

Write a solution to find users who demonstrate bursty behavior in their posting patterns during February 2024. Bursty behavior is defined as any period of 7 consecutive days where a user's posting frequency is at least twice to their average weekly posting frequency for February 2024.

Note: Only include the dates from February 1 to February 28 in your analysis, which means you should count February as having exactly 4 weeks.

Return the result table orderd by user_id in ascending order.

The result format is in the following example.

 

Example:

Input:

Posts table:

+---------+---------+------------+
| post_id | user_id | post_date  |
+---------+---------+------------+
| 1       | 1       | 2024-02-27 |
| 2       | 5       | 2024-02-06 |
| 3       | 3       | 2024-02-25 |
| 4       | 3       | 2024-02-14 |
| 5       | 3       | 2024-02-06 |
| 6       | 2       | 2024-02-25 |
+---------+---------+------------+

Output:

+---------+----------------+------------------+
| user_id | max_7day_posts | avg_weekly_posts |
+---------+----------------+------------------+
| 1       | 1              | 0.2500           |
| 2       | 1              | 0.2500           |
| 5       | 1              | 0.2500           |
+---------+----------------+------------------+

Explanation:

  • User 1: Made only 1 post in February, resulting in an average of 0.25 posts per week and a max of 1 post in any 7-day period.
  • User 2: Also made just 1 post, with the same average and max 7-day posting frequency as User 1.
  • User 5: Like Users 1 and 2, User 5 made only 1 post throughout February, leading to the same average and max 7-day posting metrics.
  • User 3: Although User 3 made more posts than the others (3 posts), they did not reach twice the average weekly posts in their consecutive 7-day window, so they are not listed in the output.

Note: Output table is ordered by user_id in ascending order.

Solutions

Solution 1: Self-Join + Group Count

We can use self-join to connect the Posts table with itself. The connection condition is p1.user_id = p2.user_id and p2.post_date is between p1.post_date and 6 days after p1.post_date. Then we group the connection results by p1.user_id and p1.post_id to count the number of posts for each user within 7 days of each day. We save this result in table P.

Next, we count the average number of posts per week for each user in February 2024 and save it in table T. Note that we need to find records where post_date is between 2024-02-01 and 2024-02-28, group the records by user_id, then count the number of posts for each user, and finally divide by 4 to get the average number of posts per week. We save this result in table T.

Finally, we connect tables P and T with the condition P.user_id = T.user_id, then group by user_id to count the maximum number of posts within 7 days for each user. We then filter out records that meet the condition max_7day_posts >= avg_weekly_posts * 2 to get the result. Note that we need to sort in ascending order by user_id.

MySQL

# Write your MySQL query statement below
WITH
    P AS (
        SELECT p1.user_id AS user_id, COUNT(1) AS cnt
        FROM
            Posts AS p1
            JOIN Posts AS p2
                ON p1.user_id = p2.user_id
                AND p2.post_date BETWEEN p1.post_date AND DATE_ADD(p1.post_date, INTERVAL 6 DAY)
        GROUP BY p1.user_id, p1.post_id
    ),
    T AS (
        SELECT user_id, COUNT(1) / 4 AS avg_weekly_posts
        FROM Posts
        WHERE post_date BETWEEN '2024-02-01' AND '2024-02-28'
        GROUP BY 1
    )
SELECT user_id, MAX(cnt) AS max_7day_posts, avg_weekly_posts
FROM
    P
    JOIN T USING (user_id)
GROUP BY 1
HAVING max_7day_posts >= avg_weekly_posts * 2
ORDER BY 1;

Python3

import pandas as pd


def find_bursty_behavior(posts: pd.DataFrame) -> pd.DataFrame:
    # Subquery P
    p1 = pd.merge(posts, posts, on="user_id", suffixes=("_1", "_2"))
    p1 = p1[
        p1["post_date_2"].between(
            p1["post_date_1"], p1["post_date_1"] + pd.Timedelta(days=6)
        )
    ]
    p1 = p1.groupby(["user_id", "post_id_1"]).size().reset_index(name="cnt")

    # Subquery T
    t = posts[
        (posts["post_date"] >= "2024-02-01") & (posts["post_date"] <= "2024-02-28")
    ]
    t = t.groupby("user_id").size().div(4).reset_index(name="avg_weekly_posts")

    # Joining P and T
    merged_df = pd.merge(p1, t, on="user_id", how="inner")

    # Filtering
    filtered_df = merged_df[merged_df["cnt"] >= merged_df["avg_weekly_posts"] * 2]

    # Aggregating
    result_df = (
        filtered_df.groupby("user_id")
        .agg({"cnt": "max", "avg_weekly_posts": "first"})
        .reset_index()
    )
    result_df.columns = ["user_id", "max_7day_posts", "avg_weekly_posts"]

    # Sorting
    result_df.sort_values(by="user_id", inplace=True)

    return result_df