comments | difficulty | edit_url | tags | |
---|---|---|---|---|
true |
Medium |
|
Table: cities
+-------------+---------+ | Column Name | Type | +-------------+---------+ | state | varchar | | city | varchar | +-------------+---------+ (state, city) is the combination of columns with unique values for this table. Each row of this table contains the state name and the city name within that state.
Write a solution to find all the cities in each state and analyze them based on the following requirements:
- Combine all cities into a comma-separated string for each state.
- Only include states that have at least
3
cities. - Only include states where at least one city starts with the same letter as the state name.
Return the result table ordered by the count of matching-letter cities in descending order and then by state name in ascending order.
The result format is in the following example.
Example:
Input:
cities table:
+--------------+---------------+ | state | city | +--------------+---------------+ | New York | New York City | | New York | Newark | | New York | Buffalo | | New York | Rochester | | California | San Francisco | | California | Sacramento | | California | San Diego | | California | Los Angeles | | Texas | Tyler | | Texas | Temple | | Texas | Taylor | | Texas | Dallas | | Pennsylvania | Philadelphia | | Pennsylvania | Pittsburgh | | Pennsylvania | Pottstown | +--------------+---------------+
Output:
+-------------+-------------------------------------------+-----------------------+ | state | cities | matching_letter_count | +-------------+-------------------------------------------+-----------------------+ | Pennsylvania| Philadelphia, Pittsburgh, Pottstown | 3 | | Texas | Dallas, Taylor, Temple, Tyler | 3 | | New York | Buffalo, Newark, New York City, Rochester | 2 | +-------------+-------------------------------------------+-----------------------+
Explanation:
- Pennsylvania:
<ul> <li>Has 3 cities (meets minimum requirement)</li> <li>All 3 cities start with 'P' (same as state)</li> <li>matching_letter_count = 3</li> </ul> </li> <li><strong>Texas</strong>: <ul> <li>Has 4 cities (meets minimum requirement)</li> <li>3 cities (Taylor, Temple, Tyler) start with 'T' (same as state)</li> <li>matching_letter_count = 3</li> </ul> </li> <li><strong>New York</strong>: <ul> <li>Has 4 cities (meets minimum requirement)</li> <li>2 cities (Newark, New York City) start with 'N' (same as state)</li> <li>matching_letter_count = 2</li> </ul> </li> <li><strong>California</strong> is not included in the output because: <ul> <li>Although it has 4 cities (meets minimum requirement)</li> <li>No cities start with 'C' (doesn't meet the matching letter requirement)</li> </ul> </li>
Note:
- Results are ordered by matching_letter_count in descending order
- When matching_letter_count is the same (Texas and New York both have 2), they are ordered by state name alphabetically
- Cities in each row are ordered alphabetically
We can group the cities
table by the state
field, then apply filtering on each group to retain only the groups that meet the specified conditions.
# Write your MySQL query statement below
SELECT
state,
GROUP_CONCAT(city ORDER BY city SEPARATOR ', ') AS cities,
COUNT(
CASE
WHEN LEFT(city, 1) = LEFT(state, 1) THEN 1
END
) AS matching_letter_count
FROM cities
GROUP BY 1
HAVING COUNT(city) >= 3 AND matching_letter_count > 0
ORDER BY 3 DESC, 1;
import pandas as pd
def state_city_analysis(cities: pd.DataFrame) -> pd.DataFrame:
cities["matching_letter"] = cities["city"].str[0] == cities["state"].str[0]
result = (
cities.groupby("state")
.agg(
cities=("city", lambda x: ", ".join(sorted(x))),
matching_letter_count=("matching_letter", "sum"),
city_count=("city", "count"),
)
.reset_index()
)
result = result[(result["city_count"] >= 3) & (result["matching_letter_count"] > 0)]
result = result.sort_values(
by=["matching_letter_count", "state"], ascending=[False, True]
)
result = result.drop(columns=["city_count"])
return result