-
Notifications
You must be signed in to change notification settings - Fork 760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the PLPGSQL queries to resolve duplicate queries #1013
base: master
Are you sure you want to change the base?
Update the PLPGSQL queries to resolve duplicate queries #1013
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks pretty good. I want to understand the CROSS JOIN, but everything else looks good.
Thank you for your patience.
pss.blk_write_time / 1000.0 as block_write_seconds_total | ||
FROM pg_stat_statements pss | ||
JOIN pg_database ON pg_database.oid = pss.dbid | ||
CROSS JOIN percentiles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the CROSS JOIN here? I'm having trouble understanding the why. My understanding is that CROSS JOIN will produce a cartesian product where every possible combination of table A and B will exist.
Can you please help in merging this branch to master please |
I need the explanation of the CROSS JOIN before I am willing to merge this code. It's not obvious why it is there so I feel that it deserves some context. |
If you're asking about the fact that the base branch is out of date, you can reference the github guide here: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch#updating-your-pull-request-branch You want to be sure to use the rebase option as that will make my review easier. |
Added cross join to achieve the Cartesian product, as the query id gets duplicated in case multiple case scenarios
We need to possibly generate the Cartesian product as it generates all possible combinations and doesn't excludes out the any vital information that's required. Hence this query is designed in such a way, because the fact the pg_stat_statements table doens't have a unique key constraint to exclude queryid. Hope this helps. |
Update the PLPGSQL queries to resolve duplicate queries Signed-off-by: VigneshViggu <vigneshkumar.venugopal@outlook.com>
cb73f19
to
1081775
Compare
this PR has been rebased with recent changes in the master. |
Any defined time line on merging this PR to main.. any sooner that we can expect ? |
I need to find time to sit down with the query and understand it. Even after your explanation I'm having a hard time understanding the intention of the CROSS JOIN. SQL hides a lot of intention and it's harder to comment than code. I want to be sure that it's producing the desired outcome before merging and that means finding some time to run tests against an instance and getting a better understanding of the query. I'm hoping that I can find some time over the next week. Another option would be to do more of the work in go instead of SQL which would make it easier to comment on the intention of the why. |
Do let me know if you need any extra hands or help.. I'm open to help explaining the query and it's logic |
Hi. Thanks for the PR @vigneshkumar2016 ! I have been testing this change on a database (pg 16) that is tracking a lot of statements. The I am able to get it to match using by wrapping the query from the PR (for pg 16) in another SELECT * FROM (
SELECT DISTINCT ON (pss.queryid, pg_get_userbyid(pss.userid), pg_database.datname)
pg_get_userbyid(pss.userid) AS user,
pg_database.datname AS database_name,
pss.queryid,
pss.calls AS calls_total,
pss.total_exec_time / 1000.0 AS seconds_total,
pss.rows AS rows_total,
pss.blk_read_time / 1000.0 AS block_read_seconds_total,
pss.blk_write_time / 1000.0 AS block_write_seconds_total
FROM pg_stat_statements pss
JOIN pg_database ON pg_database.oid = pss.dbid
JOIN (
SELECT percentile_cont(0.1) WITHIN GROUP (ORDER BY total_exec_time) AS percentile_val
FROM pg_stat_statements
) AS perc ON pss.total_exec_time > perc.percentile_val
ORDER BY pss.queryid, pg_get_userbyid(pss.userid) DESC, pg_database.datname
)
ORDER BY seconds_total DESC
LIMIT 100; I did not test the query for older PG versions, but it looks like it will have the same issue. Looking forward to having a fix for this merged! Thanks again @vigneshkumar2016. |
I'm arriving late to this discussion, i don't know if it's related with having two entries in pg_stat_statements with same queryid, this is related to the pg_stat_statements.track = 'all' witch includes a new boolean field called: toplevel, to avoid duplication we can set: pg_stat_statements.track = 'top'. My suggestion would be to add this field to the labels, what I cannot foresee is if it would be more than one query with toplevel false and the same queryid. I detected with pg16 and pg_stat_statements 1.10, easy to replicate running: |
Update the PLPGSQL queries to resolve duplicate queries