
The visual tries to read data from the data source - Negative impact on large data sources in Direct Query #21

Open
erapade opened this issue Nov 12, 2020 · 7 comments


erapade commented Nov 12, 2020

I tried this filter using Direct Query. I haven't used it for in-memory models, but for direct queries I see no point in reading data from the data source; it just puts too much load on the data source.

My suggestion is to implement a settings property where developers can choose not to read data from the data source.

I get this error message while using this visual on data sources that have a large amount of data in the text column

-->
Couldn't load the data for this visual
Couldn't retrieve the data for this visual. Please try again later.
Please try again later or contact support. If you contact support, please provide these details.
Activity ID: 5309b3b7-9142-473e-93ff-8b4dbb7abcd0
Request ID: c79958ce-841f-4b21-0792-502916286a85
Correlation ID: dd4f7fe6-468b-64e4-e280-dfa66d3c8e4f
Time: Thu Nov 12 2020 08:36:27 GMT+0100 (Central European Standard Time)
Service version: 13.0.14696.63
Client version: 2011.1.03669-train
Cluster URI: https://wabi-north-europe-redirect.analysis.windows.net/
<--

The error message is not due to a large number of rows, since using this visual on other columns in the same table doesn't generate this error. The error occurs because this column contains a large amount of data.

Yes, I know that Full Text Index is not supported and that using this filter on large text fields and tables will be slow, but the customers/users are willing to accept it being slow; what they can't accept is an error message/warning being generated.

For some reason, I see this error more frequently in my P5 Premium tenant than in Power BI Desktop.

Checking the generated DAX-code, there is a limit in the number of rows retrieved:
-->
DEFINE
VAR __DS0Core =
DISTINCT('Test Tickets'[Description])

VAR __DS0BodyLimited =
TOPN(30002, __DS0Core, 'Test Tickets'[Description], 1)

EVALUATE
__DS0BodyLimited

ORDER BY
'Test Tickets'[Description]
<--

But I guess that in some situations the combination of 30002 rows of NVARCHAR(4000) is just too much for Power BI to handle; see the SQL code below:
-->
SELECT
TOP (30002) [t10].[Description]
FROM
(

SELECT
CAST([Description] AS NVARCHAR(4000))
...
<--


erapade commented Dec 10, 2020

Could someone, maybe @stopyoukid, take a look at this?

I tried to fix this myself, but the only thing I managed to do was to decrease the number of rows sent to the visual; by doing that, my Power BI report no longer generates a warning in Power BI Service.
I submitted pull request #26 for this.

But there's a need for another change as well.

With the current implementation my filter visual takes ~20 seconds to update, since I'm using Direct Query. The visual generates approximately this code:
SELECT TOP (3) [t0].[Description]
FROM (
SELECT CAST([Description] AS NVARCHAR(4000)) AS [Description]
FROM (
(
SELECT *
FROM [pbi2].[Tickets] AS [$Table]
)
) AS [t0]
) AS [t0]
GROUP BY [t0].[Description]
ORDER BY [t0].[Description] ASC

As you can imagine, doing a GROUP BY and ORDER BY on an NVARCHAR(4000) column with >1 million rows, and then a TOP N on top of that, takes a while. The TOP N only protects Power BI from getting too much data; it does not affect the amount of data that needs to be grouped and ordered.

Since I don't see any reason for this kind of filter to read data, couldn't we just remove the feature that reads data?


stopyoukid commented Dec 10, 2020

I totally agree, TextFilter doesn't need data at all, just a reference to the column it filters. However, it seems this is a limitation of the custom visual API. I don't know of another way of telling the Power BI API that we need the field only and no data.

In regards to your PR #26, my team and I have gotten conflicting information. It does seem that in some instances, maybe depending on how data is connected to the workbook, the dataReductionAlgorithm in capabilities.json actually can affect the number of rows that the text filter will filter. In your case it sounds like it works with a small number (which is great), but I don't know if we can assume in general that it will work. That being said, I am happy to be corrected/updated if that is no longer the case.
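For anyone following along, the data reduction setting under discussion lives in the visual's capabilities.json, inside its data view mapping. A minimal sketch of what such a fragment can look like (the role name "category" and the surrounding table mapping are illustrative assumptions, not the visual's actual file):

```
{
  "dataViewMappings": [
    {
      "table": {
        "rows": {
          "for": { "in": "category" },
          "dataReductionAlgorithm": { "top": { "count": 1 } }
        }
      }
    }
  ]
}
```

With "top": { "count": 1 } the visual asks the host for at most one row; as noted further down in this thread, the host appears to pad that configured count before it reaches the generated TOPN.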

In regards to the perf issue, I pushed some updates to that build-fix branch, which may help.

  • I updated the type of fields it can take (Grouping and Measure)
  • I updated how Power BI returns the data, so instead of it being categorical, it is now tabular (so maybe it won't group by anymore)
  • I incorporated the "top": 1 update that you made in your PR.

When you use it, try setting your field in Power BI to be either "First" or "Last" as below; maybe this will help.

[screenshot: the field's summarization dropdown set to First/Last]


erapade commented Dec 11, 2020

Thanks a lot @stopyoukid.
This works kind of like a charm, but some changes could be made if you plan to update the main branch (and I think that would be possible). We probably see the same problems with in-memory models, but since they usually run faster and don't have wide text columns, we don't notice it.

I used Count instead of First/Last, since First/Last generated a MIN/MAX in T-SQL. Imagine doing MIN/MAX on millions of NVARCHAR(4000) rows :-)
So now the generated T-SQL looks roughly like the code below when I choose Count as the aggregate:
SELECT COUNT_BIG([t0].[Description]) AS [a0]
FROM (
SELECT CAST([Description] AS NVARCHAR(4000)) AS [Description]
FROM (
(
SELECT *
FROM [pbi2].[Tickets] AS [$Table]
)
) AS [t0]
) AS [t0]

With the limitations you mention, I guess this is the best we can do.

Something I noticed:

  • The change you made for TOPN didn't completely work, since it generated a TOPN of 1002. I have noticed that we need to set the value to 1, which then generates a value of 3 for TOPN. If we set it to 0, or not at all, a default of TOPN(1002) is generated. I guess there's a small bug here, but it may be in the APIs. I can understand the thinking behind 1 becoming 2, to leave some room for headings, but my guess is that this is a misunderstanding. Why 1 becomes 3 I have no idea; maybe two levels in the APIs are stacking the same fault on top of each other. It's the same in both DAX and T-SQL, so you don't need Direct Query to see this.
  • The filter settings for the visual can't be changed if I use Count (or any other aggregation) for the columns. I guess this could be a good thing, since setting a filter on this visual has no effect anyway. It only slows the visual down by adding yet another time-consuming DAX SEARCH and T-SQL CHARINDEX filter.
  • As you suspected, the grouping is now removed, but only when choosing an aggregation.
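For context on the SEARCH/CHARINDEX point above, the substring filter this kind of visual applies can be sketched in DAX roughly as follows (the search string "printer" is made up for illustration; in Direct Query the service translates SEARCH into a T-SQL CHARINDEX):

```
EVALUATE
CALCULATETABLE (
    'Test Tickets',
    FILTER (
        'Test Tickets',
        // SEARCH returns the 1-based match position, or the fourth
        // argument (0 here) when no match is found
        SEARCH ( "printer", 'Test Tickets'[Description], 1, 0 ) > 0
    )
)
```

Since the > 0 comparison has to be evaluated per row, every extra filter like this adds a full scan's worth of work on the wide text column, which is why redundant visual-level filters are pure cost here.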

This is the T-SQL code generated when not using aggregates:
SELECT TOP (1002) [t0].[Description]
FROM (
SELECT CAST([Description] AS NVARCHAR(4000)) AS [Description]
FROM (
(
SELECT *
FROM [pbi2].[Tickets] AS [$Table]
)
) AS [t0]
) AS [t0]
GROUP BY [t0].[Description]
ORDER BY [t0].[Description] ASC

So what I would suggest (if possible):

  • Set the column type as Measure only, and enforce inside the visual that the aggregation is a count (no matter what the user configures). This needs to be tested, though, i.e. what happens when you already have a visual and then update it with new column schema requirements and no possibility to configure filters (if the user, for some silly reason, has a visual-level filter configured).
  • Regarding the filter not being configurable when using aggregates: I think that is a good thing. Even when using Categories, this visual's filter settings have no effect on the Power BI report (except adding delays). Tables and related tables will not be filtered based on a filter visual's own filter settings; the only thing that happens is that we use more CPU resources.


erapade commented Dec 11, 2020

It also looks like some earlier problems I had, like those described in issue #19, are gone now. For me these problems didn't appear only after some days; they came now and then, whenever I changed filtered columns or duplicated the filter.
I have also noticed that when I set a value in the filter, the value box is kind of blue (probably indicating an "unsaved" value) until I press Enter or push the search button. That never happened before. So overall I must say this looks very promising.


erapade commented Sep 4, 2021

As I revisit this and see that nothing has happened, I'll leave some comments.
My investigation above indicates some strange behavior in both this visual and the public API. To improve the Power BI product, it would be good if the API problems were addressed. As I'm not a visual developer, I'm not the right person to do this, but I hope someone else will pick up this task.

@StevenBlair123

Was this ever released? We are trying to use this feature, but don't see the First/Last options etc.


erapade commented Apr 5, 2023

Oops, yes, this one was long ago. As I remember, I downloaded something not on the main branch. If that doesn't help: the implicit measures like Last, Count etc. are not available when using calculation groups, so if you are using those you will not see them. Alternatively, write your own explicit measure. @stopyoukid Any news on this? As I remember, there were things that could be done without that much effort.
