Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
brick
Employee
Employee

Filtering lines from data table based on values in exclude table

Hello,

 

I'd like some help getting pointed in the right direction. I am new to Power BI but trying to learn and absorb as much as possible by reading books, watching videos, experimenting and asking questions. So, I hope you don't mind a question from a beginner. I am trying to solve a problem and honestly, I am not even sure which direction I should take. I will toss out an idea but I am open to taking a completely different direction from the ground up as I own the data source, report generation, etc.

 

The basic problem is that I have a data table which has many rows which I cannot modify. The data model is in DirectQuery mode. Some of those rows contain data that interests me at times and does not interest me at other times. So, I'd like a way to filter out the data I do not want and then include it in my reports when I do want it. I am visualizing a seperate table which includes a date stamp, the unique identifier and an is_valid column that tells me whether I want to display the data. If could add rows to this exclude table whenever I change my mind. Then, there would be some methodology that queries this exclude table, if it finds no entries for the unique key, it assumes the data in the data table is valid. If it does find data in this exclude table, it finds the row corresponding to the last date, gets the is_valid value and then determines whether to include that data or not. Here is an example of the exclude table:

 

exampletabler.png

 

Let's say the data table has a million rows in it with many of those rows matching the md5 shown above and many not.

 

The problem I am having is that, based on my learning so far, I can imagine several ways to do this and yet I do not have the skills to fully implement any of them. So, I am hoping someone here can read this and tell me I should go learn methodology ABC. Maybe breifly outline the preferred solution. I don't mind learning but I am hoping to at least be pointed in the right direction. One thing to keep in mind is that I expect my data table to get HUGE... perhaps billions of lines. So, that may affect some solutions. Based on my reading, calculated tables/columns consume memory and mesures consume CPU but I'm not sure which is better here. Following are a few (likely wrong) thoughts which prove I have put some effort into this before coming to beg for help 🙂 haha!

 

1. Write a DAX equation to create a table. The DAX equation must assume the data is valid, then search the exclude table for the md5. If found, acquire the last entry for that md5 identifier from the submit_date, and then return the is_valid value. Based on that, somehow filter out rows from the data table. This seems like a bad idea to me as I imagine I'd be mainly duplciating in memory, the single biggest table in my data model. But on the plus side, I would have a data table with only the "good" content and all my visualizations would just work!

 

2. Create a calculated column in the data table based the last known is_valid value in the exclude table. Then use that column to filter the enrite report. I tried getting started on this and noticed the calculated column GUI wanted me to build it only off of values in the data table. Although I am sure there must be a way around that.

 

3. Create a measure which returns the is_valid value for each md5 identifier. I started down this route and then bumped into an issue when I tried to use it as a full report filter. Apparently measures cannot be used as filters for an entire report (all pages)?

 

4. Somehow reduce the incoming exclude table to only the latest date for each md5 and then use it as a filter? I suppose I can actually have a value in the exclude table for every single md5 identifier... it would start with valid by default if that is needed.

 

So, by now you can likely tell I'm in the weeds and quite new to this so I'll stop messing around with ideas and let you experts educate me. Please.

4 REPLIES 4
Jimmy801
Community Champion
Community Champion

Hello @brick

 

I suggest to do all data modification and prefiltering in Power Query first and do only the final calculations in Power Pivot. I don't know your data source, so it's difficult to propose something. But let's assume you are querying analysis service you could create a MDX-statement to select the data you need, without getting them all in your data model. You could apply there already the filtering criteria you are shown in the table.

 

Hope this information is of any help

 

Jimmy

Thanks for taking valuable time out of your day to reply to my request. As I understand it, you are suggesting I reduce the table from something like this (I added a few more rows for clairty):

 

2019-12-22 09_59_32-Kusto.Explorer [v1.0.3.883].png

 

To something like this:

 

2019-12-22 10_23_20-Kusto.Explorer [v1.0.3.883].png

 

I think you suggested I do it with an M language power query. I have a book on that but decided to poke around on the internet before I got back to reading it (about 1/2 way through so far). I found something that seems to work:

 

reduced_exclude_data = FILTER(ALL(table),[submit_date]=CALCULATE(MAX(table[submit_date]),ALLEXCEPT(table,table[md5])))

 

However, I believe this is a DAX formula to be used in the "New Table" call. Indeed it creates a new table and the table is reduced exactly as I would expect. I only get the unique value of each md5 based on the latest date.

 

2019-12-22 10_32_59- - Power BI Desktop.png

 

To use this data, I created the link in my data model:

 

2019-12-22 10_31_37-- Power BI Desktop.png

 

Then I added an all pages filter which seems to be doing the job:

 

2019-12-22 10_35_09-- Power BI Desktop.png

 

Now my question is... is this a bad approach for some reason? I believe you were suggesting an M query and I believe I have implemented a DAX formula to create this table I would assume that the downside is that I now have an extra table in the machine's memory... the full table and the reduced table? However, I expect this table to be fairly small (~500k lines with these 3 simple columns) so I am thinking that may not be an issue.

 

Is there a better way? Is this way wrong?

 

Jimmy801
Community Champion
Community Champion

Hello @brick ,

 

if it's working for you its fine. And if the performance is okay.

The downside of only using this, is that you have to import the whole data even if you don't need all of it.

With power query you could limit the amount of data in your data model and depending on your data source is taking place query folding, meaning you get from the datasource only the lines you need.

But if your solution is working, I wouldn't change it.


If this post helps or solves your problem, please mark it as solution.
Kudos are nice to - thanks
Have fun

Jimmy

Thanks again for the feedback. So, I wanted to try it with the Power Query method you mention. Doing some studying, I found that the "Group By" functionality held some promise. After working at it for a bit, I came up with the below request and the resultant table which looks correct. Although, I will admit the "min" request on the boolean column of is_valid seems wrong.

 

2019-12-23 14_24_28-OneNote.png

 

Any thoughts on this? I next attempted to use this reduced data in my model and found that it works. I applied the same filter on all pages shown earlier and had no trouble. However, I did encounter one oddity. I simply wanted to display the data in this new table and I now get an error when trying to include the md5 in the table. It looks like this:

 

2019-12-23 14_30_11-- Power BI Desktop.png

 

Does that make sense?

 

Edit: No it does not as "Min" only keeps the lowest of true/false... not the one associated with the latest date... back to searching.

 

I followed this tutorial and was able to get my table looking correct... as the example shows:

https://radacad.com/grouping-in-power-query-getting-the-last-item-in-each-group

 

2019-12-23 20_04_48-- Power Query Editor.png

 

Still getting that OLE error...

But the table cannot be used in Power BI.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors