Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Favna
Regular Visitor

Need help with advanced DAX measure to filter out duplicates

Hello PowerBI Community,

For my business I have some latest incidents that I push to PowerBI from a view in an Azure SQL Database. This view properly filters out changes to any of its entries (i.e. status going from 'OPEN' to 'CLOSED') however when this change happens PowerBI sees it as an entirely new entry and rather than updating the previous one it fills a duplicate into the table. An example of this behaviour can be seen in this image wherein the 2nd from the top is an updated duplicate of the first one as referenced by their IDs:

IncorrectTable

 

As I see no sensible way that PowerBI can help me prevent this just page or visual level filters (TOPN doesnt do the trick as the duplicate of the 1st would nigh always have a more recent date than say the 10th entry) I decided to look into Meassures to generate a custom table that way but for the life of me I cannot at all get it figured out. In SQL Server I have a fully working query to get the proper result, but I cannot replicate it into the PowerBI DAX Meassures. I am completely clueless on what functions I can still try and how from the plethora I've already tried.

 

This is the proper SQL query. Note that the table names in PowerBI differ from those in the SQL Database, but that is in no way affecting it as I am rewriting to the proper layout in code. Also note that I talk about a lastUpdated here and in the next code block, this was added since taking the screenshots above. In SQL this column represents a DateTime2 of when the row in question was last updated (the first time it is inserted this is equal to the callDate, after that the DateTime at which the update is done)

 

 

SELECT TOP 5 a.incidentID, a.status, a.briefDescription, a.callDate, a.lastUpdate
FROM [incident] a JOIN
(
SELECT incidentID, MAX([lastUpdate]) AS maxUpdate FROM [incident] GROUP BY incidentID
) b ON a.incidentID = b.incidentID AND a.lastUpdate = b.maxUpdate
ORDER BY b.maxUpdate DESC

 

And this is my PowerBI Dataset:

 

 

{
	"name": "TOPdesk PoC",
	"defaultMode": "PushStreaming",
	"tables":
	[
		{"name": "incidents",
		"columns": [
			{"name": "ID","dataType": "string"},
			{"name": "Status","dataType": "string"},
			{"name": "Description","dataType": "string"},
			{"name": "Date","dataType": "dateTime"},
{"name": "lastUpdated","dataType": "dateTime"}, {"name": "firstLines","dataType": "int64"}, {"name": "secondLines","dataType": "int64"} ]} ] }

 

Finally this is the result I would like to get (photoshopped previous image). As you can see the first one with "Test Attachment Issue" is gone here, as it was the older one compared to the previous one

 

 CorrectTable

 

Final notes:

- Yes I am aware that the dates show the exact same for the two conflicting incidents in my example images. Suffice it to say they are no longer as I am not storing a "lastUpdated" date in the database which is now used.

 

p.s. If only PowerBI REST API had an "UPDATE WHERE" endpoint big sighs

1 ACCEPTED SOLUTION

HI @Favna

 

How does the following calculated measure go?

 

Measure as Fiter = 
VAR RowCount =
    CALCULATE(
        COUNTROWS('incidents') ,
        FILTER(
            ALL('incidents'),
             'incidents'[ID] = MAX('incidents'[ID]) 
             && 'incidents'[lastUpdate] > MAX('incidents'[lastUpdate])
        )
        )
RETURN IF(RowCount=0,1)    

You can use this measure as a filter on your visual and make sure it is set to 1


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

View solution in original post

7 REPLIES 7
Greg_Deckler
Super User
Super User

Can you paste your sample data as text instead of an image?

 

Seems like you could use a GROUPBY or SUMMARIZE to generate the necessary table of ID's and then use ADDCOLUMNS with something like a MAX on the date filter to get the correct row information for each ID. But, I'm not manually typing in your data. Please see this post regarding How to Get Your Question Answered Quickly: https://community.powerbi.com/t5/Community-Blog/How-to-Get-Your-Question-Answered-Quickly/ba-p/38490


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Yes ofc, I have dumped the entire table export from SQL Server on this hastebin (click here) (opens in new tab). Also here is massively trimmed version with only the columns related to this question:

 

CREATE TABLE incident(
	[incidentID] [nvarchar](36) PRIMARY KEY NOT NULL,
	[status] [nvarchar](35) NOT NULL,
	[briefDescription] [nvarchar](30) NOT NULL,
	[callDate] [datetime2](7) NOT NULL,
	[lastUpdate] [datetime2](7) NOT NULL
)'

INSERT incident VALUES ('19f39842-05b6-4541-9b08-549a9d7d5a61', 'firstLine', 'Issue1', '2018-04-18T12:14:30.0000000', '2018-04-17T15:00:00.0000000');
INSERT incident VALUES ('1a835ba8-d9e1-44fb-b9b8-845df7b8de7b', 'firstLine', 'Issue2', '2018-04-18T13:14:30.0000000', '2018-04-17T13:30:00.0000000');
INSERT incident VALUES ('9ec9704f-01d6-4338-b2a8-ba4270936960', 'firstLine', 'Issue3', '2018-04-18T12:20:30.0000000', '2018-04-17T12:30:00.0000000');
INSERT incident VALUES ('cbd50984-8124-4b5e-bdb3-c12c21c7070b', 'secondLine', 'asdfdghfghfgh', '2018-04-17T08:13:00.0000000', '2018-04-17T15:23:47.0000000');
INSERT incident VALUES ('ceb84e8e-c38d-4bc7-bf18-926a919d76d9', 'firstLine', 'Actual Descriptio', '2018-04-17T08:04:00.0000000', '2018-04-17T15:23:47.0000000');
INSERT incident VALUES ('f3f6c516-ff12-4bf8-abaa-f8b83958dfe3', 'firstLine', 'Issue6', '2018-04-18T10:10:30.0000000', '2018-04-18T18:24:00.0000000');

From the trimmed data the SQL query from my first post as well as the DAX I am aiming for would only return 5 of these 6 rows, namely the row with description "Issue3" should be left out while the others are shown.

Hi @Favna

 

@Greg_Deckler is spot on and I had a quick crack at the syntax.  I think the data from your reply is different from your original post but please try this calculated table.

 

Table 4 = 
VAR b = SELECTCOLUMNS(
            SUMMARIZECOLUMNS(incident[incidentID] , "maxUpdate" , MAX('incident'[lastUpdate])),
            "B_IncidentID",[incidentID],
            "B_maxUpdate",[maxUpdate])
            
            
VAR a = GENERATE(incident,filter(b,[b_maxUpdate] = 'incident'[lastUpdate] && [B_IncidentID] = 'incident'[incidentID]))
RETURN 
    SELECTCOLUMNS(a,
        "IncidentID",[incidentID],
        "status",[status],
        "briefDescription",[briefDescription],
        "lastUpdate",[lastUpdate]
    )

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!


@Phil_Seamark wrote:

Hi @Favna

 

@Greg_Deckler is spot on and I had a quick crack at the syntax.  I think the data from your reply is different from your original post but please try this calculated table.

 

Table 4 = 
VAR b = SELECTCOLUMNS(
            SUMMARIZECOLUMNS(incident[incidentID] , "maxUpdate" , MAX('incident'[lastUpdate])),
            "B_IncidentID",[incidentID],
            "B_maxUpdate",[maxUpdate])
            
            
VAR a = GENERATE(incident,filter(b,[b_maxUpdate] = 'incident'[lastUpdate] && [B_IncidentID] = 'incident'[incidentID]))
RETURN 
    SELECTCOLUMNS(a,
        "IncidentID",[incidentID],
        "status",[status],
        "briefDescription",[briefDescription],
        "lastUpdate",[lastUpdate]
    )

 

 


 

You are correct in that the data is slightly different, that is because the OP data was referenced from the PowerBI side whereas the replied is referenced from the SQL Server side. My bad! Here is a map of how they are rewritten:

 

SQL Server        |    PowerBI
----------------------------------
incidentID        |   ID
status            |   Status
briefDescription  |   Description
callDate          |   Date
lastUpdate        |   lastUpdated

 


As for the DAX query provided by  @Phil_Seamark, sadly it does not work. For one I should note that this PowerBI dataset is a PushStreaming dataset and seemingly therefore I cannot use Calculated Columns, only Meassures. More importantly however, plugging that DAX query in PowerBI Desktop (loaded with the data from PowerBI Service, data as seen in the replied message) it gives me this error:

 

Error Message:
Query (2, 3) Calculation error in measure 'incidents'[Table 4]: The expression refers to multiple columns. Multiple columns cannot be converted to a scalar value.

If you want to see the full exported error please see this hastebin (click here) (opens in new tab)

 

Edit:

 

I have also slightly tried to reformat the DAX query as well as made it reference the proper columns, this is what I have now:

 

Table 4 = 
VAR b = SELECTCOLUMNS(
SUMMARIZECOLUMNS(incidents[ID]; "maxUpdate";MAX(incidents[lastUpdated])
);
"B_incidentID";incidents[ID];
"B_maxUpdate";[maxUpdate])

var a = GENERATE(incidents;
FILTER(b;[B_maxUpdate] = incidents[lastUpdated] && [B_incidentID] = incidents[ID]))

RETURN SELECTCOLUMNS(
a;
"ID";incidents[ID];
"Status";incidents[Status];
"Description";incidents[Description];
"Last Update";incidents[lastUpdated])

 


 

 Image showing I cannot select Calculated ColumnImage showing I cannot select Calculated Column

Favna
Regular Visitor


You are correct in that the data is slightly different, that is because the OP data was referenced from the PowerBI side whereas the replied is referenced from the SQL Server side. My bad! Here is a map of how they are rewritten:

 

SQL Server        |    PowerBI
----------------------------------
incidentID        |   ID
status            |   Status
briefDescription  |   Description
callDate          |   Date
lastUpdate        |   lastUpdate

 

 

 


 

Adding to this change of how the data is shown, here is the same data from my replied comment in the PowerBI format. I have added one of such duplicate rows where they are both shown even though only the one with the most recent "lastUpdated" date stamp should be shown. To reiterate my goal using this example, I'd want to only ever show the most recent 5 rows based on their lastUpdated value and without duplicate IDs. Thus in this example's case that would mean rows 2 through 6.

 

JTKZwoaL

HI @Favna

 

How does the following calculated measure go?

 

Measure as Fiter = 
VAR RowCount =
    CALCULATE(
        COUNTROWS('incidents') ,
        FILTER(
            ALL('incidents'),
             'incidents'[ID] = MAX('incidents'[ID]) 
             && 'incidents'[lastUpdate] > MAX('incidents'[lastUpdate])
        )
        )
RETURN IF(RowCount=0,1)    

You can use this measure as a filter on your visual and make sure it is set to 1


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!


@Phil_Seamark wrote:

HI @Favna

 

How does the following calculated measure go?

 

Measure as Fiter = 
VAR RowCount =
    CALCULATE(
        COUNTROWS('incidents') ,
        FILTER(
            ALL('incidents'),
             'incidents'[ID] = MAX('incidents'[ID]) 
             && 'incidents'[lastUpdate] > MAX('incidents'[lastUpdate])
        )
        )
RETURN IF(RowCount=0,1)    

You can use this measure as a filter on your visual and make sure it is set to 1


 

That's perfect! It works! Thank you so much! I've marked it as the solution Cat Happy !

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.