Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
IanWaring
Helper IV
Helper IV

Removing duplicate rows, data from different (joined) data sources

I have a report that's a combination SalesOrderHeaders and SalesOrderLines:

 

Customer, Service Purchased, Invoice Date, Amount Due

 

Boston College, Subscription for HE, 4/5/2020, $400

York City College, Subscription for FE Colleges, 4/5/2020, $200

York City College, Subscription for FE Colleges, 4/10/2020, $250

 

I have those sitting in my main Report view as a result of all sorts of joins and filtering. I would like to only show the latest date purchase for each Customer. The gotcha is that most of the ways I can see of removing duplicates like this is to go back to sort source data tables; however, Customer and Invoice Date exist in the SalesorderHeader Table, whereas the Product and the sale value sit in the separate SalesOrderLines table. I'm only looking for the latest sales of two specific products, by customer.

 

How do I remove every record bar the latest transaction for each customer in my report view? (ie: so that first York College record would be removed from my view)?

 

ps: I'm sufficient a noob that even if I try to create a separate table, do a max date and join it back, I don't know what my report view is called. Hence struggling a bit.

1 ACCEPTED SOLUTION

...or I have to build a named TABLE out of at least three database joins, filter it down and then jump over to the report view to finish the summary. Parking it for tonight - shouldn't be doing work today anyway 🙂

View solution in original post

14 REPLIES 14
IanWaring
Helper IV
Helper IV

All fixed. I ended up doing a mass clean of data ahead of putting it into report form, including sorting and de-duping the rows. But then going into the advanced editor to wrap a text.buffer function around the sort descending, just ahead of the de-duplication. With that, rebuilt the report and it's now live with all our users.

 

Thank you! I'm now coming up the learning curve at a pretty decent pace 🙂

IanWaring
Helper IV
Helper IV

Found a video that explained everything succinctly, suggesting:

 

Last Date =
VAR CurrentName = CustomersV3[OrganizationName]

RETURN
MAXX(
FILTER( ALL( CustomersV3 ), CustomersV3[OrganizationName] = CurrentName),
FreeTextInvoiceHeaders[InvoiceDate])

However, i'm back to having it fail with:
A single value for column 'InvoiceDate' in table 'FreeTextInvoiceHeaders' cannot be determined. This can happen when a measure formula refers to a column that contains many values without specifying an aggregation such as min, max, count, or sum to get a single result.

Examples seem to consider that you have a single table in front of you of source data, not a view of filtered database joins - unless PowerBI has a table name that reflects what you can see in front of you as a report.

Frustration continues. In the meantime, the YouTube video that looks so close is: Calculate The Last Or Max Value Within A Table Of Data Using DAX In Power BI - YouTube

...or I have to build a named TABLE out of at least three database joins, filter it down and then jump over to the report view to finish the summary. Parking it for tonight - shouldn't be doing work today anyway 🙂

IanWaring
Helper IV
Helper IV

I've put a data sample ino three sheets in an Excel workbook. Task is to join FreeTextInvoiceHeaders to FreeTextOrderLines using the InvoiceIdentifer as the join, then to CustAgedBalances using Account Number as the join. Find all InvoiceLines with "HE Subs" or "FE Subs" in the InvoiceText, List all the Invoices and their values, then disregard all lines where the Amount Outstanding is less that then invoice value (ie: looks like they've paid).

You'll end up with a handful of customers with two different invoices list; I want to be left with just the latest date one for each customer.

Now, where do I place my Excel workbook?

IanWaring
Helper IV
Helper IV

FWIW, creating a measure:

 

LatestDate =
IF(
COUNTROWS(FILTER(FreeTextInvoiceHeaders, FreeTextInvoiceHeaders[InvoiceDate] >=EARLIER( FreeTextInvoiceHeaders[InvoiceDate]) && FreeTextInvoiceHeaders[CustomerAccount] = EARLIER(FreeTextInvoiceHeaders[CustomerAccount]))) = 1, 1, 0)
 
results in:
 
EARLIER/EARLIEST refers to an earlier row context which doesn't exist.
 
Will do some more digging...
IanWaring
Helper IV
Helper IV

@AllisonKennedy Greetings to NZ from Tier 3 in the UK (can we have your PM please?).

 

Another few hours, got one of the earlier attempst working but it was inserting 0's and 1's randomly - I guess as there are other orders in my data source that my view had filtered out. So have gone back and tried to put the filter on the component FreeTextInvoiceHeaders and FreeTextinvoiceLines, but hit the wall again, can't get any measure nor column using the suggestions to get added.

I've also tried to use the text.buffer and text.sort combinations on the data sources without success.

Even got stuck trying to do a MID of a string to extract two characters following a certain word, so probably need to walk away from the screen and let my head unjumble.

I'm going do to some Power BI training over xmas (I have used a certain other product from the Seattle area for 8 years, so just going through a period of rough adjustment to this products nuances). Just need to sort the view by customer and invoice date descending, then trim off all but the 1st row displayed for each customer. Currently proving a bit of a struggle 🙂

Hi,

Could you share some data and also show the expected result.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

@IanWaring Greetings from sunny NZ. Well actually it's quite a gloomy windy day today, but I can't complain. I'll see if I can pull some strings on the PM front. 

 

In terms of Power BI, there are many places you can do the filter, not sure what orders you think might be messing up the 0, 1? The 0 will be applied to all other conditions, so I can see that being seemingly random, but can you give an example of where it returned 1 but shouldn't have? That could help us figure out what's going on in the underlying dataset. 


Please @mention me in your reply if you want a response.

Copying DAX from this post? Click here for a hack to quickly replace it with your own table names

Has this post solved your problem? Please Accept as Solution so that others can find it quickly and to let the community know your problem has been solved.
If you found this post helpful, please give Kudos C

I work as a Microsoft trainer and consultant, specialising in Power BI and Power Query.
www.excelwithallison.com

IanWaring
Helper IV
Helper IV

Thankyou Allison (yet again) 🙂

 

trying:

 

LatestDate =
IF(
COUNTROWS(FILTER('CustAgedBalances', 'FreeTextInvoiceHeaders'[InvoiceDate] >=EARLIER('FreeTextInvoiceHeaders'[InvoiceDate]) && 'FreeTextInvoiceheaders'[CustomerAccount] = EARLIER('FreeTextInvoiceHeaders'[CustomerAccount]))) = 1, 1, 0)

 

But getting:

 

A single value for column 'InvoiceDate' in table 'FreeTextInvoiceHeaders' cannot be determined. This can happen when a measure formula refers to a column that contains many values without specifying an aggregation such as min, max, count, or sum to get a single result.

Will look at it once I have some meeting prep out of the way (normal pre-Xmas rush).

@IanWaring  Yup, that's because [InvoiceDate] is not in the same table that you're filtering. I can't remember (but feel like I know this from another post or even this thread), is CustAgedBalances the fact table/SalesOrderLines? 

 

If so, you should be able to use RELATED() function I think: 

 

LatestDate =
IF(
COUNTROWS(FILTER('CustAgedBalances', RELATED('FreeTextInvoiceHeaders'[InvoiceDate] ) >=EARLIER(RELATED('FreeTextInvoiceHeaders'[InvoiceDate])) && RELATED('FreeTextInvoiceheaders'[CustomerAccount]) = EARLIER(RELATED('FreeTextInvoiceHeaders'[CustomerAccount])))) = 1, 1, 0)
 
If that doesn't work, you can add explicit columns as a workaround, but that shouldn't be necessary. 
 
Please use @ to mention me in your replies, and I'll follow up on them tomorrow (it's very late here in NZ, so am going to sign off for the evening now). 
 
Merry Christmas!

Please @mention me in your reply if you want a response.

Copying DAX from this post? Click here for a hack to quickly replace it with your own table names

Has this post solved your problem? Please Accept as Solution so that others can find it quickly and to let the community know your problem has been solved.
If you found this post helpful, please give Kudos C

I work as a Microsoft trainer and consultant, specialising in Power BI and Power Query.
www.excelwithallison.com

IanWaring
Helper IV
Helper IV

Sorry for the serial newbie questions, but where can I find the name of the Table that's currently in my report view? I'm getting an error because I don't know what to substitute for "Table" in the examples. Or does this only work in the source data files? (the things I filter on have values across two joined data sources).

@IanWaring  the table name in the formula will likely be SalesOrderHeaders or SalesOrderLines, depending on where the columns are located. You can find the name of the tables in the fields pane - all the fields (aka columns) are grouped by Table name. 


Please @mention me in your reply if you want a response.

Copying DAX from this post? Click here for a hack to quickly replace it with your own table names

Has this post solved your problem? Please Accept as Solution so that others can find it quickly and to let the community know your problem has been solved.
If you found this post helpful, please give Kudos C

I work as a Microsoft trainer and consultant, specialising in Power BI and Power Query.
www.excelwithallison.com

amitchandak
Super User
Super User

@IanWaring , Try meaures like (with all measure you need on Min date

 

measure = calculate(max(Table[Invoice Date]), filter(Table,Table[Invoice Date] = calculate(Min(Table[Invoice Date]), allexpcept(Table,table[Invoice ID]))))

 

measure = calculate(max(Table[Amount Due]), filter(Table,Table[Invoice Date] = calculate(Min(Table[Invoice Date]), allexpcept(Table,table[Invoice ID]))))

AllisonKennedy
Super User
Super User

@IanWaring 

 

What are the raw tables before the joins merge, etc? 

 

Otherwise if the data you pasted above is all in one table, you could add a column that flags the latest date: 

 

LatestDate =
IF(
COUNTROWS(FILTER('Table', 'Table'[Invoice Date] >=EARLIER('Table'[Invoice Date]) && 'Table'[Customer] = EARLIER('Table'[Customer]))) = 1, 1, 0)

Please @mention me in your reply if you want a response.

Copying DAX from this post? Click here for a hack to quickly replace it with your own table names

Has this post solved your problem? Please Accept as Solution so that others can find it quickly and to let the community know your problem has been solved.
If you found this post helpful, please give Kudos C

I work as a Microsoft trainer and consultant, specialising in Power BI and Power Query.
www.excelwithallison.com

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.