Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
CTan42
Helper II
Helper II

Remove Duplicates?

Below are my power bi sample. I am asked to create a dashboard to present the stat of left orders. For examples, i do a calculation: orders= calculate(count(ID)). The result shows there are 5 orders. It is wrong. It should be 2 orders (just apple and banana). Same goes to calculate the remaining payment. It should be 6dollars instead of sum of the remaining payment column. It is not just about remove "duplicate". Is there anyways to do that?tabletableordersordersremaining paymentremaining payment

15 REPLIES 15
TomMartens
Super User
Super User

Hey,

 

I couldn't resist and created this measure

Remaining Payment latestOrder = 
SUMX(
VALUES('OrdersPayment'[ID])
,
var latestOrderDate =
 CALCULATE(MAX('OrdersPayment'[Date]))
return
CALCULATE(
 SUMX('OrdersPayment',
  IF('OrdersPayment'[Date] = latestOrderDate
   ,'OrdersPayment'[RemainingPayment]
   ,BLANK()
  )
 )
)
)

This is my sample data
Latest Order Remaining Payment - Sample Data.png

 

And this is the result

Latest Order Remaining Payment - Result.png

 

Hope this is what you are looking for

 

Regards



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

@TomMartens 

Data Updates:

In my dataset, there are 2 columns of Date - Order Date & Goods Received Date. So everything should calculated based on Goods Received Date. There is more than 1 row with exactly same date. And another question is some rows are blank as the goods are not received yet. 

Thanks

@TomMartens,

 

Once again thanks for the help. I'm really appreciate it. I had tried it in my dataset. It seems work, however my dataset is huge (1,000,000++rows). I have to find a way to make sure the measure is correct. Do u mind to explain your measure? I dont really understand it. Thanks.

 

Regards,

Chung.

Hey,

 

sure.

 

At the heart of this meausre there are two nested SUMX('table', expression)

SUMX('table'
  ,SUMX('table',IF(...)
)

SUMX() is an table iterator function this means, that the expression is evaluated for each row of the table. The first table calculated by using the VALUES() function, VALUES always returns a one column sized table, this table is calculated by the column with all the IDs (the orders). Two things to know about this function is the fact , that it returns the unique values of the used column and the values returns the values in the current FILTERCONTEXT.

Implicity the value is implicitly used as filter for the the calculation of the expression.

 

The expression for the 1st SUMX is also a SUMX. Before the second expression is evaluated the MAX date is stored in the variable lastorderdate. This happens for each pass of the first iteration.

The 2nd table (1st parameter of the 2nd SUMX) is the table and is implicitly filtered by each ID. The 2nd expression checks if the date equals the value of the variable, if this is the case, the value "RemainingPayment" is used, otherwise the IF statement returns BLANK.

It is necessary to use SUMX for the calculation of the TOTAL row, where no ID is determining the FILTERCONTEXT.

 

I hope this explanation makes things a little clearer.

 

The function does not consider if there are more than 1 row with exactly the same lastorderdate (date and time), if this would be the case a rule has to be defined that describes which row has to be considered and the measure has to be adjusted accordingly.

 

Regards



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

@TomMartens

thanks for your explanation. It's great that you mentioned the condition at the last. As my data have more than 1 row with exactly same date. Hence do you have any idea to do that? 

 

Regards,

Chung. 

Hey,

 

without knowing yor business rules it's just guesswork, but the constraint "more than one row" has to be considered no matter what date is used.

 

Blank "Remaining Payment" rows are due to the "Goods in transit" concept, from my point of view these goods to not affect the remainig payment, but maybe you have another kpi that exactly describes the difference between "goods send" and "goods received".

 

Regards



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

@TomMartens

I would like to explain what am i doing at the moment (the business rules).  The data is from a normal food shops. Basically the idea is to find out how many (orders) and how much (values) of goods are yet to be received. As the goods are yet to be received, the payment won't be made until the goods are received. For the orders part, it is easy to calculate. I can just use 'distinctcount' function for the ID as every goods has a unique ID. For the payment part, it was the hardest part. I need to calculate how much payment for those goods (yet to be received). Wish you could understand what am i doing atm. 🙂 really appreaciate ur help.

Regards.

Hey, thanks for the explanation.

 

Can you provide sample datarow that depicts goods in transit, not yet received by a store and not yet payed. Does your table also distinguishes between orderdate, shipdate and receiveddate, also for the figures quantity and payment.

 

Regards

 



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

@TomMartens

My table contain orderdate, delivery date and received date and also for the figures quantity and payment. I will need some time to provide sample datarow. Thanks

 

regards.

Hey,

 

and as I was staring on the measure I would rewrite the measure for performance reason like so

Remaining Payment latestOrder optimized = 
SUMX(
VALUES('OrdersPayment'[ID])
,
var latestOrderDate = 
	CALCULATE(MAX('OrdersPayment'[Date]))
var minRemainigPayment =
	CALCULATE(MIN('OrdersPayment'[RemainingPayment]))
return

CALCULATE(
	SUM('OrdersPayment'[RemainingPayment])
	,'OrdersPayment'[Date] = latestOrderDate 
	,'OrdersPayment'[RemainingPayment] = minRemainigPayment
			
		)
	)

Regards



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

Hey,

 

I guess the row to use would be, the row with the MIN('table'[RemainingPayment].

 

So, without trying I would define a 2nd variable

minRemainingPayment = 
CALCULATE(
  MIN('table'[RemainingPayment])
)

and adjust the IF statement like so

IF('OrdersPayment'[Date] = latestOrderDate && 'OrdersPayment'[RemainingPayment] = minRemainigPayment

Hope this helps

 

Regards



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany
TomMartens
Super User
Super User

Hey,

 

this is a difficult question, without knowing why there are duplicate records, it's not possible to recommend a way to remove these records.

 

The ID 1 has 3 records with different dates, for this reason, without more information it's not possible to advise you how to get rid of the duplicate rows (at least I'm not able to do this).

 

Do you have any idea why there are duplicates?

 

Can you share sample data, by providing a link to the pbix on onedrive or dropbox?

 

Regards



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

@TomMartens

thanks for reply. The idea behind the data is to record the purchase order details of the fruit. For example, Apple is ID 1, Banana is ID 2, Orange is ID 3. Today the shop ordered 10 apples in 1st of January 2017. However, the supplier only delivered 5 apples in 1st January 2017. So the shop just paid 5dollar to supplier which mean there is another 5dollar is yet to be paid. And then the next day 2nd of January 2017, the supplier delivered another 3apples. Hence the tables will record all the delivery. So what i was asked to do is find out how many and how much of orders and payment are yet to be collected and paid. So technically i should take the latest date of the deliver item. Hope that you would understand what i'm trying to explain here. 

This makes sense, now I understand.

 

To count the Orders just use distinctcount('yourtable'[ID]). To answer your other questions it would be really helpful if you would prepare sample data, and make this data available. This makes it a lot easier for all of us to provide answers to your questions.

 

Cheers



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

@TomMartens

u can create same tables as screenshot in your power bi. Due to P&C, i cant show u the real dataset. 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors