I am looking to calculate the unique customer count from the groupby function.
Python code would be :
df = temp_df.groupby(['product','od_pairs']).agg(total_bookings=('booking_no_id', 'count'),
total_teus=('total_teus','sum'),
total_bl=('bl_amount', 'sum'),
total_cost=('cost', 'sum'),
total_customer=('control_party_name', 'nunique'),
)
How do i incorporate distinct count in the groupby function?
Solved! Go to Solution.
Hi @jeanL
I don't believe there's any way to calculate a distinct count using GROUPBY.
Depending where you need to use this table expression, I would recommend rewriting with SUMMARIZECOLUMNS (if you are materializing this table or running as a query) otherwise with SUMMARIZE:
(hopefully I interpreted your intended result correctly):
SUMMARIZECOLUMNS version
summary_table =
SUMMARIZECOLUMNS (
export_data[customer_segment],
export_data[product],
export_data[main_service],
export_data[vessel_direction],
export_data[od_pairs],
"booking_count", COUNT ( export_data[booking_no_id] ),
"total_teus", SUM ( export_data[total_teus] ),
"total_bl", SUM ( export_data[bl_amount] ),
"total_cost", SUM ( export_data[cost] ),
"total_cust", DISTINCTCOUNT ( export_data[control_party_name] )
)
ADDCOLUMNS/SUMMARIZE version
summary_table =
ADDCOLUMNS (
SUMMARIZE (
export_data,
export_data[customer_segment],
export_data[product],
export_data[main_service],
export_data[vessel_direction],
export_data[od_pairs]
),
"booking_count", CALCULATE ( COUNT ( export_data[booking_no_id] ) ),
"total_teus", CALCULATE ( SUM ( export_data[total_teus] ) ),
"total_bl", CALCULATE ( SUM ( export_data[bl_amount] ) ),
"total_cost", CALCULATE ( SUM ( export_data[cost] ) ),
"total_cust", CALCULATE ( DISTINCTCOUNT ( export_data[control_party_name] ) )
)
Regards,
Owen
Hi @jeanL
I don't believe there's any way to calculate a distinct count using GROUPBY.
Depending where you need to use this table expression, I would recommend rewriting with SUMMARIZECOLUMNS (if you are materializing this table or running as a query) otherwise with SUMMARIZE:
(hopefully I interpreted your intended result correctly):
SUMMARIZECOLUMNS version
summary_table =
SUMMARIZECOLUMNS (
export_data[customer_segment],
export_data[product],
export_data[main_service],
export_data[vessel_direction],
export_data[od_pairs],
"booking_count", COUNT ( export_data[booking_no_id] ),
"total_teus", SUM ( export_data[total_teus] ),
"total_bl", SUM ( export_data[bl_amount] ),
"total_cost", SUM ( export_data[cost] ),
"total_cust", DISTINCTCOUNT ( export_data[control_party_name] )
)
ADDCOLUMNS/SUMMARIZE version
summary_table =
ADDCOLUMNS (
SUMMARIZE (
export_data,
export_data[customer_segment],
export_data[product],
export_data[main_service],
export_data[vessel_direction],
export_data[od_pairs]
),
"booking_count", CALCULATE ( COUNT ( export_data[booking_no_id] ) ),
"total_teus", CALCULATE ( SUM ( export_data[total_teus] ) ),
"total_bl", CALCULATE ( SUM ( export_data[bl_amount] ) ),
"total_cost", CALCULATE ( SUM ( export_data[cost] ) ),
"total_cust", CALCULATE ( DISTINCTCOUNT ( export_data[control_party_name] ) )
)
Regards,
Owen
Come together to explore latest innovations in code and application development—and gain insights from experts from around the world.
Put your data visualization and design skills to the test! This exciting challenge is happening now through May 31st!
User | Count |
---|---|
364 | |
96 | |
66 | |
54 | |
38 |
User | Count |
---|---|
358 | |
112 | |
76 | |
61 | |
50 |