Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi All,
I'm working on a test project for school but I'm lost... I was given a sample report and I have to create a matrix table like below:
This is the question:
We'd like to compare how durable the Director/Actor relationships are by continent, represented by Netflix releases. We'd like to measure this by average number of movies done together, average length of time between their first and last movie together, and average number of movies done together per year of director's longevity. Please put together a matrix visualization with the results of this analysis and please also try to complete this challenge without modifying the data model. That means no calculated columns, calculated tables, or query modifications
I've attached the file: https://1drv.ms/u/s!AnJmyZTL2iv4gZEhp2wzhSZzgZyrDw?e=YpEK8c
Any help at all will be highly appreciated. Thank you in advance!
Solved! Go to Solution.
@am_i_really Try this as I believe this provides the results shown:
AvgMoviesTogether 2 =
VAR __Table = SUMMARIZE(FILTER('Netflix',[director]<>BLANK() && [cast] <> BLANK() && [director]<>[cast]),[director],[cast],"__MoviesTogether", COUNTROWS(DISTINCT('Netflix'[show_id])))
VAR __Result = AVERAGEX(__Table,[__MoviesTogether])
RETURN
__Result
Now, as to which is "correct", that's a bit of a different matter. The dataset seems kind of jank. For example, HARRY BRADBEER as director did only one show, s2010 but for some reason the Netflix table lists it basically three times, once for Dramas, Children & Family Movies and Action & Adventure. Seems like kind of a hacky way to put the same thing in three categories I guess. So, the GROUPBY method counts this as 3 because it does not distinguish DISTINCT show_id while the second method using SUMMARIZE counts this "correctly" as 1 since it accounts for DISTINCT show_id.
@am_i_really Here is the first one. The others should be a fairly similar pattern. Not sure if the screen shot is supposed to be the answers because I get different numbers. In the formula below I provide 2 options. __Table filters out the rows where the director is blank (null) which I feel would be the right way to do it. __Table1 does not include this filter. Currently, this formula returns results for __Table and not __Table1 but you can easily switch it.
AvgMoviesTogether =
VAR __Table = GROUPBY(FILTER('Netflix',[director]<>BLANK()),[director],[cast],"__MoviesTogether", COUNTX(CURRENTGROUP(), [show_id]))
VAR __Table1 = GROUPBY('Netflix',[cast],"__MoviesTogether", COUNTX(CURRENTGROUP(), [show_id]))
VAR __Result = AVERAGEX(__Table,[__MoviesTogether])
RETURN
__Result
If your professor gives you low marks for not using CALCULATE, refer them to this:
Thank you for your answer but its not giving the result that he specified. The screenshot that was attached is what he said the results have to be. Any idea how we may be able to get that?
Did you find any solution for the other 2 columns?
@am_i_really Try this as I believe this provides the results shown:
AvgMoviesTogether 2 =
VAR __Table = SUMMARIZE(FILTER('Netflix',[director]<>BLANK() && [cast] <> BLANK() && [director]<>[cast]),[director],[cast],"__MoviesTogether", COUNTROWS(DISTINCT('Netflix'[show_id])))
VAR __Result = AVERAGEX(__Table,[__MoviesTogether])
RETURN
__Result
Now, as to which is "correct", that's a bit of a different matter. The dataset seems kind of jank. For example, HARRY BRADBEER as director did only one show, s2010 but for some reason the Netflix table lists it basically three times, once for Dramas, Children & Family Movies and Action & Adventure. Seems like kind of a hacky way to put the same thing in three categories I guess. So, the GROUPBY method counts this as 3 because it does not distinguish DISTINCT show_id while the second method using SUMMARIZE counts this "correctly" as 1 since it accounts for DISTINCT show_id.
Thank you so much for your help and the explanation, the solution worked! If you don't mind can you please help with the other 2 columns as well?
@am_i_really I'm not sure how the other two columns are being calculated in terms of how those numbers were arrived at. For example, if I use what I think is logical I get different results. PBIX is attached beneath signature.
AvgRelLength =
VAR __Table =
SUMMARIZE(
FILTER('Netflix',[director]<>BLANK() && [cast] <> BLANK() && [director]<>[cast]),
[director],[cast],[show_id],
"__ReleaseYear", MAXX(FILTER('Content',[show_id] = EARLIER([show_id])),[release_year])
)
VAR __Table1 =
ADDCOLUMNS(
SUMMARIZE(
__Table,
[director],[cast],
"__MinYear",MINX(FILTER(__Table,[director]=EARLIER([director]) && [cast] = EARLIER([cast])), [__ReleaseYear]),
"__MaxYear",MAXX(FILTER(__Table,[director]=EARLIER([director]) && [cast] = EARLIER([cast])), [__ReleaseYear])
),
"__Diff", [__MaxYear] - [__MinYear]
)
VAR __Result = AVERAGEX(FILTER(__Table1,[__Diff]>0),[__Diff])
RETURN
__Result
and
AvgMoviesLong =
VAR __Table =
SUMMARIZE(
FILTER('Netflix',[director]<>BLANK() && [cast] <> BLANK() && [director]<>[cast]),
[director],[cast],[show_id],
"__ReleaseYear", MAXX(FILTER('Content',[show_id] = EARLIER([show_id])),[release_year])
)
VAR __Table1 =
ADDCOLUMNS(
ADDCOLUMNS(
SUMMARIZE(
__Table,
[director],[cast],
"__MinYear",MINX(FILTER(__Table,[director]=EARLIER([director])), [__ReleaseYear]),
"__MaxYear",MAXX(FILTER(__Table,[director]=EARLIER([director])), [__ReleaseYear]),
"__Count",COUNTROWS(FILTER(__Table,[director] = EARLIER([director]) && [cast] = EARLIER([cast])))
),
"__Diff", [__MaxYear] - [__MinYear]
),
"__NumPerYear", DIVIDE([__Count],[__Diff],0)
)
VAR __Result = AVERAGEX(FILTER(__Table1,[__Diff]>0),[__NumPerYear])
RETURN
__Result
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
49 | |
25 | |
20 | |
15 | |
12 |
User | Count |
---|---|
57 | |
49 | |
44 | |
19 | |
18 |