charleshale
Responsive Resident
Histogram of number of occurrences of text-based and/or numerical measures

Does anyone have advice on easier ways to do what I have done and am trying to do below?  

I am analyzing the depth of coverage (measured primarily by the average number of unique local stories) across a large news site (patch.com), which has ~2,000 local websites, ~30M monthly unique readers, and several author roles (e.g., 5 = journalist, 7 = local business).


Here's a sample of the data set.  

TABLE: all_articles_ever

Date | CleansedTitleDateAuthor | Patch_Id | Author_Id | Author_Role_Id
12/21/2018 | $339kcinnaminsonhomeisanentertainersdream;12/21/2018;93127 | 817 | 93127 | 5
11/29/2018 | $319kcinnaminsonhomesitsonhalfanacre;11/29/2018;93127 | 817 | 93127 | 5
11/28/2018 | $220sneakersstolenfromunlockedvehicleinbrynmawrpolice;11/28/2018;22896833 | 628 | 22896833 | 5
11/5/2018 | $57mundevelopedestateupforsaleinmclean;11/5/2018;22924509 | 848 | 22924509 | 5
10/25/2018 | $1millionpowerballticketsoldinsouthjersey;10/25/2018;93127 | 817 | 93127 | 5
10/12/2018 | 10contemporarycahomestogetyoudreaming;10/12/2018;21854731 | 67 | 21854731 | 5
9/28/2018 | $4600ofpropertystolenfromofficeinbeachwoodblotter;9/28/2018;22896693 | 834 | 22896693 | 5
8/30/2018 | $1millionpowerballticketsoldinwiltonstillunclaimed;8/30/2018;103600 | 14 | 103600 | 5
8/27/2018 | $550kmeadowviewvillagehomeavailableincinnaminson;8/27/2018;93127 | 817 | 93127 | 5
8/23/2018 | $11mautorepaircollegeopensinbloomfield(photos);8/23/2018;22844250 | 473 | 22844250 | 5
8/20/2018 | $1360incounterfeitmoneyfoundincrashedcarinbeachwood;8/20/2018;22896693 | 834 | 22896693 | 5
8/15/2018 | $5millionhomefeaturesviewscountryclubliving;8/15/2018;21854731 | 67 | 21854731 | 5
8/14/2018 | $500kbloomfieldhomeonmorseavenuehasenormouspool;8/14/2018;22844250 | 473 | 22844250 | 5
8/3/2018 | $5sandwiches|burglarysuspectsmanhattanbeachtopstories;8/3/2018;22942913 | 54 | 22942913 | 5
7/30/2018 | $24mgladwynehomehaswinecellarfireplaces2garagesmore;7/30/2018;22896833 | 628 | 22896833 | 5
7/19/2018 | $199msaleonmcleansgoldcoastsetsrecord;7/19/2018;22924509 | 848 | 22924509 | 5
7/11/2018 | $1000rewardofferedinfatalsouthaustinshootingcase;7/11/2018;22872998 | 10936 | 22872998 | 5
7/5/2018 | 10+newlawsthatcanchangeyourlifeinctstartingnow;7/5/2018;22824890 | 12 | 22824890 | 5
6/28/2018 | &pizzatoopenatthemallatprincegeorges;6/28/2018;22891861 | 240 | 22891861 | 5
6/27/2018 | $14millionchetsecuritybreachuncovered;6/27/2018;22824890 | 12 | 22824890 | 5
6/21/2018 | 10burlingtoncountytownscouldbeeliminatedunderstateplan;6/21/2018;93127 | 817 | 93127 | 5
6/19/2018 | 100thdirtcheapstoretoopeninarlingtonthursday;6/19/2018;22969719 | 11273 | 22969719 | 5
6/18/2018 | $753klottoticketsoldinpomptonplains;6/18/2018;22896301 | 44 | 22896301 | 5
6/13/2018 | $20millionschoolsecuritygrantprogramproposedinburlco;6/13/2018;93127 | 817 | 93127 | 5
6/8/2018 | $41kgameshowwin|150ftfallmanhattanbeachtopstories;6/8/2018;22942913 | 54 | 22942913 | 5

Since we generally get what we inspect, not what we expect (!), I am inspecting how many of the ~2,000 sites get what number of unique journalist-produced articles per day, on average. I've largely gotten there, except for the visual display of the information, which I am finding to be a tricky problem.

 

Here's what I have done so far, and how I have done it: 

 

A. My first goal was to understand the average number of local stories per Patch. I did this in three steps, below, which could probably be done more elegantly.

 

1. I created a measure for distinct articles, to avoid counting dupes, by first building in Power Query (M) an amalgam of article Title, Date, and Author, and then writing a measure for the distinct count of non-blank values. Measure below.

#Loc.Article_Distinct = 
CALCULATE(
    DISTINCTCOUNT(all_articles_ever[CleansedTitleDateAuthor]),
    all_articles_ever[CleansedTitleDateAuthor] <> BLANK()
)
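
(Side note: the same dedupe key could presumably be built as a DAX calculated column instead of in Power Query. A minimal sketch, assuming hypothetical source columns [Title], [Date], and [Author_Id]:

CleansedTitleDateAuthor =
COMBINEVALUES(
    ";",
    LOWER(SUBSTITUTE(all_articles_ever[Title], " ", "")),  -- strip spaces to mimic the "cleansed" title
    all_articles_ever[Date],
    all_articles_ever[Author_Id]
)

COMBINEVALUES converts each argument to text and joins the parts with the delimiter.)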

 

2. Next, I made a measure to calculate the average number of articles over time:

__DailyLocAvg = 
VAR _Days = COUNTROWS(DimDate) //Note my date table is called DimDate.  It is a standard date table
RETURN
[#Loc.Article_Distinct] / _Days 

//Side note: the above measure is a straight average, meaning that if a site is measured over a year with 364 days of no articles and one day with 365 articles, its yearly average would still be one article a day. A straight average is okay for now, although an ideal measure would probably look at the percent of days with more than X articles a day.
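
A rough sketch of that "percent of days over X" idea, in case I ever need it (untested; assumes DimDate has a [Date] column, and X = 1 is a hypothetical threshold):

__PctDaysOverX =
VAR _X = 1  -- hypothetical threshold: more than one article per day
VAR _DaysOver =
    COUNTROWS(
        FILTER(
            VALUES(DimDate[Date]),
            [#Loc.Article_Distinct] > _X  -- context transition evaluates the count per day
        )
    )
RETURN
    DIVIDE(_DaysOver, COUNTROWS(DimDate))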


3. I realized that the average-articles measure above needed an explicit filter, because I want to track by site. Accordingly, I made a new measure that does that, below.

__DailyLocAvgPerPatchID = 
AVERAGEX(
	KEEPFILTERS(VALUES(all_articles_ever[Patch_Id])),
	CALCULATE([__DailyLocAvg])
)

//Side note #2: I struggled to combine steps 2 and 3, above, into a single measure and ultimately moved on -- anyone have any bright ideas for more efficient measures here?
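
For what it's worth, here is how I imagine steps 2 and 3 might fold into a single measure -- a sketch I haven't verified, using the same KEEPFILTERS pattern:

__DailyLocAvgPerPatchID_v2 =
AVERAGEX(
    KEEPFILTERS(VALUES(all_articles_ever[Patch_Id])),
    DIVIDE([#Loc.Article_Distinct], COUNTROWS(DimDate))
)

Because [#Loc.Article_Distinct] is a measure, it is evaluated with a context transition per Patch_Id, which is what the explicit CALCULATE did in the two-measure version.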

 

Anyway, so far so good. My steps may be clumsy, but I am getting the results I should be getting: average stories per Patch per year:

[Capture.PNG]

 

HOWEVER, here's the issue I have been struggling with despite thorough message-board crawling: what is the best way to take the measure [__DailyLocAvgPerPatchID] and get a histogram of the results, to see what % of Patches get what number of articles? I have tried the following:

 

1. SUMMARIZECOLUMNS, to produce a table of the above output, which I could then graph...

Summary = SUMMARIZECOLUMNS(
    all_articles_ever[Patch_Id], 
    all_articles_ever[Date],
    "Distinct Count", DISTINCTCOUNT(all_articles_ever[CleansedTitleDateAuthor])
)

...but I get the error "The expression refers to multiple columns. Multiple columns cannot be converted to a scalar value."
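
One thing I'm wondering: a measure must return a single scalar value, while SUMMARIZECOLUMNS returns a table, so perhaps this expression needs to be entered as a calculated table (Modeling > New table) rather than as a measure. A sketch of what I mean:

Summary =
SUMMARIZECOLUMNS(
    all_articles_ever[Patch_Id],
    "Avg Daily Articles", [__DailyLocAvg]
)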

 

2.  I have tried creating a bucket in DAX, which works... 

__AvgStories DaxBucketed = 
IF (
    [#Loc.Article_Distinct] > 0,
    SWITCH (
        TRUE(),
        [__DailyLocAvg] > 0.01 && [__DailyLocAvg] < 1, "A. <1",
        [__DailyLocAvg] >= 1 && [__DailyLocAvg] < 2, "B. >=1 <2",
        [__DailyLocAvg] >= 2 && [__DailyLocAvg] < 4, "C. >=2 <4",
        [__DailyLocAvg] >= 4 && [__DailyLocAvg] < 8, "D. >=4 <8",
        [__DailyLocAvg] >= 8 && [__DailyLocAvg] < 999999, "E. >=8",
        "0. 0"
    ),
    "Null"
)

...but I then find myself unable to find a visual that takes the textual bucket measure and groups by it -- only visuals that group by columns. Additionally, I have to filter out a lot of nulls, because the KEEPFILTERS approach seems like a pretty clumsy way to handle the per-Patch_Id granularity.
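
One pattern I may try next is a disconnected ("static segmentation") bucket table: a standalone table holding the bucket boundaries, plus a measure that counts Patches whose average falls inside the selected bucket. A sketch, with a hypothetical Buckets table and the boundaries copied from my SWITCH above:

Buckets =
DATATABLE(
    "Bucket", STRING,
    "Min", DOUBLE,
    "Max", DOUBLE,
    {
        { "A. <1",     0.01, 1 },
        { "B. >=1 <2", 1,    2 },
        { "C. >=2 <4", 2,    4 },
        { "D. >=4 <8", 4,    8 },
        { "E. >=8",    8,    999999 }
    }
)

PatchesInBucket =
VAR _Lo = MIN(Buckets[Min])  -- bounds of the bucket currently on the axis
VAR _Hi = MAX(Buckets[Max])
RETURN
    COUNTROWS(
        FILTER(
            VALUES(all_articles_ever[Patch_Id]),
            VAR _Avg = [__DailyLocAvg]  -- context transition: average for this Patch
            RETURN _Avg >= _Lo && _Avg < _Hi
        )
    )

With Buckets[Bucket] on the axis and [PatchesInBucket] as the value, a column chart should behave like a histogram, and "Show value as > Percent of grand total" would turn the counts into % of Patches.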

 

Note - here's an example of the chart I've used in the past when I've had a column reference on a summary table that shows, for any given period, the count of stories.

[Capture2.PNG]

3. Finally, I have tried rounding the measure [__DailyLocAvg] so I can graph that. However, the closest I can get to the desired chart above is the Data Plot chart 1.6, which shows information by Patch_Id but doesn't give the data display I am trying to get to.

[Capture3.PNG]

 

Oh Power BI and DAX masters out there, any advice?

 

Thank you.

 

Charlie

2 REPLIES

v-yuta-msft
Community Support
@charleshale ,

 

Generally speaking, could you change [#Loc.Article_Distinct] and [__DailyLocAvg] into calculated columns? The problem becomes more complicated when you refer to one measure from within another measure.

 

Community Support Team _ Jimmy Tao

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.

charleshale
Responsive Resident

I agree. Since the daily average would be per day and the table isn't per day, I have been trying to use SUMMARIZECOLUMNS for that (probably with a Date[Day] in it to summarize by day). Any idea why my SUMMARIZECOLUMNS measure is breaking? Thank you, @v-yuta-msft

 

PS - I'd think the SUMMARIZECOLUMNS output would look like this for any given date range, and then, since __AvgStoriesDaxBucketed would be a column, I should be able to graph it as % of total.

Patch_Id | __AvgStories DaxBucketed
2 | B. >=1 <2
3 | B. >=1 <2
4 | B. >=1 <2
6 | C. >=2 <4
7 | C. >=2 <4
8 | B. >=1 <2
10 | C. >=2 <4
11 | D. >=4 <8
12 | D. >=4 <8
13 | D. >=4 <8
14 | C. >=2 <4
15 | B. >=1 <2
16 | C. >=2 <4
17 | C. >=2 <4
18 | C. >=2 <4
19 | C. >=2 <4
20 | C. >=2 <4
21 | B. >=1 <2
22 | B. >=1 <2
23 | B. >=1 <2
24 | A. <1
25 | C. >=2 <4
26 | B. >=1 <2
27 | B. >=1 <2
28 | B. >=1 <2
29 | B. >=1 <2
30 | C. >=2 <4
32 | C. >=2 <4
33 | C. >=2 <4
34 | B. >=1 <2
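
And a calculated-table sketch that I think would produce output shaped like the sample above (untested; it leaves the bucket logic inside the measure and relies on context transition per Patch_Id):

BucketByPatch =
ADDCOLUMNS(
    VALUES(all_articles_ever[Patch_Id]),
    "__AvgStories DaxBucketed", [__AvgStories DaxBucketed]  -- evaluated per Patch_Id row
)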
