charleshale
Responsive Resident

Histogram of the number of occurrences of text-based and/or numerical measures

Does anyone have advice on easier ways to do what I have done, and what I am still trying to do, below?

I am analyzing the depth of coverage (measured primarily by the average number of unique local stories) across a large news site (patch.com), which has ~2000 local websites, ~30m monthly unique readers, and different types of author roles (for example, 5 = journalist, 7 = local business, etc.).


Here's a sample of the data set.  

TABLE: all_articles_ever

Date | CleansedTitleDateAuthor | Patch_Id | Author_Id | Author_Role_Id
12/21/2018 | $339kcinnaminsonhomeisanentertainersdream;12/21/2018;93127 | 817 | 93127 | 5
11/29/2018 | $319kcinnaminsonhomesitsonhalfanacre;11/29/2018;93127 | 817 | 93127 | 5
11/28/2018 | $220sneakersstolenfromunlockedvehicleinbrynmawrpolice;11/28/2018;22896833 | 628 | 22896833 | 5
11/5/2018 | $57mundevelopedestateupforsaleinmclean;11/5/2018;22924509 | 848 | 22924509 | 5
10/25/2018 | $1millionpowerballticketsoldinsouthjersey;10/25/2018;93127 | 817 | 93127 | 5
10/12/2018 | 10contemporarycahomestogetyoudreaming;10/12/2018;21854731 | 67 | 21854731 | 5
9/28/2018 | $4600ofpropertystolenfromofficeinbeachwoodblotter;9/28/2018;22896693 | 834 | 22896693 | 5
8/30/2018 | $1millionpowerballticketsoldinwiltonstillunclaimed;8/30/2018;103600 | 14 | 103600 | 5
8/27/2018 | $550kmeadowviewvillagehomeavailableincinnaminson;8/27/2018;93127 | 817 | 93127 | 5
8/23/2018 | $11mautorepaircollegeopensinbloomfield(photos);8/23/2018;22844250 | 473 | 22844250 | 5
8/20/2018 | $1360incounterfeitmoneyfoundincrashedcarinbeachwood;8/20/2018;22896693 | 834 | 22896693 | 5
8/15/2018 | $5millionhomefeaturesviewscountryclubliving;8/15/2018;21854731 | 67 | 21854731 | 5
8/14/2018 | $500kbloomfieldhomeonmorseavenuehasenormouspool;8/14/2018;22844250 | 473 | 22844250 | 5
8/3/2018 | $5sandwiches|burglarysuspectsmanhattanbeachtopstories;8/3/2018;22942913 | 54 | 22942913 | 5
7/30/2018 | $24mgladwynehomehaswinecellarfireplaces2garagesmore;7/30/2018;22896833 | 628 | 22896833 | 5
7/19/2018 | $199msaleonmcleansgoldcoastsetsrecord;7/19/2018;22924509 | 848 | 22924509 | 5
7/11/2018 | $1000rewardofferedinfatalsouthaustinshootingcase;7/11/2018;22872998 | 10936 | 22872998 | 5
7/5/2018 | 10+newlawsthatcanchangeyourlifeinctstartingnow;7/5/2018;22824890 | 12 | 22824890 | 5
6/28/2018 | &pizzatoopenatthemallatprincegeorges;6/28/2018;22891861 | 240 | 22891861 | 5
6/27/2018 | $14millionchetsecuritybreachuncovered;6/27/2018;22824890 | 12 | 22824890 | 5
6/21/2018 | 10burlingtoncountytownscouldbeeliminatedunderstateplan;6/21/2018;93127 | 817 | 93127 | 5
6/19/2018 | 100thdirtcheapstoretoopeninarlingtonthursday;6/19/2018;22969719 | 11273 | 22969719 | 5
6/18/2018 | $753klottoticketsoldinpomptonplains;6/18/2018;22896301 | 44 | 22896301 | 5
6/13/2018 | $20millionschoolsecuritygrantprogramproposedinburlco;6/13/2018;93127 | 817 | 93127 | 5
6/8/2018 | $41kgameshowwin|150ftfallmanhattanbeachtopstories;6/8/2018;22942913 | 54 | 22942913 | 5


Since we generally get what we inspect, not what we expect (!), I am inspecting how many of the ~2000 sites get what number of daily unique journalist-produced articles on average. I've largely gotten there, except for the visual display of the information, which is proving to be a tricky problem.

 

Here's what I have done so far, and how I have done it: 

 

A.  My first goal was to understand the average number of local stories per Patch. I did this in three steps, below, which could probably be done more elegantly.

 

1. I created a measure for distinct articles, to avoid counting dupes, by first creating in Power Query (M) an amalgam of article title, date, and author, and then writing a measure for a distinct count of non-blanks. Measure below.

#Loc.Article_Distinct =
CALCULATE(
    DISTINCTCOUNT( all_articles_ever[CleansedTitleDateAuthor] ),
    all_articles_ever[CleansedTitleDateAuthor] <> BLANK()
)
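
(For reference, I believe the same key could have been built as a DAX calculated column instead of in M. A rough sketch only: the [Title] column is a stand-in for whatever the raw feed calls it, and the cleansing here is approximate -- it only strips spaces:)

CleansedTitleDateAuthor_DAXSketch =
// hypothetical [Title] column; the real key is built in Power Query
LOWER ( SUBSTITUTE ( all_articles_ever[Title], " ", "" ) )
    & ";" & FORMAT ( all_articles_ever[Date], "m/d/yyyy" )
    & ";" & all_articles_ever[Author_Id]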

 

2. Next, I made a measure to calculate the average number of articles over time:

__DailyLocAvg = 
VAR _Days = COUNTROWS(DimDate) //Note my date table is called DimDate.  It is a standard date table
RETURN
[#Loc.Article_Distinct] / _Days 

//Side note: the above measure is a straight average, meaning that if a site measured over a year had 364 days with no articles and one day with 365, its yearly average would still be one article a day. A straight average is okay for now, although an ideal measure probably looks at the percent of days with over X articles a day.
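
(If I ever build that ideal measure, I imagine it would look something like this sketch -- untested, with a hypothetical threshold of one article a day:)

__PctDaysOverX =
VAR _Threshold = 1    // hypothetical cutoff: X articles per day
RETURN
    AVERAGEX (
        VALUES ( DimDate[Date] ),                             // iterate each day in context
        IF ( [#Loc.Article_Distinct] >= _Threshold, 1, 0 )    // 1 if the day clears the bar
    )
// returns a 0-1 fraction: the share of days at or above the threshold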


3. I realized that the average-articles measure, above, needed an explicit filter because I want to track by site. Accordingly, I made a new measure that does that, below.

__DailyLocAvgPerPatchID = 
AVERAGEX(
	KEEPFILTERS(VALUES(all_articles_ever[Patch_Id])),
	CALCULATE([__DailyLocAvg])
)

//Side note #2: I struggled to combine steps 2 and 3, above, into a single measure and ultimately moved on -- anyone have any bright ideas for more efficient measures here?
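
(The closest I have come to a single measure is the sketch below, though I have not validated that it matches the two-measure version:)

__DailyLocAvgPerPatchID_OneStep =
VAR _Days = COUNTROWS ( DimDate )    // same day count as __DailyLocAvg; assumes DimDate isn't cross-filtered by the fact table
RETURN
    AVERAGEX (
        KEEPFILTERS ( VALUES ( all_articles_ever[Patch_Id] ) ),    // one row per site, as in step 3
        [#Loc.Article_Distinct] / _Days                            // step 2's math, inlined
    )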

 

Anyway, so far so good. My steps may be clumsy, but I am getting the results I should be getting: average stories per Patch per year:

[Screenshot: Capture.PNG -- average stories per Patch per year]

 

HOWEVER: here's the issue with which I have been struggling despite thorough message-board crawling. What is the best way to take the measure [__DailyLocAvgPerPatchID] and get a histogram of the results, to see what % of Patches get what number of articles? I have tried the following:

 

1.  SUMMARIZECOLUMNS, to produce a table of the above output, which I could then graph...

Summary = SUMMARIZECOLUMNS(
    all_articles_ever[Patch_Id], 
    all_articles_ever[Date],
    "Distinct Count", DISTINCTCOUNT(all_articles_ever[CleansedTitleDateAuthor])
)

...but I get the error "The expression refers to multiple columns. Multiple columns cannot be converted to a scalar value."

 

2.  I have tried creating a bucket in DAX, which works... 

__AvgStories DaxBucketed =
IF (
    [#Loc.Article_Distinct] > 0,
    SWITCH (
        TRUE (),
        [__DailyLocAvg] > 0.01 && [__DailyLocAvg] < 1, "A. <1",
        [__DailyLocAvg] >= 1 && [__DailyLocAvg] < 2, "B. >=1 <2",
        [__DailyLocAvg] >= 2 && [__DailyLocAvg] < 4, "C. >=2 <4",
        [__DailyLocAvg] >= 4 && [__DailyLocAvg] < 8, "D. >=4 <8",
        [__DailyLocAvg] >= 8 && [__DailyLocAvg] < 999999, "E. >=8",
        "0. 0"
    ),
    "Null"
)

...but I then can't find a chart that takes the textual bucket measure and groups by it -- only charts that group by columns. Additionally, I have to filter out a lot of nulls, and the KEEPFILTERS seems like a pretty clumsy way to deal with the per-Patch_Id grain.
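
(For what it's worth, the closest general pattern I can think of is a standalone, disconnected bucket table plus a measure that counts Patches per bucket, so the bucket column -- not a measure -- goes on the chart axis. A sketch below, where the Buckets table and the measure name are my own invention and I have not gotten this working end to end:)

// Disconnected bucket table (Modeling > New table), mirroring the SWITCH bands above
Buckets =
DATATABLE (
    "Bucket", STRING,
    "Min", DOUBLE,
    "Max", DOUBLE,
    {
        { "A. <1",     0.01, 1 },
        { "B. >=1 <2", 1,    2 },
        { "C. >=2 <4", 2,    4 },
        { "D. >=4 <8", 4,    8 },
        { "E. >=8",    8,    999999 }
    }
)

// Measure: how many Patches fall into the bucket(s) in the current filter context
#PatchesInBucket =
SUMX (
    VALUES ( all_articles_ever[Patch_Id] ),
    VAR _avg = [__DailyLocAvg]    // per-Patch daily average via context transition
    RETURN
        COUNTROWS (
            FILTER ( Buckets, _avg >= Buckets[Min] && _avg < Buckets[Max] )
        )
)

(Putting Buckets[Bucket] on the axis of a bar chart with #PatchesInBucket as the value, then showing the value as percent of grand total, should give the "% of Patches per bucket" view.)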

 

Note: here's an example of the kind of chart I've used in the past, when I've had a column reference on a summary table that shows, for any given period, the count of stories.

[Screenshot: Capture2.PNG -- example chart built from a summary-table column]

3. Finally, I have tried rounding the measure [__DailyLocAvg] so I can graph that. However, the closest I can get to the desired chart above is the Data Plot chart 1.6, which shows information by Patch_Id but doesn't give the data display I am trying to get to, above.

[Screenshot: Capture3.PNG -- Data Plot chart 1.6, by Patch_Id]
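
(For reference, the rounding is just a wrapper measure along these lines:)

__DailyLocAvgRounded =
ROUND ( [__DailyLocAvg], 0 )    // nearest whole article count; still a measure, so still not usable as a categorical axis

(Being a measure, it still can't go on a chart axis or legend, which I suspect is why I ended up in custom visuals.)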

 

Oh, Power BI and DAX masters out there, any advice?

 

Thank you.

 

Charlie

2 REPLIES
v-yuta-msft
Community Support

@charleshale ,

 

Generally speaking, could you change [#Loc.Article_Distinct] and [__DailyLocAvg] into calculated columns? The problem becomes more complicated when you refer to one measure from within another measure.
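
For example, a rough sketch of [__DailyLocAvg] as a calculated column on all_articles_ever (not tested against your model):

DailyLocAvg Column =
VAR _Days = COUNTROWS ( DimDate )    // all days in the date table
VAR _Articles =
    CALCULATE (
        DISTINCTCOUNT ( all_articles_ever[CleansedTitleDateAuthor] ),
        ALLEXCEPT ( all_articles_ever, all_articles_ever[Patch_Id] )    // keep only this row's Patch_Id
    )
RETURN
    DIVIDE ( _Articles, _Days )

A column like this is evaluated once per row at refresh time, so the bucketing can then also be a calculated column and go straight onto a chart axis.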

 

Community Support Team _ Jimmy Tao

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.

charleshale
Responsive Resident

I agree. Since the daily average would be per day and the table isn't per day, I have been trying to use SUMMARIZECOLUMNS for that (probably with a Date[Day] column in it to summarize by day). Any idea why my SUMMARIZECOLUMNS measure is breaking? Thank you, @v-yuta-msft

 

PS - I'd think the SUMMARIZECOLUMNS output would look like this for any given date range, and then, since __AvgStoriesDaxBucketed is a column, I should be able to graph it as % of total.

Patch_Id | __AvgStoriesDaxBucketed
2 | B. >=1 <2
3 | B. >=1 <2
4 | B. >=1 <2
6 | C. >=2 <4
7 | C. >=2 <4
8 | B. >=1 <2
10 | C. >=2 <4
11 | D. >=4 <8
12 | D. >=4 <8
13 | D. >=4 <8
14 | C. >=2 <4
15 | B. >=1 <2
16 | C. >=2 <4
17 | C. >=2 <4
18 | C. >=2 <4
19 | C. >=2 <4
20 | C. >=2 <4
21 | B. >=1 <2
22 | B. >=1 <2
23 | B. >=1 <2
24 | A. <1
25 | C. >=2 <4
26 | B. >=1 <2
27 | B. >=1 <2
28 | B. >=1 <2
29 | B. >=1 <2
30 | C. >=2 <4
32 | C. >=2 <4
33 | C. >=2 <4
34 | B. >=1 <2
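
(In other words, something like this calculated table, if I could get the syntax right -- a sketch that assumes the bucketing logic stays in the [__AvgStories DaxBucketed] measure:)

Summary =
ADDCOLUMNS (
    VALUES ( all_articles_ever[Patch_Id] ),                   // one row per Patch
    "__AvgStoriesDaxBucketed", [__AvgStories DaxBucketed]     // bucket label evaluated per Patch
)

(That at least would make the bucket a real column I could drop on an axis and show as % of total.)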
