Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
aksuffolk-80
Frequent Visitor

Create trending report from CSV files in Data Lake

Hi,

 

We have been using Power BI Dataflows to load data in to our Data Lake Gen 2 for almost a year now. A use-case has come up where we want to create a trending report from all of this data. At this point there are thousands of CSV files in the lake, one for each day the data was refreshed.

 

We are running into these issues:

  1. CSV Files are not stored in hierarchical structure (all are stored in one folder rather than nested folders for year/month)
  2. A CSV file is created for each time the refresh happened
  3. CSV files do not contain any header information (column headers)

There is a model.json file that corresponds to each CSV file in the lake. That json file contains all of the metadata about the tables at the time of refresh. The only issue is we haven't found a great way to tie those programmatically back to the CSV file.

 

As a proof of concept, I was able to download a few CSV files from the lake, and their corresponding json files, and come up with some M Query to get the header information from the json and merge it with the data from the CSV. But that entire process was manual. 

 

There must be a way to do this programmatically! How is everyone else using data from their data lakes? Is the only option to go with something like Synapse Analytics? I think in this situation you would still have to marry the json files with the CSV files in order to produce a useful view of the data.

 

Interested in your insight! what am I missing?

 

Thanks!

Andy

 

2 REPLIES 2
aksuffolk-80
Frequent Visitor

Hey Pat,

 

Thanks for this. I'm working through it but am stuck. The file with the columns is in JSON format. I can't figure out the correct query to expand that, pick the correct entity and attributes column and pull that in to a new column.

 

When I did it manually from one model.json here's the steps I used:

 

 

let
Source = Json.Document(File.Contents("model.json")),
entities = Source[entities],
entities1 = entities{0},
attributes = entities1[attributes],
#"Converted to Table" = Table.FromList(attributes, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"name"}, {"Column1.name"}),
#"Transposed Table" = Table.Transpose(#"Expanded Column1")
in
#"Transposed Table"

 

I merged the JSON files with the CSV (content) files in to one row, and have the binary file which contains the JSON payload:

aksuffolk-80_0-1618243435905.png

 

 

When I try to convert the Binary file to JSON it says "Expression.Error: We cannot convert a value of type Binary to type Text"

 

Same if I try to drill in more.

 

I appreciate the help.

 

Andy

 

mahoneypat
Employee
Employee

This should be doable in the query editor.  Here is one way to try:

 

- have one query that connects to your JSON file and results in a table of filenames (and other metadata)

- another query to connects to all your csv files (starts with two columns - filename and content with the Binary of each file.

- merge the two queries on filename

- add a column with Csv.Document([Content]) or whatever that column is called to extract the data into a column of tables

- remove the column with the original Binaries

- expand the tables

 

Pat

 





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.