Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi,
We have been using Power BI Dataflows to load data in to our Data Lake Gen 2 for almost a year now. A use-case has come up where we want to create a trending report from all of this data. At this point there are thousands of CSV files in the lake, one for each day the data was refreshed.
We are running into these issues:
There is a model.json file that corresponds to each CSV file in the lake. That json file contains all of the metadata about the tables at the time of refresh. The only issue is we haven't found a great way to tie those programmatically back to the CSV file.
As a proof of concept, I was able to download a few CSV files from the lake, and their corresponding json files, and come up with some M Query to get the header information from the json and merge it with the data from the CSV. But that entire process was manual.
There must be a way to do this programmatically! How is everyone else using data from their data lakes? Is the only option to go with something like Synapse Analytics? I think in this situation you would still have to marry the json files with the CSV files in order to produce a useful view of the data.
Interested in your insight! what am I missing?
Thanks!
Andy
Hey Pat,
Thanks for this. I'm working through it but am stuck. The file with the columns is in JSON format. I can't figure out the correct query to expand that, pick the correct entity and attributes column and pull that in to a new column.
When I did it manually from one model.json here's the steps I used:
let
Source = Json.Document(File.Contents("model.json")),
entities = Source[entities],
entities1 = entities{0},
attributes = entities1[attributes],
#"Converted to Table" = Table.FromList(attributes, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"name"}, {"Column1.name"}),
#"Transposed Table" = Table.Transpose(#"Expanded Column1")
in
#"Transposed Table"
I merged the JSON files with the CSV (content) files in to one row, and have the binary file which contains the JSON payload:
When I try to convert the Binary file to JSON it says "Expression.Error: We cannot convert a value of type Binary to type Text"
Same if I try to drill in more.
I appreciate the help.
Andy
This should be doable in the query editor. Here is one way to try:
- have one query that connects to your JSON file and results in a table of filenames (and other metadata)
- another query to connects to all your csv files (starts with two columns - filename and content with the Binary of each file.
- merge the two queries on filename
- add a column with Csv.Document([Content]) or whatever that column is called to extract the data into a column of tables
- remove the column with the original Binaries
- expand the tables
Pat
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
102 | |
101 | |
78 | |
70 | |
64 |
User | Count |
---|---|
141 | |
107 | |
100 | |
82 | |
74 |