This is a two-part question.

I've created a pipeline that queries the Power BI REST API Get Activity Events endpoint for a day or multiple days (parameter-driven) and stores the data for each day in a file in a lakehouse, with the date appended to the filename. This part is working.

My first question: there are different file formats to choose from in the copy activity when creating the destination file (JSON, Delimited, Parquet, Binary, Avro, ORC). Right now I chose Delimited, and the pipeline creates .txt files for each day that I pass in the parameter and stores them in the Files folder in the lakehouse. What I eventually want to do in the pipeline is read the data from these files, most likely just 12-15 columns which will always be the same each time, and insert the rows into a table in the lakehouse. What is the best file format to choose for this process?

My second question: what is the best way to iterate over all the files in the lakehouse, select the columns I need, and insert them into the lakehouse table? My guess is that some formats handle the select query better than others. Also, which activities should I use to loop through each file and move the needed columns and rows into the lakehouse table?

Thanks,
Scott
Hi @scabral ,
Thanks for using Fabric Community.
Choosing the most suitable approach depends on your specific needs and the type of data being handled. Here are some suggestions based on my understanding of your scenario:

File Format Selection:
For your scenario, where you'll be reading specific columns from the data and inserting them into a lakehouse table, a columnar format such as Parquet is the clear choice.

Advantages of Columnar Formats:
1. Column pruning: a reader can load just the 12-15 columns you need instead of parsing entire rows.
2. Embedded schema and data types, so nothing has to be inferred on read.
3. Efficient compression and encoding, which reduces file size and I/O.

Delimited Text (CSV) Considerations:
While delimited text (CSV) might seem familiar, it's not ideal because:
1. Every row must be fully parsed even when only a few columns are needed.
2. There is no embedded schema, so data types must be inferred or declared on every read.
3. Files are typically larger than Parquet, since there is no columnar compression.

Iteration and Data Processing:
Here's a recommended approach for iterating through files and inserting data into your lakehouse table:
1. Use a Get Metadata activity to list the files in the lakehouse Files folder.
2. Use a ForEach activity over the returned child items.
3. Inside the loop, use a Copy activity (with column mapping to select only the needed columns) or a notebook to append the rows to the lakehouse table.

By following these recommendations, you'll establish a more efficient and scalable pipeline for processing your Power BI activity event data in your lakehouse.
Hope this is helpful. Please let me know in case of further queries.
Hello @scabral ,
We haven't heard from you since the last response and wanted to check whether you have found a resolution yet.
If you have, please share it with the community, as it can be helpful to others.
Otherwise, reply with more details and we will try to help.
I created a process in Fabric Data Factory based on some of the points listed above.

I created a new data pipeline with the following activities:
1. A ForEach activity that calls the Power BI REST API Get Activity Events endpoint for each day (parameter-driven) and stores the data in Parquet files, one per day, in a new lakehouse.
2. A Get Metadata activity to get the list of file names from the lakehouse.
3. A ForEach activity that uses a Copy activity to load each Parquet file into a lakehouse table and then archive each file into an archive folder.

I still need to make some minor tweaks, but the overall pipeline is working.
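For reference, the per-day REST call in step 1 can be sketched as a small URL builder. The endpoint shown is the Power BI Admin "Get Activity Events" endpoint; authentication is out of scope here, and the single-quoted startDateTime/endDateTime format is my reading of the API docs, so verify the details against the official documentation before relying on them:

```python
import urllib.parse

def activity_events_url(day: str) -> str:
    """Build the Admin - Get Activity Events URL for one UTC day.

    `day` is a YYYY-MM-DD string. The API requires the start and end
    datetimes to fall on the same UTC day and (per the docs) expects
    them wrapped in single quotes.
    """
    base = "https://api.powerbi.com/v1.0/myorg/admin/activityevents"
    params = {
        "startDateTime": f"'{day}T00:00:00'",
        "endDateTime": f"'{day}T23:59:59'",
    }
    return base + "?" + urllib.parse.urlencode(params)

print(activity_events_url("2024-01-01"))
```

Note that the real API pages its results: each response can include a continuation URI that must be followed until no more pages remain, so a production loop would handle that as well.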
Hi @scabral ,
Glad to know that your query got resolved. Please continue using Fabric Community for your further queries.