Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Janaya1
Frequent Visitor

Question on Data Cleaning/Pre-Work

I have a large wide data set (hundreds of columns and each row is a new batch or lot) with data from manufacturing runs and I am trying to set up the data in the most usable way. This is data over several days where many types of tests are run, I have basically the same data across several days for many different assays that come in different types like text, percentages, very small values (small decimals), very large values (billions).

 

I have been trying to just use the data from the source as is, but I am finding it much more difficult to use than I've seen in other similar programs (Tableau and Spotfire). For example, if I want to see how a particular assay shows across the entire manufacturing for each lot, I have no idea how to group all of the column from that particular assay so that they can be shown easily in a visualization.

 

So my question is, how can I organize the data in a way that makes this possible by either using the current data table in it's current shape/format or by transforming the data? My initial thoughts were:

 

1. Use the data as is and figure out how to group columns in a meaningful way that allows me to compare individual batches as well as create dynamic control charts. I have no idea how to do this at this point.

2. Create several tables including a dimension table with 1 row per lot and all of the text/date columns and then several unpivoted tables for each assay. My concern here is that I wouldn't really be able to create dynamic visualizations and that the columns may eventually change and I'm not sure how to go about ensuring issues don't arise.

 

Edit:

Here is an example of the first output I'd like to see. This is just a control chart based on one of the measures. What I've done in the past is create something like this below created a Parameter with Parameter Values (this is from Tableau) where I could have the y axis (along with the reference lines, moving average, and axis scale) change based on a drop down menu to any particular measure that I wanted.

 

Screenshot 2024-04-23 155132.png

I also would like something like this where it's a line chart of different timepoints for the same measure and I have all of the lots plotted out, but only the one I have selected shows as a color and the rest are grey and have increased transparency. I did this in Tableau before but recreated it here in excel.

 

Screenshot 2024-04-24 092311.png 

 

There are a lot more visualizations I will be working on, but I thought these two might give me a good base to work from. Here is the fake data I generated (I had to delete some columns and rows to make the character limit for the post):

 

BatchStudyAgeDate of Manufacturing StartThaw TimeThaw ValueThaw PercentageIntermediate ValueIntermediate PercentageCulture PathwayTarget Culture VolumeCulture ValueCulture PercentageCulture Assay 1Culture Assay 2Culture Assay 3Culture Assay 4Culture Assay 5Culture DateCulture TimeDay 3 Additive LotDay 3 ValueDay 3 PercentageDay 6 ValueDay 6 PercentageDay 9 PathwayDay 9 ValueDay 9 PercentageDay 13 ValueDay 13 PercentagePre-Formulation ValuePre-Formulation PercentagePost-Formulation ValuePost-Formulation Percentage
A001A381/1/2023112.03E+09931.13E+0995A703.31E+08920.9643650.9099110.92660.42684393461/4/20233:43:00 PMBC0011.75E+08823.77E+0882X6.96E+08893.42E+09775.26E+09725.88E+0866
A002B311/11/2023173.22E+09931.49E+0978B2002.24E+08820.5238730.4963580.5784010.82548795441/14/20233:16:00 PMBC0011.81E+08704.73E+0872Y9.97E+08833.34E+09854.64E+09635.54E+0862
A003C321/21/2023203.28E+09921.34E+0998C303.16E+08800.4582820.9916690.3795540.50233399461/24/20232:59:00 PMBC0012.25E+08724.46E+0879Z9.83E+08773.05E+09844.87E+09685.85E+0867
A004A361/31/2023183.95E+09981.25E+0986A702.20E+08730.0874410.0788520.5439150.61884390722/3/20232:25:00 PMBC0011.94E+08894.02E+0876X7.69E+08903.65E+09845.57E+09744.76E+0868
A005B412/10/2023172.20E+091001.24E+0992A2002.59E+08720.6872260.6670370.0997940.12901891712/13/20233:15:00 PMBC0011.90E+08774.10E+0874X9.09E+08822.83E+09735.21E+09684.69E+0887
A006C452/20/2023192.21E+09991.10E+0976A702.23E+08790.5882230.5727470.0310490.24469394682/23/20234:13:00 PMBC0021.98E+08993.83E+0878X9.72E+08843.06E+09834.28E+09835.04E+0881
A007A563/2/202352.69E+09911.37E+0976C702.91E+08730.982730.5401280.0042330.63026993743/5/20234:53:00 PMBC0022.47E+08823.77E+0897X8.09E+08792.74E+09795.31E+09625.06E+0889
A008B353/12/2023162.10E+09851.33E+0979C2003.29E+08990.3488660.5997850.7125970.26882198513/15/20234:19:00 PMBC0021.58E+08934.32E+0876X9.33E+08702.61E+09834.58E+09735.92E+0880
A009C493/22/202362.91E+09871.03E+0972B302.82E+08700.8450620.9353550.5769810.99744699113/25/20233:30:00 PMBC0021.92E+08983.73E+0894Y7.65E+08783.87E+09895.06E+09605.78E+0886
2 REPLIES 2
parry2k
Super User
Super User

@Janaya1 It is very tough to answer without looking at the data. The starting point and goal should be to work towards a star schema Understand star schema and the importance for Power BI - Power BI | Microsoft Learn

 

Maybe if you can share sample data (just create dummy data) with some expected outputs, it will help to provide some details.

 

Don't share any sensitive information.



Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!

Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo

If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤


Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.

Thank you! I created some fake data and visualizations I'm looking for. I had to end up deleting a bunch of the data to make the character limit for the post but it should still work.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.