Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
dan_evans_ngn
Regular Visitor

How to model this - date fields in two tables, more complicated than this usually is?

Hi,

 

I'm new to BI but have a reasonable background in data and databases. I'm trying to build something to help me explore the relationships between gas pipe failure data and soil types, slicing/filtering based on pipe attributes and/or failure attributes, to give a normalised failure rate in failures per km of pipe, for different soil types, over time.

 

I've used GIS to intersect my pipe features with the soil polygons, and my repair locations with the soil polygons, so I have two intersection tables that relate by a common ID to both the pipes and the repairs.

 

Where it gets complicated is that the repair data goes back over a decade, and the pipe network has changed over that time as pipes are replaced etc.

 

So my pipes dataset is actually several yearly extracts with attributes that I've calculated to give each pipe a start date and an end date. I can easily query this dataset on a particular date (start date < desired date and end date > desired date) and get the population of pipes that were live on that date.

 

I'm struggling to understand the best way to model this in Power BI. I can see how to filter the failures by month or by a date range etc. and get a count of failures (perhaps filtered by failure cause and only on certain pipe materials) for those date ranges, but I'm not sure how best to get the equivalent pipe length for those dates. There could be changes to pipes at multiple points within the date range so summing lengths could end up double-counting some pipes. I suppose I'd need to get the average lengths over the period, but would this require transforming the data so that I have a table with the full population for every single day across my study period? This could get very big and potentially very slow...

 

I probably don't need it down to individual day level, so I did think about aggregating it to weekly populations, and doing the same with the failure data. Is this something that can be achieved in the Query editor, or am I best off pre-processing this in another environment (R?)

 

Very keen to hear how more experienced BI modellers would approach this.

 

Thanks

 

Dan

3 REPLIES 3
Anonymous
Not applicable

Hi,

 

Read your post and I think you have trouble only in the following part of your post..

 

"but I'm not sure how best to get the equivalent pipe length for those dates. There could be changes to pipes at multiple points within the date range so summing lengths could end up double-counting some pipes."

 

Based on what I understood from your post, is it possible for you to perform the following calculation..

 

Pipe Count on 1st Jan 2017:

Pipe 1Pipe 2Pipe 3Pipe 4Pipe 5Pipe 6Pipe 7Pipe 8Pipe 9Pipe 10

 

Replacements in January 2017

 |-----| |-----| |-----|Pipe 11 |-----| |-----| |-----| |-----|Pipe 12 |-----|

Replacements in February 2017

 |-----|Pipe 13 |-----| |-----| |-----| |-----|Pipe 14 |-----| |-----| |-----|

Live Pipes on 28th Feb 2017

Pipe 1Pipe 13Pipe 3Pipe 11Pipe 5Pipe 6Pipe 14Pipe 8Pipe 12Pipe 10

 

Calculation:

If you count the number of live pipes between 01st Jan 2017 to 28th Feb 2017, you will get the result 14 from your pipe dataset.

 

A = count of live pipes between 01st Jan 2017 to 28th Feb 2017 = 14

B = count of pipes replaced between 01st Jan 2017 to 28th Feb 2017 = 4

C = live pipe count without duplication = A-B = 14-4 = 10

 

Without duplication, 10 is the number of live pipes between the considered period.

 

In addition to this, in case if the length has increased by adding - say 2 more pipes, you can add that also, provided you have the data of replacements of pipes and additions of pipes separately.

 

This is as per my understanding of your post. If I am missing any point, please let me know. 

After some playing around, and having read part of The Definitive Guide to DAX, I can now reformulate my question a bit...

 

Basically what I need to do is create a measure that gets the middle date from the filter context and returns the sum of the pipe lengths on that day, i.e. when the start date for each pipe is less than the middle day and the end date for each pipe is greater than the middle day.

 

Any ideas how I would do this?

 

My date table is not related to my pipes dataset because there isn't a single date field for the pipes that corresponds to it. Is this a problem?

@dan_evans_ngn can you provide sample data to get the answer. Also here is a topic which help to get solution quicker.



Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!

Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo

If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤


Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Fabric Monthly Update - May 2024

Check out the May 2024 Fabric update to learn about new features.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.