Data Derivatives in Power BI: A Python-based Inter...

technolog · ‎06-30-2023

Screenshot 2023-06-30 at 19.31.16.png

This article focuses on exploring the use of interpolation and approximation of data sets in Power BI utilizing Python.

Given a data set, as depicted in the following figure:

x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y1 = np.array([243, 265, 287, 313, 343, 373, 401, 435, 475, 519, 571, 630, 699])
y2 = np.array([244, 269, 291, 320, 350, 379, 406, 442, 482, 526, 578, 633, 703])
y3 = np.array([371, 379, 395, 411, 427, 445, 463, 485, 509, 533, 555, 583, 615])
y4 = np.array([374, 382, 399, 416, 433, 457, 482, 509, 533, 561, 587, 615, 645])
y5 = np.array([370, 382, 390, 398, 408, 420, 432, 446, 460, 476, 494, 514, 538])
y6 = np.array([370, 388, 398, 410, 424, 438, 452, 466, 482, 500, 518, 540, 564])
y7 = np.array([348, 378, 408, 454, 508, 566, 638, 720, 0  , 0  , 0  , 0  , 0  ])

An important aspect to be considered is that a portion of our y7 data contains zeroes, which will be crucial to bear in mind during further calculations.

The aim is to construct derivatives from the data for all values of x and y, inclusive of y2, y3, etc. To facilitate this, we commence by building an interpolation based on these data, from which we will then derive.

import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate


x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y1 = np.array([243, 265, 287, 313, 343, 373, 401, 435, 475, 519, 571, 630, 699])
y2 = np.array([244, 269, 291, 320, 350, 379, 406, 442, 482, 526, 578, 633, 703])
y3 = np.array([371, 379, 395, 411, 427, 445, 463, 485, 509, 533, 555, 583, 615])
y4 = np.array([374, 382, 399, 416, 433, 457, 482, 509, 533, 561, 587, 615, 645])
y5 = np.array([370, 382, 390, 398, 408, 420, 432, 446, 460, 476, 494, 514, 538])
y6 = np.array([370, 388, 398, 410, 424, 438, 452, 466, 482, 500, 518, 540, 564])
y7 = np.array([348, 378, 408, 454, 508, 566, 638, 720, 0  , 0  , 0  , 0  , 0  ])
y8 = np.array([351, 388, 430, 478, 528, 588, 660, 740, 0  , 0  , 0  , 0  , 0  ])
y9 = np.array([358, 378, 400, 424, 463, 502, 538, 584, 628, 680, 0  , 0  , 0  ])
y10 = np.array([363, 383, 402, 430, 469, 507, 545, 590, 633, 685, 0  , 0  , 0  ])
y11 = np.array([243, 269, 295, 320, 350, 383, 416, 453, 497, 545, 593, 662, 722])
y12 = np.array([243, 264, 287, 313, 339, 372, 405, 445, 486, 534, 582, 649, 711])

y_arr = np.concatenate((y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12), axis = 0)
y_arr = np.reshape(y_arr, (12, 13))

i = 1

for y_list in y_arr:
    x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
    y_list = y_list[y_list > 0]
    
    x = np.resize(x, len(y_list))
    
    minx = np.min(x)
    maxx = np.max(x)
        
    
    tck = interpolate.splrep(x, y_list, s=0)

    xnew = np.arange(minx, maxx, 0.5)
    ynew = interpolate.splev(xnew, tck, der=0)
    yder2 = interpolate.splev(xnew, tck, der=1)
    
    fig, ax1 = plt.subplots()

    ax2 = ax1.twinx()
  
    ax1.plot(x, y_list, 'o', xnew, ynew, x, y_list, 'b')
    plt.title('Cubic-spline interpolation for ' + str(i) + ' line')

    ax2.plot(xnew, yder2)

    plt.show()
    i = i+1

Following the implementation of the aforementioned method, we are presented with the subsequent graph. This illustrates the original data points along with the derivative values:

An observation of the derivative reveals a considerable irregularity, which is primarily due to the non-uniform arrangement of data points. This irregularity may impose challenges for any conclusions to be drawn from these data in the future.

Hence, we aim to improve the initial code to perform data approximation instead of interpolation, and then compute the derivative from the resulting function.

import scipy as sp
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y = np.array([243, 265, 287, 313, 343, 373, 401, 435, 475, 519, 571, 630, 699])
y2 = np.array([244, 269, 291, 320, 350, 379, 406, 442, 482, 526, 578, 633, 703])
y3 = np.array([371, 379, 395, 411, 427, 445, 463, 485, 509, 533, 555, 583, 615])
y4 = np.array([374, 382, 399, 416, 433, 457, 482, 509, 533, 561, 587, 615, 645])
y5 = np.array([370, 382, 390, 398, 408, 420, 432, 446, 460, 476, 494, 514, 538])
y6 = np.array([370, 388, 398, 410, 424, 438, 452, 466, 482, 500, 518, 540, 564])
y7 = np.array([348, 378, 408, 454, 508, 566, 638, 720, 0  , 0  , 0  , 0  , 0  ])
y8 = np.array([351, 388, 430, 478, 528, 588, 660, 740, 0  , 0  , 0  , 0  , 0  ])
y9 = np.array([358, 378, 400, 424, 463, 502, 538, 584, 628, 680, 0  , 0  , 0  ])
y10 = np.array([363, 383, 402, 430, 469, 507, 545, 590, 633, 685, 0  , 0  , 0  ])
y11 = np.array([243, 269, 295, 320, 350, 383, 416, 453, 497, 545, 593, 662, 722])
y12 = np.array([243, 264, 287, 313, 339, 372, 405, 445, 486, 534, 582, 649, 711])

y_arr = np.concatenate((y,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12), axis = 0)
y_arr = np.reshape(y_arr, (12, 13))

i = 1

for y_list in y_arr:
    x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
    y = y_list
    
    y = y[y > 0]
    
    x = np.resize(x, len(y))
    
    
    minx = np.min(x)
    maxx = np.max(x)
    
    d=3 # степень полинома
    fp, residuals, rank, sv, rcond = np.polyfit(x, y, d, full=True) # Модель
    f = sp.poly1d(fp) # аппроксимирующая функция
    y1=[fp[0]*x[i]**2+fp[1]*x[i]+fp[2] for i in range(0,len(x))] # значения функции a*x**2+b*x+c
    so=round(sum([abs(y[i]-y1[i]) for i in range(0,len(x))])/(len(x)*sum(y))*100,4) # средняя ошибка
    fx = np.linspace(x[0], x[-1] + 1, len(x)) # можно установить вместо len(x) большее число для интерполяции

    y_list = f(fx)

    tck = interpolate.splrep(x, y_list, s=0)

    xnew = np.arange(minx, maxx, 0.5)
    ynew = interpolate.splev(xnew, tck, der=0)
    yder2 = interpolate.splev(xnew, tck, der=1)

    fig, ax1 = plt.subplots()

    ax2 = ax1.twinx()

    ax1.plot(x, y_list, 'o', xnew, ynew, x, y_list, 'b')
    plt.title('Approximation for ' + str(i) + ' line')
    ax2.plot(xnew, yder2)

    plt.show()
    i = i + 1

The result is displayed below:

As demonstrated, employing an approximation function results in a significantly smoother derivative of the function.

We proceed by examining the implementation for y8:

Upon inspection, we notice that excessive data along the x-axis is eliminated, with no attempt to transition to zero.

Subsequently, to continue working with the code and set up the necessary visualizations, we transfer the scripts into Power BI, as displayed below:

The contrast between the two approaches - interpolation and approximation - can be vividly discerned here.

Data Derivatives in Power BI: A Python-based Interpolation and Approximation Analysis

Important Links every Power BI enthusiastic shoul...

Re: Supercharge Your Visuals: Easy Conditional For...

Re: Power BI Tip: How To Sort A Column With Duplic...

Power BI Quick Reference Sheet

Creating a “custom” or “hybrid” matrix in PowerBI