Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

technolog

Data Derivatives in Power BI: A Python-based Interpolation and Approximation Analysis

Screenshot 2023-06-30 at 19.31.16.png

This article focuses on exploring the use of interpolation and approximation of data sets in Power BI utilizing Python.

Given a data set, as depicted in the following figure:

 

x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y1 = np.array([243, 265, 287, 313, 343, 373, 401, 435, 475, 519, 571, 630, 699])
y2 = np.array([244, 269, 291, 320, 350, 379, 406, 442, 482, 526, 578, 633, 703])
y3 = np.array([371, 379, 395, 411, 427, 445, 463, 485, 509, 533, 555, 583, 615])
y4 = np.array([374, 382, 399, 416, 433, 457, 482, 509, 533, 561, 587, 615, 645])
y5 = np.array([370, 382, 390, 398, 408, 420, 432, 446, 460, 476, 494, 514, 538])
y6 = np.array([370, 388, 398, 410, 424, 438, 452, 466, 482, 500, 518, 540, 564])
y7 = np.array([348, 378, 408, 454, 508, 566, 638, 720, 0  , 0  , 0  , 0  , 0  ])

 

 

An important aspect to be considered is that a portion of our y7 data contains zeroes, which will be crucial to bear in mind during further calculations.

The aim is to construct derivatives from the data for all values of x and y, inclusive of y2, y3, etc. To facilitate this, we commence by building an interpolation based on these data, from which we will then derive.

 

 

import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate


x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y1 = np.array([243, 265, 287, 313, 343, 373, 401, 435, 475, 519, 571, 630, 699])
y2 = np.array([244, 269, 291, 320, 350, 379, 406, 442, 482, 526, 578, 633, 703])
y3 = np.array([371, 379, 395, 411, 427, 445, 463, 485, 509, 533, 555, 583, 615])
y4 = np.array([374, 382, 399, 416, 433, 457, 482, 509, 533, 561, 587, 615, 645])
y5 = np.array([370, 382, 390, 398, 408, 420, 432, 446, 460, 476, 494, 514, 538])
y6 = np.array([370, 388, 398, 410, 424, 438, 452, 466, 482, 500, 518, 540, 564])
y7 = np.array([348, 378, 408, 454, 508, 566, 638, 720, 0  , 0  , 0  , 0  , 0  ])
y8 = np.array([351, 388, 430, 478, 528, 588, 660, 740, 0  , 0  , 0  , 0  , 0  ])
y9 = np.array([358, 378, 400, 424, 463, 502, 538, 584, 628, 680, 0  , 0  , 0  ])
y10 = np.array([363, 383, 402, 430, 469, 507, 545, 590, 633, 685, 0  , 0  , 0  ])
y11 = np.array([243, 269, 295, 320, 350, 383, 416, 453, 497, 545, 593, 662, 722])
y12 = np.array([243, 264, 287, 313, 339, 372, 405, 445, 486, 534, 582, 649, 711])

y_arr = np.concatenate((y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12), axis = 0)
y_arr = np.reshape(y_arr, (12, 13))

i = 1

for y_list in y_arr:
    x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
    y_list = y_list[y_list > 0]
    
    x = np.resize(x, len(y_list))
    
    minx = np.min(x)
    maxx = np.max(x)
        
    
    tck = interpolate.splrep(x, y_list, s=0)

    xnew = np.arange(minx, maxx, 0.5)
    ynew = interpolate.splev(xnew, tck, der=0)
    yder2 = interpolate.splev(xnew, tck, der=1)
    
    fig, ax1 = plt.subplots()

    ax2 = ax1.twinx()
  
    ax1.plot(x, y_list, 'o', xnew, ynew, x, y_list, 'b')
    plt.title('Cubic-spline interpolation for ' + str(i) + ' line')

    ax2.plot(xnew, yder2)

    plt.show()
    i = i+1

 

 

 

Following the implementation of the aforementioned method, we are presented with the subsequent graph. This illustrates the original data points along with the derivative values:

technolog_3-1688143298483.png

An observation of the derivative reveals a considerable irregularity, which is primarily due to the non-uniform arrangement of data points. This irregularity may impose challenges for any conclusions to be drawn from these data in the future.

Hence, we aim to improve the initial code to perform data approximation instead of interpolation, and then compute the derivative from the resulting function.

 

 

import scipy as sp
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y = np.array([243, 265, 287, 313, 343, 373, 401, 435, 475, 519, 571, 630, 699])
y2 = np.array([244, 269, 291, 320, 350, 379, 406, 442, 482, 526, 578, 633, 703])
y3 = np.array([371, 379, 395, 411, 427, 445, 463, 485, 509, 533, 555, 583, 615])
y4 = np.array([374, 382, 399, 416, 433, 457, 482, 509, 533, 561, 587, 615, 645])
y5 = np.array([370, 382, 390, 398, 408, 420, 432, 446, 460, 476, 494, 514, 538])
y6 = np.array([370, 388, 398, 410, 424, 438, 452, 466, 482, 500, 518, 540, 564])
y7 = np.array([348, 378, 408, 454, 508, 566, 638, 720, 0  , 0  , 0  , 0  , 0  ])
y8 = np.array([351, 388, 430, 478, 528, 588, 660, 740, 0  , 0  , 0  , 0  , 0  ])
y9 = np.array([358, 378, 400, 424, 463, 502, 538, 584, 628, 680, 0  , 0  , 0  ])
y10 = np.array([363, 383, 402, 430, 469, 507, 545, 590, 633, 685, 0  , 0  , 0  ])
y11 = np.array([243, 269, 295, 320, 350, 383, 416, 453, 497, 545, 593, 662, 722])
y12 = np.array([243, 264, 287, 313, 339, 372, 405, 445, 486, 534, 582, 649, 711])

y_arr = np.concatenate((y,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12), axis = 0)
y_arr = np.reshape(y_arr, (12, 13))

i = 1

for y_list in y_arr:
    x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
    y = y_list
    
    y = y[y > 0]
    
    x = np.resize(x, len(y))
    
    
    minx = np.min(x)
    maxx = np.max(x)
    
    d=3 # степень полинома
    fp, residuals, rank, sv, rcond = np.polyfit(x, y, d, full=True) # Модель
    f = sp.poly1d(fp) # аппроксимирующая функция
    y1=[fp[0]*x[i]**2+fp[1]*x[i]+fp[2] for i in range(0,len(x))] # значения функции a*x**2+b*x+c
    so=round(sum([abs(y[i]-y1[i]) for i in range(0,len(x))])/(len(x)*sum(y))*100,4) # средняя ошибка
    fx = np.linspace(x[0], x[-1] + 1, len(x)) # можно установить вместо len(x) большее число для интерполяции

    y_list = f(fx)

    tck = interpolate.splrep(x, y_list, s=0)

    xnew = np.arange(minx, maxx, 0.5)
    ynew = interpolate.splev(xnew, tck, der=0)
    yder2 = interpolate.splev(xnew, tck, der=1)

    fig, ax1 = plt.subplots()

    ax2 = ax1.twinx()

    ax1.plot(x, y_list, 'o', xnew, ynew, x, y_list, 'b')
    plt.title('Approximation for ' + str(i) + ' line')
    ax2.plot(xnew, yder2)

    plt.show()
    i = i + 1

 

 

The result is displayed below:

technolog_0-1688159925403.png

As demonstrated, employing an approximation function results in a significantly smoother derivative of the function.

We proceed by examining the implementation for y8:

technolog_1-1688159958631.png

Upon inspection, we notice that excessive data along the x-axis is eliminated, with no attempt to transition to zero.

Subsequently, to continue working with the code and set up the necessary visualizations, we transfer the scripts into Power BI, as displayed below:

technolog_2-1688159992781.png

The contrast between the two approaches - interpolation and approximation - can be vividly discerned here.