Moving average over non-numeric values (correct er...

strachi · 03-20-2018

Hi,

How can I smooth string values in a column?

I have time series data (timestamp; string) with some errors or gaps in it:

timestamp;string
1521585642;a
1521585643;a
1521585644;
1521585645;a
1521585646;a
1521585647;x
1521585648;a
1521585649;a
1521585650;a

I would like to fill the gap and replace the error ("x") with the values in its proximity (let's say the most frequent value among the previous 2 and the next 2 values). You could call this a moving average for strings. In this simple example, the result would be "a" in every row of the string column.

I feel like this comes close, but of course MAXA does not work with strings:

Smooth = 
CALCULATE (
    CALCULATE (
        MAXA( 'timeseries'[string] );
        'timeseries'[Datetime]
            >= VALUES ( 'timeseries'[Datetime] ) - 4 ;
        'timeseries'[Datetime] <= VALUES ( 'timeseries'[Datetime] )
    );
    ALLEXCEPT ( 'timeseries'; 'timeseries'[Tag];'timeseries'[Logfile];'timeseries'[Datetime] )
)

Any ideas would be greatly appreciated; I was not able to find a solution.

v-jiascu-msft · 03-22-2018

Hi @strachi,

Can you share a complete sample please? I can't convert the "timestamp" into a time or a date.

Best Regards,

Dale

Community Support Team _ Dale

strachi · 03-24-2018

Hi @v-jiascu-msft, thanks for your reply.

In fact, we can simplify this further. The "timestamp" does not matter here; the first column only indicates the order of the time series data. You can think of it as an ordered index.
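If the real data only has raw timestamps with irregular spacing, a gap-free position column can be derived first, so that "two values before/after" means two rows rather than two seconds. A minimal sketch as a DAX calculated column, written with the semicolon separators used in this thread and assuming a single series in a table called Table1 (the name used later in this thread) with a [timestamp] column; if one table holds several series (e.g. per Tag and Logfile), the ranking would have to be restricted to each series.

```dax
// Calculated column on Table1 (assumed table and column names).
// Numbers the rows by timestamp: earliest timestamp -> 1, next -> 2, ...
// The empty third argument makes RANKX rank the current row's own timestamp;
// DENSE gives equal timestamps the same number without leaving gaps.
Index =
RANKX ( Table1; Table1[timestamp]; ; ASC; DENSE )
```

For the nine sample timestamps above, which are distinct and ascending, this yields 1 to 9, i.e. the same ordering as the simplified index.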

Source:

timestamp;string
1;a
2;a
3;(blank)
4;a
5;a
6;x
7;a
8;a
9;a

Result I am looking for:

timestamp;string
1;a
2;a
3;a
4;a
5;a
6;a
7;a
8;a
9;a

The "blank" and the "x" are errors to be identified by looking at the previous and following values in the series. They should be replaced by the most frequent value "in the neighbourhood".

Thank you for giving it another thought.

strachi · 03-29-2018

Sorry to push here... any ideas? @v-jiascu-msft

strachi · 04-03-2018

I am trying to use this to narrow down the strings in proximity to the data gap...

FILTER(Table1;Table1[Index]<=EARLIER(Table1[Index])+1 && Table1[Index]>=EARLIER(Table1[Index])-1)
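The filter above keeps the current row plus one row on each side. To see which neighbours a window actually picks up, it can help to list them in a temporary column. A sketch, assuming Table1 has an integer [Index] column; it uses a variable instead of EARLIER (equivalent here) and widens the window to two rows on each side, as in the first post. The column name Window Values is hypothetical.

```dax
// Debugging helper (hypothetical column name): lists the strings of the rows
// Index-2 .. Index+2, separated by "|" and ordered by Index.
Window Values =
VAR CurrentIndex = Table1[Index]  // captures the current row, like EARLIER
RETURN
    CONCATENATEX (
        FILTER (
            Table1;
            Table1[Index] >= CurrentIndex - 2
                && Table1[Index] <= CurrentIndex + 2
        );
        Table1[string];
        "|";
        Table1[Index]; ASC
    )
```

Comparing this column with [string] makes it easy to confirm the window boundaries before adding any counting logic.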

I guess this could help me, but I have not managed to put it together in a calculated column:

https://community.powerbi.com/t5/Desktop/How-to-obtain-the-most-common-value-from-a-column-and-displ...

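Most Frequent String =
FIRSTNONBLANK (
    TOPN (
        1;
        VALUES ( Table1[string] );
        // CALCULATE forces context transition, so each string is counted on its own
        // (without it, COUNTROWS returns the same total for every string)
        RANKX( ALL( Table1[string] ); CALCULATE ( COUNTROWS ( Table1 ) );;ASC)
    );
    1
)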
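One way to combine the window and the "most frequent value" idea in a single calculated column is sketched below. It assumes Table1 has an integer [Index] column and a [string] column, uses a window of two rows before and after the current row (the current row included), ignores blanks, and breaks ties alphabetically; these are choices made for the sketch, not requirements stated in the thread. It keeps the thread's semicolon separators (use commas in a comma locale), and the column name Smoothed is hypothetical.

```dax
// Calculated column on Table1 (assumed names: Table1[Index], Table1[string]).
// Returns the most frequent non-blank string among rows Index-2 .. Index+2.
Smoothed =
VAR CurrentIndex = Table1[Index]
// Neighbourhood of the current row. Comparing with "" drops both BLANK and
// empty strings, because in DAX BLANK compares equal to "".
VAR Neighbourhood =
    FILTER (
        Table1;
        Table1[Index] >= CurrentIndex - 2
            && Table1[Index] <= CurrentIndex + 2
            && Table1[string] <> ""
    )
// One row per distinct string with its number of occurrences in the window.
VAR Counts =
    GROUPBY (
        Neighbourhood;
        Table1[string];
        "Occurrences"; SUMX ( CURRENTGROUP (); 1 )
    )
// Highest count first; on a tie the alphabetically first string wins,
// so TOPN returns exactly one row (or none if the window is all blank).
VAR MostFrequent =
    TOPN ( 1; Counts; [Occurrences]; DESC; Table1[string]; ASC )
RETURN
    MAXX ( MostFrequent; Table1[string] )
```

On the sample data, row 3 sees a, a, a, a (the blank is ignored) and row 6 sees a, a, x, a, a, so both become "a", as do all other rows. If every value in a window is blank, the column returns BLANK. GROUPBY with CURRENTGROUP counts the window rows directly, without CALCULATE; in a calculated column, CALCULATE would also turn the row being computed into a filter (context transition), which makes window counts easy to get wrong.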

Anyone?
