cancel
Showing results for 
Search instead for 
Did you mean: 
stevedep

Generate sample data

Hi,

 

In this article, I will show how to generate sample data and how to add this to your data model. 

The result is a sales table and related dimensions:

stevedep_1-1614058299300.png

stevedep_2-1614058326565.png

This is the code to generate the sales table:

 

Sales = 
VAR _tbl =
    SELECTCOLUMNS (
        CROSSJOIN ( ROW ( "test", DATE ( 2020, 10, 3 ) ), GENERATESERIES ( 1, 60, 1 ) ),
        "StartDate", [test],
        "Increment", [Value]
    )
VAR _dates =
    ADDCOLUMNS ( _tbl, "NewDate", [StartDate] + [Increment] )
RETURN
    GENERATE (
        SELECTCOLUMNS (
            { "Vendor 1", "Vendor 2", "Vendor 3", "Vendor 4", "Vendor 5" },
            "Vendor", [Value]
        ),
        SELECTCOLUMNS (
            ADDCOLUMNS (
                CROSSJOIN ( _dates, GENERATESERIES ( 1, 10, 1 ) ),
                "open_time",
                    [NewDate]
                        + NORM.INV ( RAND (), 0.5 + ( [Increment] / 100 ), 0.08 ),
                "PRODUCT",
                    CONVERT ( UNICHAR ( RANDBETWEEN ( 65, 90 ) ), STRING )
                        & CONVERT ( UNICHAR ( RANDBETWEEN ( 65, 90 ) ), STRING )
                        & CONVERT ( UNICHAR ( RANDBETWEEN ( 65, 90 ) ), STRING ),
                "sales_amount", NORM.INV ( RAND (), [Value], [Value] / 10 )
            ),
            "sales_date_time", [open_time],
            "PRODUCT", [PRODUCT],
            "sales_amount", [sales_amount]
        )
    )

 

 

You can add the above code by navigating to the ribbon in Power BI, select 'Modelling' to then select 'New Table'. Paste the above code in the editor. 

stevedep_4-1614059464022.png

 

Explanation

In the next part, I will break down the working of the code. 

 

First, we generate a series of dates:

stevedep_3-1614058549862.png

We started with a default date and added an incremental column, that column sets the number of days that we will add to the default data. 

 

Next is adding vendors:

 

     SELECTCOLUMNS (
            { "Vendor 1", "Vendor 2", "Vendor 3", "Vendor 4", "Vendor 5" },
            "Vendor", [Value]

 

For each vendor, all the dates (generated above) are added and a timestamp is added:

 

  SELECTCOLUMNS (
            ADDCOLUMNS (
                CROSSJOIN ( _dates, GENERATESERIES ( 1, 10, 1 ) ),
                "open_time",
                    [NewDate]
                        + NORM.INV ( RAND (), 0.5 + ( [Increment] / 100 ), 0.08 ),

 

A product is added by randomly selected characters from the UNICHAR array:

 

 "PRODUCT",
                    CONVERT ( UNICHAR ( RANDBETWEEN ( 65, 90 ) ), STRING )
                        & CONVERT ( UNICHAR ( RANDBETWEEN ( 65, 90 ) ), STRING )
                        & CONVERT ( UNICHAR ( RANDBETWEEN ( 65, 90 ) ), STRING ),

 

A sales amount is added:

 

      "sales_amount", NORM.INV ( RAND (), [Value], [Value] / 10 )

 

Finally, the required columns are added:

 

            "sales_date_time", [open_time],
            "PRODUCT", [PRODUCT],
            "sales_amount", [sales_amount]
        )

 

With the above steps, we have created the fact table. With that table in place we can add the dimension tables by simply adding a table with DAX like below:

 

dimProducts = VALUES(Sales[PRODUCT])

 

As a final step, we add the relationships in the data model. That is hopefully on familiar grounds. 

 

You can have fun with generating data and creating art from it as well 🙂 , see below:

stevedep_5-1614060078299.png

The above image is from a response that I wrote when helping out somebody with a DAX challange. 

 

Hope that you find it useful. An example is attached.

 

Kind regards, 

Steve. 

 

 

Polls
What is your favorite Power BI Feature release this month?