Helper I

IndexError: index 1323169739 is out of bounds for axis 0 with size 1322985896

Hello all, 

 

I'm fairly new to Power BI and I'm having trouble running a Python script.

I get an error in Power Query when the script runs, and I haven't found a solution that works.

I previously used the latest version of Python (3.9), but I'm currently on version 3.6 because I suspect it may be more compatible with Power BI. The packages used in the code are installed and imported.

 

My code: 

import pandas as pd

pd.crosstab(index=dataset['clientID'], columns=dataset['productID'])

 

The error:

DataSource.Error: ADO.NET: Python script error.

Traceback (most recent call last):

  File "PythonScriptWrapper.PY", line 15, in <module>

    pd.crosstab(index=dataset['clientID'], columns=dataset['productID'])

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\pivot.py", line 577, in crosstab

    **kwargs

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\frame.py", line 6089, in pivot_table

    observed=observed,

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\pivot.py", line 127, in pivot_table

    table = agged.unstack(to_unstack)

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\frame.py", line 6393, in unstack

    return unstack(self, level, fill_value)

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\reshape.py", line 412, in unstack

    return _unstack_frame(obj, level, fill_value=fill_value)

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\reshape.py", line 442, in _unstack_frame

    constructor=obj._constructor,

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\reshape.py", line 142, in __init__

    self._make_selectors()

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\reshape.py", line 177, in _make_selectors

    mask.put(selector, True)

IndexError: index 1323169739 is out of bounds for axis 0 with size 1322985896

 

Details:

    DataSourceKind=Python

    DataSourcePath=Python

    Message=Python script error.

Traceback (most recent call last):

  File "PythonScriptWrapper.PY", line 15, in <module>

    pd.crosstab(index=dataset['clientID'], columns=dataset['productID'])

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\pivot.py", line 577, in crosstab

    **kwargs

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\frame.py", line 6089, in pivot_table

    observed=observed,

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\pivot.py", line 127, in pivot_table

    table = agged.unstack(to_unstack)

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\frame.py", line 6393, in unstack

    return unstack(self, level, fill_value)

  File "C:\USERS\...\PYTHON36\lib\site-packages\pandas\core\reshape\reshape.py", line 412, in un...

    ErrorCode=-2147467259

    ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonScriptRuntimeException

 

 

I would appreciate it if someone could help me.

4 REPLIES
Helper I

Thank you for your answer. 

If pandas crosstab has a limit, I cannot find it... 

Both columns have about 1,263,000 rows. clientID has almost 40,000 unique values and productID has more than 460,000 unique values.

So you should expect a crosstab with 18.4 billion cells.  A bit rich.
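
For reference, the 18.4 billion figure follows directly from the unique counts quoted above. The sketch below reproduces the arithmetic and adds a rough memory estimate. The counts are the approximate values from the post, not exact ones, and the 8 bytes per cell is an assumption (a dense table of 64-bit integers).

```python
# Approximate unique counts quoted in the thread (not exact values).
n_clients = 40_000
n_products = 460_000

# A crosstab has one cell per (clientID, productID) combination.
cells = n_clients * n_products

# Assumption: a dense table storing one 64-bit integer (8 bytes) per cell.
bytes_per_cell = 8
approx_gb = cells * bytes_per_cell / 1e9

# The cell count is also far beyond the largest signed 32-bit integer.
int32_max = 2**31 - 1
exceeds_int32 = cells > int32_max

print(f"cells: {cells:,}")
print(f"approx. memory: {approx_gb:.1f} GB")
print(f"exceeds int32 range: {exceeds_int32}")
```

Whatever the exact cause of the IndexError, a dense table of this size is unlikely to fit in memory on a typical desktop machine.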

I have already tried with a sample of the dataset and I think it works. However, when the Python script runs, a decimal place is added to productID: for example, "12345" becomes "12345.0". I tried to round it in Python, but it reports that the value is a string, so I don't understand where the decimal comes from. I think it happens before crosstab is applied, because the change is already present in the input dataset (the one created after running the script).
Also, when I apply the Power Query changes, it returns this error: Fail to save modifications on the server. Error returned: 'The SUM function only accepts column reference as the argument number 1. The '215965.0' column does not exist in the rowset. ' .
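
One possible workaround for the "12345.0" values is to strip the trailing ".0" from the text before building the crosstab. This is only a sketch: the small DataFrame below is a hypothetical stand-in for the `dataset` that Power BI passes to the script, and it assumes the IDs really do arrive as strings, as the error about rounding suggests. Setting the column type to Text in Power Query before the Python step may also avoid the problem, but that is untested here.

```python
import pandas as pd

# Hypothetical sample standing in for the `dataset` DataFrame from Power BI.
dataset = pd.DataFrame({
    "clientID": ["1", "1", "2"],
    "productID": ["12345.0", "215965.0", "12345.0"],
})

# Remove a trailing ".0" from each ID; values without it are left unchanged.
dataset["productID"] = (
    dataset["productID"].astype(str).str.replace(r"\.0$", "", regex=True)
)

# The crosstab column labels are now the cleaned IDs.
result = pd.crosstab(index=dataset["clientID"], columns=dataset["productID"])
print(result)
```

Because the crosstab turns each productID into a column name, any change in how the IDs are formatted also changes the column names, which would explain a later step failing to find a column such as '215965.0'.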

Super User III

What is the cardinality of your clientID and productID columns? It looks like you are hitting a pandas crosstab limit. You might want to try with a smaller dataset.
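
A quick way to check the cardinality and try a smaller dataset is sketched below. The sample DataFrame and the `top_n` cutoff are hypothetical; in Power BI the `dataset` DataFrame is supplied to the script, and keeping only the most frequent products is just one possible way to reduce the size.

```python
import pandas as pd

# Hypothetical sample standing in for the `dataset` DataFrame from Power BI.
dataset = pd.DataFrame({
    "clientID": [1, 1, 2, 3, 3, 3],
    "productID": [10, 20, 10, 10, 30, 20],
})

# Cardinality: the number of distinct values in each column.
n_clients = dataset["clientID"].nunique()
n_products = dataset["productID"].nunique()
estimated_cells = n_clients * n_products
print(f"{n_clients} clients x {n_products} products = {estimated_cells} cells")

# Shrink the problem by keeping only the top_n most frequent products.
top_n = 2  # hypothetical cutoff
top_products = dataset["productID"].value_counts().nlargest(top_n).index
subset = dataset[dataset["productID"].isin(top_products)]

small_table = pd.crosstab(index=subset["clientID"], columns=subset["productID"])
print(small_table)
```

If the estimated cell count is still very large after filtering, a long-format count (one row per clientID and productID pair) may be a better fit for Power BI than a wide crosstab.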
