Anonymous
Not applicable

Python script fails when the regex contains "|" or "*"

Thanks in advance for your help.

 

I wrote a regex-based script to extract timestamp information from free text.

 

# 'dataset' holds the input data for this script

regex = r'(((\d{1})|(\d{2}))/((\d{1})|(\d{2}))/((\d{2})|(\d{4}))\s\d{2}:\d{2}:\d{2}(\s[aAPp][mM])*)'
dataset['Timestamp'] = dataset['description'].str.extract(regex)

 

The regex is valid, but the script fails with the error messages below. The same code works with a simpler regex that does not contain "|" or "*".

 

DataSource.Error: ADO.NET: Python script error.
Traceback (most recent call last):
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Timestamp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1069, in set
loc = self.items.get_loc(item)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Timestamp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 14, in <module>
dataset['Timestamp'] = dataset['description'].str.extract(regex)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 3472, in __setitem__
self._set_item(key, value)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 3550, in _set_item
NDFrame._set_item(self, key, value)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 3381, in _set_item
self._data.set(key, value)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1072, in set
self.insert(len(self.items), item, value)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1181, in insert
block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\blocks.py", line 3267, in make_block
return klass(values, ndim=ndim, placement=placement)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\blocks.py", line 2775, in __init__
super().__init__(values, ndim=ndim, placement=placement)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\blocks.py", line 128, in __init__
"{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs))
ValueError: Wrong number of items passed 11, placement implies 1

Details:
DataSourceKind=Python
DataSourcePath=Python
Message=Python script error.
Traceback (most recent call last):
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Timestamp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1069, in set
loc = self.items.get_loc(item)
File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\p...
ErrorCode=-2147467259
ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonScriptRuntimeException
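
One possible reading of the final ValueError above: pandas `str.extract` returns one column per capture group, and the pattern in the question has 11 pairs of capturing parentheses, which matches the "11" in "Wrong number of items passed 11, placement implies 1". The sketch below counts the groups and then rewrites the pattern so that only the outer parentheses capture. The `single_group` pattern and the sample `dataset` rows are illustrative assumptions, not taken from the original post; in Power BI, `dataset` is supplied by the script wrapper.

```python
import re
import pandas as pd

# Pattern from the question: every (...) is a capture group.
original = r'(((\d{1})|(\d{2}))/((\d{1})|(\d{2}))/((\d{2})|(\d{4}))\s\d{2}:\d{2}:\d{2}(\s[aAPp][mM])*)'
n_groups = re.compile(original).groups  # number of capture groups in the pattern

# Assumed rewrite: only the outer parentheses capture; inner groups use (?:...).
single_group = r'(\d{1,2}/\d{1,2}/(?:\d{2}|\d{4})\s\d{2}:\d{2}:\d{2}(?:\s[aAPp][mM])*)'

# Hypothetical stand-in for the Power BI input table.
dataset = pd.DataFrame({'description': [
    'Logged 3/14/2019 09:15:27 PM by admin',
    'Created 12/1/19 23:59:59',
    'no timestamp here',
]})

# With exactly one capture group, expand=False returns a Series,
# which can be assigned to a single column.
dataset['Timestamp'] = dataset['description'].str.extract(single_group, expand=False)
print(n_groups)
print(dataset)
```

Rows without a match get NaN in the new column. If the behaviour of the rewritten pattern needs to be checked against real data, it is worth comparing it with the first column of the original pattern's output on a sample.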

v-piga-msft
Resident Rockstar

Hi @Anonymous ,

I'm not very familiar with Python. Are you sure your Python script is correct?

I tested the script below in Query Editor and got no errors, so it seems that a Python script does work with a regex containing "|" or "*".

The blog explains how to use "|" and "*" in a Python script.

 

# 'dataset' holds the input data for this script
import re
pattern = '^a|b..cd*n$'
test_string = 'abycdn'
result = re.match(pattern, test_string)
if result:
  print("Search successful.")
else:
  print("Search unsuccessful.")

Please check your script again. If it is completely correct and you still have the problem in Power BI, please let me know.

 

Best regards,

Cherry

 

Community Support Team _ Cherry Gao
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
Anonymous
Not applicable

Hi @v-piga-msft  Cherry,

 

Thanks for looking into my issue. The Python script works correctly if I use it with a simple regex pattern, like the one below.

 

# 'dataset' holds the input data for this script

regex = r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})'
dataset['Timestamp'] = dataset['description'].str.extract(regex)

 

So it only fails when the regex becomes more complex. Does Power BI offer a native regex-based filter option? If so, the Python script would not really be necessary.
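
A difference between the two patterns that may explain this: the simple one has a single pair of parentheses, while the complex one nests many, and `str.extract` produces one output column per pair. The sketch below compares the output shapes on two made-up rows (the `sample` frame is an assumption for illustration) and shows one way to keep the complex pattern unchanged by selecting only column 0, which holds the outermost group.

```python
import pandas as pd

simple = r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})'
complex_ = r'(((\d{1})|(\d{2}))/((\d{1})|(\d{2}))/((\d{2})|(\d{4}))\s\d{2}:\d{2}:\d{2}(\s[aAPp][mM])*)'

# Hypothetical sample rows, one per timestamp style.
sample = pd.DataFrame({'description': [
    'Updated 2019-03-14 09:15:27 ok',
    'Logged 3/14/2019 09:15:27 PM by admin',
]})

simple_out = sample['description'].str.extract(simple)
complex_out = sample['description'].str.extract(complex_)
print(simple_out.shape)   # one column: fits a single-column assignment
print(complex_out.shape)  # one column per capture group

# Column 0 corresponds to the outermost group, i.e. the whole timestamp.
sample['Timestamp'] = complex_out[0]
print(sample)
```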

Hi @Anonymous @v-piga-msft,

 

I am having the same issue using a Python script with regular expressions in Power BI. Did you find a solution to run the script? Please let me know. You can also reply to my post here. Thank you!
