teng1
Frequent Visitor

Get Notion databases into Power BI using a Python script

Dear,

 

I created a generic Python script that runs and returns results in a Python IDE for various databases (dynamic columns and data types). The script loops through a list of database ids and prints the results.

When I put that script into Power BI, it only imports the last result.

May I know how I can get the different databases into different datasets, using a loop or any other method?

 

my script:

 

import requests
import pandas as pd

# please use your own token
token ='secret_******'

payload_dname = {
    "filter": {
        "value": "database",
        "property": "object"
    },
    "page_size": 100
}


headers = {
    # the Notion API expects a Bearer token in the Authorization header
    "Authorization": f"Bearer {token}",
    "Notion-Version": "2022-02-22",
    "Content-Type": "application/json"
}


class NotionSync:
    def __init__(self):
        pass


    # search for databases shared with the integration
    def notion_search(self, integration_token=token):
        url = "https://api.notion.com/v1/search"
        response = requests.post(url, json=payload_dname, headers=headers)

        if response.status_code != 200:
            return 'Error: ' + str(response.status_code)
        return response.json()

    # query database rows (pages) by database id
    def notion_db_details(self, database_id, integration_token=token):
        url = "https://api.notion.com/v1/databases/" + database_id + "/query"
        response = requests.post(url, headers=headers)

        if response.status_code != 200:
            return 'Error: ' + str(response.status_code)
        return response.json()

    # to get databases id and name
    def get_databases(self,data_json):
        databaseinfo = {}
        databaseinfo["database_id"] = [data_json["results"][i]["id"].replace("-","")
                                                for i in range(len(data_json["results"])) ]

        databaseinfo["database_name"] = [data_json["results"][i]["title"][0]["plain_text"]
                                                  if data_json["results"][i]["title"]
                                                  else ""
                                                  for i in range(len(data_json["results"])) ]

        databaseinfo["url"] = [ data_json["results"][i]["url"]
                                         if data_json["results"][i]["url"]
                                         else ""
                                         for i in range(len(data_json["results"])) ]
        return databaseinfo


    # to get column title of the table
    def get_tablecol_titles(self,data_json):
        return list(data_json["results"][0]["properties"].keys())
    
    # to get each column's data type; the JSON structure differs by column type
    def get_tablecol_type(self,data_json,columns_title):
        type_data = {}
        for t in columns_title:
            type_data[t] = data_json["results"][0]["properties"][t]["type"]
        return type_data

    # to get table data by column type
    def get_table_data(self,data_json,columns_type):
        table_data = {}
        for k, v in columns_type.items():
            # to check column type and process by type
            if v in ["checkbox","number","email","phone_number"]:
                table_data[k] = [ data_json["results"][i]["properties"][k][v]
                                    if data_json["results"][i]["properties"][k][v]
                                    else ""
                                    for i in range(len(data_json["results"]))]
            elif v == "date":
                table_data[k] = [ data_json["results"][i]["properties"][k]["date"]["start"]
                                    if data_json["results"][i]["properties"][k]["date"]
                                    else ""
                                    for i in range(len(data_json["results"])) ]
            elif v == "rich_text" or v == 'title':
                table_data[k] = [ data_json["results"][i]["properties"][k][v][0]["plain_text"]
                                    if data_json["results"][i]["properties"][k][v]
                                    else ""
                                    for i in range(len(data_json["results"])) ]
            elif v == "files":
                table_data[k + "_FileName"] = [ data_json["results"][i]["properties"][k][v][0]["name"]
                                                if data_json["results"][i]["properties"][k][v]
                                                else ""
                                                for i in range(len(data_json["results"])) ]
                table_data[k + "_FileUrl"] = [ data_json["results"][i]["properties"][k][v][0]["file"]["url"]
                                                if data_json["results"][i]["properties"][k][v]
                                                else ""
                                                for i in range(len(data_json["results"])) ]
            elif v == "select":
                table_data[k] = [data_json["results"][i]["properties"][k][v]["name"]
                                    if data_json["results"][i]["properties"][k][v]
                                    else ""
                                    for i in range(len(data_json["results"]))]
            elif v == "people":
                table_data[k + "_Name"] = [ [data_json["results"][i]["properties"][k][v][j]["name"]
                                                if data_json["results"][i]["properties"][k][v]
                                                # to check if key 'name' exists in the list
                                                and "name" in data_json["results"][i]["properties"][k][v][j].keys()
                                                else ""
                                                for j in range(len(data_json["results"][i]["properties"][k][v]))]
                                                for i in range(len(data_json["results"])) ]
            elif v == "multi_select":
                table_data[k] = [ [data_json["results"][i]["properties"][k][v][j]["name"]
                                  if data_json["results"][i]["properties"][k][v]
                                  else ""
                                  for j in range(len(data_json["results"][i]["properties"][k][v]))]
                                  for i in range(len(data_json["results"])) 
                                ]

        return table_data    


if __name__=='__main__':
    nsync = NotionSync()

    # to search all databases.
    data = nsync.notion_search()

    # to get database id and name.
    dbid_name = nsync.get_databases(data)

    #convert dictionary to dataframe.
    df = pd.DataFrame.from_dict(dbid_name)

    # convert to bool and then drop records with an empty database name.
    df = df[df['database_name'].astype(bool)]
    print (df)


    # to loop through database id and get the database details.
    for d in dbid_name["database_id"]:
        # Notion provides a separate API to query a database's contents by database id; the search API does not return them.
        dbdetails = nsync.notion_db_details(d)

        # get column title
        columns_title = nsync.get_tablecol_titles(dbdetails)

        # get column type
        columns_type = nsync.get_tablecol_type(dbdetails,columns_title)

        # get table data
        table_data = nsync.get_table_data(dbdetails,columns_type)

        #convert dictionary to dataframe
        df1 = pd.DataFrame.from_dict(table_data)
        print (df1)
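One caveat not handled in the script above: the Notion search and query endpoints return at most 100 results per call and paginate via `has_more`/`next_cursor`. A minimal paging sketch, assuming the same `headers` and payload shape as in the script (the helper name `query_all` is my own):

```python
import requests

# paging sketch for Notion's POST endpoints (search and database query);
# url, headers and payload are assumed to match the script above
def query_all(url, headers, payload=None):
    results = []
    cursor = None
    while True:
        body = dict(payload or {})
        if cursor:
            body["start_cursor"] = cursor  # resume where the previous page ended
        response = requests.post(url, json=body, headers=headers)
        response.raise_for_status()
        data = response.json()
        results.extend(data["results"])
        if not data.get("has_more"):
            return results
        cursor = data.get("next_cursor")
```

With this, `nsync.notion_db_details(d)` could be replaced by a `query_all(...)` call so databases with more than 100 rows come back complete.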

Thanks and regards

1 ACCEPTED SOLUTION
teng1
Frequent Visitor

I kind of found the answer. Power BI imports each database's results as a separate dataset based on the variable name. In other words, change the variable name inside the loop for each database:

 

this

#convert dictionary to dataframe
df1 = pd.DataFrame.from_dict(table_data)
print (df1)

change to

#convert dictionary to dataframe
globals()[f"df{d}"] = pd.DataFrame.from_dict(table_data)
print (globals()[f"df{d}"])

 

The change above imports the different databases into different datasets when the script runs for the first time. However, if new databases are added to the Notion page after that first run, they will not be picked up automatically as new datasets; simply clicking the refresh button will not add them.
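If you want friendlier dataset names than `df<database_id>`, one option (a sketch of my own, not from the original post) is to derive the variable name from the database title instead, keeping only identifier-safe characters. The helper name `safe_varname` and the sample values are assumptions for illustration:

```python
import re
import pandas as pd

def safe_varname(name, fallback):
    # keep only letters, digits and underscores so the generated name is a valid identifier
    cleaned = re.sub(r"\W+", "_", name).strip("_")
    return cleaned or fallback

# inside the loop, assuming db_name holds the matching entry of dbid_name["database_name"]:
db_name = "My Tasks DB"            # placeholder value for illustration
table_data = {"Name": ["a", "b"]}  # placeholder table data
globals()[f"df_{safe_varname(db_name, 'unnamed')}"] = pd.DataFrame.from_dict(table_data)
```

This would surface a dataset named `df_My_Tasks_DB` in Power BI rather than one keyed by the opaque database id.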


2 REPLIES

amitchandak
Super User

@teng1 , the way you are doing it, you can add one table at a time. You need to explore the Power BI cmdlets, the Power BI REST APIs, and the Power BI XMLA endpoint:

 

https://docs.microsoft.com/en-us/power-bi/enterprise/service-premium-connect-tools

https://docs.microsoft.com/en-us/powershell/power-bi/overview?view=powerbi-ps
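As an illustration of the REST API route, here is a sketch of listing datasets via `GET /v1.0/myorg/datasets`. It assumes you have already acquired an Azure AD access token (token acquisition is not shown); the helper name `list_datasets` is my own:

```python
import requests

def list_datasets(access_token, base_url="https://api.powerbi.com/v1.0/myorg"):
    # GET /datasets returns the datasets in "My workspace" for the signed-in user
    response = requests.get(
        f"{base_url}/datasets",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    response.raise_for_status()
    # the REST API wraps collections in a "value" array
    return response.json()["value"]
```

The same API family also exposes push datasets and row operations, which is what you would need to create new datasets programmatically as new Notion databases appear.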
