Connect Jupyter to Kafka
β
A step-by-step guide to connecting Jupyter to Kafka through Streambased, so you can explore and analyze streaming data interactively with SQL.
β
Prerequisites
β
Install the following packages in your Python environment:
β
pip install jupyterlab
pip install jupysql
pip install sqlalchemy-trino
pip install pandas
β
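Before moving on, it can help to confirm that the packages are visible to the interpreter Jupyter will use. The sketch below is one way to do that with the standard library. The import names are assumptions: `sql` is the module that `%load_ext sql` loads from jupysql, and `trino` is assumed to be the driver module that sqlalchemy-trino brings in. `missing_packages` is a hypothetical helper, not part of any of these libraries.

```python
from importlib.util import find_spec

# pip package name -> module name to look for.
# The "sql" and "trino" mappings are assumptions; adjust them if your
# installed versions expose different module names.
REQUIRED_MODULES = {
    "jupyterlab": "jupyterlab",
    "jupysql": "sql",
    "sqlalchemy-trino": "trino",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing: " + ", ".join(missing))
else:
    print("All prerequisites found.")
```

Run it with the same Python environment that starts JupyterLab; a package installed into a different environment will show up as missing.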
Step 1: Start the notebook
β
Launch JupyterLab from a terminal, then open a new notebook:
β
jupyter lab
β
Step 2: Create a database engine
β
From your notebook, create a database engine with SQLAlchemy's create_engine:
β
from sqlalchemy.engine import create_engine
# Trino endpoint, "kafka" catalog, HTTPS, default schema "streambased"
engine = create_engine(
    "trino://streambased.cloud:8443/kafka",
    connect_args={"http_scheme": "https", "schema": "streambased"},
)
β
![](https://cdn.prod.website-files.com/66098d655e9084457b00d675/665f0e98e5fa1ddc688505f6_AD_4nXdTh7Xn-l0T7tP5h-5T71N2k7xW0CGhQCSCqbjLARXAAQr9vNlHvboV9sqHepC_JQvlXbHEHYzrzSAz41U2J2iCiGXj7Q1cncZLLAhJwz6tERIB7z3kseOTB0_m-4XM7DK2Fw5BmyJy5Is4h-GbcgxaOJ1x.png)
β
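To make the connection string in Step 2 easier to read, the following sketch splits it into its parts using only the standard library. `describe_trino_url` is a hypothetical helper written for illustration; it is not part of SQLAlchemy or the Trino client. It assumes the usual Trino URL layout in which the first path segment is the catalog and an optional second segment is the schema.

```python
from urllib.parse import urlsplit


def describe_trino_url(url):
    """Split a trino:// SQLAlchemy URL into host, port, catalog and schema."""
    parts = urlsplit(url)
    # The path holds "/<catalog>" and optionally "/<catalog>/<schema>".
    segments = [s for s in parts.path.split("/") if s]
    return {
        "dialect": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "catalog": segments[0] if segments else None,
        "schema": segments[1] if len(segments) > 1 else None,
    }


info = describe_trino_url("trino://streambased.cloud:8443/kafka")
print(info)
# {'dialect': 'trino', 'host': 'streambased.cloud', 'port': 8443, 'catalog': 'kafka', 'schema': None}
```

The schema comes back as `None` here because this guide supplies it through `connect_args` rather than in the URL.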
Step 3: Load the SQL extension
β
From your notebook load the SQL extension:
β
%load_ext sql
β
Step 4: Connect the SQL extension to the database
β
From your notebook, pass the engine created in Step 2 to the %sql magic:
β
%sql engine
β
Step 5: Run a query
β
Now we can run a query:
β
%sql SELECT * FROM demo_transactions
β
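A `SELECT *` over a table backed by a stream can return a large number of rows, so a bounded preview is often a better first query. The sketch below builds such a statement as a plain string. `preview_query` is a hypothetical helper, and the default limit of 10 is an arbitrary choice for illustration.

```python
import re

# Accept only simple, unquoted identifiers so the table name cannot
# smuggle extra SQL into the statement.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def preview_query(table, limit=10):
    """Build a SELECT that returns at most `limit` rows from `table`."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {table} LIMIT {limit}"


print(preview_query("demo_transactions"))
# SELECT * FROM demo_transactions LIMIT 10
```

The printed statement can be pasted after `%sql` in a notebook cell in place of the unbounded query.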
Step 6 (optional): Convert the result to a pandas DataFrame
β
Assign the query result to a variable and convert it to a pandas DataFrame:
β
transactions = %sql SELECT * FROM demo_transactions
df = transactions.DataFrame()
β
![](https://cdn.prod.website-files.com/66098d655e9084457b00d675/665f0f41cfbd6d00606f23ed_AD_4nXf24M63L9eyb4S47ioA5h_o_ekjL3fiTh-y2Vpwij4_9W2WEABNpC78O3Io_kwuWpl7Av0e56kh--UvdWqVTYcBwAmbopI7R2IWX51ckYEBohkLaCNG6bJylaMeRJP4xw6rWqOPp7UiPKyW1uB_W4AbO2c.png)
β
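Once the result is a DataFrame, ordinary pandas operations apply. The sketch below shows a first inspection and a simple aggregation. It builds a small stand-in frame so that it runs without a connection; the column names `account_id` and `amount` and all values are hypothetical, and the real `demo_transactions` schema may differ.

```python
import pandas as pd

# Stand-in for the DataFrame produced in Step 6. Columns and values are
# hypothetical and exist only to make the example runnable offline.
df = pd.DataFrame(
    {
        "account_id": ["a1", "a2", "a1", "a3"],
        "amount": [10.0, 25.5, 4.5, 60.0],
    }
)

# Inspect column types and the first rows.
print(df.dtypes)
print(df.head())

# Total amount per account, largest first.
totals = df.groupby("account_id")["amount"].sum().sort_values(ascending=False)
print(totals)
```

With the real data, drop the stand-in frame and run the same calls against the `df` created in Step 6, substituting the actual column names.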