Storage Access
Storage Access allows your Data App to read data from and write data back to Keboola Storage tables in real-time. Your app connects directly to Keboola’s storage through Query Service via SQL, enabling:
- Real-time data access: Always work with the latest data, no redeployment needed
- Write-back capability: Update, insert into existing Storage tables directly from your app
- Interactive applications: Build data entry forms, approval workflows, and collaborative tools
This feature is available for both Streamlit and Python/JS Data Apps. Code examples on this page use Python; the same concepts apply when calling the Query Service API from JavaScript.
When to Use Storage Access
Section titled “When to Use Storage Access”Use Storage Access when you need to
Section titled “Use Storage Access when you need to”- Build interactive data entry or editing applications
- Work with large datasets more efficiently - no need to load them via input mapping
- Enable business users to update data directly from the app
Stick with Input Mapping when
Section titled “Stick with Input Mapping when”- You don’t need write-back capability — the app only reads and displays data.
- Your dataset is small and changes infrequently (e.g., static reference data loaded at deploy time).
- You want the simplest possible setup with no additional configuration.
How It Works
Section titled “How It Works”Architecture Overview
Section titled “Architecture Overview”When you enable Storage Access, Keboola creates a dedicated workspace for your Data App. This workspace contains a database user with specific permissions (SELECT, INSERT, UPDATE, DELETE, TRUNCATE) on the tables you’ve selected.
Your Data App │ ▼Query Service ────► Workspace User ────► Storage Tables │ │ │ │ └── Handles authentication, └── Your selected tables billing, metadata refresh with granted permissionsYour app communicates with Storage through the Query Service API, not directly with Snowflake. This provides:
- Automatic authentication using your app’s token
- Usage tracking for billing
- Automatic metadata refresh after writes
- Abstraction from the underlying backend
The recommended Python client library is keboola-query-service (also available for JavaScript/TypeScript as @keboola/query-service).
Workspace Lifecycle
Section titled “Workspace Lifecycle”The workspace is ephemeral - a fresh workspace is created each time your app starts (including wake-up from sleep):
| Event | Workspace Action |
|---|---|
| App deploys | New workspace created |
| App wakes from sleep | New workspace created (old one deleted) |
| App redeployed | New workspace created (old one deleted) |
| App deleted | Workspace deleted |
This design ensures:
- Permission changes take effect on next app start
- No stale credentials or connections
- Clean isolation between app runs
Setting Up Storage Access
Section titled “Setting Up Storage Access”Step 1: Enable Storage Access
Section titled “Step 1: Enable Storage Access”- Go to the Project Settings.
- Go to the Features.
- Find the Storage Access feature and activate it.
Step 2: Configure Writable Tables
Section titled “Step 2: Configure Writable Tables”- Open your Data App configuration in Keboola.
- Go to the Advanced Settings tab.
- Find the Storage Access section.
- Click + Add Writable Table.
- Select a bucket and table from Storage.
- For each table, the app will have SELECT, INSERT, UPDATE, DELETE, and TRUNCATE permissions.
- You can add multiple tables from different buckets.
- All selected tables must exist before deploying.
Configuring Writable Tables Programmatically
Section titled “Configuring Writable Tables Programmatically”If you manage Data App configurations through the Storage API (or via automation/agents) rather than the UI, the same writable-table selection is expressed in the configuration JSON under storage.output.tables. Each entry is a table the app gets read/write permissions on:
{ "storage": { "output": { "tables": [ { "destination": "out.c-data-app.mvc-crashes", "unload_strategy": "direct-grant" } ] } }}destination— the full Storage table ID (<stage>.<bucket>.<table>) the app should be able to read and write. The table must exist before the app is deployed.unload_strategy: "direct-grant"— required marker that tells the platform “grant the app’s workspace direct SELECT/INSERT/UPDATE/DELETE/TRUNCATE on this table.” Tables without this strategy instorage.output.tablesare not exposed via Storage Access.
To add or remove writable tables programmatically, update the Data App’s configuration via the Storage API (Component Configurations endpoint) and redeploy the app for the new permissions to take effect.
Step 3: Deploy Your App
Section titled “Step 3: Deploy Your App”Click Deploy (or Redeploy for existing apps). During deployment:
- Keboola creates a new workspace with your selected table permissions.
- The workspace ID is passed to your app as an environment variable.
- Your app code can now use the Query Service to read and write data.
Reading Data from Storage
Section titled “Reading Data from Storage”Using the Query Service Client
Section titled “Using the Query Service Client”Install the Keboola Query Service client:
In pyproject.toml (Python):
dependencies = [ "keboola-query-service>=0.2.0",]In your Python code:
import jsonimport osfrom keboola_query_service import Client
# Storage Access config is set by the platform when the feature is enabled.# workspace_id is read from the manifest file (recommended); the other values# are plain env vars.try: branch_id = os.environ["BRANCH_ID"] query_service_url = os.environ["QUERY_SERVICE_URL"] with open(os.environ["KBC_WORKSPACE_MANIFEST_PATH"]) as f: workspace_id = json.load(f)["workspaceId"]except (KeyError, FileNotFoundError) as e: raise RuntimeError( "Storage Access is not enabled for this app. " "Enable it in Advanced Settings and redeploy." ) from e
# Initialize the Query Service clientclient = Client( base_url=query_service_url, token=os.environ["KBC_TOKEN"],)Reading Selected Tables
Section titled “Reading Selected Tables”To read a table you’ve selected in the UI:
import pandas as pd
# Query a table - use the full table ID (bucket.table)results = client.execute_query( branch_id=branch_id, workspace_id=workspace_id, statements=['SELECT * FROM "in.c-main"."customers" LIMIT 1000'],)
# One QueryResult per statement — we sent one statement, so take the first.result = results[0]
# Convert to DataFramedf = pd.DataFrame(result.data, columns=[c.name for c in result.columns])print(df.head())Table naming convention
Section titled “Table naming convention”- Use the full table ID in quotes:
"bucket_stage.bucket_name"."table_name" - Example:
"in.c-sales"."orders"for a tableordersin bucketin.c-sales
Running Custom Queries
Section titled “Running Custom Queries”You can run any SELECT query against your permitted tables:
# Join multiple tablesquery = """ SELECT c.customer_name, SUM(o.amount) as total_spent FROM "in.c-main"."customers" c JOIN "in.c-main"."orders" o ON c.id = o.customer_id GROUP BY c.customer_name ORDER BY total_spent DESC LIMIT 10"""
results = client.execute_query( branch_id=branch_id, workspace_id=workspace_id, statements=[query],)result = results[0]Writing Data Back to Storage
Section titled “Writing Data Back to Storage”Storage Access allows your app to modify data in Storage tables using standard SQL statements via the Query Service. This is useful for:
- Data entry forms
- Approval workflows
- Data correction interfaces
- Collaborative editing
Inserting and Updating Data
Section titled “Inserting and Updating Data”You can use standard SQL INSERT and UPDATE statements directly via the Query Service. Pass statements as a list — the SDK will execute them (transactionally by default) and return one result per statement:
# INSERT new recordsclient.execute_query( branch_id=branch_id, workspace_id=workspace_id, statements=[''' INSERT INTO "in.c-main"."approvals" ("id", "name", "status", "updated_at") VALUES (1, 'New Record', 'pending', CURRENT_TIMESTAMP) '''],)
# UPDATE existing recordsclient.execute_query( branch_id=branch_id, workspace_id=workspace_id, statements=[''' UPDATE "in.c-main"."approvals" SET status = 'approved', updated_at = CURRENT_TIMESTAMP WHERE id = 123 '''],)
# DELETE recordsclient.execute_query( branch_id=branch_id, workspace_id=workspace_id, statements=[''' DELETE FROM "in.c-main"."approvals" WHERE status = 'cancelled' '''],)The Query Service automatically handles metadata refresh in Storage after write operations, so row counts and table statistics stay current without any additional calls.
Truncating Tables
Section titled “Truncating Tables”To remove all data from a table:
client.execute_query( branch_id=branch_id, workspace_id=workspace_id, statements=['TRUNCATE TABLE "in.c-main"."temp_data"'],)Important Considerations
Section titled “Important Considerations”Metadata refresh: After any write operation, Keboola automatically refreshes the table metadata. This ensures:
- Row counts are accurate in the Storage UI
- Other components see the updated data
- Table statistics are current
Concurrency: Multiple users of your app may write simultaneously. If you need to prevent conflicts, you must handle this in your application logic:
# Example: Optimistic locking with version columnquery = """ UPDATE "in.c-main"."records" SET status = 'approved', version = version + 1 WHERE id = 123 AND version = 5"""results = client.execute_query( branch_id=branch_id, workspace_id=workspace_id, statements=[query],)
if results[0].rows_affected == 0: raise Exception("Record was modified by another user. Please refresh and try again.")Environment Variables
Section titled “Environment Variables”When Storage Access is enabled, the platform sets these environment variables in your Data App container:
| Variable | Description |
|---|---|
KBC_WORKSPACE_MANIFEST_PATH | Path to the workspace manifest JSON file. The file contains workspaceId (and other workspace metadata). Recommended source for the workspace ID. |
WORKSPACE_ID | ID of the provisioned workspace for this app. Also available in the manifest file (above) — prefer reading the manifest in new code. |
BRANCH_ID | Storage API branch ID of the project. |
QUERY_SERVICE_URL | URL of the Query Service API (stack-specific). |
KBC_TOKEN | Keboola Storage API token. |
If Storage Access is not enabled, KBC_WORKSPACE_MANIFEST_PATH / WORKSPACE_ID / BRANCH_ID / QUERY_SERVICE_URL are not set. Read them with a clear error message for users:
import jsonimport os
try: branch_id = os.environ["BRANCH_ID"] query_service_url = os.environ["QUERY_SERVICE_URL"] with open(os.environ["KBC_WORKSPACE_MANIFEST_PATH"]) as f: workspace_id = json.load(f)["workspaceId"]except (KeyError, FileNotFoundError) as e: raise RuntimeError( "Storage Access is not enabled. Enable it in Advanced Settings and redeploy." ) from eFor the full list of environment variables exposed to Data Apps, see the data-app-python-js runtime README.
Comparison: Input Mapping vs Direct Storage Access
Section titled “Comparison: Input Mapping vs Direct Storage Access”| Aspect | Input Mapping | Direct Storage Access |
|---|---|---|
| Data freshness | Snapshot at deploy time | Real-time, always current |
| Data loading | CSV files loaded to /data/in/tables/ | Query on demand via API |
| Write capability | None (read-only) | INSERT, UPDATE, DELETE, TRUNCATE |
| Dataset size | Limited by container memory | Virtually unlimited (pagination) |
| Configuration | Select tables in UI | Select tables + enable toggle |
| Use case | Static dashboards, reports | Interactive apps, data entry |
You can use both together: Input Mapping for reference data that rarely changes, Storage Access for data you need to read/write in real-time.
Example: Read-Write Data App
Section titled “Example: Read-Write Data App”This example shows a simple Flask app that reads records from Storage and allows users to update their status.
app.py:
from flask import Flask, request, render_template_stringimport jsonimport osfrom keboola_query_service import Client
app = Flask(__name__)
# Read Storage Access config once at startup.# workspace_id is read from the manifest (recommended); other values are env vars.BRANCH_ID = os.environ["BRANCH_ID"]with open(os.environ["KBC_WORKSPACE_MANIFEST_PATH"]) as f: WORKSPACE_ID = json.load(f)["workspaceId"]
qs_client = Client( base_url=os.environ["QUERY_SERVICE_URL"], token=os.environ["KBC_TOKEN"],)
ALLOWED_STATUSES = {"pending", "approved", "rejected"}
@app.route("/", methods=["GET", "POST"])def index(): if request.method == "POST": # Validate and sanitize user input BEFORE it reaches SQL. # int() guarantees record_id is a number; the allowlist guarantees # new_status is one of three exact strings. This is the only reason # the f-string below is safe — do not add other form fields to the # query without analogous validation. record_id = int(request.form["record_id"]) new_status = request.form["status"] if new_status not in ALLOWED_STATUSES: return "Invalid status", 400
qs_client.execute_query( branch_id=BRANCH_ID, workspace_id=WORKSPACE_ID, statements=[f''' UPDATE "in.c-main"."approvals" SET status = '{new_status}', updated_at = CURRENT_TIMESTAMP WHERE id = {record_id} '''], )
# Load current records results = qs_client.execute_query( branch_id=BRANCH_ID, workspace_id=WORKSPACE_ID, statements=['SELECT id, name, status, updated_at FROM "in.c-main"."approvals" ORDER BY id'], ) result = results[0] column_names = [c.name for c in result.columns] records = [dict(zip(column_names, row)) for row in result.data]
return render_template_string(TEMPLATE, records=records)
TEMPLATE = """<!DOCTYPE html><html><head><title>Approval Manager</title></head><body> <h1>Pending Approvals</h1> <table border="1"> <tr><th>ID</th><th>Name</th><th>Status</th><th>Action</th></tr> {% for r in records %} <tr> <td>{{ r.id }}</td> <td>{{ r.name }}</td> <td>{{ r.status }}</td> <td> <form method="POST" style="display:inline"> <input type="hidden" name="record_id" value="{{ r.id }}"> <select name="status"> <option value="pending">Pending</option> <option value="approved">Approved</option> <option value="rejected">Rejected</option> </select> <button type="submit">Update</button> </form> </td> </tr> {% endfor %} </table></body></html>"""
if __name__ == "__main__": app.run(host="0.0.0.0", port=5000)pyproject.toml:
[project]name = "approval-app"version = "0.1.0"requires-python = ">=3.11"dependencies = [ "flask>=3.0.0", "keboola-query-service>=0.2.0",]
[build-system]requires = ["setuptools>=61.0"]build-backend = "setuptools.build_meta"Best Practices
Section titled “Best Practices”1. Handle missing workspace gracefully
import jsonimport os
try: branch_id = os.environ["BRANCH_ID"] query_service_url = os.environ["QUERY_SERVICE_URL"] with open(os.environ["KBC_WORKSPACE_MANIFEST_PATH"]) as f: workspace_id = json.load(f)["workspaceId"]except (KeyError, FileNotFoundError): # Storage Access is not enabled — show a user-friendly error import streamlit as st # or use your framework's error handling st.error("Storage Access is not enabled for this app. Enable it in Advanced Settings and redeploy.") st.stop()2. Validate and sanitize user input to prevent SQL injection
Since the Query Service accepts raw SQL strings, you must validate all user input before including it in queries:
# ❌ DANGEROUS - never do thisquery = f"SELECT * FROM table WHERE id = {user_input}"
# ✅ SAFE - validate types and use allowlistssafe_id = int(user_input) # Ensure it's actually a numberquery = f"SELECT * FROM table WHERE id = {safe_id}"
# ✅ For string values, use an allowlist of permitted valuesALLOWED_STATUSES = {"pending", "approved", "rejected"}if status not in ALLOWED_STATUSES: raise ValueError(f"Invalid status: {status}")query = f"UPDATE table SET status = '{status}' WHERE id = {safe_id}"3. Implement keyset pagination for large datasets
Use keyset (cursor-based) pagination instead of OFFSET, which can produce duplicates or gaps on live data:
page_size = 1000last_id = 0 # Start from the beginning
while True: results = client.execute_query( branch_id=branch_id, workspace_id=workspace_id, statements=[f''' SELECT * FROM "in.c-main"."my_table" WHERE id > {last_id} ORDER BY id ASC LIMIT {page_size} '''], ) rows = results[0].data if not rows: break process_batch(rows) last_id = rows[-1][0] # Update cursor to last row's id4. Cache frequently-used data
For Streamlit apps, use st.cache_data:
import streamlit as st
@st.cache_data(ttl=300) # Cache for 5 minutesdef load_reference_data(): results = client.execute_query( branch_id=branch_id, workspace_id=workspace_id, statements=['SELECT * FROM "in.c-main"."reference_data"'], ) result = results[0] return pd.DataFrame(result.data, columns=[c.name for c in result.columns])For Python/JS (non-Streamlit) apps, use a simple in-memory cache:
import time
_cache = {}_cache_ttl = 300 # seconds
def get_cached_data(key, query_fn): now = time.time() if key in _cache and now - _cache[key]["ts"] < _cache_ttl: return _cache[key]["data"] data = query_fn() _cache[key] = {"data": data, "ts": now} return data5. Track write operations
Write operations are automatically tracked by the Query Service for billing purposes. For additional application-level auditing, log to stdout (visible in Data App container logs):
import logginglogging.basicConfig(level=logging.INFO)
logging.info(f"User {current_user} updated record {record_id} to status {new_status}")# Output goes to stdout → visible in the Terminal Log tab of your Data AppLimitations
Section titled “Limitations”- Snowflake only: Storage Access currently works only with Snowflake backends. BigQuery support is planned for a future release.
- Column-level permissions not supported: If you grant access to a table, the app can read/write all columns.
- Permission changes require app restart: If you add or remove tables from the Storage Access configuration, the changes take effect on the next app start (deploy, redeploy, or wake from sleep).