Skip to content
Data Apps

Storage Access

Storage Access allows your Data App to read data from and write data back to Keboola Storage tables in real-time. Your app connects directly to Keboola’s storage through Query Service via SQL, enabling:

  • Real-time data access: Always work with the latest data, no redeployment needed
  • Write-back capability: Update, insert into existing Storage tables directly from your app
  • Interactive applications: Build data entry forms, approval workflows, and collaborative tools

This feature is available for both Streamlit and Python/JS Data Apps. Code examples on this page use Python; the same concepts apply when calling the Query Service API from JavaScript.

  • Build interactive data entry or editing applications
  • Work with large datasets more efficiently - no need to load them via input mapping
  • Enable business users to update data directly from the app
  • You don’t need write-back capability — the app only reads and displays data.
  • Your dataset is small and changes infrequently (e.g., static reference data loaded at deploy time).
  • You want the simplest possible setup with no additional configuration.

When you enable Storage Access, Keboola creates a dedicated workspace for your Data App. This workspace contains a database user with specific permissions (SELECT, INSERT, UPDATE, DELETE, TRUNCATE) on the tables you’ve selected.

Your Data App
Query Service ────► Workspace User ────► Storage Tables
│ │
│ │
└── Handles authentication, └── Your selected tables
billing, metadata refresh with granted permissions

Your app communicates with Storage through the Query Service API, not directly with Snowflake. This provides:

  • Automatic authentication using your app’s token
  • Usage tracking for billing
  • Automatic metadata refresh after writes
  • Abstraction from the underlying backend

The recommended Python client library is keboola-query-service (also available for JavaScript/TypeScript as @keboola/query-service).

The workspace is ephemeral - a fresh workspace is created each time your app starts (including wake-up from sleep):

EventWorkspace Action
App deploysNew workspace created
App wakes from sleepNew workspace created (old one deleted)
App redeployedNew workspace created (old one deleted)
App deletedWorkspace deleted

This design ensures:

  • Permission changes take effect on next app start
  • No stale credentials or connections
  • Clean isolation between app runs
  1. Go to the Project Settings.
  2. Go to the Features.
  3. Find the Storage Access feature and activate it.
  1. Open your Data App configuration in Keboola.
  2. Go to the Advanced Settings tab.
  3. Find the Storage Access section.
  4. Click + Add Writable Table.
  5. Select a bucket and table from Storage.
  6. For each table, the app will have SELECT, INSERT, UPDATE, DELETE, and TRUNCATE permissions.
  • You can add multiple tables from different buckets.
  • All selected tables must exist before deploying.

Configuring Writable Tables Programmatically

Section titled “Configuring Writable Tables Programmatically”

If you manage Data App configurations through the Storage API (or via automation/agents) rather than the UI, the same writable-table selection is expressed in the configuration JSON under storage.output.tables. Each entry is a table the app gets read/write permissions on:

{
"storage": {
"output": {
"tables": [
{
"destination": "out.c-data-app.mvc-crashes",
"unload_strategy": "direct-grant"
}
]
}
}
}
  • destination — the full Storage table ID (<stage>.<bucket>.<table>) the app should be able to read and write. The table must exist before the app is deployed.
  • unload_strategy: "direct-grant" — required marker that tells the platform “grant the app’s workspace direct SELECT/INSERT/UPDATE/DELETE/TRUNCATE on this table.” Tables without this strategy in storage.output.tables are not exposed via Storage Access.

To add or remove writable tables programmatically, update the Data App’s configuration via the Storage API (Component Configurations endpoint) and redeploy the app for the new permissions to take effect.

Click Deploy (or Redeploy for existing apps). During deployment:

  1. Keboola creates a new workspace with your selected table permissions.
  2. The workspace ID is passed to your app as an environment variable.
  3. Your app code can now use the Query Service to read and write data.

Install the Keboola Query Service client:

In pyproject.toml (Python):

dependencies = [
"keboola-query-service>=0.2.0",
]

In your Python code:

import json
import os
from keboola_query_service import Client
# Storage Access config is set by the platform when the feature is enabled.
# workspace_id is read from the manifest file (recommended); the other values
# are plain env vars.
try:
branch_id = os.environ["BRANCH_ID"]
query_service_url = os.environ["QUERY_SERVICE_URL"]
with open(os.environ["KBC_WORKSPACE_MANIFEST_PATH"]) as f:
workspace_id = json.load(f)["workspaceId"]
except (KeyError, FileNotFoundError) as e:
raise RuntimeError(
"Storage Access is not enabled for this app. "
"Enable it in Advanced Settings and redeploy."
) from e
# Initialize the Query Service client
client = Client(
base_url=query_service_url,
token=os.environ["KBC_TOKEN"],
)

To read a table you’ve selected in the UI:

import pandas as pd
# Query a table - use the full table ID (bucket.table)
results = client.execute_query(
branch_id=branch_id,
workspace_id=workspace_id,
statements=['SELECT * FROM "in.c-main"."customers" LIMIT 1000'],
)
# One QueryResult per statement — we sent one statement, so take the first.
result = results[0]
# Convert to DataFrame
df = pd.DataFrame(result.data, columns=[c.name for c in result.columns])
print(df.head())
  • Use the full table ID in quotes: "bucket_stage.bucket_name"."table_name"
  • Example: "in.c-sales"."orders" for a table orders in bucket in.c-sales

You can run any SELECT query against your permitted tables:

# Join multiple tables
query = """
SELECT
c.customer_name,
SUM(o.amount) as total_spent
FROM "in.c-main"."customers" c
JOIN "in.c-main"."orders" o ON c.id = o.customer_id
GROUP BY c.customer_name
ORDER BY total_spent DESC
LIMIT 10
"""
results = client.execute_query(
branch_id=branch_id,
workspace_id=workspace_id,
statements=[query],
)
result = results[0]

Storage Access allows your app to modify data in Storage tables using standard SQL statements via the Query Service. This is useful for:

  • Data entry forms
  • Approval workflows
  • Data correction interfaces
  • Collaborative editing

You can use standard SQL INSERT and UPDATE statements directly via the Query Service. Pass statements as a list — the SDK will execute them (transactionally by default) and return one result per statement:

# INSERT new records
client.execute_query(
branch_id=branch_id,
workspace_id=workspace_id,
statements=['''
INSERT INTO "in.c-main"."approvals" ("id", "name", "status", "updated_at")
VALUES (1, 'New Record', 'pending', CURRENT_TIMESTAMP)
'''],
)
# UPDATE existing records
client.execute_query(
branch_id=branch_id,
workspace_id=workspace_id,
statements=['''
UPDATE "in.c-main"."approvals"
SET status = 'approved', updated_at = CURRENT_TIMESTAMP
WHERE id = 123
'''],
)
# DELETE records
client.execute_query(
branch_id=branch_id,
workspace_id=workspace_id,
statements=['''
DELETE FROM "in.c-main"."approvals"
WHERE status = 'cancelled'
'''],
)

The Query Service automatically handles metadata refresh in Storage after write operations, so row counts and table statistics stay current without any additional calls.

To remove all data from a table:

client.execute_query(
branch_id=branch_id,
workspace_id=workspace_id,
statements=['TRUNCATE TABLE "in.c-main"."temp_data"'],
)

Metadata refresh: After any write operation, Keboola automatically refreshes the table metadata. This ensures:

  • Row counts are accurate in the Storage UI
  • Other components see the updated data
  • Table statistics are current

Concurrency: Multiple users of your app may write simultaneously. If you need to prevent conflicts, you must handle this in your application logic:

# Example: Optimistic locking with version column
query = """
UPDATE "in.c-main"."records"
SET status = 'approved', version = version + 1
WHERE id = 123 AND version = 5
"""
results = client.execute_query(
branch_id=branch_id,
workspace_id=workspace_id,
statements=[query],
)
if results[0].rows_affected == 0:
raise Exception("Record was modified by another user. Please refresh and try again.")

When Storage Access is enabled, the platform sets these environment variables in your Data App container:

VariableDescription
KBC_WORKSPACE_MANIFEST_PATHPath to the workspace manifest JSON file. The file contains workspaceId (and other workspace metadata). Recommended source for the workspace ID.
WORKSPACE_IDID of the provisioned workspace for this app. Also available in the manifest file (above) — prefer reading the manifest in new code.
BRANCH_IDStorage API branch ID of the project.
QUERY_SERVICE_URLURL of the Query Service API (stack-specific).
KBC_TOKENKeboola Storage API token.

If Storage Access is not enabled, KBC_WORKSPACE_MANIFEST_PATH / WORKSPACE_ID / BRANCH_ID / QUERY_SERVICE_URL are not set. Read them with a clear error message for users:

import json
import os
try:
branch_id = os.environ["BRANCH_ID"]
query_service_url = os.environ["QUERY_SERVICE_URL"]
with open(os.environ["KBC_WORKSPACE_MANIFEST_PATH"]) as f:
workspace_id = json.load(f)["workspaceId"]
except (KeyError, FileNotFoundError) as e:
raise RuntimeError(
"Storage Access is not enabled. Enable it in Advanced Settings and redeploy."
) from e

For the full list of environment variables exposed to Data Apps, see the data-app-python-js runtime README.

Comparison: Input Mapping vs Direct Storage Access

Section titled “Comparison: Input Mapping vs Direct Storage Access”
AspectInput MappingDirect Storage Access
Data freshnessSnapshot at deploy timeReal-time, always current
Data loadingCSV files loaded to /data/in/tables/Query on demand via API
Write capabilityNone (read-only)INSERT, UPDATE, DELETE, TRUNCATE
Dataset sizeLimited by container memoryVirtually unlimited (pagination)
ConfigurationSelect tables in UISelect tables + enable toggle
Use caseStatic dashboards, reportsInteractive apps, data entry

You can use both together: Input Mapping for reference data that rarely changes, Storage Access for data you need to read/write in real-time.

This example shows a simple Flask app that reads records from Storage and allows users to update their status.

app.py:

from flask import Flask, request, render_template_string
import json
import os
from keboola_query_service import Client
app = Flask(__name__)
# Read Storage Access config once at startup.
# workspace_id is read from the manifest (recommended); other values are env vars.
BRANCH_ID = os.environ["BRANCH_ID"]
with open(os.environ["KBC_WORKSPACE_MANIFEST_PATH"]) as f:
WORKSPACE_ID = json.load(f)["workspaceId"]
qs_client = Client(
base_url=os.environ["QUERY_SERVICE_URL"],
token=os.environ["KBC_TOKEN"],
)
ALLOWED_STATUSES = {"pending", "approved", "rejected"}
@app.route("/", methods=["GET", "POST"])
def index():
if request.method == "POST":
# Validate and sanitize user input BEFORE it reaches SQL.
# int() guarantees record_id is a number; the allowlist guarantees
# new_status is one of three exact strings. This is the only reason
# the f-string below is safe — do not add other form fields to the
# query without analogous validation.
record_id = int(request.form["record_id"])
new_status = request.form["status"]
if new_status not in ALLOWED_STATUSES:
return "Invalid status", 400
qs_client.execute_query(
branch_id=BRANCH_ID,
workspace_id=WORKSPACE_ID,
statements=[f'''
UPDATE "in.c-main"."approvals"
SET status = '{new_status}', updated_at = CURRENT_TIMESTAMP
WHERE id = {record_id}
'''],
)
# Load current records
results = qs_client.execute_query(
branch_id=BRANCH_ID,
workspace_id=WORKSPACE_ID,
statements=['SELECT id, name, status, updated_at FROM "in.c-main"."approvals" ORDER BY id'],
)
result = results[0]
column_names = [c.name for c in result.columns]
records = [dict(zip(column_names, row)) for row in result.data]
return render_template_string(TEMPLATE, records=records)
TEMPLATE = """
<!DOCTYPE html>
<html>
<head><title>Approval Manager</title></head>
<body>
<h1>Pending Approvals</h1>
<table border="1">
<tr><th>ID</th><th>Name</th><th>Status</th><th>Action</th></tr>
{% for r in records %}
<tr>
<td>{{ r.id }}</td>
<td>{{ r.name }}</td>
<td>{{ r.status }}</td>
<td>
<form method="POST" style="display:inline">
<input type="hidden" name="record_id" value="{{ r.id }}">
<select name="status">
<option value="pending">Pending</option>
<option value="approved">Approved</option>
<option value="rejected">Rejected</option>
</select>
<button type="submit">Update</button>
</form>
</td>
</tr>
{% endfor %}
</table>
</body>
</html>
"""
if __name__ == "__main__":
app.run(host="0.0.0.0", port=5000)

pyproject.toml:

[project]
name = "approval-app"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
"flask>=3.0.0",
"keboola-query-service>=0.2.0",
]
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

1. Handle missing workspace gracefully

import json
import os
try:
branch_id = os.environ["BRANCH_ID"]
query_service_url = os.environ["QUERY_SERVICE_URL"]
with open(os.environ["KBC_WORKSPACE_MANIFEST_PATH"]) as f:
workspace_id = json.load(f)["workspaceId"]
except (KeyError, FileNotFoundError):
# Storage Access is not enabled — show a user-friendly error
import streamlit as st # or use your framework's error handling
st.error("Storage Access is not enabled for this app. Enable it in Advanced Settings and redeploy.")
st.stop()

2. Validate and sanitize user input to prevent SQL injection

Since the Query Service accepts raw SQL strings, you must validate all user input before including it in queries:

# ❌ DANGEROUS - never do this
query = f"SELECT * FROM table WHERE id = {user_input}"
# ✅ SAFE - validate types and use allowlists
safe_id = int(user_input) # Ensure it's actually a number
query = f"SELECT * FROM table WHERE id = {safe_id}"
# ✅ For string values, use an allowlist of permitted values
ALLOWED_STATUSES = {"pending", "approved", "rejected"}
if status not in ALLOWED_STATUSES:
raise ValueError(f"Invalid status: {status}")
query = f"UPDATE table SET status = '{status}' WHERE id = {safe_id}"

3. Implement keyset pagination for large datasets

Use keyset (cursor-based) pagination instead of OFFSET, which can produce duplicates or gaps on live data:

page_size = 1000
last_id = 0 # Start from the beginning
while True:
results = client.execute_query(
branch_id=branch_id,
workspace_id=workspace_id,
statements=[f'''
SELECT * FROM "in.c-main"."my_table"
WHERE id > {last_id}
ORDER BY id ASC
LIMIT {page_size}
'''],
)
rows = results[0].data
if not rows:
break
process_batch(rows)
last_id = rows[-1][0] # Update cursor to last row's id

4. Cache frequently-used data

For Streamlit apps, use st.cache_data:

import streamlit as st
@st.cache_data(ttl=300) # Cache for 5 minutes
def load_reference_data():
results = client.execute_query(
branch_id=branch_id,
workspace_id=workspace_id,
statements=['SELECT * FROM "in.c-main"."reference_data"'],
)
result = results[0]
return pd.DataFrame(result.data, columns=[c.name for c in result.columns])

For Python/JS (non-Streamlit) apps, use a simple in-memory cache:

import time
_cache = {}
_cache_ttl = 300 # seconds
def get_cached_data(key, query_fn):
now = time.time()
if key in _cache and now - _cache[key]["ts"] < _cache_ttl:
return _cache[key]["data"]
data = query_fn()
_cache[key] = {"data": data, "ts": now}
return data

5. Track write operations

Write operations are automatically tracked by the Query Service for billing purposes. For additional application-level auditing, log to stdout (visible in Data App container logs):

import logging
logging.basicConfig(level=logging.INFO)
logging.info(f"User {current_user} updated record {record_id} to status {new_status}")
# Output goes to stdout → visible in the Terminal Log tab of your Data App
  • Snowflake only: Storage Access currently works only with Snowflake backends. BigQuery support is planned for a future release.
  • Column-level permissions not supported: If you grant access to a table, the app can read/write all columns.
  • Permission changes require app restart: If you add or remove tables from the Storage Access configuration, the changes take effect on the next app start (deploy, redeploy, or wake from sleep).
Ask Kai

Ask anything about Keboola — I'll search the docs and cite the pages I use.