Apache Iceberg
Apache Iceberg is an open table format for large analytical datasets. DBCode can connect to Iceberg catalogs in AWS Glue and Amazon S3 Tables.
Overview
Apache Iceberg is an open table format designed for large-scale analytical datasets. DBCode connects to Iceberg catalogs through DuckDB’s iceberg extension, supporting:
- AWS Glue / SageMaker Lakehouse: Connect to Iceberg tables registered in the AWS Glue Data Catalog
- Amazon S3 Tables: Connect directly to S3 Tables table buckets via ARN
- Schema exploration: Browse Glue databases and their tables in the object tree
- Standard SQL queries: Query Iceberg tables using DuckDB’s analytical SQL engine
- Credential management: AWS authentication via DBCode’s auth profile system with automatic credential refresh
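DBCode's documentation doesn't describe how its automatic credential refresh works internally. As an illustration only, the following Python sketch shows a common refresh-before-expiry pattern for temporary AWS credentials: cache them, then fetch a new set once the current one is within a safety margin of expiring. The class names, the `fetch` callable, and the 5-minute margin are hypothetical and are not part of DBCode.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AwsCredentials:
    access_key_id: str
    secret_access_key: str
    session_token: Optional[str]
    expires_at: float  # Unix timestamp (seconds) at which the credentials expire


class RefreshingCredentials:
    """Caches temporary credentials and re-fetches them shortly before expiry.

    `fetch` is a hypothetical stand-in for whatever actually obtains
    credentials (for example an SSO login or an STS AssumeRole call).
    """

    def __init__(
        self,
        fetch: Callable[[], AwsCredentials],
        refresh_margin: float = 300.0,  # refresh 5 minutes before expiry
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._cached: Optional[AwsCredentials] = None

    def get(self) -> AwsCredentials:
        now = self._clock()
        # Fetch on first use, or once the cached set is inside the refresh margin.
        if self._cached is None or now >= self._cached.expires_at - self._margin:
            self._cached = self._fetch()
        return self._cached
```

The margin keeps a long-running query from starting with credentials that expire mid-request, and injecting `clock` lets the behavior be tested without actually waiting.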
DBCode's Iceberg support suits teams running large-scale data lakes on AWS who want to explore and query their Iceberg tables without spinning up Spark or EMR clusters.
Connecting
To connect to an Iceberg catalog in DBCode:
- Open the DBCode Extension: Launch Visual Studio Code and open the DBCode extension.
- Add a New Connection: Click the “Add Connection” icon.
- Complete the connection form:
  - Select Apache Iceberg as the database type
  - Choose your catalog type (AWS Glue or S3 Tables)
  - For Glue: provide your AWS Account ID and Glue endpoint
  - For S3 Tables: provide the table bucket ARN
  - Configure an AWS authentication profile with your credentials
- Connect: Click Save to establish your connection.
- Start Querying: Browse and query your Iceberg tables.
For detailed instructions on connecting to Apache Iceberg, refer to the Connect article.
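The two catalog types correspond closely to how DuckDB's iceberg extension attaches catalogs: a Glue catalog by AWS account ID, and an S3 Tables catalog by table bucket ARN. The Python sketch below builds such an `ATTACH` statement from the same values you enter in the connection form. It assumes DuckDB's `ENDPOINT_TYPE` option and is not necessarily the exact statement DBCode issues. It also leaves out the Glue endpoint and the credentials, which DuckDB takes from a separately created secret and other options not shown here. The function name `attach_statement` is hypothetical.

```python
import re

# A Glue catalog is identified by a 12-digit AWS account ID.
ACCOUNT_ID = re.compile(r"[0-9]{12}")
# S3 Tables table bucket ARN: arn:<partition>:s3tables:<region>:<account-id>:bucket/<name>
TABLE_BUCKET_ARN = re.compile(
    r"arn:aws(?:-[a-z]+)*:s3tables:[a-z0-9-]+:[0-9]{12}:bucket/[a-z0-9][a-z0-9-]{1,61}[a-z0-9]"
)
# Restrict the catalog alias to a plain, unquoted SQL identifier.
ALIAS = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def attach_statement(catalog_type: str, target: str, alias: str) -> str:
    """Build a DuckDB ATTACH statement for an Iceberg catalog (hypothetical helper)."""
    if not ALIAS.fullmatch(alias):
        raise ValueError(f"alias must be a simple identifier, got {alias!r}")
    if catalog_type == "glue":
        if not ACCOUNT_ID.fullmatch(target):
            raise ValueError(f"expected a 12-digit AWS account ID, got {target!r}")
    elif catalog_type == "s3_tables":
        if not TABLE_BUCKET_ARN.fullmatch(target):
            raise ValueError(f"not an S3 Tables table bucket ARN: {target!r}")
    else:
        raise ValueError(f"unknown catalog type: {catalog_type!r}")
    # The patterns above rule out quote characters, so the target can be
    # embedded in a single-quoted SQL literal without escaping.
    return f"ATTACH '{target}' AS {alias} (TYPE iceberg, ENDPOINT_TYPE {catalog_type});"
```

For example, `attach_statement("glue", "123456789012", "glue_catalog")` returns `ATTACH '123456789012' AS glue_catalog (TYPE iceberg, ENDPOINT_TYPE glue);`. Validating the account ID and ARN up front gives a clearer error than a failed catalog request would.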
Iceberg Features in DBCode
DBCode provides read access to Iceberg catalogs with:
- Catalog browsing: Explore Glue databases and Iceberg tables in the object tree
- SQL queries: Run analytical queries against Iceberg tables using DuckDB SQL
- SQL autocomplete: Intelligent suggestions for table and column names
- Data export: Export query results to CSV, Excel, Parquet, and other formats
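Once a catalog is attached, DuckDB generally refers to its tables with three-part names of the form catalog.schema.table, where a Glue database (or an S3 Tables namespace) plays the role of the schema. This Python sketch, using the hypothetical helpers `quote_ident` and `preview_query`, builds a preview query with each name part double-quoted, so identifiers containing dots, spaces, or reserved words are passed through intact.

```python
def quote_ident(name: str) -> str:
    # SQL identifiers are double-quoted; embedded double quotes are doubled.
    return '"' + name.replace('"', '""') + '"'


def preview_query(catalog: str, namespace: str, table: str, limit: int = 100) -> str:
    """Build a SELECT over a catalog.namespace.table name (hypothetical helper)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    qualified = ".".join(quote_ident(part) for part in (catalog, namespace, table))
    return f"SELECT * FROM {qualified} LIMIT {int(limit)};"
```

For instance, `preview_query("glue_catalog", "sales", "orders", 10)` yields `SELECT * FROM "glue_catalog"."sales"."orders" LIMIT 10;`. The catalog, database, and table names here are placeholders.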
By using Apache Iceberg with DBCode, you can explore and query your data lake tables directly within Visual Studio Code, without switching to separate tools such as Spark or Athena.
For more information about Apache Iceberg, see the Apache Iceberg project website.