The Dastardly DataFrame Dataset¶
Every DataFrame viewer works fine on pd.DataFrame({'a': [1, 2, 3]}).
The question is what happens when the data gets weird.
Buckaroo ships a collection of deliberately tricky DataFrames called the
Dastardly DataFrame Dataset (DDD). These are the DataFrames that break
other viewers — the ones with MultiIndex columns, NaN mixed with infinity,
columns literally named index, integers too large for JavaScript, and
types that most tools pretend don’t exist.
This page shows each one rendered live in buckaroo’s static embed. No Jupyter kernel, no server — just HTML and JavaScript. If you can see the tables below, the static embedding system is working.
Why this matters¶
If you build dashboards, you choose what data goes into your table. You control the types, the column names, the index. But if you’re doing exploratory data analysis — loading CSVs from vendors, joining tables from different systems, debugging a pipeline that produces unexpected output — you don’t control any of that. The data is what it is.
df.head() hides the problem. It shows you 5 rows and lets you believe
everything is fine. Buckaroo is built for the opposite workflow: show you
everything, especially the parts that are surprising.
The Dastardly DataFrames¶
Each section below shows the exact function from buckaroo.ddd_library
that creates the DataFrame, explains why it’s tricky, and renders it live
in a buckaroo static embed.
pip install buckaroo
from buckaroo.ddd_library import *
Infinity and NaN¶
# from buckaroo/ddd_library.py
def df_with_infinity() -> pd.DataFrame:
    return pd.DataFrame({'a': [np.nan, np.inf, np.inf * -1]})
df_with_infinity()
Three values, three completely different things: a missing value, positive infinity, and negative infinity. Many viewers display all three as blank or “NaN”. Buckaroo distinguishes them.
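This also tests whether summary stats (mean, min, max) handle infinity
correctly. They should treat np.inf as a valid float rather than missing
data: for this column the min is -inf, the max is inf, and the mean is
undefined because inf + (-inf) is NaN.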
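The distinction can be checked directly in pandas. The following is a minimal sketch using only standard pandas and NumPy calls; it is not Buckaroo's summary-stats code, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.inf, -np.inf], name="a")

# NaN is "missing"; +inf and -inf are ordinary float values.
missing = s.isna()
infinite = np.isinf(s)

n_missing = int(missing.sum())    # only the NaN row
n_infinite = int(infinite.sum())  # the two infinity rows

# pandas skips NaN by default, so min and max see only the infinities.
col_min = s.min()
col_max = s.max()

# The mean adds inf and -inf, which is undefined, so the result is NaN
# even though neither contributing value was missing.
col_mean = s.mean()
```

A viewer that only checks isna() would report one missing value and miss the fact that the remaining two are not finite.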
Really Big Numbers¶
# from buckaroo/ddd_library.py
def df_with_really_big_number() -> pd.DataFrame:
    return pd.DataFrame({"col1": [9999999999999999999, 1]})
df_with_really_big_number()
Python integers have arbitrary precision. JavaScript’s Number type has
53 bits of integer precision (Number.MAX_SAFE_INTEGER = 9007199254740991).
The value 9999999999999999999 exceeds this — if you naively convert it to a
JS number, it silently rounds to 10000000000000000000.
Buckaroo detects values above MAX_SAFE_INTEGER and preserves them as
strings to maintain exact precision. This matters for database primary keys,
blockchain transaction IDs, and any system that uses 64-bit integers.
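The rounding can be reproduced in Python, because a Python float is the same IEEE 754 double that JavaScript uses for Number. The sketch below also shows one way to keep oversized integers exact by serializing them as strings; to_json_safe is a hypothetical helper written for illustration, not a Buckaroo function.

```python
from typing import Union

MAX_SAFE_INTEGER = 2**53 - 1  # 9007199254740991, same as in JavaScript

big = 9999999999999999999

# Converting to a double mimics what a naive JS Number conversion does.
as_double = float(big)
rounded = int(as_double)  # 10000000000000000000, no longer equal to big


def to_json_safe(value: int) -> Union[int, str]:
    """Hypothetical helper: keep small ints numeric, send big ones as strings."""
    return str(value) if abs(value) > MAX_SAFE_INTEGER else value


safe_big = to_json_safe(big)  # '9999999999999999999'
safe_small = to_json_safe(1)  # 1
```

The string form costs a little extra handling on the JavaScript side, but the digits that reach the browser are exactly the digits in the DataFrame.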
Column Named “index”¶
# from buckaroo/ddd_library.py
def df_with_col_named_index() -> pd.DataFrame:
    return pd.DataFrame({
        'a': ["asdf", "foo_b", "bar_a", "bar_b", "bar_c"],
        'index': ["7777", "ooooo", "--- -", "33333", "assdf"]})
df_with_col_named_index()
When you call df.reset_index() on a DataFrame with an unnamed index, pandas
adds a column called index. Many widgets break because they confuse this
column with the DataFrame’s actual index. Buckaroo handles the ambiguity by
internally renaming columns to a, b, c... and mapping back via orig_col_name.
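The snippet below shows how easily a column named index appears, followed by a rough sketch of the renaming idea. The first half is plain pandas behavior. The second half is an assumption about how such a mapping could look; the orig_col_name dictionary here only borrows the name from the text and is not Buckaroo's internal data structure.

```python
import string

import pandas as pd

df = pd.DataFrame({"a": ["x", "y", "z"]})

# An unnamed index becomes a column literally called "index".
once = df.reset_index()
once_cols = list(once.columns)    # ['index', 'a']

# Resetting again cannot reuse "index", so pandas falls back to "level_0".
twice = once.reset_index()
twice_cols = list(twice.columns)  # ['level_0', 'index', 'a']

# Illustrative only: give every column a short positional name and remember
# the original, so "index" can never collide with the real index.
orig_col_name = dict(zip(string.ascii_lowercase, once.columns))
safe = once.rename(columns={orig: new for new, orig in orig_col_name.items()})
safe_cols = list(safe.columns)    # ['a', 'b']
```

Displaying the table then means looking each short name up in orig_col_name to recover the header the user expects.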
Named Index¶
# from buckaroo/ddd_library.py
def get_df_with_named_index() -> pd.DataFrame:
    """someone put the effort into naming the index,
    you'd probably want to display that"""
    return pd.DataFrame(
        {'a': ["asdf", "foo_b", "bar_a", "bar_b", "bar_c"]},
        index=pd.Index([10, 20, 30, 40, 50], name='foo'))
get_df_with_named_index()
Someone took the time to name this index foo. That name carries meaning —
it might be a join key, a time series frequency, or a categorical grouping.
Buckaroo displays named indexes as a distinct pinned column so the name is
visible.
MultiIndex Columns¶
# from buckaroo/ddd_library.py
def get_multiindex_with_names_cols_df(rows=15) -> pd.DataFrame:
    cols = pd.MultiIndex.from_tuples(
        [('foo', 'a'), ('foo', 'b'), ('bar', 'a'),
         ('bar', 'b'), ('bar', 'c')],
        names=['level_a', 'level_b'])
    return pd.DataFrame(
        [["asdf", "foo_b", "bar_a", "bar_b", "bar_c"]] * rows,
        columns=cols)
get_multiindex_with_names_cols_df(rows=6)
Hierarchical column headers are common after .pivot_table() and
.groupby().agg(). Most viewers either crash or flatten them into ugly
tuple strings like ('foo', 'a'). Buckaroo flattens them into readable
headers while preserving the level information.
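To see where these headers come from, here is a small example of .groupby().agg() producing MultiIndex columns, along with the tuple-string rendering and one possible flattening. The "/" separator and the sample data are arbitrary choices for illustration; Buckaroo's actual header format may differ.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "units": [3, 5, 2, 8],
    "price": [1.5, 2.0, 1.0, 4.0],
})

# Several aggregations per column produce two-level column headers.
agg = sales.groupby("region").agg({"units": ["min", "max"], "price": ["mean"]})
is_multi = isinstance(agg.columns, pd.MultiIndex)

# What a naive viewer shows: the repr of each tuple.
tuple_headers = [str(col) for col in agg.columns]

# One readable alternative: join the levels with a separator.
flat_headers = ["/".join(col) for col in agg.columns]
```

Joining the levels keeps every piece of level information in the header text, at the cost of losing the visual grouping that a true two-row header gives.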
MultiIndex on Rows¶
# from buckaroo/ddd_library.py
def get_multiindex_index_df() -> pd.DataFrame:
    row_index = pd.MultiIndex.from_tuples([
        ('foo', 'a'), ('foo', 'b'),
        ('bar', 'a'), ('bar', 'b'), ('bar', 'c'),
        ('baz', 'a')])
    return pd.DataFrame({
        'foo_col': [10, 20, 30, 40, 50, 60],
        'bar_col': ['foo', 'bar', 'baz', 'quux', 'boff', None]},
        index=row_index)
get_multiindex_index_df()
Multi-level row indexes are the counterpart to MultiIndex columns. They
appear after .groupby() without .reset_index(), or when loading
data from hierarchical sources. The tricky part: each index level becomes
an additional column that has to be displayed alongside the data columns
without breaking the column count.
This DataFrame also has a None in the last row of bar_col — a missing
string value mixed with non-missing strings.
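The column-count issue is easy to demonstrate with a grouped result. This sketch uses made-up data and standard pandas calls to show that the number of columns a viewer must draw is the number of index levels plus the number of data columns.

```python
import pandas as pd

orders = pd.DataFrame({
    "group": ["foo", "foo", "bar", "bar"],
    "sub": ["a", "b", "a", "a"],
    "amount": [10, 20, 30, 40],
})

# Grouping by two keys without reset_index() leaves a two-level row index.
grouped = orders.groupby(["group", "sub"]).sum()

data_cols = grouped.shape[1]             # columns pandas reports
index_levels = grouped.index.nlevels     # extra columns a viewer must draw
display_cols = index_levels + data_cols  # what actually appears on screen

# reset_index() makes the same count explicit by moving levels into columns.
flat = grouped.reset_index()
```

A viewer that sizes its grid from shape[1] alone would come up two columns short for this frame.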
Three-Level MultiIndex¶
# from buckaroo/ddd_library.py
def get_multiindex3_index_df() -> pd.DataFrame:
    row_index = pd.MultiIndex.from_tuples([
        ('foo', 'a', 3), ('foo', 'b', 2),
        ('bar', 'a', 1), ('bar', 'b', 3), ('bar', 'c', 5),
        ('baz', 'a', 6)])
    return pd.DataFrame({
        'foo_col': [10, 20, 30, 40, 50, 60],
        'bar_col': ['foo', 'bar', 'baz', 'quux', 'boff', None]},
        index=row_index)
get_multiindex3_index_df()
If two levels are hard, three levels are harder. This exercises the column-renaming logic that has to handle an arbitrary number of index levels without collision.
MultiIndex on Both Axes¶
# from buckaroo/ddd_library.py
def get_multiindex_with_names_both() -> pd.DataFrame:
    row_index = pd.MultiIndex.from_tuples([
        ('foo', 'a'), ('foo', 'b'),
        ('bar', 'a'), ('bar', 'b'), ('bar', 'c'),
        ('baz', 'a')],
        names=['index_name_1', 'index_name_2'])
    cols = pd.MultiIndex.from_tuples(
        [('foo', 'a'), ('foo', 'b'), ('bar', 'a'),
         ('bar', 'b'), ('bar', 'c'), ('baz', 'a')],
        names=['level_a', 'level_b'])
    return pd.DataFrame([
        [10, 20, 30, 40, 50, 60]] * 6,
        columns=cols, index=row_index)
get_multiindex_with_names_both()
The boss fight: hierarchical headers on both axes, with named levels on
both sides. This is what pd.pivot_table() produces on complex groupings.
Everything about column counting, index handling, and header rendering gets
tested simultaneously.
Weird Types (Pandas)¶
# from buckaroo/ddd_library.py
def df_with_weird_types() -> pd.DataFrame:
    """DataFrame with unusual dtypes that historically broke rendering.
    Exercises: categorical, timedelta, period, interval."""
    return pd.DataFrame({
        'categorical': pd.Categorical(
            ['red', 'green', 'blue', 'red', 'green']),
        'timedelta': pd.to_timedelta(
            ['1 days 02:03:04', '0 days 00:00:01',
             '365 days', '0 days 00:00:00.001',
             '0 days 00:00:00.000100']),
        'period': pd.Series(
            pd.period_range('2021-01', periods=5, freq='M')),
        'interval': pd.Series(
            pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3, 4, 5])),
        'int_col': [10, 20, 30, 40, 50],
    })
df_with_weird_types()
Four types that most viewers ignore:
Categorical: Has a fixed set of allowed values. Not a string.
Timedelta: A duration, not a timestamp. “1 day, 2 hours, 3 minutes, 4 seconds” is a single value.
Period: A span of time (“January 2021”), not a point in time.
Interval: A range like (0, 1]. Common in pd.cut() output.
Buckaroo detects each type and applies the appropriate formatter. Timedeltas display as human-readable durations (“1d 2h 3m 4s”), not raw microsecond counts.
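As a rough illustration of type detection and duration formatting, the sketch below classifies a column by its pandas dtype and formats a Timedelta in the "1d 2h 3m 4s" style. Both describe_dtype and format_timedelta are hypothetical helpers written for this page, not Buckaroo's formatters, and format_timedelta deliberately ignores sub-second precision.

```python
import pandas as pd


def describe_dtype(series: pd.Series) -> str:
    """Hypothetical classifier: name the dtype family a formatter would need."""
    dtype = series.dtype
    if isinstance(dtype, pd.CategoricalDtype):
        return "categorical"
    if isinstance(dtype, pd.PeriodDtype):
        return "period"
    if isinstance(dtype, pd.IntervalDtype):
        return "interval"
    if pd.api.types.is_timedelta64_dtype(dtype):
        return "timedelta"
    return "other"


def format_timedelta(td: pd.Timedelta) -> str:
    """Hypothetical formatter: whole days, hours, minutes, seconds only."""
    c = td.components
    parts = [(c.days, "d"), (c.hours, "h"), (c.minutes, "m"), (c.seconds, "s")]
    # Drop zero-valued units; fall back to "0s" for sub-second durations.
    return " ".join(f"{value}{unit}" for value, unit in parts if value) or "0s"


df = pd.DataFrame({
    "categorical": pd.Categorical(["red", "green"]),
    "timedelta": pd.to_timedelta(["1 days 02:03:04", "0 days 00:00:01"]),
    "period": pd.Series(pd.period_range("2021-01", periods=2, freq="M")),
    "interval": pd.Series(pd.arrays.IntervalArray.from_breaks([0, 1, 2])),
    "int_col": [10, 20],
})

kinds = {name: describe_dtype(df[name]) for name in df.columns}
pretty = format_timedelta(df["timedelta"].iloc[0])  # '1d 2h 3m 4s'
```

A production formatter would also need a policy for milliseconds and smaller units, which this sketch collapses to "0s".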
Weird Types (Polars)¶
# from buckaroo/ddd_library.py
def pl_df_with_weird_types():
    """Polars DataFrame with unusual dtypes that historically broke
    rendering. Exercises: Duration (#622), Time, Categorical,
    Decimal, Binary."""
    import datetime as dt
    import polars as pl
    return pl.DataFrame({
        'duration': pl.Series([100_000, 3_723_000_000,
                               86_400_000_000, 500, 60_000_000],
                              dtype=pl.Duration('us')),
        'time': [dt.time(14, 30), dt.time(9, 15, 30),
                 dt.time(0, 0, 1), dt.time(23, 59, 59),
                 dt.time(12, 0)],
        'categorical': pl.Series(
            ['red', 'green', 'blue', 'red', 'green']
        ).cast(pl.Categorical),
        'decimal': pl.Series(
            ['100.50', '200.75', '0.01', '99999.99', '3.14']
        ).cast(pl.Decimal(10, 2)),
        'binary': [b'hello', b'world', b'\x00\x01\x02',
                   b'test', b'\xff\xfe'],
        'int_col': [10, 20, 30, 40, 50],
    })
pl_df_with_weird_types()
Polars has its own set of tricky types:
Duration: Microsecond-precision time spans. Rendered as completely blank cells until issue #622 was addressed.
Time: Time-of-day without a date component.
Decimal: Fixed-precision decimal (not float). Important for financial data.
Binary: Raw bytes. Displayed as hex strings.
Buckaroo renders both pandas and polars DataFrames with the same viewer. If you’re migrating from pandas to polars, buckaroo moves with you.
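The reasons behind the Duration, Decimal, and Binary entries can be shown with the Python standard library alone, without polars installed. This is only an illustration of the underlying values; it does not use Buckaroo or polars formatting code.

```python
import datetime as dt
from decimal import Decimal

# Duration: 3_723_000_000 microseconds from the example is 1h 2m 3s.
duration = dt.timedelta(microseconds=3_723_000_000)
duration_text = str(duration)  # '1:02:03'

# Decimal vs float: binary floats cannot represent 0.1 or 0.2 exactly.
float_exact = (0.1 + 0.2) == 0.3                                         # False
decimal_exact = (Decimal("0.10") + Decimal("0.20")) == Decimal("0.30")   # True

# Binary: raw bytes are not always valid text, so hex is a safe display form.
hex_low = b"\x00\x01\x02".hex()  # '000102'
hex_high = b"\xff\xfe".hex()     # 'fffe'
```

The b'\xff\xfe' value is a useful test case because it is not valid UTF-8, so a viewer that tries to decode binary columns as text will fail on it.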
What’s happening under the hood¶
Every table on this page is a static embedding of the full buckaroo widget. There is no Python kernel running. Here’s what happened:
A Python script called buckaroo.artifact.to_html() on each DataFrame
The function serialized the data to base64-encoded Parquet (compact binary)
The summary stats (dtype, mean, histogram, etc.) were computed and serialized
Everything was embedded in an HTML file as a JSON <script> tag
The static-embed.js bundle (1.3 MB) decodes the Parquet, renders AG-Grid, and draws histograms, all client-side
No server required. The file can be hosted on any static file server, CDN, or even opened from disk. The tables on this page are iframes pointing to standalone HTML files that share a single copy of the JS bundle.
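The embedding step described above can be sketched with the standard library. This is a simplified illustration of the general technique (binary payload, base64, JSON inside a script tag), not the code in buckaroo.artifact. The function names, the payload id, and the JSON field names are all assumptions, and a stand-in byte string takes the place of real Parquet output such as DataFrame.to_parquet() would produce.

```python
import base64
import json

MARKER = 'id="payload">'  # hypothetical element id, chosen for this sketch


def embed_payload(raw: bytes, title: str) -> str:
    """Wrap binary data in a self-contained HTML page (illustrative only)."""
    payload = {"title": title, "data_b64": base64.b64encode(raw).decode("ascii")}
    # Escape "</" so text inside the JSON cannot close the script tag early.
    body = json.dumps(payload).replace("</", "<\\/")
    return (
        "<!doctype html><html><body>"
        f'<script type="application/json" {MARKER}{body}</script>'
        "</body></html>"
    )


def extract_payload(html: str) -> bytes:
    """Reverse of embed_payload, mimicking what client-side code would do."""
    start = html.index(MARKER) + len(MARKER)
    end = html.index("</script>", start)
    payload = json.loads(html[start:end])
    return base64.b64decode(payload["data_b64"])


# Stand-in bytes; in practice these would be the serialized Parquet file.
raw = b"PAR1 stand-in for Parquet bytes \x00\xff"
page = embed_payload(raw, "demo </script>")
round_trip = extract_payload(page)
```

Base64 inflates the payload by roughly a third, which is the price of carrying binary data inside a text-only HTML file.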
For details on how to create your own static embeds, see the Buckaroo Embedding Guide.
Try it yourself¶
from buckaroo.ddd_library import *
from buckaroo.artifact import to_html
# Generate a static HTML page for any DataFrame
html = to_html(df_with_weird_types(), title="Weird Types Demo")
with open('weird-types.html', 'w', encoding='utf-8') as f:
    f.write(html)
Or in a Jupyter notebook, just:
import buckaroo
from buckaroo.ddd_library import df_with_weird_types
df_with_weird_types() # renders inline
The Dastardly DataFrame Dataset is also available as an interactive tour
in Marimo — see docs/example-notebooks/marimo-wasm/buckaroo_ddd_tour.py
in the repository.