How Types and Data Move from Engine to Browser
==============================================

You have a DataFrame in Python. Moments later it's rendered in a browser — scrollable, formatted, with histograms in the summary row. What happened in between? This article traces the full path: column renaming, type coercion, Parquet encoding, base64 transport, hyparquet decoding, and finally the displayer/formatter system that turns raw values into what you see on screen.

Column renaming: why everything becomes ``a, b, c``
---------------------------------------------------

The very first thing buckaroo does when serializing a DataFrame is rename every column. The original column ``"revenue"`` becomes ``a``, ``"cost"`` becomes ``b``. The 27th column becomes ``aa``, then ``ab``, ``ac``, and so on — base-26 using lowercase ASCII.

.. code-block:: python

   # buckaroo/df_util.py
   def to_chars(n: int) -> str:
       digits = to_digits(n, 26)
       return "".join(map(lambda x: chr(x + 97), digits))

   def old_col_new_col(df):
       return [(orig, to_chars(i)) for i, orig in enumerate(df.columns)]

Why? Three reasons:

1. **Column names can be anything.** Tuples (from a MultiIndex), integers, strings with spaces and special characters, even a column literally called ``"index"``. Parquet column names must be strings, and AG-Grid field names should be simple identifiers. Renaming to ``a, b, c`` sidesteps every edge case at once.

2. **Collision avoidance.** When a DataFrame has a column named ``"index"`` and the actual index must also be serialized as a column, there is a name collision. Renaming to short opaque names means the index columns (``index``, ``index_a``, ``index_b`` for MultiIndex levels) never collide with data columns.

3. **Smaller payloads.** The column name is repeated in every row of the JSON/Parquet output, and ``"a"`` is smaller than ``"quarterly_revenue_usd"``. The original name is preserved in the ``column_config`` that travels alongside the data.
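The renaming scheme can be sketched end-to-end with the standard library. This is a hedged reconstruction: the ``to_digits`` helper is not shown above and the real implementation in ``buckaroo/df_util.py`` may differ, but a bijective base-26 encoding reproduces the behavior the article describes (column 25 is ``z``, column 26 is ``aa``):

.. code-block:: python

   def to_chars(n: int) -> str:
       # Bijective base-26 over lowercase ASCII:
       # 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", ...
       digits = []
       n += 1
       while n > 0:
           n, rem = divmod(n - 1, 26)
           digits.append(rem)
       return "".join(chr(d + 97) for d in reversed(digits))

   def old_col_new_col(columns):
       # Pair each original column name with its short opaque name.
       # Takes a plain list here; the real function takes a DataFrame.
       return [(orig, to_chars(i)) for i, orig in enumerate(columns)]

   print(old_col_new_col(["revenue", "cost", ("multi", "index")]))
   # [('revenue', 'a'), ('cost', 'b'), (('multi', 'index'), 'c')]

Note how the tuple column name (a MultiIndex leftover) is handled for free: the opaque short name is derived from position, not from the original name.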
On the JS side, each column's ``header_name`` (or ``col_path`` for MultiIndex) tells AG-Grid what to display in the header. The user never sees ``a, b, c`` — they see the real names.

.. code-block:: python

   # In styling_core.py — fix_column_config maps col→header_name
   base_cc['col_name'] = col                    # "a"
   base_cc['header_name'] = str(orig_col_name)  # "revenue"

Cleaning before serialization
-----------------------------

Python's type system is richer than what Parquet (or JSON) can express directly. Before writing to Parquet, buckaroo coerces the awkward types:

.. list-table::
   :header-rows: 1
   :widths: 30 30 40

   * - Python type
     - Becomes
     - Why
   * - ``pd.Period`` (e.g. "2021-01")
     - ``str``
     - Parquet has no period type
   * - ``pd.Interval`` (e.g. ``(0, 1]``)
     - ``str``
     - Parquet has no interval type
   * - ``pd.Timedelta``
     - ``str`` (e.g. "1 days 02:03:04")
     - fastparquet can't encode timedeltas
   * - ``bytes`` (e.g. from ``pl.Binary``)
     - hex string (e.g. ``"68656c6c6f"``)
     - Parquet object columns need strings
   * - PyArrow-backed strings
     - ``object`` dtype
     - fastparquet needs ``object``, not ``ArrowDtype``
   * - Timezone-naive datetimes
     - UTC datetimes
     - Avoids ambiguous serialization

For the main DataFrame, this happens in ``to_parquet()`` (``serialization_utils.py``). The function also calls ``prepare_df_for_serialization()``, which performs the column rename and flattens MultiIndex levels into regular columns (``index_a``, ``index_b``, etc.).

Summary stats have an additional wrinkle: each column's stats dict contains mixed types (strings like ``"int64"`` for the dtype, floats for the mean, lists for histogram bins). fastparquet can't handle mixed-type columns, so ``sd_to_parquet_b64()`` JSON-encodes every cell value first, making each column a pure string column. The JS side knows to ``JSON.parse`` each cell back.
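The per-cell round trip can be sketched with the standard library alone. The helper names below are illustrative, not buckaroo's actual API, and the real ``sd_to_parquet_b64()`` additionally writes the resulting string columns to Parquet:

.. code-block:: python

   import json

   def encode_stats_cells(cells):
       # Mixed-type cells (dtype strings, float means, histogram lists)
       # all become JSON strings, so the column is uniformly typed.
       return [json.dumps(c, default=str) for c in cells]

   def decode_stats_cells(cells):
       # Mirrors what the JS side does with JSON.parse on each cell.
       return [json.loads(c) for c in cells]

   stats = ["int64", 3.5, [0, 10, 25, 10, 0]]  # dtype, mean, histogram
   encoded = encode_stats_cells(stats)
   assert all(isinstance(c, str) for c in encoded)
   assert decode_stats_cells(encoded) == stats

The ``default=str`` fallback means any value ``json.dumps`` can't handle natively (a dtype object, a Timestamp) degrades to its string form rather than raising.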
.. code-block:: python

   # Every cell becomes a JSON string before parquet encoding
   def _json_encode_cell(val):
       return json.dumps(_make_json_safe(val), default=str)

Parquet encoding and base64 transport
-------------------------------------

buckaroo uses **fastparquet** with a custom JSON codec to write the DataFrame to an in-memory Parquet file. Categorical and object columns get JSON-encoded within the Parquet file (fastparquet's ``object_encoding='json'``). The raw Parquet bytes are then base64-encoded into an ASCII string:

.. code-block:: python

   def to_parquet_b64(df):
       raw_bytes = to_parquet(df)
       return base64.b64encode(raw_bytes).decode('ascii')

The result is a tagged payload:

.. code-block:: json

   {"format": "parquet_b64", "data": "UEFSMQ..."}

This travels over the wire — via Jupyter's comm protocol, a WebSocket, or embedded directly in an HTML ``