Google Sheets Dummy Values

When importing Google Sheets workbooks, we ask for the Excel version of the workbook.

Excel workbooks denote spilled ranges via the F attribute on cells. The CSF looks like so:

{
  "cells": {
    "A1": { "v": 1, "F": "A1:A3", "f": "=SEQUENCE(3)" },
    "A2": { "v": 2, "F": "A1:A3" },
    "A3": { "v": 3, "F": "A1:A3" }
  }
}

Because of the F attribute, we know that the cells A1:A3 belong to the spill range, anchored at the top-left cell of the spill range A1.

However, in some cases, Google Sheets does not emit the F attribute to denote the spill range of cell. The Excel export’s XML looks like so:

<row r="1">
  <c r="A1" s="2" t="str">
    <f>
      IFERROR(__xludf.DUMMYFUNCTION(&quot;QUERY(B:C, &quot;&quot;select B where C &gt;= 2&quot;&quot;)&quot;),&quot;Name&quot;)
    </f>
    <v>Name</v>
  </c>
</row>
<row r="2">
  <c r="A2" s="2" t="str">
    <f>
      IFERROR(__xludf.DUMMYFUNCTION(&quot;&quot;&quot;COMPUTED_VALUE&quot;&quot;&quot;),&quot;John&quot;)
    </f>
    <v>John</v>
  </c>
</row>

In this case, the spill anchor A1 is using the QUERY function which Excel does not support. To make Excel not freak out about this, Google wraps the function in an IFERROR + DUMMYFUNCTION combo which falls back to the computed value if the function is not available.

GRID wants the original formula, so we remove the IFERROR and DUMMYFUNCTION to restore the original formula.

In this workbook, the spill anchor A1 is spilling to A2. However, Google does not provide a spilled range attribute that we can use. Instead, it denotes the “spilled status” of the cell A2 via DUMMYFUNCTION("""COMPUTED_VALUE"""). This tells us that A2 is a non-anchor cell belonging to a spilled range, but it doesn’t tell us which spilled range.

We collect such spilled cells and call them GSDV cells (Google Sheets Dummy Values). We provide them in the CSF:

{
  "cells": {
    "A1": { "v": "Name", "f": "=QUERY(B:C, \"select B where C > 2\")" },
    "A2": { "v": "John" }
  },
  "gsdv": ["A2"]
}

PS: Notice that A1 is not marked as a GSDV cell. We do not know whether it spills or not.

So what we need to do is figure out which GSDV cells belong to which spilled anchors. This is left to Apiary to figure out, and this document walks through how Apiary does that.

Why do we need to solve this?

Given the example CSF we looked at before:

{
  "cells": {
    "A1": { "v": "Name", "f": "=QUERY(B:C, \"select B where C > 2\")" },
    "A2": { "v": "John" }
  },
  "gsdv": ["A2"]
}

If any cell in B:C were to update, the A1 cell would be recalculated. Upon recalculation, it would be blocked by A2 because A2 does not belong to A1’s spilled range, and so A1 becomes a #SPILL! error.

By figuring out that A2 belongs to A1’s spilled range, A1 is allowed to spill into A2.

GSDV claimant algorithm

The GSDV algorithm works by discovering GSDV claimants and their claimed ranges, and resolving overlapping claimed ranges via recalculation. This happens in three distinct phases.

A “GSDV” claimant is a formula cell above or to the left of a GSDV cell that has no F attribute. The claim range is a range anchored (top-left) at the claimant where every coordinate in the claimed range is a GSDV cell.

Phase 1

We discover GSDV claimants by iterating over all GSDV cell coordinates and checking the neighboring cells immediately above and to the left of those coordinates.

A GSDV claimant in phase 1 must:

have a formula, and
use one of the known GSDV functions in its formula, and
not have an F attribute.

For each of these claimants, we compute their claimed range. This is done via a best effort heuristic because there are multiple ways to determine a claimed range. Take this example where C represents the claimant and □ represents GSDV cells:

C □ □ □
□ □ □
□ □
□

There are four possible claimed ranges:

C □ □ □    C ■ □ □    C ■ ■ □    C ■ ■ ■
■ □ □      ■ ■ □      ■ ■ ■      □ □ □
■ □        ■ ■        □ □        □ □
■          □          □          □

We could try to figure out which claim is “optimal” by looking for other claimants and figuring out which claimants cancel out others, but that would be very computationally expensive.

Instead, we traverse down and then right. Our heuristic is assuming that vertical spills are more common than horizontal ones.

Once we’ve determined the GSDV claimants and their claimed ranges, we filter out any overlapping claims. In the case of overlapping claimed ranges, we cannot determine the spilled ranges with certainty.

For the successful claims (claimed ranges with no overlap), we mark all cells in the claimed range as belonging to the same spilled range and remove their GSDV status.

On claimed range correctness

If there are multiple possible claimed ranges for a given claimant, how can we know whether the one we picked is correct? At first glance, it seems like we can’t guarantee correctness if our claimed range is arbitrary.

However, we can know that a claim is correct if there are no overlapping claims. If there are no other claims for a given GSDV cell, there is no way for it to exist except if it were to belong to the claimed range.

If we picked the “wrong” claimed range, we will encounter overlapping claims and the next phases are designed to tackle those.

Phase 2

Phase 2 attempts to resolve overlapping claims by recalculating all claimants that encountered overlapping claimed ranges. We can know the spilled range with certainty by recalculating the GSDV claimant and using the size of the result to calculate the spilled range.

So during recalculation, we allow the calculated spill range for a given cell to “override” GSDV cells. This is in contrast to normal cells, which would block the spill.

When the calculated spill range overlaps a GSDV cell, we remove its GSDV status and allow the spill to take over.

This does not work in two instances:

Volatile-sized spills
Claimant cells whose formula cannot be evaluated (because it failed to parse, or more commonly, because it calls a function that we do not support, or uses syntax we do not support, such as lambda calls)

Phase 3 tackles the latter case.

Phase 3

There are some Google Sheets functions which we do not support, such as IMPORTRANGE. Phase 3 is intended to resolve the case where:

a claimant encountered overlapping claims, and
that same claimant was not included in the initial recalculation because its formula contains something we do not support --- so it did not claim any GSDV cells.

So in phase 3, we rediscover the claimants again with a new requirement:

The claimant must have a formula that cannot be parsed or that calls an unsupported function.

At this point, we should have resolved all other claimants that encountered no overlapping claims or were successfully recalculated, so the number of claimants should be reduced. The reduced number of claimants also reduces the likelihood of overlapping claims.

With that, we repeat phase 1 with the new set of claimants. If a claimant makes a successful claim, we also restore the claimant’s original value from the CSF (which avoid the recalculation had replaced with a #NAME? error).

PS: We may still encounter overlapping claims, but we hope to encounter significantly fewer.

Lastly, we remove the remaining GSDV cells. The idea is that they are most likely the result of volatile-sized spills. This may be incorrect, but we don’t have a great way to deal with the remaining GSDV cells, and leaving them in place risks uselessly blocking subsequent spills (likely in particular if indeed the cause is volatile-sized spills).