We have done a little theorising about the complexity of working with a simple functional language. Spreadsheets are not usually famed for being functional languages, but that is how we are treating them here, applying a classical model of human performance known as the Keystroke Level Model.
The simple view is that a spreadsheet cell contains a formula expression, which may refer to other cells. Those cells in turn contain expressions of their own, which act as sub-expressions of the first. Hence, overall, expressions are composed of functions and operators applied to other sub-expressions, repeatedly, until one reaches fixed values. The fixed values are the data going into the computation being performed.
The human-factors complexity we focus upon arises when a user finds a cell that isn’t doing what they think it should. For example, they may have =2+3*4 in a cell and be surprised to find the result is 14 when they thought it would be 20. (In this case the user has forgotten that BODMAS order of operations applies when interpreting the formula.)
Faced with this dilemma, the cell showing 14 and the user thinking it should be 20, a general strategy is to try each part of the formula to work out what is wrong. So, they might check what “2+3” makes (i.e., 5) and what “3*4” makes (i.e., 12). Then they try using those values in place of the sub-expressions, checking what “5*4” and “2+12” produce. At that point the user hopefully notices that the second of these produces 14: the 14 comes from 2+12, and the 12 from 3*4, so the multiplication is being done first. Depending upon the user’s maths knowledge, they will then revise their formula to do what they wanted, for example by bracketing the addition: “(2+3)*4”.
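This checking strategy can be sketched mechanically. Below is a minimal Python sketch (with Python syntax standing in for the spreadsheet formula language) that enumerates the compound sub-expressions of a formula and evaluates each one, mirroring the checks a user would type by hand:

```python
import ast

def check_sub_expressions(formula):
    """List every compound sub-expression of a formula with its value,
    mirroring the checks a user would type in by hand."""
    tree = ast.parse(formula, mode="eval")
    checks = []
    for node in ast.walk(tree.body):
        if isinstance(node, ast.BinOp):  # an operator applied to arguments
            expr = ast.fix_missing_locations(ast.Expression(body=node))
            value = eval(compile(expr, "<check>", "eval"))
            checks.append((ast.unparse(node), value))
    return checks

# The surprised user's formula from above:
for src, value in check_sub_expressions("2+3*4"):
    print(f"{src} = {value}")
```

For the example formula this surfaces the inner check 3 * 4 = 12, revealing that the multiplication is applied before the addition.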
This example is very simplistic, but for more complex cases the strategy makes sense. In short, when an expression is not doing what you think it should, you check that the parts of the expression are each working as you expect, and that you are combining them in the right way. It is what “debugging” often involves.
For example, if we were using an accounting function like “SLN(30*A1,A2,B2*12)” and the answer was not what we expected, we would check that “30*A1” is what we expect, that “A2” is what we expect, and also “B2*12”. Finally, we’d check that “SLN” is the right function to use.
The Keystroke Level Model is a classical measure of physical effort when working with a computer. Here we use the model to look at the effort of debugging a spreadsheet. The model is quite detailed, but at a simple level it focuses upon keystrokes as the basic measure of effort. The original formula above needed 6 keystrokes to enter, but it took 20 keystrokes to check (and that is ignoring mouse or cursor movements).
To generalise this simple idea to the more general case, we have to make some assumptions. We start by considering the expression computed as a whole, including all the sub-expressions. The number of terms in the expression is a key parameter (S); more subtly, so is the average number of arguments each operator or function uses (F). Finally, we need to know how many keystrokes a term takes to type; for this we take T to be the average size of a term, in characters.
On this simplistic analysis, the size of an expression is T·S characters. Checking all the sub-expressions that form the formula means, in parts, typing the entire formula again, once for each level of nesting of sub-expressions. We make the conservative assumption that the depth of nesting is at least:
log_F(S) – 1
So, if the user is comprehensively checking all sub-expressions in a cell formula, they will have to enter each of the sub-expressions at each level of nesting. Hence a lower bound on the activity required will be:
T · S · (log_F(S) – 1)
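As a sketch, this bound can be written as a small Python function; the parameter names follow the text (T average term size, S number of terms, F average number of arguments):

```python
import math

def debug_effort(T, S, F):
    """Lower bound on keystrokes needed to hand-check an expression:
    T * S characters per level, over log_F(S) - 1 levels of nesting."""
    return T * S * (math.log(S, F) - 1)
```

For instance, debug_effort(2.38, 8, 2) gives roughly 38 keystrokes, matching the first worked example that follows.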
Note this is a very generic account of effort; the effort of mental management has been ignored, as have details such as brackets and commas within a formula.
The simple interpretation of this is not surprising: the fewer the terms used in an expression, the less effort debugging takes, and the shorter the terms, the less effort required. Slightly more interestingly, with the same number of terms, expressions whose operators take more arguments are easier to debug, because they have less nesting depth. For example, “(512+734+226+691)/4” involves more debugging effort than “SUM(512,734,226,691)/4” because the latter has fewer nested sub-expressions.
Using our model, the effort to debug “(512+734+226+691)/4” is based on 8 terms (S = 8), where terms are on average 2.38 characters long (T = 2.38) and the number of arguments of the operators is 2 (F = 2). This gives a keystroke complexity for debugging the expression of 38 keystrokes.
By contrast, the effort to debug “SUM(512,734,226,691)/4” involves 7 terms (S = 7), with an average term size of 2.43 (T = 2.43) and an average number of arguments of 3 (F = 3). The resulting complexity is only 13.1 keystrokes, less than half the effort of the previous case.
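Plugging the two sets of parameters into the bound confirms these figures (a quick check, using Python’s math.log for the base-F logarithm):

```python
import math

# T * S * (log_F(S) - 1), with the parameters given in the text
nested = 2.38 * 8 * (math.log(8, 2) - 1)   # "(512+734+226+691)/4"
flat   = 2.43 * 7 * (math.log(7, 3) - 1)   # "SUM(512,734,226,691)/4"
print(round(nested, 1), round(flat, 1))    # roughly 38.1 and 13.1
```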
Although this complexity can seem quite an abstract concept, it is useful in quantifying effort and also in highlighting what might help reduce that effort. For instance, we can see that built-in functions can help simplify the complexity. In this example, using the AVERAGE function would increase F further and reduce S, both of which would reduce the complexity.
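To illustrate, with assumed parameter values (not figures from the worked examples above): counting “AVERAGE(512,734,226,691)” as 5 terms (the function name plus four numbers) with an average size of 3.8 characters and F = 4 arguments, the bound drops well below either earlier figure:

```python
import math

# Assumed, illustrative parameters for "AVERAGE(512,734,226,691)":
T, S, F = 3.8, 5, 4   # avg term size, term count, args per function
effort = T * S * (math.log(S, F) - 1)
print(round(effort, 1))   # roughly 3.1 keystrokes
```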
Having a good measure of complexity like this also helps illustrate the power of additional tool features. For example, if a user has the EQUS add-in operating with any of these examples, the tool automatically presents the breakdown of a formula, thus minimising the effort required whenever a formula seems not to be performing as expected.
So, using EQUS with either “(512+734+226+691)/4” or “SUM(512,734,226,691)/4”, the visualisation shows all the details that hand debugging involves, with zero keystrokes; see the EQUS outputs below for the two cases.
It can be seen that when the visualisation is provided, the physical effort drops to zero, since the user only has to read off the components of the expressions involved.