Introduction to cursor maintenance

This article is for developers who are building a text input field that offers a movable cursor and applies formatting in the same field. Whether the format is applied after every keystroke or after a special user action, you have to decide where to put the cursor after the text has been formatted.

If you are familiar with the problem of maintaining cursor position in a formatted input field, you may want to go straight to my source code. In this article, I introduce the concept of cursor maintenance and present several attempts at a solution.

The problem

Suppose we’re building an input field that expects the user to enter a whole number such as the population of a city. The input field accepts digits and commas, rejecting all other characters. After every keystroke, the input field formats the text into groups of three digits separated by commas.

For example, if the user enters 1125, the input field formats it to 1,125.

The user has a movable cursor, which we’ll represent with a ^. Here the cursor is at the right end of the input text:

112,500^

There are seven characters to the left of the cursor, so we say that the cursor is at position 7.

We have to update the cursor position after the format is applied. If the cursor is at the right end of the input field before formatting, we want it to be at the right end after formatting as well.

Suppose the user enters a 0:

112,5000^

After this input action, the cursor has moved to position 8. The input field formats the text 112,5000 to 1,125,000. Now there are nine characters, so we move the cursor to position 9:

1,125,000^

To maintain the cursor at the right end of the input field, we follow a simple formula: if the text consists of n characters, we place the cursor at position n.

Things get complicated when the user moves the cursor elsewhere in the text.

Suppose the user moves the cursor to position 4:

1,12^5,000

Now the user hits backspace, deleting the 2 and moving the cursor to position 3:

1,1^5,000

The input field formats 1,15,000 to 115,000. Where should we put the cursor? This is what happens if we keep it in position 3:

115^,000

The cursor has jumped across the 5 character, which is surprising and inconvenient. The user expects the cursor to remain between the 1 and the 5:

11^5,000

This is obvious to a human observer. It is less obvious how to design a process to compute cursor positions that meet the user’s expectations.

The root of the problem is that text formatting does not involve the cursor. Although the cursor is visually displayed in the text, it is not logically part of the text. It is an interface element that assists user editing by indicating a position in the text. The format is applied to a sequence of characters without any representation of the cursor. After the text has been transformed by formatting, we want to place the cursor in a position that minimizes user surprise.

In a few cases, the answer is obvious. If the format leaves the text unchanged, we leave the cursor position unchanged. If the cursor is at the leftmost or rightmost position prior to formatting, we place it at the leftmost or rightmost position, respectively, after formatting. In other cases, we have to examine the cursor’s context before formatting and attempt to place the cursor in a similar context after formatting.
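The obvious cases fit in a few lines. This is only a sketch: the name obviousCursor is hypothetical, and returning null is just a convention here for "no obvious answer, examine the context".

```javascript
// Handles only the obvious cases; returns null when the caller must fall
// back to a context-aware cursor-maintenance strategy.
function obviousCursor(rawText, rawCursor, formattedText) {
  if (formattedText === rawText) {
    return rawCursor;                // Text unchanged: keep the cursor.
  }
  if (rawCursor === 0) {
    return 0;                        // Leftmost stays leftmost.
  }
  if (rawCursor === rawText.length) {
    return formattedText.length;     // Rightmost stays rightmost.
  }
  return null;                       // Needs a look at the context.
}
```

For example, 112,5000^ maps to position 9 in 1,125,000, while 1,1^5,000 gets null because it is not one of the obvious cases.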

The status quo

When you use an automated teller machine (ATM), you do not have the luxury of a movable cursor. Even the most advanced ATMs today limit you to inserting and deleting digits at the right end of the input field. Though equipped with touch screens and speech synthesis modules, they offer the same user interface for entering dollar amounts as the ATMs of the 1970s.

TD Bank ATM dollar amount interface

Web apps and mobile apps are only slightly more sophisticated. A banking app may let you move a cursor around the input field when you’re entering a dollar value, but it won’t format the value directly in the input field. Instead it displays the formatted value in a separate, non-editable text field.

PC Financial dollar amount interface

The rigid interface may be excused on the grounds that information integrity is paramount in a financial transaction. Showing the user’s raw input and the formatted dollar amount in separate fields serves to dispel ambiguity. The drawbacks are redundancy and complexity. In other applications, users would probably prefer a simpler and friendlier interface.

Consider the clunky interface for editing cells in spreadsheet applications like Google Sheets and Microsoft Excel. You can make a spreadsheet cell that displays its value in a dollar format, but you can’t edit the formatted amount directly. You are restricted to editing the raw text behind the format. The formatted dollar amount only appears after you leave the cell.

Google Sheets dollar amount interface

It would be more convenient if you could insert your cursor into the formatted text and edit it directly. Then you could hit a special key to format the text while the cursor stayed where you expected. For some formats, the text could be instantly formatted after every keystroke and the cursor would be unperturbed.

Larger-scale formatting interfaces would also benefit from cursor maintenance. Let’s say you’re editing a document of many pages in a word processing application. Currently there are two spaces after every sentence, and you apply a format that reduces all of these to one space. The document gets repaginated as the whitespace is reduced, so the line you were editing ends up on a different page after formatting. You have to hunt through the text to find your place again. In a better world, the application would format the document while preserving the cursor position with respect to the content.

How can we develop formatted input elements that maintain the cursor in a user-friendly position? Is there an easy way to formulate rules to maintain cursor position in all possible cases for a given format? Is it possible to generalize such rules to handle many formats?

Terminology

In this article, I use the word format to mean a deterministic text transformation. Mathematically speaking, a format is a function on strings. A formatter is a computational process, such as a JavaScript function, that implements a format.

When the user modifies the contents of an input field, we obtain raw text, which has been edited by the user and not yet formatted. A formatter takes raw text and returns formatted text.

Cursor maintenance is the third step in the following usage flow:

  1. The user does some editing in the input field and leaves the cursor somewhere in the raw text.

  2. The raw text is passed to a formatter, and the resulting formatted text replaces the raw text in the input field.

  3. The cursor is moved to some position in the formatted text.

After the last step, we want the user to perceive the cursor position as unchanged even though the text has changed. How can we choose a cursor position to accomplish this illusion? Broadly speaking, that is the problem of cursor maintenance.
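In a browser, steps 2 and 3 might look roughly like this. The sketch assumes a field that behaves like an HTMLInputElement (value, selectionStart, and setSelectionRange are standard DOM members) and a hypothetical formatWithCursor function that returns the formatted text together with a new cursor position.

```javascript
// Steps 2 and 3 of the usage flow. `field` is anything shaped like an
// HTMLInputElement; `formatWithCursor(text, cursor)` is a hypothetical
// cursor-maintaining formatter returning { text, cursor }.
function reformatField(field, formatWithCursor) {
  // Step 1 has already happened: the user left raw text and a cursor.
  var result = formatWithCursor(field.value, field.selectionStart);
  // Step 2: the formatted text replaces the raw text.
  field.value = result.text;
  // Step 3: move the (collapsed) cursor into the formatted text.
  field.setSelectionRange(result.cursor, result.cursor);
}

// Possible browser wiring for instant formatting:
// input.addEventListener('input', function () {
//   reformatField(input, someCursorMaintainingFormatter);
// });
```

All of the difficulty hides inside formatWithCursor; the rest of this article is about what that function should do.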

How should we formally define the problem? In other words, what input do we provide and what output do we require? There are several conceivable ways to go about it. Let me propose two definitions.

Consider a formatting instance defined by three values: the raw text, the raw cursor position, and the formatted text.

For example, suppose the raw text is 12,5900, the raw cursor position is 4, and the formatted text is 125,900. We can write this formatting instance in a compact form:

12,5^900  ->  125,900  

Consider a function that takes a formatting instance and produces one output value: a new cursor position in the formatted text.

Such a function is a cursor maintainer. It solves a variant of the cursor-maintenance problem in which the only information we explicitly provide is a formatting instance.

A cursor maintainer that attempts to handle all possible formats won’t be very accurate because a new cursor position that works well for one format will be a poor choice for other formats. However, we can imagine designing a cursor maintainer to solve a restricted set of formatting instances. We might build a cursor maintainer for several similar formats or just for one format.

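The notion of doing cursor maintenance for a fixed format leads to a variant problem definition. We provide two input values: the raw text and the raw cursor position.

We require two output values: the formatted text and a new cursor position in the formatted text.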

A function that performs this computation is a cursor-maintaining formatter.

How do we make a cursor-maintaining formatter? We might examine a format specification and manually build a cursor-maintaining formatter. On other occasions, we might use an automated process that transforms a plain formatter into a cursor-maintaining formatter.
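One simple automated process is a wrapper that pairs a plain formatter with a cursor maintainer. In this sketch, makeCursorMaintainingFormatter and keepDistanceFromRight are hypothetical names, and keepDistanceFromRight is a deliberately naive maintainer that preserves the number of characters to the right of the cursor. The commatize function repeats the plain formatter shown later in this article so that the block runs on its own.

```javascript
// Plain formatter: groups of three digits separated by commas.
function commatize(s) {
  var start, groups, i;
  s = s.replace(/,/g, '');
  start = s.length % 3 || 3;
  groups = [ s.substring(0, start) ];
  for (i = start; i < s.length; i += 3) {
    groups.push(s.substring(i, i + 3));
  }
  return groups.join(',');
}

// Naive cursor maintainer: keep the same number of characters to the
// right of the cursor, clamped to the start of the text.
function keepDistanceFromRight(rawText, rawCursor, formattedText) {
  return Math.max(0, formattedText.length - (rawText.length - rawCursor));
}

// Combine a formatter (text -> text) and a maintainer
// (rawText, rawCursor, formattedText -> cursor) into a
// cursor-maintaining formatter ((rawText, rawCursor) -> { text, cursor }).
function makeCursorMaintainingFormatter(format, maintainCursor) {
  return function (rawText, rawCursor) {
    var text = format(rawText);
    return { text: text, cursor: maintainCursor(rawText, rawCursor, text) };
  };
}
```

The wrapper gives sensible answers for many commatize instances, such as 1,1^5,000 -> 11^5,000, but the naive maintainer fails when the characters to the right of the cursor change: 1^,2,3,4 ends up as ^1,234 instead of 1^,234.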

Solution quality

Let’s walk through a usage example with an input field that applies formatting after every text modification. Initially the input field contains the formatted text 12,900, with the cursor at position 4:

12,9^00

The user moves the cursor to position 3:

12,^900

The text is unchanged, so there is no need for formatting. Next, the user enters the character 5, leaving the input field in this state:

12,5^900

The cursor is now at position 4. The input field detects a text modification. It passes the raw text to the formatter, which returns 125,900. This replaces the raw text in the input field. Now the input field must choose a cursor position.

Position 3 is a reasonable answer. The input field would end up in this state:

125^,900

It would also be reasonable to choose position 4, resulting in this state:

125,^900

It isn’t clear which state would be preferred by a majority of users. However, these are clearly the two best cursor positions. We want to choose one of these rather than a less user-friendly cursor position. We also want to make the choice consistently. In ambiguous cases that are similar to this one, we want to choose similarly each time.

There is no definitive way to decide whether the problem has been solved correctly. Cursor maintenance is a fuzzy problem. Our goal is to minimize user surprise. Equivalently, we aim to maximize the odds of meeting the user’s expectations. We want to answer this question: Where would the user expect to see the cursor after the raw text has been replaced by the formatted text?

Instant versus on-demand formatting

Although formatting the content of the input field instantly after every keystroke may be aesthetically pleasing, it can have unintended effects. With some formats, instant formatting makes the input field unusable.

Consider a format that strips all whitespace from both ends of the input. If an input field were to apply this format instantly, it would be impossible to type a space character at the right end of the text. The input text `Hello, ` would be instantly formatted to `Hello,`, effectively deleting the space each time the user attempted to type it.

Other formats would be more amenable to instant formatting, yet susceptible to destructive formatting. Consider a numerical input field that automatically strips zero digits from the left end. If you paste in 0040001, it gets formatted to 40001, which seems reasonable. But if you decide to change this number to 50001, and you backspace over the 4, this is what happens in the input field:

`0001` -> `1`

Instead of typing 5 as you intended, you have to type 5000 because the input field deleted the leading zeros.

Such a format should only be applied on demand. This means that formatting takes place when the user clicks a certain button or hits a certain key. After formatting, we still face the question: Where should the cursor go? The easy solution is to remove the cursor from the input field, so that the user must click inside the input field to reposition the cursor and resume editing. The user-friendly solution would be to automatically move the cursor to a position where the user expects to see it in the formatted text. Cursor maintenance is desirable regardless of whether formatting is done instantly or on demand.
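Both problem formats are trivial to write; the trouble lies entirely in when they run. A small sketch with hypothetical function names replays the two scenarios above.

```javascript
// Format 1: strip whitespace from both ends of the input.
function stripEnds(s) {
  return s.trim();
}

// Format 2: strip zero digits from the left end of a number.
function stripLeadingZeros(s) {
  return s.replace(/^0+/, '');
}

// Instant formatting: typing a space after "Hello," is undone at once.
var afterSpace = stripEnds('Hello,' + ' ');       // 'Hello,'
// Pasting 0040001 looks fine...
var pasted = stripLeadingZeros('0040001');         // '40001'
// ...but backspacing over the 4 leaves raw text 0001, which collapses.
var afterBackspace = stripLeadingZeros('0001');    // '1'
```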

Sample formats

In my experiments with cursor maintenance, I used two formats extensively. The first renders positive whole numbers into groups of three digits separated by commas. I call this format commatize. It can be implemented with the following JavaScript function.

plain.commatize = function (s) {           // s is a string composed of
  var start, groups, i;                    //  digits and commas.
  s = s.replace(/,/g, '');                 // Remove all commas.
  start = s.length % 3 || 3;               // Begin with 1, 2, or 3 digits.
  groups = [ s.substring(0, start) ];      // Make the first group of digits.
  for (i = start; i < s.length; i += 3) {
    groups.push(s.substring(i, i + 3));    // Add three-digit groups.
  }
  s = groups.join(',');                    // Insert commas between groups.
  return s;
};

The other format, trimify, removes all whitespace from the beginning of the text and condenses every other whitespace sequence to a single space.

For example, the input text

`  how much  wood  could  a   woodchuck  chuck  `

is transformed by trimify into:

`how much wood could a woodchuck chuck `

This format can be implemented as follows.

plain.trimify = function (s) { // s is an arbitrary string.
  s = s.replace(/^\s+/, '');   // Remove whitespace from the beginning.
  s = s.replace(/\s+/g, ' ');  // Condense remaining whitespace sequences
  return s;                    //  to one space each.
};

Testing methodology

Some cursor-maintenance approaches involve reimplementing the formatter. To verify that the reimplemented formatter performs the same text transformation as the original formatter, I ran command-line tests with a [batch tester]().

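In addition to verifying the format, the batch tests helped me to manually assess the new cursor positions computed by various cursor-maintenance approaches. Each case in the test suite contains four values: the raw text, the raw cursor position, the expected formatted text, and the expected new cursor position.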

Here is a case in the trimify test suite:

('  whirled    peas  now  ', 11, 'whirled peas now ', 8)

This describes a situation in which the cursor is initially at position 11:

`  whirled  ^  peas  now  `

After trimify, the cursor moves to position 8:

`whirled ^peas now `

That is the answer expected by this test case. Is it the optimal answer? It’s hard to say.

What we can say confidently is that if the cursor is surrounded by spaces in the raw text, the cursor should end up to the immediate left or right of the single space that remains after trimification. In the test case above, it’s on the right. This is what it would look like on the left, in position 7:

`whirled^ peas now `

The user would be surprised by any answer other than position 7 or 8. We always want to choose one of these two answers, and we want to do so consistently.

What does consistency mean? In cases where the cursor is surrounded by spaces in the raw text, we could always place the cursor to the left of the remaining space after trimification, or we could always place it to the right.

Either of these would make a consistent pattern of cursor positioning. In my test suite, I made the arbitrary choice to always place the cursor to the right of the space.
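The core loop of such a batch tester can be sketched under a few assumptions: a case is an array like the one above (raw text, raw cursor position, expected formatted text, expected new cursor position), a cursor-maintaining formatter returns { text, cursor }, and runCases is a hypothetical name. For the demonstration, the formatter uses the append-a-marker trick from the ad hoc section of this article.

```javascript
// Run every case and collect the ones whose text or cursor disagree.
function runCases(formatWithCursor, cases) {
  var failures = [];
  cases.forEach(function (c) {
    var result = formatWithCursor(c[0], c[1]);
    if (result.text !== c[2] || result.cursor !== c[3]) {
      failures.push({ input: c, got: result });
    }
  });
  return failures;                 // An empty array means every case passed.
}

// Plain trimify.
function trimify(s) {
  return s.replace(/^\s+/, '').replace(/\s+/g, ' ');
}

// Cursor-maintaining trimify: trimify the left part plus a marker.
function trimifyWithCursor(s, cursor) {
  var left = trimify(s.substring(0, cursor) + '|'),
      text = trimify(s);
  return { text: text, cursor: Math.min(text.length, left.length - 1) };
}

var failures = runCases(trimifyWithCursor, [
  ['  whirled    peas  now  ', 11, 'whirled peas now ', 8]
]);
```

Recording the computed result alongside each failing case makes it easy to eyeball whether a disagreement is a real bug or just the other acceptable answer.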

To observe other patterns, I followed up automated testing on the command line with manual testing. I did much of this with an [interactive comparison test]() that displays the results of six cursor-maintenance approaches applied to commatize and trimify. Each approach is explained below.

Ad hoc solutions

If we are well acquainted with a given format, we may see a trick to calculating new cursor positions. Consider the general effect of commatize: it inserts and deletes commas while leaving the digit characters as they are. It makes sense to position the cursor in such a way that the number of digits found to the left of the cursor remains the same after formatting.

We can use this idea to write a new version of commatize:

adHoc.commatize = function (s, cursor) {
  var pos, ch, leftDigitCount;
  // Count the digit characters to the left of the cursor.
  leftDigitCount = s.substring(0, cursor).replace(/,/g, '').length;
  // Apply the original commatize function.
  s = plain.commatize(s);
  if (leftDigitCount == 0) {
    return { text: s, cursor: 0 };
  }
  // Count off digit characters.
  for (pos = 0; pos < s.length; ++pos) {
    ch = s.charAt(pos);
    if (ch != ',') {
      if (--leftDigitCount == 0) {
        break;
      }
    }
  }
  // Place the cursor to the right of the counted digits.
  cursor = pos + 1;
  return { text: s, cursor: cursor };
};

This function calls the original commatize formatter to transform the text. It calculates a new cursor position by exploiting a particular quality of commatize.

Let’s see if we can devise a cursor calculation for trimify. The effect of trimify is to delete whitespace characters. It stands to reason that we should move the cursor one position to the left for each character that is deleted to the left of the cursor. But if the cursor is adjacent to a whitespace character, it isn’t obvious whether we should deem that character to be deleted.

In this case, the space to the left of the cursor is preserved:

` Hello, ^world.  `  ->  `Hello, ^world. `

In these cases it is removed:

`  ^  Hello, world.  `  ->  `^Hello, world. `

` ^Hello, world.  `  ->  `^Hello, world. `

It turns out that we can handle every case with the trick of appending a non-whitespace character to s.substring(0, cursor) before trimifying it:

adHoc.trimify = function (s, cursor) {
  var leftTrimmed;
  // Split the string and mark the cursor's position with non-whitespace.
  leftTrimmed = plain.trimify(s.substring(0, cursor) + '|');
  s = plain.trimify(s);
  // The cursor moves to the right end of the trimified substring or the
  //  entire trimified text, whichever is shorter.
  cursor = Math.min(s.length, leftTrimmed.length - 1);
  return { text: s, cursor: cursor };
};

This is a good trick in that it calculates reasonable cursor positions. The trouble is that it is an improvised idea rather than a systematic approach.

Using a mock cursor

If the problem is that the text and the cursor are separate entities, perhaps the solution is to bring them together. Let’s incorporate the cursor into the text as a special character that we’ll call the mock cursor. Any character that is not already in the text will do. Consider this state:

49^50,000

The mock cursor could be a caret, ^, so that the text becomes:

49^50,000

We can perform cursor maintenance as follows:

  1. Insert the mock cursor at the raw cursor position.
  2. Run a modified formatter that accommodates the mock cursor.
  3. Record the position of the mock cursor in the formatted text.
  4. Remove the mock cursor from the text.

The problem of cursor maintenance is reduced to modifying the formatter to work around the mock cursor. Let’s see what this entails for commatize and trimify.

By inserting the mock cursor into text that is to be commatized, we displace each comma that occurs to the left of the cursor, because the original commatize counts the mock cursor as though it were a digit when it splits the text into groups of three. We can adjust for this by scanning the characters of the text in reverse, appending every non-comma character to a list, counting only the digit characters and adding a comma after every third one. At the end we reverse the list and concatenate its contents to obtain the formatted text. This implementation achieves the same effect as the original commatize and works equally well on text with or without a mock cursor.
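A minimal sketch of the four mock-cursor steps together with a reverse-scan commatize, assuming the mock cursor is ^ and never occurs in the user's text. The names withMockCursor and mockCommatize are hypothetical. One detail the prose leaves open is where a new comma goes when it competes with the mock cursor for the same spot; this sketch emits each comma just before the next digit in the reverse scan, so the comma lands to the left of the mock cursor.

```javascript
// Assumption: '^' never occurs in the user's text.
var MOCK = '^';

// Steps 1-4: insert the mock cursor, run a mock-aware formatter, record
// the mock cursor's position, and remove it.
function withMockCursor(mockFormat, s, cursor) {
  var formatted = mockFormat(s.substring(0, cursor) + MOCK + s.substring(cursor));
  return { text: formatted.replace(MOCK, ''), cursor: formatted.indexOf(MOCK) };
}

// commatize reimplemented as a right-to-left scan that counts only digits,
// so the mock cursor no longer shifts the comma positions.
function mockCommatize(s) {
  var out = [], digitCount = 0, i, ch;
  for (i = s.length - 1; i >= 0; --i) {
    ch = s.charAt(i);
    if (ch === ',') {
      continue;                              // Drop every existing comma.
    }
    if (ch !== MOCK) {                       // A digit.
      if (digitCount > 0 && digitCount % 3 === 0) {
        out.push(',');                       // Three digits done: add a comma.
      }
      ++digitCount;
    }
    out.push(ch);
  }
  return out.reverse().join('');
}
```

With this sketch, 49^50,000 becomes 4,9^50,000 and 1,1^5,000 becomes 11^5,000.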

With trimify, the mock cursor impedes formatting more locally. Inserting the mock cursor and applying the original trimify results in text containing at most one extraneous space to the immediate left or right of the mock cursor.

If there is a space on each side of the cursor, we delete the one on the right:

"Hello, ^ world."  ->  "Hello, ^world."

If the mock cursor is the first character and there is a space next to it, we delete that space:

"^ Hello, world."  ->  "^Hello, world."

This clean-up work is sufficient to adapt trimify to text containing a mock cursor. In comparison, commatize required a completely rewritten implementation.
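The trimify clean-up can be sketched the same way, again assuming ^ never occurs in the text; withMockCursor and mockTrimify are hypothetical names, and trimify repeats the plain version from earlier so the block runs on its own.

```javascript
// Assumption: '^' never occurs in the user's text.
var MOCK = '^';

// Steps 1-4: insert the mock cursor, format, locate it, remove it.
function withMockCursor(mockFormat, s, cursor) {
  var formatted = mockFormat(s.substring(0, cursor) + MOCK + s.substring(cursor));
  return { text: formatted.replace(MOCK, ''), cursor: formatted.indexOf(MOCK) };
}

// The original trimify, unchanged.
function trimify(s) {
  s = s.replace(/^\s+/, '');
  return s.replace(/\s+/g, ' ');
}

// trimify plus the two clean-up rules for the mock cursor.
function mockTrimify(s) {
  var t = trimify(s);
  // A space on each side of the mock cursor: delete the one on the right.
  t = t.replace(' ' + MOCK + ' ', ' ' + MOCK);
  // Mock cursor first, followed by a space: delete that space.
  if (t.substring(0, 2) === MOCK + ' ') {
    t = MOCK + t.substring(2);
  }
  return t;
}
```

Compared with the reverse-scan commatize, the formatter itself is untouched here; only a few lines of post-processing are needed.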

The differences in adaptation demonstrate that introducing a mock cursor fails to simplify cursor maintenance. Although it may help with some formats, it is not a systematic remedy. It merely defers improvisation.

Many other ideas for cursor maintenance exhibit the same flaw. For example, it may seem useful to split the text at the cursor position, independently format the left and right portions, then glue them back together and tidy up around the split. This is another idea that adds superficial rigor but devolves into ad hoc computations.

The meta approach

Let’s try again to bring the cursor and the text together. This time, let’s go beyond text to a kind of meta-text. Let’s define a new data type that represents text with a cursor. An object of this type will have a text property and a cursor property. It will support operations that are similar to string operations, except they affect the cursor as well as the text. Each operation that causes the text to change will calculate a new cursor position that makes sense for that operation. If we implement a format with a sequence of such operations, we can perform cursor maintenance by reading the cursor property after formatting.

Consider what should happen to the cursor when we insert a character into the text. If the character is inserted to the right of the cursor, the cursor position is unchanged. If the character is inserted to the left, the cursor position increases by one. We can extend this calculation to inserting several characters at a time: inserting k characters to the left of the cursor increases its position by k.

Suppose we have the text "lemon" with the cursor at position 3:

"lem^on"

Inserting "ade" at position 5, to the right of the cursor, causes no change in cursor position:

"lem^onade"

Inserting "pink ", a string of 5 characters, at position 0 changes the cursor to 3 + 5 = 8:

"pink lem^onade"

Now consider deletion. If a character is deleted to the right of the cursor, the cursor doesn’t move. If a character is deleted to the left of the cursor, the cursor position decreases by one. The calculation remains straightforward if we want to delete a span of characters lying strictly to the left or to the right of the cursor. When we delete a span of characters that includes the cursor, the cursor moves to the starting point of the deletion.

Suppose we want to delete 10 characters starting from position 2 in this text-with-cursor object:

"pink lem^onade"

The cursor is at position 8, which is between 2 and 2 + 10 = 12. Therefore, the deletion moves the cursor to position 2:

"pi^e"

Any deterministic text transformation can be carried out by a sequence of insertions and deletions. Therefore, any format can be implemented in terms of these two text-with-cursor operations.
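The two operations can be sketched as a small text-with-cursor type. The method names length, read, insert, and delete follow the meta.commatize listing in this section; the constructor shape, the optional count argument to delete, and the rule that text inserted exactly at the cursor lands to its right are assumptions of this sketch, not the CursorMaintenance library.

```javascript
// A minimal text-with-cursor type (a sketch, not the library class).
function TextWithCursor(text, cursor) {
  this.text = text;
  this.cursor = cursor;
}

TextWithCursor.prototype.length = function () {
  return this.text.length;
};

TextWithCursor.prototype.read = function (pos) {
  return this.text.charAt(pos);
};

// Insert s at pos. Text inserted to the left of the cursor pushes the
// cursor right; text inserted at or to the right of it leaves it alone.
TextWithCursor.prototype.insert = function (pos, s) {
  this.text = this.text.substring(0, pos) + s + this.text.substring(pos);
  if (pos < this.cursor) {
    this.cursor += s.length;
  }
};

// Delete count characters (default 1) starting at pos.
TextWithCursor.prototype.delete = function (pos, count) {
  var end;
  count = count === undefined ? 1 : count;
  end = pos + count;
  this.text = this.text.substring(0, pos) + this.text.substring(end);
  if (end <= this.cursor) {
    this.cursor -= count;          // Span lies entirely left of the cursor.
  } else if (pos < this.cursor) {
    this.cursor = pos;             // Span contains the cursor.
  }                                // Otherwise the span is right of it.
};

// Replaying the lemonade example:
var t = new TextWithCursor('lemon', 3);  // lem^on
t.insert(5, 'ade');                      // lem^onade       (cursor 3)
t.insert(0, 'pink ');                    // pink lem^onade  (cursor 8)
t.delete(2, 10);                         // pi^e            (cursor 2)
```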

Have we found a foolproof method of cursor maintenance? Not exactly. A single insertion or deletion modifies the cursor in a predictable and reasonable manner. The hitch is that a sequence of insertions and deletions can cause cursor displacement that is neither readily predictable nor reasonable to a human user.

For example, an implementation of commatize could start by initializing a new, empty text-with-cursor object and appending characters one at a time as it scans the input text. The new object’s cursor starts at position 0, and every appended character lands to its right, so the new cursor position is always at the left end of the text.

The meta approach is only likely to result in user-friendly cursor positions if each format is implemented with a sequence of local changes to the input text.

Here is my cursor-maintaining implementation of commatize using a text-with-cursor object implemented in JavaScript:

meta.commatize = function (s, cursor) {
  var t = new CursorMaintenance.TextWithCursor(s, cursor),
      digitCount = 0,
      pos;
  for (pos = t.length() - 1; pos >= 0; --pos) {
    if (t.read(pos) == ',') {
      t.delete(pos);
    } else if (digitCount == 3) {
      t.insert(pos + 1, ',');
      digitCount = 1;
    } else {
      ++digitCount;
    }
  }
  return t;
};
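To see the contrast between local edits and rebuilding the text from scratch, the sketch below runs the meta.commatize listing (as metaCommatize, with straight quotes) next to a hypothetical appendingCommatize that builds a fresh object one character at a time. It repeats a stripped-down TextWithCursor so that it runs on its own; that class is an assumption, not the CursorMaintenance library.

```javascript
// Stripped-down text-with-cursor type (assumed, not the library class).
function TextWithCursor(text, cursor) {
  this.text = text;
  this.cursor = cursor;
}
TextWithCursor.prototype.length = function () { return this.text.length; };
TextWithCursor.prototype.read = function (pos) { return this.text.charAt(pos); };
TextWithCursor.prototype.insert = function (pos, s) {
  this.text = this.text.substring(0, pos) + s + this.text.substring(pos);
  if (pos < this.cursor) this.cursor += s.length;
};
TextWithCursor.prototype.delete = function (pos) {   // One character only.
  this.text = this.text.substring(0, pos) + this.text.substring(pos + 1);
  if (pos < this.cursor) this.cursor -= 1;
};

// Local edits on the raw text (the listing above).
function metaCommatize(s, cursor) {
  var t = new TextWithCursor(s, cursor), digitCount = 0, pos;
  for (pos = t.length() - 1; pos >= 0; --pos) {
    if (t.read(pos) === ',') {
      t.delete(pos);
    } else if (digitCount === 3) {
      t.insert(pos + 1, ',');
      digitCount = 1;
    } else {
      ++digitCount;
    }
  }
  return t;
}

// Rebuild from scratch: the raw cursor is never consulted, so the new
// cursor stays at position 0.
function appendingCommatize(s, cursor) {
  var digits = s.replace(/,/g, ''), t = new TextWithCursor('', 0), i;
  for (i = 0; i < digits.length; ++i) {
    if (i > 0 && (digits.length - i) % 3 === 0) {
      t.insert(t.length(), ',');
    }
    t.insert(t.length(), digits.charAt(i));
  }
  return t;
}
```

For 1,1^5,000 both functions produce 115,000, but metaCommatize leaves the cursor at 11^5,000 while appendingCommatize leaves it at ^115,000.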

It may be argued that reimplementing a format with local operations on a text-with-cursor object is a more structured approach to cursor maintenance than the previous ad hoc approaches. Nonetheless, it is probably best to think of the meta approach as a member of the ad hoc family because it requires considerable creativity to reimplement the format. Although the text-with-cursor object hides the bookkeeping of cursor calculations, it doesn’t eliminate the necessity of thinking about the cursor. The developer doing the reimplementation has to understand the format thoroughly and think carefully about the cursor displacement caused by each insertion and deletion.

Retrospective approach with edit distance

So far we have attempted to do cursor maintenance by analyzing and reimplementing the formatter. Let’s see what we can accomplish by narrowing our attention to one formatting instance. We disregard the general workings of the formatter and only consider these three values: the raw text, the raw cursor position, and the formatted text.

To compute a new cursor position, let’s try to answer this question: What position in the formatted text resembles the cursor’s position in the raw text?

To make it feasible to compute, the resemblance must be quantifiable with a measure that agrees with human perception of the cursor’s place in the text. I call this the retrospective approach because it chooses a cursor position in the formatted text by looking back at the cursor in the raw text.

One idea for measuring resemblance begins with the observation that the cursor splits the text into left and right parts. It can be argued that the left part of the raw text should resemble the left part of the formatted text, and the right part of the raw text should resemble the right part of the formatted text.

Maximizing resemblance is the same as minimizing difference. The difference between two strings can be expressed by their edit distance, which is the number of elementary operations required to transform one string into another. One such measure is the Levenshtein distance, in which each elementary operation is the insertion, deletion, or replacement of a character.

Suppose the Levenshtein distance is implemented with a function called levenshtein. Given the raw text s and raw cursor position p, we can assign a score to each position q in the formatted text t by computing the following sum and choosing positions that achieve the lowest scores:

levenshtein(s.substring(0, p), t.substring(0, q)) +
    levenshtein(s.substring(p), t.substring(q))

I call that the split Levenshtein cost function. In my informal testing, it works fairly well. It usually agrees with the results of the ad hoc approaches even though it is oblivious to the details of the format.
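A straightforward sketch of the split Levenshtein cost function, using the textbook dynamic-programming edit distance. The function names levenshtein and splitLevenshtein are hypothetical; ties go to the leftmost position.

```javascript
// Levenshtein distance with a rolling row: O(a.length * b.length) time.
function levenshtein(a, b) {
  var prev = [], curr, i, j;
  for (j = 0; j <= b.length; ++j) {
    prev[j] = j;                                   // Insert j characters.
  }
  for (i = 1; i <= a.length; ++i) {
    curr = [i];                                    // Delete i characters.
    for (j = 1; j <= b.length; ++j) {
      curr[j] = Math.min(
        prev[j] + 1,                               // Deletion.
        curr[j - 1] + 1,                           // Insertion.
        prev[j - 1] + (a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1)); // Replacement.
    }
    prev = curr;
  }
  return prev[b.length];
}

// Score every position q in t against raw cursor p in s; return all
// scores and the leftmost lowest-scoring position.
function splitLevenshtein(s, p, t) {
  var scores = [], best = 0, q;
  for (q = 0; q <= t.length; ++q) {
    scores.push(levenshtein(s.substring(0, p), t.substring(0, q)) +
                levenshtein(s.substring(p), t.substring(q)));
    if (scores[q] < scores[best]) {
      best = q;
    }
  }
  return { scores: scores, cursor: best };
}
```

For 14,^00 -> 1,400 this yields the scores 6, 4, 2, 2, 3, 5 and picks position 2.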

The weakness of split Levenshtein is that it often computes tied scores. For example, given the raw text and cursor

14,^00

with the formatted text

1,400

the following scores are computed for each new cursor position:

q  state    score
0  ^1,400   6
1  1^,400   4
2  1,^400   2
3  1,4^00   2
4  1,40^0   3
5  1,400^   5

Whenever there is a tie, I arbitrarily choose the leftmost of the lowest-scoring positions. In many cases, such as the one shown here, this turns out to be a poor choice. Here the rightmost among the tied positions is best. In other cases, it would be better to use the leftmost tied position or something in the middle. There is no simple tie-breaking strategy that works consistently.

Another drawback is the computational cost. It takes O(n^2) time to compute the Levenshtein distance for a pair of strings of length O(n). There are n + 1 possible cursor positions in a formatted text of length n, so it costs O(n^3) to solve one instance of the cursor-maintenance problem.

Although cubic time isn’t a burden for small input fields, it becomes problematic for document-scale formatting. If the text consists of 10^3 characters, or about 200 words of English prose, the cost of using Levenshtein for one instance of cursor maintenance is on the order of 10^9 computational steps.

Retrospective approach with frequency ratios

Another way to evaluate cursor positions is to count character frequencies on each side of the cursor.

For example, in the state

`  later, ^ alligator  `

we can observe that the character l occurs once to the left of the cursor and twice to its right.

Let’s define a pair of character-frequency functions, left and right, that each take a character, a string, and a position. They return the frequency of the character to the left and right of the position, respectively. In the above state, if we call the raw text s, we have:

left('l', s, 9) = 1
right('l', s, 9) = 2

Suppose s gets formatted, resulting in text t. If there is a position q in t such that

left('l', t, q) = 1
right('l', t, q) = 2

we might view it as evidence that q is a good choice of new cursor position because it preserves the left and right frequencies of the character l. There is no guarantee that we will find such a position or that it will be unique. Indeed, in our example there are several positions that preserve the left and right frequencies of l. We should try to take into account the frequencies of all the characters that appear in the text.

Given position p in the raw text s, let us compute the frequency ratio

a(p) = left(c, s, p) / (left(c, s, p) + right(c, s, p))

for each character c that occurs in both s and t.

Observe that left(c, s, p) + right(c, s, p) is equal to the total frequency of the character c in s. By denoting the total frequency as count(c, s), we can write the expression more concisely:

a(p) = left(c, s, p) / count(c, s)

At each position q in the formatted text t, we likewise calculate

b(q) = left(c, t, q) / count(c, t)

for each character c that occurs in both s and t.

For each character, we take the square of the difference between a(p) and b(q):

cost(c, q) = (a(p) - b(q)) ** 2

Note that p, the raw cursor position, stays constant, while we vary q over every possible cursor position in the formatted text. We calculate a score for each candidate position q by taking the sum of the values cost(c, q) over all characters c that occur in both s and t. The cursor is set to the position that achieves the minimum score. If there are several, we choose the leftmost.
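A direct, deliberately unoptimized sketch of the frequency-ratio score. The helpers left and count follow the definitions above; frequencyRatioCursor is a hypothetical name. Each call to left rescans the text, so this version is much slower than the linear-time scheme described at the end of this section.

```javascript
// left(c, s, pos): occurrences of character c in s to the left of pos.
function left(c, s, pos) {
  var n = 0, i;
  for (i = 0; i < pos; ++i) {
    if (s.charAt(i) === c) {
      ++n;
    }
  }
  return n;
}

// count(c, s): total occurrences of c in s.
function count(c, s) {
  return left(c, s, s.length);
}

// Score every position q in t; return the scores and the leftmost
// position with the minimum score.
function frequencyRatioCursor(s, p, t) {
  var shared = [], scores = [], best = 0, i, c, q, a, b, sum;
  // Characters that occur in both s and t, each listed once.
  for (i = 0; i < s.length; ++i) {
    c = s.charAt(i);
    if (t.indexOf(c) !== -1 && shared.indexOf(c) === -1) {
      shared.push(c);
    }
  }
  for (q = 0; q <= t.length; ++q) {
    sum = 0;
    for (i = 0; i < shared.length; ++i) {
      c = shared[i];
      a = left(c, s, p) / count(c, s);   // a(p) for this character
      b = left(c, t, q) / count(c, t);   // b(q) for this character
      sum += (a - b) * (a - b);          // cost(c, q)
    }
    scores.push(sum);
    if (sum < scores[best]) {
      best = q;
    }
  }
  return { scores: scores, cursor: best };
}
```

Running it on the two formatting instances in this section reproduces both score tables, including the tie at positions 1 and 3 for 14^,00 -> 1,400.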

For the formatting instance 14,^00 -> 1,400, the scores are:

q  state    score
0  ^1,400   3
1  1^,400   2
2  1,^400   1
3  1,4^00   0
4  1,40^0   0.25
5  1,400^   1

Here the cursor position with the lowest score is indeed optimal.

This is the retrospective approach with frequency ratios. It succeeds in some problem instances where split Levenshtein fails. In other cases, it does no better.

For the formatting instance 14^,00 -> 1,400, the scores computed by the frequency-ratio cost function are:

q  state    score
0  ^1,400   2
1  1^,400   1
2  1,^400   2
3  1,4^00   1
4  1,40^0   1.25
5  1,400^   2

We have a tie between 1^,400 and 1,4^00. By comparing these to the raw state 14^,00, we can see that 1^,400 scores 1 because the 4 was previously to the left of the cursor and here is to the right. Conversely, 1,4^00 scores 1 because the , has moved from the right to the left. The cost function isn’t aware, as humans are, that digit characters are more important than commas when it comes to positioning the cursor.

When it comes to computational cost, the frequency-ratio cost function is distinctly superior to the split Levenshtein cost function. Given a fixed set of characters in a text of size n, we can count the left frequencies at every position with one scan in O(n) time. The rightmost left frequency gives us the count value. With a second scan through the left frequencies, we compute the ratio of left to count at each position. The overall time complexity is linear with respect to the length of the text.
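The linear-time counting scheme can be sketched as follows. The function names are hypothetical, and every tracked character is assumed to occur in the text at least once (as it does when we restrict attention to characters found in both s and t); otherwise the ratio divides by zero. Copying k counters per position makes the work O(n * k), which is linear in n for a fixed character set.

```javascript
// First scan: rows[i] holds, for each tracked character, its frequency
// to the left of position i (i = 0..n). The last row holds the totals.
function leftFrequencyTable(text, chars) {
  var rows = [], counts = {}, i, c;
  chars.forEach(function (ch) { counts[ch] = 0; });
  for (i = 0; i <= text.length; ++i) {
    rows.push(Object.assign({}, counts));   // Snapshot for position i.
    c = text.charAt(i);
    if (i < text.length && counts.hasOwnProperty(c)) {
      ++counts[c];
    }
  }
  return rows;
}

// Second scan: divide each left frequency by the total (the last row).
function ratioTable(text, chars) {
  var rows = leftFrequencyTable(text, chars),
      totals = rows[rows.length - 1];
  return rows.map(function (row) {
    var ratios = {};
    chars.forEach(function (ch) { ratios[ch] = row[ch] / totals[ch]; });
    return ratios;
  });
}
```

Building one such table for the raw text and one for the formatted text gives every a(p) and b(q) value needed by the cost function without rescanning.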

The layer approach

A weakness of the split Levenshtein and frequency-ratio cost functions is that they treat all characters equally, whereas users do not. A human user attaches different meanings to various characters in accordance with the format. In a commatized input field, the user primarily sees the cursor’s position with respect to the digit characters. It would make sense to position the cursor among the digits in the text before considering commas. In a trimified input field, the user primarily sees the cursor’s positioning among non-whitespace characters.

These observations lead to the idea of evaluating cursor positions relative to text subsequences that we’ll call layers. For example, we could define two layers for commatize, the first consisting of digits and the second consisting of commas. Thus, the text 1,400 has the following layers:

     text: 1,400
  layer 0: 1 400
  layer 1:  ,

More formally, layers are induced by character sets. Let’s use the following character sets for commatize:

character set 0: { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }
character set 1: { ',' }
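A minimal sketch of how these character sets induce the layers in the diagram above: characters outside the set are blanked out so that each layer stays aligned with the text. The function name `layer` and the constants `DIGITS` and `COMMAS` are hypothetical.

```python
def layer(text: str, charset: set[str]) -> str:
    """Keep the characters of text that are in charset; blank out the rest."""
    return "".join(ch if ch in charset else " " for ch in text)


DIGITS = set("0123456789")   # character set 0
COMMAS = {","}               # character set 1

print(repr(layer("1,400", DIGITS)))   # '1 400'
print(repr(layer("1,400", COMMAS)))   # ' ,   '
```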

The idea of evaluating a cursor position by counting character frequencies around the cursor can be adapted to the layers induced by character sets.

Let’s define a new function left(C, s, p) that yields the number of characters of text s to the left of position p that are in character set C. Let’s also define a new function count(C, s) that yields the total number of characters of text s that are in character set C, noting that count is easily computed by calling left with the length of s as the parameter p.

For a character set C and text s, we assign a score to position p by computing the ratio:

left(C, s, p) / count(C, s)
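The three definitions translate almost directly into code. This sketch represents a character set as a Python set of one-character strings and returns the ratio as a `Fraction` so that later comparisons are exact; both choices are assumptions for illustration, as are the names.

```python
from fractions import Fraction


def left(charset: set[str], s: str, p: int) -> int:
    """left(C, s, p): characters of s to the left of position p that are in C."""
    return sum(1 for ch in s[:p] if ch in charset)


def count(charset: set[str], s: str) -> int:
    """count(C, s): left evaluated with p equal to the length of s."""
    return left(charset, s, len(s))


def score(charset: set[str], s: str, p: int) -> Fraction:
    """The ratio left(C, s, p) / count(C, s); assumes count(C, s) > 0."""
    return Fraction(left(charset, s, p), count(charset, s))


DIGITS = set("0123456789")
print(left(DIGITS, "14,00", 2), count(DIGITS, "14,00"), score(DIGITS, "14,00", 2))
# 2 4 1/2
```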

We scan the text layers one by one, in the order of the character sets. When considering the layer induced by the first character set, C_0, we seek a position in the formatted text where the score is closest to the score achieved by the raw cursor position in the raw text. That is to say, given the target

a_0  =  left(C_0, s, p) / count(C_0, s)

for character set C_0 in the raw text s at the raw cursor position p, we are looking for the minimum absolute difference

abs(left(C_0, t, q) / count(C_0, t) - a_0)

over all positions q in the formatted text t.

If several positions in the formatted text achieve the minimum difference, these positions must be contiguous, because the ratio left(C_0, t, q) / count(C_0, t) never decreases as q moves from left to right.

Let’s write [l_0, r_0] to denote the range of optimal positions found for character set C_0 over the entire formatted text. If l_0 is equal to r_0, we stop looking and return l_0 as our choice of cursor position. Otherwise, we proceed to the next character set, C_1, and seek to break the tie by evaluating the scoring function over the range [l_0, r_0] in the formatted text.

We ignore positions outside the range [l_0, r_0] even if they achieve a closer score in C_1. The search in the higher-ranked C_0 restricts the range of candidate positions. Thus, the range [l_1, r_1] must be a subrange of (and possibly equal to) [l_0, r_0].
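One layer of the search can be sketched as a function that narrows a candidate range [lo, hi] to the sub-range of minimizers. Called with the whole formatted text as the range, it yields [l_0, r_0]; called again with that range and C_1, it yields [l_1, r_1]. Exact `Fraction` arithmetic keeps ties from being split by floating-point rounding. The article does not say what happens when a layer is empty in either text, so returning the range unchanged in that case is an assumption, as is the name `best_range`.

```python
from fractions import Fraction


def left(charset: set[str], s: str, p: int) -> int:
    """Number of characters of s to the left of position p that are in charset."""
    return sum(1 for ch in s[:p] if ch in charset)


def best_range(charset: set[str], raw: str, p: int,
               formatted: str, lo: int, hi: int) -> tuple[int, int]:
    """Narrow the candidates [lo, hi] to the positions q of the formatted text
    whose ratio for charset is closest to the raw ratio at p."""
    raw_total = left(charset, raw, len(raw))
    fmt_total = left(charset, formatted, len(formatted))
    if raw_total == 0 or fmt_total == 0:
        return lo, hi                # assumption: an empty layer cannot discriminate
    target = Fraction(left(charset, raw, p), raw_total)        # a_0 for this layer
    diffs = [abs(Fraction(left(charset, formatted, q), fmt_total) - target)
             for q in range(lo, hi + 1)]
    best = min(diffs)
    # The ratio never decreases as q grows, so the minimizers are contiguous.
    hits = [lo + i for i, d in enumerate(diffs) if d == best]
    return hits[0], hits[-1]


DIGITS = set("0123456789")
print(best_range(DIGITS, "14,00", 2, "1,400", 0, 5))   # (3, 3)
print(best_range(DIGITS, "14,00", 1, "1,400", 0, 5))   # (1, 2)
```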

We scan the layers induced by each character set until the candidate range narrows to a single position or no more character sets remain. If we run out of character sets, we return the left end of the final range by default. We can optionally configure the algorithm to return the right end of the final range.

In sum, the parameters that we supply to the layer approach are a list of character sets and a one-bit value indicating whether the tie-breaker should be the left end or right end of the final range.
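Putting the pieces together, the layer approach might be sketched like this. The configuration is exactly the two parameters just named: an ordered list of character sets and a flag choosing the right end instead of the left end of the final range. Skipping a layer that is empty in either text is an assumption, and `layer_cursor` is a hypothetical name. This version recomputes `left` at every position for clarity, so it is quadratic rather than linear.

```python
from fractions import Fraction


def left(charset: set[str], s: str, p: int) -> int:
    return sum(1 for ch in s[:p] if ch in charset)


def layer_cursor(raw: str, p: int, formatted: str,
                 charsets: list[set[str]], prefer_right: bool = False) -> int:
    """Layer approach: narrow the candidate range one character set at a time."""
    lo, hi = 0, len(formatted)                # initially every position is a candidate
    for charset in charsets:
        if lo == hi:                          # a single position remains: stop
            break
        raw_total = left(charset, raw, len(raw))
        fmt_total = left(charset, formatted, len(formatted))
        if raw_total == 0 or fmt_total == 0:  # assumption: an empty layer is skipped
            continue
        target = Fraction(left(charset, raw, p), raw_total)
        diffs = {q: abs(Fraction(left(charset, formatted, q), fmt_total) - target)
                 for q in range(lo, hi + 1)}
        best = min(diffs.values())
        hits = [q for q, d in diffs.items() if d == best]
        lo, hi = hits[0], hits[-1]            # contiguous, so the ends suffice
    return hi if prefer_right else lo         # the one-bit tie-breaker


DIGITS, COMMAS = set("0123456789"), {","}
print(layer_cursor("14,00", 2, "1,400", [DIGITS, COMMAS]))   # 3, i.e. 1,4^00
```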

It is convenient to implement layer scanning with regular expressions because they offer a concise syntax for specifying character sets, which are known to regex aficionados as character classes. With the character sets defined earlier for commatize, we can find layer 0 in a given text by iterating over its characters and testing each one with the regular expression /[0-9]/ or, equivalently, /\d/. We can find layer 1 by testing with /,/.
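In Python, the same idea could use the `re` module, testing each character against a compiled character class. The sketch spells out `[0-9]` instead of `\d` because, for str patterns, Python's `\d` also matches non-ASCII decimal digits unless the `re.ASCII` flag is given. `LAYER_PATTERNS` and `layer_positions` are hypothetical names.

```python
import re

# Character sets for commatize, written as regex character classes.
LAYER_PATTERNS = [re.compile(r"[0-9]"), re.compile(r",")]


def layer_positions(text: str, pattern: re.Pattern) -> list[int]:
    """Indices of the characters of text that belong to the layer."""
    return [i for i, ch in enumerate(text) if pattern.fullmatch(ch)]


for n, pattern in enumerate(LAYER_PATTERNS):
    print(n, layer_positions("1,400", pattern))
# 0 [0, 2, 3, 4]
# 1 [1]
```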

Let’s use these character sets to find a new cursor position for this formatting instance:

14^,00 -> 1,400

First we scan layer 0 (the digits) of the raw text and evaluate the raw cursor position, which is 2. Layer 0 has two characters to the left of position 2 and four characters overall. Thus, the target ratio for layer 0 is:

2 / 4 = 0.5

Now we scan layer 0 of the formatted text. The initial candidate range is [0, 5]. We count the layer 0 characters to the left of each cursor position in the candidate range:

  formatted text:  1 , 4 0 0
         layer 0:  1   4 0 0
 cursor position: 0 1 2 3 4 5
    layer 0 left: 0 1 1 2 3 4

At position 3, the ratio is 2 / 4 = 0.5, which is equal to the target ratio. No other position achieves this, so we choose 3 as the new cursor position:

1,4^00
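The arithmetic of this walkthrough can be checked with a few lines. The sketch recomputes the target ratio and the layer 0 left counts for every position of 1,400, then lists the positions whose ratio equals the target; the helper `left` is hypothetical.

```python
from fractions import Fraction

DIGITS = set("0123456789")


def left(charset, s, p):
    return sum(1 for ch in s[:p] if ch in charset)


raw, p, formatted = "14,00", 2, "1,400"                                  # 14^,00 -> 1,400
target = Fraction(left(DIGITS, raw, p), left(DIGITS, raw, len(raw)))   # 2 / 4
total = left(DIGITS, formatted, len(formatted))                        # 4
rows = [(q, left(DIGITS, formatted, q), Fraction(left(DIGITS, formatted, q), total))
        for q in range(len(formatted) + 1)]
for q, n, ratio in rows:
    print(q, n, ratio)       # position, layer 0 left count, ratio
matches = [q for q, _, ratio in rows if ratio == target]
print(target, matches)       # 1/2 [3]
```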

We have solved this instance of commatize successfully. In fact, the layer approach always keeps the cursor positioned correctly among digits. That’s because commatize neither deletes nor inserts digits, so layer 0 is identical in the raw text and the formatted text.

It turns out that we would be better off without layer 1. If two candidate positions remain after processing layer 0, they must lie to the left and right of a comma. Leaving the decision to layer 1 would result in sometimes going left and sometimes right. By deleting layer 1 from the configuration and relying on the default tie-breaker, which returns the left end of the range, we always place the cursor to the left of the comma and achieve consistent cursor positioning.
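The inconsistency can be demonstrated with a compact, hypothetical re-implementation of the layer approach applied to two formatting instances chosen for illustration: 1^4,00 -> 1,400 and 1,2^345 -> 12,345. In both, layer 0 leaves a tie on either side of the comma. With the comma layer, one cursor lands left of the comma and the other right; with digits only and the left tie-breaker, both land left. Skipping empty layers is an assumption of the sketch.

```python
from fractions import Fraction


def left(charset, s, p):
    return sum(1 for ch in s[:p] if ch in charset)


def layer_cursor(raw, p, formatted, charsets, prefer_right=False):
    """Compact layer approach (hypothetical helper): narrow [lo, hi] per layer."""
    lo, hi = 0, len(formatted)
    for cs in charsets:
        raw_total, fmt_total = left(cs, raw, len(raw)), left(cs, formatted, len(formatted))
        if lo == hi or raw_total == 0 or fmt_total == 0:
            continue
        target = Fraction(left(cs, raw, p), raw_total)
        diff = {q: abs(Fraction(left(cs, formatted, q), fmt_total) - target)
                for q in range(lo, hi + 1)}
        best = min(diff.values())
        hits = [q for q in diff if diff[q] == best]
        lo, hi = hits[0], hits[-1]
    return hi if prefer_right else lo


DIGITS, COMMAS = set("0123456789"), {","}
instances = [("14,00", 1, "1,400"),     # 1^4,00  -> 1,400
             ("1,2345", 3, "12,345")]   # 1,2^345 -> 12,345
for raw, p, formatted in instances:
    with_commas = layer_cursor(raw, p, formatted, [DIGITS, COMMAS])
    digits_only = layer_cursor(raw, p, formatted, [DIGITS])
    print(formatted[:with_commas] + "^" + formatted[with_commas:],
          formatted[:digits_only] + "^" + formatted[digits_only:])
# 1^,400 1^,400
# 12,^345 12^,345
```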

The computational cost of the layer approach is asymptotically the same as that of the retrospective approach with frequency ratios, assuming that the number of character sets is bounded by a constant and that it takes constant time to check membership in a character set. For a given character set and given text, we scan the text once to count the number of layer characters to the left of each position. In a second scan of candidate positions, we compute the position scores. The overall cost is O(n) for text of length n.
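A sketch of the linear-time version: for each character set, one scan per text builds the left counts at every position, and a second pass scores only the current candidate positions. `Fraction` is used for exact tie detection; with floats, rounding could split a genuine tie. The names and the rule of skipping empty layers are assumptions.

```python
from fractions import Fraction


def prefix_left(charset: set[str], s: str) -> list[int]:
    """One scan: entry q is the number of charset characters in s[:q]."""
    counts = [0]
    for ch in s:
        counts.append(counts[-1] + (ch in charset))
    return counts


def layer_cursor_linear(raw: str, p: int, formatted: str,
                        charsets: list[set[str]], prefer_right: bool = False) -> int:
    lo, hi = 0, len(formatted)
    for charset in charsets:
        if lo == hi:
            break
        raw_left = prefix_left(charset, raw)          # first scan, raw text
        fmt_left = prefix_left(charset, formatted)    # first scan, formatted text
        if raw_left[-1] == 0 or fmt_left[-1] == 0:    # assumption: skip empty layers
            continue
        target = Fraction(raw_left[p], raw_left[-1])
        # Second scan: score only the current candidate positions.
        diffs = [abs(Fraction(fmt_left[q], fmt_left[-1]) - target)
                 for q in range(lo, hi + 1)]
        best = min(diffs)
        hits = [lo + i for i, d in enumerate(diffs) if d == best]
        lo, hi = hits[0], hits[-1]
    return hi if prefer_right else lo


DIGITS = set("0123456789")
print(layer_cursor_linear("14,00", 2, "1,400", [DIGITS]))   # 3
```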

Next steps

So you want to implement same-field formatting with cursor maintenance. Which approach should you choose? If you have enough time and you know the format well enough, you can achieve optimal accuracy by coding an ad hoc solution. Among the ad hoc approaches I have explored, I prefer the meta approach because it provides a layer of abstraction over the cursor calculations. As with every ad hoc approach, using it properly requires detailed knowledge of the format.

At the other end of the spectrum is the retrospective approach. It requires the least implementation effort because it is oblivious to the format. The only configuration required is the choice of retrospective cost function. I recommend the frequency-ratio cost function. In my testing, it positioned the cursor correctly much of the time.

My impression of the retrospective approach’s accuracy is vague because I lack the data for a meaningful quantitative assessment. In order to measure the real-world accuracy of a cursor-maintenance implementation, it would have to be tested on a corpus of human text-editing actions sampled without bias from a real-world application, and the test results would have to be compared to human judgments of the acceptable cursor positions in each formatting instance.

It may not be worthwhile to conduct a study on user data. I contend that the only acceptable cursor-maintenance algorithm is one that places the cursor correctly and consistently in every instance. If the cursor sometimes jumps to a surprising position, I consider the input field to be broken. Instead of trying to quantify the inaccuracy of a cursor-maintenance algorithm, I believe that we should look for solutions that we can prove to be completely accurate.

A more accurate solution generally requires more customization for the targeted format. Among the approaches that I implemented, ranging from fully ad hoc to unconfigured retrospective, the layer approach seems to offer the best trade-off between accuracy and complexity of configuration. A small configuration can make the layer approach completely accurate for certain formats. This is not to say that arriving at a good layer configuration is straightforward. It is necessary to think through configuration choices and how they affect cursor positioning, especially in edge cases.

We saw in the previous section that reducing the number of layers may result in more predictable cursor positioning. With other formats, the contrary may be true. There is always a question of what the last layer should look like. Should you define a fallback character set that matches characters that were not matched by earlier layers? Or should you make a character set that matches all possible characters? You may need extensive experimentation and case analysis to find a reasonable configuration.

Even so, I recommend that you try the layer approach before anything else. Analyze the format and see if you can decompose it into semantically significant character sets. Look at situations in which the cursor is positioned where one layer ends and another begins. These tend to be the trickiest cases. Consider tie-breaking with a fallback layer to resolve these cases consistently.

You shouldn’t be entirely satisfied with your layer configuration until you can prove that it results in correct and consistent cursor maintenance. If you can’t, I recommend that you try the meta approach. You should also consider building the input field without cursor maintenance. It would be better to remove the cursor from the input field at formatting time, or to display the formatted text separately from the input field, than to have inconsistent cursor maintenance.