What Typing Patterns Say About You

You might remember a story from around 13 years ago about how Target could predict that one of its young teenage customers was pregnant. By tracking and analyzing what she bought, the retail giant found out about her pregnancy before her own father. Her purchases included certain vitamins and lotions that pregnant women typically buy, and Target's system flagged this behavior as indicative of pregnancy. Subsequently, the retail giant sent coupons for baby clothes, diapers, and cribs to capitalize on the teenager’s future child. When her father saw the coupons, he was furious. He thought they were encouraging his daughter to get pregnant. But it turns out she was already pregnant, and after having confronted the store, her father apologized.

That was a long time ago, and what interactions with different services (like the daughter’s purchases) can reveal has changed. Today, novel methods can derive insights of a different kind: insights that even the originator of the behavior was unaware of. These new methods raise uncomfortable questions about the right to privacy, integrity, and the commodification of personal data, but they also offer new opportunities for the betterment of society.

A graph visualization of “What Typing Patterns Say About You” generated with the keystroke logging program GenoGraphiXLog.

What is keystroke logging?
Applications in Education
Applications in Healthcare
Could keystroke logging be exploited?
Final remarks

What is keystroke logging?

You have probably heard of keylogging before, an abbreviation of keystroke logging. Most people associate keylogging with malware used to collect confidential information. Someone might download malicious software that, unbeknownst to them, installs a keylogger. The keylogger can record their keystrokes in the hopes of revealing sensitive information like passwords or PIN codes.

Keystroke logging as malicious software does not predict any useful information from behavior—it’s only interested in the eventual output, the text itself. Yet it is possible to analyze keystroke logging data to make inferences about the writer, without them explicitly including that information in what they type. This type of analysis is often used for purposes other than stealing passwords. To better understand how that is possible, let’s take a look at the keystroke log.

Keystroke logging involves just what it sounds like: keeping track of the keys a writer presses while typing on a computer. The typed keys are recorded in a log that grows quickly (as demonstrated by the log below).

Table 1. This is a log of the sentence “GGXLog records each keystroke” recorded with the keystroke logging program GenoGraphiXLog (GGXLog).

Patterns primarily derived from different timing features emerge in these extensive logs. The patterns occur at multiple levels, such as individual keystrokes, syllables, words, and paragraphs. For example, at the level of individual keystrokes, patterns can involve how long a writer holds down a key (called hold time or dwell time), the time between releasing one key and pressing the next (called flight time or up-down time), and the time between pressing two consecutive keys (sometimes called down-down time).
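To make the three keystroke-level features concrete, here is a minimal Python sketch. It assumes a simplified log in which every keystroke has a press and a release timestamp in milliseconds; the Keystroke class, the field names, and the sample values are hypothetical and do not reflect the actual GGXLog format.

```python
from dataclasses import dataclass


@dataclass
class Keystroke:
    key: str
    down_ms: int  # timestamp of the key press (hypothetical field name)
    up_ms: int    # timestamp of the key release (hypothetical field name)


def timing_features(strokes):
    """Compute hold, flight, and down-down times for a list of keystrokes."""
    pairs = list(zip(strokes, strokes[1:]))
    return {
        # Hold (dwell) time: how long each key is held down.
        "hold": [k.up_ms - k.down_ms for k in strokes],
        # Flight (up-down) time: release of one key to press of the next.
        # This can be negative when the next key is pressed before the
        # previous one is released.
        "flight": [b.down_ms - a.up_ms for a, b in pairs],
        # Down-down time: press of one key to press of the next.
        "down_down": [b.down_ms - a.down_ms for a, b in pairs],
    }


# Made-up timestamps for the first three keys of "GGXLog".
log = [Keystroke("G", 0, 95), Keystroke("G", 180, 260), Keystroke("X", 400, 470)]
features = timing_features(log)
print(features)
```

Note that each down-down time is simply the hold time of the first key plus the flight time to the second, so the three features are not independent of one another.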

These patterns are a behavioral biometric: behavioral patterns unique enough to be consistently attributed to a specific individual. Fingerprints and DNA are also biometrics, but they’re physical biometrics consisting of patterns in physical properties rather than patterns in behaviors.

The stylistic patterns exhibited in handwriting are another behavioral biometric, and similar to handwriting, people exhibit individually distinctive ways of typing. Handwriting analysis can be surprisingly accurate, with trained forensic examiners in some cases achieving error rates as low as 3%. Identifying who signed an important contract can be valuable, but people don’t interact with sensitive computer systems through handwriting. Instead, the standard way of interacting with a computer system is with a keyboard. Typing can be analyzed in real time using computer software running machine learning algorithms in the background. Those algorithms can use the timing features of the log that remain consistent over time, primarily at the level of individual keystrokes, such as the aforementioned hold times or down-down intervals.

User authentication through input patterns as a biometric goes back a long time, even predating the computer, as when the senders of Morse code messages were verified through their tapping rhythm. If using a keyboard and mouse to interact with a computer also has a rhythm, why would the same principle not apply?

Current research indicates that keystroke analysis achieves high accuracy in user verification. According to results presented at the IEEE International Conference on Big Data, participants in the 2023 Keystroke Verification Challenge achieved error rates of 3%. In fact, there are a few security tools that currently use keystroke logging analysis for user verification, such as BioCatch Connect, TypingDNA ActiveLock, Plurilock DEFEND, and TypeSense.
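As a rough illustration of how verification from timing features could work, the Python sketch below builds a template from a few enrollment samples and scores a new sample with a scaled Manhattan distance, a simple baseline detector. This is an assumption made for illustration only: it is not the method used by the challenge participants or by any of the products named above, and the function names, sample values, and threshold are hypothetical.

```python
from statistics import mean


def enroll(samples):
    """Build a template from enrollment samples.

    Each sample is a list of timing features in milliseconds (for example,
    hold and down-down times for the same typed text).
    """
    columns = list(zip(*samples))
    means = [mean(col) for col in columns]
    # Mean absolute deviation per feature, floored at 1 ms so that a
    # perfectly consistent feature does not cause a division by zero.
    devs = [
        max(mean([abs(x - m) for x in col]), 1.0)
        for col, m in zip(columns, means)
    ]
    return means, devs


def score(template, sample):
    """Average scaled Manhattan distance; lower means more similar."""
    means, devs = template
    return sum(abs(x - m) / d for x, m, d in zip(sample, means, devs)) / len(sample)


def verify(template, sample, threshold=2.0):
    """Accept the sample if its score is at or below a hypothetical threshold."""
    return score(template, sample) <= threshold


# Made-up data: three enrollment samples with three timing features each.
template = enroll([[100, 80, 120], [110, 90, 130], [90, 70, 110]])
genuine = [105, 85, 125]
impostor = [160, 40, 200]
print(verify(template, genuine), verify(template, impostor))
```

In practice the threshold trades false rejections of the genuine user against false acceptances of impostors, and systems of the kind discussed here typically learn from far more data than three samples.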

This is all very interesting, but it doesn’t reflect the full capacity of keystroke logging technology as an analytical tool. While applications in cyber security are already in use by several companies, the potential of keystroke analysis stretches beyond protecting sensitive computer systems. This broader potential stems from the fact that the behavioral patterns emerging from typing can also provide a window into people’s cognitive and emotional processes.

Applications in Education

One promising use of keystroke logging lies in the classroom, an application I’ve personally had an interest in. The idea is that students can take tests or perform other tasks on computers while having their keystrokes logged. Test administrators can then analyze this data to get more information about the students’ test-taking processes.

Why might it be useful to observe this process? One common issue is anxiety. As Baxendale and colleagues discuss in an upcoming volume by Brill on special education, anxiety-related issues are a common difficulty for children in educational environments.

One example known to reduce performance and development is math anxiety, which is a well-studied dysfunction. It’s characterized by feelings of worry, stress, or panic when doing, or sometimes even thinking about, mathematics. Academic anxiety is a general problem in school across subjects and isn’t always apparent to outside observers or to the individuals experiencing it. A student facing anxiety might give the right answers to questions in a test, thereby concealing signs of their difficulties. However, as Baxendale and colleagues suggest, if teachers record the test-taking process with keystroke logging, it could reveal the anxiety through patterns in the students’ typing.

The way in which anxieties cause problems when performing tasks is through clogging up working memory and increasing cognitive load. When working memory is clogged up, and cognitive load is high, performing mental tasks gets difficult. This can cause writers to get distracted, spend more time thinking about their decisions, or make more mistakes, which show up in the keystroke log in the form of longer and more frequent pausing, as well as increased revisions during writing.
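The markers mentioned above (longer and more frequent pauses, and more revisions) can be counted directly from a log. The Python sketch below assumes a log of (key, press time in milliseconds) pairs and treats any gap of at least 2000 ms as a pause; the threshold, the key names, and the sample data are assumptions chosen for illustration, not values taken from the research discussed here.

```python
def effort_markers(events, pause_ms=2000):
    """Count long pauses and revisions in a list of (key, down_ms) events."""
    # Time between consecutive key presses.
    gaps = [b[1] - a[1] for a, b in zip(events, events[1:])]
    long_pauses = [g for g in gaps if g >= pause_ms]
    # Treat every Backspace or Delete press as one revision action.
    revisions = sum(1 for key, _ in events if key in ("Backspace", "Delete"))
    return {
        "pause_count": len(long_pauses),
        "mean_pause_ms": sum(long_pauses) / len(long_pauses) if long_pauses else 0.0,
        "revision_count": revisions,
    }


# Made-up log: the writer hesitates before "cat" and again before a correction.
events = [
    ("t", 0), ("h", 150), ("e", 300), (" ", 2800),
    ("c", 3000), ("a", 3150), ("Backspace", 5400), ("a", 5600), ("t", 5750),
]
markers = effort_markers(events)
print(markers)
```

Counts like these only become meaningful in comparison, for example against the same student on an easier task or against a reference group.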

In contrast to applications in cyber security, the interesting timing features for educational purposes aren’t only located at the level of individual keystrokes. Here it becomes much more useful to consider patterns at the level of syllables, words, and paragraphs.

Markers of cognitive effort at different locations can indicate different processes. For instance, longer pauses or revisions within words can suggest problems with orthographic processing (spelling problems), longer pauses between words can suggest problems with lexical retrieval (remembering words), and longer pauses between paragraphs can suggest difficulties when planning larger parts of the text’s content, or indicate that the writer is evaluating previously written content.
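To show how pauses can be grouped by location, the Python sketch below labels each interval between two key presses as falling within a word, between words, or between paragraphs, and then averages the intervals per location. The labeling rule (a space marks a word boundary, a newline marks a paragraph boundary) and the sample log are simplifying assumptions; real analyses would also need to handle punctuation, sentence boundaries, and revisions.

```python
from collections import defaultdict


def pause_location(prev_key, next_key):
    """Label the interval between two keys by where it falls in the text."""
    if "\n" in (prev_key, next_key):
        return "between_paragraphs"
    if " " in (prev_key, next_key):
        return "between_words"
    return "within_word"


def mean_pause_by_location(events):
    """Average the down-down intervals per location for (key, down_ms) events."""
    buckets = defaultdict(list)
    for (prev_key, prev_t), (next_key, next_t) in zip(events, events[1:]):
        buckets[pause_location(prev_key, next_key)].append(next_t - prev_t)
    return {loc: sum(vals) / len(vals) for loc, vals in buckets.items()}


# Made-up log of "hi you", a paragraph break, then "so".
events = [
    ("h", 0), ("i", 120), (" ", 300), ("y", 1500), ("o", 1620),
    ("u", 1740), ("\n", 2000), ("s", 6000), ("o", 6130),
]
means = mean_pause_by_location(events)
print(means)
```

In this made-up log the averages grow from within-word to between-word to between-paragraph intervals, which is the kind of contrast the paragraph above describes.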

The use of computer keystroke logging could be an especially effective tool for identifying students who don’t struggle enough to be identifiable by their performance on tests and other tasks alone (as will be discussed in more detail by Baxendale). These are cases where their difficulties prevent the students from reaching their full potential, but where they’re proficient enough not to meet the requirements for special education. Therefore, if students take tests with keystroke logging tools, academic anxieties and other difficulties they face could be more easily identified, facilitating early interventions. In fact, facilitating early interventions for subclinical symptoms seems to be a strong suit of keystroke analysis.

Applications in Healthcare

In recent years, there has been an increase in the number of studies investigating the application of keystroke dynamics in identifying certain neurological and psychiatric disorders. Namely, these studies have demonstrated the ability to identify disorders that, in one way or another, affect fine motor skills—and therefore influence how that person writes on the computer, phone, or tablet.

One particularly promising application of this technology is to identify mild cognitive impairment (MCI) earlier than through traditional methods. MCI can be an early stage of Alzheimer’s disease, and symptoms at this stage are very difficult to detect using traditional diagnostic methods. But the early stages often involve a degradation of fine motor skills, which then affects how that person writes. A person with MCI generally types more slowly than a healthy peer and shows increased time between releasing one key and pressing another. This can be indicative of cognitive delays and attention deficits, which cause difficulties remembering what to write. Some sections of the typing process may also be faster than others for no apparent reason.

Unsurprisingly, Parkinson’s disease was also investigated, as it’s directly associated with the breakdown of fine motor control. In contrast, MCI primarily affects fine motor control through impaired executive function, slowed processing speed, and attention deficits, which indirectly disrupt movement. People with early-stage Parkinson’s, however, often develop bradykinesia, meaning “slowness of movement,” which can show up in the keystroke log as increased hold and flight times, among other patterns.

Neurological disorders such as dementia or Parkinson’s aren’t the only disorders that exhibit the kinds of symptoms that cause specific typing patterns. For example, a person suffering from depression can also show a reduction in typing speed due to psychomotor impairment (a general slowness of thoughts, actions, and speech). Another symptom of depression, cognitive fatigue, can result in variability in typing speed as well.

According to the meta-analysis of 25 studies by Alfalahi and colleagues, keystroke analysis yielded high classification rates (82-89%) of all the above disorders as a standalone diagnostic method, with confidence intervals ranging from good to excellent. Unsurprisingly, the accuracy increased when combining multiple diagnostic methods besides keystroke dynamics. These findings suggest that pathologists could use keystroke logging to improve diagnostic procedures for disorders that either directly or indirectly affect the way people write through fine motor decline.

Could keystroke logging be exploited?

There seems to be great potential in keystroke logging technology. The findings and applications of the technology could lead to great technological and medical improvements. The security of sensitive systems has been, and might be further, improved by implementing keystroke analysis with more complex machine learning algorithms; new educational technologies could be developed to include keystroke analysis to identify developmental obstacles; and diagnostic methodology could apply keystroke analysis to identify multiple neuropsychiatric disorders. Despite this great potential, I can’t help but contemplate potential exploitation beyond traditional cyberattacks.

To connect back to the Target story I introduced in the beginning, there are similarities in what keystroke logging data could potentially be used for, and for what purpose. Much like the Target example, companies can leverage the inferred knowledge about users for economic gain. In one case, a company uses purchases in their store to infer pregnancy, and in the other, a company could use typing patterns to infer identities, anxieties, psychiatric and neurological disorders. This is a loose connection; the similarity lies primarily in both types of data being behavioral. But in the same way that purchasing items in a department store is an indispensable part of visiting a department store, writing is an indispensable part of using almost any service on the internet, especially social media.

Most people already know that their behavior is increasingly datafied. Google gathers information about every aspect of users’ interactions with their services: searches, geographic location, preferences, and habits. They use this information to build statistical models that predict future behavior to optimize advertisement and enhance product development. This is of course true of almost every service people use on the internet.

It seems bizarre to think that a company like Google, Apple, or Meta might be able to track typing patterns over time to then infer the health of their billions of users, without gaining access to their medical records or the users themselves consciously revealing that information. Yet, given the aforementioned research showing success in analyzing in-the-wild keystroke data to infer neuropsychiatric conditions, it seems theoretically possible. To my knowledge, none of these companies store that kind of interaction data, at least with the detail required to conduct such analyses. But that could change.

Consider the following fictional example. Imagine a social media platform called Xbook. This platform collects a wide variety of data from its users, consisting of the users’ activity on Xbook, including the content of their posts, their interactions with other users, their time spent viewing other posts and videos, their searches, and their location data, as well as their activity on other platforms collected from third-party cookies. It also collects keystroke logging data from every post, drafted post, and search. Using the keystroke dynamics of a user, Xbook can identify indicators of depression, like psychomotor slowing. Even if a user has never shared any information about their psychological well-being, Xbook can create a psychological profile that includes mood disorders, allowing the social media platform to use the psychological profile for ad targeting.

Why might advertisers want to know the mental health status of their users? One way they might be able to leverage that kind of knowledge is through certain comorbidities of depression. Depressed people are more likely to show symptoms of compulsive buying disorders. One study found that around 30% of participants with depression were diagnosed as compulsive buyers, and another reported that compulsive buying behavior was a strong predictor of depressive symptoms—explaining about 42% of the variance. Mental health information such as this could be valuable information for advertisers given that people with compulsive buying disorders might be more susceptible to advertisement. This kind of knowledge could create an economic incentive to allocate more money towards ads for people with a certain psychological profile.

Inferring mental characteristics for targeted advertisement might seem implausible simply from its dystopic nature. However, it would not be the first time that large corporations have proposed it. For instance, you might remember the Facebook-Cambridge Analytica scandal, wherein millions of users had their data harvested for political advertisement. Cambridge Analytica developed psychological profiles from Facebook data, which were then used for targeted political ads. Meta even has a patent for a system that uses linguistic analysis on text communications to determine personality characteristics, and the patent explicitly states the intention of such a system:

“The identified personality characteristics are stored in the user’s user profile and are used to select content for presentation to the user. For example, the identified personality characteristics may be used along with other information to select news stories, advertisements, or recommendations of actions presented to the user.”

This system was never used according to Meta, but that doesn’t mean that Meta won’t use it or a similar system in the future. In fact, it seems like they have offered advertisers the possibility to target ads to teens during particularly vulnerable moments in their lives—moments when they felt “worthless,” “stressed,” or “anxious,” according to an article published in Wired.

Meta denied that they offer such services, and claimed it was only for research purposes, much like previous research on “emotional contagion” where Facebook investigated how changes in the emotional character of their news feed can manipulate users’ emotional states, and, therefore, the content that they post on their website. However, this doesn’t explain why research on the identification of young teens’ emotional states was presented to potential advertisers.

Final remarks

I don’t think the dystopian scenario I painted will be realized. From what I can tell, Google, Meta, and Apple haven’t used keystroke logging for behavioral analysis on their platforms. It might never be a cost-effective way to extract value from users considering the incredible amount of data it would require. But applications of keystroke logging in cyber security, education, and medicine show immense potential, even in everyday use of technology, provided there is proper consent and transparency. All in all, I think the positive possibilities significantly outweigh the potential pitfalls.

But it’s important to think about the possible consequences of new technology, or new applications of old technology, especially today, when the traces left behind by our interactions are increasingly linked to the way companies like Google, Apple, and Meta remain profitable. Our activity on their platforms generates data that they use to predict our behavior, influence our activity, and sell to third parties, all provided for free by users.