Blog post
Email Age And Their Domains: Distributions as a Fraud Signal
SentiLink
Published
August 1, 2023
An email address can tell a fraud fighter a lot about an applicant. From its syntax and structure to its age and length of association, a deeper understanding of this one identity element can provide valuable signals about whether fraud is being committed.
As the use of disposable domains has become prevalent, recognizing that "not all domains are created equal" has become increasingly important. But it's not just the domain itself that can provide signals for detecting fraud. For example, there are about 1.8 billion Gmail accounts and about 1.5 million AOL users. Within these two very different email providers, what does the age of an email address, relative to all other addresses on that same domain, tell us about the likelihood that the address belongs to the individual whose PII is in the application?
AOL vs Gmail: What can email domain distributions tell us about fraudulent activity?
The potential for using the distribution of email ages with respect to a specific domain as a fraud signal relies first on two assumptions:
- First, we assume that an AOL email address will typically have been in existence longer than a Gmail address. That is:
P(email domain is Gmail) ∩ P(email first seen date is recent) = medium → high
P(email domain is AOL) ∩ P(email first seen date is recent) = low
- Second, we similarly assume that users of AOL accounts skew toward an older demographic than Gmail users, namely:
P(email domain is Gmail) ∩ P(associated identity is young) = medium → high
P(email domain is AOL) ∩ P(associated identity is young) = low
Domain distribution analysis
To test the validity of this observation, we determined the cumulative probabilities for each domain across various points in time, i.e., what is the probability that a random Gmail/AOL email is:
- 1 day old or younger
- 30 days old or younger
- 2 months old or younger
- 6 months old or younger
- 2 years old or younger
- 10 years old or younger
- etc.
Below, we collected the cumulative probabilities for each time interval to build a curve that describes the distribution of email ages for a given month's sample by domain.
N.b. The jumps are due to varying interval sizes at certain cutoffs
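As a rough illustration of the calculation described above, the following Python sketch computes the share of addresses at or below each age cutoff, grouped by domain. The function names, the day counts chosen for "2 months", "6 months", and so on, and the toy data are all assumptions made for this example. They are not SentiLink's implementation or data.

```python
from bisect import bisect_right

# Cutoffs in days, mirroring the list above. Month and year lengths are
# approximations chosen for this sketch.
CUTOFFS_DAYS = [1, 30, 60, 180, 730, 3650]


def cumulative_probabilities(ages_days, cutoffs=CUTOFFS_DAYS):
    """Return {cutoff: share of addresses whose age is <= cutoff days}."""
    if not ages_days:
        raise ValueError("need at least one email age")
    ordered = sorted(ages_days)
    n = len(ordered)
    # bisect_right gives the count of ages <= cutoff in the sorted list.
    return {c: bisect_right(ordered, c) / n for c in cutoffs}


def curves_by_domain(samples):
    """samples: iterable of (domain, age_in_days) pairs -> {domain: curve}."""
    by_domain = {}
    for domain, age in samples:
        by_domain.setdefault(domain.lower(), []).append(age)
    return {d: cumulative_probabilities(ages) for d, ages in by_domain.items()}


# Made-up toy data, for illustration only.
toy = [
    ("gmail.com", 10), ("gmail.com", 400), ("gmail.com", 2000), ("gmail.com", 5000),
    ("aol.com", 3000), ("aol.com", 6000),
]
curves = curves_by_domain(toy)
```

Each resulting curve maps a cutoff to a cumulative probability. Because the cutoffs are unevenly spaced, a plot of these values would show the kind of jumps mentioned in the note above.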
If we zoom in on a single point on each of these graphs, we note that for the population we studied:
- The likelihood of a random Gmail email address being 1,082 days old or younger is 20.2%
- The likelihood of a random AOL email address being 1,082 days old or younger is 3.8%
The analysis of this sample suggests that an AOL email address first seen within the past 3 years or so is a less common outcome and is therefore potentially suspicious, while a Gmail email address of the same age could be perceived as less suspicious.
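To make the comparison concrete, here is a minimal Python sketch that uses only the two figures reported above. The 5% "rare" threshold, the domain keys, and the function name are hypothetical choices for illustration. The source does not specify how the signal is thresholded or combined with other attributes.

```python
# Cumulative probabilities at 1,082 days, as reported above for the studied sample.
P_AGE_LE_1082 = {"gmail.com": 0.202, "aol.com": 0.038}

# Hypothetical cutoff: call an age "rare" for a domain when fewer than 5% of
# that domain's addresses are this young. This value is an assumption.
RARE_THRESHOLD = 0.05


def is_rare_for_domain(domain, table=P_AGE_LE_1082, threshold=RARE_THRESHOLD):
    """True when an address this young is uncommon within its own domain."""
    return table[domain] < threshold


# How much more common a young address is on Gmail than on AOL in this sample.
ratio = P_AGE_LE_1082["gmail.com"] / P_AGE_LE_1082["aol.com"]
print(f"Gmail-to-AOL ratio at 1,082 days: {ratio:.1f}x")
print("AOL rare:", is_rare_for_domain("aol.com"))
print("Gmail rare:", is_rare_for_domain("gmail.com"))
```

With these inputs the ratio is roughly 5.3, and only the AOL address falls under the assumed threshold. That matches the reading above: the same email age is ordinary on one domain and unusual on the other.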
Enhanced understanding without friction
Deriving insights from core identity data -- name, DOB, SSN, address, phone, and email -- is central to what SentiLink does. But providing our partners with attribute-level tools to sharpen their understanding of each of these inputs individually allows for richer context and a deeper understanding of an applicant. In this example, while knowing the age of an email address can provide helpful direction, understanding the implication of that email's age relative to all other emails within its domain can give a risk team an extra edge in identifying fraud more confidently.
While a single attribute can neither prove nor disprove the validity of an application, access to unique and novel attributes allows risk teams more sophisticated approaches to answering the question of legitimacy without adding friction to the customer flow.