Blog post

Email Ages and Their Domains: Distributions as a Fraud Signal

SentiLink

Published

August 1, 2023

An email address can tell a fraud fighter a lot about an applicant. From the syntax and structure to its age and length of association, a deeper understanding of this one identity element can provide valuable signals as to whether fraud is being committed.


As the use of disposable domains has become prevalent, recognizing that "not all domains are created equal" has become increasingly important. But it's not just the domain itself that can provide signals for detecting fraud. For example, there are about 1.8 billion Gmail accounts and about 1.5 million AOL users. Given two such different email providers, what does the age of an email address, relative to all other email addresses on the same domain, tell us about the likelihood that the address belongs to the individual whose PII is on the application?

AOL vs Gmail: What can email domain distributions tell us about fraudulent activity?

The potential for using the distribution of email ages with respect to a specific domain as a fraud signal relies first on two assumptions:

  • First, we assume that an AOL email address will typically have been in existence longer than a Gmail address. That is:

𝑃(π‘’π‘šπ‘Žπ‘–π‘™ π‘‘π‘œπ‘šπ‘Žπ‘–π‘› 𝑖𝑠 πΊπ‘šπ‘Žπ‘–π‘™) ∩ 𝑃(π‘’π‘šπ‘Žπ‘–π‘™ π‘“π‘–π‘Ÿπ‘ π‘‘ 𝑠𝑒𝑒𝑛 π‘‘π‘Žπ‘‘π‘’ 𝑖𝑠 π‘Ÿπ‘’π‘π‘’π‘›π‘‘) = π‘šπ‘’π‘‘π‘–π‘’π‘š βˆ’ β„Žπ‘–π‘”β„Ž
𝑃(π‘’π‘šπ‘Žπ‘–π‘™ π‘‘π‘œπ‘šπ‘Žπ‘–π‘› 𝑖𝑠 𝐴𝑂𝐿) ∩ 𝑃(π‘’π‘šπ‘Žπ‘–π‘™ π‘“π‘–π‘Ÿπ‘ π‘‘ 𝑠𝑒𝑒𝑛 π‘‘π‘Žπ‘‘π‘’ 𝑖𝑠 π‘Ÿπ‘’π‘π‘’π‘›π‘‘) = π‘™π‘œπ‘€

  • Second, we similarly assume that users of AOL accounts skew toward an older demographic than Gmail users, namely:

𝑃(π‘’π‘šπ‘Žπ‘–π‘™ π‘‘π‘œπ‘šπ‘Žπ‘–π‘› 𝑖𝑠 πΊπ‘šπ‘Žπ‘–π‘™) ∩ 𝑃(π‘Žπ‘ π‘ π‘œπ‘π‘–π‘Žπ‘‘π‘’π‘‘ 𝑖𝑑𝑒𝑛𝑑𝑖𝑑𝑦 𝑖𝑠 π‘¦π‘œπ‘’π‘›π‘”) = π‘šπ‘’π‘‘π‘–π‘’π‘š βˆ’ β„Žπ‘–π‘”β„Ž
𝑃(π‘’π‘šπ‘Žπ‘–π‘™ π‘‘π‘œπ‘šπ‘Žπ‘–π‘› 𝑖𝑠 𝐴𝑂𝐿) ∩ 𝑃(π‘Žπ‘ π‘ π‘œπ‘π‘–π‘Žπ‘‘π‘’π‘‘ 𝑖𝑑𝑒𝑛𝑑𝑖𝑑𝑦 𝑖𝑠 π‘¦π‘œπ‘’π‘›π‘”) = π‘™π‘œπ‘€

Domain distribution analysis

 

To test the validity of the first assumption, we computed the cumulative probability for each domain at various age cutoffs, i.e., the probability that a random Gmail or AOL email address is:

- 1 day old or younger
- 30 days old or younger
- 2 months old or younger
- 6 months old or younger
- 2 years old or younger
- 10 years old or younger
- etc…
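
A minimal sketch of how cumulative probabilities like these could be computed is shown here. The sample ages, the domain keys, and the function name `cumulative_probabilities` are all invented for illustration; they are not SentiLink data or SentiLink's implementation. Months and years are approximated as 30 and 365 days.

```python
from bisect import bisect_right

# Age cutoffs in days, mirroring the intervals listed above
# (2 months ~ 60 days, 6 months ~ 180, 2 years ~ 730, 10 years ~ 3650).
CUTOFFS_DAYS = [1, 30, 60, 180, 730, 3650]


def cumulative_probabilities(ages_days, cutoffs=CUTOFFS_DAYS):
    """Return {cutoff: share of emails whose age is <= cutoff} for one domain."""
    ordered = sorted(ages_days)
    total = len(ordered)
    if total == 0:
        raise ValueError("need at least one email age")
    # bisect_right counts how many sorted ages are <= the cutoff.
    return {c: bisect_right(ordered, c) / total for c in cutoffs}


# Hypothetical email ages (days since first seen), invented for illustration.
sample_ages = {
    "gmail.com": [0, 12, 45, 200, 400, 900, 1500, 2500, 4000, 5000],
    "aol.com": [300, 2000, 3000, 4500, 5000, 6000, 6500, 7000, 8000, 9000],
}

curves = {
    domain: cumulative_probabilities(ages)
    for domain, ages in sample_ages.items()
}
for domain, curve in curves.items():
    print(domain, curve)
```

Each resulting dictionary is one point-by-point curve: plotting cutoff against cumulative probability for each domain gives the kind of age distribution described in this section.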

Below, we plot the cumulative probabilities at each cutoff as a curve that describes the distribution of email ages, by domain, for a given month’s sample.

Email Domain Age Distribution

Note: the jumps in the curves are due to varying interval sizes at certain cutoffs.

Zooming in on a single point on each curve, we note that for the population we studied:

  • The likelihood of a random Gmail email address being 1,082 days old or younger is 20.2%

  • The likelihood of a random AOL email address being 1,082 days old or younger is 3.8%

The analysis of this sample suggests that an AOL email address first seen within roughly the past 3 years (1,082 days) is an uncommon outcome and therefore potentially suspicious, while a Gmail address with the same first-seen date is far more common and could be perceived as less suspicious.
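
To make the comparison concrete, the sketch below works through the two percentages quoted above. The 5% rarity threshold, the dictionary keys, and the function name `is_rare_for_domain` are hypothetical choices made for illustration; the post does not specify a threshold or a scoring rule.

```python
# Share of each domain's emails that are 1,082 days old or younger,
# taken from the two figures quoted above.
SHARE_AT_OR_UNDER_1082_DAYS = {"gmail": 0.202, "aol": 0.038}

# Hypothetical cutoff: treat an outcome as rare if fewer than 5% of the
# domain's emails are that young. This value is an assumption, not from the post.
RARITY_THRESHOLD = 0.05


def is_rare_for_domain(domain, threshold=RARITY_THRESHOLD):
    """True when an email this young is uncommon within its own domain."""
    return SHARE_AT_OR_UNDER_1082_DAYS[domain] < threshold


# How much more common is a young address on Gmail than on AOL?
ratio = SHARE_AT_OR_UNDER_1082_DAYS["gmail"] / SHARE_AT_OR_UNDER_1082_DAYS["aol"]
print(f"Gmail/AOL ratio: {ratio:.1f}")  # prints "Gmail/AOL ratio: 5.3"
print(is_rare_for_domain("aol"), is_rare_for_domain("gmail"))  # prints "True False"
```

In this sample, an address of that age is about 5.3 times as common on Gmail as on AOL, which is why the same first-seen date reads differently depending on the domain.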

Enhanced understanding without friction


Deriving insights from core identity data (name, DOB, SSN, address, phone, and email) is central to what SentiLink does. But providing our partners with attribute-level tools to sharpen their understanding of each of these inputs individually allows for richer context and a deeper understanding of an applicant. In this example, while knowing the age of an email address can provide helpful direction, understanding what that age implies relative to all other email addresses within its domain can give a risk team an extra edge in identifying fraud more confidently.

While a single attribute can neither prove nor disprove the validity of an application, access to unique and novel attributes gives risk teams more sophisticated ways to answer the question of legitimacy without adding friction to the customer flow.

 
