Gmail Profile Extraction
You have two data sources. Extract different things from each.
Source 1: Discovery Searches
The discovery section contains emails found by targeted searches. Extract specific facts:
| Category | What to Extract |
|---|---|
| location | City, state, zip from shipping addresses |
| children | Names, ages, schools |
| partner | Name, relationship type |
| pets | Names, species/breed |
| birthday | Birth date or month |
| phone_numbers | Phone numbers mentioned in emails (extract the number) |
| whatsapp, signal_app, telegram | Messenger usernames or phone numbers from verification emails |
| instagram, linkedin, twitter, github, etc. | Username or profile URL from email subject/snippet |
| work_calendar, meetings | Regular meeting patterns, preferred video platform |
| slack | Workspace names they belong to |
| amazon, subscriptions, food_delivery, retail | Preferred services and stores |
| homeowner/renter | Property type, approximate area |
| vehicle | Make, model, year |
| banking, credit, investments | Institution names only (never account numbers) |
| moving | New location, move date |
| job_change | New company, role, start date |
| wedding | Event date, partner name if mentioned |
| travel | Frequent destinations, airlines, loyalty programs |
| health | Healthcare providers (never conditions) |
| education | Schools, degrees, graduation years |
| spotify, netflix, apple, discord | Platform usage (indicates ecosystem preferences) |
| gaming | Platforms, games they play |
| online_courses, substack | Learning interests, newsletters they read |
| fitness | Gym, apps, fitness activities |
| venmo | Frequent contacts (who they pay/receive from) |
| work_tools | Professional tools they use (Figma, Notion, etc.) |
| donations | Causes and charities they support (reveals values) |
| books | Reading habits, genres, recent reads |
| side_business | Whether they sell/create something, what platform |
| kids_activities | Sports, activities, involvement level |
| professional_orgs | Industry associations, memberships, conferences |
Signal Confidence
Not all discovery results are equal. Weight by source:
| Source | Confidence | Why |
|---|---|---|
| in:sent queries (children, partner, pets, phone, job_change, moving) | High | User wrote this about themselves |
| Verification emails (from:whatsapp, from:instagram with "code") | High | Confirms account ownership |
| Shipping/delivery notifications | Medium | Address could be gift recipient |
| Generic service emails | Low | Confirms usage, not details |
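The table above can be read as a small lookup rule. The following is a minimal sketch, not a definitive implementation: the function name `source_confidence`, its `query` and `subject` parameters, and the keyword list are all assumptions made for illustration. A real matcher would need stricter checks, since a bare "code" test also matches phrases like "promo code".

```python
from enum import IntEnum


class Confidence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical keyword list; tune it to the real shipping notifications.
SHIPPING_WORDS = ("shipped", "delivered", "out for delivery", "tracking")


def source_confidence(query: str, subject: str) -> Confidence:
    """Return the confidence tier for one discovery result."""
    q, s = query.lower(), subject.lower()
    if "in:sent" in q:
        return Confidence.HIGH      # user wrote this about themselves
    if "from:" in q and "code" in s:
        return Confidence.HIGH      # verification email confirms ownership
    if any(word in s for word in SHIPPING_WORDS):
        return Confidence.MEDIUM    # address could be a gift recipient
    return Confidence.LOW           # generic service email: usage only
```

Using an ordered enum keeps comparisons such as `tier >= Confidence.MEDIUM` readable when deciding whether a fact is strong enough to keep.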
Recency Matters
Use timeAgo to weight information:
- Location: Prefer recent shipping addresses (2mo ago > 3y ago)
- Job: Recent announcements override old ones
- Partner/children: Older mentions are fine (stable facts)
- Phone numbers: Prefer recent (numbers change)
- Platform accounts: Any age confirms existence
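One way to apply these recency rules is sketched below. It assumes timeAgo values look like "2mo ago" or "3y ago"; the "d" and "w" suffixes, the helper names, and the 30-day and 365-day approximations are assumptions, not part of the source data format.

```python
import re

# Assumed unit suffixes; only "mo" and "y" appear in the examples above.
_DAYS_PER_UNIT = {"d": 1, "w": 7, "mo": 30, "y": 365}
_TIME_AGO = re.compile(r"^\s*(\d+)\s*(mo|d|w|y)\s+ago\s*$")

# Categories where a newer value should win, per the list above.
RECENCY_SENSITIVE = {"location", "job_change", "phone_numbers"}


def age_in_days(time_ago: str) -> int:
    """Convert a timeAgo string such as '2mo ago' to an approximate age in days."""
    match = _TIME_AGO.match(time_ago.lower())
    if match is None:
        raise ValueError(f"unrecognized timeAgo value: {time_ago!r}")
    return int(match.group(1)) * _DAYS_PER_UNIT[match.group(2)]


def pick_value(category: str, candidates: list[tuple[str, str]]) -> str:
    """Choose among (value, timeAgo) pairs; candidates must be non-empty."""
    if category in RECENCY_SENSITIVE:
        # Newest evidence wins for facts that change over time.
        return min(candidates, key=lambda c: age_in_days(c[1]))[0]
    # Stable facts (partner, children, accounts): age does not matter,
    # so this sketch simply keeps the first candidate.
    return candidates[0][0]
```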
Ignore These Patterns
Even with targeted queries, some noise gets through. Skip:
Marketing/Promotional:
- Emails with "unsubscribe" in footer but no personal info
- "Your husband will love this!" — not about THEIR husband
- Generic "dear customer" or "dear member" emails
False Positives by Category:
| Category | Ignore |
|---|---|
| children | "Kids sale!" "For your kids" (marketing) |
| partner | "Gift for your wife" (retail marketing) |
| location | Gift shipping addresses (check if name matches user) |
| job_change | Recruiter outreach, job alerts |
| donations | "Please donate" solicitations (not actual donations) |
| professional_orgs | Spam conference invites |
Signal vs Noise Indicators:
- Signal: Specific names, dates, confirmation language, user in To: field
- Noise: Generic language, bulk sender patterns, promotional tone
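The indicators above could be approximated with a simple heuristic. This is a rough sketch under stated assumptions: the phrase lists come from the examples in this section, while `is_noise`, the `user_in_to` flag, and the date pattern are hypothetical. It does not detect specific names or bulk sender patterns.

```python
import re

# Phrases taken from the examples above; extend as needed.
MARKETING_PHRASES = (
    "dear customer", "dear member", "kids sale", "for your kids",
    "gift for your wife", "your husband will love", "please donate",
)
# Assumed confirmation vocabulary.
CONFIRMATION_WORDS = ("confirmed", "confirmation", "receipt")
# Matches dates like 3/14, 03/14/2024, or a bare four-digit year.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}(/\d{2,4})?\b|\b(19|20)\d{2}\b")


def is_noise(text: str, user_in_to: bool) -> bool:
    """Return True when an email looks promotional rather than personal."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in MARKETING_PHRASES):
        return True
    has_specifics = bool(DATE_PATTERN.search(lowered)) or any(
        word in lowered for word in CONFIRMATION_WORDS
    )
    # Treat as signal only when the user is addressed directly
    # and the text carries a date or confirmation language.
    return not (user_in_to and has_specifics)
```

The sketch errs toward calling things noise, which matches the preference for missing data over false positives.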
Source 2: Writing Samples
The writing samples contain sent email content. Extract persistent patterns, not transient activity.
Extract:
- Work domain: Infer field/industry from recurring themes, technical vocabulary, domain concepts
- Interests: Topics appearing meaningfully across multiple emails (2+ mentions with depth)
Do NOT extract:
- Specific project names (transient)
- Current tasks or deadlines (changes constantly)
- Topics from single emails (could be one-off)
- Collaborator names (privacy concern)
Persistence signals: Topic appears across emails with different dates, relates to work domain, shows depth not just mentions.
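The "2+ mentions across different dates" rule can be sketched as a filter over tagged mentions. The `(topic, email_id, sent_date)` triples are assumed to come from some upstream topic tagger that is not described here, and the sketch does not measure depth or relation to the work domain, which still need judgment.

```python
from collections import defaultdict
from datetime import date


def persistent_topics(
    mentions: list[tuple[str, str, date]], min_emails: int = 2
) -> set[str]:
    """Keep topics seen in at least min_emails emails sent on 2+ distinct dates."""
    emails = defaultdict(set)  # topic -> ids of emails mentioning it
    dates = defaultdict(set)   # topic -> distinct send dates
    for topic, email_id, sent in mentions:
        emails[topic].add(email_id)
        dates[topic].add(sent)
    return {
        topic
        for topic in emails
        if len(emails[topic]) >= min_emails and len(dates[topic]) >= 2
    }
```

Requiring distinct dates keeps a burst of emails from a single day, such as one long thread, from counting as a lasting interest.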
Output
Write to User Profile:
- personal — location, family, pets, birthday
- work — work domain, industry (only if clear pattern emerges)
- interests — social platforms with usernames, hobbies, topics of genuine interest
Rules: Only write facts with clear evidence. Skip weak signals. Never write sensitive financial details. When uncertain, write nothing—false positives are worse than missing data.
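The output rules amount to a gate in front of the profile. A minimal sketch follows, assuming a hypothetical `UserProfile` container with the three sections listed above; the blocked key names and the `clear_evidence` flag are illustrative, not an existing API.

```python
from dataclasses import dataclass, field

SECTIONS = ("personal", "work", "interests")
# Assumed key names for sensitive financial details that must never be written.
BLOCKED_KEYS = {"account_number", "card_number", "balance"}


@dataclass
class UserProfile:
    personal: dict[str, str] = field(default_factory=dict)
    work: dict[str, str] = field(default_factory=dict)
    interests: dict[str, str] = field(default_factory=dict)

    def write(self, section: str, key: str, value: str, clear_evidence: bool) -> bool:
        """Store a fact only if it passes every rule; return whether it was stored."""
        if section not in SECTIONS or key in BLOCKED_KEYS or not clear_evidence:
            return False  # when uncertain, write nothing
        getattr(self, section)[key] = value
        return True
```

Returning a boolean instead of raising lets a caller attempt many candidate facts and simply skip the ones that fail.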