How to prevent duplicates during imports (with a simple workflow)

Duplicate records are the silent killer of CRM data quality. They cause reporting errors, waste sales time, and create customer confusion when different reps contact the same person.

Most duplicates are created during imports. You think you're cleanly bringing in historical data, but six months later you're dealing with "Acme Inc," "Acme, Inc.," "ACME," and "Acme Corporation" all as separate company records.

Here's the workflow I use to prevent duplicates during every import project.

Step 1: Understand your CRM's duplicate detection

Before importing anything, know how your CRM identifies duplicates. Different platforms use different logic.

Pipedrive duplicate detection:

  • Checks email for contacts
  • Checks company name for organizations
  • Checks phone number as fallback
  • Match must be exact (case-insensitive)

Attio duplicate detection:

  • More flexible matching
  • Can match on multiple fields
  • Uses fuzzy matching for names
  • Configurable duplicate rules

Critical insight: Most CRMs look for exact matches. "Acme Inc" and "Acme, Inc." are different strings, so importing both creates two separate records.
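To see why, here is a minimal Python sketch of an exact, case-insensitive comparison. The function name is_exact_match is hypothetical. It illustrates the general behavior only and is not any specific CRM's implementation.

```python
def is_exact_match(a: str, b: str) -> bool:
    """Exact comparison that ignores case and surrounding whitespace (illustrative only)."""
    return a.strip().casefold() == b.strip().casefold()

names = ["Acme Inc", "acme inc", "Acme, Inc.", "Acme Corporation"]
matches = [n for n in names if is_exact_match("Acme Inc", n)]
print(matches)  # ['Acme Inc', 'acme inc']
```

Only the casing variant is caught. The comma and the spelled-out suffix both slip through, which is why the cleanup steps below matter.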

Step 2: Export and audit your import data

Get all your data into a spreadsheet before trying to import. This lets you clean it systematically.

Common data sources:

  • Spreadsheets from various team members
  • Old CRM exports
  • Email contact lists
  • Event attendee lists
  • Form submissions

Combine all sources into a master spreadsheet. Add a "Source" column so you know where each record came from.
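If you would rather script the combining step than copy and paste, it might look like the sketch below. It uses only Python's standard csv module. The source names and sample rows are made up for illustration, and it assumes every source uses the same column headers.

```python
import csv
import io

def combine_sources(sources: dict[str, str]) -> list[dict[str, str]]:
    """Merge several CSV texts into one list of rows, tagging each row with its source."""
    combined = []
    for source_name, csv_text in sources.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            row["Source"] = source_name
            combined.append(row)
    return combined

# Hypothetical sample data; in practice you would read real files with open().
sources = {
    "old_crm_export": "email,company\njane@acme.com,Acme Inc\n",
    "event_attendees": 'email,company\nJane@Acme.com,"Acme, Inc."\nbob@beta.io,Beta LLC\n',
}
rows = combine_sources(sources)
for row in rows:
    print(row["Source"], row["email"], row["company"])
```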

Step 3: Deduplicate within your import file

Before even thinking about the CRM, remove duplicates in your spreadsheet.

For contacts/people:

  • Sort by email address
  • Identify exact duplicate emails
  • Merge data from duplicate rows (combine all non-empty fields)
  • Delete duplicate rows
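The contact steps above can be sketched in Python as follows. This is one possible approach, not the only one: merge_by_email is a hypothetical helper, it assumes each row is a dict with an "email" field, and when two rows disagree on a non-empty value it simply keeps the first one, so conflicts still deserve a manual look.

```python
def merge_by_email(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Collapse rows that share an email; the first non-empty value per field wins."""
    merged: dict[str, dict[str, str]] = {}
    unmatched: list[dict[str, str]] = []
    for row in rows:
        key = row.get("email", "").strip().lower()
        if not key:
            # Rows without an email cannot be matched this way; keep them as-is.
            unmatched.append(dict(row))
            continue
        target = merged.setdefault(key, {})
        for field, value in row.items():
            if not target.get(field):
                target[field] = value
        target["email"] = key  # store the normalized email
    return list(merged.values()) + unmatched

rows = [
    {"email": "Jane@Acme.com", "name": "Jane Doe", "phone": ""},
    {"email": "jane@acme.com ", "name": "", "phone": "+1-555-123-4567"},
    {"email": "bob@beta.io", "name": "Bob Ray", "phone": ""},
]
result = merge_by_email(rows)
print(len(result))  # 2
```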

For companies:

  • Sort by company name
  • Standardize naming (settle on one form of "Inc," "LLC," and similar suffixes)
  • Look for similar names that should be merged
  • Research any unclear cases

Pro tip: Use the "Remove Duplicates" feature in Excel or Google Sheets as a starting point, but review the results manually. It only catches exact matches, so near-duplicates slip through, and matching on too few columns can collapse distinct entities into one row.

Step 4: Standardize formatting

Inconsistent formatting creates false duplicates even when the data is the same.

Company names:

  • Remove extra spaces
  • Standardize "Inc." vs "Incorporated" vs "Inc"
  • Decide on comma usage: "Acme, Inc." vs "Acme Inc."
  • Consistent capitalization
  • Remove parenthetical suffixes unless critical

Example transformations:

  • "acme corp. (formerly Beta)" → "Acme Corp"
  • "ACME CORPORATION" → "Acme Corporation"
  • "Acme  Inc." (two spaces) → "Acme Inc."
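Alongside the cleaned display name, it can help to compute a separate comparison key that ignores punctuation, case, and legal suffixes. The sketch below is one way to do that. The function name company_match_key and the suffix list are assumptions for illustration, and the key is meant for matching only, not for storing in the CRM.

```python
import re

# Assumed list of legal suffixes to ignore when matching; extend it for your data.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "corp", "corporation", "co"}

def company_match_key(name: str) -> str:
    """Build a comparison key: no parentheticals, punctuation, case, or legal suffix."""
    name = re.sub(r"\([^)]*\)", " ", name)         # drop "(formerly Beta)"
    name = re.sub(r"[^\w\s]", " ", name.lower())   # turn punctuation into spaces
    words = name.split()                           # also collapses repeated spaces
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)

variants = ["Acme Inc", "Acme, Inc.", "ACME", "Acme Corporation", "acme corp. (formerly Beta)"]
keys = {company_match_key(v) for v in variants}
print(keys)  # {'acme'}
```

Because stripping suffixes can also group companies that really are distinct, treat identical keys as candidates for review rather than automatic merges.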

Email addresses:

  • Convert to lowercase
  • Trim whitespace
  • Remove any with invalid format

Phone numbers:

  • Choose one format (e.g., +1-555-123-4567)
  • Apply that format to every number, stripping stray spaces and punctuation
  • Include country code if international
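The email and phone rules above might be scripted like this. It is a rough sketch under stated assumptions: the email pattern is deliberately loose (it only checks for a single "@" and a dot in the domain), and normalize_us_phone assumes 10-digit US numbers, returning None for anything else so those rows can be reviewed by hand. Both function names are hypothetical.

```python
import re
from typing import Optional

# Deliberately loose pattern: one "@", no spaces, a dot in the domain part.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def normalize_email(raw: str) -> Optional[str]:
    """Lowercase and trim an email; return None if the format looks invalid."""
    email = raw.strip().lower()
    return email if EMAIL_PATTERN.match(email) else None

def normalize_us_phone(raw: str) -> Optional[str]:
    """Format a 10-digit US number (optionally prefixed with 1) as +1-555-123-4567."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None  # not a plain US number; leave it for manual review
    return f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}"

print(normalize_email("  Jane@Acme.COM "))   # jane@acme.com
print(normalize_us_phone("(555) 123-4567"))  # +1-555-123-4567
```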

Step 5: Check against existing CRM data

Export a list of existing companies and contacts from your CRM. Compare against your import file to identify potential duplicates.

Simple approach:

  • Export existing CRM data (companies and contacts)
  • Create a column in your import file: "Already in CRM?"
  • Use VLOOKUP or similar to check if email/company exists
  • Flag matches for review
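If your files are too large for comfortable VLOOKUP work, the same check is a few lines of Python. This sketch assumes the import rows are dicts with an "email" field and that you have a plain list of emails exported from the CRM. flag_existing is a hypothetical helper name.

```python
def flag_existing(import_rows: list[dict], crm_emails: list[str]) -> list[dict]:
    """Add an "Already in CRM?" column based on a case-insensitive email lookup."""
    existing = {email.strip().lower() for email in crm_emails}
    for row in import_rows:
        found = row["email"].strip().lower() in existing
        row["Already in CRM?"] = "Yes" if found else "No"
    return import_rows

import_rows = [{"email": "Jane@Acme.com"}, {"email": "new@gamma.co"}]
crm_emails = ["jane@acme.com", "bob@beta.io"]
flagged = flag_existing(import_rows, crm_emails)
print([row["Already in CRM?"] for row in flagged])  # ['Yes', 'No']
```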

More thorough approach:

  • Use fuzzy matching tools to catch near-duplicates
  • Check multiple fields (company name + location, person name + company)
  • Review all flagged matches manually
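For a sense of what fuzzy matching does, here is a small sketch using SequenceMatcher from Python's standard difflib module. The 0.85 threshold is an assumption you would tune on your own data, and find_near_duplicates is a hypothetical helper. It compares every new name against every existing name, so it suits a few thousand records rather than millions.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0.0 to 1.0 similarity score, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_near_duplicates(new_names, existing_names, threshold=0.85):
    """Return (new, existing, score) tuples for pairs at or above the threshold."""
    flagged = []
    for new in new_names:
        for old in existing_names:
            score = similarity(new, old)
            if score >= threshold:
                flagged.append((new, old, round(score, 2)))
    return flagged

candidates = find_near_duplicates(["Acme, Inc.", "Globex"], ["Acme Inc", "Initech"])
for new, old, score in candidates:
    print(f"Review: {new!r} looks like {old!r} (score {score})")
```

Every pair this produces is a suggestion, not a decision; it still goes through the manual review in the last bullet.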

For flagged records, decide:

  • Skip import (already have this record)
  • Update existing record (import has newer/better data)
  • Import as new (false positive, actually distinct)

Step 6: Create a matching strategy

Before importing, document your matching rules. This helps when reviewing post-import.

Example matching strategy:

Contacts:
- Primary match: Email address (exact, case-insensitive)
- Secondary match: Name + Company (both must match)
- Action if match found: Update existing record

Companies:
- Primary match: Company name (standardized format)
- Secondary match: Domain name
- Action if match found: Skip import

Share this with your team so everyone understands the logic.
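Writing the rules as code is one way to make them unambiguous. The sketch below encodes the example contact rules above (email first, then name plus company, update on match). The function names and record layout are assumptions for illustration; company rules would follow the same shape with "skip" as the action.

```python
from typing import Optional

def _norm(value: Optional[str]) -> str:
    """Trim and lowercase a value, treating None as empty."""
    return (value or "").strip().lower()

def match_contact(new: dict, existing: list[dict]) -> Optional[dict]:
    """Return the existing record that matches, or None."""
    email = _norm(new.get("email"))
    if email:
        for record in existing:
            if _norm(record.get("email")) == email:
                return record  # primary match: email
    name, company = _norm(new.get("name")), _norm(new.get("company"))
    if name and company:
        for record in existing:
            if _norm(record.get("name")) == name and _norm(record.get("company")) == company:
                return record  # secondary match: name + company
    return None

def contact_action(new: dict, existing: list[dict]) -> str:
    """Apply the documented action: update on match, otherwise create."""
    return "update" if match_contact(new, existing) is not None else "create"

existing = [{"email": "jane@acme.com", "name": "Jane Doe", "company": "Acme Inc"}]
print(contact_action({"email": "JANE@acme.com"}, existing))  # update
print(contact_action({"email": "", "name": "Jane Doe", "company": "Beta LLC"}, existing))  # create
```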

Step 7: Stage your import

Never import everything at once. Use a staged approach to catch issues early.

Import stages:

Stage 1: Test batch (10-20 records)

  • Import a small, diverse sample
  • Check for duplicates manually
  • Verify data maps to correct fields
  • Confirm matching rules work as expected

Stage 2: Pilot batch (100-200 records)

  • Larger sample across different record types
  • Run duplicate detection reports
  • Get team feedback on data quality

Stage 3: Full import

  • Only proceed if Stages 1-2 had zero or near-zero duplicates
  • Import in chunks (500-1000 at a time)
  • Run duplicate checks between chunks
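If you import through a script rather than the CRM's upload screen, the chunk-and-check loop could look like the sketch below. The import_chunk and count_duplicates callbacks are hypothetical stand-ins for whatever your CRM or its API provides; here they are faked with an in-memory list so the example runs on its own.

```python
from typing import Callable, Iterator

def chunks(records: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

def staged_import(
    records: list,
    size: int,
    import_chunk: Callable[[list], None],
    count_duplicates: Callable[[], int],
    max_duplicates: int = 0,
) -> int:
    """Import chunk by chunk; stop as soon as a duplicate check exceeds the limit."""
    imported = 0
    for batch in chunks(records, size):
        import_chunk(batch)
        imported += len(batch)
        if count_duplicates() > max_duplicates:
            break  # pause here and investigate before importing more
    return imported

# In-memory stand-in for a CRM, for demonstration only.
crm: list = []
records = [f"user{i}@example.com" for i in range(2500)]
imported = staged_import(records, 1000, crm.extend, lambda: len(crm) - len(set(crm)))
print(imported)  # 2500
```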

Stage 4: Post-import cleanup

  • Run full duplicate detection
  • Manually review any duplicates created
  • Merge or delete as needed

Step 8: Use your CRM's deduplication tools

After import, use built-in duplicate detection to catch anything you missed.

Pipedrive:

  • Go to Contacts/Companies
  • Use "Duplicates" feature
  • Review and merge suggested duplicates
  • Can set up rules for automatic merging (use carefully)

Attio:

  • Configure duplicate detection rules
  • Review suggested merges
  • Merge records with merge conflict resolution
  • Set up ongoing duplicate prevention rules

Important: Always review suggested merges. Automated merging can incorrectly combine distinct entities, especially with common names.

Step 9: Set up ongoing duplicate prevention

Imports aren't the only source of duplicates. These rules keep new ones from creeping in.

CRM-level rules:

  • Block creation of contacts with duplicate emails
  • Warn when company name is similar to existing
  • Require email validation before saving
  • Use enrichment to auto-populate data (reduces manual entry errors)

Process-level rules:

  • Require reps to search before creating new records
  • Use form integrations that check for existing records
  • Regular duplicate audits (monthly or quarterly)

Team training:

  • Show reps how to search effectively
  • Explain why duplicates matter
  • Make it easy to report suspected duplicates

Common import scenarios and solutions

Scenario: Merging from multiple CRMs

You have data in two different systems and need to consolidate.

Solution:

  • Export from both systems
  • Identify unique identifier (usually email or company domain)
  • Merge data in spreadsheet before importing
  • Use the CRM with better data as "master"
  • Import the merged dataset
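The "master" idea can be sketched as a merge where the master system's values always win and the second system only fills gaps. This assumes email is the unique identifier and that both exports are lists of dicts. merge_with_master is a hypothetical helper name.

```python
def merge_with_master(master_rows: list[dict], secondary_rows: list[dict], key: str = "email") -> list[dict]:
    """Keep master values; use secondary values only for empty or missing fields."""
    merged = {row[key].strip().lower(): dict(row) for row in master_rows}
    for row in secondary_rows:
        target = merged.setdefault(row[key].strip().lower(), {})
        for field, value in row.items():
            if not target.get(field):
                target[field] = value
    return list(merged.values())

master = [{"email": "jane@acme.com", "title": "CTO", "phone": ""}]
secondary = [
    {"email": "Jane@acme.com", "title": "VP Eng", "phone": "+1-555-123-4567"},
    {"email": "bob@beta.io", "title": "CEO", "phone": ""},
]
merged_rows = merge_with_master(master, secondary)
print(merged_rows[0])  # Jane keeps the master title and gains the phone number
```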

Scenario: Customer data + prospect data

You have existing customers in one system and prospects in another.

Solution:

  • Identify customers who are also in prospect list
  • Decide which data is more current/accurate
  • Create "Status" field to mark as customer vs prospect
  • Import customers first, then prospects
  • Use email matching to prevent customer duplicates

Scenario: International data with name variations

Same companies/people have different naming conventions in different regions.

Solution:

  • Create "Legal Name" field for official company name
  • Use "Trading As" or "DBA" field for regional variations
  • Choose one primary name for CRM
  • Document variations in notes field
  • Use domain name as tiebreaker for matching
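To use the domain as a tiebreaker you first need to pull it out of whatever field holds it. The sketch below handles email addresses and website URLs with the standard urllib.parse module. extract_domain is a hypothetical helper, and it only strips a leading "www."; regional domains such as acme.de and acme.com will still differ and need a manual decision.

```python
from urllib.parse import urlparse

def extract_domain(value: str) -> str:
    """Return a bare domain from an email address or website URL."""
    value = value.strip().lower()
    if "@" in value and "//" not in value:
        return value.rsplit("@", 1)[1]
    if "//" not in value:
        value = "//" + value  # lets urlparse treat "acme.com/about" as a host
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host

records = [
    {"name": "Acme GmbH", "website": "https://www.acme.com/de"},
    {"name": "Acme Inc", "contact_email": "Jane@Acme.com"},
]
domains = [extract_domain(r.get("website") or r.get("contact_email", "")) for r in records]
print(domains)  # ['acme.com', 'acme.com']
```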

Tools to help with deduplication

While manual review is best, these tools speed up the process:

OpenRefine: Free tool for cleaning messy data, finding duplicates, and standardizing formats.

Google Sheets Remove Duplicates: Built-in feature for basic deduplication.

Excel Power Query: Advanced data transformation and deduplication.

Duplicate detection extensions: Many CRMs have marketplace apps for enhanced duplicate detection.

Enrichment services: Tools like Clearbit or Apollo that auto-populate data reduce manual entry errors.

What to do if you already have duplicates

If you've already imported data and have duplicate problems:

Quick triage:

  • Export all records
  • Sort by email/company name
  • Identify obvious duplicates (exact email matches)
  • Merge these first

Systematic cleanup:

  • Run CRM duplicate detection report
  • Review 10-20 duplicates per day
  • Document merge decisions
  • Track common patterns to prevent future duplicates

Prevent making it worse:

  • Stop all imports until duplicate rules are fixed
  • Train team on searching before creating new records
  • Set up validation rules to block obvious duplicates

Measuring success

You'll know your import process is working when:

During import:

  • Test batches show zero duplicates
  • Full import creates minimal new duplicates (<1% of imported records)
  • No data is lost or incorrectly merged
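The 1% threshold is simple arithmetic: duplicates created divided by records imported. A tiny sketch, with made-up numbers for illustration:

```python
def duplicate_rate(duplicates_created: int, records_imported: int) -> float:
    """Return new duplicates as a percentage of imported records."""
    if records_imported <= 0:
        raise ValueError("records_imported must be positive")
    return duplicates_created / records_imported * 100

rate = duplicate_rate(12, 2000)  # hypothetical import: 12 duplicates in 2,000 records
print(f"{rate:.2f}%")            # 0.60%
print("within target" if rate < 1.0 else "investigate before the next import")
```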

After import:

  • CRM duplicate reports show minimal matches
  • Team finds one record per person or company, not multiple versions
  • Reports are accurate (not inflated by duplicates)

Long-term:

  • New duplicate creation rate is low (<5 per month)
  • Existing duplicates decrease over time
  • Team spends less time on manual deduplication

Final checklist

Before your next import:

  • [ ] Understand your CRM's duplicate detection logic
  • [ ] Export and combine all data sources
  • [ ] Remove duplicates within import file
  • [ ] Standardize all formatting
  • [ ] Check against existing CRM data
  • [ ] Document matching strategy
  • [ ] Import test batch and review
  • [ ] Set up ongoing duplicate prevention
  • [ ] Train team on search-before-create

The extra upfront work prevents months or years of cleanup later. Clean data is the foundation of a useful CRM—start there.
