How to prevent duplicates during imports (with a simple workflow)

January 28, 2024 • 8 min read

Duplicate records are the silent killer of CRM data quality. They cause reporting errors, waste sales time, and create customer confusion when different reps contact the same person.

Most duplicates are created during imports. You think you're cleanly bringing in historical data, but six months later you're dealing with "Acme Inc," "Acme, Inc.," "ACME," and "Acme Corporation" all as separate company records.

Here's the workflow I use to prevent duplicates during every import project.

Step 1: Understand your CRM's duplicate detection

Before importing anything, know how your CRM identifies duplicates. Different platforms use different logic.

Pipedrive duplicate detection:

Checks email for contacts
Checks company name for organizations
Checks phone number as fallback
Match must be exact (case-insensitive)

Attio duplicate detection:

More flexible matching
Can match on multiple fields
Uses fuzzy matching for names
Configurable duplicate rules

Critical insight: Most CRMs are looking for exact matches. "Acme Inc" and "Acme, Inc." are different strings and will create separate records.
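
To see why these variants slip through, here's a minimal sketch of case-insensitive exact matching. The function name `exact_match_key` is my own illustration, not any CRM's actual API, and real platforms may normalize differently.

```python
def exact_match_key(name: str) -> str:
    """Mimic a case-insensitive exact match: only case and outer spaces are ignored."""
    return name.strip().lower()

variants = ["Acme Inc", "Acme, Inc.", "ACME", "Acme Corporation"]
distinct_keys = {exact_match_key(v) for v in variants}

# Each variant yields a different key, so each would become its own record.
print(len(distinct_keys))  # 4
```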

Step 2: Export and audit your import data

Get all your data into a spreadsheet before trying to import. This lets you clean it systematically.

Common data sources:

Spreadsheets from various team members
Old CRM exports
Email contact lists
Event attendee lists
Form submissions

Combine all sources into a master spreadsheet. Add a "Source" column so you know where each record came from.
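
If you'd rather script this than copy and paste, the combining step can be sketched in Python with the standard `csv` module. The source names and sample rows below are made up for illustration; in practice you would read each export from a file.

```python
import csv
import io

def combine_sources(sources: dict[str, str]) -> list[dict[str, str]]:
    """Combine several CSV texts into one list of rows, tagging each row with its source."""
    combined = []
    for source_name, csv_text in sources.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            row["Source"] = source_name  # the "Source" column from the step above
            combined.append(row)
    return combined

# Hypothetical sample data standing in for real export files.
sources = {
    "old_crm_export": "email,company\nana@acme.com,Acme Inc\n",
    "event_attendees": "email,company\nbo@beta.io,Beta LLC\nana@acme.com,ACME\n",
}
master = combine_sources(sources)
print(len(master))  # 3 rows, each with a "Source" value
```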

Step 3: Deduplicate within your import file

Before even thinking about the CRM, remove duplicates in your spreadsheet.

For contacts/people:

Sort by email address

Identify exact duplicate emails

Merge data from duplicate rows (combine all non-empty fields)

Delete duplicate rows
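
The contact steps above can be sketched as a small function. This is one possible approach, assuming each row is a dict with an `email` key; it keeps the first non-empty value it sees for each field, which may not be the right merge rule for your data.

```python
def merge_by_email(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Collapse rows that share an email, keeping the first non-empty value per field."""
    merged: dict[str, dict[str, str]] = {}
    no_email: list[dict[str, str]] = []  # nothing to match on, keep for manual review
    for row in rows:
        key = row.get("email", "").strip().lower()
        if not key:
            no_email.append(dict(row))
            continue
        target = merged.setdefault(key, {})
        for field, value in row.items():
            if value and not target.get(field):
                target[field] = value
    return list(merged.values()) + no_email

rows = [
    {"email": "Ana@Acme.com", "name": "Ana", "phone": ""},
    {"email": "ana@acme.com", "name": "", "phone": "+1-555-123-4567"},
    {"email": "bo@beta.io", "name": "Bo", "phone": ""},
]
contacts = merge_by_email(rows)
print(len(contacts))  # 2
```

When two rows disagree on a non-empty field, this sketch silently keeps the first one. For a real import I'd log those conflicts and review them by hand.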

For companies:

Sort by company name

Standardize naming (normalize "Inc," "LLC," and similar suffix variations to one form)

Look for similar names that should be merged

Research any unclear cases

Pro tip: Use Excel's or Google Sheets' "Remove Duplicates" feature as a starting point, but manually review the results. Automated deduplication often misses things or incorrectly merges distinct entities.

Step 4: Standardize formatting

Inconsistent formatting hides duplicates: two records for the same entity look different to the CRM, so both get created.

Company names:

Remove extra spaces
Standardize "Inc." vs "Incorporated" vs "Inc"
Decide on comma usage: "Acme, Inc." vs "Acme Inc."
Consistent capitalization
Remove parenthetical suffixes unless critical
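
The rules above can be sketched as a single function. This version picks one convention (no commas, no trailing periods, title case), which is an assumption on my part; swap in whatever your team decides.

```python
import re

def standardize_company_name(name: str) -> str:
    """Apply one possible set of formatting conventions to a company name."""
    name = re.sub(r"\s*\([^)]*\)", "", name)   # drop parenthetical suffixes
    name = name.replace(",", "")               # this sketch drops commas
    name = re.sub(r"\.(?=\s|$)", "", name)     # drop periods that end a word
    name = re.sub(r"\s+", " ", name).strip()   # collapse extra spaces
    return name.title()                        # consistent capitalization

print(standardize_company_name("acme corp. (formerly Beta)"))  # Acme Corp
print(standardize_company_name("ACME CORPORATION"))            # Acme Corporation
print(standardize_company_name("Acme, Inc."))                  # Acme Inc
```

One caveat: `str.title()` turns acronyms like "IBM" into "Ibm", so review the output rather than trusting it blindly.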

Example transformations:

"acme corp. (formerly Beta)" → "Acme Corp"
"ACME CORPORATION" → "Acme Corporation"
"Acme  Inc." (two spaces) → "Acme Inc."

Email addresses:

Convert to lowercase
Trim whitespace
Remove any with invalid format

Phone numbers:

Choose one format (e.g., +1-555-123-4567)
Strip or standardize separators (spaces, dashes, parentheses) consistently
Include country code if international
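
Here's a rough sketch of the email and phone rules. The email pattern is deliberately loose, and the phone helper assumes numbers without a "+" are US/Canada numbers (country code 1); both are assumptions to adjust for your data. It outputs a compact format (+15551234567) rather than the dashed one, since all that matters is picking one.

```python
import re
from typing import Optional

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose format check only

def clean_email(raw: str) -> Optional[str]:
    """Lowercase and trim an email; return None if the format looks invalid."""
    email = raw.strip().lower()
    return email if EMAIL_PATTERN.match(email) else None

def clean_phone(raw: str, default_country_code: str = "1") -> Optional[str]:
    """Reduce a phone number to +<country code><digits>; None if it can't be parsed."""
    has_plus = raw.strip().startswith("+")
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if not digits:
        return None
    if not has_plus:
        if len(digits) == 10:
            digits = default_country_code + digits  # assume a national number
        elif not (len(digits) == 11 and digits.startswith(default_country_code)):
            return None  # ambiguous length, flag for manual review
    return "+" + digits

print(clean_email("  Ana@Acme.COM "))   # ana@acme.com
print(clean_phone("(555) 123-4567"))    # +15551234567
```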

Step 5: Check against existing CRM data

Export a list of existing companies and contacts from your CRM. Compare against your import file to identify potential duplicates.

Simple approach:

Export existing CRM data (companies and contacts)

Create a column in your import file: "Already in CRM?"

Use VLOOKUP or similar to check if email/company exists

Flag matches for review
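
If your files are too big for comfortable VLOOKUPs, the same check is a few lines of Python. This sketch assumes import rows are dicts with an `email` key and that you've already pulled the CRM's emails into a list; the sample data is invented.

```python
def flag_existing(import_rows: list[dict[str, str]], crm_emails: list[str]) -> list[dict[str, str]]:
    """Fill an "Already in CRM?" column by looking up each email in the CRM export."""
    known = {e.strip().lower() for e in crm_emails}
    for row in import_rows:
        email = row.get("email", "").strip().lower()
        row["Already in CRM?"] = "Yes" if email and email in known else "No"
    return import_rows

crm_emails = ["ana@acme.com", "cy@gamma.co"]
import_rows = [{"email": "Ana@Acme.com"}, {"email": "bo@beta.io"}]
flagged = flag_existing(import_rows, crm_emails)
print([row["Already in CRM?"] for row in flagged])  # ['Yes', 'No']
```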

More thorough approach:

Use fuzzy matching tools to catch near-duplicates
Check multiple fields (company name + location, person name + company)
Review all flagged matches manually
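
For a feel of what fuzzy matching does, here's a minimal sketch using `difflib.SequenceMatcher` from Python's standard library. The 0.85 threshold is an arbitrary starting point, not a recommendation, and dedicated tools use more sophisticated scoring.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity score, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_near_duplicates(import_names, crm_names, threshold=0.85):
    """List (import name, CRM name, score) pairs that look similar enough to review."""
    flagged = []
    for new_name in import_names:
        for existing in crm_names:
            score = similarity(new_name, existing)
            if score >= threshold:
                flagged.append((new_name, existing, round(score, 2)))
    return flagged

matches = find_near_duplicates(["Acme Inc", "Gamma Co"], ["Acme, Inc.", "Beta LLC"])
print(matches)  # [('Acme Inc', 'Acme, Inc.', 0.89)]
```

This compares every import name against every CRM name, which is fine for a few thousand records but slow beyond that. Either way, treat the output as a review queue, not a merge list.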

For flagged records, decide:

Skip import (already have this record)
Update existing record (import has newer/better data)
Import as new (false positive, actually distinct)

Step 6: Create a matching strategy

Before importing, document your matching rules. This helps when reviewing post-import.

Example matching strategy:

Contacts:
- Primary match: Email address (exact, case-insensitive)
- Secondary match: Name + Company (both must match)
- Action if match found: Update existing record

Companies:
- Primary match: Company name (standardized format)
- Secondary match: Domain name
- Action if match found: Skip import

Share this with your team so everyone understands the logic.
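
Writing the rules as code can also expose ambiguities before the import does. Here's a sketch of the contact rules above, assuming records are dicts with `email`, `name`, and `company` keys (hypothetical field names).

```python
from typing import Optional

def _norm(value: Optional[str]) -> str:
    """Trim and lowercase so comparisons are case-insensitive."""
    return (value or "").strip().lower()

def find_contact_match(new: dict, existing: list[dict]) -> Optional[dict]:
    """Apply the contact rules: email first, then name + company together."""
    email = _norm(new.get("email"))
    if email:
        for record in existing:
            if _norm(record.get("email")) == email:
                return record  # primary match
    name, company = _norm(new.get("name")), _norm(new.get("company"))
    if name and company:
        for record in existing:
            if _norm(record.get("name")) == name and _norm(record.get("company")) == company:
                return record  # secondary match: both fields must agree
    return None  # no match, safe to import as new

existing = [
    {"email": "ana@acme.com", "name": "Ana Diaz", "company": "Acme Inc"},
    {"email": "", "name": "Bo Chen", "company": "Beta LLC"},
]
match = find_contact_match({"email": "ANA@acme.com"}, existing)
print(match["name"])  # Ana Diaz
```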

Step 7: Stage your import

Never import everything at once. Use a staged approach to catch issues early.

Import stages:

Stage 1: Test batch (10-20 records)

Import a small, diverse sample
Check for duplicates manually
Verify data maps to correct fields
Confirm matching rules work as expected

Stage 2: Pilot batch (100-200 records)

Larger sample across different record types
Run duplicate detection reports
Get team feedback on data quality

Stage 3: Full import

Only proceed if Stages 1-2 had zero or near-zero duplicates
Import in chunks (500-1000 at a time)
Run duplicate checks between chunks
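
If you import through a script or API, the chunk-and-check loop might look like the sketch below. `import_chunk` and `count_duplicates` are hypothetical callbacks you'd write against your own CRM; nothing here is a real CRM API.

```python
from typing import Callable, Iterator

def chunked(records: list, size: int = 500) -> Iterator[list]:
    """Yield successive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

def staged_import(
    records: list,
    import_chunk: Callable[[list], None],   # hypothetical: pushes one chunk to the CRM
    count_duplicates: Callable[[], int],    # hypothetical: runs a duplicate check
    size: int = 500,
    max_duplicates: int = 0,
) -> int:
    """Import chunk by chunk, stopping as soon as a duplicate check fails."""
    imported = 0
    for chunk in chunked(records, size):
        import_chunk(chunk)
        imported += len(chunk)
        if count_duplicates() > max_duplicates:
            break  # stop and investigate before importing more
    return imported

# In-memory stand-ins for the two callbacks.
store: list = []
total = staged_import(list(range(1200)), store.extend, lambda: 0)
print(total)  # 1200, imported as chunks of 500, 500, and 200
```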

Stage 4: Post-import cleanup

Run full duplicate detection
Manually review any duplicates created
Merge or delete as needed

Step 8: Use your CRM's deduplication tools

After import, use built-in duplicate detection to catch anything you missed.

Pipedrive:

Go to Contacts (People or Organizations)
Use "Duplicates" feature
Review and merge suggested duplicates
Can set up rules for automatic merging (use carefully)

Attio:

Configure duplicate detection rules
Review suggested merges
Merge records with merge conflict resolution
Set up ongoing duplicate prevention rules

Important: Always review suggested merges. Automated merging can incorrectly combine distinct entities, especially with common names.

Step 9: Set up ongoing duplicate prevention

Imports aren't the only way duplicates get created. Prevent future duplicates with these rules.

CRM-level rules:

Block creation of contacts with duplicate emails
Warn when company name is similar to existing
Require email validation before saving
Use enrichment to auto-populate data (reduces manual entry errors)

Process-level rules:

Require reps to search before creating new records
Use form integrations that check for existing records
Regular duplicate audits (monthly or quarterly)

Team training:

Show reps how to search effectively
Explain why duplicates matter
Make it easy to report suspected duplicates

Common import scenarios and solutions

Scenario: Merging from multiple CRMs

You have data in two different systems and need to consolidate.

Solution:

Export from both systems

Identify unique identifier (usually email or company domain)

Merge data in spreadsheet before importing

Use the CRM with better data as "master"

Import the merged dataset

Scenario: Customer data + prospect data

You have existing customers in one system and prospects in another.

Solution:

Identify customers who are also in prospect list

Decide which data is more current/accurate

Create "Status" field to mark as customer vs prospect

Import customers first, then prospects

Use email matching to prevent customer duplicates

Scenario: International data with name variations

Same companies/people have different naming conventions in different regions.

Solution:

Create "Legal Name" field for official company name

Use "Trading As" or "DBA" field for regional variations

Choose one primary name for CRM

Document variations in notes field

Use domain name as tiebreaker for matching
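
To use the domain as a tiebreaker, you first need to pull it out of whatever you have, usually a website URL or an email address. A minimal sketch using `urllib.parse` from the standard library:

```python
from urllib.parse import urlparse

def company_domain(website_or_email: str) -> str:
    """Extract a bare domain (no scheme, port, or "www.") from a URL or email address."""
    value = website_or_email.strip().lower()
    if "@" in value:
        host = value.rsplit("@", 1)[1]
    else:
        # urlparse only fills netloc when a scheme is present, so add one if missing.
        host = urlparse(value if "://" in value else "https://" + value).netloc
    host = host.split(":")[0]  # drop any port
    return host[4:] if host.startswith("www.") else host

print(company_domain("https://www.acme.com/about"))  # acme.com
print(company_domain("Sales@Acme.com"))              # acme.com
```

This won't link regional domains such as acme.de and acme.com, and free email domains like gmail.com say nothing about the company, so treat a domain match as supporting evidence rather than proof.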

Tools to help with deduplication

While manual review is best, these tools speed up the process:

OpenRefine: Free tool for cleaning messy data, finding duplicates, and standardizing formats.

Google Sheets Remove Duplicates: Built-in feature for basic deduplication.

Excel Power Query: Advanced data transformation and deduplication.

Duplicate detection extensions: Many CRMs have marketplace apps for enhanced duplicate detection.

Enrichment services: Tools like Clearbit or Apollo that auto-populate data reduce manual entry errors.

What to do if you already have duplicates

If you've already imported data and have duplicate problems:

Quick triage:

Export all records

Sort by email/company name

Identify obvious duplicates (exact email matches)

Merge these first

Systematic cleanup:

Run CRM duplicate detection report

Review 10-20 duplicates per day

Document merge decisions

Track common patterns to prevent future duplicates

Prevent making it worse:

Stop all imports until duplicate rules are fixed

Train team on searching before creating new records

Set up validation rules to block obvious duplicates

Measuring success

You'll know your import process is working when:

During import:

Test batches show zero duplicates
Full import creates minimal new duplicates (<1% of imported records)
No data is lost or incorrectly merged
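
The 1% target is simple arithmetic, but it's worth computing rather than eyeballing. A tiny sketch, with made-up numbers:

```python
def duplicate_rate(duplicates_created: int, records_imported: int) -> float:
    """Return new duplicates as a percentage of imported records."""
    if records_imported <= 0:
        raise ValueError("records_imported must be positive")
    return 100.0 * duplicates_created / records_imported

# Example: 7 duplicates found after importing 1,000 records.
rate = duplicate_rate(duplicates_created=7, records_imported=1000)
within_target = rate < 1.0
print(f"{rate:.1f}% duplicates, within target: {within_target}")  # 0.7% duplicates, within target: True
```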

After import:

CRM duplicate reports show minimal matches
Team can find records easily (not multiple versions)
Reports are accurate (not inflated by duplicates)

Long-term:

New duplicate creation rate is low (<5 per month)
Existing duplicates decrease over time
Team spends less time on manual deduplication

Final checklist

Before your next import:

[ ] Understand your CRM's duplicate detection logic
[ ] Export and combine all data sources
[ ] Remove duplicates within import file
[ ] Standardize all formatting
[ ] Check against existing CRM data
[ ] Document matching strategy
[ ] Import test batch and review
[ ] Set up ongoing duplicate prevention
[ ] Train team on search-before-create

The extra upfront work prevents months or years of cleanup later. Clean data is the foundation of a useful CRM—start there.
