If you sell anything to local businesses — software, supplies, insurance, marketing services — you've probably wanted a complete list of your target customers in a given city. Not a sampling. The whole map.
This is harder than it sounds and easier than the database vendors want you to believe.
I'll use dental clinics in Phoenix as the running example because that's what I've spent the most time on, but the same approach works for restaurants, salons, gyms, vets, law firms, accountants — anything with a physical location and a website.
The realistic upper bound
Before you build anything, it's worth asking: how many of these businesses are there, actually?
For Phoenix dentistry, the answer is roughly 600 to 800 practices in the metro area, depending on how you count single-doctor offices versus group practices. Of those, about 400 to 500 have their own websites. The rest are either part of a corporate DSO that operates under a single site, or they're old enough that they never bothered with a website at all.
That gap matters. When a salesperson tells you their database has "1,200 Phoenix dental contacts," what they mean is they have 1,200 records, some of which are former employees, some are duplicates, some are dentists who left town years ago, and some are real and current. The actual number of current practices you can reach by reading the open web is somewhere under 500. That's your ceiling and there's no clever way around it.
Once you accept that, the question becomes how close you can get to the ceiling without wasting a weekend doing manual research.
The sources, ranked by what they actually give you
Four places have good data on local businesses. They each fill in different gaps.
The first is plain web search. If you Google "dental clinic Phoenix" and 20 variations of that, you'll get a couple hundred unique practice websites in the results. Not all of them — search engines miss some, especially for sites with bad SEO or recent listings — but a solid baseline.
The second is OpenStreetMap. This one is underrated. OSM has structured data on most physical-location businesses worldwide. For Phoenix dentists, an Overpass query with amenity=dentist returns roughly 600 entries with names, addresses, and (for about 40%) websites. The data is free, structured, and human-edited by mappers who care about accuracy. The downside is freshness — a clinic that opened last month probably isn't in OSM yet.
The third is directory aggregators. Yelp, Healthgrades, BBB, Avvo, and similar sites all have category pages that list 20 to 50 businesses each. If your scraper can follow those category pages and pull out the individual business profiles, you'll get coverage that web search missed. Yelp in particular lists many businesses that have no website of their own.
The fourth is the residue of every previous run anyone has done. If five different people have already searched "Phoenix dentists" through Extractly, the corpus already has the answer. The sixth person who searches gets it back instantly without us hitting search engines at all. This source only exists if you're using a shared platform — it's the practical version of the "data network effect" everyone talks about.
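To make the OpenStreetMap source concrete, here is a minimal sketch of the amenity=dentist lookup. The public Overpass endpoint, the way the city area is selected, and the helper names (`build_query`, `parse_elements`, `fetch`) are my assumptions for illustration, not a description of how Extractly does it. Matching an area by name alone can pick up other places called Phoenix, so a real run would pin the area more precisely.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Overpass instance; any Overpass server would do.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_query(city: str, amenity: str = "dentist", timeout_s: int = 60) -> str:
    """Build an Overpass QL query for one amenity type inside a named area."""
    return (
        f"[out:json][timeout:{timeout_s}];"
        f'area["name"="{city}"]["boundary"="administrative"]->.city;'
        f'nwr["amenity"="{amenity}"](area.city);'
        "out tags center;"
    )


def parse_elements(payload: dict) -> list[dict]:
    """Flatten Overpass JSON elements into simple business records."""
    records = []
    for el in payload.get("elements", []):
        tags = el.get("tags", {})
        if not tags.get("name"):
            continue  # unnamed features are not usable as leads
        street = " ".join(
            part
            for part in (tags.get("addr:housenumber"), tags.get("addr:street"))
            if part
        )
        records.append(
            {
                "name": tags["name"],
                "address": street or None,
                "phone": tags.get("phone") or tags.get("contact:phone"),
                "website": tags.get("website") or tags.get("contact:website"),
            }
        )
    return records


def fetch(city: str) -> list[dict]:
    """Run the query against the live API (a network call)."""
    data = urllib.parse.urlencode({"data": build_query(city)}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data, timeout=90) as resp:
        return parse_elements(json.load(resp))
```

Calling `fetch("Phoenix")` would return one record per named dentist, with `website` left as `None` wherever the mappers did not record one.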
For one query, combining sources matters a lot. Search alone gets you ~250 results for Phoenix dentists. Add OSM and you're at ~400. Add directory-follow and you're at ~480. That's roughly 95% of the realistic ceiling.
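Combining sources only helps if the overlap is removed, so here is a rough sketch of one way to merge them. The record shape (`name`, `website`) and the rule of keying on the website domain, with a fallback to the lowercased name, are my assumptions. A production pipeline would likely also match on address or phone.

```python
from urllib.parse import urlparse


def domain_key(url: str) -> str:
    """Normalize a URL to a bare host so scheme and www variants collapse."""
    if "//" not in url:
        url = "//" + url  # lets urlparse treat a bare host as the netloc
    host = urlparse(url).netloc.lower().split(":")[0]
    return host[4:] if host.startswith("www.") else host


def merge_sources(*sources):
    """Merge lead lists in priority order, keeping the first record per key."""
    merged = {}
    for leads in sources:
        for lead in leads:
            website = lead.get("website")
            if website:
                key = ("domain", domain_key(website))
            else:
                # Hypothetical fallback: no website, so match on the name.
                key = ("name", lead["name"].strip().lower())
            merged.setdefault(key, lead)
    return list(merged.values())
```

Passing the lists as `merge_sources(search, osm, directory)` means a search result wins over an OSM or directory record for the same domain. One known gap in this sketch: a practice that appears with a website in one source and without one in another is counted twice.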
A real run
I ran "dental clinics in Phoenix, Arizona" with a 500-lead target. Here's the timing and the breakdown.
The discovery step took about 35 seconds. The funnel looked like this:
| Stage | Count |
|---|---|
| Raw search results across 3 engines (30 queries × 3 engines, up to ~50 results each) | 4,200 |
| After dedup and aggregator-filter | 540 |
| After AI confirms each is a real business | 287 |
| Added from OpenStreetMap (overlap with above already removed) | 142 |
| Final list (distinct practices) | 429 |
So I asked for 500 and got 429. The gap is partly because the genuine ceiling for Phoenix is under 500, and partly because some practices have websites I couldn't find. About 60-70 practices I knew of from local knowledge weren't in the final list, mostly very small single-doctor shops that aren't well indexed.
The extraction step on those 429 took 14 minutes. The crawler visited up to 5 pages per site — homepage, /about, /contact, /team, /services. Total compute cost: $0.62 on the backend.
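As a sketch of the five-pages-per-site cap, this is roughly how the candidate URLs could be built. The fixed path list mirrors the pages named above; the function name and the rule of resolving every path against the site root are my assumptions. A real crawler would more likely discover these pages from links on the homepage, since not every site uses these exact paths.

```python
from urllib.parse import urljoin

# Paths taken from the list above; real sites may name these pages differently.
CANDIDATE_PATHS = ["/", "/about", "/contact", "/team", "/services"]


def pages_to_visit(site_url: str, max_pages: int = 5) -> list[str]:
    """Return up to max_pages candidate URLs for one practice website."""
    base = site_url if site_url.endswith("/") else site_url + "/"
    urls = []
    for path in CANDIDATE_PATHS:
        url = urljoin(base, path)  # absolute paths resolve against the site root
        if url not in urls:
            urls.append(url)
    return urls[:max_pages]
```

Capping the page count is what keeps the cost per site predictable. The trade-off is that anything published on a sixth page never gets read.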
What came back, on average per practice:
| Field | Hit rate |
|---|---|
| Business name + address + phone | 98% |
| At least one office email | 84% |
| At least one named doctor | 76% |
| Specific doctor's direct email | 38% |
| Services list | 91% |
| Pricing (explicit numbers) | 24% |
| Insurance carriers accepted | 47% |
| Social media handles | 64% |
If you're a salesperson, the line that matters is the 38% with a named doctor's direct email. That's roughly 160 leads where you can email a specific human at a specific practice. The other 270 records are useful in other ways — for filtered cold-calling lists, for territory mapping, for benchmarking — but they're not personalized-email-ready.
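If you want to redo that arithmetic for your own run, the conversion from hit rates to lead counts is a one-liner per field. The rates and the 429 total are the ones reported above; the function name is just for illustration.

```python
TOTAL_PRACTICES = 429

# Hit rates from the table above.
HIT_RATES = {
    "Business name + address + phone": 0.98,
    "At least one office email": 0.84,
    "At least one named doctor": 0.76,
    "Specific doctor's direct email": 0.38,
    "Services list": 0.91,
    "Pricing (explicit numbers)": 0.24,
    "Insurance carriers accepted": 0.47,
    "Social media handles": 0.64,
}


def expected_counts(total: int, rates: dict) -> dict:
    """Convert per-field hit rates into approximate record counts."""
    return {field: round(total * rate) for field, rate in rates.items()}


counts = expected_counts(TOTAL_PRACTICES, HIT_RATES)
email_ready = counts["Specific doctor's direct email"]
not_email_ready = TOTAL_PRACTICES - email_ready
```

Here `email_ready` comes out at 163 and `not_email_ready` at 266, which is where the rounded 160 and 270 come from.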
Three real things you can do with this dataset
If you sell software to dental practices, the personalization angle gets a lot easier. You can write to Dr. Patel at Bright Smile Dental and reference that their site lists "Invisalign from $3,500" — which immediately tells her you've actually looked at the practice. The reply rate on emails like that runs an order of magnitude higher than generic blasts.
If you run a dental practice and want competitive intelligence, the same dataset shows you every competitor's service menu, their published prices, which insurance they accept, and which marketing channels they emphasize. It's a snapshot of what your local market looks like to a patient searching online.
If you're an investor or analyst, the same data gives you market sizing — actual market sizing, not the kind that comes from extrapolating press releases. How many Phoenix dental practices explicitly advertise cosmetic services? How many accept Medicaid? How many list pediatric care? You can answer all of those with a SQL query against your scraped data, with provenance — you can point to the exact page on each website where the claim appears.
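Here is a minimal sketch of what one of those queries could look like. The `claims` table, its columns, and the sample rows are all hypothetical placeholders I made up for illustration; your scraped data will have its own schema. The point is that every row carries a `source_url`, so each answer comes with its provenance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (practice TEXT, field TEXT, value TEXT, source_url TEXT)"
)

# Placeholder rows, not real data.
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?, ?)",
    [
        ("Practice A", "service", "Cosmetic dentistry", "https://a.example/services"),
        ("Practice A", "insurance", "Medicaid", "https://a.example/insurance"),
        ("Practice B", "service", "Pediatric care", "https://b.example/services"),
        ("Practice C", "service", "Cosmetic veneers", "https://c.example/services"),
    ],
)


def practices_claiming(conn, field, keyword):
    """List practices whose value for a field mentions a keyword, with a source page."""
    cur = conn.execute(
        "SELECT practice, MIN(source_url) FROM claims "
        "WHERE field = ? AND LOWER(value) LIKE ? "
        "GROUP BY practice ORDER BY practice",
        (field, f"%{keyword.lower()}%"),
    )
    return cur.fetchall()


cosmetic = practices_claiming(conn, "service", "cosmetic")
```

The length of `cosmetic` is the market-sizing number, and the second column of each row is the page you would cite for it.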
Scaling beyond one query
The same workflow runs across cities. If you want a state-wide dental database for Arizona, you'd typically run the discovery on Phoenix, Tucson, Mesa, Scottsdale, Chandler, Gilbert, and Glendale, which together cover about 90% of the state's dental practices. Each city takes roughly 30 seconds to discover and 15 minutes to extract, give or take.
For a national list, you'd pick the 50 largest metros and run them in sequence. Total cost on Extractly's Growth plan ($79/mo, 4,000 credits) would be one or two months' subscription depending on how many practices show up per city.
The corpus across runs builds up. By the time you're 5 cities in, the cache and shared corpus are doing meaningful work — your 6th city often takes 10 seconds instead of 30 because half the businesses overlap with cities you've already done.
The honest limits
A few things you should know going in:
You won't get personal mobile numbers. Practices don't put these on their websites. If a paid database sells you these, ask where the data came from. The answer is usually "scraped from a previous CRM," and the legal status of that is shaky.
Not every email will work. Plan for 8-15% of office emails to bounce on a cold run. If deliverability matters, verify with NeverBounce or ZeroBounce ($0.008 per check) before you send.
You won't get fresh personnel changes. A new dentist hired last week won't show up until the practice updates its team page, which can take months. If real-time staffing data matters, you need LinkedIn data, and that comes from Proxycurl or similar, not from website scraping.
You won't always hit your target number. As covered above, the ceiling on Phoenix dentists is around 500 — if you ask for 1,000, you'll be told "we found 480 for you," and pushing harder won't change that. The web doesn't have 1,000 distinct Phoenix dental websites.
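To size the email-bounce limit for a specific campaign, a quick back-of-envelope helps. The 8-15% bounce range and the $0.008 per check are the figures quoted above. The 360 emails is my own derived input (84% of the 429 practices in the Phoenix run), and the function name is illustrative.

```python
def verification_plan(emails, cost_per_check=0.008, bounce_low=0.08, bounce_high=0.15):
    """Estimate verification cost and the expected range of bounced addresses."""
    return {
        "verification_cost_usd": round(emails * cost_per_check, 2),
        "expected_bounces": (
            round(emails * bounce_low),
            round(emails * bounce_high),
        ),
    }


# Assumption: about 360 office emails, i.e. 84% of 429 practices.
plan = verification_plan(360)
```

For this input the plan works out to $2.88 of verification against an expected 29 to 54 bounces, so checking before you send is cheap relative to the damage a high bounce rate does to a sending domain.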
For most local-business outbound, none of these limits matter. You just need to know about them so you can size your campaigns accordingly.
How to try this
If you want to see whether this is real on a vertical you actually care about, run a free discovery. The free tier covers 50 leads, which is enough to see whether the funnel for your specific query gives you usable counts. If it does, the paid plans are basically a multiplier. If it doesn't, you've saved yourself the cost of trying.
And if you've already built something like this internally — or hit a wall you think we should know about — drop me a note via the docs page. Practical use cases from real outbound teams are how we figure out what to build next.