Chapter 1 of 9
Use Authoritative Nameservers for DNS Record Verification
ENG-5268 — Switches from Cloudflare's global DNS cache to querying authoritative nameservers directly, reducing false negatives during DNS setup.
aaricpittman authored
6 commits
16 files changed
branch ENG-5268
Intent: Previously, DNS record verification queried Cloudflare (1.1.1.1) which could return stale cached results. This PR walks the DNS delegation chain from IANA root servers to discover and query authoritative nameservers directly — the source of truth for any domain's DNS records. Falls back to Cloudflare if NS discovery fails.
Side effect: Consolidates ~320 lines of duplicate, hand-rolled DNS lookup logic in Emails::Domain into the existing Record object abstraction. Also adds a grace period to prevent premature unvalidation of recently-verified records during transient DNS failures.
Chapter 2 of 9
Walking the Delegation Chain
New resolve_authoritative_nameservers method walks root → TLD → domain to discover authoritative NS.
+50 lines
dns_resolver.rb
base.rb
What Changed
A new method resolve_authoritative_nameservers on DNSResolver performs a full delegation walk starting from a random IANA root server. It queries down through TLD servers until it reaches the domain's own NS records.
The base resolver class also gains resolve_with_authoritative_nameserver — a convenience method that discovers the authoritative NS, picks one at random, then resolves the record through it. Falls back to Cloudflare if no NS is found.
How It Works
PublicSuffix.domain(hostname) extracts the registrable domain.
- Splits into labels and iterates from the TLD down to the registrable domain.
- At each level, opens a new
Resolv::DNS pointed at the current NS and queries for NS records of the next label.
- On the final iteration, returns all NS hostnames for the domain.
Design Decisions
The 13 IANA root servers are hardcoded as ROOT_NAMESERVERS. A random one is picked via .sample to distribute load. Each resolver is properly closed in an ensure block to prevent socket leaks.
Warning — Nondeterministic NS Selection
nameservers.sample picks a random authoritative NS for each check. If one NS has stale zone data while another is updated, results could be inconsistent across invocations.
Warning — Error Propagation
resolve_authoritative_nameservers only closes the resolver in ensure. Network timeouts or DNS errors from getresources will propagate unhandled, and the old code had retry logic with backoff.
+ ROOT_NAMESERVERS = %w[
+ a.root-servers.net
+ b.root-servers.net
+ ...
+ m.root-servers.net
+ ].freeze
+ def resolve_authoritative_nameservers(hostname:, timeout: DEFAULT_TIMEOUT)
+ domain = PublicSuffix.domain(hostname)
+ return [] unless domain
+
+ labels = domain.split(".")
+ current_ns = ROOT_NAMESERVERS.sample
+
+ labels.length.times do |i|
+ query_name = labels[labels.length - 1 - i..].join(".")
+ ns_resolver = library.new(nameserver: current_ns)
+ ns_resolver.timeouts = timeout
+ ns_records = ns_resolver.getresources(query_name, Resolv::DNS::Resource::IN::NS)
+ next if ns_records.empty?
+ current_ns = ns_records.first.name.to_s
+ return ns_records.map { |ns| ns.name.to_s } if i == labels.length - 1
+ ensure
+ ns_resolver&.close
+ end
+
+ []
+ end
+ def resolve_authoritative_nameservers(hostname:, timeout: DEFAULT_TIMEOUT)
+ []
+ end
+ def resolve_with_authoritative_nameserver(name:, types: [], timeout: DEFAULT_TIMEOUT)
+ nameservers = resolve_authoritative_nameservers(hostname: name, timeout: timeout)
+ nameserver = nameservers.any? ? nameservers.sample : ..::CLOUDFLARE_GLOBAL_DNS
+ with_options(nameserver: nameserver).resolve(name: name, types: types, timeout: timeout)
+ end
Chapter 3 of 9
RecordChecker: Authoritative Lookup
New check_authoritative method with automatic Cloudflare fallback.
+24 lines
−1 line
record_checker.rb
What Changed
A new check_authoritative method is added to RecordChecker. For each record, it resolves authoritative nameservers, picks one at random, queries it through the existing check method, and falls back to check_cloudflare if no authoritative NS is found.
Fallback Strategy
The fallback to Cloudflare is a good safety net. If NS discovery fails, behavior reverts to the previous approach rather than failing entirely.
Warning — Sequential Per-Record Processing
check_authoritative iterates records with .each and calls check per record individually. For the current single-record usage this is fine, but it is worth noting if batch usage is planned.
CLOUDFLARE_KEY = "cloudflare"
CLOUDFLARE_GLOBAL_DNS = "1.1.1.1"
+ AUTHORITATIVE_KEY = "authoritative"
+ def self.check_authoritative(...)
+ new.check_authoritative(...)
+ end
+ def check_authoritative(records:, timeout: DEFAULT_TIMEOUT)
+ results = {}
+ records.each do |record|
+ nameservers = resolver.resolve_authoritative_nameservers(hostname: record.name, timeout: timeout)
+ result = if nameservers.any?
+ check(records: [record], name_servers: {AUTHORITATIVE_KEY => nameservers.sample}, timeout: timeout)
+ else
+ check_cloudflare(records: [record], timeout: timeout)
+ end
+ results.merge!(result)
+ end
+ results
+ end
- attr_reader :resolver
+ attr_reader :resolver, :default_timeout
Chapter 4 of 9
Record Entity: Flexible NS Targeting
matches? now accepts name_servers param and defaults to authoritative lookup.
+21 lines
−4 lines
entities/record.rb
What Changed
The matches? and private matches methods on ClickFunnels::DNS::Entities::Record now accept an optional name_servers hash parameter.
Behavior Change
- Before: Always called
record_checker.check_cloudflare.
- After: If
name_servers are provided, it uses record_checker.check. Otherwise it uses record_checker.check_authoritative.
API Compatibility
The new parameter defaults to {}, so existing callers continue to work without modification. The semantic change from Cloudflare to authoritative lookup is implicit.
- def matches?(ignore_cache: false, record_checker: ..RecordChecker.new)
- matches(ignore_cache: ignore_cache, record_checker: record_checker)
+ def matches?(
+ ignore_cache: false,
+ name_servers: {},
+ record_checker: ..RecordChecker.new
+ )
+ matches(ignore_cache: ignore_cache, name_servers: name_servers, ...)
- result = record_checker.check_cloudflare(records: [self])[self].values.first
+ result =
+ if name_servers.any?
+ record_checker.check(records: [self], name_servers: name_servers)[self].values.first
+ else
+ record_checker.check_authoritative(records: [self])[self].values.first
+ end
Chapter 5 of 9
The Big Consolidation
~320 lines of hand-rolled DNS lookup logic in Emails::Domain replaced with Record objects.
+53 lines
−131 lines
net −78
emails/domain.rb
What Was Removed
Private DNS lookup helpers are deleted entirely, including TXT, CNAME, and SPF resolver methods, lower-level fetch helpers, public resolver config accessors, matcher helpers, and retry constants.
What Replaced Them
All verify_*_dns_record! methods now delegate to their corresponding *_dns_record_obj.matches? call. The Record objects handle caching, matching, and verification timestamp updates through their set_match_cache callbacks.
Refactored: verify_legacy_dkim_dns_record!
Instead of calling a removed matcher helper, it now constructs a temporary Record with a whitespace-only content matcher and calls matches? on it.
Warning — Removed Retry Logic
The old resolve_dns_* methods had five retries with exponential backoff. The new authoritative NS path has no retry mechanism, so newly-added domains get no retries on first verification.
VERIFICATION_CACHE_PERIOD = 10.minutes
VERIFICATION_GRACE_PERIOD = 1.hour
- DNS_LOOKUP_RETRIES = 5
- DNS_LOOKUP_BACKOFF_SECONDS = 0.05
def verify_dkim_dns_record!
- return true if dkim_dns_record_verified_at.present? && ...
- if dkim_dns_record_matches?
+ dkim_dns_record_obj&.matches?
60+ lines of manual DNS resolution with retry/backoff deleted
- def resolve_dns_txt_record(record_name)
- public_resolvers.each do |resolver|
- DNS_LOOKUP_RETRIES.times do |attempt|
- result = dns_txt_records(ns_resolver, record_name).first
- rescue Resolv::ResolvError, Resolv::ResolvTimeout => e
- Error.soft(e, context: "DNS TXT lookup failed...")
- end
- def public_resolvers
- Rails.configuration.outgoing_http[:public_resolvers]
- end
Chapter 6 of 9
Grace Periods & Content Matchers
set_match_cache callbacks now respect a 1-hour grace period. New DKIM normalizer for whitespace and prefix differences.
+40 lines
−8 lines
emails/domain.rb
Grace Period in set_match_cache
Previously, the set_match_cache lambda on each Record object was a simple ternary updating verified_at. Now it has three-way logic with an explicit one-hour grace period before clearing verification on a miss.
DKIM Content Matcher
New dkim_content_matcher logic normalizes values before comparison by stripping whitespace and removing the v=dkim1; prefix when present.
- set_match_cache: ->(matched) { update_columns(dkim_dns_record_verified_at: matched ? Time.now : nil) }
+ set_match_cache: ->(matched) {
+ if matched
+ update_columns(dkim_dns_record_verified_at: Time.now)
+ elsif dkim_dns_record_verified_at.blank? || dkim_dns_record_verified_at < VERIFICATION_GRACE_PERIOD.ago
+ update_columns(dkim_dns_record_verified_at: nil)
+ end
+ }
+ def dkim_content_matcher
+ ->(expected, actual) {
+ normalize = ->(v) { v.gsub(/\s+/, "").sub(/\Av=dkim1;/i, "") }
+ normalize.call(expected) == normalize.call(actual)
+ }
+ end
Chapter 7 of 9
View & Helper Cleanup
Templates simplified to use Record objects. 210 lines of unused helper methods removed.
+22 lines
−255 lines
net −233
_dns_record.html.erb
_show.html.erb
_dns_table.html.erb
_cname_data_row.html.erb
domains_helper.rb
domains.en.yml / .ja.yml
_dns_record Partial
The partial is simplified to accept domain and record_obj. It calls record_obj.matches? once and uses the object's own attributes directly.
DomainsHelper
About 210 lines of verification helper methods are removed. This consolidates behavior into Record objects and narrows the helper down to view-specific logic.
Locale Key Rename
cname: is renamed to return_address: in both locale files to match the record object's reference_name.
Warning — Inline DNS Lookup on Render
The _dns_record partial now calls record_obj.matches? at render time. That triggers a full DNS resolution synchronously during view rendering.
Warning — Bug Fix Embedded in Unrelated Change
In domains_helper.rb, @domain.has_dependent_destroy_error? changed to domain.has_dependent_destroy_error?. It fixes a real bug, but it is buried in a large cleanup commit.
- attempt_to_verify_on_render ||= false
- record_verified ||= if attempt_to_verify_on_render
- case type
- in :dkim
- domain.verify_dkim_dns_record!
+ record_type = record_obj.reference_name
+ record_verified = record_obj.matches?
Bug fix: @domain → domain (ivar → parameter)
- def parsed_domain(domain:)
- @parsed_domain ||= PublicSuffix.parse(domain.name)
def domain_usage_header(domain)
- if @domain.has_dependent_destroy_error?
+ if domain.has_dependent_destroy_error?
Chapter 8 of 9
Test Coverage
New tests for authoritative resolution. Existing tests updated to match the new architecture.
+230 lines
−182 lines
net +48
dns_resolver_test.rb (+106)
record_checker_test.rb (+42)
record_test.rb (+21/−6)
domain_test.rb (+61/−176)
New: dns_resolver_test.rb
Comprehensive tests cover a successful root → TLD → domain resolution walk, empty NS results, invalid PublicSuffix domains, and the Cloudflare fallback path.
New: record_checker_test.rb
Tests verify that check_authoritative queries the authoritative nameserver when one is found and falls back to Cloudflare when one is not.
Updated: record_test.rb and domain_test.rb
Existing tests were updated to point at check_authoritative, to cover explicit name_servers routing, and to validate the new DKIM matcher and grace-period behavior.
+ describe "#resolve_authoritative_nameservers" do
+ describe "when delegation chain resolves successfully" do
+ it "walks root -> TLD -> domain and returns authoritative nameservers" do
+ mock_resolv_library.expects(:new).with(nameserver: "a.root-servers.net")
+ mock_root_resolver.expects(:getresources).with("com", Resolv::DNS::Resource::IN::NS)
+ mock_resolv_library.expects(:new).with(nameserver: "a.gtld-servers.net")
+ mock_tld_resolver.expects(:getresources).with("example.com", Resolv::DNS::Resource::IN::NS)
+ result = subject.resolve_authoritative_nameservers(hostname: "www.example.com")
+ assert_equal ["ns1.example.com", "ns2.example.com"], result
+ end
+ end
+ end
Chapter 9 of 9
Review Summary
Key takeaways and action items for this PR.
Verdict
This is a well-structured refactoring that replaces fragile, duplicated DNS resolution code with a cleaner abstraction. The authoritative NS approach is technically correct and should reduce false negatives for DNS verification.
Strengths
- Proper delegation chain walk follows DNS standards.
- Cloudflare fallback ensures no regression if NS discovery fails.
- Grace period prevents flapping on transient DNS failures.
- Consolidation removes significant code duplication.
- Good test coverage for the new code paths.
- Locale files updated in both English and Japanese.
Action Items
- Retry logic removed: consider whether the authoritative NS path needs at least one retry for transient network failures.
- _show.html.erb renders DNS inline: verify whether blocking DNS resolution during render is acceptable.
- Nondeterministic NS selection:
nameservers.sample could complicate debugging flaky verifications.
- Embedded bug fix: the
@domain → domain helper fix should be called out in the PR description.