ASL uses DNS servers based on PowerDNS with a MySQL backend.
These DNS servers support the following:
- AllStarlink.org DNS authoritative
- registration server redundancy
- DNS lookup of node information
Authoritative DNS servers
The authoritative DNS servers run on karl-tpa.allstarlink.org and smithers-fnt.allstarlink.org, with the backend in the distributed database. They can be administered with 'pdnsutil' on the CLI or through the GUI at http://karl-tpa.allstarlink.org:9191 or http://smithers-fnt.allstarlink.org:9191, reachable over the VPN or via the bastion hosts.
DNSSEC is enabled on all domains and trust is extended to all sub servers.
Secondary DNS is provided by ns[1-4].keekles.org and ns6.gandi.net. The secondaries matter because the primary DNS goes offline if the database is hard down in both FNT and TPA. With the secondary servers online, DNS continues to work, and NMS requires DNS for the allstarlink.org zone.
This zone is served by the registration servers and is pulled directly from the database. These zones have no secondaries, only the three primary servers running on the registration servers.
The redundancy of registration is handled by a TTL of 120 seconds on all the records. We've added a field, 'UnixSeconds', to the 'records' table; it is NULL by default and is updated by the heartbeat health check scripts on the servers. If the heartbeat script detects that the DB or connectivity is down at a site, it shuts down that server and stops updating UnixSeconds in DNS.
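The heartbeat behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual health check script: it uses an in-memory SQLite table as a stand-in for the MySQL 'records' table, the function name `heartbeat` and the record name `register.example.` are hypothetical, and the step that shuts the server down is left out.

```python
import sqlite3

def heartbeat(conn, record_name, site_healthy, now):
    """Refresh UnixSeconds for one record, but only while the site is healthy."""
    if not site_healthy:
        # Unhealthy: leave the old timestamp alone so the record goes stale.
        return False
    conn.execute(
        "UPDATE records SET UnixSeconds = ? WHERE name = ?",
        (now, record_name),
    )
    conn.commit()
    return True

# Stand-in for the real 'records' table (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT, UnixSeconds INTEGER)")
conn.execute("INSERT INTO records VALUES ('register.example.', NULL)")

heartbeat(conn, "register.example.", site_healthy=True, now=1000)
heartbeat(conn, "register.example.", site_healthy=False, now=1060)
stamp = conn.execute("SELECT UnixSeconds FROM records").fetchone()[0]
```

After the second call the stored timestamp is still the one written by the last healthy run, which is what lets the record age out of DNS.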
On the DNS server we have modified the default query for a lookup to:
gmysql-basic-query=SELECT content,ttl,prio,type,domain_id,disabled,name,auth FROM records WHERE disabled=0 AND (UnixSeconds is NULL OR UnixSeconds > (UNIX_TIMESTAMP() - 120)) and type=? and name=?
This will only return a record if UnixSeconds is NULL or has been updated in the last 120 seconds.
If the node loses connectivity, it will time out of DNS within 120 seconds as a result. This is a "dead-man switch" function that lets the cluster tolerate the loss of any one node.
register.allstarlink.org is a CNAME to register.regsvcs.allstarlink.org under this zone. The node list servers are under it as well: node[1-4].allstarlink.org are CNAMEs to nodes.regsvcs.allstarlink.org.
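The freshness test in the modified query can be restated as a small predicate. This is only a sketch of the WHERE clause logic for the UnixSeconds column; the function name `is_served` and the constant name are hypothetical, and the `disabled`, `type` and `name` conditions are not modelled.

```python
import time

STALE_AFTER_SECONDS = 120  # window used in the modified gmysql-basic-query

def is_served(unix_seconds, now=None):
    """Mirror of: UnixSeconds IS NULL OR UnixSeconds > (UNIX_TIMESTAMP() - 120)."""
    if now is None:
        now = int(time.time())
    # NULL means the record is not under heartbeat control and is always served.
    return unix_seconds is None or unix_seconds > now - STALE_AFTER_SECONDS
```

A record whose timestamp is exactly 120 seconds old is already excluded, because the comparison is strict.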
Recovery of a down server
This section still needs to be filled out. Right now recovery is a manual verification, database edit and reset.
DNS node lookup
nodes.allstarlink.org is delegated to a DNS server running on the db servers. The users_Nodes table has a trigger that creates or edits entries in the records table in the 'allstar' database. This populates an SRV, TXT and A record for every node in the system whenever the node is updated. The trigger has been optimized and has little to no performance impact on the registration process.
Note that servers not in the node list can still appear in DNS, because entries are never aged out of DNS. It's up to the server to know whether it's registered.
A query for _iax._udp.<nodenumber>.nodes.allstarlink.org. will return an SRV record for a node as follows:
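To make the shape of the generated records concrete, here is a hedged sketch of the SRV and A records such a trigger would need to produce for one node. It is not the trigger itself (which lives in the database): the function name `records_for_node` is hypothetical, the priority and weight of 10 are simply copied from the sample SRV record in this document, and the TXT record is left out because its owner name is not stated here.

```python
ZONE = "nodes.allstarlink.org."

def records_for_node(node, ip, port=4569, remote_base=False, proxy_ip=""):
    """Return (name, type, content) tuples for one node's SRV and A records."""
    # Remote bases get their address record under the remotebase label.
    if remote_base:
        host = f"{node}.remotebase.{ZONE}"
    else:
        host = f"{node}.{ZONE}"
    return [
        # The SRV record points IAX clients at the port and the address record.
        (f"_iax._udp.{node}.{ZONE}", "SRV", f"10 10 {port} {host}"),
        # The A record carries the proxy IP when one is defined, else the node IP.
        (host, "A", proxy_ip or ip),
    ]

example = records_for_node(50000, "18.104.22.168")
```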
_iax._udp.50000.nodes.allstarlink.org. 30 IN SRV 10 10 4569 50000.nodes.allstarlink.org.
where 4569 is the IAX port. The client then does an A lookup on 50000.nodes.allstarlink.org. for the IP.
A remote base will be returned like:
_iax._udp.50000.nodes.allstarlink.org. 30 IN SRV 10 10 4569 50000.remotebase.nodes.allstarlink.org.
<nodenumber>.nodes.allstarlink.org. and <nodenumber>.remotebase.nodes.allstarlink.org. will return the IP address of the IAX server or the proxy IP if defined.
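As an illustration of how a client might interpret these answers, the sketch below parses an SRV line in the presentation format shown above and reports the port, the target to resolve with an A lookup, and whether the target indicates a remote base. The function name `parse_srv` is hypothetical and no DNS query is made; the input is just the text of the record.

```python
def parse_srv(line):
    """Split one SRV record in presentation format into its fields."""
    name, ttl, rclass, rtype, priority, weight, port, target = line.split()
    if rtype != "SRV":
        raise ValueError(f"not an SRV record: {rtype}")
    return {
        "name": name,
        "ttl": int(ttl),
        "priority": int(priority),
        "weight": int(weight),
        "port": int(port),      # IAX port of the node
        "target": target,       # name to resolve with an A lookup
        "remote_base": ".remotebase." in target,
    }

node = parse_srv(
    "_iax._udp.50000.nodes.allstarlink.org. 30 IN SRV 10 10 4569 "
    "50000.nodes.allstarlink.org."
)
```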
The TXT record is used for debugging purposes with a query below:
This will return:
"NN=50000" "RT=2019-02-28 18:41:29" "RB=0" "IP=18.104.22.168" "PIP=" "PT=4569" "RH=register-fnt"
- NN is the node number
- RT is the last update registration time
- RB is 0 if the node is not a remote base, 1 if it is a remote base
- IP is the IP address of the node
- PIP is the proxy IP of the node, if set
- PT is the port
- RH is the registration server the node last registered to
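When debugging, the quoted KEY=value strings in the TXT answer can be turned into a dictionary. This is a small sketch that works on the text of the answer as shown above; the function name `parse_node_txt` is hypothetical and the values are kept as strings.

```python
import re

def parse_node_txt(txt):
    """Parse '"KEY=value" "KEY=value" ...' into a dict of strings."""
    # Each quoted string holds one KEY=value pair; the value may be empty.
    return dict(re.findall(r'"([A-Z]+)=([^"]*)"', txt))

fields = parse_node_txt(
    '"NN=50000" "RT=2019-02-28 18:41:29" "RB=0" "IP=18.104.22.168" '
    '"PIP=" "PT=4569" "RH=register-fnt"'
)
is_remote_base = fields["RB"] == "1"
```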