ISC Dhcpd and failover shutdown
Apr 16th, 2008 by Prune
While trying to install an ISC DHCP server with failover, I searched for documentation and found… nothing!
Of course, plenty of people explain how to set up a DHCP server, along the lines of:
apt-get install dhcpd
service dhcpd start
And that’s it.
I need more than that, first because I work on Solaris 10, and because, heh, I’m not doing basic stuff.
First of all, there are 3 “stable” versions available for download, and it is hard to find out how they differ. The only difference I found is that the latest one supports IPv6 and the others do not. I chose the latest, 4.0.0, but with IPv6 disabled.
Compilation went fine. I just had to add an LDFLAGS setting so that the compiled binary can find the libcrypto shared library, which lives in /opt/sfw/lib on Solaris 10.
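A minimal sketch of what that build step could look like. Several things here are assumptions rather than the commands actually used: the install prefix is borrowed from the omshell path shown later in this post, `--disable-dhcpv6` is the configure switch expected to turn IPv6 off, and the `-R` run-path flag is a guess at what the Solaris linker needs. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the build commands unless DRY_RUN=0 is set.
# Assumptions (not from the original notes): the install prefix,
# the --disable-dhcpv6 switch and the -R run-path flag.
PREFIX="${PREFIX:-/opt/dhcp/dhcp-4.0.0}"
SFW_LIB="${SFW_LIB:-/opt/sfw/lib}"
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# -L lets the linker find libcrypto at build time,
# -R records the directory so the binary can find it at run time.
build_dhcpd() {
    run env LDFLAGS="-L$SFW_LIB -R$SFW_LIB" ./configure --prefix="$PREFIX" --disable-dhcpv6
    run make
    run make install
}

build_dhcpd
```

Run it from the unpacked source directory. If the compiler rejects `-R`, passing it as `-Wl,-R/opt/sfw/lib` is the usual alternative.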
Then comes the configuration.
I went through a few websites describing the failover configuration parameters. You can also find them in the man pages. It is pretty simple.
On the master:
failover peer "dhcp-failover" {
    # This server is the primary (master)
    primary;
    # Local IP address of the primary server
    address 172.16.8.95;
    # Local port of the primary server
    port 1520;
    # IP address of the secondary server
    peer address 172.16.8.96;
    # Port of the secondary server
    peer port 1520;
    # How long (in seconds) without a response before the peer is considered dead
    max-response-delay 30;
    # Number of BNDUPD messages before the peer is considered dead
    max-unacked-updates 10;
    # Avoids the state where a server still answers failover messages but no longer answers client requests
    load balance max seconds 3;
    # How long (in seconds) a DHCP server that answered for the other half of the range
    # keeps answering requests
    mclt 1800;
    # Specifies how the dynamic address range is split in two.
    # It uses a hashing mechanism, and 128 puts the same number of IP addresses on each server.
    split 128;
}
On the secondary:
failover peer "dhcp-failover" {
    # This server is the secondary
    secondary;
    # Local IP address of the secondary server
    address 172.16.8.96;
    # Local port of the secondary server
    port 1520;
    # IP address of the primary server
    peer address 172.16.8.95;
    # Port of the primary server
    peer port 1520;
    # How long (in seconds) without a response before the peer is considered dead
    max-response-delay 30;
    # Number of BNDUPD messages before the peer is considered dead
    max-unacked-updates 10;
    # Avoids the state where a server still answers failover messages but no longer answers client requests
    load balance max seconds 3;
}
Then I configured a subnet, options, etc. I also created a special “empty” subnet containing the IP address I am serving DHCP from:
# Local Subnet
subnet 172.16.8.0 netmask 255.255.252.0 {
    option routers 172.16.11.254;
    authoritative;
}
Then you can start it… I used the -d (debug) and -f (foreground) options to debug.
I won’t spend too much time on explanations here, but it would not start. The error message was: no pool defined for any interface.
I am using VLANs, virtual interfaces, etc. The physical interface is e1000g1. The VLAN-tagged one is e1000g54001. The virtual VLAN-tagged interface is e1000g54001:4 (the 4th logical interface).
To set this up, you just plumb interface e1000g54001, giving it no IP address and leaving it down. Then you use the “addif” option of ifconfig to add the virtual interface on top of e1000g54001.
For some reason (a bug?), dhcpd can’t see the IP addresses on virtual interfaces if the underlying interface is not up.
Just do an “ifconfig e1000g54001 up”.
Then add the e1000g54001 interface at the end of the dhcpd command line to bind to it.
Everything went fine after that.
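The interface steps above can be sketched as a small script. Treat it as an illustration under assumptions: the address and netmask are placeholders borrowed from the failover and subnet examples, the dhcpd path is a guess based on the install prefix, and `addif` picks the next free logical unit, which will not necessarily be :4. It prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
# ADDR, NETMASK and the dhcpd path are hypothetical placeholders.
IFACE="${IFACE:-e1000g54001}"
ADDR="${ADDR:-172.16.8.95}"
NETMASK="${NETMASK:-255.255.252.0}"
DHCPD="${DHCPD:-/opt/dhcp/dhcp-4.0.0/sbin/dhcpd}"
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_and_start() {
    # Plumb the VLAN-tagged interface without giving it an address.
    run ifconfig "$IFACE" plumb
    # Add a logical (virtual) interface carrying the service address.
    run ifconfig "$IFACE" addif "$ADDR" netmask "$NETMASK" up
    # Bring the underlying interface up so dhcpd can see the logical one.
    run ifconfig "$IFACE" up
    # Debug, foreground, bound to the VLAN-tagged interface.
    run "$DHCPD" -d -f "$IFACE"
}

setup_and_start
```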
Wow!!!!
And what if I stop one of the nodes?
Unfortunately, stopping one node with a kill command puts the other node in a mode where it does not answer any request. The reason is that the surviving node does not know whether the other one is still alive. To be sure a request is never answered twice, it stops answering for some time. I haven’t been able to find out for how long yet, maybe 24 hours. I read a few things about it and never found how to change this behaviour. I would like to reduce this delay to 5 minutes, maybe less, so if you have an idea, come along.
Then I heard of omshell and dhcpctl. This is a way to connect to the DHCP server and give it live commands.
After a lot of research, I finally found out how to use it. First, enable OMAPI (the interface omshell connects to) in the config file on both nodes:
key omapiname {
    algorithm hmac-md5;
    secret "XXXXXXXXXX";
};
omapi-key omapiname;
omapi-port 7911;
“omapiname” can be any name. Replace the XXX with the key secret; as far as I know, it is expected to be a base64-encoded string.
You can also use dnssec-keygen to create a key and copy the key string it generates as the secret:
dnssec-keygen -a HMAC-MD5 -b 512 -n HOST dhcpd_key_file
Restart dhcpd and you’re all set.
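As a small illustration of the key step: for an HMAC key, dnssec-keygen writes a `.private` file whose `Key:` line holds the base64 string to paste into the `secret` statement. The helper below is a hypothetical convenience, not part of dhcpd or BIND, and the file name in the usage comment is only a pattern.

```shell
#!/bin/sh
# Print the base64 secret stored on the "Key:" line of a
# dnssec-keygen .private file (hypothetical helper).
extract_secret() {
    awk '/^Key:/ { print $2 }' "$1"
}

# Usage: sh extract_secret.sh Kdhcpd_key_file.+157+NNNNN.private
if [ -n "${1:-}" ]; then
    extract_secret "$1"
fi
```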
Start omshell and connect to one of the nodes, the one you want to stop:
# /opt/dhcp/dhcp-4.0.0/bin/omshell
> server 172.16.248.18
> port 7911
> key rtlomapi "XXXXXX"
> connect
> new control
obj: control
> open
obj: control
state = 00:00:00:00
> set state=2
obj: control
state = 2
> update
obj: control
state = 2
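The same session can be scripted by feeding the commands to omshell on standard input. This is a sketch, not a tested procedure: the server address, key name and secret are placeholders that must match the node's own `key` and `omapi-port` settings. It prints the command list unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the omshell input unless DRY_RUN=0 is set.
# SERVER, KEY_NAME and KEY_SECRET are placeholders.
OMSHELL="${OMSHELL:-/opt/dhcp/dhcp-4.0.0/bin/omshell}"
SERVER="${SERVER:-172.16.8.95}"
PORT="${PORT:-7911}"
KEY_NAME="${KEY_NAME:-omapiname}"
KEY_SECRET="${KEY_SECRET:-XXXXXX}"
DRY_RUN="${DRY_RUN:-1}"

# Emit the commands typed by hand in the interactive session:
# open the control object and set its state to 2 (shutdown).
omshell_shutdown_commands() {
    cat <<EOF
server $SERVER
port $PORT
key $KEY_NAME "$KEY_SECRET"
connect
new control
open
set state=2
update
EOF
}

if [ "$DRY_RUN" = "1" ]; then
    omshell_shutdown_commands
else
    omshell_shutdown_commands | "$OMSHELL"
fi
```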
You’ll see in the logs :
Apr 16 18:15:22 cetus dhcpd: [ID 702911 local7.info] failover peer dhcp-failover: peer moves from normal to shutdown
Apr 16 18:15:22 cetus dhcpd: [ID 702911 local7.info] failover peer dhcp-failover: I move from normal to partner-down
Apr 16 18:15:23 cetus dhcpd: [ID 702911 local7.info] peer dhcp-failover: disconnected
Wow cool,
To tell the truth, I had never wondered whether you could do failover with dhcpd alone; these things don’t break down often, although the machine itself is another matter.
Interesting in any case.
Thanks for the tips about omshell. When I try to connect with omshell, the connection is refused. In fact, I don’t see port 7911 being bound by any process on my machine. Is there some basic configuration I am missing? Please let me know. Here is what I got when I tried to connect:
# ./omshell
> server 127.0.0.1
> port 7911
> key rtlomapi "f10"
> connect
dhcpctl_connect: connection refused
>
Thank you very much.
Are you sure you compiled OMAPI (omshell) support into it?
I don’t remember whether you have to, though.
Check your config again, maybe you missed something. Also check that you’re using the latest version of dhcpd.
As I haven’t really used this extensively, I’m sorry I won’t be able to help you much…
Let me know if you find something useful and I’ll add it here.
Thanks.