Saturday, August 30, 2008

An introduction to Puppet (config management)

Puppet is a configuration management tool and more. If you have the same configuration, set of packages, or simply files that you'd like to roll out to multiple machines, puppet is bound to make your life easier.

If you're dealing with fewer than half a dozen machines, you can likely get away with clusterssh, which lets you control multiple machines at once via ssh. But if you have more, or you want a more elegant and centralized way of managing configuration, you want Puppet. Yes, there's also cfengine, but Puppet is said to be more flexible. I can't comment on that, since I've only used cfengine briefly and thought it was too complicated to be worth it. Having said that, Puppet has a fairly steep learning curve as well.

Puppet has a client-server architecture: the client is "puppet" and the server is "puppetmaster". Installing puppetmaster will automagically install puppet on the same host. For the other hosts you want to control from your main puppetmaster host, install just the puppet package.

By default puppet clients expect their master to be called "puppet" in DNS, but you can change this. If you plan to have multiple puppetmasters (for whatever reason, such as separate networks/clients), it's probably a good idea to change it (see below for how). Having said that, the puppet system is clever enough that it won't just start changing things on clients you name on the puppetmaster. In fact, it's the clients that poll the server for changes, and they will only apply a change to themselves if they've exchanged keys with the server beforehand.


So how do I get the clients to talk to the master?

On each client do:

puppetd --server yourpuppetmaster --waitforcert 60 --test

The puppetmaster can list which clients have asked to be controlled by it:

puppetca --list

Finally, if the server should control that client, sign the certificate that the client requested in the previous step:

puppetca --sign puppetclientname

Note: the puppet client on the puppetmaster server itself is already authorized and doesn't need to go through the above steps.
Ok, so let's test it

Let's first try creating a file. Puppet can push out existing files, but it can also create new ones. For this first example, we'll try the latter.

You put the configs in /etc/puppet/manifests, and by default puppet expects there to be a file called "site.pp". You can split up your configs into other files in the same directory and link them from site.pp, but we'll do that later. For now, just add this to your site.pp file (which you'll create):
# Create "/tmp/testfile" if it doesn't exist.
class test_class {
file { "/tmp/testfile":
ensure = present,
mode = 644,
owner = root,
group = root }

}
# tell puppet on which client to run the class

node yourpuppetclient { # this is the name of one or more of your puppet clients
    include test_class
}
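To see the change right away instead of waiting for the next scheduled run (more on that below), trigger a run manually on the client and check that the file appeared:

puppetd --server yourpuppetmaster --test
ls -l /tmp/testfile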
 
Here's another simple example, for running a script.
Notice the "require" statement, which is where Puppet's power lies; there's a fuller illustration of that below.
class test2 {
    exec { "my test program":
        cwd     => "/var/tmp",
        command => "/var/tmp/test.sh",
        alias   => "testscript",
        # require => User['tibor'], # require that the user "tibor" exists before running the script
    }
}
#And then specify which client to apply it to:

node yourpuppetclient { include test2 }

So when will the changes be applied?

By default puppet applies its changes every 30 minutes. If you want to apply an update manually, you can run

puppetd -o -v
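The 30-minute default is configurable too; the runinterval setting (in seconds) in each client's /etc/puppet/puppet.conf controls it:

[puppetd]
runinterval=1800 # seconds, i.e. the default 30 minutes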

Changing the puppet master name from the default "puppet"
This is optional... In /etc/puppet/puppet.conf on each client, add
[puppetd]
server=yourpuppetmasterserver

and on the server only, under the [puppetmasterd] section:
  certname=yourpuppetmasterserver
 
To make sure this post is not too overwhelming, I'll stop here. In my next post about Puppet, I'll include some more complex examples to show its power.
 
-T  

Thursday, August 7, 2008

OpenID and Googlepedia

Googlepedia is a Firefox extension that combines Google search results with the Wikipedia page for that specific search term. How does it do that? It creates a second window pane on the right (of your Google results page) that contains the Wikipedia article for your search string. And if you navigate the Wikipedia links, it will take those links and Google-search them for you. If it gets in your way, you can hide it. I've found it quite useful, as I'm often switching between the two sites.

I've been starting to see OpenID login options on several websites, and I always wondered what it was. So I thought I'd try it out. But first, what is it? It's an easier way to log in without the pain of having to remember multiple usernames and passwords. It's also decentralized and free.

Let's say you have a Yahoo account and you want to post a comment on Blogger (Google's site). By default only people with Google accounts can post, or the blog owner can open up comments to anyone, which is just asking for spam trouble.

Enter OpenID. Instead of having to create a new Google account, you enter your OpenID, which is a URL that you sign up for at an OpenID provider. That takes you back to log in to your Yahoo account, asks you whether you want to log in to the new site, and then proceeds. One important detail here is that you can tell the OpenID provider to remember that you've OK'd a certain site, so it doesn't keep prompting you.

And then you're authenticated to the Blogger site and can post your comment. It's all done over SSL, so it's encrypted, and your password is never sent between the two sites, only an authentication token. Clever, aye?

Or let's say you have a SourceForge account with a unique username and password that you can never remember. Use their new OpenID login instead. The first time you use it, you'll need to log in to the actual SourceForge account with your username and password (to link the two), but after that you can always just log in with the URL (which, if you're not already logged in to your OpenID provider, will prompt you to log in).

So how do you get an OpenID? From an OpenID provider, or, if you have your own server, you can become your own OpenID provider. If you have a Google account, then you already have an OpenID: it's the URL of your blog site, although you'll need to use the beta draft.blogspot.com as your dashboard to enable it for your blog. Yahoo's OpenID site is openid.yahoo.com. For theirs, you go through a couple of steps to create one, but you can make it a custom one (i.e. me.yahoo.com/whateveryouwant_here_that's_not_already_taken). I only mention these two because I have accounts with them. Here's a more complete list of OpenID providers:
http://openid.net/get/
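Incidentally, if you have your own server but don't want to run a full provider, you can delegate: add two link tags to the HTML head of your homepage, and your own URL becomes your OpenID while an existing provider handles the actual authentication. A sketch, using myopenid.com as an example provider (swap in your own):

<link rel="openid.server" href="https://www.myopenid.com/server">
<link rel="openid.delegate" href="https://yourname.myopenid.com/">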

So OpenID is a great idea, but it's just starting to catch on. Some people argue that the password manager within a browser already does what OpenID is attempting to do (i.e. save people from having to remember lots of different passwords). That's true, but OpenID still works when you're away from your usual computer and don't have your saved passwords handy. It also doesn't stop blog spammers, just slows them down.

I believe the idea will catch on as more and more websites start using it. The extent to which one site will trust another, especially a competitor's OpenID provider, will likely (and sadly) always be limited. A nice exception here is SourceForge, although it appears to limit which OpenID providers it will accept.

As a final note, Drupal (the popular CMS application) now has support for OpenID logins, and the OpenID project is offering a $5000 bounty to other projects that implement it. Nice.
-T

Saturday, August 2, 2008

openldap sync replication instead of slurpd

syncrepl is a new replication mode, first introduced in OpenLDAP 2.2 and used exclusively in 2.4, where slurpd is deprecated. So if you're running Etch, you can use both methods, even side by side.

So why would you want to use it (besides the fact that slurpd will be obsolete in Lenny)? Well, it provides a smarter way of replicating, starting with the fact that your replica can start out completely empty, so no more having to copy DBs to slaves. Also, no more having to restart the master or add config changes when you want to set up a new slave. And reportedly more reliable replication (which I'm keen to see).

There are a couple of concepts in syncrepl that may be confusing at first. First, the "master" is called the "provider" and the slaves are called "consumers". Secondly, the basic setup of syncrepl (called refreshOnly) is pull-based replication: the consumer pulls updates from the provider.

So let's say you already have an LDAP master configured and your slaves are configured with the old slurpd replication. How do you start to migrate? In this example, we'll set up a new slave that uses syncrepl. It assumes you already have a replication user with full read access to the master (you should have this if you use slurpd). It also assumes that you have the directive "lastmod on" enabled on your master. It's on by default, but to get replication working between Etch and Sarge LDAP instances you may have turned it off. So if you still have Sarge boxes in your replica chain, stop now, otherwise you'll break them :)
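For reference, here's where that directive lives (a sketch, assuming the stock Debian slapd.conf layout):

# In the database section of the master's slapd.conf
lastmod on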

First add the following 4 lines to your master:
#Under the Global Directives section
moduleload syncprov.la
#Under your Database definition
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
--------------------------------------------------
Don't define the new slave on the master, as you would with slurpd replication.
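One optional tweak while you're editing the master (recommended by the OpenLDAP admin guide, but not strictly required for this walkthrough): index the attributes syncrepl searches on, so synchronization stays fast on larger directories:

#Under your Database definition
index entryCSN,entryUUID eq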

On the slave, copy the slapd.conf from the master (minus the replica and replogfile lines), and make sure your slave has all the same schemas (in /etc/ldap/schema) that your master does. Then add the following 12 lines to your new slave. Note that everything after the first syncrepl line is a continuation of the same directive, so in the real file the follow-on lines must start with whitespace, and the inline comments below are annotations only; strip them out of the actual config.
#Under the database definition
syncrepl rid=1                            # identification number for the provider, max 3 digits long
  provider=ldap://ldap                    # your master, or rather "provider", ldap server
  type=refreshOnly                        # we want pull-based replication to start with
  interval=00:00:05:00                    # schedule a replication event every 5 minutes
  searchbase="dc=example,dc=com"          # your search base
  filter="(objectClass=*)"                # get all elements
  attrs="*"
  scope=sub
  schemachecking=on                       # ensure the schema is not violated
  bindmethod=simple                       # authentication method
  binddn="cn=replica,dc=example,dc=com"   # your replication user
  credentials="secret"                    # your replication password

Now simply restart your slave and watch /var/lib/ldap grow as the data is pulled from the master. Beautiful, aye? If you don't particularly like the 5-minute wait, you can decrease that value, or look at setting up the refreshAndPersist replication "type". I haven't tried that yet, so I can't comment on it.
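If you want proof that a consumer has caught up, compare the contextCSN value on the master and the slave; when they match, the slave is in sync. A sketch, using the searchbase and provider name from the example above plus a hypothetical slave hostname:

ldapsearch -x -H ldap://ldap -s base -b "dc=example,dc=com" contextCSN
ldapsearch -x -H ldap://yourslave -s base -b "dc=example,dc=com" contextCSN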

-T