Monday, June 11, 2018

Introduction to Ansible and writing your first playbook

Well it's been way too long since I've written in this blog, so I thought I'd put out a quick how-to on getting started with Ansible.

Ansible is an automation tool for configuration management, similar to Puppet, which I wrote about in a previous post.

It's written in Python, and its playbooks are written in YAML, where whitespace is very important: if you indent things wrong, you'll get errors that may not help you pinpoint the problem. But the main advantage of Ansible over other similar tools like Puppet is that it doesn't require a daemon running on the managed machines. In fact, all it requires is an SSH connection.

So let's get started. The ansible package only needs to be installed on the machine you plan to run it from. Beyond that, as long as you have ssh access to the rest of your servers, you can start automating. You don't even need a dedicated account to run ansible under (you can run it from your own user account if it has sufficient access on the remote machines), but it's a good idea to set one up.

In my setup, I used shared SSH keys to make it easier to connect to the rest of the servers, and an ansible account with passwordless sudo. Yes, that's not the best security-wise, but for an example it's good enough.

This means adding a single line in /etc/sudoers (or in a new file in /etc/sudoers.d if you prefer):

 ansible ALL=(ALL) NOPASSWD:ALL
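
If you haven't distributed the shared keys yet, here's a minimal sketch of one way to do it (run as the ansible user on the control machine; the hostnames are the example ones used further down):

 ssh-keygen -t rsa -b 4096                 # accept the defaults, or set a passphrase if you prefer
 ssh-copy-id ansible@lab-ansible-test-01   # copies the public key into the remote ansible account
 ssh-copy-id ansible@lab-ansible-test-02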

The next important decision is where you want to store your playbooks. I went with /etc/ansible/playbooks, because /etc/ansible already exists in a default setup.

But before we get to actually writing a playbook, let's start by setting up our hosts. This is done in /etc/ansible/hosts, where there are already some examples. Basically you can define both individual hosts and groups of hosts. A group is defined like this:


[ansibletestservers]
lab-ansible-test-01
lab-ansible-test-02
 

The first line is the group name; the other two are servers that are in DNS (or, if they're not, you can use IP addresses). The first host in my case is actually the host I'm running ansible from, but that doesn't have to be the case.


Once you have that, you can write a basic playbook to push out a file. It looks something like this:


---
- hosts: ansibletestservers
  become: yes
  gather_facts: false

  tasks:
  - name: Copy a new "ntp.conf" file into place, backing up the original if it differs from the copied version
    copy:
      src: ntp.conf.ansible
      dest: /etc/ntp.conf
      owner: root
      group: root
      mode: 0644
      backup: yes

Save the above with a .yml extension, let's say ntp.yml, and you're ready to run it. The directives are mostly self-explanatory, except maybe for "become", which tells Ansible to sudo to root before running, and "gather_facts", which is on by default and takes an inventory of each host before executing anything; unless you specifically need those facts, it's best to set it to "false". The source file itself (ntp.conf.ansible) can be stored in the same directory as the playbook.
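
For completeness, here's a minimal sketch of what ntp.conf.ansible might contain (purely illustrative; the pool.ntp.org servers are placeholders, use whatever ntp.conf you actually want to roll out):

 # ntp.conf.ansible -- illustrative only
 driftfile /var/lib/ntp/ntp.drift
 server 0.pool.ntp.org iburst
 server 1.pool.ntp.org iburst
 server 2.pool.ntp.org iburst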

So let's run it finally, and see what happens. Since I created the ansible account, I use that to run my playbooks: I sudo to root, then su to ansible, then do:

  ansible-playbook /etc/ansible/playbooks/ntp.yml

You should see something like:


PLAY ***************************************************************************

TASK [Copy a new "ntp.conf" file into place, backing up the original if it differs from the copied version] ***
ok: [lab-ansible-test-02]
changed: [lab-ansible-test-01]

PLAY RECAP *********************************************************************
lab-ansible-test-01        : ok=1    changed=1    unreachable=0    failed=0
lab-ansible-test-02        : ok=1    changed=0    unreachable=0    failed=0

If you don't see that, check that your ansible account can ssh to the ansible user on all the machines you want to run it on, and that it can sudo to root without a password. If that all works but your playbook still doesn't, check the permissions on the files.
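
A quick way to check both of those (assuming the group name from earlier) is with a couple of ad-hoc commands:

 ansible ansibletestservers -m ping          # checks ssh connectivity and that python works on the remote hosts
 ansible ansibletestservers -b -a "whoami"   # -b means become; each host should answer "root"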

There's a lot more interesting things you can do with Ansible. I'll do some more advanced examples in my next post.

Saturday, August 30, 2008

An introduction to Puppet (config management)

Puppet is a configuration management tool and more. If you have the same configuration, set of packages, or simply files that you'd like to roll out to multiple machines, puppet is bound to make your life easier.

If it's fewer than half a dozen machines, you can likely get away with clusterssh, which allows you to control multiple machines at once via ssh. But if you have more, or you want a more elegant and centralized way of managing configuration, you want Puppet. Yes, there's also cfengine, but Puppet is said to be more flexible. I can't comment on that, since I've only used cfengine briefly and thought it was too complicated to be worth it. Having said that, Puppet has a fairly steep learning curve as well.

Puppet has a client-server architecture. The client is "puppet", the server is "puppetmaster". Installing puppetmaster will automagically install puppet on the same host. For other hosts that you want to control via your main puppetmaster host, simply install just the puppet package.

By default puppet clients expect their master to be called "puppet" in DNS, but you can change this. If you plan to have multiple puppetmasters (for whatever reason, such as separate networks/clients, etc.), it's probably a good idea to change it (see below on how to do that). Having said that, the puppet system is clever enough that it won't just start changing things on clients you specify on the puppetmaster. In fact, it's the clients that poll the server for changes, and they will only apply a change to themselves if they've exchanged keys with the server beforehand.


So how do I get the clients to talk to the master?

On each client do:

puppetd --server yourpuppetmaster --waitforcert 60 --test

The puppetmaster can list which clients have asked to be controlled by it:

puppetca --list

Finally, if the server wants to control that client, it should sign the certificate that the client requested in the previous step:

puppetca --sign puppetclientname

Note: the puppet client on the puppetmaster server itself is already authorized, and doesn't need to go through the above steps.
Ok, so let's test it

Let's first try creating a file. Puppet can push out existing files, but it can also create new ones. For this first example, we'll try the latter.

You put the configs in /etc/puppet/manifests, and by default puppet expects there to be a file called "site.pp". You can split up your configs into other files in the same directory and include them from site.pp, but we'll do that later. For now, just add this to your site.pp file (which you'll create):
# Create "/tmp/testfile" if it doesn't exist.
class test_class {
    file { "/tmp/testfile":
        ensure => present,
        mode   => 644,
        owner  => root,
        group  => root,
    }
}

# Tell puppet on which client(s) to run the class
node yourpuppetclient {    # this is the name of one or more of your puppet clients
    include test_class
}
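
Puppet can also push out an existing file rather than just ensuring one exists. Here's a rough sketch of that approach (the [files] mount in /etc/puppet/fileserver.conf and the ntp.conf name are my own choices for illustration, assuming the mount points at /etc/puppet/files on the puppetmaster):

class push_ntp_conf {
    file { "/etc/ntp.conf":
        source => "puppet://yourpuppetmaster/files/ntp.conf",   # served from the [files] mount on the puppetmaster
        owner  => root,
        group  => root,
        mode   => 644,
    }
}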
 
Here's another simple example, for running a script.
Notice the "require" parameter (commented out below), which is where Puppet's power lies: resources can be made to depend on other resources.
class test2 {
    exec { "my test program":
        cwd     => "/var/tmp",
        command => "/var/tmp/test.sh",
        alias   => "testscript",
        # require => User['tibor'],    # require that the user "tibor" exists before running the script
    }
}

# And then specify which client to apply it to:
node yourpuppetclient { include test2 }

So when will the changes be applied?

By default puppet applies its changes every 30 minutes. If you want to manually apply an update, you can run:

puppetd -o -v

Changing the puppet master name from the default "puppet"
This is optional. In /etc/puppet/puppet.conf on each client, add:
[puppetd]
server=yourpuppetmasterserver

and on the server only, under the [puppetmasterd] section, add:
  certname=yourpuppetmasterserver
 
To make sure this post is not too overwhelming, I'll stop here. In my next post about Puppet, I'll include some more complex examples to show its power.
 
-T  

Thursday, August 7, 2008

OpenID and Googlepedia

Googlepedia is a Firefox extension that combines Google search results with the Wikipedia page for your search term. How does it do that? It creates a second window pane on the right (of your Google results page) that contains the Wikipedia article for your search string. And if you navigate the Wikipedia links, it will take those links and Google-search them for you. If it gets in your way, you can hide it. I've found it quite useful, as I'm often switching between the two sites.

I've been starting to see OpenID login options on several websites, and I always wondered what it was all about. So I thought I'd try it out. But first, what is it? It's an easier way to log in without the pain of having to remember multiple usernames and passwords. It's also decentralized and free.

Let's say you have a Yahoo account, and you want to post a comment on Blogger (Google's site). By default only people with Google accounts can post, or the blog owner has the choice of opening up comments to anyone, which is just asking for spam trouble.

Enter OpenID. Instead of having to create a new Google account, you enter your OpenID, which is a URL that you sign up for at an OpenID provider. That takes you back to log in to your Yahoo account, asks you if you want to log in to the new site, and then proceeds. One important distinction here is that you can tell the OpenID provider site to remember that you've ok'd a certain site, so it doesn't keep prompting you.

And then you're authenticated to the Blogger site and can post your comment. It's all done over SSL, so it's encrypted, and your password is not sent between the two sites, only an authentication token. Clever aye?

Or, let's say you have a Sourceforge account, with a unique username and password that you can never remember. Use their new OpenID login instead. The first time you use it, you'll need to log in to the actual Sourceforge account using your username and password (to link the two), but after that you can always just log in with the URL (which, if you're not logged in to your OpenID provider, will prompt you to log in).

So how do you get an OpenID? From an OpenID provider, or if you have your own server, you can become your own OpenID provider. If you have a Google account, then you already have an OpenID: it's the URL of your blog site, although you'll need to use the beta draft.blogspot.com as your dashboard to enable it for your blog. Yahoo's OpenID site is openid.yahoo.com. For theirs, you go through a couple of steps to create one, but you can make it a custom one (i.e. me.yahoo.com/whateveryouwant_here_that's_not_already_taken). I only mention these two because I have accounts with them. Here's a more complete list of OpenID providers:
http://openid.net/get/

So OpenID is a great idea, but it's just starting to catch on. Some people argue that the password manager within a browser already does what OpenID is attempting to do (i.e. save people from having to remember lots of different passwords). That's true, but OpenID works when you're away from your usual computer and don't have your saved passwords handy. It also doesn't stop blog spammers, just slows them down.

I believe the idea will catch on as more and more websites start using it. The extent to which one site will trust another, especially a competitor's OpenID provider, will likely (and sadly) always be limited. A nice exception here is Sourceforge, although it appears to limit which OpenID providers it will accept.

As a final note, Drupal (popular CMS application) now has support for OpenID logins, and the OpenID project is offering a $5000 bounty to other projects that implement it. Nice.
-T

Saturday, August 2, 2008

openldap sync replication instead of slurpd

syncrepl is a new replication mode, first introduced in openldap 2.2, and used exclusively in 2.4, where slurpd is deprecated. So if you're running Etch, you can use both methods, side by side even.

So why would you want to use it (besides the fact that slurpd will be obsolete in Lenny)? Well, it provides a smarter way of replicating, starting with the fact that your replica can start out completely empty, so no more having to copy DBs to slaves. Also, no more having to restart the master or add config changes when you want to set up a new slave. And reportedly more reliable replication (which I'm keen to see).

There are a couple of concepts in syncrepl that may be confusing at first. First, the "master" is called the "provider" and the slaves are called "consumers". Secondly, the basic setup of syncrepl (called refreshOnly) is pull-based replication, so the consumer pulls updates from the provider.

So let's say you already have an ldap master configured, and your slaves are configured with the old slurpd replication. How do you start to migrate? In this example, we'll set up a new slave that will use syncrepl. It assumes you already have a replication user that has full read access to the master (you should have this if you use slurpd). It also assumes that you have the directive "lastmod on" enabled on your master. By default it is on, but to get replication working between etch and sarge ldap instances you may have turned it off. So if you still have sarge boxes in your replica chain, then stop now, otherwise you'll break them :)

First add the following 4 lines to your master:
#Under the Global Directives section
moduleload syncprov.la
#Under your Database definition
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
--------------------------------------------------
Don't define the new slave on the master, as you do with slurpd replication.
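
After adding those lines, restart slapd on the master so the syncprov overlay is loaded (Debian init script shown; adjust if yours lives elsewhere):

 /etc/init.d/slapd restart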

On the slave, copy the slapd.conf from the master (minus the replica & replogfile lines), and make sure your slave has all the same schemas (in /etc/ldap/schema) that your master does. Then add the following 12 lines to your new slave (the inline # comments below are just explanations; leave them out of the real config):
#Under the database definition
syncrepl rid=1                            #identification number for this replication agreement, max 3 digits long
  provider=ldap://ldap                    #your master, or rather "provider", ldap server
  type=refreshOnly                        #we want pull-based to start with
  interval=00:00:05:00                    #schedule a replication event every 5 minutes
  searchbase="dc=example,dc=com"          #your search base
  filter="(objectClass=*)"                #get all elements
  attrs="*"
  scope=sub
  schemachecking=on                       #ensure schema is not violated
  bindmethod=simple                       #authentication method
  binddn="cn=replica,dc=example,dc=com"   #your replication user
  credentials="secret"                    #your replication password

Now simply restart your slave and watch /var/lib/ldap grow as the data is pulled from the master. Beautiful aye? If you don't particularly like the 5 minute wait, you can decrease that value, or look at setting up the refreshAndPersist replication "type". Haven't tried that yet, so can't comment on it.
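
One way to check that a consumer has caught up (assuming the dc=example,dc=com suffix from above; "provider" and "consumer" are placeholder hostnames) is to compare the contextCSN values on both sides:

 ldapsearch -x -H ldap://provider -s base -b "dc=example,dc=com" contextCSN
 ldapsearch -x -H ldap://consumer -s base -b "dc=example,dc=com" contextCSN

The two values should match once the consumer is in sync.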

-T

Thursday, July 31, 2008

Splatd, the glue between LDAP and your home directory

LDAP is awesome for central authentication, and even more advanced things like mail routing and database info. But there are some things that it doesn't handle, like creating (and later cleaning up and archiving) user home directories, or easily pushing out authorized_keys files for ssh. This is where splatd comes in.

Splatd can create home directories based on criteria that it gathers from ldap (such as minimum and maximum uidNumber), can copy your authorized_keys file from ldap, can handle .forward files for users (again gathered from ldap), and finally can archive, and later delete, home directories for users based on the criteria you specify.

Unfortunately splatd doesn't have a Debian (etch) package, but it's fairly painless to install it from source, then take the config and init script from an Ubuntu package. The only thing to adjust in the init script is the location of the binary, and away you go. You can tell it how often to query ldap for updates (the default is 10 minutes) and apply its changes.

Update: To get authorized_keys working, you'll need to copy ooo.schema and ooossh.schema to /etc/ldap/schema on all your ldap instances, which allows you to set the sshAccount objectClass, and under that, sshPublicKey. You can have multiple public keys.
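
As a rough illustration (the DN and key below are made up; the objectClass and attribute names are the ones mentioned above), adding a key to an existing user might look like this in LDIF:

 dn: uid=jsmith,ou=people,dc=example,dc=com
 changetype: modify
 add: objectClass
 objectClass: sshAccount
 -
 add: sshPublicKey
 sshPublicKey: ssh-rsa AAAAB3NzaC1yc2E...snipped... jsmith@workstation

Save that as addkey.ldif and feed it in with ldapmodify -x -D "cn=admin,dc=example,dc=com" -W -f addkey.ldif (adjusting the admin DN to your own).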

In my tests it worked very nicely, and I really liked how simple the config file was. I'm pretty sure all of these actions could be done by something like Puppet (which I'll be blogging about next week), but splatd made it easy.

Update: Speaking of ldap, it appears that slurpd replication no longer works in 2.4 (I'm guessing Debian Lenny), so I'll also be investigating changing that to the new "syncrepl" replication.
-T

Thursday, July 24, 2008

Positive Stress

When is stress good? When it's a .deb package :) What does it do?

It allows you to put the CPU, memory, hard disk, or I/O subsystems (or all at once if you want) into a loop so you can stress test your system. Why would you want to do that? Well, you can see how your applications perform under load, or identify a bad piece of hardware. Some examples:

Run a CPU test for 30 seconds
stress -c 10 --timeout 30s

Run a memory test for 60 minutes
stress -m 10 --timeout 60m

Run a combined test for 2 days:
stress -m 10 -c 5 -d 2 -i 9 --timeout 2d

Notice how you can specify the number of "hogs" (love that term) for each subsystem.

Be careful: the disk test (-d) will write files and may even fill up your disk (if it's small). That happened to me, but stress was smart enough to quickly remove its temp files and exit with an error to let you know what happened.

Also, it goes without saying: watch the load on your system and your logfiles to make sure you haven't DOS-ed any of your services. Of course you shouldn't run this outside a scheduled maintenance window, right? :)
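
A simple way to keep an eye on things from another terminal while a test runs (assuming a Debian-style /var/log/syslog; run it from the same directory you started stress in, since the disk test writes its temp files there):

 watch -n 5 'uptime; df -h .; tail -n 3 /var/log/syslog'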

Tuesday, July 1, 2008

vnstat - daily network statistics from the CLI

I found vnstat a few days ago, when I was researching netflow monitors for Cacti. Cacti is great for providing a visual display of almost anything that you can query through SNMP, which, given the extensibility of SNMP, can include numeric output from any script over time.

Sometimes, though, it's nice to have a CLI tool that can provide both a live and a historical view of traffic on an interface. It's an added benefit if you don't _have_ to be root, and more importantly don't need to sniff the network interface (which is usually quite CPU/memory intensive). vnstat fills this requirement very nicely, and it is a (k)ubuntu package, so just apt-get install it.

After you install it, you need to run
vnstat -u -i eth0 (or eth1, or whatever interface you want to monitor)

It's possible to monitor multiple interfaces.

Then wait a while for it to gather some data (it reads /proc, btw), and then you can have it report by hour (-h), by day (-d), by month (-m), or top 10 (-t). For example, here's the hourly view:

Example: vnstat -h -i ath0

 ath0                                                                    22:56
  ^                                                                      r
  |                                                                      r
  |                                                                      r
  |                                                                      r
  |                                                                      r
  |                                                                      r
  |                                                                      r
  |                                                                      rt
  |                                                                      rt
  |                                                                      rt
 -+--------------------------------------------------------------------------->
  |  23 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22

  h  rx (kB)   tx (kB)     h  rx (kB)   tx (kB)     h  rx (kB)   tx (kB)
 23        0         0    07        0         0    15        0         0
 00        0         0    08        0         0    16        0         0
 01        0         0    09        0         0    17        0         0
 02        0         0    10        0         0    18        0         0
 03        0         0    11        0         0    19        0         0
 04        0         0    12        0         0    20        0         0
 05        0         0    13        0         0    21        0         0
 06        0         0    14        0         0    22        3         1
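
The database only stays current if "vnstat -u" keeps getting run. The package may already set up a cron job for this, but if yours didn't, a sketch like the following in /etc/cron.d/vnstat would do the trick (eth0 is just an example interface):

 */5 * * * * root vnstat -u -i eth0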