Friday, October 14, 2011

CEPH on Fedora 15

Yesterday, I read a blog post about using Ceph as a backend store for virtual machine images.  I've heard a lot about Ceph in the last year, especially after the client was merged into the mainline kernel in 2.6.34.  So I thought I'd give it a try.

Before I get into the install, I want to summarize my thoughts on Ceph.  I think it has a lot of potential, but parts of it try too hard to do everything for you.  There is always a careful balance between a program doing too much for you and making you do too much.  For example, the mkcephfs script that creates a Ceph filesystem will ssh to all the worker nodes (defined in ceph.conf) and configure the filesystem itself.  If I were in operations, this would scare me.

Also, the keyring configuration is overly complicated.  I think Ceph is designed to be secure over the WAN (authenticated, not encrypted), so maybe it's needed.  But it still seems overly complicated when you compare it to other distributed file systems (Hadoop, Lustre).

On the other hand, I really like the fully POSIX-compliant client, especially since it's in the mainline kernel.  It's too bad it went in at 2.6.34 rather than 2.6.32 (the RHEL 6 kernel).  I guess we'll have to wait a couple of years for RHEL 7 before it's in something we can use in production.

The distributed metadata and support for multiple metadata servers are also interesting aspects of the system.  Though in the version I tested, the MDS crashed a few times (the cluster noticed the failure and compensated).

On Fedora 15, the Ceph packages are already in the repos:
yum install ceph
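
To double-check what actually landed on the box (a generic sanity check; the exact version will depend on what Fedora shipped at the time):

ceph -v
rpm -qi ceph
rpm -ql ceph | grep bin/    # list the binaries the package installed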

The configuration I settled on was:
[global]
    auth supported = cephx
    keyring = /etc/ceph/keyring.admin

[mds]
    keyring = /etc/ceph/keyring.$name
[mds.i-00000072]
    host = i-00000072
[mds.i-00000073]
    host = i-00000073
[mds.i-00000074]
    host = i-00000074

[osd]
    osd data = /srv/ceph/osd$id
    osd journal = /srv/ceph/osd$id/journal
    osd journal size = 512
    osd class dir = /usr/lib64/rados-classes
    keyring = /etc/ceph/keyring.$name
[osd0]
    host = i-00000072
[osd1]
    host = i-00000073
[osd2]
    host = i-00000074

[mon]
    mon data = /srv/ceph/mon$id
[mon0]
    host = i-00000072
    mon addr = 10.148.2.147:6789
[mon1]
    host = i-00000073
    mon addr = 10.148.2.148:6789
[mon2]
    host = i-00000074
    mon addr = 10.148.2.149:6789

As you can see from the configuration file, everything is stored under /srv/ceph/.  You will need to create these directories on all of your worker nodes.
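
Since the config puts OSD data in /srv/ceph/osd$id and monitor data in /srv/ceph/mon$id, here is a sketch of creating those directories up front on each node (the copy script further down only creates the mon directories):

n=0
for host in i-00000072 i-00000073 i-00000074; do
    # data directories referenced by the [osd] and [mon] sections above
    ssh root@$host mkdir -p /srv/ceph/osd$n /srv/ceph/mon$n
    n=$((n + 1))
done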

Next, I needed to create a keyring for authentication between the client, the admin, and the data servers.  The keyring tool distributed with Ceph is called cauthtool.  Even now, it's not clear to me exactly how to use this tool, or how Ceph uses the keyring.  First you need to make a caps (capabilities) file:

osd = "allow *"
mds = "allow *"
mon = "allow *"

Here are the cauthtool commands that got it working:

cauthtool --create-keyring /etc/ceph/keyring.bin
cauthtool -c -n i-00000072 --gen-key /etc/ceph/keyring.bin 
cauthtool -n i-00000074 --caps caps /etc/ceph/keyring.bin
cauthtool -c -n i-00000073 --gen-key /etc/ceph/keyring.bin
cauthtool -n i-00000073 --caps caps /etc/ceph/keyring.bin
cauthtool -c -n i-00000074 --gen-key /etc/ceph/keyring.bin 
cauthtool -n i-00000072 --caps caps /etc/ceph/keyring.bin
cauthtool --gen-key --name=admin /etc/ceph/keyring.admin
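
To sanity-check what ended up in each keyring before handing it to mkcephfs, cauthtool can also print the entries back out (--list is what my version used; check cauthtool --help on yours):

cauthtool --list /etc/ceph/keyring.bin
cauthtool --list /etc/ceph/keyring.admin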


Following the blog post mentioned above, I used their script to create the directories and copy ceph.conf to the other hosts, and then ran mkcephfs to build the filesystem:

n=0
for host in i-00000072 i-00000073 i-00000074; do
    ssh root@$host mkdir -p /etc/ceph /srv/ceph/mon$n
    n=$(expr $n + 1)
    scp /etc/ceph/ceph.conf root@$host:/etc/ceph/ceph.conf
done

mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/keyring.bin


Then copy the keyrings:

for host in i-00000072 i-00000073 i-00000074; do
    scp /etc/ceph/keyring.admin root@$host:/etc/ceph/keyring.admin
done


Then start up the daemons on all the nodes:

service ceph start
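
Once the daemons are up, you can ask the cluster how it is doing (this uses the keyring path set in the [global] section of ceph.conf above):

ceph health
ceph -s    # overall status: monitors, mds, osds, and placement groups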

And to mount the system:
mount -t ceph 10.148.2.147:/ /mnt/ceph -o name=admin,secret=AQBlV5dO2TICABAA0/FP7m+ru6TJLZaPxFuQyg==

Where the secret is the output from the command:
 cauthtool --print-key /etc/ceph/keyring.bin 
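
As a quick smoke test that the mount really is the distributed filesystem and not something local:

df -h /mnt/ceph               # should show the aggregate size of the OSDs
echo hello > /mnt/ceph/test
cat /mnt/ceph/test            # read it back, ideally from a second client mount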

5 comments:

  1. Following your description, I tried to do the same.  At the key generation stage I got this error:
    or:/etc/ceph # cauthtool -c -n i-00000072 --gen-key /etc/ceph/keyring.bin
    error reading config file(s) -n

    Can you help me?
    OS OpenSUSE 11.3
    # ceph -v
    ceph version 0.20

    Thanks!

    Replies
    1. Sorry for the late reply, but it looks like your cauthtool is expecting a configuration file and isn't getting one.  According to the man page for cauthtool, -c is for 'create key', not the configuration file.  Maybe it changed meaning between versions?  I'm not sure why cauthtool would need a configuration file anyway.

  2. Hey, can you help me out with understanding what exactly the hierarchical cluster map means, or what its physical significance is?

    Thanks in advance.

    Replies
    1. I'm not exactly a CEPH expert, but here's my take.

      The hierarchical cluster map helps CEPH decide where to place data.  Internally, CEPH uses a bunch of hashes, but those hashes follow the rules and weights in the cluster map, and that is ultimately what determines where the data lands.

      Hope this helps.
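
      If it helps to see a concrete map, you can also dump the CRUSH map from a running cluster and decompile it into text, which shows the hierarchy (devices, hosts, racks) and the placement rules.  These commands are from the version I ran and may differ on yours:

      ceph osd getcrushmap -o /tmp/crushmap            # grab the binary map from the monitors
      crushtool -d /tmp/crushmap -o /tmp/crushmap.txt  # decompile it to readable text
      cat /tmp/crushmap.txt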

    2. I have gone through the complete CRUSH & CEPH papers, but I am unable to understand the internal details, as they are stated at an abstract level.


      Anyway, thanks for trying.
