Thursday, June 14, 2012

test-driven infrastructure changes

I'm currently working a lot on infrastructure-related tasks. As a software developer, I really appreciate TDD and all the certainty it gives me about code changes. Many changes to the infrastructure stack can be tested manually using curl. For instance, to check whether a request to a certain URL succeeds, this is all you need:

~ curl  -I "http://10.0.0.1/"
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 14 Jun 2012 10:19:33 GMT
...

You can easily add cookies, do virtual host routing and simulate SSL termination:

~ curl  -I -H"Host: awwsnap.io" -H"X-Forwarded-Proto: https" \
        -b"cookie=value" "http://10.0.0.1/"

Ok, that's old news. But how about automating those tests and spec'ing your stack?

To accomplish this, I use roundup by Blake Mizerany. It's a great testing framework that fits perfectly into UNIX environments, and its tests are written in plain bash. That's exactly what I need. Check it out if you don't know it yet!

With roundup we can quickly put the example from before into a spec:

#!/usr/local/bin/roundup

### matchers

function returns {
  cat | head -n1 | grep "$1"
}

### spec

it_should_redirect_requests_to_root() {
  curl  -I -H"Host: awwsnap.io" -H"X-Forwarded-Proto: https" \
        -b"cookie=value" "http://10.0.0.1/" | returns 302
}

Roundup detects the success or failure of a test by its exit status. In the previous example, the matcher function returns uses grep to find 302 in the first line of the response headers. If grep finds that HTTP status code, it exits with 0, otherwise with 1.
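To make the exit-status mechanics tangible outside of roundup, here is a small Ruby sketch (purely illustrative) that shells out to the same head/grep pipeline: Kernel#system returns true when the pipeline exits with 0 and false otherwise.

```ruby
# Illustration only: grep's exit status decides pass or fail,
# exactly like in a roundup matcher.
header = "HTTP/1.1 302 Found"

# grep -q exits 0 (success) when the pattern is found ...
passed = system(%(echo "#{header}" | head -n1 | grep -q 302))
# ... and non-zero (failure) when it is not.
failed = system(%(echo "#{header}" | head -n1 | grep -q 404))

puts passed  # => true
puts failed  # => false
```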

This enables me to describe the desired behaviour of a system up front and then "fix it" in a more focused way.
An additional benefit during the rollout of a new configuration is that you can run the tests repeatedly against a host to validate the changes.

To stimulate your imagination a bit more, here are some additional examples:

#!/usr/local/bin/roundup

### matchers

function returns {
  cat | head -n1 | grep "$1"
}

# check for rails-specific header
function is_rails {
  cat | grep "^X-Runtime: "
}

function redirects_to {
  res=$(cat)
  echo "$res" | returns 302 && echo "$res" | grep "Location: $1"
}

### helpers

# using tee makes debugging easier
function get {
  curl  -I -H"Host: awwsnap.io" -H"X-Forwarded-Proto: https" \
        -b"cookie=value" "$1" | tee /dev/stderr
}

### spec

it_should_serve_root() {
  get "http://10.0.0.1/" | returns 200
}

it_should_pass_to_rails_backend_for_admin() {
  get "http://10.0.0.1/admin" | is_rails
}

it_should_redirect_to_cdn_for_assets() {
  get "http://10.0.0.1/assets" | redirects_to "cdn.awwsnap.io"
}




Wednesday, November 16, 2011

Ruby Heredocs in Array Constants

While writing a spec that uses different chunks of CSV and TXT data, I was wondering about the best way to define multi-line strings in array constants.

Normally, I would use a heredoc to define a single multi-line string like this:

CSV_CHUNK = <<-CSV
10, "a", "b"
20, "c", "d"
30, "e", "e"
CSV
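The heredoc yields an ordinary String, so the chunk behaves like any other string value. A quick sketch (repeating the definition so the snippet stands alone):

```ruby
CSV_CHUNK = <<-CSV
10, "a", "b"
20, "c", "d"
30, "e", "e"
CSV

# an ordinary multi-line String, line by line
puts CSV_CHUNK.lines.length       # => 3
puts CSV_CHUNK.lines.first.chomp  # => 10, "a", "b"
puts CSV_CHUNK.class              # => String
```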

Perfect. The unattractiveness starts when adding more chunk definitions. You usually end up with CSV_CHUNK_0, CSV_CHUNK_1, CSV_CHUNK_2 and so on. That's a bit unfortunate: for example, it prevents using normal array iteration with each and friends.

So, my question was whether there is a way to simply add chunk after chunk to an array. Sure, it's possible:

chunks = []
chunks <<<<-CSV
10, "a", "b"
20, "c", "d"
30, "e", "f"
CSV
chunks <<<<-CSV
40, "a", "b"
50, "c", "d"
60, "e", "f"
CSV

This is valid Ruby syntax. Actually, it's just the << method of Array plus the heredoc syntax. (Yes, you can add a space in between :) )

But since we are altering a variable, we can't use a constant here. To use a constant, we have to do the heredoc definition inline in the Array declaration:

CHUNKS = [
  <<-CSV ,
10, "a", "b"
20, "c", "d"
30, "e", "f"
  CSV
  <<-CSV ]
40, "a", "b"
50, "c", "d"
60, "e", "f"
   CSV

Although this looks pretty scary, it's again valid Ruby syntax. Like many other languages, Ruby allows a trailing comma in front of the closing square bracket. We can use this to pretty up the construct and make it more readable:

CHUNKS = [
  <<-CSV ,
10, "a", "b"
20, "c", "d"
30, "e", "f"
  CSV
  <<-CSV ,
40, "a", "b"
50, "c", "d"
60, "e", "f"
   CSV
]
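With the chunks in a constant, each and friends work again, which was the whole point. A small sketch (repeating the constant so the snippet stands alone; the split-based parsing is just for illustration):

```ruby
CHUNKS = [
  <<-CSV ,
10, "a", "b"
20, "c", "d"
30, "e", "f"
  CSV
  <<-CSV ,
40, "a", "b"
50, "c", "d"
60, "e", "f"
  CSV
]

# normal array iteration over the chunks
CHUNKS.each_with_index do |chunk, i|
  rows = chunk.lines.map { |line| line.strip.split(/,\s*/) }
  puts "chunk #{i}: #{rows.length} rows, first id #{rows.first.first}"
end
# => chunk 0: 3 rows, first id 10
# => chunk 1: 3 rows, first id 40
```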

Thursday, October 6, 2011

Flume and Hadoop on OS X

For a kick-ass webscale big-data setup on your local Mac, you'll want to have Hadoop and Flume in place.
No seriously: this setup is especially useful if you want to route the syslog output of your nodes to HDFS in order to process it later using Map/Reduce jobs.

It took me a while to figure out how Flume and Hadoop have to be configured so that received messages get written to HDFS. This is why I decided to write a quick tutorial to get things up and running.

I assume that you have brew in place. So, the first step is as easy as:

brew install flume

But don't think that you can just install Hadoop with brew. The current formula is pinned to 0.21, which is an unstable version that, AFAIK, doesn't play together with Flume. We need version 0.20.2. I edited my Hadoop formula locally on my Mac, but this should work, too:

brew install https://raw.github.com/mxcl/homebrew/d0efd9ee94a55e243f3b10e903526274fc21d569/Library/Formula/hadoop.rb

After you have finished the Hadoop installation, we need to edit a bunch of files in order to configure Hadoop for a local single-node setup. Go to /usr/local/Cellar/hadoop/0.20.2/libexec/conf and change the following files:

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Hadoop is now configured to use /tmp/hadoop as HDFS folder. Now, we need to create and format the directory.

mkdir /tmp/hadoop
cd /tmp/hadoop
hadoop namenode -format

Hadoop will connect to localhost using ssh. To configure ssh so that it can connect from localhost to localhost without a password, we need to add your public key to your authorized keys.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

We can test this by trying to ssh into localhost without using a password:

ssh localhost

You should now be able to start Hadoop. Fire it up by typing:

start-all.sh

To check if HDFS is up and running, try to list the files of your brand-new distributed filesystem:

hadoop dfs -ls /

To make Flume talk to HDFS, we need to replace the hadoop-core.jar in Flume's lib directory with the one that shipped with Hadoop.

cd /usr/local/Cellar/flume/0.9.3-CDH3B4/libexec/lib/
mv hadoop-core-0.20.2-CDH3B4.jar hadoop-core-0.20.2-CDH3B4.jar.unused
cp /usr/local/Cellar/hadoop/0.20.2/libexec/hadoop-0.20.2-core.jar .

Now it's time to start a Flume master node:

flume master

Go to a different terminal window and start a Flume node, too:

flume node_nowatch

At this point, we should be able to open the "dashboard" of Flume in the browser at http://localhost:35871/.
In the config section we can now configure a sink that writes into HDFS. For testing purposes we can use a fake source that just reads a local file. Choose your local node from the drop-down list and enter

text("/etc/services")

as source and

collectorSink("hdfs://localhost/","testfile")

as sink.

Now, check if the file was written to HDFS:

hadoop dfs -ls /
# cat it
hadoop dfs -cat /testfilelog.00000038.20111006-160327897+0200.1317909807897847000.seq

If everything worked well, you're fine to switch the source in Flume to:

syslogUdp(5140)

Now, Flume acts like a syslog server and writes all log messages directly to HDFS.

Cheers, Arbo

Friday, December 17, 2010

AMQP: Integrating Spring and Rails with RabbitMQ

Why?

Spring web applications tend to grow bigger and more complicated over time. So it's often a good choice to move a specific responsibility out and process it in another app made for that specific job. You can build a clean interface to the new app and maybe have an extra team just working on it. Plus, you have a free choice of programming language. I like Ruby.

Another big reason is that you don't want to do all computations within the time of a single request to your Spring webapp. If something can be processed later, why not improve the response time and latency of your web app by asynchronously handing the job over to another app?

Maybe you guessed it: this can easily be done with the AMQP protocol and one or more RabbitMQ brokers. AMQP stands for Advanced Message Queuing Protocol, which is an open standard, and RabbitMQ is an excellent open-source broker that implements this protocol. With AMQP you can asynchronously send messages in a standardized protocol that is supported by a vast number of programming languages. There is even a Spring project called spring-amqp and of course a Ruby gem called amqp by tmm1. To get an idea of how RabbitMQ works, I encourage you to check the project's homepage.

Getting started

First, install RabbitMQ. Be sure to install a RabbitMQ version greater than 2, otherwise you get "Protocol Mismatch" errors.

If you're on a Mac and have brew installed, it's as easy as this:
$ brew install rabbitmq

Next install the amqp gem and start rabbitmq. You can use the following scripts to verify that sending and receiving messages over AMQP works:

# consumer.rb
require "rubygems"
require "mq"

AMQP.start do
  queue = MQ.queue('hello.world.queue')
  
  queue.subscribe do |word|
    puts word
  end
  
end

# producer.rb
require "rubygems"
require "mq"

AMQP.start do
  queue = MQ.queue('hello.world.queue')
  
  i = 0
  EM::add_periodic_timer(1) do
    queue.publish "hello world #{i+=1}"
  end
  
end

The Consumer simply subscribes to the hello.world.queue. If a message arrives, the subscribe block gets called and prints the message to the console. The Producer just pushes a message to the queue every second. Since AMQP already uses EventMachine, you can use all features of EventMachine like add_periodic_timer and so on.
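To make the subscribe-callback semantics concrete without a running broker, here is an in-memory stand-in in plain Ruby. This is an illustration only, not the amqp gem's API; ToyQueue is an invented class that mimics the flow of the two scripts above.

```ruby
# Illustration only -- NOT the amqp gem's API, just an in-memory
# stand-in that mimics the subscribe/publish flow.
class ToyQueue
  def initialize
    @subscribers = []
  end

  # register a block that is called for every published message
  def subscribe(&block)
    @subscribers << block
  end

  # deliver a message to all registered subscribers
  def publish(message)
    @subscribers.each { |subscriber| subscriber.call(message) }
  end
end

queue    = ToyQueue.new
received = []

queue.subscribe { |word| received << word }
3.times { |i| queue.publish("hello world #{i + 1}") }

puts received.inspect  # => ["hello world 1", "hello world 2", "hello world 3"]
```

The real amqp gem delivers messages over the network and inside the EventMachine reactor; the callback shape, however, is the same.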

If everything works fine, you should see the Consumer receiving "hello world"-messages.

Now we build a new Rails 3 app and configure it to process messages.

$ rails new consumer_app

You need to put the dependencies for tmm1's amqp gem and the thin server in your Gemfile:
gem 'amqp', :require => 'mq'
gem 'thin'

Why do we use thin as the webserver? Simply because thin provides a running EventMachine reactor. Without this reactor, EventMachine would block the current thread and no HTTP request would be processed. If you'd like to use a server other than thin, you need to start the EventMachine reactor manually. To do this, just put

Thread.new { EM.run }

in an initializer.

Now, create a model called example_message.rb with the following content:

# app/models/example_message.rb
class ExampleMessage

  def self.listen
    AMQP.start do  
      MQ.queue('hello.world.queue').subscribe do |msg|
        puts msg
      end
    end
  end

end

The listen method can be triggered by a Rails initializer. But the AMQP block would block the boot process of Rails if we called ExampleMessage.listen directly. We can get around this by using EventMachine to call it later, in the next available time slot.

# config/initializers/example_message.rb
EM::next_tick do
  ExampleMessage.listen
end

Notice: my colleague Tobias Sunderdiek found out that this only works for Ruby versions greater than 1.8.7 with a patchlevel around 320. It will not work with the current Ruby Enterprise Edition, which has a patchlevel of 253.

Now, if you start your Rails app and the Producer script, you should see your app receiving messages.

Let's try to switch from the simple Producer script in Ruby to a simple Producer built in Java that uses the same queue. Luckily, the Spring-AMQP project comes with an example called helloworld that does exactly this. You can fetch the Spring-AMQP sources using git as shown here:

$ git clone https://github.com/SpringSource/spring-amqp.git

To compile and run the helloworld example navigate to samples/helloworld and type:

$ mvn compile
$ mvn exec:java -Dexec.mainClass="\
  org.springframework.amqp.helloworld.async.Producer"

Those Spring-AMQP examples are a great starting point for finding out how Spring-AMQP works. From there, it is an easy step to build a service that does the message sending for a specific use case. Here is a simple service that uses the RabbitTemplate to send messages:

@Service
public class HelloWorldMQService {

    @Autowired
    private RabbitTemplate rabbitTemplate;

    public void sendMessage(int i) {
        rabbitTemplate.convertAndSend("hello world "+i);
    }
}

The RabbitTemplate must be configured before it can be used. To do this, you need an AbstractRabbitConfiguration as shown below:

@Configuration
public class HelloWorldConfiguration 
        extends AbstractRabbitConfiguration {

    protected final String helloWorldQueueName = "hello.world.queue";

    @Override
    public RabbitTemplate rabbitTemplate() {
        RabbitTemplate template = 
            new RabbitTemplate(connectionFactory());
        template.setRoutingKey(this.helloWorldQueueName);
  
        return template;
    }

    @Bean
    public ConnectionFactory connectionFactory() {
        SingleConnectionFactory connectionFactory = 
            new SingleConnectionFactory("localhost");
        connectionFactory.setUsername("guest");
        connectionFactory.setPassword("guest");
        return connectionFactory;
    }
}

Now you can integrate this AMQP service with your Spring webapp.

If you like to dig deeper in AMQP, Ruby and Spring I encourage you to check out this article and this presentation on infoq.com.

Cheers,
Arbo

Monday, November 29, 2010

Testing SOAP Webservices with RSpec

SOAP webservices are widely used in enterprise environments. Although they feel a bit clumsy in comparison to slim REST services, sometimes you have to deal with them.

The great thing is that, to test such a service, you are often free to use any tool you like. I like RSpec!

To query a web service you just need a few lines of code. I recommend Savon as a SOAP client. It is used as shown here:

require 'rubygems'
require 'savon'

WSDL_URL  = 'http://www.webservicex.net/geoipservice.asmx?wsdl'

client = Savon::Client.new WSDL_URL
response = client.get_geo_ip do |soap|
  soap.body = { "wsdl:IPAddress" => "209.85.149.106" }
end
puts response

The response object can be converted to a hash with the to_hash method, so you can fetch all values just as you would with any other hash.
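For illustration, here is a made-up sample of what response.to_hash could look like for this service; the values are invented, while the key structure mirrors the spec that follows.

```ruby
# Made-up sample hash mimicking response.to_hash for the GeoIP service.
# The values are invented for illustration.
response_hash = {
  :get_geo_ip_response => {
    :get_geo_ip_result => {
      :country_name => "United States",
      :return_code  => "1"
    }
  }
}

# plain nested hash access, nothing SOAP-specific left
result = response_hash[:get_geo_ip_response][:get_geo_ip_result]
puts result[:country_name]  # => United States
puts result[:return_code]   # => 1
```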

Now, the rest should be easy and is just a normal RSpec test:

require 'rubygems'
require 'savon'

WSDL_URL  = 'http://www.webservicex.net/geoipservice.asmx?wsdl'

RETURN_CODE_OK    = "1"
RETURN_CODE_ERROR = "0"

describe "Geo IP Webservice at #{WSDL_URL}" do
  
  # helper method
  def get_geo_ip_result ip
    response = @client.get_geo_ip do |soap|
      soap.body = {"wsdl:IPAddress" => ip}
    end
    response.to_hash[:get_geo_ip_response][:get_geo_ip_result]
  end
  
  before :all do
    @client = Savon::Client.new WSDL_URL
  end

  it "should yield a country name" do
    result = get_geo_ip_result "209.85.149.106"
    result[:country_name].should_not be_nil
    result[:return_code].should eql(RETURN_CODE_OK)
  end
 
  it "should return error for malformed ip address" do
    result = get_geo_ip_result "not.an.ip.address"
    result[:return_code].should eql(RETURN_CODE_ERROR)
  end
 
  it "should fail if no ip address is submitted" do
    lambda { @client.get_geo_ip }.should raise_error
  end

  # ...

end

Happy testing!

EDIT:

@dbloete pointed out to me that with RSpec 2 you can express expected errors even more readably:
it "should fail if no ip address is submitted" do
  expect { @client.get_geo_ip }.to raise_error
end

Wednesday, August 18, 2010

CouchDB: Using List Functions to sort Map/Reduce-Results by Value

I just found out that it is possible to sort the result of Map/Reduce with a list function.

Let's take the simple example that you want to count all documents grouped by a field called type. The following map function emits the values of the type fields of all documents:

function(doc) {
  emit(doc.type, 1);
}

To sum up the documents with the same value in the type field, we just need this well-known reduce function:

function(key, values) {
  return sum(values);
}

By default, CouchDB yields the result ordered by the keys. But if you want to order the result by the number of occurrences of each document type, you either have to sort it in your app or you use a list function like this:
function(head, req) { 
  var row
  var rows=[]
  while(row = getRow()) { 
    rows.push(row)  
  } 
  rows.sort(function(a,b) { 
    return b.value-a.value
  }) 
  send(JSON.stringify({"rows" : rows}))
}
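The comparator in that list function sorts descending by value. In Ruby terms it is equivalent to the following (the sample rows are invented for illustration):

```ruby
# Sample rows shaped like a grouped view result; the numbers are invented.
rows = [
  { "key" => "post",    "value" => 2 },
  { "key" => "comment", "value" => 7 },
  { "key" => "author",  "value" => 4 }
]

# b.value - a.value in the list function == descending sort by value
sorted = rows.sort_by { |row| -row["value"] }

puts sorted.map { |row| row["key"] }.inspect  # => ["comment", "author", "post"]
```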
If you save the list function as sort and the Map/Reduce functions as count together in a design document, you can fetch your sorted result like this:

curl http://.../design-doc/_list/sort/count?group=true

Of course there are other options for sorting a view result. I didn't find much documentation on this topic, but this thread at Stack Overflow is very informative.

Back to the couch - Cheers!

Thursday, August 5, 2010

jQuery meets CoffeeScript

Disclaimer: This article is pretty old.


I'm just amazed at how brilliantly jQuery and CoffeeScript work together. jQuery promises "write less, do more", but with the clean syntax of CoffeeScript you can be even more concise!

Here is a quick example:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <title>jQuery meets CoffeeScript</title>
    <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
    <script src="http://jashkenas.github.com/coffee-script/extras/coffee-script.js"></script>
  
    <script type="text/coffeescript">

      show_message = (msg) -> 
        $('#message').hide().text(msg).fadeIn(2222, 
          -> $('#message').append('!') 
        )
              
      $ -> show_message "world"
      $('#message').click -> show_message "you"
      
    </script>

</head>
<body>
  <h1>hello <span id="message"></span></h1>
</body>
</html>

Just have a look at the JavaScript version:

var show_message = function(msg) {
  return $('#message').hide().text(msg).fadeIn(2222, function() {
    $('#message').append('!');
  });
};
$(function() {
  show_message("world");
});
$('#message').click(function() {
  show_message("you");
});

Happy coding!