Author Archive
How to troubleshoot Problems in Server Setups, Rails Apps or any other Config or Code Problem
This post might be interesting for all people who are faced with strange problems like this: “Yesterday it worked. Now it’s broken” or “It works on my machine (and it does not in production)”.
I’m sure that all programmers and sysadmins have had an incident like this in their lives. I’ve had a lot of these problems and found that in the end there is always an explanation. Very rarely is it some quantum mechanics effect. In most cases there is a really simple explanation, even if it was hard to find. Examples include “the cause of a crashing Perl script is the Java version of the app starting the script” and “failing tests that were caused by a minor version difference of a testing library, where the error message led to something completely different”.
The process described below was used in all cases to find the root cause of the problem which then was solved very easily. We weren’t aware that we used this process but rather did it intuitively. After talking about the process and writing it down, we were able to find more and more problems by following these steps and also to transfer the knowledge to other people so that they will build up the intuition to find root causes of problems as well.
General idea
Systems that work and systems that don’t work differ.
If you make the not working system equal to the working system, it will work.
That’s all there is to Troubleshooting (basically).
Process to find out the difference
The hard part is to find out where the working and not working systems differ.
The general process is really simple though:
1. List all items that can differ
2. Check if they differ
3. Make them equal (one at a time!)
4. Repeat 3 until finished. If still broken, think harder about 1 and start again
The optimized version of this is to start with the things that are most likely to cause the problem.
The term “most likely” is based on
- your own experience
- information found on the web: blog posts, Google searches, etc.
- experiences of your co-workers
It is fundamental that you make each step consciously (writing the step/change down helps with that). If one step doesn’t yield the desired outcome: revert it immediately. Again: having written it down helps you not to forget anything. Forgetting steps may make the situation even worse.
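The steps above can be sketched in a few lines of Ruby. This is only an illustration, assuming each system has been described as a hash of facts; the keys, the values and the LIKELY_FIRST ranking are made up for the example and are not taken from a real incident.

```ruby
# Hypothetical snapshots of a working and a broken system.
working = {
  "ruby_version"  => "1.8.6",
  "mysql_version" => "5.0.45",
  "disk_free_mb"  => 10_240,
  "timezone"      => "UTC"
}
broken = {
  "ruby_version"  => "1.8.6",
  "mysql_version" => "5.0.51",
  "disk_free_mb"  => 0,
  "timezone"      => "UTC"
}

# Assumed ranking: items that experience says cause trouble most often come first.
LIKELY_FIRST = ["disk_free_mb", "mysql_version", "ruby_version", "timezone"]

# Steps 1 and 2: list every item and keep the ones that differ.
def differences(working, broken)
  (working.keys | broken.keys).reject { |key| working[key] == broken[key] }
end

# The optimized version: most likely causes first, unranked items last.
def ordered_candidates(working, broken, ranking)
  differences(working, broken).sort_by { |key| ranking.index(key) || ranking.size }
end

candidates = ordered_candidates(working, broken, LIKELY_FIRST)
candidates.each do |key|
  puts "#{key}: working=#{working[key].inspect} broken=#{broken[key].inspect}"
end
```

Each entry in candidates is then one change to make (and write down) at a time; the sketch deliberately does not try to apply or revert anything itself.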
How can systems differ?
The main questions are:
- What changed since it worked (if it is the same system)?
- What is different or changed on the not working system compared to the working system (if it is a different system)?
The latter is a lot easier because you have something to compare to. In the former case you have to create the “working system” again, which in itself may be the solution to the problem.
If the answer is “nothing”, think again! Time has progressed, so at the very least the time has changed.
Possible effects of changed time
- File system full
- weird time-dependent behavior of applications
- system/application restart occurred
- data changes happened
Other things that may have changed:
- software versions through package updates – Minor Changes are important!
- OS Kernel
- OS packages
- application libraries (ruby gems, jars)
- Database schemas
- Database content
- Filesystem content of any kind (That includes timestamps of a file that is only read!)
- Location of files
- symlink vs. real files
- timestamps
- Hardware
- Increased load
- Network I/O
- Disk I/O
- CPU
- Exceeded RAM -> Swapping
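For the filesystem items in this list (location of files, symlink vs. real files, timestamps), a comparison can be scripted. The following is a minimal sketch, assuming both trees are reachable from the same machine; the helper names file_facts and differing_paths and the demo file names are invented for the example.

```ruby
require "find"
require "tmpdir"
require "fileutils"

# Collect type, size and mtime for every entry below root, keyed by relative path.
def file_facts(root)
  facts = {}
  Find.find(root) do |path|
    next if path == root
    stat = File.lstat(path) # lstat looks at the symlink itself, not its target
    facts[path.delete_prefix("#{root}/")] = {
      type:  stat.ftype,                      # "file", "directory", "link", ...
      size:  stat.file? ? stat.size : nil,    # size is only compared for real files
      mtime: stat.mtime
    }
  end
  facts
end

# Relative paths that exist on one side only or whose facts differ.
def differing_paths(root_a, root_b, ignore: [])
  a = file_facts(root_a)
  b = file_facts(root_b)
  (a.keys | b.keys).sort.select do |path|
    fa, fb = a[path], b[path]
    fa.nil? || fb.nil? ||
      fa.reject { |k, _| ignore.include?(k) } != fb.reject { |k, _| ignore.include?(k) }
  end
end

# Demo with two throwaway trees standing in for the two systems.
changed = Dir.mktmpdir do |dir|
  working = File.join(dir, "working")
  broken  = File.join(dir, "broken")
  FileUtils.mkdir_p([working, broken])

  File.write(File.join(working, "same.txt"), "a")
  File.write(File.join(broken,  "same.txt"), "a")
  File.write(File.join(working, "config.yml"), "port: 80\n")
  File.symlink("same.txt", File.join(broken, "config.yml")) # symlink vs. real file
  File.write(File.join(broken, "extra.log"), "only here\n")

  differing_paths(working, broken, ignore: [:mtime])
end
puts changed.inspect
```

Timestamps are ignored in the demo only because the two trees are created moments apart; when hunting a real problem, leaving :mtime in the comparison is exactly the kind of “can’t possibly matter” check that sometimes pays off.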
Some things will be straightforward and it is obvious why something breaks something else. Some things are not as obvious (at least not at the time when you try to find them; it’s always obvious afterwards!). Don’t jump to conclusions about cause and effect while you debug. If you think “I won’t try X because X has nothing to do with Y”, try X! Maybe it has something to do with Y. You don’t know before you try. Revert (or bring into the same state as on the working system) even the “most obvious” things that “can’t possibly interfere with the problem”. That includes
- Comments in Source or Configuration files
- Whitespaces
- Trivial Code/Configuration changes
- minor version changes in Packages
The “most likely” rule applies here, too. Don’t start with whitespace if there are other, less subtle differences left. Don’t look at access timestamps if the files on one system are in completely different locations than on the other system. This requires some experience, but with time you’ll learn which things to look for first.
Tools
- Filesystem-Analysis: df, ls, find
- Application-Behavior: strace (dtruss on Solaris and Mac OS), lsof, netstat
- Databases: For mysql: mysql, innotop
- Packages: Debian: apt-get, dpkg
- Finding differences/problem causes in running vs. not running code: Binary Search (e.g. via git bisect, debugger or just plain “print”-Debugging).
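The binary search idea from the last item can be sketched as follows. It is an illustration under stated assumptions: the list of changes is invented, and broken_with stands in for whatever check you really run (the test suite, a request against the app). Binary search only gives a meaningful answer if the system stays broken once the bad change is in.

```ruby
# Hypothetical, ordered list of changes between the working and the broken state.
changes = [
  "update rake",
  "bump mysql gem",
  "edit database.yml",
  "upgrade kernel",
  "tweak logrotate"
]

# Stand-in for the real check. Assumption for this example:
# the system is broken as soon as "edit database.yml" has been applied.
checks = 0
broken_with = lambda do |applied|
  checks += 1
  applied.include?("edit database.yml")
end

# Range#bsearch in find-minimum mode returns the smallest index for which
# the block is true, i.e. the first change that breaks the system.
first_bad = (0...changes.size).bsearch { |i| broken_with.call(changes[0..i]) }

puts "first bad change: #{changes[first_bad]} (#{checks} checks instead of #{changes.size})"
```

git bisect does the same thing over commits; the sketch only shows why the number of checks grows with the logarithm of the number of changes rather than with the number itself.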
We hope these thoughts help you to debug and troubleshoot strange problems. Feel free to post additions, comments, tools or experiences with troubleshooting.
Testing PDF generation
Generating PDFs in a Rails application is a fairly common task. Maybe you want to create a letter, a report, a document or an invoice. Either way, the stuff that normally ends up in a PDF is important and you want to make sure the right stuff ends up there.
This pretty much sounds like a case for automated testing. But how do you test PDF content? One option would be to generate the PDF, convert it to HTML using pdftohtml, parse the HTML and make some assertions. As you can guess, this approach isn’t very feasible, because the generated HTML isn’t easy to parse.
Most of the time PDF generation in Rails applications is done using the RTex Plugin – the PDF is generated via LaTeX. This makes testing a lot easier because you can just parse and check the generated LaTeX-Source.
Everyone who has seen a LaTeX source file may ask: How the hell do I parse that?
In our case we added some “helper” comments like “% SUM BEGIN” and “% SUM END” before and after the part we were interested in and then used a basic regex to cut out the interesting part. You have to check manually that the markup still looks as expected, because of LaTeX’s newline handling (one newline is fine, two start a new paragraph). Most of the time it is sufficient to look at the ERB tags and use <%- instead of <%.
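A minimal sketch of the marker approach, assuming the generated LaTeX source is available as a string. The invoice snippet and the helper name marked_section are invented for the example; only the “% SUM BEGIN” / “% SUM END” markers come from the description above.

```ruby
# Hypothetical LaTeX output of an invoice template, including the helper comments.
latex = <<~'TEX'
  \begin{tabular}{lr}
  Consulting & 100.00 \\
  % SUM BEGIN
  \textbf{Total} & 119.00 \\
  % SUM END
  \end{tabular}
TEX

# Return the text between "% NAME BEGIN" and "% NAME END", or nil if the
# markers are missing. In Ruby, ^ and $ match at line boundaries; the /m flag
# lets "." match newlines so the section may span several lines.
def marked_section(source, name)
  marker = Regexp.escape(name)
  match = source.match(/^% #{marker} BEGIN\n(.*?)^% #{marker} END$/m)
  match && match[1]
end

sum_section = marked_section(latex, "SUM")
puts sum_section
```

In a test you would then assert on sum_section only (for example that it contains the expected total), which keeps the assertion independent of the rest of the template.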
This approach works pretty well for us. One question you should always keep in mind when you write tests is: What do I test at this level of testing, and what do I leave out?
For the PDF/LaTeX test case we chose to test the basic interaction between the template and the objects that provide values for the PDF generation. We don’t test all combinations, just a few basic cases. Testing all or at least a lot of combinations, edge cases etc. is clearly a concern of unit tests.
Setting your custom deploy strategy in capistrano
Capistrano 2 supports custom deploy strategies. You basically just have to implement the “deploy!” and “check!” methods in your class and you’re good to go.
But how do you tell Capistrano to use your strategy?
I looked into the code and found no way of setting your strategy. So I changed Capistrano so that it’s possible to set the strategy. When I asked Jamis Buck to pull the change, he suggested that I just set the strategy directly. This approach wouldn’t need any changes to the code base.
So we went ahead and did just that. Here’s the code for setting your custom deploy strategy (don’t forget to require the file with your code):
set :strategy, Capistrano::Deploy::Strategy::DifferentAppRootRemoteCache.new(self)
It works like a charm.
imedo at Scotland on Rails
We’re proud to announce that our talk proposal The big bang – what to do if your Rails codebase grows too big? was accepted by the Scotland on Rails organizers.
From what we’ve heard, Scotland on Rails is a nice conference with and by interesting people. Maybe we’ll see you there! Register now!

Memory leak links
Here are some useful links if you have to find memory leaks in Rails apps:
- Memory leak profiling
- Ruby Memory leaks
- Tracking down a rails memory leak
- Ruby prof
- Heap fragmentation in a long running ruby process
- Ruby live process introspection
- Inspecting a live ruby process
Using rack to hunt memory leaks
To use Rails with rack support you can either use Ezra’s Rails fork (or edge Rails) or, as a simpler way to get started, just use the rack adapter included in “thin” or “fuzed” (which is basically the same thing, or each of them is based on the other one… or something. Quite self-referential).
To start a Rails app with rack just fire up a plain irb and type this:
require 'rack'
require "~/tmp/fuzed/rlibs/rails_adapter.rb"
#=> true
app = Rack::Adapter::Rails.new(:root => "/Users/hvolkmer/imedo/code/imedo")
Now create a request and let your rails app handle it:
req = {"METHOD" => "GET",
"HTTP_VERSION" => [1, 1],
"PATH_INFO" => "/",
"QUERY_STRING" => "",
"SERVERNAME" => "testing:8002",
"HEADERS" => {"connection" => "keep-alive",
"accept" => "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
"host" => "localhost:8002",
"referer" => "http://localhost:8002/main/ready",
"user_agent" => "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3",
"keep_alive" => "300",
"content_length" => "7",
"content_type" => "application/x-www-form-urlencoded",
"Cache-Control" => "max-age=0",
"Accept-Charset" => "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
"Accept-Encoding" => "gzip,deflate",
"Accept-Language" => "en-us,en;q=0.5"},
"HTTP_COOKIE" => "_some_session_id=d3eae987aab3230377abc433b7a8d7c1",
"postdata" => "val=foo"}
response = app.call(req)
Now why is this useful?
Well, you can test your Rails app without a web server and still use the full, real application stack (unlike in the test environment). We currently use this to get some information about the memory usage of certain parts of the application. With rack there is very little overhead (rack itself) compared to, say, mongrel or thin. Mongrel and thin have their own behaviour under high load because of their request queues. You can easily script a load test using rack with just some ruby scripts. No need for httperf or siege. These tools are really useful for testing real-life loads, but when you just want to see how a certain part of your app behaves when you throw x requests at it, rack can do this pretty well.
And it’s always a good idea to look at things in isolation. Rack does a pretty good job of that. Also, who needs a web browser when you can look at your Rails app in irb?
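Such a scripted load test could look roughly like this. It is a sketch under assumptions: a lambda stands in for the Rails adapter (anything that responds to call(env) works the same way), the helper name hammer is made up, and the object counts rely on ObjectSpace.count_objects as provided by MRI.

```ruby
require "benchmark"

# Stand-in for the Rails adapter. Assumption: the app answers with a
# Rack-style [status, headers, body] triple.
app = lambda do |env|
  [200, { "Content-Type" => "text/plain" }, ["Hello from #{env["PATH_INFO"]}"]]
end

# Number of live objects according to MRI's object space statistics.
def live_objects
  counts = ObjectSpace.count_objects
  counts[:TOTAL] - counts[:FREE]
end

# Fire `count` identical requests at the app and report the elapsed time,
# the status codes seen and how much the number of live objects grew.
def hammer(app, env, count)
  GC.start
  before = live_objects
  statuses = Hash.new(0)
  seconds = Benchmark.realtime do
    count.times do
      status, _headers, _body = app.call(env.dup)
      statuses[status] += 1
    end
  end
  GC.start
  { seconds: seconds, statuses: statuses, object_growth: live_objects - before }
end

env = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/" }
report = hammer(app, env, 1_000)
puts report.inspect
```

Running hammer several times against the same path and watching whether object_growth keeps climbing is a crude but cheap first hint at a leak; it does not replace a real memory profiler.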
Using Rails engines with JRuby and glassfish
If you try to deploy a Rails app that uses Rails Engines with JRuby in a Java application server, you end up with errors on the first request. Basically JRuby complains that it cannot create directories. This happens because engines tries to copy assets from the plugins into the public dir of the Rails app.
In an application server the rails directory structure is “turned inside out” as Nick Sieger calls it. So the public directory is actually inside the base dir of the deployed app and the class files are in the WEB-INF dir.
You can easily fix that if you put the following code right after your engines require in your environment.rb (or inside an initializer file):
if Object.const_defined?("PUBLIC_ROOT")
Engines.public_directory = PUBLIC_ROOT
end
PUBLIC_ROOT is defined inside JRuby (or warble) and set to the correct path by warble.
I found out about this behaviour in the Rails tutorial session Take the Jruby Challenge: Deploy Your Rails Application With JRuby and Taste the Difference, with the helping hand of Nick Sieger.
CruiseControl.rb and changing SVN paths
When you move some code around in your SVN repository or change the repository URL, CC.rb will start to complain because it isn’t able to find your code where it used to be. The message will be something like this:
svn: Cannot replace a directory from within
To fix this you can use
svn switch file:///this/is/the/path/to/my/new/svn/repo
But CC.rb will still complain and won’t build your project. In order to fix this, you have to commit once and CC.rb will pick up the changes and start building your project again.
We had this exact problem recently and it took some time to figure out how to solve it. So maybe this post will save someone some time.
It’s always the gardener
The other day while coding:
Hendrik: I’ve got a problem with XYZ… Any idea what could be causing it?
Thomas: Read the source code! That’s what I do, too. I’m reading ActiveRecord::Base right now.
Hendrik: And? Is it exciting?
Thomas: Yes. I think the gardener did it.
Overriding rake tasks and db:test:prepare strangeness
Some time ago our CC.rb build started to fail with errors like this:
Mysql::Error: Can't create table './cc_test/#sql-8e7_5bab.frm' (errno: 150): ALTER TABLE questions ADD CONSTRAINT questions_ibfk_1 FOREIGN KEY (user_id) REFERENCES users (id) ON DELETE CASCADE
We’re using the foreign key migrations plugin, which automatically generates foreign keys for MySQL. It works fine, but somehow the db:test:prepare rake task started to fail, every time at a different key and only on the build server.
After hours of hunting down the problem I finally gave up and did what you can always do if you can’t solve a problem: cheat. So I just created a new db:test:prepare task which calls the mysql commands and basically does the same as the rake task. It’s not as portable as the default one, of course, but it has one property that the default one had lost: It works.
Rake doesn’t allow redefining tasks by default. So to override a rake task you have to delete the task and then define the new one. I opted for removing the task manually and then redefining it.
Here’s my task in case anyone experiences similar problems:
# Teach Rake to forget a task so that it can be defined again from scratch.
Rake::TaskManager.class_eval do
  def remove_task(task_name)
    @tasks.delete(task_name.to_s)
  end
end

def remove_task(task_name)
  Rake.application.remove_task(task_name)
end

namespace :db do
  namespace :test do
    # Drop the default task before defining the replacement below.
    remove_task :"db:test:prepare"

    desc 'prepares the db - mysql style'
    task :prepare do
      require 'yaml'
      config = YAML::load(File.read('config/database.yml'))
      devdb = config['development']['database']
      devpass = config['development']['password']
      devuser = config['development']['username']
      testdb = config['test']['database']
      testpass = config['test']['password']
      testuser = config['test']['username']

      puts "dumping development schema"
      puts %x{mysqldump -u #{devuser} --password=#{devpass} -d #{devdb} > dev.sql}
      puts "dropping test db"
      puts %x{mysqladmin -u #{testuser} --password=#{testpass} -f drop #{testdb}}
      puts "recreating testdb"
      puts %x{mysqladmin -u #{testuser} --password=#{testpass} create #{testdb}}
      puts "loading development schema into test db"
      puts %x{mysql -u #{testuser} --password=#{testpass} #{testdb} < dev.sql}
    end
  end
end
We call rake using “rake -I /path/to/override.rb” in our build scripts and it works fine now.
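As an alternative to patching Rake::TaskManager, Rake versions that ship Rake::Task#clear let you empty an existing task before defining it again. The following is a small sketch of that technique, not the approach used in the post; the task bodies only append to a log array so the effect is visible without a database.

```ruby
require "rake"

log = []

# The task as a framework or plugin might have defined it.
Rake::Task.define_task("db:test:prepare") { log << :original }

# Task#clear drops the task's actions and prerequisites but keeps the task
# itself, so the next definition replaces the behaviour instead of adding to it.
Rake::Task["db:test:prepare"].clear
Rake::Task.define_task("db:test:prepare") { log << :override }

Rake::Task["db:test:prepare"].invoke
puts log.inspect
```

Without the clear call both blocks would run, because defining a task twice in Rake appends actions rather than replacing them.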
