Archive for the ‘Deployment’ Category
How to troubleshoot problems in server setups, Rails apps, or any other config or code problem
This post might be interesting for anyone faced with strange problems like: “Yesterday it worked. Now it’s broken.” or “It works on my machine (but not in production).”
I’m sure that every programmer and sysadmin has had an incident like this. I’ve had a lot of these problems and found out that in the end there’s always an explanation. Very rarely is it some quantum mechanics effect that caused the problem. In most cases there is a really simple explanation, even if it was hard to find. Examples include “the cause of a crashing Perl script was the Java version of the app that started the script” and “failing tests were caused by a minor version difference in a testing library, while the error message pointed to something completely different”.
The process described below was used in all of these cases to find the root cause of the problem, which then was easy to solve. We weren’t aware that we were following a process; we did it intuitively. After talking about the process and writing it down, we were able to solve more and more problems by following these steps. We were also able to pass the knowledge on to other people, so that they can build up the intuition to find root causes as well.
General idea
Systems that work and systems that don’t work differ.
If you make the non-working system equal to the working system, it will work.
That’s all there is to troubleshooting (basically).
Process to find out the difference
The hard part is to find out where the working and not working systems differ.
The general process is really simple though:
1. List all items that can differ
2. Check whether they differ
3. Make them equal (one at a time!)
4. Repeat step 3 until finished. If it’s still broken, think harder about step 1 and start again
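The first two steps, listing what can differ and checking whether it does, boil down to comparing two snapshots. Here is a minimal Ruby sketch. It assumes you have already collected the relevant items from both systems into hashes; how you gather them (e.g. from `gem list`, `uname -r` or `ls -l`) is up to you. The method name `differences` and all sample values are made up for illustration.

```ruby
# Lists every item whose value differs between a snapshot of the working
# system and one of the broken system. Items present on only one side show
# up with :missing on the other side.
def differences(working, broken)
  (working.keys | broken.keys).sort.each_with_object({}) do |key, diff|
    w = working.fetch(key, :missing)
    b = broken.fetch(key, :missing)
    diff[key] = [w, b] unless w == b
  end
end

# Hypothetical snapshots, for illustration only.
working = { 'kernel' => '2.6.18', 'gem:rake' => '0.8.1', 'tmp_dir' => 'real dir' }
broken  = { 'kernel' => '2.6.18', 'gem:rake' => '0.8.3', 'tmp_dir' => 'symlink',
            'swap_in_use' => 'yes' }

differences(working, broken).each do |item, (w, b)|
  puts "#{item}: working=#{w.inspect} broken=#{b.inspect}"
end
```

Whatever shows up in this list is a candidate for the next step. An empty list means the snapshots are missing something, so go back and think harder about what else can differ.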
The optimized version of this is to start with the things that are most likely to cause the problem.
The term “most likely” is based on
- your own experience
- information found on the web: blog posts, Google searches, etc.
- experiences of your co-workers
It is fundamental that you make each step consciously (writing each step or change down helps with that). If a step doesn’t yield the desired outcome, revert it immediately. Again: having written it down helps you not to forget anything. Forgetting steps may make the situation even worse.
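The “one change at a time, write it down, revert immediately” discipline can be sketched in Ruby. The sketch makes two assumptions: each candidate change can be expressed as an apply/revert pair, and some check tells you whether the system works again. The names `try_one_at_a_time`, the candidate hashes and the `state` hash are hypothetical. They stand in for real changes such as editing a config file or pinning a gem version.

```ruby
# Tries candidate changes in the given order (most likely first). Every step
# is written to `log`; a change that doesn't fix the problem is reverted
# immediately before the next one is tried. Returns the name of the change
# that fixed the problem, or nil if none did.
def try_one_at_a_time(candidates, fixed, log)
  candidates.each do |candidate|
    log << "apply #{candidate[:name]}"
    candidate[:apply].call
    if fixed.call
      log << "fixed by #{candidate[:name]}"
      return candidate[:name]
    end
    candidate[:revert].call
    log << "revert #{candidate[:name]}"
  end
  nil
end

# Hypothetical broken system with two differences from the working one.
state = { tmp_dir: 'symlink', rake: '0.8.3' }
candidates = [
  { name: 'make tmp a real directory',
    apply: -> { state[:tmp_dir] = 'real dir' }, revert: -> { state[:tmp_dir] = 'symlink' } },
  { name: 'pin rake to 0.8.1',
    apply: -> { state[:rake] = '0.8.1' }, revert: -> { state[:rake] = '0.8.3' } }
]
fixed = -> { state[:rake] == '0.8.1' }

log = []
result = try_one_at_a_time(candidates, fixed, log)
puts log
puts "result: #{result.inspect}"
```

Because the first change is reverted before the second one is applied, the log ends up naming exactly one change as the fix, and the system is left with only that change applied.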
How can systems differ?
The main questions are:
- What changed since it worked (if it is the same system)?
- What is different or changed on the not working system compared to the working system (if it is a different system)?
The latter is a lot easier because you have something to compare to. In the former case you have to recreate the “working system”, which in itself may be the solution to the problem.
If the answer is “nothing”, think again! Time has passed, so at the very least the time has changed.
Possible effects of changed time
- File system full
- weird time-dependent behavior of applications
- a system/application restart occurred
- data changes happened
Other things that may have changed:
- Software versions through package updates – minor changes are important!
  - OS kernel
  - OS packages
  - application libraries (Ruby gems, jars)
- Database schemas
- Database content
- Filesystem content of any kind (that includes timestamps of a file that is only read!)
  - location of files
  - symlinks vs. real files
  - timestamps
- Hardware
- Increased load
  - network I/O
  - disk I/O
  - CPU
  - exceeded RAM -> swapping
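Library versions are a common culprit, and a single extra minor version is easy to miss by eye. Here is a small Ruby sketch, assuming you saved the output of `gem list` on both machines. It parses lines like `rake (0.8.1, 0.7.3)` and reports versions that are installed on only one side. The method names, file names and sample output are hypothetical.

```ruby
# Parses `gem list` output, e.g. "rake (0.8.1, 0.7.3)", into
# { "rake" => ["0.8.1", "0.7.3"] }. Newer RubyGems may print "default: "
# in front of a version; that prefix is dropped. Other lines are ignored.
def parse_gem_list(text)
  text.each_line.each_with_object({}) do |line, gems|
    m = line.match(/\A(\S+) \((.*)\)\s*\z/)
    next unless m
    gems[m[1]] = m[2].split(', ').map { |v| v.delete_prefix('default: ') }
  end
end

# Reports every gem whose set of installed versions differs between the
# working and the broken machine.
def gem_differences(working_text, broken_text)
  working = parse_gem_list(working_text)
  broken = parse_gem_list(broken_text)
  (working.keys | broken.keys).sort.each_with_object({}) do |name, diff|
    w = working.fetch(name, [])
    b = broken.fetch(name, [])
    next if w.sort == b.sort
    diff[name] = { only_working: w - b, only_broken: b - w }
  end
end

# Usage (hypothetical file names; run `gem list > gems.txt` on each machine):
# p gem_differences(File.read('working-gems.txt'), File.read('broken-gems.txt'))
```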
Some things will be straightforward, and it is obvious why one thing breaks another. Some things are not as obvious (at least not at the time you are looking for them – it’s always obvious afterwards!). Don’t jump to conclusions about cause and effect while you debug. If you think “I won’t try X because X has nothing to do with Y”, try X! Maybe it does have something to do with Y. You don’t know until you try. Revert (or make equal to the working system) the “most obvious” things that “can’t possibly interfere with the problem”. That includes
- Comments in Source or Configuration files
- Whitespace
- Trivial Code/Configuration changes
- minor version changes in Packages
The “most likely” rule applies here, too. Don’t start with whitespace while other, less subtle differences remain. Don’t look at file access timestamps if the files on one system are in completely different locations than on the other. This requires some experience, but with time you’ll learn which things to look at first.
Tools
- Filesystem analysis: df, ls, find
- Application-Behavior: strace (dtruss on Solaris and Mac OS), lsof, netstat
- Databases: for MySQL: mysql, innotop
- Packages: Debian: apt-get, dpkg
- Finding differences/problem causes in working vs. non-working code: binary search (e.g. via git bisect, a debugger, or just plain “print” debugging).
We hope these thoughts help you debug and troubleshoot strange problems. Feel free to post additions, comments, tools, or experiences with troubleshooting.
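The binary search idea behind `git bisect` fits in a few lines of Ruby using `Array#bsearch` in find-minimum mode. This sketch assumes the changes are ordered oldest to newest. It also assumes that the problem, once introduced, persists in every later change; otherwise a binary search can point at the wrong change. `first_bad` and the numbered changes are illustrative only.

```ruby
# Binary search over an ordered list of changes (oldest first), the idea
# behind git bisect. The block answers "is it broken with this change?".
# Returns the first bad change (or nil) and how many checks were needed.
def first_bad(changes)
  checks = 0
  culprit = changes.bsearch do |change|
    checks += 1
    yield change
  end
  [culprit, checks]
end

# Hypothetical: 100 numbered changes, and change 73 broke things.
culprit, checks = first_bad((1..100).to_a) { |change| change >= 73 }
puts "first bad change: #{culprit} (found with #{checks} checks)"
```

With 100 changes this needs at most 7 checks instead of up to 100, which is why bisecting pays off even when each check is slow.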
Continuous releasing with git
Agile software development can be seen as heavily influenced by the open-source world. In 1997 Eric S. Raymond described two different models of open-source software (OSS) development in his essay The Cathedral and the Bazaar. Back then, OSS was crafted like a cathedral, i.e. an exclusive group of developers released source code once a stable version had been reached. But developers like Linus Torvalds had started to follow a new philosophy: “release early and often, delegate everything you can, be open to the point of promiscuity”. Eric S. Raymond called this the bazaar model.
With agile software development, the world of proprietary software has adopted the lessons of the “great babbling bazaar”. The development process has become more flexible and release cycles have shortened drastically. Processes like Scrum allow product managers to plan new features rapidly, e.g. to react to customer demands or market changes. Software frameworks like Ruby on Rails help developers create new features easily. Even quality assurance (QA) has become more agile: tests have been automated, and continuous integration systems replace the old-fashioned checklists.
But neither open source nor agile software development would be possible without powerful code management systems like Subversion or git, which allow the separation of code versions. This makes it possible for developers to work on new features while the QA folks test the code version to be released. The following describes how we manage our code.
We are using Scrum, Ruby on Rails, and git with submodules. Our Scrum process contains sprints, stories, tasks, and bugs. One sprint is a two-week cycle, and one of our stories describes a new use case for a feature or a change to an existing use case. Each story consists of a number of tasks, which are development jobs that one pair of developers can finish in one work day or less. Well, and a bug is a bug.
Although we have two-week sprints, we are always able to deploy our code. Hence bugfixes or new features can go online whenever QA gives their OK. The way we keep our code deployable is to create a branch for every new story or set of stories. Creating a new branch in git goes like this:
git checkout master
git branch <branch name>
git push origin <branch name>
git checkout <branch name>
If the work on a story involves a submodule, a new branch needs to be created in that submodule, too. Afterwards, this story branch of the submodule needs to be checked out in the corresponding branch of the main application. Here is how to add a submodule branch to the main application (`git submodule add` only stages the change, so it has to be committed before pushing):
cd <main application path>
git checkout <branch name>
git submodule add -b <branch name> <repository> <path>
git commit -m "Added submodule <path> on branch <branch name>"
git push origin <branch name>
This not only looks like a lot of overhead for small changes, it is. So there is another way of separating submodule versions. When changes in a submodule do not conflict with other developers’ work on that submodule, they can be made in the submodule’s master branch without going online with the next deployment. To achieve this, the submodule revision is only updated in the story branch of the main application, not in its master branch. To do so, run the following commands:
cd <main application path>
git checkout <branch name>
git add <submodule path>
git commit -m"Updated submodule <submodule>"
git push origin <branch name>
After finishing a story, the master branch is merged into the story branch so that any conflicts are resolved before testing. Again, this has to be done in the submodules, too. Note that `git fetch` only updates `origin/master`, not the local master branch, so that is the branch to merge:
git checkout <branch name>
git fetch origin
git merge origin/master
git push origin <branch name>
When all conflicts have been resolved, the story can be deployed on a staging system, which is a clone of the production system, so that the quality assurance guys can have a look at it.
git checkout <branch name>
git pull origin <branch name>
Afterwards, when the story has passed the quality assurance tests, it can be merged into the master branch and go online with the next deployment of our code, which can happen, for example, once a day.
git checkout master
git merge <branch name>
git push origin master
Whenever new code has been merged into the master branch, our CI system runs the relevant unit, functional, and integration tests. All of these tests need to pass before the deployment can continue. If any test fails, the deployment stops until the failure has been fixed. This way we prevent regressions from going online.
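The “stop at the first failure” rule of such a deployment gate can be sketched like this. It is not our CI configuration: `run_gate` is a hypothetical helper, and the commented-out wiring assumes the Rails test rake tasks `test:units`, `test:functionals` and `test:integration`.

```ruby
# Runs deployment checks in order and stops at the first one that fails.
# `checks` is a list of [name, callable] pairs; a callable returns a truthy
# value for "passed" (Kernel#system returns true, false or nil).
def run_gate(checks)
  checks.each do |name, check|
    unless check.call
      puts "#{name} failed: deployment stopped"
      return false
    end
    puts "#{name} passed"
  end
  puts 'all checks passed: deployment may continue'
  true
end

# Hypothetical wiring to the Rails test tasks (not executed here):
# run_gate([
#   ['unit tests',        -> { system('rake test:units') }],
#   ['functional tests',  -> { system('rake test:functionals') }],
#   ['integration tests', -> { system('rake test:integration') }]
# ])
```

Running the cheap unit tests first means a broken build is usually rejected before the slower integration tests even start.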
Edit: Made introduction more comprehensible.
Setting your custom deploy strategy in capistrano
Capistrano 2 supports custom deploy strategies. You basically just have to implement the “deploy!” and “check!” methods in your class and you’re good to go.
But how do you tell Capistrano to use your strategy?
I looked into the code and found no way of setting your own strategy. So I changed Capistrano so that it’s possible to set the strategy. When I asked Jamis Buck to pull the change, he suggested that I just set the strategy directly. This approach wouldn’t need any changes to the code base.
Well, then we went ahead and did just that. So here’s the code for setting your custom deploy strategy (don’t forget to require the file with your code):
set :strategy, Capistrano::Deploy::Strategy::DifferentAppRootRemoteCache.new(self)
It works like a charm.
Overriding rake tasks and db:test:prepare strangeness
Some time ago our CC.rb build started to fail with errors like this:
Mysql::Error: Can't create table './cc_test/#sql-8e7_5bab.frm' (errno: 150): ALTER TABLE questions ADD CONSTRAINT questions_ibfk_1 FOREIGN KEY (user_id) REFERENCES users (id) ON DELETE CASCADE
We’re using the foreign key migrations plugin, which automatically generates foreign keys for MySQL. It works fine, but somehow the db:test:prepare rake task started to fail – every time at a different key and only on the build server.
After hours of hunting down the problem, I finally gave up and did what you can always do when you can’t solve a problem: cheat. So I just created a new db:test:prepare task which calls the mysql commands and basically does the same as the rake task. It’s not as portable as the default one, of course, but it has one property that the default one had lost: it works.
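Rake doesn’t let you simply redefine a task: defining a task that already exists adds the new actions to the existing ones instead of replacing them. So to override a rake task, you have to delete the task and then define the new one. I opted to remove the task manually and redefine it.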
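A minimal sketch of that appending behavior, using Rake’s `Task.define_task`, `Task#invoke` and `Task#reenable`. It also shows `Task#clear`, which newer Rake versions provide. `Task#clear` is a different technique from the `remove_task` approach in the task below and is not what we used. The task name `:prepare` is just an example.

```ruby
require 'rake'

calls = []

# Defining an existing task again does not replace it: Rake appends the
# new block to the task's list of actions.
Rake::Task.define_task(:prepare) { calls << :original }
Rake::Task.define_task(:prepare) { calls << :replacement }
Rake::Task[:prepare].invoke
p calls # => [:original, :replacement]

# Task#clear (newer Rake versions) drops the existing actions and
# prerequisites, so the next definition really replaces the task.
calls.clear
Rake::Task[:prepare].clear
Rake::Task.define_task(:prepare) { calls << :replacement }
Rake::Task[:prepare].reenable # a task runs only once per process unless re-enabled
Rake::Task[:prepare].invoke
p calls # => [:replacement]
```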
Here’s my task in case anyone experiences similar problems:
Rake::TaskManager.class_eval do
  def remove_task(task_name)
    @tasks.delete(task_name.to_s)
  end
end

def remove_task(task_name)
  Rake.application.remove_task(task_name)
end

namespace :db do
  namespace :test do
    remove_task :"db:test:prepare"

    desc 'prepares the db - mysql style'
    task :prepare do
      require 'yaml'
      config = YAML::load(File.read('config/database.yml'))
      devdb = config['development']['database']
      devpass = config['development']['password']
      devuser = config['development']['username']
      testdb = config['test']['database']
      testpass = config['test']['password']
      testuser = config['test']['username']

      puts "dumping development schema"
      puts %x{mysqldump -u #{devuser} --password=#{devpass} -d #{devdb} > dev.sql}
      puts "dropping test db"
      puts %x{mysqladmin -u #{testuser} --password=#{testpass} -f drop #{testdb}}
      puts "recreating testdb"
      puts %x{mysqladmin -u #{testuser} --password=#{testpass} create #{testdb}}
      puts "loading development schema into test db"
      puts %x{mysql -u #{testuser} --password=#{testpass} #{testdb} < dev.sql}
    end
  end
end
We call rake using “rake -I /path/to/override.rb” in our build scripts and it works fine now.
Limiting mongrel to one request at a time with haproxy
I don’t know why we didn’t try it earlier, but haproxy is a far better proxy solution than all the others we tried. OK, we didn’t try that many, but we did try the most common ones in the Rails deployment landscape:
- Apache 2.2 mod_proxy_balancer
- nginx standard proxy module
- nginx with fair proxy module
- haproxy
So why is haproxy so much better? Because it can actually limit the requests per mongrel to one at a time. This is important because otherwise you get behavior like this: one mongrel has a long-running request and, through round-robin, gets another request while other mongrels are idling. Also, mongrels with a request queue longer than one start to eat memory like hell.
The same thing is possible with Apache’s mod_proxy_balancer. We tried and failed to get it working, and it seems we are not alone with that problem.
The plain nginx balancer has the same problem. This is where the fair proxy module comes in: it’s supposed to send requests only to idle mongrels. But we had the same “mongrels with many requests while others are idling” problem again.
We finally tried haproxy, which has been used for Rails deployments for quite some time but has gotten a lot of buzz recently. Ilya Grigorik wrote a nice article about load balancing QoS with haproxy, and Alexander Staubo posted a performance comparison of nginx and haproxy, which got the attention of Willy Tarreau (the haproxy author). They found some haproxy bugs, which were fixed and resulted in even better performance. Details can be found in the second comparison of haproxy and nginx.
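Why one request per mongrel matters can be shown with a small simulation. This is a simplified Ruby model, not haproxy code. It assumes that requests arrive in order and that each backend serves one request at a time. In the one-at-a-time variant, waiting requests stay in a single queue at the proxy; in haproxy this behavior comes from the per-server `maxconn` setting. The load pattern is made up.

```ruby
Request = Struct.new(:arrival, :duration)

# Round-robin: request i goes to backend i % n whether or not that backend
# is busy, so it waits in that backend's own queue.
# Assumes requests are sorted by arrival time.
def round_robin_finish_times(requests, backends)
  free_at = Array.new(backends, 0)
  requests.each_with_index.map do |req, i|
    b = i % backends
    start = [req.arrival, free_at[b]].max
    free_at[b] = start + req.duration
  end
end

# One request per backend at a time: requests wait in a single queue at the
# proxy and go to whichever backend becomes free first.
def one_at_a_time_finish_times(requests, backends)
  free_at = Array.new(backends, 0)
  requests.map do |req|
    b = free_at.each_index.min_by { |j| free_at[j] }
    start = [req.arrival, free_at[b]].max
    free_at[b] = start + req.duration
  end
end

# Hypothetical load: one slow request followed by two quick ones, 2 mongrels.
requests = [Request.new(0, 10), Request.new(1, 1), Request.new(2, 1)]
p round_robin_finish_times(requests, 2)   # => [10, 2, 11]
p one_at_a_time_finish_times(requests, 2) # => [10, 2, 3]
```

The third request is quick, but under round-robin it lands behind the slow one and finishes at t=11 instead of t=3, while the second mongrel sits idle.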
So we are quite happy with haproxy at the moment and hope it stays this way.
Strange web requests
Ever seen requests for files like this in your webserver logfile?
/_vti_bin/owssvr.dll
/MSOffice/cltreq.asp
According to this forum post it’s some kind of “Microsoft Office gone wild” thing:
“This file is part of Microsoft Office Server Extensions (OSE) and your host has installed them on your server. This file gets accessed when someone visiting your site has Microsoft Office and Internet Explorer installed and has enabled the “Discuss” toolbar in his/her browser. When the toolbar is enabled, the browser will automatically query the file when visiting your site to determine if the OSE extensions are installed.”
It’s almost as annoying as Firefox’s favicon requests.
Useful process info for mongrel and thin
Ilya Grigorik wrote about the mongrel proctitle plugin, which is indeed very useful. We discovered the plugin about two months ago and have been using it in production without problems. It has helped us debug several problems.
We run more than one Rails app per server, so it hurts to install the plugin for every app again (oh, that painful DRY nerve).
The solution, of course, is to just create a gem and install it. Luckily there already is such a thing: rtomayko’s mongrel proctitle GitHub repository.
There’s also a proctitle plugin (actually a rackup file) for thin: thin proctitle git repository – no gem yet, though.
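The trick behind these plugins is that a Ruby process can change the title that `ps` shows by assigning to `$0`. Below is a minimal sketch of the idea. It is not the code of the plugins mentioned above, and the `ProcTitle` class and its title format are hypothetical. It counts active requests and puts the count and the current path into the title.

```ruby
# Hypothetical helper: keeps the number of active requests (and the path
# being served) in the process title, which `ps` then displays.
class ProcTitle
  def initialize(base)
    @base = base
    @active = 0
    @mutex = Mutex.new
    update
  end

  # Wraps the handling of one request; the counter is decremented even if
  # the request raises.
  def track(path)
    @mutex.synchronize do
      @active += 1
      update(path)
    end
    yield
  ensure
    @mutex.synchronize do
      @active -= 1
      update
    end
  end

  def title(path = nil)
    base = "#{@base} [#{@active} active]"
    path ? "#{base}: #{path}" : base
  end

  private

  def update(path = nil)
    $0 = title(path)
  end
end

proc_title = ProcTitle.new('mongrel_rails 8101')
proc_title.track('/posts') { puts $0 }
puts $0
```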
WordPress Deployment with Capistrano 2 and git
I know: PHP deployment is really easy. You just copy the files to the server and you’re good to go. Why bother with something like Capistrano for WordPress deployment? Well, we’re using Rails, we’re spoiled children, and because we can. That’s why. OK, to be honest: I don’t like the “check out the files, copy them, and upload them” style of deployment.
There already is a good tutorial for WordPress deployment with Capistrano, but it’s for Capistrano 1 and SVN. I’ll show you the necessary modifications to use Cap 2 and git. It’s easy, really.
Ok, then let’s go:
Local directory structure:
base_dir
  - .git (Git repository)
  - Capfile
  - config
    - deploy.rb
  - public
    - *.php (etc...)
Server side Capistrano structure:
app-dir
  - current => link to releases/2008....
  - shared
    - wp-config.php
    - uploads (wp-uploads-folder)
  - releases
    - 2008.....
      - public/ (wordpress goes here)
So you have to configure your Apache (or lighty or whatever) to use app-dir/current/public as its document root.
My deploy.rb file looks like this:
set :application, "mywordpress-blog"
set :deploy_to, "/var/www/apps/#{application}"
set :deploy_via, :copy
set :copy_strategy, :checkout
set :user, 'deploy'
set :use_sudo, false
set :keep_releases, 3

set :scm, :git
set :repository, "/Users/hvolkmer/Projects/mywordpress-blog/"
set :branch, "master"

role :app, "blog.example.com"
role :web, "blog.example.com"
role :db, "blog.example.com", :primary => true

desc "This is here to override the original :restart"
deploy.task :restart, :roles => :app do
  # do nothing but override the default
end

desc 'Link to upload folder, cache and config'
task :after_symlink do
  run "cp #{deploy_to}/#{shared_dir}/wp-config.php #{deploy_to}/#{current_dir}/public/wp-config.php"
  run "ln -nfs #{deploy_to}/#{shared_dir}/uploads/ #{deploy_to}/#{current_dir}/public/wp-content/uploads"
end
Notice that I symlink the uploads folder but copy the wp-config.php file. That’s ugly but necessary, because PHP resolves the base path of the target file, not of the symlink, when it tries to include other files (so it would end up trying to include files from the shared directory). I’m told that this behavior is fixed in the latest PHP.
Using the :copy deploy_via strategy means that git doesn’t have to be installed on the target system.
I was thinking about using Vlad the Deployer for this task because it is supposed to be simpler and leaner and all that. But as it currently lacks the copy deployment model and requires git on the server side, I just stayed with Capistrano, which turned out to be the simpler solution for us in this case.
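The symlink resolution behind this is easy to see from Ruby, whose `File.realpath` resolves a link to its target the same way. This sketch is an analogy in Ruby, not PHP. It builds a throwaway copy of the server layout in a temp directory (the release name `example-release` is made up) and shows that a symlinked wp-config.php resolves into shared/ rather than into the release’s public/ directory. That resolved directory is where relative includes would then be looked up.

```ruby
require 'tmpdir'
require 'fileutils'

# Rebuilds a miniature version of the server layout (shared/ plus one
# release directory) in a temp dir, symlinks the config into the release,
# and returns the directory the link resolves to plus the two candidates.
def where_symlinked_config_resolves
  Dir.mktmpdir do |root|
    shared = File.join(root, 'shared')
    public_dir = File.join(root, 'releases', 'example-release', 'public')
    FileUtils.mkdir_p([shared, public_dir])
    File.write(File.join(shared, 'wp-config.php'), "<?php /* config */\n")

    link = File.join(public_dir, 'wp-config.php')
    File.symlink(File.join(shared, 'wp-config.php'), link)

    {
      resolved_dir: File.dirname(File.realpath(link)),
      shared_dir: File.realpath(shared),
      release_public_dir: File.realpath(public_dir)
    }
  end
end

dirs = where_symlinked_config_resolves
puts "link resolves into: #{dirs[:resolved_dir]}"
puts "release public dir: #{dirs[:release_public_dir]}"
```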
So always remember: Pick the right tool (for you and) for the job and be happy with it.
Resurrecting mongrel_rails --clean and --only options
So you wake up, brew a coffee, check the system status, and realize that some mongrels aren’t running despite the fact that monit should take care of them. So you just start them manually and see what happens, right?
$ mongrel_rails cluster::restart -C /path_to/mongrel_cluster.yml --clean --only 8101
** Ruby version is not up-to-date; loading cgi_multipart_eof_fix
invalid option: --clean for command 'cluster::restart'
What? Why?
That’s exactly the command that monit has been using for almost a year in our production system…
So what changed? Apparently not much… just another gem was installed in preparation for the next deploy. But mongrel and mongrel_rails are still the same version. That’s strange.
I haven’t really found out what was wrong with the gem, but I’ve found a solution, which is also mentioned in Paul Goscicki’s article:
gem cleanup mongrel_cluster
Just wipe all old mongrel_cluster gems and you’re good to go. YMMV. Friday saved.
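Before running `gem cleanup`, it can help to see which old versions are lying around. Here is a small sketch using RubyGems’ `Gem::Specification.find_all_by_name`. The helper names are hypothetical, and the “stale” list is only an approximation, because `gem cleanup` also keeps old versions that other installed gems still depend on.

```ruby
require 'rubygems'

# All installed versions of a gem, newest first (empty if not installed).
def installed_versions(name)
  Gem::Specification.find_all_by_name(name).map(&:version).sort.reverse
end

# Everything except the newest installed version: roughly what
# `gem cleanup <name>` would remove.
def stale_versions(name)
  installed_versions(name).drop(1)
end

stale = stale_versions('mongrel_cluster')
if stale.empty?
  puts 'mongrel_cluster: no old versions installed'
else
  puts "mongrel_cluster: old versions installed: #{stale.join(', ')}"
end
```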
