I'm experiencing an unresponsive socket in with my Puma setup after random time. Up to this point I don't have a clue what's causing the issue. I was hoping somebody over here can help we with some answers or point me in the right direction. I'm having the following setup:
I'm using the official docker ruby-2.2.3-slim image together with the latest puma release 2.15.3, I've also installed Nginx as a reverse proxy. But I'm already sure Nginx isn't the problem over here because and I've tried to verify if the socket was working using this script. And the socket wasn't working, I got a timeout over there as well so I could ignore Nginx.
This is a testing environment so the server isn't experiencing any extreme load, I've also check memory consumption it has still several GB's of free space so that couldn't be the issue either. What triggered me to look at the puma socket was the error message I got in my Nginx error logging:
upstream timed out (110: Connection timed out) while reading response header from upstream
Also I couldn't find anything in the logs of puma indicating what is going wrong, over here are my puma setup:
threads 0, 16
app_dir = ENV.fetch('APP_HOME')
environment ENV['RAILS_ENV']
daemonize
bind "unix://#{app_dir}/sockets/puma.sock"
stdout_redirect "#{app_dir}/log/puma.stdout.log", "#{app_dir}/log/puma.stderr.log", true
pidfile "#{app_dir}/pids/puma.pid"
state_path "#{app_dir}/pids/puma.state"
activate_control_app
on_worker_boot do
require 'active_record'
ActiveRecord::Base.connection.disconnect! rescue ActiveRecord::ConnectionNotEstablished
ActiveRecord::Base.establish_connection(YAML.load_file("#{app_dir}/config/database.yml")[ENV['RAILS_ENV']])
end
And this it the output in my puma state file:
---
pid: 43
config: !ruby/object:Puma::Configuration
cli_options:
conf:
options:
:min_threads: 0
:max_threads: 16
:quiet: false
:debug: false
:binds:
- unix:///APP/sockets/puma.sock
:workers: 1
:daemon: true
:mode: :http
:before_fork: []
:worker_timeout: 60
:worker_boot_timeout: 60
:worker_shutdown_timeout: 30
:environment: staging
:redirect_stdout: "/APP/log/puma.stdout.log"
:redirect_stderr: "/APP/log/puma.stderr.log"
:redirect_append: true
:pidfile: "/APP/pids/puma.pid"
:state: "/APP/pids/puma.state"
:control_url: unix:///tmp/puma-status-1449260516541-37
:config_file: config/puma.rb
:control_url_temp: "/tmp/puma-status-1449260516541-37"
:control_auth_token: cda8879717be7a645ea323d931b88d4b
:tag: APP
The application itself is a Rails app on the latest version 4.2.5, it's deployed on GCE (Google Container Engine).
If somebody could give me some pointer's on how to debug this any further would be very much appreciated. Because now I don't see any output anywhere which could help me any further.
I replaced the unix socket with tcp connection to Puma with the same result, still hangs after x time
I'd start with:
Not knowing more about your application, I'd say it's likely the threads get stuck doing some long/blocking calls without timeouts or spinning on some computation until the whole thread pool gets depleted.
We'll see.
I finally found out why my application was behaving the way it was. After trying to use a tcp connection and switching to Unicorn I start looking into other possible sources.
That's when I thought maybe my connection to Google Cloud SQL could be the problem. Once I read the faq of Cloud SQL, they mentioned that you have to tweak you Compute instances to ensure they keep open your DB connection. So I performed the next steps they recommend and that solved the problem for me, I added them just in case:
# Display the current tcp_keepalive_time value.
$ cat /proc/sys/net/ipv4/tcp_keepalive_time
# Set tcp_keepalive_time to 60 seconds and make it permanent across reboots.
$ echo 'net.ipv4.tcp_keepalive_time = 60' | sudo tee -a /etc/sysctl.conf
# Apply the change.
$ sudo /sbin/sysctl --load=/etc/sysctl.conf
# Display the tcp_keepalive_time value to verify the change was applied.
$ cat /proc/sys/net/ipv4/tcp_keepalive_time