Adventures in Production Rails Debugging

Scott Francis

At Shopify we frequently need to debug production Rails problems. Adding extra debugging code takes time to write and deploy, so we’ve learned how to use tools like gdb and rbtrace to quickly track down these issues. In this post, we’ll explain how to use gdb to retrieve a Ruby call stack, inspect environment variables, and debug a really odd warning message in production.

We recently ran into an issue where we were seeing a large number of similar warning messages spamming our log files:


/artifacts/ruby/2.1.0/gems/rack-1.6.4/lib/rack/utils.rb:92: warning: regexp match /.../n against to UTF-8 string

This means we are trying to match an ASCII regular expression on a UTF-8 source string.

These messages were coming in at a rate of around 50k per minute and were generating a lot of noise. What I could tell from the message itself was that it was a Ruby warning, generated via the rb_warn API, and that it was coming from Rack::Utils#parse_query. Unfortunately, what rb_warn doesn’t show us is a stack trace of what was actually calling parse_query. Doing a quick scan through the Shopify code base didn’t reveal anything obvious.

There were a couple of ways I thought of to figure out where these calls were coming from:

  • We could add some instrumentation around parse_query to dump a call stack. However, this would require adding the code, deploying it, and later removing it. Too much work!
  • Use tmm1’s rbtrace to configure a tracepoint to dump the stack when parse_query is called. Unfortunately, our current runtime Docker configuration prevents us from using rbtrace, so this method was out.
  • Attach gdb to a Unicorn worker in production and figure it out from there. This was the approach we ultimately decided to use.

The first thing I needed to do was come up with a game plan. The ideal approach would not cause any significant harm to production traffic. With that in mind, I decided to:

Attach to an idle Unicorn worker:


scott@xxx.xxx:~ $ sudo gdb -p 31446
[… snip …]
Attaching to process 31446

Set a breakpoint at the appropriate location. How did I know where to set the breakpoint? Performing a grep through the Ruby codebase for the warning message was all that was needed. One important thing to note here - in order to set breakpoints in source files you’ll need to compile your Ruby interpreter with debug symbols.


(gdb) break 're.c:1300'
Breakpoint 1 at 0x7f63aba58bc1: file re.c, line 1300.

Resume the process:


(gdb) c
Continuing.

It didn’t take very long for the breakpoint to fire. Once it did, I immediately dumped a core, and then detached from the process, leaving the customer request intact:


Breakpoint 1, rb_reg_prepare_enc (warn=1, str=140066762149160, re=140066722973200) at re.c:1300
1300 re.c: No such file or directory.
(gdb) generate-core-file
warning: Memory read failed for corefile section, 8192 bytes at 0x7fff2e1ec000.
Saved corefile core.31446
(gdb) detach
Detaching from program: /usr/lib/shopify-ruby/2.1.6-shopify2/bin/ruby, process 31446
(gdb) quit

Dumping a core took less than a second, and provided the ability to debug the problem out of band and at my leisure, without having to worry about potentially impacting a customer request. With the core file, we can now run gdb again with the core and really figure out what’s going on:


scott@xxx.xxx:~ $ gdb /usr/lib/shopify-ruby/2.1.6-shopify2/bin/ruby core.31446
[… snip …]

The first thing I wanted to see was where the request was coming from in Ruby-land. Using a little Python snippet, it’s easy to get a Ruby callstack:


(gdb) python get_ruby_stacktrace()
/artifacts/ruby/2.1.0/gems/rack-1.6.4/lib/rack/utils.rb:92:in `parse_query'
/artifacts/ruby/2.1.0/gems/rack-1.6.4/lib/rack/request.rb:307:in `cookies'
/app/lib/shopify_session_store.rb:136:in `extract_session_id'
[… snip …]

Ah ha! So parsing the cookies is the problem! RFC 6265 states that cookies should be encoded using US-ASCII, so it would be interesting to see what’s really going on here, and why we’re seeing this UTF-8 warning. No worries, gdb has us covered. By deciphering the parameters passed to rb_reg_prepare_enc, we can see the source string to which we’re trying to apply the regexp:


(gdb) where 1
#0 rb_reg_prepare_enc (warn=1, str=140066762149160, re=140066722973200) at re.c:1300
(More stack frames follow…)

This is where knowing a little about MRI's internal object representation is helpful. With 64-bit versions of MRI, every Ruby object is 40 bytes. In our case the specific object we’re looking at is an RString, Ruby’s string object. Pat Shaughnessy has a great article on how Ruby strings are implemented. If you haven’t read it before, go ahead and give it a read - I’ll wait. So, now that we know how Ruby strings work under the covers, we can use that knowledge to figure out what string we’re trying to analyze (some information has been obscured for privacy purposes):


(gdb) p *((struct RString *) str)
$2 = {basic = {flags = 6299685, klass = 140066084410960}, as = {heap = {len = 1628,
ptr = 0x7f63d8447800 “_session_id=xxx; fsb_prepa=/collections/views-smart-collection-7; _shopify_visit=t; _shopify_uniq=x; _shopify_y=xxx; _shopify_s=xxx"...,
aux = {capa = 2850, shared = 2850}}, ary = "\\\006\000\000\000\000\000\000\000xD\330c\177\000\000\"\v\000\000\000\000\000"}}

The string by default is truncated, but by setting print elements 0, we can print the full string:


(gdb) set print elements 0
(gdb) p ((struct RString *) str)->as->heap->ptr
$5 = 0x7f63d8447800 "_session_id=xxx; fsb_prepa=/collections/views-smart-collection-7; _shopify_visit=t; _shopify_uniq=x; _shopify_y=xxx; _shopify_s=xxx; ajs_user_id=null; ajs_group_id=null; ajs_anonymous_id=%22xxx%22; _ga=GA1.3.0000000000.0000000000; _gat=1; addshoppers.com=2%7C1%3A0%7C10%XXX%7C15%3Aaddshoppers.com%7C44%3XXX%3D%7Cxxx; __zlcmid=XXX; sloyalty-vid=XXX; modal=noshow; shopify_recently_viewed=womenshoodies-birds-madewithlove%20womenscardigans-cecilia-misseleven; _sp_id.45cc=xxx; _sp_ses.0000=*; fsb_all={\"opacity\":1,\"font_size\":20,\"goal\":35,\"currency_exchange_rate\":1,\"message_three\":\"��������� ������ �������� ��� ", '�' , " ���!!!\",\"show_time\":5,\"interval_time\":120,\"is_active\":\"yes\",\"bar_height\":40,\"message_one\":\"������ �������� �� ������ ��� ��� {{currency_symbol}}{{goal}} !\",\"text_color_two\":\"#FF0000\",\"text_color_one\":\"#FFFFFF\",\"display_page\":\"all\",\"currency_symbol\":\"�\",\"position\":\"top_push_sticky\",\"font\":\"Righteous\",\"background_color\":\"#000000\",\"message_two\":\"�� �������� �������� ����� {{currency_symbol}}{{remainder}} ,�� ��������� ������ ��������!\",\"currency_code\":\"EUR\",\"charged\":false}; fsb_index=fsb_all”

There it is! It looks like one of the cookies is embedding a chunk of JSON that contains non US-ASCII characters, and that’s what’s triggering the UTF-8 warnings.

The fix that we went with was fairly simple. We monkey patched Rack::Request#cookies to suppress any warnings:


require 'rack/request'

module Rack
  class Request
    def cookies_with_warning_suppression
      silence_warnings { cookies_without_warning_suppression }
    end
    alias_method_chain :cookies, :warning_suppression
  end
end

It’s not a permanent solution, but it’s enough to at least suppress the large number of warnings.

Conclusion

As with everything in life, YMMV. In this case, it was faster to debug in production than it would have been to instrument the code, deploy and monitor. For more information:

Finally, if you like doing this kind of performance analysis, we’re hiring!

Special thanks thanks to Julia Evans for feedback on this post!