Hydra

Hydra takes a snapshot generated by running our ptrace tracer on an application and organizes the data in an easy-to-grok terminal interface.

It's designed to focus the engineer on the most relevant information pertaining to the root cause of the failure condition. The faulting frame will be immediately selected; suspect variables will be annotated inline; the entire crash will be classified into a type of fault (null dereference, memory write, etc.). Various commands and integrations are provided to ease navigation through application state, giving you more information faster and less tediously.

Sound pretty useful? Good. Let's take a closer look at how Hydra helps you squash the many-headed bugs that affect your production applications.

Overview

Feeding the hand that bites you

First things first: how do we run hydra?

    $ hydra <snapshot file>

Assuming you have configured coroner you can also use:

    $ coroner view <snapshot file>

This will instantly print the exact root cause of the crash, and will even commit a fix for you. You can go home.

Well, maybe not yet. What'll really happen is you'll see a colorful ncurses interface (unless you passed -m to hydra, in which case you'll see just black and white. Happy?), which should hopefully remind you of some of your favorite tools, like htop and tig. So how is this organized?

Pane? You don't know what pane is

The interface is split into a series of panes, each one containing distinct portions of an application's state.

The panes, in order, are threads, frames, variables, and router.

We'll get more into what the router pane contains later, but for now, know that it has various bits of metadata about the application: system information (memory/cpu usage), registers, kernel frames, etc. There are also some really cool, configurable integrations like source code management.

The only pane that doesn't really change its contents is the threads pane (I say really, here, because you can still change the way threads are displayed; we'll get into thread grouping and pane maximization later). The frames, variables, and router panes all change depending on the context the user is in; we call these panes context-aware. Frames will be populated according to the current thread selected; variables and registers will be populated according to the current frame; and so on.

Rules, rules, so many rules

There are also rules in between each pane. Without them, the panes would just bleed into each other, it'd be ugly and confusing, you wouldn't see any pretty colors, and you'd probably just go back to using gdb. They happen to contain useful information, too.

The rules, in order, are (they don't really have logical names):

Recalculating...

So given that we have so many contextual panes, how do you actually change the context (e.g., how do you change focus from threads to frames, or from frames to the router pane)? Well, navigation is pretty vim-like (sorry, Emacs users; I have small hands and don't want repetitive strain injury). Movement within a pane is handled by the expected hjkl keys; some special keys like H (go to the top of the current view of a list) and L (same as H, but bottom) are also supported. Page up/page down do the usual. Switching panes is handled by either tab (switches focus to the next pane; this wraps around if you reach the last pane) or pre-set marks - 1 for threads, 2 for frames, 3 for variables, and 4 for router.
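
As a rough illustration of the pane-switching rules just described, here is a minimal Python sketch. It is not Hydra's code; the function names are made up for this example, and only the pane order, the tab wrap-around, and the 1 through 4 marks come from the text above.

```python
# Pane order as described above: marks 1-4 map to these names.
PANES = ["threads", "frames", "variables", "router"]


def next_pane(current: str) -> str:
    """Pane focused after pressing tab; wraps around after the last pane."""
    index = PANES.index(current)
    return PANES[(index + 1) % len(PANES)]


def pane_for_mark(key: str) -> str:
    """Pane focused by a pre-set mark key ('1' through '4')."""
    if key not in ("1", "2", "3", "4"):
        raise ValueError(f"no pane is bound to mark {key!r}")
    return PANES[int(key) - 1]
```

Pressing tab on the router pane therefore lands back on threads, and mark 3 always selects variables regardless of the current focus.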

System initialized

What does all this actually look like? I could write a thousand more words, or I can just show you. Here:

[Screenshot: Hydra]

This is the initial view. What's nice is you're immediately focused on the faulting frame, and can see the signal information directly under it. No more parsing a gdb stacktrace to find what frame to jump to.

Below that, you can clearly see all the variables of the faulting frame (remember the mention of contexts above? Well, this is currently showing the variable context for the faulted frame, since that's what's selected). What's that colorful text below that variable? Those are inlined annotations. We'll get into that later, but basically, our tracer automatically deduced you did something naughty with that variable. Here, it looks like you dereferenced a NULL pointer. How dare you!

Wait, you could swear your application actually had five threads, but you see only three. And what's that funny symbol next to one of the threads? That, my friends, is two features in one -- thread grouping and item collapsing. We'll get into that later (boy do I have a lot of explaining to do), but to give you an idea, Hydra has automatically determined that a group of your application's threads (three of them, in this case) are pretty much the same, and thus you probably need to look at only one of them. Unless you need to look at more of them, of course, at which point you can expand that group and investigate.

What's that in the bottom pane? Is that...is that your code? Why yes, that's another cool feature: source code integration. That's the default router pane tab that's opened when you first start Hydra. We'll get into configuration of this feature later, but this will show the faulting line (along with the entire file, not just the function call and line, like what gdb shows by default). It'll even pull in the correct version of your code according to the tag/version of the particular crash!

Cool, commonly used features

Those are the basics, but there is so much more you can do with Hydra. Let's take the red pill and go deeper...because what's the point of a fancy ncurses UI without some cool features?

Source code integration

Let's take a deeper look into one of the first things you'll see (and ultimately, want to see) when opening a crash -- source code.

Configuring source code integration

You can configure hydra to show relevant sections of source code in the router pane.

[Screenshot: Hydra Source Code Integration]

In your ~/.hydra.cf file, you need to add a [scm] section. Example:

    [scm]
    crash_app.map=object:^libmtev[\.-],/home/djoseph/projects/libmtev
    crash_app.map=object:^libck[\.-],/home/djoseph/projects/ck
    crash_app.map=function:^ck_,/home/djoseph/projects/ck
    crash_app.map=function:^yajl_,/home/djoseph/projects/yajl
    crash_app.map=object:^libyajl[\.-],/home/djoseph/projects/yajl
    crash_app.ignore=object:^lib
    crash_app.map=/home/djoseph/projects/crash_app
    crash_app.trigger=/home/djoseph/projects/crash_app,version,git -C %s checkout -q %0

The <app_name>.map commands map an object (as in object file, not an instance of a class) or a function name to a source code folder for the application <app_name>. After the colon is a regular expression to match the name of the object or function, followed by a corresponding source code path. The commands are processed in order from top to bottom, and the first match that meets the criteria determines the path that hydra will search for the source code for that object or function.

<app_name>.map without a regular expression match is a wildcard - it will match anything.

Using <app_name>.ignore= followed by a regular expression instructs hydra to stop searching as soon as that expression matches, so none of the following lines are consulted. In the example above, object files that start with lib - and haven't already matched one of the earlier .map=object lines - will stop the search, and hydra will not associate that object file with source code.
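
To make the ordering concrete, here is a small Python sketch of first-match resolution over a subset of the example rules. It is only a model of the behavior described above, not Hydra's implementation: the Rule type, the resolve function, and the use of re.search are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    action: str              # "map" or "ignore"
    kind: Optional[str]      # "object", "function", or None for a wildcard
    pattern: Optional[str]   # regular expression, or None for a wildcard
    path: Optional[str]      # source folder (only for "map" rules)


# A subset of the [scm] example above, in the same order.
RULES = [
    Rule("map", "object", r"^libmtev[\.-]", "/home/djoseph/projects/libmtev"),
    Rule("map", "object", r"^libck[\.-]", "/home/djoseph/projects/ck"),
    Rule("map", "function", r"^ck_", "/home/djoseph/projects/ck"),
    Rule("ignore", "object", r"^lib", None),
    Rule("map", None, None, "/home/djoseph/projects/crash_app"),
]


def resolve(rules, object_name: str, function_name: str) -> Optional[str]:
    """Return the source folder for a frame, or None if it is ignored."""
    for rule in rules:
        if rule.kind is None:
            matched = True  # wildcard: matches anything that reaches it
        else:
            subject = object_name if rule.kind == "object" else function_name
            matched = re.search(rule.pattern, subject) is not None
        if matched:
            # The first matching line decides; an ignore match ends the search.
            return rule.path if rule.action == "map" else None
    return None
```

With these rules, a frame in libck.so.0 resolves to the ck folder, a frame in some other lib* object is ignored, and anything else falls through to the wildcard.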

Triggers

You can also trigger a command to run with <app_name>.trigger=. The most common use case for this is to trigger a git checkout of the correct branch when using hydra.

When using trigger, the first parameter is the source code path, the last parameter is the command to execute. Between these, you can specify one or more KVs whose values are used as positional variables in the trigger command. In the example above, the version KV value maps to parameter %0. If you list additional KVs, those would be %1, %2 and so forth. %s refers to the project path.
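
The parameter layout can be sketched in Python as follows. This is an illustrative model only: parse_trigger and expand_trigger are hypothetical helpers, the sketch assumes the command itself contains no commas, and the version value v1.2.3 in the usage note is invented.

```python
import re


def parse_trigger(spec: str):
    """Split a trigger value into (source path, KV names, command).

    Assumes the command contains no commas (a simplification of this sketch).
    """
    parts = spec.split(",")
    return parts[0], parts[1:-1], parts[-1]


def expand_trigger(command: str, project_path: str, kv_values) -> str:
    """Replace %s with the project path and %0, %1, ... with KV values."""
    def substitute(match):
        token = match.group(1)
        if token == "s":
            return project_path
        return kv_values[int(token)]

    # A single pass, so substituted text is never expanded a second time.
    return re.sub(r"%(s|\d+)", substitute, command)
```

For the example trigger, parse_trigger yields the path /home/djoseph/projects/crash_app, the KV list ["version"], and the command template; expanding it with a version value of v1.2.3 gives git -C /home/djoseph/projects/crash_app checkout -q v1.2.3.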

Troubleshooting Triggers

If it doesn't look like a trigger that you've set up is firing, keep in mind the following caveats:

A trigger will not fire until code from the specified folder is accessed by hydra, which happens when a frame that uses that code is highlighted in hydra. Also, a trigger will only fire if:

Item collapsing

Context: Any list-type pane with an indicated hierarchy (+/- symbols)
Commands:

Any item with an indicated hierarchy (i.e., a + or - character next to the item) may be collapsed or expanded to hide or show, respectively, an item's "children." In the thread pane, children may be members of a particular thread group; in the variable pane, members of a struct or array; in the process tab, structured heap metadata (arenas, thread caches, etc.); and so forth.

The one exception to this default collapsing behavior regards the display of inlined variable annotations. See Inlined annotations for details.

Inlined annotations

Any annotations on a variable will be displayed directly under the variable. If a variable chain is collapsed, but one of the variables in the chain is annotated, the minimum number of variables necessary will be displayed along with the annotation itself (i.e., the annotated variable and its owners).
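
One way to picture the "minimum number of variables" rule is the Python sketch below. The Var type, the variable names, and the "! " prefix are all invented for this example; only the behavior (show the annotated variable, its owners, and the annotation itself) comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Var:
    name: str
    annotation: str = ""
    children: list = field(default_factory=list)


def has_annotation(var: Var) -> bool:
    """True if this variable or anything it owns carries an annotation."""
    return bool(var.annotation) or any(has_annotation(c) for c in var.children)


def visible_lines(var: Var, depth: int = 0) -> list:
    """Lines shown for a collapsed chain: annotated variables and their owners."""
    lines = ["  " * depth + var.name]
    if var.annotation:
        lines.append("  " * (depth + 1) + "! " + var.annotation)
    for child in var.children:
        if has_annotation(child):  # unannotated branches stay hidden
            lines.extend(visible_lines(child, depth + 1))
    return lines


# Hypothetical variable chain: conn owns fd and buf; buf owns data and len.
conn = Var("conn", children=[
    Var("fd"),
    Var("buf", children=[Var("data", annotation="NULL dereference"), Var("len")]),
])
```

Here visible_lines(conn) keeps conn, buf, data, and the annotation line, while fd and len remain collapsed.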

Annotation jumping

Of course, variables across frames may be annotated; even within a single frame, there may be thousands of variables, obscuring the view of any annotations. Annotation jumping is useful here; simply open the 'Warnings' tab of the bottom pane (by pressing 'w'), scroll through the list to the annotation you're interested in, and press 'enter'. The thread, frame, and variable views will update to the position of the annotation's owner.

Pane maximization

Context: Any pane
Commands:

All panes support maximization. Certain panes, when maximized, may have a context associated with them (e.g. a maximized thread pane will have a frame pane context); the maximized pane will take the majority of the space, while the contextual panes will occupy the rest. All other panes will be hidden. To restore all panes and sizes, press either M again or a macro movement hotkey to one of the hidden panes (moving between shown panes will not force size restoration).

Regex search

Context: Any list-type pane
Commands:

All list-type panes support regex searches. All columns will be searched independently (e.g. in the thread display, status, tid, basename, threadname, and top frame symbol will each be searched).
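
"Searched independently" means a pattern is tried against each column on its own, never against the columns joined together. A minimal Python sketch of that idea, with made-up thread rows and a hypothetical search_rows helper:

```python
import re


def search_rows(rows, pattern: str) -> list:
    """Indices of rows where at least one column matches the pattern."""
    regex = re.compile(pattern)
    return [
        index
        for index, row in enumerate(rows)
        if any(regex.search(str(column)) for column in row)
    ]


# Hypothetical thread rows: status, tid, basename, threadname, top frame symbol.
ROWS = [
    ("F", 4012, "crash_app", "worker", "handle_request"),
    ("", 4013, "crash_app", "logger", "epoll_wait"),
]
```

In this model, ^epoll matches only the second row, ^40 matches both rows through the tid column, and a pattern such as "crash_app worker" matches nothing because it would have to span two columns.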

Index jumping

Context: Any pane
Commands:

All panes support index jumping. All but the source code management pane are 0-based; the SCM pane is 1-based (like any vim file). Indices less than the first element or greater than the last will jump to the first or last elements, respectively.
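
The clamping rule can be written in a few lines of Python. This is a sketch of the behavior described above, not Hydra's code; jump_index and its parameters are hypothetical names, and the list is assumed to be non-empty.

```python
def jump_index(requested: int, length: int, one_based: bool = False) -> int:
    """Clamp a requested index to the valid range of a non-empty list.

    one_based=True models the SCM pane; every other pane is 0-based.
    """
    first = 1 if one_based else 0
    last = first + length - 1
    return max(first, min(requested, last))
```

Jumping to index 99 in a ten-item pane lands on the last item (index 9), while jumping to line 0 in the SCM pane lands on line 1.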

Thread grouping

Context: Any pane for commands, thread pane for display
Commands:

Threads are automatically grouped according to the current group-type. Within each group, they are sorted according to the sort-type.

With callstack grouping, threads with identical callstacks are grouped together. With tid sorting, threads are ordered according to their thread ids. There are currently no other supported group- and sort-types.

Threads will be grouped by callstack and sorted by tid by default. :ungroup will ungroup all threads; :rsort will reverse the current sorting order within each group (e.g. threads can be reverse sorted by tid).

Faulted threads are always grouped separately from non-faulted threads, and will always appear first in the thread list (faulted threads have an F indicator next to them).
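
The grouping rules can be modeled with the short Python sketch below. The thread data is invented, group_threads is a hypothetical helper, and ordering the non-faulted groups by their lowest tid is an assumption of this sketch (the text above only says that faulted threads come first).

```python
def group_threads(threads, reverse: bool = False) -> list:
    """Group threads by (faulted, callstack); return lists of tids per group."""
    groups = {}
    for thread in threads:
        # Faulted threads never share a group with non-faulted ones.
        key = (thread["faulted"], tuple(thread["callstack"]))
        groups.setdefault(key, []).append(thread)

    # Faulted groups first; remaining order by lowest tid is an assumption.
    ordered = sorted(
        groups.items(),
        key=lambda item: (not item[0][0], min(t["tid"] for t in item[1])),
    )
    # reverse=True models :rsort (reverse tid order within each group).
    return [
        sorted((t["tid"] for t in members), reverse=reverse)
        for _key, members in ordered
    ]


# Hypothetical five-thread process; three workers share one callstack.
THREADS = [
    {"tid": 105, "faulted": False, "callstack": ["worker", "epoll_wait"]},
    {"tid": 101, "faulted": False, "callstack": ["main", "run_loop"]},
    {"tid": 103, "faulted": False, "callstack": ["worker", "epoll_wait"]},
    {"tid": 102, "faulted": True, "callstack": ["worker", "handle", "parse"]},
    {"tid": 104, "faulted": False, "callstack": ["worker", "epoll_wait"]},
]
```

Five threads collapse into three groups here: the faulted thread 102 first, then 101, then the group of 103, 104, and 105.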

Configuration

By default, Hydra looks for a configuration file at ~/.hydra.cf.

Below is a sample configuration for the crash application.

    [scm]
    crash.map=object:^libck[\.-],/home/djoseph/projects/ck
    crash.map=function:^ck_,/home/djoseph/projects/ck
    crash.map=/home/djoseph/projects/crash/src
    crash.trigger=/home/djoseph/projects/crash,version,git -C %s checkout -q %0

    editor=vim +%l %s

    [general]
    alias_detection=true
    collapse_threshold=3

A Deep(er) Dive

Remember all those explanations we punted earlier? Well, if they still haven't been clarified, here they are!

State jumping/linking

Context: Any non-router pane
Commands:

Any state of the top three panes (threads, frames, and variables) may be immediately refocused by "jumping" to its position -- similar to annotation jumping. Press u to show the position URL of the current selection, and feed that into the global :j <position> command to refocus to that state.

Global commands

Immediately run global commands

Context: Hydra command-line parameter
Commands:

All global commands, excluding regex search, may be run immediately on start-up. This is useful for sharing state with other users -- provide them a snapshot and a position URL (via the u command), and have them open it by doing:

    $ hydra <snapshot> -e "j <position>"

Router pane

System

General application and system statistics at the time of the crash. Some examples:

Context

Any contextual data associated with a particular variable, e.g. heap allocation statistics.

Process

Any process-wide metadata associated with the application/crash. This will contain all trace-wide annotations, e.g. heap metadata/statistics.

KVs

All key-value attributes associated with the application/crash. Some of these are automatically generated, but others may be specified by the user via ptrace (see ptrace documentation for more details). Some examples:

Registers

All registers for the currently selected frame.

Pmap entries

The process map entries for the application (e.g. from /proc/<pid>/maps on Linux). The selected entry will change whenever the variable selection changes (to the entry containing the variable).
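
To show what "the entry containing the variable" means, here is a Python sketch that parses lines in the Linux /proc/<pid>/maps format and finds the mapping that holds a given address. The helper names and the sample lines are made up for illustration; how Hydra performs the lookup internally is not stated here.

```python
def parse_maps_line(line: str):
    """Parse one /proc/<pid>/maps line into (start, end, perms, path)."""
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""  # anonymous mappings
    return start, end, fields[1], path


def entry_containing(entries, address: int):
    """Index of the mapping that contains address, or None if there is none."""
    for index, (start, end, _perms, _path) in enumerate(entries):
        if start <= address < end:  # the end address is exclusive
            return index
    return None


# Hypothetical map lines for illustration.
SAMPLE = [
    "00400000-0040b000 r-xp 00000000 08:01 1234 /usr/bin/crash_app",
    "7ffd1c000000-7ffd1c021000 rw-p 00000000 00:00 0 [stack]",
]
ENTRIES = [parse_maps_line(line) for line in SAMPLE]
```

A variable at address 0x7ffd1c000010 would select the second entry, the [stack] mapping.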

Attached files

Commands:

All files attached to the snapshot via ptrace (see ptrace documentation for instructions). Metadata will be shown along with the full path of the file.

Classifiers

The classification(s) of the crash, generated by ptrace (e.g. whether this crash was a segmentation violation, a null dereference, a memory write error, etc.).

Kernel frames

The stack of the most recent kernel frames for the current thread (these were not necessarily executed after the thread's last user-space frame).

Global/tls variables

The variables with global and thread-local storage that were requested at the time of the trace (via ptrace - see ptrace documentation for how to do this). Variables will be organized into a hierarchy of [Thread (for TLS variables)]-[Object]-[CU].

Source code integration

Commands:

Source code for the currently selected frame. Index jumping is supported, but regex searches are not. The initial line selected will be the last-executed line of the frame.

Annotations

All annotations, excluding those of the JSON type, contained in the snapshot. Users may jump to a selected annotation by pressing <enter>.

JSON-type annotations are shown in either the Process or Context router tabs; see those sections for details.

Column specification

Columns for each pane, in order from left to right (panes with a single column or containing simple key-value lists are omitted):