How Are Commands Defined in Rizin Shell?

Rizin, which originated from Radare2, is evolving quickly, with a neat code style and a friendly community. A big step for Rizin is its new shell implementation. Radare2 relies heavily on switch statements to parse commands and dispatch them to the corresponding handlers. This approach gets harder to maintain as the number of commands grows, and that number may keep growing with the various requirements of users.

Simply put, Rizin improves the shell in two main ways. First, it uses a tree-sitter-based parser to parse the input commands instead of switch statements. Second, developers define the commands, their help text, and the names of their handlers in yaml files, which are later processed by Python scripts. In this post, I will explain how the commands defined in yaml files are processed and integrated into the Rizin shell, i.e., the second aspect. This may help anyone who wants to hack on Rizin by adding new commands or by migrating commands from the old mechanism to the new shell.

Defining commands for the new shell

To define the commands, one needs to edit the yaml files under librz/core/cmd_descs, as shown below.

(base) ➜ librz/core/cmd_descs: ls
cmd_analysis.yaml  cmd_descs.c            cmd_descs.yaml  cmd_flirt.yaml       cmd_interpret.yaml  cmd_plugins.yaml  cmd_remote.yaml  cmd_system.yaml  cmd_yank.yaml
cmd_block.yaml     cmd_descs_generate.py  cmd_egg.yaml    cmd_heap_glibc.yaml  cmd_macro.yaml      cmd_print.yaml    cmd_resize.yaml  cmd_tasks.yaml   meson.build
cmd_cmp.yaml       cmd_descs.h            cmd_eval.yaml   cmd_history.yaml     cmd_meta.yaml       cmd_project.yaml  cmd_seek.yaml    cmd_type.yaml    __pycache__
cmd_debug.yaml     cmd_descs_util.py      cmd_flag.yaml   cmd_info.yaml        cmd_open.yaml       cmd_quit.yaml     cmd_shell.yaml   cmd_write.yaml   rzshell_which.py

Commands are grouped into different yaml files by topic. For example, cmd_analysis.yaml contains the definitions of commands starting with a (aac and aad, to name a few), which are mainly responsible for analysis. Let's take a closer look at the content of cmd_analysis.yaml.

    ...
    - name: aae
      summary: Analysis commands using ESIL
      subcommands:
        - name: aae
          summary: Analyze references with ESIL
          cname: analysis_all_esil
          type: RZ_CMD_DESC_TYPE_ARGV
          args:
            - name: len
              type: RZ_CMD_ARG_TYPE_RZNUM
              optional: true
          details:
            - name: Examples
              entries:
                - text: "aae"
                  arg_str: ""
                  comment: analyze ranges given by analysis.in
                - text: "aae"
                  arg_str: " $SS @ $S"
                  comment: analyze the whole section
        - name: aaef
          summary: Analyze references with ESIL in all functions
          cname: analysis_all_esil_functions
          type: RZ_CMD_DESC_TYPE_ARGV
          args: []
    ...

The selected snippet contains the definitions of the commands aae and aaef, as indicated by the lines starting with - name.

Field subcommands

First of all, aae and aaef are the subcommands of the group aae. The commands in the new shell are organized in a parent-child hierarchy. For example, pa and po are subcommands of p, while pae and pad are subcommands of pa. These relationships are represented by the keyword subcommands in the yaml files.
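To make the nesting concrete, here is a minimal Python sketch that walks such a hierarchy. It is not part of Rizin: the dictionary below is a hand-written mirror of the yaml fragment, and iter_commands and format_tree are hypothetical helper names chosen for this illustration.

```python
# Hand-written mirror of the yaml fragment above (not loaded from a real file).
COMMAND_TREE = {
    "name": "aae",
    "summary": "Analysis commands using ESIL",
    "subcommands": [
        {"name": "aae", "summary": "Analyze references with ESIL"},
        {"name": "aaef", "summary": "Analyze references with ESIL in all functions"},
    ],
}


def iter_commands(desc, depth=0):
    """Yield (depth, name, is_group) for a descriptor and all its descendants."""
    children = desc.get("subcommands", [])
    yield depth, desc["name"], bool(children)
    for child in children:
        yield from iter_commands(child, depth + 1)


def format_tree(desc):
    """Render the hierarchy as indented lines, marking entries that have children."""
    lines = []
    for depth, name, is_group in iter_commands(desc):
        marker = " (group)" if is_group else ""
        lines.append("  " * depth + name + marker)
    return lines


if __name__ == "__main__":
    print("\n".join(format_tree(COMMAND_TREE)))
```

The group aae and its child aae share a name, which is why the walk reports the name twice at different depths.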

Field summary

Moreover, we can see the keyword summary. As the word implies, this field gives a short description of the command. The content of summary is shown when one uses ? to look up the usage of a command.

Field cname

There is also the keyword cname, which determines the name of the command's handler. For example, the cname of aae is analysis_all_esil, so the handler that processes the command aae is rz_analysis_all_esil_handler.

Field type

The keyword type describes the kind of the command, for example whether it has subcommands, or whether it is an old-style command that has not been ported to the new shell yet. The possible values and their explanations can be found in cmd.h, as shown below.

typedef enum rz_cmd_desc_type_t {
	/**
	 * For old handlers that parse their own input and accept a single string.
	 * Mainly used for legacy reasons with old command handlers.
	 */
	RZ_CMD_DESC_TYPE_OLDINPUT = 0,
	/**
	 * For handlers that accept argc/argv. It cannot have children. Use
	 * RZ_CMD_DESC_TYPE_GROUP if you need a command that can be both
	 * executed and has sub-commands.
	 */
	RZ_CMD_DESC_TYPE_ARGV,
	/**
	 * For cmd descriptors that are parent of other sub-commands, even if
	 * they may also have a sub-command with the same name. For example,
	 * `wc` is both the parent of `wci`, `wc*`, etc. but there is also `wc`
	 * as a sub-command.
	 */
	RZ_CMD_DESC_TYPE_GROUP,
	/**
	 * For cmd descriptors that are just used to group together related
	 * sub-commands. Do not use this if the command can be used by itself or
	 * if it's necessary to show its help, because this descriptor is not
	 * stored in the hashtable and cannot be retrieved except by listing the
	 * children of its parent. Most of the time you want RZ_CMD_DESC_TYPE_GROUP.
	 */
	RZ_CMD_DESC_TYPE_INNER,
	/**
	 * For entries that shall be shown in the help tree but that are not
	 * commands on their own. `|?`, `@?`, `>?` are example of this. It is
	 * useful to provide help entries for them in the tree, but there are no
	 * command handlers for these. The RzCmdDescDetail in the help can be
	 * used to show fake children of this descriptor.
	 */
	RZ_CMD_DESC_TYPE_FAKE,
	/**
	 * For handlers that accept argc/argv and that provides multiple output
	 * modes (e.g. rizin commands, quiet output, json, long). It cannot have
	 * children. Use RZ_CMD_DESC_TYPE_GROUP if you need a command that can
	 * be both executed and has sub-commands.
	 */
	RZ_CMD_DESC_TYPE_ARGV_MODES,
	/**
	 * For handlers that accept argc/argv and that provides multiple output
	 * modes (e.g. rizin commands, quiet output, json, long). It cannot have
	 * children. Use RZ_CMD_DESC_TYPE_GROUP if you need a command that can
	 * be both executed and has sub-commands.
	 *
	 * Differently from \p RZ_CMD_DESC_TYPE_ARGV_MODES, these handlers receive
	 * an output structure with the mode and data already initialized (e.g. PJ,
	 * RzTable, etc.) and the handler just has to fill the data in those
	 * structure, while RzCmd will allocate, free and print the data within.
	 */
	RZ_CMD_DESC_TYPE_ARGV_STATE,
} RzCmdDescType;

Field args

(repeated here so that you do not have to scroll up)

        args:
          - name: len
            type: RZ_CMD_ARG_TYPE_RZNUM
            optional: true

The field args defines what kinds of arguments this command accepts. The detailed explanations of possible values of type can be found in cmd.h, as shown below. The keywords name and optional define the name of the argument and whether this argument is mandatory, respectively.

/**
 * Type of argument a command handler can have. This is used for visualization
 * in help messages and for autocompletion as well.
 */
typedef enum rz_cmd_arg_type_t {
	RZ_CMD_ARG_TYPE_FAKE, ///< This is not considered a real argument, just used to show something in the help. Name of arg is shown as-is and it is not counted.
	RZ_CMD_ARG_TYPE_NUM, ///< Argument is a number
	RZ_CMD_ARG_TYPE_RZNUM, ///< Argument that can be interpreted by RzNum (numbers, flags, operations, etc.)
	RZ_CMD_ARG_TYPE_STRING, ///< Argument that can be an arbitrary string
	RZ_CMD_ARG_TYPE_ENV, ///< Argument can be the name of an existing rizin variable
	RZ_CMD_ARG_TYPE_CHOICES, ///< Argument can be one of the provided choices
	RZ_CMD_ARG_TYPE_FCN, ///< Argument can be the name of an existing function
	RZ_CMD_ARG_TYPE_FILE, ///< Argument is a filename
	RZ_CMD_ARG_TYPE_OPTION, ///< Argument is an option, prefixed with `-`. It is present or not. No argument.
	RZ_CMD_ARG_TYPE_CMD, ///< Argument is an rizin command
	RZ_CMD_ARG_TYPE_MACRO, ///< Argument is the name of a pre-defined macro
	RZ_CMD_ARG_TYPE_EVAL_KEY, ///< Argument is the name of a evaluable variable (e.g. `et` command)
	RZ_CMD_ARG_TYPE_EVAL_FULL, ///< Argument is the name+(optional)value of a evaluable variable (e.g. `e` command)
	RZ_CMD_ARG_TYPE_FCN_VAR, ///< Argument is the name of a function variable/argument
	RZ_CMD_ARG_TYPE_FLAG, ///< Argument is a rizin flag
	RZ_CMD_ARG_TYPE_ENUM_TYPE, ///< Argument is a C enum type name
	RZ_CMD_ARG_TYPE_STRUCT_TYPE, ///< Argument is a C struct type name
	RZ_CMD_ARG_TYPE_UNION_TYPE, ///< Argument is a C union type name
	RZ_CMD_ARG_TYPE_ALIAS_TYPE, ///< Argument is a C typedef (alias) name
	RZ_CMD_ARG_TYPE_CLASS_TYPE, ///< Argument is a C++/etc class name
	RZ_CMD_ARG_TYPE_ANY_TYPE, ///< Argument is the any of the C or C++ type name
	RZ_CMD_ARG_TYPE_GLOBAL_VAR, ///< Argument is a user defined global variable
	RZ_CMD_ARG_TYPE_REG_FILTER, ///< Argument is a register name, size, type or "all"
	RZ_CMD_ARG_TYPE_REG_TYPE, ///< Argument is a register type/arena like "gpr"
} RzCmdArgType;
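As a rough illustration of what the optional flag implies, the sketch below derives the minimum and maximum number of arguments a command accepts from its args list. It is a simplification written for this post, not the logic of get_minmax_argc in cmd_api.c, which handles more cases. Counting the command name itself as the first element is an assumption; skipping RZ_CMD_ARG_TYPE_FAKE follows the comment in the enum above.

```python
def get_minmax_argc(args):
    """Return (min_argc, max_argc) for a list of yaml-style argument dicts.

    Assumption: the command name counts as argv[0], so both bounds start at 1.
    """
    real_args = [a for a in args if a.get("type") != "RZ_CMD_ARG_TYPE_FAKE"]
    required = sum(1 for a in real_args if not a.get("optional", False))
    return 1 + required, 1 + len(real_args)


def accepts(args, argv):
    """Check whether an argv list (command name included) fits the bounds."""
    min_argc, max_argc = get_minmax_argc(args)
    return min_argc <= len(argv) <= max_argc


if __name__ == "__main__":
    aae_args = [{"name": "len", "type": "RZ_CMD_ARG_TYPE_RZNUM", "optional": True}]
    print(get_minmax_argc(aae_args))
    print(accepts(aae_args, ["aae", "16"]))
```

Under these assumptions, aae can be called with or without len, while aaef, whose args list is empty, takes no arguments at all.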

Field details

        details:
          - name: Examples
            entries:
              - text: "aae"
                arg_str: ""
                comment: analyze ranges given by analysis.in
              - text: "aae"
                arg_str: " $SS @ $S"
                comment: analyze the whole section

Field details is similar to summary but gives more detailed explanations about the command. By typing aae?? in Rizin’s shell, we can see how the above content is printed.

[0x00000000]> aae??
Usage: aae [<len>]   # Analyze references with ESIL

Examples:
| aae          # analyze ranges given by analysis.in
| aae $SS @ $S # analyze the whole section
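The mapping from the yaml fields to this help text can be sketched in a few lines of Python. This is only an approximation of the layout seen above, not Rizin's actual help renderer; format_usage and format_details are hypothetical names, and the dictionary is a hand-written copy of the aae definition.

```python
AAE = {
    "name": "aae",
    "summary": "Analyze references with ESIL",
    "args": [{"name": "len", "type": "RZ_CMD_ARG_TYPE_RZNUM", "optional": True}],
    "details": [
        {
            "name": "Examples",
            "entries": [
                {"text": "aae", "arg_str": "",
                 "comment": "analyze ranges given by analysis.in"},
                {"text": "aae", "arg_str": " $SS @ $S",
                 "comment": "analyze the whole section"},
            ],
        }
    ],
}


def format_usage(desc):
    """Build the 'Usage:' line from the name, args and summary fields."""
    parts = [desc["name"]]
    for arg in desc.get("args", []):
        shown = "<" + arg["name"] + ">"
        if arg.get("optional", False):
            shown = "[" + shown + "]"  # optional arguments are bracketed
        parts.append(shown)
    return "Usage: " + " ".join(parts) + "   # " + desc["summary"]


def format_details(desc):
    """Build the detail sections, padding commands so the comments line up."""
    lines = []
    for section in desc.get("details", []):
        lines.append(section["name"] + ":")
        entries = section["entries"]
        width = max(len(e["text"] + e["arg_str"]) for e in entries)
        for e in entries:
            cmd = (e["text"] + e["arg_str"]).ljust(width)
            lines.append("| " + cmd + " # " + e["comment"])
    return lines


if __name__ == "__main__":
    print(format_usage(AAE))
    print()
    print("\n".join(format_details(AAE)))
```

Note how text and arg_str are simply concatenated, which is why arg_str carries its own leading space in the yaml file.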

How do the definitions integrate into Rizin?

This section will discuss how the definitions in these yaml files integrate into Rizin. There is the file meson.build, which describes the building steps, under the directory containing these yaml files. Part of this build file is shown below.

cmd_descs_generate_py = files('cmd_descs_generate.py')
cmd_descs_yaml = files(
  'cmd_analysis.yaml',
  'cmd_block.yaml',
  ...
)
...
  cmd_descs_ch = custom_target(
    'cmd_descs.[ch]',
    output: ['cmd_descs.c', 'cmd_descs.h'],
    input: cmd_descs_yaml,
    command: [py3_exe, cmd_descs_generate_py, '--output-dir', '@OUTDIR@', '--src-output-dir', meson.current_source_dir(), '@INPUT@']
  )

We can see that during the build, cmd_descs_generate.py is invoked to process the yaml files. In particular, cmd_descs_generate.py parses the yaml files and generates two C files, i.e., cmd_descs.h and cmd_descs.c. Moreover, the variable cmd_descs_ch represents the two generated files. This variable is further referenced by the meson.build in the parent directory for building.
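To give a feeling for what "generating C from yaml" means, the following sketch emits one registration statement of the form found in the generated cmd_descs.c for an ARGV command. It is a toy written for this post and not taken from cmd_descs_generate.py: INIT_TEMPLATE and emit_argv_registration are hypothetical names, and the real script covers every descriptor type as well as the help structures.

```python
# Shape of one generated registration statement for an ARGV command.
INIT_TEMPLATE = (
    "RzCmdDesc *{cname}_cd = rz_cmd_desc_argv_new(core->rcmd, {parent}_cd, "
    '"{name}", rz_{cname}_handler, &{cname}_help);'
)


def emit_argv_registration(desc, parent_cname):
    """Return the C statement that registers `desc` under its parent descriptor."""
    return INIT_TEMPLATE.format(
        cname=desc["cname"],
        parent=parent_cname,
        name=desc["name"],
    )


if __name__ == "__main__":
    aaef = {"name": "aaef", "cname": "analysis_all_esil_functions"}
    print(emit_argv_registration(aaef, "aae"))
```

The cname is reused three times: for the C variable holding the descriptor, for the handler, and for the help structure.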

In addition, if the cname is analysis_all_esil, the corresponding handler of the command is rz_analysis_all_esil_handler. The prefix and suffix are added in cmd_descs_util.py, which is imported by cmd_descs_generate.py, as shown below.

def get_handler_cname(ty, handler, cname):
    if ty == CD_TYPE_OLDINPUT:
        return "rz_" + (handler or cname)

    return "rz_" + (handler or cname) + "_handler"
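The function can be tried in isolation. In the sketch below, the values of the CD_TYPE_* constants are assumptions (the enum names as strings); only the body of get_handler_cname is copied from the snippet above. The handler parameter, when set, takes precedence over cname.

```python
# Assumption: descriptor types are represented by their C enum names.
CD_TYPE_OLDINPUT = "RZ_CMD_DESC_TYPE_OLDINPUT"
CD_TYPE_ARGV = "RZ_CMD_DESC_TYPE_ARGV"


def get_handler_cname(ty, handler, cname):
    # Old-style handlers keep the bare name, without the "_handler" suffix.
    if ty == CD_TYPE_OLDINPUT:
        return "rz_" + (handler or cname)

    return "rz_" + (handler or cname) + "_handler"


if __name__ == "__main__":
    print(get_handler_cname(CD_TYPE_ARGV, None, "analysis_all_esil"))
    print(get_handler_cname(CD_TYPE_OLDINPUT, None, "some_old_cmd"))
```

The names some_old_cmd and custom used here and in any experiments are made up; they do not refer to real Rizin commands.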

By analyzing the templates of the generated cmd_descs.c defined in cmd_descs_generate.py, we can see it includes three parts, as shown below. The first two parts, including {helps_declarations} and {helps}, declare and define the help information of commands, which is defined in the fields summary and details in the yaml files.

CMDDESCS_C_TEMPLATE = """// SPDX-FileCopyrightText: 2021 RizinOrg <info@rizin.re>
// SPDX-License-Identifier: LGPL-3.0-only
//
// WARNING: This file was auto-generated by cmd_descs_generate.py script. Do not
// modify it manually. Look at cmd_descs.yaml if you want to update commands.
//

#include <cmd_descs.h>

{helps_declarations}

{helps}
RZ_IPI void rzshell_cmddescs_init(RzCore *core) {{
\tRzCmdDesc *root_cd = rz_cmd_get_root(core->rcmd);
\trz_cmd_batch_start(core->rcmd);
{init_code}
\trz_cmd_batch_end(core->rcmd);
}}
"""

The last part {init_code} stores the defined handler and help information into the RzCore, which is an important structure during the execution. In particular, the call chain of the statement within {init_code} is rz_cmd_desc_argv_new -> argv_new -> create_cmd_desc. Within create_cmd_desc, the statement ht_pp_insert(cmd->ht_cmds, name, res) stores the name (the name of the command) and res (RzCmdDesc, stores the command’s handler and help information) as a pair into cmd->ht_cmds (core->rcmd->ht_cmds). The related code snippets are shown below.

// statement within {init_code}
RzCmdDesc *analysis_all_esil_functions_cd = rz_cmd_desc_argv_new(core->rcmd, aae_cd, "aaef", rz_analysis_all_esil_functions_handler, &analysis_all_esil_functions_help);

// cmd_api.c
RZ_API RzCmdDesc *rz_cmd_desc_argv_new(RzCmd *cmd, RzCmdDesc *parent, const char *name, RzCmdArgvCb cb, const RzCmdDescHelp *help) {
	rz_return_val_if_fail(cmd && parent && name && help && help->args, NULL);
	return argv_new(cmd, parent, name, cb, help, true);
}

// cmd_api.c
static RzCmdDesc *argv_new(RzCmd *cmd, RzCmdDesc *parent, const char *name, RzCmdArgvCb cb, const RzCmdDescHelp *help, bool ht_insert) {
	RzCmdDesc *res = create_cmd_desc(cmd, parent, RZ_CMD_DESC_TYPE_ARGV, name, help, ht_insert);
	if (!res) {
		return NULL;
	}

	res->d.argv_data.cb = cb;
	get_minmax_argc(res, &res->d.argv_data.min_argc, &res->d.argv_data.max_argc);
	return res;
}

// cmd_api.c
static RzCmdDesc *create_cmd_desc(RzCmd *cmd, RzCmdDesc *parent, RzCmdDescType type, const char *name, const RzCmdDescHelp *help, bool ht_insert) {
	RzCmdDesc *res = RZ_NEW0(RzCmdDesc);
	if (!res) {
		return NULL;
	}
	res->type = type;
	res->name = strdup(name);
	if (!res->name) {
		goto err;
	}
	res->n_children = 0;
	res->help = help ? help : &not_defined_help;
	rz_pvector_init(&res->children, (RzPVectorFree)cmd_desc_free);
	if (ht_insert && !ht_pp_insert(cmd->ht_cmds, name, res)) {
		goto err;
	}
	cmd_desc_set_parent(cmd, res, parent);
	return res;
err:
	cmd_desc_free(res);
	return NULL;
}
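The registration logic above can be modeled in Python to see the data structures at work. This is a deliberately small model, not a port: the class and method names are invented for this post, a dict stands in for the ht_cmds hashtable, and treating a duplicate name as the reason for a failed insertion is an assumption. Passing the command name as the first element of argv is also an assumption.

```python
class CmdDesc:
    """Toy counterpart of RzCmdDesc: a name, a handler, help and tree links."""

    def __init__(self, name, handler=None, help_text=None):
        self.name = name
        self.handler = handler
        self.help_text = help_text
        self.parent = None
        self.children = []


class Cmd:
    """Toy counterpart of RzCmd: a root descriptor plus a name lookup table."""

    def __init__(self):
        self.root = CmdDesc("")
        self.ht_cmds = {}

    def desc_argv_new(self, parent, name, handler, help_text):
        # Mirror create_cmd_desc: a failed insertion aborts the creation
        # before the descriptor is attached to its parent.
        if name in self.ht_cmds:
            return None
        desc = CmdDesc(name, handler, help_text)
        self.ht_cmds[name] = desc
        desc.parent = parent
        parent.children.append(desc)
        return desc

    def call(self, name, *args):
        """Look the command up by name and run its handler."""
        desc = self.ht_cmds.get(name)
        if desc is None or desc.handler is None:
            raise KeyError("unknown command: " + name)
        return desc.handler([name, *args])


if __name__ == "__main__":
    cmd = Cmd()
    cmd.desc_argv_new(cmd.root, "aaef", lambda argv: "ran " + " ".join(argv), "help")
    print(cmd.call("aaef"))
```

Once a descriptor is stored under its name, executing a command reduces to a table lookup followed by a call to the stored handler, which is the main difference from dispatching through switch statements.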

Moreover, we can see that the function rzshell_cmddescs_init contains {init_code}, so Rizin has to invoke rzshell_cmddescs_init somewhere. The corresponding call chain is rz_core_new -> rz_core_init -> rz_core_cmd_init -> rzshell_cmddescs_init. That is to say, Rizin initializes the defined commands during the creation of RzCore.