Renaming Subroutine-Blocks and Identifying Them in IDA

As Dan recently mentioned we have been working on a project called IDAScope. Dan has been doing a lot of work on flow and code analysis of functions. While I have been working on analysis of multiple functions and how they relate. One of the goals of IDAScope is to help with speeding up the reversing process. I have been spending a good amount of time analyzing banking trojans. One of the daunting parts of reversing banking malware is the size of the code. Zeus is 1k+ functions, some Ransomware is 1.2k+ and Gozi is .5k+ functions. Banking trojans offer a wide array of functionality. One function could be responsible for browser injection, another for process monitoring, and another for creating a GET request to download a config file. When dealing with so many functions it's easy to get lost. In order to prevent getting lost I'll label each function I reviewed. This helps me know what the function does with a glance. Plus, if all the functions have been labeled it's unlikely that I missed functionality. Sometimes I'll come across a function that contains child functions that all have similar functionality. As long as I can identify the functionality, I can label all the child function and move on. But what about renaming all the functions for an executable that is linked with Openssl, zlib or some other library? It's not feasible to do it by hand but it's easy to do using IDAPython.

The first thing that we will need is the root parent function. Usually this function is identified during the reversing process. I have been working on a script that will automatically identify large block of functions that are linked libraries or stand alone subroutine-blocks. This will be described later on in this post.  Let's use Gozi as an example. One of it's commands in Gozi is called "GET_CERTS". This function will grab a set of certificates such as AuthRoot, Certificate Authority and a couple of others then write them to the %TEMP% directory, zip up the files and where they will wait till they are processed. The zip library is statically linked to the Gozi binary.



The above image is the cross-reference from call graph in IDA for the function that I identified as the root parent for the zip functionality. I don't know if the term is correct but I have started calling the root parent function and all the child nodes, subroutine-blocks. All the functions for a linked library could be thought of as one subroutine-block. They are all used to support some similar functionality or all the child nodes can be traced back to a root function. In the Gozi zip example we have 86 nodes. Let's describe real quick how we could rename all of these functions. First we will need to find the root parent function (check), then a string of what we want to label the function with, then we will need to recursively create a set that contains all the functions down, we will need to remove all APIs names from the set and then we will rename all the functions in the set.

Now let's describe how we can do that using IDAPython. Getting the function and label/tag can be done in IDA with the AskStr('example_name', "Text to ask"). We can recursively "graph_down" by stealing code from Carlos G. Prado on his awesome  Milf-Plugin for IDA. Then we can get a set of all the imported API by using idaapi.get_import_module_name() and idaapi.enum_import_names(). Once we have a set of the code path and a set of all the API names we can remove all intersections from the code path set. This will remove any imported API names because we don't want to rename them. Then from there it's a simple for loop to rename the function name with the MakeNameEx() function. Let's go over how this would look visually in IDA.



Entering the root parent function name that we would like to recursively rename all the child nodes.


Add the tag/label that we would like to add to the start of the function name. Note if the function has already been labeled with the tag it will not be renamed.

 Now all the subroutine-block names have been tagged.  For people who don't want to wait for IDAScope to be released you can download the source code below. Previously in the post I mentioned that I have been working on a script that will automatically identify subroutine-blocks for stand alone code or linked libraries. A side effect of child functions that are part of subroutine-blocks is they are not often called outside of their subroutine-block. This can be seen easily using IDA's proximity view.

Yeah, that was the first time I found use of proximity view also, kind of cool. We can see from the graph all of our zip functions that were previously renamed. If we review all the functions we will notice there are not any calls to and from outside of the subroutine-block. A way that we can automatically identify these subroutine-blocks is to to enumerate all functions, remove all imported functions, for each function we create a self that includes all child functions/nodes that we will call the path, then for each function in the path we check if the cross-reference to the function are outside of the path. If the cross-reference is outside of the path then we know it's not a subroutine-block because it's not self contained. This scenario is actually extremely rare because its too rigid.  In order to help with this we can add a threshold. The threshold is a Python set that is updated if a function contained in the subroutine-block is cross-referenced by another function outside of the subroutine-block. If we allow a threshold of one or two than we can successfully identify sub-routine blocks.

Disclaimer: I don' have much background in algorithms. Nor do I have a background in graph theory. My approach is basically a brute-force approach. I'm sure this could be speed up or optimized. As Dan mentioned to me there are some issues with linear complexity in my algorithm. The speed is decent on IDBs that contain less that 1k functions. After that it can run a little slow. But then again it's much quicker than trying to find it by hand. 

Time for an example.  Note: ALT-F9 is your friend if you are developing scripts in IDA.


The above output is from a python script that can be found below. At the bottom of the output window you can see that we identified the parent function of  what was thought to be the start of the subroutine-block for the zip functions. Hopefully by being able to recursively rename functions and automatically being able to identify those subroutine-blocks it will help you with speeding up the reversing process. Below is the code for anyone who doesn't want to wait for IDAScope to be released.

rec_rename.py - recursively rename function - Download

## This script helps with renaming functions
## See the following link for more info
## Created by alexander.hanel@gmail.com

from idaapi import * 
import idautils
import idc
import sys
imports_list = []

# The following two functions are used to get the import API names.
# ret_list_of_imports() returns the api names in a list
def imp_cb(ea, name, ord):
    global imports_list 
    if not name:
        pass 
    else:
        imports_list.append(name)
    return True

def ret_list_of_imports():
    global imports_list 
    nimps = idaapi.get_import_module_qty()
    for i in xrange(0,nimps):
        name = idaapi.get_import_module_name(i)
        if not name:
            pass 
        idaapi.enum_import_names(i, imp_cb)

    return imports_list


def graph_down(ea, graph = {}, path = set([])):
    # This function was borrowed from Carlos G. Prado. Check out his Milf-Plugin for IDA on Google Code. 
    graph[ea] = list()    # Create a new entry on the graph dictionary {node: [child1, child2, ...], ...}
    path.add(ea)        # This is a set, therefore the add() method

    # Iterate through all function instructions and take only call instructions
    for x in [x for x in FuncItems(ea) if is_call_insn(x)]:        # Take the call elements
            for xref in XrefsFrom(x, XREF_FAR):                                   
                    if not xref.iscode:
                            continue
                                    
                    if xref.to not in path:        # Eliminates recursions
                            graph[ea].append(xref.to)
                            graph_down(xref.to, graph, path)
    return path

def main():
    # Get function name as input.
    func_name = LocByName(AskStr("sub_0xxxxx", "Enter Function Name"))

    if func_name == 0xffffffff:
        Warning("[ERROR] Bad Function Name [ERROR]")
        return

    tag = AskStr("string", "Function Tag")  
    if tag == None:
        Warning("[ERROR] Tag cannot be None [ERROR]")
        return

    list_imports = ret_list_of_imports()
    # graph down needs the address of the function passed. 
    nodes_xref_down = graph_down(func_name,graph = {}, path = set([]))
    # graph_down returns the int address needs to be converted 
    tmp  = []
    tmp1 = ''
    for func in nodes_xref_down:
        tmp1 = GetFunctionName(func)
        if tmp1 != '':
            tmp.append(tmp1)
    nodes_xref_down = tmp

    # Remove the APIs from the xref list 
    for xx in set(list_imports).intersection(set(nodes_xref_down)):
        nodes_xref_down.remove(xx)

    for rename in nodes_xref_down:
        func_addr =  LocByName(rename)
        if tag not in rename:
            MakeNameEx(func_addr, str(tag) + str('_') + rename, SN_NOWARN) 
 
 =============================================================================
 
 sub_blocks.py - subroutine-blocks finder - Download
 
 if __name__ == "__main__":
    main()
## This script will find subroutine-blocks using IDA
## See the following link for more info
## Created by alexander.hanel@gmail.com

from idaapi import * 
import idautils
import idc
import operator
import sys
imports_list = []

sys.setrecursionlimit(2000)

# Get every function name. Returns the function names in a list. This is basically
# the output from the "Function name" Tab/Window 
def get_func_names():
    func_name_list = []
    for x in idautils.Functions(): func_name_list.append(GetFunctionName(x))
    return func_name_list

# The following two functions are used to get the import API names.
# ret_list_of_imports() returns the api names in a list
def imp_cb(ea, name, ord):
    global imports_list 
    if not name:
        pass 
    else:
        imports_list.append(name)
    return True

# 2nd function as described above
def ret_list_of_imports():
    global imports_list 
    nimps = idaapi.get_import_module_qty()
    for i in xrange(0,nimps):
        name = idaapi.get_import_module_name(i)
        if not name:
            pass 
        idaapi.enum_import_names(i, imp_cb)

    return imports_list

# Returns a set of of calls that are found xrefed from
def graph_down(ea, graph = {}, path = set([])):
    # This function was borrowed from Carlos G. Prado. Check out his Milf-Plugin for IDA on Google Code. 
    graph[ea] = list()    # Create a new entry on the graph dictionary {node: [child1, child2, ...], ...}
    path.add(ea)        # This is a set, therefore the add() method

    # Iterate through all function instructions and take only call instructions
    for x in [x for x in FuncItems(ea) if is_call_insn(x)]:        # Take the call elements
            for xref in XrefsFrom(x, XREF_FAR):                                   
                    if not xref.iscode:
                            continue
                                    
                    if xref.to not in path:        # Eliminates recursions
                            graph[ea].append(xref.to)
                            graph_down(xref.to, graph, path)
    return path

def get_nodes(func_name, list_imports):
    threshold = set([])
    # Create list of each node in the xref from
    nodes_xref_down = graph_down(LocByName(func_name),graph = {}, path = set([]))
    # graph_down returns the int address needs to be converted 
    tmp  = []
    tmp1 = ''
    for func in nodes_xref_down:
        tmp1 = GetFunctionName(func)
        if tmp1 != '':
            tmp.append(tmp1)
    nodes_xref_down = tmp
    # Do not want the parent function xrefs to, just the child functions
    nodes_xref_down.remove(func_name)

    # Remove the APIs from the xref list 
    for xx in set(list_imports).intersection(set(nodes_xref_down)):
        nodes_xref_down.remove(xx)

    for nodes in nodes_xref_down:
        # Get all code xrefs to 
        ref_addr = CodeRefsTo(LocByName(nodes),0)

        # For each code xrefs to is not in nodes 
        for func_ref in ref_addr:
            # if func is not in all nodes 
            if GetFunctionName(func_ref) not in nodes_xref_down and GetFunctionName(func_ref) != func_name:
                threshold.add(GetFunctionName(func_ref))
                if len(threshold) == 3:
                    return
    ret = []   
    if len(threshold) < 3 and len(threshold) != 0:
        ret.append(func_name)
        ret.append(len(nodes_xref_down))
        for each in threshold:
            if each == '':
                ret.append('Unknown Function Address')
            else:
                ret.append(each)
    return ret

def sortByColumn(bigList, *args):
    bigList.sort(key=operator.itemgetter(*args))
    # Uncomment and comment above if sort from high to low, rather high to low
    #bigList.sort(key=operator.itemgetter(*args), reverse = True) # sorts the list in place

def main():
    data_o = []
    k = ''
    # List of all import APIs
    list_imports = ret_list_of_imports()
    # List of all function names
    list_func_names = get_func_names()

    # Some API names will be labeled functions by IDA. The below for loop
    # deletes all intersections that reside in both lists. 
    for intersection_function_name in set(list_imports).intersection(set(list_func_names)):
         list_func_names.remove(intersection_function_name)

    # For each func_name in set
    for func_name in list_func_names:
        k = get_nodes(func_name,list_imports)
        if k != None and k != []:
            data_o.append(k)
            
    # Sort by number of child nodes
    sortByColumn(data_o,1)

    for results in data_o:
        print 'Subroutine-Block: %s' % results[0]
        print '\tChild Nodes: %s' % results[1]
        if len(results) >= 3:
            print '\tThreshold Function: %s' %  results[2]
                
        if len(results) == 4:
            print '\tThreshold Function: %s' %  results[3]

    print 'Completed'


if __name__ == "__main__":
    main()

No comments:

Post a Comment