# capa
capa detects capabilities in executable files.
You run it against a .exe or .dll and it tells you what it thinks the program can do.
For example, it might suggest that the file is a backdoor, is capable of installing services, or relies on HTTP to communicate.
```
$ capa.exe suspicious.exe
+------------------------+----------------------------------------------------------------------+
| ATT&CK Tactic          | ATT&CK Technique                                                     |
|------------------------+----------------------------------------------------------------------|
| DEFENSE EVASION        | Obfuscated Files or Information [T1027]                              |
| DISCOVERY              | Query Registry [T1012]                                               |
|                        | System Information Discovery [T1082]                                 |
| EXECUTION              | Command and Scripting Interpreter::Windows Command Shell [T1059.003] |
|                        | Shared Modules [T1129]                                               |
| EXFILTRATION           | Exfiltration Over C2 Channel [T1041]                                 |
| PERSISTENCE            | Create or Modify System Process::Windows Service [T1543.003]         |
+------------------------+----------------------------------------------------------------------+
+-------------------------------------------------------+-------------------------------------------------+
| CAPABILITY                                            | NAMESPACE                                       |
|-------------------------------------------------------+-------------------------------------------------|
| check for OutputDebugString error                     | anti-analysis/anti-debugging/debugger-detection |
| read and send data from client to server              | c2/file-transfer                                |
| execute shell command and capture output              | c2/shell                                        |
| receive data (2 matches)                              | communication                                   |
| send data (6 matches)                                 | communication                                   |
| connect to HTTP server (3 matches)                    | communication/http/client                       |
| send HTTP request (3 matches)                         | communication/http/client                       |
| create pipe                                           | communication/named-pipe/create                 |
| get socket status (2 matches)                         | communication/socket                            |
| receive data on socket (2 matches)                    | communication/socket/receive                    |
| send data on socket (3 matches)                       | communication/socket/send                       |
| connect TCP socket                                    | communication/socket/tcp                        |
| encode data using Base64                              | data-manipulation/encoding/base64               |
| encode data using XOR (6 matches)                     | data-manipulation/encoding/xor                  |
| run as a service                                      | executable/pe                                   |
| get common file path (3 matches)                      | host-interaction/file-system                    |
| read file                                             | host-interaction/file-system/read               |
| write file (2 matches)                                | host-interaction/file-system/write              |
| print debug messages (2 matches)                      | host-interaction/log/debug/write-event          |
| resolve DNS                                           | host-interaction/network/dns/resolve            |
| get hostname                                          | host-interaction/os/hostname                    |
| create a process with modified I/O handles and window | host-interaction/process/create                 |
| create process                                        | host-interaction/process/create                 |
| create registry key                                   | host-interaction/registry/create                |
| create service                                        | host-interaction/service/create                 |
| create thread                                         | host-interaction/thread/create                  |
| persist via Windows service                           | persistence/service                             |
+-------------------------------------------------------+-------------------------------------------------+
```
# download
Download capa from the [Releases](/releases) page or get the nightly builds here:
- Windows 64bit: TODO
- Windows 32bit: TODO
- Linux: TODO
- OSX: TODO
# contents
- [installation](#installation)
- [example](#example)
- [rule format](#rule-format)
  - [meta block](#meta-block)
  - [features block](#features-block)
- [extracted features](#extracted-features)
  - [function features](#function-features)
    - [api](#api)
    - [number](#number)
    - [string](#string)
    - [bytes](#bytes)
    - [offset](#offset)
    - [mnemonic](#mnemonic)
    - [characteristics](#characteristics)
  - [file features](#file-features)
    - [string](#file-string)
    - [export](#export)
    - [import](#import)
    - [section](#section)
  - [counting](#counting)
  - [matching prior rule matches](#matching-prior-rule-matches)
- [limitations](#limitations)
# installation
See [doc/installation.md](doc/installation.md) for information on how to set up the project, including how to use it as a Python library.
For more information about how to use capa, including running it as an IDA script/plugin, see [doc/usage.md](doc/usage.md).
# example
Here we run capa against an unknown binary (`suspicious.exe`),
and the tool reports that the program can decode data via XOR,
contains an embedded PE, writes to a file, and spawns a new process.
Taken together, this makes us think that `suspicious.exe` could be a dropper or backdoor.
Therefore, our next analysis step might be to run `suspicious.exe` in a sandbox and try to recover the payload.
```
$ capa.exe suspicious.exe
+------------------------+----------------------------------------------------------------------+
| ATT&CK Tactic          | ATT&CK Technique                                                     |
|------------------------+----------------------------------------------------------------------|
| DEFENSE EVASION        | Obfuscated Files or Information [T1027]                              |
| DISCOVERY              | Query Registry [T1012]                                               |
|                        | System Information Discovery [T1082]                                 |
| EXECUTION              | Command and Scripting Interpreter::Windows Command Shell [T1059.003] |
|                        | Shared Modules [T1129]                                               |
| EXFILTRATION           | Exfiltration Over C2 Channel [T1041]                                 |
| PERSISTENCE            | Create or Modify System Process::Windows Service [T1543.003]         |
+------------------------+----------------------------------------------------------------------+
+-------------------------------------------------------+-------------------------------------------------+
| CAPABILITY                                            | NAMESPACE                                       |
|-------------------------------------------------------+-------------------------------------------------|
| check for OutputDebugString error                     | anti-analysis/anti-debugging/debugger-detection |
| read and send data from client to server              | c2/file-transfer                                |
| execute shell command and capture output              | c2/shell                                        |
| receive data (2 matches)                              | communication                                   |
| send data (6 matches)                                 | communication                                   |
| connect to HTTP server (3 matches)                    | communication/http/client                       |
| send HTTP request (3 matches)                         | communication/http/client                       |
| create pipe                                           | communication/named-pipe/create                 |
| get socket status (2 matches)                         | communication/socket                            |
| receive data on socket (2 matches)                    | communication/socket/receive                    |
| send data on socket (3 matches)                       | communication/socket/send                       |
| connect TCP socket                                    | communication/socket/tcp                        |
| encode data using Base64                              | data-manipulation/encoding/base64               |
| encode data using XOR (6 matches)                     | data-manipulation/encoding/xor                  |
| run as a service                                      | executable/pe                                   |
| contain an embedded PE file                           | executable/subfile/pe                           |
| get common file path (3 matches)                      | host-interaction/file-system                    |
| read file                                             | host-interaction/file-system/read               |
| write file (2 matches)                                | host-interaction/file-system/write              |
| print debug messages (2 matches)                      | host-interaction/log/debug/write-event          |
| resolve DNS                                           | host-interaction/network/dns/resolve            |
| get hostname                                          | host-interaction/os/hostname                    |
| create a process with modified I/O handles and window | host-interaction/process/create                 |
| create process                                        | host-interaction/process/create                 |
| create registry key                                   | host-interaction/registry/create                |
| create service                                        | host-interaction/service/create                 |
| create thread                                         | host-interaction/thread/create                  |
| persist via Windows service                           | persistence/service                             |
+-------------------------------------------------------+-------------------------------------------------+
```
By passing the `-vv` flag (for Very Verbose), capa reports exactly where it found evidence of these capabilities.
This is useful for at least two reasons:
- it helps explain why we should trust the results, and enables us to verify the conclusions, and
- it shows where within the binary an experienced analyst might study with IDA Pro
```
λ capa.exe suspicious.exe -vv
execute shell command and capture output
namespace   c2/shell
author      matthew.williams@fireeye.com
scope       function
att&ck      Execution::Command and Scripting Interpreter::Windows Command Shell [T1059.003]
references  https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/ns-processthreadsapi-startupinfoa
examples    Practical Malware Analysis Lab 14-02.exe_:0x4011C0
function @ 0x10003A13
  and:
    match: create a process with modified I/O handles and window @ 0x10003A13
      and:
        or:
          api: kernel32.CreateProcess @ 0x10003D6D
        number: 0x101 @ 0x10003B03
        or:
          number: 0x44 @ 0x10003ADC
        optional:
          api: kernel32.GetStartupInfo @ 0x10003AE4
    match: create pipe @ 0x10003A13
      or:
        api: kernel32.CreatePipe @ 0x10003ACB
    or:
      string: cmd.exe /c @ 0x10003AED
...
```
# rule format
capa uses a collection of rules to identify capabilities within a program.
These rules are easy to write, even for those new to reverse engineering.
By authoring rules, you can extend the capabilities that capa recognizes.
In some regards, capa rules are a mixture of the OpenIOC, Yara, and YAML formats.
Here's an example rule used by capa:
```
# File: rules/data-manipulation/checksum/crc32/chechsum-data-with-crc32.yml
rule:
  meta:
    name: checksum data with CRC32
    namespace: data-manipulation/checksum/crc32
    author: moritz.raabe@fireeye.com
    scope: function
    examples:
      - 2D3EDC218A90F03089CC01715A9F047F:0x403CBD
      - 7D28CB106CB54876B2A5C111724A07CD:0x402350  # RtlComputeCrc32
  features:
    - or:
      - and:
        - mnemonic: shr
        - number: 0xEDB88320
        - number: 8
        - characteristic(nzxor): true
      - api: RtlComputeCrc32
```
Rules are YAML files that follow a certain schema.
The top-level element is a dictionary named `rule` with two required child dictionaries:
`meta` and `features`.
## meta block
The meta block contains metadata that identifies the rule, groups the technique,
and provides references to additional documentation.
Here are the common fields:
- `name` is required. This string should uniquely identify the rule.
- `namespace` is required when a rule describes a technique (as opposed to matching a role or disposition).
  The namespace helps us group rules into buckets, such as `host-manipulation/file-system` or `impact/wipe-disk`.
  When capa emits its final report, it orders the results by category, so related techniques show up together.
- `att&ck` is an optional list of [ATT&CK framework](https://attack.mitre.org/) techniques that the rule implies, like
  `Discovery::Query Registry [T1012]` or `Persistence::Create or Modify System Process::Windows Service [T1543.003]`.
  These tags are used to derive the ATT&CK mapping for the sample when the report gets rendered.
- `mbc` is an optional list of [Malware Behavior Catalog](https://github.com/MBCProject/mbc-markdown) techniques that the rule implies,
  like the ATT&CK list.
- `maec/malware-category` is required when the rule describes a role, such as `dropper` or `backdoor`.
- `maec/analysis-conclusion` is required when the rule describes a disposition, such as `benign` or `malicious`.
- `scope` indicates to which feature set this rule applies.
  It can take the following values:
  - **`basic block`:** limits matches to a basic block.
    It is used to achieve locality in rules (for example, for the parameters of a function).
  - **`function`:** identifies functions.
    It doesn't support child functions (see [doc/limitations.md](doc/limitations.md#wrapper-functions-and-matches-in-child-functions)).
    It is the default.
  - **`file`:** matches file format aspects.
  - **`program`:** *matches the matches* of `function` and `file` scopes.
    Not yet implemented.
- `author` specifies the name or handle of the rule author.
- `examples` is a required list of references to samples that should match the capability.
  When the rule scope is `function`, then the reference should be `<sample hash>:<function va>`.
- `references` lists related information in a book, article, blog post, etc.

Other fields are allowed but not defined in this specification. `description` is probably a good one.
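
The field requirements above lend themselves to a mechanical check. The following Python sketch is not capa's own validation logic: `check_meta` is a hypothetical helper that takes a rule already loaded into a plain dictionary, and the check that one of `namespace`, `maec/malware-category`, or `maec/analysis-conclusion` is present is an assumption drawn from the field descriptions.

```python
def check_meta(rule_doc):
    """Return a list of problems found in a rule (hypothetical helper)."""
    problems = []
    rule = rule_doc.get("rule", {})
    # The top-level `rule` dictionary needs both `meta` and `features`.
    for key in ("meta", "features"):
        if key not in rule:
            problems.append(f"missing required block: {key}")
    meta = rule.get("meta", {})
    # `name` and `examples` are described as required.
    for field in ("name", "examples"):
        if not meta.get(field):
            problems.append(f"missing required meta field: {field}")
    # Assumption: a rule describes a technique (namespace), a role
    # (maec/malware-category), or a disposition (maec/analysis-conclusion).
    kinds = ("namespace", "maec/malware-category", "maec/analysis-conclusion")
    if not any(k in meta for k in kinds):
        problems.append("expected one of: " + ", ".join(kinds))
    # `function` is documented as the default scope.
    scope = meta.get("scope", "function")
    if scope not in ("basic block", "function", "file", "program"):
        problems.append(f"unknown scope: {scope}")
    return problems
```

An empty list means the rule passed these basic checks; anything else names the missing or unexpected field.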
## features block
This section declares logical statements about the features that must exist for the rule to match.
There are five structural expressions that may be nested:
- `and` - all of the children expressions must match
- `or` - match at least one of the children
- `not` - match when the child expression does not
- `N or more` - match at least `N` of the children
- `optional` is an alias for `0 or more`, which is useful for documenting related features. See [write-file.yml](/rules/machine-access-control/file-manipulation/write-file.yml) for an example.
For example, consider the following rule:
```
- and:
  - mnemonic: shr
  - number: 0xEDB88320
  - number: 8
  - characteristic(nzxor): true
```
For this to match, the function must:
- contain an `shr` instruction, and
- reference the immediate constant `0xEDB88320`, which some may recognize as related to the CRC32 checksum, and
- reference the number `8`, and
- have an unusual feature, in this case, contain a non-zeroing XOR instruction
If any one of these features is missing from a function, the rule will not match.
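
To make the semantics of the structural expressions concrete, here is a minimal Python sketch of how a nested statement could be evaluated against a set of extracted features. It is an illustration under simplifying assumptions, not capa's engine: `evaluate` is a hypothetical function, statements are plain dictionaries with a single key, and features are `(kind, value)` tuples.

```python
def evaluate(node, features):
    """Evaluate a nested statement against a set of (kind, value) features."""
    if not isinstance(node, dict):
        raise TypeError(f"unexpected node: {node!r}")
    (key, children), = node.items()
    if key == "and":
        return all(evaluate(c, features) for c in children)
    if key == "or":
        return any(evaluate(c, features) for c in children)
    if key == "not":
        return not evaluate(children[0], features)
    if key == "optional":
        return True  # alias for "0 or more": never blocks a match
    if key.endswith(" or more"):
        needed = int(key.split()[0])
        return sum(evaluate(c, features) for c in children) >= needed
    # Anything else is treated as a leaf feature such as {"mnemonic": "shr"}.
    return (key, children) in features


crc32 = {"and": [
    {"mnemonic": "shr"},
    {"number": 0xEDB88320},
    {"number": 8},
    {"characteristic(nzxor)": True},
]}
full = {("mnemonic", "shr"), ("number", 0xEDB88320), ("number", 8),
        ("characteristic(nzxor)", True)}
partial = {("mnemonic", "shr")}

print(evaluate(crc32, full))     # True
print(evaluate(crc32, partial))  # False
```

The second call fails because `and` requires every child to match, and only the `shr` mnemonic is present.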
## limitations
### circular rule dependencies
While capa supports [matching on prior rule matches](#matching-prior-rule-matches), users should ensure that their rules do not introduce circular dependencies between rules.
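
One way to check a rule set for circular dependencies before running it is a depth-first search over the `match` references. The sketch below is a generic cycle detector, not code taken from capa; `find_cycle` and the shape of its input (a mapping from rule name to the rule names it references) are assumptions made for illustration.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of rule names, or None.

    `dependencies` maps a rule name to the names it references via `match`.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(name):
        state[name] = GREY
        path.append(name)
        for dep in dependencies.get(name, ()):
            if state.get(dep, WHITE) == GREY:
                # `dep` is already on the current path: we closed a loop.
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[name] = BLACK
        return None

    for name in dependencies:
        if state.get(name, WHITE) == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None
```

For example, `find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})` returns `["a", "b", "c", "a"]`, while a rule set without loops yields `None`.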
# extracted features
## function features
capa extracts features from the disassembly of a function, such as which API functions are called.
The tool also reasons about the code structure to guess at function-level constructs.
These are the features supported at the function-scope:
- [api](#api)
- [number](#number)
- [string](#string)
- [bytes](#bytes)
- [offset](#offset)
- [mnemonic](#mnemonic)
- [characteristics](#characteristics)
### api
A call to a named function, probably an import,
though possibly a local function (like `malloc`) extracted via FLIRT.
The parameter is a string describing the function name, specified like `module.functionname` or `functionname`.
Windows API functions that take string arguments come in two API versions. For example, `CreateProcessA` takes ANSI strings and `CreateProcessW` takes Unicode strings. capa extracts these API features both with and without the suffix character `A` or `W`. That means you can write a rule to match on both APIs using the base name. If you want to match a specific API version, you can include the suffix.
Example:

    api: kernel32.CreateFile  # matches both ANSI (CreateFileA) and Unicode (CreateFileW) versions
    api: CreateFile
    api: GetEnvironmentVariableW  # only matches on Unicode version
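
The suffix handling described above can be pictured as expanding each resolved symbol into several feature names. This Python sketch is an assumption about how such an expansion might look, not capa's extractor; `api_feature_names` is a hypothetical helper, and its suffix test is deliberately naive (it strips any trailing `A` or `W`).

```python
def api_feature_names(symbol):
    """Return the names under which an API call could be exposed as a feature.

    Hypothetical helper: `symbol` looks like "kernel32.CreateProcessA"
    or "CreateFileW".
    """
    module, _, name = symbol.rpartition(".")
    names = [name]
    # Also offer the base name when the symbol ends in an A/W suffix.
    if len(name) > 1 and name[-1] in "AW":
        names.append(name[:-1])
    # Each name is exposed with and without the module prefix.
    result = set(names)
    if module:
        result.update(f"{module}.{n}" for n in names)
    return result
```

With this expansion, a rule that names `kernel32.CreateProcess` or `CreateProcess` matches a call to `kernel32.CreateProcessA`, while a rule that names `CreateProcessW` does not.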
### number
A number used by the logic of the program.
This should not be a stack or structure offset.
For example, a crypto constant.
The parameter is a number; if prefixed with `0x` then in hex format, otherwise, decimal format.
To associate context with a number, e.g. for constant definitions, append an equal sign and the respective name to
the number definition. This helps with documenting rules and provides context in capa's output.
Examples:

    number: 16
    number: 0x10
    number: 0x40 = PAGE_EXECUTE_READWRITE

Note that capa treats all numbers as unsigned values. A negative number is not a valid feature value.
To match a negative number you may specify its two's complement representation. For example, `0xFFFFFFF0` (`-16`) in a 32-bit file.
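
When writing a rule for a negative constant, the two's complement value can be computed rather than worked out by hand. The Python sketch below shows that arithmetic, plus one possible way to split a `number` value from its optional description; `parse_number` and `as_unsigned` are hypothetical helpers, not part of capa.

```python
def parse_number(spec):
    """Split a spec such as "0x40 = PAGE_EXECUTE_READWRITE" into (value, description)."""
    text, _, description = spec.partition("=")
    # Base 0 lets int() accept both "16" and "0x10".
    value = int(text.strip(), 0)
    if value < 0:
        raise ValueError("negative numbers are not valid feature values")
    return value, description.strip() or None


def as_unsigned(value, bits=32):
    """Return the two's complement representation of a signed value."""
    return value & ((1 << bits) - 1)


print(parse_number("0x40 = PAGE_EXECUTE_READWRITE"))  # (64, 'PAGE_EXECUTE_READWRITE')
print(hex(as_unsigned(-16)))      # 0xfffffff0
print(hex(as_unsigned(-2)))       # 0xfffffffe
print(hex(as_unsigned(-2, 64)))   # 0xfffffffffffffffe
```

The bit width matters: the same negative value has a different unsigned representation in a 32-bit and a 64-bit file.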
### string
A string referenced by the logic of the program.
This is probably a pointer to an ASCII or Unicode string.
This could also be an obfuscated string, for example a stack string.
The parameter is a string describing the string.
This can be the verbatim value, or a regex matching the string.
Regexes should be surrounded with `/` characters.
By default, capa uses case-sensitive matching and assumes leading and trailing wildcards.
To perform case-insensitive matching append an `i`. To anchor the regex at the start or end of a string, use `^` and/or `$`.
Examples:

    string: This program cannot be run in DOS mode.
    string: Firefox 64.0
    string: /SELECT.*FROM.*WHERE/
    string: /Hardware\\Description\\System\\CentralProcessor/i

Note that regex matching is expensive (`O(features)` rather than `O(1)`), so regexes should be used sparingly.
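
The matching rules above (verbatim by default, `/.../` for a regex, a trailing `i` for case-insensitivity, implicit leading and trailing wildcards) can be sketched in a few lines of Python. This is an illustration of the described behaviour using the standard `re` module, not capa's implementation; `string_matches` is a hypothetical name.

```python
import re


def string_matches(spec, candidate):
    """Check one extracted string against a `string:` value (illustrative only)."""
    m = re.fullmatch(r"/(.*)/(i?)", spec, re.DOTALL)
    if m is None:
        # Not wrapped in slashes: compare verbatim.
        return spec == candidate
    pattern, flag = m.groups()
    flags = re.IGNORECASE if flag else 0
    # re.search matches anywhere in the candidate, which gives the implicit
    # leading and trailing wildcards; ^ and $ in the pattern still anchor it.
    return re.search(pattern, candidate, flags) is not None


print(string_matches("/SELECT.*FROM.*WHERE/", "x SELECT a FROM b WHERE c"))  # True
print(string_matches("/select/", "SELECT"))    # False
print(string_matches("/select/i", "SELECT"))   # True
print(string_matches("Firefox 64.0", "Mozilla Firefox 64.0"))  # False
```

A verbatim value can be looked up directly in a set of extracted strings, whereas a regex has to be tried against each string in turn, which is where the cost difference comes from.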
### bytes
A sequence of bytes referenced by the logic of the program.
The provided sequence must match from the beginning of the referenced bytes and be no more than `0x100` bytes.
The parameter is a sequence of hexadecimal bytes followed by an optional description.
The example below illustrates byte matching given a COM CLSID pushed onto the stack prior to `CoCreateInstance`.
Disassembly:

    push    offset iid_004118d4_IShellLinkA ; riid
    push    1               ; dwClsContext
    push    0               ; pUnkOuter
    push    offset clsid_004118c4_ShellLink ; rclsid
    call    ds:CoCreateInstance

Example rule elements:

    bytes: 01 14 02 00 00 00 00 00 C0 00 00 00 00 00 00 46 = CLSID_ShellLink
    bytes: EE 14 02 00 00 00 00 00 C0 00 00 00 00 00 00 46 = IID_IShellLink
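
The byte sequences in these rule elements are the in-memory (little-endian) layout of the GUIDs. The Python sketch below shows how such a sequence can be derived with the standard `uuid` module and how prefix matching with the `0x100` limit might be checked; `bytes_match` is a hypothetical helper, not capa's matcher, and the GUID string is the well-known value of `CLSID_ShellLink`.

```python
import uuid


def bytes_match(spec, referenced):
    """Return True if `referenced` starts with the bytes written in `spec`.

    `spec` is the text after `bytes:`, for example "01 14 02 00 = CLSID_ShellLink".
    """
    hex_part, _, _description = spec.partition("=")
    wanted = bytes.fromhex(hex_part)
    if len(wanted) > 0x100:
        raise ValueError("byte sequences are limited to 0x100 bytes")
    # The match must start at the beginning of the referenced bytes.
    return referenced.startswith(wanted)


# CLSID_ShellLink as it is laid out in memory (little-endian GUID fields).
clsid_shell_link = uuid.UUID("00021401-0000-0000-C000-000000000046").bytes_le

spec = "01 14 02 00 00 00 00 00 C0 00 00 00 00 00 00 46 = CLSID_ShellLink"
print(bytes_match(spec, clsid_shell_link))  # True
```

Because only a prefix is compared, a shorter sequence such as `01 14 02 00` would also match the same data.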
### offset
A structure offset referenced by the logic of the program.
This should not be a stack offset.
The parameter is a number; if prefixed with `0x` then in hex format, otherwise, decimal format.
Examples:

    offset: 0xC
    offset: 0x14

Note that capa treats all offsets as unsigned values. A negative number is not a valid feature value.
### mnemonic
An instruction mnemonic found in the given function.
The parameter is a string containing the mnemonic.
Examples:

    mnemonic: xor
    mnemonic: shl
### characteristics
Characteristics are features that are extracted by the analysis engine.
They are one-off features that seem interesting to the authors.
For example, the `characteristic(nzxor)` feature describes non-zeroing XOR instructions.
capa does not support instruction pattern matching,
so a select set of interesting instructions are pulled out as characteristics.
| characteristic | scope | description |
|--------------------------------------------|-----------------------|-------------|
| `characteristic(embedded pe): true` | file | (XOR encoded) embedded PE files. |
| `characteristic(switch): true` | function | Function contains a switch or jump table. |
| `characteristic(loop): true` | function | Function contains a loop. |
| `characteristic(recursive call): true` | function | Function is recursive. |
| `characteristic(calls from): true` | function | There are unique calls from this function. Best used like: `count(characteristic(calls from)): 3 or more` |
| `characteristic(calls to): true` | function | There are unique calls to this function. Best used like: `count(characteristic(calls to)): 3 or more` |
| `characteristic(nzxor): true` | basic block, function | Non-zeroing XOR instruction |
| `characteristic(peb access): true` | basic block, function | Access to the process environment block (PEB), e.g. via fs:[30h], gs:[60h], or `NtCurrentPeb` |
| `characteristic(fs access): true` | basic block, function | Access to memory via the `fs` segment. |
| `characteristic(gs access): true` | basic block, function | Access to memory via the `gs` segment. |
| `characteristic(cross section flow): true` | basic block, function | Function contains a call/jump to a different section. This is commonly seen in unpacking stubs. |
| `characteristic(tight loop): true` | basic block | A tight loop where a basic block branches to itself. |
| `characteristic(indirect call): true` | basic block, function | Indirect call instruction; for example, `call edx` or `call qword ptr [rsp+78h]`. |
## file features
capa extracts features from the file data.
File features stem from the file structure, i.e. PE structure or the raw file data.
These are the features supported at the file-scope:
- [string](#file-string)
- [export](#export)
- [import](#import)
- [section](#section)
### file string
An ASCII or UTF-16 LE string present in the file.
The parameter is a string describing the string.
This can be the verbatim value, or a regex matching the string.
Regexes should be surrounded with `/` characters. By default, capa uses case-sensitive matching.
To perform case-insensitive matching append an `i`.
Examples:

    string: Z:\Dev\dropper\dropper.pdb
    string: [ENTER]
    string: /.*VBox.*/
    string: /.*Software\\Microsoft\\Windows\\CurrentVersion\\Run.*/i

Note that regex matching is expensive (`O(features)` rather than `O(1)`), so regexes should be used sparingly.
### export
The name of a routine exported from a shared library.
Examples:

    export: InstallA
### import
The name of a routine imported from a shared library.
Examples:

    import: kernel32.WinExec
    import: WinExec           # wildcard module name
    import: kernel32.#22      # by ordinal
### section
The name of a section in a structured file.
Examples:

    section: .rsrc
## counting
Many rules will inspect the feature set for a select combination of features;
however, some rules may consider the number of times a feature was seen in a feature set.
These rules can be expressed like:

    count(characteristic(nzxor)): 2           # exactly match count==2
    count(characteristic(nzxor)): 2 or more   # at least two matches
    count(characteristic(nzxor)): 2 or fewer  # at most two matches
    count(characteristic(nzxor)): (2, 10)     # match any value in the range 2<=count<=10
    count(mnemonic(mov)): 3
    count(basic block): 4
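
The four forms on the right-hand side of a `count(...)` expression map onto simple comparisons. The Python sketch below spells them out; it is an illustration of the semantics listed above, not capa's parser, and `count_matches` is a hypothetical name.

```python
import re


def count_matches(spec, count):
    """Check an observed count against the right-hand side of a `count(...)` expression."""
    spec = str(spec).strip()
    if m := re.fullmatch(r"(\d+) or more", spec):
        return count >= int(m.group(1))
    if m := re.fullmatch(r"(\d+) or fewer", spec):
        return count <= int(m.group(1))
    if m := re.fullmatch(r"\((\d+),\s*(\d+)\)", spec):
        low, high = int(m.group(1)), int(m.group(2))
        return low <= count <= high  # inclusive on both ends
    return count == int(spec)  # a bare number means an exact match


print(count_matches("2", 3))          # False
print(count_matches("2 or more", 3))  # True
print(count_matches("(2, 10)", 10))   # True
```

Note that a bare number is an exact match, so `count(mnemonic(mov)): 3` does not match a function with four `mov` instructions.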
## matching prior rule matches
capa rules can specify logic for matching on other rule matches.
This allows a rule author to refactor common capability patterns into their own reusable components.
You can specify a rule match expression like so:

    - and:
      - match: file creation
      - match: process creation

Rules are uniquely identified by their `rule.meta.name` property;
this is the value that should appear on the right-hand side of the `match` expression.
capa will refuse to run if a rule dependency is not present during matching.
Common rule patterns, such as the various ways to implement "writes to a file", can be refactored into "library rules".
These are rules with `rule.meta.lib: True`.
By default, library rules will not be output to the user as a rule match,
but can be matched by other rules.
When no active rules depend on a library rule, that library rule will not be evaluated, which preserves performance.
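
The behaviour described in this section (library rules are evaluated only when something depends on them, and a missing dependency is an error) amounts to a reachability walk over `match` references. The following Python sketch shows one way to express that; it is not capa's scheduler, and `rules_to_evaluate` along with the `"lib"` and `"matches"` keys of its input are assumptions made for illustration.

```python
def rules_to_evaluate(rules):
    """Return the names of rules that need evaluating.

    `rules` maps a rule name to a dict with two hypothetical keys:
    "lib" (bool) and "matches" (names referenced via `match`).
    """
    needed = set()
    # Start from the rules that are reported to the user (non-library rules).
    pending = [name for name, rule in rules.items() if not rule.get("lib", False)]
    while pending:
        name = pending.pop()
        if name in needed:
            continue
        if name not in rules:
            # Mirrors the documented behaviour of refusing to run.
            raise KeyError(f"missing rule dependency: {name}")
        needed.add(name)
        pending.extend(rules[name].get("matches", ()))
    return needed


rules = {
    "drop file and run": {"matches": ["file creation", "process creation"]},
    "file creation": {"lib": True},
    "process creation": {"lib": True},
    "unused helper": {"lib": True},
}
print(sorted(rules_to_evaluate(rules)))
# ['drop file and run', 'file creation', 'process creation']
```

Here `unused helper` is skipped because no active rule reaches it.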
# limitations
To learn more about capa's current limitations see [here](doc/limitations.md).