Commit Graph

464 Commits

Author SHA1 Message Date
Willi Ballenthin
aa8e4603d1 inspect-binexport2: render aarch64 vector element sizes
closes #2528
2024-12-09 11:27:11 +01:00
Willi Ballenthin
5fdf7e61e2 inspect-binexport2: better render ARM lsl/lsr and pruned expressions 2024-12-06 07:19:39 +01:00
Harshit Wadhwani
28c0234339 Fix: Issue #2307 (#2439)
* fix #2307

---------

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
2024-12-05 09:53:15 +01:00
mr-tz
2987eeb0ac update type annotations
tmp
2024-10-22 09:38:25 +00:00
Fariss
51a4eb46b8 replace tqdm, termcolor, tabulate with rich (#2374)
* logging: use rich handler for logging

* tqdm: remove unneeded redirecting_print_to_tqdm function

* tqdm: introduce `CapaProgressBar` rich `Progress` bar

* tqdm: replace tqdm with rich Progress bar

* tqdm: remove tqdm dependency

* termcolor: replace termcolor and update `scripts/`

* tests: update `test_render.py` to use rich.console.Console

* termcolor: remove termcolor dependency

* capa.render.utils: add `write` & `writeln` methods to subclass `Console`

* update markup util functions to use fmt strings

* tests: update `test_render.py` to use `capa.render.utils.Console`

* replace kwarg `end=""` with `write` and `writeln` methods

* tabulate: replace tabulate with `rich.table`

* tabulate: remove `tabulate` and its dependency `wcwidth`

* logging: handle logging in `capa.main`

* logging: set up logging in `capa.main`

this commit sets up logging in `capa.main` and uses a shared
`log_console` in `capa.helpers` for logging purposes

* changelog: replace packages with rich

* remove entry from pyinstaller and unneeded progress.update call

* update requirements.txt

* scripts: use `capa.helpers.log_console` in `CapaProgressBar`

* logging: configure root logger to use `RichHandler`

* remove unused import `inspect`
2024-09-27 09:34:21 +02:00
Willi Ballenthin
bcd57a9af1 detect and use third-party analysis backends when possible (#2380)
* introduce script to detect 3P backends

ref #2376

* add idalib backend

* binary ninja: search for API using XDG desktop entry

ref #2376

* binja: search more XDG locations for desktop entry

* binary ninja: optimize embedded PE scanning

closes #2397

* add script for comparing the performance of analysis backends
2024-09-26 13:21:55 +02:00
Willi Ballenthin
ee17d75be9 implement BinExport2 backend (#1950)
* elf: os: detect Android via clang compiler .ident note

* elf: os: detect Android via dependency on liblog.so

* main: split main into a bunch of "main routines"

[wip] since there are a few references to BinExport2
that are in progress elsewhre. Next commit will remove them.

* features: add BinExport2 declarations

* BinExport2: initial skeleton of feature extraction

* main: remove references to wip BinExport2 code

* changelog

* main: rename first position argument "input_file"

closes #1946

* main: linters

* main: move rule-related routines to capa.rules

ref #1821

* main: extract routines to capa.loader module

closes #1821

* add loader module

* loader: learn to load freeze format

* freeze: use new cli arg handling

* Update capa/loader.py

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>

* main: remove duplicate documentation

* main: add doc about where some functions live

* scripts: migrate to new main wrapper helper functions

* scripts: port to main routines

* main: better handle auto-detection of backend

* scripts: migrate bulk-process to main wrappers

* scripts: migrate scripts to main wrappers

* main: rename *_from_args to *_from_cli

* changelog

* cache-ruleset: remove duplication

* main: fix tag handling

* cache-ruleset: fix cli args

* cache-ruleset: fix special rule cli handling

* scripts: fix type bytes

* main: nicely format debug messages

* helpers: ensure log messages aren't very long

* flake8 config

* binexport2: formatting

* loader: learn to load BinExport2 files

* main: debug log the format and backend

* elf: add more arch constants

* binexport: parse global features

* binexport: extract file features

* binexport2: begin to enumerate function/bb/insns

* binexport: pass context to function/bb/insn extractors

* binexport: linters

* binexport: linters

* scripts: add script to inspect binexport2 file

* inspect-binexport: fix xref symbols

* inspect-binexport: factor out the index building

* binexport: move index to binexport extractor module

* binexport: implement ELF/aarch64 GOT/thunk analyzer

* binexport: implement API features

* binexport: record the full vertex for a thunk

* binexport: learn to extract numbers

* binexport: number: skipped mapped numbers

* binexport: fix basic block address indexing

* binexport: rename function

* binexport: extract operand numbers

* binexport: learn to extract calls from characteristics

* binexport: learn to extract mnemonics

* pre-commit: skip protobuf file

* binexport: better search for sample file

* loader: add file extractors for BinExport2

* binexport: remove extra parameter

* new black config

* binexport: index string xrefs

* binexport: learn to extract bytes and strings

* binexport: cache parsed PE/ELF

* binexport: handle Ghidra SYMBOL numbers

* binexport2: handle binexport#78 (Ghidra only uses SYMBOL expresssions)

* main: write error output to stderr, not stdout

* scripts: add example detect-binexport2-capabilities.py

* detect-binexport2-capabilities: more documentation/examples

* elffile: recognize more architectures

* binexport: handle read_memory errors

* binexport: index flow graphs by address

* binexport: cleanup logging

* binexport: learn to extract function names

* binexport: learn to extract all function features

* binexport: learn to extract bb tight loops

* elf: don't require vivisect just for type annotations

* main: remove unused imports

* rules: don't eagerly import ruamel until needed

* loader: avoid eager imports of some backend-related code

* changelog

* fmt

* binexport: better render optional fields

* fix merge conflicts

* fix formatting

* remove Ghidra data reference madness

* handle PermissionError when searching sample file for BinExport2 file

* handle PermissionError when searching sample file for BinExport2 file

* add Android as valid OS

* inspect-binexport: strip strings

* inspect-binexport: render operands

* fix lints

* ruff: update config layout

* inspect-binexport: better align comments/xrefs

* use explicit search paths to get sample for BinExport file

* add initial BinExport tests

* add/update BinExport tests and minor fixes

* inspect-binexport: add perf tracking

* inspect-binexport: cache rendered operands

* lints

* do not extract number features for ret instructions

* Fix BinExport's "tight loop" feature extraction.

`idx.target_edges_by_basic_block_index[basic_block_index]` is of type
`List[Edges]`. The index `basic_block_index` was definitely not an
element.

* inspect-binexport: better render data section

* linters

* main: accept --format=binexport2

* binexport: insn: add support for parsing bare immediate int operands

* binexport2: bb: fix tight loop detection

ref #2050

* binexport: api: generate variations of Win32 APIs

* lints

* binexport: index: don't assume instruction index is 1:1 with address

* be2: index instruction addresses

* be2: temp remove bytes feature processing

* binexport: read memory from an address space extracted from PE/ELF

closes #2061

* be2: resolve thunks to imported functions

* be2: check for be2 string reference before bytes/string extraction overhead

* be2: remove unneeded check

* be2: do not process thunks

* be2: insn: polish thunk handling a bit

* be2: pre-compute thunk targets

* parse negative numbers

* update tests to use Ghidra-generated BinExport file

* remove unused import

* black reformat

* run tests always (for now)

* binexport: tests: fix test case

* binexport: extractor: fix insn lint

* binexport: addressspace: use base address recovered from binexport file

* Add nzxor charecteristic in BinExport extractor.

by referencing vivisect implementation.

* add tests, fix stack cookie detection

* test BinExport feature PRs

* reformat and fix

* complete TODO descriptions

* wip tests

* binexport: add typing where applicable (#2106)

* binexport2: revert import names from BinExport2 proto

binexport2_pb.BinExport2 isnt a package so we can't import it like:

    from ...binexport2_pb.BinExport2 import CallGraph

* fix stack offset numbers and disable offset tests

* xfail OperandOffset

* generate symbol variants

* wip: read negative numbers

* update tight loop tests

* binexport: fix function loop feature detection

* binexport: update binexport function loop tests

* binexport: fix lints and imports

* binexport: add back assert statement to thunk calculation

* binexport: update tests to use Ghidra binexport file

* binexport: add additional debug info to thunk calculation assert

* binexport: update unit tests to focus on Ghidra

* binexport: fix lints

* binexport: remove Ghidra symbol madness and fix x86/amd64 stack offset number tests

* binexport: use masking for Number features

* binexport: ignore call/jmp immediates for intel architecture

* binexport: check if immediate is a mapped address

* binexport: emit offset features for immediates likely structure offsets

* binexport: add twos complement wrapper insn.py

* binexport: add support for x86 offset features

* binexport: code refactor

* binexport: init refactor for multi-arch instruction feature parsing

* binexport: intel: emit indirect call characteristic

* binexport: use helper method for instruction mnemonic

* binexport: arm: emit offset features from stp instruction

* binexport: arm: emit indirect call characteristic

* binexport: arm: improve offset feature extraction

* binexport: add workaroud for Ghidra bug that results in empty operands (no expressions)

* binexport: skip x86 stack string tests

* binexport: update mimikatz.exe_ feature count tests for Ghidra

* core: loader: update binja import

* core: loader: update binja imports

* binexport: arm: ignore number features for add instruction manipulating stack

* binexport: update unit tests

* binexport: arm: ignore number features for sub instruction manipulating stack

* binexport: arm: emit offset features for add instructions

* binexport: remove TODO from tests workflow

* binexport: update CHANGELOG

* binexport: remove outdated TODOs

* binexport: re-enable support for data references in inspect-binexport2.py

* binexport: skip data references to code

* binexport: remove outdated TODOs

* Update scripts/inspect-binexport2.py

* Update CHANGELOG.md

* Update capa/helpers.py

* Update capa/features/extractors/common.py

* Update capa/features/extractors/binexport2/extractor.py

* Update capa/features/extractors/binexport2/arch/arm/insn.py

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>

* initial add

* test binexport scripts

* add tests using small ARM ELF

* add method to get instruction by address

* index instructions by address

* adjust and extend tests

* handle operator with no children bug

* binexport: use instruction address index

ref: https://github.com/mandiant/capa/pull/1950/files#r1728570811

* inspect binexport: handle lsl with no children

add pruning phase to expression tree building
to remove known-bad branches. This might address
some of the data we're seeing due to:
https://github.com/NationalSecurityAgency/ghidra/issues/6821

Also introduces a --instruction optional argument
to dump the details of a specific instruction.

* binexport: consolidate expression tree logic into helpers

* binexport: index instruction indices by address

* binexport: introduce instruction pattern matching

Introduce intruction pattern matching to declaratively
describe the instructions and operands that we want to
extract. While there's a bit more code, its much more
thoroughly tested, and is less brittle than the prior
if/else/if/else/if/else implementation.

* binexport: helpers: fix missing comment words

* binexport: update tests to reflect updated test files

* remove testing of feature branch

---------

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
Co-authored-by: mr-tz <moritz.raabe@mandiant.com>
Co-authored-by: Lin Chen <larch.lin.chen@gmail.com>
2024-09-12 10:09:05 -06:00
mr-tz
e8550f242c rename using dashes for consistency 2024-08-26 09:55:00 +00:00
Moritz
d98c315eb4 Merge branch 'master' into vmray-extractor 2024-08-26 11:31:18 +02:00
Willi Ballenthin
a33f67b48e add landing page and rules website (#2310)
* web: index: add gif of capa running

* index: add screencast of running capa

produced via:

```
asciinema capa.cast
./capa Practical\ Malware\ Analysis\ Lab\ 01-01.dll_
<ctrl-d>
agg --no-loop --theme solarized-light capa.cast capa.gif
```

* web: index: start to sketch out style

* web: landing page

* web: merge rules website

* web: rules: update bootstrap and integrate rules

* web: rules: use pygments to syntax highlight rules

Use the Pygments syntax-highlighting library to parse
and render the YAML rule content. This way we don't have
to manually traverse the rule nodes and emit lists; instead,
we rely on the fact that YAML is pretty easy for humans
to read and let them consume it directly, with some text 
formatting to help hint at the types/structure.

* web: rules: use capa to load rule content

capa (the library) has routines for deserializing the YAML
content into structured objects, which means we can use tools
like mypy to find bugs. So, prefer to use those routines instead
of parsing YAML ourselves.

* web: rules: linters

Run and fix the issues identified by the following linters:

  - isort
  - black
  - ruff
  - mypy

* web: rules: add some links to rule page

Add links to the following external resources:

  - GitHub rule source in capa-rules repo
  - VirusTotal search for matching samples

* web: rules: accept ?q= parameter for initial search

Update the rules landing page to accept a HTTP
query parameter named "q" that specifies an initial 
search term to to pass to pagefind. This enables
external pages link to rule searches.

* web: rules: add link to namespace search

* web: rules: use consistent header

Import header from root capa landing page.

* web: rules: add umami script

* web: add initial whats new section, TODOs

* web: rules: remove old images

* changelog

* CI: remove temporary branch push event triggers

* Delete web/rules/public/css/bootstrap-4.5.2.min.css

* Delete web/rules/public/js/bootstrap-4.5.2.min.js

* Delete web/public/img/capa.cast

* Rename readme.md to README.md

* web: rules: add scripts to pre-commit configs

* web: rules: add scripts to pre-commit configs

* lints

* ci: add temporary branch push trigger to get incremental builds

* web: rules: assert start_dir must exist

* ci: web: rules: deep checkout so we can get rule history

* web: rules: check output of subprocess

* web: rules: factor out common CSS

* web: rules: fix header links

* web: rules: only index rule content, not surrounding text

* ci: web: remote temporary branch push trigger
2024-08-22 09:42:40 +02:00
Moritz
afb72867f4 assert sample analysis data is present 2024-08-01 07:58:29 +02:00
mr-tz
e83f289c8e add script to minimize vmray archive to only relevant files 2024-07-31 13:28:41 +00:00
Mike Hunhoff
87dfa50996 scripts: remove old code from show-features.py 2024-07-29 12:00:29 -06:00
Mike Hunhoff
3043fd6ac8 vmray: merge upstream 2024-07-29 11:37:37 -06:00
Soufiane Fariss
8476aeee35 scripts/show-capabilities-by-function.py: fix incorrect function address 2024-07-29 14:17:40 +02:00
Mike Hunhoff
194017bce3 vmray: merge upstream 2024-07-12 09:27:49 -06:00
Maxime Berthault
76913af20b Binary Ninja update and fix (#2205)
* Fix binja warning (use of a deprecated API method)

* Update binja plugin
> Fix json openning and parsing
> Fix base address

* Fix code_style

* lint black update
2024-07-12 12:25:19 +02:00
Yacine Elhamer
0b70abca93 show-features.py: add other usage of get_process_name() 2024-07-01 12:03:12 +01:00
Yacine Elhamer
6de22a0264 show-features.py: fix process filtering bug 2024-07-01 10:34:19 +01:00
Yacine Elhamer
fd811d1387 scripts/show-features.py: use extractor.get_process_name() interface for getting process name 2024-07-01 09:55:24 +01:00
Mike Hunhoff
4b08e62750 vmray: fix flake8 lints 2024-06-20 14:12:34 -06:00
Mike Hunhoff
19502efff3 vmray: connect process, thread, and call 2024-06-20 13:05:32 -06:00
Mike Hunhoff
d26a806647 vmray: update scripts/show-features.py to emit process name from extractor 2024-06-18 14:59:29 -06:00
mr-tz
9ec9a6f439 fix mypy issues 2024-06-12 09:32:03 +00:00
mr-tz
97a3fba2c9 fix black 2024-06-12 09:24:16 +00:00
ReWithMe
52e24e560b FEAT(capa2sarif) Add SARIF conversion script from json output (#2093)
* feat(capa2sarif): add new sarif conversion script converting json output to sarif schema, update dependencies, and update changelog

* fix(capa2sarif): removing copy and paste transcription errors

* fix(capa2sarif): remove dependencies from pyproject toml to guarded import statements

* chore(capa2sarif): adding node in readme specifying dependency and applied auto formatter for styling

* style(capa2sarif): applied import sorting and fixed typo in invocations function

* test(capa2sarif): adding simple test for capa to sarif conversion script using existing result document

* style(capa2sarif): fixing typo in version string in usage

* style(capa2sarif): isort failing due to reordering of typehint imports

* style(capa2sarif): fixing import order as isort on local machine was not updating code

---------

Co-authored-by: ReversingWithMe <ryanv@rewith.me>
Co-authored-by: Willi Ballenthin <wballenthin@google.com>
2024-06-11 15:01:26 +02:00
RainRat
8ad74ddbb6 fix typos 2024-06-01 11:48:19 -07:00
Willi Ballenthin
b59df659c9 pep8 2024-05-08 16:20:10 +02:00
Willi Ballenthin
519cfb842e profile-time: more result reporting, and learn to specify other backends 2024-05-08 16:20:10 +02:00
N0stalgikow
0eb4291b25 Updating copyright across all files based on when it was first introduced. (#2027)
* updating copyright, back to the date of origin of file

* updating regex to account for linter violation
2024-03-13 14:04:53 +01:00
Aayush Goel
49231366f1 Handles circular dependencies while getting rules and dependencies (#2014)
* Remove test for scope "unspecified"

* raise error on circular dependency

* test for circular dependency
2024-03-06 11:39:21 +01:00
Moritz
2c93c5fc83 lint: get backend from format (#1964)
* get backend from format

* add lint.py script test

* create FakeArgs object

* adjust EOL handling in lints

---------

Co-authored-by: Willi Ballenthin <wballenthin@google.com>
2024-02-01 11:33:16 +01:00
Moritz
2f2d4a1d6b Merge branch 'master' into dependabot/pip/flake8-bugbear-24.1.17 2024-01-31 11:41:05 +01:00
Jensen Coonradt
1a4f2559fa Change log update to show the removal of the scripts/vivisect-py2-vs-py3.sh file (#1952)
* remove scripts/vivisect-py2-vs-py3.sh

---------

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
2024-01-31 11:37:46 +01:00
mr-tz
66c2f07ca8 remove BaseException usage 2024-01-31 11:32:00 +01:00
dependabot[bot]
ba044a980f build(deps-dev): bump black from 23.12.1 to 24.1.1 (#1955)
* build(deps-dev): bump black from 23.12.1 to 24.1.1

Bumps [black](https://github.com/psf/black) from 23.12.1 to 24.1.1.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/23.12.1...24.1.1)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* black 24.1.1 formatting

* update flake config to match black 24.1.1 format

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
Co-authored-by: mr-tz <moritz.raabe@mandiant.com>
2024-01-31 11:18:54 +01:00
Willi Ballenthin
c3301d3b3f refactor main to for ease of integration (#1948)
* main: split main into a bunch of "main routines"

[wip] since there are a few references to BinExport2
that are in progress elsewhre. Next commit will remove them.

* main: remove references to wip BinExport2 code

* changelog

* main: rename first position argument "input_file"

closes #1946

* main: linters

* main: move rule-related routines to capa.rules

ref #1821

* main: extract routines to capa.loader module

closes #1821

* add loader module

* loader: learn to load freeze format

* freeze: use new cli arg handling

* Update capa/loader.py

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>

* main: remove duplicate documentation

* main: add doc about where some functions live

* scripts: migrate to new main wrapper helper functions

* scripts: port to main routines

* main: better handle auto-detection of backend

* scripts: migrate bulk-process to main wrappers

* scripts: migrate scripts to main wrappers

* main: rename *_from_args to *_from_cli

* changelog

* cache-ruleset: remove duplication

* main: fix tag handling

* cache-ruleset: fix cli args

* cache-ruleset: fix special rule cli handling

* scripts: fix type bytes

* main: remove old TODO message

* loader: fix references to binja extractor

---------

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
2024-01-29 13:59:05 +01:00
mr-tz
9bc04ec612 update data via script 2024-01-16 15:29:25 +01:00
Mike Hunhoff
2dbac05716 ghidra: fix IndexError exception (#1879)
* ghidra: fix IndexError exception
2023-12-15 16:23:19 -08:00
Arnim Rupp
1d3ae1f216 Update capa2yara.py (#1904)
Extend unhandled strings to allow capa2yara to run through
2023-12-13 15:51:56 +01:00
Yacine Elhamer
d5ae2ffd91 capa.capabilities: move has_file_limitations() from capa.main to the capabilities module 2023-10-20 10:15:20 +02:00
Yacine Elhamer
96fb204d9d move capa.features.capabilities to capa.capabilities, and update scripts 2023-10-20 09:54:24 +02:00
Moritz
2cfd45022a improve and fix various dynamic parts (#1809)
* improve and fix various dynamic parts
2023-10-18 10:59:41 +02:00
Willi Ballenthin
1aac4a1a69 mypy 2023-10-17 14:42:58 +00:00
Willi Ballenthin
44d05f9498 dynamic: fix some tests 2023-10-17 11:41:40 +00:00
Willi Ballenthin
bf233c1c7a integrate Ghidra backend with dynamic analysis 2023-10-17 10:56:35 +00:00
Willi Ballenthin
182a9868ca merge master 2023-10-17 10:32:25 +00:00
Yacine Elhamer
953b2e82d2 rendering: several fixes and added types/classes 2023-10-11 11:52:16 +02:00
Yacine Elhamer
8b287c1704 scripts/profile_time.py: revert restriction that sample extractors can only be static ones 2023-10-04 10:51:53 +02:00
Yacine Elhamer
28a722d4c3 scripts/profile_time.py: revert restriction that frozen extractors can only be static ones 2023-10-04 10:51:02 +02:00