8000 Add `sigurl` to get the YARA signature from https server for scan by kumarak · Pull Request #6607 · osquery/osquery · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sigurl to get the YARA signature from https server for scan #6607

Merged

Conversation

kumarak
Copy link
Contributor
@kumarak kumarak commented Aug 24, 2020

The PR adds a new column sigurl that can be used to pass the url from where the YARA signature can be downloaded for the scan. The signature urls need to be pre-registered with the configuration file.

 // Configuration file for registering the URL's 
 "yara": { 
   "signature_urls": { 
     "sig_url_1": "https://raw.githubusercontent.com/Yara-Rules/rules/master/cve_rules/CVE-2010-0805.yar", 
     "sig_url_2": "https://raw.githubusercontent.com/Yara-Rules/rules/master/crypto/crypto_signatures.yar", 
     "sig_url_3": "https://raw.githubusercontent.com/Yara-Rules/rules/master/malware/APT_APT3102.yar"
   }
 }

The feature will be disabled by default and can be enabled with a hidden flag enable_yara_sigurl. It also forces all the strings to be private, if the rules are downloaded from url unless . The private string will not show up in the yara table. the feature can be disabled with disable_yara_string_private flag. A query with signature urls:

osquery> select * from yara where path like '%' and sigurl='sig_url_2';
+-------------------------+--------------+-------+-----------+---------+---------+---------+------+-----------+
| path                    | matches      | count | sig_group | sigfile | sigrule | strings | tags | sigurl    |
+-------------------------+--------------+-------+-----------+---------+---------+---------+------+-----------+
| CMakeCache.txt          | Big_Numbers1 | 1     |           |         |         |         |      | sig_url_2 |
+-------------------------+--------------+-------+-----------+---------+---------+---------+------+-----------+

@directionless
Copy link
Member

I feel really ambivalent about this. It seems like it's adding a layer of indirection in the config, and I don't understand why the complexity is valuable. I'm not opposed, but I am skeptical.

  • What is gained by specifying the URL in the config, and not in the the query directly?
  • Is there theoretically a deployment where the config is locked down, but distributed query is open, and this limits the potential yara rules?
  • Is this better served by an allow-list for urls that can be passed to the yara scanner? (including wildcards or RE is an interesting thought)

I'm going through the thought exercise of using this. And it has a lot of moving parts for a site admin.

@kumarak kumarak force-pushed the kumarak/yara_table_extension_url branch from 4ef5f2d to 370f046 Compare August 31, 2020 15:46
@mike-myers-tob mike-myers-tob added this to the 4.5.0 milestone Aug 31, 2020
@kumarak kumarak force-pushed the kumarak/yara_table_extension_url branch from 370f046 to ffafe82 Compare September 1, 2020 14:08
@kumarak kumarak marked this pull request as ready for review September 1, 2020 14:38
@kumarak
Copy link
Contributor Author
kumarak commented Sep 1, 2020

Thanks @directionless for the comments. The reason I added signature urls to config file to allow only a set of url from where the rules can be downloaded. I had concern about passing them with query. Someone can pass arbitrary url's with the query and download data into the memory.
Having an allowed-list of url is also an interesting idea and we can discuss what is the better way to support getting yara rules from url's.

@kumarak kumarak force-pushed the kumarak/yara_table_extension_url branch from 837b9ca to 4f52202 Compare September 1, 2020 15:34
@mike-myers-tob
Copy link
Member
mike-myers-tob commented Sep 2, 2020

To review office hours:

What is gained by specifying the URL in the config, and not in the the query directly?

Just that the user doesn't have to include the URL or URLs in the query every time. The URL may never (or rarely) change, so it can be "remembered" in the config. Also there might be URL length limitations in a SQL statement but I don't know for sure.

Is there theoretically a deployment where the config is locked down, but distributed query is open, and this limits the potential yara rules?

We weren't assuming a difference in access privilege of query delivery and of osquery config changes. Supplying the signature URL(s) in the config was the chosen approach for the reason above.

Is this better served by an allow-list for urls that can be passed to the yara scanner? (including wildcards or RE is an interesting thought)

Protecting against an untrusted query wasn't in our requirements, so no. And we weren't thinking that there were two privilege levels here (role in which it is possible to change the config; role in which they cannot change the config but they can issue arbitrary queries). If this is definitely a thing, we can add the allow_list string in the config to mitigate risk from the latter. But if it's unknown if it's a thing, I suggest not holding up this PR to write a mitigation for an imagined problem.

Let me see if @directionless would agree that these changes would get us to mergeable state:

  • Make it possible for the URL or URLs to be supplied in the query, in addition to the configuration file. Note that this is different from the last PR, which made it possible to add the YARA rule(s) themselves in the query. But it would be effectively the same risk, and need to be guarded by a config flag the same as has been discussed.
  • If needed add another option to the config file, a domain_allow_list, that restricts the URLs that can be provided in a query.

@directionless
Copy link
Member

I think we're having a UI/UX conversation, not a functionality or security one.

Specifically, I think office hours was pretty clearly in favor of the functionality, even if it duplicates joining curl. And while I think it's unlikely there's an entity that splits control of the osquery conf from access to live queries, there's probably one somewhere.

So, I think that means:

  1. We want to be able to pass a URL as a source for sigrules.
  2. We want to limit those to something allowed by the config file

I think the correct way to achieved (2) is to use a glob or regex based allow_list in the config file. Further I think this does not need a flag, the allow list starts empty. Eg: none are allowed.

I do not think we should be naming URLs and then querying by name. I think the indirection is more trouble than good.

I imagine this would looks something like:

 "yara": { 
   "signature_urls": [
      "https://github.com/Yara-Rules/rules/tree/master/malware/*",
      "www.internal/yara/*",
  ]
}

Notable undiscussed here is what, if any, authentication happens. Or how often osquery re-fetches the URL content.

@mike-myers-tob
Copy link
Member

Notable undiscussed here is what, if any, authentication happens.

TLS-client authentication is the only kind of authentication I see support for in osquery's http_client (osquery/osquery/remote/http_client.cpp). If additional authentication schemes for http_client are desired can we address that in a separate enhancement issue? Other than that, what is the desired error handling behavior if the URL returns a 401 Unauthorized (authentication required)? I imagine the table will just return empty results and log an error. I don't know if it is logging an error yet or if @kumarak should add one at yara.cpp line 229.

Or how often osquery re-fetches the URL content.

In the PR, osquery/tables/yara/yara.cpp#121-ff implement a check of the "Last-Modified" HTTP header, which is to say it re-fetches the content only when the content has changed since it was last cached in the client.

in favor of the functionality, even if it duplicates joining curl.

Besides the aforementioned HTTP caching, I also don't believe the hypothetical JOIN between yara and curl would support the multiple signature files as implemented in this PR, how would that work? Maybe I lack the understanding of SQL.

@mike-myers-tob
Copy link
Member

I do not think we should be naming URLs and then querying by name. I think the indirection is more trouble than good.

@kumarak do you think this could be avoided in an update to this PR?

osquery/tables/yara/yara.cpp Outdated Show resolved Hide resolved
osquery/tables/yara/yara.cpp Outdated Show resolved Hide resolved
osquery/tables/yara/yara.cpp Outdated Show resolved Hide resolved
@@ -177,6 +222,21 @@ Status getYaraRules(YARAConfigParser parser,
break;
}

case YC_URL: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of caching... This is all inside the compiled cache block, isn't it? Which, is probably desired to keep from re-compiling these rules. /However/ doesn't that mean that the URL content will be cached for the life of the osquery process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiled cache blocks gets cleaned for YC_RULE and YC_URL type after the scan and will not be there for the life-time of the osquery process.

  // possibilities of collision when arbitrary queries are executed
  // with distributed API. Clear the hash string from the cache
  // Also cleanup the cache block if rules are downloaded from url
  for (const auto& sign : scanContext) {
    if (sign.first == YC_RULE || sign.first == YC_URL) {
      auto it = rules.find(hashStr(sign.second, sign.first));
      if (it != rules.end()) {
        rules.erase(hashStr(sign.second, sign.first));
      }
    }
  }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Hrm. I know that's in a different PR, but it seems weird.

If we're storing these with a key based on the hash of the rule body, why delete? (I know it's existing code)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I throw them out of the cache after scanning is to avoid having the recompiled rule for the lifetime of the osquery process. It fits with the URL, if we are going to re-fetch the updated contents.

osquery/tables/yara/yara.cpp Show resolved Hide resolved
@directionless
Copy link
Member

If additional authentication schemes for http_client are desired can we address that in a separate enhancement issue?

100%. future work.

Other than that, what is the desired error handling behavior if the URL returns a 401 Unauthorized (authentication required)? I imagine the table will just return empty results and log an error. I don't know if it is logging an error yet or if @kumarak should add one at yara.cpp line 229.

Some kind of log line seems reasonable here, yeah.

in favor of the functionality, even if it duplicates joining curl.

Besides the aforementioned HTTP caching, I also don't believe the hypothetical JOIN between yara and curl would support the multiple signature files as implemented in this PR, how would that work? Maybe I lack the understanding of SQL.

I think one of us is missing something. (And there's a good chance I'm confused about something) the sigrule example, the requested sigrule is passed wholesale to compileFromString. The same happens here. The file content is downloaded, and passed to compileFromString. Whatever works for one, should work for the other.

@kumarak kumarak force-pushed the kumarak/yara_table_extension_url branch from 172a0d0 to 842c747 Compare September 3, 2020 07:39
@kumarak
Copy link
Contributor Author
kumarak commented Sep 3, 2020

Thanks, @mike-myers-tob, @directionless for the comments. I updated the PR to implement the allowed list in the config and the URL will be matched against them. It checks for scheme, host match, and does the regexp match for the path. This will provide more flexibility in populating the allowed list.

If additional authentication schemes for http_client are desired can we address that in a separate enhancement issue?

100%. future work.

It does not support the authentication schema and it will be done with future enhancements.

in favor of the functionality, even if it duplicates joining curl.

Besides the aforementioned HTTP caching, I also don't believe the hypothetical JOIN between yara and curl would support the multiple signature files as implemented in this PR, how would that work? Maybe I lack the understanding of SQL.

I think one of us is missing something. (And there's a good chance I'm confused about something) the sigrule example, the requested sigrule is passed wholesale to compileFromString. The same happens here. The file content is downloaded, and passed to compileFromString. Whatever works for one, should work for the other.

It uses the same code path as sigrule after downloading rules from the URL. Similar functionality can also be achieved by joining sigrule with curl table but there is a scope of future enhancement in terms of caching and authentication. Also, the allowed list will prevent someone getting data from the arbitrary URLs.

@kumarak kumarak force-pushed the kumarak/yara_table_extension_url branch 2 times, most recently from 4cc8814 to 25e4143 Compare September 3, 2020 15:47
@kumarak kumarak force-pushed the kumarak/yara_table_extension_url branch from 25e4143 to 4ba3402 Compare September 4, 2020 22:34
Copy link
Member
@directionless directionless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry. I'm having trouble putting this code into my head this last week. A couple of small comments, of these I think I only feel strongly about removing the starts_with

osquery/tables/yara/yara.cpp Show resolved Hide resolved
osquery/tables/yara/yara.cpp Outdated Show resolved Hide resolved
@@ -177,6 +222,21 @@ Status getYaraRules(YARAConfigParser parser,
break;
}

case YC_URL: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Hrm. I know that's in a different PR, but it seems weird.

If we're storing these with a key based on the hash of the rule body, why delete? (I know it's existing code)

osquery/tables/yara/yara_utils.cpp Outdated Show resolved Hide resolved
osquery/tables/yara/yara_utils.h Outdated Show resolved Hide resolved
osquery/tables/yara/yara.cpp Outdated Show resolved Hide resolved
Moved to allow set for checking the sigurl

Update comments and check for response code
@kumarak kumarak force-pushed the kumarak/yara_table_extension_url branch from 5f38564 to 65177d9 Compare September 11, 2020 04:29
@mike-myers-tob mike-myers-tob removed their request for review September 11, 2020 16:34
@theopolis theopolis merged commit 69bcf70 into osquery:master Sep 11, 2020
@kumarak kumarak deleted the kumarak/yara_table_extension_url branch September 30, 2020 01:20
@MinhKMA
Copy link
MinhKMA commented Feb 18, 2021

@kumarak How to use this function? I config same your config but It return:

W0218 02:59:36.048328  6684 yara.cpp:247] Failed to get YARA rule url: sig_url_2

Pls share me your document? thank you

@aleksf
Copy link
aleksf commented Jul 15, 2021

@kumarak How to use this function? I config same your config but It return:

W0218 02:59:36.048328  6684 yara.cpp:247] Failed to get YARA rule url: sig_url_2

Pls share me your document? thank you

has to be an array @MinhKMA, my yara config block

  // Description of the YARA feature.
  "yara": {
    "signatures": {
      // Each key is an arbitrary group name to give the signatures listed
      "sig_group_1": [ "/Users/vagrant/analysis/sigs/macho-codesig-anomaly.sig", "/Users/vagrant/analysis/sigs/packed-macho.sig" ],
      "sig_group_2": [ "/Users/vagrant/analysis/sigs/macho-codesig-anomaly.sig" ]
    },
    "signature_urls": [
      "https://bitbucket.org/osx-malware-analysis/yara-rules/raw/master/macho/packed-macho.yara",
      "https://bitbucket.org/osx-malware-analysis/yara-rules/raw/master/macho/macho-codesig-anomaly.yara"
    ],
    "file_paths": {
      // Each key is a key from file_paths
      // The value is a list of signature groups to run when an event fires
      // These will be watched for and scanned when the event framework
      // fire off an event to yara_events table
      "system_binaries": [ "sig_group_1" ],
      "tmp": [ "sig_group_1", "sig_group_2" ]
    }
  },

  // Paths to watch for filesystem events
  "file_paths": {
    "system_binaries": [ "/usr/bin/%", "/usr/sbin/%" ],
    "tmp": [ "/Users/%/analysis/%%", "/tmp/%" ]
  }

aikuchin pushed a commit to aikuchin/osquery that referenced this pull request Jul 11, 2023
…0 to master

* commit 'ab092654b8b027e35a91ef19feb74f9eeec0fc2b': (98 commits)
  osquery-4.5.0: merge updates and fix conflicts
  osquery-4.5.0: merge updates and fix conflicts
  Add 4.5.0 CHANGELOG (osquery#6646)
  Add sigurl column to pass yara signatures with the queries (osquery#6607)
  Update process auditing docs schema link (osquery#6645)
  Support for Background Activity Moderator (osquery#6585)
  database: Remove read only concept (osquery#6637)
  Parsing of user data in Windows event log (osquery#6643)
  events: Remove selectAllFrom from linux process events callback (osquery#6638)
  libs: Set glog and gflags includes as SYSTEM (osquery#6635)
  tests: Fix flaky test_2_daemon_with_option (osquery#6634)
  tests: Improve the reliability of TLSServerRunner (osquery#6632)
  database: Move initialization retry logic into DB API (osquery#6633)
  libs/smartmontools: Add aarch64 support
  libs/lzma: Add aarch64 support
  libs/lldpd: Add aarch64 support
  libs/librdkafka: Add aarch64 support
  libs/libgpg-error: Add aarch64 support
  libs/libgcrypt: Add aarch64 support
  libs/libdpkg: Add aarch64 support
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0