Proposal: Collect sanitized metrics · Issue #366 · macadmins/nudge · GitHub
Proposal: Collect sanitized metrics #366
Open
@erikng

I had planned on bringing in some metrics for Nudge 1.1.7, but that's no longer happening as of 6fa38af.

My goal here is the following:

  • Understand how many devices use Nudge and which versions of the software they run
    • Use this for internal justification of engineering allocation(s)
  • Collect the minimal amount of data necessary to accomplish this
  • Ship this data as few times as possible

Currently, Nudge's preference file looks like this:

{
    deferRunUntil = "2022-05-31 21:08:29 +0000";
    requiredMinimumOSVersion = "12.99.99";
    userDeferrals = 2;
    userQuitDeferrals = 1;
    userSessionDeferrals = 0;
}

What I had proposed was the following:

{
    deviceConfiguration =     {
        appVersion = "1.1.7.81406";
        bundlePath = "/Applications/Utilities/Nudge.app";
        deviceID = "D63F50B5-AA6D-4B44-BB9B-C83B2C91E72D";
    };
    requiredMinimumOSVersion = "12.99.99";
    userDeferrals = 0;
    userQuitDeferrals = 0;
    userSessionDeferrals = 0;
}

If you had a JSON file, it would add configJSON, which is a calculated MD5 hash of the device's current JSON config.

{
    deviceConfiguration =     {
        appVersion = "1.1.7.81406";
        bundlePath = "/Applications/Utilities/Nudge.app";
        configJSON = 2ee6b587eb1bab8f36bcde35a35e2dcd;
        deviceID = "D63F50B5-AA6D-4B44-BB9B-C83B2C91E72D";
    };
    requiredMinimumOSVersion = "12.99.99";
    userDeferrals = 0;
    userQuitDeferrals = 0;
    userSessionDeferrals = 0;
}

If you had an MDM profile, it would add configProfile, which is a calculated MD5 hash of the device's current profile config.

{
    deviceConfiguration =     {
        appVersion = "1.1.7.81406";
        bundlePath = "/Applications/Utilities/Nudge.app";
        configProfile = 2ee6b587eb1bab8f36bcde35a35e2dcd;
        deviceID = "D63F50B5-AA6D-4B44-BB9B-C83B2C91E72D";
    };
    requiredMinimumOSVersion = "12.99.99";
    userDeferrals = 0;
    userQuitDeferrals = 0;
    userSessionDeferrals = 0;
}

If you had both a JSON file and an MDM profile, both keys would be present.

If you had a signed build of Nudge (essentially everyone except local development tests), it would add developerCertificate:

{
    deviceConfiguration =     {
        appVersion = "1.1.7.81406";
        bundlePath = "/Applications/Utilities/Nudge.app";
        configProfile = 2ee6b587eb1bab8f36bcde35a35e2dcd;
        developerCertificate = "Developer ID Application: Clever DevOps Co. (9GQZ7KUFR6)";
        deviceID = "D63F50B5-AA6D-4B44-BB9B-C83B2C91E72D";
    };
    requiredMinimumOSVersion = "12.99.99";
    userDeferrals = 0;
    userQuitDeferrals = 0;
    userSessionDeferrals = 0;
}

So if you had the latest release, a JSON file and an MDM profile, it would look like this:

{
    deviceConfiguration =     {
        appVersion = "1.1.7.81406";
        bundlePath = "/Applications/Utilities/Nudge.app";
        configJSON = 2ee6b587eb1bab8f36bcde35a35e2dcd;
        configProfile = 2ee6b587eb1bab8f36bcde35a35e2dcd;
        developerCertificate = "Developer ID Application: Clever DevOps Co. (9GQZ7KUFR6)";
        deviceID = "D63F50B5-AA6D-4B44-BB9B-C83B2C91E72D";
    };
    requiredMinimumOSVersion = "12.99.99";
    userDeferrals = 0;
    userQuitDeferrals = 0;
    userSessionDeferrals = 0;
}

Client Behavior

Upon the first run of the new Nudge, it would submit this information to https://deviceconfiguration.nudgeapp.workers.dev, which is a CloudFlare Workers serverless instance that waits for the Nudge application to submit data.

After it submits the data successfully, Nudge will not submit new data unless any of the following has changed:

  • appVersion
  • configJSON
  • configProfile
  • developerCertificate

Changes to the following keys do not trigger a resubmission:

  • deviceID
  • bundlePath
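
The resubmission rule above can be sketched as a simple comparison between the last successfully submitted dictionary and the current one. This is only an illustration under stated assumptions: the function name `needs_resubmission` and the idea of keeping the last submitted dictionary around are hypothetical, not taken from Nudge's code.

```python
# Keys whose change triggers a new submission, per the list above.
RESUBMIT_KEYS = ("appVersion", "configJSON", "configProfile", "developerCertificate")


def needs_resubmission(last_submitted, current):
    """Return True if the current deviceConfiguration should be submitted.

    last_submitted is the dictionary from the last successful submission,
    or None if nothing has been submitted yet (hypothetical storage model).
    """
    if last_submitted is None:
        return True  # first run: always submit
    # deviceID and bundlePath are deliberately ignored here.
    return any(last_submitted.get(key) != current.get(key) for key in RESUBMIT_KEYS)


previous = {
    "appVersion": "1.1.7.81406",
    "bundlePath": "/Applications/Utilities/Nudge.app",
    "configJSON": "2ee6b587eb1bab8f36bcde35a35e2dcd",
    "deviceID": "D63F50B5-AA6D-4B44-BB9B-C83B2C91E72D",
}
moved = dict(previous, bundlePath="/Applications/Nudge.app")
upgraded = dict(previous, appVersion="1.1.8")
```

With these sample values, `moved` would not trigger a resubmission, while `upgraded` would.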

If Nudge detected unusual serial number formats, it would assume it is running in a virtual machine and never submit data.

Server Infrastructure

Nudge Client -> POST -> CloudFlare Worker -> Validation
-> POST/Kafka Producer -> Kafka/UpStash
-> POST -> ElasticSearch/Private Docker -> Kafka Consumer -> Kibana/Grafana visualizations

As mentioned before, the submission endpoint is running on CloudFlare workers within their own security model.

The CloudFlare worker would then take the data, ensure it is in the correct format (and reject it if it isn't) and send it to a Kafka topic running on UpStash. From there it would be consumed into a private ElasticSearch instance.

The CloudFlare worker is not open source for the following reasons:
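
Since the worker itself is not public, the following is only a guess at what "ensure it is in the correct format" might involve, written in Python for illustration. Every rule here (the allowed and required key sets, the 32-character hex check, the UUID parse) and the name `is_valid_payload` are assumptions inferred from the example dictionaries in this issue, not the real validation logic.

```python
import re
import uuid

# Assumed key sets, inferred from the example dictionaries in this issue.
ALLOWED_KEYS = {
    "appVersion", "bundlePath", "configJSON",
    "configProfile", "developerCertificate", "deviceID",
}
REQUIRED_KEYS = {"appVersion", "bundlePath", "deviceID"}
MD5_HEX = re.compile(r"[0-9a-f]{32}")


def is_valid_payload(payload):
    """Return True if payload looks like a deviceConfiguration dictionary."""
    if not isinstance(payload, dict):
        return False
    keys = set(payload)
    # Reject missing required keys and any unknown keys.
    if not REQUIRED_KEYS <= keys or not keys <= ALLOWED_KEYS:
        return False
    if not all(isinstance(value, str) for value in payload.values()):
        return False
    # Config fingerprints, when present, must be 32 lowercase hex characters.
    for key in ("configJSON", "configProfile"):
        if key in payload and not MD5_HEX.fullmatch(payload[key]):
            return False
    # deviceID must parse as a UUID.
    try:
        uuid.UUID(payload["deviceID"])
    except ValueError:
        return False
    return True


good = {
    "appVersion": "1.1.7.81406",
    "bundlePath": "/Applications/Utilities/Nudge.app",
    "configJSON": "2ee6b587eb1bab8f36bcde35a35e2dcd",
    "deviceID": "D63F50B5-AA6D-4B44-BB9B-C83B2C91E72D",
}
```

A payload that fails any check would be rejected before anything is written to Kafka.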

  • It has API keys to write to Kafka
  • It will eventually have API keys to write to ElasticSearch
  • There is custom code to protect against malicious submissions and I want to make that harder on $bad actors, not easier

Currently, the CloudFlare worker does not ship to ElasticSearch because I am still figuring out the most secure, least cost prohibitive model for this. I also do not want to expose this data to any potential malicious actors.

One thing to note is that I do not get access to the CloudFlare workers raw logs (nor do I want them), which contain the source IP of the client device. This protects both you and me from exposing data we don't intend to.

The Kafka message queue purges old messages after 7 days, which is compliant with GDPR.

deviceConfiguration Dictionary design

I have tried to be thoughtful about which keys I want and why.

appVersion

This is straightforward: it just pulls the current Nudge version.

bundlePath

This is also straightforward: it is the install path for Nudge. For most people this will be /Applications/Utilities/Nudge.app.

I mainly want to collect this to understand whether people are using other application paths, as that could impact the preinstall and postinstall scripts.

configJSON

This function would use the new -print-json-config logic, take the resulting string and convert it to an MD5 hash.

By submitting this data, I could group machines into collections without exposing which company a collection belongs to.

configProfile

This function would use the new -print-profile-config logic, take the resulting string and convert it to an MD5 hash.

By submitting this data, I could group machines into collections without exposing which company a collection belongs to.
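
The hashing described for configJSON and configProfile can be sketched in a few lines. This is an illustration in Python, not Nudge's implementation. The function name `config_fingerprint` and the sample config content are hypothetical; in Nudge the input would be the string produced by -print-json-config or -print-profile-config.

```python
import hashlib
import json


def config_fingerprint(config_text):
    """Return the hex MD5 digest of a configuration string."""
    return hashlib.md5(config_text.encode("utf-8")).hexdigest()


# Hypothetical config content used only for illustration.
example_config = json.dumps(
    {"osVersionRequirements": [{"requiredMinimumOSVersion": "12.99.99"}]},
    sort_keys=True,
)
fingerprint = config_fingerprint(example_config)
```

Two devices with byte-identical configs would produce the same fingerprint and so fall into the same collection, while any change to the config text produces a different one.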

developerCertificate

This is also straightforward: it is the signature of the Nudge application. For most people this will be Developer ID Application: Clever DevOps Co. (9GQZ7KUFR6).

Given that this data is generally not seen as any measure of security, I am collecting it to understand if/when forks are being used. Think JumpCloud patch management.

deviceID

This is the key that took the most thought.

These were the core requirements:

  • Do not expose PII
  • Prevent my database from growing with bad data and increasing costs
  • Persist the ID through device wiping or Nudge configuration wiping (defaults delete ~/Library/Preferences/com.github.macadmins.Nudge)

In order to do this I came up with the following idea:

  • Grab the Hardware UUID of the device and store it in temporary memory
  • Grab the Serial of the device and store in temporary memory
    • This data is already stored for the (?) button
  • Convert Serial to a UUID compliant string
  • Create a new string of com.github.macadmins.Nudge + Hardware UUID + Serial UUID
  • Convert that string to a SHA256 hash
  • Take that resulting hash, take the first 16 digits of it and use that to create a new UUID

By going through all of these steps, I can safely create an ID that is stable (except for logic board changes). More importantly though, this UUID cannot be used to recover the serial number or hardware UUID.
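
The derivation steps above can be sketched as follows, in Python for illustration. Several details are assumptions because the text does not specify them: how the serial is turned into a UUID-compliant string (here, the first 16 bytes of its SHA256 digest), the reading of "first 16 digits" as the first 16 bytes of the digest, the uppercase formatting, and the helper names. The hardware UUID and serial below are made-up sample values.

```python
import hashlib
import uuid

PREFIX = "com.github.macadmins.Nudge"


def uuid_from_sha256(text):
    """Build a UUID from the first 16 bytes of the SHA256 digest of text."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return uuid.UUID(bytes=digest[:16])


def derive_device_id(hardware_uuid, serial):
    """Derive a stable, non-reversible device ID from hardware identifiers."""
    # Assumption: the serial is made UUID-compliant by hashing it.
    serial_uuid = str(uuid_from_sha256(serial)).upper()
    combined = PREFIX + hardware_uuid.upper() + serial_uuid
    return str(uuid_from_sha256(combined)).upper()


# Made-up sample identifiers, not real hardware values.
device_id = derive_device_id("11111111-2222-3333-4444-555555555555", "C02EXAMPLE12")
```

Because the inputs are only hardware identifiers and a fixed prefix, the same device yields the same ID after a wipe, and the raw identifiers never leave temporary memory.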

Opt out method

Initially I do not want to make an opt-out method because I want to answer the following questions first:

  • How do resellers of Nudge opt out of this feature, if at all?
    • My current thinking is if they want to opt out and continue to profit from my work, they can completely fork Nudge and maintain their fork themselves.
  • What kind of resiliency do I need?
    • When a bunch of companies deploy a new version of Nudge, what will that look like for the servers?
    • When new versions of macOS come out, what will that look like for the servers?
  • Is this data collection cost prohibitive?
    • If this costs me more than $10/month, I honestly don't want to do it

Potential concerns around data and my views on it

Almost every piece of software, including the software you are using to read this GitHub issue, ships some metrics. If it's Safari, I'm sure Apple is sanitizing it in a beautiful way, but if it's Google, they are getting your hardware UUID, your serial, your IP address and much more information about you that you cannot opt out of.

This small bit of data was thoughtfully designed and greatly helps me continue to support this software for free, now and in the future.

The version, profile and/or JSON are PII and should be considered insider information

While I agree that the raw values could absolutely be used against a company, by converting them to an MD5 hash, I think the risk is less severe.

That said, after thinking about it, I can come up with some situations where perhaps this data could be used to infer things:

  • When collection "foo" begins a new Nudge event
  • When collection "foo" begins to upgrade Nudge
  • When a new collection (company) "bar" starts deploying Nudge for the first time

Because of this, I could see adding the following key:

  • disableOptionalMetrics

This would essentially give me the absolute bare minimum device collection data and prevent me from understanding those types of events:

  • bundlePath
  • deviceID
  • developerCertificate

Moving forward

I'm not sure how much longer I can continue to maintain Nudge by myself. I am no longer a macadmin and I continue to force myself to find time to improve Nudge. If I am to continue maintaining it, I must show my new organization the true impact of what Nudge is doing to help companies secure their devices.

When you deploy something it is very likely your management wants to track and understand the status of the deployment. For two years I have essentially been blind as to how Nudge is operated.

Other projects you may have used or your developers use do similar things:

Other tools check into APIs

I could go on and on and will point you to Facebook, Google, Apple. All of them track you and ship far more information about you than this tool ever will.

https://nordvpn.com/blog/worst-privacy-apps/

I look forward to the discussion and I'm sure it will get heated.
