8000 GitHub - kerinin/pest: A framework for Probability Estimation
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

kerinin/pest

Folders and files

D6CF
NameName
Last commit message
Last commit date

Latest commit

 

History

78 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Pest, a framework for Probability Estimation

Build Status

A concise API focused on painless investigation of data sets

Pest provides a framework for interacting with different probability estimation models. Pest abstracts common statstical operations including:

  • Marginal, Joint and Conditional point probability
  • Interval and Cumulative probability
  • Entropy, Cross Entropy, and Mutual Information
  • Mean, Median, Mode, etc

Scalability if you need it

Pest tries to be agnostic about the underlying data data structures, so changing libraries (NArray -> Hadoop) is as simple as using a different data source. Pest is designed to create estimators using subsets of larger data sources, and transparently constructs estimators to facilitate dynamic querying

Code structure designed to be extended

Implementing custom estimation models is easy, and Pest implements some model common ones for you.

Install

Add it to your Gemfile and bundle

gem "pest"

bundle install 

API

# Creating Datasets
test = Pest::DataSet::Hash.from_hash hash             # Creates a Hash dataset of observations from a hash
train = Pest::DataSet::NArray.from_hash hash          # Creates a NArray dataset

# DataSet Variables
test.variables                                        # hash of Variable instances detected in observation set
test.v                                                # alias of 'variables'
test.v[:foo]                                          # a specific variable
test.v[:foo] = another_variable                       # explicit declaration

# Creating Estimators
e = Pest::Estimator::Frequency.new(data)              # Frequentist estimator - values treated as unordered set
e = Pest::Estimator::Multinomial.new(data)            # Multinomial estimator
e = Pest::Estimator::Gaussian.new(data)               # Gaussian mean/varaince ML estimator

# Descriptive Statistical Properties
#e.mode(:foo)                                          # Mode
#e.mean(:foo)                                          # Mean (discrete & continuous only)
#e.median(:foo)                                        # Median (discrete & continuous only)
# quantile?
# variance?
# deviation?

# Estimating Entropy (Set & Discrete only)
e.entropy(:foo)                                       # Entropy of 'foo'
e.h(:foo, :bar)                                       # Joint entropy of 'foo' AND 'bar'
e.h(:foo).given(:bar)                                 # Cross entropy of 'foo' : 'bar'
e.mutual_information(:foo, :bar)                      # Mutual information of 'foo' and 'bar'
e.i(:foo, :bar)                                       # Alias

# Estimating Point Probability
e.probability(e.variables[:foo] => 1)                 # Estimate the probability that foo=1
e.p(:foo => 1)                                        # Same as above, tries to find a variable named 'foo'
e.p(:foo => 1, :bar => 2)                             # Estimate the probability that foo=1 AND bar=2
e.p(:foo => 1).given(:bar => 2)                       # Estimate the probability that foo=1 given bar=2
e.p(:foo => 1, :bar => 2).given(:baz => 3, :qux => 4) # Moar

# Batch Point Probability Estimation
e.batch_probability(:foo).in(test)                    # Estimate the probability of each value in test
e.batch_p(:foo, :bar).in(test)                        # Joint probability
e.batch_p(:foo).given(:bar).in(test)                  # Conditional probability
e.batch_p(:foo, :bar).given(:baz, :qux).in(test)      # Moar

# Estimating Cumulative & Interval Probability
#e.probability(:foo).greater_than(:bar).in(test)
#e.p(:foo).greater_than(:bar).less_than(:baz).in(test)
#e.p(:foo).gt(:bar).lt(:baz).given(:qux).in(test)

TODO

the builders should validate the variables they're given and throw errors if they're not part of the estimators data

About

A framework for Probability Estimation

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

0