In this repository, I tried to build – from scratch – a tool that enables sampling from a custom probability density function. Particularly, I implemented the inverse transform sampling which is a method used to generate samples from density function by leveraging a uniform distribution. It exploits both the bijection between a uniform and a custom density function, and the ease of sampling from a uniform distribution (through a pseudorandom generator).
I have written a detailed article about the topic where I explain the intuition behind this method. Here is the link for the article.
The developed tool that enables sampling from a custom density function comes in a form of a class DistributionFunction
The class needs two argument to be instantiated:
- explicit formula of the density function
- the interval on which it is defined
Within the class, I define the cumulative distribution function which will be used to implement the mapping process.
The main method of the class is generate_samples
, which enables generating n
samples from the given probability density function.
This method call n
time the method map_dist_uniform
which map a randomly generated number to it peer in the density function.
Also, there is an implementation of a slightly modified version of binary search. It is used to help determining a suitable starting point to solve the mapping process equation.
These tools can be found in tools
folder.
The remaining scripts are meant to create animation and graphs for the article.
Here are code snippets that show how to use this tool.
Example of creating samples from custom density function
Note that RecFunction
is a function that is equal to 1 in a given interval
and equal to zero otherwise
# imports
import numpy as np
import matplotlib.pyplot as plt
from tools.class_distribution_function import DistributionFunction
from tools.step_functions import RecFunction
We first define the density function as well as the interval in which it is defined.
Then we create an instance of DistributionFunction
interval = [0, 1]
f = lambda u: 3 * u ** 2 * RecFunction(*interval)(u)
dist = DistributionFunction(
func=f,
interval=interval
)
Plot of the density function
nb_points = 100
y = np.linspace(*interval, nb_points)
plt.plot(y, dist(y))
Plot of the cumulative distribution function
probs = [dist.cumulative_dist_function(y0) for y0 in y]
plt.plot(y, probs)
Generate samples
size = 100
print(dist.generate_samples(size))
I made an animation to see the mapping algorithm in action
I also made an animation where I generate samples from a uniform and map them to their peers.
The animation shows the evolution of the histogram bars length. As we increase the number of samples the bars approach the curve of the density function.
Rendezvous: "Insight on pseudonumber generators"