Spark Bitcoin Parser

Bitcoin transaction data stored in blkXXXXX.dat files in every full node of the network. As more people participate in Bitcoin transactions, the data grows exponentially. Scalable analysis is called for.

This project is a Bitcoin transaction data Parser, based on Python and Spark's Python API – PySpark.

Files

Program File	Description	First Author	Modified by
base58.py	module: necessary encoder for public address	Gavin Andresen	Peng Wu
blocktools.py	module: tools for reading binary data from block files	Alex Gorale	Peng Wu
block.py	module: classes for Blocks, Transactions	Alex Gorale	Peng Wu
spark_parser.py	parser: step 1	Peng Wu
spark_mapinput.py	parser: step 2	Peng Wu
spark_mapaddrs.py	parser: step 3	Peng Wu

Output CSV Files	Format
transactions.csv	tx_hash, tx_value, timestamp, num_inputs, num_outputs
outputs.csv	tx_hash, output_index, output_value, output_address
inputs_mapping.csv	tx_hash, input_index, prev_tx_hash, output_index
inputs.csv	tx_hash, input_index, input_value, input_address
addrs.csv	payer_address, payee_address, value, tx_hash

Usage: Data Pipeline

General In and Out

Input: unlimited number of blkXXXXX.dat files
Output: .csv files (saved for files) OR Spark-RDD (for further analysis)
The following steps illustrate the CSV way of output. However, if you familiar with Spark can combine all 3 steps into one, so that all outputs in the middle will be handled as RDD. It'd be faster.
For quick and easy application, see spark_workflow.sh.

Step 1

spark-submit spark_parser.py blk00001.dat blk00002.dat ...

Input: blkXXXXX.dat

Output: transaction.csv, inputs_mapping.csv, outputs.csv`

Step 2

spark-submit spark_mapinput.py

Input: inputs_mapping.csv, outputs.csv

Outputs: inputs.csv

Step 3

spark-submit spark_mapaddr.py

Input: inputs.csv, outputs.csv

Outputs: addrs.csv

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Spark Bitcoin Parser

Files

Usage: Data Pipeline

General In and Out

Step 1

Step 2

Step 3

License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
LICENSE		LICENSE
README.md		README.md
base58.py		base58.py
blkdate.txt		blkdate.txt
block.py		block.py
blocktools.py		blocktools.py
spark_mapaddr.py		spark_mapaddr.py
spark_mapinput.py		spark_mapinput.py
spark_parser.py		spark_parser.py
spark_workflow.sh		spark_workflow.sh

License

pengwu22/spark-bitcoin-parser

Folders and files

Latest commit

History

Repository files navigation

Spark Bitcoin Parser

Files

Usage: Data Pipeline

General In and Out

Step 1

Step 2

Step 3

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages