barkaelogy
A small npm package to do bitcoin file archaeology.
Since the early days of bitcoin, non-financial data has been "etched" into the blockchain via several methods.
This npm package provides easy-to-use functions to help decode files contained in blockchains.
We support all methods given in the aforementioned link and more.
Specifically, this library supports:
- data stored sequentially in scriptSigs (automatically handles DER signatures and Redeem Scripts)
- data stored sequentially in scriptPubSigs (heuristics to exclude transactions not in the UTXO-set)
- data encoded using the "satoshi" format. For details, see data in TXs: 4b72a223007eab8a951d43edc171befeabc7b5dca4213770c88e09ba5b936e17 (uploader), 6c53cd987119ef797d5adccd76241247988a0a5ef783572a9972e7371c5fb0cc (downloader)
- data encoded using apertus
- data encoded using cryptograffiti
- heuristics to detect irrelevant blobs at the beginning of the data via libmagic (for example)
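To give an idea of what "data stored sequentially" in a script means, here is a minimal sketch that collects the data pushes from a raw script given as hex. It relies only on the standard Bitcoin push opcodes (direct pushes 0x01 to 0x4b, OP_PUSHDATA1/2/4). The helper name `extractPushes` is hypothetical and is not part of this package's API; the library's own handling of DER signatures and redeem scripts is not reproduced here.

```javascript
// Sketch only: extractPushes is a hypothetical helper, not a barkaelogy function.
// It walks a raw Bitcoin script and returns every pushed byte string.
function extractPushes(scriptHex) {
  const script = Buffer.from(scriptHex, "hex");
  const pushes = [];
  let i = 0;
  while (i < script.length) {
    const opcode = script[i++];
    let len;
    if (opcode >= 0x01 && opcode <= 0x4b) {
      len = opcode;                        // direct push of 1..75 bytes
    } else if (opcode === 0x4c) {          // OP_PUSHDATA1: 1-byte length
      if (i + 1 > script.length) break;
      len = script.readUInt8(i); i += 1;
    } else if (opcode === 0x4d) {          // OP_PUSHDATA2: 2-byte LE length
      if (i + 2 > script.length) break;
      len = script.readUInt16LE(i); i += 2;
    } else if (opcode === 0x4e) {          // OP_PUSHDATA4: 4-byte LE length
      if (i + 4 > script.length) break;
      len = script.readUInt32LE(i); i += 4;
    } else {
      continue;                            // any other opcode carries no data
    }
    if (i + len > script.length) break;    // truncated push: stop parsing
    pushes.push(script.subarray(i, i + len));
    i += len;
  }
  return pushes;
}

// "abc" pushed directly, then OP_DUP, then "de" via OP_PUSHDATA1
console.log(Buffer.concat(extractPushes("03616263764c026465")).toString()); // prints "abcde"
```

Concatenating the pushes of consecutive inputs or outputs, in order, is the general idea behind the sequential methods listed above; which pushes to discard (signatures, public keys) is where the heuristics come in.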
If you're interested in coinbase or ASCII data, just use strings (or bitcoinstrings.com).
Getting Started
Prerequisites
You need a bitcoin core (or bcash, bsv, ...) node daemon with txindex=1 running.
Pass connection information to the Parser constructor as described here.
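Why txindex=1 matters: Bitcoin Core's `getrawtransaction` RPC can only look up arbitrary confirmed transactions by txid when the transaction index is enabled. The sketch below builds such a request by hand, assuming the transactions are fetched over the node's JSON-RPC interface (the source does not say how the Parser talks to the node). `buildRawTxRequest` and its option names are hypothetical; 8332 is the default mainnet RPC port.

```javascript
// Sketch only: buildRawTxRequest is a hypothetical helper, not part of barkaelogy.
// It prepares a JSON-RPC call to Bitcoin Core's getrawtransaction.
function buildRawTxRequest(txid, { username, password, host = "127.0.0.1", port = 8332 }) {
  const auth = Buffer.from(`${username}:${password}`).toString("base64");
  return {
    url: `http://${host}:${port}/`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Basic ${auth}`,
      },
      // verbose=true asks the node for decoded JSON instead of raw hex
      body: JSON.stringify({
        jsonrpc: "1.0",
        id: "demo",
        method: "getrawtransaction",
        params: [txid, true],
      }),
    },
  };
}
```

On a recent Node.js version, the result can be passed straight to `fetch(url, options)` from inside an async function.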
Basic example
Read the bitcoin whitepaper stored in the blockchain into a local file satoshi.pdf:
const bark = require('barkaelogy');
const whitepaper_txid = "54e48e5f5c656b26c3bca14a8c95aa583d07ebe84dde3b7dd4a78f4e4186e713";
const parser = new bark.Parser({network:"mainnet", username: "foo", password: "bar"});
// `await` needs an async context in a CommonJS script
(async () => {
  const data = await parser.extractData([whitepaper_txid]);
  await parser.write(data, "satoshi");
})();
Some helpful functions:
- getInputData/getOutputData: retrieve parts of data contained in a raw transaction
- parseData: converts parts of data into a file by detecting its mimetype
- extractData(txid, parse_input=false): extracts data from a list of sequential txids; only checks inputs if parse_input=true (calls the previous functions for you)
- write: writes data to a file, guessing its extension according to its mimetype
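As an illustration of mimetype detection and of skipping irrelevant leading bytes, here is a rough sketch that searches a buffer for a few well-known file signatures and returns the payload starting at the earliest match. `findPayload` and `SIGNATURES` are hypothetical names, not functions of this package, and the three signatures are a tiny sample; libmagic recognises far more types and is much less prone to false positives (the 3-byte JPEG marker in particular can match by accident).

```javascript
// Sketch only: findPayload and SIGNATURES are hypothetical, not barkaelogy APIs.
const SIGNATURES = [
  { mime: "application/pdf", ext: "pdf", magic: Buffer.from("%PDF-", "latin1") },
  { mime: "image/png", ext: "png", magic: Buffer.from("89504e470d0a1a0a", "hex") },
  { mime: "image/jpeg", ext: "jpg", magic: Buffer.from("ffd8ff", "hex") },
];

function findPayload(data) {
  let best = null;
  for (const sig of SIGNATURES) {
    const offset = data.indexOf(sig.magic);
    // Keep the signature that appears earliest in the buffer
    if (offset !== -1 && (best === null || offset < best.offset)) {
      best = { mime: sig.mime, ext: sig.ext, offset };
    }
  }
  // No known signature: report nothing rather than guess
  return best === null ? null : { ...best, payload: data.subarray(best.offset) };
}
```

A caller could then write `payload` to a file named with the returned `ext`, which is roughly what guessing an extension from a mimetype amounts to.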
See the files in the tests/ directory for more helpers and usage examples.
Running the tests
Before running the tests, make sure that the __connectionDetails variable in jest.config.js matches your bitcoin daemon configuration.
Then simply install dependencies and check the package is working via npm run test (using jest).
License
This project is licensed under the GPLv3 License; see the COPYING file for details.
FAQ
How to extract the whole UTXO set?
Etching methods that use scriptPubSigs produce outputs that remain in the UTXO set, so a small superset of the relevant transactions is easy to obtain.
For example, you can use bitcoin-utxo-dump and keep only the txids that appear at least N times in the UTXO set:
$ bitcoin-utxo-dump -f txid -db chainstate_folder_location
$ huniq -c < utxodump.csv > uniq_tx_utxo.csv #use huniq (https://github.com/ahamlinman/huniq) or just uniq+sort
$ grep -v "^[1-9] " uniq_tx_utxo.csv | awk '{print $2}' > txids_to_check.csv # get a list of txids appearing at least 10 times in the UTXO set, ~450k as of March 2020
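The same filter can be sketched in JavaScript for an arbitrary threshold N. This assumes the dump holds one txid per line (as requested with `-f txid`), possibly preceded by a header line; `frequentTxids` is a hypothetical helper, not part of this package.

```javascript
// Sketch only: frequentTxids is a hypothetical helper, not a barkaelogy function.
// Keeps the txids that appear at least minCount times among the given lines.
function frequentTxids(lines, minCount) {
  const counts = new Map();
  for (const raw of lines) {
    const txid = raw.trim();
    // Skip blank lines and anything that is not a 64-hex-digit txid (e.g. a header)
    if (!/^[0-9a-f]{64}$/i.test(txid)) continue;
    counts.set(txid, (counts.get(txid) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n >= minCount).map(([txid]) => txid);
}
```

For a full UTXO dump, feed the lines from a streaming reader such as Node's `readline` rather than loading the whole file into memory.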
How can I etch my data?
Bitcoin was designed to store financial data, not your wedding pictures. Don't spam everyone who runs a bitcoin node (and pollute the UTXO set). Or at least use a blockchain dedicated to that. Or go spam BSV, they seem to like it. Or even better, use an appropriate data structure.
And if you insist, please at least:
- don't create a new format
- prepend <4B LE length><4B LE crc32 checksum> to your data
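A minimal sketch of that header, for illustration: it prepends a 4-byte little-endian length and a 4-byte little-endian CRC-32 to a payload and verifies them on the way back. It assumes the checksum covers the payload only (not the header), which the text above does not spell out. `frame`, `unframe` and the hand-rolled `crc32` are hypothetical helpers, not functions of this package.

```javascript
// Sketch only: frame/unframe/crc32 are hypothetical helpers, not barkaelogy APIs.
// Standard CRC-32 (reflected polynomial 0xEDB88320), table-driven.
const CRC_TABLE = Array.from({ length: 256 }, (_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  return c >>> 0;
});

function crc32(buf) {
  let crc = 0xffffffff;
  for (const byte of buf) crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  return (crc ^ 0xffffffff) >>> 0;
}

// Build <4B LE length><4B LE crc32><data>
function frame(data) {
  const header = Buffer.alloc(8);
  header.writeUInt32LE(data.length, 0);
  header.writeUInt32LE(crc32(data), 4); // assumption: checksum of the payload only
  return Buffer.concat([header, data]);
}

// Read the header back, ignore trailing padding, and verify the checksum
function unframe(blob) {
  if (blob.length < 8) throw new Error("header truncated");
  const length = blob.readUInt32LE(0);
  const checksum = blob.readUInt32LE(4);
  const data = blob.subarray(8, 8 + length);
  if (data.length !== length) throw new Error("payload truncated");
  if (crc32(data) !== checksum) throw new Error("crc32 mismatch");
  return data;
}
```

The length field lets a decoder drop any padding after the payload, and the checksum tells it whether the bytes it collected are actually the file that was etched.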