This repository contains the code that is used to fetch the price and other information for a particular book from a particular vendor.
How the price is fetched varies by vendor, from making an API call to scraping the vendor's website.
The extracted information is returned in a standard form that can then be cleaned up as needed.
The response is a hash, looking something like this:
Text. These fields are optional, but it is nice if they are included.
The `entries` array should contain an entry for each combination of country and currency (note that `countries` is an array, as often lots of countries will have the same price). The `availability` flag is true if the vendor can supply this book, false otherwise. What `canSell` means is something that each scraper needs to determine, but generally `true` means that they can sell it, and `false` means they can't (don't stock it, book not found on their site, out of stock, etc.).
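As a rough illustration of how a consumer might use the `entries` array, the Python sketch below looks up the entry that covers a given country and currency. The function name `find_entry`, the sample data, and the placement of `canSell` directly on each entry are assumptions made for this example; the fetchers themselves may structure the hash differently.

```python
def find_entry(response, country, currency):
    """Return the first entry covering this country and currency, or None."""
    for entry in response.get("entries", []):
        # `countries` is a list because many countries often share one price.
        if country in entry.get("countries", []) and entry.get("currency") == currency:
            return entry
    return None


# Hypothetical response, using only field names mentioned in the text.
response = {
    "entries": [
        {"countries": ["GB", "IE"], "currency": "GBP", "canSell": True},
        {"countries": ["US"], "currency": "USD", "canSell": False},
    ],
}

entry = find_entry(response, "IE", "GBP")
if entry is not None and entry["canSell"]:
    print("vendor can sell this book in IE, priced in GBP")
```

Returning `None` for a missing combination keeps "no entry for this country and currency" distinct from "entry present, but the vendor cannot sell the book".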
The `shipping` and hence `total` returned assume that you are only buying the one book. The `shippingNote` should be used to clarify if there are discounts to be had for buying more (as in the above example).
The `url` field is the address of the page on the vendor's site for this book.
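The relationship between the shipping and total fields can be sketched as a simple consistency check. This assumes that the total is the book price plus single-book shipping, and that the price lives in a field called `price`; both are assumptions for illustration, as are the amounts and the note text.

```python
from decimal import Decimal


def single_book_total(price, shipping):
    """Total cost when buying exactly one book (assumed: price + shipping)."""
    # Decimal avoids the rounding surprises of binary floats with money.
    return Decimal(price) + Decimal(shipping)


# Hypothetical offer; the values are made up for this example.
offer = {
    "price": "12.99",
    "shipping": "2.50",
    "total": "15.49",
    "shippingNote": "Shipping is cheaper per book on larger orders",
}

# A scraper's test could check that the reported total is self-consistent.
consistent = single_book_total(offer["price"], offer["shipping"]) == Decimal(offer["total"])
```

Any multi-book discount stays in the free-text `shippingNote` rather than changing the numbers, which matches the single-book assumption described above.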
Note that the format is similar to, but different from, the format returned by the OpenBookPrices API.
During development it is convenient to run a proxy that caches all requests, so that running a scraper takes much less time and is more polite to the target site too. Polipo works well for this:

    # install using your package manager of choice, in this case brew
    brew install polipo

    # run polipo in a separate terminal telling it to cache everything once fetched
    polipo -- relaxTransparency=true logLevel=0xFF idleTime=1s

    # in the terminal where you run the scripts set the env variable
    export http_proxy=

    # When you want to clear the cache just delete the files (adapt to your system)
    rm -r /usr/local/var/cache/polipo/*
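Before a long scraping session it can be worth confirming that the proxy variable is visible to the process. The Python sketch below reads it through the standard library's `urllib.request.getproxies_environment()`. The helper name `caching_proxy` is made up for this example, and the scrapers in this repository may pick up the variable in their own way.

```python
import urllib.request


def caching_proxy(scheme="http"):
    """Return the proxy URL set via <scheme>_proxy in the environment, or None."""
    # getproxies_environment() maps each scheme to its *_proxy variable.
    return urllib.request.getproxies_environment().get(scheme)


proxy = caching_proxy()
if proxy is None:
    print("http_proxy is not set; requests will go straight to the vendor")
else:
    print("requests will be routed through", proxy)
```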