Today we will take a deep dive into the volume traded on different exchanges and how to choose data sources so that they don’t hurt our models.
Two weeks ago, I published the first part of this series of articles. If you haven’t read it yet, I encourage you to do so because it will help you understand what’s going on here (starting with propaganda? I love this writer inside of me!). If you haven’t, and you don’t want to, it doesn’t matter! Maybe if you like this one you will feel tempted (Machiavellian laugh).
First, let me introduce you to what volume represents for traders and why it will be essential for training our models.
Volume: the total amount of a pair traded on a particular exchange in a given period of time.
In my attempt to be really pedagogical (which I’m not), give me the chance to compare price and volume oscillation with dragonfly migration (WHAT?!!). This migration phenomenon was analyzed by a team of researchers with the goal of understanding the dragonfly’s journey across the USA. Long story short: they tested each wing sample for a chemical code that would indicate approximately where the bugs were born, and with that information, they were able to trace their path.
Now let’s suppose that WE are that group of researchers (don’t even try to ask me why we would gladly agree to do this). Assume, too, that we know nothing about chemistry or biology, and because of that, we decide to face this issue the old-fashioned way: just looking at them at a certain point. To make the connection with our case, let’s assume for a while that every dragonfly is a transaction: there are big ones and little ones. There are also groups of them, which we could see as the exchanges. What we want to predict is the price, so (follow me on this) the dragonflies’ direction will be our price.
Okay… this bunch of insects is going somewhere, but during the migration it’s not easy for all of them to stay together. What could we do to know where they are going? Maybe look at the general picture: look for the big groups or the big dragonflies and try to understand their direction. If a lot of dragonflies are headed south, it’s a strong indicator that they are actually going south (be patient, here comes the part where this analogy makes sense).
Now visualize this situation: a group of dragonflies is thirsty, desperately looking for water. The nearest river is to the north, and it doesn’t matter that they are really migrating in the opposite direction; a gulp of water and they will be all set to fly those extra kilometers. Now comes the sad part: there we are, in the very first minutes of our quest, still getting our bearings, when suddenly this thirsty group appears in front of our incredulous eyes. Our first reflex could be to look at each other with self-important expressions, thinking “easiest job ever”. We couldn’t have been more wrong.
And that’s the main logic behind this example: transactions are like dragonflies, so there might be some lost ones, others that think they’re going in the right direction even though they’re not, others just following a big one or a group, and so on. That is why it’s SUPER important to look for a tool that tries to consolidate all this market information, which will let us avoid signals that are not entirely representative.
We saw that Cryptocompare had a lot of information about more than 200 exchanges. Happily, it also has an API, so we can answer our questions with data. But how do we compare that? While I was reading the API documentation, the monitor suddenly started turning yellow in one particular place. I don’t know if it was the 5 hours I had slept the night before or the memories of a childhood of Catholic education, maybe both, but I swear! A mystical yellow light surrounded one particular parameter. This is just an artistic representation of what I saw.
Was this a divine signal? … It probably was, because answers tend to hide in the darkest part of the forest, not looking like aureoles. The API has the exchange parameter! That means we can have a reliable reference for the volume and price traded on every exchange. Another thing in this documentation caught my attention: this CCCAGG default parameter. When I searched for it, voilà! It was like a dream come true: an entire methodology designed to show the best price estimation for crypto traders, which means we can have a solid reference for the aggregated volume and price traded (or at least I thought so). This includes features like outlier detection, adding new exchanges after test trials, volume adjustment based on liquidity, and more. CCCAGG seems promising, but let’s do some sanity checks first.
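To make the exchange parameter concrete, here is a minimal sketch of how a request for daily candles could be built. The endpoint `https://min-api.cryptocompare.com/data/histoday` and the parameter names `fsym`, `tsym`, `limit` and `e` are written as I recall them from the public API documentation, so treat them as assumptions and check the current docs before relying on them. The helper `build_histoday_url` and the exchange name `Kraken` are only illustrations.

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameter names recalled from the public docs.
BASE_URL = "https://min-api.cryptocompare.com/data/histoday"


def build_histoday_url(fsym, tsym, exchange="CCCAGG", limit=30):
    """Build the URL for daily candles of fsym/tsym on a given exchange.

    exchange="CCCAGG" asks for the aggregated series; any other value
    asks for the data of that single exchange.
    """
    params = {"fsym": fsym, "tsym": tsym, "limit": limit, "e": exchange}
    return f"{BASE_URL}?{urlencode(params)}"


# Aggregated series versus a single (illustrative) exchange
aggregated_url = build_histoday_url("ETH", "USD")
kraken_url = build_histoday_url("ETH", "USD", exchange="Kraken")

print(aggregated_url)
print(kraken_url)
```

The only thing that changes between the two requests is the value of `e`, which is what makes a like-for-like comparison between an exchange and the aggregate possible.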
The suggested unofficial Python package was cryptocompare, but its get_historical_price method wasn’t as manageable as I wanted, so I found cryptocompy, which allows us to see different indicators associated with the requested pair and exchange. In the script below you can see the top 5 exchanges sorted by volume for the ETH-USD pair.
Let’s do the same exercise, but this time for the pair ETH-USDT: instead of dollars, we are going to use Tether, a stablecoin that acts as a proxy for the USD.
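The ranking step itself is simple and does not depend on any particular package. As a rough sketch (not the original script), assume the per-exchange volumes have already been fetched into a list of dictionaries. The exchange names and numbers below are made up purely for illustration, and `top_exchanges` is a hypothetical helper.

```python
def top_exchanges(rows, n=5):
    """Return the n rows with the largest volume, biggest first."""
    return sorted(rows, key=lambda row: row["volume"], reverse=True)[:n]


# Made-up volumes for illustration only, not real market data
sample_rows = [
    {"exchange": "ExchangeA", "volume": 1200.0},
    {"exchange": "ExchangeB", "volume": 5400.0},
    {"exchange": "ExchangeC", "volume": 300.0},
    {"exchange": "ExchangeD", "volume": 9800.0},
    {"exchange": "ExchangeE", "volume": 2500.0},
    {"exchange": "ExchangeF", "volume": 750.0},
]

top5 = top_exchanges(sample_rows)
for row in top5:
    print(f'{row["exchange"]:<10} {row["volume"]:>10.1f}')
```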
Interesting! The volume and the leading players are widely different between the two pairs, which tells us that every pair could have its own peculiarities. Ok, but the sentence that introduced this part was “let’s do some sanity checks first”, and we still haven’t seen anything related to CCCAGG. Well, to start we will compare the top 2 exchanges for the ETH-USD pair with CCCAGG on price (to store the data as CSV I used the instructions I found in this article; in case you want to do some charts on your own, you can check out the code on my Github).
It seems pretty good; there are no signs of abnormalities in the price. I was expecting this because the methodology is built on a 24-hour weighted average to estimate prices. So, price sanity check: approved. Now let’s try the same with the volume. Volume can be measured in different ways, but the most common is known as “volumefrom”, which means that the transactions are measured in the first currency of the pair (Ethereum in this case). There is also “volumeto”, which is the same but measured in the second currency of the pair (USD in this example).
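A tiny worked example may help fix the difference between the two measures. This is a sketch under a simplifying assumption: each trade is just an amount in the first currency and a price in the second, and the trades below are invented for illustration. `candle_volumes` is a hypothetical helper, not part of any library.

```python
def candle_volumes(trades):
    """Compute (volumefrom, volumeto) for a list of trades.

    trades: list of (amount_in_first_currency, price_in_second_currency)
    volumefrom adds up the amounts (ETH for ETH-USD);
    volumeto adds up amount * price (USD for ETH-USD).
    """
    volumefrom = sum(amount for amount, _ in trades)
    volumeto = sum(amount * price for amount, price in trades)
    return volumefrom, volumeto


# Invented ETH-USD trades: (ETH amount, USD price per ETH)
trades = [(2.0, 200.0), (0.5, 202.0), (1.5, 198.0)]

volumefrom, volumeto = candle_volumes(trades)
print(volumefrom)  # 4.0 ETH
print(volumeto)    # 798.0 USD
```

Both numbers describe the same trades, so comparing volumes across sources only makes sense when both sides use the same measure.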
Mmmhhhh stupid dragonflies!… There are a lot of points at which the volume of just one of the exchanges is higher than CCCAGG (I marked just some of them), and I was expecting CCCAGG to give us the sum of all exchanges. This is kind of disappointing, but it could also make sense. The whole point of the methodology behind the aggregation is to filter out data points that could be outliers, so maybe the number is not necessarily the sum of every volume traded.
So, to dispel all doubts, we must look for more data sources; here I found 5, including Cryptocompare. To make this comparison, I selected another source (AlphaVantage). I chose AlphaVantage because it was the only one of the top 3 in which I found an aggregated option for the ETH-USD pair. Let’s take a look at the last 50 days.
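Instead of marking those points by eye, the check can be written down in a few lines. This is only a sketch: it assumes both series are already loaded as dictionaries mapping a date to its volumefrom, and the dates and values are made up for illustration. `days_exceeding` is a hypothetical helper.

```python
def days_exceeding(aggregate, exchange):
    """Return the dates on which a single exchange reports more volume
    than the aggregate. Only dates present in both series are compared."""
    shared_dates = aggregate.keys() & exchange.keys()
    return sorted(d for d in shared_dates if exchange[d] > aggregate[d])


# Made-up daily volumefrom values, for illustration only
cccagg_volume = {
    "2019-01-01": 1000.0,
    "2019-01-02": 1500.0,
    "2019-01-03": 900.0,
    "2019-01-04": 1100.0,
}
single_exchange_volume = {
    "2019-01-01": 400.0,
    "2019-01-02": 1700.0,
    "2019-01-03": 950.0,
    "2019-01-05": 800.0,  # no aggregate value for this date, so it is skipped
}

suspicious_days = days_exceeding(cccagg_volume, single_exchange_volume)
print(suspicious_days)
```

If the aggregate were a plain sum over exchanges, this list would always be empty, so any date it returns is evidence that the aggregate is something else.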
Ok, it’s not perfect, but the lines run close to each other, which is a good indicator. Two sources of aggregated information are saying similar things, and that’s a reason to trust them. What we should be thinking about by now is what we are going to do with this volume uncertainty, but that, fellas, won’t be happening now. We are going to revisit this when we tackle the feature selection chapter of the saga. And how? We will just select the volume that behaves best in our predictions.
But let’s take a break. I think we have enough to digest in one single chapter, so, to draw some sort of conclusion, I believe it will be good for us to recap what we learned today:
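To put a number on how close two aggregated volume series run, one option is the mean relative gap between them. This is my own choice of metric for illustration, not something prescribed by either data source, and the values below are invented. It assumes both series are dictionaries of date to volume and that volumes are positive.

```python
def mean_relative_gap(series_a, series_b):
    """Average of |a - b| / mean(a, b) over the dates both series share.

    0.0 means the series are identical on shared dates; larger values
    mean they disagree more. Assumes strictly positive volumes.
    """
    shared_dates = sorted(series_a.keys() & series_b.keys())
    if not shared_dates:
        raise ValueError("the two series share no dates")
    gaps = []
    for d in shared_dates:
        a, b = series_a[d], series_b[d]
        gaps.append(abs(a - b) / ((a + b) / 2))
    return sum(gaps) / len(gaps)


# Invented daily volumes from two hypothetical aggregated sources
source_one = {"2019-01-01": 90.0, "2019-01-02": 110.0}
source_two = {"2019-01-01": 110.0, "2019-01-02": 90.0}

gap = mean_relative_gap(source_one, source_two)
print(f"{gap:.2f}")
```

A small gap over the 50 days would back up the visual impression; a large one would mean the two sources disagree more than the chart suggests.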
- Transactions are dragonflies.
- There are aggregated measures for the price and volume.
- Can we trust the aggregated price? I would say yes!
- Can we trust the aggregated volume? Not sure yet.
So, what is our next step? Exploratory data analysis (FINALLY!). The next stop on our journey will be all about deciphering patterns, following clues, and gaining insights into this problem. I hope you enjoyed this as much as I did. See you in two weeks!