Data science project: cryptocurrencies (Part 1)

In this first article, I will try to answer the question “why are we here?” (not the metaphysical one) and give you a little taste of one of our first steps.

Mauricio Letelier
Photo by Abigail Faith on Unsplash (I was just wondering if…)

There are a lot of articles that cover the introductory context of cryptocurrencies: what blockchain is, how the market works, what technical analysis is, and so on. There are brilliant descriptions of all of this, and you can find excellent resources in the linked words and in a lot of other places, so we will skip this part (at least for now). Instead, we will jump straight into one of the thousands of ways to tackle the questions that trading involves, like “should I buy?”, “when should I sell?”, etc.

So, you might be thinking, “OK, but there are a lot of articles applying all kinds of machine learning algorithms to cryptocurrencies too (mainly LSTM models). If the point is to avoid existing content, you are not doing any better.” That is a long shot at predicting what you were thinking about, huh! (COVID-19 would be a safer guess.) But in case I was right, let me explain.

This series of articles will focus more on the questions that I had as a newcomer to this problem than on the models themselves (don’t worry, there will be a lot about the models too). I’m not saying that this will be completely original, but that will at least be one of the goals. I don’t pretend to have the best answer either, because as I mentioned I’m a newcomer, and because it is hard to believe that there is such a thing as the best way to do anything. What I will be sharing here is the path of trying to solve this problem as I go through it.

Considering all of that, here is what you should expect to see in the next articles:

  • A lot of interior monologue (to the point of stream of consciousness sometimes). This is because I want to avoid the usual third-person omniscient point of view used in most academic articles, which makes me wonder “where did all that stuff come from?” and “is that the absolute truth, then?”. As you can see, I will even show you the article I read to write this paragraph.

After this super unnecessary background, let’s finally start to taste some of the meat of the project (soy meat for vegans).

Retrieving the data may not seem like such a challenging task, but in this particular scenario it is definitely something we must think twice about.

My first approach was to look for a reliable exchange and then for its API documentation. My first Google search was “Best exchanges for cryptocurrencies”, and I found a lot of rankings with a bunch of arguments about which one was best suited to each specific goal. Coinbase was in almost every one of them.


The Python package that I found, cbpro, seemed pretty OK to me (yes, and besides coding in Python I will be running it in a Jupyter Notebook, so original!): it is easy to use and has the granularity parameter, which will be super useful when we discuss investment horizons. After struggling a little with the parameters for choosing the start and end dates, I checked the API documentation, and there were the natural “start” and “end” parameters. So, for example, let’s see a simple call for the ETH-USD pair.
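The original snippet is not reproduced here, so what follows is only a minimal sketch of what such a call might look like. It assumes cbpro's `PublicClient.get_product_historic_rates` method; the `fetch_candles` helper and the dates in the comments are hypothetical and only there for illustration.

```python
from datetime import datetime, timezone

HOUR = 3600  # granularity in seconds (one-hour candles)


def fetch_candles(client, product_id, start, end, granularity=HOUR):
    """Ask a cbpro-style client for candles between two datetimes.

    `client` is assumed to expose get_product_historic_rates, as
    cbpro.PublicClient does. Dates are sent as ISO 8601 strings.
    """
    return client.get_product_historic_rates(
        product_id,
        start=start.isoformat(),
        end=end.isoformat(),
        granularity=granularity,
    )


# Typical use (needs the cbpro package and network access):
#   import cbpro
#   client = cbpro.PublicClient()
#   start = datetime(2020, 5, 1, tzinfo=timezone.utc)   # hypothetical dates
#   end = datetime(2020, 5, 13, tzinfo=timezone.utc)
#   candles = fetch_candles(client, "ETH-USD", start, end)
#   # each row is expected to be [time, low, high, open, close, volume]
```

Passing the client in as an argument keeps the helper easy to try with a stand-in object before hitting the real API.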

I also had fun with candlesticks (Plotly). Link over here.

I chose a one-hour granularity, and the candlestick chart reflects the price fluctuation over these 12 days.
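For anyone who wants to reproduce a chart like this, here is a rough sketch rather than the exact notebook code. It assumes each candle row has the shape `[time, low, high, open, close, volume]` with `time` as a Unix timestamp; the helper names `candles_to_columns` and `plot_candles` are made up for this example.

```python
from datetime import datetime, timezone


def candles_to_columns(rows):
    """Turn [time, low, high, open, close, volume] rows into OHLC columns.

    Rows are sorted by timestamp so the chart runs oldest to newest,
    whatever order the exchange returned them in.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    return {
        "time": [datetime.fromtimestamp(row[0], tz=timezone.utc) for row in ordered],
        "low": [row[1] for row in ordered],
        "high": [row[2] for row in ordered],
        "open": [row[3] for row in ordered],
        "close": [row[4] for row in ordered],
    }


def plot_candles(columns):
    """Draw a candlestick chart with Plotly (imported here so the helper
    above still works when plotly is not installed)."""
    import plotly.graph_objects as go

    fig = go.Figure(
        data=[
            go.Candlestick(
                x=columns["time"],
                open=columns["open"],
                high=columns["high"],
                low=columns["low"],
                close=columns["close"],
            )
        ]
    )
    fig.show()
    return fig
```

In a notebook, `plot_candles(candles_to_columns(candles))` would render the figure inline, where `candles` is whatever list of rows the exchange call returned.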

This first step reveals one of our first “wait a minute” moments. So wait a minute! What if I wanted to retrieve the last month of data with the same granularity? Well, this is what happens when the period of the call is widened to a month.

Beautiful! A lovely generic error. After trying different things, I found the actual cause: the request exceeded the maximum of 300 data points. (Twelve days of hourly candles is about 12 × 24 = 288 points, just under the limit; a month is roughly 720.) This implies that we probably need a method for splitting requests if we want to gather more data points. But while I was figuring out how to solve this problem, BANG! Wait a minute! Coinbase is just one of many exchanges, and its volume could be extremely low compared to the entire market. Because of that, it might not reflect subtle signals happening on other exchanges.
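One possible way around the 300-point limit, sketched here under assumptions, is to cut the period into consecutive windows that each cover at most 300 candles and request them one by one. The `split_windows` helper below is hypothetical, and whether the API counts both endpoints of a window is something to check against the documentation (passing a smaller `max_points` is a safe fallback).

```python
from datetime import datetime, timedelta, timezone

MAX_POINTS = 300  # per-request limit mentioned in the article


def split_windows(start, end, granularity, max_points=MAX_POINTS):
    """Yield consecutive (window_start, window_end) pairs covering
    [start, end), each spanning at most max_points candles of
    `granularity` seconds."""
    step = timedelta(seconds=granularity * max_points)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


# A 30-day period at one-hour granularity is 720 candles,
# so it is split into windows of 300, 300 and 120 hours.
start = datetime(2020, 5, 1, tzinfo=timezone.utc)  # hypothetical dates
end = start + timedelta(days=30)
windows = list(split_windows(start, end, granularity=3600))
```

Each window can then be used as the “start” and “end” of a separate request, ideally with a short pause between calls so the exchange is not flooded.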

After grumbling for a while, and blaming myself for having the kind of thoughts that only make me start all over again, I decided to go for it. This is when I found Cryptocompare. One of the features of this site is a ranking of the volume traded in the last 24 hours for more than 200 exchanges. This is the exchange table ordered by volume, showing the first 7 markets:

(Screenshot: Cryptocompare exchange table ordered by 24-hour volume)

Coinbase was not even in the top 20 (it was 29th at that particular moment). So the intuition was right: the volume of the entire market is HUGE compared to just one exchange. The questions that arise are “will that really matter in our analysis?” and “do we really need all the data?”. The answer is: let’s see.

But those questions, my friend (yes, if you have read this far I declare you “my friend”), belong to a whole new story (a Medium story). I hope you got a feel for what will be going on here, and if you like it, I invite you to meet again two weeks from now for the super exciting debate about what a reliable source of information really is.

Check out Part 2

