Books. "Dark Data: A Practical Guide to Making Good Decisions in a World of Missing Data"


I will not hide the fact that I have always had a professional interest in working with data and analyzing them. So when I saw David Hand's book with the intriguing title "Dark Data" and read the subtitle, "A Practical Guide to Making Good Decisions in a World of Missing Data", I decided to leaf through it. To my delight, David Hand turned out to be a British statistician who has served as President of the Royal Statistical Society and was made an Officer of the Order of the British Empire for his work.

I opened the book at random in several places, liked what I saw, and ended up reading it in a couple of days (360 pages or so, not much). The text is written in plain language, but without unnecessary simplifications or assumptions, which is very valuable in works of this kind. Behind the simplicity lie the author's deep knowledge and well-chosen examples that let you appreciate how much the data matter. “Dark data” refers to information that is missing from a dataset or has been distorted, intentionally or accidentally, and that changes our understanding of the subject under discussion. Here is an example that explains it well:

"The Arctic expeditions of 1852, 1857 and 1875 were supplied with Arctic Ale, a beer with an especially low freezing point brewed by Samuel Allsopp. Alfred Barnard, who wrote a history of British brewing, tasted this ale in 1889 and described it as “a pleasant brown hue, with a taste of wine and nuts, and such a fizz as if it had just been brewed ... Due to the large amount of unfiltered extract remaining, it should be regarded as an extremely valuable and nutritious product.” Just what you need on an Arctic expedition.

In 2007 a bottle from the 1852 batch was listed on eBay with a starting price of $299. The seller, who had kept it for 50 years, misspelled the name of the beer, dropping one "p" from the word "Allsopp". As a result, the item did not show up in searches for vintage beer, and only two bids were placed. The winning bid came from 25-year-old Daniel Woodul, who offered all of $304. To find out what his purchase was worth, Woodul immediately put the bottle up for sale again, this time with the name spelled correctly. It attracted 157 bids, the highest of them $503,300.

In this case, one missing letter was worth half a million dollars. This is a clear example of how a loss of information can have significant consequences."

In fact, the half-million-dollar bid was a joke, and the bottle ended up being sold for $4,300. That is still an order of magnitude more than what the first owner received. An accidental distortion of information cost someone quite real money, and such situations happen all the time. Recall how stock market investors buy shares of random companies whose names sound like the ones that actually interest them. It seems that this should be impossible in our time, but the situation keeps recurring and apparently cannot be avoided.
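The search failure in the beer story is easy to reproduce. Below is a minimal Python sketch, not anything taken from the book: the listing titles, the function names and the 0.8 similarity threshold are all invented for illustration. It compares an exact substring search with a more tolerant word-level match built on the standard library's difflib.SequenceMatcher.

```python
from difflib import SequenceMatcher

def exact_search(query, titles):
    """Return titles that contain the query as an exact substring (case-insensitive)."""
    q = query.lower()
    return [t for t in titles if q in t.lower()]

def fuzzy_search(query, titles, threshold=0.8):
    """Return titles in which at least one word is similar enough to the query.

    The threshold of 0.8 is an arbitrary illustrative choice.
    """
    q = query.lower()
    hits = []
    for title in titles:
        words = title.lower().split()
        if any(SequenceMatcher(None, q, w).ratio() >= threshold for w in words):
            hits.append(title)
    return hits

# Hypothetical listing titles; the first one has the misspelled brewer name.
listings = [
    "Allsop Arctic Ale 1852 bottle",
    "Vintage stout bottle 1950",
]

print(exact_search("Allsopp", listings))
print(fuzzy_search("Allsopp", listings))
```

With these made-up listings, the exact search returns nothing, while the tolerant search finds the misspelled lot. The data were there all along; they were simply dark to anyone typing the correct name.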


Introducing the reader to the classification of dark data, Hand offers insight into how errors arise. For example, there are data that we do not even know are missing. In America, log cabins from the era of the settlement of the Wild West are often cited as evidence of the building skills of our ancestors. The very fact that these buildings still stand seems to prove the skill of the builders. But few people ask where all the other cabins went: they have simply disappeared. Only the best examples have survived to our time, and 99% are gone. These are exactly the data that most people never think about. Another often-cited example is the urban legend about dolphins that save people by pushing them towards the shore. But those whom the playful dolphins pushed out to the open sea can no longer tell us anything. This is survivorship bias.
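The log cabin effect can be shown with a small simulation. This is only a sketch under strong assumptions of my own: every cabin gets a random "build quality" between 0 and 1, and exactly the best 1% survive (the 1% mirrors the "99% are gone" figure above). The function name and all parameters are hypothetical.

```python
import random
from statistics import mean

def simulate_cabins(n=10_000, survive_fraction=0.01, seed=0):
    """Return (average quality of all cabins, average quality of survivors).

    Assumption: quality is uniform on [0, 1] and only the best cabins survive.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    quality = [rng.random() for _ in range(n)]
    keep = max(1, int(n * survive_fraction))
    survivors = sorted(quality, reverse=True)[:keep]
    return mean(quality), mean(survivors)

overall, surviving = simulate_cabins()
print(f"all cabins ever built: {overall:.2f}")
print(f"cabins still standing: {surviving:.2f}")
```

Someone who sees only the surviving cabins would rate the builders far higher than the full record justifies, because the missing 99% never enter the calculation.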

Another example of distortion is how data are perceived when society first pays attention to them. For example, newspapers do not report a certain type of crime in their crime columns, and then covering it becomes fashionable. This creates the false impression that the crime is something new and that such crimes are on the rise. In most cases that is not so; rather, we are seeing the data for the first time, and they are new only to us.

Errors in source data occur around us all the time, and one needs to be able to recognize them too. The human factor always comes first. I am sure you would be surprised to see millions in your bank account that were not there yesterday. Such a mistake is not so rare: employees of various companies often misplace the decimal separator when entering numbers. For example, in 2006 the Italian airline Alitalia sold business class tickets from Toronto to Cyprus for $39 instead of $3,900 each. The total loss was $7.2 million.

But it is possible to mix up not only the separators in numbers, but also the fields themselves. Mizuho Securities, an investment company, lost $300 million in 2005. It offered 610,000 shares of J-com at a price of one yen each, although it should have been the other way around: one share at a price of 610,000 yen. The book lists dozens of such mistakes that have cost companies billions. This, too, is about working with data that may be not just missing but distorted, whether by mistake or deliberately.

What other errors are there? All kinds. A measuring instrument can be faulty, or a method can fail. There is deliberate falsification, which can be detected with mathematical methods. One example that made me smile: a fake article containing nothing of substance was sent to hundreds of scientific journals, which resulted in hundreds of publications without any verification. This illustrates well the world we live in.
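I have not said which mathematical methods are meant, so the following is my own illustration rather than Hand's method. One well-known screening tool of this kind is a first-digit test based on Benford's law, which says that in many naturally occurring sets of numbers the leading digit d appears with a share of about log10(1 + 1/d). The Python sketch below assumes non-zero integer amounts; the function names and both sample datasets are hypothetical.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_shares(values):
    """Observed share of each leading digit among non-zero integers."""
    digits = [int(str(abs(v))[0]) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

def max_deviation(values):
    """Largest absolute gap between observed and expected digit shares."""
    expected = benford_expected()
    observed = first_digit_shares(values)
    return max(abs(observed[d] - expected[d]) for d in range(1, 10))

# Illustrative data: powers of two track Benford's law closely,
# while a block of numbers from 500 to 999 never starts with 1 to 4.
natural_like = [2 ** k for k in range(1, 501)]
suspicious = list(range(500, 1000))

print(f"powers of two: {max_deviation(natural_like):.3f}")
print(f"500..999:      {max_deviation(suspicious):.3f}")
```

A large deviation does not prove fraud, since plenty of honest datasets do not follow Benford's law. It only marks the numbers as worth a closer look.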

You know, this is one of those books that can be recommended to the widest range of people, from those who do data analysis professionally (you won't find anything new, but you will come across interesting moments) to ordinary readers who want to understand a little better how the world of information is taking shape around all of us. In a word, the book is worth reading: it is frankly good and written in plain language. I recommend it.
