# Book Reviews

**What We’re Reading**

# Book Review

**Test-Driven Machine Learning**

The book begins with an introduction to test-driven machine learning and quantifying model quality. From there, you will test a neural network, predict values with regression, and build upon regression techniques with logistic regression...
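The idea of writing a test that quantifies model quality can be sketched in a few lines. This is only an illustration, not code from the book: the `threshold_classifier`, the toy data, and the choice of a majority-class baseline are all assumptions made for the example.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)


def threshold_classifier(x, threshold=0.5):
    """Hypothetical stand-in for a trained model: predict 1 at or above the threshold."""
    return 1 if x >= threshold else 0


def test_classifier_beats_baseline():
    # Toy data, assumed for illustration only.
    inputs = [0.1, 0.4, 0.6, 0.9, 0.3, 0.8]
    labels = [0, 0, 1, 1, 0, 1]
    predictions = [threshold_classifier(x) for x in inputs]
    # Baseline: always predict the most common class.
    majority_baseline = max(labels.count(0), labels.count(1)) / len(labels)
    assert accuracy(predictions, labels) > majority_baseline


test_classifier_beats_baseline()
```

In a test-driven workflow, a test like this would be written first and the model improved until it passes.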

**Python Data Science Handbook**

By Jake VanderPlas

The book introduces the core libraries essential for working with data in Python: particularly IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related packages. Familiarity with Python as a language is assumed; if you need a quick introduction to the language itself, see the free companion project, A Whirlwind Tour of Python: it's a fast-paced introduction to the Python language aimed at researchers and scientists.

**Neural Networks and Deep Learning**

By Michael Nielsen

*Neural Networks and Deep Learning* is a free online book. The book will teach you about:

- Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data
- Deep learning, a powerful set of techniques for learning in neural networks

**Think Bayes**

By Allen B. Downey

*Think Bayes* is an introduction to Bayesian statistics using computational methods.

The premise of this book, and the other books in the *Think X* series, is that if you know how to program, you can use that skill to learn other topics.
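As a taste of that programming-first approach, here is a minimal sketch of a Bayesian update written as plain Python. The two-bowl cookie scenario and its counts are assumptions chosen for illustration, and `bayes_update` is a hypothetical helper, not an API from the book.

```python
from fractions import Fraction


def bayes_update(priors, likelihoods):
    """Return posterior probabilities from priors and per-hypothesis likelihoods."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Assumed setup: bowl 1 holds 30 vanilla and 10 chocolate cookies,
# bowl 2 holds 20 of each, and a bowl is picked at random.
priors = {"bowl_1": Fraction(1, 2), "bowl_2": Fraction(1, 2)}
likelihood_vanilla = {"bowl_1": Fraction(30, 40), "bowl_2": Fraction(20, 40)}

# Observing one vanilla cookie shifts belief toward bowl 1.
posterior = bayes_update(priors, likelihood_vanilla)
print(posterior["bowl_1"])  # 3/5
```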

**Statistical Learning with Sparsity: The Lasso and Generalizations**

By Trevor Hastie, Robert Tibshirani, Martin Wainwright

During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. This book describes the important ideas in these areas in a common conceptual framework.

**Statistical inference for data science**

By Brian Caffo

This book is written as a companion book to the Statistical Inference Coursera class as part of the Data Science Specialization. However, if you do not take the class, the book mostly stands on its own. A useful component of the book is a series of YouTube videos that comprise the Coursera class.

**Convex Optimization**

By Stephen Boyd and Lieven Vandenberghe

This book is about convex optimization, a special class of mathematical optimization problems, which includes least-squares and linear programming problems.
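To make the least-squares case concrete, here is a small sketch that fits a line by minimizing the sum of squared errors. Because that objective is convex, the closed-form solution is a global minimum, so any perturbation of the fitted parameters can only increase the error. The data points and the helper names `fit_line` and `sse` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sse(slope, intercept, xs, ys):
    """Sum of squared errors: the convex objective being minimized."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


# Assumed toy data, roughly following y = 2x + 1 with noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

slope, intercept = fit_line(xs, ys)
best = sse(slope, intercept, xs, ys)

# Nudging either parameter away from the optimum increases the error.
print(best < sse(slope + 0.1, intercept, xs, ys))
print(best < sse(slope, intercept - 0.1, xs, ys))
```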

**Natural Language Processing with Python**

By Steven Bird, Ewan Klein, and Edward Loper

This is a book about Natural Language Processing. By "natural language" we mean a language that is used for everyday communication by humans; languages like English, Hindi or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules.

**Automate the Boring Stuff with Python**

By Al Sweigart

If you've ever spent hours renaming files or updating hundreds of spreadsheet cells, you know how tedious tasks like these can be. But what if you could have your computer do them for you?

**Social Media Mining: An Introduction**

By Reza Zafarani, Mohammad Ali Abbasi and Huan Liu

The growth of social media over the last decade has revolutionized the way individuals interact and industries conduct business. Individuals produce data at an unprecedented rate by interacting, sharing, and consuming content through social media. Understanding and processing this new type of data to glean actionable patterns presents challenges and opportunities for interdisciplinary research, novel algorithms, and tool development. Social Media Mining integrates social media, social network analysis, and data mining to provide a convenient and coherent platform for students, practitioners, researchers, and project managers to understand the basics and potentials of social media mining.

**Naked Statistics: Stripping the Dread from the Data**

“For those who slept through Stats 101, this book is a lifesaver. The author strips away the arcane and technical details and focuses on the underlying intuition that drives statistical analysis. He clarifies key concepts such as inference, correlation, and regression analysis, reveals how biased or careless parties can manipulate or misrepresent data, and shows us how brilliant and creative researchers are exploiting the valuable data from natural experiments to tackle thorny questions.”

**Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking**

“Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today. Based on an MBA course Provost has taught at New York University over the past ten years, the book provides examples of real-world business problems to illustrate these principles.”

**Data Smart: Using Data Science to Transform Information into Insight**

“Data Science gets thrown around in the press like it’s magic. Major retailers are predicting everything from when their customers are pregnant to when they want a new pair of Chuck Taylors. It’s a brave new world where seemingly meaningless data can be transformed into valuable insight to drive smart business decisions. Data science is little more than using straight-forward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that’s done within the familiar environment of a spreadsheet.”

**Data Science from Scratch: First Principles with Python**

“In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.”

**R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics (O’Reilly Cookbooks)**

“This book helps you perform data analysis with R quickly and efficiently. This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression. Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you’re a beginner, R Cookbook will help get you started. If you’re an experienced data programmer, it will jog your memory and expand your horizons.”

**R for Data Science: Import, Tidy, Transform, Visualize, and Model Data**

“Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible. Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results.”

**Numsense! Data Science for the Layman: No Math Added**

“Want to get started on data science? Our promise: no math added. This book has been written in layman’s terms as a gentle introduction to data science and its algorithms. Each algorithm has its own dedicated chapter that explains how it works, and shows an example of a real-world application. To help you grasp key concepts, we stick to intuitive explanations, as well as lots of visuals, all of which are colorblind-friendly. This book provides a practical understanding of data science, so that you can leverage its strengths in making better decisions.”

“Do you know that last two years accounts for 90 percent of the data in the world? Data whispers stories. Only if you listen carefully, process it, analyze it and act on it, to move towards your next revolution. In this book, you will have gain tremendous insights, understanding and basics of Big Data and how it can helps to identify new growth areas and product opportunities, streamline their costs, increase their operating margins and above all; make better human resource decisions using efficient budgets.”

**Practical Data Science with R**

“Practical Data Science with R shows you how to apply the R programming language and useful statistical techniques to everyday business situations. Using examples from marketing, business intelligence, and decision support, it shows you how to design experiments (such as A/B tests), build predictive models, and present results to audiences of all levels. This book is accessible to readers without a background in data science. Some familiarity with basic statistics, R, or another scripting language is assumed.”

**Introduction to Machine Learning with Python: A Guide for Data Scientists**

“Machine learning has become an integral part of many commercial applications and research projects, but this field is not exclusive to large companies with extensive research teams. If you use Python, even as a beginner, this book will teach you practical ways to build your own machine learning solutions. With all the data available today, machine learning applications are limited only by your imagination. You’ll learn the steps necessary to create a successful machine-learning application with Python and the scikit-learn library.”

**The Data Science Handbook**

“The Data Science Handbook is an ideal resource for data analysis methodology and big data software tools. The book is appropriate for people who want to practice data science, but lack the required skill sets. This includes software professionals who need to better understand analytics and statisticians who need to understand software. Modern data science is a unified discipline, and it is presented as such. This book is also an appropriate reference for researchers and entry-level graduate students who need to learn real-world analytics and expand their skill set.”

**Doing Data Science: Straight Talk from the Frontline**

“In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science. *Doing Data Science* is collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.”

**Data Science For Dummies**

“Data Science For Dummies is the perfect starting point for IT professionals and students who want a quick primer on all areas of the expansive data science space. With a focus on business cases, the book explores topics in big data, data science, and data engineering, and how these three areas are combined to produce tremendous value. If you want to pick-up the skills you need to begin a new career or initiate a new project, reading this book will help you understand what technologies, programming languages, and mathematical methods on which to focus.”

**Python Data Science Handbook: Essential Tools for Working with Data**

“Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all—IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.”

**The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists**

“The Data Science Handbook contains interviews with 25 of the world’s best data scientists. In The Data Science Handbook, you will find war stories from DJ Patil, US Chief Data Officer and one of the founders of the field. You’ll learn industry veterans such as Kevin Novak and Riley Newman, who head the data science teams at Uber and Airbnb respectively. This book is perfect for aspiring or current data scientists to learn from the best. It’s a reference book packed full of strategies, suggestions and recipes to launch and grow your own data science career.”

**Data Analytics: Master The Techniques For Data Science, Big Data And Data Analytics**

“Inside you will find the tools you need in order to take full advantage of all of the data that your business is already generating. There are currently over a quintillion byte of data being created each and every day and if you aren’t considering how you can make the most of your share then you are already losing out to the competition. Understanding what this data truly means is key to succeeding in the marketplace these days and if you are looking for a way to give yourself an edge then Data Analytics is the book you have been waiting for.”

**Practical Statistics for Data Scientists: 50 Essential Concepts**

“Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not. Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R programming language, and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.”

**Data Analytics Made Accessible: 2017 Edition**

“This book fills the need for a concise and conversational book on the growing field of Data Science. Easy to read and informative, this lucid book covers everything important, with concrete examples, and invites the reader to join this field. The book contains case-lets from real-world stories at the beginning of every chapter. There is also a running case study across the chapters as exercises. This 2017 edition has added four new chapters in response to the thoughts and suggestions expressed by many reviewers. Finally, it includes a tutorial for R platform.”

**Learning R: A Step-by-Step Function Guide to Data Analysis**

“Learn how to perform data analysis with the R language and software environment, even if you have little or no programming experience. With the tutorials in this hands-on guide, you’ll learn how to use the essential R tools you need to know to analyze data, including data types and programming concepts. The second half of Learning R shows you real data analysis in action by covering everything from importing data to publishing your results. Each chapter in the book includes a quiz on what you’ve learned, and concludes with exercises, most of which involve writing R code.”

“After discussing the trajectory from data to insight to decision, the book describes four approaches to machine learning: information-based learning, similarity-based learning, probability-based learning, and error-based learning. Each of these approaches is introduced by a nontechnical explanation of the underlying concept, followed by mathematical models and algorithms illustrated by detailed worked examples. Finally, the book considers techniques for evaluating prediction models and offers two case studies that describe specific data analytics projects through each phase of development.”

**The Book of Why: The New Science of Cause and Effect**

In his new book, *The Book of Why: The New Science of Cause and Effect*, computer scientist and statistician Judea Pearl argues that AI will need to understand the how and why of relationships to reach human-like intelligence. Pearl, winner of the 2011 Turing Award, the highest honor in computer science, proposes that a reliance on association rule learning is hampering the development of AI. Pearl asserts that truly intelligent machines could handle situations for which they have no data and that machines equipped with causal reasoning tools, such as the algorithmization of counterfactuals, will experience accelerated learning speeds.

*Confident Data Skills*, by Kirill Eremenko, reviewed

Irish Tech News

About the author: Kirill Eremenko is a data scientist and entrepreneur. He is the founder and CEO of Superdatascience.com, an online educational platform in the space of Data Science and Artificial Intelligence. The company's mission is to 'Make The Complex Simple', teaching tool-based courses such ...

January 9, 2018 by Daniel Gutierrez

AI and deep learning are pretty hot technologies right now, what with the accelerating interest in computer vision, image recognition and classification, natural language processing (NLP), and speech recognition. Deep Neural Networks (DNNs), upon which deep learning is based, are trained with large amounts of data, and can solve complex tasks with unprecedented accuracy. TensorFlow is a leading open source software framework that helps you build and train neural networks. Here’s a nice resource to help you kick-start your use of TensorFlow – “Learning TensorFlow” by Tom Hope, Yehezkel S. Resheff and Itay Lieder.

This O’Reilly book is short and sweet at 228 pages. I found it concise in how it provides a hands-on approach to TensorFlow fundamentals for a broad technical audience – from data scientists, to data engineers, to students and researchers. If you’re looking for an in-depth introduction to neural networks and deep learning, however, this book is not for you. There are many other fine texts for that purpose. The purpose of the book, rather, is to provide a quick introduction to the TensorFlow framework and get you up and running. I think this goal is achieved. This book is a welcome alternative to the online documentation for TensorFlow (I don’t like learning purely from online content; I need to hold a book in my hands). You’ll need to be familiar with Python programming, as code snippets are found throughout the book.

The book directs you to the MNIST handwritten digits data set to perform some machine learning and image processing. So early on, you are building Convolutional Neural Networks (CNNs) with Python and TensorFlow. Next, you’re introduced to the CIFAR10 data set. You’ll learn to train a DNN and build models to recognize images of automobiles, airplanes, and various animals with a decent 70% accuracy using TensorFlow. Some may question the use of the MNIST and CIFAR data sets (which are the same ones you’ll find discussed on the TensorFlow website), but I don’t see that as a bad thing. These are industry standard data sets and offer a certain level of familiarity while learning a new framework.

**Here is a list of chapters:**

Chapter 1 – Introduction

Chapter 2 – Go with the Flow: Up and Running with TensorFlow

Chapter 3 – Understanding TensorFlow Basics

Chapter 4 – Convolutional Neural Networks

Chapter 5 – Text I: Working with Text and Sequences, and TensorBoard Visualization

Chapter 6 – Text II: Word Vectors, Advanced RNN, and Embedding Visualization

Chapter 7 – TensorFlow Abstractions and Simplifications

Chapter 8 – Queues, Threads, and Reading Data

Chapter 9 – Distributed TensorFlow

Chapter 10 – Exporting and Serving Models with TensorFlow

At only 228 pages, you can’t consider this book a complete TensorFlow reference manual. However, you can use the book as a first-level reference as you dig deeper into the framework using more in-depth online resources. You’ll benefit from knowing some Python, and a reasonable knowledge of computer science, machine learning, linear algebra, and statistics is almost expected.

I appreciated that the higher level abstractions are saved for later in the book (Chapter 7). Once you’ve worked through CNNs and RNNs, the book introduces contrib.learn, TFLearn and Keras for higher level abstraction. The book walks you through how to install and use these open source technologies. Additionally, no contemporary book about deep learning technology would be complete without a discussion of distributed computing (Chapter 9). This chapter walks through examples of computing gradients across a cluster to speed up training.
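For readers new to the idea, the core of data-parallel training can be sketched without any framework. The snippet below is not TensorFlow code and is not taken from the book; it assumes a one-parameter model `y = w * x`, a toy data set, and two hypothetical workers holding equal-sized shards. Each worker computes a gradient on its own shard, and the averaged gradient drives a single shared update.

```python
def gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


# Assumed toy data generated by y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]  # two hypothetical workers, equal-sized shards

w = 0.0
learning_rate = 0.05
for _ in range(100):
    # Each worker computes a gradient on its own shard (in parallel on a real cluster).
    worker_grads = [gradient(w, shard) for shard in shards]
    # Averaging equal-sized shard gradients reproduces the full-batch gradient.
    avg_grad = sum(worker_grads) / len(worker_grads)
    w -= learning_rate * avg_grad

print(round(w, 6))  # 2.0
```

With equal-sized shards the averaged gradient matches what a single machine would compute on the whole batch, so the speed-up comes from splitting the work rather than from changing the optimization.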

“Learning TensorFlow” represents a quick introduction to this popular deep learning framework. It won’t be your only learning resource, but it’s a great place to start. If you find yourself going through the new five-course series for the Deep Learning Specialization on Coursera, you’ll find that TensorFlow is used regularly and this book will be a welcome resource.

*Contributed by Daniel D. Gutierrez, Managing Editor and Resident Data Scientist of insideBIGDATA. In addition to being a tech journalist, Daniel is also a practicing data scientist, author, and educator, and sits on a number of advisory boards for various start-up companies.*

### Recommendation Engines: Learn How to Drive More Users to Your Content

A new guide from Dataiku provides a high-level overview of recommendation engines, how they’re built, and how they can be used to improve your business. Download the full report to find out how recommendation engines can be an effective way to drive more eyes to your content.

### “Deep Learning” Book Chapter Walk-Throughs by Ian Goodfellow

Here's a tremendous learning resource for Deep Learning practitioners - a complete set of video walk-through presentations for each chapter from the recent book "Deep Learning" by Goodfellow, Bengio, and Courville. This book is considered one of the finest texts on the subject. The video series is an excellent way to advance through all the material in the book.

- China’s AI Awakening. The West shouldn’t fear China’s artificial-intelligence revolution. It should copy it. (MIT Technology Review)
- Numerai’s Master Plan. The core idea of Numerai was to give away all of our data for free, and let anyone train machine learning algorithms on it and submit predictions to our hedge fund. (Medium)
- Using Monte Carlo simulations to balance supply and demand in a marketplace. How Instacart decides when to stop taking orders and that hinders them from fulfilling same day delivery. (Medium)
- Deep Generative Models. A tutorial of the recent advances in deep generative models that allow for data-efficient learning and model-based reinforcement learning. (DeepMind)
- Hardware Architectures for Deep Neural Networks. An overview of DNNs, the tradeoffs of the various architectures, techniques to reduce the computation cost, and the different hardware requirements for inference and training. (MIT)

I recently had a need for a Python language resource to supplement a series of courses on Deep Learning I was evaluating that depended on this widely used language. As a long-time data science practitioner, my language of choice has been R, so I relished the opportunity to dig into Python to see first hand how the other side of the data science world did machine learning. The book I settled on was "Python Data Science Handbook: Essential Tools for Working with Data" by Jake VanderPlas.

#### Statistics Done Wrong: The Woefully Complete Guide

*"... a pithy, essential guide to statistical blunders in modern science that will show you how to keep your research blunder-free..."*