
Agile Data Science 2.0

Building Full-Stack Data Analytics Applications with Spark


With Early Release ebooks, you get books in their earliest form—the author's raw and unedited content as he or she writes—so you can take advantage of these technologies long before the official release of these titles. You'll also receive updates when significant changes are made, new chapters are available, and the final ebook bundle is released.

Building analytics products at scale requires a deep investment in people, machines, and time. How can you be sure you're building the right models that people will pay for? With this hands-on book, you'll learn a flexible toolset and methodology for building effective analytics applications with Spark.

Using lightweight tools such as Python, PySpark, Elastic MapReduce, MongoDB, Elasticsearch, Doc2vec, Deep Learning, D3.js, Leaflet, Docker and Heroku, your team will create an agile environment for exploring data, starting with an example application that mines flight data and turns it into an analytic product.


Title: Agile Data Science 2.0
By: Russell Jurney
Publisher: O'Reilly Media
Formats: Print, Safari Books Online, Early Release Ebook
Print: May 2017 (est.)
Early Release Ebook: September 2016
Pages: 325 (est.)
Print ISBN: 978-1-4919-6011-0 | ISBN-10: 1-4919-6011-6
Early Release Ebook ISBN: 978-1-4919-6004-2 | ISBN-10: 1-4919-6004-3

Table of Contents

  1. Theory - Introduces the Agile Big Data methodology.
  2. Toolset - Introduces our toolset and helps you get it up and running on your own machine.
  3. Data - Describes the dataset used in this book.
  4. Collecting and Displaying Records - Helps you download flight data and then connect or “plumb” flight records through to a web application.
  5. Visualizing Data with Charts and Tables - Steps you through how to navigate your data by preparing simple charts in a web application.
  6. Exploring Data with Reports - Teaches you how to extract entities from your data, then parameterize and link between them to create interactive reports.
  7. Making Predictions - Uses what you’ve built so far to predict whether your flight will be on time or late.
  8. Deploying Predictive Systems - Shows how to deploy predictions to ensure they impact real people and systems.
  9. Improving Predictions - Iteratively improves the performance of our on-time flight prediction.

About the Author

Russell Jurney is a practicing data scientist living in Pacifica, CA. He is principal consultant at Data Syndrome. His other works include Agile Data Science (1.0), Big Data for Chimps and Mapping Big Data.