Agile Data Science 2.0

Building Full-Stack Data Analytics Applications with Spark

Agile Data Science 2.0 covers the theory and practice of applying agile methods to data science, the practice of applied analytics research. The book takes the stance that data products are the preferred output format for data science teams to effect change in an organization. Accordingly, we show how to "get meta" to enable agility in building applications that describe the applied research process itself. Then we show how to use "big data" tools to iteratively build, deploy and refine analytics applications. Tracking data-product development through the five stages of the "data value pyramid", we show you how to build applications from conception through development and deployment to iterative improvement. Application development is a fundamental skill for a data scientist, and we show you how to effect maximal change within your organization by publishing your data science work as a web application.

Technologies covered include Python, Apache Spark (Spark MLlib, Spark Streaming), Apache Kafka, MongoDB, Elasticsearch, d3.js, scikit-learn and Apache Airflow. Beyond any one technology, we show you how to compose a data platform that makes you a productive application developer.

Details

Title: Agile Data Science 2.0
By: Russell Jurney
Publisher: O'Reilly Media
Formats: Print, Safari Books Online, Early Release Ebook
Print: May 2017 (est.)
Early Release Ebook: September 2016
Pages: 325 (est.)
Print ISBN: 978-1-4919-6011-0 | ISBN 10: 1-4919-6011-6
Early Release Ebook ISBN: 978-1-4919-6004-2 | ISBN 10: 1-4919-6004-3

Table of Contents

  1. Theory - Introduces the Agile Big Data methodology.
  2. Toolset - Introduces our toolset and helps you get it up and running on your own machine.
  3. Data - Describes the dataset used in this book.
  4. Collecting and Displaying Records - Helps you download flight data and then connect or “plumb” flight records through to a web application.
  5. Visualizing Data with Charts and Tables - Steps you through navigating your data by preparing simple charts in a web application.
  6. Exploring Data with Reports - Teaches you how to extract entities from your data, then parameterize and link between them to create interactive reports.
  7. Making Predictions - Takes what you’ve done so far and predicts whether your flight will be on time or late.
  8. Deploying Predictive Systems - Shows how to deploy predictions to ensure they impact real people and systems.
  9. Improving Predictions - Iteratively improves the performance of our on-time flight prediction.

About the Author

Russell Jurney is a practicing data scientist living in Pacifica, CA. He is principal consultant at Data Syndrome. His other works include Agile Data Science (1.0), Big Data for Chimps and Mapping Big Data.