Agile Data Science 2.0 covers the theory and practice of applying agile methods to the applied analytics research known as data science. The book takes the stance that data products are the preferred output format for data science teams to effect change in an organization. Accordingly, we show how to "get meta" to enable agility in building applications that describe the applied research process itself. Then we show how to use "big data" tools to iteratively build, deploy and refine analytics applications. Tracking data-product development through the five stages of the "data value pyramid", we show you how to build applications from conception through development and deployment to iterative improvement. Application development is a fundamental skill for a data scientist, and we show you how to effect maximal change within your organization by publishing your data science work as a web application.
Technologies covered include Python, Apache Spark (Spark MLlib, Spark Streaming), Apache Kafka, MongoDB, Elasticsearch, D3.js, scikit-learn and Apache Airflow. More important than any one technology, we show you how to compose a data platform that makes you a productive application developer.
| Title: | Agile Data Science 2.0 |
|---|---|
| By: | Russell Jurney |
| Publisher: | O'Reilly Media |
| Formats: | Print, Safari Books Online, Early Release Ebook |
| Print: | May 2017 (est.) |
| Early Release Ebook: | September 2016 |
| Pages: | 325 (est.) |
| Print ISBN: | 978-1-4919-6011-0 (ISBN 10: 1-4919-6011-6) |
| Early Release Ebook ISBN: | 978-1-4919-6004-2 (ISBN 10: 1-4919-6004-3) |
Russell Jurney is a practicing data scientist living in Pacifica, CA. He is principal consultant at Data Syndrome. His other works include Agile Data Science (1.0), Big Data for Chimps and Mapping Big Data.