Data Syndrome

Presentations

Agile Data Science 2.0
Introduction to PySpark

Agile Data Science 2.0

Building Full Stack Data Analytics Applications with Kafka and Spark

Agile Data Science 2.0 (O’Reilly 2017) defines a methodology and a software stack with which to apply the methods. The methodology seeks to deliver data products in short sprints by going meta and putting the focus on the applied research process itself. The stack is but an example of one meeting the requirements that it be utterly scalable and utterly efficient in use by application developers as well as data engineers. It includes everything needed to build a full-blown predictive system: Apache Spark, Apache Kafka, Apache Incubating Airflow, MongoDB, ElasticSearch, Apache Parquet, Python/Flask, JQuery. This talk will cover the full lifecycle of large data application development and will show how to use lessons from agile software engineering to apply data science using this full-stack to build better analytics applications.

Introduction to PySpark

Spark has emerged as the leading general purpose distributed data processing platform. PySpark offers a Python interface to Spark that enables all the power of Python for data processing to come to bear when computing with Spark. Working with airline flight delay data, the tutorial will start by covering basic operations in PySpark: loading and storing data, filtering, mapping, grouping, and SQL operations. We'll go on to tour the RDD and DataFrame APIs, showing how and when to use them. We'll learn how to prepare data and store it in different kind of databases. The class will show how to combine data flow programming andSpark SQL to slice and dice data of any size. Finally, we'll show how to use machine learning via Spark MLlib to build a predictive model to predict flight delays.

DATE	EVENT	TALK TITLE	DESCRIPTION
Oct 12, 2017	Agile Data London	Agile Data Science 2.0	Discussing the agile development of full-stack analytics applications.
July 13, 2017	Seattle Spark Meetup	Agile Data Science 2.0	Discussing the agile development of full-stack analytics applications.
July 12, 2017	Metis: Seattle Data Science	Agile Data Science 2.0	Discussing the agile development of full-stack analytics applications.
July 11, 2017	Seattle Full Stack Meetup	Agile Data Science 2.0	Discussing the agile development of full-stack analytics applications.
Jun 29, 2017	DFW MongoDB Meetup, DFW Data Science, DFW Big Data	Agile Data Science 2.0	Discussing the agile development of full-stack analytics applications.
Jun 15, 2017	Los Angeles MongoDB User Group	Agile Data Science 2.0	Discussing how to use Spark with MongoDB to build big data science applications.
Jun 14, 2017	Research Methods and Data Science Meetup, Los Angeles, CA	Agile Data Science 2.0	Discussing the agile development of full-stack analytics applications. Focusing on how to do applied research through Agile Data Science.
Jun 12, 2017	Dataworks Summit San Jose	Agile Data Science 2.0: Agile and Iterative Machine Learning	Discussing the agile development of full-stack analytics applications.
Apr 13, 2017	Metis San Francisco Data Science	Agile Data Science 2.0	Discussing the agile development of full-stack analytics applications.
Mar 16, 2017	Office Hour with Russell Jurney at O'Reilly Strata in San Jose	Agile Data Science 2.0	The creation, deployment, and iterative improvement of a real-time predictive system using Python, Spark MLlib, Spark Streaming, Kafka, MongoDB, and JQuery.
Mar 15, 2017	O'Reilly Strata San Jose	Track Host, Room 210 A/E	I will be introducing speakers and entertaining the audience on Wednesday in room 210 A/E.
Mar 14, 2017	Airflow Meetup @ Paypal	Predictive Analytics with Airflow and PySpark	How to wrangle PySpark scripts into a full blown analytics application using Airflow.
Mar 6, 2017	SF Data Science Meetup	Agile Data Science	Discussing the agile development of full-stack analytics application.
Feb 24, 2017	Big Data Science Meetup	Agile Data Science	Discussing the agile development of full-stack analytics application.
Feb 15, 2017	SF Python Meetup	Introduction to PySpark	An introduction to data processing and machine learning with Apache PySpark.
Feb 8, 2017	Big Data Application Meetup	Agile Data Science	Discussing the agile development of full-stack analytics application.

DATE

EVENT

TALK TITLE

DESCRIPTION

Oct 12, 2017

Agile Data London

Agile Data Science 2.0

Discussing the agile development of full-stack analytics applications.

July 13, 2017

Seattle Spark Meetup

Agile Data Science 2.0

Discussing the agile development of full-stack analytics applications.

July 12, 2017

Metis: Seattle Data Science

Agile Data Science 2.0

Discussing the agile development of full-stack analytics applications.

July 11, 2017

Seattle Full Stack Meetup

Agile Data Science 2.0

Discussing the agile development of full-stack analytics applications.

Jun 29, 2017

DFW MongoDB Meetup, DFW Data Science, DFW Big Data

Agile Data Science 2.0

Discussing the agile development of full-stack analytics applications.

Jun 15, 2017

Los Angeles MongoDB User Group

Agile Data Science 2.0

Discussing how to use Spark with MongoDB to build big data science applications.

Jun 14, 2017

Research Methods and Data Science Meetup, Los Angeles, CA

Agile Data Science 2.0

Discussing the agile development of full-stack analytics applications. Focusing on how to do applied research through Agile Data Science.

Jun 12, 2017

Dataworks Summit San Jose

Agile Data Science 2.0: Agile and Iterative Machine Learning

Discussing the agile development of full-stack analytics applications.

Apr 13, 2017

Metis San Francisco Data Science

Agile Data Science 2.0

Discussing the agile development of full-stack analytics applications.

Mar 16, 2017

Office Hour with Russell Jurney at O'Reilly Strata in San Jose

Agile Data Science 2.0

The creation, deployment, and iterative improvement of a real-time predictive system using Python, Spark MLlib, Spark Streaming, Kafka, MongoDB, and JQuery.

Mar 15, 2017

O'Reilly Strata San Jose

Track Host, Room 210 A/E

I will be introducing speakers and entertaining the audience on Wednesday in room 210 A/E.

Mar 14, 2017

Airflow Meetup @ Paypal

Predictive Analytics with Airflow and PySpark

How to wrangle PySpark scripts into a full blown analytics application using Airflow.

Mar 6, 2017

SF Data Science Meetup

Agile Data Science

Discussing the agile development of full-stack analytics application.