Tutorials/Invited Talks – ICDATA'19

Invited Talks, Tutorials

1) Peter Geczy: Data Revolution: From Data Science to Data Economy

2) Dawud Gordon: Applications of behavior-based authentication in the business world

3) Diego Galar: More Data Less Downtime

4) Mark C. Lewis: Big Data Analytics with Spark

5) John N. Carbone: Reducing Cyber Security Non-linearity, Risks and Complexity utilizing Axiomatic Design and Autonomous AI/Machine Learning

6) Andrew Johnston: Tutorial on Darknets and Cryptocurrency

7) Ulf Johansson: Tutorial on Predicting with confidence

Tutorial 1

Speaker	Andrew Johnston, Mandiant (Consultant)
Topic/Title	Tutorial on "Darknets and Cryptocurrency"
Date & Time	Wednesday, August 1, 2018 05:40 - 07:40pm
Location	Galleria B
Description	Hidden anonymous networks (commonly known as "darknets") continue to grow in popularity as the criminal underworld recognizes the value of these services. Similarly, cryptocurrency is the preferred way for criminals and terrorists to exchange funds. In this talk, we will explore these two technologies and how each is currently used by different threat groups. With the "state of the dark side" established, we will explore the new and emerging threats these technologies pose to individuals, corporations, and governments. Techniques such as asymmetric communication channels, malware that operates without command-and-control (C2) infrastructure, and misattributable network intrusions will be discussed. Finally, we will discuss why traditional defenses will fail to meet these threats, and analyze some potential solutions to counter the latest generation of threats.
Short Bio	Andrew Johnston is a consultant at Mandiant, a division of FireEye. He currently works as a proactive consultant performing red team, social engineering, and physical assessments for multinational clients. Andrew also leads a research team at Fordham University dedicated to using machine learning to eradicate Islamic extremism and narcotics trafficking online. Previously, Andrew worked with City of Hope Hospitals, LaQuinta Hotels, Staples, and the FBI, applying his data-driven approach to improving cybersecurity.

Tutorial 2

Speaker	Ulf Johansson, Department of Computer Science and Informatics, Jönköping University, Sweden, ulf.johansson@ju.se
Topic/Title	Tutorial on "Predicting with confidence"
Date & Time	Tuesday, July 31, 2018 03:40 - 05:40pm
Location	Galleria B
Description	How good is your prediction? In risk-sensitive applications, it is crucial to be able to assess the quality of a prediction, but traditional classification and regression models don't provide their users with any information about how trustworthy a prediction is. Conformal predictors, on the other hand, are predictive models that associate each of their predictions with a precise measure of confidence. Given a user-defined significance level ε, a conformal predictor outputs, for each test pattern, a multivalued prediction region (a set of class labels or a real-valued interval) that, under relatively weak assumptions, contains the test pattern’s true output value with probability at least 1 - ε. In other words, given a significance level ε, a conformal predictor makes an erroneous prediction with probability at most ε. The conformal prediction framework allows any traditional classification or regression model to be transformed into a confidence predictor with little extra work, both in terms of implementation and computational complexity. Some key properties of conformal prediction are:
• We obtain probabilities/error bounds per instance
• Probabilities are well-calibrated: 95% means 95%
• We don't need to know the priors
• We make a single assumption: that the data are exchangeable (a slightly weaker requirement than i.i.d.)
• We can apply it to any machine learning algorithm
• It is rigorously proven and straightforward to implement
• There is no magic involved, only mathematics and algorithms
Hence, confidence predictors are an important tool that every data scientist should carry in their toolbox, and conformal prediction represents a straightforward way of associating the predictions of any predictive machine learning algorithm with confidence measures. This tutorial aims to provide an introduction to, and an example-oriented exposition of, the conformal prediction framework, directed at machine learning researchers and professionals. 
A publicly available Python library, developed by one of the authors of the tutorial, will be used for the running examples. The goal of the tutorial is to provide attendees with the knowledge necessary for implementing functional conformal predictors, and to highlight current research on the subject. Authors: Henrik Boström (Stockholm University), Lars Carlsson (AstraZeneca), Alex Gammerman (Royal Holloway, University of London), Ulf Johansson (Jönköping University) and Henrik Linusson (University of Borås)
Short Bio	Prof. Ulf Johansson holds an M.Sc. in Computer Engineering and Computer Science from Chalmers University of Technology, and a PhD in Computer Science from the Institute of Technology, Linköping University, Sweden. Ulf Johansson’s research focuses on developing machine learning algorithms for data analytics. Most of the research is applied, and often co-produced with industry. Application areas include drug discovery, health science, marketing, high-frequency trading, game AI, sales forecasting and gambling. In 2011, he had his 15 minutes of fame when he was called as an expert witness in the Swedish Supreme Court on whether poker is a game of skill or chance. In court, Prof. Johansson argued that skill predominates over chance, citing, among other sources, his paper “Fish or Shark – Data Mining Online Poker”, originally presented at DMIN 2009. Ulf Johansson has published extensively in the fields of artificial intelligence, machine learning, soft computing and data mining. He is also a regular program committee member of the leading conferences in computational intelligence and machine learning. During the last few years, Prof. Johansson has published several papers on conformal prediction, some in top-tier venues such as the Machine Learning journal and the ICDM conference.
Slides	Conformal Prediction
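The split (inductive) variant of the conformal procedure described in the tutorial abstract above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the tutorial's Python library. The function names (fit_line, conformal_quantile, calibrate, predict_interval) are hypothetical. The underlying model is an ordinary least-squares line standing in for any regressor. The nonconformity score is assumed to be the absolute residual on a held-out calibration set. Under exchangeability, the returned interval should contain the true value with probability at least 1 - ε.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Model = Callable[[float], float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least-squares line; stands in for any point-prediction model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def conformal_quantile(scores: Sequence[float], epsilon: float) -> float:
    """Return the ceil((n + 1)(1 - epsilon))-th smallest calibration score.

    If that rank exceeds n, no finite interval can guarantee coverage 1 - epsilon.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - epsilon))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def calibrate(model: Model, xs_cal: Sequence[float], ys_cal: Sequence[float],
              epsilon: float) -> float:
    """Assumed nonconformity score: absolute residual on the calibration set."""
    scores = [abs(y - model(x)) for x, y in zip(xs_cal, ys_cal)]
    return conformal_quantile(scores, epsilon)


def predict_interval(model: Model, q: float, x: float) -> Tuple[float, float]:
    """Prediction region for one test pattern: point prediction +/- q."""
    y_hat = model(x)
    return y_hat - q, y_hat + q


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(0, 10) for _ in range(2700)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
    # Disjoint splits: proper training set, calibration set, test set.
    model = fit_line(xs[:200], ys[:200])
    eps = 0.1
    q = calibrate(model, xs[200:700], ys[200:700], eps)
    hits = 0
    for x, y in zip(xs[700:], ys[700:]):
        lo, hi = predict_interval(model, q, x)
        hits += lo <= y <= hi
    print(f"empirical coverage {hits / 2000:.3f}, target at least {1 - eps:.2f}")
```

Because calibration uses data the model never saw, the model is trained only once. The price is that fewer examples remain for fitting. When (n + 1)(1 - ε) exceeds the number of calibration examples n, the sketch returns an infinite interval, which is the only way to keep the guarantee with so little data.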

Invited Talk 1

Speaker	Dr. Peter Geczy, National Institute of Advanced Industrial Science and Technology (AIST), Japan
Topic/Title	Data Revolution: From Data Science to Data Economy
Date & Time	Monday, July 30, 2018 11:00am - 12:20pm
Location	Galleria B
Description	We are witnessing a data revolution. The explosion of digital data has been affecting all segments of contemporary society, from science to the economy. Commercial organizations and governments have been accumulating vast volumes of diverse data. Data has become a key asset for modern technology companies and organizations, with significant inherent value: when properly utilized, it drives commercial revenue streams, innovation, and discovery. Realizing the value of data is, however, a notable challenge. Data science has emerged as an interdisciplinary endeavor to tackle such challenges. Its approaches and methods have been extensively employed by data-oriented businesses and play a vital role in an expanding spectrum of economic activities, giving birth to a new data economy. We shall explore pertinent drivers and trends at the intersection of data science and the data economy.
Short Bio	Dr. Peter Geczy holds a senior position at the National Institute of Advanced Industrial Science and Technology (AIST). His recent research interests are in information technology intelligence. This multidisciplinary research encompasses development and exploration of future and cutting-edge information technologies. It also examines their impacts on societies, organizations and individuals. Such interdisciplinary scientific interests have led him across domains of technology management and innovation, data science, service science, knowledge management, business intelligence, computational intelligence, and social intelligence. Dr. Geczy received several awards in recognition of his accomplishments. He has been serving on various professional boards and committees, and has been a distinguished speaker in academia and industry. He is a senior member of IEEE and has been an active member of INFORMS and INNS.
Slides	not available

Invited Talk 2

Speaker	Dr. Dawud Gordon, CEO & Co-Founder at TWOSENSE.AI, NYC
Topic/Title	Applications of behavior-based authentication in the business world
Date & Time	Wednesday, August 1, 2018 03:40 - 05:40pm
Location	Galleria B
Description	This talk will look at the applications of behavior-based authentication in the business world and how they have been shaped by research. We’ll look at the field as a whole, as well as at our vision at TwoSense, and demonstrate that this problem cannot be solved without the application of machine learning. Next, we will take a deeper look at the machine learning challenges that must be overcome and at a few novel solutions from our labs, followed by some of the lessons learned from deploying behavioral biometrics in the wild in real products. From there, we will look at some research methodology issues that we’ve come across and conclude by proposing a few best practices for the behavioral biometrics community.
Short Bio	Dr. Dawud Gordon is CEO & Co-Founder at TWOSENSE.AI, a New York City-based cybersecurity startup working with behavioral biometrics. TWOSENSE.AI uses machine learning to create a mobile AI that learns to recognize the user based on their behavior. This enables authentication that is actionless (there is nothing you have to do), continuous (it is always on, even if you’re not interacting with the device), and more secure than a fingerprint. TwoSense shifts the fundamental paradigm of identity security away from making you responsible for proving you’re the authorized user, toward making the machine do the work for you. Dawud holds a Ph.D. in Computer Engineering from KIT in Karlsruhe, Germany, for his work on using machine learning to recognize social group behaviors from sensor signals collected from members’ mobile and wearable devices. He has published over 30 peer-reviewed papers and patents on related topics, has won several awards for his research, including Best Paper, and currently serves on the Program Committee of the International Symposium on Wearable Computers (ISWC).
Slides	Behavior-based authentication
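To make "actionless" and "continuous" authentication concrete, the sketch below first enrolls a user's behavioral feature vectors, such as hypothetical keystroke or gait timings. It then scores every new observation against that profile, smoothing the scores over a short window, and emits a decision without any user action. This is a simple z-score baseline chosen for readability, not the machine learning approach used at TwoSense. The function names, the window of 5 and the threshold of 2.0 are all assumptions.

```python
import statistics
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Sequence


@dataclass
class BehaviorProfile:
    means: List[float]
    stdevs: List[float]


def enroll(samples: Sequence[Sequence[float]]) -> BehaviorProfile:
    """Build a per-feature profile from the legitimate user's feature vectors."""
    columns = list(zip(*samples))  # needs at least two samples for stdev
    return BehaviorProfile(
        means=[statistics.fmean(c) for c in columns],
        # Floor the spread so a constant feature cannot cause division by zero.
        stdevs=[max(statistics.stdev(c), 1e-9) for c in columns],
    )


def anomaly_score(profile: BehaviorProfile, features: Sequence[float]) -> float:
    """Mean absolute z-score of one observation against the enrolled profile."""
    zs = [abs(f - m) / s for f, m, s in zip(features, profile.means, profile.stdevs)]
    return statistics.fmean(zs)


def continuous_auth(profile: BehaviorProfile, stream: Iterable[Sequence[float]],
                    threshold: float = 2.0, window: int = 5) -> Iterator[bool]:
    """Yield an accept/reject decision for every observation, with no user action."""
    recent = deque(maxlen=window)
    for features in stream:
        recent.append(anomaly_score(profile, features))
        # Averaging over a short window smooths out single noisy observations.
        yield statistics.fmean(recent) <= threshold
```

The window trades responsiveness for stability. A longer window tolerates occasional odd behavior from the real user, but it takes longer to lock out an impostor.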

Invited Talk 3

Speaker	Dr. Diego Galar, Luleå University of Technology, Sweden
Topic/Title	More Data Less Downtime: The road from description and prediction towards prescription by means of digitization
Date & Time	Monday, July 30, 2018 05:20 - 07:20pm
Location	Galleria B
Description	Industrial systems are complex with respect to technology and operations, involving a wide range of human actors, organizations and technical solutions. Industry 4.0 is a term that describes the digital generation of industrial activity, enabled by smart systems and Internet-based solutions. Two characteristic features of the Industry 4.0 revolution are computerization through cyber-physical systems and intelligent assets based on the concept of the “internet of things”. The maintenance of industrial machines is one of the most challenging applications in this new era. Because industrial markets are extremely asset-intensive, optimizing the utilization, management and tracking of assets is a key priority. However, to achieve maximal results, it is not enough to predict in a standardised way; prediction must happen in real time, with real data and with smart devices all connected. With the rise of IoT, millions of connected “things” generate vast amounts of data. Meanwhile, sensor costs are dropping and Big Data technologies are becoming more widely available, so critical asset maintenance is, and will be, largely monitored remotely with Big Data analytics. Operation and maintenance (O&M) is one such application area, in the form of self-learning smart systems that predict failures, make diagnoses and trigger O&M actions by making use of the “internet of things” and further analytics. IoT generates huge amounts of data in different operational and contextual scenarios. These data help the user, owner, maintainer or designer to perform virtual commissioning of the asset and to virtualize it. Combining existing physical models with field data collected by IoT devices produces a digital twin that contains both data-driven and physical information. 
This talk will discuss the possibilities that lie in applying the IoT as a pillar for further analytics and servitization in industry by means of virtualization. In this regard, much information needs to be captured and mined to assess the overall condition of the whole system, including information from design and manufacturing, which contains the physical knowledge of the system. Therefore, information must be integrated across the entire lifecycle (a product lifecycle management, or PLM, approach) to obtain accurate health and performance assessments of the whole system.
Short Bio	Dr. Diego Galar is Professor of Condition Monitoring in the Division of Operation and Maintenance Engineering at Luleå University of Technology (LTU), where he coordinates several H2020 projects related to different aspects of cyber-physical systems, Industry 4.0, IoT and industrial Big Data. He was also involved in the SKF UTC centre in Luleå, focused on SMART bearings, and has been actively involved in national projects with Swedish industry or funded by Swedish national agencies such as Vinnova. He is also a principal researcher at Tecnalia (Spain), heading the Maintenance and Reliability research group within the Division of Industry and Transport. He has authored more than five hundred journal and conference papers, books and technical reports in the field of maintenance. He has also served as a member of editorial boards and scientific committees, chaired international journals and conferences, and participated actively in national and international committees for standardization and R&D in reliability and maintenance. Internationally, he has been a visiting professor at the Polytechnic of Braganza (Portugal), the University of Valencia, NIU (USA) and the Pontificia Universidad Católica de Chile. Currently, he is a visiting professor at the University of Sunderland (UK), the University of Maryland (USA), and Chongqing University (China).
Slides	More Data Less Downtime
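The talk abstract above describes systems that predict failures and trigger maintenance actions. One of the simplest prognostic techniques in that setting works as follows. Fit a trend to a degradation indicator, such as a vibration level. Then extrapolate the trend to a failure threshold to estimate remaining useful life (RUL). The sketch below does this with an ordinary least-squares line. It is an illustration under assumptions, not the methodology of the talk; a digital twin would typically combine physical models with learned ones. The function names, the threshold and the lead-time rule are hypothetical.

```python
from typing import Optional, Sequence


def estimate_rul(times: Sequence[float], health: Sequence[float],
                 failure_threshold: float) -> Optional[float]:
    """Estimate remaining useful life from a rising degradation indicator.

    Fits health = intercept + slope * t by least squares and extrapolates to the
    time at which the indicator reaches failure_threshold. Returns the time left
    after the last observation, 0.0 if the extrapolated failure time has already
    passed, or None if the indicator shows no upward trend.
    """
    n = len(times)
    if n < 2 or n != len(health):
        raise ValueError("need at least two paired observations")
    mt, mh = sum(times) / n, sum(health) / n
    stt = sum((t - mt) ** 2 for t in times)
    sth = sum((t - mt) * (h - mh) for t, h in zip(times, health))
    if stt == 0:
        raise ValueError("observation times must not all be equal")
    slope = sth / stt
    if slope <= 0:
        return None  # no degradation trend to extrapolate
    intercept = mh - slope * mt
    t_fail = (failure_threshold - intercept) / slope
    return max(0.0, t_fail - times[-1])


def needs_maintenance(rul: Optional[float], lead_time: float) -> bool:
    """Trigger an O&M action when predicted failure falls inside the planning lead time."""
    return rul is not None and rul <= lead_time
```

A straight line is the crudest possible degradation model. Its appeal here is that every step is visible, and swapping in a physics-based or learned model changes only the fitting step.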

Invited Talk 4

Speaker	Prof. Mark C. Lewis, Department of Computer Science, Trinity University, San Antonio, Texas, USA
Topic/Title	Big Data Analytics with Spark
Date & Time	Wednesday, August 1, 2018 09:20 - 10:20am
Location	Galleria B
Description	We live in an age of data, which presents us with the challenge of finding meaning in all of that data. Google's MapReduce, as embodied in the Hadoop implementation, ushered in the era of big data analytics by providing a standard system that allowed data to be analyzed across a cluster with good fault tolerance. Hadoop achieves this by writing results to disk after each reduce step, which provides fault tolerance but at a high cost in speed. The Spark framework sits in the Hadoop ecosystem as an alternative to plain MapReduce that performs more operations in memory, and thus can run much faster: standard benchmarks have shown it running as much as 100x faster than Hadoop. Attendees of this tutorial will be introduced to the Spark framework. We will run through a number of example problems showing how they can be solved using the operations provided through both Resilient Distributed Datasets (RDDs) and the Dataset abstraction of Spark SQL. We will also run through a number of examples of machine learning using the Spark ML library.
Short Bio	Mark Lewis has been in the Department of Computer Science at Trinity University since 2001. His courses tend to focus on aspects related to programming/programming languages, including web development, and simulation/scientific computing. He has been the lead author on over 30 papers spanning a range of topics from planetary ring dynamics in the journal Icarus to the SIGCSE annual conference proceedings. He is also the author of several textbooks using Scala published by CRC Press and has over 1 million views of his tutorial videos on his YouTube channel that focuses on Scala.
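The MapReduce model mentioned in the abstract above splits a job into three phases: a map step, a shuffle that groups values by key, and a reduce step. The plain-Python word count below is a single-machine sketch of those phases, not Hadoop or Spark code. The function names are hypothetical. In Spark's RDD API, the same job is usually written with flatMap, map and reduceByKey.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in every input line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: combine each key's values with an associative operation (here, +)."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog", "the end"]))
```

Here every intermediate collection stays in memory. Hadoop MapReduce would instead persist results to disk after each reduce step, which is the fault-tolerance versus speed trade-off the abstract describes.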

Invited Talk 5

Speaker	Dr. John N. Carbone, Forcepoint Cyber Security, USA; Senior Director, Technical Solutions, and scientific advisor to Forcepoint LLC Global Governments and Critical Infrastructure division
Topic/Title	Reducing Cyber Security Non-linearity, Risks and Complexity utilizing Axiomatic Design and Autonomous AI/Machine Learning
Date & Time	Wednesday, August 1, 2018 08:20 - 09:20am
Location	Galleria B
Description	Historically, research has shown that complex, heterogeneous, non-linear systems and their interactions are difficult to analyze, characterize, classify, accurately understand, and effectively model. The exponential expansion of IoT and cyber-physical systems, together with the current stream of malicious cyber events, creates an ever more challenging environment fraught with similar non-linearity. This gives industry and academia significant incentive to advance cyber security research toward solutions for, and prediction of, advanced and persistent global threats. Recent advances in artificial cognitive architectures and Artificial Intelligence (AI) are benefitting disciplines that struggle to learn from rapidly increasing data volume, velocity, and complexity. Research also shows that axiomatic design benefits the optimization of complex systems by reducing complexity. It also significantly reduces ambiguity and increases insight from high-volume, highly diverse, high-velocity data. We therefore propose a multidisciplinary approach to potentially reduce cyber security risk and complexity. It combines novel adaptive AI/machine learning with complexity reduction techniques based on axiomatic design, in order to optimize cyber systems and improve autonomous cyber learning potential. Specifically, we apply novel, near-real-time machine learning algorithms designed to significantly increase the fidelity of complex cyber event data, including behavioral usage characteristics. The aim is to significantly reduce system non-linearity and the risks from external and insider cyber threats. This in turn increases efficiency and "peace of mind" while reducing system costs and the labor costs of overwhelmed cyber analysts.
Short Bio	Dr. John N. Carbone has served Raytheon Company and the defense industry for about 30 years. Dr. Carbone currently serves as Senior Director, Technical Solutions, and scientific advisor to Forcepoint LLC's Global Governments and Critical Infrastructure division. He recently served as Chief Science Advisor and Raytheon Engineering Fellow within the Intelligence, Information and Services (IIS) Division, after previously serving as Product Area Technology Director and Chief Engineer for Innovation. Dr. Carbone has held various Chief Engineer, System Architect and Software Architect roles, and more recently served as Chief Data Architect/Scientist for major cloud programs.
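In axiomatic design, functional requirements (FRs) are related to design parameters (DPs) by a design matrix. The Independence Axiom favors designs that are uncoupled (diagonal matrix) or decoupled (triangular, possibly after reordering) over coupled ones. The sketch below classifies a square design matrix, assuming that FR i is paired with DP i. It illustrates only the general complexity-reduction idea named in the abstract above. It says nothing about how the talk applies axiomatic design to cyber security data, and the function name is hypothetical.

```python
from typing import Dict, Sequence, Set


def classify_design_matrix(dm: Sequence[Sequence[float]]) -> str:
    """Classify a design matrix as 'uncoupled', 'decoupled', or 'coupled'.

    Assumes dm[i][j] != 0 means functional requirement FR_i is affected by
    design parameter DP_j, and that FR_i is paired with DP_i (nonzero diagonal).
    """
    n = len(dm)
    if any(len(row) != n for row in dm):
        raise ValueError("design matrix must be square (one DP per FR)")
    if any(dm[i][i] == 0 for i in range(n)):
        raise ValueError("expected FR_i to be paired with DP_i (nonzero diagonal)")

    # deps[i]: the other DPs that FR_i depends on.
    deps: Dict[int, Set[int]] = {
        i: {j for j in range(n) if j != i and dm[i][j] != 0} for i in range(n)
    }
    if not any(deps.values()):
        return "uncoupled"  # diagonal: each FR is controlled by exactly one DP

    # Decoupled means some ordering of the DPs makes the matrix triangular, i.e.
    # the dependencies contain no cycle. Repeatedly fix any DP whose FR depends
    # only on DPs already fixed; if every DP gets fixed, such an order exists.
    fixed: Set[int] = set()
    progress = True
    while progress:
        progress = False
        for i in range(n):
            if i not in fixed and deps[i] <= fixed:
                fixed.add(i)
                progress = True
    return "decoupled" if len(fixed) == n else "coupled"
```

For a decoupled design, the order in which the loop fixes DPs is also a valid order for setting them. With that order, adjusting a later DP never disturbs an FR that is already satisfied.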
