Data Science Riviera





Data Science meetup

Summaries, slides, papers, and code







#20.01 - Biometrics
11 February 2020, Sophia-Antipolis


         "What's in a Face? Computer Vision for Face Analysis and Generation" , A. Dantcheva (Inria-Stars)
Summary and Papers

The topic of human facial analysis has engaged researchers in multiple fields including computer vision, biometrics, forensics, cognitive psychology and medicine. Interest in this topic has been fueled by scientific advances that suggest insight into a person's identity, intent, attitude, as well as health; all solely based on their face images and videos.

The above observations lead to the tantalizing question: "What's in a Face?"

In my talk, I will first provide a brief overview of the research landscape in face analysis and generation. Over the last years, this area has witnessed tremendous progress due to deep convolutional neural networks (CNNs). I will then zoom into recent works on face analysis, where we have used face images and videos to deduce attributes, emotions, as well as the more complex state of apathy. While a large body of work has aimed at extracting and classifying such information from faces, the inverse problem - namely face generation - has recently received increased attention. In this context, I will talk about our recently designed generative models, which allow for realistic generation of face images and videos, as well as the related deepfake detection.

Dantcheva, Bremond, Bilinski (2018): "Show me your face and I will tell you your height, weight and body mass index", ICPR 2018

Happy, Dantcheva, Das, Zeghari, Robert, and Bremond (2019): "Characterizing the State of Apathy with Facial Expression and Motion Analysis", FG 2019

Wang, Bilinski, Bremond, Dantcheva (2020): "ImaGINator: Conditional Spatio-Temporal GAN for Video Generation", WACV 2020




#19.07 - Activity recognition in videos - Automated machine learning
28 November 2019, Sophia-Antipolis


         "Spatio-temporal attention mechanism for Activities of Daily Living" , S. Das (Inria-Stars)
Summary and Papers

Action recognition has been a popular problem in the vision community because of its large-scale applications. We particularly focus on Activities of Daily Living (ADL), which can be used for monitoring hospital patients, smart-home applications, and so on. In real-world videos, ADL look simple, but recognizing them is often more challenging than recognizing actions in sports, YouTube or movie videos. These actions often have very low inter-class variance, which makes discriminating them from one another very challenging.

The recent spatio-temporal 3D ConvNets are too rigid to capture the subtle visual patterns across an action, so we propose a novel pose-driven spatio-temporal attention mechanism through 3D ConvNets. We show that our method outperforms state-of-the-art methods on the large-scale NTU-RGB+D dataset, on a human-object interaction dataset (Northwestern-UCLA), and on a real-world, challenging human activity dataset, Toyota Smarthome.

Das, Chaudhary, Bremond and Thonnat (2019): 'Where to Focus on for Human Action Recognition?', in Proceedings of the IEEE Winter Conference on Applications of Computer Vision, WACV 2019, Waikoloa Village, Hawaii, January 7-11, 2019

Das, Dai, Koperski, Minciullo, Garattoni, Bremond and Francesca (2019): "Toyota Smarthome: Real-World Activities of Daily Living", in Proceedings of the 17th International Conference on Computer Vision, ICCV 2019, in Seoul, Korea, October 27 to November 2, 2019

Das, Bremond and Thonnat (2020): "Looking deeper into Time for Activities of Daily Living Recognition", in Proceedings of the IEEE Winter Conference on Applications of Computer Vision, WACV 2020, Snowmass village, Colorado, March 2-5, 2020
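
As a concrete illustration of the pose-driven attention idea described in the summary above, here is a minimal PyTorch sketch; the tensor shapes, the two-layer attention network and the 10-class output are illustrative assumptions, not the exact architecture from these papers.

import torch
import torch.nn as nn

class PoseDrivenAttention(nn.Module):
    """Toy module: a pose descriptor produces spatial attention weights that
    modulate the feature map of a 3D ConvNet before classification."""

    def __init__(self, pose_dim=75, channels=64, grid=7, n_classes=10):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(pose_dim, 128), nn.ReLU(),
            nn.Linear(128, grid * grid), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(channels, n_classes)

    def forward(self, feat3d, pose):
        # feat3d: (B, C, T, H, W) features from a 3D ConvNet; pose: (B, pose_dim)
        b, c, t, h, w = feat3d.shape
        attn = self.attn(pose).view(b, 1, 1, h, w)      # spatial attention map
        pooled = (feat3d * attn).sum(dim=(2, 3, 4))     # attention-weighted pooling -> (B, C)
        return self.classifier(pooled)

feats = torch.randn(2, 64, 8, 7, 7)   # fake 3D ConvNet output
pose = torch.randn(2, 75)             # fake skeleton descriptor (25 joints x 3 coordinates)
print(PoseDrivenAttention()(feats, pose).shape)   # torch.Size([2, 10])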


         "Tada - MyDataModels's platform for small data analysis and prediction" C. Fanara (MyDataModels)
Summary and Papers

Since its foundation, MyDataModels (MDM) has been specializing in small data, showing how these 'democratize' the field by allowing domain experts and professionals to access machine learning results in an unprecedented way.

Machine learning models can be generated with artificial neural networks and deep learning, but also - as in the case of MDM - using evolutionary programming (EP) and genetic algorithms (GA).

An outline of EP and GA is given. While the latter may occasionally be costly in terms of execution time, ways to define early convergence can be found. This is one of the efforts currently undertaken at MDM, and it is worthwhile, because the outcome is given as mathematical formulae that include the variables from the original dataset: thus, models become explainable and exploitable.

Two White Papers explain the approach further:

MyDataModels - ZGP Engine (Transformative Data Pre-Processing)
https://www.mydatamodels.com/resources/mydatamodels-zgp-engine-transformative-data-pre-processing/

MyDataModels - ZGP Engine (Signal Detection Demonstration)
https://www.mydatamodels.com/resources/mydatamodels-zgp-engine/
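
To make the early-convergence point above concrete, here is a minimal sketch of a genetic algorithm that stops when the best fitness stops improving. It evolves the coefficients of a simple linear formula on toy data; it is in no way MyDataModels' ZGP engine, only an illustration of the principle that the result is an explicit, explainable formula.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: the target depends on two of the four input variables.
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

def fitness(coeffs):
    """Negative mean squared error of the candidate formula y_hat = X @ coeffs."""
    return -np.mean((X @ coeffs - y) ** 2)

def evolve(pop_size=50, max_gen=500, tol=1e-6, patience=20, mut_scale=0.3):
    pop = rng.normal(size=(pop_size, X.shape[1]))
    best, best_fit, stall = pop[0], -np.inf, 0
    for gen in range(max_gen):
        fits = np.array([fitness(c) for c in pop])
        order = np.argsort(fits)[::-1]
        if fits[order[0]] > best_fit + tol:
            best, best_fit, stall = pop[order[0]].copy(), fits[order[0]], 0
        else:
            stall += 1
        if stall >= patience:                 # early-convergence criterion
            print(f"stopped early at generation {gen}")
            break
        parents = pop[order[:pop_size // 2]]                                   # selection
        children = parents + rng.normal(scale=mut_scale, size=parents.shape)   # mutation
        pop = np.vstack([parents, children])
    return best

coeffs = evolve()
# The result is an explicit formula over the original variables, hence explainable.
print("y ~ " + " + ".join(f"{c:.2f}*x{i}" for i, c in enumerate(coeffs)))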




#19.06 - Sensing earthquakes - Satellite images (architecture & deep learn-GAN)
03 October 2019, Sophia-Antipolis


         "Distributed sensing of earthquakes and ocean-solid Earth interactions on seafloor telecom cables" , A. Sladen (GeoAzur)
Summary and Paper

Two thirds of the surface of our planet are covered by water and are still poorly instrumented, which has prevented numerous important questions from being addressed. The potential to leverage the existing fiber-optic seafloor telecom cables that criss-cross the oceans, by turning them into dense arrays of seismo-acoustic sensors, remains to be evaluated. Here, we report Distributed Acoustic Sensing measurements on a 41.5 km-long telecom cable deployed offshore Toulon, France. Our observations demonstrate the capability to monitor, in unprecedented detail, the ocean-solid earth interactions from the coast to the abyssal plain, in addition to regional seismicity (e.g., a magnitude 1.9 micro-earthquake located 100 km away), with signal characteristics comparable to those of a coastal seismic station.

Sladen, Rivet, Ampuero, De Barros, Hello, Calbris & Lamare (2019): "Distributed sensing of earthquakes and ocean-solid Earth interactions on seafloor telecom cables "


         "ColorMapGAN: Unsupervised domain adaptation for map segmentation using GANs" O. Tasar (Inria, Titane team)
Summary and Paper

Due to various reasons, such as atmospheric effects and differences in acquisition, there is often a large difference between the spectral bands of satellite images collected from different geographic locations. The large shift between the spectral distributions of training and test data causes current state-of-the-art supervised learning approaches to output poor maps. We present a novel end-to-end framework, called Color Mapping Generative Adversarial Networks (ColorMapGAN), that is robust to such a shift. It can generate fake training images that are semantically exactly the same as the training images, but whose spectral distribution is similar to that of the test images. We then use the fake images and the ground truth of the training images to fine-tune the already trained classifier. Contrary to existing GANs, the generator in ColorMapGAN does not have any convolutional or pooling layers. It learns to transform the colors of the training data into the colors of the test data by performing only one element-wise matrix multiplication and one matrix addition. ColorMapGAN outperforms the existing approaches by a large margin in terms of both accuracy and computational complexity.

Tasar, Happy, Tarabalka & Alliez (2019): "ColorMapGAN: Unsupervised Domain Adaptation for Semantic Segmentation Using Color Mapping Generative Adversarial Networks"
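
A minimal sketch of the generator idea from the summary above: the only learnable operations are an element-wise multiplication and an addition applied to the input colors. The per-channel parameters below are a simplifying assumption for illustration; the discriminator and the adversarial training loop are omitted.

import torch
import torch.nn as nn

class ColorTransformGenerator(nn.Module):
    """No convolutions, no pooling: one learnable element-wise scale and one shift."""

    def __init__(self, channels=3):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(channels, 1, 1))
        self.shift = nn.Parameter(torch.zeros(channels, 1, 1))

    def forward(self, x):
        # x: (B, C, H, W) image batch with values in [0, 1]
        return (x * self.scale + self.shift).clamp(0.0, 1.0)

gen = ColorTransformGenerator()
fake_train = gen(torch.rand(4, 3, 256, 256))   # recolored training images
print(fake_train.shape)                        # torch.Size([4, 3, 256, 256])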


         "An architecture to massively download and deal with satellite images" J. Nguyen (IM2-SSTIM, ACRI-ST)
Summary

Since 2017, various types of images from the Sentinel satellites have been available free of charge. These images are of significant size (about 1 GB each), and there are many of them (about 700 for a geographical area of 10,000 km², knowing that the Earth's surface is about 510 million km²).

Recovering these data by hand is a costly and time-consuming procedure that begs for automation and thus requires defining and implementing a data recovery architecture. The program that has been implemented makes it possible to manage the heavy data (1 GB per image) available on different complex interfaces (the different suppliers), over long time series (e.g. 3 years), and to do so quickly. It also supports near-real-time downloading.

The resources needed include a large storage space, of the order of a hundred terabytes, and a remote server allowing the program to be deployed in production. The chosen language is Python, as it allows for rapid prototyping and has many open-source libraries. The software used is Docker, which allows applications to be easily launched in software containers.

The service developed offers users different types of products containing images and metadata and corresponding to several satellite sources.




#19.05 - Transfer learning with small data sets for person re-identification - Open source security issues
30 April 2019, Sophia-Antipolis


         "Cross domain residual transfer learning for person re-identification" , F. Brémond (INRIA-STARS)
Summary and Paper

We present a novel way to transfer model weights from one domain to another using residual learning framework instead of direct fine-tuning. We also argue for hybrid models that use learned (deep) features and statistical metric learning for multi-shot person re-identification when training sets are small. This is in contrast to popular end-to-end neural network based models or models that use hand-crafted features with adaptive matching models (neural nets or statistical metrics).

Our experiments demonstrate that a hybrid model with residual transfer learning gives performance comparable to an end-to-end model on large datasets and can yield significantly better re-identification performance when the training set is small. On the iLIDS-VID and PRID datasets, we achieve rank-1 recognition rates of 89.8% and 95%, respectively, which is a significant improvement over the state of the art.

Khan & Brémond (2019): "Cross domain Residual Transfer Learning for Person Re-identification"
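
A hedged sketch of the residual-transfer idea from the summary above, not the paper's exact architecture: the source-domain feature extractor is kept frozen and only a small residual correction (plus the downstream metric-learning head, omitted here) is trained on the small target set. The ResNet-50 backbone and layer sizes are assumptions for illustration.

import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=None)          # pretrained source weights would be loaded here
backbone.fc = nn.Identity()                       # use the 2048-d pooled features
for p in backbone.parameters():
    p.requires_grad = False                       # the source model is kept fixed

class ResidualAdapter(nn.Module):
    def __init__(self, dim=2048, hidden=256):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, feat):
        return feat + self.block(feat)            # target feature = source feature + residual

adapter = ResidualAdapter()
x = torch.randn(8, 3, 224, 224)
with torch.no_grad():
    src_feat = backbone(x)
tgt_feat = adapter(src_feat)                      # then fed to a metric-learning re-id head
print(tgt_feat.shape)                             # torch.Size([8, 2048])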


         "Automated classification of security fixes in open-source code repositories" , A. Sabetta and R. Cabrera Lozoya (SAP Security Research)
Summary and Papers

The vulnerability management process for software with open source components is challenging due to its dependence on unreliable standard sources of advisories and vulnerability data (such as the National Vulnerability Database, NVD). Previous efforts aimed to reduce this dependency by directly analyzing source code for the automatic detection of commits that are security-relevant.

In our previous work, we treated source code changes as documents in natural language processing, potentially ignoring the structured nature of source code.

In our recent work, we incorporate the semantic properties of code into our analysis. We leverage state-of-the-art approaches to generate distributed code representations by analyzing and aggregating paths extracted from the abstract syntax tree of the code. We extend one such approach (code2vec) to represent code changes (commits). We use a dataset of vulnerabilities (and the commits fixing them) affecting open-source components used in SAP software. This dataset was manually collected and curated by the team operating the vulnerability assessment tool known internally at SAP as Vulas. We show how this representation can be used to identify commits that address security bugs.

Ponta, Plate, Sabetta, Bezzi & Dangremont (2019): A manually-curated dataset of fixes to vulnerabilities of open source software

Sabetta & Bezzi (2018): A practical approach to the automatic classification of security-relevant commits, ICSME 2018

Ponta, Plate & Sabetta (2018): Beyond metadata - code-centric and usage-based analysis of known vulnerabilities in open-source software, ICSME 2018 - recipient of the IEEE/TCS Distinguished Paper Award

Open source security tool: https://github.com/SAP/vulnerability-assessment-tool




#19.04 - Motion detection in videos with applications to apathy diagnosis and soccer
02 April 2019, Sophia-Antipolis


         "Apathy diagnosis by analyzing facial dynamics in videos" , S L Happy (INRIA-STARS)
Summary and Paper

Reduced emotional response, lack of motivation, and limited social interaction are the major symptoms of apathy. Current methods for apathy diagnosis require the patient's presence in a clinic and time-consuming clinical interviews and questionnaires involving medical personnel, which are costly and logistically inconvenient for patients and clinical staff, hindering, among other things, large-scale diagnostics.

In this talk, a novel machine learning framework will be discussed to classify apathetic and non-apathetic patients based on the analysis of facial dynamics, entailing both emotion and facial movement. Our approach caters to the challenging setting of current apathy assessment interviews, which include short video clips with wide face pose variations, very low-intensity expressions, and insignificant inter-class variations. Based on extensive experiments, we show that the fusion of emotion and facial local motion produces the best feature set for apathy classification. In addition, we train regression models to predict the related clinical scores (e.g. the mini-mental state examination - MMSE - and the neuropsychiatric apathy inventory - NPI) using the motion and emotion features, which further improves the performance of the system.

Furthermore, we will discuss a multi-task learning (MTL) framework to leverage the additional information by considering the prediction of multiple clinical scores as auxiliary tasks, which might be closely or distantly related to the main task of apathy classification. Our MTL approach jointly learns the model weights and the relatedness of the auxiliary tasks to the main task in an iterative manner, thereby avoiding negative transfer of the distantly related tasks.

Happy, Dantcheva, Das, Zeghari, Robert & Bremond (2019): "Characterizing the State of Apathy with Facial Expression and Motion Analysis", in IEEE International Conference on Automatic Face & Gesture Recognition, May 2019 (FG-19)
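
A minimal multi-task sketch in the spirit of the summary above (toy feature dimensions, not the paper's model): a shared trunk feeds a main apathy-classification head and auxiliary regression heads for clinical scores, with learnable task weights standing in for the iteratively learned task relatedness.

import torch
import torch.nn as nn

class ApathyMTL(nn.Module):
    def __init__(self, in_dim=512, n_aux=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.apathy = nn.Linear(128, 1)            # main task: apathetic vs non-apathetic
        self.scores = nn.Linear(128, n_aux)        # auxiliary tasks: clinical scores
        # Learnable task weights stand in for the learned task relatedness.
        self.task_weights = nn.Parameter(torch.ones(n_aux))

    def forward(self, feats):
        h = self.trunk(feats)
        return self.apathy(h).squeeze(-1), self.scores(h)

model = ApathyMTL()
feats = torch.randn(16, 512)                       # fused emotion + motion features (toy)
labels = torch.randint(0, 2, (16,)).float()
scores = torch.randn(16, 2)
logit, pred_scores = model(feats)
loss = nn.functional.binary_cross_entropy_with_logits(logit, labels) \
     + (model.task_weights * ((pred_scores - scores) ** 2).mean(dim=0)).sum()
print(float(loss))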


         "Detection of salient motions and temporal alignment in videos" , K. Blanc (Lab I3S)
Summary, Papers, and Code

Video recognition has gained in performance over the last years, especially due to improvements in deep learning performance on images. However, traditional methods and shallow architectures remain competitive, and combinations of different network architectures are the usual approach of choice. This limitation is certainly due to the difficulty of handling and describing motion.

The talk will present the current state of the art in video representation and classification, and then introduce motion modeling through two applications that deal with building an informative description and recognizing different events.

Firstly, a hand-crafted motion representation based on optical-flow singularities was designed. This feature locally describes the motion contained in a video. Then, a framework based on this representation was built to extract particular motions and salient moments in football matches.

The second focus was on the robustness of a video representation with regard to speed variations of the motion. One strategy to achieve this robustness is to normalize the data directly and thus reduce the temporal intra-class variations before the description. The results showed the value of taking temporal elasticity into account for better video classification.

Blanc, Lingrand & Precioso (2017): "SINGLETS: Multi-Resolution Motion Singularities for Soccer Video Abstraction", in Proceedings of the Workshop CvSports in conjunction with CVPR, IEEE, Hawaii, 21 July 2017

Blanc, Lingrand, Paladini, Coviello, Mitrev, Soehler, Guzman & Precioso (2019): 'Analysis of temporal alignment for Video Classification', IEEE International Conference on Automatic Face and Gesture Recognition (FG-19)

Code: https://github.com/blancKaty/Singlets




#19.03 - Analyzing multiple confidential, non-shareable datasets as one big dataset
28 February 2019, Sophia-Antipolis


         "Federated learning in biomedical data: application to brain imaging and genetics analysis" , M. Lorenzi (UCA, Inria-Epione)
Summary and Papers

When dealing with distributed biomedical data, meta-analysis methods are classically used to analyse cohorts distributed in different centres. Standard approaches to meta-analysis rely on univariate testing, by sharing test statistics or effect sizes. However, when the features to be analysed are in the order of millions (e.g. in case of medical images), the mass-univariate paradigm is prone to statistical limitations, such as the multiple comparisons problem, as well as interpretability issues when features are highly correlated. All in all, mass-univariate results often lack stability and reproducibility.

To overcome these limitations, we propose to reformulate multivariate analysis methods, such as dimensionality reduction and regression, within a federated paradigm. Our strategy consists in estimating independent sub-models at each centre, whose parameters are subsequently shared. Importantly, our formulation does not require any data exchange and involves a very limited amount of information transfer across centres. We have already successfully turned our research methods into usable and accessible software that will soon be applied to the analysis of imaging-genetics data from the large-scale multicentric consortium ENIGMA (enigma.ini.usc.edu). This modelling paradigm may open the way to the effective use of advanced statistical learning methods in today's complex healthcare scenario.

Lorenzi, Gutman, Thompson, Alexander, Ourselin, & Altmann (2017): "Secure multivariate large-scale multi-centric analysis through on-line learning: an imaging genetics case study", 12th International Symposium on Medical Information Processing and Analysis

Silva, Gutman, Romero, Thompson, Altmann, Lorenzi (2019): "Federated Learning in Distributed Medical Databases: Meta-Analysis of Large-Scale Subcortical Brain Data", International Symposium on Biomedical Imaging
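
A simplified sketch of the federated principle described above, under assumed toy data and a deliberately reduced algorithm: each centre shares only aggregated cross-covariance summaries, never raw data, and the coordinator extracts PLS-like components from an SVD of their combination.

import numpy as np

rng = np.random.default_rng(1)
# Three centres, each holding its own (imaging features X, genetic features Y).
centres = [(rng.normal(size=(100, 20)), rng.normal(size=(100, 5))) for _ in range(3)]

def local_summary(X, Y):
    """Computed inside each centre: centred cross-covariance and sample count."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    return Xc.T @ Yc, len(X)

# Only the small (20 x 5) summaries travel across centres.
summaries = [local_summary(X, Y) for X, Y in centres]
C = sum(s for s, n in summaries) / sum(n for _, n in summaries)

U, S, Vt = np.linalg.svd(C, full_matrices=False)
print("first shared component (loadings on imaging features):", U[:, 0].round(2))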




#19.02 - Information topological data analysis and gene expression - Heart rate & face biometrics
30 January 2019, Sophia-Antipolis


         "Information Topological data analysis, condensed view of clusters and complex data structures" , P. Baudot (Inserm-Unité de Neurobiologie, Faculté de Médecine, Université Aix-Marseille)
Summary, Papers and Code

As a domain that formalizes the classification and recognition of patterns and structures in mathematics, Topological Data Analysis has progressively gathered the interest of the data science community. On the neural network side, following Hinton and Amari, information-geometric approaches have provided well-defined metrics and gradient descent methods.

This presentation will focus on an original approach to algebraic topology intrinsically based on probability/statistics and information, developed notably with D. Bennequin since 2006. Information topology uniquely characterizes the usual information functions, revealing that the two theories, cohomology and information theory, are of the same nature. These probabilistic tools describe the statistical forms or patterns present in databases and make them correspond to discrete symmetries. The set of statistical interactions and dependencies between k elementary variables is quantified by the multivariate mutual information between these k components. It provides a generalized and metric-free decomposition of the free energy used in machine learning and artificial intelligence.

Its application to gene expression, with open-source software, makes it possible to detect functional modules of covariant variables (collective dynamics) as well as clusters (corresponding to condensation phenomena and negative synergistic interactions) in high dimension, and thus to analyze the structure of, and quantify the diversity in, data from arbitrarily complex systems (imagery, omics, social networks, ecosystems, ...).

Baudot & Bennequin (2015): "The homological nature of entropy", Entropy, 17, 1-66

Baudot, Tapia & Goaillard (2018): "Topological Information Data Analysis: Poincare-Shannon Machine and Statistical Physic of Finite Heterogeneous Systems"

Tapia, Baudot, Dufour, Formizano-Treziny, Temporal, Lasserre, Kobayashi & Goaillard (2018): "Neurotransmitter identity and electrophysiological phenotype are genetically coupled in midbrain dopaminergic neurons", Nature Scientific Reports, 8:13637

Forum: Geometric science of information

Code: INFOTOPO
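
As a toy illustration of the multivariate mutual information mentioned above, the sketch below applies the inclusion-exclusion formula I_k(X_1,...,X_k) = sum over non-empty subsets S of (-1)^(|S|+1) H(X_S) to discrete data; this is not the INFOTOPO package, only the formula on a synthetic example where a synergistic (XOR) interaction yields a negative value.

from itertools import combinations
import numpy as np

def entropy(*cols):
    """Empirical joint Shannon entropy (in bits) of discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def interaction_information(data):
    """data: (n_samples, k) array of discrete variables."""
    k = data.shape[1]
    total = 0.0
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            total += (-1) ** (size + 1) * entropy(*(data[:, j] for j in subset))
    return total

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 1000)
y = rng.integers(0, 2, 1000)
z = x ^ y                                  # XOR: a purely synergistic interaction
print(interaction_information(np.stack([x, y, z], axis=1)))  # close to -1 bit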


         "Robust face analysis employing machine learning techniques for remote heart rate estimation and towards unbiased attribute analysis" , A. Das (INRIA-STARS)
Summary and Papers

Over the last century, automatic face analysis has been a very prominent topic of interest for researchers, as it can be employed for identity/attribute classification for security purposes, emotion analysis to understand the mental state of an individual, health monitoring, etc. Owing to these real-life applications, face analysis involves many challenges.

This talk will highlight some recent computer vision and machine learning techniques for robust face analysis. The talk will address:
1) the convolution neural network (CNN)-based attention for remote heart rate estimation, and
2) work on CNN-based multi-tasking techniques for robust face attribute classification.

Das, Dantcheva & Bremond (2018): "Mitigating Bias in Gender, Age, and Ethnicity Classification: a Multi-Task Convolution Neural Network Approach", winner paper of the ECCV'18 challenge on bias estimation in face analysis (BEFA)

Niu, Das &o (2019): "Robust Remote Heart Rate Estimation from Face Videos Utilizing Spatial-temporal Attention"




#19.01 - Deep Learning in NLP - Personalized health care
14 January 2019, Nice


         "Looking for linguistic markers using deep learning methods - example with political discourse learning" (pdf) , L. Vanni (CNRS-Lab BCL)
Summary and Papers

Why talk about linguistics in deep learning? For linguists, artificial intelligence is a black box, and it needs to be explained to be used. For textual data, authorship detection is one of the most efficient tasks provided by artificial intelligence. The goal of this presentation is to open the black box using linguistics. Today, some deep learning models based on Convolutional Neural Networks are able to show an activation score for each word in a text. These activation scores are meaningful and represent a new kind of linguistic marker (mixing words, parts of speech, lemmas, ...). Deep learning seems to focus on syntactic 'motifs' and textual structures. This is a new way to read textual data, and a new abstraction level of the text. With linguistic analysis, we can now explain the underlying mechanisms of deep learning.

Vanni, Ducoffe et al. (2018): "Text Deconvolution Saliency (TDS): a deep tool box for linguistic analysis", 56th Annual Meeting of the Association for Computational Linguistics (ACL)


         "Artificial Intelligence and personalized health: can my digital twin improve my drug intake?" , F. Dayan (ExactCure)
Summary

ExactCure is a pioneer startup offering a software solution to reduce the impact of inaccurate medications.

Our Digital Twin simulates the efficacy and interactions of medicines in the body of a patient based on his/her personal characteristics. We help the patient to avoid under-doses, overdoses and drug interactions. It is the result of 3 years of fundamental research in personalized PKPD modelling and Artificial Intelligence.





#18.07 - Deep learning and aerial and satellite images
28 November 2018, Sophia-Antipolis

         "Can we classify the world? Where deep learning meets remote sensing" , Y. Tarabalka (INRIA-Titane)
Summary and Papers

Deep learning has been recently gaining significant attention for the analysis of data in multiple domains. It seeks to model high-level knowledge as a hierarchy of concepts. With the exploding amount of available data, the improvement of hardware and the advances in training methodologies, now such hierarchies can contain many more processing layers than before, hence the adoption of the term "deep".

In remote sensing, recent years have witnessed a remarkable increase in the amount of available data, due to a consistent improvement in the spectral, spatial and temporal resolutions of the sensors. Moreover, there are new sources of large-scale open access imagery, governments are releasing their geographic data to the public, and collaborative platforms are producing impressive amounts of cartography. With such an overwhelming amount of information, it is of paramount importance to develop smart systems that are able to handle and analyze these data. The scalability of deep learning and its ability to gain insight from large-scale datasets make it particularly interesting to the remote sensing community. It is often the case, however, that the deep learning advances in other domains cannot be directly applied to remote sensing. The type of input data and the constraints of remote sensing problems require the design of specific deep learning techniques.

The talk discussed how deep learning approaches help in remote sensing image interpretation. In particular, it focused on the most powerful architectures for semantic labeling of aerial and satellite optical images, with the final purpose of producing and updating world maps.

Girard, Charpiat, Tarabalka: "Aligning and updating cadaster maps with aerial images by multi-resolution, multi-task deep learning", ACCV 2018.

Zampieri, Charpiat, Girard, and Tarabalka: "Multimodal image alignment through a multiscale chain of neural networks with application to remote sensing", ECCV, Munich, Germany, 2018.

Huang, Lu, Audebert, Khalel, Tarabalka, Malof, Boulch, Le Saux, Collins, Bradbury, Lefèvre, and El-Saban: "Large-scale semantic classification: outcome of the first year of Inria aerial image labeling benchmark", IGARSS'2018.

Girard, Tarabalka: "End-to-End Learning of Polygons for Remote Sensing Image Classification", IGARSS'2018.




#18.06 - GDPR/RGPD and IT security/protection
24 October 2018, Sophia-Antipolis


         "How companies track you as you browse the web and how to protect yourself?" , N. Bielova (INRIA-INDES)
Summary and Papers

Today millions of users browse the web on a daily basis, and as they browse the web, they become producers of data that are continuously collected by numerous companies and agencies. Such data collection is very profitable for companies but often is worrisome for the privacy of the users.

In this presentation, we will cover the following questions that are of interest both to the web users and companies that build web applications:
- what technologies do companies use to track users as they browse the web?
- how to protect yourself with existing tools?
- what advanced ways of tracking exist [1] and is there a way to protect yourself completely?
- what companies can do to protect their users [2]?

Additionally, we have prepared a MOOC (free and open online course) where we explain how to protect your digital identity, and explain privacy problems and solutions while using emails, smartphone apps, web browsers and consumer services:
https://www.fun-mooc.fr/courses/course-v1:inria+41015+session02/about

[1] Gulyás, Somé, Bielova, and Castelluccia (2018): "To Extend or not to Extend: on the Uniqueness of Browser Extensions and Web Logins", Workshop on Privacy in the Electronic Society (WPES 2018) at ACM CCS

[2] Somé, Bielova, and Rezk (2017): "Control What You Include! Server-Side Protection against Third Party Web Tracking", International Symposium on Engineering Secure Software and Systems (EssoS)


         "GDPR: Awareness, issues and implementation" (FR version) , D. Martin (UNS, data protection officer)
Summary

The General Data Protection Regulation (GDPR) was adopted in 2016 and has applied in the European Union since May 25, 2018.

This presentation will help you to understand the legal principles of the protection of personal data and to know your rights in the subject matter.




#18.05 - Dynamic topic analysis of networks with text
15 October 2018, Sophia-Antipolis

         "A dynamic stochastic topic block model for networks with textual edges" , M. Corneli (UCA-Lab Dieudonné)
Summary

The increasing volume of communication in social networks (e.g. LinkedIn, Twitter and Facebook), personal emails (Gmail, Clinton's emails, ...), company emails (e.g. the Enron emails), digital documents (the Panama papers, co-authorships, ...), or archived documents in libraries (digital humanities) has recently given rise to new techniques that account for homogeneous groups within a given network in terms of graph connectivity as well as - and that is new - the textual content of the edges. Dynamic extensions of these approaches try to detect structural changes in the networks (nodes may come and go, edges may crash and recover) that can affect either the group composition or the way existing groups interact.

Based on the STBM (stochastic topic block model), a probabilistic model is developed to cluster the nodes of a dynamic graph, accounting for the content of textual edges as well as their frequency. Vertices are grouped together such that they are homogeneous in terms of interaction frequency and the discussed topics. The dynamic graph is considered stationary within a time interval if the proportions of topics discussed between each pair of node groups do not change during that interval.

Experiments on simulated data and an application to the Enron dataset assess and illustrate the proposed approach.

Corneli, Bouveyron, Latouche & Rossi (2018): "The dynamic stochastic topic block model for dynamic networks with textual edges"




#18.04 - Developments in Deep Learning
13 June 2018, Sophia-Antipolis

         "Transfer Learning with CNNs - Off the shelf top notch performances" , E. Feuilleaubois (twitter.com/Deep_In_Depth)
Summary

The application of Convolutional Neural Networks (CNNs) to image recognition has led to highly effective open models (e.g. VGG16, ResNet50, SqueezeNet, InceptionV3, ..., freely available in the Keras core) that can discriminate between thousands of object classes with very good accuracy. These models have been built with very large databases and high computing power, which are not necessarily available for other tasks. Transfer learning (TL), which uses a model developed for one task as a starting point for a different task, makes these performances available to custom image classification tasks and avoids many of the hurdles that can hamper the construction of a CNN classification model from scratch. The TL process specific to CNNs will be presented, including some fine-tuning techniques which have the potential to significantly improve the first results obtained by the TL process.
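
A minimal transfer-learning sketch along these lines, using one of the Keras models mentioned above (VGG16); the image size, the 5-class head and the fine-tuning depth are placeholder assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # stage 1: keep the convolutional base frozen

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(5, activation="softmax"),  # 5 hypothetical custom classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, ...)  # train the new head on the custom dataset

# Stage 2 (fine-tuning): unfreeze the last convolutional block and retrain with a
# small learning rate, which often improves on the frozen-base result.
base.trainable = True
for layer in base.layers[:-4]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])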



         "Learning Wasserstein Embeddings" , M. Ducoffe (CNRS-Lab I3S)
Summary, Paper and Code

The Wasserstein distance has recently received a lot of attention in the machine learning community because it is a powerful tool to compare data distributions, with wide applications in image processing, computer vision and machine learning. It has already found numerous applications in several hard problems, such as domain adaptation, dimensionality reduction or generative models. However, its use is still limited by a heavy computational cost. We provide an approximation mechanism that allows one to overcome its inherent complexity and to solve optimization problems in the Wasserstein space extremely fast.

Numerical experiments supporting this idea are conducted on the MNIST digit dataset and the Google Doodle drawings dataset. They show the wide potential benefits of the presented method.

Courty, Flamary & Ducoffe (2017): "Learning Wasserstein embeddings"

https://github.com/mducoffe/Learning-Wasserstein-Embeddings
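
A toy sketch of the embedding idea (a simplified setup compared with the paper, which also learns a decoder): a siamese MLP is trained so that Euclidean distances between embeddings match 1-D Wasserstein distances between normalized histograms. The histogram size, network widths and training schedule are assumptions.

import numpy as np
import torch
import torch.nn as nn
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
bins = np.arange(50)

def random_hist():
    h = rng.random(50)
    return h / h.sum()

pairs = [(random_hist(), random_hist()) for _ in range(2000)]
targets = torch.tensor(
    [wasserstein_distance(bins, bins, u, v) for u, v in pairs], dtype=torch.float32)
x1 = torch.tensor(np.array([u for u, _ in pairs]), dtype=torch.float32)
x2 = torch.tensor(np.array([v for _, v in pairs]), dtype=torch.float32)

embed = nn.Sequential(nn.Linear(50, 64), nn.ReLU(), nn.Linear(64, 16))
opt = torch.optim.Adam(embed.parameters(), lr=1e-3)
for epoch in range(200):
    d = torch.norm(embed(x1) - embed(x2), dim=1)   # fast surrogate for the Wasserstein distance
    loss = ((d - targets) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print("final MSE:", float(loss))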




#18.03 - Medical Imaging
16 May 2018, Sophia-Antipolis

         "Cardiac modelling & machine learning: physiology-based constraints for data-driven methods in healthcare" , M. Sermesant (INRIA-Epione)
Summary and Papers

Machine learning has proven very efficient in many areas; however, its application to healthcare is still challenging. This is due in particular to the complexity of setting up large labelled databases and to the variability between patients. A way to tackle these difficulties is to leverage the important clinical knowledge on pathophysiology. Such prior knowledge can be introduced through computational modelling of the human body. I will present different ways we have explored to interleave machine learning and biophysical modelling.

Giffard-Roisin, Sermesant &o (2017): "Non-Invasive Personalisation of a Cardiac Electrophysiology Model from Body Surface Potential Mapping"

Cabrera-Lozoya, Sermesant &o (2017): "Model-based Feature Augmentation for Cardiac Ablation Target Learning from Images"


         "iBiopsy: non-invasive, deep-learning diagnostic tools for prostate, liver and lung disease" , R. Cabrera-Lozoya (Median Technologies)
Summary and Papers

With more than 600 million imaging procedures being done in medical facilities in the US alone, the application of artificial intelligence techniques for the analysis of medical data seems inevitable. This presentation will give an overview of three applications tackled by Median Technologies for the establishment of accurate non-invasive methodologies for disease assessment:
- Prostate Cancer: In collaboration with the Memorial Sloan Kettering Cancer Center (MSKCC), we seek a fast disease progressor assessment for prostate cancer (PCa)
- Liver NASH: In collaboration with the University of California San Diego, the aim is to quantify the fibrosis staging in NASH patients
- Lung Cancer: In collaboration with the Nice University Hospital (CHU), a biomarker for lung cancer screening is sought

Cabrera-Lozoya &o (2018): "Assessing the relevance of multi-planar MRI acquisitions for prostate segmentation using deep learning techniques"




#18.02 - Economic and legal effects of algorithmic pricing - Fintech Start-up Quantilia
27 March 2018, Nice

         "Competition effects of algorithmic pricing" , F. Marty (GREDEG (CNRS), UCA, Science Po)
Summary and Papers

From a non-profit perspective, a platform may contribute to the sharing economy and favour non-market transactions. From a for-profit perspective, it provides more efficient matching services and helps to bypass regulations. However, a platform might be less a tool of rent dispersion than a vector of its polarisation. Huge volumes of data and powerful algorithms may allow platforms to implement perfect price discrimination strategies and facilitate the emergence and sustainability of tacit collusion equilibria. While these may be welfare-enhancing, they involve strong surplus transfers between economic agents. Competition law enforcement encounters major difficulties in coping with these effects.

We consider alternative responses, such as public regulation or enhanced consumer countervailing power, and discuss the contestability of these dominant operators by taking into account the counter-strategies of market participants, the extended rivalry hypothesis and the disruptive potential of blockchain.

F. Marty (2017): "L'économie des plateformes: dissipation ou concentration de la rente ?"

F. Marty (2017): "Algorithmes de prix, intelligence artificielle et équilibres collusifs"

F. Marty (2017): "Algorithmes de prix et discrimination parfaite : Une théorie concurrentielle en voie de trouver sa pratique ?"


         "Quantitatively informed investment strategies based on Quantilia's web-platform" , L. Fauchon (Quantilia)
Summary

The start-up Quantilia has a web-based platform which offers comprehensive data from quantitative strategies. The information is collected from leading global providers and is updated daily. There are 3000+ strategies to choose from.

Investment banks, asset managers, private banks, family offices, and pension funds, as well as SWFs, all need to make informed decisions and they can do so by the means of Quantilia's quantitative tools designed specifically for that purpose.




#1801 - From NIPS: visual domain adaptation - Search engines
15 January 2018, Nice

         "Joint distribution optimal transportation for domain adaptation" , R. Flamary (OCA-Lagrange, CNRS, UCA)
Summary, Paper and Code

In the context of supervised learning, one generally assumes that the test data are a realization of the same process that generated the learning set. However, in practical applications this is often not the case. For instance, in computer vision, for a given new dataset of images without any labels, one may want to exploit a different annotated dataset, provided that they share sufficient common information and labels. However, the generating process can differ in several aspects, such as the conditions and devices used for acquisition, different pre-processing, different compressions, etc. So-called "domain adaptation" techniques aim at alleviating this issue by transferring knowledge between domains. Here, a principled and theoretically founded way of tackling the unsupervised domain adaptation problem is proposed, based on the following assumption: there exists a non-linear transformation between the joint feature/label space distributions of the two domains. We propose a solution to this problem with optimal transport that recovers an estimated target predictor by simultaneously optimizing the optimal coupling and the prediction function. We show that our method, called JDOT (Joint Distribution Optimal Transport), corresponds to the minimization of a bound on the target error, and we provide an efficient algorithmic solution for which convergence is proved.

The versatility of that approach - both in terms of class of hypothesis or loss functions - is demonstrated with three real world classification and regression problems, for which state-of-the-art results are reached or surpassed:
1) The Caltech-Office classification dataset for visual adaptation, where several factors (e.g. presence/absence of background, lighting conditions, image quality, etc.) induce a distribution shift between the source and the target domains.
2) The Amazon review classification dataset, which contains online reviews of different products collected on the Amazon website. Reviews are encoded with bag-of-words unigram and bigram features as input. The binary classification problem consists in predicting the positive or negative rating of reviews. Since different words are employed to qualify the different categories of products, a domain adaptation task can be formulated if one wants to predict positive reviews of a product from labelled reviews of a different product.
3) The Wifi localization regression dataset. From a multi-dimensional signal (signal strength from several access points), the goal is to locate the device in a hallway by means of regression. As the signals were acquired at different time periods by different devices, a shift can be encountered, which calls for adaptation.

Courty, Flamary, Habrard & Rakotomamonjy (2017): "Joint distribution optimal transportation for domain adaptation"

Open source python code: JDOT
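
A compact, hedged sketch of the JDOT alternating scheme on assumed toy data, using the POT library for the optimal coupling; this only illustrates the principle and is not the authors' reference implementation linked above.

import numpy as np
import ot
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Source domain: labelled; target domain: same task, shifted features, no labels.
Xs = rng.normal(size=(100, 2)); ys = (Xs[:, 0] > 0).astype(int)
Xt = rng.normal(size=(100, 2)) + np.array([1.0, 0.5])

clf = LogisticRegression().fit(Xs, ys)            # initial predictor
a = np.ones(len(Xs)) / len(Xs)
b = np.ones(len(Xt)) / len(Xt)
alpha = 1.0                                       # trade-off between feature and label cost

for it in range(5):
    # Joint cost: feature distance plus disagreement between source labels and
    # the current predictions on the target points.
    C_feat = ot.dist(Xs, Xt)                      # squared Euclidean by default
    proba_t = clf.predict_proba(Xt)[:, 1]
    C_lab = (ys[:, None] - proba_t[None, :]) ** 2
    G = ot.emd(a, b, C_feat + alpha * C_lab)      # optimal coupling
    # Propagate source labels to target points through the coupling, then refit.
    yt_soft = (G * ys[:, None]).sum(axis=0) / G.sum(axis=0)
    clf = LogisticRegression().fit(Xt, (yt_soft > 0.5).astype(int))

print(clf.predict(Xt)[:10])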


         "From Open Source Search building blocks to a full Enterprise Search solution with Datafari" , C. Ulmer (France Labs)
Summary

The search engine Datafari was presented; it covers the typical building blocks of search engines with an architecture that is able to index and search hundreds of millions of documents. It was explained how Datafari stands out in the search engine landscape and why this makes it particularly attractive for a wide range of users. The differences between the open source community version and the enterprise edition were also highlighted.

Two use cases were presented: a corporate website (AEAP) and an intranet search (Nuclear Industry).




#1709 - Data science and finance
12 December 2017, Sophia-Antipolis

         "The next step in the Robo Advisor landscape: mass-customization" , Prof L. Martellini (EDHEC)
Summary and Papers

Individual investors, just like institutional investors, face complex problems for which they need dedicated investment solutions, as opposed to off-the-shelf investment products. While mass production happened a long time ago in investment management, with the introduction of commingled mutual funds, the missing piece of the puzzle is now mass customisation.

Existing financial products marketed as "retirement investment solutions" do not meet the needs of future retirees, which involve securing their essential goals expressed in terms of minimum levels of replacement income (focus on safety), while generating a relatively high probability of achieving their aspirational goals expressed in terms of target levels of replacement income (focus on performance). Meaningful solutions should therefore combine safety and performance to meet this dual objective.

A theoretical answer to the retirement investment problem is a dynamic goal-based investing strategy that maximises the probability of reaching a target level of income (aspirational goal) while securing a minimum (essential goal). Mass-customized versions of goal-based investing strategies exist, which can reliably secure essential goals, unlike balanced or target-date funds, and they have attractive probabilities of reaching aspirational goals.

Martellini (2016): "The Rise of the Robo-Advisors"

Martellini & Milhau (2017): "Mass customisation vs mass production in retirement investment management"


         "Big data & data science in finance: myths, realities and pitfalls - a practitioner's perspective" , J-R. Giraud (KORIS International)
Summary

After a brief presentation of his two companies, Koris International, which successfully provides quantitative-based investment advice by bridging the gap between academic research and asset management, and Trackinsight, which stands out among the very few platforms that allow one to classify, filter and compare a very broad range of ETFs (Exchange Traded Funds) according to carefully chosen criteria, J-R Giraud highlighted a number of "myths" and pitfalls in relation to the current "Big Data" and "Data Science" rhetoric. His remarks covered three areas:
- the reality of big data analysis in financial firms,
- pitfalls of data science techniques in portfolio management, and
- the (true or virtual) reality of artificial intelligence in financial services.




#1708 - Eye gaze-based image classification
06 December 2017, Sophia-Antipolis

         "Eye kinetor - what are you looking for" , S. Lopez (Lab I3S)
Summary and Papers

One daunting challenge of Content-Based Image Retrieval (CBIR) systems is the requirement of annotated databases. To limit the burden of annotation, a system of image annotation based on gaze data is proposed. The purpose of this ANR-sponsored work, which is part of the Visiir project, is to classify a small set of images according to a target category (binary classification) in order to then classify a set of unseen images.

First, we designed a protocol based on the visual preference paradigm in order to collect gaze data from different groups of participants during a category identification task. From the gaze features known to be informative about the intentions of the participants, we derived a Gaze-Based Intention Estimator (GBIE) that is computable in real time and independent of both the participant and the target category. This implicit annotation is better than random annotation but is inherently uncertain.

In a second part, the images annotated by the GBIE from the participants' gaze data are used to classify a bigger set of images with an algorithm that handles label uncertainty: a probabilistic SVM combining classification and regression SVMs. Among different strategies, we determined a criterion of relevance in order to discriminate the most reliable labels, involved in the classification part, from the most uncertain labels, involved in the regression part. The average accuracy of the probabilistic SVM is evaluated in different contexts and can compete with the performance of a standard classification algorithm trained with true-class labels. These evaluations were first conducted on a standard benchmark for comparison with state-of-the-art results and later on a food image dataset.

The standard benchmark in this area is VOC2007 http://host.robots.ox.ac.uk/pascal/VOC/voc2007/

Lopez &o (2017): "Handling noisy labels in gaze-based CBIR systems"

Lopez &o (2016): "Catching relevance in one glimpse - food or not food"

Lopez &o (2015): "One gaze is worth ten thousand (key-)words"
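
A rough scikit-learn sketch of the label-uncertainty handling described in the summary above; the synthetic features, the GBIE-like soft labels and the relevance threshold are assumptions, and the combination rule is a simple average rather than the authors' exact formulation.

import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))                      # image features (toy)
soft_labels = 1 / (1 + np.exp(-X[:, 0] + rng.normal(scale=1.0, size=300)))  # GBIE-like scores in [0, 1]

confidence = np.abs(soft_labels - 0.5)              # toy relevance criterion
reliable = confidence > 0.3                         # confident annotations
svc = SVC(probability=True).fit(X[reliable], (soft_labels[reliable] > 0.5).astype(int))
svr = SVR().fit(X[~reliable], soft_labels[~reliable])

def predict_proba(Xq):
    # Average the probabilistic outputs of the classification and regression SVMs.
    p_cls = svc.predict_proba(Xq)[:, 1]
    p_reg = np.clip(svr.predict(Xq), 0.0, 1.0)
    return 0.5 * (p_cls + p_reg)

print(predict_proba(X[:5]).round(2))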




#1707 - Big data and network modeling - Risk management
13 November 2017, Nice

         "Statistical learning with networks and texts" , Prof C. Bouveyron (CNRS, INRIA)
Summary and Paper

Many random graph models have been proposed to extract information from communication networks like Facebook, Twitter, LinkedIn, email, the web, e-publications, etc. However, most of them focus only on person-to-person links, without taking into account information on the contents. This work introduces the stochastic topic block model (STBM), a probabilistic model that also accounts for textual edges in a network. Simulated data sets are considered in order to highlight its main features and to assess the STBM against alternative approaches within a range of different scenarios. The effectiveness of the methodology is demonstrated on two real-world data sets: a medium-size directed communication network (the Enron e-mails) and a large-size undirected co-authorship network (NIPS). The recent Linkage project allows everybody to try out the STBM on their own preferred network dataset or just to play around with the datasets already available there (www.linkage.fr).

Bouveyron, Latouche, & Zreik (2017): "The stochastic topic block model for the clustering of vertices in networks with textual edges"


         "Knowledge risk management - les nouveaux outils en gestion des risques" , D. Museur (RiskAttitude)
Summary

Key major risks in the current environment are linked to digital security and to natural and technological disasters, and the human factor accounts for a very large part of the causes of accidents. Today, innovative technology and artificial intelligence are major ways to tackle that risk factor. Innovative technologies allow us to merge many internal and external data sources in order to build a knowledge base in risk management, to anticipate risks, and finally to mitigate them by transforming them into bearable events. The evolving size of those databases makes it possible to adapt and refine the security strategy implemented. RiskAttitude's approach to risk management focuses on artificial intelligence, performing sequential data analysis and using a risk analysis and management platform in order to prevent risks. RiskAttitude's risk management toolbox features, for instance, virtual and augmented reality simulations to enable the acquisition of risk management skills. Going forward, and based on the considerable amount of data collected, artificial intelligence will become increasingly used and indispensable throughout the insurance industry chain.




#1706 - Big news about gravitational waves - Developments in scalable learning
30 October 2017, Sophia-Antipolis

         "Gravitational waves - statistical autopsy of black hole and neutron star mergers" , Prof N. Christensen (OCA-Artemis)
Summary and Papers

Two pieces of big news made this Data Science meetup very timely:
1) In early October, the Nobel Prize in Physics was awarded for contributions to the observation of gravitational waves - the latter being the result of several decades of internationally coordinated research by some 1000+ people,
2) on October 16th, i.e. barely two weeks later, it was announced that gravitational waves from a neutron star merger (rather than a merger of black holes) had been detected, which means there is an optical counterpart to those wave signals. This opens up a whole range of new possibilities to investigate the cosmos.

It is safe to say that the relatively new field of gravitational-wave astronomy is now definitively on track.

We were very fortunate that Prof Christensen agreed to talk us through these topics by focusing on data science aspects. From a statistical point of view, there are two key challenges:
1) extracting a potential signal embedded in a lot of noise,
2) estimating the parameters of the signal waveform which characterises the astrophysical event that created the gravitational wave (in order to find out, for instance: What caused the event? Where did it come from?).

Prof. Christensen explained where in this endeavour frequentist statistical methods are useful, where Bayesian methods, specifically Markov chain Monte Carlo (MCMC) techniques, are useful, and why. The focus of the talk was on the Bayesian methods used for the parameter estimation of the waveform. It was explained why the neutron star case is more difficult in terms of estimation and how it leads to some nice new results about the Hubble "constant" and thus the overall scale of the universe. An outlook was provided on the possibility of testing General Relativity based on observations of the stochastic gravitational-wave background.

Meyer & Christensen (2016): "Gravitational waves - statistical autopsy of a black hole merger"

LIGO and VIRGO, & alii (2017): "A gravitational-wave standard siren measurement of the Hubble constant"

Callister, Biscoveanu, Christensen, & Thrane (2017): "Tests of general relativity with the stochastic gravitational-wave background"


         "Scalable Gaussian processes with a twist of probabilistic numerics" , K. Cutajar (Eurecom)
Summary and Papers

Developing scalable learning models without compromising performance is at the forefront of machine learning research. The scalability of such models is predominantly hindered by linear algebraic operations having large computational complexity, among which is the solution of linear systems involving kernel matrices. A common way to tackle this scalability issue is to use the conjugate gradient algorithm, but this technique is not without its own issues: the conditioning of kernel matrices is often such that conjugate gradients will have poor convergence in practice. Preconditioning is a common approach to alleviating this issue.

With particular emphasis on Gaussian processes, this talk outlined how preconditioning can be effectively exploited to develop a scalable approach to both solving kernel machines and learning their hyperparameters. Inspired by recent developments in the field of probabilistic numerics, the talk also covered ongoing work on characterising the computational uncertainty introduced by such algebraic approximations. This ties in with recent work on casting the computation of the log determinant of a matrix as a Bayesian estimation problem.

Cutajar, Cunningham, Osborne, & Filippone (2016): "Preconditioning Kernel Matrices"

Fitzsimons, Cutajar, Osborne, Roberts, & Filippone (2017): "Bayesian Inference of Log Determinants"
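
A small sketch of the preconditioned conjugate-gradient idea discussed above: solving (K + sigma^2 I) x = y for an assumed RBF kernel, with a Nystrom-style preconditioner applied through the Woodbury identity. The data, the number of inducing points and this single preconditioner choice are illustrative; the papers study several preconditioners and their effect on convergence.

import numpy as np
from scipy.sparse.linalg import cg, LinearOperator
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 5))
y = rng.normal(size=800)
sigma2 = 1e-2

K = np.exp(-0.5 * cdist(X, X, "sqeuclidean"))     # RBF kernel matrix
A = K + sigma2 * np.eye(len(X))                   # GP-style linear system

# Nystrom-type preconditioner from m inducing points: P = U U^T + sigma^2 I ~ A.
m = 50
idx = rng.choice(len(X), m, replace=False)
L = np.linalg.cholesky(K[np.ix_(idx, idx)] + 1e-8 * np.eye(m))
U = K[:, idx] @ np.linalg.inv(L).T                              # (n, m) low-rank factor
S = np.linalg.inv(np.eye(m) + (U.T @ U) / sigma2)               # small m x m inverse

def apply_preconditioner(v):
    # Woodbury identity: (U U^T + sigma^2 I)^{-1} v
    return v / sigma2 - U @ (S @ (U.T @ v)) / sigma2**2

M = LinearOperator(A.shape, matvec=apply_preconditioner)

counts = {"plain": 0, "preconditioned": 0}
def counter(key):
    def callback(xk):
        counts[key] += 1
    return callback

x_plain, _ = cg(A, y, callback=counter("plain"))
x_prec, _ = cg(A, y, M=M, callback=counter("preconditioned"))
print(counts)   # the preconditioned solve should need far fewer CG iterations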




#1705 - Deep learning in satellite/aerial image analysis
17 October 2017, Sophia-Antipolis

         "Convolutional neural networks for large-scale remote sensing image analysis" , E. Maggiori (INRIA)
Summary and Papers

In recent years, large-scale sources of data have become available covering much of the earth's surface, often at impressive spatial resolutions. In addition to the computational complexity issues that arise in this context, one of the current challenges is to handle the variability in the appearance of objects across different geographic regions. For this, it is necessary to design classification methods that go beyond the analysis of individual pixel spectra, introducing higher-level contextual information in the process. The presented work uses convolutional neural networks (CNNs), proposes different solutions to output high-resolution classification maps, and studies the acquisition of training data. A benchmark dataset of aerial images over dissimilar locations was created (https://project.inria.fr/aerialimagelabeling/), and, on that basis, the generalization capabilities of CNNs were assessed.

Maggiori, Tarabalka, Charpiat, & Alliez (2017): "High-resolution semantic labeling with convolutional neural networks"

Maggiori, Tarabalka, Charpiat, & Alliez (2017): "Can Semantic Labeling Methods Generalize to Any City? The Inria Aerial Image Labeling Benchmark"

Maggiori, Tarabalka, Charpiat, & Alliez (2017): "Recurrent neural networks to correct satellite image classification maps"




#1704 - Semantic web & object learning - Data Science competitions
20 June 2017, Sophia-Antipolis

         "Semantic web-mining and deep vision for lifelong object discovery" , V. Basile (INRIA)
Summary and Paper

Autonomous robots that are to assist humans in their daily lives must recognize and understand the meaning of objects in their environment. However, the open nature of the world means robots must be able to learn and extend their knowledge about previously unknown objects on-line. In this work, we investigate the problem of unknown object hypotheses generation, and employ a semantic web-mining framework along with deep-learning-based object detectors. This allows us to make use of both visual and semantic features in combined hypotheses generation. Experiments on data from mobile robots in real world application deployments show that this combination improves performance over the use of either method in isolation.

Young, Kunze, Basile, Cabrio, Hawes, & Caputo (2017): "Semantic Web-Mining and Deep Vision for Lifelong Object Discovery"


         "The landscape of data science competitions - 2"
Summary

Presentation of Data Science competition and challenge platforms with a focus on the differences between them.




#1703 - Deep Learning - Data Science competitions
29 May 2017, Nice

         "Deep Learning - basics and recent developments" , M. Ducoffe (CNRS - Lab I3S)
Summary

After a presentation of the essential notions of Deep Learning (perceptron, neural network), it is explained how convolutional neural networks and autoencoders are built. Then, a number of tips and tricks are provided about implementation details that allow one to build and train one's own deep networks (vanishing gradients, regularization, under- and overfitting, batch normalization, optimization). Finally, an overview is given of the work of the MIND team, focusing in particular on effective and scalable batch learning in Deep Learning.


         "The landscape of data science competitions - 1"
Summary

Overview of the 20-year history of Data Science competitions and challenges




#1702 - Medical imaging for Alzheimer and cancer diseases
30 March 2017, Sophia-Antipolis

         "Multivariate statistical online learning of large-scale imaging-genetics data" , M. Lorenzi (INRIA)
Summary and Paper

State-of-the-art data analysis methods in genetics and related fields have advanced beyond massively univariate analyses. However, these methods suffer from the limited amount of data available at a single research site. Recent large-scale multi-centric imaging-genetic studies, such as ENIGMA, have to rely on meta-analysis of mass univariate models to achieve critical sample sizes for uncovering statistically significant associations. Indeed, model parameters, but not data, can be securely and anonymously shared between partners.

Here, partial least squares (PLS) is proposed as a multivariate imaging-genetics model in meta-studies. In particular, an online estimation approach to partial least squares is used for the sequential estimation of the model parameters in data batches, based on an approximation of the singular value decomposition (SVD) of partitioned covariance matrices.

The proposed approach is applied to the challenging problem of modeling the association between 1,167,117 genetic markers (SNPs, single nucleotide polymorphisms) and brain cortical and sub-cortical atrophy (354,804 anatomical surface features) in a cohort of 639 individuals from the Alzheimer's Disease Neuroimaging Initiative. When comparing two different modeling strategies (sequential- and meta-PLS) to the classic non-distributed PLS, both strategies exhibited only minimal approximation errors in the model parameters. The proposed approaches pave the way to the application of multivariate models in large-scale imaging-genetics meta-studies, and may lead to novel understandings of the complex brain phenotype-genotype interactions.

Lorenzi, Gutman, Thompson, Alexander, Ourselin, & Altmann (2017): "Secure multivariate large-scale multi-centric analysis through on-line learning: an imaging genetics case study"


         "The iBiopsy plateform - imaging phenomics and the limits of genomics" , H. Beaumont (Median Technologies)
Summary

Precision Medicine is about to revolutionize the way diagnostic and biological data are used to pinpoint and deliver care that is preventive, targeted and effective. It targets patient-specific, individually tailored therapies. To date, this strategy has largely been based on a person's unique genetic profile. However, despite massive efforts to develop new genomic profiling platforms, pharmacogenetic studies have mostly failed to successfully predict drug therapy outcomes in individual patients. As a result, other strategies, such as phenotyping, are coming to the fore. However, most common diseases are multifactorial, and clinical phenotypes (defined as any biological, physiological, morphological, or behavioral trait) are difficult to predict due to their inherent complexity.

Phenomics, the study of large numbers of expressed traits across populations, investigates how genetic information translates into a full phenotype. In short, phenomics is revolutionizing our ability to connect the genotype and phenotype. Phenotypes are associated with biomarkers reflecting biological processes. The quantitative evaluation of these imaging biomarkers has the potential to open up the science of phenomics. Imaging phenomics has a number of key advantages:
- it is noninvasive and derived from standard imaging tests,
- it avoids limitations associated with tissue biopsies,
- it leverages widely accepted criteria key to targeted therapy (e.g. RECIST).

It is against this background that Median Technologies has developed iBiopsy™, a phenomics platform specifically designed to acquire, index, and analyze thousands of individual phenotypes for the purpose of establishing biological associations with high predictive accuracy. iBiopsy™ combines noninvasive imaging biomarkers with phenomic-based strategies to identify associations that may help predict a patient's response to treatment, thereby enhancing precision medicine. It aims to deliver an easy-to-use solution that decodes biomarkers from standard medical images, and so to help revolutionize the way patients with cancer and many other chronic diseases are diagnosed, treated and monitored.




#1701 - Data Science in medicine - Big Data in the travel industry
24 January 2017, Nice

         "Data science in medicine", Prof P. Staccini (faculté de médecine & CHU de Nice)
Summary

A recent report of the US National Academy of Sciences concluded: "The current health-care system has important shortcomings and inefficiencies. Insights from research are poorly managed, the available evidence is poorly used, and the care experience is poorly captured, resulting in missed opportunities, wasted resources, and potential harm to patients." It is probably fair to say that these conclusions are not limited to the US alone.

At the same time, there currently exists a confluence of science, technology, and medicine that creates new opportunities for data science applications, particularly in the areas of prevention through prescriptive analytics (e.g. early disease detection), personalized healthcare and precision medicine, and automated health data reporting for research (e.g. clinical trials), clinical decisions, and healthcare spending (e.g. detection of high-cost patients). In this context, the participation of patients will be key to achieving successful prevention and efficient treatment.

However, there are a number of challenges for big data analytics in the health care area: the evidence base, methodological issues, and issues concerning clinical integration and utility. To successfully take up these challenges, it will be key to have clear ideas about the use case from both the patient's and the health care professional's points of view, to identify the data sources and understand their maturity levels, to articulate the main steps of data processing, to be fully aware of the legal constraints, and to thoroughly understand the sometimes considerably challenging ethical aspects.


         "Big Data in the travel industry", Ch. Imbert (Milanamos)
Summary

Tomorrow's smart cities and multimodal networks will demand a new approach to travel. As a specialist in data science for travel, Milanamos helps airports, travel operators, portals and governments to develop a data-centric travel strategy. Airports, train stations and coach/bus stations represent a considerable investment. Milanamos's PlanetOptim software platform helps infrastructure owners maximize their return on investment by identifying 1) new transport partnerships, and 2) potential new routes.

1) End-to-end multimodal solutions: This means integrating data from all industries to provide one reliable dataset, enable decision making at network level, and recommend door-to-door itineraries that provide optimal load factors and revenues.

For this, the PlanetOptim platform is available to all transportation stakeholders (rail, bus and mobility companies) - be they big or small - to cooperate on their intermodal offer and achieve an integrated timetable with revenue sharing.
The platform helps public services to identify traffic flows and build efficient travel networks. This can help to optimise existing resources through connecting points, but also to identify future connection points that facilitate public transport.

2) Trend forecasting: PlanetOptim.future provides the big data analytics all travel operators need to forecast market trends and identify potential new routes and new business partnerships.

We can do a market analysis on any route with data on passenger traffic, total revenue and average revenue over the period. Even for routes for which there is no schedule and no historical data, we are able to provide simulations including an estimate of traffic and revenue. We can assess the potential connections of a given flight and find the best departure time to maximize connections, with scheduling data up to 12 months into the future. We also have a benchmarking tool to compare the networks of different airlines or airports.
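
To give a flavour of the connection assessment, the hypothetical sketch below lists the outbound departures reachable from a given arrival under a minimum connection time and a maximum waiting window. The field names, the 45-minute minimum and the 6-hour window are assumptions for the example, not PlanetOptim's actual model.

    from datetime import datetime, timedelta

    def feasible_connections(arrival, departures, min_connect=timedelta(minutes=45),
                             max_wait=timedelta(hours=6)):
        """arrival: datetime; departures: list of (destination, datetime).
        Returns departures reachable within [min_connect, max_wait] of arrival."""
        return [(dest, dep) for dest, dep in departures
                if min_connect <= dep - arrival <= max_wait]

    arrival = datetime(2017, 1, 24, 10, 30)
    departures = [("LHR", datetime(2017, 1, 24, 11, 0)),
                  ("JFK", datetime(2017, 1, 24, 11, 45)),
                  ("FRA", datetime(2017, 1, 24, 18, 0))]
    print(feasible_connections(arrival, departures))
    # Only the 11:45 JFK departure satisfies both the 45-minute minimum
    # connection time and the 6-hour maximum wait.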




If you have comments or suggestions, if you want to present your own work or tools and techniques, or if you have an idea about whom to invite, do not hesitate to send an e-mail to the organizer of the Data Science meetup Nice/Sophia-Antipolis

nice.dsmeetupATcom.gmail
(hint: there are some deliberate mistakes here; the address should start with ds...)