- Ph.D. in Computer Science: Concordia University (Canada)
- M.Sc. in Electrical Engineering: ESIEE (France)
- Tripartite: ESIEE (France), Karlsruhe Institute of Technology (Germany), Essex U. (England)
- Machine Learning, Natural Language Processing, Data Science
- Software Developer, Data Scientist
- Machine Learning: Linear Classifiers, Support Vector Machines (SVMs), Neural Networks, Hidden Markov Models (HMMs), Logistic Regression, Decision Trees, Random Decision Forest, Boosted Decision Trees, ExtraTrees
- Natural Language Processing: word sense disambiguation, named entity extraction, word embeddings (word vectors)
- Deep Learning
- Predictive Modelling, classification, regression analysis, Information Retrieval, confidence scoring
- Data mining, statistics, clustering, K-Means
- Languages: Python, C++
- Databases: MongoDB (NoSQL), PostgreSQL
- Cloud computing: AWS, Oracle Grid Engine, multi-threaded and distributed computing environments
- Tools: Linux, Unix, git, scikit-learn, libsvm, liblinear, scipy, pandas, graph-tool, spaCy, gensim, Word2Vec, GloVe, GNU Make, Anaconda
- Big Data: Hadoop, MapReduce, Apache Spark, Spark MLlib, TensorFlow, PyTorch
- Data visualization: Bokeh, graph-tool, t-SNE
- Tools: REST API, Flask, Flask-RESTful
- Canada (Montréal), 2017-present
- Machine Learning Expert: Natural Language Processing - Machine learning
- Canada (Montréal), 2002-2017
Reporting to the Head of Research and Development (R&D)
- Commercial product in natural language processing for the semantic analysis of texts.
- Database with billions of sense-annotated queries, tweets and documents.
- Adsquirl, a product designed for marketing agencies to drive advertising campaigns:
  - precise semantic meaning of keywords
  - negative keyword generation using Idilia’s database of sense-annotated queries
  - long-tail query generation by matching the precise meaning of seed keywords with Idilia’s database of sense-annotated queries
- Product for the extraction and filtering of information coming from social media.
- Working with large data sets: billions of tweets, queries and documents.
- Knowledge graph (Idilia Language Graph), with millions of word senses and several hundred million links.
- Designed a document classification module to extract named entities from Wikipedia and insert them into Idilia’s taxonomy. Linear classifier trained with millions of documents, several hundred thousand features and several hundred labels. Database: MongoDB.
- Designed modules for named entity recognition: Hidden Markov Models and rules.
- Designed a capitalization module for recovering and correcting capitalization in lowercase, uppercase and mixed-case sentences. Classifier combining HMMs, a large volume of N-gram data, and other sources of information.
- Designed modules for word sense disambiguation.
- Designed various other modules for lexical analysis, classification and confidence scoring.
- REST API interface to some of the modules.
- Software to be deployed on thousands of servers and able to serve thousands of queries per second. Software needs to be fast, scalable, distributed, multi-threaded and very reliable in its predictions with respect to its confidence scores.
- Tools: Linux, C++, boost C++ libraries, Python, Jupyter (IPython), MongoDB, GNU Make, Git, Oracle Grid Engine, JSON, Flask-RESTful
- Machine Learning: libsvm, liblinear, sklearn, graph-tool, spaCy, gensim, Neural Networks, Hidden Markov Models, Random Decision Forest.
Locus Dialogue (acquired by Nuance)
- Software Engineer - Machine Learning: Speech Recognition and Machine Learning
- Canada (Montréal), 2000-2002
Reporting to the Research and Development Manager (R&D)
- Team responsible for the post-processing of the Automatic Speech Recognition (ASR) output.
- Hidden Markov Models, Gauss, generalized linear models, neural networks.
- Software deployed at close to 1,000 installations worldwide and serving approximately half a billion calls annually. Reliability of predictions is extremely important.
- Design and implementation of a new version of the post-processing module.
- Responsible for implementation, machine learning tools, evaluation, testing and improvements.
- Significantly enhanced the robustness and performance of the post-processing module shipped in the Locus Dialogue speech products.
- Tools: Windows, Cygwin, C++, ClearCase
- Machine Learning: Neural Networks, Hidden Markov Models, Generalized Linear Models
- Member of the Research Staff: Machine learning, Pattern recognition
- Japan (Kawasaki), 1997-2000
Reporting to the Research and Development Manager (R&D)
- Pattern Recognition Laboratory.
- R&D activities for the NEC postal sorting machines shipped worldwide.
- Design and implementation of software to be deployed in Finland’s postal sorting facilities to handle hundreds of millions of mail pieces per year. Recognition results need to have an extremely low error rate.
- Responsible for the full development cycle of handwritten digit and word recognition modules.
- U.S. patent for a word lexicon reduction system: US Patent 6834121.
- Tools: Linux, EWS-UX, HP-UX, C++
- Machine Learning: Neural Networks, Hidden Markov Models.
ACSIEL (formerly SYCEP)
- Manager of the German Office: Export
- Germany (Munich), 1987-1989
Reporting to the Company’s President in Paris, France
- Acted as a liaison between the French manufacturers of electronic components and their German customers and distributors.
- Provided assistance to the French companies in establishing their distribution channel in Germany.
- Assisted the French companies in their business dealings with the German Institute for Standardization (DIN).
- Ph.D. Computer Science
- M.Sc. Electrical Engineering
Privacy Preserving Machine learning
- Secure and Private AI (Udacity)
- Deep Learning with PyTorch (Udacity): PyTorch
- Deep Learning (Collège de France)
- Deep Learning (Udacity/Google): TensorFlow
- CS224n: Deep Learning for Natural Language Processing (Stanford)
- CS231n: Convolutional Neural Networks for Visual Recognition
Machine Learning - Natural Language Processing
- Machine Learning (Stanford)
- Statistical Learning (Stanford)
- Scalable Machine Learning (UC Berkeley): Apache Spark
- Natural Language Processing (Stanford)
- Natural Language Processing (Columbia)
- Mining Massive Datasets (Stanford): Locality Sensitive Hashing (LSH)
- Introduction to Data Science (U. of Washington): Apache Hadoop, NewSQL
- Data Manipulation at Scale: Systems and Algorithms (U. of Washington): Exact Test, …
- Data Science and Analytics in Context (Columbia)
Computer Science - Statistics - Data Structures - Algorithms
- Calculus: Single Variable (Penn): THE course… A gem…
- Algorithms, Part I (Princeton): UnionFind, stack, queue, sort, BSTs, …
- Algorithms, Part II (Princeton): Graphs, …
- Networked Life (Penn)
- Model Thinking (Michigan)
- Functional Programming Principles in Scala (Polytechnique Lausanne)
Learning / Education
- Teaching Character (Relay GSE): Grit
- Learning How to Learn (UC San Diego): Focused versus Diffuse Mode of Thinking
- How to Learn Math (Stanford): Fixed versus Growth Mindset
- French (mother tongue)
- English (bilingual)
- German (fluent; lived, studied and worked in Germany for 3 years)
- Japanese (intermediate; lived in Japan for 3 years)
- Worked and studied in France, Germany, England, Canada, the USA and Japan.
- Sports: running, cycling, dragon boat.