Professional Experiences

From academic research to industry applications: a journey through AI projects that deliver measurable business value.

10+ Years in AI
50% Efficiency Gains
>90% Success Rate
20+ Projects Delivered
Upskills R&D | 2023-2024

Privacy-Preserving RAG System for Financial Contracts

On-premise AI Solution with Advanced Document Intelligence

70% Time Reduction
90%+ Accuracy
100% Data Privacy

Situation & Privacy Challenge

Financial institutions require AI solutions that balance performance with data privacy:

  • Data Sensitivity: Financial contracts contain highly confidential information
  • Performance Needs: 4+ hour manual review per complex contract
  • Privacy Constraints: Prohibition of external API calls for sensitive data
  • Accuracy Requirements: Need for >90% accuracy in legal document analysis

Task & Technical Objectives

Develop a fully on-premise RAG system meeting strict privacy requirements:

  • Privacy-First Architecture: 100% on-premise deployment, no external APIs
  • Open-Source LLM Integration: Implement Llama 3 for local inference
  • Advanced Chunking: Semantic chunking preserving document structure
  • Optimized Retrieval: Implement re-ranking for precision improvement
  • Maximum Efficiency: Target 70%+ reduction in processing time

Action & Technical Innovations

🔒 Privacy-Preserving Architecture

  • Full On-Premise Deployment: Complete control over data flow
  • Llama 3 Integration: 8B parameter model running locally
  • Air-Gapped Capability: Operates without internet connectivity
  • Encrypted Storage: Vector database with at-rest encryption

📄 Advanced Document Processing

  • Semantic Chunking: Intelligent segmentation preserving:
    • Legal document structure (sections, subsections)
    • Financial table integrity
    • Cross-references
    • Hierarchical relationships
  • Context-Aware Processing:
    • Document type detection (NDA, MSA, SOW, etc.)
    • Party identification and role extraction
    • Jurisdiction and governing law detection
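
The structure-aware segmentation above can be sketched in miniature. The heading pattern and the sample contract below are illustrative stand-ins, not the production rules:

```python
import re

# Hypothetical heading pattern for legal contracts: "1.", "2.1", "ARTICLE I", ...
HEADING = re.compile(r"^(?:\d+(?:\.\d+)*\.?|ARTICLE\s+[IVXLC]+)\s+(.+)$", re.MULTILINE)

def semantic_chunks(text: str) -> list[dict]:
    """Split a contract into heading-aligned chunks instead of fixed-size
    windows, so each chunk keeps its section title as retrievable context."""
    matches = list(HEADING.finditer(text))
    chunks = []
    for i, m in enumerate(matches):
        start = m.start()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append({"title": m.group(1).strip(), "body": text[start:end].strip()})
    return chunks

contract = """1. Definitions
"Confidential Information" means any non-public data.
2. Term
This Agreement remains in force for two (2) years.
2.1 Renewal
The term renews automatically unless terminated."""
for c in semantic_chunks(contract):
    print(c["title"])
```

Because each chunk begins at a heading, the section title travels with its clauses into the vector store, which keeps retrieved answers anchored to the right part of the contract.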

⚡ Optimized Retrieval Pipeline

  • Multi-Stage Retrieval:
    1. Initial Retrieval: Dense embeddings with sentence transformers
    2. Re-ranking: Cross-encoder models for precision improvement
    3. Context Expansion: Adjacent chunk inclusion for coherence
  • Performance Optimizations:
    • Batch processing for parallel inference
    • Model quantization for reduced memory footprint
    • Cache layer for frequent queries
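
The three stages can be illustrated with toy scoring functions: bag-of-words cosine stands in for the dense bi-encoder, and an exact-phrase bonus stands in for the cross-encoder re-ranker. The documents and query are invented:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k_initial: int = 3, k_final: int = 1) -> list[str]:
    """Three-stage pipeline: cheap recall-oriented retrieval, precise
    re-ranking, then expansion with adjacent chunks for coherent context."""
    q = Counter(query.lower().split())
    # Stage 1: initial retrieval (stand-in for dense embeddings)
    scored = sorted(range(len(chunks)),
                    key=lambda i: cosine(q, Counter(chunks[i].lower().split())),
                    reverse=True)
    candidates = scored[:k_initial]
    # Stage 2: re-ranking (stand-in for a cross-encoder: exact-phrase bonus)
    reranked = sorted(candidates,
                      key=lambda i: (query.lower() in chunks[i].lower(),
                                     cosine(q, Counter(chunks[i].lower().split()))),
                      reverse=True)
    top = reranked[:k_final]
    # Stage 3: context expansion with adjacent chunks
    expanded = sorted({j for i in top for j in (i - 1, i, i + 1) if 0 <= j < len(chunks)})
    return [chunks[j] for j in expanded]

docs = ["Parties and recitals.", "The governing law is French law.", "Termination requires notice."]
print(retrieve("governing law", docs))
```

The same shape holds in the real pipeline: a fast retriever over-fetches candidates, an expensive model re-orders only those few, and neighbours of the winner are pulled in so the generator sees complete clauses.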

Key Technical Innovations:

  • Developed document-structure-aware chunking algorithm
  • Built multi-modal re-ranker combining semantic and lexical signals
  • Designed privacy audit trail for compliance reporting

Results & Impact

70% reduction in contract review time
90%+ accuracy in clause extraction
100% on-premise data privacy

  • Major Efficiency Gain: Contract review time reduced from 4 hours to 72 minutes (70% reduction)
  • Enterprise-Grade Privacy: Full compliance with financial regulations and data sovereignty requirements
  • Superior Accuracy: 90%+ accuracy achieved through re-ranking and domain adaptation
  • Cost Optimization: Eliminated external API costs while maintaining performance

📈 Performance Benchmarks vs Alternatives

Metric             Our Solution   GPT-4 API   Manual Review
Accuracy           92%            94%         85%
Data Privacy       100%           0%          100%
Cost per Document  $0             $2.50       $200

Llama 3 · RAG Architecture · Semantic Chunking · Re-ranking · On-premise AI · FastAPI · Vector Databases · Document Intelligence · Privacy by Design · Model Quantization

Key Achievements

Privacy Innovation

Delivered enterprise-grade privacy with 100% on-premise deployment

Performance Excellence

Achieved 70% time reduction while maintaining 90%+ accuracy

Technical Leadership

Successfully implemented Llama 3 with advanced RAG optimizations

Business Impact

Enabled secure AI adoption in highly regulated financial environment

System Architecture Overview

Document Ingestion (PDF/OCR, parsing) → Semantic Chunking (structure-aware) → Vector Storage (local embeddings) → Retrieval & Re-rank (multi-stage) → Llama 3 Generation (on-premise)

Upskills R&D | 2022-2023

Intelligent Email Classification System

DBS Banking - Multi-label classification for customer service optimization

90%+ F1 Score
70→90% Improvement

Situation & Challenge

DBS banking teams received hundreds of emails daily, each carrying one or more labels with varying urgency. The manual classification process was:

  • Time-consuming and inconsistent across different labelers
  • Prone to human error in identifying urgent vs non-urgent emails
  • Unable to handle the volume efficiently (multiple labels per email)

Task & Responsibilities

My primary responsibilities included:

  • Design and implementation of NLP preprocessing pipelines
  • Fine-tuning of deep learning models for multi-label classification
  • Comprehensive evaluation and analysis of model performance
  • Data quality investigation and process improvement recommendations

Action & Approach

Initial results showed poor model performance (~70% F1). Through deep data analysis, I discovered a critical issue:

  • Inconsistency Discovery: Comparing emails with identical labels revealed significant variations when labeled by different people
  • Process Intervention: Requested unified labeling standards and attended labeling sessions to understand data better
  • Improved Preprocessing: Enhanced data cleaning and feature engineering based on labeling insights
  • Model Optimization: Implemented advanced NLP techniques and fine-tuned models with improved data
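
Multi-label evaluation pools decisions over every (email, label) pair, which is how the 70% → 90%+ F1 progress was tracked. A minimal micro-F1 sketch with invented labels (not the actual DBS categories):

```python
def micro_f1(y_true: list[set], y_pred: list[set]) -> float:
    """Micro-averaged F1 for multi-label predictions: pool true/false
    positives over all (email, label) pairs before computing P and R."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy example: each email can carry several labels at once
truth = [{"urgent", "card"}, {"loan"}, {"card"}]
preds = [{"urgent", "card"}, {"loan", "urgent"}, set()]
print(round(micro_f1(truth, preds), 3))
```

Micro-averaging rewards getting the frequent labels right, which matters when the label distribution is as imbalanced as it was here.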

Results & Impact

  • Performance Leap: Improved F1 score from 70% to over 90%
  • Process Standardization: Established consistent labeling protocols across teams
  • Efficiency Gain: Faster email processing and more consistent triage across teams
  • Scalable Solution: System capable of handling growing email volumes

NLP · Deep Learning · Python · Multi-label Classification · Data Analysis · Transfer Learning

Key Achievements

Data Quality Insight

Identified critical labeling inconsistencies that were degrading model performance

Process Improvement

Implemented standardized labeling protocols that improved data quality

Performance Optimization

Achieved >20% improvement in classification accuracy

Business Impact

Enabled faster email processing and better customer service response times

Upskills R&D | 2021-2022

Automated Financial Table Extraction System

Financial Spreading Automation - PDF Report Processing

96% Detection Rate
95%+ Accuracy

Situation & Challenge

Manual extraction of financial tables (balance sheets, income statements, cash flow) from annual PDF reports was a major bottleneck in financial spreading processes:

  • Time-consuming manual process prone to human errors
  • Existing table extraction solutions not adapted to financial document complexity
  • No standardized templates across different companies
  • Multi-page tables with complex structures

Task & Responsibilities

My mission was to design, implement, and evaluate a robust automated financial table extraction pipeline:

  • Analyze and evaluate state-of-the-art table extraction technologies
  • Define the overall solution architecture and pipeline
  • Create and annotate a reference dataset for training and evaluation
  • Develop post-processing algorithms for table correction and restructuring
  • Conduct comprehensive system performance evaluation

Action & Approach

I developed an innovative hybrid architecture combining multiple approaches:

  • State-of-the-art Analysis: Evaluated existing tools (Camelot, Tabula) and deep learning models (CascadeTabNet, TableNet)
  • Hybrid Pipeline Design:
    • Discovery step using regex to identify pages containing target financial tables
    • YOLOv3 model optimized with FinTabNet training for precise table region detection
    • Custom post-processing module for cleaning extraction artifacts and reconstructing logical table structure
  • Dataset Creation: Supervised creation of manually annotated dataset (291 tables from 100 annual reports)
  • Custom Metrics: Developed ExactMatchSim metric and multi-criteria evaluation methodology
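
The regex discovery step can be sketched as follows; the keyword patterns here are illustrative, not the tuned production rules:

```python
import re

# Illustrative patterns; the real rules were tuned against actual annual reports
TARGETS = {
    "balance_sheet": re.compile(r"(consolidated\s+)?balance\s+sheets?", re.I),
    "income_statement": re.compile(r"statements?\s+of\s+(income|operations)", re.I),
    "cash_flow": re.compile(r"statements?\s+of\s+cash\s+flows?", re.I),
}

def discover_pages(pages: list[str]) -> dict[str, list[int]]:
    """Discovery step: cheaply narrow a 100+ page report to the few pages
    worth running the YOLOv3 table detector on."""
    hits: dict[str, list[int]] = {name: [] for name in TARGETS}
    for i, text in enumerate(pages):
        for name, pattern in TARGETS.items():
            if pattern.search(text):
                hits[name].append(i)
    return hits

report = ["Chairman's letter...",
          "Consolidated Balance Sheets\nAssets ...",
          "Statements of Cash Flows\n..."]
print(discover_pages(report))
```

Running the deep detector only on discovered pages is what keeps the hybrid pipeline fast enough for whole annual reports.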

Results & Impact

  • High Detection Rates: >95% detection for single-page tables, >96% overall document detection
  • Excellent Accuracy: 94% column similarity, 92.1% row similarity, 95.7% number extraction accuracy
  • Research Contribution: Created reference dataset and published results in scientific paper
  • Business Value: Dramatically reduced manual extraction time and errors

Computer Vision · YOLOv3 · Deep Learning · Table Extraction · PDF Processing · Financial Analysis

Key Achievements

Innovative Architecture

Designed hybrid pipeline combining regex discovery, deep learning detection, and custom post-processing

Dataset Creation

Built comprehensive annotated dataset for financial table extraction

High Accuracy

Achieved >95% accuracy on critical financial data extraction

Research Publication

Published results validating innovative approach in scientific paper

Upskills R&D | 2020-2021

Murex Trading System Reconciliation Clustering

Root Cause Analysis Automation

96% Feature Reduction
0.67 Rand Score

Situation & Challenge

Trading system reconciliation projects typically involved 20+ business analysts over 2 years. The challenge was to cluster mismatches with the same root cause:

  • Existing solution had computational complexity preventing timely results
  • Large dataset: 130,000+ transactions (65,000 mismatches, 48 features)
  • Highly imbalanced data with repetitive root causes and duplicates
  • Need for scalable solution to handle growing data volumes

Task & Responsibilities

My main task was to reduce solution complexity and achieve results in reasonable time:

  • Data preprocessing and quality improvement
  • Feature reduction and dataset optimization
  • Clustering algorithm implementation and evaluation
  • Performance comparison and impact analysis
  • Development of reproducible methodology

Action & Approach

I implemented a comprehensive data reduction and clustering strategy:

  • Data Preprocessing:
    • Defined unique transaction keys by concatenating Murex number and operation type
    • Removed rows with missing values, inconsistent duplicates, and transactions without root cause
  • Data Reduction:
    • Duplicate removal (reduction from 65,273 to 19,095 observations)
    • Custom sampling with max_count threshold per cluster
    • Elimination of low-variance or single-value features
  • Clustering Implementation:
    • Implemented FuzzyART and DBSCAN with parameter tuning
    • Used metrics: Adjusted Rand Score, purity, detectability
    • Validated by projecting clusters onto complete dataset
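
The reduction steps above can be sketched on toy reconciliation rows; the field names (`murex_id`, `op_type`) and values are illustrative:

```python
def reduce_dataset(rows: list[dict]) -> list[dict]:
    """Reduction steps from the pipeline: build the unique transaction key
    (Murex number + operation type), drop duplicate observations, then drop
    single-value features that cannot separate clusters."""
    seen, deduped = set(), []
    for row in rows:
        key = (row["murex_id"], row["op_type"], tuple(sorted(row["features"].items())))
        if key not in seen:
            seen.add(key)
            deduped.append(row)
    # Zero-variance features carry no clustering signal: remove them
    names = deduped[0]["features"].keys()
    informative = {n for n in names if len({r["features"][n] for r in deduped}) > 1}
    for row in deduped:
        row["features"] = {n: v for n, v in row["features"].items() if n in informative}
    return deduped

rows = [
    {"murex_id": 1, "op_type": "NEW", "features": {"ccy": "EUR", "book": "X"}},
    {"murex_id": 1, "op_type": "NEW", "features": {"ccy": "EUR", "book": "X"}},  # duplicate
    {"murex_id": 2, "op_type": "AMEND", "features": {"ccy": "USD", "book": "X"}},
]
out = reduce_dataset(rows)
print(len(out), sorted(out[0]["features"]))
```

The reduced dataset then feeds FuzzyART/DBSCAN, and the resulting clusters are projected back onto the full 65,000+ mismatches for validation.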

Results & Impact

  • Significant Data Reduction: Up to 96% feature reduction while preserving 97% of root causes
  • Improved Performance: FuzzyART achieved 0.65-0.67 Adjusted Rand Score
  • Method Validation: Demonstrated that removing duplicates and uninformative features doesn't harm clustering quality
  • Reproducible Methodology: Validated approach on simulated data with various noise levels
  • Scalable Solution: Enabled efficient and scalable root cause detection in industrial context

Clustering · FuzzyART · DBSCAN · Data Reduction · Feature Engineering · Big Data

Key Achievements

Data Optimization

Reduced dataset by 96% while maintaining 97% of critical information

Algorithm Performance

Achieved high-quality clustering with 0.67 Adjusted Rand Score

Computational Efficiency

Enabled timely results through intelligent data reduction

Industrial Scalability

Developed methodology applicable to large-scale industrial data

Postdoctoral Researcher | 2017-2019

Multimodal Tree Species Recognition System

LISTIC Lab, Annecy - Mobile AI Application for Botany

56%→75% Accuracy Gain
+20% vs State-of-the-art

Situation & Challenge

Tree species recognition presents significant challenges due to:

  • High diversity of tree species in nature
  • Interspecies similarity and intra-species variability
  • Confusions during recognition caused by species similarities
  • Need for offline mobile applications accessible to everyone
  • Existing solutions achieving only 56% accuracy

Task & Objectives

The project had three main objectives:

  • Intelligent Decision System: Develop a system emulating botanist expertise using belief functions theory to reduce confusion and improve accuracy
  • Mobile Solution: Create a practical smartphone application working offline, adapted to memory and computation limits
  • Accessibility: Make tree species recognition accessible and easy to use for everyone in nature

Action & Technical Approach

I developed an innovative two-step multimodal recognition approach:

  • Step 1: Leaf Identification
    • Used to reduce problem dimensionality
    • Identifies subset of most probable species
    • Leverages leaf morphology and texture features
  • Step 2: Bark Refinement
    • Modified evidential k-Nearest Neighbors (EkNN) algorithm
    • Recognizes bark from first step output
    • Belief functions theory for reasoning with uncertainty
  • Mobile Optimization
    • Designed for offline smartphone use
    • Optimized for memory and computation constraints
    • Lightweight model architecture
  • Experimental Validation
    • Conducted experiments on real-world data
    • Compared against existing solutions
    • Validated accuracy improvements
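
The fusion of leaf and bark evidence rests on Dempster's rule of combination from belief function theory. A minimal sketch on singleton hypotheses, with invented mass values:

```python
from itertools import product

def dempster_combine(m1: dict[frozenset, float], m2: dict[frozenset, float]) -> dict[frozenset, float]:
    """Dempster's rule of combination: intersect focal elements, accumulate
    mass products, and renormalise by the conflict mass (empty intersections)."""
    combined: dict[frozenset, float] = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    return {s: v / (1 - conflict) for s, v in combined.items()}

OAK, BEECH, PINE = "oak", "beech", "pine"
# Leaf step narrows the candidates; bark step refines within them (toy masses)
m_leaf = {frozenset({OAK, BEECH}): 0.7, frozenset({OAK, BEECH, PINE}): 0.3}
m_bark = {frozenset({OAK}): 0.6, frozenset({OAK, BEECH, PINE}): 0.4}
fused = dempster_combine(m_leaf, m_bark)
best = max(fused, key=fused.get)
print(sorted(best), round(fused[best], 2))
```

Assigning mass to sets of species (not just singletons) is what lets the system express "it is one of these look-alikes" instead of guessing, which is how confusion between similar species is reduced.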

Results & Impact

  • Significant Accuracy Improvement: Increased recognition accuracy from 56% to 75% (19 percentage points)
  • Superior Performance: Outperformed state-of-the-art methods by over 20%
  • Confusion Reduction: Belief functions theory effectively reduced confusion between similar species
  • Mobile-Ready Solution: Developed application working offline on smartphones
  • Scientific Contribution: Published in Expert Systems with Applications journal (9 citations)
  • Practical Application: Enabled non-experts to identify tree species using their smartphones

Computer Vision · Evidence Theory · Mobile AI · Multimodal Fusion · EkNN Algorithm · Image Processing

Key Achievements

Accuracy Breakthrough

Improved recognition accuracy by 19% absolute (56% → 75%)

Innovative Methodology

Developed novel two-step approach using belief functions theory

Mobile Innovation

Created first offline tree recognition app for smartphones

Research Impact

Published in top-tier journal (Expert Systems with Applications)

PhD Research | 2013-2016

Influence Maximization in Social Networks

PhD Dissertation - Evidence Theory Applications

+80% Precision Gain
85% Positive Opinion

Situation & Challenge

For viral marketing in social networks, companies needed to identify the most influential users, but existing solutions:

  • Relied mainly on network structure, ignoring user opinions
  • Were not robust to data uncertainty in social networks
  • Could not target different marketing scenarios based on influencer and audience opinions
  • Lacked theoretical foundations for handling uncertainty

Task & Objectives

Develop new influence maximization models that:

  • Consider multiple influence aspects (network position, activity, opinion)
  • Are robust to social network data uncertainty
  • Enable targeting of different marketing scenarios
  • Surpass existing model performance in influencer quality

Action & Research Approach

Implemented comprehensive research methodology:

  • Innovative Modeling:
    • Developed two influence maximization models based on belief function theory
    • Created seven different influence measures for three marketing scenarios
    • Introduced evidential influence measure combining network position, message popularity, and user activity
  • Rigorous Experimentation:
    • Collected and processed real Twitter dataset (36,274 users, 251,329 tweets)
    • Implemented complete opinion estimation pipeline using SentiWordNet and POS taggers
    • Conducted systematic comparisons with state-of-the-art models
    • Developed generated dataset for algorithm accuracy evaluation
  • Technical Optimization:
    • Implemented CELF algorithm for efficient maximization
    • Ensured solution scalability for large networks
    • Validated theoretical properties (monotonicity, submodularity) of objective functions
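
CELF exploits exactly those properties: because marginal gains of a submodular objective can only shrink as the seed set grows, a stale upper bound that still tops the priority queue is already exact. A sketch with a toy coverage objective (the follower sets are invented):

```python
import heapq

def celf(candidates: dict[str, set], k: int) -> list[str]:
    """CELF lazy greedy: re-evaluate a candidate's marginal gain only when
    its cached (upper-bound) gain reaches the top of the heap."""
    covered: set = set()
    # Max-heap of (-gain, round_last_evaluated, user)
    heap = [(-len(f), 0, u) for u, f in candidates.items()]
    heapq.heapify(heap)
    seeds: list[str] = []
    for round_no in range(1, k + 1):
        while True:
            neg_gain, evaluated, user = heapq.heappop(heap)
            if evaluated == round_no:      # gain is fresh for this round: take it
                seeds.append(user)
                covered |= candidates[user]
                break
            # Stale bound: re-evaluate lazily and push back
            gain = len(candidates[user] - covered)
            heapq.heappush(heap, (-gain, round_no, user))
    return seeds

followers = {"alice": {1, 2, 3, 4}, "bob": {3, 4, 5}, "carol": {6}}
print(celf(followers, 2))
```

In the real system the coverage objective is replaced by the evidential influence measure, whose proven monotonicity and submodularity are what make this lazy evaluation valid and account for the millisecond-range runtimes.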

Results & Contributions

  • Technical Performance: 80%+ precision improvement on generated data
  • Computational Efficiency: 32-536 ms vs several minutes/hours for classical approaches
  • Quality Improvement: Detected influencers with 85% positive opinion vs 41% for existing models
  • Theoretical Contribution: Novel application of evidence theory to social network analysis
  • Research Impact: Multiple publications in top-tier journals and conferences

Evidence Theory · Social Network Analysis · Influence Maximization · Machine Learning · Twitter API · CELF Algorithm

Key Achievements

Theoretical Innovation

First application of belief function theory to influence maximization

Performance Breakthrough

Achieved 80%+ precision improvement over state-of-the-art

Real-world Dataset

Built and analyzed comprehensive Twitter dataset

Research Recognition

Published in Knowledge-Based Systems (66+ citations)