personal blog: http://litowang.top/
- means validated as effective, * means promising
- Factorization Machines
- Field-aware Factorization Machines for CTR Prediction
- Field-weighted Factorization Machines for Click-Through Rate Prediction in Display Advertising
- AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks
- FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction
- CFM: Convolutional Factorization Machines for Context-Aware Recommendation
- Field-aware Neural Factorization Machine for Click-Through Rate Prediction
- Holographic Factorization Machines for Recommendation -> note
- Deep & Cross Network for Ad Click Predictions
- Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features
- xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems
- Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data
- Product-based Neural Networks for User Response Prediction
- DeepGBM: A Deep Learning Framework Distilled by GBDT for Online Prediction Tasks
- Online Deep Learning: Learning Deep Neural Networks on the Fly
- InteractionNN: A Neural Network for Learning Hidden Features in Sparse Prediction
- High-order Factorization Machine Based on Cross Weights Network for Recommendation
- Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions
- Quaternion Collaborative Filtering for Recommendation
- * Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction
- Exploring Content-based Video Relevance for Video Click-Through Rate Prediction
- DGFFM: Generalized Field-aware Factorization Machine based on DenseNet
- LMLFM: Longitudinal Multi-Level Factorization Machines
- *Sequence-Aware Factorization Machines for Temporal Predictive Analytics
- FLEN: Leveraging Field for Scalable CTR Prediction
- Beyond Similarity: Relation Embedding with Dual Attentions for Item-based Recommendation
- *Learning Feature Interactions with Lorentzian Factorization Machine
- Learning to Recommend via Meta Parameter Partition
- Online continual learning with no task boundaries
- Solving Cold Start Problem in Recommendation with Attribute Graph Neural Networks
- Mixed Dimension Embedding with Application to Memory-Efficient Recommendation Systems
- Generalized Embedding Machines for Recommender Systems
- A Sparse Deep Factorization Machine for Efficient CTR prediction
- AutoEmb: Automated Embedding Dimensionality Search in Streaming Recommendations
- ReZero is All You Need: Fast Convergence at Large Depth
- Dual-attentional Factorization-Machines based Neural Network for User Response Prediction
- Deep Match to Rank Model for Personalized Click-Through Rate Prediction
- Sequential Advertising Agent with Interpretable User Hidden Intents
- A Dual Input-aware Factorization Machine for CTR Prediction
- Deep Collaborative Filtering Based on Outer Product
- MsFcNET: Multi-scale Feature-Crossing Attention Network for Multi-field Sparse Data
- Controllable Multi-Interest Framework for Recommendation
- MMCTR: A MULTI-TASK MODEL FOR SHORT VIDEO CTR PREDICTION WITH MULTI-MODAL VIDEO CONTENT FEATURES
- TRUNCATED SVD-BASED FEATURE ENGINEERING FOR SHORT VIDEO UNDERSTANDING AND RECOMMENDATION
- Recommending What Video to Watch Next: A Multitask Ranking System
- Model Ensemble for Click Prediction in Bing Search Ads
- Field-aware Probabilistic Embedding Neural Network for CTR Prediction
- FedCTR: Federated Native Ad CTR Prediction with Multi-Platform User Behavior Data
- AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction
- DeepLight: Deep Lightweight Feature Interactions for Accelerating CTR Predictions in Ad Serving
- MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction
- Memory-efficient Embedding for Recommendations
- DNN2LR: Interpretation-inspired Feature Crossing for Real-world Tabular Data
- TFNet: Multi-Semantic Feature Interaction for CTR Prediction
- DCN-M: Improved Deep & Cross Network for Feature Cross Learning in Web-scale Learning to Rank Systems
- LT4REC: A Lottery Ticket Hypothesis Based Multi-task Practice for Video Recommendation System
- DS-FACTO: Doubly Separable Factorization Machines
- DEEP RELATIONAL FACTORIZATION MACHINES
- xDeepInt: a hybrid architecture for modeling the vector-wise and bit-wise feature interactions
- FIELD-EMBEDDED FACTORIZATION MACHINES FOR CLICK-THROUGH RATE PREDICTION
- Unbiased Ad Click Prediction for Position-aware Advertising Systems
- Compact and Computationally Efficient Representation of Deep Neural Networks
- Dot Product Matrix Compression for Machine Learning
- RaFM: Rank-Aware Factorization Machines
- Neural Input Search for Large Scale Recommendation Models
- A Meta-Learning Perspective on Cold-Start Recommendations for Items
- Automated Embedding Size Search in Deep Recommender Systems
- GMCM: Graph-based Micro-behavior Conversion Model for Post-click Conversion Rate Estimation
- GateNet: Gating-Enhanced Deep Network for Click-Through Rate Prediction
- Res-embedding for Deep Learning Based Click-Through Rate Prediction Modeling
- Task-distribution-aware Meta-learning for Cold-start CTR Prediction
- Ad Recommendation Systems for Life-Time Value Optimization
- Customer Lifetime Value in Video Games Using Deep Learning and Parametric Models
- Automatic Representation for Lifetime Value Recommender Systems
- Customer Lifetime Value Prediction in Non-Contractual Freemium Settings: Chasing High-Value Users Using Deep Neural Networks and SMOTE
- Modeling and Application of Customer Lifetime Value in Online Retail
- Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate
- Predicting Different Types of Conversions with Multi-Task Learning in Online Advertising
- Deep Bayesian Multi-Target Learning for Recommender Systems
- A Causal Perspective to Unbiased Conversion Rate Estimation on Data Missing Not at Random
- MULTI-LOSS WEIGHTING WITH COEFFICIENT OF VARIATIONS
- Multi-Task Learning as Multi-Objective Optimization
- GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
- Efficient Continuous Pareto Exploration in Multi-Task Learning
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
- Learning to Compare: Relation Network for Few-Shot Learning
- An Overview of Multi-Task Learning in Deep Neural Networks
- A Pareto-Efficient Algorithm for Multiple Objective Optimization in E-Commerce Recommendation
- Learning Task Grouping and Overlap in Multi-Task Learning
- Accelerating Matrix Factorization by Overparameterization
- Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations
- Rotograd: Dynamic Gradient Homogenization for Multi-Task Learning (https://arxiv.org/pdf/2103.02631.pdf)
- A Principled Approach for Learning Task Similarity in Multitask Learning
- Probabilistic Lipschitzness (PL) condition
- Modeling Delayed Feedback in Display Advertising
- A Nonparametric Delayed Feedback Model for Conversion Rate Prediction
- A Practical Framework of Conversion Rate Prediction for Online Display Advertising
- * Addressing Delayed Feedback for Continuous Training with Neural Networks in CTR prediction
- Unbiased Learning to Rank with Unbiased Propensity Estimation
- * Dual Learning Algorithm for Delayed Feedback in Display Advertising
- A Feedback Shift Correction in Predicting Conversion Rates under Delayed Feedback
- An Attention-based Model for CVR with Delayed Feedback via Post-Click Calibration
- An overview of gradient descent optimization algorithms
- A Survey of Optimization Methods from a Machine Learning Perspective
- Introduction to Online Convex Optimization (book)
- Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- RmsProp: Overview of mini-batch gradient descent
- ADADELTA: AN ADAPTIVE LEARNING RATE METHOD
- ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION
- DECOUPLED WEIGHT DECAY REGULARIZATION
- Optimizing Neural Networks with Kronecker-factored Approximate Curvature
- Shampoo: Preconditioned Stochastic Tensor Optimization
- Second Order Optimization Made Practical
- Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization
- Ad Click Prediction: a View from the Trenches
- *Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty
- Deep online learning via meta-learning: Continual adaptation for model-based RL
- Online Learning: A Comprehensive Survey
- Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization
- Follow the Moving Leader in Deep Learning
- Online Meta-Learning
- *Lookahead Optimizer: k steps forward, 1 step back
- On the Ineffectiveness of Variance Reduced Optimization for Deep Learning
- Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
- Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning
- *Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
- DC-S3GD: Delay-Compensated Stale-Synchronous SGD for Large-Scale Decentralized Neural Network Training
- The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
- Asynchronous Stochastic Gradient Descent with Delay Compensation
- An Attention-based Model for Conversion Rate Prediction with Delayed Feedback via Post-click Calibration
- A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
- Natasha: Faster Non-Convex Stochastic Optimization Via Strongly Non-Convex Parameter
- Natasha 2: Faster Non-Convex Optimization Than SGD
- Training Neural Networks for and by Interpolation (linear interpolation)
- *Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
- Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
- Follow the Leader: Theory and Applications (ppt)
- Stochastic Gradient Descent as Approximate Bayesian Inference
- Gradient descent with momentum — to accelerate or to super-accelerate?
- Visualizing the Loss Landscape of Neural Nets (for understanding NN loss)
- Adaptive Serverless Learning (decentralized SGD training; may offer some ideas)
- Error Compensated Distributed SGD Can Be Accelerated
- DECOUPLED WEIGHT DECAY REGULARIZATION (some thoughts on weight decay vs. L2 regularization)
- * AdaBelief (https://github.com/juntang-zhuang/Adabelief-Optimizer)
- FIXING WEIGHT DECAY REGULARIZATION IN ADAM
- Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate
- Stochastic Proximal Algorithms for AUC Maximization
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks
- Online AUC maximization
- FAST OPTIMIZATION ALGORITHMS FOR AUC MAXIMIZATION
- Matchbox: Large Scale Online Bayesian Recommendations
- Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine
- Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks
- PBODL: Parallel Bayesian Online Deep Learning for Click-Through Rate Prediction in Tencent Advertising System
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba
- Real-time Personalization using Embeddings for Search Ranking at Airbnb
- CNN features off-the-shelf: an astounding baseline for recognition
- ImageNet Classification with Deep Convolutional Neural Networks
- Deeply learned face representations are sparse, selective, and robust (PCA dimensionality reduction)
- Particular object retrieval with integral max-pooling of CNN activations
- Aggregating Deep Convolutional Features for Image Retrieval (SPoC)
- Deep Supervised Hashing for Fast Image Retrieval (DSH)
- Dimensionality reduction by learning an invariant mapping (Contrastive Loss)
- FaceNet: A Unified Embedding for Face Recognition and Clustering (Triplet Loss)
- Deep metric learning via lifted structured feature embedding (Lifted Structure Loss)
- Learning deep embeddings with histogram loss (Histogram Loss)
- Large-scale image retrieval with attentive deep local features (Spatial-wise Attention)
- Squeeze-and-excitation networks (SENet)
- SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning (SA+CA Attention)
- Online Second Price Auction with Semi-bandit Feedback Under the Non-Stationary Setting
- Smart Targeting: A Relevance-driven and Configurable Targeting Framework for Advertising System
- Optimization Problems for Machine Learning: A Survey
- Correct Normalization Matters: Understanding the Effect of Normalization On Deep Neural Network Models For CTR Prediction
- Why ResNet Works? Residuals Generalize (analysis of why residual networks are effective)
- Visualizing the Loss Landscape of Neural Nets (residual network visualization: https://github.com/tomgoldstein/loss-landscape)
- Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
- Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
- Exploring Generalization in Deep Learning
- Interpreting neural network judgments via minimal, stable, and symbolic corrections
- Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
- [] Convergence Analysis of Two-layer Neural Networks with ReLU Activation
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation (Tencent; transfer learning for cold start)
- TOWARDS FEDERATED LEARNING AT SCALE: SYSTEM DESIGN
- From Federated Learning to Fog Learning: Towards Large-Scale Distributed Machine Learning in Heterogeneous Wireless Networks
- How To Backdoor Federated Learning
- FedDistill: Making Bayesian Model Ensemble Applicable to Federated Learning
- Clustered Federated Learning: Model-Agnostic Distributed Multitask Optimization Under Privacy Constraints
- Deep Neural Networks for YouTube Recommendations
- Latent Cross: Making Use of Context in Recurrent Recommender Systems
- [MIND retrieval]
- Simple and scalable predictive uncertainty estimation using deep ensembles
- Countdown Regression: Sharp and Calibrated Survival Predictions
- Probabilistic Forecasting with Spline Quantile Function RNNs
Replace the CE loss, or combine with it,
- Improving Recommendation Quality in Google Drive
- [Improving Deep Learning For Airbnb Search]
- Learning to Rank using Gradient Descent
- BPR: Bayesian Personalized Ranking from Implicit Feedback