This document summarizes a research paper on scaling laws for neural language models. Some key findings of the paper include:
- Language model performance depends strongly on model scale and only weakly on model shape. With enough compute and data, performance scales as a power law in the number of parameters, the amount of compute, and the dataset size (see the equations after this list).
- Overfitting is universal, with a penalty that depends predictably on the ratio of model size to dataset size.
- Large models are more sample-efficient, reaching the same performance with fewer optimization steps and fewer data points.
- The paper motivated subsequent work by OpenAI on applying scaling laws to other domains like computer vision and developing increasingly large language models like GPT-3.
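Assuming the paper in question is Kaplan et al. (2020), "Scaling Laws for Neural Language Models", the power laws referenced in the first bullet take roughly the following form (constants and exponents are approximate values reported in that paper):

```latex
% Loss as a power law in parameters N, dataset size D, and compute C (approximate exponents):
L(N) = (N_c / N)^{\alpha_N}, \quad \alpha_N \approx 0.076
L(D) = (D_c / D)^{\alpha_D}, \quad \alpha_D \approx 0.095
L(C_{\min}) = (C_c / C_{\min})^{\alpha_C}, \quad \alpha_C \approx 0.050
% Joint law in model size and data, capturing the overfitting penalty in the second bullet:
L(N, D) = \left[ (N_c / N)^{\alpha_N / \alpha_D} + D_c / D \right]^{\alpha_D}
```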
A rambling talk on object detection disguised as an explanation of You Only Look One-level Feature (Yusuke Uchida)
Presentation slides for the 7th All-Japan Computer Vision Study Group "CVPR2021 Reading Session" (Part 1).
https://kantocv.connpass.com/event/216701/
The slides explain You Only Look One-level Feature, along with informal discussion of the YOLO family and a broad overview of related object detection methods.
【ECCV 2016 BNMW】Human Action Recognition without Human (Hirokatsu Kataoka)
Project page:
http://www.hirokatsukataoka.net/research/withouthuman/withouthuman.html
The objective of this paper is to evaluate "human action recognition without human". Motion representation is frequently discussed in human action recognition. We have examined several sophisticated options, such as dense trajectories (DT) and the two-stream convolutional neural network (CNN). However, some features from the background could be too strong, as shown in some recent studies on human action recognition. Therefore, we considered whether a background sequence alone can classify human actions in current large-scale action datasets (e.g., UCF101). In this paper, we propose a novel concept for human action analysis that is named "human action recognition without human". An experiment clearly shows the effect of a background sequence for understanding an action label.
To the best of our knowledge, this is the first study of human action recognition without human. However, we should not have done that kind of thing. The motion representation from a background sequence is effective for classifying videos in a human action database. We demonstrated human action recognition in both the with-human and without-human settings on the UCF101 dataset. The results show that the without-human setting (47.42%) was close to the with-human setting (56.91%). We must accept this reality to realize better motion representation.
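As an illustration only: a minimal sketch of what the "without human" setting could look like in code, assuming a person bounding box is available per frame (from annotations or a detector); the masking protocol, feature extractor, and classifier here are placeholders rather than the authors' exact pipeline.

```python
import numpy as np

def mask_human_region(frame: np.ndarray, bbox: tuple[int, int, int, int]) -> np.ndarray:
    """Suppress the person so only background evidence remains (the "without human" setting).

    frame : H x W x 3 video frame
    bbox  : (x1, y1, x2, y2) person bounding box, assumed to be given per frame
    """
    x1, y1, x2, y2 = bbox
    masked = frame.copy()
    masked[y1:y2, x1:x2] = 0  # black out the human region
    return masked

# "With human": extract motion features (e.g. dense trajectories or a two-stream CNN)
# from the original frames. "Without human": extract the same features from the masked
# frames, train the same classifier, and compare accuracies (56.91% vs. 47.42% on UCF101
# in the reported experiment).
```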
【BMVC2016】Recognition of Transitional Action for Short-Term Action Prediction... (Hirokatsu Kataoka)
Project page
http://www.hirokatsukataoka.net/research/transitionalactionrecognition/transitionalactionrecognition.html
Herein, we address the transitional action class as a class between actions. Transitional actions should be useful for producing short-term action predictions while an action is still in transition. However, transitional action recognition is difficult because actions and transitional actions partially overlap each other. To deal with this issue, we propose a subtle motion descriptor (SMD) that identifies the subtle differences between actions and transitional actions. The two primary contributions of this paper are as follows: (i) defining transitional actions for short-term action prediction, which permits earlier predictions than early action recognition, and (ii) utilizing a convolutional neural network (CNN) based SMD to present a clear distinction between actions and transitional actions. Using three different datasets, we show that our proposed approach produces better results than other state-of-the-art models. The experimental results clearly show the recognition effectiveness of our proposed model, as well as its ability to comprehend temporal motion in transitional actions.
【Paper Introduction】Fashion Style in 128 Floats: Joint Ranking and Classification using Wea... (Hirokatsu Kataoka)
Introductory slides on StyleNet, presented by Edgar Simo-Serra at CVPR2016.
"Fashion Style in 128 Floats: Joint Ranking and Classification using Weak Data for Feature Extraction," Edgar Simo-Serra and Hiroshi Ishikawa, in CVPR2016.
Paper information
http://hi.cs.waseda.ac.jp/~esimo/publications/SimoSerraCVPR2016.pdf
Project page
http://hi.cs.waseda.ac.jp/~esimo/ja/research/stylenet/
【CVPR2016_LAP】Dominant Codewords Selection with Topic Model for Action Recogn... (Hirokatsu Kataoka)
http://www.hirokatsukataoka.net/pdf/cvprw16_kataoka_ddt.pdf
In this paper, we propose a framework for recognizing human activities that uses only in-topic dominant codewords and a mixture of intertopic vectors. Latent Dirichlet allocation (LDA) is used to develop approximations of human motion primitives; these are mid-level representations, and they adaptively integrate dominant vectors when classifying human activities. In LDA topic modeling, action videos (documents) are represented by a bag-of-words (input from a dictionary) based on improved dense trajectories [18]. The output topics correspond to human motion primitives, such as finger motion or subtle leg motion. We eliminate the impurities, such as missed tracking or changing lighting conditions, in each motion primitive. The assembled vector of motion primitives is an improved representation of the action. We demonstrate our method on four different datasets.
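A minimal sketch of the topic-modeling step described above, using scikit-learn's LatentDirichletAllocation on a video-by-codeword count matrix; the matrix contents, the number of topics, and the dominant-codeword cutoff are placeholders, not the paper's actual settings.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# X[i, j] = count of codeword j (from improved dense trajectories) in video i.
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(200, 4000))          # placeholder bag-of-codewords matrix

lda = LatentDirichletAllocation(n_components=30, random_state=0)
doc_topics = lda.fit_transform(X)                   # per-video topic mixture (motion primitives)
topic_codewords = lda.components_                   # per-topic codeword weights

# Keep only the dominant codewords of each topic (dropping impure codewords such as
# mistracked trajectories), then re-encode each video with the cleaned codeword set
# before classification.
top_k = 50
dominant = np.argsort(topic_codewords, axis=1)[:, -top_k:]   # dominant codeword indices per topic
```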
【ISVC2015】Evaluation of Vision-based Human Activity Recognition in Dense Traj... (Hirokatsu Kataoka)
ISVC2015 paper
http://www.hirokatsukataoka.net/pdf/isvc15_kataoka_dt13feature.pdf
Activity recognition has been an active research topic in computer vision. Recently, the most successful approaches use dense trajectories, which extract a large number of trajectories and encode features computed along the trajectories into codewords. In this paper, we evaluate various features in the dense trajectories framework on several types of datasets. We implement 13 features in total, covering five different types of descriptor, namely motion-, shape-, texture-, trajectory- and co-occurrence-based feature descriptors. The experimental results show the relationship between feature descriptors and performance on each dataset. Different scenes of traffic, surgery, daily living and sports are used to analyze the feature characteristics. Moreover, on the fine-grained datasets we test how much the performance of concatenated vectors depends on which descriptors are combined: descriptors of the same type, the top-ranked descriptors from our experiments, or all 13 feature descriptors. Feature evaluation is beneficial not only for the activity recognition problem, but also in other domains of spatio-temporal recognition.
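A minimal sketch of the concatenation test described above, assuming one codeword histogram per video for each descriptor; the descriptor names, feature dimensions, and linear SVM are illustrative assumptions rather than the paper's exact protocol.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=300)

# One bag-of-words encoding per video for each descriptor type (placeholder data).
encodings = {name: rng.random((300, 512)) for name in ("HOG", "HOF", "MBH")}

# Single descriptor vs. a concatenation of several descriptors.
single = cross_val_score(LinearSVC(), encodings["HOG"], labels, cv=3).mean()
concat = np.hstack([encodings[n] for n in ("HOG", "HOF", "MBH")])
combined = cross_val_score(LinearSVC(), concat, labels, cv=3).mean()
print(f"HOG only: {single:.3f}   HOG+HOF+MBH concatenated: {combined:.3f}")
```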
【ITSC2015】Fine-grained Walking Activity Recognition via Driving Recorder Dataset (Hirokatsu Kataoka)
ITSC2015
http://www.itsc2015.org/
The paper presents fine-grained walking activity recognition aimed at inferring pedestrian intention, which is an important topic for predicting and avoiding dangerous pedestrian behavior. Fine-grained activity recognition distinguishes activities that differ only by subtle changes, such as walking in different directions. We believe a change in a pedestrian's activity is a significant cue for grasping pedestrian intention. However, the task is challenging for several reasons: (i) the in-vehicle mounted camera is always moving, (ii) the pedestrian region is too small to capture motion and shape features, and (iii) a change of pedestrian activity (e.g. from walking straight to turning) produces only a small difference in features. To tackle these problems, we apply a vision-based approach to classify pedestrian activities. The dense trajectories (DT) method is employed for high-level recognition to capture detailed differences. Moreover, we additionally extract a detection-based region of interest (ROI) for higher performance in fine-grained activity recognition. We evaluated our proposed approach on a self-collected dataset and a near-miss driving recorder (DR) dataset, covering several activities: crossing, walking straight, turning, standing and riding a bicycle. Our proposal achieved 93.7% on the self-collected NTSEL traffic dataset and 77.9% on the near-miss DR dataset.
Extended Co-occurrence HOG with Dense Trajectories for Fine-grained Activity ... (Hirokatsu Kataoka)
In this paper we propose a novel feature descriptor, Extended Co-occurrence HOG (ECoHOG), and integrate it with dense point trajectories, demonstrating its usefulness in fine-grained activity recognition. The feature is inspired by the original Co-occurrence HOG (CoHOG), which is based on histograms of occurrences of pairs of image gradients in the image. Instead of relying only on pure occurrence counts, we accumulate the sum of gradient magnitudes of co-occurring pairs of image gradients. This gives importance to object boundaries and strengthens the difference between the moving foreground and the static background. We also couple ECoHOG with dense point trajectories extracted using optical flow from video sequences and demonstrate that the combination is extremely well suited for fine-grained activity recognition. Using our feature we outperform state-of-the-art methods on this task and provide an extensive quantitative evaluation.
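A minimal sketch of the magnitude-weighted co-occurrence idea (the core difference between ECoHOG and CoHOG), for a single pixel offset over one grayscale patch; the bin count, offset, and normalization are assumptions, and the full descriptor aggregates many offsets and cells before being combined with dense trajectories.

```python
import numpy as np

def ecohog_like(gray: np.ndarray, n_bins: int = 8, offset: tuple[int, int] = (0, 1)) -> np.ndarray:
    """Co-occurrence histogram of gradient orientations, weighted by gradient magnitude.

    Plain CoHOG counts co-occurring orientation pairs; here each pair instead contributes
    the sum of the two gradient magnitudes, emphasizing strong edges (object boundaries).
    """
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), np.pi)                       # unsigned orientation in [0, pi)
    bins = np.minimum((ori / np.pi * n_bins).astype(int), n_bins - 1)

    dy, dx = offset
    h, w = gray.shape
    hist = np.zeros((n_bins, n_bins))
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            b1, b2 = bins[y, x], bins[y + dy, x + dx]
            hist[b1, b2] += mag[y, x] + mag[y + dy, x + dx]       # magnitude sum, not a count
    return hist / (hist.sum() + 1e-8)                             # L1-normalize the block
```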