1 Introduction

Our civilization is supported by a wide variety of devices, and patent rights protect the inventions behind those devices. Patent rights grant inventors exclusive rights to implement their inventions, as a reward for their contribution to the development of industry. It is therefore important to obtain patent rights for one's inventions, and patent rights are particularly important for companies focused on technology development. According to a survey of patent applications filed with the Japanese Patent Office in FY2017, the number of patent applications is increasing in 11 of 18 industries [1]. These figures suggest the high importance of patent rights.

The pharmaceutical industry has a special relationship with patent rights in two ways. First, the fundamental research and clinical studies for drug development require about 9–17 years, with research and development investments of 20 to 50 billion yen and success probabilities as low as 1 in 20,000 [2]. However, successful development of a new drug can result in huge profits. According to a 2017 IQVIA survey, the ten best-selling domestic drugs had sales of over 60 billion yen per year [3]. Each pharmaceutical patent is therefore extremely important.

Second, pharmaceutical patents require special systems and examinations. Because pharmaceutical inventions require long periods for experiments, manufacturing, and marketing approval, there is a possibility that patent rights cannot be implemented during the twenty-year duration of an awarded patent. Patent durations can therefore be extended by up to five years. The Japanese Patent Office establishes patent examination guidelines for medical fields [4]. In general, it is difficult to determine from the structure and name of a pharmaceutical invention what it does and how to use it. A description of the results of pharmacological experiments must therefore be provided, following the examination guidelines.

Despite the pharmaceutical industry’s special relationship with patent rights, problems can arise when filing a patent. The Japanese Patent Act follows a first-to-file principle, so important patent applications must be filed earlier than other inventors. However, inventors will not be awarded a patent if the description of the invention in the application is insufficient. In the pharmaceutical industry in particular, the time and money required for R&D and pharmaceutical preparations call for early patent applications. An evaluation index for descriptions of the invention in a patent application is necessary to solve this problem.

We propose construction of a prediction model for pharmaceutical patentability using a nonlinear support vector machine (SVM) to improve prediction accuracy over that of a model proposed in previous research, described below. Specifically, we extract description features and correct labels from the Japanese Unexamined Patent Application Publication (2006) for pharmaceutical preparations, and input them into the SVM.

2 Previous Research

2.1 Patentability

It is difficult to immediately judge whether applicant descriptions meet criteria such as novelty and inventive step. Therefore, Hido et al. [5] introduced “patentability” as an evaluation criterion for patents. Patentability describes the likelihood of a patent being awarded, with higher scores suggesting an increased likelihood that patent description requirements are met.

2.2 Target Data

To construct their patentability prediction model, Hido et al. created correctly labeled data from the Japanese Unexamined Patent Application Publication. These data were extracted from about 300,000 patent applications filed between 1989 and 1998, and show whether a patent was actually granted or denied for each application. From these labeled data, they constructed prediction models using the statistical characteristics of the description, word age, syntactic complexity, and term frequency–inverse document frequency (TF–IDF). These features are calculated for each description, and the resulting values are input to the model as an explanatory variable vector. Figure 1 shows an overview of their model construction. Hido et al. varied the explanatory variables across prediction models and conducted accuracy evaluation experiments. The following four prediction models were prepared.

  • Model 1: Statistical characteristics of the description

  • Model 2: Statistical characteristics of the description + word age

  • Model 3: Statistical characteristics of the description + word age + syntax complexity

  • Model 4: Statistical characteristics of the description + word age + syntax complexity + TF–IDF

Experiments were carried out by ten-fold cross validation using the area under the curve (AUC) value as the model evaluation value. In these experiments, the evaluation value for model 1 was 0.594. The highest accuracy was obtained from model 4 with all features added, which produced an evaluation value of 0.607.
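The previous study's evaluation protocol can be sketched with scikit-learn. The synthetic data below merely stands in for the original patent features, which are not reproduced here; only the ten-fold cross validation scored by AUC follows the source.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the description features; the original
# patent data are not reproduced here.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Ten-fold cross validation scored by AUC, as in the previous study.
model = LogisticRegression(max_iter=1000)
auc_scores = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
print(f"mean AUC: {auc_scores.mean():.3f}")
```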

Fig. 1. Model 1 (previous research)

2.3 Limitations of the Previous Research

Hido et al.’s research established a patentability prediction model using logistic regression for all patents from 1989 to 1998. The previous research had two problems.

The first was an inability to construct a patentability prediction model applicable to all patents. Their models were divided into eight types, A–H, according to the International Patent Classification (IPC) item in the description. Because the word information, word age, and effective statistical characteristics of the description varied greatly for each IPC class, the target data had to be limited.

The second problem is that their patentability prediction model used logistic regression, which is poorly suited to separating nonlinear problems. Because logistic regression cannot nonlinearly discriminate among features, prediction accuracy tends to decrease. The present research aims to solve these two problems.

3 Proposed Model

3.1 Features

We constructed a prediction model for pharmaceutical patentability using statistical characteristics of the description, which is an explanatory variable derived from structure and sentences in the description. We used the explanatory variable set used by Hido et al. and Nagata et al., along with the number of examples of pharmacological examinations related to the examination guidelines for medical fields. Table 1 shows the features used in this study.

Table 1. Statistical characteristics for the description

3.2 Correct Label

We extracted correct patentability labels from the examination data for each pharmaceutical patent description. Table 2 shows the correct patentability labels. For the logistic regression model, 1 indicates an awarded patent and 0 a declined one. For the SVM, 1 indicates an awarded patent and \(-1\) a declined one.

3.3 Scaling

Features are scaled to mean 0 and standard deviation 1. This scaling prevents features with large magnitudes from dominating, and thereby losing information in, the inner products computed by the kernel function.
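This standardization can be sketched in a few lines (equivalent to scikit-learn's StandardScaler); the feature matrix below is purely illustrative.

```python
import numpy as np

# Hypothetical feature matrix: rows are descriptions, columns are features.
X = np.array([[3.0, 200.0],
              [5.0, 120.0],
              [4.0, 160.0]])

# Standardize each feature to mean 0 and standard deviation 1
# (equivalent to scikit-learn's StandardScaler).
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```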

Table 2. Correct label

3.4 Under-Sampling

Under-sampling was applied because the obtained data were biased with respect to the correct labels. Under-sampling randomly extracts records from the majority class to match the number in the minority class. In this study, we randomly extracted 1,834 awarded-patent records from among 3,492, matching the 1,834 records for which no patent was awarded, for a total of 3,668 records.
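The sampling step can be sketched as follows, using the counts reported above; the index arrays are stand-ins for the actual patent records.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical record indices: 3,492 awarded patents (majority class)
# and 1,834 declined patents (minority class).
awarded_idx = np.arange(3492)
declined_idx = np.arange(3492, 3492 + 1834)

# Randomly draw from the majority class to match the minority class size.
sampled_awarded = rng.choice(awarded_idx, size=len(declined_idx), replace=False)
balanced_idx = np.concatenate([sampled_awarded, declined_idx])

print(len(balanced_idx))  # 3668
```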

3.5 Overview of SVM

We constructed the prediction model using an SVM. An SVM is a two-class pattern classifier whose parameters are learned by margin maximization over a training sample. Margin maximization determines the hyperplane whose Euclidean distance to the training points nearest it (the support vectors) is maximal. This Euclidean distance is called the “margin,” represented as \( {||w||}^{-1}\). Maximizing the margin minimizes the discrimination error for unseen data. To find the optimal hyperplane with the largest margin, we calculate the parameters that minimize a cost function (Eq. (1)) under a constraint condition (Eq. (2)). In this study, we used a soft margin and the kernel trick, which we describe below.

$$\begin{aligned} \mathrm{min}:\frac{1}{2}||w||^2 \end{aligned}$$
(1)
$$\begin{aligned} \mathrm{sub.to}:t_i(w^Tx_i+b)\ge 1 \end{aligned}$$
(2)
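As an illustration of the margin \( {||w||}^{-1}\), the toy example below fits a linear SVM with a very large C, which approximates the hard-margin problem of Eqs. (1)–(2), on four hypothetical separable points.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable toy classes.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates the hard-margin problem of Eqs. (1)-(2).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
margin = 1.0 / np.linalg.norm(w)  # the margin ||w||^-1
print(margin)  # the optimal hyperplane here is x1 = 1, giving margin 1.0
```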

3.6 Soft Margin

Soft-margin SVM relaxes the constraints to allow some error. Relaxing the constraint conditions permits acceptable results even when the margin is not maximal, so the method can be applied even when the data cannot be perfectly separated. To find the optimal soft-margin hyperplane, we calculate the parameter values that minimize the cost function (Eq. (3)) under a constraint condition (Eq. (4)).

$$\begin{aligned} \mathrm{min}:\frac{1}{2}||w||^2+C\sum _{i=1}^N \xi _i \end{aligned}$$
(3)
$$\begin{aligned} \mathrm{sub.to}:t_i(w^Tx_i+b)\ge 1-\xi _i, \xi _i\ge 0 \end{aligned}$$
(4)

Parameter C is a constant that balances the size of the margin against the degree to which training points may violate it.
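The role of C can be illustrated with scikit-learn on synthetic data: a small C tolerates larger slack values \(\xi _i\), which typically leaves more training points as support vectors.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-dimensional data with 10% label noise.
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, random_state=0)

# A smaller C tolerates larger slack values xi_i, which typically
# leaves more training points as support vectors.
loose = SVC(kernel="linear", C=0.01).fit(X, y)
tight = SVC(kernel="linear", C=100.0).fit(X, y)

print(len(loose.support_), len(tight.support_))
```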

3.7 Kernel Trick

The kernel trick is a method of linearly discriminating feature vectors after nonlinear transformation to deal with nonlinearity problems. In ordinary SVM, it is not always possible to construct a classifier with high performance in intrinsically nonlinear, complex discrimination problems. With the kernel trick, however, it is possible to construct an optimum discrimination function using kernel calculations only, by mapping to a higher dimension. To find the optimal discrimination function, we maximize a cost function (Eq. (5)) under a constraint condition (Eq. (6)).

$$\begin{aligned} \mathrm{max}:\sum _{i=1}^N\alpha _i-\frac{1}{2}\sum _{i,j=1}^N \alpha _i\alpha _j t_i t_j \mathrm{K}(x_i,x_j) \end{aligned}$$
(5)
$$\begin{aligned} \mathrm{sub.to}:0\le \alpha _i\le C,\sum _{i=1}^N t_i\alpha _i = 0 \end{aligned}$$
(6)

The K in the second term is called the “kernel.” In this study, we used the radial basis function (RBF) kernel (Eq. (7)).

$$\begin{aligned} \mathrm{K}(x_i,x_j)={\exp }(-\gamma ||x_i-x_j||^2) \end{aligned}$$
(7)
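Eq. (7) can be checked directly against scikit-learn's implementation; the two points are hypothetical, and the \(\gamma \) value is the best value found in Sect. 5.1.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x_i = np.array([[1.0, 2.0]])
x_j = np.array([[1.2, 1.9]])
gamma = 7.6  # the best value found in Sect. 5.1

# Direct evaluation of Eq. (7).
k_manual = np.exp(-gamma * np.sum((x_i - x_j) ** 2))

# The same kernel value via scikit-learn.
k_sklearn = rbf_kernel(x_i, x_j, gamma=gamma)[0, 0]

print(k_manual, k_sklearn)
```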

3.8 Study Model

Figure 2 shows an overview of the model construction. We divided the statistical characteristics of the description and the correct labels into training data and test data, and input these into the SVM. Furthermore, we adjusted the C and \(\gamma \) parameters in Eqs. (6) and (7) to construct the optimal hyperplane. This resulted in a prediction model for pharmaceutical patentability that allows nonlinear discrimination.
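A minimal sketch of this pipeline, assuming scikit-learn and synthetic stand-in features (the actual patent features are not reproduced here), with the C and \(\gamma \) values taken from Sect. 5.1:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 3,668 balanced records and their features.
X, y = make_classification(n_samples=3668, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Scale, then fit an RBF-kernel SVM with the parameters from Sect. 5.1.
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", C=2.0, gamma=7.6)
clf.fit(scaler.transform(X_train), y_train)

scores = clf.decision_function(scaler.transform(X_test))
auc = roc_auc_score(y_test, scores)
print(f"AUC: {auc:.3f}")
```

Note that these parameter values were tuned on the paper's patent data; on other data they would need to be searched again.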

Fig. 2. Study model

4 Overview of the Experiment

4.1 Target Data

In this study, we collected 14 types of patents related to pharmaceutical preparations. We used 5,326 Unexamined Patent Application Publications published between January 1 and December 31, 2006, along with data describing whether these applications resulted in an awarded patent. After under-sampling, 3,668 of these Unexamined Patent Application Publications remained. We used 2006 data because the eighth edition of the IPC used in Japan was issued in 2006, and data associated with that edition are most numerous.

4.2 Evaluation

We used AUC to evaluate accuracy. AUC is defined as the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate (TPR) against the false positive rate (FPR). This value ranges from 0 to 1, with proximity to 1 indicating increased accuracy; a classifier with no discriminative ability yields a value of 0.5. TPR is the ratio of values judged as positive when the true class is positive. FPR is the ratio of values incorrectly judged as positive despite the true class being negative.
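A small worked example, assuming hypothetical labels and scores: the AUC reported by scikit-learn equals the trapezoidal area under the ROC curve.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted scores.
y_true = np.array([1, 1, 0, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.3, 0.2])

fpr, tpr, _ = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

# The AUC equals the trapezoidal area under the (FPR, TPR) curve.
auc_manual = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
print(auc, auc_manual)  # both 8/9 = 0.888...
```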

Table 3. Best parameters and AUC

Fig. 3. Parameters

5 Results and Discussion

5.1 Search for Parameters

To find the optimal parameters, we searched parameter C at 0.1 intervals in the range \(1 \le C \le 10\) and parameter \(\gamma \) at 0.01 intervals in the range \(1 \le \gamma \le 10\). Table 3 and Fig. 3 show the experimental results. Table 3 shows that the best hyperplane was constructed when \(C = 2.0\) and \(\gamma = 7.6\). Figure 3 shows that smaller values of C are more strongly affected by \(\gamma \), and that discrimination is possible when appropriate support vectors are used.
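This parameter search can be sketched with scikit-learn's GridSearchCV; the grid below is much coarser than the paper's 0.1 and 0.01 steps to keep the example fast, and the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# A coarser grid than the paper's (0.1 steps for C, 0.01 steps for gamma),
# to keep this sketch fast.
param_grid = {"C": np.arange(1.0, 11.0, 1.0),
              "gamma": np.arange(1.0, 11.0, 1.0)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_)
```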

5.2 Comparison of Prediction Models

Table 4 and Fig. 4 show the experimental results when using SVM. As the figure shows, the AUC value for logistic regression was 0.566, and the AUC value for SVM was 0.701. The SVM model thus improved accuracy by 0.107 compared with the 0.594 result of Hido et al. The AUC value improved because the logistic regression model linearly discriminates in the feature space as it is, whereas the SVM model achieves linear separation by nonlinearly mapping the features into a higher-dimensional space. An SVM capable of nonlinear discrimination is therefore effective in a prediction model for pharmaceutical patentability. Our logistic regression result was 0.028 lower than that of Hido et al., so narrowing the data down to pharmaceutical patents cannot by itself be considered effective for the prediction model. However, while 5,326 records were used in this study, the previous study used about 300,000. With a larger amount of data, narrowing the data to pharmaceutical patents may prove effective.

Table 4. AUC

Fig. 4. ROC curve for SVM and logistic regression

6 Conclusion

The Japanese Patent Act follows a first-to-file principle, so important patent applications must be filed before those of other inventors. However, inventors will not be awarded a patent if the description of the invention in the application is insufficient. To solve this problem, previous research studied a patentability prediction model using logistic regression. However, that model used linear discrimination, so its accuracy was not high. In this study, to increase prediction accuracy, we used a nonlinear SVM in a predictive model of patentability. In SVM model evaluation experiments, we confirmed the effectiveness of a nonlinear SVM model for constructing a prediction model for pharmaceutical patentability. The following summarizes the experiments.

In the experiments, we used a nonlinear SVM and logistic regression to construct prediction models for pharmaceutical patentability. The experiments showed that the SVM model was more effective than the linear model: the AUC of the SVM model using an RBF kernel was 0.701, which is 0.107 higher than that of the linear model in the previous study. A nonlinear SVM model is thus more effective for predicting pharmaceutical patentability than the logistic regression model.

In future work, to improve accuracy, we must increase the amount of data, add features, and further verify the model.