1430 IEICE TRANS. INF. & SYST., VOL.E101–D, NO.5 MAY 2018

PAPER

Graph-Based Video Search Reranking with Local and Global Consistency Analysis

Soh YOSHIDA†a), Takahiro OGAWA††b), Miki HASEYAMA††c), Members, and Mitsuji MUNEYASU†d), Senior Member

Manuscript received September 1, 2017.
Manuscript revised December 20, 2017.
Manuscript publicized January 30, 2018.
†The authors are with Kansai University, Suita-shi, 564–8680 Japan.
††The authors are with Hokkaido University, Sapporo-shi, 060–0814 Japan.
a) E-mail: sohy@kansai-u.ac.jp
b) E-mail: ogawa@lmd.ist.hokudai.ac.jp
c) E-mail: miki@ist.hokudai.ac.jp
d) E-mail: muneyasu@kansai-u.ac.jp
DOI: 10.1587/transinf.2017EDP7277

SUMMARY Video reranking is an effective way of improving the retrieval performance of text-based video search engines. This paper proposes a graph-based Web video search reranking method with local and global consistency analysis. Generally, the graph-based reranking approach constructs a graph whose nodes and edges respectively correspond to videos and their pairwise similarities. Many reranking methods are built on a scheme that regularizes the smoothness of pairwise relevance scores between adjacent nodes with regard to a user’s query. However, since the overall consistency is measured by aggregating only the local consistency over each pair, errors in score estimation increase when noisy samples are included among the neighbors of query-relevant videos. To deal with the noisy samples, the proposed method leverages the global consistency of the graph structure, which distinguishes it from conventional methods. Specifically, to detect this consistency, the proposed method introduces a spectral clustering algorithm that detects, on the graph, groups of videos with strong semantic correlation. Furthermore, a new regularization term, which smooths ranking scores within the same group, is introduced into the reranking framework. Since the score regularization is performed from both local and global aspects simultaneously, accurate score estimation becomes feasible. Experimental results obtained by applying the proposed method to a real-world video collection show its effectiveness.
key words: video search reranking, graph learning, graph consistency analysis, spectral clustering

1. Introduction

With the explosive growth of social media, a great number of videos are being generated and shared on the Internet. For example, YouTube has over a billion users, and people watch hundreds of millions of hours every day∗. Thus, many techniques have been developed for multimedia search. Owing to the success of information retrieval businesses such as Google, Bing, and Yahoo!, most search engines employ text-based techniques that use nonvisual information associated with visual content, such as surrounding text and user-provided tags. However, since textual information is sometimes noisy or unavailable, the inconsistency between textual features and visual contents can cause poor image/video search results [1], [2].

To improve text-based search performance and overcome the semantic gap between text information and video contents, visual search reranking has been the focus of attention in recent years [3]–[9]. This technique adjusts the initial ranking order by mining visual content or leveraging auxiliary knowledge. Most reranking methods have been developed on the basis of the following three assumptions: (1) visual contents with dominant patterns are expected to be ranked higher than others, (2) visual contents with similar visual appearance are to be ranked closely, and (3) top-ranked contents in the initial search results are expected to be ranked relatively higher than the others. Under these assumptions, visual information is introduced to refine the initial search result.
∗https://www.youtube.com/yt/press/statistics.html
Copyright © 2018 The Institute of Electronics, Information and Communication Engineers

Many reranking methods are formulated as finding the optimal ranked list from the perspective of Bayesian theory [10], [11] or manifold discovery [12], [13]. These approaches assume that relevant multimedia documents such as images and videos lie on a manifold in the visual feature space, and reranking is accomplished by graph-based learning methods. We therefore call this approach graph-based reranking. Generally, the approach constructs a graph in which the nodes are multimedia documents and the edges reflect their pairwise similarities. The initial relevance of each document can be viewed as the stationary probability of each node and can be propagated to other similar nodes until a convergence condition is satisfied. This graph representation of search results can be integrated into a regularization framework by considering the following two terms: a graph regularizer that keeps the ranking positions of visually similar documents close, and a loss term ensuring that the reranked results do not change too much from the initial ranking list.

Although many different methods have been proposed, the visual consistency between similar video contents is not always guaranteed owing to the complexity of real-world video contents. In several cases, search performance may even be degraded after reranking. This is because most graph-based methods measure visual consistency pairwise: the overall consistency is measured by aggregating the local consistency over each pair. Thus, errors in score estimation increase when noisy samples are included in the pairs. To solve this problem, we introduce the idea of social network analysis.
Specifically, community detection methods have attracted great research interest in the past years [14]. A community consists of a group of nodes that are densely connected to each other but sparsely connected to other dense groups. Since a community structure in a network usually reveals a common topic or interest, consistency over the area covered by one community indicates a video group whose videos are strongly correlated with their neighbors. We call this global consistency. However, this kind of consistency analysis has not been considered for improving the performance of video search reranking. Therefore, it is desirable to develop a novel algorithm that regularizes graph consistency based on both local and global aspects simultaneously.

In this paper, we propose a novel graph-based reranking method with local and global consistency analysis. We adopt the following two procedures: (A) detection of the global consistency over the graph and (B) modeling of the graph-based reranking considering both local and global consistency.

First, in (A), we detect the global consistency by applying a spectral clustering algorithm [15] to the constructed graph. Given a similarity graph, a spectral clustering algorithm partitions the set of its nodes into clusters. The partition aims to satisfy the following properties: nodes in different clusters are dissimilar to each other, which minimizes the between-cluster similarities, and nodes in the same cluster are similar to each other, which maximizes the within-cluster similarities. From the clustering result, we extract center nodes, i.e., representative nodes of each cluster. Then we define a new affinity matrix representing the similarity between center nodes and the similarity between nodes within the same video group.

In (B), we model reranking using the graph and the affinity matrix that reflects global consistency over the graph.
Our reranking model is built on a Bayesian formulation [10] and its multimodal expansion [9]. In this paper, we introduce a new graph regularizer that smooths the ranking scores within the same video group obtained by the procedure in (A). For a video, instead of calculating the consistency with each of its neighbors individually, the proposed regularizer considers the consistency with all of the videos in the same group simultaneously. By using this term together with the previous regularization framework, the proposed method can suppress the influence of noisy videos. Furthermore, it is difficult to assign appropriate parameters to the two types of affinity matrices. To integrate the two aspects, we introduce a graph-learning approach and tune these parameters automatically. Finally, by minimizing an objective function consisting of three terms, i.e., the local and global graph regularizer terms and a loss term, the desired consistency over the graph is obtained. Therefore, performance improvement by the graph-based reranking becomes feasible.

The contributions of this work are summarized as follows:
1) We propose a graph global consistency detection approach for video search reranking. This enables integration of global consistency analysis into a graph-based regularization framework.
2) The proposed method simultaneously regularizes the smoothness of the ranking scores between not only adjacent nodes but also nodes in the same video group. This approach suppresses the influence of score propagation from noisy videos.

This paper is an extended version of [16]. In this paper, the following three aspects are enhanced.
1) To improve the robustness of the algorithm for obtaining the affinity matrix of each aspect, we introduce the graph-based learning approach into our method. With this approach, the tuning parameters that determine the scale of the affinity matrix are learned automatically.
2) We supplement the discussion of the parameters that are set manually.
3) We collect a Web video search dataset using 15 queries and study the effectiveness of the proposed method by comparing it with various conventional graph-based reranking methods.

The remainder of this paper is organized as follows. In Sect. 2, we review related work on visual search reranking for image and video retrieval. Section 3 presents the proposed method, which retrieves videos using a graph-based reranking framework with local and global consistency analysis. Section 4 provides experimental results that verify the performance of the proposed method. Finally, Sect. 5 presents concluding remarks.

2. Related Work

2.1 Visual Search Reranking

Visual search reranking has been widely investigated for improving the search performance for images, videos, and other multimedia documents. Existing visual search reranking methods can be mainly classified into two categories according to whether query examples are available: example-based reranking and self-reranking.

Methods in the first category need several examples in addition to a text query. Yan et al. [3] regard the query examples as relevant samples and several bottom-ranked results in a ranking list as irrelevant ones. A Support Vector Machine (SVM) model is then learned from these samples to rerank the search results. Natsev et al. [4] improve the robustness of this example-based approach with a bagging strategy. They collect multiple irrelevant sample sets and generate different ranking lists accordingly. These ranking lists are aggregated to generate the final reranked result. Liu et al. [5] use the query examples to discover the relevant and irrelevant concepts for a given query and identify an optimal set of document pairs by using information theory. A ranking list is then directly recovered from this pair set.
These methods can improve search performance if good visual examples are provided. However, they cannot be used when no visual example is available.

For the second category, the self-reranking approach does not rely on query examples. It aims to improve text-based search by mining the visual information of images or videos. In many cases, we can assume that the top-ranked documents are the few “relevant” (called pseudo-relevant) documents that can be viewed as “positive”. This is in contrast to relevance feedback, where users explicitly provide feedback by labeling the results as positive or negative. Kennedy et al. [6] regard top-ranked and bottom-ranked results in a ranking list as pseudo-relevant and irrelevant samples, respectively, to discover the related concepts. The detection results of the related concepts are then used as high-level features in an SVM to build classifiers for reranking. Hsu et al. [17] formulate the reranking process as a random walk over a context graph, where videos are nodes and the edges between them are weighted by multimodal similarities. Jing et al. [8] apply PageRank [18] to product image search and design the VisualRank algorithm for reranking. After a similarity-based image link graph is generated, an iterative computation similar to PageRank is used to rerank the images. Yang et al. [19] extract multiple features from each image and collect a training set that contains several queries and labeled search results. Reranking is then regarded as a supervised learning task. Tian et al. [10] model the textual and visual information from a probabilistic perspective and formulate visual reranking as an optimization problem in the Bayesian framework, named Bayesian visual reranking.
This method encodes the assumptions that the reranked results do not change much from the initial ranking list and that the ranking positions of visually similar images are close.

However, its fundamental deficiency lies in noise: it is not guaranteed that irrelevant instances are always kept out of the top returns, and in many cases they push true positives away after reranking. In this work, to perform robust visual reranking in this kind of situation, we investigate video search reranking with local and global consistency analysis based on a community detection approach. By learning the adaptive similarity weights of each aspect, we will show that our approach can effectively integrate the two aspects to boost ranking performance.

2.2 Graph-Based Learning

Graph-based learning has been introduced into visual reranking in recent years. One major advantage of graph-based learning is that it encodes the data structure into the data similarity measurement to refine inference and modeling. In these methods, a graph is constructed from the given data, where nodes and edges respectively correspond to samples and their pairwise similarities. They are usually formulated as a regularization scheme with two terms. One term enforces the function to be smooth on the graph, and the other keeps the function consistent with prior information such as the labels of several samples. The algorithms can be carried out by a random walk process.

He et al. [12] adopt a graph-based method named manifold-ranking in image retrieval. Wang et al. [9] developed a multi-graph learning approach to fuse multiple feature channels based on semi-supervised learning. In [20], multiple graphs from different retrieval methods are fused by summing up the edge weights, and then a graph alignment is conducted to build an overall similarity graph.
In [8], [10], [21], the initial ranking list is refined on the graph by propagating the ranking scores through the edges.

Unfortunately, the regularization term used in these methods measures the graph consistency pairwise. Specifically, the overall consistency is measured by aggregating the local consistency over each pair. However, the consistency on the graph is multiple-wise rather than pairwise, since it is defined over all neighboring samples. Therefore, the consistency approximated through pairwise regularizers is not satisfactory. Our method is inspired by [9], [14]. Our approach first detects the global consistency of the overall graph. By using the multimodal graph learning method, we then fuse the two types of graphs and estimate an optimal relevance score with regard to the user’s query.

3. Graph-Based Video Search Reranking with Consistency Analysis

In this section, we describe the proposed reranking approach. We first introduce existing graph-based reranking methods within a general regularization scheme. We then present our approach, including consistency analysis and a new graph regularization. For clarity, the notation and definitions used throughout this paper are summarized in Table 1.

3.1 Graph-Based Reranking with Local Regularizer

We first follow [10] to define several terms in reranking. Let $\bar{\mathbf{r}} = [\bar{r}_1, \bar{r}_2, \ldots, \bar{r}_N]^T$ and $\mathbf{r} = [r_1, r_2, \ldots, r_N]^T$ denote the vectors of the initial ranking scores and the relevance scores, respectively, which correspond to the video set $X = \{x_1, x_2, \ldots, x_N\}$. Here, $\bar{r}_i$ is the initial ranking score of $x_i$, calculated from its ranking position in the keyword search, and $r_i$ is its relevance score with regard to the user’s query. We also use $\mathbf{x}_i$ to denote the feature vector of $x_i$. In this paper, three kinds of visual features and one kind of audio feature are adopted (described in Sect. 4.1).

Table 1 Notation table.
| Notation | Definition |
| $X$, $x_i$ | The video set and the $i$th video in a ranking list. |
| $\bar{\mathbf{r}}$, $\bar{r}_i$ | The vector of the initial ranking scores and the score of $x_i$. |
| $\mathbf{r}$, $r_i$ | The vector of the relevance scores and the score of $x_i$. |
| $L$, $G$ | Indicators for local and global aspects. |
| $\mathbf{W}_\bullet$ | The affinity matrix of videos. |
| $\mathbf{A}_\bullet$ | The transformation matrix including the affinity matrix. |
| $\mathbf{L}_\bullet$, $\tilde{\mathbf{L}}_\bullet$ | The graph Laplacian and the normalized graph Laplacian derived from $\mathbf{W}_\bullet$. |
| $\mathbf{D}_\bullet$ | The degree matrix derived from $\mathbf{W}_\bullet$. |
| $O$ | The centroids of spectral clustering. |
| $C$ | The node set which corresponds to each centroid. |
| $\alpha_\bullet$, $\rho$ | Tuning parameters. |
| $N$ | The number of videos. |
| $K$ | The number of clusters for spectral clustering. |
| $T$, $T_1$ | The numbers of iterations in the alternating optimization. |

Generally, graph-based reranking can be formulated as a regularization framework. The objective function is then defined as

$$\arg\min_{\mathbf{r}} Q(\mathbf{r}) = R(\mathbf{r}, \mathbf{W}) + \rho L(\mathbf{r}, \bar{\mathbf{r}}), \tag{1}$$

where the first part is a regularization term that makes the ranking scores of visually similar videos close, the second part is a loss term that estimates the difference between $\mathbf{r}$ and $\bar{\mathbf{r}}$, and $\rho$ is a trade-off parameter. For the term $R(\mathbf{r}, \mathbf{W})$, a graph $G$ is constructed whose nodes are the videos and whose edges link similar videos; the graph Laplacian [22] and the normalized graph Laplacian [23] are widely used as regularizers. When constructing the graph $G$, each video is connected with its $k$-nearest neighbors [10]. $\mathbf{W}$ is an affinity matrix in which $W_{ij}$ indicates the visual similarity between $x_i$ and $x_j$.

In this paper, we use $\mathbf{W}^L$ and $\mathbf{W}^G$ as the affinity matrices for the local and global aspects, respectively. For the local aspect, if two videos $x_i$ and $x_j$ are connected by an edge, the similarity $W^L_{ij}$ is calculated with the Gaussian kernel with the scaling parameter $\sigma_L$. Otherwise, the two videos are not connected and $W^L_{ij} = 0$. We define the affinity matrix $\mathbf{W}^L \in \mathbb{R}^{N \times N}$ by taking $W^L_{ij}$ as its $(i, j)$th element.
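The construction of the local affinity matrix described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `knn_gaussian_affinity` is hypothetical, the kernel is taken in the form $\exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2/\sigma^2)$ used later in Sect. 3.3, the default $\sigma$ is the median pairwise distance (the initialization stated in Sect. 3.3), and symmetrizing the $k$-nearest-neighbor relation by taking the element-wise maximum is an assumption, since the text does not say how asymmetric neighbor pairs are handled.

```python
import numpy as np


def knn_gaussian_affinity(X, k=5, sigma=None):
    """Symmetric k-nearest-neighbor affinity matrix with a Gaussian kernel.

    X     : (N, d) array, one feature vector per video.
    k     : neighborhood size.
    sigma : kernel scale; defaults to the median pairwise Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)  # a node cannot have more than n - 1 neighbors
    # Pairwise squared Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    if sigma is None:
        sigma = np.median(np.sqrt(sq_dist[np.triu_indices(n, k=1)]))
    # Put +inf on the diagonal so that a node is never its own neighbor.
    masked = sq_dist + np.diag(np.full(n, np.inf))
    neighbors = np.argsort(masked, axis=1)[:, :k]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    cols = neighbors.ravel()
    W[rows, cols] = np.exp(-sq_dist[rows, cols] / sigma ** 2)
    # Assumption: keep an edge if either endpoint lists the other as a neighbor.
    return np.maximum(W, W.T)
```

Entries of unconnected pairs stay exactly zero, matching $W^L_{ij} = 0$ in the text. The dense pairwise-distance array keeps the sketch short but costs $O(N^2 d)$ memory, so it is only suitable for result lists of a few hundred videos.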
By minimizing the objective function $Q(\mathbf{r})$, the optimal ranking score list $\mathbf{r}^*$ can be derived as $\mathbf{r}^* = \arg\min_{\mathbf{r}} Q(\mathbf{r})$ using the local regularizer $R(\mathbf{r}, \mathbf{W}^L)$.

3.2 Global Consistency Detection

This subsection shows how to detect global consistency by using a spectral clustering algorithm [15]. In this paper, global consistency means that videos in the same video group structure, typically referred to as a cluster, are likely to have a high similarity. Since this structure in the graph usually reveals a common topic or interest, consistency over a local area within the same graph means that each sample has a strong correlation with its neighbors. Thus, if a sample’s score can be deduced precisely from its neighbors, the sample is regarded as locally consistent.

Spectral clustering unveils the video group structure by exploiting the eigen-structure of the graph Laplacian matrix $\mathbf{L}_L$, where $\mathbf{L}_L = \mathbf{D}_L - \mathbf{W}^L$ and $\mathbf{D}_L$ is a diagonal matrix whose $(i, i)$th element is the sum of the $i$th row of $\mathbf{W}^L$. Let $\mathbf{U}$ consist of the unit-length eigenvectors associated with the $K$ smallest eigenvalues of $\mathbf{L}_L$, namely $\mathbf{U} = \{\mathbf{u}_1, \ldots, \mathbf{u}_K\}$, which is a $K$-dimensional embedding of the graph. The information of each node is therefore captured by a point in $\mathbb{R}^K$. To discover the video group structure, $k$-means clustering is applied to the rows of $\mathbf{U}$ and returns the video group labels $\mathbf{z} = \{z_1, \ldots, z_N\}$, $z_i \in \{1, \ldots, K\}$, and $K$ centroids $O = \{\mu_1, \ldots, \mu_K\}$. Then we detect the nodes $C = \{c_1, \ldots, c_K\}$ that correspond to the centroids $O$; these are called center nodes. The spectral clustering algorithm is given in Algorithm 1, whose inputs are the affinity matrix $\mathbf{W}^L$ and the pre-specified number of groups $K$. Its outputs are the estimated labels $\mathbf{z}$ and the center nodes $C$.

Algorithm 1 Global consistency detection using a spectral clustering algorithm
Input: The affinity matrix $\mathbf{W}^L$ of the video graph $G$ and $K$
Output: Label set $\mathbf{z}$ and center nodes $C$
procedure GlobalConsistencyDetection($\mathbf{W}^L$, $K$)
    $d^L_{ii} \leftarrow \sum_{j=1}^{N} W^L_{ij}$
    $\mathbf{D}_L \leftarrow \mathrm{diag}\{d^L_{11}, \ldots, d^L_{NN}\}$
    $\mathbf{L}_L \leftarrow \mathbf{D}_L - \mathbf{W}^L$
    $\{\mathbf{u}_1, \ldots, \mathbf{u}_K\} \leftarrow$ unit-length eigenvectors of $\mathbf{L}_L$ associated with the $K$ smallest eigenvalues of $\mathbf{L}_L$
    $\mathbf{U} \leftarrow \{\mathbf{u}_1, \ldots, \mathbf{u}_K\}$
    Cluster labels for all nodes and centroids of $K$ groups $(\mathbf{z}, O) \leftarrow$ results of $k$-means clustering on the rows of $\mathbf{U}$ with $K$ centres
    $\{c_1, \ldots, c_K\} \leftarrow$ nodes corresponding to each centroid $O = \{\mu_1, \ldots, \mu_K\}$
    $C \leftarrow \{c_1, \ldots, c_K\}$
    return $(\mathbf{z}, C)$
end procedure

Fig. 1 Center node detection and similarity definition based on shortest path problem.

The goal of our reranking is to regularize the smoothness of the ranking scores between not only adjacent nodes but also nodes in the same video group simultaneously. Therefore, we define a new weight $W^G_{ij}$, which represents the similarity between each node and the center node of its video group. As shown in Fig. 1, if two videos $x_i$ and $x_j$ have the same label $z$ and $x_j \in C$, we connect them by an edge and calculate its weight $W^G_{ij}$. We define the affinity matrix $\mathbf{W}^G \in \mathbb{R}^{N \times N}$ by taking $W^G_{ij}$ as its $(i, j)$th element. By using the affinity matrix $\mathbf{W}^G$, we formulate the reranking problem.

3.3 Proposed Graph-Based Reranking Algorithm

We develop our approach based on the normalized graph Laplacian and the ranking distance. Typically, the similarity of the $k$th aspect ($k \in \{L, G\}$) between the $i$th and $j$th videos is first defined as $W^k_{ij} = \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2/\sigma_k^2)$, where $\sigma_k$ is the scaling parameter of the Gaussian function that converts distance to similarity. However, the Euclidean distance may not be the most suitable distance metric [24]. Therefore, we replace the Euclidean distance metric with the following Mahalanobis distance metric, which can be learned within an optimization framework:

$$W^k_{ij} = \exp\left(-(\mathbf{x}_i - \mathbf{x}_j)^T \mathbf{M}_k (\mathbf{x}_i - \mathbf{x}_j)\right), \tag{2}$$

where $\mathbf{M}_k$ is a symmetric positive semidefinite real matrix.
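The steps of Algorithm 1, followed by the construction of $\mathbf{W}^G$ with the kernel of Eq. (2), can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, $k$-means is a plain Lloyd iteration with a deterministic farthest-first initialization (the text does not specify one), and the center node of a group is taken to be the member whose embedding is closest to the group's centroid, which is one reading of "nodes corresponding to each centroid". Only node-to-center edges are created, as described in Sect. 3.2.

```python
import numpy as np


def detect_global_consistency(W_local, K, n_iter=50):
    """Spectral clustering on L = D - W; returns group labels and center nodes."""
    W_local = np.asarray(W_local, dtype=float)
    laplacian = np.diag(W_local.sum(axis=1)) - W_local
    # eigh returns ascending eigenvalues and orthonormal (unit-length) eigenvectors.
    _, eigvecs = np.linalg.eigh(laplacian)
    U = eigvecs[:, :K]  # row i is the K-dimensional embedding of node i

    # Assumption: farthest-first initialization of the K centroids.
    centroids = [U[0]]
    for _ in range(1, K):
        gap = np.min([((U - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(U[gap.argmax()])
    centroids = np.array(centroids)

    # Lloyd iterations of k-means on the rows of U.
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for c in range(K):
            if np.any(labels == c):
                centroids[c] = U[labels == c].mean(axis=0)

    dist = ((U[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = dist.argmin(axis=1)
    # Assumption: the center node is the group member nearest to its centroid.
    centers = np.full(K, -1)
    for c in range(K):
        members = np.flatnonzero(labels == c)
        if members.size:
            centers[c] = members[dist[members, c].argmin()]
    return labels, centers


def global_affinity(X, labels, centers, M):
    """W^G: connect each node to the center of its own group, weighted by Eq. (2)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        j = centers[labels[i]]
        if j < 0 or j == i:
            continue  # empty group, or the node is itself the center
        diff = X[i] - X[j]
        W[i, j] = W[j, i] = np.exp(-diff @ M @ diff)
    return W
```

Passing `M = np.eye(d) / sigma**2` reproduces the plain Gaussian kernel, so the same function covers both the Euclidean and the learned-metric case. A center node that is the only member of its group ends up with degree zero, which any downstream normalization by the degree has to guard against.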
We decompose $\mathbf{M}_k$ as $\mathbf{M}_k = \mathbf{A}_k^T \mathbf{A}_k$, where $\mathbf{A}_k \in \mathbb{R}^{d \times d}$, and substitute it into Eq. (2) to obtain

$$W^k_{ij} = \exp\left(-\|\mathbf{A}_k(\mathbf{x}_i - \mathbf{x}_j)\|^2\right). \tag{3}$$

This is equivalent to transforming each video $\mathbf{x}_i$ into $\mathbf{A}_k\mathbf{x}_i$. For initialization, we set $\mathbf{A}_k$ to the diagonal matrix $\mathbf{I}/\sigma_k$, where $\sigma_k$ is the median of the pairwise Euclidean distances of the videos in the $k$th aspect.

The proposed method considers local and global aspects in the graph. Here, we linearly combine the normalized graph Laplacian regularizers. Mathematically, to smooth the reranking scores based on both global and local consistencies, we model the regularizer term as the following combination of local and global terms:

$$R(\mathbf{r}, \mathbf{A}_L, \mathbf{A}_G) = \sum_{k \in \{L,G\}} \sum_{i,j} \alpha_k W^k_{ij} \left( \frac{r_i}{\sqrt{d^k_{ii}}} - \frac{r_j}{\sqrt{d^k_{jj}}} \right)^2 = \sum_{k \in \{L,G\}} \alpha_k \mathbf{r}^T \tilde{\mathbf{L}}_k \mathbf{r}, \tag{4}$$

where $\alpha_k$ is the weight of the local or global regularizer. The weights satisfy $0 \le \alpha_k \le 1$ and $\alpha_L + \alpha_G = 1$. $d^k_{ii}$ is the sum of the $i$th row of $\mathbf{W}^k$, $\tilde{\mathbf{L}}_k = \mathbf{I} - \mathbf{D}_k^{-1/2}\mathbf{W}^k\mathbf{D}_k^{-1/2}$ is the normalized graph Laplacian, and $\mathbf{D}_k$ is the diagonal matrix whose $(i, i)$th element is $d^k_{ii}$. Accordingly, our algorithm can be formulated as the following optimization problem:

$$\min_{\mathbf{r}, \mathbf{A}_L, \mathbf{A}_G} Q(\mathbf{r}, \mathbf{A}_L, \mathbf{A}_G) = \sum_{k \in \{L,G\}} \sum_{i,j} \alpha_k W^k_{ij} \left( \frac{r_i}{\sqrt{d^k_{ii}}} - \frac{r_j}{\sqrt{d^k_{jj}}} \right)^2 + \rho \sum_{(i,j) \in S_{\bar{\mathbf{r}}}} \left( 1 - \frac{r_i - r_j}{\bar{r}_i - \bar{r}_j} \right)^2, \tag{5}$$

where the loss term is the preference strength ranking distance [10] and $S_{\bar{\mathbf{r}}}$ is the set of pairs $(i, j)$ for which the initial ranking scores of the sample pair $(x_i, x_j)$ satisfy $\bar{r}_i > \bar{r}_j$. Note that an appropriate scale of $\mathbf{A}_k$ for estimating $\mathbf{W}^k$ is also determined automatically. The scaling parameter is usually very sensitive in graph-based learning and needs to be carefully tuned. Eliminating this parameter by automatically determining the scale of $\mathbf{A}_k$ is also an important element of our approach.

3.4 Alternating Optimization

The formulation shown in Eq. (5) is a minimization problem involving two kinds of variables to optimize, $\mathbf{r}$ and $\mathbf{A}_k$.
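Since this objective is not convex, it is difficult to recover both unknowns simultaneously. However, if we hold one unknown constant and solve the objective for the other, we have two convex problems that can be optimally solved. In the rest of this section, we introduce an alternating optimization for our reranking framework, which iterates between the updates of $\mathbf{r}$ and $\mathbf{A}_k$.

3.4.1 Update for r

By using the normalized graph Laplacian, we can rewrite Eq. (5) as follows:

$$Q(\mathbf{r}, \mathbf{A}_L, \mathbf{A}_G) = \sum_{k \in \{L,G\}} \alpha_k \mathbf{r}^T \tilde{\mathbf{L}}_k \mathbf{r} + \rho \sum_{(i,j) \in S_{\bar{\mathbf{r}}}} \left( 1 - \frac{r_i - r_j}{\bar{r}_i - \bar{r}_j} \right)^2. \tag{6}$$

If the transformation matrices $\mathbf{A}_L$ and $\mathbf{A}_G$ are constant, then, denoting $\beta_{ij} = 1/(\bar{r}_i - \bar{r}_j)$, the relevance score list $\mathbf{r}$ can be updated by solving the following optimization problem:

$$\begin{aligned} \min_{\mathbf{r}} Q(\mathbf{r}) &= \min_{\mathbf{r}} \sum_{k \in \{L,G\}} \alpha_k \mathbf{r}^T \tilde{\mathbf{L}}_k \mathbf{r} + \rho \sum_{(i,j) \in S_{\bar{\mathbf{r}}}} \left\{ 1 - \beta_{ij}(r_i - r_j) \right\}^2 \\ &= \min_{\mathbf{r}} \sum_{k \in \{L,G\}} \alpha_k \mathbf{r}^T \tilde{\mathbf{L}}_k \mathbf{r} + \rho \left( \mathbf{r}^T \mathbf{L}^{(B)} - 2(\mathbf{B}\mathbf{e})^T \right) \mathbf{r}, \end{aligned} \tag{7}$$

where $\mathbf{L}^{(B)}$ is a graph Laplacian matrix defined over the graph $G_B$, which has the same structure as $G$ with the weight between nodes $x_i$ and $x_j$ set to $|\beta_{ij}|$. $\mathbf{B} = [\beta_{ij}]_{N \times N}$ is an anti-symmetric matrix, and $\mathbf{e}$ is a vector with all elements equal to 1.

Finally, the relevance score list $\mathbf{r}$ is derived by differentiating with respect to $\mathbf{r}$ and equating the result to zero as follows:

$$\mathbf{r} = \left( \sum_{k} \alpha_k \tilde{\mathbf{L}}_k + \rho \mathbf{L}^{(B)} \right)^{-1} \tilde{\boldsymbol{\rho}}, \tag{8}$$

where $\tilde{\boldsymbol{\rho}} = 2\rho(\mathbf{B}\mathbf{e})$. It can be seen that, unlike learning based on a single normalized graph Laplacian, the two types of normalized graph Laplacian matrices are linearly combined with the weights $\alpha_k$.

3.4.2 Update for $\mathbf{A}_k$

Now, we consider the optimization of $\mathbf{A}_k$ ($k = L, G$). Since $\mathbf{A}_L$ and $\mathbf{A}_G$ are optimized by the same process, we describe that of $\mathbf{A}_L$ as an example.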
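For fixed affinity matrices, the closed-form update of $\mathbf{r}$ in Sect. 3.4.1 can be sketched as follows. The function names are hypothetical, and the sketch rests on one labeled assumption: it solves the stationarity condition obtained by expanding the loss of Eq. (6) term by term, which gives edge weights $\beta_{ij}^2$ in $\mathbf{L}^{(B)}$ and a right-hand side $\rho\mathbf{B}\mathbf{e}$. The text states $|\beta_{ij}|$ and $\tilde{\boldsymbol{\rho}} = 2\rho(\mathbf{B}\mathbf{e})$ instead, so this should be read as one self-consistent derivation rather than the authors' exact implementation.

```python
import numpy as np


def normalized_laplacian(W):
    """I - D^{-1/2} W D^{-1/2}; zero-degree nodes get a zero scaling factor."""
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    return np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]


def update_relevance(W_local, W_global, r_bar, alpha_local=0.5, rho=1.0):
    """Minimizer of Eq. (6) over r with W_local and W_global held fixed."""
    r_bar = np.asarray(r_bar, dtype=float)
    n = len(r_bar)
    alpha_global = 1.0 - alpha_local  # the weights sum to one
    # beta_ij = 1 / (r_bar_i - r_bar_j): anti-symmetric, zero for tied scores.
    gap = r_bar[:, None] - r_bar[None, :]
    B = np.zeros((n, n))
    distinct = gap != 0
    B[distinct] = 1.0 / gap[distinct]
    # Assumption: Laplacian with weights beta_ij^2 (from expanding the loss).
    weights = B ** 2
    L_B = np.diag(weights.sum(axis=1)) - weights
    system = (alpha_local * normalized_laplacian(W_local)
              + alpha_global * normalized_laplacian(W_global)
              + rho * L_B)
    rhs = rho * (B @ np.ones(n))
    # Raises numpy.linalg.LinAlgError if the combined matrix is singular.
    return np.linalg.solve(system, rhs)
```

Using `np.linalg.solve` instead of forming the inverse in Eq. (8) is numerically preferable and gives the same vector. The combined matrix can be singular in degenerate cases, for example when every node of both graphs has the same degree, so a small ridge term may be needed in practice.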
With $\mathbf{r}$ and $\mathbf{A}_G$ fixed, the derivative of $Q$ with respect to $\mathbf{A}_L$ is

$$\begin{aligned} \frac{\partial}{\partial \mathbf{A}_L} Q(\mathbf{A}_L, \mathbf{A}_G) &= \alpha_L \frac{\partial}{\partial \mathbf{A}_L} \sum_{i,j} W^L_{ij} \left( \frac{r_i}{\sqrt{d^L_{ii}}} - \frac{r_j}{\sqrt{d^L_{jj}}} \right)^2 \\ &= \alpha_L \sum_{i,j} \left[ (h^L_{ij})^2 \frac{\partial W^L_{ij}}{\partial \mathbf{A}_L} - W^L_{ij} h^L_{ij} \left( \frac{r_i}{\sqrt{(d^L_{ii})^3}} \frac{\partial d^L_{ii}}{\partial \mathbf{A}_L} - \frac{r_j}{\sqrt{(d^L_{jj})^3}} \frac{\partial d^L_{jj}}{\partial \mathbf{A}_L} \right) \right], \end{aligned} \tag{9}$$

where

$$h^L_{ij} = \frac{r_i}{\sqrt{d^L_{ii}}} - \frac{r_j}{\sqrt{d^L_{jj}}}, \qquad \frac{\partial W^L_{ij}}{\partial \mathbf{A}_L} = -2 W^L_{ij} \mathbf{A}_L (\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T, \qquad \frac{\partial d^L_{ii}}{\partial \mathbf{A}_L} = \sum_{j=1}^{N} \frac{\partial W^L_{ij}}{\partial \mathbf{A}_L}. \tag{10}$$

To optimize $\mathbf{A}_L$ using Eq. (9), we adopt a gradient descent process in which the step size is adapted dynamically to accelerate the process while guaranteeing its convergence. Let $\mathbf{A}_L^{(t)}$ denote the value of $\mathbf{A}_L$ at the $t$th iteration of the process.

Algorithm 2 Gradient descent process for solving $\mathbf{A}_k$.
Input: Initial step-size parameter $\eta_0 = 1$.
Output: The transformation matrix $\mathbf{A}_k$.
Set $\mathbf{A}_k^{(0)}$ to the diagonal matrix $\mathbf{I}/\sigma_k$, where $\sigma_k$ is the median value of the pairwise Euclidean distances of the videos in the $k$th aspect.
for $t = 0$ to $T_1 - 1$ do
    Let $\mathbf{A}_k^{(t+1)} = \mathbf{A}_k^{(t)} - \eta_t \left. \frac{\partial Q}{\partial \mathbf{A}_k} \right|_{\mathbf{A}_k = \mathbf{A}_k^{(t)}}$.
    if $Q(\mathbf{A}_k^{(t+1)}) < Q(\mathbf{A}_k^{(t)})$ then
        $\eta_{t+1} = 2\eta_t$;
    else
        $\mathbf{A}_k^{(t+1)} = \mathbf{A}_k^{(t)}$, $\eta_{t+1} = \eta_t/2$.
    end if
end for

Algorithm 3 Optimization process of the reranking algorithm
Input: Tuning parameters $\alpha_k$ and trade-off parameter $\rho$. The affinity matrices $\mathbf{W}_L^{(0)}$, $\mathbf{W}_G^{(0)}$ for initialization.
Output: The relevance score list $\mathbf{r}$.
Set $\mathbf{A}_L^{(0)}$, $\mathbf{A}_G^{(0)}$ to the diagonal matrices $\mathbf{I}/\sigma_L$, $\mathbf{I}/\sigma_G$, respectively, where $\sigma_\bullet$ is the median value of the pairwise Euclidean distances of the videos in each aspect.
for $t = 0$ to $T - 1$ do
    Compute the $t$th optimal relevance score list $\mathbf{r}^{(t)}$ according to Eq. (8).
    Update the transformation matrices $\mathbf{A}_L^{(t+1)}$ and $\mathbf{A}_G^{(t+1)}$ sequentially according to Algorithm 2.
    Update the similarity matrices $\mathbf{W}_k^{(t+1)}$ as in Eq. (3).
end for
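The step-size rule of Algorithm 2 can be sketched independently of the specific gradient in Eqs. (9) and (10). In the sketch below, `objective` and `gradient` are hypothetical callables supplied by the caller (for the method in this paper they would evaluate $Q$ and $\partial Q/\partial \mathbf{A}_k$ with $\mathbf{r}$ and the other transformation matrix fixed); the function name and default arguments are likewise illustrative.

```python
import numpy as np


def adaptive_gradient_descent(objective, gradient, A0, num_iters=10, step=1.0):
    """Gradient descent that doubles the step after a decrease and halves it otherwise."""
    A = np.array(A0, dtype=float)
    current = objective(A)
    for _ in range(num_iters):
        candidate = A - step * gradient(A)
        value = objective(candidate)
        if value < current:
            # The objective decreased: accept the move and double the step size.
            A, current = candidate, value
            step *= 2.0
        else:
            # No decrease: keep A unchanged and halve the step size.
            step /= 2.0
    return A
```

For example, with `objective = lambda A: np.sum((A - target) ** 2)` and `gradient = lambda A: 2 * (A - target)`, starting from zeros, the first step overshoots and is rejected, and the halved step then lands on `target`. Because a move is only accepted when the objective strictly decreases, the returned matrix is never worse than `A0`.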
If $Q(\mathbf{A}_L^{(t+1)}, \mathbf{A}_G) < Q(\mathbf{A}_L^{(t)}, \mathbf{A}_G)$, i.e., the cost function decreases after the gradient step, we double the step size. Otherwise, we halve the step size and do not update $\mathbf{A}_L$. The process is shown in Algorithm 2, where $Q(\mathbf{A}_L)$ denotes the value of the objective function for a given $\mathbf{A}_L$. After the iterations for $\mathbf{A}_L$, $\mathbf{r}$ and $\mathbf{A}_L$ are fixed, and $\mathbf{A}_G$ is calculated in the same way as $\mathbf{A}_L$.

The whole alternating optimization process is shown in Algorithm 3. After the alternating optimization, the proposed method returns the videos ordered by the optimal relevance score $\mathbf{r}$ as the video search result.

Table 2 15 Event Queries.
| Queries |
| UEFA EURO 2016 highlights |
| Sochi 2014 Winter Olympics opening ceremony |
| World Figure Skating Championships 2016 |
| Rio 2016 Summer Olympics games |
| NBA Finals 2014 highlights |
| CCTV new years gala 2016 |
| Speech at Apec China 2014 speech |
| 2014 Hong Kong protests |
| 2014 Israel Gaza Conflict air strikes |
| Malaysia Airlines Flight 17 crash moment |
| New York Fashion Week 2014 runway |
| November 2015 Paris attacks |
| Flood in Indonesia 2014 |
| Calbuco Volcano Eruption in Chile |
| Italy earthquake 2016 |

4. Experimental Results

In this section, we verify the effectiveness of the proposed method. We first describe the dataset collected from YouTube† and the measures used in the experiments. We then analyze the performance of our method for video search reranking.

4.1 Datasets and Features

Datasets: While research on video search has recently received intensive attention, the public datasets do not reflect current social event topics. To substantially evaluate our approach, we collected a new dataset with rank information from YouTube for video search reranking. Specifically, the videos were crawled from YouTube by using the 15 event queries shown in Table 2. A well-known dataset for video search is the MSRA-MM (Microsoft Research Asia Multimedia) dataset [25]. In this task, 9 categories of videos are searched.
Therefore, we use 15 queries in the experiments. These queries cover current news event topics, which were selected with reference to the categories “Categories:2014, 2015, and 2016” of Wikipedia††. For each query, we obtained up to the top 500 videos and analyzed the related videos of each video by using the YouTube API†††. Furthermore, the associated contextual information such as tags, titles, and descriptions was crawled together with the videos. This dataset is a real-world Web video dataset containing the original ranking information. By using these videos, we construct the video graph $G$. When constructing the graph $G$, each sample is connected with its $k$-nearest neighbors, and the neighborhood size is set to 5. The numbers of iterations $T$ and $T_1$ are set to 5 and 10, respectively. Our method assigns the initial score $\bar{r}_i = 1 - \hat{r}_i/N$, where $\hat{r}_i$ is the rank of video $x_i$ returned by the search engine.

†http://www.youtube.com
††http://en.wikipedia.org/wiki/Category:2014, 2015, and 2016
†††https://developers.google.com/youtube/v3/

Fig. 2 Visual results of the video reranking from different approaches for the specific query (Rio 2016 Summer Olympics Games): (a) Ours, (b) Ours (without $\mathbf{A}_k$ optimization), (c) Ours ($\alpha_G = 0$), (d) SocialRank, (e) MGL, (f) Bayesian, (g) VisualRank, (h) RandomWalk, (i) BM25, (j) Initial (No method). Note that the corresponding YouTube IDs are shown below the images.

Features: For the videos of each query, we extract the following sequential features from the whole videos and frame-level visual and audio features from keyframes. Note that we use the I-frames of the MPEG-4 video as the keyframes.

C3D: We apply the C3D model [26] pre-trained on the Sports 1M dataset to compute representations with 512 dimensions.
Inception-v3: We apply the Inception-v3 model [27] pre-trained on the ImageNet 1K classification task to compute representations with 2048 dimensions.
HSV color histogram: We use the HSV color histogram to exploit the color information. To retain spatial information, keyframes are divided into 25 blocks of the same size. A normalized HSV color histogram with 4 bins in each color channel is extracted from each block, which gives a 1600-dimensional feature in total (25 blocks × $4^3$ bins).

MFCC: Mel-frequency cepstral coefficients (MFCC), which describe the short-time spectral shape of audio frames, are extracted to capture audio information. MFCC are widely used not only for speech recognition but also for generic audio classification. ΔMFCC, ΔΔMFCC, log-power, Δlog-power, and ΔΔlog-power are extracted in addition to the MFCC. The dimension of the audio feature is 39, including the 12-dimensional MFCC.

These sequential and frame-level visual and audio features are combined by early fusion followed by PCA to reduce the dimension to 256. The video-level feature $x_i$ of the $i$th video is mean-pooled from the frame-level features.

4.2 Evaluation Metrics

The performance of our method is evaluated by eight volunteers, who were invited to assign relevance scores to the top N videos of each query. The averaged relevance score is used to measure the retrieval results.

The performance is measured by the widely used average precision (AP), which averages the precision obtained when each relevant video occurs. We average the APs over all 15 queries to obtain the mean AP (MAP) as an overall performance measurement. Then, to measure the video search performance, the normalized discounted cumulative gain (NDCG) [28], which is a commonly used measure in information retrieval when there are more than two relevance levels, is adopted.
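As a minimal sketch of the AP and MAP measures described above, the following Python code computes them from binary relevance labels. The function names are hypothetical, and two points are assumptions not stated in the text: the graded judgments are taken to be binarized beforehand, and AP is normalized by the number of relevant videos that appear in the ranked list.

```python
def average_precision(relevant):
    """AP of one ranked list.

    relevant[i] is True if the video at rank i + 1 is relevant
    (binary labels are an assumption of this sketch).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the position of each relevant video.
            precision_sum += hits / rank
    # Normalize by the number of relevant videos found in the list.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """MAP: the mean of the per-query APs."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


# Toy example: relevant videos at ranks 1 and 3.
toy_ap = average_precision([True, False, True, False])
```

For the toy list, the precisions at the two relevant positions are 1/1 and 2/3, so the AP is their mean.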
For a given query, the NDCG score at position $d$ in the ranking list is calculated as follows:

$$\mathrm{NDCG@}d = Z_d \sum_{j=1}^{d} \frac{2^{t_j} - 1}{\log(1 + j)}, \qquad (11)$$

where $t_j$ is the relevance degree of the $j$th video in the ranking list and $Z_d$ is a normalization constant chosen to guarantee that NDCG@$d$ is 1 for a perfect ranking. For each video, in the experiments, the relevance degree $t_j$ was judged manually on four scales: “0:Irrelevant”, “1:Fair”, “2:Relevant”, and “3:Very Relevant”. To evaluate the overall performance, we average the NDCGs over all queries to obtain the mean NDCG (MNDCG).

Table 3 MAP comparison of video reranking performance.
Methods                             MAP
Ours                                0.735
Ours (without $A_k$ optimization)   0.729
Ours ($\alpha_G = 0$)               0.684
SocialRank                          0.731
MGL                                 0.727
Bayesian                            0.717
VisualRank                          0.635
RandomWalk                          0.529
BM25                                0.583
Initial (No method)                 0.597

Table 4 MNDCG@d comparison of the video reranking performance.
Methods                             @5     @10    @20    @30    @40    @50    @60    @70    @80    @90    @100
Ours                                0.901  0.895  0.837  0.825  0.811  0.792  0.762  0.751  0.749  0.751  0.746
Ours (without $A_k$ optimization)   0.893  0.881  0.832  0.824  0.804  0.791  0.751  0.735  0.733  0.731  0.733
Ours ($\alpha_G = 0$)               0.806  0.781  0.773  0.754  0.744  0.733  0.721  0.701  0.699  0.685  0.679
SocialRank                          0.899  0.894  0.831  0.823  0.813  0.789  0.742  0.743  0.741  0.739  0.729
MGL                                 0.871  0.850  0.845  0.796  0.785  0.778  0.761  0.749  0.738  0.751  0.733
Bayesian                            0.850  0.811  0.786  0.759  0.753  0.742  0.738  0.735  0.730  0.723  0.722
VisualRank                          0.671  0.645  0.647  0.661  0.659  0.651  0.656  0.652  0.649  0.644  0.651
RandomWalk                          0.581  0.578  0.575  0.581  0.582  0.573  0.571  0.586  0.587  0.579  0.590
BM25                                0.682  0.679  0.659  0.641  0.638  0.642  0.633  0.628  0.625  0.632  0.631
Initial (No method)                 0.677  0.663  0.668  0.668  0.661  0.665  0.665  0.654  0.654  0.652  0.653

4.3 Reranking Results

To evaluate the performance of the proposed reranking algorithm, we first compare the proposed method with the following nine methods:

1) No method, i.e., the initial search results without reranking. This method is denoted as “Initial”.
2) The text-based search results based on the Okapi BM25 formula [29] using the associated contextual information of each video. The method is denoted as “BM25”.
3) The random walk method proposed in [17]. The method is denoted as “RandomWalk”.
4) Graph-based reranking proposed in [8]. The method is denoted as “VisualRank”.
5) Bayesian reranking proposed in [10]. The method is denoted as “Bayesian”.
6) Multimodal graph-based reranking proposed in [9], which is the state-of-the-art for graph-based reranking. The method is denoted as “MGL”.
7) Social ranking proposed in [30]. User information is utilized to boost the retrieval performance. A regularization framework that fuses the visual and view information is introduced. The method is denoted as “SocialRank”.
8) The proposed reranking method without the global regularizer, i.e., we fix $\alpha_G = 0$. The method is denoted as “Ours ($\alpha_G = 0$)”.
9) The proposed reranking method with equivalent scaling parameters assigned to the two aspects, i.e., we fix $A_k = 1/\sigma_k$. The method is the same as [16] and is denoted as “Ours (without $A_k$ optimization)”.

For fair comparison, methods 3)–7) were implemented by using the same video-level features as described in Sect. 4.1.

Table 5 p values of the significance test comparison. The performance measure is MAP.
Methods                                     p values
versus Ours (without $A_k$ optimization)    $3.16 \times 10^{-3}$
versus Ours ($\alpha_G = 0$)                $5.75 \times 10^{-4}$
versus SocialRank                           $2.65 \times 10^{-2}$
versus MGL                                  $1.05 \times 10^{-3}$
versus Bayesian                             $6.25 \times 10^{-4}$
versus VisualRank                           $1.94 \times 10^{-7}$
versus RandomWalk                           $1.78 \times 10^{-7}$
versus BM25                                 $1.15 \times 10^{-5}$
versus Initial (No method)                  $1.45 \times 10^{-6}$

Table 6 MNDCG@d comparison of the video reranking performance when the initial search results obtained by each query equally contain 80% of noisy samples.
Methods                             @5     @10    @20    @30    @40    @50    @60    @70    @80    @90    @100
Ours                                0.766  0.741  0.715  0.664  0.605  0.607  0.587  0.609  0.635  0.638  0.656
Ours (without $A_k$ optimization)   0.741  0.729  0.690  0.663  0.602  0.577  0.575  0.586  0.623  0.627  0.644
Ours ($\alpha_G = 0$)               0.702  0.664  0.698  0.629  0.600  0.592  0.556  0.529  0.524  0.534  0.522
SocialRank                          0.748  0.714  0.696  0.641  0.589  0.586  0.584  0.595  0.632  0.635  0.653
MGL                                 0.767  0.731  0.698  0.677  0.621  0.587  0.585  0.596  0.633  0.636  0.654
Bayesian                            0.739  0.689  0.698  0.643  0.591  0.577  0.575  0.586  0.623  0.627  0.644
VisualRank                          0.448  0.538  0.536  0.563  0.596  0.552  0.572  0.591  0.586  0.585  0.579
RandomWalk                          0.258  0.457  0.451  0.482  0.518  0.476  0.460  0.480  0.476  0.475  0.465
BM25                                0.364  0.317  0.395  0.417  0.377  0.384  0.360  0.375  0.393  0.402  0.417
Initial                             0.652  0.634  0.611  0.629  0.579  0.592  0.556  0.529  0.525  0.534  0.523

Figure 2 shows the top results of the proposed method and the other methods for an example query “Rio 2016 Summer Olympics games”. For this query, our approach is superior to the compared methods owing to its capability to rank the relevant videos by using multiple types of objects and multiple types of relationships. The results of the MAP comparison are shown in Table 3. The proposed reranking algorithm achieves the highest MAP (0.735) among the compared methods, which supports the robustness of our algorithm.

Next, we show the video retrieval results obtained by using the proposed method and the other retrieval methods. Table 4 shows the MNDCG@d of the different methods for d = 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100. Overall, our proposed graph-based reranking outperforms the other methods, and the improvements are consistent and stable at most depths of NDCG.
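As an illustration of Eq. (11), the following Python sketch computes NDCG@d and its mean over queries. The function names are hypothetical. Two points are assumptions of this sketch: the normalization constant $Z_d$ is obtained from the ideal ordering of the same judged list, and the natural logarithm is used (the base cancels in the normalized score).

```python
import math


def dcg_at(degrees, d):
    """Unnormalized sum in Eq. (11); degrees[j - 1] is t_j in {0, 1, 2, 3}."""
    return sum((2 ** t - 1) / math.log(1 + j)
               for j, t in enumerate(degrees[:d], start=1))


def ndcg_at(degrees, d):
    """NDCG@d, with Z_d = 1 / (DCG of the ideal ordering of the same list)."""
    ideal = dcg_at(sorted(degrees, reverse=True), d)
    return dcg_at(degrees, d) / ideal if ideal > 0 else 0.0


def mean_ndcg_at(per_query_degrees, d):
    """MNDCG@d: the average of NDCG@d over all queries."""
    return sum(ndcg_at(q, d) for q in per_query_degrees) / len(per_query_degrees)


# Toy example with hypothetical judgments for two queries.
toy_mndcg = mean_ndcg_at([[3, 2, 1, 0], [0, 3]], d=2)
```

A list that is already sorted by relevance degree yields NDCG@d = 1, which matches the role of $Z_d$ in Eq. (11).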
In particular, the MNDCG@100 of the proposed method is 0.017 and 0.013 higher than those of SocialRank and MGL, respectively, which are the state-of-the-art methods in reranking.

To verify whether the improvement of the proposed method is statistically significant, we further perform a statistical significance test. Here, we conduct a paired t-test at the 5% significance level between our method and each of the other methods. The p values are shown in Table 5. The t-test is conducted over the 15 queries. From this result, we can see that the improvement of the proposed method is statistically significant.

Table 6 shows the simulation results for verifying the robustness to noisy videos. The average noise ratio, i.e., the ratio of irrelevant videos in the initial ranking list, is originally 72% in our dataset. Thus, in this experiment, we randomly insert noisy videos from other queries’ ranking lists into the target initial ranking list so that the ratio of noisy videos becomes 80%. Table 6 shows the MNDCG@d of the different methods for d = 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100. As shown in Table 6, our proposed graph-based reranking outperforms the other methods, and the improvements are consistent and stable at most depths of NDCG. Thus, it can be seen that our method including the global consistency analysis can improve the robustness to noisy videos.

Fig. 3 Performance comparisons between different center node detection approaches for different values of the parameter K in terms of MNDCG@100: (a) Ours, (b) PageRank, (c) HITS.

Next, in order to confirm the effectiveness of the proposed center node detection using a spectral clustering algorithm, we compare the proposed method with the following two popular representative node detection schemes:

1) The PageRank algorithm [18], which was used in Google and was designed as a method for link analysis. The method is denoted as “PageRank”.
2) The HITS (Hypertext Induced Topic Selection) algorithm [31]. HITS makes the distinction between hubs and authorities and computes them in a mutually reinforcing way. The method is denoted as “HITS”.

Note that for the implementation of PageRank and HITS, we also used the same video graph and its affinity matrix as those used in the proposed method. To further analyze the results, we compare the results for different values of the parameter K, which is the number of center nodes. Figure 3 depicts the performance of the three methods, i.e., the proposed method, HITS, and PageRank, with K ranging from 5 to 30 in terms of MNDCG@100. From the results, we can see that the proposed method always gives better performance, and the best value of K is 10.

Finally, we also test the sensitivity of the two parameters $\rho$ and $\alpha_L$, which are used in the proposed method. We first set $\alpha_L = 0.5$ and vary $\rho$ from 0.001 to 1. Figure 4 shows the performance curve with respect to the variation of $\rho$. We then set $\rho = 0.1$ and vary $\alpha_L$ ($\alpha_G = 1 - \alpha_L$) from 0.1 to 0.9. Figure 5 shows the performance curve with respect to the variation of $\alpha_L$. Here, we also illustrate the performance of the variants of the proposed method. From the results, we can see that the performance of our approach is not significantly degraded when the two parameters vary over a fairly wide range, and it keeps outperforming the other methods.

Fig. 4 Illustration of the effects of the parameter $\rho$ in terms of MNDCG@100: (a) Ours, (b) Ours (without $A_k$ optimization), (c) Ours ($\alpha_G = 0$).

Fig. 5 Illustration of the effects of the parameter $\alpha_L$ in terms of MNDCG@100: (a) Ours, (b) Ours (without $A_k$ optimization).

From the above experimental results, we can verify the effectiveness of the proposed method using the local and global consistency analysis.
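The paired t-test over the 15 queries reported in Table 5 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the implementation used in the experiments: the function names are hypothetical, the per-query AP values are made-up placeholders (they are not results from this paper), and significance is decided by comparing the t statistic with the two-sided 5% critical value for 14 degrees of freedom instead of computing a p value.

```python
import math
from statistics import mean, stdev

# Two-sided 5% critical value of Student's t distribution with
# 15 - 1 = 14 degrees of freedom.
T_CRIT_DF14 = 2.145


def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


def is_significant(a, b, t_crit=T_CRIT_DF14):
    """True if the paired difference is significant at the 5% level."""
    return abs(paired_t_statistic(a, b)) > t_crit


# Placeholder per-query APs for 15 queries (hypothetical values only).
base_ap = [0.52, 0.61, 0.58, 0.70, 0.66, 0.49, 0.73, 0.64,
           0.55, 0.68, 0.60, 0.57, 0.71, 0.63, 0.59]
ours_ap = [0.55, 0.63, 0.62, 0.71, 0.70, 0.52, 0.75, 0.68,
           0.56, 0.71, 0.64, 0.59, 0.74, 0.66, 0.63]
significant = is_significant(ours_ap, base_ap)
```

Because the test is paired, it depends only on the per-query differences, so a small but consistent gain over all queries can be significant even when the MAP values are close.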
The proposed method therefore improves the performance of graph-based reranking in video searches.

4.4 Complexity Analysis

From the above solution process, we can see that the computational cost mainly consists of three parts, which are for detecting global consistency, updating r, and updating $A_{\{L,G\}}$, respectively. First, the computational cost of the global consistency detection is $O(K^3 + KNt)$, where $K$ is the number of clusters, $N$ is the number of videos, and $t$ is the number of k-means iterations. In the graph-based reranking method, we sparsify $W_{\{L,G\}}$ by only keeping the $l$ largest components in each row, where $l$ is the number of neighbors for each video. From Eq. (8), we can see that the cost for updating r is $O(Nl)$. For updating $A_L$ and $A_G$, from the process in Algorithm 2, we can see that the cost is $O(T_1 N l d^2)$. Overall, the total time complexity for reranking is $O(K^3 + KNt + T(Nl + T_1 N l d^2))$, where $d$ is the dimensionality of the video feature vectors, and $T$ and $T_1$ are the numbers of iterations of the optimization.

Besides the theoretical analysis, we also measure the time cost of the proposed method experimentally. The method is implemented in Python and run in a single thread on a workstation with an Intel Xeon E5-2620 v3 (2.4 GHz) and 32 GB of memory. Averaged over all queries, our method can rank the videos within 10 s when N = 500. From the theoretical analysis and the experimental test discussed above, we can see that the efficiency of the proposed method is acceptable for real applications.

5. Conclusions

This paper has presented a method to improve the performance of graph-based Web video search reranking. We first construct the video graph and detect global consistency over the graph by using a spectral clustering algorithm.
From the clustering result, we extract center nodes, which are representative nodes of each cluster, and then define the new affinity matrix and the global regularizer representing the similarity between center nodes and each node within the same video group. Secondly, by considering both local and global graph consistency, video search reranking is formulated as an optimization problem. The effectiveness of integrating local and global regularizers has been demonstrated. We have also compared our method with several existing reranking methods, and the results demonstrate the superiority of our method.

Acknowledgments

This research was financially supported by JSPS KAKENHI Grant Number 17K12687.

References

[1] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-based image retrieval at the end of the early years,” IEEE Trans. Pattern Anal. Mach. Intell., vol.22, no.12, pp.1349–1380, 2000.
[2] A. Hauptmann, R. Yan, W.H. Lin, M. Christel, and H. Wactlar, “Can high-level concepts fill the semantic gap in video retrieval? a case study with broadcast news,” IEEE Trans. Multimedia, vol.9, no.5, pp.958–966, 2007.
[3] R. Yan, A. Hauptmann, and R. Jin, “Multimedia search with pseudo-relevance feedback,” Proceedings of International Conference on Content-Based Image and Video Retrieval, vol.2728, pp.238–247, 2003.
[4] A.P. Natsev, M.R. Naphade, and J. Tešić, “Learning the semantics of multimedia queries and concepts from a small number of examples,” Proceedings of the ACM International Conference on Multimedia, pp.598–607, 2005.
[5] Y. Liu and T. Mei, “Optimizing visual search reranking via pairwise learning,” IEEE Trans. Multimedia, vol.13, no.2, pp.280–291, 2011.
[6] L.S. Kennedy and S.-F. Chang, “A reranking approach for context-based concept fusion in video indexing and retrieval,” Proceedings of the ACM International Conference on Image and Video Retrieval, pp.333–340, 2007.
[7] W.H. Hsu, L.S. Kennedy, and S.-F.
Chang, “Video search reranking via information bottleneck principle,” Proceedings of the ACM International Conference on Multimedia, pp.35–44, 2006.
[8] Y. Jing and S. Baluja, “VisualRank: Applying pagerank to large-scale image search,” IEEE Trans. Pattern Anal. Mach. Intell., vol.30, no.11, pp.1877–1890, 2008.
[9] M. Wang, H. Li, D. Tao, K. Lu, and X. Wu, “Multimodal graph-based reranking for web image search,” IEEE Trans. Image Process., vol.21, no.11, pp.4649–4661, 2012.
[10] X. Tian, Y. Yang, J. Wang, X. Wu, and X.-S. Hua, “Bayesian visual reranking,” IEEE Trans. Multimedia, vol.13, no.4, pp.639–652, 2011.
[11] T. Mei, Y. Rui, S. Li, and Q. Tian, “Multimedia search reranking: A literature survey,” ACM Computing Surveys, vol.46, no.3, pp.38:1–38:38, 2014.
[12] J. He, M. Li, H.-J. Zhang, H. Tong, and C. Zhang, “Manifold-ranking based image retrieval,” Proceedings of the ACM International Conference on Multimedia, pp.9–16, 2004.
[13] S.T. Roweis and L.K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol.290, no.5500, pp.2323–2326, 2000.
[14] M.A. Porter, J.P. Onnela, and P.J. Mucha, “Communities in networks,” Notices of the AMS, vol.56, no.9, pp.1082–1097, 2009.
[15] U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol.17, no.4, pp.395–416, 2007.
[16] S. Yoshida, T. Ogawa, and M. Haseyama, “Graph-based Web video search reranking through consistency analysis using spectral clustering,” Proceedings of the IEEE International Conference on Multimedia and Expo, pp.1–6, 2016.
[17] W.H. Hsu, L.S. Kennedy, and S.-F. Chang, “Video search reranking through random walk over document-level context graph,” Proceedings of the ACM International Conference on Multimedia, pp.971–980, 2007.
[18] S. Brin and L.
Page, “The anatomy of a large-scale hypertextual web search engine,” Computer Networks and ISDN Systems, vol.30, no.1-7, pp.107–117, 1998.
[19] L. Yang and A. Hanjalic, “Supervised reranking for Web image search,” Proceedings of the ACM International Conference on Multimedia, pp.183–192, 2010.
[20] S. Zhang, M. Yang, T. Cour, K. Yu, and D.N. Metaxas, “Query specific rank fusion for image retrieval,” vol.7573, pp.660–673, 2012.
[21] W. Liu, Y.-G. Jiang, J. Luo, and S.-F. Chang, “Noise resistant graph ranking for improved web image search,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.849–856, 2011.
[22] X. Zhu, Z. Ghahramani, and J. Lafferty, “Semi-supervised learning using gaussian fields and harmonic functions,” Proceedings of the International Conference on Machine Learning, pp.912–919, 2003.
[23] D. Zhou, O. Bousquet, T.N. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” Proceedings of the International Conference on Neural Information Processing Systems, pp.321–328, 2004.
[24] B. Geng, D. Tao, and C. Xu, “DAML: Domain adaptation metric learning,” IEEE Trans. Image Process., vol.20, no.10, pp.2980–2989, 2011.
[25] H. Li, M. Wang, and X.-S. Hua, “MSRA-MM 2.0: A large-scale web multimedia dataset,” Proceedings of the IEEE International Conference on Data Mining Workshops, pp.164–169, 2009.
[26] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3D convolutional networks,” Proceedings of the IEEE International Conference on Computer Vision, pp.4489–4497, 2015.
[27] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[28] C.D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, New York, NY, USA, 2008.
[29] S. Robertson and H.
Zaragoza, “The probabilistic relevance framework: BM25 and beyond,” Foundations and Trends in Information Retrieval, vol.3, no.4, pp.333–389, 2009.
[30] D. Lu, X. Liu, and X. Qian, “Tag-based image search by social re-ranking,” IEEE Trans. Multimedia, vol.18, no.8, pp.1628–1639, 2016.
[31] J.M. Kleinberg, “Authoritative sources in a hyperlinked environment,” Journal of the ACM, vol.46, no.5, pp.604–632, 1999.

Soh Yoshida received the B.S., M.S., and Ph.D. degrees in electronics and information engineering from Hokkaido University, Japan, in 2012, 2014, and 2016, respectively. He joined the Faculty of Engineering, Kansai University, in 2016, where he is currently an Assistant Professor. His research interests are image/video semantic analysis and information retrieval. He is a member of the ACM, the IEEE, the IEICE, and the ITE.

Takahiro Ogawa received the B.S., M.S., and Ph.D. degrees in electronics and information engineering from Hokkaido University, Japan, in 2003, 2005, and 2007, respectively. He joined the Graduate School of Information Science and Technology, Hokkaido University, in 2008, where he is currently an Associate Professor. His research interests are multimedia signal processing and its applications. He has been an Associate Editor of the ITE Transactions on Media Technology and Applications. He is a member of the ACM, the EURASIP, the IEICE, and the ITE.

Miki Haseyama received the B.S., M.S., and Ph.D. degrees in electronics from Hokkaido University, Japan, in 1986, 1988, and 1993, respectively. She joined the Graduate School of Information Science and Technology, Hokkaido University, as an Associate Professor in 1994. She was a Visiting Associate Professor with Washington University, USA, from 1995 to 1996. She is currently a Professor with the Graduate School of Information Science and Technology, Hokkaido University. Her current research interests include image and video processing and its development into semantic analysis.
She has been the Vice President of the Institute of Image Information and Television Engineers (ITE), Japan, an Editor-in-Chief of the ITE Transactions on Media Technology and Applications, and the Director of the International Coordination and Publicity, Institute of Electronics, Information, and Communication Engineers (IEICE). She is a member of the IEICE, the ITE, and the ASJ.

Mitsuji Muneyasu received the B.E. and M.E. degrees in system engineering from Kobe University in 1982 and 1984, respectively, and the Doctor of Engineering degree from Hiroshima University, Japan, in 1993. In 1984, he joined Oki Electric Industry Co., Ltd., in Tokyo, Japan. From 1990 to 1991, he was a Research Assistant at the Faculty of Engineering, Tottori University, Tottori, Japan. From 1991 to 2001, he was a Research Assistant and Associate Professor at the Faculty of Engineering, Hiroshima University, Higashi-Hiroshima, Japan. In 2001, he joined the Faculty of Engineering, Kansai University, Osaka, Japan, where he is currently a Professor. His research interests include image processing theory and nonlinear digital signal processing. He is a member of IEICE, IEEE, and IPSJ.