SIAM-SEAS 2005 Minisymposium MSIR
Information Retrieval

Friday 2:15-4:15

Organized by Amy Langville, North Carolina State University and College of Charleston, anlangvi@unity.ncsu.edu and Carl Meyer, North Carolina State University, meyer@ncsu.edu

  1. 2:15-2:40 Rebecca Wills and Ilse Ipsen, NCSU
    Different Approaches to Computing PageRank

    Computing PageRank scores is a major component of the Web search ranking software used by Google. The traditional algorithm determines PageRank by computing the dominant eigenvector of a Markov matrix via the Power Method. An alternate method formulates the problem of finding the PageRank vector as a solution to a linear system. To ascertain the possible advantages of viewing the computation of PageRank as a linear system problem, iterative stationary methods, such as Jacobi, Gauss-Seidel and SOR, and pre-conditioned Krylov subspace methods are applied to the linear system. Presented is a comparison of these methods.

  2. 2:45-3:10 Prakash Chanchana, NCSU Updating the stationary distribution vector for an irreducible Markov Chain

    Updating the stationary distribution vector of an irreducible Markov chain is the ongoing research problem. Using idea of aggregation/disaggregation to build the Iterative algorithm is one way to update the stationary distribution vector, especially, for the well known Google's PageRank application. The algorithm works for both states updating and elements updating problem.

  3. 3:15-3:40 Naomi Caldwell and Robert E. Funderlic, Department of Computer Science, NCSU
    The k-Means Clustering Method and Concept Vectors Based on the Centroid Method

    The k-means partitional clustering method is put into a general setting. As is known, a singular vector associated with the largest singular value has been used in place of the mean of a k-means cluster. In generalizations of k-means, sometimes called Òk-means like methods,Ó we call all concept vectors simply center objects of clusters and call all these methods k-center methods. We explore a fast discrete estimate for a center vector and present related linear algebra. These discrete center vectors are based on what statisticians call the centroid method for which the vectors are restricted to have components with absolute value one.

  4. 3:45-4:10 Amy N. Langville, NCSU and CofC, and Carl Meyer, NCSU
    Text Mining using the Nonnegative Matrix Factorization

    Text mining usually begins by using a low rank factorization of the familiar term-by-document matrix to reduce linguistic noise and reveal hidden patterns. One of the most common low rank approximations is the singular value decomposition (SVD), which factors the nonnegative term-by-document matrix in factors that are mixed in sign. A newer technique, the nonnegative matrix factorization (NMF), instead factors the term-by-document matrix into nonnegative factors, which provide several advantages over the SVD factors. We discuss popular nonlinear optimization algorithms for computing the NMF and use several examples to demonstrate the advantages of this new factorization.