18.1 BLAS

BLAS is an acronym for Basic Linear Algebra Subroutines. As the name indicates, it contains subprograms for basic operations on vectors and matrices. BLAS was designed to be used as a building block in other codes, for example LAPACK. The source code for BLAS is available through Netlib. However, many computer vendors will have a special version of BLAS tuned for maximal speed and efficiency on their computer. This is one of the main advantages of BLAS: the calling sequences are standardized so that programs that call BLAS will work on any computer that has BLAS installed. If you have a fast version of BLAS, you will also get high performance on all programs that call BLAS. Hence BLAS provides a simple and portable way to achieve high performance for calculations involving linear algebra. LAPACK is a higher-level package built on the same ideas.

Levels and naming conventions

The BLAS subroutines can be divided into three levels:

Each BLAS and LAPACK routine comes in several versions, one for each precision (data type). The first letter of the subprogram name indicates the precision used:

      S      Real single precision.
      D      Real double precision.
      C      Complex single precision.
      Z      Complex double precision.

Complex double precision is not strictly defined in Fortran 77, but most compilers will accept one of the following declarations:

      double complex list-of-variables
      complex*16     list-of-variables

BLAS 1

Some of the BLAS 1 subprograms are:

The first letter (x) can be any of the letters S,D,C,Z depending on the precision. A quick reference to BLAS 1 can be found at http://www.netlib.org/blas/blasqr.ps

BLAS 2

Some of the BLAS 2 subprograms are:

A detailed description of BLAS 2 can be found at http://www.netlib.org/blas/blas2-paper.ps.

BLAS 3

Some of the BLAS 3 subprograms are:

The more advanced matrix operations, like solving a linear system of equations, are contained in LAPACK. A detailed description of BLAS 3 can be found at http://www.netlib.org/blas/blas3-paper.ps.

Examples

Let us first look at a very simple BLAS routine, SSCAL. The call sequence is

      call SSCAL ( n, a, x, incx )

Here x is the vector, n is the length (number of elements in x we wish to use), and a is the scalar by which we want to multiply x. The last argument incx is the increment. Usually, incx=1 and the vector x corresponds directly to the one-dimensional Fortran array x. For incx>1 it specifies how many elements in the array we should "jump" between each element of the vector x. For example, if incx=2 it means we should only scale every other element (note: the physical dimension of the array x should then be at least 2n-1). Consider these examples where x has been declared as real x(100).

      call SSCAL(100, a, x, 1)
      call SSCAL( 50, a, x(50), 1)
      call SSCAL( 50, a, x(2), 2)

The first line will scale all 100 elements of x by a. The next line will only scale the last 50 elements of x by a. The last line will scale all the even indices of x by a.

Observe that the array x will be overwritten by the new values. If you need to preserve a copy of the old x, you have to make a copy first, e.g., by using SCOPY.

Now consider a more complicated example. Suppose you have two 2-dimensional arrays A and B, and you are asked to find the (i,j) entry of the product A*B. This is easily done by computing the inner product of row i from A and column j of B. We can use the BLAS 1 subroutine SDOT. The only difficulty is to figure out the correct indices and increments. The call sequence for SDOT is

      SDOT ( n, x, incx, y, incy )

Suppose the array declarations were

      real A(lda,lda)
      real B(ldb,ldb)

but in the program you know that the actual size of A is m*p and for B it is p*n. The i'th row of A starts at the element A(i,1). But since Fortran stores 2-dimensional arrays down columns, the next row element A(i,2) will be stored lda elements later in memory (since lda is the length of a column). Hence we set incx = lda. For the column in B there is no such problem, the elements are stored consecutively so incy = 1. The length of the inner product calculation is p. Hence the answer is

      SDOT ( p, A(i,1), lda, B(1,j), 1 )

How to get the BLAS

First of all you should check if you already have BLAS on your system. If not, you can find it on Netlib at http://www.netlib.org/blas.

Documentation

The BLAS routines are almost self-explanatory. Once you know which routine you need, fetch it and read the header section that explains the input and output parameters in detail. We will look at an example in the next section when we address the LAPACK routines.

Copyright 1995-7 by Stanford University. All rights reserved.