timdoug's tidbits


2009-06-06

Installing Debian on a PlayStation 3 and running a Cell/SPU "Hello World"

What could be better than flexing a bit of matrix-multiply muscle? We're going to use Daniel Hackenberg's hand-coded asm routines, because they achieve ridiculous performance.

  1. Follow these instructions.
  2. Install some packages: apt-get install build-essential rpm libc6-dev-ppc64 gcc-4.3-spu spu-tools
  3. Grab the following RPMs off of CellSDK-Devel-Fedora_3.0.0.1.0.iso and unpack them into the root filesystem:
    1. ps3:/# rpm2cpio /home/timdoug/libs/cell-libs-3.0-16.ppc.rpm | cpio -id
    2. ps3:/# rpm2cpio /home/timdoug/libs/cell-libs-3.0-16.ppc64.rpm | cpio -id
    3. ps3:/# rpm2cpio /home/timdoug/libs/cell-libs-devel-3.0-16.ppc.rpm | cpio -id
    4. ps3:/# rpm2cpio /home/timdoug/libs/cell-libs-devel-3.0-16.ppc64.rpm | cpio -id
  4. Debian ships with a 32 bit libspe2, but we need a 64 bit version:
    1. wget http://ftp.debian.org/debian/pool/main/libs/libspe2/libspe2_2.2.80-95.orig.tar.gz
    2. Untar it and add -m64 to CFLAGS in make.defines.
    3. make
  5. Add spufs /spu spufs defaults 0 0 to /etc/fstab.
  6. mkdir /spu && mount /spu
  7. wget "http://www.tu-dresden.de/hpcadm/dcount/dcount.php?package=matmul&get=matmul.tar.gz"
  8. Untar matmul.tar.gz and patch along these lines:
    diff -urN matmul-old/COMPILE.sh matmul/COMPILE.sh
    --- matmul-old/COMPILE.sh	2008-02-29 06:30:08.000000000 -0500
    +++ matmul/COMPILE.sh	2009-06-06 17:12:39.000000000 -0400
    @@ -29,15 +29,15 @@
     ${CELL_BIN}/spu-gcc -o matmul_spu matmul_spu.o matmul_spu_simd.o
     
     # embedd SPE object file into PPE object
    -echo "${CELL_BIN}/ppu-embedspu -m64 matmul_spu matmul_spu matmul_spu-embed64.o"
    -${CELL_BIN}/ppu-embedspu -m64 matmul_spu matmul_spu matmul_spu-embed64.o
    +echo "${CELL_BIN}/embedspu -m64 matmul_spu matmul_spu matmul_spu-embed64.o"
    +${CELL_BIN}/embedspu -m64 matmul_spu matmul_spu matmul_spu-embed64.o
     
     # compile PPE code
    -echo "${CELL_BIN}/ppu-gcc -W -Wall -O3 ${INC_PPU} -c matmul_ppu.c"
    -${CELL_BIN}/ppu-gcc -W -Wall -O3 ${INC_PPU} -c matmul_ppu.c
    +echo "${CELL_BIN}/gcc -m64 -W -Wall -O3 ${INC_PPU} -c matmul_ppu.c"
    +${CELL_BIN}/gcc -m64 -W -Wall -O3 ${INC_PPU} -c matmul_ppu.c
     
     # link SPE and PPE object files together
    -echo "${CELL_BIN}/ppu-gcc -o matmul matmul_ppu.o matmul_spu-embed64.o -lspe2"
    -${CELL_BIN}/ppu-gcc -o matmul matmul_ppu.o matmul_spu-embed64.o -lspe2
    +echo "${CELL_BIN}/gcc -m64 -o matmul matmul_ppu.o matmul_spu-embed64.o -lspe2 -lpthread -L/home/timdoug/libspe2-2.2.80-95"
    +${CELL_BIN}/gcc -m64 -o matmul matmul_ppu.o matmul_spu-embed64.o -lspe2 -lpthread -L/home/timdoug/libspe2-2.2.80-95
     
     rm -f matmul_spu *.o
  9. Open up spu-top in another terminal to check up on things.
  10. ./matmul -s 6 -m 3072
Output:
timdoug@ps3:~/matmul$ ./matmul -s 6 -m 3072

Fast matrix multiplications on Cell (SMP) systems.
Copyright (C) 2007  Daniel Hackenberg, ZIH, TU-Dresden

Running matrix multiplication of 3072x3072 matrices using 6 SPEs...
Initializing arrays with random numbers... done!
Starting SPE calculations...
Done!

Performance results:
Performance of SPE  0: 25.35 GFLOPS
Performance of SPE  1: 25.35 GFLOPS
Performance of SPE  2: 25.36 GFLOPS
Performance of SPE  3: 25.36 GFLOPS
Performance of SPE  4: 25.36 GFLOPS
Performance of SPE  5: 25.36 GFLOPS
Aggregated performance for all 6 SPEs: 152.14 GFLOPS.

PPE-measured performance of matrix multiplication using 6 SPEs: 152.11 GFLOPS.
(of 153.60 GFLOPS theoretical peak at 3200 MHz clock frequency)
152 SP GFLOPS is crazy absurd for $400. The above takes ~7.5 seconds to run, but most of that time goes to initializing the matrices with random values!
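
For the curious, here is a rough sanity check of those numbers. It is only a sketch: it assumes each SPE retires one 4-wide single-precision fused multiply-add per cycle (8 flops per cycle), which the post doesn't state but which matches the 153.60 GFLOPS peak that matmul prints. The function names are made up for illustration.

```shell
#!/bin/sh
# Back-of-the-envelope check of the figures printed by matmul.
# Assumption: 8 single-precision flops per SPE per cycle
# (one 4-wide fused multiply-add each cycle).

# peak_gflops MHZ SPES -> theoretical single-precision peak in GFLOPS
peak_gflops() {
    awk -v mhz="$1" -v spes="$2" 'BEGIN { printf "%.2f\n", mhz * 8 * spes / 1000 }'
}

# efficiency MEASURED PEAK -> percentage of peak actually achieved
efficiency() {
    awk -v m="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * m / p }'
}

# matmul_seconds N GFLOPS -> seconds for one NxN multiply (2*N^3 flops)
matmul_seconds() {
    awk -v n="$1" -v g="$2" 'BEGIN { printf "%.2f\n", 2 * n * n * n / (g * 1e9) }'
}

peak=$(peak_gflops 3200 6)
echo "peak:          $peak GFLOPS"
echo "efficiency:    $(efficiency 152.11 "$peak") %"
echo "multiply time: $(matmul_seconds 3072 152.11) s"
```

With the figures from the run above this gives a peak of 153.60 GFLOPS, about 99.0 % of peak achieved, and roughly 0.38 s for the multiply itself, which fits with nearly all of the ~7.5 seconds going to everything else.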



© 2006-24 timdoug | email: "me" at this domain