timdoug's tidbits

2009-06-06

Installing Debian on a PlayStation 3 and running a Cell/SPU "Hello World"

What could be better than flexing a bit of matrix-multiply muscle? We're going to use Daniel Hackenberg's hand-coded asm routines, because they achieve ridiculous performance: within about 1% of the SPEs' theoretical peak, as the results below show.

Follow these instructions to install Debian.
Install some packages: apt-get install build-essential rpm libc6-dev-ppc64 gcc-4.3-spu spu-tools
Grab the following RPMs from CellSDK-Devel-Fedora_3.0.0.1.0.iso and unpack them from the root directory:
1. ps3:/# rpm2cpio /home/timdoug/libs/cell-libs-3.0-16.ppc.rpm | cpio -id
2. ps3:/# rpm2cpio /home/timdoug/libs/cell-libs-3.0-16.ppc64.rpm | cpio -id
3. ps3:/# rpm2cpio /home/timdoug/libs/cell-libs-devel-3.0-16.ppc.rpm | cpio -id
4. ps3:/# rpm2cpio /home/timdoug/libs/cell-libs-devel-3.0-16.ppc64.rpm | cpio -id
Debian ships with a 32-bit libspe2, but we need a 64-bit version:
1. wget http://ftp.debian.org/debian/pool/main/libs/libspe2/libspe2_2.2.80-95.orig.tar.gz
2. Untar it and add -m64 to CFLAGS in make.defines.
3. make
Add spufs /spu spufs defaults 0 0 to /etc/fstab.
mkdir /spu && mount /spu
wget "http://www.tu-dresden.de/hpcadm/dcount/dcount.php?package=matmul&get=matmul.tar.gz"

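Untar matmul.tar.gz. Since we're building natively on the PS3 rather than cross-compiling, patch COMPILE.sh to swap the ppu- prefixed tools for embedspu and a native gcc -m64, and to link against pthreads and the 64-bit libspe2 built above, along these lines: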

diff -urN matmul-old/COMPILE.sh matmul/COMPILE.sh
--- matmul-old/COMPILE.sh	2008-02-29 06:30:08.000000000 -0500
+++ matmul/COMPILE.sh	2009-06-06 17:12:39.000000000 -0400
@@ -29,15 +29,15 @@
 ${CELL_BIN}/spu-gcc -o matmul_spu matmul_spu.o matmul_spu_simd.o
 
 # embedd SPE object file into PPE object
-echo "${CELL_BIN}/ppu-embedspu -m64 matmul_spu matmul_spu matmul_spu-embed64.o"
-${CELL_BIN}/ppu-embedspu -m64 matmul_spu matmul_spu matmul_spu-embed64.o
+echo "${CELL_BIN}/embedspu -m64 matmul_spu matmul_spu matmul_spu-embed64.o"
+${CELL_BIN}/embedspu -m64 matmul_spu matmul_spu matmul_spu-embed64.o
 
 # compile PPE code
-echo "${CELL_BIN}/ppu-gcc -W -Wall -O3 ${INC_PPU} -c matmul_ppu.c"
-${CELL_BIN}/ppu-gcc -W -Wall -O3 ${INC_PPU} -c matmul_ppu.c
+echo "${CELL_BIN}/gcc -m64 -W -Wall -O3 ${INC_PPU} -c matmul_ppu.c"
+${CELL_BIN}/gcc -m64 -W -Wall -O3 ${INC_PPU} -c matmul_ppu.c
 
 # link SPE and PPE object files together
-echo "${CELL_BIN}/ppu-gcc -o matmul matmul_ppu.o matmul_spu-embed64.o -lspe2"
-${CELL_BIN}/ppu-gcc -o matmul matmul_ppu.o matmul_spu-embed64.o -lspe2
+echo "${CELL_BIN}/gcc -m64 -o matmul matmul_ppu.o matmul_spu-embed64.o -lspe2 -lpthread -L/home/timdoug/libspe2-2.2.80-95"
+${CELL_BIN}/gcc -m64 -o matmul matmul_ppu.o matmul_spu-embed64.o -lspe2 -lpthread -L/home/timdoug/libspe2-2.2.80-95
 
 rm -f matmul_spu *.o

Run COMPILE.sh to build, open up spu-top in another terminal to keep an eye on the SPEs, and then run:
./matmul -s 6 -m 3072

Output:

timdoug@ps3:~/matmul$ ./matmul -s 6 -m 3072

Fast matrix multiplications on Cell (SMP) systems.
Copyright (C) 2007  Daniel Hackenberg, ZIH, TU-Dresden

Running matrix multiplication of 3072x3072 matrices using 6 SPEs...
Initializing arrays with random numbers... done!
Starting SPE calculations...
Done!

Performance results:
Performance of SPE  0: 25.35 GFLOPS
Performance of SPE  1: 25.35 GFLOPS
Performance of SPE  2: 25.36 GFLOPS
Performance of SPE  3: 25.36 GFLOPS
Performance of SPE  4: 25.36 GFLOPS
Performance of SPE  5: 25.36 GFLOPS
Aggregated performance for all 6 SPEs: 152.14 GFLOPS.

PPE-measured performance of matrix multiplication using 6 SPEs: 152.11 GFLOPS.
(of 153.60 GFLOPS theoretical peak at 3200 MHz clock frequency)

152 single-precision GFLOPS is crazy absurd for $400. The above takes ~7.5 seconds to run, but most of that time is spent initializing the matrices with random values!
