timdoug's interesting tidbits

started in 2009? ...how unfashionable.

2010-03-02

What CPUs do the Amazon EC2 High-Memory On-Demand Instances use?

I can't vouch for the Extra Large or Double Extra Large ones, but the Quadruple Extra Large ones use Nehalem-based Gainestown Intel Xeon X5550 @ 2.67GHz.

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz
stepping        : 5
cpu MHz         : 2666.760
cache size      : 8192 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca popcnt lahf_lm
bogomips        : 5337.91
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
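
A dump like the one above is just the contents of /proc/cpuinfo. Here is a minimal Python sketch for pulling out only the model names, assuming the usual Linux "key : value" layout (the function name cpu_models is made up for illustration):

```python
from collections import Counter

def cpu_models(cpuinfo_text):
    """Count how many logical CPUs report each 'model name' value."""
    models = Counter()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            # Collapse the runs of padding spaces inside the model string.
            models[" ".join(value.split())] += 1
    return models

# Typical use on a Linux box:
#   with open("/proc/cpuinfo") as f:
#       print(cpu_models(f.read()))
```

On the instance above this should report a single entry for the X5550, once per logical CPU that the guest can see.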

[/hpc] permanent link

The Authoritative List of Packages Needed to Compile the WikiReader Firmware

This is for a 32-bit Ubuntu Karmic schroot (as outlined in doc/Using-schroot.text) on a 64-bit Debian stable installation.

flex
bison
ocaml
python
python-gd
gforth
guile-1.8
gawk
php5-cli
php5-sqlite
sqlite3
xfonts-utils
cjk-latex
dvipng
qt4-qmake
libqt4-dev
sudo
wget

[/wikireader] permanent link

2009-12-28

Using SheepShaver on Mac OS X

Build as listed on the website. Video uses X and audio goes through CoreAudio, so you don't need SDL. A few other bits of information, though:

[/osx] permanent link

2009-12-05

Help! I have no /dev/dsp!

Just modprobe snd_pcm_oss and snd_mixer_oss, and restart udev. Then relive all your old Mac game memories through SheepShaver...

[/debian] permanent link

2009-12-01

How to type accents in X

You'll want to use something called the Compose Key. Good instructions on this site, but here's a quick installation summary:

Mmm, háčeky.

[/debian] permanent link

2009-11-02

My Terminal Emulator Setup in Debian

On my Debian boxes I use evilwm as my window manager, because it gets out of my way in just the way I like. Hence, I don't use gnome-terminal or kterm or ETerm or what have you; xterm has worked perfectly fine.

Nevertheless, I frequently spawn a lot of terminals, and the 5MB that each xterm process eats up (not including the 3MB bash does) adds up after I fill a few virtual desktops with terminals. Xterm is also quite slow: cat'ing a 15K line banner output takes ~4 seconds on my netbook.

Hence, I went looking for replacements. Rxvt takes ~2MB per instance and cats the same file in ~1.2 seconds. Mrxvt takes ~3MB and ~0.5 seconds. Rxvt-unicode takes the cake, though, at ~3 MB and ~0.3 seconds. The real kicker, though, is its "daemon" mode, in which only one process is spawned for all "client" terminals, saving a lot of RAM. (Just hope it doesn't crash!)

Here's my setup:

  1. apt-get install rxvt-unicode-lite
  2. ~/.Xresources: (for a nice black-on-white xterm-like experience)
    URxvt*scrollBar: false
    URxvt*reverseVideo: true
  3. sudo update-alternatives --config x-terminal-emulator and select urxvtcd
Now I save 4-5MB of RAM for every terminal I have open, and when that gets to ~10 terminals that's not an insignificant amount of memory on a machine with 1GB of memory (and even more important on my lower-spec'd machines).

[/debian] permanent link

2009-10-18

How to set up the QT4 WikiReader simulator in Debian unstable

  1. Certain build scripts require Python 2.6, so find some way to install it (Debian experimental, by source, etc.)
  2. apt-get install libqt4-dev qt4-qmake netpbm gforth build-essential php5-cli sqlite3
  3. git clone git://github.com/wikireader/wikireader.git
  4. cd wikireader/samo-lib/include
  5. Copy config.h-default to config.h and uncomment the SIMULATOR #define.
  6. make qt4-simulator
  7. Breakage will occur. Be merciless with hackage! I had to change a few lines of C and asm when functions weren't defined, etc.
  8. mkdir image work
  9. make DESTDIR=image WORKDIR=work XML_FILES=xml-file-samples/japanese_architects.xml index parse render combine (per doc/QuickStart)
  10. make DESTDIR=image install
  11. cp host-tools/qt4-simulator/bin/wikisim image/
  12. cd image
  13. ./wikisim
  14. It will segfault soon enough, but you'll hopefully be able to get in a bit of reading on Japanese architects. Good luck!

[/wikireader] permanent link

2009-09-15

Installing SheepShaver and BasiliskII on Debian

I've tested SheepShaver with Mac OS 7.6.1 and Mac OS 8.1 with a 4MB OldWorld ROM. Sound works (even though ESD isn't used), video works, haven't tried ethernet. (all done on my current Debian unstable Intel Atom netbook.) The same instructions work for BasiliskII.

  1. Install the following packages: build-essential autoconf2.59 automake1.4 libxt-dev libxext-dev libxxf86dga-dev libxxf86vm-dev
  2. Follow the instructions here to download the source from CVS, and make links as directed.
  3. Patch autogen.sh in src/Unix thusly:
    --- old/SheepShaver/src/Unix/autogen.sh 2007-06-13 14:09:05.000000000 +0200
    +++ ss2/SheepShaver/src/Unix/autogen.sh 2009-09-15 12:21:32.000000000 +0200
    @@ -40,13 +40,13 @@
     
     aclocalinclude="$ACLOCAL_FLAGS"; \
     (echo $_echo_n " + Running aclocal: $_echo_c"; \
    -    aclocal $aclocalinclude; \
    +    aclocal-1.4 $aclocalinclude; \
      echo "done.") && \
     (echo $_echo_n " + Running autoheader: $_echo_c"; \
    -    autoheader; \
    +    autoheader2.59; \
      echo "done.") && \
     (echo $_echo_n " + Running autoconf: $_echo_c"; \
    -    autoconf; \
    +    autoconf2.59; \
      echo "done.") 
     
     rm -f config.cache
  4. Run the patched autogen.sh. The output should look something like this:
    SheepShaver configuration summary:
    
    SDL support ...................... : none
    FBDev DGA support ................ : yes
    XFree86 DGA support .............. : yes
    XFree86 VidMode support .......... : yes
    Using PowerPC emulator ........... : yes
    Enable JIT compiler .............. : yes
    Enable video on SEGV signals ..... : yes
    ESD sound support ................ : no
    GTK user interface ............... : no
    mon debugger support ............. : no
    Addressing mode .................. : real
    Bad memory access recovery type .. : siginfo
    
    Configuration done. Now type "make".
  5. Patch sys_unix.cpp in src/Unix thusly:
    --- sys_unix.cpp.old    2009-09-15 12:28:54.000000000 +0200
    +++ sys_unix.cpp        2009-09-15 12:29:06.000000000 +0200
    @@ -883,11 +883,6 @@
                            }
                    }
     #endif
    -#ifdef CDROM_DRIVE_STATUS
    -               if (fh->cdrom_cap & CDC_DRIVE_STATUS) {
    -                       return ioctl(fh->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK;
    -               }
    -#endif
                    cdrom_tochdr header;
                    return ioctl(fh->fd, CDROMREADTOCHDR, &header) == 0;
     #elif defined(__FreeBSD__) || defined(__NetBSD__)
  6. Make.
  7. If SheepShaver gets angry about ERROR: Cannot map Low Memory Globals: Permission denied. try: sudo sysctl -w vm.mmap_min_addr=0
To be continued...

[/debian] permanent link

2009-08-25

Yet Another MSI Wind Hackintosh Tutorial

Just contributing my method. General outline: install OSX on an external drive from a MacBook, then copy the installation to the internal Wind drive.

Requirements:

Instructions:
  1. Boot the MacBook from the Leopard DVD.
  2. Partition the external drive, with a GUID scheme.
  3. Reboot and install the 10.5.7 update (newest 10.5.8 doesn't have good support at time of writing), reboot.
  4. Install CyberGreg's driver pack.
  5. Install Chameleon on the external drive.
  6. Boot the Wind from the external drive.
  7. Use SuperDuper to copy the OSX installation to the internal drive.
  8. Run Chameleon again on the internal drive.
Hurrah!

[/osx] permanent link

2009-08-21

When bit twiddling, make sure to use unsigned types.

This public service announcement was brought to you by the "Why The Fuck Don't My Bit Shifts Work" Department.

Goodness, that was amateur.
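
The gotcha can be reproduced without a C compiler. This is a small Python sketch, assuming the bug was the usual one where right-shifting a negative signed value drags the sign bit along. Python ints are signed and shift arithmetically, so masking to 32 bits stands in for a cast to an unsigned 32-bit type here (the helper as_u32 is a made-up name):

```python
MASK32 = 0xFFFFFFFF

def as_u32(x):
    """Reinterpret a Python int as an unsigned 32-bit value (two's complement)."""
    return x & MASK32

value = -16  # bit pattern 0xFFFFFFF0 in 32-bit two's complement

signed_shift = value >> 4            # arithmetic shift: the sign is preserved
unsigned_shift = as_u32(value) >> 4  # logical shift: zeros come in from the left

print(signed_shift)         # -1
print(hex(unsigned_shift))  # 0xfffffff
```

Same bits going in, very different numbers coming out, which is the whole reason to reach for unsigned types first.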

[/coding] permanent link

2009-08-09

GMABooster segfaults on my machine!

Run it as root.

[/debian] permanent link

2009-07-22

Benchmarking Intel's UXA, round 2

Debian unstable just upgraded to xserver-xorg-video-intel version 2.8.0. (released two days ago -- who says Debian is out of date?) This release is notable in that it removes support for EXA and DRI1, leaving only UXA and DRI2 for acceleration. Considering my previous benchmarking attempts, I thought it was only right to check the new drivers.

UXA on 2.8 is still slower than EXA on 2.7, but not by much. Notably, it's much faster than UXA on 2.7. I'm sure further releases will only lower these numbers further.

[/debian] permanent link

2009-07-21

Installing Grub 2 on Debian sid

Sid has recently moved grub to grub-legacy and made grub 2 the default. My installation went fine -- all I had to do was remove grub-legacy (because of some conflicting man pages) and install os-prober and recreate grub.cfg to recognize my WinXP partition.

Hooray new packages in Debian!

[/debian] permanent link

2009-07-16

Benchmarking Intel's UXA vs EXA

There's been a lot of development with regard to Intel's new UXA acceleration framework for its GMA integrated graphics chips. Benchmarks over at Phoronix have shown mixed results, so I decided to run some tests myself.

Tests run on an Asus Eee PC 900HA, with a 1.6GHz Atom N270 processor and integrated GMA 950 graphics, gtkperf 0.40, default Debian kernel 2.6.30, Xorg 1.6.2, and Intel driver 2.7.1. (i.e., an up to date unstable installation.)

So it looks like I'm sticking with EXA for now. If you have any tips on how to boost either of these numbers, do let me know.

[/debian] permanent link

2009-06-23

How to create a custom Debian package

I'm nowhere close to having the knowledge of a true Debian Developer, but I've used the OS enough to want to create my own packages. In this circumstance, I wanted a newer version of evilwm (it's 1.0.0 in the repo, and 1.0.1 is the newest release) and I wanted to change the mouse button behaviour. The steps I followed (I'm sure there's a cleaner way -- do inform me):

  1. apt-get install devscripts build-essential
  2. apt-get build-dep evilwm
  3. apt-get source evilwm
  4. Patch/hack away...
  5. debchange -n and document your non-maintainer changes
  6. debuild -us -uc so we don't try to sign it
  7. dpkg -i the resultant package
Cheers!

[/debian] permanent link

2009-06-22

Getting an Elantech touchpad working in Debian squeeze (or: how to disable tap-to-click on an Eee 900HA)

Tap to click on a trackpad is a terrible idea. When I want to click, I'll hit the damn button myself, thank you very much. My ratio of unwanted to wanted tap-to-clicks is understandably undefined.

Three systems need to be configured for this to work: the kernel, the Xorg driver, and the X server itself.

  1. The kernel
    As documented here, recent kernels don't need to be patched; they just need CONFIG_MOUSE_PS2_ELANTECH enabled. Sadly, because of other issues the maintainers know about, it won't be enabled in mainline Debian kernels anytime soon. So either you'll have to build a kernel yourself, or just take the psmouse.ko I've created from 2.6.30, and pop it in the appropriate location in /lib/modules.
  2. The Xorg driver
    Debian testing ("squeeze" at time of writing) has an old version of xserver-xorg-input-synaptics that doesn't have the necessary Elantech patches. Unstable does, but pinning the binary has other dependencies that would pretty much require upgrading all of X, which I'd rather not do. So you can either pin the source file and build it yourself, or download and install the package I've made. Grab it here.
  3. The X server
    We just need to poke around xorg.conf a bit. What worked for me:
    Section "InputDevice"
            Identifier      "Configured Mouse"
            Option          "CorePointer"
            Driver          "synaptics"
            Option          "Device"                "/dev/input/event10"
            Option          "Protocol"              "auto-dev"
            Option          "SHMConfig"             "true"
            Option          "MaxTapTime"            "0"
            Option          "VertTwoFingerScroll"   "1"
            Option          "HorizTwoFingerScroll"  "1"
    EndSection
Good luck!

[/debian] permanent link

Running Debian testing on an Eee PC 900HA

Great news: almost everything works out of the box. I was concerned about hardware support, and struggling with wireless, so I just apt-pinned the latest kernel from unstable, and it works perfectly.

  1. X11 at 1024x600, DRI
  2. Sound
  3. Ethernet
  4. Touchpad w/ two-finger scrolling (although with annoying-as-hell tap to click -- more on that later)
  5. Wireless
  6. Camera (w/ Skype)
  7. Suspend to RAM
Protip: make sure to modprobe acpi-cpufreq for frequency scaling.

I'm using evilwm as my window manager, because I love how it gets out of my way and lets me do my work. I'd post a screenshot, but there's no eye-candy to show off... =) I've also had to hack Firefox into shape so it doesn't take up silly amounts of vertical pixels.

[/debian] permanent link

2009-06-18

How to Install Debian on an Eee PC 900HA

I just picked up an Asus Eee PC 900HA. It, currently, has the best mix of size, cost, and capabilities for what I want in a netbook. I'm keeping Windows on it to use Ableton Live and for "just in case" purposes. Here's documentation of what I've done to re-partition, keeping the provided Windows installation, and install Debian testing.

  1. Download and repartition using a GParted bootable USB drive. I used the Windows-based choice 3.
  2. Make sure to set the USB disk as higher priority than the internal drive, or else it'll boot from the HD every time.
  3. My drive came partitioned into four partitions:
    1. Windows boot (~80 GB)
    2. Windows free (empty, ~60 GB)
    3. Windows restore partition (~8 GB)
    4. Extra restore info? (~40 MB)
    You should be safe deleting all but the first partition and going from there, but it's not as if I'm wanting for space on this netbook, so I let 'em stay.
  4. Follow the instructions on this page of the Debian installation manual.
To be continued...

[/debian] permanent link

2009-06-17

Does Ableton Live work on an Eee PC 900HA?

Very much so. With two decks, equalizers, and a few other effects, CPU usage doesn't appear to go higher than ~20% or so. This may just become my new gig rig!

Just make sure to install QuickTime if MP3s are at all part of your setup. Cheers.

[/music] permanent link

How to configure and make gcc with GRAPHITE

Done on a standard Debian installation with a sane toolchain installed.

  1. Download and untar libgmp, configure with ./configure --enable-cxx, and install.
  2. Download, untar, patch (according to their website), configure, and install libmpfr.
  3. You know the drill by now: ppl.
  4. apt-get install automake1.7 (for cloog-ppl)
  5. Who said we were anywhere close to done? cloog-ppl. Its configure script is really picky, though; I had to explicitly use --with-gmp=dir and --with-ppl=dir.
  6. Grab gcc-4.4.0, untar, and configure. These are the options I used (following Debian's gcc -v lead): $ ../gcc-4.4.0/configure --enable-languages=c,c++,fortran --prefix=/home/timdoug/local/ --enable-shared --with-system-zlib --without-included-gettext --enable-threads=posix --with-gmp=/home/timdoug/local/ --with-mpfr=/home/timdoug/local/ --enable-mpfr --with-ppl=/home/timdoug/local/ --with-cloog=/home/timdoug/local/ (note the different object directory!)
  7. The configure script should spit out something like this, amongst its output:
    checking for correct version of gmp.h... yes
    checking for correct version of mpfr.h... yes
    checking for version 0.10 of PPL... yes
    checking for correct version of CLooG... yes
  8. make, and grab a sandwich.
  9. Curse profusely when it doesn't compile.
To be continued...

[/gcc] permanent link

2009-06-06

Installing Debian on a PlayStation 3 and running a Cell/SPU "Hello World"

What could be better than flexing a bit of matrix-multiply muscle? We're going to use Daniel Hackenberg's hand-coded asm routines, because they achieve ridiculous performance.

  1. Follow these instructions.
  2. Install some packages: apt-get install build-essential rpm libc6-dev-ppc64 gcc-4.3-spu spu-tools
  3. Grab the following RPMs off of CellSDK-Devel-Fedora_3.0.0.1.0.iso.
    1. ps3:/# rpm2cpio /home/timdoug/libs/cell-libs-3.0-16.ppc.rpm | cpio -id
    2. ps3:/# rpm2cpio /home/timdoug/libs/cell-libs-3.0-16.ppc64.rpm | cpio -id
    3. ps3:/# rpm2cpio /home/timdoug/libs/cell-libs-devel-3.0-16.ppc.rpm | cpio -id
    4. ps3:/# rpm2cpio /home/timdoug/libs/cell-libs-devel-3.0-16.ppc64.rpm | cpio -id
  4. Debian ships with a 32 bit libspe2, but we need a 64 bit version:
    1. wget http://ftp.debian.org/debian/pool/main/libs/libspe2/libspe2_2.2.80-95.orig.tar.gz
    2. Add -m64 to CFLAGS in make.defines.
    3. make
  5. Add spufs /spu spufs defaults 0 0 to /etc/fstab.
  6. mkdir /spu && mount /spu
  7. wget "http://www.tu-dresden.de/hpcadm/dcount/dcount.php?package=matmul&get=matmul.tar.gz"
  8. Untar matmul.tar.gz and patch along these lines:
    diff -urN matmul-old/COMPILE.sh matmul/COMPILE.sh
    --- matmul-old/COMPILE.sh	2008-02-29 06:30:08.000000000 -0500
    +++ matmul/COMPILE.sh	2009-06-06 17:12:39.000000000 -0400
    @@ -29,15 +29,15 @@
     ${CELL_BIN}/spu-gcc -o matmul_spu matmul_spu.o matmul_spu_simd.o
     
     # embedd SPE object file into PPE object
    -echo "${CELL_BIN}/ppu-embedspu -m64 matmul_spu matmul_spu matmul_spu-embed64.o"
    -${CELL_BIN}/ppu-embedspu -m64 matmul_spu matmul_spu matmul_spu-embed64.o
    +echo "${CELL_BIN}/embedspu -m64 matmul_spu matmul_spu matmul_spu-embed64.o"
    +${CELL_BIN}/embedspu -m64 matmul_spu matmul_spu matmul_spu-embed64.o
     
     # compile PPE code
    -echo "${CELL_BIN}/ppu-gcc -W -Wall -O3 ${INC_PPU} -c matmul_ppu.c"
    -${CELL_BIN}/ppu-gcc -W -Wall -O3 ${INC_PPU} -c matmul_ppu.c
    +echo "${CELL_BIN}/gcc -m64 -W -Wall -O3 ${INC_PPU} -c matmul_ppu.c"
    +${CELL_BIN}/gcc -m64 -W -Wall -O3 ${INC_PPU} -c matmul_ppu.c
     
     # link SPE and PPE object files together
    -echo "${CELL_BIN}/ppu-gcc -o matmul matmul_ppu.o matmul_spu-embed64.o -lspe2"
    -${CELL_BIN}/ppu-gcc -o matmul matmul_ppu.o matmul_spu-embed64.o -lspe2
    +echo "${CELL_BIN}/gcc -m64 -o matmul matmul_ppu.o matmul_spu-embed64.o -lspe2 -lpthread -L/home/timdoug/libspe2-2.2.80-95"
    +${CELL_BIN}/gcc -m64 -o matmul matmul_ppu.o matmul_spu-embed64.o -lspe2 -lpthread -L/home/timdoug/libspe2-2.2.80-95
     
     rm -f matmul_spu *.o
  9. Open up spu-top in another terminal to check up on things.
  10. ./matmul -s 6 -m 3072
Output:
timdoug@ps3:~/matmul$ ./matmul -s 6 -m 3072

Fast matrix multiplications on Cell (SMP) systems.
Copyright (C) 2007  Daniel Hackenberg, ZIH, TU-Dresden

Running matrix multiplication of 3072x3072 matrices using 6 SPEs...
Initializing arrays with random numbers... done!
Starting SPE calculations...
Done!

Performance results:
Performance of SPE  0: 25.35 GFLOPS
Performance of SPE  1: 25.35 GFLOPS
Performance of SPE  2: 25.36 GFLOPS
Performance of SPE  3: 25.36 GFLOPS
Performance of SPE  4: 25.36 GFLOPS
Performance of SPE  5: 25.36 GFLOPS
Aggregated performance for all 6 SPEs: 152.14 GFLOPS.

PPE-measured performance of matrix multiplication using 6 SPEs: 152.11 GFLOPS.
(of 153.60 GFLOPS theoretical peak at 3200 MHz clock frequency)
152 SP GFLOPS is crazy absurd for $400. The above takes ~7.5 seconds to run, but most of the time is in initializing the matrix with random values!
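
A quick back-of-the-envelope check of those numbers, as a Python sketch. It rests on two assumptions that matmul itself does not state: the usual 2*n^3 operation count for a dense n x n matrix multiply, and 8 single-precision flops per SPE per cycle (a 4-wide fused multiply-add), which is what makes the 153.60 GFLOPS peak in the output come out.

```python
n = 3072             # matrix dimension, from the -m flag
spes = 6             # number of SPEs, from the -s flag
clock_hz = 3.2e9     # 3200 MHz, as reported in the output
measured = 152.11e9  # PPE-measured flops per second, from the output

# Assumption: 4-wide single-precision FMA per cycle = 8 flops per SPE per cycle.
flops_per_cycle = 8
peak = spes * flops_per_cycle * clock_hz

# Assumption: a dense n x n multiply costs about 2 * n^3 floating-point operations.
total_flops = 2 * n ** 3
multiply_seconds = total_flops / measured

print(f"theoretical peak: {peak / 1e9:.2f} GFLOPS")
print(f"efficiency:       {measured / peak:.1%}")
print(f"multiply time:    {multiply_seconds:.2f} s")
```

Under those assumptions the multiply itself takes well under half a second, which fits with nearly all of the ~7.5 seconds going to setup.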

[/ps3] permanent link

2009-06-03

Help! I'm running out of stack space!

Edit /etc/security/limits.conf, open up a new shell, and have fun with ulimit.
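
To see what a process actually ends up with, Python's standard resource module reads the same limit that ulimit -s reports. A small sketch (Unix only; the function name stack_limit is my own):

```python
import resource

def stack_limit():
    """Return (soft, hard) stack limits in bytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    unlimited = resource.RLIM_INFINITY
    return (None if soft == unlimited else soft,
            None if hard == unlimited else hard)

soft, hard = stack_limit()
print("soft:", "unlimited" if soft is None else f"{soft // 1024} KB")
print("hard:", "unlimited" if hard is None else f"{hard // 1024} KB")
```

Run it from a freshly opened shell after editing limits.conf to confirm the new values took effect.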

[/debian] permanent link

Fun summer projects

So. With a summer ahead of me, there are a few things I want to get into now that I have the time. A list:

[/general] permanent link

Reason N why I love Python

Code like this: ",".join(nodes_list[:num_nodes])
Objects, strings (finally sane in 3.0), lists, list slices...
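
A runnable version of that line, with hypothetical hostnames standing in for the real node list (which isn't shown here):

```python
# Hypothetical hostnames; the original list is not shown.
nodes_list = ["node01", "node02", "node03", "node04"]
num_nodes = 2

# Slice off the first num_nodes entries and glue them together with commas.
hosts = ",".join(nodes_list[:num_nodes])
print(hosts)  # node01,node02
```

The slice quietly copes with num_nodes being larger than the list, so there is no bounds checking to write.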

[/python] permanent link

2009-06-02

Resetting the root passwd on a PS3 Debian installation

I conveniently forgot the password for my year-old PS3 installation, but conveniently recognized that kboot is a little Linux installation in and of itself. Hence, mounting /dev/ps3da1, chrooting into it, and running passwd does the trick.

[/ps3] permanent link

2009-06-01

How to install rsh on Debian

Debian is aptly (hah!) fickle with regard to rsh -- by default, if you have OpenSSH installed, rsh is just a symlink to it. For my research I need rsh in order to get around the overhead that ssh incurs; security isn't an issue because the machines are on a private network anyway.

  1. apt-get install rsh-redone-client rsh-redone-server
  2. Add appropriate hostnames to ~/.rhosts
  3. Edit /etc/hosts.deny and add in.rshd in.rlogind: ALL
  4. Edit /etc/hosts.allow and add in.rshd in.rlogind: [hostnames] as necessary.
Have (insecure) fun!

[/debian] permanent link

High-Performance Computing and the PlayStation 3

I purchased a PS3 last summer with the express intent of running some simulations and benchmarks (and playing Gran Turismo 5 Prologue), but never got around to it (the first part, that is).

A few interesting links and PDFs:

I'm intrigued by CUDA as well. We'll see where that goes.

[/hpc] permanent link

How to Install and Run HPL with GotoBLAS on Linux

GFLOPs are fun. Here's how to determine your cluster's performance.

  1. (make sure you have a proper build system: gcc, make, etc. Debian: apt-get install build-essential)
  2. Install OpenMPI and gfortran. On Debian, it's as simple as apt-get install gfortran libopenmpi-dev openmpi-bin.
  3. Download and compile GotoBLAS. It's generally lauded as the fastest BLAS for (at least) x86 machines. Compilation is a simple ./quickbuild.64bit (or 32, as appropriate) in the root directory of the tarball.
  4. Download and untar HPL. I used 1.0a -- 2.0 wouldn't compile for me.
  5. Create a Make.[arch] in the root dir of the hpl folder, and configure accordingly. Appropriate example diff against setup/Make.Linux_PII_CBLAS:
    --- setup/Make.Linux_PII_CBLAS	2004-01-22 00:13:11.000000000 -0500
    +++ Make.timdoug	2009-06-01 00:23:29.000000000 -0400
    @@ -61,7 +61,7 @@
     # - Platform identifier ------------------------------------------------
     # ----------------------------------------------------------------------
     #
    -ARCH         = Linux_PII_CBLAS
    +ARCH         = timdoug
     #
     # ----------------------------------------------------------------------
     # - HPL Directory Structure / HPL library ------------------------------
    @@ -81,9 +81,9 @@
     # header files,  MPlib  is defined  to be the name of  the library to be
     # used. The variable MPdir is only used for defining MPinc and MPlib.
     #
    -MPdir        = /usr/local/mpi
    -MPinc        = -I$(MPdir)/include
    -MPlib        = $(MPdir)/lib/libmpich.a
    +MPdir        = /usr
    +MPinc        = -I$(MPdir)/include/mpi
    +MPlib        = -lmpi
     #
     # ----------------------------------------------------------------------
     # - Linear Algebra library (BLAS or VSIPL) -----------------------------
    @@ -92,9 +92,9 @@
     # header files,  LAlib  is defined  to be the name of  the library to be
     # used. The variable LAdir is only used for defining LAinc and LAlib.
     #
    -LAdir        = $(HOME)/netlib/ARCHIVES/Linux_PII
    +LAdir        =
     LAinc        =
    -LAlib        = $(LAdir)/libcblas.a $(LAdir)/libatlas.a
    +LAlib        = ~/GotoBLAS/libgoto.a
     #
     # ----------------------------------------------------------------------
     # - F77 / C interface --------------------------------------------------
    @@ -156,7 +156,7 @@
     #    *) call the BLAS Fortran 77 interface,
     #    *) not display detailed timing information.
     #
    -HPL_OPTS     = -DHPL_CALL_CBLAS
    +HPL_OPTS     = 
     #
     # ----------------------------------------------------------------------
     #
    @@ -173,7 +173,7 @@
     # On some platforms,  it is necessary  to use the Fortran linker to find
     # the Fortran internals used in the BLAS library.
     #
    -LINKER       = /usr/bin/g77
    +LINKER       = /usr/bin/gcc
     LINKFLAGS    = $(CCFLAGS)
     #
     ARCHIVER     = ar
  6. In hpl/bin/[arch], tweak HPL.dat. Example file:
    HPLinpack benchmark input file
    Innovative Computing Laboratory, University of Tennessee
    HPL.out      output file name (if any)
    6            device out (6=stdout,7=stderr,file)
    1            # of problems sizes (N)
    16384        Ns
    1            # of NBs
    128          NBs
    0            PMAP process mapping (0=Row-,1=Column-major)
    1            # of process grids (P x Q)
    1            Ps
    2            Qs
    16.0         threshold
    1            # of panel fact
    2            PFACTs (0=left, 1=Crout, 2=Right)
    1            # of recursive stopping criterium
    4            NBMINs (>= 1)
    1            # of panels in recursion
    2            NDIVs
    1            # of recursive panel fact.
    2            RFACTs (0=left, 1=Crout, 2=Right)
    1            # of broadcast
    1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
    1            # of lookahead depth
    0            DEPTHs (>=0)
    0            SWAP (0=bin-exch,1=long,2=mix)
    64           swapping threshold
    0            L1 in (0=transposed,1=no-transposed) form
    0            U  in (0=transposed,1=no-transposed) form
    1            Equilibration (0=no,1=yes)
    8            memory alignment in double (> 0)
    This is appropriate for a dual-core, 4GB RAM system. Important values to change:
    • N -- problem size. Start at ~1000 and ramp up until you hit the limit of your RAM. Note: it's quadratic (dimension of the matrices).
    • NB -- block size. 128 works well for me. Others suggest 80, 160, or 256. Experiment.
    • P and Q -- these multiplied should be the number of cores in your cluster. Certain configurations work better than others; do test.
    HPL.dat tweaking is a kind of black magic -- check the tubes for further information.
  7. Run GOTO_NUM_THREADS=1 mpiexec -np [num processes] ./xhpl
Using these instructions, I achieved 5.5 GFLOPs per core and 10 GFLOPs in total on a Pentium D 930 machine, 9.5 GFLOPs per core and 18 GFLOPs on a Core 2 Duo E6700 processor, and 50.6 GFLOPs on an Amazon EC2 "High-CPU Extra Large Instance" (dual quad-core Xeon E5345s):
domU:~/hpl/bin/timdoug# GOTO_NUM_THREADS=1 mpiexec -np 8 ./xhpl
============================================================================
HPLinpack 1.0a  --  High-Performance Linpack benchmark  --   January 20, 2004
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
============================================================================
[[[snip]]]
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR01L2C4       20000   128     2     4             105.33          5.064e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0067492 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0065323 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0012449 ...... PASSED
Hooray!
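
Two bits of arithmetic help when picking N and reading the result, sketched here in Python. The matrix is N x N doubles, so it needs about 8*N^2 bytes. The 80% fill fraction below is a commonly quoted rule of thumb, not something HPL mandates, and both function names are made up. The operation count (2/3*N^3 + 3/2*N^2) is, as far as I know, the one HPL uses when it reports Gflops.

```python
import math

def max_problem_size(ram_bytes, fraction=0.8, nb=128):
    """Largest N (rounded down to a multiple of NB) whose N x N matrix of
    8-byte doubles fits in the given fraction of RAM."""
    n = int(math.sqrt(ram_bytes * fraction / 8))
    return n - n % nb

def hpl_gflops(n, seconds):
    """Gflops for an N x N LU solve, using the 2/3*N^3 + 3/2*N^2 count."""
    flops = (2.0 / 3.0) * n ** 3 + 1.5 * n ** 2
    return flops / seconds / 1e9

print(max_problem_size(4 * 1024 ** 3))      # upper bound for a 4GB machine
print(round(hpl_gflops(20000, 105.33), 2))  # the EC2 run above
```

Plugging in N=20000 and 105.33 seconds reproduces the 5.064e+01 figure from the run above, which is a decent sanity check on the formula. Treat the first number as a ceiling to work up to, not a starting point.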

[/hpc] permanent link


© 2006-9 timdoug. | email: timdoug at google's email service.