another benchmark tool… sigh. just what we needed.

By in monolog on March 4, 2010

another benchmark tool…. sigh. just what we needed.

actually, there can never be enough tools like these. each of them generates some numbers. sometimes there is a graph to go with the number. sometimes the numbers the tool generates will be different on each machine you run it on. and what is weird about that is you can have two machines that are exactly the same yet both will generate different results. so right out of the gate there is a hint of skepticism generated. what numbers do you believe? certainly NOT the numbers in the brochure created by marketing. you need tools to run your own tests, or tests run by disinterested 3rd parties.
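one way to see that spread for yourself is to run the same thing several times and report the range instead of a single number. this is only a minimal sketch: `busy_work` is a made-up stand-in for whatever you actually want to measure, and `time_runs` and `summarize` are names I picked for illustration, not part of any benchmark tool.

```python
import statistics
import time


def time_runs(workload, runs=5):
    """Call workload() `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Describe the spread of the runs rather than a single headline number."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
        # sample standard deviation needs at least two runs
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


def busy_work():
    # hypothetical stand-in workload; swap in the real task you care about
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    print(summarize(time_runs(busy_work)))
```

if min and max are far apart on one machine, a small difference between two machines doesn't tell you much.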

benchmark tools weren’t created to lie to us. but sometimes they are used that way unintentionally. we all know that faster is better, that bigger is better and that smaller and slower are not measures of success. marketing wants to present the best numbers possible. thing is, I’m really not interested in the numbers that marketing has to offer. because what they promise isn’t the same as what I see in my day to day use. what we should avoid is generating “Frame Rate Bragging Rights” and focus on the question “can this tool do the job I need it to do?”

the thing with testing is that it’s subjective. all of us carry some baggage into a test because we already know a few numbers going in: we know the theoretical top speed of a SATA2 drive, the number of triangles the GPU can fling or the clock speed of the CPU. a Windows user may bring opinions about using a Macintosh the same way that a Mac user might be shocked (and unbelieving) that the cheaper PC is actually faster and better. the things we know going in, however, don’t translate into what is really happening.

there is another kind of subjective which is influenced by time of day and where you are. I know from testing colors on displays and printers that what I see at 12PM is very different from what I see at 12AM. our eyes lie to us depending on the kind of light there is. what looks white in one environment will turn pink in another. it’s all about our brain knowing what white is supposed to look like so it corrects what you are seeing to make it so.

having tested and reviewed video cards I know that it’s tough to come up with meaningful commentary about the new thing. if you take the old video card out and replace it with the new kick ass thing, the first thing you notice is NOTHING changed! windows still draw, menus still pull down, it’s as if you wasted $300. the robot scroll test [scroll from the top of a long document to the bottom] gives me an idea of screen refresh but the iWork apps don’t present data that way anymore. it’s more likely you are working from marked up paper going from page to page.

sometimes the experts spout off before they’ve done any actual testing. the recent example is the new integrated nVidia graphics processor found in some of the new Macs. just because it’s integrated doesn’t automatically mean its graphics are going to perform badly. the “marketing” on Apple.com/iMac shows there is an improvement over the last iMac that had a “real” graphics processor. granted that GPU was kind of anemic to begin with. so it’s not really comparing Apples to Apples. even on the PC DIY side the nVidia motherboards with integrated graphics kicked butt compared to Intel’s offering. but everyone “knows” that “integrated sucks” so even though there was an advantage it was automatically dismissed by the “press”, “reviewers” and “geeks” because a graphics card is always better. never mind that it isn’t the case.

to make testing not subjective means that you have to have lots of different ways to test things. copying a file using the Finder and timing it with a stopwatch will get you variation because of the human factor. copying files with a script will get you consistency in the test but who copies files using a script in the real world? using a binary that copies theoretical data is NOT the same as moving a folder that has text, video, Photoshop files, fonts and whatever else is there. if a new GPU core has an advantage that can only be gained with a specific patch, that patch might not exist… yet. there may be unrealized performance. this has been the case since forever. in fact sometimes there is no improvement because of the way the software was programmed. Castle Wolfenstein was frame locked to 60FPS. so even if you had the better X800 card it wasn’t going to frame any faster compared to the 9800. and that pissed off lots of people looking for FPS bragging rights.
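for the scripted side of that trade-off, here is a rough sketch of timing the copy of a real, mixed folder instead of synthetic data. `timed_folder_copy` is a name I made up, and the folder you point it at is up to you. it uses Python's own `shutil.copytree`, which is not the same code path the Finder uses, so treat the numbers as one perspective only.

```python
import shutil
import tempfile
import time
from pathlib import Path


def timed_folder_copy(source, runs=3):
    """Copy the folder `source` into a fresh scratch location `runs` times.

    Returns the wall-clock seconds for each copy. Each scratch copy is
    deleted when its run ends, so the source folder is never modified.
    """
    source = Path(source)
    durations = []
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as scratch:
            target = Path(scratch) / "copy"  # must not exist before copytree
            start = time.perf_counter()
            shutil.copytree(source, target)
            durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # build a tiny mixed folder just so the sketch runs anywhere;
    # in practice point this at a real project folder
    with tempfile.TemporaryDirectory() as demo:
        demo_path = Path(demo)
        (demo_path / "notes.txt").write_text("some text\n" * 1000)
        (demo_path / "blob.bin").write_bytes(bytes(1_000_000))
        print(timed_folder_copy(demo_path))
```

expect the first run to differ from the later ones because the operating system caches recently read files. that's another reason a single number doesn't say much.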

one problem with using Applications for testing is that the results may be revealing something else entirely. I’m surprised that Apple is using Motion as a benchmark. mostly because Motion is far from taxing the GPU. most of Motion’s particle effects are made with just a few triangles. meaning you’ll get the same results on every Mac you try the effect on. Motion was designed this way intentionally. and if you “push Motion to the limit” you are far more likely running the card out of memory instead of running out of triangles or pipelines. which means you are testing pushing stuff across the PCI bus and not taxing the GPU.

be wary of the X claims. I really have to question the 1.X, 2.X, 3.X faster claims and a graph showing the difference. let’s say it did 2FPS before and now it does 4FPS. that’s 2X faster right? and still nowhere near usable. see how this is misleading? and why is frame rate the measure of speed? if you are measuring hard drive speed you should be able to stop at the SATA drive’s top transfer rate. thus a SATA2 drive is 2X faster compared to a SATA drive. yet so many other factors come into play here from the controller, what’s being written and what else is going on.
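the arithmetic behind that 2FPS to 4FPS example is worth spelling out. a small sketch: `compare_frame_rates` and `is_playable` are names invented here, and the 30FPS bar is just an assumed "good enough" target, not a standard.

```python
def compare_frame_rates(old_fps, new_fps):
    """Show the same change as a ratio, an absolute gain, and a time per frame."""
    return {
        "ratio": new_fps / old_fps,          # the "2X" from the brochure
        "fps_gained": new_fps - old_fps,     # what actually changed
        "old_frame_ms": 1000.0 / old_fps,    # how long each frame took before
        "new_frame_ms": 1000.0 / new_fps,    # how long each frame takes now
    }


def is_playable(fps, target_fps=30):
    """Assumed rule of thumb: the result only matters if it clears the target."""
    return fps >= target_fps


if __name__ == "__main__":
    for old, new in [(2, 4), (30, 60)]:
        report = compare_frame_rates(old, new)
        print(old, "->", new, report, "playable:", is_playable(new))
```

both pairs report the same ratio, but only one of them ends up somewhere you could actually work or play.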

the better benchmark is finding a frame rate like 30FPS (or whatever FPS you choose here because we’re looking for a good better best result) that runs on all the Macs. now turn on Anti-Aliased Edges, Fog, Shadows and other effects until 30FPS can no longer be achieved. then SHOW the screen shots. the “faster” GPU should be able to provide a better looking (visual) play experience.
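that procedure can be written down as a small search: keep adding effects until the target frame rate can no longer be held, and report the last level that held. this is only a sketch. the level names and frame rates below are made-up placeholders, not measurements from any Mac, and `best_sustained_level` is a hypothetical helper.

```python
def best_sustained_level(fps_by_level, target_fps=30.0):
    """Return the label of the richest effects level that still holds target_fps.

    `fps_by_level` is a list of (label, measured_fps) pairs ordered from the
    fewest effects to the most. Returns None if even the first level fails.
    """
    best = None
    for label, fps in fps_by_level:
        if fps < target_fps:
            break  # stop at the first level that can't keep up
        best = label
    return best


if __name__ == "__main__":
    # placeholder numbers purely for illustration
    older_gpu = [("base", 58.0), ("+AA", 41.0), ("+fog", 27.0), ("+shadows", 15.0)]
    newer_gpu = [("base", 60.0), ("+AA", 55.0), ("+fog", 44.0), ("+shadows", 31.0)]
    print("older:", best_sustained_level(older_gpu))
    print("newer:", best_sustained_level(newer_gpu))
```

the result is a list of effects you get to leave on, which is a lot easier to show in a screen shot than a frame rate.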

in the case of video I don’t care how fast it goes as long as it can sustain the data rate promised. DV video is data locked to 3.5 megabytes a second. it would have to be a pretty pathetic RAID not to be able to keep up with that. having a faster bigger drive won’t make that video play any faster (that would be weird?). but video that is 720P or 1080P has data rates that are as much as 57X more demanding! do I care about the numbers, or is answering the question “can you sustain that or not?” good enough?
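the "can you sustain it or not" question boils down to a yes/no check against the slowest moment of a run, not the average. a minimal sketch using the two figures from above (3.5 MB/s for DV and up to 57X that for HD): the throughput samples are invented placeholders and `sustains` is a hypothetical helper.

```python
DV_MB_PER_S = 3.5                     # DV data rate quoted above
HD_WORST_CASE_MULTIPLIER = 57         # "as much as 57X" from above
HD_MB_PER_S = DV_MB_PER_S * HD_WORST_CASE_MULTIPLIER


def sustains(throughput_samples_mb_s, required_mb_s):
    """True only if every sampled moment met the required data rate.

    Playback drops frames at the slowest moment, so the minimum matters
    more than the average or the peak.
    """
    return min(throughput_samples_mb_s) >= required_mb_s


if __name__ == "__main__":
    # placeholder per-second throughput readings, not real measurements
    samples = [210.0, 180.0, 220.0]
    print("HD needs about", HD_MB_PER_S, "MB/s")
    print("DV ok:", sustains(samples, DV_MB_PER_S))
    print("HD ok:", sustains(samples, HD_MB_PER_S))
```

note that the placeholder drive averages above the HD requirement and still fails, because one sample dips below it.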

another problem with benching with programs like Modo, Cinema4D or Motion Builder is that the GPU improvements are measured in 1/10th of a second or less. the user may report (or not) that they feel an improvement with a faster card. most of the time this is false. I did a test where I told people “try this, now try that… okay this one has this card, that one has that card. that card is faster. now try this, try that. which one is faster?” everyone picked the “faster” box. the thing was both had the same video card in them. part two of the test did the same thing only this time a faster card was present. half were told the slow card was the fast card and the other half were told the truth. everyone once again picked the machine “with the fast card” as the faster Mac. amazing test. surprise right? everyone always buys into marketing.

finally, a tool like Cinebench only shows the potential of the card when it’s slammed to the wall. but nothing does that in day to day use. games sometimes take advantage of the GPU to the max but most games are written to work on a wide range of Macs not a specific top end model. we’ve seen some paint programs that take advantage of the GPU to do effects. but it’s the user not the GPU that’s the slow component here.

benchmarks are tough things to interpret. sure, the “to the metal tests” are interesting but it’s only one perspective. it’s comparable to saying “this is how fast it can go downhill with a tail wind driven by our best driver.” I want to see tests that use the same function calls for opening, closing, writing and reading files that the app uses, as there are rules that apps have to follow allowing them to share a code base and work on G4, G5 and Intel processors. without the abstraction the programmer would be buried under keeping it straight. thing is, those functions aren’t as fast as writing to the metal. and I’m fine if it’s a robot that’s doing the testing. because it’s just a number in the end that will tell me something about the system in front of me. in the end I really want to see benchmarks that reflect the way Applications work. not just some numbers with a sideline text that says “longer is better” or “shorter is faster”.

the big wrap up of this is that I can make anything slow. you know, load up a few more layers in After Effects, turn on fog, add another ray in the tracer or make 24 simultaneous copies, as in start one, start another, repeat. that always works. the flip side of benchmarks is this reality: “Ohhhhhh, 8 core 3GHz processor with 16G of RAM! we’re typing fast now!”