Ben J. Christensen

Software Development and Other Random Stuff

Statistics for Why Web Performance Matters

The following posts offer further evidence of how webpage and webapp responsiveness affects the user experience, the amount of usage and ultimately revenue.

A good introductory quote for the external links and images below:

There’s no longer any debate. There’s reliable, reproducible evidence that web page latency is directly tied to the bottom line. At Velocity, Microsoft, Google and Shopzilla made this abundantly clear in a series of awesome presentations: detailed, controlled testing proves that slower pages hurt the bottom line. In Google’s case, adding delay reduces the average number of searches a visitor does each day even after the delay is removed.

Regarding “smaller scale” sites more typical than Google and Bing:

The results of their analysis show how significant a reduction in page latency can be. In addition to reducing bounce rates, and increasing pages per visit & time on site, they found a 16.07% increase in conversion rates and a 5.50% increase in average order value. (Source: http://radar.oreilly.com/2009/10/watching-websites.html)
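To get a feel for what those two percentages mean together, here is a rough sketch in Python. It assumes that revenue per visit is simply conversion rate multiplied by average order value, and that both increases apply to the same set of visitors. Neither assumption comes from the quoted post, and the function name is made up for illustration.

```python
def revenue_per_visit_uplift(conversion_uplift: float, order_value_uplift: float) -> float:
    """Combined relative change in revenue per visit.

    Assumption (not from the quoted post): revenue per visit equals
    conversion rate * average order value, so the two uplifts multiply.
    """
    return (1 + conversion_uplift) * (1 + order_value_uplift) - 1


if __name__ == "__main__":
    # 16.07% more conversions and 5.50% larger orders, as quoted above.
    uplift = revenue_per_visit_uplift(0.1607, 0.0550)
    print(f"Estimated revenue per visit uplift: {uplift:.2%}")
```

Under those assumptions the two effects compound to a bit more than 22% extra revenue per visit, which is more than either number suggests on its own.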

External Links:

http://radar.oreilly.com/2009/10/watching-websites.html
http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html
http://www.watchingwebsites.com/archives/proof-that-speeding-up-websites-improves-online-business

Good summary PDF showing metrics of performance impact.

[Image: conversion rate and order value]

[Image: Bing delay impact]

Filed under: Performance, Production, User Interface

Speed of Thought

I’ve focused on performance for several years in my server-side and web application development, as much as I’ve been able to fit into the timelines. It has involved digging into minute details of Java and JVM tuning that most Java developers rarely explore (from what I can tell, anyway) and tuning the CSS, images, caching, GZIP and other front-end settings. It has generally paid off. Today my team operates servers processing millions of complex, dynamic, uncacheable web service transactions that complete in around 250ms each on average (server side, not including network transport to the client). I believe that with further investment we could improve that even more.

I have read comments from companies such as Google and Amazon about how the performance of an application can dramatically affect how much people use it. I agree. The slightest friction in searching makes me search less, shop less, and so on.

This past week I’ve been using the new iPhone 3GS, which is at least 2x faster than the iPhone 2G I had before. In some cases it’s 4x to 6x faster.

I already used the iPhone a lot. The increase in speed has further reduced the “friction” of use to the point that if I even have a thought of quickly looking something up or performing some other action, I am much more likely to do it.

On my last iPhone, I consciously chose not to bother at certain times because of the time it would take. Yes, I’m talking in seconds and even milliseconds here, but when it’s a “thought”, if the tool doesn’t work at the same speed, then it’s friction. The same goes for another application I use, which involves looking up reference materials and documents. Before, I more or less had to avoid “flipping around” while someone was referring to things. It was actually faster to use the paper documents. Now, with my iPhone, I can keep up with the paper version’s “users” or even be faster than them. That encourages use.

The new user experience of using the iPhone 3GS, so significantly improved just by the performance improvement, has reminded me as a developer and architect how critical it is to design, plan for and develop to achieve high performance. Functionality isn’t enough — we should be aiming for the “speed of thought”.

Interestingly, Google has just launched a new site dedicated to “speeding up the web”.

The following video shows “the experts” talking about how the human mind perceives changes of 100ms (one tenth of a second).

It’s my belief that this isn’t just a “nice to have” feature. If a product, service or application wants to be adopted and deemed “necessary” by its users, its performance must reduce friction as much as technically feasible to the point where it approaches or achieves “speed of thought”.

Filed under: Architecture, Performance, Production, User Interface

Uptime & Availability

Pasted below is a good blog entry about the cost/benefit of actually achieving 99.999% uptime, and how ludicrous it is for most to even attempt it, let alone claim it.

http://blog.amber.org/2008/07/21/understanding-availability/

With the recent Amazon S3 outage of approximately 8 hours, there are a lot of people blowing a lot of energy on lambasting Amazon for the downtime. While I think we’d all love to have systems that never go down, the probability of such occurring in the “real world” is relatively small, unless you’re running some esoteric hardware that most people aren’t. Before we go any further, let’s quickly break down what we mean by availability.

When most people talk about availability, they often use the marketing-speak method of speaking in “nines”. Five nines, or 99.999%, is the “gold standard” that most people talk about, but few people actually achieve. To clarify, here’s what it means when you convert percentages to actual time spans.

This means that the vaunted “five nines” allows for only a bit more than five minutes of downtime in a year, and only 6 seconds per week, which translates to less than 1 second per day. Quite honestly, you can’t even bounce an HTTP server in that period of time reliably. For example, if you’re running on a single server, and you have to reboot it more than once a year, you’ll likely never hit 99.99%, even if nothing else breaks.
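The conversion from “nines” to time spans is simple arithmetic, and it can be sketched in a few lines of Python. This is only an illustration: the function and constant names are mine, the year is assumed to be 365 days, and the list of availability levels is an arbitrary choice rather than anything from the original post.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Period lengths in seconds. Assumption: a non-leap, 365-day year.
PERIODS = {
    "day": SECONDS_PER_DAY,
    "week": 7 * SECONDS_PER_DAY,
    "year": 365 * SECONDS_PER_DAY,
}


def allowed_downtime_seconds(availability_percent: float, period_seconds: int) -> float:
    """Seconds of downtime permitted in a period at the given availability."""
    return (1 - availability_percent / 100) * period_seconds


def downtime_table(levels=(99.0, 99.9, 99.99, 99.999)):
    """Map each availability level to its downtime budget per period."""
    return {
        level: {name: allowed_downtime_seconds(level, secs) for name, secs in PERIODS.items()}
        for level in levels
    }


if __name__ == "__main__":
    for level, budget in downtime_table().items():
        print(
            f"{level}%: {budget['day']:.2f} s/day, "
            f"{budget['week']:.2f} s/week, {budget['year'] / 60:.2f} min/year"
        )
```

For 99.999% this gives roughly 315 seconds per year, about 6 seconds per week and under 1 second per day, which matches the figures quoted above.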

So what does this mean when we talk about systemic availability? It means that putting all your eggs in one basket, regardless of the quality of the basket, is silly. While many people think about drives failing, and implement RAID or some other technique, and some think about CPU and memory, very few think about or plan for electrical system failure or cooling failure. These kinds of problems, which strike entire data centers, are not uncommon, and cannot be waved away by saying that you have redundant infrastructure.

A vast majority of availability problems, however, are not hardware driven—even though that’s all people think about. They come from a few areas:

- Operator error
- Configuration error
- Software failure/bugs
- Networking
- Power and cooling

All of these cost serious money to solve. They are solved through processes and planning and not just traditional technical operations. In examining failure modes of systems I’ve worked on, a vast majority are preventable. They are due to someone making an unplanned change that isn’t properly vetted. They’re based on software configuration errors, and they’re based on upgrades that simply weren’t tested first.

The silver lining here is this: a vast majority of sites, companies, etc., do not need this kind of availability. The pursuit of high availability tends to be a mental masturbation exercise by people who want to spend money, but aren’t willing to do the cost-benefit analysis. Before undertaking anything above 99.9%, you really need to understand your business to a level that will allow you to make a rational decision about risks. Often, it is cheaper to rebate money to people than it is to fix the problem.

So what do I say to those who puff up and say “I can do better”? I say “no you can’t”. At least, not likely. The cost of running exceptionally high availability systems is not just hardware. It is operational costs. It is staffing, monitoring infrastructure, planning and operational processes. It doesn’t happen when you’ve only got one machine. It doesn’t even happen when you have 50 machines.

Don’t delude yourself any more about your own ability to run systems at that level than you delude yourself into assuming someone else can as well.

Filed under: Production

Quotes Which Made My Week

“We accidentally shot ourselves in the foot.
Then, when looking down the barrel of the gun to see where the bullet came from, we shot ourselves in the eye.
Then we let go of the gun, which dropped on (and broke) our good foot….”

“There’s no plan to upgrade systems except by forklift.”

“We’re running a website written with hammers and chisels on stone in an environment where gerbils are running on wheels in cages.”

Filed under: Fun, Production

WebSphere Multi-JVM jsessionid

Works just like it should :-)

IBM Support Link

Two JVMs with different contexts but the same domain now use the same jsessionid so they can talk back and forth in the same browser without jsessionid schizophrenia.

Filed under: Architecture, Code, Production

Lessons Learned

[1] Don’t be accommodating to a client … otherwise, when all hell breaks loose none of that will matter and any issues resulting from attempts to accommodate last minute changes, bad decisions and architecture/environmental issues will all be the fault of the development team — instead of the client.

[2] If one wants to be accommodating, then everything must be done with written disclaimers sent to sufficiently high-ranking people (Director or VP); managers don’t cut it.

[3] Dictate every single aspect of an environment’s requirements and thresholds, even if it’s not directly related to the application and even (especially?!) if it’s thought to be assumed.

For example:

  • firewall behavior
  • load balancer logic
  • request throttling
  • JVM configuration
  • appserver configuration
  • relationship of application to other apps in environment
  • how many applications are running within a JVM or on a box
  • thread pools and configuration (throttling, min/max, growth)
  • db pool config
  • database server config (memory, available connections to all applications, not just the one being deployed)
  • network latency and error thresholds
  • environment capacity
  • disclaimers on environmental issues which will affect application
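One way to make such a spec harder to argue about is to write the thresholds down as data and check measured values against them. The following Python sketch shows the idea. Every name and number in it (`Threshold`, `SPEC`, `jvm_max_heap_mb` and so on) is hypothetical and only illustrates the shape of such a check; a real spec would need an entry for every item in the list above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Threshold:
    """One agreed limit in the environment spec (all names are hypothetical)."""
    name: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


# Illustrative values only, not recommendations.
SPEC = [
    Threshold("jvm_max_heap_mb", minimum=2048),
    Threshold("db_pool_max_connections", minimum=50),
    Threshold("network_latency_ms", maximum=20),
    Threshold("apps_per_jvm", maximum=1),
]


def check_environment(spec: List[Threshold], observed: Dict[str, float]) -> List[str]:
    """Return human-readable deviations of the observed values from the spec."""
    problems = []
    for threshold in spec:
        if threshold.name not in observed:
            # An unmeasured item is reported too: "assumed" is not "agreed".
            problems.append(f"{threshold.name}: not measured")
            continue
        value = observed[threshold.name]
        if threshold.minimum is not None and value < threshold.minimum:
            problems.append(f"{threshold.name}: {value} is below the agreed minimum {threshold.minimum}")
        if threshold.maximum is not None and value > threshold.maximum:
            problems.append(f"{threshold.name}: {value} is above the agreed maximum {threshold.maximum}")
    return problems


if __name__ == "__main__":
    measured = {"jvm_max_heap_mb": 1024, "db_pool_max_connections": 50, "network_latency_ms": 35}
    for problem in check_environment(SPEC, measured):
        print(problem)
```

The output of a check like this is the sort of thing that belongs in the written disclaimers described in [2].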

[4] Since virtually nobody has an actual replica of production, dictate that an environment must be made available that everyone agrees on as being the ‘spec’ to which the app will be built and signed off on. The client must then accept that, since their prod environment does not match that spec, any risk of issues going into prod (bugs, crashes, delays, etc.) is theirs.

[5] If a client does not have adequate testing tools or refuses to provide opportunities for load testing (internal tools, Gomez, Keynote, etc.), then an official disclaimer must be communicated to the client (see above).

[6] Always get a baseline of a production environment before deployment:

  • thread dumps
  • application performance numbers
  • network errors/latency/performance
  • application logs
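For the “application performance numbers” part of the baseline, a small script is enough to summarize latency samples and flag what got worse after a deployment. The Python sketch below assumes the latencies have already been collected as a list of milliseconds. The metric names, the nearest-rank percentile method and the 10% tolerance are my own choices for illustration. Thread dumps and logs are not covered here.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Reduce raw latency samples to a few baseline numbers."""
    return {
        "mean": statistics.mean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
    }


def regressions(baseline, current, tolerance=0.10):
    """Metrics in `current` that exceed the baseline by more than `tolerance`.

    Returns {metric: (baseline_value, current_value)}. The 10% default is an
    arbitrary illustration, not a recommendation.
    """
    return {
        metric: (baseline[metric], current[metric])
        for metric in baseline
        if current[metric] > baseline[metric] * (1 + tolerance)
    }


if __name__ == "__main__":
    before = summarize([200, 220, 240, 250, 260, 280, 300, 310, 320, 400])
    after = summarize([210, 230, 260, 300, 330, 350, 380, 420, 480, 900])
    for metric, (old, new) in regressions(before, after).items():
        print(f"{metric}: {old} ms before deployment, {new} ms after")
```

Without the “before” numbers, there is no way to show whether a slowdown came from the new code or was already present in the environment.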

[7] Do not assume that because code has worked in other environments it will work in your client’s.

[8] The client doesn’t care if it works anywhere else besides their prod environment — even if it works in the dev environment they provided. See [4] above.

Filed under: Management & Leadership, Production
