July 02, 2009

Video: Glenda Rhodes on GIMP from UTOSC 2008

As a final reminder of how completely awesome the Utah Open Source Conference is and why you should attend, we present a final video from UTOSC 2008: Glenda Rhodes presenting on GIMP for Photographers. This video is now available at OpenSourceTV.TV as an in-line Flash video and also as Ogg Theora and Xvid AVI downloads.

Glenda Rhodes

This presentation discusses using GIMP as an alternative to Photoshop for editing digital photos. Basic GIMP techniques (black/white, sepia, cropping) and some advanced techniques (layer masks for selective coloring, head-swapping, background blurring) are covered.

Glenda says, “I have been on the outskirts of the open source community ever since my husband made me send out the initial release of Ubuntu in our Christmas cards back in 2004. Currently I help run Utah OpenTech, a company dedicated to helping small businesses implement VoIP using open source technologies. I am a frequent user of GIMP (on windows and ubuntu), using it for editing photos and most recently digital scrapbooking.”

Automatically Building, Configuring, and Maintaining Complex Infrastructure

I've been heads down for the last few weeks getting a project out the door for a new customer. As I mentioned, this involves creating a virtual appliance. I decided, due to the circumstances of this deployment, that the best option was to build an appliance factory that is capable of churning out new virtual machines at will. I'm going to describe how I did that in this post.

There are basically three steps to creating a new image that runs the Kynetx Network Service (KNS):

  1. Create a new virtual machine
  2. Install packages and Perl libraries, create users, and otherwise configure the machine to run KNS
  3. Deploy the KNS code and test it

I was exploring Kickstart files for automatically installing Fedora and CentOS when someone pointed me at Cobbler. Cobbler is a Linux installation server that is simply amazing. It includes templated kickstart files, DHCP and DNS servers, the ability to manage multiple distros and repositories, and a database for keeping it all straight.

You start by importing distros and images, then define profiles that combine those with kickstart files, and finally create system definitions for each machine referring to profiles. I only needed one distro, one repo, and one kickstart, so I ended up with multiple systems hanging off of one profile. Once that's done, a command called koan (kickstart over a network) is used on the Dom0 machine to create virtual machines as defined by the system definitions in Cobbler.

I carefully edited the kickstart file to create just the machine I wanted with the right packages installed. At this point, I was building new VMs and taking them down 20-30 times a day as I tested this. That's the beauty of automation--tacking up a machine is just dirt simple.

I was lucky that I'd already invested considerable effort in Puppet recipes for building the environment that KNS needs to run, so the second step was almost done. In fact, with just a few edits, I had Puppet building the new VMs up.

The third step was also one that I'd spent some time on. I have a custom deploy script (in Perl) that deploys KNS code based on server role and takes care of all the little details like setting up the configuration files for the various servers.

Every system is slightly different, but I think there's a definite distinction between machine setup, system configuration, and code deployment. The first creates a fairly standard environment, the second configures it to a specific purpose, and the third manages the code.

Some thoughts on all of this:

  • Some have asked "Why not put the code in Puppet (i.e. why use a deployment system)?" My answer is that code deployment is a dynamic process that I want more control of than Puppet's automatic configuration provides. You could probably press Puppet into this, but it didn't seem to fit for me.
  • I had to create a simple YAML-based configuration file for KNS to pull everything together. YAML was the right answer for this. I chose to put that configuration file in Puppet, but I think I'll pull it into the deployment process in the future.
  • One missing piece is a database that everything can read system configurations from. Cobbler provides a light-weight one that may serve our purposes for a while, but something like iClassify is more flexible. Right now there's system information in Cobbler, Puppet, and the deploy script. There's a way to put additional attributes in Cobbler that we could use in other places.
  • All of this--Cobbler, Puppet, and the deploy script--was installed and running on a virtual machine that we call the factory. That one image, once installed in Xen, is capable of creating as many copies of each type of machine we run as needed.
  • This can all be done on physical boxes too, of course, but I prefer the flexibility of virtual machines--even when only one will be running on the physical hardware. They can be moved, replicated, and managed with a lot more ease than physical hardware. Plus I have the ability to fire up new ones for QA or whatever without buying and installing new physical hardware. When an 8-core, 32 GB box costs $4K, you can amortize that investment a lot with virtual machines.

Startups need to be lean. Achieving that goal in a compute-intensive business requires automation. Fortunately with tools like Cobbler and Puppet, automating the build-side of your infrastructure is not only possible, but fairly easy. We manage several dozen machines with only a few hours a week of effort. What's more, adding a new box for load or experimenting is as easy as typing a few commands and waiting 20-30 minutes.

Tags: kynetx system+administration cobbler puppet

website downage

Apparently, if you forget to  pay your ISP bills for three months, they'll cut off your access.  Hmm, whoops.  That's why my blog as well as *.larrythecow.org went down for a few days.

I signed up for a Linode (and did automatic billing), and I'll be moving the sites there rather soonly.  That'll save us all from other responsibility-addled issues.  Well, financial ones at least. :)

July 01, 2009

openFATE: Now with more open

It was just announced that openFATE, openSUSE’s feature tracking system, will now be open to non-openSUSE members.  What this means is that anyone can submit new feature requests.  For more info: openFATE – Adding New Features Now Open for Everybody.

Example Image Upload with YUI Rich Text Editor 2.7.0
It's somewhat slow coming, but I've checked compatibility with the image uploader and YUI version 2.7.0. If you haven't read the original YUI Image Uploader page, start there. After that, you can use this page for an example getting the script to work with the latest YUI. The 2.6 image uploader is compatible with [...]
More to Come

Last fall I wrote about what I see coming down the pipe. I gave a glossed over explanation of why I fear we may see hyperinflation, and how I came to that conclusion.

Well, since then I have taken a university course on economics. I have a deeper and more solid understanding of those notions I had then. I also have the vocabulary to articulate it better. So, if you've taken economics and understand such terms as CPS, velocity of money, fractional reserves and money multipliers, then jump right in. If not, then I'll try to give a little guidance and some links to further reading.

The purpose of this particular post is two fold: to kick off the coming series and let my readers (really? I must mean myself) know that I haven't given up writing, and to act as an impetus, a commitment, to actually write the rest of it in a timely fashion.

gnome-screensaver and alternative window managers

I've been using gnome-screensaver with awesome for a while without any problems. Unfortunately that all came to an end when GNOME 2.26 hit Debian Sid last week. Just so that no-one else has to dig for this, gnome-screensaver now uses gnome-session to determine idle time. gnome-screensaver will run without gnome-session, but the screensaver and locking mechanism will never kick in. Fortunately, there is an easy fix. I changed my

gnome-power-manager &

line to

gnome-session &

in ~/.xsession and everything works now.

June 30, 2009

tee functionality with python subprocess PIPEs
Sometimes we may want the output of a process to go to more than one place. In a shell, we would probably use tee(1).

One example application is dumping large databases with a storage engine that can't give you a time of last modification, e.g. MySQL using InnoDB. Then I want the checksum of the dump, and if that is the same as the dump before, I'll unlink the older one and symlink it to the new dump. I also want to check the dump stream to make sure it has finished properly. And for good measure let's compress the stream on the fly. SURE, you could do all this after the dump completes, but then you have to wait for a lot of disk I/O that the pipes avoid.

If you are like me, you'll want to use Python. Note that if you take the stdout from one pipe and read it with more than one (n) other process, each process will only get a fraction of the data, ~1/n. So use a buffer to store the data and write it to each process that needs it.


__version__ = [int(x) for x in "$Revision: 1.2 $".split()[1].split('.')]
__author__ = "Kael Fischer <kael.fischer@gmail.com>"

import os
import subprocess
import sys
import syslog
import time

.
.
.


syslog.syslog(syslog.LOG_ERR, "INFO: Starting mysqldump of %s on %s" % (db, host))
try:
    # make output files (binary mode: the pipes carry raw bytes)
    outFile = open(outfileName, 'wb')
    digFile = open(digestName, 'wb')

    # make pipes
    dumper = subprocess.Popen(["mysqldump", "--opt", "--skip-dump-date",
                               "--single-transaction", "--quick", db],
                              stdout=subprocess.PIPE)
    grepper = subprocess.Popen(["grep", '-q', '^-- Dump completed$'],
                               stdin=subprocess.PIPE, stdout=sys.stdout)
    digester = subprocess.Popen(["md5", "-q"], stdin=subprocess.PIPE, stdout=digFile)
    bziper = subprocess.Popen(["bzip2"],
                              stdin=subprocess.PIPE, stdout=outFile)

    while True:
        # use os.read NOT file.read
        # file.read blocks.
        # copy output from mysqldump to a buffer
        buf = os.read(dumper.stdout.fileno(), 5000)
        if not buf:
            # empty read: mysqldump closed its end and the pipe is drained
            # (polling the process instead can drop the last chunk)
            break
        # write buffer contents to 3 different processes
        grepper.stdin.write(buf)
        digester.stdin.write(buf)
        bziper.stdin.write(buf)

    # after dumper finishes,
    # explicitly shut down input streams
    # to other jobs
    dumper.wait()
    grepper.stdin.close()
    digester.stdin.close()
    bziper.stdin.close()

    while grepper.poll() is None:
        # wait if needed (shouldn't be)
        time.sleep(1)

    if grepper.returncode != 0:
        raise RuntimeError("End of dump not found: grep returned - %s" %
                           grepper.returncode)

except BaseException as e:
    # dump's no good, KeyboardInterrupt, whatever
    syslog.syslog(syslog.LOG_ERR, "ERROR: %s : %s" % (type(e), e))
    syslog.syslog(syslog.LOG_ERR, "ERROR: mysqldump of %s on %s failed" % (db, host))

    # unsuccessful
    # put files back where they were
    # and exit

    # exercise for reader


syslog.syslog(syslog.LOG_ERR, "INFO: mysqldump of %s on %s finished" % (db, host))
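
The checksum step described at the top of this post (if the new dump matches the previous one, unlink the older file and symlink it to the new dump) is not part of the excerpt above. Here is a minimal Python 3 sketch of how that step might look. The names `file_md5` and `dedupe_dump` are made up for illustration, and hashing the finished files directly (rather than reading the digest file written by `md5 -q`) is an assumption, not the author's code. It also assumes a POSIX system where symlinks are available.

```python
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe_dump(old_path, new_path):
    """Replace old_path with a symlink to new_path when both have the same digest.

    Returns True if the old file was replaced, False otherwise.
    """
    # Nothing to do if there is no previous dump or it is already a symlink.
    if os.path.islink(old_path) or not os.path.exists(old_path):
        return False
    if file_md5(old_path) != file_md5(new_path):
        return False
    # Same content: keep one real copy and point the old name at it.
    os.unlink(old_path)
    os.symlink(os.path.abspath(new_path), old_path)
    return True
```

Calling `dedupe_dump("db-monday.sql.bz2", "db-tuesday.sql.bz2")` (hypothetical file names) would leave Tuesday's file as the only real copy if nothing changed between the two dumps.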

June 29, 2009

python subprocess based parallel processing
The Python threading module is cool and when combined with rpyc and the Sun Grid Engine you can get a lot done really fast on a cluster. I will blog that later, but using one of the tricks I use with rpyc with the newish 'subprocess' module in the Python standard library, multi-process based parallel processing seems simpler than ever now.

This is a jiffy to run a shell command on a number of hosts.



#!/usr/local/bin/python -u
#
# runAllOver.py
# Run a shell command on several machines using ssh
#
__version__ = tuple([int(x) for x in
                     '$Revision: 1.2 $'.split()[1].split('.')])
__author__ = "Kael Fischer"

import sys
import time
import optparse
from subprocess import Popen, PIPE


HOSTS = ["nfs1", "nfs2", "compute1", "compute2", "db1"]

def main(sysargs):

    oneLineUsage = "Usage: %prog [options] '<remote command>'"

    op = optparse.OptionParser(
        oneLineUsage,
        version="%prog " + '.'.join([str(x) for x in __version__]))

    (opts, args) = op.parse_args(sysargs)

    try:
        if len(args) == 0:
            raise RuntimeError("No remote command specified.")
    except Exception as eData:
        sys.stderr.write("\nUsage Error: %s\n\n" % eData)
        sys.stderr.write(op.format_help() + "\n")
        return 1

    cmd = ' '.join(args)
    print(cmd)

    # make one running pipe object per host
    pipes = [remotePipe(h, cmd) for h in HOSTS]

    # report the results in turn
    for i, p in enumerate(pipes):
        print(HOSTS[i] + ':')
        # read() returns at EOF, i.e. when the remote command is done;
        # reading before waiting keeps a chatty command from filling
        # the pipe and stalling
        print(p.stdout.read())
        p.wait()

    return 0  # we did it!

def remotePipe(host, cmd, block=False):
    # hand ssh an argument list so quotes inside cmd survive intact
    p = Popen(["ssh", host, cmd], stdout=PIPE, universal_newlines=True)
    if block:
        while p.poll() is None:
            time.sleep(1)
    return p

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))


What's cool about that? Well all the processes are running on the hosts simultaneously and that makes it go fast.

Isn't that insecure? Could be, depending on the context. For ways to secure that kind of thing more, read this article by Brian Hatch: http://www.hackinglinuxexposed.com/articles/20021211.html. It was the basis for the intermachine communication in the PHABRIX and most especially prun was built using his authprogs as a starting concept (with greater flexibility and extra security layers added).
Twitter Weekly Updates for 2009-06-28

Powered by Twitter Tools.

June 28, 2009

Baby Announcement

We are very happy to announce the arrival of our second daughter, Elizabeth, today. She was born early this morning. 9lbs. 7oz (big baby!), 22″ long. She and mother are doing fine and resting.

June 27, 2009

Launching wxPython apps with an iPython shell
Suppose you want to run your fancy wxPython application but have a shell in the background to peek and poke at certain settings, help debug, and possibly even use an API that your program provides to automate tasks. iPython has built in wx support (as well as support for other GUIs and frontends). [...]

June 26, 2009

UT Blogger/Geek Dinner Photos: June 2009

The UT Blogger/Geek Dinner was a lot of fun despite the power outage about halfway through the meal. Here are some snapshots of the group.

UT Blogger/Geek Dinner Photos: June 2009 by LauraMoncur from Flickr

Of course, we had Charlie Oliver recreate his famous Geek Money Shot, this time with Mark Freestone!

You can see all the photos here:

Better Backups
So, yesterday I talked about me casually wielding the sword of death that is `rm -rf`. Part of the issue has been that with some things I have daily backups. But for most stuff our backups are sporadic at best. So this is my question. I obviously need a better, and preferably FOSS, solution for backups. Any suggestions?

My first thought is to set up a NAS server that I can copy stuff to. Anything better?
Starting a High Tech Business: Selling the Third Deal

I'm starting a new business called Kynetx. As I go through some of the things I do, I'm planning to blog them. The whole series will be here. This is the nineteenth installment. You may find my efforts instructive. Or you may know a better way--if so, please let me know!

I have a theory that the third deal matters more than the first two. Here's why.

The first time you sell your product--your first deal--is always exciting. But let's be honest, it could be a fluke. If you beat the bushes long enough you're likely to find someone who'll buy almost anything.

The second deal feels good because you at least can convince yourself that the first deal wasn't an accident.

But the third time you sell your product you have confidence around a few important things:

  • Proven repeatability - to get to the third deal you've proven that you understand what you're selling and you're able to explain it in a way that people connect with.
  • Turn the crank - at this point you ought to be able to "turn the crank" operationally and deliver. If you're still doing one-offs by the third deal, you need to ask yourself what will change by the fourth, eighth or 100th deal? You can't achieve scale without operational excellence.
  • Know your price - On the first deal you're always a little unsure of the price you've set. Is it too high? Too low? Will you get laughed out of the room? By the third deal, you can go into pricing discussions with confidence. After all, two other customers have paid it--why won't everyone?

I've found that to get any deal you usually have to put your ego aside. The sweet spot is when you've found (a) something you're good at, (b) something you like to do, and (c) something someone will pay for. A deal implies (a) and (c). If you have to cave at all on (b), then your ego's likely to get in the way of the deal.

Putting ego aside is not always easy for techies to do. After all, you've spent years working on this and generally have dreams and even fantasies about how people will use it. Take a deep breath and realize: someone's willing to pay money for something you built. That's a good feeling. Go with it and enjoy the ride.

Tags: kynetx startup sales

June 25, 2009

Annotating Anything

Today I released build 299 of KNS. There are three important updates to KRL in this build.

First, KRL now supports literal hashes. Hashes are created by enclosing comma-delimited name-value pairs in curly braces like so:

{"foo" : "bar", "fizz" : 3, "flop" : [1, 2, 3]}

Second, the annotate_search_results action has been modified to support two new configuration parameters:

  • results_lister - (defaults to "li.g, div.g, li div.res, #results>ul>li") - jQuery selector for finding relevant results to annotate. The default finds search results for Google, Yahoo, and Bing.
  • element_to_modify - (defaults to "div.s,div.abstr,p") - the jQuery selector for finding the element to modify within the results returned by the results lister. The default finds the main body of a search result.

These parameters give you the ability to find and annotate other items on almost any page that displays results. For example, the following action will annotate the paid search results in Google given an appropriately defined my_selector function.

annotate_search_results(my_selector)
 with results_lister = "#mbEnd li" and
      element_to_modify = "a#an1";

Third, user-defined datasources are now more flexible. Previously, the arguments to the datasource function were assumed to be name-value pairs presented as strings. The function put them together to create a QUERY string. That wasn't as flexible as needed for path-based APIs. So now the datasource takes a single parameter of either a string or a hash:

  • If the parameter is a string, then it is concatenated with the URL root given in the datasource declaration without modification. What you supply here, when appended to the datasource root URL must result in exactly the URL that you intend to call.
  • If the parameter is a hash, then the hash is turned into a properly formatted HTTP QUERY string with names and values delimited by an equals sign (=) to create a name-value pair and each name-value pair delimited by an ampersand (&):
    pre {
      tweets = datasource:twitter_search({"q": query});
    }
    

This allows path-based API calls to be created as strings and QUERY string calls to be created by either creating the string or putting the name-value pairs in a hash and letting the function create the query string.
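
To make the two cases concrete, here is a rough Python sketch of the URL-building rule as described above. It is not the KNS implementation: the function name `build_datasource_url`, the example URLs, and the choice to join the root and the query string with a `?` are all assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_datasource_url(root, param):
    """Build the URL for a datasource call from its root and one parameter.

    A string is appended to the root unchanged. A dict (standing in for a
    KRL hash) becomes name=value pairs joined by '&'.
    """
    if isinstance(param, str):
        # Path-based API: the caller supplies exactly what follows the root.
        return root + param
    if isinstance(param, dict):
        # Query-based API: assume the root does not already end in '?'.
        return root + "?" + urlencode(param)
    raise TypeError("param must be a string or a dict")


# Hypothetical roots, for illustration only.
path_url = build_datasource_url("http://example.com/users/", "windley/status.json")
query_url = build_datasource_url("http://example.com/search.json", {"q": "open source", "rpp": 5})
print(path_url)
print(query_url)
```

The string form leaves all responsibility for a well-formed URL with the caller, which is what makes it suitable for path-based APIs; the dict form takes care of escaping and delimiters.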

Tags:

CTO Breakfast Tomorrow

Tomorrow is the CTO breakfast. It starts at 8am and goes to 9:30am. The location is, as usual, the Novell cafeteria. Sorry for the late notice; for some reason my calendar wasn't showing the Google calendar event. Luckily an email prodded me from my stupor.

The CTO breakfast isn't just for CTOs, but also for those who aspire to be CTOs or are interested in building high-tech products. The discussion is open-format. We decide what to talk about when we get there. You're welcome to bring your topic and bring it up.

Here are the scheduled dates for upcoming meetings:

  • No breakfast in July
  • August 28, 2009 (Friday)
  • September 24, 2009 (Thursday)

I've created a Utah CTO Breakfast group at LinkedIn. You're invited to join.

I hope to see you there!

Tags: breakfast cto utah events

Django Windmill Tests – GSOC Progress Update

I feel that a status update is long overdue, but as the corpus of Windmill tests grows, so does the time it takes to run a complete instance of the regression suite. However, I do have some fun progress to report as well as a few questions/problems that are showing themselves now that all the fluff is over. First, let’s talk about the fun stuff!

I do have 3 of my major improvements/fixes/restructures to django.test somewhat complete. At the moment they are lacking most in documentation, a problem I intend to rectify later this week.

  1. Windmill Tests: Windmill test runners are nearly complete, threaded development server for AJAX widget testing complete.
  2. Code Coverage: Coverage.py support for runtests.py and management command. Extensible system is easily pluggable with other coverage systems.
  3. Test-Only Models: This is still a topic of discussion, but adding the property ‘test_models’ to a TestSuite will load and wipe the models. Has tests and limited docs.

My major TODO’s still outstanding:

  • Documentation!
  • Twill Runner Support (Utilizing the Windmill Threaded Server)
  • Windmill Admin Regression Tests (Healthy set of tests written, need to document and finish more)
  • Skip tests that are known to fail
  • Test new features/API’s

That’s it for now, more updates are available on the django-dev list!

Painful Lesson to learn
So I was SSH'd into one of our servers and decided that one of the directories needed to be deleted. Little did I realize that the directory contained symbolic links (either that or the whole directory was a sym link) and so when I ran my little 'rm -rf' a lot of stuff, rather a TON of stuff was gone.

So, I have spent a lot of time getting stuff back together and making things work. It has not been easy or fun. In fact it has been a rather big PITA.

So, lesson of the day. Check twice before doing a rm -rf. :(
Browser Wars

The browser war isn’t a real war with machine guns and artillery shells and breaking Geneva conventions; it’s more of a virtual turf war for the world’s web surfers. Let’s review the history of this nifty interweb of ours:

  • 1989: CERN researcher Tim Berners-Lee proposes the World Wide Web; his first web server* is running by the end of 1990. You can see that web server (and I have) because it’s just sitting on a table in one of the CERN lobby buildings. You could even steal it if you wanted (I considered it), because there is very little security– those silly Swiss!
  • Mosaic, the first widely used graphical browser, came out a few years later, in 1993, followed by its cousin Netscape Navigator in 1994. Remember?
  • Microsoft, with their market-busting, monopolistic fervor, introduced Internet Explorer (IE) in 1995 and began shipping it with all versions of Windows. Because it came pre-installed, it quickly grew to near complete market domination.
  • SEVEN YEARS pass with little change. Everyone had been using IE. Then, WHAMMO! Firefox (www.getfirefox.com). Firefox is available on Linux, Mac and PCs. It’s fast, has lots of awesome plugins and TABS!
  • A year or so passes and Microsoft plays catch up and finally releases a tabbed version of IE. Other browsers appear (like Google Chrome.)

How goes the fierce, non-violent, nerdy war? As of May 2009, the browser market share looks like:
IE7 21.3%
IE6 14.5%
IE8 5.2%
FF 47.7%
Chrome 5.5%
Safari 3.0%
Opera 2.2%

So, what’s the next step? Better tabs. So good ol’ Mozilla (makers of Firefox) are having a competition to redesign the tabbing experience.

My good buddy, Grady, a world-class graphic designer, has created a new paradigm for tabbing that he calls favitabs (http://www.favitabs.com). Today starts the voting for the people’s choice award, and anyone can vote: http://design-challenge.mozilla.com/summer09/showcase.php

The voting ends on July 5th, so if you want to be part of history, if you think Favitabs is an awesome idea, then click on over to the design challenge and vote.

* Sir Tim also came up with the crazy http:// prefix for web addresses. He is said to regret that decision.

June 24, 2009

Writing Effective PHP Caches with Memcached

Delayed, yet again, at the airport. Time to get this article written once and for all.

This is a rough draft; I still need to go through and proofread this article. However, several friends were anxious to read it, so here it is rough for now. :P

When your website or project grows, demands on your architecture and infrastructure can dramatically increase. You can then run into "bottlenecks", or parts of your project that cap out their abilities, and cause the rest of your application to slow down. One of the more common parts of your architecture to reach its limits is your database. There is a reason for this: it's called ACID. While I won't get into the details, basically databases are awesome because of their "ACID compliance." You can store information and get information easily. However, these requirements of being a good database can also require a lot of legwork from your server. So when you have hundreds, thousands, and even millions of queries executing on your server, it can require a lot of CPU, memory, and I/O to do all the work.

This is where memcached comes in. I've implemented it with great success in the past, and you can too. This article is not a step-by-step how-to for setting up memcached. There are plenty of articles that show you how here and here (to name a few). You also have the PHP documentation for memcache + PHP. Instead, we're going to discuss the theory behind creating an effective cache. Our memcache servers at Dating DNA run at about 99.9% efficiency (meaning 99.9% of all requests to our cache find a valid entry and don't hit our database). We'll cover a few basic concepts, and then talk about the two types of caching methods.

General Concept

What is a cache? To quote Wikipedia:

In computer science, a cache (pronounced /kæʃ/) is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (owing to longer access time) or to compute, compared to the cost of reading the cache. In other words, a cache is a temporary storage area where frequently accessed data can be stored for rapid access. Once the data is stored in the cache, it can be used in the future by accessing the cached copy rather than re-fetching or recomputing the original data.

So basically it is a collection of data that sits between your code and server. You typically check the cache first to see if it has a valid entry. If so, you use the information in the cache. If not, you generate the information manually. After generating the information manually, you put it in the cache. Ideally you want your cache to be full of good information to save your database the work again.
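
The check-then-generate-then-store flow just described can be sketched in a few lines. This is a toy written in Python (to match the other code on this page rather than PHP), using a plain dictionary as a stand-in for memcached. The class `CountingCache` and the function `load_profile_from_db` are hypothetical names, and the hit and miss counters are only there to show how an efficiency figure like the one quoted earlier could be measured.

```python
class CountingCache:
    """In-memory stand-in for a memcached client that tracks hits and misses."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        """Return the cached value for key, computing and storing it on a miss."""
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = compute()  # the expensive path, e.g. a database query
        self._store[key] = value
        return value

    def hit_rate(self):
        """Fraction of lookups served from the cache (0.0 when unused)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def load_profile_from_db(user_id):
    # Hypothetical placeholder for a slow multi-table query.
    return {"id": user_id, "name": "user%d" % user_id}


cache = CountingCache()
for _ in range(4):
    profile = cache.get_or_compute("profile:42", lambda: load_profile_from_db(42))
print("hit rate: %.2f" % cache.hit_rate())
```

Only the first of the four lookups reaches the "database"; the other three are answered from the cache, which is the whole point of the pattern.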

Why cache? In a perfect world where servers have no limitations you wouldn't need a cache. Your database would be able to handle trillions of queries without ever running into locking issues, slow responses, expensive joins, etc. However, we live in a realistic world where our databases have limitations. So we implement caches to help alleviate those limitations.

Databases aren't the only place to look when deciding what to cache. RSS feeds, web service responses, XML files, etc. can also be sources of load for your server. While throughout this article I will reference databases a great deal, keep in mind they aren't the only source of data & load for your application.

Identifying What To Cache

So how do we do this? We analyze. There are many techniques you can use to find good places to cache. The key is to look for information that has some of the following characteristics:

  1. High Demand - If you have something that is used on every single page of your website, most likely you'll be able to find a way to cache that information and gain performance.
  2. Expensive - Some information is faster or easier than others to retrieve. If you find one particular query that takes longer, or requires more work, those are the queries to target first.
  3. Large - The query might be relatively quick, but it also might have a lot of data. Transferring that data takes time, and while usually quick, being large can also compound the problem when added to these other characteristics.
  4. Common - This information is common throughout the site and is not unique to a particular scenario.

The more characteristics a particular piece of information has, the greater performance boost you will get when you implement an effective cache for it. Let's look at a good and bad example of something to cache.

Good Example - Profile Information

I'll use this example later on as well, but profile information is always something good to cache. Typically this information is spread across several tables in your database and can have a lot to it. It is used often, and if you have a lot of users it can make those user tables very, very busy.

Bad Example - User Message

While your messaging system for your website might be used a lot, caching individual messages between users won't be very effective. An individual message is viewed only a handful of times. So you would fill up your cache with tons of data that would be retrieved very little. This is an example of a piece of data that appears like it would be good to cache, but it's unique, so it's not needed often.

Techniques for Finding Potential Caches

To know what to cache requires you to know your application. There are several ways you can do this:

  1. Monitor Queries - Knowing what queries run when and how often is half the battle. There are several ways you can do this, however my one recommendation is a tool called "Jet Profiler". They have a free version and a paid version. The free version should give you most information you need. The paid version is more advanced. My rule of thumb: if your site is popular enough and you're running into locking issues or other advanced problems, the paid version will pay for itself in a few hours.
  2. Output Queries - If you use some form of Database classes to handle your queries, most likely there is a way to log your queries and at the end of the page spew out the log. Only do this for developers in a development environment. But being able to see this information in context will help. Also, if you can list the time it takes for that query to execute next to it, that will help identify problems as well.
  3. Monitor Page Loads - Keep an eye on which pages take longer to load. The longer a page loads, the more likely the information on that page could benefit from caching.
  4. Monitor Web Analytics - Not only are the pages that take longer to load important, but also the most viewed pages.
  5. Brainstorming - As time passes, as a developer, you should have a "feel" for your application. Brainstorm with your team about the different parts of your application that you could cache to decrease load.
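
The "Output Queries" idea in the list above is easy to try out. Below is a small sketch in Python, using the standard library's sqlite3 module as a stand-in database: a wrapper that records every query along with how long it took, so the log can be dumped at the end of a page in a development environment. The class name `LoggingConnection` and the table used in the demo are invented for this example; a PHP database class would follow the same shape.

```python
import sqlite3
import time


class LoggingConnection:
    """Wraps a DB-API connection and records each query with its duration."""

    def __init__(self, connection):
        self._connection = connection
        self.log = []  # list of (sql, seconds) tuples

    def query(self, sql, params=()):
        started = time.perf_counter()
        rows = self._connection.execute(sql, params).fetchall()
        self.log.append((sql, time.perf_counter() - started))
        return rows

    def report(self):
        """Return one line per logged query, slowest first."""
        ordered = sorted(self.log, key=lambda entry: entry[1], reverse=True)
        return ["%.6fs  %s" % (seconds, sql) for sql, seconds in ordered]


db = LoggingConnection(sqlite3.connect(":memory:"))
db.query("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.query("INSERT INTO users (name) VALUES (?)", ("alice",))
rows = db.query("SELECT name FROM users WHERE id = ?", (1,))

# In a real app this would be printed at the bottom of the page for developers only.
for line in db.report():
    print(line)
```

Sorting by duration puts the most expensive queries at the top, which are usually the first candidates for caching.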

Once you find parts to implement caching, there are two basic methods of caching you can implement. While there are other ways to implement caching, I feel like these are the two most common.

Timeout Cache / Output Caching

This type of cache is characterized by information that is queried frequently, but also changes frequently. Typically it's a summary of some sort. Here is an example of this type of cache:

output-cache

This screenshot is from WordPress.com's front page. It lists a series of blog posts on the WordPress.com network. Its a list of popular blogposts, either by very important people or hot topics. While I'm not positive, I'm pretty sure they use some form of algorithm to generate that list. WordPress.com has about 200,000 posts a day. Lest just guess and say that the WordPress.com home page is viewed 1,000 times a minute. Thats a lot of traffic, and imagine if each page view their PHP script would have to query 200,000 posts to determine what should be on that front page. That is a LOT of work. So how about every 15 to 30 minutes you re-generate that list and then cache it. Instead of generating that piece of information 30,000 times in half an hour, you generate it maybe once or twice.

The reason it's called an Output Cache is that you're typically storing the raw output instead of the information used to create the output. It is also an example of a Timeout Cache because, the majority of the time, what causes the cache to expire and re-generate is time driven: every minute, hour, day, etc. In the example above, while a new post may be written by a VIP, or become a hot topic, it's okay if it doesn't show up on the list immediately. If it takes 30 minutes, that's okay.
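
Here is a minimal sketch of that idea in Python (not PHP), assuming a simple in-process dictionary instead of a real cache server. The TimeoutCache class, the render_front_page function, and the fake clock are all invented for the illustration; the point is only that the expensive output is rebuilt when its time limit passes and served from the cache otherwise:

```python
import time


class TimeoutCache:
    """Hypothetical in-process cache whose entries expire after a fixed time."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the demo can fake the passage of time
        self._store = {}        # key -> (expires_at, value)

    def get_or_build(self, key, build):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]     # still fresh: skip the expensive work
        value = build()         # missing or expired: regenerate the output
        self._store[key] = (now + self.ttl, value)
        return value


builds = []  # records each time the expensive work actually runs


def render_front_page():
    builds.append(1)
    return "<ul><li>popular post</li></ul>"


fake_now = [0.0]
cache = TimeoutCache(ttl_seconds=30 * 60, clock=lambda: fake_now[0])

first = cache.get_or_build("front_page", render_front_page)
fake_now[0] = 29 * 60   # 29 minutes later: served from the cache
second = cache.get_or_build("front_page", render_front_page)
fake_now[0] = 31 * 60   # 31 minutes later: the entry has expired and is rebuilt
third = cache.get_or_build("front_page", render_front_page)
```

Three requests spread over 31 minutes only run the expensive build twice.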

When To Use a Timeout / Output Based Cache

  1. Summaries - You have a summary list of products, news items, or new members that is viewed a lot. If the list is even slightly expensive to generate, caching it means less work for your database, because summaries typically have to read a lot of rows before cutting down the information.
  2. Not Time Critical - If the information is safe to be a little out of date, or doesn't need to reflect new information immediately, then it's a good idea to rely on a timeout for refreshing the data.

Event Driven Cache / Object Caching

Event driven caches and object caches differ from the previous caching scenario. Information in this type of cache is queried frequently, but it isn't changed frequently. On many websites, certain things are more complex. One website I work on is Dating DNA, a social networking dating website. A "person" on the website is a collection of rows in many tables. They have their general user information, things they display on their profile, pictures, address information, their answers on our dating survey, etc. An image from xkcd can show how "complicated" things can get:

xkcd-cache

Before implementing caching, I would often have to join multiple tables together to get the information I needed. In contrast to the previous example with WordPress.com's 200,000 posts a day, a single user on our website seldom changes their information. Maybe once a day, but more likely they change several things once every week or so. So this information, while used on almost every page of the website, changes rarely by website standards.

What a developer can do is create an object called ProfileInformation. This object would contain everything about a person. Then, you create a factory method to retrieve the ProfileInformation instance for a user. If the cached data is missing, invalid, or expired, the class will run the SQL queries to gather all of that information, store it in the class's variables, and store the class in the cache. If a valid entry is found, it returns that entry instead of doing all the hard work of putting it together again.

This cache is similar to the one above, with the exception that we can set the timeout to something high, like 7 days. If the user changes any of their information, after the UPDATE command finishes executing, we call ProfileInformation::DestroyCache($user_id). This marks the cache as invalid, and the next time it's requested the data will be regenerated. That is why I call this method of caching an "Object Cache" or "Event Cache": you store a complex object whose primary method of expiration is event driven.
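
A rough sketch of that flow, written in Python rather than PHP and using a plain dictionary in place of memcached. ProfileCache, load_profile, and update_email are hypothetical names standing in for the ProfileInformation factory, the multi-table queries, and the code path that runs the UPDATE:

```python
class ProfileCache:
    """Hypothetical stand-in for the ProfileInformation factory."""

    def __init__(self, load_profile):
        self._load = load_profile   # the expensive multi-table loader
        self._store = {}            # user_id -> assembled profile

    def get(self, user_id):
        # Build the object only when there is no valid cached copy
        if user_id not in self._store:
            self._store[user_id] = self._load(user_id)
        return self._store[user_id]

    def destroy(self, user_id):
        # Called by whatever code changes the user's data
        self._store.pop(user_id, None)


emails = {5: "old@example.com"}   # pretend this is the database
loads = []                        # records every expensive load


def load_profile(user_id):
    loads.append(user_id)
    return {"id": user_id, "email": emails[user_id]}


def update_email(cache, user_id, new_email):
    emails[user_id] = new_email   # the UPDATE
    cache.destroy(user_id)        # the event that expires the cache


cache = ProfileCache(load_profile)
before = cache.get(5)["email"]
cache.get(5)                      # second read is served from the cache
update_email(cache, 5, "new@example.com")
after = cache.get(5)["email"]     # rebuilt right after the change
```

The profile is loaded twice in total: once on first use and once after the update. The user sees the new email address on the very next read.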

Why event driven? It doesn't change very often, however, when it does change it is very important that it expire promptly. Imagine if a user changes their email address, but when they go back to view their account summary it still shows their old information? They won't think "Oh, they are probably caching this information to reduce load in their servers." They will think "What the heck? This stupid site is broken."

When to use Event Driven / Object Caching

  1. Infrequent Updates - If this information updates infrequently, then having it expire based on events will allow you to keep that cached data longer and put less load on your servers.
  2. Inconsistent Updates - If the information updates inconsistently then use events to expire data. Example: one day it might be updated 5 or 6 times, but then go an entire week without being updated. You could safely make your cache timeout set to several days without problems.
  3. Time Critical - If having out-of-date information for a short period of time is not an option, then having each event that changes the information clear the cache is your best option.

Common Pitfalls w/ Caching

While caching can be a great solution to improve performance, if not engineered properly it can cause major headaches.

Query Caching

I've seen a lot of examples with caching where a developer caches a query. They will do something like this:

PHP:
<?php

$sql = "SELECT * FROM users WHERE id = 5";
$result = $mem->get(md5($sql));
if(!$result)
{
    // Execute and get the result of the query in an array
    $result = Database::FetchArray($sql);
    // set the timeout to 4 hours
    $mem->set(md5($sql), $result, 0, 4 * 60 * 60);
}

?>

While it works, there is a problem. There isn't an easy way to expire the cache. That can cause you headaches down the road. If you want to make sure you can clear the cache from anywhere in your code, wrap the cache in a class. I highly recommend you wrap every cache in a class. Here is an example:

PHP:
<?php

/** class.usercache.php **/

// Class to wrap around this cacheable object
class UserCache
{
    // Method that returns the current instance of the memcache object
    static public function GetMemcache()
    {
        // We have a class that handles the memcache object and this function
        // returns the current instance of that class to us.
        return MemcacheClass::GetObj();
    }

    // A method that handles the generation of the key.
    // That way if we want to change the key structure
    // we only have to handle it here
    static public function GetKey($user_id)
    {
        $key = "UserCache_".$user_id;
        return md5($key);
    }

    // Function to get the cache
    static public function GetCache($user_id)
    {
        // Get our memcache object
        $mem = self::GetMemcache();

        // Get the key for this user
        $key = self::GetKey($user_id);

        // See if it's in the cache
        $data = $mem->get($key);

        // If the cache didn't have a valid entry let's make one
        if(!$data)
        {
            // structure the query (the id is cast to an integer before it goes into the SQL)
            $sql = "SELECT * FROM users WHERE id = ".(int) $user_id;
            // execute the query and get the result
            $data = Database::FetchArray($sql);
            // store the result in the cache for 4 hours
            $mem->set($key, $data, 0, 4 * 60 * 60);
        }

        // return the cache with the correct data
        return $data;
    }

    // Deleting/Clearing the cache
    static public function DestroyCache($user_id)
    {
        // Get our memcache object
        $mem = self::GetMemcache();

        // Get the key for this user
        $key = self::GetKey($user_id);

        // delete it from the cache
        $mem->delete($key);
    }
}

/** somewhere else in your code **/

// Get the user's information!
$data = UserCache::GetCache(5);

/** the function that updates the user record **/

// .. update the user

UserCache::DestroyCache($user_id);

// .. continue execution

?>

Over Caching / Only Solution

Once you start caching, you can go a little overboard if you're not careful. Caching can also be used as a band-aid to cover up inefficient code that, even with caching, will cause problems down the road. You also have to face the fact that your memcached servers might go offline. Memcached should be used to speed up your website, but not be the only thing holding it up. If your caching servers go down, or you need to flush them, your website needs to be able to function without them. I know of people who have fried their database server after their caching solution went offline. Memcached is a great solution, but not a band-aid for sloppy design.
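
One way to keep the site standing when the cache is unreachable is to treat every cache call as optional. The sketch below is in Python rather than PHP, and everything in it (CacheUnavailable, DictCache, DownCache, get_user) is made up for the illustration; a real memcached client reports failures in its own way, so treat the exception handling as an assumption:

```python
class CacheUnavailable(Exception):
    """Raised by the hypothetical cache clients below when the server is down."""


class DictCache:
    """A healthy cache, faked with a dictionary."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class DownCache:
    """A cache whose server is offline: every call fails."""

    def get(self, key):
        raise CacheUnavailable(key)

    def set(self, key, value):
        raise CacheUnavailable(key)


def get_user(cache, load_from_db, user_id):
    key = "user_%d" % user_id
    try:
        cached = cache.get(key)
    except CacheUnavailable:
        return load_from_db(user_id)    # cache is down: go straight to the database
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass                            # failing to store must not break the page
    return value


db_hits = []


def load_from_db(user_id):
    db_hits.append(user_id)
    return {"id": user_id}


healthy = DictCache()
get_user(healthy, load_from_db, 5)
get_user(healthy, load_from_db, 5)
hits_with_cache = len(db_hits)          # only the first call reached the database

get_user(DownCache(), load_from_db, 5)
get_user(DownCache(), load_from_db, 5)
hits_total = len(db_hits)               # both calls reached the database
```

The second half is the scenario to plan for: with the cache offline every request lands on the database, so the queries behind it still need to be efficient enough to survive that.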

Live Example

Here is a Jing recording of the Dating DNA website and several places where we use caching. This isn't a full list of the places we cache, just a few. You can view the full size video here.

Conclusion

Speed up your website with caching. While it's not the solution for everything, it can decrease database load and page load times. If you have any questions or comments, please feel free to leave them below.


Python with a modular IDE (Vim)

On Thursday, May 9th, 2008 the Utah Python User Group decided to settle the debate that has plagued us developers since the beginning of time: If you were a programming language, what editor would you use?

I was tasked with showing Eclipse with the PyDev plugin in all its glory–but we all know–real men / developers don’t use IDEs, so we are going to talk about using Python and Vim together, reaching a state of Zen that the Dalai Lama would be jealous of and establishing more Feng Shui than Martha Stewart’s Kitchen.

Freely jump between your code and python class libraries

There are 2 ways to add the ability to jump between Python class libraries. The first is to set up vim to know where the Python libs are so you can use ‘gf’ to get to them (gf is goto file). You can do this by adding this snippet to your .vimrc:

python << EOF
import os
import sys
import vim
for p in sys.path:
    if os.path.isdir(p):
        vim.command(r"set path+=%s" % (p.replace(" ", r"\ ")))
EOF

With that snippet you will be able to go to your import statements and hit ‘gf’ on one of them and it’ll jump you to that file.

The second way to make the Python class libraries accessible is to use ctags to generate an index of all the code for vim to reference:

$ ctags -R -f ~/.vim/tags/python.ctags /usr/lib/python2.5/

and then in your .vimrc

set tags+=$HOME/.vim/tags/python.ctags

This will give you the ability to use CTRL+] to jump to the method/property under your cursor in the system libraries and CTRL+T to jump back to your source code.

I also have 2 tweaks in my .vimrc so you can use CTRL+LeftArrow and CTRL+RightArrow to move between the files with more natural key bindings.

map <silent><C-Left> <C-T>
map <silent><C-Right> <C-]>

You can also see all the tags you’ve been to with “:tags”

Code Completion

To enable code completion support for Python in Vim you should be able to add the following line to your .vimrc:

autocmd FileType python set omnifunc=pythoncomplete#Complete

but this relies on the fact that your distro compiled python support into vim (which they should!).

Then all you have to do to use your code completion is hit the unnatural, wrist breaking, keystrokes CTRL+X, CTRL+O. I’ve re-bound the code completion to CTRL+Space since we are making vim an IDE! Add this command to your .vimrc to get the better keybinding:

inoremap <Nul> <C-x><C-o>

Documentation

No IDE is complete without the ability to access the class library documentation! You’ll need to grab this vim plugin. This gives you the ability to type :Pydoc os.path or use the keystrokes <Leader>pw and <Leader>pW to search for the item under the cursor. (Vim’s default <Leader> is “\”).

Syntax Checking

Vim already has built in syntax highlighting for python but I have a small tweak to vim to give you notifications of small syntax errors like forgetting a colon after a for loop. Create a file called ~/.vim/syntax/python.vim and add the following into it:

syn match pythonError "^\s*def\s\+\w\+(.*)\s*$" display
syn match pythonError "^\s*class\s\+\w\+(.*)\s*$" display
syn match pythonError "^\s*for\s.*[^:]$" display
syn match pythonError "^\s*except\s*$" display
syn match pythonError "^\s*finally\s*$" display
syn match pythonError "^\s*try\s*$" display
syn match pythonError "^\s*else\s*$" display
syn match pythonError "^\s*else\s*[^:].*" display
syn match pythonError "^\s*if\s.*[^\:]$" display
syn match pythonError "^\s*except\s.*[^\:]$" display
syn match pythonError "[;]$" display
syn keyword pythonError         do

Now that you have the basics covered, let’s add some more complicated checking. Add these 2 lines to your .vimrc so you can type :make and get a list of syntax errors:

autocmd BufRead *.py set makeprg=python\ -c\ \"import\ py_compile,sys;\ sys.stderr=sys.stdout;\ py_compile.compile(r'%')\"
autocmd BufRead *.py set efm=%C\ %.%#,%A\ \ File\ \"%f\"\\,\ line\ %l%.%#,%Z%[%^\ ]%\\@=%m
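
To see what that makeprg line is doing outside of Vim, here is a small standalone sketch of the same py_compile check. It is written for a current Python 3 interpreter rather than the Python 2.5 this post was written against, and the check_syntax helper and the temporary file names are made up for the example:

```python
import os
import py_compile
import tempfile


def check_syntax(source):
    """Return None if `source` compiles, otherwise the compiler's error message."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snippet.py")
        with open(path, "w") as handle:
            handle.write(source)
        try:
            # doraise=True raises instead of printing the error to stderr;
            # cfile keeps the compiled output inside the temporary directory
            py_compile.compile(path, cfile=os.path.join(tmp, "snippet.pyc"), doraise=True)
        except py_compile.PyCompileError as err:
            return err.msg
    return None


good = check_syntax("for x in range(3):\n    pass\n")
bad = check_syntax("for x in range(3)\n    pass\n")   # missing colon
```

The error message includes the file name and line number, which is what the efm (errorformat) setting above teaches Vim to parse.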

You will be able to type :cn and :cp to move around the error list, and :clist to see all the errors. Finally, sometimes you will want to check the syntax of small chunks of code, so we’ll add the ability to execute visually selected lines of code. Add this snippet to your .vimrc:

python << EOL
import vim
def EvaluateCurrentRange():
    eval(compile('\n'.join(vim.current.range),'','exec'),globals())
EOL
map <C-h> :py EvaluateCurrentRange()<CR>

Now you will be able to visually select a method/class and execute it by hitting “Ctrl+h”.

Browsing the source

Moving around the source code is an important feature in most IDEs with their project explorers, so to get that type of functionality in vim we grab the Tag List plugin. This will give you the ability to view all opened buffers easily and jump to certain method calls in those buffers.

The other must-have feature of an IDE when browsing code is being able to open up multiple files in tabs. To do this you type :tabnew to open up a file in a new tab and then :tabn and :tabp to move around the tabs. Add these two lines to your .vimrc to be able to move between the tabs with ALT+LeftArrow and ALT+RightArrow:


map <silent><A-Right> :tabnext<CR>
map <silent><A-Left> :tabprevious<CR>

Debugging

To add debugging support into vim, we use the pdb module. Add this to your ~/.vim/ftplugin/python.vim to have the ability to quickly add break points and clear them out when you are done debugging:

python << EOF
import vim

def SetBreakpoint():
    import re
    nLine = int( vim.eval( 'line(".")'))

    strLine = vim.current.line
    strWhite = re.search( '^(\s*)', strLine).group(1)

    vim.current.buffer.append(
       "%(space)spdb.set_trace() %(mark)s Breakpoint %(mark)s" %
         {'space':strWhite, 'mark': '#' * 30}, nLine - 1)

    for strLine in vim.current.buffer:
        if strLine == "import pdb":
            break
    else:
        vim.current.buffer.append( 'import pdb', 0)
        vim.command( 'normal j1')

vim.command( 'map <f7> :py SetBreakpoint()<cr>')

def RemoveBreakpoints():
    import re

    nCurrentLine = int( vim.eval( 'line(".")'))

    nLines = []
    nLine = 1
    for strLine in vim.current.buffer:
        if strLine == 'import pdb' or strLine.lstrip()[:15] == 'pdb.set_trace()':
            nLines.append( nLine)
        nLine += 1

    nLines.reverse()

    for nLine in nLines:
        vim.command( 'normal %dG' % nLine)
        vim.command( 'normal dd')
        if nLine < nCurrentLine:
            nCurrentLine -= 1

    vim.command( 'normal %dG' % nCurrentLine)

vim.command( 'map <s-f7> :py RemoveBreakpoints()<cr>')
EOF

With that code you can now hit F7 and Shift-F7 to add/remove breakpoints. Then you just launch your application with !python % (percent being the current file; you can name your main file here instead if it’s different).

Another tweak I use is to have my vim inside screen with a horizontal split, that way I can see the python interpreter and debug while still having vim there so I can easily fix my code.

Snippets

A great time saver with standard IDEs is code snippets, where you type a few keystrokes and get a lot of code out of them. An example of this would be a django model: instead of typing out the complete declaration you could type ‘mmo<tab><tab>’ and have a skeleton of your model done for you. To do this in vim we grab the Snippets EMU plugin.

Check out a great screencast of snippetsEmu in action here

Emacs

Here is a great post on how to do the same with Emacs.

Enigmo 2 and Wine

After beating Enigmo on the iPhone, I was eager for more. I’m a sucker for this type of game. I downloaded the demo of the Windows version of Enigmo 2, and, to my delight, it works great out of the box in Wine! It’s got tons more doodads than the iPhone version, so I’d say it’s a perfect upgrade path if you like that game.

The installer works fine, but I recommend launching the Enigmo 2 application after you have installed it with a command similar to this:

wine explorer /desktop=Enigmo,1024x768 'c:\\Program Files\\Ideas From the Deep\\Enigmo 2 Supernova\\Enigmo 2.exe'

Enigmo 2 appears to have a maximum resolution of 1024×768, and this command will allow you to run it in a window that size.

It even works on my HP Mini 1030, which doesn’t have the most fantastic video card in the world :)

Can Anyone Explain This Photo?


Can Anyone Explain This Photo? » Synthtopia
3 Uses for iPhone Screenshots

For all the iPhone users out there: You probably know you can take a snapshot of whatever you see on your screen:

  1. Briefly press the top and front buttons at the same time.
  2. The screen will flash white and you’ll hear a “snapshot” sound.
  3. A picture of your screen is now in your iPhone “Photos”.

I’ve found it extremely helpful to make screenshots, and I do it all the time. Here are a few reasons:

Remember an Interesting Part of a Podcast

If I’m driving and hear something I like in a podcast, I make a quick screenshot of the playback screen. When I get back to my computer, I can return to that spot in the podcast and take notes.

iphone_screenshot_podcast

Save a Point on a Map

Sometimes I want to “bookmark” a location on the map before looking up something else. A screenshot is a fast way to do this.

iphone_screenshot_map

Save a Website Address Without Interrupting Your Reading

Sometimes when I’m reading in Google Reader, I want to save the location of an article to read later. (I don’t want to leave Google Reader immediately because it has to entirely reload when I return.)

If you hold your finger on a link for a few seconds, a menu will pop up with the address of the link. Sometimes I simply save a screenshot of the link, then hit Cancel and go back to my reading. Later I read the items I saved in my screenshots.

iphone_screenshot_opened_link

Screenshots can help you practice “ubiquitous capture” — capturing all notes, thoughts, and ideas, as they come to you, so you don’t have to keep them in your head.

June 23, 2009

iPod Touch 2G with 3.0 Update: First Thoughts

It took me a while, but I finally updated my iPod Touch to the new 3.0 update.  I liked the idea of copy and paste, and the Spotlight search was nice, but I did it entirely because I wanted to enable Bluetooth on my Touch, so I could use it for wireless music and for VoIP. 

The update was clean, and took a good half hour.  The download was about 15 minutes, and the actual update process was another 15.  Then for the next 15 minutes, my iPod backed itself up.  Yes, I paid for the update, and I’m glad I did.  Piracy is not something I condone at any level.  Be honest in your dealings.  If you don’t want to pay $10 for a software upgrade, try installing Linux on your iPod (yes, there is currently a project for it). 

Once it was finished, the first thing I tried to do was pair my Plantronics Voyager 510 headset I purchased several years ago with my iPod.  Bluetooth was easy to find on the iPod Touch in Settings, under General.  I enabled it, set my iPod in discovery mode, and then set up my headset in discovery mode.  And…nothing.  My iPod wouldn’t even see it. 

Puzzled, and concerned that perhaps the chip was dead on my iPod that I bought a long time ago, I thought I would check out the boards and see if anyone else was experiencing the same issue.  Report after report came of people either not able to pair their non-stereo headset, or the mic not working on a paired stereo headset. 

So this told me two things:  The headset needs to be an A2DP stereo headset (good thing I didn’t buy a headset for my wife quite yet), and the mic wasn’t working for those out there that already had it set up.  My heart sank, because this was the main reason why I wanted Bluetooth for my iPod Touch.  I want a WiFi phone that is easy to carry around, doesn’t ring when I don’t want it to, and lets me access my voicemail on my own time. 

But the reports of the bluetooth mic were made primarily in conjunction with Skype, and when I opened Skype on my iPod, I got an incompatible OS Version error.  The app seemed to work just fine, but I wondered if the mic functionality was only being tested on an app that may not support it, instead of a limitation in the OS.

More searching brought up some discussions of the mic not working at all, even for Voice Memos, so my guess is the bluetooth mic support is not built into the OS update.  Perhaps it will be added in another minor update (one would hope), because the convenience would make owning the iPod Touch that much more of a benefit.

With regard to the other features, it is nice to have copy and paste now, though it took me a few seconds to figure out how to use it.  The Calendar update is something I have wanted for a long time (CalDAV supported at last!), but since I’m using Exchange at work, I can’t have other calendars set up.  Spotlight is really nice, because it finds anything on the iPod.  I had wondered how someone with many pages of apps would find the app they wanted.  Spotlight takes care of that. 

I had wondered when the shake to shuffle functionality would come to the iPhone and iPod Touch when it was added to the iPod Nano, though I don’t think it would be ideal for joggers.  In fact I have already heard of joggers and mountain bikers recommending the feature be turned off before starting.  The Parental Controls are a welcome feature.  That alone makes the $10.00 upgrade worth it!

I like the idea of syncing the Notes, though I still need to find the location to which they are copied.  I like the idea because it allows for spontaneous fits of writing to my iPod, and then I can copy the results to an app at another time. 

Lastly, Safari.  I had high hopes for this update, hoping that it would include full Java applet support, and not just Javascript.  Why?  Because I hold office hours in Wimba,