@amattn

SW engineering, engineering management and the business of software



2013 09 03

Grand Central Dispatch (GCD): Summary, Syntax & Best Practices

Queue and A

Apple originally described Grand Central Dispatch (GCD) this way:

  1. Threading is hard
  2. Using GCD makes it simple and fun

Both statements are correct; here are some additional points:

Submitting Blocks to Queues

The primary mechanism for using GCD is submitting blocks to queues or responding to events that pop out of queues. That’s it. There are different ways of submitting and many kinds of queues, some of them quite fancy. Ultimately, you are just scheduling tasks to be performed or performing tasks in response to events.

The magic part is that the concurrency aspect is handled for you. Thread management is automatic and tuned for system load. The usual concurrency dangers still apply, however: all UI work must be done on the main queue and, as always, check the documentation/googles to see if specific NS or UI bits are thread safe or not.

This post focuses on “submitting blocks to queues” but the buyer should be aware that libdispatch has more under the hood:

- Dispatch Groups        // coordinate groups of tasks
- Semaphores             // traditional counting Semaphores
- Barriers               // synchronize tasks in a given concurrent queue
- Dispatch Sources       // event handling for low-level events
- Dispatch I/O           // file descriptor–based operations
- Dispatch Data Buffers  // memory-based data buffer

Creating or Getting Queues

It is worth repeating: the primary mechanism of using GCD is submitting tasks to queues.

The best way to conceptualize queues is to first realize that at the very low-level, there are only two types of queues: serial and concurrent.

Serial queues are monogamous, but uncommitted. If you give a bunch of tasks to each serial queue, it will run them one at a time, using only one thread at a time. The uncommitted aspect is that serial queues may switch to a different thread between tasks. Serial queues always wait for a task to finish before going to the next one. Thus tasks are completed in FIFO order. You can make as many serial queues as you need with dispatch_queue_create.

The main queue is a special serial queue. Unlike other serial queues, which are uncommitted, in that they are “dating” many threads but only one at a time, the main queue is “married” to the main thread and all tasks are performed on it. Jobs on the main queue need to behave nicely with the runloop so that small operations don’t block the UI and other important bits. Like all serial queues, tasks are completed in FIFO order. You get it with dispatch_get_main_queue.

If serial queues are monogamous, then concurrent queues are promiscuous. They will submit tasks to any available thread or even make new threads depending on system load. They may perform multiple tasks simultaneously on different threads. It is important that tasks submitted to a concurrent queue are thread-safe and minimize side effects. Tasks are started in FIFO order, but order of completion is not guaranteed.

In Mac OS X 10.6 and iOS 4, there were only three built-in (global) concurrent queues and you could not make your own; you could only fetch them with dispatch_get_global_queue. As of Mac OS X 10.7 and iOS 5, you can create them with dispatch_queue_create("label", DISPATCH_QUEUE_CONCURRENT). You cannot directly set the priority of a concurrent queue you create yourself, although dispatch_set_target_queue lets it inherit the priority of a global queue. In practice, it often makes more sense to use the global concurrent queue with the appropriate priority than to make your own.

The primary functions used to create or get queues are summarized here:

dispatch_queue_create       // create a serial or concurrent queue
dispatch_get_main_queue     // get the one and only main queue
dispatch_get_global_queue   // get one of the global concurrent queues
dispatch_get_current_queue  // DEPRECATED

dispatch_queue_get_label    // get the label of a given queue

A quick note on dispatch_get_current_queue: it is deprecated and it never worked reliably in every case. If your implementation requires it, then your implementation should be refactored. The most common use case was to “run some block on whatever queue I am running on”. Refactored designs should pass an explicit target queue along with the block as arguments or parameters, rather than trying to rely on the runtime to determine which queue to submit to.

Adding Tasks to the Queues

Once you have queues of your very own, you can make them useful by adding tasks to them.

The primary mechanisms for doing so are the following:

// Asynchronous functions
dispatch_async
dispatch_after
// Synchronous functions
dispatch_apply
dispatch_once
dispatch_sync

dispatch_async will submit a task to a queue and return immediately. dispatch_after also returns immediately, but delays until the specified time to submit the task.

dispatch_sync will submit a task to a queue and return only when the task completes. dispatch_apply submits the task multiple times and returns only after every iteration has completed; on a concurrent queue the iterations may run in parallel. dispatch_once will run a task once and only once over the application lifetime and returns when the block completes.

In practice, I find myself using dispatch_async, dispatch_after and dispatch_once the most.

Example Code:

// add ui_update_block to the main queue
dispatch_async(dispatch_get_main_queue(), ui_update_block);

// add check_for_updates_block to some_queue in 2 seconds
dispatch_after(dispatch_time(DISPATCH_TIME_NOW, 2 * NSEC_PER_SEC), some_queue, check_for_updates_block);

// run work_unit_block i times on some_queue & wait for all iterations to complete
dispatch_apply(i, some_queue, work_unit_block);

// perform the only_once_block once and only once. 
static dispatch_once_t onceToken = 0; // It is important this is static!  
// wait for completion
dispatch_once(&onceToken, only_once_block);

// add blocking_block to background_queue & wait for completion
dispatch_sync(background_queue, blocking_block);

Queue memory management

GCD first became available in Mac OS X 10.6 and iOS 4. At that time, GCD objects (queues, groups, semaphores, sources, etc.) were treated like CFObjects and required you to call dispatch_retain and dispatch_release according to the normal create rule.

As of Mac OS X 10.8 and iOS 6 (as the deployment target), GCD objects are managed by ARC and as such manual reference counting is explicitly disallowed.

Furthermore, under ARC the following caveats apply:

  1. If you are using a GCD object within blocks that are used by the GCD object, you may get retain cycles. Using __weak or explicitly destroying the object (via mechanisms such as dispatch_source_cancel) are good ways around this. As of Xcode 4.6, the static analyzer does NOT catch this. Example:

    // Create a GCD object:
    dispatch_queue_t someQueue = dispatch_queue_create("someQueue", nil);
    // put a block on the queue, the queue retains the block.
    dispatch_async(someQueue, ^{
        // capture the GCD object inside the block,
        // the block retains the queue and BAM! retain cycle!
        const char *label = dispatch_queue_get_label(someQueue);
        NSLog(@"%s", label);
    });
    
    // You can use the typical __weak dance to work around this:
    __weak dispatch_queue_t weakQueue = someQueue;
    dispatch_async(someQueue, ^{
        __strong dispatch_queue_t strongQueue = weakQueue;
        const char *label = dispatch_queue_get_label(strongQueue);
        NSLog(@"%s", label);
    });
    
  2. Lastly, this little nugget was buried in man dispatch_data_create_map. The GCD functions dispatch_data_create_map and dispatch_data_apply create internal objects and extra care must be taken when using them. If the parent GCD object is released, then the internal objects get blown away and bad things happen. Using __strong variables or the objc_precise_lifetime attribute on the parent dispatch_data_t can help keep the parent object alive.

    // dispatch_data_create_map returns a new GCD data object.
    // However, since we are not using it, the object is immediately
    // destroyed by ARC and our buffer is now a dangling pointer!
    dispatch_data_create_map(data, &danglingBuffer, &bufferLen);
    
    // By stashing the results in a __strong var, our buffer
    // is no longer dangerous.
    __strong dispatch_data_t newData = dispatch_data_create_map(data, &okBuffer, &bufferLen);
    

Queues In Practice

Queues, like most powerful tools, can cause bodily harm if used inappropriately. Real world usage requires some discipline. Here are some general guidelines:

One guideline deserves further exploration. Because queues are lightweight, you can make lots and lots of them. It is better to have many specialized serial queues than to stuff many disconnected tasks into one or two “mega” serial/concurrent queues.

Typical “purposeful” queues look like this:

//used for importing into Core Data so we don't block the UI
dispatch_queue_create("com.yourcompany.CoreDataBackgroundQueue", NULL);

//used to prevent concurrent access to Somefile
dispatch_queue_create("com.yourcompany.SomeFile.ReadWriteQueue", NULL);

//used to perform long calculations in the background
dispatch_queue_create("com.yourcompany.Component.BigLongCalculationQueue", NULL);

Practical queue usage typically involves nested dispatching:

// the second argument is a flags value reserved for future use; pass 0
dispatch_queue_t background_queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_async(background_queue, ^{
    // do some stuff that takes a long time here...

    // follow up with some stuff on the main queue
    dispatch_async(dispatch_get_main_queue(), ^{
        // Typically updating the UI on the main thread.
    });
});

Here we launch a long-running task on the background queue. When the task is complete, we finish up by triggering a UI update to be performed on the main queue.

Also be wary of excessively nested dispatching. It hampers readability & maintainability and should be considered a somewhat pungent code smell.

Advanced Studies

If you have particular interest in any of the more quiet corners of GCD (dispatch groups, semaphores, barriers, etc.), let me know and I’ll write something up.

In the meantime, the usual sources of knowledge apply: documentation available on the web and via Xcode, as well as WWDC talks on GCD and blocks.

2013 09 10

Principles of Scalable Architectures

Most of these rules are broken in the name of performance. Make sure the tradeoff is worth it. If you overcomplicate your architecture it means you will have more components where something can go wrong.

When building or replacing a component, the general rule of thumb is to plan for 2 orders of magnitude of growth. That gives you room to grow without over planning.

When choosing components to replace, find the bottleneck, widen it and repeat.

2013 09 11

Notes & Summary of Gail Goodman’s The Long Slow SaaS Ramp Of Death

If you’ve ever had the thought: “I know, I’ll make a SaaS product” then Gail Goodman’s 2012 talk on the Long Slow SaaS Ramp of Death is for you.

Transcript and more here:

http://businessofsoftware.org/2013/02/gail-goodman-constant-contact-how-to-negotiate-the-long-slow-saas-ramp-of-death/

Here are my notes and summary of that presentation. Any errors are likely induced by me. I don’t know Gail, nor am I a customer of her company, but I loved this presentation.

The One Paragraph Version

The long, slow SaaS ramp of death is that it just takes a long time to get to minimum critical mass.

The basic premise is that you may never see hockey stick user growth, but with SaaS it might not matter. If your customers are saying you have something and you have some growth, then over time (possibly a long and challenging time), the math of SaaS usually works out in your favor.

Another initial point she made was to think of SaaS as more like a flywheel than a hockey stick.

Avoid Mirages

Don’t fall for the mirages. There are lots of different components that seem like they will boost you to some next level (partners, new features, free, viral, SEO, etc.). Rather, your mindset should be more like SW development in that there is no silver bullet. No one event or feature will induce hockey stick growth; instead you will end up working on a thousand cumulative optimizations.

(08:40) Instead, you have to work the funnel by

… making sure that when someone tries or buys your product, they have a ‘wow’ experience, they get quick to an understanding and an outcome that blows them away.

The Funnel

(12:22) “I would argue that most of those little things will happen if you continue to view your business from your customer or user inward rather than from the metrics you want to change outward.”

Try to optimize the customer outcomes first. Spending significant time optimizing your landing page before nailing a feature set that delivers value is an inversion of the process.

(12:40) “The key to changing those internal metrics (funnel), is by starting with the view from your customer looking at your business and your experience. Not by looking at your metrics & trying to change your customer’s behavior.”

Some solutions were decidedly old fashioned. Radio and free seminars worked for Constant Contact because small business owners often have radios on during the work day.

At the top of the funnel (landing pages, ad buys, etc.): Test, Scale, Tune & repeat.

Try to understand why customers aren’t flocking to you.

(22:35) “Quick to Wow.”

It’s all about optimizing the quick to wow.

(23:13)

Turns out the number one way to get them to stay, is to get them successful early.

Human Nature: When faced with a learning curve, humans tend to learn just enough to get the job done, then stop learning. It is hard to get customers to look at new features.

(23:53) Middle & Bottom of the funnel: Measure, test, repeat.

(25:00) Innovate everywhere. Not just on the tech side of the house.

Lifetime value

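(26:20) A simple formula for estimating LTV (Lifetime Value of a single customer) starts with the expected customer lifetime, which is one over your monthly attrition (churn) rate. In the case of Constant Contact, average attrition is 2.2% a month. One over 2.2% is ~45 months. Multiply that lifetime by the average monthly revenue per customer to get LTV.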
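
As a quick check of the arithmetic quoted above, here is a minimal sketch that computes one over a 2.2% monthly rate. The helper name lifetime_months is made up for illustration and is not from the talk.

```shell
# lifetime_months is a hypothetical helper: expected customer lifetime
# in months, computed as 1 / (monthly rate expressed as a fraction).
lifetime_months() {
    awk -v rate="$1" 'BEGIN { printf "%.1f\n", 1 / rate }'
}

lifetime_months 0.022   # prints 45.5
```

The talk rounds this down to roughly 45 months.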

As an aside, different industries and regions seem to have different acronyms for this concept. I’ve seen LTV, LCV, CLTV, CLV & LTCV.

Best blog post: David Skok: SaaS Metrics, A guide to measuring and improving what matters.

How did we survive?

“Operating at cash level: Only eating what we were killing.”

All spare cash went into marketing spend because at that time, CAC (Customer Acquisition Cost) was ~$300 and LTV was ~$1650. We knew that we could turn down marketing and be instantly profitable. In other words, the money generation machine was working at full capacity.
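
To put the two figures above side by side, this small sketch divides LTV by CAC. The helper name ltv_to_cac is hypothetical, and the ratio is my own framing of the numbers rather than something stated in the talk.

```shell
# ltv_to_cac is a hypothetical helper: how many times over the lifetime
# value of a customer covers the cost of acquiring that customer.
ltv_to_cac() {
    awk -v ltv="$1" -v cac="$2" 'BEGIN { printf "%.1f\n", ltv / cac }'
}

ltv_to_cac 1650 300   # prints 5.5
```

A ratio well above 1 is what makes pouring spare cash into acquisition a reasonable bet.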

The Inflection Point

(30:50) The inflection point came when our understanding of the following three things came together to the point where we were confident scaling the business:

  1. channels at the top of the funnel
  2. funnel conversion
  3. LTV

When to give up

(33:23)

If your customers are telling you you’ve got something and your metrics are continuously improving, stay on the long slow ramp of death. But if either of those aren’t true, it is probably time to parachute off.

Q&A

“You shouldn’t have to build your own metrics anymore.”

And she saved one other great nugget for the very end:

You don’t own the gas pedal on word of mouth. The best gas pedal is a great experience.

2013 09 19

Tutorial: PostgreSQL Usage and Examples with Docker

So I’m a loyal acolyte in the church of docker. I also have this little schoolgirl crush on PostgreSQL. Here’s how you can combine both into a crime-fighting dream team.

The Long, Instructive Way

Just the basics:

Spin up a container, install a text editor and snapshot an image:

sudo docker run -i -t ubuntu:precise /bin/bash

Inside the container install a text editor (because the default precise image doesn’t come with one installed):

apt-get update
apt-get install vim-tiny
exit

Snap an image. Your name is probably not amattn, however just for a moment, pretend otherwise. I know it is unpleasant, but only for a short while. I called my image precise-vim but you can call it dinglemuffin if you really want to.

sudo docker commit CONTAINER_ID amattn/precise-vim

Install the default PostgreSQL

Again with the spinning up of a new container:

sudo docker run -i -t amattn/precise-vim /bin/bash

Do the basic install. The assist with the repo info is credited to https://wiki.postgresql.org/wiki/Apt

apt-get update
apt-get install -y wget
wget -O - http://apt.postgresql.org/pub/repos/apt/ACCC4CF8.asc | apt-key add -
echo "deb http://apt.postgresql.org/pub/repos/apt/ precise-pgdg main" > /etc/apt/sources.list.d/pgdg.list
apt-get update
apt-get install -y postgresql-9.3 postgresql-client-9.3 postgresql-contrib-9.3
exit

Just a note, the above will install postgres-9.3.X where X is the latest. At the time of this update (Early Jan 2014), that is 9.3.2, but obviously, that may or may not be the case when you read this.

Again with the snapping of an image. Just a note here, I got odd failures when my image names had capital letters (as of docker 0.6.1).

sudo docker commit CONTAINER_ID amattn/postgresql-9.3.2

Container Cleanup

You can list all containers with docker ps -a. We don’t actually need the containers that we used to create images. Once we have images, we simply spin up totally new containers, while the sad, lonely ones we used to create the images get rm’d.

sudo docker rm CONTAINER_ID CONTAINER_ID ... CONTAINER_ID

Typical configuration from here:

Here’s the magic part. We want to configure PostgreSQL to put its data in a directory at the container’s root level called /data. This folder is shared with the docker host. This way, any container configured to look at /data works against a persistent directory on the host. Our data becomes decoupled from our container. In this example we use $HOME/postgresdata, but feel free to mount any host directory you like.

mkdir -p $HOME/postgresdata
sudo docker run -v="$HOME/postgresdata":"/data"  -i -t -p 5432 amattn/postgresql-9.3.2 /bin/bash

First, set up our .conf & .hba files:

cp /etc/postgresql/9.3/main/postgresql.conf /data/postgresql.conf
cp /etc/postgresql/9.3/main/pg_hba.conf /data/pg_hba.conf

Use our custom data directory (/data/main) & .hba file:

sed -i '/^data_directory*/ s|/var/lib/postgresql/9.3/main|/data/main|' /data/postgresql.conf
sed -i '/^hba_file*/ s|/etc/postgresql/9.3/main/pg_hba.conf|/data/pg_hba.conf|' /data/postgresql.conf
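
If you would like to see what those two substitutions do before touching the real file, here is a rough dry run against a scratch file. The two sample lines are an assumption about what the packaged postgresql.conf contains, the address patterns are written without the trailing *, and -i is dropped so the result is printed instead of written back.

```shell
# Scratch file holding the two settings the sed commands target
# (sample content; assumed to mirror the packaged postgresql.conf).
conf="$(mktemp)"
cat > "$conf" <<'EOF'
data_directory = '/var/lib/postgresql/9.3/main'
hba_file = '/etc/postgresql/9.3/main/pg_hba.conf'
EOF

# Same substitutions as in the tutorial, but without -i: the edited text
# is captured from stdout and the scratch file itself is left unchanged.
preview="$(sed \
    -e '/^data_directory/ s|/var/lib/postgresql/9.3/main|/data/main|' \
    -e '/^hba_file/ s|/etc/postgresql/9.3/main/pg_hba.conf|/data/pg_hba.conf|' \
    "$conf")"

printf '%s\n' "$preview"
rm -f "$conf"
```

Both settings should now point under /data.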

Create /data/main/ and fill it with stuff.

mkdir -p /data/main
chown postgres /data/*
chgrp postgres /data/*
chmod 700 /data/main
su postgres --command "/usr/lib/postgresql/9.3/bin/initdb -D /data/main"

If you want to allow access from any IP address, the next three commands are for you. This is obviously a huge security risk, especially if you don’t have a firewall or similar in place. Caveat developer.

sed -i "/^#listen_addresses/i listen_addresses='*'" /data/postgresql.conf
sed -i "/^# DO NOT DISABLE\!/i # Allow access from any IP address" /data/pg_hba.conf
sed -i "/^# DO NOT DISABLE\!/i host all all 0.0.0.0/0 md5\n\n\n" /data/pg_hba.conf
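
A less alarming variant is to allow only a trusted subnet rather than 0.0.0.0/0. The sketch below builds such a pg_hba.conf entry and appends it to a scratch file. The helper name hba_host_line and the subnet 192.168.1.0/24 are placeholders; on the real system the target would be /data/pg_hba.conf.

```shell
# hba_host_line is a hypothetical helper that formats one pg_hba.conf
# entry: all databases, all users, the given CIDR range, md5 passwords.
hba_host_line() {
    printf 'host all all %s md5\n' "$1"
}

# Append to a scratch file for demonstration purposes only.
hba="$(mktemp)"
hba_host_line "192.168.1.0/24" >> "$hba"   # example subnet, not from the post
cat "$hba"                                 # prints: host all all 192.168.1.0/24 md5
rm -f "$hba"
```

You would still need listen_addresses set so that PostgreSQL accepts connections on a non-local interface.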

Start PostgreSQL

su postgres --command "/usr/lib/postgresql/9.3/bin/postgres -D /data/main -c config_file=/data/postgresql.conf" &

# As the user postgres, create a user named docker
su postgres --command 'createuser -P -d -r -s docker'

# As the user postgres, create a db docker owned by postgres user docker
su postgres --command 'createdb -O docker docker'

Shutdown PostgreSQL

su postgres --command '/usr/lib/postgresql/9.3/bin/pg_ctl --pgdata=/data/main stop'
exit

Now we commit, but we should use a tag! Until now, all our commits are for general purpose containers. Even though all data and configuration is “outside” the container, we still want to be able to identify for what purpose a container exists. As of this writing, tags are the best way to do so.

sudo docker commit CONTAINER_ID amattn/postgresql-9.3.2 TAGNAME

I’ve found that tags in the format of amattn/component:appname work very well in practice:

amattn/postgres-9.2.1:favstarclone
amattn/postgres-9.3.2:flickrclone
amattn/mariadb-55:bookmarker
amattn/redis-2.6.16:bookmarker

The tags also help us remember not to delete those containers.
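
If you script your commits, a tiny helper can assemble names in that format. This is only a sketch: image_ref is a made-up name, and lowercasing everything is a conservative choice prompted by the capital-letter failures mentioned earlier, not a documented docker requirement for tags.

```shell
# image_ref is a hypothetical helper: joins user, component and app name
# into user/component:appname and lowercases the whole thing.
image_ref() {
    printf '%s/%s:%s\n' "$1" "$2" "$3" | tr '[:upper:]' '[:lower:]'
}

image_ref amattn PostgreSQL-9.3.2 FlickrClone   # prints amattn/postgresql-9.3.2:flickrclone
```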

Launching the Container

Launch the container with the run command. Notice that we aren’t spinning up a shell anymore. We are launching a container w/ the tag TAGNAME, running a single process (postgres) as the user postgres, with a random port forwarded to the container’s port 5432 and a directory mounted to the container’s /data.

sudo docker run -v="$HOME/postgresdata":"/data" -d -p 5432 amattn/postgresql-9.3.2:TAGNAME su postgres --command "/usr/lib/postgresql/9.3/bin/postgres -D /data/main -c config_file=/data/postgresql.conf"

At this point, the container should be humming along in the background. You can even prove it to your disbelieving self with the ps command. In particular, the status column should list an uptime and not an exit code:

sudo docker ps -a

Start and stop the container with:

sudo docker stop CONTAINER_ID
sudo docker start CONTAINER_ID

Get the host port with either of:

sudo docker ps -a
sudo docker port CONTAINER_ID
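
Once you have the mapping, pulling out the host port and connecting is a one-liner each. The sketch below assumes a mapping line shaped like "5432/tcp -> 0.0.0.0:49153"; the exact output differs between docker versions, and both the sample port and the helper name host_port are made up. The user and database named docker are the ones created earlier.

```shell
# host_port is a hypothetical helper: returns whatever follows the last
# colon of a port-mapping line, which is the host-side port.
host_port() {
    printf '%s\n' "${1##*:}"
}

# Sample mapping line (assumed format; check what your docker prints).
port="$(host_port '5432/tcp -> 0.0.0.0:49153')"

# Print the psql command you would run from the host.
echo "psql -h localhost -p $port -U docker docker"
```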

The Short, Borderline Cheating Way

In the host:

mkdir -p $HOME/postgresdata
sudo docker run -v="$HOME/postgresdata":"/data"  -i -t -p 5432 amattn/postgresql-9.3.2 /bin/bash

Inside the container:

cp /etc/postgresql/9.3/main/postgresql.conf /data/postgresql.conf
cp /etc/postgresql/9.3/main/pg_hba.conf /data/pg_hba.conf
sed -i '/^data_directory*/ s|/var/lib/postgresql/9.3/main|/data/main|' /data/postgresql.conf
sed -i '/^hba_file*/ s|/etc/postgresql/9.3/main/pg_hba.conf|/data/pg_hba.conf|' /data/postgresql.conf

mkdir -p /data/main
chown postgres /data/*
chgrp postgres /data/*
chmod 700 /data/main
su postgres --command "/usr/lib/postgresql/9.3/bin/initdb -D /data/main"

# OPTIONAL: configure /data/postgresql.conf & /data/pg_hba.conf to allow access from trusted IP addresses

# Start PostgreSQL
su postgres --command "/usr/lib/postgresql/9.3/bin/postgres -D /data/main -c config_file=/data/postgresql.conf" &

# OPTIONAL: add PostgreSQL user(s), do other setup/config

# Stop PostgreSQL
su postgres --command '/usr/lib/postgresql/9.3/bin/pg_ctl --pgdata=/data/main stop'

exit

Back in the host, optionally commit and tag. Launch the container with the run command:

sudo docker run -v="$HOME/postgresdata":"/data" -d -p 5432 amattn/postgresql-9.3.2:OPTIONAL_TAGNAME su postgres --command "/usr/lib/postgresql/9.3/bin/postgres -D /data/main -c config_file=/data/postgresql.conf"


© matt nunogawa 2010 - 2017