
Why Some Mobile Apps are So Slow

If you haven’t read the lengthy article on Why Mobile Web Apps are So Slow, I recommend you check it out. It appears well researched, citing lots of tests, sources, benchmarks and authorities. In summary: JavaScript is garbage collected, and garbage collection introduces overhead that can make it up to 5x slower than native code. This isn’t such a big deal on x86 desktops, but on the slower ARM architecture it is killing the performance of mobile apps.

Take a look at it; even if you just skim it you will no doubt learn something about this heated debate. Oh, and everywhere it talks about LLVM and ARC, remember that this is the same architecture Delphi uses [PDF] for iOS development (and soon for Android too).

Also keep in mind that JavaScript isn’t the only garbage collected language on mobile devices. Languages that run on the Java Virtual Machine and the .NET Framework are also garbage collected, as are most scripting languages. This includes the Dalvik Virtual Machine that non-NDK Android apps run on. Granted, Dalvik is optimized differently than the Java Virtual Machine, but it is still garbage collected, so it will still pause program execution at some point.

Quote from the article by Herb Sutter:

Managed languages made deliberate design tradeoffs to optimize for programmer productivity even when that was fundamentally in tension with, and at the expense of, performance efficiency

Which was endorsed by Miguel de Icaza of Mono & Xamarin:

This is a pretty accurate statement on the difference of the mainstream VMs for managed languages (.NET, Java and Javascript). Designers of managed languages have chosen the path of safety over performance for their designs.

Points to remember:

  • Garbage collection is up to 5x slower than native code
  • This is a much bigger deal on ARM architecture than x86
  • Automatic Reference Counting (ARC) is not Garbage collection
  • Delphi uses LLVM architecture and supports ARC like Objective-C

Clarification: A big part of the slowdown is that JavaScript is also a dynamic language, so not all garbage collected languages are 5x slower than native code. There are pros and cons to both GC and ARC. A comment on the article points out that the 5x comparison was between GC and manual memory management, not ARC. There is overhead with ARC, but it doesn’t pause your app’s execution.

Read the article and draw your own conclusions, but I’d love to hear what you think.

14 replies on “Why Some Mobile Apps are So Slow”

The article really makes two points: JavaScript is slower than “native code” (presumably because it’s a dynamic language), and it seems to imply ARC is better than GC for some situations. Both approaches have their own overhead. ARC takes more time for each assignment of a reference (it has to do an atomic +1/-1), but that time is constant. GC is faster at allocation, but when a collection does hit, it is slower than ARC’s constant overhead.

Just look at a simple example, like a tree of Nodes where the nodes also hold some references to each other (in addition to the tree structure).

You might notice that ARC gets the “best” of both worlds:
a) The complex development task: compared to manual memory management (where you just kill all the nodes), it adds a certain order that has to be respected when removing the references.
b) The slowness (finding and eliminating references in the right order).

Also there’s the real world outside there where tasks and data-structures aren’t always as simple as a tree with a few references on top…

So, garbage collection is inherently slow unless you give it far more memory than it actually needs. Nothing most of us didn’t already know, but it’s nice to have that backed up by hard data.

For some reason I can’t help but wonder what Joseph Mitzen would say if he read that paper…

On the other hand, every mobile OS but one relies on GC: the most widespread one relies on GC, and the second most successful one was successful with a GC initially, and hasn’t improved its market share since it switched to ARC…
Also, this conveniently overlooks two facts:
– the graphics subsystem is far more often the bottleneck in modern mobile apps
– multi-core is becoming the norm, and ARC does not scale as well on multi-core, while GC can

So before crying wolf, one should profile, and be wary of spending developer time (on ARC weak references) where hardware can do it better (RAM and multiple cores).

ARC can be slower than GC, especially if ARC is not properly implemented from a performance point of view – which is the case with the Delphi NextGen RTL: even though it uses LLVM as a back-end, the associated Delphi RTL uses a giant lock and is not optimized for multi-threaded applications.

For a series of articles about Delphi, ARC and GC, you can take a look at some of my blog entries at http://blog.synopse.info/tag/GarbageCollector

In late 2011, I was already describing how ARC could be great for the future of Delphi. But I’m still not convinced by the NextGen implementation, which forces you to use ARC and already has some identified issues. I hope they will be fixed as soon as possible.

@Eric: That article really doesn’t look like it’s suffering from a lack of profiling. It’s going over a huge amount of hard data and talking about a specific scenario, which seems to be a different one from the one you’re referring to. The bottom line is, in the specific scenario of a memory-constrained environment–defined as one in which you’re using more than 1/6 of the total memory available–garbage collection becomes an incredibly expensive bottleneck.

@Mason: yes, the problem with the article is that it starts from a specific case and hardware, and then draws generic conclusions about mobile.

Jim, good that you revived blog posting and the podcast. If you take into account Eric’s article http://delphitools.info/2013/07/18/mobile-performance-a-look-back/ you’ll see there is no 5-10x speed difference between the x86 and ARM architectures, in either JavaScript or native code. Two times slower seems more like it. Of course, someone signed as EMB mentioned the very same article you began with. I only hope that article is not the only source you draw conclusions from about the performance of ARM applications, and that more posts on the subject are to come 🙂

@IL I found the article interesting and informative, but it certainly isn’t the basis of anything beyond this blog post.

The 5x difference is between Garbage Collected JavaScript vs. Non-GC Native code. The article then says that the impact of the 5x difference is more noticeable on ARM than on x86 desktop.

Eric’s article was really informative too!

Jim, I just want to be sure you are open to dialog and exchanging opinions on the rapidly changing topic of mobile development, because perhaps you are somehow involved in decisions about Delphi on Android and, more importantly, in speaking to the public, developers and so on.
Please excuse some haste in publishing this comment, as I haven’t read the entire article yet; I’m going to spend the time it needs, it is worth reading.
That article is only as good as its results are reproducible, and the author provided the opportunity by sharing the code. I’ve tried it myself on an x86 computer:
test.c (mingw32) – 4 secs
test.cpp (VC++ 2008) – 3.3 secs
test.html (chrome 28) – 17 secs
test.html (IE10) – 9 secs
I’ve run spectralnorm(2500) in every case.

Interesting read, but at the end of the day it is the _application design_ that determines the actual speed of the application. You can definitely make slow native apps, and you can surely make GC apps that are fast enough.

The user experience is not determined by the optimal speed of an app…

Just take a look at these two examples and make your own judgment:

1) A native app built with XE4:
https://itunes.apple.com/en/app/profund/id648519668

2) A web app built with Smart Mobile Studio:
http://clevertrain.de/app/sportbootfuehrerschein-binnen/

That 5x worse factor looks like somebody pulled it out of xis/xers ass.

Good GCs actually were, for some time, more efficient than reference counting (which costs you all the time, use by use, cache invalidation by cache invalidation). They just weren’t predictable.

Today we have incremental GCs (which have predictable pauses), and even concurrent ones (meaning only minimal pauses at the start/end of a GC cycle for non-GC threads).

Comments are closed.