IO performance

IO matters in almost every type of application, because IO operations can easily become a bottleneck.

In the Java world there are two groups of IO classes: traditional IO (TIO) and New IO (NIO), with an enhancement to NIO, NIO2, on the way.
NIO (and NIO2) aims to improve performance in certain cases and to provide better OS-level IO integration, but it cannot replace TIO! There are plenty of places where TIO is your only option.
Today we will talk about TIO's performance.

There are two major types of IO bottleneck:

  1. Wrong IO buffer usage
  2. Overkill synchronized protection

We all know a buffer can improve IO performance, but not everyone knows how to use buffers correctly. At the end of this blog I will list some best-practice advice.

Part I: Wrong IO buffer usage
There are two popular misuses and one misunderstanding:

a) Adding a buffer to in-memory IO classes (misuse)

b) Adding an explicit buffer to buffered-version IO classes (misuse)

c) The relationship between buffered-version IO classes and adding an explicit buffer (misunderstanding)

For a), this is ridiculous! Adding a buffer is supposed to group many IO device accesses into one access; in-memory IO classes (like ByteArrayInput/OutputStream) never touch any IO device, so there is no reason to buffer them.

For b), this is redundant! You only need one level of buffering; more than one level only introduces more stack calls and more garbage creation.
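A minimal sketch of both misuses next to the correct pattern may help. The file name and buffer size here are arbitrary, chosen only for illustration:

```java
import java.io.*;

public class BufferMisuse {
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[8192];

        // Misuse a): buffering an in-memory stream. ByteArrayInputStream
        // never touches an IO device, so the wrapper only adds overhead.
        InputStream pointless =
            new BufferedInputStream(new ByteArrayInputStream(data));

        File tmp = File.createTempFile("demo", ".bin");
        tmp.deleteOnExit();
        try (OutputStream out = new FileOutputStream(tmp)) {
            out.write(data);
        }

        // Misuse b): double buffering. BufferedInputStream already keeps an
        // internal buffer; reading through it with a large explicit byte[]
        // adds a second, redundant buffering layer.
        try (InputStream in =
                new BufferedInputStream(new FileInputStream(tmp))) {
            byte[] redundant = new byte[8192];
            while (in.read(redundant) != -1) { /* two buffer layers at work */ }
        }

        // Correct: one level of buffering, here an explicit byte[].
        long total = 0;
        try (InputStream in = new FileInputStream(tmp)) {
            byte[] buffer = new byte[8192];
            int length;
            while ((length = in.read(buffer)) != -1) {
                total += length;
            }
        }
        System.out.println("bytes read: " + total);
    }
}
```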

For c), this needs more explanation.

They try to achieve the same goal, but in different ways, and those ways have different performance!

We ran a test for this, comparing reading/writing files with buffered-version IO classes and with an explicit buffer.
The following performance test results show how big the difference is:

Read: (all numbers are taken after warmup, each sample time covers 10 reads, time unit: ms)

File size                    1K  10K  100K   1M   10M   100M      1G
BufferedInputStream           0    1     5   53   549   5492   56002
With explicit byte[] buffer   0    0     1   10   113   1126   11448


Write: (all numbers are taken after warmup, each sample time covers 10 writes, time unit: ms)

File size                    1K  10K  100K   1M   10M   100M      1G
BufferedOutputStream          0    1     5   45   472   4793   48794
With explicit byte[] buffer   0    1     1   10   124   1300   13138
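The two reading styles compared above can be sketched as follows. This is a rough micro-benchmark for illustration, not the original test harness; the 1 MB file size and 8 KB buffer are arbitrary choices:

```java
import java.io.*;

public class ReadComparison {
    public static void main(String[] args) throws IOException {
        // Build a 1 MB test file.
        File tmp = File.createTempFile("read", ".bin");
        tmp.deleteOnExit();
        try (OutputStream out = new FileOutputStream(tmp)) {
            out.write(new byte[1024 * 1024]);
        }

        // Style 1: BufferedInputStream, reading one byte at a time.
        long start = System.nanoTime();
        long total1 = 0;
        try (InputStream in =
                new BufferedInputStream(new FileInputStream(tmp))) {
            while (in.read() != -1) {
                total1++;
            }
        }
        long bufferedNanos = System.nanoTime() - start;

        // Style 2: raw FileInputStream with an explicit byte[] buffer.
        start = System.nanoTime();
        long total2 = 0;
        byte[] buffer = new byte[8192];
        try (InputStream in = new FileInputStream(tmp)) {
            int length;
            while ((length = in.read(buffer)) != -1) {
                total2 += length;
            }
        }
        long explicitNanos = System.nanoTime() - start;

        System.out.println("buffered bytes: " + total1);
        System.out.println("explicit bytes: " + total2);
        System.out.println("buffered ns: " + bufferedNanos
            + ", explicit ns: " + explicitNanos);
    }
}
```

Both styles read the same data; the explicit buffer simply skips the per-call overhead that the next section explains.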


Why is there such a huge performance difference? There are two reasons:

  • Buffered-version IO causes more stack calls (thanks to the decorator pattern)
  • All buffered-version IO classes are thread-safe, which means a lot of synchronized protection (more on this in Part II)

Now you know the explicit buffer has better performance, so use it whenever possible. But there are two cases where you still need buffered-version IO:

  • You are working with a third-party library that takes IO objects as input parameters and uses them in a streaming way, not with an explicit buffer. To improve performance you have to pass in a buffered-version IO object.
  • Well, if you are lazy, you may prefer the buffered-version IO classes, since they keep your code shorter.
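The first case looks like this in practice. `countBytes` is a hypothetical stand-in for a third-party API that insists on pulling single bytes from whatever stream you hand it:

```java
import java.io.*;

public class ThirdPartyStream {
    // Hypothetical third-party method: you cannot change its signature
    // to accept a byte[], so it reads the stream one byte at a time.
    static long countBytes(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("lib", ".bin");
        tmp.deleteOnExit();
        try (OutputStream out = new FileOutputStream(tmp)) {
            out.write(new byte[4096]);
        }

        // Since we cannot add an explicit buffer inside countBytes,
        // wrapping the raw stream in a BufferedInputStream is the
        // right call here.
        try (InputStream in =
                new BufferedInputStream(new FileInputStream(tmp))) {
            System.out.println("bytes: " + countBytes(in));
        }
    }
}
```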

Part II: Overkill synchronized protection

I mean JDK's io package. I don't really like that code, since it is all thread-safe, which means a lot of synchronized protection. If I need thread safety, I prefer to do the protection myself, so I never add overkill synchronized protection. But JDK's io package gives me no choice.

As long as you use JDK IO code, you are adding a lot of synchronized protection; even when you are 100% sure you are in a single-threaded context, you cannot bypass the unnecessary synchronization. You may wonder, is this really a serious problem? The JVM should be able to handle uncontended locks quickly, right? Apparently it cannot do this well enough; see the performance test results.

I recreated a batch of IO classes following the JDK io package's javadoc; none of my classes do any synchronization. All tests are done in a single thread, so don't worry about thread safety.
We ran tests for this, comparing reading/writing in-memory data with the original JDK IO classes and with our unsync-version IO classes. The reason for using in-memory data is to magnify the performance impact of synchronized.
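To make the comparison concrete, here is a minimal sketch in the same spirit. `UnsyncByteArray` below is a simplified illustration of an unsynchronized in-memory output stream, not the actual implementation from com.liferay.portal.kernel.io.unsync; the JDK's ByteArrayOutputStream, by contrast, declares its write methods synchronized:

```java
import java.io.ByteArrayOutputStream;

public class UnsyncDemo {
    // Sketch of an unsynchronized growable byte sink. Same logic as the
    // JDK class, but no monitor is acquired on each write.
    static class UnsyncByteArray {
        private byte[] buffer = new byte[32];
        private int size;

        void write(int b) {
            if (size == buffer.length) {
                byte[] grown = new byte[buffer.length * 2];
                System.arraycopy(buffer, 0, grown, 0, size);
                buffer = grown;
            }
            buffer[size++] = (byte) b;
        }

        int size() {
            return size;
        }
    }

    public static void main(String[] args) {
        int count = 1_000_000;

        // JDK version: every write(int) call is a synchronized method.
        ByteArrayOutputStream jdk = new ByteArrayOutputStream();
        for (int i = 0; i < count; i++) {
            jdk.write(i);
        }

        // Unsync version: identical work, no lock acquisition per call.
        UnsyncByteArray unsync = new UnsyncByteArray();
        for (int i = 0; i < count; i++) {
            unsync.write(i);
        }

        System.out.println("jdk size: " + jdk.size());
        System.out.println("unsync size: " + unsync.size());
    }
}
```

Both sinks end up with the same contents; the only difference is the per-call locking, which is exactly what the in-memory test is designed to magnify.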

Read: (performance chart not reproduced here)

Write: (performance chart not reproduced here)

The write curve is not as smooth as the read curve, because the internal growing byte[] causes a lot of GC (similar to the problem we saw with SB).

Ok, now you see how heavy the synchronized protection is. We have a lot of IO usage within a single method call, which is guaranteed to be single-threaded. We also have a lot of IO usage where, even though the references to the IO objects escape the method scope, we can reason that they are only accessed by a single thread. For cases like these, feel free to use the unsync-version IO classes under com.liferay.portal.kernel.io.unsync.

For more detail about com.liferay.portal.kernel.io.unsync, see issues.liferay.com/browse/LPS-6649

My final advice:
1) Use an explicit buffer rather than buffered-version IO whenever possible.
2) Use buffered-version IO with third-party libraries, or when you are lazy.
3) Use the unsync-version IO classes from com.liferay.portal.kernel.io.unsync whenever you are sure you are in a single-threaded context, or when you are adding the synchronization protection yourself.
