This post is a slightly amended version of the presentation that Jan Weigner gave during Cinegy’s Technical Conference 2021 which was live-streamed on March 25th and is now available to view as a recording.
The post is named "Bytes and Pieces", a general headline to cover everything that is going to be discussed. We’re going to review GPUs, CPUs, the latest and greatest smartphones and what you can do with them. The latest Intel NUC and some of the best PCIe SSDs will also be reviewed. The post provides pointers and also a view of what has really changed in the last five years. A lot has changed but, surprisingly, in many areas where more progress could have been expected, the anticipated things haven’t happened.
Let’s start by looking at CPUs and what has changed in recent years. Here is what we had five years ago and what we have today.
Five years ago Intel was still on the Skylake architecture, with something like the Core i7-6700. There are still plenty of those machines around.
AMD five years ago – that was Athlon and Opteron, because the Zen architecture was only launched in 2017.
ARM five years ago – smartphones, tablets, embedded devices, everywhere. But if we look at today, things have changed quite dramatically, especially for ARM.
Intel now finally has Tiger Lake; it is already in laptops and it will come to the Xeons.
On the desktop, for now, it will be 14 nm, and there is hope that in the second half of this year we will see those coming out in 10 nm as well. We have already seen the launch of Zen 3, the third generation, on 7 nm for the Ryzens. Now we have seen it announced for the EPYCs and finally, of course, it will come to workstations: Threadripper is going to follow as well.
So, all around there is good progress, but the real change that has started to happen in the last couple of years is more and more ARM on servers. And now, with the giant Apple switch away from Intel, there will be more and more ARM on laptops and desktops, but especially in the data center. There is also more and more ARM in the cloud, in every area. So, ARM is really where a lot of things are going to happen. If you think that Intel should be afraid of our friends at AMD, the bigger threat may actually be ARM in the long run.
The Ryzen 9 5950X is a 16-core Zen 3 Ryzen, the top level of what you can get for desktop and workstation machines from AMD today. Of course, there is still the workstation range with the Threadripper. Threadripper goes up to 64 cores, but it is still Zen 2 and should soon be upgraded to Zen 3 as well. An announcement has been made that EPYC is being lifted to Zen 3 with another 10-20% efficiency gain, which is already fantastic. Quite frankly, Intel has a problem, and Intel knows that itself. On the other hand, whatever they can manufacture, they can also sell, so the problem is kind of limited. But in terms of prestige or thought leadership, AMD currently has the lead with its CPUs. We’ll see how Intel catches up; don’t count them out. But for now, if you had to build a top-end desktop or a top-end workstation today, you would be hard-pressed not to go for AMD. That is, of course, if you can find the CPUs; it’s the same problem that we have with the GPUs. You just have to pay the premium, and you can still argue that for the money you pay you get a premium-quality product.
Let’s look at some of the numbers. It’s certainly interesting, what you can squeeze into a socket these days.
Cinegy has already ported the Daniel2 codec to ARM: ARM Linux, ARM macOS, Android, iOS and so forth. Here is a benchmark done with the Apple M1 that is found in the Mac Mini, for instance: 84 fps of Daniel2 8K 4:2:2. 84 frames is not bad, but next you see that the 6-core Ryzen with the integrated APU, meaning Zen 2 architecture with graphics, delivers 100 fps, while the 8-core Ryzen delivers 135 fps and the 16-core EPYC 7302 delivers 209 fps. And finally, here is the EPYC 7402 with 24 cores, which delivers 307 fps encoding 8K 4:2:2 10-bit. Anyway, for an 8-core ARM chip 84 frames isn’t bad; it has very low power consumption, so it’s a good start.
Let’s see what Apple can bring. In the coming months you will hear of great things, 12-core, 16-core, and if we extrapolate from the numbers we see today, there should be great things coming. But we are really looking forward to the new 64-core Threadripper based on Zen 3, because that CPU is expected to deliver in excess of 1000 frames of 8K encoding per second. You will see in a couple of months whether this prediction comes true.
That chart doesn’t include the Ryzen 9 5950X. It delivers 310 fps encode, quite a bit more than the Ryzen 9 3950X, which is just one generation older.
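To make the 64-core expectation a little more tangible, here is a minimal sketch that divides the quoted encode figures by core count and scales them linearly to 64 cores. The linear-scaling assumption is ours and is optimistic: real throughput also depends on clock speeds, memory bandwidth and thermals, so treat the output as a rough upper bound, not a prediction. The function names are purely illustrative.

```python
# Naive linear extrapolation of Daniel2 8K encode throughput by core count.
# The fps figures are the ones quoted in the text; linear scaling with core
# count is an assumption made for illustration only.

def fps_per_core(fps: float, cores: int) -> float:
    """Average encode throughput contributed by one core."""
    return fps / cores


def extrapolate_fps(fps: float, cores: int, target_cores: int) -> float:
    """Scale a measured throughput linearly to a different core count."""
    return fps_per_core(fps, cores) * target_cores


# (measured 8K encode fps, core count) as quoted in the text
measured = {
    "Apple M1": (84, 8),
    "EPYC 7302": (209, 16),
    "EPYC 7402": (307, 24),
    "Ryzen 9 5950X": (310, 16),
}

for name, (fps, cores) in measured.items():
    per_core = fps_per_core(fps, cores)
    at_64 = extrapolate_fps(fps, cores, 64)
    print(f"{name}: {per_core:.1f} fps/core, ~{at_64:.0f} fps at 64 cores (linear)")
```

Scaling the Zen 3 desktop figure this way lands above the 1000 fps mark, while the Zen 2 EPYC figures land below it, which is consistent with the expectation hinging on the move to Zen 3.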
The new Intel NUC uses the latest generation of chips, basically the same chip you find in Intel’s latest 11th Gen laptops. It finally has 2.5 Gbit Ethernet and two Thunderbolt ports, it is quad-core and it has Intel Iris Xe Graphics, but compared to the 8-core AMD APUs it is a little slow. It has the latest Quick Sync, which does 4:2:2 10-bit HEVC, for instance, and that is really interesting, especially if you want to build a very small contribution encoder. That alone makes it worth a look.
Let’s compare the NUC from today with what we had five years ago.
Five years ago it was the Intel NUC 6, today it is the Intel NUC 11, and not really that much has happened. Five years ago we had a CPU Mark of 3125 and now that has grown to a CPU Mark of 10810. So, about 3.5 times the speed. Yes, but the gods of Intel marketing basically just doubled the core count and revved up the frequency from 1.8 to 2.6 GHz, which alone would already have given you a CPU Mark of at least 8900. Add a little process optimization, after all this went from 14 nm to 10 nm, and there’s your CPU Mark of 10810. Of course, now you have Intel Iris Graphics in there, which is the other nice thing, and it gives you the new Quick Sync with goodies such as 4:2:2. As for the price, fully equipped with RAM, SSD and so on, it costs less than 600 euros. It’s an interesting box to look at for just that. The Core i5 is enough; there’s no real reason to go to the Core i7, since you get the same number of cores and the same number of threads. The only difference is that the i7 revs a little higher in terms of clock speed, but that’s really it.
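The back-of-the-envelope reasoning above can be written down as a small calculation. This is only a sketch under the assumption that CPU Mark scales linearly with both core count and clock frequency, which is a simplification; the function name is hypothetical.

```python
# Rough estimate of how much of the NUC 6 -> NUC 11 CPU Mark gain comes from
# simply doubling cores and raising the clock. Linear scaling with cores and
# frequency is an assumption for illustration; the scores are from the text.

def scaled_cpu_mark(base_mark: float, core_factor: float,
                    old_ghz: float, new_ghz: float) -> float:
    """Scale a benchmark score by a core-count factor and a clock ratio."""
    return base_mark * core_factor * (new_ghz / old_ghz)


NUC6_MARK = 3125
NUC11_MARK = 10810

estimate = scaled_cpu_mark(NUC6_MARK, core_factor=2, old_ghz=1.8, new_ghz=2.6)
overall_speedup = NUC11_MARK / NUC6_MARK
remaining_gain = NUC11_MARK / estimate  # left for architecture and process

print(f"Cores + clock alone:  ~{estimate:.0f}")
print(f"Overall speedup:      {overall_speedup:.2f}x")
print(f"Unexplained remainder: {100 * (remaining_gain - 1):.0f}%")
```

Under these assumptions, only around a fifth of the improvement is left to be explained by architecture and process changes over five years.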
The RTX 4000 is not the greatest, not the latest, but it is still one of the favorite cards. It’s a single-slot design, which means you can fit more of them into the same box. It is not so power-hungry and can be powered with just one connector. It is still Turing generation, not the latest and greatest Ampere, but it has two NVidia NVDEC units, that is, two hardware-accelerated decoder units, and this is the card inside the box that we actually used to build a large Multiviewer. If you want to build a server that will get you 200+ HD signals decoded, H.264, HEVC, whatever, this is the card to work with in terms of density and value for money. The RTX A6000, the latest and greatest Quadro Ampere card, also has two hardware decoders, but it’s a dual-slot design and almost twice as power-hungry. In terms of expense, it costs 6-7 times more and you need a bank to pay for it. This card is available, it’s still shipping. For a whole lot of projects, especially in the HD and even entry-level UHD range, also for playout, it’s a good card, but it shares the same problem: if we’re talking about HD or SD and you still need interlaced encoding, this card doesn’t have it anymore. Then, basically, your best bet is still the P2200 card, which still does interlaced encoding and is still Pascal series. It’s still shipping, so you have to choose the right card for your project: interlaced – P2200; density for massive decoding – RTX 4000; high-end 8K playout and graphics-heavy work – RTX A6000, certainly. So, it is really a matter of choice and of being able to get them, as usual.
Let’s look at the quick specs in this little chart, which shows which cards these are and where to get them at the moment, and gives you an idea of what we’re talking about.
Here are the important specs for both cards we have singled out – the Quadro P2200 and the Quadro RTX 4000.
The Quadro P2200 is still Pascal series, as mentioned, which gives us the interlaced support. So, for SD and HD, where you maybe still need 1080i, this is the card, especially for interlaced encoding.
You don’t have that with the Quadro RTX 4000, but you have the wonderful two NVDEC units which allow you to decode twice as many streams as with the other card. And because it’s already Turing series, it has the better HEVC encoder, if that’s required.
You can see the difference between them, of course. Basically, the RTX 4000 is twice as fast, but the Quadro P2200 has one advantage: like the other, it is a single-slot card, but it doesn’t require a power connector, so the P2200 is powered completely through the bus.
Anyway, both cards are shipping and available on the market. Today you can get both of them with immediate delivery, at $475 for the P2200 and around $900 for the RTX 4000. Both are still in active production, so that shouldn’t be an issue.
Let’s have a look at what we had five years ago.
May 2016 – we saw the introduction of the GTX 1080, the first Pascal card to hit the market. The top of the line was the TITAN Xp, which went all the way up to 3840 cores at 250 W power consumption with a maximum of 12 GB of RAM, resulting in 12 teraflops. Now we top out at 40 teraflops with the RTX A6000, with a wonderfully large 10752 cores and up to 48 GB of memory. But, of course, it comes at a price: while the TITAN Xp was a mere $1199 when introduced, the recommended retail price for the A6000 is $4650.
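To put the generational jump into proportion, the quoted figures can be turned into simple ratios. This is an illustrative sketch only: the `Card` class and its field names are our own, memory is assumed to be in gigabytes, and the prices are the launch and list prices quoted above, not adjusted for inflation or street pricing.

```python
# Compare the two top-of-the-line cards using only the figures quoted in the
# text. The Card structure is hypothetical and exists purely for illustration.
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    cores: int
    tflops: float
    memory_gb: int
    price_usd: float

    def usd_per_tflop(self) -> float:
        """List price divided by peak teraflops."""
        return self.price_usd / self.tflops


titan_xp = Card("TITAN Xp", cores=3840, tflops=12.0, memory_gb=12, price_usd=1199.0)
a6000 = Card("RTX A6000", cores=10752, tflops=40.0, memory_gb=48, price_usd=4650.0)

print(f"Cores:    {a6000.cores / titan_xp.cores:.1f}x")
print(f"TFLOPS:   {a6000.tflops / titan_xp.tflops:.1f}x")
print(f"Memory:   {a6000.memory_gb / titan_xp.memory_gb:.1f}x")
print(f"Price:    {a6000.price_usd / titan_xp.price_usd:.1f}x")
for card in (titan_xp, a6000):
    print(f"{card.name}: ${card.usd_per_tflop():.2f} per TFLOP")
```

On these numbers the price has grown slightly faster than the raw compute, so the cost per teraflop at list price has not come down.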
In case you wonder why we are holding a Radeon™ RX 6900 XT here: we may not have said it officially, but if you have followed our development online and visited our Daniel2 website, you will know that we have ported the Daniel2 codec to OpenCL. You can run OpenCL on NVidia, but that is not a good idea, since NVidia is clearly all about CUDA; on Radeon and on Intel, OpenCL is exactly what you need. And so the fastest OpenCL number-cruncher that money can buy right now is the RX 6900 XT, which you can manage to get for only 20-30% over the sticker price. We ran our benchmarks on it and the numbers are actually quite nice. You could say the RX 6900 XT is right up there with the RTX 3080 or even the 3090, depending on what you’re trying to accomplish. So, it is quite a nice card. Of course, it lacks some of the support, the adoption and, well, the CUDA that NVidia has, but for pure number crunching and for the amount of memory that it comes with, it would, at least if it were sold at the sticker price, be a very interesting alternative to NVidia.
So, as you can see, so far all the numbers are just for NVidia cards. We haven’t put the Radeon numbers for OpenCL in there yet, because we are not completely finished with that. It is still some weeks away, hopefully not more than that. We have decode numbers already, but we’d like to present the complete, finished picture for OpenCL. We believe that with the RX 6900 XT we’re really going to be in line with the RTX 3090.
But let’s have a quick look at the numbers, from the previous generation with the RTX 2060 at 262 fps of 8K 4:2:2 encode all the way up to 781 fps with the RTX 3090. It is worth pointing out that these are 8K numbers, and on decode we achieve 1607 frames per second of 8K 4:2:2 on the RTX 3090.
So, before too long we’re going to crack the 2000 fps mark as well. You may ask – who needs 2000 frames of 8K per second?
It just means that by using only a fraction of the GPU performance we can do 8K at 60 or 8K at 120 and still have lots of performance left on that card for other things: filters, effects, AI. Being able to encode or decode 8K is just a nice little side effect that the GPU can handle on the side.
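The "fraction of the GPU" argument can be quantified with the RTX 3090 figures quoted above. A minimal sketch, assuming that the load scales linearly with frame rate (so a stream at a given frame rate occupies frame rate divided by peak throughput of the relevant capacity); the function name is illustrative.

```python
# Estimate what share of the measured peak throughput a real-time 8K stream
# would occupy. Peak figures are the RTX 3090 numbers quoted in the text;
# linear scaling of load with frame rate is an assumption.

def gpu_share(target_fps: float, peak_fps: float) -> float:
    """Fraction of peak codec throughput needed to sustain target_fps."""
    return target_fps / peak_fps


PEAK_ENCODE_FPS = 781    # 8K 4:2:2 encode, RTX 3090
PEAK_DECODE_FPS = 1607   # 8K 4:2:2 decode, RTX 3090

for fps in (60, 120):
    enc = gpu_share(fps, PEAK_ENCODE_FPS)
    dec = gpu_share(fps, PEAK_DECODE_FPS)
    print(f"8K{fps}: encode uses ~{enc:.1%}, decode uses ~{dec:.1%} of peak")
```

Even 8K at 120 fps stays well below a fifth of the measured encode peak under this assumption, which is the headroom the text refers to.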
The latest and greatest SSD available is the Samsung PM1735, a half-height/half-length PCIe 4.0 card; there is a 3.2 TB model, a 6.4 TB model, a 12.8 TB model and so forth. It is incredibly fast. It replaces basically everything you would normally have built from lots of hardware devices, be it multiple U.2 disks or lots of other striped disks: this beats it all. And if you can find it and get it, it is very interesting as a value proposition.
Let’s look at the specs in the picture for a quick rundown of what it does. There are more and more devices of this nature coming onto the market, but this one is currently the favorite.
As said, this device exists from 1.6 TB all the way up to 12.8 TB. The sequential read performance is up to 8 GB/s and sequential write is up to 3.8 GB/s for the higher-end models. As for the prices, it starts at €362 plus VAT for the entry-level 1.6 TB model and goes all the way up to €2601 plus VAT for the 12.8 TB model. If you had been looking for this type of storage 5 or 10 years ago, it would have cost hundreds of thousands of dollars, maybe even millions, to get these performance values, and now this is something in the shape of a half-height/half-length card that you can plug into your home PC.
This is ideal for database applications, for all kinds of number-crunching applications, for cache drives and so forth. At that price it would be silly not to use it for things like that. Similar devices (not this Samsung device) exist as U.2 or U.3 devices, meaning in a 2.5-inch form factor. So, you can put 24 of them into a rack unit and aggregate the bandwidth even further, so that you could even saturate your 400 Gbit LAN connections. For an SQL database such as the one you could use for Cinegy Archive, this is like nitrous oxide to turbocharge your database performance.
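A quick calculation shows why a chassis full of such drives can outrun even a very fast network link. This is a sketch under stated assumptions: each drive is assumed to sustain 8 gigabytes per second of sequential reads (the figure quoted for the Samsung card, applied here to the 2.5-inch devices purely for illustration), bandwidth is assumed to aggregate perfectly, and protocol overhead is ignored. Function names are hypothetical.

```python
# Aggregate sequential read bandwidth of a drive array versus a network link.
# Per-drive throughput (8 GB/s) and perfect aggregation are assumptions.
import math

BITS_PER_BYTE = 8


def aggregate_gbit(drives: int, gbyte_per_s_each: float) -> float:
    """Total read bandwidth of the array in gigabits per second."""
    return drives * gbyte_per_s_each * BITS_PER_BYTE


def drives_to_saturate(link_gbit: float, gbyte_per_s_each: float) -> int:
    """Smallest number of drives whose combined reads fill the link."""
    return math.ceil(link_gbit / (gbyte_per_s_each * BITS_PER_BYTE))


total = aggregate_gbit(drives=24, gbyte_per_s_each=8.0)
print(f"24 drives: {total:.0f} Gbit/s aggregate")
print(f"That is {total / 400:.2f}x a 400 Gbit link")
print(f"Drives needed to fill one 400 Gbit link: {drives_to_saturate(400, 8.0)}")
```

Under these idealized assumptions, a fraction of the 24 bays is already enough to fill a single 400 Gbit connection, so the network rather than the storage becomes the bottleneck.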
It’s time to look at smartphones. You could ask: "Why are we talking about smartphones in broadcast?" Because, as Lewis said in his presentation, if you’re looking at HD and even 4K, smartphones quite clearly can be used for production purposes.
The latest Samsung Galaxy S21 Ultra 5G actually claims to do 8K recording, and it does! It is currently still limited to 24 fps and only does HEVC 4:2:0, but it is getting there. The older S20 made similar claims, but here they upped the ante: it’s a little bit faster, it has a nicer screen, and it has to justify its exorbitant price.
But give it another generation or two and it becomes interesting for 4K production, given the fact that these devices actually have multiple optical zooms, and we’re talking about even 10x zoom coming in, not just digital zoom but optical zoom. That is what we’re looking at with Samsung.
The iPhone 12 Pro: the picture quality is great for what it is. It has a 4K sensor, the 12-megapixel sensor, which means the sensels are bigger, and the picture quality it creates is also quite nice. But compared to the 100+ megapixels in the Samsung there is, of course, no comparison. This is the future, and we can guarantee that the next Apple smartphone will also feature the 12-megapixel sensor, maybe still for macro, but also everything else. The megapixel race is on. Apple has kind of refused to participate in it, but it would be a surprise if the next Apple smartphone couldn’t do 8K. Soon, in September of this year, we’ll see whether they call it iPhone 13 or whether they’re superstitious and call it something else. Our guess is that they’ll call it 13.
But hopefully it will finally do 8K. That would be great, especially because if we then scale it down to 4K, we get much better picture quality.
Let’s have a quick look at the specs of these phones, just to be aware of them. We are, of course, very proud to say that Daniel2 runs on both of them, so for us there is a bright future in what we can do on these smartphones, even for very high-end broadcast recording applications: not just H.264 or HEVC, but proper 4:2:2, 4:4:4, RGB, whatever. Given enough compute power, we can encode all of that on these smartphones today. It’s 4K60 for sure, but if they increase the compute power of these smartphones and give us access to the wonderful sensors the way we want it, then the sky is the limit.
Here are some concrete numbers from smartphones of the last 2-3 years: Snapdragon 855, Snapdragon 865, Exynos 990. The Exynos 990 is basically an older S20 from Samsung.
The Snapdragon 865 is in a number of other devices, even current ones. Of course, it would be great to include the Snapdragon 888 values here, but there was no way to get hold of an S21 with that processor.
The new Exynos in the current S21 is a little bit of a disappointment. The numbers are only a tad higher than what’s shown here for the Snapdragon 865. There seems to be some thermal issue with that CPU. After only a few moments the benchmark numbers drop and come back to the values that we already have here for the Snapdragon 865. We’ll see, maybe some firmware updates will improve that, but at the moment it is a thorough disappointment. We hope that the Snapdragon 888 is a little bit better than that.
At the moment we top out at 103 fps, and this has to be stressed: this is 4K, we’re not talking about 8K. This is what we can achieve with purely CPU-based software encoding. So, there’s the caveat that once we add the GPU to the mix, the numbers will certainly go up as well.
But in none of these cases do we come close to our ultimate goal, which is, of course, 8K60. 23 frames is the highest we get here right now. Give us the Snapdragon 888 and maybe we’ll get up to 30 fps, but still, 8K60 is the goal. Our hope is that it becomes achievable with the next-generation Exynos from Samsung, which will also have Radeon graphics inside.
At the same time, hopefully the next-generation iPhones will finally do 8K, and we are also looking forward to the upgraded ARM performance there, which also promises to make 8K60 a possibility.