
Thread: ARM vs x64

  1. #1
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,576
    Thanks
    790
    Thanked 687 Times in 372 Posts

    ARM vs x64

    It's really hard to find good benchmarks of ARM vs x86 CPUs, but AnandTech provides some measurements on industry-standard benchmarks:



    Note that while the Apple A13 matches Ryzen/Core in speed, the clock frequencies differ by at least 1.5x, so Apple CPUs already have 1.5x higher IPC!!!
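The IPC claim follows from simple arithmetic: performance is roughly IPC times clock frequency, so at equal scores the IPC ratio is the inverse of the clock ratio. A minimal sketch, assuming equal benchmark scores and a 1.5x clock gap; the score and clock values are placeholders chosen for illustration, not measurements, and `relative_ipc` is a made-up helper name:

```python
def relative_ipc(score_a, clock_a_ghz, score_b, clock_b_ghz):
    """Return how many times more work per clock chip A does than chip B.

    Uses score / clock as a stand-in for IPC, which ignores memory
    effects and boost behaviour.
    """
    return (score_a / clock_a_ghz) / (score_b / clock_b_ghz)


# Placeholder inputs: equal scores, chip B clocked 1.5x higher than chip A.
ratio = relative_ipc(score_a=100.0, clock_a_ghz=2.0, score_b=100.0, clock_b_ghz=3.0)
print(f"chip A does {ratio:.2f}x the work per clock")
```

Score per clock is only a proxy for IPC, since a benchmark score also depends on memory latency and on how long the chip holds its boost clock.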

    And while you may think that they have special optimizations for SPEC, I've also seen 7-zip benchmark results comparing Intel and ARM CPUs: Intel has better IPC on compression (due to the 128-bit memory controller, I believe), while ARM has better IPC on decompression. And you can see above that Apple CPUs are significantly better than Arm's own designs.

  2. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,498
    Thanks
    26
    Thanked 135 Times in 103 Posts
    Can't wait for the rumors about an ARM-based Apple notebook to materialize. With enough RAM it could run some serious benchmarks. SPEC CPU2006 requires only 1 GB of RAM and is pretty outdated, but there's SPEC CPU2017 already and it requires at least 16 GiB of RAM for running the full version.

    Geekbench 5 scores show a similar story to SPEC CPU2006 (despite poor opinions about the reliability of Geekbench scores), i.e. Apple A13 sits between Intel Core i9-9900K and AMD Ryzen 9 3950X. The fastest cores from Arm (the company) seem to have IPC comparable to high-end x86 microarchitectures (Skylake, Zen 2) when running in native mode (no x86 emulation or other weird things like that).

    A disadvantage of mobile SoCs (microarchitectures) is that they are still limited to a rather old SIMD extension, i.e. the 128-bit "Advanced SIMD" (Neon). I think it's rather likely that the mobile ARM cores would lose a lot in SIMD-heavy workloads vs high-end x86 cores. There's the Scalable Vector Extension (SVE) for ARM but it's not yet available in hardware. Fujitsu A64FX will have it, but it seems it will have a low clock and pretty low general performance (instead it focuses on reliability, scalability and wide SIMD). IIUC, SPEC CPU benchmarks don't allow manual SIMD optimizations, so the only way to achieve any gain from SSE/AVX/NEON/SVE/whatever is to rely on compiler optimizations, which don't seem to give substantial gains.

  3. Thanks:

    Bulat Ziganshin (9th May 2020)

  4. #3
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,498
    Thanks
    26
    Thanked 135 Times in 103 Posts
    Japan Captures TOP500 Crown with Arm-Powered Supercomputer
    The new top system, Fugaku, turned in a High Performance Linpack (HPL) result of 415.5 petaflops, besting the now second-place Summit system by a factor of 2.8x. Fugaku, is powered by Fujitsu’s 48-core A64FX SoC, becoming the first number one system on the list to be powered by ARM processors. In single or further reduced precision, which are often used in machine learning and AI applications, Fugaku’s peak performance is over 1,000 petaflops (1 exaflops). The new system is installed at RIKEN Center for Computational Science (R-CCS) in Kobe, Japan
    It seems they broke the record without using GPUs at all. Just https://en.wikipedia.org/wiki/Fujitsu_A64FX CPUs.
    Current TOP500 list: https://top500.org/lists/top500/2020/06


    Update:
    Rumors of Apple transitioning from x86 to ARM this year were true: https://www.anandtech.com/show/15875...-to-apple-socs
    In a few months we'll see the actual performance in desktop workloads.
    Last edited by Piotr Tarsa; 23rd June 2020 at 14:47.

  5. #4
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    115
    Thanks
    39
    Thanked 30 Times in 21 Posts
    Also note A64FX uses HBM2 with 32GB per 48 cores and bandwidth of 1TB/s. A lot of bandwidth but small memory capacity.

  6. #5
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,498
    Thanks
    26
    Thanked 135 Times in 103 Posts
    Quote Originally Posted by algorithm
    Also note A64FX uses HBM2 with 32GB per 48 cores and bandwidth of 1TB/s. A lot of bandwidth but small memory capacity.
    There's 32 GiB of local RAM per node, but the nodes are connected by a 6D torus interconnect named Tofu, which provides tens of gigabytes per second of bandwidth between nodes.
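To put the "a lot of bandwidth but small capacity" remark into numbers, here is a rough back-of-the-envelope sketch using only the figures quoted above (32 GB and 1 TB/s per 48-core node). It ignores the GB/GiB distinction and assumes the bandwidth is shared evenly between cores, which is a simplification:

```python
CORES_PER_NODE = 48
HBM2_CAPACITY_GB = 32.0       # per node, from the post (GB vs GiB ignored)
HBM2_BANDWIDTH_GBPS = 1000.0  # 1 TB/s per node, from the post


def per_core(value, cores=CORES_PER_NODE):
    """Split a per-node figure evenly across the cores (a simplification)."""
    return value / cores


capacity_per_core = per_core(HBM2_CAPACITY_GB)
bandwidth_per_core = per_core(HBM2_BANDWIDTH_GBPS)
# Time needed to stream through the whole local memory once.
sweep_time_ms = HBM2_CAPACITY_GB / HBM2_BANDWIDTH_GBPS * 1000.0

print(f"{capacity_per_core:.2f} GB and {bandwidth_per_core:.1f} GB/s per core")
print(f"full memory sweep in about {sweep_time_ms:.0f} ms")
```

Under these assumptions each core gets well under 1 GB of local memory, while the node can read its entire memory in a few tens of milliseconds.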

  7. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,576
    Thanks
    790
    Thanked 687 Times in 372 Posts

  8. #7
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,498
    Thanks
    26
    Thanked 135 Times in 103 Posts
    Quote Originally Posted by Bulat Ziganshin
    Ampere Altra is based on Arm Neoverse N1, which is a modification of the area-optimized Cortex-A76. As seen on https://images.anandtech.com/doci/15...h-page-008.jpg it still has pretty weak SIMD: only 2x128-bit. OTOH Cortex-X1 has twice as many units https://en.wikichip.org/wiki/arm_hol...ures/cortex-x1 putting it in line with the client version of Skylake (server Skylakes have AVX-512, which is in a totally different league, but OTOH AVX-512 isn't that well supported in software).
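The "in line with client Skylake" comparison can be sketched as units times width. The N1 and Cortex-X1 figures come from the post; the 2 x 256-bit figure for client Skylake is my own assumption for illustration, and raw SIMD width per cycle says nothing about clocks, memory bandwidth or instruction mix:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimdConfig:
    name: str
    units: int       # number of SIMD pipelines
    width_bits: int  # width of each pipeline

    @property
    def bits_per_cycle(self) -> int:
        """Total SIMD width the core can process per cycle."""
        return self.units * self.width_bits

    def fp32_lanes_per_cycle(self) -> int:
        """Number of 32-bit elements that fit in that total width."""
        return self.bits_per_cycle // 32


cores = [
    SimdConfig("Neoverse N1 (from the post)", units=2, width_bits=128),
    SimdConfig("Cortex-X1 (twice as many units)", units=4, width_bits=128),
    SimdConfig("Skylake client (assumed 2 x 256-bit)", units=2, width_bits=256),
]
for core in cores:
    print(f"{core.name}: {core.bits_per_cycle} bits/cycle, "
          f"{core.fp32_lanes_per_cycle()} fp32 lanes/cycle")
```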

    Apple has fat ARM cores, but they lost their chief CPU architect (Gerard Williams III), who is now leading a new startup https://nuviainc.com/leadership that aims at the server market. They (Nuvia Inc.) will probably deliver Apple A1x-like general IPC, but they are pretty quiet about their plans.

  9. Thanks:

    algorithm (24th June 2020)

  10. #8
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    115
    Thanks
    39
    Thanked 30 Times in 21 Posts
    Piotr Tarsa, thank you for pointing out Nuvia. Very interesting. Unfortunately it is going to be ARM and not RISCV. Maybe they think it is too early for RISCV.

  11. #9
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,498
    Thanks
    26
    Thanked 135 Times in 103 Posts
    Quote Originally Posted by algorithm
    Piotr Tarsa, thank you for pointing out Nuvia. Very interesting. Unfortunately it is going to be ARM and not RISCV. Maybe they think it is too early for RISCV.
    RISC-V is still mostly an academic project. Arm has built a software ecosystem and a set of standards: https://nuviainc.com/blog/the-import...servers-boring ARM servers are already deployed (e.g. Amazon Graviton CPU on Amazon cloud), software is slowly catching up (e.g. Microsoft Edge finally has an ARM version IIRC), etc.

    Choosing RISC-V over ARM won't give you any performance difference in itself. At least I haven't seen any proof anywhere that any ISA (x86, ARM, PowerPC, SPARC, etc.) is somehow performance-limited by design. AMD was planning an ARM core with performance equal to the Zen core, but they didn't make it: https://en.wikipedia.org/wiki/AMD_K12 Probably that wasn't financially sensible, as AMD doesn't have enough influence over the market to make such products attractive to consumers and system integrators. Apple OTOH makes not only CPUs, but also GPUs, NPUs, plenty of other hardware, operating systems, compilers, productivity suites, etc., so they don't have to negotiate with anyone else to integrate the mentioned products with new ones.
    Last edited by Piotr Tarsa; 25th June 2020 at 10:39. Reason: .

  12. #10
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    115
    Thanks
    39
    Thanked 30 Times in 21 Posts
    Yes, RISCV won't buy you any performance.

    The point of RISCV is that you don't pay licenses, you are not dependent on ARM and you can add ISA extensions without asking anyone.
    POWER and MIPS are also free now (but you can't add extensions, I think). RISCV, I think, is going to be big in embedded (already is to some extent). Server market is difficult to enter.

    About RISCV cores: there is SiFive with a Cortex-A72-like core, and also Alibaba and Esperanto. Western Digital has released some very performant embedded cores, EH1 and EH2, as open source. And they are going to release a Linux-capable one, but it is probably going to be an in-order one.
    Also, the EU is developing HPC with ARM and a RISCV vector accelerator.

  13. #11
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,498
    Thanks
    26
    Thanked 135 Times in 103 Posts
    The point of RISCV is that you don't pay licenses
    Designing a high performance core (i.e. significantly higher performance than the standard Cortex-A or Cortex-X series) is probably multiple times more costly than the license to do so.

    you are not dependent on ARM and you can add ISA extensions without asking anyone
    IIUC Arm added the capability for custom instructions: https://www.forbes.com/sites/tiriasr...-instructions/

    RISCV, I think, is going to be big in embedded (already is to some extent)
    Embedded doesn't require high single thread performance, it requires low power. I think the in-order CPU designs from Arm have relatively cheap licenses: https://www.arm.com/products/flexible-access/startup

    Server market is difficult to enter.
    Yes and as I've said - there's already a server ARM ecosystem and standards, while RISC-V is still maturing.

  14. #12

  15. #13
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,498
    Thanks
    26
    Thanked 135 Times in 103 Posts
    (Sorry if this post is a little rude, but I'm fed up with baseless stories about ARM inferiority)

    Guy is spewing typical nonsense:
    - ARM can't have as high performance as x86
    - ARM lacks some mysterious features that only x86 can have
    - ARM can't integrate with as much hardware as x86
    - etc

    Where's any proof of that? The actual situation seems to be quite opposite:
    - ARM in the form of Apple Silicon has very high performance already and it's going up quickly. That is visible even in the first post here.
    - I haven't seen any example of functionality that is possible on x86 but not on ARM. x86 prophets tell us otherwise, but is x86 a religion? You have access to the ISA (instruction set architecture) specifications, so you can look for the mysterious features yourself, but there aren't any.
    - ARM can be integrated with hardware typically seen with x86, e.g. nVidia has full support for CUDA on ARM processors (currently nVidia supports x86, ARM and POWER architectures), nVidia Shield is an integration of ARM with GeForce, there are rumors of Samsung integrating RDNA (the new Radeon cores) in their upcoming ARM based smartphone SoCs, etc

    I'm mostly interested in any logical explanation on why ARM can't scale its performance up to the levels of x86 or above. No biased, unfounded, vague claims but actual technical analysis showing understanding of ARM and x86 architectures.

    Quote Originally Posted by Gotty
    Mac on ARM is an unknown so it's a perfectly logical idea to wait for independent benchmarks and see how the software we're interested in will perform on Apple Silicon based machines. Nothing shocking there. Same goes for choosing between AMD CPU and Intel CPU or AMD GPU and nVidia GPU.

  16. #14
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    623
    Thanks
    372
    Thanked 390 Times in 211 Posts
    Quote Originally Posted by Piotr Tarsa
    (Sorry if this post is a little rude, but I'm fed up with baseless stories about ARM inferiority)
    Oh, no problem. You have to know, I don't know much about the topic.
    I felt the 1st video was a bit (?) biased, but since I didn't find hard numbers that clearly support or refute the claims (the full picture is still very blurry)... I thought I'd post these - could be interesting for the readers of the thread.
    Thank you for posting your view.

  17. #15
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,576
    Thanks
    790
    Thanked 687 Times in 372 Posts
    NUVIA Phoenix Targets +40-50% ST Performance Over Zen 2 for Only 33% the Power

    [Attached image: N1.png]

    When tested by GeekBench 5, at every point, ARM’s results are more power efficient/higher performing than anything available on x86, even though at the high end Apple and Intel are almost equal on performance (for 4x the power on Intel). Note that Intel cores run up to 5 GHz, while Apple cores run only up to 3 GHz.
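The two ratios implied by that paragraph can be worked out directly. A small sketch in normalised units, assuming exactly equal performance, exactly 4x the power, and the 3 GHz vs 5 GHz peak clocks quoted above; `efficiency_ratio` is a made-up helper name:

```python
def efficiency_ratio(perf_a, power_a, perf_b, power_b):
    """Performance-per-watt of A divided by performance-per-watt of B."""
    return (perf_a / power_a) / (perf_b / power_b)


# Normalised units: equal performance, Intel drawing 4x the power.
apple_vs_intel_perf_per_watt = efficiency_ratio(1.0, 1.0, 1.0, 4.0)

# Equal performance at 3 GHz vs 5 GHz implies this per-clock advantage.
apple_vs_intel_per_clock = (1.0 / 3.0) / (1.0 / 5.0)

print(f"perf/W advantage: {apple_vs_intel_perf_per_watt:.1f}x")
print(f"per-clock advantage: {apple_vs_intel_per_clock:.2f}x")
```

Since "almost equal" and "up to" are approximate in the source, both results should be read as ballpark figures.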


  18. #16
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,498
    Thanks
    26
    Thanked 135 Times in 103 Posts
    Nuvia doesn't even have any timeline on when their servers will hit the market and it seems it could take them 2+ years to do so, so they need a high IPC jump vs at least Intel Skylake. In the meantime the landscape is changing:
    - Intel released laptop Tiger Lake, which is basically laptop Ice Lake with much higher frequencies of nearly 5 GHz (there's a small IPC change, mostly due to beefier caches). This means Intel at least figured out how to clock their 10nm high, but since laptop Tiger Lake is still limited to quad core at most, it seems that yield is still poor.
    - Arm has prepared two new cores: V1 for HPC workloads (2 x 256-bit SIMD) and N2 for business apps (2 x 128-bit SIMD): https://fuse.wikichip.org/news/4564/...6-sve-support/ https://www.anandtech.com/show/16073...neoverse-v1-n2 The IPC jump is quite big, but it remains to be seen when the servers will hit the market, as it previously took a long time from announcement to availability for Neoverse N1. At least those are SVE (Scalable Vector Extension) enabled cores (both V1 and N2), so apps can finally be optimized using a decent SIMD ISA, comparable to AVX (AVX-512 probably has more features than SVE1, but SVE is automatically scalable without the need for recompilation).
    - Apple already presented iPad Air 2020 with the Apple A14 Bionic 5nm SoC, but the promised performance increase over A13 seems to be small. I haven't found a reliable source mentioning Apple A14 clocks, so maybe they kept them constant to reduce power draw in mobile devices like iPad and iPhone? Right now there are people selling water cooling cases for iPhone (WTF?): https://a.aliexpress.com/_mtfZamJ
    - Oracle will offer ARM servers in their cloud next year: https://www.anandtech.com/show/16100...a100-and-altra and IIRC they say they will compete on price.
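The "automatically scalable without recompilation" property of SVE mentioned in the list above can be illustrated with a toy model. This is plain Python, not SVE intrinsics or assembly: `vla_add` and its `vector_bits` parameter are made-up names that mimic a vector-length-agnostic loop, where the code asks the hardware for its vector length at run time and uses a predicate to mask off the tail:

```python
def vla_add(a, b, vector_bits):
    """Add two equal-length lists of 32-bit values, one 'vector' at a time.

    vector_bits plays the role of the hardware vector length, which the
    loop reads at run time instead of baking in at compile time.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    lanes = vector_bits // 32          # elements per vector register
    out = [0] * len(a)
    i = 0
    while i < len(a):
        # Predicate: how many lanes are still inside the array (handles the tail).
        active = min(lanes, len(a) - i)
        for lane in range(active):
            out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes
    return out


data_a = list(range(10))
data_b = [10] * 10
# The same function body gives the same result on "128-bit" and "512-bit" hardware.
assert vla_add(data_a, data_b, 128) == vla_add(data_a, data_b, 512)
```

With fixed-width extensions such as SSE or AVX the lane count is baked into the instructions, so using wider hardware means building a separate code path.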

  19. Thanks:

    Bulat Ziganshin (24th September 2020)

  20. #17
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,498
    Thanks
    26
    Thanked 135 Times in 103 Posts
    A few interesting things happened recently and some didn't happen.

    Let's start with some disappointments:
    - Intel's (actually available to customers) 10nm CPUs are still limited to quad-core. There are no benchmark leaks of higher core count consumer CPUs. There are also no leaks of Ice Lake-SP (i.e. Intel 10nm server CPUs) results. Weird, given that Ice Lake-SP is supposedly based on tiles (Intel's way of avoiding the word chiplets) and multi-die interconnects (somehow it's not glue in Intel's case) to achieve higher yields (I could be wrong there because there's really very little evidence about Intel's 10nm server CPUs right now).
    - Intel's still pushing their 14nm cores to the extreme. They have designed a Peltier module based CPU cooler (!) together with EK and CoolerMaster: https://www.intel.com/content/www/us...o-cooling.html https://hexus.net/tech/news/cooling/...ster-products/ There were Peltier module based coolers in the past but they didn't catch on. A Peltier module with 100W of cooling power (Intel's CPU cooler is probably much more powerful than that) is a flat structure that removes 100W from one side, but puts out 200W on the other side. Therefore there's 2x more heat to transfer out. Adding the PSU (power supply unit) inefficiencies, that Intel cooling setup would consume about half a kilowatt for just a 10-core CPU (or something in that territory).
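The half-a-kilowatt estimate for the Peltier cooler above can be reproduced with a rough model. The only input taken from the post is the "100 W cooled, 200 W out the hot side" ratio; the 225 W CPU load and the 90% PSU efficiency are my own placeholder assumptions, and `peltier_wall_power` is a made-up helper name:

```python
def peltier_wall_power(cpu_heat_w, cop=1.0, psu_efficiency=0.9):
    """Estimate heat to dissipate and wall power for a Peltier-cooled CPU.

    cop: heat pumped per watt of electricity fed to the module. cop=1.0
         matches the post's "100 W cooled, 200 W out the hot side".
    psu_efficiency: assumed value, not from the post.
    """
    peltier_input_w = cpu_heat_w / cop
    hot_side_heat_w = cpu_heat_w + peltier_input_w
    wall_power_w = (cpu_heat_w + peltier_input_w) / psu_efficiency
    return hot_side_heat_w, wall_power_w


# Placeholder: a 10-core CPU drawing 225 W under load.
hot_side, wall = peltier_wall_power(225.0)
print(f"heat to remove: {hot_side:.0f} W, wall power: about {wall:.0f} W")
```

Real Peltier efficiency varies with the temperature difference across the module, so a fixed ratio of 1.0 is a simplification.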

    Some news about startups (i.e. big unknowns):
    - There's a startup Tachyum with extraordinary claims: https://www.tachyum.com/ They have developed a processor with complexity comparable to in-order CPUs, but performance comparable to out-of-order CPUs. I don't know the exact details on how that works, but one of the main characteristics seems to be the lack of mechanisms for explicit instruction reordering, while still allowing some later instructions to be executed before earlier ones if there are no dependencies between them. They claim high performance, high power efficiency, low design complexity and also versatility: a single core can work as a typical general purpose CPU but also as an AI accelerator. They are however comparing themselves to Intel Xeons, which are pretty outdated by now. AMD's EPYCs surpassed Intel Xeons by far, in terms of absolute performance, power efficiency, SKU (stock keeping unit) cost and TCO (total cost of ownership). Tachyum's claims are still impressive, but overall it's not clear whether the advantages outweigh the costs. Tachyum developed their own ISA (instruction set architecture) to which they are translating x86, ARM and probably also RISC-V code, but they want software vendors to target their ISA directly. I think Tachyum is too small to achieve wide adoption of their ISA unless their product is truly revolutionary.
    - Nuvia Inc. (that company developing new server CPUs based on ARM architecture) published a blog post where they compare GeekBench results to SPEC benchmark results: https://nuviainc.com/blog/performanc...ch-versus-spec The end results are surprisingly close. Detailed characteristics differ on frequency of branch mispredicts and cache misses:
    [Attached image: Figure 3: Relative mispredict and miss rate for CPU2006 and CPU2017 baselined to Geekbench 5]

    Now the real news:
    - AMD released Zen 3-based Ryzen 5000 series CPUs and they deliver relatively big single thread performance gains. AMD destroys Intel in most applications that don't use AVX-512 (well, AMD was winning before with Zen 2, but now it's more pronounced), both in single thread and multi thread scenarios. On average AMD loses a bit to Intel in gaming benchmarks, but that's probably because typical testing scenarios fall into a CPU utilization level where Intel has an edge. More details here: https://www.techpowerup.com/review/i...g-performance/
    - Apple finally announced and presented their ARM-based MacBooks (i.e. those with Apple Silicon CPUs/SoCs named Apple M1)!!! However, I hoped they would push their CPU clocks higher. They stopped at 3.2 GHz. That didn't prevent them from winning at least in the GeekBench 5 results. MacBooks with Apple M1 clocked at 3.2 GHz score a bit above 1700 on GeekBench 5: https://browser.geekbench.com/mac-benchmarks That's faster than Zen 3 based Ryzen 5000 CPUs. Ryzen 5950X is rated at 4.9 GHz boost, but often boosts past 5 GHz. It still, however, achieves less than 1700 on the GeekBench 5 single thread benchmark: https://browser.geekbench.com/processor-benchmarks So in the end, Apple's IPC advantage is still huge, even compared to Zen 3. Thanks to the low clocks (just 3.2 GHz), Apple M1 can be cooled passively. The new MacBook Air is completely fanless, but still scores above 1700 in single thread GeekBench 5. That's awesome.
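The size of that IPC gap can be bounded from the numbers in the list above. A minimal sketch using score per GHz as a proxy for IPC: it takes 1700 as a lower bound for the M1 and an upper bound for the 5950X, and 4.9 GHz as the lowest plausible boost clock for the Ryzen, so the resulting ratio is a floor rather than a measurement:

```python
def score_per_ghz(score, clock_ghz):
    """Benchmark points per GHz, used here as a rough stand-in for IPC."""
    return score / clock_ghz


m1 = score_per_ghz(1700, 3.2)       # "a bit above 1700" at 3.2 GHz
ryzen = score_per_ghz(1700, 4.9)    # upper bound: "less than 1700" at 4.9 GHz boost
min_advantage = m1 / ryzen

print(f"M1: {m1:.0f} pts/GHz, 5950X: under {ryzen:.0f} pts/GHz, "
      f"ratio at least {min_advantage:.2f}x")
```

Any correction to the inputs (higher M1 score, lower Ryzen score, higher Ryzen boost) only widens the gap.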

    Now some opinions (about Apple's ARM-based computers):
    Many people expect that Apple will charge a premium for their Apple Silicon based computers. That's not the case now: their ARM-based laptops are both cheaper and faster than their Intel-based laptops. The opinion that Apple will charge a premium for Apple Silicon based computers is based on the situation in the smartphone industry, where Apple is also using ARM-based CPUs. In the end it's Apple who raised the prices of flagship smartphones. The situation is vastly different in my opinion. We need to consider what the dominant ISA is in each industry. In desktops it's x86. In smartphones it's ARM. Therefore releasing an ARM-based smartphone doesn't need any financial incentives. It's the other way around. It's Intel who had to sell their Atoms without profit to gain market share in smartphones. They still failed, probably because their Atoms weren't good enough to persuade software vendors to fully and broadly support Intel CPUs/SoCs. Apple Silicon entering the laptop and desktop market is similar to Intel entering the smartphone and tablet markets. Apple is fighting the dominant role of x86. Therefore they need to offer a solution that is both cheaper and technically better to keep both consumers and software vendors interested in the product. I expect ARM-based Apple computers to be cost effective until Apple becomes confident that their position is strong enough to raise prices. That should take a few years.

    Another remark about Apple Silicon GeekBench 5 results:
    MacBooks with Apple Silicon score above 1700 in single thread GeekBench 5, while Core i9 MacBooks (e.g. Intel Core i9-9980HK in the MacBook Pro from late 2019) score only about 1100 in the same benchmark. Theoretically the gap should be smaller, based on IPC and boost values, but MacBooks with Intel CPUs overheat a lot, leading to low benchmark results. Is this a fault of Apple or Intel? I think it's more Intel's fault, as MacBooks are specifically designed to be slim and light. If you want high performance cooling in a laptop, choose one of the bulky Windows based ones. On the other hand, Apple achieved a score above 1700 in single thread GeekBench 5 without any active cooling in the MacBook Air. That's a lot to catch up to. Apple seems to be the absolute king of mobile computers right now.

    Next week ARM-based MacBooks will be available in stores and there should be full reviews of them so there will be even more interesting things to watch.

    PS:
    Some reviewers describe Apple's M1 as an ordinary 8-core CPU, while it's a 4+4 design. There are 4 really strong cores and 4 pretty weak cores. Apple's M1 results are comparable to Apple's A14 results. Look at the comparison between A14 big and small cores: https://www.anandtech.com/show/16226...14-deep-dive/3
    [Attached image: spec2006_A14.png]
    In the Apple A14, the big cores have more than 3x the performance of the small cores. In the end the whole cluster of 4 small cores in A14 is probably about as performant as one big core. The small cores in Apple CPUs probably add even less performance than HyperThreading in the latest Intel CPUs.
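The big/small estimate can be checked with a couple of lines. A sketch assuming the big cores are exactly 3x faster than the small ones (the post says "above 3x", so this is the best case for the small cores) and perfect multi-thread scaling; `cluster_in_big_core_units` and its `scaling` parameter are made up for illustration:

```python
def cluster_in_big_core_units(n_small, big_to_small_ratio, scaling=1.0):
    """Throughput of the small-core cluster, measured in big-core equivalents.

    scaling < 1.0 models imperfect multi-thread scaling (an assumption).
    """
    return n_small * scaling / big_to_small_ratio


ideal = cluster_in_big_core_units(n_small=4, big_to_small_ratio=3.0)
# Extra throughput the small cluster adds on top of four big cores.
uplift_over_four_big = ideal / 4.0

print(f"small cluster is worth at most {ideal:.2f} big cores")
print(f"that adds at most {uplift_over_four_big:.0%} to the four big cores")
```

With a ratio above 3x or less than perfect scaling, the cluster's contribution drops toward the "about one big core" figure from the post.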

  21. Thanks (3):

    Bulat Ziganshin (14th November 2020),Cyan (23rd November 2020),Shelwien (14th November 2020)

  22. #18
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    890
    Thanks
    486
    Thanked 279 Times in 119 Posts
    Phoronix made an early comparison of M1 cpu performance on Mac mini:
    https://www.phoronix.com/scan.php?pa...e-mac-m1&num=1

    Results are impressive, especially for the kind of power class M1 works at.
    Even when running `x64` code with an emulation layer, M1 is still faster than its ~2 years old Intel competitor (i7-8700B).

    The downside of this study is that, since it only uses Mac Mini platforms,
    it isn't comparing vs Intel's newer Tiger Lake, which would likely show a more nuanced picture.

    Still, it's an impressive first foray into PC territory.
