381 commits

Author SHA1 Message Date
Nekotekina
b49a1f27eb Warning fixes 2022-09-17 16:35:02 +03:00
Nekotekina
5985f0eefa BufferUtils: cleanup regarding ARM64 2022-09-07 17:59:07 +03:00
sguo35
a0d48c588a spu/arm64: clean up assembly code generation
Clean up asmjit usage so we don't unnecessarily allocate memory
anymore for SPURecompiler functions.
2022-09-07 17:33:01 +03:00
Eladash
ee1384341e rsx: Implement atomic vertex upload (with Strict Rendering Mode) 2022-09-01 20:09:28 +03:00
Eladash
506b9deec5 Savestates/SPU LLVM: Improve saving performance 2022-08-25 23:54:56 +03:00
Malcolm Jestadt
51e6d0a336 SPU LLVM: Add integer compare optimization for FCMGT 2022-07-29 11:59:59 +03:00
sguo35
73ed657e00 spu/arm64: fix 16 byte branch patch alignment 2022-07-15 12:37:33 +03:00
sguo35
c52abed4d3 spu: implement ubertrampoline generator for arm64
Implement the ubertrampoline generator for arm64. It generally follows
the x86 version, but uses asmjit to generate code instead of writing raw
opcodes to memory, trading memory usage for readability. Currently the
trampoline implementation is fairly inefficient in terms of instruction
size and is substantially larger than the x86 version.
2022-07-15 12:37:33 +03:00
sguo35
9e57efe82c spu: implement assembly functions for arm64 2022-07-15 12:37:33 +03:00
sguo35
77ab872bec spu: remove rotqby C++ impl
rotqby C++ implementation is broken, since replacing it with the
intrinsic version reliably fixes spurs test. A conditional branch
immediately after a rotqby instruction will fail using the C++ version
but succeed using the intrinsic.
2022-07-15 12:37:33 +03:00
Eladash
3e51426379 Savestates/SPU: Kill emulation when it's safe to save SPU state 2022-07-15 09:30:53 +03:00
Nekotekina
4b787b22c8 Implement FN (lambda shortener)
Useful for some higher order functions.
Allows making short lambdas even shorter.
2022-07-08 14:47:41 +03:00
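To illustrate what a lambda shortener can look like, here is a minimal sketch. The macro name `FN_SKETCH`, its single implicit parameter `x`, and the helper `count_above` are assumptions made for this example; the definition introduced by the commit may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a lambda-shortening macro. The name FN_SKETCH and
// the implicit parameter `x` are assumptions, not the project's definition.
#define FN_SKETCH(...) [&](const auto& x) { return (__VA_ARGS__); }

// Counts the elements greater than `threshold` using the shortened lambda
// as the predicate of a higher-order function.
inline int count_above(const std::vector<int>& values, int threshold)
{
    return static_cast<int>(
        std::count_if(values.begin(), values.end(), FN_SKETCH(x > threshold)));
}
```

Here `FN_SKETCH(x > threshold)` stands in for the longer `[&](const auto& x) { return x > threshold; }`.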
Eladash
f0c71ae2ae Savestates: Fix saving sys_event_queue_destroy 2022-07-08 12:57:43 +03:00
Eladash
2ccb0c8f42 SPU LLVM/Savestates: Remove unneeded store insurance and add related fix 2022-07-06 19:43:25 +03:00
Elad Ashkenazi
fcd297ffb2 Savestates Support For PS3 Emulation (#10478) 2022-07-04 16:02:17 +03:00
Ivan
c2190f71ca SPU/PPU LLVM: fix triple setup (regression fix) (#12228) 2022-06-14 18:13:43 +03:00
Jeff Guo
cefc37a553 PPU LLVM arm64+macOS port (#12115)
* BufferUtils: use naive function pointer on Apple arm64

Use naive function pointer on Apple arm64 because ASLR breaks asmjit.
See BufferUtils.cpp comment for explanation on why this happens and how
to fix if you want to use asmjit.

* build-macos: fix source maps for Mac

Tell Qt not to strip debug symbols when we're in debug or relwithdebinfo
modes.

* LLVM PPU: fix aarch64 on macOS

Force MachO on macOS to fix LLVM being unable to patch relocations
during codegen. Adds Aarch64 NEON intrinsics for x86 intrinsics used by
PPUTranslator/Recompiler.

* virtual memory: use 16k pages on aarch64 macOS

Temporary hack to get things working by using 16k pages instead of 4k
pages in VM emulation.

* PPU/SPU: fix NEON intrinsics and compilation for arm64 macOS

Fixes some intrinsics usage and patches usages of asmjit to properly
emit absolute jmps so ASLR doesn't cause out of bounds rel jumps. Also
patches the SPU recompiler to properly work on arm64 by telling LLVM to
target arm64.

* virtual memory: fix W^X toggles on macOS aarch64

Fixes W^X on macOS aarch64 by setting all JIT mmap'd regions to default
to RW mode. For both SPU and PPU execution threads, when initialization
finishes we toggle to RX mode. This exploits Apple's per-thread setting
for RW/RX to let us be technically compliant with the OS's W^X
enforcement while not needing to actually separate the memory
allocated for code/data.

* PPU: implement aarch64 specific functions

Implements ppu_gateway for arm64 and patches LLVM initialization to use
the correct triple. Adds some fixes for macOS W^X JIT restrictions when
entering/exiting JITed code.

* PPU: Mark rpcs3 calls as non-tail

Strictly speaking, rpcs3 JIT -> C++ calls are not tail calls. If you
call a function inside e.g. an L2 syscall, it will clobber LR on arm64
and subtly break returns in emulated code. Only JIT -> JIT "calls"
should be tail.

* macOS/arm64: compatibility fixes

* vm: patch virtual memory for arm64 macOS

Tag mmap calls with MAP_JIT to allow W^X on macOS. Fix mmap calls to
existing mmap'd addresses that were tagged with MAP_JIT on macOS. Fix
memory unmapping on 16K page machines with a hack to mark "unmapped"
pages as RW.

* PPU: remove wrong comment

* PPU: fix a merge regression

* vm: remove 16k page hacks

* PPU: formatting fixes

* PPU: fix arm64 null function assembly

* ppu: clean up arch-specific instructions
2022-06-14 15:28:38 +03:00
Nekotekina
cb2c0733e2 SPU LLVM: fix vrangeps usage in clamp_smax 2022-06-12 16:40:04 +02:00
Malcolm Jestadt
ebeeafc94f SPU LLVM: Use vrangeps in clamp_smax
- This instruction can clamp a value between a range of values, something which previously needed 2 instructions.
- With the immediate byte set to 0x2 it will compute the minimum between the absolute value of the first input and the second input, and then copy the sign from the first input to the result.
2022-06-11 18:25:31 +03:00
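The clamp described in this commit can be modelled per lane in scalar code. The following is a sketch, not the recompiler's code: the function name `clamp_magnitude` is made up for illustration, the bound is assumed to be non-negative, and NaN handling of the real instruction is not modelled.

```cpp
#include <cassert>
#include <cmath>

// Scalar model of one lane, following the commit message: take the minimum of
// |value| and the bound, then copy the sign of value onto the result.
// Assumption: bound is non-negative; NaN inputs are not modelled.
inline float clamp_magnitude(float value, float bound)
{
    assert(bound >= 0.0f);
    return std::copysign(std::fmin(std::fabs(value), bound), value);
}
```

Written with separate min and max operations, the same clamp takes two steps, which is the saving the commit message refers to.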
Elad Ashkenazi
17e28ae85d SPU LLVM: Improve expression matching detection for moved registers 2022-06-11 16:13:58 +03:00
Malcolm Jestadt
64616f1408 SPU LLVM: Microfixes
- Avoid vpermb path in shufb when op.ra == op.rb
- Reverse indices with (c ^ 0xf) rather than (~c) in the vpermb path; vpternlogd is a 3-input operation and requires needless mov instructions to avoid destroying inputs
2022-06-08 22:50:30 +03:00
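A small check of why the two index reversals are interchangeable, assuming only the low four bits of each index byte select a lane (as for a 16-byte table). The function names are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// The two ways of reversing a byte index mentioned in the commit message.
inline std::uint8_t reverse_with_xor(std::uint8_t c) { return static_cast<std::uint8_t>(c ^ 0xf); }
inline std::uint8_t reverse_with_not(std::uint8_t c) { return static_cast<std::uint8_t>(~c); }

// Returns true if, for every possible index byte, both forms agree in the low
// nibble and that nibble equals 15 - (c & 0xf), i.e. the reversed lane.
inline bool low_nibbles_agree()
{
    for (int c = 0; c < 256; c++)
    {
        const auto b = static_cast<std::uint8_t>(c);
        const int via_xor = reverse_with_xor(b) & 0xf;
        const int via_not = reverse_with_not(b) & 0xf;
        if (via_xor != via_not || via_xor != 15 - (b & 0xf))
            return false;
    }
    return true;
}
```

The upper bits differ between the two forms, so the equivalence only holds under the stated assumption about which index bits are used.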
Malcolm Jestadt
1227b0a633 SPU LLVM: Re-enable icelake shufb paths
- The previous code works just fine
2022-06-05 13:08:00 +03:00
Elad Ashkenazi
9bb7e8d614 rsx: Implement atomic FIFO fetching (stability improvement) (non-default setting) (#12107) 2022-06-04 15:35:06 +03:00
Malcolm Jestadt
0e5514003a SPU LLVM: Optimize LQR/STQR
- Avoid type mismatch between adds that prevented llvm from combining the operations
2022-06-03 16:16:28 +03:00
Malcolm Jestadt
e9dfb3cb63 SPU LLVM: Fixup for inline MFC transfers
- Could previously segfault when src and dst were swapped. Just use unaligned instructions instead.
2022-05-29 19:08:36 +03:00
Malcolm Jestadt
6f4398889e SPU LLVM: Optimize inline MFC transfers
- Use wider instructions when possible
2022-05-29 15:32:25 +03:00
Eladash
2ba437b6dc SPU: Implement timer freezing ability 2022-05-14 22:03:47 +03:00
Malcolm Jestadt
91673f8fdc SPU LLVM: Add relaxed xfloat option
- This new setting is on by default
- It's active when approximate default is disabled
- Approximate xfloat is now exposed to the gui
2022-01-31 08:02:48 +03:00
Nekotekina
dba2baba9c Implement utils::memory_map_fd (partial)
Improve JIT profiling dump format (data + name, mmap)
Improve objdump interception util (better speed, fix bugs)
Rename spu_ubertrampoline to __ub+number
2022-01-26 15:46:16 +03:00
Nekotekina
11ee1f3eb2 Improve JIT profiling on Linux
Add JIT object dumping functionality.
Add source for objdump interception utility.
2022-01-25 03:16:37 +03:00
Nekotekina
12c83b340d Remove built_function
With today's branch prediction techniques, it's hardly useful.
2022-01-24 22:21:41 +03:00
Nekotekina
4704367382 Remove unnecessary asmjit::imm_ptr 2022-01-18 00:10:32 +03:00
Nekotekina
580bd2b25e Initial Linux Aarch64 support
* Update asmjit dependency (aarch64 branch)
* Disable USE_DISCORD_RPC by default
* Dump some JIT objects in rpcs3 cache dir
* Add SIGILL handler for all platforms
* Fix resetting zeroing denormals in thread pool
* Refactor most v128:: utils into global gv_** functions
* Refactor PPU interpreter (incomplete), remove "precise"
* - Instruction specializations with multiple accuracy flags
* - Adjust calling convention for speed
* - Removed precise/fast setting, replaced with static
* - Started refactoring interpreters for building at runtime JIT
*   (I got tired of poor compiler optimizations)
* - Expose some accuracy settings (SAT, NJ, VNAN, FPCC)
* - Add exec_bytes PPU thread variable (akin to cycle count)
* PPU LLVM: fix VCTUXS+VCTSXS instruction NaN results
* SPU interpreter: remove "precise" for now (extremely non-portable)
* - As with PPU, settings changed to static/dynamic for interpreters.
* - Precise options will be implemented later
* Fix termination after fatal error dialog
2022-01-15 06:48:04 +03:00
Nekotekina
cb2748ae08 Update ASMJIT (new upstream API) 2021-12-29 02:45:00 +03:00
Nekotekina
d836033212 LLVM: enable some JIT events (Intel, Perf)
Made some related adjustments.
Currently incomplete.
2021-12-26 16:41:37 +03:00
Nekotekina
dcd011048d Implement "built_function" utility (runtime-generated assembly)
Similar to build_function_asm, but links without indirection.
Achieved by emitting code directly into a byte array.
2021-12-22 19:27:20 +03:00
Malcolm Jestadt
2f93df480b SPU LLVM: Disable affineqb shufb paths temporarily 2021-12-10 19:32:10 +03:00
Malcolm Jestadt
0617e9e14b SPU LLVM: Fix vgf2p8affineqb usage
- Reverse the order of the bytes in the selection masks. Previously it was assumed that byte 0 would determine the output of bit 0, but byte 7 determines the output of bit 0.
2021-12-06 12:34:11 +03:00
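The byte ordering described in this fix can be seen in a scalar model of the per-byte affine transform. This is a sketch for illustration: the names `parity8` and `affine_byte` are made up here, and the model covers a single byte rather than a full vector.

```cpp
#include <cassert>
#include <cstdint>

// Parity (XOR of all bits) of a byte.
inline unsigned parity8(std::uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

// Scalar model of the affine transform for one byte: output bit i is the
// parity of (matrix byte 7 - i AND x), XORed with bit i of imm. Note that
// byte 7 of the matrix, not byte 0, determines output bit 0.
inline std::uint8_t affine_byte(std::uint64_t matrix, std::uint8_t x, std::uint8_t imm = 0)
{
    std::uint8_t result = 0;
    for (int i = 0; i < 8; i++)
    {
        const auto row = static_cast<std::uint8_t>(matrix >> (8 * (7 - i)));
        const unsigned bit = parity8(static_cast<std::uint8_t>(row & x)) ^ ((imm >> i) & 1u);
        result |= static_cast<std::uint8_t>(bit << i);
    }
    return result;
}
```

With this ordering the identity matrix is `0x0102040810204080`: its most significant byte is `0x01`, which selects input bit 0 for output bit 0.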
Malcolm Jestadt
3fde455932 SPU LLVM: Optimize branch following ORX
- test the input of ORX directly for zeroes, instead of the result
2021-11-11 12:58:38 +03:00
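The reasoning behind this optimization can be sketched in scalar form, assuming ORX is modelled as the OR of the four 32-bit words of a register. The type alias and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using quad = std::array<std::uint32_t, 4>;

// Assumed scalar model of ORX: OR the four words together.
inline std::uint32_t orx_model(const quad& v) { return v[0] | v[1] | v[2] | v[3]; }

// Branch condition evaluated on the ORX result.
inline bool result_is_zero(const quad& v) { return orx_model(v) == 0; }

// Equivalent condition evaluated on the ORX input: the OR of the words is
// zero exactly when every word is zero, so the result need not be computed.
inline bool input_is_zero(const quad& v)
{
    for (std::uint32_t w : v)
    {
        if (w != 0)
            return false;
    }
    return true;
}
```

Both predicates agree for every input, so a branch that follows ORX can test the input register directly.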
Malcolm Jestadt
7573d7289b SPU LLVM: Hook up 128 bit spu verification
- Also fix FMA enablement for sapphirerapids
2021-11-06 21:12:12 +03:00
Nekotekina
69f321a471 LLVM 13 2021-11-02 20:11:08 +03:00
Malcolm Jestadt
f06c8b22e8 PPU/SPU LLVM: Emulate VPERM2B with a 256 bit wide VPERMB
- Save 1 uop by using 256 wide VPERMB instead of VPERM2B. (Compiles down to a vinserti128 and vpermb)
2021-10-13 17:51:54 +03:00
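A scalar sketch of why a single wide table lookup can replace a two-table lookup. It assumes bit 4 of the index selects the table and the low four bits select the byte; all names, and the choice of which table forms the low half, are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using table16 = std::array<std::uint8_t, 16>;
using table32 = std::array<std::uint8_t, 32>;

// Two-table lookup: bit 4 of the index picks the table, bits 0..3 pick the byte.
inline std::uint8_t two_table_lookup(const table16& lo, const table16& hi, std::uint8_t index)
{
    return (index & 0x10) ? hi[index & 0xf] : lo[index & 0xf];
}

// Concatenate both tables into one 32-byte table (the role of the insert step).
inline table32 concat(const table16& lo, const table16& hi)
{
    table32 wide{};
    for (int i = 0; i < 16; i++)
    {
        wide[i] = lo[i];
        wide[16 + i] = hi[i];
    }
    return wide;
}

// Single-table lookup with a 5-bit index.
inline std::uint8_t wide_lookup(const table32& wide, std::uint8_t index) { return wide[index & 0x1f]; }

// Helper for building test tables: t[i] = start + i.
inline table16 make_ramp(std::uint8_t start)
{
    table16 t{};
    for (int i = 0; i < 16; i++)
        t[i] = static_cast<std::uint8_t>(start + i);
    return t;
}

// Returns true if both lookups agree for every possible index byte.
inline bool lookups_agree(const table16& lo, const table16& hi)
{
    const table32 wide = concat(lo, hi);
    for (int i = 0; i < 256; i++)
    {
        const auto idx = static_cast<std::uint8_t>(i);
        if (two_table_lookup(lo, hi, idx) != wide_lookup(wide, idx))
            return false;
    }
    return true;
}
```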
Eladash
ab50e5483e GUI Utilities: Implement instruction search, PPU/SPU disasm improvements (#10968)
* GUI Utilities: Implement instruction search in PS3 memory
* String Searcher: Case insensitive search
* PPU DisAsm: Comment constants with ORI
* PPU DisAsm: Add 64-bit constant support
* SPU/PPU DisAsm: Print CELL errors in disasm
* PPU DisAsm: Constant comparison support
2021-10-12 23:12:30 +03:00
Malcolm Jestadt
86716dc37b SPU LLVM: Optimize branches following byteswaps
- The first element can be extracted via vmovd rather than vpextrd, which saves 1 uop.
2021-09-30 13:22:35 +03:00
Malcolm Jestadt
f9ab077908 SPU LLVM: Use VDBPSADBW in SUMB
- This instruction can be used to sum bytes horizontally if the second input vector is all zeroes.
2021-09-30 13:22:35 +03:00
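The trick relies on a sum of absolute differences against zero being a plain sum. A scalar sketch over one four-byte group follows; the group size and the names `sad4` and `sum_bytes` are assumptions for illustration, and the vector instruction itself is not used.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using bytes4 = std::array<std::uint8_t, 4>;

// Sum of absolute differences over one group of four unsigned bytes.
inline unsigned sad4(const bytes4& a, const bytes4& b)
{
    unsigned sum = 0;
    for (int i = 0; i < 4; i++)
        sum += static_cast<unsigned>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return sum;
}

// With an all-zero second operand, |a[i] - 0| == a[i], so the SAD is the byte sum.
inline unsigned sum_bytes(const bytes4& a) { return sad4(a, bytes4{}); }
```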
Nekotekina
9e62ca562b SPU LLVM: implement SQRT and DIV pattern detection (xf) 2021-09-17 10:23:43 +03:00
Nekotekina
d28b0ba2fa SPU LLVM: implement spu_re, spu_rsqrte
Improve matching with peek_through_bitcasts() helper.
Implement erase_stores() helper.
2021-09-17 10:23:43 +03:00
Nekotekina
aba332d4c4 SPU LLVM: make intrinsics for most xfloat instructions 2021-09-17 10:23:43 +03:00
Nekotekina
543fb7a9cb LLVM DSL / SPU LLVM: implement infinite precision shifts
Remove old make_*** helpers in favor of matchable expressions.
2021-09-17 10:23:43 +03:00
Nekotekina
67b3fc70f8 LLVM DSL: implement absd and match helpers
Matchable expression absd(a, b) (absolute difference).
2021-09-17 10:23:43 +03:00
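For reference, absolute difference on unsigned bytes can be written in scalar form as below. This is only a sketch of the arithmetic; the name `absd_model` and the choice of 8-bit unsigned operands are assumptions, and the DSL expression operates on LLVM values rather than plain integers.

```cpp
#include <cassert>
#include <cstdint>

// Absolute difference of two unsigned bytes: subtract the smaller from the
// larger so the result never wraps around.
inline std::uint8_t absd_model(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(a > b ? a - b : b - a);
}
```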