Commit graph

314 commits

Author SHA1 Message Date
Nekotekina
2b4da18709 SPU LLVM: fix xfloat regression
It was an old bug with possible hidden use of deleted instructions.
2019-05-02 13:39:43 +03:00
Nekotekina
d48dc29e55 SPU LLVM: fix perf regression
Bug in the analyser was created recently in #5882.
2019-05-02 13:39:43 +03:00
Nekotekina
69d2ea35b9 SPU: minor analyser cleanup 2019-05-02 13:39:43 +03:00
Nekotekina
a4c4ee9cb2 SPU: fix excessive cache size regression 2019-05-02 13:39:43 +03:00
Nekotekina
1bc5e27507 SPU LLVM: move reg origin search to analyser
Refactor SPU analyser (block_info struct).
Fill register use info (currently unused).
2019-05-01 00:37:15 +03:00
Nekotekina
1294e0d189 SPU LLVM: improve codegen in loops
Use a trick in check_state to improve LICM pass.
2019-05-01 00:37:15 +03:00
Nekotekina
e09c6ea4b4 SPU analyser: add spu_iflag
Register information about register accesses.
2019-04-30 14:33:27 +03:00
Nekotekina
716737ecf2 LLVM DSL: expression matching (alpha)
Implement remaining instructions.
Implement match_expr method.
Implement helper methods.
2019-04-30 14:33:27 +03:00
Nekotekina
8754bbd444 SPU LLVM: add match_vr<> template
Returns reg value only if type is compatible, avoiding bitcast.
2019-04-24 23:55:41 +03:00
Nekotekina
dd9bd1338b SPU LLVM: add get_vrs<> template 2019-04-24 23:55:41 +03:00
Nekotekina
3e0b45719d LLVM DSL: rewrite zshuffle, shuffle2, build
Add llvm_const_vector template.
2019-04-24 23:55:41 +03:00
Nekotekina
b02503963e LLVM DSL: rewrite splat, fsplat, vsplat
Add llvm_const_float and llvm_splat templates.
2019-04-24 23:55:41 +03:00
Nekotekina
c83e65f29e LLVM DSL: rewrite extract and insert 2019-04-24 23:55:41 +03:00
Nekotekina
b7b93eae13 SPU LLVM: minor bitcast cleanup
Remove redundant explicit constand propagation in some instructions.
2019-04-24 23:55:41 +03:00
Nekotekina
ac473eb400 Rewrite cpu_translator::rol, add fshl and fshr
Use new funnel shift intrinsics
2019-04-24 23:55:41 +03:00
Nekotekina
42448cf3e5 Remove cpu_translator::scarry, cpu_translator::merge 2019-04-24 23:55:41 +03:00
Nekotekina
524aac75ed LLVM DSL: rewrite bitcast, zext, sext, trunc, select, min, max ops
Are made composable in expressions similar to arithmetic ops.
Implement noncast in addition to bitcast (no-op case).
Implement bitcast constant folding.
Fixed some misuse of sext<>.
2019-04-24 23:55:41 +03:00
Nekotekina
dc9118ef50 LLVM DSL refactoring
Properly forward value categories in expression structs.
Simplify SFINAE tests (is_llvm_expr, llvm_common_t) in global operators.
Add llvm_const_int and remove llvm_add_const, llvm_sub_const, etc.
Add llvm_ord and llvm_uno for FP comparison via >=< operators.
Replace cpu_translator::fcmp with fcmp_ord and fcmp_uno.
2019-04-24 23:55:41 +03:00
Nekotekina
8deb20e928 SPU: write cache before compiling 2019-04-13 22:56:11 +03:00
eladash
8da78c098c SPU LLVM: Fix branch to self at start of block state check 2019-04-11 17:47:52 +03:00
eladash
eba8e2284b SPU LLVM: Fix CFLTU
Clamp properly result from both sides!

TODO: Figure out whats different CreateFPToUi has from CFLTU and why it fails here.
2019-04-11 17:47:52 +03:00
eladash
969af86eba SPU: Implement BISLED
DFCMGT instruction removed, it was wrong to add to begin with

ASMJIT: Fix compilation of double compare instructions, move exception to runtime instead of compiletime!
Jarves confirmed that he implemented this instruction because of that bug with asmjit only, affected God Of War 3
2019-04-11 17:47:52 +03:00
Nekotekina
d873802b9c Use LLVM 9
Use new add/sub with saturation intrinsics
2019-03-30 01:36:48 +03:00
Nekotekina
d77fed6105 SPU LLVM: remove wrong dead code 2019-03-29 17:00:53 +03:00
Nekotekina
71b88cdc82 New SPU interpreter (SPU fast)
Use LLVM to build SPU interpreter.
Simplify interpreter loop.
2019-03-27 20:33:44 +03:00
Nekotekina
7ea04d5d76 Minor optimization in SPU analyser
Reduce vector copy/allocation
2019-03-23 02:43:41 +03:00
Nekotekina
4b381fbbb1 Implement spu_runtime::reset
To handle JIT: Out Of Memory error.
2019-03-23 02:43:41 +03:00
Nekotekina
1880a17f79 SPU recs: implement spu_runtime::find
Use this function to link to existing functions from branch patchpoints.
Don't compile from branch patchpoints.
2019-03-23 02:43:41 +03:00
Nekotekina
31304f4234 SPU rec: refactor some trampoline generation
Move branch/dispatch trampoline generation at startup.
2019-03-23 02:43:41 +03:00
Nekotekina
3794f65bb6 Add cpu_flag::jit_return 2019-03-23 02:43:41 +03:00
Nekotekina
466d58ccef SPU LLVM: fix branch patchpoints
Forgot to passthrough 3rd arg (rip)
2019-03-23 02:43:41 +03:00
Nekotekina
e9b6beadfc SPU LLVM: implement static branch weights
May help branch prediction in some cases
2019-03-13 21:14:55 +03:00
Nekotekina
388d49db80 SPU LLVM: fix SPU MMIO in TSX mode 2019-03-13 21:14:55 +03:00
Nekotekina
fb64b28886 SPU LLVM: reintroduce branch patchpoints
Previously only used on SPU ASMJIT, may improve perf in some cases.
Now refactored to spu_runtime::make_branch_patchpoint.
2019-03-01 00:08:20 +03:00
Nekotekina
765d15f23f Optimize SPU trampolines
Load values in EAX and reuse it if possible
2019-03-01 00:08:19 +03:00
Nekotekina
58358e85dd spu_runtime::add minor optimization
Use preallocated vectors in trampoline generation subroutine
2019-01-29 03:32:16 +03:00
Nekotekina
2b66abaf10 Implement atomic_t<>::release
More relaxed store with release memory order
2019-01-29 03:32:16 +03:00
Nekotekina
50922faac9 Remove SPUThread::jit_dispatcher
Use global array - save memory
Move the array to JIT memory
2019-01-29 03:32:16 +03:00
Nekotekina
4292997a01 Added jit_runtime class
Is a memory manager for ASMJIT, replaces asmjit::JitRuntime
Unified memory manager for ASMJIT and LLVM
Unified SPU trampoline generation
Remove previous workarounds
2019-01-29 03:32:16 +03:00
Nekotekina
4f152ad126 SPU: multithread compilation
Allow parallel compilation of SPU code, both at startup and runtime
Remove 'SPU Shared Runtime' option (it became obsolete)
Refactor spu_runtime class (now is common for ASMJIT and LLVM)
Implement SPU ubertrampoline generation in raw assembly (LLVM)
Minor improvement of balanced_wait_until<> and balanced_awaken<>
Make JIT MemoryManager2 shared (global)
Fix wrong assertion in cond_variable
2019-01-22 22:02:02 +03:00
elad
fc92ae4085 SPU/PPU atomics performance and LR event fixes (#5435)
* Fix SPU LR event setting in atomic commands according to hw test
* MFC: increment timestamp for PUT cmd in non-tsx path
* MFC: fix reservation lost test on non-tsx path in regard to the lock bit
* Reservation notification moved out of writer_lock scope to reduce its lifetime
* Use passive_lock/unlock in ppu atomic inctrustions to reduce redundancy
* Lock only once for dma transfers (non-TSX)
* Don't use RDTSC in reservation update logic
* Remove MFC cmd args passing to process_mfc_cmd
* Reorder check_state cpu_flag::memory check for faster unlocking
* Specialization for 128-byte data copy in SPU dma transfers
* Implement memory range locks and isolate PPU and SPU passive lock logic
2019-01-15 18:31:21 +03:00
eladash
f19fd23227 spu: Fix support for multiple lists when one is stalled 2019-01-15 02:33:22 +03:00
Nekotekina
a419e98acb Move PPU and shader cache
New hash-based location (already used for SPU)
Bump PPU cache version, improve naming and decrease size

Remove fs::get_data_dir
Disable boot.elf cache
2019-01-14 01:24:05 +03:00
Nekotekina
aefee04c4a SPU analyser: fix branch to self
Fixed not filling the predeccessor list on BR-to-self on entry point
Version bumped (v1-tane)
Closes #5353
2019-01-14 00:01:27 +03:00
Nekotekina
d7be0a96f3 SPU LLVM: approximate xfloat option
Adapt previous SPU ASMJIT changes made by @kd-11
FM, FMA, FNMS, FMS are approximated.
FCGT, FCMGT are accurate.
2018-12-24 16:04:46 +03:00
Nekotekina
2fd384ae95 SPU LLVM: check state in every callable chunk
It's often redundant but may be necessary
2018-11-09 16:19:59 +03:00
Nekotekina
488928eca2 Fix SPU STOP instruction
Check thread state after STOP instruction
2018-11-05 14:35:50 +03:00
Nekotekina
1b37e775be Migration to named_thread<>
Add atomic_t<>::try_dec instead of fetch_dec_sat
Add atomic_t<>::try_inc
GDBDebugServer is broken (needs rewrite)
Removed old_thread class (former named_thread)
Removed storing/rethrowing exceptions from thread
Emu.Stop doesn't inject an exception anymore
task_stack helper class removed
thread_base simplified (no shared_from_this)
thread_ctrl::spawn simplified (creates detached thread)
Implemented overrideable thread detaching logic
Disabled cellAdec, cellDmux, cellFsAio
SPUThread renamed to spu_thread
RawSPUThread removed, spu_thread used instead
Disabled deriving from ppu_thread
Partial support for thread renaming
lv2_timer... simplified, screw it
idm/fxm: butchered support for on_stop/on_init
vm: improved allocation structure (added size)
2018-10-19 22:22:35 +03:00
Nekotekina
ca5158a03e Cleanup semaphore<> (sema.h) and mutex.h (shared_mutex)
Remove semaphore_lock and writer_lock classes, replace with std::lock_guard
Change semaphore<> interface to Lockable (+ exotic try_unlock method)
2018-09-03 23:00:36 +03:00
Nekotekina
8abe6489ed Mega-cleanup for atomic_t<> and named bit-sets bs_t<>
Remove "atomic operator" classes
Remove test, test_and_set, test_and_reset, test_and_complement global functions
Simplify atomic_t<> with constexpr if, remove some garbage
Redesign bs_t<> to use class, mark its methods constexpr
Implement atomic_bs_t<> for optimizations
Remove unused __bitwise_ops concept (should be in other header anyway)
Bitsets can now be tested via safe bool conversion
2018-09-03 21:40:36 +03:00