* sys_spu: Fix race in sys_spu_thread_group_destroy and other minor fixes
* SPU: Wait for all threads to have error codes if exited by sys_spu_thread_exit
On last thread in group to run.
* sys_spu: Fix sys_spu_thread_group_start
* fixup ad fix sys_spu_thread_group_terminate
idk why "- !group->running" was put in the first place but its probably no longer relevant due to other changes and was causing other issues such as not always waiting for last SPU thread to set group state to INITIALIZED.
* Removed wrong code in sys_spu_thread_group_terminate.
* SPU Thread ID is accurate, including 5th thread id "rule".
* Fixed possible use-after-free access of spu_thread::group member.
* RawSPU ID management simplified.
vm::spu max address was overflowing resulting in issues, so cast to u64 where needed. Fixes#6145.
Use vm::get_addr instead of manually substructing vm::base(0) from pointer in texture cache code.
Prefer std::atomic_thread_fence over _mm_?fence(), adjust usage to be more correct.
Used sequantially consistent ordering in semaphore_release for TSX path as well.
Improved memory ordering for sys_rsx_context_iounmap/map.
Fixed sync bugs in HLE gcm because of not using atomic instructions.
Use release memory barrier in lwsync for PPU LLVM, according to this xbox360 programming guide lwsync is a hw release memory barrier.
Also use release barrier where lwsync was originally used in liblv2 sys_lwmutex and cellSync.
Use acquire barrier for isync instruction, see https://devblogs.microsoft.com/oldnewthing/20180814-00/?p=99485
- Multiple header files where missing #includes to other headers that
where used in the header. Correct header was included in correct
order in source files which caused everything to compile.
- Added missing #includes so header files correctly include all their
dependencies and fixes problems with IDEs being unable to parse
headers correctly due to missing symbols
Misc: fix EH frame registration (LLVM, non-Windows).
Misc: constant-folding bitcast (cpu_translator).
Misc: add syntax for LLVM arrays (cpu_translator).
Misc: use function names for proper linkage (SPU LLVM).
Changed function search and verification in Giga mode.
Basic stack frame layout analysis.
Function detection in Giga mode.
Basic use of new information in SPU LLVM.
Fixed jump table compilation in SPU LLVM.
Disable broken optimization in Accurate xfloat mode.
Make compiled SPU modules position-independent in SPU LLVM.
Optimizations include but not limited to:
* Compiling SPU functions as native functions when eligible
* Avoiding register context write-out
* Aligned stack assumption (CWD alike instruction)
* Fix SPU LR event setting in atomic commands according to hw test
* MFC: increment timestamp for PUT cmd in non-tsx path
* MFC: fix reservation lost test on non-tsx path in regard to the lock bit
* Reservation notification moved out of writer_lock scope to reduce its lifetime
* Use passive_lock/unlock in ppu atomic inctrustions to reduce redundancy
* Lock only once for dma transfers (non-TSX)
* Don't use RDTSC in reservation update logic
* Remove MFC cmd args passing to process_mfc_cmd
* Reorder check_state cpu_flag::memory check for faster unlocking
* Specialization for 128-byte data copy in SPU dma transfers
* Implement memory range locks and isolate PPU and SPU passive lock logic
Move lambda into a cpu_stop()
Use running thread counter to synchronize with sys_spu_thread_group_join()
Use SPU_STATUS_STOPPED_BY_STOP exclusively for sys_spu_thread_exit() as before
Remove unnecessary waiting in sys_spu_thread_group_exit()
Rollback some minor unnecessary changes
Use shared_mutex in SPU TG
Disable extra modes for SPU LLVM for now.
In Mega mode, SPU Analyser tries to determine complete functions.
Recompiler tries to speed up returns via 'stack mirror'.
Allow indirect calls within current function using a jumptable
This restores some functionality removed in SPU ASMJIT 2.0
Change SPUThread::get_ch_value prototype
Use opt-out shared spu_runtime to save memory (Option: SPU Shared Runtime)
Implement "übertrampolines" for dispatching compiled blocks
Patch fixed branch points to use trampolines after check failure