Summary:
This allows the passing of extra arguments from builders to annotated script runners. Originally I was planning on adding the ability to pass `extra_cmake_args`, but this approach seems more general and can fit many needs. For libc builders this will allow us to pass `--asan`, and in the script this will add in the `extra_cmake_args`.

Summary:
Proposal and roadmap towards vector predication in LLVM. This patch documents that:
a) It is recognized that current LLVM is ill-equipped for vector predication.
b) The community is working on a solution.
c) A concrete prototype exists in the VP extension (D57504).

1) Fix a regression in llvmorg-11-init-2485-g0e3a4877840 that would reject some cases where a class name is shadowed by a typedef-name, causing a destructor declaration to be rejected. Prefer a tag type over a typedef in destructor name lookup.

2) Convert the "type in destructor declaration is a typedef" error to an error-by-default ExtWarn to allow codebases to turn it off. GCC and MSVC do not enforce this rule.

This patch adds a first version of a MemorySSA-based DSE. It is missing a lot of features, which will get added as follow-ups, to help keep the review manageable.

The patch uses the following general approach: given a MemoryDef, walk upwards to find clobbering MemoryDefs that may be killed by the starting def. Then check that there are no uses that may read the location of the original MemoryDef in between both MemoryDefs. A bit more concretely:

For all MemoryDefs StartDef:
1. Get the next dominating clobbering MemoryDef (DomAccess) by walking upwards.
2. Check that there are no reads between DomAccess and StartDef by checking all uses starting at DomAccess and walking until we see StartDef.
3. For each found DomDef, check that:
   1. There are no barrier instructions between DomDef and StartDef (like throws or stores with ordering constraints).
   2. StartDef is executed whenever DomDef is executed.
   3. StartDef completely overwrites DomDef.
4. Erase DomDef from the function and MemorySSA.

The patch uses a very simple approach to guarantee that no throwing instructions are between two stores: we only allow accesses to stack objects, accesses that are in the same basic block if the block does not contain any throwing instructions, or accesses in functions that do not contain any throwing instructions. This will get lifted later.

Besides adding support for the missing cases, there is plenty of additional potential for improvements as follow-up work, e.g. the way we visit stores (could be just a traversal of the MemorySSA, rather than collecting them up-front), or using the alias information discovered during walking to optimize the MemorySSA.

m_size can only be 1 or 0 and indicates if the optional has a value. Calling it 'm_size', giving it a size_t data type and then also comparing indices against 'size' is very confusing. Let's just make this a bool.
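The rename described above can be sketched with a minimal, hypothetical optional-like type (not LLDB's actual class): the presence flag is a bool rather than a 0-or-1 size_t named m_size.

```cpp
#include <cassert>
#include <new>

// Minimal sketch: the "has a value" state is a bool, not a size_t.
template <typename T> class SimpleOptional {
  bool m_has_value = false; // clearer than a 0-or-1 "m_size"
  alignas(T) unsigned char m_storage[sizeof(T)];

public:
  SimpleOptional() = default;
  explicit SimpleOptional(const T &value) : m_has_value(true) {
    new (m_storage) T(value); // construct in place only when present
  }
  SimpleOptional(const SimpleOptional &) = delete;
  ~SimpleOptional() {
    if (m_has_value)
      reinterpret_cast<T *>(m_storage)->~T();
  }
  bool has_value() const { return m_has_value; }
  const T &value() const {
    assert(m_has_value && "no value");
    return *reinterpret_cast<const T *>(m_storage);
  }
};
```

With a bool there is nothing to compare indices against, so the confusion described above cannot arise.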

As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.

This patch lowers to uniform ISD::ROTL nodes - ROTR isn't supported by XOP, and they are interchangeable for constant values anyway.

There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.
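As a scalar sketch of the SRL+SHL+OR expansion mentioned above (a hypothetical helper, not the actual DAG lowering), a byte rotate-left can be built from two shifts and an OR:

```cpp
#include <cstdint>

// Rotate-left of a byte expanded into SHL + SRL + OR.
// Masking the rotate amount avoids an undefined shift when r == 0.
uint8_t rotl8(uint8_t x, unsigned r) {
  r &= 7; // rotate amount modulo the bit width
  return (uint8_t)((x << r) | (x >> ((8 - r) & 7)));
}
```

The same three-operation shape is what a target without a native rotate would emit per vector lane.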

Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512 bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
---
Internal shuffle tests indicate there's a bug somewhere that I haven't been able to track down yet.

This change implements the llvm intrinsic llvm.read_register for the SystemZ platform, which returns the value of the specified register (http://llvm.org/docs/LangRef.html#llvm-read-register-and-llvm-write-register-intrinsics). This implementation returns the value of the stack register, and can be extended to return the value of other registers. The implementation for this intrinsic exists on various other platforms, including Power, x86, ARM, etc., but was missing on SystemZ.

The DebugInfo/dwarfdump-invalid-line-table test used a pre-canned binary generated by a fuzzer to demonstrate a bug fix. Unfortunately, the binary is rigid and requires hand-editing if we change behaviour, such as rejecting certain properties within it (as I plan on doing in another change).

Rather than hand-edit the binary, I have replaced it with two tests. The first tests the high-level code path from the debug line parser that produces the same error as this test previously did, and the second is a set of unit test cases that comprehensively cover the FormValue::skipValue method, which in turn covers the area that the original bug fix touched.

The existing (default) calling convention for memrefs in standard-to-LLVM conversion was motivated by interfacing with LLVM IR produced from C sources. In particular, it passes a pointer to the memref descriptor structure when calling the function. Therefore, the descriptor is allocated on stack before the call. This convention leads to several problems. PR44644 indicates a problem with stack exhaustion when calling functions with memref-typed arguments in a loop. Allocating outside of the loop may lead to concurrent access problems in case the loop is parallel. When targeting GPUs, the contents of the stack-allocated memory for the descriptor (passed by pointer) need to be explicitly copied to the device. Using an aggregate type makes it impossible to attach pointer-specific argument attributes pertaining to alignment and aliasing in the LLVM dialect.

Change the default calling convention for memrefs in standard-to-LLVM conversion to transform a memref into a list of arguments, each of primitive type, that are comprised in the memref descriptor. This avoids stack allocation for ranked memrefs (and thus stack exhaustion and potential concurrent access problems) and simplifies the device function invocation on GPUs.

Provide an option in the standard-to-LLVM conversion to generate auxiliary wrapper functions with the same interface as the previous calling convention, compatible with LLVM IR produced from C sources. These auxiliary functions pack the individual values into a descriptor structure or unpack it. They also handle descriptor stack allocation if necessary, serving as an allocation scope: the memory reserved by `alloca` will be freed on exiting the auxiliary function.

The effect of this change on purely MLIR-generated LLVM IR is minimal. When interfacing MLIR-generated LLVM IR with C-generated LLVM IR, the integration only needs to request the auxiliary functions and change the function name to call the wrapper function instead of the original function.

This also opens the door to forwarding aliasing and alignment information from memrefs to LLVM IR pointers in the standard-to-LLVM conversion.

Add a simplification to fuse a manual vector extract with shifts and truncates into a bitcast.

Unpacking and packing values into vectors is only optimized with extractelement instructions, not when manually unpacked using shifts and truncates. This patch simplifies shifts and truncates into a bitcast if possible.

If a debug line section with a version greater than 5 is encountered, prior to this change the parser would accept it and treat it as version 5. This might work to some extent, but then it might not at all, as it really depends on the format of the unspecified future version, which will be different (otherwise there would be no point in changing the version number). Any information we could provide has a good chance of being invalid, so we should just refuse to parse such tables.

[compiler-rt] Some clean up / refactoring in sanitizer_symbolizer_libcdep.cpp.

Summary:
Nothing critical, just a few potential improvements I've noticed while reading the code:
- return `false` when the symbolizer buffer is too small to read all data
- invert some conditions to reduce indentation
- prefer `nullptr` over `0` for pointers; init some pointers on stack
- remove minor code duplication

Create a clang-tidy check to warn when -dealloc is implemented inside an ObjC class category.

Summary: Such implementations may override the class's own implementation, and even become a hazard if someone later adds one to the class itself. Most of the time, such an implementation has turned out to be a mistake.

The current standard to llvm conversion pass lowers subview ops only if dynamic offsets are provided. This commit extends the lowering with a code path that uses the constant offset of the target memref for the subview op lowering (see Example 3 of the subview op definition for an example) if no dynamic offsets are provided.

This patch renames `__personality_routine` to `_Unwind_Personality_Fn` in `unwind.h`. Both `unwind.h` from clang and GCC headers use this name instead of `__personality_routine`. With this patch one is also able to build libc++abi with libunwind support on Windows.

This CL refactors EDSCs to layer them better and break unnecessary dependencies. After this refactoring, the top-level EDSC target only depends on IR but not on Dialects anymore, and each dialect has its own EDSC directory.

This simplifies the layering and breaks cyclic dependencies. In particular, the declarative builder + folder are made explicit and are now confined to Linalg.

As the refactoring occurred, certain classes and abstractions that were not paying for themselves have been removed.

was intended to automatically derive the type of i from n (signed/unsigned int) and avoid the 'mixed signed/unsigned comparison' warning. However, almost-always-auto was never used in the LLVM coding style (although we used it in Polly for some time) and I never intended to use this idiom upstream.

PVS Studio may warn about this idiom as 'warning: both sides of operator are equivalent [misc-redundant-expression]'.

Look up the -arch flags to pass to the mig invocation from an optionally-defined MIG_ARCHS variable. We can't use CMAKE_OSX_ARCHS because the {i,tv,watch}OS builds don't use this mechanism to achieve fat builds (they build each slice separately & then lipo them together).

Summary:
Instead of hand-crafting an offset into the structure returned by dlopen(3) to get at the link map, use the documented API. This is described in dlinfo(3): by calling it with `RTLD_DI_LINKMAP`, the dynamic linker ensures the right address is returned.

LoopCacheAnalysis currently assumes the loop will be iterated over in a forward direction. This patch addresses the issue by using the absolute value of the stride when iterating backwards.

Note: this patch will treat negative and positive array accesses the same, resulting in the same cost being calculated for single and bi-directional access patterns. This should be improved in a subsequent patch.

Summary:
The return address validation in D71372 will fail if the memory permissions can't be determined. Many embedded stubs either don't implement the qMemoryRegionInfo packet, or don't have memory permissions at all.

Remove the return from the if clause that calls GetLoadAddressPermissions, so this call failing doesn't cause the step out to abort. Instead, assume that the memory permission check doesn't apply to this type of target.

Summary:
This revision adds EDSC support for VectorOps to enable the creation of a `vector_matmul` declaratively. The `vector_matmul` is a simple configuration of the `vector.contract` op that follows the StructuredOps abstraction.

We have spv.entry_point_abi for specifying the local workgroup size. It should be decorated onto input gpu.func ops to drive the SPIR-V CodeGen to generate the proper SPIR-V module execution mode. Compared to using command-line options for specifying the configuration, using attributes also has the benefits that 1) we are now able to use different local workgroup sizes for different entry points and 2) the tests contain the configuration directly.

Summary: The lit feature object-emission was added because Hexagon did not support the integrated assembler, so some tests needed to be turned off with a Hexagon target. Hexagon now supports the integrated assembler, so this feature can be removed.

Null-check and adjust a TypeLoc before casting it to a FunctionTypeLoc. This fixes a crash in -fsanitize=nullability-return, and also makes the location of the nonnull type available when the return type is adjusted.

As discussed in PR41083:
https://bugs.llvm.org/show_bug.cgi?id=41083
...we can assert/crash in EarlyCSE using the current hashing scheme and instructions with flags.

ValueTracking's matchSelectPattern() may rely on overflow (nsw, etc) or other flags when detecting patterns such as min/max/abs composed of compare+select. But the value numbering / hashing mechanism used by EarlyCSE intersects those flags to allow more CSE.

Several alternatives to solve this are discussed in the bug report. This patch avoids the issue by doing simple matching of min/max/abs patterns that never requires instruction flags. We give up some CSE power because of that, but that is not expected to result in much actual performance difference because InstCombine will canonicalize these patterns when possible. It even has this comment for abs/nabs:

/// Canonicalize all these variants to 1 pattern.
/// This makes CSE more likely.

(And this patch adds PhaseOrdering tests to verify that the expected transforms are still happening in the standard optimization pipelines.)

I left this code to use ValueTracking's "flavor" enum values, so we don't have to change the callers' code. If we decide to go back to using the ValueTracking call (by changing the hashing algorithm instead), it should be obvious how to replace this chunk.

Summary:
Instead of hand-crafting an offset into the structure returned by dlopen(3) to get at the link map, use the documented API. This is described in dlinfo(3): by calling it with `RTLD_DI_LINKMAP`, the dynamic linker ensures the right address is returned.

This is a recommit of 92e267a94dc4272511be674062f8a3e8897b7083, with dlinfo(3) explicitly being referenced only for FreeBSD, non-Android Linux, NetBSD and Solaris. Other OSes will have to add their own implementation.

Add an optional table lookup after the existing logarithm computation for MidSize < Size <= MaxSize during size -> class lookups. The lookup is O(1) due to indexing a precomputed (via constexpr) table based on a size table. Switch to this approach for the Android size class maps.
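The idea can be sketched as follows; the class sizes, granularity, and limits here are made up for illustration and are not Scudo's actual configuration:

```cpp
#include <array>
#include <cstddef>

// Hypothetical size classes, 16-byte granularity, 128-byte max size.
constexpr std::size_t Classes[] = {16, 32, 48, 64, 128};
constexpr std::size_t MaxSize = 128, Granule = 16;

// Precompute, at compile time, the class index for every granule bucket.
constexpr auto buildTable() {
  std::array<unsigned char, MaxSize / Granule> Tab{};
  for (std::size_t I = 0; I < Tab.size(); ++I) {
    std::size_t Size = (I + 1) * Granule;
    unsigned char C = 0;
    while (Classes[C] < Size) // smallest class that fits this size
      ++C;
    Tab[I] = C;
  }
  return Tab;
}
constexpr auto Table = buildTable();

// Runtime lookup: round Size up to a granule index, then index the
// table -- O(1), no loop or binary search. Size must be in (0, MaxSize].
constexpr unsigned classForSize(std::size_t Size) {
  return Table[(Size + Granule - 1) / Granule - 1];
}
static_assert(classForSize(1) == 0, "fits the 16-byte class");
static_assert(classForSize(65) == 4, "rounds up to the 128-byte class");
```

The table's cost is a few bytes of read-only data, which is what makes it competitive with the logarithm path for mid-range sizes.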

Other approaches considered:
- Binary search was found to have an unacceptable (~30%) performance cost.
- An approach using NEON instructions (see older version of D73824) was found to be slightly slower than this approach on newer SoCs but significantly slower on older ones.

By selecting the values in the size tables to minimize wastage (for example, by passing the malloc_info output of a target program to the included compute_size_class_config program), we can increase the density of allocations at a small (~0.5% on bionic malloc_sql_trace as measured using an identity table) performance cost.

I did this 3 times both before and after this change and the results were:

Before: 365650, 356795, 372663
After: 344521, 356328, 342589

These results are noisy so it is hard to make a definite conclusion, but there does appear to be a significant effect.

On other platforms, increase the sizes of all size classes by a fixed offset equal to the size of the allocation header. This has also been found to improve density, since it is likely for allocation sizes to be a power of 2, which would otherwise waste space by pushing the allocation into the next size class.
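A small worked example of the wastage argument (hypothetical 16-byte header and power-of-2 classes, not the allocator's real parameters):

```cpp
#include <cstddef>

constexpr std::size_t HeaderSize = 16; // hypothetical allocation header

// Round Needed (user size + header) up to a power-of-2 class.
constexpr std::size_t plainClass(std::size_t Needed) {
  std::size_t C = 16;
  while (C < Needed)
    C *= 2;
  return C;
}

// Same classes, each enlarged by the header size.
constexpr std::size_t offsetClass(std::size_t Needed) {
  return plainClass(Needed - HeaderSize) + HeaderSize;
}

// A 64-byte allocation needs 64 + 16 = 80 bytes in total. Plain
// power-of-2 classes push it into the 128-byte class (48 bytes wasted);
// offset classes serve it from a 64+16 = 80-byte class (nothing wasted).
static_assert(plainClass(64 + HeaderSize) == 128, "48 bytes wasted");
static_assert(offsetClass(64 + HeaderSize) == 80, "exact fit");
```

Since power-of-2 request sizes are common, shifting every class by the header size removes this systematic overshoot.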

I'm /guessing/ this isn't terribly testable without a very large input file. Even generated from a more compact assembly file, it's probably best not to generate a giant temporary test file - if I'm wrong about that, or anyone has good suggestions for testing, I'm all ears!

Based on post-commit review feedback from Igor Kudrin on eed0242330926815d19dd0d54f393576bcffc762

[libFuzzer] communicate through pipe to subprocess for MinimizeCrashInput

For CleanseCrashInput, the stdout output is discarded anyway since it is not used.

These changes are to defend against aggressive PID recycling on Windows to reduce the chance of contention on files.

Using a pipe instead of a file also works around the problem that, when the process is spawned by llvm-lit, the aborted process keeps a handle to the output file such that the output file cannot be removed. This will cause random test failures.

The primary motivation is to fix an assertion failure in isl_basic_map_alloc_equality:

isl_assert(ctx, room_for_con(bmap, 1), return -1);

Although the assertion does not occur anymore, I could not identify which of ISL's commits fixed it.

Compared to the previous ISL version, Polly requires some changes for this update:

* Since ISL commit 20d3574 "perform parameter alignment by modifying both arguments to function", isl_*_gist_* and similar functions do not always align the parameter list anymore. This caused the parameter lists in JScop files to become out-of-sync. Since many regression tests use JScop files with a fixed parameter list and order, we explicitly call align_params to ensure a predictable parameter list.

* ISL changed some return types to isl_size, a typedef of (signed) int. This caused some issues where the return type was unsigned int before:
  - No overload for std::max(unsigned, isl_size)
  - It causes additional 'mixed signed/unsigned comparison' warnings.
  Since they do not break compilation, and sizes larger than 2^31 were never supported, I am going to fix it separately.

* With the change to isl_size, commit 57d547 "isl_*_list_size: return isl_size" also changed the return value in case of an error from 0 to -1. This caused undefined looping over isl_iterator since the 'end iterator' got index -1, never reached from the 'begin iterator' with index 0.

* Some internal changes in ISL caused the number of operations to increase when computing access ranges to determine aliasing overlaps. In one test, this exceeded the default limit of 800000. The operations limit was disabled for this test.
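The looping hazard from the isl_size change above can be sketched with a plain int index standing in for isl_size (hypothetical code, not Polly's actual iterator):

```cpp
// An index-based loop over a list whose size() reports an error as -1
// (as isl_size does) instead of 0. With size 0 the `i != size` loop body
// simply never runs; with -1, i starting at 0 and increasing would never
// equal the end index, so the error must be handled before looping.
int countVisited(int size) {
  if (size < 0) // without this check, `i != size` would loop "forever"
    return -1;
  int visited = 0;
  for (int i = 0; i != size; ++i)
    ++visited;
  return visited;
}
```

This is why changing the error return from 0 to -1 silently broke iteration: the old error value happened to coincide with a safe empty range.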

The existing wording leaves it unclear if C++ standard library data structures should be preferred over custom LLVM ones, e.g., SmallVector, even though common practice seems clear on the issue. This change makes the wording more explicit and aligns it better with the code base.

We need to use vector instructions for these operations. Previously we handled this with isel patterns that used extra instructions and copies to handle the conversions.

Now we use custom lowering to emit the conversions. This allows them to be pattern matched and optimized on their own. For example we can now emit vpextrw to store the result if it's going directly to memory.

I've forced the upper elements of VCVTPHS2PS to zero to keep some code similar. Zeroes will be needed for strictfp. I've added a DAG combine for (fp16_to_fp (fp_to_fp16 X)) to avoid extra instructions in between, to be closer to the previous codegen.

StringRef will call strlen on the C string, which is inefficient (as ConstString already knows the string length and so does StringRef). This patch replaces all those calls with GetStringRef(), which doesn't recompute the length.

Summary:
When renaming a class with template constructors, we are missing the occurrences of the template constructors, because getUSRsForDeclaration doesn't give USRs of the templated constructors (they are not in the normal `ctors()` method).

LiveDebugVariables uses interval maps to explicitly represent DBG_VALUE intervals. DBG_VALUEs are filtered into an interval map based on their { Variable, DIExpression }. The interval map will coalesce adjacent entries that use the same { Location }. Under this model, DBG_VALUEs which refer to the same bits of the same variable will be filtered into different interval maps if they have different DIExpressions, which means the original intervals will not be properly preserved.

This patch fixes the problem by using { Variable, Fragment } to filter the DBG_VALUEs into maps, and coalesces adjacent entries iff they have the same { Location, DIExpression } pair.

The solution is not perfect because we see similar issues appear when partially overlapping fragments are encountered, but it is far simpler than a complete solution (i.e. D70121).

Fix up the UserValue methods to use FragmentInfo instead of DIExpression, because the DIExpression is only ever used to get the FragmentInfo. The DIExpression is meaningless in the UserValue class because each definition point added to a UserValue may have a unique DIExpression.

[Debuginfo][NFC] Rename error handling functions using the same pattern.

Summary:
This patch is extracted from https://reviews.llvm.org/D74308. Currently there are two patterns to name error handling functions: using "Callback" and "Handler". This patch uses "Handler" in all usage places.

Without PSHUFB we are better off using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.

Summary:
The only use of this class was to implement the SharedCluster of ValueObjects. However, the same functionality can be implemented using a regular std::shared_ptr and its little-known "sub-object pointer" feature, where the pointer can point to one thing, but actually delete something else when it goes out of scope.

This patch reimplements SharedCluster using this feature -- SharedClusterPointer::GetObject now returns a std::shared_ptr which points to the ValueObject, but actually owns the whole cluster. The only change I needed to make here is that now the SharedCluster object needs to be created before the root ValueObject. This means that all private ValueObject constructors get a ClusterManager argument, and their static Create functions do the create-a-manager-and-pass-it-to-value-object dance.
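The "sub-object pointer" feature referred to above is std::shared_ptr's aliasing constructor. A minimal sketch (hypothetical Cluster type, not LLDB's ClusterManager):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Cluster {
  std::vector<int> objects{1, 2, 3};
};

// The returned pointer points at one element, but ownership (and the
// eventual delete) covers the whole Cluster: this is the aliasing
// constructor shared_ptr(const shared_ptr<Y>&, T*).
std::shared_ptr<int> getObject(const std::shared_ptr<Cluster> &c,
                               std::size_t idx) {
  return std::shared_ptr<int>(c, &c->objects[idx]);
}
```

Even after the last shared_ptr<Cluster> goes out of scope, an aliased element pointer keeps the whole cluster alive, which is exactly the SharedCluster semantics described above.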

We now have a virtual-functions test and a multiple-inheritance test that are testing the same functionality (and more) using the newer test functions which we have in LLDB these days. These tests should also be less flaky and less dependent on other unrelated LLDB functionality.

[libc++] Disable a filesystem test that uses debug mode with the macOS system libc++

The system libc++.dylib doesn't support the debug mode, so this test can't be supported. As a fly-by fix, we also specify more stringently that only the macOS system library is unsupported in other tests using the debug mode.

The C++ rules briefly allowed this, but the rule changed nearly 10 years ago and we never updated our implementation to match. However, we've warned on this by default for a long time, and no other compiler accepts it (even as an extension).

[OPENMP50]Add restrictions for memory order clauses in atomic directive.

Added restrictions for the atomic directive:
1. If atomic-clause is read, then memory-order-clause must not be acq_rel or release.
2. If atomic-clause is write, then memory-order-clause must not be acq_rel or acquire.
3. If atomic-clause is update or not present, then memory-order-clause must not be acq_rel or acquire.

Added a test for #pragma clang __debug llvm_fatal_error to test for the original issue. Added llvm::sys::Process::Exit() and replaced ::exit() in places where it was appropriate. This new function calls the current CrashRecoveryContext if one is running on the same thread, or calls ::exit() otherwise.

This patch removes forcedconstant to simplify things for the move to ValueLattice, which includes constant ranges, but no forced constants.

This patch removes forcedconstant and changes ResolvedUndefsIn to mark instructions with unknown operands as overdefined. This means we do not do simplifications based on undef directly in SCCP any longer, but this seems to hardly come up in practice (see stats below), presumably because InstCombine & others take care of most of the relevant folds already.

It is still beneficial to keep ResolvedUndefsIn, as it allows us to delay going to overdefined until we have propagated all known information.

I also built MultiSource, SPEC2000 and SPEC2006 and compared sccp.IPNumInstRemoved and sccp.NumInstRemoved. It looks like the impact is quite low:

Note that you have to be careful about whether the function return type is `auto` or `decltype(auto)`. The difference is that bare `auto` strips const and reference, just like lambda return type deduction. In some cases that's what we want (or more likely, we know that the return type is a value type), but whenever we're wrapping a templated function which might return a reference, we need to be sure that the return type is `decltype(auto)`.
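For instance (a self-contained sketch, not code from this patch):

```cpp
#include <type_traits>

int g = 0;
int &getRef() { return g; }

// bare auto deduces int: the reference is stripped and a copy returned.
auto wrapValue() { return getRef(); }

// decltype(auto) deduces int&: the reference is forwarded intact.
decltype(auto) wrapRef() { return getRef(); }

static_assert(std::is_same<decltype(wrapValue()), int>::value, "copy");
static_assert(std::is_same<decltype(wrapRef()), int &>::value, "ref");
```

Assigning through wrapValue() would modify a temporary copy; assigning through wrapRef() modifies g itself.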

Summary:
Add a new method (tryParseRegister) that attempts to parse a register specification.

MASM allows the use of IFDEF <register>, as well as IFDEF <symbol>. To accommodate this, we make it possible to check whether a register specification can be parsed at the current location, without failing the entire parse if it can't.

[X86CmovConversion] Make heuristic for optimized cmov depth more conservative (PR44539)

Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539. As discussed there, this pass makes some overly optimistic assumptions, as it does not have access to actual branch weights.

This patch makes the computation of the depth of the optimized cmov more conservative, by assuming a distribution of 75/25 rather than 50/50 and placing the weights to get the more conservative result (larger depth). The fully conservative choice would be std::max(TrueOpDepth, FalseOpDepth), but that would break at least one existing test (which may or may not be an issue in practice).

1. We were calling FSEventStreamStop and FSEventStreamInvalidate before we called FSEventStreamStart and FSEventStreamSetDispatchQueue, if the DirectoryWatcher was destroyed before the initial async work was done. This violates the requirements of the FSEvents API.

2. Calls to Receiver could race between the initial work and the invalidation during destruction.

This code seems wrong, as the directory variable actually contains the file name. It's also unreachable code, as m_include_support_files is hardcoded to false, which is the condition for the surrounding 'if' statement. Let's just remove all of this.

Currently, the isTruncateFree() and isZExtFree() callbacks return false as they are not implemented in the BPF backend. This may cause suboptimal code generation. For example, if the load in the context of zero extension has more than one use, the pattern zextload{i8,i16,i32} will not be generated. Rather, the load will be matched first and then the result is zero extended.