
[pull] master from tensorflow:master #169

Open
wants to merge 1,648 commits into master from tensorflow:master
Conversation


@pull pull bot commented Dec 8, 2024

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Dec 8, 2024
rtg0795 and others added 29 commits January 13, 2025 13:07
PiperOrigin-RevId: 715080397
Imported from GitHub PR openxla/xla#21375

In later stages of optimization, there are instances of copy fusion on the parameter of the while body. With this, we need to allow inlining of fusions while getting the induction variable index, otherwise we cannot deduce the tuple index.
Copybara import of the project:

--
ae85690876a106c4d74715fed299779e29e8e641 by Shraiysh Vaishay <svaishay@nvidia.com>:

[ds-fusion] Get While loop analysis with copy fusion

In later stages of optimization, there are instances of copy fusion on
the parameter of the while body. With this, we need to allow inlining of
fusions while getting the induction variable index, otherwise we cannot
deduce the tuple index.

Merging this change closes #21375

PiperOrigin-RevId: 715080653
Optimize this by replacing multiplication
with advancing the pointer every iteration. Also avoid reloading depth/etc.
from args every time.
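The optimization described above replaces a per-iteration index multiplication with a pointer that advances by the stride each iteration. A minimal sketch (illustrative names, not the actual depthwise-conv kernel), with the stride hoisted out of the loop rather than reloaded from arguments:

```cpp
#include <cassert>
#include <vector>

// Instead of computing data[i * stride] each iteration (one multiply
// per step), advance a pointer by `stride` (one add per step). The
// stride is a local, so it is not reloaded from args every iteration.
float SumStrided(const float* data, int n, int stride) {
  float sum = 0.0f;
  const float* p = data;  // advancing pointer replaces i * stride
  for (int i = 0; i < n; ++i) {
    sum += *p;
    p += stride;  // single add per iteration, no multiply
  }
  return sum;
}
```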

After fixing the benchmark for depthwise conv and running it, the results are noisy but appear positive overall.

name                                                                                   old cpu/op   new cpu/op   delta
BM_ConvFloatDepthwiseFwdCPU1_conv0_float/real_time  [32_112_112_3_8_24_3_3_1_2_cpu1 ]  33.4µs ±16%  34.7µs ±28%     ~     (p=0.284 n=38+39)
BM_ConvFloatDepthwiseFwdCPU4_conv0_float/real_time  [32_112_112_3_8_24_3_3_1_2_cpu4 ]  27.3µs ±57%  26.6µs ±52%     ~     (p=0.556 n=40+40)
BM_ConvFloatDepthwiseFwdCPU1_conv1_float/real_time  [32_112_112_64_1_64_3_3_1_2_cpu1]  35.6µs ±24%  36.3µs ±27%     ~     (p=0.283 n=35+40)
BM_ConvFloatDepthwiseFwdCPU4_conv1_float/real_time  [32_112_112_64_1_64_3_3_1_2_cpu4]  30.0µs ±27%  31.1µs ±33%     ~     (p=0.377 n=36+34)
BM_ConvFloatDepthwiseFwdCPU1_conv2_float/real_time  [32_56_56_128_1_128_3_3_1_2_cpu1]  32.8µs ±14%  33.1µs ±18%     ~     (p=0.761 n=33+38)
BM_ConvFloatDepthwiseFwdCPU4_conv2_float/real_time  [32_56_56_128_1_128_3_3_1_2_cpu4]  25.7µs ±57%  26.4µs ±55%     ~     (p=0.609 n=40+40)
BM_ConvFloatDepthwiseFwdCPU1_conv3_float/real_time  [32_56_56_128_1_128_3_3_2_2_cpu1]  32.2µs ±17%  31.7µs ±12%     ~     (p=0.204 n=37+35)
BM_ConvFloatDepthwiseFwdCPU4_conv3_float/real_time  [32_56_56_128_1_128_3_3_2_2_cpu4]  27.8µs ±32%  27.0µs ±24%     ~     (p=0.341 n=34+39)
BM_ConvFloatDepthwiseFwdCPU1_conv4_float/real_time  [32_28_28_128_1_128_3_3_1_2_cpu1]  32.1µs ±13%  31.9µs ±12%     ~     (p=0.470 n=39+36)
BM_ConvFloatDepthwiseFwdCPU4_conv4_float/real_time  [32_28_28_128_1_128_3_3_1_2_cpu4]  26.2µs ±30%  25.5µs ±44%     ~     (p=0.677 n=38+37)
BM_ConvFloatDepthwiseFwdCPU1_conv5_float/real_time  [32_14_14_512_1_512_3_3_1_2_cpu1]  31.5µs ±18%  31.7µs ±17%     ~     (p=0.742 n=38+39)
BM_ConvFloatDepthwiseFwdCPU4_conv5_float/real_time  [32_14_14_512_1_512_3_3_1_2_cpu4]  28.5µs ±28%  27.3µs ±29%     ~     (p=0.208 n=35+37)
BM_ConvFloatDepthwiseFwdCPU1_conv6_float/real_time  [32_7_7_1024_1_1024_3_3_1_2_cpu1]  29.3µs ±16%  28.9µs ±21%     ~     (p=0.334 n=39+31)
BM_ConvFloatDepthwiseFwdCPU4_conv6_float/real_time  [32_7_7_1024_1_1024_3_3_1_2_cpu4]  8.35µs ±62%  7.08µs ±46%  -15.24%  (p=0.026 n=40+37)
BM_ConvFloatDepthwiseFwdCPU1_conv7_float/real_time  [32_112_112_3_8_24_3_3_2_2_cpu1 ]  31.2µs ±17%  31.4µs ±22%     ~     (p=0.987 n=35+38)
BM_ConvFloatDepthwiseFwdCPU4_conv7_float/real_time  [32_112_112_3_8_24_3_3_2_2_cpu4 ]  25.9µs ±45%  26.5µs ±32%     ~     (p=0.859 n=39+38)
BM_ConvFloatDepthwiseFwdCPU1_conv8_float/real_time  [32_112_112_3_8_24_3_3_2_1_cpu1 ]  30.0µs ±16%  30.5µs ±18%     ~     (p=0.228 n=34+33)
BM_ConvFloatDepthwiseFwdCPU4_conv8_float/real_time  [32_112_112_3_8_24_3_3_2_1_cpu4 ]  26.2µs ±41%  24.4µs ±53%     ~     (p=0.288 n=36+40)
BM_ConvFloatDepthwiseFwdCPU1_conv9_float/real_time  [1_100_100_72_1_72_3_3_1_2_cpu1 ]  26.5µs ±16%  25.6µs ±15%     ~     (p=0.051 n=34+37)
BM_ConvFloatDepthwiseFwdCPU4_conv9_float/real_time  [1_100_100_72_1_72_3_3_1_2_cpu4 ]  6.33µs ±37%  5.60µs ±36%  -11.46%  (p=0.011 n=40+35)
BM_ConvFloatDepthwiseFwdCPU1_conv10_float/real_time [1_100_100_72_1_72_5_5_1_2_cpu1 ]  26.4µs ±13%  27.8µs ±20%     ~     (p=0.140 n=33+40)
BM_ConvFloatDepthwiseFwdCPU4_conv10_float/real_time [1_100_100_72_1_72_5_5_1_2_cpu4 ]  14.6µs ±79%   9.2µs ±90%  -36.75%  (p=0.000 n=40+40)

PiperOrigin-RevId: 715085497
This function is the struct size checker that's used only on the plugin side (and is only valid there, since it checks whether the struct size is greater than or equal to the expected size). Make the error text it generates clear that the plugin version is later than the framework's and is therefore an unsupported combination.
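A minimal sketch of such a plugin-side check (illustrative names, not the actual TF C API): if the struct the framework passes is smaller than what the plugin was compiled to expect, the plugin is newer than the framework and the combination is unsupported.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Plugin-side forward-compatibility check: the framework's struct must
// be at least as large as the plugin expects. A smaller struct means
// the plugin was built against a newer API than the framework supports.
bool CheckStructSize(size_t actual, size_t expected, std::string* error) {
  if (actual < expected) {
    *error =
        "plugin was built against a newer API version than the "
        "framework provides; this combination is unsupported";
    return false;
  }
  return true;
}
```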

PiperOrigin-RevId: 715086761
…ns when breaking circular cp for better traceability.

PiperOrigin-RevId: 715104228
…is CL makes two minor tweaks to the `BasicStringArray` class (the string array implementation in the PjRt-IFRT backend): (1) `CopyToHostBuffer` now supports the host buffer semantics of `kImmutableUntilTransferCompletes`. (2) `FullyReplicated` now works with `ConcreteSharding`.

PiperOrigin-RevId: 715110567
PiperOrigin-RevId: 715112095
…duce more of its member function to C Api.

PiperOrigin-RevId: 715113720
…ting during instruction fusion.

The computations are not being sorted in a semantically meaningful order; they are sorted by instruction count with ties being broken consistently but arbitrarily (based on a hash of the string representation of the computation). There is therefore no reason why these passes need to traverse the computations in this specific order.
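The ordering described above can be sketched as follows (illustrative types, not the actual XLA pass code): sort by instruction count, breaking ties consistently but arbitrarily via a hash of the computation's string representation.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Stand-in for an HLO computation: its instruction count and its
// string representation.
struct Computation {
  int instruction_count;
  std::string repr;
};

// Sort by instruction count; ties are broken by a hash of the string
// representation, which is consistent within a run but semantically
// arbitrary.
void SortComputations(std::vector<Computation>& cs) {
  std::sort(cs.begin(), cs.end(),
            [](const Computation& a, const Computation& b) {
              if (a.instruction_count != b.instruction_count)
                return a.instruction_count < b.instruction_count;
              return std::hash<std::string>{}(a.repr) <
                     std::hash<std::string>{}(b.repr);
            });
}
```

Since the tiebreak carries no semantic meaning, passes that iterate over this order should not depend on it.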

PiperOrigin-RevId: 715122377
PiperOrigin-RevId: 715131342
…ot_handler.cc.

This function creates a copy for the HloInstruction in the given PartitionedHlo and returns a new PartitionedHlo for the copy. This can be reused by other operators (like gather/scatter).

PiperOrigin-RevId: 715138672
Breaking internal tests

Reverts 5e78ccd

PiperOrigin-RevId: 715160248
…stom op.

Attributes will determine behavior

PiperOrigin-RevId: 715161504
PiperOrigin-RevId: 715162849
Only bitcast requires the layout to be known when evaluating HLO.
In all other cases, we can evaluate without knowing the layout.
This is needed for collective pipelining, where we have to analyse while loops before layouts are assigned.

PiperOrigin-RevId: 715163612
A gather has two operands, input and indices. If they point to the same instruction, create a copy for indices.

A scatter has n inputs, 1 indices, and n updates (2n+1 operands in total). Overlap is allowed among the n inputs, and likewise among the n updates. We need to create a copy if
* indices overlap with any input or update
* an update overlaps with any input

The added copy will be removed if it is redundant in the following memory related passes (e.g., CopyInsertion).
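The scatter rule above can be sketched as a small predicate (illustrative representation: operands identified by instruction id, so equal ids mean the operands point to the same instruction):

```cpp
#include <cassert>
#include <vector>

// Returns true if a copy must be inserted for a scatter's operands:
// indices may not alias any input or update, and an update may not
// alias an input. Inputs may alias each other, as may updates.
bool ScatterNeedsCopy(const std::vector<int>& inputs, int indices,
                      const std::vector<int>& updates) {
  for (int in : inputs)
    if (in == indices) return true;  // indices alias an input
  for (int up : updates) {
    if (up == indices) return true;  // indices alias an update
    for (int in : inputs)
      if (up == in) return true;     // an update aliases an input
  }
  return false;
}
```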

PiperOrigin-RevId: 715164959
This CL takes care of
1. Migrating external projects dependencies from

```
tensorflow/compiler/xla:test
tensorflow/compiler/xla:test_helpers
tensorflow/compiler/xla/service:pattern_matcher_gmock
```

to `tensorflow/compiler/xla/hlo/testlib:*`

PiperOrigin-RevId: 715167258
…dleDotHelper`.

`HandleDotHelper` is called once for a single dot operation, while `PartitionDot` can be called many times. We need to consider adding a copy only once.

PiperOrigin-RevId: 715189518
Imported from GitHub PR openxla/xla#21104

The transpose folding pass doesn't preserve the backend config when creating the new dot with the transpose folded. This change copies the old dot's config to the new dot.
Copybara import of the project:

--
d2d6b628af1cab777a210e4ac62184e52fe9f4a9 by TJ Xu <tjx@nvidia.com>:

Preserve backend config when folding transpose

--
6b5fa3a1cb70a790803e3ac57ff8329690e88e5e by TJ Xu <tjx@nvidia.com>:

use SetupDerivedInstruction instead of just copying the backend config

Merging this change closes #21104

PiperOrigin-RevId: 715204523
This method was renamed, but the staging function was kept; switch to the renamed variant.

PiperOrigin-RevId: 715208433
PiperOrigin-RevId: 715229132
cota and others added 30 commits January 17, 2025 07:09
Add a "simple" scatter benchmark with a reduce combiner.

PiperOrigin-RevId: 716654572
global_data.h is deprecated

PiperOrigin-RevId: 716654755
These will soon be shared between CPU and GPU.

PiperOrigin-RevId: 716666785
Updating:
 - `env.h`
 - `env_time.h`
 - `errors.h`
 - `file_statistics.h`
 - `file_system.h`
 - `file_system_helper.h`
 - `logging.h`
 - `macros.h`
 - `status.h`
 - `status_matchers.h`
 - `status_to_from_proto.h`
 - `statusor.h`
 - `test.h`
 - `test_benchmark.h`
 - `threadpool.h`
 - `threadpool_async_executor.h`
 - `threadpool_interface.h`
 - `threadpool_options.h`
 - `types.h`

and associated targets.

PiperOrigin-RevId: 716721921
Add DutyCycleCombiner for handling intra and inter chip duty cycle aggregation.
Fix DutyCycleTracker bugs with idleness and duplicate active times.
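The duplicate-active-time bug mentioned above comes down to not double-counting overlapping active intervals when aggregating. A minimal sketch of the aggregation (illustrative names, not the actual profiler code): merge overlapping intervals first, then sum.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Interval = std::pair<long, long>;  // [start, end) active span

// Total active time with overlapping or duplicate intervals merged,
// so the same active span is never counted twice. Duty cycle is then
// TotalActive(...) / total_wall_time.
long TotalActive(std::vector<Interval> iv) {
  std::sort(iv.begin(), iv.end());
  long total = 0, cur_start = 0, cur_end = 0;
  bool open = false;
  for (const auto& [s, e] : iv) {
    if (!open || s > cur_end) {
      if (open) total += cur_end - cur_start;  // close previous run
      cur_start = s;
      cur_end = e;
      open = true;
    } else {
      cur_end = std::max(cur_end, e);  // extend overlapping run
    }
  }
  if (open) total += cur_end - cur_start;
  return total;
}
```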

PiperOrigin-RevId: 716741025
…ting behavior unchanged.

Only when coordination_agent_recoverable is set does the agent try to reconnect to the cluster, which can lead to an AlreadyExists error. In that case the error can be handled by checking that the existing topology is the same as the new one.
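A minimal sketch of that handling (illustrative names, not the actual coordination-service API): a recoverable agent that hits AlreadyExists on reconnect is accepted only if the topology it reports matches the one already stored.

```cpp
#include <cassert>
#include <string>

// Decide whether a reconnect that raised AlreadyExists is acceptable:
// only a recoverable agent may reconnect, and only if its topology
// matches what the service already has on record.
bool AcceptReconnect(bool recoverable, const std::string& stored_topology,
                     const std::string& new_topology) {
  if (!recoverable) return false;          // existing behavior: reject
  return stored_topology == new_topology;  // benign duplicate register
}
```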

PiperOrigin-RevId: 716748621
…ets.

This is a step towards encouraging external projects to migrate to the already
migrated hlo sub-components.

PiperOrigin-RevId: 716762154
… reshape ops

Allow a single bounded dynamic dimension. This is likely a short-term fix, as bounded dynamism as a whole likely needs a lot of thought, but this solution with a single bounded dim is unambiguous, so it should be safe.

PiperOrigin-RevId: 716763010
Updates LLVM usage to match
[bf17016a92bc](llvm/llvm-project@bf17016a92bc)

PiperOrigin-RevId: 716773763
The attribute should be named `channel_handle`, not `channel_id`.

PiperOrigin-RevId: 716822019
…on host memory as host compute. This, of course, excludes DynamicUpdateSlices which are used for host offloading DMAs.

PiperOrigin-RevId: 716839236
PiperOrigin-RevId: 716844952
PiperOrigin-RevId: 716860231
PiperOrigin-RevId: 716914254
PiperOrigin-RevId: 716914439
…size, not just the size of the newly added values.
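The pitfall being fixed is a common one with `reserve`-style APIs: they take the total final capacity, not an increment. A minimal sketch (illustrative, using `std::vector` rather than the actual container in the change):

```cpp
#include <cassert>
#include <vector>

// reserve() takes the total capacity. Passing only src.size() would be
// a no-op whenever dst already holds that many elements, defeating the
// point of reserving; we must pass the final total size.
void AppendAll(std::vector<int>& dst, const std::vector<int>& src) {
  dst.reserve(dst.size() + src.size());  // total size, not just src.size()
  dst.insert(dst.end(), src.begin(), src.end());
}
```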

PiperOrigin-RevId: 716915408
PiperOrigin-RevId: 716930131
PiperOrigin-RevId: 716939752
PiperOrigin-RevId: 716945103
PiperOrigin-RevId: 716955798
PiperOrigin-RevId: 716956415
PiperOrigin-RevId: 716958380