forked from tensorflow/tensorflow
[pull] master from tensorflow:master #169
Open
pull wants to merge 1,648 commits into 47-studio-org:master from tensorflow:master
Conversation
PiperOrigin-RevId: 715072690
PiperOrigin-RevId: 715080397
Imported from GitHub PR openxla/xla#21375

In later stages of optimization, there are instances of copy fusion on the parameter of the while body. With this, we need to allow inlining of fusions while getting the induction variable index; otherwise we cannot deduce the tuple index.

Copybara import of the project:

-- ae85690876a106c4d74715fed299779e29e8e641 by Shraiysh Vaishay <svaishay@nvidia.com>:

[ds-fusion] Get While loop analysis with copy fusion

In later stages of optimization, there are instances of copy fusion on the parameter of the while body. With this, we need to allow inlining of fusions while getting the induction variable index; otherwise we cannot deduce the tuple index.

Merging this change closes #21375

PiperOrigin-RevId: 715080653
PiperOrigin-RevId: 715084813
Optimize this by replacing the multiplication with advancing the pointer every iteration. Also avoid reloading depth/etc. from args every time.

After fixing the benchmark for depthwise conv and running it, the results are noisy but seem positive overall:

name old cpu/op new cpu/op delta
BM_ConvFloatDepthwiseFwdCPU1_conv0_float/real_time [32_112_112_3_8_24_3_3_1_2_cpu1 ] 33.4µs ±16% 34.7µs ±28% ~ (p=0.284 n=38+39)
BM_ConvFloatDepthwiseFwdCPU4_conv0_float/real_time [32_112_112_3_8_24_3_3_1_2_cpu4 ] 27.3µs ±57% 26.6µs ±52% ~ (p=0.556 n=40+40)
BM_ConvFloatDepthwiseFwdCPU1_conv1_float/real_time [32_112_112_64_1_64_3_3_1_2_cpu1] 35.6µs ±24% 36.3µs ±27% ~ (p=0.283 n=35+40)
BM_ConvFloatDepthwiseFwdCPU4_conv1_float/real_time [32_112_112_64_1_64_3_3_1_2_cpu4] 30.0µs ±27% 31.1µs ±33% ~ (p=0.377 n=36+34)
BM_ConvFloatDepthwiseFwdCPU1_conv2_float/real_time [32_56_56_128_1_128_3_3_1_2_cpu1] 32.8µs ±14% 33.1µs ±18% ~ (p=0.761 n=33+38)
BM_ConvFloatDepthwiseFwdCPU4_conv2_float/real_time [32_56_56_128_1_128_3_3_1_2_cpu4] 25.7µs ±57% 26.4µs ±55% ~ (p=0.609 n=40+40)
BM_ConvFloatDepthwiseFwdCPU1_conv3_float/real_time [32_56_56_128_1_128_3_3_2_2_cpu1] 32.2µs ±17% 31.7µs ±12% ~ (p=0.204 n=37+35)
BM_ConvFloatDepthwiseFwdCPU4_conv3_float/real_time [32_56_56_128_1_128_3_3_2_2_cpu4] 27.8µs ±32% 27.0µs ±24% ~ (p=0.341 n=34+39)
BM_ConvFloatDepthwiseFwdCPU1_conv4_float/real_time [32_28_28_128_1_128_3_3_1_2_cpu1] 32.1µs ±13% 31.9µs ±12% ~ (p=0.470 n=39+36)
BM_ConvFloatDepthwiseFwdCPU4_conv4_float/real_time [32_28_28_128_1_128_3_3_1_2_cpu4] 26.2µs ±30% 25.5µs ±44% ~ (p=0.677 n=38+37)
BM_ConvFloatDepthwiseFwdCPU1_conv5_float/real_time [32_14_14_512_1_512_3_3_1_2_cpu1] 31.5µs ±18% 31.7µs ±17% ~ (p=0.742 n=38+39)
BM_ConvFloatDepthwiseFwdCPU4_conv5_float/real_time [32_14_14_512_1_512_3_3_1_2_cpu4] 28.5µs ±28% 27.3µs ±29% ~ (p=0.208 n=35+37)
BM_ConvFloatDepthwiseFwdCPU1_conv6_float/real_time [32_7_7_1024_1_1024_3_3_1_2_cpu1] 29.3µs ±16% 28.9µs ±21% ~ (p=0.334 n=39+31)
BM_ConvFloatDepthwiseFwdCPU4_conv6_float/real_time [32_7_7_1024_1_1024_3_3_1_2_cpu4] 8.35µs ±62% 7.08µs ±46% -15.24% (p=0.026 n=40+37)
BM_ConvFloatDepthwiseFwdCPU1_conv7_float/real_time [32_112_112_3_8_24_3_3_2_2_cpu1 ] 31.2µs ±17% 31.4µs ±22% ~ (p=0.987 n=35+38)
BM_ConvFloatDepthwiseFwdCPU4_conv7_float/real_time [32_112_112_3_8_24_3_3_2_2_cpu4 ] 25.9µs ±45% 26.5µs ±32% ~ (p=0.859 n=39+38)
BM_ConvFloatDepthwiseFwdCPU1_conv8_float/real_time [32_112_112_3_8_24_3_3_2_1_cpu1 ] 30.0µs ±16% 30.5µs ±18% ~ (p=0.228 n=34+33)
BM_ConvFloatDepthwiseFwdCPU4_conv8_float/real_time [32_112_112_3_8_24_3_3_2_1_cpu4 ] 26.2µs ±41% 24.4µs ±53% ~ (p=0.288 n=36+40)
BM_ConvFloatDepthwiseFwdCPU1_conv9_float/real_time [1_100_100_72_1_72_3_3_1_2_cpu1 ] 26.5µs ±16% 25.6µs ±15% ~ (p=0.051 n=34+37)
BM_ConvFloatDepthwiseFwdCPU4_conv9_float/real_time [1_100_100_72_1_72_3_3_1_2_cpu4 ] 6.33µs ±37% 5.60µs ±36% -11.46% (p=0.011 n=40+35)
BM_ConvFloatDepthwiseFwdCPU1_conv10_float/real_time [1_100_100_72_1_72_5_5_1_2_cpu1 ] 26.4µs ±13% 27.8µs ±20% ~ (p=0.140 n=33+40)
BM_ConvFloatDepthwiseFwdCPU4_conv10_float/real_time [1_100_100_72_1_72_5_5_1_2_cpu4 ] 14.6µs ±79% 9.2µs ±90% -36.75% (p=0.000 n=40+40)

PiperOrigin-RevId: 715085497
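The optimization the commit above describes is classic strength reduction: instead of recomputing a flat offset as `base + i * stride` (a multiply per iteration), the loop advances a running offset by `stride` each iteration. A minimal sketch of the idea, with illustrative names that are not from the TensorFlow source:

```python
# Sketch of strength reduction: replace a per-iteration multiply with a
# per-iteration add by carrying the offset across iterations.

def offsets_with_multiply(base, stride, n):
    # One multiplication per iteration.
    return [base + i * stride for i in range(n)]

def offsets_with_pointer_advance(base, stride, n):
    # One addition per iteration; offset plays the role of the advancing pointer.
    out = []
    offset = base
    for _ in range(n):
        out.append(offset)
        offset += stride
    return out

# Both formulations visit the same addresses.
assert offsets_with_multiply(100, 8, 5) == offsets_with_pointer_advance(100, 8, 5)
```

In compiled code the same transformation also frees the loop body from reloading loop-invariant values (like the depth mentioned above) on every iteration.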
This function is the struct size checker that's used only on the plugin side (and is only valid there, since it checks whether the struct is greater than or equal). Make the error text it generates state clearly that the plugin version is later than the framework's and is therefore an unsupported combination. PiperOrigin-RevId: 715086761
…les. PiperOrigin-RevId: 715098005
…ns when breaking circular cp for better traceability. PiperOrigin-RevId: 715104228
PiperOrigin-RevId: 715108619
…is CL makes two minor tweaks to the `BasicStringArray` class (the string array implementation in the PjRt-IFRT backend): (1) `CopyToHostBuffer` now supports the host buffer semantics of `kImmutableUntilTransferCompletes`. (2) `FullyReplicated` now works with `ConcreteSharding`. PiperOrigin-RevId: 715110567
PiperOrigin-RevId: 715112095
…duce more of its member functions to the C API. PiperOrigin-RevId: 715113720
…ting during instruction fusion. The computations are not being sorted in a semantically meaningful order; they are sorted by instruction count with ties being broken consistently but arbitrarily (based on a hash of the string representation of the computation). There is therefore no reason why these passes need to traverse the computations in this specific order. PiperOrigin-RevId: 715122377
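The ordering described above (sort by instruction count, break ties consistently but arbitrarily via a hash of the computation's string representation) can be sketched as follows; `Computation` and `sort_key` are illustrative stand-ins, not the real XLA classes:

```python
import hashlib

class Computation:
    """Illustrative stand-in for an HLO computation."""
    def __init__(self, name, instruction_count, text):
        self.name = name
        self.instruction_count = instruction_count
        self.text = text  # string representation of the computation

def sort_key(comp):
    # Primary key: instruction count. Tie-break: a content hash, which is
    # deterministic across runs but carries no semantic meaning.
    tie_break = hashlib.sha256(comp.text.encode()).hexdigest()
    return (comp.instruction_count, tie_break)

comps = [Computation("a", 3, "add"),
         Computation("b", 1, "mul"),
         Computation("c", 3, "sub")]
ordered = sorted(comps, key=sort_key)
assert ordered[0].name == "b"  # smallest instruction count first
```

Because the tie-break is arbitrary, nothing downstream should depend on the relative order of equally sized computations, which is exactly why the pass need not traverse them in this order.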
PiperOrigin-RevId: 715131342
…ot_handler.cc. This function creates a copy for the HloInstruction in the given PartitionedHlo and returns a new PartitionedHlo for the copy. This can be reused by other operators (like gather/scatter). PiperOrigin-RevId: 715138672
PiperOrigin-RevId: 715140967
PiperOrigin-RevId: 715153851
PiperOrigin-RevId: 715158015
Breaking internal tests. Reverts 5e78ccd. PiperOrigin-RevId: 715160248
…stom op. Attributes will determine behavior PiperOrigin-RevId: 715161504
PiperOrigin-RevId: 715162849
Only bitcast requires the layout to be known when evaluating HLO; in all other cases, we can evaluate without knowing the layout. This is needed for collective pipelining, where we have to analyse while loops before layouts have been assigned. PiperOrigin-RevId: 715163612
A gather has two operands, input and indices. If they point to the same instruction, create a copy for indices.

A scatter has n inputs, 1 indices operand, and n updates (2n+1 operands in total). We allow overlap among the n inputs, and we allow overlap among the n updates. We need to create a copy if:
- indices overlap with any input or update
- an update overlaps with any input

The added copy will be removed if it is redundant by the following memory-related passes (e.g., CopyInsertion). PiperOrigin-RevId: 715164959
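The scatter rule above can be sketched as a small predicate. This is a simplified model, assuming operands are plain identifiers and "overlap" means two operands point to the same instruction; names are illustrative, not from the XLA source:

```python
def scatter_needs_copy(inputs, indices, updates):
    """Return True if a copy must be inserted for a scatter's operands.

    inputs/updates are lists of operand ids; indices is a single operand id.
    Overlap among inputs, and among updates, is allowed.
    """
    if indices in inputs or indices in updates:
        return True  # indices overlap an input or an update
    if any(u in inputs for u in updates):
        return True  # an update overlaps an input
    return False

assert scatter_needs_copy(["p0", "p1"], "p0", ["u0"])       # indices alias an input
assert scatter_needs_copy(["p0"], "idx", ["p0"])            # an update aliases an input
assert not scatter_needs_copy(["p0", "p0"], "idx", ["u0", "u0"])  # allowed overlaps
```

Inserting a possibly redundant copy here is safe precisely because later passes such as CopyInsertion remove copies that turn out to be unnecessary.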
This CL takes care of:
1. Migrating external projects' dependencies from
```
tensorflow/compiler/xla:test
tensorflow/compiler/xla:test_helpers
tensorflow/compiler/xla/service:pattern_matcher_gmock
```
to `tensorflow/compiler/xla/hlo/testlib:*`

PiperOrigin-RevId: 715167258
…dleDotHelper`. `HandleDotHelper` is called once for a single dot operation, while `PartitionDot` can be called many times. We need to consider adding a copy only once. PiperOrigin-RevId: 715189518
Imported from GitHub PR openxla/xla#21104

The transpose folding pass doesn't preserve the backend config when creating the new dot with the transpose folded. Changing the behavior to copy the old dot's config to the new dot.

Copybara import of the project:

-- d2d6b628af1cab777a210e4ac62184e52fe9f4a9 by TJ Xu <tjx@nvidia.com>:

Preserve backend config when folding transpose

-- 6b5fa3a1cb70a790803e3ac57ff8329690e88e5e by TJ Xu <tjx@nvidia.com>:

use SetupDerivedInstruction instead of just copying the backend config

Merging this change closes #21104

PiperOrigin-RevId: 715204523
This method was renamed but the staging function was kept; switch to the renamed variant. PiperOrigin-RevId: 715208433
PiperOrigin-RevId: 715229132
…mmits/632bfc342d3a7d63ce8b21209355139ee070d392) PiperOrigin-RevId: 715243878
Add a "simple" scatter benchmark with a reduce combiner. PiperOrigin-RevId: 716654572
global_data.h is deprecated PiperOrigin-RevId: 716654755
These will soon be shared between CPU and GPU. PiperOrigin-RevId: 716666785
…ng as the rest. PiperOrigin-RevId: 716675474
PiperOrigin-RevId: 716675943
Updating:
- `env.h`
- `env_time.h`
- `errors.h`
- `file_statistics.h`
- `file_system.h`
- `file_system_helper.h`
- `logging.h`
- `macros.h`
- `status.h`
- `status_matchers.h`
- `status_to_from_proto.h`
- `statusor.h`
- `test.h`
- `test_benchmark.h`
- `threadpool.h`
- `threadpool_async_executor.h`
- `threadpool_interface.h`
- `threadpool_options.h`
- `types.h`

and associated targets. PiperOrigin-RevId: 716721921
PiperOrigin-RevId: 716740061
Add DutyCycleCombiner for handling intra- and inter-chip duty cycle aggregation. Fix DutyCycleTracker bugs with idleness and duplicate active times. PiperOrigin-RevId: 716741025
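A duty cycle is the fraction of a tracked span spent active; the "duplicate active times" bug class above suggests overlapping active intervals must be merged before summing, or active time gets double-counted. A minimal sketch under that assumption (the interval-merging logic is illustrative, not the real tracker):

```python
def merged_active_time(intervals):
    """Sum active time over (start, stop) intervals, merging overlaps."""
    total = 0
    end = None  # right edge of the region accounted for so far
    for start, stop in sorted(intervals):
        if end is None or start > end:
            total += stop - start   # disjoint interval: count fully
            end = stop
        elif stop > end:
            total += stop - end     # overlap: count only the new part
            end = stop
    return total

def duty_cycle(intervals, span):
    return merged_active_time(intervals) / span

# Overlapping [0, 4] and [2, 6] contribute 6 units of active time, not 8.
assert merged_active_time([(0, 4), (2, 6)]) == 6
assert duty_cycle([(0, 4), (2, 6)], 10) == 0.6
```

A combiner for inter-chip aggregation could then reduce per-chip duty cycles (e.g. averaging or taking the max), depending on what the profile is meant to show.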
…ting behavior unchanged. Only when coordination_agent_recoverable is set does it try to reconnect to the cluster, which can lead to an AlreadyExists error. In this case the AlreadyExists error can be handled by checking that the existing topology is the same as the new one. PiperOrigin-RevId: 716748621
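The reconnect handling described above can be sketched as follows; the function and error names are illustrative, not the real coordination-service API:

```python
def handle_connect(registered_topology, new_topology, already_exists):
    """Model of tolerating AlreadyExists on reconnect.

    A recoverable agent reconnecting will hit AlreadyExists; accept it only
    if the newly reported topology matches what was registered before.
    """
    if not already_exists:
        return "connected"
    if registered_topology == new_topology:
        return "reconnected"  # same task coming back: tolerate the error
    raise RuntimeError("AlreadyExists with a mismatched topology")

assert handle_connect(None, {"tasks": 4}, already_exists=False) == "connected"
assert handle_connect({"tasks": 4}, {"tasks": 4}, already_exists=True) == "reconnected"
```

Comparing topologies distinguishes a genuine restart of the same task from a misconfigured second task claiming the same identity.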
…ets. This is a step towards encouraging external projects to migrate to the already migrated hlo sub-components. PiperOrigin-RevId: 716762154
… reshape ops. Allow a single bounded dynamic dimension. This is likely a short-term fix, as bounded dynamism as a whole likely needs a lot of thought, but this solution with a single bounded dim is unambiguous so should be safe. PiperOrigin-RevId: 716763010
PiperOrigin-RevId: 716763354
Updates LLVM usage to match [bf17016a92bc](llvm/llvm-project@bf17016a92bc) PiperOrigin-RevId: 716773763
PiperOrigin-RevId: 716780241
The attribute should be named `channel_handle`, not `channel_id`. PiperOrigin-RevId: 716822019
…on host memory as host compute. This, of course, excludes DynamicUpdateSlices which are used for host offloading DMAs. PiperOrigin-RevId: 716839236
PiperOrigin-RevId: 716840170
PiperOrigin-RevId: 716844952
PiperOrigin-RevId: 716851959
PiperOrigin-RevId: 716860231
PiperOrigin-RevId: 716914254
PiperOrigin-RevId: 716914439
…size, not just the size of the newly added values. PiperOrigin-RevId: 716915408
PiperOrigin-RevId: 716930131
PiperOrigin-RevId: 716939298
PiperOrigin-RevId: 716939752
PiperOrigin-RevId: 716945103
PiperOrigin-RevId: 716955798
PiperOrigin-RevId: 716956415
PiperOrigin-RevId: 716958380
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.1)