Mainline development for mt6735 #1
Comments
In commit 510410b ("drm/msm: Implement mmap as GEM object function") we switched to a new/cleaner method of doing things. That's good, but we missed a little bit. Before that commit, we used to _first_ run through the drm_gem_mmap_obj() case where `obj->funcs->mmap()` was NULL. That meant that we ran: vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP; vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags)); vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot); ...and _then_ we modified those mappings with our own. Now that `obj->funcs->mmap()` is no longer NULL we don't run the default code. It looks like the fact that the vm_flags got VM_IO / VM_DONTDUMP was important because we're now getting crashes on Chromebooks that use ARC++ while logging out. Specifically a crash that looks like this (this is on a 5.10 kernel w/ relevant backports but also seen on a 5.15 kernel): Unable to handle kernel paging request at virtual address ffffffc008000000 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=000000008293d000 [ffffffc008000000] pgd=00000001002b3003, p4d=00000001002b3003, pud=00000001002b3003, pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP [...] CPU: 7 PID: 15734 Comm: crash_dump64 Tainted: G W 5.10.67 #1 [...] Hardware name: Qualcomm Technologies, Inc. sc7280 IDP SKU2 platform (DT) pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--) pc : __arch_copy_to_user+0xc0/0x30c lr : copyout+0xac/0x14c [...] 
Call trace: __arch_copy_to_user+0xc0/0x30c copy_page_to_iter+0x1a0/0x294 process_vm_rw_core+0x240/0x408 process_vm_rw+0x110/0x16c __arm64_sys_process_vm_readv+0x30/0x3c el0_svc_common+0xf8/0x250 do_el0_svc+0x30/0x80 el0_svc+0x10/0x1c el0_sync_handler+0x78/0x108 el0_sync+0x184/0x1c0 Code: f8408423 f80008c3 910020c6 36100082 (b8404423) Let's add the two flags back in. While we're at it, the fact that we aren't running the default means that we _don't_ need to clear out VM_PFNMAP, so remove that and save an instruction. NOTE: it was confirmed that VM_IO was the important flag to fix the problem I was seeing, but adding back VM_DONTDUMP seems like a sane thing to do so I'm doing that too. Fixes: 510410b ("drm/msm: Implement mmap as GEM object function") Reported-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20211110113334.1.I1687e716adb2df746da58b508db3f25423c40b27@changeid Signed-off-by: Rob Clark <robdclark@chromium.org>
If you happened to try to access `/dev/drm_dp_aux` devices provided by the MSM DP AUX driver too early at bootup you could go boom. Let's avoid that by only allowing AUX transfers when the controller is powered up. Specifically the crash that was seen (on Chrome OS 5.4 tree with relevant backports): Kernel panic - not syncing: Asynchronous SError Interrupt CPU: 0 PID: 3131 Comm: fwupd Not tainted 5.4.144-16620-g28af11b73efb #1 Hardware name: Google Lazor (rev3+) with KB Backlight (DT) Call trace: dump_backtrace+0x0/0x14c show_stack+0x20/0x2c dump_stack+0xac/0x124 panic+0x150/0x390 nmi_panic+0x80/0x94 arm64_serror_panic+0x78/0x84 do_serror+0x0/0x118 do_serror+0xa4/0x118 el1_error+0xbc/0x160 dp_catalog_aux_write_data+0x1c/0x3c dp_aux_cmd_fifo_tx+0xf0/0x1b0 dp_aux_transfer+0x1b0/0x2bc drm_dp_dpcd_access+0x8c/0x11c drm_dp_dpcd_read+0x64/0x10c auxdev_read_iter+0xd4/0x1c4 I did a little bit of tracing and found that: * We register the AUX device very early at bootup. * Power isn't actually turned on for my system until hpd_event_thread() -> dp_display_host_init() -> dp_power_init() * You can see that dp_power_init() calls dp_aux_init() which is where we start allowing AUX channel requests to go through. In general this patch is a bit of a bandaid but at least it gets us out of the current state where userspace acting at the wrong time can fully crash the system. * I think the more proper fix (which requires quite a bit more changes) is to power stuff on while an AUX transfer is happening. This is like the solution we did for ti-sn65dsi86. This might be required for us to move to populating the panel via the DP-AUX bus. * Another fix considered was to dynamically register / unregister. I tried that at <https://crrev.com/c/3169431/3> but it got ugly. Currently there's a bug where the pm_runtime() state isn't tracked properly and that causes us to just keep registering more and more. 
Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Kuogee Hsieh <quic_khsieh@quicinc.com> Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com> Link: https://lore.kernel.org/r/20211109100403.1.I4e23470d681f7efe37e2e7f1a6466e15e9bb1d72@changeid Signed-off-by: Rob Clark <robdclark@chromium.org>
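The guard described in the commit message above (refuse AUX transfers until the controller is powered up) can be modelled in userspace. This is only a minimal sketch, not the driver's code: `struct aux_model` and the `aux_model_*()` functions are hypothetical names, the fake register file stands in for real hardware, and returning `-EIO` is an assumption about the error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the driver's private AUX state. */
struct aux_model {
	bool initted;            /* true only while the controller is powered */
	unsigned char regs[16];  /* fake register file instead of real MMIO */
};

/* Modelled power-on path: from here on transfers are allowed. */
static void aux_model_init(struct aux_model *aux)
{
	assert(aux != NULL);
	memset(aux->regs, 0, sizeof(aux->regs));
	aux->initted = true;
}

/* Modelled power-off path: transfers are refused again. */
static void aux_model_deinit(struct aux_model *aux)
{
	assert(aux != NULL);
	aux->initted = false;
}

/*
 * Read len bytes starting at offset.
 * Returns the number of bytes read, or a negative errno value.
 */
static int aux_model_transfer(struct aux_model *aux, size_t offset,
			      unsigned char *buf, size_t len)
{
	assert(aux != NULL && buf != NULL);

	/* The important part: never touch "hardware" that is powered off. */
	if (!aux->initted)
		return -EIO; /* assumption: the real driver may use another code */

	if (offset > sizeof(aux->regs) || len > sizeof(aux->regs) - offset)
		return -EINVAL;

	memcpy(buf, aux->regs + offset, len);
	return (int)len;
}
```

With this shape, a userspace read that arrives too early gets an error back instead of reaching unpowered registers.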
Fix the following NULL pointer dereference in mt7915_get_phy_mode routine adding an ibss interface to the mt7915 driver. [ 101.137097] wlan0: Trigger new scan to find an IBSS to join [ 102.827039] wlan0: Creating new IBSS network, BSSID 26:a4:50:1a:6e:69 [ 103.064756] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 103.073670] Mem abort info: [ 103.076520] ESR = 0x96000005 [ 103.079614] EC = 0x25: DABT (current EL), IL = 32 bits [ 103.084934] SET = 0, FnV = 0 [ 103.088042] EA = 0, S1PTW = 0 [ 103.091215] Data abort info: [ 103.094104] ISV = 0, ISS = 0x00000005 [ 103.098041] CM = 0, WnR = 0 [ 103.101044] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000460b1000 [ 103.107565] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 [ 103.116590] Internal error: Oops: 96000005 [#1] SMP [ 103.189066] CPU: 1 PID: 333 Comm: kworker/u4:3 Not tainted 5.10.75 #0 [ 103.195498] Hardware name: MediaTek MT7622 RFB1 board (DT) [ 103.201124] Workqueue: phy0 ieee80211_iface_work [mac80211] [ 103.206695] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--) [ 103.212705] pc : mt7915_get_phy_mode+0x68/0x120 [mt7915e] [ 103.218103] lr : mt7915_mcu_add_bss_info+0x11c/0x760 [mt7915e] [ 103.223927] sp : ffffffc011cdb9e0 [ 103.227235] x29: ffffffc011cdb9e0 x28: ffffff8006563098 [ 103.232545] x27: ffffff8005f4da22 x26: ffffff800685ac40 [ 103.237855] x25: 0000000000000001 x24: 000000000000011f [ 103.243165] x23: ffffff8005f4e260 x22: ffffff8006567918 [ 103.248475] x21: ffffff8005f4df80 x20: ffffff800685ac58 [ 103.253785] x19: ffffff8006744400 x18: 0000000000000000 [ 103.259094] x17: 0000000000000000 x16: 0000000000000001 [ 103.264403] x15: 000899c3a2d9d2e4 x14: 000899bdc3c3a1c8 [ 103.269713] x13: 0000000000000000 x12: 0000000000000000 [ 103.275024] x11: ffffffc010e30c20 x10: 0000000000000000 [ 103.280333] x9 : 0000000000000050 x8 : ffffff8006567d88 [ 103.285642] x7 : ffffff8006563b5c x6 : ffffff8006563b44 [ 103.290952] x5 : 
0000000000000002 x4 : 0000000000000001 [ 103.296262] x3 : 0000000000000001 x2 : 0000000000000001 [ 103.301572] x1 : 0000000000000000 x0 : 0000000000000011 [ 103.306882] Call trace: [ 103.309328] mt7915_get_phy_mode+0x68/0x120 [mt7915e] [ 103.314378] mt7915_bss_info_changed+0x198/0x200 [mt7915e] [ 103.319941] ieee80211_bss_info_change_notify+0x128/0x290 [mac80211] [ 103.326360] __ieee80211_sta_join_ibss+0x308/0x6c4 [mac80211] [ 103.332171] ieee80211_sta_create_ibss+0x8c/0x10c [mac80211] [ 103.337895] ieee80211_ibss_work+0x3dc/0x614 [mac80211] [ 103.343185] ieee80211_iface_work+0x388/0x3f0 [mac80211] [ 103.348495] process_one_work+0x288/0x690 [ 103.352499] worker_thread+0x70/0x464 [ 103.356157] kthread+0x144/0x150 [ 103.359380] ret_from_fork+0x10/0x18 [ 103.362952] Code: 394008c3 52800220 394000e4 7100007f (39400023) Fixes: 37f4ca9 ("mt76: mt7915: register per-phy HE capabilities for each interface") Fixes: e57b790 ("mt76: add mac80211 driver for MT7915 PCIe-based chipsets") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/ddae419a740f1fb9e48afd432035e9f394f512ee.1637239456.git.lorenzo@kernel.org
into HEAD

KVM/riscv fixes for 5.16, take #1

- Fix incorrect KVM_MAX_VCPUS value
- Unmap stage2 mapping when deleting/moving a memslot (This was due to empty kvm_arch_flush_shadow_memslot())
When the `rmmod sata_fsl.ko` command is executed in the PPC64 GNU/Linux, a bug is reported: ================================================================== BUG: Unable to handle kernel data access on read at 0x80000800805b502c Oops: Kernel access of bad area, sig: 11 [#1] NIP [c0000000000388a4] .ioread32+0x4/0x20 LR [80000000000c6034] .sata_fsl_port_stop+0x44/0xe0 [sata_fsl] Call Trace: .free_irq+0x1c/0x4e0 (unreliable) .ata_host_stop+0x74/0xd0 [libata] .release_nodes+0x330/0x3f0 .device_release_driver_internal+0x178/0x2c0 .driver_detach+0x64/0xd0 .bus_remove_driver+0x70/0xf0 .driver_unregister+0x38/0x80 .platform_driver_unregister+0x14/0x30 .fsl_sata_driver_exit+0x18/0xa20 [sata_fsl] .__se_sys_delete_module+0x1ec/0x2d0 .system_call_exception+0xfc/0x1f0 system_call_common+0xf8/0x200 ================================================================== The triggering of the BUG is shown in the following stack: driver_detach device_release_driver_internal __device_release_driver drv->remove(dev) --> platform_drv_remove/platform_remove drv->remove(dev) --> sata_fsl_remove iounmap(host_priv->hcr_base); <---- unmap kfree(host_priv); <---- free devres_release_all release_nodes dr->node.release(dev, dr->data) --> ata_host_stop ap->ops->port_stop(ap) --> sata_fsl_port_stop ioread32(hcr_base + HCONTROL) <---- UAF host->ops->host_stop(host) The iounmap(host_priv->hcr_base) and kfree(host_priv) functions should not be executed in drv->remove. These functions should be executed in host_stop after port_stop. Therefore, we move these functions to the new function sata_fsl_host_stop and bind the new function to host_stop. Fixes: faf0b2e ("drivers/ata: add support to Freescale 3.0Gbps SATA Controller") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
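The ordering problem in the sata_fsl commit message above can be sketched in plain C. Everything here is a hypothetical userspace model (the `*_model` types and `model_*()` functions are not libata APIs): `remove` no longer frees the private data, and the free happens in `host_stop`, which the modelled release path calls only after `port_stop` has done its last register read.

```c
#include <assert.h>
#include <stdlib.h>

struct host_model;

/* Hypothetical subset of the host operations involved in teardown. */
struct host_ops_model {
	void (*port_stop)(struct host_model *host);
	void (*host_stop)(struct host_model *host);
};

/* Stands in for host_priv; hcontrol models a register behind hcr_base. */
struct host_priv_model {
	unsigned int hcontrol;
};

struct host_model {
	const struct host_ops_model *ops;
	struct host_priv_model *priv;
	unsigned int last_read;  /* value seen by port_stop */
	int port_stopped;
};

/* Still needs priv: this is the read that used to hit freed memory. */
static void model_port_stop(struct host_model *host)
{
	assert(host->priv != NULL);
	host->last_read = host->priv->hcontrol;
	host->port_stopped = 1;
}

/* New home for the cleanup that used to live in remove(). */
static void model_host_stop(struct host_model *host)
{
	free(host->priv);
	host->priv = NULL;
}

static const struct host_ops_model model_ops = {
	.port_stop = model_port_stop,
	.host_stop = model_host_stop,
};

static int model_probe(struct host_model *host)
{
	host->priv = malloc(sizeof(*host->priv));
	if (!host->priv)
		return -1;
	host->priv->hcontrol = 0x5a;
	host->ops = &model_ops;
	host->last_read = 0;
	host->port_stopped = 0;
	return 0;
}

/* After the fix, remove() leaves priv alone. */
static void model_remove(struct host_model *host)
{
	(void)host;
}

/* Models the managed-resource release: port_stop first, then host_stop. */
static void model_release(struct host_model *host)
{
	host->ops->port_stop(host);
	host->ops->host_stop(host);
}
```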
Adding a check on len parameter to avoid empty skb. This prevents a division error in netem_enqueue function which is caused when skb->len=0 and skb->data_len=0 in the randomized corruption step as shown below. skb->data[prandom_u32() % skb_headlen(skb)] ^= 1<<(prandom_u32() % 8); Crash Report: [ 343.170349] netdevsim netdevsim0 netdevsim3: set [1, 0] type 2 family 0 port 6081 - 0 [ 343.216110] netem: version 1.3 [ 343.235841] divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 343.236680] CPU: 3 PID: 4288 Comm: reproducer Not tainted 5.16.0-rc1+ [ 343.237569] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 [ 343.238707] RIP: 0010:netem_enqueue+0x1590/0x33c0 [sch_netem] [ 343.239499] Code: 89 85 58 ff ff ff e8 5f 5d e9 d3 48 8b b5 48 ff ff ff 8b 8d 50 ff ff ff 8b 85 58 ff ff ff 48 8b bd 70 ff ff ff 31 d2 2b 4f 74 <f7> f1 48 b8 00 00 00 00 00 fc ff df 49 01 d5 4c 89 e9 48 c1 e9 03 [ 343.241883] RSP: 0018:ffff88800bcd7368 EFLAGS: 00010246 [ 343.242589] RAX: 00000000ba7c0a9c RBX: 0000000000000001 RCX: 0000000000000000 [ 343.243542] RDX: 0000000000000000 RSI: ffff88800f8edb10 RDI: ffff88800f8eda40 [ 343.244474] RBP: ffff88800bcd7458 R08: 0000000000000000 R09: ffffffff94fb8445 [ 343.245403] R10: ffffffff94fb8336 R11: ffffffff94fb8445 R12: 0000000000000000 [ 343.246355] R13: ffff88800a5a7000 R14: ffff88800a5b5800 R15: 0000000000000020 [ 343.247291] FS: 00007fdde2bd7700(0000) GS:ffff888109780000(0000) knlGS:0000000000000000 [ 343.248350] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 343.249120] CR2: 00000000200000c0 CR3: 000000000ef4c000 CR4: 00000000000006e0 [ 343.250076] Call Trace: [ 343.250423] <TASK> [ 343.250713] ? memcpy+0x4d/0x60 [ 343.251162] ? netem_init+0xa0/0xa0 [sch_netem] [ 343.251795] ? __sanitizer_cov_trace_pc+0x21/0x60 [ 343.252443] netem_enqueue+0xe28/0x33c0 [sch_netem] [ 343.253102] ? stack_trace_save+0x87/0xb0 [ 343.253655] ? filter_irq_stacks+0xb0/0xb0 [ 343.254220] ? 
netem_init+0xa0/0xa0 [sch_netem] [ 343.254837] ? __kasan_check_write+0x14/0x20 [ 343.255418] ? _raw_spin_lock+0x88/0xd6 [ 343.255953] dev_qdisc_enqueue+0x50/0x180 [ 343.256508] __dev_queue_xmit+0x1a7e/0x3090 [ 343.257083] ? netdev_core_pick_tx+0x300/0x300 [ 343.257690] ? check_kcov_mode+0x10/0x40 [ 343.258219] ? _raw_spin_unlock_irqrestore+0x29/0x40 [ 343.258899] ? __kasan_init_slab_obj+0x24/0x30 [ 343.259529] ? setup_object.isra.71+0x23/0x90 [ 343.260121] ? new_slab+0x26e/0x4b0 [ 343.260609] ? kasan_poison+0x3a/0x50 [ 343.261118] ? kasan_unpoison+0x28/0x50 [ 343.261637] ? __kasan_slab_alloc+0x71/0x90 [ 343.262214] ? memcpy+0x4d/0x60 [ 343.262674] ? write_comp_data+0x2f/0x90 [ 343.263209] ? __kasan_check_write+0x14/0x20 [ 343.263802] ? __skb_clone+0x5d6/0x840 [ 343.264329] ? __sanitizer_cov_trace_pc+0x21/0x60 [ 343.264958] dev_queue_xmit+0x1c/0x20 [ 343.265470] netlink_deliver_tap+0x652/0x9c0 [ 343.266067] netlink_unicast+0x5a0/0x7f0 [ 343.266608] ? netlink_attachskb+0x860/0x860 [ 343.267183] ? __sanitizer_cov_trace_pc+0x21/0x60 [ 343.267820] ? write_comp_data+0x2f/0x90 [ 343.268367] netlink_sendmsg+0x922/0xe80 [ 343.268899] ? netlink_unicast+0x7f0/0x7f0 [ 343.269472] ? __sanitizer_cov_trace_pc+0x21/0x60 [ 343.270099] ? write_comp_data+0x2f/0x90 [ 343.270644] ? netlink_unicast+0x7f0/0x7f0 [ 343.271210] sock_sendmsg+0x155/0x190 [ 343.271721] ____sys_sendmsg+0x75f/0x8f0 [ 343.272262] ? kernel_sendmsg+0x60/0x60 [ 343.272788] ? write_comp_data+0x2f/0x90 [ 343.273332] ? write_comp_data+0x2f/0x90 [ 343.273869] ___sys_sendmsg+0x10f/0x190 [ 343.274405] ? sendmsg_copy_msghdr+0x80/0x80 [ 343.274984] ? slab_post_alloc_hook+0x70/0x230 [ 343.275597] ? futex_wait_setup+0x240/0x240 [ 343.276175] ? security_file_alloc+0x3e/0x170 [ 343.276779] ? write_comp_data+0x2f/0x90 [ 343.277313] ? __sanitizer_cov_trace_pc+0x21/0x60 [ 343.277969] ? write_comp_data+0x2f/0x90 [ 343.278515] ? __fget_files+0x1ad/0x260 [ 343.279048] ? __sanitizer_cov_trace_pc+0x21/0x60 [ 343.279685] ? 
write_comp_data+0x2f/0x90 [ 343.280234] ? __sanitizer_cov_trace_pc+0x21/0x60 [ 343.280874] ? sockfd_lookup_light+0xd1/0x190 [ 343.281481] __sys_sendmsg+0x118/0x200 [ 343.281998] ? __sys_sendmsg_sock+0x40/0x40 [ 343.282578] ? alloc_fd+0x229/0x5e0 [ 343.283070] ? write_comp_data+0x2f/0x90 [ 343.283610] ? write_comp_data+0x2f/0x90 [ 343.284135] ? __sanitizer_cov_trace_pc+0x21/0x60 [ 343.284776] ? ktime_get_coarse_real_ts64+0xb8/0xf0 [ 343.285450] __x64_sys_sendmsg+0x7d/0xc0 [ 343.285981] ? syscall_enter_from_user_mode+0x4d/0x70 [ 343.286664] do_syscall_64+0x3a/0x80 [ 343.287158] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 343.287850] RIP: 0033:0x7fdde24cf289 [ 343.288344] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 db 2c 00 f7 d8 64 89 01 48 [ 343.290729] RSP: 002b:00007fdde2bd6d98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 343.291730] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdde24cf289 [ 343.292673] RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000004 [ 343.293618] RBP: 00007fdde2bd6e20 R08: 0000000100000001 R09: 0000000000000000 [ 343.294557] R10: 0000000100000001 R11: 0000000000000246 R12: 0000000000000000 [ 343.295493] R13: 0000000000021000 R14: 0000000000000000 R15: 00007fdde2bd7700 [ 343.296432] </TASK> [ 343.296735] Modules linked in: sch_netem ip6_vti ip_vti ip_gre ipip sit ip_tunnel geneve macsec macvtap tap ipvlan macvlan 8021q garp mrp hsr wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic curve25519_x86_64 libcurve25519_generic libchacha xfrm_interface xfrm6_tunnel tunnel4 veth netdevsim psample batman_adv nlmon dummy team bonding tls vcan ip6_gre ip6_tunnel tunnel6 gre tun ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set ebtable_nat ebtable_broute ip6table_nat ip6table_mangle 
ip6table_security ip6table_raw iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_security iptable_raw ebtable_filter ebtables rfkill ip6table_filter ip6_tables iptable_filter ppdev bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec parport_pc drm joydev floppy parport sg syscopyarea sysfillrect sysimgblt i2c_piix4 qemu_fw_cfg fb_sys_fops pcspkr [ 343.297459] ip_tables xfs virtio_net net_failover failover sd_mod sr_mod cdrom t10_pi ata_generic pata_acpi ata_piix libata virtio_pci virtio_pci_legacy_dev serio_raw virtio_pci_modern_dev dm_mirror dm_region_hash dm_log dm_mod [ 343.311074] Dumping ftrace buffer: [ 343.311532] (ftrace buffer empty) [ 343.312040] ---[ end trace a2e3db5a6ae05099 ]--- [ 343.312691] RIP: 0010:netem_enqueue+0x1590/0x33c0 [sch_netem] [ 343.313481] Code: 89 85 58 ff ff ff e8 5f 5d e9 d3 48 8b b5 48 ff ff ff 8b 8d 50 ff ff ff 8b 85 58 ff ff ff 48 8b bd 70 ff ff ff 31 d2 2b 4f 74 <f7> f1 48 b8 00 00 00 00 00 fc ff df 49 01 d5 4c 89 e9 48 c1 e9 03 [ 343.315893] RSP: 0018:ffff88800bcd7368 EFLAGS: 00010246 [ 343.316622] RAX: 00000000ba7c0a9c RBX: 0000000000000001 RCX: 0000000000000000 [ 343.317585] RDX: 0000000000000000 RSI: ffff88800f8edb10 RDI: ffff88800f8eda40 [ 343.318549] RBP: ffff88800bcd7458 R08: 0000000000000000 R09: ffffffff94fb8445 [ 343.319503] R10: ffffffff94fb8336 R11: ffffffff94fb8445 R12: 0000000000000000 [ 343.320455] R13: ffff88800a5a7000 R14: ffff88800a5b5800 R15: 0000000000000020 [ 343.321414] FS: 00007fdde2bd7700(0000) GS:ffff888109780000(0000) knlGS:0000000000000000 [ 343.322489] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 343.323283] CR2: 00000000200000c0 CR3: 000000000ef4c000 CR4: 00000000000006e0 [ 343.324264] Kernel panic - not syncing: Fatal exception in interrupt [ 343.333717] Dumping ftrace buffer: [ 343.334175] (ftrace buffer empty) [ 343.334653] Kernel Offset: 0x13600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 343.336027] 
Rebooting in 86400 seconds.. Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Link: https://lore.kernel.org/r/20211129175328.55339-1-harshit.m.mogalapalli@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
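To see why an empty buffer faults in the corruption step quoted above, here is a small userspace sketch. The commit itself adds its check on `len` before the skb is built; this sketch instead places the guard directly next to the modulo, purely for illustration. `corrupt_one_bit()` is a hypothetical helper and `rand()` stands in for `prandom_u32()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Flip one randomly chosen bit in data[0..headlen).
 * Returns false and leaves the buffer untouched when headlen is 0,
 * because "x % 0" is a division by zero.
 */
static bool corrupt_one_bit(unsigned char *data, size_t headlen)
{
	assert(data != NULL);

	if (headlen == 0)
		return false;

	/* Same shape as the netem expression, with rand() as the RNG. */
	data[(size_t)rand() % headlen] ^= (unsigned char)(1u << (rand() % 8));
	return true;
}
```

Calling it with a length of 0 is the case the crash report corresponds to: without the early return, the modulo would trap just like the `divide error` in the log.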
Driver needs to nullify the port select attributes of the LAG when port selection is destroyed, otherwise it breaks recreation of the LAG. It fixes the below kernel oops: [ 587.906377] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 587.908843] #PF: supervisor read access in kernel mode [ 587.910730] #PF: error_code(0x0000) - not-present page [ 587.912580] PGD 0 P4D 0 [ 587.913632] Oops: 0000 [#1] SMP PTI [ 587.914644] CPU: 5 PID: 165 Comm: kworker/u20:5 Tainted: G OE 5.9.0_mlnx #1 [ 587.916152] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 587.918332] Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core] [ 587.919479] RIP: 0010:mlx5_del_flow_rules+0x10/0x270 [mlx5_core] [ 587.920568] mlx5_core 0000:08:00.1 enp8s0f1: Link up [ 587.920680] Code: c0 09 80 a0 e8 cf 42 a4 e0 48 c7 c3 f4 ff ff ff e8 8a 88 dd e0 e9 ab fe ff ff 0f 1f 44 00 00 41 56 41 55 49 89 fd 41 54 55 53 <48> 8b 47 08 48 8b 68 28 48 85 ed 74 2e 48 8d 7d 38 e8 6a 64 34 e1 [ 587.925116] bond0: (slave enp8s0f1): Enslaving as an active interface with an up link [ 587.930415] RSP: 0018:ffffc9000048fd88 EFLAGS: 00010282 [ 587.930417] RAX: ffff88846c14fac0 RBX: ffff88846cddcb80 RCX: 0000000080400007 [ 587.930417] RDX: 0000000080400008 RSI: ffff88846cddcb80 RDI: 0000000000000000 [ 587.930419] RBP: ffff88845fd80140 R08: 0000000000000001 R09: ffffffffa074ba00 [ 587.938132] R10: ffff88846c14fec0 R11: 0000000000000001 R12: ffff88846c122f10 [ 587.939473] R13: 0000000000000000 R14: 0000000000000001 R15: ffff88846d7a0000 [ 587.940800] FS: 0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000 [ 587.942416] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 587.943536] CR2: 0000000000000008 CR3: 000000000240a002 CR4: 0000000000770ee0 [ 587.944904] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 587.946308] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 587.947639] PKRU: 55555554 
[ 587.948236] Call Trace: [ 587.948834] mlx5_lag_destroy_definer.isra.3+0x16/0x90 [mlx5_core] [ 587.950033] mlx5_lag_destroy_definers+0x5b/0x80 [mlx5_core] [ 587.951128] mlx5_deactivate_lag+0x6e/0x80 [mlx5_core] [ 587.952146] mlx5_do_bond+0x150/0x450 [mlx5_core] [ 587.953086] mlx5_do_bond_work+0x3e/0x50 [mlx5_core] [ 587.954086] process_one_work+0x1eb/0x3e0 [ 587.954899] worker_thread+0x2d/0x3c0 [ 587.955656] ? process_one_work+0x3e0/0x3e0 [ 587.956493] kthread+0x115/0x130 [ 587.957174] ? kthread_park+0x90/0x90 [ 587.957929] ret_from_fork+0x1f/0x30 [ 587.973055] ---[ end trace 71ccd6eca89f5513 ]--- Fixes: b726786 ("net/mlx5: Lag, add support to create/destroy/modify port selection") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
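The idea of nullifying the port selection attributes on destroy, so that a later destroy or re-create never sees stale pointers, can be sketched like this. It is a hypothetical userspace model: `struct port_sel_model` and its two fields are invented for illustration and do not mirror the real mlx5 structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model of the per-LAG port selection state. */
struct port_sel_model {
	int *rule;     /* stands in for a flow rule handle */
	int *definer;  /* stands in for a definer object */
};

/* Create both objects; refuse to run on top of leftover state. */
static int port_sel_create(struct port_sel_model *ps)
{
	assert(ps != NULL);

	if (ps->rule || ps->definer)
		return -EEXIST;

	ps->rule = malloc(sizeof(*ps->rule));
	if (!ps->rule)
		return -ENOMEM;

	ps->definer = malloc(sizeof(*ps->definer));
	if (!ps->definer) {
		free(ps->rule);
		ps->rule = NULL;
		return -ENOMEM;
	}
	return 0;
}

/* Destroy and then clear the attributes, so calling it twice is harmless. */
static void port_sel_destroy(struct port_sel_model *ps)
{
	assert(ps != NULL);

	free(ps->rule);     /* free(NULL) is a no-op */
	free(ps->definer);
	ps->rule = NULL;
	ps->definer = NULL;
}
```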
[Why] The IGT bypass test sets the CRC source to DPRX, but display DM did not check the connection type, so the test also ran on the HDMI connector. The kernel then crashed because aux->transfer is NULL for an HDMI connection. This patch skips the test on invalid connections and fixes the kernel crash. [How] Check the connector type while setting the pipe CRC source to DPRX or auto. If the type is not DP or eDP, the CRTC CRC source is not set and an error code is reported to the IGT test, which then shows the subtest as "no valid crtc/connector combinations found". [ 116.779714] [IGT] amd_bypass: starting subtest 8bpc-bypass-mode [ 117.730996] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 117.731001] #PF: supervisor instruction fetch in kernel mode [ 117.731003] #PF: error_code(0x0010) - not-present page [ 117.731004] PGD 0 P4D 0 [ 117.731006] Oops: 0010 [#1] SMP NOPTI [ 117.731009] CPU: 11 PID: 2428 Comm: amd_bypass Tainted: G OE 5.11.0-34-generic #36~20.04.1-Ubuntu [ 117.731011] Hardware name: AMD CZN/, BIOS AB.FD 09/07/2021 [ 117.731012] RIP: 0010:0x0 [ 117.731015] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [ 117.731016] RSP: 0018:ffffa8d64225bab8 EFLAGS: 00010246 [ 117.731017] RAX: 0000000000000000 RBX: 0000000000000020 RCX: ffffa8d64225bb5e [ 117.731018] RDX: ffff93151d921880 RSI: ffffa8d64225bac8 RDI: ffff931511a1a9d8 [ 117.731022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 117.731023] CR2: ffffffffffffffd6 CR3: 000000010d5a4000 CR4: 0000000000750ee0 [ 117.731023] PKRU: 55555554 [ 117.731024] Call Trace: [ 117.731027] drm_dp_dpcd_access+0x72/0x110 [drm_kms_helper] [ 117.731036] drm_dp_dpcd_read+0xb7/0xf0 [drm_kms_helper] [ 117.731040] drm_dp_start_crc+0x38/0xb0 [drm_kms_helper] [ 117.731047] amdgpu_dm_crtc_set_crc_source+0x1ae/0x3e0 [amdgpu] [ 117.731149] crtc_crc_open+0x174/0x220 [drm] [ 117.731162] full_proxy_open+0x168/0x1f0 [ 117.731165] ? 
open_proxy_open+0x100/0x100 BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1546 Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In machine_kexec_post_load() we use __pa() on `empty_zero_page`, so that we can use the physical address during arm64_relocate_new_kernel() to switch TTBR1 to a new set of tables. While `empty_zero_page` is part of the old kernel, we won't clobber it until after this switch, so using it is benign. However, `empty_zero_page` is part of the kernel image rather than a linear map address, so it is not correct to use __pa(x), and we should instead use __pa_symbol(x) or __pa(lm_alias(x)). Otherwise, when the kernel is built with DEBUG_VIRTUAL, we'll encounter splats as below, as I've seen when fuzzing v5.16-rc3 with Syzkaller: | ------------[ cut here ]------------ | virt_to_phys used for non-linear address: 000000008492561a (empty_zero_page+0x0/0x1000) | WARNING: CPU: 3 PID: 11492 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12 | CPU: 3 PID: 11492 Comm: syz-executor.0 Not tainted 5.16.0-rc3-00001-g48bd452a045c #1 | Hardware name: linux,dummy-virt (DT) | pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12 | lr : __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12 | sp : ffff80001af17bb0 | x29: ffff80001af17bb0 x28: ffff1cc65207b400 x27: ffffb7828730b120 | x26: 0000000000000e11 x25: 0000000000000000 x24: 0000000000000001 | x23: ffffb7828963e000 x22: ffffb78289644000 x21: 0000600000000000 | x20: 000000000000002d x19: 0000b78289644000 x18: 0000000000000000 | x17: 74706d6528206131 x16: 3635323934383030 x15: 303030303030203a | x14: 1ffff000035e2eb8 x13: ffff6398d53f4f0f x12: 1fffe398d53f4f0e | x11: 1fffe398d53f4f0e x10: ffff6398d53f4f0e x9 : ffffb7827c6f76dc | x8 : ffff1cc6a9fa7877 x7 : 0000000000000001 x6 : ffff6398d53f4f0f | x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff1cc66f2a99c0 | x2 : 0000000000040000 x1 : d7ce7775b09b5d00 x0 : 0000000000000000 | Call trace: | __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12 | 
machine_kexec_post_load+0x284/0x670 arch/arm64/kernel/machine_kexec.c:150 | do_kexec_load+0x570/0x670 kernel/kexec.c:155 | __do_sys_kexec_load kernel/kexec.c:250 [inline] | __se_sys_kexec_load kernel/kexec.c:231 [inline] | __arm64_sys_kexec_load+0x1d8/0x268 kernel/kexec.c:231 | __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] | invoke_syscall+0x90/0x2e0 arch/arm64/kernel/syscall.c:52 | el0_svc_common.constprop.2+0x1e4/0x2f8 arch/arm64/kernel/syscall.c:142 | do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:181 | el0_svc+0x60/0x248 arch/arm64/kernel/entry-common.c:603 | el0t_64_sync_handler+0x90/0xb8 arch/arm64/kernel/entry-common.c:621 | el0t_64_sync+0x180/0x184 arch/arm64/kernel/entry.S:572 | irq event stamp: 2428 | hardirqs last enabled at (2427): [<ffffb7827c6f2308>] __up_console_sem+0xf0/0x118 kernel/printk/printk.c:255 | hardirqs last disabled at (2428): [<ffffb7828223df98>] el1_dbg+0x28/0x80 arch/arm64/kernel/entry-common.c:375 | softirqs last enabled at (2424): [<ffffb7827c411c00>] softirq_handle_end kernel/softirq.c:401 [inline] | softirqs last enabled at (2424): [<ffffb7827c411c00>] __do_softirq+0xa28/0x11e4 kernel/softirq.c:587 | softirqs last disabled at (2417): [<ffffb7827c59015c>] do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline] | softirqs last disabled at (2417): [<ffffb7827c59015c>] invoke_softirq kernel/softirq.c:439 [inline] | softirqs last disabled at (2417): [<ffffb7827c59015c>] __irq_exit_rcu kernel/softirq.c:636 [inline] | softirqs last disabled at (2417): [<ffffb7827c59015c>] irq_exit_rcu+0x53c/0x688 kernel/softirq.c:648 | ---[ end trace 0ca578534e7ca938 ]--- With or without DEBUG_VIRTUAL __pa() will fall back to __kimg_to_phys() for non-linear addresses, and will happen to do the right thing in this case, even with the warning. But we should not depend upon this, and to keep the warning useful we should fix this case. 
Fix this issue by using __pa_symbol(), which handles kernel image addresses (and checks its input is a kernel image address). This matches what we do elsewhere, e.g. in arch/arm64/include/asm/pgtable.h: | #define ZERO_PAGE(vaddr) phys_to_page(__pa_symbol(empty_zero_page)) Fixes: 3744b52 ("arm64: kexec: install a copy of the linear-map") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/r/20211130121849.3319010-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
smc_lgr_cleanup_early() was meant to delete the link group from the link group list, but it deleted the list head by mistake. This may cause memory corruption, since the real link group was never removed from the list and the link group structure was later cleared with memset(). We got a list corruption panic when testing: [ 231.277259] list_del corruption. prev->next should be ffff8881398a8000, but was 0000000000000000 [ 231.278222] ------------[ cut here ]------------ [ 231.278726] kernel BUG at lib/list_debug.c:53! [ 231.279326] invalid opcode: 0000 [#1] SMP NOPTI [ 231.279803] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.10.46+ #435 [ 231.280466] Hardware name: Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014 [ 231.281248] Workqueue: events smc_link_down_work [ 231.281732] RIP: 0010:__list_del_entry_valid+0x70/0x90 [ 231.282258] Code: 4c 60 82 e8 7d cc 6a 00 0f 0b 48 89 fe 48 c7 c7 88 4c 60 82 e8 6c cc 6a 00 0f 0b 48 89 fe 48 c7 c7 c0 4c 60 82 e8 5b cc 6a 00 <0f> 0b 48 89 fe 48 c7 c7 00 4d 60 82 e8 4a cc 6a 00 0f 0b cc cc cc [ 231.284146] RSP: 0018:ffffc90000033d58 EFLAGS: 00010292 [ 231.284685] RAX: 0000000000000054 RBX: ffff8881398a8000 RCX: 0000000000000000 [ 231.285415] RDX: 0000000000000001 RSI: ffff88813bc18040 RDI: ffff88813bc18040 [ 231.286141] RBP: ffffffff8305ad40 R08: 0000000000000003 R09: 0000000000000001 [ 231.286873] R10: ffffffff82803da0 R11: ffffc90000033b90 R12: 0000000000000001 [ 231.287606] R13: 0000000000000000 R14: ffff8881398a8000 R15: 0000000000000003 [ 231.288337] FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 [ 231.289160] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 231.289754] CR2: 0000000000e72058 CR3: 000000010fa96006 CR4: 00000000003706f0 [ 231.290485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 231.291211] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 231.291940] Call Trace: [ 231.292211] smc_lgr_terminate_sched+0x53/0xa0 [ 231.292677] smc_switch_conns+0x75/0x6b0 [ 
231.293085] ? update_load_avg+0x1a6/0x590 [ 231.293517] ? ttwu_do_wakeup+0x17/0x150 [ 231.293907] ? update_load_avg+0x1a6/0x590 [ 231.294317] ? newidle_balance+0xca/0x3d0 [ 231.294716] smcr_link_down+0x50/0x1a0 [ 231.295090] ? __wake_up_common_lock+0x77/0x90 [ 231.295534] smc_link_down_work+0x46/0x60 [ 231.295933] process_one_work+0x18b/0x350 Fixes: a0a62ee ("net/smc: separate locks for SMCD and SMCR link group lists") Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
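The difference between unlinking an entry and unlinking the list head, which is the bug in the net/smc commit message above, can be shown with a tiny circular doubly linked list. This is a userspace sketch modelled loosely on the kernel's `list_head`; the `list_node` helpers and `struct lgr_model` are hypothetical names, not the kernel's API.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* Insert n just before head, i.e. at the tail of the list. */
static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n from whatever list it is on and make it self-linked. */
static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static size_t list_count(const struct list_node *head)
{
	size_t count = 0;
	const struct list_node *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		count++;
	return count;
}

/* Hypothetical link group with an embedded list node. */
struct lgr_model {
	int id;
	struct list_node list;
};

/*
 * Correct cleanup: unlink the entry itself. Passing the list head here
 * instead would leave lgr on the list while its memory is reused.
 */
static void lgr_cleanup_early(struct lgr_model *lgr)
{
	assert(lgr != NULL);
	list_del_node(&lgr->list);
}
```

After `lgr_cleanup_early()` the remaining entries are still reachable from the head, which is exactly what breaks when the head is unlinked by mistake.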
^ I don't know why GH says these commits referenced your message :D

Hi, I'm happy to learn about more people working on Mediatek devices! I see that you are working on a Denali family SoC (mt6735). The mt6577 differs so much that I won't be able to help you with the knowledge I gained while working on it, and I will explain why:
Weird, where did you see LK on Mediatek? I thought all MTKs relied on U-Boot. The only occurrences of LK I saw were in SP Flash Tool and in the corresponding scatter files.
That sounds great, I'm eager to see your code for mt6328. My device has an mt6329 PMIC and, judging by the downstream code, it uses multiple ways to communicate with the CPU. At the very least I noticed mentions of I2C (the PMIC sits at some address on the first I2C bus) and DVFS (the CPU and PMIC communicate over 2 pins by reading/writing various registers). I haven't touched the power management subsystem yet, so I can't form an educated opinion on it.
Thanks for your kind words, I'm very happy to learn about I've helped someone by sharing such a specific knowledge :D
I have no desire to send my mt6577 stuff to mainstream, too, but for other reasons (maybe one day I will have some balls to actually say it on LKML haha) as I am repelled by some nuances. I enjoy programming but my primary language is not C and its derivatives so I struggle a lot sometimes. Linux coding style is very well documented btw, and speaking of code formatting you can easily point
Don't worry mate, it takes experienced Mediatek developers more than a year to half-ass address the I2C driver issue. FYI, I've been trying to ask them for just a DMA register address required for the I2C driver, as neither the mt6577 datasheet nor genuine BSPs had it. I made several attempts but none of them were successful. Sometimes I feel like @mbgg is the only sane and helpful and generally awesome person on the linux-mediatek mailing list. While the code in this repository looks sort of fresh (mainly because of rebases and force pushes), I must say I've been writing it since ~April 2020. I had some free time when the first Covid restrictions took place in my country and I decided to dive into the Linux kernel to see how deep I could go. Back then I knew absolutely nothing, and now I have some basic knowledge, which is definitely cool. Currently I don't have a lot of free time and, to be honest, I've been slowly losing interest in this small project. I return to it every month or so for maybe an hour and that's all :D
No :( As I said above, Mediatek was still relying on ancient code even in their v3.4.x kernels, when the modern Linux clock subsystem we know today had already been introduced. On downstream, all PLLs and clocks are fired up together at boot in the corresponding driver. And the normal linux clock driver is just....... well, see it yourself. Porting mt_clock_manager to mainline is impossible because it's proprietary, so the only way is to write it from scratch, which is painfully hard, to say the least.
Well, almost ;) Here's the complete list of clocks.
I don't have any social media accounts per se, and you can reach me on Matrix. My Matrix username is written on my Github profile below my avatar. |
@alifilhan0 I worked on MT6737T on a Galaxy Grand Prime+ which is pretty much the same. I looked into replacing the stock ATF since on this device it appears to be somewhat broken and isn't able to boot 64-bit Linux. The stock bootloader is also broken and can't properly load an initramfs. However, secure boot has made it impossible to do anything so far. I tried using an exploit with bypass_utility, and it supposedly was able to disable secure boot, but I would get stuck in BROM after using it, and a reboot restores secure boot. SP flash tool also doesn't work on this device, and my guess is that Samsung disabled it since they provide their own Odin, but I couldn't risk flashing anything with it since it is part of the bootloader, and if I were to get the device stuck in an earlier stage, say in BROM or preloader because of a signature mismatch or something else after flashing a modified ATF, then I'd basically have a brick in my hands.
@arzam16 they do use LK on MT6735, and possibly newer SoCs as well. |
@Tooniis, when you say the device doesn't boot 64-bit Linux, do you mean a downstream kernel compiled for 64-bit, or some other attempt? I already expected errors like these; maybe a better ATF would solve it? From what I have seen, LK shows header errors when loading a boot image that doesn't satisfy the peculiar requirements of these LK bootloaders. That is why I started working on U-Boot. It did show image header problems when I tried the same. I think we need a new bootloader on this device. For the preloader, you could try building one from the source code of a BSP with the same chip and the same memory configuration and flash it with SP Flash? So far, replacing ATF and LK looks like a must if we want to advance further. @arzam16, MT6329 actually uses an I2C interface, and in terms of how it works, it is similar to MT6311. As for old codebases, I also have an MT8382, basically an MT6582 without a modem, and did some work on it too. There seem to be BSPs available on GitHub, and no matter how old and crappy they are, info is always useful. I may be wrong, but I have seen LK on all MTK chips that have PWRAP, no matter how old they are, while most I2C PMIC based chips had U-Boot. For example, I saw LK for MT6582 and MT6572 too; at least MT6582 never had any U-Boot, whereas MT6572 may have. And yes, there are LK versions for MT6577, have a look here: https://github.com/gallants/acer-e350s-aa66/tree/master/bootable/bootloader/lk/build-aa66/platform/mediatek/mt6577 These older 32-bit chips wouldn't have an ATF, so yes, your chip doesn't need it. But it is compulsory if you look into 64-bit. And I think the PLLs are easy: ****PLL_CON0 is the enreg and so on. A bit of off-topic |
@alifilhan0 I meant booting a 64-bit mainline kernel. The vendor kernel is 32-bit, but this SoC has armv8 CPUs so it would've been able to boot a 64-bit OS if it weren't for the broken ATF that can't switch the normal world to 64-bit mode. I don't think a 32-bit port for a SoC with armv8 CPUs would be accepted in Linux upstream, so I'm inclined to make 64-bit boot work. PSCI is also missing from the stock ATF, so that's another thing that can be improved with a modified ATF.
As I said, SP flash tool doesn't work on this device, so the only way to flash a preloader probably would be to |
@Tooniis, do you have the ATF source code? I have it from the Orange Pi 4G IoT BSP. It covers MT6737 and MT6735 at the same time, since they are almost identical (except that MT6737 is clocked slower and has an updated modem). MT8173 has really good support in the Linux kernel, ATF and whatnot, so I thought of updating the ATF using my source code while following how mt8173 was added to ATF. The ATF source tree for MT6735/MT6737 is exactly the same, and it is structured properly. And since the vendor OS on your device is 32-bit, I fully expect that it can't run 64-bit; the ATF and LK/U-Boot may never switch the CPU to 64-bit properly. My device, luckily, runs full arm64 though. |
@alifilhan0 I don't, but I extracted a binary from another MT6735 device that runs arm64 android from factory and I was planning to use that. |
Every time I sit down and try to write anything for the clock driver, I usually quit after 10 minutes 😆 Clocks on fresh platforms look better than whatever mt6577 has. A closer look at the mt6577 clock_manager thing reveals various techniques for enabling PLLs: some need a bit set in a register, some need a bit cleared in a register, some need a special value written into a register. Also, it was super hard for me to understand (I still haven't) how the clocks relate to each other on my platform, since the modern Linux clock infrastructure requires specifying parents. Even with a PDF* with useful info I can't really adapt the old mt6577 code for the new infra because it seems to differ A LOT. Well... or maybe I'm just lazy, or dumb, or just don't have enough time to understand it once and for all. I still haven't chosen an excuse for that 😅 The next subsystem on my imaginary "roadmap" after clocks is GPIO and I think it looks a bit easier.
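The three enable techniques mentioned above (set a bit, clear a bit, write a special value) could be captured in a small per-PLL description table instead of hand-written code for each PLL. Below is a minimal Python sketch of that idea, not real mt6577 data: the PLL names, register offsets, bit positions and the magic value are all made up, and a plain dict stands in for the MMIO registers.

```python
from dataclasses import dataclass
from enum import Enum, auto


class EnableOp(Enum):
    SET_BIT = auto()      # PLL is enabled by setting a bit
    CLEAR_BIT = auto()    # PLL is enabled by clearing a bit (e.g. a power-down flag)
    WRITE_VALUE = auto()  # PLL is enabled by writing a whole special value


@dataclass(frozen=True)
class PllDesc:
    name: str
    reg: int       # register offset (hypothetical)
    op: EnableOp
    arg: int       # bit index for SET_BIT/CLEAR_BIT, full value for WRITE_VALUE


def enable_pll(regs: dict, pll: PllDesc) -> None:
    """Apply the PLL's enable operation to a dict standing in for MMIO."""
    cur = regs.get(pll.reg, 0)
    if pll.op is EnableOp.SET_BIT:
        regs[pll.reg] = cur | (1 << pll.arg)
    elif pll.op is EnableOp.CLEAR_BIT:
        regs[pll.reg] = cur & ~(1 << pll.arg)
    else:
        regs[pll.reg] = pll.arg


# Entirely made-up entries, only there to exercise the three cases.
PLLS = [
    PllDesc("pll_a", 0x100, EnableOp.SET_BIT, 0),
    PllDesc("pll_b", 0x120, EnableOp.CLEAR_BIT, 0),
    PllDesc("pll_c", 0x140, EnableOp.WRITE_VALUE, 0x1234),
]
```

With a table like this, the quirks live in data, so a single enable routine covers every PLL and the per-PLL differences stay easy to audit against the downstream source.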
Hey, thanks! 👍 *I can't say some things on this thread because I might want to upstream this code in future (and I would like it to be legally clear) but there are places where you can get a PDF and tons of genuine BSPs without any kind of coins required. For this and any other matter you can HMU on matrix at any time. Worth mentioning, there are postmarketOS chats with a lot of members (most of them work on qcom tho), maybe you will be interested in joining? Me and @Tooniis are both there in the |
@arzam16 Deducing how the clocks are related may be the only option for now. Other than that, the relations can be guessed from the clock names in the clock manager header, right? I know that is not even close to enough, but at least it gives some of the clocks used by the hardware. The big issue remains, though: how the PLLs were divided. For example, how many Xpll_d** and Xpll**_d** clocks (where ** are numbers and X is main, univ, sys, etc.) there are and what their parent PLL is. For these, I don't think even a datasheet can help. @Tooniis But the risk of bricking the device by flashing another ATF still remains, doesn't it? |
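As a rough illustration of how much (and how little) the names give away, here is a Python sketch that splits names of the Xpll_dN / XpllM_dN form into the root PLL, the optional group number and the final divider. The function name and the sample clock names are made up for illustration. The sketch deliberately does not guess what the group number means, because that part cannot be read off the name alone.

```python
import re

# <root>pll, optional group number, "_d", final divider
_NAME = re.compile(r"^(?P<pll>[a-z]+pll)(?P<group>\d*)_d(?P<div>\d+)$")


def split_clock_name(name: str):
    """Return (root_pll, group, divider), or None if the name has another form.

    group is None when the name carries no group number (e.g. "mainpll_d2").
    """
    m = _NAME.match(name)
    if m is None:
        return None
    group = int(m["group"]) if m["group"] else None
    return m["pll"], group, int(m["div"])


if __name__ == "__main__":
    for n in ("mainpll_d2", "univpll1_d4", "armpll"):
        print(n, "->", split_clock_name(n))
```

For names without a group number the parent is most likely the root PLL itself; for grouped names the intermediate divider still has to come from the downstream code or from measurements.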
I have some initial work done on u-boot in my u-boot repo.....interested in reviewing it? |
Update: this is my almost-ready-to-test state. I added the wrapper, regulator and USB OTG mode from scratch and borrowed code for the rest of the peripherals from MT7623 and MT8516 (checking against the datasheets). The only thing left is to add the individual config options (for now I just used obj-y += so that everything is forced to compile and errors show up right off the bat) |
I have 2 tablets based on mt6735 (and mt6328). Almost a year ago I tried to port mainline to them, but I couldn't even get the UART working. Part of the blame lies with the wonky bootloader on these devices. I would like to try your u-boot, but I see that it has voltage regulators; do I need to change the voltages to my own? |
@ave4 When I wrote "almost ready" I completely forgot that these MTK devices MUST go through the preloader, and the preloader needs an MTK header added to the bootloader image before it will load it. U-Boot for MT7623 already follows that method and adds headers to the image, and I need to do the same for this build too. I will start working on that tomorrow and let you know when it is done. So for now, until the header problem is solved, if you want to test-boot the bootloader (which is also the only way available), just replace the kernel image with this u-boot.bin, repack, and then run fastboot boot boot.img. That way you can test U-Boot safely and then go back to Android by simply rebooting. (I think you know more in this regard.) Also, can I ask how and why you ran into the UART problem you mentioned? If you tried to get a console on the downstream firmware: it is disabled for the kernel, and for the preloader and LK it is mapped to other UARTs (ones not routed on the PCB). I don't know why, but this was actually the case with many devices. As for the regulators, you can change them. It is better to follow the voltages from the downstream device tree if you want to change them. But the driver, along with the PWRAP controller, was written by me from scratch, so it needs some testing. For now I am just assuming the code will work and am working on adding a power-button power-off feature to U-Boot |
This will probably become the common way to install it. Using U-Boot as a secondary bootloader should be possible even on devices with secure boot enabled and no known way of disabling it or working around it, and on devices with broken/non-existent fastboot, you get a working fastboot interface as a bonus :) I'll probably give it a shot on my device. Booting 32-bit mainline Linux is still much better than nothing. |
@Tooniis, mind you though... I would recommend test-booting it only with fastboot boot boot.img, without any flashing involved; going back to the safe old Android is then just a reboot away. And I am still looking for a way to bypass the crappy LK these devices ship with entirely; maybe solving the header problem will fix that. |
@ave4 @Tooniis Is there a known fix for the secure boot problem? I saw many sites providing bypass utilities and tools like this one https://forum.hovatek.com/thread-37957.html . Is this what we're looking for? The bypass_utility @Tooniis shared looks interesting. But don't you have the DA file for your model? Doesn't the stock factory firmware archive contain it? As for the U-Boot update, I am currently working on a config.h file for the test device. The bootloader image header problem still needs some insight. |
@alifilhan0 I might be wrong but IIRC the MTK-bypass/bypass_utility didn't work for @Tooniis because the bootloader of his device is kinda weird. Also it's not cool for me to speak for him but I definitely can remember us discussing something about the MTK-bypass not working in the past |
I changed the issue title to something more informative as it's not really related to mt6577 but definitely worth discussing in my opinion. |
@arzam16 This thread looks more like a discussion than an issue report. Is it disturbing your work? If I had an MT6577, I could help too. After I finish U-Boot on the MT6735, I am interested in trying to write clock drivers for MT6577. The problem with the different bit combinations you mentioned earlier is present, more or less, on almost all MTK SoCs (at least in my MT6735 and MT8382/MT6582 device sources) and might be related to the modem. I got my hands on some modem source code, and that's where my theories come from. For the initial stage, we could leave out the modem clocks and just try enabling the PLLs and their derivatives normally, and only then think about how to route them to each peripheral. What was your approach, and how far did you get with the clocks? |
Personally I have no complaints. Though other people who
I approached clock driver development a few times already. Mostly I just complained about how hard it is on various chats but in the meanwhile I tried the following:
aaand that's it. As I said I don't have enough free time and currently I'm working on other projects while I can. But yeah I frequently catch myself thinking about that pesky clock driver. [1] MT6577 HSPA Smartphone Application Processor datasheet v0.94 (24.07.2012) |
@ave4 ,
Wait, what? I don't think you're supposed to do this. I can't speak for mt6735 but on mt6577 uboot/lk could only be flashed with SP Flash Tool. |
@arzam16 , we are just trying to boot u-boot (chainloaded by LK) safely without bricking our devices. It is entirely possible. But what we might be missing is how MTK's boot images are packed. MT6577's U-Boot is wholly on another level which I dare not to speak of in terms of interacting with it XD. On that platform, yes it isn't possible, but it is with newer ones which at least use a device tree(or even if it is an old LK, I could try appending the dtb with bootloader itself and then give it a shot. No harm when the device is fine isn't it?) @ave4 And we need the proper dtb for u-boot, that means, we may have to change the .dtb file from unpacked boot image as well. And if possible, rename both the uboot.bin and dtb into the old filenames. I will keep trying to get this to at least spit some lines. It would be great if you could look out something too. And in menuconfig, change CONFIG_DEBUG_UART_SHIFT from 0 to 2. Not necessary but should help since you also use that in your mt6735 linux kernel defconfig |
@ave4, for now the only problem I can think of is the base addresses being wrong. Maybe changing them will work, but we need to know the address boot.img is loaded at. I am testing with TEXT_BASE = 0x40000000 and load addr 0x40080000 (the latter is not needed right now). Upd: with TEXT_BASE = 0x40000000 I get "invalid ramdisk address, overlaps with LK". So maybe it is close to the right set of addresses? |
@ave4, what do you think about this? https://lists.denx.de/pipermail/u-boot/2020-August/424231.html Before we draw conclusions about where the base addresses and the U-Boot text base should be, I would like to see the full normal boot log of your device, i.e. a normal boot with both the LK and Linux kernel UART logs. My LK's UART is disabled (redirected to another UART port that is not routed), so only the Linux kernel log is available. If you could help, please... |
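One way to get a starting point for these addresses is to read them from the header of the stock boot.img. The Python sketch below parses the leading fields of a version 0 Android boot image header: the 8-byte "ANDROID!" magic followed by little-endian 32-bit size/address fields. The function name is made up, and whether a given MediaTek LK honours these header addresses or relocates the images itself is an assumption that still has to be checked against the boot log.

```python
import struct

BOOT_MAGIC = b"ANDROID!"
# magic[8] followed by eight little-endian uint32 fields (header version 0)
_HDR = struct.Struct("<8s8I")


def read_boot_addrs(data: bytes) -> dict:
    """Extract sizes and load addresses from the start of a boot.img."""
    if len(data) < _HDR.size:
        raise ValueError("input is shorter than a boot image header")
    (magic, kernel_size, kernel_addr, ramdisk_size, ramdisk_addr,
     second_size, second_addr, tags_addr, page_size) = _HDR.unpack_from(data)
    if magic != BOOT_MAGIC:
        raise ValueError("not an Android boot image")
    return {
        "kernel_size": kernel_size,
        "kernel_addr": kernel_addr,
        "ramdisk_size": ramdisk_size,
        "ramdisk_addr": ramdisk_addr,
        "second_size": second_size,
        "second_addr": second_addr,
        "tags_addr": tags_addr,
        "page_size": page_size,
    }
```

Feeding it the first 40 bytes of the stock image, read in binary mode, is enough; `kernel_addr` is then a candidate for TEXT_BASE, and `ramdisk_addr` shows where LK would expect the ramdisk.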
Hope this is enough: |
@ave4 So, the conclusion is that the text base is 0x40080000. As before, replace the kernel binary with u-boot-dtb.bin, as you did earlier when you got a reboot, but this time also replace the dtb binary with u-boot.dtb. If it still doesn't work, rename the new files to match the ones they replace. Build with the new text base, select load addr 0x41000000, and deselect CONFIG_POWER_MT6328. Let me know what happens |
After changing text_base I get a lot of bind errors; changing load addr doesn't affect anything
|
@ave4 this is at least some progress....we see u-boot executing. The load addr is just a useless value at this moment since we are not targeting kernels right off the bat, but I think it is good to keep it in a build that is being used/tested The drivers not binding is wholly a different story. Need to change the memory allocated for u-boot binding the drivers. I may have done something wrong at setting other addresses while getting the text_base right. |
@ave4, you could try increasing CONFIG_SYS_MALLOC_F_LEN until you see different error messages. The message suggests that U-Boot exhausted the memory allocated to it. Start from CONFIG_SYS_MALLOC_F_LEN=0x4000 and keep increasing it until something new shows up |
|
@ave4, this is a U-Boot panic caused by CONS_INDEX not matching the UART0 port for your build. By default I set it to 1, because my mapped UART is UART1, and that is the cause of the error. Deselect CONFIG_SPECIFY_CONSOLE_INDEX manually (it is selected by default no matter what), then build and boot. By the way, how much malloc space did it need to get this far? |
After adding configs to build 32-bit U-Boot, adding a device tree for my device, and some tinkering with UART and TEXT_BASE, I got a message from U-Boot to show up:
Note that I am using UART0 (0x11002000) here. |
@Tooniis Umm, what? You are using serial 0x11002000 but that message came from UART1, 0x11003000. Pinctrl problem on UART1....And can you show a bigger log please, it helps greatly. And kindly, can I see your defconfig? |
The message came over UART0, and if I understand correctly it is saying that it failed to select a pinctrl state for UART1 for some reason. Everything above
|
Can we make a Matrix room? It would be more suited for discussion than a github issue. |
I rebooted and captured the full log starting from BROM:
|
I removed
looks like it doesnt init uart0 properly or sets it to a different baud rate than what LK does which causes that garbled output. |
@Tooniis What's your serial port speed? Mediatek has 921600 by default, but u-boot has 115200. |
@ave4 115200 |
I've created a Matrix room if anyone is interested: https://matrix.to/#/#u-boot-mt6735:matrix.org |
@Tooniis I've looked into earlier issues first, and I know MT7628 had similar problems with pinctrl. Before getting to the point: your LK and early boot log look very different from those of MT6735 devices that run 64-bit firmware by default. My main problem is that the LK log on my device is redirected to another port, so I cannot get U-Boot logs either unless U-Boot gets further into its boot. Now, CONFIG_SYS_TEXT_BASE=0x40004000, are you sure? I mean, does your kernel get loaded at this address? It should be exactly the address LK loads your kernel at, since U-Boot here essentially pretends to be the kernel. Regarding the garbage output, either there is a baud rate mismatch or the clock I set (26 MHz) is wrong. |
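Both suspects, the baud rate and the UART input clock, feed the same divisor calculation, so a quick sanity check is possible. The Python sketch below assumes a plain 16550-style UART with 16x oversampling and an integer divisor; any MediaTek-specific high-speed sampling mode is not modelled, and the function name is made up.

```python
def uart_divisor(clock_hz: int, baud: int, oversample: int = 16):
    """Return (divisor, actual_baud, error_percent) for an integer divisor."""
    divisor = max(1, round(clock_hz / (oversample * baud)))
    actual = clock_hz / (oversample * divisor)
    error = (actual - baud) / baud * 100.0
    return divisor, actual, error


if __name__ == "__main__":
    for baud in (115200, 921600):
        div, actual, err = uart_divisor(26_000_000, baud)
        print(f"{baud}: divisor={div} actual={actual:.0f} error={err:+.2f}%")
```

Under these assumptions a 26 MHz clock gives divisor 14 and an error below 1% for 115200, which is fine, while 921600 cannot be reached with a plain 16x divisor at all (about 12% off). If the real input clock is not 26 MHz, the divisor programmed for 115200 will produce a different rate on the wire, which would show up as exactly this kind of garbled output.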
@ave4 , have you tried deselecting CONFIG_SPECIFY_CONSOLE_INDEX? What's the output? |
@alifilhan0 Yes that is where the bootloader loads the kernel. It does not even print that pinctrl message with a different TEXT_BASE. |
@Tooniis , So for you 0x40004000 is correct. I am very unfamiliar with Matrix. Any guidance on how to use it? |
@alifilhan0 You should find instructions in the link. Otherwise install Element or use the web client on https://element.io/ and create an account then join the room with it. |
@alifilhan0 matrix is something between telegram and email, bringing the best from both worlds. From former you get the decent IM, from latter you get the decentralization (just like with email, you can register on any matrix server and join conversation from there). Usually you just pick a compatible client and sign up on any matrix homeserver - there are lots of them but feel free to pick any. If you don't really feel like creating an account in yet another messenger, let us know if you maybe prefer telegram instead? I think it could be easily bridged by @Tooniis so you could stay in tg while we stay in matrix |
@alifilhan0 I don't use telegram so I can't say, but as @arzam16 said they can be bridged so it doesn't really matter. Although there are other Matrix rooms you might be interested in joining so getting into Matrix might not be totally useless for you. |
@alifilhan0 I agree with @Tooniis and I recommend joining Matrix, too 👍 |
Nothing changed. |
Hello, it seems like a great effort. Sorry, it's a huge message, but I had and have a lot to say about the topic, so please do read it.
My goal was something similar: using only the modem and Wi-Fi code from the downstream kernel and the rest from the mainline kernel, and establishing a new free boot chain with an updated ATF (Arm Trusted Firmware), OP-TEE as the new TEE OS, and U-Boot replacing LK entirely. Then running standard Linux and mainstream Android on my device... it's an MT6735, btw. For all the work I have done and plan to do later, I compared how the basic parts of MT6797 that are available in mainline relate to the downstream kernel sources, and collected some datasheets from various Chinese websites to facilitate my work (I have NO intention of sending any of this upstream, since I am no programmer and am not familiar with Linux's mainline coding style). It would be great if we could share whatever we both know, and I might learn something new. I already have solutions for the updated ATF and OP-TEE, so I consider them done. (For ATF I have the MT6735 v1.0 source code, and I think it would be very easy to integrate it with the mainline ATF code, though not upstreaming it. The TEE OS has support for MT8173, and MT8173 and MT6735 should work exactly the same in OP-TEE, as I have compared their base registers and found them to be identical.) My current progress is on U-Boot: I added pwrap for MT6735 (inspired by SPMI on Qualcomm, but what I wrote is VERY MT6735-specific and basically a stripped-down version of the Linux kernel PWRAP), and UART, SD, PWM, CLK and PINCTRL are all done. Now I am adding regulator support for MT6328, following how da9063 is supported.
On the Linux kernel side, I also wrote PINCTRL, CLK, SD, PWRAP, SCPSYS, UART, I2C, AUXADC, PWM, the MT6328 regulator and many more. I got huge help from your mainlining notes, thank you very much for that. I will start uploading to GitHub properly once I am done working on U-Boot at least :). But the problem is that I have to do all of this alone, with no highly skilled person to help, so when I get badly stuck it takes days to solve a complex problem.
Looking at your work, it looks serious. Do you have a clock driver for MT6577? I compared the downstream clkmgr of MT6797, MT8173 and many more and came up with some theories: the PLL registers are just the base addresses (ARMPLL_CON0 is the enreg, etc.), and the names of the clocks, and how many clocks the clk-mt6577.c file should contain, can be taken from mt_clock_manager.c, right? As for PWRAP, on most other MTK SoCs you may not need it because the PMIC works over I2C; the fuel gauge driver (if ever needed) is very similar to 88pm860x-battery.c. In my case, I had to add pwrap support for the MT6735, then adapt the MT6323 and MT6359 regulator drivers to form the MT6328 regulator driver, and then add MFD support for MT6328 following MT6323. I think if we just add support by adapting drivers for similar hardware that is already supported, it should work, right? I am here just to learn, but would be more than happy to know if I could help.
Finally, messaging in the github's issues is very annoying isn't it? Do you have any more convenient place to talk? Like some social media or something?