This is because of how IPv4 hosts communicate with each other. Hosts must know each other's MAC addresses (Layer 2 addresses) in order to send frames to each other. With TCP/IP, however, hosts initially know only each other's IP addresses; therefore, they need to discover each other's MAC addresses before they can build a proper Ethernet frame. Every Ethernet frame must contain a destination MAC address and a source MAC address. The Ethernet header, on the other hand, contains no IP addresses at all: the IP address is a feature of the IP protocol, not a feature of the Ethernet protocol. Back in the 1980s and early 1990s, before the Internet became pervasive, Microsoft used the NetBEUI protocol for LAN hosts to communicate with each other for file sharing. NetBEUI was not Layer 3 aware, so there was no concept of Layer 3 addresses assigned to hosts whatsoever, yet NetBEUI ran over the same Ethernet protocol that we use today with TCP/IP.
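To make the "no IP address in the Ethernet header" point concrete, here is a minimal Python sketch that packs and parses an Ethernet II header. It holds only a 6-byte destination MAC, a 6-byte source MAC, and a 2-byte EtherType that says what the payload carries (0x0800 for IPv4, 0x0806 for ARP); any IP addresses live inside the payload. The MAC addresses below are made-up examples.

```python
import struct

# EtherType values identify what the payload carries; the Ethernet header
# itself never contains an IP address.
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def bytes_to_mac(raw: bytes) -> str:
    """Convert 6 raw bytes into 'aa:bb:cc:dd:ee:ff'."""
    return ":".join(f"{b:02x}" for b in raw)


def build_ethernet_header(dst_mac: str, src_mac: str, ethertype: int) -> bytes:
    # 6-byte destination MAC, 6-byte source MAC, 2-byte EtherType (big-endian).
    return struct.pack("!6s6sH", mac_to_bytes(dst_mac), mac_to_bytes(src_mac), ethertype)


def parse_ethernet_header(frame: bytes) -> tuple[str, str, int]:
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return bytes_to_mac(dst), bytes_to_mac(src), ethertype


# A broadcast ARP frame header from a (hypothetical) host 02:00:00:00:00:0a.
header = build_ethernet_header("ff:ff:ff:ff:ff:ff", "02:00:00:00:00:0a", ETHERTYPE_ARP)
print(len(header))                    # 14
print(parse_ethernet_header(header))  # ('ff:ff:ff:ff:ff:ff', '02:00:00:00:00:0a', 2054)
```

Fourteen bytes, two MAC addresses and a type field: that is all Ethernet needs to deliver a frame on the local segment.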
So, hosts using TCP/IP use the Address Resolution Protocol (ARP) to discover each other's MAC addresses. An ARP request frame carries the IP address of the host whose MAC address needs to be discovered in its payload, the broadcast destination MAC address (FF:FF:FF:FF:FF:FF) in its Ethernet header, and the MAC address of the requesting host as the source MAC address (the requester's MAC and IP addresses are also repeated inside the ARP payload). The broadcast destination MAC address means that every host in the broadcast domain must pay attention to the frame. Therefore, each host in the same broadcast domain has to spend CPU cycles looking inside the frame to check whether the target IP address in the payload is its own. If it isn't, the host discards the frame. If it is, the host sends an ARP reply back to the host that originated the request, addressed to the sender's MAC address included in the request. Back in the days of NetBEUI, hosts were identified on the LAN by their hostnames. However, because NetBEUI ran over Ethernet, for host "Fork" to communicate with host "Spoon", Fork still needed to discover Spoon's MAC address. This was done much the same way ARP discovers MAC addresses: Fork would issue a broadcast asking to resolve the name Spoon into a MAC address. Once Fork knew Spoon's MAC address, it could build an Ethernet frame to send to Spoon, and the payload of that frame had no IP addresses in it at all. Because NetBEUI had no Layer 3 addresses, its traffic could not be routed beyond the local broadcast domain. That's why NetBEUI was replaced with TCP/IP once hosts needed to communicate across the boundaries of their broadcast domain, for example to access the Internet.
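The request/reply logic above can be sketched in Python using the 28-byte ARP payload layout from RFC 826 (for Ethernet and IPv4). This is an illustrative sketch, not a real network stack: it only builds and inspects payload bytes, and the helper names and addresses are hypothetical. Every host that hears the broadcast would run something like `handle_arp`; only the owner of the target IP answers.

```python
import ipaddress
import struct

ARP_REQUEST = 1
ARP_REPLY = 2
# RFC 826 ARP payload for Ethernet/IPv4: htype, ptype, hlen, plen, oper,
# sender MAC, sender IP, target MAC, target IP (28 bytes total).
ARP_FORMAT = "!HHBBH6s4s6s4s"


def mac(text: str) -> bytes:
    return bytes(int(part, 16) for part in text.split(":"))


def ip(text: str) -> bytes:
    return ipaddress.IPv4Address(text).packed


def build_arp(op: int, sender_mac: str, sender_ip: str,
              target_mac: str, target_ip: str) -> bytes:
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4
    return struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, op,
                       mac(sender_mac), ip(sender_ip), mac(target_mac), ip(target_ip))


def handle_arp(packet: bytes, my_mac: str, my_ip: str):
    """Every host in the broadcast domain runs this on every ARP request it hears."""
    _, _, _, _, op, sha, spa, _, tpa = struct.unpack(ARP_FORMAT, packet)
    if op != ARP_REQUEST or tpa != ip(my_ip):
        return None  # not our IP: discard (the CPU cycles are already spent)
    # Our IP: answer with our MAC. The reply's target fields are the requester's
    # MAC/IP, and the reply frame is unicast to that MAC.
    return struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, ARP_REPLY,
                       mac(my_mac), ip(my_ip), sha, spa)


# Host A (192.168.10.10) asks "who has 192.168.10.20?"; target MAC is unknown (zeros).
request = build_arp(ARP_REQUEST, "02:00:00:00:00:0a", "192.168.10.10",
                    "00:00:00:00:00:00", "192.168.10.20")
print(len(request))                                               # 28
print(handle_arp(request, "02:00:00:00:00:1e", "192.168.10.30"))  # None
reply = handle_arp(request, "02:00:00:00:00:14", "192.168.10.20")
print(struct.unpack(ARP_FORMAT, reply)[4])                        # 2
```

Note that the bystander (192.168.10.30) still had to unpack and compare the request before returning `None`; that is the per-host cost of every broadcast.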
Here's the kicker: ARP only works within one broadcast domain. What is a broadcast domain? It's a collection of hosts that can see each other's Ethernet broadcasts (destination MAC address FF:FF:FF:FF:FF:FF). An unmanaged Ethernet switch sends Ethernet broadcasts out of all of its ports (except the one the broadcast arrived on), whereas a VLAN-aware (managed) switch forwards broadcasts only out of the ports assigned to the VLAN on which the broadcast originated.
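The difference between the two kinds of switches can be sketched as a small Python function that returns the ports a broadcast goes out of. This assumes a hypothetical switch with access ports only (no trunk ports), described by a made-up port-to-VLAN map; as on real switches, the ingress port is never included.

```python
def flood_ports(ingress_port: int, port_vlan: dict[int, int], vlan_aware: bool = True) -> list[int]:
    """Return the ports a switch sends a broadcast frame out of.

    port_vlan maps each access port to its VLAN ID (hypothetical config).
    An unmanaged switch ignores VLANs and floods everywhere except ingress;
    a VLAN-aware switch floods only within the ingress port's VLAN.
    """
    ingress_vlan = port_vlan[ingress_port]
    return sorted(
        port
        for port, vlan in port_vlan.items()
        if port != ingress_port and (not vlan_aware or vlan == ingress_vlan)
    )


# Ports 1, 2, 5 are in VLAN 10; ports 3, 4 are in VLAN 20.
ports = {1: 10, 2: 10, 3: 20, 4: 20, 5: 10}
print(flood_ports(1, ports, vlan_aware=False))  # [2, 3, 4, 5]
print(flood_ports(1, ports))                    # [2, 5]
```

With VLANs, a broadcast from port 1 never reaches ports 3 and 4, which is exactly why an ARP request from VLAN 10 cannot be answered by a host in VLAN 20.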
If there is a network of unmanaged switches, every switch in the network will send an Ethernet broadcast out of each of its ports. Therefore, in a theoretical scenario of 1 million hosts connected to a network of unmanaged switches, 999,999 hosts will receive each host's Ethernet broadcast. Given that each host issues multiple Ethernet broadcasts, there will be tens or hundreds of millions of Ethernet broadcasts occurring on such a network. This means that every host on the network will have to spend its precious CPU cycles constantly processing broadcast frames, because the destination MAC address in each of these frames is FF:FF:FF:FF:FF:FF, and every host that receives such a frame must pay attention to it. Needless to say, most of the CPU cycles of each host on such a network will be dedicated to processing broadcast frames, so the tasks the computers are actually supposed to perform will run extremely slowly (if at all). Additionally, each of the Ethernet links connecting hosts to the switches (let's say they are 1 Gbps links) will be saturated with Ethernet broadcasts.
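A rough back-of-the-envelope check of the link-saturation claim, as a Python sketch. The numbers are illustrative assumptions, not measurements: every broadcast is a minimum-size 64-byte frame (for example, a padded ARP request), and each frame also costs 8 bytes of preamble/SFD plus a 12-byte inter-frame gap on the wire, so 84 bytes of link time per frame.

```python
# Assumed wire cost of one minimum-size frame: 64-byte frame (incl. FCS)
# + 8 bytes preamble/SFD + 12 bytes inter-frame gap = 84 bytes.
WIRE_BYTES_PER_FRAME = 64 + 8 + 12


def broadcast_load_bps(hosts: int, broadcasts_per_host_per_sec: int) -> int:
    """Bits per second of broadcast traffic arriving on each host's link."""
    other_hosts = hosts - 1  # every host hears everyone else's broadcasts
    return other_hosts * broadcasts_per_host_per_sec * WIRE_BYTES_PER_FRAME * 8


for rate in (1, 2):
    load = broadcast_load_bps(1_000_000, rate)
    print(f"{rate} bcast/s per host -> {load / 1e9:.2f} Gbps per link")
# 1 bcast/s per host -> 0.67 Gbps per link
# 2 bcast/s per host -> 1.34 Gbps per link
```

Under these assumptions, just two small broadcasts per second per host already exceed a 1 Gbps link, and each host's CPU has to inspect roughly one to two million frames per second.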
This is why large networks are split into multiple broadcast domains. A broadcast domain is a group of hosts that can hear each other's Ethernet broadcast frames. To minimize the problems described in the previous paragraph, the entire network is segmented into smaller Ethernet chunks, i.e. broadcast domains. This is done by segmenting Ethernet networks into VLANs. The recommendation is to limit each broadcast domain to 254 hosts (or fewer). Therefore, it's recommended to match a /24 subnet (a subnet whose network portion is 24 bits long) to each VLAN. This makes the host portion of each subnet 8 bits long, which gives 256 addresses per subnet (2^8 = 256). Because each subnet reserves a "network IP" (all host bits set to 0) and a "broadcast IP" (all host bits set to 1; this is not the same as the broadcast MAC, by the way), the number of usable host addresses on each /24 subnet is 254.
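The /24 arithmetic can be checked with Python's standard `ipaddress` module. The subnet 192.168.10.0/24 is just an example, matching the VLAN10 subnet used later in this explanation.

```python
import ipaddress

vlan10 = ipaddress.ip_network("192.168.10.0/24")

print(vlan10.prefixlen)           # 24 network bits
print(32 - vlan10.prefixlen)      # 8 host bits
print(vlan10.num_addresses)       # 256 (2**8)
print(vlan10.network_address)     # 192.168.10.0   (network IP, not assignable)
print(vlan10.broadcast_address)   # 192.168.10.255 (IP broadcast, not assignable)
print(len(list(vlan10.hosts())))  # 254 usable host addresses
```

`hosts()` yields only the assignable addresses, i.e. everything except the network and broadcast addresses.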
Hosts in two different broadcast domains (VLANs) cannot communicate with each other directly because they cannot hear each other's ARP requests. Therefore, if you match a subnet to a VLAN, then for Host A on VLAN10 (192.168.10.10/24) to communicate with Host B on VLAN20 (192.168.20.10/24), Host A will need to use an L3 routing device (a router or an L3 switch). Host A applies its subnet mask to its own IP address and to Host B's IP address, compares the resulting network addresses (192.168.10.0 vs. 192.168.20.0), and concludes that Host B is on a different subnet. Host A therefore knows that it cannot send its packet directly to Host B and must involve its default gateway instead. The default gateway is an IP address assigned to the router's interface in the same broadcast domain (VLAN) as Host A. Host A cannot issue an ARP request for Host B (because Host B is in a different broadcast domain), but it can issue an ARP request for its default gateway, discover the default gateway's MAC address, and send the packet destined for Host B inside an Ethernet frame addressed to the default gateway. The destination MAC address in that frame will be the MAC address of the default gateway (router or L3 switch), but the destination IP address in the IP packet carried inside the frame will be the IP address of Host B. From that point on, it is the default gateway's responsibility to figure out how to deliver the packet to Host B. As you may have guessed, if the default gateway has an interface on the IP subnet where Host B lives, it will issue an ARP request for Host B's IP address, and once it learns Host B's MAC address, it will build a new Ethernet frame with Host B's MAC address as the destination address. Enclosed within that Ethernet frame will be the original IP packet (sent by Host A), still carrying Host B's IP address as its destination.
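Host A's decision can be sketched in Python with the standard `ipaddress` module: if the destination is inside Host A's own subnet, ARP for the destination itself; otherwise, ARP for the default gateway. The gateway address 192.168.10.1, the MAC addresses, and the pre-filled `arp_cache` dictionary are hypothetical, standing in for ARP results.

```python
import ipaddress


def next_hop(my_interface: str, default_gateway: str, destination: str) -> str:
    """Return the IP address Host A must ARP for to reach destination."""
    iface = ipaddress.ip_interface(my_interface)  # address + mask, e.g. 192.168.10.10/24
    dst = ipaddress.ip_address(destination)
    if dst in iface.network:  # same subnet: deliver directly
        return str(dst)
    return default_gateway    # different subnet: hand off to the gateway


def build_frame(src_mac: str, my_interface: str, default_gateway: str,
                destination: str, arp_cache: dict[str, str]) -> dict[str, str]:
    """Show which addresses end up in the frame Host A puts on the wire."""
    hop = next_hop(my_interface, default_gateway, destination)
    return {
        "dst_mac": arp_cache[hop],  # MAC of the next hop (learned via ARP)
        "src_mac": src_mac,
        "src_ip": str(ipaddress.ip_interface(my_interface).ip),
        "dst_ip": destination,      # always the final destination
    }


# Hypothetical ARP results already known to Host A.
arp_cache = {"192.168.10.1": "02:00:00:00:00:01", "192.168.10.20": "02:00:00:00:00:14"}
frame = build_frame("02:00:00:00:00:0a", "192.168.10.10/24", "192.168.10.1",
                    "192.168.20.10", arp_cache)
print(frame["dst_mac"], frame["dst_ip"])  # 02:00:00:00:00:01 192.168.20.10
```

The frame's destination MAC belongs to the gateway while the packet's destination IP still belongs to Host B, which is the split between Layer 2 and Layer 3 addressing described above.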
If the default gateway has a built-in Ethernet switch (as is usually the case with consumer-grade routers), the frame carrying Host B's MAC address as its destination will be sent out of the switch port on which Host B's ARP reply was heard, because the switch learned Host B's MAC address on that port.
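The "port where the ARP reply was heard" behavior is ordinary MAC learning. Here is a minimal Python sketch of a learning switch, assuming a single VLAN and no table aging. The port layout is hypothetical: port 0 stands for the router's internal connection to its switch chip and ports 1 through 4 for the LAN jacks, with Host B plugged into port 3.

```python
class LearningSwitch:
    """Minimal learning switch sketch (single VLAN, no MAC table aging)."""

    BROADCAST = "ff:ff:ff:ff:ff:ff"

    def __init__(self, ports: list[int]):
        self.ports = list(ports)
        self.mac_table: dict[str, int] = {}  # MAC address -> port it was last seen on

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Learn the sender's port, then return the list of egress ports."""
        self.mac_table[src_mac] = in_port
        if dst_mac != self.BROADCAST and dst_mac in self.mac_table:
            out_port = self.mac_table[dst_mac]
            # Known unicast: one port only (filtered if it came in on that port).
            return [] if out_port == in_port else [out_port]
        # Broadcast or unknown unicast: flood everywhere except ingress.
        return [p for p in self.ports if p != in_port]


sw = LearningSwitch(ports=[0, 1, 2, 3, 4])
ROUTER, HOST_B = "02:00:00:00:00:01", "02:00:00:00:00:14"

print(sw.receive(0, ROUTER, LearningSwitch.BROADCAST))  # ARP request:  [1, 2, 3, 4]
print(sw.receive(3, HOST_B, ROUTER))                    # ARP reply:    [0]
print(sw.receive(0, ROUTER, HOST_B))                    # routed frame: [3]
```

The broadcast ARP request is flooded, but Host B's reply teaches the switch that Host B sits on port 3, so the routed frame goes out of that single port.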
That, in a nutshell, is how IPv4 works. If the default gateway doesn't have an interface on the subnet where Host B lives, it will consult its routing table to figure out where to send the packet. At the very least, the default gateway should have a default route: if it has no more specific knowledge of the IP subnet where Host B lives, it forwards the packet to the next-hop L3 device specified in the default route. In a simple Internet router scenario, the default route points to the ISP's next-hop router, and this is where the default gateway (the consumer Internet router) sends all packets whose destination IPs do not belong to the networks connected locally to it.
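A routing-table lookup can be sketched in Python as a longest-prefix match: among all routes that contain the destination, the most specific one (largest prefix length) wins. The default route 0.0.0.0/0 matches every address but has prefix length 0, so it is used only when nothing more specific matches. The table below is hypothetical; 203.0.113.0/30 is a documentation range standing in for the ISP link, and "connected" means the router ARPs for the destination directly on that interface.

```python
import ipaddress

# Hypothetical routing table of a consumer router: two connected LAN subnets,
# the WAN link subnet, and a default route pointing at the ISP's next hop.
ROUTES = [
    ("192.168.10.0/24", "connected"),
    ("192.168.20.0/24", "connected"),
    ("203.0.113.0/30", "connected"),
    ("0.0.0.0/0", "203.0.113.1"),  # default route -> ISP next-hop router
]


def lookup(destination: str) -> str:
    """Longest-prefix match: the most specific route containing the address wins."""
    dst = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in ROUTES
        if dst in ipaddress.ip_network(prefix)
    ]
    # 0.0.0.0/0 always matches, so matches is never empty here.
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


print(lookup("192.168.20.10"))  # connected   (ARP for Host B on the LAN side)
print(lookup("8.8.8.8"))        # 203.0.113.1 (no specific route: use the default)
```

Real routers implement this with specialized data structures, but the selection rule is the same.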