SteveMaskery Posted March 16, 2022

Site URL: https://www.stevemaskery.com/

Hi, I'm new to SS and don't have much previous experience with HTML. I have two domains, workshopessentials.com and stevemaskery.com, both of which are forwarded to: https://amphibian-celery-b7en.squarespace.com

I'm getting a nag from Google saying that I need to add a couple of lines to my robots.txt file so that it can index my images properly. How do I find and access this file, please?

Many thanks
Steve
paul2009 (Solution) Posted March 17, 2022 (edited)

On 3/16/2022 at 10:47 AM, SteveMaskery said: "How do I find and access [my robots.txt] file please?"

All Squarespace sites use the same robots.txt file, and as a Squarespace user you cannot access or edit it. Squarespace have already specified the pages that should not be crawled by search engines because they’re for internal use only or display duplicate content. For example, /config/ is your Admin login page, and /api/ is used by the Analytics tracking cookie.

Here are some pages that Squarespace ask search engines not to crawl. These pages organize content that exists elsewhere on your site.

/search
/*?author=*
/*&author=*
/*?tag=*
/*&tag=*
/*?month=*
/*&month=*
/*?view=*
/*&view=*
/*?format=*
/*&format=*
/*?reversePaginate=*
/*&reversePaginate=*

For a complete list of the excluded pages, view the robots.txt file on any Squarespace website.

Edited August 6, 2022 by paul2009: Added additional information

Me: I'm Paul, a SQSP user for >18 yrs & Circle Leader since 2017. I value honesty, transparency, diversity and good design ♥. Work: Founder of SF.DIGITAL. We provide high quality original extensions to supercharge your Squarespace website. Content: Views and opinions are my own. Links in my posts may refer to my own SF.DIGITAL products or may be affiliate links. Forum advice is completely free. You can thank me by selecting a feedback emoji. Buying a coffee is generous but optional.
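The patterns in paul2009's list use * as a wildcard. As a rough sketch of how a crawler might interpret them, the Python below turns each pattern into a regular expression and tests two URL paths. It assumes Google-style matching (prefix match, * for any run of characters, a trailing $ as an end anchor) and ignores Allow rules and rule precedence. The function names and sample paths (pattern_to_regex, is_disallowed, /blog?tag=oak) are made up for illustration. Python's standard urllib.robotparser does plain prefix matching, so the sketch uses its own matcher.

```python
import re

# Patterns copied from the list above (only the subset quoted in the post).
DISALLOWED = [
    "/search",
    "/*?author=*", "/*&author=*",
    "/*?tag=*", "/*&tag=*",
    "/*?month=*", "/*&month=*",
    "/*?view=*", "/*&view=*",
    "/*?format=*", "/*&format=*",
    "/*?reversePaginate=*", "/*&reversePaginate=*",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is treated as literal text.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str) -> bool:
    """Return True if any listed pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path_and_query) for p in DISALLOWED)


# Hypothetical paths, used only for illustration.
print(is_disallowed("/blog?tag=oak"))        # a tag-filtered view of the blog
print(is_disallowed("/blog/my-first-post"))  # an ordinary blog post
```

Under these assumptions the tag-filtered view matches /*?tag=* while the plain post URL matches none of the listed patterns, which is the distinction the post describes: filtered views are excluded, the underlying content is not.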
SteveMaskery (Author) Posted March 17, 2022 (edited)

Ah, OK, thank you. Google is telling me this:

Quote:

"Dear Google Merchant Center user,

Merchant Center Account: xxxxxxxxx

We have detected a recent configuration change of the site hosting your images that result in the disapproval of some of your items in your Merchant Center account. Since images are an important part of the rich product information shown in Shopping ads, we require that all items include a valid image that can be indexed by Google. We crawl the images you submit to Merchant Center every few weeks to ensure that users always see the most recent version. Our ability to crawl your images can be restricted with a robots.txt file, which might lead to the disapproval of affected items. Learn more about robots.txt by visiting https://support.google.com/webmasters/answer/6062608.

What's the issue?
We have detected that a robots.txt file that controls the indexing of some of your provided images has been updated recently. As a result of these updates we aren't able to index the images of some of your items. This will result in the disapproval of the affected items.

Details and impact:
Estimated percentage of offers affected: 100
File: http://www.workshopessentials.com/robots.txt

In order for us to access these images, please modify the robots.txt file mentioned above to allow the user-agent Googlebot-Image to index these images. You can do this by adding the following lines to the robots.txt file:

User-agent: Googlebot-Image
Disallow:

If modifying this robots.txt file is not feasible you might want to consider hosting your images on a different hosting service that allows images to be indexed by Google."

Is this something I should be worried about or can I safely ignore it?

Many thanks
Steve

Edited March 18, 2022 by SteveMaskery: Remove sensitive info
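For readers unsure what the two lines in Google's email actually do, here is a minimal sketch using Python's standard urllib.robotparser. It parses only those two lines in isolation, not the real Squarespace file, and the image path is a hypothetical placeholder.

```python
from urllib.robotparser import RobotFileParser

# The two lines Google's email asks for. An empty Disallow value
# means "nothing is disallowed" for the named user agent.
suggested = """\
User-agent: Googlebot-Image
Disallow:
"""

parser = RobotFileParser()
parser.parse(suggested.splitlines())

# Hypothetical image path, used only for illustration.
image_path = "/images/example-product.jpg"
print(parser.can_fetch("Googlebot-Image", image_path))
```

In other words, the snippet Google suggests is an explicit "allow everything" rule for its image crawler. Since Squarespace users cannot edit robots.txt, this only illustrates what the email is asking for.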
risingsunyoga Posted May 8, 2022

I am having the same issue... Did you find a resolution?
burmdoge Posted May 11, 2022

Hi Paul,

Perhaps there is a workaround? Or some alternative measure?
paul2009 Posted May 11, 2022

@burmdoge What issue are you trying to address?
Dion Posted May 13, 2022

Hi Paul,

I'm trying to get the index page on our Squarespace website indexed, but URL inspection in Google Search Console is showing:

URL is not on Google: Indexing errors
Failed: Redirect error
https://mesaio.com.au

It looks like it's related to the robots.txt file. Could you please help resolve this issue?

Thank you,
Dion
erin Posted June 4, 2022

I'm having the same issue, and my clients are getting very anxious about not being findable on Google. It's been almost 3 months since we launched arcticroadrally.com. Please help!
ConnerO Posted July 20, 2022

See this article by Squarespace: https://support.squarespace.com/hc/en-us/articles/206543207-Understanding-Google-SEO-emails-and-console-errors
MJB1923 Posted September 5, 2022

Squarespace robots.txt is blocking Google from indexing my blog posts. How do I edit the robots.txt to stop this? Totally ridiculous. Checking before I have to move my site off of Squarespace as it will not allow blog posts to be indexed. Regular pages are indexed.
JSantos Posted October 31, 2022

@MJB1923 Did you find any resolution to this? My blog posts are receiving the same Google message. Please let me know.

Thank you,
Jackie
Chiara1234 Posted November 15, 2022

On 9/5/2022 at 5:29 PM, MJB1923 said: "Squarespace robots.txt is blocking Google from indexing my blog posts. How do I edit the robots.txt to stop this? [...] Regular pages are indexed."

I'm experiencing exactly the same!! Can someone from Squarespace please help? Blog posts are certainly NOT pages that Google should be kept from crawling. Tell us how to fix it or a lot of people will move their business away from Squarespace.
paul2009 Posted November 20, 2022

On 11/15/2022 at 11:14 AM, Chiara1234 said: "Squarespace robots.txt is blocking Google from indexing my blog posts. Can someone from Squarespace please help? Because blogposts sure are NOT pages that google should not crawl."

@Chiara1234 I believe you have misunderstood the message from Google. Squarespace is not blocking Google from indexing your blog posts.

Google is indexing your blog, but Squarespace is correctly preventing Google from indexing the category links on your blog page, for example /blog/category/Reviews. This link leads to duplicate content (the same blog posts that have already been indexed individually), so it should not be indexed, as I mentioned in my answer above.
BenCircular Posted August 7, 2023

On 3/6/2023 at 7:36 AM, Desi-Rae said: "Can Squarespace PLEASE stop controlling this? It's unnecessary, has far reaching negative implications for users, and why? Who decided these things? Why should it not be indexed? Why can't the user decide this? What exactly will this break for Squarespace while it is positively hurting customers?"

As a web developer with 14 years' experience, I 100% disagree. A robots.txt is extremely important to SEO, and given that Squarespace has intimate knowledge of their own systems that it would be impossible for everyday users to have, they should absolutely set the default robots.txt. You need to have more trust in a company that literally creates and runs millions of websites than in some article you read online. Context is key here.

You get the benefit of their expertise, and essentially they are saving you from killing your SEO in hundreds of different ways through a misconfigured robots.txt or missing rules, because you don't know all the paths Squarespace has available.

You have asked a lot of questions. If you don't know the answers already, you should not be making changes to a robots.txt, as you will ruin your website, despite what you think.

Ideally, a section in the admin, with sufficient warnings, whose contents are appended to the default robots.txt could be good. Then again, it would be a very advanced feature, and users would need to understand some really important whys and why-nots before adding pages there.
asillince Posted January 2

On 8/8/2023 at 5:43 AM, BenCircular said: "As a webdeveloper of 14 years experience, I 100% disagree. A robots.txt is extremely important to SEO [...]"

I would have to disagree with you on this. I have worked in e-commerce and SEO for 15 years, and being able to control your own robots.txt is pretty essential functionality. I'm not so interested in what Squarespace chooses not to index by default (I trust them on that too), but I have additional pages on my site that I want to block from being indexed or crawled. As far as I can see I'm not able to define this, and without robots.txt access I don't have the power to. There are many people out here using Squarespace sites who aren't developers but have a lot of web experience and do need to be able to manage the SEO of their sites properly.

If Squarespace wants to claim to be a good platform for SEO, then they need to provide the tools for businesses to make that happen. I've worked with many major e-commerce and web platforms and consider access to this to be a very standard feature these days.
paul2009 Posted January 2

1 hour ago, asillince said: "I have additional pages on my site that I want to be able to block from being indexed/crawled and as far as I can see I'm not able to define this"

Squarespace has a built-in feature that allows you to hide pages (and collections) from search engines. You'll find this setting in the settings panel of each page, within the SEO tab. To find this, hover over a page title in the PAGES panel and click the gear icon next to it.
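One way to confirm that the setting has taken effect is to look at the page's HTML. The sketch below assumes that hiding a page from search engines results in a robots meta tag containing noindex in the page's head. That is an assumption about the output, not something stated in this thread, so check your own page source. The class and function names (RobotsMetaFinder, is_noindexed) and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # A content value may hold several comma-separated directives.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindexed(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical page source, used only for illustration.
sample = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(is_noindexed(sample))
```

To try it on a real page, paste the page source (View Source in the browser) into a string and pass it to is_noindexed.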
ChrisL123 Posted January 5 (edited)

Google Search Console is telling me that my https://home.mydomain.com/robots.txt file is 404 Not Fetched (Not Found) - Hostload Exceeded.

This is a Squarespace homepage and I'm not sure why it's not working correctly. I checked the SEO settings and everything looks normal. Any ideas?

Edited January 5 by ChrisL123
asillince Posted May 24

On 1/2/2024 at 5:58 PM, paul2009 said: "Squarespace has a built in feature that allows you to hide pages (and collections) from search engines. [...]"

As far as I can see this doesn't apply to individual product pages. Can you confirm that this is correct and if there is a way to do that?
creedon Posted May 24

2 hours ago, asillince said: "As far as I can see this doesn't apply to individual product pages. Can you confirm that this is correct and if there is a way to do that?"

It is true that the products themselves do not have settings to control SEO access. The SEO setting applies to the whole collection.

Two possible solutions:

1. Create two Store pages (collections): one that allows crawlers and one that does not.
2. Please see the following. The issue in this thread is different but the JavaScript would be similar. I know of no code that currently does this.

Find my contributions useful? Please like, upvote, mark my answer as the best ( solution ), and see my profile. Thanks for your support! I am a Squarespace ( and other technological things ) consultant open for new projects.
IgorAvidon Posted June 3

On 1/2/2024 at 8:58 AM, paul2009 said: "Squarespace has a built in feature that allows you to hide pages (and collections) from search engines. [...]"

The problem is not with Squarespace URLs. It's actually with pages that DON'T exist on the Squarespace site but DID exist on the prior site.

Case in point: we've moved multiple clients from WordPress to Squarespace. In each case, old WP URLs are actively being crawled by Google to this day. We need the option to disallow Google from crawling folders like /wp-includes/ and /wp-content/ because that content is simply no longer there.

P.S. A 301 redirect is not an appropriate solution here because the old URLs are spammy content caused by a hack of the old website. Squarespace does not offer an option to set a 410 server response at the individual URL level either (another feature Squarespace should absolutely offer to site administrators).

Premier Los Angeles SEO Agency
Collaborada Posted June 8

Regarding 410s, a 404 status code is fine.

Building high-performance websites with expert SEO for local and national reach. Have an SEO question? Google to find our advice or book a Zoom session for tailored help. ★ Squarespace since 2013 ★
sweatmeadow Posted June 17

I got it, we're not allowed to edit robots.txt. Then please let me know how else I can fix the 401 error so that the website can be crawled by Google. It seems that the robots.txt blocks crawling.
ilseS Posted June 29 (edited)

I am at a complete loss, hoping someone can help. Suddenly nothing is showing up for my website, see screen shots below. I haven't added any new code to my site, none of my main pages are toggled to be hidden from search results, and I've spent hours trying to figure out how to fix this. I did toggle off AI Crawlers, but nothing else. Thanks in advance for any insight or help.

Edited June 29 by ilseS: More detail
ilseS Posted June 29

@paul2009 or anyone here, I wonder if you might be able to help? I'm out of my league with robots.txt. Suddenly my site and all of its pages appear to be blocked on Google due to a robots.txt setting, and I have ZERO idea why. I've not done anything new, not installed any strange code, and am at a loss. Any help would be tremendously appreciated 🙂
creedon Posted June 29 (edited)

1 hour ago, ilseS said: "suddenly my site and all of its pages appear to be blocked on google due to a robots.txt setting"

I suspect that crawlers have been disabled in Settings > Website > Crawlers. Looking at your robots.txt we can see the disallow. Please see Excluding your site from AI scans.

Let us know how it goes.

Edited June 29 by creedon
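To see for yourself which crawlers a robots.txt turns away, you can paste its contents into a short script. The sketch below uses Python's standard urllib.robotparser on a hypothetical excerpt. The rules shown are an assumption for illustration and are not copied from ilseS's site or from the actual Squarespace file. GPTBot is used as an example of an AI crawler's user agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt; replace with the text of your own /robots.txt.
sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /config
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(sample_robots.splitlines())

# Check whether each crawler may fetch the home page.
for agent in ("GPTBot", "Googlebot"):
    print(agent, parser.can_fetch(agent, "/"))
```

With this excerpt, only the AI crawler is refused at the site root, while Googlebot falls under the general rules and may still fetch it. If your own file shows Disallow: / under User-agent: * or under Googlebot, then the block applies to search engines too, which would match the setting creedon describes.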